Foxhound: Server-Grade Observability for Network-Augmented Applications

Lucas Castanheira, Alberto Schaeffer-Filho, Theophilus A. Benson: Foxhound: Server-Grade Observability for Network-Augmented Applications. In: Proceedings of the Eighteenth European Conference on Computer Systems, pp. 18–32, Association for Computing Machinery, Rome, Italy, 2023, ISBN: 9781450394871.

Abstract

There is a growing move to offload functionality, e.g., TCP or key-value stores, into programmable networks, either on SmartNICs or on programmable switches. While offloading promises significant performance boosts, these programmable devices often provide little visibility into their performance. Moreover, many existing tools for analyzing and debugging performance problems, e.g., distributed tracing, do not extend into these devices. Motivated by this lack of visibility, we present the design and implementation of an observability framework called Foxhound, which introduces a co-designed query language, compiler, and storage abstraction layer for expressing, capturing, and analyzing distributed traces and their performance data across an infrastructure comprising servers and programmable data planes. Although general-purpose, Foxhound's query language offers optimized constructs that can circumvent limitations of programmable devices by pushing down operations to hardware. We have evaluated Foxhound using a Tofino switch and a large-scale simulator. Our evaluations show that our storage layer can support common tracing tasks and detect associated problems at scale.



BibTeX

@inproceedings{10.1145/3552326.3567502,
title = {{Foxhound}: Server-Grade Observability for Network-Augmented Applications},
author = {Lucas Castanheira and Alberto Schaeffer-Filho and Theophilus A. Benson},
url = {https://doi.org/10.1145/3552326.3567502},
doi = {10.1145/3552326.3567502},
isbn = {9781450394871},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
booktitle = {Proceedings of the Eighteenth European Conference on Computer Systems},
pages = {18--32},
publisher = {Association for Computing Machinery},
address = {Rome, Italy},
series = {EuroSys '23},
abstract = {There is a growing move to offload functionality, e.g., TCP or key-value stores, into programmable networks, either on SmartNICs or on programmable switches. While offloading promises significant performance boosts, these programmable devices often provide little visibility into their performance. Moreover, many existing tools for analyzing and debugging performance problems, e.g., distributed tracing, do not extend into these devices. Motivated by this lack of visibility, we present the design and implementation of an observability framework called Foxhound, which introduces a co-designed query language, compiler, and storage abstraction layer for expressing, capturing, and analyzing distributed traces and their performance data across an infrastructure comprising servers and programmable data planes. Although general-purpose, Foxhound's query language offers optimized constructs that can circumvent limitations of programmable devices by pushing down operations to hardware. We have evaluated Foxhound using a Tofino switch and a large-scale simulator. Our evaluations show that our storage layer can support common tracing tasks and detect associated problems at scale.},
keywords = {debugging, INC, programmable networks, telemetry, tracing},
pubstate = {published},
tppubtype = {inproceedings}
}