Diagnosis and Remediation of SLO Violations via Telemetry in Programmable Networks

Abstract: The operational requirements of many of today’s high-performance networks are expressed as service-level objectives (SLOs), i.e., precise guarantees, often on latency, bandwidth, and jitter, that a user can expect from the network. For operators, monitoring compliance with SLOs and quickly diagnosing any violations is critical to effective operations. Unfortunately, existing network architectures are not designed for this purpose. There is no mechanism, for example, for the operator to monitor the 95th-percentile latency experienced by a customer. Data plane programming and, more specifically, in-band network telemetry (INT) enable measurements with unprecedented accuracy and precision, but pose the challenge of keeping monitoring overheads low and practical. Another key, underexplored aspect is how to react promptly to diagnosed violations. For example, several techniques proposed for fast recovery/rerouting focus on withstanding link failures rather than on adapting paths in the face of traffic fluctuations that lead to SLO violations. The latter is a more complex problem (given its dynamics), but one that could also be solved elegantly with programmable data planes and INT. This project aims to explore telemetry in programmable networks to detect, diagnose, and remediate SLO violations. The problem will be approached from two perspectives: experimental and theoretical. In the experimental part, we propose to design, implement, and evaluate a system that, based on INT, introduces in-band computation of network indicators, selective generation of reports, and fast rerouting of flows (via coordinated actions in the data and control planes). In the theoretical part, we propose to model exact and heuristic solutions that enable optimized online decisions about which flows will be responsible for monitoring and what information needs to be collected, and when.
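
As an illustration of the kind of indicator mentioned above, the following minimal Python sketch computes the 95th-percentile latency over a window of samples and produces a report only when the window violates the SLO. It is not part of the project's system: the function names, the nearest-rank percentile definition, and the report fields are assumptions made for this example, and a data plane implementation would have to work under much tighter resource constraints.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile (an assumed definition): the smallest sample
    such that at least q percent of the samples are at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Multiply before dividing so integer inputs give an exact rank.
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def check_slo(latencies_ms, slo_p95_ms):
    """Selective reporting: return a report only on violation, else None.

    The report fields are hypothetical and chosen for illustration."""
    p95 = percentile(latencies_ms, 95)
    if p95 > slo_p95_ms:
        return {"p95_ms": p95, "slo_ms": slo_p95_ms, "samples": len(latencies_ms)}
    return None
```

Calling `check_slo` on each measurement window returns `None` for compliant windows, so only violations would need to be sent to the control plane.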

Start Year: 2021

End Year: 2023

Local Coordinator: Luciano Paschoal Gaspary

Funding Agency: CNPq