Cognitive and Self-Adaptive System for Effective Distributed Tracing (using Jaeger, OpenTracing)

With the advent of distributed systems and microservices-based architectures, end-to-end dynamic API tracing systems have become an important tool for diagnosing API failures and performance issues. However, current tracing implementations capture only a small subset of traces (1-5%) to keep storage and scale manageable. Because that subset is sampled at random, its distribution is heavily skewed towards normal, consistent execution traces, and it misses the unusual or interesting traces needed to diagnose API failures and performance issues effectively, which affects both developers and SRE teams. We propose a solution based on machine learning and a cognitive approach that removes this bias in trace collection and storage, using a self-adaptive method that adjusts dynamically to the actual data. The system learns on its own to capture the traces of higher interest (more variation in errors, warnings, response codes, etc.) that add value in finding the actual root cause of an issue, while keeping the distribution ratio intact.
The solution has proven to be a game changer for SRE teams within the organization when triaging issues in complex applications using logs, metrics, and intelligent traces, with improved mean time to resolve (MTTR). It is a forward-looking way to tackle the common issues we face as SREs in the observability and infrastructure space, and it offers insights and paths that the next generation of SRE solutions can adopt. In other words, we have used analytical methods to make reliability work more efficient and to lighten both workload and budget.
Combining adaptive sampling with regular distributed tracing was a data-driven decision, and it proved effective because it:

  • Standardises the distribution of collected traces.
  • Substantially reduces storage requirements and significantly helps reduce COGS (cost of goods sold).
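
The idea of favouring unusual traces while staying close to a fixed overall sampling rate can be illustrated with a small sketch. This is not the speaker's machine-learning method and not Jaeger's own sampler; it is a simple frequency-based heuristic used here only for illustration. The class name `AdaptiveSampler`, the "signature" tuple (operation, response code, error flag), and the 5% target rate are all assumptions made for this example.

```python
import random
from collections import Counter


class AdaptiveSampler:
    """Illustrative sampler that keeps rare trace signatures more often.

    Hypothetical sketch: the sampling budget (target_rate * traces seen)
    is split evenly across the distinct signatures observed so far, so a
    rare signature gets a higher per-trace keep probability than a common one.
    """

    def __init__(self, target_rate=0.05, rng=None):
        self.target_rate = target_rate      # assumed overall keep rate
        self.counts = Counter()             # traces seen per signature
        self.total = 0                      # traces seen overall
        self.rng = rng or random.Random()

    def keep_probability(self, signature):
        count = self.counts[signature]      # Counter lookup does not insert
        if count == 0:
            return 1.0                      # never seen before: always keep
        # Even share of the budget for this signature, spread over its traces.
        budget = self.target_rate * self.total / len(self.counts)
        return min(1.0, budget / count)

    def should_keep(self, signature):
        p = self.keep_probability(signature)
        self.counts[signature] += 1
        self.total += 1
        return self.rng.random() < p


if __name__ == "__main__":
    sampler = AdaptiveSampler(target_rate=0.05, rng=random.Random(0))
    ok = ("GET /orders", 200, False)
    err = ("GET /orders", 500, True)
    for _ in range(990):
        sampler.should_keep(ok)
    for _ in range(10):
        sampler.should_keep(err)
    print(sampler.keep_probability(ok), sampler.keep_probability(err))
```

Under these assumptions, after 990 successful traces and 10 failing ones, the failing signature has a keep probability of 1.0 while the successful one drops to roughly 0.025, so the stored sample is no longer dominated by normal executions. When rare signatures cannot use up their share of the budget, the overall keep rate falls below the target rather than above it.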

Speaker

  • Susobhit Panigrahi

    Susobhit works as a Developer and DevOps Engineer at VMware, focusing on scalable cloud software built with microservices. He contributes to open-source projects, including VMware’s Xenon microservice architecture. Fascinated by Kubernetes and its role in deploying and managing production systems, Susobhit aims to learn from and contribute to the open-source community. He has attended, spoken at, and organized multiple conferences such as KubeCon, SRECon, and KCD, and would love to explore more opportunities to exchange ideas and knowledge with the larger community.

Date

Nov 20 2025

Time

11:15 - 12:00

Location

Jacobi