OSS Software and Ecosystem Engineer @Diagrid / Java Champion / Cloud Native Ambassador


My OpenTelemetry journey spans three distinct chapters:

THE GOOD (Collective Minds): OTEL helped us debug slow calls to our medical imaging PACS system. When radiological images took forever to load, manual tracing showed us exactly where the bottleneck was. Perfect for targeted debugging.

THE BAD (FrankenPHP Performance Talk): I added auto-instrumentation to measure everything in production. My throughput dropped from 2,519 to 1,400 req/s: OTEL consumed 44% of my system capacity. But it also helped me find database connection issues I didn't know existed.

THE 44% (This Talk): After discovering the overhead, I researched mitigation strategies inspired by talks from companies like Glovo at CNCF events. Through testing across PHP, Node.js, C#, and Go, I found practical solutions: sampling strategies, batch optimization, and selective instrumentation.

This isn't "OTEL is good" or "OTEL is bad". It's "OTEL is powerful but expensive, here's exactly what it costs and how to optimize that trade-off." You'll leave with real production numbers, proven mitigation strategies, and a decision framework for when to instrument everything vs. when to be selective.

Real metrics. Real problems. Real solutions.
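
For readers who want to check the 44% figure, it follows directly from the two throughput numbers quoted in the abstract. The sketch below is only illustrative arithmetic; the function name `overhead_fraction` is a hypothetical helper, not part of any OpenTelemetry API, and it assumes both measurements were taken under the same load.

```python
def overhead_fraction(baseline_rps: float, instrumented_rps: float) -> float:
    """Fraction of baseline throughput lost after enabling instrumentation.

    Hypothetical helper for illustration; assumes both numbers were
    measured under comparable load.
    """
    if baseline_rps <= 0:
        raise ValueError("baseline_rps must be positive")
    return (baseline_rps - instrumented_rps) / baseline_rps


# Throughput figures quoted in the abstract (requests per second).
baseline = 2519.0
instrumented = 1400.0

loss = overhead_fraction(baseline, instrumented)
print(f"Throughput lost to instrumentation: {loss:.1%}")
```

The same calculation can be repeated after applying a mitigation (for example a lower sampling ratio) to compare how much of the lost throughput is recovered.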
Senior Software Engineer