Benchmarking OpenTelemetry: Can AI trace your failed login?
Summary
OTelBench benchmarks how well LLMs handle OpenTelemetry instrumentation, evaluating 14 frontier models on 23 tasks across 11 languages. Even the best model succeeds only about 29% of the time, which points to real gaps in AI-assisted SRE tooling and to the need for polyglot benchmarks and open-source evaluation.