Show HN: A new benchmark for testing LLMs for deterministic outputs
Summary
The article introduces the Structured Output Benchmark (SOB), which evaluates how well LLMs produce deterministic, well-structured outputs from text, image, and audio sources. It separates schema parsing from value grounding, defines seven evaluation metrics, and presents a unified leaderboard that shows a gap between JSON parsing success and leaf-value accuracy. The goal is more reliable, production-ready structured data extraction from diverse inputs.
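
To make the distinction between parsing success and leaf-value accuracy concrete, here is a minimal Python sketch. It is not SOB's actual scoring code: the summary does not define the seven metrics, so the function names (`flatten_leaves`, `score_output`), the exact-match comparison, and the sample invoice data are all assumptions chosen for illustration.

```python
import json


def flatten_leaves(value, path=()):
    """Yield (path, leaf) pairs for every scalar in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten_leaves(child, path + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten_leaves(child, path + (index,))
    else:
        yield path, value


def score_output(raw_output, expected):
    """Score one model response (hypothetical metric, not SOB's definition).

    Returns whether the text parsed as JSON and the fraction of expected
    leaf values that the parsed output reproduces exactly at the same path.
    """
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # Parsing failed, so no leaf value can be credited.
        return {"parsed": False, "leaf_accuracy": 0.0}

    expected_leaves = dict(flatten_leaves(expected))
    predicted_leaves = dict(flatten_leaves(parsed))
    if not expected_leaves:
        return {"parsed": True, "leaf_accuracy": 1.0}

    correct = sum(
        1
        for path, value in expected_leaves.items()
        if path in predicted_leaves and predicted_leaves[path] == value
    )
    return {"parsed": True, "leaf_accuracy": correct / len(expected_leaves)}


# Made-up example: the response is valid JSON, but one value is wrong.
expected = {
    "invoice": {"total": 120.5, "currency": "USD"},
    "items": [{"sku": "A1", "qty": 2}],
}
raw_output = (
    '{"invoice": {"total": 120.5, "currency": "EUR"}, '
    '"items": [{"sku": "A1", "qty": 2}]}'
)
result = score_output(raw_output, expected)
print(result)
```

In this made-up case the response parses cleanly, yet one of the four expected leaves (`currency`) is wrong. A parse-only metric would count it as a full success, which is the kind of gap the leaderboard is described as exposing.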