DigiNews

Tech Watch by Johan Denoyer


Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs

Quality: 8/10 Relevance: 9/10

Summary

Agent-skills-eval is a test runner for the Agent Skills ecosystem that empirically checks whether a SKILL.md file improves model outputs. It runs each evaluation with the skill (with_skill) and without it (baseline), then uses a judge model to compare the two outputs. Results are written to a workspace with artifact reports, and YAML/CLI tooling supports use in CI. The goal is evidence-backed validation of skill improvements across OpenAI-compatible models and various runtimes.
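To make the with_skill vs baseline comparison concrete, here is a minimal Python sketch of the general pattern. It is not agent-skills-eval's actual API: the names `evaluate`, `Tally`, `stub_generate`, and `stub_judge` are hypothetical, and the model and judge calls are replaced by deterministic stubs so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical signatures, assumed for illustration only.
# generate(task, skill_text) -> model output; skill_text=None means baseline.
Generate = Callable[[str, Optional[str]], str]
# judge(task, output_a, output_b) -> "a", "b", or "tie".
Judge = Callable[[str, str, str], str]


@dataclass
class Tally:
    """Counts of judge verdicts across all tasks."""
    with_skill: int = 0
    baseline: int = 0
    tie: int = 0


def evaluate(tasks: Iterable[str], skill_text: Optional[str],
             generate: Generate, judge: Judge) -> Tally:
    """Run every task with and without the skill and tally judge verdicts."""
    tally = Tally()
    for task in tasks:
        with_skill_out = generate(task, skill_text)  # skill in context
        baseline_out = generate(task, None)          # no skill
        verdict = judge(task, with_skill_out, baseline_out)
        if verdict == "a":
            tally.with_skill += 1
        elif verdict == "b":
            tally.baseline += 1
        else:
            tally.tie += 1
    return tally


def stub_generate(task: str, skill_text: Optional[str]) -> str:
    """Stand-in for a model call: marks the output when a skill is supplied."""
    return f"{task} [skill applied]" if skill_text else task


def stub_judge(task: str, output_a: str, output_b: str) -> str:
    """Stand-in for a judge model: prefers the longer output."""
    if len(output_a) > len(output_b):
        return "a"
    if len(output_a) < len(output_b):
        return "b"
    return "tie"


if __name__ == "__main__":
    result = evaluate(
        ["summarize a log", "write a commit message"],
        "Always be concise.",
        stub_generate,
        stub_judge,
    )
    print(result)
```

In a real setup, the two stubs would be replaced by calls to an OpenAI-compatible model and a judge model, and the tally would feed the kind of report the summary describes.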
