Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs

May 7, 2026 at 06:12

Quality: 8/10 Relevance: 9/10

Summary

Agent-skills-eval is a test runner for the Agent Skills ecosystem that empirically tests whether a SKILL.md file improves model outputs. It compares with_skill outputs against baseline outputs using a judge model. Each run produces a workspace and artifact reports, and the project ships YAML and CLI tooling for use in CI. The goal is evidence-backed validation of skill improvements across OpenAI-compatible models and various runtimes.
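
To make the with_skill vs baseline comparison concrete, here is a minimal Python sketch of a paired evaluation scored by a judge. It is not the agent-skills-eval API: the names `compare`, `EvalResult`, `fake_generate`, and `fake_judge` are hypothetical, and the generator and judge are stubs standing in for real model calls. The alternating presentation order is a common guard against judge position bias, added here as an assumption rather than something the summary describes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical callables, not part of agent-skills-eval:
#   generate(prompt, skill_text_or_None) -> model output
#   judge(prompt, output_a, output_b)    -> "a", "b", or "tie"
Generate = Callable[[str, Optional[str]], str]
Judge = Callable[[str, str, str], str]


@dataclass
class EvalResult:
    skill_wins: int = 0
    baseline_wins: int = 0
    ties: int = 0

    @property
    def skill_win_rate(self) -> float:
        total = self.skill_wins + self.baseline_wins + self.ties
        return self.skill_wins / total if total else 0.0


def compare(prompts: Iterable[str], skill_text: str,
            generate: Generate, judge: Judge) -> EvalResult:
    """Run each prompt with and without the skill and tally judge verdicts."""
    result = EvalResult()
    for i, prompt in enumerate(prompts):
        with_skill = generate(prompt, skill_text)
        baseline = generate(prompt, None)
        # Alternate which output is shown first to reduce position bias.
        swapped = i % 2 == 1
        a, b = (baseline, with_skill) if swapped else (with_skill, baseline)
        verdict = judge(prompt, a, b)
        if verdict == "tie":
            result.ties += 1
        elif (verdict == "a") != swapped:
            result.skill_wins += 1
        else:
            result.baseline_wins += 1
    return result


# Stub model and judge for illustration only; a real run would call an LLM.
def fake_generate(prompt: str, skill_text: Optional[str]) -> str:
    return prompt + " [followed skill]" if skill_text else prompt


def fake_judge(prompt: str, a: str, b: str) -> str:
    # Toy rule: prefer the longer answer.
    if len(a) == len(b):
        return "tie"
    return "a" if len(a) > len(b) else "b"


demo = compare(["task one", "task two", "task three"],
               "contents of SKILL.md", fake_generate, fake_judge)
```

In a CI setting, a result like this could be turned into a pass/fail gate by checking `skill_win_rate` against a threshold chosen by the team.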

AI Tools LLM & Prompting
