Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs
Summary
Agent-skills-eval is a test runner for the Agent Skills ecosystem that empirically tests whether a SKILL.md improves outputs. It compares outputs generated with the skill (with_skill) against a baseline without it, using a judge model, and produces a workspace and artifact reports. YAML/CLI tooling supports running it in CI. The goal is evidence-backed validation of skill improvements across OpenAI-compatible models and various runtimes.
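To make the with_skill vs baseline comparison concrete, here is a minimal sketch of the general idea in Python. It is not Agent-skills-eval's actual code or API: `compare`, `win_rate`, `Verdict`, and the two stub functions are hypothetical names invented for illustration, and the stubs stand in for real model and judge calls so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical types: generate(prompt, skill_text) -> output,
# judge(prompt, with_skill_output, baseline_output) -> winner label.
Generate = Callable[[str, Optional[str]], str]
Judge = Callable[[str, str, str], str]


@dataclass
class Verdict:
    prompt: str
    winner: str  # "with_skill", "baseline", or "tie"


def compare(prompts: List[str], generate: Generate, judge: Judge,
            skill_text: str) -> List[Verdict]:
    """Run every prompt twice (with and without the skill) and judge the pair."""
    verdicts = []
    for prompt in prompts:
        baseline = generate(prompt, None)
        with_skill = generate(prompt, skill_text)
        verdicts.append(Verdict(prompt, judge(prompt, with_skill, baseline)))
    return verdicts


def win_rate(verdicts: List[Verdict]) -> float:
    """Fraction of non-tie comparisons won by the with_skill output."""
    decided = [v for v in verdicts if v.winner != "tie"]
    if not decided:
        return 0.0
    return sum(v.winner == "with_skill" for v in decided) / len(decided)


# Stubs (assumptions): a real setup would call a model and a judge model here.
def fake_generate(prompt: str, skill_text: Optional[str]) -> str:
    return f"{prompt} [follows: {skill_text}]" if skill_text else prompt


def fake_judge(prompt: str, with_skill: str, baseline: str) -> str:
    # Toy rule: prefer the longer output. A judge model would apply a rubric.
    if len(with_skill) > len(baseline):
        return "with_skill"
    if len(baseline) > len(with_skill):
        return "baseline"
    return "tie"


if __name__ == "__main__":
    results = compare(["summarize the log", "write a test"],
                      fake_generate, fake_judge, "be concise")
    print(win_rate(results))
```

In a real comparison the judge would be a model call, and it is usually worth randomizing which output the judge sees first so position bias does not skew the win rate.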