SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Summary
The paper introduces SkillsBench, a benchmark that measures how well agent Skills work across 86 tasks in 11 domains. It finds that curated Skills significantly boost performance, while self-generated Skills do not help on average. The results suggest that small, modular Skills can outperform broad documentation, and that smaller models equipped with Skills can match larger ones.