SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

February 16, 2026 at 21:15

Quality: 9/10 Relevance: 9/10

Summary

The paper introduces SkillsBench, a benchmark for measuring how well agent Skills work across 86 tasks spanning 11 domains. It finds that curated Skills significantly boost performance, while self-generated Skills do not help on average. The results suggest that small, modular Skills can outperform broad documentation, and that Skills can enable smaller models to match larger ones.
