DigiNews

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Quality: 9/10 Relevance: 9/10

Summary

The paper introduces SkillsBench, a benchmark that measures how well agent Skills work across 86 tasks in 11 domains. It finds that curated Skills significantly boost performance, while self-generated Skills do not help on average. The results suggest that small, modular Skills can outperform broad documentation, and that Skills can enable smaller models equipped with them to match larger ones.

🚀 Service built by Johan Denoyer