Sub-Millisecond RAG on Apple Silicon. No Server. No API. One File.
Summary
Wax introduces a serverless, on-device memory layer for AI agents that stores all data in a single .mv2s file, enabling full retrieval-augmented generation (RAG) without external infrastructure. It demonstrates sub-millisecond vector search on Apple Silicon using Metal, backed by benchmarks that compare CPU and GPU performance, and it emphasizes privacy, determinism, and portability. The project includes a quick-start guide, architectural details, and deployment requirements, and positions Wax as a minimal, offline memory solution for AI apps.
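To make "vector search" concrete, here is a minimal sketch of exact (brute-force) top-k cosine search. It scores every stored embedding against a query and keeps the best matches, which is roughly the kind of work a CPU baseline does in a CPU-versus-GPU comparison. This is not Wax's implementation or API. The functions `normalize` and `top_k_cosine` are hypothetical names chosen for this illustration, and the summary does not say whether Wax uses exact or approximate search.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k_cosine(
    query: list[float], corpus: list[list[float]], k: int
) -> list[tuple[int, float]]:
    """Hypothetical exact top-k search: score every stored vector, keep the k best.

    Returns (index, similarity) pairs sorted by descending similarity.
    """
    q = normalize(query)
    # Lazily compute the cosine similarity of the query against each stored vector.
    scores = (
        (i, sum(a * b for a, b in zip(q, normalize(doc))))
        for i, doc in enumerate(corpus)
    )
    # heapq.nlargest holds at most k candidates while scanning the whole corpus.
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    print(top_k_cosine([1.0, 0.1], corpus, k=2))
```

Each query touches every vector once, so cost grows linearly with corpus size. That independent, per-vector arithmetic is what makes this workload a good fit for a GPU such as the one Metal targets.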