How we index images for RAG
Summary
Kapa.ai describes a scalable approach to multimodal retrieval for RAG: index images at ingest time rather than at query time. Instead of passing images to the model with every query, the pipeline uses a vision-language model to produce a text caption for each image once, during ingestion, and stores that caption as text alongside the document's other text chunks. Because this processing happens only once per image, it lowers per-query cost. It also improves answer quality, especially for load-bearing figures and charts, meaning those whose content the answer actually depends on.
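A minimal sketch of the ingest-time idea, in Python. It is not Kapa.ai's implementation. `describe_image` is a hypothetical stand-in for a vision-language model call, and the keyword scorer in `search` is a toy substitute for embedding retrieval. The names `Chunk`, `IngestIndex`, and `fake_vlm` are illustrative, and so is the caption cache keyed by image hash, which is an added assumption rather than something the summary describes. The sketch shows the model being called once per distinct image at ingest, while queries touch only text.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Chunk:
    """A unit of retrievable text; image captions are stored the same way."""
    source: str
    kind: str  # "text" or "image_caption"
    text: str


class IngestIndex:
    def __init__(self, describe_image: Callable[[bytes], str]) -> None:
        # Hypothetical stand-in for a vision-language model captioning call.
        self.describe_image = describe_image
        self.chunks: List[Chunk] = []
        # Assumption: cache captions by content hash so an image that appears
        # in several documents is only sent to the model once.
        self._caption_cache: Dict[str, str] = {}
        self.vlm_calls = 0

    def _caption(self, image: bytes) -> str:
        key = hashlib.sha256(image).hexdigest()
        if key not in self._caption_cache:
            self._caption_cache[key] = self.describe_image(image)
            self.vlm_calls += 1
        return self._caption_cache[key]

    def ingest(self, source: str, text_chunks: List[str], images: List[bytes]) -> None:
        # Text chunks and image captions end up side by side as plain text.
        for t in text_chunks:
            self.chunks.append(Chunk(source, "text", t))
        for img in images:
            self.chunks.append(Chunk(source, "image_caption", self._caption(img)))

    def search(self, query: str, k: int = 3) -> List[Chunk]:
        # Toy lexical overlap score; no image or model is touched at query time.
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(c.text.lower().split())), i, c)
            for i, c in enumerate(self.chunks)
        ]
        scored.sort(key=lambda x: (-x[0], x[1]))
        return [c for score, _, c in scored[:k] if score > 0]


def fake_vlm(image: bytes) -> str:
    # Hypothetical fixed caption standing in for real model output.
    return "bar chart of p95 latency by region"


index = IngestIndex(fake_vlm)
index.ingest("guide.md", ["Install the CLI with pip."], [b"png-bytes"])
index.ingest("faq.md", ["Latency depends on region."], [b"png-bytes"])
hits = index.search("latency chart")
```

Because captions are ordinary text chunks, they go through the same retrieval path as the rest of the corpus. The extra model cost is paid once per image at ingest rather than on every query.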