Target 1: Baseten
Summary
SAIL documents system-level optimizations to Baseten's Orpheus-TTS deployment that achieve nearly 10x higher concurrency and substantial cost savings without changing the model weights. The report takes a holistic, whole-pipeline approach that includes pin_memory fixes, 2D batching, asynchronous scheduling, penalty refactors, and pipeline tuning. Together these changes keep latency stable under load while raising throughput.