Stable Audio 3
Summary
arXiv: Stable Audio 3 presents fast latent diffusion models for variable-length audio generation and editing. The models are built on a semantic-acoustic autoencoder that preserves fidelity while keeping diffusion efficient, and they use adversarial post-training to speed up inference and improve quality. The authors claim the models run on consumer hardware and state that they provide training and inference pipelines, along with model weights for the small and medium configurations.
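To make the efficiency argument concrete, here is a minimal sketch of why running diffusion on autoencoder latents is cheaper than running it on raw audio, and how variable-length clips map to variable-length latent sequences. The sample rate and downsampling factor are hypothetical placeholders chosen for illustration, not values reported for Stable Audio 3, and `latent_length` is an illustrative helper, not part of any released pipeline.

```python
import math


def latent_length(duration_s: float,
                  sample_rate: int = 44_100,   # hypothetical sample rate
                  downsample: int = 2_048) -> int:  # hypothetical autoencoder stride
    """Return how many latent frames a clip of `duration_s` seconds would occupy.

    Assumes the autoencoder compresses time by a fixed factor `downsample`
    and pads the final partial frame (hence the ceiling).
    """
    n_samples = round(duration_s * sample_rate)
    return math.ceil(n_samples / downsample)


if __name__ == "__main__":
    # Clips of different durations yield latent sequences of different lengths,
    # so the diffusion model only processes as many frames as the clip needs.
    for seconds in (1.0, 10.0, 30.0):
        frames = latent_length(seconds)
        samples = round(seconds * 44_100)
        print(f"{seconds:>5.1f} s: {samples} samples -> {frames} latent frames")
```

Under these assumed values, the diffusion model operates on a sequence roughly three orders of magnitude shorter than the waveform, which is the general reason latent diffusion is cheaper than diffusion on raw samples.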