Trinity Large: An open 400B sparse MoE model
Summary
Trinity Large is a 400B-parameter sparse mixture-of-experts (MoE) model with 256 experts, 4 of which are active per token, released as Preview, Base, and TrueBase checkpoints. The post covers the architecture, large-scale training on a GPU cluster, data curation, and benchmark results, and also highlights open-access hosting, cost considerations, and the distinction between the non-reasoning Preview and future reasoning variants.
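To make the "256 experts, 4 active per token" routing concrete, here is a minimal sketch of top-k expert selection in a sparse MoE layer. This is an illustrative implementation of the general technique, not Trinity's actual code; the layer dimensions, expert FFN shape, and class names are assumptions.

```python
# Hypothetical sketch of top-4-of-256 expert routing; not Trinity's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int = 1024, n_experts: int = 256, top_k: int = 4):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for each token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward network (shape is illustrative).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.router(x)                               # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1) # pick 4 experts/token
        weights = F.softmax(weights, dim=-1)                  # normalize over the 4 chosen
        out = torch.zeros_like(x)
        # Only selected experts run, so per-token compute scales with top_k,
        # not with the full expert count.
        for k in range(self.top_k):
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out
```

The design point this illustrates: because only 4 of 256 expert FFNs run for any given token, a 400B-parameter model keeps per-token compute much closer to that of a far smaller dense model, while the full parameter count still provides capacity.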