Large genome model: Open source AI trained on trillions of bases
Summary
Ars Technica covers Evo 2, an open-source AI model trained on genomes from bacteria, archaea, and eukaryotes, drawn from the OpenGenome2 dataset of 8.8 trillion bases. The model, built on the StripedHyena 2 convolutional neural network (CNN) architecture, learns conserved sequence patterns, which lets it identify genome features and make zero-shot predictions, that is, predictions on tasks it was never explicitly trained for. The article points to automated genome annotation and variant interpretation as potential applications, and it also discusses the model's limitations and ethical considerations.
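
To make the idea of zero-shot variant interpretation concrete, here is a minimal sketch of one common approach: score a variant by how much it changes the likelihood a sequence model assigns to the sequence. This is not Evo 2 and does not use its API. A tiny order-2 Markov model stands in for the real network, and every name here (`train_markov`, `log_likelihood`, `variant_score`) and the toy training data are assumptions made for illustration only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_markov(sequences, k=2):
    """Count how often each base follows each k-base context.

    Hypothetical stand-in for a trained genome model; a real model
    would learn far longer-range patterns than this.
    """
    counts = Counter()          # (context, next_base) -> count
    context_totals = Counter()  # context -> count
    for seq in sequences:
        for i in range(k, len(seq)):
            ctx, base = seq[i - k:i], seq[i]
            counts[(ctx, base)] += 1
            context_totals[ctx] += 1
    return counts, context_totals


def log_likelihood(seq, model, k=2, pseudo=1.0):
    """Sum of log P(base | previous k bases), with add-one smoothing."""
    counts, context_totals = model
    total = 0.0
    for i in range(k, len(seq)):
        ctx, base = seq[i - k:i], seq[i]
        p = (counts[(ctx, base)] + pseudo) / (
            context_totals[ctx] + pseudo * len(ALPHABET)
        )
        total += math.log(p)
    return total


def variant_score(ref, pos, alt, model, k=2):
    """Log-likelihood of the variant sequence minus that of the reference.

    A strongly negative score means the substitution breaks a pattern
    the model has learned to expect.
    """
    variant = ref[:pos] + alt + ref[pos + 1:]
    return log_likelihood(variant, model, k) - log_likelihood(ref, model, k)


# Toy "genome" with one strongly conserved repeating pattern.
model = train_markov(["ACGT" * 50])
ref = "ACGTACGTACGT"

# Substituting C -> T at index 5 disrupts the repeat.
disruptive = variant_score(ref, 5, "T", model)
# Substituting the reference base for itself changes nothing.
neutral = variant_score(ref, 5, "C", model)

print(f"disruptive variant score: {disruptive:.3f}")
print(f"neutral variant score:    {neutral:.3f}")
```

The sketch needs no labeled examples of harmful or benign variants, which is the sense in which the prediction is zero-shot: the only signal is how surprising the change is to a model trained on raw sequence.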