DigiNews

Tech Watch by Johan Denoyer


Training an LLM in Swift, Part 1: Taking matrix multiplication from Gflop/s to Tflop/s

Quality: 9/10 Relevance: 9/10

Summary

A detailed performance-optimization study of training an LLM in Swift on Apple Silicon, comparing C, Swift, and Metal matrix-multiplication implementations across several optimization techniques (MutableSpan, relaxed floating-point math, and AMX/GPU tiling). It demonstrates substantial speedups, taking a naive Swift baseline from Gflop/s into Tflop/s territory with tiled Metal kernels, with final results approaching hardware-accelerated (AMX/GPU) performance, and closes with a plan for future library-based approaches.
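For context on the tiling technique the article applies, here is a minimal, illustrative sketch of a cache-blocked (tiled) matrix multiply in Swift. This is not code from the article; the function name, row-major layout, and tile size of 64 are all assumptions chosen for clarity.

```swift
import Foundation

// Cache-blocked (tiled) matrix multiply for square n×n matrices
// stored row-major in flat arrays. Tiling keeps working blocks of
// A, B, and C hot in cache, which is the core idea behind the
// CPU-side speedups discussed in the article.
func matmulTiled(_ a: [Float], _ b: [Float], n: Int, tile: Int = 64) -> [Float] {
    var c = [Float](repeating: 0, count: n * n)
    for i0 in stride(from: 0, to: n, by: tile) {
        for k0 in stride(from: 0, to: n, by: tile) {
            for j0 in stride(from: 0, to: n, by: tile) {
                for i in i0..<min(i0 + tile, n) {
                    for k in k0..<min(k0 + tile, n) {
                        let aik = a[i * n + k]
                        for j in j0..<min(j0 + tile, n) {
                            c[i * n + j] += aik * b[k * n + j]
                        }
                    }
                }
            }
        }
    }
    return c
}
```

The loop order (i, k, j in the inner triple) keeps the innermost accesses to `b` and `c` contiguous in memory; the article's Metal and AMX variants apply the same blocking idea at the hardware level.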

🚀 Service built by Johan Denoyer