Making Julia as Fast as C++
Summary
The article traces the optimization of a vortex particle method in Julia, from a baseline, readable implementation to a highly optimized version that matches or outperforms C++. It walks through concrete optimization steps: strong typing, reducing allocations, avoiding LinearAlgebra, unrolling, and more advanced Julia features. It then compares the results against C++ benchmarks, including fast-math variants. The overall takeaway is that careful data layout and C++-style optimizations can make Julia HPC code competitive with C++.