I was reading Owen Shepherd’s post “{n} times faster than C”, which explores how to hand-tune x86-64 assembly to speed up a particular problem (see below). Originally, this inspired me to write a short introduction to using Rust’s portable SIMD to speed up problems like this by hand. I rewrote the problem in Rust (of course), used explicit SIMD, and observed a substantial speed-up.