Many developers write software that’s performance sensitive. After all, that’s one of the major reasons why we still pick C or C++ these days.

All modern processors are actually vector processors under the hood. Unlike scalar processors, which process data one element at a time, vector processors operate on one-dimensional arrays of data.

Every time you write float s = a + b, you’re leaving a lot of performance on the table. With SSE, the processor could have added four float numbers to another four numbers in a single instruction. If it supports AVX, it could even have added eight numbers to another eight. Similarly, when you write int i = j + k to add two integers, you could have added four or eight integers at once with the corresponding SSE2 or AVX2 instructions. If you want to maximize performance, you need to write code tailored to these vectors.

When done right, supplementing C or C++ code with vector intrinsics is exceptionally good for performance. For the cases presented in this blog post, vectorization improved performance by a factor of 3 to 12.
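The sketch below makes the float s = a + b and int i = j + k comparison concrete. It uses the SSE and SSE2 intrinsics declared in <emmintrin.h>. It assumes an x86 or x86-64 target; SSE2 is part of the x86-64 baseline, so no extra compiler flags should be needed there. The helper names add4_floats, add4_ints and example are hypothetical, chosen only for illustration. Each helper replaces four scalar additions with one vector addition: _mm_add_ps adds four floats, and _mm_add_epi32 adds four 32-bit integers.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in SSE); x86/x86-64 only
#include <cassert>

// Hypothetical helper: out[n] = a[n] + b[n] for n = 0..3,
// done with one 128-bit vector addition (typically a single ADDPS).
void add4_floats(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);            // unaligned load of a[0..3]
    __m128 vb = _mm_loadu_ps(b);            // unaligned load of b[0..3]
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); // lane-wise add, then store
}

// Hypothetical helper: out[n] = j[n] + k[n] for four 32-bit ints,
// done with one SSE2 vector addition (typically a single PADDD).
void add4_ints(const int* j, const int* k, int* out) {
    __m128i vj = _mm_loadu_si128(reinterpret_cast<const __m128i*>(j));
    __m128i vk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(k));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(vj, vk));
}

// Usage sketch: the asserts document the lane-wise results.
void example() {
    const float a[4] = {1.f, 2.f, 3.f, 4.f};
    const float b[4] = {10.f, 20.f, 30.f, 40.f};
    float s[4];
    add4_floats(a, b, s);  // s = {11, 22, 33, 44}
    assert(s[0] == 11.f && s[1] == 22.f && s[2] == 33.f && s[3] == 44.f);

    const int j[4] = {1, 2, 3, 4};
    const int k[4] = {100, 200, 300, 400};
    int i[4];
    add4_ints(j, k, i);    // i = {101, 202, 303, 404}
    assert(i[0] == 101 && i[1] == 202 && i[2] == 303 && i[3] == 404);
}
```

The unaligned loadu/storeu variants are used so the arrays don’t need 16-byte alignment. The AVX and AVX2 counterparts, _mm256_add_ps and _mm256_add_epi32, process eight lanes at once. They require a CPU with AVX/AVX2 support and a build with those instruction sets enabled (for example, -mavx2 on GCC and Clang).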