In the first article in this two-part series on working with SIMD-enabled vector types in C#, I explained how to install and configure the necessary components to make Microsoft's new JIT, RyuJIT, generate SIMD instructions from your C# code with fixed vector types. Here, I explain the different operations that map to SIMD instructions for fixed vector types. I also provide examples of more-advanced scenarios in which you can use hardware-dependent vector types that adjust their number of elements (based on the capabilities of the underlying hardware) and allow you to work with data types other than float.
Methods and Operations that Generate SIMD Instructions
The three fixed-size vectors (Vector2f, Vector3f, and Vector4f), which hold different numbers of single-precision floating-point elements, define operators and methods that generate SIMD instructions optimized to perform operations on packed floating-point values. If you've worked with SIMD intrinsics in C or C++, you will be able to take advantage of your existing knowledge in C#. But instead of coding with SIMD intrinsics, you use the operators and methods provided by the fixed-size vectors and let RyuJIT generate optimized SIMD instructions.
The documentation for the fixed vector types included in Microsoft.Bcl.Simd is very poor, so here I provide a summary of the operators and methods of these vector types. I include sample C# code and the main SIMD instructions that each operator or method generates with RyuJIT. I also include the equivalent SIMD intrinsics in case you have experience with their use in the Visual C++ compiler or the Intel C/C++ Compiler. This way, you will know all the optimized operations you can use with the vectors and can write your algorithms using them. Don't forget that hardware-dependent vector types allow you to work with a higher number of elements per SIMD instruction on capable hardware. In addition, the examples will be useful when you work with vectors that pack types other than single-precision floating point.
For each code sample, assume that the following lines define two Vector3f instances:
var vector1 = new Vector3f(x: 5f, y: 15f, z: 25f);
var vector2 = new Vector3f(x: 3f, y: 5f, z: 8f);
The following operations take advantage of SIMD instructions:
The - operator or Subtract methods: They use the SUBPS instruction (Subtract Packed Single-Precision Floating-Point Values), equivalent to the _mm_sub_ps intrinsic. Sample lines that generate the SUBPS instruction:
var vector3 = vector2 - vector1;
var vector4 = Vector3f.Subtract(vector2, vector1);
The * operator or Multiply methods: They use the MULPS instruction (Multiply Packed Single-Precision Floating-Point Values), equivalent to the _mm_mul_ps intrinsic. Sample lines that generate the MULPS instruction:
var vector3 = vector1 * vector2;
var vector4 = Vector3f.Multiply(vector1, vector2);
The / operator or Divide methods: They use the DIVPS instruction (Divide Packed Single-Precision Floating-Point Values), equivalent to the _mm_div_ps intrinsic. Sample code that generates the DIVPS instruction:
var vector3 = vector1 / vector2;
var vector4 = Vector3f.Divide(vector1, vector2);
The + operator or Add methods: They use the ADDPS instruction (Add Packed Single-Precision Floating-Point Values), equivalent to the _mm_add_ps intrinsic. Sample code that generates the ADDPS instruction:
var vector3 = vector1 + vector2;
var vector4 = Vector3f.Add(vector1, vector2);
The == operator or Equals methods: They use the CMPEQPS instruction (Compare Packed Single-Precision Floating-Point Values), equivalent to the _mm_cmpeq_ps intrinsic. Sample code that generates the CMPEQPS instruction:
var areEqual = (vector1 == vector2);
The != operator: It also uses the CMPEQPS instruction explained for the == operator. Sample code that generates the CMPEQPS instruction:
var areNotEqual = (vector1 != vector2);
The CopyTo method: It uses both the MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) and MOVUPS (Move Unaligned Packed Single-Precision Floating-Point Values) instructions. These instructions are equivalent to the _mm_load_ps and _mm_loadu_ps intrinsics. In previous versions of RyuJIT and Microsoft.Bcl.Simd, the CopyTo method didn't take advantage of these SIMD instructions, which badly distorted measurements of the performance gains in the SIMD-enabled version of the code. Starting with CTP4, CopyTo has been improved to use MOVAPS and MOVUPS. Sample code that generates the MOVAPS and MOVUPS instructions:
var array = new float[3];
vector1.CopyTo(array);
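To make the element-by-element semantics of these packed instructions concrete, here is a minimal scalar sketch that uses plain float arrays and has no dependency on Microsoft.Bcl.Simd. The PackedReference class and its method names are hypothetical, introduced only for illustration. With real Vector3f instances, RyuJIT would emit the instruction named in each comment instead of running a loop.

```csharp
using System;

// Hypothetical scalar reference (not part of Microsoft.Bcl.Simd): each method
// computes one element at a time what the packed instruction named in its
// comment computes for all elements at once.
public static class PackedReference
{
    // Applies a binary operation to each pair of corresponding elements.
    private static float[] Map(float[] a, float[] b, Func<float, float, float> op)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Both inputs must have the same number of elements.");
        var result = new float[a.Length];
        for (int i = 0; i < a.Length; i++)
            result[i] = op(a[i], b[i]);
        return result;
    }

    public static float[] Add(float[] a, float[] b) { return Map(a, b, (x, y) => x + y); }      // ADDPS
    public static float[] Subtract(float[] a, float[] b) { return Map(a, b, (x, y) => x - y); } // SUBPS
    public static float[] Multiply(float[] a, float[] b) { return Map(a, b, (x, y) => x * y); } // MULPS
    public static float[] Divide(float[] a, float[] b) { return Map(a, b, (x, y) => x / y); }   // DIVPS

    // CMPEQPS compares element by element; in this sketch the vectors are
    // considered equal only when every pair of elements matches.
    public static bool AreEqual(float[] a, float[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) return false;
        return true;
    }

    public static void Main()
    {
        // Same values as vector1 and vector2 in the article's samples.
        var vector1 = new[] { 5f, 15f, 25f };
        var vector2 = new[] { 3f, 5f, 8f };

        Console.WriteLine(string.Join(", ", Subtract(vector2, vector1)));
        Console.WriteLine(string.Join(", ", Multiply(vector1, vector2)));
        Console.WriteLine(string.Join(", ", Divide(vector1, vector2)));
        Console.WriteLine(string.Join(", ", Add(vector1, vector2)));
        Console.WriteLine(AreEqual(vector1, vector2));
    }
}
```

For the sample values, the subtraction yields (-2, -10, -17), the multiplication (15, 75, 200), and the addition (8, 20, 33); the two vectors are not equal.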
The VectorMath class provides math functions that operate on vectors and generate optimized SIMD instructions. The math functions are useful for vectors with both a fixed size and a hardware-dependent size. I include sample C# code and the main SIMD instructions that each VectorMath method generates with RyuJIT. However, take into account that, in some cases, the generated code doesn't use the most suitable instructions (ones that would reduce the number of instructions required to perform the math operation on the packed types). Newer versions might produce better optimizations and use more specific SIMD instructions.
Max: It uses the MAXPS instruction (Return Maximum Packed Single-Precision Floating-Point Values), equivalent to the _mm_max_ps intrinsic. Sample code that generates the MAXPS instruction:
var vector3 = VectorMath.Max(vector1, vector2);
Min: It uses the MINPS instruction (Return Minimum Packed Single-Precision Floating-Point Values), equivalent to the _mm_min_ps intrinsic. Sample code that generates the MINPS instruction:
var vector3 = VectorMath.Min(vector1, vector2);
SquareRoot: It uses the SQRTPS instruction (Compute Square Roots of Packed Single-Precision Floating-Point Values), equivalent to the _mm_sqrt_ps intrinsic. Sample code that generates the SQRTPS instruction:
var vector3 = VectorMath.SquareRoot(vector1);
Abs: It uses several SIMD instructions, including MOVSS, SHUFPS, MOVAPS, and ANDPS, to calculate the absolute value of all the elements of the vector. Sample code that generates these instructions:
var vector3 = VectorMath.Abs(vector1);
DotProduct: It uses several SIMD instructions, including MULPS, MOVAPS, and ADDPS, to calculate the dot product (also known as the scalar product) of two vectors. Sample code that generates these instructions:
var dotProduct = VectorMath.DotProduct(vector1, vector2);
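As a rough guide to what the VectorMath functions compute, the following scalar sketch mirrors them with plain float arrays. The VectorMathReference class is hypothetical and not part of Microsoft.Bcl.Simd. The Abs method clears the sign bit with a bitwise AND, which is an assumption about the role ANDPS plays in the generated code, based on the usual way of computing absolute values with SIMD. NaN handling, where MAXPS and MINPS differ from Math.Max and Math.Min, is ignored here.

```csharp
using System;

// Hypothetical scalar reference for the VectorMath functions discussed above.
// Inputs to the two-argument methods are assumed to have the same length.
public static class VectorMathReference
{
    public static float[] Max(float[] a, float[] b)   // MAXPS
    {
        var r = new float[a.Length];
        for (int i = 0; i < a.Length; i++) r[i] = Math.Max(a[i], b[i]);
        return r;
    }

    public static float[] Min(float[] a, float[] b)   // MINPS
    {
        var r = new float[a.Length];
        for (int i = 0; i < a.Length; i++) r[i] = Math.Min(a[i], b[i]);
        return r;
    }

    public static float[] SquareRoot(float[] a)       // SQRTPS
    {
        var r = new float[a.Length];
        for (int i = 0; i < a.Length; i++) r[i] = (float)Math.Sqrt(a[i]);
        return r;
    }

    // Assumption: absolute value obtained by masking out the sign bit,
    // the scalar counterpart of an ANDPS with 0x7FFFFFFF in every element.
    public static float[] Abs(float[] a)
    {
        var r = new float[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            int bits = BitConverter.ToInt32(BitConverter.GetBytes(a[i]), 0);
            r[i] = BitConverter.ToSingle(BitConverter.GetBytes(bits & 0x7FFFFFFF), 0);
        }
        return r;
    }

    // Element-wise multiply (MULPS) followed by a sum of the products (ADDPS).
    public static float DotProduct(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    public static void Main()
    {
        // Same values as vector1 and vector2 in the article's samples.
        var vector1 = new[] { 5f, 15f, 25f };
        var vector2 = new[] { 3f, 5f, 8f };

        Console.WriteLine(string.Join(", ", Max(vector1, vector2)));
        Console.WriteLine(string.Join(", ", Min(vector1, vector2)));
        Console.WriteLine(string.Join(", ", SquareRoot(vector1)));
        Console.WriteLine(string.Join(", ", Abs(new[] { -5f, 15f, -25f })));
        Console.WriteLine(DotProduct(vector1, vector2));
    }
}
```

For the sample vectors, the dot product is 5 × 3 + 15 × 5 + 25 × 8 = 290, Max returns the elements of vector1, and Min returns the elements of vector2.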