 Dr. Dobb's is part of the Informa Tech Division of Informa PLC

# SIMD-Enabled Vector Types with C#

In the first article in this two-part series on working with SIMD-enabled vector types with C#, I explained how to install and configure the necessary components to make Microsoft's new JIT, RyuJIT, generate SIMD instructions from your C# code with fixed vector types. Here, I explain the different operations that map to SIMD instructions for fixed vector types. I also provide examples of more-advanced scenarios in which you can use hardware-dependent vector types that adjust their number of elements (based on the capabilities of the underlying hardware) and allow you to work with data types other than `float`.

### Methods and Operations that Generate SIMD Instructions

The three fixed-size vector types (`Vector2f`, `Vector3f`, and `Vector4f`), which differ in their number of single-precision floating-point elements, define operators and methods that generate SIMD instructions optimized for operations on packed floating-point values. If you've worked with SIMD intrinsics in C or C++, you will be able to take advantage of your existing knowledge in C#. But instead of coding with SIMD intrinsics, you can use the operators and methods provided by the fixed-size vectors and have RyuJIT generate optimized SIMD instructions.

The documentation for the fixed vector types included in `Microsoft.Bcl.Simd` is very sparse. So here I provide a summary of the operators and methods of these vector types. I include sample C# code and the main SIMD instructions that each operator or method generates with RyuJIT. I also include the equivalent SIMD intrinsics in case you have experience with their use in the Visual C++ compiler or Intel C/C++ Compiler. This way, you will know all the optimized operations you can use with the vectors and can write your algorithms using them. Don't forget that hardware-dependent vector types will allow you to work with a higher number of elements per SIMD instruction on capable hardware. In addition, the examples will be useful when you work with vectors that pack types other than single-precision floating point.

For each code sample, consider that the following lines define two `Vector3f` instances:

```
var vector1 = new Vector3f(x: 5f, y: 15f, z: 25f);
var vector2 = new Vector3f(x: 3f, y: 5f, z: 8f);
```

The following operations take advantage of SIMD instructions:

• `-` operator or `Subtract` methods: They use the `SUBPS` instruction (Subtract Packed Single-Precision Floating-Point Values), equivalent to the `_mm_sub_ps` intrinsic. Sample lines that generate the `SUBPS` instruction:
```
var vector3 = vector2 - vector1;
var vector4 = Vector3f.Subtract(vector2, vector1);
```
• `*` operator or `Multiply` methods: They use the `MULPS` instruction (Multiply Packed Single-Precision Floating-Point Values), equivalent to the `_mm_mul_ps` intrinsic. Sample lines that generate the `MULPS` instruction:
```
var vector3 = vector1 * vector2;
var vector4 = Vector3f.Multiply(vector1, vector2);
```
• `/` operator or `Divide` methods: They use the `DIVPS` instruction (Divide Packed Single-Precision Floating-Point Values), equivalent to the `_mm_div_ps` intrinsic. Sample code that generates the `DIVPS` instruction:
```
var vector3 = vector1 / vector2;
var vector4 = Vector3f.Divide(vector1, vector2);
```
• `+` operator or `Add` methods: They use the `ADDPS` instruction (Add Packed Single-Precision Floating-Point Values), equivalent to the `_mm_add_ps` intrinsic. Sample code that generates the `ADDPS` instruction:
```
var vector3 = vector1 + vector2;
var vector4 = Vector3f.Add(vector1, vector2);
```
• `==` operator or `Equals` methods: They use the `CMPEQPS` instruction (Compare Packed Single-Precision Floating-Point Values), equivalent to the `_mm_cmpeq_ps` intrinsic. Sample code that generates the `CMPEQPS` instruction:
`var areEqual = (vector1 == vector2);`
• `!=` operator: It also uses the `CMPEQPS` instruction explained for the `==` operator. Sample code that generates the `CMPEQPS` instruction:
`var areNotEqual = (vector1 != vector2);`
• `CopyTo` method: It uses both the `MOVAPS` (Move Aligned Packed Single-Precision Floating-Point Values) and `MOVUPS` (Move Unaligned Packed Single-Precision Floating-Point Values) instructions. These instructions are equivalent to the `_mm_load_ps` and `_mm_loadu_ps` intrinsics. In previous versions of RyuJIT and `Microsoft.Bcl.Simd`, the `CopyTo` method didn't take advantage of these SIMD instructions, which heavily distorted measurements of the performance gains in SIMD-enabled code. Starting with CTP4, `CopyTo` uses `MOVAPS` and `MOVUPS`. Sample code that generates the `MOVAPS` and `MOVUPS` instructions:
```
var array = new float[3];
vector1.CopyTo(array);
```
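
To see the element-wise results of these operators in one place, the samples above can be combined into a small console program. This is only a minimal sketch: it assumes that `Microsoft.Bcl.Simd` exposes the vector types in the `System.Numerics` namespace, and the `OperatorSketch` class and its `ToArray` helper are hypothetical names introduced for illustration. It uses only the operators and the `CopyTo` method listed above.

```csharp
using System;
using System.Numerics; // assumption: Microsoft.Bcl.Simd places Vector3f here

public static class OperatorSketch
{
    // Hypothetical helper: reads the three elements back through CopyTo.
    public static float[] ToArray(Vector3f vector)
    {
        var array = new float[3];
        vector.CopyTo(array);
        return array;
    }

    public static void Main()
    {
        var vector1 = new Vector3f(x: 5f, y: 15f, z: 25f);
        var vector2 = new Vector3f(x: 3f, y: 5f, z: 8f);

        var difference = vector2 - vector1; // elements: -2, -10, -17
        var product = vector1 * vector2;    // elements: 15, 75, 200
        var quotient = vector1 / vector2;   // elements: 5/3, 3, 3.125
        var sum = vector1 + vector2;        // elements: 8, 20, 33

        // The elements differ, so the vectors should not compare as equal.
        var areEqual = (vector1 == vector2);

        Console.WriteLine("difference: " + string.Join(", ", ToArray(difference)));
        Console.WriteLine("product:    " + string.Join(", ", ToArray(product)));
        Console.WriteLine("quotient:   " + string.Join(", ", ToArray(quotient)));
        Console.WriteLine("sum:        " + string.Join(", ", ToArray(sum)));
        Console.WriteLine("areEqual:   " + areEqual);
    }
}
```

Reading the elements back through `CopyTo` keeps the sketch limited to members already shown in this article, so it does not depend on any other part of the vector API.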

The `VectorMath` class provides math functions that operate on vectors and generate optimized SIMD instructions. The math functions are useful for vectors with both a fixed size and a hardware-dependent size. I include sample C# code and the main SIMD instructions that each `VectorMath` method generates with RyuJIT. However, take into account that, in some cases, RyuJIT doesn't choose the most specific SIMD instructions, which would reduce the number of instructions required to perform the math operation on the packed types. Newer versions might produce better optimizations and use more specific SIMD instructions.

• `Max`: It uses the `MAXPS` instruction (Return Maximum Packed Single Precision Floating Point Values), equivalent to the `_mm_max_ps` intrinsic. Sample code that generates the `MAXPS` instruction:
`var vector3 = VectorMath.Max(vector1, vector2);`
• `Min`: It uses the `MINPS` instruction (Return Minimum Packed Single Precision Floating Point Values), equivalent to the `_mm_min_ps` intrinsic. Sample code that generates the `MINPS` instruction:
`var vector3 = VectorMath.Min(vector1, vector2);`
• `SquareRoot`: It uses the `SQRTPS` instruction (Compute Square Roots of Packed Single Precision Floating Point Values), equivalent to the `_mm_sqrt_ps` intrinsic. Sample code that generates the `SQRTPS` instruction:
`var vector3 = VectorMath.SquareRoot(vector1);`
• `Abs`: It uses several SIMD instructions, including `MOVSS`, `SHUFPS`, `MOVAPS`, and `ANDPS`, to calculate the absolute value of every element of the vector. Sample code that generates these instructions:
`var vector3 = VectorMath.Abs(vector1);`
• `DotProduct`: It uses several SIMD instructions, including `MULPS`, `MOVAPS`, and `ADDPS`, to calculate the dot product, also known as the scalar product, of two vectors. Sample code that generates these instructions:
`var dotProduct = VectorMath.DotProduct(vector1, vector2);`
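
The `VectorMath` calls listed above can be exercised together in the same way. The following is a hedged sketch rather than reference code: it assumes the `Microsoft.Bcl.Simd` types live in the `System.Numerics` namespace, and `VectorMathSketch` and `ToArray` are hypothetical names used only for illustration. The element values in the comments follow from plain arithmetic on the two sample vectors.

```csharp
using System;
using System.Numerics; // assumption: Microsoft.Bcl.Simd places Vector3f and VectorMath here

public static class VectorMathSketch
{
    // Hypothetical helper: reads the three elements back through CopyTo.
    public static float[] ToArray(Vector3f vector)
    {
        var array = new float[3];
        vector.CopyTo(array);
        return array;
    }

    public static void Main()
    {
        var vector1 = new Vector3f(x: 5f, y: 15f, z: 25f);
        var vector2 = new Vector3f(x: 3f, y: 5f, z: 8f);

        var max = VectorMath.Max(vector1, vector2);   // elements: 5, 15, 25
        var min = VectorMath.Min(vector1, vector2);   // elements: 3, 5, 8
        var roots = VectorMath.SquareRoot(vector1);   // elements: sqrt(5), sqrt(15), 5

        // vector2 - vector1 has elements -2, -10, -17, so Abs yields 2, 10, 17.
        var magnitudes = VectorMath.Abs(vector2 - vector1);

        // 5 * 3 + 15 * 5 + 25 * 8 = 15 + 75 + 200 = 290
        var dotProduct = VectorMath.DotProduct(vector1, vector2);

        Console.WriteLine("max:        " + string.Join(", ", ToArray(max)));
        Console.WriteLine("min:        " + string.Join(", ", ToArray(min)));
        Console.WriteLine("roots:      " + string.Join(", ", ToArray(roots)));
        Console.WriteLine("magnitudes: " + string.Join(", ", ToArray(magnitudes)));
        Console.WriteLine("dotProduct: " + dotProduct);
    }
}
```

Applying `Abs` to a difference with negative elements makes its effect visible, since both sample vectors contain only positive values.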
