64-Bit SIMD Code from C#

If everything is set up correctly, right-click on the project name in Solution Explorer and select Properties | Build. Select x64 in the Platform target dropdown list to compile for x64. You can then start adding code that uses the System.Numerics types, operations, and classes to the WPF application, and when you execute it, you will know that RyuJIT is generating SIMD instructions for that code.
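As a quick sanity check before looking at any disassembly, the runtime can report whether vector operations are accelerated. The following is a minimal sketch that assumes a runtime where System.Numerics ships as part of the framework and exposes Vector.IsHardwareAccelerated; the prerelease Microsoft.Bcl.Simd package discussed in this article may expose the same check under a different name. The SimdCheck class and IsSimdEnabled method are hypothetical names used only for this example.

```csharp
using System;
using System.Numerics;

public static class SimdCheck
{
    // Reports whether the JIT maps System.Numerics vector operations
    // to hardware SIMD instructions on this machine and runtime.
    // Assumption: the released System.Numerics API, not the prerelease package.
    public static bool IsSimdEnabled()
    {
        return Vector.IsHardwareAccelerated;
    }

    public static void Main()
    {
        Console.WriteLine("SIMD hardware acceleration: " + IsSimdEnabled());
    }
}
```

If the check reports False, the vector code still runs, but the operations are not mapped to SIMD instructions.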

Working with Fixed Vector Types

The Microsoft.Bcl.Simd NuGet package provides three fixed-sized vectors within the System.Numerics namespace. Each one encapsulates a different number of single-precision floating-point values. All the operations for these fixed-sized vectors are mapped to SIMD intrinsics:

  • Vector2f: Two single-precision floating-point values (x and y components).
  • Vector3f: Three single-precision floating-point values (x, y, and z components).
  • Vector4f: Four single-precision floating-point values (x, y, z, and w components).

The three fixed-sized vectors are very common when you work with graphics. If your existing code uses its own types (structures or classes) to represent vectors of two, three, or four single-precision floating-point values, you can easily replace them with the fixed-sized vectors included in System.Numerics. Math operations on these vectors then require fewer instructions, and your existing code can start taking advantage of SIMD intrinsics.
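As a rough illustration of the kind of type that can be swapped out, here is a minimal sketch of a hand-written three-component vector. The names Vec3 and Vec3Demo are hypothetical and not part of any library. The point is only that its + operator performs three separate scalar additions, which is the work a Vector3f addition can hand to SIMD instructions.

```csharp
using System;

// Hypothetical hand-written vector type of the kind existing code often defines.
// It is not part of System.Numerics; it only illustrates what would be replaced.
public struct Vec3
{
    public readonly float X, Y, Z;

    public Vec3(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Three independent scalar additions, one per component.
    public static Vec3 operator +(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }
}

public static class Vec3Demo
{
    public static void Main()
    {
        var sum = new Vec3(5f, 5f, 5f) + new Vec3(1f, 1f, 1f);
        Console.WriteLine(sum.X); // prints 6
    }
}
```

Because Vector3f also exposes X, Y, and Z components and a + operator, call sites such as a + b would typically stay the same after the swap.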

In previous Microsoft.Bcl.Simd versions, the fixed-sized vectors were immutable. However, the latest version made them mutable. I think it was a good idea to have immutable vectors, but it seems that too many applications benefit from the compatibility offered by mutable vectors.

The following lines show the code for MainWindow.xaml.cs, including a button event handler (Button_Click). You just need to add a Button to MainWindow.xaml and define Button_Click as the event handler attached to Click. Forgive me for attaching code to the button event handler: I just want to keep this example simple and stay focused on RyuJIT and SIMD intrinsics, and a proper Model-View-ViewModel (MVVM) WPF sample would require a lot of extra code that wouldn't provide additional value.

using System;
using System.Windows;
using System.Windows.Controls;
// Added for SIMD
using System.Numerics;

namespace SIMDWpfApplication1
{
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
        }

        private void Button_Click(object sender, RoutedEventArgs e)
        {
            var vector1 = new Vector3f(x: 5f, y: 5f, z: 5f);
            var vector2 = new Vector3f(x: 1f, y: 1f, z: 1f);
            // A single vector addition that RyuJIT can map to a SIMD instruction
            var finalVector = vector1 + vector2;
            // Use the result so the optimizer doesn't remove the sum
            (sender as Button).Content = finalVector.X.ToString();
        }
    }
}

After you build the application with the Release configuration, when you click or tap the button, the code within the Button_Click event handler creates two Vector3f instances (vector1 and vector2). Then, the code generates a new Vector3f (finalVector) with the sum of vector1 and vector2. The code sets the button's content to the value of the X component of the resulting vector, just to prevent the optimizer from removing unused variables. Let's look at the assembly code generated by RyuJIT.

You can set a breakpoint within Button_Click and select Debug | Windows | Disassembly to see the assembly code generated by RyuJIT and check the use of SIMD instructions. The following lines show the assembly code generated by RyuJIT for the line that sums the two vectors, vector1 and vector2 (see Figure 1). The addps instruction performs a SIMD add of the four packed single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand. In this case, each vector has three 32-bit components. However, Vector3f stores an additional value that isn't exposed as one of its components, but that makes it possible to work with 128-bit registers. This way, a single instruction can sum all the components of a vector that has three elements visible to the C# code. Don't worry, you don't need to learn assembly code to understand the benefits of using SIMD instructions for the sum operation.

var finalVector = vector1 + vector2;
00007FFAD3AD7A19  addps       xmm0,xmm1  
00007FFAD3AD7A1C  movaps      xmmword ptr [rsp+30h],xmm0  

Figure 1: Visual Studio 2013 displaying the assembly code with SIMD intrinsics generated for the line of code that sums two vectors.
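The lane-by-lane behavior of addps described above can be modeled in plain C#. This is only a sketch for illustration: the AddpsModel class and AddPacked method are hypothetical names, the loop is not what RyuJIT emits, and the zero used for the fourth lane is an assumed padding value.

```csharp
using System;

public static class AddpsModel
{
    // Models addps: adds four packed single-precision lanes element by element.
    // Plain C# for illustration only; the hardware does this in one instruction.
    public static float[] AddPacked(float[] dest, float[] src)
    {
        if (dest.Length != 4 || src.Length != 4)
            throw new ArgumentException("addps operates on four 32-bit lanes.");

        var result = new float[4];
        for (int lane = 0; lane < 4; lane++)
            result[lane] = dest[lane] + src[lane];
        return result;
    }

    public static void Main()
    {
        // x, y, z plus one padding lane that the C# code never sees
        // (zero is an assumed padding value).
        float[] vector1 = { 5f, 5f, 5f, 0f };
        float[] vector2 = { 1f, 1f, 1f, 0f };
        float[] sum = AddPacked(vector1, vector2);
        Console.WriteLine(string.Join(", ", sum)); // prints 6, 6, 6, 0
    }
}
```

The three visible components and the hidden fourth lane are all summed in the same step, which is why the three-element vector costs no more than a four-element one.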

Now, if you replace the line that performs the sum and assigns the result to finalVector with the kind of C# code you would otherwise write to sum two vectors, you will notice a big change in the number of generated assembly instructions. Without the vector operator, it is necessary to perform an independent sum operation for each component (x, y, and z). If you have any struct or class that represents a three-element vector in C#, at some point you end up with code that sums each element, as in the next line.

var finalVector = new Vector3f(x: vector1.X + vector2.X, y: vector1.Y + vector2.Y, z: vector1.Z + vector2.Z);

You can set a breakpoint again and view the assembly code that RyuJIT now generates:

var finalVector = new Vector3f(x: vector1.X + vector2.X, y: vector1.Y + vector2.Y, z: vector1.Z + vector2.Z);
00007FFAD3AC7A12  movaps      xmm2,xmm0  
00007FFAD3AC7A15  psrldq      xmm2,8  
00007FFAD3AC7A1A  movaps      xmm3,xmm1  
00007FFAD3AC7A1D  psrldq      xmm3,8  
00007FFAD3AC7A22  addss       xmm2,xmm3  
00007FFAD3AC7A26  movaps      xmm3,xmm0  
00007FFAD3AC7A29  psrldq      xmm3,4  
00007FFAD3AC7A2E  movaps      xmm4,xmm1  
00007FFAD3AC7A31  psrldq      xmm4,4  
00007FFAD3AC7A36  addss       xmm3,xmm4  
00007FFAD3AC7A3A  addss       xmm0,xmm1  
00007FFAD3AC7A3E  xorps       xmm1,xmm1  
00007FFAD3AC7A41  movss       xmm1,xmm2  
00007FFAD3AC7A45  pslldq      xmm1,4  
00007FFAD3AC7A4A  movss       xmm1,xmm3  
00007FFAD3AC7A4E  pslldq      xmm1,4  
00007FFAD3AC7A53  movss       xmm1,xmm0  
00007FFAD3AC7A57  movaps      xmm6,xmm1  

As you can see, the code now generates 18 assembly instructions instead of the previous two. The instructions differ, and I don't want to compare them one by one, but you will obviously benefit from reducing the count from 18 instructions to 2 when summing two vectors with three elements. Imagine the speed-up in your existing algorithms that use vectors with different numbers of elements.

As you can see from this simple example, it makes sense to take the time and make the effort to play with RyuJIT in order to unleash its ability to generate SIMD intrinsics. Don't forget to disable RyuJIT after you finish working with it. In the next article, I'll explain the different operations that map to SIMD instructions for fixed vector types. In addition, I'll provide examples of a more advanced scenario where you can use hardware-dependent vector types that adjust their number of elements based on the capabilities provided by the underlying hardware and that allow you to work with other types apart from float.

Gastón Hillar is a senior contributing editor at Dr. Dobb's.

Related Article

SIMD-Enabled Vector Types with C#
