Debugging GPU Code in Microsoft C++ AMP


Visual Studio 2013 added many important new features for running general-purpose code on the GPU. The most important of these are support for shared CPU/GPU memory and improved debugging. In this article, the first in a two-part series, I provide a sample application that runs a simple algorithm on the GPU using Microsoft C++ AMP parallel technology, and I demonstrate how to debug the code running on the GPU from inside Visual Studio.

The New C++ AMP Debugging Experience

When Microsoft introduced C++ AMP in Visual Studio 2012, support for GPU debugging was limited to the Windows 8 and Windows Server 2012 platforms. If you have worked with previous Visual Studio versions, you already know that Microsoft reserves certain features for the most recent Windows version. The lack of GPU debugging support on earlier Windows versions was a big problem for C++ AMP and its popularity. Fortunately, Visual Studio 2013 added support for GPU debugging on both Windows 7 Service Pack 1 and Windows Server 2008 R2 Service Pack 1.

That's great news, because one of the big problems when moving to a newer Windows version is that graphics hardware drivers may not be available for it, which keeps you from taking full advantage of your existing GPUs. The reduced debugging experience on Windows 7 and Windows Server 2008 R2 therefore prevented a huge number of developers running older versions of Windows from fully debugging C++ AMP code.

There is still one feature that is available only on Windows 8.1: side-by-side CPU/GPU debugging. Although the WARP (short for Windows Advanced Rasterization Platform) accelerator supports this mixed-mode debugging only on Windows 8.1, you can still debug either CPU or GPU code per debugging instance on the other supported platforms. WARP is a high-speed, fully conformant software rasterizer, introduced with the Direct3D 11 runtime.

When you need to debug a C++ AMP application, you can launch three different kinds of debuggers in Visual Studio 2013, depending on your platform:

  • Debug just the code that runs on the CPU.
  • Debug just the code that runs on the GPU.
  • Use the WARP accelerator to enable mixed-mode debugging and debug code that runs on the CPU and code that runs on the GPU in the same session.

Debugging CPU Code

The following lines show a simple C++ AMP application that computes the element-wise sum of three one-dimensional arrays of int.

#include "stdafx.h"
#include <amp.h>
#include <iostream>
using namespace concurrency;

void amp_sum(int *array_a, int *array_b, int *array_c, int *array_sum, int size)
{
	// Create the C++ AMP objects that will make the necessary transfers from CPU to GPU
	array_view<const int, 1> a(size, array_a);
	array_view<const int, 1> b(size, array_b);
	array_view<const int, 1> c(size, array_c);
	array_view<int, 1> sum(size, array_sum);
	// Use the discard_data optimization hint to tell the runtime to avoid copying the current contents
	// of the view to a target accelerator_view because the existing content is not needed
	sum.discard_data();

	parallel_for_each(
		// Define the compute domain, i.e., the set of threads that are created
		sum.extent,
		// The following line will run on each thread on the accelerator
		[=](index<1> idx) restrict(amp)
		{
			sum[idx] = a[idx] + b[idx] + c[idx];
		}
	);
}

int _tmain(int argc, _TCHAR* argv[])
{
	const int size = 10;
	int valuesA[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
	int valuesB[] = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 };
	int valuesC[] = { 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 };
	int valuesAPlusBPlusC[size];

	amp_sum(valuesA, valuesB, valuesC, valuesAPlusBPlusC, size);

	// Print the results
	for (int i = 0; i < size; i++) {
		std::cout << valuesAPlusBPlusC[i] << "\n";
	}

	std::cin.get();

	return 0;
}

If you execute the application, you will see the following output in the console:

111
222
333
444
555
666
777
888
999
1110

The _tmain function defines three one-dimensional arrays of int with 10 elements each: valuesA, valuesB, and valuesC. In addition, the code defines another one-dimensional array of 10 integers, valuesAPlusBPlusC, that will hold the results of summing the elements of the three arrays. The algorithm is very simple: valuesAPlusBPlusC[0] will hold the result of valuesA[0] + valuesB[0] + valuesC[0], valuesAPlusBPlusC[1] will hold the result of valuesA[1] + valuesB[1] + valuesC[1], and so on. The amp_sum function receives the three input arrays, the output array, and the number of elements (size). After it returns, the code prints the results saved in the valuesAPlusBPlusC array to the console.
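When debugging, it can help to have a CPU-only version of the same computation to compare against the values produced on the accelerator. The following is a minimal sketch in standard C++ and is not part of the original sample; cpu_sum is a hypothetical helper name, and it uses std::vector instead of raw arrays purely for convenience.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-only reference for amp_sum: computes the same
// element-wise sum with a plain loop, so its result can be compared
// against the array filled in by the GPU version.
std::vector<int> cpu_sum(const std::vector<int>& a,
                         const std::vector<int>& b,
                         const std::vector<int>& c)
{
    // All three inputs must have the same number of elements.
    assert(a.size() == b.size() && b.size() == c.size());

    std::vector<int> sum(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Same expression as the body of the parallel_for_each lambda.
        sum[i] = a[i] + b[i] + c[i];
    }
    return sum;
}
```

Calling cpu_sum with the contents of valuesA, valuesB, and valuesC and comparing each element with valuesAPlusBPlusC gives a quick sanity check before you start stepping through the GPU code.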

The amp_sum function has some code that will run on the CPU and some code that will be executed on the GPU. First, the function creates the C++ AMP objects that will make the necessary transfers from CPU to GPU. The code defines one Concurrency::array_view<const int, 1> for each input array received as a parameter. The GPU won't change the data in the input arrays, and therefore, the code specifies const int instead of int for the array_view type. C++ AMP will copy the current contents of each array to a target accelerator_view because these arrays will be necessary as the input data.

array_view<const int, 1> a(size, array_a);
array_view<const int, 1> b(size, array_b);
array_view<const int, 1> c(size, array_c);

Each array_view is an N-dimensional view over data held in another container. In these three cases, the containers are one-dimensional arrays, so the _Rank parameter that specifies the number of dimensions of each array_view is set to 1. Each array_view exposes an indexing interface congruent to that of the array.
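To make the idea of a view concrete, here is a minimal sketch in standard C++ of a non-owning, one-dimensional view over memory held elsewhere. It is only an illustration: toy_view is a hypothetical name, it is not the C++ AMP array_view type, and it performs no transfers between the CPU and an accelerator. The constructor takes (size, pointer) merely to mirror the way the sample constructs its array_view objects.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for the idea behind array_view:
// a non-owning, one-dimensional view over memory that belongs to
// another container. It is NOT the C++ AMP type and does not copy
// anything to or from an accelerator.
template <typename T>
class toy_view {
public:
    toy_view(std::size_t size, T* data) : size_(size), data_(data) {}

    // Indexing mirrors the indexing of the underlying array.
    T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    T* data_;  // points at the caller's storage; nothing is copied
};
```

With T set to const int, operator[] returns a const int&, so an attempt to write through the view is rejected at compile time. That is the same reason the sample declares its input views with const int as the element type.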

The situation is a bit different with the output array. This array will be necessary for the C++ AMP code, but it is needed only for holding the results of each sum operation, and therefore, I don't want to waste time copying the initial contents of the array to the GPU. Thus, the code defines the array_view<int, 1> and then calls the discard_data optimization hint to tell the C++ AMP runtime to avoid copying the current contents of the view to a target accelerator_view because the existing content isn't needed.

array_view<int, 1> sum(size, array_sum);
sum.discard_data();

The amp_sum function receives array_a, array_b, array_c, array_sum, and size. With these arguments, the code creates the following C++ AMP objects:

  • a: Allows the C++ AMP code to access the read-only (const) contents of array_a copied to the target accelerator_view.
  • b: Allows the C++ AMP code to access the read-only (const) contents of array_b copied to the target accelerator_view.
  • c: Allows the C++ AMP code to access the read-only (const) contents of array_c copied to the target accelerator_view.
  • sum: Allows the C++ AMP code to write to the target accelerator_view. Once the C++ AMP code finishes its execution, the runtime will transfer the contents from the accelerator_view to the mapped array (array_sum).

The C++ AMP runtime in Visual Studio 2013 includes many performance improvements. In this case, for example, the code benefits from one of them: the runtime is optimized for copying small amounts of data between the CPU and the accelerator.

