Running CUDA Code Natively on x86 Processors


How It Works

In CUDA-x86, thread blocks are mapped to x86 processor cores, and thread-level parallelism is mapped onto the SSE (Streaming SIMD Extensions) or AVX SIMD units, as described below. (AVX extends SSE to 256-bit operation.) PGI indicates that:

  • The size of a warp (the basic unit of threads scheduled together) differs from the typical 32 threads per warp on a GPU. For x86 computing, a warp might be the width of the SIMD units on the x86 core (either four or eight threads), or a single thread per warp when SIMD execution is not used.
  • In many cases, the PGI CUDA C compiler removes explicit synchronization of the thread processors when it can determine that splitting loops is safe.
  • CUDA treats the GPU as a separate device from the host processors. CUDA-x86 maintains this memory model, which means that data movement between the host and device memory spaces still consumes application runtime. As shown in the bandwidthTest SDK example below, a modern Xeon processor can transfer data to a CUDA-x86 "device" at about 4 GB/s. All CUDA-x86 pointers reside in the x86 memory space, however, so programmers can use conditional compilation to access memory directly, without requiring data transfers, when running on multicore processors (see the sketch after this list).
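To make the last point concrete, here is a minimal sketch of how such conditional compilation might look. The CUDA_X86 macro and the scale kernel are my own illustrative names (the flag would be supplied by hand, for example with -DCUDA_X86); it is not a macro that PGI necessarily predefines:

// Minimal sketch: guard host/device copies with a user-defined build flag.
// CUDA_X86 is a hypothetical flag passed on the command line (e.g., -DCUDA_X86).
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *h_data = (float *) malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) h_data[i] = 1.0f;

#ifdef CUDA_X86
    // On CUDA-x86 every pointer lives in x86 memory, so the kernel can
    // operate on the host buffer directly and no transfer time is spent.
    float *d_data = h_data;
#else
    // On a GPU the device owns a separate memory space: copy in and out.
    float *d_data;
    cudaMalloc((void **) &d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);
#endif

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaThreadSynchronize();

#ifndef CUDA_X86
    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
#endif

    printf("h_data[0] = %g (expected 2)\n", h_data[0]);
    free(h_data);
    return 0;
}

The same source still builds unchanged for a GPU; the flag simply lets the multicore build skip copies that would otherwise consume runtime.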

Trying Out the Compiler

The PGI installation process is fairly straightforward:

  1. Register with PGI and download the latest version.
  2. Extract the tarfile at the location of your choice and follow the instructions in INSTALL.txt.
    • Under Linux, this basically requires running ./install as superuser and answering a few straightforward questions.
    • Note that you should answer "yes" to the installation of CUDA even if a GPU version of CUDA is already installed on your system; the PGI x86 version will not conflict with the GPU version. Otherwise, the PGI compiler will not recognize files with the .cu extension.
  3. Create the license.dat file.

At this point, you have a 15-day license for the PGI compilers.

Set up the environment to build with the PGI tools as discussed in the installation guide. The following are the commands for bash under Linux:

PGI=/opt/pgi; export PGI 
MANPATH=$MANPATH:$PGI/linux86-64/11.5/man; export MANPATH 
LM_LICENSE_FILE=$PGI/license.dat; export LM_LICENSE_FILE 
PATH=$PGI/linux86-64/11.5/bin:$PATH; export PATH

Copy the PGI NVIDIA SDK samples to a convenient location and build them:

cp -r /opt/pgi/linux86-64/2011/cuda/cudaX86SDK .
cd cudaX86SDK
make

This is the output of deviceQuery on an Intel Xeon E5560 processor:

CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "DEVICE EMULATION MODE"
  CUDA Driver Version:                           99.99
  CUDA Runtime Version:                          99.99
  CUDA Capability Major revision number:         9998
  CUDA Capability Minor revision number:         9998
  Total amount of global memory:                 128000000 bytes
  Number of multiprocessors:                     1
  Number of cores:                               0
  Total amount of constant memory:               1021585952 bytes
  Total amount of shared memory per block:       1021586048 bytes
  Total number of registers available per block: 1021585904
  Warp size:                                     1
  Maximum number of threads per block:           1021585920
  Maximum sizes of each dimension of a block:    32767 x 2 x 0
  Maximum sizes of each dimension of a grid:     1021586032 x 32767 x 1021586048
  Maximum memory pitch:                          4206313 bytes
  Texture alignment:                             1021585952 bytes
  Clock rate:                                    0.00 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Unknown
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 99.99, CUDA Runtime Version = 99.99, NumDevs = 1, Device = DEVICE EMULATION MODE


PASSED

Press <Enter> to Quit...
-----------------------------------------------------------

The output of bandwidthTest shows that device transfers work as expected:

 Running on...

 Device 0: DEVICE EMULATION MODE
 Quick Mode

 Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			4152.5

 Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			4257.0

 Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			8459.2


[bandwidthTest] - Test results:
PASSED


Press <Enter> to Quit...
-----------------------------------------------------------

As with NVIDIA's nvcc compiler, it is easy to use the PGI pgCC compiler to build an executable from a CUDA source file. As an example, copy the arrayReversal_multiblock_fast.cu code from Part 3 of this series. To compile and run it under Linux, type:


pgCC arrayReversal_multiblock_fast.cu
./a.out
Correct!
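
If you do not have the Part 3 source handy, the following minimal stand-in (a sketch of my own, not the original arrayReversal_multiblock_fast.cu) exercises the same pattern: a multi-block array reversal that prints "Correct!" when the result checks out. It compiles the same way with pgCC.

// reversal_check.cu: a hypothetical stand-in for the Part 3 example.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread writes its input element to the mirrored position of the output.
__global__ void reverseArrayBlock(int *d_out, const int *d_in, int n)
{
    int in = blockIdx.x * blockDim.x + threadIdx.x;
    if (in < n) d_out[n - 1 - in] = d_in[in];
}

int main()
{
    const int n = 256 * 1024;
    const size_t bytes = n * sizeof(int);

    int *h_in  = (int *) malloc(bytes);
    int *h_out = (int *) malloc(bytes);
    for (int i = 0; i < n; i++) h_in[i] = i;

    int *d_in, *d_out;
    cudaMalloc((void **) &d_in,  bytes);
    cudaMalloc((void **) &d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // n is a multiple of 256, so n/256 blocks of 256 threads cover the array exactly.
    reverseArrayBlock<<<n / 256, 256>>>(d_out, d_in, n);

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    bool ok = true;
    for (int i = 0; i < n; i++)
        if (h_out[i] != n - 1 - i) { ok = false; break; }
    printf("%s\n", ok ? "Correct!" : "Wrong!");

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
    return 0;
}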
