Easy OpenCL with Python


PyOpenCL is an open-source package (MIT license) that gives developers access to the OpenCL API from Python. The latest stable version of PyOpenCL provides features that make it one of the handiest OpenCL wrappers for Python, because you can start working with OpenCL kernels without leaving your favorite Python environment. In this first article of a two-part series on PyOpenCL, I explain how you can build and deploy an OpenCL kernel, that is, a function that executes on an OpenCL device such as a GPU.

More Than an OpenCL Wrapper

PyOpenCL enables you to access the entire OpenCL 1.2 API from Python. Thus, you are able to retrieve all the information that the OpenCL API offers for platforms, contexts, devices, programs, kernels, and command queues. One of the main goals of PyOpenCL is to make sure that you can access in Python the same features that a C++ OpenCL host program can. However, PyOpenCL is not just an OpenCL wrapper for Python: It also provides enhancements and shortcuts for the most common tasks. With PyOpenCL, you usually need only a few lines of Python code to perform tasks that require dozens of C++ lines.

PyOpenCL reduces the number of OpenCL calls needed to retrieve all the information usually required to build and deploy a kernel for OpenCL execution on the GPU device. It provides automatic object cleanup tied to the lifetime of objects, so you don't need to worry about writing cleanup code. And PyOpenCL automatically translates all OpenCL errors to Python exceptions.

You can use PyOpenCL to create programs and build kernels as you would with a C++ OpenCL host program, or you can take advantage of the many OpenCL kernel builders that simplify the creation of kernels that need to perform common parallel algorithms. PyOpenCL provides kernel builders for the following parallel algorithms:

  • Element-wise expression evaluation builder (map)
  • Sum and counts builder (reduce)
  • Prefix sums builder (scan)
  • Custom scan kernel builder
  • Radix sort

Gathering Information About Platforms, Contexts, and Devices

You probably have some basic knowledge of how OpenCL works and the way it organizes the underlying drivers and hardware into platforms, contexts, and devices. If not, I suggest Matthew Scarpino's A Gentle Introduction to OpenCL, which is a good introductory tutorial; you will then be able to understand the examples I provide in this series. Scarpino's analogy between OpenCL processing and a game of cards makes it easy to understand the way OpenCL works.

One problem I found in PyOpenCL's otherwise good documentation is that it assumes you have just one OpenCL platform available on your development workstation. Sometimes, you have more than one. For example, my laptop has two OpenCL platforms:

  • AMD Accelerated Parallel Processing: The drivers for my ATI GPU, which include support for OpenCL.
  • Intel OpenCL: The Intel runtime, which provides CPU-only OpenCL support for my Intel Core i7 CPU.

Thus, it is good practice to prepare your code to run on computers that might have more than one OpenCL platform. If you want to check the different OpenCL platforms available on any computer, you can use a simple utility, GPU Caps Viewer, to list them and check their features. Download the latest version of GPU Caps Viewer and then read about its features in GPU Caps Viewer v1.8.6 Dives Deep on OpenCL Support. It is wise to learn about the OpenCL features that are available on your development workstation before you start diving deeper into PyOpenCL.

All the code snippets in this article assume the statement import pyopencl as cl. If you want to retrieve all the OpenCL platforms, you can use platforms = cl.get_platforms().

The get_platforms() function returns a list of pyopencl.Platform instances that include all the information you need about each platform. In my case, get_platforms() returns a list with two instances, and the pyopencl.Platform instances have the following values for their name property: 'AMD Accelerated Parallel Processing' and 'Intel(R) OpenCL'.
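As a minimal sketch of the call above, the following snippet lists the name of every platform instead of assuming there is only one. The helper name list_platform_names is hypothetical, and the guarded import and the cl.Error handler are my own additions, so that the snippet falls back to an empty list on a machine without PyOpenCL or without an OpenCL driver:

```python
try:
    import pyopencl as cl
except ImportError:  # assumption: treat a missing PyOpenCL as "no platforms"
    cl = None


def list_platform_names():
    """Return the name of every OpenCL platform that PyOpenCL can see."""
    if cl is None:
        return []
    try:
        return [platform.name for platform in cl.get_platforms()]
    except cl.Error as exc:
        # PyOpenCL reports OpenCL failures as Python exceptions.
        print("OpenCL error:", exc)
        return []


for index, name in enumerate(list_platform_names()):
    print(index, name)
```

On a machine like the laptop described above, the loop would be expected to print one line per platform, each with its index in the list.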

A common requirement for OpenCL host programming is to obtain different platform information parameters, such as the list of extensions supported by the platform. In C++, retrieving this information requires many lines of code. PyOpenCL makes it easier because each pyopencl.Platform instance includes all the properties you might need to check. Table 1 shows the pyopencl.Platform property names that provide the equivalent information to an OpenCL platform parameter name. As you can see, you just need to remove the CL_PLATFORM_ prefix and use lowercase letters to generate the equivalent property name.

| OpenCL Platform Parameter Name | pyopencl.Platform Property Name | Description |
|---|---|---|
| CL_PLATFORM_EXTENSIONS | extensions | A string with the list of extensions supported by the platform. |
| CL_PLATFORM_NAME | name | A string with the platform's name. |
| CL_PLATFORM_PROFILE | profile | A string with two possible values: 'FULL_PROFILE' when the platform supports the full OpenCL standard or 'EMBEDDED_PROFILE' when the platform just supports the embedded OpenCL standard. |
| CL_PLATFORM_VENDOR | vendor | A string with the platform's vendor name. |
| CL_PLATFORM_VERSION | version | A string with the maximum version of the OpenCL API supported by the platform. |

Table 1. PyOpenCL property names
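The naming rule described above (drop the prefix, then lowercase) can be sketched in a few lines. The function property_name and the stand-in object are hypothetical and exist only to illustrate the rule; a real pyopencl.Platform instance would be read with getattr in the same way:

```python
from types import SimpleNamespace


def property_name(parameter_name, prefix="CL_PLATFORM_"):
    """Map an OpenCL parameter name to the matching PyOpenCL property name."""
    if not parameter_name.startswith(prefix):
        raise ValueError(f"{parameter_name!r} does not start with {prefix!r}")
    return parameter_name[len(prefix):].lower()


# Stand-in for a pyopencl.Platform instance, so the sketch runs anywhere.
fake_platform = SimpleNamespace(name="Intel(R) OpenCL", profile="FULL_PROFILE")

for parameter in ("CL_PLATFORM_NAME", "CL_PLATFORM_PROFILE"):
    print(parameter, "->", getattr(fake_platform, property_name(parameter)))
```

Passing prefix="CL_DEVICE_" applies the same rule to device parameter names.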

For example, the following line retrieves a string with the list of extensions supported by the first OpenCL platform found. Because you need at least one OpenCL platform to be able to work with PyOpenCL and OpenCL, the line will work on any OpenCL development workstation:

platform_extensions = platforms[0].extensions

The following lines show examples of the string value of the extensions property for two different platforms:

'cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing'
'cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread cl_khr_gl_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing'

You can check whether the platform supports the cl_khr_icd extension with the following line:

supports_cl_khr_icd = 'cl_khr_icd' in platform_extensions.split()
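Because the extensions property is a single space-separated string, a plain substring test can report a match for a name that is only the beginning of a longer extension name. The following sketch, with a hypothetical helper called has_extension, compares whole tokens instead:

```python
def has_extension(extensions, name):
    """Return True if name is one of the space-separated extension tokens."""
    return name in extensions.split()


# Sample value taken from the first platform string shown above.
sample = ("cl_khr_icd cl_amd_event_callback "
          "cl_amd_offline_devices cl_khr_d3d10_sharing")

print(has_extension(sample, "cl_khr_icd"))    # True
print(has_extension(sample, "cl_khr_d3d10"))  # False: only part of a token
print("cl_khr_d3d10" in sample)               # True: substring match
```

The same helper works unchanged for the extensions property of a device.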

Now that you have a platform, you can access devices that can receive tasks and data from the host. If you want to retrieve all the OpenCL devices available for a specific platform, you can call the get_devices method for the pyopencl.Platform instance. For example, the following line retrieves all the devices for the first OpenCL platform found:

devices = platforms[0].get_devices()

The get_devices() method returns a list of pyopencl.Device instances that include all the information you need about each device. Calling get_devices() without an argument is equivalent to the following line, which retrieves devices without filtering by device type:

devices = platforms[0].get_devices(cl.device_type.ALL)

For example, in my case, when I don't specify the desired device type, get_devices() returns a list with two instances of the pyopencl.Device class — one for the GPU, and the other for the CPU. If you only want to retrieve the available GPU devices, you can specify the desired filter:

gpu_devices = platforms[0].get_devices(cl.device_type.GPU)
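The line above looks only at the first platform. On a workstation with several platforms, you may prefer to search all of them. The following sketch does so with a hypothetical helper, devices_of_type. The stand-in platforms and the "GPU"/"CPU" labels are made up so that the example runs without OpenCL hardware:

```python
from types import SimpleNamespace


def devices_of_type(platforms, device_type):
    """Collect (platform name, device) pairs of one type across all platforms."""
    found = []
    for platform in platforms:
        for device in platform.get_devices(device_type):
            found.append((platform.name, device))
    return found


def _fake_platform(name, devices):
    """Stand-in for pyopencl.Platform; devices maps a type label to a list."""
    return SimpleNamespace(
        name=name,
        get_devices=lambda kind: devices.get(kind, []),
    )


platforms = [
    _fake_platform("AMD Accelerated Parallel Processing",
                   {"GPU": ["gpu0"], "CPU": ["cpu0"]}),
    _fake_platform("Intel(R) OpenCL", {"CPU": ["cpu0"]}),
]

gpu_devices = devices_of_type(platforms, "GPU")
print(gpu_devices)
```

With PyOpenCL installed, the equivalent call would be devices_of_type(cl.get_platforms(), cl.device_type.GPU).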

As with the platforms, PyOpenCL makes it easy to obtain different device information parameters, such as the device's global memory size. Each pyopencl.Device instance includes all the properties you might need to check. Table 2 shows some of the pyopencl.Device property names that provide the equivalent information to an OpenCL device parameter name. As you can see, you just need to remove the CL_DEVICE_ prefix and use lowercase letters to generate the equivalent property name.

| OpenCL Device Parameter Name | pyopencl.Device Property Name | Description |
|---|---|---|
| CL_DEVICE_ADDRESS_BITS | address_bits | An unsigned integer with the size of the device's address space, in bits. |
| CL_DEVICE_EXTENSIONS | extensions | A string with the list of extensions supported by the device. |
| CL_DEVICE_GLOBAL_MEM_SIZE | global_mem_size | An unsigned long with the size of the device's global memory, in bytes. |
| CL_DEVICE_MAX_WORK_GROUP_SIZE | max_work_group_size | An unsigned integer with the maximum size of a workgroup for the device. |
| CL_DEVICE_NAME | name | A string with the device's name. |
| CL_DEVICE_VENDOR | vendor | A string with the device's vendor name. |

Table 2. Typical device properties returned in PyOpenCL
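To show how these properties might be read together, here is a small sketch that gathers several of them into a dictionary. The helper describe_device is hypothetical, and the stand-in device carries made-up values so that the example runs without OpenCL hardware:

```python
from types import SimpleNamespace

# Property names taken from Table 2.
DEVICE_PROPERTIES = (
    "name",
    "vendor",
    "address_bits",
    "global_mem_size",
    "max_work_group_size",
)


def describe_device(device):
    """Return a dict with the Table 2 properties of a device-like object."""
    return {prop: getattr(device, prop) for prop in DEVICE_PROPERTIES}


# Stand-in for a pyopencl.Device instance; every value here is invented.
fake_device = SimpleNamespace(
    name="Example GPU",
    vendor="Example Vendor",
    address_bits=32,
    global_mem_size=1024 ** 3,
    max_work_group_size=256,
)

info = describe_device(fake_device)
print(f"{info['name']}: {info['global_mem_size'] // 1024 ** 2} MiB global memory")
```

With PyOpenCL installed, you could pass any element of the list returned by get_devices() instead of the stand-in.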

The following line retrieves a string with the list of extensions supported by the first OpenCL GPU device in the selected platform. The line works only if the selected platform has at least one OpenCL GPU device:

gpu_device_extensions = gpu_devices[0].extensions

The following line shows an example of the string value of the extensions property for one device:

'cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing'

You can check whether the device supports the cl_khr_gl_sharing extension with the following line:

supports_cl_khr_gl_sharing = 'cl_khr_gl_sharing' in gpu_device_extensions.split()

It is very common to check some extensions related to graphics for a device, such as cl_khr_d3d10_sharing and cl_khr_gl_sharing. If you've ever written an OpenCL host application in C++, you will definitely notice how much simpler things are with PyOpenCL.

Building and Deploying a Kernel

To build and deploy a basic OpenCL kernel, you usually need to follow these steps in a typical OpenCL C++ host program:

  1. Obtain an OpenCL platform.
  2. Obtain a device ID for at least one device (accelerator).
  3. Create a context for the selected device or devices.
  4. Create the accelerator program from source code.
  5. Build the program.
  6. Create one or more kernels from the program functions.
  7. Create a command queue for the target device.
  8. Allocate device memory and move input data from the host to the device memory.
  9. Associate the kernel's arguments with the kernel object.
  10. Deploy the kernel for device execution.
  11. Move the kernel's output data to host memory.
  12. Release context, program, kernels and memory.

These steps represent a simplified version of the tasks that your host program must perform (each step is a bit more complex in real life). For example, the first step (obtain an OpenCL platform) usually requires checking the properties for the platforms, as I explained in the previous section. In addition, each step requires error checking. Because you can work with PyOpenCL from any Python console, you can execute the different steps with an interactive environment that makes it easy for you to learn both OpenCL and the way PyOpenCL exposes the features in the API.
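As a rough sketch of how the twelve steps might map onto PyOpenCL calls, the following example adds two vectors. The kernel, the function name vector_add, and the use of NumPy arrays as host buffers are my own assumptions for illustration. For brevity, the sketch picks the first platform and its first device, which real code should not do without checking their properties. The guarded import is likewise an addition so that the file loads where NumPy or PyOpenCL is missing:

```python
try:
    import numpy as np
    import pyopencl as cl
except ImportError:  # lets the file load where NumPy or PyOpenCL is missing
    np = cl = None

# OpenCL C source for a hypothetical kernel that adds two float vectors.
KERNEL_SOURCE = """
__kernel void vector_add(__global const float *a,
                         __global const float *b,
                         __global float *result)
{
    int gid = get_global_id(0);
    result[gid] = a[gid] + b[gid];
}
"""


def vector_add(a, b):
    """Add two float32 NumPy arrays on the first available OpenCL device."""
    platform = cl.get_platforms()[0]                      # step 1
    device = platform.get_devices()[0]                    # step 2
    context = cl.Context(devices=[device])                # step 3
    program = cl.Program(context, KERNEL_SOURCE).build()  # steps 4 and 5
    queue = cl.CommandQueue(context, device)              # step 7

    # Step 8: allocate device memory and copy the inputs to it.
    flags = cl.mem_flags
    a_buf = cl.Buffer(context, flags.READ_ONLY | flags.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(context, flags.READ_ONLY | flags.COPY_HOST_PTR, hostbuf=b)
    result = np.empty_like(a)
    result_buf = cl.Buffer(context, flags.WRITE_ONLY, result.nbytes)

    # Steps 6, 9, and 10: look up the kernel, pass its arguments, enqueue it.
    program.vector_add(queue, a.shape, None, a_buf, b_buf, result_buf)

    cl.enqueue_copy(queue, result, result_buf)            # step 11
    return result  # step 12 is left to PyOpenCL's automatic object cleanup


if cl is not None:
    try:
        data = np.arange(8, dtype=np.float32)
        print(vector_add(data, data))
    except (cl.Error, IndexError) as exc:
        print("No usable OpenCL device:", exc)
```

The function contains no explicit release calls and no per-call error checks, because PyOpenCL ties cleanup to object lifetime and raises OpenCL errors as Python exceptions.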

