Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Channels ▼

Open Source

CUDA, Supercomputing for the Masses: Part 15

Creating an Entire Artificial Landscape

Now let's have some fun and exploit the flexibility of CUDA and the OpenGL framework described in this article!

In 1997, Ken Perlin received an Academy Award for developing the Perlin Noise generator. Perlin Noise has a multitude of uses ranging from the creation of natural textures (Texture demo) to artificial terrain and even worlds (Planet demo).

In this article, I implement a PBO-based CUDA version of his reference Java implementation of the Improved Perlin Noise generator to generate animated images of artificial terrain. The next article in this series will use the same code to create fully rendered 3D landscapes using this noise function.

Numerous websites on the Internet discuss Perlin Noise. It is a popular topic. Here are several of my favorites:

Following is the complete source of the file, perlinKernelPBO.cu. Please note that there is a bank conflict in this code that will be found in a later article on performance tools. Can you find it? Even so, the provided CUDA version of this code is extremely fast.

//perlinKernelPBO.cu (Rob Farber)
#include <cutil_math.h>
#include <cutil_inline.h>
#include <cutil_gl_inline.h>
#include <cuda_gl_interop.h>

float gain=0.75f;
float xStart=2.f;
float yStart=1.f;
float zOffset = 0.0f;
#define Z_PLANE 50.f

__constant__ unsigned char c_perm[256];
__shared__ unsigned char s_perm[256]; // shared memory copy of permuation array
unsigned char* d_perm=NULL; // global memory copy of permutation array
// host version of permutation array
const static unsigned char h_perm[] = {151,160,137,91,90,15,
   190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33,
   88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166,
   102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196,
   135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123,
   223,183,170,213,119,248,152,2,44,154,163, 70,221,153,101,155,167, 43,172,9,
   129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228,
   251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107,
   49,192,214, 31,181,199,106,157,184,84,204,176,115,121,50,45,127, 4,150,254,

__device__ inline int perm(int i) { return(s_perm[i&0xff]); }
__device__ inline float fade(float t) { return t * t * t * (t * (t * 6.f - 15.f) + 10.f); }
__device__ inline float lerpP(float t, float a, float b) { return a + t * (b - a); }
__device__ inline float grad(int hash, float x, float y, float z) {
  int h = hash & 15;                      // CONVERT LO 4 BITS OF HASH CODE
  float u = h<8 ? x : y,                 // INTO 12 GRADIENT DIRECTIONS.
    v = h<4 ? y : h==12||h==14 ? x : z;
  return ((h&1) == 0 ? u : -u) + ((h&2) == 0 ? v : -v);

__device__ float inoise(float x, float y, float z) {
  int X = ((int)floorf(x)) & 255, // FIND UNIT CUBE THAT
    Y = ((int)floorf(y)) & 255,   // CONTAINS POINT.
    Z = ((int)floorf(z)) & 255;
  x -= floorf(x);               // FIND RELATIVE X,Y,Z
  y -= floorf(y);               // OF POINT IN CUBE.
  z -= floorf(z);
  float u = fade(x),            // COMPUTE FADE CURVES
    v = fade(y),                // FOR EACH OF X,Y,Z.
    w = fade(z);
  int A = perm(X)+Y, AA = perm(A)+Z, AB = perm(A+1)+Z, // HASH COORDINATES OF
    B = perm(X+1)+Y, BA = perm(B)+Z, BB = perm(B+1)+Z; // THE 8 CUBE CORNERS,
  return lerpP(w, lerpP(v, lerpP(u, grad(perm(AA), x  , y  , z   ), // AND ADD
				 grad(perm(BA), x-1.f, y  , z   )),   // BLENDED
			lerpP(u, grad(perm(AB), x  , y-1.f, z   ),    // RESULTS
			      grad(perm(BB), x-1.f, y-1.f, z   ))),     // FROM  8
	       lerpP(v, lerpP(u, grad(perm(AA+1), x  , y  , z-1.f ),  // CORNERS
			      grad(perm(BA+1), x-1.f, y  , z-1.f )),    // OF CUBE
		     lerpP(u, grad(perm(AB+1), x  , y-1.f, z-1.f ),
			   grad(perm(BB+1), x-1.f, y-1.f, z-1.f ))));
#ifdef ORIG

__device__ inline float height2d(float x, float y, int octaves,
		     float lacunarity = 2.0f, float gain = 0.5f)
  float freq = 1.0f, amp = 0.5f;
  float sum = 0.f;  
  for(int i=0; i<octaves; i++) {
    sum += inoise(x*freq,y*freq, Z_PLANE)*amp;
    freq *= lacunarity;
    amp *= gain;
  return sum;

__device__ inline uchar4 colorElevation(float texHeight)
  uchar4 pos;

  // color textel (r,g,b,a)
       if (texHeight < -1.000f) pos = make_uchar4(000, 000, 128, 255); //deeps
  else if (texHeight < -.2500f) pos = make_uchar4(000, 000, 255, 255); //shallow
  else if (texHeight < 0.0000f) pos = make_uchar4(000, 128, 255, 255); //shore
  else if (texHeight < 0.0625f) pos = make_uchar4(240, 240, 064, 255); //sand
  else if (texHeight < 0.1250f) pos = make_uchar4(032, 160, 000, 255); //grass
  else if (texHeight < 0.3750f) pos = make_uchar4(224, 224, 000, 255); //dirt
  else if (texHeight < 0.7500f) pos = make_uchar4(128, 128, 128, 255); //rock
  else                          pos = make_uchar4(255, 255, 255, 255); //snow


void checkCUDAError(const char *msg) {
  cudaError_t err = cudaGetLastError();
  if( cudaSuccess != err) {
    fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString( err) ); 

//Simple kernel fills an array with perlin noise
__global__ void k_perlin(uchar4* pos, unsigned int width, unsigned int height, 
			 float2 start, float2 delta, float gain, float zOffset,
			 unsigned char* d_perm)
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  float xCur = start.x + ((float) (idx%width)) * delta.x;
  float yCur = start.y + ((float) (idx/width)) * delta.y;

  if(threadIdx.x < 256)
    // Optimization: this causes bank conflicts
    s_perm[threadIdx.x] = d_perm[threadIdx.x];
  // this synchronization can be important if there are more that 256 threads

  // Each thread creates one pixel location in the texture (textel)
  if(idx < width*height) {
    float h = height2d(xCur, yCur, 4, 2.f, 0.75f) + zOffset;
    pos[idx] = colorElevation(h);

// Wrapper for the __global__ call that sets up the kernel call
extern "C" void launch_kernel(uchar4* pos, unsigned int image_width, 
			      unsigned int image_height, float time)
  int nThreads=256; // must be equal or larger than 256! (see s_perm)
  int totalThreads = image_height * image_width;
  int nBlocks = totalThreads/nThreads; 
  nBlocks += ((totalThreads%nThreads)>0)?1:0;
  float xExtent = 10.f;
  float yExtent = 10.f;
  float xDelta = xExtent/(float)image_width;
  float yDelta = yExtent/(float)image_height;
  if(!d_perm) { // for convenience allocate and copy d_perm here
    cudaMalloc((void**) &d_perm,sizeof(h_perm));
    checkCUDAError("d_perm malloc or copy failed!");

  k_perlin<<< nBlocks, nThreads>>>(pos, image_width, image_height,
				   make_float2(xStart, yStart),
				   make_float2(xDelta, yDelta),
				   gain, zOffset, d_perm);
  // make certain the kernel has completed 
  checkCUDAError("kernel failed!");

We also define several additional keystrokes in callbacksPerlin.cpp that allow interactively movement around the artificial landscape using "vi" keystrokes (e.g., h, j, k, and l). The + and - keys are also redefined from to vary the sea level of the artificial landscape:

  • <ESC> or q terminates the application.
  • A or a toggles the animation. Stopping the animation will freeze the picture and stop the frames per second calculation.
  • The + key raises the terrain (effectively lowering the sea level).
  • The - key lowers the terrain (effectively increasing the sea level).
  • h moves left.
  • l moves right.
  • k moves up.
  • j moves down.

The complete source for callbacksPerlin.cpp follows. (Note: If no additional keyboard commands had been defined, the original callbacksPBO.cpp can be used unchanged.)

//perlinCallbacksPBO.cpp (Rob Farber)
#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cutil_inline.h>
#include <cutil_gl_inline.h>
#include <cuda_gl_interop.h>
//#include <cutil_gl_error.h>
#include <rendercheck_gl.h>

// variables for keyboard control
int animFlag=1;
float animTime=0.0f;
float animInc=0.1f;

//external variables
extern GLuint pbo;
extern GLuint textureID;
extern unsigned int image_width;
extern unsigned int image_height;

// The user must create the following routines:
void runCuda();

void display()
  // run CUDA kernel
  glBindBuffer( GL_PIXEL_UNPACK_BUFFER, pbo);

  // load texture from PBO
  glBindTexture(GL_TEXTURE_2D, textureID);

  // Note: glTexSubImage2D will perform a format conversion if the
  // buffer is a different format from the texture. We created the
  // texture with format GL_RGBA8. In glTexSubImage2D we specified
  // GL_BGRA and GL_UNSIGNED_INT. This is a fast-path combination

  // Note: NULL indicates the data resides in device memory
  glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, image_width, image_height, 

  // Draw a single Quad with texture coordinates for each vertex.

  glTexCoord2f(0.0f,1.0f); glVertex3f(0.0f,0.0f,0.0f);
  glTexCoord2f(0.0f,0.0f); glVertex3f(0.0f,1.0f,0.0f);
  glTexCoord2f(1.0f,0.0f); glVertex3f(1.0f,1.0f,0.0f);
  glTexCoord2f(1.0f,1.0f); glVertex3f(1.0f,0.0f,0.0f);

  // Don't forget to swap the buffers!

  // if animFlag is true, then indicate the display needs to be redrawn
  if(animFlag) {
    animTime += animInc;

extern float xStart,yStart,zOffset;

//! Keyboard events handler for GLUT
void keyboard(unsigned char key, int x, int y)
  switch(key) {
  case('q') :
  case(27) :
  case 'a': // toggle animation
  case 'A':
    animFlag = (animFlag)?0:1;
  case '+': // lower the ocean level
    zOffset += 0.01;
  case '-': // raise the ocean level
    zOffset -= 0.01;
  case 'k': 
    yStart -= 0.1;
  case 'j': 
    yStart += 0.1;
  case 'l': 
    xStart += 0.1;
  case 'h': 
    xStart -= 0.1;

  // indicate the display must be redrawn

// No mouse event handlers defined
void mouse(int button, int state, int x, int y)

void motion(int x, int y)

The complete PBO-based Perlin Noise example utilizes the following files. (Note: that simpleGLmain.cpp and simplePBO.cpp are unchanged from the first example.)

  • simpleGLmain.cpp: Create an OpenGL window and performs basic OpenGl/GLUT setup.
  • simplePBO.cpp: Perform CUDA-centric setup – in this case for a Pixel Buffer Object (PBO).
  • perlinCallbacksPBO.cpp: Define keyboard, mouse and other callbacks for the Perlin Noise demonstration using PBOs.
  • perlinKernelPBO.cu: The CUDA kernel that calculates the Perlin Noise landscape to be displayed.

Use the following nvcc command to compile the testperlin executable. (Please substitute the path to the CUDA SDK libraries for CUDA_LIBRARIES. Similarly, substitute the path to the CUDA SDK include files for CUDA_INCLUDES.)

nvcc –O3 -L CUDA_LIBRARIES -I CUDA_INCLUDES simpleGLmain.cpp simplePBO.cpp callbacksPerlin.cpp kernelPerlin.cu -lglut -lGLEW –lcutil –o testperlin

Running the executable will display an animated artificial landscape can be interactively moved through the use of the h, j, k, and l keys. Figure 6 is one example.

Figure 6

Note that the frame rate reported in the window title reflects the GPU performance as the application recalculates each and every pixel in the landscape for every frame. This was done on purpose to illustrate the excellent performance possible in the worst case when the CUDA kernel must modify every pixel in every frame of an application. For example, an NVIDIA GeForce GTX 285 can deliver many hundreds of frames per second for these examples. Hopefully, these demonstrations can help to illustrate the remarkable headroom available in the current generation of GPUs. Play around with the source code and see if they can deliver a high frame rate per second on your problems for smooth hesitation-free visualization even when the GPU is required to perform extensive physics and other real-time calculations.

For More Information

Rob Farber is a senior scientist at Pacific Northwest National Laboratory. He has worked in massively parallel computing at several national laboratories and as co-founder of several startups. He can be reached at [email protected]

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.