Optimizing Software for Multicore Processors

Source code accompanies this article.


Decompose Code

VolPack performs the same operation on each pixel. That uniformity, combined with the relatively simple data structure, makes for a straightforward parallelization strategy. We used POSIX threads to divide the New_Pixel_Loop into four subloops, each running on one of four cores.

In our implementation, Core 1 first executes serial code to load the image, which precedes the dataflow in Figure 2, while the other three cores are idle. Next, Core 1 uses POSIX threads to spawn three threads for cores 2-4. Our software experts took two days to inspect the code, parallelize the image-rendering code, and test the workload balance.

Create the Threads. Listing One creates the worker threads using pthread_create, one for each core beyond the one already running the main thread. Each thread executes the same function, vp_thread_task_loop. Listing One uses NUM_THREADS, which corresponds to the number of cores in the system. Because the number of cores is not hard coded in the loop, the code can easily be ported to systems with a different number of cores.

{
        /* Start the worker threads and record that they are running. */
        vp_begin_threads();
        vp_threads_begun = 1;
}

void vp_begin_threads()
{
        int i;
        int mask = 0xf;

        /* Allocate the data shared by all threads. */
        vp_pthreads_data = vp_create_thread_data();
        if (vp_pthreads_data == NULL)
        {
                printf("Unable to allocate memory for threads.\n");
                return;
        }

        /* The loop starts at 1: the calling thread is not created here. */
        for(i=1;i<NUM_THREADS;i++)
        {
                vp_pthreads_args[i].data = vp_pthreads_data;
                vp_pthreads_args[i].id = i;
                pthread_create(&(vp_threads[i]), NULL, vp_thread_task_loop,
                                &(vp_pthreads_args[i]));
        }
}
Listing One
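
Listing One does not check the value returned by pthread_create. The following is a minimal sketch of the same creation loop with that check added and a matching pthread_join, assuming a four-core system. The names worker_arg, worker_main, and spawn_and_join_workers are hypothetical stand-ins for illustration and are not part of VolPack; worker_main takes the place of vp_thread_task_loop and only records that it ran.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4   /* assumption: one thread per core on a four-core system */

/* Hypothetical per-thread argument; the article's version also carries shared data. */
typedef struct {
    int id;        /* logical thread index, 1..NUM_THREADS-1 */
    int visited;   /* set by the worker so the caller can confirm it ran */
} worker_arg;

/* Stand-in for vp_thread_task_loop: only records that the thread ran. */
static void *worker_main(void *p)
{
    worker_arg *arg = p;
    assert(arg != NULL);
    arg->visited = 1;
    return NULL;
}

/* Creates NUM_THREADS-1 workers (the caller plays the role of thread 0),
 * waits for them, and returns how many ran, or -1 if creation failed. */
static int spawn_and_join_workers(void)
{
    pthread_t threads[NUM_THREADS];
    worker_arg args[NUM_THREADS];
    int created = 0, ran = 0, i;

    for (i = 1; i < NUM_THREADS; i++) {
        int rc;
        args[i].id = i;
        args[i].visited = 0;
        rc = pthread_create(&threads[i], NULL, worker_main, &args[i]);
        if (rc != 0) {
            fprintf(stderr, "pthread_create failed for thread %d (error %d)\n", i, rc);
            break;
        }
        created++;
    }

    /* Threads 1..created exist; join each before reading its result. */
    for (i = 1; i <= created; i++) {
        pthread_join(threads[i], NULL);
        ran += args[i].visited;
    }
    return (created == NUM_THREADS - 1) ? ran : -1;
}
```

Calling spawn_and_join_workers() from the main thread returns NUM_THREADS - 1 when every worker was created and ran. Unlike this sketch, the article's workers stay alive between tasks, so a join there would belong in shutdown code.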

vp_thread_big_loop_args loop_args[NUM_THREADS];
int num = (kcount)>>THREAD_SHIFT;     /* base share of kcount per thread */
int extras = (kcount)&THREAD_MASK;    /* remainder, spread over the first threads */
int cur_num = kstart;

/* Fill in the arguments common to every thread. */
loop_args[0].vpc = vpc;
loop_args[0].kstart = kstart;
loop_args[0].kinc = kinc;
loop_args[0].icount = icount;
loop_args[0].jcount = jcount;
loop_args[0].kcount = kcount;
loop_args[0].istride = istride;
loop_args[0].jstride = jstride;
loop_args[0].kstride = kstride;
loop_args[0].composite_func = composite_func;

/* Copy the common arguments and give each thread its own index range. */
for(i=0;i<NUM_THREADS;i++)
{
        loop_args[i] = loop_args[0];
        loop_args[i].kmystart = cur_num;
        loop_args[i].id = i;
        cur_num += (num * kincr);
        cur_num += ((i < extras) * kincr);
        loop_args[i].kstop = cur_num;
}

/* Hand each worker thread its arguments and wake it up. */
vp_pthreads_data->completed_threads = 0;
for(i=1;i<NUM_THREADS;i++)
{
        vp_pthreads_data->inputs[i] = &(loop_args[i]);
        vp_pthreads_data->task_number = LOOP_TASK;
        pthread_cond_signal(vp_pthreads_data->task_cond[i]);
}
Listing Two
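
The index arithmetic in Listing Two can be tried in isolation. The sketch below assumes THREAD_SHIFT is the base-2 logarithm of the thread count and THREAD_MASK is the thread count minus one; the listing does not show either definition. The slice_range type and the split_slices function are hypothetical names introduced for illustration.

```c
#include <assert.h>

#define THREAD_SHIFT 2                     /* assumption: log2 of the thread count */
#define NUM_THREADS  (1 << THREAD_SHIFT)   /* 4 threads */
#define THREAD_MASK  (NUM_THREADS - 1)     /* low bits hold kcount % NUM_THREADS */

/* Hypothetical holder for one thread's half-open range of k indexes. */
typedef struct {
    int kmystart;   /* first index this thread processes */
    int kstop;      /* index at which this thread stops (not processed) */
} slice_range;

/* Splits kcount indexes, starting at kstart and stepping by kinc, into
 * NUM_THREADS contiguous ranges. The first (kcount % NUM_THREADS) threads
 * each take one extra index, so the ranges differ in size by at most one. */
static void split_slices(int kstart, int kcount, int kinc,
                         slice_range out[NUM_THREADS])
{
    int num = kcount >> THREAD_SHIFT;     /* base share per thread */
    int extras = kcount & THREAD_MASK;    /* leftover indexes */
    int cur = kstart;
    int i;

    assert(kcount >= 0);
    for (i = 0; i < NUM_THREADS; i++) {
        out[i].kmystart = cur;
        cur += num * kinc;
        cur += (i < extras) * kinc;       /* one extra for the first threads */
        out[i].kstop = cur;
    }
}
```

With kstart = 0, kcount = 10, and kinc = 1, the four ranges are [0, 3), [3, 6), [6, 8), and [8, 10): two threads take three indexes and two take two. A negative kinc walks the same ranges in the opposite direction.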

Start the Threads. After the threads have been created, the main thread splits the New_Pixel_Loop in Amide so that it runs on four cores. Listing Two illustrates how this is accomplished. Each core is given its starting index in the loop_args array with the statement:


loop_args[i].kmystart = cur_num;

Each core receives one quarter of the array indexes to process through its loop_args[i] entry. The worker threads are started by first pointing each thread's input slot at its arguments with:


vp_pthreads_data->inputs[i] = &(loop_args[i]);


Next, the task that each core executes is set to LOOP_TASK with:


vp_pthreads_data->task_number = LOOP_TASK;

Finally, each core is released to begin processing using:


pthread_cond_signal(vp_pthreads_data->task_cond[i]);
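
The listings do not show the worker side of this handoff. The following is a minimal sketch of one common way to pair pthread_cond_signal with a waiting worker, assuming a mutex-protected task flag per thread and a completion counter. The pool_state and worker_arg types, the worker_main and run_loop_task_once functions, and the NO_TASK value are hypothetical; this is not the article's vp_thread_task_loop.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 3   /* assumption: three workers plus the main thread */
#define NO_TASK     0   /* hypothetical "nothing to do yet" value */
#define LOOP_TASK   1

/* Hypothetical shared state, loosely modeled on vp_pthreads_data. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  task_cond[NUM_WORKERS];   /* each worker waits on its own */
    pthread_cond_t  done_cond;                /* main thread waits on this */
    int             task_number[NUM_WORKERS];
    int             completed_threads;
} pool_state;

typedef struct {
    pool_state *pool;
    int id;
} worker_arg;

static void *worker_main(void *p)
{
    worker_arg *arg = p;
    pool_state *pool;

    assert(arg != NULL);
    pool = arg->pool;

    /* Sleep until a task is posted. Re-checking the flag in a loop guards
     * against spurious wakeups and against a signal sent before the wait. */
    pthread_mutex_lock(&pool->lock);
    while (pool->task_number[arg->id] == NO_TASK)
        pthread_cond_wait(&pool->task_cond[arg->id], &pool->lock);
    pthread_mutex_unlock(&pool->lock);

    /* The thread's share of the loop would run here, outside the lock. */

    pthread_mutex_lock(&pool->lock);
    pool->completed_threads++;
    pthread_cond_signal(&pool->done_cond);
    pthread_mutex_unlock(&pool->lock);
    return NULL;
}

/* Posts LOOP_TASK to every worker once and returns how many completed,
 * or -1 if a worker could not be created. */
static int run_loop_task_once(void)
{
    pool_state pool;
    pthread_t threads[NUM_WORKERS];
    worker_arg args[NUM_WORKERS];
    int created = 0, completed, i;

    pthread_mutex_init(&pool.lock, NULL);
    pthread_cond_init(&pool.done_cond, NULL);
    pool.completed_threads = 0;
    for (i = 0; i < NUM_WORKERS; i++) {
        pthread_cond_init(&pool.task_cond[i], NULL);
        pool.task_number[i] = NO_TASK;
    }

    for (i = 0; i < NUM_WORKERS; i++) {
        args[i].pool = &pool;
        args[i].id = i;
        if (pthread_create(&threads[i], NULL, worker_main, &args[i]) != 0)
            break;
        created++;
    }

    /* Release each worker: set its task under the lock, then signal it. */
    for (i = 0; i < created; i++) {
        pthread_mutex_lock(&pool.lock);
        pool.task_number[i] = LOOP_TASK;
        pthread_cond_signal(&pool.task_cond[i]);
        pthread_mutex_unlock(&pool.lock);
    }

    /* Wait until every released worker has reported completion. */
    pthread_mutex_lock(&pool.lock);
    while (pool.completed_threads < created)
        pthread_cond_wait(&pool.done_cond, &pool.lock);
    completed = pool.completed_threads;
    pthread_mutex_unlock(&pool.lock);

    for (i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    for (i = 0; i < NUM_WORKERS; i++)
        pthread_cond_destroy(&pool.task_cond[i]);
    pthread_cond_destroy(&pool.done_cond);
    pthread_mutex_destroy(&pool.lock);

    return (created == NUM_WORKERS) ? completed : -1;
}
```

Setting the flag and signaling while holding the mutex means a worker that has not yet reached pthread_cond_wait still sees the task when it checks the flag, so the wakeup cannot be lost.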


