

Introduction to OpenMP


OpenMP Directives

Fortran Directives Format

Format (case insensitive):

    sentinel directive-name [clause ...]

    sentinel: All Fortran OpenMP directives must begin with a sentinel. The accepted sentinels depend upon the type of Fortran source. Possible sentinels are:
        !$OMP
        C$OMP
        *$OMP
    directive-name: A valid OpenMP directive, appearing after the sentinel and before any clauses.
    [clause ...]: Optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.

Example:

    
    !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(BETA,PI)
    
    

Fixed Form Source:

  • !$OMP, C$OMP and *$OMP are accepted sentinels and must start in column 1

  • All Fortran fixed form rules for line length, white space, continuation and comment columns apply for the entire directive line

  • Initial directive lines must have a space/zero in column 6.

  • Continuation lines must have a non-space/zero in column 6.

Free Form Source:

  • !$OMP is the only accepted sentinel. Can appear in any column, but must be preceded by white space only.

  • All Fortran free form rules for line length, white space, continuation and comment columns apply for the entire directive line

  • Initial directive lines must have a space after the sentinel.

  • Continuation lines must have an ampersand as the last non-blank character in a line. The following line must begin with a sentinel and then the continuation directives.

General Rules:

  • Comments cannot appear on the same line as a directive

  • Only one directive-name may be specified per directive

  • Fortran compilers which are OpenMP enabled generally include a command line option which instructs the compiler to activate and interpret all OpenMP directives.

  • Several Fortran OpenMP directives come in pairs and have the form shown below. The "end" directive is optional but advised for readability.

    
    !$OMP  directive 
    
        [ structured block of code ]
    
    !$OMP end  directive
    
    

C / C++ Directives Format

Format:

    #pragma omp directive-name [clause, ...] newline

    #pragma omp: Required for all OpenMP C/C++ directives.
    directive-name: A valid OpenMP directive, appearing after the pragma and before any clauses.
    [clause, ...]: Optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.
    newline: Required. Precedes the structured block which is enclosed by this directive.

Example:

    #pragma omp parallel default(shared) private(beta,pi)
    </PRE>
    </TD></TR></TABLE>
    </UL>
    <P>
    <P>General Rules:
    <UL> 
    <LI>Case sensitive
    <P>
    <LI>Directives follow conventions of the C/C++ standards for compiler 
        directives 
    <P>
    <LI>Only one directive-name may be specified per directive 
    <P>
    <LI>Each directive applies to at most one succeeding statement, which must be
        a structured block.
    <P>
    <LI>Long directive lines can be "continued" on succeeding lines by escaping
        the newline character with a backslash ("\") at the end of a directive line.
    </UL>
    <P>
    
    <H4>Directive Scoping</H4>
    <P>
    <p>Static (Lexical) Extent:
    <UL>
    <LI>The code textually enclosed between the beginning and the
        end of a structured block following a directive.
    <P>
    <LI>The static extent of a directive does not span multiple routines 
        or code files
    </UL>
     
    <p>Orphaned Directive:
    <UL>
    <LI>An OpenMP directive that appears independently from another
        enclosing directive is said to be an orphaned directive.  
        It exists outside of another directive's static (lexical) extent.
    <P>
    <LI>An orphaned directive can span routines and possibly code files
    </UL>
    <P>Dynamic Extent:
    <UL>
    <LI>The dynamic extent of a directive includes both its static 
        (lexical) extent and the extents of its orphaned directives.
    </UL>
    <P>Example:
    <UL>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR VALIGN=top>
    <TD BGCOLOR=EEEEEE><PRE>
          PROGRAM TEST
          ...
    <FONT COLOR=red>!$OMP PARALLEL</FONT>
    
          ...
    <FONT COLOR=red>!$OMP DO</FONT>
          DO I=...
          ...
          CALL SUB1
          ...
          ENDDO
          ...
          CALL SUB2
          ...
    <FONT COLOR=red>!$OMP END PARALLEL</FONT>
    </PRE>
    </TD>
    
    <TD BGCOLOR=EEEEEE><PRE>
          SUBROUTINE SUB1
          ...
    <FONT COLOR=red>!$OMP CRITICAL</FONT>
          ...
    
    <FONT COLOR=red>!$OMP END CRITICAL</FONT>
          END
    
    
          SUBROUTINE SUB2
          ...
    <FONT COLOR=red>!$OMP SECTIONS</FONT>
          ...
    <FONT COLOR=red>!$OMP END SECTIONS</FONT>
          ...
          END
    </PRE>
    </TD>
    </TR>
    
    <TR VALIGN=top>
    <TD ALIGN=center BGCOLOR=DDDDDD WIDTH=50%>
    
    STATIC EXTENT
    <BR>The <TT>DO</TT> directive occurs within an
        enclosing parallel region
    </TD>
    <TD ALIGN=center BGCOLOR=DDDDDD WIDTH=50%>
    ORPHANED DIRECTIVES
    <BR>The <TT>CRITICAL</TT> and <TT>SECTIONS</TT>
        directives occur outside an enclosing parallel region
    
    </TD>
    </TR><TR VALIGN=top> 
    <TD COLSPAN=2 ALIGN=center BGCOLOR=DDDDDD>
    DYNAMIC EXTENT
    <BR>The CRITICAL and SECTIONS directives occur within the dynamic extent of the DO and PARALLEL directives.
    </TD></TR>
    </TABLE>
    </UL>
    
    <p>Why Is This Important?
    <UL>
    <LI>OpenMP specifies a number of scoping rules on how directives may 
        associate (bind) and nest within each other
    <LI>Illegal and/or incorrect programs may result if the OpenMP binding
        and nesting rules are ignored
    <LI>See <A HREF=#BindingNesting>
        Directive Binding and Nesting Rules</A> for specific details
    </UL>
    
    
    
    <H3>PARALLEL Region Construct</H3>
    Purpose:
    <UL>
    
    <LI>A parallel region is a block of code that will be executed by multiple
        threads.  This is the fundamental OpenMP parallel construct.
    </UL>
    Format:
    
    <UL>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR>
    <TH WIDTH=5%>Fortran</TH>
    <TD><PRE>
    !$OMP PARALLEL <I>[clause ...] </I>
                   IF <I>(scalar_logical_expression) </I>
                   PRIVATE <I>(list) </I>
    
                   SHARED <I>(list) </I>
                   DEFAULT (PRIVATE | FIRSTPRIVATE | SHARED | NONE) 
                   FIRSTPRIVATE <I>(list) </I>
                   REDUCTION <I>(operator: list) </I>
                   COPYIN <I>(list) </I>
                   NUM_THREADS <I>(scalar-integer-expression)</I>
    
       <I>block</I>
    
    !$OMP END PARALLEL
    
    </PRE>
    </TD>
    </TR><TR> 
    <TH>C/C++ </TH>
    <TD><PRE>
    #pragma omp parallel <I>[clause ...]  newline </I>
                         if <I>(scalar_expression) </I>
    
                         private <I>(list) </I>
                         shared <I>(list) </I>
                         default (shared | none) 
                         firstprivate <I>(list) </I>
                         reduction <I>(operator: list) </I>
                         copyin <I>(list) </I>
    
                         num_threads <I>(integer-expression)</I>
    
     
       <I>structured_block</I>
    
    </PRE>
    </TD></TR></TABLE>
    
    
    </UL>
    
     
    Notes:
    
    <UL>
    
    <LI>When a thread reaches a PARALLEL directive, it creates a team of 
        threads and becomes the master of the team. The master is a member of 
        that team and has thread number 0 within that team.
    
    <LI>Starting from the beginning of this parallel region, the code is
        duplicated and all threads will execute that code.
    
    <LI>There is an implied barrier at the end of a parallel section.  Only the
        master thread continues execution past this point.
    
    <LI>If any thread terminates within a parallel region, all threads in the team
        will terminate, and the work done up until that point is undefined.
    </UL>
    
    
     
    How Many Threads?
    <UL>
    <LI>The number of threads in a parallel region is determined by the following
        factors, in order of precedence:
        <OL>
    
        
        <LI>Evaluation of the <TT><B>IF</B></TT> clause
        
        <LI>Setting of the <TT><B>NUM_THREADS</B></TT> clause
        
        <LI>Use of the <TT><B>omp_set_num_threads()</B></TT> library function
        
    
        <LI>Setting of the <B>OMP_NUM_THREADS</B> environment variable
        
        <LI>Implementation default - usually the number of CPUs on a node, though
            it could be dynamic (see next bullet).
        </OL>
    
    <LI>Threads are numbered from 0 (master thread) to N-1
    </UL>
    
     
    Dynamic Threads:
    <UL>
    
    <LI>Use the <TT><B>omp_get_dynamic()</B></TT> library function to determine if
        dynamic threads are enabled.
    
    <LI>If supported, the two methods available for enabling dynamic threads are:
        <OL>
        
        <LI>The <TT><B>omp_set_dynamic()</B></TT> library routine
        
        <LI>Setting of the <B>OMP_DYNAMIC</B> environment variable to TRUE
        </OL>
    
    </UL>
    
    
     
    Nested Parallel Regions:
    <UL>
    
    <LI>Use the <TT><B>omp_get_nested()</B></TT> library function to determine if
        nested parallel regions are enabled.
    
    <LI>The two methods available for enabling nested parallel regions
        (if supported) are:
        <OL>
        
    
        <LI>The <TT><B>omp_set_nested()</B></TT> library routine
        
        <LI>Setting of the <B>OMP_NESTED</B> environment variable to TRUE
        </OL>
    
    <LI>If not supported, a parallel region nested within another parallel region 
        results in the creation of a new team, consisting of one thread, by 
        default.
    </UL>
    
    
     
    Clauses:
    <UL>
    
    <LI><B>IF</B> clause: If present, it must evaluate to .TRUE. (Fortran) or
         non-zero
         (C/C++) in order for a team of threads to be created.  Otherwise, the
         region is executed serially by the master thread.
    
    <LI>The remaining clauses are described in detail later, in 
        the <A HREF=#Clauses>Data Scope Attribute Clauses</A> section.
    </UL>
    
     
    Restrictions:
    
    <UL>
    
    <LI>A parallel region must be a structured block that does not span 
        multiple routines or code files
    
    <LI>It is illegal to branch into or out of a parallel region
    
    <LI>Only a single IF clause is permitted
    
    <LI>Only a single NUM_THREADS clause is permitted
    </UL>
    
    <HR>
    
    Example: Parallel Region
    
    <UL>
    
    <LI>Simple "Hello World" program
        <UL>
        <LI>Every thread executes all code enclosed in the parallel section
        <LI>OpenMP library routines are used to obtain thread identifiers and total 
            number of threads
        </UL>
    
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR><TD BGCOLOR=EEEEEE>
      
    
    Fortran - Parallel Region Example
    <HR>
    <PRE>
           PROGRAM HELLO
    
           INTEGER NTHREADS, TID, <FONT COLOR=red>OMP_GET_NUM_THREADS</FONT>,
         +   <FONT COLOR=red>OMP_GET_THREAD_NUM</FONT>
    
    C     Fork a team of threads with each thread having a private TID variable
    <FONT COLOR=red>!$OMP PARALLEL PRIVATE(TID)</FONT>
    
    C     Obtain and print thread id
          TID = <FONT COLOR=red>OMP_GET_THREAD_NUM()</FONT>
          PRINT *, 'Hello World from thread = ', TID
    
    C     Only master thread does this
          IF (TID .EQ. 0) THEN
            NTHREADS = <FONT COLOR=red>OMP_GET_NUM_THREADS()</FONT>
            PRINT *, 'Number of threads = ', NTHREADS
          END IF
    
    C     All threads join master thread and disband
    <FONT COLOR=red>!$OMP END PARALLEL</FONT>
    
           END
    </PRE>
    </TD></TR></TABLE>
    <P>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR><TD BGCOLOR=EEEEEE>
      
    <p>
    C / C++ - Parallel Region Example
    <HR>
    <PRE>
    #include <omp.h>
    
    main ()  {
    
    int nthreads, tid;
    
    /* Fork a team of threads with each thread having a private tid variable */
    <FONT COLOR=red>#pragma omp parallel private(tid)</FONT>
      {
    
      /* Obtain and print thread id */
      tid = <FONT COLOR=red>omp_get_thread_num()</FONT>;
      printf("Hello World from thread = %d\n", tid);
    
      /* Only master thread does this */
      if (tid == 0) 
        {
        nthreads = <FONT COLOR=red>omp_get_num_threads()</FONT>;
        printf("Number of threads = %d\n", nthreads);
        }
    
      }  /* All threads join master thread and terminate */
    
    }
    </PRE>
    </TD></TR></TABLE>
    </UL>
    <P>
    
    <P>
    
    <H3>Work-Sharing Constructs</H3>
    <p>
    <UL>
    <LI>A work-sharing construct divides the execution of the enclosed code
        region among the members of the team that encounter it. 
    <P>
    <LI>Work-sharing constructs do not launch new threads
    <P>
    <LI>There is no implied barrier upon entry to a work-sharing construct, 
        however there is an implied barrier at the end of a work sharing
        construct.
    </UL>
    <P>
     
    <P>
    <p>Types of Work-Sharing Constructs:
    <P>
    <p>NOTE: The Fortran <B><TT>workshare</TT></B> construct is not shown here, but
    is discussed later.
    <P>
    <P>
    <TABLE BORDER=0 CELLPADDING=5 CELLSPACING=0> 
    <TR VALIGN=top>
    <TD><B>DO / for</B> - shares iterations of a loop across the team.  Represents a type of "data parallelism".</TD>
    <TD><B>SECTIONS</B> - breaks work into separate, discrete sections.  Each section is executed by a thread.  Can be used to implement a type of  "functional parallelism."
    </TD>
    </TR></TABLE>
    <P>
    <img src="http://twimgs.com/ddj/images/article/2009/0902/090225gointelopenmp_f3.gif">
    <P>
     
    <p>Restrictions:
    <UL>
    <LI>A work-sharing construct must be enclosed dynamically within a parallel
        region in order for the directive to execute in parallel.
    <P>
    <LI>Work-sharing constructs must be encountered by all members of a team
        or none at all
    <P>
    <LI>Successive work-sharing constructs must be encountered in the same
        order by all members of a team
    </UL>
    <P>
    
    <H3>Work-Sharing Constructs <BR> DO / for Directive</H3>
    <P>
     
    <p>Purpose:
    <UL>
    <LI>The DO / for directive specifies that the iterations of the loop immediately
        following it must be executed in parallel by the team. This assumes a 
        parallel region has already been initiated, otherwise it executes in serial
        on a single processor.  
    </UL>
    <P>
     
    <p>Format:
    <UL>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR>
    <TH WIDTH=5%>Fortran</TH>
    <TD><PRE>
    !$OMP DO <I>[clause ...] </I>
             SCHEDULE <I>(type [,chunk]) </I>
             ORDERED 
             PRIVATE <I>(list) </I>
             FIRSTPRIVATE <I>(list) </I>
             LASTPRIVATE <I>(list) </I>
    
             SHARED <I>(list) </I>
             REDUCTION <I>(operator | intrinsic : list) </I>
             COLLAPSE <I>(n) </I>
    
       <I>do_loop</I>
    
    !$OMP END DO  [ NOWAIT ]
    
    </PRE>
    </TD>
    </TR><TR>
    <TH>C/C++</TH>
    <TD><PRE>
    #pragma omp for <I>[clause ...]  newline </I>
                    schedule <I>(type [,chunk]) </I>
                    ordered
                    private <I>(list) </I>
    
                    firstprivate <I>(list) </I>
                    lastprivate <I>(list) </I>
                    shared <I>(list) </I>
                    reduction <I>(operator: list) </I>
                    collapse <I>(n) </I>
    
                    nowait 
    
       <I>for_loop</I>
    
    </PRE>
    </TD></TR></TABLE>
    </UL>
    
     
    Clauses:
    <UL>
        
        <LI><B>SCHEDULE</B>: Describes how iterations of the loop are 
            divided among the threads in the team.  The default schedule is 
            implementation dependent.  
            <ol>
            <LI><B>STATIC</B>:
            Loop iterations are divided into pieces of size <I>chunk</I> 
            and then statically assigned to threads.  If chunk is not specified, 
            the iterations are evenly (if possible) divided contiguously among 
            the threads.
            
            <LI><B>DYNAMIC</B>:
            Loop iterations are divided into pieces of size <I>chunk</I>, 
            and dynamically scheduled among the threads; when a thread finishes one
            chunk, it is dynamically assigned another. The default chunk size is 1.
            
            <LI><B>GUIDED</B>:
            For a chunk size of 1, the size of each chunk is proportional to 
                the number of unassigned iterations divided by the number of 
                threads, decreasing to 1. For a chunk size with value k 
                (greater than 1), the size of each chunk is determined in the 
                same way with the restriction that the chunks do not contain 
                fewer than k iterations (except for the last chunk to be assigned, 
                which may have fewer than k iterations).  The default chunk 
                size is 1.
            
            <LI><B>RUNTIME</B>:
            The scheduling decision is deferred until runtime and is set by the
            environment variable OMP_SCHEDULE.  It is illegal to specify a chunk 
            size for this clause.
            
            <LI><B>AUTO</B>:
            The scheduling decision is delegated to the compiler and/or runtime
                system.
           </ol>
            
        <LI><B>NOWAIT / nowait</B>: If specified, then threads do not synchronize 
            at the end of the parallel loop.  
        
        <LI><B>ORDERED</B>: Specifies that the iterations of the loop must be 
            executed as they would be in a serial program.
        
    
        <LI><B>COLLAPSE</B>: Specifies how many loops in a nested loop should be
            collapsed into one large iteration space and divided according to the
            <TT>schedule</TT> clause. The sequential execution of the iterations in all
            associated loops determines the order of the iterations in the collapsed
            iteration space.
        
        <LI>Other clauses are described in detail later, in the 
            <A HREF=#Clauses>Data Scope Attribute Clauses</A> section.
        </UL>
    </UL>
    
     
    Restrictions:
    
    <UL>
    
    <LI>The DO loop cannot be a DO WHILE loop, or a loop without loop
        control. Also, the loop iteration variable must be an integer and
        the loop control parameters must be the same for all
        threads.
    
    <LI>Program correctness must not depend upon which thread executes a
        particular iteration.
    
    <LI>It is illegal to branch out of a loop associated with a DO/for                
    directive.
    
    <LI>The chunk size must be specified as a loop invariant integer
        expression, as there is no synchronization during its evaluation by
        different threads.
    
    <LI>ORDERED, COLLAPSE and SCHEDULE clauses may appear once each.
    
    <LI>See the OpenMP specification document for additional restrictions.
    </UL>
    
    <HR>
    <H3>Example: DO / for Directive</H3>
    
    <UL>
    <LI>Simple vector-add program
        <UL>
        <LI>Arrays A, B, C, and variable N will be shared by all threads.
        <LI>Variable I will be private to each thread; each thread will have its own
            unique copy.  
        <LI>The iterations of the loop will be distributed dynamically in
            CHUNK sized pieces.  
        <LI>Threads will not synchronize upon completing their individual pieces of 
            work (NOWAIT).
        </UL>
    
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR><TD BGCOLOR=EEEEEE>
      
    
    Fortran - DO Directive Example
    
    <HR>
    <PRE>
          PROGRAM VEC_ADD_DO
    
          INTEGER N, CHUNKSIZE, CHUNK, I
          PARAMETER (N=1000) 
          PARAMETER (CHUNKSIZE=100) 
          REAL A(N), B(N), C(N)
    
    !     Some initializations
          DO I = 1, N
            A(I) = I * 1.0
            B(I) = A(I)
          ENDDO
          CHUNK = CHUNKSIZE
            
    <FONT COLOR=red>!$OMP PARALLEL SHARED(A,B,C,CHUNK) PRIVATE(I)</FONT>
    
    <FONT COLOR=red>!$OMP DO SCHEDULE(DYNAMIC,CHUNK)</FONT>
          DO I = 1, N
             C(I) = A(I) + B(I)
          ENDDO
    <FONT COLOR=red>!$OMP END DO NOWAIT</FONT>
    
    <FONT COLOR=red>!$OMP END PARALLEL</FONT>
    
          END
    </PRE>
    </TD></TR></TABLE>
    <P>
    <P>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR><TD BGCOLOR=EEEEEE>
      
    <p>
    C / C++ - for Directive Example
    <HR>
    <PRE>
    #include <omp.h>
    
    #define CHUNKSIZE 100
    #define N     1000
    
    main ()  
    {
    
    int i, chunk;
    float a[N], b[N], c[N];
    
    /* Some initializations */
    for (i=0; i < N; i++)
      a[i] = b[i] = i * 1.0;
    chunk = CHUNKSIZE;
    
    <FONT COLOR=red>#pragma omp parallel shared(a,b,c,chunk) private(i)</FONT>
      {
    
      <FONT COLOR=red>#pragma omp for schedule(dynamic,chunk) nowait</FONT>
      for (i=0; i < N; i++)
        c[i] = a[i] + b[i];
    
      }  /* end of parallel section */
    
    }
    </PRE>
    </TD></TR></TABLE>
    </UL>
    <P>
    
    <P>
    <H3>Work-Sharing Constructs <BR> SECTIONS Directive</H3>
    <P>
     
    <p>Purpose:
    <UL>
    <P>
    <P>
    <LI>The SECTIONS directive is a non-iterative work-sharing construct.
        It specifies that the enclosed section(s) of code are to be divided among 
        the threads in the team.  
    <P>
    <LI>Independent SECTION directives are nested within a SECTIONS directive. 
        Each SECTION is executed once by a thread in the team.  Different sections 
        may be executed by different threads.  It is possible for a thread
        to execute more than one section if it is quick enough and the 
        implementation permits it.
    </UL>
    <P>
     
    <p>Format:
    <UL>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR>
    <TH WIDTH=5%>Fortran</TH>
    <TD><PRE>
    !$OMP SECTIONS <I>[clause ...] </I>
    
                   PRIVATE <I>(list) </I>
                   FIRSTPRIVATE <I>(list) </I>
                   LASTPRIVATE <I>(list) </I>
                   REDUCTION <I>(operator | intrinsic : list) </I>
    
    !$OMP  SECTION 
    
       <I>block</I>
    
    !$OMP  SECTION 
    
        <I>block</I> 
    
    !$OMP END SECTIONS  [ NOWAIT ]
    
    </PRE>
    </TD>
    </TR><TR>
    <TH>C/C++</TH>
    <TD><PRE>
    #pragma omp sections <I>[clause ...]  newline </I>
                         private <I>(list) </I>
    
                         firstprivate <I>(list) </I>
                         lastprivate <I>(list) </I>
                         reduction <I>(operator: list) </I>
                         nowait
      {
    
      #pragma omp section   <I>newline </I>
    
         <I>structured_block</I>
    
      #pragma omp section   <I>newline </I>
    
         <I>structured_block</I>
    
      }
    </PRE>
    </TD></TR></TABLE>
    
    </UL>
    
     
    Clauses:
    <UL>
    
    <LI>There is an implied barrier at the end of a SECTIONS directive, unless 
        the <TT>NOWAIT/nowait</TT> clause is used.
    
    <LI>Clauses are described in detail later, in the 
        <A HREF=#Clauses>Data Scope Attribute Clauses</A> section.
    </UL>
    
    
     
    Questions:
    <UL>
    
    <LI>What happens if the number of threads and the number of SECTIONs are
        different?  More threads than SECTIONs? Fewer threads than SECTIONs?
    <P>
    <LI>Which thread executes which SECTION?
    
    </UL>
    
     
    Restrictions:
    <UL>
    
    <LI>It is illegal to branch into or out of section blocks.
    
    <LI>SECTION directives must occur within the lexical extent of an
        enclosing SECTIONS directive
    </UL>
    
    
    
    
    <HR>
    <H3>Example: SECTIONS Directive</H3>
    
    <UL>
    
    <LI>Simple program demonstrating that different blocks of work will be 
        done by different threads.
    
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR><TD BGCOLOR=EEEEEE>
      
    
    Fortran - SECTIONS Directive Example
    <HR>
    <PRE>
          PROGRAM VEC_ADD_SECTIONS
    
          INTEGER N, I
          PARAMETER (N=1000)
          REAL A(N), B(N), C(N), D(N)
    
    !     Some initializations
          DO I = 1, N
            A(I) = I * 1.5
            B(I) = I + 22.35
          ENDDO
    
    <FONT COLOR=red>!$OMP PARALLEL SHARED(A,B,C,D), PRIVATE(I)</FONT>
    
    <FONT COLOR=red>!$OMP SECTIONS</FONT>
    
    <FONT COLOR=red>!$OMP SECTION</FONT>
          DO I = 1, N
             C(I) = A(I) + B(I)
          ENDDO
    
    <FONT COLOR=red>!$OMP SECTION</FONT>
          DO I = 1, N
             D(I) = A(I) * B(I)
          ENDDO
    
    <FONT COLOR=red>!$OMP END SECTIONS NOWAIT</FONT>
    
    <FONT COLOR=red>!$OMP END PARALLEL</FONT>
    
          END
    </PRE>
    </TD></TR></TABLE>
    <P>
    <P>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR><TD BGCOLOR=EEEEEE>
      
    <p>C / C++ - sections Directive Example
    <HR>
    <PRE>
    #include <omp.h>
    
    #define N     1000
    
    main ()
    {
    
    int i;
    float a[N], b[N], c[N], d[N];
    
    /* Some initializations */
    for (i=0; i < N; i++)
      {
      a[i] = i * 1.5;
      b[i] = i + 22.35;
      }
    
    #pragma omp parallel shared(a,b,c,d) private(i)
      {
    
       #pragma omp sections nowait
        {
    
        #pragma omp section
        for (i=0; i < N; i++)
          c[i] = a[i] + b[i];
    
        #pragma omp section
    
        for (i=0; i < N; i++)
          d[i] = a[i] * b[i];
    
        }  /* end of sections */
    
      }  /* end of parallel section */
    
    }
    </PRE>
    </TD></TR></TABLE>
    </UL>
    <P>
    
    <P>
    
    <H3>Work-Sharing Constructs <BR> WORKSHARE Directive</H3>
    <P>
     
    <p>Purpose:
    <UL>
    <P>
    <LI>Fortran only
    <P>
    <LI>The WORKSHARE directive divides the execution of the enclosed structured
        block into separate units of work, each of which is executed only once. 
    <P>
    <LI>The structured block must consist of only the following:
        <UL>
        <LI>array assignments
        <LI>scalar assignments
        <LI>FORALL statements
        <LI>FORALL constructs
        <LI>WHERE statements
        <LI>WHERE constructs
        <LI>atomic constructs
        <LI>critical constructs
        <LI>parallel constructs
        </UL>
    <P>
    <P>
    <LI>See the OpenMP API documentation for additional information.
    </UL>
    <P>
     
    <p>Format:
    <UL>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR>
    <TH WIDTH=5%>Fortran</TH>
    <TD><PRE>
    !$OMP WORKSHARE
    
       <I>structured block</I>
    
    !$OMP END WORKSHARE [ NOWAIT ]
    
    </PRE>
    </TD>
    </TR></TABLE>
    </UL>
    
    
     
    Restrictions:
    <UL>
    
    <LI>The construct must not contain any user defined function calls 
        unless the function is ELEMENTAL.
    </UL>
    
    <HR>
    <H3>Example: WORKSHARE Directive</H3>
    
     
    <UL>
    
    <LI>Simple array and scalar assignments shared by the team of threads. A
        unit of work would include:
        <UL>
        <LI>Any scalar assignment
        <LI>For array assignment statements, the assignment of each element is
            a unit of work
        </UL>
    
    
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR><TD BGCOLOR=EEEEEE>
      
    
    Fortran - WORKSHARE Directive Example
    
    <HR>
    <PRE>
          PROGRAM WORKSHARE
     
          INTEGER N, I, J
          PARAMETER (N=100)
          REAL AA(N,N), BB(N,N), CC(N,N), DD(N,N), FIRST, LAST
     
    !     Some initializations
          DO I = 1, N
            DO J = 1, N
              AA(J,I) = I * 1.0
              BB(J,I) = J + 1.0
            ENDDO
          ENDDO
     
    <FONT COLOR=red>!$OMP PARALLEL SHARED(AA,BB,CC,DD,FIRST,LAST)</FONT>
    
    <FONT COLOR=red>!$OMP WORKSHARE</FONT>
          CC = AA * BB
          DD = AA + BB
          FIRST = CC(1,1) + DD(1,1)
          LAST = CC(N,N) + DD(N,N)
    <FONT COLOR=red>!$OMP END WORKSHARE NOWAIT</FONT>
    
    <FONT COLOR=red>!$OMP END PARALLEL</FONT>
     
          END
    
    </PRE>
    </TD></TR></TABLE>
    </UL>
    <P>
    
    <H3>Work-Sharing Constructs <BR> SINGLE Directive</H3>
    <P>
     
    <p>Purpose:
    <UL>
    <P>
    <LI>The SINGLE directive specifies that the enclosed code is to be executed by
        only one thread in the team.
    <P>
    <LI>May be useful when dealing with sections of code that are not thread
        safe (such as I/O)
    </UL>
    <P>
     
    <p>Format:
    <UL>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR>
    <TH WIDTH=5%>Fortran</TH>
    <TD><PRE>
    !$OMP SINGLE <I>[clause ...] </I>
                 PRIVATE <I>(list) </I>
                 FIRSTPRIVATE <I>(list) </I>
    
       <I>block</I>
    
    !$OMP END SINGLE [ NOWAIT ]
    
    </PRE>
    </TD>
    </TR><TR>
    <TH>C/C++</TH>
    <TD><PRE>
    #pragma omp single <I>[clause ...]  newline </I>
                       private <I>(list) </I>
                       firstprivate <I>(list) </I>
    
                       nowait
    
         <I>structured_block</I>
    
    </PRE>
    </TD></TR></TABLE>
    
    </UL>
    
     
    Clauses:
    <UL>
    
    <LI>Threads in the team that do not execute the SINGLE directive, wait at
        the end of the enclosed code block, unless a <TT>NOWAIT/nowait</TT> 
        clause is specified.
    
    
    <LI>Clauses are described in detail later, in the 
        <A HREF=#Clauses>Data Scope Attribute Clauses</A> section.
    </UL>
    
     
    Restrictions:
    <UL>
    
    <LI>It is illegal to branch into or out of a SINGLE block.
    </UL>
    
    
    <H3>Combined Parallel Work-Sharing Constructs </H3>
    
    <UL>
    
    <LI>OpenMP provides three directives that are merely conveniences:
        <UL>
        <LI>PARALLEL DO  / parallel for
        <LI>PARALLEL SECTIONS
    <LI>PARALLEL WORKSHARE (Fortran only)
        </UL>
    
    
    <LI>For the most part, these directives behave identically to an
        individual PARALLEL directive being immediately followed 
        by a separate work-sharing directive.
    
    <LI>Most of the rules, clauses and restrictions that apply to both directives
        are in effect. See the OpenMP API for details.
    
    <LI>An example using the PARALLEL DO  / parallel for combined directive is
        shown below.
    
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR><TD BGCOLOR=EEEEEE>
      
    
    Fortran - PARALLEL DO Directive Example
    <HR>
    <PRE>
          PROGRAM VECTOR_ADD
    
          INTEGER N, I, CHUNKSIZE, CHUNK
          PARAMETER (N=1000) 
          PARAMETER (CHUNKSIZE=100) 
          REAL A(N), B(N), C(N)
    
    !     Some initializations
          DO I = 1, N
            A(I) = I * 1.0
            B(I) = A(I)
          ENDDO
          CHUNK = CHUNKSIZE
                 
    
    <FONT COLOR=red>!$OMP PARALLEL DO
    !$OMP& SHARED(A,B,C,CHUNK) PRIVATE(I) 
    !$OMP& SCHEDULE(STATIC,CHUNK)</FONT>
    
          DO I = 1, N
             C(I) = A(I) + B(I)
          ENDDO
    
    <FONT COLOR=red>!$OMP END PARALLEL DO</FONT>
    
          END
    </B></PRE>
    </TD></TR></TABLE>
    <P>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <P>
    <TR><TD BGCOLOR=EEEEEE>
      
    <p>C / C++ - parallel for Directive Example
    <HR>
    <PRE>
    #include <omp.h>
    #define N       1000
    #define CHUNKSIZE   100
    
    main ()  {
    
    int i, chunk;
    float a[N], b[N], c[N];
    
    /* Some initializations */
    for (i=0; i < N; i++)
      a[i] = b[i] = i * 1.0;
    chunk = CHUNKSIZE;
    
    #pragma omp parallel for \
       shared(a,b,c,chunk) private(i) \
       schedule(static,chunk)
      for (i=0; i < N; i++)
        c[i] = a[i] + b[i];
    }
    </B></PRE>
    </TD></TR></TABLE>
    </UL>
    <P>
    
    <H3>TASK Construct</H3>
    <P>
     
    <p>Purpose:
    <UL>
    <P>
    <LI>New construct with OpenMP 3.0
    <P>
    <LI>The TASK construct defines an explicit task, which may be executed by
        the encountering thread, or deferred for execution by any other thread
        in the team.
    <P>
    <LI>The data environment of the task is determined by the data sharing 
        attribute clauses.
    <P>
    <LI>Task execution is subject to task scheduling - see the OpenMP 
        3.0 specification document for details.
    </UL>
    <P>
     
    <p>Format:
    <UL>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR>
    <TH WIDTH=5%>Fortran</TH>
    <TD><PRE>
    !$OMP TASK <I>[clause ...] </I>
                 IF <I>(scalar expression) </I>
                 UNTIED
                 DEFAULT (PRIVATE | FIRSTPRIVATE | SHARED | NONE)
                 PRIVATE <I>(list) </I>
    
                 FIRSTPRIVATE <I>(list) </I>
                 SHARED <I>(list) </I>
    
       <I>block</I>
    
    !$OMP END TASK
    
    </TD>
    </TR><TR>
    <TH>C/C++</TH>
    
    <TD><PRE>
    #pragma omp task <I>[clause ...]  newline </I>
                       if <I>(scalar expression) </I>
                       untied
                       default (shared | none)
                       private <I>(list) </I>
                       firstprivate <I>(list) </I>
                       shared <I>(list) </I>
    
         <I>structured_block</I>
    
    </TD></TR></TABLE>
    
    </UL>
    
     
    Clauses and Restrictions:
    <UL>
    
    <LI>Please consult the OpenMP 3.0 specifications document for details.
    </UL>
    
    
    <H3>Synchronization Constructs</H3>
    <UL>
    <LI>Consider a simple example where two threads on two different processors are
        both trying to increment a variable x at the same time (assume x is
        initially 0):
    
    
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR VALIGN=top>
    <TD><B>THREAD 1:</B>
    <PRE>
    increment(x)
    {
        x = x + 1;
    }
    </PRE>
    <B>THREAD 1:</B>
    <PRE>
    10  LOAD A, (x address)
    20  ADD A, 1
    30  STORE A, (x address)
    </PRE>
    <P>
    <TD><B>THREAD 2:</B> 
    <PRE>
    increment(x)
    {
        x = x + 1;
    }
    
    </PRE>
    <B>THREAD 2:</B>
    <PRE>
    10  LOAD A, (x address)
    20  ADD A, 1
    30  STORE A, (x address)
    </PRE>
    </TD></TR></TABLE>
    <P>
    <P>
    <LI>One possible execution sequence:
    <OL>
      <LI>Thread 1 loads the value of x into register A.
      <LI>Thread 2 loads the value of x into register A.
      <LI>Thread 1 adds 1 to register A
      <LI>Thread 2 adds 1 to register A
      <LI>Thread 1 stores register A at location x
      <LI>Thread 2 stores register A at location x
    <P>
    </OL>
    <P>The resultant value of x will be 1, not 2 as it should be.
    <P>
    <LI>To avoid a situation like this, the incrementing of x must be
        synchronized between the two threads to ensure that the correct result
        is produced.
    <P>
    <LI>OpenMP provides a variety of Synchronization Constructs that control how 
        the execution of each thread proceeds relative to other team threads.
    </UL>
    <P>
    
    <P>
    <H3>Synchronization Constructs <BR>
    MASTER Directive</H3>
    <p>Purpose:
    <UL>
    <P>
    <LI>The MASTER directive specifies a region that is to be executed only by
        the master thread of the team.  All other threads in the team skip this 
        section of code.
    <P>
    <LI>There is no implied barrier associated with this directive.
    </UL>
    <P>
     
    <P>
    <p>Format:
    <UL>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR>
    <TH WIDTH=5%>Fortran</TH>
    <TD><PRE>
    !$OMP MASTER
    
       <I>block</I>
    
    !$OMP END MASTER
    
    </TD>
    </TR><TR>
    <TH>C/C++</TH>
    
    <TD><PRE>
    #pragma omp master  <I>newline</I>

       <I>structured_block</I>

    </PRE></TD></TR></TABLE>
    </UL>

Restrictions:

  • It is illegal to branch into or out of a MASTER block.

Synchronization Constructs
CRITICAL Directive

Purpose:

  • The CRITICAL directive specifies a region of code that must be executed by only one thread at a time.

Format:

    Fortran
    !$OMP CRITICAL [ name ]
    
       block
    
    !$OMP END CRITICAL
    
    
    C/C++
    #pragma omp critical [ name ]  newline
    
       structured_block
    
    

Notes:

  • If a thread is currently executing inside a CRITICAL region and another thread reaches that CRITICAL region and attempts to execute it, it will block until the first thread exits that CRITICAL region.

  • The optional name enables multiple different CRITICAL regions to exist:

    • Names act as global identifiers. Different CRITICAL regions with the same name are treated as the same region.
    • All unnamed CRITICAL sections are treated as the same section.

Restrictions:

  • It is illegal to branch into or out of a CRITICAL block.


Example: CRITICAL Construct

  • All threads in the team will attempt to execute in parallel; however, because of the CRITICAL construct surrounding the increment of x, only one thread will be able to read/increment/write x at any time.

    Fortran - CRITICAL Directive Example


          PROGRAM CRITICAL
    
          INTEGER X
          X = 0
    
    !$OMP PARALLEL SHARED(X) 
    
    !$OMP CRITICAL 
          X = X + 1
    !$OMP END CRITICAL 
    
    !$OMP END PARALLEL 
    
          END
    

    C / C++ - critical Directive Example


    #include <omp.h>
    
    main()
    {
    
    int x;
    x = 0;
    
    #pragma omp parallel shared(x) 
      {
    
    
    #pragma omp critical 
      x = x + 1;
    
      }  /* end of parallel section */
    
    }
    </B></PRE>
    </TD></TR></TABLE>
    </UL>
    <P>
    
    <P>
    <H3>Synchronization Constructs 
    <BR>BARRIER Directive</H3>
    <P>
     
    <p>Purpose:
    <UL>
    <P>
    <LI>The BARRIER directive synchronizes all threads in the team.
    <P>
    <LI>When a BARRIER directive is reached, a thread will wait at that point until
        all other threads have reached that barrier.  All threads then resume
        executing in parallel the code that follows the barrier.
    </UL>
    <P>
    <P>
     
    <P>
    <p>Format:
    <UL>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR>
    <TH WIDTH=5%>Fortran</TH>
    <TD><PRE>
    !$OMP BARRIER
    
    </B></PRE></TD>
    </TR><TR>
    <TH>C/C++</TH>
    <TD><PRE>
    #pragma omp barrier  <I>newline</I>
    
    </B></PRE></TD></TR></TABLE>
    </UL>
    <P>
    <P>
     
    <p>Restrictions:
    <UL>
    <P>
    <LI>All threads in a team (or none) must execute the BARRIER region.
    <P>
    <LI>The sequence of work-sharing regions and barrier regions encountered must 
        be the same for every thread in a team.
    </UL>
    <P>
    
    <H3>Synchronization Constructs 
    <BR>TASKWAIT Directive</H3>
    <P>
     
    <p>Purpose:
    <UL>
    <P>
    <P>
    <LI>The TASKWAIT construct specifies a wait on the completion of child tasks  
        generated since the beginning of the current task.
    </UL>
    <P>
    <P>
     
    <p>Format:
    <UL>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR>
    <TH WIDTH=5%>Fortran</TH>
    <TD><PRE>
    !$OMP TASKWAIT
    
    </B></PRE></TD>
    </TR><TR>
    <P>
    <TH>C/C++</TH>
    <TD><PRE>
    #pragma omp taskwait  <I>newline</I>
    
    </B></PRE></TD></TR></TABLE>
    </UL>
    <P>
    <P>
     
    <p>Restrictions:
    <UL>
    <P>
    <LI>Because the taskwait construct does not have a C language statement as part of its syntax, there are some restrictions on its placement within a program. The taskwait directive may be placed only at a point where a base language statement is allowed. The taskwait directive may not be used in place of the statement following an if, while, do, switch, or label. See the OpenMP 3.0 specifications document for details.
    <P>
    </UL>
    <P>
    <H3>Synchronization Constructs <BR> ATOMIC Directive</H3>
    <P>
    
    <p>Purpose:
    <UL>
    <P>
    <LI>The ATOMIC directive specifies that a specific memory location must be
        updated atomically, rather than letting multiple threads attempt to 
        write to it.  In essence, this directive provides a mini-CRITICAL section.
    </UL>
    <P>
     
    <p>Format:
    <UL>
    <P>
    <TABLE BORDER=1 CELLPADDING=5 CELLSPACING=0 WIDTH=90%>
    <TR>
    <TH WIDTH=5%>Fortran</TH>
    <P>
    <TD><PRE>
    !$OMP ATOMIC
    
       <I>statement_expression</I>
    
    </PRE></TD>
    </TR><TR>
    <TH>C/C++</TH>
    <TD><PRE>
    #pragma omp atomic  <I>newline</I>
    
       <I>statement_expression</I>
    
    </PRE></TD></TR></TABLE>
    </UL>

Restrictions:

  • The directive applies only to a single, immediately following statement

  • An atomic statement must follow a specific syntax. See the most recent OpenMP specs for this.



Synchronization Constructs
FLUSH Directive

Purpose:

  • The FLUSH directive identifies a synchronization point at which the implementation must provide a consistent view of memory. Thread-visible variables are written back to memory at this point.

  • There is a fair amount of discussion on this directive within OpenMP circles that you may wish to consult for more information.

  • To quote from the openmp.org FAQ:

    Q17: Is the !$omp flush directive necessary on a cache coherent system?

    A17: Yes, the flush directive is necessary. Look in the OpenMP specifications for examples of its uses. The directive is necessary to instruct the compiler that the variable must be written to/read from the memory system, i.e. that the variable cannot be kept in a local CPU register over the flush "statement" in your code.

    Cache coherency makes certain that if one CPU executes a read or write instruction from/to memory, then all other CPUs in the system will get the same value from that memory address when they access it. All caches will show a coherent value. However, in the OpenMP standard there must be a way to instruct the compiler to actually insert the read/write machine instruction and not postpone it. Keeping a variable in a register in a loop is very common when producing efficient machine language code for a loop.

  • Also see the most recent OpenMP specs for details.

Format:

    Fortran
    !$OMP FLUSH  (list)
    
    

    C/C++
    #pragma omp flush (list)  newline
    
    

Notes:

  • The optional list contains a list of named variables that will be flushed in order to avoid flushing all variables. For pointers in the list, note that the pointer itself is flushed, not the object it points to.

  • Implementations must ensure any prior modifications to thread-visible variables are visible to all threads after this point; i.e., compilers must restore values from registers to memory, hardware might need to flush write buffers, etc.

  • The FLUSH directive is implied for the directives shown in the table below. The directive is not implied if a NOWAIT clause is present.

    Fortran                        C / C++

      BARRIER                        barrier
      END PARALLEL                   parallel - upon entry and exit
      CRITICAL and END CRITICAL      critical - upon entry and exit
      END DO                         for - upon exit
      END SECTIONS                   sections - upon exit
      END SINGLE                     single - upon exit
      ORDERED and END ORDERED        ordered - upon entry and exit
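
    To make the producer/consumer use of FLUSH concrete, here is a minimal sketch (not from the original tutorial) modeled on the classic spin-wait pattern: the producer flushes the data before raising a flag, and the consumer flushes before each read of the flag. Compiled without OpenMP support the pragmas are simply ignored and the two blocks run in sequence, with the same output.

    ```c
    #include <stdio.h>

    int data = 0, flag = 0;   /* file-scope, shared by both sections */

    int main(void) {
        #pragma omp parallel sections
        {
            #pragma omp section
            {   /* producer: commit data to memory before raising the flag */
                data = 42;
                #pragma omp flush(data)
                flag = 1;
                #pragma omp flush(flag)
            }
            #pragma omp section
            {   /* consumer: spin until the flag update becomes visible */
                int f;
                do {
                    #pragma omp flush(flag)
                    f = flag;
                } while (f == 0);
                #pragma omp flush(data)   /* now data is guaranteed visible */
                printf("data = %d\n", data);
            }
        }
        return 0;
    }
    ```

    Without the two flushes in the consumer, the compiler would be free to keep flag in a register and spin forever.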

Synchronization Constructs
ORDERED Directive

Purpose:

  • The ORDERED directive specifies that iterations of the enclosed loop will be executed in the same order as if they were executed on a serial processor.

  • Threads will need to wait before executing their chunk of iterations if previous iterations haven't completed yet.

  • Used within a DO / for loop with an ORDERED clause

  • The ORDERED directive provides a way to "fine tune" where ordering is to be applied within a loop. Otherwise, it is not required.

Format:

    Fortran
    !$OMP DO ORDERED [clauses...]
       (loop region)
    
    !$OMP ORDERED
    
       (block)
    
    !$OMP END ORDERED
    
       (end of loop region)
    
    !$OMP END DO
    
    
    C/C++
    #pragma omp for ordered [clauses...]
       (loop region)
    
    #pragma omp ordered  newline
    
       structured_block
    
       (end of loop region)
    

Restrictions:

  • An ORDERED directive can only appear in the dynamic extent of the following directives:
    • DO or PARALLEL DO (Fortran)
    • for or parallel for (C/C++)

  • Only one thread is allowed in an ordered section at any time

  • It is illegal to branch into or out of an ORDERED block.

  • An iteration of a loop must not execute the same ORDERED directive more than once, and it must not execute more than one ORDERED directive.

  • A loop which contains an ORDERED directive must be a loop with an ORDERED clause.

THREADPRIVATE Directive

Purpose:

  • The THREADPRIVATE directive is used to make global file scope variables (C/C++) or common blocks (Fortran) local and persistent to a thread through the execution of multiple parallel regions.

Format:

    Fortran
    
    !$OMP THREADPRIVATE (/cb/, ...)  cb is the name of a common block
    
    
    C/C++
    #pragma omp threadprivate (list)
    
    

Notes:

  • The directive must appear after the declaration of listed variables/common blocks. Each thread then gets its own copy of the variable/common block, so data written by one thread is not visible to other threads. For example:

    Fortran - THREADPRIVATE Directive Example


          PROGRAM THREADPRIV
     
          INTEGER A, B, I, TID, OMP_GET_THREAD_NUM
          REAL*4 X
          COMMON /C1/ A
     
    !$OMP THREADPRIVATE(/C1/, X) 
    
     
    C     Explicitly turn off dynamic threads
          CALL OMP_SET_DYNAMIC(.FALSE.)
     
          PRINT *, '1st Parallel Region:'
    !$OMP PARALLEL PRIVATE(B, TID) 
          TID = OMP_GET_THREAD_NUM()
          A = TID
          B = TID
          X = 1.1 * TID + 1.0
          PRINT *, 'Thread',TID,':   A,B,X=',A,B,X
    !$OMP END PARALLEL 
     
          PRINT *, '************************************'
          PRINT *, 'Master thread doing serial work here'
          PRINT *, '************************************'
     
          PRINT *, '2nd Parallel Region: '
    !$OMP PARALLEL PRIVATE(TID) 
    
          TID = OMP_GET_THREAD_NUM()
          PRINT *, 'Thread',TID,':   A,B,X=',A,B,X
    !$OMP END PARALLEL 
     
          END
    
    Output:

    1st Parallel Region:
    Thread 0 :   A,B,X= 0 0 1.000000000
    Thread 1 :   A,B,X= 1 1 2.099999905
    Thread 3 :   A,B,X= 3 3 4.300000191
    Thread 2 :   A,B,X= 2 2 3.200000048
    ************************************
    Master thread doing serial work here
    ************************************
    2nd Parallel Region:
    Thread 0 :   A,B,X= 0 0 1.000000000
    Thread 2 :   A,B,X= 2 0 3.200000048
    Thread 3 :   A,B,X= 3 0 4.300000191
    Thread 1 :   A,B,X= 1 0 2.099999905

    C/C++ - threadprivate Directive Example


    #include <omp.h>
    #include <stdio.h>
     
    int  a, b, i, tid;
    float x;
     
    #pragma omp threadprivate(a, x)
     
    main ()  {
     
    /* Explicitly turn off dynamic threads */
      omp_set_dynamic(0);
     
      printf("1st Parallel Region:\n");
    
    #pragma omp parallel private(b,tid)
      {
      tid = omp_get_thread_num();
      a = tid;
      b = tid;
      x = 1.1 * tid +1.0;
      printf("Thread %d:   a,b,x= %d %d %f\n",tid,a,b,x);
      }  /* end of parallel section */
     
      printf("************************************\n");
      printf("Master thread doing serial work here\n");
      printf("************************************\n");
     
      printf("2nd Parallel Region:\n");
    #pragma omp parallel private(tid)
      {
      tid = omp_get_thread_num();
      printf("Thread %d:   a,b,x= %d %d %f\n",tid,a,b,x);
      }  /* end of parallel section */
    
    }
    
    
    Output:

    1st Parallel Region:
    Thread 0: a,b,x= 0 0 1.000000
    Thread 2: a,b,x= 2 2 3.200000
    Thread 3: a,b,x= 3 3 4.300000
    Thread 1: a,b,x= 1 1 2.100000
    ************************************
    Master thread doing serial work here
    ************************************
    2nd Parallel Region:
    Thread 0: a,b,x= 0 0 1.000000
    Thread 3: a,b,x= 3 0 4.300000
    Thread 1: a,b,x= 1 0 2.100000
    Thread 2: a,b,x= 2 0 3.200000

  • On first entry to a parallel region, data in THREADPRIVATE variables and common blocks should be assumed undefined, unless a COPYIN clause is specified in the PARALLEL directive

  • THREADPRIVATE variables differ from PRIVATE variables (discussed later) because they are able to persist between different parallel sections of a code.

Restrictions:

  • Data in THREADPRIVATE objects is guaranteed to persist only if the dynamic threads mechanism is "turned off" and the number of threads in different parallel regions remains constant. The default setting of dynamic threads is undefined.

  • The THREADPRIVATE directive must appear after every declaration of a thread private variable/common block.

  • Fortran: only named common blocks can be made THREADPRIVATE.



Data Scope Attribute Clauses

  • Also called Data-sharing Attribute Clauses

  • An important consideration for OpenMP programming is the understanding and use of data scoping

  • Because OpenMP is based upon the shared memory programming model, most variables are shared by default

  • Global variables include:
    • Fortran: COMMON blocks, SAVE variables, MODULE variables
    • C: File scope variables, static variables

  • Private variables include:
    • Loop index variables
    • Stack variables in subroutines called from parallel regions
    • Fortran: Automatic variables within a statement block

  • The OpenMP Data Scope Attribute Clauses are used to explicitly define how variables should be scoped. They include:
    • PRIVATE
    • FIRSTPRIVATE
    • LASTPRIVATE
    • SHARED
    • DEFAULT
    • REDUCTION
    • COPYIN

  • Data Scope Attribute Clauses are used in conjunction with several directives (PARALLEL, DO/for, and SECTIONS) to control the scoping of enclosed variables.

  • These constructs provide the ability to control the data environment during execution of parallel constructs.

    • They define how and which data variables in the serial section of the program are transferred to the parallel sections of the program (and back)

    • They define which variables will be visible to all threads in the parallel sections and which variables will be privately allocated to all threads.

  • Data Scope Attribute Clauses are effective only within their lexical/static extent.

  • Important: Please consult the latest OpenMP specs for important details and discussion on this topic.

  • A Clauses / Directives Summary Table is provided for convenience.


PRIVATE Clause

Purpose:

  • The PRIVATE clause declares variables in its list to be private to each thread.

Format:

    Fortran
    PRIVATE (list)
    
    
    C/C++
    private (list)
    
    

Notes:

  • PRIVATE variables behave as follows:

    • A new object of the same type is declared once for each thread in the team

    • All references to the original object are replaced with references to the new object

    • Variables declared PRIVATE should be assumed to be uninitialized for each thread

  • Comparison between PRIVATE and THREADPRIVATE:

                      PRIVATE                              THREADPRIVATE
      Data Item       C/C++: variable                      C/C++: variable
                      Fortran: variable or common block    Fortran: common block
      Where Declared  At start of region or                In declarations of each routine
                      work-sharing group                   using block, or global file scope
      Persistent?     No                                   Yes
      Extent          Lexical only - unless passed as      Dynamic
                      an argument to subroutine
      Initialized     Use FIRSTPRIVATE                     Use COPYIN


SHARED Clause

Purpose:

  • The SHARED clause declares variables in its list to be shared among all threads in the team.

Format:

    Fortran
    
    SHARED (list)
    
    
    C/C++
    shared (list)
    
    

Notes:

  • A shared variable exists in only one memory location and all threads can read or write to that address

  • It is the programmer's responsibility to ensure that multiple threads properly access SHARED variables (such as via CRITICAL sections)


DEFAULT Clause

Purpose:

  • The DEFAULT clause allows the user to specify a default scope for all variables in the lexical extent of any parallel region.

Format:

    Fortran

    DEFAULT (PRIVATE | FIRSTPRIVATE | SHARED | NONE)
    
    
    C/C++
    default (shared | none)
    
    

Notes:

  • Specific variables can be exempted from the default using the PRIVATE, SHARED, FIRSTPRIVATE, LASTPRIVATE, and REDUCTION clauses

  • The C/C++ OpenMP specification does not include private or firstprivate as a possible default. However, actual implementations may provide this option.

  • Using NONE as a default requires that the programmer explicitly scope all variables.

Restrictions:

  • Only one DEFAULT clause can be specified on a PARALLEL directive


FIRSTPRIVATE Clause

Purpose:

  • The FIRSTPRIVATE clause combines the behavior of the PRIVATE clause with automatic initialization of the variables in its list.

Format:

    Fortran
    FIRSTPRIVATE (list)
    
    
    C/C++
    firstprivate (list)
    
    

Notes:

  • Listed variables are initialized according to the value of their original objects prior to entry into the parallel or work-sharing construct.


LASTPRIVATE Clause

Purpose:

  • The LASTPRIVATE clause combines the behavior of the PRIVATE clause with a copy from the last loop iteration or section to the original variable object.

Format:

    Fortran
    LASTPRIVATE (list)
    
    
    C/C++
    lastprivate (list)
    
    

Notes:

  • The value copied back into the original variable object is obtained from the last (sequentially) iteration or section of the enclosing construct.

    For example, the team member that executes the final iteration of a DO/for loop, or the team member that executes the last SECTION of a SECTIONS construct, performs the copy with its own values.


COPYIN Clause

Purpose:

  • The COPYIN clause provides a means for assigning the same value to THREADPRIVATE variables for all threads in the team.

Format:

    Fortran
    COPYIN (list)
    
    
    C/C++
    copyin  (list)
    
    

Notes:

  • List contains the names of variables to copy. In Fortran, the list can contain both the names of common blocks and named variables.

  • The master thread's variable is used as the copy source. The team threads are initialized with its value upon entry into the parallel construct.


COPYPRIVATE Clause

Purpose:

  • The COPYPRIVATE clause can be used to broadcast values acquired by a single thread directly to all instances of the private variables in the other threads.

  • Associated with the SINGLE directive

  • See the most recent OpenMP specs document for additional discussion and examples.

Format:

    Fortran
    COPYPRIVATE (list)
    
    
    C/C++
    copyprivate  (list)
    
    


REDUCTION Clause

Purpose:

  • The REDUCTION clause performs a reduction on the variables that appear in its list.

  • A private copy of each list variable is created for each thread. At the end of the reduction, the reduction operation is applied to all private copies of the shared variable, and the final result is written to the global shared variable.

Format:

    Fortran
    REDUCTION (operator|intrinsic: list)
    
    
    C/C++
    reduction (operator: list)
    
    

Example: REDUCTION - Vector Dot Product:

  • Iterations of the parallel loop will be distributed in equal-sized blocks to each thread in the team (SCHEDULE STATIC)

  • At the end of the parallel loop construct, all threads will add their values of "result" to update the master thread's global copy.

    Fortran - REDUCTION Clause Example


           PROGRAM DOT_PRODUCT
    
           INTEGER N, CHUNKSIZE, CHUNK, I
           PARAMETER (N=100)
           PARAMETER (CHUNKSIZE=10)
           REAL A(N), B(N), RESULT
    
    !      Some initializations
           DO I = 1, N
             A(I) = I * 1.0
             B(I) = I * 2.0
           ENDDO
           RESULT= 0.0
           CHUNK = CHUNKSIZE
    
    !$OMP  PARALLEL DO
    !$OMP& DEFAULT(SHARED) PRIVATE(I)
    !$OMP& SCHEDULE(STATIC,CHUNK)
    !$OMP& REDUCTION(+:RESULT)
    
           DO I = 1, N
             RESULT = RESULT + (A(I) * B(I))
           ENDDO
    
    !$OMP  END PARALLEL DO NOWAIT
    
           PRINT *, 'Final Result= ', RESULT
           END
    

    C / C++ - reduction Clause Example


    #include <omp.h>
    #include <stdio.h>
    
    main ()  {
    
    int   i, n, chunk;
    float a[100], b[100], result;
    
    /* Some initializations */
    n = 100;
    chunk = 10;
    result = 0.0;
    for (i=0; i < n; i++)
      {
      a[i] = i * 1.0;
      b[i] = i * 2.0;
      }
    
    #pragma omp parallel for      \
      default(shared) private(i)  \
      schedule(static,chunk)      \
      reduction(+:result)  
    
      for (i=0; i < n; i++)
        result = result + (a[i] * b[i]);
    
    printf("Final result= %f\n",result);
    
    }
    

Restrictions:

  • Variables in the list must be named scalar variables. They cannot be array or structure type variables. They must also be declared SHARED in the enclosing context.

  • Reduction operations on real (floating-point) numbers are not associative, so results can differ slightly from a serial run.

  • The REDUCTION clause is intended to be used on a region or work-sharing construct in which the reduction variable is used only in statements which have one of following forms:

    Fortran:

      x = x operator expr
      x = expr operator x   (except subtraction)
      x = intrinsic(x, expr)
      x = intrinsic(expr, x)

      x is a scalar variable in the list
      expr is a scalar expression that does not reference x
      intrinsic is one of MAX, MIN, IAND, IOR, IEOR
      operator is one of +, *, -, .AND., .OR., .EQV., .NEQV.

    C / C++:

      x = x op expr
      x = expr op x   (except subtraction)
      x binop= expr
      x++, ++x, x--, --x

      x is a scalar variable in the list
      expr is a scalar expression that does not reference x
      op is not overloaded, and is one of +, *, -, /, &, ^, |, &&, ||
      binop is not overloaded, and is one of +, *, -, /, &, ^, |



Clauses / Directives Summary

  • The table below summarizes which clauses are accepted by which OpenMP directives.

    Clause         PARALLEL  DO/for  SECTIONS  SINGLE  PARALLEL  PARALLEL
                                                       DO/for    SECTIONS
    IF                x                                   x         x
    PRIVATE           x        x        x        x        x         x
    SHARED            x                                   x         x
    DEFAULT           x                                   x         x
    FIRSTPRIVATE      x        x        x        x        x         x
    LASTPRIVATE                x        x                 x         x
    REDUCTION         x        x        x                 x         x
    COPYIN            x                                   x         x
    COPYPRIVATE                                  x
    SCHEDULE                   x                          x
    ORDERED                    x                          x
    NOWAIT                     x        x        x

  • The following OpenMP directives do not accept clauses:
    • MASTER
    • CRITICAL
    • BARRIER
    • ATOMIC
    • FLUSH
    • ORDERED
    • THREADPRIVATE

  • Implementations may (and do) differ from the standard in which clauses are supported by each directive.



Directive Binding and Nesting Rules

Note: This section is provided mainly as a quick reference on rules which govern OpenMP directives and binding. Users should consult their implementation documentation and the OpenMP standard for other rules and restrictions.

  • Unless indicated otherwise, rules apply to both Fortran and C/C++ OpenMP implementations.

  • Note: the Fortran API also defines a number of Data Environment rules. Those have not been reproduced here.

Directive Binding:

  • The DO/for, SECTIONS, SINGLE, MASTER and BARRIER directives bind to the dynamically enclosing PARALLEL, if one exists. If no parallel region is currently being executed, the directives have no effect.

  • The ORDERED directive binds to the dynamically enclosing DO/for.

  • The ATOMIC directive enforces exclusive access with respect to ATOMIC directives in all threads, not just the current team.

  • The CRITICAL directive enforces exclusive access with respect to CRITICAL directives in all threads, not just the current team.

  • A directive can never bind to any directive outside the closest enclosing PARALLEL.

Directive Nesting:

  • A PARALLEL directive dynamically inside another PARALLEL directive logically establishes a new team, which is composed of only the current thread unless nested parallelism is enabled.

  • DO/for, SECTIONS, and SINGLE directives that bind to the same PARALLEL are not allowed to be nested inside of each other.

  • DO/for, SECTIONS, and SINGLE directives are not permitted in the dynamic extent of CRITICAL, ORDERED and MASTER regions.

  • CRITICAL directives with the same name are not permitted to be nested inside of each other.

  • BARRIER directives are not permitted in the dynamic extent of DO/for, ORDERED, SECTIONS, SINGLE, MASTER and CRITICAL regions.

  • MASTER directives are not permitted in the dynamic extent of DO/for, SECTIONS and SINGLE directives.

  • ORDERED directives are not permitted in the dynamic extent of CRITICAL regions.

  • Any directive that is permitted when executed dynamically inside a PARALLEL region is also legal when executed outside a parallel region. When executed dynamically outside a user-specified parallel region, the directive is executed with respect to a team composed of only the master thread.

