OpenMP is a well-known, widely accepted programming model for parallel programming on shared-memory platforms. It is based on so-called OpenMP directives that programmers add to their code to tell the compiler which fragments of the program can be executed in parallel. Most of OpenMP is designed to allow for maximum performance of the compiled code. As a result, some of the restrictions that OpenMP imposes on the programming model can be hard to understand without knowing how OpenMP code is compiled.
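As a quick illustration of what such a directive looks like, here is a minimal sketch in C. The function name `vector_add` is hypothetical and chosen only for this example; the `parallel for` directive itself is standard OpenMP.

```c
#include <assert.h>
#include <stddef.h>

/* Adds two vectors element-wise (vector_add is a hypothetical example name).
 * The directive tells an OpenMP compiler that the loop iterations are
 * independent and may be divided among several threads. A compiler that
 * does not have OpenMP enabled ignores the pragma and runs the loop
 * serially, producing the same result. */
void vector_add(const double *a, const double *b, double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

Note that the loop itself is ordinary C; the only OpenMP-specific part is the single directive line in front of it.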
This series of articles gives insight into the inner workings of an OpenMP compiler. Everything will be kept at a high level and will not expose any implementation details of existing OpenMP compilers. Instead, I describe generic compiler techniques that can be used to translate an OpenMP-enriched program to machine code. This may help you as an OpenMP programmer to better understand some of OpenMP's limitations and the performance behavior of your OpenMP application.
If you are familiar with low-level multi-threaded programming (e.g. with POSIX threads, Windows threads, or the Java Thread API), you will recognize all the patterns that will be covered in this series. It is the OpenMP compiler’s task to free you as the programmer from the "painful" low-level details of multi-threaded programming and to automatically perform the transformation that maps an OpenMP program to a threading library such as POSIX threads.
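To make these low-level details concrete, the following is a minimal sketch of a parallel array sum written by hand with POSIX threads. All names (`sum_parallel`, `sum_chunk`, `chunk_t`, `NUM_THREADS`) are hypothetical, and the code illustrates the general pattern only; it is not what any particular OpenMP compiler generates.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* With OpenMP, the loop would simply be preceded by
 *   #pragma omp parallel for reduction(+:sum)
 * Below, the same work is spelled out by hand. */

#define NUM_THREADS 4  /* hypothetical fixed team size */

typedef struct {
    const int *data;   /* shared input array */
    size_t begin;      /* first index of this thread's chunk */
    size_t end;        /* one past the last index of the chunk */
    long partial;      /* this thread's partial sum */
} chunk_t;

/* Thread entry point: the loop body, moved into its own function. */
static void *sum_chunk(void *arg)
{
    chunk_t *c = arg;
    long s = 0;
    for (size_t i = c->begin; i < c->end; ++i)
        s += c->data[i];
    c->partial = s;
    return NULL;
}

long sum_parallel(const int *data, size_t n)
{
    pthread_t threads[NUM_THREADS];
    chunk_t chunks[NUM_THREADS];
    int created[NUM_THREADS];
    size_t per_thread = n / NUM_THREADS;
    long sum = 0;

    assert(data != NULL || n == 0);

    /* Fork: hand each thread a contiguous block of iterations;
     * the last thread also takes the remainder. */
    for (int t = 0; t < NUM_THREADS; ++t) {
        chunks[t].data = data;
        chunks[t].begin = (size_t)t * per_thread;
        chunks[t].end = (t == NUM_THREADS - 1) ? n : (size_t)(t + 1) * per_thread;
        chunks[t].partial = 0;
        created[t] = (pthread_create(&threads[t], NULL, sum_chunk, &chunks[t]) == 0);
        if (!created[t])
            sum_chunk(&chunks[t]);  /* fall back to running this chunk serially */
    }

    /* Join: wait for the team, then combine the partial sums. */
    for (int t = 0; t < NUM_THREADS; ++t) {
        if (created[t])
            pthread_join(threads[t], NULL);
        sum += chunks[t].partial;
    }
    return sum;
}
```

Thread creation, work partitioning, joining, and combining the partial results are exactly the kind of bookkeeping that an OpenMP compiler takes off the programmer's hands.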
Figure 1 shows a typical compiler pipeline from a very high level. Most compilers have a far more complex pipeline, with a variety of intermediate steps needed to perform all the optimizations and transformations that we are used to. To keep things manageable, the figure leaves out all the intermediate steps and focuses only on the main steps that are absolutely necessary to compile code.
Modern compilers are typically divided into two main stages: the front-end and the back-end. Let's ignore the presence of the middle-end in the figure for now; we will come back to it later when we talk about compiling an OpenMP program. Separating compilers into a front-end and a back-end greatly simplifies the implementation of compilers that support more than one programming language and compilation target. Instead of n × m implementations (with n being the number of programming languages and m being the number of compilation targets), one only needs to implement n front-ends and link them to m back-ends, which gives n + m components. For example, supporting three languages on four targets takes 3 + 4 = 7 components instead of 3 × 4 = 12 complete compilers. In this architecture, the compiled program is passed from the front-end to the back-end in an intermediate language, a language-independent representation of programs.
The compiler front-end performs all tasks that depend on the programming language being compiled. It reads in the source code, divides the stream of characters into tokens (e.g. the tokens "int" "a" ";"), and checks the source code for syntactic correctness. The subsequent semantic analysis ensures that everything is defined correctly. For instance, each variable used is checked for a valid declaration that matches the variable's usage. Finally, the front-end transforms the input program into a program in the compiler’s intermediate language and feeds the generated intermediate program into the back-end for further processing. Depending on the back-end, the intermediate language may contain high-level constructs (e.g. loops, conditional statements) or may be close to assembly code with only branch instructions and other low-level instructions.
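The tokenization step can be sketched in a few lines of C. This is a deliberately simplified illustration: the function `next_token` and the constant `MAX_TOKEN_LEN` are hypothetical names, and a real front-end would also handle multi-character operators, string literals, comments, and error reporting.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_TOKEN_LEN 32  /* hypothetical limit; longer words are split */

/* Copies the next token from *src into out and advances *src past it.
 * Returns 1 if a token was found and 0 at the end of the input. */
int next_token(const char **src, char out[MAX_TOKEN_LEN])
{
    const char *p;
    size_t len = 0;

    assert(src != NULL && *src != NULL && out != NULL);
    p = *src;

    /* Skip whitespace between tokens. */
    while (isspace((unsigned char)*p))
        ++p;
    if (*p == '\0') {
        *src = p;
        return 0;
    }

    if (isalnum((unsigned char)*p) || *p == '_') {
        /* Keyword, identifier, or number: a run of word characters. */
        while ((isalnum((unsigned char)*p) || *p == '_') &&
               len < MAX_TOKEN_LEN - 1)
            out[len++] = *p++;
    } else {
        /* Any other character forms a single-character token such as ";". */
        out[len++] = *p++;
    }
    out[len] = '\0';
    *src = p;
    return 1;
}
```

Calling `next_token` repeatedly on the input `int a;` yields the tokens `int`, `a`, and `;`, after which it returns 0.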
The back-end takes over the intermediate program from the front-end and lowers it from the high-level intermediate representation to assembly for the compilation target. During this process, the back-end typically performs various optimizations. Most compiler front-ends follow a rule-directed compilation approach that leads to the creation of sub-optimal code paths. In addition, a programming language’s high level of abstraction also frequently leads to sub-optimal code. For example, in array-based programs the same index calculations are performed redundantly in loops and can be replaced by pointer arithmetic with a pointer that runs from the array's first element to its last element. It is the back-end’s responsibility to optimize the code accordingly. It removes redundant code and replaces sub-optimal code with better performing code. Dead code elimination, function inlining, loop unrolling, vectorization, and auto-parallelization are only a few optimizations that happen in this compiler stage. After all optimizations have been finished, the program is transformed to assembly code and written to an object file for the linker.
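The array example above can be sketched at the source level as follows. Both functions are hypothetical illustrations: the first shows the loop as a programmer would typically write it, and the second shows, in C, roughly what the optimized form corresponds to. An actual back-end performs this rewrite on its intermediate representation, not on source code.

```c
#include <assert.h>
#include <stddef.h>

/* As written: every iteration conceptually recomputes the element
 * address from the base pointer and the index i. */
void scale_indexed(int *a, size_t n, int factor)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; ++i)
        a[i] = a[i] * factor;
}

/* After the optimization: a single pointer runs from the first element
 * to one past the last, so no per-iteration index calculation remains. */
void scale_pointer(int *a, size_t n, int factor)
{
    assert(a != NULL);
    int *end = a + n;
    for (int *p = a; p != end; ++p)
        *p *= factor;
}
```

Both functions compute the same result; the second merely makes explicit the form that the optimizer aims for.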
Let us now come back to the middle-end. The middle-end takes care of all the OpenMP-related work that needs to be performed in an OpenMP compiler. Please note that not all compilers run through an explicit middle-end stage. Some compilers merge the middle-end with the front-end and do all the OpenMP-related transformations in the front-end on the syntax tree of the program. Some compilers move the tasks of the middle-end into the back-end and implement transformations on top of the intermediate representation. For the sake of clarity, we will keep the notion of the middle-end and see how it transforms OpenMP programs.
In an OpenMP compiler, the front-end is extended to recognize OpenMP directives and to add them to the intermediate representation of the program. It would be possible for the front-end to create an intermediate program with threading calls and pass it to the back-end. However, this would severely limit the back-end’s ability to analyze and optimize the OpenMP code and is therefore avoided in most optimizing OpenMP compilers. Instead, the OpenMP-enriched intermediate program is passed to the middle-end, which performs all OpenMP-specific transformations and optimizations. We will now look at the code transformations of the middle-end in more detail.