To define suitable process pinning settings for any kind of MPI application, you need detailed information about your cluster nodes: the identifiers of their logical processors, their placement on packages, how they share the different cache levels, their feature flags, and so on. The
cpuinfo command-line utility included in the Intel MPI Library is the most convenient tool to gather that information because you can use the identifiers in its output to configure process pinning.
I usually write posts about tools that provide information about the underlying hardware because when you understand that hardware, you can optimize applications to take full advantage of it. In fact, MPI applications usually run on multiple high-performance cluster nodes that provide dozens, hundreds, or thousands of multicore CPUs.
Since Intel launched Intel Cluster Studio 2012 and Intel Cluster Studio XE 2012, interest in the most recent Intel MPI Library and its related tools has grown. Both suites include the Intel MPI Library version 4.0 Update 3, and therefore, they provide access to the
cpuinfo utility. Whenever someone taking their first steps with the Intel MPI Library asks me about process pinning configuration, I always stress the need to use the
cpuinfo command-line utility. The next question is usually where this utility can be downloaded. There is no need to download it because it is already available when you have the Intel MPI Library (and it has been there since version 3.1). By default, you will find the Intel MPI Library applications and documentation in Start Menu | Intel(R) Software Development Products | Intel(R) MPI Library 4.0 Update 3. You will find two Build environment folders with the
Build Environment command-line launcher: Build environment for IA-32 and Build environment for Intel(R) 64 (see Figure 1). If you click
Build Environment within the Build environment for Intel(R) 64 folder, the shortcut opens a new command-line window and runs the
mpivars.bat batch file to define all the necessary paths and environment variables. By default, the shortcut executes:
C:\Windows\SysWOW64\cmd.exe /K "C:\Program Files (x86)\Intel\MPI\4.0.3.009\em64t\bin\mpivars.bat"
The result is a command-line window that will allow you to execute the
cpuinfo utility and any other Intel MPI command-line utility (see Figure 2). In this case, I've chosen the EM64T build environment.
Once you're in the build environment command-line, you can run the cpuinfo utility. If you run
cpuinfo A, the utility will print out all the information tables. The
A option represents the union of all available options. The output for cpuinfo is console text, and therefore, you can easily process it if necessary.
If you just need specific information and you don't want to be confused by all the tables, you can specify the letters for the options that represent each table.
g prints out general information about a single cluster node: the processor name including its model, the number of packages (sockets) on the node, the total number of physical cores per node, the total number of hardware threads or logical processors per node, the number of physical cores per package (socket), and the number of hardware threads per physical core. The following lines show an example of the table that
cpuinfo g prints out, followed by an explanation of each line because some terms might be confusing:
===== Processor composition =====
Processor name    : Genuine Intel(R) Name Model
Packages(sockets) : 2
Cores             : 16
Processors(CPUs)  : 32
Cores per package : 8
Threads per core  : 2
The following lines show the explanation of each line:
===== Processor composition =====
Processor name    : Genuine Intel(R) Name Model
Packages(sockets) : 2  -- The number of packages (sockets) on the node
Cores             : 16 -- The total number of physical cores (cores) per node
Processors(CPUs)  : 32 -- The total number of hardware threads or logical processors per node
Cores per package : 8  -- The total number of physical cores (cores) per package (socket)
Threads per core  : 2  -- The total number of hardware threads per physical core. Because the CPU has Intel Hyper-Threading, it provides two hardware threads per physical core.
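Because the output is plain console text, the composition table is easy to load into a script. The following is a minimal sketch in Python, assuming the output has exactly the "key : value" layout shown in the example above; the sample text is pasted in rather than captured from a real run, and the function name parse_composition is made up for this illustration.

```python
# Sample text copied from the example table above (not captured from a live run).
SAMPLE = """\
===== Processor composition =====
Processor name    : Genuine Intel(R) Name Model
Packages(sockets) : 2
Cores             : 16
Processors(CPUs)  : 32
Cores per package : 8
Threads per core  : 2
"""


def parse_composition(text):
    """Return the 'key : value' rows as a dict, converting numeric values to int."""
    info = {}
    for line in text.splitlines():
        # Skip the '=====' banner and anything that is not a 'key : value' row.
        if line.startswith("=") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        info[key] = int(value) if value.isdigit() else value
    return info


composition = parse_composition(SAMPLE)

# The counts should be consistent with each other.
cores = composition["Packages(sockets)"] * composition["Cores per package"]
logical = cores * composition["Threads per core"]
print(cores == composition["Cores"], logical == composition["Processors(CPUs)"])
```

The consistency check at the end mirrors the explanation of each line: packages times cores per package gives the total number of cores, and cores times threads per core gives the number of logical processors.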
i prints out the logical processor identification table, which allows you to identify each hardware thread or logical processor (Processor) along with its unique processor identifier within a core (Thread ID), its unique core identifier within a package (Core ID), and its unique package identifier within a node (Package ID). The following lines show an example of the table that
cpuinfo i prints out for two packages that provide 32 logical processors:
===== Processor identification =====
|Processor||Thread ID||Core ID||Package ID|
The previous table might be confusing, and that's where the node decomposition table is more helpful. Option
d prints out the node decomposition table that displays one entry for each physical package identifier (Package ID). Each package appears with both the list of unique core identifiers (Core ID) and the list of hardware threads or logical processors (Processors) that belong to the package. When a group of logical processors is enclosed in parentheses, all of them belong to the same core.
The following lines show an example of the table that
cpuinfo d prints out (you can compare it with the previous table for option i):
===== Placement on packages =====
Package ID   Core ID           Processors
0            0,1,2,3,4,5,6,7   (0,16)(1,17)(2,18)(3,19)(4,20)(5,21)(6,22)(7,23)
1            0,1,2,3,4,5,6,7   (8,24)(9,25)(10,26)(11,27)(12,28)(13,29)(14,30)(15,31)
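The placement table maps naturally onto a nested dictionary: package, then core, then the logical processors on that core. Here is a rough sketch in Python, assuming every data row has three whitespace-separated columns and that each core's logical processors are grouped in parentheses, as in the example above. Output from other nodes may be laid out differently, so treat the parsing rules as assumptions; parse_placement is a name invented for this example.

```python
import re

# Sample text copied from the example table above (not captured from a live run).
SAMPLE = """\
===== Placement on packages =====
Package ID   Core ID           Processors
0            0,1,2,3,4,5,6,7   (0,16)(1,17)(2,18)(3,19)(4,20)(5,21)(6,22)(7,23)
1            0,1,2,3,4,5,6,7   (8,24)(9,25)(10,26)(11,27)(12,28)(13,29)(14,30)(15,31)
"""


def parse_placement(text):
    """Map package ID -> {core ID: [logical processors on that core]}."""
    placement = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three columns and start with a package number;
        # the banner and the header row split into more fields and are skipped.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        package_id = int(fields[0])
        core_ids = [int(core) for core in fields[1].split(",")]
        groups = [
            [int(proc) for proc in group.split(",")]
            for group in re.findall(r"\(([^)]*)\)", fields[2])
        ]
        # The n-th parenthesized group belongs to the n-th core ID in the row.
        placement[package_id] = dict(zip(core_ids, groups))
    return placement


placement = parse_placement(SAMPLE)
print(placement[1][0])
```

With the sample above, core 0 of package 1 holds logical processors 8 and 24, which are the two hardware threads of that physical core.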
c prints out the cache sharing by logical processors for the different cache levels and their sizes. Each entry displays the cache level (Cache), its size (Size), and the groups of logical processors (Processors) that share that cache. Each group is enclosed in parentheses.
The following lines show an example of the table that
cpuinfo c prints out:
===== Cache sharing =====
Cache   Size     Processors
L1      32 KB    (0,16)(1,17)(2,18)(3,19)(4,20)(5,21)(6,22)(7,23)(8,24)(9,25)(10,26)(11,27)(12,28)(13,29)(14,30)(15,31)
L2      256 KB   (0,16)(1,17)(2,18)(3,19)(4,20)(5,21)(6,22)(7,23)(8,24)(9,25)(10,26)(11,27)(12,28)(13,29)(14,30)(15,31)
L3      20 MB    (0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23)(8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31)
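The cache sharing table can be processed the same way when you want to know, for example, how many logical processors compete for each cache. The sketch below is in Python and assumes that sizes are always reported in KB or MB and that every row follows the "level, size, unit, groups" layout of the example above; parse_cache_sharing is a hypothetical helper, not something shipped with the Intel MPI Library.

```python
import re

# Sample text copied from the example table above (not captured from a live run).
SAMPLE = """\
===== Cache sharing =====
Cache   Size     Processors
L1      32 KB    (0,16)(1,17)(2,18)(3,19)(4,20)(5,21)(6,22)(7,23)(8,24)(9,25)(10,26)(11,27)(12,28)(13,29)(14,30)(15,31)
L2      256 KB   (0,16)(1,17)(2,18)(3,19)(4,20)(5,21)(6,22)(7,23)(8,24)(9,25)(10,26)(11,27)(12,28)(13,29)(14,30)(15,31)
L3      20 MB    (0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23)(8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31)
"""

# Assumption: only KB and MB appear as units in the Size column.
UNITS = {"KB": 1024, "MB": 1024 ** 2}
ROW = re.compile(r"^(L\d+)\s+(\d+)\s+(KB|MB)\s+(\(.*\))\s*$")


def parse_cache_sharing(text):
    """Map cache level -> size in bytes and the groups of processors sharing it."""
    caches = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # banner, header row, or a layout this sketch does not know
        level, size, unit, groups = match.groups()
        caches[level] = {
            "size_bytes": int(size) * UNITS[unit],
            "groups": [
                [int(proc) for proc in group.split(",")]
                for group in re.findall(r"\(([^)]*)\)", groups)
            ],
        }
    return caches


caches = parse_cache_sharing(SAMPLE)
for level, entry in caches.items():
    sharers = len(entry["groups"][0])
    print(level, entry["size_bytes"], "bytes, shared by", sharers, "logical processors")
```

In the sample, each L1 and L2 cache is shared by the two hardware threads of one core, while each L3 cache is shared by the 16 logical processors of one package.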
There are two additional options:
- s — Prints out the processor signature
- f — Prints out the processor feature flags
If you need information about your nodes, the previously explained options will allow you to retrieve all the necessary information by using
cpuinfo. When you understand the identifiers, it is easy to define suitable process pinning settings for any kind of MPI application.
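As an illustration of how the identifiers feed into a pinning setting, the following Python sketch builds a value for the I_MPI_PIN_PROCESSOR_LIST environment variable that selects one logical processor per physical core. The placement dictionary is written by hand from the node decomposition example in this post, and the "first hardware thread of every core" policy is only an assumption made for the sake of the example; the right choice depends on your application, so check the Intel MPI Library Reference Manual for the accepted syntax and alternatives.

```python
# Package ID -> {core ID: [logical processors]}, transcribed from the
# "Placement on packages" example: package 0 holds (0,16)...(7,23) and
# package 1 holds (8,24)...(15,31).
placement = {
    0: {core: [core, core + 16] for core in range(8)},
    1: {core: [core + 8, core + 24] for core in range(8)},
}


def one_processor_per_core(placement):
    """Pick the first logical processor of every core, package by package."""
    return [
        placement[package][core][0]
        for package in sorted(placement)
        for core in sorted(placement[package])
    ]


pin_list = ",".join(str(proc) for proc in one_processor_per_core(placement))

# Print the setting instead of applying it, so it can be reviewed first.
print(f"I_MPI_PIN_PROCESSOR_LIST={pin_list}")
```

For the example node, this selects logical processors 0 through 15, so that no two of the selected processors share a physical core.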
Intel MPI Library is a commercial product, but you can download a free trial version here.