Aljosa Vrancic is a principal engineer at National Instruments. He holds a B.S. in electrical engineering from the University of Zagreb, and an M.S. degree and PhD in Physics from Louisiana State University. Jeff Meisel is the LabVIEW product manager at National Instruments and holds a B.S. in computer engineering from Kansas State University. Courtesy Intel Corp. All rights reserved.
Because tasks that require acceleration are so computationally intensive, a typical high-performance computing (HPC) problem could not traditionally be solved with a normal desktop computer, let alone an embedded system. However, disruptive technologies such as multicore processors now enable more and more HPC applications to be solved with off-the-shelf hardware.
Real-time HPC comes into the picture when the number crunching must happen in a deterministic, low-latency environment. Many HPC applications run offline simulations thousands of times and then report the results. This is not a real-time operation because there is no timing constraint specifying how quickly the results must be returned. The results just need to be calculated as fast as possible.
Previously, these applications have been developed using a message passing protocol (such as the MPI standard, for example through its MPICH implementation) to divide tasks across the different nodes in the system. A typical distributed computing scenario looks like that in Figure 1, with one head node that acts as a master and distributes processing to the slave nodes in the system.
By default, this distributed configuration is not real-time friendly because of the latencies associated with networking technologies (like Ethernet). In addition, the synchronization implied by the message passing protocol is not necessarily predictable at millisecond granularity. Note that such a configuration could potentially be made real-time by replacing the communication layer with a real-time hardware and software layer (such as reflective memory), and by adding manual synchronization to prioritize tasks and ensure they complete in a bounded timeframe. Generally speaking though, the standard HPC approach was not designed for real-time systems and presents serious challenges when real-time control is needed.
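The head-node arrangement described above can be sketched in a few lines. This is only an illustration of the pattern, not MPI code: Python threads and queues stand in for the nodes and the network, and the names head_node and worker, the chunk size, and the sum-of-squares workload are all assumptions made for the example.

```python
import queue
import threading


def worker(node_id, tasks, results):
    """Hypothetical worker node: receive a chunk, process it, send the result back."""
    while True:
        msg = tasks.get()
        if msg is None:  # sentinel: the head node has no more work
            break
        chunk_id, chunk = msg
        results.put((chunk_id, node_id, sum(x * x for x in chunk)))


def head_node(data, num_workers=4, chunk_size=250):
    """Hypothetical head node: split the data, distribute chunks, gather partial sums."""
    tasks, results = queue.Queue(), queue.Queue()
    nodes = [threading.Thread(target=worker, args=(i, tasks, results))
             for i in range(num_workers)]
    for node in nodes:
        node.start()

    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for chunk_id, chunk in enumerate(chunks):
        tasks.put((chunk_id, chunk))      # "send" a message to whichever node is free
    for _ in nodes:
        tasks.put(None)                   # one stop message per worker

    # Blocks until every chunk has reported back; nothing bounds how long that takes.
    partials = [results.get() for _ in chunks]
    for node in nodes:
        node.join()
    return sum(partial[2] for partial in partials)


if __name__ == "__main__":
    print(head_node(list(range(1000))))
```

Note that the gather step simply waits for the last result to arrive. With real network links in place of the in-memory queues, that wait is where the unpredictable latency enters.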
An Embedded, Real-Time HPC Approach with Multicore Processors
The approach outlined in this article is based on a real-time software stack, as described in Table 1, and off-the-shelf multicore processors.
Real-time applications also have algorithms that need to be accelerated, but they often involve the control of real-world physical systems, so the traditional HPC approach is not applicable. In a real-time scenario, the result of an operation must be returned in a predictable amount of time. The challenge is that until recently, it has been very hard to solve an HPC problem while at the same time closing a control loop in under 1 millisecond.
Furthermore, a more embedded approach may need to be implemented, where physical size and power constraints place limitations on the design of the system.
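To make the timing constraint concrete, the following minimal sketch times each iteration of a loop and counts how many exceed a deadline. The function names, the 1 millisecond default, and the report fields are assumptions for illustration. The point it shows is that a real-time requirement is judged by the worst-case iteration rather than the average. Run on a general-purpose operating system, it will typically show occasional long iterations, which is exactly the behavior a real-time software stack is meant to eliminate.

```python
import time


def run_loop(step, iterations):
    """Run `step` repeatedly and record how long each iteration took, in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        step()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations, deadline_s=0.001):
    """Report the worst-case time and how many iterations missed the deadline."""
    misses = sum(1 for d in durations if d > deadline_s)
    return {"worst_s": max(durations), "misses": misses, "iterations": len(durations)}


if __name__ == "__main__":
    # Hypothetical workload standing in for one pass of a control algorithm.
    report = summarize(run_loop(lambda: sum(range(1000)), 1000))
    print(report)
```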
Now consider a multicore architecture, where today you can find up to 16 processing cores.
From a latency perspective, a multicore architecture found in off-the-shelf hardware replaces communication over Ethernet with inter-core communication whose speed is determined by the system bus. Round-trip times are therefore much more tightly bounded. Consider the simplified diagram of a quad-core system in Figure 2.
In addition, multicore processors can utilize symmetric multiprocessing (SMP) operating systems -- a technology that general-purpose operating systems like Windows, Linux, and Mac OS have used for years to automatically load-balance tasks across available CPU resources. Now real-time operating systems are offering SMP support. This means that a developer can specify timing and priorities for tasks that run across many cores at one time, and the OS handles the thread interactions. This is a tremendous simplification compared with message passing and manual synchronization, and it can all be done in real time.
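For contrast with message passing, here is a minimal shared-memory sketch, assuming a dot product as the workload. The threads read the same arrays directly, no data is sent anywhere, and the operating system decides which core runs each thread. The names parallel_dot and partial_dot are hypothetical. One caveat: in the standard CPython interpreter the global interpreter lock prevents pure-Python threads from running in parallel, so this illustrates the programming model rather than the speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def partial_dot(a, b, start, stop):
    """Dot product over one slice of the shared input arrays."""
    return sum(a[i] * b[i] for i in range(start, stop))


def parallel_dot(a, b, workers=None):
    """Split a dot product into one slice per worker thread and add the pieces."""
    workers = workers or os.cpu_count() or 1
    n = len(a)
    step = max(1, -(-n // workers))  # ceiling division: slice length per worker
    bounds = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No explicit core assignment: the OS scheduler places the threads.
        futures = [pool.submit(partial_dot, a, b, s, e) for s, e in bounds]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    x = list(range(100))
    y = [2] * 100
    print(parallel_dot(x, y, workers=4))
```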
Real-Time HPC System Description
For the approaches outlined in this article, Figure 3 represents the general software and hardware approach that has been applied.
Note: The optimizations layer is included as part of the LabVIEW language; however, it deserves mentioning as a separate component.