Anatomy of an Intel QuickPath Interconnect System
Figures 6, 7, and 8 show typical configurations of two-, four-, and eight-processor systems, respectively, that can be built using the Intel QuickPath Interconnect. Each processor has a memory controller on the same die, which makes the systems more scalable in performance. However, this is not essential, and Intel QPI systems can have separate, discrete memory controllers. Similarly, the I/O subsystems can either be incorporated onto the same die as the processors or built as separate I/O hubs. Both variations are shown in the illustrations.
These systems with Intel QPI have multiple links connecting the devices, and each link can operate independently of the others. The performance of such systems can be very high, particularly if the processors are allocated tasks that work on data distributed optimally across the different memory controllers and kept close to the processors that use it. Almost all current operating systems recognize such Non-Uniform Memory Access (NUMA) configurations, in which memory access latency depends on which processor issues the access, and place data in memory accordingly. Such link-based systems have much higher aggregate system bandwidth and correspondingly higher performance.
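To make the idea of placing data "accordingly" concrete, here is a minimal sketch of one common OS placement policy, often called first-touch: each page is homed on the memory controller of the processor that touches it first. The access-trace format and the function names `first_touch_placement` and `remote_fraction` are hypothetical, chosen only for illustration; real operating systems implement placement inside the kernel's memory allocator, not like this.

```python
def first_touch_placement(trace):
    """Hypothetical first-touch policy: home each page on the node of the
    processor that accesses it first.

    trace: list of (processor_node, page) pairs in program order.
    Returns a dict mapping page -> home node (memory controller).
    """
    home = {}
    for node, page in trace:
        # setdefault keeps only the first node recorded for each page
        home.setdefault(page, node)
    return home


def remote_fraction(trace, home):
    """Fraction of accesses whose page lives on another processor's memory
    controller, i.e. accesses that must cross an Intel QPI link."""
    remote = sum(1 for node, page in trace if home[page] != node)
    return remote / len(trace)


# Two processors (nodes 0 and 1), each repeatedly working on its own page.
trace = [(0, "a"), (1, "b"), (0, "a"), (1, "b"), (0, "a"), (1, "b")]

print(remote_fraction(trace, first_touch_placement(trace)))  # 0.0
print(remote_fraction(trace, {"a": 0, "b": 0}))              # 0.5, all data on node 0
```

The comparison shows why NUMA-aware placement matters: with both pages homed on node 0, half of all accesses in this toy trace leave the requesting processor, whereas first-touch keeps every access local.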
The systems shown in Figures 6 through 8 are fully connected in that each device has a direct link to every other device in the system. However, it is possible to build systems with Intel QPI in which not every device connects directly to every other. Figure 9 shows a four-processor system built with processors that have only two links for connecting to other processors.
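The cost of full connectivity grows quickly with system size, which is why partially connected topologies such as Figure 9 are attractive. The sketch below is simple counting, assuming only point-to-point links between devices; the function names are hypothetical, and the example counts processors only, ignoring any separate I/O hubs.

```python
def full_mesh_links(n):
    """Point-to-point links needed so each of n devices links directly
    to every other device: n choose 2."""
    return n * (n - 1) // 2


def links_per_device(n):
    """Link ports each device needs in a fully connected system of n devices."""
    return n - 1


for n in (2, 4, 8):
    print(n, full_mesh_links(n), links_per_device(n))
# 2 1 1
# 4 6 3
# 8 28 7
```

Under these assumptions a fully connected four-processor system needs three links per processor, so processors with only two links, as in Figure 9, cannot be fully connected.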
If processor A in Figure 9 needs to access the memory controller in processor C, it must send its request through either processor B or D, which must in turn forward that request to the memory controller in C. Similarly, larger systems of eight or more processors can be built using processors with three links, routing traffic through intermediate processors. The Intel QuickPath Interconnect mechanisms enable such systems to be built. However, systems that are not fully interconnected will have lower performance than fully connected ones, because of the longer latency through the intermediate processors and possible bandwidth congestion on the fewer links in the system. The Intel QuickPath Interconnect also provides mechanisms that help improve the performance of such partially connected systems by reducing the amount of traffic created across the links.
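The extra hop described above can be checked with a breadth-first search over the link topology. This is a minimal sketch: it assumes the Figure 9 processors are wired as a ring (A-B, B-C, C-D, D-A), which is what the description of A reaching C through B or D implies. The `hops` function is hypothetical and counts link traversals only, not actual latency.

```python
from collections import deque


def hops(links, src, dst):
    """Minimum number of links a request crosses from src to dst,
    found by breadth-first search over undirected point-to-point links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # dst is unreachable


# Four processors with two links each, wired as a ring (assumed Figure 9 layout).
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
# A fully connected four-processor system adds the two diagonal links.
full = ring + [("A", "C"), ("B", "D")]

print(hops(ring, "A", "C"))  # 2: forwarded through B or D
print(hops(full, "A", "C"))  # 1: direct link
```

Each extra hop adds forwarding latency at the intermediate processor and consumes bandwidth on one more link, which is the performance cost the paragraph describes.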
Yet larger systems can be built using the Intel QuickPath Interconnect fabric, with the processors connected in a hierarchy. Two or four processors are connected to a node controller, forming a cache-coherent node. Multiple node controllers are then connected together, using either Intel QPI or some other scalable, coherent interconnect, to build systems with many more processors. Figure 10 illustrates the organization of such systems, which are capable of very high performance on programs whose work can be distributed across the nodes. Clearly, the operating system must play a large role in their operation to get the highest performance from such systems.
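A back-of-the-envelope calculation shows why the operating system's role grows in hierarchical systems. This sketch assumes, hypothetically, that data is spread uniformly across every processor's memory controller with no NUMA-aware placement; the function name `node_local_fraction` is illustrative only.

```python
from fractions import Fraction


def node_local_fraction(num_nodes, procs_per_node):
    """Fraction of accesses that stay inside the requester's node when data
    is spread uniformly over all processors' memory controllers.
    (This simplifies to 1 / num_nodes.)"""
    total_procs = num_nodes * procs_per_node
    return Fraction(procs_per_node, total_procs)


# Hypothetical 16-processor system: 4 nodes of 4 processors each.
local = node_local_fraction(num_nodes=4, procs_per_node=4)
print(local, 1 - local)  # 1/4 3/4
```

Under this assumption three quarters of all accesses would have to cross the node-controller interconnect, and the share grows as more nodes are added. Placing data near the processors that use it, as the paragraph notes, is the operating system's lever for avoiding that traffic.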


