Performance of the Intel QuickPath Interconnect
Intel QPI is designed to provide very high system performance over a wide range of system configurations and workloads. It provides an excellent range of capabilities and permits the component designers to select the features that are best suited for their target systems.
High-performance small-scale systems such as workstations and compute-intensive desktop machines tend to benefit from the low latency and high efficiency of the source snoop cache coherence mechanism. This variant of the snooping mechanism is designed to provide data to the processors with the lowest possible latency, as it requires the fewest hops across the links. A source snooping approach also takes advantage of the low latency of cache accesses by emphasizing the forwarding of data from one processor to another, rather than fetching the data from slower DRAM. This reduces latency by about 25 percent compared with comparably sized home snooped systems, producing a significant performance benefit.
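The hop-count argument can be made concrete with a small sketch. The message sequences below are simplifying assumptions made for illustration (a read satisfied from a peer processor's cache, with one message per step); they are not the actual Intel QPI protocol flows, and the names and the per-hop latency parameter are hypothetical.

```python
# Illustrative only: these message sequences are simplifying assumptions,
# not the real Intel QPI protocol flows.

# Each tuple is (sender, receiver) for one message on the critical path of
# a read that is satisfied from a peer processor's cache.
SOURCE_SNOOP_PATH = [
    ("requester", "peer"),   # requester snoops the peer cache directly
    ("peer", "requester"),   # peer forwards the cache line
]

HOME_SNOOP_PATH = [
    ("requester", "home"),   # request goes to the home agent first
    ("home", "peer"),        # home agent snoops the peer for the requester
    ("peer", "requester"),   # peer forwards the cache line
]


def critical_path_hops(path):
    """Count the link traversals before data reaches the requester."""
    return len(path)


def critical_path_latency(path, hop_ns):
    """Latency if every hop costs the same (hypothetical) hop_ns nanoseconds."""
    return critical_path_hops(path) * hop_ns
```

This toy model counts link traversals only. It ignores cache lookup time, memory access time, and queuing, so it is not expected to reproduce the latency figure quoted above.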
Server systems benefit from large memory capacity and high memory bandwidth that can be readily shared across the multiple processors in the system. Intel QPI provides efficient mechanisms to handle data traffic from multiple memory controllers. Very high system throughput is achieved by pipelining a very large number of transactions between the processors and the memory controllers, and handling those transactions simultaneously and independently. Such systems can benefit from the home snooped mechanism of Intel QPI, where system bandwidth can be further optimized with snoop filters or directories built into the memory controllers. These structures allow the home agent, or the memory controller, to keep track of the agents that have requested a particular cache line and to query only those agents, which cuts down on the traffic needed for cache coherence resolution. Product designers can choose from a wide range of snoop filter or directory mechanisms to help reduce the traffic across the links. The Intel QPI coherence protocol provides considerable flexibility in this area.
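A minimal sketch of the directory idea follows. The class, its method names, and the agent labels are hypothetical and chosen only for illustration; a real snoop filter also tracks cache line states and evictions, which this sketch leaves out.

```python
class DirectorySnoopFilter:
    """Toy directory kept by a home agent (hypothetical, for illustration)."""

    def __init__(self, all_agents):
        self.all_agents = set(all_agents)
        # cache line address -> set of agents that have requested the line
        self.sharers = {}

    def snoop_targets(self, address, requester):
        """Agents that must be queried: only those known to have the line."""
        return self.sharers.get(address, set()) - {requester}

    def record_request(self, address, requester):
        """Remember that the requester may now hold a copy of this line."""
        self.sharers.setdefault(address, set()).add(requester)

    def broadcast_targets(self, requester):
        """What would be snooped without a filter: every other agent."""
        return self.all_agents - {requester}
```

With four agents and a line previously requested only by one of them, `snoop_targets` returns a single agent where `broadcast_targets` returns three, which is the traffic reduction the paragraph above describes.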
Larger system configurations with tens of processors can also be readily built in a hierarchical manner. Groups of two to eight processors and a node controller can be connected to other such nodes, where the node controllers are tied to each other over a set of links. Such a system configuration of two tiers of interconnect is shown in Figure 1. The second level of interconnect between node controllers may use Intel QPI; that is a platform architectural decision. Alternatively, an existing protocol can be used between nodes while Intel QPI is used for connectivity within the node. Such systems can take advantage of relatively inexpensive mass-produced processors to build massively parallel computing systems that can take on very large computing problems that require access to shared memory. Such node controllers can also incorporate sophisticated traffic filtering algorithms, taking advantage of the rich and flexible protocol of Intel QPI.
Reliability of the Intel QuickPath Interconnect
Intel QPI is designed to meet the demands of server systems where a premium is placed upon reliability, availability, and serviceability (RAS). The architecture offers several levels of error detection across the links and provides methods to correct those errors on the fly, so that the system can recover from routine transient errors. If errors are seen repeatedly on one or more links, however, Intel QPI can isolate the faulty lanes or links and work around the failed elements.
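The detect-and-retry idea can be sketched as follows. This is an illustration under stated assumptions, not the Intel QPI link-layer design: CRC32 from Python's standard library stands in for the link's error-detecting code, and the `channel` callable, the retry limit, and the function names are all hypothetical.

```python
import zlib


def make_flit(payload: bytes) -> bytes:
    """Append a 4-byte CRC32 to the payload (CRC32 is a stand-in checksum)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_flit(flit: bytes):
    """Return the payload if the checksum matches, otherwise None."""
    payload, received_crc = flit[:-4], int.from_bytes(flit[-4:], "big")
    return payload if zlib.crc32(payload) == received_crc else None


def send_with_retry(payload: bytes, channel, max_retries: int = 3):
    """Resend the same flit until the receiver's check passes.

    `channel` is a hypothetical callable that takes the transmitted bytes
    and returns the bytes as seen by the receiver (possibly corrupted).
    Returns (payload, attempts), or raises if errors persist.
    """
    flit = make_flit(payload)
    for attempt in range(1, max_retries + 2):  # first try plus retries
        received = check_flit(channel(flit))
        if received is not None:
            return received, attempt
    # Repeated failures suggest a hard fault rather than a transient error.
    raise RuntimeError("persistent errors: link should be isolated")
```

A transient error costs one extra transmission and is invisible to higher layers, while a persistent failure surfaces as an exception, which corresponds to the point at which a real system would isolate the faulty lane or link.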
Intel QPI systems may also support memory mirroring, where multiple memory controllers are paired together to provide more reliable memory storage. Intel QPI handles all the traffic appropriately and ensures that data is reliably delivered to both memory controllers in the mirrored pair. In the event of the failure of one of the controllers, the system can continue operation by seamlessly drawing upon the data from the partner controller. The Intel QPI interface can indicate the occurrence of such a failure and permit replacement of the failed memory unit, as the links provide the capability to support hot-plug of devices. In all cases, the goal is to keep the system up and running even in the face of several link failures.
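The mirroring behavior described above can be sketched as a pair of toy controllers. The classes and the `failed` flag are hypothetical illustrations; how a real platform detects a failed controller and resynchronizes a hot-plugged replacement is not covered here.

```python
class MemoryController:
    """Toy memory controller backed by a dict (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.failed = False
        self.cells = {}

    def write(self, address, value):
        if self.failed:
            raise IOError(f"{self.name} has failed")
        self.cells[address] = value

    def read(self, address):
        if self.failed:
            raise IOError(f"{self.name} has failed")
        return self.cells[address]


class MirroredPair:
    """Delivers every write to both controllers; reads survive one failure."""

    def __init__(self, primary, partner):
        self.controllers = [primary, partner]

    def _healthy(self):
        healthy = [c for c in self.controllers if not c.failed]
        if not healthy:
            raise IOError("both controllers in the mirrored pair have failed")
        return healthy

    def write(self, address, value):
        # Every healthy controller receives the same data.
        for controller in self._healthy():
            controller.write(address, value)

    def read(self, address):
        # Draw on the first healthy controller; fall back to the partner.
        return self._healthy()[0].read(address)
```

After a write through the pair, marking the primary as failed leaves reads unaffected, because the partner holds an identical copy.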
Deployment of the Intel QuickPath Interconnect
The Intel QuickPath Interconnect is the backbone of the next generation of platforms from Intel. Intel QPI is used to tie the processors and the I/O hubs together into single, dual, and multiple processor systems. Intel offers a range of processors and I/O hubs that utilize a varying number of Intel QPI links. The number of links implemented on a device depends upon the specific product requirements the device is intended to meet. This interconnect is used in both the IA-32 architecture and the Intel Itanium families of processors. Some of the processors will utilize the source snooped behavior for high performance computing applications that require low latency. The Intel Itanium product family and future high end processors designed for large multiprocessor server systems tend to favor the home snooped behavior, as they will focus on scalability and high system bandwidth. The I/O hubs serve the needs of both the small high performance systems and large servers, where multiple I/O hubs in a system can provide the required levels of connectivity.
System designers who want to build large systems can do so using node controllers to tie together clusters of processors. These clusters could have from two to eight processors connected to a node controller. These node controllers can be designed and built by the companies offering large systems and tailored to the specific goals of that system. Intel has enabled a third party to develop a macro cell of the Intel QPI physical layer. This macro cell has been tested and validated to work with certain Intel products. This macro cell is available through an ASIC design company and can be used as part of semi-custom components such as node controllers.
The Intel QPI coherence protocol is flexible in nature and works well to connect heterogeneous devices such as compute accelerators and graphics engines. The interface provides high bandwidth and uniform accessibility to the entire memory address space that can be readily cached by any of the engines. This helps keep software simple.
Designing with the Intel QuickPath Interconnect
Intel QPI brings about a fundamental change in the design of systems and platforms. This is a very significant departure from the Front Side Bus system approach and requires a shift in thinking and new methods and techniques to analyze and observe system operation for performance tuning and debugging. In systems based on Intel QPI, traffic travels across multiple links to multiple memory controllers all working in loose concert with each other. Thus anyone modeling system operational behavior for performance analysis purposes must understand the nature of the interactions between the various elements. As an example, two processors in a multiple link system can issue transactions to different memory controllers down two different links. Although each of these transactions can be considered to be essentially independent of the other, they may share resources within the processors and create snoop traffic on adjacent links.
Similarly, trying to observe the transactions of a single processor for the purpose of tracking down an anomaly requires integrating information from several locations in the system to create a complete picture. The high speed signaling technology of the Intel QPI links cannot be observed by a direct electrical connection to the signals. Intel, working with logic analyzer vendors, has created solutions to provide a way to observe these signals directly or indirectly. Several such probes are required to observe the transactions across many links while debugging a multiple processor system. The user then integrates information from all these sources to get a complete picture of the transactions in the system. All this is a significant departure from the methods used in FSB systems where one can observe all the system traffic on one logic analyzer connected to the bus.
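The integration step can be sketched as a merge of per-link captures into one timeline. The record layout, field names, and message strings below are hypothetical, and the sketch assumes that every probe timestamps its events against a common time base and that each capture is already sorted by time.

```python
import heapq
from typing import NamedTuple


class TraceEvent(NamedTuple):
    """One captured message (hypothetical record layout)."""
    timestamp_ns: int
    link: str
    message: str


def merge_traces(*per_link_traces):
    """Merge per-link traces, each sorted by time, into a single timeline."""
    return list(heapq.merge(*per_link_traces, key=lambda e: e.timestamp_ns))


def events_for_address(timeline, address_tag):
    """Filter the merged timeline to messages mentioning one address tag."""
    return [e for e in timeline if address_tag in e.message]
```

For example, merging a capture from one link containing a read request with a capture from another link containing the matching snoop and data response yields the three events in time order, so a single transaction can be followed across links.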
With its high-bandwidth, low-latency characteristics, the Intel QuickPath Interconnect advances the evolution of the processor bus, unlocking the potential of next-generation microprocessors. Given these features and this level of performance, it is no surprise that numerous vendors are designing innovative products around this interconnect.