Robert J. Safranek and Michelle J. Moravan are Xeon architects at Intel. This article is based on material in the book Weaving High Performance Multiprocessor Fabric by Robert A. Maddox, Gurbir Singh and Robert J. Safranek. Courtesy Intel Corporation. All rights reserved.
In early generation Intel systems, traffic was broadcast across the shared, bi-directional "Front Side" Bus. With many sources driving the bus, electrical constraints made improving Intel FSB performance a challenge. Techniques included increasing the bus clock frequency and transferring requests at two and four times the bus clock rate, called double- and quad-pumping respectively. As systems grew to include many sockets, more buses were added: first two per system, called dual independent buses (DIB), and eventually one per socket, creating a dedicated high-speed interconnect (DHSI) between each socket and the chipset.
With ever-increasing numbers of cores, the system demand for memory bandwidth exceeded the capabilities of a single memory controller. At this point, additional transistors provided by smaller process technologies made it economical to integrate the memory controller on-die, implying at least one per socket. The Intel FSB changes required to support multiple memory controllers would have relinquished full backwards compatibility. Without the need to maintain full backwards hardware compatibility, the system architects had an opportunity to introduce an entirely new interconnect paradigm: the inception of the Intel QuickPath Interconnect.
When defining the new interconnect architecture, the architects had several goals in mind. They wanted a complete system-level solution that could also accommodate future growth in terms of:
- Scalability: adding nodes to the system with a proportional impact on performance and power,
- Performance: lower latency and higher bandwidth,
- New features: reliability, availability, and serviceability (RAS) for servers, and power management for energy efficiency across all market segments.
However, the initial implementation still had to provide a substantial performance improvement over the FSB. Finally, business realities made low system and silicon costs, including pin reduction, a primary concern.
At the same time, the design space was sharply bound by compatibility requirements with other aspects of Intel architectural legacy. Broadly speaking, architects use "legacy" to refer to features and characteristics that primarily exist to ensure backward compatibility. While the introduction of Intel QPI involved a deliberate choice to forego backward compatibility from the hardware perspective, preservation from a software perspective was mandatory: software that runs on old hardware must have the same outcome when migrated to new hardware. This is especially important for compatibility with most operating systems.
Legacy requirements can occur when widespread usage of platform conventions results in a de facto future requirement. The combination of these customer expectations with the business drive for profit and increasing platform integration makes for some unique attributes in modern systems. We will explore a few examples in detail later. For now, let us survey the key properties of the new architecture.
The architecture of the Intel QuickPath Interconnect defines a set of high-speed, packetized, point-to-point link pairs (two uni-directional links). From a physical standpoint, each link is implemented as a uni-directional narrow parallel bus that sends information (such as addresses, data, and data status) sequentially in time across multiple lanes. These narrow links connect one or more microprocessor sockets (sockets) in a distributed shared memory platform architecture. These sockets all view memory and the IO subsystem via a common address space. A socket comprises multiple microprocessor cores (cores), a last level cache, and one or more memory controllers. Sockets, memory controller(s), and the IO interface act as nodes in an Intel QPI topology.
Figure 1 portrays an example four-socket, link-based system, with each socket labeled "CPU". The red arrows represent Intel QPI link pairs. The figure shows that Intel QPI can provide both CPU-to-CPU and CPU-to-IO Hub (IOH) connections. Note that the system need not be fully connected, though full connectivity may provide a latency and bandwidth advantage. Since a physical link is uni-directional, a connection is implemented as a pair of links running in opposite directions.
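To make the topology point concrete, a partially connected system can be modeled as an adjacency list and minimum hop counts computed with a breadth-first search. The socket and IOH names and the particular link list below are illustrative, not taken from any specific product; the sketch simply shows why a less-than-fully-connected fabric can add hops, and thus latency, between some pairs of nodes.

```python
from collections import deque

# Hypothetical four-socket topology. Each entry below is a connection;
# in Intel QPI terms, each connection is really a pair of
# uni-directional links running in opposite directions.
links = {
    "CPU0": ["CPU1", "CPU2", "IOH0"],
    "CPU1": ["CPU0", "CPU3", "IOH0"],
    "CPU2": ["CPU0", "CPU3", "IOH1"],
    "CPU3": ["CPU1", "CPU2", "IOH1"],
    "IOH0": ["CPU0", "CPU1"],
    "IOH1": ["CPU2", "CPU3"],
}

def hops(src, dst):
    """Minimum number of link traversals from src to dst (BFS)."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nbr in links[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return None  # unreachable

# CPU0 and CPU3 have no direct connection in this sketch, so traffic
# between them must cross an intermediate socket: two hops, not one.
print(hops("CPU0", "CPU1"))  # 1
print(hops("CPU0", "CPU3"))  # 2
```

In a fully connected four-socket system, every CPU-to-CPU path would be a single hop, which is the latency advantage the text alludes to.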
The architecture of the Intel QuickPath Interconnect specifies a layered model, influenced by the success of the seven-layer Open Systems Interconnection (OSI) network model. From the bottom up, the five layers are the:
- Physical layer: the actual wires carrying the signals and the associated circuitry and logic to support transmission, including both an analog and a digital aspect;
- Link layer, which guarantees reliable transmission and flow control on a single link;
- Routing layer, which directs packets through the fabric;
- Transport layer (reserved for future use);
- Protocol layer, which enforces a set of rules for exchanging packets.
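The link layer's guarantee of flow control is commonly achieved with a credit/debit scheme: the receiver advertises its buffer capacity as credits at link initialization, the sender spends one credit per unit sent, and the receiver returns credits as it drains buffers. The sketch below illustrates that general mechanism; the class name, buffer sizes, and API are our own invention, not part of the Intel QPI specification.

```python
# Minimal sketch of credit-based flow control, the general scheme by
# which a link layer guarantees a sender never overruns the receiver's
# buffers. All names and sizes here are illustrative.

class CreditedLink:
    def __init__(self, receiver_buffers: int):
        # Credits advertised by the receiver at link initialization.
        self.credits = receiver_buffers
        self.in_flight = []

    def send(self, flit) -> bool:
        """Send a flit if a receive buffer is guaranteed; else stall."""
        if self.credits == 0:
            return False           # no credit: sender must wait
        self.credits -= 1          # spend one credit per flit sent
        self.in_flight.append(flit)
        return True

    def credit_return(self):
        """Receiver has freed a buffer; return the credit to the sender."""
        self.credits += 1

link = CreditedLink(receiver_buffers=2)
assert link.send("flit0") and link.send("flit1")
assert not link.send("flit2")      # stalled: both buffers in use
link.credit_return()               # receiver drains one buffer
assert link.send("flit2")          # sender may proceed again
```

The scheme is lossless by construction: a flit is only put on the wire when a buffer for it is already reserved, so the receiver never has to drop one.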
The remainder of the article discusses the physical, link, routing, and protocol layers in detail. The physical layer works with data in granularities dubbed phits. The link layer works with flits, and all higher layers work with packets. Each section will explore its quantum of data in detail; for now, only familiarity with the terms is needed to understand the relationships among the layers as we progress.
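The relationship among these granularities can be sketched numerically. On a full-width link in the initial products, the physical layer has 20 lanes, so a phit carries 20 bits, and the link layer's 80-bit flit is transferred as four consecutive phits; packets, in turn, comprise one or more flits. The helper functions below are our own illustration of that assembly, not actual QPI logic.

```python
# Sketch of the Intel QPI data granularities on a full-width link:
# a phit is what the 20 physical lanes carry in one transfer (20 bits),
# a flit is the link layer's 80-bit unit, i.e. four phits, and a
# packet (handled by the higher layers) is one or more flits.

PHIT_BITS = 20   # one bit per lane per transfer on a full-width link
FLIT_BITS = 80   # link-layer unit: 80 bits = 4 phits

def flit_to_phits(flit: int):
    """Split one 80-bit flit into the four 20-bit phits the
    physical layer sends sequentially across the lanes."""
    mask = (1 << PHIT_BITS) - 1
    return [(flit >> (i * PHIT_BITS)) & mask
            for i in range(FLIT_BITS // PHIT_BITS)]

def phits_to_flit(phits):
    """Reassemble the received phits back into a flit."""
    flit = 0
    for i, p in enumerate(phits):
        flit |= p << (i * PHIT_BITS)
    return flit

flit = 0x1234_5678_9ABC_DEF0_1234   # an arbitrary 80-bit value
assert phits_to_flit(flit_to_phits(flit)) == flit
print(len(flit_to_phits(flit)))     # 4 phits per flit
```

Narrower link operating widths change the phit size and hence the number of phits per flit, but the flit remains the link layer's fixed unit.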
The initial Intel QPI products do not implement the transport layer, so we offer only a brief description here and will not treat it further in this paper. The transport layer provides advanced routing and reliable end-to-end transmission in large-scale systems, and also includes features such as encapsulation, which allows transmission of non-Intel QPI protocols across the Intel QPI fabric. The initial products did not require these capabilities; nevertheless, the architecture of the Intel QuickPath Interconnect specifies the transport layer to allow extensibility to, and define compatibility with, future products that might benefit from them.
Note that the layered architecture allows modularity; for example, an Intel Core™ i7 processor is a single-socket device that uses the Intel QPI protocols to pass information among internal uncore modules (the part of the processor excluding the cores) and the chipset (Intel X58). However, the implementation of the underlying layers is unique to that application. Implementing a common set of protocols among devices provides seamless integration into multi-socket systems, where a core on one socket can target a message to a core on another socket, all using Intel QPI messaging, while inter-core communication within a single socket may occur at higher bandwidth, using more signals than the Intel QPI physical layer specifies.
In the following sections, we will describe each of the layers in turn, beginning with the physical layer, continuing with the link and routing layers, and finally coming to the protocol layer. In the protocol layer section, we will detail both coherent and non-coherent flows, and sketch the impact of various legacy features in the final architecture of the protocol layer. At the end of the article, we provide an overview of Intel QuickPath Interconnect performance in its first generation of commercially available products.