
QuickPath Interconnect: Rules of the Revolution


Performance

Now that we have an appreciation for the architecture of the Intel QuickPath Interconnect, we must turn to practicalities and ask, "What performance did it achieve?" To make a fair evaluation, we must also put our query in context: what performance were the architects aiming for?

Although lower latency and higher bandwidth always seem desirable, the interconnect need only provide enough performance that it does not become the system bottleneck. In its first wave of products, Intel QPI more than achieves this goal. The theoretical maximum bandwidth is calculated as follows. A full-width link is 20 lanes wide, and when carrying a data packet it transfers two bytes of data payload at a time. The link is double-pumped (it executes a transfer on every edge of the forwarded clock), so it performs two transfers per cycle, for a total of four bytes per cycle. In the initial implementation, the underlying packetized buses that make up the link can be clocked at 3.2 GHz, which corresponds to 6.4 GT/s (giga-transfers per second) on a single, uni-directional link and gives 12.8 GB/s of bandwidth in one direction. Two components connected by an Intel QPI link-pair can therefore support a raw bandwidth of 25.6 GB/s. The transfer rate drops to 4.8 GT/s if the components are far apart. To put this in perspective, Harpertown, with a 400 MHz FSB, provides a peak bandwidth of 12.8 GB/s on certain specialized platforms.
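The bandwidth arithmetic in this paragraph can be checked with a few lines of Python. This is a minimal sketch that only restates the figures quoted in the text (two payload bytes per transfer, two transfers per clock, a 3.2 GHz clock); the function names are our own, and the FSB model assumes a quad-pumped, 64-bit-wide data bus, which is how a 400 MHz FSB reaches 12.8 GB/s.

```python
# Back-of-the-envelope peak-bandwidth arithmetic for a full-width Intel QPI link,
# using the figures quoted in the text (theoretical values, not measurements).

PAYLOAD_BYTES_PER_TRANSFER = 2  # 16 payload bits out of each 20-bit transfer
TRANSFERS_PER_CLOCK = 2         # double-pumped: one transfer per clock edge
QPI_CLOCK_HZ = 3.2e9            # forwarded clock in the initial implementation


def qpi_peak_bandwidth(clock_hz=QPI_CLOCK_HZ):
    """Return (GT/s, GB/s in one direction, GB/s for a link-pair)."""
    transfers_per_s = clock_hz * TRANSFERS_PER_CLOCK
    one_way_bytes_per_s = transfers_per_s * PAYLOAD_BYTES_PER_TRANSFER
    return (transfers_per_s / 1e9,
            one_way_bytes_per_s / 1e9,
            2 * one_way_bytes_per_s / 1e9)  # a link-pair carries both directions


def fsb_peak_bandwidth(bus_clock_hz=400e6, transfers_per_clock=4, bus_bytes=8):
    """Peak bandwidth in GB/s of a front-side bus (assumed quad-pumped, 64 bits wide)."""
    return bus_clock_hz * transfers_per_clock * bus_bytes / 1e9


gts, one_way, pair = qpi_peak_bandwidth()
print(f"QPI: {gts:.1f} GT/s, {one_way:.1f} GB/s per direction, {pair:.1f} GB/s per link-pair")
# QPI: 6.4 GT/s, 12.8 GB/s per direction, 25.6 GB/s per link-pair
print(f"FSB: {fsb_peak_bandwidth():.1f} GB/s")
# FSB: 12.8 GB/s
```

Passing `clock_hz=2.4e9` gives the 4.8 GT/s case mentioned for widely separated components.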

In practice, it is unlikely that an end user will observe the theoretical peak bandwidth on any architecture. Generally only synthetic workloads that artificially stress the system achieve it, and current Intel sockets do not generate enough traffic to strain the Intel QPI fabric. Furthermore, the calculation above excludes packetization overhead: the transmitted data stream is divided into smaller packets, each labeled with header information that guides it through the topology. Within the interconnect, the information being communicated is divided into 20-bit phits (physical units), one bit per lane. A header requires four phits (at full link width), and the typical data payload is a 64 B cache line, which requires 32 phits, for a total of 36 phits in a data packet. Because 4 of those 36 phit transfers carry header rather than payload, the packetization overhead is about 11%, and transferring a cache line takes about 5.6 ns (36 phits at 6.4 GT/s).
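The packetization figures can be reproduced with a short Python sketch. It assumes a full-width link and the phit counts given in this paragraph; `cache_line_packet` is a hypothetical helper, and the time it reports covers only serializing the 36 phits onto the link, not any other source of latency.

```python
# Packetization overhead for one 64-byte cache-line data packet, using the
# phit counts quoted in the text (full link width assumed).

HEADER_PHITS = 4            # a header occupies four 20-bit phits
CACHE_LINE_BYTES = 64       # typical data payload
PAYLOAD_BYTES_PER_PHIT = 2  # two bytes of payload per transfer


def cache_line_packet(transfer_rate_gts=6.4):
    """Return (total phits, overhead fraction, serialization time in ns)."""
    data_phits = CACHE_LINE_BYTES // PAYLOAD_BYTES_PER_PHIT  # 32 phits of payload
    total_phits = HEADER_PHITS + data_phits                  # 36 phits per packet
    overhead = HEADER_PHITS / total_phits                    # share of phits with no payload
    # One phit moves per transfer, and 6.4 GT/s is 6.4 transfers per ns.
    transfer_ns = total_phits / transfer_rate_gts
    return total_phits, overhead, transfer_ns


phits, overhead, ns = cache_line_packet()
print(f"{phits} phits, {overhead:.1%} overhead, {ns:.1f} ns per cache line")
# 36 phits, 11.1% overhead, 5.6 ns per cache line
```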

Summary

The architecture of the Intel QuickPath Interconnect revolutionizes Intel system platform topologies by replacing the FSB with packetized, point-to-point link pairs. The initial implementation supplies 25.6 GB/s of peak bandwidth per link-pair and can transfer a 64 B cache line in only 5.6 ns, all with fewer pins than its FSB predecessor. The five-layer architecture targets multiprocessor distributed shared memory systems, and as such supports coherency by extending the traditional MESI protocol with a new Forward state, which reduces latency by permitting direct cache-to-cache transfers of data. The protocol supports both source snoop, for the lowest latency in small systems, and home snoop, which improves scalability by reducing snoop bandwidth in large systems.
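To make the role of the Forward state concrete, here is a deliberately simplified Python model of read misses to a single clean cache line. It is an illustrative sketch, not the Intel QPI state machine: the Modified state, writes, and evictions are omitted; we assume that a cache holding the line in E or F forwards it and drops to S while the requester becomes the new F holder; and when no cache can forward, memory supplies the data and the requester takes F if other sharers exist, E otherwise. `read_miss` is a hypothetical helper name.

```python
from enum import Enum


class State(Enum):
    # Modified is omitted: this sketch only models reads of a clean line.
    I = "Invalid"
    S = "Shared"
    E = "Exclusive"
    F = "Forward"


def read_miss(states, requester):
    """Apply a read miss by `requester` to one line; return the data source.

    `states` maps cache id -> State. Returns a cache id for a
    cache-to-cache transfer, or "memory".
    """
    if states[requester] is not State.I:
        raise ValueError("requester already holds a valid copy")
    # Only a cache in F or E may answer, so at most one cache responds
    # even when many caches hold the line in S.
    forwarders = [c for c, s in states.items() if s in (State.F, State.E)]
    if forwarders:
        source = forwarders[0]
        states[source] = State.S        # old holder keeps a plain shared copy
        states[requester] = State.F     # newest sharer becomes the forwarder
        return source
    # No forwarder exists, so memory supplies the line (assumed rule below).
    others_share = any(s is State.S for c, s in states.items() if c != requester)
    states[requester] = State.F if others_share else State.E
    return "memory"


line = {0: State.I, 1: State.I, 2: State.I}
for cache in (0, 1, 2):
    print(f"cache {cache} read, data from {read_miss(line, cache)}")
print({c: s.name for c, s in line.items()})
# {0: 'S', 1: 'S', 2: 'F'}
```

Only the single F (or E) holder responds while S copies stay silent, so a read can be served by a nearby cache without several sharers sending redundant responses.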

To ease platform design, the physical layer features waveform equalization, deskew circuits, polarity inversion, and lane reversal. The link layer builds on this to guarantee reliable transmission. It also implements flow control via a credit scheme and defines many reliability and availability features, including link self-healing, clock failover, link-level retry, and hot-swap support. Self-healing and clock failover are based on dynamic width reduction. Finally, the link layer employs an inline 8-bit CRC for low-latency, flit-level error detection, with optional additional error protection in the form of a rolling 16-bit CRC. Mapping six message classes onto up to three virtual networks yields 18 virtual channels; this virtualization guarantees deadlock and livelock avoidance and paves the way for routing optimizations in complex topologies.
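Credit-based flow control over the 18 virtual channels can be sketched in a few lines of Python. This is a simplified model, not the link-layer implementation: each (message class, virtual network) pair gets an independent credit counter, the sender transmits only while it holds a credit, and the receiver returns credits as it frees buffers. The class and network labels (HOM, SNP, NDR, DRS, NCS, NCB; VN0, VN1, VNA) are assumed for illustration, the initial credit count is arbitrary, and the sketch ignores any buffer sharing a real implementation may use.

```python
from itertools import product

# Assumed labels: six message classes and three virtual networks -> 18 channels.
MESSAGE_CLASSES = ["HOM", "SNP", "NDR", "DRS", "NCS", "NCB"]
VIRTUAL_NETWORKS = ["VN0", "VN1", "VNA"]


class CreditSender:
    """Sender side of a simple per-virtual-channel credit scheme."""

    def __init__(self, credits_per_vc=2):
        # One counter per (message class, virtual network) pair; each credit
        # stands for one free receive buffer advertised by the other end.
        self.credits = {vc: credits_per_vc
                        for vc in product(MESSAGE_CLASSES, VIRTUAL_NETWORKS)}

    def try_send(self, msg_class, vnet):
        """Consume a credit and return True, or return False if the VC must wait."""
        vc = (msg_class, vnet)
        if self.credits[vc] == 0:
            return False  # only this channel stalls; other VCs are unaffected
        self.credits[vc] -= 1
        return True

    def credit_return(self, msg_class, vnet, n=1):
        """Receiver freed n buffers for this VC and returned the credits."""
        self.credits[(msg_class, vnet)] += n


tx = CreditSender(credits_per_vc=1)
print(len(tx.credits))            # 18 virtual channels
print(tx.try_send("SNP", "VN0"))  # True: uses the only SNP/VN0 credit
print(tx.try_send("SNP", "VN0"))  # False: snoops on VN0 must wait
print(tx.try_send("DRS", "VN0"))  # True: data responses are unaffected
```

The last lines show why separate channels help: exhausting snoop credits stalls only snoops, so a data response can still make progress instead of queuing behind a blocked message.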

Intel QPI provides all this while still maintaining compatibility with pre-existing Intel architectural legacy features. For instance, it supports critical chunk (delivering the requested chunk of a cache line first), provides a VLW (Virtual Legacy Wire) interface that mimics the effects of side-band signals such as INTR, A20M, and SMI, and defines a scheme for atomicity via locks. Intel QPI also defines how requests interact with the PAM regions and the results of accesses to alternative memory types, such as uncacheable regions.

In conclusion, the Intel QuickPath Interconnect revolutionizes Intel interconnect technology. It delivers substantially higher performance than existing bus technology along with a wide array of new features, all while maintaining legacy compatibility. We refer the interested reader to the public specification for further information.

Acknowledgments

We would like to thank Intel Corporation for encouraging this article describing the legacy requirements placed on the Intel QuickPath Architecture. In addition we gratefully acknowledge the help and time of many of our colleagues who contributed to this work, in particular Malini Bhandaru, Robert Maddox, Jeffrey Gilbert, Leslie Xu, Jeff Casazza and Gurbir Singh. Michelle also thanks Mark Hill and David Wood for giving her so many opportunities to practice writing papers.

