The Physical Layer
This layer of the interface defines the operation and characteristics of the individual signals of an Intel QPI link. High-speed differential signaling is used, with 20 differential pairs in one direction creating one link. A clock lane accompanies the set of 20 data lanes. One link in each direction completes the connection. Figure 2 shows the signals of two Intel QPI links, forming a link pair between the two devices.
With the currently defined physical layer, 84 pins are used to carry all the signals of one Intel QPI link operating at its full width. In some applications, the link can also operate at half or quarter width in order to reduce power consumption or work around failures. The unit of information transferred in each unit of time by the physical layer is termed a "phit", which is short for physical unit. In the example shown in Figure 2, each phit would contain 20 bits of information. Current products typically run the link at 6.4 GT/s in systems with short traces between components, and at 4.8 GT/s over the longer traces found in large multiprocessor systems.
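The figures in this section can be checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are assumptions, and the 84-pin total is reproduced here by counting 20 data pairs plus one clock pair in each of the two directions, which is one reading that matches the number quoted above. The rate it computes is the raw signaling rate in one direction, not usable payload bandwidth.

```python
# Illustrative arithmetic; all names here are hypothetical, not Intel terminology.
DATA_LANES = 20      # differential data pairs per direction (from the text)
CLOCK_LANES = 1      # one clock pair accompanies the data lanes (from the text)
PINS_PER_PAIR = 2    # a differential pair occupies two pins
DIRECTIONS = 2       # one link in each direction

# Lanes in use at each operating width; each phit carries one bit per lane.
WIDTH_LANES = {"full": 20, "half": 10, "quarter": 5}


def signal_pins():
    """Pins needed for data and clock pairs, counting both directions."""
    return (DATA_LANES + CLOCK_LANES) * PINS_PER_PAIR * DIRECTIONS


def raw_gbit_per_s(gigatransfers_per_s, active_lanes=DATA_LANES):
    """Raw bits per second in one direction, in Gbit/s (one phit per transfer)."""
    return gigatransfers_per_s * active_lanes
```

Under these assumptions, signal_pins() gives 84, and a full-width link at 6.4 GT/s moves 128 Gbit/s of raw phit data in each direction.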
The physical layer is divided into two sections. The analog or electrical section manages the transmission of the digital data on the traces. This section drives the appropriate signal levels with the proper timing relative to the clock signal and then recovers the data at the other end and converts it back into digital data.
The logical portion of the physical layer interfaces with the link layer and manages the flow of information back and forth between them. It also handles initialization and training of the link and manages the width of operation.
The Link Layer
The Intel QPI link layer controls the flow of information across the link and ensures that the information is transferred reliably. It also abstracts the physical layer into independent message classes and virtual networks that are required for the upper layers of the interface. We will look at each of these functions of the link layer in some detail below.
Two connected link layers communicate with each other at the granularity of a "flit", which is an acronym for flow control unit. In Intel QPI, a flit is always 80 bits of information. Every flit contains 72 bits of message payload and 8 bits of CRC. The size of the flit is independent of the width of the physical link. The physical layer manages the transformation between flits and phits transparently. Figure 3 shows the subdivision of the 20 possible lanes into four quadrants (labeled Q_0 to Q_3 in the figure) of 5 lanes each. Flits are mapped onto the available physical lanes by the physical layer.
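The relationship between flits and phits can be sketched as follows. This is a simplified model, not the actual lane mapping: the function names are hypothetical, and the bit ordering (lowest bits sent first) is an assumption made only so the example round-trips.

```python
FLIT_BITS = 80      # fixed flit size (from the text)
PAYLOAD_BITS = 72   # message payload per flit (from the text)
CRC_BITS = 8        # CRC per flit (from the text)


def phits_per_flit(active_lanes):
    """Number of phits needed to carry one flit at a given link width."""
    if active_lanes <= 0 or FLIT_BITS % active_lanes:
        raise ValueError("lane count must divide the flit size evenly")
    return FLIT_BITS // active_lanes


def split_flit(flit, active_lanes):
    """Split an 80-bit integer into phits, lowest bits first (assumed order)."""
    mask = (1 << active_lanes) - 1
    count = phits_per_flit(active_lanes)
    return [(flit >> (i * active_lanes)) & mask for i in range(count)]


def join_phits(phits, active_lanes):
    """Reassemble the flit from phits produced by split_flit."""
    flit = 0
    for i, phit in enumerate(phits):
        flit |= phit << (i * active_lanes)
    return flit
```

With this model a flit takes 4 phits at full width (20 lanes), 8 at half width (10 lanes), and 16 at quarter width (5 lanes), while the link layer sees the same 80-bit flit in every case.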
The link layer handles the flow of data and ensures that only as much data is sent to the link as the receiving agent can accept without overruns. It uses a credit exchange mechanism in which the two agents on a link exchange information about the number of buffers (and hence credits) they support. The link layer counts down the credits for each data entity it sends, ensuring it will not overrun the receiver. The receiver returns credits to the sender as it frees up its buffers.
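The credit mechanism described above can be sketched in a few lines. This is a minimal model under stated assumptions: the class and method names are hypothetical, one credit stands for one receiver buffer, and credit return is shown as a direct method call rather than a message on the link.

```python
from collections import deque


class Receiver:
    """Receiving agent with a fixed number of buffers."""

    def __init__(self, buffers):
        self.capacity = buffers
        self.queue = deque()

    def accept(self, item):
        if len(self.queue) >= self.capacity:
            raise RuntimeError("receiver overrun")  # must never happen
        self.queue.append(item)

    def drain(self, sender, count=1):
        """Free up to `count` buffers and return that many credits."""
        freed = 0
        while self.queue and freed < count:
            self.queue.popleft()
            freed += 1
        sender.return_credits(freed)
        return freed


class Sender:
    """Sending agent that spends one credit per item sent."""

    def __init__(self, initial_credits):
        self.credits = initial_credits  # learned from the receiver at setup

    def send(self, item, receiver):
        if self.credits == 0:
            return False  # hold the item until credits come back
        self.credits -= 1
        receiver.accept(item)
        return True

    def return_credits(self, count):
        self.credits += count
```

Because the sender never transmits without a credit, the overrun check in the receiver should never fire; the sender simply stalls until buffers are freed.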
The link layer also ensures that data is transferred reliably across the link. Every flit received by the link layer is checked for errors and groups of flits are acknowledged if they are free of errors. Otherwise the receiving link layer requests retransmission of the flit with errors and all flits subsequent to it that may have been transmitted. Both short transient errors, and burst errors that affect several flits, can be corrected by this means. Moreover, the order of transmission of the flits is maintained.
The link layer abstracts the physical link of Intel QPI into a set of message classes that operate independently of each other. The message classes are very similar to the different types of mail handled by the post office. You can request the post office to send a letter as ordinary first class mail, or registered mail, or even as express mail. You may also have different types of things to send, a letter and a large package for example, and you may wish to use different delivery mechanisms for the different types. Each one of these classes of mail is handled independently of the other, very much like the message classes of the Intel QPI link layer. We will briefly introduce the message classes here without going into more details of their operation and usage. The six message classes are: Home (HOM), Data Response (DRS), Non-Data Response (NDR), Snoop (SNP), Non-Coherent Standard (NCS), and Non-Coherent Bypass (NCB).
The link layer extends the notion of message classes to another level. One collection of the six message classes is called a virtual network. Intel QPI supports up to three independent virtual networks in a system. These are labeled VN0, VN1, and VNA. A basic one- or two-processor system can be implemented with just two networks, typically VN0 and VNA. All three networks are typically used in multiprocessor systems, which use the extra networks to manage traffic loading across the links, avoid deadlocks, and work around link failures.
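One way to picture the combination of message classes and virtual networks is as a grid of independent channels. The sketch below only enumerates the names given in the text; the enum types and the channels helper are hypothetical, and real hardware may allocate buffers to these combinations quite differently.

```python
from enum import Enum


class MessageClass(Enum):
    HOM = "Home"
    DRS = "Data Response"
    NDR = "Non-Data Response"
    SNP = "Snoop"
    NCS = "Non-Coherent Standard"
    NCB = "Non-Coherent Bypass"


class VirtualNetwork(Enum):
    VN0 = "VN0"
    VN1 = "VN1"
    VNA = "VNA"


def channels(networks):
    """List every (virtual network, message class) pair for the given networks."""
    return [(vn, mc) for vn in networks for mc in MessageClass]
```

A small system using only VN0 and VNA would have 12 such pairs in this model, and a system using all three networks would have 18.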
The Routing Layer
This layer directs messages to their proper destinations. Every packet on an Intel QPI link contains an identifier of its intended destination. The routing layer logic contains a number of routing tables that indicate which physical link of a processor is the best route to a particular destination. These tables essentially reflect the physical topology of the system. Whenever the link layer hands a message to the routing layer, the routing layer looks up the destination address in the tables and forwards the message accordingly. All messages directed at caching or home agents in a local component are sent to corresponding internal elements. Messages destined for agents in other sockets are sent down the appropriate Intel QPI links identified in the tables.
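The table lookup described above can be sketched as a simple mapping from destination to outgoing link. The node identifiers, link names, and class name below are hypothetical; the real table format is not described in this text.

```python
LOCAL = "local"  # marker for messages delivered to agents inside this component


class RoutingLayer:
    def __init__(self, local_node, table):
        self.local_node = local_node
        self.table = table  # destination node id -> outgoing link name

    def route(self, destination):
        """Return where a message for `destination` should be sent."""
        if destination == self.local_node:
            return LOCAL  # hand to the internal caching or home agent
        return self.table[destination]


# Hypothetical four-socket example, seen from socket 0. Socket 3 is reached
# through the same link as socket 1, reflecting the physical topology.
socket0 = RoutingLayer(local_node=0, table={1: "link0", 2: "link1", 3: "link0"})
```

In this example socket0.route(0) stays local, while a message for socket 2 leaves on "link1".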
The routing tables are set up by the firmware when the system is first powered up. Typical small systems will usually run with these values unchanged. Multiprocessor systems typically have more elaborate routing tables that contain information about alternative paths to reach the same destination. These can be used to help redirect traffic around a link that is heavily loaded. Fault resilient multiprocessor systems can also use this information to work around failures in one or more links.
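A routing table with alternative paths might be modeled as an ordered list of candidate links per destination. This is a sketch under assumptions: the function name, the table layout, and the idea of passing in a set of failed links are all illustrative, not the documented mechanism.

```python
def pick_link(routes, destination, failed_links=frozenset()):
    """Return the first usable link for a destination, in order of preference."""
    for link in routes[destination]:
        if link not in failed_links:
            return link
    raise LookupError(f"no working route to {destination}")


# Hypothetical table: the preferred link first, then an alternative path.
routes = {"socket3": ["linkA", "linkB"]}
```

With no failures the preferred link is chosen; if "linkA" is marked as failed, the same lookup falls back to "linkB". The same structure could be used to steer traffic away from a heavily loaded link by treating it as temporarily unavailable.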
The routing layer can also help to partition and reconfigure multiprocessor systems into several smaller systems that logically operate independently of each other while sharing some of the same physical resources. Intel offers Xeon and Itanium processors for high reliability servers that implement several of these features in their routing layers.
The Protocol Layer
The protocol layer is the highest layer in the Intel QPI hierarchy. A primary function of this layer is to manage the coherence of data in the entire system by coordinating the actions of all caching and home agents. The protocol layer also has another set of functions to deal with non-coherent traffic. Intel QPI uses the MESI protocol for cache coherence, but also adds a new state labeled Forward (F) to allow fast transfers of Shared data. Hence the term MESIF better identifies the coherent protocol.
Intel QPI offers flexibility in the way cache coherence is managed in a typical system. Proper cache coherence management is a responsibility distributed across all the home and caching agents within the system; each has a part to play. There are some operational choices that can be made. Cache coherence snooping can be initiated by the caching agents that request data, and this mechanism is called source snooping. This method is best suited for small systems that require the lowest latency to access the data in system memory. Larger systems can be designed to rely upon the home agents to issue snoops. This is termed the home snooped coherence mechanism. The latter can be further enhanced by adding a filter or directory in the home agent that helps reduce the cache coherence traffic across the links. Let us look at each of these methods of handling cache coherency and revisit our legal team to see how they would use each one of these mechanisms for their work.
A quick recap is in order: Our legal team of Robert, Janice, Patty, and Tom are all collaborating on a common document. They can keep local copies of the various pages of the document and in our analogy they operate as the caching agents in an Intel QPI based computer system. Mary handles the repository of the entire document and is therefore serving as the home agent.
The team decides that time is of the essence and they should come up with a mechanism that lets them get at the pages of interest as quickly as possible. So when Robert decides he needs page eleven of the document, he asks Mary for a copy from the central repository. However, he simultaneously sends messages to each of his peers to see if they might have copies of the page. If any one of them has a Modified copy, that person will send it to Robert and let Mary know she or he has done so. So Robert gets the page in two messages, or hops in Intel QPI terms. This is the source snooped method of managing cache coherence, as the source of the request for data also sends snoop messages to all peer caching agents.
This is a good time to revisit the Forward (F) state. In the example above Robert requests page eleven from Mary and lets each peer know that he is looking for that page. Normally, if any one of Robert's peers had the page in the Shared state, they would let Mary know of that fact and Mary would send a copy of the page to Robert. This would take a while, as Mary has to go through the entire repository to find the page, not unlike the long access time to read the contents of DRAM in a computer system. However, if Patty does have a copy, she can send it directly to Robert, as she can very quickly find it in her small stack of cached pages. If both Patty and Tom had copies of page eleven, one of them would be marked as the designated Forwarder so that Robert would not get two copies of the same page. The Forward (F) state is used to designate one of the caching agents (among those with data in the Shared state) as the one responsible for forwarding data on to the next requestor. This reduces the time to get data to the destination and therefore improves system performance.
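The role of the designated Forwarder can be sketched with a small state table for one page. The function name and data layout are hypothetical. Which agent holds the F state after the transfer is not stated in the text; this sketch assumes it passes to the newest requestor, which is how MESIF is commonly described.

```python
def read_shared(states, requester):
    """Serve a read of a shared page and return who supplies the data.

    `states` maps each caching agent to its state for this page:
    'M', 'E', 'S', 'I', or 'F'.
    """
    forwarders = [agent for agent, state in states.items() if state == "F"]
    if len(forwarders) > 1:
        raise ValueError("at most one agent may hold the Forward state")
    if forwarders:
        supplier = forwarders[0]   # fast path: cache-to-cache transfer
        states[supplier] = "S"     # assumption: old forwarder drops to Shared
    else:
        supplier = "home"          # slow path: the home agent reads memory
    states[requester] = "F"        # assumption: newest holder becomes Forwarder
    return supplier


# Patty and Tom both hold page eleven; Patty is the designated Forwarder.
page_eleven = {"Patty": "F", "Tom": "S", "Robert": "I"}
```

Calling read_shared(page_eleven, "Robert") names Patty as the single supplier, so Robert receives exactly one copy even though two peers hold the page.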
The team can use an alternate method where Mary at the repository of all the pages is responsible for checking with all the others for any cached copies of pages. So when Robert requests a page from Mary, she first sends snoop messages to all of Robert's peers to see if any of them have a copy of page eleven. If none of them do, they let Mary know of that fact by their own messages and Mary can send the page to Robert from her central repository. However, if Janice had a copy of the page and she had changed it, Janice would send it directly to Robert and let Mary know she has done so. Janice would also let Robert know that she has kept a copy of the page so that Robert can mark his copy of the page as Shared. In this example Mary had to initiate all the snoops, meaning she was using a home snooped approach, and the page reached Robert after three hops.
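The difference in hop count between the two approaches can be made concrete by listing the messages in each exchange. This sketch covers only the case from the examples in which one peer holds a changed copy of the page; the function names and the tuple layout (hop, sender, receiver, kind) are assumptions made for illustration.

```python
def source_snoop(requester, home, peers, holder):
    """The requester asks the home agent and snoops its peers in the same hop."""
    messages = [(1, requester, home, "request")]
    messages += [(1, requester, peer, "snoop") for peer in peers]
    messages.append((2, holder, requester, "data"))
    messages.append((2, holder, home, "notify"))
    return messages


def home_snoop(requester, home, peers, holder):
    """The requester asks only the home agent, which then issues the snoops."""
    messages = [(1, requester, home, "request")]
    messages += [(2, home, peer, "snoop") for peer in peers]
    messages.append((3, holder, requester, "data"))
    messages.append((3, holder, home, "notify"))
    return messages


def hops_until_data(messages):
    """Hop number at which the requester receives the page."""
    return min(hop for hop, _, _, kind in messages if kind == "data")
```

For Robert, Mary, and peers Janice, Patty, and Tom, with Janice holding the page, the source snooped exchange delivers the data on hop 2 and the home snooped exchange on hop 3, matching the counts in the examples.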
Let us extend our example a bit and introduce the notion of a directory. Tom is called away on a business trip and he takes his copies of the pages of the contract with him. When Robert requests page eleven from Mary, Mary has to ask all the others for the status of their caches regarding page eleven, just as before. However, contacting Tom takes a long time and this is becoming a bottleneck for the whole team. So Mary decides that she is going to keep a list of what pages she has handed out to whom, in effect creating a directory of the state of sharing of pages of the document. So when Robert requests page eleven, Mary checks her directory and immediately knows whom to contact about that page. If Mary's directory indicates that Tom does not have a copy of page eleven, she can give Robert the copy of the page and let him proceed with his work. The directory is an excellent way to reduce the number of snoops that have to be sent across the system. A directory can also be used to manage traffic to agents with long latencies, such as those in remote nodes in large systems with multiple nodes. Intel QPI provides all the rules necessary to create such systems and have them operate effectively.
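Mary's list can be sketched as a mapping from page number to the set of agents that hold a copy. The class and method names are hypothetical, and a real directory would also track sharing state and handle evictions, which this sketch leaves out.

```python
from collections import defaultdict


class HomeDirectory:
    """Home agent bookkeeping: which caching agents hold which pages."""

    def __init__(self, agents):
        self.agents = set(agents)
        self.holders = defaultdict(set)  # page -> agents with a copy

    def record_grant(self, page, agent):
        """Note that `agent` has been handed a copy of `page`."""
        self.holders[page].add(agent)

    def snoop_targets(self, page, requester):
        """Only the agents the directory lists as holders need to be snooped."""
        return sorted(self.holders[page] - {requester})

    def broadcast_targets(self, requester):
        """Without a directory, every other agent would have to be snooped."""
        return sorted(self.agents - {requester})
```

If the directory shows that only Janice holds page eleven, a request from Robert produces one snoop instead of three, and a request for a page nobody holds produces none, so a slow-to-reach agent such as Tom is not contacted unnecessarily.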
Our examples above lay out the basic mechanisms of cache coherence management in Intel QPI.