Intelligent Cache
With an increase in transistor density, manufacturers such as Intel can build significantly more cache for each core. This increases the probability that each execution core can access data from the faster, more efficient cache subsystem. Advanced parallelism in the microarchitecture then optimizes the use of that cache to reduce the latency of accesses to frequently used data.
Each execution core now has a dedicated L1 cache for data specific to that core. Because more data is available locally, fewer fetches go outside the processor and traffic on the system bus is reduced, lowering memory latency and accelerating data flow. All cores then share a larger L2 cache for common data, making better use of the available cache resources.
The advanced parallelism takes work traditionally done at the level of the overall processor architecture and performs it at a finer grain: at the core level, the core-to-core level, and the memory level. Because this approach requires fewer hardware elements in the server platform, power requirements are also reduced. The result is greater performance with greater energy efficiency.
Dynamic Allocation of L2 Cache
Another optimization Intel uses is dynamic allocation of the shared L2 cache, based on each core's requirements. Each core can now dynamically use up to 100 percent of the available L2 cache. If one core has minimal cache requirements, the other can dynamically claim a larger share of the L2 cache (Figure 3). This helps decrease cache misses and reduce latency.
Dynamic allocation also lets each core obtain data from the L2 cache at higher throughput than in previous-generation architectures. The result is greater processor efficiency: higher absolute performance as well as better performance per watt, a critical benefit for servers.
Challenges and Approaches to Memory Access
No matter how much cache is put in the system, data must still be fetched from main memory to fill that cache. The industry has explored many techniques to speed up main memory access, from designing a hardware-based memory controller into the processor, to optimizing memory access through more flexible designs and methodologies.
Each set of techniques has its benefits. However, committing to a single hardware-based memory technology means a design cannot easily take advantage of newer, more advanced techniques for improving memory access. The better designs use architectures flexible enough to support multiple memory technologies and meet a broad range of system requirements.
These advanced designs use intelligent memory access to optimize the use of the available bandwidth from the memory subsystem and to hide the latency of memory accesses, ensuring that data can be used as quickly as possible and is kept as close as possible to where it is needed. The net effect is lower latency and significantly more efficient memory access.