Intel Atom Processor Features
- Intel's 45 nm technology, based on a Hafnium, high-K metal gate formula, is designed to reduce power consumption, increase switching speed, and significantly increase transistor density over previous 65 nm technology.
- Multiple micro-ops per instruction are combined into a single micro-op and executed in a single cycle, resulting in improved performance and power savings.
- In-order execution core consumes less power than out-of-order execution.
- Intel Hyper-Threading Technology (Intel HT Technology; 1.6-GHz version only) provides high performance-per-watt efficiency in an in-order pipeline. Intel HT Technology provides increased system responsiveness in multitasking environments. One execution core is seen as two logical processors, and parallel threads are executed on a single core with shared resources.
The evolution of low power Intel architecture was realized through a number of technological advances and some common-sense power budget analysis. The power considerations of typical embedded platforms break down into two key areas: heat dissipated and average power consumed.
Using the Intel Pentium M processor, the analysis focused on identifying the main power consumers within the instruction pipeline. This is a 14-stage, 3-way superscalar pipeline whose instruction execution engine is based on an out-of-order execution scheduler. This analysis highlighted not only the large amount of power required for the execution scheduler logic but also significant power consumed by the ancillary logic, which optimizes instruction flow to the scheduler.
The pipeline stages were deconstructed and rebuilt as a 2-way superscalar, in-order pipeline, allowing many of the power-hungry stages to be removed or reduced. This led to a power saving of over 60 percent compared to the Intel Pentium M processor, as Figure 1 illustrates. The next step was to examine the delivery of instructions and data to the pipeline. This highlighted two major elements: the caches and the front side bus (FSB).
The L2 cache was designed as an 8-way associative 512-KB unit, with the capability of reducing the number of ways to zero through the use of dynamic cache sizing to save power. L2 pre-fetchers are implemented to maintain an optimal placement of data and instructions for the processor core.
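To make the effect of dynamic cache sizing concrete, the following minimal sketch computes the L2 capacity that remains as ways are switched off. It assumes the eight ways are equal in size (64 KB each) and can be disabled in arbitrary steps; the text does not state the actual granularity, and the function name `active_l2_kb` is purely illustrative.

```python
CACHE_SIZE_KB = 512  # total L2 capacity, from the text
WAYS = 8             # associativity, from the text


def active_l2_kb(active_ways: int) -> int:
    """Capacity left when only `active_ways` of the ways are powered.

    Assumption: all ways are the same size, so each holds 512 / 8 = 64 KB.
    """
    if not 0 <= active_ways <= WAYS:
        raise ValueError(f"active_ways must be between 0 and {WAYS}")
    return CACHE_SIZE_KB // WAYS * active_ways


# Walk from the full cache down to zero ways in illustrative steps of two.
for ways in range(WAYS, -1, -2):
    print(f"{ways} ways active: {active_l2_kb(ways)} KB")
```

With zero ways active the cache holds no data at all, which is the state the text describes as the lower limit of the sizing range.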
The FSB interface connects the processor and system controller hub (SCH). The FSB was originally designed to support multiprocessor systems, where the bus could extend to 250 mm and carry up to four loads; this is reflected in the choice of logic technology used in the I/O buffers. AGTL+ logic, while providing excellent signal integrity, consumes a relatively large amount of power. A CMOS FSB implementation was found to be better suited to low power applications, consuming less than 40 percent of the power of an AGTL+ interface.
One of the key enabling technologies for low power Intel architecture was the transition in manufacturing process to 45-nm High-K metal gate transistors. As semiconductor process geometries shrink, the materials used in the manufacture of transistors have come under scrutiny, particularly because of leakage through the SiO2 gate oxide. To implement 45-nm transistors effectively, a material with a high dielectric constant (High-K) was required. One such material is based on Hafnium (Hf), which provides excellent transistor characteristics when coupled with a metal gate.
In embedded systems, in-order pipelines can suffer from stalls caused by memory access latency. The resolution of this problem came from an unusual source. Intel HT Technology enables the creation of logical processors, within a single physical core, that are capable of executing instructions independently of each other. Because the logical processors share physical resources, Intel HT Technology uses the time in which one logical processor is stalled to issue instructions from the other, so the execution pipelines remain active for a much longer period of time. The Intel Atom processor can use Intel HT Technology on its two execution pipelines to increase performance by up to 30 percent on applications that can make use of the multi-threaded environment.
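The way a second logical processor fills stall cycles can be illustrated with a toy model. The sketch below is not a model of the Intel Atom pipeline: it assumes a single shared issue slot (the real core has two execution pipelines), round-robin thread selection, and an invented workload in which every fourth instruction stalls its thread for three extra cycles. The function `run` and the workload are hypothetical and chosen only to show the mechanism, so the resulting ratio should not be read as the 30 percent figure quoted above.

```python
def run(threads):
    """Return the cycle count needed to finish all threads.

    Each thread is a list of stall lengths: after issuing instruction i,
    that thread cannot issue again for `threads[t][i]` extra cycles.
    Assumption: one in-order issue slot is shared by all threads.
    """
    count = len(threads)
    next_instr = [0] * count   # index of the next instruction per thread
    ready_at = [0] * count     # first cycle at which each thread may issue
    last = count - 1           # thread that issued most recently
    cycle = 0
    while any(next_instr[t] < len(threads[t]) for t in range(count)):
        # Round-robin: start with the thread after the last one that issued.
        for offset in range(count):
            t = (last + 1 + offset) % count
            if next_instr[t] < len(threads[t]) and ready_at[t] <= cycle:
                ready_at[t] = cycle + 1 + threads[t][next_instr[t]]
                next_instr[t] += 1
                last = t
                break
        cycle += 1
    # A thread is finished once its final stall has drained.
    return max(ready_at)


# Invented workload: every fourth instruction misses and stalls for 3 cycles.
work = [0, 0, 0, 3] * 25

serial = run([work]) + run([work])   # two threads, one after the other
interleaved = run([work, work])      # two logical processors sharing the slot

print(f"back to back: {serial} cycles")
print(f"interleaved:  {interleaved} cycles")
```

In this model the interleaved run finishes sooner because one thread issues instructions while the other waits on its stall; with a workload that never stalls, the two runs would take the same number of cycles, since the issue slot itself is shared.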
To maximize the performance of the pipeline, the Intel compiler has added "in-order" extensions, which allow a performance improvement of up to 25 percent compared with code compiled using standard flags.
The Intel Atom processor supports the SIMD extensions that have long been part of the standard IA-32 instruction set, up to Intel Streaming SIMD Extensions 3.1 (Intel SSE3.1). These instructions can be used to implement many media and data processing algorithms, which were traditionally considered the domain of the DSP. The SSE instructions are executed in dedicated logic within the execution pipeline.
Delivering a low power processor on its own does not necessarily meet the needs of an embedded low power market, where low power, small platform footprint, and low chip count tend to be the cornerstones of a typical design.
To address this, the Intel Atom processor is paired with an Intel System Controller Hub (Intel SCH), which integrates the traditional memory controller, graphics, and I/O complex into a single chip attached to the Intel Atom processor over a 400-MHz/533-MHz FSB. Figure 2 shows a typical Intel Atom processor platform.
To meet the need for a small footprint, the processor and chipset are offered in ultra-small packages, measuring 13 mm x 14 mm and 22 mm x 22 mm respectively. This footprint enables complete platforms to be developed with an area of less than 6000 mm².
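As a quick check on these figures, the sketch below adds up the two package areas and compares the total with the 6000 mm² platform figure. It reads the chipset dimensions as 22 mm by 22 mm, and the variable names are illustrative only; the remaining board area is what would be left for memory, power delivery, connectors, and other components, which the text does not break down.

```python
PROCESSOR_MM = (13, 14)      # processor package, from the text
CHIPSET_MM = (22, 22)        # chipset package, read as 22 mm x 22 mm
PLATFORM_AREA_MM2 = 6000     # upper bound on platform area, from the text


def area(dims):
    """Area in square millimetres of a (width, height) package."""
    width, height = dims
    return width * height


package_area = area(PROCESSOR_MM) + area(CHIPSET_MM)
share = package_area / PLATFORM_AREA_MM2

print(f"processor package: {area(PROCESSOR_MM)} mm2")
print(f"chipset package:   {area(CHIPSET_MM)} mm2")
print(f"both packages:     {package_area} mm2 ({share:.1%} of the platform)")
```

The two packages together take up a small fraction of the quoted platform area.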
The Intel System Controller Hub continues the delivery of features and attributes suitable for the low power embedded market. The main features of the Intel System Controller Hub are described below:
- The memory interface is a single-channel, 32-bit DDR2 interface, locked to the FSB speed and capable of implementing un-terminated memory-down solutions of up to 2 GB.
- Closely coupled to the memory controller is the 3D graphics subsystem, sharing system memory in a Unified Memory Architecture (UMA) configuration.
- The graphics controller offers respectable 3D performance and can also completely decode a range of video streams in hardware (MPEG 2 and 4, H.264, WMV9/VC1, and others), removing this task from the main processor core.
- The graphics controller can output two simultaneous independent streams using an LVDS and an sDVO interface. These display interfaces may be configured using the embedded graphics driver configuration tool.
Embedded applications are usually defined by their I/O requirements. The Intel SCH provides the designer with a range of interfaces: USB ports, which may operate in Host or Client mode; SDIO/MMC controllers supporting a wide range of card types; and an eIDE P-ATA controller, which enables the use of the latest solid state drives (SSDs). The P-ATA controller gives the designer a storage interface that can easily be switched in and out of low power states, whereas SATA interfaces require more link management and cannot easily be turned on and off when not in use. In addition to the integrated features, the Intel SCH offers two PCI Express x1 ports for further expansion.