Itanium 2 Developer Days Diary

The Intel Itanium microarchitecture is fundamentally different under the hood than other processors. It "thinks" differently.


April 27, 2006
URL:http://www.drdobbs.com/parallel/itanium-2-developer-days-diary/187000299

This article was inspired by the Itanium Solutions Alliance Developer Days, an ongoing series of conferences sponsored by the Itanium Solutions Alliance and intended to give creators of Windows- and Linux-based applications hands-on knowledge of how to port applications to Itanium 2-based systems and optimize them for those systems. Developer Days are offered free of charge to interested software developers. The events provide training, technical assistance, and industry-leading tools from Alliance members.

Ten years ago, computer users and software developers on the cutting edge of microprocessor power were faced with disparate hardware, proprietary operating systems, incompatible architectures, and inflexible vendor relationships. That landscape has changed dramatically with the introduction of the Intel Itanium microarchitecture, a high-end, standards-based processor capable of handling the heaviest workloads.

The Intel Itanium microarchitecture is fundamentally different under the hood from other processors. It doesn't just have 64 bits and more on-board memory; it also "thinks" differently, making the compiler do a lot of the code-juggling that x86 and other processors do in silicon. This required that programmers think differently, too. In the early days of Itanium-based systems, this led to a circular problem. Because that mind-shift was slow in coming, and because compilers for the Intel Itanium microarchitecture are somewhat more complex, tools for porting software to Itanium-based systems lagged. This led to the use of poorly optimized code, or else to an excessive reliance on the first Intel Itanium processor's hardware-based 32-bit capabilities. Misperceptions about the performance advantages of Itanium-based systems followed, leading to a lack of interest in porting, and so on, finally leading to much-exaggerated accounts of the first Intel Itanium processor's impending death.

Soon, however, the market spoke--and loudly, too--especially after the release of the Intel Itanium 2 processor with its increased I/O bandwidth, software-based 32-bit capabilities and other new-and-improved features. Scientific and business users have since found high-speed access to large memory caches and superior floating-point performance not only useful but crucial to cracking the toughest computing challenges. The developer community has embraced the smart solutions made available to them by the Intel Itanium 2 processor, resulting in a vast increase in the quality and variety of tools for porting to it. While it has never been easier to move to Itanium 2-based systems, there are many factors to consider before setting out. In this article, I provide an overview of the challenges and solutions inherent in such an effort.

Making Sense of Microarchitecture

The main difference between conventional processors and Intel Itanium 2 microprocessors is Explicitly Parallel Instruction Computing (EPIC), which shifts the responsibility for maximizing parallelism from the processor to the compiler. Unlike processors built on Reduced Instruction Set Computing (RISC) or Complex Instruction Set Computing (CISC) models, an EPIC processor relies on the compiler, which knows that there are multiple execution units, to group parallel-ready instructions into bundles. The processor then executes the bundles in parallel without runtime analysis.
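As a rough illustration of what "grouping parallel-ready instructions" means, the following Python sketch packs a short instruction list into groups whose members have no register dependencies on one another. It is a toy model, not how any Itanium compiler is implemented: the `Instr` tuple, the `group_independent` function, and the register names are all made up for this example, and the three-instruction width is borrowed from the three slots of an IA-64 bundle while ignoring templates and stop bits.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str      # mnemonic, for display only
    dest: str      # register written
    srcs: tuple    # registers read


def group_independent(instrs, width=3):
    """Greedy, in-order packing (deliberately conservative).

    A new group starts when the next instruction reads or writes a
    register touched by the current group, or when the group is full.
    """
    groups, current = [], []
    written, read = set(), set()
    for ins in instrs:
        raw = any(s in written for s in ins.srcs)  # read-after-write
        waw = ins.dest in written                  # write-after-write
        war = ins.dest in read                     # write-after-read
        if current and (raw or waw or war or len(current) == width):
            groups.append(current)
            current, written, read = [], set(), set()
        current.append(ins)
        written.add(ins.dest)
        read.update(ins.srcs)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("mul", "r4", ("r5", "r6")),
    Instr("sub", "r7", ("r1", "r4")),  # needs r1 and r4, so it cannot join them
    Instr("ld", "r8", ("r9",)),
]

for number, group in enumerate(group_independent(program)):
    print(number, [ins.name for ins in group])
```

In this run the first two instructions share no registers and land in the same group; the `sub` reads both of their results, so it starts a new group, which the unrelated `ld` can then join.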

The Leap to EPIC: Architecture Highlights

  • The compiler orchestrates predication, allowing instructions to be executed conditionally and reducing the performance hits caused by branch mispredicts in RISC-based systems.
  • The Intel Itanium 2 compiler recognizes that there are multiple execution units; the compiler groups instructions that can be performed in parallel, making them ready for execution without runtime analysis.
  • The processor's scheduler is in the compiler, allowing the compiler to handle scheduling and produce code that takes full advantage of on-chip resources.
  • Intel Itanium 2 microarchitecture has 128 general-purpose and 128 floating-point registers, versus the 32 general-purpose and 32 floating-point registers found in most RISC-based systems.
  • Intel Itanium 2 processors use only the registers they need rather than the 8 registers that RISC-based systems take whether they need them or not.
  • Intel Itanium 2 microarchitecture has more units that execute instructions.
  • Two-way pipelines pre-load data ahead of possible over-writes, resulting in fewer flushes, fewer problems, and increased reliability and performance.
  • Software pipelining. Combining speculation, explicit parallelism, predicated execution and rotating registers with looping branch instruction allows:

    • Efficiently pipelined loops
    • Smaller code
    • Reduced latency
    • Elimination of copied code for prologue or epilogue
    • Increased parallelism
    • More Level 1 cache memory
    • Shorter wait times
    • Greater I/O bandwidth

—R.D.
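The software-pipelining item in the list above is easier to picture with a small model. The Python sketch below overlaps three stages (load, compute, store) of different loop iterations inside a single kernel loop, and uses a per-stage condition as a stand-in for the stage predicates that let the pipeline fill and drain without separate prologue and epilogue code. The function name `pipelined_scale`, the three-stage split, and the dictionaries standing in for rotating registers are assumptions made for illustration, not a description of actual compiler output.

```python
def pipelined_scale(data, factor):
    """Multiply each element by factor using one overlapped kernel loop."""
    n, stages = len(data), 3
    out = [None] * n
    loaded = {}     # in-flight values between load and compute
    computed = {}   # in-flight values between compute and store
    trace = []      # what each kernel pass actually did
    for t in range(n + stages - 1):   # the kernel runs n + stages - 1 times
        active = []
        # Stages are visited last-to-first so each one consumes a value
        # produced in an earlier pass, as overlapped hardware stages would.
        i = t - 2
        if 0 <= i < n:                # stage predicate for "store"
            out[i] = computed.pop(i)
            active.append(f"store[{i}]")
        i = t - 1
        if 0 <= i < n:                # stage predicate for "compute"
            computed[i] = loaded.pop(i) * factor
            active.append(f"compute[{i}]")
        i = t
        if 0 <= i < n:                # stage predicate for "load"
            loaded[i] = data[i]
            active.append(f"load[{i}]")
        trace.append(active)
    return out, trace


result, trace = pipelined_scale([1, 2, 3], 10)
print(result)
for t, active in enumerate(trace):
    print(t, active)
```

The trace shows the pipeline filling during the first two passes, running all three stages at once in the middle, and draining at the end, all from the same loop body.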

The Intel Itanium 2 microprocessor has other speed-enhancing features in addition to the EPIC paradigm; in most cases, the compiler exploits them automatically. For example, non-EPIC processors use branch prediction to speed up processing. Encountering a code branch, x86 chips don't wait around to find out which way to go. They "guess." Branch prediction algorithms are almost always right, but in the highly branched code relevant to data- and calculation-intensive computing, even a tiny percentage of wrong guesses can add up to big performance hits, because a wrong guess forces the processor to discard the work it did speculatively and start again from the branch.

The Intel Itanium 2 processor does use prediction, but adds predication to avoid misprediction performance hits by running each possible variation of a branch in parallel and tossing the incorrect result. The microprocessor actually contains extra bits which can be set to "true" or "false" for a given predicated instruction. The compiler chooses which branches are suitable for predication and sets the bit. All developers have to do is re-compile for the Intel Itanium 2 processor to make use of predication.
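A minimal Python sketch of the idea follows. It contrasts an ordinary branch with an "if-converted" version in which a compare produces a predicate and its complement, both arms are computed, and only the result guarded by the true predicate is kept. The function names are invented for this example, and the assembly-style lines in the comments are illustrative only; in practice the compiler makes this transformation, not the programmer.

```python
def abs_diff_branching(a, b):
    """Conventional version: the hardware must predict which way the branch goes."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    """If-converted version: no branch, so nothing to mispredict."""
    p1 = a > b          # roughly: cmp.gt p1, p2 = a, b
    p2 = not p1         # the compare also yields the complement
    arm_taken = a - b   # roughly: (p1) sub r = a, b
    arm_other = b - a   # roughly: (p2) sub r = b, a
    # Both arms were computed; only the one whose predicate is true is kept.
    guarded = ((p1, arm_taken), (p2, arm_other))
    return next(value for pred, value in guarded if pred)


for a, b in [(7, 3), (3, 7), (5, 5)]:
    print(a, b, abs_diff_branching(a, b), abs_diff_predicated(a, b))
```

The trade-off is visible even in the toy: the predicated version always does the work of both arms, which pays off only when the arms are short and the branch is hard to predict.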

Optimal use of the Intel Itanium 2 microarchitecture's extensive onboard memory caches is also critical to maximizing performance. "The idea is to arrange program execution so that needed instructions and data are in L1 cache as much as possible," said HP's Dick Nicholson at the Developer Days conference sponsored by the Itanium Solutions Alliance. "In a best case/worst case comparison, a program whose data is always in Level 1 cache when needed will run much faster than a program whose data always has to be fetched from main memory."
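Nicholson's best case/worst case comparison can be reproduced with a toy cache model. The Python sketch below simulates a small direct-mapped cache and feeds it the addresses touched when a two-dimensional array is walked row by row versus column by column. The cache geometry (64 lines of 64 bytes), the array shape, and the `hit_rate` function are arbitrary choices for illustration and do not describe the Itanium 2 cache hierarchy.

```python
def hit_rate(addresses, num_lines=64, line_size=64):
    """Fraction of accesses that hit in a toy direct-mapped cache."""
    tags = [None] * num_lines      # which memory block each cache line holds
    hits = 0
    for addr in addresses:
        block = addr // line_size  # memory block containing this address
        index = block % num_lines  # the only cache line that block may use
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block    # miss: fetch the block, evicting the old one
    return hits / len(addresses)


ROWS, COLS, ELEM = 256, 256, 8     # 256 x 256 array of 8-byte values, row-major

# Same elements, visited in two different orders.
row_major = [(r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS)]
col_major = [(r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS)]

print("row by row:      ", hit_rate(row_major))
print("column by column:", hit_rate(col_major))
```

Walking the array in storage order reuses each fetched line for eight consecutive elements, while the column-order walk jumps 2,048 bytes per access and evicts every line before it can be reused. The data and the arithmetic are identical; only the traversal order differs.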

Figure 1: Intel Itanium 2 microarchitecture.

Porting Makes Sense for Many

Given these hardware-based advantages, scientific and technical users have adopted Itanium 2-based systems where the huge gains in floating-point performance and scalability found immediate application in the sequencing of genomes, quantum physics, weather modeling, and other computationally-intensive settings. Currently, the biggest challenges for these users are migrating custom applications from one platform to another as well as reorganizing enormous datasets to better exploit parallelism.

In the past few years, enterprise business computing has grown far more analytical, data structures more vast, and user populations more immense. The tipping point for Itanium 2-based systems to move out of high-performance computing labs and into the general business community has passed. For both scientific and commercial users, the availability of off-the-rack software optimized for Itanium 2-based systems is hardly an issue today. More than 7,000 applications now run natively on the Intel Itanium 2 processor, with more arriving all the time.

The 64-bit Tipping Point: What's Different on Intel Itanium 2 Microarchitecture?

  • Explicit parallelism through compiler/processor synergy allows Intel Itanium 2 processors to more efficiently execute a larger number of instructions.
  • 16-way Intel Itanium 2 processors show 30 percent, 40 percent and 44 percent better performance than 16-way RISC-based systems in TPC-C, SAP 50 and SPECint_rate benchmark tests, respectively.
  • RAS features including error recovery, internal soft error logic check, Machine Check Architecture, bad data containment, cache reliability, lockstep support, memory SDEC, memory spares and partitioning position Intel Itanium 2 processors for RISC replacement.
  • Enhanced register model provides a large register file, rotating registers and a register stack engine.
  • Floating-point architecture allows for extended precision calculations and better performance for complex calculations.
  • Memory management. 64-bit addressing, speculation and memory hierarchy control.
  • Scalable from 2 to 512 processors. In 4-way and 32-way TPC-C testing, Intel Itanium 2 processors outscale RISC-based systems by 32 percent and 125 percent, respectively.
  • Low voltage. Intel Itanium 2 microarchitecture can optimize power through blades, resulting in lower power consumption and lower-cost systems.

—R.D.

While the market is large and growing, not every software vendor needs to port; Itanium 2-based hardware serves the upper end of the computer market. Those who market applications for business intelligence, database management, computer-aided engineering, process simulation, complex imaging or similarly power-hungry functions stand to profit from the expanding user base of the Intel Itanium 2 microarchitecture.

System manufacturers and OEMs are the final piece of the porting puzzle. The Intel Itanium 2 processor is not limited to a single hardware platform. Many manufacturers including Bull, Hitachi, HP, Fujitsu Siemens Computers, Fujitsu, NEC, SGI and Unisys use it in high-end servers. Anyone manufacturing hardware using Itanium 2-based systems needs to consider porting issues, particularly with regard to drivers which can be tricky to migrate from one system to another.

While the proliferation of porting tools makes the process easier (in some cases almost trivial), migrations must be planned as carefully as any other major IT effort. Operating systems, hardware, availability of code, programming language, the need to port device drivers, cross-platform communication and security are all important considerations.

Key Factors to Consider When Porting to Itanium-based Systems

Moving a program from an early-90s Cray mainframe to an Itanium 2-based system will be harder than moving a program from the Intel Xeon 64-bit processor to an Itanium 2-based system. Luckily, Intel, HP and other vendors offer access programs that give members access to the latest Itanium 2-based hardware, often before it comes out on the market. In addition to the ability to lease systems at a discount, the Intel Early Access Program offers support for training, testing, optimization, and other elements of porting to the Intel Itanium microarchitecture. The Intel Remote Access program lets developers use Itanium-based systems remotely.

Consider the larger ecosystem as well. Will your shiny new Itanium 2-based application have to communicate with other systems? Is there mixing of 32- and 64-bit systems? Are there potential security issues? What level of users will be interacting with the application? Where are data stored, and how are they accessed? These can all be factors in deciding how to port.

Operating Systems

One of the biggest advantages of using servers based on commodity processors like the Intel Itanium 2 is that they can run a variety of operating systems. In fact, because of their virtualization capabilities and processing power, Itanium 2-based systems can often run different operating systems simultaneously. Whether your software is made for OpenVMS, Windows, HP-UX 11, or any of multiple versions of Linux, you can get it onto an Itanium 2-based server or cluster without having to recompile for both a new OS and a new processor. Operating system migrations are often far more complex than the move between processor types and must be dealt with on their own terms before the hardware migration even begins.

Code and Languages

Ideally, you have the complete source code in your hands. Due to differences in pointers and data sizes, you'll still have some work to do to get your code ready for the migration. But with a highly-optimizing Intel Itanium 2 compiler, you stand a much greater chance of getting the most out of the EPIC architecture and other advanced processor features if you have the source code, as well as a full set of post-compile testing and optimization tools from which to choose.
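The pointer and data-size differences are worth a quick look before recompiling. The Python sketch below uses the standard `ctypes` module to report the C type sizes on whatever machine runs it, then shows what happens to a pointer-sized value that is squeezed through a 32-bit integer, a classic bug in code that is not 64-bit clean. The sample address and the `truncate_to_32` helper are hypothetical; the sizes printed depend on the platform's data model (64-bit Linux and HP-UX use LP64, where `long` and pointers are 64 bits, while 64-bit Windows uses LLP64, where `long` stays at 32 bits).

```python
import ctypes


def truncate_to_32(value):
    """Keep only the low 32 bits, as storing into a 32-bit integer would."""
    return value & 0xFFFFFFFF


# Sizes of the C types on the machine running this script.
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8
long_bits = ctypes.sizeof(ctypes.c_long) * 8
int_bits = ctypes.sizeof(ctypes.c_int) * 8
print(f"pointer={pointer_bits} long={long_bits} int={int_bits}")

# A made-up address above the 4 GB line, which only a 64-bit process can hold.
address = 0x0000_6000_0000_1234
roundtrip = truncate_to_32(address)
status = "ok" if roundtrip == address else "upper bits lost"
print(hex(address), "->", hex(roundtrip), status)
```

Code that casts pointers to `int`, or assumes `long` and pointers are the same width on every platform, is the first thing to hunt down when making a 32-bit code base ready for a 64-bit target.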

Applications that rely on runtime environments (RTEs) such as Java or .NET, or on scripting languages like Python and Perl, are the easiest to migrate. Of course, this assumes that the appropriate RTE is available for the Intel Itanium 2 microarchitecture; most are. Native libraries, external components and database drivers that are crucial to the functioning of a managed application may still need to be ported or replaced with compatible equivalents.

If you don't have the source code, you could face the onerous task of rebuilding from the bottom up. Even though the software in question might be outdated and stand to benefit from such an initiative, a complete rewrite could be too expensive or time-consuming. For such situations, Intel has provided the IA-32 Execution Layer (IA-32 EL), a software-based solution for running 32-bit programs on Itanium 2-based systems. While it can't match the performance of natively compiled code, it beats starting from scratch.

Assembly language is not much easier to port to Intel Itanium 2 processors. Assembly for the Intel Itanium 2 microprocessor is radically different from that for other processors, and it can be difficult to take advantage of processor features when coding at the assembly level, so much so that Intel's documentation recommends against coding in assembly.

Compilers

In the open source melee that is Linux, more than one option exists for re-compiling. The two primary compilers for Linux are the GNU Compiler Collection (GCC) and Intel's C++ compiler. Intel also provides a FORTRAN 95 compiler. GCC is available for nearly all Linux and UNIX systems, and is portable across platforms. The Gelato Federation, a global technical community dedicated to advancing Linux on Intel Itanium 2 microarchitecture, has a strong effort afoot (sponsored in part by the Itanium Solutions Alliance) to significantly improve GCC for Itanium 2-based systems.

The Intel C++ Compiler is designed to be highly compatible with GCC, but adds a set of very aggressive optimizations like inlining of math library routines, interprocedural optimizations that inline functions across files, and Profile Guided Optimizations that provide reporting options for further code optimization based on profiling.

Migrating Windows applications from 32-bit x86 to 64-bit Intel Itanium 2 microarchitecture is straightforward. In addition to the Microsoft compilers, Intel offers C/C++ and FORTRAN compilers that groove to the unique qualities of the Intel Itanium 2 microarchitecture and integrate with Microsoft Visual Studio and Visual Studio .NET environments. The Intel compilers are switch-compatible with Microsoft compilers.

In both cases, code needs to be "cleaned" before porting. Tutorials on how to go about making sure your Windows or Linux code is 64-bit clean can be found all over the Web, including the Intel, HP, and Gelato web sites. Intel Itanium 2 processors also run a variety of other operating systems, each with unique portability issues which an organization must address before a porting project gets underway.

Drivers

Drivers require special consideration because of their intimate relationship with the hardware platform. This is yet another area in which the Intel Itanium 2 processor breaks new ground, this time with the Extensible Firmware Interface (EFI) which offers freedom from the decades-old shackles of PC BIOS.

With EFI, the ideal of platform independence for hardware devices is closer to reality. The standardized driver model means, for instance, that changing a keyboard bus from PC/AT to USB on Intel Itanium 2 microarchitecture does not require any application code changes if developers follow the EFI standard. Other features of EFI include a pre-OS boot point in system startup, meaning that actions can be triggered without booting an operating system. A libc utilities interface to the system removes the need to create stand-alone utilities. Best of all is the shorter learning curve compared to programming drivers for PC BIOS.

32-bit Support

In addition to running fully native 64-bit applications, the Intel Itanium 2 microarchitecture provides x86 compatibility via the IA-32 Execution Layer (IA-32 EL), a software-based translator replacing the less-efficient, less-flexible, hardware-based 32-bit support of the original Intel Itanium processor. IA-32 EL is usually faster than hardware-based x86 support because it translates frequently used code to native IA-64 instructions "live," while the program runs.

The IA-32 EL provides performance scaling as processor performance increases as well as the flexibility to add enhancements to the operation of 32-bit applications on Itanium 2-based systems. Both Windows and Linux operating systems running on Itanium 2-based systems support IA-32 EL, greatly broadening the range of IA-32 applications that run well on such systems.

While in nearly every case a fully native 64-bit port results in the greatest performance benefits, it's not always practical to port everything at once. Achieving near-native Itanium 2-based server performance while migration projects are in progress is possible with a hybrid approach in which primary software like drivers and major applications are ported first while secondary programs continue to run in 32-bit. The IA-32 EL also facilitates communication between 64- and 32-bit applications and may provide the ability to continue running programs for which source code is not available.

Conclusion

Plan well and the rest of the process is the same as any software development effort--tighten up the code, compile it with the right tools and test it. The optimization stage is critical. To assist with the process, the processor itself provides an advanced set of performance monitoring tools through the Performance Monitoring Unit (PMU), which can give developers detailed information on more than 140 runtime events such as cache misses, pipeline stalls and branch mispredicts. The PMU makes it possible to perform detailed application analysis without affecting the performance of the application.

Many software-based optimization tools are also available. The Intel VTune Performance Analyzer, for example, can isolate specific lines of code that cause performance problems without requiring special monitoring code to be embedded in the program. VTune shares an interface with Intel's threading tools, so developers don't have to learn yet another interface as they start writing multi-threaded code.

Today, the computing world embraces the advantages provided by 64-bit computing. More than half of the Global 100 Corporations already run Itanium 2-based systems, and Intel Itanium 2 processors are at work in more than 70,000 enterprise systems. The Itanium Solutions Alliance brings investment protection through increased geographical coverage, strong developer support, and a growing portfolio of hardware and software.

The next-generation Intel Itanium 2 processor includes a dual core that supports multithreading, better power management, enhanced virtualization, new performance acceleration features, and even greater reliability than the current version. Codenamed "Montecito," this processor is based on 90nm process technology and boasts more than 1.7 billion transistors and a 24MB cache. The processor further enhances parallel processing capabilities and increases the performance benefits for managed code, while providing end users with a 2x performance increase and a greater than 2.5x increase in power efficiency. With multi-core processors for single- and dual-processor servers shortly to follow, porting to Intel Itanium 2 microarchitecture now is an investment for the long-term success of any organization that needs world-class computer performance.

Robin Drummond is president of the Itanium Solutions Alliance. She can be contacted at [email protected].
