Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



Books in Brief



The Software Optimization Cookbook
Richard Gerber ([email protected])
280 pages + CD-ROM
Intel Press
$49.95
ISBN: 0971288712
www.intel.com/intelpress/swoptcookbook/

The Software Optimization Cookbook, by Richard Gerber, is a deep and insightful look into how the implementation details of your microprocessor interact with C programming techniques. Specifically, the optimizations address algorithms and how they interact with memory access, branching, SIMD instructions, multiple threads, and floating-point calculations. As you might expect, many of the optimizations are CPU specific and are tuned to the Pentium 4 architecture. In fact, a better title for this book might have been Optimizing Pentium 4 in C. With the Pentium 4 gaining market share (according to News.com, Pentium 4 unit shipments more than doubled during the fourth quarter of 2001 to about 15 million), now is the time to tune your app for performance on this processor. Since this is an Intel Press book, it of course fails to mention AMD at all.

It's my guess that a lot of programmers know little more about their CPU than its clock speed and would be hard pressed to describe the mechanics of the "L2" cache, for example. As such, the early chapters provide a thorough description of caches, pipelines, branch prediction, and other components of modern microprocessors. Until you understand how cache lines are organized and sized, you may be unintentionally writing code that is less than cache friendly.
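To make the cache-line point concrete, here is a minimal sketch (our own illustration, not code from the book) that sums the same 2-D array two ways. C stores arrays in row-major order, so the row-by-row loop walks memory sequentially and uses every byte of each cache line it loads, while the column-by-column loop jumps a full row ahead on every access. Both functions return the same total; on typical hardware the row-major version is usually the faster one for large arrays, but the actual gap depends on the array size and the cache hierarchy. The dimensions ROWS and COLS are arbitrary illustrative values.

```c
#include <stddef.h>

/* Illustrative dimensions (hypothetical): a 4 MB array of int,
   far larger than a typical L1 or L2 cache. */
#define ROWS 1024
#define COLS 1024

/* Row-major traversal: the inner loop touches consecutive addresses,
   so each cache line fetched is fully used before moving on. */
long sum_row_major(int a[ROWS][COLS])
{
    long total = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            total += a[i][j];
    return total;
}

/* Column-major traversal of the same row-major array: consecutive
   inner-loop accesses are COLS * sizeof(int) bytes apart, so nearly
   every access lands on a different cache line. */
long sum_col_major(int a[ROWS][COLS])
{
    long total = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            total += a[i][j];
    return total;
}
```

Only the loop order differs; swapping the two `for` statements is often the cheapest cache optimization available.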

Since few C/C++ programmers concern themselves with assembly language details on a regular basis, Gerber always supplies examples in C where possible. However, readers who have taken an introductory college-level course on assembly language will get the most out of the later chapters in this book. I quickly realized how much has changed since I wrote an 80286 CPU simulator in grad school some 15 years ago! With fewer developers coding in assembly each year, there are extremely few current instruction set reference books on the shelves. Fortunately, the accompanying CD-ROM contains the "IA-32 Intel Architecture Software Developer's Manual Volume 2: Instruction Set Reference," a 984-page reference book.

The Performance Issues section forms the core of the book and surveys how algorithms, branching, memory, loops, slow operations, and floating point affect performance. It then moves on to newer techniques such as SIMD, processor-specific optimizations, and parallel programming. Though I knew that multiple pipelines were critical to high-performance CPUs, I was surprised to learn that a little loop unrolling could greatly increase instruction-level parallelism and thus keep the pipelines from idling. In another case, Gerber demonstrates how using branches to avoid extra work can actually reduce overall performance, because the cost of the mispredicted branches they introduce outweighs the savings from skipped calculations. In both cases, the optimizations seemed counterintuitive because what looks efficient in C may in fact be a hindrance to the CPU.
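Both surprises can be sketched in a few lines of C. This is a minimal illustration under our own assumptions, not code from the book. The first pair of functions sums an array, once with a single accumulator and once unrolled by four with independent partial sums; because the four additions in each unrolled iteration do not depend on one another, an out-of-order CPU can keep several execution units busy. The second pair sums only the non-negative elements, once with an `if` and once with a branch-free mask; on unpredictable data the `if` version risks frequent mispredictions, while the masked version does a little more arithmetic on every element. Integers are used so that reordering the additions cannot change the result. Modern compilers may already unroll loops or emit conditional moves on their own, so measure before adopting either change.

```c
#include <stddef.h>

/* Baseline: one accumulator, so each addition waits for the previous one. */
long sum_simple(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unrolled by four with independent accumulators: four separate
   dependency chains the CPU can execute in parallel. */
long sum_unrolled4(const int *v, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)          /* handle the 0-3 leftover elements */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}

/* Branchy: skips negative values, but each test is a conditional
   branch that is hard to predict on random data. */
long sum_nonneg_branchy(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] >= 0)
            s += v[i];
    return s;
}

/* Branch-free: always adds, masking negative values to zero.
   Assumes two's-complement integers, where -1 has all bits set. */
long sum_nonneg_branchless(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        int mask = -(v[i] >= 0);    /* -1 (all ones) if v[i] >= 0, else 0 */
        s += v[i] & mask;
    }
    return s;
}
```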

It's impossible to tune your code, or even to gauge which part of your code needs help, without a performance analyzer. The most basic of these tools is a "profiler" such as GNU gprof or Compuware's TrueTime. A profiler simply reports the percentage of CPU time spent in each function, revealing the "hotspots." To understand why a particular function is consuming so much CPU time, you need to drill deeper with a tool such as Intel's VTune. VTune can report on more than 100 specific event counters indicating potential problems such as mispredicted branches, misaligned data accesses, cache load misses, idle time, floating-point operations retired, resource stalls, and dozens more. You might be wary of the text as a sales pitch for VTune, but the tool does provide a unique and comprehensive view of CPU events. In the book, VTune is used mainly to demonstrate particular bottlenecks, such as how to spot L1 cache misses. Since the VTune screenshots are all black and white, I sometimes found the shades of gray hard to distinguish.
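Short of a real profiler, a crude way to confirm a suspected hotspot is to time it directly. The sketch below uses only the standard C `clock()` function, which measures processor time at fairly coarse resolution. The helper `time_calls`, the `struct work` type, and the `sum_to_n` workload are hypothetical names chosen for this illustration; such a harness is no substitute for the per-function breakdown of gprof or the per-event counters VTune provides.

```c
#include <time.h>

/* Hypothetical helper: runs fn(arg) `reps` times and returns the
   processor time consumed, in seconds. */
double time_calls(void (*fn)(void *), void *arg, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical workload: sums 0 .. n-1 and stores the result. */
struct work { long n; long result; };

void sum_to_n(void *arg)
{
    struct work *w = arg;
    long s = 0;
    for (long i = 0; i < w->n; i++)
        s += i;
    w->result = s;
}
```

Storing the result in the struct keeps the compiler from discarding the loop as dead code, which would otherwise make the timing meaningless.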

The CD-ROM actually includes the entire three-volume set of IA-32 Software Developer's Manuals as PDFs (totaling a whopping 2000 pages). These manuals are also available as a free download from developer.intel.com. Also included are two other manuals devoted to optimizing Pentium 4 code (another 600 pages), which could be considered the theoretical basis for the practical advice Gerber provides. These latter books may mostly be of interest to compiler writers or device driver specialists. Last, the CD-ROM includes a coupon for $50 off the Intel C++ compiler or VTune, valid until the end of 2002. Since all this occupies a mere 22 MB, I was actually hoping to find an evaluation copy of VTune on the CD-ROM. Fortunately, you can download a 30-day free trial edition (34 MB), also from developer.intel.com.

I would recommend The Software Optimization Cookbook to any C/C++ developer who wants to improve the performance of an existing application or, better yet, to design a new app with performance foremost in mind. Only a little "assembly" is required. Developers working in managed, virtual-machine-based environments such as Java, VB, C#, or .NET won't benefit much from this book, since their world squarely faces the virtual machine rather than the CPU.


Victor R. Volkman received a BS in Computer Science from Michigan Technological University. He has been a frequent contributor to Windows Developer Magazine since 1990. He is the author of C/C++ Treasure Chest (CMP Books, 1998), which includes 300 products on CD-ROM. He can be reached by e-mail at [email protected] or through http://www.HAL9K.com/.

