Channels ▼
RSS

Inner Loops and Regular Expressions


Dr. Dobb's Journal December 1997: Inner Loops and Regular Expressions

Robert, a language designer and CEO of Snake Island Research in Toronto, Canada, can be reached at bernecky@ acm.org. Michael, a developer for Eclipse International, can be contacted at mikefitz@ pacbell.net.


Inner Loops:
A Sourcebook for Fast 32-bit Software Development
Rick Booth
Addison-Wesley, 1997
364 pp., $39.95
ISBN 0-201-47960-5

Mastering Regular Expressions:
Powerful Techniques for Perl and Other Tools
Jeffrey E.F. Friedl
O'Reilly & Associates, 1997
342 pp., $29.95
ISBN 1-56592-257-3

If you are an x86 assembly-code programmer trying to squeeze the last bit of performance out of an application, then buy Rick Booth's Inner Loops right now.

This well-crafted and excellently edited book presents the performance characteristics of the Intel 486 and post-486 processors in excruciating detail, often providing corrections to Intel's published instruction timings. Rather than merely regurgitating specifications, Booth has covered a wide range of topics, including the superior performance of the RISC-like x86 instructions versus their CISC-like equivalents, cache characteristics (including the use of spurious storage references to issue a cache line prefetch), instruction pairing effects, memory queues, branch prediction, floating-point unit quirks, and how to exploit all of these for improved performance. If you program RISC systems, then many of the aforementioned problems have, for the most part, been solved for you, yet a quick read through the earlier chapters may provide some useful ideas, particularly in the area of cache management.

Inner Loops' breadth of coverage makes it useful to compiler writers, computer nerds, and introductory computer-science students, if only to gain insight into a quantitative approach to performance analysis of today's most popular (sadly, given its unpleasant architecture) microprocessor. Booth, drawing on his store of knowledge from the video and digital video fields, offers good guidance to newcomers by quite properly emphasizing when and why NOT to optimize, and recommending a look to algorithmic improvements first. He also presents the most readable summary of the x86 instruction set that I have ever seen.

As noted earlier, Inner Loops is well edited, a rarity in these days of slapdash computer publications. I noted only two typos -- a missing space on page 12, and "it" for "if" in a code fragment on page 249. Nomenclature sometimes departs from industry norms. Booth refers to L1 and L2 memory cache as "primary" and "secondary" cache, to Little-endian and Big-endian memory formats as "Intel" and "Motorola," to software pipelining as "coiled loops," to radix sort as "distribution sort" (even though one table refers to it as radix sort), and to code scheduling as "code reordering." Despite these oddities, Inner Loops is quite readable.

In Chapter 5 ("Optimizing the Pentium"), Booth claims that manual optimization of assembler code presents "interesting problems" for fans of "brain teasers and chess puzzles," yet does not suggest that creation of a code scheduler might be a better use of one's time. This is a good chapter if you accept his premise, as it contains considerable detail on such areas as instruction pairing, cache output queuing, data and code alignment, and address generation interlocks.

The book includes a CD-ROM containing algorithms, source code, and demos for sorting, hashing, compressing, and more. It also contains the full text of War and Peace, used as test data for several programs presented in the text. I did not examine this material, so I cannot comment on its utility.

I was disappointed to find that, although the book's cover claims that Inner Loops presents examples of "random numbers, hashing, sorting, matrix math, Huffmann compression, and JPEG," the coverage lacks depth and does not break new ground for the programmer using such algorithms. However, the book does present a good deal of practical advice and careful quantitative analysis of the algorithms and methods of optimizing for the x86 platforms. In addition, Booth shows highly optimized implementations of the aforementioned algorithms, along with very detailed timing analysis for them. Sadly, though, the book's focus on current members of the x86 family of processors will quickly reduce its utility to readers.

Nonetheless, leafing again through Inner Loops, I came across a number of places where I had annotated the text with "good advice" or "interesting idea" -- this book contains a wealth of pragmatic information for the x86 assembly-code programmer. Inner Loops would also be a good supplement to an introductory undergraduate course in computer hardware, as it emphasizes the use of hypothesis, experiment, and measurement in analyzing application performance.

-- Robert Bernecky

Mastering Regular Expressions

If you've spent any time crafting "regular expressions," you know they are the closest thing there is to mangled punctuation. And because they can mean different things to mathematicians and programmers, regular expressions are difficult to define, too. Indeed, the meaning changes from tool to tool. As used in Jeffrey Friedl's Mastering Regular Expressions, the definition "special search strings that match patterns of data (typically text), rather than specific sequences of bytes or characters" is sufficient.

To appease the theorists, but mostly for notational convenience, regular expressions will be referred to here as a "regex." It is important to note that there is no standard for regex. Each tool defines its own regex syntax and the extent to which it is implemented. Some valuable regex features are not always available in every tool.

Mastering Regular Expressions is about regex, not Perl. Friedl covers regex in Perl, but says nothing about the many other Perl language features. Still, many people think of Friedl's book as a "Perl" book. (In fact, it is even miscategorized as a Perl book by the Library of Congress.) Granted, Perl is a language that includes seamless use of regex as its main feature, and Perl's regex implementation is unsurpassed. Understanding regex is vital to using Perl effectively. Anyone who programs in Perl for a living would not argue with that.

However, regex is found in many places, including languages (Python, Tcl, and Expect), tools (awk, lex, and grep), and editors (Emacs, vi, and sed). It can save you lots of time, if you are willing to learn it. Friedl spends time discussing many regex tools in Chapter 6 and dedicates all of Chapter 7 to Perl regex.

The author carefully brings us to understanding regex by example and analogy. A simple example of a regex can be difficult to read for the inexperienced user.

Creating a regex is intuitive after you have some experience, but getting the experience can be quite frustrating. When you start learning regex, you have to figure out matching problems purely analytically, which is especially difficult since your tool's documentation of regex will most likely be inadequate and there is no regex debugger. For example, in Perl you can construct a regex that matches nested expressions using parentheses. The regex in Example 1(a), which is borrowed from page 126 of Friedl's book, matches a parenthesized expression and allows parenthesis nesting up to one level, and will perform the match in the text pattern in Example 1(b). Just look at this regex! Now you can see what a great feat it is to write a book on regex that is actually readable.

Even with a basic understanding of regex, you can still learn a lot from reading Mastering Regular Expressions. If nothing else, the book is well researched, covering even obscure areas of regex (the POSIX regex standard, for instance). Many of the examples are practical, covering tricky problems such as matching C comment blocks, IP matching, and date matching. And Friedl's discussion of regex efficiency is valuable. Understanding the inner workings of regex can mean the difference between writing a regex that may not match in your lifetime, or writing one that can make a quick match. As always, it is important to note that optimization can lead to a trap. When too much knowledge of a process's internals is assumed, those assumptions can create inefficiencies when the technology changes.

One reason this book is important is because regex is intimidating -- and Friedl makes it easier to understand. Many programmers don't use the regex available in their development tools even though regex would probably save them a lot of time. Think of that the next time you find yourself stuck with a pile of someone else's code that you need to maintain.

Mastering Regular Expressions is destined to be a classic reference on the subject it covers. If you're just getting started with regex, this book will save you a lot of time (and grief). If you are already using regex, it will help you extend your ability and understanding.

-- Michael Fitzpatrick

DDJ


Copyright © 1997, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.