Channels ▼


Fast String Search on Multicore Processors

Programming the Cell: Rethinking Bottom-Up

The Cell is so different from traditional processors that you can't get the best out of it unless your code explicitly takes advantage of its hardware. The Cell contains nine processors—one Power Processing Element (PPE), a 64-bit PowerPC-like processor that is mainly used to run the operating system and load the tasks, and eight worker elements called "Synergistic Processing Elements" (SPEs).

SPEs have 128-bit-wide registers and Single Instruction Multiple Data (SIMD) instructions. Because they have a simple branch prediction, jumps are expensive and your code should avoid them, possibly replacing them with speculative execution. Additionally, SPEs cannot access the RAM directly and they have no cache. Instead, they have a small 256-KB memory called "local store" (LS), and transfer data to/from main memory with asynchronous DMA transfers. By cleverly scheduling DMA transfers in your code (with, for instance, double buffering), you can hide part or all of the transfer latency. See our article "Programming the Cell" (

These hardware features revolutionize how you write code. Programmers have been told since Programming 101 to develop code in a top-down fashion. Now on the Cell, we are forced to look down to the bottom and make both ends meet. Hardware constraints strongly influence the algorithm.

Searching Strings: One Problem, Too Many Solutions

Whether you are scanning files for viruses, performing full-text searches on a large collection of articles, or detecting intrusions on network traffic, you're doing string searching. Given a text and a dictionary (a set of patterns), your task is to find all occurrences of any of the patterns in the text.

String searching is a beaten track, with many proposed algorithms. We employ Aho-Corasick, which is famous for appearing in the original "fgrep" UNIX utility. Despite its age (it was presented in 1975), this algorithm is simple and appears more suitable to the Cell than others (Knuth-Morris-Pratt, Boyer-Moore, Commentz-Walter, Wu-Manber, and the like). In fact, Aho-Corasick can be expressed well as a branchless data flow that best exploits the SIMD capabilities of the SPEs. On the other hand, its evolutions have more complex control flows, which suffer from frequent and large branch misprediction penalties when run on an SPE.

The variant of Aho-Corasick we employ is based on a Deterministic Finite Automaton (DFA), and is fast for two reasons:

  • The DFA is based on a keyword tree (a trie). A trie has a path of labeled edges per each pattern in the dictionary. To process an input character, you just choose the outgoing edge, which is labeled with that character, and jump to the following node along that edge.
  • In a DFA, no special rollback actions are required in case of mismatches. A non-matching input character triggers a simple state transition, like any other character. For more information, see "The Aho-Corasick Algorithm" (en.wikipedia .org/wiki/Aho-Corasick_algorithm).

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.