Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Channels ▼


Fast String Search on Multicore Processors

Daniele, Oreste, and Fabrizio were affiliated with the Pacific Northwest National Laboratory when this article was written. Daniele and Fabrizio have since moved to the Cell Solutions department of IBM. They can currently be contacted at [email protected]

While your current PC may be dual- or quad-core, the one sitting on your desk in a couple of years will be a "many-core." Multicores are the future, and because programming them is so different than programming for traditional processors, you must prepare for the change now, especially in mapping basic algorithms efficiently onto the new hardware.

String searching is one of these basic algorithms. It has a host of applications, including search engines, network intrusion detection, virus scanners, spam filters, and DNA analysis, among others. The Cell processor, with its multiple cores, promises to speed-up string searching a lot.

In this article, we show how we mapped string searching efficiently on the Cell. We present two implementations:

  • The fast implementation supports a small dictionary size (approximately 100 patterns) and provides a throughput of 40 Gbps, which is 100 times faster than reference implementations on x86 architectures.
  • The heavy-duty implementation is slower (3.3-4.3 Gbps), but supports dictionaries with tens of thousands of strings.

This task is not trivial. We had to change our algorithm significantly to reach top performance. In particular, to exploit the memory subsystem at its best, we employ a pipelined parallelization strategy, and we shuffle the data layout to fight congestion—techniques that are unfamiliar to most programmers of traditional architectures.

Why Is String Searching So Important?

The Internet is a dirty place, with malware, spyware, spam, and viruses trying to penetrate your systems through all possible vulnerabilities. Undesired traffic cannot be filtered anymore on the basis of header information. What's needed is"deep-packet inspection, which checks the payload against a database containing signatures of a large number of threats.

But with network links getting faster (10 Gbps Ethernet, for instance), performing on-the-fly deep-packet inspection isn't child's play. For example, running the Snort (www.snort.org) intrusion-detection system on a Pentium 4 PC only ensures a filtering throughput between 200-400 Mbps. This is why network appliance manufacturers often employ specialized hardware like Field-Programmable Gate Arrays (FPGAs) or Application-Specific Instruction Processors (ASIPs). The Cell processor is a new player in this field, with the potential to offer lots of computing power at low price. In fact, its large production volumes (it's used in the Sony PlayStation 3) contribute to keeping its price low.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.