Channels ▼
RSS

Parallel

Multicore and Cryptographic Hash Functions


Ilya Mirman can be contacted at Cilk Arts, a company that focuses on tools for multicore programming.


Powerful computers have made cracking encryption schemes increasingly easier, but a recent effort is putting the ever more potent multicore CPUs to work on the encryption side of the equation.

A cryptographic hash function maps an input string to an output string of some fixed bit-length. Cryptographic hash functions have many applications, such as digital signatures, time-stamping methods, and file modification detection methods. In response to advances in the cryptanalysis of hash functions, the National Institute of Standards and Technology (NIST) has recently created a competition for a new cryptographic hash algorithm.

A team from MIT led by Professor Ron L. Rivest (founder of RSA Data Security and author of the popular MD5 hash function) has put forth an entry. MD6 is a totally new design that leverages the larger memory and multicore CPUs found in today's computers. The full submission is available here, along with the invited talk given at the CRYPTO 2008 conference.

As Ron's talk pointed out, MD5 was designed in 1991 -- the same year the World Wide Web was announced, and typical PC clock rates were 33 MHz. MD5 and other hash functions have since been broken, and others may be vulnerable. That's the bad news. The good news is that memory is now plentiful, and parallelism has arrived (indeed, the SHA-3 requirements state, "SHA-3 should be able to exploit the parallelism provided by the forthcoming flood of multicore processors."). Memory capacities have increased 60% per year since 1991, chips have 1,000 times as much memory as they did in 1991, and even embedded processors typically have at least 1KB of RAM -- allowing MD6 to have a large input message block size (512 Bytes). The arrival of parallelism in mainstream processors makes a whole lot of flops available, which MD6 puts to use with a tree-based algorithm that works extremely well in parallel (Figure 2.1 from the report is found below).

From the MD6 report:

Computing platforms are quickly becoming highly parallel, as chip manufacturers realize little more performance is available by increasing clock rates. Increased performance is instead being obtained by utilizing increased parallelism, as through multi-core chip designs. Indeed, typical chips may become very highly parallel, very soon. Anwar Ghoulum, at Intel's Microprocessor Technology Lab, says, 'developers should start thinking about tens, hundreds, and thousands of cores now.' Section 4.7.1 shows how the speed of MD6 can be dramatically increased by utilizing the CILK software system for programming multicore computers. ... We implemented MD6 for multicore processors using the CILK extension to the C programming language [...] The CILK technology makes multicore programming quite straightforward [...] Our implementation of MD6 in CILK used the layer-by-layer approach (so it assumes that the input message is available all at once). It processes each layer in turn, but uses parallelism to process a layer efficiently."

The full MD6 application has over 3,000 lines of code. Adding two Cilk keywords (at the bottom of the code snippet below) was sufficient to multicore-enable the algorithm. MD6 is recursive, and adding the Cilk++ keywords exposes a great deal of parallelism.


    static
    void hash_segment (parallel_md6_info *pHashInfo, int ell, size_t lo, size_t hi)
    {
        // Check wheter we're paused.  We use both a global variable and an
        // event to avoid constant kernel transitions to check the event.
           if (pHashInfo->lPause)
                  WaitForSingleObject (pHashInfo-%gt;hPause, INFINITE);
          if ((hi-lo)==512) /* compression input block size */
        {
            int err = compress_block(ell,lo);
            if (err)
            {
                ::PostMessage (pHashInfo->hWnd, WMU_PARALLEL_ERROR, err, 0);
                error = 1;
            }
        }
        else
        {
            size_t mid = (lo+hi)/2;
            <b>cilk_spawn</b> hash_segment(pHashInfo, ell, lo, mid);
            hash_segment(pHashInfo, ell, mid, hi);
            <b>cilk_sync</b>;
        }
    }

The following illustrates the speed of MD6 using various numbers of cores on a 16-core system (2.2GHz AMD Barcelona CPU, four quad-core sockets).

To facilitate measurement, this particular implementation works with a 512 MB file of artificial data, and presumes that the data is available all at once -- eliminating I/O timing issues. As the report observes:

Using CILK, MD6's parallel performance is the best one can hope for: its throughput increases linearly with the number of processors (cores) available.

The report concludes:

Because of its tree-based mode of operation, MD6 is particularly well-suited for parallel implementations on multicore processors and GPUs. Speeds of many hundreds of megabytes per second are easily obtained, and speeds of 1-2 gigabytes second are very achievable.

Processor architecture is currently trending to larger numbers of homogenous cores -- both CPUs and GPUs are following this trend. Because the performance of individual cores is not improving, the throughput of traditional sequential algorithms, which used to have exponential performance growth, has stagnated. On the other hand, highly parallel algorithms, like MD6, are likely to continue to see improved throughput well into the foreseeable future as coarse-grained machine parallelism increases.

How Fast Can You Go?

At the time of this writing, the >1GB/sec throughput achieved on a 16-core system with the Cilkified version is a world record for MD6!

Would you like to push the envelope further? Here's the Cilkified MD6 (Linux) implementation -- run it on the biggest shared memory system you can find, an let us know the reported throughput number (and system specs)!


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video