### Candidates for Hash

Let's look at each of the final five and get an overview of its internal workings. Due to length, we could not show any source code for each candidate. But we will provide the URLs to that candidate's home site. Readers are welcomed to download the source project and do their own tests.

Unless stated otherwise, all five candidates can emit 256- or 512-bit hashes. All divide the message data into discrete chunks prior to processing, padding the last chunk, if necessary, according to a preset rule.

The BLAKE Function is based on the ChaCha stream cipher developed by Dan Bernstein. It uses a table of 16 constant words (`C`

) and a table of ten 16-element permutations (_{w}`P`

). The size of each message chunk (`M`

) is dictated by the final hash size. So, for a 256-bit hash, the chunk size is set at 32 bits. For a 512-bit hash, it is 64 bits.
_{w}

BLAKE's compression block was derived from a ChaCha quarter round. It uses ten operations of a single block round. The block combines two message words with two constant words. Each word, message or constant, is chosen based on the current round. There are about 8 compression rounds in a single block round. BLAKE uses 14 block rounds to produce a 256-bit hash, 16 rounds for a 512-bit hash.

The BLAKE hash function boasts a simple code structure, one that is easy to analyze. It lends itself well to parallelization and can use of multiple processor cores for faster throughput. Moreover, it can be optimized for embedded systems.

Next, the function can use salts to further randomize the final hash. Salts help the function resists pre-image and second pre-image attacks.

The function performs well in either hardware or software form. On an Intel Core 2, BLAKE-256 takes an average of 16 cycles per byte to produce a hash, and BLAKE-512 an average of 10 cycles per byte.

On the other hand, BLAKE is not meant to be scalable. Its authors deemed that scalability is impractical in most real-world applications. They also think scalability makes it difficult to locate security flaws.

The function also does not support variable-length hashing. It can only emit two types of hashes: 256 bit and 512 bit. According to its authors, supporting variable-length hashes only complicates the function's otherwise simple structure. Plus, they see little advantage to hashes longer than 512 bits in most real-world applications.

Grøstl uses a variant of the S-box construct found in the AES cipher. It keeps its hash states at double the output bit length. So, for a 256-bit hash or less, Grøstl uses states that are 512-bits long. For a 512-bit hash, it uses 1024-bit states. Grøstl also uses the same bit lengths to divide its message data.

Grøstl's S-box consists of two permutation functions `P`

and `Q`

. These are the same ones found in AES, but are modified to work with an 8-by-8 byte table (or 8-by-16 for Grøstl-512). Grøstl-256 uses 10 rounds to process the message data; Grøstl-512 uses 14 rounds.

After completing the requisite rounds, Grøstl subjects the last hash state to one more round of mixing. The mixed state is then truncated to the last n bits prior to output, n being the desired hash length.

Like BLAKE, Grøstl has a simple code structure. Its two permutation functions are easy to examine for cryptographic flaws. It can resist side-channel attacks, as well as length extension attacks. Plus, its S-box is designed to deter attempts to add a trap-door function.

Grøstl is platform-agnostic, and it is fast. On an Intel Core 2, Grøstl-256 was clocked at an average of 21.4 cycles per byte, Grøstl-512 at 19 cycles per byte. And unlike BLAKE, Grøstl can emit 224-bit and 384-bit hashes.

JH Function uses 1024-bit hash states and divides its message data into 512-bit chunks. In Figure 1 is the compression block used by JH. First, the block combines a message chunk (`mi`

) with the upper-half of the current hash state (`hi`

). It then submits the modified state (`h'i`

) to the bijective function `E`

. The function returns a new hash state (`hi+1`

), and the block combines the lower half of that state with the same message chunk. The final state is then truncated to produce the hash output.

Bijective E is basically a block cipher that uses a constant key. It starts by dividing the hash state into 4-bit chunks, storing the results in a private array. It processes the array with two 4-bit S-boxes, then with an an 8-bit linear transform. Next, the function uses a permutation routine to further randomize the state bits.

Bijective E repeats the last three steps 48 times. Once done, it combines the array elements into a new state hash.

Like its fellow candidates, JH has a simple and effective code structure. Its bijective function is easy to analyze for flaws and can be optimized to use bit-slicing for better performance. The function has a strong resistance to collision attacks. It is also efficient and performs quite well, even when not optimized. On an Intel Core 2 processor, JH takes an average of 37.3 cycles per byte to produce a hash. If optimized for a 64-bit host system, JH takes about 16.1 cycles per byte.

On the other hand, JH is claimed to be too inefficient for message authentication. The author himself suggested that a dedicated routine be used to handle MAC tasks.