Channels ▼
RSS

Design

Fast, Parallelized CRC Computation Using the Nehalem CRC32 Instruction


CRC of an Arbitrary Length Buffer

The basic approach to performing the CRC of an arbitrary-length buffer is to divide up the input into a series of fixed-sized buffers, then process each such buffer in a manner similar to the aforementioned example. The size of the fixed-size buffer can be selected to optimize the performance of the range of buffer lengths in a specific usage model. In one specific example, 1920 bytes of the input are iteratively processed until the size of the remaining bytes is less than 1920. We then process 960 bytes, 480 bytes, and/or 240 bytes, depending on how many bytes remain after each step. Finally, the remaining bytes (fewer than 240) are processed in a straightforward linear manner.

The processing for each buffer is handled in a manner similar to processing the 1024-byte fixed-size buffer, except that there is no "final Qword" (because 1920 is divisible by 3). Instead, the processing of the third (least-significant) portion of the input (i.e., the processing that results in crc2) stops one Qword short. Then the recombination becomes identical to that of the 1024-byte fixed-size buffer.

One further optimization is to note that the tables are defined by the distance the partial CRC value needs to be shifted down. This is basically 1/3 or 2/3 of the buffer size minus the 8 bytes to reflect that we are stopping one Qword early. So, for the 240-byte buffer, we need tables that shift by 72 and 152 bytes. For the 480-byte buffer, we need tables that shift by 152 and 312 bytes, and so on. The second table for a given buffer size is the same as the first table for the next larger buffer. That means that for four sizes of buffers, we only need five tables (rather than eight). The use of a small number of fixed buffer sizes is needed to keep the number of tables reasonable. Refer to the aforementioned whitepaper [2] for the detailed description, code, and performance of methods to perform CRC on arbitrary-length buffers.

Performance- CRC of a 1024-Byte Buffer

The performance was measured on the Intel Core i7 Processor and represents the performance on a single core running a single thread. For each run, we compute the CRC of the same data buffer 350,000 times. The CRC output of one calculation is used as the initial CRC for the next calculation to serialize them. The rdtsc instruction is used to sample the processors timestamp clock before and after the run, to get the number of timestamps per run. We then perform 256 runs, discarding the 64 fastest and 64 slowest times, and use the mean of the remaining 128 values. This number is then adjusted to reflect that the timestamp clock may not run at the same rate as the processor core clock due to turbo mode and/or power saving features. Our CRC implementation designed for the fixed-size of 1024 bytes performs at the rate of ~ 1.17 cycles/Qword.

Conclusion

We described optimized algorithms and code for computing the iSCSI CRC, which unleashes the full potential of the CRC32 instruction introduced in the Intel Core i7 processor. The performance of these methods tend towards 1.17 cycles/Qword for large/fixed buffers.

Acknowledgments

The authors would like to thank Erdinc Ozturk, Gil Wolrich, and Deniz Karakoyunlu for their contributions to the development and optimization of the implementation.

References

[1] Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A: Instruction Set Reference, A-M http://www.intel.com/products/processor/manuals/

[2] Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction http://download.intel.com/design/intarch/papers/323405.pdf

[3] __cpuid, __cpuidex http://msdn.microsoft.com/en-us/library/hskdteyh.aspx

[4] "Determining a Message Residue," Gopal et al. United States Patent 7,886,214

[5] "Determining a Message Residue," Gueron et al. United States Patent Application 20090019342

[6] "Determining a Message Residue," Gopal et al. United States Patent Application 20090158132

[7] "A Tutorial on CRC Computations", Ramabadran et al., Micro, IEEE, IEEE Computer Society, Aug. 1988, pp. 62-75.

[8] "High-Speed CRC Design for 10 Gbps applications," Lin et al., lSCAS 2006, IEEE, pp. 3177-3180.

[9] "Cyclic Redundancy Code (CRC) Polynomial Selection for Embedded Networks," Koopman et al., The International Conference on Dependable Systems and Networks, DSN- 2004, pp. 1-10.

[10] "Parallel CRC Realization," Campobello et al., IEEE Transactions on Computers, vol. 52, No. 10, IEEE Computer Society, Oct. 2003, pp. 1312-1319.

[11] "A Systematic Approach to Building High Performance Software-based CRC Generators," Kounavis et al., Proceedings of the 10th IEEE Symposium on Computers and Communications, ISCC 2005; pp. 855-862.


Vinodh Gopal, Jim Guilford, Martin Dixon, and Wajdi Feghali are IA Architects with the IAG Group at Intel Corporation.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video