Channels ▼
RSS

Design

New Processor Instructions for Accelerating Encryption and Authentication Algorithms


High Performance Implementation of AES in Counter Mode

Significant performance optimization for encrypting (and decrypting) can be achieved if software using the AES instructions is designed to process multiple data blocks in parallel. This "software pipelining" technique is applicable for parallelizable modes of operation such as Electronic Code Book (ECB), CTR, and decryption with the Cipher Block Chaining (CBC-Decryption) mode.

In such modes, different data blocks can be encrypted (or decrypted) independently of each other, and the hardware that supports the AES round instructions is pipelined. This allows independent AES instructions to be dispatched, theoretically every one to two CPU clock cycles, if data can be provided sufficiently fast. As a result, the AES throughput can be significantly enhanced for parallel modes of operation, if the software implementation itself is pipelined. Instead of completing the encryption of one data block and then continuing to the subsequent block, it is preferable to write software sequences that compute one AES round on multiple blocks, using one round key, and only then continue to compute the subsequent round for multiple blocks. This technique speeds up any parallelizable mode of operation, in particular the CTR mode. Listing 3 shows a code snippet encrypting eight blocks in parallel as part of the CTR mode (where the counters are encrypted).


mov rdx, OFFSET keyex_addr
; load Round key
movdqu xmm0, XMMWORD PTR [rdx]
pxor xmm1, xmm0
pxor xmm2, xmm0
pxor xmm3, xmm0
pxor xmm4, xmm0
pxor xmm5, xmm0
pxor xmm6, xmm0
pxor xmm7, xmm0
pxor xmm8, xmm0

mov ecx, 9
main_loop:
; load Round key
add rdx, 0x10
movdqu xmm0, XMMWORD PTR [rdx]
aesenc xmm1, xmm0
aesenc xmm2, xmm0
aesenc xmm3, xmm0
aesenc xmm4, xmm0
aesenc xmm5, xmm0
aesenc xmm6, xmm0
aesenc xmm7, xmm0
aesenc xmm8, xmm0

loop main_loop
add rdx, 0x10
movdqu xmm0, XMMWORD PTR [rdx]
aesenclast xmm1, xmm0
aesenclast xmm2, xmm0
aesenclast xmm3, xmm0
aesenclast xmm4, xmm0
aesenclast xmm5, xmm0
aesenclast xmm6, xmm0
aesenclast xmm7, xmm0
aesenclast xmm8, xmm0

; storing the encrypted blocks
movdqu XMMWORD PTR [dest], xmm1
movdqu XMMWORD PTR [dest+0x10], xmm2
movdqu XMMWORD PTR [dest+0x20], xmm3
movdqu XMMWORD PTR [dest+0x30], xmm4
movdqu XMMWORD PTR [dest+0x40], xmm5
movdqu XMMWORD PTR [dest+0x50], xmm6
movdqu XMMWORD PTR [dest+0x60], xmm7
movdqu XMMWORD PTR [dest+0x70], xmm8

Listing 3: AES Encryption of Eight Blocks in Parallel (Source: Intel Corporation, 2009)

High Performance Implementation of Galois Counter Mode

We now examine how GCM can be efficiently computed by using the PCLMULQDQ instruction, in combination with some improved algorithms.

The most compute-intensive part of GCM is the computation of the Galois hash, which is multiplication in the finite field GF(2128), defined by the reduction modulo g = x128 + x7 + x2 + x + 1. The multiplication in this field is carried out in two steps: the first step is the carry-less multiplication of two 128-bit elements, and the second step is the reduction of the 256-bit carry-less product modulo g = x128 + x7 + x2 + x + 1. We explain these steps in the rest of this section.

Computing a 256-bit Carry-less Product with the PCLMULQDQ Instruction

The following algorithm steps can be viewed as one iteration of a carry-less schoolbook multiplication:

  1. Multiply carry-less by the following operands: A0 with B0, A0 1 with B1, A0 with B1, and A1 with B0. Let the results of the above four multiplications be A0 * B0 = [C1 : C0], A1 * B1 = [D1 : D0], A0, * B1 = [E1 : E0], A1 * B0 = [F1 : F0]
  2. Construct the 256-bit output of the multiplication [A1: A0] * [B1 : B0] as follows in Equation 5:

One can also trade off one multiplication for additional XOR operations. This 2-step alternative approach can be viewed as a "carry-less Karatsuba" multiplication [9]:

  1. Multiply carry-less by the following operands: A1 with B1, A0 with B0, and A0 ⊕ A1 with B0 ͵B; A1. Let the results of the above three multiplications be [C1 : C0], [D1 : D0], and [E1 : E0], respectively.
  2. Construct the 256-bit output of the multiplication [A1: A0] * [B1 : B0] as follows in Equation 6:

Both methods can be used for the first step of the computation of the Galois hash.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video