*Shay Gueron is an Intel Principal Engineer and security architect for the CPU Architecture Department in the Mobility Group. Michael E. Kounavis is a Senior Research Scientist at Intel Labs. They can be contacted at [email protected]om and [email protected], respectively. Copyright (c) 2009 Intel Corporation. All rights reserved.*

Message confidentiality and integrity are key to the security of applications, operating systems, and the network infrastructure of the Internet in the future. As a result, improving the performance and security of encryption and authentication has significant benefits for today's computer platforms. In this article we describe new tools that Intel offers in this area.

First, we focus on instructions and software techniques for supporting high-performance encryption and decryption (for confidentiality) by using the Advanced Encryption Standard (AES), and for supporting the Galois Counter Mode (GCM), which is used (for integrity) in the AES-GCM authenticated-encryption protocol. AES is the Federal Information Processing Standard for symmetric encryption and is defined by FIPS Publication #197 (2001). It is widely used in a large variety of security applications. GCM is a message authentication protocol that was endorsed by the US Government in April 2006, and it is typically used, together with AES, for authenticated encryption. The GCM is also used by the IEEE 802.1ae standard, where its usage is recommended for forwarding rates higher than 10 Gbps. Other usage models of GCM include IPsec (IPsec RFC 4106), the storage standard P1619, and security protocols over fiber channels (ISO-T11 standard).

We describe how AES can be accelerated with the new processor instructions that Intel is introducing to the ISA, and we look at how GCM can be accelerated with another new instruction that computes the carry-less product of two 64-bit operands. This new instruction is used by a reduction algorithm that takes advantage of the fact that in GCM, the reduction polynomial of the associated GF (2128) Galois field is sparse [2]. This algorithm uses carry-less multiplications, implemented by this new instruction, and due to its efficiency, there is no need to add field-specific reduction logic to the processor architecture: the generic carry-less multiplication primitive can do the computation.

### The AES Instructions

AES is a block cipher that encrypts a 128-bit block (plaintext) to a 128-bit block (ciphertext), or decrypts a 128-bit block (ciphertext) to a 128-bit block (plaintext). AES uses a cipher key whose length can be 128, 192, or 256 bits, respectively. Hereafter, encryption/decryption with a cipher key of 128, 192, or 256 bits is denoted as AES-128, AES-192, AES-256, respectively. AES-128, AES-192, and AES-256 process the data block in 10, 12, or 14 iterations of pre-defined sequences of transformations, which are also called AES rounds (hereafter referred to simply as "rounds"). The rounds are identical except for the last one, which slightly differs from the others (by skipping one of the transformations). They operate on two 128-bit inputs: state and round key. Each round from 1 to 10/12/14 uses a different round key. The 10/12/14 round keys are derived from the cipher key by the "key expansion algorithm." This algorithm is independent of the processed data, and can therefore be carried out independently of the encryption/decryption phase. (Typically, the key is expanded once and is thereafter used for many data blocks by using some cipher mode of operation). The data block is processed serially; initially, the input data block is XOR'd with the first 128 bits of the cipher key to generate the state (an intermediate cipher result). Subsequently, the state passes, serially, through 10/12/14 rounds, with each round consisting of a sequence of transformations operating on the state and using a different round key. Listing 1 illustrates the AES encryption flow, for a single 16-byte block, by using the terminology of the FIPS197 document, which defines AES (see also [1] for details).

<b>Input</b>: Data: 16 bytes to encrypt Round_Key_Encrypt: array of 11-15 16-byte blocks which are the expanded cipher key Tmp = AddRoundKey (Data, Round_Key_Encrypt [0]) For round = 1-9 or 1-11 or 1-13: Tmp = ShiftRows (Tmp) Tmp = SubBytes (Tmp) Tmp = MixColumns (Tmp) Tmp = AddRoundKey (Tmp, Round_Key_Encrypt [round]) end loop Tmp = ShiftRows (Tmp) Tmp = SubBytes (Tmp) Tmp = AddRoundKey (Tmp, Round_Key_Encrypt [10 or 12 or 14]) <b>Output</b>: Tmp : (16 bytes)

A new set of instructions will be introduced in the next generation of the Intel processor family to facilitate secure and high-performance AES encryption and decryption. The instructions are described by using the terminology found in FIPS197; in this document, the details of the transformations, the encryption/decryption flows, and key expansions are provided in full. See also [1] for details on the AES instructions and their usages.

The AES architecture offers six instructions to support AES (see Table 1 and Listing 2). AESENC and AESENCLAST support encryption. AESDEC and AESDECLAST are building blocks suitable for decryption that use the Equivalent Inverse Cipher (see FIPS197 for definition).

AESIMC and AESKEYGENASSIST support the key expansion. AESIMC facilitates the conversion of the encryption round keys to a form suitable for the Equivalent Inverse Cipher. AESKEYGENASSIST uses an immediate byte as part of the input (used as RCON).

<b>Input</b>: Data: 16 bytes to encrypt Round_Key_Encrypt: array of 11-15 16-byte blocks which are the expanded cipher key Tmp = XOR128 (Data, Round_Key_Encrypt [0]) For round = 1-9 or 1-11 or 1-13: Tmp = AESENC (Tmp, Round_Key_Encrypt [round]) end loop Tmp = AESENCLAST (Tmp, Round_Key_Encrypt [10 or 12 or 14]) <b>Output</b>: Tmp: (16 bytes)

The AES processor instructions are designed based on the structure of AES, a structure that includes transformations in a GF(28) Galois field and byte shuffling transformation. The instructions execute the AES transformations efficiently by holding the operands in the SIMD registers of the IA architecture, and by using dedicated hardware.

### Protection Against Software Side-Channel Vulnerabilities

An important security advantage of using AES instructions is the protection it provides against software side-channel attacks (by other Ring 3 malicious applications).

Software side channels are vulnerabilities in the software implementation of cryptographic algorithms, and they emerge in multiple processing environments (cores, threads, and operating systems).

Cache-based software side-channel attacks exploit the fact that it takes time for a particular piece of data to be accessed, if the data are not in the cache. Because of this time lag, malicious code can potentially detect the memory addresses that are being accessed during encryption or decryption. In software implementations of AES, based on look-up tables, these addresses can reveal sensitive information about the keys.

On the other hand, the AES instructions are implemented via combinatorial logic, and their latency is data-independent. Therefore, software implementations of AES that use these instructions are not susceptible to any of the known software side-channel attacks [1].

### The Carry-less Multiplication Instruction

Carry-less multiplication, also known as "binary polynomial multiplication," is the mathematical operation of computing the product of two operands without generating or propagating carries. Such multiplications are an essential step in computing multiplications in binary Galois fields.

Carry-less multiplication is defined as follows. Let **A** and **B** be two n-bit operands in Equation 1:

and Equation 2:

If the carry-less product of **A** and **B** is denoted by **C = A * B**, and **C** is the bit array **C = [ _{c}2n - 1** c

_{2n - 2}... c

_{0}], then, the bits of

**C**are defined as the following functions of the bits of the inputs

**A**and

**B**in Equation 3:

for 0 ≤ i ≤ n - 1, in Equation 4, and:

for n-1 ≤ i ≤ 2n-1

See Figure 1.

As an example, if **A** = 0x63746f725d53475d and **B** = 0x5b477565726f6e5d, the carry-less product is **C** = **A** * **B** = 0x1d4d84c85c3440c0929633d5d36f0451.

### The PCLMULQDQ Instruction

Together with the AES instructions, Intel also introduces PCLMULQDQ, an instruction for computing the carry-less multiplication of two 64-bit halves (hereafter referred to as "quadwords") that are selected from the instruction's two operands (two xmm registers or one xmm register and one memory location), according to an immediate byte value (imm8), defined in Table 2:

### Efficient Implementation of AES-GCM

AES-GCM is an authenticated encryption algorithm, which is built upon AES encryption in counter (CTR) mode, and it is a computation of a Galois hash. AES-GCM uses a single key to both encrypt and authenticate data. An authenticated encryption algorithm is different from classical encryption and authentication schemes, where two independent keys are required to make both functions secure [2-8].

Figures 2 and 3 briefly describe the CTR mode of operation and the AES-GCM algorithm. Figure 2 shows the AES encryption of multiple blocks, by using CTR mode, and Figure 3 illustrates the AES-GCM algorithm.