## A fast block cipher that uses a 128-bit key

*Joan is an employee of Banksys, Brussels. He can be contacted at [email protected] Lars and Vincent are with the COSIC research group of the Department of Electrical Engineering of the Katholieke Universiteit Leuven, Belgium. They can be contacted at [email protected] and [email protected], respectively. *

Sidebar:

*The Galois Field GF(2*

^{8})
Square is a new fast block cipher that encrypts data in blocks of 128 bits, using a 128-bit key. Most cipher designs are optimized for one specific implementation platform. Square's structure (that is, its building blocks and their interaction) has been carefully chosen to allow very efficient implementations on a wide range of processors. This versatility is achieved through the use of operations in the finite field GF(2^{8}), which is a new approach in block cipher design. A reference implementation was written in C by Paulo Barreto ([email protected]) and runs at 2.63 MB on a 100 Mhz Pentium (available electronically; see "Availability," page 3). By optimizing in assembler, this speed was doubled.

### Structure

Square is an iterated, but not a Feistel, cipher. Instead, every round acts upon all 16 bytes of its the input. One round of Square consists of four different operations. These operations can be efficiently combined in a single set of table lookups and XOR operations. The round operations are easiest to visualize if you think of the 16 bytes as being arranged in a 4×4 array; see Figure 1. An intermediate value *a *can then be represented by the 16 bytes *a _{i,j}*,0<=

*i,j*< 4. The four operations are:

- A linear transformation that operates separately on the four rows of four bytes each. is determined by four coefficients
*c*,0<=_{j}*j*<4.*b*=(*a*) corresponds toIn this formula, additions and multiplications are performed in the Galois field GF(2

^{8}); see the accompanying text box entitled "The Galois Field GF(2^{8})." - A nonlinear transformation that uses one invertible 8-bit S-box. Each of the 16 bytes is transformed independently by the S-box.
- A byte permutation that rearranges the bytes by interchanging rows and columns (that is, it transposes the array).
- Key addition. Exclusive-Or with the round key
*k*is denoted [^{t}*k*] (Exclusive-or corresponds to addition in the field GF(2^{t}^{8})). - The round operation is denoted [
*k*] and defined as^{t}*q*^{-1}: Square[*k*]=

### Cryptanalysis

Square was designed following the Wide Trail Design Strategy (see "Cipher and Hash Function Design Strategies based on Linear and Differential Cryptanalysis," by Joan Daemen, doctoral dissertation, March 1995, K.U. Leuven), which separates the functionality of the block cipher round operation in separate layers -- the nonlinear substitution, linear diffusion layer, and key addition. Strict criteria are imposed on each layer to build ciphers that resist linear and differential cryptanalysis.

For a description of the basic principles of linear and differential cryptanalysis, refer to "Differential and Linear Cryptanalysis," by Bruce Schneier (*DDJ*, January 1996). In short, differential cryptanalysis works by applying small changes to the input of the algorithm and predicting the resulting differences at various intermediate stages of the algorithm. Resistance against differential cryptanalysis can be improved by ensuring that small differences at the input cause large differences at the output (and intermediate) values. In Square, this diffusion of differences is ensured by the two linear transformations: and .

*q* takes care of the intrarow diffusion. If one byte of a row at the input is different, all four bytes of this row in the output are different. For example, if *n* bytes of a row at the input are different (1<=*n*<=4), at least 5-*n* output bytes of this row are different.

produces interrow diffusion. After , the four bytes of a row are spread over the four different rows.

Together, and cause a very quick spread of differences; see Figure 2. Suppose two bytes of the same row at the input of the first round are different, then after , at least three bytes of this row are different. distributes these bytes to three different rows at the input of the next round. Then 3×4 of the output bytes of are different. The differences are distributed by over the four rows, and so on.

An immediate consequence of the mathematical structure of Square is that the resistance against linear cryptanalysis can be treated in an analogous way. Square is resistant to linear and differential cryptanalysis after six of its eight rounds.

There exists, however, a dedicated attack that breaks six rounds of Square (see our paper "The Block Cipher Square," in *Fast Software Encryption*, E. Biham, editor, Springer-Verlag 1995). Since Square has eight rounds, this dedicated attack poses no problems.

### Implementations

To illustrate how Square can be used, we have written a smart card application for the Motorola M68HC05 microcontroller. (Smart cards are becoming important as a platform for implementing cryptography and security.) The machine code occupies 547 bytes of ROM and needs 36 bytes of RAM. One execution of Square (including the key schedule) takes about 7500 cycles. This corresponds to less than 2 milliseconds with a 4-MHz clock. In comparison, a typical DES implementation occupies about 1 KB of ROM. One DES operation encrypts only 64 bits and needs about 16,000 cycles.

Writing out the consecutive transformations of the cipher for 32-bit processors, the succession of steps

occurs seven times. These transformations can be combined into a single set of table lookups. An intermediate value of the text can be represented by four 32-bit words, each containing a row [

*a*]. Its transpose is denoted by [

_{i}*a*]

_{i}^{T}. For

we have Figure 3(a). We define the tables

*M*and

*T*as Figure 3(b).

*T*and

*M*have 256 entries of four bytes each. The table

*M*implements the multiplications in GF(2

^{8}).

*T*combines the nonlinear substitution with the multiplications. Define rotr

*j*([

*a*]) as the rotation of the row [

_{i}*a*] with

_{i}*j*bytes to the right. This yields Figure 3(c). We conclude that

can be done with 16 table lookups, 12 rotations, and 16 XORs of 32-bit words. This implementation needs the table

*T*, with 256 entries of four bytes -- 1 KB in total.

### Further Reading

More details about Square are available in our paper "The Block Cipher Square," which has been published in *Fast Software Encryption *(E. Biham, editor, Springer-Verlag 1995) and at http://www .esat.kuleuven.ac.be/rijmen/square/.

**DDJ**

*Copyright © 1997, Dr. Dobb's Journal*