The Block Cipher Square Algorithm

Square is a new fast block cipher that encrypts data in blocks of 128 bits, using a 128-bit key. Square's structure has been carefully chosen to allow very efficient implementations on a wide range of processors.


October 01, 1997
URL:http://www.drdobbs.com/database/the-block-cipher-square-algorithm/database/the-block-cipher-square-algorithm/184410296

Dr. Dobb's Journal October 1997: The Block Cipher Square Algorithm

A fast block cipher that uses a 128-bit key

Joan is an employee of Banksys, Brussels. He can be contacted at [email protected]. Lars and Vincent are with the COSIC research group of the Department of Electrical Engineering of the Katholieke Universiteit Leuven, Belgium. They can be contacted at [email protected] and [email protected], respectively.


Sidebar: The Galois Field GF(28)

Square is a new fast block cipher that encrypts data in blocks of 128 bits, using a 128-bit key. Most cipher designs are optimized for one specific implementation platform. Square's structure (that is, its building blocks and their interaction) has been carefully chosen to allow very efficient implementations on a wide range of processors. This versatility is achieved through the use of operations in the finite field GF(28), which is a new approach in block cipher design. A reference implementation was written in C by Paulo Barreto ([email protected]) and runs at 2.63 MB on a 100 Mhz Pentium (available electronically; see "Availability," page 3). By optimizing in assembler, this speed was doubled.

Structure

Square is an iterated, but not a Feistel, cipher. Instead, every round acts upon all 16 bytes of its the input. One round of Square consists of four different operations. These operations can be efficiently combined in a single set of table lookups and XOR operations. The round operations are easiest to visualize if you think of the 16 bytes as being arranged in a 4×4 array; see Figure 1. An intermediate value a can then be represented by the 16 bytes ai,j,0<=i,j< 4. The four operations are:

Cryptanalysis

Square was designed following the Wide Trail Design Strategy (see "Cipher and Hash Function Design Strategies based on Linear and Differential Cryptanalysis," by Joan Daemen, doctoral dissertation, March 1995, K.U. Leuven), which separates the functionality of the block cipher round operation in separate layers -- the nonlinear substitution, linear diffusion layer, and key addition. Strict criteria are imposed on each layer to build ciphers that resist linear and differential cryptanalysis.

For a description of the basic principles of linear and differential cryptanalysis, refer to "Differential and Linear Cryptanalysis," by Bruce Schneier (DDJ, January 1996). In short, differential cryptanalysis works by applying small changes to the input of the algorithm and predicting the resulting differences at various intermediate stages of the algorithm. Resistance against differential cryptanalysis can be improved by ensuring that small differences at the input cause large differences at the output (and intermediate) values. In Square, this diffusion of differences is ensured by the two linear transformations: [theta] and [pi].

q takes care of the intrarow diffusion. If one byte of a row at the input is different, all four bytes of this row in the output are different. For example, if n bytes of a row at the input are different (1<=n<=4), at least 5-n output bytes of this row are different.

[pi] produces interrow diffusion. After [pi], the four bytes of a row are spread over the four different rows.

Together, [theta] and [pi] cause a very quick spread of differences; see Figure 2. Suppose two bytes of the same row at the input of the first round are different, then after [theta], at least three bytes of this row are different. [rho] distributes these bytes to three different rows at the input of the next round. Then 3×4 of the output bytes of [theta] are different. The differences are distributed by [rho] over the four rows, and so on.

An immediate consequence of the mathematical structure of Square is that the resistance against linear cryptanalysis can be treated in an analogous way. Square is resistant to linear and differential cryptanalysis after six of its eight rounds.

There exists, however, a dedicated attack that breaks six rounds of Square (see our paper "The Block Cipher Square," in Fast Software Encryption, E. Biham, editor, Springer-Verlag 1995). Since Square has eight rounds, this dedicated attack poses no problems.

Implementations

To illustrate how Square can be used, we have written a smart card application for the Motorola M68HC05 microcontroller. (Smart cards are becoming important as a platform for implementing cryptography and security.) The machine code occupies 547 bytes of ROM and needs 36 bytes of RAM. One execution of Square (including the key schedule) takes about 7500 cycles. This corresponds to less than 2 milliseconds with a 4-MHz clock. In comparison, a typical DES implementation occupies about 1 KB of ROM. One DES operation encrypts only 64 bits and needs about 16,000 cycles.

Writing out the consecutive transformations of the cipher for 32-bit processors, the succession of steps

[equation]
occurs seven times. These transformations can be combined into a single set of table lookups. An intermediate value of the text can be represented by four 32-bit words, each containing a row [ai]. Its transpose is denoted by [ai]T. For

[equation]
we have Figure 3(a). We define the tables M and T as Figure 3(b). T and M have 256 entries of four bytes each. The table M implements the multiplications in GF(28). T combines the nonlinear substitution with the multiplications. Define rotrj([ai]) as the rotation of the row [ai] with j bytes to the right. This yields Figure 3(c). We conclude that

[equation]
can be done with 16 table lookups, 12 rotations, and 16 XORs of 32-bit words. This implementation needs the table T, with 256 entries of four bytes -- 1 KB in total.

Further Reading

More details about Square are available in our paper "The Block Cipher Square," which has been published in Fast Software Encryption (E. Biham, editor, Springer-Verlag 1995) and at http://www .esat.kuleuven.ac.be/rijmen/square/.

DDJ


Copyright © 1997, Dr. Dobb's Journal

Dr. Dobb's Journal October 1997: The Galois Field GF(28)

The Galois Field GF(28)

By Joan Daemen, Lars R. Knudsen, and Vincent Rijmen

Dr. Dobb's Journal October 1997

Figure 1: Geometrical representation of the basic operations of Square. It consists of four parallel linear diffusion mappings; g consists of 16 separate substitutions and p is a transposition.


Copyright © 1997, Dr. Dobb's Journal

Dr. Dobb's Journal October 1997: The Galois Field GF(28)

The Galois Field GF(28)

By Joan Daemen, Lars R. Knudsen, and Vincent Rijmen

Dr. Dobb's Journal October 1997

Figure 2: Spreading of differences through four consecutive rounds. The left-side squares stand for the input values of the rounds. The sum of the numbers of bytes that are changed at the inputs of the four rounds equals 28. We can show that, in Square, this sum over four consecutive rounds is always at least 25.


Copyright © 1997, Dr. Dobb's Journal

Dr. Dobb's Journal October 1997: The Galois Field GF(28)

The Galois Field GF(28)

By Joan Daemen, Lars R. Knudsen, and Vincent Rijmen

Dr. Dobb's Journal October 1997

Figure 3: Implementing Square.


Copyright © 1997, Dr. Dobb's Journal

Dr. Dobb's Journal October 1997: The Galois Field

Dr. Dobb's Journal October 1997

The Galois Field GF(28)


Two byte values can be combined in many different ways to result in a third value. In general, a rule to combine two values to obtain a third value is called a "composition law." Some obvious examples of composition laws that are applicable to byte values are modular addition and multiplication, and the bitwise logical OR, AND, and XOR operations. The combination of a set and two composition laws is called an "algebra." If the laws obey certain rules, one of the laws can be seen as "additive" and the other as "multiplicative." Mathematicians have developed powerful tools to derive and prove nonobvious properties for these algebras. A particularly relevant property of such an algebra is the ability to solve x from the equation a+b[dot] x=0 for any value of a and any nonzero value of b. An algebra exhibiting these properties is called a field.

Finite Field

A field with a finite number of elements is called a "finite field" or "Galois Field" (GF()), after the mathematician Evariste Galois.

The most obvious candidates for the two composition laws on bytes are addition modulo 256 and multiplication modulo 256. Unfortunately, the resulting algebra is not a field. For instance, the equation 1+2 [dot] x=0 has no solution.

Still, 28 is a prime power and it is therefore possible to define two composition laws such that the resulting algebra is the field GF(28). For the additive law we take the bitwise XOR. The multiplicative is more involved, and is constructed as follows: The bits in the bytes are considered as coefficients of polynomials of degree smaller than 8 and the composition law is the multiplication of polynomials modulo some irreducible polynomial (with no other divisors than 1 and itself, similar to prime number). In practice, the multiplication can be easily implemented using table lookup methods.

-- J.D., L.K., and V.R.


Copyright © 1997, Dr. Dobb's Journal

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.