### Performance Analysis

We measured the performance of the AES-CBC + Elephant diffuser and of the proposed AES-ECB + Elephant diffuser on a single processor, both with the original diffuser parameters and with the values recommended here for maximum performance. We then estimated the performance of both schemes on a dual-core processor.

All measurements are reported in processor clock cycles on a 3-GHz Pentium 4 (PIV) processor running Windows Vista. The programming environment is Microsoft Visual C++.

### Single Processor

Our optimized implementation of Diffuser A and Diffuser B (using loop unrolling) shows that encrypting one 512-byte sector with the AES-CBC + Elephant diffuser requires:

- 4560 clock cycles for the current implementation of the diffusion layer (AC=5 and BC=3).
- 256 clock cycles for the 32-bit *XOR* operations: 128 for *XOR*ing the sector key and the other 128 for the *XOR*s of the CBC mode. A 512-byte sector holds 128 32-bit words, so each of these steps takes 128 *XOR*s.
- 13,888 clock cycles for the AES encryption (using optimized assembly language).

In total, it takes 18,704 clock cycles to encrypt a 512-byte sector with the AES-CBC + Elephant diffuser. Reducing the diffuser parameters to AC=2 and BC=1 lowers this to 15,854 cycles, which makes encryption about 18 percent faster (a reduction of about 15 percent in total running time).
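The single-processor totals follow from adding the per-step costs. The minimal sketch below reproduces them. The function name is hypothetical, and the code assumes, as the reported figures suggest, that every Diffuser A or Diffuser B pass costs the same 570 cycles (4560 / (5 + 3)), so that the diffusion layer scales linearly with AC + BC.

```python
# Per-sector cycle counts reported in the text (512-byte sector, one core).
AES_CYCLES = 13_888       # AES encryption of the whole sector
SECTOR_KEY_XOR = 128      # 128 32-bit XORs with the sector key
CBC_XOR = 128             # 128 32-bit XORs for CBC chaining

# Assumption: each Diffuser A or B pass costs the same; 4560 cycles / (5 + 3) passes.
CYCLES_PER_PASS = 4_560 // (5 + 3)   # 570


def cbc_elephant_cycles(ac: int, bc: int) -> int:
    """Estimated cycles to encrypt one sector with AES-CBC + Elephant (hypothetical helper)."""
    diffusion = CYCLES_PER_PASS * (ac + bc)
    return diffusion + SECTOR_KEY_XOR + CBC_XOR + AES_CYCLES


original = cbc_elephant_cycles(5, 3)   # original diffuser parameters
reduced = cbc_elephant_cycles(2, 1)    # reduced diffuser parameters
print(original, reduced, f"speedup {original / reduced:.2f}x")
```

Running it prints `18704 15854 speedup 1.18x`. Since only `ac` and `bc` change between the two calls, under this model the whole saving comes from the diffusion layer.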

With the AES-ECB + Elephant diffuser, encrypting one sector requires:

- 4560 clock cycles for the current implementation of the diffusion layer (AC=5 and BC=3).
- 128 clock cycles for *XOR*ing the sector key.
- 32 clock cycles for adding the counter.
- 13,888 clock cycles for the AES encryption.

In this case, encrypting a 512-byte sector with the AES-ECB + Elephant diffuser takes 18,608 clock cycles. Using the minimum values recommended for maximum performance (AC=2 and BC=2) reduces this to 16,328 cycles, which makes encryption about 14 percent faster (about 12 percent less total running time).
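The same additive model applies to the AES-ECB + Elephant diffuser, with the CBC chaining *XOR*s replaced by the counter addition. This sketch again assumes a uniform 570 cycles per diffuser pass (a hypothesis consistent with the reported totals; the function name is hypothetical) and prints the gain both as a speedup and as the fraction of running time saved.

```python
# Per-sector cycle counts reported in the text (512-byte sector, one core).
AES_CYCLES = 13_888       # AES encryption of the whole sector
SECTOR_KEY_XOR = 128      # 128 32-bit XORs with the sector key
COUNTER_ADD = 32          # counter addition
CYCLES_PER_PASS = 570     # assumption: 4560 / (5 + 3) per diffuser pass


def ecb_elephant_cycles(ac: int, bc: int) -> int:
    """Estimated cycles to encrypt one sector with AES-ECB + Elephant (hypothetical helper)."""
    diffusion = CYCLES_PER_PASS * (ac + bc)
    return diffusion + SECTOR_KEY_XOR + COUNTER_ADD + AES_CYCLES


original = ecb_elephant_cycles(5, 3)
tuned = ecb_elephant_cycles(2, 2)   # minimum recommended values
print(f"speedup {original / tuned:.2f}x, time saved {1 - tuned / original:.1%}")
```

Running it prints `speedup 1.14x, time saved 12.3%`, the two ways of reading the same 2280-cycle saving.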

### Dual Processor

To take advantage of dual-core processors, we investigated the AES-ECB + Elephant diffuser, whose AES-ECB layer can easily be parallelized, and estimated the processing time of both schemes on a dual-core processor. For simplicity, we halve the processing time of every step that can be parallelized and leave the serial steps unchanged.
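The halving rule can be written as a small estimator. The `Step` type and `estimate_cycles` function below are hypothetical helpers, not part of any implementation described here. Like the text, the model ignores thread start-up, synchronization, and cache effects, so its results are optimistic.

```python
from typing import NamedTuple


class Step(NamedTuple):
    name: str
    cycles: int      # single-core cost of the step, in clock cycles
    parallel: bool   # True if the step's work can be split evenly across cores


def estimate_cycles(steps: list[Step], cores: int = 2) -> int:
    """Parallel steps cost cycles / cores; serial steps keep their full cost."""
    return sum(s.cycles // cores if s.parallel else s.cycles for s in steps)


# Hypothetical workload: one serial step and one parallel step of equal size.
workload = [Step("serial part", 1_000, False), Step("parallel part", 1_000, True)]
print(estimate_cycles(workload))  # 1500
```

Only the parallel step shrinks, so the serial part of a scheme bounds how much a second core can help.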

For the AES-CBC + Elephant diffuser, *XOR*ing with the sector key can be parallelized, so it takes only 64 clock cycles. Neither the diffusion layer nor AES-CBC encryption (including its 128 chaining *XOR*s) can be parallelized, because both are inherently serial. The estimated processing times are therefore 18,640 (when AC=5 and BC=3) and 15,790 (when AC=2 and BC=1) clock cycles for encrypting a 512-byte sector, a saving of only 64 cycles over a single processor.
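Applying the halving rule to the AES-CBC + Elephant diffuser gives the estimate below. In this sketch (hypothetical function name; 570 cycles per diffuser pass assumed, as in the single-processor model), only the sector-key *XOR* is divided across the two cores. The diffusion layer, the CBC chaining *XOR*s, and the AES-CBC encryption stay serial.

```python
CYCLES_PER_PASS = 570   # assumption: 4560 / (5 + 3) per diffuser pass
CORES = 2


def cbc_elephant_dual_core(ac: int, bc: int) -> int:
    """Estimated dual-core cycles for AES-CBC + Elephant on one 512-byte sector."""
    sector_key_xor = 128 // CORES              # parallel: words are independent
    diffusion = CYCLES_PER_PASS * (ac + bc)    # serial
    cbc_xor = 128                              # serial: each block needs the previous ciphertext
    aes_cbc = 13_888                           # serial for the same reason
    return sector_key_xor + diffusion + cbc_xor + aes_cbc


print(cbc_elephant_dual_core(5, 3), cbc_elephant_dual_core(2, 1))
```

Because almost all of the work sits in serial steps, the second core barely changes the result for this scheme.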

For the AES-ECB + Elephant diffuser, counter addition and *XOR*ing with the sector key can both be parallelized, so they take only 16 and 64 clock cycles, respectively. The diffusion layer still cannot be parallelized, but the AES-ECB layer can, which halves its cost to 6944 cycles. The estimated processing times are therefore 11,584 (when AC=5 and BC=3) and 9304 (when AC=2 and BC=2) clock cycles for encrypting a 512-byte sector. Depending on the values of AC and BC, this is about 60 to 100 percent faster than the original AES-CBC + Elephant diffuser running on a single processor (18,704 cycles).
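For the AES-ECB + Elephant diffuser, three of the four steps split across the cores. The sketch below (hypothetical names; 570 cycles per diffuser pass assumed) computes the dual-core estimates and compares them with the single-processor AES-CBC + Elephant baseline of 18,704 cycles.

```python
CYCLES_PER_PASS = 570               # assumption: 4560 / (5 + 3) per diffuser pass
CORES = 2
SINGLE_CORE_CBC_ORIGINAL = 18_704   # AES-CBC + Elephant, AC=5, BC=3, one core


def ecb_elephant_dual_core(ac: int, bc: int) -> int:
    """Estimated dual-core cycles for AES-ECB + Elephant on one 512-byte sector."""
    counter_add = 32 // CORES                  # parallel
    sector_key_xor = 128 // CORES              # parallel
    diffusion = CYCLES_PER_PASS * (ac + bc)    # serial
    aes_ecb = 13_888 // CORES                  # parallel: ECB blocks are independent
    return counter_add + sector_key_xor + diffusion + aes_ecb


for ac, bc in [(5, 3), (2, 2)]:
    cycles = ecb_elephant_dual_core(ac, bc)
    gain = SINGLE_CORE_CBC_ORIGINAL / cycles - 1
    print(f"AC={ac}, BC={bc}: {cycles} cycles, {gain:.0%} faster")
```

Running it prints 11584 cycles (61% faster) for AC=5, BC=3 and 9304 cycles (101% faster) for AC=2, BC=2, which matches the 60 to 100 percent range.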