Dr. Dobb's is part of the Informa Tech Division of Informa PLC

Writing a Bi-Endian Compiler

Compile and Resolve Diagnosed Issues

Compiling the application in big-endian mode still results in a mixed-endian application because the application interacts with system libraries whose byte order is typically dependent on the target architecture, which in this case is little-endian. With the BEC technology it is possible to execute mixed-endian code. One of the requirements when employing the BEC technology is for every type used in the program to have its byte order defined. When the byte order of a type associated with a variable, a parameter, or a structure is undeclared, the compiler uses the default byte order as specified earlier. C and C++ are not type-safe languages and therefore present some challenges in the compiler implementation.

For instance, when a function's prototype is not declared, the function will execute correctly as long as the byte order of the types used in the function definition and in the function call is the same. If the byte order does not match, the function may not execute properly because the function argument may not be in the expected byte order. For these cases, the compiler is enhanced to detect such byte order inconsistency and issue a diagnostic. Example 3 shows a case where the function foo calls an undeclared function zee. Assume function zee is defined to expect one argument whose type is in little-endian byte order. In this case, zee may produce an incorrect result for foo when foo is compiled to expect big-endian types. In general, use of function prototypes is essential for mixed-endian code. The compiler generates warning #266 to encourage this programming practice.

int foo(int x)
{
     zee(x);
     return 0;
}

>$ icc -big-endian -c test.c 
test.c(3): warning #266: function "zee" declared implicitly

Example 3: Undeclared function (Source: Intel Corporation).
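The effect of such a mismatch can be imitated in portable C without the BEC compiler. The sketch below is only a model under stated assumptions: swap32, zee_model, call_without_conversion, and call_with_conversion are hypothetical names, and the explicit byte swap stands in for the adjustment the compiler would insert once it can see a prototype.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Hypothetical stand-in for zee: it trusts that its argument already
   has the byte order it works in and returns it unchanged. */
uint32_t zee_model(uint32_t raw)
{
    return raw;
}

/* No prototype visible: the raw big-endian representation is passed
   through as-is, so the callee sees the bytes in the wrong order. */
uint32_t call_without_conversion(uint32_t value)
{
    uint32_t big_endian_bits = swap32(value); /* BE value as LE code reads it */
    return zee_model(big_endian_bits);
}

/* Prototype visible: the value is converted to the callee's byte
   order before the call. */
uint32_t call_with_conversion(uint32_t value)
{
    uint32_t big_endian_bits = swap32(value);
    return zee_model(swap32(big_endian_bits));
}
```

In this model, passing the value 1 without conversion makes the callee see 0x01000000, while the converted call delivers 1 as intended.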

The byte order of a pointer has two aspects: the byte order of the pointer type itself and the byte order of the pointed-to data type. Example 4 shows a case where the compiler generates a diagnostic because the byte order of the pointed-to data type differs across an assignment statement.

#pragma byte_order (push, bigendian) 
     int *z;
#pragma byte_order (pop) 

int foo(int *x) {
     z = x; 
     return 0;
}

>$ icc -c -little-endian test1.c 
test1.c(6): warning #1696: implicit pointer conversion changes byte order of 
the pointed-to types from "int" to "bigendian int"
      z = x;

Example 4: Byte order difference of pointed-to types (Source: Intel Corporation).

Specifically, the pointer z points to a big-endian integer, as declared by the explicit pragma. The pointer x points to a little-endian integer, whose byte order is specified on the command line. Therefore, dereferencing the assigned pointer (z) may read the value in the opposite byte order from the one intended. The compiler emits warning #1696 in these cases, and the warning must be heeded in order to produce correctly executing code. To address these warnings, modify the code to convert the pointed-to data to a type with the same byte order before assigning.

For structures and bit fields the compiler also implements byte order-specific diagnostics. Example 5 lists a structure in which the big-endian and little-endian bit fields are allocated differently in their containers. Big-endian bit fields are allocated from high to low bit, while little-endian bit fields are allocated from low to high bit. As a result, big-endian and little-endian bit fields allocated to the same container could overwrite each other. Therefore, the compiler issues an error when it detects a structure containing both big-endian and little-endian bit fields. To address these issues, ensure all bit fields contained in the structure have the same byte order.

typedef __attribute__((bigendian))    int be_int; 
typedef __attribute__((littleendian)) int le_int;

struct foo { 
   be_int x:16; 
   le_int y:16;
};

>$ icc -c -little-endian test2.c 
test2.c(6): error #1700: adjacent bit fields have different byte order
    le_int y:16;

Example 5: Byte order difference of Bit fields (Source: Intel Corporation).
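The collision in Example 5 can be made concrete with a small bit-mask model. This is a sketch, not the compiler's actual layout algorithm: field_mask and fields_overlap are hypothetical helpers, the container is assumed to be 32 bits wide, and each field's offset is assumed to be counted in declaration order regardless of its byte order.

```c
#include <stdint.h>

/* Mask of a bit field inside a 32-bit container.
   offset: number of bits already allocated ahead of this field.
   width:  field width in bits, 1..32, with offset + width <= 32.
   from_high_bit: nonzero models big-endian allocation (high to low),
                  zero models little-endian allocation (low to high). */
uint32_t field_mask(unsigned offset, unsigned width, int from_high_bit)
{
    uint32_t ones = (width >= 32u) ? 0xFFFFFFFFu
                                   : (((uint32_t)1 << width) - 1u);
    unsigned shift = from_high_bit ? (32u - offset - width) : offset;
    return ones << shift;
}

/* Nonzero when two fields share at least one bit of the container. */
int fields_overlap(uint32_t a, uint32_t b)
{
    return (a & b) != 0u;
}
```

Under these assumptions, field_mask(0, 16, 1) for the big-endian field x and field_mask(16, 16, 0) for the little-endian field y both yield 0xFFFF0000, so the two fields land on the same bits. When both fields use the same allocation direction, the masks are 0x0000FFFF and 0xFFFF0000 and do not overlap.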

The BEC technology allows a byte order to be attributed to the void type. This extension alleviates potential issues when casting through the void type where the original and final types are of opposite byte order. Example 6 shows a potential error case caused by the void type. The specific issue is that a void pointer defined in a big-endian context is passed to a function that expects a pointer to a little-endian variable, so there is a risk that the void pointer actually points to a big-endian variable. To alleviate this problem, the compiler is enhanced to issue a diagnostic in this case. Since this checking is off by default, the option -diag-enable 2324 should be used to turn it on. To address this issue, ensure the pointed-to type for each function argument and function parameter has the same byte order.

#pragma byte_order (push, littleendian) 
   typedef int myleint; 
   void func(myleint *arg);
#pragma byte_order (pop)

#pragma byte_order (push, bigendian) 
   void *void_var1;
#pragma byte_order(pop) 
   int main() {
      func(void_var1);
      return 0;
   }

>$ icc -c -little-endian -diag-enable 2324 test3.c 
test3.c(10): warning: implicit pointer conversion (involving void) may change 
byte order of the pointed-to types from "bigendian void" to "myleint"
      func(void_var1);

Example 6: Byte order difference of void type (Source: Intel Corporation).
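The idea of attaching a byte order to void can be pictured with a runtime analogue in portable C. The sketch below is illustrative only: BEC performs this check at compile time, and tagged_void_ptr, pointee_order, and cast_checked are hypothetical names that do not exist in the compiler.

```c
#include <stddef.h>

/* Byte order recorded for the data a generic pointer refers to. */
enum pointee_order { POINTEE_LITTLE, POINTEE_BIG };

/* Hypothetical runtime analogue of a byte-order-attributed void pointer. */
struct tagged_void_ptr {
    void *addr;
    enum pointee_order order;
};

/* Hand out the address only when the recorded byte order matches the
   one the receiver expects; return NULL on a mismatch. */
void *cast_checked(struct tagged_void_ptr p, enum pointee_order expected)
{
    return (p.order == expected) ? p.addr : NULL;
}
```

A pointer tagged POINTEE_BIG that is handed to a receiver expecting POINTEE_LITTLE comes back as NULL, which corresponds to the situation the compiler flags in Example 6.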

Finally, conversion between pointers to values of different sizes (for example, from int * to char *), while safe in code that employs little-endian types, may yield an incorrect pointed-to value in code that employs big-endian types and executes on a little-endian processor. Standard compiler diagnostics are extended to account for pointer casts of different sizes. One method of addressing this class of issue is to explicitly convert the source and destination pointed-to values to little-endian before the cast.
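A portable C sketch shows why the narrower view is sensitive to byte order. It uses explicit byte arrays instead of BEC types so that it behaves the same on any host; store_le32, store_be32, load_be32, first_byte, and be32_to_le32 are hypothetical helper names.

```c
#include <stdint.h>

/* Store a 32-bit value least significant byte first (little-endian). */
void store_le32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Store a 32-bit value most significant byte first (big-endian). */
void store_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Read back a value that was stored big-endian. */
uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* What a char-sized view of the object sees: the lowest-address byte. */
unsigned char first_byte(const unsigned char in[4])
{
    return in[0];
}

/* Convert the stored representation to little-endian before taking
   the narrower view. */
void be32_to_le32(unsigned char out[4], const unsigned char in[4])
{
    store_le32(out, load_be32(in));
}
```

For the value 0x41, the first byte of the little-endian representation is 0x41, while the first byte of the big-endian representation is 0x00. After be32_to_le32, the narrow view again yields 0x41.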

Employ Symbol Consistency Checking

The second step in porting is to employ the symbol consistency checking utility and resolve identified incompatibilities between different compilation units. To perform symbol consistency checking, compile the sources with the -symcheck option and feed the resulting executable to the BEC symbol consistency checking tool.

Consider where symbol incompatibilities arise and why they cause problems for applications. As previously described, the compiler makes automatic code adjustments based on the byte order information available through type declarations and function prototypes. While the compiler checks for correctness within a compilation unit, it knows nothing about interactions across different units. The BEC symbol consistency checking tool fills this gap: it verifies that global symbols with the same name have bi-endian compatible types across all compilation units compiled with -symcheck and reports any incompatibilities. Two types are considered bi-endian compatible when they are compatible according to the C language specification and either have the same byte order or are byte order agnostic. Thus, if a global symbol has type A in one compilation unit and type B in a second, the two types must be bi-endian compatible for the application to function properly.
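The compatibility rule can be sketched as a small predicate. This is a simplified model rather than the tool's implementation: symbol_type, effective_order, and bi_endian_compatible are hypothetical names, type size stands in for full C type compatibility, and one-byte types are assumed to be the byte order agnostic ones.

```c
/* Byte order attached to a type in one compilation unit. */
enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_AGNOSTIC };

/* Hypothetical summary of a global symbol's type as one unit sees it. */
struct symbol_type {
    unsigned size_in_bytes;    /* stand-in for the full C type */
    enum byte_order declared;  /* byte order attached in this unit */
};

/* Assumption: a one-byte type has no byte order to disagree about. */
enum byte_order effective_order(const struct symbol_type *t)
{
    return (t->size_in_bytes == 1u) ? ORDER_AGNOSTIC : t->declared;
}

/* Nonzero when the two views of the symbol are bi-endian compatible
   under the simplified rule described above. */
int bi_endian_compatible(const struct symbol_type *a,
                         const struct symbol_type *b)
{
    if (a->size_in_bytes != b->size_in_bytes)
        return 0;  /* fails the C compatibility stand-in */
    return effective_order(a) == effective_order(b);
}
```

With this model, a four-byte symbol seen as little-endian in one unit and big-endian in another is reported as incompatible, while a one-byte symbol is compatible regardless of the default byte order of each unit.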

An example of bi-endian incompatible types is the case of two types with different byte order. Assume you have included header file "i.h" with a declaration of a global variable g in two compilation units compiled with different default byte order. As a result, the symbol g is attributed as little-endian in one compilation unit and big-endian in another. The BEC symbol consistency checking tool reports the error listed in Example 7.

To address the identified issue, the declarations would need to be modified to have the same byte order.

bepostld: error #32600: symbol 'g' type differs between modules
   First declared at t2.c(2)
   Later redeclared at i.h
      included from t1.c(1)
Error reason:
   incompatible endianess:
      declared at t2.c(2)
   is not compatible with __attribute__((bigendian))
      declared at i.h
         included from t1.c(1)

Example 7: Symbol consistency error example (Source: Intel Corporation).

The utility diagnoses other problems that may affect application correctness including mismatching type size, number of function arguments, and field offset. Typically, to help ensure correct execution of the application, one needs to address all the errors reported by the tool. These diagnostics serve a useful purpose in helping the developer correct issues before they result in difficult-to-debug execution time problems.

Manual Review and Debug

The third porting step is to conduct a manual review of the code and debug using BEC-enabled debuggers. Some parts of the code cannot always be checked automatically for byte order consistency, and inline assembly is one of them. Example 8 illustrates a case where inlined assembler code that manipulates C variables receives automatic adjustment of the byte order by the compiler.

However, if the source code of the application contains assembler instructions that directly manipulate big-endian data in memory, the source code must be modified to assume the little-endian byte order of the target architecture.

#include <stdio.h>

int main(){ 
   int __attribute__((littleendian)) i = 1; 
   int __attribute__((bigendian)) j = 0; 
   // The asm code below sets j = i; 
   asm ("movl %1, %%eax;"
        "movl %%eax, %0;"
        : "=r" (j)    // output operand %0
        : "r" (i)     // input operand %1
        : "%eax");    // clobbered register
   printf("i = %d, j = %d\n", i, j); 
   return 0;
}

$ icc -o test.exe -big-endian test.c 
$ ./test.exe
i = 1, j = 1

Example 8: Assembly language byte order example (Source: Intel Corporation).

Other examples of problematic code that require manual review are: overlapping union fields specified to have different byte order, bit fields operated on in bulk, and other operations on data that disregard the specified data type.

To help users catch byte order-specific issues the Intel debugger (idb) is employed. It is capable of displaying values of various byte orders correctly. Consider two global variables of different byte order defined in a source file test.c as listed in Example 9.

int __attribute__((bigendian)) bi = 1;
int __attribute__((littleendian)) li = 1;

Example 9: Byte order of global variables (Source: Intel Corporation).

The option -debug biendian enables the compiler to produce additional debug information for correct display of big-endian data:

icc -g -debug biendian -big-endian test.c

After compilation the executable can be debugged using idb. A debug session will display the byte order and correct value as shown in Example 10.

(idb) whatis bi 
type = int __be 
(idb) p bi 
$1 = 1
(idb) whatis li 
type = int 
(idb) p li 
$2 = 1

Example 10: Debugger console view of variables (Source: Intel Corporation).

Porting Effort

Table 1 details statistics on the effort to port various applications. These applications were ported by compiling them with the -big-endian option and following the steps detailed previously. In total, 19 C/C++ applications from SPEC CPU2006 were ported.

Application      Total Lines of Code   Modified Lines of Code
400.perlbench    154034                77
401.bzip2        7057
403.gcc          457947                185
429.mcf          2057                  0
433.milc         13192                 8
444.namd         4589                  8
445.gobmk        174467                68
447.dealII       176393                20
450.soplex       36829                 0
453.povray       130551                48
456.hmmer        32692                 17
458.sjeng        12273                 10
462.libquantum   3391                  0
464.h264ref      46046                 19
470.lbm          975                   6
471.omnetpp      40200                 75
473.astar        4467                  1
482.sphinx3      22438                 54
483.xalancbmk    463252                86

Table 1: Porting statistics (Data source: Intel Corporation, 2011).

As shown, about half of the applications required changes amounting to 10 lines of code or fewer. Large applications require more changes, but these remain small relative to the overall size of the application; 403.gcc, for example, needed 185 modified lines out of 457,947, or about 0.04 percent. All of these applications except one executed correctly after completion of the first porting step (addressing the default bi-endian-specific compiler diagnostics); the remaining application required manual changes. Porting most of the applications took less than one hour each for an experienced BEC user, and the most complex cases took up to a day. Training a new user typically takes approximately three days.
