Padding and Rearranging Structure Members


Many of today's processors can address memory one 8-bit byte at a time. They can also access memory as larger objects such as 2- or 4-byte integers, 4-byte pointers, or 8-byte floating-point numbers.

Multibyte objects often have an alignment requirement. The C Standard defines alignment as a "requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address". The Standard leaves it up to each target processor to specify its alignment requirements. For example, a processor might require that a 4-byte integer or pointer referenced as a single object be word aligned, that is, at an address that's a multiple of four. A processor also might require that an 8-byte floating-point number be word aligned, or maybe even double-word aligned, that is, at an address that's a multiple of eight.
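As a rough illustration of that definition, the following sketch asks the compiler which alignment it uses for a few types and tests whether an address is a multiple of a given alignment. It assumes a C11 compiler (for alignof from <stdalign.h>). The names is_aligned and print_alignments are hypothetical, and the pointer-to-integer conversion in is_aligned is implementation-defined, although it yields the byte address on mainstream platforms. The printed values depend on the target.

```c
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>    /* size_t */
#include <stdint.h>    /* uintptr_t */
#include <stdio.h>

/* Hypothetical helper: nonzero if p sits at a multiple of alignment.
   Assumes that converting a pointer to uintptr_t yields its byte
   address, which is implementation-defined. */
int is_aligned(const void *p, size_t alignment)
{
    return (uintptr_t)p % alignment == 0;
}

/* Hypothetical helper: prints the alignment this compiler reports
   for a few common types. Results vary from target to target. */
void print_alignments(void)
{
    printf("char:   %zu\n", alignof(char));
    printf("int:    %zu\n", alignof(int));
    printf("void *: %zu\n", alignof(void *));
    printf("double: %zu\n", alignof(double));
}
```

Calling print_alignments() from main shows the requirements for the platform at hand. Only the value for char is fixed at one by the language.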

According to the C Standard, a program that attempts to access an improperly aligned object produces undefined behavior. This means that the program is in error, but the exact consequences of that error are platform-dependent. With many processors, an instruction that attempts to access improperly aligned data issues a trap. With other processors, an instruction that accesses misaligned data executes properly but uses up more cycles to fetch the data than if the data were properly aligned.
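One common way to stay clear of this undefined behavior, when a multibyte value sits at an address that might not be suitably aligned, is to copy its bytes into a properly aligned object rather than dereference a converted pointer. The sketch below is only an illustration of that idea: load_u32 is a hypothetical name, and the code assumes the target provides uint32_t.

```c
#include <stdint.h>  /* uint32_t */
#include <string.h>  /* memcpy */

/* Hypothetical helper: reads a 32-bit value that starts at any byte
   address. memcpy copies byte by byte, so p needs no particular
   alignment, while value is an ordinary, properly aligned object. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

By contrast, an expression such as *(const uint32_t *)p would be an error whenever p is not suitably aligned, even on processors that happen to tolerate the access.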

An object whose address requirement is a higher multiple than another is said to have a stricter alignment. For example, an object that must be double-word aligned (at an address that's a multiple of eight) has a stricter alignment than an object that must be only word aligned (at an address that's a multiple of four). Character objects always have a size of one (by definition) and can reside at any boundary. They have no alignment requirement.

Machines with 4-byte words and 8-byte double words are very common but hardly universal. The following discussion uses these common sizes for illustrative purposes only. Please bear in mind that machines with other word sizes and alignment requirements do exist.

Structure Padding in C

C compilers may insert unused bytes called "padding bytes" after structure members to ensure that each member is appropriately aligned. For example, given:

typedef struct widget widget;
struct widget
{
    char m1;
    int m2;
    char m3;
};

on a machine where int occupies four bytes and must be word aligned (at an address that's a multiple of four), the compiler will insert 3 bytes of padding after m1 and another 3 bytes after m3, as if the structure had been defined as:

typedef struct widget widget;
struct widget
{
    char m1;
    char padding_after_m1[3];
    int m2;
    char m3;
    char padding_after_m3[3];
};

except that the added members don't actually have names. The C Standard gives compilers the freedom to add more padding, but I don't know why a compiler would add more than necessary.
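The padding can be observed without guessing by using offsetof and sizeof. The following is a minimal sketch in which the helper names are hypothetical. On a machine with a 4-byte, word-aligned int, it should report offsets 0, 4, and 8, a size of 12, and 3 bytes of padding after each char member. Other targets may report different values.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

typedef struct widget widget;
struct widget
{
    char m1;
    int m2;
    char m3;
};

/* Hypothetical helper: bytes of internal padding between m1 and m2. */
size_t padding_after_m1(void)
{
    return offsetof(widget, m2) - (offsetof(widget, m1) + sizeof(char));
}

/* Hypothetical helper: bytes of trailing padding after m3. */
size_t padding_after_m3(void)
{
    return sizeof(widget) - (offsetof(widget, m3) + sizeof(char));
}

/* Hypothetical helper: prints the layout this compiler chose. */
void print_widget_layout(void)
{
    printf("offset of m1:     %zu\n", offsetof(widget, m1));
    printf("offset of m2:     %zu\n", offsetof(widget, m2));
    printf("offset of m3:     %zu\n", offsetof(widget, m3));
    printf("sizeof(widget):   %zu\n", sizeof(widget));
    printf("padding after m1: %zu\n", padding_after_m1());
    printf("padding after m3: %zu\n", padding_after_m3());
}
```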

(The typedef definition immediately before the structure definition elevates the name widget from a tag to a full-fledged type name; for more information, see Tag vs. Type Names. This lets C code refer to the type as just widget, rather than as struct widget. For brevity, I'll omit the typedefs from now on, but you should assume that they are there.)

Padding between adjacent structure members is called "internal padding." Padding after the last member is called "trailing padding." The C Standard doesn't allow leading padding. It specifically states that "A pointer to a structure object, suitably converted, points to its initial member and vice versa. There may be unnamed padding within a structure object, but not at its beginning."
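The guarantee about the initial member can be checked directly. The sketch below assumes a C11 compiler (for static_assert from <assert.h>), and points_to_first_member is a hypothetical name used only for illustration.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

typedef struct widget widget;
struct widget
{
    char m1;
    int m2;
    char m3;
};

/* No leading padding: the first member always sits at offset zero. */
static_assert(offsetof(widget, m1) == 0, "m1 must be at offset 0");

/* Hypothetical helper: nonzero if a pointer to the widget, converted
   to char *, has the same value as a pointer to its initial member. */
int points_to_first_member(widget *w)
{
    return (char *)w == &w->m1;
}
```

For any widget object w, points_to_first_member(&w) should yield a nonzero value, whatever the target's alignment rules are.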

Each structure object must be at least as strictly aligned as its most strictly aligned member. For example, the most strictly aligned member in widget is integer m2, which is word aligned. However, the m2 member of an actual widget object won't be word aligned unless that widget object is also word aligned. Thus each widget must be at least word aligned. A compiler could decide to make widget double-word aligned, which it might do if the stricter alignment yielded faster memory access. (I'm just speculating here. I don't know of a compiler that actually does this.)
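That relationship between a structure's alignment and its members' alignments can be stated as compile-time checks. This is a minimal sketch assuming a C11 compiler (alignof from <stdalign.h> and static_assert from <assert.h>). The helper element_stride is hypothetical and is included only to show that consecutive array elements lie sizeof(widget) bytes apart, which is why the size must be a multiple of the alignment.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>    /* ptrdiff_t */

typedef struct widget widget;
struct widget
{
    char m1;
    int m2;
    char m3;
};

/* A widget is at least as strictly aligned as its strictest member. */
static_assert(alignof(widget) >= alignof(int),
              "widget must be at least as aligned as int");

/* The size is a multiple of the alignment, so every element of an
   array of widgets is properly aligned. */
static_assert(sizeof(widget) % alignof(widget) == 0,
              "sizeof(widget) must be a multiple of its alignment");

/* Hypothetical helper: distance in bytes between two adjacent
   elements of an array of widgets. */
ptrdiff_t element_stride(void)
{
    widget pair[2];
    return (char *)&pair[1] - (char *)&pair[0];
}
```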
