Type-Based Alias Analysis in C

The language C has several constraints which allow compiler vendors to implement optimizations in order to emit fast code. One of such an optimization is based on type-based alias analysis (TBAA for short) which itself makes use of the strict aliasing rules. Aliasing refers to the property that one memory location may be accessed in several ways. In C this is done by deriving a pointer from an object. Consider the following function:

void foo(int *x, short *y) {
    *x  = 40;
    *y  = 0;
    *x += 2;
}

Assume a little-endian architecture and a function call foo(&n, (short *) &n) for some variable n of type int. Then after the call variable n holds different values depending on the level of optimization if compiled with GCC 8.2 or Clang 7.0 for the x86-64 architecture. If compiled without any optimization, i.e., compiler option -O0, then variable n holds integer 2, and if compiled with optimization level 2, then variable n holds integer 42. The unoptimized version of function foo is basically a one-to-one translation of the original code whereas the optimized version of the function looks similar to the following one:

void foo(int *x, short *y) {
    *y = 0;
    *x = 42;
}

The compiler assumed that pointers x and y do not alias and based on that performed constant propagation. Is the assumption sound?

tl;dr Rule of thumb: accessing an object of type T through an lvalue expression of type U where T and U are not compatible (modulo few exceptions) invokes undefined behavior.

That means, in the example we accessed an object of type int through an lvalue of type short which leads to undefined behavior. This is defined in C11 section 6.5 paragraph 7:

C11 § 6.5 Expressions ¶ 7

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:88)

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.

Let us examine the list of allowed lvalue expressions by several examples:

typedef int sint32_t;
sint32_t *p0;

const int *p1;
unsigned int *p2;
const unsigned int *p3;

struct s { short a; int b; long c; };
struct s *p4;

char *p5;

All pointers p0 through p5 may alias any pointer of type int.

N.B. the standard only speaks about the type of an object and the type of an lvalue in order to access an object. Thus a pointer to an object x may be converted arbitrarily often to arbitrary object pointer types, and therefore even to incompatible types, as long as every access to x is done through an lvalue which type conforms to C11 section 6.5 paragraph 7.

int x;
float *y = (float *) &x;
int *z = (int *) y;
*z = 42;

The example is sound w. r. t. C11 although type int and float are incompatible. However, dereferencing y would lead to undefined behavior.

TBAA in Action

Let us have a look at an example in order to get a better intuition of TBAA and its strengths.

void bar(int *x, T *y) {
    for (int i = 0; i < 1000; ++i)
        x[i] = *y + 42;
}

If type T is compatible with type int, then x and y may alias and therefore it must be assumed that during some iteration the memory location referenced by y changes. Thus in each iteration of the loop the value referenced by y must be loaded from memory. However, if type T is not compatible with int in the sense of C11 section 6.5 paragraph 7, then it is sound to assume that x and y do not alias. In that case the memory lookup *y can be performed exactly one time so that the resulting value of the expression *y + 42 may be stored in a register which is then accessed in each iteration. Consider the following assembler code where function bar is translated by GCC as follows where T is an int on the left-hand side and an short on the right-hand side:

bar(int*, int*):
lea   rdx, [rdi+4000]
.L2:
mov   eax, DWORD PTR [rsi]
add   rdi, 4
add   eax, 42
mov   DWORD PTR [rdi-4], eax
cmp   rdi, rdx
jne   .L2
ret
bar(int*, short*):
movsx eax, WORD PTR [rsi]
lea   rdx, [rdi+4000]
add   eax, 42
.L2:
mov   DWORD PTR [rdi], eax
add   rdi, 4
cmp   rdi, rdx
jne   .L2
ret

For the case where T is a short, GCC is able to perform the memory lookup *y exactly one time and therefore evaluates the expression *y + 42 exactly one time, too. The resulting value is then cached in a register and reused in each iteration. In contrast to the case where T is an int, the memory lookup *y must be performed a thousand times and therefore also the expression *y + 42 must be evaluated a thousand times, too.

TBAA in Action Reloaded

Another example which looks rather trivial but where TBAA shows its strength is given in this LLVM post. The example is even simpler than the previous one in the sense that only one pointer is involved. Assume that sizeof(float) equals 4.

float *x;
void zero_array(void) {
    for (int i = 0; i < 1000; ++i)
        x[i] = 0.0f;
}

If compiled with strict aliasing rules enabled, then Clang is able to translate the loop into a function call memset(x, 0, 4000) which is several times faster than performing 1000 times a 4-byte store. In case strict aliasing rules were not enabled, then after each 4-byte store it must be assumed that x changed and therefore no optimization as e.g. (partial) loop-unrolling is allowed. A conflicting program could look as follows:

int main(void) {
    x = (float *) &x; // cast causes TBAA violation in zero_array
    zero_array();
    return 0;
}

Here x is initialized with a pointer to itself which violates the strict aliasing rules as soon as x is dereferenced in function zero_array.

TBAA to the Rescue

The two previous examples demonstrate how TBAA can be used in order to emit optimized code. However, the strict aliasing rules are not only good for optimizations, although they are most often mentioned in this context, but also to enable portable code. Consider the following example:

uint16_t x[2] = { 0 };
*(uint32_t *) x; // unaligned?

On most modern architectures x may be aligned at any address which is a multiple of 2 bytes. Though, a 32-bit access should typically happen with an alignment of 4 bytes. Thus, depending on the alignment constraints of an architecture and the final alignment of object x, the read access may just work as expected, or may be slower because it is unaligned which is more expensive on the respective architecture, or it may even raise a memory exception at run-time.

Type Punning

So far we have seen that reinterpreting a value of an object by accessing it through an incompatible type leads to undefined behavior. The question which is left is: how to reinterpret a value correct? This is the place where unions come into play. Assume we are about to swap the words of a 32-bit integer. A naive implementation could look as follows which involves undefined behavior.

void swap(uint32_t *x) {
    uint16_t *p = (uint16_t *) x;
    uint16_t tmp  = p[0];  // UB
    p[0] = p[1];           // UB
    p[1] = tmp;            // UB
}

Since object *x of type uint32_t is accessed via an lvalue of type uint16_t, which are not compatible, the code snippet contains undefined behavior. A well defined implementation might look as follows:

void swap(uint32_t *x) {
    union {
        uint32_t ui;
        uint16_t p[2];
    } my_union = { *x };
    uint16_t tmp  = my_union.p[0];
    my_union.p[0] = my_union.p[1];
    my_union.p[1] = tmp;
    *x = my_union.ui;
}

GCC as well as Clang compile the body of the function into the single instruction rol DWORD PTR [rdi], 16. Thus both compilers recognize the pattern behind the usage of the union pretty well and emit optimized code.

Type punning via unions is mentioned in C11 section 6.5.2.3 paragraph 3.

C11 § 6.5.2.3 Structure and union members ¶ 3
[…] If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called “type punning”) […]

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called “type punning”). This might be a trap representation.

Strictly speaking this is a footnote and therefore non-normative but it reflects the committees intentions pretty well.

Copying Bits

Another way of type punning is to copy the raw bits from one object to another where the types of both objects are incompatible. Let us demonstrate this by an example. Assume we want to test if a variable f of type float is negative or not. A naive implementation would simply test if f < 0.0f is true or not. However, a drawback of this approach is that we have to make use of floating-point instructions which are rather expensive in the sense of run-time. An optimized version just tests if the sign-bit is set or not. For that we have to access the bits of an object of type floating-point via an integer type. Assume that the size of type float and unsigned are 32-bit, respectively. Furthermore, assume a little-endian architecture. Then an example implementation could look as follows:

bool is_negative(float f) {
    unsigned i;
    memcpy(&i, &f, sizeof(f));
    return (0x80000000 & i) != 0;
}

All accesses to variables f and i are sound according to C11 section 6.5 paragraph 7. The function call to memcpy is removed by GCC and compiled into an ordinary register move instruction:

is_negative(float):
    movd    eax, xmm0  // memcpy
    shr     eax, 31    // (0x80000000 & i) != 0
    ret

The same could be achived via a union as proposed before:

bool is_negative(float f) {
    union { float f; unsigned i; } u = { f };
    return (0x80000000 & u.i) != 0;
}

Interestingly GCC compiles this version into the very same assembler code as the version which uses memcpy. Therefore, the pattern behind the usage of memcpy as well as an union is recognized and optimized code is emitted. According to section 6.5 paragraph 7 we have another option to access an object. Each object may be accessed through an lvalue of any character type. Thus a further option to copy the bits from one object to another one is:

bool is_negative(float f) {
    unsigned i;
    unsigned char *pi = (unsigned char *) &i;
    unsigned char *pf = (unsigned char *) &f;

    for (size_t n = 0; n < sizeof(float); ++n)
        pi[n] = pf[n];

    return (0x80000000 & i) != 0;
}

From a personal point of view, I favor the solution via memcpy or a union over this one since they seem to be more concise and readable, i.e., the intention is more clear. From a technical point of view, we have that GCC does not recognize the pattern behind the loop which means that multiple store operations are emitted. In contrast to Clang which eliminates the loop completely, i.e., emits as optimal code as for the memcpy and union example.

For the sake of completeness let us demonstrate by another implementation how powerful character pointers are. Instead of copying the whole object we could simply access the corresponding byte which comprises the sign-bit.

bool is_negative(float f) {
    unsigned char *c = (unsigned char *) &f;
    return (0x80 & c[3]) != 0;
}

This time GCC as well as Clang emit as optimal code as for the memcpy and union example.

Character Type

It is worthwhile to mention that only a pointer to a character including char, signed char and unsigned char may alias any other pointer according to C11 section 6.5 paragraph 7. In particular this means that it is not guaranteed that a pointer of type uint8_t* or int8_t* may alias any other pointer. It depends on how the type uint8_t or int8_t is implemented. In C11 section 7.20.1.1 the exact-width integer types intN_t and uintN_t are defined but no guarantee is given that uint8_t or int8_t are a typedef of a character type. Some implementations may provide special purpose 8-bit width integers which are used for int8_t and uint8_t. This has been noted in this GCC bug report.

Fun fact: since C11 unsigned char and signed char are guaranteed to have no padding bits and therefore no trap representation (C11 § 6.2.6.2 ¶ 1 and 2). In C99 this was only guaranteed for unsigned char whereas a signed char may have padding bits and thus might generate trap representations.

Reinterpreting Bits

The idea of type punning is to reinterpret some or all bits of an object of some type T as the bits of an object of type Q. However, this raises the question what happens if a bit representation of an object of type Q does not represent a value of that type? For example, an integer of size n could reserve m bits for parity and ECC checking. Therefore, only n - m bits could be used in order to represent values of that particular integer type. Thus reinterpreting the bits of a valid object of some type T does not mean that the same bits represent a valid value of some type Q. This is also explicitly stated in C11 section 6.2.6.1 paragraph 5.

C11 § 6.2.6.1 General ¶ 5

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.

50) Thus, an automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it.

Let us have a look at an example where a 32-bit value is written and 16-bit are read:

uint16_t foo(uint32_t x) {
  union { uint32_t a; uint16_t b; } u = { x };
  return u.b;
}

According to C11 section 6.5.2.3 paragraph 3 the bits corresponding to u.b are reinterpreted as described in section 6.2.6 which means that in general the values are unspecified or possibly even be a trap representation. For integer types the values are implementation-defined or a trap representation.

Let us have a look at the dual where a 16-bit value is written and a 32-bit value is read:

uint32_t bar(uint16_t x) {
  union { uint16_t a; uint32_t b; } u = { x };
  return u.b;
}

This time less bits get written than read which is explicitly mentioned in C11 section 6.2.6.1 paragraph 7.

C11 § 6.2.6.1 General ¶ 7

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

Thus in this case we have that the value for u.b is unspecified or a trap representation which finally means the value is indeterminate.

Good news: for modern architectures we have that it is rather uncommon to have trap representations. If traps do not exist, then in such cases it is guaranteed that both examples from above do not contain undefined behavior. For example, all standard integer types except _Bool for all supported systems of the GNU C Library (ver. 2.28) have no padding bits. Thus, for such cases it is guaranteed that no traps exist.

Effective Type

While we studied C11 section 6.5 paragraph 7 we omitted to lookup the definition of an effective type what we want to catch up now.

C11 § 6.5 Expressions ¶ 6

The effective type of an object for an access to its stored value is the declared type of the object, if any.87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

87) Allocated objects have no declared type.

According to the paragraph we have that for all objects with static, thread, or automatic storage duration their effective type equals their declared type which follows from the first sentence. The remainder of the paragraph deals with objects with allocated storage duration. Allocated objects do not have a declared type but an effective type which may change during the objects lifetime. Let us have a look at some examples:

static_assert(sizeof(float) == sizeof(int), "Whoopsy");
void *p = malloc(sizeof(int)); assert(p);

*(int *) p = 42;              // eff. type of *p is int now
printf("%i\n", *(int *) p);   // eff. type of *p is still int

*(float *) p = 42.0f;         // eff. type of *p is float now
printf("%f\n", *(float *) p); // eff. type of *p is still float

int i = 42;
memcpy(p, &i, sizeof(i));     // eff. type of *p is int now

float f = 42.0f;
memcpy(p, &f, sizeof(f));     // eff. type of *p is float now

unsigned char *pc = pv;
unsigned char *qc = (unsigned char *) &i;
for (size_t n = 0; n < sizeof(i); ++n)
    pc[n] = qc[n];
                              // eff. type of *p is int now

Basically, the effective type of an allocated object is the type of the last store. A side-effect of this is that type punning for allocated memory does not work anymore except through union types. Consider the function in order to test for a negative float. The following implementation using allocated memory invokes undefined behavior.

bool is_negative(float f) {
    unsigned *i = malloc(sizeof(f)); assert(i);
    memcpy(i, &f, sizeof(f));      // eff. type of *i is float
    return (0x80000000 & *i) != 0; // UB reading float as int
}

Since memcpy does not only copy the bits from the source object but also virtually its type, the effective type of *i after the call is float. Thus the final read access through an lvalue of type int invokes undefined behavior. Up to my knowledge, type punning can only be done through union types or via character pointers for allocated objects.

bool is_negative(float f) {
    union u_t { float f; unsigned i; };
    union u_t *u = malloc(sizeof(union u_t)); assert(u);
    u->f = f;
    return (0x80000000 & u->i) != 0;
}

Fun fact: removing the assert statement results in the very same assembler code as we have seen for the previous implementations, if compiled with GCC or Clang. Thus in that case no allocated memory is used and the call to malloc is removed.

Shortcomings

The standard has several shortcomings regarding the strict aliasing rules. One interesting example is given in DR236 where different translations result in different behaviors and even in type incorrect programs. A simplified version is given as follows:

void f(int *pi, double *pd) {
    *pi += 42;
    *pd  = 42.0;
}

int main(void) {
    void *pv = malloc(sizeof(double)); assert(pv);
    *(int *) pv = 5;
    f(pv, pv);
    free(pv);
    return 0;
}

In function main just before the call to function f the effective type of object *pv is int since the last write to it is done via an lvalue of type int. Then in function f object *pv is read and written to via an lvalue of type int by the first assignment. Therefore, the effective type does not change. By the second assignment the effective type changes to type double. Thus every translation which follows this computation is type correct.

Let us now consider a different translation of function f. According to the strict aliasing rules it can be assumed that pi and pd do not alias since the types are not compatible. Therefore, the assignment *pd = 42.0; may be hoisted to the top of the function. Thus after the assignment the effective type of *pv is double. We then have that by the subsequent assignment *pi += 42; object *pv is read via an lvalue of type int which is type incorrect.

Apart from the fact that both translations result in different computations for object *pv, the effective type of *pv is different in both translations after function call f(pv, pv).

A second and similar example is given in DR236 where no allocated object is used but a union. A simplified version is given as follows:

void f(int *pi, double *pd) {
    *pi += 42;
    *pd  = 42.0;
}

int main(void) {
    union { int i; double d; } u = { 5 };
    f(&u.i, &u.d);
    return 0;
}

The program suffers from the same problem as the example before: pi and pd alias in function f. The response from the committee regarding both examples is clear but no justification is given:

Both programs invoke undefined behavior, by calling function f with pointers pi and pd that have different types but designate the same region of storage. The translator has every right to rearrange accesses to *pi and *pd by the usual aliasing rules.DR236

The wording of the C11 standard is not stringent and leaves room for interpretation which has not been addressed by the answer from the committee. The current wording does not differentiate between the type of an object x and the type of an lvalue which refers to object x. Consider the following example.

union { float f; int i; } u;
float f;

Object u is of type union whereas the lvalue u.f is of type float. Similar but still different, object f is of type float and the lvalue f is of type float, too. Thus from a union type it is possible to derive all types of lvalues. The other way around, from the type of an lvalue it is not possible to deduce if it is part of a surrounding aggregate or union object. This asymmetry has not been addressed so far which strictly speaking renders a lot of programs which make use of union or aggregate objects into type incorrect programs. Consider the following example:

struct { float f; int i; } s;
int *pi = &s.i;
*pi = 42;

Object s has a structure type whereas the type of the lvalue expression *pi is int. Since any structure type is not compatible with type int the first 4 bullet points of C11 section 6.5 paragraph 7 do not apply. Likewise, the last bullet point does not apply either, since no character type is involved. This leaves us with bullet point 5. Since int is neither an aggregate nor a union type this bullet point does not apply either. Thus the code snippet violates the aliasing rules which is obviously not intended. Otherwise, each aggregate or union object would have to be accessed only via an aggregate or union type which would render the aliasing rules useless. This cannot be the intention of the standard and therefore the code must be correct.

Having a look at N973 we can guess the intention of the committee and how the standard could be adapted:

we agreed that we want to move the Standard in the direction of “the visibility of the union at the point of the reference” is what matters

Throughout the C11 standard (§ 6.5 Expressions ¶ 7; § 6.5.2.3 Structure and union members) there is basically no distinction between accesses to aggregate and union objects. However, deducing from the comment above we may see a change in a future release regarding the access of a member of an union object. The GCC team already adapted the rules which we investigate next.

GCC to the Rescue

The GCC team decided to interpret the C standard in a slightly different way. According to the GCC documentation of the strict aliasing options, the following two code snippets contain undefined behavior. The first example is almost the same as we have seen before but this time a union is used instead of a structure.

union { float f; int i; } u;
int *pi = &u.i;
*pi = 42; // UB according to GCC

In the example above an object of type union is accessed via an lvalue expression of a non-union type whereas in the example below it is the other way around. An object of a non-union type is accessed via an lvalue expression of a union type.

union u_t { float f; int i; };
float f = 42.0f;
((union u_t *) &f)->i;  // UB according to GCC

Strictly speaking the first example should not violate the aliasing rules which we concluded from the example which was almost the same but where a structure instead of a union object was used. Therefore, GCC is not conforming to the standard. However, from the discussions of the C committee it is clear that the rules are broken. Thus the GCC team went for a pragmatic solution and require that union members are only accessed via union types. This deviation from the standard is also documented on the mailing list:

GCC doesn’t implement C99 aliasing as written in the standard regarding unions. We don’t do so because we determined that this can’t possibly have been the intent of the standard as it makes type-based aliasing relatively useless.

For further discussions have a look at the following incomplete list: N640 N980 N987 N1409 N1520 N1637 N2222 DR028 DR257 DR283 3406

Theory vs Practice

Casting pointers to pointers of even non-compatible types has a long history in C programming. This has not changed since C89 when the strict aliasing rules were standardized. One famous example is the UNIX pattern in order to deal with different socket types which mimics the behavior of subclassing in C.

struct sockaddr {
  sa_family_t  sa_family;
  char         sa_data[];
};

struct sockaddr_in {
  sa_family_t     sin_family;
  in_port_t       sin_port;
  struct in_addr  sin_addr;
};

struct sockaddr_un {
  sa_family_t  sun_family;
  char         sun_path[];
};

Intuitively the structure sockaddr is the base class from which the structures sockaddr_in and sockaddr_un derive. All three structures share a common initial member of type sa_family_t which serves as a tag for an object in order to define of which type it is. Based on such a member the idea is to decide at run-time of which subclass an object is, given a pointer to a structure sockaddr. Consider the following example.

struct sockaddr *sa;

/* ... */

switch (sa->sa_family) {
  /* ... */
  case AF_INET:
    struct sockaddr_in *sin = (struct sockaddr_in *) sa;
  /* ... */
  case AF_UNIX:
    struct sockaddr_un *sun = (struct sockaddr_un *) sa;
  /* ... */
}

The code snippet mimics the behavior of a downcast as known from OOP languages like Java where the instanceof operator could have been used at run-time in order to decide of which subclass an object is. Likewise, the idea of an upcast is depicted by the following example.

extern int connect(int sockfd, const struct sockaddr *addr,
                   socklen_t addrlen);

struct sockaddr_in sin;
/* ... */
connect(sockfd, (struct sockaddr *) &sin, sizeof(sin));

Inside of function connect object sin will be first accessed via structure type sockaddr in order to decide to which socket family the object belongs. Thus both examples suffer from the same problem that an object is accessed via an incompatible type and therefore violate the strict aliasing rules. Still, the usage of such and similar patterns is common practice. The dilemma can be solved by using a union of structures for which the C standard has an exception.

C11 § 6.5.2.3 Structure and union members ¶ 6

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

Thus, for the previous examples, instead of using a structure type directly we could have used a union type which comprises members of all three structure types.

union {
  struct sockaddr     sa;
  struct sockaddr_in  sin;
  struct sockaddr_un  sun;
} u;

u.sin.sin_family = AF_INET;

switch (u.sa.sa_family) { /* ... */ }

The common initial sequence of all three structures comprises the first member which is of type sa_family_t in all three structures. Note, the name of the member may differ and is not taken into account. The type of the second member of structure sockaddr and sockaddr_un are compatible. However, this is not the case for structure sockaddr and sockaddr_in, or likewise for sockaddr_un and sockaddr_in. Thus the longest common initial sequence comprises only the first member of all three structures.

Interestingly, it is only allowed to inspect the common initial sequence. That means, it is not explicitly allowed to read and write arbitrarily to structure members which are part of the common initial sequence. One could get the impression that it is only defined behavior to write through exactly one structure and reading is allowed through all structure members which are part of the common initial sequence.

The takeaway message here is that there still exist patterns which are actively used and which violate the strict aliasing rules.

See Beyond One’s Nose

We have seen that the C standard is not stringent, leaves room for interpretation, and that nowadays programs still make use of C constructs which may lead to undefined behavior. Coming from the automotive industry the MISRA standard (Motor Industry Software Reliability Association) has been developed on top of the C standard in order to restrict which C constructs may be used and which should not. That is the MISRA committee tries to identify common patterns which possibly lead to non-portable, non-safe, or non-secure programs. Such patterns are then cumulated into different rules. One of such a rule is about the usage of unions:

MISRA C:2012 Rule 19.2 The union keyword should not be used

A union member can be written and the same member can then be read back in a well-defined manner.

However, if a union member is written and then a different union member is read back, the behaviour depends on the relative sizes of the members:

  • If the member read is wider than the member written then the value is unspecified;
  • Otherwise, the value is implementation-defined.

The Standard permits the bytes of a union member to be accessed by means of another member whose type is array of unsigned char. However, since it is possible to access bytes with unspecified values, unions should not be used.

If this rule is not followed, the kinds of behaviour that need to be determined are:

  • Padding — how much padding is inserted at the end of the union;
  • Alignment — how are members of any structures within the union aligned;
  • Endianness — is the most significant byte of a word stored at the lowest or highest memory address;
  • Bit-order — how are bits numbered within bytes and how are bits allocated to bit fields.

According to the rule the MISRA committee decided that the usage of unions should be done only with cautions. Strictly speaking they miss the point of trap representations, i.e., they speak about unspecified and implementation-defined values only, but that is just a little something. The core message is that they identified unions to be problematic and recommend not to use it.

In MISRA C:2012 Amendment 1 another rule is given which suggests that memcpy should only be used for pointers to qualified or unqualified versions of compatible types.

MISRA C:2012 Rule 21.12 The pointer arguments to the Standard Library functions memcpy, memmove and memcmp shall be pointers to qualified or unqualified versions of compatible types

This rule basically prevents type punning via memcpy. In total we have that according to the MISRA C:2012 standard type punning, either via a union or via memcpy, should not be done or at least each deviation must be justified. This seems to be a reasonable decision w. r. t. the current wording of the C standard and its open issues—at least for mission critical programs.

Beside the MISRA standard there exist several other standards with same or similar goals. One of them is the “ISO/IEC TS 17961:2013(E) Information technology — Programming languages, their environments and system software interfaces — C secure coding rules” standard (see public draft N1624) and another one is the “SEI CERT C Coding Standard” (Link). All three standards comprise rules which basically say that you have to follow the strict aliasing rules from the C standard.

MISRA C:2012 Rule 11.3 A cast shall not be performed between a pointer to object type and a pointer to a different object type
ISO/IEC TS 17961:2013(E) Rule 5.1 Accessing an object through a pointer to an incompatible type
SEI CERT C Coding Standard: EXP39-C. Do not access a variable through a pointer of an incompatible type (Link)

The MISRA rule is even more restrictive than the strict aliasing rules from the C standard since it does not even allow to cast a pointer to a pointer of an incompatible object type. Thus the following code snippet violates the MISRA rule but not the strict aliasing rules of the C standard.

int x;
float *y = (float *) &x;  // MISRA
int *z = (int *) y;       // violation
*z = 42;

Such casts are allowed according to the C standard, and ISO/IEC TC 17961, and also the SEI CERT C Secure Coding Standard. However, dereferencing variable y is not allowed according to all of them. Thus ISO/IEC TC 17961 and SEI CERT C Secure Coding Standard do not add any further restriction on top of the C standard w. r. t. to pointer casts.

The takeaway message here is that the MISRA committee identified type punning as problematic and gave good examples which were non-portable or error prone. All three standards mention the strict aliasing rules and require compliance with them. This might be seen as an indicator that the strict aliasing rules are violated here and there in practice.

History

The strict aliasing rules are part of the C standard since the early days of C89 which is also known as ANSI C and from which the ISO C90 standard is derived. In all ISO C standards the rules are given in section termed “Expressions” which is numbered in C89 as 6.3 and 6.5 in all later revisions up to and including C2x-draft.

C90 § 6.3 Expressions ¶ 6

An object shall have its stored value accessed only by an lvalue that has one of the following types:36

  • the declared type of the object,
  • a qualified version of the declared type of the object,
  • a type that is the signed or unsigned type corresponding to the declared type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the declared type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

36) The intent of this list is to specify those circumstances in which an object may or may not be aliased.

The paragraph has been polished up between the different releases of the standard. Most notably in C99 the special handling of allocated memory was introduced which was suggested in N640.

The section about structure and union members has been reworked in C99, too. The intention was to remove any restrictions on accessing a corresponding union member other than the one last stored into. However, the rewording introduced some confusion which we investigate in the following. The interesting part in C90 is:

C90 § 6.3.2.3 Structure and union members ¶ 5

[…] if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.41 […]

41. The “byte orders” for scalar types are invisible to isolated programs that do not indulge in type punning (for example, by assigning to one member of a union and inspecting the storage by accessing another member that is an appropriately sized array of character type), but must be accounted for when conforming to externally-imposed storage layouts.

and the corresponding part in C99/C11:

C99/C11 § 6.5.2.3 Structure and union members ¶ 3
[…] If the member used to read [“access” in C99] the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type […]

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.

95) If the member used to read [“access” in C99] the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called “type punning”). This might be a trap representation.

According to DR283 both versions should have the same meaning. Though strictly speaking in C99/C11 the wording is not about implementation-defined but uses the soft words “the value is reinterpreted as an object representation in the new type” which is a fancy way of saying that it is unspecified or implementation-defined unless the corresponding bits are a trap representation.

The takeaway message here is that type punning via a union is defined behavior since the early days of C89/C90. However, the value of the bytes of a corresponding union member other than the one last stored into is unspecified or implementation-defined unless the corresponding bits are a trap representation.

The Past and the Future

The only major change between C90 and C99 w. r. t. the strict aliasing rules is the handling of allocated memory. All other changes are rather small and refer to corrections of existing formulations instead of content innovations. Comparing the standards C99, C11, and a first draft of the upcoming C2x standard we observe that they are almost identical w. r. t. the paragraphs we had a look at in this post:

Of course, numbering of footnotes has also changed.

Wrap-up

The intent of the C committee regarding type-based alias analysis and strict aliasing rules is clear: allow optimizations based on type information. Such kind of optimizations exist not only for programs written in C, but also for programs written in type safe languages as e.g. Haskell, ML, Java, and so on. However, since such languages lack the ability to create pointers/references to arbitrary objects of arbitrary types, the strict aliasing rules in such languages feel transparent to the programmer. In C this is not the case. Furthermore, the C standard does not define the strict aliasing rules rigorously which leads to some confusion. There still exist programs for which it is unclear what the actual semantics should be. Admittedly, these programs are rather exotic.

The question which still has to be answered is: why does the C standard allow casts between incompatible pointer types as e.g. int* and float* when at the same time it is required that an access to an object is only done via an lvalue expression with a compatible type (modulo few exceptions)?

From a programmers point of view I would love to see some progress towards a more stringent definition of the strict aliasing rules which allow a compiler to warn about a wrong usage of an alias. In general, enforcing a stricter type system could be one step.

All experiments were done with GCC 8.2.1 and Clang 7.0.0 using options -std=c11 -O2 on an x86_64 Linux distribution.