Non-Termination Considered Harmful in C and C++

Admittedly the title is somewhat lurid—kudos to Dijkstra—but the C and C++ standards have some constraints regarding termination that may be surprising at first. In the following we will have a look at some example programs for which unexpected code is emitted by two compilers, and examine the C and C++ standards in order to clarify whether such compilations are sound or not.

tl;dr Non-termination essentially invokes undefined behavior under certain conditions.

Let us start with our first example:

int foo(int x) {
    if (x == 1) return 1;
    else return foo(x);
}

Intuitively, a function call foo(1) should terminate, whereas, for any integer i with i != 1 a function call foo(i) should not terminate, i.e., should result in an infinite recursion. Still, Clang (version 9.0.0; options -O1) compiles this into a function which always terminates:

foo:
  mov eax, 1
  ret

This might be surprising at first, though, it follows a rule. Before we go into detail, let us have a look at another example.

An infinite loop runs forever. Otherwise it would not be an infinite loop. This is a common understanding and still there is some confusion about it w.r.t. the programming languages C and C++. Consider our second example:

int x = 1;

void bar(void) {
    while (x);
}

The body of function bar consists of a loop which depends on global variable x which itself is initialized to a non-zero value. Thus as long as the value of x is not changed, the loop should run forever. In particular this means that a call to function bar should not return. However, ICC (version 13.0.1; options -O1) compiles the code as follows:

bar:
  ret
x:
  .long 1

The loop is removed, although statically the value of the loop condition is known to be non-zero. Thus any function call bar() terminates.

You may examine both examples via Compiler Explorer: foo bar

In the upcoming sections we first have a look at the C standard and afterwards at the C++ standard in order to clarify whether such compilations are sound or not.

Observable Behavior

A compiler is allowed to reorder, delete, and add arbitrary instructions as long as the semantics of the compiled and the source program are the same. The semantics of a C program is informally defined by the C standard. There an abstract machine is given which itself can be seen as a form of operational semantics where things like variable bindings and contexts are made more explicit than in other forms like the lambda calculus. We then have that a compiled version q of a program p is sound if and only if for all inputs the execution of q on the target machine produces the same observable behavior as if p is executed on the abstract machine. What constitutes as observable behavior is defined by the C standard:

C17 § 5.1.2.3 Program execution ¶ 6

The least requirements on a conforming implementation are:

Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.

This is the observable behavior of the program.

The first and third bullet point do not apply to our introductory examples since no volatile object is accessed and no input/output operation is performed. We are left with the second bullet point. If both machines terminate then at program exit the data written into files must be identical and in the same order of the implementation as well as the abstract machine. The definition leaves room for interpretation and it is unclear whether it applies in case if only one machine terminates. Let’s be philosophical and consider the following program which makes use of the previously defined function bar:

int main(void) {
    bar();
    printf("hello world\n");
    return 0;
}

The abstract machine does not terminate and therefore does not write to stdout in finite time. However, the compiled version emitted by ICC renders bar into a terminating function which means that hello world is written to stdout. Is this still sound? You might argue that by transfinite iteration the abstract machine produces the same output as the implementation. Though, this argument seems to be rather academic.

(Non-)Terminating Loops

In order to clarify status quo a new paragraph was added in C11 which received a minor update in C17:

C17 § 6.8.5 Iteration statements ¶ 6

An iteration statement may be assumed by the implementation to terminate if its controlling expression is not a constant expression,¹⁵⁸⁾ and none of the following operations are performed in its body, controlling expression or (in the case of a for statement) its expression-3:¹⁵⁹⁾

input/output operations
accessing a volatile object
synchronization or atomic operations.

158) An omitted controlling expression is replaced by a nonzero constant, which is a constant expression.
159) This is intended to allow compiler transformations such as removal of empty loops even when termination cannot be proven.

The controlling expression of the loop from our introductory example is not a constant expression. Furthermore, no input/output operation is performed, no volatile object is accessed, and no atomic operation is performed in the controlling expression nor body of the loop. Thus if we take the paragraph literally, then it is sound to remove the loop—especially the second footnote encourages this. Note, the paragraph only defines what may be assumed in case of a loop with a non-constant condition. Nothing additionally is defined by the paragraph for a loop with a constant condition, i.e., the usual rules apply for such loops. Let’s be nit-picky and consider the following code snippet:

int x = 1; while (x) while (1);

The outer loop depends on a non-constant controlling expression and therefore may be assumed to terminate. However, in contrast the inner loop depends on a constant non-zero controlling expression and therefore does not terminate. Still an implementation is allowed to assume that the outer loop terminates although it is clear that the body never does. This contradiction is not properly addressed by the new paragraph.

Loop Fusion

The intention behind the new paragraph is described in N1528. There the following example is given. Assume that count and count2 are non-volatile and non-atomic global variables.

for (p = q; p != 0; p = p->next) {
    ++count;
}
for (p = q; p != 0; p = p->next) {
    ++count2;
}

From an optimization perspective it is desirable to fuse the two loops into one.

for (p = q; p != 0; p = p->next) {
    ++count;
    ++count2;
}

However, in general it is not statically decidable whether the linked list is cyclic or acyclic and therefore whether the loops terminate or not. In case the first loop does not terminate and another thread accesses count2, then the fused version introduces a data race. Let us have a look at a complete example:

#include <stdio.h>
#include <threads.h>

struct list { struct list *next; };

unsigned count, count2;

int thread1(void *q) {
    for (struct list *p = q; p != 0; p = p->next) {
        ++count;
    }
    for (struct list *p = q; p != 0; p = p->next) {
        ++count2;
    }
    return 0;
}

int thread2(void *q) {
    if (((struct list *)q)->next) // cyclic?
        printf("count2 = %i\n", count2);
    return 0;
}

int main(int argc, char **argv) {
    struct list q;
    q.next = argc > 1 ? &q : 0; // cyclic or acyclic?
    thrd_t t1, t2;
    thrd_create(&t1, thread1, &q);
    thrd_create(&t2, thread2, &q);
    thrd_join(t1, 0);
    thrd_join(t2, 0);
    return 0;
}

Assume the loops are not fused by the compiler. If q points to a cyclic list, then only thread2 accesses variable count2 since the first loop does not terminate. Otherwise, if q points to an acyclic list, then only thread1 accesses variable count2. Thus, it is guaranteed that there is no data race on variable count2. However, if the loops are fused, then in the first case there is a data race on count2.

Note, statically it is not decidable whether the list is cyclic or not, since this depends on a value which is only known at run-time.

The program is weird in the sense that no synchronization mechanisms are used in order to deal with a variable which is potentially shared, i.e., which is in the lexical scope of multiple threads but is accessed by at most one thread at run-time—at least in the non-fused version. Therefore, it seems reasonable that a compiler may introduce a data race by fusing the two loops into one.

Forward Progress

As mentioned in N1528 the new paragraph “Iteration statements” in C11 was also added in order to streamline C and C++ where we have:

C++17 § 4.7.2 Forward progress ¶ 1

The implementation may assume that any thread will eventually do one of the following:

terminate,
make a call to a library I/O function,
perform an access through a volatile glvalue, or
perform a synchronization operation or an atomic operation.

[Note: This is intended to allow compiler transformations such as removal of empty loops, even when termination cannot be proven. — end note]

The paragraph is quite similar compared to the one for C but still distinct. In C++ termination is explicitly mentioned. Furthermore, the wording in C++ in contrast to C is not about how something is done but rather what should be done, i.e., the paragraph does not explicitly speak about iteration statements but about I/O operations, volatile accesses, and atomic operations. Thus in C it is not made explicit whether the following statement may terminate or not, whereas in C++ it is clear that it may:

loop:
goto loop;

It also follows from the paragraph that C and C++ have different semantics w.r.t. loops with constant conditions like while(1);. In C++ an implementation may assume that such loops terminate whereas in C this is not the case.

Finally we have for our introductory examples that function bar may terminate in C++ as well as in C. Whereas function foo may terminate in C++ but not in C. This means that compiling function foo into an always terminating function as done by Clang is not sound w.r.t. the C standard. The LLVM framework seems to implement only the idea of forward progress which is not just problematic for C but also for languages like Rust which depend on LLVM (see e.g. Rust issue 42009 or 28728). This may be fixed in the future according to LLVM bug 965.

Wrap-up

Non-termination essentially invokes undefined behavior under certain conditions for C as well as C++ programs. For C the case is more complex than for C++ in the sense that we have to distinguish between loops with constant and non-constant conditions like while(1) and while(e), goto, and even (mutual) recursive functions whereas in C++ all those cases are treated the same. One of the reasons why C distinguishes between such loops is due to the embedded community that needed a way to encode an unrecoverable error state by an infinite loop with an empty body. However, these distinctions lead to more complex compilers because they cannot use a single logic for both languages.

From a practical point of view there is still an open question: are all these complications worth it just to have the potential to emit a more optimized program in some cases? It might be worthwhile to have a look at other languages, e.g., Java (JLS 17.4.9, JLS 17.4.2) or Haskell to name just two, for which non-termination is well defined, and IMHO, in a more intuitive way which may prevent us from subtle bugs.