Behavior

From cppreference.com

Behavior is an observable action.

The C standard precisely defines the observable behavior of many C language constructs. It also specifies those constructs whose observable behavior it leaves imprecisely defined. Categories of behavior follow:

  • well-defined behavior - the C standard specifies exactly one observable behavior for a language construct
  • unspecified behavior - use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
  • implementation-defined behavior - unspecified behavior where each implementation documents how the choice is made
  • locale-specific behavior - behavior that depends on local conventions of nationality, culture, and language that each implementation documents
  • undefined behavior - behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

Explanation

The phrases well-defined semantics and well-defined forms appear in the C standard; however, the term well-defined behavior does not appear explicitly among the terms and definitions of clause 3, so the ordinary definition applies.

With unspecified behavior, the behavior of a C program varies among implementations, and a conforming implementation may, but is not required to, document the effects of each behavior. Each occurrence of unspecified behavior results in one of a set of valid results. A program must not fail to compile just because it includes unspecified language constructs; a correct program may include both well-defined and unspecified language constructs. Examples include order of evaluation, whether identical string literals are distinct, and the amount of array allocation overhead.
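
As a minimal sketch of the order-of-evaluation case (all function and variable names here are hypothetical), the two operands of + below may be evaluated in either order. The sum is 3 either way, but the log may read "fs" or "sf" depending on the implementation:

```c
#include <stdio.h>

/* Records the order in which the two calls happened ('f' or 's').
   Two entries plus a terminating zero from static initialization. */
char call_log[3];
int  call_count;

int first(void)  { call_log[call_count++] = 'f'; return 1; }
int second(void) { call_log[call_count++] = 's'; return 2; }

/* The result is always 3; only the order of the two calls, and
   therefore the contents of call_log, is unspecified. */
int sum_in_unspecified_order(void)
{
    call_count = 0;
    return first() + second();
}

/* Separate statements fix the order, so call_log is always "fs". */
int sum_in_fixed_order(void)
{
    call_count = 0;
    int a = first();   /* this full expression completes before the next begins */
    int b = second();
    return a + b;
}
```

Splitting the calls into separate statements, as in sum_in_fixed_order, is one way to avoid depending on the unspecified order.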

With implementation-defined behavior, the behavior of a C program varies among implementations, and a conforming implementation must document the effects of each behavior. Implementation-defined behavior is a subset of unspecified behavior. Examples include the type of size_t, the number of bits in a byte, and the range of an integer type.
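
These characteristics can be inspected through the standard headers <limits.h> and <stddef.h>. The sketch below uses hypothetical helper names and only reports what the implementation chose; the printed values differ between platforms:

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: number of bits in a byte on this implementation.
   The standard guarantees at least 8. */
int bits_per_byte(void)
{
    return CHAR_BIT;
}

/* Hypothetical helper: prints a few implementation-defined characteristics. */
void print_implementation_choices(void)
{
    printf("bits in a byte: %d\n", CHAR_BIT);
    printf("range of int:   %d to %d\n", INT_MIN, INT_MAX);
    printf("sizeof(size_t): %zu bytes\n", sizeof(size_t));
}
```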

Locale-specific behavior depends on the implementation-supplied locale. It is a subset of implementation-defined behavior.
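
A small sketch of locale-specific behavior, using the standard setlocale and localeconv functions from <locale.h>; the helper name decimal_point_for is hypothetical. The "C" locale always uses "." as the decimal point, while other locale names, and whether they are available at all, depend on the implementation:

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switches the numeric locale and returns its
   decimal-point string, or NULL if the requested locale is unavailable.
   Note that setlocale changes program-wide state, and the string returned
   through localeconv may be overwritten by later calls. */
const char *decimal_point_for(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return NULL;
    return localeconv()->decimal_point;
}
```

Calling decimal_point_for("C") yields ".", whereas a locale such as "de_DE.UTF-8" (if installed) would typically yield ",".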

With undefined behavior, there is no restriction on the behavior of an implementation or a compiled program: either may do anything. Implementations are not required to diagnose undefined behavior (although they diagnose many simple situations). Compiled programs may do something meaningful but are not required to do so. There is an anything-goes quality surrounding undefined behavior. Examples include memory accesses outside array boundaries, signed integer overflow, null pointer dereference, modification of the same scalar more than once in an expression without an intervening sequence point, and access to an object through a pointer of a different type.
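
Because nothing is guaranteed once undefined behavior occurs, the check has to happen before the offending operation. As a minimal sketch for signed integer overflow (checked_add is a hypothetical helper, not a standard function), the bounds are tested first and the addition is performed only when it cannot overflow:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *result and returns true,
   or returns false without adding if the sum would overflow int. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) ||   /* sum would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b))     /* sum would fall below INT_MIN */
        return false;
    *result = a + b;                    /* cannot overflow at this point */
    return true;
}
```

Testing the sum after the fact (for example, a + b < a) does not work, since the overflow has already happened by then.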

The second paragraph of clause 4 (Conformance) specifies how the C standard indicates undefined behavior:

  • a shall or shall not requirement that appears outside of a constraint or runtime-constraint is violated;
  • the behavior is explicitly described with the words undefined behavior;
  • the standard omits any explicit definition of the behavior.

The last method in the above list contributes much to the challenge of recognizing undefined behavior.

Portable C code includes well-defined language constructs and avoids unspecified language constructs. This ideal is difficult to achieve, especially when an application depends on a locale; however, avoiding the other imprecisely defined behaviors improves portability.

UB and optimization

Because correct C programs are free of undefined behavior, compilers may produce unexpected results when a program that actually has UB is compiled with optimization enabled. For example:

int foo(int x) {
    return x+1 > x; // either true or UB due to signed overflow
}
// may be compiled as
int foo(int x) {
    return 1;
}

bool p; // uninitialized local variable
if (p) // UB access to uninitialized scalar
    puts("p is true");
if (!p) // UB access to uninitialized scalar
    puts("p is false");
// may be compiled to a program that prints both lines:
// p is true
// p is false
// ...or to a program that prints nothing

int table[4] = {0};
bool exists_in_table(int v)
{
    // return true in one of the first 4 iterations or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}
// this may be compiled as
bool exists_in_table(int v)
{
    return true;
}

int *p = (int*)malloc(sizeof(int));
int *q = (int*)realloc(p, sizeof(int));
*p = 1; // UB access to a pointer that was passed to realloc
*q = 2;
if (p == q) // UB access to a pointer that was passed to realloc
    printf("%d %d\n", *p, *q);
// this may print 1 2
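
For comparison, here is a well-defined version of the table lookup, sketched under the assumption that the intent was to search all four elements. The loop bound is derived from the array itself, so no out-of-bounds read occurs and the optimizer has no undefined behavior from which to conclude that the function always returns true:

```c
#include <stdbool.h>
#include <stddef.h>

int table[4] = {0};

/* Returns true if v is stored in table, false otherwise. */
bool exists_in_table(int v)
{
    /* The element count is computed from the array, so the index
       never reaches table[4]. */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i] == v) return true;
    }
    return false;
}
```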

References

  • C11 standard (ISO/IEC 9899:2011):
      • 3.4 Behavior (p: 3-4)
      • 4/2 Undefined behavior (p: 8)