WhozCraig - 3 months ago
C Question

Does applying post-decrement on a pointer already addressing the base of an array invoke undefined behavior?

After hunting for a related or duplicate question to no avail (the number of pointer-arithmetic and post-decrement questions tagged C is enormous, and calling it "boatloads" does a grave injustice to that result set), I toss this into the ring in hopes of either clarification or a referral to a duplicate that eluded me.

If the post-decrement operator is applied to a pointer as below, in a simple reverse iteration over an array, does the following code invoke undefined behavior?

#include <stdio.h>
#include <string.h>

int main()
{
    char s[] = "some string";
    const char *t = s + strlen(s);

    while(t-->s)
        fputc(*t, stdout);
    fputc('\n', stdout);

    return 0;
}


It was recently proposed to me that 6.5.6p8 (Additive operators), in conjunction with 6.5.2.4 (Postfix increment and decrement operators), means that merely performing a post-decrement on t when it already holds the base address of s invokes undefined behavior, regardless of whether the resulting value of t (not the result of the t-- expression) is ever evaluated. I simply want to know whether that is indeed the case.

The cited portions of the standard were:


6.5.6 Additive Operators


  8. If both the pointer operand and the result point to elements of the
    same array object, or one past the last element of the array object,
    the evaluation shall not produce an overflow; otherwise, the behavior
    is undefined.

and its closely coupled companion:


6.5.2.4 Postfix increment and decrement operators Constraints


  1. The operand of the postfix increment or decrement operator shall have
    atomic, qualified, or unqualified real or pointer type, and shall be a
    modifiable lvalue.



Semantics


  2. The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The value computation of the result is sequenced before the side effect of updating the stored value of the operand. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. Postfix ++ on an object with atomic type is a read-modify-write operation with memory_order_seq_cst memory order semantics. [footnote 98]

  3. The postfix -- operator is analogous to the postfix ++ operator, except that the value of the operand is decremented (that is, the value 1 of the appropriate type is subtracted from it).



Forward references: additive operators (6.5.6), compound assignment (6.5.16.2).


The very reason for using the post-decrement operator in the posted sample is to avoid ever comparing an eventually invalid address against the base address of the array. For example, the code above was a refactor of the following:

#include <stdio.h>
#include <string.h>

int main()
{
    char s[] = "some string";

    size_t len = strlen(s);
    char *t = s + len - 1;
    while(t >= s)
    {
        fputc(*t, stdout);
        t = t - 1;
    }
    fputc('\n', stdout);
}


Forgetting for a moment that s here is a non-empty string, this general algorithm clearly has issues (perhaps not so clearly to some). If s[] were instead "", then t would be assigned the value s-1, which is not in the valid range of s through its one-past-the-end address, and the comparison against s that follows is no good. If s has non-zero length, that avoids the initial s-1 problem, but only temporarily: the loop still eventually counts on that value (whatever it is) being valid for comparison against s in order to terminate. It could be worse. It could have naively been:

size_t len = strlen(s) - 1;
char *t = s + len;


This has disaster written all over it if s were a zero-length string, because the subtraction is done in the unsigned type size_t and does not produce -1. The refactored code this question opened with was intended to address all of these issues. But...

My paranoia may be getting to me, but it isn't paranoia if they're really all out to get you. So, per the standard (these sections, or perhaps others), does the original code (scroll to the top of this novel if you forgot what it looks like by now) indeed invoke undefined behavior or not?

Answer

I am pretty certain that the post-decrement in this case does indeed invoke undefined behaviour. The final post-decrement subtracts one from a pointer to the beginning of an array, so the result points neither to an element of that array nor one past its last element, and by the definition of pointer arithmetic (§6.5.6/8, as cited in the OP) that is undefined behaviour. The fact that you never use the resulting pointer is irrelevant.

What's wrong with:

char *t = s + strlen(s);
while (t > s) fputc(*--t, stdout);

Interesting but irrelevant fact: The implementation of reverse iterators in the standard C++ library usually holds in the reverse iterator a pointer to one past the target element. This allows the reverse iterator to be used normally without ever involving a pointer to "one before the beginning" of the container, which would be UB, as above.