Compton - 3 months ago
C++ Question

Values past valid pointer to C-Style string

I am currently working my way through "C++ Primer". One of the exercise questions asks:

What does the following program do?

const char ca[] = { 'h', 'e', 'l', 'l', 'o' };

const char *cp = ca;

while (*cp)
{
    cout << *cp << endl;

    cp++;
}


I am fairly confident I understand that *cp will continue to be true past the last character of the ca[] array, because there is no null character as the last item in the array.

I am mostly curious about what eventually makes the while condition false. The program always seems to show 19 characters on my computer. 0-4 are the hello string, 5-11 are always the same, and 12-19 change with each execution.

#include <iostream>

using namespace std;

int main( )
{
    const char ca[ ] = { 'h', 'e', 'l', 'l', 'o'/*, '\0'*/ };

    const char *cp = ca;

    int count = 0;

    // Reads past the end of ca: this is the behaviour being asked about.
    while ( *cp )
    {
        // {counter} {object-pointed-to} {number-equivalent}
        cout << count << "\t" << *cp << "\t" << static_cast<int>(*cp) << endl;

        count++;
        cp++;
    }

    return 0;
}


The question: What causes the while-loop to terminate? Why are characters 5-11 always the same?

Answer

Since the language leaves the result of accessing an array outside its bounds undefined, you have to understand how your particular "platform" works if you want to explain what you observe.

With most compilers, your memory is probably laid out like this:

|h|e|l|l|o|XXX|____cp___|__count__|

XXX are "padding bytes" needed to align the next variable to an 8-byte boundary. In debug builds, compilers typically fill these bytes with a fixed non-zero value, precisely so that an out-of-bounds iteration does not stop there (and you can discover it).

cp is a pointer that starts at the "h" and advances one byte at a time. Its value is normally an address inside the stack of your process.

Such an address usually has a fixed prefix plus an offset that changes as you go deeper into nested calls.

Since a pointer is (probably) 8 bytes long, and x86 processors are little-endian (so the low-order four bytes are stored before the high-order four), what you get is an iteration that prints:

  • The five "hello" characters
  • The three padding characters (assuming they are somehow printable)
  • The cp offset from the stack beginning (always the same, since main is always in the same place with respect to the program itself)
  • Part of the address prefix (this changes at every invocation)

This prefix may contain a zero byte at some point, which terminates the loop.

Note that, however much sense this explanation makes, you cannot rely on it in any way for production code that will be compiled for different platforms, or even by different compilers, since the way they manage variables can also be different.