tectonicfury - 1 month ago
C Question

Weird behaviour with variable length arrays in struct in C

I came across a concept which some people call a "Struct Hack" where we can declare a pointer variable inside a struct, like this:

struct myStruct{
    int data;
    int *array;
};


and later on, when we allocate memory for a struct myStruct using malloc in our main() function, we can simultaneously allocate memory for our int *array pointer in the same step, like this:

struct myStruct *p = malloc(sizeof(struct myStruct) + 100 * sizeof(int));

p->array = p+1;


instead of

struct myStruct *p = malloc(sizeof(struct myStruct));

p->array = malloc(100 * sizeof(int));


assuming we want an array of size 100.

The first option is said to be better since we get a contiguous chunk of memory and can free that whole chunk with one call to free(), versus two calls in the latter case.

Experimenting, I wrote this:

#include<stdio.h>
#include<stdlib.h>

struct myStruct{
    int i;
    int *array;
};

int main(){
    /* I ask for only 40 more bytes (10 * sizeof(int)) */

    struct myStruct *p = malloc(sizeof(struct myStruct) + 10 * sizeof(int));

    p->array = p+1;

    /* I assign values way beyond the initial allocation*/
    for (int i = 0; i < 804; i++){
        p->array[i] = i;
    }

    /* printing*/
    for (int i = 0; i < 804; i++){
        printf("%d\n",p->array[i]);
    }

    return 0;
}


I am able to execute it without problems, without any segmentation faults. Looks weird to me.

I also came to know that C99 has a provision which says that instead of declaring an int *array inside a struct, we can declare int array[], and I did this, using malloc() only for the struct, like

struct myStruct *p = malloc(sizeof(struct myStruct));


and initialising array[] like this

p->array[10] = 0; /* I hope this sets the array size to 10
and also initialises array entries to 0 */


But then I see the same weirdness again: I am able to access and assign array indices beyond the array size and also print the entries:

for(int i = 0; i < 296; i++){ // first loop
    p->array[i] = i;
}

for(int i = 0; i < 296; i++){ // second loop
    printf("%d\n",p->array[i]);
}


After printing p->array[i] till i = 296 it gives me a segmentation fault, but clearly it had no problems assigning beyond i = 9.
(If I increment 'i' till 300 in the first for loop above, I immediately get a segmentation fault and the program doesn't print any values.)

Any clues about what's happening? Is it undefined behaviour or what?

EDIT: When I compiled the first snippet with the command

cc -Wall -g -std=c11 -O struct3.c -o struct3


I got this warning:

warning: incompatible pointer types assigning to 'int *' from
'struct str *' [-Wincompatible-pointer-types]
p->array = p+1;

Answer

Yes, what you see here is an example of undefined behavior.

Writing beyond the end of an allocated array (aka buffer overflow) is a good example of undefined behavior: it will often appear to "work normally", while at other times it will crash (e.g. "Segmentation fault").

A low-level explanation: there are control structures in memory that are situated some distance from your allocated objects. If your program does a big buffer overflow, there is more chance it will damage these control structures, while for more modest overflows it will damage some unused data (e.g. padding). In any case, however, buffer overflows invoke undefined behavior.

The "struct hack" in your first form also invokes undefined behavior (as indicated by the warning), but of a special kind: it is almost guaranteed to work normally with most compilers. It is still undefined behavior, though, so it is not recommended. To sanction this idiom, the C committee introduced the "flexible array member" syntax in C99 (your second syntax), which is guaranteed to work as long as you allocate space for the elements you access.
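To make the flexible array member form concrete, here is a minimal sketch, not the only way to do it. The names struct int_buf, int_buf_new and int_buf_set are hypothetical and not from the question; the count field is an added assumption so that the code can check indices itself, since C will not do it for you. Overflow of the size computation is not checked here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical variant of the question's struct: it records how many
   elements were allocated so that later code can check indices. */
struct int_buf {
    size_t count;   /* number of elements allocated for array[] */
    int array[];    /* flexible array member (C99); must be the last member */
};

/* Allocate the header plus room for `count` ints in a single block.
   Returns NULL if malloc fails. */
struct int_buf *int_buf_new(size_t count)
{
    struct int_buf *p = malloc(sizeof *p + count * sizeof p->array[0]);
    if (p != NULL) {
        p->count = count;
        for (size_t i = 0; i < count; i++)
            p->array[i] = 0;   /* malloc does not zero memory; do it explicitly */
    }
    return p;
}

/* Store `value` at index `i`.
   Returns 0 on success, -1 if `i` is outside the allocated range. */
int int_buf_set(struct int_buf *p, size_t i, int value)
{
    assert(p != NULL);
    if (i >= p->count)
        return -1;
    p->array[i] = value;
    return 0;
}
```

With this sketch, int_buf_new(10) reserves exactly ten elements, int_buf_set(p, 9, 42) succeeds, and int_buf_set(p, 10, 1) is rejected instead of silently writing past the block. A single free(p) releases both the header and the array.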

Just to make it clear - assignment to an element of an array never allocates space for that element (not in C, at least). In C, when assigning to an element, it should already be allocated, even if the array is "flexible". Your code should know how much to allocate when it allocates memory. If you don't know how much to allocate, use one of the following techniques:

  • Allocate an upper bound:

        struct myStruct{
            int data;
            int array[100]; // you will never need more than 100 numbers
        };
  • Use realloc
  • Use a linked list (or any other sophisticated data structure)
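As an illustration of the realloc option, the following is a rough sketch under some assumptions: struct growable and growable_push are hypothetical names, the capacity-doubling policy and the starting capacity of 4 are arbitrary choices, and size overflow is not checked. The function takes a pointer to the caller's pointer because realloc may move the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable buffer built on a flexible array member. */
struct growable {
    size_t count;      /* elements currently in use */
    size_t capacity;   /* elements currently allocated */
    int array[];
};

/* Append `value`, growing the block with realloc when it is full.
   *pp may be NULL for an empty buffer. Returns 0 on success, or -1 if
   realloc fails (in which case the existing block is left untouched). */
int growable_push(struct growable **pp, int value)
{
    assert(pp != NULL);
    struct growable *p = *pp;
    size_t count = p ? p->count : 0;
    size_t capacity = p ? p->capacity : 0;

    if (count == capacity) {
        size_t new_cap = capacity ? capacity * 2 : 4;
        /* realloc(NULL, n) behaves like malloc(n) */
        struct growable *q = realloc(p, sizeof *q + new_cap * sizeof q->array[0]);
        if (q == NULL)
            return -1;
        q->count = count;
        q->capacity = new_cap;
        *pp = p = q;
    }
    p->array[p->count++] = value;
    return 0;
}
```

A caller would start with struct growable *g = NULL; and call growable_push(&g, x) for each new number, then free(g) once at the end. The point is that space is always obtained before an element is assigned, never as a side effect of the assignment.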