James James - 1 month ago 6
C Question

Merge-sorting a large number of integers recursively with a dynamically allocated array throws fault

I wrote a C program to merge-sort (recursive) integers, using a dynamically allocated array. It works fine with up to 100k integers, but when I feed 1 million integers, it throws the

Segmentation fault (core dumped)
error.

Why does it do this? Is my 16GB RAM not good enough? Would it be able to sort bigger number of integers if I didn't use a dynamically allocated array?

How does dynamic allocation exactly work? From my understanding, when a dynamic variable or an element in an dynamic array is declared, a portion of memory (RAM?) is put aside and set to strictly store the declared variable until the memory is freed.

When my program tries to set aside memory to hold a million integers, does it fail because there isn't enough memory available?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BIL 1E9

//struct Sort allows dynamic allocation of the array used in insertion sort.
typedef struct {
int *arr; //pointer to the dynamic array
size_t used; //stores number of 'used' elements in the array
size_t size; //stores number of total elements
} Sort;

//function prototypes to interact with the dynamic array
void freeSort(Sort *);
void initSort(Sort *, size_t);
void inSort(Sort *, int);

//prototypes for the Merge Sort
void mergeSort(Sort *, int, int, int []);
void merge(Sort *, int, int, int, int []);
void copyArray(int [], int, int, Sort *);



int main(){

//declare Sort variable 'magic' to perform the magical insertion sort on the dynamic array.
Sort magic;
initSort(&magic, 10); //initialize magic with 10 elements

//variables to allow the program to function
int intin;
char filename[15];

//tosort is the file to sort.
//sorted is the output file after sort.
FILE *tosort, *sorted;

//necessary variables to measure time
struct timespec start, finish;

//prompt user for file name.
printf("Enter the name of file with a list of integers to sort: ");
scanf("%s", filename);
tosort = fopen(filename, "r"); //read 'tosort' file

//write the 'sorted' file to 'filename.sorted'
sorted = fopen(strcat(filename, ".sorted"), "w");

//while loop stores every integer in the dynamically allocated magic array from tosort file.
while (!feof(tosort)) {
fscanf(tosort, "%d", &intin);
inSort(&magic, intin);
}

//n stores number of integers to sort
int n = magic.used;
//temporary array for use with the merge sort
int sortedArray [n];

//measure time
clock_gettime(CLOCK_REALTIME, &start); //start

//Merge Sort
mergeSort(&magic, 0, n, sortedArray);

clock_gettime(CLOCK_REALTIME, &finish); //finish

//calculate the elapsed time in nanoseconds.
double elapsed = (finish.tv_sec-start.tv_sec)+(finish.tv_nsec-start.tv_nsec)/BIL;

printf("Merge Sort took %lf seconds\n", elapsed);

//write the sorted array to 'sorted' ('filename'.sorted)
for (int i = 0; i < n; i++) {
fprintf(sorted, "%d\n", magic.arr[i]);
}

//free up the allocated memory for the Sort array and close the files.
freeSort(&magic);
fclose(tosort);
fclose(sorted);

return 0;
}

//initialize the dynamic array
void initSort(Sort *dynA, size_t initSize) {
dynA->arr = (int *)malloc(initSize * sizeof(int));
dynA->used = 0;
dynA->size = initSize;
}

//add values to the elements of the dynamic array
void inSort(Sort *dynA, int val) {
//if the array size is not big enough to fit new values, allocate 100 more elements.
if (dynA->used == dynA->size) {
dynA->size += 100;
dynA->arr = (int *)realloc(dynA->arr, dynA->size * sizeof(int));
}
//'used' holds the number of used elements with values in the array.
dynA->arr[dynA->used++] = val;
}

//free allocated memory for the dynamic array
void freeSort(Sort *dynA) {
free(dynA->arr);
dynA->arr = NULL;
dynA->used = dynA->size = 0;
}


//split the array until size is 1
void mergeSort(Sort *dynA, int begin, int end, int tempA [])
{
//if size is 1, done splitting.
if(end-begin < 2)
return;

// recursively split the array
int mid = (end+begin)/2; // mid = middle point
mergeSort(dynA, begin, mid, tempA); // mergeSort left half
mergeSort(dynA, mid, end, tempA); // mergeSort right half
merge(dynA, begin, mid, end, tempA); // merge the two halves
copyArray(tempA, begin, end, dynA); // copy the merged array to dynA
}

//merge the two arrays
void merge (Sort *dynA, int begin, int mid, int end, int tempA [])
{
int i = begin; int j = mid;

//from begin to end, compare the values of the two arrays
for (int k = begin; k < end; k++)

// store the smaller value into tempA[k]
if (j >= end || (i < mid && dynA->arr[i] <= dynA->arr[j]))
tempA[k] = dynA->arr[i++];
else tempA[k] = dynA->arr[j++];
}

//copy the contents of the temporary array to the dynamic array
void copyArray(int tempA[], int begin, int end, Sort *dynA){
for(int k = begin; k < end; k++)
dynA->arr[k] = tempA[k];
}


Cygwin64 and CommandPrompt give same faults when I feed a million integers to sort.

Answer

Your bug is that you are using a large stack based VLA sortedArray. With 1,000,000 values, your array is 4MB and you get a segfault due to stack overflow because the array exceeds the preset stack size limit.

For example, under linux, the stack limit is around 8MB [on my system, I had to increase the array count to 3,000,000 to reproduce the segfault]


Change:

    //temporary array for use with the merge sort
    int sortedArray [n];

Into:

    //temporary array for use with the merge sort
    int *sortedArray = malloc(sizeof(int) * n);

Optionally, at the bottom of main you can add a free(sortedArray) for tidiness if you wish.


Also, you can use the limit shell command to see what the stacksize parameter is set to. So, an alternate way to fix the problem would be to use the command to increase the limit. But, I do not recommend that as it's a less general and more fragile solution than simply using malloc.


On debugging tips ...

To find your problem, I did the following:

Compiled the program with debug info: gcc -o blah -g blah.c

Invoked the debugger with gdb ./blah

Ran the program [inside the debugger] with run

I got the segfault information below:

Starting program: ...
Enter the name of file with a list of integers to sort: input2

Program received signal SIGSEGV, Segmentation fault.
0x00000000004009d1 in main () at blah.c:64
64      clock_gettime(CLOCK_REALTIME, &start);  //start

It didn't make much sense that clock_gettime was faulting, so I looked at the line above it:

    int sortedArray [n];
Comments