Alexandre Pieroux Alexandre Pieroux - 3 months ago 16
C Question

Segmentation fault C gdb give wrong line

Here is the portion of the code that trigger this error (this is part of a bigger file that have roughly 1000 lines and is intended to implement the hypercuts algorithm) :

void get_combination_cuts_characteristics(uint32_t* cuts,
uint32_t nb_dim_cut,
struct hypercuts_dimension** dimensions,
uint32_t* return_cuts,
uint32_t sum_cuts,
struct classifier_rule** rules,
uint32_t nb_rules,
uint32_t* children_rules_sum,
uint32_t* max_rules)
{
// Array of children
uint32_t nb_children = (uint32_t) 0x1 << sum_cuts;
uint32_t* min_index = chkmalloc(sizeof(*min_index) * nb_dim_cut);
uint32_t* max_index = chkmalloc(sizeof(*max_index) * nb_dim_cut);
uint32_t* current_index = chkmalloc(sizeof(*current_index) * nb_dim_cut);
uint32_t children_array[nb_children];
for (uint32_t i = 0; i < nb_children; ++i)
children_array[i] = 0;


// For each rules we compute the number of rule each child get.
uint32_t min_value;
uint32_t max_value;
uint32_t nb_cuts;
uint32_t subregion_size;
uint32_t index;
for (uint32_t i = 0; i < nb_rules; ++i)
{
for (uint32_t j = 0; j < nb_dim_cut; ++j)
{
min_value = rules[i]->statements[dimensions[j]->id]->value;
max_value = rules[i]->statements[dimensions[j]->id]->value | rules[i]->statements[dimensions[j]->id]->mask;
nb_cuts = (uint32_t)0x1 << cuts[j];
subregion_size = (dimensions[j]->max_dim - dimensions[j]->min_dim) + 1;
subregion_size = subregion_size / nb_cuts;

if(subregion_size == 0)
continue;

// Fit the interval in the region of the dimension
if(min_value < dimensions[j]->min_dim)
min_value = dimensions[j]->min_dim;

if(max_value > dimensions[j]->max_dim)
max_value = dimensions[j]->max_dim;

// Compute the minimal and maximal index of the rule in this dimension
min_index[j] = (min_value - dimensions[j]->min_dim) / subregion_size;
max_index[j] = (max_value - dimensions[j]->min_dim) / subregion_size;
current_index[j] = min_index[j];
}

// Locate the first child
index = get_multi_dimension_index(min_index, nb_dim_cut, cuts);
children_array[index] ++;

// Locate all the other children that the rule span
while(get_next_dimension_index(current_index, min_index, max_index, nb_dim_cut))
{
index = get_multi_dimension_index(current_index, nb_dim_cut, cuts);
children_array[index]++;
}
}

// Set the return variables
uint32_t rules_sum = 0;
uint32_t max_rule_child = 0;

for (uint32_t i = 0; i < nb_children; ++i)
{
rules_sum += children_array[i];
if(max_rule_child < children_array[i])
max_rule_child = children_array[i];
}

if(max_rule_child < *max_rules || ((max_rule_child == *max_rules) && (rules_sum < *children_rules_sum)))
{
*max_rules = max_rule_child;
*children_rules_sum = rules_sum;
for (uint32_t i = 0; i < nb_dim_cut; ++i)
return_cuts[i] = cuts[i];
}

free(min_index);
free(max_index);
free(current_index);
}


gdb tell me that I got a segfault at the line
rules_sum += children_array[i];
so it seemed that I went too far on the array and I checked my code. But the thing is that with gdb when I print the cell it try to access it is fine (give me the value I expect). I then tried to spot if a pointer could be the cause but they all print fine in gdb. I ran the program with valgrind and then it give me the segfault at the line
if(max_rule_child < *max_rules || ((max_rule_child == *max_rules) && (rules_sum < *children_rules_sum)))
. I also tested the variables/pointers of this statements and they print fine too. So I was wondering if I could have stack overflow, so I allocated a 2GB stack to valgrind and allocated the arrays of the function on the heap but it result on the same issue.

Another tricky thing is that if I put a fprint before the for loop, one after and one inside I run fine...

Here is what valgrind give me:

Invalid read of size 4
==8397== at 0x4017EE: get_combination_cuts_characteristics (hypercuts.c:775)
==8397== by 0x401947: get_optimal_cut_combination (hypercuts.c:663)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x401B78: set_nb_cuts (hypercuts.c:485)
==8397== by 0x4025B2: build_node (hypercuts.c:219)
==8397== by 0x40273D: build_node (hypercuts.c:285)
==8397== by 0x4029B2: new_hypercuts_classifier (hypercuts.c:143)
==8397== by 0x403B02: main (hypercuts_test.c:277)
==8397== Address 0x11fefff77a is not stack'd, malloc'd or (recently) free'd
==8397==
==8397==
==8397== Process terminating with default action of signal 11 (SIGSEGV)
==8397== Access not within mapped region at address 0x11FEFFF77A
==8397== at 0x4017EE: get_combination_cuts_characteristics (hypercuts.c:775)
==8397== by 0x401947: get_optimal_cut_combination (hypercuts.c:663)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x401B78: set_nb_cuts (hypercuts.c:485)
==8397== by 0x4025B2: build_node (hypercuts.c:219)
==8397== by 0x40273D: build_node (hypercuts.c:285)
==8397== by 0x4029B2: new_hypercuts_classifier (hypercuts.c:143)
==8397== by 0x403B02: main (hypercuts_test.c:277)


I'am out of idea and I come here to have help that could give me hints or new ideas. This function is called by another recursive one (build_node: I'am talking of 4 recursive calls in the case of the segmentation fault so not too many) and is executed fine 3 times before it fault. This give me the feeling that something is messing around with the stack (a pointer or an array), but I didn't found tools to profile the stack and I checked that portion of code many times.

To give a little of details about that part of the code: this intend to perform a linear optimisation on the number of cuts to perform in a multidimensional space. This particular function give the caracteristics of the cuts performed and is performed at the end of each optimisation steps.

Thanks in advance !!

Answer

This should be straightforward to debug.

I'd investigate the valgrind crash first as it tends to be more precise. The line, if(max_rule_child < *max_rules || ((max_rule_child == *max_rules) && (rules_sum < *children_rules_sum))), has two pointers being dereferenced. One or more of them are sure to be rubbish. Carefully check max_rules and children_rules_sum are pointing to a valid address. Add a debug statement and see if the values change.

The other line,rules_sum += children_array[i];, seems possible as well. There doesn't appear to be a check of index being less than nb_children. Use the same strategy and add some debug statements of the two values. A write past the end of the Array will corrupt the stack. The stack corruption could overwrite children_rules_sum or max_rules resulting in the valgrind crash.