Kenneth Cornett - 26 days ago
C++ Question

I am having trouble understanding how to handle a Heisenbug segfault in my OpenMP program

I am having trouble with a personal side project: a neural network framework I am building. The issue is an intermittent (Heisenbug) segfault, and it occurs in a parallelized section of code for a custom Monte Carlo algorithm I am writing.

The threads should not interact in any way in this section of the code until they reach the critical section I have defined, but somehow memory locations for local variables in a function call are being overwritten by another thread, or the function call itself is overwriting memory allocated by a previous thread.
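To make that assumption explicit, here is a throwaway example (not my project code) of the data sharing I expected from OpenMP: variables declared inside the parallel region are private to each thread, while heap data reached through shared objects is visible to all threads.

// Throwaway illustration, not my real code: locals declared inside the
// parallel region are private, the shared vector's heap storage is not.
#include <omp.h>
#include <vector>
#include <cstdio>

int main() {
    std::vector<int> shared_results(4, 0);     // shared: one heap block seen by all threads
    #pragma omp parallel num_threads(4)
    {
        int my_id = omp_get_thread_num();      // private: each thread has its own copy
        shared_results[my_id] = my_id * my_id; // fine here: disjoint indices, no race
    }
    for (int r : shared_results) std::printf("%d ", r);
    std::printf("\n");
    return 0;
}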

I believe this person's problem is the same as the one I am experiencing, but I do not understand how to apply his insight to fix my code, since he did not specify how he fixed his issue:
OpenMP Causes Heisenbug Segfault

Here is the parallel section of the code I have written, with the "tested" critical section commented out, since it did not help with the bug. This is the section where the bug occurs:

#include "Network.h"
#include <vector>
#include <cmath>
#include <thread>
#include <omp.h>
#include <stdint.h>
#include <iostream>

using namespace std;
using namespace AeroSW;

int main(){
    // Generate X amount of blueprints
    vector<vector<double> > inputs;
    vector<vector<double> > outputs;
    double sf = 1100000;
    double lr = 0.1;
    uint32_t duration = 3;
    for(uint32_t i = 0; i < 1000; i++){
        vector<double>* in = new vector<double>(3);
        vector<double>* out = new vector<double>(1); // These can be different sizes, but kept simple for the example
        (*in)[0] = i;
        (*in)[1] = i+1;
        (*in)[2] = i+2;
        (*out)[0] = i * 1000;
        inputs.push_back(*in);   // push_back copies the vector; the new'd vectors are never deleted
        outputs.push_back(*out);
    }
    // Enumerate every blueprint with 0 to 3 hidden layers of up to 7 nodes each
    vector<vector<int> > bps;
    int n_i = 3;
    int n_o = 1;
    for(uint32_t i = 0; i <= 3; i++){
        int num_bps_for_this_layer = pow(7, i);
        int* val_array = new int[i];
        for(uint32_t j = 0; j < i; j++){
            val_array[j] = 7;
        }
        for(uint32_t j = 0; j < (unsigned)num_bps_for_this_layer; j++){
            vector<int>* vec_i = new vector<int>(2+i);
            (*vec_i)[0] = n_i;
            (*vec_i)[i+1] = n_o;
            for(uint32_t k = 0; k < i; k++){
                (*vec_i)[k+1] = val_array[k];
            }
            bps.push_back(*vec_i);
            if(i > 0){
                // Count the layer sizes down like an odometer
                uint32_t t_i = i-1; // Temp i
                val_array[t_i]--;
                bool b_flag = false; // break flag
                while(val_array[t_i] == 0){
                    val_array[t_i] = 7;
                    if(t_i == 0){
                        b_flag = true;
                        break;
                    }
                    t_i--;
                    val_array[t_i]--;
                }
                if(b_flag) break;
            }
        }
    }
    //cout << "Hello World\n";
    uint32_t num_bins = 10;
    uint32_t num_threads = std::thread::hardware_concurrency(); // Find # of cores
    if(num_threads == 0) // Assume 1 core for systems w/out multiple cores
        num_threads = 1;
    if(num_bins < num_threads){
        num_threads = num_bins;
    }
    uint32_t bp_slice = bps.size() / num_threads;
    #pragma omp parallel num_threads(num_threads) firstprivate(num_bins, n_i, n_o, lr)
    {
        uint32_t my_id = omp_get_thread_num();
        uint32_t my_si = my_id * bp_slice; // my starting index
        uint32_t my_ei; // my ending index, exclusive
        if(my_id == num_threads - 1) my_ei = bps.size();
        else my_ei = my_si + bp_slice;
        std::vector<Network*> my_nets;
        // Each thread builds its own Networks from its slice of the blueprints
        for(uint32_t i = my_si; i < my_ei; i++){
            uint32_t nl = bps[i].size();
            uint32_t* bp = new uint32_t[nl];
            for(uint32_t j = 0; j < nl; j++){
                bp[j] = bps[i][j];
            }
            Network* t_net = new Network(lr, bp, nl);
            my_nets.push_back(t_net);
        }
        // Train each of this thread's Networks; this is where the crash shows up
        for(uint32_t i = 0; i < my_nets.size(); i++){
            for(uint32_t j = 0; j < num_bins; j++){
                my_nets[i]->train(inputs, outputs, sf, inputs.size(), duration);
            }
        }
    }
}


If anyone sees something I do not, or knows what I could potentially do to fix this issue, please let me know!

Here is sample output from Valgrind with the Helgrind tool active, which I believe also describes the problem:

==26386==
==26386== Possible data race during read of size 8 at 0x6213348 by thread #1
==26386== Locks held: none
==26386== at 0x40CB26: AeroSW::Node::get_weight(unsigned int) (Node.cpp:84)
==26386== by 0x40E688: AeroSW::Network::train_tim(std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >, double, unsigned int, unsigned long) (Network.cpp:227)
==26386== by 0x4058F1: monte_carlo(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, double, double, double, std::vector<double*, std::allocator<double*> >&) [clone ._omp_fn.0] (Validation.cpp:196)
==26386== by 0x5462E5E: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==26386== by 0x404B86: monte_carlo(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, double, double, double, std::vector<double*, std::allocator<double*> >&) (Validation.cpp:136)
==26386== by 0x402467: main (NeuralNetworkArchitectureDriver.cpp:85)
==26386== Address 0x6213348 is 24 bytes inside a block of size 32 in arena "client"
==26386==


-UPDATE-
It was a heap corruption problem. I had to modify a lot of code, but I got it working using shared_ptrs and vectors. The threads were overwriting memory locations they should not have had access to, which caused other threads to crash because information they were trying to access had been changed.
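As a rough sketch of the change inside the parallel region (illustration only, not my final code, and it assumes the Network constructor copies the blueprint it is given), the per-thread setup went from raw new'd arrays and pointers to something like this:

// Sketch of the reworked per-thread setup: std::vector owns the blueprint
// storage and std::shared_ptr owns each Network, so no thread frees or
// reuses memory that another thread still references. (Needs <memory>.)
std::vector<std::shared_ptr<Network> > my_nets;
for(uint32_t i = my_si; i < my_ei; i++){
    // Copy the blueprint into a vector instead of a raw new'd array
    std::vector<uint32_t> bp(bps[i].begin(), bps[i].end());
    // Assumes Network copies the blueprint; bp.data()/bp.size() replace the raw array
    my_nets.push_back(std::make_shared<Network>(lr, bp.data(), (uint32_t)bp.size()));
}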

Answer

I am writing this post because I identified my problem with the help of a local university professor. The issue turned out to be heap corruption caused by how memory was scoped and allocated in the program: threads were writing past their own allocations and into heap memory belonging to other threads whenever the data did not fit into the space they had reserved for themselves.

I was able to handle this by changing all object pointers to shared_ptrs, which prevented the memory locations from being overwritten until every reference to the objects had been released. I also changed all array pointers, or pointers being used as arrays, to vectors. After doing this, the problem vanished into thin air and the program stopped randomly crashing.
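To show the ownership pattern on its own, here is a minimal standalone sketch (with a placeholder Model class rather than my actual Network): each thread keeps its objects in a vector of shared_ptrs, and nothing is freed until the last reference goes away.

// Minimal standalone sketch of the pattern: per-thread vectors of
// shared_ptr-owned objects, no raw new/delete anywhere.
#include <memory>
#include <vector>
#include <omp.h>
#include <cstdio>

struct Model {
    std::vector<double> weights;                 // vector instead of a raw array
    explicit Model(std::size_t n) : weights(n, 0.0) {}
};

int main() {
    #pragma omp parallel num_threads(4)
    {
        std::vector<std::shared_ptr<Model> > my_models;      // private to this thread
        for (int i = 0; i < 3; ++i)
            my_models.push_back(std::make_shared<Model>(8)); // ref-counted ownership
        #pragma omp critical
        std::printf("thread %d owns %zu models\n",
                    omp_get_thread_num(), my_models.size());
    } // every Model is released automatically when its last shared_ptr goes away
    return 0;
}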

And thanks to Zulan for the recommendation as well!
