user1877600 - 2 months ago
Python Question

Initialization of very big vector in C++

I created a very big (on the order of 10M entries) floating-point list in Python. I would like to use this lookup table in my C++ project. What is the easiest and most efficient way to transfer this array from Python to C++?

My first idea was to generate a C++ function that initializes such a long vector, and then compile it.
The Python code looks like this:

def generate_initLookupTable_function():
    numbers_per_row = 100
    function_body = """
#include "PatBBDTSeedClassifier.h"

std::vector<double> PatBBDTSeedClassifier::initLookupTable()
{
    std::vector<double> indicesVector = {
"""
    row_nb = 1
    for bin_value in classifier._lookup_table[:, 0]:
        function_body += "\t" + str(bin_value) + " , "
        if (row_nb % numbers_per_row) == 0:
            function_body += "\n"
        row_nb += 1

    function_body += """\n    };
    return indicesVector;
}
"""
    return function_body
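
For reference, the generated source file has this shape (the literal values themselves are omitted here):

#include "PatBBDTSeedClassifier.h"

std::vector<double> PatBBDTSeedClassifier::initLookupTable()
{
    std::vector<double> indicesVector = {
        /* roughly 10M comma-separated literals, 100 per line */
    };
    return indicesVector;
}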


The output file has a size of 500 MB and it is impossible to compile it (the compilation terminates because gcc crashes):

../src/PatBBDTSeedClassifier_lookupTable.cpp
lcg-g++-4.9.3: internal compiler error: Killed (program cc1plus)

0x409edc execute
../../gcc-4.9.3/gcc/gcc.c:2854
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.


Another idea is to store the Python array in a binary file and then read it back in C++. But this is tricky: I cannot read it properly.
I generate the table using this simple snippet:

file = open("models/BBDT_lookuptable.dat", 'wb')
table = numpy.array(classifier._lookup_table[:,0])
table.tofile(file)
file.close()


Can you tell me how I can do this? I searched SO and couldn't find a sufficient answer.

Do you have any idea how I can deal with such big arrays?

I should have given you a more detailed description of the problem.
I use Python to train an ML (sklearn) classifier and then I would like to deploy it in C++. Due to timing constraints (execution speed is a crucial part of my study) I use the idea of bonsai boosted decision trees. In this approach you transform the BDT into a lookup table.
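
For illustration, this is roughly how I intend to query the table once it is available in C++; the binning scheme and the names (featureBins, binsPerFeature) are only placeholders for my actual classifier:

#include <cstddef>
#include <vector>

// Placeholder illustration: map per-feature bin indices to a flat,
// row-major (mixed-radix) index into the lookup table and read the response.
double lookupResponse(const std::vector<double>& table,
                      const std::vector<std::size_t>& featureBins,
                      const std::vector<std::size_t>& binsPerFeature)
{
    std::size_t flatIndex = 0;
    for (std::size_t i = 0; i < featureBins.size(); ++i)
        flatIndex = flatIndex * binsPerFeature[i] + featureBins[i];
    return table[flatIndex];
}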

Answer

Here's a simple example of how to write Python float data to a binary file, and how to read that data in C. To encode the data, we use the struct module.

savefloat.py

#!/usr/bin/env python3
from struct import pack

# The float data to save
table = [i / 16.0 for i in range(32)]

# Dump the table to stdout
for i, v in enumerate(table):
    print('%d: %f' % (i, v))

# Save the data to a binary file
fname = 'test.data'
with open(fname, 'wb') as f:
    for u in table:
        # Pack doubles as little-endian 
        f.write(pack(b'<d', u))    

output

0: 0.000000
1: 0.062500
2: 0.125000
3: 0.187500
4: 0.250000
5: 0.312500
6: 0.375000
7: 0.437500
8: 0.500000
9: 0.562500
10: 0.625000
11: 0.687500
12: 0.750000
13: 0.812500
14: 0.875000
15: 0.937500
16: 1.000000
17: 1.062500
18: 1.125000
19: 1.187500
20: 1.250000
21: 1.312500
22: 1.375000
23: 1.437500
24: 1.500000
25: 1.562500
26: 1.625000
27: 1.687500
28: 1.750000
29: 1.812500
30: 1.875000
31: 1.937500

loadfloat.c

/* Read floats from a binary file & dump to stdout */

#include <stdlib.h>
#include <stdio.h>

#define FILENAME "test.data"
#define DATALEN 32

int main(void)
{
    FILE *infile;
    double data[DATALEN];
    int i, n;

    if(!(infile = fopen(FILENAME, "rb")))
        exit(EXIT_FAILURE);

    n = fread(data, sizeof(double), DATALEN, infile);
    fclose(infile);

    for(i=0; i<n; i++)
        printf("%d: %f\n", i, data[i]);

    return 0;
}

The above C code produces identical output to that shown for savefloat.py.
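
Since the question targets C++ and a table with ~10M entries, here is a minimal C++ sketch along the same lines that reads the whole file into a std::vector<double>, sizing the vector from the file length. It assumes, as above, that the file contains nothing but consecutive raw doubles in the machine's native byte order (little-endian on x86), so it should equally read a dump produced with numpy's table.tofile(file) from the question.

/* Read all raw doubles from a binary file into a std::vector & dump to stdout */

#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    // Open at the end so tellg() gives us the file size in bytes.
    std::ifstream infile("test.data", std::ios::binary | std::ios::ate);
    if (!infile)
        return EXIT_FAILURE;

    const std::streamsize nbytes = infile.tellg();
    std::vector<double> data(static_cast<std::size_t>(nbytes) / sizeof(double));

    // Rewind and read the raw bytes directly into the vector's storage.
    infile.seekg(0);
    if (!infile.read(reinterpret_cast<char*>(data.data()),
                     static_cast<std::streamsize>(data.size() * sizeof(double))))
        return EXIT_FAILURE;

    for (std::size_t i = 0; i < data.size(); ++i)
        std::cout << i << ": " << data[i] << '\n';

    return 0;
}

Sizing the vector from the file length avoids the fixed DATALEN array used above, which matters once the table grows to millions of entries.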
