Tom Tom - 19 days ago 9
C++ Question

Cython interfaced with C++: segmentation fault for large arrays

I am porting my code from Python/C interfaced using ctypes to Python/C++ interfaced using Cython. The new interface should make the code easier to maintain, because I can use all the C++ features and need relatively few lines of interface code.

The interfaced code works perfectly with small arrays, but it hits a segmentation fault with large arrays. I have been trying to wrap my head around this problem, but have not gotten any closer to a solution. I have included a minimal example in which the segmentation fault occurs. Please note that it occurs consistently on Linux and Mac, and that valgrind gave no insight. Also note that the exact same example in pure C++ works without problems.

The example contains (part of) a Sparse matrix class in C++. An interface is created in Cython. As a result the class can be used from Python.

C++ side



sparse.h


#ifndef SPARSE_H
#define SPARSE_H

#include <iostream>
#include <cstdio>

using namespace std;

class Sparse {

public:
  int* data;
  int nnz;

  Sparse();
  ~Sparse();
  Sparse(int* data, int nnz);
  void view(void);

};

#endif


sparse.cpp


#include "sparse.h"

Sparse::Sparse()
{
  data = NULL;
  nnz = 0;
}

Sparse::~Sparse() {}

Sparse::Sparse(int* Data, int NNZ)
{
  nnz = NNZ;
  data = Data;
}

void Sparse::view(void)
{
  int i;

  for ( i=0 ; i<nnz ; i++ )
    printf("(%3d) %d\n",i,data[i]);
}


Cython interface



csparse.pyx


import numpy as np
cimport numpy as np

# UNCOMMENT TO FIX
#from cpython cimport Py_INCREF

cdef extern from "sparse.h":
  cdef cppclass Sparse:
    Sparse(int*, int) except +
    int* data
    int nnz
    void view()


cdef class PySparse:

  cdef Sparse *ptr

  def __cinit__(self,**kwargs):

    cdef np.ndarray[np.int32_t, ndim=1, mode="c"] data

    data = kwargs['data'].astype(np.int32)

    # UNCOMMENT TO FIX
    #Py_INCREF(data)

    self.ptr = new Sparse(
      <int*> data.data if data is not None else NULL,
      data.shape[0],
    )

  def __dealloc__(self):
    del self.ptr

  def view(self):
    self.ptr.view()


setup.py


# setuptools replaces distutils, which was removed in Python 3.12
from setuptools import setup, Extension
from Cython.Build import cythonize

setup(ext_modules = cythonize(Extension(
  "csparse",
  sources=["csparse.pyx", "sparse.cpp"],
  language="c++",
)))


Python side



import numpy as np
import csparse

data = np.arange(100000,dtype='int32')

matrix = csparse.PySparse(
  data = data
)

matrix.view() # --> segmentation fault


To run:

$ python setup.py build_ext --inplace
$ python example.py


Note that data = np.arange(100,dtype='int32') does work.

Answer

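The memory is being managed by your numpy arrays. As soon as the last reference to an array goes away (most likely at the end of the PySparse constructor, when the local variable goes out of scope) the array ceases to exist, and all your pointers into it are invalid. This applies to both large and small arrays, but presumably you just get lucky with small arrays, for example because the freed memory has not yet been reused or returned to the operating system.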
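
To see the lifetime problem without any C++ involved, here is a minimal pure-Python sketch. The function name make_temporary is made up for illustration, and the behaviour shown relies on CPython's reference counting. A weak reference is used only to observe whether the array still exists:

```python
import weakref
import numpy as np

def make_temporary(data):
    # Mirrors the constructor in the question: astype() creates a new
    # array that only this local variable refers to.
    converted = data.astype(np.int32)
    # A weak reference observes the array without keeping it alive.
    return weakref.ref(converted)

original = np.arange(100000, dtype='int32')
observer = make_temporary(original)

# On CPython the converted array is freed as soon as the function returns,
# so a raw pointer taken from it inside the function would now dangle.
print(observer() is None)
```

The array passed in is still alive here; it is the array returned by astype() that has gone, and that is the one the C++ object would be pointing at.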

You need to hold a reference to all the numpy arrays you use for the lifetime of your PySparse object:

cdef class PySparse:

  # ----------------------------------------------------------------------------

  cdef Sparse *ptr
  cdef object _held_reference # added

  # ----------------------------------------------------------------------------

  def __cinit__(self,**kwargs):
      # ....
      # your constructor code goes here, unchanged...
      # ....

      self._held_reference = [data] # add any other numpy arrays you use to this list
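
The effect of holding the reference can be checked from plain Python. This is only a sketch: Holder is a hypothetical pure-Python stand-in for PySparse, and the weak reference exists just to observe when the array is freed (again assuming CPython's reference counting).

```python
import weakref
import numpy as np

class Holder:
    """Hypothetical pure-Python stand-in for PySparse."""

    def __init__(self, data):
        converted = data.astype(np.int32)
        # Same idea as the Cython version: the list keeps the array alive
        # for as long as the Holder object exists.
        self._held_reference = [converted]
        # Weak reference used only to observe the array's lifetime.
        self.observer = weakref.ref(converted)

holder = Holder(np.arange(10))
observer = holder.observer

alive_while_held = observer() is not None   # array survives the constructor
del holder                                  # drops the held reference
alive_after_delete = observer() is not None

print(alive_while_held, alive_after_delete)
```

In the Cython class the array therefore lives exactly as long as the PySparse object, which is also how long the C++ object keeps the pointer.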

As a rule you need to be thinking quite hard about who owns what whenever you're dealing with C/C++ pointers, which is a big change from the normal Python approach. Getting a pointer from a numpy array does not copy the data and it does not give numpy any indication that you're still using the data.


Edit note: In my original version I tried to use locals() as a quick way of gathering a collection of all the arrays I wanted to keep. Unfortunately, that doesn't seem to include the cdef'ed arrays, so it didn't manage to keep the ones you were actually using. Note here that astype() makes a copy unless you tell it otherwise, so you need to hold the reference to the copy, rather than to the original passed in as an argument.
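
The copying behaviour of astype() is easy to check from plain Python. A small sketch (the variable names are just for illustration):

```python
import numpy as np

passed_in = np.arange(5, dtype=np.int32)

# Default behaviour: astype() always returns a newly allocated array,
# even when the dtype already matches.
copied = passed_in.astype(np.int32)

# With copy=False the input array itself is returned when no conversion
# is needed (here the dtype already matches).
not_copied = passed_in.astype(np.int32, copy=False)

print(np.shares_memory(passed_in, copied))
print(not_copied is passed_in)
```

Either way, the safe thing is to hold a reference to whatever astype() returned, since that is the array the pointer was taken from.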