Stephen - 4 months ago
Python Question

specifying string types in cython code

I'm doing some experimentation with cython and I came across some unexpected behavior:

In [1]: %load_ext cython

In [2]: %%cython
   ...: cdef class foo(object):
   ...:     cdef public char* val
   ...:     def __init__(self, char* val):
   ...:         self.val = val

In [3]: f = foo('aaa')

In [4]: f.val
Out[4]: '\x01'

What's going on with f.val? Repeated inspection produces seemingly random output, so it looks like f.val is pointing to invalid memory.

The answer to this question suggests that you should use str instead. Indeed, this version works fine:

In [21]: %%cython
    ...: cdef class foo(object):
    ...:     cdef public str val
    ...:     def __init__(self, str val):
    ...:         self.val = val

So, what is going on in the first version? It seems like the char* is getting freed at some point after class construction, but I'm not really clear on why.


When you convert a Python bytestring to a char * in Cython, Cython gives you a pointer to the contents of the string object. This raw pointer does not affect the string's Python refcount (it'd be infeasible to track which pointers refer to which strings).

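When the string's refcount hits zero and the string is reclaimed, your pointer becomes invalid. In your example, nothing else appears to hold a reference to the 'aaa' string once the statement has finished, so it can be reclaimed and f.val is left dangling.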

You shouldn't convert Python bytestrings to char *s unless you actually need a char *. If you do, make sure to also keep a normal Python reference to the string for as long as you need the char *, to extend the string's lifetime.
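
The lifetime rule above can be sketched in plain Python, with ctypes standing in for Cython's char *. This is only an analogy, not Cython's own mechanism, and the Holder class and its attribute names are hypothetical. The raw address is a plain integer that carries no reference, so the object stores the owning bytes object right next to it:

```python
import ctypes


class Holder:
    """Pairs a raw buffer address with the bytes object that owns the buffer."""

    def __init__(self, val: bytes):
        # Ordinary Python reference: keeps the bytes object (and its buffer) alive
        # for as long as this Holder exists.
        self._owner = val
        # Raw address of the buffer. It is a plain int, so it does not
        # contribute to val's refcount, much like a char * in Cython.
        self.addr = ctypes.cast(ctypes.c_char_p(val), ctypes.c_void_p).value

    def read(self) -> bytes:
        # Safe only because self._owner guarantees the buffer still exists.
        return ctypes.string_at(self.addr, len(self._owner))


data = bytes(bytearray(b"aaa"))  # build a fresh bytes object at runtime
h = Holder(data)
del data                         # the caller's reference is gone...
print(h.read())                  # ...but h._owner still keeps the buffer valid
```

Without the _owner attribute, the address could outlive the buffer, which is the situation in the first version of the question. In Cython the analogous fix would be to store the Python string in its own attribute alongside the char *.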