Stephen Stephen - 2 months ago 14
Python Question

Specifying string types in Cython code

I'm doing some experimentation with Cython and I came across some unexpected behavior:

In [1]: %load_ext cython

In [2]: %%cython
...: cdef class foo(object):
...:     cdef public char* val
...:     def __init__(self, char* val):
...:         self.val = val
...:

In [3]: f = foo('aaa')

In [4]: f.val
Out[4]: '\x01'


What's going on with f.val? Repeated inspection produces seemingly random output, so it looks like f.val is pointing to invalid memory.

The answer to this question suggests that you should use str instead. Indeed, this version works fine:

In [21]: %%cython
...: cdef class foo(object):
...:     cdef public str val
...:     def __init__(self, str val):
...:         self.val = val


So, what is going on in the first version? It seems like the char* is getting freed at some point after class construction, but I'm not really clear on why.

Answer

When you convert a Python bytestring to a char * in Cython, Cython gives you a pointer to the contents of the string object. This raw pointer does not affect the string's Python refcount (it'd be infeasible to track which pointers refer to which strings).

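When the string's refcount hits zero and the string is reclaimed, your pointer becomes invalid. In your example, the foo instance stores only the pointer, not the string, so once nothing else refers to 'aaa' it can be freed, and self.val is left pointing at memory that may since have been reused. That is why f.val returns seemingly random bytes.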
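
The refcount point can be observed from plain Python. This is a minimal sketch assuming CPython, with ctypes used only as a stand-in for Cython's bytes-to-char * coercion (the helper name raw_address is hypothetical). It takes a raw address into a bytes object, confirms the address points at the string's contents, and checks that holding the address leaves the refcount unchanged.

```python
import ctypes
import sys


def raw_address(data: bytes) -> int:
    """Hypothetical helper: return the address of a bytes object's buffer.

    ctypes.c_char_p stands in for Cython's bytes -> char* coercion. The
    temporary ctypes objects are discarded once .value is read, so only a
    plain integer address survives, and it owns nothing.
    """
    return ctypes.cast(ctypes.c_char_p(data), ctypes.c_void_p).value


# bytes(bytearray(...)) builds a fresh object instead of reusing a cached constant
data = bytes(bytearray(b"aaa"))

refs_before = sys.getrefcount(data)
addr = raw_address(data)
refs_after = sys.getrefcount(data)

# While `data` is alive, the address really does point at its contents...
print(ctypes.string_at(addr, len(data)))  # b'aaa'
# ...but holding the address did not add a reference to the string.
print(refs_before == refs_after)          # True

# After this, addr is just a number pointing at freed memory; reading
# through it (as f.val does in the question) would be undefined behaviour.
del data
```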

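You shouldn't convert Python bytestrings to char *s unless you actually need a char *. If you do, make sure to also keep a normal Python reference to the string for as long as you need the char *, to extend the string's lifetime. The str version works because a cdef public str attribute is exactly such a reference: the string stays alive as long as the foo instance does.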
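
A minimal pure-Python sketch of the keep-a-reference advice, again with ctypes standing in for Cython's char * (the class name Foo and the attributes _owner and _ptr are hypothetical). In the Cython class the equivalent would be an extra cdef bytes attribute assigned the same object as the char * field.

```python
import ctypes


class Foo:
    """Hypothetical pure-Python analogue of the Cython class.

    Stores the raw address *and* a normal reference to the bytes object,
    so the buffer the address points into cannot be reclaimed while the
    instance is alive.
    """

    def __init__(self, val: bytes) -> None:
        self._owner = val  # ordinary Python reference: keeps the string alive
        self._ptr = ctypes.cast(ctypes.c_char_p(val), ctypes.c_void_p).value

    @property
    def val(self) -> bytes:
        # Safe to dereference: self._owner guarantees the buffer still exists.
        return ctypes.string_at(self._ptr, len(self._owner))


f = Foo(bytes(bytearray(b"aaa")))
print(f.val)  # b'aaa', however many times it is read
```

Keeping both values private to the instance ensures the address is never used without the object that owns its memory.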