Tom Cornebize - 2 months ago
Python Question

Why is ctypes so slow to convert a Python list to a C array?

The bottleneck of my code is currently a conversion from a Python list to a C array using ctypes, as described in this question.

A small experiment shows that it is indeed very slow compared with other Python instructions:

import timeit
setup="from array import array; import ctypes; t = [i for i in range(1000000)];"
print(timeit.timeit(stmt='(ctypes.c_uint32 * len(t))(*t)',setup=setup,number=10))
print(timeit.timeit(stmt='array("I",t)',setup=setup,number=10))
print(timeit.timeit(stmt='set(t)',setup=setup,number=10))


Gives:

1.790962941000089
0.0911122129996329
0.3200237319997541


I obtained these results with CPython 3.4.2. I get similar times with CPython 2.7.9 and PyPy 2.4.0.

I tried running the above code with perf, commenting out the timeit instructions to run only one at a time. I get these results:
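
For reference, the profiled script is essentially the snippet above with two of the three measurements commented out, for example (a sketch, not the exact file):

import timeit
setup="from array import array; import ctypes; t = [i for i in range(1000000)];"
# ctypes run; exactly one of the three measurements is left uncommented per perf run
print(timeit.timeit(stmt='(ctypes.c_uint32 * len(t))(*t)',setup=setup,number=10))
# print(timeit.timeit(stmt='array("I",t)',setup=setup,number=10))
# print(timeit.timeit(stmt='set(t)',setup=setup,number=10))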

ctypes

Performance counter stats for 'python3 perf.py':

1807,891637 task-clock (msec) # 1,000 CPUs utilized
8 context-switches # 0,004 K/sec
0 cpu-migrations # 0,000 K/sec
59 523 page-faults # 0,033 M/sec
5 755 704 178 cycles # 3,184 GHz
13 552 506 138 instructions # 2,35 insn per cycle
3 217 289 822 branches # 1779,581 M/sec
748 614 branch-misses # 0,02% of all branches

1,808349671 seconds time elapsed


array

Performance counter stats for 'python3 perf.py':

144,678718 task-clock (msec) # 0,998 CPUs utilized
0 context-switches # 0,000 K/sec
0 cpu-migrations # 0,000 K/sec
12 913 page-faults # 0,089 M/sec
458 284 661 cycles # 3,168 GHz
1 253 747 066 instructions # 2,74 insn per cycle
325 528 639 branches # 2250,011 M/sec
708 280 branch-misses # 0,22% of all branches

0,144966969 seconds time elapsed


set

Performance counter stats for 'python3 perf.py':

369,786395 task-clock (msec) # 0,999 CPUs utilized
0 context-switches # 0,000 K/sec
0 cpu-migrations # 0,000 K/sec
108 584 page-faults # 0,294 M/sec
1 175 946 161 cycles # 3,180 GHz
2 086 554 968 instructions # 1,77 insn per cycle
422 531 402 branches # 1142,636 M/sec
768 338 branch-misses # 0,18% of all branches

0,370103043 seconds time elapsed


The code with ctypes has fewer page faults than the code with set, and about the same number of branch misses as the other two. The only things I see are that it executes more instructions and branches (but I still don't know why) and has more context switches (but that is more likely a consequence of the longer run time than a cause).
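
One way to probe where the extra instructions come from is to separate the starred-argument call from the per-element stores; a small sketch (it does not settle the question, but both variants perform a Python-level conversion for every element):

import timeit
setup="import ctypes; t = [i for i in range(1000000)];"
# constructor call: the list is unpacked into one million Python arguments
print(timeit.timeit(stmt='(ctypes.c_uint32 * len(t))(*t)',setup=setup,number=10))
# allocate first, then store element by element in a plain Python loop
stmt="""
a = (ctypes.c_uint32 * len(t))()
for i, x in enumerate(t):
    a[i] = x
"""
print(timeit.timeit(stmt=stmt,setup=setup,number=10))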

I therefore have two questions:


  1. Why is ctypes so slow?

  2. Is there a way to improve performance, either with ctypes or with another library?


Answer

The solution is to use the array module and then either cast its address or use the from_buffer method:

import timeit
setup="from array import array; import ctypes; t = [i for i in range(1000000)];"
print(timeit.timeit(stmt="v = array('I',t);assert v.itemsize == 4; addr, count = v.buffer_info();p = ctypes.cast(addr,ctypes.POINTER(ctypes.c_uint32))",setup=setup,number=10))
print(timeit.timeit(stmt="v = array('I',t);a = (ctypes.c_uint32 * len(v)).from_buffer(v)",setup=setup,number=10))
print(timeit.timeit(stmt='(ctypes.c_uint32 * len(t))(*t)',setup=setup,number=10))
print(timeit.timeit(stmt='set(t)',setup=setup,number=10))

Both variants are then many times faster under Python 3:

$ python3 convert.py
0.08303386811167002
0.08139665238559246
1.5630637975409627
0.3013848252594471
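
As a side note, a quick check (a sketch) shows that from_buffer shares memory with the array rather than copying it, so writes through the ctypes view are visible in the original array; the C function named in the last comment is hypothetical:

from array import array
import ctypes

v = array('I', [1, 2, 3, 4])
a = (ctypes.c_uint32 * len(v)).from_buffer(v)

a[0] = 42     # write through the ctypes view...
print(v[0])   # ...and it shows up in the original array: prints 42

# The resulting array (or the pointer obtained with ctypes.cast) can then be
# passed to a C function expecting a uint32_t *, e.g. some_lib.process(a, len(a)).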