The context: my Python code pass arrays of 2D vertices to OpenGL.
I tested 2 approaches, one with ctypes, the other with struct, the latter being more than twice faster.
from random import random
points = [(random(), random()) for _ in xrange(1000)]
from ctypes import c_float
n = len(points)
return n, (c_float*(2*n))(*[u for point in points for u in point])
from struct import pack
n = len(points)
return n, pack("f"*2*n, *[u for point in points for u in point])
You could try Cython. For me, this gives:
function usec per loop: Python Cython array_ctypes 1370 1220 array_struct 384 249 array_numpy 336 339
So Numpy only gives 15% benefit on my hardware (old laptop running WindowsXP), whereas Cython gives about 35% (without any extra dependency in your distributed code).
If you can loosen your requirement that each point is a tuple of floats, and simply make 'points' a flattened list of floats:
def array_struct_flat(points): n = len(points) return pack( "f"*n, *[ coord for coord in points ] ) points = [random() for _ in xrange(1000 * 2)]
then the resulting output is the same, but the timing goes down further:
function usec per loop: Python Cython array_struct_flat 157
Cython might be capable of substantially better than this too, if someone smarter than me wanted to add static type declarations to the code. (Running 'cython -a test.pyx' is invaluable for this, it produces an html file showing you where the slowest (yellow) plain Python is in your code, versus python that has been converted to pure C (white). That's why I spread the code above out onto so many lines, because the coloring is done per-line, so it helps to spread it out like that.)
Full Cython instructions are here: http://docs.cython.org/src/quickstart/build.html
Cython might produce similar performance benefits across your whole codebase, and in ideal conditions, with proper static typing applied, can improve speed by factors of ten or a hundred.