Joseph Turian - 1 year ago 56

Python Question

What is an efficient way to initialize and access elements of a large array in Python?

I want to create an array in Python with 100 million entries, unsigned 4-byte integers, initialized to zero. I want fast array access, preferably with contiguous memory.

Strangely, NumPy arrays seem to be performing very slow. Are there alternatives I can try?

There is the array.array module, but I don't see a method to efficiently allocate a block of 100 million entries.

Responses to comments:

- I cannot use a sparse array. It will be too slow for this algorithm because the array becomes dense very quickly.
- I know Python is interpreted, but surely there is a way to do fast array operations?
- I did some profiling, and I get about 160K array accesses (looking up or updating an element by index) per second with NumPy. This seems very slow.

Answer

I have done some profiling, and the results are completely counterintuitive.
For simple array access operations, **numpy and array.array are 10x slower than native Python arrays**.

Note that for array access, I am doing operations of the form:

```
a[i] += 1
```

Profiles:

[0] * 20000000

- Access: 2.3M / sec
- Initialization: 0.8s

numpy.zeros(shape=(20000000,), dtype=numpy.int32)

- Access: 160K/sec
- Initialization: 0.2s

array.array('L', [0] * 20000000)

- Access: 175K/sec
- Initialization: 2.0s

array.array('L', (0 for i in range(20000000)))

- Access: 175K/sec, presumably, based upon the profile for the other array.array
- Initialization: 6.7s