user3820991 user3820991 -4 years ago 289
Python Question

save numpy array in append mode

Is it possible to save a numpy array appending it to an already existing npy-file --- something like

np.save(filename,arr,mode='a')
?

I have several functions that have to iterate over the rows of a large array. I cannot create the array all at once because of memory constraints. To avoid creating the rows over and over again, I want to create each row once and save it to a file, appending it to the rows already in the file. Later I could load the npy-file in mmap_mode, accessing the slices when needed.

Answer Source

The built-in .npy file format is perfectly fine for working with small datasets, without relying on external modules other than numpy.
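
As a side note on the .npy route: np.save has no append mode, but when the final number of rows is known in advance, an .npy file can be preallocated on disk and filled one row at a time. The sketch below assumes that is the case and uses np.lib.format.open_memmap; the sizes, the file name and the row contents are illustrative placeholders, not taken from the question.

```python
import os
import tempfile

import numpy as np

# Illustrative sizes and path (assumptions, not from the question)
n_rows, row_size = 6, 4
path = os.path.join(tempfile.mkdtemp(), 'rows.npy')

# Create the .npy file with its final shape; the data lives on disk, not in RAM
out = np.lib.format.open_memmap(path, mode='w+', dtype=np.float64,
                                shape=(n_rows, row_size))
for i in range(n_rows):
    # Stand-in for an expensive row computation
    out[i, :] = np.full(row_size, i)
out.flush()
del out  # release the memory map

# Later: open the file lazily and read only the slice that is needed
loaded = np.load(path, mmap_mode='r')
middle = np.array(loaded[2:4])
```

This only works for a fixed, known shape; the file cannot grow afterwards.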

However, once you start having large amounts of data, a file format designed to handle such datasets, such as HDF5, is to be preferred [1].

For instance, below is a solution that saves numpy arrays in HDF5 with PyTables:

Step 1: Create an extendable EArray storage

import tables
import numpy as np

filename = 'outarray.h5'
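ROW_SIZE = 100   # number of values in each row
NUM_ROWS = 200   # number of rows written in this step

f = tables.open_file(filename, mode='w')
atom = tables.Float64Atom()

# A first dimension of 0 marks the axis along which the array can grow
array_c = f.create_earray(f.root, 'data', atom, (0, ROW_SIZE))

for idx in range(NUM_ROWS):
    x = np.random.rand(1, ROW_SIZE)  # one new row, shape (1, ROW_SIZE)
    array_c.append(x)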
f.close()

Step 2: Append rows to an existing dataset (if needed)

f = tables.open_file(filename, mode='a')
f.root.data.append(x)  # x must have shape (n, ROW_SIZE)
f.close()
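
Steps 1 and 2 can also be folded into a single helper that behaves like the "append mode save" asked for in the question. This is only a sketch: the function name append_rows, the node name 'data' and the float64 dtype are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tables


def append_rows(filename, rows, node_name='data'):
    """Append 2-D `rows` to an EArray, creating file and array on first use.

    Hypothetical helper; returns the total number of rows stored afterwards.
    """
    rows = np.atleast_2d(np.asarray(rows, dtype=np.float64))
    # mode='a' opens an existing file for appending or creates a new one
    with tables.open_file(filename, mode='a') as f:
        node_path = '/' + node_name
        if node_path in f:
            earray = f.get_node(node_path)
        else:
            # First dimension 0 makes the array extendable along the rows
            earray = f.create_earray(f.root, node_name,
                                     tables.Float64Atom(), (0, rows.shape[1]))
        earray.append(rows)
        return earray.nrows


path = os.path.join(tempfile.mkdtemp(), 'append_demo.h5')
first_total = append_rows(path, np.zeros((3, 5)))  # creates the file
second_total = append_rows(path, np.ones(5))       # appends a single row
```

Opening and closing the file on every call keeps the example simple; when appending many rows in a tight loop, it is probably cheaper to keep the file open.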

Step 3: Read back a subset of the data

f = tables.open_file(filename, mode='r')
print(f.root.data[1:10, 2:20])  # only this slice of the dataset is read from disk
f.close()
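
For the use case in the question (several functions iterating over the rows of an array that does not fit in memory), the same slicing can be wrapped in a generator that reads a few rows at a time. The following is a minimal sketch; iter_row_blocks, the block size and the small demo file are hypothetical and only serve to make the example self-contained.

```python
import os
import tempfile

import numpy as np
import tables


def iter_row_blocks(filename, block_size, node_name='data'):
    """Yield consecutive blocks of rows, reading one block from disk at a time."""
    with tables.open_file(filename, mode='r') as f:
        data = f.get_node('/' + node_name)
        for start in range(0, data.nrows, block_size):
            # Slicing returns a regular numpy array holding just these rows
            yield data[start:start + block_size]


# Build a small demo file (10 rows, 4 columns) to iterate over
path = os.path.join(tempfile.mkdtemp(), 'blocks_demo.h5')
expected = np.arange(40, dtype=np.float64).reshape(10, 4)
with tables.open_file(path, mode='w') as f:
    earray = f.create_earray(f.root, 'data', tables.Float64Atom(), (0, 4))
    earray.append(expected)

blocks = list(iter_row_blocks(path, block_size=4))
block_shapes = [b.shape for b in blocks]
```

Each yielded block is an independent in-memory copy, so memory use is bounded by block_size rather than by the size of the dataset.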