Michael Michael - 4 months ago 25
Python Question

load np.memmap without knowing shape

Is it possible to load a numpy.memmap without knowing the shape and still recover the shape of the data?

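import numpy as np

filename = 'data.dat'  # hypothetical path; any writable location works
data = np.arange(12, dtype='float32')
data.resize((3,4))
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3,4))
fp[:] = data[:]
del fp  # deleting the memmap flushes pending writes to disk
newfp = np.memmap(filename, dtype='float32', mode='r', shape=(3,4))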


In the last line, I want to be able to omit the shape and still have newfp come back with the shape (3,4), just as it would with joblib.load. Is this possible? Thanks.

Answer

Not unless that information has been explicitly stored in the file somewhere. As far as np.memmap is concerned, the file is just a flat buffer.
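To make the "flat buffer" point concrete, here is a minimal sketch of what happens when shape is omitted. The temporary path and the variable names are made up for illustration; the only thing np.memmap can infer is the element count, from the file size and the dtype you pass in.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "raw.dat")

data = np.arange(12, dtype="float32").reshape(3, 4)
fp = np.memmap(path, dtype="float32", mode="w+", shape=(3, 4))
fp[:] = data
fp.flush()
del fp

# No shape given: the result is 1-D, with its length derived from
# file size / itemsize. The original (3, 4) layout is not recoverable.
flat = np.memmap(path, dtype="float32", mode="r")
flat_shape = flat.shape

# The caller still has to supply the shape (and the dtype) from elsewhere.
reshaped = flat.reshape(3, 4)
```

Note that the dtype is not stored either: opening the same file with a different dtype would silently give a different element count and meaningless values.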

I would recommend using np.save to persist numpy arrays, since this also preserves the metadata specifying their dimensions, dtypes etc. You can also load an .npy file as a memmap by passing the mmap_mode= parameter to np.load.
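As a rough sketch of that approach applied to the example in the question (the temporary path is an assumption for illustration), saving with np.save and reloading with mmap_mode gives back a memory-mapped array whose shape and dtype come from the .npy header:

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "data.npy")

data = np.arange(12, dtype="float32").reshape(3, 4)
np.save(path, data)  # writes a header (shape, dtype, order) plus the raw data

# mmap_mode="r" maps the file read-only instead of reading it into memory.
newfp = np.load(path, mmap_mode="r")

print(type(newfp).__name__, newfp.shape, newfp.dtype)
```

Here neither the shape nor the dtype is passed when loading; both are read from the file.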

joblib.dump uses a combination of pickling to store generic Python objects and np.save to store numpy arrays.


To initialize an empty memory-mapped array backed by a .npy file you can use numpy.lib.format.open_memmap:

import numpy as np
from numpy.lib.format import open_memmap

# initialize an empty 10TB memory-mapped array
x = open_memmap('/tmp/bigarray.npy', mode='w+', dtype=np.ubyte, shape=(int(1E13),))

You might be surprised by the fact that this succeeds even if the array is larger than the total available disk space (my laptop only has a 500GB SSD, but I just created a 10TB memmap). This is possible because the file that's created is sparse, which relies on the underlying filesystem supporting sparse files.
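A smaller-scale sketch of the same idea, tied back to the original question: create the array with open_memmap, fill it in place, then reopen it without specifying a shape. The path and the (3, 4) size are assumptions chosen to mirror the question's example.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "small.npy")

# Create a .npy-backed memmap and fill it in place.
x = open_memmap(path, mode="w+", dtype=np.float32, shape=(3, 4))
x[:] = np.arange(12, dtype=np.float32).reshape(3, 4)
x.flush()
del x

# Reopen read-only; shape and dtype are taken from the .npy header.
y = open_memmap(path, mode="r")
print(y.shape, y.dtype)
```

Because the result is an ordinary .npy file, np.load(path, mmap_mode="r") should work equally well for the reopening step.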

Credit for discovering open_memmap should go to kiyo's previous answer here.
