Coderx7 - 3 years ago

Python Question

How can I gradually free memory from a numpy array?

I'm in a situation where I'm constantly hitting my memory limit (I have 20G of RAM). Somehow I managed to get the huge array into memory and carry on with my processing. Now the data needs to be saved to disk, and I need to save it in leveldb format.

This is the code snippet responsible for saving the normalized data onto the disk:

print 'Outputting training data'

leveldb_file = dir_des + 'svhn_train_leveldb_normalized'
batch_size = size_train

# create the leveldb file
db = leveldb.LevelDB(leveldb_file)
batch = leveldb.WriteBatch()
datum = caffe_pb2.Datum()

for i in range(size_train):
    if i % 1000 == 0:
        print i

    # save in datum
    datum = caffe.io.array_to_datum(data_train[i], label_train[i])
    keystr = '{:0>5d}'.format(i)
    batch.Put(keystr, datum.SerializeToString())

    # write batch
    if (i + 1) % batch_size == 0:
        db.Write(batch, sync=True)
        batch = leveldb.WriteBatch()
        print (i + 1)

# write last batch
if (i + 1) % batch_size != 0:
    db.Write(batch, sync=True)
    print 'last batch'
    print (i + 1)


Now, my problem is that I hit my memory limit almost at the very end (495k out of the 604k items that need to be saved) while writing to disk.

To get around this, my first thought was to release the corresponding memory from the numpy array (data_train) after each batch is written, since leveldb appears to write the data transactionally: nothing is flushed to disk until all of it has been written!

My second thought is to somehow make the write non-transactional, so that each time a batch is written with db.Write, its contents are actually saved to disk.
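A minimal sketch of that second idea, assuming the py-leveldb binding I'm using exposes LevelDB.Put (I haven't verified this against every version), would be to skip the WriteBatch entirely and hand each record to LevelDB as soon as it is serialized:

# sketch: write each record directly instead of batching
# (assumes py-leveldb's LevelDB.Put(key, value) API)
db = leveldb.LevelDB(leveldb_file)

for i in range(size_train):
    if i % 1000 == 0:
        print i

    datum = caffe.io.array_to_datum(data_train[i], label_train[i])
    keystr = '{:0>5d}'.format(i)
    # each Put is an individual write; nothing accumulates in a WriteBatch
    db.Put(keystr, datum.SerializeToString())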

I don't know if any of these ideas are applicable.

Answer Source

Try reducing batch_size to something smaller than the entire dataset, for example 100000, so that the batch is written out and cleared periodically instead of only once at the end.
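A rough sketch of what that looks like with the question's variables (assuming batch_size = 100000 fits comfortably in the available RAM; this is illustrative, not tested):

batch_size = 100000  # flush every 100k records instead of only once at the end

db = leveldb.LevelDB(leveldb_file)
batch = leveldb.WriteBatch()

for i in range(size_train):
    datum = caffe.io.array_to_datum(data_train[i], label_train[i])
    batch.Put('{:0>5d}'.format(i), datum.SerializeToString())

    if (i + 1) % batch_size == 0:
        db.Write(batch, sync=True)
        # replacing the batch drops the serialized copies it was holding,
        # so memory stays bounded by roughly one batch at a time
        batch = leveldb.WriteBatch()

# write whatever is left in the final, partial batch
if size_train % batch_size != 0:
    db.Write(batch, sync=True)

The data_train array itself still has to fit in memory, but the extra serialized copies held by the WriteBatch no longer pile up until the very end.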

Converted to Community Wiki from @ren's comment.
