I'm in a situation where I'm constantly hitting my memory limit (I have 20G of RAM). Somehow I managed to get the huge array into memory and carry on with my processing. Now the data needs to be saved to disk, and I need to save it in leveldb:
import leveldb
import caffe
from caffe.proto import caffe_pb2

print 'Outputting training data'
leveldb_file = dir_des + 'svhn_train_leveldb_normalized'
batch_size = size_train

# create the leveldb file
db = leveldb.LevelDB(leveldb_file)
batch = leveldb.WriteBatch()
datum = caffe_pb2.Datum()

for i in range(size_train):
    if i % 1000 == 0:
        print i
    # save in datum
    datum = caffe.io.array_to_datum(data_train[i], label_train[i])
    keystr = '{:0>5d}'.format(i)
    batch.Put(keystr, datum.SerializeToString())
    # write batch
    if (i + 1) % batch_size == 0:
        db.Write(batch, sync=True)
        batch = leveldb.WriteBatch()
        print (i + 1)

# write last batch
if (i + 1) % batch_size != 0:
    db.Write(batch, sync=True)
    print 'last batch'
    print (i + 1)
The memory blow-up happens at the db.Write: with batch_size = size_train, the WriteBatch accumulates a serialized copy of the entire dataset in memory before the single db.Write at the end flushes it. Try reducing batch_size to something smaller than the entire dataset, for example, 100000, so the batch is flushed and reset periodically.
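As a minimal sketch of the adjusted loop (using the same data_train, label_train, size_train, and dir_des variables from the question; 100000 is just the example value from the comment):

import leveldb
import caffe

batch_size = 100000  # flush every 100k samples instead of holding all of them

db = leveldb.LevelDB(dir_des + 'svhn_train_leveldb_normalized')
batch = leveldb.WriteBatch()

for i in range(size_train):
    # array_to_datum expects a plain int label, so cast defensively
    datum = caffe.io.array_to_datum(data_train[i], int(label_train[i]))
    batch.Put('{:0>5d}'.format(i), datum.SerializeToString())
    if (i + 1) % batch_size == 0:
        db.Write(batch, sync=True)
        # start a fresh batch so the flushed entries can be freed
        batch = leveldb.WriteBatch()

# flush whatever remains in the last partial batch
if size_train % batch_size != 0:
    db.Write(batch, sync=True)

This keeps peak memory bounded by one batch of serialized datums rather than the whole dataset.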
Converted to Community Wiki from @ren's comment.