Silas - 6 months ago
Python Question

Pickle multiple files in a directory

So I wanted to figure out a way to read multiple text files in a directory and pickle them together into a single data.pkl file.

So far I have tried the following:

Code:

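import pickle

# Read one file's raw bytes and pickle them (as a one-element list) to data.pkl
with open('/home/mustafa/data/raw.en/raw.en', 'rb') as file1:
    obj = [file1.read()]
with open('data.pkl', 'wb') as out:
    pickle.dump(obj, out, protocol=4)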


There are about two dozen text files, named englishText_1, englishText2, and so on.

Answer

How you plan to use the data should dictate how you save each file. If you don't need the file names, it is enough to iterate over the files in the directory, append each file's contents to a list, and dump that list to a pickle file. If you need to keep file names, attributes, etc., then I would recommend creating a class to hold that information, e.g.:

class FileData(object):
    def __init__(self, path):
        self.path = path
        with open(path, "rb") as fileobj:
            self.data = fileobj.read()
        # add whatever other attributes you want to save here

Then add the FileData instances to a list (or another container class) and dump that to a file:

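import os
import pickle

folder_path = "path/to/text/files"  # placeholder: directory holding the text files
pkl_path = "data.pkl"

file_list = []
for name in os.listdir(folder_path):
    path = os.path.join(folder_path, name)
    if not os.path.isfile(path):
        continue  # skip subdirectories and other non-regular files
    file_list.append(FileData(path))

# Pickle the whole list of FileData objects into a single file
with open(pkl_path, "wb") as fileobj:
    pickle.dump(file_list, fileobj)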
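For the simpler case where the file names aren't needed, a minimal sketch might look like the following. It assumes the files all share the englishText prefix from the question and selects them with glob; pickle_contents and load_contents are hypothetical helper names, not part of any library. It also shows reading the list back with pickle.load.

```python
import glob
import os
import pickle


def pickle_contents(folder_path, pkl_path, pattern="englishText*"):
    """Read every regular file in folder_path whose name matches pattern
    (hypothetical default taken from the question's file names) and pickle
    the list of their raw contents (bytes) to pkl_path."""
    # glob returns paths in arbitrary order, so sort them for a stable result
    paths = sorted(glob.glob(os.path.join(folder_path, pattern)))
    contents = []
    for path in paths:
        if not os.path.isfile(path):
            continue  # skip directories that happen to match the pattern
        with open(path, "rb") as fileobj:
            contents.append(fileobj.read())
    with open(pkl_path, "wb") as fileobj:
        pickle.dump(contents, fileobj)
    return contents


def load_contents(pkl_path):
    """Load the list of bytes objects written by pickle_contents."""
    with open(pkl_path, "rb") as fileobj:
        return pickle.load(fileobj)
```

The sort is lexicographic, so a file named englishText10 would come before englishText2; if the order or the names matter, the FileData approach is the better fit. When you go that route, keep in mind that pickle stores classes by reference, so FileData must be importable under the same module path when you later call pickle.load.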