Silenus - 4 months ago

How to access lists saved as files with Python?

I am doing some natural language processing with Python (2.7.9) and NLTK (3.2.1). The way I am currently doing things, every time I run my program I do part-of-speech tagging on a large corpus.

The resulting tagged corpus looks like a larger version of this:

[('a', 'DT'), ('better', 'JJR'), ('widower', 'JJR'), ('than', 'IN'),
('my', 'PRP$'), ('father', 'NN'), ('.', '.'), ('Aunt', 'NNP'),
('Sybil', 'NNP'), ('had', 'VBD'), ('pink-rimmed', 'JJ'), ('azure',
'JJ'), ('eyes', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('waxen', 'JJ'),
('complexion', 'NN'), ('.', '.'), ('She', 'PRP'), ('wrote', 'VBD'),
('poetry', 'NN'), ('.', '.'), ('She', 'PRP'), ('was', 'VBD'),
('poetically', 'RB'), ('superstitious', 'JJ')]

Ideally, I would just save this list to a file and then read the file into a variable every time I run my program. Saving the list to a file is very easy:

from nltk import pos_tag

POScorpus = pos_tag(words)

#I convert this to a string so I can write it to a file.
POScorpus_string = str(POScorpus)

#I then write it to a file.
f = open(r'C:\Desktop\POScorpus.txt', 'w')
f.write(POScorpus_string)
f.close()

The problem is that when I read the file back into a variable, I get its
contents as a single string, not as a list.

My question is simple: How can I read the file as a list rather than as a string? I imagine this is relatively simple, but I could not find any information about how to do it.

(Apologies if this is off-topic or a dupe.)


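A string can be turned back into a list with the eval() function. That said, eval() executes whatever Python expression the file contains, so it is unsafe on files you do not fully control, and it is not an efficient or memory-friendly way to load a large corpus.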
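If the list has already been written out with str(), a safer way to parse it than eval() is ast.literal_eval, which accepts only Python literals (strings, numbers, tuples, lists, dicts and so on) and will not run arbitrary code. A minimal sketch, assuming the file holds exactly the text that str() produced; the sample list and the temporary file path are made up for illustration:

```python
import ast
import os
import tempfile

# Hypothetical sample standing in for the real tagged corpus.
tagged = [('She', 'PRP'), ('wrote', 'VBD'), ('poetry', 'NN'), ('.', '.')]

# Write the list as text, the same way the question does (str(), then write).
path = os.path.join(tempfile.mkdtemp(), 'POScorpus.txt')
with open(path, 'w') as f:
    f.write(str(tagged))

# Read the text back and parse it as a Python literal.
with open(path, 'r') as f:
    restored = ast.literal_eval(f.read())

print(restored == tagged)   # True: a list of tuples again, not a string
```

For a very large corpus this will probably still be slower than pickling, since the whole file has to be parsed as source text.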

A better option is to use Python's pickle or cPickle module. "Pickling" refers to the process of saving a Python object (for example, a list or dictionary) as a byte stream that can later be loaded straight back into a variable with its object type intact. Pickling is also known as "serialization" or "marshalling".

Here is an example:


#Pickling involves saving a Python object as a file (without first converting
#it to a string).

#Let's pickle TaggedCorpus so we can use it efficiently later:

import cPickle                                 #imports fast pickle module (written in C)

f = open(r'C:\Desktop\TaggedCorpus.p', 'wb')   #creates pickle file f (binary mode)
cPickle.dump(TaggedCorpus, f)                  #dumps data of TaggedCorpus object to f
f.close()                                      #flushes and closes the file

#To unpickle the object, simply load the file into a variable:

f = open(r'C:\Desktop\TaggedCorpus.p', 'rb')   #opens the pickle file for reading (binary mode)
TaggedCorpus = cPickle.load(f)                 #loads the content of f as TaggedCorpus
f.close()                                      #closes the file
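
The same round trip can also be written with "with" blocks, which close the file even if an error occurs, and with the highest pickle protocol, which is usually more compact than the Python 2 default. This is a sketch under assumptions: the sample list and the temporary path are illustrative, and the try/except import is there only so the snippet also runs on Python 3, where cPickle no longer exists as a separate module.

```python
import os
import tempfile

try:
    import cPickle as pickle   # Python 2: fast C implementation
except ImportError:
    import pickle              # Python 3: the C accelerator is used automatically

# Hypothetical sample standing in for the real tagged corpus.
TaggedCorpus = [('Aunt', 'NNP'), ('Sybil', 'NNP'), ('had', 'VBD')]

# Illustrative location; substitute the real path to the pickle file.
path = os.path.join(tempfile.mkdtemp(), 'TaggedCorpus.p')

# Pickle files hold bytes, so open them in binary mode ('wb' / 'rb').
with open(path, 'wb') as f:
    pickle.dump(TaggedCorpus, f, pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == TaggedCorpus)   # True: the same list of tuples comes back
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.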