MACEE - 1 year ago
Python Question

How to split a text file to its words in python?

I am very new to Python and haven't worked with text before. I have 100 text files, each with around 100 to 150 lines of unstructured text describing a patient's condition. I read one file in Python using:

with open("C:\\...\\...\\...\\record-13.txt") as f:
    content = f.readlines()
print(content)

Now I can split each line of the file into words using, for example:

a = content[0].split()
print(a)
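A side note on the delimiter: `split()` called with no argument splits on any run of whitespace (spaces, tabs, newlines) and discards empty strings, whereas `split(' ')` splits on every single space. The sample line below is made up for illustration.

```python
# Hypothetical sample line: two spaces, a tab and a trailing newline
line = "Patient  reports\tmild pain\n"

# No argument: split on any run of whitespace, drop empty strings
by_whitespace = line.split()

# Explicit " ": split on each single space, so the double space leaves
# an empty string and the tab/newline stay glued to the words
by_single_space = line.split(" ")

print(by_whitespace)    # ['Patient', 'reports', 'mild', 'pain']
print(by_single_space)  # ['Patient', '', 'reports\tmild', 'pain\n']
```

For free text, plain `split()` is usually the safer choice.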

But I don't know how to split the whole file into words. Do loops (while or for) help with that?
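Loops are one option, but not the only one. Because `split()` with no argument treats newlines as ordinary whitespace, a minimal sketch is to read the whole file into one string with `read()` and split it once. Here `io.StringIO` stands in for an opened file, and its contents are hypothetical.

```python
import io

# io.StringIO stands in for an opened text file (hypothetical contents);
# with a real file you would write: with open(path) as f: ...
f = io.StringIO("Patient is stable.\nNo fever today.\n")

# read() returns the whole file as one string; split() treats the
# newlines as whitespace, so a single call yields every word in the file
all_words = f.read().split()
print(all_words)  # ['Patient', 'is', 'stable.', 'No', 'fever', 'today.']
```

Punctuation stays attached to the words (`stable.`), and this approach holds the whole file in memory, which is fine for files of a few hundred lines.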

Thank you for your help, guys. Your answers helped me write this (in my file, words are separated by spaces, so I think that's the delimiter!):

with open("C:\\...\\...\\...\\record-13.txt") as f:
    lines = f.readlines()
    for line in lines:
        words = line.split()
        for word in words:
            print(word)

That splits each line into words and prints them, one word per line.
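To go from one file to all 100, one option is to loop over the files in a folder and keep each file's words in a list instead of printing them. This is a sketch under assumptions: the function name `words_per_file`, its folder argument and the `record-*.txt` pattern (guessed from `record-13.txt`) are hypothetical, and the demo writes two throwaway files to a temporary folder.

```python
import tempfile
from pathlib import Path

def words_per_file(folder, pattern="record-*.txt"):
    """Map each matching file name to the list of its words.

    Hypothetical helper: the folder and the default pattern are
    assumptions, adjust them to your own file layout.
    """
    result = {}
    # sorted() gives a stable, predictable order of files
    for path in sorted(Path(folder).glob(pattern)):
        with path.open() as f:
            words = []
            for line in f:                  # iterate lines without readlines()
                words.extend(line.split())  # append this line's words
        result[path.name] = words
    return result

# Demo with two made-up files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "record-1.txt").write_text("Patient is stable.\n")
    Path(tmp, "record-2.txt").write_text("No fever\ntoday.\n")
    demo = words_per_file(tmp)

print(demo)
```

Keeping the words per file (rather than one flat list) preserves which patient record each word came from.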

Answer

I'm surprised nobody has suggested a generator. Here's how I would do it:

def words(stringIterable):
    # iter() turns any iterable (a list, a file, ...) into an iterator;
    # if the argument is already an iterator, it is returned unchanged
    lineStream = iter(stringIterable)
    for line in lineStream:  # go through the lines one at a time
        for word in line.split():  # break each line into whitespace-separated words
            yield word
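The same lazy flattening can also be had from the standard library: `itertools.chain.from_iterable` concatenates an iterable of iterables without building the full list first. This is an alternative technique to the generator above, not the answerer's code, and `words_chained` is a hypothetical name.

```python
from itertools import chain

def words_chained(lines):
    # The inner generator expression splits one line at a time;
    # chain.from_iterable yields their words lazily, in order
    return chain.from_iterable(line.split() for line in lines)

print(list(words_chained(["hi there", "how are you"])))
# ['hi', 'there', 'how', 'are', 'you']
```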

This can be used on simple lists of sentences that you might already have in memory:

listOfLines = ['hi there', 'how are you']
for word in words(listOfLines):
    print(word)
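Because `words` is a generator, no line is split until a word is actually requested. The sketch below repeats the generator so it runs on its own, then shows two ways of consuming it: `list()` to collect every word at once, and `next()` to pull words one at a time.

```python
def words(stringIterable):
    # Same idea as the generator above: yield each whitespace-separated word
    for line in stringIterable:
        for word in line.split():
            yield word

listOfLines = ['hi there', 'how are you']

# list() drains the generator into one list of all words
allWords = list(words(listOfLines))
print(allWords)   # ['hi', 'there', 'how', 'are', 'you']

# next() pulls one word at a time; later lines are not split yet
gen = words(listOfLines)
print(next(gen))  # hi
print(next(gen))  # there
```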

But it works just as well on a file, without needing to read the whole file into memory:

with open('', 'r') as myself:
    for word in words(myself):
        print(word)
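To see the file case run end to end, the sketch below writes a small file to a temporary folder and counts its words. The file name `record-demo.txt` and its contents are hypothetical. Because the file object hands out lines only as the generator asks for them, the whole file is never loaded into memory at once.

```python
import tempfile
from pathlib import Path

def words(stringIterable):
    # A file object yields one line at a time, just like a list of strings
    for line in stringIterable:
        for word in line.split():
            yield word

# Hypothetical demo file in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "record-demo.txt")
    path.write_text("Patient is stable.\nNo fever today.\n")
    with open(path, "r") as myfile:
        # sum() consumes the generator word by word without building a list
        word_count = sum(1 for _ in words(myfile))

print(word_count)  # 6
```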