I'm looking to count the number of words per sentence, calculate the mean words per sentence, and put that info into a CSV file. Here's what I have so far. I probably just need to know how to count the number of words before a period. I might be able to figure it out from there.
#Read the data in the text file as a string
with open("PrideAndPrejudice.txt") as pride_file:
    pnp = pride_file.read()
#Change '!' and '?' to '.'
for ch in ['!','?']:
    if ch in pnp:
        pnp = pnp.replace(ch,".")
#Remove period after Dr., Mr., Mrs. (choosing not to include etc. as that often ends a sentence, although it can also be in the middle)
pnp = pnp.replace("Dr.","Dr")
pnp = pnp.replace("Mr.","Mr")
pnp = pnp.replace("Mrs.","Mrs")
To split a string into a list of strings on some character:
pnp = pnp.split('.')
Then we can split each of those sentences into a list of strings (words):
pnp = [sentence.split() for sentence in pnp]
Then we get the number of words in each sentence:
pnp = [len(sentence) for sentence in pnp]
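Here is a small worked example of those three steps on a made-up string. The sample text and the function name `words_per_sentence` are only illustrative and are not from the question. One thing to watch for: splitting on '.' leaves an empty string after the final period, which would be counted as a zero-word sentence, so this sketch drops empty entries.

```python
def words_per_sentence(text):
    """Return the number of words in each non-empty sentence of text."""
    sentences = text.split('.')
    # split() with no arguments splits on any run of whitespace
    word_lists = [sentence.split() for sentence in sentences]
    # Skip empty pieces, such as the one after the final period
    return [len(words) for words in word_lists if words]

# Hypothetical sample text with the periods after titles already removed
sample = "It is a truth universally acknowledged. Mr Bennet made no answer."
counts = words_per_sentence(sample)
print(counts)  # [6, 5]
```

Whether to drop the empty entries is a judgment call. Leaving them in would pull the mean down, which is probably not what you want.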
Then we can use statistics.mean to calculate the mean. To use statistics, you must put import statistics at the top of your file. If you don't recognize the ways I'm reassigning pnp, look up list comprehensions.
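For the mean and the CSV part of the question, something like the following might work. It is only a sketch: the function name `write_sentence_stats`, the output file name, and the column layout are my assumptions, and the list passed in stands in for the per-sentence word counts (the final value of pnp above).

```python
import csv
import statistics

def write_sentence_stats(counts, csv_path):
    """Write one row per sentence plus a final mean row; return the mean."""
    # statistics.mean raises StatisticsError if counts is empty
    mean_words = statistics.mean(counts)
    # newline="" is what the csv module docs recommend when opening the file
    with open(csv_path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["sentence", "words"])
        for number, count in enumerate(counts, start=1):
            writer.writerow([number, count])
        writer.writerow(["mean", mean_words])
    return mean_words

# Hypothetical counts; in the real script pass the list of word counts
mean_words = write_sentence_stats([6, 5, 10], "sentence_lengths.csv")
print(mean_words)
```

Putting the mean in the last row keeps everything in one file. If you would rather have only numeric rows, write the mean to a separate file or just print it.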