student3000 - 18 days ago
Python Question

How to assign corpus to myfile?

I am very new to Python and need some help with this, please. I'm sure this is a very simple and quick thing to do, but I've been stuck for hours and can't figure it out.

My assignment is to write a line of code that reads the corpus from the file corpus.txt and stores it as a variable. The variable will contain a Python list, such that each element of the list represents a letter. The line that you write will involve calling the function get_corpus_from_file(myfile) that has already been defined for you.

I have been given a skeleton code like this :

#!/usr/bin/env python

import re


def tidy_text(text):
    # Lowercase the text, then strip non-alphanumeric characters and underscores.
    text = text.lower()
    text = re.sub(r'[\W_]', '', text)
    return text


def get_corpus_from_file(myfile):
    # The with block closes the file automatically, even though we return below.
    with open(myfile) as f:
        print "Reading corpus from file:", myfile
        corpus = f.read()
    tidied_corpus = tidy_text(corpus)
    corpus = list(tidied_corpus)  # split text into list of characters
    print "The corpus consists of a sequence of", len(corpus), "letters"
    return corpus


What I can't figure out is how to get Python to recognise myfile. In other words, how do I tell the function that myfile should be the corpus.txt file I want it to read from?

Any advice would be greatly appreciated.

When the program runs, the output should look like this:

$ python word_segmentation.py
Reading corpus from file: corpus.txt
The corpus consists of a sequence of 639 letters
What do you want to do?
s (segment input to find the best word boundary)
a (add new input to the corpus)
d (damage the corpus)
q (quit)
> s
type a 3-letter sequence for segmentation: thb
Here is the proposed word boundary given the training corpus:
Proposed end of one word: t h
Proposed beginning of new word: b
What do you want to do?
s (segment input to find the best word boundary)
a (add new input to the corpus)
d (damage the corpus)
q (quit)
> q
Writing corpus to file: corpus.txt
Good bye

Answer

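myfile is a parameter of the function, so it is assigned when the function is called. You have to call the function with the appropriate file name as its argument, and assign the return value to a variable if you want to keep the corpus.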

You can append the following lines to the end of your script to type the file name when the script runs:

filename = raw_input("File to open: ")
corpus = get_corpus_from_file(filename)
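If the file name is fixed, as in the assignment, you can also pass it directly instead of asking for it. Below is a minimal sketch of that idea. It assumes a trimmed-down copy of the two skeleton functions (without the status messages, so it runs under both Python 2 and Python 3), and it writes a small hypothetical sample file to a temporary directory so that it runs on its own. In your script the call would simply use "corpus.txt".

```python
import os
import re
import tempfile


def tidy_text(text):
    # Lowercase, then drop everything that is not a letter or digit.
    text = text.lower()
    return re.sub(r'[\W_]', '', text)


def get_corpus_from_file(myfile):
    # myfile is only a parameter name; it takes whatever value the caller passes.
    with open(myfile) as f:
        corpus = f.read()
    return list(tidy_text(corpus))


# Hypothetical sample file so the sketch is self-contained;
# in the assignment you would pass "corpus.txt" directly.
sample_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
with open(sample_path, "w") as f:
    f.write("The cat.")

# The one line the assignment asks for: call the function and keep the result.
corpus = get_corpus_from_file(sample_path)
print(corpus)
```

The name myfile only exists inside the function. Whatever you put between the parentheses in the call is what it refers to while the function runs, and the list it returns is only kept if you assign it to a variable.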