Reira VR - 17 days ago
Python Question

How to write multiple txt files in Python?

I am preprocessing tweets in Python. My unprocessed tweets are in a folder, one tweet per file, named 1.txt, 2.txt, ..., 10000.txt. I want to preprocess them and write them to new files that are also named 1.txt, 2.txt, ..., 10000.txt.
My code is as follows:

for filename in glob.glob(os.path.join(path, '*.txt')):
    with open(filename) as file:
        tweet=file.read()
    def processTweet(tweet):
        tweet = tweet.lower()
        tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))','URL',tweet)
        tweet = re.sub('@[^\s]+','USER',tweet)
        tweet = re.sub('[\s]+', ' ', tweet)
        tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
        tweet = tweet.translate(None, string.punctuation)
        tweet = tweet.strip('\'"')
        return tweet

    fp = open(filename)
    line = fp.readline()

    count = 0
    processedTweet = processTweet(line)
    line = fp.readline()
    count += 1
    name = str(count) + ".txt"
    file = open(name, "w")
    file.write(processedTweet)
    file.close()


But this code only gives me one new file, 1.txt, which contains a preprocessed tweet. How can I write the other 9999 files? Is there a mistake in my code?

Answer

Your count is reset to 0 on every pass through the loop, because the assignment count = 0 sits inside it. Every time the code is about to write a file, count is therefore 1, so it writes "1.txt" again. Why reconstruct the filename at all, instead of using the existing filename of the tweet you are preprocessing? Also, you should move the function definition outside the loop:

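import glob
import os
import re
import string

# Defined once, outside the loop.
def processTweet(tweet):
    tweet = tweet.lower()
    # Replace links and @mentions with placeholders.
    tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
    tweet = re.sub(r'@[^\s]+', 'USER', tweet)
    # Collapse runs of whitespace, then drop the '#' from hashtags.
    tweet = re.sub(r'[\s]+', ' ', tweet)
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    # Python 3 form of the Python 2 call tweet.translate(None, string.punctuation).
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))
    tweet = tweet.strip('\'"')
    return tweet

# path is the folder that holds the tweets, as in the question.
for filename in glob.glob(os.path.join(path, '*.txt')):
    with open(filename) as file:
        tweet = file.read()

    processedTweet = processTweet(tweet)

    # Reuse the existing name; this overwrites the original file.
    with open(filename, "w") as file:
        file.write(processedTweet)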
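
The loop above writes each result back to the file it was read from, so the raw tweets are replaced. If the originals should be kept, one option is to write the new files to a second folder under the same names. The following is only a sketch of that idea, assuming Python 3 and UTF-8 text; process_tweet, preprocess_folder, in_dir, out_dir and the raw/clean folder names are made up for this example and do not come from the question.

```python
import re
import string
import tempfile
from pathlib import Path

# Translation table that deletes all ASCII punctuation (Python 3 form).
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def process_tweet(tweet):
    """Apply the same steps as processTweet in the answer."""
    tweet = tweet.lower()
    tweet = re.sub(r'(www\.\S+)|(https?://\S+)', 'URL', tweet)
    tweet = re.sub(r'@\S+', 'USER', tweet)
    tweet = re.sub(r'\s+', ' ', tweet)
    tweet = re.sub(r'#(\S+)', r'\1', tweet)
    tweet = tweet.translate(_STRIP_PUNCT)
    return tweet.strip('\'"')


def preprocess_folder(in_dir, out_dir):
    """Write a processed copy of every .txt file in in_dir to out_dir.

    File names are kept, so in_dir/1.txt becomes out_dir/1.txt.
    Returns the list of file names that were written.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob('*.txt')):
        dst = out_dir / src.name
        text = src.read_text(encoding='utf-8')
        dst.write_text(process_tweet(text), encoding='utf-8')
        written.append(dst.name)
    return written


if __name__ == '__main__':
    # Demo on throwaway folders so that no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / 'raw'      # hypothetical input folder
        clean = Path(tmp) / 'clean'  # hypothetical output folder
        raw.mkdir()
        (raw / '1.txt').write_text(
            'Hello @Bob, see http://example.com #Python!', encoding='utf-8')
        print(preprocess_folder(raw, clean))
        print((clean / '1.txt').read_text(encoding='utf-8'))
```

No counter is needed here either: the output name is taken from the input path (src.name), so each tweet lands in a file with the same name as its source.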