two1234 - 4 months ago
Python Question

Decompressing a text file

I have already compressed my text; now I need to decompress it so that I can recreate the original text.

The compression code is:

import zlib, base64

text = raw_input("Enter a sentence: ")#Asks the user to input text
text = text.split()#Splits the sentence

uniquewords = [] #Creates an empty array
for word in text: #Loop to do the following
    if word not in uniquewords: #If the word is not in uniquewords
        uniquewords.append(word) #It adds the word to the empty array

positions = [uniquewords.index(word) for word in text] #Finds the positions of each uniqueword
positions2 = [x+1 for x in positions] #Adds 1 to each position
print ("The uniquewords and the positions of the words are: ") #Prints the uniquewords and positions
print uniquewords
print positions2

file = open('task3file.txt', 'w')
file.write('\n'.join(uniquewords))#Adds the uniquewords to the file
file.write('\n')
file.write('\n'.join([str(p) for p in positions2]))
file.close()

file = open('compressedtext.txt', 'w')

text = ', '.join(text)

compression = base64.b64encode(zlib.compress(text,9))

file.write('\n'.join(compression))

print compression

file.close()


My attempt at decompression is:

import zlib, base64

text = ('compressedtext.txt')

file = open('compressedtext.txt', 'r')

print ("In the file is: \n") + file.read()

text = ''.join(text)
data = zlib.decompress(base64.b64decode(text))

recreated = " ".join([uniquewords[word] for word in positions]) #Recreates the sentence

file.close() #Closes the file

print ("The sentences recreated: \n") + recreated


But when I run the decompression and try to recreate the original text, I get this error message:

  File "C:\Python27\lib\base64.py", line 77, in b64decode
    raise TypeError(msg)
TypeError: Incorrect padding

Does anyone know how to fix this error?

Answer

There are a few things going on here. Let me start by giving you a working sample:

import zlib, base64

rawtext = raw_input("Enter a sentence: ")  # Asks the user to input text
text = rawtext.split()  # Splits the sentence

uniquewords = []  # Creates an empty array
for word in text:  # Loop to do the following
    if word not in uniquewords:  # If the word is not in uniquewords
        uniquewords.append(word)  # It adds the word to the empty array

positions = [uniquewords.index(word) for word in text]  # Finds the positions of each uniqueword
positions2 = [x+1 for x in positions]  # Adds 1 to each position
print ("The uniquewords and the positions of the words are: ")  # Prints the uniquewords and positions
print uniquewords
print positions2

outfile = open('task3file.txt', 'w')
outfile.write('\n'.join(uniquewords))  # Adds the uniquewords to the file
outfile.write('\n')
outfile.write('\n'.join([str(p) for p in positions2]))
outfile.close()

outfile = open('compressedtext.b2', 'w')

compression = base64.b64encode(zlib.compress(rawtext, 9))

outfile.write(compression)  # Writes the Base64 string as-is, with no added line breaks

print compression

outfile.close()

# Now read it again

infile = open('compressedtext.b2', 'r')
text = infile.read()
print("In the file is: " + text)
recreated = zlib.decompress(base64.b64decode(text))
infile.close()
print("The sentences recreated:\n" + recreated)

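I've tried to keep things pretty close to what you had, but note a few changes in particular:

  • I keep the raw input (rawtext) separate from the split word list (text) and compress rawtext itself, so the decompressed output matches what was typed.

  • I decode the contents of the file (the result of read()) rather than the file name. Your version passed the string 'compressedtext.txt' to base64.b64decode, and that is what raises "Incorrect padding".

  • I've removed the redefinition of zlib.

  • I write the Base64 string to the file as-is instead of '\n'.join(compression), which put a line break between every character.

  • I've done some general clean-up to better conform with normal Python conventions.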
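To see where "Incorrect padding" comes from, it can help to try the two suspects in isolation. The sketch below is illustrative only: the sample sentence and the helper name try_decode are my own, and it is written so that it should also run on Python 3 (where the data must be bytes). As far as I can tell, base64.b64decode with default settings skips characters outside the Base64 alphabet, newlines included, whereas decoding the file name fails because only 17 of its characters are valid Base64, which is not a multiple of 4.

```python
import base64
import binascii
import zlib

# Illustrative sample text (an assumption, not taken from the question).
original = b"the cat sat on the mat"
encoded = base64.b64encode(zlib.compress(original, 9))

# Mimic '\n'.join(compression): a newline between every Base64 character.
spaced = b"\n".join(encoded[i:i + 1] for i in range(len(encoded)))

# With default settings b64decode skips the newlines, so this round-trips.
roundtrip = zlib.decompress(base64.b64decode(spaced))


def try_decode(data):
    """Return the decoded bytes, or the name of the exception raised."""
    try:
        return base64.b64decode(data)
    except (binascii.Error, TypeError) as exc:
        # Python 3 raises binascii.Error; Python 2 raised TypeError.
        return type(exc).__name__


# Decoding the file *name* instead of the file *contents* is what fails.
filename_result = try_decode("compressedtext.txt")

print(roundtrip == original)
print(filename_result)
```

So the key fix is reading the file and decoding what read() returns; dropping the extra newlines mainly keeps the file smaller and easier to inspect.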

Hope this helps.
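One more thing about the last step of your attempt: uniquewords and positions only exist in the compression script, so a separate decompression script would have to read them back from task3file.txt. Here is a minimal sketch of that, assuming the layout your script writes (unique words one per line, then the 1-based positions one per line) and assuming no word is itself a run of digits, since this layout cannot tell such a word apart from a position. The function name rebuild_sentence and the sample data are hypothetical.

```python
def rebuild_sentence(file_contents):
    """Rebuild the sentence from the text stored in task3file.txt.

    Assumes: unique words first, then 1-based positions, all separated
    by whitespace, and no word made up purely of digits.
    """
    tokens = file_contents.split()

    # Walk back from the end to find where the numeric positions start.
    split_at = len(tokens)
    while split_at > 0 and tokens[split_at - 1].isdigit():
        split_at -= 1

    uniquewords = tokens[:split_at]
    positions = [int(p) for p in tokens[split_at:]]

    # Positions were saved 1-based (positions2), so subtract 1 to index.
    return " ".join(uniquewords[p - 1] for p in positions)


# Hypothetical file contents for the sentence "the cat sat on the mat".
sample = "\n".join(["the", "cat", "sat", "on", "mat",
                    "1", "2", "3", "4", "1", "5"])
print(rebuild_sentence(sample))
```

To use it on the real file, pass in the result of open('task3file.txt').read(). Note that your original line indexed uniquewords with the saved positions directly; because the file holds positions2 (each value plus 1), the lookup has to subtract 1 again.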
