Matt Rest Matt Rest - 4 months ago 29
Python Question

Decoding a Huffman code with a dictionary

I need to decode a Huffman code I coded with my program using a file containing the translation beetween ASCII and Huffman bits. I have already a dictionary in the progam from "codes" to ASCII like this one:

{'01110': '!', '01111': 'B', '10100': 'l', '10110': 'q', '10111': 'y'}


I created the function:

def huffmanDecode (dictionary, text) :


That needs the dictionary and the code. I have tried searching the text for key in the dictionary and using both the replace method form string and the sub from re but neither of them decodes the message properly.
for example if the code is:

011111011101110


It should be straightforward to decode it to:

By!


But I haven't been able to do it by iterating over the code and searching matches in the dictionary!

How can I decode the code using the keys and their values in the dictionary by finding the key inside the text and replacing it by its value?

Any help is greatly appreciated.

Answer

Not sure what you tried, but re.sub or replace probably did not work because they do not necessarily replace from the beginning of the string. You have to see what code the string starts with, then replace that code, and proceed with the rest of the string.

For example, like this:

def huffmanDecode (dictionary, text):
    res = ""
    while text:
        for k in dictionary:
            if text.startswith(k):
                res += dictionary[k]
                text = text[len(k):]
    return res

Or recursively:

def huffmanDecode (dictionary, text):
    if text:
        k = next(k for k in dictionary if text.startswith(k))
        return dictionary[k] + huffmanDecode(dictionary, text[len(k):])
    return ""

You could also make a regex from your codes and use re.match to find the next one:

import re
def huffmanDecode (dictionary, text):
    p = "|".join(dictionary) # join keys to regex
    res = ""
    while text:
        m = re.match(p, text)
        res += dictionary[m.group()]
        text = text[len(m.group()):]
    return res

Note: Neither of those have proper error handling and will fail or loop forever if the codes do not match the message, but this should get you started.

Comments