I'm implementing Huffman encoding in c++ and I can successfully build a Huffman tree and can encode/decode strings.
The next thing I want to do is be able to encode/decode files, but I have a few problems.
I'm using bool vectors to contain the code words. My problem is: I can only write bytes to a file. How do I write bit by bit? Is there perhaps a library I can use?
The other thing is, that if I want to decode a file I need the the tree itself (or the code table). What's the best way to serialize the tree?
Any help would be much appreciated.
Too bad the internal format of a C++ bool vector is undefined, since it very likely to already be packed bits.
Anyway, you would use the
& operators to pack bits into bytes on the encoding side, as well as to unpack the bits on the decoding side. Assuming that you know that a byte is made up of eight bits, then this is trivial to do.
As for transmitting a Huffman code, read about canonical Huffman codes. You do not need to send the code, just the code length in bits for each symbol. For more efficiency, the sequence of lengths can itself be compressed, with run-length and Huffman coding. See the Deflate format for an example.