Text compression through variable-length coding

Technologies: Java, Maven, JUnit, javadoc

HuffmanRevisited is a command-line app that compresses text files into a proprietary .huf format and decompresses them again.

It uses Huffman coding, a well-known variable-length coding technique invented by David A. Huffman. The algorithm starts with one single-leaf subtree for each unique character in the string being compressed, weighted by how often that character occurs. The subtrees are kept sorted by accumulated frequency, and the two lowest are repeatedly gathered into a new subtree whose frequency is the sum of theirs, until only one tree remains. Characters that occur often end up closer to the root and so receive shorter codes.
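The merging step described above can be sketched in Java as follows. This is a minimal illustration and not the HuffmanRevisited source: the class name HuffmanSketch, the Node type, and the methods buildTree and buildCodes are hypothetical names chosen for this example, and a java.util.PriorityQueue stands in for the sorted list of subtrees.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; names and structure are assumptions, not the project's code.
class HuffmanSketch {

    /** A subtree: a leaf holds one character, an internal node holds two children. */
    static final class Node {
        final char symbol;    // meaningful only for leaves
        final int frequency;  // accumulated frequency of every leaf in this subtree
        final Node left;
        final Node right;

        Node(char symbol, int frequency, Node left, Node right) {
            this.symbol = symbol;
            this.frequency = frequency;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Builds the Huffman tree for the given text; returns null for empty input. */
    static Node buildTree(String text) {
        // Count how often each character occurs.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }

        // The priority queue always yields the subtree with the lowest frequency.
        PriorityQueue<Node> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a.frequency, b.frequency));
        counts.forEach((c, n) -> queue.add(new Node(c, n, null, null)));

        // Repeatedly gather the two lowest subtrees into a new one.
        while (queue.size() > 1) {
            Node lowest = queue.poll();
            Node secondLowest = queue.poll();
            queue.add(new Node('\0', lowest.frequency + secondLowest.frequency,
                    lowest, secondLowest));
        }
        return queue.poll();
    }

    /** Walks the tree and assigns a bit string to each character (0 = left, 1 = right). */
    static Map<Character, String> buildCodes(Node root) {
        Map<Character, String> codes = new HashMap<>();
        if (root == null) {
            return codes;
        }
        if (root.isLeaf()) {
            codes.put(root.symbol, "0"); // single-character input still needs one bit
            return codes;
        }
        assign(root, "", codes);
        return codes;
    }

    private static void assign(Node node, String prefix, Map<Character, String> codes) {
        if (node.isLeaf()) {
            codes.put(node.symbol, prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }

    public static void main(String[] args) {
        Map<Character, String> codes = buildCodes(buildTree("abracadabra"));
        codes.forEach((c, code) -> System.out.println(c + " -> " + code));
    }
}
```

The exact bit patterns depend on how ties between equal frequencies are broken, so they may differ between implementations, but the total encoded length does not. For "abracadabra" the sketch needs 23 bits, compared with 88 bits at 8 bits per character.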

As I discuss in the blog post linked below, I enjoyed working on the bitwise transformations, the overall design and, in particular, setting up tests with JUnit.

I plan to use javadoc and write more tests to turn this into a fully documented and specced piece of software.

More info