Huffman compression software

Project Overview

Project Workflow

First, I implemented the Huffman compression logic, splitting compression and decompression into two separate classes. Then I used Swing, Java's graphical user interface toolkit, to build the main GUI class.

Huffman compression principle

Huffman coding represents each character as a string of 0s and 1s, so every code can be stored bit by bit, using the smallest unit of storage a computer has. The main idea is to place frequently occurring characters close to the root of the tree and rarely occurring characters far from it. The more often a character appears, the shorter its code, so storage space is allocated more sensibly and the file is compressed.

The tree is built from the bottom up. Following the principle above, repeatedly take the two nodes with the smallest weights and make them the children of a new parent node whose weight is their sum, building the Huffman tree layer by layer. To find the two minimum-weight nodes each time, I store all nodes in a minimum heap, so each minimum can be extracted in O(log N) time. Once the tree is built, with the convention that a left branch means 0 and a right branch means 1, the path from the root to each leaf gives that character's Huffman code.
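The tree construction and code assignment described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the author's actual classes. The names `HuffmanTreeSketch`, `Node`, `build` and `collectCodes` are hypothetical. The min-heap is `java.util.PriorityQueue` ordered by node weight. Frequencies are assumed to come in an `int[256]` indexed by byte value.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanTreeSketch {
    // Leaves carry a byte value (0..255); internal nodes use -1 and only a weight.
    static final class Node {
        final int symbol;
        final long weight;     // frequency, or sum of the children's weights
        final Node left, right;

        Node(int symbol, long weight, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    // Builds the tree bottom-up by repeatedly merging the two lightest nodes.
    static Node build(int[] freq) {
        // PriorityQueue is heap-based: poll() returns the smallest weight in O(log N).
        PriorityQueue<Node> heap = new PriorityQueue<>((x, y) -> Long.compare(x.weight, y.weight));
        for (int s = 0; s < freq.length; s++) {
            if (freq[s] > 0) heap.add(new Node(s, freq[s], null, null));
        }
        if (heap.isEmpty()) return null;
        while (heap.size() > 1) {
            Node a = heap.poll();   // smallest weight
            Node b = heap.poll();   // second smallest weight
            heap.add(new Node(-1, a.weight + b.weight, a, b));
        }
        return heap.poll();
    }

    // Walks the tree: a left edge appends '0', a right edge appends '1'.
    static void collectCodes(Node node, String prefix, Map<Integer, String> codes) {
        if (node == null) return;
        if (node.isLeaf()) {
            // With only one distinct symbol the path is empty, so give it "0".
            codes.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        collectCodes(node.left, prefix + "0", codes);
        collectCodes(node.right, prefix + "1", codes);
    }

    public static void main(String[] args) {
        int[] freq = new int[256];
        for (byte b : "abracadabra".getBytes(StandardCharsets.US_ASCII)) {
            freq[b & 0xFF]++;
        }
        Map<Integer, String> codes = new HashMap<>();
        collectCodes(build(freq), "", codes);
        codes.forEach((s, c) -> System.out.println((char) (int) s + " -> " + c));
    }
}
```

When several nodes share the same weight, the individual codes depend on how the heap breaks the tie. The total encoded length is the same for every valid Huffman tree, though.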

In summary

  • Compression process: read the file, count how often each byte (character) occurs, build the Huffman tree from these frequencies, derive each character's code, then write the header information and the encoded content to the compressed file;
  • Decompression process: read the file, extract the header information, rebuild the Huffman tree, derive each character's code, build a HashMap from code to character for fast lookup, translate the encoded bits back into characters, and write the decompressed file.
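The decompression step of looking codes up in a HashMap can be sketched as follows. This is a minimal illustration, not the author's code. The names `HuffmanDecodeSketch`, `invert` and `decode` are hypothetical. The encoded data is assumed to be a `String` of '0'/'1' characters with the padding bits already removed. The code table in `main` is made up for the example. Because Huffman codes are prefix-free, the first time the growing prefix matches a code, that match is the correct character.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class HuffmanDecodeSketch {
    // Inverts a symbol -> code table into code -> symbol for lookup while decoding.
    static Map<String, Integer> invert(Map<Integer, String> codes) {
        Map<String, Integer> byCode = new HashMap<>();
        codes.forEach((symbol, code) -> byCode.put(code, symbol));
        return byCode;
    }

    // Grows the current prefix one bit at a time until it matches a code.
    static byte[] decode(String bits, Map<Integer, String> codes) {
        Map<String, Integer> byCode = invert(codes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            current.append(bits.charAt(i));
            Integer symbol = byCode.get(current.toString());
            if (symbol != null) {
                out.write(symbol);      // writes the low 8 bits as one byte
                current.setLength(0);   // start matching the next code
            }
        }
        if (current.length() > 0) {
            throw new IllegalArgumentException("trailing bits do not form a complete code: " + current);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical prefix-free code table for a three-symbol alphabet.
        Map<Integer, String> codes = new HashMap<>();
        codes.put((int) 'a', "0");
        codes.put((int) 'b', "10");
        codes.put((int) 'c', "11");
        byte[] decoded = decode("0100110", codes);
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints abaca
    }
}
```

Walking the Huffman tree bit by bit would avoid building intermediate strings. The HashMap version is shown here because it matches the lookup approach described in the list above.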

Since my target is text compression, I treat the whole text as a collection of characters. A byte can take only 256 distinct values, so an array of that size, indexed by character, is enough to record how many times each character appears.
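A minimal sketch of the counting step, assuming the file is read into memory as raw bytes. The class name `ByteFrequencySketch` is hypothetical. One Java detail matters here: `byte` is signed (-128..127), so each value is masked with `0xFF` before it is used as an array index.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ByteFrequencySketch {
    // Counts how often each of the 256 possible byte values occurs.
    static int[] countFrequencies(byte[] data) {
        int[] freq = new int[256];
        for (byte b : data) {
            // Java bytes are signed; masking with 0xFF maps them to 0..255.
            freq[b & 0xFF]++;
        }
        return freq;
    }

    public static void main(String[] args) throws IOException {
        // Reads the file named on the command line, or falls back to a small sample string.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "hello".getBytes(StandardCharsets.UTF_8);
        int[] freq = countFrequencies(data);
        for (int v = 0; v < freq.length; v++) {
            if (freq[v] > 0) {
                System.out.println(v + ": " + freq[v]);
            }
        }
    }
}
```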

Since the smallest unit Java can read or write is a byte, the code bits have to be packed into bytes with bit manipulation. To make decoding convenient later, the compressed file also stores the necessary extra information: the total number of distinct (valid) characters, each valid character together with its frequency (from which the Huffman tree and codes can be rebuilt), and the number of redundant padding bits (because the encoded bits may not fill a whole number of bytes). My compressed file layout is as follows:

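Total number of valid characters (byte) => valid character (byte) + frequency (int, 4 bytes), repeated for each valid character => number of redundant bits (byte) => encoded original text (bytes)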
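One way to write and read this layout is with `DataOutputStream` and `DataInputStream`, sketched below. This is an illustration of the layout above, not the author's implementation. The class and method names (`HeaderSketch`, `write`, `read`, `Parsed`) are hypothetical. The whole file is assumed to fit in memory. `writeInt` stores the frequency as 4 big-endian bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class HeaderSketch {
    // Parsed view of a compressed file.
    static final class Parsed {
        final int[] freq;        // frequency of each byte value 0..255
        final int paddingBits;   // zero bits appended to the last payload byte
        final byte[] payload;    // packed Huffman-coded bits

        Parsed(int[] freq, int paddingBits, byte[] payload) {
            this.freq = freq;
            this.paddingBits = paddingBits;
            this.payload = payload;
        }
    }

    // Layout: [count:byte] ([symbol:byte][freq:int]) x count [paddingBits:byte] [payload...]
    static byte[] write(int[] freq, int paddingBits, byte[] payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            int count = 0;
            for (int f : freq) {
                if (f > 0) count++;
            }
            // Caveat: one byte holds 0..255, so a file using all 256 values would store 0 here.
            out.writeByte(count);
            for (int s = 0; s < freq.length; s++) {
                if (freq[s] > 0) {
                    out.writeByte(s);
                    out.writeInt(freq[s]);   // 4 bytes, big-endian
                }
            }
            out.writeByte(paddingBits);
            out.write(payload);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Parsed read(byte[] file) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            int count = in.readUnsignedByte();
            int[] freq = new int[256];
            for (int i = 0; i < count; i++) {
                int symbol = in.readUnsignedByte();
                freq[symbol] = in.readInt();
            }
            int paddingBits = in.readUnsignedByte();
            // Everything after the header is the encoded payload.
            ByteArrayOutputStream rest = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                rest.write(buf, 0, n);
            }
            return new Parsed(freq, paddingBits, rest.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] freq = new int[256];
        freq['a'] = 5;
        freq['b'] = 2;
        byte[] file = write(freq, 3, new byte[] {(byte) 0xA5, 0x10});
        Parsed p = read(file);
        System.out.println(file.length + " bytes; a=" + p.freq['a'] + ", padding=" + p.paddingBits);
    }
}
```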

Considerations

  • A character's Huffman code may be longer than 8 bits. Doesn't that waste space?

Yes, some characters do get codes longer than 8 bits. By the nature of Huffman coding, however, these are characters that occur only rarely. The frequent characters get codes much shorter than 8 bits, so this non-uniform, frequency-based allocation of code lengths saves far more space than the long codes cost.
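To put rough numbers on this, here is a hypothetical worked example. The frequencies are chosen for illustration and were not measured in this project. Take ten characters with Fibonacci-like counts 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, which is 143 characters in total. Huffman's algorithm assigns them code lengths 9, 9, 8, 7, 6, 5, 4, 3, 2, 1, so the two rarest characters really do get 9-bit codes. With total size $\sum_i f_i \ell_i$ (frequency times code length):

```latex
\[
\sum_i f_i\,\ell_i = 1\cdot 9 + 1\cdot 9 + 2\cdot 8 + 3\cdot 7 + 5\cdot 6 + 8\cdot 5 + 13\cdot 4 + 21\cdot 3 + 34\cdot 2 + 55\cdot 1 = 363 \text{ bits}
\]
\[
\text{versus } 143 \times 8 = 1144 \text{ bits at a fixed 8 bits per character.}
\]
```

The two 9-bit codes cost only 2 extra bits in total, while the short codes of the frequent characters save hundreds.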

Project Features

...... None ...... After working on it for so long, I realize this is the key point.

Possible improvements

Currently the program only compresses text files effectively. Pictures and similar files can be compressed and then decompressed back to the original, but the compressed file does not come out smaller, so the purpose of compression is not achieved.

Problems encountered in the project

  • When the decompressed text did not match expectations, tracking down the error was very troublesome

My approach was to print debugging information at every place where an error could occur. I then designed a small sample text myself, worked through the process by hand, and ran the program again to check it against my simulation. After that I tested with a longer article, again mainly by printing information during execution to track down problems.

  • After the text is converted into codes, the number of bits may not be a multiple of 8, so the bits cannot be stored directly as whole bytes.

Pad the unused positions at the end of the last byte with zeros, and use one byte to record how many extra bits were added.
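The packing and zero-padding step can be sketched like this. It is a minimal illustration, assuming the encoded text is first built as a string of '0'/'1' characters and that bits are written most-significant first. The class and method names (`BitPackingSketch`, `pack`, `unpack`, `Packed`) are hypothetical. `paddingBits` is the value that would go into the one-byte padding field.

```java
import java.io.ByteArrayOutputStream;

public class BitPackingSketch {
    // Packed bytes plus the number of zero bits appended at the end (0..7).
    static final class Packed {
        final byte[] bytes;
        final int paddingBits;

        Packed(byte[] bytes, int paddingBits) {
            this.bytes = bytes;
            this.paddingBits = paddingBits;
        }
    }

    // Packs '0'/'1' characters into bytes, most significant bit first.
    static Packed pack(CharSequence bits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int current = 0;   // bits collected for the byte being built
        int filled = 0;    // how many bits of `current` are in use
        for (int i = 0; i < bits.length(); i++) {
            current = (current << 1) | (bits.charAt(i) == '1' ? 1 : 0);
            if (++filled == 8) {
                out.write(current);
                current = 0;
                filled = 0;
            }
        }
        int padding = 0;
        if (filled > 0) {
            padding = 8 - filled;
            out.write(current << padding);   // shift left so the low bits are zero padding
        }
        return new Packed(out.toByteArray(), padding);
    }

    // Inverse: expands bytes back into a bit string and drops the padding bits.
    static String unpack(byte[] bytes, int paddingBits) {
        StringBuilder sb = new StringBuilder(bytes.length * 8);
        for (byte b : bytes) {
            for (int bit = 7; bit >= 0; bit--) {
                sb.append(((b >> bit) & 1) == 1 ? '1' : '0');
            }
        }
        sb.setLength(sb.length() - paddingBits);
        return sb.toString();
    }

    public static void main(String[] args) {
        Packed p = pack("1011001");   // 7 bits -> 1 byte with 1 padding bit
        System.out.println(p.bytes.length + " byte(s), padding = " + p.paddingBits);
        System.out.println(unpack(p.bytes, p.paddingBits));   // prints 1011001
    }
}
```

Building the whole bit string in memory keeps the sketch simple. For large files, writing each byte as soon as 8 bits are collected, as the loop in `pack` already does, avoids holding the full string.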


Source: www.cnblogs.com/GorgeousBankarian/p/12543615.html