Detailed explanation of common general compression formats

1. Common compressed files

*.zip | archives compressed by zip program; (very common, but not suitable for cross-platform use because it does not contain encoding format information)
*.gz | archives compressed by gzip program; (the most widely used compression format in linux)
* .bz2 | files compressed by the bzip2 program;
*.xz | files compressed by the xz program; *.tar | data packaged by the tar
program, not compressed;
Compressed (most common)
*.tar.bz2 | tar-packed archives, which are bzip2-packed
*.tar.xz | tar-packed archives, which are xz-packed (next-generation compression option)
*. 7z | The 7zip program compresses packaged files.

2. Classification by whether to compress multiple files

  1. All three compression formats, gzip bzip2 xz, can only compress a single file. (In other words, the format only deals with streams and does not contain document tree information itself.)
    So if you want to use them to compress multiple files or directories, you need to use another software to combine the documents to be compressed into one file first, This command is tar.
    First use tar to archive the multiple files to be compressed, and then use the above compression instructions for the generated *.tar. This is how multi-file compression is implemented under linux.

  2. And 7z and zip, as well as rar format, both have two functions of archiving (tar) and compression, (that is, the format contains document tree information), which means that they can directly compress multiple files.

3. Differences in Algorithms Used in Each Format

  1. gzip is a mature format that uses an algorithm based on DEFLATE.
  2. 7z is a new generation format, the compression algorithm used can be replaced, the default is the lzma/lzma2 algorithm used, and AES-256 is used as the encryption algorithm.
  3. xz also uses the lzma/lzma2 algorithm, but can only compress one file.
  4. zip is also a compression format that supports multiple algorithms, and the default should be the DEFLATE algorithm used. Born earlier, there are many flaws. (Cross-platform garbled characters, easy to be cracked, etc.)
  5. rar uses a proprietary algorithm like DEFLATE, encrypted with AES. (use AES-256CBC after rar5.0)

However, zip is widely used in Android apk format, java jar format, etc. The reason is not found.

Expansion: Among the lossless compression algorithms, in addition to the commonly used general algorithms such as lzma/lzma2/deflate, some special algorithms have emerged recently, which are widely used on the web.
In the fields of audio, video, and pictures, lossy compression algorithms are usually used.

4. How to choose a compression scheme

  1. gzip is most common on linux and has a good balance of compression ratio and compression time. If in doubt, use it, you can't go wrong.
  2. xz is a new generation of compression format, although it has a better compression ratio, the compression/decompression speed is relatively many times slower. Generally, it can be used when the computer performance is good enough.
  3. tar generally needs to compress more than one document, so tar is usually used to complete the compression (adjust the format through the parameters of tar)
  4. 7z and xz are both new-generation compression formats, which are more complex and support multi-file compression. And more suitable for cross-platform, recommended.
  5. zip is not recommended because it is easy to cause garbled file names across platforms.
  6. The performance of rar is not bad, but it is a commercial format, not open source, and it is not recommended to use it.
  7. bz2 is a product of the transition period in the history of linux compression, and its performance is also between gz and xz. Generally speaking, it is not necessary to consider it.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=324921222&siteId=291194637