InnoDB page compression technology

Ⅰ. An error worth remembering

1.1 Create table error

(root@localhost) [(none)]> create tablespace ger_space add datafile 'ger_space.ibd' file_block_size=8192;
Query OK, 0 rows affected (0.02 sec)

(root@localhost) [(none)]> create database test;
Query OK, 1 row affected (0.01 sec)

(root@localhost) [(none)]> use test;
Database changed
(root@localhost) [test]> create table test_ger(a int) tablespace=ger_space;
ERROR 1478 (HY000): InnoDB: Tablespace `ger_space` uses block size 8192 and cannot contain a table with physical page size 16384

1.2 Solution

(root@localhost) [test]> create table test_ger(a int) tablespace=ger_space row_format=compressed, key_block_size=8;
Query OK, 0 rows affected (0.14 sec)

amazing!!!(~﹃~)~zZ

A page-compression technique is at work here. This chapter discusses page compression in InnoDB.

Ⅱ. Traditional compression method (since 5.5)

2.1 Principle: the buddy algorithm

  • Compressed pages (row_format=compressed) are stored on disk. Assume key_block_size=8K while the Buffer Pool page size is 16K
  • Apply for a free page from the Free List. If there is none, take one from the LRU List; if the LRU is full, find its last page, and if that page is dirty, flush it first. Either way we end up with a blank 16K page
  • The 16K blank page holds the 8K compressed page, leaving 8K of spare space, which is moved onto the 8K Free List
  • If a 4K compressed page arrives next, it takes the block on the 8K Free List, and the leftover 4K is moved onto the 4K Free List
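
The splitting above can be sketched as a toy buddy allocator (a simplified illustration with invented names, not InnoDB's actual code):

```python
# Simplified sketch of the buddy splitting described above -- illustrative
# only; all class and variable names are made up.

class BuddyAllocator:
    def __init__(self, page_size=16384):
        self.page_size = page_size
        # per-size free lists for compressed-page blocks
        self.free_lists = {1024: [], 2048: [], 4096: [], 8192: []}
        self.blank_pages = 0   # 16K pages obtained from the Free/LRU list

    def alloc(self, size):
        """Allocate a block for a compressed page of `size` bytes."""
        if self.free_lists.get(size):
            return self.free_lists[size].pop()       # reuse a spare buddy
        # take the smallest larger free block, else a fresh 16K blank page
        block = self.page_size
        for s in sorted(self.free_lists):
            if s > size and self.free_lists[s]:
                block = s
                self.free_lists[s].pop()
                break
        else:
            self.blank_pages += 1
        # split down to the requested size, keeping each leftover buddy
        while block > size:
            block //= 2
            self.free_lists[block].append(f"buddy-{block}")
        return f"block-{size}"

alloc = BuddyAllocator()
alloc.alloc(8192)   # splits a fresh 16K page, leaving one spare 8K buddy
alloc.alloc(4096)   # consumes the spare 8K, leaving one spare 4K buddy
```

After these two calls only one 16K blank page has been taken from the Free/LRU list, exactly as the steps above describe.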

Important points:

  • This compression is page-based; each table can use a different page size. The compression algorithm is LZ77 (via zlib)
  • When a user reads data and the compressed page is not in the InnoDB buffer pool, it is loaded from disk, and a new uncompressed 16K page is allocated in the buffer pool to hold the decompressed copy. To save both disk I/O and repeated decompression (faster queries), the compressed and uncompressed versions coexist in the buffer pool
  • To make room for other needed pages, the uncompressed copy is evicted first while the compressed page stays in memory. If the uncompressed copy has not been accessed for a while, it is flushed out, so the buffer pool may hold both versions of a page or only the compressed one
  • The compressed page is kept around so that, on update, change records can be appended to its free space. To flush it back to disk, the compressed page can be written directly. When its free space fills up, a reorganize is performed (which first requires decompressing it); only when the page is truly full is it split
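
The last point — appending change records to the compressed page's free space and reorganizing only when it fills — can be sketched as a toy model (invented sizes and names, not InnoDB internals):

```python
# Toy model of the "append changes to the compressed page's free space"
# idea above. Sizes are invented for illustration.

class CompressedPage:
    def __init__(self, block_size=8192, compressed_len=5000):
        self.block_size = block_size
        self.compressed_len = compressed_len  # bytes after compression
        self.mlog_bytes = 0                   # change records in free space
        self.reorganizes = 0

    @property
    def free(self):
        return self.block_size - self.compressed_len - self.mlog_bytes

    def apply_change(self, rec_bytes):
        if rec_bytes <= self.free:
            self.mlog_bytes += rec_bytes      # cheap path: no decompression
            return "logged"
        # free space exhausted: decompress, merge the log, recompress
        self.reorganizes += 1
        self.compressed_len += self.mlog_bytes // 2  # assume merge compresses well
        self.mlog_bytes = 0
        if rec_bytes <= self.free:
            self.mlog_bytes += rec_bytes
            return "reorganized"
        return "split"                        # truly full: page must split

page = CompressedPage()
results = [page.apply_change(1000) for _ in range(4)]
# the first three changes are merely logged; the fourth forces a reorganize
```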

Shortcomings:

  • Compressed pages take up Buffer Pool space; for hot data this effectively shrinks memory and may degrade performance (the hot set gets smaller)
  • Therefore, after enabling compression, the Buffer Pool should be enlarged accordingly
  • If the disk I/O saved by compression outweighs the performance lost to the reduced effective Buffer Pool space, overall performance still improves, so make the Buffer Pool as large as possible

2.2 Hands-on

Create directly:
(root@localhost) [test]> create table comps_test(a int) row_format=compressed, key_block_size=4;
Query OK, 0 rows affected (0.04 sec)

Enable compression on an existing table, with a 4K page size:
alter table xxxxx
engine=innodb
row_format=compressed,key_block_size=4

key_block_size can be set to 1, 2, 4, 8 or 16.
Things to know:
If row_format=compressed is specified, key_block_size may be omitted; the default is half the InnoDB page size, i.e. 8K.
If key_block_size is specified, row_format=compressed may be omitted; compression is enabled automatically.
A value of 0 means the default compressed page size, half the InnoDB page size.
key_block_size must be less than or equal to innodb_page_size. If a larger value is given, MySQL ignores it and raises a warning, and key_block_size falls back to half the InnoDB page size.
With innodb_strict_mode=ON, an invalid key_block_size returns an error instead.
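
The rules above can be summarized as a small resolution function (my own sketch of the documented behavior, not server code):

```python
def resolve_key_block_size(key_block_size=None, row_format=None,
                           innodb_page_size=16384, strict_mode=False):
    """Resolve the effective compressed page size per the rules above.

    Sketch of documented MySQL behavior, not actual server code.
    Returns (effective_size_bytes_or_None, warnings).
    """
    warnings = []
    half_page = innodb_page_size // 2
    if key_block_size is None or key_block_size == 0:
        # ROW_FORMAT=COMPRESSED alone (or KEY_BLOCK_SIZE=0) defaults
        # to half the InnoDB page size, i.e. 8K for a 16K page
        if row_format == "compressed":
            return half_page, warnings
        return None, warnings          # no compression requested
    size = key_block_size * 1024
    if size > innodb_page_size:
        if strict_mode:
            raise ValueError("invalid KEY_BLOCK_SIZE")  # error under strict mode
        warnings.append("KEY_BLOCK_SIZE ignored; using half of page size")
        return half_page, warnings
    # a valid KEY_BLOCK_SIZE implies compression even without ROW_FORMAT
    return size, warnings

effective, _ = resolve_key_block_size(key_block_size=4)   # 4096 bytes
```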

tips:

Although row_format=compressed appears in the SQL syntax, compression operates on pages, not records: pages are decompressed on read and compressed on write; individual rows are never compressed or decompressed on their own

2.3 key_block_size in detail

  • The options for key_block_size are 1K, 2K, 4K, 8K and 16K (a page size, not a ratio)
  • It does not mean the data of an innodb_page_size page is squeezed into key_block_size: some data may not compress at all, or may not compress that far
  • Compression shrinks the original page's data to whatever size the algorithm achieves, and pages of key_block_size are then used to store the result
  • For example, with innodb_page_size=16K and key_block_size=8K, a table holding 24K of data originally used two 16K pages. If the 24K compresses to 18K, three 8K pages are needed to store it, and the leftover space is kept for subsequent inserts or updates
  • The compression ratio has nothing to do with key_block_size; it depends only on the data itself and the algorithm. key_block_size merely sets the page size used to store compressed data.
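
The arithmetic of the 24K example checks out:

```python
import math

def pages_needed(data_kb, stored_page_kb):
    """Number of pages of `stored_page_kb` KB needed to hold `data_kb` KB."""
    return math.ceil(data_kb / stored_page_kb)

before = pages_needed(24, 16)   # uncompressed: two 16K pages
after = pages_needed(18, 8)     # 24K compressed to 18K, stored in 8K pages: three
```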

tips:

Data can be modified without decompressing, by storing the change record directly in the page's remaining space; only once that space is full is the page decompressed, the logged changes applied, and the result recompressed and stored (the log is then empty) — all to reduce the number of compress/decompress cycles

2.4 Important Notes

A smaller key_block_size does not mean a higher compression ratio; it only changes the page size used for storage

When compressing a 16K page to 8K, InnoDB first checks whether the result fits: if it compresses to 8K or less it is stored as one 8K page, but if the result is larger, say 12K, it is stored as two 8K pages

Compressing 16K to 8K succeeds about 80%–90% of the time, but success cannot be guaranteed when compressing further

2.5 View compressed pages in the buffer pool

(root@localhost) [(none)]> SELECT
    ->     table_name,
    ->     space,
    ->     page_number,
    ->     index_name,
    ->     compressed,
    ->     compressed_size
    -> FROM
    ->     information_schema.innodb_buffer_page_lru
    -> WHERE
    ->     compressed = 'yes'
    -> LIMIT 5;
+------------+-------+-------------+------------+------------+-----------------+
| table_name | space | page_number | index_name | compressed | compressed_size |
+------------+-------+-------------+------------+------------+-----------------+
| NULL       |   168 |         502 | NULL       | YES        |            4096 |
| NULL       |   168 |         505 | NULL       | YES        |            4096 |
| NULL       |   168 |         508 | NULL       | YES        |            4096 |
| NULL       |   168 |         510 | NULL       | YES        |            4096 |
| NULL       |   168 |         513 | NULL       | YES        |            4096 |
+------------+-------+-------------+------------+------------+-----------------+
5 rows in set (0.00 sec)

(root@localhost) [(none)]> SELECT
    ->     table_id, name, space, row_format, zip_page_size
    -> FROM
    ->     information_schema.INNODB_SYS_TABLES
    -> WHERE
    ->     space = 168;
+----------+----------------+-------+------------+---------------+
| table_id | name           | space | row_format | zip_page_size |
+----------+----------------+-------+------------+---------------+
|      173 | sbtest/sbtest1 |   168 | Compressed |          4096 |
+----------+----------------+-------+------------+---------------+
1 row in set (0.00 sec)

(root@localhost) [(none)]> show create table sbtest.sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=latin1 MAX_ROWS=1000000 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
1 row in set (0.00 sec)

Checked from every angle — it really is a compressed table

Now look at how compressed pages sit in the buffer pool:

(root@localhost) [(none)]> show engine innodb status\G
...
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137428992
Dictionary memory allocated 127372
Buffer pool size   8191
Free buffers       7759
Database pages     632
Old database pages 253
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 597, created 35, written 42
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 632, unzip_LRU len: 4          ← the compressed-page list in the buffer pool has length 4
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
...

2.6 Check the compression ratio

To check the compression ratio, look at information_schema.innodb_cmp.
The numbers in this table are cumulative and global, so they cannot be attributed to a single table. Before querying it, query its companion table to reset it:
select * from information_schema.innodb_cmp_reset;
This returns the current contents of innodb_cmp and clears it; results not shown here.

Let's play:
(root@localhost) [emp]> create table emp_comp like emp;
Query OK, 0 rows affected (0.26 sec)

(root@localhost) [emp]> alter table emp_comp row_format=compressed,key_block_size=4;
Query OK, 0 rows affected (0.23 sec)
Records: 0  Duplicates: 0  Warnings: 0

(root@localhost) [emp]> show create table emp_comp\G
*************************** 1. row ***************************
       Table: emp_comp
Create Table: CREATE TABLE `emp_comp` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`),
  KEY `ix_firstname` (`first_name`),
  KEY `ix_3` (`emp_no`,`first_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
1 row in set (0.04 sec)

(root@localhost) [emp]> insert into emp_comp select * from emp;
Query OK, 300024 rows affected (23.13 sec)
Records: 300024  Duplicates: 0  Warnings: 0

Now check the compression ratio:
(root@localhost) [emp]> select * from information_schema.innodb_cmp;
+-----------+--------------+-----------------+---------------+----------------+-----------------+
| page_size | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
|      1024 |            0 |               0 |             0 |              0 |               0 |
|      2048 |            0 |               0 |             0 |              0 |               0 |
|      4096 |        34296 |           27184 |             9 |           7743 |               0 |
|      8192 |            0 |               0 |             0 |              0 |               0 |
|     16384 |            0 |               0 |             0 |              0 |               0 |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
5 rows in set (0.69 sec)

(root@localhost) [emp]> select 27184/34296;    # compress_ops_ok/compress_ops
+-------------+
| 27184/34296 |
+-------------+
|      0.7926 |
+-------------+
1 row in set (0.11 sec)
The compression success rate is 79.26%

Check the physical storage:
[root@VM_0_5_centos emp]# ll -h *.ibd
-rw-r----- 1 mysql mysql 40M Feb 27 19:01 emp.ibd
-rw-r----- 1 mysql mysql 20M Feb 27 19:36 emp_comp.ibd
The tablespace is clearly much smaller, though not by 79.26% — worth taking some time to understand (that figure is an operation success rate, not a size ratio)

Enabling one more parameter exposes per-table compression statistics; it is off by default because it affects performance:

(root@localhost) [emp]> set global innodb_cmp_per_index_enabled=1;
Query OK, 0 rows affected (0.09 sec)

Repeat the test above, this time compressing to 2K; the steps are omitted, only the results shown:
(root@localhost) [emp]> select * from information_schema.innodb_cmp;
+-----------+--------------+-----------------+---------------+----------------+-----------------+
| page_size | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
|      1024 |            0 |               0 |             0 |              0 |               0 |
|      2048 |        68793 |           52455 |            12 |          18353 |               1 |
|      4096 |            0 |               0 |             0 |              0 |               0 |
|      8192 |            0 |               0 |             0 |              0 |               0 |
|     16384 |            0 |               0 |             0 |              0 |               0 |
+-----------+--------------+-----------------+---------------+----------------+-----------------+
5 rows in set (0.00 sec)

(root@localhost) [emp]> select * from information_schema.innodb_cmp_per_index;
+---------------+------------+--------------+--------------+-----------------+---------------+----------------+-----------------+
| database_name | table_name | index_name   | compress_ops | compress_ops_ok | compress_time | uncompress_ops | uncompress_time |
+---------------+------------+--------------+--------------+-----------------+---------------+----------------+-----------------+
| emp           | emp_comp   | PRIMARY      |        34676 |           23729 |             4 |          11053 |               0 |
| emp           | emp_comp   | ix_firstname |        20958 |           18349 |             5 |           4384 |               0 |
| emp           | emp_comp   | ix_3         |        13159 |           10377 |             2 |           2916 |               0 |
+---------------+------------+--------------+--------------+-----------------+---------------+----------------+-----------------+
3 rows in set (0.00 sec)

(root@localhost) [emp]> select 52455/68793;
+-------------+
| 52455/68793 |
+-------------+
|      0.7625 |
+-------------+
1 row in set (0.06 sec)

(root@localhost) [emp]> select (23729+18349+10377)/(34676+20958+13159);
+-----------------------------------------+
| (23729+18349+10377)/(34676+20958+13159) |
+-----------------------------------------+
|                                  0.7625 |
+-----------------------------------------+
1 row in set (0.00 sec)

This shows the compression statistics for the emp.emp_comp table directly (in InnoDB, the index is the data).
When there is only one compressed table, innodb_cmp equals the sum over innodb_cmp_per_index.

2.7 Compressed storage and performance

One question: is there any point in setting key_block_size equal to innodb_page_size when compressing?

Answer: yes, there is. key_block_size does not affect the compression itself (that depends only on the data and the algorithm); it only determines the page size used to store the compressed data. For varchar, text and similar types the compression effect is still significant.

What about compressed storage and performance?

(Embedding images on GitHub is a hassle; the benchmark chart is omitted — a similar one is easy to find and download.)

in conclusion:

  • With innodb_page_size=16K, even key_block_size=16 still compresses the data, and the effect is quite noticeable
  • A smaller key_block_size does not mean a higher compression rate; in the chart above, 8K and 4K compress about equally well
  • With compression enabled, insert performance at 16K and 8K beats the uncompressed baseline, so compression does not necessarily hurt performance
  • In I/O-bound business scenarios, reducing the number of I/O operations can significantly improve performance
  • The usual (empirical) setting for key_block_size is half of innodb_page_size

Ⅲ. Transparent page compression (since 5.7)

3.1 Try it first

(root@localhost) [test]> create table trans_test1(a int) compression='zlib';
Query OK, 0 rows affected, 1 warning (0.04 sec)

(root@localhost) [test]> create table trans_test2(a int) compression='lz4';
Query OK, 0 rows affected, 1 warning (0.06 sec)

(root@localhost) [test]> alter table trans_test1 compression='lz4';
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

ALTER TABLE returns quickly because nothing is actually rewritten: the new algorithm is only used the next time pages are compressed, and existing pages are not recompressed. To apply it immediately, run OPTIMIZE:

(root@localhost) [test]> optimize table trans_test1;
+------------------+----------+----------+-------------------------------------------------------------------+
| Table            | Op       | Msg_type | Msg_text                                                          |
+------------------+----------+----------+-------------------------------------------------------------------+
| test.trans_test1 | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.trans_test1 | optimize | status   | OK                                                                |
+------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set, 1 warning (0.23 sec)

(root@localhost) [test]> show warnings;
+---------+------+---------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                     |
+---------+------+---------------------------------------------------------------------------------------------+
| Warning |  138 | Punch hole is not supported by the file system. Compression disabled for 'test/trans_test1' |
+---------+------+---------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

The error log contains this line:
2017-04-22T19:20:14.168298+8:00 0 [Note] InnoDB: PUNCH HOLE support not available

In other words, this feature cannot be used in this 5.7 build: PUNCH HOLE support was not compiled into the binary package.

How to fix it?
Compile from source with PUNCH HOLE support, or use Percona Server.

In MySQL 8.0 it works perfectly — go try it, personally tested:
create table a ( a int ) compression='lz4';
Query OK, 0 rows affected (0.03 sec)

tips:
The 5.7.19 binary package now includes it — excellent.
Testing suggests ext3 does not support punch hole.

What is the difference between lz4 and zlib?

lz4 is faster, zlib compression ratio is higher

I usually choose lz4: it is faster and still compresses to about half, which is enough. Many databases on the Hadoop platform default to lz4.

Look at the filesystem-related attributes of the tablespaces:

(root@localhost) [(none)]> SELECT
    ->     space, page_size, fs_block_size, file_size, allocated_size
    -> FROM
    ->     information_schema.INNODB_SYS_TABLESPACES
    -> LIMIT 5;
+-------+-----------+---------------+-----------+----------------+
| space | page_size | fs_block_size | file_size | allocated_size |
+-------+-----------+---------------+-----------+----------------+
|     2 |     16384 |          4096 |     98304 |         102400 |
|     3 |     16384 |          4096 |     98304 |         102400 |
|     4 |     16384 |          4096 |   9437184 |        9453568 |
|     5 |     16384 |          4096 |    114688 |         118784 |
|     6 |     16384 |          4096 |    147456 |         151552 |
+-------+-----------+---------------+-----------+----------------+
5 rows in set (0.00 sec)

tablespace id    page size    filesystem block size    file size    actually allocated size

3.2 The principle (hole punching)

Careful readers will notice that nowhere in the whole process above is a page size specified

Compression here is achieved at the filesystem layer, by exploiting sparse files (filesystem holes)

1. File size and occupied space:

[root@VM_0_5_centos ~]# dd of=spare-file bs=1k seek=5120 count=0             # create a temporary file whose data is all zeros
0+0 records in
0+0 records out
0 bytes (0 B) copied, 4.1441e-05 s, 0.0 kB/s
[root@VM_0_5_centos ~]# ls -lh spare-file
-rw-r--r-- 1 root root 5.0M Feb 28 10:53 spare-file                          # file size: 5M
[root@VM_0_5_centos ~]# du --block-size=1 spare-file
0   spare-file                                                            # disk space used: 0

Runs of consecutive zeros in a file take up no disk space
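
The dd experiment can be reproduced from Python (assuming a filesystem with sparse-file support, such as ext4 or xfs):

```python
import os
import tempfile

# A file that is logically 5 MB but contains only a hole (all zeros).
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.truncate(5 * 1024 * 1024)   # extend the file without writing data

st = os.stat(path)
logical = st.st_size              # what `ls -lh` reports: 5 MB
physical = st.st_blocks * 512     # what `du` reports: ~0 for a pure hole
os.remove(path)
```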

2. Hole characteristics of the file system:

  • In a 16K page, the leading 4K holds data and the rest is zero-filled; the zero-filled 12K remains available for subsequent inserts, updates, etc.
  • From InnoDB's perspective the page is still 16K, but the filesystem knows only 4K of it needs storing (transparent to InnoDB)
  • The way pages are addressed by SpaceID and PageNumber is unchanged (the details are hidden by the filesystem):
    ```
    fopen(f, O_DIRECT | O_PUNCH_HOLE)
    fwrite(f, page)    /* on disk this page occupies only 4K */
    ```

3. The TPC process is as follows:

Before a page is written to disk it is compressed in memory; the compressed data comes first and the remainder is zero-filled, then fwrite is called. If punch hole is enabled, the compression is realized on disk. The on-disk size is aligned to the filesystem block size, 4K by default, so a 16K page can only occupy 4K, 8K, 12K or 16K; if it compresses to 6K, it occupies 8K
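
The alignment rule can be stated precisely (a sketch; the 4K default matches the fs_block_size shown by INNODB_SYS_TABLESPACES earlier):

```python
def on_disk_size(compressed_bytes, fs_block_size=4096):
    """Disk space a punched page occupies: the compressed size rounded up
    to a whole number of filesystem blocks."""
    blocks = -(-compressed_bytes // fs_block_size)   # ceiling division
    return blocks * fs_block_size

# A 16K page can therefore occupy only a multiple of 4K on disk:
six_k = on_disk_size(6 * 1024)     # compresses to 6K -> occupies 8K
full = on_disk_size(16 * 1024)     # incompressible -> occupies all 16K
```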

4. Comparison of new and old compression algorithms:

  • TPC simply calls the filesystem's punch-hole facility: compress and zero-fill before writing — concise and efficient. The old method requires key_block_size, and its data occupies two slots in the buffer pool, one compressed and one uncompressed; updating a page means updating both, which adds overhead and complexity, so its performance is uneven

  • With TPC, every page on disk is nominally 16K but may actually occupy only 4K, 8K or 12K; management still treats it as 16K

  • With TPC, a disk page corresponds to a single 16K page in the buffer pool: on first read, (space, page_no) is fetched from disk and decompressed, and the decompressed result is always 16K, so it occupies only one slot in the buffer pool

3.3 Performance related issues

Official test:

122G of data, on SSD running ext4

With the old algorithm, storage shrinks by 40% and QPS drops by 20% — still acceptable; what matters most is SQL performance.

With the new algorithm, storage shrinks slightly less than 40%, and QPS rises to 18,000, a 30% improvement. Each block read is smaller, saving I/O time; the CPU decompresses once on first read and compresses once on write, rather than compressing and decompressing every record.

Load test:

The old algorithm imports 50% slower; the new algorithm is close to uncompressed speed
