sysbench的lua小改动导致的性能差异

|  导语

最近在配合某同事做一项性能压测,发现相同数据量、相同数据库参数、相同sysbench压力、相同数据库版本和sysbench版本、相同服务器硬件环境下,我和同事的压测结果天差地别:一个小时压测结束后,我的压测结果中出现了高频率周期性阻塞(tps,qps为0),而同事的压测结果中未出现阻塞(tps,qps从头到尾都比较稳定)。正常情况下,在环境完全相同时,不可能会出现如此巨大的性能差异。但这次,不可能发生的事情它的确发生了。经过复测与排查,终于发现了其中的奥妙。

服务器硬件信息

  • 数据库主机 
    * CPU:72 process 
    * memory:128G 
    * 磁盘:某存储 100G 
    * 网卡:intel 万兆网卡

  • 数据库版本:MySQL 5.7.21

  • sysbench版本:1.0.9

  • sysbench主机 
    * CPU:20 process 
    * memory:128G 
    * 磁盘:本地SAS 50G 
    * 网卡:intel 万兆网卡


|  复测结果数据

先来看看压测结果数据,如下图所示(注:此处只截取了前几十秒的数据进行对比):

  • 我的压测结果(下图可见,明显的周期性tps,qps为0的情况) 


  • 同事的压测结果(下图可见,tps,qps较为稳定且并未出现阻塞情况) 



|  抓取等待事件

看到上文第1节中的结果,如果根据以往的经验、常识来快速判断...估计会懵圈!!当得知同事的测试结果很稳定时,我本能地想:要把等待事件信息拉出来瞧瞧!!下面是我与同事各自在复测时截取的等待事件信息(复测时间为3分钟)(不想仔细看等待事件内容的亲可直接跳至第3节)

operation操作时间统计(每秒查询一次,查询数十次截取时间最长的5次)

# 我的
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 1.20 s                           | idle      |
| 219.20 ms                        | write     |
+----------------------------------+-----------+
2 rows in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc
;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 1.16 s                           | idle      |
| 182.44 ms                        | write     |
+----------------------------------+-----------+
2 rows in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc
;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 1.17 s                           | idle      |
| 164.27 ms                        | write     |
+----------------------------------+-----------+
2 rows in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc
;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 1.21 s                           | idle      |
| 268.65 ms                        | write     |
+----------------------------------+-----------+
2 rows in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc
;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 11.78 s                          | fetch     |
| 1.61 s                           | idle      |
+----------------------------------+-----------+
2 rows in set (0.02 sec)
# 同事的
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc
;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 21.17 ms                         | idle      |
| 1.58 ms                          | fetch     |
+----------------------------------+-----------+
2 rows in set (0.02 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc
;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 185.44 ms                        | idle      |
+----------------------------------+-----------+
1 row in set (0.02 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc
;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 141.82 ms                        | lock      |
| 110.90 ms                        | fetch     |
| 3.30 ms                          | idle      |
+----------------------------------+-----------+
3 rows in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc
;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 2.91 s                           | idle      |
+----------------------------------+-----------+
1 row in set (0.00 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc
;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 1.20 s                           | idle      |
+----------------------------------+-----------+
1 row in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc
;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 215.34 ms                        | idle      |
+----------------------------------+-----------+
1 row in set (0.01 sec)

等待事件按照时间开销倒序排序,取出超过ms的部分(每秒查询一次,查询数十次截取时间最长的5次)

# 我的
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+----------------------------------------+-----------------+-----------------------------+-----------+
| event_name                             | source          | sys.format_time(timer_wait) | operation |
+----------------------------------------+-----------------+-----------------------------+-----------+
|
wait/io/file/innodb/innodb_data_file   | fil0fil.cc:5778 | 37.31 ms                    | write     |
| wait/io/file/innodb/innodb_data_file   | fil0fil.cc:5778 | 37.08 ms                    | write     |
|
wait/io/file/innodb/innodb_data_file   | fil0fil.cc:5778 | 23.95 ms                    | write     |
| wait/io/file/innodb/innodb_data_file   | fil0fil.cc:5778 | 23.06 ms                    | write     |
|
wait/io/file/innodb/innodb_data_file   | fil0fil.cc:5778 | 20.99 ms                    | write     |
......
+----------------------------------------+-----------------+-----------------------------+-----------+
50 rows in set (0.02 sec)
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+--------------------------------------+-----------------+-----------------------------+-----------+
| event_name                           | source          | sys.format_time(timer_wait) | operation |
+--------------------------------------+-----------------+-----------------------------+-----------+
|
wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 15.71 ms                    | write     |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 14.94 ms                    | write     |
|
wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 14.22 ms                    | write     |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 13.93 ms                    | write     |
|
wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 13.53 ms                    | write     |
......
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+--------------------------------------+-----------------+-----------------------------+-----------+
| event_name                           | source          | sys.format_time(timer_wait) | operation |
+--------------------------------------+-----------------+-----------------------------+-----------+
|
wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 41.05 ms                    | write     |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 34.67 ms                    | write     |
|
wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 31.63 ms                    | write     |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 31.30 ms                    | write     |
|
wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 28.09 ms                    | write     |
......
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+-------------------------------------+-----------------+-----------------------------+-----------+
| event_name                          | source          | sys.format_time(timer_wait) | operation |
+-------------------------------------+-----------------+-----------------------------+-----------+
|
wait/io/file/innodb/innodb_log_file | fil0fil.cc:5992 | 33.20 ms                    | sync      |
| wait/io/table/sql/handler           | handler.cc:3084 | 26.20 ms                    | fetch     |
|
wait/io/table/sql/handler           | handler.cc:3032 | 23.26 ms                    | fetch     |
| wait/io/table/sql/handler           | handler.cc:3032 | 21.26 ms                    | fetch     |
|
wait/io/table/sql/handler           | handler.cc:3032 | 19.07 ms                    | fetch     |
| wait/io/table/sql/handler           | handler.cc:3084 | 18.55 ms                    | fetch     |
......
|
wait/io/table/sql/handler           | handler.cc:7992 | 12.07 ms                    | insert    |
| wait/io/table/sql/handler           | handler.cc:3084 | 11.96 ms                    | fetch     |
|
wait/io/table/sql/handler           | handler.cc:3032 | 11.31 ms                    | fetch     |
| wait/io/table/sql/handler           | handler.cc:3084 | 11.04 ms                    | fetch     |
|
wait/io/table/sql/handler           | handler.cc:3084 | 10.69 ms                    | fetch     |
| wait/io/table/sql/handler           | handler.cc:7992 | 9.58 ms                     | insert    |
|
wait/io/table/sql/handler           | handler.cc:3032 | 9.55 ms                     | fetch     |
| wait/io/table/sql/handler           | handler.cc:3032 | 9.27 ms                     | fetch     |
|
wait/io/table/sql/handler           | handler.cc:7992 | 9.10 ms                     | insert    |
| wait/io/table/sql/handler           | handler.cc:3084 | 9.01 ms                     | fetch     |
|
wait/io/table/sql/handler           | handler.cc:3032 | 7.69 ms                     | fetch     |
......
| wait/io/table/sql/handler           | handler.cc:3084 | 5.70 ms                     | fetch     |
|
wait/io/table/sql/handler           | handler.cc:3032 | 5.69 ms                     | fetch     |
| wait/io/table/sql/handler           | handler.cc:3032 | 5.63 ms                     | fetch     |
|
wait/io/table/sql/handler           | handler.cc:3084 | 5.63 ms                     | fetch     |
| wait/io/table/sql/handler           | handler.cc:3084 | 5.57 ms                     | fetch     |
|
wait/io/table/sql/handler           | handler.cc:3084 | 5.53 ms                     | fetch     |
| wait/io/table/sql/handler           | handler.cc:7992 | 5.52 ms                     | insert    |
|
wait/io/table/sql/handler           | handler.cc:3084 | 5.51 ms                     | fetch     |
| wait/io/table/sql/handler           | handler.cc:3032 | 5.48 ms                     | fetch     |
|
wait/io/table/sql/handler           | handler.cc:3032 | 5.43 ms                     | fetch     |
+-------------------------------------+-----------------+-----------------------------+-----------+
50 rows in set (0.02 sec)
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+--------------------------------------+-----------------+-----------------------------+-----------+
| event_name                           | source          | sys.format_time(timer_wait) | operation |
+--------------------------------------+-----------------+-----------------------------+-----------+
|
wait/io/file/innodb/innodb_log_file  | fil0fil.cc:5992 | 33.20 ms                    | sync      |
| wait/io/table/sql/handler            | handler.cc:3032 | 19.07 ms                    | fetch     |
|
wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 18.71 ms                    | write     |
| wait/io/table/sql/handler            | handler.cc:3032 | 14.12 ms                    | fetch     |
|
wait/io/table/sql/handler            | handler.cc:3032 | 13.16 ms                    | fetch     |
| wait/io/table/sql/handler            | handler.cc:3032 | 13.08 ms                    | fetch     |
|
wait/io/table/sql/handler            | handler.cc:3032 | 12.56 ms                    | fetch     |
| wait/io/file/innodb/innodb_log_file  | fil0fil.cc:5992 | 12.23 ms                    | sync      |
|
wait/io/table/sql/handler            | handler.cc:3032 | 12.22 ms                    | fetch     |
| wait/io/table/sql/handler            | handler.cc:3032 | 11.31 ms                    | fetch     |
|
wait/io/table/sql/handler            | handler.cc:3084 | 11.04 ms                    | fetch     |
......
| wait/io/table/sql/handler            | handler.cc:3032 | 5.86 ms                     | fetch     |
|
wait/io/table/sql/handler            | handler.cc:3032 | 5.79 ms                     | fetch     |
| wait/io/table/sql/handler            | handler.cc:3032 | 5.78 ms                     | fetch     |
|
wait/io/table/sql/handler            | handler.cc:3084 | 5.74 ms                     | fetch     |
| wait/io/table/sql/handler            | handler.cc:3032 | 5.71 ms                     | fetch     |
|
wait/io/table/sql/handler            | handler.cc:3032 | 5.71 ms                     | fetch     |
......
| wait/io/table/sql/handler            | handler.cc:3032 | 3.88 ms                     | fetch     |
|
wait/io/table/sql/handler            | handler.cc:3084 | 3.88 ms                     | fetch     |
| wait/io/table/sql/handler            | handler.cc:3084 | 3.86 ms                     | fetch     |
|
wait/io/table/sql/handler            | handler.cc:3032 | 3.85 ms                     | fetch     |
| wait/io/table/sql/handler            | handler.cc:3084 | 3.81 ms                     | fetch     |
|
wait/io/table/sql/handler            | handler.cc:3084 | 3.81 ms                     | fetch     |
| wait/io/table/sql/handler            | handler.cc:3032 | 3.80 ms                     | fetch     |
+--------------------------------------+-----------------+-----------------------------+-----------+
50 rows in set (0.02 sec)
# 同事的
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+---------------------------------------------+-------------------+-----------------------------+-----------+
|
event_name                                  | source            | sys.format_time(timer_wait) | operation |
+---------------------------------------------+-------------------+-----------------------------+-----------+
| wait/io/file/innodb/innodb_log_file         | fil0fil.cc:5992   | 16.18 ms                    | sync      |
|
wait/io/file/innodb/innodb_log_file         | fil0fil.cc:5992   | 5.34 ms                     | sync      |
......
+---------------------------------------------+-------------------+-----------------------------+-----------+
50 rows in set (0.01 sec)
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+---------------------------------------+--------------------+-----------------------------+-----------+
| event_name                            | source             | sys.format_time(timer_wait) | operation |
+---------------------------------------+--------------------+-----------------------------+-----------+
|
wait/synch/mutex/innodb/trx_sys_mutex | ha_innodb.cc:15727 | 1.87 ms                     | lock      |
| wait/synch/mutex/innodb/trx_sys_mutex | ha_innodb.cc:15727 | 1.64 ms                     | lock      |
|
wait/io/table/sql/handler             | handler.cc:3084    | 1.63 ms                     | fetch     |
| wait/io/table/sql/handler             | handler.cc:3084    | 1.63 ms                     | fetch     |
|
wait/synch/mutex/innodb/trx_sys_mutex | read0read.cc:584   | 1.61 ms                     | lock      |
| wait/synch/mutex/innodb/trx_sys_mutex | read0read.cc:584   | 1.57 ms                     | lock      |
......
+---------------------------------------+--------------------+-----------------------------+-----------+
50 rows in set (0.02 sec)
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+------------------------------------------+--------------------+-----------------------------+-----------+
|
event_name                               | source             | sys.format_time(timer_wait) | operation |
+------------------------------------------+--------------------+-----------------------------+-----------+
| wait/synch/mutex/innodb/lock_mutex       | lock0lock.cc:6844  | 1.27 ms                     | lock      |
|
wait/synch/mutex/innodb/lock_mutex       | lock0lock.cc:6047  | 1.19 ms                     | lock      |
| wait/io/table/sql/handler                | handler.cc:3032    | 1.19 ms                     | fetch     |
|
wait/io/table/sql/handler                | handler.cc:8033    | 1.13 ms                     | update    |
| wait/io/table/sql/handler                | handler.cc:3084    | 1.07 ms                     | fetch     |
|
wait/synch/mutex/innodb/trx_sys_mutex    | read0read.cc:584   | 1.05 ms                     | lock      |
| wait/io/table/sql/handler                | handler.cc:8066    | 1.03 ms                     | delete    |
......
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+---------------------------------------+-------------------+-----------------------------+-----------+
|
event_name                            | source            | sys.format_time(timer_wait) | operation |
+---------------------------------------+-------------------+-----------------------------+-----------+
| wait/io/file/innodb/innodb_log_file   | fil0fil.cc:5992   | 14.18 ms                    | sync      |
|
wait/io/table/sql/handler             | handler.cc:3084   | 4.72 ms                     | fetch     |
| wait/io/file/innodb/innodb_log_file   | fil0fil.cc:5992   | 4.20 ms                     | sync      |
|
wait/synch/mutex/sql/LOCK_table_cache | table_cache.h:115 | 1.68 ms                     | lock      |
| wait/synch/mutex/sql/LOCK_table_cache | table_cache.h:115 | 1.66 ms                     | lock      |
|
wait/synch/mutex/sql/LOCK_table_cache | table_cache.h:115 | 1.56 ms                     | lock      |
| wait/synch/mutex/sql/LOCK_table_cache | table_cache.h:115 | 1.49 ms                     | lock      |
......
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+-------------------------------------------+------------------+-----------------------------+----------------+
|
event_name                                | source           | sys.format_time(timer_wait) | operation      |
+-------------------------------------------+------------------+-----------------------------+----------------+
| wait/synch/sxlock/innodb/fil_space_latch  | fsp0fsp.cc:3365  | 7.84 ms                     | exclusive_lock |
|
wait/io/table/sql/handler                 | handler.cc:3032  | 1.00 ms                     | fetch          |
......
+-------------------------------------------+------------------+-----------------------------+----------------+
50 rows in set (0.01 sec)


按照等待事件类型统计总数、平均、最大、最小等待时长(只取出前50)

# 我的
mysql> select * from sys.waits_global_by_latency limit 50;
+-----------------------------------------------------------+-----------+---------------+-------------+-------------+
| events                                                    | total     | total_latency | avg_latency | max_latency |
+-----------------------------------------------------------+-----------+---------------+-------------+-------------+
| wait/io/table/sql/handler                                 |  50405591 | 9.64 h        | 688.46 us   | 11.26 s     |
| wait/synch/mutex/innodb/trx_sys_mutex                     |   6815615 | 6.61 m        | 58.20 us    | 25.66 ms    |
| wait/io/socket/sql/client_connection                      |   8527800 | 1.64 m        | 11.53 us    | 11.44 ms    |
| wait/synch/sxlock/innodb/fil_space_latch                  |     22637 | 1.44 m        | 3.83 ms     | 10.36 s     |
| wait/synch/sxlock/innodb/dict_operation_lock              |        90 | 1.40 m        | 934.44 ms   | 9.70 s      |
| wait/io/file/innodb/innodb_data_file                      |    204837 | 1.22 m        | 358.33 us   | 112.80 ms   |
| wait/synch/mutex/innodb/sync_array_mutex                  |   1863469 | 1.03 m        | 33.24 us    | 10.80 ms    |
| wait/synch/mutex/innodb/log_sys_write_mutex               |    602223 | 59.64 s       | 99.03 us    | 11.26 ms    |
| wait/io/file/innodb/innodb_log_file                       |      8158 | 46.84 s       | 5.74 ms     | 82.44 ms    |
| wait/synch/mutex/sql/LOCK_table_cache                     |   7534927 | 46.66 s       | 6.19 us     | 11.16 ms    |
| wait/synch/sxlock/innodb/index_tree_rw_lock               |   5966049 | 44.10 s       | 7.39 us     | 10.43 s     |
| wait/synch/mutex/sql/THD::LOCK_thd_data                   | 142798237 | 22.75 s       | 159.20 ns   | 2.52 ms     |
......
50 rows in set (0.04 sec)
# 同事的
mysql> select * from sys.waits_global_by_latency limit 50;
+-----------------------------------------------------+-----------+---------------+-------------+-------------+
| events                                              | total     | total_latency | avg_latency | max_latency |
+-----------------------------------------------------+-----------+---------------+-------------+-------------+
| wait/io/table/sql/handler                           | 280178382 | 32.65 m       | 6.99 us     | 324.56 ms   |
| wait/synch/mutex/innodb/trx_sys_mutex               |  24005580 | 15.87 m       | 39.66 us    | 36.95 ms    |
| wait/io/socket/sql/client_connection                |  28521546 | 5.47 m        | 11.52 us    | 21.09 ms    |
| wait/synch/mutex/innodb/log_sys_write_mutex         |   1383193 | 3.18 m        | 137.81 us   | 15.70 ms    |
| wait/io/file/innodb/innodb_log_file                 |     22238 | 2.82 m        | 7.60 ms     | 456.05 ms   |
| wait/synch/mutex/innodb/sync_array_mutex            |   5195017 | 2.64 m        | 30.52 us    | 10.90 ms    |
| wait/synch/mutex/sql/LOCK_table_cache               |  24012569 | 2.26 m        | 5.65 us     | 23.75 ms    |
| wait/synch/mutex/sql/THD::LOCK_thd_data             | 621837619 | 1.69 m        | 162.80 ns   | 3.15 ms     |
| wait/synch/mutex/sql/THD::LOCK_query_plan           | 844318371 | 1.61 m        | 114.00 ns   | 6.14 ms     |
| wait/synch/mutex/innodb/log_sys_mutex               |   9348544 | 1.33 m        | 8.55 us     | 13.32 ms    |
| wait/synch/mutex/sql/THD::LOCK_thd_query            | 444158105 | 46.16 s       | 103.60 ns   | 3.11 ms     |
| wait/lock/table/sql/handler                         |  12006244 | 31.98 s       | 2.66 us     | 5.55 ms     |
| wait/synch/mutex/innodb/lock_mutex                  |   8701428 | 29.39 s       | 3.38 us     | 8.93 ms     |
| wait/synch/mutex/innodb/redo_rseg_mutex             |   2670852 | 27.70 s       | 10.37 us    | 242.76 ms   |
| wait/io/file/innodb/innodb_data_file                |     73925 | 25.97 s       | 351.35 us   | 97.23 ms    |
......
50 rows in set (0.05 sec)



|  利用等待事件分析差异

通过对上文第2节抓取的等待事件进行仔细地对比,可以看到在我的压测结果的等待事件中,对表的write(对表的写入)、fetch(可以理解为对表的读取)、sync(表数据落盘)等操作普遍在10ms以上(甚至高达37ms),而同事的压测结果的等待事件中,对表的write、fetch、sync等操作ms级别以上很少,有ms级别以上的也在2ms以内,从等待事件类型统计数据中也可以看到,wait/io/table/sql/handler 等待事件在我的测试结果中平均等待时间为688.46 us ,而同事的是6.99 us,而我的最大等待时间为11.26 s ,同事的为324.56 ms。

那么,问题来了:为什么会有这个差异?

百思不得其解,只能继续压测,然而无意间却发现,同事的环境在压测的时候数据量居然不增长...

# 同事的(从下面5次查询sbtest1表的id最大值我们可以看到,都为5159769
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now()               |
+---------+---------------------+
| 5159769 | 2001-01-25 01:47:28 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now()               |
+---------+---------------------+
| 5159769 | 2001-01-25 01:47:31 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now()               |
+---------+---------------------+
| 5159769 | 2001-01-25 01:47:33 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now()               |
+---------+---------------------+
| 5159769 | 2001-01-25 01:47:35 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now()               |
+---------+---------------------+
| 5159769 | 2001-01-25 01:47:39 |
+---------+---------------------+
1 row in set (0.00 sec)
# 而我的(从下面5次查询sbtest1表的id最大值可以看到,5次分别为51675595169955517224551742695175955
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now()               |
+---------+---------------------+
| 5167559 | 2001-01-25 01:48:29 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now()               |
+---------+---------------------+
| 5169955 | 2001-01-25 01:48:31 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now()               |
+---------+---------------------+
| 5172245 | 2001-01-25 01:48:33 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now()               |
+---------+---------------------+
| 5174269 | 2001-01-25 01:48:36 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now()               |
+---------+---------------------+
| 5175955 | 2001-01-25 01:48:38 |
+---------+---------------------+
1 row in set (0.00 sec)


why?

经过与同事重新仔细确认,唯一的差别就是sysbench的测试命令不同(sysbench是同事编译安装的,我没注意到这个细节,阴差阳错造成了使用了不同的lua脚本

# 我的是使用的安装目录/usr/local/share/sysbench目录下的oltp.lua脚本
sysbench --test=/usr/local/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-table-engine=innodb --mysql-db=sysbench_test  --oltp-table-size=5000000 --oltp-tables-count=8 --mysql-host=172.16.130.2 --mysql-port=3307 --mysql-user=root --mysql-password='password' --report-interval=1 --threads=256 --time=3600 run
# 同事的直接使用了编译目录/root/sysbench/sysbench-1.0.9下的oltp.lua脚本
sysbench --test=/root/sysbench/sysbench-1.0.9/tests/include/oltp_legacy/oltp.lua --mysql-table-engine=innodb --mysql-db=sysbench_test  --oltp-table-size=5000000 --oltp-tables-count=8 --mysql-host=172.16.130.2 --mysql-port=3307 --mysql-user=root --mysql-password='password' --report-interval=1 --threads=256 --time=3600 run


经过对比两个lua脚本文件的差异,发现有一行不相同

# 我的
for i=1, oltp_delete_inserts do
i = sb_rand(1, oltp_table_size)
rs = db_query("DELETE FROM " .. table_name .. " WHERE id=" .. i)
c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]]
)
pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]]
)
## 不同的地方在这里,由于为了避免主键冲突,把lua生成的随机主键值i变量删了,让数据库利用表自身的自增属性来填充主键值
rs = db_query("INSERT INTO " .. table_name ..  " ( k, c, pad) VALUES " .. string.format("(%d, '%s', '%s')",sb_rand(1, oltp_table_size) , c_val, pad_val))
end
# 同事的
for i=1, oltp_delete_inserts do
i = sb_rand(1, oltp_table_size)
rs = db_query("DELETE FROM " .. table_name .. " WHERE id=" .. i)
c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]]
)
pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]]
)
## 同事的并未改动这一行,默认情况下,自增主键值是使用lua脚本生成的i变量(即1, oltp_table_size之间的随机数 )
rs = db_query("INSERT INTO " .. table_name ..  " (id, k, c, pad) VALUES " .. string.format("(%d, %d, '%s', '%s')",i, sb_rand(1, oltp_table_size) , c_val, pad_val))
end



|  结论推导

仔细看第3节中的lua脚本代码,delete和insert两个操作被放进了同一个for循环中(其他的update_index和update_non_index是各自单独的一个for循环,互不影响),也就是说:

  • 使用lua脚本随机生成主键值时:假设当delete操作删除id=1的数据行时,紧接着insert也会使用相同的id=1的主键值。所以高概率会出现innodb重复使用delete数据行所在的页来存放insert数据,在sysbench高并发压力下,大部分的insert数据存储可能只需要在内存中已存在的页内操作即可,无需太多IO操作

  • 不使用lua脚本随机生成的主键值,而是使用表的自增属性生成主键值时:假设当delete操作删除id=1的数据行时,紧接着insert由于是表自增属性自己生成,也就是说几乎不太可能id=1,所以高概率会出现innodb重新申请一个数据页来写入insert数据,在sysbench高并发压力下,大部分的insert数据存储可能需要从磁盘文件中重新申请空间,IO操作较为频繁

至此,我和同事的压测结果有巨大差异的原因大致确定,后续经过反复的验证,也确认了是由于oltp.lua脚本的微小差异导致的。由于我们大多数时候都是使用的本地盘,而本地盘IO延迟低,通常情况下sysbench压测时这点微小的差异容易被忽略。而在此案例中,由于我们测试的环境中使用了某存储设备,相对于本地盘,IO延迟大大增加,进而造成了因为oltp.lua脚本的微小差异而导致最后压测结果的巨大差异。

提示:如果不想改动lua脚本,又想避免主键冲突有办法解决吗?有的,从sysbench 0.5版本开始,新增了一个隐藏选项--mysql-ignore-errors,用于忽略指定的错误,如果要避免主键冲突,指定选项--mysql-ignore-errors=1062 即可。


|  作者简介

罗小波·沃趣科技高级数据库技术专家

IT从业多年,历任运维工程师、高级运维工程师、运维经理、数据库工程师,曾参与版本发布系统、轻量级监控系统、运维管理平台、数据库管理平台的设计与编写,熟悉MySQL体系结构,Innodb存储引擎,喜好专研开源技术,追求完美。

猜你喜欢

转载自blog.csdn.net/woqutechteam/article/details/80510895