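A scenario that comes up often:
The business side reports that the database is "very slow". Running SHOW PROCESSLIST shows many threads whose State is a lock wait, for example:
Waiting for table metadata lock, Waiting for table level lock, and so on.
For example, a DDL statement hangs when executed, and the process list shows that it is waiting on a metadata lock (MDL):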
- mysql> SHOW PROCESSLIST;
- +-----+------+-----------+------+---------+------+---------------------------------+--------------------------------------------+
- | Id | User | Host | db | Command | Time | State | Info |
- +-----+------+-----------+------+---------+------+---------------------------------+--------------------------------------------+
- | 585 | root | localhost | test | Sleep | 1658 | | NULL |
- | 586 | root | localhost | test | Query | 1654 | Waiting for table metadata lock | alter table t change name name varchar(32) |
- | 590 | root | localhost | test | Query | 0 | starting | show processlist |
- +-----+------+-----------+------+---------+------+---------------------------------+--------------------------------------------+
- 3 rows in set (0.00 sec)
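In a test environment we can quickly work out that the thread with id 585 is the likely culprit, but we cannot tell what SQL it ran:
Querying information_schema.innodb_trx does not show which SQL was executed either.
A simple KILL does fix the immediate problem, but if the problem comes back we still cannot pin down what caused it.
In fact, performance_schema.events_statements_current provides this information. The problem is reproduced here: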
- session1> BEGIN;
- Query OK, 0 rows affected (0.00 sec)
- session1> UPDATE t SET name='fasdfsad';
- Query OK, 3 rows affected (0.00 sec)
- Rows matched: 3 Changed: 3 Warnings: 0
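- session2> ALTER TABLE t CHANGE name name varchar(32);
The ALTER hangs.
Check whether any transaction is still uncommitted. There is indeed one.
The statement inside that transaction has finished (the connection is in Sleep) but the transaction has not been committed, so trx_query is NULL: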
- session3> SELECT * FROM information_schema.innodb_trx\G
- *************************** 1. row ***************************
- trx_id: 9614
- trx_state: RUNNING
- trx_started: 2017-09-19 15:58:05
- trx_requested_lock_id: NULL
- trx_wait_started: NULL
- trx_weight: 2
- trx_mysql_thread_id: 585
- trx_query: NULL
- trx_operation_state: NULL
- trx_tables_in_use: 0
- trx_tables_locked: 1
- trx_lock_structs: 2
- trx_lock_memory_bytes: 1136
- trx_rows_locked: 4
- trx_rows_modified: 0
- trx_concurrency_tickets: 0
- trx_isolation_level: REPEATABLE READ
- trx_unique_checks: 1
- trx_foreign_key_checks: 1
- trx_last_foreign_key_error: NULL
- trx_adaptive_hash_latched: 0
- trx_adaptive_hash_timeout: 0
- trx_is_read_only: 0
- trx_autocommit_non_locking: 0
- 1 row in set (0.00 sec)
trx_mysql_thread_id only tells us the process id of the uncommitted transaction. In the process list, the Info column for that id is empty as well:
- session3> SHOW PROCESSLIST;
- +-----+------+-----------+------+---------+------+---------------------------------+--------------------------------------------+
- | Id | User | Host | db | Command | Time | State | Info |
- +-----+------+-----------+------+---------+------+---------------------------------+--------------------------------------------+
- | 585 | root | localhost | test | Sleep | 42 | | NULL |
- | 586 | root | localhost | test | Query | 37 | Waiting for table metadata lock | ALTER TABLE t CHANGE name name varchar(32) |
- | 590 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST |
- +-----+------+-----------+------+---------+------+---------------------------------+--------------------------------------------+
- 3 rows in set (0.00 sec)
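However, as long as the Performance Schema (P_S) is enabled, performance_schema.events_statements_current shows the corresponding SQL. It keeps the current or most recent statement of each thread, so it includes statements that have finished executing but have not been committed.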
- session3> SELECT sql_text FROM performance_schema.events_statements_current;
- +-------------------------------------------------------------------+
- | sql_text |
- +-------------------------------------------------------------------+
- | UPDATE t SET name='fasdfsad' |
- | ALTER TABLE t CHANGE name name varchar(32) |
- | SELECT sql_text FROM performance_schema.events_statements_current |
- +-------------------------------------------------------------------+
- 3 rows in set (0.00 sec)
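The query above returns useful rows only when the Performance Schema and its statement consumer are switched on. A minimal sketch for checking this follows. It assumes an account with privileges on the performance_schema setup tables, and the UPDATE is only needed if the consumer turns out to be disabled:

```sql
-- Is the Performance Schema enabled at all?
-- (This variable is read-only at runtime; it is set at server startup.)
SHOW GLOBAL VARIABLES LIKE 'performance_schema';

-- Are the statement consumers that feed events_statements_current enabled?
SELECT name, enabled
FROM performance_schema.setup_consumers
WHERE name LIKE 'events_statements_%';

-- Assumption: the consumer was found disabled and we are allowed to change it.
UPDATE performance_schema.setup_consumers
SET enabled = 'YES'
WHERE name = 'events_statements_current';
```

Changes to setup_consumers take effect immediately but are not persisted across a server restart.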
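Of course, in a complex production environment the information returned by the statement above is far from enough.
The following statement extends the SHOW PROCESSLIST output with the corresponding SQL.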
- SELECT b.processlist_id, c.db, a.sql_text, c.command, c.time, c.state
- FROM performance_schema.events_statements_current a JOIN performance_schema.threads b USING(thread_id)
- JOIN information_schema.processlist c ON b.processlist_id = c.id
- WHERE a.sql_text NOT LIKE '%performance%';
Result:
- +----------------+------+--------------------------------------------+---------+------+---------------------------------+
- | processlist_id | db | sql_text | command | time | state |
- +----------------+------+--------------------------------------------+---------+------+---------------------------------+
- | 585 | test | UPDATE t SET name='fasdfsad' | Sleep | 243 | |
- | 586 | test | ALTER TABLE t CHANGE name name varchar(32) | Query | 238 | Waiting for table metadata lock |
- +----------------+------+--------------------------------------------+---------+------+---------------------------------+
- 2 rows in set (0.01 sec)
This makes it easy to see that the blocking session ran an UPDATE and left it uncommitted.
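On a busy server the extended process list can still be long. One possible refinement, sketched here as an assumption rather than as part of the original procedure, is to start from information_schema.innodb_trx so that only sessions holding an open InnoDB transaction are listed, and to exclude the current session with CONNECTION_ID() instead of matching on the statement text:

```sql
-- Sessions that hold an open InnoDB transaction, oldest first,
-- together with the most recent statement each of them ran.
SELECT th.processlist_id,
       trx.trx_started,
       TIMESTAMPDIFF(SECOND, trx.trx_started, NOW()) AS trx_age_sec,
       th.processlist_command AS command,
       st.sql_text AS last_statement
FROM information_schema.innodb_trx trx
JOIN performance_schema.threads th
  ON th.processlist_id = trx.trx_mysql_thread_id
JOIN performance_schema.events_statements_current st
  ON st.thread_id = th.thread_id
-- Skip our own monitoring session.
WHERE th.processlist_id <> CONNECTION_ID()
ORDER BY trx.trx_started;
```

A session in Sleep with a large trx_age_sec is a typical candidate for the metadata lock holder. Note that events_statements_current keeps only the latest statement per thread. If the transaction ran several statements, earlier ones may be found in performance_schema.events_statements_history, provided that consumer is enabled.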
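Summary: the result appears to include earlier process ids as well, but entries older than about a day appear to be removed automatically. The metadata lock timeout can be configured as follows.
lock_wait_timeout is the timeout, in seconds, for acquiring a metadata lock. Allowed values range from 1 to 31536000 (1 year). The default is 31536000.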
- SET SESSION lock_wait_timeout = 1800;
- SET GLOBAL lock_wait_timeout = 1800;
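To see the effect of the setting, a short timeout can be tried in the session that runs the DDL. This is a sketch that reuses the table t from the reproduction above and assumes another session still holds an uncommitted transaction on it. The value 5 is chosen only for illustration:

```sql
-- Show the value currently in effect for this session.
SHOW VARIABLES LIKE 'lock_wait_timeout';

-- Give up on metadata lock waits after 5 seconds in this session only.
SET SESSION lock_wait_timeout = 5;

-- While the blocking transaction stays open, this statement should stop
-- waiting after about 5 seconds and fail with a lock wait timeout error
-- (ERROR 1205) instead of hanging indefinitely.
ALTER TABLE t CHANGE name name varchar(32);
```

SET GLOBAL changes the default for connections opened afterwards. Sessions that already exist keep their current session value.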
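In addition, mysqldump can lock tables, so add the --single-transaction option when dumping (it takes a consistent snapshot without locking tables, but only for transactional engines such as InnoDB): mysqldump --single-transaction=TRUE -u root -p db > C:\FileLocation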