As a DBA, you will inevitably run into performance problems, disk IO problems in particular. How should you troubleshoot them when they appear? For example, when a high-concurrency workload responds slowly and requests take a long time to process, where should you start? This article from the technical community, "Fault Analysis | Linux disk IO utilization is high, the correct posture for analysis", explains how to analyze and locate the cause when IO is high.
1. Reproducing the environment
Environment configuration: This test uses a 128C_512G_4TSSD server (128 CPU cores, 512 GB of memory, 4 TB of SSD storage), and the MySQL version is 8.0.27.
Scenario simulation: Use sysbench to create 5 tables with 200 million rows in each table, then execute a SQL statement that generates a Cartesian product query. This produces heavy IO and simulates business pressure.
First, use sysbench to load the test data:
shell> sysbench --test=/usr/local/share/sysbench/oltp_insert.lua --mysql-host=XXX --mysql-port=3306 --mysql-user=pcms --mysql-password=abc123 --mysql-db=sysbench --percentile=99 --table-size=2000000000 --tables=5 --threads=1000 prepare
Next, use sysbench to simulate high concurrency:
shell> sysbench --test=/usr/local/share/sysbench/oltp_write_only.lua --mysql-host=xxx --mysql-port=3306 --mysql-user=pcms --mysql-password=abc123 --mysql-db=sysbench --percentile=99 --table-size=2000000000 --tables=5 --threads=1000 --max-time=60000 --report-interval=1 --max-requests=0 --mysql-ignore-errors=all run
Finally, execute the Cartesian product SQL statement:
mysql> select SQL_NO_CACHE b.id,a.k from sbtest_a a left join sbtest_b b on a.id=b.id group by a.k order by b.c desc;
2. System-level troubleshooting
2.1 Check current server status
The server status output shows that the one-minute load average is 72.56 and rising, and that the system is under IO pressure.
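The command used for this step is not shown in the article. As a minimal sketch, assuming the usual tools (uptime or top for the load average, and the wa field of top for IO wait), the helper below puts a load figure in context by dividing it by the core count. The name load_per_core is made up for illustration.

```shell
# Hypothetical helper: divide a load average by the number of CPU cores.
# Usage: load_per_core <load> <cores>
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live Linux host (assumed commands, not taken from the article):
#   uptime                      # 1, 5 and 15 minute load averages
#   top -b -n 1 | head -n 5     # the "wa" field is CPU time spent waiting on IO
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

With the figures from this article, load_per_core 72.56 128 prints 0.57. On Linux the load average also counts tasks blocked on IO, so the load figure should be read together with the IO wait percentage rather than on its own.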
2.2 Check the current IO status of each disk device
The per-device output shows that the server has multiple physical disks and that the IO pressure on the sda disk is relatively high.
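The per-device figures are typically obtained with iostat from the sysstat package; that choice is an assumption, since the article does not show the command. The sketch below filters iostat -x output down to the devices whose %util is at or above a threshold. It relies on %util being the last column, which holds for the sysstat versions we are aware of. The function name and the default threshold of 80 are illustrative.

```shell
# Hypothetical filter: list devices whose %util (last column of `iostat -x`)
# is at or above a threshold. Reads iostat output on stdin.
# Usage: iostat -x 1 2 | busy_devices [threshold]
busy_devices() {
    awk -v limit="${1:-80}" '
        /^Device/ { in_table = 1; next }   # header row of a device table
        NF == 0   { in_table = 0; next }   # blank line ends the table
        in_table && $NF + 0 >= limit { print $1, $NF }
    '
}
```

Note that the first report printed by iostat covers the time since boot, so only the second and later reports reflect the current interval.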
2.3 Check the current IO read and write status of the sda disk
The output shows that the sda disk is under relatively high pressure and that writes per second far exceed reads per second, which indicates a large volume of write IO.
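One way to compare reads and writes on a single device, sketched here under the assumption that iostat -d -x sda 1 or the kernel counters in /proc/diskstats are available, is to sample the completed-IO counters twice and take the difference. In /proc/diskstats, field 3 is the device name, field 4 is reads completed, and field 8 is writes completed. Both function names are hypothetical.

```shell
# Print "<reads completed> <writes completed>" for one device.
# Usage: rw_counts <device> [diskstats-file]
rw_counts() {
    awk -v dev="$1" '$3 == dev { print $4, $8 }' "${2:-/proc/diskstats}"
}

# Sample the counters one second apart and print the per-second deltas.
# Usage: rw_per_second sda
rw_per_second() {
    before=$(rw_counts "$1"); sleep 1; after=$(rw_counts "$1")
    echo "$before $after" |
        awk '{ print "reads/s=" ($3 - $1), "writes/s=" ($4 - $2) }'
}
```

A writes/s value much larger than reads/s would match the write-heavy pattern described in this step.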
2.4 Check which application on the sda disk is using the most IO
The output shows that the application using the most IO is mysql, with pid 73739.
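Per-process IO is usually viewed interactively with iotop -oP or pidstat -d 1; which tool the article used is not shown, so both are assumptions. Where neither is installed, a rough equivalent can be sketched from the write_bytes counter in /proc/<pid>/io. The function name top_writers is hypothetical, and reading other users' processes normally requires root.

```shell
# Hypothetical helper: rank processes by cumulative bytes written to storage.
# Usage: top_writers [count] [proc-root]
top_writers() {
    proc_root=${2:-/proc}
    for f in "$proc_root"/[0-9]*/io; do
        [ -r "$f" ] || continue          # skip processes we cannot read
        pid=${f%/io}
        pid=${pid##*/}
        bytes=$(awk '$1 == "write_bytes:" { print $2 }' "$f" 2>/dev/null)
        if [ -n "$bytes" ]; then echo "$bytes $pid"; fi
    done | sort -rn | head -n "${1:-5}"
}
```

Each output line is "<bytes> <pid>". The counters are totals since each process started, not rates, so a long-running process can rank high even when it is idle now; running the helper twice and comparing the values gives a better picture.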
2.5 Analyze which thread in the application is using the most IO
The output shows that thread 74770 accounts for a relatively high share of the IO.
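To narrow the search from the process to a thread, pidstat -d -t -p 73739 1 is one commonly used option (an assumption, as the article's command is not shown). The sketch below does a similar job from the per-thread counters under /proc/<pid>/task/<tid>/io. The function name thread_writers is hypothetical.

```shell
# Hypothetical helper: list the threads of one process by cumulative bytes written.
# Usage: thread_writers <pid> [proc-root]
thread_writers() {
    task_dir=${2:-/proc}/$1/task
    for f in "$task_dir"/*/io; do
        [ -r "$f" ] || continue
        tid=${f%/io}
        tid=${tid##*/}
        # Emit "<write_bytes> <tid>" so the output can be sorted numerically.
        awk -v tid="$tid" '$1 == "write_bytes:" { print $2, tid }' "$f"
    done | sort -rn
}
```

The thread IDs printed here are operating system thread IDs (LWPs), the same kind of number as 74770 in this article, not MySQL connection IDs.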
2.6 Analyze what this thread is doing
The output shows that this thread is currently writing to multiple files. Here fd is the file descriptor (file handle), and the descriptors being written to are 64 and 159.
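System calls made by a thread are commonly traced with strace; that tool is an assumption here because the article's output is missing. As a sketch, the filter below counts write and pwrite64 calls per file descriptor from strace output on stdin. The function name writes_per_fd is made up.

```shell
# Hypothetical filter: count write()/pwrite64() calls per file descriptor.
# Each output line is "<fd> <number of calls>".
# Usage: timeout 5 strace -f -e trace=write,pwrite64 -p 74770 2>&1 | writes_per_fd
writes_per_fd() {
    awk -F'[(,]' '
        # $1 is the syscall name (optionally prefixed by "[pid N] "), $2 is the fd.
        $1 ~ /(^| )p?write(64)?$/ { n[$2]++ }
        END { for (fd in n) print fd, n[fd] }
    ' | sort -n
}
```

Attaching strace slows the traced thread noticeably, so on a busy production instance it is safer to trace for only a few seconds, as the timeout in the usage line does.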
2.7 Check what these file handles point to
shell> lsof -p 73739|grep 159u
mysqld 73739 mysql 159u REG 8,0 212143246 7046482357 /mysql/mysqldata/16320fff-5fd5-4c47-889a-a9e1a8591d0d/tmp/#7046482357 (deleted)
shell> lsof -p 73739|grep 64u
mysqld 73739 mysql 64u REG 8,0 211872724 6979323031 /mysql/mysqldata/16320fff-5fd5-4c47-889a-a9e1a8591d0d/tmp/#6979323031 (deleted)
The lsof output shows that both file handles point to files under the MySQL tmp directory, so this thread is writing a large number of temporary files.
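If lsof is not installed, the same lookup can be sketched with the symbolic links under /proc/<pid>/fd, which the kernel maintains for every open descriptor. The function name fd_target is hypothetical.

```shell
# Hypothetical helper: print the file that a descriptor of a process points to.
# Usage: fd_target <pid> <fd> [proc-root]
fd_target() {
    readlink "${3:-/proc}/$1/fd/$2"
}

# Example on a live host (root is normally required for another user's process):
#   fd_target 73739 159
#   fd_target 73739 64
```

For a file that has already been unlinked, the link target ends in " (deleted)", matching the marker that lsof prints.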
3. Analyzing MySQL Applications
3.1 View the current session list
mysql> select * from information_schema.processlist where command !='sleep';
| 9 | pcms | 172.16.76.12:57596 | sysbench | Query | 67 | executing | select SQL_NO_CACHE b.id,a.k from sbtest_a a left join sbtest_b b on a.id=b.id group by a.k order by b.c desc | 66477 | 0 | 0 |
The session list shows that this SQL statement has been running for 67 seconds. It uses GROUP BY and ORDER BY on large tables, so it is very likely to spill to disk and generate IO.
3.2 Query sessions by thread number
Querying the threads table confirms that this SQL statement is the reason the thread keeps creating temporary tables.
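The query for this step is not shown. A minimal sketch, assuming the lookup goes through the THREAD_OS_ID column of performance_schema.threads (which maps an operating system thread ID to a MySQL session), could look like the following. The function name threads_query and the selected column list are illustrative choices.

```shell
# Hypothetical helper: build the lookup query for one OS thread ID.
# Usage: mysql -u pcms -p -e "$(threads_query 74770)"
threads_query() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: threads_query <os-thread-id>" >&2; return 1 ;;
    esac
    printf 'SELECT thread_id, processlist_id, processlist_time, processlist_info FROM performance_schema.threads WHERE thread_os_id = %s;\n' "$1"
}
```

The processlist_id value in the result corresponds to the ID column of the session list in 3.1, which ties the OS thread back to the running statement.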
3.3 View the execution plan of the SQL statement for further verification
The execution plan shows that this SQL statement uses temporary tables and temporary files, which is consistent with the findings above.
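In MySQL, EXPLAIN reports these operations in the Extra column as "Using temporary" and "Using filesort". The sketch below, with a made-up function name plan_flags, scans EXPLAIN output on stdin for the two markers. It is only a convenience check under that assumption, not a substitute for reading the full plan.

```shell
# Hypothetical filter: report whether EXPLAIN output mentions a temporary
# table or a filesort.
# Usage: mysql -u pcms -p sysbench -e "EXPLAIN <statement>" | plan_flags
plan_flags() {
    awk '
        /Using temporary/ { t = 1 }
        /Using filesort/  { f = 1 }
        END { print "temporary=" (t ? "yes" : "no"), "filesort=" (f ? "yes" : "no") }
    '
}
```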
3.4 Check the global status for further confirmation
After checking the global status several times, you can see that the values of tmp_files and tmp_disk_tables (the Created_tmp_files and Created_tmp_disk_tables status counters) keep growing. This shows that a large number of temporary files and on-disk temporary tables are being created, which is consistent with the behavior of this thread.
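The growth of these counters can be measured rather than eyeballed. As a sketch, assuming two snapshots taken with SHOW GLOBAL STATUS in batch mode, the helper below prints how much each counter grew between them. The function name tmp_delta and the snapshot file names are hypothetical.

```shell
# Hypothetical helper: print the growth of each counter between two snapshots.
# Each snapshot file holds "name value" lines.
# Usage: tmp_delta before.txt after.txt
tmp_delta() {
    awk 'NR == FNR { before[$1] = $2; next }   # first file: remember old values
         { print $1, $2 - before[$1] }         # second file: print the difference
        ' "$1" "$2"
}

# Taking the snapshots (assumed invocation):
#   mysql -u pcms -p -N -B -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%'" > before.txt
#   sleep 10
#   mysql -u pcms -p -N -B -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%'" > after.txt
```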
4. Troubleshooting
The investigation above shows that the sda disk currently has the highest IO usage and that the mysqld process accounts for most of it. Within mysqld, one thread frequently creates temporary tables or temporary files. Logging in to MySQL and checking the session and thread views shows that this is caused by a slow SQL statement, and the execution plan of that statement also creates temporary tables and temporary files, which matches the earlier findings. The next step is to optimize this slow SQL statement. The optimization itself is handled by the DBA and is not covered here. After the optimization is complete, continue to observe IO to confirm that it has decreased.
5. Code analysis
We can run pstack against the thread to obtain its current stack information. Keep in mind that pstack calls gdb to attach to the process, which pauses it briefly, so use it with care on a production instance.
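pstack prints the stacks of all threads in a process. As a sketch, assuming the gdb-style output in which each stack starts with a header line such as "Thread 2 (Thread 0x... (LWP 74770)):", the filter below keeps only the stack of one thread. The function name thread_stack is hypothetical, and the header format may differ between pstack builds.

```shell
# Hypothetical filter: keep only the stack that belongs to one LWP (OS thread ID).
# Usage: pstack 73739 | thread_stack 74770
thread_stack() {
    awk -v lwp="$1" '
        # A header line starts a new thread; show it only if the LWP matches.
        /^Thread / { show = ($0 ~ ("\\(LWP " lwp "\\)")) }
        show
    '
}
```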