Troubleshooting approach and resolution path for high Linux disk IO utilization

As a DBA, you will inevitably run into performance problems, disk IO problems in particular. How should you troubleshoot them when they occur? For example, in a high-concurrency workload where responses are slow and processing times are long, where do you start? This article from the technical community, "Fault Analysis | Linux disk IO utilization is high, the correct posture for analysis", explains how to analyze and locate the cause when IO is high.

1. Reproducing the environment

Environment configuration: the test server is a 128C_512G_4TSSD machine (128 cores, 512 GB of memory, 4 TB SSD), and the MySQL version is 8.0.27.

Scenario simulation: use sysbench to create 5 tables with 200 million rows in each table, then execute a SQL statement that produces a Cartesian-product query to generate IO and simulate business pressure. First, use sysbench to load the test data:

shell> sysbench --test=/usr/local/share/sysbench/oltp_insert.lua --mysql-host=XXX --mysql-port=3306 --mysql-user=pcms --mysql-password=abc123 --mysql-db=sysbench --percentile=99 --table-size=2000000000 --tables=5 --threads=1000 prepare

Then use sysbench to simulate high concurrency:

shell> sysbench --test=/usr/local/share/sysbench/oltp_write_only.lua --mysql-host=xxx --mysql-port=3306 --mysql-user=pcms --mysql-password=abc123 --mysql-db=sysbench --percentile=99 --table-size=2000000000 --tables=5 --threads=1000 --max-time=60000 --report-interval=1 --max-requests=0 --mysql-ignore-errors=all run

Finally, execute the Cartesian-product SQL statement:

mysql> select SQL_NO_CACHE b.id,a.k from sbtest_a a left join sbtest_b b on a.id=b.id  group by a.k order by b.c desc;

2. Troubleshooting at the system level

2.1 Check current server status

[Screenshot: current server status]

The output shows that the current one-minute load average is 72.56 and rising, and that there is IO pressure.
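A load figure is easier to judge relative to the number of cores. The following is a minimal sketch, assuming the check is done with standard Linux tools such as top or /proc/loadavg; `load_ratio` is a hypothetical helper name, and the sample values in the comments are illustrative.

```shell
# Sketch: put the 1-minute load average in context by dividing it by the
# core count. load_ratio is a hypothetical helper, not part of any tool.
load_ratio() {
    # $1: a /proc/loadavg-style line, e.g. "72.56 60.10 41.32 5/2100 12345"
    # $2: number of CPU cores
    echo "$1" | awk -v cores="$2" '{ printf "%.2f\n", $1 / cores }'
}

# On a live Linux host (not executed here):
#   load_ratio "$(cat /proc/loadavg)" "$(nproc)"
#   top -b -n 1 | head -5    # the "wa" value in the CPU line is IO wait
```

On the 128-core test server, a load of 72.56 gives a ratio below 1, so the rising trend and the IO wait share are the more telling signals.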

2.2 Check the current IO status of each disk device

[Screenshot: IO status of each disk device]

The output shows that there are multiple physical disks and that the IO pressure on the sda disk is relatively high.
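As a rough sketch of this step, assuming the per-device figures come from `iostat -dx` (sysstat) and that `%util` is the last column of each device row, the busiest device can be picked out automatically. `busiest_device` is a hypothetical helper name.

```shell
# Sketch: read `iostat -dx` output on stdin and print the device with the
# highest %util. Assumes %util is the last column of each device row.
busiest_device() {
    awk '
        $1 ~ /^Device/ { in_table = 1; next }   # header row starts a report
        NF == 0        { in_table = 0; next }   # blank line ends the report
        in_table && $NF + 0 >= max { max = $NF + 0; dev = $1 }
        END { if (dev != "") printf "%s %.2f\n", dev, max }
    '
}

# On a live host (not executed here):
#   iostat -dx 1 3 | busiest_device
```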

2.3 Check the current IO read and write status of the sda disk

[Screenshot: read and write statistics for the sda disk]

The output shows that the sda disk is under relatively high pressure and that writes per second far exceed reads per second, which indicates that the load is dominated by IO writes.
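The read/write imbalance can also be checked without any extra tooling. The sketch below swaps in a different technique from the one in the screenshot: it compares two snapshots of /proc/diskstats, where field 4 is reads completed and field 8 is writes completed. `rw_delta` and the snapshot file names are assumptions made for illustration.

```shell
# Sketch: compare two /proc/diskstats snapshots for one device and print how
# many reads and writes completed in between.
rw_delta() {
    # $1: device name (e.g. sda), $2: first snapshot, $3: second snapshot
    awk -v dev="$1" '
        FNR == NR && $3 == dev { r0 = $4; w0 = $8; next }
        FNR != NR && $3 == dev {
            printf "reads=%d writes=%d\n", $4 - r0, $8 - w0
        }
    ' "$2" "$3"
}

# On a live host (not executed here):
#   cat /proc/diskstats > /tmp/ds1; sleep 5; cat /proc/diskstats > /tmp/ds2
#   rw_delta sda /tmp/ds1 /tmp/ds2
```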

2.4 Check which application on the sda disk generates the most IO

[Screenshot: IO usage per process]

The output shows that the application generating most of the IO is MySQL, with pid 73739.
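One way to sketch this step, assuming per-process figures are collected with `pidstat -d` (sysstat), is to locate the `PID` and `kB_wr/s` columns by name in the header row and keep the row with the highest write rate. `top_writer` is a hypothetical helper name.

```shell
# Sketch: read `pidstat -d` output on stdin and print the PID, command and
# write rate (kB/s) of the heaviest writer. Columns are located by header
# name, so 12-hour and 24-hour timestamp formats both work.
top_writer() {
    awk '
        /kB_wr\/s/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "PID")     pid_col = i
                if ($i == "kB_wr/s") wr_col  = i
            }
            next
        }
        pid_col && NF >= wr_col && $wr_col + 0 > max {
            max = $wr_col + 0; pid = $pid_col; cmd = $NF
        }
        END { if (pid != "") printf "%s %s %.2f\n", pid, cmd, max }
    '
}

# On a live host (not executed here):
#   pidstat -d 1 5 | top_writer
```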

2.5 Analyze which thread in the application takes up higher IO

[Screenshot: IO usage per thread of process 73739]

The output shows that thread 74770 generates a relatively high share of the IO.
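Per-thread figures are typically available from `pidstat -dt -p <pid>`. As an alternative sketch, assuming a kernel with task IO accounting and permission to read the process's /proc entries, the cumulative `write_bytes` counter of each thread can be compared directly. `top_writer_thread` is a hypothetical helper name.

```shell
# Sketch: given a task directory such as /proc/73739/task, print the thread
# id with the largest cumulative write_bytes, followed by that byte count.
# These counters are totals since the thread started, not a rate.
top_writer_thread() {
    for t in "$1"/*; do
        [ -r "$t/io" ] || continue
        awk -v tid="${t##*/}" '$1 == "write_bytes:" { print $2, tid }' "$t/io"
    done | sort -rn | head -n 1 | awk '{ print $2, $1 }'
}

# On a live host, usually as root (not executed here):
#   top_writer_thread /proc/73739/task
```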

2.6 Analyze what this thread is doing

[Screenshot: system calls issued by thread 74770]

The output shows that this thread is writing to multiple files. Here fd is the file descriptor, and the descriptors being written are 64 and 159.
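A minimal sketch of how such a trace could be summarized, assuming the system calls are captured with strace and appear as `write(...)` or `pwrite64(...)` lines ending in `= <bytes>`. `writes_by_fd` is a hypothetical helper name.

```shell
# Sketch: read strace output on stdin and total the successful write() and
# pwrite64() calls per file descriptor.
writes_by_fd() {
    awk '
        match($0, /p?write(64)?\([0-9]+/) {
            fd = substr($0, RSTART, RLENGTH)
            sub(/^.*\(/, "", fd)            # keep only the descriptor number
            n = $NF + 0                     # return value; failed calls give 0
            if (n > 0) { bytes[fd] += n; calls[fd]++ }
        }
        END {
            for (fd in bytes)
                printf "fd=%s calls=%d bytes=%d\n", fd, calls[fd], bytes[fd]
        }
    ' | sort
}

# On a live host (not executed here); tracing slows the target, so keep it short:
#   timeout 5 strace -f -p 74770 -e trace=write,pwrite64 2>&1 | writes_by_fd
```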

2.7 Check what these file descriptors point to

shell> lsof -p 73739|grep 159u
mysqld 73739 mysql  159u   REG                8,0   212143246  7046482357 /mysql/mysqldata/16320fff-5fd5-4c47-889a-a9e1a8591d0d/tmp/#7046482357 (deleted)
shell> lsof -p 73739|grep 64u
mysqld 73739 mysql   64u   REG                8,0   211872724  6979323031 /mysql/mysqldata/16320fff-5fd5-4c47-889a-a9e1a8591d0d/tmp/#6979323031 (deleted)

The output shows that both descriptors point to files under the MySQL tmp directory that are marked (deleted), that is, files unlinked while still open. This thread is therefore writing a large amount of temporary file data.
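Instead of grepping one descriptor at a time, all deleted-but-open files can be listed in one pass. This is a minimal sketch that assumes the default lsof column layout shown above (FD in column 4, SIZE/OFF in column 7) and file paths without spaces; `deleted_open_files` is a hypothetical helper name.

```shell
# Sketch: read `lsof -p <pid>` output on stdin and print the descriptor,
# size in MiB and path of every file marked "(deleted)".
deleted_open_files() {
    awk '$NF == "(deleted)" {
        printf "%s %.1f MiB %s\n", $4, $7 / 1048576, $(NF - 1)
    }'
}

# On a live host (not executed here):
#   lsof -p 73739 | deleted_open_files
```

Applied to the two lines above, this reports descriptors 159u and 64u at roughly 202 MiB each.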

3. Analyzing the MySQL application

3.1 View the current session list

mysql> select * from information_schema.processlist where command !='sleep';
|  9 | pcms             | 172.16.76.12:57596 | sysbench | Query            |   67 | executing                                                     | select SQL_NO_CACHE b.id,a.k from sbtest_a a left join sbtest_b b on a.id=b.id  group by a.k order by b.c desc |   66477 |         0 |             0 |

The output shows that this SQL statement has been running for 67 seconds. It uses both group by and order by, which on tables of this size will almost certainly spill to disk and generate IO.

3.2 Query sessions by thread number

[Screenshot: query against the threads table]

Querying the threads table by thread number confirms that the thread that keeps creating temporary tables is the one executing this SQL statement.
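A sketch of how such a lookup might be written, assuming the threads table is `performance_schema.threads` and that the OS thread number found earlier corresponds to its `THREAD_OS_ID` column. `thread_lookup_sql` is a hypothetical helper that only builds the query text.

```shell
# Sketch: build the SQL that maps an OS thread id to its MySQL session.
thread_lookup_sql() {
    # $1: OS thread id, e.g. 74770
    printf 'SELECT thread_id, processlist_id, processlist_time, processlist_info FROM performance_schema.threads WHERE thread_os_id = %d;\n' "$1"
}

# On a live host (not executed here):
#   thread_lookup_sql 74770 | mysql -u root -p
```

The `processlist_id` in the result can then be matched against the Id column of the session list from step 3.1.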

3.3 View the execution plan of the SQL statement for further verification

[Screenshot: execution plan of the SQL statement]

The output shows that the execution plan of this SQL statement uses temporary tables and temporary files, which is consistent with the findings above.
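In EXPLAIN output, these behaviors show up in the Extra column as `Using temporary` and `Using filesort`. A minimal sketch for pulling those markers out of saved EXPLAIN output follows; `explain_flags` is a hypothetical helper name, and `plan.sql` in the comment is an assumed file holding the EXPLAIN statement.

```shell
# Sketch: read EXPLAIN output on stdin and list which of the two markers
# ("Using temporary", "Using filesort") appear, one per line.
explain_flags() {
    grep -o -E 'Using (temporary|filesort)' | sort -u
}

# On a live host (not executed here):
#   mysql -u root -p sysbench < plan.sql | explain_flags
```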

3.4 Check the global status for further confirmation

[Screenshot: global status counters]

After running the check several times, you can see that the tmp_files and tmp_disk_tables counters (Created_tmp_files and Created_tmp_disk_tables) keep growing. This shows that a large number of temporary files and on-disk temporary tables are being created, which is consistent with the behavior of this thread.
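The growth can be quantified by diffing two snapshots of the counters. This is a minimal sketch, assuming each snapshot is saved as "name value" rows, for example from `SHOW GLOBAL STATUS LIKE 'Created_tmp%'`. `tmp_counter_delta` and the snapshot file names are hypothetical.

```shell
# Sketch: print how much each status counter grew between two snapshots.
tmp_counter_delta() {
    # $1: earlier snapshot file, $2: later snapshot file
    awk '
        FNR == NR    { before[$1] = $2; next }
        $1 in before { printf "%s +%d\n", $1, $2 - before[$1] }
    ' "$1" "$2"
}

# On a live host (not executed here):
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%'" > /tmp/s1
#   sleep 10
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%'" > /tmp/s2
#   tmp_counter_delta /tmp/s1 /tmp/s2
```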

4. Resolving the problem

The investigation above shows that the sda disk currently has the highest IO usage and that the mysqld process accounts for most of it. Within mysqld, one thread is frequently creating temporary tables or temporary files. Checking sessions and threads from inside MySQL shows that the cause is a slow SQL statement, and the execution plan of that statement confirms that it creates temporary tables and temporary files, which matches the earlier findings. The next step is to optimize this slow SQL; that work is handled by the DBA and is not covered here. Once the slow SQL has been optimized, keep observing IO to confirm that it decreases.

5. Code analysis

We can run pstack against the thread number to obtain the thread's current stack. Keep in mind that pstack attaches gdb to the process, which pauses it briefly, so use it with care on a production instance.
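When pstack is run against the whole process, its output contains one block per thread. The sketch below filters out the block for a single thread, assuming the gdb-style header format `Thread N (Thread 0x... (LWP <tid>)):`. `stack_of_lwp` is a hypothetical helper name.

```shell
# Sketch: read pstack output on stdin and print only the stack that belongs
# to the given LWP (OS thread id).
stack_of_lwp() {
    awk -v lwp="$1" '
        /^Thread / { show = ($0 ~ ("LWP " lwp "\\)")) }   # new thread block
        show                                               # print while matched
    '
}

# On a live host (not executed here):
#   pstack 73739 | stack_of_lwp 74770
```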

[Screenshots: pstack output for the thread]

Origin blog.csdn.net/bisal/article/details/133326496