Step 1. Locate the problem
Wait statistics often reveal system problems directly. A handful of common wait types is usually enough to find the bottleneck, and combining them with performance counters makes the diagnosis more precise.
Common wait types
- CXPACKET: Occurs with parallel query plans when trying to synchronize the query processor exchange iterator. Consider lowering the degree of parallelism if contention on this wait type becomes a problem.
- IO_COMPLETION: Occurs while waiting for an I/O operation to complete. This wait type generally represents non-data page I/O.
- PAGEIOLATCH_: Occurs when a task is waiting for a latch on a buffer that is in an I/O request.
- PAGELATCH_: Occurs when a task is waiting for a latch on a buffer that is not in an I/O request.
- LCK_: Occurs when a task is waiting to acquire a lock.
- ASYNC_NETWORK_IO: Occurs on network writes when the task is blocked behind the network. Verify that the client is processing data from the server.
- OLEDB: Occurs when SQL Server calls the Microsoft SQL Native Client OLE DB provider. This wait type is not used for synchronization. Instead, it indicates the duration of calls to the OLE DB provider.
- WRITELOG: Occurs while waiting for a log flush to complete. Common operations that cause log flushes are checkpoints and transaction commits.
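As a rough illustration of how the list above can be used, the following Python sketch groups wait samples by the area they point at and ranks the areas by total wait time. The input shape (pairs of wait type and wait time in milliseconds) and the area labels are assumptions made for this example; in practice the numbers would come from the server's wait statistics.

```python
# Prefixes taken from the list above, mapped to the area each wait points at.
# The area labels are informal names chosen for this sketch.
WAIT_AREAS = {
    "CXPACKET": "parallelism / CPU",
    "IO_COMPLETION": "disk I/O (non-data pages)",
    "PAGEIOLATCH_": "disk I/O (data pages)",
    "WRITELOG": "disk I/O (transaction log)",
    "PAGELATCH_": "in-memory page contention",
    "LCK_": "lock blocking",
    "ASYNC_NETWORK_IO": "client / network",
    "OLEDB": "OLE DB provider calls",
}


def classify(wait_type):
    """Return the area for a wait type name, or 'other' if it is not listed."""
    name = wait_type.upper()
    for prefix, area in WAIT_AREAS.items():
        if name.startswith(prefix):
            return area
    return "other"


def top_areas(samples):
    """Sum wait time per area and sort descending.

    samples: iterable of (wait_type, wait_time_ms) pairs (assumed shape).
    """
    totals = {}
    for wait_type, wait_ms in samples:
        area = classify(wait_type)
        totals[area] = totals.get(area, 0) + wait_ms
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    samples = [
        ("PAGEIOLATCH_SH", 5000),
        ("WRITELOG", 1200),
        ("LCK_M_X", 800),
        ("PAGEIOLATCH_EX", 700),
    ]
    for area, total_ms in top_areas(samples):
        print(f"{area}: {total_ms} ms")
```

With the sample data, data-page disk I/O comes out on top, which is the signal to continue with the I/O analysis in Step 2.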
Step 2. Analysis
CXPACKET
CXPACKET can be understood simply as a CPU-related wait that occurs mainly with parallel plans. A parallel plan needs several tasks to work together at the same time, and this wait occurs while those tasks are being coordinated and synchronized.
If CXPACKET is the most serious wait in your system, the usual symptom is very high CPU usage.
Solution: adjust the max degree of parallelism setting appropriately.
As a general recommendation, if the system has more than 32 CPUs, set it to 8 or 4. If the workload consists mainly of very short, frequent statements, set it to 1.
The cost threshold for parallelism setting controls when the optimizer chooses a parallel plan: the smaller the value, the more readily a parallel plan is chosen. Keeping the default value is recommended.
Max degree of parallelism is an instance-level setting (starting with SQL Server 2016 it can also be set for a single database).
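The rule of thumb above can be written down as a small Python sketch that also produces the T-SQL needed to apply a value. The function names are hypothetical, the choice of 8 rather than 4 for large servers is an arbitrary pick between the two recommended values, and the sketch deliberately returns nothing for cases the text does not cover.

```python
def suggest_maxdop(cpu_count, mostly_short_statements=False):
    """Apply the rule of thumb from the text; return None when it gives no advice."""
    if mostly_short_statements:
        return 1
    if cpu_count > 32:
        return 8  # the text allows 8 or 4; 8 is an arbitrary pick here
    return None


def maxdop_statements(maxdop, per_database=False):
    """Return the T-SQL statements that would apply the given value."""
    if per_database:
        # Database-scoped setting, available from SQL Server 2016 onward.
        return [f"ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = {maxdop};"]
    # Instance-level setting; it is an advanced option, so those must be shown first.
    return [
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max degree of parallelism', {maxdop};",
        "RECONFIGURE;",
    ]


if __name__ == "__main__":
    value = suggest_maxdop(cpu_count=64)
    if value is not None:
        print("\n".join(maxdop_statements(value)))
```

Treat the generated statements as a starting point and test them against the real workload before changing a production instance.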
I/O waits
IO_COMPLETION, PAGEIOLATCH_, and WRITELOG are the most common disk-related waits. They differ in what they cover: IO_COMPLETION is mainly non-data page I/O, such as the disk access required by backup operations; PAGEIOLATCH_ is the disk wait associated with data pages; WRITELOG is related to the transaction log.
If these three are the dominant waits in the system, the disk is under pressure or has become the bottleneck.
Here we use PAGEIOLATCH_ as an example.
Official explanation of PAGEIOLATCH_: Occurs when a task is waiting for a latch on a buffer that is in an I/O request. The latch request is in "XX" mode. Long waits may indicate a problem with the disk subsystem.
Related PAGEIOLATCH_ wait types:
| Wait type | Description |
| --- | --- |
| PAGEIOLATCH_DT | Occurs when a task is waiting for a latch on a buffer in an I/O request. The latch request is in "destroy" mode. Long waits may indicate a problem with the disk subsystem. |
| PAGEIOLATCH_EX | Occurs when a task is waiting for a latch on a buffer in an I/O request. The latch request is in "exclusive" mode. Long waits may indicate a problem with the disk subsystem. |
| PAGEIOLATCH_KP | Occurs when a task is waiting for a latch on a buffer in an I/O request. The latch request is in "keep" mode. Long waits may indicate a problem with the disk subsystem. |
| PAGEIOLATCH_NL | For internal use only. |
| PAGEIOLATCH_SH | Occurs when a task is waiting for a latch on a buffer in an I/O request. The latch request is in "shared" mode. Long waits may indicate a problem with the disk subsystem. |
| PAGEIOLATCH_UP | Occurs when a task is waiting for a latch on a buffer in an I/O request. The latch request is in "update" mode. Long waits may indicate a problem with the disk subsystem. |
How should this official explanation be understood? First, any data the CPU works on is read from memory, so reading data always follows this path:
In Disk --> In Memory --> End Use
PAGEIOLATCH_ occurs in the first step, from disk to memory.
Take a read as an example: the data page to be read is not in memory, so it has to be read from disk, and a PAGEIOLATCH_ wait is generated while that read is in progress. If the disk is under heavy load and takes a long time to return the data, the PAGEIOLATCH_ wait gets longer and so does the statement's execution time.
Note: a large number of PAGEIOLATCH_ waits means that either the disk is under pressure (its speed cannot meet current business needs) or there is not enough memory to cache the frequently used business data, so the system has to go to disk often.
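The note above can be made concrete with a toy model in Python: page requests are replayed through a small cache, and every miss counts as a disk read, which is where a PAGEIOLATCH_ wait would occur. This is only a sketch: it uses a plain LRU policy, which is an assumption of the example and not a description of how SQL Server manages its buffer pool.

```python
from collections import OrderedDict


def count_disk_reads(page_requests, buffer_pages):
    """Replay page requests through an LRU cache with `buffer_pages` slots.

    Each miss stands for a page that must be read from disk first.
    """
    cache = OrderedDict()
    disk_reads = 0
    for page in page_requests:
        if page in cache:
            cache.move_to_end(page)  # hit: the page is already in memory
        else:
            disk_reads += 1  # miss: disk --> memory
            cache[page] = True
            if len(cache) > buffer_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return disk_reads


if __name__ == "__main__":
    workload = [1, 2, 3, 4] * 5  # the same four pages, requested repeatedly
    print(count_disk_reads(workload, buffer_pages=4))  # only the first pass reads from disk
    print(count_disk_reads(workload, buffer_pages=3))  # one slot too few: every request misses
```

With four slots the working set fits and only the first four requests hit the disk. With three slots, the same workload reads from disk on all twenty requests, which is the "memory is not enough to cache common business data" case.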
WRITELOG is another disk-related wait: the task is waiting for log records to be written, which means that log writes are noticeably slow. Writes usually fall behind for one of two reasons: the disk is under heavy load and its response time is long, or its actual speed cannot meet the read and write demand.
PAGELATCH_
PAGELATCH_ looks very similar to the PAGEIOLATCH_ described above; the only difference in the name is the missing IO in the middle.
In Disk --> In Memory --> End Use
The wait in the disk --> memory step is PAGEIOLATCH_, and the wait in the memory --> end use step is PAGELATCH_.
When the data page is already in memory and SQL Server wants to use it, SQL Server must first latch the page.
When PAGELATCH_ accounts for a large share of the waits, it tells us the following:
- SQL Server has no apparent memory and disk bottlenecks.
- The application sends a large number of concurrent statements that modify records in the same table, and the table schema design and business logic concentrate these modifications on the same page or on a small number of pages. These pages are sometimes called hot pages. Such bottlenecks usually occur only on typical OLTP systems with many concurrent users.
- This bottleneck cannot be solved by improving the hardware configuration. Only by modifying the table design or business logic and spreading the modifications to as many pages as possible can the concurrent performance be improved.
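To illustrate why spreading modifications helps, here is a toy Python model of where concurrent inserts land. Everything in it is an assumption made for illustration: the page capacity, the idea that a row's page follows directly from its key, and the hypothetical scheme of prefixing the key with a bucket number. It does not model SQL Server's real storage layout.

```python
from collections import Counter

ROWS_PER_PAGE = 100  # assumed page capacity for the toy model


def page_for_sequential_key(key):
    """Ever-increasing keys: neighbouring keys share a page."""
    return key // ROWS_PER_PAGE


def page_for_spread_key(key, buckets=16):
    """Hypothetical design: a leading bucket column (key % buckets) spreads neighbours."""
    bucket = key % buckets
    return (bucket, (key // buckets) // ROWS_PER_PAGE)


def busiest_page_share(pages):
    """Fraction of all modifications that hit the single busiest page."""
    counts = Counter(pages)
    return max(counts.values()) / len(pages)


if __name__ == "__main__":
    keys = list(range(1000, 1064))  # 64 concurrent inserts with consecutive keys
    sequential = [page_for_sequential_key(k) for k in keys]
    spread = [page_for_spread_key(k) for k in keys]
    print(busiest_page_share(sequential))  # all inserts compete for one page
    print(busiest_page_share(spread))      # the busiest page sees only a small share
```

In the sequential case every insert targets the same page, so the sessions queue up on its latch. In the spread case the same 64 inserts are distributed over 16 pages.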
PAGELATCH_ waits caused by TempDB are in fact also a kind of hot page. Here is a simple example:
The system shows a lot of PAGELATCH_UP waits. What creates the hot page, and why is it related to TempDB?
A wait resource of the form "2:X:X" starts with database ID 2, which is TempDB. A large number of highly concurrent statements that use temporary tables and table variables cause the TempDB bottleneck. See: Diagnosis and Optimization of TempDB.
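Checking whether waits point at TempDB comes down to reading the first number of the wait resource. The Python sketch below parses page resources of the form database:file:page and reports how many of them belong to database ID 2. The function names and the sample resource strings are made up for the example.

```python
TEMPDB_DATABASE_ID = 2  # TempDB always has database ID 2


def parse_page_resource(resource):
    """Split a page wait resource such as '2:1:103' into (db_id, file_id, page_id)."""
    parts = resource.strip().split(":")
    db_id, file_id, page_id = (int(part) for part in parts[:3])
    return db_id, file_id, page_id


def is_tempdb_page(resource):
    """True when the page belongs to TempDB."""
    return parse_page_resource(resource)[0] == TEMPDB_DATABASE_ID


def tempdb_share(resources):
    """Fraction of the given wait resources that point at TempDB pages."""
    resources = list(resources)
    if not resources:
        return 0.0
    hits = sum(1 for resource in resources if is_tempdb_page(resource))
    return hits / len(resources)


if __name__ == "__main__":
    observed = ["2:1:103", "2:1:103", "2:3:1", "5:1:200"]  # made-up sample
    print(tempdb_share(observed))
```

If most PAGELATCH_ wait resources turn out to be TempDB pages, the contention comes from temporary objects rather than from user tables.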
LCK_
There are many LCK_ wait types. If they account for a large share of the waits, statements in the system are blocking one another heavily. A familiar case: while one session is updating rows in a table, a select that needs those rows is blocked until the update completes. I won't go through many scenarios here; instead, here are the main ways to reduce this kind of wait:
- Optimize statements so that they run faster and hold locks for less time.
- Use batch operations instead of loops.
- Minimize the length of transactions.
- Try lowering the transaction isolation level.
- If none of the above helps, consider read/write splitting.
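The advice about batch operations and short transactions can be sketched in Python. The example uses the standard library's sqlite3 module purely as a stand-in so that it runs anywhere; SQLite's locking is different from SQL Server's, and the table and function names are invented for the illustration. It shows only the shape of the two approaches and does not measure blocking.

```python
import sqlite3


def make_db(n_rows):
    """Create an in-memory table with n_rows orders in status 'new'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, status) VALUES (?, 'new')",
        [(i,) for i in range(n_rows)],
    )
    conn.commit()
    return conn


def close_orders_in_loop(conn, ids):
    """One statement per row; the transaction stays open for the whole loop."""
    for order_id in ids:
        conn.execute("UPDATE orders SET status = 'done' WHERE id = ?", (order_id,))
    conn.commit()


def close_orders_in_batch(conn, ids):
    """One set-based statement in one short transaction."""
    ids = list(ids)
    placeholders = ", ".join("?" for _ in ids)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            f"UPDATE orders SET status = 'done' WHERE id IN ({placeholders})", ids
        )


def done_count(conn):
    return conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = 'done'"
    ).fetchone()[0]


if __name__ == "__main__":
    looped, batched = make_db(100), make_db(100)
    close_orders_in_loop(looped, range(10))
    close_orders_in_batch(batched, range(10))
    print(done_count(looped), done_count(batched))
```

Both functions leave the data in the same state. The difference is that the batch version sends a single statement and commits immediately, so its locks are held for a much shorter time than in the row-by-row loop.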
The LCK_ wait types include the following (not explained in detail here):
| Wait type | Description |
| --- | --- |
| LCK_M_RIn_NL | Occurs when a task is waiting to acquire a NULL lock on the current key value and an insert range lock between the current key and the previous key. A NULL lock on a key is a lock that is released immediately. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_RIn_S | Occurs when a task is waiting to acquire a shared lock on the current key value and an insert range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_RIn_U | Occurs when a task is waiting to acquire an update lock on the current key value and an insert range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_RIn_X | Occurs when a task is waiting to acquire an exclusive lock on the current key value and an insert range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_RS_S | Occurs when a task is waiting to acquire a shared lock on the current key value and a shared range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_RS_U | Occurs when a task is waiting to acquire an update lock on the current key value and an update range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_RX_S | Occurs when a task is waiting to acquire a shared lock on the current key value and an exclusive range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_RX_U | Occurs when a task is waiting to acquire an update lock on the current key value and an exclusive range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_RX_X | Occurs when a task is waiting to acquire an exclusive lock on the current key value and an exclusive range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_S | Occurs when a task is waiting to acquire a shared lock. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_SCH_M | Occurs when a task is waiting to acquire a schema modification lock. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_SCH_S | Occurs when a task is waiting to acquire a schema shared lock. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_SIU | Occurs when a task is waiting to acquire a shared intent update lock. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_SIX | Occurs when a task is waiting to acquire a shared intent exclusive lock. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_U | Occurs when a task is waiting to acquire an update lock. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_UIX | Occurs when a task is waiting to acquire an update intent exclusive lock. See sys.dm_tran_locks for the lock compatibility matrix. |
| LCK_M_X | Occurs when a task is waiting to acquire an exclusive lock. See sys.dm_tran_locks for the lock compatibility matrix. |
ASYNC_NETWORK_IO
This wait occurs when SQL Server has the data ready but the network or the client cannot take it fast enough, so SQL Server has nowhere to put the results it has already produced.
- This situation is generally not a database problem, and adjusting the database configuration will not be of great help.
- A bottleneck at the network layer is of course a possible cause: is it really necessary to return that much data?
- Performance issues on the application side can also cause ASYNC_NETWORK_IO waits in SQL Server. If you see this type of wait, check the health of the application and whether it is necessary for the application to request such a large result set from SQL Server.
- Check how the program consumes the result set, for example whether it does slow per-row processing while the result set is still being read.
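One way to keep the client from holding up the server is to read the whole result set first and do the slow work afterwards. The Python sketch below shows that pattern with `fetchmany`, which is part of the standard Python DB-API cursor interface. It uses sqlite3 only as a local stand-in so the example runs; the `drain` helper, the table, and the chunk size are assumptions made for the illustration.

```python
import sqlite3


def drain(cursor, chunk_size=1000):
    """Read every remaining row from the cursor before any processing starts."""
    rows = []
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        rows.extend(chunk)
    return rows


def total_amount(conn):
    """Fetch first, then process, so the server is not waiting on client-side work."""
    # Ask only for the column that is actually needed.
    cursor = conn.execute("SELECT amount FROM payments")
    rows = drain(cursor)
    # Any slow per-row work would go here, after the result set has been consumed.
    return sum(amount for (amount,) in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO payments (amount) VALUES (?)", [(i,) for i in range(2500)]
    )
    print(total_amount(conn))
```

Reading everything into memory only makes sense when the result set is reasonably small, which brings the question back to whether the application needs that much data at all.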