SQL Server - Solving system problems caused by session waits

The basic process for classifying and resolving waits:
 

 

Step 1. Locate the problem

Waits often reflect system problems directly. Common wait types can point to the system bottleneck, and combining them with performance counters usually locates it more accurately.
For example, a large number of I/O waits may indicate that the disk or memory is the reason statements run slowly, and that it is the bottleneck of the system.
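
Ranking waits by their share of total wait time can be sketched in a few lines of Python. This is only an illustration, not a monitoring tool: the sample figures are invented, and the function name `rank_waits` is hypothetical. In practice the rows would come from a source such as the `sys.dm_os_wait_stats` view (its `wait_type` and `wait_time_ms` columns).

```python
def rank_waits(rows):
    """Return (wait_type, share_percent) pairs, largest share first.

    rows: iterable of (wait_type, wait_time_ms) pairs.
    """
    rows = list(rows)
    total = sum(ms for _, ms in rows)
    if total == 0:
        return []
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(name, round(100.0 * ms / total, 1)) for name, ms in ranked]


# Hypothetical sample data, invented for illustration only.
sample = [
    ("CXPACKET", 1_000),
    ("PAGEIOLATCH_SH", 6_000),
    ("WRITELOG", 2_000),
    ("LCK_M_S", 1_000),
]

for name, share in rank_waits(sample):
    print(f"{name:<16} {share:5.1f}%")
```

With this sample, the I/O-related waits dominate, which is the kind of signal that would then be cross-checked against performance counters.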

Common wait types

      • CXPACKET : Occurs with parallel query plans when trying to synchronize the query processor exchange iterator. Consider reducing the degree of parallelism if contention on this wait type becomes a problem.
      • IO_COMPLETION : Occurs while waiting for an I/O operation to complete. Typically, this wait type represents non-data page I/O.
      • PAGEIOLATCH_ : Occurs when a task is waiting for a latch on a buffer that is in an I/O request.
      • PAGELATCH_ : Occurs when a task is waiting for a latch on a buffer that is not in an I/O request.
      • LCK_ : Occurs when a task is waiting to acquire a lock.
      • ASYNC_NETWORK_IO : Occurs on network writes when the task is blocked behind the network. Verify that the client is processing data from the server.
      • OLEDB : Occurs when SQL Server calls the Microsoft SQL Native Client OLE DB provider. This wait type is not used for synchronization. Instead, it indicates the duration of the call to the OLE DB provider.
      • WRITELOG : Occurs while waiting for a log flush to complete. Common operations that cause log flushes are checkpoints and transaction commits.
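
The list above amounts to a lookup from wait-type prefix to the area worth investigating. A minimal sketch of that lookup follows; the area labels are informal shorthand chosen for this example, and `wait_area` is a hypothetical helper name.

```python
# Prefix-to-area mapping based on the list above. The labels are informal
# shorthand for this sketch, not official terminology.
WAIT_AREAS = [
    ("CXPACKET", "parallelism"),
    ("IO_COMPLETION", "disk I/O (non-data pages)"),
    ("PAGEIOLATCH_", "disk I/O (data pages)"),
    ("PAGELATCH_", "in-memory page contention"),
    ("LCK_", "locking / blocking"),
    ("ASYNC_NETWORK_IO", "network or client"),
    ("OLEDB", "OLE DB provider call"),
    ("WRITELOG", "transaction log I/O"),
]


def wait_area(wait_type):
    """Return the area a wait type points to, or 'other' if it is not listed."""
    for prefix, area in WAIT_AREAS:
        if wait_type.startswith(prefix):
            return area
    return "other"


for name in ("PAGEIOLATCH_SH", "PAGELATCH_UP", "LCK_M_X", "WRITELOG"):
    print(name, "->", wait_area(name))
```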
 
 
 
 
 

Step 2. Analysis

Problems and solutions

CXPACKET 

The CXPACKET wait can be loosely understood as a CPU-related wait that occurs mainly in parallel plans. Because a parallel plan needs multiple tasks to work together at the same time, this wait occurs while those tasks are being coordinated, for example while work is distributed among them.

If CXPACKET is the most serious wait in your system, the usual symptom is very high CPU usage.

 

Solution: Adjust the degree of parallelism appropriately

 

 

 

 

As a general recommendation, if the system has more than 32 CPUs, set the max degree of parallelism to 8 or 4. If the workload consists of very short, frequent statements, consider setting it to 1.

The cost threshold for parallelism mainly controls when the SQL optimizer selects a parallel plan. The default value is recommended. The smaller the value, the more readily the optimizer selects a parallel plan.

The degree of parallelism is an instance-level setting (starting with SQL Server 2016, it can also be set for a single database).
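
The rule of thumb above can be written down as a small helper. This is a sketch of the article's guidance only, not a tuning tool: the function names are hypothetical, and the helper returns an empty list where the text gives no recommendation. The generated statements use `sp_configure 'max degree of parallelism'` for the instance level and `ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP` for a single database.

```python
def suggest_maxdop(cpu_count, short_frequent_statements=False):
    """Return candidate MAXDOP values following the rule of thumb in the text.

    An empty list means the text gives no guidance for this input.
    """
    if short_frequent_statements:
        return [1]
    if cpu_count > 32:
        return [8, 4]
    return []


def instance_maxdop_sql(value):
    """Build the T-SQL that sets the instance-level option."""
    return (
        "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;\n"
        f"EXEC sp_configure 'max degree of parallelism', {int(value)}; RECONFIGURE;"
    )


def database_maxdop_sql(value):
    """Build the T-SQL for the per-database setting (SQL Server 2016 and later)."""
    return f"ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = {int(value)};"


candidates = suggest_maxdop(cpu_count=64)
print(candidates)
print(instance_maxdop_sql(candidates[0]))
print(database_maxdop_sql(candidates[1]))
```

Any value chosen this way is a starting point to test against the real workload before it is applied in production.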

I/O waits

  IO_COMPLETION, PAGEIOLATCH_, and WRITELOG are the three most common disk-related waits. They differ as follows: IO_COMPLETION mainly covers non-data page I/O, such as the disk activity required by backup operations; PAGEIOLATCH_ is the disk wait associated with data pages; WRITELOG is related to the transaction log.

  If these three are the main waits in the system, the disk is under pressure or has become a bottleneck.

  PAGEIOLATCH_ serves as the example here.

  Official explanation of PAGEIOLATCH_: Occurs when a task is waiting for a latch on a buffer that is in an I/O request. The latch request is in "XX" mode. Long waits may indicate a problem with the disk subsystem.

  The PAGEIOLATCH_ wait types:

 

PAGEIOLATCH_DT

Occurs when a task is waiting for a latch on a buffer in an I/O request. The latch request is in "destroy" mode. Long waits may indicate a problem with the disk subsystem.

PAGEIOLATCH_EX

Occurs when a task is waiting for a latch on a buffer in an I/O request. The latch request is in "exclusive" mode. Long waits may indicate a problem with the disk subsystem.

PAGEIOLATCH_KP

Occurs when a task is waiting for a latch on a buffer in an I/O request. The latch request is in "keep" mode. Long waits may indicate a problem with the disk subsystem.

PAGEIOLATCH_NL

For internal use only.

PAGEIOLATCH_SH

Occurs when a task is waiting for a latch on a buffer in an I/O request. The latch request is in "shared" mode. Long waits may indicate a problem with the disk subsystem.

PAGEIOLATCH_UP

Occurs when a task is waiting for a latch on a buffer in an I/O request. The latch request is in "update" mode. Long waits may indicate a problem with the disk subsystem.
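
The suffix of each wait type in the list above encodes the latch mode, so a name can be decoded mechanically. A minimal sketch, with the hypothetical helper name `describe_page_latch_wait`:

```python
# Suffix-to-mode mapping from the list above. KP is called "keep" mode in
# Microsoft's documentation; NL is described only as internal.
LATCH_MODES = {
    "DT": "destroy",
    "EX": "exclusive",
    "KP": "keep",
    "NL": "internal use only",
    "SH": "shared",
    "UP": "update",
}


def describe_page_latch_wait(wait_type):
    """Split a PAGEIOLATCH_xx or PAGELATCH_xx name into (kind, mode)."""
    for prefix, kind in (("PAGEIOLATCH_", "I/O"), ("PAGELATCH_", "in-memory")):
        if wait_type.startswith(prefix):
            suffix = wait_type[len(prefix):]
            return kind, LATCH_MODES.get(suffix, "unknown")
    raise ValueError(f"not a page latch wait type: {wait_type}")


print(describe_page_latch_wait("PAGEIOLATCH_SH"))
print(describe_page_latch_wait("PAGELATCH_UP"))
```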

How should this official explanation be understood? First, any data the CPU operates on is read from memory, so reading data always follows this path:

    On disk --> In memory --> Final use

PAGEIOLATCH_ occurs in the first step: on disk --> in memory.

Take a read as an example: the data page to be read is not in memory, so it must be read from disk. While the page is being read from disk, a PAGEIOLATCH_ wait is generated. If the disk is under heavy pressure and the data takes a long time to come back, the PAGEIOLATCH_ wait grows longer, and so does the statement's execution time.

 

Note: A large number of PAGEIOLATCH_ waits means either that the disk may be under pressure (its speed cannot meet current business needs) or that memory is too small to cache frequently used business data, so pages are constantly exchanged with the disk.
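
The effect of too little memory can be illustrated with a toy model. The sketch below is not how SQL Server's buffer pool actually works: it assumes a simple least-recently-used cache, and the class name `ToyBufferPool` and the page numbers are invented. Every miss stands for a read from disk, which is where a PAGEIOLATCH_ wait would occur.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Toy LRU page cache. Each miss stands for one read from disk."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.disk_reads = 0

    def read_page(self, page_id):
        if page_id in self.pages:
            # Page is already in memory: no disk access needed.
            self.pages.move_to_end(page_id)
            return "memory"
        # Page is not in memory: it has to be brought in from disk.
        self.disk_reads += 1
        self.pages[page_id] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return "disk"


# Hypothetical workload: the same three pages are read over and over.
workload = [1, 2, 3] * 3

big = ToyBufferPool(capacity=3)
small = ToyBufferPool(capacity=2)
for page in workload:
    big.read_page(page)
    small.read_page(page)

print("disk reads with room for 3 pages:", big.disk_reads)
print("disk reads with room for 2 pages:", small.disk_reads)
```

In this model, the cache that holds the whole working set goes to disk once per page, while the slightly smaller cache goes to disk on every read.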

 

 

WRITELOG is another disk-related wait: the task is waiting for log records to be written, which means log writes are not keeping up. There are generally two reasons for this: either the disk is under heavy pressure and response times are long, or the disk's raw speed simply cannot meet the read and write demand.

 

PAGELATCH_ 

PAGELATCH_ looks very similar to the PAGEIOLATCH_ described above, but the key "IO" is missing from the middle of the name.

    On disk --> In memory --> Final use

The wait for on disk --> in memory is PAGEIOLATCH_, and the wait for in memory --> final use is PAGELATCH_.

When the data page is already in memory and SQL Server wants to use it, SQL Server places a latch on the page.

When PAGELATCH_ accounts for many of the waits, the following can be concluded:

  1. SQL Server has no apparent memory and disk bottlenecks.
  2. The application sends a large number of concurrent statements that modify records in the same table, and the table schema design and user business logic concentrate these modifications on the same page or on a small number of pages. These pages are sometimes called hot pages. Such bottlenecks usually occur only on typical OLTP systems with many concurrent users.
  3. This bottleneck cannot be solved by improving the hardware configuration. Only by modifying the table design or business logic and spreading the modifications to as many pages as possible can the concurrent performance be improved.
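
How key design concentrates or spreads modifications can be shown with a toy model. Everything here is an assumption made for illustration: rows are laid out in key order with a fixed number of rows per page, and the two key schemes are hypothetical designs, not recommendations from the text.

```python
from collections import Counter

ROWS_PER_PAGE = 100  # hypothetical page capacity for this toy model


def page_of(key):
    """Toy layout: rows are stored in key order, ROWS_PER_PAGE per page."""
    return key // ROWS_PER_PAGE


def pages_touched(keys):
    """Count how many inserts land on each page."""
    return Counter(page_of(k) for k in keys)


# Eight sessions insert one row each at the same moment.
# Design A: one ever-increasing key shared by all sessions.
sequential_keys = [5000 + i for i in range(8)]
# Design B (hypothetical): the session number leads the key, so each
# session writes into its own key range.
spread_keys = [session * 1_000_000 + 5000 for session in range(8)]

print("design A pages:", len(pages_touched(sequential_keys)))
print("design B pages:", len(pages_touched(spread_keys)))
```

In design A all eight inserts compete for the same page, which is the hot page pattern described above. In design B each insert lands on a different page.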

 

PAGELATCH_ waits caused by TempDB (which is in fact another kind of hot page). A simple example:

    The system shows a large number of PAGELATCH_UP waits. What makes the page hot, and why is it related to TempDB?

    The wait resource "2:X:X" begins with database ID 2, which is TempDB. A large number of highly concurrent statements in the system use temporary tables and table variables, which creates the TempDB bottleneck. See: Diagnosis and Optimization of TempDB.
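
Checking whether a page wait resource belongs to TempDB can be automated. The sketch below assumes the plain three-part form `database_id:file_id:page_id` shown above; the helper name `parse_page_resource` is hypothetical, and other resource formats are rejected rather than guessed at.

```python
TEMPDB_DATABASE_ID = 2  # tempdb has database ID 2


def parse_page_resource(resource):
    """Parse a page wait resource of the form 'database_id:file_id:page_id'."""
    parts = resource.strip().rstrip(":").split(":")
    if len(parts) != 3:
        raise ValueError(f"unexpected wait resource format: {resource!r}")
    database_id, file_id, page_id = (int(part) for part in parts)
    return {
        "database_id": database_id,
        "file_id": file_id,
        "page_id": page_id,
        "is_tempdb": database_id == TEMPDB_DATABASE_ID,
    }


print(parse_page_resource("2:1:1"))
print(parse_page_resource("5:1:104"))
```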

 

LCK_ 

There are many LCK_ wait types. If these waits occur in large numbers, statements in the system are seriously blocking one another. A familiar example: while a table is being updated, a select against it can be blocked until the update completes. This section does not walk through every scenario; instead, here are the main ways to reduce this kind of wait:

    1. Statement optimization makes statements execute faster and reduces waiting time.
    2. Use batch operations instead of loops.
    3. Minimize the length of transactions.
    4. Try lowering the transaction isolation level.
    5. If none of the above helps, consider read/write splitting.
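
Points 2 and 3 above can be illustrated with a small runnable comparison. The sketch uses Python's built-in `sqlite3` module purely as a stand-in so that it runs anywhere; it does not reproduce SQL Server's locking behavior, and the `orders` table and function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(i, "new") for i in range(1, 101)],
)
conn.commit()


def close_orders_row_by_row(conn, ids):
    """Loop version: one statement per row inside one long transaction."""
    statements = 0
    for order_id in ids:
        conn.execute("UPDATE orders SET status = 'closed' WHERE id = ?", (order_id,))
        statements += 1
    conn.commit()
    return statements


def close_orders_set_based(conn, low, high):
    """Batch version: a single statement in a short transaction."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE orders SET status = 'closed' WHERE id BETWEEN ? AND ?",
            (low, high),
        )
    return cursor.rowcount


print("statements in loop version:", close_orders_row_by_row(conn, range(1, 51)))
print("rows changed by one statement:", close_orders_set_based(conn, 51, 100))
```

Both versions change the same number of rows, but the batch version issues one statement and holds its transaction open for a much shorter time, which is the property that reduces blocking.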
 

The LCK_ wait types include the following (not explained in detail here):

LCK_M_RIn_NL

Occurs when a task is waiting to acquire a NULL lock on the current key value and an insert range lock between the current key and the previous key. A NULL lock on a key is a lock that is released immediately. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_RIn_S

Occurs when a task is waiting to acquire a shared lock on the current key value and an insert range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_RIn_U

Occurs when a task is waiting to acquire an update lock on the current key value and an insert range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_RIn_X

Occurs when a task is waiting to acquire an exclusive lock on the current key value and an insert range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_RS_S

Occurs when a task is waiting to acquire a shared lock on the current key value and a shared range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_RS_U

Occurs when a task is waiting to acquire an update lock on the current key value and an update range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_RX_S

Occurs when a task is waiting to acquire a shared lock on the current key value and an exclusive range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_RX_U

Occurs when a task is waiting to acquire an update lock on the current key value and an exclusive range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_RX_X

Occurs when a task is waiting to acquire an exclusive lock on the current key value and an exclusive range lock between the current key and the previous key. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_S

Occurs when a task is waiting to acquire a shared lock. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_SCH_M

Occurs when a task is waiting to acquire a schema modification lock. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_SCH_S

Occurs when a task is waiting to acquire a schema shared lock. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_SIU

Occurs when a task is waiting to acquire a shared with intent update lock. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_SIX

Occurs when a task is waiting to acquire a shared with intent exclusive lock. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_U

Occurs when a task is waiting to acquire an update lock. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_UIX

Occurs when a task is waiting to acquire an update with intent exclusive lock. See sys.dm_tran_locks for the lock compatibility matrix.

LCK_M_X

Occurs when a task is waiting to acquire an exclusive lock. See sys.dm_tran_locks for the lock compatibility matrix.

ASYNC_NETWORK_IO 

  This wait occurs when SQL Server has the data ready, but the network cannot transmit it fast enough, so SQL Server has nowhere to put the data it wants to send.

  1. This situation is generally not a database problem, and adjusting the database configuration will not be of great help.
  2. A bottleneck at the network layer is of course a possible cause: is it really necessary to return that much data?
  3. Performance issues on the application side can also cause ASYNC_NETWORK_IO waits in SQL Server. If you see this type of wait, check the health of the application and whether it is necessary for the application to request such a large result set from SQL Server.
  4. Also check how the program retrieves and processes the returned result set.
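
On the application side, two habits follow from the points above: request only the data that is needed, and read the result set promptly instead of doing slow work between rows. The sketch below shows both using the standard DB-API `fetchmany()` call. It runs against Python's built-in `sqlite3` only as a stand-in; the `events` table and the helper name `drain_in_batches` are invented for the example.

```python
import sqlite3


def drain_in_batches(cursor, batch_size=1000):
    """Read the whole result set promptly using DB-API fetchmany()."""
    rows = []
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        rows.extend(batch)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("x",)] * 2500)
conn.commit()

# Ask only for the column and the rows that are actually needed ...
cursor = conn.execute(
    "SELECT id FROM events WHERE id <= ? ORDER BY id", (1200,)
)
# ... read them off the connection first, then do the slow work afterwards.
rows = drain_in_batches(cursor, batch_size=500)
slow_results = [row[0] * 2 for row in rows]

print(len(rows), "rows fetched")
```

Draining first trades client memory for a shorter time spent holding the server's result set open, so it suits result sets that comfortably fit in memory.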
