MySQL thread pool: probably the most comprehensive and practical post on the subject! (Reprint)


Recently, abnormal behavior in upstream components triggered an avalanche on our DB, so I enabled the thread pool on some of the DB instances I monitor. While using the thread pool and studying it more deeply, I also ran into quite a few problems.

This article explains the MySQL thread pool in detail, to help DBAs quickly understand MySQL's thread pool mechanism, configure it quickly, and avoid some of its existing pitfalls. In fact, I would say that to understand and use the MySQL thread pool, this article is all you need.

I. Why use the MySQL thread pool

Before explaining why to use a thread pool: as we all know, as traffic to the DB grows, the DB's response time grows along with it, as shown below:

Once access to the DB reaches a certain level, DB throughput actually declines, and the decline gets worse as load keeps rising, as shown below:

So is there a way to keep the DB performing at its best as traffic keeps increasing, similar to the curve in the following figure?

The answer is today's topic, the thread pool. In summary, there are two reasons to use a thread pool:

1. Reduce the overhead of repeatedly creating and destroying threads, improving performance

The thread pool creates a certain number of threads in advance. When a new request arrives, the pool assigns one of its existing threads to serve it, and when the work is done the thread is not destroyed but goes on to process other requests. This avoids frequently creating and destroying threads and their memory objects, reduces context switching, and improves resource utilization, thereby improving the performance and stability of the system to some extent.

2. Protect the system

The thread pool limits the number of concurrent threads, which amounts to limiting the number of running MySQL threads. No matter how many connection requests the system currently has, anything beyond the configured maximum number of threads has to queue. This keeps the system at a stable performance level, prevents a DB avalanche, and protects the underlying DB.
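Both points can be illustrated with a minimal Python sketch (a toy model, not MySQL's implementation): a fixed set of pre-created worker threads pulls requests from a shared queue. Because threads are created once and reused, 100 requests are served by at most `pool_size` threads, and at most `pool_size` requests ever run at the same time. The function name `run_with_pool` and the `time.sleep` standing in for statement execution are illustrative assumptions; the real MySQL pool caps active threads per group and can still create extra threads up to `thread_pool_max_threads`.

```python
import queue
import threading
import time


def run_with_pool(num_requests: int, pool_size: int, work_seconds: float = 0.005):
    """Serve num_requests jobs with pool_size pre-created worker threads.

    Returns (number of distinct threads that did work, peak concurrent jobs).
    """
    jobs = queue.Queue()
    lock = threading.Lock()
    running = 0
    peak = 0
    used_threads = set()

    def worker() -> None:
        nonlocal running, peak
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work, let this thread exit
                return
            with lock:
                running += 1
                peak = max(peak, running)
                used_threads.add(threading.get_ident())
            time.sleep(work_seconds)  # stand-in for executing one statement
            with lock:
                running -= 1

    # Threads are created once, before any request arrives, and then reused.
    workers = [threading.Thread(target=worker) for _ in range(pool_size)]
    for t in workers:
        t.start()
    for request_id in range(num_requests):
        jobs.put(request_id)
    for _ in workers:
        jobs.put(None)  # one shutdown sentinel per worker
    for t in workers:
        t.join()
    return len(used_threads), peak


if __name__ == "__main__":
    threads_used, peak = run_with_pool(num_requests=100, pool_size=4)
    print(f"threads used: {threads_used}, peak concurrency: {peak}")
```

Compared with spawning one thread per request, the pool pays the thread-creation cost only `pool_size` times, and the queue absorbs any burst instead of letting it turn into more running threads.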

Some may ask: can't a connection pool achieve a similar effect?

Some DBAs may confuse the thread pool with a connection pool, but the two are very different. A connection pool is usually set up on the client side, while the thread pool is configured on the DB server. In addition, a connection pool avoids frequently creating and destroying connections, but it cannot control the number of active MySQL threads, so in high-concurrency scenarios it cannot protect the DB. A better approach is to use connection pooling and thread pooling together.

II. The MySQL thread pool

About MySQL Thread Pool

The thread pool was introduced to address the problems of the one-thread-per-connection model, which frequently creates and destroys large numbers of threads and is prone to DB avalanches under high concurrency, so that the DB can still maintain high performance in a highly concurrent environment.

Both Oracle and MariaDB have released thread pool solutions. Oracle's is implemented as the Thread Pool plugin and is only included in the Enterprise edition. Percona ported MariaDB's thread pool and optimized it further. This article is based on Percona Server for MySQL 5.7.

MySQL thread pool architecture

MySQL's thread pool is divided into multiple groups, and each group has its own worker threads. The overall logic is fairly complicated, so here I try to explain how the MySQL thread pool works in a simplified way.

1. Architecture diagram

First, let's look at the Thread Pool architecture diagram.

2. Thread Pool components

As the architecture diagram shows, the Thread Pool consists of one Timer thread and multiple Thread Groups, and each Thread Group in turn consists of two queues, one listener thread and multiple worker threads. The function of each part is described below:

  • Queues (a high-priority queue and a low-priority queue)

The queues store the I/O tasks waiting to be executed. There is a high-priority queue and a low-priority queue, and tasks in the high-priority queue are processed first.

Which tasks are placed in the high-priority queue?

Statements inside an already-started transaction are placed in the high-priority queue. For example, if a transaction contains two UPDATE statements and one of them has already executed, the other UPDATE is placed in the high-priority queue. Note that statements on a non-transactional engine, or on a transactional engine with autocommit enabled, are placed in the low-priority queue.

There is one more case in which a task goes to the high-priority queue: if a statement stays in the low-priority queue for too long, it is moved to the high-priority queue to prevent starvation.

  • listener thread

The listener thread listens for statements on its thread group's connections and decides whether to turn itself into a worker thread and execute a statement immediately, or to put the statement in the queue. The criterion is whether there are statements waiting in the queue.

If the number of statements waiting in the queue is 0, the listener thread turns into a worker thread and executes the statement immediately. If it is not 0, there are already pending tasks, so the statement is put in the queue for another thread to handle. The point of this mechanism is to reduce thread creation, because ordinary SQL usually executes very quickly.

  • worker thread

The worker thread is the thread that actually does the work.

  • Timer thread

The Timer thread periodically checks whether each group is stalled; when a group is stalled, it resolves the situation by waking up a worker thread or creating a new one.

Detection works as follows: the timer uses the value of queue_event_count together with whether the I/O queue is empty to decide whether the thread group is blocked.

Each time a worker thread checks the queue for a task, queue_event_count is incremented by 1; each time the Timer finishes checking whether the group is blocked, it resets queue_event_count to 0. If, at check time, the task queue is not empty and queue_event_count is 0, the queued tasks are not being processed normally and the group is blocked. The Timer then wakes up a worker thread or creates a new worker thread to handle the queue, preventing the group from staying blocked for a long time.
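The interplay of the two queues, the listener's decision and the Timer's stall check can be modelled in a few lines. The Python sketch below is a simplified model built only from the description above, not Percona's source: `ThreadGroup`, `Statement`, the method names and the 1-second `starvation_limit` used for aging are hypothetical. Tickets, I/O polling and real thread management are left out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Statement:
    conn_id: int
    in_transaction: bool  # part of an already-started transaction?
    enqueued_at: float = 0.0


@dataclass
class ThreadGroup:
    high: deque = field(default_factory=deque)
    low: deque = field(default_factory=deque)
    queue_event_count: int = 0
    starvation_limit: float = 1.0  # hypothetical aging threshold in seconds

    def queue_len(self) -> int:
        return len(self.high) + len(self.low)

    def enqueue(self, stmt: Statement, now: float) -> None:
        stmt.enqueued_at = now
        # Statements of an open transaction go to the high-priority queue.
        (self.high if stmt.in_transaction else self.low).append(stmt)

    def listener_on_event(self, stmt: Statement, now: float) -> str:
        # Listener: if nothing is waiting, become a worker and run it right away;
        # otherwise queue it so another worker picks it up.
        if self.queue_len() == 0:
            return "execute_now"
        self.enqueue(stmt, now)
        return "queued"

    def worker_dequeue(self, now: float) -> Optional[Statement]:
        self.queue_event_count += 1  # a worker looked at the queue
        # Aging: move low-priority statements that waited too long to the high queue.
        while self.low and now - self.low[0].enqueued_at > self.starvation_limit:
            self.high.append(self.low.popleft())
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None

    def timer_check(self) -> bool:
        """True if the group looks stalled: work is queued but no worker touched
        the queue since the previous check. The real timer would then wake or
        create a worker. The counter is reset after every check."""
        stalled = self.queue_len() > 0 and self.queue_event_count == 0
        self.queue_event_count = 0
        return stalled


if __name__ == "__main__":
    g = ThreadGroup()
    print(g.listener_on_event(Statement(1, False), now=0.0))  # execute_now
    g.enqueue(Statement(2, False), now=0.0)
    print(g.listener_on_event(Statement(3, True), now=0.1))   # queued
    print(g.worker_dequeue(now=0.2).conn_id)                  # 3 (high priority first)
    print(g.timer_check(), g.timer_check())                   # False True
```

The second `timer_check` reports a stall because statement 2 is still queued and no worker has looked at the queue since the first check.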

3. How does the Thread Pool work?

The following is a minimal description of how the Thread Pool operates. It omits a lot of complex logic, so please don't nitpick.

Step 1: When a request connects to MySQL, the group it falls into is determined by threadid % thread_pool_size;

Step 2: When the group's listener thread detects a new request, it checks whether the queue holds unprocessed requests. If not, it turns itself into a worker thread and processes the request immediately; if there are pending requests, it puts the new request in the queue for another thread to process;

Step 3: The group's worker threads check the queue for requests. If there is one, they process it; if not, they sleep, and a thread that is not woken up within thread_pool_idle_timeout exits automatically. Before taking a request, a worker first checks whether the number of running threads in the group has reached thread_pool_oversubscribe + 1; if it has, the worker goes to sleep;

Step 4: The Timer thread periodically checks whether each group is blocked; if so, it wakes up a worker thread or creates a new one.
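The decision a worker makes in Step 3 can be sketched as a small pure function. This is a hypothetical model, not server code: `worker_next_action` and its arguments are illustrative, `active_in_group` is assumed to exclude the calling worker, and the 60-second default mirrors the default of thread_pool_idle_timeout. The exact comparison used by the server may differ slightly.

```python
def worker_next_action(active_in_group: int, oversubscribe: int, queue_len: int,
                       idle_seconds: float, idle_timeout: float = 60.0) -> str:
    """Decide what a worker in one thread group does next (simplified Step 3)."""
    # Checked first: the group may run at most oversubscribe + 1 statements at once.
    if active_in_group >= oversubscribe + 1:
        return "sleep"
    if queue_len > 0:
        return "process"
    # Nothing to do: keep waiting until idle_timeout, then let the thread exit.
    if idle_seconds > idle_timeout:
        return "exit"
    return "wait"


if __name__ == "__main__":
    print(worker_next_action(4, 3, queue_len=5, idle_seconds=0))   # sleep
    print(worker_next_action(2, 3, queue_len=5, idle_seconds=0))   # process
    print(worker_next_action(2, 3, queue_len=0, idle_seconds=61))  # exit
```

Checking the concurrency cap before looking at the queue is what makes a group with a few long-running statements stop accepting work, even when plenty of requests are waiting.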

4. Thread Pool distribution mechanism

The thread pool is divided into multiple groups according to the thread_pool_size parameter, and each group maintains its own share of client connections. When a client connects to MySQL, MySQL takes the connection's thread id (thread_id) modulo thread_pool_size to decide which group it falls into.

The thread_pool_oversubscribe parameter controls the maximum number of concurrent threads in each group: each group can run at most thread_pool_oversubscribe + 1 threads concurrently. If a group has reached this maximum, its connections have to wait. In several scenarios, this allocation mechanism can cause slow SQL to pile up in one group and make ordinary SQL in that group run for a long time; this issue is described in detail later.
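The distribution rule and the per-group limit can be checked with simple arithmetic. The sketch below uses hypothetical helper names (`group_of`, `runnable_now`), assumes all listed connections want to run a statement at the same moment, and ignores the listener thread and stall handling. It shows that the overall ceiling of thread_pool_size x (thread_pool_oversubscribe + 1) is only reachable when connections spread evenly across groups.

```python
from collections import Counter


def group_of(thread_id: int, thread_pool_size: int) -> int:
    """Group a connection falls into: its thread id modulo thread_pool_size."""
    return thread_id % thread_pool_size


def runnable_now(thread_ids, thread_pool_size: int, oversubscribe: int):
    """For connections that all want to run a statement at the same moment,
    return (statements that can run immediately, statements that must wait)."""
    per_group_cap = oversubscribe + 1
    per_group = Counter(group_of(t, thread_pool_size) for t in thread_ids)
    running = sum(min(n, per_group_cap) for n in per_group.values())
    return running, len(thread_ids) - running


if __name__ == "__main__":
    size, oversub = 24, 3
    print(size * (oversub + 1))                         # 96: the overall ceiling
    print(runnable_now(range(1, 97), size, oversub))    # (96, 0): evenly spread
    skewed = [4 + 24 * k for k in range(8)]             # all land in group 4
    print(runnable_now(skewed, size, oversub))          # (4, 4)
```

In the skewed case four statements wait even though the other 23 groups still have 92 free slots between them.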

MySQL Thread Pool Parameters

The thread pool parameters can be listed with show variables like 'thread%'. Each of them is explained below:

  • thread_handling

This parameter configures the threading model. The default, one-thread-per-connection, means the thread pool is not enabled; setting it to pool-of-threads enables the thread pool.

  • thread_pool_size

This parameter sets the number of groups in the thread pool. It defaults to the number of CPUs, to make full use of CPU resources.

  • thread_pool_oversubscribe

This parameter sets the maximum number of threads in a group: each group can have at most thread_pool_oversubscribe + 1 threads. Note that the listener thread is not counted.

  • thread_pool_high_prio_mode

This parameter controls the high-priority queue. It has three possible values (transactions / statements / none), with transactions as the default. Their meanings are as follows:

transactions: statements from transactions that have already started go into the high-priority queue, subject to the thread_pool_high_prio_tickets parameter.

statements: in this mode, all statements go into the high-priority queue and the low-priority queue is not used.

none: This mode does not use the high priority queue.

  • thread_pool_high_prio_tickets

This parameter controls how many times each connection may be placed in the high-priority queue; the default is 4294967295. Note that this parameter only takes effect when thread_pool_high_prio_mode is transactions.

  • thread_pool_idle_timeout

The maximum idle time of a worker thread. The default is 60 seconds; an idle worker that exceeds this limit exits.

  • thread_pool_max_threads

This parameter limits the maximum number of threads in the thread pool; once the limit is reached, no more threads can be created. The default is 100000.

  • thread_pool_stall_limit

This parameter sets the interval at which the timer thread checks whether a group is stalled; the default is 500ms.
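How thread_pool_high_prio_mode and thread_pool_high_prio_tickets interact can be sketched as a routing rule. This follows the parameter descriptions above; the `Connection` class and `goes_to_high_prio` are hypothetical, and spending exactly one ticket per high-priority placement is an assumption of this sketch. Whether and when the server replenishes tickets is not modelled.

```python
from dataclasses import dataclass

DEFAULT_HIGH_PRIO_TICKETS = 4294967295  # default value quoted above


@dataclass
class Connection:
    in_transaction: bool
    tickets: int = DEFAULT_HIGH_PRIO_TICKETS


def goes_to_high_prio(conn: Connection, mode: str) -> bool:
    """Simplified routing rule for thread_pool_high_prio_mode."""
    if mode == "statements":
        return True                      # every statement, tickets ignored
    if mode == "none":
        return False                     # high-priority queue unused
    if mode == "transactions":
        if conn.in_transaction and conn.tickets > 0:
            conn.tickets -= 1            # assumption: one ticket per placement
            return True
        return False
    raise ValueError(f"unknown thread_pool_high_prio_mode: {mode!r}")


if __name__ == "__main__":
    c = Connection(in_transaction=True, tickets=2)
    print([goes_to_high_prio(c, "transactions") for _ in range(3)])
```

With two tickets, the first two statements of the open transaction jump the queue and the third falls back to the low-priority queue. With the huge default, tickets effectively never run out.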

III. Using the MySQL thread pool

Using the thread pool is relatively simple: just add the configuration and restart the instance.

The configuration is as follows:

#thread pool
thread_handling=pool-of-threads
thread_pool_oversubscribe=3
thread_pool_size=24
performance_schema=off
#extra connection
extra_max_connections = 8
extra_port = 33333

Note: all other parameters keep their default values.

These parameters were all described above; two points in this configuration deserve attention:

1. performance_schema = off was added because testing showed a memory leak when the thread pool and Performance Schema (PS) are both enabled (details later);

2. The extra connection settings are added so that you can still log in to MySQL when the thread pool is full. This port is meant for administration and should be kept for emergencies;

After restarting the instance, you can run show variables like '%thread%'; to check whether the configured parameters have taken effect.
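If you check several instances, comparing the output against the intended values can be scripted. A minimal sketch, assuming you have already fetched the (Variable_name, Value) rows of show variables like '%thread%' with whatever MySQL client you use; `config_mismatches` is a hypothetical helper, and the sample rows below are made up for illustration. Note that extra_port, extra_max_connections and performance_schema do not match '%thread%' and need a separate query.

```python
def config_mismatches(rows, expected):
    """Compare SHOW VARIABLES output with the values you meant to configure.

    rows: iterable of (Variable_name, Value) pairs.
    expected: {variable_name: expected_value_as_string}
    Returns {variable_name: (expected, actual_or_None)} for every mismatch.
    """
    actual = {name: str(value) for name, value in rows}
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


if __name__ == "__main__":
    wanted = {
        "thread_handling": "pool-of-threads",
        "thread_pool_oversubscribe": "3",
        "thread_pool_size": "24",
    }
    # Made-up rows for illustration; in practice fetch them from the server.
    sample = [
        ("thread_handling", "one-thread-per-connection"),
        ("thread_pool_oversubscribe", "3"),
        ("thread_pool_size", "24"),
    ]
    print(config_mismatches(sample, wanted))
```

An empty result means every listed variable matches; a remaining thread_handling mismatch usually means the instance was not restarted after editing my.cnf.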

IV. Problems encountered in use

While using the thread pool I ran into a few problems, which I summarize here:

Memory leak problem

After the thread pool was enabled on the DB, memory usage jumped by about 8G, as shown below:

Not only did memory jump by about 8G after the thread pool was enabled, it also kept growing, which clearly points to a memory leak after enabling the thread pool.

Many other people have run into this problem as well, and it has been confirmed as a Percona bug (jira.percona.com/browse/PS-3...).

The following compares memory usage after PS was turned off:

Note: Percona Server 5.7.21-20 has fixed the memory leak that occurs when the thread pool and PS are enabled together. In my tests the problem is resolved, so you can use Percona Server 5.7.21-20 directly, as shown below.

Dial-test (health probe) anomalies

Enabling the thread pool effectively limits the number of concurrent MySQL threads. Once the maximum is reached, other threads must wait, and new connections get stuck at the connection authentication step. This makes the dial-test program time out when connecting to MySQL, and the dial test returns the following error:

When the dial test times out connecting to the instance, it assumes the master has a problem. In extreme cases, if several retries all fail, an automatic failover is initiated and the service is switched to the slave.

There are two solutions to this situation:

1. Enable the MySQL bypass management port, and have monitoring and high-availability components connect to MySQL through it.

Concretely: add the following to my.cnf and restart. You can then log in to MySQL through the bypass port without being affected by the thread pool's maximum number of threads:

extra_max_connections = 8
extra_port = 33333

Note: whenever the thread pool is enabled, it is recommended to add this configuration as well, to make emergency troubleshooting easier.

2. Modify the high-availability detection script so that the error returned when the thread pool reaches its maximum number of active threads is handled as an exception, the same way as the case where the maximum number of connections is exceeded. (Note: exceeding the maximum only raises an alarm; no automatic failover is performed.)
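The two solutions combine naturally into one decision in the dial-test script. A minimal sketch, assuming the existing script can already tell whether a login with a timeout succeeded on the normal port and on extra_port; `ProbeVerdict` and `classify_probe` are hypothetical names, and the retry and failover logic of the real HA tool is left to the caller.

```python
from enum import Enum


class ProbeVerdict(Enum):
    HEALTHY = "healthy"
    POOL_SATURATED = "alert_only"        # raise an alarm, do NOT fail over
    UNREACHABLE = "failover_candidate"   # hand over to the normal retry/failover logic


def classify_probe(main_port_ok: bool, extra_port_ok: bool) -> ProbeVerdict:
    """Combine a probe on the normal port with one on extra_port."""
    if main_port_ok:
        return ProbeVerdict.HEALTHY
    if extra_port_ok:
        # The server answers on the bypass port, so it is alive; the normal port
        # only timed out because the thread pool is at its active-thread limit.
        return ProbeVerdict.POOL_SATURATED
    return ProbeVerdict.UNREACHABLE


if __name__ == "__main__":
    for main_ok, extra_ok in [(True, True), (False, True), (False, False)]:
        print(main_ok, extra_ok, classify_probe(main_ok, extra_ok).value)
```

Only the last case should count toward the failover retries; a saturated pool is treated like "too many connections" and only raises an alarm.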

Problems introduced by slow SQL

After digging deeper into the dial-test timeouts, I found that a full thread pool was only one of the causes. In another case, the thread pool was not full. Our online configuration was:

thread_pool_oversubscribe=3
thread_pool_size=24

With these two settings, a total of 24 x (3 + 1) = 96 threads can run concurrently. However, while chasing several incidents, I found that the thread pool often had not reached 96, that is, the pool as a whole was not full. So what caused the dial tests to fail?

Recall the thread pool's structure and distribution mechanism described earlier: internally the pool is divided into groups (24 in our online setup), and a connection's group is determined by taking its thread id modulo the number of groups.

When the timeouts occurred, many threads were running LOAD operations to import data, which means some threads were slow at the time. Could the threads of a single group be full, forcing newly assigned threads to wait?

With this hypothesis, the next step was to verify it, in two steps:

1. Capture the online processlist, take each threadid modulo the number of groups, and see whether multiple load threads fall into the same group;

2. Simulate this scenario in a test environment and see whether the results match expectations.

Online scenario analysis

First, the online scenario. I captured the processlist at a dial-test timeout, picked out the load threads, took their threadid modulo the number of groups, and aggregated the counts, with the following results:

As you can see, groups 4 and 7 each had more than four requests at that moment, showing that the dial-test anomaly was caused by a single group being full. Of course, some of these operations would also make otherwise fast SQL run slower.
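The aggregation in step 1 is easy to script. A minimal sketch, assuming the load-thread ids have already been pulled from the processlist (for example with a filter such as SELECT ID FROM information_schema.PROCESSLIST WHERE INFO LIKE 'LOAD%', adjusted to your workload); `saturated_groups` is a hypothetical helper and the ids in the example are made up, not the author's data.

```python
from collections import Counter


def saturated_groups(thread_ids, thread_pool_size: int = 24, oversubscribe: int = 3):
    """Return {group: active_count} for groups holding at least
    oversubscribe + 1 of the given threads, i.e. groups that are full."""
    cap = oversubscribe + 1
    per_group = Counter(tid % thread_pool_size for tid in thread_ids)
    return {g: n for g, n in sorted(per_group.items()) if n >= cap}


if __name__ == "__main__":
    # Made-up thread ids of slow LOAD threads captured from the processlist.
    load_thread_ids = [4, 28, 52, 76, 7, 31, 55, 79, 103, 5]
    print(saturated_groups(load_thread_ids))  # {4: 4, 7: 5}
```

Any group in the result has no free slot for new connections, even if the pool as a whole is far below its 96-thread ceiling.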

Test environment simulation

To reproduce the problem quickly, I adjusted the parameters as follows:

thread_pool_oversubscribe=1
thread_pool_size=2

With these settings, the maximum number of concurrent threads is 2 x (1 + 1) = 4. As shown below, once more than 4 threads are active, the others must wait:

Simulating the online situation, I started one thread running slow SQL. The thread pool in the test environment then looked like this:

Following the earlier hypothesis, Group1's processing capacity is now only 50% of Group2's. If the inference is correct, threads assigned to Group1 will be blocked.

In this situation, suppose 20 threads send requests. Following the thread pool's distribution rule, Group1 and Group2 each receive 10 of them. If every request takes the same time, the total processing time of the requests assigned to Group1 should be twice that of the requests assigned to Group2.

I used a script to launch 12 threads, each running select sleep(2). With both Group1 and Group2 idle, the results were:

2018-03-18-20:23:53

2018-03-18-20:23:53

2018-03-18-20:23:53

2018-03-18-20:23:53

2018-03-18-20:23:55

2018-03-18-20:23:55

2018-03-18-20:23:55

2018-03-18-20:23:55

2018-03-18-20:23:57

2018-03-18-20:23:57

2018-03-18-20:23:57

2018-03-18-20:23:57

Four threads ran at a time, for a total of six seconds.

Next, after occupying Group1 with one long-running thread, the test results were:

2018-03-18-20:24:35
2018-03-18-20:24:35
2018-03-18-20:24:35
2018-03-18-20:24:37
2018-03-18-20:24:37
2018-03-18-20:24:37
2018-03-18-20:24:39
2018-03-18-20:24:39
2018-03-18-20:24:39
2018-03-18-20:24:41
2018-03-18-20:24:43
2018-03-18-20:24:45

The results show that without blocking, four threads run at a time. With the long-running thread in place, the group holding it becomes a bottleneck: at the end, even though there are three idle threads, only one request is processed at a time (the part of the results marked in red in the original).
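The observed pattern can be reproduced with a small scheduling model. This idealized sketch is not the server's algorithm: `start_offsets` is a hypothetical helper that assumes all 12 requests arrive at t=0, each takes `duration` seconds, each group runs at most oversubscribe + 1 statements in FIFO order, and the Timer's stall handling is ignored. Group 0 here plays the role of the article's Group1, the group holding the slow query.

```python
import heapq


def start_offsets(thread_ids, thread_pool_size, oversubscribe, duration, occupied=None):
    """Start time (seconds) of each request when all arrive at t=0, each group
    runs at most oversubscribe + 1 statements, and every request takes `duration`.

    occupied: {group: slots already held by long-running statements}.
    """
    occupied = occupied or {}
    cap = oversubscribe + 1
    free_at = {}  # group -> min-heap of times at which its usable slots free up
    starts = []
    for tid in thread_ids:  # requests queue in thread-id order within a group
        g = tid % thread_pool_size
        if g not in free_at:
            usable = cap - occupied.get(g, 0)
            if usable <= 0:
                raise ValueError(f"group {g} has no free slot")
            free_at[g] = [0.0] * usable
        t = heapq.heappop(free_at[g])  # earliest free slot in this group
        starts.append(t)
        heapq.heappush(free_at[g], t + duration)
    return sorted(starts)


if __name__ == "__main__":
    ids = range(1, 13)  # 12 requests, each like select sleep(2)
    print(start_offsets(ids, 2, 1, 2))                   # both groups idle
    print(start_offsets(ids, 2, 1, 2, occupied={0: 1}))  # one group holds a slow query
```

In the idle case the start offsets come out as four requests each at 0, 2 and 4 seconds. With one slot of group 0 occupied they become 0, 0, 0, 2, 2, 2, 4, 4, 4, 6, 8, 10: three at a time until group 1 drains, then one at a time, which is the same 3/3/3/1/1/1 pattern as the second run above.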

There are two solutions:

1. Increase thread_pool_oversubscribe appropriately. This only alleviates the problem; it does not cure it;

2. Find the slow SQL and fix the slowness itself.


Source: blog.csdn.net/weixin_33971205/article/details/91397797