MySQL in Action 22 | Which MySQL performance "fixes" do more harm than good?

I wonder whether you have run into this scene in day-to-day operations: during peak hours, the production MySQL instance is under so much pressure that it cannot respond normally, and you need to boost its performance temporarily, just for the short term.

When I was on business-support duty, we ran into this scenario from time to time. The development lead on the business side would say: use whatever scheme you like, just get the service back up quickly.

But if a scheme were truly harmless, you would not need to wait for a moment like this to use it. Today we will go through these stop-gap schemes, focusing on the risks each of them carries.

Short connection storm

The normal short-connection pattern is: connect to the database, execute a few SQL statements, then disconnect; the next time a query is needed, reconnect. If your application uses short connections, you may see a sudden surge in the number of connections during peak hours.

In the first article of this series, "Basic architecture: how is a SQL query statement executed?", I mentioned that establishing a MySQL connection is expensive. Besides the normal three-way TCP handshake, the server also has to verify login privileges and look up the read/write permissions granted to the connection.

When the database is under light load, this extra cost is not noticeable.

The short-connection model carries a risk, however: once the database starts processing requests more slowly, the number of connections surges. The max_connections parameter caps the number of connections a MySQL instance may hold at the same time; once this value is exceeded, the server rejects further connection attempts with the error "Too many connections". To the clients whose connection requests are rejected, the database looks unavailable.

When the machine load is high, existing requests take longer to process and each connection is held for longer. If new connections keep arriving at this point, the total may exceed the max_connections limit.

When you hit this situation, a natural idea is to raise the value of max_connections. But that is risky. The whole point of max_connections is to protect MySQL: if you raise it too far and let more connections in, the system load may climb further, with large amounts of resources spent on logic such as privilege verification. The result can be counterproductive: the threads that already hold connections cannot get CPU time to execute their SQL requests.

So, are there other options in this situation? I have two approaches here, but keep in mind that both are lossy.

The first approach: deal with the threads that occupy connections but are doing no work.

max_connections counts connections, not running threads: any attached connection occupies a slot, whether or not it is executing anything. For connections that do not need to be kept, we can actively kick them off with kill connection. This behaves much like setting wait_timeout in advance: the wait_timeout parameter says that if a thread has been idle for that many seconds, MySQL disconnects it directly.
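As a minimal sketch of this first approach (the thread id 4 below is an illustrative value taken from show processlist, not a fixed number):

```sql
-- Check how long an idle (Sleep) connection is kept before the server
-- closes it on its own.
show variables like 'wait_timeout';

-- List current threads, then manually disconnect one idle thread.
-- The id passed to kill connection comes from the Id column of the output;
-- 4 here is only an example.
show processlist;
kill connection 4;
```

Note that kill connection only closes the connection; as the rest of this section explains, the client on the other end will not notice until its next request.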

Note, however, that kicking off a thread shown as Sleep in the output of show processlist can be lossy. Look at the following example.


Figure 1: Two states of a Sleep thread

In this example, if you disconnect session A, MySQL can only roll back the transaction, because session A has not yet committed; disconnecting session B, by contrast, has no big impact. So, in order of priority, you should first disconnect idle connections like session B that are outside any transaction.

But how do you tell which connections are idle outside a transaction? The figure below shows the result of show processlist executed in session C at time T, 30 seconds into the scenario.


Figure 2: show processlist results for the two Sleep-thread states

In the figure, the sessions with id=4 and id=5 are both in the Sleep state. To determine each one's transaction state, you can query the innodb_trx table in the information_schema library.


Figure 3: Querying transaction state from information_schema.innodb_trx

In this result, trx_mysql_thread_id = 4 means that the thread with id=4 is still inside a transaction.

Therefore, if there are too many connections, you can first disconnect connections that have been idle outside a transaction for too long; if that is still not enough, consider disconnecting connections that have been idle for too long inside a transaction.
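The prioritization above can be turned into one query, a sketch that joins the process list with innodb_trx: threads that appear in information_schema.innodb_trx are still inside a transaction, so the Sleep threads that do not match are the safest to kill first.

```sql
-- Find idle (Sleep) connections and whether each is inside a transaction.
-- Rows with trx_id NULL are idle outside any transaction: disconnect
-- those first, longest-idle first.
select p.id,
       p.user,
       p.time as idle_seconds,
       t.trx_id
from information_schema.processlist p
left join information_schema.innodb_trx t
       on p.id = t.trx_mysql_thread_id
where p.command = 'Sleep'
order by (t.trx_id is not null), p.time desc;
```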

The command for disconnecting from the server side is kill connection + id. When a client in the Sleep state has its connection actively closed by the server, the client does not find out immediately. Only when it issues its next request does it receive the error "ERROR 2013 (HY000): Lost connection to MySQL server during query".

Actively disconnecting from the database side can be lossy, especially if, on receiving this error, some applications do not reconnect but simply keep retrying queries on the handle that is no longer usable. From the application's point of view, it then looks as if "MySQL never recovered".

You might think this sounds like a joke, but I have in fact run into it no fewer than 10 times.

So, if you are a DBA supporting a business, do not assume that all application code handles this correctly. Even for an operation as small as disconnecting a single connection, make sure you notify the business development team.

The second approach: reduce the cost of the connection process.

Some business code applies for a large number of database connections in a short time. If you have confirmed that the database is being knocked over by the connection behavior itself, one possible move is to skip the privilege verification stage.

The way to skip privilege verification is to restart the database with the --skip-grant-tables option. With it, MySQL skips all privilege verification, both during the connection phase and during statement execution.

However, this method fits our title's "more harm than good" particularly well; it is extremely risky and a scheme I especially do not recommend. If your instance is reachable from the public network, you must not even consider it.

In MySQL 8.0, if you enable the --skip-grant-tables option, MySQL turns on --skip-networking by default, which means the database can then only be reached by local clients. Clearly, MySQL's maintainers also take the security implications of skip-grant-tables very seriously.

Besides surges of short connections, what we meet online more often are performance problems caused by query or update statements. Among these, query problems come in two typical kinds: newly appearing slow queries, and problems caused by a sudden jump in QPS (queries per second). Performance problems caused by update statements I will cover in the next article.

Slow query performance problems

In MySQL, slow-query performance problems generally come from one of three possibilities:

  1. The indexes are not designed well;
  2. The SQL statement is not written well;
  3. MySQL chose the wrong index.

Next, let us analyze each of the three possibilities and the corresponding solutions.

The first possible cause of slow queries is poorly designed indexes.

This scenario is usually handled by creating the missing index as an emergency measure. Since MySQL 5.6, index creation supports Online DDL. For the case where the database is already being dragged down at peak time by such a statement, the most efficient approach is to execute the alter table statement directly.
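A minimal sketch of such an emergency index creation, assuming an illustrative table t with a column b that the slow query filters on (both names are placeholders, not from a real schema):

```sql
-- Online DDL (MySQL 5.6+): build the index in place without blocking
-- concurrent reads and writes on the table.
alter table t add index idx_b (b), algorithm=inplace, lock=none;
```

Specifying ALGORITHM and LOCK explicitly makes the statement fail fast if the server cannot honor them, instead of silently falling back to a blocking copy.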

Ideally, you execute it on the standby database first. Suppose your service runs one primary and one standby, with primary A and standby B; the flow of this scheme is roughly:

  1. On standby B, execute set sql_log_bin = off, i.e. do not write the binlog, and then run the alter table statement to add the index;
  2. Perform a primary/standby switchover;
  3. Now the primary is B and the standby is A. On A, execute set sql_log_bin = off, and then run the alter table statement to add the index.
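Step 1 (and step 3, repeated on the other instance) can be sketched as follows; table t and index idx_b are placeholder names:

```sql
-- Run on the standby. Disabling sql_log_bin keeps this DDL out of the
-- binlog, so it is not replicated to the other instance (requires SUPER).
set sql_log_bin = off;
alter table t add index idx_b (b);
set sql_log_bin = on;   -- restore binlog writing for this session
```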

This is an "old-fashioned" DDL scheme. Normally, when making schema changes, you should consider a tool like gh-ost, which is safer. But when emergency handling is needed, the scheme above is the most efficient.

The second possible cause of slow queries is a badly written statement.

For example, we might make the mistakes described in article 18, "Why do these SQL statements with the same logic have hugely different performance?", which cause a statement not to use the index.

In this case, we can handle it by rewriting the SQL statement. MySQL 5.7 provides the query rewrite feature, which rewrites one input statement pattern into another.

For example, if a statement was mistakenly written as select * from t where id + 1 = 10000, you can add a rewrite rule for it as follows.

mysql> insert into query_rewrite.rewrite_rules(pattern, replacement, pattern_database) values ("select * from t where id + 1 = ?", "select * from t where id = ? - 1", "db1");

call query_rewrite.flush_rewrite_rules();

Here, the stored procedure call query_rewrite.flush_rewrite_rules() makes the newly inserted rule take effect; that is the "query rewrite" we mean. You can confirm whether the rewrite rule took effect in the way shown in Figure 4.

Figure 4: Query rewrite in effect
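The same check can be reproduced at the prompt, a sketch assuming the Rewriter plugin is installed and the rule above is active:

```sql
-- Run the original (badly written) statement, then inspect the warnings:
-- the Rewriter plugin records a Note saying the query was rewritten.
select * from t where id + 1 = 10000;
show warnings;
```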

 

The third possible cause of slow queries is the situation we discussed in article 10: MySQL chose the wrong index.

In that case, the emergency plan is to add force index to the statement.

Likewise, you can use the query rewrite feature to add force index to the original statement, which also solves this problem.
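A hedged sketch of such a rule; the table t, index a, and database db1 are illustrative names in the style of the rule shown earlier, not a real schema:

```sql
-- Rewrite the statement so the optimizer is forced onto index a.
insert into query_rewrite.rewrite_rules(pattern, replacement, pattern_database)
values ("select * from t where a between ? and ?",
        "select * from t force index(a) where a between ? and ?",
        "db1");

-- Make the new rule take effect.
call query_rewrite.flush_rewrite_rules();
```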

Above, I have discussed the three situations that may lead to slow-query performance problems. In practice, the most frequent are the first two: poorly designed indexes and badly written statements. And both of these are completely avoidable. For example, by following the process below, we can spot problems in advance:

  1. Before going live, in a test environment, turn on the slow query log (slow log) and set long_query_time to 0, to make sure every statement is recorded in the slow log;
  2. Insert simulated production data into the test tables and run a full regression test;
  3. Check the slow-log output for each kind of statement, paying special attention to whether the Rows_examined field matches expectations. (We have used the Rows_examined method many times in earlier articles, and I trust you have tried it yourself. If anything is unclear, leave me a comment and we can discuss it together.)
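Step 1 above can be sketched as the following settings in the test environment:

```sql
-- Enable the slow query log and record every statement.
set global slow_query_log = on;
set global long_query_time = 0;

-- The global value only applies to new connections; set it in the
-- current session too if you will run the regression from this one.
set long_query_time = 0;
```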

Do not begrudge this bit of "extra" time spent before going live; it will save you a great deal of time in post-incident recovery.

If there are not many new SQL statements, you can run them by hand. But if it is a new project, or the project modifies the original table design, a full regression test is necessary. In that case, you need a tool to help you check the results of all SQL statements; for example, you can use the open-source tool pt-query-digest.

The QPS surge problem

Sometimes, due to a sudden business peak or an application bug, the QPS of a single statement jumps sharply, which can also put MySQL under too much pressure and affect the service.

I once came across a situation of this kind caused by a bug in a new feature. The ideal response, of course, is to have the business side switch the feature off; the service then recovers naturally.

If you have to switch a feature off from the database side, different setups call for different methods. Let me walk through them with you.

  1. One case is a bug in a brand-new business feature. Suppose your database operations are fairly standardized, meaning access is granted via a whitelist, one entry at a time. In this case, if you can confirm that the business side will switch the feature off, just not quickly enough, you can simply remove it from the whitelist on the database side.
  2. If the new feature uses a separate database user, you can use the administrator account to delete that user and then disconnect its existing connections. That way, the new feature can no longer connect, and the QPS it triggers drops to zero.
  3. If the new feature is deployed together with the main functionality, then we can only limit it at the statement level. Here we can use the query rewrite feature mentioned above to rewrite the highest-pressure SQL statement directly into "select 1" and return.
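Hedged sketches of options 2 and 3 above; the user name, table, column, and database names are all made up for illustration:

```sql
-- Option 2: remove the new feature's dedicated account, then kill its
-- existing connections (ids found via show processlist).
drop user 'new_feature_user'@'%';

-- Option 3: rewrite the hottest statement to return immediately.
-- This affects every caller of this statement template: use with care.
insert into query_rewrite.rewrite_rules(pattern, replacement, pattern_database)
values ("select * from t where buggy_col = ?", "select 1", "db1");
call query_rewrite.flush_rewrite_rules();
```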

This operation is, of course, high-risk, and you need to be especially careful with it. It can have two side effects:

  1. If another feature also uses the same SQL statement template, it will be hit as collateral damage;
  2. Many businesses rely on more than one statement to complete their logic, so if this single statement is rewritten to return the result of select 1, the business logic that follows may fail as a whole.

Therefore, option 3 is for stopping the bleeding; like the skip-privilege-verification trick mentioned earlier, it should be the lowest-priority option among all your choices.

At the same time, you will notice that options 1 and 2 both depend on standardized operations: a virtualized setup, a whitelist mechanism, and separate accounts per business. From this you can see that more standardized preparation usually means a more stable system.

Summary

In this article, against the backdrop of performance problems at business peak, I introduced some emergency handling techniques.

These techniques include both the blunt ones of rejecting connections and disconnecting sessions, and the craftier ones of sidestepping a pit by rewriting statements; both temporary high-risk schemes and proactive, relatively safe ones.

In everyday development, we should try to avoid inefficient patterns, such as heavy use of short connections. At the same time, if you do business development, you should know that unexpected disconnection of a connection is common, and your code must handle reconnection and retry correctly.

A DBA can temporarily work around a problem by rewriting statements, but that is itself a high-risk operation; a good SQL review-and-audit process can reduce the need for this kind of operation.

In fact, as you can see, the solutions I mentioned in this article are mainly at the server layer. In the next article, I will continue to discuss with you some handling methods related to InnoDB.

Finally, it is time for our question.

Today's homework question is: have you ever hit a scene where you needed to put out a fire temporarily during peak hours? How did you handle it?

Feel free to write down your experience and takeaways in the comments; I will select interesting comments to analyze and share with everyone at the end of the next article. Thank you for listening, and you are welcome to share this article with more friends to read together.

Answer to the previous question

The question I left in the previous article was: executing the sequence shown in the figure below in order, why is session B's insert statement blocked?



Let us use the locking rules to analyze it and see which locks session A's select statement adds:

  1. Because of order by c desc, the first row located on index c is the "rightmost" row with c=20, so the gap (20, 25) and the next-key lock (15, 20] are added together.
  2. Traversing leftward on index c, the scan stops at c=10, so the next-key lock (5, 10] is added; this is what blocks session B's insert statement.
  3. During the scan, the three rows with c=20, c=15, and c=10 all exist, and since the query is select *, row locks are also added on those three rows on the primary key index id.

Therefore, the lock range of session A's select statement is:

  1. On index c, the range (5, 25);
  2. On the primary key index, row locks on the two rows id=15 and id=20.

Let me be long-winded once more here: you will find that in my articles, for every lock I note which index it is added on. That is because locks are added to indexes; this is a basic InnoDB design that you need to keep in mind whenever you analyze problems.

Reproduced from: https://juejin.im/post/5d037549e51d45777b1a3d85


Origin blog.csdn.net/weixin_34249367/article/details/93183477