The tragedy caused by changing a MySQL table's character set to utf8mb4

  • Environment: Linux CentOS 7, MySQL 5.7, character set utf8;
  • The cause of the tragedy: a table needed to store emoji. In UTF-8, emoji generally take 4 bytes, but MySQL's utf8 stores at most 3 bytes per character, so inserting an emoji into a utf8 column raises an error. We therefore changed the table's character set to utf8mb4 (utf8mb4 is a superset of utf8).
  • Here comes the problem: two tables in this MySQL environment were joined on indexed columns, yet the execution plan showed a full table scan on one of them, scanning nearly a million rows and making the SQL very slow.
  • Diagnosis result: a tragedy caused by changing the table's character set to utf8mb4.
  • Reproducing the diagnosis: first, the table structures are as follows:
CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `code` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_code` (`code`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;

Insert some sample data into t1.
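The original post does not show the rows; the following hypothetical inserts (values are illustrative, except the code value quoted later in the post) are enough to reproduce the scenario:

-- hypothetical sample rows, not from the original post
insert into t1 (name, code) values
  ('aaaa', '8a77a32a7e0825f7c8634226105c42e5'),
  ('bbbb', 'b2c02e35d2f64a6a81345cdc1f4a1c4e');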

CREATE TABLE `t2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `code` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_code` (`code`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4;
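Note the only difference between the two tables: t1 is utf8, t2 is utf8mb4. The emoji claim behind the change is easy to verify outside MySQL; a quick check in Python:

```python
# MySQL's legacy "utf8" (really utf8mb3) stores at most 3 bytes per
# character, while emoji need 4 bytes in real UTF-8 -- hence utf8mb4.
emoji = "\U0001F600"                  # grinning-face emoji
print(len(emoji.encode("utf-8")))     # 4 -> rejected by a utf8 column
print(len("汉".encode("utf-8")))      # 3 -> fits in a utf8 column
```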

Insert some sample data into t2 as well.

The execution plan for joining the two tables is as follows:

desc select * from t2 join t1 on t1.code = t2.code where t2.name = 'dddd'\G

  • It can be clearly seen that t2.name = 'dddd' uses the index, while the join condition t1.code = t2.code does not use the index on t1.code. This was puzzling at first, but the machine does not lie. Run show warnings to view
    the warning information attached to the query execution plan:
    show warnings;
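On a setup like this, the interesting part of the warning is the optimizer's rewritten query (reconstructed illustration; the exact text on your server may differ):

show warnings\G
-- Note 1003: /* select#1 */ select ... from `testdb`.`t2` join `testdb`.`t1`
--   where ((convert(`testdb`.`t1`.`code` using utf8mb4) = `testdb`.`t2`.`code`)
--          and (`testdb`.`t2`.`name` = 'dddd'))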

  • The problem found: the warning shows a conversion (convert of testdb.t1.code using utf8mb4), which reveals that the character sets of the two tables differ: t1 is utf8, t2 is utf8mb4. But why does this character-set mismatch (the result of my earlier change) cause a full table scan of t1? Let's analyze it below.
    (1) First, with t2 joined to t1, t2 is chosen as the driving table. This step is equivalent to executing select * from t2 where t2.name = 'dddd' and taking the value of code, here '8a77a32a7e0825f7c8634226105c42e5';
    (2) That code value from t2 is then looked up in t1 according to the join condition. This step is equivalent to executing select * from t1 where t1.code = '8a77a32a7e0825f7c8634226105c42e5';
    (3) But the code value extracted from t2 in step (1) is in the utf8mb4 character set, while t1.code is utf8, so a character-set conversion is required. Conversion follows the small-to-large rule: since utf8mb4 is a superset of utf8, the utf8 side is converted to utf8mb4, i.e. t1.code is converted to utf8mb4. Because the index on t1.code is built on utf8 values, wrapping the column in a conversion makes that index unusable, so the execution plan ignores it and t1 can only be read with a full table scan. To make matters worse, if t2 filters out more than one row, t1 is full-scanned once per row, and the performance gap is easy to imagine.
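Step (3) is an instance of a general rule: wrapping an indexed column in any function disables that index. The implicit conversion effectively turns the probe against t1 into:

-- effectively what the optimizer must evaluate against t1:
select * from t1
where convert(t1.code using utf8mb4) = '8a77a32a7e0825f7c8634226105c42e5';
-- convert() wraps the indexed column, so idx_code cannot be used
-- and t1 falls back to a full table scan.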

  • Solving the problem:
    Now that the cause is clear, how do we fix it? By unifying the character sets, of course: change t1 to match t2, or t2 to match t1. Here we choose to change t1 to utf8mb4. How is the character set converted?
    Someone might suggest alter table t1 charset utf8mb4; but that is wrong: it only changes the table's default character set, so newly added columns will use utf8mb4 while the existing columns remain utf8.
    Only alter table t1 convert to character set utf8mb4; actually converts the existing data.
    Also note that this conversion rebuilds the table and blocks concurrent writes (lock=none will report an error), so do not run it during peak business hours. Even outside peak hours, for large tables it is recommended to use pt-online-schema-change to modify the character set online.
    In the test environment: alter table t1 convert to character set utf8mb4, lock=shared;
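After the conversion, it can be verified that both the table default and the existing columns are now utf8mb4 (the database name testdb is assumed from the warning output):

show create table t1\G   -- expect DEFAULT CHARSET=utf8mb4
select column_name, character_set_name
from information_schema.columns
where table_schema = 'testdb' and table_name = 't1';
-- name and code should both report utf8mb4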

Looking at the execution plan again, the problem is gone.

  • Summary:
    1. When table character sets differ, join SQL may fail to use indexes, causing serious performance problems;
    2. An alter table that changes the character set blocks concurrent writes; for production MySQL, pt-online-schema-change is recommended;
    3. When changing the character set of tables in bulk, review the SQL involved and convert the character sets of the associated (joined) tables together;
    4. When a plan looks wrong, remember to check show warnings.

Origin blog.csdn.net/qq_31555951/article/details/106615110