MySQL implicit conversion must be known and understood

In the production environment, there are often some implicit type conversions that cause SQL indexes to fail and perform extremely poorly, which in turn affects the cluster load and business. This article summarizes common scenarios of implicit conversion. Try to avoid SQL implicit conversion in production.

Author: Zhang Luodan is passionate about database technology and constantly exploring. He hopes to write more in-depth articles and output more valuable content in the future!

Produced by the Aikeson open source community. Original content may not be used without authorization. Please contact the editor and indicate the source for reprinting.

This article is about 3,000 words and is expected to take 10 minutes to read.

Common scenarios where SQL generates implicit conversions include:

  1. Implicit conversion of data types
  2. Implicit conversion of character sets

Among them, character set conversion, especially in table connection scenarios and stored procedures, is easily overlooked.

Note: The character set is the encoding rule for character type data. For numeric types, there is no need to convert the character set.

Implicit conversion of data types

Test table structure

t1The table fields aare of VARCHAR type and t2the table fields aare of INT type.

mysql> show create database test1\G
*************************** 1. row ***************************
       Database: test1
Create Database: CREATE DATABASE `test1` /*!40100 DEFAULT CHARACTER SET utf8 */
1 row in set (0.00 sec)
mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `a` varchar(20) DEFAULT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int(11) NOT NULL,
  `a` int(11) DEFAULT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Single table example

What needs to be noted here is that there are the following two types of conversions:

  1. When the field type is a string type and the parameter is an integer, the index will fail.
  2. When the field type is an integer and the incoming parameter is a string type, it will not cause the index to fail.

This is because when comparing strings with numbers, MySQL will convert the string type to a number for comparison. Therefore, when the field type is a string, a function will be added to the field, causing the index to fail.

Official documentation explains : Strings are automatically converted to numbers and numbers to strings as necessary.

-- 字段类型为varchar,传参为整数,无法走到索引
mysql> explain select * from t1 where a=1000;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | a             | NULL | NULL    | NULL | 498892 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 3 warnings (0.00 sec)
mysql> show warnings;
+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                           |
+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1739 | Cannot use ref access on index 'a' due to type or collation conversion on field 'a'                                                               |
| Warning | 1739 | Cannot use range access on index 'a' due to type or collation conversion on field 'a'                                                             |
| Note    | 1003 | /* select#1 */ select `test1`.`t1`.`id` AS `id`,`test1`.`t1`.`a` AS `a`,`test1`.`t1`.`b` AS `b` from `test1`.`t1` where (`test1`.`t1`.`a` = 1000) |
+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
-- 字段类型为int,传参为字符串,可以走到索引
mysql> explain select * from t2 where a='1000';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | t2    | NULL       | ref  | a             | a    | 5       | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

As for why you can't convert numbers to strings for comparison?

Comparison results below:

  • The comparison of strings is to compare the size of strings one by one until different characters are found. The comparison results of this comparison are different from the comparison results of numbers.
mysql> select '2000' <'250';
+---------------+
| '2000' <'250' |
+---------------+
|             1 |
+---------------+
1 row in set (0.00 sec)

Data type conversion in table joins

When the connection field types of the two tables are inconsistent, it will lead to implicit conversion (MySQL internal cast()function is added), and the connection field index cannot be reached, and the optimal table connection order may not be used.

The table that was originally a driven table may be used as a driving table because the index cannot be used.

Example:

  • As follows, under normal circumstances, t2the table will be selected as the driving table, but due to different data types, the actual SQL executed is:select * from t1 join t2 on cast(t1.a as unsigned)=t2.a where t2.id<1000
  • If t1used as a driven table, there is no way to go to t1.athe index of , so t1the table is selected as the driving table
mysql> explain select * from t1 join t2 on t1.a=t2.a where t2.id<1000;
+----+-------------+-------+------------+------+---------------+------+---------+------------+--------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref        | rows   | filtered | Extra                 |
+----+-------------+-------+------------+------+---------------+------+---------+------------+--------+----------+-----------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | a             | NULL | NULL    | NULL       | 498892 |   100.00 | Using where           |
|  1 | SIMPLE      | t2    | NULL       | ref  | PRIMARY,a     | a    | 5       | test1.t1.a |      1 |     5.00 | Using index condition |
+----+-------------+-------+------------+------+---------------+------+---------+------------+--------+----------+-----------------------+
2 rows in set, 2 warnings (0.00 sec)
mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                                                                                                                                                                    |
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1739 | Cannot use ref access on index 'a' due to type or collation conversion on field 'a'                                                                                                                                                                                                        |
| Note    | 1003 | /* select#1 */ select `test1`.`t1`.`id` AS `id`,`test1`.`t1`.`a` AS `a`,`test1`.`t1`.`b` AS `b`,`test1`.`t2`.`id` AS `id`,`test1`.`t2`.`a` AS `a`,`test1`.`t2`.`b` AS `b` from `test1`.`t1` join `test1`.`t2` where ((`test1`.`t2`.`id` < 1000) and (`test1`.`t1`.`a` = `test1`.`t2`.`a`)) |
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.01 sec)

Implicit conversion of character sets

When the parameter character set and the field character set are different, direct comparison cannot be made, and character set conversion is required. You may need to add convert()a function to the conversion field to convert the character set, resulting in index failure.

Test table structure

  • Database character set is UTF8MB4
  • t1Table character set is UTF8
  • t2Table character set is UTF8MB4
mysql> show create database test\G
*************************** 1. row ***************************
       Database: test
Create Database: CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET utf8mb4 */
mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `a` varchar(20) DEFAULT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int(11) NOT NULL,
  `a` varchar(20) DEFAULT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.01 sec)

Single table example

-- 正常执行时,匹配字段的字符集(没有单独指定时继承表的字符集)
mysql> explain select * from t1 where a='1000';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | t1    | NULL       | ref  | a             | a    | 63      | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

-- 将参数转换不同的字符集,无法走到索引,而是全表扫描
mysql> explain select * from t1 where a=convert('1000' using utf8mb4);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2000 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)


-- show warnings可以看到优化器进行了转换,在t1.a上加了convert函数,从而无法走到索引
mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                               |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` where (convert(`test`.`t1`.`a` using utf8mb4) = <cache>(convert('1000' using utf8mb4))) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

In addition, it should be noted that:

MySQL will internally give priority to converting low-level character sets to higher-level character sets, such as converting UTF8 to UTF8MB4.

In the previous example, convert()the function was added t1.ato , but in the following example, convert()the function was added to the parameter instead of t2.athe field. This situation did not result in poor performance:

mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int(11) NOT NULL,
  `a` varchar(20) DEFAULT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
mysql> explain select * from t2 where a=convert('1000' using utf8);
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | t2    | NULL       | ref  | a             | a    | 83      | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                   |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`t2`.`id` AS `id`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t2` where (`test`.`t2`.`a` = convert(convert('1000' using utf8) using utf8mb4)) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

To sum up:

  • When the table field character set is a lower-level character set (such as UTF8), and the incoming value is a higher-level character set (such as UTF8MB4), the character set of the table field will be converted at this time, which is equivalent to using the Function, index invalid.
  • When the table field is a higher-level character set (such as UTF8MB4) and the incoming value is a lower-level character set (such as UTF8), the incoming value will be converted into the character set and will not cause index failure. .

However, we usually do not use convert()the function to convert the character set of parameters manually. In the following two scenarios, there may be implicit type conversions that are easy to ignore, causing production problems.

Character set conversion in table joins

When the character sets of the connection fields of the two tables are inconsistent, it will lead to implicit conversion ( convert()function added inside MySQL), and the connection field index cannot be reached, and the optimal table connection sequence may not be used.

The table that was originally a driven table may be used as a driving table because the index cannot be used.

Example:

  • Under normal circumstances, MySQL will give priority to tables with small result sets as driving tables. In this example, they are t2the driving table and t1the driven table.
  • However, due to the different character sets, the actual executed SQL is as show warningsshown in the figure. If a function t1.ais added to the field convert()to convert the character set, t1.athe index of the field cannot be reached and the connection order has to be changed.
mysql> explain select * from t1 left join t2 on t1.a=t2.a where t2.id<1000;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                 |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 498649 |   100.00 | NULL                  |
|  1 | SIMPLE      | t2    | NULL       | ref  | PRIMARY,a     | a    | 83      | func |      1 |     4.79 | Using index condition |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                                |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`id` AS `id`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t1` join `test`.`t2` where ((`test`.`t2`.`id` < 1000) and (convert(`test`.`t1`.`a` using utf8mb4) = `test`.`t2`.`a`)) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)




-- 在下面示例中,虽然也发生了类型转换,但是效率并没有变差,因为原本最优的连接顺序就是t1作为驱动表
mysql> explain select * from t1 left join t2 on t1.a=t2.a where t1.id<1000;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |  999 |   100.00 | Using where |
|  1 | SIMPLE      | t2    | NULL       | ref   | a             | a       | 83      | func |    1 |   100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)


mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                                   |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`id` AS `id`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t1` left join `test`.`t2` on((convert(`test`.`t1`.`a` using utf8mb4) = `test`.`t2`.`a`)) where (`test`.`t1`.`id` < 1000) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Character set conversion in stored procedures

This is also a scenario that is relatively easy to ignore. The problem was discovered when the primary key was updated during the storage process in the production environment, but it took 10s+ to execute.

The character set of variables in the stored procedure is inherited from databasethe character set of (can also be specified when creating). When the table field character set databaseis different from the character set of (), implicit character set type conversion similar to the previous one will occur.

Example:

  • databaseThe character set is UTF8MB4
  • character_set_clientand are the values ​​of session and collation_connectionwhen creating the stored procedurecharacter_set_clientcollation_connection
  • It has been tested that the character set of the variables in the stored procedure is consistent with the character set at the database level.
-- 存储过程信息: Database Collation: utf8mb4_general_ci
mysql> show create procedure update_data\G
*************************** 1. row ***************************
           Procedure: update_data
            sql_mode: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
    Create Procedure: CREATE DEFINER=`root`@`%` PROCEDURE `update_data`()
begin
  declare j int;
  declare n varchar(100);
   select charset(n);
  set j=1;
  while(j<=2000)do
set n = cast(j as char);
select 1,now();
    update t1 set b=concat(b,'1') where a=n;
select 2,now();
select sleep(1);
    set j=j+1;
  end while;
end
character_set_client: utf8mb4
collation_connection: utf8mb4_general_ci
  Database Collation: utf8mb4_general_ci
1 row in set (0.00 sec)
如下,在执行存储过程后,看到打印的变量n的字符集是utf8mb4


mysql> call update_data();
+------------+
| charset(n) |
+------------+
| utf8mb4    |
+------------+
1 row in set (0.00 sec)

The statement updated based on the index field aactually becomes the following, using a full table scan (type: index, key: primary).

mysql> explain update t1 set b=concat(b,'1') where a=convert('1000' using utf8mb4);
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | UPDATE      | t1    | NULL       | index | NULL          | PRIMARY | 4       | NULL | 498649 |   100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
1 row in set (0.00 sec)


-- 而正常情况下,执行计划为:
mysql> explain update t1 set b=concat(b,'1') where a='1000';
+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
|  1 | UPDATE      | t1    | NULL       | range | a             | a    | 63      | const |    1 |   100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
1 row in set (0.00 sec)

The update time has also changed from 0.00sec to 0.60sec . When the amount of table data is large, a full table scan will have a greater impact on production.

mysql> update t1 set b=concat(b,'1') where a='1000';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> update t1 set b=concat(b,'1') where a=convert('1000' using utf8mb4);
Query OK, 1 row affected (0.60 sec)
Rows matched: 1  Changed: 1  Warnings: 0

How to avoid implicit conversions

For implicit conversion of data types:

  1. Standard data type selection
  2. SQL pass participating field data type matching

For implicit conversion of character sets: client character set, server-side character set, database character set, table character set, and field character set remain consistent.

For more technical articles, please visit: https://opensource.actionsky.com/

About SQLE

SQLE is a comprehensive SQL quality management platform that covers SQL auditing and management from development to production environments. It supports mainstream open source, commercial, and domestic databases, provides process automation capabilities for development and operation and maintenance, improves online efficiency, and improves data quality.

SQLE get

type address
Repository https://github.com/actiontech/sqle
document https://actiontech.github.io/sqle-docs/
release news https://github.com/actiontech/sqle/releases
Data audit plug-in development documentation https://actiontech.github.io/sqle-docs/docs/dev-manual/plugins/howtouse
The pirated resources of "Qing Yu Nian 2" were uploaded to npm, causing npmmirror to have to suspend the unpkg service. Zhou Hongyi: There is not much time left for Google. I suggest that all products be open source. Please tell me, time.sleep(6) here plays a role. What does it do? Linus is the most active in "eating dog food"! The new iPad Pro uses 12GB of memory chips, but claims to have 8GB of memory. People’s Daily Online reviews office software’s matryoshka-style charging: Only by actively solving the “set” can we have a future. Flutter 3.22 and Dart 3.4 release a new development paradigm for Vue3, without the need for `ref/reactive `, no need for `ref.value` MySQL 8.4 LTS Chinese manual released: Help you master the new realm of database management Tongyi Qianwen GPT-4 level main model price reduced by 97%, 1 yuan and 2 million tokens
{{o.name}}
{{m.name}}

Guess you like

Origin my.oschina.net/actiontechoss/blog/11183928