MySQL VARCHAR best length evaluation practices

Is your VARCHAR the right length?

Author: Guan Yongqiang, a member of the Aikesheng DBA team, is good at MySQL operation and maintenance skills. He loves learning new knowledge and is also an otaku who loves to play games.

Author: Li Fuqiang, a member of Aikesheng DBA team, familiar with MySQL, TiDB, OceanBase and other databases. I believe that if you continue to do the right things well, you will get different rewards.

Produced by the Aikeson open source community. Original content may not be used without authorization. Please contact the editor and indicate the source for reprinting.

This article is about 2,200 words and is expected to take 8 minutes to read.

Background description

Some customers reported that they expanded the length of a VARCHAR type field. The first time you can modify it quickly, but the second time it takes a long time to execute. I am confused because the amount of data in the table is almost the same. Why is it faster VARCHAR(20)to adjust from to , but it takes a long time to adjust from to ?VARCHAR(50)VARCHAR(50)VARCHAR(100) So we reproduced the situation and conducted problem analysis.

environmental information

The products and version information involved in this verification are as follows:

product Version
MySQL 5.7.25-log MySQL Community Server (GPL)
Sysbench sysbench 1.0.17

Scene recurrence

3.1 Data preparation

mysql> show create table test.sbtest1;
+---------+----------------------------------------+
| Table   | Create Table                           |
+---------+----------------------------------------+
| sbtest1 | CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` varchar(20) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  `pad` varchar(20) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+---------+----------------------------------------+
1 row in set (0.00 sec)
mysql> select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.10 sec)

3.2 Problem verification

To simulate the customer's description, we cmodify the field, VARCHAR(20)modify it to VARCHAR(50)and then modify it to VARCHAR(100), and observe the time required for its execution. The following are the relevant operation commands and execution results:

mysql> ALTER TABLE test.sbtest1 MODIFY c VARCHAR(50);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test.sbtest1;
+---------+-------------------------------+
| Table   | Create Table                  |
+---------+-------------------------------+
| sbtest1 | CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` varchar(50) COLLATE utf8mb4_bin DEFAULT NULL,
  `pad` varchar(20) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+---------+--------------------------------------------------------+
1 row in set (0.00 sec)

mysql> ALTER TABLE test.sbtest1 MODIFY c VARCHAR(100);
Query OK, 1000000 rows affected (4.80 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> show create table test.sbtest1;
+---------+---------------------------+
| Table   | Create Table              |
+---------+---------------------------+
| sbtest1 | CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` varchar(100) COLLATE utf8mb4_bin DEFAULT NULL,
  `pad` varchar(20) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+---------+------------------------------------------------------------------------+
1 row in set (0.00 sec)

Through verification, it was found that the problem would reappear stably, so we continued to try to modify it. Finally, we found that it took a long time to modify VARCHAR(63)to VARCHAR(64), but after 64, we continued to expand the length and found that it could be completed quickly.

mysql> ALTER TABLE test.sbtest1 MODIFY c VARCHAR(63);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE test.sbtest1 MODIFY c VARCHAR(64);
Query OK, 1000000 rows affected (4.87 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> show create table test.sbtest1;
+---------+---------------+
| Table   | Create Table  |
+---------+---------------+
| sbtest1 | CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` varchar(64) COLLATE utf8mb4_bin DEFAULT NULL,
  `pad` varchar(20) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+---------+------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> ALTER TABLE test.sbtest1 MODIFY c VARCHAR(65);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE test.sbtest1 MODIFY c VARCHAR(66);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

3.3 Problem analysis

Analyze the situation where it takes a long time to VARCHAR(63)modify . VARCHAR(64)By consulting the official documentation , we found that VARCHARthe character type that can store characters when the byte length is 1 is 0~255. The current character set type is UTF8MB4. Since UTF8MB4 is a four-byte encoded character set, that is, one byte length can store 63.75 (255/4) characters, so when we modify it toVARCHAR(63) , we need to add a byte for data processing. VARCHAR(64)For storage, it is necessary to complete this length expansion by establishing a temporary table, so it takes a lot of time.

Extended verification

4.1 Data preparation

mysql> show create table test_utf8.sbtest1;
+---------+----------------------------------------+
| Table   | Create Table                           |
+---------+----------------------------------------+
| sbtest1 | CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` varchar(20) NOT NULL DEFAULT '',
  `pad` varchar(20) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8 |
+---------+------------------+
1 row in set (0.00 sec)

mysql> select count(*) from test_utf8.sbtest1;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.10 sec)

4.2 UTF8 scene verification

Since UTF8 is a three-byte encoding character set, one byte can store 85 (255/3=85) characters.

The sequence of this modification is: VARCHAR(20)→VARCHAR(50)→VARCHAR(85), and observe the time required for its execution. The following are the relevant operation commands and execution results:

mysql>  ALTER TABLE test_utf8.sbtest1 MODIFY c VARCHAR(50) ,algorithm=inplace,lock=none;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE test_utf8.sbtest1 MODIFY c VARCHAR(85) ,algorithm=inplace,lock=none;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test_utf8.sbtest1;
+---------+-------------------------------+
| Table   | Create Table                  |
+---------+-------------------------------+
| sbtest1 | CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` varchar(85) DEFAULT NULL,
  `pad` varchar(20) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8 |
+---------+--------------------------------------------------+
1 row in set (0.00 sec)

Modification sequence: VARCHAR(85)→VARCHAR(86)→VARCHAR(100). At this time, we will observe that the executed SQL statement directly returns an error. So we delete algorithm=inplace ,lock=nonethese two parameters, which allows this SQL to create a temporary table and lock the target table, and then re-execute the SQL. The following are the relevant operation commands and execution results:

mysql> ALTER TABLE test_utf8.sbtest1 MODIFY c VARCHAR(86) ,algorithm=inplace,lock=none;
ERROR 1846 (0A000): ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY.

mysql> ALTER TABLE test_utf8.sbtest1 MODIFY c VARCHAR(86);
Query OK, 1000000 rows affected (4.94 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> show create table test_utf8.sbtest1;
+---------+-------------------------------+
| Table   | Create Table                  |
+---------+-------------------------------+
| sbtest1 | CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` varchar(86) DEFAULT NULL,
  `pad` varchar(20) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8 |
+---------+--------------------------------------------------+
1 row in set (0.00 sec)

mysql> ALTER TABLE test_utf8.sbtest1 MODIFY c VARCHAR(100) ,algorithm=inplace,lock=none;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

4.3 UTF8MB4 scenario verification

Since UTF8MB4 is a four-byte encoding character set, one byte can store 63 (255/4=63.75) characters.

The sequence of this modification is: VARCHAR(20)→VARCHAR(50)→VARCHAR(63), and observe the time required for its execution. The following are the relevant operation commands and execution results:

mysql>  ALTER TABLE test.sbtest1 MODIFY c VARCHAR(50) ,algorithm=inplace,lock=none;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE test.sbtest1 MODIFY c VARCHAR(63) ,algorithm=inplace,lock=none;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test.sbtest1;
+---------+-------------------------+
| Table   | Create Table           |
+---------+-------------------------+
| sbtest1 | CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` varchar(63) COLLATE utf8mb4_bin DEFAULT NULL,
  `pad` varchar(20) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+---------+-------------------------------------------------------------------------+
1 row in set (0.00 sec)

The sequence of this modification is: VARCHAR(63)→VARCHAR(64)→VARCHAR(100). At this time, we will observe that the executed SQL statement directly returns an error. So we delete algorithm=inplace, lock=nonethese two parameters, which allows this SQL to create a temporary table and lock the target table, and then re-execute the SQL. The following are the relevant operation commands and execution results:

mysql> ALTER TABLE test.sbtest1 MODIFY c VARCHAR(64) ,algorithm=inplace,lock=none;
ERROR 1846 (0A000): ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY.

mysql> ALTER TABLE test.sbtest1 MODIFY c VARCHAR(64) ;
Query OK, 1000000 rows affected (4.93 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

mysql> show create table test.sbtest1;
+---------+--------------------------+
| Table   | Create Table             |
+---------+--------------------------+
| sbtest1 | CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` varchar(64) COLLATE utf8mb4_bin DEFAULT NULL,
  `pad` varchar(20) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+---------+-------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> ALTER TABLE test.sbtest1 MODIFY c VARCHAR(100) ,algorithm=inplace,lock=none;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

4.4 Comparative analysis

Character length modification UTF8(MB3) UTF8MB4
20->50 online ddl (inplace) online ddl (inplace)
50->100 online ddl (copy) online ddl (copy)
X->Y When Y*3<256, inplace <br> When X*3>=256, inplace When Y*4<256, inplace <br> When X*4>=256, inplace
Remark One character occupies a maximum of 3 bytes One character occupies a maximum of 4 bytes

in conclusion

When the maximum byte length of a field is >= 256 characters, 2 bytes are needed to represent the field length.

Example using UTF8MB4:

  • For the maximum byte length of the field varying within 256 characters (that is, x*4<256 and Y*4<256), online ddl uses inplace mode, which is highly efficient.
  • For fields whose maximum byte length varies beyond 256 characters (that is, x*4>=256 and Y*4>=256), online ddl uses inplace mode, which is highly efficient.
  • Otherwise, online ddl uses copy mode, which is inefficient.
  • The same goes for UTF8(MB3).

suggestion

In order to avoid later field length expansion, online ddl adopts the inefficient copy mode. It is recommended that:

  • For UTF8(MB3) character type:
    • The number of characters is less than 50. It is recommended to set the character length to VARCHAR(50) or less.
    • The number of characters is close to 84 (256/3=83.33). It is recommended to set it to varchar(84) or a larger character length.
  • For UTF8MB4 character type:
    • If the number of characters is less than 50, it is recommended to set it to VARCHAR(50), or a smaller character length.
    • The number of characters is close to 64 (256/4=64). It is recommended to set it to VARCHAR(64) or a larger character length.

The results of this verification are for reference only. If you need to operate in a production environment, please reasonably define the length of VARCHAR based on the actual situation to avoid economic losses.

For more technical articles, please visit: https://opensource.actionsky.com/

About SQLE

SQLE is a comprehensive SQL quality management platform that covers SQL auditing and management from development to production environments. It supports mainstream open source, commercial, and domestic databases, provides process automation capabilities for development and operation and maintenance, improves online efficiency, and improves data quality.

SQLE get

type address
Repository https://github.com/actiontech/sqle
document https://actiontech.github.io/sqle-docs/
release news https://github.com/actiontech/sqle/releases
Data audit plug-in development documentation https://actiontech.github.io/sqle-docs/docs/dev-manual/plugins/howtouse
A programmer born in the 1990s developed a video porting software and made over 7 million in less than a year. The ending was very punishing! High school students create their own open source programming language as a coming-of-age ceremony - sharp comments from netizens: Relying on RustDesk due to rampant fraud, domestic service Taobao (taobao.com) suspended domestic services and restarted web version optimization work Java 17 is the most commonly used Java LTS version Windows 10 market share Reaching 70%, Windows 11 continues to decline Open Source Daily | Google supports Hongmeng to take over; open source Rabbit R1; Android phones supported by Docker; Microsoft's anxiety and ambition; Haier Electric shuts down the open platform Apple releases M4 chip Google deletes Android universal kernel (ACK ) Support for RISC-V architecture Yunfeng resigned from Alibaba and plans to produce independent games on the Windows platform in the future
{{o.name}}
{{m.name}}

Guess you like

Origin my.oschina.net/actiontechoss/blog/11094958