[MySQL] How to Quickly Generate Tens of Millions of Rows for Performance Testing

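A couple of days ago a colleague was building a feature that runs range queries against a large table and wanted to generate 70 million rows locally for performance testing. Inserting from application code was too slow, and so was running single-row SQL inserts, so I tried the following approaches to generate data at the ten-million-row scale quickly.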

Single-row inserts

First, the table structure is as follows:

CREATE TABLE `batch_index` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `resource_id` int(10) unsigned NOT NULL COMMENT 'xx id',
  `name` varchar(255) NOT NULL DEFAULT '' COMMENT 'name',
  `cate_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'category id',
  `input_time` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'creation time',
  PRIMARY KEY (`id`),
  KEY `idx_resource_id` (`resource_id`),
  KEY `idx_cate_id` (`cate_id`),
  KEY `idx_input_time` (`input_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Then insert 10,000 rows, one INSERT statement per row:

INSERT INTO batch_index VALUES (1, 1357, 'name_1357', 16, 1626984835);
INSERT INTO batch_index VALUES (2, 1148, 'name_1148', 6, 1617323895);
INSERT INTO batch_index VALUES (3, 1168, 'name_1168', 5, 1638031542);
...

Execution time: 42.497 s
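The 10,000 statements have to come from somewhere. Here is a minimal sketch of a generator in Python (an assumption; the post does not say how the statements were produced). The value ranges mirror the ones used by the stored procedure later in this post, and the names `make_rows`, `single_insert_statements` and the output file name are hypothetical.

```python
import random


def make_rows(n, seed=None):
    """Yield n rows shaped like batch_index: (id, resource_id, name, cate_id, input_time)."""
    rng = random.Random(seed)
    for row_id in range(1, n + 1):
        resource_id = rng.randint(1, 3000)                  # 1..3000 inclusive
        cate_id = rng.randint(1, 20)                        # 1..20 inclusive
        input_time = 1609430400 + rng.randrange(32227200)   # timestamp within the window
        yield (row_id, resource_id, f"name_{resource_id}", cate_id, input_time)


def single_insert_statements(rows):
    """Render one INSERT statement per row (names are simple, so no escaping is done)."""
    for row_id, resource_id, name, cate_id, input_time in rows:
        yield (
            "INSERT INTO batch_index VALUES "
            f"({row_id}, {resource_id}, '{name}', {cate_id}, {input_time});"
        )


if __name__ == "__main__":
    # Hypothetical output file; run it with the MySQL client or a GUI tool afterwards.
    with open("single_inserts.sql", "w", encoding="utf-8") as f:
        for stmt in single_insert_statements(make_rows(10000)):
            f.write(stmt + "\n")
```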

Batch inserts

After truncating the table, import the same 10,000 rows with a single multi-row INSERT:

INSERT INTO batch_index VALUES 
(1, 1357, 'name_1357', 16, 1626984835)
,(2, 1148, 'name_1148', 6, 1617323895)
,(3, 1168, 'name_1168', 5, 1638031542)
...

Execution time: 1.815 s
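A single multi-row statement cannot grow without limit, because the server's max_allowed_packet setting caps the size of one statement. For larger volumes, one option is to split the rows across several multi-row INSERTs. A minimal Python sketch, where the function name `batched_insert_statements` and the batch size are hypothetical choices:

```python
def batched_insert_statements(rows, batch_size=1000):
    """Yield multi-row INSERT statements with at most batch_size rows each."""
    batch = []
    for row_id, resource_id, name, cate_id, input_time in rows:
        batch.append(f"({row_id}, {resource_id}, '{name}', {cate_id}, {input_time})")
        if len(batch) == batch_size:
            yield "INSERT INTO batch_index VALUES\n" + "\n,".join(batch) + ";"
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield "INSERT INTO batch_index VALUES\n" + "\n,".join(batch) + ";"


# The three sample rows shown in the statements above.
sample = [
    (1, 1357, "name_1357", 16, 1626984835),
    (2, 1148, "name_1148", 6, 1617323895),
    (3, 1168, "name_1168", 5, 1638031542),
]

for stmt in batched_insert_statements(sample, batch_size=2):
    print(stmt)
```

With `batch_size=2` the three sample rows come out as two statements, the first with two rows and the second with one.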

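Adding a transaction

Here too we test two cases: 10,000 single-row inserts inside a transaction, and one 10,000-row batch insert inside a transaction.

10,000 single-row inserts inside a transaction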

start transaction;
INSERT INTO batch_index VALUES (1, 1357, 'name_1357', 16, 1626984835);
INSERT INTO batch_index VALUES (2, 1148, 'name_1148', 6, 1617323895);
...
commit;

Execution time: 3.644 s

10,000-row batch insert inside a transaction

start transaction;
INSERT INTO batch_index VALUES 
(1, 1357, 'name_1357', 16, 1626984835)
,(2, 1148, 'name_1148', 6, 1617323895)
...
commit;

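Execution time: 2.003 s

This is no faster than the plain batch insert (1.815 s). A single multi-row INSERT already runs as one transaction under autocommit, so the explicit transaction adds nothing.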
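For volumes much larger than 10,000 rows, one option (not measured in this post) is to commit every few thousand statements instead of once at the very end, which keeps each transaction bounded. A minimal Python sketch; the function name `wrap_in_transactions` and the group size are hypothetical:

```python
def wrap_in_transactions(statements, per_transaction=5000):
    """Yield the statements with START TRANSACTION / COMMIT around each group."""
    count = 0
    for stmt in statements:
        if count % per_transaction == 0:
            if count:
                yield "COMMIT;"          # close the previous group
            yield "START TRANSACTION;"   # open a new group
        yield stmt
        count += 1
    if count:
        yield "COMMIT;"                  # close the last group


statements = [
    "INSERT INTO batch_index VALUES (1, 1357, 'name_1357', 16, 1626984835);",
    "INSERT INTO batch_index VALUES (2, 1148, 'name_1148', 6, 1617323895);",
    "INSERT INTO batch_index VALUES (3, 1168, 'name_1168', 5, 1638031542);",
]

for line in wrap_in_transactions(statements, per_transaction=2):
    print(line)
```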

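Stored procedures

Some readers will suggest a stored procedure, on the grounds that it should be much faster.

First, define the stored procedure: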

-- Change the statement delimiter so the ";" inside the procedure body does not end the CREATE statement early
DELIMITER $$
drop procedure if exists `insert_batch_index` $$
CREATE procedure `insert_batch_index` (in n int)
begin
	declare i int default 1;
	declare resource_id int default 0;
	declare test_name varchar(255) default '';
	declare cate_id int default 0;
	declare input_time int default 0;

	-- i starts at 1, so use <= to insert exactly n rows
	while i <= n do
		set resource_id = floor(1 + rand() * 3000);
		set test_name = concat('name_', resource_id);
		set cate_id = floor(1 + rand() * 20);
		set input_time = floor(1609430400 + rand() * 32227200);
		-- null lets AUTO_INCREMENT assign the id
		insert into batch_index values (null, resource_id, test_name, cate_id, input_time);
		set i = i + 1;
	end while;
end $$
-- Restore the default delimiter
delimiter ;

Then run it:

call insert_batch_index(10000);

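Execution time: 41.796 s, about the same as the plain single-row inserts, because with autocommit on every INSERT inside the procedure is still committed on its own.

Now run it inside a transaction: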

start transaction;
call insert_batch_index(10000);
commit;

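Execution time: 0.798 s

Stored procedure + memory table

The table has essentially the same columns as the InnoDB table created earlier (without the secondary indexes), but uses the MEMORY storage engine: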

CREATE TABLE `batch_index_memory` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `resource_id` int(11) NOT NULL COMMENT 'xx id',
  `name` varchar(255) NOT NULL DEFAULT '' COMMENT 'name',
  `cate_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'category id',
  `input_time` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'creation time',
  PRIMARY KEY (`id`) USING BTREE
) ENGINE=MEMORY DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC;

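We also create a stored procedure like the one above, except that it inserts into batch_index_memory. Then we call it and copy the rows into batch_index:

-- Stored procedure that inserts into the batch_index_memory memory table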
call insert_batch_index_memory(10000); 
insert into batch_index select * from batch_index_memory;

Execution time: 0.745 s

If you get an error saying the table is full, set max_heap_table_size = 1G in my.cnf or my.ini, adjusting the size to your situation.

With a transaction added:

start TRANSACTION;
call insert_batch_index_memory(10000);
commit;
insert into batch_index select * from batch_index_memory;

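Execution time: 0.740 s
Practically no change, which is expected: the MEMORY engine does not support transactions, so wrapping the call in one has no effect.

Temporary table (memory table)

First, use a programming language you are familiar with to generate the data file to be loaded (generating 10,000 rows took about 1.002 s).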
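The file itself is not reproduced here. Below is a minimal Python sketch under two assumptions: the file holds one id per line, and tmp_table has a single integer id column. Both are inferred from the later step that copies only the primary key id out of tmp_table. The function name `write_id_file` is hypothetical, and the output path should match whatever you pass to LOAD DATA.

```python
def write_id_file(path, n):
    """Write the ids 1..n, one per line, in a form LOAD DATA INFILE reads by default."""
    # newline="\n" keeps line endings as "\n" even on Windows, which matches
    # LOAD DATA's default LINES TERMINATED BY '\n'.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for i in range(1, n + 1):
            f.write(f"{i}\n")


if __name__ == "__main__":
    # Assumed file name, taken from the LOAD DATA statement used in this post.
    write_id_file("batch_sql.sql", 10000)
```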

Then run the following in the MySQL client or a tool such as Navicat (took 0.02 s):

load data infile 'F:/batch_sql.sql' replace into table tmp_table;

Here "F:/batch_sql.sql" is the path to your file.

If you get this error:

mysql> load data infile 'F:/batch_sql.sql' replace into table tmp_table;
ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement

you need to set the directory in the my.cnf or my.ini configuration file (under the [mysqld] section) and restart the server:

secure_file_priv = F:/

Then copy the primary key id from the temporary table into batch_index, filling the other columns with random values (took 0.245 s):

insert into batch_index (
  select 
    id, 
    floor(1 + rand() * 1000000) as resource_id, 
    concat('name', '_', floor(1 + rand() * 1000000)) as `name`, 
    floor(1 + rand() * 20) as cate_id, 
    floor(1609430400 + rand() * 32227200) as input_time 
  from tmp_table
);

The whole process took 1.267 s (1.002 s + 0.02 s + 0.245 s).

Comparison

Here are the results side by side:

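Method	Rows	Time
Single-row insert	10000	42.497 s
Batch insert	10000	1.815 s
Single-row insert + transaction	10000	3.644 s
Batch insert + transaction	10000	2.003 s
Stored procedure	10000	41.796 s
Stored procedure + transaction	10000	0.798 s
Stored procedure + memory table	10000	0.745 s
Stored procedure + memory table + transaction	10000	0.740 s
Temporary table (memory table)	10000	1.267 s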

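The factors that speed up importing data are:

Transactions
Batching
Stored procedures (only when combined with a transaction or a memory table; on their own they were no faster than single-row inserts)
Memory tables / temporary tables

Finally, we raised the data volume to 1 million rows and compared again: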

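Method	Rows	Time
Stored procedure + transaction	1000000	80.530 s
Stored procedure + memory table	1000000	77.822 s
Stored procedure + memory table + transaction	1000000	76.466 s
Temporary table (MEMORY table)	1000000	84.874 s
Temporary table (InnoDB table)	1000000	92.456 s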
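At this scale the five methods land within roughly 20% of each other. To get a feel for what that means for the original goal of 70 million rows, here is a small Python sketch that converts the timings above into rows per second and extrapolates linearly. Linear scaling is an assumption; real runs may slow down as the table and its indexes grow.

```python
# Measured timings for 1,000,000 rows, copied from the table above (seconds).
timings = {
    "stored procedure + transaction": 80.530,
    "stored procedure + memory table": 77.822,
    "stored procedure + memory table + transaction": 76.466,
    "temporary table (MEMORY)": 84.874,
    "temporary table (InnoDB)": 92.456,
}


def extrapolate_minutes(seconds_per_million, target_rows):
    """Linear extrapolation from a 1,000,000-row timing (an assumption, not a measurement)."""
    return seconds_per_million * (target_rows / 1_000_000) / 60


for method, secs in timings.items():
    rate = 1_000_000 / secs
    minutes = extrapolate_minutes(secs, 70_000_000)
    print(f"{method}: {rate:,.0f} rows/s, ~{minutes:.0f} min for 70M rows")
```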

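Just when I thought I could conclude that stored procedure + memory table was the fastest approach, I went on to generate 10 million rows and my mysqld service crashed because it ran out of memory.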
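One likely contributor (an estimate, not something measured in this post) is that the MEMORY engine stores rows in a fixed-length format, so a varchar(255) column in utf8mb4 reserves space for its maximum length no matter how short the actual value is. A rough Python sketch of the arithmetic, ignoring index and per-row engine overhead; the function names are hypothetical and the 2-byte length prefix is an approximation:

```python
def memory_table_row_bytes(varchar_chars=255, bytes_per_char=4, int_columns=4):
    """Rough fixed-length row size for batch_index_memory, ignoring engine overhead."""
    varchar_bytes = varchar_chars * bytes_per_char + 2  # max data size plus an assumed 2-byte length prefix
    return varchar_bytes + int_columns * 4              # each INT column takes 4 bytes


def estimate_gib(rows):
    """Estimated table size in GiB for the given number of rows."""
    return rows * memory_table_row_bytes() / 2**30


for rows in (10_000, 1_000_000, 10_000_000):
    print(f"{rows:>10,} rows: ~{estimate_gib(rows):.2f} GiB")
```

By this estimate, 1 million rows already need close to 1 GiB and 10 million rows need nearly 10 GiB, before counting the primary key index.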

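Summary

To generate test data quickly, we compared batching, transactions, stored procedures, and temporary/memory tables. Stored procedure + memory table turned out to be the fastest, but as the data volume grows the server can run out of memory, so generate the data in batches.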
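Generating in batches could look like the following. This is a minimal Python sketch that prints a SQL script, assuming that `insert_batch_index_memory(n)` inserts exactly n rows into batch_index_memory. The function name `batched_generation_script` and the batch size are hypothetical. The id column is deliberately left out of the copy, because TRUNCATE resets the memory table's AUTO_INCREMENT counter and the ids would otherwise collide between batches.

```python
def batched_generation_script(total_rows, rows_per_batch):
    """Return SQL that fills batch_index in batches via batch_index_memory."""
    if rows_per_batch <= 0:
        raise ValueError("rows_per_batch must be positive")
    lines = []
    remaining = total_rows
    while remaining > 0:
        n = min(rows_per_batch, remaining)
        lines.append("TRUNCATE TABLE batch_index_memory;")
        lines.append(f"CALL insert_batch_index_memory({n});")
        # Skip the id column so batch_index assigns its own AUTO_INCREMENT values;
        # ids in the memory table restart at 1 after each TRUNCATE.
        lines.append(
            "INSERT INTO batch_index (resource_id, name, cate_id, input_time) "
            "SELECT resource_id, name, cate_id, input_time FROM batch_index_memory;"
        )
        remaining -= n
    lines.append("TRUNCATE TABLE batch_index_memory;")  # release the memory at the end
    return "\n".join(lines)


print(batched_generation_script(total_rows=10_000_000, rows_per_batch=1_000_000))
```

The batch size should be chosen so that one batch fits comfortably within max_heap_table_size and the machine's available memory.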

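If you are interested, try it yourself. Configuration and hardware differ from machine to machine, so your results may deviate somewhat. Feel free to share your conclusions.

That's it. Did you pick up the trick?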
