[MySQL] X-DOC: Accelerating queries over massive MySQL data (a case study of scheduled jobs and stored procedures)

1. Case background

In a certain middle-platform system, a large amount of basic data (dimension data, dimension mappings, and so on) is designed to support the business functions, so the business tables carry many dimension foreign-key fields. The advantage is that the front end can validate data as it is selected and entered, ensuring the accuracy of what is recorded; the disadvantage is that building business reports requires a large number of dimension JOIN operations.
Because the platform requires reports to be produced through views, more complex report requirements can involve many table joins, which severely hurts report performance. Some even require views nested inside views; since a view computes over its full data set and is not filtered up front by the outer query's conditions, execution efficiency suffers further. For business tables with very large data volumes, the report may fail to return results at all.

2. Solution ideas

Borrow the approach of a data warehouse: use scheduled jobs to clean and materialize the data ahead of time. Relatively stable, infrequently changing data, such as basic data and historical business data, is processed by scheduled tasks with the necessary logic into intermediate data, or even final result data, and saved into self-built physical tables. Reasonable indexes are then created on those tables, and reports are built on top of them for front-end users to query, ultimately delivering a better user experience.
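As a minimal sketch of one refresh cycle (all table, view, and column names below are illustrative, not from the actual system):

```sql
-- Snapshot table for a slow multi-join report view, refreshed on a schedule.
CREATE TABLE IF NOT EXISTS rpt.t_sales_summary (
    fyear     INT,
    fmonth    INT,
    dept_code VARCHAR(50),
    amount    DECIMAL(16,6),
    KEY idx_period (fyear, fmonth)   -- index chosen to match report filters
) ENGINE=InnoDB;

TRUNCATE TABLE rpt.t_sales_summary;  -- discard the stale snapshot

INSERT INTO rpt.t_sales_summary (fyear, fmonth, dept_code, amount)
SELECT fyear, fmonth, dept_code, SUM(amount)
FROM rpt.view_sales_detail           -- the expensive multi-join view
GROUP BY fyear, fmonth, dept_code;
```

Reports then select from the physical table directly, paying the join cost once per refresh instead of once per query.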

3. Implementation method

Since the system runs on MySQL, the following uses MySQL to demonstrate the implementation process.

3.1 Enable the scheduled-scheduling function

Relational databases generally ship with their own scheduler, which can execute scripts on a schedule, e.g. SQL Server's jobs and MySQL's events.

# 1. Enable MySQL's scheduled-event setting
-- 1.1 Use SHOW EVENTS to list the currently defined events
	show EVENTS;

(Screenshot: the event list. It is initially empty; the records shown were generated after the steps below were completed.)

-- 1.2 Check the event_scheduler status
	SHOW VARIABLES LIKE 'event_scheduler';


-- 1.3 Enable the scheduler so that jobs can run:
	SET GLOBAL event_scheduler = ON;
-- or edit the my.ini file and add: event_scheduler=1

3.2 Create a JOB log table

Define a custom table to record job execution status, which makes follow-up tracking easier.

drop procedure if exists leodb.p_job_log;
delimiter #
create procedure leodb.p_job_log
(
	in p_id int,
	in p_job varchar(50),
	in p_task varchar(50),
	in p_note varchar(255)
)
begin
	CREATE TABLE if not exists leodb.t_job_log (
		id 				int,			-- job id
		job 			varchar(50),	-- job name
		task 			varchar(50),	-- task name
		starttime 		datetime,		-- start time
		endtime 		datetime,		-- end time
		NOTE			varchar(255),	-- remarks
		primary key( id, job, task)
	) ENGINE=InnoDB DEFAULT CHARSET=utf8;
	-- If the record exists, update its end time and append to the note
	if exists(select 1 from leodb.t_job_log 
						where id = p_id and job = p_job and task = p_task) 
	THEN
	 	update leodb.t_job_log set
	 		endtime = now(),
			-- MySQL has no '+' string operator; concat() must be used here
			note = case when ifnull(note,'')<>'' then concat(note, '/', p_note) else p_note end
	 	where id = p_id and job = p_job and task = p_task;
	-- If it does not exist, insert a new log record
	else
		insert into leodb.t_job_log(id, job, task, starttime, note)
		values( p_id, p_job, p_task, now(), p_note ); 
	end if;
end#
delimiter ;
-- Test the log-maintenance stored procedure
-- call leodb.p_job_log( cast(unix_timestamp() as signed), 'test', 'test', 'test');
-- call leodb.p_job_log( 1688086594, 'test', 'test', 'test');
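As a usage note: calling the procedure twice with the same (id, job, task) key yields a single row recording both the start and the end of a task run, for example:

```sql
SET @v_id = CAST(UNIX_TIMESTAMP() AS SIGNED);
CALL leodb.p_job_log(@v_id, 'demo_job', 'demo_task', '');      -- records starttime
-- ... the task's real work would run here ...
CALL leodb.p_job_log(@v_id, 'demo_job', 'demo_task', 'done');  -- fills endtime, appends the note
SELECT id, job, task, starttime, endtime, note FROM leodb.t_job_log;
```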

3.3 Create the JOB task procedures

# 3. Create the stored procedures for the job
# 3.1. TASK1: dump dimensions and dimension mappings to physical tables
drop procedure if exists leodb.p_job_get_data_4_rp;
delimiter #
create procedure leodb.p_job_get_data_4_rp()
begin
	-- 1. Cost center mapped to budget department
	start transaction;
	create table if not exists leodb.t_view_LEOPU_COST_BGDP 
	(
		s_object_code varchar(20),
		t_object_code varchar(20)
	);
	delete from leodb.t_view_LEOPU_COST_BGDP;
	insert into leodb.t_view_LEOPU_COST_BGDP 
	select s_object_code, t_object_code from leodb.view_LEOPU_COST_BGDP;
	commit;
	
	-- 2. Consumption type mapped to budget account
	start transaction;
	create table if not exists leodb.t_view_LEOPU_EXP_CSM_BGT
	(
		s_object_code varchar(20),
		t_object_code varchar(20)
	);
	delete from leodb.t_view_LEOPU_EXP_CSM_BGT;
	insert into leodb.t_view_LEOPU_EXP_CSM_BGT 
	select s_object_code, t_object_code from leodb.view_LEOPU_EXP_CSM_BGT;
	commit;
	
	-- 3. Accounting account mapped to budget account
	start transaction;
	create table if not exists leodb.t_view_LEOPU_ACC_BGT
	(
		s_object_code varchar(20),
		t_object_code varchar(20)
	);
	delete from leodb.t_view_LEOPU_ACC_BGT;
	insert into leodb.t_view_LEOPU_ACC_BGT 
	select s_object_code, t_object_code from leodb.view_LEOPU_ACC_BGT;
	commit;
	
	-- 4. LEOUP payment type + supplier type mapped to the credit side (payables reimbursement)
	start transaction;
	create table if not exists leodb.t_view_LEOPU_PUR_BILL
	(
		s1_object_code varchar(20),
		s2_object_code varchar(20),
		t_object_code varchar(20)
	);
	delete from leodb.t_view_LEOPU_PUR_BILL;
	insert into leodb.t_view_LEOPU_PUR_BILL 
	select s1_object_code, s2_object_code, t_object_code from leodb.view_LEOPU_PUR_BILL;
	commit;
end#
delimiter ;
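Section 2 calls for creating reasonable indexes on the materialized tables, but the snapshot tables above are created without any. A possible one-time follow-up, assuming the report views look rows up by the source code columns (index names and column choices are illustrative):

```sql
-- Run once after the first materialization; rerunning fails on duplicate index names.
ALTER TABLE leodb.t_view_LEOPU_COST_BGDP   ADD INDEX ix_s_code  (s_object_code);
ALTER TABLE leodb.t_view_LEOPU_EXP_CSM_BGT ADD INDEX ix_s_code  (s_object_code);
ALTER TABLE leodb.t_view_LEOPU_ACC_BGT     ADD INDEX ix_s_code  (s_object_code);
ALTER TABLE leodb.t_view_LEOPU_PUR_BILL    ADD INDEX ix_s_codes (s1_object_code, s2_object_code);
```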

# 3.2. TASK2: extract each flow's first-round approver into a physical table
-- Base table: records each flow's first-round approver (inspection query)
select * from leodb.t_mdfp_bpm_audit_first;
drop PROCEDURE if exists leodb.p_job_task_first_audit;
delimiter #
create PROCEDURE leodb.p_job_task_first_audit( in p_minute int )
begin
	start transaction;
	-- Create the first-round-approver table
	create table if not exists leodb.t_mdfp_bpm_audit_first
	(	BUSINESS_ID varchar(32), OPERATE_TIME bigint, USER_NAME varchar(255));
	-- Create a temporary table to reduce access to the approval-step table and speed up the query
	CREATE TEMPORARY TABLE if not exists leodb.tmp_mdfp_bpm_audit_history(
		BUSINESS_ID varchar(32), OPERATE_TIME bigint, USER_NAME varchar(255));   
	truncate table leodb.tmp_mdfp_bpm_audit_history;  
	-- Pull the latest first-audit records (incremental sync: records produced in the last p_minute minutes)
	insert into leodb.tmp_mdfp_bpm_audit_history( BUSINESS_ID, OPERATE_TIME, USER_NAME )	
	select BUSINESS_ID, OPERATE_TIME, USER_NAME
	from mdfp.mdfp_bpm_audit_history as t
	where ACT_NAME LIKE '%初审%' 		-- '初审' = first review
		and OPERATE_TYPE = 'approve'
		and OPERATE_TIME > UNIX_TIMESTAMP(DATE_ADD(now(),INTERVAL -p_minute MINUTE))*1000
		and not exists(select 1 from mdfp.mdfp_bpm_audit_history
										where BUSINESS_ID = t.BUSINESS_ID
											and ACT_NAME = t.ACT_NAME
											and OPERATE_TYPE = t.OPERATE_TYPE
											and OPERATE_TIME > t.OPERATE_TIME );

	-- Step 1: rows that already exist but carry a newer timestamp are written back
	update leodb.t_mdfp_bpm_audit_first as d
	join leodb.tmp_mdfp_bpm_audit_history as t on d.BUSINESS_ID = t.BUSINESS_ID and t.OPERATE_TIME > d.OPERATE_TIME
	set d.OPERATE_TIME = t.OPERATE_TIME, d.USER_NAME = t.USER_NAME;					 
	-- Step 2: rows that do not exist yet are inserted directly
	insert into leodb.t_mdfp_bpm_audit_first( BUSINESS_ID, OPERATE_TIME, USER_NAME)
	select BUSINESS_ID, OPERATE_TIME, USER_NAME 
	from leodb.tmp_mdfp_bpm_audit_history as t
	where not exists(select 1 from leodb.t_mdfp_bpm_audit_first 
									 where BUSINESS_ID = t.BUSINESS_ID);

	-- Release the temporary table
	drop temporary table leodb.tmp_mdfp_bpm_audit_history;
	commit;		-- commit the transaction
end#
delimiter ;

-- Test: since the procedure queries incrementally based on the sync period, pass a large
-- argument here to initialize the full data set
call leodb.p_job_task_first_audit(10000);
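The incremental window is worth spelling out: OPERATE_TIME is a millisecond epoch, so the cutoff is "now minus p_minute minutes" converted to milliseconds. With the event running every 60 minutes (section 3.4) and p_minute = 100, consecutive windows overlap, and the update/insert pair above makes reprocessing the overlap idempotent:

```sql
-- Cutoff computation used by the WHERE clause above (p_minute = 100, as in the job).
SET @p_minute = 100;
SELECT UNIX_TIMESTAMP(DATE_ADD(NOW(), INTERVAL -@p_minute MINUTE)) * 1000 AS cutoff_ms;
```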

# 3.3. TASK3: dump budget data to a physical table
drop PROCEDURE if exists leodb.p_job_task_budget;
delimiter #
create PROCEDURE leodb.p_job_task_budget()
begin
	start transaction;	-- start the transaction
	-- Create the table
	create table if not exists leodb.t_budget_data
	(
			FYEAR 						int,
			FMONTH 						int,
			FDATE 						date,	
			FYEARMONTH 				varchar(10),
			FBG_DEPT_CODE 		varchar(50),
			FBG_ACCOUNT_CODE 	varchar(50),
			FPROJECT_CODE 		varchar(50),
			FINDUSTRY_CODE 		varchar(50),
			BUDGET_AMOUNT 		decimal(16,6),					-- budget amount
			OCCUPIED_AMOUNT 	decimal(16,6),				-- occupied amount
			ACTUAL_AMOUNT 		decimal(16,6),					-- actual (incurred) amount
			AVAILABLE_AMOUNT 	decimal(16,6),				-- available amount
			primary key(FYEAR, FMONTH, FBG_DEPT_CODE, FBG_ACCOUNT_CODE, FPROJECT_CODE, FINDUSTRY_CODE)
	)
	ENGINE=InnoDB DEFAULT CHARSET=utf8;
	-- Clear existing data
	delete from leodb.t_budget_data where fyear = 2023;  
	-- Insert data (the year filter must match the year deleted above)
	insert into leodb.t_budget_data( FYEAR, FMONTH, FDATE, FYEARMONTH, FBG_DEPT_CODE, FBG_ACCOUNT_CODE, 
		FPROJECT_CODE, FINDUSTRY_CODE, BUDGET_AMOUNT, OCCUPIED_AMOUNT, ACTUAL_AMOUNT, AVAILABLE_AMOUNT )
	select h.FYEAR, h.FMONTH, h.FDATE, FYEARMONTH, h.FBG_DEPT_CODE, h.FBG_ACCOUNT_CODE, 
		h.FPROJECT_CODE, h.FINDUSTRY_CODE,
		h.BUDGET_AMOUNT, h.OCCUPIED_AMOUNT, h.ACTUAL_AMOUNT, h.AVAILABLE_AMOUNT
	from leodb.view_mdfp_bm_format as h
	where fyear = 2023;
	
	COMMIT;		-- commit the transaction
end#
delimiter ;
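Since leodb.t_budget_data has a primary key, the delete-then-insert refresh could alternatively be written as a single upsert, avoiding the window during which the year's rows are missing. A sketch, not part of the original procedure:

```sql
INSERT INTO leodb.t_budget_data( FYEAR, FMONTH, FDATE, FYEARMONTH, FBG_DEPT_CODE, FBG_ACCOUNT_CODE,
	FPROJECT_CODE, FINDUSTRY_CODE, BUDGET_AMOUNT, OCCUPIED_AMOUNT, ACTUAL_AMOUNT, AVAILABLE_AMOUNT )
SELECT h.FYEAR, h.FMONTH, h.FDATE, h.FYEARMONTH, h.FBG_DEPT_CODE, h.FBG_ACCOUNT_CODE,
	h.FPROJECT_CODE, h.FINDUSTRY_CODE,
	h.BUDGET_AMOUNT, h.OCCUPIED_AMOUNT, h.ACTUAL_AMOUNT, h.AVAILABLE_AMOUNT
FROM leodb.view_mdfp_bm_format AS h
WHERE h.fyear = 2023
ON DUPLICATE KEY UPDATE
	BUDGET_AMOUNT    = VALUES(BUDGET_AMOUNT),
	OCCUPIED_AMOUNT  = VALUES(OCCUPIED_AMOUNT),
	ACTUAL_AMOUNT    = VALUES(ACTUAL_AMOUNT),
	AVAILABLE_AMOUNT = VALUES(AVAILABLE_AMOUNT);
```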

3.4 Create the JOB (event)

# 4. Create the job itself
drop event if exists leodb.JOB_RUN_EVERY1HOUR;
delimiter #
create event leodb.JOB_RUN_EVERY1HOUR  
on schedule every 1 hour starts timestamp '2023-06-29 00:00:01'
do
begin
	-- 1. Dump dimension data and dimension mappings to physical tables
	set @v_id=cast(unix_timestamp() as signed);
	call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_get_data_4_rp', '');
	call leodb.p_job_get_data_4_rp();
	call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_get_data_4_rp', 'dimension data dumped successfully');
	
	-- 2. Save the first-round approvers from the approval steps to a physical table
	set @v_id=cast(unix_timestamp() as signed);
	call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_task_first_audit(100)', '');
	call leodb.p_job_task_first_audit(100);		-- 100-minute window
	call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_task_first_audit(100)', 'first-round approvers dumped successfully');
	
	-- 3. Save budget data to a physical table
	set @v_id=cast(unix_timestamp() as signed);
	call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR','leodb.p_job_task_budget', '');
	call leodb.p_job_task_budget();
	call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR','leodb.p_job_task_budget', 'budget data dumped successfully');
end#
delimiter ;

3.5 Maintaining and inspecting the JOB

# 5. JOB maintenance
-- 5.1 Stop
ALTER EVENT leodb.JOB_RUN_EVERY1HOUR DISABLE;
-- 5.2 Start
ALTER EVENT leodb.JOB_RUN_EVERY1HOUR ENABLE;
-- 5.3 Check status (mysql.event exists up to MySQL 5.7)
select * from mysql.event;
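Note that the mysql.event system table was removed in MySQL 8.0; a version-portable way to inspect events (it works on 5.7 and 8.0 alike) is information_schema:

```sql
SELECT EVENT_SCHEMA, EVENT_NAME, STATUS,
       INTERVAL_VALUE, INTERVAL_FIELD, STARTS, LAST_EXECUTED
FROM information_schema.EVENTS
WHERE EVENT_SCHEMA = 'leodb';
```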

4. Summary

The intermediate-data processing described above can significantly improve report query performance.
In real projects this solution fits scenarios where data freshness requirements are relaxed. Where freshness matters, the data can instead be processed in segments: materialize the historical portion ahead of time, query only the current portion live, and combine the two to accelerate the query over the whole data set. Applied judiciously, this also yields a substantial speedup.
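The segmented approach in the last paragraph can be sketched as a view that unions the materialized history with a live query restricted to the current period; the column choices here are illustrative and would need to match the real report:

```sql
CREATE OR REPLACE VIEW leodb.view_budget_hybrid AS
SELECT FYEAR, FMONTH, FBG_DEPT_CODE, ACTUAL_AMOUNT
FROM leodb.t_budget_data                -- materialized history: fast, refreshed by the job
WHERE FYEAR < YEAR(CURDATE())
UNION ALL
SELECT FYEAR, FMONTH, FBG_DEPT_CODE, ACTUAL_AMOUNT
FROM leodb.view_mdfp_bm_format          -- live view, but filtered to the current year only
WHERE FYEAR = YEAR(CURDATE());
```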

Original article, please indicate the source when reprinting - X-Files

Origin blog.csdn.net/XLevon/article/details/131487428