X-DOC: Accelerating queries on massive MySQL data (a case study using scheduled jobs and stored procedures)
1. Case background
In a middle-platform system, a large amount of basic data (dimension data, dimension mapping relationships, and so on) was designed to support business functions, so the business tables contain many foreign-key fields that reference dimensions. The advantage is that front-end users select values instead of typing them, and validation keeps the entered data accurate. The disadvantage is that business reports require a large number of dimension joins.
Because the platform requires reports to be built on views, more complex reports may join many tables, which greatly hurts report performance. Some reports even require views nested inside views. Because a view does not first filter its full data set by the outer query's filter conditions, execution efficiency degrades further. For business tables with a large volume of data, the report may fail to return any data at all.
2. Solution ideas
The solution borrows the idea of a data warehouse: use scheduled jobs to clean and materialize data in advance. Relatively stable, infrequently changing data, such as basic data and historical business data, is processed by scheduled tasks with the necessary logic into intermediate data, or even final result data, and saved to self-built physical tables. Suitable indexes are then created on those tables. Finally, reports are built on these physical tables and served to front-end users, which gives a better user experience.
3. Implementation method
Since the system runs on a MySQL database, the following demonstrates the implementation in MySQL.
3.1 Turn on the scheduled scheduling function
Relational databases generally provide a built-in scheduler that can run scripts on a schedule, such as SQL Server's jobs and MySQL's events.
# 1. Enable the MySQL event scheduler
-- 1.1 List the currently defined events with SHOW EVENTS
show EVENTS;
(Note: the list is initially empty; the records in the picture were generated after the following steps were completed.)
-- 1.2 Check the event_scheduler status
SHOW VARIABLES LIKE 'event_scheduler';
-- 1.3 Turn the scheduler on so that jobs run automatically:
SET GLOBAL event_scheduler = ON;
-- Or edit my.ini and add: event_scheduler=1
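To confirm that the scheduler is actually running after step 1.3, one option is to read the global variable and look for the scheduler's daemon thread. This is a minimal sketch, assuming the session has sufficient privileges to see threads owned by other users:

```sql
-- Returns ON, OFF, or DISABLED
SELECT @@GLOBAL.event_scheduler;

-- When the scheduler is ON, a daemon thread owned by the user
-- "event_scheduler" appears in the process list
SHOW PROCESSLIST;
```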
3.2 Create JOB log table
Define a custom table that records job execution status so that runs can be traced later. The stored procedure below creates the table on first use and then writes the log entry.
delimiter #
drop procedure if exists leodb.p_job_log;
create procedure leodb.p_job_log
(
in p_id int,
in p_job varchar(50),
in p_task varchar(50),
in p_note varchar(255)
)
begin
CREATE TABLE if not exists leodb.t_job_log (
id int, -- job id
job varchar(50), -- job name
task varchar(50), -- task name
starttime datetime, -- start time
endtime datetime, -- end time
NOTE varchar(255), -- remarks
primary key( id, job, task)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- If the entry exists, update the end time and append the note
if exists(select 1 from leodb.t_job_log
where id = p_id and job = p_job and task = p_task)
THEN
update leodb.t_job_log set
endtime = now(),
-- concat() joins the strings; in MySQL the + operator would add them as numbers
note = case when ifnull(note,'')<>'' then concat(note, '/', p_note) else p_note end
where id = p_id and job = p_job and task = p_task;
-- Otherwise, insert a new log entry
else
insert into leodb.t_job_log(id, job, task, starttime, note)
values( p_id, p_job, p_task, now(), p_note );
end if;
end#
delimiter ;
-- Test the log-maintenance stored procedure
-- call leodb.p_job_log( cast(unix_timestamp() as signed) , 'test', 'test', 'test');
-- call leodb.p_job_log( 1688086594, 'test', 'test', 'test');
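Once the log table is populated, it can be queried to see how long each task took and whether any task never finished. The queries below are a small sketch that relies only on the columns of leodb.t_job_log defined above; the LIMIT value is an arbitrary choice.

```sql
-- Most recent runs with their duration in seconds
SELECT id, job, task, starttime, endtime,
       TIMESTAMPDIFF(SECOND, starttime, endtime) AS seconds,
       note
FROM leodb.t_job_log
ORDER BY starttime DESC
LIMIT 20;

-- Tasks that logged a start but no end time (still running, or failed midway)
SELECT id, job, task, starttime
FROM leodb.t_job_log
WHERE endtime IS NULL
ORDER BY starttime;
```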
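3.3 Create the JOB tasks
# 3. Create the stored procedures for the individual job tasks
# 3.1 TASK1: copy dimensions and dimension mappings into physical tables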
delimiter #
drop procedure if exists leodb.p_job_get_data_4_rp;
create procedure leodb.p_job_get_data_4_rp()
begin
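-- 1. Cost center to budget department mapping
-- (create table is DDL and commits implicitly, so it runs before the transaction starts)
create table if not exists leodb.t_view_LEOPU_COST_BGDP
(
s_object_code varchar(20),
t_object_code varchar(20)
);
start transaction;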
delete from leodb.t_view_LEOPU_COST_BGDP;
insert into leodb.t_view_LEOPU_COST_BGDP
select s_object_code, t_object_code from leodb.view_LEOPU_COST_BGDP;
commit;
-- 2. Expense type to budget account mapping
create table if not exists leodb.t_view_LEOPU_EXP_CSM_BGT
(
s_object_code varchar(20),
t_object_code varchar(20)
);
start transaction;
delete from leodb.t_view_LEOPU_EXP_CSM_BGT;
insert into leodb.t_view_LEOPU_EXP_CSM_BGT
select s_object_code, t_object_code from leodb.view_LEOPU_EXP_CSM_BGT;
commit;
-- 3. Accounting account to budget account mapping
create table if not exists leodb.t_view_LEOPU_ACC_BGT
(
s_object_code varchar(20),
t_object_code varchar(20)
);
start transaction;
delete from leodb.t_view_LEOPU_ACC_BGT;
insert into leodb.t_view_LEOPU_ACC_BGT
select s_object_code, t_object_code from leodb.view_LEOPU_ACC_BGT;
commit;
-- 4. LEOPU payment type + supplier type to credit-side mapping (accounts payable reimbursement)
create table if not exists leodb.t_view_LEOPU_PUR_BILL
(
s1_object_code varchar(20),
s2_object_code varchar(20),
t_object_code varchar(20)
);
start transaction;
delete from leodb.t_view_LEOPU_PUR_BILL;
insert into leodb.t_view_LEOPU_PUR_BILL
select s1_object_code, s2_object_code, t_object_code from leodb.view_LEOPU_PUR_BILL;
commit;
end#
delimiter ;
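The solution idea calls for suitable indexes on the materialized tables, but the TASK1 procedure creates them without any. The statements below are a sketch of how indexes could be added once, after the first run of the procedure. The index names are hypothetical, and the column choice assumes that reports look up mappings by their source code columns.

```sql
-- Run once after leodb.p_job_get_data_4_rp() has created the tables.
-- Index names are hypothetical; columns assume lookups by source code.
CREATE INDEX idx_cost_bgdp_src ON leodb.t_view_LEOPU_COST_BGDP (s_object_code);
CREATE INDEX idx_exp_csm_bgt_src ON leodb.t_view_LEOPU_EXP_CSM_BGT (s_object_code);
CREATE INDEX idx_acc_bgt_src ON leodb.t_view_LEOPU_ACC_BGT (s_object_code);
-- Composite index for the two-column source key
CREATE INDEX idx_pur_bill_src ON leodb.t_view_LEOPU_PUR_BILL (s1_object_code, s2_object_code);
```

Because the procedure refreshes the tables with delete and insert rather than dropping them, the indexes survive each refresh. Running the CREATE INDEX statements a second time fails with a duplicate key name error, so they belong in a one-time setup script rather than in the job itself.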
# 3.2 TASK2: extract the first-review approver of each workflow into a physical table
delimiter #
drop PROCEDURE if exists leodb.p_job_task_first_audit;
create PROCEDURE leodb.p_job_task_first_audit( in p_minute int )
begin
-- Create the first-review approver table
-- (DDL and truncate commit implicitly, so they run before the transaction starts)
create table if not exists leodb.t_mdfp_bpm_audit_first
( BUSINESS_ID varchar(32), OPERATE_TIME bigint, USER_NAME varchar(255));
-- Create a temporary table to reduce access to the approval-step table and speed up the queries
CREATE TEMPORARY TABLE if not exists leodb.tmp_mdfp_bpm_audit_history(
BUSINESS_ID varchar(32), OPERATE_TIME bigint, USER_NAME varchar(255));
truncate table leodb.tmp_mdfp_bpm_audit_history;
start transaction;
-- Pull the latest first-review records (incremental sync: records produced in the last p_minute minutes)
insert into leodb.tmp_mdfp_bpm_audit_history( BUSINESS_ID, OPERATE_TIME, USER_NAME )
select BUSINESS_ID, OPERATE_TIME, USER_NAME
from mdfp.mdfp_bpm_audit_history as t
where ACT_NAME LIKE '%初审%' -- activity names containing "初审" (first review)
and OPERATE_TYPE = 'approve'
and OPERATE_TIME > UNIX_TIMESTAMP(DATE_ADD(now(),INTERVAL -p_minute MINUTE))*1000
and not exists(select 1 from mdfp.mdfp_bpm_audit_history
where BUSINESS_ID = t.BUSINESS_ID
and ACT_NAME = t.ACT_NAME
and OPERATE_TYPE = t.OPERATE_TYPE
and OPERATE_TIME > t.OPERATE_TIME );
-- Case 1: rows that already exist with an older timestamp are updated
update leodb.t_mdfp_bpm_audit_first as d
join leodb.tmp_mdfp_bpm_audit_history as t on d.BUSINESS_ID = t.BUSINESS_ID and t.OPERATE_TIME > d.OPERATE_TIME
set d.OPERATE_TIME = t.OPERATE_TIME, d.USER_NAME = t.USER_NAME;
-- Case 2: rows that do not exist yet are inserted
insert into leodb.t_mdfp_bpm_audit_first( BUSINESS_ID, OPERATE_TIME, USER_NAME)
select BUSINESS_ID, OPERATE_TIME, USER_NAME
from leodb.tmp_mdfp_bpm_audit_history as t
where not exists(select 1 from leodb.t_mdfp_bpm_audit_first
where BUSINESS_ID = t.BUSINESS_ID);
commit; -- commit the transaction
-- Release the temporary table
drop temporary table leodb.tmp_mdfp_bpm_audit_history;
end#
delimiter ;
-- Test: the procedure pulls increments based on the sync period, so pass a large value here
-- to initialize the full data set (10000 minutes is about 7 days; increase it if the history is longer)
call leodb.p_job_task_first_audit(10000);
-- Base table: records the first-review approver of each workflow
select * from leodb.t_mdfp_bpm_audit_first;
# 3.3 TASK3: save budget data into a physical table
delimiter #
drop PROCEDURE if exists leodb.p_job_task_budget;
create PROCEDURE leodb.p_job_task_budget()
begin
-- Create the table (DDL commits implicitly, so it runs before the transaction starts)
create table if not exists leodb.t_budget_data
(
FYEAR int,
FMONTH int,
FDATE date,
FYEARMONTH varchar(10),
FBG_DEPT_CODE varchar(50),
FBG_ACCOUNT_CODE varchar(50),
FPROJECT_CODE varchar(50),
FINDUSTRY_CODE varchar(50),
BUDGET_AMOUNT decimal(16,6), -- budget amount
OCCUPIED_AMOUNT decimal(16,6), -- occupied amount
ACTUAL_AMOUNT decimal(16,6), -- actual amount incurred
AVAILABLE_AMOUNT decimal(16,6), -- available amount
primary key(FYEAR, FMONTH, FBG_DEPT_CODE, FBG_ACCOUNT_CODE, FPROJECT_CODE, FINDUSTRY_CODE)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8;
start transaction; -- begin the transaction
-- Clear the existing data for the year
delete from leodb.t_budget_data where fyear = 2023;
-- Insert fresh data for the same year (the year must match the delete above)
insert into leodb.t_budget_data( FYEAR, FMONTH, FDATE, FYEARMONTH, FBG_DEPT_CODE, FBG_ACCOUNT_CODE,
FPROJECT_CODE, FINDUSTRY_CODE, BUDGET_AMOUNT, OCCUPIED_AMOUNT, ACTUAL_AMOUNT, AVAILABLE_AMOUNT )
select h.FYEAR, h.FMONTH, h.FDATE, h.FYEARMONTH, h.FBG_DEPT_CODE, h.FBG_ACCOUNT_CODE,
h.FPROJECT_CODE, h.FINDUSTRY_CODE,
h.BUDGET_AMOUNT, h.OCCUPIED_AMOUNT, h.ACTUAL_AMOUNT, h.AVAILABLE_AMOUNT
from leodb.view_mdfp_bm_format as h
where h.fyear = 2023;
COMMIT; -- commit the transaction
end#
delimiter ;
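The budget task hard-codes the year, so the procedure has to be edited whenever the year changes. One possible variant takes the year as a parameter. This is a sketch only: the procedure name p_job_task_budget_by_year and the parameter p_year are hypothetical, and it assumes leodb.t_budget_data has already been created by the task above.

```sql
delimiter #
drop procedure if exists leodb.p_job_task_budget_by_year;
-- Hypothetical variant of the budget task: refreshes one year of data
create procedure leodb.p_job_task_budget_by_year( in p_year int )
begin
  start transaction;
  -- Remove the year's existing rows, then reload them from the source view
  delete from leodb.t_budget_data where FYEAR = p_year;
  insert into leodb.t_budget_data( FYEAR, FMONTH, FDATE, FYEARMONTH, FBG_DEPT_CODE, FBG_ACCOUNT_CODE,
    FPROJECT_CODE, FINDUSTRY_CODE, BUDGET_AMOUNT, OCCUPIED_AMOUNT, ACTUAL_AMOUNT, AVAILABLE_AMOUNT )
  select h.FYEAR, h.FMONTH, h.FDATE, h.FYEARMONTH, h.FBG_DEPT_CODE, h.FBG_ACCOUNT_CODE,
    h.FPROJECT_CODE, h.FINDUSTRY_CODE,
    h.BUDGET_AMOUNT, h.OCCUPIED_AMOUNT, h.ACTUAL_AMOUNT, h.AVAILABLE_AMOUNT
  from leodb.view_mdfp_bm_format as h
  where h.FYEAR = p_year;
  commit;
end#
delimiter ;

-- Refresh the current calendar year
call leodb.p_job_task_budget_by_year( year(curdate()) );
```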
3.4 Create JOB
# 4. Create the JOB (event)
delimiter #
drop event if exists leodb.JOB_RUN_EVERY1HOUR;
create event leodb.JOB_RUN_EVERY1HOUR
on schedule every 1 hour starts timestamp '2023-06-29 00:00:01'
do
begin
-- 1. Copy dimension data and dimension mappings into physical tables
set @v_id=cast(unix_timestamp() as signed);
call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_get_data_4_rp', '');
call leodb.p_job_get_data_4_rp();
call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_get_data_4_rp', 'dimension data dump succeeded');
-- 2. Save the first-review approvers of the approval steps into a physical table
set @v_id=cast(unix_timestamp() as signed);
call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_task_first_audit(100)', '');
call leodb.p_job_task_first_audit(100); -- look back 100 minutes
call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_task_first_audit(100)', 'first-review approver dump succeeded');
-- 3. Save budget data into a physical table
set @v_id=cast(unix_timestamp() as signed);
call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR','leodb.p_job_task_budget', '');
call leodb.p_job_task_budget();
call leodb.p_job_log( @v_id , 'JOB_RUN_EVERY1HOUR','leodb.p_job_task_budget', 'budget data dump succeeded');
end#
delimiter ;
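As written, the event stops at the first task that raises an error, and that task's log row is left without an end time or any note. One way to record failures is to wrap a task in a procedure with an error handler. The sketch below does this for the budget task only; the wrapper name p_job_run_budget_logged and the note texts are hypothetical.

```sql
delimiter #
drop procedure if exists leodb.p_job_run_budget_logged;
-- Hypothetical wrapper: runs the budget task and logs success or failure
create procedure leodb.p_job_run_budget_logged()
begin
  declare v_id int default cast(unix_timestamp() as signed);
  -- On any SQL error: undo uncommitted work, record the failure, then exit
  declare exit handler for sqlexception
  begin
    rollback;
    call leodb.p_job_log( v_id, 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_task_budget', 'failed' );
  end;
  -- First call inserts the log row, the later call sets its end time and note
  call leodb.p_job_log( v_id, 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_task_budget', '' );
  call leodb.p_job_task_budget();
  call leodb.p_job_log( v_id, 'JOB_RUN_EVERY1HOUR', 'leodb.p_job_task_budget', 'succeeded' );
end#
delimiter ;
```

Inside the event, the three budget lines could then be replaced by a single call to the wrapper, so that a failure in one task is logged and does not go unnoticed.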
3.5 Maintenance and viewing of JOB
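# 5. JOB maintenance
-- 5.1 Stop
ALTER EVENT leodb.JOB_RUN_EVERY1HOUR DISABLE;
-- 5.2 Start
ALTER EVENT leodb.JOB_RUN_EVERY1HOUR enable;
-- 5.3 View the status
-- (mysql.event exists in MySQL 5.x; MySQL 8.0 removed it in favor of information_schema.events)
select * from mysql.event;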
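For a more targeted status check, the event metadata can be filtered to the schema used in this article, and the stored definition can be printed. This is a small sketch that uses the standard information_schema.EVENTS table, which is available in both MySQL 5.x and 8.0:

```sql
-- Status, schedule, and last execution time of the events in leodb
SELECT EVENT_SCHEMA, EVENT_NAME, STATUS,
       INTERVAL_VALUE, INTERVAL_FIELD, STARTS, LAST_EXECUTED
FROM information_schema.EVENTS
WHERE EVENT_SCHEMA = 'leodb';

-- Full definition of the job as stored by the server
SHOW CREATE EVENT leodb.JOB_RUN_EVERY1HOUR;
```

Comparing LAST_EXECUTED with the newest starttime in leodb.t_job_log is a quick way to tell whether the event fired but a task inside it failed.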
4. Summary
With the intermediate data processing described above, report query efficiency can be improved significantly.
In real projects, this approach suits scenarios where real-time data requirements are not strict. Where they are strict, the data can be processed in segments: for example, materialize the historical data first, query only the current data in real time, and then combine the two to speed up the query over the whole data set. Applied properly, this can also yield a good speed-up.
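The segmented approach can be sketched with the budget objects from section 3.3. The query below assumes, purely for illustration, that the boundary between historical and current data is the calendar year, that past years have been materialized into leodb.t_budget_data, and that the report needs only the columns shown.

```sql
-- History: served from the materialized table refreshed by the scheduled job
select FYEAR, FMONTH, FBG_DEPT_CODE, FBG_ACCOUNT_CODE, BUDGET_AMOUNT, ACTUAL_AMOUNT
from leodb.t_budget_data
where FYEAR < year(curdate())
union all
-- Current year: read live from the source view
select FYEAR, FMONTH, FBG_DEPT_CODE, FBG_ACCOUNT_CODE, BUDGET_AMOUNT, ACTUAL_AMOUNT
from leodb.view_mdfp_bm_format
where FYEAR = year(curdate());
```

UNION ALL is used rather than UNION because the two segments cover disjoint years, so no duplicate elimination is needed.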
Original article, please indicate the source when reprinting - X-Files