Lineage Analysis Based on MaxCompute Information_Schema

I. Scenario analysis

In day-to-day operation and management of a data platform, the number of data tables keeps growing as more business data is onboarded and more data applications are built, and it often becomes very large. Data managers therefore want to analyze metadata to get a better grasp of how tables depend on each other upstream and downstream. This article describes how to analyze the lineage of a given table from the input and output tables that MaxCompute Information_Schema records for each job ID.

II. Design approach

The tasks_history table in MaxCompute Information_Schema holds the details of each job. Its job ID (inst_id), input_tables, and output_tables fields record which tables each job read and wrote. Table lineage is analyzed in two steps:
1. From one day of job history in tasks_history, collect the input_tables, output_tables, and job ID fields, and use them to compute the upstream and downstream dependencies of each table within a given time window.
2. Infer table lineage from those upstream and downstream dependencies.

III. Implementation

Reference example 1:
(1) Query the upstream and downstream dependencies of each table, keyed by job ID, with the following SQL:
select  t2.input_table,
        t1.inst_id,
        -- remove the leading "[" and trailing "]" from the output table list
        replace(replace(t1.output_tables, "[", ""), "]", "") as output_table
from    information_schema.tasks_history t1
left join (
        select
                -- remove the leading "[" and trailing "]", then split the input table list into one row per table
                trans_array(1, ",", inst_id, replace(replace(input_tables, "[", ""), "]", "")) as (inst_id, input_table)
        from    information_schema.tasks_history
        where   ds = '20190902'
        ) t2
on      t1.inst_id = t2.inst_id
where   t1.ds = '20190902'
and     replace(replace(t1.output_tables, "[", ""), "]", "") <> ""
order by t2.input_table
limit   1000;
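To make the logic of the query above concrete, here is a minimal Python sketch that replays it on a few in-memory rows. The rows, table names, and function names are hypothetical. The sketch assumes input_tables and output_tables hold bracketed, comma-separated lists, as the nested replace calls imply, and models only the three columns the query uses; the ds partition and LIMIT are left out. strip_brackets mirrors the nested replace, explode_inputs mirrors trans_array, and the lookup by inst_id mirrors the left join.

```python
# Hypothetical rows standing in for one day's partition of
# information_schema.tasks_history (only the columns the query uses).
ROWS = [
    {"inst_id": "job_1", "input_tables": "[ods_a,ods_b]", "output_tables": "[dwd_c]"},
    {"inst_id": "job_2", "input_tables": "[dwd_c]", "output_tables": "[ads_d]"},
    {"inst_id": "job_3", "input_tables": "[ods_a]", "output_tables": "[]"},  # wrote nothing
]


def strip_brackets(value: str) -> str:
    """Mimic replace(replace(value, "[", ""), "]", "")."""
    return value.replace("[", "").replace("]", "")


def explode_inputs(rows):
    """Mimic trans_array(1, ",", inst_id, ...): one (inst_id, input_table) pair per input table."""
    for r in rows:
        for table in strip_brackets(r["input_tables"]).split(","):
            yield r["inst_id"], table


def job_dependencies(rows):
    """Return (input_table, inst_id, output_table) rows ordered by input_table."""
    inputs_by_job = {}
    for inst_id, table in explode_inputs(rows):
        inputs_by_job.setdefault(inst_id, []).append(table)
    result = []
    for r in rows:
        output_table = strip_brackets(r["output_tables"])
        if output_table == "":  # the WHERE ... <> "" filter drops jobs with no output table
            continue
        # Left join on inst_id: a job with no exploded inputs would get None (SQL NULL)
        for table in inputs_by_job.get(r["inst_id"], [None]):
            result.append((table, r["inst_id"], output_table))
    # ORDER BY input_table; None is sorted as an empty string here
    return sorted(result, key=lambda row: row[0] or "")


for row in job_dependencies(ROWS):
    print(row)
```

Like the SQL, the sketch does not split output_tables. A job that writes several tables therefore yields a single comma-separated output_table value.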
The results of the SQL query are shown below:
 
(2) From these results, link the input tables and output tables of each table through the job ID. This gives the lineage of each table, shown below:
 
In this diagram, the job ID sits in the middle of each link. The link starts at an input table, and the arrow points to an output table.

Reference example 2: The following implementation processes one partition (one day) at a time and analyzes lineage in combination with DataWorks:
(1) Design the schema of the table that stores the results:
CREATE TABLE IF NOT EXISTS dim_meta_tasks_history_a
(
    stat_date    STRING COMMENT 'Statistics date',
    project_name STRING COMMENT 'Project name',
    task_id      STRING COMMENT 'Job ID',
    start_time   STRING COMMENT 'Start time',
    end_time     STRING COMMENT 'End time',
    input_table  STRING COMMENT 'Input table',
    output_table STRING COMMENT 'Output table',
    etl_date     STRING COMMENT 'ETL run time'
);
(2) Key analysis SQL:
SELECT  '${yesterday}' AS stat_date
        ,'project_name' AS project_name    -- literal placeholder: replace with your project name
        ,a.inst_id AS task_id
        ,a.start_time AS start_time
        ,a.end_time AS end_time
        ,a.input_table AS input_table
        ,a.output_table AS output_table
        ,GETDATE() AS etl_date
FROM    (
            SELECT  t2.input_table
                    ,t1.inst_id
                    ,replace(replace(t1.output_tables,"[",""),"]","") AS output_table
                    ,t1.start_time
                    ,t1.end_time
            FROM    (
                        -- INSERT OVERWRITE jobs that ran between 00:00 and 08:00 on ${yesterday};
                        -- rn = 1 marks the most recent job for each output table
                        SELECT  *
                                ,ROW_NUMBER() OVER(PARTITION BY output_tables ORDER BY end_time DESC) AS rn
                        FROM    information_schema.tasks_history
                        WHERE   operation_text LIKE 'INSERT OVERWRITE TABLE%'
                        AND     start_time >= TO_DATE('${yesterday}','yyyy-mm-dd')
                        AND     end_time <= DATEADD(TO_DATE('${yesterday}','yyyy-mm-dd'),8,'hh')
                        AND     replace(replace(output_tables,"[",""),"]","") <> ""
                        AND     ds = CONCAT(SUBSTR('${yesterday}',1,4),SUBSTR('${yesterday}',6,2),SUBSTR('${yesterday}',9,2))
                    ) t1
            LEFT JOIN (
                        -- split each job's input table list into one row per table
                        SELECT  TRANS_ARRAY(1,",",inst_id,replace(replace(input_tables,"[",""),"]","")) AS (inst_id,input_table)
                        FROM    information_schema.tasks_history
                        WHERE   ds = CONCAT(SUBSTR('${yesterday}',1,4),SUBSTR('${yesterday}',6,2),SUBSTR('${yesterday}',9,2))
                    ) t2
            ON      t1.inst_id = t2.inst_id
            WHERE   t1.rn = 1
        ) a
WHERE   a.input_table IS NOT NULL;
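The following minimal Python sketch replays the key SQL on hypothetical in-memory rows. It applies the same filters: INSERT OVERWRITE jobs only, the 00:00 to 08:00 window on the given day, and a non-empty output_tables. It then keeps only the most recent job per output_tables value, which is what ROW_NUMBER() ... = 1 selects, and explodes that job's input list. The row layout, the sample data, and the function names are assumptions for illustration. The ds partition, project_name, and etl_date are left out.

```python
from datetime import datetime, timedelta

# Hypothetical tasks_history rows (only the columns the key SQL reads).
ROWS = [
    {"inst_id": "job_1", "operation_text": "INSERT OVERWRITE TABLE dwd_c SELECT ...",
     "input_tables": "[ods_a]", "output_tables": "[dwd_c]",
     "start_time": datetime(2019, 9, 2, 1, 0), "end_time": datetime(2019, 9, 2, 1, 30)},
    {"inst_id": "job_2", "operation_text": "INSERT OVERWRITE TABLE dwd_c SELECT ...",
     "input_tables": "[ods_a,ods_b]", "output_tables": "[dwd_c]",
     "start_time": datetime(2019, 9, 2, 2, 0), "end_time": datetime(2019, 9, 2, 2, 30)},
    {"inst_id": "job_3", "operation_text": "SELECT * FROM dwd_c",
     "input_tables": "[dwd_c]", "output_tables": "[]",
     "start_time": datetime(2019, 9, 2, 3, 0), "end_time": datetime(2019, 9, 2, 3, 5)},
    {"inst_id": "job_4", "operation_text": "INSERT OVERWRITE TABLE ads_d SELECT ...",
     "input_tables": "[dwd_c]", "output_tables": "[ads_d]",
     "start_time": datetime(2019, 9, 2, 7, 0), "end_time": datetime(2019, 9, 2, 9, 0)},
]


def strip_brackets(value: str) -> str:
    """Mimic replace(replace(value, "[", ""), "]", "")."""
    return value.replace("[", "").replace("]", "")


def latest_writers(rows, day: datetime, window_hours: int = 8):
    """Apply the t1 filters, then keep the latest job for each output_tables value."""
    window_end = day + timedelta(hours=window_hours)
    latest = {}
    for r in rows:
        if not r["operation_text"].startswith("INSERT OVERWRITE TABLE"):
            continue
        if r["start_time"] < day or r["end_time"] > window_end:
            continue
        if strip_brackets(r["output_tables"]) == "":
            continue
        # Same effect as ROW_NUMBER() OVER (PARTITION BY output_tables ORDER BY end_time DESC) = 1
        key = r["output_tables"]
        if key not in latest or r["end_time"] > latest[key]["end_time"]:
            latest[key] = r
    return list(latest.values())


def lineage_rows(rows, day: datetime):
    """Return (task_id, input_table, output_table) rows, like the outer query."""
    result = []
    for r in latest_writers(rows, day):
        output_table = strip_brackets(r["output_tables"])
        for input_table in strip_brackets(r["input_tables"]).split(","):
            result.append((r["inst_id"], input_table, output_table))
    return sorted(result)


print(lineage_rows(ROWS, datetime(2019, 9, 2)))
```

Here job_1 is superseded by the later job_2 for dwd_c. job_3 is not an INSERT OVERWRITE, and job_4 ends after 08:00, so neither contributes. On ties in end_time, ROW_NUMBER picks one row arbitrarily, whereas this sketch keeps the first one it encounters.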
(3) Task dependencies:
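One way to derive task-level dependencies from the lineage rows is sketched below in Python. It assumes that a task depends on every other task that wrote one of its input tables. The rows mimic the (task_id, input_table, output_table) columns of dim_meta_tasks_history_a. The data and the function name are hypothetical, and this is not necessarily how DataWorks builds its own dependency graph.

```python
# Hypothetical rows shaped like dim_meta_tasks_history_a:
# (task_id, input_table, output_table)
LINEAGE = [
    ("job_1", "ods_a", "dwd_c"),
    ("job_1", "ods_b", "dwd_c"),
    ("job_2", "dwd_c", "ads_d"),
    ("job_3", "dwd_c", "ads_e"),
    ("job_3", "ads_d", "ads_e"),
]


def task_dependencies(lineage):
    """Return a set of (upstream_task, downstream_task) pairs.

    A task is treated as downstream of every task that wrote one of its input tables.
    """
    writers = {}  # output table -> tasks that produced it
    for task_id, _, output_table in lineage:
        writers.setdefault(output_table, set()).add(task_id)
    deps = set()
    for task_id, input_table, _ in lineage:
        for upstream in writers.get(input_table, ()):
            if upstream != task_id:  # ignore self-dependencies
                deps.add((upstream, task_id))
    return deps


for upstream, downstream in sorted(task_dependencies(LINEAGE)):
    print(f"{upstream} -> {downstream}")
```

Tables that no task in the window wrote, such as ods_a here, simply produce no upstream task.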
 
 
(4) Final lineage:
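The full lineage of a table is the transitive set of tables upstream and downstream of it. The minimal Python sketch below assumes the lineage rows have been loaded as (task_id, input_table, output_table) tuples, as stored in dim_meta_tasks_history_a. The data and function names are hypothetical. The sketch builds parent and child maps from the input-to-output edges and walks each map breadth-first.

```python
from collections import deque

# Hypothetical rows shaped like dim_meta_tasks_history_a:
# (task_id, input_table, output_table)
LINEAGE = [
    ("job_1", "ods_a", "dwd_c"),
    ("job_1", "ods_b", "dwd_c"),
    ("job_2", "dwd_c", "ads_d"),
    ("job_3", "dwd_c", "ads_e"),
    ("job_3", "ads_d", "ads_e"),
]


def _reachable(adjacency, start):
    """Breadth-first search over adjacency; start is included only if a cycle leads back to it."""
    seen = set()
    queue = deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def table_lineage(lineage, table):
    """Return (all upstream tables, all downstream tables) of the given table."""
    parents, children = {}, {}
    for _, input_table, output_table in lineage:
        parents.setdefault(output_table, set()).add(input_table)
        children.setdefault(input_table, set()).add(output_table)
    return _reachable(parents, table), _reachable(children, table)


upstream, downstream = table_lineage(LINEAGE, "dwd_c")
print("upstream:", sorted(upstream))
print("downstream:", sorted(downstream))
```

The `seen` set keeps the walk finite even if the recorded jobs happen to form a cycle between tables.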
 
The lineage analysis above is based on my own approach and practice. Real business scenarios need to be verified by each reader, so you may need to adapt the SQL to your own business needs. If you find anything handled improperly, please point it out and I will adjust it accordingly.
 
 
 
This article is Alibaba Cloud content and may not be reproduced without permission.


Source: www.cnblogs.com/yunqishequ/p/12083658.html