Oracle connect by is very powerful, but use it with caution, or it may make you cry.

Foreword:

        The fourth industrial revolution has brought huge changes in technology, along with a great deal of semi-structured data. Much of it is stored in the database as collections or JSON, and we extract it into the data warehouse through ETL tools. Once it is there, how do we analyze it? A typical example: several people are responsible for the same testing project, so to make data entry easier for users, all of them are written into a single field, separated by delimiters such as "\" or "/". Data like this places higher demands on analysts. To analyze it, we use connect by to split each such value into multiple rows.

        I wrote an earlier article on how to process this kind of data. There, too, performance was the problem, so I ended up using a stored procedure to process the rows one at a time: split one row, store the result in the database, and repeat until everything was split. For details, see that article:

Oracle: splitting one field with multiple delimiters into multiple rows, then splitting multiple rows and columns with multiple delimiters into multiple rows, with a final data volume at the billion-row level (CSDN blog, "They call me the technical director")

1. Common usage of connect by

        1. Perpetual calendar

        code

SELECT '年' lx, TO_CHAR(ADD_MONTHS(SYSDATE, (4 - ROWNUM)*12), 'YYYY') YEAR_LIST 
FROM DUAL CONNECT BY ROWNUM <= 5 -- list 5 years around the current one (from 3 years ahead down to 1 year back)

        Effect

         2. Generate sequence

        code

select rownum from dual connect by rownum<=10; -- generate the sequence 1 to 10

        Effect

        3. One line becomes multiple lines

        code

select REGEXP_SUBSTR('01#02#03#04', '[^#]+', 1, rownum) as newport
from dual connect by rownum <= REGEXP_COUNT('01#02#03#04', '[^#]+'); -- split one row into multiple rows

         Effect

 Summary:

        In general, connect by is still very powerful for processing small amounts of tree-structured data, which is why many people like to use it. However, Oracle cannot know in advance how many rows a connect by will produce, so the optimizer easily gets the cardinality estimate wrong and, as a result, chooses NESTED LOOPS. In my experience, performance starts to deteriorate once the source table reaches roughly 500 to 800 rows, and from there it gets very bad.
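        To see what the optimizer expects from a connect by, one option is to compare the estimated row count in the execution plan with the number of rows the query really returns. The sketch below reuses the string-splitting query from above on dual. EXPLAIN PLAN and DBMS_XPLAN.DISPLAY are standard Oracle facilities, but the figures shown depend on the database version and statistics, so treat this as a way to check the claim on your own system rather than as a fixed result.

```sql
-- Ask the optimizer for its plan without running the query.
EXPLAIN PLAN FOR
SELECT REGEXP_SUBSTR('01#02#03#04', '[^#]+', 1, LEVEL) AS newport
FROM dual
CONNECT BY LEVEL <= REGEXP_COUNT('01#02#03#04', '[^#]+');

-- Show the plan; the "Rows" column is the optimizer's estimate.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Count the rows actually produced, to compare with the estimate.
SELECT COUNT(*) AS actual_rows
FROM dual
CONNECT BY LEVEL <= REGEXP_COUNT('01#02#03#04', '[^#]+');
```

        If the estimate in the plan is far below the actual count, every join placed on top of the connect by is costed against the wrong number of rows.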

2. Analysis of actual cases

        1. union all + connect by

        In one of my current projects I ran into a classic case: because of connect by, a data synchronization job failed to complete for almost 3 days. Why 3 days? The user reported the problem just after work on Friday, and since I was already home and could not handle it remotely, it was Sunday before I stopped the job, optimized some of the logic, and ran it again. On Monday morning I found that it still had not achieved the desired result. Because the SQL involved is fairly complicated, I had not yet analyzed the cause in detail. The SQL is as follows:

select 
   	   
       a.state xtstate,
       a.current_nodes_info dbr,
       c.*,
       b.FILENAME,
       b.FILE_URL,
 case when c.field0035  is null then  round(to_date(to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss'),
                         'yyyy-mm-dd hh24:mi:ss')-  to_date(to_char(c.START_DATE, 'yyyy-mm-dd hh24:mi:ss'),
                     'yyyy-mm-dd hh24:mi:ss'),2)  -- no approval comment yet
 when  FIELD0090  is not null then  round(to_date(to_char(FIELD0090, 'yyyy-mm-dd hh24:mi:ss'),
                         'yyyy-mm-dd hh24:mi:ss')-  to_date(to_char(c.START_DATE, 'yyyy-mm-dd hh24:mi:ss'),
                     'yyyy-mm-dd hh24:mi:ss'),2) end     


 clsc,
       to_char(FIELD0090, 'yyyy-mm-dd hh24:mi:ss') FINISH_DATE,
       to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss') etlts,
       case
          when a.FINISH_DATE is not null then
          '关闭'
         when FIELD0090 is  not null  then
          '评审完成'
       
       
         when   case when c.field0035  is null then  round(to_date(to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss'),
                         'yyyy-mm-dd hh24:mi:ss')-  to_date(to_char(c.START_DATE, 'yyyy-mm-dd hh24:mi:ss'),
                     'yyyy-mm-dd hh24:mi:ss'),2)  -- no approval comment yet
 when  FIELD0090  is not null then  round(to_date(to_char(FIELD0090, 'yyyy-mm-dd hh24:mi:ss'),
                         'yyyy-mm-dd hh24:mi:ss')-  to_date(to_char(c.START_DATE, 'yyyy-mm-dd hh24:mi:ss'),
                     'yyyy-mm-dd hh24:mi:ss'),2) end   >= 5 then
          '超期'
         when   case when c.field0035  is null then  round(to_date(to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss'),
                         'yyyy-mm-dd hh24:mi:ss')-  to_date(to_char(c.START_DATE, 'yyyy-mm-dd hh24:mi:ss'),
                     'yyyy-mm-dd hh24:mi:ss'),2)  -- no approval comment yet
 when  FIELD0090  is not null then  round(to_date(to_char(FIELD0090, 'yyyy-mm-dd hh24:mi:ss'),
                         'yyyy-mm-dd hh24:mi:ss')-  to_date(to_char(c.START_DATE, 'yyyy-mm-dd hh24:mi:ss'),
                     'yyyy-mm-dd hh24:mi:ss'),2) end   < 5 then
          '进行中'
         
        
       end psjd,
       mx.*
  from V3XUSER.COL_SUMMARY A
  left join V3XUSER.CTP_ATTACHMENT b
    on a.id = b.SUB_REFERENCE
 right join  (select * from  V3XUSER.formmain_2182 c where to_char(c.start_date,'yyyy-mm-dd')>to_char(sysdate-60,'yyyy-mm-dd') )  c -- only update recent data (described as the last two and a half months; the filter keeps 60 days)
    on c.id = a.FORM_RECORDID

  left join (select lscs.*,
                    lswc.field0068 wcsj,
                    lswc.field0069 wcqkms,
                    fj.filename    filename2,
                    fj.file_url    file_url2,
                    FIELD0071      结案确认
               from (select '临时措施' lx,
                            lscs.FIELD0055 csxq,
                            listagg(ry.name, '、') within group(order by ry.name) zrr,
                            lscs.FIELD0057 jhwcsj,
                            lscs.fid,
                            lscs.iid
                       from (SELECT distinct  id iid,
                                             formmain_id fid,
                                             REGEXP_SUBSTR(FIELD0056,
                                                           '[^,]+',
                                                           1,
                                                           LEVEL) cf,
                                             a.*
                               FROM V3XUSER.formson_3565 a
                             CONNECT BY REGEXP_SUBSTR(FIELD0056,
                                                      '[^,]+',
                                                      1,
                                                      LEVEL) is not null) lscs
                       left join V3XUSER.ORG_MEMBER ry
                         on lscs.cf = ry.id
                      group by '临时措施',
                               lscs.FIELD0055,
                               lscs.FIELD0057,
                               lscs.fid,
                               lscs.iid) lscs
               left join V3XUSER.formson_3568 lswc
                 on lscs.fid = lswc.formmain_id
                and lscs.csxq = lswc.field0067
               left join V3XUSER.CTP_ATTACHMENT fj
                 on lswc.field0070 = fj.SUB_REFERENCE
             union all
             select lscs.*,
                    lswc.field0073 wcsj,
                    lswc.field0074 wcqkms,
                    fj.filename,
                    fj.file_url    file_url,
                    field0076      结案确认
               from (select '长久措施' lx,
                            lscs.FIELD0058 csxq,
                            listagg(ry.name, '、') within group(order by ry.name) zrr,
                            lscs.field0060 jhwcsj,
                            lscs.fid,
                            lscs.iid
                       from (SELECT  id iid,
                                             formmain_id fid,
                                             REGEXP_SUBSTR(field0059,
                                                           '[^,]+',
                                                           1,
                                                           LEVEL) cf,
                                             a.*
                               FROM V3XUSER.formson_3566 a
		
                             CONNECT BY REGEXP_SUBSTR(field0059,
                                                      '[^,]+',
                                                      1,
                                                      LEVEL) is not null) lscs
                       left join V3XUSER.ORG_MEMBER ry
                         on lscs.cf = ry.id
                      group by '长久措施',
                               lscs.FIELD0058,
                               lscs.field0060,
                               lscs.fid,
                               lscs.iid) lscs
               left join V3XUSER.formson_3569 lswc
                 on lscs.fid = lswc.formmain_id
                and lscs.csxq = lswc.field0072
               left join V3XUSER.CTP_ATTACHMENT fj
                 on lswc.field0075 = fj.SUB_REFERENCE
             union all
             select lscs.*,
                    lswc.field0078 wcsj,
                    lswc.field0079 wcqkms,
                    fj.filename,
                    fj.file_url    file_url,
                    field0081      结案确认
               from (select '防呆设计' lx,
                            lscs.FIELD0061 csxq,
                            listagg(ry.name, '、') within group(order by ry.name) zrr,
                            lscs.field0063 jhwcsj,
                            lscs.fid,
                            lscs.iid
                       from (SELECT  id iid,
                                             formmain_id fid,
                                             REGEXP_SUBSTR(field0062,
                                                           '[^,]+',
                                                           1,
                                                           LEVEL) cf,
                                             a.*
                               FROM V3XUSER.formson_3567 a
                             CONNECT BY REGEXP_SUBSTR(field0062,
                                                      '[^,]+',
                                                      1,
                                                      LEVEL) is not null) lscs
                       left join V3XUSER.ORG_MEMBER ry
                         on lscs.cf = ry.id
                      group by '防呆设计',
                               lscs.FIELD0061,
                               lscs.field0063,
                               lscs.fid,
                               lscs.iid) lscs
               left join V3XUSER.formson_3570 lswc
                 on lscs.fid = lswc.formmain_id
                and lscs.csxq = lswc.field0077
               left join V3XUSER.CTP_ATTACHMENT fj
                 on lswc.field0080 = fj.SUB_REFERENCE
             
             ) mx
    on c.id = mx.fid

        Because the source tables were very small at the beginning of the project, connect by caused essentially no performance problems. But after about 3 months of data had accumulated and the source tables had grown to roughly 600 rows, the problems appeared, and the query became really slow. The execution log shows the details.

        According to the execution log, there were 410 rows in total, yet the job ran for almost 11 minutes and then got stuck. Oh my god, what on earth was going on? By running each part of the query separately, I found that the time was being spent in the last subquery, " mx ", that is, the union all + connect by part.

        2. Where exactly does connect by get stuck?

       

         The form records three kinds of measures (temporary measures, long-term measures, and mistake-proofing design), and each measure can have several responsible persons. The query therefore uses union all twice to stack the three kinds of measures together, and because the responsible persons of a measure are stored in a single field, it uses connect by to split them into multiple rows. The logic looks fine, but the performance is really bad.

        Querying the temporary measures on their own, the split produced 817 rows and took almost 5 minutes. That might still be acceptable, but adding union all makes things much worse.

        Hahaha, before long the DBA came to me and said that the connect by session had been running for almost 1000 minutes. That is a bit outrageous.
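        One likely contributor, offered here as an assumption because the execution plan is not shown in this article: the subqueries above run connect by over a whole table without any PRIOR condition. In that form, every row at one level becomes the parent of every qualifying row at the next level, so the number of generated rows grows multiplicatively with the size of the table. The sketch below uses a made-up three-row data set t with hypothetical columns id and owners to show the effect, followed by a commonly used way to keep each row's split independent of the other rows.

```sql
-- Query 1: connect by without PRIOR over a multi-row source,
-- the same shape as the subqueries on formson_3565/3566/3567.
WITH t AS (
  SELECT 1 AS id, '101,102'     AS owners FROM dual UNION ALL
  SELECT 2 AS id, '103'         AS owners FROM dual UNION ALL
  SELECT 3 AS id, '104,105,106' AS owners FROM dual
)
SELECT COUNT(*) AS generated_rows   -- 15 rows, although only 6 owner IDs exist
FROM t
CONNECT BY REGEXP_SUBSTR(owners, '[^,]+', 1, LEVEL) IS NOT NULL;

-- Query 2: tie each child row to its own parent row.
-- PRIOR id = id keeps the hierarchy inside one source row;
-- PRIOR SYS_GUID() IS NOT NULL makes every parent look unique,
-- so Oracle does not report a connect by loop.
WITH t AS (
  SELECT 1 AS id, '101,102'     AS owners FROM dual UNION ALL
  SELECT 2 AS id, '103'         AS owners FROM dual UNION ALL
  SELECT 3 AS id, '104,105,106' AS owners FROM dual
)
SELECT id,
       REGEXP_SUBSTR(owners, '[^,]+', 1, LEVEL) AS owner_id
FROM t
CONNECT BY PRIOR id = id
       AND PRIOR SYS_GUID() IS NOT NULL
       AND REGEXP_SUBSTR(owners, '[^,]+', 1, LEVEL) IS NOT NULL
ORDER BY id, LEVEL;                 -- 6 rows, one per owner ID
```

        With the first form, a table of 600 rows that each hold two IDs already generates 600 + 600 * 600 = 360,600 rows before any DISTINCT or GROUP BY removes the duplicates, which would be consistent with a query that was fast on a small table and unusable a few months later.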

              

 3. Solution

        1. Cursor execution/divide and conquer

        As mentioned above, connect by runs very efficiently when the amount of data is small. So we can use a stored procedure, or split the union all into separate statements, process the data in small pieces, and then collect the results into a base table. The approach has two parts:

        Define variables and use a cursor to split the data one row at a time.

        Run the split through the stored procedure's cursor.
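        The cursor approach described above might look roughly like the following. This is a minimal sketch, not the procedure used in the project: the tables measure_src and measure_owner and their columns are hypothetical stand-ins for the real source table and base table.

```sql
-- Hypothetical source table (one delimited field per row)
-- and hypothetical base table (one row per responsible person).
CREATE TABLE measure_src   (id NUMBER PRIMARY KEY, owners VARCHAR2(4000));
CREATE TABLE measure_owner (id NUMBER, owner_id VARCHAR2(100));

BEGIN
  -- Cursor FOR loop: handle one source row at a time.
  -- The filter skips rows whose field is NULL or contains no ID.
  FOR r IN (SELECT id, owners
              FROM measure_src
             WHERE REGEXP_COUNT(owners, '[^,]+') > 0) LOOP
    -- connect by runs against dual only, so it never sees more than
    -- one source row and generates exactly one row per ID.
    INSERT INTO measure_owner (id, owner_id)
    SELECT r.id,
           REGEXP_SUBSTR(r.owners, '[^,]+', 1, LEVEL)
      FROM dual
    CONNECT BY LEVEL <= REGEXP_COUNT(r.owners, '[^,]+');
  END LOOP;
  COMMIT;
END;
/
```

        Because each connect by only ever works on a single value, the cost grows linearly with the number of source rows, at the price of row-by-row processing.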

              2. Change your thinking

        In fact, when we analyzed the requirement carefully, we found that connect by was only being used to split the responsible persons into multiple rows for storage, while at display time listagg was used to group them back together. Is there a way to handle this directly at display time instead?

         The primary keys of the responsible persons are stored separated by commas. Doesn't that look a lot like in('A','B','C','D')? So at display time we can use something like select name from BI.Oa_Member where id in ('" + AI2 + "') to get the names of the responsible persons. Bingo~ I hope this gives you some inspiration the next time you run into a problem related to connect by~
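        As a sketch of what the display-time lookup could look like in plain SQL: the first query below is the in-list form described above, assuming the reporting tool expands the cell value into separate quoted literals. The second is a set-based variant that matches IDs inside the delimited field with INSTR, which is an alternative I am adding for illustration, not something from the project. The sample rows and the names oa_member, measure, iid and owners are made up.

```sql
-- Display-time lookup with an in-list built from the stored IDs.
WITH oa_member AS (
  SELECT '101' AS id, 'Alice' AS name FROM dual UNION ALL
  SELECT '102' AS id, 'Bob'   AS name FROM dual UNION ALL
  SELECT '103' AS id, 'Carol' AS name FROM dual
)
SELECT LISTAGG(name, '、') WITHIN GROUP (ORDER BY name) AS zrr
FROM oa_member
WHERE id IN ('101', '102');          -- returns Alice、Bob

-- Set-based variant: no splitting at all, the join condition checks
-- whether ",<id>," occurs inside ",<owners>,".
WITH oa_member AS (
  SELECT '101' AS id, 'Alice' AS name FROM dual UNION ALL
  SELECT '102' AS id, 'Bob'   AS name FROM dual UNION ALL
  SELECT '103' AS id, 'Carol' AS name FROM dual
),
measure AS (
  SELECT 1 AS iid, '101,102' AS owners FROM dual UNION ALL
  SELECT 2 AS iid, '103'     AS owners FROM dual
)
SELECT s.iid,
       LISTAGG(m.name, '、') WITHIN GROUP (ORDER BY m.name) AS zrr
FROM measure s
LEFT JOIN oa_member m
  ON INSTR(',' || s.owners || ',', ',' || m.id || ',') > 0
GROUP BY s.iid
ORDER BY s.iid;
```

        Wrapping both sides in commas keeps an ID such as 10 from matching inside 101. The INSTR join cannot use an index on the ID, so it suits small lookup tables such as a member list.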


Origin blog.csdn.net/qq_29061315/article/details/131537868