How to handle excessive undo usage in Oracle

Alert details

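##### NBA Unified Monitoring and Alerting Platform - DB: [Triggered]

> `Alert cluster:` jq-prod-db-prometheus
> `Alert description:` Oracle database rollback (undo) space usage exceeds 10G
> `Alert environment:` prod
> `Alert type:` oracle_prod
> `Alert instance:` NBA-rac01
> `Alert IP:`   11.11.11.11
> `Instance alias:`  NBA-rac01
> `Current value:` 11.91G
> `Alert severity:` Critical
> `Trigger time:` 2021-03-03 04:01:59
> `Duration:` 10s
> `Alert count:` 1
> `Alert application:` NBA_oracle
> `Application owner:` 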

Troubleshooting process

  • Log in to 11.11.11.11, connect to Oracle, and set the column widths (not needed when using a GUI client such as PL/SQL Developer)
[bzops@nba01 ~]$ sudo -i
[root@nba01 ~]# su - oracle
[oracle@nba01 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Mar 2 15:05:36 2021

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> col sample_time for a30
col event for a30
col program for a30
col machine for a40
set lines 200 pages 999
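  • Query the usage history of the undo tablespace. The interval starting at 2021/3/3 3:58:39 did show very high usage: active undo reached about 17.7G (the query converts blocks to GB assuming an 8 KB block size)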
select begin_time,end_time,undoblks*8/1024/1024 undog,maxqueryid,activeblks*8/1024/1024 activeundo,inst_id from gv$undostat;

BEGIN_TIME         END_TIME           UNDOG                 MAXQUERYID      ACTIVEUNDO        INST_ID
2021/3/3 4:48:39   2021/3/3 4:58:39   0.00018310546875      89w8y2pgn25yd   0.2481689453125   2
2021/3/3 4:38:39   2021/3/3 4:48:39   0.00014495849609375   0rc4km05kgzb9   0.2481689453125   2
2021/3/3 4:28:39   2021/3/3 4:38:39   0.001434326171875     89w8y2pgn25yd   0.2481689453125   2
2021/3/3 4:18:39   2021/3/3 4:28:39   9.1552734375E-5       0rc4km05kgzb9   0.2481689453125   2
2021/3/3 4:08:39   2021/3/3 4:18:39   0.00020599365234375   0rc4km05kgzb9   0.2481689453125   2
2021/3/3 3:58:39   2021/3/3 4:08:39   9.60930633544922      89w8y2pgn25yd   17.6923828125     2
2021/3/3 3:48:39   2021/3/3 3:58:39   8.96279907226563      89w8y2pgn25yd   4.6112060546875   2
2021/3/3 3:38:39   2021/3/3 3:48:39   0.00026702880859375   0rc4km05kgzb9   0.2481689453125   2
2021/3/3 3:28:39   2021/3/3 3:38:39   0.00115966796875      89w8y2pgn25yd   0.2481689453125   2
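
If the alert is still firing, the current state of the undo tablespace can be checked alongside the history. The two queries below are a minimal sketch and were not part of the original investigation; they rely only on the standard DBA_UNDO_EXTENTS, GV$TRANSACTION and GV$SESSION views, and the column aliases are illustrative.

```sql
-- Undo space per tablespace and extent status (ACTIVE, UNEXPIRED, EXPIRED).
-- ACTIVE extents belong to uncommitted transactions and cannot be reused.
select tablespace_name,
       status,
       round(sum(bytes)/1024/1024/1024, 2) as size_gb
  from dba_undo_extents
 group by tablespace_name, status
 order by tablespace_name, status;

-- Open transactions ordered by the number of undo blocks they hold,
-- joined to the owning session on every RAC instance.
select s.inst_id,
       s.sid,
       s.serial#,
       s.program,
       s.sql_id,
       t.start_time,
       t.used_ublk as undo_blocks
  from gv$transaction t
  join gv$session s
    on s.inst_id = t.inst_id
   and s.taddr   = t.addr
 order by t.used_ublk desc;
```

The second query only helps while the offending transaction is still open; once it has committed, the historical views used in the following steps are the way to go.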
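  • Using the time window identified above, query the active session history for that period; the xid is not null condition filters out sessions that have no transaction
    During this window the process oracle@scm03db02 (J000) spent a long time running SQL 7p9061n7t9800, called from the PL/SQL object with ID 91522, so this SQL is almost certainly what drove the undo usage up
    A process name starting with J, as in oracle@scm03db02 (J000), denotes an Oracle job process, so this was a running job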
SQL> select sample_time,session_id,program,sql_id,plsql_entry_object_id,event,blocking_session from dba_hist_active_sess_history 
where xid is not null and  sample_time between timestamp '2021-03-03 03:57:00' and timestamp '2021-03-03 04:01:00';

SAMPLE_TIME               SESSION_ID PROGRAM                 SQL_ID        PLSQL_ENTRY_OBJECT_ID EVENT                   BLOCKING_SESSION
------------------------- ---------- ----------------------- ------------- --------------------- ----------------------- ----------------
03-MAR-21 04.00.14.255 AM      27888 JDBC Thin Client        4d4pmm4ayqw17                 91533
03-MAR-21 04.00.04.125 AM      18788 JDBC Thin Client        95zdtpfm6x7h1                 91526
03-MAR-21 04.00.14.255 AM      29133 JDBC Thin Client        6c4fa820c91hk
03-MAR-21 04.00.14.255 AM       3786 JDBC Thin Client        97a43gbatv8m9
03-MAR-21 04.00.44.671 AM       3786 JDBC Thin Client        4j071dbgbxcgv
03-MAR-21 04.00.27.486 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 gc current block 2-way
03-MAR-21 04.00.58.066 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 gc current block 2-way
03-MAR-21 03.58.46.227 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 gc current block 2-way
03-MAR-21 03.59.16.527 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 gc current block 2-way
03-MAR-21 03.59.26.626 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 gc current block 2-way
03-MAR-21 03.59.36.726 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 gc current block 2-way
03-MAR-21 03.58.56.327 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 gc cr grant 2-way
03-MAR-21 03.59.57.216 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 04.00.07.296 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 04.00.17.396 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 04.00.37.576 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 04.00.47.956 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 03.57.34.997 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 03.58.05.557 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 03.58.15.647 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 03.58.25.747 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 03.58.35.837 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 03.59.06.427 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 03.59.47.116 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 03.57.10.818 AM      24730 JDBC Thin Client        6c4fa820c91hk
03-MAR-21 04.00.04.125 AM       3786 JDBC Thin Client        aaxs4c2a89wja
03-MAR-21 03.57.41.218 AM       2202 JDBC Thin Client        cfs24zcws20d2
03-MAR-21 03.57.14.806 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522
03-MAR-21 03.57.45.377 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 db file sequential read
03-MAR-21 03.57.55.477 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 db file sequential read
03-MAR-21 03.57.04.696 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 gc current grant 2-way
03-MAR-21 03.57.24.906 AM      18781 oracle@scm03db02 (J000) 7p9061n7t9800                 91522 gc current grant 2-way
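
Scanning raw samples by eye gets tedious on a busier system. One possible shortcut, sketched here as an assumption rather than something the original investigation ran, is to count samples per session and SQL over the same window. DBA_HIST_ACTIVE_SESS_HISTORY keeps roughly one sample every 10 seconds for an active session, so a high count points at the long-running statement.

```sql
-- Count ASH samples per session and SQL in the suspect window.
-- Only sessions holding a transaction (xid is not null) are considered.
select instance_number,
       session_id,
       program,
       sql_id,
       plsql_entry_object_id,
       count(*) as samples
  from dba_hist_active_sess_history
 where xid is not null
   and sample_time between timestamp '2021-03-03 03:57:00'
                       and timestamp '2021-03-03 04:01:00'
 group by instance_number, session_id, program, sql_id, plsql_entry_object_id
 order by samples desc;
```

Against the data shown above, session 18781 running 7p9061n7t9800 would come out on top by a wide margin.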
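  • Look up the object_id and the sql_id. They resolve to SP_DELETE_LOG, a cleanup procedure run by a job every day at 03:50
    Its statement DELETE FROM T1 L WHERE L.CREATE_TIME < TRUNC(SYSDATE) - 90 deletes too many rows in a single transaction, which is what consumes so much undo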
SQL> select object_name,object_type from dba_objects where object_id=91522;

OBJECT_NAME            OBJECT_TYPE
------------------------------ -------------------
SP_DELETE_LOG              PROCEDURE

SQL> set long 20000
SQL> select sql_text from dba_hist_sqltext where sql_id='7p9061n7t9800';

SQL_TEXT
--------------------------------------------------------------------------------
DELETE FROM T1 L WHERE L.CREATE_TIME < TRUNC(SYSDATE) - 90

SQL> select job_name,repeat_interval,next_run_date from dba_scheduler_jobs where ENABLED='TRUE' and job_action like '%SP_DELETE_LOG%';
JOB_NAME         REPEAT_INTERVAL                    NEXT_RUN_DATE
JOB_DELETE_LOG   FREQ=DAILY;BYHOUR=3;BYMINUTE=50    06-MAR-21 03.50.00.700000 AM ASIA/SHANGHAI
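
To double-check that the job really ran during the alert window, its run history can be inspected as well. This is a sketch that assumes the scheduler log for JOB_DELETE_LOG has not yet been purged; it uses the standard DBA_SCHEDULER_JOB_RUN_DETAILS view and was not part of the original investigation.

```sql
-- Recent runs of the cleanup job: when each run started and how long it took.
select job_name,
       status,
       actual_start_date,
       run_duration
  from dba_scheduler_job_run_details
 where job_name = 'JOB_DELETE_LOG'
 order by actual_start_date desc;
```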

Solution

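  • Rework this DELETE so that it removes rows in batches and commits after each batch, which keeps the undo held by any single transaction small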
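
One way to do the batching, shown as a minimal sketch rather than the actual body of SP_DELETE_LOG (which the source does not show): loop over the same DELETE with a ROWNUM limit and commit after every pass. The batch size of 10000 and the variable names are assumptions for illustration and should be tuned for the real table.

```sql
DECLARE
  -- Hypothetical batch size; smaller batches hold less undo per transaction.
  c_batch_size CONSTANT PLS_INTEGER := 10000;
  v_deleted    PLS_INTEGER;
BEGIN
  LOOP
    -- Same predicate as the original statement, capped at one batch.
    DELETE FROM T1 L
     WHERE L.CREATE_TIME < TRUNC(SYSDATE) - 90
       AND ROWNUM <= c_batch_size;

    -- Capture the row count before COMMIT, which resets SQL%ROWCOUNT.
    v_deleted := SQL%ROWCOUNT;
    COMMIT;

    -- A short batch means no more rows match the predicate.
    EXIT WHEN v_deleted < c_batch_size;
  END LOOP;
END;
/
```

Each commit releases the undo held by that batch, so the active undo at any moment is bounded by the batch size instead of by the full 90-day backlog. The trade-off is that the cleanup is no longer atomic: if the job fails midway, the batches already committed stay deleted, which is usually acceptable for log purging.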


Reposted from blog.51cto.com/15122892/2648152