Analysis and simulation of the "wait for a undo record" wait event

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/xxzhaobb/article/details/85287023

RDBMS 11.2.0.4 RAC 

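Yesterday a deadlock occurred on the database. The cause was a job that calls a procedure, which in turn calls a package. The package contains many paired INSERT and DELETE statements, roughly a dozen pairs, and has no COMMIT of its own; the only COMMIT is at the end of the procedure. While one developer was debugging the job, it aborted because of problems with some columns. At the same moment another developer was deploying a program that uses the very tables the job inserts into and deletes from. As a result the database became extremely sluggish, and a quick check showed deadlocks. It was a lively moment, although the problem was resolved quickly.

Today I looked at the alert log and found that the job had been debugged at least 10 times and had failed a dozen or so times. That is a lot of work to roll back. Fortunately the database is not yet in production use.

The alert log from that time follows. Note the parallel query server entries.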

Wed Dec 26 05:46:36 2018
Archived Log entry 1190 added for thread 2 sequence 622 ID 0x589404d6 dest 1:
Wed Dec 26 06:01:10 2018
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc  (incident=48793):
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/XXXX/XXXX2/incident/incdir_48793/XXXX2_p012_13163_i48793.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc:
ORA-10388: parallel query server interrupt (failure)
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc:
ORA-10388: parallel query server interrupt (failure)
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Wed Dec 26 06:01:13 2018
Dumping diagnostic data in directory=[cdmp_20181226060113], requested by (instance=2, osid=13163 (P012)), summary=[incident=48793].
Wed Dec 26 06:01:15 2018
Sweep [inc][48793]: completed
Sweep [inc2][48793]: completed
Wed Dec 26 06:06:26 2018
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc  (incident=48794):
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/XXXX/XXXX2/incident/incdir_48794/XXXX2_p012_13163_i48794.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc:
ORA-10388: parallel query server interrupt (failure)
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/XXXX/XXXX2/trace/XXXX2_p012_13163.trc:
ORA-10388: parallel query server interrupt (failure)
ORA-00600: internal error code, arguments: [kcbzwfcro_2], [90329], [1], [32768], [0], [], [], [], [], [], [], []
Wed Dec 26 06:06:27 2018
Dumping diagnostic data in directory=[cdmp_20181226060627], requested by (instance=2, osid=13163 (P012)), summary=[incident=48794].
Wed Dec 26 06:06:28 2018
Sweep [inc][48794]: completed
Sweep [inc2][48794]: completed

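Today I also looked at the AWR report for that period and found a wait event named wait for a undo record.

I looked this wait event up on MOS. The note IF: Undo Related Wait Event - Wait for an Undo Record (Doc ID 1951704.1) gives some explanation.

The official recommendation is to set fast_start_parallel_rollback = false. The note also gives some advice about changing this parameter, so consult MOS before making the change.

The parameter is described in the official documentation.

Official documentation: https://docs.oracle.com/cd/E11882_01/server.112/e40402/initparams091.htm#REFRN10059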

FAST_START_PARALLEL_ROLLBACK specifies the degree of parallelism used when recovering terminated transactions. Terminated transactions are transactions that are active before a system failure. If a system fails when there are uncommitted parallel DML or DDL transactions, then you can speed up transaction recovery during startup by using this parameter.

Values:

  • FALSE

    Parallel rollback is disabled

  • LOW

    Limits the maximum degree of parallelism to 2 * CPU_COUNT

  • HIGH

    Limits the maximum degree of parallelism to 4 * CPU_COUNT

If you change the value of this parameter, then transaction recovery will be stopped and restarted with the new implied degree of parallelism.

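The default value of this parameter is LOW, that is, 2 * CPU_COUNT. So it is not surprising that system performance drops while a large rollback is in progress.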
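As a minimal sketch of how the parameter could be inspected and changed: the statements below are not taken from the original incident, and the SCOPE and SID clauses are assumptions (SCOPE = BOTH requires an spfile, and SID = '*' targets all RAC instances). Review the MOS note before applying anything like this to a real system.

```sql
-- Check the current value and whether it is still the default.
SELECT name, value, isdefault
FROM   v$parameter
WHERE  name = 'fast_start_parallel_rollback';

-- Disable parallel rollback. The parameter can be changed with ALTER SYSTEM;
-- SCOPE = BOTH and SID = '*' are assumptions for an spfile-based RAC setup.
ALTER SYSTEM SET fast_start_parallel_rollback = FALSE SCOPE = BOTH SID = '*';
```

As the documentation quoted above states, changing the value stops any transaction recovery in progress and restarts it with the new degree of parallelism.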

The following steps reproduce this wait event.

RDBMS 12.2.0.1 

First, create a table and insert a large amount of data into it without committing.

create table rollback as select * from dba_objects;
insert into rollback select * from rollback;
SYS@test>insert into rollback select * from rollback;

80801 rows created.

SYS@test>/

161602 rows created.

SYS@test>/

323204 rows created.

SYS@test>/

646408 rows created.

SYS@test>/

1292816 rows created.

SYS@test>

Find the OS process ID (SPID) of the current session, then kill that process at the OS level.

SYS@test>select spid from v$process where addr in (select paddr from v$session where sid in (select sid from v$mystat where rownum=1));

SPID
------------------------
17514

SYS@test>
kill -9 17514  

At this point, query v$fast_start_transactions.
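The exact query used here is not included in the text. A minimal sketch of one that shows the recovery progress might look like this; the column selection is my own choice, not the author's.

```sql
-- One row per transaction that is being, or has recently been, recovered.
-- STATE is TO BE RECOVERED, RECOVERING, or RECOVERED.
SELECT usn,
       slt,
       seq,
       state,
       undoblocksdone,   -- undo blocks already applied
       undoblockstotal,  -- total undo blocks to apply
       rcvservers,       -- number of servers used for this recovery
       xid               -- transaction ID, usable for later lookups
FROM   v$fast_start_transactions;
```

Running it repeatedly shows undoblocksdone approaching undoblockstotal while the rollback proceeds.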

Check the session wait events.
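The query behind this step is also not shown. As a sketch, assuming the event name matches the one reported in AWR, the waiting sessions could be listed like this:

```sql
-- Sessions currently waiting on the event of interest.
SELECT sid,
       serial#,
       program,
       event,
       state,
       seconds_in_wait
FROM   v$session
WHERE  event = 'wait for a undo record';
```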

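Once the rollback has finished, the wait event no longer appears for any session. Check the v$fast_start_transactions view again at this point.

Using the XID shown in v$fast_start_transactions, find out which SQL statement caused the rollback.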

SYS@test>select distinct sql_id from V$ACTIVE_SESSION_HISTORY where xid=hextoraw('01000C0006510200');

SQL_ID
-------------
2ux4jwjr3g52b

SYS@test>select sql_id,sql_text from v$sql where sql_id='2ux4jwjr3g52b';

SQL_ID
-------------
SQL_TEXT
--------------------------------------------------------------------------------
2ux4jwjr3g52b
insert into rollback select * from rollback


SYS@test>

Check the AWR report. The wait event wait for a undo record appears in it.
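If no AWR report is at hand, a rough cross-check is possible from the cumulative instance statistics. This sketch is an alternative that the original article does not use. It reports totals since instance startup rather than for one AWR snapshot interval.

```sql
-- Cumulative waits for the event since instance startup.
-- TIME_WAITED is in hundredths of a second.
SELECT event,
       total_waits,
       time_waited,
       average_wait
FROM   v$system_event
WHERE  event = 'wait for a undo record';
```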

With that, the cause of this problem is clear.

END
