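Recently, while trying to generate an AWR report for one of our databases, I found that there were no recent snapshots to choose from.
Check snapshot generation
select snap_id,instance_number,begin_interval_time,end_interval_time from dba_hist_snapshot order by instance_number,begin_interval_time;
An excerpt of the rows shows that the snapshot times are wrong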
ROW#  SNAP_ID  INSTANCE_NUMBER  BEGIN_INTERVAL_TIME        END_INTERVAL_TIME
----  -------  ---------------  -------------------------  -------------------------
 286    28054                1  13-APR-20 06.00.13.172 AM  13-APR-20 07.00.01.661 AM
 287    28055                1  13-APR-20 07.00.01.661 AM  13-APR-20 08.00.18.495 AM
 288    28056                1  13-APR-20 08.00.18.495 AM  13-APR-20 09.00.06.341 AM
 289    28057                1  13-APR-20 09.00.06.341 AM  13-APR-20 10.00.18.680 AM
 290    28058                1  13-APR-20 10.00.18.680 AM  13-APR-20 11.00.06.287 AM
 291    28060                1  13-APR-20 12.00.06.152 PM  13-APR-20 01.00.30.297 PM
 292    28061                1  13-APR-20 01.00.30.297 PM  13-APR-20 02.00.01.093 PM  <<<<<<<<<<<<<<<<<<<
Node 1 has not generated any snapshots since April 13
ROW#  SNAP_ID  INSTANCE_NUMBER  BEGIN_INTERVAL_TIME        END_INTERVAL_TIME
----  -------  ---------------  -------------------------  -------------------------
 862    28231                2  07-MAY-20 08.00.26.682 PM  07-MAY-20 09.00.52.925 PM
 863    28232                2  07-MAY-20 09.00.52.925 PM  07-MAY-20 10.04.14.955 PM
 864    28233                2  07-MAY-20 10.04.14.955 PM  08-MAY-20 05.16.24.751 PM
 865    28234                2  08-MAY-20 05.16.24.751 PM  08-MAY-20 05.25.23.908 PM
 866    28235                2  08-MAY-20 05.25.23.908 PM  08-MAY-20 06.08.14.022 PM
 867    28236                2  08-MAY-20 06.08.14.022 PM  08-MAY-20 07.00.40.240 PM
 868    28237                2  08-MAY-20 07.00.40.240 PM  08-MAY-20 08.00.05.696 PM
 869    28238                2  08-MAY-20 08.00.05.696 PM  08-MAY-20 09.00.03.040 PM
 870    28239                2  08-MAY-20 09.00.03.040 PM  08-MAY-20 10.00.14.741 PM
 871    28240                2  08-MAY-20 10.00.14.741 PM  08-MAY-20 11.00.11.279 PM
 872    28241                2  08-MAY-20 11.00.11.279 PM  09-MAY-20 12.00.59.568 AM  <<<<<<<<<<<<<<<<<<<
Nodes 2, 3 and 4 generated only one snapshot between the evening of the 8th and noon on the 9th
ROW#  SNAP_ID  INSTANCE_NUMBER  BEGIN_INTERVAL_TIME        END_INTERVAL_TIME
----  -------  ---------------  -------------------------  -------------------------
1443    28234                3  08-MAY-20 05.16.24.744 PM  08-MAY-20 05.25.23.905 PM
1444    28235                3  08-MAY-20 05.25.23.905 PM  08-MAY-20 06.08.13.950 PM
1445    28236                3  08-MAY-20 06.08.13.950 PM  08-MAY-20 07.00.40.167 PM
1446    28237                3  08-MAY-20 07.00.40.167 PM  08-MAY-20 08.00.05.664 PM
1447    28238                3  08-MAY-20 08.00.05.664 PM  08-MAY-20 09.00.03.039 PM
1448    28239                3  08-MAY-20 09.00.03.039 PM  08-MAY-20 10.00.16.037 PM
1449    28240                3  08-MAY-20 10.00.16.037 PM  08-MAY-20 11.00.11.235 PM
1450    28241                3  08-MAY-20 11.00.11.235 PM  09-MAY-20 12.00.59.561 AM  <<<<<<<<<<<<<<<<<<<
2021    28234                4  08-MAY-20 05.16.24.741 PM  08-MAY-20 05.25.23.908 PM
2022    28235                4  08-MAY-20 05.25.23.908 PM  08-MAY-20 06.08.14.004 PM
2023    28236                4  08-MAY-20 06.08.14.004 PM  08-MAY-20 07.00.40.250 PM
2024    28237                4  08-MAY-20 07.00.40.250 PM  08-MAY-20 08.00.05.695 PM
2025    28238                4  08-MAY-20 08.00.05.695 PM  08-MAY-20 09.00.02.990 PM
2026    28239                4  08-MAY-20 09.00.02.990 PM  08-MAY-20 10.00.14.785 PM
2027    28240                4  08-MAY-20 10.00.14.785 PM  08-MAY-20 11.00.11.281 PM
2028    28241                4  08-MAY-20 11.00.11.281 PM  09-MAY-20 12.00.59.575 AM  <<<<<<<<<<<<<<<<<<<
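Spotting the holes by eye works for a short excerpt, but with several hundred rows per instance a small script is easier. The following is a minimal Python sketch, not part of the original diagnosis: `find_anomalies` and its `tolerance` parameter are names I made up for illustration, and the timestamp format assumes the session's default display format shown in the listing (English month names and AM/PM).

```python
from datetime import datetime, timedelta

# Timestamp layout of the dba_hist_snapshot output above,
# e.g. "13-APR-20 06.00.13.172 AM". Assumes English month and AM/PM names.
TS_FORMAT = "%d-%b-%y %I.%M.%S.%f %p"


def parse_ts(text):
    return datetime.strptime(text, TS_FORMAT)


def find_anomalies(rows, interval=timedelta(hours=1), tolerance=timedelta(minutes=10)):
    """rows: (snap_id, instance_number, begin_text, end_text) tuples.

    Returns (instance, snap_id, kind, duration) for every snapshot whose
    interval is too long, and for every hole between two consecutive
    snapshots of the same instance.
    """
    anomalies = []
    last_end = {}  # instance_number -> end time of the previous row
    ordered = sorted(rows, key=lambda r: (r[1], parse_ts(r[2])))
    for snap_id, inst, begin_text, end_text in ordered:
        begin, end = parse_ts(begin_text), parse_ts(end_text)
        prev_end = last_end.get(inst)
        if prev_end is not None and begin - prev_end > tolerance:
            anomalies.append((inst, snap_id, "missing before", begin - prev_end))
        if end - begin > interval + tolerance:
            anomalies.append((inst, snap_id, "long interval", end - begin))
        last_end[inst] = end
    return anomalies


# Sample rows copied from the listing above.
sample = [
    (28058, 1, "13-APR-20 10.00.18.680 AM", "13-APR-20 11.00.06.287 AM"),
    (28060, 1, "13-APR-20 12.00.06.152 PM", "13-APR-20 01.00.30.297 PM"),
    (28232, 2, "07-MAY-20 09.00.52.925 PM", "07-MAY-20 10.04.14.955 PM"),
    (28233, 2, "07-MAY-20 10.04.14.955 PM", "08-MAY-20 05.16.24.751 PM"),
]

for inst, snap_id, kind, duration in find_anomalies(sample):
    print(f"instance {inst} snap {snap_id}: {kind} ({duration})")
```

On the four sample rows this flags the missing snapshot before 28060 on node 1 and the interval of more than 19 hours covered by snapshot 28233 on node 2.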
The snapshot settings are also fine
SQL> SELECT * FROM DBA_HIST_WR_CONTROL;
      DBID SNAP_INTERVAL                    RETENTION                  TOPNSQL
---------- -------------------------------- -------------------------- ----------
1073293429 +00000 01:00:00.0                +00045 00:00:00.0          DEFAULT
SQL>
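As a quick sanity check on these settings, the two interval values tell us how many snapshots each instance should hold when nothing goes wrong. This is a small Python sketch of that arithmetic; `parse_interval` is a helper written for this note, not an Oracle API.

```python
import re
from datetime import timedelta


def parse_interval(text):
    """Parse an INTERVAL DAY TO SECOND value as displayed above, e.g. '+00045 00:00:00.0'."""
    m = re.fullmatch(r"([+-])(\d+) (\d+):(\d+):(\d+(?:\.\d+)?)", text.strip())
    if m is None:
        raise ValueError(f"unrecognised interval: {text!r}")
    sign, days, hours, minutes, seconds = m.groups()
    value = timedelta(days=int(days), hours=int(hours),
                      minutes=int(minutes), seconds=float(seconds))
    return -value if sign == "-" else value


snap_interval = parse_interval("+00000 01:00:00.0")  # SNAP_INTERVAL
retention = parse_interval("+00045 00:00:00.0")      # RETENTION

# Number of hourly snapshots that fit into the retention window, per instance.
expected_per_instance = retention // snap_interval
print(expected_per_instance)
```

With one snapshot per hour and 45 days of retention, a healthy instance would keep about 1080 snapshots, which gives a baseline to compare the actual row counts in dba_hist_snapshot against.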
Check the trace files
The alert log periodically reports the following
Thu May 07 22:09:20 2020
Suspending MMON slave action kewrmafsa_ for 82800 seconds
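About the MMON process
MMON (Manageability Monitor) was introduced in Oracle Database 10g. It is the background process behind many of the database's self-monitoring and self-tuning features.
A database instance collects a large volume of statistics about activity and performance. These statistics accumulate in the SGA, and their current values can be read with SQL queries.
MMON periodically captures the statistics from the SGA (once an hour by default) and writes them to the data dictionary, where they can be kept indefinitely (by default, however, they are kept for only 8 days).
Each time MMON gathers a set of statistics (called a snapshot), it also launches ADDM. ADDM uses an expert system, developed over many years by a number of DBAs, to analyze database activity. It examines two snapshots (by default the current one and the previous one) and produces observations and recommendations about performance.
Besides gathering snapshots, MMON continuously monitors the database and the instance to decide whether any alerts should be raised.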
Errors in the M000 trace
not in wait at each sample
----- END DDE Action: 'ORA_12751_DUMP' (SUCCESS, 3 csec) -----
----- END DDE Actions Dump (total 3 csec) -----
*** KEWROCISTMTEXEC - encountered error: (ORA-12751: cpu time or run time policy violation<<<<<<<<<<<<<<<<<<<<<<<<<<<
)
*** SQLSTR: total-len=295, dump-len=240,
STR={insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number, service_name_hash, stat_id, value) select :snap_id, :dbid, :instance_number, stat.service_name_hash, stat.stat_id, stat.value from v$active_services asvc, v$service_st}
DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
CPU time exceeded 300 seconds <<<<<<<<<<<<<<<<<<<<<<<<<<<
Time limit violation detected at:
ksedsts()+465<-kspol_12751_dump()+145<-dbgdaExecuteAction()+1065<-dbgerRunAction()+109<-dbgerRunActions()+4134<-dbgexPhaseII()+1873<-dbgexProcessError()+2680<-dbgeExecuteForError()+88<-dbgePostErrorKGE()+2136<-dbkePostKGE_kgsf()+71<-kgeade()+351<-kgerelv()+140
<-kgerev()+34<-kserec1()+170<-OCIKSEC()+189<-kewrose_oci_stmt_exec()+292<-kewrgwxf1_gwrsql_exft_1()+317<-kewrgwxf_gwrsql_exft()+496<-kewrews_execute_wr_sql()+52<-kewrftbs_flush_table_by_sql()+180<-kewrft_flush_table()+264<-kewrftec_flush_table_ehdlcx()+766
<-kewrfsvc_flush_svcstat()+45<-kewrft_flush_table()+397<-kewrftec_flush_table_ehdlcx()+766<-kewrfat_flush_all_tables()+898<-kewrfos_flush_onesnap()+170<-kewrfsc_flush_snapshot_c()+644<-kewrafs_auto_flush_slave()+769<-kebm_slave_main()+586<-ksvrdp()+1766<-opirip()+674
<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230
Current Wait Stack:
Not in wait; last wait ended 5 min 5 sec ago
There are 3 sessions blocked by this session.
*** 2020-05-09 00:06:05.637
*** SESSION ID:(609.4217) 2020-05-09 00:06:05.637
*** CLIENT ID:() 2020-05-09 00:06:05.637
*** SERVICE NAME:(SYS$BACKGROUND) 2020-05-09 00:06:05.637
*** MODULE NAME:(MMON_SLAVE) 2020-05-09 00:06:05.637
*** ACTION NAME:(Auto-Flush Slave Action) 2020-05-09 00:06:05.637<<<<<<<<<<<<<<<<<<
DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
CPU time exceeded 300 seconds<<<<<<<<<<<<<<<<<<
Time limit violation detected at:
<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230
Current Wait Stack:
Not in wait; last wait ended 5 min 5 sec ago <<<<<<<<<<<<<<<<<<
There are 3 sessions blocked by this session.
Dumping one waiter:
inst: 3, sid: 1713, ser: 32257
wait event: 'enq: WF - contention'
p1: 'name|mode'=0x57460006
p2: '0'=0x4a
p3: '0'=0x0
row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
min_blocked_time: 288 secs, waiter_cache_ver: 23457
Wait State:
fixed_waits=0 flags=0x20 boundary=(nil)/-1
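The p1 value of the blocked waiter can be decoded to confirm which enqueue is involved. For enqueue waits, the 'name|mode' value conventionally packs the two-character lock name into the two high bytes and the requested mode into the low bytes. A minimal Python sketch follows; `decode_enqueue_p1` is my own helper name.

```python
def decode_enqueue_p1(p1):
    """Split the 'name|mode' value of an enqueue wait into (name, mode)."""
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)  # two high bytes: lock name
    mode = p1 & 0xFFFF                                      # low bytes: requested mode
    return name, mode


name, mode = decode_enqueue_p1(0x57460006)
print(name, mode)
```

For 0x57460006 this yields the name WF and mode 6, consistent with the 'enq: WF - contention' event printed in the trace. In Oracle's lock mode numbering, 6 is an exclusive request, so the waiter on instance 3 is queued behind the M000 slave that is stuck flushing the snapshot.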
mmon trace
*** 2020-05-09 07:20:31.882
Unable to schedule a MMON slave at: Auto Flush Main 1
Slave action has been temporarily suspended
- Slave action had prior policy violations.
Unknown return code: 101
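Trigger a snapshot manually while tracing the session with event 10046. Over several attempts, the session was sometimes killed before the snapshot had been generated.
Doc ID 2043531.1 describes this behaviour: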
Taking an AWR snapshot executes many SQL statements to generate a new snapshot and has a built-in timeout mechanism, which causes timeouts between 300 and 900 seconds by default. As a result, if snapshot generation stalls, "Time limit violation" or "Runtime exceeded" messages and associated ORA-12751 errors are raised.
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug unlimit;
Statement processed.
SQL> oradebug session_event 10046 trace name context forever ,level 4
Statement processed.
SQL> oradebug tracefile_name;
/oracle/app/oracle/diag/rdbms/cdr/cdr1/trace/cdr1_ora_19411.trc
SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
BEGIN DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT(); END;
*
ERROR at line 1:
ORA-00028: your session has been killed
ORA-00028: your session has been killed
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 99
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 122
ORA-06512: at line 1
Analyze the 10046 trace to find the sql_id of the time-consuming statement
=====================
PARSING IN CURSOR #140236033454680 len=295 dep=1 uid=0 oct=2 lid=0 tim=1588989110811688 hv=1114153516 ad='be2a17d0' sqlid='7g732rx16j8jc'
insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number, service_name_hash, stat_id, value) select :snap_id, :dbid, :instance_number, stat.service_name_hash, stat.stat_id, stat.value from v$active_services asvc, v$service_stats stat where asvc.name_hash = stat.service_name_hash
END OF STMT
PARSE #140236033454680:c=0,e=500,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2550932738,tim=1588989110811688
BINDS #140236033454680:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=00 fl2=0000 frm=00 csi=00 siz=72 off=0
kxsbbbfp=7f8b3edc6518 bln=22 avl=04 flg=05
value=28245
Bind#1
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=24
kxsbbbfp=7f8b3edc6530 bln=22 avl=06 flg=01
value=1073293429
Bind#2
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=48
kxsbbbfp=7f8b3edc6548 bln=22 avl=02 flg=01
value=1
*** 2020-05-09 09:59:32.945
EXEC #140236033454680:c=457832613,e=462133450,p=0,cr=0,cu=0,mis=0,r=174,dep=1,og=1,plh=2550932738,tim=1588989572945189
ERROR #140236033454680:err=28 tim=1588989572945334
STAT #140236033454680 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL (cr=0 pr=0 pw=0 time=13 us)'
STAT #140236033454680 id=2 cnt=174 pid=1 pos=1 obj=0 op='NESTED LOOPS (cr=0 pr=0 pw=0 time=445643146 us cost=0 size=130 card=1)'
STAT #140236033454680 id=3 cnt=175 pid=2 pos=1 obj=0 op='MERGE JOIN CARTESIAN (cr=0 pr=0 pw=0 time=1791 us cost=0 size=78 card=1)'
STAT #140236033454680 id=4 cnt=7 pid=3 pos=1 obj=0 op='FIXED TABLE FULL X$KSWSASTAB (cr=0 pr=0 pw=0 time=84 us cost=0 size=39 card=1)'
STAT #140236033454680 id=5 cnt=175 pid=3 pos=2 obj=0 op='BUFFER SORT (cr=0 pr=0 pw=0 time=972 us cost=0 size=39 card=1)'
STAT #140236033454680 id=6 cnt=28 pid=5 pos=1 obj=0 op='FIXED TABLE FULL X$KEWSSMAP (cr=0 pr=0 pw=0 time=69 us cost=0 size=39 card=1)'
STAT #140236033454680 id=7 cnt=174 pid=2 pos=2 obj=0 op='FIXED TABLE FIXED INDEX X$KEWSSVCV (ind:2) (cr=0 pr=0 pw=0 time=462106244 us cost=0 size=52 card=1)'
*** KEWROCISTMTEXEC - encountered error: (ORA-00028: your session has been killed
)
*** SQLSTR: total-len=295, dump-len=240,
STR={insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number, service_name_hash, stat_id, value) select :snap_id, :dbid, :instance_number, stat.service_name_hash, stat.stat_id, stat.value from v$active_services asvc, v$service_st}
*** KEWRAFM1: Error=13509 encountered by kewrfteh
CLOSE #140236033454680:c=0,e=11,dep=1,type=0,tim=1588989572945875
*** KEWROCISTMTEXEC - encountered error: (ORA-00028: your session has been killed
)
*** SQLSTR: total-len=395, dump-len=240,
STR={insert into WRH$_SERVICE_WAIT_CLASS (snap_id, dbid, instance_number, service_name_hash, wait_class_id, wait_class, total_waits, time_waited) select :snap_id, :dbid, :instance_number, stat.service_name_hash, stat.wait_class_id, s}
*** KEWRAFM1: Error=13509 encountered by kewrfteh
CLOSE #140236033454680:c=0,e=4,dep=1,type=0,tim=1588989572946282
PARSE #140236033662208:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=0,tim=1588989572946322
BINDS #140236033662208:
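To read the numbers on the EXEC line more comfortably, the key=value pairs can be split out with a few lines of Python. This is only a sketch: `parse_trace_call` is a hypothetical helper, and it relies on the usual convention that `c` (CPU) and `e` (elapsed) in 10046 traces are reported in microseconds.

```python
import re

# EXEC line copied from the trace above.
EXEC_LINE = ("EXEC #140236033454680:c=457832613,e=462133450,p=0,cr=0,cu=0,"
             "mis=0,r=174,dep=1,og=1,plh=2550932738,tim=1588989572945189")


def parse_trace_call(line):
    """Return (call, cursor, fields) for a PARSE/EXEC/FETCH line of a 10046 trace."""
    m = re.match(r"(PARSE|EXEC|FETCH) #(\d+):(.*)", line)
    if m is None:
        raise ValueError("not a PARSE/EXEC/FETCH line")
    call, cursor, rest = m.groups()
    fields = {key: int(value)
              for key, value in (item.split("=") for item in rest.split(","))}
    return call, int(cursor), fields


call, cursor, fields = parse_trace_call(EXEC_LINE)
cpu_seconds = fields["c"] / 1e6      # assumption: c= is in microseconds
elapsed_seconds = fields["e"] / 1e6  # assumption: e= is in microseconds
print(f"{call}: cpu={cpu_seconds:.2f}s elapsed={elapsed_seconds:.2f}s rows={fields['r']}")
```

The single execution burned roughly 458 seconds of CPU for 174 rows, well past the 300-second CPU limit reported in the M000 trace, which fits the ORA-12751 and the session kill.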
Formatted:
insert into WRH$_SERVICE_STAT
  (snap_id, dbid, instance_number, service_name_hash, stat_id, value)
  select :snap_id,
         :dbid,
         :instance_number,
         stat.service_name_hash,
         stat.stat_id,
         stat.value
    from v$active_services asvc, v$service_stats stat
   where asvc.name_hash = stat.service_name_hash
This SQL has several child cursors, and one of them shows an exceptionally long execution time
SQL> @xi 7g732rx16j8jc %
eXplain the execution plan for sqlid 7g732rx16j8jc child %...
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 7g732rx16j8jc, child number 0
-------------------------------------
insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number,
service_name_hash, stat_id, value) select :snap_id, :dbid,
:instance_number, stat.service_name_hash, stat.stat_id, stat.value
from v$active_services asvc, v$service_stats stat where
asvc.name_hash = stat.service_name_hash
Plan hash value: 2550932738
--------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | | |
| 1 | LOAD TABLE CONVENTIONAL | | | | | |
| 2 | NESTED LOOPS | | 1 | | | |
| 3 | MERGE JOIN CARTESIAN | | 1 | | | |
|* 4 | FIXED TABLE FULL | X$KSWSASTAB | 1 | | | |
| 5 | BUFFER SORT | | 1 | 73728 | 73728 | |
|* 6 | FIXED TABLE FULL | X$KEWSSMAP | 1 | | | |
|* 7 | FIXED TABLE FIXED INDEX| X$KEWSSVCV (ind:2) | 1 | | | |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(("KSWSASTABACT"=1 AND "INST_ID"=USERENV('INSTANCE')))
6 - filter("M"."AGGID"=3)
7 - filter(("S"."INST_ID"=USERENV('INSTANCE') AND "KSWSASTABNMH"="S"."SVCHSH"
AND "S"."KEWSOFF"="M"."OFFST"))
Note
-----
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
SQL_ID 7g732rx16j8jc, child number 1
-------------------------------------
insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number,
service_name_hash, stat_id, value) select :snap_id, :dbid,
:instance_number, stat.service_name_hash, stat.stat_id, stat.value
from v$active_services asvc, v$service_stats stat where
asvc.name_hash = stat.service_name_hash
Plan hash value: 2550932738
--------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | | |
| 1 | LOAD TABLE CONVENTIONAL | | | | | |
| 2 | NESTED LOOPS | | 1 | | | |
| 3 | MERGE JOIN CARTESIAN | | 1 | | | |
|* 4 | FIXED TABLE FULL | X$KSWSASTAB | 1 | | | |
| 5 | BUFFER SORT | | 1 | 73728 | 73728 | |
|* 6 | FIXED TABLE FULL | X$KEWSSMAP | 1 | | | |
|* 7 | FIXED TABLE FIXED INDEX| X$KEWSSVCV (ind:2) | 1 | | | |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(("KSWSASTABACT"=1 AND "INST_ID"=USERENV('INSTANCE')))
6 - filter("M"."AGGID"=3)
7 - filter(("S"."INST_ID"=USERENV('INSTANCE') AND "KSWSASTABNMH"="S"."SVCHSH"
AND "S"."KEWSOFF"="M"."OFFST"))
Note
-----
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
SQL_ID 7g732rx16j8jc, child number 2
-------------------------------------
insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number,
service_name_hash, stat_id, value) select :snap_id, :dbid,
:instance_number, stat.service_name_hash, stat.stat_id, stat.value
from v$active_services asvc, v$service_stats stat where
asvc.name_hash = stat.service_name_hash
Plan hash value: 2550932738
--------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | | |
| 1 | LOAD TABLE CONVENTIONAL | | | | | |
| 2 | NESTED LOOPS | | 1 | | | |
| 3 | MERGE JOIN CARTESIAN | | 1 | | | |
|* 4 | FIXED TABLE FULL | X$KSWSASTAB | 1 | | | |
| 5 | BUFFER SORT | | 1 | 73728 | 73728 | |
|* 6 | FIXED TABLE FULL | X$KEWSSMAP | 1 | | | |
|* 7 | FIXED TABLE FIXED INDEX| X$KEWSSVCV (ind:2) | 1 | | | |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(("KSWSASTABACT"=1 AND "INST_ID"=USERENV('INSTANCE')))
6 - filter("M"."AGGID"=3)
7 - filter(("S"."INST_ID"=USERENV('INSTANCE') AND "KSWSASTABNMH"="S"."SVCHSH"
AND "S"."KEWSOFF"="M"."OFFST"))
Note
-----
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
SQL_ID 7g732rx16j8jc, child number 3
-------------------------------------
insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number,
service_name_hash, stat_id, value) select :snap_id, :dbid,
:instance_number, stat.service_name_hash, stat.stat_id, stat.value
from v$active_services asvc, v$service_stats stat where
asvc.name_hash = stat.service_name_hash
Plan hash value: 2550932738
----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name               | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT           |                    |      1 |        |      0 |00:00:00.01 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL   |                    |      1 |        |      0 |00:00:00.01 |       |       |          |
|   2 |   NESTED LOOPS             |                    |      1 |      1 |    174 |00:07:25.64 |       |       |          | <<<<<<<<<<<<<<<<<<<<<<<<<<<<
|   3 |    MERGE JOIN CARTESIAN    |                    |      1 |      1 |    175 |00:00:00.01 |       |       |          |
|*  4 |     FIXED TABLE FULL       | X$KSWSASTAB        |      1 |      1 |      7 |00:00:00.01 |       |       |          |
|   5 |     BUFFER SORT            |                    |      7 |      1 |    175 |00:00:00.01 | 73728 | 73728 |          |
|*  6 |      FIXED TABLE FULL      | X$KEWSSMAP         |      1 |      1 |     28 |00:00:00.01 |       |       |          |
|*  7 |    FIXED TABLE FIXED INDEX | X$KEWSSVCV (ind:2) |    175 |      1 |    174 |00:07:42.11 |       |       |          | <<<<<<<<<<<<<<<<<<<<<<<<<<<<
----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(("KSWSASTABACT"=1 AND "INST_ID"=USERENV('INSTANCE')))
6 - filter("M"."AGGID"=3)
7 - filter(("S"."INST_ID"=USERENV('INSTANCE') AND "KSWSASTABNMH"="S"."SVCHSH" AND "S"."KEWSOFF"="M"."OFFST"))
141 rows selected.
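None of these three fixed tables has statistics. For X$KEWSSVCV in particular, NUM_ROWS is empty while the table actually holds about 2.58 million rows.
As a result, the optimizer chose an unsuitable join order among X$KEWSSMAP, X$KEWSSVCV and X$KSWSASTAB, and picked an inefficient index to access X$KEWSSVCV.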
SQL> select * from DBA_TAB_STATISTICS where table_name in ('X$KSWSASTAB','X$KEWSSMAP','X$KEWSSVCV');
OWNER TABLE_NAME PARTITION_NAME PARTITION_POSITION SUBPARTITION_NAME SUBPARTITION_POSITION OBJECT_TYPE NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE CHAIN_CNT AVG_ROW_LEN AVG_SPACE_FREELIST_BLOCKS NUM_FREELIST_BLOCKS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO SAMPLE_SIZE LAST_ANALYZED GLO USE STATT STA
------------------------------ ------------------------------ ------------------------------ ------------------ ------------------------------ --------------------- ------------ ---------- ---------- ------------ ---------- ---------- ----------- ------------------------- ------------------- ----------------- ------------------- ----------- ----------------- --- --- ----- ---
SYS X$KSWSASTAB FIXED TABLE
SYS X$KEWSSMAP FIXED TABLE
SYS X$KEWSSVCV FIXED TABLE
SQL>
SQL> select count(*) from X$KSWSASTAB;
COUNT(*)
----------
8
SQL> select count(*) from X$KEWSSMAP;
COUNT(*)
----------
265
SQL> select count(*) from X$KEWSSVCV;
COUNT(*)
----------
2576336
SQL>
#### Be careful when querying x$ tables in a production environment
SQL> select * from X$KEWSSVCV where rownum<10;
ADDR INDX INST_ID SVCNAM SVCID SVCHSH KEWSOFF KEWSVAL
---------------- ---------- ---------- ---------------------------------------------------------------- ---------- ---------- ---------- ----------
00007F618F9DA658 0 1 --UNKNOWN-- 151 2985209977 0 0
00007F618F9DA658 1 1 --UNKNOWN-- 151 2985209977 1 0
00007F618F9DA658 2 1 --UNKNOWN-- 151 2985209977 2 0
00007F618F9DA658 3 1 --UNKNOWN-- 151 2985209977 3 0
00007F618F9DA658 4 1 --UNKNOWN-- 151 2985209977 4 0
00007F618F9DA658 5 1 --UNKNOWN-- 151 2985209977 5 0
00007F618F9DA658 6 1 --UNKNOWN-- 151 2985209977 6 0
00007F618F9DA658 7 1 --UNKNOWN-- 151 2985209977 7 0
00007F618F9DA658 8 1 --UNKNOWN-- 151 2985209977 8 0
9 rows selected.
The alert log contains a large number of ALTER SYSTEM SET service_names entries. Data Pump issues these when a job starts.
Sat May 09 10:21:20 2020
Thread 1 advanced to log sequence 152367 (LGWR switch)
Current log# 3 seq# 152367 mem# 0: +DATADG/cdr/redo03.log
Sat May 09 10:22:16 2020
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$S_1_20200413152721.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$C_1_20200413152721.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$S_1_20200413145208.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$C_1_20200413145208.CDR.H
EBEI.MOBILE.COM','SYS$SYS.KUPC$C_1_20200509102215.CDR.HEBEI.MOBILE.COM' SCOPE=MEMORY SID='cdr1';
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$C_1_20200509102215.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$S_1_20200413152721.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$C_1_20200413152721.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$S_1_20200413145208.CDR.H
EBEI.MOBILE.COM','SYS$SYS.KUPC$C_1_20200413145208.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$S_1_20200509102215.CDR.HEBEI.MOBILE.COM' SCOPE=MEMORY SID='cdr1';
Sat May 09 10:22:17 2020
DM09 started with pid=364, OS id=11990, job DBMT.SYS_EXPORT_TABLE_08
Sat May 09 10:22:45 2020
DW04 started with pid=136, OS id=23074, wid=1, job DBMT.SYS_EXPORT_TABLE_08
Sat May 09 10:23:05 2020
DW01 started with pid=947, OS id=27865, wid=2, job DBMT.SYS_EXPORT_TABLE_08
Sat May 09 10:25:29 2020
Thread 1 advanced to log sequence 152368 (LGWR switch)
Current log# 4 seq# 152368 mem# 0: +DATADG/cdr/redo04.log
Sat May 09 10:31:15 2020
Thread 1 advanced to log sequence 152369 (LGWR switch)
Current log# 1 seq# 152369 mem# 0: +DATADG/cdr/redo01.log
Sat May 09 10:35:26 2020
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$S_1_20200509102215.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$S_1_20200413152721.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$C_1_20200413152721.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$S_1_20200413145208.CDR.H
EBEI.MOBILE.COM','SYS$SYS.KUPC$C_1_20200413145208.CDR.HEBEI.MOBILE.COM' SCOPE=MEMORY SID='cdr1';
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$S_1_20200413152721.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$C_1_20200413152721.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$S_1_20200413145208.CDR.HEBEI.MOBILE.COM','SYS$SYS.KUPC$C_1_20200413145208.CDR.H
EBEI.MOBILE.COM' SCOPE=MEMORY SID='cdr1';
Sat May 09 10:35:54 2020
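Instance 1 shows a huge number of --UNKNOWN-- service entries
During Data Pump processing, the number of "--UNKNOWN--" service name entries grows as services are created and dropped. The call detail record (CDR) system runs a large number of expdp table-level backup jobs every month and hands the dumps to NBU. All of these jobs are launched on node 1, which is why node 1 has accumulated so many --UNKNOWN-- entries.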
SQL> select inst_id,service_name,count(1) from gv$service_stats
2 group by inst_id,service_name
3 order by 1,2,count(1) desc;
INST_ID SERVICE_NAME COUNT(1)
---------- ---------------------------------------------------------------- ----------
1 --UNKNOWN-- 2576100 <<<<<<<<<<<<<<<<<<<<<<
1 SYS$BACKGROUND 28
1 SYS$USERS 28
1 SYS.KUPC$C_1_20200413145208 27
1 SYS.KUPC$C_1_20200413152721 28
1 SYS.KUPC$C_1_20200509102215 7
1 SYS.KUPC$S_1_20200413145208 28
1 SYS.KUPC$S_1_20200413152721 27
1 SYS.KUPC$S_1_20200509102215 8
1 cdr.hebei.mobile.com 28
1 cdrXDB 27
2 --UNKNOWN-- 56
2 SYS$BACKGROUND 28
2 SYS$USERS 28
2 cdr.hebei.mobile.com 28
2 cdrXDB 28
3 --UNKNOWN-- 56
3 SYS$BACKGROUND 28
3 SYS$USERS 28
3 cdr.hebei.mobile.com 28
3 cdrXDB 28
4 --UNKNOWN-- 112
4 SYS$BACKGROUND 28
4 SYS$USERS 28
4 cdr.hebei.mobile.com 28
4 cdrXDB 28
26 rows selected.
SQL> select * from v$fixed_table where name='V$SERVICE_STATS';
NAME OBJECT_ID TYPE TABLE_NUM
------------------------------ ---------- ----- ----------
V$SERVICE_STATS 4294952576 VIEW 65537
SQL>
SQL> select view_definition from v$fixed_view_definition where view_name='GV$SERVICE_STATS';
VIEW_DEFINITION
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
select s.inst_id, s.svchsh, s.svcnam, m.extid, m.sname, s.kewsval from x$kewssvcv s, x$kewssmap m where s.kewsoff = m.offst and m.aggid = 3
SQL>
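## X$KEWSSVCV is an in-memory table and the base table of gv$service_stats. The slow insert into WRH$_SERVICE_STAT that runs during snapshot creation reads this table too, and that is where the time is spent.
Gather statistics on the tables involved, then check the execution plan again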
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SYS', 'X$KSWSASTAB', no_invalidate=>false);
PL/SQL procedure successfully completed.
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SYS', 'X$KEWSSMAP', no_invalidate=>false);
PL/SQL procedure successfully completed.
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SYS', 'X$KEWSSVCV', no_invalidate=>false);
PL/SQL procedure successfully completed.
SQL> select * from DBA_TAB_STATISTICS where table_name in ('X$KSWSASTAB','X$KEWSSMAP','X$KEWSSVCV');
OWNER TABLE_NAME PARTITION_NAME PARTITION_POSITION SUBPARTITION_NAME SUBPARTITION_POSITION OBJECT_TYPE NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE CHAIN_CNT AVG_ROW_LEN AVG_SPACE_FREELIST_BLOCKS NUM_FREELIST_BLOCKS AVG_CACHED_BLOCKS AVG_CACHE_HIT_RATIO SAMPLE_SIZE LAST_ANALYZED GLO USE STATT STA
------------------------------ ------------------------------ ------------------------------ ------------------ ------------------------------ --------------------- ------------ ---------- ---------- ------------ ---------- ---------- ----------- ------------------------- ------------------- ----------------- ------------------- ----------- ----------------- --- --- ----- ---
SYS X$KSWSASTAB FIXED TABLE 8 112 8 20200509 10:48:26 YES NO
SYS X$KEWSSMAP FIXED TABLE 265 58 265 20200509 10:48:33 YES NO
SYS X$KEWSSVCV FIXED TABLE 2576336 46 2576336 20200509 10:51:12 YES NO
SQL>
SQL> @xi 7g732rx16j8jc %
eXplain the execution plan for sqlid 7g732rx16j8jc child %...
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 7g732rx16j8jc, child number 0
-------------------------------------
insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number,
service_name_hash, stat_id, value) select :snap_id, :dbid,
:instance_number, stat.service_name_hash, stat.stat_id, stat.value
from v$active_services asvc, v$service_stats stat where
asvc.name_hash = stat.service_name_hash
Plan hash value: 4168499536
----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | E-Rows |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT            |                    |        |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL    |                    |        |       |       |          |
|*  2 |   HASH JOIN                 |                    |    166 |  948K |  948K | 1092K (0)|
|   3 |    NESTED LOOPS             |                    |    212 |       |       |          |
|*  4 |     FIXED TABLE FULL        | X$KSWSASTAB        |      8 |       |       |          |
|*  5 |     FIXED TABLE FIXED INDEX | X$KEWSSVCV (ind:1) |     28 |       |       |          |
|*  6 |    FIXED TABLE FULL         | X$KEWSSMAP         |     28 |       |       |          |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("S"."KEWSOFF"="M"."OFFST")
4 - filter(("KSWSASTABACT"=1 AND "INST_ID"=USERENV('INSTANCE')))
5 - filter(("S"."INST_ID"=USERENV('INSTANCE') AND "KSWSASTABNMH"="S"."SVCHSH"))
6 - filter("M"."AGGID"=3)
Note
-----
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
SQL_ID 7g732rx16j8jc, child number 2
-------------------------------------
insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number,
service_name_hash, stat_id, value) select :snap_id, :dbid,
:instance_number, stat.service_name_hash, stat.stat_id, stat.value
from v$active_services asvc, v$service_stats stat where
asvc.name_hash = stat.service_name_hash
Plan hash value: 4168499536
---------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | | |
| 1 | LOAD TABLE CONVENTIONAL | | | | | |
|* 2 | HASH JOIN | | 166 | 965K| 965K| 1081K (0)|
| 3 | NESTED LOOPS | | 212 | | | |
|* 4 | FIXED TABLE FULL | X$KSWSASTAB | 8 | | | |
|* 5 | FIXED TABLE FIXED INDEX| X$KEWSSVCV (ind:1) | 28 | | | |
|* 6 | FIXED TABLE FULL | X$KEWSSMAP | 28 | | | |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("S"."KEWSOFF"="M"."OFFST")
4 - filter(("KSWSASTABACT"=1 AND "INST_ID"=USERENV('INSTANCE')))
5 - filter(("S"."INST_ID"=USERENV('INSTANCE') AND "KSWSASTABNMH"="S"."SVCHSH"))
6 - filter("M"."AGGID"=3)
Note
-----
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
72 rows selected.
Check the fixed table indexes
SQL> select * from v$indexed_fixed_column where table_name='X$KEWSSVCV' ;
TABLE_NAME INDEX_NUMBER COLUMN_NAME COLUMN_POSITION
------------------------------ ------------ ------------------------------ ---------------
X$KEWSSVCV 2 KEWSOFF 0
X$KEWSSVCV 1 SVCHSH 0
SQL>
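The difference between the two fixed indexes can be put into rough numbers. The sketch below is a back-of-envelope estimate only, under assumptions of mine that the source does not confirm: that a fixed-index probe visits just the rows matching the indexed column, that the rows of X$KEWSSVCV are spread evenly over the 28 statistic offsets, and that each active service owns about 28 rows. The input figures are taken from the outputs shown earlier.

```python
TOTAL_ROWS = 2576336    # count(*) from X$KEWSSVCV on instance 1
STAT_OFFSETS = 28       # rows returned from X$KEWSSMAP for AGGID = 3
PROBES_OLD_PLAN = 175   # Starts of the X$KEWSSVCV step in the slow plan
ACTIVE_SERVICES = 7     # rows returned from X$KSWSASTAB in the slow plan

# Old plan: ind:2 is on KEWSOFF, so each probe lands on every row that
# shares that offset, including all the --UNKNOWN-- rows (assumption).
rows_per_offset = TOTAL_ROWS / STAT_OFFSETS
visited_old = PROBES_OLD_PLAN * rows_per_offset

# New plan: ind:1 is on SVCHSH, so each probe lands only on the rows of one
# active service, about one row per statistic (assumption).
visited_new = ACTIVE_SERVICES * STAT_OFFSETS

print(f"old plan: ~{visited_old:,.0f} rows visited")
print(f"new plan: ~{visited_new} rows visited")
```

Under these assumptions the old plan touches on the order of 16 million rows per snapshot against a couple of hundred for the new plan, which is in line with the gap between minutes and milliseconds seen in the measurements.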
SQL> @sqlid 7g732rx16j8jc %
Show SQL text, child cursors and execution stats for SQLID 7g732rx16j8jc child %
HASH_VALUE PLAN_HASH_VALUE CH# SQL_TEXT
---------- --------------- ---- ------------------------------------------------------------------------------------------------------------------------------------------------------
1114153516 4168499536 0 insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number, service_name_hash, stat_id, value) select :snap_id, :dbid, :instance_number,
stat.service_name_hash, stat.stat_id, stat.value from v$active_services asvc, v$service_stats stat where asvc.name_hash = stat.service_name_hash
1114153516 4168499536 2 insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number, service_name_hash, stat_id, value) select :snap_id, :dbid, :instance_number,
stat.service_name_hash, stat.stat_id, stat.value from v$active_services asvc, v$service_stats stat where asvc.name_hash = stat.service_name_hash
CH# PARENT_HANDLE OBJECT_HANDLE PLAN_HASH PARSES H_PARSES EXECUTIONS FETCHES ROWS_PROCESSED ROWS_PER_FETCH CPU_SEC CPU_SEC_EXEC ELA_SEC LIOS PIOS SORTS USERS_EXECUTING
---- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- -------------- -------------- ---------- ------------ ---------- ---------- ---------- ---------- ---------------
0 00000000BE2A17D0 0000000101E08018 4168499536 42 8 42 0 7280 .468034 .011143667 1.176163 1622 253 0 0
2 00000000BE2A17D0 00000000BD655CC0 4168499536 1 2 1 0 224 .016001 .016001 .056783 77 10 0 0
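To put the improvement into one number, the average elapsed time per execution after gathering statistics can be compared with the single killed execution from the 10046 trace. A small Python sketch using only figures shown above (variable names are mine):

```python
old_elapsed_s = 462133450 / 1e6  # e= of the killed EXEC call, assumed to be microseconds
new_total_elapsed_s = 1.176163   # ELA_SEC of child 0 after gathering statistics
new_executions = 42              # EXECUTIONS of child 0

new_avg_s = new_total_elapsed_s / new_executions
speedup = old_elapsed_s / new_avg_s

print(f"average elapsed per execution now: {new_avg_s:.3f}s")
print(f"speedup versus the killed execution: about {speedup:,.0f}x")
```

The statement now averages under 0.03 seconds per execution, so it no longer comes anywhere near the MMON slave's CPU or runtime limits.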
Check whether the MMON slave action is suspended
SQL> oradebug unit_test kebm_dmp_slv_attrs kewrmafsa_
Status: 3
Flags: 0
Runtime limit: 900
CPU time limit: 300
Violations: 28
Suspended until: 1589036765 <<<<< non-zero means suspended
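The "Suspended until" value looks like a Unix epoch timestamp in seconds. Treating it that way is my assumption, as is the UTC+8 server time zone, which I inferred by comparing the `tim=` values in the 10046 trace with the wall-clock lines next to them. A minimal Python sketch of the conversion:

```python
from datetime import datetime, timedelta, timezone

SERVER_TZ = timezone(timedelta(hours=8))  # assumption: the server clock is UTC+8

# Assumption: "Suspended until" is seconds since the Unix epoch.
suspended_until = datetime.fromtimestamp(1589036765, SERVER_TZ)

# Timestamp of the last ORA-12751 violation in the M000 trace (milliseconds dropped).
violation = datetime(2020, 5, 9, 0, 6, 5, tzinfo=SERVER_TZ)

print(suspended_until.isoformat())
print(suspended_until - violation)
```

Under these assumptions the suspension ends at 23:06:05 on May 9, exactly 23 hours after the violation recorded at 00:06:05. That matches the 82800 seconds in the "Suspending MMON slave action kewrmafsa_" message from the alert log, which suggests the interpretation is reasonable.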
Trying again, an AWR snapshot can now be created manually. Keep observing.
SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
PL/SQL procedure successfully completed.
SQL>
SQL> oradebug unit_test kebm_dmp_slv_attrs kewrmafsa_
Status: 3
Flags: 0
Runtime limit: 900
CPU time limit: 300
Violations: 28
Suspended until: 1589036765
SQL>
If necessary, run the following command to clear the MMON suspension.
SQL> oradebug unit_test kebm_set_slv_attrs kewrmafsa_ retain retain retain retain 0 0
Modified attributes of kewrmafsa_ (slave id 13)
SQL>
SQL> oradebug unit_test kebm_dmp_slv_attrs kewrmafsa_
Status: 3
Flags: 0
Runtime limit: 900
CPU time limit: 300
Violations: 0
Suspended until: 0
SQL>
After several hourly snapshot cycles, snapshots are again generated automatically at the correct interval:
with t as (
select snap_id,
instance_number,
begin_interval_time,
end_interval_time,
snap_flag,
row_number() over(partition by instance_number order by begin_interval_time desc) num
from (select snap_id,
instance_number,
begin_interval_time,
end_interval_time,
snap_flag
from dba_hist_snapshot
order by instance_number, begin_interval_time))
select snap_id,instance_number,begin_interval_time,end_interval_time,snap_flag from t where num<=10 order by instance_number,begin_interval_time;
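The snapshot interval is currently set to one hour. Comparing the snapshot times of the four nodes closely, it is easy to see that node 1 stopped generating snapshots on 13-APR-20, and that the other nodes also show gaps of two hours or more between snapshots.
Recall that the SQL captured above has several child cursors: at the times when no snapshot was produced, that SQL must have run with a bad execution plan, exceeded the time limit, and been killed.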
   SNAP_ID INSTANCE_NUMBER BEGIN_INTERVAL_TIME       END_INTERVAL_TIME          SNAP_FLAG
---------- --------------- ------------------------- ------------------------- ----------
     28057               1 13-APR-20 09.00.06.341 AM 13-APR-20 10.00.18.680 AM          0
     28058               1 13-APR-20 10.00.18.680 AM 13-APR-20 11.00.06.287 AM          0
     28060               1 13-APR-20 12.00.06.152 PM 13-APR-20 01.00.30.297 PM          0
     28061               1 13-APR-20 01.00.30.297 PM 13-APR-20 02.00.01.093 PM          0
     28233               1 07-MAY-20 10.04.14.903 PM 08-MAY-20 05.16.24.675 PM          1
     28234               1 08-MAY-20 05.16.24.675 PM 08-MAY-20 05.25.23.869 PM          1
     28246               1 09-MAY-20 09.51.50.009 AM 09-MAY-20 10.53.05.153 AM          1
     28247               1 09-MAY-20 10.53.05.153 AM 09-MAY-20 12.00.06.767 PM          0
     28248               1 09-MAY-20 12.00.06.767 PM 09-MAY-20 01.00.18.207 PM          0
     28249               1 09-MAY-20 01.00.18.207 PM 09-MAY-20 02.00.29.186 PM          0
     28240               2 08-MAY-20 10.00.14.741 PM 08-MAY-20 11.00.11.279 PM          0
     28241               2 08-MAY-20 11.00.11.279 PM 09-MAY-20 12.00.59.568 AM          0
     28242               2 09-MAY-20 12.00.59.568 AM 09-MAY-20 09.06.20.210 AM          0
     28243               2 09-MAY-20 09.06.20.210 AM 09-MAY-20 09.20.47.483 AM          0
     28244               2 09-MAY-20 09.20.47.483 AM 09-MAY-20 09.33.38.493 AM          0
     28245               2 09-MAY-20 09.33.38.493 AM 09-MAY-20 09.51.50.048 AM          0
     28246               2 09-MAY-20 09.51.50.048 AM 09-MAY-20 10.53.05.214 AM          0
     28247               2 09-MAY-20 10.53.05.214 AM 09-MAY-20 12.00.06.849 PM          0
     28248               2 09-MAY-20 12.00.06.849 PM 09-MAY-20 01.00.18.605 PM          0
     28249               2 09-MAY-20 01.00.18.605 PM 09-MAY-20 02.00.29.250 PM          0
     28240               3 08-MAY-20 10.00.16.037 PM 08-MAY-20 11.00.11.235 PM          0
     28241               3 08-MAY-20 11.00.11.235 PM 09-MAY-20 12.00.59.561 AM          0
     28242               3 09-MAY-20 12.00.59.561 AM 09-MAY-20 09.06.20.189 AM          0
     28243               3 09-MAY-20 09.06.20.189 AM 09-MAY-20 09.20.47.480 AM          0
     28244               3 09-MAY-20 09.20.47.480 AM 09-MAY-20 09.33.38.491 AM          0
     28245               3 09-MAY-20 09.33.38.491 AM 09-MAY-20 09.51.50.049 AM          0
     28246               3 09-MAY-20 09.51.50.049 AM 09-MAY-20 10.53.05.207 AM          0
     28247               3 09-MAY-20 10.53.05.207 AM 09-MAY-20 12.00.06.828 PM          0
     28248               3 09-MAY-20 12.00.06.828 PM 09-MAY-20 01.00.18.663 PM          0
     28249               3 09-MAY-20 01.00.18.663 PM 09-MAY-20 02.00.29.388 PM          0
     28240               4 08-MAY-20 10.00.14.785 PM 08-MAY-20 11.00.11.281 PM          0
     28241               4 08-MAY-20 11.00.11.281 PM 09-MAY-20 12.00.59.575 AM          0
     28242               4 09-MAY-20 12.00.59.575 AM 09-MAY-20 09.06.20.189 AM          0
     28243               4 09-MAY-20 09.06.20.189 AM 09-MAY-20 09.20.47.482 AM          0
     28244               4 09-MAY-20 09.20.47.482 AM 09-MAY-20 09.33.38.497 AM          0
     28245               4 09-MAY-20 09.33.38.497 AM 09-MAY-20 09.51.50.047 AM          0
     28246               4 09-MAY-20 09.51.50.047 AM 09-MAY-20 10.53.05.203 AM          0
     28247               4 09-MAY-20 10.53.05.203 AM 09-MAY-20 12.00.06.865 PM          0
     28248               4 09-MAY-20 12.00.06.865 PM 09-MAY-20 01.00.18.609 PM          0
     28249               4 09-MAY-20 01.00.18.609 PM 09-MAY-20 02.00.29.628 PM          0
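Gaps like these can also be flagged by a query instead of by eye. The following is only a sketch: the 5-minute and 90-minute thresholds are arbitrary values picked for an hourly interval, and it assumes dba_hist_snapshot holds a single DBID.

```sql
-- Flag snapshots that begin well after the previous one ended (a snapshot
-- is missing in between) or that span far more than the one-hour interval.
select instance_number, snap_id, prev_end,
       begin_interval_time, end_interval_time, snap_flag
from (select instance_number, snap_id,
             begin_interval_time, end_interval_time, snap_flag,
             lag(end_interval_time) over (partition by instance_number
                                          order by begin_interval_time) as prev_end
      from dba_hist_snapshot)
where begin_interval_time - prev_end > interval '5' minute
   or end_interval_time - begin_interval_time > interval '90' minute
order by instance_number, begin_interval_time;
```

The first snapshot of each instance has no predecessor, so prev_end is NULL and only the second condition can flag it.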
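Summary: frequent Data Pump backups caused the fixed table X$KEWSSVCV to change and grow constantly, and X$KEWSSVCV had no statistics (automatic statistics collection had been disabled). As a result, the snapshot-creation statement ran with a bad execution plan, took too long, and was killed automatically.
X$KEWSSVCV is an in-memory table; its entries are generally released only by restarting the instance.
The fix is therefore to gather statistics on these tables so that the CBO picks the right execution plan for the snapshot-creation statement.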
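The write-up does not show the exact command that was used. One way to gather these statistics, sketched here as an assumption, is DBMS_STATS.GATHER_TABLE_STATS on the three fixed tables behind the slow insert (X$KEWSSVCV, X$KEWSSMAP and X$KSWSASTAB):

```sql
-- Run as SYS. no_invalidate => false invalidates dependent cursors so the
-- snapshot statement is re-parsed with the new statistics.
begin
  dbms_stats.gather_table_stats('SYS', 'X$KEWSSVCV',  no_invalidate => false);
  dbms_stats.gather_table_stats('SYS', 'X$KEWSSMAP',  no_invalidate => false);
  dbms_stats.gather_table_stats('SYS', 'X$KSWSASTAB', no_invalidate => false);
end;
/
```

DBMS_STATS.GATHER_FIXED_OBJECTS_STATS is the broader alternative; it gathers statistics for all fixed tables rather than only these three.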
#############test expdp generate service
SQL> select inst_id,SVCNAM,count(*) from X$KEWSSVCV group by inst_id,SVCNAM;
INST_ID SVCNAM COUNT(*)
---------- ---------------------------------------------------------------- ----------
1 racdb 28
1 SYS$USERS 28
1 racdbXDB 28
1 SYS$BACKGROUND 28
SQL>
SQL> create directory ohome as '/home/oracle';
Directory created.
SQL>
expdp dbmt/dbmt directory=ohome file=objs.dump tables=dbmt.objs
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "SYS$SYS.KUPC$C_1_20200511103119.RACDB" has 1 instance(s). <<<<<
Instance "racdb1", status READY, has 1 handler(s) for this service...
Service "SYS$SYS.KUPC$S_1_20200511103119.RACDB" has 1 instance(s). <<<<<
Instance "racdb1", status READY, has 1 handler(s) for this service...
Service "racdb" has 1 instance(s).
Instance "racdb1", status READY, has 1 handler(s) for this service...
Service "racdbXDB" has 1 instance(s).
Instance "racdb1", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@rac1 ~]$
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Legacy Mode Active due to the following parameters:
Legacy Mode Parameter: "file=objs.dump" Location: Command Line, Replaced with: "dumpfile=objs.dump"
Legacy Mode has set reuse_dumpfiles=true parameter.
Starting "DBMT"."SYS_EXPORT_TABLE_01": dbmt/******** directory=ohome dumpfile=objs.dump tables=dbmt.objs reuse_dumpfiles=true
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 32 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/INDEX/INDEX
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "DBMT"."OBJS":"P201601" 1.162 MB 21578 rows
. . exported "DBMT"."OBJS":"P201602" 1.161 MB 21579 rows
. . exported "DBMT"."OBJS":"P201603" 1.162 MB 21579 rows
. . exported "DBMT"."OBJS":"P201604" 1.161 MB 21578 rows
. . exported "DBMT"."OBJS":"PMAX" 0 KB 0 rows
Master table "DBMT"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for DBMT.SYS_EXPORT_TABLE_01 is:
/home/oracle/objs.dump
Job "DBMT"."SYS_EXPORT_TABLE_01" successfully completed at Mon May 11 10:31:40 2020 elapsed 0 00:00:17
SQL> select inst_id,SVCNAM,count(*) from X$KEWSSVCV group by inst_id,SVCNAM;
INST_ID SVCNAM COUNT(*)
---------- ---------------------------------------------------------------- ----------
1 racdb 28
1 SYS$USERS 28
1 racdbXDB 28
1 SYS$BACKGROUND 28
1 --UNKNOWN-- 56 <<<<56-0
SQL> alter system flush shared_pool;
System altered.
SQL> select inst_id,SVCNAM,count(*) from X$KEWSSVCV group by inst_id,SVCNAM;
INST_ID SVCNAM COUNT(*)
---------- ---------------------------------------------------------------- ----------
1 racdb 28
1 SYS$USERS 28
1 racdbXDB 28
1 SYS$BACKGROUND 28
1 --UNKNOWN-- 56
SQL>
SQL> exec dbms_stats.FLUSH_DATABASE_MONITORING_INFO();
PL/SQL procedure successfully completed.
SQL> select inst_id,SVCNAM,count(*) from X$KEWSSVCV group by inst_id,SVCNAM;
INST_ID SVCNAM COUNT(*)
---------- ---------------------------------------------------------------- ----------
1 racdb 28
1 SYS$USERS 28
1 racdbXDB 28
1 SYS$BACKGROUND 28
1 --UNKNOWN-- 56
SQL>
SQL> delete X$KEWSSVCV where SVCNAM='--UNKNOWN--';
delete X$KEWSSVCV where SVCNAM='--UNKNOWN--'
*
ERROR at line 1:
ORA-02030: can only select from fixed tables/views
------- Test environment: a newly built RAC 19.3 system on which expdp/impdp had never been run
SQL> show pdbs
CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 PDB MOUNTED
SQL> select inst_id,SVCNAM,count(*) from X$KEWSSVCV group by inst_id,SVCNAM;
INST_ID SVCNAM COUNT(*)
---------- ---------------------------------------------------------------- ----------
1 SYS$USERS 29
1 orclXDB 29
1 orcl 29
1 SYS$BACKGROUND 29
SQL> alter session set container=PDB;
Session altered.
SQL> select inst_id,SVCNAM,count(*) from X$KEWSSVCV group by inst_id,SVCNAM;
INST_ID SVCNAM COUNT(*)
---------- ---------------------------------------------------------------- ----------
1 pdb 29
----- Export table c##dbmt.obj_test from the root container
[oracle@rac1:/home/oracle]$ expdp c##dbmt/**** directory=ohome file=objs.dump tables=c##dbmt.obj_test
Export: Release 19.0.0.0.0 - Production on Mon May 11 16:33:50 2020
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Legacy Mode Active due to the following parameters:
Legacy Mode Parameter: "file=objs.dump" Location: Command Line, Replaced with: "dumpfile=objs.dump"
Legacy Mode has set reuse_dumpfiles=true parameter.
Warning: Oracle Data Pump operations are not typically needed when connected to the root or seed of a container database.
Starting "C##DBMT"."SYS_EXPORT_TABLE_01": c##dbmt/******** directory=ohome dumpfile=objs.dump tables=c##dbmt.obj_test reuse_dumpfiles=true
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "C##DBMT"."OBJ_TEST" 9.554 MB 72416 rows
Master table "C##DBMT"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for C##DBMT.SYS_EXPORT_TABLE_01 is:
/home/oracle/objs.dump
Job "C##DBMT"."SYS_EXPORT_TABLE_01" successfully completed at Mon May 11 16:55:30 2020 elapsed 0 00:12:27
---- Query from the root container
SQL> select inst_id,SVCNAM,count(*) from X$KEWSSVCV group by inst_id,SVCNAM;
INST_ID SVCNAM COUNT(*)
---------- ---------------------------------------------------------------- ----------
1 pdb 29
1 SYS$USERS 29
1 orclXDB 29
1 orcl 29
1 --UNKNOWN-- 58 <<<<<<<<<<<<
1 SYS$BACKGROUND 29
6 rows selected.
----- Query from the PDB
SQL> select inst_id,SVCNAM,count(*) from X$KEWSSVCV group by inst_id,SVCNAM;
INST_ID SVCNAM COUNT(*)
---------- ---------------------------------------------------------------- ----------
1 pdb 29
1 --UNKNOWN-- 58 <<<<<<<<<<<<<<<<,
----- Export table dbmt.obj_test from the PDB
ORA19C_pdb =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.3.11)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = PDB)
)
)
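Yunhe Enmo (Beijing) Information Technology Co., Ltd. (云和恩墨)
<CDR database> <2020/05/09>
Incident Report: AWR Snapshots Cannot Be Generated
Copyright and Confidentiality Notice
Unless otherwise noted, all text, document formats, illustrations, photographs, methods, and processes in this document are the property of Yunhe Enmo (Beijing) Information Technology Co., Ltd. and are protected by applicable property and copyright law. No organization or individual may copy or quote any part of this document, in electronic or non-electronic form, without written authorization from Yunhe Enmo (Beijing) Information Technology Co., Ltd.
Copyright © 2019 Yunhe Enmo (Beijing) Information Technology Co., Ltd. All rights reserved.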
Document Revision History
Date | Version | Change status | Description | Author
2020/05/09 | 1.0 | C | | Li Jiacheng (李嘉诚)
2020/05/13 | 1.1 | A | | Li Jiacheng (李嘉诚)
*Change status: C = created, A = added, M = modified, D = deleted
Document Approval
Version | Approver | Approver role | Approval date | Release date | Remarks
1.2 | Zhang Weizhao (张维照) | | 2020/05/13 | |
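1. Incident Overview
For a recent maintenance task, an AWR report for node 1 of the CDR database was needed for comparison and analysis, but after running @?/rdbms/admin/awrrpt.sql there were no recent snapshots to choose from.
The detailed diagnosis and the solution are described below.
2. Incident Analysis
2.1. Symptoms
When generating the AWR report, node 1 had no selectable snap IDs after April 13, so the required report could not be produced.
The snapshot settings were fine: retention of 45 days, with a snapshot taken automatically every hour.
The MMON process was also running.
Check the snapshots: select snap_id,instance_number,begin_interval_time,end_interval_time from dba_hist_snapshot order by instance_number,begin_interval_time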
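The report does not show how the settings and the MMON process were checked. A minimal sketch of those two checks, using standard dictionary views, might be:

```sql
-- Snapshot interval and retention (both INTERVAL DAY TO SECOND values);
-- an hourly interval appears as +00000 01:00:00.0.
select dbid, snap_interval, retention
from dba_hist_wr_control;

-- Confirm that the MMON and MMNL background processes exist on this instance.
select pname, spid
from v$process
where pname in ('MMON', 'MMNL');
```

On RAC, gv$process with an inst_id column in the select list covers all instances in one query.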
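2.2. Root Cause
Log analysis showed the following.
The alert log showed the following problem recurring at intervals.
Who maintains AWR?
MMON is the AWR background process, introduced in Oracle 10g; the other related process is MMNL.
The ASH data kept in memory is always limited in size. To preserve history, Oracle introduced the Automatic Workload Repository (AWR), which is maintained by the MMON background process. ASH data is also sampled and written to AWR. Because the in-memory buffer is finite, MMNL writes ASH data out to AWR when the buffer fills up. Writing out all of ASH would be unacceptable, so normally only about 10% of the collected samples are written, using direct-path insert to keep redo generation low and minimize the impact on database performance. The ASH data written to AWR is stored in the base table WRH$_ACTIVE_SESSION_HISTORY, a partitioned table that Oracle purges automatically.
Analyzing the MMON trace
shows "auto flush suspended".
The M000 trace shows the process waiting for a long time and then hitting ORA-12751.
At this point it is certain that MMON ran into a problem and therefore could not generate snapshots, so the analysis continued.
We tried creating snapshots manually while tracing with event 10046. In several attempts the snapshot was not created and the session itself was killed; colleagues confirmed that nobody had killed any sessions.
Taking an AWR snapshot executes many SQL statements and has a built-in timeout mechanism, which by default times out after 300 to 900 seconds. If snapshot generation stalls, it therefore raises "time limit violation" or "run time exceeded" messages together with the ORA-12751 error.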
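The exact commands used for the traced manual snapshot are not shown in the report. A typical SQL*Plus session for this kind of test might look like the sketch below; the tracefile identifier is a made-up label.

```sql
-- Hypothetical label so the trace file is easy to find.
alter session set tracefile_identifier = 'awr_snap_10046';
-- Level 12 records bind values and wait events.
alter session set events '10046 trace name context forever, level 12';
exec dbms_workload_repository.create_snapshot();
alter session set events '10046 trace name context off';
-- Location of this session's trace file (11g and later).
select value from v$diag_info where name = 'Default Trace File';
```

A manual snapshot runs in the calling session, which is why the trace ends up in that session's own trace file and why it is that session that gets killed on timeout.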
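The 10046 trace shows which sql_id took the longest; it was at this step that the process was killed on timeout.
After formatting:
insert into WRH$_SERVICE_STAT
(snap_id, dbid, instance_number, service_name_hash, stat_id, value)
select :snap_id, :dbid, :instance_number, stat.service_name_hash, stat.stat_id, stat.value
from v$active_services asvc, v$service_stats stat
where asvc.name_hash = stat.service_name_hash
This SQL has several execution plans, one of which takes exceptionally long and stands out.
The three tables referenced in the execution plan have no statistics; in particular, NUM_ROWS for X$KEWSSVCV differs significantly from the actual row count.
As a result, the optimizer chose an unsuitable join order among X$KEWSSMAP, X$KEWSSVCV and X$KSWSASTAB, and picked an inefficient index to access X$KEWSSVCV.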
SQL> select count(*) from X$KSWSASTAB;
COUNT(*)
----------
8
SQL> select count(*) from X$KEWSSMAP;
COUNT(*)
----------
265
SQL> select count(*) from X$KEWSSVCV;
COUNT(*)
----------
2576336
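To set these actual row counts against what the optimizer has recorded, the stored statistics for the three fixed tables can be queried. This is a sketch; fixed tables whose statistics were never gathered show NULL in NUM_ROWS and LAST_ANALYZED.

```sql
select table_name, num_rows, last_analyzed
from dba_tab_statistics
where owner = 'SYS'
  and object_type = 'FIXED TABLE'
  and table_name in ('X$KSWSASTAB', 'X$KEWSSMAP', 'X$KEWSSVCV');
```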
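2.3. Resolution
Fixed indexes cannot be found in dba_indexes. They are in-memory C structures and are identified by number, for example ind:2 or ind:1. Before statistics were gathered, access to X$KEWSSVCV used index 2, on column KEWSOFF; that column holds a heavily repeated sequence, so its selectivity is poor. After statistics were gathered, index 1 (column SVCHSH) was used; that column appears to be the service hash and is highly selective. With index 1, the average execution time of the SQL is in the millisecond range.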
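Although fixed indexes are absent from dba_indexes, their indexed columns are exposed through v$indexed_fixed_column. A sketch for listing the indexes on X$KEWSSVCV:

```sql
-- INDEX_NUMBER corresponds to the "ind:N" shown in execution plans.
select table_name, index_number, column_name, column_position
from v$indexed_fixed_column
where table_name = 'X$KEWSSVCV'
order by index_number, column_position;
```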
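With that the problem is solved and AWR snapshots can be created manually. Next question: why is gv$service_stats (X$KEWSSVCV) so large? The name presumably stands for Automatic Workload Repository (kew) Statistics SerViCe V (a guess).
The alert log contains a large number of ALTER SYSTEM SET service_names statements, which Data Pump issues when a job starts.
Listing the services currently registered in the database shows a large number of --UNKNOWN-- entries for instance 1.
During Data Pump processing, the number of "--UNKNOWN--" service entries grows as services are created and dropped. The CDR system runs a large number of expdp table-level backup jobs every month and hands the results to NBU; the launching processes all run on node 1, which is why node 1 has so many UNKNOWN entries.
Looking at the definition of the v$service_stats view
shows that X$KEWSSVCV is also the base table of gv$service_stats, and that the slow insert into WRH$_SERVICE_STAT during snapshot creation uses this table as well.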
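The X$ tables behind a fixed view can be read from v$fixed_view_definition. A sketch for the two views joined by the slow insert:

```sql
select view_name, view_definition
from v$fixed_view_definition
where view_name in ('GV$SERVICE_STATS', 'GV$ACTIVE_SERVICES');
```

The V$ views are defined on top of the corresponding GV$ views, so the GV$ definitions are the ones that name the X$ tables.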
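Every month a large number of expdp jobs are launched on node 1 to export the previous month's data as a backup. This leaves a large number of --UNKNOWN-- rows in the base table X$KEWSSVCV and makes its statistics inaccurate (in 11g, Oracle does not gather statistics on some X$ tables by default).
After several hourly snapshot cycles, snapshots are again generated automatically at the correct interval: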
with t as
(select snap_id,instance_number,begin_interval_time,end_interval_time,flush_elapsed,snap_flag,
row_number() over(partition by instance_number order by begin_interval_time desc) num
from (select snap_id,instance_number,begin_interval_time,end_interval_time,flush_elapsed,snap_flag
from dba_hist_snapshot
order by instance_number, begin_interval_time))
select snap_id,instance_number,begin_interval_time,end_interval_time,flush_elapsed,snap_flag
from t where num <= 150
order by instance_number, begin_interval_time;
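The snapshot interval is currently set to one hour. Compare the snapshot times of the four nodes closely:
the other nodes also show gaps of two hours or more between snapshots, with FLUSH_ELAPSED running into the timeout.
Because the statistics were inaccurate, the snapshot statement ended up with several execution plans. At the times when no snapshot was produced, the SQL must have run with a bad plan, timed out, and been killed, so no snapshot exists for those times.
In the original figure, blue boxes mark snapshots taken at irregular times. SNAP_FLAG=1 means a manually created snapshot; 0 means an automatic one.
3. Permanent Solution and Recommendations
X$KEWSSVCV is an in-memory table whose entries are generally released only by restarting the instance. Gathering statistics on these tables is therefore the practical fix: it lets the CBO choose the right execution plan for the snapshot-creation statement.
4. Other Notes
This is a 4-node RAC. Because all expdp export jobs run on node 1, the growth of the in-memory table X$KEWSSVCV is most pronounced on node 1. The "--UNKNOWN--" service rows in X$KEWSSVCV (not recorded as a bug on MOS) are inserted each time an expdp run finishes, and testing shows the behavior still exists in 19c. X$ fixed tables can only be queried: no DML is allowed, not even as SYSDBA. If the problem persists after statistics have been gathered (even with the right index in use), or the SQL is still unacceptably slow, restart the instance to release the X$ fixed table.
19c also has the --UNKNOWN-- entries.
From 12c onward, statistics on X$ tables can be gathered automatically, so this kind of problem should not occur.
5. Attachments
References: MOS Doc ID 2294282.1 and 2043531.1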