RAC with ASM on AIX: ORA-01114 error with "gipcretAuthFail (22)" in ocssd.log

I/O Errors in Alert log with ORA-29701, with "gipcWait failed with 16" in trace (Doc ID 1496329.1)

1. Database alert log

Fri May 04 10:56:59 2018
Errors in file /oracle/app/oracle/diag/rdbms/orcl/rocl1/trace/rocl1_ora_65536796.trc:
ORA-01114: IO error writing block to file  (block # )
Fri May 04 10:57:00 2018

2. Trace file

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0/db_1
System name:    AIX
Node name:      rac1
Release:        1
Version:        7
Machine:        00F6E7C84C00
Instance name: rocl1
Redo thread mounted by this instance: 1
Oracle process number: 1540
Unix process pid: 13962128, image: oracle@rac1


*** 2018-05-04 10:56:58.840
*** SESSION ID:(292.52991) 2018-05-04 10:56:58.840
*** CLIENT ID:() 2018-05-04 10:56:58.840
*** SERVICE NAME:(orcl) 2018-05-04 10:56:58.840
*** MODULE NAME:(JDBC Thin Client) 2018-05-04 10:56:58.840
*** ACTION NAME:() 2018-05-04 10:56:58.840
 
2018-05-04 10:56:58.828: [ CSSCLNT]clssscConnect: gipcWait failed with 16 (12)
2018-05-04 10:56:58.840: [ CSSCLNT]clsssInitNative: connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_scdb02_)) failed, rc 16
kgxgncin: CLSS init failed with status 3
kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS
kjfmsgr: unable to connect to NM for reg in shared group
ORA-01114: IO error writing block to file  (block # )
Dump of memory from 0x070001209CBA0328 to 0x070001209CBA0D3B
70001209CBA0320                   57495448 20544F44          [WITH TOD]

3. ocssd.log

-- Check the file /oracle/app/11.2.0/grid/log/rac1/cssd/ocssd.log
2018-05-04 10:56:59.495: [    CSSD][1029]clssgmQueueShare: (11ba99f10) target global grock DBORCL member 1 type 1 queued from client (1176496b0), global grock DBORCL, refcount 757
2018-05-04 10:56:59.495: [    CSSD][1029]clssgmRegisterShared: global grock DBORCL member 1 share type 1, refcount 757
2018-05-04 10:56:59.743: [GIPCXCPT][1029] gipcmodMuxTransferAccept: internal accept request failed endp 1112a2970, child 11ba653d0, ret gipcretAuthFail (22) 
2018-05-04 10:56:59.743: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretAuthFail (22) ]  error during accept on endp 1112a2970
2018-05-04 10:56:59.744: [GIPCXCPT][1029] gipcmodClscCallback: async request failed req 1172b0bf0 [00000000e3b63bc0] { gipcSendRequest : addr '', data 11727c490, len 48, olen 0, parentEndp 11abbcef0, ret gipcretConnectionLost (12), objFlags 0x0, reqFlags 0x224 }, ret gipcretConnectionLost (12)
2018-05-04 10:56:59.745: [GIPCXCPT][1029] gipcmodMuxTransferAccept: internal accept request failed endp 1112a2970, child 11abbcef0, ret gipcretConnectionInvalid (13)
2018-05-04 10:56:59.745: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretConnectionInvalid (13) ]  error during accept on endp 1112a2970
2018-05-04 10:56:59.804: [    CSSD][1029]clssscSelect: cookie accept request 11ad57f10
2018-05-04 10:56:59.804: [    CSSD][1029]clssscevtypSHRCON: getting client with cmproc 11ad57f10
2018-05-04 10:56:59.804: [    CSSD][1029]clssgmRegisterClient: proc(7589/11ad57f10), client(2/1174aaa90)
2018-05-04 10:56:59.804: [    CSSD][1029]clssscSelect: cookie accept request 11ba74630
2018-05-04 10:56:59.804: [    CSSD][1029]clssscevtypSHRCON: getting client with cmproc 11ba74630
2018-05-04 10:56:59.804: [    CSSD][1029]clssgmRegisterClient: proc(7591/11ba74630), client(1/117497510)
2018-05-04 10:56:59.931: [    CSSD][1029]clssgmRegisterShared: grp DG_LOCAL_DATA, mbr 0, type 1
2018-05-04 10:56:59.931: [    CSSD][1029]clssgmQueueShare: (11a93a690) target local grock DG_LOCAL_DATA member 0 type 1 queued from client (1174aaa90), local grock DG_LOCAL_DATA, refcount 721
2018-05-04 10:56:59.931: [    CSSD][1029]clssgmRegisterShared: local grock DG_LOCAL_DATA member 0 share type 1, refcount 721
2018-05-04 10:56:59.932: [    CSSD][1029]clssgmRegisterShared: grp DBORCL, mbr 1, type 1
2018-05-04 10:56:59.932: [    CSSD][1029]clssgmQueueShare: (11a93ab70) target global grock DBORCL member 1 type 1 queued from client (117497510), global grock DBORCL, refcount 758
2018-05-04 10:56:59.932: [    CSSD][1029]clssgmRegisterShared: global grock DBORCL member 1 share type 1, refcount 758
2018-05-04 10:57:00.194: [GIPCXCPT][1029] gipcmodClscCallback: async request failed req 11730eff0 [00000000e3b63c64] { gipcSendRequest : addr '', data 1172fce90, len 48, olen 0, parentEndp 11abbcef0, ret gipcretConnectionLost (12), objFlags 0x0, reqFlags 0x224 }, ret gipcretConnectionLost (12)
2018-05-04 10:57:00.195: [GIPCXCPT][1029] gipcmodMuxTransferAccept: internal accept request failed endp 1112a2970, child 11abbcef0, ret gipcretConnectionInvalid (13)
2018-05-04 10:57:00.195: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretConnectionInvalid (13) ]  error during accept on endp 1112a2970
2018-05-04 10:57:00.254: [    CSSD][1029]clssscSelect: cookie accept request 11ba4a590
2018-05-04 10:57:00.254: [    CSSD][1029]clssscevtypSHRCON: getting client with cmproc 11ba4a590
2018-05-04 10:57:00.254: [    CSSD][1029]clssgmRegisterClient: proc(7590/11ba4a590), client(2/11764d8f0)
2018-05-04 10:57:00.254: [    CSSD][1029]clssscSelect: cookie accept request 1109c2e00
2018-05-04 10:57:00.254: [    CSSD][1029]clssgmAllocProc: (11bac8dd0) allocated
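
When ocssd.log is large, it can help to tally which GIPC return codes appear instead of reading the file by eye. The following is a minimal Python sketch under stated assumptions: the regular expression is inferred only from the log lines shown above, and count_gipc_returns is a hypothetical helper name, not an Oracle tool.

```python
import re
from collections import Counter

# Matches tokens such as "ret gipcretAuthFail (22)".
# The pattern is an assumption inferred from the excerpt above.
GIPC_RET = re.compile(r"ret (gipcret\w+) \((\d+)\)")


def count_gipc_returns(lines):
    """Count each (name, code) GIPC return value found in the given log lines."""
    counts = Counter()
    for line in lines:
        for name, code in GIPC_RET.findall(line):
            counts[(name, int(code))] += 1
    return counts
```

The function accepts any iterable of strings, so an open file object for ocssd.log can be passed to it directly. A line that reports the same return code twice (as the gipcmodClscCallback lines do) is counted twice.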

4. Check CRS_HOME space and files

The directory has sufficient free space.
ls -ld /var/tmp/.oracle
drwxrwxrwt    2 root     oinstall        256 Nov 23 2014  /var/tmp/.oracle
ls -ld /tmp/.oracle
drwxrwxrwt    2 root     oinstall       4096 Jan 23 01:43 /tmp/.oracle
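
The same two checks (the socket directories exist, and the filesystem has free space) can be scripted so they can be repeated on every node. This is a minimal Python sketch; check_socket_dirs and free_gib are hypothetical helper names, and the default paths are simply the two directories listed above.

```python
import os
import shutil


def check_socket_dirs(paths=("/var/tmp/.oracle", "/tmp/.oracle")):
    """Return {path: True/False} depending on whether each path is an existing directory."""
    return {p: os.path.isdir(p) for p in paths}


def free_gib(path):
    """Return the free space, in GiB, of the filesystem that holds path."""
    return shutil.disk_usage(path).free / 2**30
```

Calling free_gib on the Grid home path (here /oracle/app/11.2.0/grid, taken from the ocssd.log location above) gives the free space to compare against whatever threshold the site considers safe.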

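5. At this point the number of active sessions in the database surged; statement 459f3z9u4fb3u, which queries dictionary views, waited on the "cursor: pin S wait on X" event, and the SGA components were shrinking and growing frequently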

SHRINK      |IMMEDIATE   |db_cache_size        |       93696|      93184|     93184|COMPLETE |05/03 16:44          |         1
SHRINK      |IMMEDIATE   |db_cache_size        |       93696|      93184|     93184|COMPLETE |05/03 16:44          |         2
SHRINK      |IMMEDIATE   |db_cache_size        |       93696|      93184|     93184|COMPLETE |05/03 16:44          |         2
GROW        |IMMEDIATE   |shared_pool_size     |       32768|      33280|     33280|COMPLETE |05/03 16:44          |         3
GROW        |IMMEDIATE   |shared_pool_size     |       32768|      33280|     33280|COMPLETE |05/03 16:44          |         3
GROW        |IMMEDIATE   |shared_pool_size     |       32768|      33280|     33280|COMPLETE |05/03 16:44          |         2
SHRINK      |IMMEDIATE   |db_cache_size        |       93184|      92672|     92672|COMPLETE |05/03 16:44          |         2
SHRINK      |IMMEDIATE   |db_cache_size        |       93184|      92672|     92672|COMPLETE |05/03 16:44          |         3
SHRINK      |IMMEDIATE   |db_cache_size        |       93184|      92672|     92672|COMPLETE |05/03 16:44          |         3
SHRINK      |IMMEDIATE   |db_cache_size        |       92672|      92160|     92160|COMPLETE |05/03 16:45          |         3
GROW        |IMMEDIATE   |shared_pool_size     |       33280|      33792|     33792|COMPLETE |05/03 16:45          |         3
GROW        |DEFERRED    |db_cache_size        |       92160|      92672|     92672|COMPLETE |05/03 16:55          |         1
SHRINK      |DEFERRED    |shared_pool_size     |       33792|      33280|     33280|COMPLETE |05/03 16:55          |         1
SHRINK      |DEFERRED    |shared_pool_size     |       33280|      32768|     32768|COMPLETE |05/04 09:53          |         0
GROW        |DEFERRED    |db_cache_size        |       92672|      93184|     93184|COMPLETE |05/04 09:53          |         0
GROW        |DEFERRED    |db_cache_size        |       93184|      93696|     93696|COMPLETE |05/04 10:02          |        88
SHRINK      |DEFERRED    |shared_pool_size     |       32768|      32256|     32256|COMPLETE |05/04 10:02          |        88
GROW        |DEFERRED    |db_cache_size        |       93696|      94208|     94208|COMPLETE |05/04 10:53          |       104
SHRINK      |DEFERRED    |shared_pool_size     |       32256|      31744|     31744|COMPLETE |05/04 10:53          |       104
SHRINK      |IMMEDIATE   |db_cache_size        |       94208|      93696|     93696|COMPLETE |05/04 10:54          |         1
GROW        |IMMEDIATE   |shared_pool_size     |       31744|      32256|     32256|COMPLETE |05/04 10:54          |         1
GROW        |IMMEDIATE   |shared_pool_size     |       32256|      32768|     32768|COMPLETE |05/04 10:54          |         7
SHRINK      |IMMEDIATE   |db_cache_size        |       93696|      93184|     93184|COMPLETE |05/04 10:54          |         6
GROW        |IMMEDIATE   |shared_pool_size     |       32256|      32768|     32768|COMPLETE |05/04 10:54          |         6
SHRINK      |IMMEDIATE   |db_cache_size        |       93696|      93184|     93184|COMPLETE |05/04 10:54          |         7
GROW        |IMMEDIATE   |shared_pool_size     |       32768|      33280|     33280|COMPLETE |05/04 10:55          |         1
SHRINK      |IMMEDIATE   |db_cache_size        |       93184|      92672|     92672|COMPLETE |05/04 10:55          |         1
SHRINK      |IMMEDIATE   |db_cache_size        |       92672|      92160|     92160|COMPLETE |05/04 10:55          |         4
SHRINK      |IMMEDIATE   |db_cache_size        |       92672|      92160|     92160|COMPLETE |05/04 10:55          |         1
GROW        |IMMEDIATE   |shared_pool_size     |       33280|      33792|     33792|COMPLETE |05/04 10:55          |         4
GROW        |IMMEDIATE   |shared_pool_size     |       33280|      33792|     33792|COMPLETE |05/04 10:55          |         1
SHRINK      |DEFERRED    |shared_pool_size     |       33792|      33280|     33280|COMPLETE |05/04 11:09          |        85
GROW        |DEFERRED    |db_cache_size        |       92160|      92672|     92672|COMPLETE |05/04 11:09          |        85
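
To make the thrashing visible, the listing above can be grouped by minute. This is a minimal Python sketch with loudly stated assumptions: the listing has no header, so the field names below are guesses based on the columns of V$SGA_RESIZE_OPS, and the meaning of the last column is not stated in the source. parse_resize_line and immediate_ops_per_minute are hypothetical helper names.

```python
from collections import Counter


def parse_resize_line(line):
    """Split one pipe-delimited row into a dict; the field names are assumptions."""
    f = [x.strip() for x in line.split("|")]
    return {
        "oper_type": f[0],          # GROW or SHRINK
        "oper_mode": f[1],          # IMMEDIATE or DEFERRED
        "parameter": f[2],
        "initial_size": int(f[3]),
        "target_size": int(f[4]),
        "final_size": int(f[5]),
        "status": f[6],
        "time": f[7],               # "MM/DD HH:MI" exactly as printed
        "last_col": int(f[8]),      # meaning not stated in the source
    }


def immediate_ops_per_minute(lines):
    """Count IMMEDIATE resize operations per printed minute."""
    counts = Counter()
    for line in lines:
        if "|" not in line:
            continue  # skip blank or non-table lines
        row = parse_resize_line(line)
        if row["oper_mode"] == "IMMEDIATE":
            counts[row["time"]] += 1
    return counts
```

Applied to the listing above, this counts six IMMEDIATE operations at 05/04 10:54 and six more at 05/04 10:55, immediately before the ORA-01114 reported at 10:56:59.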

Cause 3. ocssd log has "gipcretAuthFail (22)" (Doc ID 1496329.1)

Example:

2012-09-08 05:26:31.168: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretAuthFail (22) ]  error during accept on endp 111249b70
gipcretAuthFail (22) indicates "general security authorization failure".

This could occur for multiple reasons: 
* if filesystem is full and there is no space to create file under auth directory. Please check if there is sufficient space in CRS_HOME. 
* This issue could also occur if the /var/tmp/.oracle socket (/tmp/.oracle on some platforms) is deleted. Please check this too.

The findings match [Cause 3. ocssd log has "gipcretAuthFail (22)" (Doc ID 1496329.1)], but in our case the database software directory has sufficient space and the .oracle directories exist.

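Analysis summary: the ORA-01114 alert was caused by database performance problems that were triggered by SGA thrashing (the frequent shrinking and growing shown above).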

Recommendation: increase the SGA size from 132G to 180G (the value suggested by v$sga_target_advice).

Reposted from www.cnblogs.com/wandering-mind/p/8992892.html