Case study: troubleshooting a cluster startup that stalled at CSS with errors

Copyright notice: this is an original article by the author; reposting is welcome. https://blog.csdn.net/qq_40687433/article/details/83106944

This morning the server hosting one node of a RAC cluster went down.

After the server came back up, I logged in to check the startup status of the cluster and the database.

Check whether CRS is starting. If the processes below are present, it is either still starting or already up.

# ps -ef|grep d.bin
root       7528      1  2 09:35 ?        00:00:01 /grid/app/11.2.0/grid/bin/ohasd.bin reboot
grid       7916      1  0 09:35 ?        00:00:00 /grid/app/11.2.0/grid/bin/oraagent.bin
grid       7928      1  0 09:35 ?        00:00:00 /grid/app/11.2.0/grid/bin/mdnsd.bin
grid       7940      1  0 09:35 ?        00:00:00 /grid/app/11.2.0/grid/bin/gpnpd.bin
grid       7954      1  0 09:35 ?        00:00:00 /grid/app/11.2.0/grid/bin/gipcd.bin
root       7955      1  0 09:35 ?        00:00:00 /grid/app/11.2.0/grid/bin/orarootagent.bin
root       7971      1  5 09:35 ?        00:00:02 /grid/app/11.2.0/grid/bin/osysmond.bin
root       7992      1  0 09:35 ?        00:00:00 /grid/app/11.2.0/grid/bin/cssdmonitor
root       8011      1  0 09:35 ?        00:00:00 /grid/app/11.2.0/grid/bin/cssdagent
grid       8022      1  0 09:35 ?        00:00:00 /grid/app/11.2.0/grid/bin/ocssd.bin
root       8630   7235  0 09:36 pts/0    00:00:00 grep --color=auto d.bin

Check which stage the cluster startup has reached

 # /grid/app/11.2.0/grid/bin/crsctl stat res -t -init
-------------------------------------------------------------
NAME           TARGET  STATE        SERVER     STATE_DETAILS       
-------------------------------------------------------------
Cluster Resources
-------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                               
ora.crf
      1        ONLINE  ONLINE       xsdbd31                  
ora.crsd
      1        ONLINE  OFFLINE                               
ora.cssd
      1        ONLINE  OFFLINE                 STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       xsdbd31                  
ora.ctssd
      1        ONLINE  OFFLINE                               
ora.diskmon
      1        OFFLINE OFFLINE                               
ora.evmd
      1        ONLINE  OFFLINE                               
ora.gipcd
      1        ONLINE  ONLINE       xsdbd31                  
ora.gpnpd
      1        ONLINE  ONLINE       xsdbd31                  
ora.mdnsd
      1        ONLINE  ONLINE       xsdbd31 

The cluster startup had reached the CSS stage, with ora.cssd in the STARTING state.
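
With a dozen resources in the table, the ones that are stuck are easy to miss. The following is a minimal sketch that filters the output down to resources whose STATE differs from their TARGET. It assumes the tabular layout shown above (a resource name line followed by an instance line), and the function name not_at_target is made up for this illustration.

```shell
# Print "name TARGET STATE" for every resource whose STATE differs from its TARGET.
# Reads the output of "crsctl stat res -t -init" on standard input.
not_at_target() {
    awk '
        /^ora\./ { name = $1; next }         # resource name line, e.g. ora.cssd
        name != "" && $1 ~ /^[0-9]+$/ {      # instance line: index TARGET STATE ...
            if ($2 != $3) print name, $2, $3
            name = ""
        }
    '
}

# Intended use on a cluster node (path taken from the commands above):
#   /grid/app/11.2.0/grid/bin/crsctl stat res -t -init | not_at_target
```

A resource such as ora.diskmon, whose target and state are both OFFLINE, is deliberately not reported, since it is where it is supposed to be.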

Normally cluster startup takes some time, and after a few minutes of waiting it moves on to the next stage.

Today, however, something went wrong.

After several minutes ora.cssd still had not come up, so I checked the CSS log (11g: $ORACLE_HOME/log/<hostname>/cssd/ocssd.log, where $ORACLE_HOME is the Grid home)

$ tail -f  ocssd.log
2018-10-17 09:41:38.087: [    CSSD][2186336000]clssscSelect: cookie accept request 0x13bbad0
2018-10-17 09:41:38.087: [    CSSD][2186336000]clssgmAllocProc: (0x7f1c6c085110) allocated
2018-10-17 09:41:38.088: [    CSSD][2186336000]clssgmClientConnectMsg: properties of cmProc 0x7f1c6c085110 - 1,2,3,4,5
2018-10-17 09:41:38.088: [    CSSD][2186336000]clssgmClientConnectMsg: Connect from con(0x38a1) proc(0x7f1c6c085110) pid(7954) version 11:2:1:4, properties: 1,2,3,4,5
2018-10-17 09:41:38.088: [    CSSD][2186336000]clssgmClientConnectMsg: msg flags 0x0000
2018-10-17 09:41:38.320: [    CSSD][2186336000]clssscSelect: cookie accept request 0x7f1c6c06ecf0
2018-10-17 09:41:38.320: [    CSSD][2186336000]clssscevtypSHRCON: getting client with cmproc 0x7f1c6c06ecf0
2018-10-17 09:41:38.320: [    CSSD][2186336000]clssgmRegisterClient: proc(3/0x7f1c6c06ecf0), client(341/0x7f1c6c083900)
2018-10-17 09:41:38.320: [    CSSD][2186336000]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:38.320: [    CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x38c7
2018-10-17 09:41:39.321: [    CSSD][2186336000]clssscSelect: cookie accept request 0x7f1c6c06ecf0
2018-10-17 09:41:39.321: [    CSSD][2186336000]clssscevtypSHRCON: getting client with cmproc 0x7f1c6c06ecf0
2018-10-17 09:41:39.321: [    CSSD][2186336000]clssgmRegisterClient: proc(3/0x7f1c6c06ecf0), client(342/0x7f1c6c083420)
2018-10-17 09:41:39.321: [    CSSD][2186336000]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:39.321: [    CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x38dd
2018-10-17 09:41:39.663: [    CSSD][2186336000]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:39.663: [    CSSD][2186336000]clssgmDeadProc: proc 0x7f1c6c085110
2018-10-17 09:41:39.663: [    CSSD][2186336000]clssgmDestroyProc: cleaning up proc(0x7f1c6c085110) con(0x38a1) skgpid  ospid 7954 with 0 clients, refcount 0
2018-10-17 09:41:39.663: [    CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x38a1
2018-10-17 09:41:40.323: [    CSSD][2186336000]clssscSelect: cookie accept request 0x7f1c6c06ecf0
2018-10-17 09:41:40.323: [    CSSD][2186336000]clssscevtypSHRCON: getting client with cmproc 0x7f1c6c06ecf0
2018-10-17 09:41:40.323: [    CSSD][2186336000]clssgmRegisterClient: proc(3/0x7f1c6c06ecf0), client(343/0x7f1c6c083010)
2018-10-17 09:41:40.323: [    CSSD][2186336000]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:40.323: [    CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x3901
2018-10-17 09:41:40.841: [    GPNP][2183792384]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2104] get-profile call to url "ipc://GPNPD_xsdbd31" disco "" [f=0 claimed- host: cname:  
2018-10-17 09:41:40.848: [    GPNP][2183792384]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2234] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote "ipc://GPNPD_xsdbd31"
2018-10-17 09:41:40.848: [    CSSD][2183792384]clssnmReadDiscoveryProfile: voting file discovery string(/dev/raw/raw*)
2018-10-17 09:41:40.848: [    CSSD][2183792384]clssnmvDDiscThread: using discovery string /dev/raw/raw* for initial discovery 
2018-10-17 09:41:40.848: [   SKGFD][2183792384]Discovery with str:/dev/raw/raw*:

2018-10-17 09:41:40.848: [   SKGFD][2183792384]UFS discovery with :/dev/raw/raw*:

2018-10-17 09:41:40.848: [   SKGFD][2183792384]Execute glob on the string /dev/raw/raw*
2018-10-17 09:41:40.848: [   SKGFD][2183792384]running stat on disk:/dev/raw/rawctl
2018-10-17 09:41:40.848: [   SKGFD][2183792384]WARNING: Using brute force method to determine the size of /dev/raw/rawctl.
 There will be performance issues. Please check configuration to determine the cause for the failure of ioctl

2018-10-17 09:41:40.848: [   SKGFD][2183792384]Fetching UFS disk :/dev/raw/rawctl:

2018-10-17 09:41:40.848: [   SKGFD][2183792384]OSS discovery with :/dev/raw/raw*:

2018-10-17 09:41:40.848: [    CSSD][2183792384]clssnmvDiskVerify: Successful discovery of 0 disks
2018-10-17 09:41:40.848: [    CSSD][2183792384]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2018-10-17 09:41:40.848: [    CSSD][2183792384]clssnmvFindInitialConfigs: No voting files found
2018-10-17 09:41:40.848: [    CSSD][2183792384](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds

2018-10-17 09:41:41.324: [    CSSD][2186336000]clssscSelect: cookie accept request 0x7f1c6c06ecf0
2018-10-17 09:41:41.324: [    CSSD][2186336000]clssscevtypSHRCON: getting client with cmproc 0x7f1c6c06ecf0
2018-10-17 09:41:41.324: [    CSSD][2186336000]clssgmRegisterClient: proc(3/0x7f1c6c06ecf0), client(344/0x7f1c6c068720)
2018-10-17 09:41:41.324: [    CSSD][2186336000]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:41.325: [    CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x3943
2018-10-17 09:41:42.326: [    CSSD][2186336000]clssscSelect: cookie accept request 0x7f1c6c06ecf0
2018-10-17 09:41:42.326: [    CSSD][2186336000]clssscevtypSHRCON: getting client with cmproc 0x7f1c6c06ecf0
2018-10-17 09:41:42.326: [    CSSD][2186336000]clssgmRegisterClient: proc(3/0x7f1c6c06ecf0), client(345/0x7f1c6c069370)
2018-10-17 09:41:42.326: [    CSSD][2186336000]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f1c6c)
2018-10-17 09:41:42.326: [    CSSD][2186336000]clssgmDiscEndpcl: gipcDestroy 0x3959

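CSS could not find any voting disks: discovery with the string /dev/raw/raw* matched only /dev/raw/rawctl, reported 0 disks, and was retried every 15 seconds.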
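
The relevant lines are buried among the repeated client-connection messages. As a rough sketch, the discovery string and the number of disks found can be pulled out with sed. The patterns are based only on the log lines shown above, and the function name vf_discovery is hypothetical.

```shell
# Extract the voting file discovery string and the discovered disk count
# from ocssd.log. Reads the named files, or standard input if none are given.
vf_discovery() {
    sed -n \
        -e 's/.*voting file discovery string(\(.*\)).*/discovery string: \1/p' \
        -e 's/.*Successful discovery of \([0-9][0-9]*\) disks.*/disks found: \1/p' \
        "$@"
}

# Intended use from the cssd log directory:
#   vf_discovery ocssd.log | tail -n 2
```

A count of 0 together with a discovery string that points at /dev/raw suggests looking at the raw devices next, which is what the following step does.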

$ ls -lrt /dev/raw/raw*
crw-rw----. 1 grid asmadmin 162, 0 Oct 17 09:35 /dev/raw/rawctl

To my surprise, none of the bound raw devices were there any more; only the rawctl control device was left.

I found the binding rules in rc.local

rc.local:

/bin/raw /dev/raw/raw113    /dev/mapper/data113
/bin/raw /dev/raw/raw101    /dev/mapper/data101
/bin/raw /dev/raw/raw102    /dev/mapper/data102
/bin/raw /dev/raw/raw121    /dev/mapper/data121
/bin/raw /dev/raw/raw120    /dev/mapper/data120
/bin/raw /dev/raw/raw118    /dev/mapper/data118
/bin/raw /dev/raw/raw119    /dev/mapper/data119
...
/bin/raw /dev/raw/raw2     /dev/mapper/ocr102
/bin/raw /dev/raw/raw1     /dev/mapper/ocr101
/bin/raw /dev/raw/raw12    /dev/mapper/vote102
/bin/raw /dev/raw/raw115    /dev/mapper/data115
/bin/raw /dev/raw/raw11    /dev/mapper/vote101
chown -R grid:asmadmin /dev/raw/*

At this point it was not clear whether rc.local had not run or whether there was a problem with the multipath devices under /dev/mapper/*

[grid@xsdbd31 etc]$ ls -lrt /dev/mapper/*
crw-------. 1 root root 10, 236 Oct 17 09:35 /dev/mapper/control
...
lrwxrwxrwx. 1 root root       8 Oct 17 09:35 /dev/mapper/data121 -> ../dm-11
lrwxrwxrwx. 1 root root       8 Oct 17 09:35 /dev/mapper/data122 -> ../dm-10
lrwxrwxrwx. 1 root root       7 Oct 17 09:35 /dev/mapper/data123 -> ../dm-9
lrwxrwxrwx. 1 root root       8 Oct 17 09:35 /dev/mapper/data124 -> ../dm-21
lrwxrwxrwx. 1 root root       8 Oct 17 09:35 /dev/mapper/data103 -> ../dm-15
lrwxrwxrwx. 1 root root       8 Oct 17 09:35 /dev/mapper/data105 -> ../dm-12

The multipath devices are all present, so the conclusion is that rc.local did not run.
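
The same distinction can be scripted. The sketch below reads the "/bin/raw <raw device> <block device>" lines from a file and reports, for each binding, whether the backing device is missing (a multipath problem) or only the raw device is missing (the binding was never made). The function name check_raw_bindings is made up here, and the location of rc.local is an assumption, since the post does not give its full path.

```shell
# For each "/bin/raw RAWDEV BLOCKDEV" line in the given file, report bindings
# that are not in place and say which side is missing.
check_raw_bindings() {
    rc_file=$1
    awk '$1 == "/bin/raw" && NF >= 3 { print $2, $3 }' "$rc_file" |
    while read -r rawdev blockdev; do
        if [ ! -e "$blockdev" ]; then
            echo "$rawdev: backing device $blockdev missing"
        elif [ ! -e "$rawdev" ]; then
            echo "$rawdev: not bound ($blockdev is present)"
        fi
    done
}

# Intended use (the path /etc/rc.local is an assumption):
#   check_raw_bindings /etc/rc.local
```

If every reported line says "not bound", the storage side is fine and the bindings simply need to be created again.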

I ran the commands from rc.local by hand (both the raw binding rules and the chown that sets the ownership).
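
Rather than copying the lines one at a time, they can be selected from the file first and reviewed before anything is executed. This is only a sketch: the function name rc_raw_commands is hypothetical, the path /etc/rc.local is an assumption, and it presumes that rc.local contains nothing else starting with /bin/raw or chown that should not be repeated.

```shell
# Print only the raw binding and chown lines from the given rc.local file.
# Nothing is executed; the output is meant to be reviewed first.
rc_raw_commands() {
    grep -E '^[[:space:]]*(/bin/raw|chown)[[:space:]]' "$1"
}

# Review the selected commands (path is an assumption):
#   rc_raw_commands /etc/rc.local
# Then, as root and only after reviewing the list, run them with tracing:
#   rc_raw_commands /etc/rc.local | sh -x
```

Printing first and piping to a shell second keeps the step close to what was done by hand here, while making it harder to skip a binding by accident.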

Then watch the CSS log again

# tail -f ocssd.log
2018-10-17 09:51:48.742: [    CSSD][4159170304]clssgmRPCDone: rpc 0x7f522daba818 (RPC#61) state 6, flags 0x100
2018-10-17 09:51:48.742: [    CSSD][4159170304]clssgmAddGrockMemCmpl: rpc 0x7f522daba818, ret 0, client 0x7f52180eb910 member 0x7f52182089f0
2018-10-17 09:51:48.742: [    CSSD][4159170304]clssgmAddGrockMemCmpl: sending type 6, size 540 to 0x7f52180eb910
2018-10-17 09:51:48.742: [    CSSD][4159170304]clssgmFreeRPCIndex: freeing rpc 61
2018-10-17 09:51:48.742: [    CSSD][4159170304]clssgmHandleGrockRcfgUpdate: grock(crs_version), updateseq(70788), status(0), sendresp(1)
2018-10-17 09:51:48.972: [    CSSD][4159170304]clssgmTestSetLastGrockUpdate: grock(crs_version), updateseq(70788) msgseq(70789), lastupdt<0x7f51e003fd50>, ignoreseq(0)
2018-10-17 09:51:48.972: [    CSSD][4159170304]clssgmUpdateGrpData: grock(crs_version), private data(84), incarn(15)
2018-10-17 09:51:48.973: [    CSSD][4159170304]clssgmHandleGrockRcfgUpdate: grock(crs_version), updateseq(70789), status(0), sendresp(1)
2018-10-17 09:51:49.797: [    CSSD][4156016384]clssnmSendingThread: sending status msg to all nodes
2018-10-17 09:51:49.797: [    CSSD][4156016384]clssnmSendingThread: sent 5 status msgs to all nodes

Check the cluster startup status

# /grid/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xsdbd31                  Started             
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xsdbd31                                      
ora.crf
      1        ONLINE  ONLINE       xsdbd31                                      
ora.crsd
      1        ONLINE  INTERMEDIATE xsdbd31                                      
ora.cssd
      1        ONLINE  ONLINE       xsdbd31
ora.cssdmonitor
      1        ONLINE  ONLINE       xsdbd31                                      
ora.ctssd
      1        ONLINE  ONLINE       xsdbd31                  OBSERVER            
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  INTERMEDIATE xsdbd31                                      
ora.gipcd
      1        ONLINE  ONLINE       xsdbd31                                      
ora.gpnpd
      1        ONLINE  ONLINE       xsdbd31                                      
ora.mdnsd
      1        ONLINE  ONLINE       xsdbd31                

CSS is up and CRS is now starting (ora.crsd and ora.evmd are still in the INTERMEDIATE state).

Wait a few more minutes
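
The waiting can also be left to a small polling loop instead of re-running the command by hand. The sketch below is a generic retry helper; the names retry and crsd_online are made up for this example, and the 10 attempts at 30-second intervals in the usage comment are arbitrary values, not figures from this incident.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Intended use on a cluster node (path taken from the commands above):
#   crsd_online() {
#       /grid/app/11.2.0/grid/bin/crsctl stat res ora.crsd -init | grep -q '^STATE=ONLINE'
#   }
#   retry 10 30 crsd_online && echo "ora.crsd is ONLINE"
```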

# /grid/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xsdbd31                  Started             
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xsdbd31                                      
ora.crf
      1        ONLINE  ONLINE       xsdbd31                                      
ora.crsd
      1        ONLINE  ONLINE       xsdbd31                                      
ora.cssd
      1        ONLINE  ONLINE       xsdbd31                                      
ora.cssdmonitor
      1        ONLINE  ONLINE       xsdbd31                                      
ora.ctssd
      1        ONLINE  ONLINE       xsdbd31                  OBSERVER            
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  ONLINE       xsdbd31                                      
ora.gipcd
      1        ONLINE  ONLINE       xsdbd31                                      
ora.gpnpd
      1        ONLINE  ONLINE       xsdbd31                                      
ora.mdnsd
      1        ONLINE  ONLINE       xsdbd31  

Cluster startup is complete.

I logged in to the database and confirmed that its status was OK.
