Default NIC parameter settings on Oracle Linux cause ORA-603

Environment: OEL 6.8, Oracle 11.2.0.4 two-node RAC
The alert log on node 2 reports the following errors:

Fri Nov 24 09:11:42 2017
skgxpvfynet: mtype: 61 process 11799 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_11799.trc (incident=123381):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_123381/orcl2_ora_11799_i123381.trc
Fri Nov 24 09:11:42 2017
skgxpvfynet: mtype: 61 process 11801 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
opiodr aborting process unknown ospid (11743) as a result of ORA-603
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_11801.trc (incident=123382):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_123382/orcl2_ora_11801_i123382.trc
Dumping diagnostic data in directory=[cdmp_20171124091142], requested by (instance=2, osid=11743), summary=[incident=123380].
Trace files:

#/u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_11799.trc
*** 2017-11-24 09:11:42.123
*** CLIENT ID:() 2017-11-24 09:11:42.123
*** SERVICE NAME:() 2017-11-24 09:11:42.123
*** MODULE NAME:() 2017-11-24 09:11:42.123
*** ACTION NAME:() 2017-11-24 09:11:42.123

SKGXP:[7fbcc2bfda88.0]{0}: SKGXPVFYNET: Socket self-test could not verify successful transmission of 32768 bytes (mtype 61).
SKGXP:[7fbcc2bfda88.1]{0}: The network is required to support UDP protocol sends of this size. Socket is bound to 169.254.188.234.
SKGXP:[7fbcc2bfda88.2]{0}: phase 'send', 0 tries, 100 loops, 4905 ms (last)
struct ksxpp * ksxppg_ [0xc122540, 0x7fbcc2995310) = 0x7fbcc2995308
Dump of memory from 0x00007FBCC2995308 to 0x00007FBCC2996838

/u01/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_123381/orcl2_ora_11799_i123381.trc

Dump continued from file: /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_11799.trc
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurr
#========= Dump for incident 123381 (ORA 603) ========
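A search of MOS turned up the note "Oracle Linux: ORA-27301: OS Failure Message: No Buffer Space Available" (Doc ID 2041723.1). According to it, the likely cause is that the MTU of the network interface is set too high, which leaves too little buffer space for the interface. The relevant parts of the note are quoted below.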

CAUSE

This happens due to less space available for network buffer reservation.

SOLUTION

  1. On servers with High Physical Memory, the parameter vm.min_free_kbytes should be set in the order of 0.4% of total
    Physical Memory. This helps in keeping a larger range of defragmented memory pages available for network buffers
    reducing the probability of a low-buffer-space conditions.

*** For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1048576 ***

On NUMA Enabled Systems, the value of vm.min_free_kbytes should be multiplied by the number of NUMA nodes since the value
is to be split across all the nodes.

On NUMA Enabled Systems, the value of vm.min_free_kbytes = n * 0.4% of total Physical Memory. Here ‘n’ is the
number of NUMA nodes.

  2. Additionally, the MTU value should be modified as below

#ifconfig lo mtu 16436

To make the change persistent over reboot add the following line in the file /etc/sysconfig/network-scripts/ifcfg-lo :

MTU=16436
Save the file and restart the network service to load the changes

#service network restart
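
The sizing rule for vm.min_free_kbytes quoted in item 1 can be sketched as a small shell calculation. This is only an illustration: the function name suggest_min_free_kbytes is made up for this example, and it applies exactly 0.4% with integer arithmetic, so for 256 GiB of RAM it yields about 1073741 kB rather than the rounded 1048576 (1 GiB) used in the MOS example. Review the result before applying it to a production server.

```shell
#!/bin/sh
# Hypothetical helper (not from the MOS note): suggested vm.min_free_kbytes
# = number of NUMA nodes * 0.4% of total RAM, with RAM given in kB.
# $1 = total memory in kB, $2 = number of NUMA nodes (defaults to 1).
suggest_min_free_kbytes() {
    # 0.4% == 4/1000; integer arithmetic, rounded down.
    echo $(( $1 * 4 / 1000 * ${2:-1} ))
}

# On a live Linux system, read the inputs from /proc and sysfs.
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/min_free_kbytes ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    numa_nodes=$(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l)
    # Treat systems without NUMA information as a single node.
    [ "$numa_nodes" -ge 1 ] || numa_nodes=1

    echo "Current vm.min_free_kbytes:   $(cat /proc/sys/vm/min_free_kbytes)"
    echo "Suggested vm.min_free_kbytes: $(suggest_min_free_kbytes "$total_kb" "$numa_nodes")"
fi

# To apply the value at runtime (as root), after reviewing it:
#   sysctl -w vm.min_free_kbytes=<value>
```

A runtime change made with sysctl -w is lost on reboot; a persistent setting would normally go into /etc/sysctl.conf.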
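This appears to be an error specific to OEL. Comparing OEL, RHEL, and CentOS systems, I found that only the OEL system had a loopback MTU of 65536; the others all used 16436. The fix given on MOS is to set the loopback MTU to 16436. The current loopback settings on this server are: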

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:1193604960 errors:0 dropped:0 overruns:0 frame:0
TX packets:1193604960 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:498857656538 (464.5 GiB) TX bytes:498857656538 (464.5 GiB)
The loopback MTU on this server is at its default of 65536. Change it as described in the MOS note:

ifconfig lo mtu 16436
This command changes the interface's in-memory setting and takes effect immediately:

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1193606824 errors:0 dropped:0 overruns:0 frame:0
TX packets:1193606824 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:498858566038 (464.5 GiB) TX bytes:498858566038 (464.5 GiB)
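
The same check can be scripted. The sketch below is an illustration rather than part of the MOS procedure: mtu_from_ifconfig is a hypothetical helper that pulls the value out of the "MTU:65536" style output shown above, and the sysfs path and the ip command are the standard Linux and iproute2 equivalents of what ifconfig does here.

```shell
#!/bin/sh
# Hypothetical helper: extract the MTU value from net-tools style
# `ifconfig` output read on stdin (the "MTU:65536" format shown above).
mtu_from_ifconfig() {
    sed -n 's/.*MTU:\([0-9][0-9]*\).*/\1/p'
}

# Read the live loopback MTU straight from sysfs, if it is available.
if [ -r /sys/class/net/lo/mtu ]; then
    echo "lo MTU (sysfs): $(cat /sys/class/net/lo/mtu)"
fi

# Example with ifconfig (requires net-tools to be installed):
#   ifconfig lo | mtu_from_ifconfig
#
# iproute2 equivalent of `ifconfig lo mtu 16436` (run as root):
#   ip link set dev lo mtu 16436
```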
However, the default value is restored after a reboot. To make the change survive a reboot, edit the interface's configuration file:

vi /etc/sysconfig/network-scripts/ifcfg-lo

DEVICE=lo
IPADDR=127.0.0.1
NETMASK=255.0.0.0
NETWORK=127.0.0.0
# If you're having problems with gated making 127.0.0.0/8 a martian,
# you can change this to something else (255.255.255.255, for example)
BROADCAST=127.255.255.255
ONBOOT=yes
NAME=loopback
MTU=16436
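
Instead of editing the file by hand in vi, the MTU line can be added or updated with a small script. This is a minimal sketch under stated assumptions: set_ifcfg_mtu is a hypothetical helper written for this example, and the backup location /root/ifcfg-lo.bak is only a suggestion. It replaces an existing MTU= line or appends one if the file has none.

```shell
#!/bin/sh
# Hypothetical helper: set MTU=<value> in an ifcfg-style file.
# $1 = path to the ifcfg file, $2 = MTU value.
set_ifcfg_mtu() {
    file=$1
    mtu=$2
    if grep -q '^MTU=' "$file"; then
        # Rewrite through a temp file, then copy the content back so the
        # original file keeps its inode, owner, and permissions.
        sed "s/^MTU=.*/MTU=$mtu/" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        # No MTU line yet: append one.
        echo "MTU=$mtu" >> "$file"
    fi
}

# Usage (as root; back up the file first, the backup path is an assumption):
#   cp -p /etc/sysconfig/network-scripts/ifcfg-lo /root/ifcfg-lo.bak
#   set_ifcfg_mtu /etc/sysconfig/network-scripts/ifcfg-lo 16436
```

Running the helper twice leaves a single MTU= line, so it is safe to re-run.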

Reposted from blog.csdn.net/weixin_44662991/article/details/126588754