Pitfalls I've Hit Over the Years: EPOLL (100% CPU / Hang)

While tracking down a bug recently, I ran into the following problem.

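On the device running the program, only one of its 4 CPUs was fully loaded while the others were nearly idle, even though the device was handling almost no traffic.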

After a flame graph pointed to the problematic module, I used pstack to see what the offending process was doing:

pstack pid
#0 0x00007fbea3424d43 in epoll_wait () from /lib64/libc.so.6
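pstack shows the process sitting in epoll_wait, and a large share of CPU time was going to the sys_epoll_wait system call, so I suspected that event_base_loop (libevent's event loop) was stuck in a busy loop, calling epoll_wait over and over.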
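Alternatively, use perf. perf top shows the hot spots live; to keep a profile, record it with perf record (stop it with Ctrl-C), then open perf.data with perf report to see how much time goes to system calls:
-bash-4.2# perf top -e cpu-clock -p pid
-bash-4.2# perf record -e cpu-clock -p pid
-bash-4.2# perf report -i perf.data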

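To confirm that epoll_wait was really being called over and over, I installed SystemTap on the device and used it to print the number of epoll_wait calls once per second. The results: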
=Timer(1) Begin=
=Timer(1) End=
=Timer(2) Begin=
pid( 1554) xxx1 --> called 200843 epoll_wait(epfd13)
pid( 651) xxx-daemon --> called 127 epoll_wait(epfd4)
pid( 480) xxx-journal --> called 9 epoll_wait(epfd7)
pid( 604) xxx --> called 18 epoll_wait(epfd9)
pid( 1744) xxx --> called 3 epoll_wait(epfd5)
pid( 1947) xxx --> called 1 epoll_wait(epfd8)
=Timer(2) End=
=Timer(3) Begin=
pid( 1554) xxx1 --> called 408320 epoll_wait(epfd13)
pid( 651) xxx-daemon --> called 127 epoll_wait(epfd4)
pid( 480) xxx-journal --> called 9 epoll_wait(epfd7)
pid( 604) xxx --> called 18 epoll_wait(epfd9)
pid( 1744) xxx --> called 7 epoll_wait(epfd5)
pid( 1947) xxx --> called 2 epoll_wait(epfd8)
pid( 1554) xxx --> called 1 epoll_wait(epfd19)
=Timer(3) End=
=Timer(4) Begin=
pid( 1554) xxx1 --> called 615056 epoll_wait(epfd13)
pid( 670) xxx-logind --> called 135 epoll_wait(epfd4)
pid( 480) xxx-journal --> called 9 epoll_wait(epfd7)
pid( 604) xxx --> called 18 epoll_wait(epfd9)
pid( 1744) xxx --> called 11 epoll_wait(epfd5)
pid( 1947) xxx --> called 3 epoll_wait(epfd8)
pid( 1554) xxx --> called 1 epoll_wait(epfd19)
pid( 501) xxx-udevd --> called 2 epoll_wait(epfd11)
=Timer(4) End=

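So epoll_wait is indeed the culprit: xxx1 calls it about 200,000 times per second (the counters are cumulative, going 200843 → 408320 → 615056).
Here is the stap script: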

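#! /usr/bin/env stap

# Count epoll_wait() calls and print the running totals once per second.
# Keyed by (pid, epfd): keying by epfd alone would merge different
# processes that happen to use the same epoll fd number.
global count    # count[pid, epfd]: cumulative number of epoll_wait calls
global pname    # pname[pid, epfd]: name of the calling process

global times = 0

probe begin, timer.s(1) {

    times++
    printf("=======================Timer(%d) Begin=======================\n", times)
    # Sorted by call count, busiest first
    foreach ([p, fd] in count-)
    {
        printf("pid(%5d) %11s --> called %10d epoll_wait(epfd%d)\n", p, pname[p, fd], count[p, fd], fd)
    }
    printf("=======================Timer(%d) End=======================\n", times)

    if (times > 4)
    {
        exit()
    }
}

probe syscall.epoll_wait {
    count[pid(), epfd]++
    pname[pid(), epfd] = execname()
}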

After using SystemTap to confirm the timeout passed to epoll_wait, I continued the analysis.

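Setting a breakpoint in gdb showed that the fd that kept firing in epoll_wait was fd 33, with events = 0x24ed0d0 (flags such as EPOLLHUP, EPOLLRDNORM and EPOLLRDBAND). lsof did not list fd 33 at all, meaning the fd had already been closed but never removed from epoll, which is why epoll_wait kept being woken up.
Next I used a SystemTap script to print every socket, close and epoll_ctl call, and found that for socket 33 the program called close() first and only afterwards epoll_ctl(EPOLL_CTL_DEL). Is that order a problem? The only relevant description I could find is this passage in the epoll man page: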

man epoll
   Q6  Will closing a file descriptor cause it to be removed from all epoll sets automatically?

   A6  Yes,  but be aware of the following point.  A file descriptor is a reference to an open file description (see open(2)).  Whenever a descriptor is
       duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or fork(2), a new file descriptor referring to the same open file description is  created.   An
       open file description continues to exist until all file descriptors referring to it have been closed.  A file descriptor is removed from an epoll
       set only after all the file descriptors referring to the underlying open file description have been  closed  (or  before  if  the  descriptor  is
       explicitly removed using epoll_ctl(2) EPOLL_CTL_DEL).  This means that even after a file descriptor that is part of an epoll set has been closed,
       events may be reported for that file descriptor if other file descriptors referring to the same underlying file description remain open.

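If the socket has not been duplicated with dup, fcntl, fork or similar calls, so that no other fd refers to the same open file description, it is removed automatically. Put differently: when close() actually releases the open file description, rather than just decrementing its reference count, the fd is dropped from the epoll interest set automatically. So the answer is not an unconditional yes; it depends on whether the reference count of the underlying open file description reaches 0.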
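To see the no-duplicate case in action, here is a minimal sketch, assuming Linux and using an AF_UNIX socketpair as a stand-in for the real socket; the function name events_after_sole_close is hypothetical. The registered end is made readable and then closed. Because that close drops the last reference, epoll_wait no longer reports it, and an EPOLL_CTL_DEL issued afterwards on the stale fd number fails with EBADF.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: register one end of a socketpair, make it readable,
 * close the only fd that refers to it, then try to delete it.
 * Returns how many events epoll_wait still reports (-1 on setup error)
 * and stores the errno of the late EPOLL_CTL_DEL in *del_errno. */
int events_after_sole_close(int *del_errno)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sv[0] };
    if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == -1 ||
        write(sv[1], "x", 1) != 1) {          /* pending data: sv[0] is readable */
        close(sv[0]);
        close(sv[1]);
        if (epfd != -1)
            close(epfd);
        return -1;
    }

    close(sv[0]);   /* last reference: the kernel drops the registration */

    *del_errno = 0;
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, sv[0], NULL) == -1)
        *del_errno = errno;                   /* stale fd number: EBADF */

    struct epoll_event out[4];
    int n = epoll_wait(epfd, out, 4, 0);      /* timeout 0: poll, never block */

    close(sv[1]);
    close(epfd);
    return n;
}
```

On Linux this should return 0 and leave EBADF in *del_errno: nothing is reported, and the late delete has nothing valid to act on.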

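For socket 33, close() came first, so by the time epoll_ctl(EPOLL_CTL_DEL) ran, the fd number was no longer valid (such a call fails, typically with EBADF) and the delete could not remove the registration. The program ended up stuck spinning in epoll_wait()!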
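The symptom itself (one CPU pinned while epoll_wait returns over and over) can be reproduced with the sketch below. It rests on one assumption the post does not confirm: some other descriptor still refers to the socket, simulated here with dup(). After the peer closes, the socket reports a hang-up; because the registration is level-triggered and can no longer be deleted, each epoll_wait call returns at once despite its 1000 ms timeout. spin_count is a hypothetical name.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo of the 100% CPU symptom. Calls epoll_wait `calls`
 * times, each with a 1000 ms timeout, and returns how many calls came
 * back with an event (-1 on setup error). */
int spin_count(int calls)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP, .data.fd = sv[0] };
    if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == -1) {
        close(sv[0]);
        close(sv[1]);
        if (epfd != -1)
            close(epfd);
        return -1;
    }

    int hidden = dup(sv[0]);  /* ASSUMPTION: another reference keeps the socket alive */
    close(sv[1]);             /* peer goes away: sv[0] now reports a hang-up */
    close(sv[0]);             /* close first ... */
    epoll_ctl(epfd, EPOLL_CTL_DEL, sv[0], NULL);  /* ... then delete: fails with EBADF */

    int hits = 0;
    for (int i = 0; i < calls; i++) {
        struct epoll_event out[1];
        /* Level-triggered and never cleared, so this returns immediately. */
        if (epoll_wait(epfd, out, 1, 1000) == 1)
            hits++;
    }

    if (hidden != -1)
        close(hidden);
    close(epfd);
    return hits;
}
```

spin_count(5) should return 5 almost instantly rather than after about 5 seconds. In a real program the loop is the event loop itself, which is what keeps one core at 100%.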

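Later I wondered what happens if the fd has been duplicated with dup, fcntl, fork or the like, so that the reference count is not 0 when it is closed.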

Suppose Process A creates an epoll instance and registers fd0 with it. A then forks a child, Process B. B now has the same fd table as A, and B's epoll fd refers to the same epoll file description as A's.

The duplicated-fd problem:

Continuing the example, suppose A calls close(fd0). A assumes fd0 is closed and that it will not receive any more events for it. However, because:

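epoll monitors the open file description, not the file descriptor
an open file description is released only after every file descriptor referring to it has been closed
although A closed its fd0, B's fd0 still refers to the file description, so it is not released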

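A keeps receiving events for fd0. In other words, the lifetime of an object registered with epoll does not necessarily match the lifetime of the fd used to register it.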
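A sketch of the fork scenario, assuming Linux; the function name is hypothetical and a socketpair stands in for fd0. A pipe keeps child B alive, holding its inherited copy of fd0, until parent A has polled, so the result does not depend on timing.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: A registers fd0, forks B (which inherits fd0), then
 * closes its own fd0. Returns how many events A still receives
 * (-1 on setup error). */
int events_after_close_in_parent(void)
{
    int sv[2], ctl[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (pipe(ctl) == -1)
        return -1;

    int fd0 = sv[0];
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd0 };
    if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, fd0, &ev) == -1)
        return -1;
    if (write(sv[1], "x", 1) != 1)          /* make fd0 readable */
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                         /* process B: keeps its copy of fd0 */
        char c;
        close(ctl[1]);
        ssize_t r = read(ctl[0], &c, 1);    /* blocks until A closes ctl[1] */
        (void)r;
        _exit(0);
    }

    close(ctl[0]);
    close(fd0);                             /* A believes fd0 is gone */

    struct epoll_event out[4];
    int n = epoll_wait(epfd, out, 4, 0);    /* still reports fd0 */

    close(ctl[1]);                          /* EOF on the pipe lets B exit */
    waitpid(pid, NULL, 0);
    close(sv[1]);
    close(epfd);
    return n;
}
```

Expected on Linux: 1, with out[0].data.fd still holding A's old fd0 number.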

Another example: epoll is monitoring fd, and the program runs fd2 = dup(fd) followed by close(fd). Then:

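The program still receives events for fd
fd cannot be removed from epoll; neither of the following epoll_ctl calls works:

epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL)
epoll_ctl(epfd, EPOLL_CTL_DEL, fd2, NULL)

The first fails with EBADF because fd has already been closed. The second fails with ENOENT because epoll identifies a registration by the pair (open file description, fd number), and fd2 was never registered.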
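A sketch of the dup() case, assuming Linux, with a socketpair standing in for the real socket; the struct and function names are hypothetical. It records the errno of both deletes and how many events are still reported afterwards.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result holder for the demo below. */
struct dup_demo { int del_fd_errno, del_fd2_errno, events; };

/* Register fd, fd2 = dup(fd), close(fd), then try both deletes and poll. */
struct dup_demo dup_then_close(void)
{
    struct dup_demo r = { 0, 0, -1 };
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return r;

    int fd = sv[0];
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        return r;
    if (write(sv[1], "x", 1) != 1)            /* make fd readable */
        return r;

    int fd2 = dup(fd);
    close(fd);

    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) == -1)
        r.del_fd_errno = errno;               /* fd is closed: EBADF */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd2, NULL) == -1)
        r.del_fd2_errno = errno;              /* (file, fd2) never registered: ENOENT */

    struct epoll_event out[4];
    r.events = epoll_wait(epfd, out, 4, 0);   /* the old registration still fires */

    close(fd2);
    close(sv[1]);
    close(epfd);
    return r;
}
```

Expected on Linux: del_fd_errno == EBADF, del_fd2_errno == ENOENT and events == 1, so the registration can only go away once fd2 is closed too.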

So always remove the fd from the epoll set before calling close().
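One way to enforce that order is a small wrapper that every code path uses instead of a bare close(). The sketch below is a hypothetical helper, not part of any library; it treats ENOENT as harmless so it also works for fds that were never registered. The second function checks that deleting first leaves nothing registered even when a dup()'d descriptor keeps the socket open.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: take fd out of the epoll interest set, then close it.
 * ENOENT (fd was never registered) is not treated as an error. */
int epoll_remove_and_close(int epfd, int fd)
{
    int rc = 0;
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) == -1 && errno != ENOENT)
        rc = -1;
    if (close(fd) == -1)
        rc = -1;
    return rc;
}

/* Demo: a dup()'d copy keeps the socket alive, yet nothing is reported
 * because the registration was removed before the close. */
int events_after_remove_and_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sv[0] };
    if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == -1)
        return -1;
    if (write(sv[1], "x", 1) != 1)          /* make sv[0] readable */
        return -1;

    int fd2 = dup(sv[0]);
    if (epoll_remove_and_close(epfd, sv[0]) == -1)
        return -1;

    struct epoll_event out[4];
    int n = epoll_wait(epfd, out, 4, 0);    /* nothing left to report */

    close(fd2);
    close(sv[1]);
    close(epfd);
    return n;
}
```

Routing every close through one helper like this keeps the delete-then-close order in a single place instead of relying on each call site to remember it.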

Reference: https://copyconstruct.medium.com/the-method-to-epolls-madness-d9d2d6378642
