In-depth understanding of select, poll, and epoll (e-commerce system notes)

select / poll

In the IO multiplexing model, the select function listens on multiple sockets (file descriptors, fds); if any socket becomes ready, select returns readable, and the user process can then call recvfrom to read the data. Note that when the level of concurrency is low, select/poll/epoll is not necessarily faster than multi-threading + blocking IO.
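To make this concrete, here is a minimal sketch of a select-based wait on a single socket; the name listen_fd and the 5-second timeout are illustrative choices, not from the article:

```c
#include <sys/select.h>

/* Minimal sketch: wait up to 5 seconds for listen_fd to become readable. */
int wait_readable(int listen_fd)
{
    fd_set readfds;
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };

    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);            /* register the fd to monitor */

    /* blocks until listen_fd is readable, the timeout expires, or error */
    int n = select(listen_fd + 1, &readfds, NULL, NULL, &tv);

    /* ready: a following recvfrom()/accept() will not block */
    return n > 0 && FD_ISSET(listen_fd, &readfds);
}
```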

When the user process calls select, select first copies the monitored readfds set into kernel space (assume we only care about readability). It then traverses the monitored sockets (sk), calling each sk's poll logic one by one to check whether that sk has a readable event. If, after traversing all sks, none is readable, select calls schedule_timeout and the process goes to sleep. If data becomes readable on some sk within the timeout period, or the timeout expires, the process that called select is woken up. select then traverses the monitored sk set again, collects the readable events one by one, and returns them to the user. The corresponding pseudocode is roughly as follows (a simplified sketch; names like sk_poll and for_each_monitored_sk are illustrative, not actual kernel symbols):
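```c
/* simplified pseudocode of select's kernel-side loop, per the text above */
for (;;) {
    ready = 0;
    for_each_monitored_sk(sk) {         /* traverse every monitored socket */
        if (sk_poll(sk) & POLLIN)       /* call the sk's poll logic        */
            ready++;                    /* found a readable event          */
    }
    if (ready || timed_out)
        break;                          /* go collect events and return    */
    schedule_timeout(timeout);          /* sleep until wakeup or timeout   */
}
```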

After a socket is created in user space, the return value is a file descriptor. Let's analyze how the socket gets associated with that file descriptor at creation time. In SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol), the kernel finally calls sock_map_fd to make the association: the returned retval is the file descriptor fd handed back to user space, and sock is the socket successfully created by sock_create.
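A compressed sketch of that call path (the real code lives in net/socket.c; error handling and flag processing are omitted here):

```c
/* inside sys_socket(), greatly simplified */
struct socket *sock;
int retval = sock_create(family, type, protocol, &sock); /* create the socket  */
if (retval == 0)
    retval = sock_map_fd(sock, flags);  /* allocate a struct file, install it in
                                           the process fd table, return the fd */
return retval;                          /* this fd is what user space receives */
```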

Shortcomings:

Every design has its trade-offs; the shortcomings of select are:

[ 1 ] The monitored fds set must be copied from user space to kernel space.
    To limit the performance cost of this copy, the kernel caps the size of the monitored fds set with a macro, and the size cannot be changed (limited to 1024).
[ 2 ] As long as any fd in the monitored set has data to read, the entire socket set is traversed once, calling each sk's poll function to collect readable events.
    The original interface is deliberately simple: we only care whether some event occurred, such as data becoming readable. Since the data arrives asynchronously, when the notification comes we do not know how many of the monitored sockets are actually readable, so the only option is to traverse every socket one by one and collect the readable events (see the sketch below).
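The same traversal is forced on the user side too: after select returns, the caller has no choice but to probe every fd with FD_ISSET (maxfd and handle_readable are illustrative names):

```c
/* user side of shortcoming [2]: select only says "something is ready",
 * so we must test every fd one by one to find out which */
int n = select(maxfd + 1, &readfds, NULL, NULL, NULL);
for (int fd = 0; fd <= maxfd && n > 0; fd++) {
    if (FD_ISSET(fd, &readfds)) {
        handle_readable(fd);            /* hypothetical per-fd handler */
        n--;
    }
}
```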

At this point, we have three problems to solve:

(1) The monitored fds set is limited to 1024. That is too small; we want a much larger monitorable fds set.

(2) The fds set has to be copied from user space to kernel space on every call; we would like to avoid this copying.

(3) When some of the monitored fds become readable, we want a more precise notification: ideally, the notification itself carries the list of fds with readable events, instead of forcing us to traverse the whole fds set to collect them.

To traverse only the ready fds, we need a place to organize them. To this end, epoll introduces an intermediate layer: a doubly linked list (ready_list) and a separate sleep queue (single_epoll_wait_list). More on this below.

Among the three problems left by select, problem (1) is a usage restriction, while problems (2) and (3) are performance problems. poll is very similar to select and does not solve the performance problems; it only solves select's problem (1), the 1024 limit on the size of the fds set. poll changes how the fds set is described, using the pollfd structure instead of select's fd_set structure, so the fds set poll supports is far larger than select's 1024. However, although poll lifts the 1024 limit, it does not change the fact that the whole descriptor array is still copied between the user-mode and kernel-mode address spaces, nor the inefficiency that the readiness of a single descriptor triggers a traversal of the entire descriptor set. The performance of poll therefore degrades linearly as the monitored socket set grows, so poll is not suitable for high-concurrency scenarios. The function prototype of poll is as follows:
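```c
#include <poll.h>

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

struct pollfd {
    int   fd;       /* file descriptor to monitor               */
    short events;   /* requested events, e.g. POLLIN            */
    short revents;  /* returned events, filled in by the kernel */
};
```

Because nfds is simply the number of pollfd entries, the set size is bounded only by what the process can allocate, not by a fixed macro.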


epoll (event poll)

Of the three problems left by select, problem (1) is relatively easy, and poll solves it readily enough, though poll's solution is rather half-hearted. Problems (2) and (3) look harder. How do we solve them? In the computer industry, there are two classic problem-solving ideas:

[ 1 ] Any problem in computer science can be solved by adding an intermediate layer.
[ 2 ] Change from centralized (central) processing to decentralized (distributed) processing.

The fds set copy problem:

epoll introduces the epoll_ctl system call, separating the high-frequency epoll_wait from the low-frequency epoll_ctl. epoll_ctl spreads the modification of the monitored fds set across three operations (EPOLL_CTL_ADD, EPOLL_CTL_MOD, EPOLL_CTL_DEL), so that changes are communicated only when something actually changes. This turns the high-frequency, large-block memory copies of select/poll (centralized processing) into epoll_ctl's low-frequency, small-block memory copies (dispersed processing), avoiding a large amount of memory copying.
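A sketch of these dispersed updates (epfd and sockfd are illustrative names; the calls themselves are the standard epoll API):

```c
#include <sys/epoll.h>

/* register, modify, and eventually remove one monitored fd */
void watch_socket(int epfd, int sockfd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sockfd };

    epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);  /* copy this one fd in, once */

    ev.events = EPOLLIN | EPOLLOUT;               /* later, only on change:    */
    epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev);  /* small, targeted update    */

    epoll_ctl(epfd, EPOLL_CTL_DEL, sockfd, NULL); /* remove when done          */
}
```

The epoll instance itself comes from epoll_create1(0); after registration, epoll_wait never has to re-describe the whole set.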

A commonly repeated explanation is that epoll also attacks the copy problem by having the kernel and user space mmap (memory-map) the same memory: mmap maps a user-space address and a kernel-space address to the same physical memory (both are virtual addresses, ultimately translated to physical addresses), making that memory visible to both the kernel and the user and reducing data exchange between user mode and kernel mode. Strictly speaking, though, the mainline Linux epoll implementation does not rely on mmap: epoll_wait copies ready events out to user space normally. The copies stay cheap because only the small set of ready events is transferred, never the whole monitored set.

In addition, since epoll uses epoll_ctl to add, delete, and modify the monitored fds set, fast lookup of an fd becomes essential. A data structure with low time complexity for insert, delete, modify, and search is therefore needed to organize the monitored fds set:

* hash table before Linux 2.6.8
* red-black tree since Linux 2.6.8

The on-demand traversal of ready fds problem:


As introduced above, epoll's intermediate layer consists of a doubly linked list (ready_list) and a separate sleep queue (single_epoll_wait_list).

epoll cleverly uses this intermediate layer to avoid pointlessly traversing a large number of monitored sockets. Careful readers will notice that epoll registers a separate callback function, epoll_callback_sk, for each monitored socket in the middle layer, whereas with select/poll all sockets share the same callback. It is this per-socket callback that lets each socket handle itself independently and hang itself onto epoll's ready_list when it becomes ready. At the same time, epoll introduces the sleep queue single_epoll_wait_list, splitting the sleep/wait into two levels: the process no longer sleeps on the sleep queues of all the sockets, but on epoll's own sleep queue, waiting for the event "some socket is readable". The intermediate wait_entry_sk sleeps on a specific socket in place of the process, and when that socket becomes ready it can handle itself.
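Two hedged sketches under the article's naming. First, the per-socket callback (illustrative pseudocode following the article's names, not actual kernel source):

```c
/* pseudocode: each monitored socket has its own callback; when data
 * arrives, the socket "handles itself" instead of being polled */
void epoll_callback_sk(struct wait_entry_sk *entry)
{
    struct eventpoll *ep = entry->ep;     /* the owning epoll instance       */
    list_add_tail(&entry->epi->rdllink,
                  &ep->ready_list);       /* hang this socket on ready_list  */
    wake_up(&ep->single_epoll_wait_list); /* wake the process sleeping on
                                             epoll's own sleep queue         */
}
```

And the user-space consumer, which touches only the ready fds that epoll_wait returns (handle_ready is a hypothetical handler):

```c
#include <sys/epoll.h>

struct epoll_event events[64];
for (;;) {
    int nfds = epoll_wait(epfd, events, 64, -1); /* sleep on epoll's queue */
    for (int i = 0; i < nfds; i++)               /* iterate ready fds only */
        handle_ready(events[i].data.fd);
}
```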

The source is a very good article; it cannot be reproduced in full due to copyright, so here is the link:

https://cloud.tencent.com/developer/article/1005481
