Four I/O scheduling algorithms in the Linux kernel

The Linux kernel provides four I/O schedulers: the NOOP, Anticipatory, Deadline, and CFQ schedulers.


Most of the latency of disk reads and writes comes from moving the head to the target cylinder. To compensate for this delay, the kernel relies mainly on two strategies: caching and I/O scheduling algorithms.

This article gives a brief introduction to each of them.

Scheduling algorithm concepts

  1. When data blocks are written to or read from a device, the requests are placed in a queue to wait for completion.

  2. Each block device has its own queue.

  3. The I/O scheduler is responsible for maintaining the order of these queues so that the medium is used more efficiently. It turns out-of-order I/O operations into ordered ones.

  4. Before scheduling, the kernel must first determine how many requests are in the queue.


The I/O scheduler


The I/O scheduler is the mechanism the operating system uses to decide the order in which I/O operations are submitted to block devices. It serves two purposes: improving I/O throughput and reducing I/O response time. These two goals often conflict, so the I/O scheduler provides multiple scheduling algorithms to suit different I/O request scenarios. Among them, the most advantageous algorithm for the random read/write pattern of a database is DEADLINE.

The location of the IO scheduler in the kernel stack is as follows:

(Figure: the I/O scheduler's position in the kernel storage stack.)

The most painful aspect of a rotating block device is the mechanical seek, which is very time-consuming.
Each block device, or partition of one, has its own request queue (request_queue), and each request queue can select an I/O scheduler to coordinate the submitted requests. The basic purpose of the I/O scheduler is to arrange requests according to their sector numbers on the block device, reducing head movement and improving efficiency. Requests in each device's request queue are served in order. In fact, besides this queue, each scheduler maintains a number of queues of its own to process submitted requests; requests at the front of those queues are moved onto the request queue in due time to wait for service.


The main role of the I/O scheduler is to reduce disk head movement. It does this in two ways:

1. Merge
2. Sort

Each device has its own request queue, and all requests sit on that queue before being processed. When a new request arrives, if it is adjacent to an earlier request, the two can be merged into one. If no merge is possible, the request is sorted according to the disk's direction of rotation. In short, the I/O scheduler merges and sorts requests without unduly delaying the processing of any single request.
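The merge-and-sort behavior described above can be illustrated with a small toy model. This is a sketch under simplifying assumptions (requests are `(start, length)` sector ranges; function names are invented for illustration), not kernel code:

```python
# Toy model of the two core scheduler techniques: merging adjacent
# requests and keeping the queue sorted by sector.

def add_request(queue, start, length):
    """Insert a (start, length) request, merging with an adjacent one if possible."""
    for i, (s, l) in enumerate(queue):
        if s + l == start:          # new request begins where an old one ends
            queue[i] = (s, l + length)
            return
        if start + length == s:     # new request ends where an old one begins
            queue[i] = (start, l + length)
            return
    queue.append((start, length))
    queue.sort()                    # keep requests ordered by sector

queue = []
for start, length in [(100, 10), (50, 5), (110, 10), (55, 5)]:
    add_request(queue, start, length)

print(queue)  # [(50, 10), (100, 20)] — four requests merged into two, sorted
```

Four scattered requests collapse into two ordered ones, so the head sweeps across the platter once instead of bouncing back and forth.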

1. NOOP

(Figure: NOOP — a single FIFO queue.)
  1. What is NOOP? NOOP ("No Operation") is an I/O scheduling algorithm that does essentially nothing: requests are processed one by one in the order they arrive. This approach is simple and effective, but it produces many disk seeks, which is unacceptable for traditional rotating disks. For SSDs it works well, because an SSD has no rotating parts.

  2. NOOP is sometimes also referred to as an elevator scheduling algorithm.

  3. How does NOOP work?

I/O requests are placed in a FIFO queue and then executed in order:

When a new request arrives:

  1. If it can be merged with an existing request, merge it.
  2. If it cannot be merged, try to insert it in sorted order. If the requests already on the queue have waited too long, the new request may not jump ahead of them and can only go to the end; otherwise, insert it at a suitable position.
  3. If it can neither be merged nor inserted at a suitable position, place it at the end of the request queue.

  4. Applicable scenarios:

  4.1 Scenarios where you do not want the order of I/O requests to be altered;
  4.2 Devices with more intelligent scheduling below the I/O layer, such as NAS storage devices;
  4.3 I/O request streams that have already been carefully optimized by the upper-layer application;
  4.4 Disk devices without rotating heads, such as SSDs.
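NOOP's behavior, a FIFO with at most a tail merge, can be sketched as follows. This is an illustrative toy model with invented names, not kernel code:

```python
from collections import deque

# Minimal sketch of NOOP: merge with the request at the tail if the
# new request is contiguous with it, otherwise append; dispatch always
# takes from the head, in arrival order.

def noop_add(fifo, start, length):
    if fifo and fifo[-1][0] + fifo[-1][1] == start:  # contiguous with the tail: merge
        s, l = fifo[-1]
        fifo[-1] = (s, l + length)
    else:
        fifo.append((start, length))

def noop_dispatch(fifo):
    return fifo.popleft()  # strict first-in, first-out

fifo = deque()
for req in [(0, 8), (8, 8), (100, 4)]:
    noop_add(fifo, *req)

print(list(fifo))  # [(0, 16), (100, 4)]
```

Note that no sorting happens at all, which is exactly why NOOP suits SSDs and smart storage arrays but not bare rotating disks.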

2. CFQ (Completely Fair Queuing)

The CFQ (Completely Fair Queuing) algorithm is, as the name suggests, a completely fair algorithm. It tries to allocate a request queue and a time slice to every process competing for the block device. Within the time slice the scheduler grants it, a process can send its read and write requests to the underlying block device; when the slice ends, the process's request queue is suspended and waits to be scheduled again.

The length of each process's time slice and queue depends on its I/O priority. Every process has an I/O priority, and the CFQ scheduler uses it as one of the factors in deciding when the process's request queue gains the right to use the block device.

I/O priorities fall into three classes, from high to low:

RT (real time)
BE (best effort)
IDLE (idle)

RT and BE are each further divided into eight sub-priorities, which can be viewed and modified with ionice. The higher the priority, the earlier a process is served, the longer its time slice, and the more requests it may process in one turn.

As noted above, CFQ's fairness is per process, and only synchronous requests (reads and synchronous writes) belong to a specific process; these are placed in the process's own request queue. Asynchronous requests of the same priority, regardless of which process issued them, go into shared queues; there are 8 (RT) + 8 (BE) + 1 (IDLE) = 17 such asynchronous queues in total.

Since Linux 2.6.18, CFQ has been the default I/O scheduler. For a general-purpose server it is a good choice. Which scheduling algorithm to use should be decided by benchmarking your specific workload, not solely by what others have written.
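The per-process time-slicing idea behind CFQ can be sketched as a simple round-robin over per-process queues. This is an illustrative simplification (real CFQ weights slices by I/O priority; names here are invented):

```python
from collections import deque

# Toy sketch of CFQ's per-process fairness: each process has its own
# queue, and the scheduler cycles through them, dispatching up to
# slice_len requests from each queue per turn.

def cfq_dispatch(process_queues, slice_len):
    """Return the dispatch order as (pid, request) pairs."""
    order = []
    queues = {pid: deque(reqs) for pid, reqs in process_queues.items()}
    while any(queues.values()):
        for pid, q in queues.items():
            for _ in range(min(slice_len, len(q))):
                order.append((pid, q.popleft()))
    return order

queues = {"p1": ["a1", "a2", "a3"], "p2": ["b1"]}
print(cfq_dispatch(queues, slice_len=2))
# [('p1', 'a1'), ('p1', 'a2'), ('p2', 'b1'), ('p1', 'a3')]
```

Even though p1 submitted three times as many requests, p2's single request is served after p1's first slice rather than after p1's entire backlog, which is the fairness property CFQ aims for.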

3. DEADLINE

Building on CFQ's approach, DEADLINE solves the extreme case in which I/O requests are starved.

In addition to the sorted I/O queue, DEADLINE provides two extra FIFO queues, one for read I/O and one for write I/O.


By default, a request may wait at most 500 ms in the read FIFO queue and at most 5 s in the write FIFO queue (both limits are tunable).

Requests in the FIFO queues have higher priority than those in the sorted queue, and the read FIFO queue has higher priority than the write FIFO queue. The priority order can be expressed as:

FIFO(Read) > FIFO(Write) > CFQ

The deadline algorithm guarantees an upper bound on the latency of a given I/O request. From this point of view, it is well suited to DSS (decision support system) applications.

The deadline scheduler is essentially an improvement on the elevator algorithm:
1. It prevents some requests from waiting too long to be processed.
2. It distinguishes between read and write operations.

The deadline scheduler maintains three queues. Like the elevator, the first queue orders requests by physical location as far as possible. The second and third queues are ordered by time; the difference between them is that one holds read operations and the other holds writes.

The reason deadline distinguishes reads from writes is that its designers believed that when an application issues a read request, it usually blocks and waits for the result. A write, by contrast, usually goes to memory (the page cache) first and is written back to disk by a background process; the application generally continues without waiting for the write to complete. Reads should therefore have higher priority than writes.

Under this design, every new request is first placed in the first queue (using the same algorithm as the elevator) and is also appended to the tail of the read or write FIFO queue. The scheduler normally serves requests from the first queue, but checks whether the requests at the head of the second and third queues have waited too long; once a request exceeds its threshold, it is served immediately. The threshold is 500 ms for read requests and 5 s for write requests.
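The dispatch rule just described can be sketched in a few lines. This is an illustrative toy model (invented names; real deadline batches requests and tracks per-request deadlines in jiffies):

```python
# Toy sketch of deadline's dispatch rule: serve the sector-sorted queue
# normally, but a request whose deadline has passed jumps the line,
# with expired reads beating expired writes.

READ_EXPIRE_MS = 500    # default read deadline
WRITE_EXPIRE_MS = 5000  # default write deadline

def pick_next(sorted_queue, read_fifo, write_fifo, now_ms):
    # FIFO entries are (request, enqueue_time_ms); head is the oldest.
    if read_fifo and now_ms - read_fifo[0][1] >= READ_EXPIRE_MS:
        return ("read_fifo", read_fifo[0][0])
    if write_fifo and now_ms - write_fifo[0][1] >= WRITE_EXPIRE_MS:
        return ("write_fifo", write_fifo[0][0])
    return ("sorted", sorted_queue[0])

sorted_queue = ["sector_10", "sector_20"]
read_fifo = [("sector_90", 0)]
write_fifo = [("sector_50", 0)]

print(pick_next(sorted_queue, read_fifo, write_fifo, now_ms=100))  # ('sorted', 'sector_10')
print(pick_next(sorted_queue, read_fifo, write_fifo, now_ms=600))  # ('read_fifo', 'sector_90')
```

At 100 ms nothing has expired, so the head of the sorted queue is served; at 600 ms the read at sector 90 has exceeded its 500 ms deadline and preempts the sorted order.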

Personally, I think that for a partition holding database change logs, such as Oracle's online redo logs or MySQL's binlog, it may be best not to use this scheduler: such write requests are usually issued with fsync, and the application's performance suffers until the write completes.

4. ANTICIPATORY

CFQ and DEADLINE focus on serving scattered I/O requests; they do nothing special for continuous I/O such as sequential reads.

To handle both random and sequential I/O, Linux also provides the ANTICIPATORY scheduler. Building on DEADLINE, ANTICIPATORY adds a 6 ms waiting window after each read: if the OS receives another read request for a nearby location within those 6 ms, it can be satisfied immediately.
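The anticipation idea can be sketched as a simple predicate. This is an illustrative toy model; the window length comes from the text above, while the "nearby" distance and all names are assumptions made for the sketch:

```python
# Toy sketch of anticipatory scheduling: after a read completes, keep
# the head idle for up to WINDOW_MS; if another read for a nearby
# sector arrives within the window, serve it before seeking away.

WINDOW_MS = 6        # anticipation window from the text above
NEARBY_SECTORS = 64  # assumed "nearby" distance for this sketch

def should_serve_now(last_read_sector, last_read_done_ms, new_sector, now_ms):
    within_window = now_ms - last_read_done_ms <= WINDOW_MS
    nearby = abs(new_sector - last_read_sector) <= NEARBY_SECTORS
    return within_window and nearby

print(should_serve_now(1000, 0, 1010, now_ms=4))   # True: nearby read inside the window
print(should_serve_now(1000, 0, 9000, now_ms=4))   # False: too far away
print(should_serve_now(1000, 0, 1010, now_ms=10))  # False: window expired
```

The bet is that sequential readers issue their next request almost immediately, so a few milliseconds of deliberate idling saves a round-trip seek.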

Summary

The choice of IO scheduler algorithm depends on both hardware characteristics and application scenarios.

On traditional SAS disks, CFQ, DEADLINE, and ANTICIPATORY are all reasonable choices; for a dedicated database server, DEADLINE offers good throughput and response time. On solid-state storage such as SSDs and Fusion-io devices, however, the simplest algorithm, NOOP, may be the best: the other three optimize by shortening seek time, and a solid-state drive has no seek time, so its I/O response time is already very short.


Extra: I/O schedulers in SUSE Linux Enterprise Server

SUSE Linux Enterprise Server 11 SP1 provides a number of I/O scheduler alternatives to optimize for different I/O usage patterns. You can use the elevator= option at boot time to set the scheduler for I/O devices or you can assign a specific I/O scheduler to individual block devices.
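On a running system, the scheduler assigned to a block device is exposed in `/sys/block/<dev>/queue/scheduler` as a space-separated list with the active scheduler in brackets, e.g. `noop deadline [cfq]`. A small helper to parse that format (the sample string below is illustrative; on a real system you would read the sysfs file, and write a scheduler name into it to switch):

```python
# Parse the sysfs scheduler string, e.g. the contents of
# /sys/block/sda/queue/scheduler: "noop deadline [cfq]".

def parse_scheduler(sysfs_text):
    """Return (active_scheduler, all_schedulers) from the sysfs string."""
    names = sysfs_text.split()
    active = next(n.strip("[]") for n in names if n.startswith("["))
    return active, [n.strip("[]") for n in names]

sample = "noop deadline [cfq]"
print(parse_scheduler(sample))  # ('cfq', ['noop', 'deadline', 'cfq'])
```

To change the scheduler for one device you would, as root, echo a name into the same file (for example `echo deadline > /sys/block/sda/queue/scheduler`), or set it globally at boot with the `elevator=` option mentioned above.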

Completely Fair Queuing (CFQ) scheduler

The Completely Fair Queuing (CFQ) scheduler is the default I/O scheduler for SUSE Linux Enterprise Server 11 SP1. The CFQ scheduler maintains a scalable per-process I/O queue and attempts to distribute the available I/O bandwidth equally among all I/O requests. The effort of balancing I/O requests has some CPU cost.

Deadline scheduler

The Deadline scheduler is one alternative to the CFQ scheduler. The deadline scheduler uses a deadline algorithm to minimize I/O latency by attempting to guarantee a start time for an I/O request. The scheduler attempts to be fair among multiple I/O requests and to avoid process starvation. This scheduler will aggressively reorder requests to improve I/O performance.

NOOP scheduler

The NOOP scheduler is another alternative that can help minimize the CPU cost of managing the I/O queues. The NOOP scheduler is a simple FIFO queue that uses a minimal number of CPU instructions per I/O operation to accomplish the basic merging and sorting needed to complete the I/O operations.

I/O scheduler test results

In this test, twenty-three of the spinning hard disks attached to System B are replaced by fifty-two 73 GB SAS SSD devices for main database space.

Table 1 compares database performance with the three different I/O schedulers across all of the workloads. In the mixed read/write workloads (2 and 4), the NOOP scheduler has a negative impact on performance and so should not be used for those workloads. The Deadline scheduler shows a performance benefit in the smaller workloads on the lower-performing storage, while having no impact on the larger workloads with better-performing storage.

Note

Values shown in the following table are the Performance metric result.

(Table 1: performance metric results by scheduler and workload; image not reproduced.)

I/O scheduler conclusions

For the workloads used in this test, the Deadline scheduler was a better overall choice than the default CFQ scheduler. It yielded higher throughput in the smaller workloads and performed on par with the CFQ scheduler in the larger workloads. While these particular workloads performed best with the Deadline scheduler, not all workloads benefit equally from the same scheduler. If optimal performance is a requirement, it is worthwhile to investigate the benefits of each I/O scheduler for your own workloads.



Origin blog.csdn.net/universsky2015/article/details/105531343