08 - Optimizing the I/O model for network communication: how to solve the I/O bottleneck under high concurrency?

When it comes to Java I/O, you are surely already familiar with it: you use I/O operations to read and write files, or to implement data transmission over sockets. These are the I/O-related operations we encounter most often in a system.

We all know that I/O is much slower than memory access. In today's era of big data, I/O performance problems are especially prominent, and in many application scenarios I/O has become a system performance bottleneck that cannot be ignored.

Today, let's take a deeper look at the performance problems Java I/O exposes in high-concurrency and big-data scenarios, trace them to their source, and learn the optimization methods.

1. What is I/O

I/O is the main channel through which machines obtain and exchange information, and streams are the main way I/O operations are carried out.

In computing, a stream is an ordered sequence of data. Relative to a given machine or application, the information it receives from the outside world is called an input stream (InputStream), and the information it sends out is called an output stream (OutputStream); together they are referred to as input/output streams (I/O streams).

When machines or programs exchange information, objects or data are first converted into some form of stream, transmitted as a stream, and then converted back into objects or data once they reach the target machine or program. A stream can therefore be regarded as a data carrier through which data exchange and transmission are realized.

Java's I/O operation classes live in the java.io package, where InputStream, OutputStream, Reader, and Writer are the four basic abstract classes, handling byte streams and character streams respectively. As shown below:

Looking back, I remember having a question when I first read the Java I/O stream documentation, which I'll share with you here: "Whether for file reads and writes or for network sends and receives, the smallest unit of storage is the byte, so why are I/O stream operations divided into byte stream operations and character stream operations?"

We know that characters must ultimately be transcoded into bytes, and this process is time-consuming; moreover, if we don't know the encoding type, garbled characters easily result. Therefore, the I/O library provides interfaces that operate directly on characters, which makes everyday character stream operations convenient. Let's look at the "byte stream" and the "character stream" in turn.
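To make the distinction concrete, here is a minimal sketch (class and method names are my own). It decodes the same bytes through a character stream with an explicit charset, which is exactly how the transcoding pitfall above is avoided:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class StreamDemo {
    // InputStreamReader bridges a byte stream to a character stream,
    // applying the given charset so no garbled characters appear.
    public static String decodeWithReader(byte[] data) throws IOException {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "héllo".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                     // 6 bytes on the wire
        System.out.println(decodeWithReader(utf8).length()); // 5 characters after decoding
    }
}
```

Note that the byte count (6) and the character count (5) differ because "é" occupies two bytes in UTF-8; this is precisely why character streams carry the charset knowledge for you.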

1.1, byte stream

InputStream/OutputStream are the abstract classes for byte streams. These two abstract classes have several subclasses, each handling a different kind of operation: for file reads and writes, use FileInputStream/FileOutputStream; for reading from and writing to in-memory byte arrays, use ByteArrayInputStream/ByteArrayOutputStream; to add buffering on top of another stream and reduce the number of underlying reads and writes, use BufferedInputStream/BufferedOutputStream. The specific content is shown in the figure below:
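As a small illustration of these subclasses working together (the `copy` helper and file names are hypothetical), the following sketch copies a file through FileInputStream/FileOutputStream wrapped in the buffered variants:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteStreamCopy {
    // Copy a file byte-stream style; the Buffered* wrappers batch the
    // underlying read()/write() calls instead of hitting the OS per chunk.
    public static long copy(Path src, Path dst) throws IOException {
        long total = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(src.toFile()));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(dst.toFile()))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".bin");
        Path dst = Files.createTempFile("dst", ".bin");
        Files.write(src, new byte[]{1, 2, 3, 4});
        System.out.println(copy(src, dst)); // 4
    }
}
```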

1.2, character stream

Reader/Writer are the abstract classes for character streams. These two abstract classes likewise have several subclasses, each handling a different kind of operation. The specific content is shown in the following figure:
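For example, BufferedReader (a Reader subclass) lets us process text line by line. This is a minimal sketch with a hypothetical `countLines` helper:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CharStreamDemo {
    // BufferedReader wraps any Reader and exposes readLine(),
    // a character-level convenience a raw byte stream cannot offer.
    public static int countLines(String text) throws IOException {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            int lines = 0;
            while (br.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines("a\nb\nc")); // 3
    }
}
```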

2. Performance issues of traditional I/O

We know that I/O operations are divided into disk I/O and network I/O. The former reads a data source from disk into memory and persists information from memory back to the physical disk; the latter reads information from the network into memory and writes information from memory out to the network. But whether disk I/O or network I/O, traditional I/O has serious performance problems.

2.1, multiple memory copies

In traditional I/O, we read a data stream from the source into a buffer via InputStream and write data out to an external device (disk or network) via OutputStream. First, look at the specific steps of an input operation at the operating-system level, as shown in the following figure:

  • The JVM issues a read() system call, initiating a read request to the kernel;
  • The kernel sends a read command to the hardware and waits for the data to be ready;
  • The kernel copies the data that was read into its kernel buffer;
  • The kernel copies the data from the kernel buffer to the user-space buffer, and the read() system call returns.

In this process, the data is copied first from the external device into kernel space and then from kernel space into user space, which amounts to two memory copies. Along with the extra copy come context switches between user mode and kernel mode, and both reduce I/O performance.

2.2, blocking

In traditional I/O, InputStream's read() is a blocking operation: it waits for data and does not return until the data is ready. This means that if no data is ready, the read call stays suspended and the user thread is blocked.
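This blocking behavior can be observed with a short sketch (the `blockingEcho` helper and its delays are my own construction): the server-side read() suspends its thread until the deliberately delayed client finally writes.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class BlockingReadDemo {
    // The server thread blocks inside read() until the client writes —
    // exactly the per-connection blocking that costs one thread per client.
    public static String blockingEcho(String msg) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral port
            Thread client = new Thread(() -> {
                try (Socket s = new Socket("localhost", server.getLocalPort())) {
                    Thread.sleep(100); // while we sleep, the server's read() is blocked
                    s.getOutputStream().write(msg.getBytes(StandardCharsets.UTF_8));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            client.start();
            try (Socket conn = server.accept()) {
                byte[] buf = new byte[msg.getBytes(StandardCharsets.UTF_8).length];
                int off = 0;
                while (off < buf.length) { // read() blocks when no data is ready
                    int n = conn.getInputStream().read(buf, off, buf.length - off);
                    if (n == -1) break;
                    off += n;
                }
                client.join();
                return new String(buf, 0, off, StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(blockingEcho("hi")); // prints "hi", but only after the client's delay
    }
}
```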

With a small number of connection requests, this approach is fine and response times are good. But when a large number of connection requests arrive, a listening thread must be created for each one; if a thread has no data ready, it is suspended and enters the blocked state. Large numbers of threads being blocked, woken, and rescheduled compete for CPU resources, causing many context switches and increasing system performance overhead.

3. How to optimize I/O operations

In the face of these two performance problems, both programming languages and operating systems have further optimized I/O. JDK 1.4 introduced the java.nio package (short for "new I/O"), whose release alleviated the serious performance problems caused by memory copies and blocking. JDK 1.7 then released NIO2, which added asynchronous I/O implemented at the operating-system level. Let's look at the concrete optimizations.

3.1. Use buffers to optimize read and write stream operations

Traditional I/O provides stream-based implementations, InputStream and OutputStream, which process data in units of bytes.

NIO is different from traditional I/O: it is block-based (Block) and processes data with blocks as the basic unit. The two most important components in NIO are the buffer (Buffer) and the channel (Channel). A Buffer is a contiguous block of memory and is the staging area for NIO reads and writes. A Channel represents the source or destination of the buffered data and is the interface through which the buffer is read or written.

The biggest difference between traditional I/O and NIO is that traditional I/O is stream-oriented while NIO is Buffer-oriented: a Buffer can read a file into memory in one go and process it afterwards, while the traditional approach processes data as it reads. Although traditional I/O also offers buffered wrappers such as BufferedInputStream, it still does not match NIO. Replacing traditional I/O with NIO can improve the overall performance of the system, and the effect is immediate.
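The Buffer-oriented style looks like this in practice — a minimal sketch (the `readAll` helper is hypothetical) that pulls a whole file through a FileChannel into a ByteBuffer:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioReadDemo {
    // Read the whole file into a ByteBuffer via a FileChannel (block-oriented),
    // instead of pulling bytes one at a time from a stream.
    public static byte[] readAll(Path path) throws Exception {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep filling until the buffer is full or EOF
            }
            buf.flip(); // switch the buffer from writing mode to reading mode
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("nio", ".txt");
        Files.write(p, "block".getBytes());
        System.out.println(new String(readAll(p))); // block
    }
}
```

The flip() call is the characteristic Buffer idiom: the same memory block is first the write target and then, after flipping, the read source.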

3.2. Use DirectBuffer to reduce memory copying

In addition to the buffer-block optimization, NIO's Buffer also provides DirectBuffer, a class that can access physical memory directly. An ordinary Buffer is allocated in the JVM heap, while a DirectBuffer is allocated directly in native (off-heap) memory.

We know that to write data to an external device, it must first be copied from user space into kernel space and then on to the device. A DirectBuffer shortens this path: because its memory lives outside the JVM heap, the data does not need an intermediate copy out of heap memory before the kernel-space-to-device transfer, reducing one data copy.

To expand on this: since a DirectBuffer allocates non-JVM (native) memory, the cost of creating and destroying one is very high. The memory a DirectBuffer requests is not directly managed by JVM garbage collection; instead, when the DirectBuffer wrapper object is collected, the native memory block is released through Java's Reference mechanism.
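The two allocation styles are one method call apart; this small sketch shows the difference and, in the comments, the pooling practice that follows from the high allocation cost just described:

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] inside the JVM heap.
        ByteBuffer heap = ByteBuffer.allocate(16);
        // Direct buffer: allocated in native memory outside the heap; channels
        // can hand it to the OS without an extra heap-to-native copy.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        System.out.println(heap.isDirect());   // false
        System.out.println(direct.isDirect()); // true
        // Because allocation and release are expensive, direct buffers are
        // typically allocated once and pooled/reused rather than created
        // per request.
    }
}
```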

3.3. Avoid blocking and optimize I/O operations

Many people also call NIO "Non-blocking I/O", because that name better reflects its characteristics. Why is that?

Even when traditional I/O uses buffer blocks, the blocking problem remains. Since the number of threads in a thread pool is limited, once a large number of concurrent requests arrive, requests beyond the maximum thread count can only wait until an idle thread becomes available for reuse. And when reading a Socket's input stream, the read blocks until one of the following three situations occurs:

  • there is data to read;
  • the connection is released (closed);
  • a null pointer or I/O exception occurs.

Blocking is the biggest drawback of traditional I/O. With the release of NIO, two fundamental components, the channel and the multiplexer, made NIO non-blocking. Let's look at the optimization principles of these two components.

3.3.1. Channel (Channel)

As we discussed earlier, traditional I/O copies data back and forth between user space and kernel space, while the data in kernel space is read from or written to disk through the operating system's I/O interface.

At first, when an application called the operating system's I/O interface, the CPU itself performed the data transfer. The biggest problem with this approach is that "when a large number of I/O requests occur, a great deal of CPU is consumed." Later, operating systems introduced DMA (Direct Memory Access): transfers between kernel space and the disk are handled entirely by the DMA controller. But this approach still needs to request permission from the CPU and must occupy the bus to complete the data copy; heavy DMA traffic can cause bus contention.

The emergence of the channel solves these problems. A channel has its own processor and can complete I/O operations between kernel space and the disk on its own. In NIO, we read and write data through a Channel; since a Channel is bidirectional, it can be used for both reading and writing.
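A Channel also exposes transfer operations that let the OS move data channel-to-channel on our behalf. This sketch (the `transfer` helper is hypothetical) uses FileChannel.transferTo, which on Linux may be implemented with sendfile and can skip the user-space copy entirely:

```java
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferDemo {
    // transferTo asks the OS to move bytes directly from one channel to
    // another; the data need not pass through a user-space buffer at all.
    public static long transfer(Path src, Path dst) throws Exception {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            long pos = 0;
            long size = in.size();
            while (pos < size) { // transferTo may move fewer bytes than asked
                pos += in.transferTo(pos, size - pos, out);
            }
            return pos;
        }
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("src", ".bin");
        Files.write(src, new byte[]{7, 8, 9});
        Path dst = Files.createTempFile("dst", ".bin");
        System.out.println(transfer(src, dst)); // 3
    }
}
```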

3.3.2, multiplexer (Selector)

Selector is the basis of Java NIO programming. It is used to check whether the status of one or more NIO Channels is readable and writable.

Selector is event-driven. We can register events such as accept and read with a Selector, which continuously polls the Channels registered on it. When a monitored event occurs on a Channel, that Channel enters the ready state, and the I/O operation can then proceed.

A single thread uses one Selector to listen, by polling, for events on many Channels. A Channel can be set to non-blocking mode when it is registered, so when a Channel has no I/O activity the thread does not wait on it forever but moves on to poll the other Channels, avoiding blocking.

At present, the I/O multiplexing mechanism of the operating system (on Linux) uses epoll. Compared with the traditional select mechanism, epoll has no limit of 1024 maximum connection handles, so a Selector can in theory poll thousands of clients.
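The pieces above fit together in a single select loop. The following is a deliberately minimal sketch (the `serveOnce` helper, one client, one round of traffic, all my own simplifications): one thread, one Selector, with accept and read events for every connection multiplexed through the same loop.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class SelectorDemo {
    public static String serveOnce(String msg) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));      // ephemeral port
        server.configureBlocking(false);            // required before register()
        server.register(selector, SelectionKey.OP_ACCEPT);
        int port = server.socket().getLocalPort();

        // A plain blocking client on another thread, standing in for "many clients".
        Thread client = new Thread(() -> {
            try (Socket s = new Socket("localhost", port)) {
                s.getOutputStream().write(msg.getBytes(StandardCharsets.UTF_8));
                s.shutdownOutput();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        client.start();

        StringBuilder received = new StringBuilder();
        boolean done = false;
        while (!done) {
            selector.select();                      // blocks this one thread, not one per connection
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {           // a new connection is ready
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {      // data is ready on this channel
                    SocketChannel ch = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    int n = ch.read(buf);
                    if (n > 0) {
                        buf.flip();
                        received.append(StandardCharsets.UTF_8.decode(buf));
                    } else if (n == -1) {           // client closed its side
                        ch.close();
                        done = true;
                    }
                }
            }
        }
        client.join();
        server.close();
        selector.close();
        return received.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serveOnce("ping")); // ping
    }
}
```

A production loop would keep running, register OP_WRITE when needed, and handle partial reads per connection, but the structure — register, select, dispatch on ready keys — is the same.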

3.3.3 Examples

Let me use a life-like scene as an example. After reading it, you will be more aware of what roles and functions Channel and Selector play in non-blocking I/O.

We can compare listening to multiple I/O connection requests to the entrance of a train station. In the past, only passengers on the nearest departure train could enter the station in advance, and there was only one ticket inspector. At this time, if passengers from other trains wanted to enter the station, they had to queue at the station entrance. This is equivalent to the earliest I/O operations that did not implement the thread pool.

Later, the railway station was upgraded, and several more ticket gates were added, allowing passengers of different trains to enter the station through their corresponding ticket gates. This is equivalent to creating multiple listening threads with multi-threading, and monitoring the I/O requests of each client at the same time.

In the end, the railway station was upgraded again to accommodate more passengers. Each train can carry more passengers, the trains are scheduled sensibly, and passengers no longer queue in separate groups: they enter through one large, unified ticket gate, which can check tickets for multiple trains at the same time. This large ticket gate is equivalent to the Selector, each train is equivalent to a Channel, and the passengers are equivalent to the I/O streams.

4. Summary

Java's traditional I/O was initially implemented on two stream classes, InputStream and OutputStream, which operate byte by byte. In high-concurrency, big-data scenarios this easily leads to blocking, so its performance is very poor. In addition, output data is copied from user space into kernel space and then to the output device, which increases the system's performance overhead.

Traditional I/O later used buffering to ease its performance problems, treating a buffer block as the smallest unit of work, but overall performance was still unsatisfactory.

So NIO was released: stream operations based on buffer blocks. On top of Buffer, two new components, the channel and the multiplexer (Selector), were added to realize non-blocking I/O. NIO is well suited to scenarios with large numbers of I/O connection requests, and these three components together improve the overall performance of I/O.

5. Thinking questions

In JDK 1.7, Java released the NIO upgrade package NIO2, that is, AIO. AIO implements asynchronous I/O in the true sense, handing I/O operations directly to the operating system for asynchronous processing. This is also an optimization of I/O operations, so why do many container communication frameworks still use NIO?
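As context for thinking about this, here is what the AIO style looks like in a minimal sketch (the `readAsync` helper and file are my own): the read call returns immediately with a Future, and the actual I/O is carried out behind the scenes.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AioDemo {
    // ch.read() does not block: it hands the read off and returns a Future.
    // We block on get() here only to collect the result for the demo.
    public static String readAsync(Path p) throws Exception {
        try (AsynchronousFileChannel ch =
                     AsynchronousFileChannel.open(p, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) Files.size(p));
            Future<Integer> pending = ch.read(buf, 0); // returns immediately
            pending.get();                             // wait for completion
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("aio", ".txt");
        Files.write(p, "async".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAsync(p)); // async
    }
}
```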


Origin blog.csdn.net/qq_34272760/article/details/132323633