Whether it's the JDK or Linux, I, Netty, sit tight

Hello, my name is yes.

Today's topic is an old one: the JDK Selector empty-polling bug.

Let's take a brief look at it. The bug dates back to 2006, and starting from it, we'll branch out a bit to see what lessons it holds for us.

Looking back

I've been writing a Netty series lately. If you've read about Netty online, almost every article mentions the JDK NIO empty-polling bug on Linux: when you call Selector.select(timeout) and no event occurs, it should block for up to timeout, but instead it returns immediately. The loop around it then spins without doing any work, driving the CPU to 100%.

Here's the rough cause. When a connection is suddenly interrupted, poll and epoll are woken up by a POLLHUP or POLLERR event, so the Selector wakes up too. But the JDK Selector has no corresponding event to handle, because the JDK only defines CONNECT, READ, WRITE and ACCEPT (SelectionKey.OP_CONNECT, OP_READ, OP_WRITE and OP_ACCEPT).

Then it loops again... no events to process, then loops again... no events to process, then loops again...

Round and round it goes, and the empty polling pins the CPU at 100%.
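For readers who haven't written raw NIO code, here is a minimal sketch of the classic select loop, marking where the empty polling happens. The class name SelectLoop and the bounded iteration count are my own additions, only there so the loop can be exercised. On a healthy system, select(timeout) with nothing ready waits out the timeout; with the bug it returns 0 at once, so the `ready == 0` branch runs back to back.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

final class SelectLoop {
    /**
     * Runs at most maxIterations rounds of a classic NIO event loop and
     * returns how many rounds selected nothing (the "empty polls").
     */
    static int run(Selector selector, long timeoutMillis, int maxIterations) throws IOException {
        int emptyRounds = 0;
        for (int i = 0; i < maxIterations; i++) {
            // Should block for up to timeoutMillis when no channel is ready.
            int ready = selector.select(timeoutMillis);
            if (ready == 0) {
                // With the bug, select() keeps returning 0 immediately,
                // so an unbounded loop would spin here at 100% CPU.
                emptyRounds++;
                continue;
            }
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove(); // selected keys must be removed by hand
                if (key.isValid() && key.isReadable()) {
                    // read from key.channel() here
                }
            }
        }
        return emptyRounds;
    }
}
```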

This bug is really old news. I checked the JDK bug database and found no related reports after March 2013. According to the official statements, it also depends on the Linux kernel version, so presumably the problem no longer occurs today (though I'm not sure).

Let's review the history of the bug library.

The earliest report I found is from March 24, 2006: on Linux, the Selector (backed by poll or epoll) does not block.

Note how long the resolution took: reported in 2006, it was only marked as fixed in 2007, about a year later. A workaround, however, was given on the day it was reported:


The workaround is simple and crude: discard the Selector that no longer blocks and create a new one to replace it. (Sometimes simple and crude is best.)
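The "replace the Selector" workaround can be sketched roughly as follows. This is my own illustration, not the JDK's or Netty's code, and it assumes a single thread owns the selector (nothing else is calling select() on it concurrently); SelectorRebuilder is a hypothetical name. The key point is that each channel's interest ops and attachment must be carried over to the new Selector before the old one is closed.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class SelectorRebuilder {
    /**
     * Moves every valid registration from oldSelector to a freshly opened Selector,
     * keeping interest ops and attachments, then closes oldSelector.
     * Assumes the caller is the only thread using both selectors.
     */
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        for (SelectionKey key : oldSelector.keys()) {
            if (!key.isValid()) {
                continue; // channel closed or key already cancelled: nothing to move
            }
            SelectableChannel channel = key.channel();
            int interestOps = key.interestOps();
            Object attachment = key.attachment();
            key.cancel(); // drop the registration on the broken selector
            channel.register(newSelector, interestOps, attachment);
        }
        oldSelector.close(); // deregisters channels without closing them
        return newSelector;
    }
}
```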

Searching further, I found that a Selector bug was still being reported in 2008 (wasn't it supposed to be fixed?), and that report wasn't resolved until 2013! The final outcome was that it would not be fixed; it concerned JDK 6.


There is also a similar report from 2013, against JDK 7, and its final resolution was "Incomplete"!

Judging from these resolution times and outcomes, my personal inference is that the JDK team was passive about this bug, regarding it as a symptom of Linux's own bugs (programmers habitually passing the buck, as usual).

This can be seen from the evaluation of JDK-6670302:

Roughly, it says: just upgrade the Linux kernel (version 2.4 has this problem). Kernel 2.6 has been out for four years, so a fix isn't necessary, and there is very little demand for one.

This is actually understandable.

From a JDK developer's perspective: my code runs fine on Windows, but not on your Linux? And that's my problem? Why does Linux hand me such inexplicable events? Why wake me up just because a connection was interrupted?

But from a Linux developer's point of view, it's different: huh? You're blaming me? You clearly didn't account for this special case when you wrote your code, so why put the blame on me?

As for us Java developers: why should I care? If there's a bug, it's yours. I don't care what's going on inside the JDK, you have to fix it for me! (I suspect JDK developers feel the same way about Linux developers.)

Hahaha, isn't that how it goes?

In short, I personally think this bug keeps getting rehashed online because Netty found a roundabout workaround that genuinely shielded applications deployed on Linux at the time from the Java Selector's empty-polling bug, which made it worth mentioning whenever Netty came up.

The second reason is that articles online copy one another a lot; you know how it is.

By the way, although the bug database shows no similar reports after that, some articles online claim the bug can still be reproduced on JDK 8!

Link: https://juejin.cn/post/6844903491505242119

Tsk tsk, as the saying goes, it's better to rely on yourself than on others. Netty works around this bug on its own, using exactly the simple and crude approach mentioned above!

Netty: An empty loop, huh? I'll count how many times you loop. Once it reaches a certain number, I'll assume you've hit the bug, so I'll build a new Selector and throw away the old, broken one! That way, whether the problem lies with the JDK or with Linux, I sit tight this round!

That's Netty's solution~ So strictly speaking, Netty didn't fix the JDK NIO bug; it simply takes a roundabout route that sidesteps it.
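The counting idea can be isolated into a small helper. This is a hypothetical sketch of the idea, not Netty's actual code, and EmptyPollDetector is a made-up name: it counts consecutive select() calls that returned no keys before their timeout elapsed and signals a rebuild once a threshold is reached. (In Netty itself, as far as I know, the threshold defaults to 512 and can be changed with the io.netty.selectorAutoRebuildThreshold system property.)

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: decides when a Selector looks stuck in empty polling. */
final class EmptyPollDetector {
    private final int threshold;
    private int prematureReturns;

    EmptyPollDetector(int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
    }

    /**
     * Call after every select(timeoutMillis).
     * @return true when the caller should rebuild its Selector
     */
    boolean onSelectReturned(int selectedKeys, long elapsedNanos, long timeoutMillis) {
        boolean waitedFullTimeout = elapsedNanos >= TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        if (selectedKeys > 0 || waitedFullTimeout) {
            prematureReturns = 0; // real work or a genuine timeout: selector looks healthy
            return false;
        }
        prematureReturns++; // returned early with nothing to do
        if (prematureReturns >= threshold) {
            prematureReturns = 0; // start counting afresh on the rebuilt selector
            return true;
        }
        return false;
    }
}
```

In an event loop you would record System.nanoTime() before select(timeoutMillis), pass the elapsed time and the returned key count to onSelectReturned, and rebuild the Selector when it returns true. A real loop should also avoid counting early returns caused by wakeup() or thread interruption, which this sketch does not distinguish.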

This also offers a lesson for everyday development. Sometimes it's better to rely on ourselves than on others. We should treat second-party and third-party interfaces with skepticism and not trust them too much, especially third-party ones. Be prepared for the other side to go down or return strange results.

I once integrated with an API from a big tech company whose return value changed silently, without any announcement or notice, and it was the kind of value you'd assume could never change. For example, an interface that returns a city name normally returned "Hangzhou City", and one day it inexplicably started returning just "Hangzhou". Of course, that's only an illustration; I can't say what the actual field was.

There was also an API from another big company. At the time their service was barely alive: responses were slow and often timed out. I learned something from this one. I had initially set the response timeout to 3s, but because of their service problems, responses often took longer than 3s. I filed a ticket with them, and they actually asked me to raise the timeout to 10s. Setting it to 10s would just mean letting my service go down along with theirs, wouldn't it?

In this situation, don't just do what the other side says. Think of it as something dragging down your application: the longer the timeout, the longer each thread stays tied up, so other requests have no threads left to handle them. Requests then pile up, and eventually your whole application crashes.
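A minimal sketch of the alternative: keep a short, hard deadline on the call and fall back instead of waiting. BoundedCall and callWithTimeout are hypothetical names, and the fallback stands in for whatever degraded answer your business can accept. Note that cancel(true) only frees the worker thread if the remote call responds to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical wrapper: call a slow dependency with a hard deadline and a fallback value. */
final class BoundedCall {
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> remoteCall,
                                 long timeout, TimeUnit unit, Supplier<T> fallback) {
        Future<T> future = pool.submit(remoteCall);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so the thread can be reused
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get(); // the remote call threw: degrade instead of propagating
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for our caller
            future.cancel(true);
            return fallback.get();
        }
    }
}
```

For HTTP calls specifically, it also helps to set a timeout on the client itself (for example HttpRequest.Builder.timeout in java.net.http) so the underlying request is actually abandoned rather than left running.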

Frankly, I'm amazed they came up with that reply. If you run into something similar and aren't confident about it yourself, discuss it promptly with a colleague or your lead instead of naively following their suggestion.

All in all, be extra careful with second-party and third-party interfaces, and make sure to add null checks, fallbacks (degradation), and so on. I've noticed that some newer colleagues dislike writing null checks because they think one more if looks ugly. Tsk tsk, young and naive, never been burned!
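As a small illustration of the "one more if" habit, here is a hypothetical helper that defensively reads a city-name field from a third-party response: it treats null and blank values as missing, tolerates both "Hangzhou City" and "Hangzhou", and lets the caller supply a fallback. The field, the class name CityField and the suffix rule are assumptions made up for this example.

```java
import java.util.Optional;

/** Hypothetical defensive reader for a city-name field from a third-party API. */
final class CityField {
    private static final String SUFFIX = " City";

    /** Returns a normalized city name, or empty if the field is missing or blank. */
    static Optional<String> normalize(String raw) {
        if (raw == null) {
            return Optional.empty(); // the "extra if" that prevents an NPE further down
        }
        String s = raw.trim();
        // Tolerate both "Hangzhou City" and "Hangzhou" (case-insensitive suffix match).
        int start = s.length() - SUFFIX.length();
        if (s.regionMatches(true, start, SUFFIX, 0, SUFFIX.length())) {
            s = s.substring(0, start).trim();
        }
        return s.isEmpty() ? Optional.empty() : Optional.of(s);
    }

    /** Falls back to a default instead of failing when the field is unusable. */
    static String cityOrDefault(String raw, String fallback) {
        return normalize(raw).orElse(fallback);
    }
}
```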

So, what's the most infuriating thing you've run into with a third party? Share it in the comments so we can all have a laugh?

Alright, that's all for today~

I'm yes, growing from a little bit to a whole lot. See you in the next post~


Origin blog.csdn.net/yessimida/article/details/121998828