jvm中的stop the world 在日志中的表现

原文讲述了jvm发生STW可能的原因(不仅是GC),以及在日志中的体现(主要是对gc.log中,total time for which application threads were stopped这行的解释)。

链接:https://dzone.com/articles/logging-stop-world-pauses-jvm

下附原文。

Logging Stop-the-world Pauses in JVM


Different events can cause the JVM to pause all the application threads. Such pauses are called Stop-The-World (STW) pauses. The most common cause for an STW pause to be triggered is garbage collection (example in github) , but different JIT actions (example), biased lock revocation (example), certainJVMTI operations , and many more also require the application to be stopped.

The points at which the application threads may be safely stopped are called, surprise, safepoints. This term is also often used to refer to all the STW pauses.

It is more or less common that GC logs are enabled. However, this does not capture information on all the safepoints. To get it all, use these JVM options:

1.  -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime

If you are wondering about the naming explicitly referring to GC, don’t be alarmed – turning on these options logs all of the safepoints, not just garbage collection pauses. If you run a following example (source in github) with the flags specified above

01.  public class FullGc {
02.    private static final Collection<Object> leak =new ArrayList<>();
03.    private static volatile Object sink;
04.   
05.    public static void main(String[] args) {
06.      while (true) {
07.        try {
08.          leak.add(new byte[1024 1024]);
09.          sink = new byte[1024 1024];
10.        catch (OutOfMemoryError e) {
11.          leak.clear();
12.        }
13.      }
14.    }
15.  }

you should see entries similar to the following in your standard output:

扫描二维码关注公众号,回复: 8666692 查看本文章
1.  Application time: 0.3440086 seconds
2.  Total time for which application threads were stopped: 0.0620105 seconds
3.  Application time: 0.2100691 seconds
4.  Total time for which application threads were stopped: 0.0890223 seconds

Reading it is easy (especially comparing to GC logs) – from the above you can see that the application managed to do useful work for the first 344 milliseconds, paused all threads for 62 milliseconds, following with 210ms of useful work and another 89ms pause.

You can use these flags in combination with GC flags, for example when running the same program again with -XX:+PrintGCDetails added, the output would look similar to the following:

1.  [Full GC (Ergonomics) [PSYoungGen: 1375253K-&gt;0K(1387008K)] [ParOldGen: 2796146K-&gt;2049K(1784832K)] 4171400K-&gt;2049K(3171840K), [Metaspace: 3134K-&gt;3134K(1056768K)], 0.0571841 secs] [Times: user=0.02 sys=0.04, real=0.06 secs]
2.  Total time for which application threads were stopped: 0.0572646 seconds, Stopping threads took: 0.0000088 seconds

Based on the above, we see that the application threads were forced to stop for 57 ms to do a garbage collection. Out of which 8 microseconds were spent waiting for all the application threads to arrive to a safepoint. But if we take another example (source in github) and run it with the same options, we may see the following output:

1.  Total time for which application threads were stopped: 0.0001273 seconds, Stopping threads took: 0.0000196 seconds
2.  Total time for which application threads were stopped: 0.0000648 seconds, Stopping threads took: 0.0000174 seconds

From this, we cannot actually infer what caused the pause, because there was no garbage collecting activity. If you wish to understand the additional safepoints in more details, here is a couple of JVM arguments for you:

1.  -XX:+PrintSafepointStatistics  -XX:PrintSafepointStatisticsCount=1

Enabling them makes the JVM log additional information to the standard output, looking similar to the following:

1.  5.141: RevokeBias  [  13  0  2  ]  [  0  0  0  0  0  ]  0  
2.  Total time for which application threads were stopped: 0.0000782 seconds, Stopping threads took: 0.0000269 seconds

The information about the safepoints is displayed according to the following sequence :

  • The timestamp in milliseconds since the start of the JVM (5.141 in the example above)
  • The name of the operation that triggered the pause (RevokeBias).
    • If you see a “no vm operation”, it means that this is a so-called “guaranteed safepoint”. By default, the JVM triggers a safepoint each second to process all queued-up operations that are not urgent. The flag GuaranteedSafepointInterval can be used to change this behavior (set it to 0 to disable it altogether)
  • The number of threads that were stopped at safepoint (13)
  • The number of threads running at the beginning of the safepoint (0)
  • The number of threads that were blocked when the vm operation began its execution (2)
  • Timings for various stages of arriving at safepoint and executing the operation (all zeroes)

So now we see that the use of biased locks may result in a lot of STW pauses happening, even though they only take tens of microseconds. It is not uncommon to disable them right away in these massively parallel days.

All in all, you can really save yourself some trouble by having this data recorded at all times. To do so, specify these JVM options:

1.  -XX:+LogVMOutput -XX:LogFile=vm.log

And all the JVM’s output will be stored to the vm.log file. Interpreting and acting based on these logs, however, is a very broad topic quite outside the scope of this article, but I may write a post or two about it in the future, so if interested, stay tuned – either via our Twitter subscription or through theRSS feed.

If you want some further reading right away, there’s a great blog post by Nitsan Wakart: and, of course, the OpenJDK source code.


发布了66 篇原创文章 · 获赞 17 · 访问量 10万+

猜你喜欢

转载自blog.csdn.net/ligeforrent/article/details/80834154