5 things you need to know to build high-performance Java applications

This article is excerpted from the book "Java Performance", which readers who care about Java performance probably already know. Many of us rarely think about performance when writing Java code day to day, yet it is never far away: it shows up in choices as small as using bit operations in place of arithmetic and as large as the overall architecture of a Java application. This article touches on a few of the issues that matter most in performance work and is meant as a starting point. If you are interested in performance, each point is worth studying in more depth.

Performance tuning usually involves three steps: (1) performance monitoring, (2) performance analysis, and (3) performance tuning.

At the operating system level, we focus mainly on the following: CPU utilization, the CPU scheduling (run) queue, memory utilization, network I/O, and disk I/O.

1. CPU utilization

For an application to achieve its best performance and scalability, it should not only make full use of the CPU cycles available to it, but also spend those cycles on useful work rather than waste them. Fully utilizing CPU cycles can be challenging for multithreaded applications running on multiprocessor and multicore systems. In addition, a saturated CPU does not mean that the application's performance and scalability have reached their optimum. To see how an application is using the CPU, we have to measure at the operating system level. On many operating systems, CPU utilization reports distinguish user CPU usage from kernel (or system) CPU usage. User CPU usage is the time the application spends executing application code. Kernel or system CPU usage, by contrast, is the time the application spends executing operating system kernel code. High kernel or system CPU usage can indicate contention for shared resources or heavy interaction with I/O devices. Ideally, kernel or system CPU time would be 0%, because any time spent executing kernel code could otherwise be spent executing application code. A sound direction for CPU optimization is therefore to minimize the time the CPU spends executing kernel or system code.
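The user versus kernel split described above can also be observed from inside the JVM for a single thread. The following is a minimal sketch, assuming the JVM supports per-thread CPU time measurement; the class name CpuTimeSplit and the helper method are made up for illustration, and operating system tools remain the usual way to see the system-wide numbers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeSplit {

    /** Returns {userNanos, systemNanos} consumed so far by the calling thread. */
    static long[] currentThreadCpuSplit() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time is not supported on this JVM");
        }
        long total = threads.getCurrentThreadCpuTime(); // user + system time, in nanoseconds
        long user = threads.getCurrentThreadUserTime(); // user-mode time only, in nanoseconds
        if (total < 0 || user < 0) {
            throw new UnsupportedOperationException("thread CPU time measurement is disabled");
        }
        // The two values are sampled separately, so guard against a small negative difference.
        return new long[] { user, Math.max(0L, total - user) };
    }

    public static void main(String[] args) {
        // Burn some user-mode CPU so there is something to report.
        long sum = 0;
        for (int i = 0; i < 50_000_000; i++) {
            sum += i % 7;
        }
        long[] split = currentThreadCpuSplit();
        System.out.printf("checksum=%d user=%d ms system=%d ms%n",
                sum, split[0] / 1_000_000, split[1] / 1_000_000);
    }
}
```

A thread that mostly computes should show almost all of its time as user time, while a thread dominated by I/O calls or lock contention would be expected to show a larger system share.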

For compute-intensive applications, performance monitoring needs to go deeper than user CPU usage and kernel or system CPU usage. Here we also need to monitor the number of instructions executed per clock cycle (instructions per clock, IPC) or, equivalently, the number of CPU cycles used per instruction (cycles per instruction, CPI). These two metrics are worth monitoring because the CPU performance reporting tools bundled with modern operating systems usually report only CPU utilization, not how many of those CPU cycles were actually spent executing instructions. This means that when the CPU is waiting for data from memory, the operating system's reporting tools still count the CPU as busy. We call this situation a "Stall". Stalls are common: any time the CPU is executing an instruction and the data that instruction needs is not ready, that is, not in a register or in the CPU cache, a "Stall" occurs.

When a "Stall" occurs, the CPU wastes clock cycles because it has to wait for the data the instruction needs to arrive in a register or the cache. It is normal for such a wait to cost hundreds of CPU clock cycles. In compute-intensive applications, the strategy for improving performance is therefore to reduce the number of stalls, or to make better use of the CPU cache, so that fewer CPU cycles are wasted waiting for data. This kind of performance monitoring is beyond the scope of the book and requires the help of a performance expert. However, a profiling tool described later in the book, the Oracle Solaris Studio Performance Analyzer, does report such data.
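One common way to see the effect of cache usage from Java is to traverse the same two-dimensional array in two different orders. The sketch below is an illustration, not something taken from the book: the class and method names are hypothetical, and the naive timing in main is only indicative, since results depend on the hardware, the JIT compiler, and the array size.

```java
public class TraversalOrder {

    /** Visits elements in the order they are laid out within each row (cache friendly). */
    static long sumRowMajor(int[][] grid) {
        long sum = 0;
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                sum += grid[r][c];
            }
        }
        return sum;
    }

    /** Jumps to a different row on every access; assumes a rectangular grid. */
    static long sumColumnMajor(int[][] grid) {
        long sum = 0;
        int cols = grid.length == 0 ? 0 : grid[0].length;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < grid.length; r++) {
                sum += grid[r][c];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 2048; // about 16 MB of int data, larger than most CPU caches
        int[][] grid = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                grid[r][c] = r ^ c;
            }
        }
        long t0 = System.nanoTime();
        long a = sumRowMajor(grid);
        long t1 = System.nanoTime();
        long b = sumColumnMajor(grid);
        long t2 = System.nanoTime();
        System.out.printf("row-major: sum=%d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("column-major: sum=%d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

Both methods do the same arithmetic and would show similar CPU utilization in an operating system tool; any difference in elapsed time comes largely from how often the CPU has to wait for data that is not in its cache.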

2. CPU scheduling queue

In addition to monitoring CPU usage, we can check whether the system is fully loaded by monitoring the CPU run queue. The run queue holds lightweight processes that are ready to execute but are waiting for the CPU scheduler to give them a processor. A run queue builds up when there are more lightweight processes ready to run than the processors can currently handle. A deep run queue indicates that the system is fully loaded. The depth of the system's run queue equals the number of lightweight processes that are waiting because no virtual processor is free to execute them, and the number of virtual processors equals the number of hardware threads in the system. We can get the number of virtual processors through the Java API with Runtime.getRuntime().availableProcessors(). When the run queue depth reaches four times the number of virtual processors or more, the operating system becomes unresponsive.

A general guideline for monitoring the run queue: when the queue depth is more than twice the number of virtual processors, it deserves attention, but there is no need to act immediately. When it reaches three or four times the number of virtual processors or more, the problem is urgent and should be dealt with promptly.
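The guideline above can be written down as a small check. This is a sketch under stated assumptions: the class name, the Pressure enum, and the exact cut-offs (above 2x means watch, 3x or more means urgent) are one reading of the guideline. The main method uses the one-minute system load average as a rough stand-in for run queue depth, which is only an approximation; a tool such as vmstat reports the run queue directly.

```java
import java.lang.management.ManagementFactory;

public class RunQueueCheck {

    enum Pressure { NORMAL, WATCH, URGENT }

    /** Classifies a run queue depth relative to the number of virtual processors. */
    static Pressure classify(double queueDepth, int virtualProcessors) {
        if (virtualProcessors <= 0) {
            throw new IllegalArgumentException("virtualProcessors must be positive");
        }
        double ratio = queueDepth / virtualProcessors;
        if (ratio >= 3.0) {
            return Pressure.URGENT; // three to four times or more: act promptly
        }
        if (ratio > 2.0) {
            return Pressure.WATCH;  // more than twice: keep an eye on it
        }
        return Pressure.NORMAL;
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        // Assumption: load average approximates run queue depth. It is negative when unavailable.
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        if (load < 0) {
            System.out.println("load average is not available on this platform");
            return;
        }
        System.out.printf("processors=%d load=%.2f status=%s%n",
                processors, load, classify(load, processors));
    }
}
```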

There are usually two ways to deal with a deep run queue. The first is to add CPUs to share the load, or to reduce the load placed on the existing CPUs. This approach essentially reduces the number of active threads per hardware thread, and with it the depth of the run queue.

The other way is to profile the applications running on the system and make them use the CPU more efficiently, in other words, to find ways to spend fewer CPU cycles on garbage collection, or to find better algorithms that execute fewer CPU instructions. Performance experts usually focus on the latter path: shortening the code execution path and choosing better CPU instructions. Java programmers can improve execution efficiency by choosing better algorithms and data structures.
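As a small illustration of how an algorithm and data structure choice changes the number of instructions executed, consider checking a list for duplicates. The example is not from the book; the class and method names are made up for this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class DuplicateCheck {

    /** Compares every pair of elements: roughly n*n/2 comparisons in the worst case. */
    static boolean hasDuplicateQuadratic(List<String> items) {
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                if (Objects.equals(items.get(i), items.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Remembers elements in a HashSet: about one hash lookup per element. */
    static boolean hasDuplicateHashed(List<String> items) {
        Set<String> seen = new HashSet<>();
        for (String item : items) {
            if (!seen.add(item)) { // add returns false when the element was already present
                return true;
            }
        }
        return false;
    }
}
```

Both methods return the same answer, but for large inputs the second one executes far fewer comparisons, at the cost of the extra memory used by the set.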

3. Memory utilization

In addition to CPU usage, memory-related attributes of the system also need to be monitored, for example paging, swapping, locking, and the context switching caused by multithreading.

Swapping usually occurs when applications need more memory than the physical memory actually available. To handle this, the operating system is usually configured with a dedicated area called swap space, which is normally located on a physical disk. When applications exhaust physical memory, the operating system temporarily moves part of the data in memory out to swap space. The memory it chooses is usually the least frequently accessed, so that "busier" memory areas are not affected. When an application later accesses memory that has been swapped out, that memory has to be read back from swap space into physical memory page by page, and this swapping hurts application performance.

The virtual machine's garbage collector performs very poorly when the system is swapping, because the collector visits a large part of the heap, including areas the application itself rarely touches, so the collector itself causes swapping activity. The effect is dramatic: if part of the heap being collected has been swapped out to disk, it has to be swapped back in page by page so that the garbage collector can scan it, which greatly lengthens the collection time. If the collector is a "Stop The World" collector (one that pauses the application while it runs), the application's pause is lengthened by the same amount.
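A quick way to check from Java whether swap space is in use is sketched below. It assumes a HotSpot or OpenJDK runtime, because it relies on the JDK-specific com.sun.management.OperatingSystemMXBean extension; the class name SwapCheck and the helper method are hypothetical. Note that swap space in use is not the same thing as ongoing swapping activity, which is better observed through the page-in and page-out rates reported by operating system tools such as vmstat.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class SwapCheck {

    /** Fraction of swap space in use, or 0.0 when no swap is configured or reported. */
    static double usedFraction(long totalBytes, long freeBytes) {
        if (totalBytes <= 0) {
            return 0.0;
        }
        return (double) (totalBytes - freeBytes) / totalBytes;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The swap getters live on a JDK-specific subinterface, so check before casting.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean ext =
                    (com.sun.management.OperatingSystemMXBean) os;
            long total = ext.getTotalSwapSpaceSize();
            long free = ext.getFreeSwapSpaceSize();
            System.out.printf("swap total=%d MB, in use=%.1f%%%n",
                    total / (1024 * 1024), usedFraction(total, free) * 100);
        } else {
            System.out.println("swap statistics are not exposed by this JVM");
        }
    }
}
```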

4. Network I/O

The performance and scalability of distributed Java applications are limited by network bandwidth and network performance. For example, if we send more packets to a network interface than it can handle, the packets pile up in the operating system's buffers and the application is delayed. Other conditions on the network can delay applications as well.

Tools for distinguishing and monitoring network utilization are hard to find among the tools bundled with the operating system. Both Linux and Solaris provide the netstat command, but neither reports network utilization directly. What they do report are statistics such as packets sent and received per second, errored packets, and collisions. On Ethernet, a small number of collisions is normal. A large number of errored packets may point to a problem with the network card. And although netstat counts the data sent and received by a network interface, it is hard to tell from it whether the interface is fully utilized. For example, if netstat -i shows 2500 packets per second passing through the network card, we still cannot tell whether network utilization is 100% or 1%; we only know that there is traffic. Without knowing the size of the packets, that is the only conclusion we can draw. Simply put, the netstat provided by Linux and Solaris cannot tell us whether the network is affecting performance. We need other tools to monitor the network while our Java application is running.
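The point about packet size can be made concrete with a little arithmetic: utilization is the number of bits sent per second divided by the link bandwidth. The sketch below uses the 2500 packets per second figure from the text; the packet sizes (1500 and 64 bytes) and the 100 Mbit/s link speed are assumptions chosen only for illustration, as are the class and method names.

```java
public class NetworkUtilization {

    /** Returns link utilization as a fraction between 0.0 and 1.0 (or above, if oversubscribed). */
    static double utilization(long packetsPerSecond, int avgPacketBytes, long linkBitsPerSecond) {
        if (linkBitsPerSecond <= 0) {
            throw new IllegalArgumentException("linkBitsPerSecond must be positive");
        }
        double bitsPerSecond = (double) packetsPerSecond * avgPacketBytes * 8;
        return bitsPerSecond / linkBitsPerSecond;
    }

    public static void main(String[] args) {
        long link = 100_000_000L; // assumed 100 Mbit/s link
        // Same packet rate, very different utilization depending on packet size.
        System.out.printf("large packets: %.2f%%%n", utilization(2500, 1500, link) * 100);
        System.out.printf("small packets: %.2f%%%n", utilization(2500, 64, link) * 100);
    }
}
```

With these assumed numbers, the same 2500 packets per second correspond to 30% utilization for 1500-byte packets and about 1.3% for 64-byte packets, which is why a packet count alone says so little.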

5. Disk I/O

If the application uses the disk, we need to monitor disk I/O to catch possible disk performance problems. Some applications, such as databases, are I/O intensive. Disk usage also commonly comes from the application's logging system, which records important information while the system is running.

This article is reposted from: Code Agricultural Network
Original link: http://www.codeceo.com/article/5-tips-java-high-performance.html
