The earlier articles in this series covered the JVM from the ground up. The goal was not only to understand the basics of the JVM but also to prepare for JVM performance tuning, which is the subject of this article.
Performance Tuning
Performance tuning consists of multiple levels, such as: architecture tuning, code tuning, JVM tuning, database tuning, operating system tuning, and so on.
Architecture tuning and code tuning are the basis of JVM tuning, with architecture tuning having the greatest impact on the system.
Performance tuning generally follows these steps: define the optimization goal, identify the performance bottleneck, apply the tuning, collect data with monitoring and statistical tools, and confirm whether the goal has been reached.
When to perform JVM tuning
In the following cases, it is time to consider JVM tuning:
- Heap memory (old generation) usage keeps rising until it reaches the configured maximum;
- Full GCs are frequent;
- The GC pause is too long (more than 1 second);
- The application throws memory errors such as OutOfMemoryError;
- The application uses a local cache and consumes a lot of memory space;
- System throughput and response performance are poor or degraded.
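Several of these symptoms can be read from inside a running JVM with the standard java.lang.management API. The following is a minimal sketch, not a monitoring solution: the class name GcSnapshot and the helper usedPercent are made up for this illustration, and the collector and pool names it prints depend on which garbage collector is in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints the counters behind the symptoms listed above.
class GcSnapshot {

    // Percentage of the maximum that is in use, or -1 when the maximum is undefined.
    static double usedPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        // One bean per collector: cumulative collection count and time since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, totalTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        // Heap pools typically include Eden, Survivor and the old generation.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() != MemoryType.HEAP || usage == null) {
                continue;
            }
            System.out.printf("%s: used=%.1f%% of max (-1 means no maximum is defined)%n",
                    pool.getName(), usedPercent(usage.getUsed(), usage.getMax()));
        }
    }
}
```

Sampling these values periodically shows whether old generation usage keeps climbing and how often full collections run.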
Basic principles of JVM tuning
JVM tuning is a means to an end, and not every problem can be solved by it, so keep the following principles in mind when tuning the JVM:
- Most Java applications do not require JVM tuning;
- Most GC problems are caused by problems in the code (code level);
- Before going live, consider setting the machine's JVM parameters to their optimal values;
- Reduce the number of objects created (code level);
- Reduce the use of global variables and large objects (code level);
- Prioritize architecture tuning and code tuning; JVM tuning is a last resort (code and architecture level);
- Analyzing GC behavior in order to optimize the code is better than optimizing the JVM parameters (code level).
These principles show that the most effective optimizations are made at the architecture and code levels. JVM tuning is the last resort, the final “squeeze” on the server configuration.
JVM Tuning Goals
The ultimate goal of tuning is to let the application carry more throughput with less hardware. JVM tuning focuses on the collection performance of the garbage collector, so that applications running on the virtual machine use less memory and see lower latency while achieving higher throughput.
- Low latency: short GC pauses and low GC frequency;
- Low memory footprint;
- High throughput.
Improving any one of these attributes almost always costs something in the others, so you cannot have all three at once. Which one to favor depends on its importance to the business.
JVM Tuning Quantitative Objectives
Some reference examples of quantitative goals for JVM tuning are shown below:
- Heap memory usage <= 70%;
- Old generation memory usage <= 70%;
- Average pause (avgpause) <= 1 second;
- Full GC count of 0, or an average interval between Full GCs >= 24 hours.
Note: The quantitative goals for JVM tuning are different for different applications.
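To make such goals checkable, they can be written down as a small predicate. This is only a sketch: the class GcGoals, its constants and its parameter names are invented for this example, the thresholds are simply the reference values listed above, and the measured values would come from your own GC log analysis.

```java
// Hypothetical thresholds copied from the reference goals above; adjust per application.
class GcGoals {
    static final double MAX_HEAP_USAGE_PERCENT = 70.0;
    static final double MAX_OLD_GEN_USAGE_PERCENT = 70.0;
    static final double MAX_AVG_PAUSE_SECONDS = 1.0;
    static final double MIN_FULL_GC_INTERVAL_HOURS = 24.0;

    // Returns true only when every goal is met by the measured values.
    static boolean meetsGoals(double heapUsagePercent, double oldGenUsagePercent,
                              double avgPauseSeconds, long fullGcCount,
                              double avgFullGcIntervalHours) {
        boolean memoryOk = heapUsagePercent <= MAX_HEAP_USAGE_PERCENT
                && oldGenUsagePercent <= MAX_OLD_GEN_USAGE_PERCENT;
        boolean pauseOk = avgPauseSeconds <= MAX_AVG_PAUSE_SECONDS;
        // Either no Full GC at all, or Full GCs that are at least a day apart on average.
        boolean fullGcOk = fullGcCount == 0
                || avgFullGcIntervalHours >= MIN_FULL_GC_INTERVAL_HOURS;
        return memoryOk && pauseOk && fullGcOk;
    }

    public static void main(String[] args) {
        System.out.println(meetsGoals(65.0, 60.0, 0.2, 0, 0.0));  // all goals met
        System.out.println(meetsGoals(65.0, 85.0, 0.2, 3, 6.0));  // old generation too full
    }
}
```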
Steps for JVM Tuning
In general, JVM tuning can be performed in the following steps:
- Analyze GC logs and dump files to determine whether optimization is needed and to locate the bottleneck;
- Determine the quantitative goals for JVM tuning;
- Determine the JVM tuning parameters (based on the historical JVM parameters);
- Tune the memory, latency, and throughput metrics in turn;
- Compare the results before and after tuning;
- Keep analyzing and adjusting until you find a suitable JVM parameter configuration;
- Apply the most suitable parameters to all servers and follow up.
Some of these steps take several iterations. Generally, you first meet the program's memory usage requirements, then its latency requirements, and finally its throughput requirements. Each step is the basis for the next, so the order should not be reversed.
JVM Parameters
The most important tool for JVM tuning is the set of JVM parameters, so let's start with them.
The -XX options are known as unstable (non-standard) options: they are specific to the JVM implementation and may change between releases. Setting them carelessly can easily change JVM performance and make the JVM behave unstably; setting them well can greatly improve its performance and stability.
The syntax of these options is as follows.
Boolean options:
- -XX:+<option> ‘+’ enables the option
- -XX:-<option> ‘-’ disables the option
Numeric options:
-XX:<option>=<number> Sets a numeric value for the option. The number can be followed by a unit: ‘m’ or ‘M’ for megabytes, ‘k’ or ‘K’ for kilobytes, ‘g’ or ‘G’ for gigabytes. 32K is the same size as 32768.
String options:
-XX:<option>=<string> Sets a string value for the option, typically used to specify a file, a path, or a list of commands. For example: -XX:HeapDumpPath=./dump.core
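The size suffixes described above are easy to get wrong when comparing configurations by hand. As a small illustration, the following sketch converts such values to bytes. The class JvmSize and the method parseBytes are hypothetical names, and the code only mirrors the k/m/g convention stated above; it is not the JVM's own parser.

```java
// Hypothetical helper that mirrors the k/m/g suffix convention described above.
class JvmSize {

    // Converts values such as "32K", "512k", "1200m" or "4g" to a number of bytes.
    static long parseBytes(String value) {
        String v = value.trim();
        if (v.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        char last = Character.toLowerCase(v.charAt(v.length() - 1));
        long multiplier;
        switch (last) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024; break;
            case 'g': multiplier = 1024L * 1024 * 1024; break;
            default: return Long.parseLong(v); // no suffix: plain bytes
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("32K"));   // same size as 32768
        System.out.println(parseBytes("1200m"));
        System.out.println(parseBytes("4g"));
    }
}
```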
JVM parameter analysis and tuning
Consider the following example:
-Xmx4g -Xms4g -Xmn1200m -Xss512k -XX:NewRatio=4 -XX:SurvivorRatio=8 -XX:PermSize=100m -XX:MaxPermSize=256m -XX:MaxTenuringThreshold=15
The example above targets Java 7 and earlier. Java 8 removed the permanent generation (it was replaced by Metaspace), so the permanent generation parameters -XX:PermSize and -XX:MaxPermSize no longer have any effect. This has been covered in previous sections.
Parameter breakdown:
- -Xmx4g: Sets the maximum heap size to 4GB.
- -Xms4g: Sets the initial heap size to 4GB.
- -Xmn1200m: Sets the size of the young generation to 1200MB. With a fixed heap size, enlarging the young generation shrinks the old generation. This value has a large impact on system performance; Sun officially recommends 3/8 of the entire heap.
- -Xss512k: Sets the stack size of each thread. Since JDK 5.0 the default stack size is 1MB per thread; before that it was 256K (the exact default depends on the platform). Adjust it according to the memory the application's threads need. With the same physical memory, a smaller value allows more threads to be created. However, the operating system still limits the number of threads in a process, so threads cannot be created without limit; the empirical value is about 3000~5000.
- -XX:NewRatio=4: Sets the ratio of the young generation (Eden plus the two Survivor zones) to the old generation (excluding the permanent generation). With a value of 4, the ratio of young generation to old generation is 1:4, so the young generation occupies 1/5 of the whole heap.
- -XX:SurvivorRatio=8: Sets the size ratio of the Eden zone to a Survivor zone in the young generation. With a value of 8, the ratio of the two Survivor zones to the Eden zone is 2:8, so one Survivor zone accounts for 1/10 of the whole young generation.
- -XX:PermSize=100m: Sets the initial permanent generation size to 100MB.
- -XX:MaxPermSize=256m: Sets the maximum permanent generation size to 256MB.
- -XX:MaxTenuringThreshold=15: Sets the maximum age an object can reach before it is promoted. With a value of 0, young generation objects skip the Survivor zones and go directly to the old generation, which can improve efficiency for applications with many long-lived objects. With a larger value, objects are copied between the Survivor zones several times, which keeps them in the young generation longer and increases the chance that they are collected there.
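The ratio arithmetic can be checked with a few lines of code. The sketch below uses the hypothetical class GenerationLayout and works in whole megabytes; a real JVM aligns and rounds these sizes, so treat the results as approximations. It assumes that an explicit -Xmn, as in the example, determines the young generation size rather than NewRatio.

```java
// Hypothetical calculator for the NewRatio and SurvivorRatio arithmetic described above.
class GenerationLayout {

    // NewRatio = old / young, so the young generation is heap / (NewRatio + 1).
    static long youngFromNewRatio(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    // SurvivorRatio = Eden / one Survivor zone, so one Survivor zone is young / (SurvivorRatio + 2).
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    // Eden is what remains of the young generation after the two Survivor zones.
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        // NewRatio=4 on a 4096MB heap: the young generation gets about 1/5 of the heap.
        System.out.println("young (NewRatio=4): " + youngFromNewRatio(4096, 4) + "MB");
        // -Xmn1200m with SurvivorRatio=8: Eden 8/10, each Survivor zone 1/10.
        System.out.println("eden: " + edenMb(1200, 8) + "MB");
        System.out.println("survivor (each): " + survivorMb(1200, 8) + "MB");
    }
}
```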
If the sizes of the young, old, and permanent generations are not specified, the VM selects suitable values automatically and also adjusts them based on system overhead.
Tunable Optimization Parameters:
- -Xms: Initial heap size. Defaults to 1/64 of physical memory (less than 1GB).
- -Xmx: Maximum heap size. By default, when more than 70% of the heap is free (adjustable with the MaxHeapFreeRatio parameter), the JVM shrinks the heap, down to the -Xms minimum.
- -Xmn: Young generation size, covering the Eden zone and the 2 Survivor zones.
- -XX:SurvivorRatio=1: The ratio of the Eden zone to one Survivor zone is 1:1.
- -XX:MaxDirectMemorySize=1G: Direct memory limit. Raise this value if the application reports java.lang.OutOfMemoryError: Direct buffer memory.
- -XX:+DisableExplicitGC: Prevents explicit runtime calls to System.gc() from triggering a Full GC.
Note: The periodic GC triggered by Java RMI can be controlled by configuring -Dsun.rmi.dgc.server.gcInterval=86400 (the value is interpreted in milliseconds).
- -XX:CMSInitiatingOccupancyFraction=60: Old generation occupancy (in percent) at which CMS starts a collection. The default value of 68 applies to older JDKs; later releases derive a higher default.
- -XX:ConcGCThreads=4: Number of threads the CMS garbage collector uses for its concurrent phases; the recommended value is the number of CPU cores.
- -XX:ParallelGCThreads=8: Number of threads used by the young generation parallel collector.
- -XX:MaxTenuringThreshold=10: Maximum tenuring age, with the same meaning as in the example above: 0 sends young generation objects straight to the old generation, while a larger value keeps them in the Survivor zones longer and increases the chance that they are collected in the young generation.
- -XX:CMSFullGCsBeforeCompaction=4: Number of Full GCs to perform before compacting the tenured (old generation) space.
- -XX:CMSMaxAbortablePrecleanTime=500: The abortable-preclean phase ends once it has run for this long (in milliseconds).
When setting the permanent generation size, if you are concerned about performance overhead, set the initial and maximum values to the same value, because resizing the permanent generation requires a Full GC.
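Before changing any of these options it helps to see the values the running JVM actually uses. On HotSpot-based JVMs this can be done with com.sun.management.HotSpotDiagnosticMXBean, as in the sketch below. The class VmFlags and the method read are hypothetical names. Options that a given JDK does not know (for example the CMS options on releases that no longer ship CMS) are reported as absent rather than causing a failure.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

// Hypothetical helper: looks up the effective value of a -XX option on a HotSpot JVM.
class VmFlags {

    // Returns the current value, or empty if the option (or the MXBean) is unavailable.
    static Optional<String> read(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            return Optional.ofNullable(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException e) {
            // Thrown when the option does not exist in this JVM.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] names = {"InitialHeapSize", "MaxHeapSize", "SurvivorRatio",
                "MaxTenuringThreshold", "ParallelGCThreads", "CMSInitiatingOccupancyFraction"};
        for (String name : names) {
            System.out.println(name + " = " + read(name).orElse("(not available)"));
        }
    }
}
```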
Memory Optimization Example
Once the JVM has stabilized and a Full GC is triggered, we typically get a log message like the following.
The GC log above shows the heap occupancy and GC time of the whole application at the time of the Full GC. For better accuracy, collect several samples and take the average, or use the longest Full GC for the estimate. In the above figure, the old generation occupancy is 93168KB (about 93MB), which is the live data in the old generation. The other heap areas are then sized according to the following rules.
- Java heap: parameters -Xms and -Xmx, recommended 3-4 times the old generation occupancy after Full GC.
- Permanent generation: -XX:PermSize and -XX:MaxPermSize, recommended 1.2-1.5 times the permanent generation occupancy after Full GC.
- Young generation: -Xmn, recommended 1-1.5 times the old generation occupancy after Full GC.
- Old generation: 2-3 times the old generation occupancy after Full GC.
Based on these rules, the parameters are defined as follows:
java -Xms373m -Xmx373m -Xmn140m -XX:PermSize=5m -XX:MaxPermSize=5m
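The heap and young generation values in this command can be reproduced from the sizing rules. The sketch below uses the hypothetical class HeapSizing and assumes multipliers of 1.5x for the young generation and 2.5x for the old generation. Both fall inside the recommended ranges and happen to yield the figures above, but the article does not state which multipliers were actually used. The permanent generation is left out because it is sized from its own occupancy.

```java
// Hypothetical calculator; the 1.5x and 2.5x multipliers are assumptions within the ranges above.
class HeapSizing {

    // Rounds up so the configured size is never below what the rule asks for.
    static long scaleMb(long liveOldGenMb, double factor) {
        return (long) Math.ceil(liveOldGenMb * factor);
    }

    // Builds heap flags from the old generation occupancy measured after a Full GC.
    static String flags(long liveOldGenMb) {
        long youngMb = scaleMb(liveOldGenMb, 1.5);
        long oldMb = scaleMb(liveOldGenMb, 2.5);
        long heapMb = youngMb + oldMb; // about 4x the live data
        return String.format("-Xms%dm -Xmx%dm -Xmn%dm", heapMb, heapMb, youngMb);
    }

    public static void main(String[] args) {
        // 93MB of live data in the old generation, as in the example above.
        System.out.println(flags(93));
    }
}
```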
Latency Optimization Example
For latency optimization, you first need to understand the latency requirements and which metrics can be tuned.
- Acceptable average pause time: this is compared with the measured Minor GC duration.
- Acceptable Minor GC frequency: the measured Minor GC frequency is compared with the tolerable value.
- Acceptable maximum pause time: this is compared with the duration of the worst-case Full GC.
- Acceptable maximum pause frequency: essentially the frequency of Full GC.
Of these, the average pause time and the maximum pause time matter most for the user experience. The data to collect for these metrics are: the duration of Minor GCs, the number of Minor GCs, the worst-case duration of Full GC, and the worst-case frequency of Full GC.
As shown above, the average duration of a Minor GC is 0.069 seconds, and a Minor GC occurs about once every 0.389 seconds.
The larger the young generation, the longer each Minor GC takes and the less often it occurs. To shorten Minor GCs, reduce the young generation size; to make them less frequent, increase it.
Here the pause time is reduced by shrinking the young generation by 10%. During this process, the sizes of the old generation and the permanent generation should be kept unchanged. The tuned parameters are as follows.
java -Xms359m -Xmx359m -Xmn126m -XX:PermSize=5m -XX:MaxPermSize=5m
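The new values follow directly from the 10% reduction: the young generation drops from 140MB to 126MB, and the total heap shrinks by the same 14MB so that the old generation keeps its size. A minimal sketch of that arithmetic, with the hypothetical class YoungGenShrink:

```java
// Hypothetical helper: shrinks the young generation while keeping the old generation fixed.
class YoungGenShrink {

    // New young generation size after removing the given percentage (integer MB).
    static long shrink(long youngMb, int percent) {
        return youngMb - youngMb * percent / 100;
    }

    // The heap shrinks by exactly the amount removed from the young generation.
    static String flags(long heapMb, long youngMb, int percent) {
        long newYoungMb = shrink(youngMb, percent);
        long newHeapMb = heapMb - (youngMb - newYoungMb);
        return String.format("-Xms%dm -Xmx%dm -Xmn%dm", newHeapMb, newHeapMb, newYoungMb);
    }

    public static void main(String[] args) {
        // Starting point: -Xms373m -Xmx373m -Xmn140m
        System.out.println(flags(373, 140, 10));
    }
}
```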
Throughput Tuning
Throughput tuning is primarily based on the throughput requirements of the application. The application should have a comprehensive throughput metric that is derived based on the requirements and testing of the entire application.
Evaluate the gap between the current throughput and the target. If it is around 20%, you can adjust the parameters, add memory, and run through the tuning process again from the start. If the gap is huge, you need to look at the application as a whole, check whether the design and the target are consistent, and re-evaluate the throughput target.
For garbage collectors, the goal of tuning for throughput is to avoid, or keep to a minimum, Full GCs and Stop-The-World compacting collections (CMS), since both reduce application throughput. Try to reclaim as many objects as possible during Minor GC so that objects are not promoted to the old generation too quickly.
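One common way to put a number on GC-related throughput is the share of run time not spent in garbage collection. The sketch below estimates it from the standard management beans. The class GcThroughput is a hypothetical name, and the figure is only an approximation: for concurrent collectors the reported collection time is not purely pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical estimator: percentage of JVM uptime not spent in garbage collection.
class GcThroughput {

    static double throughputPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 100.0;
        }
        return 100.0 * (uptimeMillis - gcMillis) / uptimeMillis;
    }

    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                gcMillis += time;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %d ms of %d ms, throughput about %.2f%%%n",
                gcMillis, uptimeMillis, throughputPercent(gcMillis, uptimeMillis));
    }
}
```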
Tuning Tools
With the GCViewer log analysis tool, it is easy to compare the results before and after tuning at a glance. The analysis can look at the following:
- Memory: analyze Total heap, Tenured heap, and Young heap usage and related indicators; in theory, the lower the memory usage the better;
- Pause: analyze the indicators under GC pause, Full GC pause, and Total pause; in theory, the fewer GCs and the shorter their duration the better.
Original link: “JVM performance tuning explained”