Monday, March 12, 2012

Java Tuning in a Nutshell - Part 1


While delivering a training recently, I got a request to put together a JVM tuning cheat sheet. Given the 50+ parameters available on the Sun HotSpot JVM, this request is understandable. The diagram below is what I came up with. I’ve tried to narrow down the most important flags that will solve 80% of JVM performance needs with 20% of the tuning effort. This article assumes basic JVM tuning knowledge: the different generations used in the Sun HotSpot JVM, the different garbage collection algorithms available, etc.

Although this is intended primarily for enterprise-grade Oracle Fusion Middleware products, it applies to most server JVMs with large heaps hosted on server-class, multi-core machines. This is not an exhaustive list, only low-hanging fruit. In fact, many JDK 1.6 users need no tuning at all: the JVM picks good defaults and ergonomics does a decent job. Follow this only if the default behavior is not good enough (for instance, frequent garbage collections, low throughput, long GC pauses, etc.). In my experience, a non-trivial production topology with Oracle Fusion Middleware products often requires this level of tuning. This includes Oracle WebLogic Server (Java EE apps), Oracle Coherence, Oracle Service Bus, Oracle SOA Suite, BPM, AIA and other enterprise FMW apps running on the Sun HotSpot JVM.

I’ve used a mind map below to help visualize the relationships and dependencies between various JVM tuning flags. In the diagram, the flags in black are the ones to try first, the ones in gray are optional, and anything not covered here can be ignored! :)



I’ve categorized the flags into 4 groups:
  1. Garbage collection (GC): The garbage collection algorithm is one of the two mandatory tunables for Java performance tuning. Start with -XX:+UseParallelOldGC. If GC pauses are not acceptable, switch to -XX:+UseConcMarkSweepGC (which prioritizes low application pause times at the cost of raw application throughput). Specify -XX:ParallelGCThreads to limit the number of GC threads (yes, limit: the default is usually too high when multiple WebLogic servers share a large, multi-core machine). Recommendations for values and other flags will be covered later.
  2. Heap tuning: This is the other mandatory tunable. I’m using ‘heap’ as an umbrella term for all Java memory spaces; technically, Perm and Stack are not part of the Java heap in Sun HotSpot. The required flags in my tuning exercise are total heap size (-Xmx, -Xms), young generation size (-Xmn) and permanent generation size (-XX:PermSize, -XX:MaxPermSize). -Xss tuning is optional. I only use it when tuning a 32-bit, heap-constrained JVM, reducing -Xss only to squeeze memory out of native space so that more is available for -Xmx. In any case, never set -Xss below 128k for Fusion Middleware (the default is usually 512k to 1m depending on the OS).
  3. Logging: GC logging is mandatory only for the duration of the tuning exercise itself. However, due to its low overhead (typically only one line is written per collection, and collections are relatively infrequent), it is highly recommended in production as well. Otherwise, you will not be able to make an educated tuning decision if or when things don't work as expected.
  4. (Optional) Other Performance: These are only used for fine tuning when performance is the driver for the tuning exercise. Even then, try these only after GC and heap are well tuned to begin with.
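Why limit ParallelGCThreads? A quick calculation shows how the threads add up when several JVMs share one box. The sketch below assumes the HotSpot default of this era on most non-SPARC platforms (one GC thread per CPU up to 8 CPUs, then 5 threads for every 8 additional CPUs). The class name GcThreads and the even split across JVMs are hypothetical illustrations, not the value recommendations mentioned in item 1.

```java
// Sketch: estimate the default number of parallel GC threads per JVM,
// assuming the HotSpot 5/8 heuristic used on most non-SPARC platforms.
class GcThreads {

    // One thread per CPU up to 8 CPUs, then 5 threads for every 8 extra CPUs
    // (integer division, as in HotSpot).
    static int defaultParallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + ((cpus - 8) * 5) / 8;
    }

    // Total GC threads when several JVMs share one machine and none of them
    // sets -XX:ParallelGCThreads explicitly.
    static int totalDefaultGcThreads(int cpus, int jvms) {
        return jvms * defaultParallelGcThreads(cpus);
    }

    // Hypothetical even split of the CPUs across JVMs (at least one thread).
    // Only an illustration, not a tuning rule.
    static int evenSplit(int cpus, int jvms) {
        return Math.max(1, cpus / jvms);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int jvms = 4; // hypothetical: four WebLogic managed servers on this host
        System.out.println("CPUs: " + cpus);
        System.out.println("Default GC threads per JVM: " + defaultParallelGcThreads(cpus));
        System.out.println("Default GC threads across " + jvms + " JVMs: "
                + totalDefaultGcThreads(cpus, jvms));
        System.out.println("Even split: -XX:ParallelGCThreads=" + evenSplit(cpus, jvms));
    }
}
```

On a 32-CPU host the heuristic gives 23 threads per JVM, so four managed servers left at the default start 92 GC threads competing for 32 CPUs whenever their collections overlap.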
The primary requirement that warrants JVM tuning in production Oracle Fusion Middleware is not performance but unacceptable GC pauses. The culprit is almost always a Full GC that causes a long application pause. Symptoms include temporarily unresponsive servers, client session timeouts, etc. If you’re capturing GC logs using the flags in the diagram, a search for “Full GC” will show how many Full GCs occurred, how frequently, and how long they took. Following the tunables in the diagram above, this is how you can solve the problem (I have highlighted the parameters to match those in the diagram):
  1. Heap not sized correctly, causing Full GCs
    1. -Xmx should be equal to -Xms
      Growing from -Xms to -Xmx requires Full GCs to resize the heap. Set both to the same value to eliminate these resize-triggered Full GCs in production.
    2. -XX:PermSize should be equal to -XX:MaxPermSize
      Both params need to be specified and should have the same value. Otherwise, a Full GC is required for each Perm Gen resize while it grows up to MaxPermSize.
    3. -XX:NewSize is specified but not equal to -XX:MaxNewSize
      Like the other heap params, resizing the new/young generation requires a Full GC. The preferred approach is to avoid these two parameters and use -Xmn instead. This eliminates the problem, as setting, say, "-Xmn1g" is the same as setting "-XX:NewSize=1g -XX:MaxNewSize=1g".
    4. -XX:SurvivorRatio is specified but -XX:-UseAdaptiveSizePolicy is not
      The SurvivorRatio you specify will not stick while AdaptiveSizePolicy is in effect: by default, the JVM adapts and overrides your value based on runtime heuristics (adaptive sizing is used by the parallel collectors, UseParallelGC and UseParallelOldGC). Add -XX:-UseAdaptiveSizePolicy to disable adaptive sizing of generations (notice the 'minus' sign preceding UseAdaptiveSizePolicy).
  2. -XX:+UseConcMarkSweepGC is almost always used when there is a strict latency requirement or Service Level Agreement (SLA) and long GC pauses are unacceptable, that is, when Full GCs must be avoided at all costs. However, there are many reasons why Full GCs can still occur:
    1. Although UseConcMarkSweepGC is specified, CMS can, and often will, kick in too late, causing a Full GC when it can’t catch up. In other words, although CMS is collecting garbage, the application threads executing concurrently run out of heap for allocation because CMS couldn't free garbage soon enough. At this point, the JVM stops all application threads and does a Full GC. This is reported as a “concurrent mode failure” in GC logs. The reason for the concurrent mode failure: the JVM dynamically determines when CMS should be initiated and adjusts this value based on statistics. However, production load is often bursty, which leads to misses or miscalculations of the last dynamically computed initiation value. To prevent this, provide a static initiation value. Use -XX:CMSInitiatingOccupancyFraction (a percentage of old generation occupancy) to tell the JVM at what point it should initiate CMS. A value between 40 and 70 usually works for most Fusion Middleware products. Start with the higher value (70) and tune down only if you still see the string “concurrent mode failure” in GC logs.
    2. Secondly, always specify -XX:+UseCMSInitiatingOccupancyOnly when CMSInitiatingOccupancyFraction is used; otherwise the value you specify does not stick (the JVM will change it dynamically on the fly again). This is very important and commonly missed.
  3. -XX:+UseParallelGC is used instead of -XX:+UseParallelOldGC
    1. UseParallelOldGC collects the old generation in parallel, unlike UseParallelGC. In both cases, young generation (minor) collections are parallel. By having multiple threads do the old generation collection, the overall Full GC pause can be reduced.
    2. If no GC params are specified, UseParallelGC is usually the default (this may have changed in later versions of JDK6), so it is safe to always specify -XX:+UseParallelOldGC explicitly when throughput is the goal.
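The pitfalls listed above are easy to check mechanically before a server is started. Here is a minimal sketch, assuming HotSpot-style command-line flags; the class JvmArgsCheck and its messages are hypothetical, only the k, m and g size suffixes are parsed, and it sees only what is on the command line, not what ergonomics later changes. RuntimeMXBean.getInputArguments() returns the arguments the running JVM was started with.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: report the heap and GC flag pitfalls described above for a list of
// JVM arguments. Assumes HotSpot-style flags; sizes use k, m or g suffixes.
class JvmArgsCheck {

    // Value of the last argument starting with prefix (later flags win), or null.
    static String valueOf(List<String> args, String prefix) {
        String value = null;
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    // Converts sizes such as "512m" or "2g" to bytes, so "2g" equals "2048m".
    static long bytes(String size) {
        char unit = Character.toLowerCase(size.charAt(size.length() - 1));
        long multiplier = unit == 'k' ? 1L << 10 : unit == 'm' ? 1L << 20 : unit == 'g' ? 1L << 30 : 1L;
        String digits = multiplier == 1L ? size : size.substring(0, size.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    // True only if both sizes are present and equal.
    static boolean sameSize(String a, String b) {
        return a != null && b != null && bytes(a) == bytes(b);
    }

    static List<String> check(List<String> args) {
        List<String> problems = new ArrayList<>();
        if (!sameSize(valueOf(args, "-Xms"), valueOf(args, "-Xmx"))) {
            problems.add("set -Xms and -Xmx to the same value");
        }
        if (!sameSize(valueOf(args, "-XX:PermSize="), valueOf(args, "-XX:MaxPermSize="))) {
            problems.add("set -XX:PermSize and -XX:MaxPermSize to the same value");
        }
        String newSize = valueOf(args, "-XX:NewSize=");
        String maxNewSize = valueOf(args, "-XX:MaxNewSize=");
        if ((newSize != null || maxNewSize != null) && !sameSize(newSize, maxNewSize)) {
            problems.add("replace -XX:NewSize/-XX:MaxNewSize with -Xmn");
        }
        if (valueOf(args, "-XX:SurvivorRatio=") != null
                && !args.contains("-XX:-UseAdaptiveSizePolicy")) {
            problems.add("-XX:SurvivorRatio without -XX:-UseAdaptiveSizePolicy may not stick");
        }
        String fraction = valueOf(args, "-XX:CMSInitiatingOccupancyFraction=");
        if (args.contains("-XX:+UseConcMarkSweepGC") && fraction == null) {
            problems.add("CMS without a static -XX:CMSInitiatingOccupancyFraction");
        }
        if (fraction != null && !args.contains("-XX:+UseCMSInitiatingOccupancyOnly")) {
            problems.add("add -XX:+UseCMSInitiatingOccupancyOnly so the fraction sticks");
        }
        if (args.contains("-XX:+UseParallelGC") && !args.contains("-XX:+UseParallelOldGC")) {
            problems.add("use -XX:+UseParallelOldGC instead of -XX:+UseParallelGC");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Checks the arguments this JVM was actually started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String problem : check(jvmArgs)) {
            System.out.println("WARN: " + problem);
        }
    }
}
```

For example, starting it with -Xms1g -Xmx2g prints a warning for the mismatched heap sizes, along with any other pitfall present in the same command line.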
Rarely, no matter how well you tune your JVM, the heap eventually gets backed up and results in back-to-back Full GCs (again, use GC logs to guide you). If this is the case, your code may have introduced a memory/reference leak. To confirm, take a few heap dumps and compare them to see whether any particular object count keeps growing over time, even after GC completes. Again, this is very rare, so make sure you do your due diligence with JVM tuning first.
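To spot back-to-back Full GCs, the “Full GC” search described earlier can be automated. A minimal sketch, assuming logs written with -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps, so that each entry starts with seconds since JVM start and ends with a [Times: ... real=N secs] suffix, as in the sample lines in the comments below. FullGcScan is a hypothetical name; entries split across several lines (for example by class-unloading output) are counted, but their pause time is not found.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: summarize Full GC entries in a HotSpot GC log written with
// -XX:+PrintGCDetails -XX:+PrintGCTimeStamps (one entry per line assumed).
class FullGcScan {

    // "<seconds since JVM start>: [Full GC" at the start of an entry.
    private static final Pattern FULL_GC = Pattern.compile("^\\s*(\\d+\\.\\d+): \\[Full GC");
    // Wall-clock pause from the "[Times: user=... sys=..., real=N secs]" suffix.
    private static final Pattern REAL = Pattern.compile("real=(\\d+\\.\\d+) secs");

    static final class Summary {
        int count;                                     // number of Full GC entries
        double totalPauseSecs;                         // sum of "real" times found
        double minGapSecs = Double.POSITIVE_INFINITY;  // closest two Full GC start times
    }

    static Summary scan(List<String> lines) {
        Summary summary = new Summary();
        double previousStart = Double.NaN;
        for (String line : lines) {
            Matcher fullGc = FULL_GC.matcher(line);
            if (!fullGc.find()) {
                continue; // minor GCs and other output are ignored
            }
            double start = Double.parseDouble(fullGc.group(1));
            summary.count++;
            if (!Double.isNaN(previousStart)) {
                summary.minGapSecs = Math.min(summary.minGapSecs, start - previousStart);
            }
            previousStart = start;
            Matcher real = REAL.matcher(line);
            if (real.find()) {
                summary.totalPauseSecs += Double.parseDouble(real.group(1));
            }
        }
        return summary;
    }

    public static void main(String[] args) throws IOException {
        Summary s = scan(Files.readAllLines(Paths.get(args[0])));
        System.out.printf("Full GCs: %d, total pause: %.2f s, smallest gap: %.1f s%n",
                s.count, s.totalPauseSecs, s.minGapSecs);
    }
}
```

Run it as java FullGcScan gc.log. A smallest gap close to the pause time itself suggests the JVM is spending most of its time in Full GC, the back-to-back pattern described above.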

I’d be interested in your comments or questions after you try this out. Happy tuning!

30 comments:

  1. The correct parameter is -XX:+UseParallelOldGC

    1. Thanks for pointing out the typo... I've corrected it.

    2. Thanks, I agree with your point that there are too many JVM parameters and that 80% of the result can be achieved by choosing the correct GC and memory settings. I have also shared "10 JVM Options Java programmer should know", which you may like.

  2. What about Hotspot JDK1.7, any good tuning tips specifically for the latest JVM?

    1. Use the same flags listed here for JDK7. The key addition in JDK7 is the G1 collector, which gives more predictable GC pauses for low-latency apps. However, it doesn't perform much better than a well-tuned CMS as shown here. So I wouldn't recommend it over CMS for low-latency requirements... at least, not yet. If this is for Oracle Fusion Middleware products, I would first check whether the product is certified on JDK7.

  3. Question - I notice you didn't discuss -XX:+UseLargePages - any particular reason?

    1. For some info on Java performance and large pages, try: http://zzzoot.blogspot.ca/2009/02/java-mysql-increased-performance-with.html

    2. Two reasons for omitting:
      1. It is not required for all apps. I'm trying to keep this simple with only primary tunables.
      2. -XX:+UseLargePages is enabled by default on some OSes, such as Solaris. What's more important is the page size (-XX:LargePageSizeInBytes) if you do need large pages.

  4. What are your thoughts on -XX:+UseCompressedOops? I've seen a lot of posts and confusion about it on the Internets.

    1. Highly recommended on 64-bit JVMs with an -Xmx value less than 32g. However, this is available only on JDK6 update 14+.

    2. Also, since this article was targeted specifically at Oracle Fusion Middleware, you will also need the latest version/patch of WebLogic Server 11g; -XX:+UseCompressedOops caused issues in older versions of WLS.
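Whether compressed oops (or any other flag) is actually in effect can be checked from inside the running JVM, which also helps confirm whether values such as SurvivorRatio stuck. A sketch using the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; getPlatformMXBean needs JDK 7 or later, the class name ShowVmFlags is hypothetical, and getVMOption throws IllegalArgumentException for a flag this JVM does not have (UseCompressedOops on a 32-bit JVM, for example).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Sketch: print the effective value and origin of a few HotSpot flags.
class ShowVmFlags {

    // Returns "value (ORIGIN)", or a note if this JVM has no such flag.
    static String flag(HotSpotDiagnosticMXBean hotspot, String name) {
        try {
            VMOption option = hotspot.getVMOption(name);
            return option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return "not available in this JVM";
        }
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        String[] names = {"UseCompressedOops", "MaxHeapSize", "SurvivorRatio", "UseAdaptiveSizePolicy"};
        for (String name : names) {
            System.out.println(name + " = " + flag(hotspot, name));
        }
    }
}
```

The origin printed next to each value shows whether it came from the command line (VM_CREATION), the built-in default (DEFAULT) or the JVM's own ergonomics (ERGONOMIC).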

  5. You don't mention -XX:+UseTLAB. Have you not found it helpful?

    1. Not really. This is one of the flags I let the JVM tune for me. It is turned on by default, again depending on OS.

  6. We found that for real-time applications, Oracle's (originally BEA's) JRockit JVM performs even better, and its memory leak profiling on production systems is excellent for troubleshooting.

    1. Adi, I totally agree re: JRockit. We had significant problems w/ high memory utilization in one of our apps, and JRockit was great at helping us deal with them. Additionally, before multi-threaded garbage collection, JRockit was one of the few JVMs that allowed parallelized garbage collection. Otherwise, our memory-hungry app would just freeze at times during GC.

    2. That is correct, JRockit provides deterministic GC that HotSpot JDK6 does not. It requires additional licensing, though. JRockit is the preferred JVM on x86-based architectures (Windows, Linux). However, Sun HotSpot performs better on Solaris.
      JDK7 has G1 which is the hotspot equivalent but we're not discussing JDK7 here :)

      JRMC (JRockit Mission Control) is a really neat profiler (among other things) that comes with JRockit. The closest equivalent in Hotspot would be JVisualVM with appropriate plugins. However, JRMC is much more feature rich.

    3. JRockit has been practically discontinued by Oracle. Besides, in my tests a well-tuned HotSpot performed better and was more stable under heavy and particularly bursty loads.

    4. JRockit and Hotspot will converge in the near future... so it won't technically be 'discontinued'. Until then, I prefer a well tuned Hotspot over JRockit myself. JRockit works very well if you're on Linux, running server-class apps, don't intend to do much tuning (OOTB performance), need predictable response times (deterministic GC) and/or enterprise features like JRockit Mission Control. I'm not aware of any stability issues with JRockit.

  7. what about the new-new G1 Garbage collector? To use it, just do: -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

    The new G1 collector does better w/ pauses and also does a more thorough job w/ Young collection than the CMS collector mentioned above.

    1. Not really. Do not use this on JDK6 with Fusion Middleware.

  8. Rupesh, can you suggest any particularly helpful in-depth resources for learning more about the details of Oracle's VM implementations? I'm looking for books. Thanks.

    1. My ex-Sun colleagues Charlie Hunt and Binu John just came out with an excellent JVM internals and tuning book - titled "Java Performance". I highly recommend it.

  9. Some missing logging flags (there are more, but these are the minimum needed to know when performance is bad):

    -XX:+PrintGCApplicationStoppedTime
    -XX:+PrintGCApplicationConcurrentTime

    1. Don't really need these. There are other ways to do this analysis, for which you only need the flags I mentioned. I will cover that in detail in a later article. Until then, keep it simple and minimize logging so it can be left on in production.

  10. Hi,
    We get a lot of Full GCs in prod, and there are 2 types.
    Type-1
    1287470.552: [Full GC (System) [PSYoungGen: 1494K->0K(692224K)] [PSOldGen: 190715K->179290K(1400832K)] 192209K->179290K(2093056K) [PSPermGen: 138023K->138023K(139264K)], 5.1737903 secs] [Times: user=5.17 sys=0.01, real=5.17 secs]

    Type-2
    1305498.007: [Full GC (System)[Unloading class sun.reflect.GeneratedMethodAccessor9791]
    1309103.580: [Full GC (System)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor5242]
    1312709.715: [Full GC (System)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor5323]
    1316315.734: [Full GC (System)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor5364]

    Is Type-2 purely a code/framework problem? Can setting appropriate heap and/or GC params avoid these Type-2 Full GC cycles?

    Thanks
    Amit

    1. The 'System' in the "[Full GC (System)" log entry means this Full GC was triggered by a System.gc() call in code. The correct fix is to remove the System.gc() calls from your code, since they are bad practice in a modern JVM. If that is too much work or the code is out of your control, you can suppress this kind of GC globally by adding the -XX:+DisableExplicitGC flag to your JVM. You will need to bounce your prod servers after adding this flag. You should no longer see these log entries after that.
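To confirm which collections come from explicit System.gc() calls, and that -XX:+DisableExplicitGC really silenced them, a JVM can watch its own GC notifications. A sketch, assuming JDK 8 or later as written (the com.sun.management.GarbageCollectionNotificationInfo API is HotSpot-specific and newer than the JDK6 setup discussed here); ExplicitGcWatcher is a hypothetical name. HotSpot reports the cause of such collections as the string "System.gc()".

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

// Sketch: report collections whose cause is an explicit System.gc() call.
class ExplicitGcWatcher {

    static void watch(CountDownLatch explicitGcSeen) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // HotSpot's collector beans also implement NotificationEmitter.
            NotificationEmitter emitter = (NotificationEmitter) gc;
            emitter.addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                if ("System.gc()".equals(info.getGcCause())) {
                    System.out.println(info.getGcName() + " triggered by System.gc(), took "
                            + info.getGcInfo().getDuration() + " ms");
                    explicitGcSeen.countDown();
                }
            }, null, null);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch seen = new CountDownLatch(1);
        watch(seen);
        System.gc(); // the kind of call the reply above recommends removing
        if (seen.await(5, TimeUnit.SECONDS)) {
            System.out.println("explicit GC ran");
        } else {
            System.out.println("no explicit GC reported (is -XX:+DisableExplicitGC set?)");
        }
    }
}
```

Started normally, it reports the collection triggered by its own System.gc() call; started with -XX:+DisableExplicitGC, the call is ignored and it prints the second message after the five-second wait.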

  11. I find it best to include `-XX:+PrintVMOptions`. It doesn't do anything special, but it makes my tests easier to analyze after the fact.
