Garbage collections on Tomcat cause Apache timeouts to be exceeded

Comments

6 comments

  • Zendesk API User
    Author: Peter_Jodeleit - 9/2/2014 13:24

    In which phase of the GC does the pause occur? Is it really a 12-second "stop the world" pause?

    Does Oracle Java 6u81 use compressed oops by default? If not, have you already tried enabling them?

  • Zendesk API User
    Author: king - 9/2/2014 15:05

    Dear Peter,

    We suppose you are thinking of the JVM configuration directive
    -XX:+UseCompressedOops

    since we use a 64-bit JVM. Correct? The problem: according to this web page, or rather a page published by Oracle itself, it seems to be enabled by default starting with Java 6u23. So the question is whether we should really set it explicitly.

    Regarding your first question, we see:

    2014-09-02T09:10:13.763+0200: 248098.974: [GC[YG occupancy: 1189037 K (2048000 K)]2014-09-02T09:10:13.763+0200: 248098.975: [Rescan (parallel) , 5.6090719 secs]2014-09-02T09:10:19.373+0200: 248104.584: [weak refs processing, 5.3681923 secs]2014-09-02T09:10:24.741+0200: 248109.953: [class unloading, 0.1829213 secs]2014-09-02T09:10:24.924+0200: 248110.136: [scrub symbol & string tables, 0.0415524 secs] [1 CMS-remark: 2856086K(7168000K)] 4045123K(9216000K), 12.1134059 secs] [Times: user=36.92 sys=0.29, real=12.11 secs]

    Total time for which application threads were stopped: 12.1258575 seconds

    2014-09-02T09:10:25.877+0200: 248111.089: [CMS-concurrent-sweep-start]

    Total time for which application threads were stopped: 0.0237557 seconds

    Total time for which application threads were stopped: 0.0128559 seconds

    As we understand it, the 12 s pause happens in a GC phase that stops the world. Do you agree?

    - What do you think of our tuning suggestions above?

    - Do you have any further ideas?
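    The remark line above can be broken down per sub-phase to see where the 12 seconds go. The following is a minimal sketch (the class and method names are hypothetical) that extracts the "[name, N secs]" entries from a CMS remark line in exactly the format shown above, plus the longest pause from the "Total time for which application threads were stopped" lines printed with -XX:+PrintGCApplicationStoppedTime. It is tailored to this log format; other JVM versions or flag combinations print differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemarkBreakdown {

    // CMS remark line from the log above, split only for readability.
    static final String SAMPLE_REMARK =
            "2014-09-02T09:10:13.763+0200: 248098.974: [GC[YG occupancy: 1189037 K (2048000 K)]"
            + "2014-09-02T09:10:13.763+0200: 248098.975: [Rescan (parallel) , 5.6090719 secs]"
            + "2014-09-02T09:10:19.373+0200: 248104.584: [weak refs processing, 5.3681923 secs]"
            + "2014-09-02T09:10:24.741+0200: 248109.953: [class unloading, 0.1829213 secs]"
            + "2014-09-02T09:10:24.924+0200: 248110.136: [scrub symbol & string tables, 0.0415524 secs]"
            + " [1 CMS-remark: 2856086K(7168000K)] 4045123K(9216000K), 12.1134059 secs]"
            + " [Times: user=36.92 sys=0.29, real=12.11 secs]";

    // "[<name without brackets or commas>, <seconds> secs]", e.g. "[class unloading, 0.18 secs]".
    private static final Pattern PHASE =
            Pattern.compile("\\[([^\\[\\],]+), ([0-9.]+) secs\\]");

    // Summary line printed by -XX:+PrintGCApplicationStoppedTime.
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");

    /** Sub-phase name -> duration in seconds, in the order they appear in the line. */
    static Map<String, Double> phases(String remarkLine) {
        Map<String, Double> result = new LinkedHashMap<>();
        Matcher m = PHASE.matcher(remarkLine);
        while (m.find()) {
            result.put(m.group(1).trim(), Double.parseDouble(m.group(2)));
        }
        return result;
    }

    /** Longest "application threads were stopped" time among the given lines, in seconds. */
    static double maxStopped(List<String> lines) {
        double max = 0.0;
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                max = Math.max(max, Double.parseDouble(m.group(1)));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> e : phases(SAMPLE_REMARK).entrySet()) {
            System.out.printf("%-32s %7.3f s%n", e.getKey(), e.getValue());
        }
        List<String> stopped = Arrays.asList(
                "Total time for which application threads were stopped: 12.1258575 seconds",
                "Total time for which application threads were stopped: 0.0237557 seconds",
                "Total time for which application threads were stopped: 0.0128559 seconds");
        System.out.printf("Longest stop-the-world pause: %.3f s%n", maxStopped(stopped));
    }
}
```

    For the log above, Rescan (about 5.6 s) and weak refs processing (about 5.4 s) make up most of the 12.1 s remark pause.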

  • Zendesk API User
    Author: Peter_Jodeleit - 9/3/2014 11:26

    Yes, "UseCompressedOops" was the switch I had in mind. It won't hurt to set it explicitly.
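    Instead of relying on documented defaults, the effective flag value can be read from the running JVM. The following is a minimal sketch using the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean (the class name CompressedOopsCheck is hypothetical). It uses ManagementFactory.newPlatformMXBeanProxy so that it also compiles on Java 6. Note that it reports on the JVM it runs in, so it would have to run inside Tomcat's JVM to check that one; on Java 7 and recent Java 6 updates, starting the JVM with -XX:+PrintFlagsFinal should also print the effective values of all flags.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class CompressedOopsCheck {

    /** Returns the effective value of a HotSpot flag of the running JVM as a string. */
    static String hotSpotFlag(String name) {
        try {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            // Throws IllegalArgumentException if the flag does not exist,
            // e.g. UseCompressedOops on a 32-bit JVM.
            return bean.getVMOption(name).getValue();
        } catch (IOException e) {
            throw new IllegalStateException("HotSpot diagnostic MXBean unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + hotSpotFlag("UseCompressedOops"));
        System.out.println("MaxHeapSize       = " + hotSpotFlag("MaxHeapSize") + " bytes");
    }
}
```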

    Now to your log: "CMS-remark: 2856086K(7168000K)] 4045123K(9216000K), 12.1134059 secs".

    CMS-remark isn't performed concurrently with the application (the description of this phase is "Stop-the-world phase. This phase rescans any residual updated objects in CMS heap, retraces from the roots and also processes Reference objects."). There is no general "do this to speed up this phase" advice; it's always a bit of trial and error, sadly. That said, I would try adjusting the ratio between the eden and the survivor spaces. The remark time should increase when the young gen is larger and decrease when the young gen is reduced.

    [EDIT]

    It's a bit old, but perhaps you can find some tips here: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2008-September/000205.html

  • Zendesk API User
    Author: isenberg - 9/8/2014 15:30

    As the application creates many temporary objects, reducing -XX:SoftRefLRUPolicyMSPerMB from 20 to 1 would decrease the number of those objects if some of them are held by soft references (the flag does not affect weak references). In addition, I suggest increasing the -Xmn value as long as the NewGen GC pauses stay below an acceptable limit, perhaps about 0.5 s. In general, short-lived temporary objects should not end up in the OldGen, because they fragment it, which leads to Full GCs after a longer runtime. The current problem is the 12 s pause caused by the CMS GC, as Peter already mentioned. CMS is the collector selected for the OldGen. It is the best choice for FirstSpirit, but in this case it is overloaded.
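    The effect of SoftRefLRUPolicyMSPerMB can be estimated with the rule Oracle documents for HotSpot: a softly reachable object is kept for about SoftRefLRUPolicyMSPerMB milliseconds per megabyte of free heap after its last access (the server VM measures free space against -Xmx). The sketch below is a simplified model of that rule, not HotSpot's code. The free-heap figure is derived from the remark log line earlier in this thread and assumes that the 9216000K heap capacity equals -Xmx; the class and method names are hypothetical.

```java
public class SoftRefRetention {

    /**
     * Simplified model of HotSpot's documented soft reference policy: a softly
     * reachable object survives about freeHeapMb * msPerMb milliseconds after
     * its last access before it becomes eligible for clearing.
     */
    static long retentionMillis(long freeHeapMb, long msPerMb) {
        return freeHeapMb * msPerMb;
    }

    /** True if an object idle for idleMillis would be eligible for clearing in this model. */
    static boolean eligibleForClearing(long idleMillis, long freeHeapMb, long msPerMb) {
        return idleMillis > retentionMillis(freeHeapMb, msPerMb);
    }

    public static void main(String[] args) {
        // Assumption: heap capacity 9216000K (from the remark log line) equals -Xmx,
        // with 4045123K of it in use, i.e. about 5049 MB free.
        long freeMb = (9216000L - 4045123L) / 1024;
        // 1000 is the JVM default, 20 the current setting here, 1 the suggested value.
        for (long msPerMb : new long[] {1000, 20, 1}) {
            System.out.printf("SoftRefLRUPolicyMSPerMB=%4d -> kept ~%,d ms after last access%n",
                    msPerMb, retentionMillis(freeMb, msPerMb));
        }
    }
}
```

    With about 5049 MB free, lowering the value from 20 to 1 shortens the retention in this model from roughly 101 s to roughly 5 s.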

    Maybe removing -XX:+CMSIncrementalMode helps, as that mode is an optimization for systems with 4 or fewer CPU cores.


    To increase the use of NewGen even further and reduce the load on the OldGen (CMS GC), replace

    -XX:TargetSurvivorRatio=80

    -XX:InitialTenuringThreshold=5

    -XX:MaxTenuringThreshold=10

    with

    -XX:+NeverTenure

    Objects in NewGen are then kept there as long as NewGen is not completely full; only then are they promoted to OldGen to make space for new objects in NewGen.

    The only drawback of a large NewGen is its high RAM usage: with the mirrored survivor spaces, it requires 2 to 3 times as much RAM as the Java objects it holds.
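    Where the memory goes can be illustrated with HotSpot's sizing rule: -XX:SurvivorRatio is the ratio of eden to one survivor space, so -Xmn is split into eden plus two equally sized survivor spaces, and between collections only eden and one survivor space hold objects. The sketch below ignores alignment and rounding; the -Xmn value, the ratios, and the class name are hypothetical.

```java
public class YoungGenSizing {

    /** Returns {eden, oneSurvivor} in MB for a given -Xmn and -XX:SurvivorRatio. */
    static long[] split(long xmnMb, int survivorRatio) {
        // young gen = eden + 2 survivors, and eden = survivorRatio * survivor
        long survivor = xmnMb / (survivorRatio + 2);
        long eden = xmnMb - 2 * survivor;
        return new long[] {eden, survivor};
    }

    public static void main(String[] args) {
        long xmnMb = 2400; // hypothetical -Xmn2400m
        for (int ratio : new int[] {8, 4, 2}) {
            long[] sizes = split(xmnMb, ratio);
            long eden = sizes[0];
            long survivor = sizes[1];
            System.out.printf(
                    "SurvivorRatio=%d: eden=%d MB, survivors=2 x %d MB, usable=%d MB, empty between GCs=%d MB%n",
                    ratio, eden, survivor, eden + survivor, survivor);
        }
    }
}
```

    A smaller SurvivorRatio gives each survivor space more room to keep objects in NewGen, at the cost of a smaller eden and more memory that sits empty between collections.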

    To be sure which parameter change really improves the situation, please test each change one at a time. All suggested parameters, except for the -Xmn increase, are current standard values of FirstSpirit 5.0.

  • Zendesk API User
    Author: king - 9/9/2014 7:56

    An interesting note regarding the "NeverTenure" option: according to the following web page, this JVM directive might mean that objects located in "New" are never promoted to "Old", where the CMS collector runs.

    However, according to H. Isenberg and his observations, "never" is not really the case. As soon as one of the two survivor spaces is completely filled with referenced objects and new objects in "NewGen" are waiting to be moved, the oldest objects are moved to "OldGen". Without "NeverTenure", these objects would have been moved once they reached a certain age, regardless of how full the space is. The disadvantage: really long-lived objects are copied back and forth for an unnecessarily long time. But when the ratio of temporary to long-lived objects is permanently very large, as with FirstSpirit, this is not noticeable.
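    The difference between age-based tenuring and -XX:+NeverTenure as described in this thread can be illustrated with a toy model. It is not HotSpot's algorithm: objects are promoted by age when a threshold is set, never by age when the hypothetical NEVER_TENURE constant is used, and otherwise only when the survivor space overflows, picking the oldest objects first as described above. The real collector's choice of overflowing objects may differ, and all sizes and lifetimes are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TenuringModel {

    /** An object living in survivor space. */
    static final class Obj {
        final boolean longLived;
        int age; // number of minor GCs survived

        Obj(boolean longLived, int age) {
            this.longLived = longLived;
            this.age = age;
        }
    }

    /** Hypothetical stand-in for -XX:+NeverTenure: an age threshold that is never reached. */
    static final int NEVER_TENURE = Integer.MAX_VALUE;

    /**
     * One minor GC: age all survivors, promote those reaching the threshold, then
     * promote the oldest remaining objects until the rest fit into the survivor space.
     * Leaves 'live' holding the objects kept in survivor space; returns the promoted ones.
     */
    static List<Obj> minorGc(List<Obj> live, int threshold, int survivorCapacity) {
        List<Obj> promoted = new ArrayList<>();
        for (Obj o : live) {
            o.age++;
            if (o.age >= threshold) {
                promoted.add(o);
            }
        }
        live.removeAll(promoted);
        live.sort(Comparator.comparingInt((Obj o) -> o.age).reversed()); // oldest first
        while (live.size() > survivorCapacity) {
            promoted.add(live.remove(0)); // overflow: promote the oldest
        }
        return promoted;
    }

    /** Runs the model; returns {promoted temporaries, promoted long-lived, survivor copies}. */
    static long[] simulate(int threshold, int survivorCapacity, int gcs) {
        List<Obj> live = new ArrayList<>();
        long promotedTemp = 0, promotedLong = 0, copies = 0;
        for (int gc = 1; gc <= gcs; gc++) {
            live.removeIf(o -> !o.longLived && o.age >= 2); // temporaries die after 2 GCs
            for (int i = 0; i < 5; i++) {
                live.add(new Obj(false, 0)); // temporaries still alive at this GC
            }
            live.add(new Obj(true, 0)); // one long-lived object per cycle
            for (Obj o : minorGc(live, threshold, survivorCapacity)) {
                if (o.longLived) {
                    promotedLong++;
                } else {
                    promotedTemp++;
                }
            }
            copies += live.size(); // each object kept in survivor space was copied once
        }
        return new long[] {promotedTemp, promotedLong, copies};
    }

    public static void main(String[] args) {
        System.out.println("MaxTenuringThreshold=5: " + Arrays.toString(simulate(5, 20, 30)));
        System.out.println("NeverTenure:            " + Arrays.toString(simulate(NEVER_TENURE, 20, 30)));
    }
}
```

    In this made-up run, neither policy promotes temporaries, while the NeverTenure variant promotes the long-lived objects later and performs more survivor copies, which is the disadvantage described above.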

    Starting with Oracle Java 1.8.0_40, e-Spirit hopes to use the G1 collector for FirstSpirit in production environments, as a full GC for PermGen/Metaspace will then no longer be necessary. Because FirstSpirit relies heavily on reflection for its WebApp/JavaClient-server communication, the "PermGen" fills up.

  • Zendesk API User
    Author: isenberg - 9/9/2014 8:23

    The definition of "NeverTenure" does indeed seem ambiguous. We made this parameter a standard setting in fs-wrapper.conf around FirstSpirit 5.0, with Oracle Java 1.6.0_27, after seeing good results in automated stress tests of FirstSpirit. The OldGen heap usage of a FirstSpirit server looks normal, even now with Oracle Java 1.7.0_67, with most of the OldGen occupied by the Berkeley cache.

    This correlates with the following descriptions of this parameter:

    http://blog.ragozin.info/2011/09/hotspot-jvm-garbage-collection-options.html

    -XX:+NeverTenure

    Objects from young space will never get promoted to tenured space while survivor space is large enough to keep them.

    However, the lead developer of the CMS GC expresses a different opinion at the following URL, without giving a reason. My guess is that there are good use cases for the parameter, as shown by our automated tests, and other cases where the automatic tenuring method gives better results:

    http://www.oracle.com/technetwork/server-storage/ts-4887-159080.pdf

    -XX:+NeverTenure

    ● Very bad idea!
