(Near?) zero overhead debugging with the IBM JVM
Scenario
- Web application running in a production environment is having
periodic OutOfMemory / extreme CPU load issues.
- The server is not responding to HTTP requests.
- ‘ps -ef’ shows a couple of Java processes (out of 100’s total)
consuming 99% of CPU time.
- The problem recurs with a frequency of between 6 and 24 hours.
- There is, as yet, no known procedure to reproduce the problem
on demand.
Questions
Some questions to which I want answers:
- What Java threads correspond to each UNIX process
- The stack trace for every Java thread
- What locks are held / being waited on
- What objects are allocated
Tools
The IBM JVM comes with a number of custom, built-in tools for
collecting the information required to answer these questions.
Unlike the standard tools in the Java Platform Debugging
Architecture, these IBM-specific facilities impose (near?) zero
performance overhead on the JVM, so can be left permanently
enabled on production systems. What is more, these tools are free,
quick and easy to use.
There is extensive (think 400+ pages long) documentation
available from the IBM JVM diagnosis docs site. What follows
is a brief overview of some of the details…
Java Dump (aka Javacore)
The javadump tool produces text files that contain diagnostic
information related to the JVM and the application running
within. This facility is enabled by default. A javadump can
be manually triggered by sending SIGQUIT to the root process
of the JVM. The dump file will be created in the first of the
following locations which is writable:
- The directory specified in IBM_JAVACOREDIR env variable
- The JVM’s current working directory
- The directory specified in TMPDIR env variable
- The /tmp directory
A dump file will be generated automatically if a native exception
occurs in the JVM. If the IBM_JAVADUMP_OUTOFMEMORY env variable
is set, then a dump will also be generated when heap space is
exhausted. The file name will be of the form javacore_[pid].[time].txt.
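For example, to trigger a javadump by hand (the pid and directory
here are illustrative):

export IBM_JAVACOREDIR=/var/tmp/dumps   # optional: choose where dumps land
kill -QUIT 1234                         # 1234 = pid of the JVM’s root process

The JVM handles the signal, writes the javacore file and carries
on running.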
Some of the information included in the dump file is:
- The OS memory info (eg from /proc/meminfo)
- User resource limits
- Signal handlers
- Environment variables
- OS memory maps (eg from /proc/[pid]/maps)
- List of all object monitors being waited on
- List of all system locks (and what thread, if any, is holding them)
- List of all threads, their Java names & their UNIX process ids
- Java stack trace for each thread
- Native OS stack trace for each thread
- List of all classloaders
- List of all loaded classes
This is all output in a well-formatted text file, which makes for
fairly easy human reading & trivial Perl scripting.
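Each record in a javacore is prefixed with a section tag, so standard
UNIX text tools get you a long way; if I remember the tag names
correctly, something like this lists every thread (check the tag
against your JVM’s output):

grep 3XMTHREADINFO javacore_*.txt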
Heap dump
The heapdump tool produces text files that contain a dump of
all objects currently allocated on the heap and a list of
references between objects. This facility is disabled by default,
but can be enabled by setting the env variable IBM_HEAP_DUMP=true
prior to starting the JVM. When enabled, a heapdump is generated
at the same time as the javadump, eg by sending a SIGQUIT signal.
The filename will be of the form heapdump_[pid].[time].txt. While
plain text, the format of this file isn’t directly interesting to
humans, but when processed with the HeapRoots tool you can get
some interesting information:
- Objects table – list of objects in the heap sorted by size,
address, name, subtree size, many more…
- Types table – list of classes with objects in heap sorted by
total size, instance count, name
- Gaps table – list of free spaces in the heap
- Object info – list of references to & from an object
- Paths/routes – find a route between two objects (if one exists).
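To recap the mechanics before moving on, the whole capture sequence
looks something like this (the pid is illustrative):

export IBM_HEAP_DUMP=true   # must be set before the JVM starts
# ... start the JVM as normal, wait for the problem to recur ...
kill -QUIT 1234             # writes both javacore_[pid].[time].txt
                            # and heapdump_[pid].[time].txt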
JFormat
The JFormat tool is a TUI/GUI for extracting & displaying
information from UNIX core dumps. To enable core dumps there
are two things to check before starting the JVM:
- Remove any core file size limits: ‘ulimit -c unlimited’
- Change into a writable directory
Now to generate a core dump, simply send the root process
a SIGABRT signal.
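Concretely (pid and directory are illustrative):

# in the shell that will start the JVM:
ulimit -c unlimited   # allow core files of any size
cd /var/tmp/dumps     # any writable directory
# ... start the JVM; then, when you want a core:
kill -ABRT 1234       # note: unlike SIGQUIT, this terminates the JVM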
Since core dumps are operating system specific, you must
now process the dump with the jextract tool to create a
platform-independent core dump.
$JAVA_HOME/jre/bin/jextract [path-to-native-core-dump]
Then you can launch the jformat gui with
$JAVA_HOME/bin/jformat -g
The JFormat GUI provides menus to pull out all sorts of
useful information, which I won’t go into now, other than
to say there is a lot of overlap with the Java Dump and
Heap Dump metrics.
Example
Having set the IBM_HEAPDUMP=true env variable & restarted the
servers, I just had to wait for the performance slowdown to
recur (which indicated the garbage collector was starting to
thrash). I then sent a SIGQUIT to the root process, generating
a heapdump file about 850 MB in size. This was then loaded into
the HeapRoots analysis program:
java -jar /tmp/HR205.jar heapdump3632.1083842645.txt
After parsing the file, HeapRoots shows a few basic statistics
about the dump:
# Objects         : 7,727,411
# Refs            : 12,807,015
# Unresolved refs : 31
Heap usage        : 315,413,344
The first thing to do is invoke the ‘process’ command. This resolves
all references between the objects, detects roots, and calculates
total size metrics per class & sub-tree.
Pure roots         : 174,088
... which reach    : 7,706,600
Artificial roots   : 1,531
Softlinks followed : 16,014
Now, it’s ready to answer some interesting queries. Let’s see what
the biggest tree roots are. This is done by running the ‘dt’
command – ‘Dump from roots, sort by total size’. It prompts for
some values such as max depth to print out and min size threshold,
then produces something like:
<0> [270,167,808] 0x100a2ef8 [72] java/lang/Thread
  <1> [270,167,680] 0x10053c70 [168] com/caucho/java/CompilingClassLoader
    <2> [270,164,496] 0x10053d18 [168] com/caucho/util/DirectoryClassLoader
      - 14 children of 0x10053d18 too deep.
    <2> {270,167,680} 0x10053c70 [168] com/caucho/java/CompilingClassLoader
    <2> <270,163,200 parent:0x10053d18> 0x10053dc0 [168] com/caucho/java/CompilingClassLoader
    - 13 children of 0x10053c70 filtered.
  - 4 children of 0x100a2ef8 filtered.
<0> [5,234,712] 0x16a85b90 [447,296] array of java/lang/Object
  - 111820 children of 0x16a85b90 filtered.
Unsurprisingly the root with the highest memory consumption is the top
level thread of the JVM. Not so enlightening. I figure the memory leak is
probably in CCM code, so let’s find the biggest object in a package whose
name begins with ‘com/arsdigita/’. For this we use the ‘Objects dump, sort
by total size’ command (‘ot com/arsdigita/’). Again this prompts for some
more optional filters, but accepting the defaults we get a list of big
objects:
        Addr     Size Root-owner     Parent  Total-size Name
-------------------------------------------------------------------------------
N 0x284ffb48      256 0x100a2ef8 0x284ffc88 224,107,816 class com/arsdigita/templating/Templating
N 0x220d3458       96 0x100a2ef8 0x22bc0ee8   4,585,520 com/arsdigita/london/theme/util/ThemeDevelopmentFileManager
N 0x1cbc24b0       32 0x100a2ef8 0x1e18a5a8   2,460,464 com/arsdigita/templating/XSLTemplate
N 0x23180b28       80 0x100a2ef8 0x22bc0ee8   2,164,688 com/arsdigita/london/search/RemoteSearcher
N 0x1bf215e0       56 0x100a2ef8 0x1bf2a850   1,844,152 com/arsdigita/persistence/Session
N 0x1bf2ab98       40 0x100a2ef8 0x1bf215e0   1,770,096 com/arsdigita/persistence/TransactionContext
N 0x1affda20       16 0x100a2ef8 0x1aa4d388   1,681,056 com/arsdigita/persistence/DataQueryImpl$UnaliasMapper
N 0x1affc508       96 0x100a2ef8 0x1affda20   1,681,040 com/arsdigita/persistence/DataAssociationCursorImpl
N 0x1a985ff0       56 0x100a2ef8 0x1a98e8e8   1,138,696 com/arsdigita/bebop/PageState
So, the class object for ‘com.arsdigita.templating.Templating’ is taking
up 224 MB of the heap. A quick look at the code shows it has a static
HashMap storing XSLTemplate objects with no size bounding. To confirm this
is the culprit we go back to the ‘Dump from roots, sort by total size’
command, this time starting the display from the object at address
0x284ffb48:
<0> [224,107,816] 0x284ffb48 [256] class com/arsdigita/templating/Templating
  <1> [224,092,408] 0x28500840 [56] java/util/HashMap
    <2> [224,092,352] 0x1cdf9160 [1,552] array of java/util/HashMap$Entry
      <3> [4,288,800] 0x2dbfa500 [32] java/util/HashMap$Entry
        <4> [3,216,568] 0x1bf5cc00 [32] java/util/HashMap$Entry
          - 3 children of 0x1bf5cc00 too deep.
        <4> [1,069,960] 0x2c824ca8 [32] com/arsdigita/templating/XSLTemplate
          - 4 children of 0x2c824ca8 too deep.
        - 1 child of 0x2dbfa500 filtered.
      <3> [3,216,584] 0x263c8ff8 [32] java/util/HashMap$Entry
        <4> [2,144,416] 0x3d00ade8 [32] java/util/HashMap$Entry
          - 3 children of 0x3d00ade8 too deep.
        <4> [1,069,912] 0x237796b0 [32] com/arsdigita/templating/XSLTemplate
          - 4 children of 0x237796b0 too deep.
        - 1 child of 0x263c8ff8 filtered.
      <3> [3,216,568] 0x19319d58 [32] java/util/HashMap$Entry
        <4> [2,144,400] 0x33ddd918 [32] java/util/HashMap$Entry
          - 3 children of 0x33ddd918 too deep.
        <4> [1,069,912] 0x1864f950 [32] com/arsdigita/templating/XSLTemplate
          - 4 children of 0x1864f950 too deep.
        - 1 child of 0x19319d58 filtered.
  [snipped for another 10 pages of output]
  <1> {224,107,816} 0x284ffb48 [256] class com/arsdigita/templating/Templating
  - 15 children of 0x284ffb48 filtered.
There were 362 objects expanded.
So, from this it’s clear that the problem is a huge HashMap in the
Templating class containing around two hundred XSLTemplate objects, each
around 1 MB in size. So, the moral of the story – if you implement a
static cache in a server, make sure you:
- benchmark the average size of objects in the cache
- decide how much memory can be set aside for it
- use the two figures above to calculate a good max entry
count for the cache before invoking a purge algorithm.
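For example, here is a minimal sketch of such a bounded cache using
java.util.LinkedHashMap’s removeEldestEntry hook (the class name,
budget figures and eviction policy are illustrative, not the actual
Templating fix):

import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedTemplateCache<K, V> extends LinkedHashMap<K, V> {

    // Derived from the first two steps above: e.g. a 64 MB budget
    // with ~1 MB average per template gives a cap of 64 entries.
    private final int maxEntries;

    public BoundedTemplateCache(int maxEntries) {
        // access-order = true, so iteration order is least-recently-used first
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after every put(); returning true
    // purges the least-recently-used entry once over budget.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}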