BHeapSampler: Arndt's Java Heap Analysis Tool FAQ

Questions & Answers

What is BHeapSampler?

BHeapSampler is a Java memory analysis tool that presents a graph view of a Java heap dump.

How does it work?

It takes a size-weighted random sample of objects, computes a root path for each sampled object, and adds these paths up into a graph.
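"Size-weighted" means that large objects are proportionally more likely to be picked. One common way to do that is sketched below as an assumption (the class SizeWeightedSampler and its methods are hypothetical, not BHeapSampler's code). It picks a random byte offset in the heap and maps it back to the object containing it, using cumulative sizes and a binary search:

```java
import java.util.Random;

// Hypothetical sketch, not BHeapSampler's actual code: pick objects with
// probability proportional to their (shallow) size.
class SizeWeightedSampler {
    private final long[] cumulative; // cumulative[i] = sizes[0] + ... + sizes[i]

    SizeWeightedSampler(long[] sizes) {
        cumulative = new long[sizes.length];
        long sum = 0;
        for (int i = 0; i < sizes.length; i++) {
            sum += sizes[i];
            cumulative[i] = sum;
        }
    }

    // Index of the object owning byte 'offset' (0 <= offset < total size):
    // the smallest i with cumulative[i] > offset, so zero-size objects are never hit.
    int objectAt(long offset) {
        int lo = 0, hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > offset) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    // Draws one object index; assumes the total size is positive.
    int sample(Random rnd) {
        long total = cumulative[cumulative.length - 1];
        long offset = Math.min((long) (rnd.nextDouble() * total), total - 1);
        return objectAt(offset);
    }

    public static void main(String[] args) {
        // A 1000-byte object next to a 10-byte one: the big one wins about 99% of draws.
        SizeWeightedSampler s = new SizeWeightedSampler(new long[] {1000, 10});
        Random rnd = new Random(1);
        int bigHits = 0;
        for (int i = 0; i < 10_000; i++) if (s.sample(rnd) == 0) bigHits++;
        System.out.println("big object drawn " + bigHits + " of 10000 times");
    }
}
```

Sampling by byte rather than by object means each sampled path stands for a roughly equal share of heap memory, which is what makes the path counts readable as memory proportions.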

Why a graph? How does it compare to Eclipse-MAT's Dominator Tree?

BHeapSampler generates a class-level graph, whereas Eclipse-MAT's Dominator Tree is an instance-level tree. These are two fundamentally different concepts. BHeapSampler compacts information by projecting the instance graph onto a graph whose nodes more or less correspond to classes. "More or less" because a node's identity can lie anywhere between instance and class, depending on configuration. Mostly it is class identity; second most common is "parent identity", meaning that the node inherits its identity from its referrers.
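The projection can be pictured with a small hypothetical sketch (ClassGraph and addPath are illustrative names, and only plain class identity is modeled, not parent identity). Each sampled instance path is reduced to its sequence of class names, and the class-to-class edges are counted across all paths:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: add up sampled root paths into a class-level graph.
// Node identity here is plain class identity; "parent identity" is not modeled.
class ClassGraph {
    // "FromClass -> ToClass" mapped to the number of sampled paths using that edge
    final Map<String, Integer> edgeCounts = new HashMap<>();

    // classPath holds the class name of each instance on one root path, root first.
    void addPath(List<String> classPath) {
        Set<String> seenOnThisPath = new HashSet<>();
        for (int i = 0; i + 1 < classPath.size(); i++) {
            String edge = classPath.get(i) + " -> " + classPath.get(i + 1);
            // Count each edge once per path: a long chain of linked-list nodes
            // adds 1 to the Node -> Node self-loop, however long the chain is.
            if (seenOnThisPath.add(edge)) edgeCounts.merge(edge, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        ClassGraph g = new ClassGraph();
        g.addPath(List.of("Thread", "HashMap", "Node", "Node", "Node", "byte[]"));
        g.addPath(List.of("Thread", "HashMap", "Node", "String"));
        System.out.println(g.edgeCounts);
    }
}
```

Thousands of distinct HashMap$Node instances collapse into a single node, which is where the compaction comes from.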

How is the result presented?

BHeapSampler does not include a graph layouter. The resulting graph is written in the DOT language (see http://en.wikipedia.org/wiki/DOT_language). Layout can be done with Graphviz, e.g. to produce a PDF from the command line: "dot -Tpdf -omemory_graph.pdf memory_graph.dot"
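For readers unfamiliar with DOT, here is a hypothetical sketch of how a weighted class-level graph can be serialized (DotWriter and toDot are illustrative names; BHeapSampler's actual output will carry more attributes than the edge label used here):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: print a weighted class-level graph in the DOT language.
class DotWriter {
    // edges: from-class -> (to-class -> number of sampled paths on that edge)
    static String toDot(Map<String, Map<String, Integer>> edges) {
        StringBuilder sb = new StringBuilder("digraph memory {\n");
        // TreeMaps give a stable, sorted output order.
        for (Map.Entry<String, Map<String, Integer>> from : new TreeMap<>(edges).entrySet()) {
            for (Map.Entry<String, Integer> to : new TreeMap<>(from.getValue()).entrySet()) {
                sb.append("  ").append(quote(from.getKey()))
                  .append(" -> ").append(quote(to.getKey()))
                  .append(" [label=\"").append(to.getValue()).append("\"];\n");
            }
        }
        return sb.append("}\n").toString();
    }

    // Class names such as "byte[]" are not valid bare DOT IDs, so quote them;
    // inside a quoted DOT string only the double quote needs escaping.
    static String quote(String id) {
        return "\"" + id.replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.print(toDot(Map.of(
            "Thread", Map.of("HashMap", 42),
            "HashMap", Map.of("byte[]", 40))));
    }
}
```

Redirecting the output with "java DotWriter > memory_graph.dot" produces a file that the dot command shown above can lay out.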

Why statistical and not exact? What about the statistical error?

For algorithmic reasons only. The graph view was derived from a similar performance-analysis tool based on statistical stack sampling; the stack traces were simply replaced by memory paths. That's why it is still statistical. I never considered an exact approach; it may well be possible. However, statistical error is not the problem when analysing a heap dump. The headache comes from the conceptual problem of assigning memory to the structures responsible for it. Statistical error can always be made as low as needed by computing enough paths.
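To make the last claim concrete, assume the n sampled paths are independent draws and k of them pass through a given node. The node's share of the (size-weighted) sample is then a binomial proportion with the usual standard error. The bound is worst at p = 1/2, and quadrupling n halves it:

```latex
\hat{p} = \frac{k}{n}, \qquad
\operatorname{SE}(\hat{p}) = \sqrt{\frac{p\,(1-p)}{n}} \;\le\; \frac{1}{2\sqrt{n}},
\qquad n = 10\,000 \;\Rightarrow\; \operatorname{SE}(\hat{p}) \le 0.005
```

So 10 000 paths already pin each node's share down to within about half a percentage point (one standard error).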

What is the maximum size of heap dumps BHeapSampler can process?

BHeapSampler loads part of the heap dump into its own memory, but never allocates more than 2/3 of the dump's size. So there is no limit on dump size, only a limit of about 2 billion objects (the 32-bit maximum int, 2^31 - 1). Just give the tool enough heap via -Xmx.
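As a rough worked example of these limits, the hypothetical helper below turns a dump size into an -Xmx value. It assumes the 2/3 rule is the tool's only memory requirement and ignores JVM overhead. It also checks the object-count limit:

```java
// Hypothetical helper applying the limits stated above; not part of BHeapSampler.
class HeapBudget {
    // Upper bound on the tool's working memory: 2/3 of the dump size.
    // JVM overhead beyond that is not accounted for here.
    static long maxToolBytes(long dumpBytes) {
        return dumpBytes * 2 / 3;
    }

    // Formats the bound as an -Xmx flag, rounded up to whole megabytes.
    static String xmxFlag(long dumpBytes) {
        long mb = (maxToolBytes(dumpBytes) + (1L << 20) - 1) >> 20;
        return "-Xmx" + mb + "m";
    }

    // The object count must fit a 32-bit signed int (at most 2^31 - 1).
    static boolean objectCountFits(long objectCount) {
        return objectCount <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long dump = 12L << 30; // a 12 GiB dump
        System.out.println(xmxFlag(dump)); // prints -Xmx8192m
    }
}
```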

Which heap-dump-formats are supported?

BHeapSampler reads heap dumps in HPROF binary format. The extended HPROF format used by Android/Dalvik can be processed as well (tested with Gingerbread only; if it fails, convert the dump with the hprof-conv tool first).
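To tell which variant a file is, it helps to look at the header. This sketch assumes the commonly documented HPROF layout, which is worth checking against the format's own documentation. The layout is a null-terminated version string, a 4-byte identifier size, and an 8-byte timestamp. Standard JVM dumps typically say "JAVA PROFILE 1.0.2", while Android's extended format says "JAVA PROFILE 1.0.3". HprofHeader and its methods are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: read the fixed header at the start of an HPROF binary dump.
class HprofHeader {
    final String version;       // e.g. "JAVA PROFILE 1.0.2"; Android writes "JAVA PROFILE 1.0.3"
    final int idSize;           // size in bytes of object IDs (4 or 8)
    final long timestampMillis; // dump time, milliseconds since the epoch

    HprofHeader(String version, int idSize, long timestampMillis) {
        this.version = version;
        this.idSize = idSize;
        this.timestampMillis = timestampMillis;
    }

    static HprofHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = data.read()) > 0) sb.append((char) b); // null-terminated ASCII
        if (b < 0) throw new IOException("truncated HPROF header");
        int idSize = data.readInt();      // u4, big-endian
        long timestamp = data.readLong(); // two u4 (high, low) == one big-endian u8
        return new HprofHeader(sb.toString(), idSize, timestamp);
    }

    static HprofHeader parse(byte[] bytes) {
        try {
            return read(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isAndroidFormat() {
        return version.equals("JAVA PROFILE 1.0.3");
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            HprofHeader h = read(in);
            System.out.println(h.version + ", id size " + h.idSize
                + (h.isAndroidFormat() ? " (Android format)" : ""));
        }
    }
}
```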

Does BHeapSampler ignore weak references in path-finding?

Yes and no. It uses ordered avoid-class lists, which by default avoid Weak/Soft references and the finalizer queue, and it prefers static roots over dynamic ones. So a weak path is found if and only if no strong path exists, and the finalizer queue is found if and only if the object is finalizable.
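One way to model an ordered avoid list is sketched below. This is a hypothetical model, not the tool's implementation: the names AvoidListChooser, rank and choose and the order of the default entries are assumptions, and the static-root preference is not modeled. Each candidate path is ranked by the most-avoided class it contains. A path with no avoided class beats any path that has one, and ties go to the shorter path:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical model of an ordered avoid-class list; not BHeapSampler's code.
// avoidList is ordered from most-avoided to least-avoided class.
class AvoidListChooser {
    // Position of the most-avoided class on the path, or avoidList.size() if the
    // path contains no avoided class at all (the best possible rank).
    static int rank(List<String> path, List<String> avoidList) {
        for (int i = 0; i < avoidList.size(); i++) {
            if (path.contains(avoidList.get(i))) return i;
        }
        return avoidList.size();
    }

    // Prefers the path with the highest rank, then the shortest one.
    static List<String> choose(List<List<String>> candidates, List<String> avoidList) {
        Comparator<List<String>> byPreference =
            Comparator.comparingInt((List<String> p) -> -rank(p, avoidList))
                      .thenComparingInt(List::size);
        return candidates.stream().min(byPreference).orElseThrow();
    }

    public static void main(String[] args) {
        List<String> avoid = List.of("java.lang.ref.Finalizer",
                                     "java.lang.ref.WeakReference",
                                     "java.lang.ref.SoftReference");
        List<String> weak = List.of("Thread", "java.lang.ref.WeakReference", "Foo");
        List<String> strong = List.of("Thread", "Cache", "Entry", "Foo");
        System.out.println(choose(List.of(weak, strong), avoid)); // the strong path
    }
}
```

The weak path is shorter but still loses, which matches the "weak paths only if no strong path exists" behaviour described above.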

How can I find an alternative path if the graph shows a non-exclusive shortest path?

You can either use the random-walk path finder, or you can modify the avoid-class-lists to avoid the path that you already know.
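The idea behind a random-walk path finder can be sketched as follows, under stated assumptions. The walk moves backwards from the object through referrers chosen uniformly at random until it reaches a GC root. The uniform choice, the names, and the integer object IDs are hypothetical, and BHeapSampler's own finder may differ:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical random-walk path finder; BHeapSampler's own finder may differ.
class RandomWalkPathFinder {
    // Walks backwards from target, each step choosing a uniformly random referrer
    // not yet on the path, until a GC root is reached. Returns the path root-first,
    // or null on a dead end (the caller can simply retry with another walk).
    static List<Integer> walk(int target, Map<Integer, List<Integer>> referrers,
                              Set<Integer> roots, Random rnd) {
        Deque<Integer> path = new ArrayDeque<>();
        Set<Integer> onPath = new HashSet<>();
        int current = target;
        path.addFirst(current);
        onPath.add(current);
        while (!roots.contains(current)) {
            List<Integer> candidates = new ArrayList<>();
            for (int r : referrers.getOrDefault(current, List.of())) {
                if (!onPath.contains(r)) candidates.add(r);
            }
            if (candidates.isEmpty()) return null;
            current = candidates.get(rnd.nextInt(candidates.size()));
            path.addFirst(current);
            onPath.add(current);
        }
        return new ArrayList<>(path);
    }

    public static void main(String[] args) {
        // Object 3 is reachable from root 0 (via 1) and directly from root 2.
        Map<Integer, List<Integer>> referrers = Map.of(3, List.of(1, 2), 1, List.of(0));
        Set<Integer> roots = Set.of(0, 2);
        Random rnd = new Random();
        for (int i = 0; i < 5; i++) System.out.println(walk(3, referrers, roots, rnd));
    }
}
```

Unlike a shortest-path search, repeated walks surface different routes to the same object. Excluding nodes already on the path keeps every walk finite even when the object graph has cycles.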

Why are there two different avoid class lists?

The avoidGhostList is used for ghost-like references (Weak/Soft references, finalizers). The avoidClassList is used to express preferences among normal, strong paths. The technical difference is that paths avoided via the ghost list do not prevent the strong path from being displayed as exclusive.
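Below is one possible reading of that rule as a hypothetical Java check. ExclusivityCheck and isExclusive are illustrative names, and enumerating all paths like this only works on toy inputs. The shown path counts as exclusive unless some other path avoids every ghost-list class. A path that was merely ranked lower via avoidClassList is still a strong path and does break exclusivity:

```java
import java.util.List;
import java.util.Set;

// Hypothetical check illustrating the two lists' different effect on exclusivity.
class ExclusivityCheck {
    // The chosen path counts as exclusive if every other path to the object
    // goes through at least one ghost-list class (weak/soft/finalizer).
    static boolean isExclusive(List<String> chosen, List<List<String>> allPaths,
                               Set<String> ghostList) {
        for (List<String> other : allPaths) {
            if (other.equals(chosen)) continue;
            boolean ghostly = other.stream().anyMatch(ghostList::contains);
            if (!ghostly) return false; // a second strong path exists
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> ghosts = Set.of("java.lang.ref.WeakReference");
        List<String> strong = List.of("Thread", "Cache", "Foo");
        List<String> weak = List.of("Thread", "java.lang.ref.WeakReference", "Foo");
        System.out.println(isExclusive(strong, List.of(strong, weak), ghosts)); // true
    }
}
```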

Is BHeapSampler better than Eclipse-MAT?

BHeapSampler is powerful at what it does: presenting an intuitive view of the dominating structures of a Java heap. It can do that in production environments where transferring the heap dump to the developer's desk may fail due to size or security restrictions. However, it is just a command-line tool, not a full-featured memory debugger, so comparing it to other tools is comparing apples and oranges.

What is the development status of BHeapSampler?

Statistical heap analysis is a new idea, and BHeapSampler was started as a proof of concept with a limited time budget. It has evolved into a field-proven tool, but there is no active development beyond bug fixes. It is presented here as a binary "run-and-see" version with limited configuration (e.g. a hardcoded identity policy), mainly to present the concept of getting an intuitive view of memory structures via a class-level graph. It would be nice to see the community pick up the idea for an open-source project or for integration into existing tools.

You provide BHeapSampler as an obfuscated binary. How can I be sure that it does not do any bad things like calling home or tampering with my system?

It does nothing like that. It just reads the dump and writes two files. I did not use any aggressive obfuscation options, and it's only 20k of bytecode, so you can simply decompile it and read what it does.