We mostly use memory-safe high-level languages at LShift (although we’ve done the odd embedded systems dev job), but sometimes a bit of systems programming know-how still comes in handy. I had the misfortune of a pure (i.e. no-JNI) Java program segfaulting on me with Oracle Java 7 in a non-reproducible fashion. I wanted to find out what exactly the program was up to at the point of the crash. Helpfully, on fatal errors Java will generate a slightly obscurely named file
is the pid your deceased Java process ran under (the
comes from HotSpot, in case you wonder). This file contains, amongst other things, a VM stack trace which will tell you where in C-land things went wrong.
But let’s cut straight to the chase and open the core dump file in gdb like so:
OK, so far we’ve learned two things. Firstly, the program died on SIGABRT (signal 6), which is raised by the abort(3) call, which in turn is invoked, amongst other things, on failed asserts. Secondly, there are no debug symbols in this java binary, hence no source-level debugging, which will make things more… interesting.
Some quick googling suggests there’s no quick and easy way to get a debug build for Oracle JDK (yet another reason to use OpenJDK…).
Undeterred, we start out with a look at the b(ack)t(race):
Right, so it looks like java segfaulted whilst trying to read an entry from a
zipfile. Let’s look at the relevant stackframe:
OK, at this point the absence of debug symbols starts to make itself painfully felt and I wink over my colleague Jarek, who’s got l00ter GDB skills than me. Deprived of nice local variables or source code to look at, we can still disassemble the code at the address the instruction pointer (
register) points to and look at things at the ASM level like so:
But let’s skip that for now and take a step back. The main thing I’d actually like to know is what zipfile java was trying to read when it segfaulted. But without debug symbols and thus the ability to inspect local variables and function call args it’s not obvious how to do so. The first ingredient we need to make progress is the relevant function signature.
Thankfully, Java is GPL’ed so we could download the source, but a bit of Googling is quicker. The signature we’re looking for is:
Another quick search tells us what jzfile looks like:
Fortuitously the filename is right at the beginning of the struct, which means we could get at it easily if we just had a pointer to the struct instance itself: we just need another pointer dereference (by contrast getting at the content of
for example would be more involved, because we would not only need to work out the size of all the preceding entries in the struct but also how much padding the compiler added for alignment).
But how do we get at the pointer
that was passed as the first arg to
? Well, since I was running this on a 64-bit Unix workstation, we need to look up the Linux calling conventions on amd64. It turns out the first argument is passed in the RDI register. So let’s have a look at the registers:
Right, so we want to inspect the content at
Alternatively, since we’re on a 64-bit (= 8 bytes) machine, this gives the same
Let’s do the second dereference:
Aha! The segfault occurred as java was trying to read in SOME_PROJECT’s
Since gdb can also evaluate simple C expressions with the p(rint) command, we
could also directly have done:
But what if we had indeed wanted to look at
instead? The easiest way would probably have been to let the compiler do the work of computing the offset for us, by writing a small program that defines the same struct type and prints out the offset between the two pointers. Or we could have just loaded a debug build of a program that uses that struct as a bogus symbol table in gdb. At this point though, I’m glad I only rarely need to leave the comfortable confines of a high-level VM these days.