Debugging a segfaulting binary without debug symbols

By: on March 31, 2013

We mostly use memory-safe high-level languages at LShift (although we’ve done the odd embedded systems dev job), but sometimes a bit of systems programming know-how still comes in handy. I had the misfortune of a pure (i.e. no-JNI) Java program segfaulting on me with Oracle Java 7 in a non-reproducible fashion. I wanted to find out what exactly the program was up to at the point of the crash. Helpfully, on fatal errors java will generate a slightly obscurely named file hs_err_pid${pid}.log, where ${pid} is the pid your deceased java process ran under (the hs comes from HotSpot, in case you wonder). This file contains, amongst other things, a VM stack trace which will tell you where in C-land things went wrong.

But let’s cut straight to the chase and open the core dump file in gdb like so:

> gdb `which java` ./core
GNU gdb (GDB) 7.1-ubuntu
[...]
Reading symbols from /usr/bin/java...(no debugging symbols found)...done.
[...]
Program terminated with signal 6, Aborted.
#0  0x00007ff378354a75 in raise () from /lib/
(gdb)

OK, so far we’ve learned two things. Firstly, the program died on SIGABRT (signal 6), which is raised by the abort(3) call (invoked, amongst other things, on failed asserts). Secondly, there are no debug symbols in this java binary, hence no source-level debugging, which will make things more… interesting.

Some quick googling suggests there’s no quick and easy way to get a debug build for Oracle JDK (yet another reason to use OpenJDK…).

Undeterred we start out with a look at the b(ack)t(race):
(gdb) bt
#0  0x00007ff378354a75 in raise () from /lib/
#1  0x00007ff3783585c0 in abort () from /lib/
#2  0x00007ff377d40455 in os::abort(bool) () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/server/
#3  0x00007ff377ea0717 in VMError::report_and_die() () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/server/
#4  0x00007ff377d43f60 in JVM_handle_linux_signal () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/server/
#5
#6  0x00007ff3763b837e in ZIP_Read () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/
#7  0x00007ff3763b7d5c in Java_java_util_zip_ZipFile_read () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/
#8  0x00007ff37052a17e in ?? ()
#9  0x0000000000000000 in ?? ()

Right, so it looks like java segfaulted whilst trying to read an entry from a
zipfile. Let’s look at the relevant stackframe:

(gdb) frame 6
#6  0x00007ff3763b837e in ZIP_Read () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/
(gdb) info frame
Stack level 6, frame at 0x7ff361bdf3e0:
 rip = 0x7ff3763b837e in ZIP_Read; saved rip 0x7ff3763b7d5c
 called by frame at 0x7ff361be14b0, caller of frame at 0x7ff361bdf3b0
 Arglist at 0x7ff361bdf3d0, args:
 Locals at 0x7ff361bdf3d0, Previous frame's sp is 0x7ff361bdf3e0
 Saved registers:
  rbx at 0x7ff361bdf3b0, rbp at 0x7ff361bdf3d0, r12 at 0x7ff361bdf3b8, r13 at 0x7ff361bdf3c0, r14 at 0x7ff361bdf3c8, rip at 0x7ff361bdf3d8

OK, at this point the absence of debug symbols starts to make itself painfully felt and I wave over my colleague Jarek, who’s got l33ter GDB skills than I have. Deprived of nice local variables or source code to look at, we can still disassemble the code around the address the instruction pointer (rip register) points to and look at things at the ASM level like so:

(gdb) disassemble 0x7ff3763b837e
But let’s skip that for now and take a step back. The main thing I’d actually like to know is what zipfile java was trying to read when it segfaulted. But without debug symbols and thus the ability to inspect local variables and function call args it’s not obvious how to do so. The first ingredient we need to make progress is the relevant function signature.

Thankfully, Java is GPL’ed so we could download the source, but a bit of Googling is quicker. The signature we’re looking for is:
jint ZIP_Read(jzfile *zip, jzentry *entry, jlong pos, void *buf, jint len);
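Another quick search tells us what the structs involved look like. Here is jzentry; jzfile likewise starts with a char *name member, which holds the path of the zip file: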
typedef struct jzentry {  /* Zip file entry */
    char *name;           /* entry name */
    jlong time;           /* modification time */
    jlong size;           /* size of uncompressed data */
    jlong csize;          /* size of compressed data (zero if uncompressed) */
    jint crc;             /* crc of uncompressed data */
    char *comment;        /* optional zip file comment */
    jbyte *extra;         /* optional extra data */
    jlong pos;            /* position of LOC header or entry data */
} jzentry;

Fortuitously the filename is right at the beginning of the struct, which means we could get at it easily if we just had a pointer to the struct instance itself: we just need another pointer dereference. By contrast, getting at the content of char *comment, for example, would be more involved, because we would not only need to work out the size of all the preceding members in the struct but also how much padding the compiler added for alignment.

But how do we get at the pointer jzfile *zip that was passed as the first arg to ZIP_Read? Well, since I was running this on a 64-bit Unix workstation, we need to look up the Linux calling convention on amd64. It turns out the first argument is passed in the RDI register. So let’s have a look at the registers:

(gdb) info registers
rax            0x7ff361bdf3f0   140683293619184
rbx            0xf5             245
rcx            0x7ff361bdf3f0   140683293619184
rdx            0x0              0
rsi            0x0              0
rdi            0x7ff3641a9d40   140683333246272
rbp            0x7ff361bdf3d0   0x7ff361bdf3d0
rsp            0x7ff361bdf3b0   0x7ff361bdf3b0
r8             0xf5             245
r9             0x7ff361be14d8   140683293627608
r10            0x7ff37052a104   140683538243844
r11            0xeeb4f6e0       4004837088
r12            0x7ff3641a9d40   140683333246272
r13            0x0              0
r14            0x7ff361bdf3f0   140683293619184
r15            0x352f1d8        55767512
rip            0x7ff3763b837e   0x7ff3763b837e
eflags         0x206            [ PF IF ]
cs             0x33             51
ss             0x2b             43
ds             0x0              0
es             0x0              0
fs             0x0              0
gs             0x0              0

Right, so we want to inspect the content at 0x7ff3641a9d40.

(gdb) help x
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
  t(binary), f(float), a(address), i(instruction), c(char) and s(string).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.
Defaults for format and size letters are those previously used.
Default count is 1. Default address is following last thing printed
with this command or "print".

(gdb) x/a 0x7ff3641a9d40
0x7ff3641a9d40: 0x7ff3641a9e10

Alternatively, since we’re on a 64-bit machine (so pointers are 8 bytes wide), this gives the same result:
(gdb) x/gx 0x7ff3641a9d40
0x7ff3641a9d40: 0x00007ff3641a9e10
Let’s do the second dereference:
(gdb) x/s 0x00007ff3641a9e10
$4 = 0x7ff3641a9e10 "/home/alexander/.m2/repository/SOME_PROJECT/resources-1.0.jar"
Aha! The segfault occurred as java was trying to read in SOME_PROJECT’s
resources jar!

Since gdb can also evaluate simple C expressions with the p(rint) command, we
could also directly have done:
(gdb) p *((char**) 0x7ff3641a9d40)
$4 = 0x7ff3641a9e10 "/home/alexander/.m2/repository/SOME_PROJECT/resources-1.0.jar"


But what if we had indeed wanted to look at char *comment instead? The easiest way would probably have been to let the compiler do the work of computing the offset for us, by writing a small program that defines the same struct type and prints out the offset of the member within the struct. Or we could have just loaded a debug build of a program that uses that struct as a bogus symbol table in gdb. At this point though, I’m glad I only rarely need to leave the comfortable confines of a high-level VM these days.



  1. Tim Flynn says:

    Interesting, just thought I’d mention that another way you probably could have found the problem file (once you knew you were looking for a problem with a file load) would be to run the java process with strace -eopen. Most likely the problem file would have been the last file opened by the process.
