The great GC vs reference counting debate

By: on September 19, 2013

I read a blog post post recently to the effect that GC is too expensive in mobile devices, Steve Jobs was right, reference counting is the way. It’s titled ‘why mobile web apps are slow’.

I’m inclined to take issue with this: It’s a long since resolved dispute, and GC won. I don’t want Steve Jobs to reach out from the grave and drag us back to the 70s. There’s nothing special about mobile phones: they are more powerful than the computers that GC won on in the first place.

Don’t get me wrong, Apple must have had a good reason to choose reference counting for memory management. That reason is actually pretty obvious: Objective-C is not a safe language, so you can’t move objects. If you can’t move objects, You have to use something like malloc to minimise fragmentation: you pay a big up front cost to allocate an object in the right place. In most GC schemes, allocation is free – you just increment a pointer. This is fine, because you are going to iterate through all your objects every now and again, and it doesn’t cost a whole lot more to move some of them around at the same time. You get to compact away all the fragmentation this results in. This makes GC fairly miserly with memory too. In iOS , if you want to allocate something bigger than the largest free fragment, you die. With GC, you have to wait a while for the GC to compact everything, after which memory will be compacted 100% efficiently and there will be a single memory extent as big as all available memory. Objective-C can’t have any of these benefits, so Apple chose a different route for iOS, and put the best possible spin on it.

The amusing thing is, this leads us to why web apps are so slow. Reference counting requires the developer take care to avoid reference loops, because they leak memory. iOS just demands the programmer to take this care. Browsers can’t  do that, because the ECMAScript standard doesn’t place this demand on programmers. Browsers must use GC. But they can’t move objects, because they are largely written in C, which is also not safe. The C code and javascript share a large data structure: the DOM. The ECMAScript can manipulate the DOM, so the DOM is managed memory. Fragmentation has to be managed on allocation, and on top of that we have to GC. A browser is the worst possible hybrid of memory management techniques.

The author of the above post doesn’t see things this way. He references Python as an example of GC, but it actually does things the same hopelessly inefficient way the web browser does, for the same reasons. He cites a 10 year old paper for empirical evidence on how inefficient GC is. Citations are good, but I think there’s a bit of a sample bias here. Anyway, the paper has a few problems. It says malloc takes 6 clock cycles. It does, until you allocate a certain amount of memory, configured during compilation of libc. once you pass that threshold, it starts trying to manage fragmentation. Then it takes 120 cycles. I suspect they assumed malloc takes constant time, and neglect to measure the cost of free at all. The GC test uses Jikes, and the authors pick their own GC strategies, rather than use proven ones. None of the strategies make any sense, and no rational for there selection is offered: you would expect the results to be poor. A test of the performance of the standard sun JVM strategy isn’t included.

Mobile apps won’t be fast until browsers are re-written in a fast, safe programming language that can share a good GC with ECMAScript. Of course browsers won’t be secure until then either, but I’m not holding my breath. There is no such language that is sufficiently popular, and Google fluffed it’s lines with Go. Maybe ECMAScript implementations will gain so much in performance, that all code that manipulates the DOM will be written in ECMAScript, and we can finally use a high performance GC.


  1. Masklinn says:

    The use of Python as an example in the post is quite funny as CPython (the main implementation, though pypy is also mentioned) is usually derided for not having a GC as it stuck to refcounting. Unless you create cycles (which will require the invocation of the cycle breaker to be collected) it thus has no more GC overhead than Objective-C (it has a lot of memory overhead in other places)

  2. Sam Tobin-Hochstadt says:

    You should look at Mozilla’s Rust and Servo projects, which are the language and web browser projects that you’re looking for.

  3. Peter Gerdes says:

    Now don’t get me wrong. I’m all for GC in almost all applications (I have yet to see a good GC that doesn’t stop the world while allowing allocation and object sharing on many threads…but I suspect these papers are trying to solve too much rather than getting the real use cases first).

    However, REFERENCE COUNTING MUST BE FASTER THAN GC. Why? Because exact GC (no mistaking ints for pointers all garbage is eventually collected) IS heavily heavily optimized reference counting. I still reference count if I use the optimization that when I delete x I remove 3 references to and skip the code that counts them up in the delete. GC just goes all the way and stores the reference counts in the pointer graph itself.

    Ok, this sounds like nitpicking but I bring it up for a good reason. The right distinction is whether humans or compilers/runtimes manage garbage not unprincipled distinctions between the kind of algorithms used.

    The best algorithms blend aspects of both. They trace out possible data paths and infer that no references can exist at certain places. For certain kinds of objects they may simple store the number of paths from the root set to an object if these can be easily recognized.

    But yah, the article stupid. LISP GC was pretty damn good even 10 years ago.

  4. david says:

    For example: a copying collector for a language with immutable data is not reference counting: The collector never knows how many references there are, it just knows which generation any given object is in (by doing pointer comparisons).

    Even with mutable data, and mark sweep, a GC only tags things with reachability, which is boolean: not counting.

    I’m not aware of a GC that ‘stores reference counts in the pointer graph’. Can you link to one?

  5. tonyg says:

    The paper “A Unified Theory of Garbage Collection”, Bacon, Cheng and Rajan (OOPSLA 2004) is directly relevant here:

    From the abstract: “Tracing and reference counting are uniformly viewed as being fundamentally different approaches to garbage collection that possess very distinct performance properties. […] they are in fact duals of each other. […] all high-performance collectors […] are in fact hybrids of tracing and reference counting.”

  6. david says:

    This paper keeps saying that mark sweep calculates reference counts. But admits that in a naive implementation, it doesn’t.

    Basically, this line is a bit telling: ‘While this may seem slightly counter intuitive, recall that in our formulations all collectors compute reference counts, rather than just mark bits’

    Obviously you can describe collection algorithms this way: knowing that you have > 0 references always tells you if an object is reachable, but the observation is meaningless, unless the algorithm actually uses the count.

    One example is keeping the reference count in ‘remembered set’ that keeps track of references from mature to nursery. Obviously you don’t need to know the reference count to know it’s non-zero, and this is enough for all the collectors I know of.

    If you don’t know the reference count collection of cycles that cross generations is delayed. This could be really important. Or it could not be. No evidence is presented either way.

  7. Paul Crowley says:

    If an algorithm allows an object to be

    • allocated simply by adding to a free pointer (checking for overrun)
    • referenced in a new object simply by writing a reference
    • freed in zero clock cycles by simple omission

    then it’s not in any meaningful sense reference counted. Sure, objects that survive collection are in some sense being counted (though only the 0 and >0 cases matter) but the important thing from a performance point of view is that very little work is done for short-lived objects.

  8. david says:

    If I was ever aware of the existence of Rust, I’d forgotten. It sounds a lot like a programming language I might use, so thanks for pointing that out.

  9. Ben says:

    No one outside of research uses best practice ref counting …most ref count is quite simple .

    Hybrid GCs which use Nurseries and Ref counting have great performance with low pauses , better than any GC or ref count. Examples are URC and RC-Immix .

  10. david says:

    I had a quick look at this paper:

    Is this what you mean?

    This paper appears to refer Metronome: ‘Bacon, Cheng, and Rajan’. It says the write barrier hurts throughput. Metronome achieves < 1ms pauses, with 70% BMU. The URC tests don’t get to 70% BMU until the pause is >100ms. Metronome needs enough spare memory for all allocations during a collect pass, and passes are much bigger then pauses.

    Or am I just misreading this somehow?

    I’m looking for memory management solutions for a safe language implemented in a micro controller (see a later post). This seems not too complicated to implement, so it’s of interest, although I’m also looking at
    ‘Using Page Residency to Balance Tradeoffs in Tracing Garbage Collection’

  11. Jon Harrop says:

    @Ben: “Hybrid GCs which use Nurseries and Ref counting have great performance with low pauses , better than any GC or ref count.”

    What gave you that impression?

  12. Jon Harrop says:

    @David Ireland: Great post. I couldn’t agree more.

  13. Ram says:

    Let’s create a problem so we can bicker over the best solution! ……. Wait! …… here’s an idea! Let’s not create the problem to begin with!

    This problem was created when newer languages started to have built in memory management. In C++ this isn’t even an issue *IF* you put some up front thought into your data structures. This is where reference counting shines. You only use it where you need it and therefore don’t create loops in the first place. If you need some back-pointer somewhere, just make it a weak (normal) pointer and in fact you can use weak pointers almost everywhere when passing stuff around. Strong pointers are only used to designate ownership. ….. But you say what about new/malloc? I’m at the mercy of the system there. Well not really. There are many ways to speed that up using fixed size object pools and free-lists so that memory allocation can be just popping something off a stack in most cases. Such code is very easy to write. The point is you can tailor your memory management for what you need to do. A bad solution isn’t just rammed down your throat. Sorry but in the real world if you take care of memory management yourself you are often going to get a big speed bonus and it really isn’t so hard to do. The real problem is too few people are taught how to do this properly.

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>