
Garbage Collection in Erlang

The new persister that is being developed for RabbitMQ is nearing completion and is currently working its way through code review and QA. It’s being pretty thoroughly tested and generally stressed to see what could go wrong. One of the issues that we’ve come across in the past has to do with Erlang’s garbage collector: indeed there’s code in at least one area of RabbitMQ written in a specific (and non-obvious) way in order to work around issues with Erlang’s garbage collection of binary data.

We had noticed that the release notes for Erlang R13B03 mention improvements to the garbage collector, and today, testing with both R13B02 and R13B03, we saw substantial improvements with R13B03. The new persister is able to push partial queues out to disk: a queue can hold a mix of messages, some just in RAM, some just on disk, and some somewhere in between. This is separate from whether or not a message is marked persistent. The proportion pushed out to disk varies smoothly with the amount of RAM left available to Erlang: the idea is to avoid flooding the disk with enormous numbers of write requests, which could stall the queue and cause blockages elsewhere in RabbitMQ.
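
Purely as an illustration of that idea (this is not the actual persister code, and the names here are made up), the fraction of a queue kept in RAM might be derived from memory pressure along these lines:

    %% Illustrative sketch only -- not the real persister code.
    %% Returns what fraction of a queue's messages to keep in RAM,
    %% given current memory use and the configured limit.
    target_ram_fraction(MemUsed, MemLimit) when MemLimit > 0 ->
        Pressure = MemUsed / MemLimit,
        %% Keep everything in RAM below 50% usage, push everything out
        %% at 100%, and scale smoothly in between.
        Fraction = (1.0 - Pressure) / 0.5,
        if Fraction > 1.0 -> 1.0;
           Fraction < 0.0 -> 0.0;
           true           -> Fraction
        end.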

The test I’d written used the experimental Erlang client. It had one channel; it created a queue, consumed from the queue, set the QoS prefetch count to 10, and then went into a loop. In each iteration it published two 1KB messages, then received one message and acknowledged it. This way the queue would always grow, and memory would become fairly fragmented: the gap between the oldest and newest messages in the queue would increase steadily, since messages were being published at twice the rate they were being acknowledged. With no memory limit set, I saw the following (I manually killed the test once the queue had grown to just over 350,000 messages, which means 700,000 publishes and 350,000 acknowledgements):
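
For reference, here is roughly what such a test looks like written against the record-based API of the current Erlang AMQP client (the experimental client used at the time had a somewhat different API, and the queue name below is made up):

    %% Sketch of the test loop using the current Erlang AMQP client;
    %% the 2009 experimental client's API differed.
    -module(gc_test_sketch).
    -export([run/0]).
    -include_lib("amqp_client/include/amqp_client.hrl").

    run() ->
        {ok, Conn} = amqp_connection:start(#amqp_params_network{}),
        {ok, Ch} = amqp_connection:open_channel(Conn),
        Q = <<"gc-test">>,  %% hypothetical queue name
        #'queue.declare_ok'{} =
            amqp_channel:call(Ch, #'queue.declare'{queue = Q}),
        #'basic.qos_ok'{} =
            amqp_channel:call(Ch, #'basic.qos'{prefetch_count = 10}),
        amqp_channel:subscribe(Ch, #'basic.consume'{queue = Q}, self()),
        receive #'basic.consume_ok'{} -> ok end,
        Payload = list_to_binary(lists:duplicate(1024, $x)),  %% ~1KB body
        loop(Ch, Q, Payload).

    loop(Ch, Q, Payload) ->
        %% Each iteration: publish two messages, receive one, ack it,
        %% so the queue grows by one message per iteration.
        Publish = #'basic.publish'{routing_key = Q},
        amqp_channel:cast(Ch, Publish, #amqp_msg{payload = Payload}),
        amqp_channel:cast(Ch, Publish, #amqp_msg{payload = Payload}),
        receive
            {#'basic.deliver'{delivery_tag = Tag}, #amqp_msg{}} ->
                amqp_channel:cast(Ch, #'basic.ack'{delivery_tag = Tag})
        end,
        loop(Ch, Q, Payload).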

Memory usage with no memory limit set

Note that with R13B03 the garbage collector is much more active, and in general memory usage is more fine-grained. In this test all the messages stayed in RAM; no messages were pushed out to disk. “Flat size” refers to the value returned by passing the queue state through erts_debug:flat_size/1, which reports the amount of memory used by the data structure.
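
For example, in the shell (flat_size/1 reports a size in words, so multiplying by the word size gives bytes; the figure below is from a 64-bit VM, where a list of a thousand small integers takes two words per cons cell):

    1> erts_debug:flat_size(lists:seq(1, 1000)) * erlang:system_info(wordsize).
    16000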

Next, I imposed a limit of about 200MB and ran the same test. With R13B02 it got stuck after just over 260,000 messages: it was no longer able to reclaim any further space, so flow control kicked in and stopped the publisher. Game over. With R13B03 it soldiered merrily on; I ended up manually killing it somewhere past the 1 million message mark as I was getting bored. It’s also clear to see how, with R13B03, the queue successfully kicks down to pushing all the messages out to disk (which is why the size of the state suddenly gets very small; the memory growth from there on is due to an ets table). That’s certainly still possible with R13B02, and I have seen it happen, but there’s much greater risk, as seen here, of it getting stuck before that point.

Memory usage with 200MB limit

In short, the garbage collector in R13B03 seems to be a solid improvement. Even if you’re not using the experimental new persister, I suspect you’ll gain from upgrading to R13B03. And yes, that really is 1 million 1KB messages successfully sent into a queue using under 200MB of RAM.

by matthew on 01/12/09

  1. Stephen Day
    on 01/12/09 at 9:27 am

    Great Work, Matthew!

  2. James Seigel
    on 16/12/09 at 4:04 pm

    Hello!

    I was wondering what this statement “Next, I imposed a limit of about 200MB and ran the same test” means. Is there a way in RabbitMQ to put hard limits on memory consumption?

    Cheers
    James

  3. @James,

    In Rabbit v1.7.0 and before, there is the memsup mechanism, which can work, though it has some flaws. http://www.rabbitmq.com/extensions.html#memsup

    For the next release of Rabbit (even if it doesn’t contain the new persister – v1.7.1), there will be a new memory control mechanism which is much better at detecting and constraining Rabbit’s memory usage.
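
    For reference, the new mechanism is driven by a single high watermark expressed as a fraction of installed RAM; in later releases it is set in rabbitmq.config as vm_memory_high_watermark (0.4 of installed RAM being the eventual default), for example:

        [{rabbit, [{vm_memory_high_watermark, 0.4}]}].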

  4. Hello,

    Can we start using the new persister implementation now?

    Thanx,
    – baliga

 
 

