Merry Christmas: Toke — Tokyo Cabinet driver for Erlang

By: on December 21, 2009

Tokyo Cabinet is a rather excellent key-value store, with the ability to write to disk in a sane way (i.e. not just repeatedly dumping the same data over and over again), operate in bounded memory, and go really fast. I like it a lot, and there’s a likelihood that there’ll be a RabbitMQ plugin fairly soon that’ll use Tokyo Cabinet to improve the new persister yet further. Toke is an Erlang linked-in driver that allows you to use Tokyo Cabinet from Erlang.

There is already a Tokyo Cabinet driver for Erlang, tcerl, but I couldn’t make it work, even after fixing the C so that it compiles (I hit this bug). Inspecting the code, I get the feeling it’s bit-rotted: the Tokyo Cabinet API has moved on, and tcerl hasn’t kept up.

The other issue with tcerl is that it’s not a linked-in driver. Erlang allows two different types of drivers. The first are external C programs: these have a main() and run in their own process, and communication is done over stdin/stdout. These are a bit safer, because if they crash they don’t take out the Erlang VM, but they’re never going to be blazingly fast. The second are linked-in drivers, and Toke is one of these. It dynamically links with the Erlang VM, exists in the same address space and goes as fast as it possibly can (using the Erlang driver callbacks which avoid all copying of data passed from Erlang).

My tests show that driving Tokyo Cabinet from Erlang via Toke is about three times slower than driving it natively through C (which is quite good: some googling suggests both the Ruby and Python bindings to Tokyo Cabinet are rather slower). Toke is also about twice as slow as the Erlang ets module, which is in-memory only.
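
The two driver styles can be sketched from the Erlang side roughly as follows. This is a minimal illustration, not Toke's code: the module name port_sketch is made up, "cat" is only a stand-in for a real external port program (so this assumes a Unix-like system), and the Dir and Name arguments to open_linked_in/2 are hypothetical placeholders for wherever a driver's shared object happens to live.

```erlang
-module(port_sketch).
-export([echo_via_external/1, open_linked_in/2]).

%% External program: the VM starts a separate OS process and talks to it
%% over its stdin/stdout. "cat" is a stand-in that echoes its input back.
echo_via_external(Data) when is_binary(Data) ->
    Port = open_port({spawn, "cat"}, [binary, stream]),
    true = port_command(Port, Data),
    Reply = receive
                {Port, {data, Bin}} -> Bin
            after 1000 ->
                timeout
            end,
    true = port_close(Port),
    Reply.

%% Linked-in driver: the shared object is loaded into the VM's own address
%% space first, and only then can a port be opened on it.
%% Dir and Name are hypothetical; nothing here is specific to Toke.
open_linked_in(Dir, Name) when is_list(Dir), is_list(Name) ->
    case erl_ddll:load_driver(Dir, Name) of
        ok ->
            {ok, open_port({spawn_driver, Name}, [binary])};
        {error, Reason} ->
            {error, erl_ddll:format_error(Reason)}
    end.
```

In stream mode a large reply may arrive as several messages, so the single receive above is only adequate for small payloads. The trade-off is visible in the shape of the code: a crash in the external program just closes the port, whereas a crash inside a loaded driver takes the VM with it.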

Toke only implements the Tokyo Cabinet hash table (tchdb*) functions, and doesn’t even support all of those: I only wrapped exactly what I needed. The functions implemented are as follows (refer to the Tokyo Cabinet documentation for further details of each):

  • toke_drv:new/1 — Set up the driver with a new TCHDB object.
  • toke_drv:delete/1 — Destroy the driver’s TCHDB object.
  • toke_drv:tune/5 — Tune the driver’s TCHDB object.
  • toke_drv:set_cache/2 — Set the number of records to cache.
  • toke_drv:set_xm_size/2 — Set the extra amount of memory mapped in.
  • toke_drv:set_df_unit/2 — Set the steps between auto defrag.
  • toke_drv:open/3 — Open a db.
  • toke_drv:close/1 — Close an open db.
  • toke_drv:insert/3 — Insert. If the key already exists, value is updated.
  • toke_drv:insert_new/3 — Insert new. If the key already exists, the old value is silently kept.
  • toke_drv:insert_concat/3 — Concatenate the supplied value with an existing value for this key.
  • toke_drv:insert_async/3 — Asynchronously insert. If the key already exists, value is updated.
  • toke_drv:delete/2 — Delete a key from the db.
  • toke_drv:get/2 — Fetch the value for a key from the db. Returns ‘not_found’ if the key is not present.
  • toke_drv:fold/3 — Fold over every value in the db. This internally uses the iteration functions. It’s just wrapped up as a fold to make it appear more functional.
  • toke_drv:stop/1 — Stop the driver and close the port.
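
For readers more familiar with ets, the insert, insert_new, get and fold behaviours listed above have close counterparts there. The sketch below uses only ets, not Toke, because the argument lists of the toke_drv functions are not spelled out in this post; the module name kv_semantics and the table name are illustrative. Two differences are worth keeping in mind: ets:insert_new/2 reports a clash by returning false, whereas insert_new above is described as keeping the old value silently, and ets:lookup/2 returns an empty list for a missing key rather than not_found.

```erlang
-module(kv_semantics).
-export([demo/0]).

%% Illustrates overwrite vs. keep-existing inserts, a missing-key lookup,
%% and a fold over every record, using the in-memory ets module.
demo() ->
    T = ets:new(kv_demo, [set]),
    true  = ets:insert(T, {<<"k">>, <<"v1">>}),
    true  = ets:insert(T, {<<"k">>, <<"v2">>}),        % existing value is updated
    false = ets:insert_new(T, {<<"k">>, <<"v3">>}),    % existing value is kept
    true  = ets:insert_new(T, {<<"other">>, <<"x">>}), % new key, so it is stored
    [{_, Val}] = ets:lookup(T, <<"k">>),
    Missing = ets:lookup(T, <<"absent">>),             % no such key
    %% Count the records by folding over the whole table.
    Count = ets:foldl(fun({_Key, _Value}, N) -> N + 1 end, 0, T),
    true = ets:delete(T),
    {Val, Missing, Count}.
```

Calling kv_semantics:demo() returns the surviving value for the repeated key, the result of the missing-key lookup, and the record count.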

You should be able to use Mercurial to clone it:

# hg clone http://hg.opensource.lshift.net/toke/

Make sure you have Tokyo Cabinet installed, ideally from source. If you’re using a package, make sure you have the development headers available. If you do compile from source, run ldconfig once it’s installed so that your system picks up the new library. Then it should just be a case of running make. There’s also a make run target that starts up an Erlang shell with the paths set up correctly for testing Toke:

toke# make run
erl -pa ebin +K true +A30
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:30] [hipe] [kernel-poll:true]

Eshell V5.7.4  (abort with ^G)
1> toke_test:test().
passed
2>

Toke is licensed under the MPL. As ever, feedback is very welcome, as are patches!


8 Comments

  1. Bhasker V Kode says:

    Will try this out. You should check out medici as well, a tokyo tyrant client.

    Have you had a chance to use erl_nif btw?

    ~B

  2. matthew says:

    @Bhasker, thanks for the pointer to Medici. I was vaguely aware of that but it definitely warrants a deeper look. The latest commits to toke improve the C somewhat, and the fold is fixed (before it wasn’t sending back the key!).

    I’ve not looked at erl_nif. It looks interesting, but obviously something that’s that new and experimental we can’t really use in Rabbit. We still support back to R12 if not earlier (I can never quite remember!)

  3. @kuenishi says:

    What if you need more than 16 tables? Usecase for Message Queue seems to need more than that…

    I’ve written both a linked-in driver and a NIF version (see my blog post in the link URL). The linked-in driver version is a bit slow for larger numbers of database files.
    A linked-in driver is good for throughput-oriented operation, while a NIF is far better for a latency-oriented implementation.

  4. matthew says:

    @kuenishi, Many thanks for your comment and your blog post which I have read in detail. I must admit I was totally unaware of yatce, which is why it’s not mentioned in my post.

    I do take objection to your comment in your blog post: “It’s a good example case that not expressing nor publishing in English leads products/opinions shall be ignored, and bad practice”. The simple fact is that I was not aware of yatce, and indeed, googling for “erlang tokyo cabinet” does not return any result for yatce until the 3rd page. If I had known about yatce in advance, then I would have tried to use it in preference to writing Toke: we have absolutely no desire to reinvent the wheel unnecessarily.

    The rest of your blog post I think is very interesting. I’ve not written a driver before, and your comments regarding the use of port_control are interesting and something I’ll look into further.

    We are definitely developing Toke’s API solely on a by need basis: we are not looking to do a full blown TC interface. Thus for our purposes, we don’t have any objection to the one-table-per-port limitation. I understand that this is going to limit throughput eventually when using multiple tables, but that’s not something we need to address right now. I don’t however understand your question about 16 tables. I’ve not come across limits on Erlang drivers and ports but maybe there’s something I’ve missed?

  5. matthew says:

    @kuenishi, I’ve just spent some time reworking Toke so that it uses port_control rather than port_command for some actions (eg get) and the associated changes to the driver. If anything, it’s slightly slower than the current code (as well as being slightly more messy). It would seem the context switch is not hurting me at all. Even for tests which I would have thought would most expose control as being faster (eg lots of gets), it’s in fact no faster at all. Thus I’m going to leave the code as is.

  6. smallfang says:

    hello,
    Today I installed the toke (Tokyo Cabinet driver for Erlang) library, but when I run toke_test:test() in the Erlang shell, I get an error:
    ** exception error: bad argument
    in function open_port/2
    called as open_port({spawn_driver,libtoke},[binary,stream]).
    Please help me, thanks!
