Merry Christmas: Toke — Tokyo Cabinet driver for Erlang
Tokyo Cabinet is a rather excellent key-value store, with the ability to write to disk in a sane way (i.e. not just repeatedly dumping the same data over and over again), operate in bounded memory, and go really fast. I like it a lot, and there’s a likelihood that there’ll be a RabbitMQ plugin fairly soon that’ll use Tokyo Cabinet to improve the new persister yet further. Toke is an Erlang linked-in driver that allows you to use Tokyo Cabinet from Erlang.
There is already a Tokyo Cabinet driver for Erlang, tcerl. However, I couldn’t make it work, even after fixing the C so that it compiles (I hit this bug). Inspecting the code, I get the feeling it’s bit-rotted: the Tokyo Cabinet API has moved on, and tcerl hasn’t kept up.
The other issue with tcerl is that it’s not a linked-in driver. Erlang allows two different types of drivers. The first are external C programs, which run in their own process and communicate with the VM over stdin/stdout. These are a bit safer because if they crash they don’t take out the Erlang VM, but they’re never going to be blazingly fast. Toke, on the other hand, is of the second type: a fully linked-in driver. It dynamically links with the Erlang VM, exists in the same address space and goes as fast as it possibly can (using the Erlang driver callbacks which avoid all copying of data passed from Erlang). My tests show that driving Tokyo Cabinet from Erlang via Toke is about three times slower than driving it natively through C (which is quite good: some googling suggests both the Ruby and Python bindings to Tokyo Cabinet are rather slower). Toke is also about twice as slow as the Erlang ets module, which is in-memory only.
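To give a feel for why the external-program route involves extra copying, here is a minimal sketch of the length-prefixed framing such a program conventionally speaks over stdin/stdout when the Erlang side opens the port with the {packet, 2} option (each message is preceded by a 2-byte big-endian length). This is not taken from tcerl or Toke; the function names frame_encode and frame_decode are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 2-byte big-endian length header followed by
 * the payload into `out`. Returns the number of bytes written, or 0 if the
 * payload is too long for a 2-byte header or does not fit in `out`. */
size_t frame_encode(const uint8_t *payload, size_t len,
                    uint8_t *out, size_t out_cap)
{
    assert(out != NULL);
    if (len > 0xFFFF || out_cap < len + 2)
        return 0;
    out[0] = (uint8_t)(len >> 8);
    out[1] = (uint8_t)(len & 0xFF);
    if (len > 0)
        memcpy(out + 2, payload, len);   /* the copy a linked-in driver can avoid */
    return len + 2;
}

/* Hypothetical helper: decode one frame from `in`. On success, `*payload`
 * points into `in`, `*len` holds the payload length, and the number of bytes
 * consumed is returned. Returns 0 if the buffer holds no complete frame. */
size_t frame_decode(const uint8_t *in, size_t in_len,
                    const uint8_t **payload, size_t *len)
{
    assert(in != NULL && payload != NULL && len != NULL);
    if (in_len < 2)
        return 0;
    size_t n = ((size_t)in[0] << 8) | in[1];
    if (in_len < n + 2)
        return 0;
    *payload = in + 2;
    *len = n;
    return n + 2;
}
```

An external driver would sit in a loop reading such frames from file descriptor 0 and writing replies to file descriptor 1, so every key and value crosses a pipe in each direction. A linked-in driver skips that hop entirely.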
Toke only implements the Tokyo Cabinet hash table (tchdb*) functions, and doesn’t even support all of those: I only wrapped exactly what I needed. The functions implemented are as follows (refer to the Tokyo Cabinet documentation for the details of each):
- toke_drv:new/1 — Set up the driver with a new TCHDB object.
- toke_drv:delete/1 — Destroy the driver’s TCHDB object.
- toke_drv:tune/5 — Tune the driver’s TCHDB object.
- toke_drv:set_cache/2 — Set the number of records to cache.
- toke_drv:set_xm_size/2 — Set the extra amount of memory mapped in.
- toke_drv:set_df_unit/2 — Set the steps between auto defrag.
- toke_drv:open/3 — Open a db.
- toke_drv:close/1 — Close an open db.
- toke_drv:insert/3 — Insert. If the key already exists, value is updated.
- toke_drv:insert_new/3 — Insert new. If the key already exists, the old value is silently kept.
- toke_drv:insert_concat/3 — Concatenate the supplied value with an existing value for this key.
- toke_drv:insert_async/3 — Asynchronously insert. If the key already exists, value is updated.
- toke_drv:delete/2 — Delete a key from the db.
- toke_drv:get/2 — Fetch a key from the db. Returns ‘not_found’ if the key isn’t there.
- toke_drv:fold/3 — Fold over every value in the db. This internally uses the iteration functions. It’s just wrapped up as a fold to make it appear more functional.
- toke_drv:stop/1 — Stop the driver and close the port.
You should be able to use Mercurial to clone it:
    # hg clone http://hg.opensource.lshift.net/toke/

Make sure you have Tokyo Cabinet installed, ideally from source. If you’re using a package, make sure you have the development headers available. If you do compile from source, make sure you refresh the dynamic linker cache (on Linux, typically by running ldconfig) so that your system picks up the new library once it’s installed. Then it should just be a case of running make. There’s also a make run target that starts up an Erlang shell with the paths set up correctly for testing Toke:
    toke# make run
    erl -pa ebin +K true +A30
    Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:30] [hipe] [kernel-poll:true]

    Eshell V5.7.4  (abort with ^G)
    1> toke_test:test().
    passed
    2>
Toke is licensed under the MPL. As ever, feedback is very welcome, as are patches!