Transactions Everywhere

September 26th, 2005 matthias

In an interview about the future of Visual Basic 9, Erik Meijer makes the following prediction:

We will use transactions everywhere; in-memory, database, XML, everywhere. We will have one way for dealing with concurrency.

This statement is made in the context of talking about Software Transactional Memory, and the concurrency control mechanisms one can build on top of it. When I first read about STMs, this is exactly the direction I was thinking in - I want to be able to have transactions that span memory, in more than one system, several databases and other transaction-aware resources. And I want to be able to use these transactions for concurrency control.

This, of course, is nothing other than Distributed Transactions - an area that has been researched for decades and for which several commercially supported Distributed Transaction Protocols (DTPs) exist, e.g. XA, CORBA OTS, JTS, and Jini, plus a plethora of Web Services related specs.

The big question is: Can STMs participate in a DTP?

The first hurdle is that STMs used for concurrency control, though not plain STMs, requires nested transactions in order to implement the ‘orelse’ choice construct. Nested transactions are not supported very widely, and the main DTP standard, XA, does not include them. Other standards like CORBA OTS, Jini and JTS do though, but that is of little help when most of the potential DTP participants, notably the most popular databases, lack such support.

The second hurdle is that STMs’ concurrency control requires a mechanism to retry transactions when something has changed. This is something that none of the existing DTPs support. However, I think the addition of just a single operation to the transaction participant API is sufficient to meet the requirement. The operation instructs the participant to abort a transaction and notify the transaction manager when some of the underlying data has been modified.

Retro-fitting this operation onto existing DTP implementations is hard if one wants to be precise, i.e. ensure that the notification is only triggered when the relevant data has really changed. However, such precision is only required for optimal performance. There is no harm in making notification more coarse-grained. In fact, this is exploited by some STM implementations in order to reduce the amount of book keeping information. Applying this technique to, say, a legacy database, we could wrap the entire database interface, and trigger notifications whenever a transaction has committed. That is the equivalent of having one ownership record for an entire STM.

The third, and final hurdle is guaranteeing progress. STMs employ an optimistic concurrency model in which transaction execute without acquiring any locks, and the final commit only fails when any data read by the transaction has been modified by the commit of another transaction. This guarantees progress in a single-machine setting. DTPs, however, employ 2PC, in which the decision on whether a transaction is committable is separated from the instruction to actually commit the transaction. In a naive port of STMs to the DTP world this can destroy the progress guarantee. The problem is that once an STM has voted to commit a transaction, it cannot vote to commit another transaction that has read from some of the memory locations modified by the first until the first transaction has either committed or aborted. It therefore must either vote to abort the second transaction, which may result in lack of progress, or delay the decision, which may result in deadlock.

In conclusion, the transactional world envisaged by Meijer and others may well be achievable, but clearly more research is needed before we can tell for sure.

Entry Filed under: Technology

2 Comments Add your own

  • 1. matthias  |  September 29th, 2005 at 4:35 pm

    There is one other, rather important question Are STMs the right abstraction? The literature suggests that they really only work well in shared memory architectures, and when there is little contention.

  • 2. matthias  |  October 11th, 2005 at 10:37 am

    While STM operations are cheap compared to other methods for controlling access to shared data in a multi-threaded system, they are still substantially more expensive than straightforward memory access. However, only shared, mutable storage needs STM semantics and incur the cost of STM operations.

    It thus follows that STMs are only suitable as a memory model for languages where mutation and sharing is rare and static program analysis can easily determine where it takes place.

    This makes languages like Haskell - with its pure functional characteristics guaranteed by the type system - ideal candidates for STM implementations.

Leave a Comment

Required

Required, hidden

Some HTML allowed:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>

Trackback this post  |  Subscribe to the comments via RSS Feed

Calendar

September 2005
M T W T F S S
« Aug   Oct »
 1234
567891011
12131415161718
19202122232425
2627282930  

Most Recent Posts