Wednesday, December 15, 2010

The NZSTM Software transactional memory

Another look at NZTM: Nonblocking Zero-indirection Transactional
Memory (pdf)
and NZTM appendix.

Last year around this time I was planning to write a blog post about NZSTM since I had just read the paper on it at SPAA 2009. Then work kicked in and I lost the text. I just realized the pictures I made up were still on the drive.

I'm even busier now, but I figure it took me a while to make these pictures, so even a quick tour through them should be worth something even if software transactional memory isn't as hot these days.

There are a lot of STMs out there, but I figure this is the one to start with, it's practical in many ways and has some really neat techniques.

So you've got a bunch of objects that you want to be managed by NZSTM. You add 3 fields to each object. Owner type, owner pointer, and a backup pointer. There are only going to be a two owner types so you can store the type and the pointer in a single word, this will be important since compare and swap (CAS) only works on word sized items.

Assume sequential consistency here, no weird memory models.

Objects will be acquired by updating the owner field (o-type,o-pointer). I'll talk about the backup pointer and the owner type more later.



In the normal case, the owner of an object is going to be a transaction object. You create a transaction object when you start a transaction. These transactions aren't nested so you can assume each thread is only going to be using one at a time.

The transaction object just stores the state of the transaction. But also, importantly, it occupies a unique position in memory. This is a picture of the states that the transaction can be in and how you can move between them.



If we have a transaction and the object does not have an owner, we can acquire it by CASing in the pointer to our transaction into the owner field of the object.



This puts the object into an acquired state.



To commit a transaction in NZSTM, you use CAS to change its state as shown above. CAS is used because the owner of a transaction can commit it, but other threads may try to abort the transaction. A commit only involves the transaction object, you don't have to go back and access all of the objects involved.

To acquire an object that has already been used in a transaction, you have to


  1. Get the owner of the object, at this point, it will always be a transaction
  2. Get the transaction status
  3. If the transaction is committed or aborted, you can CAS your transaction in to acquire the object.


Once the transaction is committed or aborted, it cannot move to any other state and will not be used to access any of the objects it owns anymore, so we can safely take ownership of the object from it.



With this set of operations on objects, you can make a blocking STM, which you could just call locks with a different interface. Locking STMs are usually faster than non-blocking STMs because they avoid copying and avoid pointer chasing. The nice thing about NZSTM is that it acts just like a locking STM in the normal case so you get the performance benefits, but also gives you an option to abort another transaction if needed.

How does that option work?

To add the option, we need to add code to back up objects before you write to them. This is done by

  1. Making a copy of the object
  2. Setting the backup pointer to point to the object (plain write, no CAS)


The important part here is that all the bits are written to the copy before you set the backup pointer, otherwise other threads may see a partial copy.




To abort a transaction, we can CAS its state from ACTIVE to ABORT REQUEST. Only the owning thread of a transaction can move the transaction from ABORT REQUEST to ABORTED. If a thread checks its transaction and realizes that it has been requested to ABORT, it can move into the ABORTED stage which lets other threads know that it will no longer write to the objects it owns using that transaction.

If you want to acquire an object that is owned by an ABORTED transaction, you:

  1. Get the owner transaction object and read that it is ABORTED
  2. CAS in your own transaction object to kick it out
  3. If the CAS was successful, copy the backup data back to the object to undo any changes the ABORTED transaction may have done. Because the transaction is ABORTED, you know that there will be no more writes related to that transaction on the object, so for instance, something wouldn't suddenly change after you copied the backup data back over.




So far so good, but this only works if the thread that owns the transaction is able to realize that it has an ABORT REQUESTED and move to the ABORTED stage. To be able to work on objects owned by a thread that suddenly dies or just goes to sleep forever, we have to add in pointer indirection like other STMs.

The cool part is that this indirection only needs to be used as a last resort, so if it doesn't happen very often, most of the time the only price you pay is to make an extra copy of the objects you are writing to. But the actual reading and writing is done directly to the object, so they stay the same speed.

This is where the owner type field comes in. In the normal case, the owner of an object will be a transaction, so the type will be transaction. In the normal case objects are directly written and read to.

In the new case, the owner of the object will be a locator, the type will be a locator, and we will write and read from a copy of the object that we point to.

Using a locator, we can acquire an object owned by an transaction in ABORT REQUEST state without waiting for it to move to ABORTED. This picture calls it ABORT REQUEST "ABORTING" I guess I hadn't changed it.



We come across this object, which is in the ABORT REQUEST state, we can't acquire it in the normal way, so first we create a locator. The problem is that until the owner is ABORTED we can't do anything with the data actually in the object because it might be written to at any time.

The backup copy is safe though, so we:

  1. (aborting transaction) Get the pointer to the ABORT REQUEST transaction
  2. (backup) Get a pointer to the backup copy
  3. (updated copy) Copy the backup copy to a local copy to modify and use
  4. (owner)Get a pointer to our transaction


And put all of these things in a data structure called a locator:



After the locator is created, we CAS it in to the owner field to acquire the object. If the CAS was successful, no other thread managed to do this before us.



This puts the object in what is called an inflated state. An object is inflated if its owner is a locator type.




To read and write an inflated object, you have to go to its locator, get the update copy, and read and write that. It's slower but it lets you work on the object when a locking STM would not allow you to do anything.

I'm out of pictures, so it'll have to be text only for the end. Now that there are inflated objects we need be able to acquire an inflated object with a locator pointing to a committed transaction, and acquire an inflated object with a locator pointing to an abort requested or aborted transaction.

My memory is a bit off in this case, but the steps are mostly the same in all these cases. The object is inflated so it will be owned by a locator, call it the enemy locator.

  1. (aborting) get the aborting transaction of the enemy locator
  2. (owner) get a pointer to our transaction
  3. If the owner transaction of the enemy locator is COMMITTED:

    1. (updated) get a copy of the enemy locator update data
    2. (backup) point this to the enemy locator update data

  4. If the owner transaction of the enemy locator is ABORTED or ABORT REQUESTED:

    1. (updated) get a copy of the enemy locator backup
    2. (backup) point this to the enemy locator backup data



Put all these things into a locator, and CAS it in to the owner field of the object. If the CAS succeeds then you own the inflated object. The enemy can continue to modify its local copy of the object, but it will not affect anything else.

Finally, to finish it off, inflated objects are slower so we need to be able to deflate objects to their faster deflated forms. This can only be done if the aborting transaction, the one that still has direct access to the object, has moved the transaction to the ABORTED state. To do this, roughly the idea is:


  1. Acquire the object in inflated mode, meaning you will have to create a locator.
  2. Check the aborting transaction of the locator and see that it is ABORTED
  3. CAS the backup pointer of the locator to the backup pointer of the object
  4. CAS your transaction in with type transaction to the object owner, replacing your locator as the owner.
  5. Copy the backup data to the object.


There might be a modification to these steps to make sure the backup pointer is updated properly. The main idea is that you might become ABORT REQUESTED before copying the backup data to the object, but that is ok because any other transactions will only look at the backup data at that point anyway.

And since you can only deflate by CASing out your own locator, it makes sure that no other transaction acquired the object in between, or the CAS would fail.

The object will now point to a transaction type, and so it won't be inflated anymore.

That completes the system, a non blocking transactional memory with low overhead and direct object access in the common case with low contention.

One of the key ideas of NZSTM is that the pointer structure combined with the data in the structure encodes a state that an object can be in. Every form of acquiring an object uses a sequence of mini transactions involving one CAS to move the object into another state. At any of these states there is a sequence of mini transactions that can get you to an acquired state, that makes NZSTM nonblocking.

I was pretty loose about how CAS was used. Usually you read the field to get the compare value, then possibly do some other reads to create a swap value, then swap it in.

How do you reason about systems like NZSTM? My opinion is that with some work it can become fairly straightforward to write decent detailed proofs of correctness for them that cover the subtle logic involved and may even be able to involve weak memory models. Fully formal proofs may take some further time after that. I went through the process of making these kinds of proofs in my research and I really think it's just a matter of interest and time.

Are people interested in this kind of reasoning? There are definitely some groups doing a lot of good research in this area, but outside of it? I'm really not sure.