Tuesday 25 August 2009

Persistence on HornetQ

Persistence on HornetQ is really fast. It is so fast that you would think you are sending non-persistent messages by accident.

Instead of using a heavyweight database that provides a lot of features HornetQ doesn't need, we used something faster and just as reliable.

We have written our own circular file Journal that uses either Linux libaio or Java NIO.

Linux libaio is a library that works at the kernel level. We submit writes by sending a DMA (Direct Memory Access) buffer and a callback interface. The kernel works directly with the buffer, saving the time of copying data between Java and the disk controller. When the disk is done with the write, the callback is invoked, and we are sure the data is persisted on the disk.

You may ask, what? At the kernel level?! At the controller level?

Yeah... that's the beauty of libaio. It is a thin wrapper around the kernel's native asynchronous I/O system calls.

BTW: If you are not a geek who loves programming like I do, you can stop reading now :-) From here on, I will dig a little into how it works:

The Journal:

The Journal has a set of pre-allocated files. We keep each file's size as close as possible to what fits on a disk cylinder. We found that 10 MiB works well for us, but the best value could be different on other systems.
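To make the pre-allocation step concrete, here is a minimal Java sketch that creates a fixed set of equally sized, zero-filled journal files up front, so later appends never have to grow a file. The class name `JournalPreallocator` and the `journal-<n>.dat` naming are hypothetical; only the 10 MiB figure comes from the text above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pre-allocates a fixed set of journal files of equal size.
final class JournalPreallocator {
    static final int FILE_SIZE = 10 * 1024 * 1024; // 10 MiB, the value mentioned in the post

    // Creates `count` files named journal-<n>.dat in `dir`, each filled with zeros
    // up to `fileSize` bytes.
    static List<Path> preallocate(Path dir, int count, int fileSize) {
        List<Path> files = new ArrayList<>();
        ByteBuffer zeros = ByteBuffer.allocateDirect(64 * 1024); // direct buffers start zeroed
        try {
            Files.createDirectories(dir);
            for (int i = 0; i < count; i++) {
                Path file = dir.resolve("journal-" + i + ".dat");
                try (FileChannel ch = FileChannel.open(file,
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                    long written = 0;
                    while (written < fileSize) {
                        zeros.clear();
                        zeros.limit((int) Math.min(zeros.capacity(), fileSize - written));
                        written += ch.write(zeros, written); // positional write, never seeks back
                    }
                    ch.force(true); // make sure the allocation itself reaches the disk
                }
                files.add(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```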

The journal is append-only. We always append to the file currently in use from the pre-allocated set. This way we avoid mechanical head movements and get as much performance as possible out of the disk controller.
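A toy in-memory model of the append-only rule: records only ever go to the end of the current file, and when a record does not fit, the journal moves on to the next pre-allocated file, wrapping around circularly. `AppendOnlyJournal` is a hypothetical name, and unlike the real journal this sketch reuses a file without checking whether it still holds live records.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Simplified in-memory model (hypothetical, not HornetQ's code) of an
// append-only journal over a fixed set of pre-allocated files.
final class AppendOnlyJournal {
    private final List<ByteBuffer> files = new ArrayList<>();
    private int current = 0; // index of the file currently being appended to

    AppendOnlyJournal(int fileCount, int fileSize) {
        for (int i = 0; i < fileCount; i++) files.add(ByteBuffer.allocate(fileSize));
    }

    // Appends a record to the end of the current file. When it doesn't fit, moves
    // on to the next pre-allocated file (circularly). Returns the file index used.
    int append(byte[] record) {
        ByteBuffer file = files.get(current);
        if (record.length > file.capacity()) {
            throw new IllegalArgumentException("record larger than a journal file");
        }
        if (file.remaining() < record.length) {
            current = (current + 1) % files.size();
            file = files.get(current);
            file.clear(); // simplification: the real journal only reuses files with no live records
        }
        file.put(record); // strictly sequential: never write behind the current position
        return current;
    }
}
```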

Deletes are also appended as records: we add a delete record at the end of the file.

We also keep a reference count of the live records in each file, so when a file is completely clean (all of its records deleted), it is ready for reuse.
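Delete records and per-file reference counting can be sketched the same way. This is a simplified model under stated assumptions (hypothetical `RefCountingJournal`, a fixed number of records per file); a real journal must also account for delete records that point into other files before reusing a file, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: deletes are appended as records, and each file keeps a
// count of its live (not yet deleted) records so clean files can be reused.
final class RefCountingJournal {
    private final int recordsPerFile;
    private final List<Integer> liveCount = new ArrayList<>();        // per-file live records
    private final Map<Long, Integer> fileOfRecord = new HashMap<>();  // record id -> file index
    private int currentFile = 0;
    private int recordsInCurrent = 0;

    RefCountingJournal(int recordsPerFile) {
        this.recordsPerFile = recordsPerFile;
        liveCount.add(0);
    }

    // Every operation, add or delete, is just another record at the end of the journal.
    private void appendRecord() {
        if (recordsInCurrent == recordsPerFile) {
            currentFile++;
            liveCount.add(0);
            recordsInCurrent = 0;
        }
        recordsInCurrent++;
    }

    void add(long id) {
        appendRecord();
        fileOfRecord.put(id, currentFile);
        liveCount.set(currentFile, liveCount.get(currentFile) + 1);
    }

    void delete(long id) {
        Integer file = fileOfRecord.remove(id);
        if (file == null) throw new IllegalArgumentException("unknown record " + id);
        appendRecord(); // the delete marker is appended; the original record is left untouched
        liveCount.set(file, liveCount.get(file) - 1);
    }

    // A file can be reused once none of its records are live (and it isn't the current file).
    boolean isReusable(int file) {
        return file != currentFile && liveCount.get(file) == 0;
    }
}
```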

The journal is also transactional. We have very nice transactional control: a commit record is only taken into account *if* the entire transaction is on the disk. That gives us ACID guarantees.
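One way to implement the "a commit only counts if the whole transaction is on disk" rule is to have the commit record state how many records the transaction wrote, and to check that count when the journal is loaded back. The record-count scheme and the `TxRecord`/`TxLoader` names are assumptions for illustration, not a description of HornetQ's on-disk format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record types for a transactional journal (illustration only).
final class TxRecord {
    enum Kind { ADD, COMMIT }

    final Kind kind;
    final long txId;
    final long recordId;
    final int expectedCount; // only meaningful for COMMIT: how many records the tx wrote

    TxRecord(Kind kind, long txId, long recordId, int expectedCount) {
        this.kind = kind;
        this.txId = txId;
        this.recordId = recordId;
        this.expectedCount = expectedCount;
    }

    static TxRecord add(long txId, long recordId) { return new TxRecord(Kind.ADD, txId, recordId, 0); }
    static TxRecord commit(long txId, int count) { return new TxRecord(Kind.COMMIT, txId, -1, count); }
}

final class TxLoader {
    // Replays records read back from disk. A transaction's records become visible
    // only if its commit record was found AND every record it announces was found,
    // so a crash halfway through writing a transaction leaves no partial result.
    static List<Long> load(List<TxRecord> onDisk) {
        Map<Long, List<Long>> pending = new HashMap<>();
        List<Long> committed = new ArrayList<>();
        for (TxRecord r : onDisk) {
            if (r.kind == TxRecord.Kind.ADD) {
                pending.computeIfAbsent(r.txId, k -> new ArrayList<>()).add(r.recordId);
            } else {
                List<Long> recs = pending.remove(r.txId);
                if (recs != null && recs.size() == r.expectedCount) {
                    committed.addAll(recs);
                }
                // otherwise the commit record is on disk but the transaction is incomplete: ignored
            }
        }
        return committed; // transactions left in `pending` never committed and are ignored
    }
}
```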

SequentialFile interface:

We abstract disk access through this interface. There are two implementations: NIO and AIO. You can select which implementation to use in our configuration (see the User's Manual).
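The abstraction and the configuration-driven choice might look roughly like this. The method names on `SequentialFile`, the `JournalType` enum, and the native library name `hornetqAIO` are all hypothetical placeholders; check the User's Manual for the real configuration option.

```java
import java.nio.ByteBuffer;

// Hypothetical shape of the abstraction; method names are illustrative, not HornetQ's API.
interface SequentialFile {
    void write(ByteBuffer bytes, long position);
    void sync();
    void close();
}

enum JournalType { NIO, AIO }

final class SequentialFileSelector {
    // Uses the implementation requested in the configuration, falling back to NIO
    // when AIO was requested but the native layer is not usable on this machine.
    static JournalType choose(JournalType configured, boolean nativeAioAvailable) {
        if (configured == JournalType.AIO && !nativeAioAvailable) return JournalType.NIO;
        return configured;
    }

    // Hypothetical availability check: libaio only exists on Linux, and the JNI
    // library has to load. "hornetqAIO" is a placeholder, not the real library name.
    static boolean nativeAioAvailable() {
        if (!System.getProperty("os.name").toLowerCase().contains("linux")) return false;
        try {
            System.loadLibrary("hornetqAIO");
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }
}
```

Keeping the fallback in one place means the rest of the journal only ever sees the interface, whichever implementation was picked.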

NIO:

This is already a very fast approach. We work at the file level, avoiding disk movements. If you don't have Linux or don't have libaio installed on your system, we fall back to this 100% Java implementation.
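For the NIO side, a minimal pure-Java sequential file needs little more than positional writes on a `FileChannel` and a blocking `force` for syncs. `NioSequentialFile` is a hypothetical name; this sketch is not HornetQ's implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch of a pure-Java (NIO) sequential file; the class name is hypothetical.
final class NioSequentialFile implements AutoCloseable {
    private final FileChannel channel;

    NioSequentialFile(Path path) {
        try {
            channel = FileChannel.open(path, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Positional write, looping until the whole buffer is written. Appends use
    // increasing positions, so the disk head keeps moving forward.
    void write(ByteBuffer bytes, long position) {
        try {
            while (bytes.hasRemaining()) position += channel.write(bytes, position);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Blocking sync: returns once the OS reports the file content (not metadata) is stored.
    void sync() {
        try {
            channel.force(false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    ByteBuffer read(long position, int size) {
        ByteBuffer buf = ByteBuffer.allocate(size);
        try {
            while (buf.hasRemaining()) {
                if (channel.read(buf, position + buf.position()) < 0) break; // end of file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buf.flip();
        return buf;
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```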

AIO (Linux libaio):

We have written a small JNI layer that "talks" to libaio on Linux. The basic write method in Java has this signature:

write(int position, int size, ByteBuffer directBuffer, AIOCallback callback) (more details in the javadoc)
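libaio itself is only reachable through JNI, but the shape of this write method (submit a buffer, return immediately, get called back on completion) can be sketched in pure Java with `AsynchronousFileChannel`, which arrived in Java 7, after this post was written. This is a swapped-in technique, not libaio, and the `done()`/`onError(...)` methods on `AIOCallback` are an assumed shape; see the javadoc for the real interface. Note that completion here only means the OS accepted the bytes; by itself it does not guarantee they reached the disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Assumed shape of the callback (illustrative; see the HornetQ javadoc for the real one).
interface AIOCallback {
    void done();
    void onError(int errorCode, String errorMessage);
}

// Pure-Java stand-in (NOT libaio) with the same "submit now, called back later" shape.
final class AsyncWriter implements AutoCloseable {
    private final AsynchronousFileChannel channel;

    AsyncWriter(Path path) {
        try {
            channel = AsynchronousFileChannel.open(path,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void write(int position, int size, ByteBuffer directBuffer, AIOCallback callback) {
        if (!directBuffer.isDirect()) throw new IllegalArgumentException("a direct buffer is required");
        ByteBuffer slice = directBuffer.duplicate();
        slice.limit(slice.position() + size);
        // Returns immediately; the handler runs on a channel thread when the write completes.
        channel.write(slice, position, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer written, Void attachment) {
                if (written == size) callback.done();
                else callback.onError(-1, "short write: " + written + " of " + size);
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                callback.onError(-1, String.valueOf(exc));
            }
        });
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```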

The buffer here is sent directly to a libaio method called aio_write. (look at aio_write man page).

A separate thread polls completion events out of libaio. As soon as the data is on the disk, the JNI layer executes the callback method.
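The polling side can be modeled as a single dedicated thread that blocks waiting for completion events and runs each event's callback. In this sketch a `BlockingQueue` stands in for the kernel's completion queue (libaio exposes completions through `io_getevents`); `CompletionPoller` and its methods are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified model of the completion side; all names are hypothetical.
final class CompletionPoller {
    // A completion event carries the callback that was submitted with the write.
    static final class Event {
        final Runnable callback;
        final boolean poison; // used only to stop the poller

        Event(Runnable callback, boolean poison) {
            this.callback = callback;
            this.poison = poison;
        }
    }

    private final BlockingQueue<Event> completions = new LinkedBlockingQueue<>();
    private final Thread poller = new Thread(this::pollLoop, "journal-poller");

    void start() { poller.start(); }

    // Called by whatever finishes the I/O (the kernel, in the real thing).
    void complete(Runnable callback) { completions.add(new Event(callback, false)); }

    // The single polling thread: block until an event arrives, then run its callback.
    private void pollLoop() {
        try {
            while (true) {
                Event e = completions.take();
                if (e.poison) return;
                e.callback.run();
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    // Queues a stop marker behind any pending events and waits for the poller to finish.
    void stop() {
        completions.add(new Event(null, true));
        try {
            poller.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
```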

Instead of performing syncs on the disk (a slow operation), we use a concurrent latch, which lets many more transactions execute in parallel. Rather than blocking the whole system while one sync is performed, each thread just writes as usual and then waits on its latch until the callback reports that its data is safely stored. This way every thread gets the most out of the performance available at the disk controller.
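The latch pattern can be sketched with `java.util.concurrent.CountDownLatch`: each transaction thread reserves the next append position, submits its write, and blocks only itself until its own completion arrives, with no global sync. As before, `AsynchronousFileChannel` stands in for libaio and `LatchedJournal` is a hypothetical name; a durable version would also need `force()`, which this sketch omits.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of "write, then wait for your own callback" instead of a global sync.
final class LatchedJournal implements AutoCloseable {
    private final AsynchronousFileChannel channel;
    private final AtomicLong nextPosition = new AtomicLong(); // append-only write position

    LatchedJournal(Path path) {
        try {
            channel = AsynchronousFileChannel.open(path,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called by each transaction thread. Other threads keep submitting writes while
    // this one waits, so no thread is held up by somebody else's I/O.
    boolean appendAndWait(byte[] record) {
        long position = nextPosition.getAndAdd(record.length); // reserve a slot at the end
        CountDownLatch latch = new CountDownLatch(1);
        boolean[] ok = {false};
        channel.write(ByteBuffer.wrap(record), position, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer n, Void a) { ok[0] = n == record.length; latch.countDown(); }

            @Override
            public void failed(Throwable t, Void a) { latch.countDown(); }
        });
        try {
            latch.await(); // countDown() happens-before await() returns, so ok[0] is visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ok[0];
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The per-call latch keeps the waiting local to the thread that issued the write, which is the property the paragraph above relies on.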

Conclusion:

Persistence on HornetQ is not only fast, it also scales when several threads are performing transactions.

This is just one of many innovations in HornetQ. We are working hard to make great software. Feel free to contact us on IRC or in our user forum. We would love to get your feedback.

9 comments:

  1. This sounds pretty cool, nice explanation. Do you know how this compares to the file persistent store that AMQ uses by default?

  2. libaio provides kernel-based asynchronous I/O for Linux, but most other operating systems also support kernel-based asynchronous I/O. Do you think you will look into supporting it on other operating systems as well?

  3. Hey, just another side note on AIO for other operating systems. The aio_write native method you are calling is a wrapper for the POSIX call, which then delegates to the Linux specific functions in the library, so your JNI layer could be used with any operating system that implements the POSIX AIO functions, which should be pretty much everyone, with the exception of Windows (although they have some POSIX stuff in there, so it might be there, I just don't know.)

  4. Andy - using the POSIX api is something we've considered. I remember discussing this with Clebert before. Although I'm not sure how great the actual implementations are on OSes other than Linux - especially Windows!

  5. John Russell-

    How does it compare to AMQ? All I can say is try it yourself - you will be very pleasantly surprised ;)

  6. Andy - It is the opposite. The POSIX lib will make calls to libaio that will make calls to the kernel.

    I should write an implementation using POSIX at some point.

  7. Yes, you are correct, I worded that all wrong, now that I looked at it. Typing too fast for my own brain.

  8. "The buffer here is sent directly to a libaio method called aio_write. (look at aio_write man page)."

    libaio does not have an aio_write routine. This is quite likely the source of confusion among commenters about native AIO vs. POSIX AIO calls.

    Next, you can't guarantee your writes are on the disk platter without either disabling the write cache or issuing a cache flush. If you're not doing this today, you should look into implementing it.

  9. Great Guys!

    I am very impressed; I'm sure you have taken messaging solutions to the next level.

    Great post, very well written, precise and cool :D

    Best Regards,
    Diego Pacheco
