Event Streams on Windows Azure: Choosing the Storage Type

Duncan Edwards Jones

Jan 9, 2016

CPOL

If you are using Windows Azure as the backing technology for an event-store-based system, there are three main storage choices, each with its own pros and cons.

Azure SQL

This is (essentially) a cloud-hosted version of SQL Server. To use it as a backing store for an event-stream-based system, you could create a core "events" table that stores the non-business part of each event (the event type, sequence number and event context), plus a separate "detail" table for each event type that holds its business-specific fields.
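As a minimal sketch of that layout, the Python below uses the standard library's sqlite3 module as a stand-in for Azure SQL. The SQL is ordinary enough to carry over, but it has not been run against Azure SQL, and the table names, column names and the DepositMade event type are all hypothetical.

```python
import sqlite3

# SQLite stands in for Azure SQL; all table and column names are hypothetical.
SCHEMA = """
CREATE TABLE events (
    aggregate_id TEXT    NOT NULL,
    sequence     INTEGER NOT NULL,
    event_type   TEXT    NOT NULL,
    context      TEXT,
    PRIMARY KEY (aggregate_id, sequence)
);
-- One detail table per event type holds that event's business fields.
CREATE TABLE deposit_made_detail (
    aggregate_id TEXT    NOT NULL,
    sequence     INTEGER NOT NULL,
    amount       REAL    NOT NULL,
    currency     TEXT    NOT NULL,
    PRIMARY KEY (aggregate_id, sequence)
);
"""


def append_deposit(conn, aggregate_id, sequence, amount, currency, context=""):
    """Append one DepositMade event as a core row plus a detail row."""
    # The (aggregate_id, sequence) key rejects a second writer using the same sequence.
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events VALUES (?, ?, 'DepositMade', ?)",
            (aggregate_id, sequence, context),
        )
        conn.execute(
            "INSERT INTO deposit_made_detail VALUES (?, ?, ?, ?)",
            (aggregate_id, sequence, amount, currency),
        )


def read_stream(conn, aggregate_id):
    """Return (sequence, event_type, amount, currency) tuples in sequence order."""
    return conn.execute(
        """SELECT e.sequence, e.event_type, d.amount, d.currency
           FROM events AS e
           JOIN deposit_made_detail AS d
             ON d.aggregate_id = e.aggregate_id AND d.sequence = e.sequence
           WHERE e.aggregate_id = ?
           ORDER BY e.sequence""",
        (aggregate_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
append_deposit(conn, "account-42", 1, 100.0, "GBP")
append_deposit(conn, "account-42", 2, 25.5, "GBP")
print(read_stream(conn, "account-42"))
```

With more than one event type, reading a whole stream means joining (or separately querying) one detail table per type, which is part of the price of this layout.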

Pros

SQL Server is a very widely known technology, so finding skilled developers, and the tooling to help them make the best use of the data, is not a major impediment.

This also means it is easier to look inside the data to see what is happening for given events.

Cons

Adding new event types, or adding attributes to existing events, requires a database schema change, which can be a significant process. This, in turn, means that a database-backed system will resist evolution.
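To make that cost concrete, here is a small sketch, again using SQLite as a stand-in for Azure SQL and a hypothetical deposit_made_detail table, of what adding a "channel" attribute to an existing event involves: an ALTER TABLE, after which events recorded before the change simply have no value for the new column.

```python
import sqlite3


def columns(conn, table):
    """List a table's column names via SQLite's PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
# Hypothetical detail table for an existing DepositMade event type.
conn.execute(
    "CREATE TABLE deposit_made_detail ("
    " aggregate_id TEXT NOT NULL, sequence INTEGER NOT NULL, amount REAL NOT NULL,"
    " PRIMARY KEY (aggregate_id, sequence))"
)
conn.execute("INSERT INTO deposit_made_detail VALUES ('account-42', 1, 100.0)")

# Adding a 'channel' attribute to the event is a schema change, not just a code change.
conn.execute("ALTER TABLE deposit_made_detail ADD COLUMN channel TEXT")
conn.execute("INSERT INTO deposit_made_detail VALUES ('account-42', 2, 25.5, 'mobile')")

print(columns(conn, "deposit_made_detail"))
# Events recorded before the change have NULL (None) for the new column.
print(conn.execute("SELECT * FROM deposit_made_detail ORDER BY sequence").fetchall())
```

On a real Azure SQL database the equivalent change would typically be rolled out as a deployment step, which is where the "significant process" comes from.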

SQL Server is also designed as a concurrent-update system: even if you are disciplined about keeping your event streams append-only, the underlying engine still has to support deletes and updates and to maintain transactional consistency. That overhead makes it slower and less suited to parallel workloads than the other options.

Azure Tables

Azure Tables is a NoSQL-type store in which each row (entity) in a table can have its own set of dynamic fields; only the partition key and row key, which together uniquely identify the row, are common to all rows (along with a timestamp that the service maintains).

The easiest way to build an event stream on top of Azure Tables is to use the partition key to hold the unique aggregate identifier and the row key to hold the event sequence number.
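A minimal in-memory sketch of that key scheme follows. It does not call any Azure API; it only builds entities as dictionaries. One practical detail matters: row keys are strings, and query results come back ordered by partition key and then row key, so the sequence number needs zero-padding to sort correctly. The padding width and the EventType property name are assumptions.

```python
# Hypothetical padding width: room for sequence numbers up to 9,999,999,999.
SEQUENCE_WIDTH = 10


def row_key_for(sequence: int) -> str:
    """Zero-pad the sequence so string (row key) order matches numeric order."""
    if not 0 <= sequence < 10 ** SEQUENCE_WIDTH:
        raise ValueError(f"sequence {sequence} does not fit in {SEQUENCE_WIDTH} digits")
    return str(sequence).zfill(SEQUENCE_WIDTH)


def to_entity(aggregate_id: str, sequence: int, event_type: str, payload: dict) -> dict:
    """Partition key = aggregate id, row key = padded sequence, then the event's fields."""
    entity = {
        "PartitionKey": aggregate_id,
        "RowKey": row_key_for(sequence),
        "EventType": event_type,  # hypothetical property name
    }
    entity.update(payload)
    return entity


events = [to_entity("account-42", n, "DepositMade", {"Amount": float(n)}) for n in (9, 10, 1)]
# Mimic the service's ordering: by PartitionKey, then by RowKey.
ordered = sorted(events, key=lambda e: (e["PartitionKey"], e["RowKey"]))
print([e["RowKey"] for e in ordered])  # ['0000000001', '0000000009', '0000000010']
print(sorted(["9", "10", "1"]))        # unpadded keys: ['1', '10', '9']
```

Giving each aggregate its own partition also fits Azure Tables' batch (entity group) transactions, which must stay within a single partition, so several events for one aggregate can be appended atomically.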

Pros

There is reasonable tooling for inspecting the data within an event stream for diagnostic purposes.

If the partition key is chosen well, Azure Tables can be used in a highly parallel system, because the load is spread across partitions that can be served independently.

Cons

Each event is limited to 252 attributes, because an entity can have at most 255 properties and three of those (PartitionKey, RowKey and Timestamp) are reserved by the system.
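A hypothetical pre-flight check like the one below can catch an oversized event before the service rejects it. It encodes only the property-count limit described above, and the function name is made up for illustration.

```python
# Azure Tables limits: 255 properties per entity, three of them system properties.
MAX_PROPERTIES = 255
SYSTEM_PROPERTIES = {"PartitionKey", "RowKey", "Timestamp"}
MAX_CUSTOM_PROPERTIES = MAX_PROPERTIES - len(SYSTEM_PROPERTIES)  # 252


def check_property_count(entity: dict) -> int:
    """Return the entity's custom property count, raising if it exceeds the limit.

    Hypothetical helper: it checks only the property count, not other limits
    such as the maximum entity size.
    """
    custom = [name for name in entity if name not in SYSTEM_PROPERTIES]
    if len(custom) > MAX_CUSTOM_PROPERTIES:
        raise ValueError(
            f"{len(custom)} custom properties; Azure Tables allows {MAX_CUSTOM_PROPERTIES}"
        )
    return len(custom)


wide_event = {"PartitionKey": "account-42", "RowKey": "0000000001"}
wide_event.update({f"Field{i}": i for i in range(252)})
print(check_property_count(wide_event))  # 252: exactly at the limit
```

Any bookkeeping properties of your own, such as an event type name, also count toward the 252.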

Each property is stored as a name/value pair, so the property names are stored (and transferred) again with every single event. This adds to the storage cost and reduces the data transfer speed.
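The sketch below gives a rough feel for that overhead by counting how many bytes the property names alone take up when an entity is serialised to JSON. JSON is used only as an approximation of the wire format, the sample entity is invented, and the real payloads and billing calculation differ in detail.

```python
import json


def name_bytes(entity: dict) -> int:
    """Bytes spent on property names (with their JSON quotes) in one entity."""
    return sum(len(json.dumps(name).encode("utf-8")) for name in entity)


def name_share(entity: dict) -> float:
    """Fraction of the entity's JSON encoding taken up by property names."""
    return name_bytes(entity) / len(json.dumps(entity).encode("utf-8"))


# Invented sample event; these names repeat in every stored entity.
sample = {
    "PartitionKey": "account-42",
    "RowKey": "0000000001",
    "EventType": "DepositMade",
    "Amount": 100.0,
    "Currency": "GBP",
}
print(name_bytes(sample))  # 51
print(f"{name_bytes(sample) * 1_000_000:,} bytes of names per million events")
print(f"{name_share(sample):.0%} of this entity's JSON is property names")
```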

Azure Blob Storage

A binary large object (blob) stores data that is either unstructured or has only an implied structure. When implementing an event stream on blob storage, I would recommend writing each event as a data structure that is common to every event and holds its context, followed by the event's dynamic payload.
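One possible shape for such a record, sketched in Python with the standard struct module: a fixed-size header carrying the event's context, followed by a variable-length payload. The header fields, their widths and the use of JSON for the payload are all assumptions made for illustration; any binary serialiser could replace the JSON.

```python
import json
import struct

# Hypothetical header common to every event, packed little-endian with no padding:
# event type id (uint16), sequence (uint64), timestamp in ms (int64), payload length (uint32).
HEADER = struct.Struct("<HQqI")  # 22 bytes


def encode_event(event_type_id: int, sequence: int, timestamp_ms: int, payload: dict) -> bytes:
    """Serialise one event as header + payload, ready to append to a blob."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return HEADER.pack(event_type_id, sequence, timestamp_ms, len(body)) + body


def decode_event(record: bytes) -> tuple:
    """Inverse of encode_event: read the fixed header, then exactly `length` payload bytes."""
    event_type_id, sequence, timestamp_ms, length = HEADER.unpack_from(record, 0)
    body = record[HEADER.size:HEADER.size + length]
    return event_type_id, sequence, timestamp_ms, json.loads(body)


record = encode_event(7, 1, 1_452_297_600_000, {"amount": 100.0, "currency": "GBP"})
print(len(record), decode_event(record))
```

Keeping the length in the header is what lets a reader step from one event to the next without understanding the payload format.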

Microsoft has recently added an "append blob" type, which only allows data to be added to the end of the blob and is therefore perfectly suited to the needs of an event stream.

Pros

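Each append blob can hold up to 50,000 appended blocks, so an event stream that writes one event per append can grow to 50,000 records. Because each unique event stream has its own append blob, total capacity is limited only by the storage account (500TB at the time of writing).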
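The class below is a hypothetical in-memory stand-in, not the Azure SDK, that mimics the two append blob properties the argument relies on: data can only be added at the end, and the number of appended blocks is capped at 50,000. The optional expected_offset check loosely mirrors the append-position condition the service offers for optimistic concurrency.

```python
from typing import List, Optional


class AppendOnlyStream:
    """Hypothetical in-memory stand-in for one append blob; not the Azure SDK."""

    MAX_BLOCKS = 50_000  # append blob cap on the number of appended blocks

    def __init__(self) -> None:
        self._blocks: List[bytes] = []
        self._size = 0  # running byte length, i.e. the next append offset

    @property
    def block_count(self) -> int:
        return len(self._blocks)

    def append_block(self, data: bytes, expected_offset: Optional[int] = None) -> int:
        """Add data at the end and return the offset it was written at.

        If expected_offset is given and does not match the current end of the
        stream (another writer appended first), the append is refused.
        """
        if expected_offset is not None and expected_offset != self._size:
            raise ValueError(f"expected offset {expected_offset}, stream is at {self._size}")
        if len(self._blocks) >= self.MAX_BLOCKS:
            raise OverflowError("block limit reached; the stream needs a new blob")
        offset = self._size
        self._blocks.append(bytes(data))
        self._size += len(data)
        return offset

    def read_all(self) -> bytes:
        """Return the whole stream; existing bytes are never modified or removed."""
        return b"".join(self._blocks)


stream = AppendOnlyStream()
first = stream.append_block(b"event-1;")
second = stream.append_block(b"event-2;", expected_offset=8)
print(first, second, stream.read_all())
```

If each event is written as its own block, the block cap is also the event cap; writing several events per block is one way to stretch it, at the cost of larger, less granular appends.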

Cons

Because the data in the Blob are unstructured, you need to create your own tools to investigate that data. (Typically, this would be some form of binary serialisation/deserialisation.)
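A diagnostic tool can be as simple as a loop that walks the blob record by record. This sketch assumes a hypothetical layout of back-to-back records, each a fixed 22-byte header (type id, sequence, timestamp, payload length) followed by its payload; the type_names mapping is also made up.

```python
import struct
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: a 22-byte little-endian header (event type id uint16,
# sequence uint64, timestamp ms int64, payload length uint32) followed by the payload.
HEADER = struct.Struct("<HQqI")


def iter_records(blob: bytes) -> Iterator[Tuple[int, int, int, bytes]]:
    """Walk back-to-back records, yielding (type id, sequence, timestamp, payload)."""
    offset = 0
    while offset < len(blob):
        if len(blob) - offset < HEADER.size:
            raise ValueError(f"truncated header at offset {offset}")
        type_id, sequence, timestamp_ms, length = HEADER.unpack_from(blob, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(blob):
            raise ValueError(f"truncated payload at offset {offset}")
        yield type_id, sequence, timestamp_ms, blob[start:end]
        offset = end


def dump(blob: bytes, type_names: Dict[int, str]) -> List[str]:
    """Render each record as one readable line; assumes text (e.g. JSON) payloads."""
    lines = []
    for type_id, sequence, timestamp_ms, payload in iter_records(blob):
        name = type_names.get(type_id, f"unknown({type_id})")
        text = payload.decode("utf-8", errors="replace")
        lines.append(f"#{sequence} {name} @{timestamp_ms}: {text}")
    return lines


blob = HEADER.pack(7, 1, 1000, 5) + b"hello" + HEADER.pack(8, 2, 2000, 2) + b"{}"
for line in dump(blob, {7: "DepositMade"}):
    print(line)
```

With a genuinely binary payload serialiser, the decode step in dump is where the matching deserialiser would go.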

Conclusion

I would use Azure SQL if the event streaming aspect were only a small part of a much larger data system and there were no need for highly parallel processing.

I would use Azure Table storage for hybrid cases, where the event stream data needs to be easy to inspect, and for prototyping systems.

I would use append blobs where I wanted to maximise performance and scale to the very top end of cloud-based system sizes.

Code for Change

One thing I would definitely recommend is writing the event store implementation against a defined set of interfaces, and being fanatical about keeping the storage implementation logic separate from the business/domain logic. This will allow you to switch backing storage types as a project evolves.
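A minimal sketch of such a contract in Python, using an abstract base class. The names (EventStore, append, read, ConcurrencyError, deposit) are hypothetical; an Azure Tables or append blob implementation would sit behind the same interface, and the in-memory version here is only the kind of test double the separation makes easy.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

# Hypothetical event shape: (event type name, serialised payload).
Event = Tuple[str, bytes]


class ConcurrencyError(Exception):
    """Raised when the stream has moved on since the caller last read it."""


class EventStore(ABC):
    """Storage-agnostic contract that domain code depends on."""

    @abstractmethod
    def append(self, stream_id: str, expected_version: int, events: List[Event]) -> int:
        """Append events if the stream is at expected_version; return the new version."""

    @abstractmethod
    def read(self, stream_id: str, from_version: int = 0) -> List[Event]:
        """Return the events recorded after from_version, oldest first."""


class InMemoryEventStore(EventStore):
    """Test double; Azure Tables or append blob stores would implement the same methods."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Event]] = {}

    def append(self, stream_id: str, expected_version: int, events: List[Event]) -> int:
        stream = self._streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"{stream_id} is at version {len(stream)}, not {expected_version}"
            )
        stream.extend(events)
        return len(stream)

    def read(self, stream_id: str, from_version: int = 0) -> List[Event]:
        return list(self._streams.get(stream_id, [])[from_version:])


def deposit(store: EventStore, account_id: str, expected_version: int, amount_pence: int) -> int:
    """Domain logic: knows about deposits, not about the backing storage type."""
    payload = str(amount_pence).encode("ascii")
    return store.append(account_id, expected_version, [("DepositMade", payload)])


store = InMemoryEventStore()
version = deposit(store, "account-42", 0, 10_000)
version = deposit(store, "account-42", version, 2_550)
print(version, store.read("account-42"))
```

Switching storage types then means writing a new EventStore subclass; deposit and the rest of the domain code stay unchanged.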

In my own case, I did this by starting with an Azure Tables based prototype and then moving to an Append Blob based system.