Zulfiqar's weblog

Architecture, security & random .Net


Queues and Workers

Posted by zamd on October 10, 2014

In the last post, I briefly talked about the architecture of the project I'm currently leading. We have a clear read/write separation in the architecture, and for the past few sprints we have been pushing more & more work onto the write path, which has made our write pipeline a bit heavy. Our challenge is to quickly process the huge bulk of tags flowing through the write pipeline. Just to give you an idea of the numbers:

The business process involves doing an "all clothing" stock count every Monday between 06:00 and 09:00 AM. For our UK deployment, we are aiming at 600 stores, each containing an inventory of roughly 80k garments. To get good accuracy from RFID tags, all inventory must be counted at least twice. The usual process is that a group of people count the backroom and the shop floor in parallel – they then swap and count again. So the math looks like this:

80k garments × 600 stores = 48 million tags, each counted twice = 96 million tag reads

These need to be processed within 3 hours, which equates to roughly 10,000 commands/tags/second into our backend service (96 million ÷ 10,800 seconds ≈ 8,900, rounded up for headroom).

We have already spent quite a bit of time optimizing the inventory pipeline, and our 99th-percentile latency is below 100 ms – reasonably good considering we are using NHibernate & Oracle and calling a bunch of backend services. There is further juice we can extract out of the inventory pipeline, but realistically, to process all this load we need to scale the system out. We kind of knew this from day one, so we designed the system in a way where commands are pretty much queueable after some simple invariant checking.

We came up with a very simple scale model: run multiple workers behind a set of queues. We started by re-hosting our domain in a worker process (a simple console application; the plan is to use NSSM in production). This simple model works great – workers compete for the commands, a single worker reads a command under peek-lock and runs the unit of work (UOW). If the UOW cannot be committed, the message is simply retried on another worker, and in most cases the transient failure (a small race condition :-)) gets resolved on a subsequent retry.
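The worker loop itself is tiny. Here's a minimal sketch of the competing-consumer pattern as we use it – note that QueueMessage, IQueueConsumer and IUnitOfWorkHandler are illustrative stand-ins for our Tibco EMS client wrapper and domain code, not the actual types:

```csharp
using System;
using System.Threading;

// Hypothetical abstractions over the EMS client – names are illustrative.
public class QueueMessage
{
    public byte[] Body { get; set; }
}

public interface IQueueConsumer
{
    QueueMessage Receive();               // blocks until a message is locked for us
    void Complete(QueueMessage message);  // ack – remove the message from the queue
    void Abandon(QueueMessage message);   // release the lock so another worker retries
}

public interface IUnitOfWorkHandler
{
    void Execute(QueueMessage message);   // deserialize the command, run the UOW, commit
}

public class Worker
{
    private readonly IQueueConsumer _queue;
    private readonly IUnitOfWorkHandler _handler;

    public Worker(IQueueConsumer queue, IUnitOfWorkHandler handler)
    {
        _queue = queue;
        _handler = handler;
    }

    public void Run(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            var message = _queue.Receive();
            try
            {
                _handler.Execute(message); // run the unit of work
                _queue.Complete(message);  // ack only after a successful commit
            }
            catch (Exception)
            {
                // Transient failure (e.g. the race condition mentioned above) –
                // abandon so the message is retried on another worker.
                _queue.Abandon(message);
            }
        }
    }
}
```

Because the ack happens only after a successful commit, a crashed or failed worker never loses a command – the peek-lock expires and the message simply reappears for the next worker.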

With this simple model, we were able to get a throughput of over 2000 tags/second using 16 workers on a single beefy machine.

There is a huge number of duplicates in our scenario, so our next step was to detect and remove duplicates before they hit our workers. Ideally the messaging system would do this for us – we use Tibco EMS, but unfortunately EMS doesn't have any built-in de-duping functionality.

We also use Redis as the read store in our architecture, so we decided to build the de-duping (on publish) functionality in Redis using simple GET/SET operations. The results were awesome: we can de-dup a batch of 50 tags in 0.3 ms.
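For the curious, here's roughly how such a publish-side check can be written with the StackExchange.Redis client: a batch pipelines all the SET NX calls for a 50-tag chunk into a single round trip, which is how a batch can be checked in well under a millisecond. The key format, the 24-hour expiry and the TagDeduper name are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class TagDeduper
{
    // Returns only the tags we haven't seen before for this stock count.
    public static async Task<List<string>> FilterNewTagsAsync(
        IDatabase redis, string countId, IReadOnlyList<string> tags)
    {
        var batch = redis.CreateBatch();

        // SET key value NX – returns true only if the key didn't already exist,
        // so 'true' means "first time we've seen this tag in this count".
        var checks = tags
            .Select(tag => batch.StringSetAsync(
                key: $"dedup:{countId}:{tag}",
                value: 1,
                expiry: TimeSpan.FromHours(24),   // illustrative TTL
                when: When.NotExists))
            .ToArray();

        batch.Execute();          // one pipelined round trip for the whole batch
        await Task.WhenAll(checks);

        return tags.Where((_, i) => checks[i].Result).ToList();
    }
}
```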

This one change significantly reduced the problem size for us: a stock count contains at least 100% duplicates (every garment is counted at least twice) and there is no way to avoid them on the client/sender side. Efficiently de-duping them on the server means our workers only have to process half of the load – roughly 5,000 tags/sec.

Another interesting pattern we have seen is around large UOWs, which become very inefficient to process synchronously as a single UOW. In these situations, a worker simply breaks the large UOW into 'N' smaller UOWs, which are queued and then processed in parallel – see the sketch below. The downside is that the coding becomes a bit tedious, as we are reading a message from the queue, breaking it down into smaller messages, and writing them back to the queue. It's not perfect, but it gives us a nice way to break up & parallelize large UOWs (and we have plenty of these).
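The split-and-requeue step looks something like this – a sketch under the same caveat as before: IQueuePublisher, InventoryCommand and the 500-tag chunk size are illustrative assumptions rather than our actual types and tuning:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical publisher abstraction over the EMS queue.
public interface IQueuePublisher
{
    void Publish(InventoryCommand command);
}

public class InventoryCommand
{
    public Guid CorrelationId { get; set; }  // ties the child UOWs back to the parent
    public List<string> Tags { get; set; }
}

public static class UowSplitter
{
    // Break one oversized command into chunkSize-tag commands and requeue
    // them so the worker pool can process them in parallel.
    public static void SplitAndRequeue(
        IQueuePublisher queue, InventoryCommand large, int chunkSize = 500)
    {
        for (int i = 0; i < large.Tags.Count; i += chunkSize)
        {
            queue.Publish(new InventoryCommand
            {
                CorrelationId = large.CorrelationId,
                Tags = large.Tags.Skip(i).Take(chunkSize).ToList()
            });
        }
    }
}
```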

Posted in .net, Architecture, Redis

RFID Stock Management – Architecture & Challenges

Posted by zamd on September 12, 2014

I'm currently working on an RFID-based stock management solution. The high-level goals of the solution are to ensure that the stock shown on the computer is the stock actually available in the store, that the correct products (in the correct quantities) are displayed on the shop floor, and to reduce stock loss by having real-time visibility of what's passing through the tills before it's taken out of the store. As a start, we are only doing this for clothing products: our items are source (factory) tagged with RFID tags, and these items are then tracked from delivery to sale & returns using various types of RFID readers – handhelds, fixed portals, security gates, click & collect readers, etc.

The hardware side of the project is interesting, but it's mostly off-the-shelf readers & gates supplied by Motorola, Checkpoint & Nedap. These readers do the bulk of the work, and we run a simple integration agent on top of them to connect them to our software backend.

Our software backend is a set of SOA-style (REST) web services built with ASP.NET Web API & hosted on Windows. From a design & implementation perspective we use CQS, DDD & Event Sourcing, and our domain entities are persisted (using NHibernate) in Oracle (Exadata), which is our master data store. We use SpecFlow/NCrunch to automate our acceptance testing and NUnit for unit testing.
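To give a feel for the CQS/event-sourcing shape of the write path, here's a deliberately minimal sketch: a command is checked against its invariants, and the resulting event is appended to the event stream and published for the read side to project. Every name here (CountTagsCommand, TagsCountedEvent, the store/publisher interfaces) is illustrative, not our actual domain model:

```csharp
using System;
using System.Collections.Generic;

// Illustrative command and event – not our actual domain types.
public class CountTagsCommand
{
    public string StoreId { get; set; }
    public List<string> Tags { get; set; }
}

public class TagsCountedEvent
{
    public string StoreId { get; set; }
    public List<string> Tags { get; set; }
    public DateTime CountedAtUtc { get; set; }
}

public interface IEventStore { void Append(object @event); }
public interface IEventPublisher { void Publish(object @event); }

public class CountTagsHandler
{
    private readonly IEventStore _store;
    private readonly IEventPublisher _publisher;

    public CountTagsHandler(IEventStore store, IEventPublisher publisher)
    {
        _store = store;
        _publisher = publisher;
    }

    public void Handle(CountTagsCommand command)
    {
        // Simple invariant checking before the command is accepted.
        if (command.Tags == null || command.Tags.Count == 0)
            throw new ArgumentException("A count must contain at least one tag.");

        var @event = new TagsCountedEvent
        {
            StoreId = command.StoreId,
            Tags = command.Tags,
            CountedAtUtc = DateTime.UtcNow
        };

        _store.Append(@event);      // write side: append to the event stream
        _publisher.Publish(@event); // read side: projections update the cache
    }
}
```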

We started this as a typical .NET project and had interesting challenges around performance & latency on the read path, which pushed us to do more & more work on the asynchronous write path. We started to separate our read & write stores, and decided to build the read store entirely in the cache, based on the event stream we capture on the write path. We started with Couchbase and its memcached data bucket as our first choice for the read store. Couchbase is a great technology – it converges the key-value & document store models into one great product, and I love the power & simplicity of its map/reduce framework.

Couchbase didn't work as well for us, though: our latency was still high because of the computation involved over huge lists of stock data. We needed to bring the data streams from the cache into our service, compute variances, categorization and so on, and then store the computed results back in the cache.

Our next choice was Redis. The Sorted Set data structure in Redis aligned nicely with the data model we need to store & compute over, and the ability to run the computation inside the cache is exactly what we needed – I'm extremely impressed with the power of Redis. Most of our computation can be done with a single union or intersection command on sorted sets, which is a sub-millisecond operation. We are actively building on this model, and in future posts I'll share more details on our architecture and the specifics of our Redis usage.
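As a flavour of what this looks like with the StackExchange.Redis client: if the expected stock and the counted stock for a store live in two sorted sets, a single server-side intersection (ZINTERSTORE) produces the matched items. The key names and the StockVariance wrapper are illustrative assumptions:

```csharp
using StackExchange.Redis;

public static class StockVariance
{
    // expectedKey / countedKey hold sorted sets of tag ids. One server-side
    // command computes the intersection – the items that were both expected
    // and actually counted – and stores it under matchedKey.
    public static long ComputeMatched(IDatabase redis, string storeId)
    {
        RedisKey expectedKey = $"stock:expected:{storeId}";
        RedisKey countedKey  = $"stock:counted:{storeId}";
        RedisKey matchedKey  = $"stock:matched:{storeId}";

        // ZINTERSTORE – runs entirely inside Redis; returns the matched count.
        return redis.SortedSetCombineAndStore(
            SetOperation.Intersect, matchedKey, expectedKey, countedKey);
    }
}
```

Swapping SetOperation.Intersect for SetOperation.Difference-style queries (or a union) covers the variance cases – missing and unexpected stock – in the same single-command fashion.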

 

Our high-level architecture looks like this:

[architecture diagram]

Posted in Architecture, Couchbase, Redis