Thursday, July 25, 2013

Sequential access to resources in Python

I was in a situation where I wanted all access to an SQLite database from multiple threads to happen sequentially. Initially I used a lock to keep more than one thread from accessing the database at a time, which worked fine. But I wasn't really satisfied. It wasn't fun enough. Taking inspiration from this excellent answer on Stack Overflow, I started playing around with making access to the database happen sequentially using queues. The idea was to place the SQL query onto a queue which has a single worker waiting for it. When the worker receives the query, it runs it and then puts the result onto a return queue. Since one worker can only run one task at a time, the queries will always happen in the order received. But I did not want to deal with the queues explicitly; instead, I wanted library code to handle it behind the scenes. Furthermore, I wanted to expand this to deal with more than just SQL queries; I wanted to sequentialize any task in this way.

I started with an abstract base class which I called Task. This is not truly an abstract class because, well, this is Python [1], and Python does allow you to make a direct instance of it. But in order to be useful, Task needs to be subclassed and extended, as I'll explain below. Task contains the request for the resource, or "thing to run", and also a return queue on which to put the result when it is available. It ties the request to the result so it doesn't get lost among all the other requests. Since I wanted to make this a general purpose system, I decided there should be separate queues for different types of requests, so Task also contains the name of the queue on which its requests are placed so they end up in the right place.

There is also a dispatcher to route the requests to the right queue. This would not be needed if this were only for doing SQL queries, but as I said, I wanted it to be general purpose. I also added a get_queue() function to make the _queues variable cleanly accessible outside of the module.




To do anything useful, Task must be subclassed and must have an executor method. The executor is a static method which runs in a thread started by the dispatcher. Its job is to wait on the queue and process requests, putting the results onto the return queue.

For my SQLite queries, I wanted to be able to do transactions, so the query Task implementation, which I called Query, needs to be able to run multiple queries at a time. I thus included an add() method that appends queries to a list, rather than, say, having a single query passed into the constructor. Multiple add() calls can be chained before calling run(). The run() method sends the Query off to the dispatcher, which then places it on the queue where the executor is waiting.



In cases where multiple queries are run, I decided to place only the last result onto the return queue. This wasn't a problem for me, since in most cases where I would run multiple queries I didn't care about a return value, and it was only when doing a (single) select that I wanted a result.
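Here is a rough sketch of what a Query along these lines might look like, assuming Python 3 and the standard sqlite3 module. To stay self-contained it feeds its own request queue directly instead of going through the dispatcher. The names _requests, statements, result_queue, and db_path, the None shutdown sentinel, and the choice to re-raise database errors in the caller are all assumptions.

```python
import queue
import sqlite3
import threading


class Query:
    """Collects SQL statements and runs them as one unit on a single worker."""

    _requests = queue.Queue()       # hypothetical: shared by every Query instance

    def __init__(self):
        self.statements = []        # list of (sql, params) pairs
        self.result_queue = queue.Queue(maxsize=1)

    def add(self, sql, params=()):
        self.statements.append((sql, params))
        return self                 # returning self lets add() calls be chained

    def run(self):
        Query._requests.put(self)
        result = self.result_queue.get()
        if isinstance(result, Exception):
            raise result            # surface database errors in the calling thread
        return result

    @staticmethod
    def executor(requests, db_path):
        # The connection is opened inside the worker because, by default,
        # a sqlite3 connection may only be used by the thread that created it.
        conn = sqlite3.connect(db_path)
        while True:
            task = requests.get()
            if task is None:        # assumed sentinel: shut the worker down
                break
            try:
                rows = None
                with conn:          # commit if every statement succeeds, else roll back
                    for sql, params in task.statements:
                        rows = conn.execute(sql, params).fetchall()
                task.result_queue.put(rows)     # only the last result is returned
            except sqlite3.Error as exc:
                task.result_queue.put(exc)
        conn.close()
```

With a worker thread running Query.executor, a call such as Query().add("INSERT ...").add("INSERT ...").run() executes both statements inside one transaction, and a single Query().add("SELECT ...").run() returns the selected rows.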

Finally, the dispatcher needs to be started with the start() function before anything will work. I also included a shutdown() function.



To use, create a Query object, add one or more queries, then run().



I also added some convenience functions for running queries without having to create the Query object directly.



This worked out nicely for the database access. Now I wanted to make it so I could limit access to functions in the same way. I made a subclass of Task called Function, whose job is to run functions, and I created a decorator called atomic() to make any function pass through a Function.



The atomic() decorator takes an argument which represents the name of the queue on which the requests will be placed. The nice thing about it is that it allows you to group functions together onto the same queue if you like, simply by passing the same queue name to the decorator.
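The decorator idea can be sketched as follows, assuming Python 3. Function and atomic() are the names used above; _queue_for, _queues_lock, the (ok, value) result tuple, and the check_in example function are assumptions. For brevity this version creates each queue and its worker on first use rather than going through the dispatcher.

```python
import functools
import queue
import threading

_queues = {}                        # queue name -> queue of pending Function tasks
_queues_lock = threading.Lock()     # guards creation of queues and their workers


class Function:
    """A task wrapping one function call and the queue its outcome arrives on."""

    def __init__(self, func, args, kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs
        self.result_queue = queue.Queue(maxsize=1)

    @staticmethod
    def executor(task_queue):
        """Run queued calls one at a time, in the order they were received."""
        while True:
            task = task_queue.get()
            try:
                outcome = (True, task.func(*task.args, **task.kwargs))
            except Exception as exc:
                outcome = (False, exc)      # hand the exception back to the caller
            task.result_queue.put(outcome)


def _queue_for(name):
    """Return the named queue, creating it and its worker thread on first use."""
    with _queues_lock:
        if name not in _queues:
            _queues[name] = queue.Queue()
            threading.Thread(
                target=Function.executor, args=(_queues[name],), daemon=True
            ).start()
        return _queues[name]


def atomic(queue_name):
    """Decorator: route every call to the function through the named queue."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            task = Function(func, args, kwargs)
            _queue_for(queue_name).put(task)
            ok, value = task.result_queue.get()
            if ok:
                return value
            raise value
        return wrapper
    return decorator


@atomic("repository")               # hypothetical function and queue name
def check_in(path):
    return "checked in " + path
```

One caveat with this sketch: a decorated function must not call another function decorated with the same queue name, because the single worker would then wait on a task that only it can run.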

I have managed to find a number of places where this technique was useful. For example, I created a web-based UI for submitting documents that were to be checked into a Subversion repository. Naturally I did not want it to attempt a Subversion check-in while another one was in progress, so I passed it through a Function using the atomic() decorator.



[1] Python does have a module called abc for defining abstract base classes, but I did not use it for this purpose.
