Sunday, November 10, 2013

Client Side SSL Certificates on the JVM

Recently I finished the course Functional Programming Principles in Scala and I've been wanting to try out some real world Scala code. I decided to rewrite a small screen scraper that I had written in Python to get a feel for how to do the same thing in Scala. The first task was to find an HTTP client library to use, and I chose Play framework's WS API since Play is well known and has documentation.

The site I am scraping requires a client side SSL certificate. With the Requests library for Python this is very easy to deal with: you simply set the path of the certificate on a session. The documentation for Play, however, gives a somewhat unclear explanation of how to achieve this, saying that you need direct access to the underlying AsyncHttpClient instance in order to customize it:

WS does not support client certificates (aka mutual TLS / MTLS / client authentication). You should set the SSLContext directly in an instance of AsyncHttpClientConfig and set up the appropriate KeyStore and TrustStore.

The problem is, there does not seem to be a way to pass a customized AsyncHttpClientConfig to the client, since the passing of the config is hidden inside private methods (see here if you are interested).

After reading lots of confusing documentation on Java and SSL (practically none exists for Scala), it turns out the solution is not specific to either Scala or Play Framework. The JVM will handle your certificates for you, and the AsyncHttpClient used by Play just uses the ones that the JVM knows about. So your code does not need to specify the certificate; instead, you can import it into a keystore and pass the path to the keystore to the JVM using command line arguments. Here is how I did it.

First, you need the client certificate, its private key, and the CA certificate of the authority that signed your certificate. I will call them example.crt, example.key, and cacert.crt. Mine were in PEM format, but they must be converted to PKCS12. The openssl command can do this for you.

openssl pkcs12 -export -in example.crt -inkey example.key -out certs.p12 -name client -CAfile cacert.crt -caname root

While the command is running, you will be asked to provide a password to protect the output file. The output file is called certs.p12, which you can use to create a JVM keystore using the keytool command that is included with the JDK.

keytool -importkeystore -deststorepass p@ssw0rd -destkeypass p@ssw0rd -destkeystore keystore.jks -srckeystore certs.p12 -srcstoretype PKCS12 -srcstorepass p@ssw0rd -alias client

The -srcstorepass argument needs to be the password that you provided when converting your certificate to PKCS12 format. Now you will have a keystore file called keystore.jks. When you fire up your JVM, pass to it the path of your new keystore and its password.

java [args ...] -Djavax.net.ssl.keyStore=/path/to/keystore.jks -Djavax.net.ssl.keyStorePassword=p@ssw0rd

Now your Play WS client will automagically use the client certificate from the keystore. For security purposes, you probably don't really want to pass a command line argument containing the password of your keystore, so you may set the properties in your code by reading the password from a properly protected file or some other source. Also, don't actually use 'p@ssw0rd' for your password! But this should be enough to illustrate what is required to get it working.

Thursday, July 25, 2013

Sequential access to resources in Python

I was in a situation where I wanted all access to an SQLite database from multiple threads to happen sequentially. Initially I used a lock to keep more than one thread from accessing the database at a time, which worked fine. But I wasn't really satisfied. It wasn't fun enough. Taking inspiration from this excellent answer on Stack Overflow, I started playing around with making access to the database happen sequentially using queues. The idea was to place the SQL query onto a queue which has a single worker waiting for it. When the worker receives the query, it runs it and then puts the result onto a return queue. Since one worker can only run one task at a time, the queries will always happen in the order received. But I did not want to deal with the queues explicitly; instead, I wanted library code to handle them behind the scenes. Furthermore, I wanted to expand this to deal with more than just SQL queries; I wanted to sequentialize any task in this way.

I started with an abstract base class which I called Task. This is not truly an abstract class because, well, this is Python (see footnote 1), and Python does allow you to instantiate it directly. But in order to be useful, Task needs to be subclassed and extended, which I'll explain below. Task contains the request for the resource, or "thing to run", and also a return queue on which to put the result when it is available. It ties the request to the result so it doesn't get lost among all the other requests. Since I wanted to make this a general purpose system, I decided there should be separate queues for different types of requests, so Task also contains the name of the queue on which its requests are placed so they end up in the right place.

There is also a dispatcher to route the requests to the right queue. This would not be needed if this were only for doing SQL queries, but as I said, I wanted it to be general purpose. I also added a get_queue() function to make the _queues variable cleanly accessible outside of the module.




To do anything useful, Task must be subclassed and must have an executor method. The executor is a static method which runs in a thread started by the dispatcher. Its job is to wait on the queue and process requests, putting the results onto the return queue.

For my SQLite queries, I wanted to be able to do transactions, so the query Task implementation, which I called Query, needs to be able to run multiple queries at a time. I thus included an add() method that appends queries to a list, rather than, say, having a single query passed into the constructor. Multiple add() calls can be chained before calling run(), which does the actual sending off to the dispatcher, which then places it on the correct queue on which the executor is waiting.



In cases where multiple queries are run, I decided to place only the last result onto the return queue. This wasn't a problem for me, since in most cases where I would run multiple queries I didn't care about a return value, and it was only when doing a (single) select that I wanted a result.
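As a rough illustration of the add()/run() idea, here is a standalone Python 3 sketch. To keep it self-contained it uses one hard-wired queue and worker instead of the Task and dispatcher machinery, so it is a simplification rather than the original Query class; the names _query_queue, _STOP, _executor, and the notes table are assumptions.

```python
import queue
import sqlite3
import threading

_query_queue = queue.Queue()
_STOP = object()  # sentinel that tells the worker to exit


def _executor(db_path):
    """Single worker: owns the connection and runs batches in arrival order."""
    conn = sqlite3.connect(db_path)
    while True:
        task = _query_queue.get()
        if task is _STOP:
            break
        result = None
        try:
            with conn:  # commit if the whole batch succeeds, else roll back
                for sql, params in task.queries:
                    # Only the result of the last statement is kept.
                    result = conn.execute(sql, params).fetchall()
        except sqlite3.Error as exc:
            result = exc
        task.return_queue.put(result)
    conn.close()


class Query:
    def __init__(self):
        self.queries = []
        self.return_queue = queue.Queue(maxsize=1)

    def add(self, sql, params=()):
        self.queries.append((sql, params))
        return self  # returning self is what allows add() calls to chain

    def run(self):
        _query_queue.put(self)
        result = self.return_queue.get()
        if isinstance(result, Exception):
            raise result
        return result


worker = threading.Thread(target=_executor, args=(":memory:",), daemon=True)
worker.start()

# Three statements run as one batch; the return value is not needed here.
(Query()
    .add("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    .add("INSERT INTO notes (body) VALUES (?)", ("first",))
    .add("INSERT INTO notes (body) VALUES (?)", ("second",))
    .run())

# A single select, where the result does matter.
rows = Query().add("SELECT body FROM notes ORDER BY id").run()
print(rows)

_query_queue.put(_STOP)
worker.join()
```

Because the connection is created inside the worker thread and never leaves it, sqlite3's same-thread check is satisfied without any extra flags.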

Finally, the dispatcher needs to be started with the start() function before anything will work. I also included a shutdown() function.



To use it, create a Query object, add one or more queries, then call run().



I also added some convenience functions for running queries without having to create the Query object directly.



This worked out nicely for the database access. Now I wanted to make it so I could limit access to functions in the same way. I made a subclass of Task called Function, whose job is to run functions, and I created a decorator called atomic() to make any function pass through a Function.



The atomic() decorator takes an argument which represents the name of the queue on which the requests will be placed. The nice thing about it is that it allows you to group functions together onto the same queue if you like, simply by passing the same queue name to the decorator.

I have managed to find a number of places where this technique was useful. For example, I created a web based UI for submitting documents that were to be checked into a Subversion repository. Naturally I did not want it to attempt a Subversion checkin while another one was in progress, so I passed it through a Function using the atomic() decorator.



1 Python does have a module called abc for defining abstract base classes, but I did not use it for this purpose.

Sunday, July 19, 2009

Web Mail with Postfix, Dovecot, and Hastymail

I wanted to be able to check my email from anywhere, so I looked into what webmail options there were. A Google search quickly pointed to Hastymail as a good possible solution, so I decided to give it a try. I've been running Postfix and Dovecot together for a long time and I didn't know if I would run into any issues that might be specific to the servers I was using. As it turns out, there were no settings specific to either of those and it couldn't have been easier.

Hastymail is written in PHP, so I had to install that first. On Ubuntu Server, I just did

# apt-get install php5-cgi
# apt-get install php5-cli

I downloaded Hastymail from http://www.hastymail.org, and untarred it into the document root, which in my case is /www/vhosts/mail.example.com. I also linked the resulting directory to hastymail2 so that I wouldn't have to modify the web server settings if I decided to upgrade Hastymail later. I could just point the symlink to the new version.

$ cd /www/vhosts/mail.example.com
$ gunzip -c hastymail2_rc_6.tar.gz | tar xvf -
$ ln -s hastymail2_rc_6 hastymail2

Then I created some directories Hastymail needs.

# mkdir /etc/hastymail2
# mkdir /var/lib/hastymail2
# mkdir /var/lib/hastymail2/attachments
# mkdir /var/lib/hastymail2/user_settings

I changed file ownership to the user that the web server will run as.

# chown -R www-data:www-data /var/lib/hastymail2

Hastymail comes with an example config file in the document root called hastymail2.conf.example. Copy this to /etc/hastymail2/hastymail2.conf. There were only a few changes I needed to make to get this up and running. First was the url_base variable. Since I am putting the mail application at the root of mail.example.com, I set this to "/".

url_base = /

Secondly, I changed it to use https. I don't know why this isn't set by default.

http_prefix = https

I also changed attachments_path and settings_path to the directories I created earlier.

attachments_path = /var/lib/hastymail2/attachments
settings_path = /var/lib/hastymail2/user_settings

Now all I had to do was run the install script.

# php /www/vhosts/mail.example.com/hastymail2_rc_6/install_scripts/install_config.php /etc/hastymail2/hastymail2.conf /etc/hastymail2/hastymail2.rc

The web server needs to be set up next. I decided to use lighttpd with FastCGI. On Ubuntu (or Debian), you can put your custom configs into /etc/lighttpd/conf-available, then link the ones you want to activate into /etc/lighttpd/conf-enabled. Lighttpd comes with some of these custom configs, and you will need to activate the fastcgi one.

# cd /etc/lighttpd/conf-enabled
# ln -s ../conf-available/10-fastcgi.conf

Also, if you're like me you'll want to activate SSL. The SSL config that comes with lighttpd looks for a certificate called server.pem in /etc/lighttpd, so you'll need to have that. I created a self signed certificate, using instructions at http://www.cyberciti.biz.

# ln -s ../conf-available/10-ssl.conf
# cd /etc/lighttpd
# openssl req -new -x509 -keyout server.pem -out server.pem -days 365 -nodes

My virtual host config was very simple. I put this into a file called /etc/lighttpd/conf-available/90-mail.example.com.conf:

$HTTP["host"] == "mail.example.com" {
    # Redirect login page to SSL
    $HTTP["scheme"] == "http" {
        url.redirect = ( "^/$" => "https://mail.example.com/" )
    }
    accesslog.filename = "/var/log/lighttpd/mail.example.com-access.log"
    server.document-root = "/www/vhosts/mail.example.com/hastymail2"
}

Then I linked to it from the conf-enabled directory.

# cd /etc/lighttpd/conf-enabled
# ln -s ../conf-available/90-mail.example.com.conf

After this, when I started lighttpd, I was greeted by the mail login.

Saturday, July 4, 2009

Simple File Cache Between Python Processes

Here is a simple way to cache pickled data between multiple Python processes.

#!/usr/bin/env python

from __future__ import with_statement

import cPickle as pickle
import os
import sys
import time

CACHE_TTL = 300 # seconds
LOCK_TTL = 60
DATA_FILE = '/tmp/pickle_file'
LOCK_FILE = '/tmp/pickle_lock'

def get_data():
    data_store = {}
    locked = False
    if os.path.exists(LOCK_FILE):
        lock_file_info = os.stat(LOCK_FILE)
        # Check if lock file is stale
        if lock_file_info[9] + LOCK_TTL < time.mktime(time.localtime()):
            os.remove(LOCK_FILE)
        else:
            locked = True
    if not os.path.exists(DATA_FILE):
        lock_file_obj = open(LOCK_FILE, 'w')
        data_store = refresh_data()
        data_file_obj = open(DATA_FILE, 'w')
        pickle.Pickler(data_file_obj).dump(data_store)
        data_file_obj.close()
        lock_file_obj.close()
        os.remove(LOCK_FILE)
    else:
        data_file_obj = open(DATA_FILE, 'r')
        data_store = pickle.Unpickler(data_file_obj).load()
        data_file_obj.close()
        if not locked:
            if time.mktime(time.localtime()) > data_store['expire']:
                lock_file_obj = open(LOCK_FILE, 'w')
                data_store = refresh_data()
                data_file_obj = open(DATA_FILE, 'w')
                pickle.Pickler(data_file_obj).dump(data_store)
                data_file_obj.close()
                lock_file_obj.close()
                os.remove(LOCK_FILE)
    return data_store

def refresh_data():
    """
    Only needs to return a dictionary with 'expire' key set
    """
    data_store = {}
    with open('/dev/random', 'r') as rnd:
        data_store['k1'] = rnd.readline()
    data_store['expire'] = time.mktime(time.localtime()) + CACHE_TTL
    return data_store


print get_data()
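The script above is written for Python 2, and it checks for the lock file before creating it, which leaves a small window in which two processes can both decide to refresh. Here is a hedged Python 3 variation on the same idea, not a drop-in replacement. It swaps in two techniques the original does not use: creating the lock atomically with os.O_EXCL, and swapping the data file into place with os.replace. The temporary directory, the try_lock() and load() helpers, and the use of os.urandom as sample data are assumptions, and unlike the original it returns None if no data exists yet while another process holds the lock.

```python
import os
import pickle
import tempfile
import time

CACHE_TTL = 300  # seconds
LOCK_TTL = 60
CACHE_DIR = tempfile.mkdtemp()  # assumption: any directory shared by the processes
DATA_FILE = os.path.join(CACHE_DIR, "pickle_file")
LOCK_FILE = os.path.join(CACHE_DIR, "pickle_lock")


def refresh_data():
    """Only needs to return a dictionary with the 'expire' key set."""
    return {"k1": os.urandom(8).hex(), "expire": time.time() + CACHE_TTL}


def try_lock():
    """Return True if this process managed to create the lock file."""
    try:
        if time.time() - os.stat(LOCK_FILE).st_mtime > LOCK_TTL:
            os.remove(LOCK_FILE)  # lock is stale, clear it
    except FileNotFoundError:
        pass
    try:
        # O_EXCL makes creation fail if the file already exists,
        # so only one process can win.
        os.close(os.open(LOCK_FILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
        return True
    except FileExistsError:
        return False


def load():
    try:
        with open(DATA_FILE, "rb") as data_file:
            return pickle.load(data_file)
    except FileNotFoundError:
        return None


def get_data():
    data_store = load()
    if data_store is not None and time.time() <= data_store["expire"]:
        return data_store
    if not try_lock():
        # Someone else is refreshing; fall back to whatever we have.
        return data_store
    try:
        data_store = refresh_data()
        tmp_file = DATA_FILE + ".tmp"
        with open(tmp_file, "wb") as data_file:
            pickle.dump(data_store, data_file)
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_file, DATA_FILE)
    finally:
        os.remove(LOCK_FILE)
    return data_store


first = get_data()
second = get_data()
print(first == second)
```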

Thursday, June 18, 2009

Drawing in a UIView on the iPhone

Stanford University has a class for iPhone development online that you can follow here. I got stuck on assignment 3, because you are asked to draw polygons in a UIView, but they give no information on how to do drawing. The Apple documentation is pretty dense, but I was able to figure it out after a bit of trial and error.

Update: I discovered that the Stanford lecture 5 does cover drawing, out of order from assignment 3, but this may still be useful as a bare bones introduction to drawing.

The API you will use for drawing is the Core Graphics framework, also known as Quartz 2D, which is a C API rather than an Objective-C one. Because of this, in some cases you might need to do some conversions between C data types and Objective-C objects.

Most of the drawing functions operate upon the CGContext data type, and take a reference to the CGContext as the first argument. The context contains the state of the view that you are going to draw, such as the coordinates of the points to be drawn, background colors and such.
To get this reference, simply call UIGraphicsGetCurrentContext(). This is from the UIKit framework and is a C function. The UIKit framework is the iPhone counterpart to the Cocoa AppKit.

Whenever your view needs to be displayed, UIKit calls its drawRect: method, which you must implement in your UIView subclass. This is where your drawing code will go. One thing that initially confused me is that the name drawRect: seems to imply drawing a rectangle, but the rectangle refers to the area of the view that needs to be drawn, not to a shape within it. I was expecting to find methods such as drawCircle, drawPolygon, etc. But as I said above, the actual drawing is not done in Objective-C; it is done by the Core Graphics framework, and drawRect: is the place to put those operations.

The following code snippet draws a triangle.


- (void)drawRect:(CGRect)rect
{
    // Start by getting your context, you will need it
    // for the rest of the drawing functions.
    CGContextRef context = UIGraphicsGetCurrentContext();

    // This array holds all of the points to be put into the context.
    CGPoint points[3];
    points[0] = CGPointMake(10.0, 10.0);
    points[1] = CGPointMake(100.0, 10.0);
    points[2] = CGPointMake(10.0, 100.0);

    // Set the fill color of the context to white.
    // Arguments 2, 3, and 4 represent red, green, and blue, respectively.
    // The last argument is the opacity; it must be 1.0 here,
    // since an opacity of 0.0 would make the fill fully transparent.
    // The value ranges are between 0.0 and 1.0.
    CGContextSetRGBFillColor(context, 1.0, 1.0, 1.0, 1.0);

    // Paint the base of the context.
    CGContextFillRect(context, rect);

    // Set color of the lines to black, with full opacity.
    CGContextSetRGBStrokeColor(context, 0.0, 0.0, 0.0, 1.0);

    // Add the coordinates of the lines to the context.
    // The last argument is the number of elements in the array.
    CGContextAddLines(context, points, 3);

    // Connect the end of the last line to the first
    // to close the triangle.
    CGContextClosePath(context);

    // Finally draw your lines.
    CGContextStrokePath(context);
}


When you run this, you should see a right triangle in the upper left corner of the view.

If you see that any of this information is inaccurate, let me know and I'll correct it.