Consistent hashing

Tuesday 18 March 2008

Like most production web sites these days, we use memcached to cache all sorts of data to increase responsiveness. It’s great, partly because it’s really simple. One of its simplicities, though, is that the available client libraries use a very simple hashing mechanism to assign a resource to a server. If you have N memcached servers, the server id is just (hash(key) mod N).

This isn’t great if your servers come and go, because only 1/N resources get mapped to the same server when adding or removing a server. Also, if some servers have much larger capacity than the rest, either the big servers won’t fill up, or the small servers will have a lower hit rate as they flush more frequently. Consistent Hashing is a technique for assigning resources to servers much more flexibly. When adding a new server, only 1/N resources change their server, and different servers can be assigned varying fractions of the resources.

Now we just need a Python memcached implementation...

Comments

[gravatar]
John Cavanaugh 7:53 PM on 18 Mar 2008

Ned, check out

svn://svn.audioscrobbler.net/misc/ketama/

[gravatar]
Ned Batchelder 8:08 PM on 18 Mar 2008

John, thanks, but that's a PHP library. Good to know people are getting close though...

[gravatar]
John Cavanaugh 11:22 PM on 19 Mar 2008

In the svn repos there is a python wrapper library for a c based consistent hashing library. Its not pure python, but it looks to be a simple compile away. ;-)

[gravatar]
amix 9:53 AM on 20 Nov 2008

You may want to check out: http://amix.dk/blog/viewEntry/19367 - Pure Python implementation of consistent hashing.

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
URLs auto-link and some tags are allowed: <a><b><i><p><br><pre>.