Patching Django’s Memcached cache.set() Method

Django’s memcached set method looks innocuous enough:

    def set(self, key, value, timeout=0, version=None):
        key = self.make_key(key, version=version)
        self._cache.set(key, value, self._get_memcache_timeout(timeout))

However, it discards the return value of the underlying call to memcached's set command. In most cases this is fine – calls to memcached rarely fail. But in those rare cases where something does go wrong, the silence makes debugging far more difficult. I recently had one of those "this cannot happen" moments, which ultimately turned out to be downstream of a silently failed cache.set() call. It would be much nicer if the set call either raised an exception or returned an error when the set failed. (I suggested this to the Django maintainers a while back but was summarily rejected.)

Here are two ways to work around this. The first is to patch the set method. Just put this somewhere in your code:

# CacheClass is the memcached backend class in the Django version this was
# written against; adjust the import to match the backend you actually use.
from django.core.cache.backends.memcached import CacheClass

def patched_memcached_set(self, key, value, timeout=0, version=None):
    key = self.make_key(key, version=version)
    return self._cache.set(key, value, self._get_memcache_timeout(timeout))

# Replace the stock set() with one that returns the memcached result.
CacheClass.set = patched_memcached_set

Now you can check the return value of cache.set() – True means success, False means failure.

from django.core.cache import cache
import logging

if not cache.set(key, value):
    logging.error('Failed to set key/val %s/%s in memcache!', key, value)
    # Recovery code

The second way is to use your own cache backend.

from django.core.cache.backends.memcached import BaseMemcachedCache

class MemcachedCache(BaseMemcachedCache):
    """A (slightly hacked) implementation of a cache binding using python-memcached"""
    def __init__(self, server, params):
        import memcache

        super(MemcachedCache, self).__init__(server, params,
                                             library=memcache,
                                             value_not_found_exception=ValueError)

    # Return the value of set so we can tell if the set was successful or not.
    def set(self, key, value, timeout=0, version=None):
        key = self.make_key(key, version=version)
        return self._cache.set(key, value, self._get_memcache_timeout(timeout))

This time you’ll need to do one more thing – install the cache backend in your settings.py file:

CACHES = {
    'default': {
        #'BACKEND' : 'django.core.cache.backends.memcached.MemcachedCache',
        'BACKEND' : 'apps.utils.memcached.MemcachedCache',
        'LOCATION': [
            '10.1.1.101:11211',
            '10.1.1.102:11211',
            '10.1.1.103:11211',
        ],
    }
}

Easy enough. Now, as before, you can check the return value of cache.set() to see if your set actually succeeded.

I strongly recommend one of these options if you're running Django with memcached. It'll save you a ton of debugging time when you eventually hit the rare case of cache.set() failing.


Using Memcached As A Distributed Lock From Within Django

I recently needed to restrict a critical section of code to a single thread, despite running Django across several servers. A little searching turned up the popular idea of using memcached's "add" command, which succeeds only if the key doesn't already exist and fails otherwise. All access for a given key is serialized, so you can build a nice distributed lock on top of it. I even found an implementation wrapped up as a Python context manager, so I started using it, and it seemed to work great. You just do something like

with dist_lock(my_key):
    # Critical section
    pass

Even though you might be running on different machines, the memcached “add” command ensures that only one thread gets into the critical section per value of “my_key”.
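
To see why this works, here's a minimal sketch of the add semantics through Django's cache API (the key name and timeout here are just placeholders):

from django.core.cache import cache

# add() is atomic in memcached: the value is stored only if the key is
# absent, and Django's cache.add() returns whether the store happened.
first = cache.add('my_key', 1, 120)    # True  - key was absent; we now hold the lock
second = cache.add('my_key', 1, 120)   # False - key exists; someone else holds it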

Then we got sporadic odd bug reports. Should have tested things a little more. The context-manager code I'd found has a subtle bug: if a second thread never manages to acquire the lock, it still ends up releasing it – deleting the key out from under the thread that actually holds it.
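
In other words, the shape of the original context manager was roughly this (a paraphrase of the idea, not the exact code I found):

import contextlib

@contextlib.contextmanager
def dist_lock(key, attempts=1, expires=120):
    key = '__d_lock_%s' % key
    try:
        _acquire_lock(key, attempts, expires)  # raises if the lock is never acquired
        yield
    finally:
        # BUG: this always runs, even when _acquire_lock raised, so a thread
        # that never held the lock deletes the key and frees the real holder's lock.
        _release_lock(key)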

Here’s my attempt at fixing up the problem. (Note the addition of the “got_lock” variable):

import time
import logging
import contextlib
import random
from django.core.cache import cache as django_cache

class MemcacheLockException(Exception):
    pass

@contextlib.contextmanager
def memcache_lock(key, attempts=1, expires=120):
    key = '__d_lock_%s' % key

    got_lock = False
    try:
        got_lock = _acquire_lock(key, attempts, expires)
        yield
    finally:
        if got_lock:
            _release_lock(key)

def _acquire_lock(key, attempts, expires):
    for i in xrange(0, attempts):
        stored = django_cache.add(key, 1, expires)
        if stored:
            return True
        if i != attempts-1:
            sleep_time = (((i+1)*random.random()) + 2**i) / 2.5
            logging.debug('Sleeping for %s while trying to acquire key %s', sleep_time, key)
            time.sleep(sleep_time)
    raise MemcacheLockException('Could not acquire lock for %s' % key)

def _release_lock(key):
    django_cache.delete(key)

I use it like this:

try:
    with memcache_lock(my_key):
        # Critical section
        pass
except MemcacheLockException:
    # Never got the lock
    pass

This seemed to work great! Then we got odd sporadic bugs. Should have tested more.

Turns out Django's use of memcached isn't entirely optimal. For each request, Django creates a new client to your set of memcached servers, which means a new TCP connection (assuming you're using TCP) to each memcached server in your cluster. Even though having more memcached servers typically helps with scale, this behavior becomes a performance hit under high load: with 5 memcached servers, each Django request creates 5 TCP connections (and then destroys them when the request is over). I'll have another blog post soon on how I've mitigated that issue.

Anyway, under high load weird things can happen. Resources get scarce, CPU or I/O becomes a bottleneck, and connections time out. I'm using the python-memcached library, a pure-Python interface to memcached. It's perfectly good, but it wasn't designed for this environment.

During a memcached call, the library needs to figure out which memcached server to use based on a hash of the key. Suppose the key hashes to memcached_1, and suppose that at that moment attempts to connect to memcached_1 fail. The code then rehashes the key, probably landing on a different memcached server, and tries to connect again.

    def _get_server(self, key):
        if isinstance(key, tuple):
            serverhash, key = key
        else:
            serverhash = serverHashFunction(key)

        for i in range(Client._SERVER_RETRIES):
            server = self.buckets[serverhash % len(self.buckets)]
            if server.connect():
                #print "(using server %s)" % server,
                return server, key
            serverhash = serverHashFunction(str(serverhash) + str(i))
        return None, None

This is probably fine when you've configured multiple memcached servers and one is down for a long time, but it's not fine for the occasional network blip. Suppose two requests come in under load, both reach the critical section of the memcached lock, and one hashes the key to memcached_1 while the other hashes it to memcached_2. No locking occurs, and two threads enter the same critical section at the same time for the same key. Actually, we don't have to suppose this will happen. It will happen. It happened.
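
To make the rehash concrete, here's a toy sketch using python-memcached's module-level serverHashFunction and the retry formula from _get_server above (the bucket list is hypothetical):

from memcache import serverHashFunction

buckets = ['10.1.1.101:11211', '10.1.1.102:11211', '10.1.1.103:11211']
key = '__d_lock_my_key'

# Healthy path: the same key always hashes to the same bucket.
serverhash = serverHashFunction(key)
print 'first choice:', buckets[serverhash % len(buckets)]

# If connect() fails, _get_server rehashes (i == 0 on the first retry),
# usually landing on a different server for the very same key.
serverhash = serverHashFunction(str(serverhash) + str(0))
print 'retry choice:', buckets[serverhash % len(buckets)]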

There are probably a bunch of ways to fix this, but I wanted a consistent hash: if key1 hashes to memcached_1, it should always hash to memcached_1, and if I can't talk to memcached_1, fail the request back to the caller. So I simply decided not to retry. Here's how I made that happen.

First, I created a new cache backend by cribbing from Django's MemcachedCache. When the backend is initialized, I reach into the python-memcached library and override a few constants.

from django.core.cache.backends.memcached import BaseMemcachedCache

class MemcachedCache(BaseMemcachedCache):
    """A (slightly hacked) implementation of a cache binding using python-memcached"""
    def __init__(self, server, params):
        import memcache

        # Monkey patch memcache library so it doesn't retry on failure to connect to memcache servers.
        # If we allow it to retry on connection failures it uses a different hash on subsequent
        # attempts, and this can lead to us using a different memcache server for a key than would
        # normally be used.  Future reads of that key can thus return missing or wrong data.
        memcache.Client._SERVER_RETRIES = 1
        memcache._SOCKET_TIMEOUT        = 20
        memcache._DEAD_RETRY            = 0

        super(MemcachedCache, self).__init__(server, params,
                                             library=memcache,
                                             value_not_found_exception=ValueError)

Then, I use this new cache backend in my settings.py file instead of the typical one provided by Django:

CACHES = {
    'default': {
        #'BACKEND' : 'django.core.cache.backends.memcached.MemcachedCache',
        'BACKEND' : 'apps.utils.memcached.MemcachedCache',
        'LOCATION': [
            '10.1.1.101:11211',
            '10.1.1.102:11211',
            '10.1.1.103:11211',
        ],
    }
}

And voilà! A hardened distributed lock using memcached for use within Django.

A couple of things to think about with this approach:

  • Disabling retries in python-memcached means memcache calls will fail if the code can't connect. If you're doing long maintenance where one or more of your memcached servers is down for an extended period, consider either removing them from LOCATION or restoring retries (a sketch of restoring retries follows this list).
  • Memcached can crash (although I’ve never seen this in production), be restarted, or run out of memory and drop your keys/values. Any of these at exactly the right moment will undermine this distributed locking strategy.
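
If you do need retries back temporarily (say, for planned maintenance), it's just a matter of flipping the constant before the backend is constructed; 10 is python-memcached's stock default, but treat this as a sketch:

import memcache

# Restore python-memcached's default retry count, accepting that a retried
# key may temporarily hash to a different server.
memcache.Client._SERVER_RETRIES = 10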