This time a year ago we were deploying new code using a completely custom deployment system written in C++. And I don’t mean real C++; it was more like C with classes, where the original developers decided that std::string was “not fast enough” and wrote their own string struct:

    const uint8_t *pbData;
    size_t cbData;

It’s not our idea of fun to worry about buffer sizes and string lengths when writing high-level deployment scripts.

Then there was the NIH distributed file transfer system — client and server. And our own diffing library, just for fun. All very worthwhile things for a hotel review website to spend time developing in-house! :-)

Screenshot of our Ansible-based deployment

Sarcasm aside, this wasn’t a joke: we replaced more than 20,000 lines of C++ code with about 1,000 lines of straightforward Ansible scripts. And it really did restore our sanity:

  • Rather than 28 manual steps (some of which, if you ran out of order, could bring the site down) we run a single Ansible command. All we have to specify manually is which revision to deploy and type in some deployment notes to record to our internal log (for example, “Shipped mobile version of hotel page”).
  • Instead of spending hours digging into log files on a remote machine whenever our fragile home-grown system broke, Ansible gives us clear and generally easy-to-track down error messages. The most we have to do is SSH to a machine and manually restart something.

Choice of tools

Some teams within TripAdvisor use Chef for server setup (and other tools like Fabric for code deployments). We also looked briefly at Puppet. However, both Chef and Puppet gave us a very “enterprisey” feel, which isn’t a great match for our team’s culture.

This is partly due to their agent-based model: Chef, for example, requires a Chef server in between the runner and the nodes, and requires you to install clients (“agents”) on each of the nodes you want to control. I think this picture gives a pretty good idea of the number of components involved:

Chef Diagram

In contrast, Ansible has basically five parts:

  • playbooks
  • inventory files
  • vars files
  • the ansible-playbook command
  • nodes

I’m sure there are advantages and more power available to systems like Chef, but we really appreciated the simplicity of the Ansible model. Two things especially wooed us:

  1. You don’t have to install and maintain clients on each of the nodes. On the nodes, Ansible only requires plain old SSH and Python 2.4+, which are already installed on basically every Linux system under the sun. This also means developers don’t have to learn a new type of authentication: ordinary SSH keys or passwords work great.
  2. Simple order of execution. Ansible playbooks and plays run from top to bottom, just like a script. The only exception to this is “handlers”, which run at the end of a play if something has changed (for example, to reload the web server config).


Ansible Tower UI

Ansible itself is free and open source and available on GitHub. But they also provide a fancy web UI to drive it, called Ansible Tower. It’s nice and has good logging and very fine-grained permissions control, but we found it was somewhat tricky to install in our environment, and as developers it didn’t gain us much over running a simple command.

Our thinking is that in a larger organization, where they need finer-grained permissions or logging, or where non-developers need to kick off deployments, using Ansible Tower would pay off.

Our deployment scripts

As noted above, Ansible has a very simple order of execution, and its model is kind of a cross between declarative (“get my system configuration into this state”) and imperative (“do this, run this, then try this”). Our site deployment involves some system configuration, but is mostly a series of steps that “orchestrate” the deployment. Here’s more or less how it works:

  1. Setup: update code, run tests on staging server, upload new static assets.
  2. Turn off B app servers, run on A (we have 8 Python app servers in each group).
  3. Update code on B app servers.
  4. Turn off A app servers, run on B (making new code live on B).
  5. Update code on A app servers.
  6. Make all A and B app servers live.
  7. Record deployment log and send “finished deployment” email.
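
The ordering of steps 2 to 6 can be modelled in a few lines of Python. This is only an illustrative sketch, not our Ansible playbook: the group names follow the `app_b` inventory naming used in this post (`app_a` is assumed by analogy), and `plan_deployment` and `check_always_serving` are hypothetical helpers that plan and check steps rather than run anything.

```python
def plan_deployment(groups=("app_a", "app_b")):
    """Return the A/B deployment steps as (action, group) pairs, in order.

    Hypothetical helper for illustration only; it plans steps, it does not run them.
    """
    a, b = groups
    return [
        ("take_offline", b),   # step 2: serve from A only
        ("update_code", b),    # step 3
        ("bring_online", b),   # step 4: new code goes live on B...
        ("take_offline", a),   #         ...and A stops serving
        ("update_code", a),    # step 5
        ("bring_online", a),   # step 6: everything live again
    ]


def check_always_serving(steps, groups=("app_a", "app_b")):
    """Check that at least one group is live after every step, and that
    code is only ever updated on a group that is offline."""
    live = set(groups)
    for action, group in steps:
        if action == "take_offline":
            live.discard(group)
        elif action == "bring_online":
            live.add(group)
        elif action == "update_code" and group in live:
            return False
        if not live:
            return False
    return True


print(check_always_serving(plan_deployment()))
```

The point of the check is the property we care about during a deployment: the site never loses both groups at once, and nobody updates code underneath a live server.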

To show you some example Ansible code, step 3 (and step 5) use the following code:

- name: Update code on B app servers
  hosts: app_b
  tasks:
  - name: Update code on app servers
    subversion: repo={{ svn_repo }} dest={{ code_dir }} username={{ svn_username }}
                password={{ svn_password }} revision={{ svn_revision }}

  - name: Restart app service
    service: name=server-main state=restarted

  - name: Wait for app server to start
    wait_for: port={{ app_port }} timeout=300

  - name: Check that new version is running
    local_action: uri url=http://{{ inventory_hostname }}:{{ app_port }}{{ version_url }}
    register: response
    failed_when: response.json['SvnRevision'] != {{ svn_revision }}

As you can see, Ansible uses fairly straightforward YAML syntax. In the above code, Ansible runs these tasks against our 8 “app_b” hosts in parallel, a simple but powerful concept.

For a given “play” such as the above, each task is executed in order — we really appreciated how it doesn’t try to outsmart you in terms of how and when things run. The only exception to this is Ansible’s handlers, which are tasks run at the end of a play, but only if something “notified” them. For example, in our deployment, handlers are used to restart our nginx servers when the nginx config file changes.

You’ll see there are a lot of {{ variables }} used here: each task line is actually a Jinja2 template string that is rendered against your current set of host variables. This makes it very easy to modify settings which change depending on environment (staging, production, etc). It also separates playbooks from user-specific data, meaning settings aren’t hard-coded in playbooks and folks can share them much more easily.

We deploy solely to Linux-based machines (about 50 nodes), and Linux is where Ansible started and where it excels. However, we have something of a Windows history, so it was interesting to learn that as of August 2014 (version 1.7), Ansible started adding support for managing Windows machines. This is done via PowerShell remoting rather than SSH.

In short, what sold Ansible to us was:

  • Simple YAML-based syntax
  • Simple execution order: top to bottom, and then handlers
  • Powerful: Jinja2 templates, large library of builtin modules
  • Agentless: no client to install and maintain

Pre-Ansible, we dreaded our 28-manual-step deployments. Post-Ansible, it’s almost fun to deploy code, and the focus is on the features we’re deploying, instead of “what’s going to go wrong this time?”. So I hope you get the chance to try Ansible! And no, we weren’t paid to link that…


When Building Your Own CMS is the Right Choice

by Alex on February 16, 2015

In the latter half of last year, we decided to replace the CMS that powers our content. Actually, we replaced three CMSs with a single one. Oyster is primarily in the business of creating content such as our in-depth hotel reviews, roundups, slideshows, and various other articles that help travelers spend their hard-earned vacation days and dollars wisely. So we knew it was an important task to build the best tool we could to enable our writing and editing staff to put out high-quality content easily.


Obviously we’ve been doing this for a while, so we had tools in place, but we had reached a point where we needed to make a change. As I mentioned above we had three different CMSs that we used: one for hotel reviews, one for articles (both of these were custom), and a WordPress blog for blog posts.

Documents and Structure

Pros, Cons & Bottom Line

The custom editors were used for creating structured documents which consisted of a number of sections (such as the Pros, Cons, and Bottom Line sections of our hotel reviews) which in turn consisted of a number of fields. These were stored in a custom text format and any text formatting was stored as wiki markdown. This made it harder than it needed to be to update documents’ structure or create new document types since all the code that parsed and rendered the documents was custom. Also the UIs for the custom editors were due for a good refresh.

The WordPress editor presented different problems. WordPress is quite good for producing a nicely formatted bit of text, but what you get when you write a post is a big blob of HTML. Formatting and styles are all mixed in with your content. Also everything is totally static, so if a hotel changes names or closes, or some other piece of information in our hotel database changes, it doesn’t get updated in the blog post. We knew we wanted our blog content to be integrated with the same database used by our hotel reviews so we could more easily surface rich information about hotels and pricing.

We realized we wanted to keep the concept of structured documents; our hotel reviews have a well-defined format, and we need to be able to write those in a structured way. Similarly our roundups always consist of an intro section and a list of hotels each with a short blurb relevant to the roundup topic. At the same time we want our writers to have the flexibility to produce more freeform blog-oriented content with some freedom in formatting. We decided we could do this by defining a set of formatting blocks that reflected style conventions they were already using, with an eye to extending these fairly easily as needed. This frees the writers from having to think about layout, so they can focus on what they want to say. Having well-defined formatting blocks, or “widgets,” also means we can create responsive templates for how the articles display – we can make it look good on a desktop, phone, or tablet since the documents contain content information and not layout information.

JSON Documents

When deciding how the documents should be stored, it was really a no-brainer that they should be stored as JSON. JSON plays well with pretty much anything these days. Any server side language you use (we use Python) ought to be able to parse JSON into a useful data structure with a few lines of code. We use Postgres for our content database, and Postgres has a built-in JSON type that you can query against, put indexes on, and use with various functions and operators. Storing documents as JSON in the database means we don’t have to change the database schema every time we want to add a new widget type or document field, but we don’t really have to compromise on queryability either.
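
To make the “few lines of code” claim concrete, here is a minimal sketch using Python’s standard json module. The document is a trimmed-down, abbreviated stand-in for one of our slideshow documents, and the variable names are made up for the example.

```python
import json

raw = """
{
    "Title": "Best Beach Hotels in Miami",
    "IsOrdered": true,
    "Hotels": [
        {"Url": "/miami/hotels/the-setai/", "Type": "Hotel"},
        {"Url": "/miami/hotels/w-south-beach/", "Type": "Hotel"}
    ]
}
"""

doc = json.loads(raw)                      # JSON text -> dicts, lists, strings, booleans
hotel_urls = [hotel["Url"] for hotel in doc["Hotels"]]

print(doc["Title"])
print(hotel_urls)

# Round-trip back to JSON text, for example before storing it in a JSON column.
stored = json.dumps(doc, sort_keys=True)
```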

Part of a Slideshow Document

    {
        "IsOrdered": true,
        "IsAward": true,
        "ShowAboutOyster": false,
        "FeaturedOnArticles": false,
        "Title": "Best Beach Hotels in Miami",
        "Intro": "<p>A team of Oyster reporters has made multiple trips to Miami to visit nearly 200 hotels. We slept in the beds, lounged by the pools, ate in the restaurants, and even sampled the nightlife, all with an eye toward selecting the most distinguished properties. Here's a list of our favorite beachfront hotels.</p>",
        "Hotels": [
            {
                "Url": "/miami/hotels/the-setai/",
                "Type": "Hotel",
                "Blurb": "<p>Paradise doesn't come cheap. Striking but sober mood-lit design; impeccable service; huge, immaculate rooms; three pools, each a different temperature; and a prime beachside location make the Setai one of the best hotels in Miami. Its restaurants are more about design than food, but several of Miami's best restaurants are just half a block away.</p>",
                "PhotoUrl": "/miami/hotels/the-setai/photos/beach-the-setai-v134081"
            },
            {
                "Url": "/miami/hotels/w-south-beach/",
                "Type": "Hotel",
                "Blurb": "<p>The stunning new 312-room W South Beach -- located on the beach, on the northern outskirts of South Beach -- blends cute comforts, intricate design (that spares no expense), and flawless service. Large, modern rooms; terraces angled to overlook the ocean; elegant landscaping around the pool; a freshly-opened spa -- the W tops the Miami greats.</p>",
                "PhotoUrl": "/miami/hotels/w-south-beach/photos/beach-w-south-beach-opening-may-2009-v289241"
            }
        ]
    }

And of course the UI for our CMS is web-based, which means the bulk of the functionality is written in JavaScript, so working with JSON on the front end is extremely easy.

I mentioned before that our documents have a well-defined structure, but a JSON object just consists of arrays, strings, numbers, booleans, and more nested objects. What we needed was a way to define how a given JSON document of a certain type is supposed to look – a JSON schema if you will. So we used, unsurprisingly, JSON Schema. Much like XML DTD does for XML documents, JSON Schema lets you define how a JSON object should be structured, and the definition itself is a JSON object. It provides the basics such as what types of values are allowed for a given property, which properties are required, max and min ranges, enums, regexes, most of what you’ll need. You can also have nested schemas, so we can define a “Slide” schema, and then say the “Slideshow” schema consists of a title, an intro paragraph, and one or more Slides.

JSON Schema for Slideshow documents

"Slideshow": {
    "allOf": [
        {"$ref": "#/definitions/baseArticle"},
        {
            "properties": {
                "IsAward": {"type": "boolean", "default": false},
                "Slides": {
                    "type": "array",
                    "items": {"$ref": "#/definitions/Slide"}
                }
            },
            "required": ["Slides"]
        }
    ]
}

Well it’s nice to have your document structure defined, but you have to do something with that information. Namely you want to be able to validate your documents and get useful error information when the validation fails. For that we used the Python jsonschema package. When a writer saves a document in our CMS, it sends a JSON object to the server. The CMS back end validates the document against the relevant schema, and we get back a handy error tree that tells us what went wrong. Since the document structure on the server matches the structure on the CMS UI, it’s not too hard to parse that error tree and match error messages to input fields on the writer’s screen to show them some helpful feedback: “This field is required,” “This is an invalid URL,” and so on.
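
To illustrate the idea of matching validation errors to input fields, here is a rough sketch. It is deliberately not the jsonschema package: `validate` is a tiny hand-rolled stand-in that understands only `type`, `required`, `properties` and `items`, and the schemas (including the `Title` field on a slide) are simplified assumptions for the example. What it shows is the shape of the result: a mapping from a field’s path in the document to a message the UI can show next to that field.

```python
def validate(doc, schema, path=""):
    """Tiny stand-in validator: supports only "type", "required",
    "properties" and "items". Returns {field_path: message}."""
    types = {"string": str, "boolean": bool, "array": list, "object": dict}
    errors = {}

    expected = schema.get("type")
    if expected and not isinstance(doc, types[expected]):
        errors[path or "(root)"] = "Expected a value of type %s" % expected
        return errors

    if isinstance(doc, dict):
        for name in schema.get("required", []):
            if name not in doc:
                errors[path + "/" + name] = "This field is required"
        for name, subschema in schema.get("properties", {}).items():
            if name in doc:
                errors.update(validate(doc[name], subschema, path + "/" + name))

    if isinstance(doc, list) and "items" in schema:
        for index, item in enumerate(doc):
            errors.update(validate(item, schema["items"], "%s/%d" % (path, index)))

    return errors


# Simplified, hypothetical schemas for the example.
slide_schema = {
    "type": "object",
    "properties": {"Title": {"type": "string"}},
    "required": ["Title"],
}
slideshow_schema = {
    "type": "object",
    "properties": {
        "IsAward": {"type": "boolean"},
        "Slides": {"type": "array", "items": slide_schema},
    },
    "required": ["Slides"],
}

errors = validate({"IsAward": "yes", "Slides": [{"Title": "Pool"}, {}]}, slideshow_schema)
for field_path in sorted(errors):
    print(field_path, "->", errors[field_path])
```

Because the paths mirror the document structure, the front end can walk the same structure to find the input that each message belongs to.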

Error Handling

UI Concerns

That brings us to the front end of the CMS. There are of course various pages that allow you to search through documents by types and tags, and see the editing history, but most of that was ground we had covered before. A large part of the work done was on the document editor itself – the interface our writers use for creating a single document.

To create the document editor, we wrote a healthy amount of JavaScript. Basically the editor needs to take the concept of structured Documents composed of Widgets and show the user an intuitive UI for writing and editing.

So let’s say you’ve started writing a new Travel Guide article. What happens is an animated paperclip with eyes pops up and says “I see you’re writing a new Travel Guide, need some help?” Wait no, that’s not what happens.


What you’ll see is a mostly blank document with inputs for some top-level fields and then spaces to populate with different Widgets. The toolbar on the right hand side has a list of the relevant Widget types which you can drag into place in the document. All the Widgets can be dragged and dropped in the different places they can go in the Document.

Each Widget contains various fields for text, URLs, checkboxes, drop-down menus, etc. Some fields allow WYSIWYG editing with a whitelisted subset of HTML tags via the WYSIHTML5 editor. We store the HTML as HTML in the JSON rather than markdown – since we allow such a small list of tags, and the editor outputs well-formed markup, it’s perfectly safe, so we figured why go through an extra encoding and decoding step?

The design for how the editor turns a JSON document into a UI and back again is quite involved, and highly structured, but follows the principles of how the documents are organized. Each Document type, Widget type, and Field type corresponds to a JavaScript class in our editor code, and these classes all inherit from a common base class.

When the editor starts up to edit a Document, we just pass in a JSON object from the server. Each object and nested object in the JSON has a “Type” field which is used to call the proper constructor for the main Document object and its constituent Fields and Widgets to turn the plain JSON objects into class instances. These classes provide, at minimum, a fromJSON method to assign JSON properties to instance properties, and a toJSON method to put the instance properties into a plain JavaScript object that’s ready for serialization. Some classes also provide methods for things like sanitizing input, formatting error messages, providing a word count, etc.
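
The “Type” dispatch described above is easy to sketch outside the browser too. The following is a minimal Python analogue, not our editor code (which is JavaScript): the registry, the `revive` and `dump` helpers, and the `Slide` and `Slideshow` fields are all hypothetical names chosen for the example.

```python
REGISTRY = {}  # maps a JSON "Type" value to the class that handles it


def register(cls):
    """Class decorator: make cls constructible from JSON with a matching Type."""
    REGISTRY[cls.type_name] = cls
    return cls


class Model(object):
    type_name = None
    fields = ()

    @classmethod
    def from_json(cls, data):
        """Build an instance, recursing into nested objects that carry a Type."""
        obj = cls()
        for name in cls.fields:
            setattr(obj, name, revive(data.get(name)))
        return obj

    def to_json(self):
        """Return a plain dict ready for json.dumps, dropping empty values."""
        out = {"Type": self.type_name}
        for name in self.fields:
            value = dump(getattr(self, name))
            if value not in (None, "", []):
                out[name] = value
        return out


def revive(value):
    """Turn plain JSON data into class instances wherever a known Type appears."""
    if isinstance(value, dict) and value.get("Type") in REGISTRY:
        return REGISTRY[value["Type"]].from_json(value)
    if isinstance(value, list):
        return [revive(item) for item in value]
    return value


def dump(value):
    """Inverse of revive: turn instances back into plain JSON-friendly data."""
    if isinstance(value, Model):
        return value.to_json()
    if isinstance(value, list):
        return [dump(item) for item in value]
    return value


@register
class Slide(Model):
    type_name = "Slide"
    fields = ("Caption",)


@register
class Slideshow(Model):
    type_name = "Slideshow"
    fields = ("Title", "Slides")


show = revive({
    "Type": "Slideshow",
    "Title": "Miami",
    "Slides": [{"Type": "Slide", "Caption": "Beach"}, {"Type": "Slide", "Caption": ""}],
})
print(show.to_json())
```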

A method for turning a class instance into a simple JSON object

models.Model.prototype.toJSON = function () {
    // return a JSON-friendly object

    var jsonObj = {}, property, jsonProperty, value;

    for (property in this) {
        value = this[property];
        // skip null/undefined values and anything that can't serialize itself
        if (value && value.toJSON) {
            // JSON properties are uppercased
            jsonProperty = util.ucFirst(property);

            jsonObj[jsonProperty] = value.toJSON();
        }
    }

    if (this.typeName) {
        jsonObj.Type = this.typeName;
    }

    // delete null properties, empty strings and empty arrays
    for (property in jsonObj) {
        value = jsonObj[property];
        if (value === null || value === '' || (Array.isArray(value) && value.length === 0)) {
            delete jsonObj[property];
        }
    }

    return jsonObj;
};

So you might rightly ask “okay so you’ve got your document, and you turned that into some other objects, but how do you, you know, do stuff with it?” Well the “doing stuff” part of the editor – putting in text, dragging things around, deleting, and adding things – is all handled with data binding.

If you’re interested at all in building JavaScript-based UIs, you’re undoubtedly familiar with the concept of data binding. In short you have some object or objects which represent your data (in our case, the Document) and the UI which consists of a bunch of DOM nodes. When someone changes the DOM representation of your data, you want that to be updated in the model object, and similarly changes to the data model should be reflected in the UI. Data binding is the practice of doing this in an explicit and automated fashion so that your UI and your data are always in sync.
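
Stripped of the DOM, two-way binding is a small idea. Here is a language-neutral sketch in Python, assuming nothing about how Rivets or any framework implements it: `Observable` and `FakeInput` are hypothetical classes, with `FakeInput` standing in for a text input on the page.

```python
class Observable(object):
    """Model whose attribute writes notify subscribed callbacks."""

    def __init__(self, **values):
        object.__setattr__(self, "_listeners", {})
        for name, value in values.items():
            object.__setattr__(self, name, value)

    def subscribe(self, name, callback):
        self._listeners.setdefault(name, []).append(callback)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        for callback in self._listeners.get(name, []):
            callback(value)


class FakeInput(object):
    """Stand-in for a DOM text input bound to one model attribute."""

    def __init__(self, model, name):
        self.model, self.name = model, name
        self.text = getattr(model, name)
        model.subscribe(name, self._on_model_change)   # model -> view

    def _on_model_change(self, value):
        self.text = value

    def type(self, text):
        """Simulate the user typing: push the new text into the model."""
        setattr(self.model, self.name, text)            # view -> model


doc = Observable(title="Draft")
box = FakeInput(doc, "title")

doc.title = "Miami"            # a model change shows up in the "UI"
box.type("Best Beaches")       # a UI change shows up in the model
print(doc.title, box.text)
```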

A number of popular JavaScript application frameworks like Angular, Backbone, and Ember provide data binding as part of what they do, but they tend to be more useful when building large, single page applications as they provide lots of other features such as URL routing, module loading, and dependency injection. They also tend to be pretty opinionated about how your code is structured, and often learning to use them correctly is quite involved.

We really only wanted something to handle the data binding piece – a data binding library rather than a whole application framework. To that end we chose Rivets.js for a number of reasons. Rivets.js is small first and foremost – both in the scope of what it does (it does data binding, and that’s it) and in terms of source code. It also doesn’t care about what your data model looks like – it binds to regular object properties, so your model object can be a plain object or some custom class you wrote, whatever. It’s also easy to learn, easy to use, and easy to extend.

Rivets bindings in use:

<div id="tag-search">
        <input type="text" rv-edit="tagSearch:searchTerm" rv-on-keydown="tagSearch:keyDown">
        <i class="fa fa-search search-main-icon" rv-hide="tagSearch:isSearching"></i>
        <i class="fa fa-circle-o-notch fa-spin search-main-icon" rv-show="tagSearch:isSearching"></i>
        <i class="fa fa-plus search-main-icon" rv-on-click="tagSearch:clickAdd"></i>

        <ul id="tag-results" rv-show="tagSearch:results:length">
            <li rv-each-result="tagSearch:results" rv-class-selected="result:selected" rv-on-click="tagSearch:clickResult">
                <span class="tag-icon" rv-addclass="result:type">
                    <i class="fa fa-tag" rv-match-tag="result:type"></i>
                    <i class="fa fa-map-marker" rv-match-location="result:type"></i>
                    <i class="fa fa-cubes" rv-match-category="result:type"></i>
                </span>
                <span rv-html="result:html"></span> ({ result:count })
            </li>
        </ul>
</div>

One instance of how we extended Rivets was for templating. Each of our Widget and Field classes has various properties and functionality, and each also has a particular way it needs to display on the screen. We wanted to have a snippet of HTML for each Widget to use as a template, and use Rivets bindings within the template. Frameworks like Angular let you define partial templates, nest them inside each other, and each template can have its own isolated scope. Rivets doesn’t support this out of the box, but it turned out to be easy enough to add a new template binder that does just that. It’s pretty basic – it doesn’t do lazy-loading or have sophisticated scoping options, but it works fine for what we needed, and you never have to try to remember what the word “transclude” means.

Partial template binder for Rivets.js

rivets.binders['template-*'] = {
    bind: function(el) {
    },

    unbind: function(el) {
        var children = $(el).children(), boundView;

        //unbind the view from the child element
        if (children.length) {
            boundView = $(children[0]).data('templateBoundView');

            if (boundView) {
                // reconstructed: this statement is missing from the listing
                boundView.unbind();
            }
        }
    },

    routine: function(el, value) {
        var modelName, templateName;

        if (!value) {
            console.log('missing value', el);
            // reconstructed: bail out, since value is dereferenced next
            return;
        }

        templateName = value.template.toLowerCase();
        modelName = this.type.split('-')[1];

        renderTemplate(el, templateName, modelName, value);
    }
};

function renderTemplate(el, templateName, modelName, model) {
    var myEl, html, templateData, child, view;
    myEl = $(el);

    if (myEl.html()) {
        // assumed: the body of this check is missing from the listing;
        // presumably the template is already rendered, so do nothing
        return;
    }

    //insert the html - must have 1 root element
    html = EDITOR_TEMPLATES[templateName];
    if (!html) {
        throw new Error("Can't find template for: " + templateName);
    }
    // reconstructed: the insertion itself is missing from the listing
    myEl.html(html);

    //bind the view to the child element
    templateData = {};
    templateData[modelName] = model;
    child = $(myEl.children()[0]);
    view = rivets.bind(child, templateData);
    $(child).data('templateBoundView', view);
}

One other way we had to modify our use of Rivets deals with how the library binds to object properties. Out of the box, Rivets detects changes in the data model by wrapping attribute access with getters and setters. That works fine if the properties you want to observe hold primitive values like Strings and Integers and the like – but for some of our bindings we needed to detect changes on entire objects, and especially arrays.

Thankfully Rivets allows you to write new adapters to modify how change detection happens. Also thankfully, we only needed to support recent versions of Chrome on the browser side of things, and Chrome now supports a native Object.observe and Array.observe. So when a new Widget gets pushed onto an array inside our Document, the iteration binder that renders the Widgets gets updated automatically.

Too often internal tools don’t get the attention they deserve – they are often left to languish while development resources are put towards customer-facing features and operational concerns. In our situation we felt the need to get our content management tools right and do the smart thing rather than what would necessarily be the easy thing. It took a lot of hard work from our dev team, but so far it has paid off in terms of productivity and agility. There were some interesting challenges along the way, and we’ll have to continually adapt our tools as the business grows, but I think we’ve set ourselves up in a good position from which to move forward.


Saving 9 GB of RAM with Python’s __slots__

by Ben on November 17, 2013

We’ve mentioned before how our Python-based web servers cache huge amounts of static content in huge Python dicts (hash tables). Well, we recently saved over 2 GB in each of four 6 GB server processes with a single line of code: using __slots__ on our Image class.

Here’s a screenshot of RAM usage before and after deploying this change on one of our servers:

RAM usage before and after deploying this change

We allocate about a million instances of a class like the following:

class Image(object):
    def __init__(self, id, caption, url):
        self.id = id
        self.caption = caption
        self.url = url

    # ... other methods ...

By default Python uses a dict to store an object’s instance attributes. Which is usually fine, and it allows fully dynamic things like setting arbitrary new attributes at runtime.
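
A quick way to see that per-instance dict for yourself (the attribute values here are made up for the example):

```python
class Image(object):
    def __init__(self, id, caption, url):
        self.id = id
        self.caption = caption
        self.url = url


img = Image(1, "Beach at sunset", "/photos/1.jpg")

# Instance attributes live in a per-object dict...
print(img.__dict__)

# ...which is why arbitrary new attributes can be added at runtime.
img.photographer = "staff"
print(sorted(img.__dict__))
```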

However, for small classes that have a few fixed attributes known at “compile time”, the dict is a waste of RAM, and this makes a real difference when you’re creating a million of them. You can tell Python not to use a dict, and only allocate space for a fixed set of attributes, by setting __slots__ on the class to a fixed list of attribute names:

class Image(object):
    __slots__ = ['id', 'caption', 'url']

    def __init__(self, id, caption, url):
        self.id = id
        self.caption = caption
        self.url = url

    # ... other methods ...

Note that you can also use collections.namedtuple, which allows attribute access, but only takes the space of a tuple, so it’s similar to using __slots__ on a class. However, to me it always feels weird to inherit from a namedtuple class. Also, if you want a custom initializer you have to override __new__ rather than __init__.
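
Here is a small side-by-side sketch of the two options. The class names and the caption-stripping in `__new__` are just for illustration; the point is that neither kind of instance carries a `__dict__`, and that the namedtuple version has to do its custom initialization in `__new__`.

```python
from collections import namedtuple


class SlottedImage(object):
    __slots__ = ['id', 'caption', 'url']

    def __init__(self, id, caption, url):
        self.id = id
        self.caption = caption
        self.url = url


class TupleImage(namedtuple('TupleImage', ['id', 'caption', 'url'])):
    __slots__ = ()  # keep the subclass from growing a per-instance dict

    def __new__(cls, id, caption, url):
        # Tuples are immutable, so customization happens in __new__, not __init__.
        return super(TupleImage, cls).__new__(cls, id, caption.strip(), url)


slotted = SlottedImage(1, "Pool", "/photos/1.jpg")
tupled = TupleImage(2, "  Beach  ", "/photos/2.jpg")

try:
    slotted.photographer = "staff"   # not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True

print(rejected, hasattr(slotted, "__dict__"), tupled.caption)
```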

Warning: Don’t prematurely optimize and use this everywhere! It’s not great for code maintenance, and it really only saves you when you have thousands of instances.


If you use Python on Windows and you have programs or servers which allocate a lot of items on the heap (both of which we do), you should upgrade to Python 2.7.4. Especially if you do anything with HTTPS/SSL connections.

Python versions 2.7.3 and below use an older version of OpenSSL, which has a serious bug that can cause minutes-long, CPU-bound hangs in your Python process. Apart from the process taking over your CPU, the symptom we saw was a socket.error with the message “[Errno 10054] An existing connection was forcibly closed by the remote host”. This is because the HTTPS request is opened before the OpenSSL hang kicks in, and it takes so long that the remote server times out and closes the connection.

The cause of the bug is actually quite arcane: the Windows version of OpenSSL uses a Win32 function called Heap32Next to walk the heap and generate random data for cryptographic purposes.

However, a call to Heap32Next is O(N) if there are N items in the heap, so walking the heap is an O(N²) operation! Of course, if you’ve got 10 million items on the heap, this takes about 5 minutes. The first connection to an HTTPS server (which uses OpenSSL) essentially brings Python to a grinding halt for this time.

There’s a workaround: call the ssl.RAND_status() function on startup, before you’ve allocated the big data on your heap. That seemed to fix it, though we didn’t dig too deep to guarantee the fix. We were still running on Python 2.6, and given that the just-released 2.7.4 addressed this issue by using a newer version of OpenSSL, we fixed this by simply upgrading to Python 2.7.4. Note that even Python 2.7.3 has the older version of OpenSSL, so be careful.
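
For reference, the workaround amounts to something like the following at the very top of your startup code. This is a sketch of the ordering only: `load_caches` is a hypothetical stand-in for whatever allocates your big data, and as noted above we didn’t verify that this is a guaranteed fix.

```python
import ssl

# Touch OpenSSL's random number generator while the heap is still small.
# The return value only reports whether the PRNG considers itself seeded.
prng_seeded = ssl.RAND_status()


def load_caches():
    """Hypothetical stand-in for the startup code that allocates the big data."""
    return [{i: i} for i in range(1000)]


cache = load_caches()
print(bool(prng_seeded), len(cache))
```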

Other interesting things we found while hunting down this bug:

  • At first we thought this was a bug in Python’s SSL handling, and it turns out there’s a strangely similar bug in Python 2.6’s SSL module. This was interesting, but it wasn’t our problem.
  • Microsoft’s Raymond Chen has a very good historical explanation of why walking the heap with Heap32Next is O(N²), and why OpenSSL shouldn’t really be using this function.
  • You can reproduce the Heap32Next hang just by allocating a ton of Python objects (eg: x = [{i: i} for i in range(500000)]) and seeing the first HTTPS request take ages, with the CPU sitting at around 100%.
  • A blog post with graphs showing Heap32Next’s O(N) behaviour, as well as the connection to OpenSSL.
  • What’s new in Python 2.7.4 notes the update to the bug-fixed OpenSSL version 0.9.8y on Windows.
  • This is the second bug we’ve found due to running something of an eccentric architecture (6GB of website data cached in Python dicts). The other one was related to garbage collection, and incidentally the handling of that was improved in Python 2.7 too. Yes, I know, somebody will leave a comment about how we should be using memcached for this, and they’d probably be right, except for this. :-)


Most of our site is powered by Python but, perhaps surprisingly, this is the first time we’ve had to think about garbage collection. Actually, I think the fact that we’ve only run into this issue after several years on the platform is pretty good. So here’s the saga…

Observing a system alters its state

It started when we noticed a handful of “upstream connection refused” lines in our nginx error logs. Every so often, our Python-based web servers were not responding in a timely fashion, causing timeouts or errors for about 0.2% of requests.

Thankfully I was able to reproduce it on my development machine — always good to have a nice, well-behaved bug. I had just narrowed it down to our template rendering, and was about to blame the Cheetah rendering engine, when all of a sudden the bug moved to some other place in the code. Drat, a Heisenbug!

But not at all random

It wasn’t related to rendering at all, of course, and after pursuing plenty of red herrings, I noticed it was happening not just randomly across 0.2% of requests, but (when hitting only our homepage) exactly every 445 requests. On such requests, it’d take 4.5 seconds to render the page instead of the usual 15 milliseconds.

But it can’t be garbage collection, I said to myself, because Python uses simple, predictable reference counting for its garbage handling. Well, that’s true, but it also has a “real” garbage collector to supplement the reference counting by detecting reference cycles. For example, if object A refers to object B, which directly or indirectly refers back to object A, the reference counts won’t hit zero and the objects will never be freed — that’s where the collector kicks in.
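
You can watch this happen in a few lines. The sketch below (with a made-up `Node` class) builds a two-object cycle, drops the last outside references, and uses a weak reference to check whether the object is still around. It relies on CPython’s reference-counting behaviour, and it disables the collector first so that an automatic collection can’t muddy the result.

```python
import gc
import weakref


class Node(object):
    def __init__(self):
        self.other = None


gc.disable()            # keep the cycle collector from running on its own
try:
    a, b = Node(), Node()
    a.other, b.other = b, a     # a -> b -> a: a reference cycle
    watcher = weakref.ref(a)    # lets us see whether `a` has been freed

    del a, b
    alive_after_del = watcher() is not None      # refcounts never reached zero

    gc.collect()                                 # the cycle detector finds the pair
    alive_after_collect = watcher() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)
```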

Sure enough, when I disabled the supplemental GC the problem magically went away.

A RAM-hungry architecture

Stepping back a little, I’ll note that we run a slightly unusual architecture. We cache the entire website and all our page metadata in local Python objects (giant dict objects and other data structures), which means each server process uses about 6GB of RAM and allocates about 10 million Python objects. This is loaded into RAM on startup — and yes, allocating and creating 10M objects takes a while. You’re thinking there are almost certainly better ways to do that, and you’re probably right. However, we made a speed-vs-memory tradeoff when we designed this, and on the whole it’s worked very well for us.

But when the garbage collector does decide to do a full collection, which happened to be every 445 requests with our allocation pattern, it has to linearly scan through all the objects and do its GC magic on them. Even if visiting each object takes only a couple hundred nanoseconds, with 10 million objects that adds up to multiple seconds pretty quickly.

Our solution

Response time (ms) vs time, before and after the fix

So what’s the solution? We couldn’t just disable the GC, as we do have some reference cycles that need to be freed, and we can’t have that memory just leaking. But it’s a relatively small number of objects, so our short-term fix was simply to bump up the collection thresholds by a factor of 1000, reducing the number of full collections so they happen only once in a blue moon.
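A threshold bump of this kind can be sketched with the standard `gc` module. The `scale_gc_thresholds` helper below is a hypothetical name, and scaling all three thresholds is an assumption; the exact values we used are not shown here, and what each threshold means varies somewhat between CPython versions.

```python
import gc


def scale_gc_thresholds(factor):
    """Multiply every collection threshold by `factor` and return the new values."""
    current = gc.get_threshold()
    gc.set_threshold(*(value * factor for value in current))
    return gc.get_threshold()


old = gc.get_threshold()
new = scale_gc_thresholds(1000)
print(old, "->", new)
```

Run this once at process start, before the large caches are loaded, so the higher thresholds are in effect for the lifetime of the server.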

The longer-term, “correct” fix (assuming we decide to implement it) will be to wait till the GC counts near the thresholds, then temporarily stop the process receiving requests and do a manual collection, and then start serving again. Because we have many server processes, nginx will automatically move to the next process if one of them’s not listening due to this full garbage collection.
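Roughly, that longer-term idea might look like the sketch below. It is an untested outline, not something we run: `stop_accepting` and `resume_accepting` are hypothetical callbacks standing in for whatever takes a process out of the nginx rotation, and the "near the threshold" test is a simple assumption based on CPython's generational counters.

```python
import gc


def full_collection_is_near(counts=None, thresholds=None, margin=1):
    """Guess whether the oldest generation is about to be collected.

    counts/thresholds default to the live interpreter values; passing them
    explicitly makes the check easy to test.
    """
    counts = gc.get_count() if counts is None else counts
    thresholds = gc.get_threshold() if thresholds is None else thresholds
    return counts[2] >= thresholds[2] - margin


def collect_out_of_rotation(stop_accepting, resume_accepting):
    """Stop taking requests, run a full collection, then resume serving."""
    stop_accepting()
    try:
        return gc.collect()   # full collection; also resets the counters
    finally:
        resume_accepting()    # always rejoin the pool, even on error


# Demo with stand-in callbacks that only record what happened.
events = []
collect_out_of_rotation(lambda: events.append("stop"),
                        lambda: events.append("resume"))
print(events)
```

A server loop would call `full_collection_is_near()` between requests and only then call `collect_out_of_rotation`.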

One other thing we discovered along the way is that we can disable the GC when our server process starts up. Because we allocate and create so many objects on startup, the GC was actually doing many (pointless) full collections during the startup sequence. We now disable the collector while loading the caches on startup, then re-enable it once that’s done — this cut our startup time to about a third of what it had been.
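The startup trick fits in a small context manager. This is a sketch: `gc_paused` is a name invented for this example, and the dictionary comprehension stands in for our real cache loading.

```python
import contextlib
import gc


@contextlib.contextmanager
def gc_paused():
    """Turn the cycle collector off for the duration of the block."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:   # only re-enable if it was on to begin with
            gc.enable()


with gc_paused():
    # Stand-in for loading millions of long-lived cache objects.
    cache = {i: str(i) for i in range(100000)}

print(len(cache), gc.isenabled())
```

Reference counting keeps working while the collector is paused, so only cyclic garbage is deferred until it is switched back on.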

To sum up

In short, when you have millions of Python objects on a long-running server, tune the garbage collector thresholds, or do a manual gc.collect() with the server out of the upstream loop.


Here at most of our codebase is written in Python, which is of course open source, and we use many open source libraries, including, Babel, and lxml. Now it’s time to give a (tiny bit) back.

Three third-party services we use heavily are Sailthru, QuickBase, and myGengo. When we started with them, they didn’t have Python libraries available, or the Python libraries that were available kinda sucked, so we rolled our own.

Note that the idea here is not a fully-fledged API, but an “at least what we needed” wrapper. It may well be what you need, too, so check out the source code on GitHub. Below are some quick examples.


Sailthru is the email provider we use to send transactional and mass emails.

>>> import sailthru
>>> sailthru.send_mail('Welcome', '', name='Bobby')
>>> sailthru.send_blast('Weekly Update', 'Newsletter', 'The CEO', '',
                        'Your weekly update', '<p>This is a weekly update!</p>')


We use QuickBase as a nice user interface to enter and manage portions of our hotel data, and use this Python module to sync between our PostgreSQL database and QuickBase.

>>> import quickbase
>>> client = quickbase.Client(username, password, database='abcd1234')
>>> client.do_query("{'6'.EX.'Foo'}", columns='a')
[{'record_id': 1234, ...}]
>>> client.edit_record(1234, {6: 'Bar'}, named=False)


Our site is (partially) translated into 5 languages, and myGengo provides an easy-to-use automated translation API. We also run the gettext strings in our web templates through myGengo.

>>> import mygengo
>>> client = mygengo.Client(api_key, private_key)
>>> client.get_account_balance()
>>> client.submit_job('This is a test', 'fr', auto_approve=True)
{'job_id': '1234', ...}
>>> client.get_job(1234)
{'body_tgt': "Il s'agit d'un test", ...}

Pull requests welcome

You’re welcome to send pull requests on our GitHub page, or comment below to send other feedback! If you like or use any of these APIs, we’d love to hear from you.


In March 2012, Apple announced and released the new iPad with a Retina display. There was only a short time between the announcement and when the device arrived in stores. This left a very short time for developers to get a working app out the door. And we wanted to take advantage of the wave of publicity, so we tried to get our app ready the same day the devices shipped.

There had been speculation for months that the next iPad would have a Retina display. But we didn’t want to start working on a Retina version without official specs. It wasn’t even confirmed that Apple would release a new iPad yet. Finally, on March 7th, Apple announced the new iPad with a Retina display. They released the official specs, and later that evening they released Xcode with a Retina simulator. The device would ship on March 16th. We had just over a week!

Porting to Retina

The biggest improvement that we cared about was the “Retina display” — Apple’s name for a screen resolution where the pixels are so small the average human eye can’t see them. This means that angled lines look smoother and more natural. The display has twice the resolution in both the vertical and horizontal directions. But the size of the screen remained the same, so each pixel is one quarter the size.

To minimize issues with the transition to Retina, Apple’s iOS APIs measure things in what they call “points”, rather than straight pixels. On the old iPads one point represents one pixel; on the new iPad one point represents four pixels (a 2×2 block).
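The point-to-pixel relationship is simple arithmetic. Here is a small Python sketch of it (the function names are ours for illustration, and this is not iOS code), using the iPad's 1024×768-point screen:

```python
def points_to_pixels(width_pt, height_pt, scale):
    """Convert a size in points to device pixels for a given screen scale."""
    return (int(width_pt * scale), int(height_pt * scale))


def pixels_per_point(scale):
    """Each point covers scale x scale device pixels."""
    return scale * scale


print(points_to_pixels(1024, 768, 1))  # non-Retina iPad
print(points_to_pixels(1024, 768, 2))  # Retina iPad
print(pixels_per_point(2))
```

Because layout code works in points, the same coordinates are valid on both screens; only the backing pixel count changes.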

Close-up on a non-Retina device

The same image on a Retina display

This makes it relatively simple to upgrade an app. Most of the work actually falls to the artists to generate larger sized images for UI elements. You bundle the new Retina assets in the app, and you don’t even have to modify the code — if the old asset is called “foo.png”, then name the new one “foo@2x.png”, and everything just magically works. But our app is photo heavy and relies on pulling down a lot of images off the web. This took some work to fix.

Huge photo sizes

The biggest amount of work involved making sure we request the right size photo. We have 31 different image sizes of each of our 750,000 hotel photos, and they are not all double the previous size. We chose to solve this in the client — if the app detects it’s running on a Retina device, it requests the Retina-ready photo.

The one challenge I ran into was some button icons that were being served by our web server. The original buttons were 16×16. We already had some 35×35 versions, and so the artist decided to use these, instead of creating new 32×32 ones. Since they were going on a button I figured it would be ok. Until I saw them … and they were just over twice as big as they should have been. The problem was iOS doubling the image size automatically. I didn’t want that — I wanted to use the same magic as UIImage does with “@2x” in the filename. The magic is in the “scale” attribute of the UIImage object. Set scale = 2 and the image is no longer doubled. I added this code to our image downloader, to mimic the “@2x” iOS supports.

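// Decode the downloaded bytes, then re-wrap the bitmap at scale 2.0 when the
// URL carries the "@2x" marker, so iOS draws it at half its pixel size in points.
UIImage *image = [[UIImage alloc] initWithData:imageData];
if ([url hasSuffix:@"@2x"])
    image = [UIImage imageWithCGImage:[image CGImage] scale:2.0 orientation:UIImageOrientationUp];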
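To see why the icons came out "just over twice as big", here is the same scale logic as a language-neutral Python sketch. The helper names and the example URLs are hypothetical; only the "@2x" suffix check mirrors the code above.

```python
def scale_for_url(url):
    """Mimic the "@2x" convention: such images are drawn at scale 2."""
    return 2.0 if url.endswith("@2x") else 1.0


def displayed_points(pixel_size, url):
    """Size on screen, in points, of an image that is pixel_size pixels wide."""
    return pixel_size / scale_for_url(url)


# A 35-pixel icon dropped into a slot designed for a 16-point button icon:
print(displayed_points(35, "icons/save"))     # treated as scale 1
print(displayed_points(35, "icons/save@2x"))  # treated as scale 2
```

At scale 1 the 35-pixel icon occupies 35 points, more than double the 16-point slot; at scale 2 it occupies 17.5 points, which is close to the intended size.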

But let’s not quadruple the file size

We did have to create one new size for our photos: the full-screen 2048×1536 images. This was no big deal, although it did take a few days to churn through all 750,000 hotel photos. But 2048×1536 is one big image, and contains a lot of data. With over 3 million pixels the image file sizes were four times as big as before. We didn’t want our app to become slower due to image load time, so we reduced the image quality on the larger sized images. We figured the Retina display would hide some of the JPEG artifacts, and settled on a 65% quality level. The resulting file size was only about 1.5 times as big as before (instead of 4 times).

We worked fast and furious the last few days, to hit our goal of uploading to the app store on Friday. There were a few minor layout issues that had to be tweaked. While the simulator was nice (although I could only see about half of it on my 21 inch 1600×1200 monitor!) we had to test the program on a real device. But like everyone else we had to wait until they were released on Friday. After receiving the new iPad and verifying everything worked, we packed up the app and shipped it off.

Reviewed by Apple in six hours

Our app was verified by Apple in just six short hours! Our Retina version was available to the general public late Friday evening. Which means folks could see all of our thousands of awesome photos in glorious Retina display right from the very beginning. Also, being one of the first “retinized” apps, we made it on several of Apple’s top lists. This boosted our download rate immensely. It was worth the effort to take advantage of the small window Apple gave us, and makes our app that much better.


How to build a 40TB file server

by Anton on December 21, 2011

Our single most valuable asset is our photo collection. Take away the intellectual property and what’s left is, essentially, markup (with a bit of backend to snazz it up). So we need a solid backup solution for the original high-res photos. The old servers were about to run out of capacity, and their slightly outdated specs did not make transferring huge datasets any easier or faster. Thinking a few months ahead, we were looking at a 40TB data set. In strict accordance with KISS methodology, we opted against LTO and S3, and decided to build a big BOX. (For starters, 40TB on S3 costs around $60,000 annually. The components to build the Box cost about a tenth of that.)

Areca 1882ix-24 RAID Controller

Coincidentally, a great new product was just about to hit the market, and its timing reinforced our decision: the dual-core Areca ARC-1882ix RAID host bus adapter, which comes with an on-board DDR3 SDRAM socket supporting modules of up to 4GB. We had already opted for RAID Level 6 (striped, with distributed parity for error checking; tolerates two disk failures), and a dual-core RAID-on-Chip processes two streams of parity calculations simultaneously, so it seemed ideal.

The first challenge in putting together the big box was getting internal SAS connectors properly seated into the backplane adaptor sockets, the bottom few being especially cumbersome to reach. Thankfully, our hardware technicians’ exceptional manual dexterity rendered having to disassemble the internal fan panel frame unnecessary.

Internal mini-SAS connectors (SFF-8087)

The housing assembly comes with six individual backplanes, each accommodating four SAS or SATA disks. Each backplane is secured to the drive bay assembly with three thumbscrews, their shape and material designed to fall within the required torque range when screwed on “as tight as possible.” As we found out the hard way, it is absolutely critical to ensure that each of the backplane cards is seated ‘full snug’ in the slot and secured dead tight with the thumbscrews. A shoddy connection is not always immediately obvious, it turns out. We observed intermittent timeouts on a particular drive bay as well as degraded overall system performance caused by one of the backplane cards not having been secured quite tight enough — however, the array was still functional, making troubleshooting an opaque nightmare.

One of the six backplane cards

One of the most important differences between this system and your run-of-the-mill high-performance enterprise server with a couple of hard disks is the addition of six backplane cards, one 24-channel RAID controller, 24 hard disks, and the internal connectors. Each of these is a new potential point of failure (at least 37 additional ones in total). Every single component’s installation must strictly conform to spec, because in such a delicately balanced system a small deviation in one part degrades the whole array, resulting in problems persisting for hours, days, and weeks, and many more lost megabytes per second.

If the configuration of all components is optimized, the small individual gains add up to a significant performance boost. At the risk of stating the obvious, things are much less likely to go wrong in a stable, fine-tuned system which performs at max capacity.

Big Box with 24 hot-swappable drive bays

One of those things is aligning the array’s stripe size with the file system’s allocation unit. In our case the stripe size is 64K (since most of our image files are relatively large) on 512-byte blocks, so we also format with a 64K cluster (“allocation unit”) size. This eliminates the RAID on-board logic overhead of having to keep the logical disk synced with the physical.
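The alignment numbers are easy to sanity-check. The sketch below uses helper names invented for this example and assumes the partition itself starts on a stripe boundary, which the post does not cover.

```python
KIB = 1024


def sectors_per_unit(unit_bytes, sector_bytes=512):
    """How many disk sectors make up one stripe or cluster."""
    if unit_bytes % sector_bytes:
        raise ValueError("unit size must be a whole number of sectors")
    return unit_bytes // sector_bytes


def clusters_per_stripe(stripe_bytes, cluster_bytes):
    """Whole clusters per stripe, or None if a cluster would straddle a stripe."""
    if stripe_bytes % cluster_bytes:
        return None
    return stripe_bytes // cluster_bytes


print(sectors_per_unit(64 * KIB))               # sectors in one 64K stripe
print(clusters_per_stripe(64 * KIB, 64 * KIB))  # our setup: one cluster per stripe
print(clusters_per_stripe(64 * KIB, 48 * KIB))  # a mismatched size does not tile
```

With matching 64K sizes every file system cluster maps onto exactly one stripe of 128 sectors.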

A few important things regarding the driver must be mentioned. Windows operating systems inherited their native SAS/SATA RAID controller driver framework (SCSIport) from SCSI technology, and it has several serious drawbacks. I stumbled upon an interesting investigative white paper which goes into great detail about these issues. The preferred driver for modern SATA/SAS cards is the STORport driver, developed by the manufacturer’s consortium in response to the inadequate state of native drivers, which inherited limitations of the SCSI protocol. The STORport driver is not certified by Microsoft, so the OS installs the inferior SCSIport driver by default. Switching to the STORport driver visibly improved stability and performance during the project’s earliest stages, instantly bumping write speed by several dozen MB/sec.

Gig-E is the bottleneck

Having spent some extra time on research, fine-tuning, and optimizing the new server, we were glad to find that the gigabit network had become the bottleneck, rather than the all-too-commonly disappointing I/O. By aggregating both on-board gigabit network interfaces we can expect transfer rates of 200MB/s (the disks are 3Gbps), which is great for us to maintain a light, low-maintenance incremental backup. Another big advantage of this system in terms of capacity scaling is the external SAS connector, which can accommodate another external box of disks to expand our array into. While not the fastest solution possible, it strikes a great balance between performance, value, and reliability (redundancy), which is exactly what we are looking for.
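As a back-of-the-envelope check on those network numbers, here is a small sketch. The function names are ours, units are decimal (1 Gbit/s = 125 MB/s), and protocol overhead is ignored, which is one reason the 200MB/s we expect sits below the raw ceiling.

```python
def aggregate_mb_per_s(links, gbit_per_link=1.0):
    """Raw ceiling of several bonded links, in MB/s."""
    return links * gbit_per_link * 1000 / 8


def transfer_hours(terabytes, mb_per_s):
    """Hours needed to move a data set at a sustained rate."""
    seconds = terabytes * 1_000_000 / mb_per_s
    return seconds / 3600


print(aggregate_mb_per_s(2))              # two gigabit links, raw ceiling
print(round(transfer_hours(40, 200), 1))  # full 40TB copy at 200MB/s
```

So a full copy of the 40TB set takes a bit over two days at the expected rate, which is why incremental backups matter.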


Cohabitation with Python and C++

by Chris on November 23, 2011

Back in the day, we were a C++ shop. One day we decided to convert to Python. We didn’t convert everything to Python, which left us with the task of bridging the gap between them. Of course there were issues when setting up this communication between the C++ and Python libraries.

We made the decision to convert to Python for several reasons. One main reason was to take advantage of some good but free Python libraries. We converted almost all of our code — it’s better and easier to maintain code in one language than in two. But one of our backend engines was complicated and it worked, so we decided to leave it in C++.

In simple cases, bridging the gap from Python to C++ is relatively easy. Python provides a routine that will convert data from Python’s managed memory to C++. PyArg_ParseTuple is used to convert incoming Python objects into C data types. We had to then iterate over the lists using PyList_GetItem to convert lists to arrays. We wrote the conversion function, everything worked, and we pushed the results into production.

But we experienced periodic crashes which we could not track down. While investigating the crashes we discovered that our multithreaded system was essentially only handling one incoming request at a time! Our C++ code had lots of dependencies and sometimes could take a while to return a result. It turned out that the entire Python server would block waiting for the C++ code to return.

The problem was Python’s Global Interpreter Lock (GIL). The GIL prevents multiple native threads from executing Python bytecodes at once. Apparently this is done because Python’s memory management isn’t thread safe.

While there are a few things that can be done to allow Python to play nicely with C++ in a multithreaded environment, we were in a time crunch to get the problem solved. The problem wasn’t so much the multithreading, but the amount of time that was spent inside the C++ code. If the C++ code is quick then Python won’t block for very long.

We solved our problem by redesigning the flow — instead of calling our C++ code directly inside Python, we switched to running our C++ code in a separate process and talking to it via a simple HTTP API. Our multithreading and GIL issues disappeared and became multi-processing issues (where there are much clearer, safer boundaries).
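The shape of that design can be sketched in pure Python. Everything here is a stand-in: `EngineHandler` plays the role of the separate backend process (C++ in our case), the `/search` path and JSON reply are invented, and the demo runs the "engine" in a thread only so the example is self-contained.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EngineHandler(BaseHTTPRequestHandler):
    """Stand-in for the backend engine's HTTP API."""

    def do_GET(self):
        body = json.dumps({"path": self.path, "result": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def query_engine(base_url, path, timeout=5.0):
    """Call the engine over HTTP; waiting on the socket releases the GIL."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url + path, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    server = HTTPServer(("127.0.0.1", 0), EngineHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return query_engine("http://127.0.0.1:%d" % port, "/search?q=hotel")
    finally:
        server.shutdown()
        server.server_close()


reply = demo()
print(reply)
```

The Python side now spends its waiting time in ordinary network I/O, so other threads keep running while the engine works.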

The lesson we learned from this is that multithreading is difficult, even when it looks simple. Python and C++ can play nicely with each other, as long as the C++ call is quick. You can’t let the call into C++ block for too long, as you need to let Python release the GIL occasionally to let other threads run. While it may be possible to solve this problem, we deemed our new multi-process HTTP solution worked great (in fact it probably works better this way), and we didn’t have the time to delve into a solution involving Python’s GIL.

As always, tread carefully when doing any sort of multithreading.


CherryPy, ctypes, and being explicit

by Ben on October 31, 2011

Here at part of our web stack consists of and CherryPy, and on the whole we’ve found they’re fast and stable. However, a little while ago, the CherryPy server started intermittently raising an exception and bombing out — a WindowsError due to an invalid handle on an obscure SetHandleInformation() call.

Auto-restart is not a solution

At first this was only happening once in a very long while, but after certain changes it would start happening a few times a day. We’ve got a script in place that restarts our servers when they die, but because of the aggressive caching we do, our web servers load tons of stuff from the database on startup, and hence take a while to load. So just letting our auto-restart scripts kick in wasn’t a solution.

On further digging, we found there was already a relevant CherryPy bug open, with someone else getting the same intermittent exception. They were working around it by changing an unrelated line of code, so something smelled fishy.

HANDLE != uint32

I noticed SetHandleInformation() was being called with ctypes, and had just recently been using ctypes for a named mutex class I’d written (to make Python’s logging module safe for writes from multiple processes).

ctypes is great for calling C DLLs when you just want a thin Python-to-C wrapper. Its defaults are good — for instance, Python integers get converted to 32-bit ints in C, which is normally what you want. SetHandleInformation()’s first parameter is a handle, which I (and apparently CherryPy) assumed was just an integer, so it was getting passed to C as a 32-bit value. However, it’s actually defined as a HANDLE, which is typed as void pointer, so on our 64-bit Windows machines it was actually a 64-bit value.

SetHandleInformation() was looking for the high 32 bits of the handle on the stack or in a register someone else owned, and of course sometimes those 32 undefined bits weren’t zero. Crash bang.
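You can see the size mismatch on any platform with a few lines of ctypes. This is only an illustration: the handle value is made up, and `low_32_bits` is a helper named for this example.

```python
import ctypes


def low_32_bits(value):
    """What is left of a Python int once it is squeezed into a C int."""
    return ctypes.c_int(value).value   # ctypes does no overflow checking here


handle = 0x0000000100000004            # hypothetical 64-bit handle value

print(ctypes.sizeof(ctypes.c_int))     # bytes in the default integer conversion
print(ctypes.sizeof(ctypes.c_void_p))  # bytes in a pointer-sized HANDLE
print(hex(low_32_bits(handle)))
```

On a 64-bit build the pointer is twice the size of the int, so the callee reads 32 bits that the caller never set.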

On being explicit

Once we realized what was happening, the fix was easy enough — ctypes lets you override the default conversions by specifying argument and return types explicitly. So we changed a straight ctypes call:

windll.kernel32.SetHandleInformation(sock.fileno(), 1, 0)

to a ctypes call with an explicit type spec, like this:

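from ctypes import windll, wintypes

# Declare the real signature so the handle is passed as a pointer-sized HANDLE
SetHandleInformation = windll.kernel32.SetHandleInformation
SetHandleInformation.argtypes = [wintypes.HANDLE, wintypes.DWORD, wintypes.DWORD]
SetHandleInformation.restype = wintypes.BOOL
SetHandleInformation(sock.fileno(), 1, 0)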

Lo and behold, we were now telling ctypes to respect the function’s signature, and everything worked fine. We told the CherryPy folks and they were quick to implement this fix and resolve the bug.

So don’t be scared of ctypes, but just remember, it doesn’t memorize Windows.h, so avoid pain and suffering by telling it your types. Explicit isn’t for raunchy movies — it’s point #2 in the Zen of Python.
