Oyster.com Tech Blog: Insights from our engineering team

Using React and jQuery Together


In this post I’ll cover some tips on how to use React and jQuery together in the same UI. Okay, so first off you might be thinking “why would you want to do such a thing?” In fact, the idea of trying to make React’s declarative style live together with imperative jQuery DOM updates may fill you with dread, and for the most part, you wouldn’t be wrong.

So first, the “why.” If you’re starting a brand new, “greenfield” project and you want to use React, then just do it. There’s no good reason I can think of to mix and match React with jQuery or Mustache or whatever other DOM-helper/template library if you don’t have to; just use React and everything will be cool and you won’t have to worry about it. If, on the other hand, you have one of those “legacy” applications that has “customers” and makes “money,” and for some reason your boss is not into the idea of you spending a few weeks rewriting the whole front end in React, but you still want in on that declarative React goodness, you may have to figure out how to get React to play well with jQuery or something similar.

Are you sure you can’t rewrite it?

Let’s say you have some piece of UI that gets rendered with jQuery, and you want to stick some new component written with React into it. Take a look at your jQuery rendering function. Are you just building up some big string of HTML and sticking it in the DOM? Or using some kind of JavaScript template? If your rendering is already reasonably functional, i.e. some data goes into your function and some HTML comes out (or gets appended to the DOM or whatever), it will be pretty easy to rewrite it in React. So you should probably just do that and save yourself the inevitable hassle you’ll have when something breaks and you have to debug it.

Okay so you can’t rewrite it

For whatever reason you’ve determined it’s not practical to rewrite your jQuery code. Here is an important caveat: I think using jQuery and React to manage updates to the same DOM elements is a bad idea. React is really smart about figuring out how to update the DOM, but that only works if React is the only thing doing the updates. So unless you can cleanly separate the DOM elements in your UI so that some only get updated by React and others only get updated by jQuery, I wouldn’t try it.

So say, for instance, you’re going to render a product list with a React template, then jQuery is going to add and remove CSS classes on the list items, and then your React render function might get called again later. This is a bad idea. React will have no idea about the changes that jQuery has made. Some of React’s efficiency comes from reusing DOM nodes on the page when things change, rather than always inserting or deleting nodes. If jQuery is making DOM changes that React doesn’t know about, some node that gets reused might be in an unexpected state.

One further caveat is that you should only really consider this mixed approach if you’re planning to eventually replace jQuery rendering with React. Using both doesn’t make sense long-term, but if you’re looking to gradually transition to React, you may have some parts of your UI using both for a while.

Let’s write some code

As an example we’ll start by rendering a simple list of products with jQuery. We’ll just show the name for each product and a button to buy it. Then we’ll get into replacing parts of the UI with React.

Our jQuery product list is pretty basic: it takes an array of products and a container element (#product-list-container in our demo), and inserts the list into the container. If the product list is updated, you just call productListJustJquery() again and replace the whole list with a new one.

var products = [
  {
    id: 1,
    name: 'Book',
    price: 15
  },
  {
    id: 2,
    name: 'Burrito',
    price: 8
  },
  {
    id: 3,
    name: 'Spaceship',
    price: 999999999
  },
  {
    id: 4,
    name: 'Dinosaur Bones',
    price: 5000000
  }
];

function buyProduct(productId) {
  // buy the product
}

/* -- Just jQuery -- */
function buyButtonJquery(product) {
  var button = $('<button class="buy-button">$' + product.price
    + '</button>');
  
  // handle click event
  button.on('click', function(event) {
    event.preventDefault();
    buyProduct(product.id);
  });

  return button;
}

function productListJustJquery(products, element) {
  var list = $('<ul class="product-list"></ul>');

  products.forEach(function(product) {
    var item = $('<li>' + product.name + '</li>');
    item.append(buyButtonJquery(product));
    list.append(item);
  });

  // replace the existing list if there is one
  var currentList = $(element).find('.product-list');
  if (currentList.length) {
    currentList.replaceWith(list);
  } else {
    $(element).append(list);
  }
}

jQuery inside a React component

We’ll start by replacing most of the product list with React but leaving the buy button in jQuery. This is simpler than the inverse - sticking React inside a jQuery UI - so we’ll do it first. The ProductListComponent is pretty straightforward:

var ProductListComponent = React.createClass({
  render: function() {
    return (
      <ul className="product-list">
        {this.props.products.map(function(product) {
          return <ProductComponent 
              key={product.id} 
              product={product} />
        })}
      </ul>
    );
  }
});

but in ProductComponent we need some extra code to make the call to jQuery. We add an extra button-container element, so that we have somewhere to put the jQuery DOM, and keep a reference to it.

render: function() {
  /* we need to keep a ref to the 
   * button-container so we can update it with jQuery
   */
  return (
    <li>
      {this.props.product.name}
      <span className="button-container" 
        ref="buttonContainer"></span>
    </li>
  );
}

Life cycle methods

It’s important to get familiar with the various React life cycle methods. The relevant ones here are componentDidMount, which is called after the first render, and componentDidUpdate, which is called after subsequent renders. In each of these methods we just call renderBuyButton, which uses our reference to the button-container DOM node to create a brand new buy button with jQuery on each render.

componentDidMount: function() {
  this.renderBuyButton();
},
componentDidUpdate: function() {
  this.renderBuyButton();
},
renderBuyButton: function() {
  // render the buy button with jQuery
  $(this.refs.buttonContainer).html(
    buyButtonJquery(this.props.product)
  );
}

Here’s the complete ProductComponent:

var ProductComponent = React.createClass({
  componentDidMount: function() {
    this.renderBuyButton();
  },
  componentDidUpdate: function() {
    this.renderBuyButton();
  },
  render: function() {
    /* we need to keep a ref to the 
     * button-container so we can update it with jQuery
     */
    return (
      <li>
        {this.props.product.name}
        <span className="button-container" 
          ref="buttonContainer"></span>
      </li>
    );
  },
  renderBuyButton: function() {
    // render the buy button with jQuery
    $(this.refs.buttonContainer).html(
      buyButtonJquery(this.props.product)
    );
  }
});

React components inside jQuery

Now we’re going to do it the other way around and stick some React DOM inside our jQuery DOM. This is a little trickier. We’ll start with a BuyButtonComponent in React; there’s not much to it:

var BuyButtonComponent = React.createClass({
  onClick: function(event) {
    buyProduct(this.props.product.id);
  },
  componentDidMount: function() {
    console.log('component did mount - stuff to clean up later');
  },
  render: function() {
    return (
      <button className="buy-button" 
        onClick={this.onClick}>${this.props.product.price}
      </button>
    );
  },
  componentWillUnmount: function() {
    console.log('about to unmount - clean up stuff here');
  }
});

function buyButtonReact(product, element) {
  ReactDOM.render(
    <BuyButtonComponent product={product} />,
    element
  );
}

I’ve added the componentDidMount and componentWillUnmount methods with some console.logs. They clearly don’t really do anything in this component, but in a real component you’ll often do something in componentDidMount, like subscribe to events from a Flux store, that needs to be cleaned up when the component unmounts. We need to make sure these methods still get called at the right times, or you risk memory leaks or trying to update a component’s state when it no longer exists (which will throw an error).
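For example, a component subscribing to an EventEmitter-style store might pair the two methods like this (a hypothetical sketch; productStore and its methods are assumed names, not part of the code above):

var PriceWatcherComponent = React.createClass({
  getInitialState: function() {
    return { price: this.props.product.price };
  },
  componentDidMount: function() {
    // subscribe when the component enters the DOM
    productStore.on('change', this.onStoreChange);
  },
  componentWillUnmount: function() {
    /* unsubscribe when it leaves - otherwise the store keeps a
     * reference to the handler and may call setState on a
     * component that no longer exists
     */
    productStore.removeListener('change', this.onStoreChange);
  },
  onStoreChange: function() {
    this.setState({ price: productStore.getPrice(this.props.product.id) });
  },
  render: function() {
    return <span>${this.state.price}</span>;
  }
});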

So now we’ll alter our jQuery function for rendering the product list to use our new React BuyButton component. We’ll use the same strategy of adding an extra button-container element here. We also attach the product data to the container element so we can use it later.

products.forEach(function(product) {
  var item = $('<li>' + product.name + '</li>');

  /* add a container element where 
   * we'll attach our React component
   */
  var buttonContainer = $('<span class="button-container"></span>');
  // add product data to use in our React component
  buttonContainer.data('product', product);

  item.append(buttonContainer);
  list.append(item);
});

After we’ve inserted the main product list with jQuery, we iterate over the container nodes and use the product data to render the buy buttons with React. You can see in the console that componentDidMount is called for each component.
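That iteration is only a few lines (it also appears in the complete function below):

// attach our React components to the containers
list.find('.button-container').each(function() {
  var product = $(this).data('product');

  // React needs a plain DOM element, so we pass "this" directly
  buyButtonReact(product, this);
});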

We might render this product list multiple times, so we need to make sure our productListJqueryReact function works when called repeatedly. jQuery is going to blow away the whole DOM each time, which won’t give React a chance to do its cleanup (calling componentWillUnmount), so we need to manually unmount the React components before we insert a new list with jQuery.

// clean up any mounted React components
$(element).find('.button-container').each(function() {
  ReactDOM.unmountComponentAtNode(this);
});

You can verify in the console that componentWillUnmount is called for each of the buy buttons every time the list is re-rendered. Here’s the complete function for rendering the product list:

function productListJqueryReact(products, element) {
  var list = $('<ul class="product-list"></ul>');

  // clean up any mounted React components
  $(element).find('.button-container').each(function() {
    ReactDOM.unmountComponentAtNode(this);
  });

  products.forEach(function(product) {
    var item = $('<li>' + product.name + '</li>');

    /* add a container element where 
     * we'll attach our React component
     */
    var buttonContainer = $('<span class="button-container"></span>');
    // add product data to use in our React component
    buttonContainer.data('product', product);

    item.append(buttonContainer);
    list.append(item);
  });

  // replace the existing list if there is one
  var currentList = $(element).find('.product-list');
  if (currentList.length) {
    currentList.replaceWith(list);
  } else {
    $(element).append(list);
  }

  // attach our React components to the containers
  list.find('.button-container').each(function() {
    var container = $(this);
    var product = container.data('product');

    /* React needs a plain, non-jQueryfied DOM 
     * element, so we can use plain "this"
     */
    buyButtonReact(product, this);
  });
}

I’m sure there are other ways to handle the same issues, and other edge cases where jQuery and React can conflict and cause problems, but the examples above cover the most common use cases I’ve encountered. There’s not a lot of writing I could find about using both simultaneously, and the conventional wisdom seems to basically be “don’t do it,” so let us know if you have experience working with both or think we missed something!

Rundeck vs. Crontab: Why Rundeck won

We like automation here at Oyster: we do one-command deployments using Ansible and we run a lot of other automated scripts and ETL processing on various schedules.

crontab on Linux and Task Scheduler on Windows are fine tools as far as they go. But they’re a little too simplistic: complex schedules are hard to express, there’s no retrying, nothing is centralized, you can’t customize success or failure emails, it’s almost impossible to get non-engineers to use them, etc.

Enter Rundeck. It’s a powerful system, though we’re only using it in “glorified crontab with a web UI” mode, which is a fine way to start. We’ve now switched most of our cron jobs over to it, and we’re working on moving our Windows Task Scheduler jobs as well.


Below are some of the benefits for us.

UI with single sign-on support

Rundeck has a web UI that supports single sign-on, so anyone on our team has automatic access to it. Among other things, this means people on other teams can write ETL scripts and set them up in Rundeck without involving engineering.

The UI is also good for engineering. It allows you to create jobs, run a job, view history, and check job output. Previously, to look something up, we’d have to hunt through old emails, or SSH or remote into a machine and scan through log directories. Big plus.

[Screenshot: Rundeck jobs UI]

The UI is definitely better than Windows Task Scheduler’s (not hard to beat), but it’s not a stellar example of design. It has a very “made by developers” feel, with tons of detail upfront, and some common operations, like looking at recent errors or log output, hidden behind several clicks.

History and output capturing

Rundeck logs all job activity, making it easy to work out what went wrong and why. You can filter the activity history by task name, user, success/failure status, or time.

Once you’ve found the relevant run, you can drill down into it and look at all stdout and stderr output from a given command. For example:

[Screenshot: Rundeck error output]

Success and failure emails

Windows Task Scheduler and crontab have very limited support for notification emails, but with Rundeck they’re easily configurable and simple to set up. You can tell it to send the stdout/stderr output from the job as an attachment to the email.

Some of our scripts we just want to email on failure; others we want to email on success as well, to keep folks in the loop:

[Screenshot: Rundeck email notifications]

You can customize the subject line using job variables, and you can specify a custom Markdown email template if you want (we just use the default – it’s not pretty, but it is functional).

Simple and advanced scheduling

Rundeck has a simple scheduling interface that allows you to run a job at a certain time every day or on selected days of the week. It also has a more advanced crontab scheduling option to allow you to run tasks on schedules like “every 15 minutes” or “1:30am every last Friday of the month”.
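For reference, here’s roughly what those two schedules look like as cron expressions (a sketch assuming Rundeck’s Quartz-style syntax, which adds a leading seconds field; Quartz numbers Sunday as 1, so 6L means the last Friday of the month):

# every 15 minutes
0 */15 * * * ?

# 1:30am every last Friday of the month
0 30 1 ? * 6L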

[Screenshot: Rundeck simple scheduling]

SSH support

Rundeck can run jobs locally on the machine Rundeck is installed on, or it can run jobs on remote machines using SSH. For most of our longer-running jobs, we have a single “worker server” that we run jobs on via SSH.

Rundeck has good support for multiple “node sources”. For example, in our installation, our techops team has set up Rundeck to use our master node list from Chef.

Ad-hoc commands

Rundeck also has a screen where you can run any ad-hoc command against one or more nodes. For example, you might want to check disk usage on a bunch of nodes:

[Screenshot: Rundeck ad-hoc commands]

Source control integration

One of the problems with UIs (like Task Scheduler’s) is that change tracking is hard or impossible. We wanted to be able to see who did what, and when. As of version 2.6.0, Rundeck has built-in support for source control integration via git. So we hooked it up to our git repo and now have a change-tracked log of who did what to the job config.

All the project and job configuration is saved in YAML format, so it’s fairly easy to read. For example:

- description: ''
  executionEnabled: true
  group: Scheduled
  name: Main nightly scripts
  notification:
    onfailure:
      email:
        attachLog: true
        recipients: errors@oyster.com
  sequence:
    commands:
    - jobref:
        group: Task
        name: Dump live database to snapshot
    - jobref:
        group: Task
        name: Backup snapshot database
...

Other features

Rundeck has plenty of other features, many of which we’re not using yet.

Shout-out to our techops team who set up the Rundeck install for us and helped with various operational aspects.

We’re hiring!

If you like good engineering and automation and want to come work for us, apply here!

Using Ansible to restore developer sanity

This time a year ago we were deploying new code to Oyster.com using a completely custom deployment system written in C++. And I don’t mean real C++; it was more like C with classes, where the original developers decided that std::string was “not fast enough” and wrote their own string struct:

struct SIZED_STRING
{
    const uint8_t *pbData;
    size_t cbData;
};

It’s not our idea of fun to worry about buffer sizes and string lengths when writing high-level deployment scripts.

Then there was the NIH (not-invented-here) distributed file transfer system — client and server. And our own diffing library, just for fun. All very worthwhile things for a hotel review website to spend time developing in-house! :-)

[Screenshot: our Ansible-based deployment]

Sarcasm aside, this wasn’t a joke: we replaced more than 20,000 lines of C++ code with about 1,000 lines of straightforward Ansible scripts. And it really did restore our sanity:

  • Rather than following 28 manual steps (some of which, if you ran them out of order, could bring the site down), we run a single Ansible command. All we have to do manually is specify which revision to deploy and type in some deployment notes to record to our internal log (for example, “Shipped mobile version of hotel page”).
  • Instead of spending hours digging into log files on a remote machine whenever our fragile home-grown system broke, Ansible gives us clear and generally easy-to-track-down error messages. The most we have to do is SSH to a machine and manually restart something.

Choice of tools

Some teams within TripAdvisor use Chef for server setup (and other tools like Fabric for code deployments). We also looked briefly at Puppet. However, both Chef and Puppet gave us a very “enterprisey” feel, which isn’t a great match for our team’s culture.

This is partly due to their agent-based model: Chef, for example, requires a Chef server in between the runner and the nodes, and requires you to install clients (“agents”) on each of the nodes you want to control. I think this picture gives a pretty good idea of the number of components involved:

[Diagram: Chef architecture]

In contrast, Ansible has basically five parts:

  • playbooks
  • inventory files
  • vars files
  • the ansible-playbook command
  • nodes

I’m sure there are advantages and more power available to systems like Chef, but we really appreciated the simplicity of the Ansible model. Two things especially wooed us:

  1. You don’t have to install and maintain clients on each of the nodes. On the nodes, Ansible only requires plain old SSH and Python 2.4+, which are already installed on basically every Linux system under the sun. This also means developers don’t have to learn a new type of authentication: ordinary SSH keys or passwords work great.
  2. Simple order of execution. Ansible playbooks and plays run from top to bottom, just like a script. The only exception to this is “handlers”, which run at the end of a play if something has changed (for example, to reload the web server config).

[Screenshot: the Ansible Tower UI]

Ansible itself is free and open source and available on GitHub. But they also provide a fancy web UI to drive it, called Ansible Tower. It’s nice and has good logging and very fine-grained permissions control, but we found it was somewhat tricky to install in our environment, and as developers it didn’t gain us much over running a simple command.

Our thinking is that in a larger organization, where they need finer-grained permissions or logging, or where non-developers need to kick off deployments, using Ansible Tower would pay off.

Our deployment scripts

As noted above, Ansible has a very simple order of execution, and its model is kind of a cross between declarative (“get my system configuration into this state”) and imperative (“do this, run this, then try this”). Our site deployment involves some system configuration, but is mostly a series of steps that “orchestrate” the deployment. Here’s more or less how it works:

  1. Setup: update code, run tests on staging server, upload new static assets.
  2. Turn off B app servers, run on A (we have 8 Python app servers in each group).
  3. Update code on B app servers.
  4. Turn off A app servers, run on B (making new code live on B).
  5. Update code on A app servers.
  6. Make all A and B app servers live.
  7. Record deployment log and send “finished deployment” email.

To show you some example Ansible code, here is the playbook that steps 3 and 5 use:

---
- name: Update code on B app servers
  hosts: app_b
  tasks:
  - name: Update code on app servers
    subversion: repo= dest= username=
                password= revision=

  - name: Restart app service
    service: name=server-main state=restarted

  - name: Wait for app server to start
    wait_for: port= timeout=300

  - name: Check that new version is running
    local_action: uri url=http://:
                  return_content=true
    register: response
    failed_when: response.json['SvnRevision'] != 

As you can see, Ansible uses fairly straightforward YAML syntax. In the above code, Ansible runs these tasks against our 8 “app_b” hosts in parallel — a simple but powerful concept.

For a given “play” such as the above, each task is executed in order — we really appreciated how it doesn’t try to outsmart you in terms of how and when things run. The only exception to this is Ansible’s handlers, which are tasks run at the end of a play, but only if something “notified” them. For example, in our deployment, handlers are used to restart our nginx servers when the nginx config file changes.
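As an illustrative sketch (the names and paths here are hypothetical, not our actual playbook), a handler looks like this:

---
- name: Configure web servers
  hosts: web
  tasks:
  # the handler below only runs if this task reports a change
  - name: Install nginx config
    template: src=nginx.conf.j2 dest=/etc/nginx/nginx.conf
    notify: restart nginx

  handlers:
  - name: restart nginx
    service: name=nginx state=restarted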

You’ll notice the empty-looking values in the listing above: each task parameter is actually a Jinja2 template string whose {{ }} placeholders get rendered against your current set of host variables. This makes it very easy to modify settings that change depending on environment (staging, production, etc). It also separates playbooks from user-specific data, meaning settings aren’t hard-coded in playbooks and folks can share them much more easily.
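To make that concrete, here’s roughly what one of the templated tasks looks like written out (the variable names are hypothetical, for illustration only):

# illustrative only: the variable names are made up for this example
- name: Update code on app servers
  subversion: repo={{ repo_url }} dest={{ app_dir }}
              username={{ svn_user }} revision={{ deploy_revision }}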

We deploy solely to Linux-based machines (about 50 nodes), and Linux is where Ansible started and where it excels. However, we have something of a Windows history, so it was interesting to learn that as of August 2014 (version 1.7), they started adding support for managing Windows machines — this is done via Powershell remoting rather than SSH.

In short, what sold Ansible to us was:

  • Simple YAML-based syntax
  • Simple execution order: top to bottom, and then handlers
  • Powerful: Jinja2 templates, large library of builtin modules
  • Agentless: no client to install and maintain

Pre-Ansible, we dreaded our 28-manual-step deployments. Post-Ansible, it’s almost fun to deploy code, and the focus is on the features we’re deploying, instead of “what’s going to go wrong this time?”. So I hope you get the chance to try Ansible! And no, we weren’t paid to link that…

When Building Your Own CMS is the Right Choice

In the latter half of last year, we decided to replace the CMS that powers the content on Oyster.com. Actually we replaced three CMSs with a single one. Oyster is primarily in the business of creating content such as our in-depth hotel reviews, roundups, slideshows, and various other articles that help travelers spend their hard-earned vacation days and dollars wisely. So we knew it was an important task to build the best tool we could to enable our writing and editing staff to put out high quality content easily.


Obviously we’ve been doing this for a while, so we had tools in place, but we had reached a point where we needed to make a change. As I mentioned above we had three different CMSs that we used: one for hotel reviews, one for articles (both of these were custom), and a WordPress blog for blog posts.

Documents and Structure

[Screenshot: Pros, Cons & Bottom Line]

The custom editors were used for creating structured documents, which consisted of a number of sections (such as the Pros, Cons, and Bottom Line sections of our hotel reviews), which in turn consisted of a number of fields. These were stored in a custom text format, and any text formatting was stored as wiki-style markup. This made it harder than it needed to be to update documents’ structure or create new document types, since all the code that parsed and rendered the documents was custom. Also, the UIs for the custom editors were due for a good refresh.

The WordPress editor presented different problems. WordPress is quite good for producing a nicely formatted bit of text, but what you get when you write a post is one big blob of HTML. Formatting and styles are all mixed in with your content. And everything is totally static, so if a hotel changes its name or closes, or some other piece of information in our hotel database changes, it doesn’t get updated in the blog post. We knew we wanted our blog content to be integrated with the same database used by our hotel reviews so we could more easily surface rich information about hotels and pricing.

We realized we wanted to keep the concept of structured documents; our hotel reviews have a well-defined format, and we need to be able to write those in a structured way. Similarly, our roundups always consist of an intro section and a list of hotels, each with a short blurb relevant to the roundup topic. At the same time, we want our writers to be able to produce more freeform, blog-oriented content with a degree of flexibility in formatting. We decided we could do this by defining a set of formatting blocks that reflected style conventions the writers were already using, with an eye to extending these fairly easily as needed. This frees the writers from having to think about layout and lets them focus on what they want to say. Having well-defined formatting blocks, or “widgets,” also means we can create responsive templates for how the articles display: we can make an article look good on a desktop, phone, or tablet, since the documents contain content information, not layout information.

JSON Documents

When deciding how the documents should be stored, it was really a no-brainer that they should be stored as JSON. JSON plays well with pretty much anything these days. Any server side language you use (we use Python) ought to be able to parse JSON into a useful data structure with a few lines of code. We use Postgres for our content database, and Postgres has a built-in JSON type that you can query against, put indexes on, and use with various functions and operators. Storing documents as JSON in the database means we don’t have to change the database schema every time we want to add a new widget type or document field, but we don’t really have to compromise on queryability either.
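For instance (the table and column names here are illustrative, not our actual schema), you can query inside the stored documents and index commonly queried fields:

-- illustrative only: find slideshow documents via a field inside the JSON
SELECT id, doc->>'Title' AS title
FROM documents
WHERE doc->>'Type' = 'Slideshow';

-- index that field so the lookup stays fast
CREATE INDEX documents_doc_type_idx ON documents ((doc->>'Type'));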

Part of a Slideshow Document

{
    "IsOrdered": true,
    "IsAward": true,
    "ShowAboutOyster": false,
    "FeaturedOnArticles": false,
    "Title": "Best Beach Hotels in Miami",
    "Intro": "<p>A team of Oyster reporters has made multiple trips to Miami to visit nearly 200 hotels. We slept in the beds, lounged by the pools, ate in the restaurants, and even sampled the nightlife, all with an eye toward selecting the most distinguished properties. Here's a list of our favorite beachfront hotels.</p>",
    "Hotels": [
        {
            "Url": "/miami/hotels/the-setai/",
            "Type": "Hotel",
            "Blurb": "<p>Paradise doesn't come cheap. Striking but sober mood-lit design; impeccable service; huge, immaculate rooms; three pools, each a different temperature; and a prime beachside location make the Setai one of the best hotels in Miami. Its restaurants are more about design than food, but several of Miami's best restaurants are just half a block away.</p>",
            "PhotoUrl": "/miami/hotels/the-setai/photos/beach-the-setai-v134081"
        },
        {
            "Url": "/miami/hotels/w-south-beach/",
            "Type": "Hotel",
            "Blurb": "<p>The stunning new 312-room W South Beach -- located on the beach, on the northern outskirts of South Beach -- blends cute comforts, intricate design (that spares no expense), and flawless service. Large, modern rooms; terraces angled to overlook the ocean; elegant landscaping around the pool; a freshly-opened spa -- the W tops the Miami greats.</p>",
            "PhotoUrl": "/miami/hotels/w-south-beach/photos/beach-w-south-beach-opening-may-2009-v289241"
        },

And of course the UI for our CMS is web-based, which means the bulk of the functionality is written in JavaScript, so working with JSON on the front end is extremely easy.

I mentioned before that our documents have a well-defined structure, but a JSON object just consists of arrays, strings, numbers, booleans, and more nested objects. What we needed was a way to define how a given JSON document of a certain type is supposed to look – a JSON schema, if you will. So we used, unsurprisingly, JSON Schema. Much as a DTD does for XML documents, JSON Schema lets you define how a JSON object should be structured, and the definition itself is a JSON object. It provides the basics such as what types of values are allowed for a given property, which properties are required, max and min ranges, enums, and regexes – most of what you’ll need. You can also have nested schemas, so we can define a “Slide” schema, and then say the “Slideshow” schema consists of a title, an intro paragraph, and one or more Slides.

JSON Schema for Slideshow documents

"Slideshow": {
    "allOf": [
        {"$ref": "#/definitions/baseArticle"},
        {
            "properties": {
                "IsAward": {"type": "boolean", "default": false},
                "Slides": {
                    "type": "array",
                    "items": {"$ref": "#/definitions/Slide"}
                }
            },
            "required": ["Slides"]
        }
    ]
}

Well, it’s nice to have your document structure defined, but you have to do something with that information. Namely, you want to be able to validate your documents and get useful error information when validation fails. For that we used the Python jsonschema package. When a writer saves a document in our CMS, it sends a JSON object to the server. The CMS back end validates the document against the relevant schema, and we get back a handy error tree that tells us what went wrong. Since the document structure on the server matches the structure in the CMS UI, it’s not too hard to parse that error tree and match error messages to input fields on the writer’s screen to show some helpful feedback: “This field is required,” “This is an invalid URL,” and so on.
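The validation step itself is only a few lines with jsonschema (a sketch: the schemas dict and document variable are stand-ins for our actual code):

# sketch of the server-side validation; "schemas" and "document" are stand-ins
from jsonschema import Draft4Validator

validator = Draft4Validator(schemas['Slideshow'])
errors = []
for error in validator.iter_errors(document):
    errors.append({
        'path': list(error.path),  # e.g. ['Slides', 0, 'Url']
        'message': error.message,  # e.g. "'Slides' is a required property"
    })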

[Screenshot: error handling in the editor]

UI Concerns

That brings us to the front end of the CMS. There are of course various pages that allow you to search through documents by types and tags, and see the editing history, but most of that was ground we had covered before. A large part of the work done was on the document editor itself – the interface our writers use for creating a single document.

To create the document editor, we wrote a healthy amount of JavaScript. Basically the editor needs to take the concept of structured Documents composed of Widgets and show the user an intuitive UI for writing and editing.

So let’s say you’ve started writing a new Travel Guide article. What happens is an animated paperclip with eyes pops up and says “I see you’re writing a new Travel Guide, need some help?” Wait no, that’s not what happens.

[Screenshot: the document editor]

What you’ll see is a mostly blank document with inputs for some top-level fields and then spaces to populate with different Widgets. The toolbar on the right-hand side has a list of the relevant Widget types, which you can drag into place in the document. All the Widgets can be dragged and dropped into the different places they can go in the Document.

Each Widget contains various fields for text, URLs, checkboxes, drop-down menus, etc. Some fields allow WYSIWYG editing with a whitelisted subset of HTML tags via the WYSIHTML5 editor. We store the HTML as HTML in the JSON rather than as markdown – since we allow such a small list of tags, and the editor outputs well-formed markup, it’s perfectly safe, so we figured why go through an extra encoding and decoding step?

The design for how the editor turns a JSON document into a UI and back again is quite involved, and highly structured, but follows the principles of how the documents are organized. Each Document type, Widget type, and Field type corresponds to a JavaScript class in our editor code, and these classes all inherit from a common base class.

When the editor starts up to edit a Document, we just pass in a JSON object from the server. Each object and nested object in the JSON has a “Type” field which is used to call the proper constructor for the main Document object and its constituent Fields and Widgets to turn the plain JSON objects into class instances. These classes provide, at minimum, a fromJSON method to assign JSON properties to instance properties, and a toJSON method to put the instance properties into a plain JavaScript object that’s ready for serialization. Some classes also provide methods for things like sanitizing input, formatting error messages, providing a word count, etc.

A method for turning a class instance into a simple JSON object

models.Model.prototype.toJSON = function () {
    // return a JSON-friendly object

    var jsonObj = {}, property, jsonProperty, value;

    for (property in this) {
        value = this[property];
        // skip null values, and anything without a toJSON method
        if (value && value.toJSON) {
            // JSON properties are uppercased
            jsonProperty = util.ucFirst(property);

            jsonObj[jsonProperty] = value.toJSON();
        }
    }

    if (this.typeName) {
        jsonObj.Type = this.typeName;
    }

    // delete null properties and empty strings
    for (property in jsonObj) {
        value = jsonObj[property];
        if (value === null || value === '' || Array.isArray(value) && value.length === 0) {
            delete jsonObj[property];
        }
    }

    return jsonObj;
};
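The complementary fromJSON method looks roughly like this (a simplified sketch: util.lcFirst is a hypothetical helper mirroring util.ucFirst, and the real method also constructs Field and Widget instances based on each value’s “Type”):

models.Model.prototype.fromJSON = function (jsonObj) {
    // assign JSON properties to instance properties
    var jsonProperty, property;

    for (jsonProperty in jsonObj) {
        if (jsonProperty === 'Type') {
            continue;  // Type just selects which constructor to use
        }

        // instance properties are lowercased versions of JSON properties
        property = util.lcFirst(jsonProperty);
        this[property] = jsonObj[jsonProperty];
    }

    return this;
};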

So you might rightly ask, “Okay, so you’ve got your document, and you turned that into some other objects, but how do you, you know, do stuff with it?” Well, the “doing stuff” part of the editor – putting in text, dragging things around, deleting and adding things – is all handled with data binding.

If you’re interested at all in building JavaScript-based UIs, you’re undoubtedly familiar with the concept of data binding. In short you have some object or objects which represent your data (in our case, the Document) and the UI which consists of a bunch of DOM nodes. When someone changes the DOM representation of your data, you want that to be updated in the model object, and similarly changes to the data model should be reflected in the UI. Data binding is the practice of doing this in an explicit and automated fashion so that your UI and your data are always in sync.

A number of popular JavaScript application frameworks like Angular, Backbone, and Ember provide data binding as part of what they do, but they tend to be more useful when building large, single page applications as they provide lots of other features such as URL routing, module loading, and dependency injection. They also tend to be pretty opinionated about how your code is structured, and often learning to use them correctly is quite involved.

We really only wanted something to handle the data binding piece – a data binding library rather than a whole application framework. To that end we chose Rivets.js for a number of reasons. Rivets.js is small first and foremost – both in the scope of what it does (it does data binding, and that’s it) and in terms of source code. It also doesn’t care about what your data model looks like – it binds to regular object properties, so your model object can be a plain object or some custom class you wrote, whatever. It’s also easy to learn, easy to use, and easy to extend.

Rivets bindings in use:

<div id="tag-search">
    <div>
        <input type="text" rv-edit="tagSearch:searchTerm" rv-on-keydown="tagSearch:keyDown">
        <i class="fa fa-search search-main-icon" rv-hide="tagSearch:isSearching"></i>
        <i class="fa fa-circle-o-notch fa-spin search-main-icon" rv-show="tagSearch:isSearching"></i>
        <i class="fa fa-plus search-main-icon" rv-on-click="tagSearch:clickAdd"></i>

        <ul id="tag-results" rv-show="tagSearch:results:length">
            <li rv-each-result="tagSearch:results" rv-class-selected="result:selected" rv-on-click="tagSearch:clickResult">
                <span class="tag-icon" rv-addclass="result:type">
                    <i class="fa fa-tag" rv-match-tag="result:type"></i>
                    <i class="fa fa-map-marker" rv-match-location="result:type"></i>
                    <i class="fa fa-cubes" rv-match-category="result:type"></i>
                </span>
                <span rv-html="result:html"></span> ({ result:count })
            </li>
        </ul>
    </div>
</div>

One instance of how we extended Rivets was for templating. Each of our Widget and Field classes has various properties and functionality, and each has a particular way it needs to display on the screen. We wanted to have a snippet of HTML for each Widget to use as a template, and to use Rivets bindings within the template. Frameworks like Angular let you define partial templates, nest them inside each other, and give each template its own isolated scope. Rivets doesn’t support this out of the box, but it turned out to be easy enough to add a new template binder that does just that. It’s pretty basic – it doesn’t do lazy-loading or have sophisticated scoping options, but it works fine for what we needed, and you never have to try to remember what the word “transclude” means.

Partial template binder for Rivets.js

rivets.binders['template-*'] = {
    bind: function(el) {
    },
    unbind: function(el) {
        var children = $(el).children(), boundView;

        //unbind the view from the child element
        if (children.length) {
            boundView = $(children[0]).data('templateBoundView');

            if (boundView) {
                boundView.unbind();
            }
        }

        $(el).html('');
    },
    routine: function(el, value) {
        var modelName, templateName;

        if (!value) {
            console.log('missing value', el);
        }

        templateName = value.template.toLowerCase();
        modelName = this.type.split('-')[1];

        renderTemplate(el, templateName, modelName, value);
    }
};

function renderTemplate(el, templateName, modelName, model) {
    var myEl, html, templateData, child, view;
    myEl = $(el);

    if (myEl.html()) {
        return;
    }

    //insert the html - must have 1 root element
    html = EDITOR_TEMPLATES[templateName];
    if (!html) {
        throw new Error("Can't find template for: " + templateName);
    }
    myEl.html(html);

    //bind the view to the child element
    templateData = {};
    templateData[modelName] = model;
    child = $(myEl.children()[0]);
    view = rivets.bind(child, templateData);
    $(child).data('templateBoundView', view);
}

One other way we had to modify our use of Rivets deals with how the library binds to object properties. Out of the box, Rivets detects changes in the data model by wrapping attribute access with getters and setters. That works fine if the properties you want to observe hold primitive values like Strings and Integers and the like – but for some of our bindings we needed to detect changes on entire objects, and especially arrays.

Thankfully, Rivets allows you to write new adapters to modify how change detection happens. Also thankfully, we only needed to support recent versions of Chrome on the browser side of things, and Chrome now supports a native Object.observe and Array.observe. So when a new Widget gets pushed onto an array inside our Document, the iteration binder that renders the Widgets gets updated automatically.
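The adapter ends up looking something like this (a simplified sketch against Rivets’ adapter interface; a real version needs extra bookkeeping so unobserve can find the exact observer function registered in observe):

rivets.adapters[':'] = {
    observe: function(obj, keypath, callback) {
        if (Array.isArray(obj[keypath])) {
            // re-run the binding when items are pushed or spliced
            Array.observe(obj[keypath], callback);
        } else {
            Object.observe(obj, function(changes) {
                changes.forEach(function(change) {
                    if (change.name === keypath) {
                        callback();
                    }
                });
            });
        }
    },
    unobserve: function(obj, keypath, callback) {
        // simplified away here: unregister the observer added above
    },
    get: function(obj, keypath) {
        return obj[keypath];
    },
    set: function(obj, keypath, value) {
        obj[keypath] = value;
    }
};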

Too often, internal tools don’t get the attention they deserve – they’re left to languish while development resources go toward customer-facing features and operational concerns. In our situation we felt the need to get our content management tools right and do the smart thing rather than the easy thing. It took a lot of hard work from our dev team, but so far it has paid off in terms of productivity and agility. There were some interesting challenges along the way, and we’ll have to continually adapt our tools as the business grows, but I think we’ve set ourselves up in a good position from which to move forward.

Saving 9 GB of RAM with Python’s __slots__

We’ve mentioned before how Oyster.com’s Python-based web servers cache huge amounts of static content in huge Python dicts (hash tables). Well, we recently saved over 2 GB in each of four 6 GB server processes with a single line of code — using __slots__ on our Image class.

Here’s a screenshot of RAM usage before and after deploying this change on one of our servers:

[Screenshot: RAM usage before and after deploying this change]

We allocate about a million instances of a class like the following:

class Image(object):
    def __init__(self, id, caption, url):
        self.id = id
        self.caption = caption
        self.url = url
        self._setup()

    # ... other methods ...

By default Python uses a dict to store an object’s instance attributes. This is usually fine, and it allows fully dynamic things like setting arbitrary new attributes at runtime.

However, for small classes that have a few fixed attributes known at “compile time,” the dict is a waste of RAM, and this makes a real difference when you’re creating a million of them. You can tell Python not to use a dict, and only allocate space for a fixed set of attributes, by setting __slots__ on the class to a fixed list of attribute names:

class Image(object):
    __slots__ = ['id', 'caption', 'url']

    def __init__(self, id, caption, url):
        self.id = id
        self.caption = caption
        self.url = url
        self._setup()

    # ... other methods ...

Note that you can also use collections.namedtuple, which allows attribute access but only takes the space of a tuple, so it’s similar to using __slots__ on a class. However, to me it always feels weird to inherit from a namedtuple class. Also, if you want a custom initializer you have to override __new__ rather than __init__.
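Here’s what that might look like for our Image class (a sketch; the caption-stripping stands in for whatever custom initialization you actually need):

from collections import namedtuple

_ImageBase = namedtuple('_ImageBase', ['id', 'caption', 'url'])

class Image(_ImageBase):
    __slots__ = ()  # don't re-add a per-instance dict in the subclass

    def __new__(cls, id, caption, url):
        # instances are immutable once created, so any setup has to
        # happen before calling __new__ (the strip() is just an example)
        return super(Image, cls).__new__(cls, id, caption.strip(), url)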

Warning: Don’t prematurely optimize and use this everywhere! It’s not great for code maintenance, and it really only saves you when you have thousands of instances.