Oyster.com Tech Blog Insights from our engineering team

Oyster Hotel Panoramas

Here at Oyster we’re all about photos. We like photos of people, pools, property and power outlets. The more photos the better, as our goal is to give you a real and ideally complete picture of the hotels we cover. A single photo – while worth a thousand words – will generally show you a small window of about 40° horizontally and 27° vertically. What if we could give you a 360° view of the hotel? A panorama adds a lot of perspective and helps create a better sense of the space. We weren’t sure what the results would be, but we decided to purchase a couple of GigaPan pano heads, send them out to our photographers, and see what we could make of it.

And we made this. Well that’s kind of cool, not very interactive but cool. When looking for some panoramic software we didn’t really have a big list of requirements. We wanted something that would work well and look good. We didn’t want any QuickTime or Java implementations, sticking to Flash and hopefully some in-progress HTML5/js solution. The first program we investigated was Pano2VR. Pano2VR definitely did the job, but when investigating how we might handle UI changes down the line it looked as though we’d have to generate each panorama over again. As an engineer this seemed like a less than ideal solution.

While not writing it off, I continued to search for panorama software. I came across a website dedicated to panoramas, Panoramas.dk, and the panoramas I was looking at felt smoother. Digging a bit, I found out they use krPano. After giving it a quick run-through it became obvious that this was going to be our solution. It ran off a suite of command-line tools and configurable template files, allowing you to separate your UI into an XML file while packaging up the rest of your panorama into a swf. Now if we decide to change anything in the UI we just have to worry about modifying that one file – perfect!

Taking a look at the pano above you’ll see that we’re almost there. You might’ve noticed a rather visible seam where the left and right ends of the photo meet. Up until now we’d just been working out what panorama software we would use, and it was time to get that stitching quality up. The software package that came with the GigaPan wasn’t cutting it. Sure, we could pass it off to our photo editors to try and do something about it, but the more streamlined this process was the better. GigaPan Stitch was out, and I found AcroPano barely usable: I had to manually set points of similarity between each image as it was added, and I did not get far enough to see what a stitch would look like. I finally stumbled upon Microsoft Research Image Composite Editor, or MR ICE for short. Not only did ICE get rid of the seam, but it also did a far superior job of selecting photos when blending moving objects. If you look closely at the photo below you can see a minivan on the road near the middle of the image that appears to be vanishing, while the ICE-stitched photo correctly composites two images without the moving object.

So far we’re just working with individual panoramas for each hotel, but one great feature of krPano is the ability to create virtual tours. Though it’s not something we’re currently working on, krPano does give us the capability to create a sort of walkthrough of a hotel. There are also some neat features with hotspots and JavaScript callbacks that could create some pretty interesting experiences. Take a look at the first batch of panoramas:

Bell Rock Inn

L’Auberge de Sedona

Best Western Plus Arroyo Roble Hotel & Creekside Villas

Who Can I Trust To Send My Email?

Oyster.com has a large average transaction size.  Customers visit our site and commonly hand over thousands of dollars.  Eventually they will expect their hotel room to be available, but what they want immediately after paying us is an email telling them that we got their order, their room has been confirmed, and everything is OK.  This is a very compelling reason to have a stable email provider.

There is also the ever-growing number of people on our email lists.  They have told us they love our website and would like to purchase something from us if we would just send them a nicely personalized email that maybe has a coupon in it.  That is another great reason to be able to get our emails out reliably and track their performance.

So who can I trust to send my email?  There are more companies around than you would believe who are fighting for the business.  I looked at about a dozen of them.  The criteria used were:

  1. Cost.  Oyster is a startup and doesn’t like to pay BigCompany prices.
  2. Deliverability.  Spammers have made email deliverability a huge pain in the rear.  Getting your message into your customer’s inbox is not trivial.
  3. API.  I am not interested in a solution that I can’t extend.  We push all our services and tools very hard and that’s impossible without a good API.
  4. Features.  Of course, the fewer things I have to write means more time actually working on our website instead of our email campaigns.

The Competitors

The cheapest is Amazon.  $0.10 CPM (cost-per-thousand) will stretch a startup’s budget a long way.  Their deliverability is a question mark though, and they are very low on features.  I am trying to hire an email vendor, not build one.

A step up are companies like MailChimp. $0.50 CPM, but you get a lot of features for your money. Deliverability was a concern, but they will give you a dedicated IP address for $1,000 and that will at least keep the spammers off the exact same address. Things started breaking down, though, when I dug into their API. It is very campaign-centric. How do I pull up a list of the last 10 mails I sent a specific customer? They do get some points for having the best designed website in the industry.

Another bump up the pricing ladder takes you to a fellow NYC startup, Sailthru.  They probably have the worst designed website in the industry, but I heard good things about them so I got a demo account and took it for a spin.  They had a simple and powerful REST API that would play nicely with our codebase.  They definitely understood that deliverability is paramount.    There were some rough edges around the product (like the documentation), but I got quick support whenever I had a question.

Which of these companies hired a web designer?

The top tier includes some of the big names in the industry: StrongMail, ExactTarget, CheetahMail, BlueHornet, Silverpop, and on and on. This is the world of long web demos where they spend half an hour showing you their amazing WYSIWYG editor (which a) sucks and b) we’re not going to use). Then comes a breakdown of every possible report you could generate. The conversation gets awkward when you ask for access to a sandbox where you can try it for yourself.

There were some diamonds in the rough. StrongMail and ExactTarget both had strong offerings. StrongMail can pull data straight from your database. They had a convoluted API, but everything you need is in there. There were also some nice goodies like multi-variate A/B testing support built in. Prices start in the $4-$5 CPM range, but they come down as your volume grows.

The Winner Is...

The competition was fierce.  Salespeople were sending me Yankees tickets and taking me out to fancy restaurants.  At the end of the day we chose Sailthru.  They had some features that were very useful for us.  Every mail we sent was archived on the web and we easily tied links to them into our customer service portal.  They have a developer-friendly API and were happy to add the hooks and functions that we asked for.  The templating system is simple and flexible.  There are a couple of warts, but it is a great system for the money.

So how is it going?

We’ve been using Sailthru for a while now.  Overall I’ve been very happy with them.  Senderscore reports our emails as highly deliverable.

98% Acceptance Rate

 

Besides deliverability, there are a couple of key features that make Sailthru work for us. The first is that their templating system does not get in our way. Our transactional mails go out in different languages and currencies, so it is not going to work if we have to build emails in their system. Most of our Sailthru templates look like this:

{body}

We process everything on our side with Cheetah templates and our internationalization code and then send the entire email up to them.  We get all the benefits of the reports, link tracking, and other goodies in Sailthru, but we can leverage our localization tools to build the mail.
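A minimal sketch of that flow, under some loud assumptions: it uses Python’s standard string.Template as a stand-in for Cheetah, and the template name "passthrough", the payload layout, and the helper names are all hypothetical rather than Sailthru’s actual API. The point is only that the whole localized email is rendered on our side and handed over as a single body variable.

```python
from string import Template

# Hypothetical per-language templates (string.Template stands in for Cheetah).
LOCALIZED_TEMPLATES = {
    "en": Template("<p>Thanks, $name! Your total is $total.</p>"),
    "fr": Template("<p>Merci, $name ! Votre total est de $total.</p>"),
}


def format_money(amount, currency):
    """Very rough currency formatting, for illustration only."""
    symbols = {"USD": "$", "EUR": "€"}
    return "{}{:.2f}".format(symbols.get(currency, ""), amount)


def build_send_payload(email, lang, name, amount, currency):
    """Render the whole email locally, then pass it along as one variable."""
    html = LOCALIZED_TEMPLATES[lang].substitute(
        name=name, total=format_money(amount, currency)
    )
    return {
        "template": "passthrough",  # hypothetical template containing only {body}
        "email": email,
        "vars": {"body": html},
    }
```

Because the provider-side template is nothing more than a placeholder, all language and currency decisions stay in our own code.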

International emails mean basically all the content changes.

The second is that their API is simple and complete. It is a REST API and we wrote our own interface in Python, but they have a bunch of clients available. We use it for everything. Sending transactional mails is as easy as POSTing the body of the mail. Campaign mails are a two-step process because they are heavily personalized.

The issue is: how do you get different content for 100,000 people into your email provider? A solution like StrongMail can connect right to your database, but I haven’t seen anyone else who does that. Sailthru has data feeds that you can set up, but they are only good for specifying the same content for everyone. Sailthru’s solution is their API ‘job’ call. This is used for a lot of bulk tasks like massive updates to your lists. In this case it allows you to set data fields on all your users.

Every night we build up a file with customized article selections for each user and post it to Sailthru.  A callback lets us know it has been processed and we are ready to start a campaign mail whenever it is convenient for us.  My data fields look like:

Data fields get populated in Sailthru

And they result in an email that looks like this:

Email is a critical component of our business, and we evaluate our provider regularly. So far Sailthru’s combination of deliverability, features, API and price point has been tough to beat.

Oyster Shots on the Front End

In our last post Ben brought you up to speed on some of the inner workings of our latest addition to the site, Oyster Shots. Building the user interface for this new feature presented its own set of challenges. With Oyster Shots we wanted to create as immersive an experience as possible, allowing users to navigate our mountains of photographic content in a new and fun way.

Photo Sizing

One of the main goals we had with Oyster Shots was to provide the best photo-browsing experience to users with a wide range of screen resolutions: ranging from nerds like us that have huge monitors, to laptop displays, and even to those desktop displays still kicking around with 1024×768 resolution. We do this both on the photo detail view and the results view using a couple of different techniques:

The Client-Side Part

First off, we scale the photos as you resize your browser window using a combination of CSS and JavaScript. For the photo detail view, we have a fluid layout where the sidebar has a fixed width and the photo expands to fill as much space as is available. For the result view we used percentage measurements to define the width of image columns. At most window sizes four columns of photos looked pretty great, but at some smaller screen resolutions, the photos got too small. To address that we used JavaScript to change the number of columns based on window size: if the window size is less than 1410 pixels, you get three columns; if it’s more you get four. To keep the spacing between photos consistent regardless of size, we used an outer container with its width set by percentage and an inner box with fixed margins in pixels.

.photo-result-container {
    display: inline-block;
    max-width: 610px;
    min-width: 245px;
    vertical-align: top;
    /* percentage width to fit four photo results per row */
    width: 25%;
}
.photo-result {
    background-color: #fff;
    border: 1px solid #ccc;
    /* margins on the inside keep the spacing consistent,
    and don't mess with the width of the container */
    margin: 0 16px 44px 0;
    padding: 8px 0;
    position: relative;
    vertical-align: top;
}

We also use CSS and JavaScript to resize the photos themselves. Setting the image’s width to 100% and leaving the height at “auto” means the image will fill its container horizontally and maintain its original aspect ratio. Obviously we don’t want to use the same image file for every possible screen size: using a huge image and scaling it down would add undue page weight, while using a small image and scaling it up would look pretty terrible.

A Photo Detail Page for high-resolution display, and low-resolution

We have a defined set of image sizes that are available for all of our photos. Using a custom jQuery plugin, we check the size of the images when the browser window resizes to see if the image file needs to be replaced. If the new display size of the image is larger than its native resolution, we swap the src attribute for a larger image so that the scaling is cleaner. By tracking whether a user is growing or shrinking the browser window, we can start to load new image sizes early.

Conventional wisdom seems to be that you’re not supposed to scale images in the browser. The two most oft-cited reasons are:

  1. You serve larger images than are needed: We dealt with this by having multiple sizes and serving the one that most closely matches the current display size.
  2. Browsers don’t do a good job of scaling images: We found that not to be the case. Rescaling images in a modern browser to a size close to its native resolution yields results that are quite acceptable.
One of these images was rescaled in Photoshop using bicubic resampling, one was rescaled in Internet Explorer.  Can you tell which is which?

The Server-Side Part

Changing image sizes as a user resizes his or her browser window is one thing, but ideally we’d like to serve up the optimal image size on page load as well. To do this, we set a cookie with the user’s display size. This cookie is written on page load and every time the browser window is resized. We use this cookie on a number of pages that have fluid-sized images, including the Oyster Shots results page, to determine how large an image will be displayed when the HTML is loaded. Starting with the dimensions of the browser window, we do a few calculations: subtracting the height of the page header, the width of the sidebar and so on, to determine how much page real estate the photo will have to display. We find the closest available image size, and generate the HTML on the server side to use that image. That way when the resizing plugin on the client side takes over, the image does not need to be loaded again. Of course this can fail if a user is browsing in multiple windows, or clears cookies, for instance, but that is a minority of cases, and the image would just be reloaded once the client side sizing logic kicks in.
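The server-side calculation can be sketched roughly as follows. This is an illustration only: the cookie format ("1440x900"), the sidebar width, and the list of available widths are assumptions, and the sketch only accounts for width, not the header height.

```python
# Hypothetical layout constant and image widths, for illustration only.
SIDEBAR_WIDTH = 300
AVAILABLE_WIDTHS = [320, 640, 960, 1280, 1920]


def parse_display_cookie(value, default=(1024, 768)):
    """Parse a cookie like "1440x900"; fall back if missing or malformed."""
    try:
        width, height = (int(part) for part in value.split("x"))
    except (AttributeError, ValueError):
        return default
    return width, height


def pick_image_width(cookie_value):
    """Return the available image width closest to the photo's display width."""
    window_width, _window_height = parse_display_cookie(cookie_value)
    photo_width = max(window_width - SIDEBAR_WIDTH, 0)
    return min(AVAILABLE_WIDTHS, key=lambda w: abs(w - photo_width))
```

When the cookie is missing (a first visit, or cleared cookies), the fallback size is used and the client-side plugin corrects the image once it takes over.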

Quick Browsing

Another goal we had was to be able to browse photos quickly. From the Oyster Shots results page, once you click a photo to get to the detail view, you can start browsing through the results by clicking the arrows at the top right, clicking the photo itself, or using the left and right arrow keys on your keyboard. You may notice how quick it is to navigate from one photo to the next. This is because the detail pages are all loaded via Ajax, which allowed us to make a few key performance improvements.

The advantages of using Ajax to update only a portion of the page content, rather than causing a complete refresh, are well-known. What we did to go beyond that was to load multiple photo detail pages in a single request. The HTML for a dozen photo pages is stored in memory. That way, navigating to the next or previous photo in a series usually only requires reading a JavaScript variable to get the HTML content.

Loading multiple photo pages at once also allows us to preload images as well. When you’re looking at a photo detail page, we can look at the source of the next or previous image in the series and start to load it ahead of time.

The photo pages are stored in memory as objects that have a handful of properties, including a string that is the HTML for the page itself. Storing and manipulating a large number of photo pages in memory called for a couple of unique solutions, the first of which was a basic issue of organization. In the many ways that Oyster Shots works with these objects, we will at times need to access a particular photo page arbitrarily according to the unique image id, the id of the hotel that image belongs to, or the index (the order in which the image appears in the result set).

To allow for this flexibility, we maintain a single container object that stores the page objects and is keyed by the index, and two supplemental indexes where page objects are keyed separately by image id and hotel id. We used an object rather than an array for the primary collection.

//set up the main collection and the two supplemental indexes:
var images = {};  //main collection
var imageKeys = {}; //indexed by image id
var hotelIdImageKeys = {}; //indexed by hotel id

//update the indexes when we add an image to the collection:
function addImage(index, imageId, imageObj) {
    images[index] = imageObj;
    imageKeys[imageId] = index;
    var hotelId = imageObj.hotelId;
    if(!hotelIdImageKeys[hotelId])
        hotelIdImageKeys[hotelId] = [index];
    else
        hotelIdImageKeys[hotelId].push(index);
}

//now looking up image pages by image id or hotel id is easy:
function getImageById(imageId) {
    if(!imageKeys.hasOwnProperty(imageId))
        return false;
    var index = imageKeys[imageId];
    return images[index];
}
function getImagesByHotelId(hotelId) {
    if(!hotelIdImageKeys.hasOwnProperty(hotelId))
        return false;
    var hotelImages = [];
    var indexes = hotelIdImageKeys[hotelId];
    for(var i = 0, len = indexes.length; i < len; i++)
        hotelImages.push(images[indexes[i]]);
    return hotelImages;
}

Maintaining these indexes in this way allows us to quickly find a particular page object by its id, its hotel id, or its position in the search results with just a couple of object attribute lookups rather than looping through the entire set of images.

Another issue that arose is how to change the stored HTML, if needed, once it’s been retrieved from the server. We store the HTML as strings rather than DOM fragments because string manipulation is faster than DOM manipulation (even in a fragment) and strings take up far less memory than DOM fragments. Unfortunately that means that all of jQuery’s handy DOM methods are off the table for working with this content. To get around this, we marked sections of the page that would need to be changed on the fly with HTML comments. Then it was just a simple find and replace operation with no DOM manipulation required.

/*
    the markers are HTML comments like:
    <!--begin section-->
    and
    <!--end section-->
*/
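function replaceHtmlAtMarkers(html, replace, beginMarker, endMarker) {
    var begin = html.indexOf(beginMarker);
    //search for the end marker only after the begin marker
    var end = begin === -1 ? -1 : html.indexOf(endMarker, begin + beginMarker.length);
    //check for missing markers before adjusting the end position
    if(begin === -1 || end === -1)
        return html;
    end += endMarker.length;
    //splice by position so "$" in the replacement isn't treated specially
    return html.substring(0, begin) + replace + html.substring(end);
}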

There were tons of problems we had to solve while building out the front end for Oyster Shots, and these are merely examples of a few of them. If you’re finding our new blog useful or interesting, leave a comment and let us know!


*Turn your monitor upside-down or stand on your head to read the answer:
ɹǝɹoןdxǝ ʇǝuɹǝʇuı uı pǝzısǝɹ sɐʍ ʇɟǝן ǝɥʇ uo ǝbɐɯı ǝɥʇ :ɹǝʍsuɐ

How our photo search engine really works

Our recently-launched Oyster Shots is a new tool you can use to search all of our stunning hotel photos by keyword and hotel attributes (location, rating, and hotel type). So far we have about 300,000 published photos, so there’s a lot to search. Some examples to get you started: New York City Restaurants and Bars, Dominican Republic Beaches, and Luxury Hotel Bathrooms.

If you want to know how to use Oyster Shots, just go to the landing page and watch the “How does it work?” video on the left. But if you’re a techie and want to know how it really works under the hood, read on.

Tagging our photos

One of the main questions up-front was whether we’d take a full-text or a tagging approach. Full-text search sounds nice, because you can just plug in a library and let it do the dirty work. But we don’t have enough textual data for each photo to make full-text results relevant — all we have is the photo’s original folder name, which is also used to generate the captions.

For example, a pool photo might be put into a folder called “Pool”, or if the photographer was feeling nice, perhaps “Pool/Infinity Pool”. So we generate one or two tags from that data. And we also use the metadata associated with the photo’s hotel, such as the hotel name, type, and location.
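As a rough sketch of that step (the hotel field names and the function name here are hypothetical, not our actual schema), deriving tags from a folder path plus hotel metadata might look like:

```python
def tags_for_photo(folder, hotel):
    """Build a tag list from a photo's folder path and its hotel's metadata."""
    # "Pool/Infinity Pool" -> ["Pool", "Infinity Pool"]
    tags = [part.strip() for part in folder.split("/") if part.strip()]
    # Hypothetical hotel fields, added after the folder-derived tags.
    for key in ("name", "type", "location"):
        value = hotel.get(key)
        if value and value not in tags:
            tags.append(value)
    return tags
```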

Armed with tags, the next step was to reduce duplication and remove pointless tags with only a few photos, like “Villa Suite Grand Ocean View Room With Double King Bed”. We did this via a combination of a manually-prepared “tag merge spreadsheet” and some Pythonic scripting goodness.
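A minimal sketch of the merge-and-prune idea, assuming the spreadsheet boils down to a mapping from raw tag to canonical tag. The mapping contents, the min_photos threshold, and the function name are illustrative assumptions:

```python
from collections import Counter

# Hypothetical rows from the merge spreadsheet: raw tag -> canonical tag.
TAG_MERGES = {
    "Swimming Pool": "Pool",
    "Pools": "Pool",
}


def clean_tags(photo_tags, min_photos=2):
    """Apply merges, then drop tags used by fewer than min_photos photos."""
    merged = [
        sorted({TAG_MERGES.get(tag, tag) for tag in tags}) for tags in photo_tags
    ]
    counts = Counter(tag for tags in merged for tag in tags)
    return [[tag for tag in tags if counts[tag] >= min_photos] for tags in merged]
```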

The tags are all stored in singular form, but the search code recognises plural forms too. This is done via a lookup table created using a pluralize() function (see ActiveState recipe).
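To illustrate the lookup-table idea, here is a small sketch. The pluralize() below is a deliberately naive stand-in written for this example, not the ActiveState recipe, and build_plural_lookup is a hypothetical helper name:

```python
def pluralize(word):
    """Naive English pluralizer; a stand-in for a real recipe."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and word[-2:-1] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def build_plural_lookup(tags):
    """Map each lowercase singular and plural form to its stored singular tag."""
    lookup = {}
    for tag in tags:
        lookup[tag.lower()] = tag
        lookup[pluralize(tag.lower())] = tag
    return lookup
```

Building the table once up front means a search for "beaches" is a single dict lookup rather than a call to a stemmer at query time.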

Our tagging isn’t perfect, of course. One thing it doesn’t yet handle is synonyms — our autocomplete provides clues for what the user can type, but at present we more or less force them to use our keywords. For example, you have to type “new york bathrooms” and not “new york restrooms”. Also on the radar is to allow users to add their own tags.

Autocomplete

The first piece of back-end technology you use when doing an Oyster Shots search is the autocomplete. We use Ajax (via jQuery) to fetch the autocomplete results as you type. Results need to come back fast, and our Python backend can handle about 3000 calls per second (though of course you won’t get that across a real HTTP connection).

An autocomplete lookup uses sorted indexes with Python’s built-in “bisect” module to do a binary search. Each binary search is of course O(log(N)), where N is the total number of items in the index. Python’s “bisect” module has a fast C version just to speed things up even further.
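A prefix search over a sorted index can be sketched with bisect like this. The PrefixIndex class and its plain-string values are assumptions made for the example; our real entries are richer objects:

```python
from bisect import bisect_left


class PrefixIndex:
    """Sorted (key, value) pairs searched by key prefix with a binary search."""

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda entry: entry[0])
        self.keys = [key for key, _value in self.entries]  # built once

    def search(self, prefix, limit=10):
        """Return up to `limit` values whose key starts with `prefix`."""
        start = bisect_left(self.keys, prefix)  # first key >= prefix
        results = []
        for key, value in self.entries[start:start + limit]:
            if not key.startswith(prefix):
                break
            results.append(value)
        return results
```

Because every key that starts with the prefix sorts at or after the prefix itself, the matches form one contiguous run beginning at the position bisect_left returns.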

We have one index for whole-name matches and one for single-word matches. For example, if we looked at just the Las Vegas sections of the indexes we’d see something like this:

name_index = [
    ('lasvegas', LocationEntry('Las Vegas')),
    ...
]

word_index = [
    ('las', LocationEntry('Las Vegas')),
    ('vegas', LocationEntry('Las Vegas')),
    ...
]

If your query is “las vegas”, both indexes will match to give you Las Vegas, but if you just type in “veg” or “vegas”, the word_index will match and give you results. After the binary search we merge and sort the results: name matches are prioritized over word matches, and each match type has its own priority.
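The merge step might look something like the sketch below. It assumes each match arrives as a (priority, display name) tuple where a lower number sorts first; that representation, and the merge_matches name, are guesses made for illustration:

```python
def merge_matches(name_matches, word_matches, limit=10):
    """Merge results: name matches first, then by priority, without duplicates.

    Each match is a (priority, display_name) tuple; lower priority sorts first.
    """
    ranked = [(0, priority, name) for priority, name in name_matches]
    ranked += [(1, priority, name) for priority, name in word_matches]
    merged, seen = [], set()
    for _match_type, _priority, name in sorted(ranked):
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged[:limit]
```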

If there are no results, the autocompleter switches to multi-tag matching mode, which chops words off the end of the query until there are matches (if any), and then combines the first result of that with results from matching on the rest of the query to produce multi-tag matches. For example, if you type “miami p”, nothing will match directly, so we combine the first result for “miami” with results from “p”, giving “Miami + Pool”, “Miami + Presidential Suite”, etc.
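Here is a rough sketch of that fallback. The search argument is any function from a query string to a list of tag names; toy_search and the TAGS list are toy stand-ins for the real autocompleter:

```python
def multi_tag_matches(query, search):
    """Chop words off the end of the query until the head matches something."""
    words = query.split()
    for split in range(len(words) - 1, 0, -1):
        head = search(" ".join(words[:split]))
        if not head:
            continue
        # Combine the first head result with every match for the remainder.
        tail = search(" ".join(words[split:]))
        return ["{} + {}".format(head[0], match) for match in tail]
    return []


# Toy search function over a tiny, made-up tag list.
TAGS = ["Miami", "Pool", "Presidential Suite", "Beach"]


def toy_search(query):
    return [tag for tag in TAGS if tag.lower().startswith(query.lower())]
```

With these toy tags, multi_tag_matches("miami p", toy_search) combines "Miami" with the matches for "p".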

The Sound Pool at The Standard Miami

Sorting and searching

The first step here was to read Donald Knuth’s The Art of Computer Programming, Volume 3, Sorting and Searching — I did this one morning on my subway commute to work, and then jotted down the answers to all the exercises on the commute home.

Seriously though, the authors of Python have read Knuth for us, and Python’s dict implementation is probably one of the fastest general-purpose hash table implementations on the planet. Our search code makes use of them liberally — dict lookups are fast, RAM is cheap, so why not cache 300,000 Image objects with a bunch of dicts? The two main caches map tags to images and hotels to images. Each value is a pre-sorted list of Image objects to speed up the sorting and pagination process.
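A stripped-down sketch of the two caches follows. The Image fields and the rank sort key are hypothetical; the idea is simply to sort once when building the cache so that each page of results is a plain slice:

```python
from collections import defaultdict, namedtuple

# Hypothetical fields; the real Image objects carry more data.
Image = namedtuple("Image", ["id", "hotel_id", "tags", "rank"])


def build_caches(images):
    """Build tag -> images and hotel -> images dicts with pre-sorted lists."""
    by_tag, by_hotel = defaultdict(list), defaultdict(list)
    for image in sorted(images, key=lambda im: im.rank):  # sort once, up front
        by_hotel[image.hotel_id].append(image)
        for tag in image.tags:
            by_tag[tag].append(image)
    return by_tag, by_hotel


def page_of(results, page, per_page=2):
    """Pagination is just a slice, because the list is already sorted."""
    start = page * per_page
    return results[start:start + per_page]
```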

The code also makes heavy use of Python’s built-in set objects to keep track of what we’ve seen, various filter options, etc. Sets are hash tables much like dicts under the covers, so lookups and insertions are both O(1) on average and fast.

Performance

Choosing good algorithms and data structures is much more important than which language you’re programming in. We chose Python because it allows us to iterate and write features quickly. Python compiles to simple and fairly unoptimized byte code, so it’s not known as a “fast” language, but because its data structures and built-ins are heavily optimized and written in C, it’s “fast enough”. (In fact, C++ hash_maps are significantly slower than Python dicts.)

As mentioned above, we’ve chosen our data structures carefully and made heavy use of dicts to cache things. But you have to be careful: partly because Python is flexible and dynamically typed, it’s easy to hide an O(N) or even an O(N²) operation where you meant an O(1) one. For example, consider the following (more or less real) code:

for image in images:
    if image.hotel_id in hotel_ids:
        ...

If hotel_ids is a set, as it should be here, this is an O(len(images)) operation, but if it’s a list, we’re talking O(len(images) * len(hotel_ids)), which could be a very significant difference. You can shoot yourself in the performance foot in any language, but Python’s conciseness means it’s relatively easy to hide a really slow operation in a comparatively innocent-looking line of code.
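One defensive pattern is to convert to a set once at the top of the function, so the loop stays linear whatever the caller passes in. A small sketch, using hypothetical (image_id, hotel_id) tuples in place of real Image objects:

```python
def images_for_hotels(images, hotel_ids):
    """Filter (image_id, hotel_id) pairs by hotel, in linear time."""
    hotel_ids = set(hotel_ids)  # O(len(hotel_ids)) once, then O(1) per lookup
    return [image for image in images if image[1] in hotel_ids]
```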

Up next

So there you have it (or some of it). In our next blog entry, our lead front-end developer Alex will describe the fancy CSS and JavaScript he uses to make the front-end side of Oyster Shots so slick.

Welcome to the Oyster Tech Blog

Oyster.com is building the best online resources and community for hotels and travel.  We want everyone to be able to find the hotel that is perfect for them.

For the engineering team, that means building tools and features to showcase terabytes of photos, combine hotel facts from a multitude of sources, and always find you the best price.

This blog is going to share the dirty details of how we do it.  How do you build a content management system that allows writers to do their job from a hut in Jamaica with no internet access?  How do you expose your photo database to the world in a way that is easy and fun?  Oh, and by the way, it’s going to be on national news tomorrow night and your servers better be able to handle the load.

We’re excited to get started.  First up is how we built Oyster Shots!