Oyster.com Tech Blog Insights from our engineering team

Automated virtual tour

Welcome back to the second part of our three-part Computer Vision series at Oyster.com. If you have not read the first part, we recommend checking it out: it shows how HDR (High Dynamic Range) panoramas are produced at Oyster and includes some comparisons between Oyster’s panoramas and Google’s.

In this part, I will share the core work of our recently released feature Virtual Tours, also known as walkthroughs.

I will first introduce walkthroughs and explain why we needed to build our own framework for this purpose. I will then jump right into the details of the computer vision system that generates virtual walkthroughs from sets of panoramas. Lastly, I will show some walkthrough results obtained with this approach, covering both indoor and outdoor scenarios.

Generating walkthroughs from panoramas

A walkthrough is a set of connected panoramas where users can navigate from one panorama to another. This type of feature provides a more interactive experience for users looking at remote destinations. Some examples of walkthroughs can be checked out on Oyster.com, for example the walkthrough for Trump Soho Penthouse or the walkthrough along a Scrub Island Beach Pool. There are different approaches to generating those walkthroughs.

One common approach is to use depth information to reconstruct the 3D scene for each panorama spot, as Matterport does. This provides seamless transitions from scene to scene. However, it comes with a high cost for purchasing the special device and for model hosting, and its image quality is not comparable to that of standard DSLR cameras. The more economical and more popular approach uses 2D panoramas to build walkthroughs. These can be viewed with popular 360 image viewers such as Krpano that allow users to walk between panoramas. The connections between panoramas can be built manually, but Oyster shoots thousands of walkthroughs a month and needed an automated solution. We developed a fully automated framework that uses Computer Vision to find and connect all related panoramas into one complete walkthrough.

Automating walkthrough process

Our process starts with a set of HDR panoramas as input (Figure 1 shows a list of test panoramas taken at TripAdvisor Office@NYC). It finds the panoramas that are connected, estimates the links between those panoramas, and integrates those links into a Krpano virtual tour project.

Figure 1: Set of equirectangular panoramas as input

Given two panoramas, which we call 1 and 2, creating a virtual tour from them comes down to finding the location of camera 1 in panorama 2’s spherical coordinates and the location of camera 2 in panorama 1’s spherical coordinates. Spherical coordinates are defined by a horizontal angle ath and a vertical angle atv, and planar coordinates by a horizontal value x and a vertical value y. The reprojection from planar to spherical is given by ath = (x/width - 0.5) * hfov and atv = (y/height - 0.5) * vfov, with hfov = 360 and vfov = 180 for our panoramas. Projecting each sphere onto planes gives a set of local planar slices, so we can look for the location of camera 1 in all local slices of camera 2’s model and, vice versa, the location of camera 2 in all local slices of camera 1’s. The original problem of finding the camera location for each pair of panoramas therefore becomes a search for the camera location in all slices of the other panorama. Given a set of n panoramas and m local slices per panorama, there are n * (n - 1) / 2 panorama pairs and m * m slice-slice matchings per pair (16 for m = 4), so we need to carry out n * (n - 1) * m * m / 2 slice-slice matchings in total.
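The two coordinate conversions used throughout this post can be sketched in a few lines of plain Python. This is only an illustration of the formulas quoted in the text; the function names and the 4000 x 2000 panorama size are assumptions made for the example.

```python
def planar_to_spherical(x, y, width, height, hfov=360.0, vfov=180.0):
    """Map an equirectangular pixel (x, y) to spherical angles (ath, atv) in degrees."""
    ath = (x / width - 0.5) * hfov
    atv = (y / height - 0.5) * vfov
    return ath, atv


def spherical_to_planar(ath, atv, width, height, hfov=360.0, vfov=180.0):
    """Inverse mapping: spherical angles in degrees back to pixel coordinates."""
    x = (ath / hfov + 0.5) * width
    y = (atv / vfov + 0.5) * height
    return x, y


# Hypothetical 4000 x 2000 panorama: the image centre looks straight ahead.
print(planar_to_spherical(2000, 1000, 4000, 2000))  # (0.0, 0.0)
print(spherical_to_planar(90.0, 0.0, 4000, 2000))   # (3000.0, 1000.0)
```

The two functions are exact inverses of each other, so a hotspot found in pixel coordinates can be converted to angles and back without loss.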

Figure 2 illustrates the top-down view between panorama 1 (in blue, having camera center O1) and panorama 2 (in yellow, having camera center O2). Each panorama sphere is warped onto a cube, which appears as a square looking top-down with 4 sides FRONT-RIGHT-BACK-LEFT.

Figure 2: Camera models for two panoramas

For each image-to-image matching of two slices, the camera location is the center of the pin-hole camera model, and the geometry between the two image models is an epipolar geometry. If we look at images 1-FRONT and 2-RIGHT, the camera centers O1 and O2 form a line that intersects the two images at E1 and E2, the epipoles in epipolar geometry. The epipole E1 on image 1 is the pixel coordinate (x1, y1) of camera center O2 on image 1, and the epipole E2 on image 2 is the pixel coordinate (x2, y2) of camera center O1 on image 2.

Without loss of generality, let us assume sides Front and Back of camera 1 share overlapping views with sides Right and Left of camera 2 in this example. Our process will need to locate E1 (image of camera center O2 in panorama 1) and E2 (image of camera center O1 in panorama 2). The coming sections will describe how these two values can be calculated based on the overlapping plane and corresponding points detected in both images. The whole process for this image-image matching has 4 main steps: construct local views, find corresponding points, find hotspot location in local coordinate, and transform local to spherical coordinate. Those steps are illustrated in Figure 3.

Figure 3: Automated process for generating virtual tour

Construct local views

A 360 panorama is a representation of a sphere whose center is at the camera location. The original equirectangular format of the input panorama can be divided into 6 non-overlapping rectangular local views representing the Up, Down, Left, Right, Front, and Back sides of the cube covering the 360 sphere of the panorama scene. This division enables us to use the epipolar geometry constraints of two image planes sharing overlapping views to find epipoles, which are the images of camera positions in our case. We leave out the Up and Down views since they do not contain hotspots for the virtual tour.

An equirectangular panorama can be split into 6 rectangular images using the krpano tool; the result is shown in Figure 4.

krpanotools64.exe makepano image.tif normal.config

Figure 4: Slicing panorama into local views

This slicing warps all pixels on the sphere (with spherical coordinates ath, atv) onto pixels (with planar coordinates x, y) on the sides of the bounding cube. Each side’s dimension is twice the radius of the sphere. The spherical-to-planar projection is defined by x = (ath / hfov + 0.5) * width and y = (atv / vfov + 0.5) * height; its inverse is what we will use to project our estimated camera planar coordinates back to spherical coordinates.

Find corresponding points

In the pin-hole camera model, a pixel coordinate on an image represents the set of points lying on the light ray from the camera center through that point in 3D (and on to infinity). Another camera viewing the same scene sees that ray as a line; in other words, in epipolar geometry a point in one camera maps to a line in the other camera, called the corresponding line, as illustrated in Figure 5. All corresponding lines share a common property: they pass through the epipole. That is, given all points on image 1-FRONT in Figure 2, we can project their corresponding lines onto image 2, and all these lines intersect at epipole E2 on image 2; the same holds for epipole E1 on image 1. To find corresponding lines from image points in pixel coordinates, we first need the fundamental matrix of the epipolar geometry. The fundamental matrix is a rank-2 3x3 matrix that represents the relative pose (translation + rotation) between the Left and Right images (or vice versa) as well as the intrinsic parameters of the two cameras. It has 7 parameters: 2 for each epipole and 3 for the homography that relates the two image planes. The convenient property of the fundamental matrix is that it can be calculated from a sufficient number of corresponding points. Corresponding points here are pixels that appear in both images and point to the same 3D real-world point.

Figure 5: Epipolar geometry of the overlapping view of 2 camera models
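The property that every corresponding line passes through the epipole can be checked numerically. The sketch below builds a toy fundamental matrix as F = [e2]_x H, a standard rank-2 construction, from a made-up epipole e2 and a made-up homography H (both are assumptions for illustration, not values from our pipeline), and verifies that the line F p contains e2 for several points p on image 1.

```python
def cross_matrix(v):
    """Skew-symmetric matrix [v]_x, so that [v]_x w equals the cross product v x w."""
    x, y, z = v
    return [[0, -z, y], [z, 0, -x], [-y, x, 0]]


def mat_mul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


# Hypothetical epipole on image 2 (homogeneous pixel coordinates) and an
# arbitrary invertible homography H.
e2 = [320, 240, 1]
H = [[1, 0, 5], [0, 1, -3], [0, 0, 1]]
F = mat_mul(cross_matrix(e2), H)

points_on_image_1 = [[10, 20, 1], [200, 50, 1], [400, 300, 1]]
for p in points_on_image_1:
    line = mat_vec(F, p)           # corresponding line (a, b, c) on image 2
    print(line, dot(line, e2))     # the second value is 0: the line contains e2
```

In practice F is estimated from matched points rather than constructed, but the same incidence test applies to the estimated matrix.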

With the theory established, our problem of connecting panoramas into a virtual walkthrough comes down to finding corresponding points on each pair of slice images from the two panoramas. For this task, we resort to feature matching, which is a robust approach for dynamic views. The matching consists of three main steps: feature detection, feature matching, and feature pruning.

Feature detection

Feature detection is the process of running pre-defined feature filters on an image to discover features that are discriminative and view invariant (for example, points at corners or edges). OpenCV implements a collection of robust local features such as FAST, STAR, SIFT, and SURF (please check OpenCV’s documentation for more available feature detectors).

An example of how SIFT can be used for detecting local features is shown below. (Note: since SIFT and SURF are patented features, you should use the other free features provided by OpenCV if you want to avoid license fees.)

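import cv2  # xfeatures2d is part of the opencv-contrib build

# Detect keypoints (at most nfeatures) and compute their descriptors inside the mask.
detector = cv2.xfeatures2d.SIFT_create(nfeatures=2000, nOctaveLayers=3, 
    contrastThreshold=0.03, edgeThreshold=10, sigma=1.6)
kp, des = detector.detectAndCompute(self.image, self.mask)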

In the code above, two parameters are particularly important: nfeatures and contrastThreshold. Reducing the contrast threshold or increasing the maximum number of features gives us more features, and vice versa. These values should be chosen based on the nature of the image data we are dealing with and the focus of our detection process. In our case, the distance between camera locations in real-world coordinates is unknown and could be large, so a more appropriate design is to extract as many features as we can in the detection phase, then filter out irrelevant features in the feature matching and pruning phases. More practical decisions like this are discussed in our last section, along with our coarse-to-fine approach to extract features efficiently within a constrained processing time.

The following figure shows the result of our feature detection process on two images, one from each panorama. Local features are drawn in different colors; as we can see, they are mostly detected on corners and edges. Some features seem to be detected on both panoramas: those are the corresponding points we are looking for.

Figure 6: Local features detected from the two images

Feature matching

At each location where a feature is detected, a set of attributes is extracted to define that feature; these are called feature descriptors. Some of the most common feature descriptors implemented in OpenCV are SIFT, SURF, HOG, BRIEF, and BRISK (again, please refer to OpenCV’s documentation for more available feature descriptors). These descriptors can be seen as a normalized (orientation-wise), vectorized aggregation (spatial-wise) of primitive filter responses. For example, the SIFT descriptor is a 128-dimensional vector aggregated from 4 x 4 location bins in left-right, top-down spatial order, where each bin is represented by accumulated gradients grouped into 8 orientation bins.

Feature matching is the process of finding the features that appear in both images, given their feature descriptors (in the form of multidimensional vectors). With tens of thousands of features detected in each image, an efficient approach is to index all features of one image with a kd-tree (k-dimensional binary tree); matching against features from the other image is then done by traversing the indexed trees. To minimize false negatives in matching, knn (finding the k nearest matches for each feature) is also used. All of this can be done with the OpenCV API in a few lines of code

FLANN_INDEX_KDTREE = 1  # FLANN's identifier for the kd-tree index
matcher = cv2.FlannBasedMatcher(dict(algorithm=FLANN_INDEX_KDTREE, trees=5), dict(checks=50))
matches = matcher.knnMatch(self.pano_image_from.des, self.pano_image_to.des, k=2)

In this context, the number of trees, the number of traversal checks, and the number k of nearest neighbors are chosen based on the accuracy-time tradeoffs of our particular application. The following figure shows the result of this matching step. Because this design aims to reduce false negative matches, more matches are returned than needed; they will be cleaned up in the following feature pruning section.

Figure 7: Feature matching results

Feature pruning

Feature pruning is a series of filters applied to matched features to remove those that passed our previous matching process (which was based purely on similarity in appearance) but are not true corresponding features. Such wrongly matched features are very common. Most of the time they are features on similar or identical objects (e.g. features on a brick wall share the same appearance across the whole wall, and features on two different corners of a board look alike, just rotated versions of each other), or features that are noise (normally random dots on clean backgrounds that appear so frequently that they produce many matches with similar matching distances).

The success of a feature matching system is mostly determined by how well feature pruning is implemented. If it is too strict, we might end up with insufficient matches; if it is too loose, it not only increases our processing time but also passes incorrect matches on to later phases.

Pruning filters should be designed based on the nature of the data. For our problem, there are 4 main pruning filters that can be used: ratio filter, cross-match filter, orientation consistency filter, and spatial consistency filter.

  • Ratio filter is designed to remove the feature noise mentioned above. Given all matches of a feature along with their matching distances, a feature is considered noise when its best and second-best matches have too similar distances: if a feature can be matched with similar confidence to 2 different features, it is not distinctive (e.g. a random dot on a wall could be matched with similar confidence to other random dots). This filter computes the ratio of the best match’s distance to the second-best match’s distance, and that ratio has to fall under a certain value for the feature and its match to be valid. This pruning technique was first proposed by David Lowe (the author of the SIFT feature), and 0.7 is a commonly used ratio threshold, but this type of filter can be used effectively with any features that have explicit similarity scores.

  • Cross-match filter checks for a mutual matching result between a pair of features; in other words, 2 features are considered correctly matched when each feature appears in the match list of the other.

  • Orientation consistency filter is specific to our problem, where there is no rotation in the transformation from one image to another: the cameras are placed and locked on a tripod when taking the photos, so images have the same upright orientation wherever the tripod is placed. This filter checks whether the dominant orientation of a feature and that of its match are close. It can be applied with any feature detector that calculates a dominant orientation.

  • Spatial consistency filter is similar to orientation consistency: because image orientation is preserved between shots, the relative spatial relationships between features are preserved, and this filter checks whether a feature and its match keep this relationship.
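To make the first three filters concrete, here is a minimal sketch in plain Python. The data layout (dictionaries of nearest-neighbour candidates), the function names, the 20-degree tolerance, and the toy values are all assumptions for illustration; the cross-match check is also simplified to compare only the best match in each direction.

```python
def ratio_filter(knn_matches, ratio=0.7):
    """Keep a match only if its best distance is clearly smaller than the runner-up.

    knn_matches: {query_index: [(train_index, distance), (train_index, distance)]}
    with candidates sorted by ascending distance (smaller = more similar).
    """
    kept = {}
    for query, candidates in knn_matches.items():
        if len(candidates) < 2:
            continue
        (best, d1), (_, d2) = candidates[:2]
        if d1 < ratio * d2:
            kept[query] = best
    return kept


def cross_match_filter(a_to_b, b_to_a):
    """Keep pairs (a, b) where a's match is b and b's match is a."""
    return {a: b for a, b in a_to_b.items() if b_to_a.get(b) == a}


def orientation_filter(pairs, angles_a, angles_b, max_diff=20.0):
    """Keep pairs whose dominant orientations (degrees) differ by at most max_diff."""
    kept = {}
    for a, b in pairs.items():
        diff = abs(angles_a[a] - angles_b[b]) % 360.0
        diff = min(diff, 360.0 - diff)   # wrap-around, e.g. 350 vs 10 -> 20
        if diff <= max_diff:
            kept[a] = b
    return kept


# Toy data: image A -> image B candidates, and the reverse direction.
knn_ab = {0: [(5, 0.20), (7, 0.90)],   # distinctive, kept by the ratio filter
          1: [(6, 0.50), (8, 0.55)],   # ambiguous, dropped
          2: [(9, 0.10), (4, 0.80)]}
knn_ba = {5: [(0, 0.20), (3, 0.95)],
          9: [(1, 0.30), (2, 0.90)]}   # 9 prefers 1, so (2, 9) is not mutual

a_to_b = ratio_filter(knn_ab)
b_to_a = ratio_filter(knn_ba)
mutual = cross_match_filter(a_to_b, b_to_a)
print(mutual)  # {0: 5}
```

Each filter only ever removes matches, so they can be chained in any order; running the cheapest one first keeps the later ones fast.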

The following figure shows the result of the feature pruning process after the 4 filters have been applied. The remaining features form a cleaner and more accurate match set than the first one we obtained from pure appearance comparison.

Figure 8: Feature pruning results

Find hotspot location in local coordinate

The corresponding features detected in the two images are then used to find the hotspot location in local coordinates. This is done in three steps: first, find the fundamental matrix of the epipolar geometry constructed from the two sets of corresponding points; then, use the fundamental matrix and the two sets of features to find a set of corresponding lines on each image; lastly, derive the intersections of the corresponding lines on each image as the epipoles, or camera locations, in local planar coordinates.

As discussed previously, the fundamental matrix in epipolar geometry can be estimated based solely on a set of point coordinates (with at least 8 corresponding pairs). A robust way to find the fundamental matrix is the RANSAC (random sample consensus) technique. This is an iterative process: at each iteration, a random subset of the corresponding pairs is chosen as the “inliers” to construct a sample fundamental matrix. The remaining corresponding pairs are treated as test samples: the sample fundamental matrix is used to estimate the corresponding points of the test set, and these estimates are compared with the actual corresponding points to count the inliers in the test set, which serves as the score for this sample pick. The best sample fundamental matrix is then chosen and returned. The following code shows how this is done in OpenCV (here fundamental_mask is used to trace back to the inlier, or valid, matches among the original matches).

fundamental_mat, fundamental_mask = cv2.findFundamentalMat(valid_matches_left, valid_matches_right, cv2.FM_RANSAC)

Given the fundamental matrix, we can then calculate the epilines on the other image for every inlier point from one image.

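# Keep only the RANSAC inliers.
valid_matches_left = valid_matches_left[fundamental_mask.ravel() == 1]
valid_matches_right = valid_matches_right[fundamental_mask.ravel() == 1]
# The second argument is the index (1 or 2) of the image that contains the points:
# points of the right image (2) give epilines on the left image, and vice versa.
epilines_left = cv2.computeCorrespondEpilines(valid_matches_right.reshape(-1, 1, 2), 2, fundamental_mat)
epilines_right = cv2.computeCorrespondEpilines(valid_matches_left.reshape(-1, 1, 2), 1, fundamental_mat)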

Once the epilines are computed, the epipole is derived as their intersection, and epipoles that fall within the image boundaries are valid. The estimated epipoles indicate the ray along which the two cameras are connected, so it could go from one image to the other or vice versa. As we need to identify whether we can navigate from one image to another, we use a far-near relationship to describe the 2 images. Given that the 2 images show the same scene, one image is closer to the scene than the other; that is the requirement for a valid epipole to be found within the image boundary. Navigation only happens from the far image to the close image; in other words, we are looking for the far image, as its epipole is the hotspot we want. To find out which image is farther from the scene, we use the average distance of all inlier features to their mean location (in both the horizontal and vertical dimensions). The image with the smaller average distance is the one farther away from the scene.

Figure 9: Hotspot estimation - forward
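The epipole and far-near steps can be sketched as follows. This is an illustrative version under stated assumptions: epilines are given as (a, b, c) triples with a*x + b*y + c = 0 (the form OpenCV returns), the intersection is taken in the least-squares sense, and the spread is measured as the mean Euclidean distance to the centroid, which is one reading of the average distance described above. All names and toy values are hypothetical.

```python
import math


def intersect_lines(lines):
    """Least-squares intersection of lines (a, b, c) with a*x + b*y + c = 0."""
    saa = sab = sbb = sac = sbc = 0.0
    for a, b, c in lines:
        norm = math.hypot(a, b)       # normalise so residuals are pixel distances
        a, b, c = a / norm, b / norm, c / norm
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        return None                   # lines are (nearly) parallel
    x = (-sac * sbb + sbc * sab) / det
    y = (-sbc * saa + sac * sab) / det
    return x, y


def inside_image(point, width, height):
    """An epipole is only a valid hotspot if it falls inside the image."""
    x, y = point
    return 0 <= x < width and 0 <= y < height


def mean_spread(points):
    """Average distance of the inlier features to their mean location."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / len(points)


# Three toy epilines that all pass through the pixel (3, 4).
epipole = intersect_lines([(1, 0, -3), (0, 1, -4), (1, -1, 1)])

# Toy inliers: the same scene covers fewer pixels in the far image.
inliers_a = [(100, 100), (120, 100), (110, 120)]
inliers_b = [(50, 50), (250, 60), (150, 300)]
far_image = "A" if mean_spread(inliers_a) < mean_spread(inliers_b) else "B"
print(far_image)  # A
```

With noisy real matches the epilines do not meet in a single point, which is why a least-squares intersection is used here instead of intersecting just two lines.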

Out of all possible local view matches (16 matches when considering 4 views, LEFT, RIGHT, FRONT, and BACK, or 36 matches when considering all 6 local views, including UP and DOWN) we should ideally end up with 2 epipole locations: the first to go from one panorama to the other, and the second to come back from the other panorama. In practice, however, we normally end up with more than 2 valid epipoles, so we use a metric called average vertical distance to rank pairs of epipoles in terms of correctness. Average vertical distance is the average of the distances from the two estimated epipoles to the middle lines of their images. In theory, given that the camera tripod is at a fixed height, the epipoles should always lie on the middle line of the image, so we can use this property to find the best epipole pair: the one with the minimum distance to the middle lines of the images. Using this metric, we are able to locate the best hotspot among all matches from one pano to the other (E1 in Figure 2), which in turn helps us decide the opposite hotspot, on the other side of the cube, from the other pano back (E2 in Figure 2).

Figure 10: Hotspot estimation - backward
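The ranking by average vertical distance is simple enough to show in full. The sketch below assumes each candidate is a pair of epipoles given as (x, y) pixel coordinates and that both slice images share the same height; the names and the sample values are made up for illustration.

```python
def average_vertical_distance(epipole_1, epipole_2, height):
    """Mean distance of the two epipoles to the horizontal middle line."""
    middle = height / 2.0
    return (abs(epipole_1[1] - middle) + abs(epipole_2[1] - middle)) / 2.0


def best_epipole_pair(candidates, height):
    """Pick the candidate pair that lies closest to the middle lines."""
    return min(candidates,
               key=lambda pair: average_vertical_distance(pair[0], pair[1], height))


# Two hypothetical candidate pairs on 1000-pixel-high slices.
candidates = [((100, 480), (900, 530)),   # close to the middle line
              ((300, 700), (200, 100))]   # far from it
print(best_epipole_pair(candidates, 1000))  # ((100, 480), (900, 530))
```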

Transform local to spherical coordinate

Once the locations of the hotspots in local planar coordinates are found, we can derive global planar coordinates based on the index of the local view plane; spherical coordinates can then be calculated from the panorama’s size and field of view.

Figure 11: Projecting local coordinates to spherical coordinates
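One way to carry out this last step is to go from the cube face directly to spherical angles. The sketch below is an assumption-laden illustration rather than our production code: it assumes square faces with a 90 degree field of view, the face order FRONT, RIGHT, BACK, LEFT at 90 degree steps, pixel x growing to the right and y growing downward, and ath = 0 at the center of the FRONT face.

```python
import math

FACES = ["FRONT", "RIGHT", "BACK", "LEFT"]   # assumed order, 90 degrees apart


def local_to_spherical(face, x, y, size):
    """Convert a pixel on a cube face to spherical angles (ath, atv) in degrees."""
    u = 2.0 * x / size - 1.0     # -1 at the left edge, +1 at the right edge
    v = 2.0 * y / size - 1.0     # -1 at the top edge, +1 at the bottom edge
    yaw_in_face = math.degrees(math.atan2(u, 1.0))
    atv = math.degrees(math.atan2(v, math.hypot(u, 1.0)))
    ath = yaw_in_face + 90.0 * FACES.index(face)
    ath = (ath + 180.0) % 360.0 - 180.0      # wrap into [-180, 180)
    return ath, atv


# Hypothetical 1000 x 1000 faces: face centres map to multiples of 90 degrees.
print(local_to_spherical("FRONT", 500, 500, 1000))  # (0.0, 0.0)
print(local_to_spherical("RIGHT", 500, 500, 1000))  # (90.0, 0.0)
```

The resulting (ath, atv) pair can be fed to the spherical-to-planar formula given earlier to obtain the hotspot position on the full equirectangular panorama.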

Practical implementation tips and tricks

So far we have presented a complete workflow to generate virtual tours automatically using OpenCV and Krpano. There are a few additional points worth taking into consideration when implementing this workflow at production scale.

  • Local view matching: as an alternative to running all 4 x 4 local view matchings, we can use the result of matching one view against the other 4 views to decide whether we need to carry out the rest of the matches.
  • Coarse-to-fine framework for feature detection and feature matching: both processing time and accuracy grow with the number of features detected and matched, so we derived a coarse-to-fine framework with 3 layers going from a small number of features (coarse layer) to a large number of features (fine layer). Only when some good matches are identified at a coarser level do we move on to the finer level. This type of framework greatly reduces processing time while still maintaining high matching accuracy.
  • Metrics to find the best hotspot: in this context we use the average distance to the horizon line of the image as the metric to find the best epipole location, but other estimators could be used if they work better for the scenario, for example the good feature ratio (ratio of the final inlier feature count to the number of all detected features) or the number of inliers found.
  • Free vs. non-free: one important note about feature choice is the extra cost involved in using non-free features (SIFT, SURF). For production code, it is best to consider the tradeoffs between accuracy, processing time, and cost to make sure you pick the most suitable feature type for your application.
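The coarse-to-fine idea from the list above can be sketched as a small control loop. Everything specific here is an assumption for illustration: the three feature budgets, the minimum number of good matches, and the caller-supplied match_at_level function, which stands in for the whole detect, match, and prune chain.

```python
def coarse_to_fine(match_at_level, feature_budgets=(500, 2000, 8000), min_good=10):
    """Run matching with growing feature budgets, stopping early if a level fails.

    match_at_level(budget) is a caller-supplied function that returns the list
    of good matches found when at most `budget` features are used per image.
    """
    good = []
    for budget in feature_budgets:
        good = match_at_level(budget)
        if len(good) < min_good:
            return []      # no sign of overlap: skip the finer, slower levels
    return good


# Stub standing in for the real pipeline: pretends 2% of features match.
calls = []

def fake_matcher(budget):
    calls.append(budget)
    return list(range(budget // 50))

result = coarse_to_fine(fake_matcher)
print(calls, len(result))  # [500, 2000, 8000] 160
```

Because most slice pairs do not overlap at all, the cheap coarse level rejects them quickly and the expensive fine level only runs on promising pairs.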

Results and summary

In this post, we have presented the complete workflow to create virtual tours with a Computer Vision approach, using local feature matching and epipolar geometry. We also discussed some practical notes about implementing such a system at production scale. Together with our work on HDR panoramas presented in part 1, this lets us create high-quality virtual tours that boost user experience and engagement on our sites. Again, if you have not read the previous part, where we talked about how High Dynamic Range panoramas are created at large scale, please check it out. In the next and last part of this Computer Vision series, we will show you how to integrate smart features like moving mini-maps or dynamic mouse arrows into virtual tours.

Here are some of the virtual tours generated with the approach we just described. The links above each screenshot lead to our live walkthroughs on Oyster.com, with support for both desktop and mobile.

Dream Downtown Lobby

Gansevoort The Chester

Holiday Inn Montego Main Pool

Refinery Hotel Rooftop

Scrub Island North Beach

Trump Soho Penthouse

About the author:

Tuan Thi is a Senior Software Engineer in Computer Vision at Oyster.com, part of Smarter Travel Media Group, at TripAdvisor. He finished his PhD in Computer Vision and Machine Learning in 2011. Before joining TripAdvisor, he was a research engineer and computer vision scientist at Canon Research and Placemeter Ltd. with various international publications and patents in the field of local features, structured learning and deep learning.

Generating HDR panoramas at scale

Here at Oyster, we are the leading website for comprehensive photographic reviews of hotels. One key component of our imagery database is panoramas, produced at high quality and large scale (over 150,000 to date). In this three-part series, we will be looking at the Computer Vision work that has been part of our panorama pipeline. In this first part of the series, we will introduce our automated pipeline for generating High Dynamic Range (HDR) panoramas.

HDR Panorama

Panorama images at Oyster cover the full angle range, with 180 degrees vertically and 360 degrees horizontally. They provide an immersive experience that lets viewers explore the whole venue, whether it is outside at the pool, on the rooftop, or inside a hotel room. Panorama images have now become a trending and must-have media type for most image-oriented websites. Meanwhile, HDR imaging is a common technique to produce a greater dynamic range of luminosity than standard digital imaging. It is especially useful for panorama imaging, where an evenly exposed look greatly improves the quality. HDR imaging is normally achieved by merging multiple low-dynamic-range photographs. We use PTGui, a stitching software, to carry out batch stitching of 12 fisheye images (180 degree x 180 degree) of the four different views (left, right, front, back), with three images of different exposure for each view. The stitching process returns one equirectangular panorama.

Raw fisheye images

While PTGui is a good choice for image stitching, its HDR quality is not the best available. People often use alternative tools for HDR merging. SNS-HDR is the package used by Oyster. It has support for batch processing, excellent HDR quality, and adequate deghosting support (compared to PTGui).

PTGui HDR

SNS-HDR

The tricky part for using SNS-HDR as a batch tool is its limited support on file format input and the auto-grouping of images of the same view, and that is where Computer Vision comes into play.

The first problem is on accepted formats for image input. SNS-HDR works with RAW files, but it works especially well with converted raw DNG format (using DNGConverter), compared to two common formats CR2 or NEF.

One possible artifact with SNS-HDR on original raw files

Stable SNS-HDR merge on DNG files

The second problem of using SNS-HDR batch processing is to figure out the three images of the same view. This is done using OpenCV (Python binding with Numpy), one of the most comprehensive Computer Vision libraries to date. We use DCRAW to convert the DNG files to TIFF since OpenCV does not work directly with raw files (CR2, NEF, DNG). Then OpenCV can be used to detect the four sets of near-duplicate three images of the same view.

Near-duplicate image detection

In this context, we have 12 fisheye images (180 deg vertical by 180 deg horizontal) of four adjacent views (left, front, right, back). Each view has three images taken at different exposure levels (shutter speed varied), and we need to robustly divide the 12 images into those groups of three. Since they differ in exposure, we cannot apply direct comparison methods like checksums.

There exist several methods to carry out near-duplicate image detection, all of which involve a pre-processing step (e.g. histogram equalization), a pair-wise similarity metric of choice (e.g. pixel-wise or block-wise distance, edge or contour difference, norm-1 or norm-2 distances), and an association method. These methods are combined and tuned based on the practical constraints of the problem.

In our approach, we apply histogram equalization to balance out the multiple exposure levels, then a pixel-wise absolute difference. Pixel-wise difference is chosen because spatial difference is more important in our case of unaltered adjacent views. For cases like detecting transformed images, people normally opt for local edge or contour difference.

This is followed by a postprocessing step of lower bound trimming to remove illumination difference noise, and image erosion to remove camera movement noise. This step will return a binary difference image of any two input images which could also be used for detecting ghosting problems in HDR merging.

Image Comparison using pixel-wise difference, with lower bound trimming and erosion

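import cv2
import numpy as np


def image_comparison(img1, img2, lower_bound=120):
    # Inputs must be 8-bit single-channel (grayscale) images of the same size.
    # Equalize histograms to balance out the different exposure levels.
    e1 = cv2.equalizeHist(img1)
    e2 = cv2.equalizeHist(img2)

    # Pixel-wise absolute difference, then lower bound trimming to drop illumination noise.
    diff = cv2.absdiff(e1, e2)
    _, diff = cv2.threshold(diff, lower_bound, 255, cv2.THRESH_BINARY)
    # Erode with a small kernel to remove camera movement noise.
    kernel = np.ones((2, 2), np.uint8)
    diff = cv2.erode(diff, kernel, iterations=1)
    # The number of remaining difference pixels is the dissimilarity score.
    nonZero = cv2.countNonZero(diff)

    return nonZero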

The last step in this approach is association, where pair-wise image differences are used to associate similar images into sets. This step is normally specific to, and fine-tuned for, each system. It is similar to a common clustering problem, and there are two common ways to implement it: a distance-based method (as in hierarchical clustering) or an iterative centroid- or group-based method (as in k-means clustering).

In a distance-based method, a distance threshold value is chosen (or learned empirically from data) to decide if two items belong to the same group. Once two images are matched, subsequent association steps will only need to be carried on one sample element of the group. This approach has the advantage of being fast (linear processing time), but its performance depends on how well the distance threshold is chosen, therefore this method is mostly used when data is well separated and speed is a requirement.
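A minimal sketch of the distance-based method is shown below. It is not the approach we use for HDR grouping (that one follows later in this post); the function name, the toy numeric items, and the threshold value are assumptions chosen to keep the example short. Note how each new item is compared with only one representative per existing group.

```python
def group_by_threshold(items, distance, threshold):
    """Greedy grouping: an item joins the first group whose representative is close."""
    groups = []
    for item in items:
        for group in groups:
            # Compare with one sample element of the group only.
            if distance(item, group[0]) <= threshold:
                group.append(item)
                break
        else:
            groups.append([item])      # no close group found: start a new one
    return groups


# Toy data: numbers stand in for images, absolute difference for image distance.
values = [1, 2, 10, 11, 20]
print(group_by_threshold(values, lambda a, b: abs(a - b), threshold=3))
# [[1, 2], [10, 11], [20]]
```

With a threshold that is too large (say 10 in this toy case) the first two groups would merge, which illustrates why the method depends so heavily on a well-chosen threshold.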

In our case the number of images is small and the accuracy requirement is 100% correct matches (imagine an HDR image merged from three different images: it does not look pretty). Therefore we go for the second approach, where we match all pair-wise combinations of source images (66 matches for 12 images). The constraint of four sets of three images is used as the termination condition for our association step. Our iterative association consists of two steps: collecting tuples of the top three matches (each representing one group) and filtering out good matches (any match tuple that is collected exactly three times is a correct association). This is repeated until all elements are filtered.

Similar image association

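import operator


def find_groups(unselected, grouped, matches):
    # matches[(i, j)] is the pair-wise difference between images i and j (lower = more similar).
    freqs = {}
    m = {}
    for i in range(12):
        # Rank the still-unselected images by their difference to image i.
        n = {}
        for j in range(12):
            if i == j: continue
            if j in unselected:
                n[j] = matches[(i, j)]
        m[i] = sorted(n.items(), key=operator.itemgetter(1))
        # Image i plus its two nearest neighbours form a candidate group of three.
        new_set = sorted([v[0] for v in m[i][:2]] + [i])
        freqs[tuple(new_set)] = 1 if tuple(new_set) not in freqs else freqs[tuple(new_set)] + 1

    # A candidate proposed by all three of its members is a correct association.
    for k, v in freqs.items():
        if v == 3:
            for i in k:
                unselected.remove(i)
            grouped.append(k)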

Once similar images are grouped into correct views, SNS-HDR is used to merge LDR images into HDR images (with tonemapping). PTGui is then called to stitch the four merged HDR into one equirectangular panorama.

Normal panorama (PTGui HDR)

HDR panorama done right (SNS-HDR)

Here is a list of randomly selected panoramas from Oyster compared to Google:

Trump Soho from Google.com

Trump Soho from Oyster.com

Refinery Hotel from Google.com

Refinery Hotel from Oyster.com

Holiday Inn Resort Montego Bay on Google.com

Holiday Inn Resort Montego Bay on Oyster.com

Sensatori Jamaica by Karisma on Google.com

Sensatori Jamaica by Karisma on Oyster.com

In this post, we have presented our approach to generating HDR panoramas at large scale using available packages like DNGConverter, DCRAW, SNS-HDR and PTGui, with help from Computer Vision techniques in OpenCV. Please feel free to visit our website Oyster to see our rich collection of hotel panoramas from all around the world. Also, please stay tuned for parts 2 and 3 of this Computer Vision series, where we will show you how virtual tours can be generated (again fully automated at large scale) from a set of panoramas, and how smart features like mini-maps can be added to your tour to improve user experience.

About the author:

Tuan Thi is a Senior Software Engineer in Computer Vision at Oyster.com, part of Smarter Travel Media Group, at TripAdvisor. He finished his PhD in Computer Vision and Machine Learning in 2011. Before joining TripAdvisor, he was a research engineer and computer vision scientist at Canon Research and Placemeter Ltd. with various international publications and patents in the field of local features, structured learning and deep learning.

Using React and jQuery Together

React + jQuery

In this post I’ll be covering some tips on how to use React and jQuery together in the same UI. Okay so first off you might be thinking “why would you want to do such a thing?” - in fact the idea of trying to make React’s declarative style live together with imperative jQuery DOM updates may have you thinking something like this, and for the most part, you wouldn’t be wrong.

So first the “why.” If you’re starting a brand new, “greenfield” project and you want to use React, then just do it. There’s no good reason I can think of to mix-and-match React with jQuery or Mustache or whatever other DOM-helper/template library if you don’t have to - just use React and everything will be cool and you won’t have to worry about it. If on the other hand, you have one of those “legacy” applications that has “customers” and makes “money,” and for some reason your boss is not into the idea of you spending a few weeks rewriting the whole front end in React, but you still want in on that declarative React goodness, you may have to figure out how to get React to play well with jQuery or something similar.

Are you sure you can’t rewrite it?

Let’s say you have some piece of UI that gets rendered with jQuery, and you want to stick in some new component written with React. Take a look at your jQuery rendering function. Are you just building up some big string of HTML and sticking it in the DOM? Or using some kind of JavaScript template? If your rendering is already reasonably functional, i.e. some data goes into your function and some HTML comes out (or gets appended to the DOM or whatever) it will be pretty easy to just rewrite in React. So you should probably just do that and save yourself the inevitable hassle you’ll have when something breaks and you have to debug it.

Okay so you can’t rewrite it

For whatever reason you’ve determined it’s not practical to rewrite your jQuery code. Here is an important caveat: I think using jQuery and React to manage updates to the same DOM elements is a bad idea. React is really smart about figuring out how to update the DOM, but that only works if React is the only thing doing the updates. So unless you can cleanly separate the DOM elements in your UI so that some only get updated by React and others only get updated by jQuery, I wouldn’t try it.

So say for instance you’re going to render a product list with a React template, and then jQuery is going to add and remove CSS classes to the list items, and then your React render function might get called again later. This is a bad idea. React will have no idea about the changes that jQuery has made. Some of React’s efficiency comes from reusing DOM nodes on the page when things change, rather than always inserting or deleting nodes. If jQuery is making DOM changes that React doesn’t know about, some node that gets reused might be in an unexpected state.

One further caveat is that you should only really consider this mixed approach if you’re planning to eventually replace jQuery rendering with React. Using both doesn’t make sense long-term, but if you’re looking to gradually transition to React, you may have some parts of your UI using both for a while.

Let’s write some code

As an example we’ll start by rendering a simple list of products with jQuery. We’ll just show the name for each product and a button to buy it. Then we’ll get into replacing parts of the UI with React.

Our jQuery product list is pretty basic - it takes an array of products and inserts the list into #product-list-container. If the product list is updated, you just call productListJustJquery() again and replace the whole list with a new list.

var products = [
  {
    id: 1,
    name: 'Book',
    price: 15
  },
  {
    id: 2,
    name: 'Burrito',
    price: 8
  },
  {
    id: 3,
    name: 'Spaceship',
    price: 999999999
  },
  {
    id: 4,
    name: 'Dinosaur Bones',
    price: 5000000
  }
];

function buyProduct(productId) {
  // buy the product
}

/* -- Just jQuery -- */
function buyButtonJquery(product) {
  var button = $('<button class="buy-button">$' + product.price
    + '</button>');
  
  // handle click event
  $(button).on('click', function(event) {
    event.preventDefault();
    buyProduct(product.id);
  });

  return button;
}

function productListJustJquery(products, element) {
  var list = $('<ul class="product-list"></ul>');

  products.forEach(function(product) {
    var item = $('<li>' + product.name + '</li>');
    item.append(buyButtonJquery(product));
    list.append(item);
  });

  // replace the existing list if there is one
  var currentList = $(element).find('.product-list');
  if (currentList.length) {
    currentList.replaceWith(list);
  } else {
    $(element).append(list);
  }
}

jQuery inside a React component

We’ll start by replacing most of the product list with React but leaving the buy button in jQuery. This is simpler than the inverse - sticking React inside a jQuery UI - so we’ll do it first. The ProductListComponent is pretty straightforward:

var ProductListComponent = React.createClass({
  render: function(props) {
    return (
      <ul className="product-list">
        {this.props.products.map(function(product) {
          return <ProductComponent 
              key={product.id} 
              product={product} />
        })}
      </ul>
    );
  }
});

but in ProductComponent we need some extra code to make the call to jQuery. We add an extra button-container element, so that we have somewhere to put the jQuery DOM, and keep a reference to it.

render: function(props) {
  /* we need to keep a ref to the 
   * button-container so we can update it with jQuery
   */
  return (
    <li>
      {this.props.product.name}
      <span className="button-container" 
        ref="buttonContainer"></span>
    </li>
  );
}

Life cycle methods

It’s important to get familiar with the various React life cycle methods. The relevant ones here are componentDidMount - which is called after the first render, and componentDidUpdate - which is called after subsequent renders. In each of these methods we just call renderBuyButton, which uses our reference to the button-container DOM node to create a brand new buy button with jQuery on each render.

componentDidMount: function() {
  this.renderBuyButton();
},
componentDidUpdate: function() {
  this.renderBuyButton();
},
renderBuyButton: function() {
  // render the buy button with jQuery
  $(this.refs.buttonContainer).html(
    buyButtonJquery(this.props.product)
  );
}

Here’s the complete ProductComponent:

var ProductComponent = React.createClass({
  componentDidMount: function() {
    this.renderBuyButton();
  },
  componentDidUpdate: function() {
    this.renderBuyButton();
  },
  render: function(props) {
    /* we need to keep a ref to the 
     * button-container so we can update it with jQuery
     */
    return (
      <li>
        {this.props.product.name}
        <span className="button-container" 
          ref="buttonContainer"></span>
      </li>
    );
  },
  renderBuyButton: function() {
    // render the buy button with jQuery
    $(this.refs.buttonContainer).html(
      buyButtonJquery(this.props.product)
    );
  }
});

React components inside jQuery

Now we’re going to do it the other way and stick some React DOM inside our jQuery DOM. This is a little trickier. We’ll start with a BuyButtonComponent in React, there’s not much to it:

var BuyButtonComponent = React.createClass({
  onClick: function(event) {
    buyProduct(this.props.product.id);
  },
  componentDidMount: function() {
    console.log('component did mount - stuff to clean up later');
  },
  render: function(props) {
    return (
      <button className="buy-button" 
        onClick={this.onClick}>${this.props.product.price}
      </button>
    );
  },
  componentWillUnmount: function() {
    console.log('about to unmount - clean up stuff here');
  }
});

function buyButtonReact(product, element) {
  ReactDOM.render(
    <BuyButtonComponent product={product} />,
    element
  );
}

I’ve added the componentDidMount and componentWillUnmount methods with some console.logs. They clearly don’t really do anything in this component, but in a real component you’ll often do something in componentDidMount (subscribe to events from a Flux store, for example) that needs to be cleaned up when the component unmounts. We need to make sure these methods still get called at the right times, or you risk memory leaks or trying to update the state of a component that no longer exists (which will throw an error).

So now we’ll alter our jQuery function for rendering the product list to use our new React BuyButtonComponent. We’ll use the same strategy of adding an extra button-container element here. We also attach the product data to the container element so we can use it later.

products.forEach(function(product) {
  var item = $('<li>' + product.name + '</li>');

  /* add a container element where 
   * we'll attach our React component
   */
  var buttonContainer = $('<span class="button-container"></span>');
  // add product data to use in our React component
  buttonContainer.data('product', product);

  item.append(buttonContainer);
  list.append(item);
});

After we’ve inserted the main product list with jQuery, we iterate over the container nodes and use the product data to render the buy buttons with React. You can see in the console that componentDidMount is called for each component.

We might render this product list multiple times, so we need to make sure our productListJqueryReact function works when called repeatedly. jQuery is going to blow away the whole DOM each time which won’t give React a chance to do its clean up (calling componentWillUnmount), so we need to manually unmount the React components before we insert a new list with jQuery.

// clean up any mounted React components
$(element).find('.button-container').each(function() {
  ReactDOM.unmountComponentAtNode(this);
});

You can verify in the console that componentWillUnmount is called for each of the buy buttons every time the list is re-rendered. Here’s the complete function for rendering the product list:

function productListJqueryReact(products, element) {
  var list = $('<ul class="product-list"></ul>');

  // clean up any mounted React components
  $(element).find('.button-container').each(function() {
    ReactDOM.unmountComponentAtNode(this);
  });

  products.forEach(function(product) {
    var item = $('<li>' + product.name + '</li>');

    /* add a container element where 
     * we'll attach our React component
     */
    var buttonContainer = $('<span class="button-container"></span>');
    // add product data to use in our React component
    buttonContainer.data('product', product);

    item.append(buttonContainer);
    list.append(item);
  });

  // replace the existing list if there is one
  var currentList = $(element).find('.product-list');
  if (currentList.length) {
    currentList.replaceWith(list);
  } else {
    $(element).append(list);
  }

  // attach our React components to the containers
  list.find('.button-container').each(function() {
    var container = $(this);
    var product = container.data('product');

    /* React needs a plain, non-jQueryfied DOM 
     * element, so we can use plain "this"
     */
    buyButtonReact(product, this);
  });
}

I’m sure there are other ways to handle the same issues, and other edge cases where jQuery and React can conflict and cause problems, but the examples above cover the most common use cases I’ve encountered. There’s not a lot of writing I could find about using both simultaneously, and the conventional wisdom seems to basically be “don’t do it,” so let us know if you have experience working with both or think we missed something!

Rundeck vs. Crontab: Why Rundeck won

We like automation here at Oyster: we do one-command deployments using Ansible and we run a lot of other automated scripts and ETL processing on various schedules.

crontab on Linux and Task Scheduler on Windows are fine tools as far as they go. But they’re a little too simplistic: difficult to schedule, no retrying, not centralized, you can’t customize success or failure emails, almost impossible to get non-engineers to use, etc.

Enter Rundeck. It’s a powerful system, though we’re only using it in “glorified crontab with a web UI” mode, which is a fine way to start. We’ve now switched most of our cron jobs over to it, and we’re working on moving our Windows Task Scheduler jobs as well.

Rundeck logo

Below are some of the benefits for us.

UI with single sign-on support

Rundeck has a web UI that supports single sign-on, so anyone on our team has automatic access to it. Among other things, this means people on other teams can write ETL scripts and set them up in Rundeck without involving engineering.

The UI is also good for engineering. It allows you to create jobs, run a job, view history, and check job output. Previously to look something up we’d have to hunt through old emails, or SSH or remote into a machine and scan through log directories. Big plus.

Rundeck jobs UI

The UI is definitely better than Windows Task Scheduler (not hard to beat), but it’s not a stellar example of design. It has a very “made by developers” feel, with tons of detail upfront, and some common operations like looking at recent errors or log output hidden behind several clicks.

History and output capturing

Rundeck logs all job activity, making it easy to debug when something went wrong and why. You can filter the activity history by task name, user, success/failure status, or time.

Once you’ve found the relevant run, you can drill down into it and look at all stdout and stderr output from a given command. For example:

Rundeck error output

Success and failure emails

Windows Task Scheduler and crontab have very limited support for this, but with Rundeck it’s easily configurable and simple to set up. You can tell it to send stdout/stderr output from the job as an attachment to the email.

Some of our scripts we just want to email on failure, others we want to email on success as well, to keep folks in the loop:

Rundeck email notifications

You can customize the subject line using job variables, and you can specify a custom Markdown email template if you want (we just use the default – it’s not pretty, but it is functional).

Simple and advanced scheduling

Rundeck has a simple scheduling interface that allows you to run a job at a certain time every day or on selected days of the week. It also has a more advanced crontab scheduling option to allow you to run tasks on schedules like “every 15 minutes” or “1:30am every last Friday of the month”.
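To spell out what a schedule like “1:30am every last Friday of the month” actually means, here is a small Python sketch that computes the next run time. It does not use or imitate Rundeck’s scheduler. The `last_friday` and `next_run` helpers are hypothetical names for this example only.

```python
import calendar
import datetime


def last_friday(year, month):
    """Return the date of the last Friday in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    day = datetime.date(year, month, last_day)
    # step back from the end of the month to the nearest Friday
    days_back = (day.weekday() - calendar.FRIDAY) % 7
    return day - datetime.timedelta(days=days_back)


def next_run(now):
    """Next '1:30am on the last Friday of the month' at or after `now`."""
    year, month = now.year, now.month
    while True:
        run = datetime.datetime.combine(last_friday(year, month),
                                        datetime.time(1, 30))
        if run >= now:
            return run
        # this month's run has already passed: try the following month
        month += 1
        if month > 12:
            year, month = year + 1, 1


print(last_friday(2024, 3))                      # 2024-03-29
print(next_run(datetime.datetime(2024, 3, 30)))  # 2024-04-26 01:30:00
```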

Rundeck simple scheduling

SSH support

Rundeck can run jobs locally on the machine Rundeck is installed on, or it can run jobs on remote machines using SSH. For most of our longer-running jobs, we have a single “worker server” that we run jobs on via SSH.

Rundeck has good support for multiple “node sources”. For example, in our installation, our techops team has set up Rundeck to use our master node list from Chef.

Ad-hoc commands

Rundeck also has a screen where you can run any ad-hoc command against one or more nodes. For example, you might want to check disk usage on a bunch of nodes:

Rundeck ad-hoc commands

Source control integration

One of the problems with UIs (like Task Scheduler) is that change tracking is hard or impossible. We wanted to be able to see who did what, and when. As of version 2.6.0, Rundeck has built-in support for source control integration via git. So we hooked it up to our git repo and now have a change-tracked log of who did what to the job config.

All the project and job configuration is saved in YAML format, so it is fairly easy to read. For example:

- description: ''
  executionEnabled: true
  group: Scheduled
  name: Main nightly scripts
  notification:
    onfailure:
      email:
        attachLog: true
        recipients: errors@oyster.com
  sequence:
    commands:
    - jobref:
        group: Task
        name: Dump live database to snapshot
    - jobref:
        group: Task
        name: Backup snapshot database
...

Other features

Rundeck has plenty of other features, many of which we’re not using yet.

Shout-out to our techops team who set up the Rundeck install for us and helped with various operational aspects.

We’re hiring!

If you like good engineering and automation and want to come work for us, apply here!

Using Ansible to restore developer sanity

This time a year ago we were deploying new code to Oyster.com using a completely custom deployment system written in C++. And I don’t mean real C++; it was more like C with classes, where the original developers decided that std::string was “not fast enough” and wrote their own string class struct:

struct SIZED_STRING
{
    const uint8_t *pbData;
    size_t cbData;
};

It’s not our idea of fun to worry about buffer sizes and string lengths when writing high-level deployment scripts.

Then there was the NIH distributed file transfer system — client and server. And our own diffing library, just for fun. All very worthwhile things for a hotel review website to spend time developing in-house! :-)

Screenshot of our Ansible-based deployment

Sarcasm aside, this wasn’t a joke: we replaced more than 20,000 lines of C++ code with about 1000 lines of straight-forward Ansible scripts. And it really did restore our sanity:

  • Rather than 28 manual steps (some of which, if you ran out of order, could bring the site down) we run a single Ansible command. All we have to specify manually is which revision to deploy and type in some deployment notes to record to our internal log (for example, “Shipped mobile version of hotel page”).
  • Instead of spending hours digging into log files on a remote machine whenever our fragile home-grown system broke, Ansible gives us clear and generally easy-to-track-down error messages. The most we have to do is SSH to a machine and manually restart something.

Choice of tools

Some teams within TripAdvisor use Chef for server setup (and other tools like Fabric for code deployments). We also looked briefly at Puppet. However, both Chef and Puppet gave us a very “enterprisey” feel, which isn’t a great match for our team’s culture.

This is partly due to their agent-based model: Chef, for example, requires a Chef server in between the runner and the nodes, and requires you to install clients (“agents”) on each of the nodes you want to control. I think this picture gives a pretty good idea of the number of components involved:

Chef Diagram

In contrast, Ansible has basically five parts:

  • playbooks
  • inventory files
  • vars files
  • the ansible-playbook command
  • nodes

I’m sure there are advantages and more power available to systems like Chef, but we really appreciated the simplicity of the Ansible model. Two things especially wooed us:

  1. You don’t have to install and maintain clients on each of the nodes. On the nodes, Ansible only requires plain old SSH and Python 2.4+, which are already installed on basically every Linux system under the sun. This also means developers don’t have to learn a new type of authentication: ordinary SSH keys or passwords work great.
  2. Simple order of execution. Ansible playbooks and plays run from top to bottom, just like a script. The only exception to this is “handlers”, which run at the end of a play if something has changed (for example, to reload the web server config).

Ansible Tower Screenshot

Ansible Tower UI

Ansible itself is free and open source and available on GitHub. But they also provide a fancy web UI to drive it, called Ansible Tower. It’s nice and has good logging and very fine-grained permissions control, but we found it was somewhat tricky to install in our environment, and as developers it didn’t gain us much over running a simple command.

Our thinking is that in a larger organization, where they need finer-grained permissions or logging, or where non-developers need to kick off deployments, using Ansible Tower would pay off.

Our deployment scripts

As noted above, Ansible has a very simple order of execution, and its model is kind of a cross between declarative (“get my system configuration into this state”) and imperative (“do this, run this, then try this”). Our site deployment involves some system configuration, but is mostly a series of steps that “orchestrate” the deployment. Here’s more or less how it works:

  1. Setup: update code, run tests on staging server, upload new static assets.
  2. Turn off B app servers, run on A (we have 8 Python app servers in each group).
  3. Update code on B app servers.
  4. Turn off A app servers, run on B (making new code live on B).
  5. Update code on A app servers.
  6. Make all A and B app servers live.
  7. Record deployment log and send “finished deployment” email.
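The switch-over in steps 2 to 6 can be sketched as a toy Python simulation. This is only an illustration of the ordering, not our deployment code: the `deploy` function, the group names and the revision strings are all made up. The point it tries to show is that at every step at least one group is serving traffic, and the live groups never run two different revisions at once.

```python
def deploy(new_revision, old_revision="r100"):
    """Simulate the A/B switch-over and return a history of each step."""
    live = {"A": True, "B": True}
    code = {"A": old_revision, "B": old_revision}
    history = []

    def record(step):
        serving = sorted(group for group in live if live[group])
        history.append((step, serving, dict(code)))

    live["B"] = False                    # step 2
    record("turn off B, run on A")
    code["B"] = new_revision             # step 3
    record("update code on B")
    live["A"], live["B"] = False, True   # step 4: new code goes live on B
    record("turn off A, run on B")
    code["A"] = new_revision             # step 5
    record("update code on A")
    live["A"] = True                     # step 6
    record("make A and B live")
    return history


for step, serving, code in deploy("r101"):
    live_revisions = {code[group] for group in serving}
    print(step, serving, sorted(live_revisions))
```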

To show you some example Ansible code, step 3 (and step 5) use the following code:

---
- name: Update code on B app servers
  hosts: app_b
  tasks:
  - name: Update code on app servers
    subversion: repo={{ ... }} dest={{ ... }} username={{ ... }}
                password={{ ... }} revision={{ ... }}

  - name: Restart app service
    service: name=server-main state=restarted

  - name: Wait for app server to start
    wait_for: port={{ ... }} timeout=300

  - name: Check that new version is running
    local_action: uri url=http://{{ ... }}:{{ ... }}
                  return_content=true
    register: response
    failed_when: response.json['SvnRevision'] != {{ ... }}

As you can see, Ansible uses fairly straight-forward YAML syntax. In the above code, Ansible runs these tasks against our 8 “app_b” hosts in parallel, a simple but powerful concept.

For a given “play” such as the above, each task is executed in order. We really appreciated how it doesn’t try to outsmart you in terms of how and when things run. The only exception to this is Ansible’s handlers, which are tasks run at the end of a play, but only if something “notified” them. For example, in our deployment, handlers are used to restart our nginx servers when the nginx config file changes.

You’ll see there are a lot of `{{ ... }}` placeholders used here (the variable names are elided in this listing): each task line is actually a Jinja2 template string that is rendered against your current set of host variables. This makes it very easy to modify settings which change depending on environment (staging, production, etc). It also separates playbooks from user-specific data, meaning settings aren’t hard-coded in playbooks and folks can share them much more easily.
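The execution order described above (tasks from top to bottom, then only the handlers that were notified) can be modelled in a few lines of Python. This is a simplified sketch of the concept, not how Ansible is implemented. The `run_play` function and the task and handler names are invented for the example.

```python
def run_play(tasks, handlers):
    """Run tasks in order, then run each handler that a changed task notified.

    tasks: list of (name, action, notify), where action() returns True if it
           changed something and notify is a handler name or None.
    handlers: dict of handler name -> callable, in definition order.
    """
    log = []
    notified = set()
    for name, action, notify in tasks:
        changed = action()
        log.append(("task", name, changed))
        if changed and notify is not None:
            notified.add(notify)
    # handlers run once, at the end, in the order they were defined
    for name, handler in handlers.items():
        if name in notified:
            handler()
            log.append(("handler", name))
    return log


handlers = {"restart nginx": lambda: None}

# the config file changed, so the handler fires at the end of the play
changed_run = run_play(
    [("copy nginx config", lambda: True, "restart nginx"),
     ("ensure log directory", lambda: False, None)],
    handlers)
print(changed_run)

# nothing changed, so no handler runs
unchanged_run = run_play(
    [("copy nginx config", lambda: False, "restart nginx")],
    handlers)
print(unchanged_run)
```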

We deploy solely to Linux-based machines (about 50 nodes), and Linux is where Ansible started and where it excels. However, we have something of a Windows history, so it was interesting to learn that as of August 2014 (version 1.7), they started adding support for managing Windows machines. This is done via PowerShell remoting rather than SSH.

In short, what sold Ansible to us was:

  • Simple YAML-based syntax
  • Simple execution order: top to bottom, and then handlers
  • Powerful: Jinja2 templates, large library of builtin modules
  • Agentless: no client to install and maintain

Pre-Ansible, we dreaded our 28-manual-step deployments. Post-Ansible, it’s almost fun to deploy code, and the focus is on the features we’re deploying, instead of “what’s going to go wrong this time?”. So I hope you get the chance to try Ansible! And no, we weren’t paid to link that…