14 Nov 2016 by John
Here at Oyster we have a ton of written content. We cover over 15,000 hotels with written reviews, but beyond that we have around 3,500 blog articles, 1,500 ‘best of’ lists, and over 1,000 other documents including our fakeouts, photo slideshows, and travel guides. That is a LOT of words - 17.4 million to be more precise.
Our engineering problem has been making all of that content available to a much wider audience than just those who can understand English. The first part of tackling that problem is getting that content translated. However, it’s not as simple as just sending those 17.4 million words to a translation service. There are many factors to consider:
- Our content changes over time. In total, our documents number around 22,000, but our revisions over time amount to around 325,000! The good news is that a lot of that content is duplicated as drafts evolve, and only a portion is ever published. But as published documents change, we still have to make sure our translations either get updated or don't break.
- Translations are expensive. The rate we work with is 8 cents per word, which means if we straight up translated everything, our costs would come to roughly $1.4 million for just 1 language! And we currently support 5 non-English languages, bringing that bill up to $7 million. Our budget is a mere fraction of that cost, so it's clear we'd rather spend money delivering tons of new amazing photos and reviews than translating everything.
- Luckily, it seems we don't need to translate everything, since there is high variability of traffic/value across our content. A small fraction of our content accounts for a large share of our revenue - big-name hotels in destinations like the Dominican Republic, New York, or Las Vegas are more in demand. ‘Best of’ lists also turn out to be in very high demand because they cut out a lot of the time it takes to make a very valuable decision.
- Just having translations for big-ticket items is not good enough. Wait a second, I just said it seems we don't need to translate everything. Well, it turns out that we do in order to create a good overall experience. Imagine you were browsing a list of 10 hotels and half the hotel summaries were in English (or your native language) and half were in a foreign language. The experience would be quite jarring, and even if you found the perfect hotel, you might still come away dissatisfied. Luckily there are ways of doing what could be called ‘compressed’ translations: either translating only the most important parts of a document, or filling in less important parts with lower-quality translations such as machine translation - which is what Google and Facebook let you do for content they detect is in a foreign language.
These factors led us to build a solution that provides all of these features: translations are based on revisions so we can update different languages independently; they can be of different quality/price; there can be multiple translations of the same revision and language at different quality levels, merged together; and translations can be full or partial.
Translations table (sample rows)
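As a rough, hypothetical sketch of the kind of rows this table holds (field names are illustrative, not our actual schema), each translation ties a document revision to a language, a quality level, and optionally the subset of sections it covers:

# Hypothetical sample rows illustrating the translation data model described above
translations = [
    # full professional translation of one revision of a hotel review
    {'doc_id': 2817, 'revision': 1041, 'lang': 'de', 'quality': 'professional', 'coverage': 'full', 'sections': None},
    # partial professional translation of the same revision: only the most important sections
    {'doc_id': 2817, 'revision': 1041, 'lang': 'es', 'quality': 'professional', 'coverage': 'partial', 'sections': ['summary', 'pros', 'cons']},
    # machine translation of the same revision, merged in to fill the remaining sections
    {'doc_id': 2817, 'revision': 1041, 'lang': 'es', 'quality': 'machine', 'coverage': 'full', 'sections': None},
]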

That is a LOT of moving parts on the backend for a single translation, but it has allowed us to be very flexible and picky about what and how we translate our content. We've managed to get by with translating only around 10% of our documents, of which only around half are full translations and the rest are partial. The other 90% is mostly machine translated, but we're constantly striving to lower that number or find other solutions. The cost of this flexibility is a rather complex data structure and system to handle all the details needed to put these parts together. We've managed this by streamlining our translation process and consolidating the logic as much as possible.
That is why in addition to this system for handling hotel reviews and articles, we have a separate system for translating and displaying text that is part of our UI. For translating, we automate sending requests to Gengo through their API and for display we use standard gettext. This solution works very well for text that is static and highly visible - just always translate it at high quality. Fortunately, our UI text at 40,000 words is only 0.2% the size of our document text!
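On the display side, the gettext lookup is standard; here is a minimal sketch in Python (the domain name and locale directory are assumptions, not our actual layout):

import gettext

# Load the compiled German catalog, e.g. locale/de/LC_MESSAGES/messages.mo
de = gettext.translation('messages', localedir='locale', languages=['de'], fallback=True)
_ = de.gettext

print(_('Photo slideshow'))  # returns the German string if translated, the English string otherwise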
Earlier this year, we optimized our i18n experience even further, using de.oyster.com as a test bed: instead of going the machine translation route, we chose to selectively hide content. We decided to hide all machine-translated or untranslated blog articles and travel guides because we felt those low-quality translations might be interfering with the rest of our content from both a user experience and an SEO perspective. At the same time, we almost tripled the number of documents translated to German, even though that still only brought us up to around 13%. The hard part was determining exactly which hotels and documents to translate, which was a whole other problem that our data scientist was charged with tackling. Fortunately these changes seem to have paid off, as SEO traffic growth on de.oyster.com is far outpacing the US site (see below).

Normal traffic patterns for a travel site are:
- High in January
- Dip in the spring
- Highest in the summer
- Declining for the rest of the year
The US site is showing that trend this year, but the German site has broken away since we made these changes in the spring. This is not the end of our i18n work as we have plenty of other existing features and plans to help us bring more of our amazing content to the rest of the world.
16 Aug 2016 by Tuan

Welcome back to the 2nd post in the 3-part Computer Vision series at Oyster.com. If you have not seen the 1st part of the series, we recommend checking it out: there we show how HDR (High Dynamic Range) panoramas are done at Oyster, including some comparisons between Oyster's panoramas and Google's.
In this part, I will share the core work of our recently released feature Virtual Tours, also known as walkthroughs.
I will first give an introduction to walkthroughs and why we needed to build our own framework for this purpose. I will then jump right into the details of the computer vision system that generates virtual walkthroughs from sets of panoramas. Lastly, I will show some walkthrough results obtained with this approach, covering both indoor and outdoor scenarios.
Generating walkthroughs from panoramas
A walkthrough is a set of connected panoramas that users can navigate between, moving from one panorama to another. This type of feature provides a more interactive experience for users looking at remote destinations. Some examples can be found on Oyster.com, such as the walkthrough for the Trump Soho Penthouse or the walkthrough along a Scrub Island Beach Pool. There are different approaches to generating these walkthroughs.
One common approach is to use depth information to reconstruct the 3D scene for each panorama spot, as Matterport does. This provides seamless transitions from scene to scene, but it comes at a high cost - you must purchase their special device and pay for model hosting - and its image quality is not comparable to standard DSLR cameras. The more economical and more popular approach uses 2D panoramas to build walkthroughs, which can be viewed with popular 360 image viewers such as Krpano that let users walk between panoramas. The connections between panoramas can be built manually, but Oyster shoots thousands of walkthroughs a month and needed an automated solution. We developed a fully automated framework that uses Computer Vision to find and connect all related panoramas into one complete walkthrough.
Automating walkthrough process
Our process starts with a set of HDR panoramas as input (Figure 1 shows a set of test panoramas taken at the TripAdvisor Office@NYC). It finds the panoramas that are connected, estimates the links between those panoramas, and integrates those links into a Krpano virtual tour project.
Figure 1: Set of equirectangular panoramas as input

Given two panoramas, call them 1 and 2, creating a virtual tour from them comes down to finding the location of camera 1 in panorama 2's spherical coordinates and the location of camera 2 in panorama 1's spherical coordinates. We project the spherical coordinates (defined by a horizontal angle ath and a vertical angle atv) into planar coordinates (horizontal value x and vertical value y), where the planar-to-spherical reprojection is given by ath = (x/width - 0.5) * hfov and atv = (y/height - 0.5) * vfov, with hfov = 360 and vfov = 180 for our panoramas. This results in a set of local planar slices, so we can look for the location of camera 1 in all local slices of camera 2's model and, vice versa, the location of camera 2 in all local slices of camera 1's. The original problem of finding the camera location for each pair of panoramas therefore becomes a search for the camera location in all slices of the other panorama; given a set of n panoramas and m local slices per panorama, we need to carry out n * (n - 1) * m * (m - 1) / 4 slice-slice matchings.
Figure 2 illustrates the top-down view between panorama 1 (in blue, having camera center O1) and panorama 2 (in yellow, having camera center O2). Each panorama sphere is warped onto a cube, which appears as a square looking top-down with 4 sides FRONT-RIGHT-BACK-LEFT.
Figure 2: Automated process for generating virtual tour

For each image-image matching of 2 slices, the camera location is the center of the pin-hole camera model, and the geometry between the two image models is an epipolar geometry. If we look at images 1-FRONT and 2-RIGHT, the camera centers O1 and O2 of the two images form a line running from image 1 to image 2 that intersects each image at E1 and E2, the Epipoles of the epipolar geometry. Epipole E1 on image 1 is the pixel coordinate (x1, y1) of camera center O2 as seen in image 1, and Epipole E2 on image 2 is the pixel coordinate (x2, y2) of camera center O1 as seen in image 2.
Without loss of generality, let us assume the Front and Back sides of camera 1 share overlapping views with the Right and Left sides of camera 2 in this example. Our process needs to locate E1 (the image of camera center O2 in panorama 1) and E2 (the image of camera center O1 in panorama 2). The coming sections describe how these two values can be calculated from the overlapping plane and the corresponding points detected in both images. The whole image-image matching process has 4 main steps: construct local views, find corresponding points, find the hotspot location in local coordinates, and transform local to spherical coordinates. These steps are illustrated in Figure 3.
Figure 3: Automated process for generating virtual tour

Construct local views
A 360 panorama is a representation of a sphere whose center is at the camera location. The original equirectangular panorama can be divided into 6 non-overlapping rectangular local views representing the Up, Down, Left, Right, Front, and Back sides of the cube enclosing the panorama's 360 sphere. This division lets us use the epipolar geometry constraints of two image planes sharing overlapping views to find the epipoles, which in our case are the images of the camera positions. We leave out the Up and Down views since they do not contain hotspots for the virtual tour.
An equirectangular panorama can be split into 6 rectangular images using the krpano tools; the result is shown in Figure 4.
krpanotools64.exe makepano image.tif normal.config
Figure 4: Slicing panorama into local views

This slicing warps all pixels (with spherical coordinates ath, atv) on the sphere onto pixels (with planar coordinates x, y) on the sides of the bounding cube, where each side's dimension is twice the radius of the sphere. The spherical-to-planar projection is defined by x = (ath / hfov + 0.5) * width and y = (atv / vfov + 0.5) * height, and its inverse is what we will use later to project our estimated camera planar coordinates back to spherical coordinates.
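These two mappings are simple enough to write down directly; here is a minimal sketch of both directions, assuming the full equirectangular field of view (hfov = 360, vfov = 180) used above:

HFOV, VFOV = 360.0, 180.0

def planar_to_spherical(x, y, width, height):
    # pixel (x, y) on the equirectangular image -> angles (ath, atv)
    ath = (float(x) / width - 0.5) * HFOV
    atv = (float(y) / height - 0.5) * VFOV
    return ath, atv

def spherical_to_planar(ath, atv, width, height):
    # angles (ath, atv) -> pixel (x, y) on the equirectangular image
    x = (ath / HFOV + 0.5) * width
    y = (atv / VFOV + 0.5) * height
    return x, y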
Find corresponding points
In the pin-hole camera model, a pixel coordinate on an image represents the set of 3D points lying on the ray of light from the camera center through that pixel (continuing to infinity). With another camera viewing the same scene, we can see that ray as a line: in other words, a point in one camera transfers to a line in the other camera in epipolar geometry. This line is its corresponding (epipolar) line, as illustrated in Figure 5. All the corresponding lines share a common property: they pass through the Epipole. That is, given all points on image 1-FRONT in Figure 2, we can project their corresponding lines onto image 2, and all of those lines intersect at Epipole E2 on image 2; similarly for Epipole E1 on image 1. In order to compute corresponding lines from image points in pixel coordinates, we first need to find the fundamental matrix of the epipolar geometry. The fundamental matrix is a rank-2 3x3 matrix that encodes the relative pose (translation + rotation) between the two images as well as the intrinsic parameters of the two cameras. It has 7 degrees of freedom: 2 for each Epipole and 3 for the homography that relates the two image planes. The convenient property of the fundamental matrix is that it can be calculated from a sufficient number of corresponding points - pixel points that appear in both images and correspond to the same 3D real-world point.
Figure 5: Epipolar geometry of the overlapping view of 2 camera models

With all that theory established, our problem of connecting panoramas into a virtual walkthrough comes down to finding corresponding points on each pair of slice images from the two panoramas. For this task we resort to feature matching, which is a robust approach for dynamic views. The matching consists of three main steps: feature detection, feature matching, and feature pruning.
Feature detection
Feature detection is the process of running pre-defined feature filters over an image to discover features that are discriminative and view invariant (for example, points at corners or edges). OpenCV has implementations of a collection of robust local feature detectors such as FAST, STAR, SIFT, and SURF (please check OpenCV's documentation for more available feature detectors).
An example of how SIFT can be used to detect local features is shown below. (Note: since SIFT and SURF are patented features, you may want to use the free alternatives provided by OpenCV to avoid licensing fees.)
# SIFT detector: nfeatures caps the number of keypoints kept, contrastThreshold filters out weak features
detector = cv2.xfeatures2d.SIFT_create(nfeatures=2000, nOctaveLayers=3,
                                       contrastThreshold=0.03, edgeThreshold=10, sigma=1.6)
# detect keypoints and compute their descriptors (self.mask limits detection to a region of interest)
kp, des = detector.detectAndCompute(self.image, self.mask)
In the code above, there are 2 parameters that are quite important: nfeatures and contrastThreshold. Reducing the contrast threshold or increasing the maximum number of features gives us more features, and vice versa. These values should be chosen based on the nature of the image data we are dealing with and the focus of our detection process. In our case, the real-world distance between camera locations is unknown and could be large, so a more appropriate design is to extract as many features as we can in the detection phase and then filter out irrelevant features in the matching and pruning phases. More practical decisions like this are discussed in the last section, along with our coarse-to-fine approach for efficiently extracting features within a constrained processing time.
The following figure shows the result of our feature detection process on two images, one from each panorama. Local features are drawn in different colors; as we can see, they are mostly detected on corners and edges, and some features appear in both panoramas - those are the corresponding points we are looking for.
Figure 6: Local features detected from the two images

Feature matching
At each location where a feature is detected, a set of attributes is extracted to describe that feature; these are called feature descriptors. Some of the most common feature descriptors implemented in OpenCV are SIFT, SURF, HOG, BRIEF, and BRISK (again, please refer to OpenCV's documentation for more available descriptors). These descriptors can be seen as a normalized (orientation-wise), vectorized aggregation (spatial-wise) of primitive filter responses (e.g. the SIFT descriptor gives a 128-dimensional vector aggregated from 4 x 4 location bins in left-right, top-down spatial order, with each bin represented by accumulated gradients grouped into 8 orientation bins).
Feature matching is the process of finding the set of features that appear in both images, given their feature descriptors (in the form of multidimensional vectors). With tens of thousands of features detected in each image, an efficient matching approach is to use a kd-tree (k-dimensional binary tree) to index all features of one image; matching features from the other image can then be done by traversing the indexed trees. To minimize false negatives in matching, knn search (finding the nearest k matches for each feature) is also used. All of this can be done with the OpenCV API in 2 lines of code:
FLANN_INDEX_KDTREE = 1  # FLANN's kd-tree index algorithm
matcher = cv2.FlannBasedMatcher(dict(algorithm=FLANN_INDEX_KDTREE, trees=5), dict(checks=50))
matches = matcher.knnMatch(self.pano_image_from.des, self.pano_image_to.des, k=2)
In this context, the number of trees, traversal checks, and number of k nearest neighbors are chosen based on accuracy-time tradeoffs in our particular application. The following figure shows the result of this matching step. As we can see, this design aims to reduce false negatives, so more matches are returned than needed; those matches will be cleaned up in the feature pruning step that follows.
Figure 7: Feature matching results

Feature pruning
Feature pruning is a series of filters applied to the matched features to remove those that passed the previous matching step (which was based purely on similarity in appearance) but are not true correspondences. Such wrongly matched features are actually very common. Most of the time they are features on similar or identical objects (e.g. features on a brick wall share the same appearance across the whole wall, and features on 2 different corners of a board look alike, just rotated versions of each other), or they are noise features (typically random dots on clean backgrounds that appear so frequently that there are many matches with similar matching distance).
A successful feature matching system is largely determined by how well feature pruning is implemented. If it is too strict, we might end up with insufficient matches; if it is too loose, it will not only increase our processing time but also pass incorrect matches on to later phases.
Pruning should be designed around the nature of the data. For our problem there are 4 main pruning filters: the ratio filter, the cross-match filter, the orientation consistency filter, and the spatial consistency filter.
- Ratio filter is designed to remove the feature noise mentioned above. The idea is: given all matches of a feature along with their similarity scores, a feature is treated as noise when its best and second-best matches have very similar scores. In other words, if a feature can be matched with similar confidence to 2 different features, it is considered noise (e.g. a random dot on a wall could be matched with similar confidence to other random dots). This filter checks the ratio of the similarity score of the second-best match to that of the best match; the ratio has to fall under a certain value for that feature and match to be valid. This pruning technique was first proposed by David Lowe (the author of the SIFT feature) using 0.7 as the ratio threshold, but this type of filter can be used effectively with any feature that has explicit similarity scores (a short sketch of this filter and the next follows this list).
- Cross-match filter checks for a mutual matching result between a pair of features: 2 features are considered correctly matched only when each feature appears in the match list of the other.
- Orientation consistency filter is specific to our particular problem, where there is no rotation in the transformation from one image to the other: the cameras are placed and locked on a tripod when taking the photos, so the images have the same upright orientation wherever the tripod is placed. This filter checks whether the dominant orientation of a feature and that of its match are close, and it can be applied with any feature detector that calculates a dominant orientation.
- Spatial consistency filter is similar to orientation consistency: because image orientation is preserved between different shots, the relative spatial relationships between features are also preserved, and this filter checks whether a feature and its match keep those relationships.
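Here is a short sketch of the first two filters, applied to the knn matches from the matching step (matches_12 and matches_21 are assumed to be knnMatch results in each direction, not variables from the earlier code):

def ratio_filter(knn_matches, ratio=0.7):
    # keep a match only when its best candidate is clearly better than the second best
    good = []
    for pair in knn_matches:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return good

def cross_match_filter(matches_12, matches_21):
    # keep a match only when the two features pick each other in both directions
    reverse = set((m.queryIdx, m.trainIdx) for m in matches_21)
    return [m for m in matches_12 if (m.trainIdx, m.queryIdx) in reverse]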
The following figure shows the result of the feature pruning process after the 4 filters have been applied; the remaining features form a much cleaner and more accurate set of matches than the initial set obtained from pure appearance comparison.
Figure 8: Feature pruning results

Find hotspot location in local coordinate
The corresponding features detected in the two images are then used to find the hotspot location in local coordinates. This is done in three steps: first, find the fundamental matrix of the epipolar geometry constructed from the two sets of corresponding points; then use the fundamental matrix and the two point sets to find a set of corresponding lines on each image; and lastly, derive the intersections of the corresponding lines on each image as the epipoles, i.e. the camera locations in local planar coordinates.
As discussed previously, the fundamental matrix in epipolar geometry can be estimated based solely on a set of corresponding points (at least 8 pairs). A robust approach to finding the fundamental matrix is the RANSAC (Random sample consensus) technique, an iterative process in which, at each iteration, a random subset of the corresponding pairs is chosen as the “inliers” to construct a candidate fundamental matrix. The remaining pairs are treated as test samples: the candidate matrix is used to predict corresponding points for the test set, the predictions are compared with the actual corresponding points to count inliers in the test set, and that count scores the candidate. The best candidate fundamental matrix is then chosen and returned. The following code shows how it is done in OpenCV (here fundamental_mask lets us trace back to the original matches to find the inlier, or valid, matches).
fundamental_mat, fundamental_mask = cv2.findFundamentalMat(valid_matches_left, valid_matches_right, cv2.FM_RANSAC)
Given the fundamental matrix, we can then calculate the epilines on the other image for every inlier point from one image.
# keep only the inlier matches identified by RANSAC
valid_matches_left = valid_matches_left[fundamental_mask.ravel() == 1]
valid_matches_right = valid_matches_right[fundamental_mask.ravel() == 1]
# epilines in the left image from points in the right image (whichImage=2), and vice versa (whichImage=1)
epilines_left = cv2.computeCorrespondEpilines(valid_matches_right.reshape(-1, 1, 2), 2, fundamental_mat)
epilines_right = cv2.computeCorrespondEpilines(valid_matches_left.reshape(-1, 1, 2), 1, fundamental_mat)
Once the epilines are computed, the epipole is derived as their intersection, and only epipoles that fall within the image boundaries are valid. The estimated epipoles lie on the ray connecting the two cameras, which can be followed from one image to the other or vice versa. Since we need to identify whether we can navigate from one image to another, we use a far-near relationship to describe the 2 images: given that the 2 images show the same scene, one image is closer to the scene than the other. That is the requirement for a valid epipole to be found within the image boundary, and navigation only happens from the far image to the close image; in other words, we are looking for the far image, because its epipole is the hotspot we want. To determine which image is further from the scene, we use the average distance of all inlier features to their mean location (in both the horizontal and vertical dimensions); the image with the smaller average distance is the one further from the scene.
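As an alternative to intersecting the epilines explicitly, the epipoles can also be read directly off the fundamental matrix: the epipole in the first image is the null vector of F, and the epipole in the second image is the null vector of F transposed (under the convention used by cv2.findFundamentalMat(points1, points2), where points from the second image multiply F on the left). A hedged sketch:

import numpy as np

def epipole_from_fundamental(F):
    # the epipole is the null vector of F, i.e. the right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(F)
    e = vt[-1]
    return e / e[2]  # normalize homogeneous coordinates to (x, y, 1)

e1 = epipole_from_fundamental(fundamental_mat)    # epipole in the first image
e2 = epipole_from_fundamental(fundamental_mat.T)  # epipole in the second image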
Figure 9: Hotspot estimation - forward

Out of all possible local view matches (16 matches when considering 4 views - LEFT, RIGHT, FRONT, BACK - or 36 matches when considering all 6 local views, including TOP and DOWN), we should ideally end up with 2 epipole locations: one to go from the first panorama to the other, and one to come back. In practice, however, we normally end up with more than 2 valid epipoles, so we use a metric called average vertical distance to rank pairs of epipoles by correctness. Average vertical distance is the average of the distances from the two estimated epipoles to the middle lines of their images. In theory, since the camera tripod is at a fixed height, the epipoles should always lie on the middle line of the image, so we can use this property to find the best epipole pair, i.e. the one with the minimum distance to the middle lines. Using this metric, we are able to locate the best hotspot among all matches from one pano to the other (E1 in the top-down view of Figure 2), which in turn lets us decide the opposite hotspot on the other side of the spherical cube going back from the other pano (E2 in Figure 2).
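A minimal sketch of this ranking metric (the candidate-pair structure here is illustrative):

def average_vertical_distance(epipole_1, epipole_2, height_1, height_2):
    # how far, on average, the two epipoles sit from the horizontal middle line of their images
    d1 = abs(epipole_1[1] - height_1 / 2.0)
    d2 = abs(epipole_2[1] - height_2 / 2.0)
    return (d1 + d2) / 2.0

# candidate_pairs: list of (epipole_1, epipole_2, height_1, height_2) tuples from the valid matches
best_pair = min(candidate_pairs, key=lambda p: average_vertical_distance(*p))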
Figure 10: Hotspot estimation - backward

Once the locations of the hotspots in local planar coordinates are found, we can derive global planar coordinates based on the index of the local view plane, and then the spherical coordinates can be calculated from the panorama's size and field of view.
Figure 11: Projecting local coordinates to spherical coordinates

Practical implementation tips and tricks
So far we have presented a complete workflow to generate virtual tours automatically using OpenCV and Krpano. There are a few additional points to take into consideration when implementing this workflow at production scale.
- Local view matching: as an alternative to running all 4 x 4 local view matchings, we can use the result of matching one view against the other panorama's 4 views to decide whether we need to carry out the rest of the matches.
- Coarse-to-fine framework for feature detection and matching: processing time and accuracy both grow with the number of features detected and matched, so we derived a coarse-to-fine framework with 3 layers, going from a small number of features (coarse layer) to a large number of features (fine layer). Only when good matches are identified at a coarser level do we move on to the finer level (see the sketch after this list). This greatly reduces processing time while maintaining high matching accuracy.
- Metrics to find the best hotspot: in this context we use the average distance to the horizontal middle line of the image as the metric to find the best epipole location, but other estimators may work better for a given scenario, for example the good-feature ratio (the ratio of final inlier features to all detected features) or the number of inliers found.
- Free vs. non-free: one important note about feature choice is the extra cost involved in using non-free features (SIFT, SURF); for production code it is best to weigh the tradeoffs between accuracy, processing time, and cost to make sure you pick the most suitable feature type for your application.
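As a sketch of the coarse-to-fine idea above (detect_and_match and prune stand in for the detection, matching, and pruning steps described earlier; the budgets and threshold are illustrative):

FEATURE_BUDGETS = [500, 2000, 8000]  # coarse -> fine
MIN_GOOD_MATCHES = 20

def coarse_to_fine_match(image_1, image_2):
    matches = []
    for nfeatures in FEATURE_BUDGETS:
        matches = prune(detect_and_match(image_1, image_2, nfeatures=nfeatures))
        if len(matches) < MIN_GOOD_MATCHES:
            return []  # no evidence of overlap at this level, skip the costlier finer passes
    return matches     # matches from the finest level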
Results and summary
In this post, we have presented a complete workflow to create virtual tours using a Computer Vision approach, based on local feature matching and epipolar geometry. We also discussed some practical notes about implementing such a system at production scale. Together with our HDR panorama work presented in part 1, this lets us create high quality virtual tours that boost user experience and engagement on our sites. Again, if you have not checked out the previous part, where we talked about how High Dynamic Range panoramas are created at large scale, please do. In the next and last part of this Computer Vision series, we will show you how to integrate smart features like moving mini-maps and dynamic mouse arrows into virtual tours.
Here are some of the virtual tours generated from the approach we just described, the links above each screenshot will lead you to our live walkthroughs on Oyster.com, with support for both desktop and mobile.
Dream Downtown Lobby

Gansevoort The Chester

Holiday Inn Montego Main Pool

Refinery Hotel Rooftop

Scrub Island North Beach

Trump Soho Penthouse

About the author:
Tuan Thi is a Senior Software Engineer in Computer Vision at Oyster.com, part of Smarter Travel Media Group, at TripAdvisor. He finished his PhD in Computer Vision and Machine Learning in 2011. Before joining TripAdvisor, he was a research engineer and computer vision scientist at Canon Research and Placemeter Ltd. with various international publications and patents in the field of local features, structured learning and deep learning.
07 Jul 2016 by Tuan

Here at Oyster, we are the leading website for comprehensive photographic reviews of hotels. One key component of our imagery database is panoramas, produced at high quality and large scale (over 150,000 to date). In this three-part series, we will be looking at the Computer Vision work that has been part of our panorama pipeline. In this first part of the series, we will introduce our automated pipeline for generating High Dynamic Range (HDR) panoramas.
HDR Panorama
Panorama images at Oyster cover the full angle range: 180 degrees vertically and 360 degrees horizontally. They provide an immersive experience, letting viewers explore a complete view of the venue - whether outside at the pool, on the rooftop, or inside a hotel room. Panorama images have become a trending, must-have media type for most image-oriented websites. Meanwhile, HDR imaging is a common technique to produce a greater dynamic range of luminosity than standard digital imaging. It is especially useful for panorama imaging, where an evenly distributed exposure greatly improves quality. HDR imaging is normally achieved by merging multiple low-dynamic-range photographs. We use PTGui, a stitching package, to carry out batch stitching of 12 fisheye images (180 degrees x 180 degrees): four different views (left, right, front, back) with three images at different exposures for each view. The stitching process returns one equirectangular panorama.
Raw fisheye images

While PTGui is a good choice for image stitching, its HDR quality is not the best available, so people often use alternative tools for HDR merging. SNS-HDR is the package used at Oyster: it supports batch processing, produces excellent HDR quality, and offers adequate deghosting support (compared to PTGui).
PTGui HDR

SNS-HDR

The tricky parts of using SNS-HDR as a batch tool are its limited support for input file formats and the need to auto-group images of the same view, and that is where Computer Vision comes into play.
The first problem is the accepted input image formats. SNS-HDR works with RAW files, but it works especially well with RAW files converted to DNG format (using DNGConverter), compared to the two common formats CR2 and NEF.
One possible artifact with SNS-HDR on original raw files

Stable SNS-HDR merge on dng files

The second problem with SNS-HDR batch processing is figuring out which three images belong to the same view. This is done using OpenCV (the Python binding, with Numpy), one of the most comprehensive Computer Vision libraries to date. We use DCRAW to convert the DNG files to TIFF, since OpenCV does not work directly with raw files (CR2, NEF, DNG). OpenCV can then be used to detect the four sets of three near-duplicate images of the same view.
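The conversion step itself can be scripted; here is a hedged sketch of calling dcraw from Python (the -T flag asks dcraw to write TIFF output, though exact flags may vary by version):

import subprocess

def dng_to_tiff(dng_path):
    # dcraw writes image.tiff next to image.dng when given -T
    subprocess.check_call(['dcraw', '-T', dng_path])
    return dng_path.rsplit('.', 1)[0] + '.tiff'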
Near-duplicate image detection
In this context, we have 12 fisheye images (180 degrees vertical by 180 degrees horizontal) of four adjacent views (left, front, right, back). Each view has three images taken at different exposure levels (varying shutter speed), and we need to robustly divide the 12 images into those groups of three. Since the exposures differ, we cannot apply direct comparison methods like checksums.
There exist several methods for near-duplicate image detection, all of which involve a pre-processing step (e.g. histogram equalization), a pair-wise similarity metric of choice (e.g. pixel-wise or block-wise distance, edge or contour difference, norm-1 or norm-2 distances), and an association method. These are combined and tuned based on the practical constraints of the problem.
In our approach, we apply histogram equalization to balance out the different exposure levels, followed by a pixel-wise absolute difference. A pixel-wise difference is chosen because spatial difference is what matters in our case of unaltered adjacent views; for cases like detecting transformed images, people normally opt for local edge or contour differences.
This is followed by a postprocessing step of lower-bound trimming to remove illumination-difference noise, and image erosion to remove camera-movement noise. This step returns a binary difference image for any two input images, which could also be used for detecting ghosting problems in HDR merging.
Image Comparison using pixel-wise difference, with lower bound trimming and erosion
import cv2
import numpy as np

def image_comparison(img1, img2, lower_bound=120):
    # img1 and img2 are single-channel (grayscale) images of the same size
    e1 = cv2.equalizeHist(img1)
    e2 = cv2.equalizeHist(img2)
    diff = cv2.absdiff(e1, e2)
    # lower-bound trimming removes small differences caused by illumination changes
    _, diff = cv2.threshold(diff, lower_bound, 255, cv2.THRESH_BINARY)
    # erosion removes thin noise caused by slight camera movement
    kernel = np.ones((2, 2), np.uint8)
    diff = cv2.erode(diff, kernel, iterations=1)
    nonZero = cv2.countNonZero(diff)
    return nonZero
The last step in this approach is association, where the pair-wise image differences are used to group similar images into sets. This step is normally specific to and fine-tuned for each system, and there are two common ways it can be implemented. It is similar to a standard clustering problem: you can use either a distance-based method (hierarchical clustering) or an iterative centroid/group-based method (k-means clustering).
In a distance-based method, a distance threshold is chosen (or learned empirically from data) to decide whether two items belong to the same group. Once two images are matched, subsequent association steps only need to be carried out on one sample element of the group. This approach has the advantage of being fast (linear processing time), but its performance depends on how well the distance threshold is chosen, so it is mostly used when the data is well separated and speed is a requirement.
In our case the number of images is small and the accuracy requirement is 100% correct matches (imagine an HDR image merged from three images of different views - it does not look pretty). Therefore we go with the second approach, matching all pair-wise combinations of source images (66 matches for 12 images). The constraint of four sets of three images is used as the termination condition for the association step. Our iterative association consists of two steps: collecting tuples of the top three matches for each image (each tuple representing a candidate group) and filtering out good matches (any tuple that is collected exactly three times is a correct association). This is repeated until all elements are filtered.
Similar image association
import operator

def find_groups(unselected, grouped, matches):
    freqs = {}
    m = {}
    for i in range(12):
        # pair-wise differences between image i and every still-unselected image
        n = {}
        for j in range(12):
            if i == j:
                continue
            if j in unselected:
                n[j] = matches[(i, j)]
        # the two closest images plus i itself form a candidate group of three
        m[i] = sorted(n.items(), key=operator.itemgetter(1))
        new_set = sorted([v[0] for v in m[i][:2]] + [i])
        freqs[tuple(new_set)] = 1 if tuple(new_set) not in freqs else freqs[tuple(new_set)] + 1
    # a candidate group proposed by all three of its members is a correct association
    for k, v in freqs.iteritems():
        if v == 3:
            for i in k:
                unselected.remove(i)
            grouped.append(k)
Once similar images are grouped into their correct views, SNS-HDR is used to merge the LDR images into HDR images (with tonemapping). PTGui is then called to stitch the four merged HDR views into one equirectangular panorama.
Normal Panorama

HDR Panorama done right

Here is a list of randomly selected panoramas from Oyster compared to Google:
Trump Soho from Google.com
Trump Soho from Oyster.com
Refinery Hotel from Google.com
Refinery Hotel from Oyster.com
Holiday Inn Resort Montego Bay on Google.com
Holiday Inn Resort Montego Bay on Oyster.com
Sensatori Jamaica by Karisma on Google.com
Sensatori Jamaica by Karisma on Oyster.com
In this post, we have presented our approach to generating HDR panoramas at large scale using available packages like DNGConverter, DCRAW, SNS-HDR, and PTGui, with the help of Computer Vision techniques in OpenCV. Please feel free to visit Oyster to see our rich collection of hotel panoramas from all around the world. Also, please stay tuned for parts 2 and 3 of this Computer Vision series, where we will show you how virtual tours can be generated (again, fully automated at large scale) from a set of panoramas, and how smart features like mini-maps can be added to your tours to improve the user experience.
About the author:
Tuan Thi is a Senior Software Engineer in Computer Vision at Oyster.com, part of Smarter Travel Media Group, at TripAdvisor. He finished his PhD in Computer Vision and Machine Learning in 2011. Before joining TripAdvisor, he was a research engineer and computer vision scientist at Canon Research and Placemeter Ltd. with various international publications and patents in the field of local features, structured learning and deep learning.
22 Mar 2016 by Alex

In this post I’ll be covering some tips on how to use React and jQuery together in the same UI.
Okay so first off you might be thinking “why would you want to do such a thing?” - in fact the idea
of trying to make React’s declarative style live together with imperative jQuery DOM updates
may have you thinking something like this,
and for the most part, you wouldn’t be wrong.
So first the “why.” If you’re starting a brand new, “greenfield” project and you want to use React,
then just do it. There’s no good reason I can think of to mix-and-match React with jQuery or
Mustache or whatever other DOM-helper/template library if you don’t have to - just use React and everything will be cool
and you won’t have to worry about it. If on the other hand, you have one of those “legacy” applications
that has “customers” and makes “money,” and for some reason your boss is not into the idea of you
spending a few weeks rewriting the whole front end in React, but you still want in on that declarative
React goodness, you may have to figure out how to get React to play well with jQuery or something similar.
Are you sure you can’t rewrite it?
Let’s say you have some piece of UI that gets rendered with jQuery, and you want to stick in some
new component written with React. Take a look at your jQuery rendering function. Are you just building up some
big string of HTML and sticking it in the DOM? Or using some kind of JavaScript template? If your rendering
is already reasonably functional, i.e. some data goes into your function and some HTML comes out (or gets
appended to the DOM or whatever) it will be pretty easy to just rewrite in React. So you should probably
just do that and save yourself the inevitable hassle you’ll have when something breaks and you have to debug it.
Okay so you can’t rewrite it
For whatever reason you’ve determined it’s not practical to rewrite your jQuery code. Here is an important
caveat: I think using jQuery and React to manage updates to the same DOM elements is a bad idea.
React is really smart about figuring out how to update the DOM, but that only works if React is the only thing
doing the updates. So unless you can cleanly separate the DOM elements in your UI so that some only get
updated by React and others only get updated by jQuery, I wouldn’t try it.
So say for instance you’re
going to render a product list with a React template, and then jQuery is going to add and remove CSS classes
to the list items, and then your React render function might get called again later. This is a bad idea. React will have no idea about the changes that jQuery has made. Some of React’s efficiency comes from reusing DOM nodes on the page when things change, rather than always inserting or deleting nodes. If jQuery is making DOM changes that React doesn’t know about, some node that gets reused might be in an unexpected state.
One further caveat is that you should only really consider this mixed approach if you’re planning to
eventually replace jQuery rendering with React. Using both doesn’t make sense long-term, but if you’re
looking to gradually transition to React, you may have some parts of your UI using both for a while.
Let’s write some code
As an example we’ll start by rendering a simple list of products with jQuery. We’ll just show the name
for each product and a button to buy it. Then we’ll get into replacing parts of the UI with React.
Our jQuery product list is pretty basic - it takes an array of products and inserts the list into
#product-list-container. If the product list is updated, you just call productListJustJquery()
again
and replace the whole list with a new list.
var products = [
{
id: 1,
name: 'Book',
price: 15
},
{
id: 2,
name: 'Burrito',
price: 8
},
{
id: 3,
name: 'Spaceship',
price: 999999999
},
{
id: 4,
name: 'Dinosaur Bones',
price: 5000000
}
];
function buyProduct(productId) {
// buy the product
}
/* -- Just jQuery -- */
function buyButtonJquery(product) {
var button = $('<button class="buy-button">$' + product.price
+ '</button>');
// handle click event
$(button).on('click', function(event) {
event.preventDefault();
buyProduct(product.id);
});
return button;
}
function productListJustJquery(products, element) {
var list = $('<ul class="product-list"></ul>');
products.forEach(function(product) {
var item = $('<li>' + product.name + '</li>');
item.append(buyButtonJquery(product));
list.append(item);
});
// replace the existing list if there is one
var currentList = $(element).find('.product-list');
if (currentList.length) {
currentList.replaceWith(list);
} else {
$(element).append(list);
}
}
jQuery inside a React component
We’ll start by replacing most of the product list with React but leaving the buy button in jQuery.
This is simpler than the inverse - sticking React inside a jQuery UI - so we’ll do it first.
The ProductListComponent
is pretty straightforward:
var ProductListComponent = React.createClass({
render: function(props) {
return (
<ul className="product-list">
{this.props.products.map(function(product) {
return <ProductComponent
key={product.id}
product={product} />
})}
</ul>
);
}
});
but in ProductComponent
we need some extra
code to make the call to jQuery. We add an extra button-container
element,
so that we have somewhere to put the jQuery DOM, and keep a reference to it.
render: function(props) {
/* we need to keep a ref to the
* button-container so we can update it with jQuery
*/
return (
<li>
{this.props.product.name}
<span className="button-container"
ref="buttonContainer"></span>
</li>
);
}
Life cycle methods
It’s important to get familiar with the various React life cycle methods.
The relevant ones here are componentDidMount
- which is called after the first render, and
componentDidUpdate
- which is called after subsequent renders. In each of these methods we just call
renderBuyButton
, which uses our reference to the button-container
DOM node to create a brand new buy
button with jQuery on each render.
componentDidMount: function() {
this.renderBuyButton();
},
componentDidUpdate: function() {
this.renderBuyButton();
},
renderBuyButton: function() {
// render the buy button with jQuery
$(this.refs.buttonContainer).html(
buyButtonJquery(this.props.product)
);
}
Here’s the complete ProductComponent
:
var ProductComponent = React.createClass({
componentDidMount: function() {
this.renderBuyButton();
},
componentDidUpdate: function() {
this.renderBuyButton();
},
render: function(props) {
/* we need to keep a ref to the
* button-container so we can update it with jQuery
*/
return (
<li>
{this.props.product.name}
<span className="button-container"
ref="buttonContainer"></span>
</li>
);
},
renderBuyButton: function() {
// render the buy button with jQuery
$(this.refs.buttonContainer).html(
buyButtonJquery(this.props.product)
);
}
});
React components inside jQuery
Now we’re going to do it the other way and stick some React DOM inside our jQuery DOM. This is a little
trickier. We’ll start with a BuyButtonComponent
in React, there’s not much to it:
var BuyButtonComponent = React.createClass({
onClick: function(event) {
buyProduct(this.props.product.id);
},
componentDidMount: function() {
console.log('component did mount - stuff to clean up later');
},
render: function(props) {
return (
<button className="buy-button"
onClick={this.onClick}>{this.props.product.price}
</button>
);
},
componentWillUnmount: function() {
console.log('about to unmount - clean up stuff here');
}
});
function buyButtonReact(product, element) {
ReactDOM.render(
<BuyButtonComponent product={product} />,
element
);
}
I’ve added the componentDidMount
and componentWillUnmount
methods with some console.logs
. They clearly don’t really
do anything in this component, but in a real component you’ll often do something in componentDidMount
-
subscribe to event from a Flux store or something - that needs to be cleaned up when the component unmounts.
We need to make sure these methods still get called at the right times or you risk memory leaks or trying
to update a component’s state property when it no longer exists (which will throw an error).
So now we’ll alter our jQuery function for rendering the product list to use our new React BuyButton
component.
We’ll use the same strategy of adding an extra button-container
component here. We also attach the product data
to the container component so we can use it later.
products.forEach(function(product) {
var item = $('<li>' + product.name + '</li>');
/* add a container element where
* we'll attach our React component
*/
var buttonContainer = $('<span class="button-container"></span>');
// add product data to use in our React component
buttonContainer.data('product', product);
item.append(buttonContainer);
list.append(item);
});
After we’ve inserted the main product list with jQuery, we iterate over the container nodes and use the product
data to render the buy buttons with React. You can see in the console that componentDidMount
is called for each component.
We might render this product list multiple times, so we need to make sure our productListJqueryReact
function works
when called repeatedly. jQuery is going to blow away the whole DOM each time which won’t give React a chance
to do its clean up (calling componentWillUnmount
), so we need to manually unmount the React components before
we insert a new list with jQuery.
// clean up any mounted React components
$(element).find('.button-container').each(function() {
ReactDOM.unmountComponentAtNode(this);
});
You can verify in the console that componentWillUnmount
is called for each of the buy buttons every time the list is re-rendered. Here’s the complete function for rendering the product list:
function productListJqueryReact(products, element) {
var list = $('<ul class="product-list"></ul>');
// clean up any mounted React components
$(element).find('.button-container').each(function() {
ReactDOM.unmountComponentAtNode(this);
});
products.forEach(function(product) {
var item = $('<li>' + product.name + '</li>');
/* add a container element where
* we'll attach our React component
*/
var buttonContainer = $('<span class="button-container"></span>');
// add product data to use in our React component
buttonContainer.data('product', product);
item.append(buttonContainer);
list.append(item);
});
// replace the existing list if there is one
var currentList = $(element).find('.product-list');
if (currentList.length) {
currentList.replaceWith(list);
} else {
$(element).append(list);
}
// attach our React components to the containers
list.find('.button-container').each(function() {
var container = $(this);
var product = container.data('product');
/* React needs a plain, non-jQueryfied DOM
* element, so we can use plain "this"
*/
buyButtonReact(product, this);
});
}
I’m sure there are other ways to handle the same issues, and other edge cases where jQuery and React can conflict and cause problems, but the examples above cover the most common use cases I’ve encountered. There’s not a lot of writing I could find about using both simultaneously, and the conventional wisdom seems to basically be “don’t do it,” so let us know if you have experience working with both or think we missed something!
07 Mar 2016 by Ben
We like automation here at Oyster: we do one-command deployments using Ansible and we run a lot of other automated scripts and ETL processing on various schedules.
crontab on Linux and Task Scheduler on Windows are fine tools as far as they go. But they’re a little too simplistic: difficult to schedule, no retrying, not centralized, you can’t customize success or failure emails, almost impossible to get non-engineers to use, etc.
Enter Rundeck. It’s a powerful system, though we’re only using it in “glorified crontab with a web UI” mode, which is a fine way to start. We’ve now switched most of our cron jobs over to it, and we’re working on moving our Windows Task Scheduler jobs as well.

Below are some of the benefits for us.
UI with single sign-on support
Rundeck has a web UI that supports single sign-on, so anyone on our team has automatic access to it. Among other things, this means people on other teams can write ETL scripts and set them up in Rundeck without involving engineering.
The UI is also good for engineering. It allows you to create jobs, run a job, view history, and check job output. Previously to look something up we’d have to hunt through old emails, or SSH or remote into a machine and scan through log directories. Big plus.

The UI is definitely better than Windows Task Scheduler (not hard to beat), but it's not a stellar example of design. It has a very "made by developers" feel, with tons of detail up front, and some common operations like looking at recent errors or log output hidden behind several clicks.
History and output capturing
Rundeck logs all job activity, making it easy to debug when something went wrong and why. You can filter the activity history by task name, user, success/failure status, or time.
Once you’ve found the relevant run, you can drill down into it and look at all stdout and stderr output from a given command. For example:

Success and failure emails
Windows Task Scheduler and crontab have very limited support for this, but with Rundeck it’s easily configurable and simple to set up. You can tell it to send stdout/stderr output from the job as an attachment to the email.
For some of our scripts we just want an email on failure; for others we want an email on success as well, to keep folks in the loop:

You can customize the subject line using job variables, and you can specify a custom Markdown email template if you want (we just use the default – it’s not pretty, but it is functional).
Simple and advanced scheduling
Rundeck has a simple scheduling interface that allows you to run a job at a certain time every day or on selected days of the week. It also has a more advanced crontab scheduling option to allow you to run tasks on schedules like “every 15 minutes” or “1:30am every last Friday of the month”.

SSH support
Rundeck can run jobs locally on the machine Rundeck is installed on, or it can run jobs on remote machines using SSH. For most of our longer-running jobs, we have a single “worker server” that we run jobs on via SSH.
Rundeck has good support for multiple “node sources”. For example, in our installation, our techops team has set up Rundeck to use our master node list from Chef.
Ad-hoc commands
Rundeck also has a screen where you can run any ad-hoc command against one or more nodes. For example, you might want to check disk usage on a bunch of nodes:

Source control integration
One of the problems with UIs (like Task Scheduler) is that change tracking is hard or impossible. We wanted to be able to see who did what, and when. As of version 2.6.0, Rundeck has built-in support for source control integration via git. So we hooked it up to our git repo and now have a change-tracked log of who did what to the job config.
All the project and job configuration is saved in YAML format, so it's fairly easy to read. For example:
- description: ''
  executionEnabled: true
  group: Scheduled
  name: Main nightly scripts
  notification:
    onfailure:
      email:
        attachLog: true
        recipients: errors@oyster.com
  sequence:
    commands:
    - jobref:
        group: Task
        name: Dump live database to snapshot
    - jobref:
        group: Task
        name: Backup snapshot database
  ...
Other features
Rundeck has plenty of other features, many of which we're not using yet.
Shout-out to our techops team who set up the Rundeck install for us and helped with various operational aspects.
We’re hiring!
If you like good engineering and automation and want to come work for us, apply here!