10 May 2012 by Ben
Here at Oyster.com most of our codebase is written in Python, which is of course open source, and we use many open source libraries, including web.py, Babel, and lxml. Now it’s time to give a (tiny bit) back.
Three third-party services we use heavily are Sailthru, QuickBase, and myGengo. When we started with them, they didn’t have Python libraries available, or the Python libraries that were available kinda sucked, so we rolled our own.
Note that the idea here is not a fully-fledged API, but an “at least what we needed” wrapper. It may well be what you need, too, so check out the source code on GitHub. Below are some quick examples.
Sailthru
Sailthru is the email provider we use to send transactional and mass emails.
>>> import sailthru
>>> sailthru.send_mail('Welcome', 'bob@example.com', name='Bobby')
>>> sailthru.send_blast('Weekly Update', 'Newsletter', 'The CEO', 'ceo@example.com',
...                    'Your weekly update', '<p>This is a weekly update!</p>')
QuickBase
We use QuickBase as a nice user interface to enter and manage portions of our hotel data, and use this Python module to sync between our PostgreSQL database and QuickBase.
>>> import quickbase
>>> client = quickbase.Client(username, password, database='abcd1234')
>>> client.do_query("{'6'.EX.'Foo'}", columns='a')
[{'record_id': 1234, ...}]
>>> client.edit_record(1234, {6: 'Bar'}, named=False)
1
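For instance, the PostgreSQL-to-QuickBase sync mentioned above boils down to a loop like the sketch below. It's illustrative only: the get_changed_hotels() helper and the field IDs are made up, not part of the module itself.
import quickbase

client = quickbase.Client(username, password, database='abcd1234')

# Hypothetical helper returning rows changed in PostgreSQL since the last sync.
for hotel in get_changed_hotels():
    # Keys are QuickBase field IDs (hence named=False), values are the new data.
    client.edit_record(hotel['quickbase_record_id'],
                       {6: hotel['name'], 7: hotel['city']},
                       named=False)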
myGengo
Our site is (partially) translated into 5 languages, and myGengo provides an easy-to-use automated translation API. We also run the gettext strings in our web templates through myGengo.
>>> import mygengo
>>> client = mygengo.Client(api_key, private_key)
>>> client.get_account_balance()
'42.50'
>>> client.submit_job('This is a test', 'fr', auto_approve=True)
{'job_id': '1234', ...}
>>> client.get_job(1234)
{'body_tgt': "Il s'agit d'un test", ...}
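The gettext pass mentioned above has roughly this shape. It's a hedged sketch that assumes the polib library for reading .po files; the file path is illustrative.
import polib
import mygengo

client = mygengo.Client(api_key, private_key)
po = polib.pofile('locale/fr/LC_MESSAGES/messages.po')

# Submit each untranslated template string to myGengo as a translation job.
for entry in po.untranslated_entries():
    job = client.submit_job(entry.msgid, 'fr', auto_approve=True)
    print 'submitted job %s for %r' % (job['job_id'], entry.msgid)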
Pull requests welcome
You’re welcome to send pull requests on our GitHub page, or comment below to send other feedback! If you like or use any of these APIs, we’d love to hear from you.
03 Apr 2012 by Chris
In March 2012, Apple announced and released the new iPad with a Retina display. The short gap between the announcement and the day the device arrived in stores left developers very little time to get a working app out the door. We wanted to ride the wave of publicity, so we set out to have our app ready the same day the devices shipped.
There had been speculation for months that the next iPad would have a Retina display, but we didn't want to start working on a Retina version without official specs; at that point Apple hadn't even confirmed there would be a new iPad. Finally, on March 7th, Apple announced the new iPad with a Retina display. They released the official specs, and later that evening they released Xcode with a Retina simulator. The device would ship on March 16th. We had just over a week!
Porting to Retina
The biggest improvement that we cared about was the “Retina display” — Apple’s name for a screen resolution where the pixels are so small the average human eye can’t see them. This means that angled lines look smoother and more natural. The display has twice the resolution in both the vertical and horizontal directions. But the size of the screen remained the same, so each pixel is one quarter the size.
To minimize issues with the transition to Retina, Apple's iOS APIs measure things in what they call "points" rather than raw pixels. On the old iPads one point represents one pixel; on the new iPad one point represents four pixels (two in each dimension).
Close-up on a non-Retina device
The same image on a Retina display
This makes it relatively simple to upgrade an app. Most of the work actually falls to the artists to generate larger sized images for UI elements. You bundle the new Retina assets in the app, and you don’t even have to modify the code — if the old asset is called “foo.png”, then name the new one “foo@2x.png”, and everything just magically works. But our app is photo heavy and relies on pulling down a lot of images off the web. This took some work to fix.
Huge photo sizes
The bulk of the work involved making sure we request the right size photo. We have 31 different image sizes for each of our 750,000 hotel photos, and the Retina sizes are not all simply double the previous ones. We chose to solve this in the client: if the app detects it's running on a Retina device, it requests the Retina-ready photo, as sketched below.
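The selection logic is roughly the following, sketched in Python for brevity (the real check lives in the Objective-C client, and the size names and URL scheme here are invented):
# Map a logical image size to the variant to request. Illustrative only.
RETINA_SIZES = {'thumb': 'thumb_2x', 'gallery': 'gallery_2x', 'full': 'full_2048'}

def photo_url(photo_id, size, scale):
    # scale is 2.0 on Retina devices ([UIScreen mainScreen].scale on iOS).
    if scale >= 2.0 and size in RETINA_SIZES:
        size = RETINA_SIZES[size]
    return 'http://images.example.com/%s/%s.jpg' % (photo_id, size)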
The one challenge I ran into was some button icons being served by our web server. The original buttons were 16×16. We already had some 35×35 versions, so the artist decided to use those instead of creating new 32×32 ones. Since they were going on a button, I figured it would be OK. Until I saw them … and they were just over twice as big as they should have been. The problem was iOS doubling the image size automatically. I didn't want that — I wanted the same magic UIImage does with "@2x" in the filename. The magic is in the "scale" attribute of the UIImage object: set scale = 2.0 and the image is no longer doubled. I added this code to our image downloader to mimic the "@2x" handling iOS supports.
// Mimic UIKit's built-in "@2x" handling for images we download ourselves.
UIImage *image = [[UIImage alloc] initWithData:imageData];
if ([url hasSuffix:@"@2x"])
{
    // Re-wrap the bitmap with scale 2.0 so UIKit treats it as a Retina
    // asset and doesn't double its size in points.
    image = [UIImage imageWithCGImage:[image CGImage] scale:2.0 orientation:UIImageOrientationUp];
}
But let’s not quadruple the file size
We did have to create one new size for our photos — for the full-screen 2048×1536 images. This was no big deal, although it did take a few days to churn through all 750,000 hotel photos. But 2048×1536 is one big image, and contains a lot of data. With over 3 million pixels, the image files were four times as big as before. We didn't want our app to become slower due to image load time, so we reduced the image quality on the larger sizes. We figured the Retina display would hide some of the JPEG artifacts, and settled on a 65% quality level. The resulting files were only about 1.5 times as big as before (instead of 4 times).
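The re-encoding step itself is simple. Here's a minimal sketch using PIL; the function name and paths are illustrative, but the 2048×1536 target and 65% quality match what's described above:
from PIL import Image

def make_retina_jpeg(src_path, dst_path):
    # Resize to the full-screen Retina resolution and re-encode at
    # quality 65, trading a little fidelity for a much smaller file.
    im = Image.open(src_path)
    im = im.resize((2048, 1536), Image.ANTIALIAS)
    im.save(dst_path, 'JPEG', quality=65)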
We worked fast and furious the last few days to hit our goal of uploading to the App Store on Friday. There were a few minor layout issues that had to be tweaked, and while the simulator was nice (although I could only see about half of it on my 21-inch 1600×1200 monitor!), we had to test the program on a real device. But like everyone else, we had to wait until they were released on Friday. After receiving the new iPad and verifying everything worked, we packed up the app and shipped it off.
Reviewed by Apple in six hours
Our app was verified by Apple in just six short hours! Our Retina version was available to the general public late Friday evening, which means folks could see all of our thousands of awesome photos in glorious Retina display right from the very beginning. Also, being one of the first "retinized" apps, we made it onto several of Apple's top lists, which boosted our download rate immensely. It was worth the effort to take advantage of the small window Apple gave us, and it makes our app that much better.
21 Dec 2011 by Anton
The single most valuable asset at Oyster.com is our photo collection. Take away the intellectual property and what's left is, essentially, markup (with a bit of backend to snazz it up). So we need a solid backup solution for the original high-res photos. The old servers were about to run out of capacity, and their slightly outdated specs did not make transferring huge datasets any easier or faster. Thinking a few months ahead, we were looking at a 40TB data set. In strict accordance with the KISS methodology, we opted against LTO and S3, and decided to build a big BOX. (For starters, 40TB on S3 costs around $60,000 annually. The components to build the Box: about 1/10th of that.)
Areca 1882ix-24 RAID Controller
Coincidentally, a great new product was just about to hit the market, reinforcing our decision: the dual-core Areca ARC-1882ix RAID Host Bus Adapter, which comes with an on-board DDR3 SDRAM socket supporting up to a 4GB chip. Since we had already opted for RAID level 6 (striped, with distributed parity error-checking; tolerates two disk failures), and the dual-core RAID-on-Chip processes two streams of parity calculations simultaneously, it seemed ideal.
The first challenge in putting together the big box was getting the internal SAS connectors properly seated into the backplane adapter sockets, the bottom few being especially cumbersome to reach. Thankfully, our hardware technicians' exceptional manual dexterity meant we didn't have to disassemble the internal fan panel frame.
Internal mini-SAS connectors (SFF-8087)
The housing assembly comes with six individual backplanes, each accommodating four SAS or SATA disks. Each backplane is secured to the drive bay assembly with three thumbscrews, whose shape and material are designed to fall within the required torque range when screwed on "as tight as possible." As we found out the hard way, it is absolutely critical to ensure that each backplane card is seated fully snug in its slot and secured dead tight with the thumbscrews. A shoddy connection is not always immediately obvious, it turns out. We observed intermittent timeouts on a particular drive bay, as well as degraded overall system performance, caused by one of the backplane cards not having been secured quite tightly enough — however, the array was still functional, making troubleshooting an opaque nightmare.
One of the six backplane cards
One of the most important differences between this system and your run-of-the-mill high-performance enterprise server with a couple of hard disks is everything we added: six backplane cards, one 24-channel RAID controller, 24 hard disks, and the internal connectors between them — each a new potential point of failure, at least 37 additional ones in all. Every single component's installation must strictly conform to spec, because the delicately balanced system amplifies any fractional deviation into problems that persist for hours, days, and weeks, and into many more lost megabytes per second.
If the configuration of all components is optimized, the small individual gains add up to a significant performance boost. At the risk of stating the obvious, things are much less likely to go wrong in a stable, fine-tuned system which performs at max capacity.
Big Box with 24 hot-swappable drive bays
One of those things is aligning the physical array dimensions with the file system's allocation units. In our case, having chosen a 64K stripe size (since most of our image files are relatively large) on 512-byte sectors, we also format with a 64K cluster ("allocation unit") size. This eliminates the overhead of the RAID controller's on-board logic having to keep the logical disk synced with the physical one.
A few important things must be mentioned regarding the driver. Windows inherited its native SAS/SATA RAID controller driver framework from SCSI technology (SCSIport), and it carries several serious drawbacks. I stumbled upon an interesting investigative white paper which goes into great detail about these issues. The preferred driver model for modern SATA/SAS cards is STORport, which Microsoft introduced in response to the inadequate state of the SCSIport framework and its inherited SCSI-protocol limitations. The vendor's STORport driver is not certified by Microsoft, however, so by default the OS installs the inferior SCSIport driver. Switching to the STORport driver visibly improved stability and performance during the project's earliest stages, instantly bumping write speed by several dozen MB/sec.
Gig-E is the bottleneck
Having spent some extra time on research, fine-tuning, and optimizing the new server, we were glad to find that the gigabit network had become the bottleneck, rather than the all-too-commonly disappointing I/O. By aggregating both on-board gigabit network interfaces we can expect transfer rates of around 200MB/s (the disks are 3Gbps), which is plenty for maintaining a light, low-maintenance incremental backup. Another big advantage of this system in terms of capacity scaling is the external SAS connector, which can accommodate another external box of disks to expand our array into. While not the fastest solution possible, it strikes a great balance between performance, value, and reliability (redundancy), which is exactly what we were looking for.
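For the curious, the incremental pass itself doesn't need to be anything fancy. Here is a minimal sketch of the idea in Python; the real job also needs deletion handling, retries, and logging, and this function is illustrative rather than our actual script:
import os
import shutil

def incremental_copy(src_root, dst_root):
    # Copy files that are new, or whose size or mtime changed, since last run.
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.join(dst_root, rel)
        if not os.path.isdir(dst_dir):
            os.makedirs(dst_dir)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            src_stat = os.stat(src)
            if (not os.path.exists(dst)
                    or os.stat(dst).st_size != src_stat.st_size
                    or os.stat(dst).st_mtime < src_stat.st_mtime):
                shutil.copy2(src, dst)  # copy2 preserves timestamps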
23 Nov 2011 by Chris
Back in the day, Oyster.com was a C++ shop. One day we decided to convert to Python. We didn’t convert everything to Python, which left us with the task of bridging the gap between them. Of course there were issues when setting up this communication between the C++ and Python libraries.
We made the decision to convert to Python for several reasons. One main reason was to take advantage of some good but free Python libraries. We converted almost all of our code — it’s better and easier to maintain code in one language than in two. But one of our backend engines was complicated and it worked, so we decided to leave it in C++.
In simple cases, bridging the gap from Python to C++ is relatively easy. Python's C API provides routines to convert data between Python's managed objects and C types: PyArg_ParseTuple converts incoming Python objects into C data types, and we then iterated over incoming lists with PyList_GetItem to convert them into arrays. We wrote the conversion function, everything worked, and we pushed the results into production.
But we experienced periodic crashes which we could not track down. While investigating the crashes we discovered that our multithreaded system was essentially only handling one incoming request at a time! Our C++ code had lots of dependencies and sometimes could take a while to return a result. It turned out that the entire Python server would block waiting for the C++ code to return.
The problem was Python's Global Interpreter Lock (GIL). The GIL prevents multiple native threads from executing Python bytecode at once, apparently because Python's memory management isn't thread-safe.
While there are a few things that can be done to allow Python to play nicely with C++ in a multithreaded environment, we were in a time crunch. The real issue wasn't the multithreading itself, but the amount of time spent inside the C++ code: if the call into C++ returns quickly, Python doesn't block for long.
We solved our problem by redesigning the flow — instead of calling our C++ code directly inside Python, we switched to running our C++ code in a separate process and talking to it via a simple HTTP API. Our multithreading and GIL issues disappeared and became multi-processing issues (where there are much clearer, safer boundaries).
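A rough sketch of the new arrangement from the Python side (this was Python 2 at the time, and the endpoint and payload shape are invented for illustration):
import json
import urllib2

def query_engine(payload):
    # The C++ engine now runs as its own process behind a small HTTP API.
    # If it takes a while, only this request waits, not the whole server.
    request = urllib2.Request('http://localhost:8500/query',
                              data=json.dumps(payload),
                              headers={'Content-Type': 'application/json'})
    response = urllib2.urlopen(request, timeout=30)
    return json.loads(response.read())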
The lesson we learned from this is that multithreading is difficult, even when it looks simple. Python and C++ can play nicely with each other, as long as the call into C++ is quick: you can't let it block for too long, since Python needs to release the GIL occasionally to let other threads run. While it may be possible to solve the problem at that level, our new multi-process HTTP solution worked great (in fact, it probably works better this way), and we didn't have the time to delve into the innards of Python's GIL.
As always, tread carefully when doing any sort of multithreading.
31 Oct 2011 by Ben
Here at Oyster.com part of our web stack consists of web.py and CherryPy, and on the whole we’ve found they’re fast and stable. However, a little while ago, the CherryPy server started intermittently raising an exception and bombing out — a WindowsError due to an invalid handle on an obscure SetHandleInformation() call.
Auto-restart is not a solution
At first this was only happening once in a very long while, but after certain changes it would start happening a few times a day. We’ve got a script in place that restarts our servers when they die, but because of the aggressive caching we do, our web servers load tons of stuff from the database on startup, and hence take a while to load. So just letting our auto-restart scripts kick in wasn’t a solution.
On further digging, we found there was already a relevant CherryPy bug open, with someone else getting the same intermittent exception. They were working around it by changing an unrelated line of code, so something smelled fishy.
HANDLE != uint32
I noticed SetHandleInformation() was being called with ctypes, and had just recently been using ctypes for a named mutex class I’d written (to make Python’s logging module safe for writes from multiple processes).
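Something in the spirit of that mutex class is sketched below. The class name and structure are illustrative rather than the actual code, but note the explicit argtypes and restype declarations; their importance will become clear shortly.
import ctypes
from ctypes import wintypes

kernel32 = ctypes.windll.kernel32

# Spell out each signature so ctypes passes HANDLEs correctly on 64-bit Windows.
kernel32.CreateMutexW.argtypes = [ctypes.c_void_p, wintypes.BOOL, wintypes.LPCWSTR]
kernel32.CreateMutexW.restype = wintypes.HANDLE
kernel32.WaitForSingleObject.argtypes = [wintypes.HANDLE, wintypes.DWORD]
kernel32.WaitForSingleObject.restype = wintypes.DWORD
kernel32.ReleaseMutex.argtypes = [wintypes.HANDLE]
kernel32.ReleaseMutex.restype = wintypes.BOOL

INFINITE = 0xFFFFFFFF

class NamedMutex(object):
    """A Windows named mutex, shareable across processes.

    Usage: with NamedMutex(u'my-log-lock'): write_log_record(...)
    """
    def __init__(self, name):
        self.handle = kernel32.CreateMutexW(None, False, name)

    def __enter__(self):
        kernel32.WaitForSingleObject(self.handle, INFINITE)
        return self

    def __exit__(self, *exc_info):
        kernel32.ReleaseMutex(self.handle)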
ctypes is great for calling C DLLs when you just want a thin Python-to-C wrapper, and its defaults are good — for instance, Python integers get converted to 32-bit ints in C, which is normally what you want. SetHandleInformation()'s first parameter is a handle, which I (and apparently CherryPy) assumed was just an integer, so it was getting passed to C as a 32-bit value. However, it's actually declared as a HANDLE, which is typedef'd as a void pointer, so on our 64-bit Windows machines it was actually a 64-bit value.
SetHandleInformation() was looking for the high 32 bits of the handle on the stack or in a register someone else owned, and of course sometimes those 32 undefined bits weren’t zero. Crash bang.
On being explicit
Once we realized what was happening, the fix was easy enough — ctypes lets you override the default conversions by specifying argument and return types explicitly. So we changed a straight ctypes call:
windll.kernel32.SetHandleInformation(sock.fileno(), 1, 0)
to a ctypes call with an explicit type spec, like this:
from ctypes import windll, wintypes

SetHandleInformation = windll.kernel32.SetHandleInformation
SetHandleInformation.argtypes = [wintypes.HANDLE, wintypes.DWORD, wintypes.DWORD]
SetHandleInformation.restype = wintypes.BOOL
SetHandleInformation(sock.fileno(), 1, 0)
Lo and behold, we were now telling ctypes to respect the function’s signature, and everything worked fine. We told the CherryPy folks and they were quick to implement this fix and resolve the bug.
So don’t be scared of ctypes, but just remember, it doesn’t memorize Windows.h, so avoid pain and suffering by telling it your types. Explicit isn’t for raunchy movies — it’s point #2 in the Zen of Python.