A Faster UIWebView Communication Mechanism #

tl;dr: Use location.hash (instead of location.href or the src attribute of iframes) to do fast synthetic navigations that trigger a UIWebViewDelegate's webView:shouldStartLoadWithRequest:navigationType: method.

As previously mentioned, Quip's editor on iOS is implemented using a UIWebView that wraps a contentEditable area. The editor needs to communicate with the containing native layer for both big (document data in and out) and small (update toolbar state, accept or dismiss auto-corrections, etc.) things. While UIWebView provides an officially sanctioned mechanism for getting data into it (stringByEvaluatingJavaScriptFromString¹), there is no counterpart for getting data out. The most commonly used workaround is to have the JavaScript code trigger a navigation to a synthetic URL that encodes the data, intercept it via the webView:shouldStartLoadWithRequest:navigationType: delegate method and then extract the data out of the request's URL².
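
In code, the workaround looks roughly like this (a sketch; the x-bridge:// scheme and the handleMessage:payload: helper are hypothetical stand-ins, not Quip's actual names):

// JavaScript side, running inside the UIWebView:
//   window.location.href = 'x-bridge://toolbarChanged?bold=true';

// Objective-C side: intercept the synthetic navigation and extract the data.
- (BOOL)webView:(UIWebView *)webView
    shouldStartLoadWithRequest:(NSURLRequest *)request
                navigationType:(UIWebViewNavigationType)navigationType {
    NSURL *URL = request.URL;
    if ([URL.scheme isEqualToString:@"x-bridge"]) {
        [self handleMessage:URL.host payload:URL.query];  // Hypothetical handler.
        return NO;  // Cancel the navigation; it only existed to signal us.
    }
    return YES;
}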

The workaround did allow us to communicate back to the native Objective-C code, but it seemed to have higher latency than I would expect, especially on lower-end devices like the iPhone 4 (where it was several milliseconds). I decided to poke around and see what happened between the synthetic URL navigation being triggered and the delegate method being invoked. Getting a stack from the native side didn't prove helpful, since the delegate method was invoked via NSInvocation with not much else on the stack beyond the event loop scaffolding. However, that did provide a hint: the delegate method was being invoked only after some spins of the event loop, which perhaps explained the delays.

On the JavaScript side, we were triggering the navigation by setting the location.href property. By starting at the WebKit implementation of that setter, we end up in DOMWindow::setLocation, which in turn uses NavigationScheduler::scheduleLocationChange³. As the name “scheduler” suggests, this class requests navigations to happen sometime in the future. In the case of explicit location changes, a delay of 0 is used. However, 0 doesn't mean “immediately”: a timer is still installed, and WebKit waits for it to fire. That involves at least one spin of the event loop, which may be a few milliseconds on a low-end device.

I decided to look through the WebKit source to see if there were other JavaScript-accessible ways to trigger navigations that didn't go through NavigationScheduler. Some searching turned up the HTMLAnchorElement::handleClick method, which invoked FrameLoader::urlSelected directly (FrameLoader being the main entrypoint into WebKit's URL loading). In turn, the anchor handleClick method can be directly invoked from the JavaScript side by dispatching a click event (most easily done via the click() method). Thus it seemed like an alternate approach would be to create a dummy link node, set its href attribute to the synthetic URL, and simulate a click on it. More work than just setting the location.href property, but perhaps it would be faster since it would avoid spinning the event loop.
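
Here's a sketch of that approach as kicked off from the native side of a test bed (using the same hypothetical x-bridge:// scheme as above; the resulting navigation is intercepted by the same delegate method):

// Create a dummy anchor node pointing at the synthetic URL and simulate a
// click on it. The node is deliberately never added to the document (see
// the caveats later in this post).
NSString *js = @"var a = document.createElement('a');"
               @"a.href = 'x-bridge://toolbarChanged?bold=true';"
               @"a.click();";
[webView stringByEvaluatingJavaScriptFromString:js];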

Once I got that all hooked up, I could indeed see that everything was now running slightly faster, and synchronously too — here's a stack trace showing native-to-JS-to-native communication:

 #0: TestBed`-[BenchmarkViewController endIteration:]
 #1: TestBed`-[BenchmarkViewController webView:shouldStartLoadWithRequest:navigationType:]
 #2: UIKit`-[UIWebView webView:decidePolicyForNavigationAction:request:frame:decisionListener:]
...
#17: WebCore`WebCore::FrameLoader::urlSelected(...)
...
#23: WebCore`WebCore::jsHTMLElementPrototypeFunctionClick(...)
#24: 0x15e8990f
#25: JavaScriptCore`JSC::Interpreter::execute(...)
...
#35: UIKit`-[UIWebView stringByEvaluatingJavaScriptFromString:]
#36: TestBed`-[BenchmarkViewController startIteration]
...

More recently, I took a more systematic approach in evaluating this and other communication mechanisms. I created a simple test bed and gathered timings (measured in milliseconds) from a few devices (all running iOS 7):

Method / Device    iPhone 4 (A4)   iPad Mini (A5)   iPhone 5 (A6)   iPhone 5s (A7)   Simulator (2.7 GHz Core i7)
location.href      3.88            2.01             1.31            0.84             0.22
<a> click          1.50            0.87             0.58            0.40             0.13
location.hash      1.42            0.86             0.55            0.39             0.13
frame.src          3.52            1.86             1.16            0.87             0.29
XHR sync           8.66            3.25             2.19            1.34             0.45
XHR async          6.38            2.32             1.62            1.00             0.33
document.cookie    2.89            1.22             0.78            0.55             0.16
JavaScriptCore     0.33            0.18             0.14            0.09             0.03

The mechanisms are as follows:

  • location.href: Setting the location.href property to a synthetic URL.
  • <a> click: Simulating clicking on an anchor node that has the synthetic URL set as its href attribute.
  • location.hash: Setting the location.hash property to the data encoded as a fragment (see the sketch after this list). The reason why it's faster than replacing the whole URL is that same-page navigations are executed immediately instead of being scheduled (thanks to Will Kiefer for telling me about this). In practice this turns out to be even faster than <a> click, since the latter fires a click event, which results in a hit target being computed, which forces layout.
  • frame.src: Setting the src property of a newly-created iframe. Based on examining the chrome.js file inside the Chrome for iOS .ipa, this is the approach that it uses to communicate: it creates an iframe with a chromeInvoke://... src and appends it to the body (and immediately removes it). This approach also triggers the navigation synchronously, but since it modifies the DOM, layout is invalidated, so repeated invocations end up being slower.
  • XHR sync/async: XMLHttpRequests that load a synthetic URL, either synchronously or asynchronously; on the native side, the load is intercepted via an NSURLProtocol subclass. This is the approach that Apache Cordova/PhoneGap prefers: it sends an XMLHttpRequest that is intercepted via CDVURLProtocol. This also ends up being slower because the NSURLProtocol methods are invoked on a separate thread and it has to jump back to the main thread to invoke the endpoint methods.
  • document.cookie: Having the JavaScript side set a cookie and then being notified of that change via NSHTTPCookieManagerCookiesChangedNotification. I'm not aware of anyone using this approach, but the idea came to me when I thought to look for other properties (besides the URL) that change in a web view and that could be observed on the native side. Unfortunately the notification is triggered asynchronously, which explains why it's still not as fast as the simulated click.
  • JavaScriptCore: Direct communication via a JSContext using Nick Hodapp's mechanism. Note that this approach involves adding a category on NSObject to implement a WebFrameLoadDelegate protocol method that is not present on iOS. Though the approach degrades gracefully (if Apple ever provides an implementation for that method, their implementation will be used), it still relies on enough internals and "private" APIs that it doesn't seem like a good idea to ship an app that uses it. This result is only presented to show the performance possibilities if a more direct mechanism were officially exposed.
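
As promised above, here's a sketch of the location.hash variant (handleMessageJson: is again a hypothetical handler):

// JavaScript side: a same-document navigation, dispatched immediately.
//   window.location.hash = '#' + encodeURIComponent(messageJson);

// Objective-C side, inside webView:shouldStartLoadWithRequest:navigationType::
NSString *fragment = request.URL.fragment;  // The fragment, minus the leading "#".
if (fragment) {
    NSString *messageJson = [fragment
        stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
    [self handleMessageJson:messageJson];  // Hypothetical handler.
    return NO;  // The navigation was only a signal.
}
return YES;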

My tests show the location.hash and synthetic click approaches consistently beating location.href (and all other mechanisms), and on low-end devices they're more than twice as fast. One might think that a few milliseconds would not matter, but when responding to user input and doing several such calls, the savings can add up⁴.

Quip has been using the location.hash mechanism for more than a year, and so far with no ill effects. There are a few things to keep in mind though:

  • Repeatedly setting location.href in the same spin of the event loop results in only the final navigation happening. Since the location.hash changes (and synthetic clicks) are processed immediately, they will all result in the delegate being invoked. This is generally desirable, but you may have to check for reentrancy since stringByEvaluatingJavaScriptFromString is synchronous too.
  • For synthetic clicks, changing the href attribute of the anchor node would normally invalidate the layout of the page. However, you can avoid this by not adding the node to the document when creating it.
  • Additionally, not having the anchor node in the document avoids triggering any global click event handlers that you have registered.

I was very excited when iOS 7 shipped public access to the JavaScriptCore framework. My hope was that this would finally allow something like Android's @JavaScriptInterface mechanism, which allows easy exposure of arbitrary native methods to JavaScript. However, JavaScriptCore is only usable in standalone JavaScript VMs; it cannot (officially⁵) be pointed at a UIWebView's. Thus it looks like we'll be stuck with hacks such as this one for another year.

Update on 1/12/2014: Thanks to a pull request from Felix Raab, the post was updated showing the performance of direct communication via JavaScriptCore.

  1. I used to think that this method name was preposterously long. Now that I've been exposed to Objective-C (and Apple's style) more, I find it perfectly reasonable (and the names that I choose for my own code have gotten longer too). Relatedly, I do like Apple's consistency for will* vs. did* in delegate method names, and I've started to adopt that for JavaScript too.
  2. There are also more exotic approaches possible, for example LinkedIn experimented with WebSockets and is (was?) using a local HTTP server.
  3. Somewhat coincidentally, NavigationScheduler is where I made one of my first WebKit contributions. Though back then it was known as RedirectScheduler.
  4. As Brad Fitzpatrick pointed out on Google+, a millisecond is still a long time for what is effectively two function calls. The most overhead appears to come from evaluating the JS snippet passed to stringByEvaluatingJavaScriptFromString, followed by constructing the HTTP request that is passed to webView:shouldStartLoadWithRequest:navigationType:.
  5. In addition to implementing a WebFrameLoadDelegate method, another way of getting at a JSContext is via KVO to look up a private property. The latter is in some ways more direct and straightforward, but it seems even more likely to run afoul of App Store review guidelines.

Programmatically accepting keyboard auto-corrections on iOS #

tl;dr: To programmatically accept keyboard auto-corrections on iOS, call reloadInputViews on the first (current) UIResponder.

Quip's document editor is implemented via contentEditable. This is true not just for the desktop web version, but also for the iOS and Android apps. So far, this has been a good way of getting basic editing behavior from the browser for “free” on all platforms while also having enough control to customize it for Quip's specific needs.

One area where mobile editing behavior differs from the desktop is in the interaction with the auto-correction mechanisms that on-screen keyboards have. Normally auto-corrections are transparent to web content, but Quip needs to override the behavior of some key events, most notably for the return key. Since the return key also accepts the auto-correction, we needed a way to accept the auto-correction without actually letting the key event be processed by the contentEditable layer or the browser in general¹.

Kevin did some research into programmatically accepting auto-corrections, and it turned out that this could be done by temporarily swapping the firstResponder. He implemented this (though most of the editor is in JavaScript, we do judiciously punch holes² through to the native side where needed) and all was well.
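
A minimal sketch of that swap (the details of the actual implementation may have differed):

// Briefly making a dummy text view the first responder causes UIKeyboardImpl
// to switch delegates, which accepts the pending auto-correction (see the
// iOS 6 stack trace below). Focus then has to be handed back to the web view.
UITextView *dummyTextView = [[UITextView alloc] initWithFrame:CGRectZero];
[self.view addSubview:dummyTextView];
[dummyTextView becomeFirstResponder];
[dummyTextView removeFromSuperview];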

However, a few months later, when we started to test Quip with the iOS 7 betas, we noticed that accepting auto-corrections no longer worked. Kevin went once more unto the breach. He observed that the next/previous form element buttons that iOS places above web keyboards (which we normally hide) also had the side effect of accepting auto-corrections. He thus implemented an alternate mechanism on iOS 7 that simulated advancing to a dummy form element and then going back.

Once the initial iOS 7 release was out the door and we had some time to regroup (and I had a train ride that I could dedicate to this), I thought I would look more into this problem, to see if I could understand what was happening better. The goal was to stop having two divergent code paths, and ideally find a mechanism with fewer side effects (switching the firstResponder resulted in the keyboard being detached from the UIWebView, which would sometimes affect its scroll offset).

The first step was to better understand how the iOS 6 and 7 mechanisms worked. Stepping through them with a debugger seemed tedious, but I guessed that a notification would be sent as part of the accept happening. I therefore added a listener that logged all notifications:

[NSNotificationCenter.defaultCenter addObserverForName:nil
                                                object:nil
                                                 queue:nil
                                            usingBlock:^(NSNotification *notification) {
    NSLog(@"notification: %@, info: %@", notification.name, notification.userInfo);
}];

This logged a lot of other unrelated notifications, but there was something that looked promising:

notification: UIViewAnimationDidCommitNotification, info: {
    delegate = "<UIKeyboardImpl: 0xea33dd0; frame = (0 0; 320 216); opaque = NO; layer = <CALayer: 0xea2bd80>>";
    name = UIKeyboardAutocorrection;
}

This looks like a (private) notification that is sent when the animation that shows the auto-correction is being committed. Since committing of animations happens synchronously, whatever triggered the accept must still be on the stack. I therefore changed the listener to be more specific:

[NSNotificationCenter.defaultCenter addObserverForName:@"UIViewAnimationDidCommitNotification"
                                                object:nil
                                                 queue:nil
                                            usingBlock:^(NSNotification *notification) {
    if (notification.userInfo && [@"UIKeyboardAutocorrection" isEqualToString:notification.userInfo[@"name"]]) {
        NSLog(@"commited auto-correction animation");
    }
}];

The log statement isn't that interesting in and of itself, but it gave me a place to attach a breakpoint that logs the call stack³. Now I could see how accepting auto-corrections on iOS 6 worked (where we made a dummy UITextView become the first responder). That had a stack of the form:

....
#11: UIKit`-[UIKeyboardImpl acceptAutocorrection] + 141
#12: UIKit`-[UIKeyboardImpl setDelegate:force:] + 377
#13: UIKit`-[UIKeyboardImpl setDelegate:] + 48
#14: UIKit`-[UIPeripheralHost(UIKitInternal) _reloadInputViewsForResponder:] + 609
#15: UIKit`-[UIResponder(UIResponderInputViewAdditions) reloadInputViews] + 175
#16: UIKit`-[UIResponder(Internal) _windowBecameKey] + 110
#17: UIKit`-[UIWindow _makeKeyWindowIgnoringOldKeyWindow:] + 343
#18: UIKit`-[UIWindow makeKeyWindow] + 41
#19: UIKit`+[UIWindow _pushKeyWindow:] + 83
#20: UIKit`-[UIResponder becomeFirstResponder] + 683
#21: UIKit`-[UITextView becomeFirstResponder] + 385
...

Whereas on iOS 7, where we accepted it by hijacking the next/previous form control accessory buttons, the path was:

...
#12: UIKit`-[UIKeyboardImpl acceptAutocorrection] + 197
#13: UIKit`-[UIKeyboardImpl setDelegate:force:] + 534
#14: UIKit`-[UIKeyboardImpl setDelegate:] + 48
#15: UIKit`-[UIPeripheralHost(UIKitInternal) _reloadInputViewsForResponder:] + 374
#16: UIKit`-[UIResponder(UIResponderInputViewAdditions) reloadInputViews] + 287
#17: UIKit`-[UIWebBrowserView assistFormNode:] + 265
#18: UIKit`-[UIWebBrowserView accessoryTab:] + 110
#19: UIKit`-[UIWebFormAccessory _nextTapped:] + 50
...

UIKeyboardImpl's acceptAutocorrection was the holy grail, but as a private API it may not be used — what I was looking for in these stack traces was a publicly callable method. A close reading (note the _windowBecameKey and reloadInputViews frames in the traces above) showed that there were (at least) two different triggers for accepting the auto-correction:

  1. The "key" UIWindow changing (the key window is the one that's receiving keyboard events)
  2. The UIResponder.reloadInputViews method (input (accessory) views are additions to the keyboard)

It therefore seemed worthwhile to try to trigger either one more directly. Looking at the UIApplication.sharedApplication.windows list, I saw that there were two windows (in addition to the main window, there was another of type UITextEffectsWindow, another private class). I could therefore simulate the key window changing by switching between them:

UIWindow *keyWindow = UIApplication.sharedApplication.keyWindow;
UIWindow *otherWindow = nil;
for (UIWindow *window in UIApplication.sharedApplication.windows) {
    if (window != keyWindow) {
        otherWindow = window;
        break;
    }
}
if (otherWindow) {
    [keyWindow resignKeyWindow];
    [otherWindow makeKeyWindow];
    [keyWindow makeKeyWindow];
}

That worked! But there was also the other approach to investigate. To call reloadInputViews, I needed to find the current first responder (it's not necessarily the UIWebView itself). I accomplished that by walking through the view hierarchy to find it. Sure enough, the first responder was a (private) UIWebBrowserView class and calling reloadInputViews on it accepted the correction.
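
Putting that together (FindFirstResponder is a hypothetical helper, not a UIKit API):

// Walk the view hierarchy looking for the current first responder.
static UIView *FindFirstResponder(UIView *view) {
    if (view.isFirstResponder) {
        return view;
    }
    for (UIView *subview in view.subviews) {
        UIView *responder = FindFirstResponder(subview);
        if (responder) {
            return responder;
        }
    }
    return nil;
}

// Accepting the auto-correction is then a single call; in practice the first
// responder is a UIWebBrowserView (a private class) inside the UIWebView.
UIView *firstResponder = FindFirstResponder(webView);
[firstResponder reloadInputViews];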

Of the two approaches, the reloadInputViews approach seemed preferable, since it relied the least on undocumented behavior. The other approach assumed that there is always another UIWindow present, which doesn't necessarily seem safe, especially since the documentation says "an app has only one window" (it also cautions against invoking resignKeyWindow directly). Now that I knew what to search for, I could also see that reloadInputViews seems to have worked since at least the iOS 5 days. Finally, it also had the advantage of not causing any spurious scrolling due to the UIWebView (temporarily) losing the keyboard.

As you can see, I'm still learning my way around iOS programming. I'm finding that what takes the longest to learn is not the framework itself⁴. Rather, it's all of the various tricks and tools that are needed to debug perplexing or confusing behavior. I don't see any shortcuts for gaining that knowledge. Even if someone were to hand me a giant list of all UIKit gotchas, I wouldn't understand most of them, or I wouldn't be able to recall and apply them at the right time. I didn't learn web front end development by just reading through a big list of browser quirks, and that's not how programming works in general.

  1. If the behavior that contentEditable provides is not exactly what we want, we generally prefer to bypass it altogether and implement it ourselves, instead of letting contentEditable do its thing and then patching things up. The patching is brittle, especially across browsers.
  2. See this post on our JavaScript-to-native communication mechanism.
  3. Spark Inspector has fancy-looking notification tracing support, but I haven't tried it yet.
  4. When I do need to browse around UIKit (or any other documentation set), Dash has proven indispensable.

Quip: Back to app development #

Though I've mentioned it off-hand a couple of times, I haven't explicitly blogged that I left Google towards the end of 2012¹. I joined Quip, a small start-up in San Francisco that's trying to re-think word processing.

My mid-2010 switch to the Chrome team was prompted by a desire to understand “the stack” better, and also a slight fear of being typecast as a frontend guy and thus narrowing my options of projects. However, after several months of working on the rendering engine, I realized that I missed the “we've shipped” feeling created by being on a smaller, focused project with distinct milestones (as opposed to continuously improving the web platform, though I certainly appreciate that as a platform client). I thus switched to the Chrome Apps team, co-leading the effort to build the “packaged apps” platform. We shipped the developer preview at Google I/O 2012, and I'm happy to see that these apps are now available to everyone.

As we were building the platform, I got more and more jealous of the people building apps on top of it. Making samples scratched that itch a bit, but there's still a big gap between a toy app that's a proof of concept and something that helps people get their job done. I also found it hard to cope with the longer timelines that are involved in platform building — it takes a while to get developers to try APIs, making changes requires preserving backwards compatibility or having a migration strategy, etc. In the end, I realized that although I enjoy understanding how the platform works, I don't feel the need to build it myself.

When Bret approached me about Quip in the fall of 2012, a lot of the above clicked together for the first time. It was also very appealing to get to work (with a small yet very capable team) on something that I would be using every day².

My fears of becoming overly specialized were assuaged by being at a startup, where we're always resource constrained and every engineer has to wear multiple hats. Depending on the day, I may be working in C++, Objective-C, Java, JavaScript, or Python. The changes can range from fixing a bug in protocol buffer code generation to figuring out how to mimic iOS 7's translucent keyboard background color³.

My time working on Chrome also turned out to be well spent. Even on my first day I had to dig through the source to understand some puzzling behavior⁴. It's also been interesting to see how quickly my perspective on certain platform (anti-)patterns has changed. When I was working on Chrome, slow-running unload event handlers and (worse yet) synchronous XMLHttpRequests were obviously abominations that prevented a fluid user experience. Now that I work on a product that involves user data, those are the exact mechanisms that I need to use to make sure it's not lost⁵.

I also view browser bugs differently now. Before I took them as an immutable fact of life, just something to be worked around. Now I still do the workaround, but I also report them. This is the biggest change (and improvement) between the last time I did serious web development (early 2010) and now — more and more browsers are evergreen, so I can rely on bug fixes actually reaching users. This doesn't mean that all bugs get fixed immediately; some are dealt with more promptly than others (and don't get me started on contentEditable bugs).

The same benefit applies not just to bugs but also to features. I've been wishing for better exception reporting via the global onerror handler, and now that it's finally happening, we get to use it pretty quickly. It's not just Chrome or WebKit — when we started to look into Firefox support it didn't render Mac OS X Lion-style scrollbars, but shortly after our launch that shipped in Firefox 23, and we didn't have to implement any crazy hacks⁶.

I hope to write more about the technical puzzlers that I've run into, since that's another way in which this experience has been very different from Google. At a big company there are mailing lists with hundreds of engineers that can answer your questions. Now that blogs (and Stack Overflow) fulfill the same need, I'd like to return the favor.

  1. This post is a bit narcissistic. The primary audience is myself, 20 years from now.
  2. I'm guessing that Quip has more design docs written than the average year-old startup, since creating them is a chance to do more dogfooding.
  3. And on some days, I fix the espresso machine.
  4. If your HTTP response has a Last-Modified header and no other caching directives, Chrome will cache it for 10% of the time delta between the current time and the last modified time.
  5. Quip auto-saves frequently, but there's still a small window between when a change is made and when a tab could be closed during which data could be lost.
  6. Not knocking the hacks, we have our own share of crazy code, but the less code that we ship to the users, the better.

Internet Memories #

An Accidental Tech Podcast episode from a few months ago had some reminiscing of the ways to get online in the mid-90s (most notably, the X2 vs. K56flex debate). I thought I would write down some of my earliest recollections of Internet access¹.

The first (indirect) Internet access that I recall having was in the spring of 1995². My dad had access to an FTP-to-email gateway³ at work, and had discovered an archive that had various space pictures and renderings. He would print out directory listings and bring them home. We would go over them, and highlight the ones that seemed interesting. The next day he would send the fetch requests, the files would be emailed to him, and he would bring them home on a floppy disk. In this way I acquired an early International Space Station rendering and a painting of Galileo over Io. I recall using the latter picture as my startup screen, back in the day when having so many extensions that they wrapped onto a second row was a point of pride, and the associated multi-minute startup time necessitated something pretty to look at.

A little while later, my dad found an archive with Mac software (most likely Info-Mac or WUArchive). This was very exciting, but all of the files had .sit.hqx extensions, which we hadn't encountered before (uuencoding was the primary encoding that was used in the email gateway). .hqx turned out to refer to BinHex, which Compact Pro (the sole compression utility that I had access to⁴) could understand. A bit more digging turned up that .sit referred to StuffIt archives, and the recently released (and free) StuffIt Expander could expand them. We somehow managed to find a copy of StuffIt that was either self-expanding or compressed in a format Compact Pro understood, and from that point I was set.

In the early summer of 1995, almost at the end of the school year, my school got Internet access through the 100 Schools project⁵. It would take until the fall for the lab to be fully set up, but as I recall there was a server/gateway (something in the NEC PC-98 family running PC-UX) connected to 3 or 4 LC 575s (see the last picture on this page). Also at the end of the school year a student who had graduated the year before came by, and told wondrous tales of high-speed in-dorm Internet access. At this point I still didn't have Internet access at home, but I did have a copy of the recently released version 1.1 of Netscape. I learned a bunch of HTML from the browser's about page⁶, since it was the only page that I had local access to.

Sometime in the fall, we got a Telebit Trailblazer modem. The modem only had a DB-25 connector, so my dad soldered together an adapter cable that would enable it to work with the Mac's mini-DIN8 “modem” port. This was used to dial into local and U.S.-based BBSes, at first with ZTerm and later with FirstClass.

A little while later, we finally got proper Internet access at home. I had read Neuromancer over the summer, and it left a strong impression. Before dialing up to the ISP I would declare that I was going to “jack in” and demand that the lights be left off and that I not be disturbed, all for a more immersive experience.

In the spring of 1996 we upgraded to a Global Village⁷ Teleport Platinum, going from 9600 baud to 28.8Kbps⁸. Over the summer, the modem developed an annoying tendency to drop the connection without any warning, with no obvious trigger. It eventually became apparent that this was correlated with the air conditioning kicking in. The modem was a U.S. model, designed for 120V. Japan runs at 100V, and though most U.S. electronics worked at the lower voltage, the extra load caused by the air conditioning (on the same circuit) turning on was enough to briefly reset the modem. The problem was solved by a (surprisingly large) 100V-to-120V step up transformer.

Over the next couple of years there was my first domain name, an upgrade to ISDN (a blazing fast 128Kbps when going dual-channel!), a Hotline phase, etc. However, this part is less interesting, both because it became increasingly common to have Internet access, and thus the stories are more alike, and because it's better documented.

I don't expect the contents of this post to have been all that interesting to anyone else. It was still an interesting exercise, both in terms of how much I could remember and how much I could reconstruct given my personal archive and what's on the Internet. Given the advantages of digital storage that I had, I'm that much more impressed that memoirs manage to achieve any reasonable level of accuracy.

  1. Also as a way of testing Quip's new link inspector/editor.
  2. To give a bit of context, I was in middle school and living in Japan at the time. The Japan bit is relevant given its lag in Internet adoption.
  3. Some people use the web in this fashion even today.
  4. Despite using it for years, I dutifully clicked “Not Yet” in the shareware startup nag screen every time. Sorry Bill Goodman!
  5. Based on this list of domain names the smis.setagaya.tokyo.jp domain name was activated on May 26, 1995.
  6. As I recall, it was more than just a wall of text, it used both the <center> tag and tables.
  7. For reasons that I still don't understand, Global Village initially had the domain name globalvillag.com.
  8. Later firmware-flashed to 33.6K

Using Google Reader's reanimated corpse to browse archived data #

Having gotten all my data out of Google Reader, the next step was to do something with it. I wrote a simple tool to dump data given an item ID, which let me do spot checks that the archived data was complete. A more complete browsing UI was needed, but this proved to be slow going. It's not a hard task per se, but the idea of re-implementing something that I worked on for 5 years didn't seem that appealing.

It then occurred to me that Reader is a canonical single page application: once the initial HTML, JavaScript, CSS, etc. payload is delivered, all other data is loaded via relatively straightforward HTTP calls that return JSON (this made adding basic offline support relatively easy back in 2007). Therefore if I served the archived data in the same JSON format, then I should be able to browse it using Reader's own JavaScript and CSS. Thankfully this all occurred to me the day before the Reader shutdown, thus I had a chance to save a copy of Reader's JavaScript, CSS, images, and basic HTML scaffolding.

zombie_reader is the implementation of that idea. It's available as another tool in my readerisdead.com collection. Once pointed at a directory with an archive generated by reader_archive, it parses it and starts an HTTP server on port 8074. Beyond serving the static resources that were saved from Reader, the server uses web.py to implement a minimal (read-only) subset of Reader's API.

The tool required no modifications to Reader's JavaScript or CSS beyond fixing a few absolute paths¹. Even the alternate header layout (without the Google+ notification bar) is something that was natively supported by Reader (for the cases where the shared notification code couldn't be loaded). It also only uses publicly-served (compressed/obfuscated) resources that had been sent to millions of users for the past 8 years. As the kids say these days, no copyright intended.

A side effect is that I now have a self-contained Reader installation that I'll be able to refer to years from now, when my son asks me how I spent my mid-20s. It also satisfies my own nostalgia kicks, like knowing what my first read item was. In theory I could also use this approach to build a proxy that exposes Reader's API backed by (say) NewsBlur's, and thus keep using the Reader UI to read current feeds. Beyond the technical issues (e.g. impedance mismatches, since NewsBlur doesn't store read or starred state as tags, or has per item tags in general) that seems like an overly backwards-facing option. NewsBlur has its own distinguishing features (e.g. training and "focus" mode)², and forcing it into a semi-functional Reader UI would result in something that is worse than either product.

  1. And changing the logo to make it more obvious that this isn't just a stale tab from last week. The font is called Demon Sker.
  2. One of the reasons why I picked NewsBlur is that it has been around long enough to develop its own personality and divergent feature set. I'll be the first to admit that Reader had its faults, and it's nice to see a product that tries to remedy them.

Getting ALL your data out of Google Reader #

Update on July 3: The reader_archive and feed_archive scripts are no longer operational, since Reader (and its API) has been shut down. Thanks to everyone that tried the script and gave feedback. For more discussion, see also Hacker News.

There remain only a few days until Google Reader shuts down. Besides the emotions¹ and the practicalities of finding a replacement², I've also been pondering the data loss aspects. As a bit of a digital pack rat, the idea of not being able to get at a large chunk of the information that I've consumed over the past seven and a half years seems very scary. Technically most of it is public data, and just a web search away. However, the items that I've read, tagged, starred, etc. represent a curated subset of that, and I don't see an easy way of recovering those bits.

Reader has Takeout support, but it's incomplete. I've therefore built the reader_archive tool that dumps everything related to your account in Reader via the "API". This means every read item³, every tagged item, every comment, every like, every bundle, etc. There's also a companion site at readerisdead.com that explains how to use the tool, provides pointers to the archive format and collects related tools⁴.

Additionally, Reader is for better or worse the site of record for public feed content on the internet. Beyond my 545 subscriptions, there are millions of feeds whose histories are best preserved in Reader. Thankfully, ArchiveTeam has stepped up. I've also provided a feed_archive tool that lets you dump Reader's full history for feeds for your own use.⁵

I don't fault Google for providing only partial data via Takeout. Exporting all 612,599 read items in my account (and a few hundred thousand more from subscriptions, recommendations, etc.) results in almost 4 GB of data. Even if I'm in the 99th percentile for Reader users (I've got the badge to prove it), providing hundreds of megabytes of data per user would not be feasible. I'm actually happy that Takeout support happened at all, since my understanding is that it was all during 20% time. It's certainly better than other outcomes.

Of course, I've had 3 months to work on this tool, but per Parkinson's law, it's been a bit of a scramble over the past few days to get it all together. I'm now reasonably confident that the tool is getting everything it can. The biggest missing piece is a way to browse the extracted data. I've started on reader_browser, which exposes a web UI for an archive directory. I'm also hoping to write some more selective exporters (e.g. from tagged items to Evernote for Ann's tagged recipes). Help is appreciated.

  1. I am of course saddened to see something that I spent 5 years working on get shut down. And yet, I'm excited to see renewed interest and activity in a field that had been thought fallow. Hopefully not having a disinterested incumbent will be for the best.
  2. Still a toss-up between NewsBlur and Digg Reader.
  3. Up to a limit of 300,000, imposed by Reader's backend.
  4. If these command-line tools are too unfriendly, CloudPull is a nice-looking app that backs up subscriptions, tags and starred items.
  5. Google's Feed API will continue to exist, and it's served by the same backend that served Google Reader. However it does not expose items beyond recent ones in the feed.

Google Reader Shutdown Tidbits #

Based on a lunch with Alan Green at Google on June 21, 2013. Posted on April 20, 2024, but backdated to the time that this was written in a private document.

The shutdown timing was mainly technical. There have been enough infrastructure changes that the Reader codebase has rotted, and it cannot be pushed to prod anymore. It sounded like there hadn't been any pushes for ~6 months. I'm pretty sure I pushed shortly before I left (October 2012), so it's a bit surprising that the code rotted that quickly.

The shutdown is mainly being handled by the SREs (Alan will actually be on vacation for the two weeks before July 1). It effectively sounded like they were going to be removing the GFE rules on July 1, and then take their time actually turning off any servers, since that actually involves understanding how things work and what depends on what. The FRBEs will definitely be running for a while longer, since there are other Google services that depend on them.

All of the feed data is going to be given to the Feeds team in Zurich (they also inherited the PubSubHubbub hub and maybe even the AJAX Feed API). They will hopefully archive it. Matt Cutts has been part of the cabal that has been trying to make the shutdown be handled as reasonably as possible.

He said politics didn't really factor into it. If it had been politics, the easiest thing would have been to do nothing, and let the service run as is indefinitely. Wipeout (Reader is not compliant, data for deleted Gaia accounts is still present) was a slight factor, but if that had been the only reason, it still would have been easier to let it keep running.

Once the shutdown decision was made, they needed to put someone's name on the blog posts (Google Blog, Reader Blog). Alan said he was OK with his name being on the Reader Blog, since he was the last engineer standing. However, it didn't make sense for his name (as a random engineer) to be on a Google Blog post that announced the shutting down of several services. It made more sense for a VP, or at least a director. PR asked several directors and VPs (including Alan Noble, the SYD site director), and they all begged off, saying that they had (external) people they were going to be meeting in the next couple of weeks, and if their name was on the blog post, they would just get a lot of hate about shutting down Reader. PR then asked Alan, and after thinking about it, he declined. PR thanked him for seriously considering it, and then went to Alan's manager and asked him to ask Alan. Alan declined again. PR asked his manager to ask him again, and Alan said he would only do it if they promoted him to director, and that was the last that he heard of it. Eventually Urs said he would be OK with putting his name on it. Alan seemed to have a pretty good opinion of Urs; of all the VPs, he was the most willing to speak truthfully about Reader.

PR also gave Alan a document for posting to reader-discuss@ and internal Google+. It was apparently terrible, but he was at least allowed to rewrite it. In general it sounded like PR is now very involved in internal communications; Alan sounded rather cynical about that.

The blog post announcing the shutdown was published one day early. The idea was to take advantage of the new Pope being announced and Andy Rubin being replaced as head of Android, so that the Reader news might be drowned out. PR apparently didn't realize that the kinds of people that care about the other two events (especially the Pope) are not the same kind of people that care about Reader, so it didn't work.

This also screwed up the internal announcement plans. The idea had been to announce the management reshuffle on Tuesday, have a town hall about it Wednesday morning, and then announce the Reader shutdown on Wednesday afternoon, leaving TGIF (now on Thursdays) as the venue to discuss it. Since it all happened on Tuesday, the town hall ended up being dominated by Reader questions. They continued at TGIF, to the point where Sergey held up a microphone cable and said “If I bite down on this, will the pain stop?” Urs was the only VP who had decent answers to the Reader questions (Matt Cutts in particular spoke for a while defending Reader). About a month (?) later, there was a “bring your parent(s) to work day”, at which they held a special TGIF in Shoreline Amphitheatre. Parents were apparently encouraged to ask questions, and the first parent asked about the Reader shutdown, which elicited a lot of laughter from all the Googlers.

Image-based SVG Masking #

Image-based masking was first introduced by WebKit a few years ago, and has proven to be a useful CSS feature. Unfortunately browsers without a WebKit lineage do not support it, which makes it a less than appealing option for cross-browser development. There is however the alternative of SVG-based masking, introduced by Firefox/Gecko at least partly in response to WebKit's feature. My goal was to find some way to combine the two mechanisms, so that I could use the same image assets in both rendering engines to achieve the masking effect. My other requirement was that the masks had to be provided as raster images (WebKit can use SVG files as image masks, but some shapes are complex enough that representing them as bitmaps is preferable).

SVG supports an <image> element, so at first glance this would just be a matter of something like:

<style>
.mask {
  -webkit-mask: url("mask.png");
  mask: url(#svgmask);
}
</style>

<svg>
  <mask id="svgmask">
    <image xlink:href="mask.png" />
  </mask>
</svg>

Unfortunately, depending on your mask image, when trying that, you will most likely end up with nothing being displayed. A more careful reading shows that WebKit's image masks only use the alpha channel to determine what gets masked, while SVG masks use the luminance. If your mask image has black in the RGB channels, then the luminance is 0, and nothing will show through.

SVG 2 introduces a mask-type="alpha" property that is meant to solve this very problem. Better yet, code to support this feature in Gecko landed 6 months ago. However, the feature is behind a layout.css.masking.enabled about:config flag, so it's not usable in practice yet.

After more exploration of what SVG can and can't do, it occurred to me that I could transform an alpha channel mask into a luminance mask entirely within SVG (I had initially experimented with using <canvas>, but that would have meant that masks would not be ready until scripts had executed). Specifically, SVG Filters can be used to alter SVG images, including masks. The feColorMatrix filter can be used to manipulate color channel data, and thus a simple matrix can be used to copy the alpha channel over to the RGB channels. Putting all that together gives us:

<style>
.mask {
  -webkit-mask: url("mask.png");
  mask: url(#svgmask);
}
</style>

<svg>
  <filter id="maskfilter">
    <feColorMatrix in="SourceAlpha"
                   type="matrix"
                   values="0 0 0 1 0
                           0 0 0 1 0
                           0 0 0 1 0
                           0 0 0 1 0" />
  </filter>

  <mask id="svgmask">
    <image xlink:href="mask.png" filter="url(#maskfilter)" />
  </mask>
</svg>

I've put up a small demo of this in action (a silhouette of the continents is used to mask various textures). It seems to work as expected in Chrome, Safari 6, and Firefox 21.

Source Quicklinks #

It occurred to me that I never blogged about my Source Quicklinks extension (source). I created it back in 2010, when I started working on the Chrome team, focusing on the WebKit side of things. I was spending a lot of time in Code Search trying to understand how things worked. Chromium's code search instance searches not just the Chromium repository itself, but also all its dependencies (WebKit, V8, Skia, etc.). A lot of times the only way to understand a piece of code was to look at the commit that added it (especially in WebKit code, where comments are scarce but the bug associated with the commit often provides the back story). I was therefore doing a lot of URL mangling to go from Code Search results to Trac pages that had "blame" (annotation) views.

[Source Quicklinks screenshot]

The extension made this process easier, adding a page action that provides back and forth links between Code Search, the Chromium repository (both the ViewVC and the Gitweb sites), WebKit's Trac setup and V8. Later, when the omnibox API became available, it also gained a "sql" search shortcut for the Chromium repository and commits.

Though I no longer work on Chrome, I still find myself going through those code bases quite often. For example, the best way to know whether something triggers layout in Blink/WebKit is by reading the code. I've therefore revved the extension to handle the Blink fork, in addition to cleaning up some other things that had started to bitrot. I also attempted to add cross-links between the WebKit and Blink repositories that takes into account the Blink reorganization, though we'll see how useful that ends up being as the codebases diverge more and more.

The People Behind Google Reader #

If Google Reader were a movie or TV show, at the end of the spectacle the credits would roll and you would get to see who was responsible for what you just saw. But in today's age, software "about box" credits are no longer common.

I thought it might be nice, for the sake of posterity, to list all those who worked on Reader over the years. There's been a lot of discussion about Reader's imminent shutdown, but most of it focused on Google (the corporate entity) and its strategy. However, at the end of the day, Reader was built by people. I and a few others have been lucky enough to be more visible, but everyone involved deserves credit and thanks. This is especially the case since as Chris and Brian have described, Reader faced quite a few internal struggles. As I remember it, nearly everyone on the Reader team explicitly requested to join it, and often had to fight to keep their role.

Coming up with this list was difficult, both technically (can you name all your coworkers going back 8 years?), and because it was tough to decide where to draw the line. Google is a big company, and many people in many supporting roles helped Reader out. First, here's a list of all full-time Reader team members:

Additionally, here's others who contributed to Reader in various roles at Google:

  • Design: Micheal Lopez
  • Executives: Greg Badros, Jeff Huber, Pavni Diwanji
  • Legal: Halimah DeLaine
  • Localization: Gabriella Laszlo, John Saito, Katsuhiko Momoi, Sasan Banava
  • PR: Nate Tyler, Oscar Shine, Sonya Boralv
  • Product Management: Bruce Polderman, Sabrina Ellis
  • Product Marketing: Kevin Systrom, Louis Gray, Peter Harbison, Robby Stein, Tom Stocky, Zach Yeskel
  • Quality Assurance: Amar Amte, Jan Carpenter, Kavitha Venkatesan, Madhuri Kulkarni, Thanh Le
  • Site Reliability Engineering: Chen Wang, Christoph Pfisterer, David Parrish, Ed Bardsley, Eric Weigle, Gary Luo, Huaxia Xia, James Long, Jerry Zhiwei Cen, Keith Brady, Lantian Zheng, Liren Chen, Matthew Eastman, Nadav Samet, Niall Sheridan, Olivier Beyssac, Patrick Scott, Paul Chien, Pereira Braga, Petru Paler, Sara Smollett, Scott Lamb, Sebastian Adamczyk, Vladimir Filipović, Wensheng Wang, Yu Liao
  • 20% time and additional engineering: Aaron Boodman, Abdulla Kamar, Akshay Patil, Aman Bhargava, Brad Fitzpatrick, Brett Bavar, Brett Slatkin, Charles Chen, Ed Ho, John Pongsajapan, Olga Stroilova, Peter Baldwin, Steve Jenson, Steve Lacey, T.V. Raman, Wiktor Gworek
  • User Experience Designers: Jonathan Terleski, Sean McBride
  • User Experience Research: Anna Avrekh, David Choi, Nika Smith, Theresa Sobczak
  • User Support: Graham Waldon, Paul Wilcox, Wen-Ai Yu

I'm sure I'm missing names and got things wrong, so don't hesitate to contact me with corrections. And to everyone that I worked with on Reader, it was a pleasure!

P.S. For another take on the people behind Reader, see Chris's #unsungHeroesOfGoogleReader tweets.

A REPL for Chrome Apps APIs #

A few months ago, as I was making yet another test app to demonstrate a Chrome packaged app API, I wished for a REPL. In theory the Chrome Dev Tools would fit the bill, since the console lets you run arbitrary JavaScript statements. However, using the dev tools would still involve making a manifest with the right permissions and loading an unpacked app, and at least a background page if not an actual window to inspect. Once you inspected the right page, invoking and inspecting the results of asynchronous APIs would be tedious, with a lot of boilerplate to type every time.

I started to think about creating a purpose-built REPL for Chrome apps APIs. A generic REPL seemed out of the question, since eval is disallowed by the strict Content Security Policy used by apps. My initial thought involved a dropdown listing all functions in the chrome.* namespace and a way to invoke them with canned values (eval may be disallowed, but dynamic invocation of the form chrome[namespace][methodName](arg) is still possible). However, that seemed clunky, and wouldn't help with APIs like the socket one that need to chain several method calls with the parameters for one depending on the results of another.

I then thought more about the eval limitation, and whether I could use sandboxed pages to create the REPL environment. In some ways that seemed contradictory; the whole point of sandboxed pages is that they don't have access to Chrome APIs (unlike the main frame/page). In exchange they can use less safe mechanisms such as eval (a form of privilege separation). However, sandboxed pages can communicate with the containing page and get data from them via postMessage¹. In theory the input code could be eval-ed in the sandboxed frame, and when it tried to invoke Chrome APIs, the sandboxed frame would postMessage to the main frame, ask it to run that API method, get the result, and plug it back into the expression that was being evaluated.²

This plan hinged on the fact that nearly all Chrome apps APIs are asynchronous already, thus it should be possible to create seemingly functionally identical proxies in the sandboxed frame. That way, as far as the user is concerned, they're running the original API methods directly. There would need to be some additional bookkeeping to make callback parameters work, but there was no technical barrier anymore.

Before talking about that bookkeeping, since we're now five paragraphs into the blog post, I should cut to the chase and give you a link to the REPL app that I ended up building: App APIs REPL (source). And if you'd like to see it in action, here's a screencast of it showing basic JavaScript expression evaluation and then a more complex example playing around with the socket API to mimic HTTP requests to www.google.com.

Here's how eval-ing the following statement works:

chrome.socket.create(
    'tcp',
    function(createInfo) {
      log(createInfo.socketId);
    });

  1. The main frame (also referred to as the "host" in the source code) gets the input and sends it to the sandboxed frame via an EVAL message. The sandbox dutifully evals it.
  2. chrome.socket.create is a stub that was created in the sandboxed frame: at application startup, the main frame walks over the chrome.* namespace, gathers all properties into a map and sends them to the sandbox (via an INIT_APIS message). The sandbox re-creates them, and for function properties and events a stub is generated.
  3. When the stub is invoked, it sends a RUN_API_FUNCTION message to the main frame with the API method (chrome.socket.create in this case) that should be run and its parameters. Most parameters can be copied directly via the structured clone algorithm that is used by postMessage.
  4. However, the second parameter is a function that cannot be copied. Instead we generate an ID for it, put it in a pending callbacks map, and send the ID in its place.
  5. On the main frame side, the list of parameters is reconstructed. For function parameters, we generate a stub based on the ID that was passed in. Once we have the parameters, we invoke the API function (via dynamic invocation, see above) with them.
  6. When the stub function that was used as the callback parameter is invoked, it takes its arguments (if any), serializes them and then sends them and its function ID back to the sandboxed frame via a RUN_API_FUNCTION_CALLBACK message.
  7. The sandboxed frame looks up the function ID in the callbacks map, deserializes the parameters, and then invokes the function with them.
  8. The callback function uses the log() built-in function. That ends up sending a LOG message to the main frame with the data that it wants logged to the console.

Events work in a similar manner, with stubs being generated for add/removeListener() in the sandbox that end up adding/removing listeners in the main frame. There are two maps of listener functions, one in the sandboxed frame from ID to real listener, and one in the main frame from ID to stub/forwarding listener. This allows removal of listeners to work as expected.

The console functionality of the REPL is provided by jqconsole, which proved to be very easy to drop in and hook up input and output to. History of the console is persisted across app restarts via the storage API. Additional built-in commands like help and methods (which dumps a list of all available API methods) are implemented as custom getters in the global JavaScript namespace of the sandboxed frame. There's also a magic _ placeholder that can be used as a callback parameter or event listener; it will be replaced with a generated function that logs invocations.

In addition to being a useful developer and learning tool, I hope that this REPL also helps with thinking with a sandboxed mindset. I know that the Content Security Policy that's used in apps has been controversial, with some taking it better than others. However, I think that privilege separation, declarative permissions, tying capabilities to user gestures/intent and other security features of the Chrome apps runtime are here to stay. CSP is applicable to the web in general, not just apps. Windows 8 requires sandboxing for store apps and its web-based apps are taking an approach similar to CSP to deter XSS. Sandboxing was one of the main themes for Mac desktop developers this year, with Apple finally pulling the trigger on sandbox requirements. Developers of large, complex applications were able to adapt them to the Mac OS X sandbox. That gives me hope that the Chrome app sandbox will not prevent real apps from being created. It's starting with the even more restrictive web platform sandbox and relaxing it slightly, but is generally aiming for the same spot as the Mac one.

I'm also hopeful that there will be improvements that make it even easier to write secure apps. For example, the privilege isolation provided by sandboxed pages was inspired by a USENIX presentation (the paper presupposed no browser modifications, the Chrome team just paved the cowpath).

  1. pkg.js is a library that's cropped up recently for making such main/sandboxed frame communication easier.
  2. Note that this is not the desired pattern for communication between the main and sandboxed frames. Ideally messages that are passed between the two should be as high-level as possible, with application semantics, not low-level Chrome API semantics. For example, if your sandboxed frame does image processing, it shouldn't get to pick the image paths that it reads/writes from; instead it should be given (and return) a blob of image data; it's up to the main frame to decide where it gets that image data (by reading a path on disk, from the webcam, etc.). Otherwise if the code in the sandbox is malicious, it could abuse the file I/O capability.