Google Reader Social Retrospective #

With the upcoming transition of social features in Google Reader to Google+, I thought this would be a good time to look back at the notable social-related events in Reader's history. For those of you who are new here, I was Reader's tech lead from 2006 to 2010.

Late 2004 to early 2005: Chris Wetherell starts work on "Fusion", one of the 20% projects that serve as prototypes for Google Reader. Among other neat features, it has a "People" tab that shows you what other people on the system are subscribed to and reading. There's no concept of a managed friends list; after all, when the users are just a few dozen co-workers, we're all friends, right?

September 2005: Ben Darnell and Laurence Gonsalves add the concept of "public tags" to the nascent Reader backend and frontend. There are no complex ACLs, just a single boolean that controls whether a tag is world-readable.

October 2005: A remnant of the "People" tab is present in the HTML of the launched version of Google Reader, and an eagle-eyed Google Blogoscoped forum member notices it and speculates as to its intended use.

March 2006: Tag sharing launches, along with the ability to embed a shared tag as a widget in the sidebar of your blog or other sites. On one hand, tag sharing is quite flexible: you can share both individual items by applying a tag to them, and whole feeds (creating spliced streams) if you share folders. On the other hand, having to create a tag, share it and manually apply it each time is rather tedious. A lot of users end up sharing their starred items instead, since that enables one-click sharing.

Summer of 2006: As part of his summer internship, Brad Hawkes looks into what can be done to make shared tags more discoverable (right now users have to email each other unwieldy URLs containing 20-digit identifiers). He whips up a prototype that iterates over a user's Gmail contacts and lists shared tags that each contact might have. This is neat, but is shelved over both performance (there are a lot of contacts to scan) and privacy (who exactly is in a user's address book?) concerns.

September 2006: Along with a revamped user interface, Reader re-launches with one-click sharing, allowing users to stop overloading starred items.

May 2007: Brad graduates and comes back to work on Reader full-time. His starter project is to beef up Reader's support for that old-school social network, email.

Fall of 2007: There is growing momentum within Google to have a global (cross-product) friend list, and it looks like the Google Talk buddy list will serve as the seed. Chris and I start to experiment with showing shared items from Talk contacts. We want to use this feature with our personal accounts (i.e. real friends), but at the same time we don't want to leak its existence. I decide to (temporarily) call the combined stream of friends' shared items "amigos". Thankfully, we remember to undo this before launch.

December 2007: After user testing, revamps, and endless discussions about opt-in/out, shared items from Google Talk buddies launch. Sharing is up by 25% overnight, validating that sharing to an audience is better than doing it into the void. On the other hand, the limitations of Google Talk buddies (symmetric relationships only, contact management has to happen within Gmail or Talk, not Reader) and communication issues around who could see your shared items lead to some user stress too.

Spring of 2008: With sharing in Reader picking up steam, a few aggregators and leaderboards of shared items start to spring up. Louis Gray comes to the attention of the Reader team (and its users) by discovering the existence of ReadBurner before its creator is ready to announce it.

May 2008: Up until this point sharing has been without commentary; it was up to the reader of the shared item to decide if it had been shared earnestly, ironically, or to disagree with it. "Share with note" gives users an opportunity to attach a (hopefully pithy) commentary to their share. Also in this launch is the "Note in Reader" bookmarklet (internally called "Tag Anything") that allows users to share arbitrary pages through Reader.

August 2008: Incorporating the lessons learned from Reader's initial friends feature, the preferred Google social model is revamped. Instead of a symmetric friend list based on Google Talk buddies, there is a separate, asymmetric list that can be managed directly within Reader. The asymmetry is "push"-style: users decide to share items with some of their contacts, but it's up to those contacts to actually subscribe if they wish (think "Incoming" stream on Google+, where people are added to a "See my Reader shared items" circle). This feature is brought to life by Dolapo Falola, who injects some much-needed humor into the Reader code: the unit tests use the Menudo band members to model relationships and friends acquire a (hidden) "ex-girlfriend" bit.

March 2009: After repeated user requests (and enabled by more powerful ACL support added by Susan Shepard), comments on shared items are launched. Once again Dolapo is on point for the frontend side, while Derek Snyder does all the backend work and makes sure that Reader won't melt down when checking whether to display that "you have new comments" icon. The ability of the backend and user interface to handle multiple conversations about an item is stress-tested with a particularly popular Battlestar Galactica item.

May 2009: Bundles are launched, extending sharing from individual tags to collections of feeds.

July 2009: Continuing the social learning process, the team (and Google) revamps the friends model once again, switching to an asymmetric "pull"-style (i.e. following) model. This is meant to be "pre-consistent" with the upcoming Google Buzz launch. Also included in this launch are better ties to Google Profiles and the ability to "like" items. In general there are so many moving parts that it's amazing that Jenna's head doesn't explode trying to design them all.

Also as part of this launch, intern Devin Kennedy's trigonometry skills are put to good use in creating an easter egg animation triggered when liking or un-liking an item after activating the Konami code.

August 2009: Up until this point, one-click sharing had mainly been for intra-Reader use only (though there were a few third-party uses, some hackier than others). With the launch of Send to (also Devin's work), Reader can now "feed" almost any other service.

February 2010: The launch of Google Buzz posed some interesting questions for the Reader team. Should items shared in Reader show up in Buzz? (yes!) Should we allow separate conversations on an item in Buzz versus Reader? (no!) With a lot of behind the scenes work, sharing and comments in Reader are re-worked to have close ties to Buzz, such that even non-Reader-using friends can finally get in on the commenting action.

March 2010: Partly as a tongue-in-cheek reaction to social developments within Google, and partly to help out some Buzz power users who were complaining that all the social features in Reader were slowing it down, I add a secret (though not for long) anti-social mode.

May 2010: Up until this point, it was possible to have publicly-shared items but only allow certain friends to comment on them. Though powerful, this amount of flexibility was leading to complexity and user confusion and workarounds. To simplify, we switch to offering just two choices for shared items, and in either case if you can see the shared item, you can comment on it.

As you can see, it's been a long trip, and with the switch to Google+ sharing features, Reader is on its fourth social model. This much experimentation in public led to some friction, but I think this incremental approach is still the best way to operate. Whether you're a sharebro, a Reader partier, a Gooder fan, the number 1 sharer or someone who "like"-d someone else, I am very grateful that you were part of this experiment (and I'm guessing the rest of the past and present team is grateful too). And if you're looking to toast Reader for all its social stumbles and accomplishments, the preferred team drink is scotch.

Adventures in Retro Computing #

One of the big assignments in my 7th grade English class was to write an autobiographical composition entitled "Me, Myself & I". This being 1994, "multimedia" was a buzzword, so students were also given the option of doing the assignment as an "interactive" experience instead*. I had been playing around with HyperCard, so I chose that option (it also meant extra computer time while the rest of the class was writing things out long-hand). I recall the resulting HyperCard stack being fun to work on, and it featured such cutting-edge things as a startup 3D animation rendered with Infini-D (albeit with the trial version that was limited to 30 frames).

I'm a bit of a digital packrat, so I still have the HyperCard stack that I made 16 years ago. I recently remembered this and wanted to have a look, but lacked a direct way to view it. Thankfully, there are many options for emulating mid-90s 68K Macs. Between Basilisk II, a Quadra 650 ROM, Apple distributing System 7.5.3 for free, and a copy of HyperCard 2.4, I was all set. I was expecting to have more trouble getting things running, but this appears to be a pretty well-trodden path.

I was pleasantly surprised that the full stack worked, including bits that relied on XCMDs to play back movies and show custom dialogs. The contents are a bit too embarrassingly personal to share, but they include surprisingly prescient phrases like "I will move to California and work for a computer company".

This stack also represents one of my earliest coding artifacts (outside of Logo programs that unfortunately got lost at some point), so I was also curious to look at the code. Unfortunately whenever that stack was loaded, all of the development-related menu commands disappeared. I remembered that stacks have user levels, and that lower ones are effectively read-only. I tried changing the user level in the Home stack, but to no effect: as soon as my stack was brought to the foreground, it was reset back to the lowest level. Hoping to disable script execution, I engaged in a bit of button mashing. Instead I rediscovered that holding down command and option shows all field outlines, including invisible fields. 13-year-old me was clever enough to include a backdoor – a hidden button in the lower right of all cards that when pressed reset the user level back to the development one.

Code-wise, 13-year-old me did not impress too much. There was a custom slider that moved between different events in my life, showing and hiding text in a main viewing area that was awfully repetitive:

on mouseDown
  repeat while the mouse is down
    set location of me to 118, mouseV()
    if mouseV() < 91 then
      set location of me to 118, 91
      walkfield
      exit mouseDown
    end if
    if mouseV() > 238 then
      set location of me to 118, 238
      stmarysfield
      exit mouseDown
    end if
  end repeat
  if mouseV() >= 91 and mouseV() <= 103 then
    set location of me to 118, 91
    walkfield
  end if
  if mouseV() > 103 and mouseV() <= 127 then
    set location of me to 118, 115
    talkfield
  end if
  ...and so on
end mouseDown

on walkfield
  play "Click"
  show card field walk
  hide card field talk
  hide card field beach
  hide card field garden
  hide card field school
  hide card field japan
  hide card field stmarys
end walkfield

on talkfield
  play "Click"
  hide card field walk
  show card field talk
  hide card field beach
  hide card field garden
  hide card field school
  hide card field japan
  hide card field stmarys
end talkfield
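For contrast, the repetition could have been collapsed into one data-driven handler; a sketch of the same idea in JavaScript (the field list and helper names here are illustrative):

```javascript
// The seven near-identical show/hide handlers reduce to one list of
// field names plus a helper that shows the selected field and hides
// the rest. `setVisible` stands in for the show/hide card field calls.
const FIELDS = ['walk', 'talk', 'beach', 'garden', 'school', 'japan', 'stmarys'];

function showOnly(selected, setVisible) {
  for (const field of FIELDS) {
    setVisible(field, field === selected);  // true only for the chosen field
  }
}
```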

With Rosetta being removed from Lion, PowerPC-only Mac OS X software is next on the list of personally-relevant software to become obsolete (Iconographer in this case). Thankfully, it looks like PearPC is around in case I get nostalgic about 18-year-old me's output.

* I was initially going to have a snarky comment about the teacher** not realizing that the web was the way of the future, but after thinking about it more, having this level of flexibility was great, regardless of the specific technologies involved.

** Hi Mr. Warfield! Good luck with whatever is next!

Intersquares #

This past weekend I took part in the Foursquare Global Hackathon. I used this opportunity to implement an idea that I had while having dinner with Ann and Dan at Sprout:

The premise of the show How I Met Your Mother is that in the year 2030 the narrator (Ted) is telling his kids how he met their mother. He starts the story 25 years earlier (i.e. in 2005), and thus far (after 6 years), we've gotten a lot of hints, but we haven't met the mother yet. However, it's quite apparent that Ted has in fact been at the same venue as the mother several times. In a hypothetical world where Ted and the mother use Foursquare, I thought it would be neat if they could compare checkin histories and see all the near-misses that they had over the years.

Intersquares does exactly that: you can sign in with your Foursquare account, and then once your checkin history is processed, another user can sign in with their account, and you'll both be told where you were together (whether you knew it at the time or not). This can be great for remembering first dates or for finding close calls.
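The core of that comparison can be sketched as a simple overlap check (the field names below are illustrative, not the actual Foursquare API shape): two checkins are a "near-miss" if they're at the same venue within some time window.

```javascript
// For each pair of checkins from the two users, report a match when the
// venue is the same and the checkin times are within `windowMs` of each
// other. A real implementation would index by venue rather than use an
// O(n*m) scan.
function findIntersections(checkinsA, checkinsB, windowMs) {
  const matches = [];
  for (const a of checkinsA) {
    for (const b of checkinsB) {
      if (a.venueId === b.venueId && Math.abs(a.time - b.time) <= windowMs) {
        matches.push({venueId: a.venueId, timeA: a.time, timeB: b.time});
      }
    }
  }
  return matches;
}
```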

To see it in action, feel free to see if you have run into me anywhere. There's also a screencast that demos the site.

This was my first hackathon, and I feel like it went pretty well (i.e. I managed to finish something). I definitely tried to keep the hack part in mind, as far as getting something together quickly. The code has a few dodgy technical decisions (keep all checkins in one entity property? what could go wrong?). It was also helpful to build on the same stack as Stream Spigot (App Engine, Python, Closure, Django Templates) so that I could lift a lot of utility code and patterns. Next time, I'd like to be a part of a team though: while being a one man band is satisfying (check out that logo), it does tend to limit the scope to toy-like apps such as this one.

The hackathon has prizes, so if Intersquares intrigues you, your vote is appreciated. Definitely check out some of the other entries too. I haven't gone through all of them yet, but so far Near, Magic Muggle Clock and Plan Your Next Trip all seem really neat.

Update on 9/28/2011: Intersquares was a finalist in the hackathon!

An interesting bug #

As Jonathan has blogged, "What is the hardest bug you've ever tackled?" is an interesting conversation starting point with engineers, one that I often use to start (phone) interviews. I usually rephrase it as "Describe an interesting or difficult bug that you ran into", since "hardest" often causes people to freeze up as they ponder whether the bug they have in mind is actually the hardest. In any case, most bugs become interesting if you ask "why?" enough.

Along these lines, here's a bug that I ran into in mid-2007 while I was working on Google Reader: Soon after a production push, we noticed that some users were complaining that Reader wasn't loading properly when they reloaded the page. Stranger still, others said that it wasn't working properly initially, but after a few reloads it would start working. Checking things in the office revealed similar inconsistent results: Reader would load for some but not for others. For those for whom Reader hadn't loaded successfully, it turned out to be because of a 404 that was returned when trying to load Reader's main JavaScript file.

This happened soon after Gears support was added to Reader, so we initially suspected some interaction with offline support. Perhaps an old version of the HTML was being used by some users, and that contained a link to a version of the JavaScript file that we didn't serve anymore. However, some quick Dremel-ing showed that we had never served the URLs that triggered 404s until the push began. Stranger still, not all requests for those URLs resulted in 404, only about half.

At this point a bit of background about Reader's JavaScript infrastructure is necessary. As previously mentioned, Reader uses the Closure Compiler for processing and minimization of JavaScript. Reader does runtime compilation, since it supports per-user experiments that would make it prohibitive to compile all combinations at build or push time. Instead, when a user requests their JavaScript file, the set of experiments for them is determined, and if we haven't encountered it before, a new variant is compiled and served. JavaScript (and other static resources) are served with a checksum of their contents in the filename. This allows each URL to be served with a far-future cache expiration header, and makes sure that when its content changes users will pick up changes by virtue of having a new URL to fetch.

The JavaScript URL is used in two places, once embedded as a <script src="..."> tag in the HTML, and once when requesting the file itself. The aforementioned compilation and serving steps happen once for each (identical) frontend machine, and some machines had one idea of what the URL should be, while others had a different expectation. Since the frontends are stateless, it was quite likely for users to request the JavaScript from a different machine than the one that served them the HTML containing the URL. If there was a mismatch, then the 404 would happen. However, if the user reloaded enough times, they would eventually hit a pair of machines that agreed on the JavaScript URL.

I said the users were getting "seemingly" identical JavaScript, but there was actually a slight difference when doing a diff (which explained the difference in checksums). One variant contained return/^\s*$/.test(str == null ? "" : String(str)) while the other had return/^\s*$/.test((str == null ? "" : String(str))) (note the extra parentheses in the test() argument). The /^\s*$/ regular expression was distinctive enough that it was easy to map this as being the compiled version of the Closure function goog.string.isEmptySafe, which is defined as:

goog.string.isEmptySafe = function(str) {
  return goog.string.isEmpty(goog.string.makeSafe(str));
};

The goog.string.isEmpty and goog.string.makeSafe calls get inlined, hence the presence of the regular expression test and String() directly (note that the implementations may have changed slightly since 2007).

Now that I knew where to look, I began to turn compiler passes off until the output became stable, and it became apparent that the inlining pass itself was responsible. The functions would not be inlined in the same order (i.e. goog.string.isEmpty and then goog.string.makeSafe, or vice-versa), and in one case the compiler decided to add extra parentheses for safety. Specifically, when inlining, the compiler would check to see if the replacement AST node was of lower precedence than the one it was replacing. If it was, a set of parentheses was added to make sure that the meaning was not changed.

The current compiler inlining pass is very different from the one used at that point, but the relevant point here is that the compiler would use a HashSet to keep track of what functions needed to be inlined. The hash set was of Function instances, where Function was a simple class that had a couple of Rhino Node references. Most importantly, it didn't define either equals() or hashCode(), so identity/memory address comparisons and hash code implementations were used.

When actually inlining functions, the compiler pass would iterate through the HashSet, and since the Function instances corresponding to goog.string.isEmpty and goog.string.makeSafe had different addresses depending on the machine, they could be encountered in a different order. The fix was to switch the list of functions to inline to a List (Set semantics were not necessary, especially given that Function instances used identity comparisons so duplicates were not possible).

The inlining compiler pass had used a HashSet for a long time, so I was curious why this only manifested itself then. The explanation turned out to be prosaic: this was the first Reader release where goog.string.isEmptySafe was used, and there were no other places where there were nested inlineable function calls. (This bug happened around the time we switched to JDK6, which changed HashSet internals, but we hadn't actually switched at that point, so it was not involved.)

None of this was reproducible when running a frontend locally or in the staging environment, since all those setups have a single frontend instance (they're of very low traffic). In those cases, no matter which version was compiled and which URL was generated, it was guaranteed to be servable. To prevent the recurrence of similar bugs, I added a unit test that compiled Reader's JavaScript locally several times and made sure that the output did not change. Though not foolproof, it has caught a couple of other such problems before releases made it out into production.

The main reason why I enjoyed fixing this bug was because it involved non-determinism. However, unlike other non-deterministic bugs that I've been involved in, the triggering conditions were not so mysterious that it took months to solve.

There's a (web) app for that site #

Discovery (i.e., how a user finds apps to install) is an interesting aspect of app stores*. In some ways, discovery is not necessary: a significant appeal of the store is that it catalogs all the apps, so if the user is looking for a todo list or Twitter client, it's pretty obvious what to search for. However, that assumes that the user has a specific need in mind already, and is aware that that class of application exists.

Ads to promote apps are one way to expose users to apps that they hadn't heard of before. More generally, it's interesting to think of other "ambient" mechanisms that piggyback on existing user activities.

Along these lines, I thought I would play around with the Chrome Web Store set of apps. Ideally, if one is browsing a web site that has a corresponding app in the store, a page action icon would appear to indicate this, similar to the feed auto-discovery notification. Conveniently, hosted apps have a urls section in their manifest which indicates which URLs they want to include within the app. This seemed like a pretty good proxy for which URLs the app was "about". I extracted the URL patterns for a bunch of apps, cleaned them up a bit, and used that to implement a Chrome extension (source) which shows the aforementioned page action when visiting pages that match a Chrome Web Store entry.
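The matching itself can be sketched as below. This is a simplification: real Chrome match patterns have distinct scheme, host, and path parts, while this handles only the common "prefix ending in *" form, and the app/manifest shapes are illustrative.

```javascript
// Match a URL against one simplified pattern: a trailing "*" makes it a
// prefix match, otherwise it must match exactly.
function urlMatches(url, pattern) {
  return pattern.endsWith('*')
      ? url.startsWith(pattern.slice(0, -1))
      : url === pattern;
}

// Return the store apps whose manifest "urls" entries cover this URL.
function appsForUrl(url, apps) {
  return apps.filter(app => app.urls.some(pattern => urlMatches(url, pattern)));
}
```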

Once I had that working, it seemed like a straightforward extrapolation to use the history API to also match browser history URLs against app data. When launched the extension shows apps that match history entries, sorted by recency (this is also available via the extension's options page). The fact that the app data lives locally means that all this matching can be done without uploading the history to a server, which is preferable from a privacy perspective.

Installing apps based on visited websites brings up the "aren't web apps just bookmarks?" question. As it turns out, some apps actually show a pretty different UI than the regular website. For example, the New York Times app features the Times Skimmer UI while the Vimeo app uses a "Couch Mode". The other aspect to consider is that bookmarks have several use cases. In addition to being launchers for frequently used sites, bookmarks are also used for gathering collections of items, remembering where to come back later, etc. Special-casing the launcher use case so that it implies "pretty icons on the homepage" may not be such a bad thing, even ignoring the other extra capabilities of apps.

The URL matching approach has its limitations. For example, the Foursquare Maps app doesn't show up for someone who has foursquare.com in their browser history, even though it ostensibly shows Foursquare data. That's because the app uses OAuth to access the Foursquare data on the server-side, so it doesn't have a foursquare.com URL in its manifest. This sort of limitation could be fixed by allowing an explicit "this app is about this collection of URLs" entry in the manifest, though there are "interesting" implications to allowing an app to associate itself with a website that it doesn't necessarily own. On the plus side, such a mechanism would also allow this approach to be extended to any app store, even non-web app ones.

* "App store" is used generically in this post. Also, these are my idle weekend thoughts, not official Google promulgations.

In Praise of Incrementalism #

Pinky: "Gee, Brain, what do you want to do tonight?"
The Brain: "The same thing we do every night, Pinky — try to take over the world!"

My memory is a bit fuzzy, but from what I remember, if the Brain had set his sights slightly lower, he definitely could have taken over a city, or perhaps a small state, in one night as a first step, leaving the rest of the world to following nights.

Along these lines, I was talking with Dan about why I thought of Stack Overflow/Exchange as being significantly more successful than Quora. I wouldn't be surprised to find out that they have comparable traffic, users, or other metrics. However, from an outsider's perspective, Stack Overflow made fast progress on its initial goal of being a good programming Q&A site. There was never a clear mission accomplished moment, but at this point its success does not feel in doubt. There were follow-on steps, some more successful than others, and a general upward-and-onward feeling.

On the other hand, Quora's goals from the start were outrageous (in a good way): “Imagine a world where I knew everything that I wanted to know, as long as someone else in the world knew it.” I'm sure that having J.J. Abrams give his thoughts on monster/action scenes is a mile marker on that path. However, it's harder to see how far they've come or to feel like the site has a well-functioning foundation/core functionality, since the path is a continuous curve rather than a step function.*

Google might be considered a counter-example to this; from very early on its goal was quite broad and audacious. However, having a steady stream of corpora to add shows definite progress. There is also the matter of perceived goals versus actual internal goals. Thefacebook was long discounted by some as being a site just for college kids, surely even after they set their sights higher. Having others underestimate your ambition (but not too much, lest they ignore you) seems beneficial.

In the end, this probably reflects my personal bias towards the incremental Ben and Jerry's model. Though less exciting, over time it can lead to pretty good results.

* All of this might be a reflection of my being more aware of what Stack Overflow has done over the years via their podcast; Quora is harder to keep up with.

Non-fiction books for (curious) busy people #

I'm in the process of re-reading The Baroque Cycle and have gotten curious about Newton's time at the Royal Mint. Ideally, I would like something more detailed than the two paragraphs that Wikipedia devotes to this, but shorter than a 300+ page book*. I've had similar experiences in the past: no matter how much The Economist raved about a ~1000 page history of the British Navy, I was never able to commit to actually reading it all the way through. I think this is more than Internet-induced ADD; I manage to read a book every 4-6 weeks, and dedicating a slot to such a unitasker seems wasteful.

I realize that producing a shorter book on the subject may not be any cheaper or less resource/research-intensive than a long book. I would even be willing to pay the same amount for the digested version as I would for the full version. With recent developments like Kindle Singles there also wouldn't be the issue of fixed production/distribution costs that should be amortized by creating a longer book. Though fiction-centric, Charles Stross has a good explanation of why books are the length that they are.

I used to think that abridged editions, CliffsNotes, and the like were an abomination (as far as not getting the experience the author intended) and for lazy people. To some degree I still do; I think ideally these alternate editions should be produced by the same author, or with the author's blessing.

* As it turns out, there is a 128-page 1946 book about Newton's time at the Mint. Perhaps there was less need to pad then?

Update later that day: Based on the endorsement on Buzz I'll give the (modern) Newton book a try. Part of the reason why I was soured on longer non-fiction books was that I tried reading Operation Mincemeat and was put off by the amount of seemingly extraneous background information and cutesy anecdotes. Incidentally, Operation Mincemeat has a brief appearance in Cryptonomicon, another Neal Stephenson book – I promise that I read other authors too.

Chrome Startup Bookmarks Extension #

Continuing a tradition of making tools for family members, I made a simple Chrome extension for my grandmother that opens all the bookmarks in a folder at startup.

The extension code itself is nothing interesting (making the icon probably took longer). However, it does showcase a limitation of the current Chrome extension system. Since this extension needs to run code at startup, it needs a background page. The extension system architecture overview has a few more details, but briefly, background pages (and any other extension pages) end up in their own process. In this particular case, the background page is not needed after startup, but there is no way to indicate that, so the process hangs around indefinitely, wasting a bit of memory. There is a bug filed for this, part of a broader collection of changes that would enable certain classes of extensions to remove the need for (long-lived) background pages.
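The extension's core logic can be sketched with the tree walk factored out so it's testable; in the extension itself, `nodes` would come from the bookmarks API and `open` would wrap tab creation, and the folder title is an assumed setting rather than the actual extension's configuration.

```javascript
// Walk the bookmark tree, find the folder with the configured title,
// and open each bookmark directly inside it.
function openStartupFolder(nodes, folderTitle, open) {
  for (const node of nodes) {
    if (!node.children) continue;  // a leaf bookmark, not a folder
    if (node.title === folderTitle) {
      node.children.filter(child => child.url).forEach(child => open(child.url));
    } else {
      openStartupFolder(node.children, folderTitle, open);
    }
  }
}
```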

Asynchronous what now? #

A recent Daring Fireball article alludes to a iOS 4.3 Mobile Safari vs. UIWebView difference: "[Safari] uses asynchronous multithreading (UIWebView does not)." John Gruber doesn't make it clear what he means by "asynchronous multithreading", and the term itself seems fishy (doesn't threading imply asynchronous behavior? If you're going to block, why bother with threads?).

I tried to trace the source of this, and saw more possibly related references to asynchronous something or others: The Register says: "[UIWebViews] aren't rendered using Apple's newer 'asynchronous mode'. They're saddled with the old 'synchronous mode', which means they don't quite look as good." Meanwhile, Ars Technica reports: "Developers have also noticed that full-screen Web apps also don't take advantage of MobileSafari's ability to asynchronously load scripts, which can cause some performance issues—particularly for games. The underlying WebKit engine gained this ability late last year, so it's not entirely clear if this issue is a regression, or if it is just new to MobileSafari and hasn't yet been carried over to WebSheet.app."

Based on the Ars Technica article, we would appear to have at last something concrete to test. However, using the <script defer> test case (mentioned in the WebKit blog post on asynchronous loading) with iOS 4.3, not even Mobile Safari actually loads the script asynchronously (results; with defer support the result should be around 1,000ms). The same is true of a <script async> test case (results).

It's not clear whether Daring Fireball, The Register, or Ars Technica are talking about the same thing. The Register says they talked to an "unnamed developer", and the Ars Technica article came after (and references) The Register one, so it could just be a game of telephone, where they heard "asynchronous" and assumed it was about script loading. If anyone has concrete (technical) details, they would be appreciated. Alternatively, the iOS 4.3 release of WebCore and JavaScriptCore will show up on Apple's Open Source site eventually, and then it might be possible to investigate this behavior directly.

Feed Playback and Stream Spigot #

Feed Playback

A few weeks ago I came across sysadvent, an advent calendar-style blog with a new tool or tip each day. However, I only stumbled across it on day 15, and I wanted to read it from the beginning (and perhaps the previous years too). I could Instapaper all of the older posts, but that would 1) be tedious and 2) make it unlikely that I would ever actually read the entries, since seeing a pile of 36 long posts would be daunting.

Ann needed something similar a few years ago when she first came across cooking blogs like Chocolate & Zucchini and thepassionatecook, so I ended up writing a few scripts that scraped their archives and then generated a feed that would have a new post every day. Instead of one-off scraping scripts, I wondered if I could use Google Reader's archive of feeds to provide the backlog. Reader's been around for five and a half years by now, so this should work quite well for a lot of blogs.

Feed Playback is the resulting tool. You give it a blog or feed URL, and it generates a new feed for you that updates once a day (or more rarely if you'd like) with a new item from the original feed's archive. The use cases are learning a new skill or language, or reading a long-running comic from (nearly) the beginning. I also noticed that this playback idea seems to be a bit in vogue right now; as I was working on the tool I was reminded of Disunion (The New York Times' playback of the Civil War) and the Orwell Diaries (a playback of George Orwell's World War II diaries).

For the actual implementation of the tool, I decided to use Reader's shared tag functionality. For each playback, a corresponding shared tag is created. As the playback is advanced, items from Reader's archive of the original feed are tagged into it. This has a couple of advantages: serving of playback feeds is offloaded to Reader, instead of being the responsibility of my app. Perhaps more importantly, no copies of items are made; the item in the playback feed is the one that was originally published, so metadata like who shared or liked it is preserved. In case any of this is interesting to you, the source is available.
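The advance-and-tag loop described above can be sketched as follows. This is my own minimal illustration, not the actual Stream Spigot code: the class and method names are made up, and `tag_item` stands in for whatever Reader API call applies the shared tag to an existing archived item.

```python
import datetime

class Playback:
    """Hypothetical sketch of one playback subscription (illustrative names,
    not the real Stream Spigot implementation)."""

    def __init__(self, archive_item_ids, interval_days=1):
        self.archive_item_ids = archive_item_ids  # archived items, oldest first
        self.position = 0
        self.interval = datetime.timedelta(days=interval_days)
        self.last_advanced = None

    def advance(self, now, tag_item):
        """Tag the next archived item into the playback's shared tag, at most
        once per interval. No copy is made: tag_item applies the tag to the
        originally published item, preserving its metadata."""
        if self.position >= len(self.archive_item_ids):
            return None  # playback has reached the end of the archive
        if self.last_advanced and now - self.last_advanced < self.interval:
            return None  # not due for another item yet
        item_id = self.archive_item_ids[self.position]
        tag_item(item_id)
        self.position += 1
        self.last_advanced = now
        return item_id
```

Because the shared tag is an ordinary Reader stream, serving the resulting playback feed needs no further work from the app.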

Stream Spigot

Working on this tool reminded me of my earlier work on Twitter Digest. Both are tools that enable me to more efficiently consume content. I decided to put both of them under the umbrella of Stream Spigot, and to (hopefully) collect more such tools at that site. Ben (or was it Chris?) would make TiVo analogies when describing Reader, and I think Stream Spigot can use that comparison too. Real-time content may be popular right now, but I think there's also a need for time-shifting tools that let you consume such content at your own convenience.

Though I'm sure I'll blog about big Stream Spigot changes here, if you'd like to keep up with more day-to-day tweaks, you can either follow the project on GitHub or its Twitter account.

WebKit Layout Tests: Practice #

This post originally appeared on the WebKit blog, but now appears to be gone. I've republished it here.

The previous post described the main testing mechanism of WebKit: layout tests. While it’s a (deceptively) simple yet powerful system, it’s not without its limitations. This post attempts to list some of the issues and areas for improvement.

Golden (expected) files

Each test has one or more “expected” output text or image files checked in alongside the test itself. For simple tests that assert some behavior, a basic text file with the “PASS” output is enough. However, for more complex tests, especially those that verify rendering behavior, the expected file is an image, and optionally a text dump of the render tree. Despite doing everything possible to ensure consistent output (always rendering an 800×600 image, using the same color space), the images can vary. For example, if the output has text (which gets anti-aliased), or form controls that have a platform-specific appearance, then the various platforms that WebKit is available on (Mac, GTK, Qt, etc.) will each need to have a different golden file.

There is a mechanism for handling per-platform expectations, but it does mean that developers making changes that involve new or modified tests may need to worry about creating multiple golden files, often for platforms that they don’t have access to. What usually happens is that changes are made with the expectation that tests will fail, and then when builders go red, new results are grabbed from them and used to update the checked-in expectations (a process known as rebaselining, for which there are tools to help).
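The per-platform mechanism boils down to a fallback search: a port looks for an expectation in its own directory first, then in progressively more generic ones. A sketch of the idea, assuming made-up directory names (the real search paths live in the run-webkit-tests scripts):

```python
import os

# Illustrative fallback order per port; "" means the test's own directory.
# These entries are assumptions for the sketch, not the real port list.
FALLBACK_PATHS = {
    "mac-leopard": ["platform/mac-leopard", "platform/mac", ""],
    "gtk": ["platform/gtk", ""],
}

def expected_file(test_path, port, exists=os.path.exists):
    """Return the first checked-in expectation that exists for this port."""
    base, _ = os.path.splitext(test_path)
    for platform_dir in FALLBACK_PATHS[port]:
        candidate = os.path.join(platform_dir, base + "-expected.txt")
        if exists(candidate):
            return candidate
    return None  # no expectation checked in; the test needs a baseline
```

Rebaselining then amounts to writing a builder's actual output into the right spot in this hierarchy.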

One solution to this problem is “ref(erence) tests”, a concept borrowed from Mozilla. Instead of checking in a pixel golden file, another HTML file is checked in that attempts to arrive at the same result via a different (simpler, known-to-work) path. For example, if testing complex CSS float handling, the reference file would construct the same (pixel for pixel) layout using absolute positioning, which is (hopefully) an orthogonal codepath. Both files can be rendered to images and compared, without having to worry about platform-specific output. Hayato Ito has been working on adding reftest support to the WebKit testing framework.
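The comparison itself is deliberately trivial, which is the point: no golden file is checked in at all. A sketch, where `render` stands in for DumpRenderTree producing a pixel dump:

```python
def reftest_passes(render, test_path, ref_path):
    """A reftest passes when the test page and its reference page render to
    pixel-identical images. Exact equality is the intended semantic: both
    pages ran through the same platform's rendering, so anti-aliasing and
    form-control appearance cancel out."""
    return render(test_path) == render(ref_path)
```

Since both renderings happen on the same machine, the platform-specific variation that plagues pixel golden files simply never enters the comparison.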

Flakiness

Having trustworthy tests helps to ensure both peace of mind (“was that a cosmic ray, or did I break something?”) and a smoother development experience (it’s no fun waiting for the commit queue to retry your patch because it ran into a flaky test). Flaky tests affect other projects too, and are perhaps an unavoidable problem in complex projects. In layout tests, they are most often caused by use of delays (i.e. setTimeout) that become brittle when test conditions change.
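The setTimeout brittleness comes down to waiting a fixed amount of time rather than waiting for the condition itself. A generic sketch of the two patterns (in Python rather than test JavaScript, and not WebKit code):

```python
import time

def wait_fixed_delay(check, delay=0.05):
    """Brittle: assumes the work always finishes within 'delay'. On a loaded
    or slower bot, the check runs before the work is done and the test
    fails intermittently."""
    time.sleep(delay)
    return check()

def wait_for_condition(check, timeout=5.0, poll=0.01):
    """Robust: poll for the condition itself, with a generous timeout that
    only matters when the test genuinely hangs."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poll)
    return False
```

The robust version is no slower in the passing case (it returns as soon as the condition holds), which is why converting delay-based tests to event- or condition-based waiting removes flakiness without slowing the suite.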

Julie Parent and Ojan Vafai had a flakiness crusade of sorts last year which helped a lot with this, but more help is always appreciated. Adam Barth and Eric Seidel have started to keep track of flaky tests and have the commit queue report them, and the Chromium WebKit port has a dashboard of flaky tests (in case you find it baffling, this page explains how to interpret it).

Test interdependence

A sub-category of test flakiness is caused by test interdependence: some tests will pass when run alone, but will fail when run as part of the whole suite (or vice-versa). Tests are not entirely isolated: the binary that they run in (DumpRenderTree) is only restarted every 1,000 tests for the sake of performance, and though some things are reset between each test, it’s not feasible to ensure a complete tabula rasa. Sometimes this is caused by obvious things, like usage of sessionStorage that is not cleaned up.

Other times, the interactions are much more subtle. For example, a patch that re-ordered some HTTP-level tests caused some entirely unrelated SVG tests to fail. It turned out that the reordering changed the chunking of tests (into groups of 1,000) and one test was triggering different kerning behavior in another due to overly coarse caching of some font rendering attributes (that has since been fixed).
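The chunking effect is easy to see in miniature: tests that fall in the same chunk share one DumpRenderTree process, so merely reordering the suite changes which tests can interfere with each other. A toy sketch (illustrative only):

```python
def shard(tests, chunk_size=1000):
    """Group tests into chunks; each chunk runs in one DumpRenderTree
    process before it is restarted."""
    return [tests[i:i + chunk_size] for i in range(0, len(tests), chunk_size)]
```

With a real suite of 20,000+ tests, inserting or reordering even a handful of tests shifts the chunk boundaries for everything after them, which is how an HTTP-test reordering ended up pairing an SVG test with a font-cache-polluting neighbor.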

Coverage

One would think that with over 20,000 tests, coverage would be good. However, given the billions and billions of web pages out there, it’s a somewhat common occurrence to break seemingly “obvious” things. I have personally broken the back button on both Google and Facebook, and the set of regression bugs shows that I’m not alone. Thankfully most of these are caught via nightly builds, and bug fixes always come with a test of their own, so one can only hope that things are improving.

Conclusion

While layout tests aren’t all rainbows, puppies, and sunshine, they are an important part of the WebKit project. The web is an ever-evolving creature, and they help us code fearlessly. If any of the challenges presented in this post pique your interest, layout tests are a great way to get involved in the project (for example, if overly long C++ compiles are not your cup of tea, you may like working on the Python framework that runs the tests instead).

WebKit Layout Tests: Theory #

This post originally appeared on the WebKit blog, but now appears to be gone. I've republished it here.

When I began WebKit development, one of the things that I was curious about was how testing is handled. Having been a web developer, I was aware of both how many bugs browser rendering engines can have (though things are certainly getting better), and how increasingly complex web pages are pushing those engines more and more. Having to live with bugs for years is definitely something to be avoided, so enforcing spec compliance and avoiding regressions both seem key.

The WebKit solution to this is layout tests. At the simplest level, layout tests are simple web pages (the simpler the better) that are checked into the WebKit repository, along with expected renderings (golden files), either as text or as images. A test harness (run-webkit-tests) uses an app that embeds WebKit (DumpRenderTree) to go through the tests (all 20,000+ of them), compares each test’s rendered output against its golden files, and reports tests that fail the comparison, crash, time out, or otherwise behave unexpectedly. The WebKit project has builders that go through this process continuously across all platforms that it has been ported to, making it easy to spot changes that break things (and if they do, revert them).
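The harness loop can be sketched in a few lines. This is heavily simplified and not the real run-webkit-tests code: the real harness also handles image diffs, per-port expectations, retries, and more, and `run_test` here stands in for driving DumpRenderTree.

```python
def run_tests(tests, run_test, expected_output):
    """Minimal sketch of the comparison loop: run each test, compare its
    output against the checked-in expectation, and bucket the result."""
    results = {"pass": [], "fail": [], "crash": [], "timeout": []}
    for test in tests:
        try:
            actual = run_test(test)  # stands in for a DumpRenderTree run
        except TimeoutError:
            results["timeout"].append(test)
            continue
        except Exception:
            results["crash"].append(test)
            continue
        bucket = "pass" if actual == expected_output(test) else "fail"
        results[bucket].append(test)
    return results
```

The builders run essentially this loop continuously; a test moving out of the “pass” bucket is what turns a builder red.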

Developers are also encouraged to run the tests before committing changes. The easiest way is to use the commit queue, which does this automatically. If not, running the full suite on a workstation is also quite feasible: it currently takes around 15 minutes, and will be down to ~4 minutes or less with Dirk Pranke’s multi-process test runner.

With judicious use of test data, layout tests are used to verify the behavior of many things, from JavaScript engine spec compliance to repaint behavior and the WebSocket protocol implementation. For things like the latter that need network access, the test harness starts a local server (Apache, lighttpd, or WebSocket) and runs tests from it. The local HTTP server is also useful for simulating network-related edge cases; it amuses me that I’ve had to learn and use more PHP in the past 6 months on WebKit than I have in 6 years of web development.

For simpler tests that are more in the unit test style (i.e. using assertions), there is a helper framework that makes this easy to set up. The golden file then is just a series of “success” statements.

Given that the layout test infrastructure tests not just rendering/layout, but also unit tests the JavaScript bindings, interactions with the network stack, does order-of-magnitude performance tests, and much more, the name “layout test” is increasingly inaccurate, something that gets discussed occasionally. Because of that flexibility, the layout test model also works well for importing third-party test suites. As part of layout tests, we run the Sputnik JavaScript conformance suite, Philip Taylor’s <canvas> suite, an HTML5 parser suite, and tests from other browser makers.

Generally layout tests accompany all check-ins, especially those that fix bugs (to make sure that the bugs do not reappear). This also means that the first step in fixing a bug is reducing a possibly complex page that triggers the bug to something simpler. If you ever file a bug and it gets the NeedsReduction label, and you’re the author of the web page that exhibits the bug, you’re much better positioned than a WebKit developer to create a minimal reduction. It’s much easier to investigate an issue if it boils down to reloading a page and looking for an alert, or the magical word “PASS”. It also means that if you provide a good reduced test case, you can achieve immortality of a sort, insofar as your test will be run hundreds of times a day.

A follow-up post discusses some realities of the layout test system. To learn even more about them see the WebKit wiki.