Most visible change #

I've been switching between Google Accounts a lot more lately, and it started to bother me that the login page that we show to mobile WebKit browsers doesn't use the email-specific virtual keyboard on iPhones, making it that much more annoying to type in an @ or a .com. The iPhone (as of iPhone OS 3.1) supports using a type="email" attribute on <input> tags, so I changed the login page template to use that.

It looks like the change is now live, and I'm pretty sure that this is the thing that I've done that will be seen by the most people (thus far).

iPhone screenshot

Of course it doesn't compare with Evan in the characters/user ratio.

Google Reader and Closure Tools #

Since Google Reader makes heavy use of the recently open-sourced Closure Tools, Louis Gray asked me to give a client's perspective on using them. He wrote up a great post summarizing my thoughts, and if you'd like to see the raw input, I've included it below:

There are three pieces to the Closure Tools: the compiler, the library and the template system. They appeared gradually, roughly in that order. The compiler in its current incarnation dates back to Gmail in 2004 (which is why Paul Buchheit refers to it as the "Gmail JavaScript compiler"), with the library and the template system starting a couple of years later.

Reader development started in early 2005, which meant that we always had the compiler available to us, and so except for early prototypes, we always ran our code through it. Until the last month leading up to the Reader launch in October 2005, the size benefits of the compiler were less important, since we were less focused on download time (and performance in general) and more on getting basic functionality up and running. Instead, the extra checks that the compiler does (e.g. if a function is called with the wrong number of parameters, typos in variable names) made it easier to catch errors much earlier. We have set up our development mode for Reader so that when the browser is refreshed, the JavaScript is recompiled on the server and is used with the page when it is reloaded. This results in a tight development loop that makes it possible to catch JavaScript errors as early as possible.

Since Reader development started before the library and template tools were available, we had homegrown code for doing both. There was actually shared code that did some of the same things as basic library functionality (e.g. a wrapper around getting the size of the window, handling different browser versions and quirks). However, that shared code was of various vintages (copied from project to project) and therefore not very consistent in style or quality. Erik Arvidsson's post talks a bit more about the inception of the Closure library (he's one of the co-creators, along with Dan Pupius).

Reader began using the Closure library and template systems gradually, first for new code and then replacing usages of the old shared library and our homegrown code. It was a gradual process, though I tried to keep it organized by doing an audit of all the usages of old code and their Closure equivalents, so that work could more easily be divided up (this was handled during "fixit" periods, where we focus on code quality more than features).

The benefits of the compiler system are tremendous. The most obvious are the size ones: without it Reader's JavaScript would be 2 megabytes, with it it goes down to 513K, and 184K with gzip (the compiler's output is actually optimized for gzip, since nearly all browsers support it). However, all of the above-mentioned checks, as well as many more that have been added over the past few years (especially type annotations), make it much more manageable to have a large JavaScript codebase that doesn't get out of control as it ages and accumulates features.
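To put those numbers in proportion, here is a quick back-of-the-envelope calculation in Python. It assumes that "2 megabytes" means 2048K, which is a guess at a rounded figure, so treat the percentages as approximate.

```python
# Sizes quoted above, in kilobytes. Treating "2 megabytes" as 2048K is an
# assumption; the post only gives the rounded figure.
UNCOMPILED_KB = 2 * 1024
COMPILED_KB = 513
COMPILED_GZIP_KB = 184


def percent_of(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


compiled_pct = percent_of(COMPILED_KB, UNCOMPILED_KB)
gzip_pct = percent_of(COMPILED_GZIP_KB, UNCOMPILED_KB)

print("compiled: %.1f%% of the original size" % compiled_pct)
print("compiled + gzip: %.1f%% of the original size" % gzip_pct)
```

In other words, compilation alone gets the download to roughly a quarter of the original, and gzip takes it to under a tenth.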

The library means that Reader is much less concerned about browser differences, since it tries very hard to hide all those away. Over time, the library has also moved up the UI "stack", going from just basic low level code (e.g. for handling events) to doing UI widgets. This means that it's not a lot of work to do auto-complete widgets, menus, buttons, dialogs, drag-and-drop, etc. in Reader.

One thing to keep in mind is that, as mentioned in the announcement blog post, these tools all started out as 20% projects, and for the most part are still dependent on it. If one project needs a feature from the compiler or the library that doesn't exist, they're encouraged to contribute it, so that other teams can benefit too. To give a specific example, Reader had some home-grown code for locating elements by class name and tag name (a much more rigid and simplified version of the flexible CSS selector-based queries that you can do with jQuery or with the Dojo-based goog.dom.query). As part of the process of "porting" to the Closure library, we realized that though there was an equivalent library function, goog.dom.getElementsByTagNameAndClass, it didn't use some of the more recent browser APIs that could make it much faster (e.g. getElementsByClassName and the W3C Selector API). Therefore we not only switched Reader's code to use the Closure version, but we also incorporated those new API calls in it. This ended up making all other apps faster; it was very nice to get a message from Dan Pupius saying that the change had shaved off a noticeable amount of time in a common Gmail operation.

You can tell that there's something special about this when you look at the ex-Googlers cheering its release. If it had been some proprietary antiquated system that they had all been forced to use, they wouldn't have been so excited that it was out in the open now :)

If you'd like to know more about Closure, I recommend keeping an eye on Michael Bolin's blog. He already has a few posts about what makes it special, and I'm sure there are more coming.

PuSH Bot: PubSubHubbub to XMPP Gateway #

When XMPP support in Google App Engine was announced, it occurred to me that it would be pretty easy to use it to do a PubSubHubbub-to-XMPP bridge. Other things came up, but I was reminded again of the possibility when an "Oh, didn't you see my Reader share?" conversation happened in a Partychat room. A bit of searching turned up someone else with the same idea, except it wasn't quite as user-friendly as it could be (i.e. as user-friendly as a quasi-command-line interface, which isn't saying much).

After a bit of weekend hacking I've created PuSH Bot, which lets you subscribe to any PubSubHubbub-enabled feed and get notified of updates via XMPP (e.g. to your Google Talk account). It has some niceties like use of feed auto-discovery so you can just specify web page URLs, OPML import for bulk subscribing and throttling of updates. The homepage lets you know how to get started.

PuSH Bot Screenshot

The code is available, though there shouldn't be anything too special about it. Feed parsing is done via ROME (as is OPML parsing, though that needed some patching to get working). Feed auto-discovery is handled by the Google AJAX Feed API because life is too short for HTML parsing.
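For a sense of what subscribing involves at the protocol level, here is a minimal Python 3 sketch of the form-encoded body that a subscriber POSTs to a hub. This is not PuSH Bot's actual code: the function name and both URLs are made up for illustration, and the parameter names follow my reading of the PubSubHubbub 0.3 spec (hub.verify in particular is a 0.3-era parameter).

```python
from urllib.parse import urlencode, parse_qs


def build_subscribe_body(topic_url, callback_url):
    """Return the form-encoded body of a PubSubHubbub subscription request."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed to subscribe to
        "hub.callback": callback_url,  # where the hub should deliver updates
        "hub.verify": "async",         # 0.3-era verification mode
    })


# Both URLs are placeholders, not real endpoints.
body = build_subscribe_body(
    "http://example.com/feed.atom",
    "http://example.org/push-callback")
print(body)
```

The sketch only builds the request body; actually sending it, and answering the hub's verification request at the callback URL, is left out.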

Update on March 13, 2015: The App Engine App was migrated to the High Replication Datastore, which necessitated changing its app ID to push-bot-hrd. The URL should still redirect, but chat messages may not, so if you have subscriptions you may need to re-subscribe.

Twitter PubSubHubbub Bridge #

During the Twitter DDoS attacks, there was a thread on the Twitter API group about using PubSubHubbub to get low latency notifications from Twitter. This would be an alternative to the streaming API that Twitter already has. The response from a Twitter engineer wasn't all that positive, and it is indeed correct that the streaming API already exists and seems to satisfy most developers' needs.

However, my interest was piqued and I thought it might be a useful exercise to see what Twitter PubSubHubbub support could look like. I therefore decided to write a simple bridge between the streaming API and a PubSubHubbub hub. The basic idea was that there would be a simple streaming client that would in turn publish events to a hub. The basic flow would be:

Twitter PubSubHubbub flow 1
(created using Kushal's Diagrammr)

I'm using FriendFeed as the PubSubHubbub client, but obviously anything else could substitute for it. The "publisher" is where the bulk of the work happens. It uses the statuses/filter streaming API method* to get notified of when a user of interest has posted, and then it notifies the reference hub that there is an update. It also has a companion Google App Engine app that serves feeds for Twitter updates. This is both because the hub needs a feed to crawl and because the feed needs to have a <link rel="hub"> element, something which Twitter's own feeds don't have. Unfortunately the publisher itself can't run on App Engine since the streaming API requires long-lived HTTP connections, and App Engine will not let requests execute for more than 30 seconds. I considered using the task queue API to create a succession of connections, but that seemed too hacky.
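The two PubSubHubbub-specific pieces here are small. As a rough Python 3 sketch (not the publisher's actual source; the URLs and helper names are placeholders), the feed advertises its hub with a link element, and the publisher pings the hub with a form-encoded POST body naming the feed that changed:

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Placeholder URLs for illustration only.
HUB_URL = "http://hub.example.com/"
FEED_URL = "http://feeds.example.com/twitter/user.atom"


def hub_link_element(hub_url):
    """Return the <link rel="hub"> element that the served feed must include."""
    # quoteattr() escapes the value and adds the surrounding quotes.
    return "<link rel=\"hub\" href=%s/>" % quoteattr(hub_url)


def build_publish_ping(feed_url):
    """Return the form-encoded body telling the hub that feed_url has updates."""
    return urlencode({"hub.mode": "publish", "hub.url": feed_url})


link = hub_link_element(HUB_URL)
ping = build_publish_ping(FEED_URL)
print(link)
print(ping)
```

Note that the ping carries only the feed URL, not the update itself, which is why the hub has to go back and crawl the feed.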

In any case, it all seems to work, as this screencast shows:

On the right is the Twitter UI where messages are posted. In the middle is the publisher which receives these messages and relays them to the hub. On the left is FriendFeed which gets updates from the Hub.

Latency isn't great, and as mentioned in the group thread, Twitter could have to deal with the hub being slow. Part of the reason latency isn't great is that the hub has to crawl the feed to get at the update, even though the publisher already knows exactly what the update is. This could be fixed by running a custom hub (possibly even by Twitter; see the "hub can be integrated into the publisher's content management system" option), with the flow becoming something like:

Twitter PubSubHubbub flow 2

In the meantime, here's the source to both the publisher and the app.

* This was called the "follow" method until very recently.

Twitter Streaming API from Python #

I'm playing around with Twitter's streaming API for a (personal) project. tweetstream is a simple wrapper for it that seemed handy. Unfortunately it has a known issue: the HTTP library that it uses (urllib2) buffers the file object that it creates, which means that responses for low-volume streams (e.g. when using the follow method) are not delivered immediately. The culprit appears to be this line from urllib2 (in the AbstractHTTPHandler class's do_open method):

fp = socket._fileobject(r, close=True)

socket._fileobject does have a bufsize parameter, and its default value is 8192. Unfortunately the AbstractHTTPHandler doesn't make it easy to override the file object creation. As is pointed out in the bug report, using httplib directly would allow this to be worked around, but that would mean losing all of the 401 response/HTTP Basic Auth handling that urllib2 has.

Instead, while holding my nose, I chose the following monkey patching solution:

import socket
import urllib2

# Wrapper around socket._fileobject that forces the buffer size to be 0
_builtin_socket_fileobject = socket._fileobject
class _NonBufferingFileObject(_builtin_socket_fileobject):
  def __init__(self, sock, mode='rb', bufsize=-1, close=False):
    # Ignore the requested bufsize and always pass 0 (unbuffered)
    _builtin_socket_fileobject.__init__(
        self, sock, mode=mode, bufsize=0, close=close)

# Wrapper around urllib2.HTTPHandler that monkey-patches socket._fileobject
# to be a _NonBufferingFileObject so that buffering is not used in the response
# file object
class _NonBufferingHTTPHandler(urllib2.HTTPHandler):
  def do_open(self, http_class, req):
    socket._fileobject = _NonBufferingFileObject
    try:
      # urllib2.HTTPHandler is a classic class, so we can't use super()
      return urllib2.HTTPHandler.do_open(self, http_class, req)
    finally:
      # Always restore the original, even if do_open raises
      socket._fileobject = _builtin_socket_fileobject

Then in tweetstream's urllib2.build_opener() call an instance of _NonBufferingHTTPHandler can be added as a parameter, and it will replace the built-in HTTPHandler.
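The replacement behavior can be checked without making any requests. The sketch below uses Python 3's urllib.request (the successor to urllib2) purely to illustrate how build_opener() swaps out a default handler when given an instance of a subclass. The _PassthroughHTTPHandler class is a hypothetical stand-in that changes nothing, and the socket._fileobject patch itself is specific to Python 2, so this shows only the handler mechanics, not the buffering fix.

```python
import urllib.request


class _PassthroughHTTPHandler(urllib.request.HTTPHandler):
    """Hypothetical stand-in for _NonBufferingHTTPHandler; changes nothing."""


# Passing an instance of an HTTPHandler subclass makes build_opener() leave
# out its own default HTTPHandler.
opener = urllib.request.build_opener(_PassthroughHTTPHandler())

handler_types = [type(h) for h in opener.handlers]
has_default_handler = urllib.request.HTTPHandler in handler_types
has_custom_handler = _PassthroughHTTPHandler in handler_types

print("default HTTPHandler present:", has_default_handler)
print("custom handler present:", has_custom_handler)
```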

Exporting likes from Google Reader #

I started this as another protip comment on this FriendFeed thread about Reader likes but it got kind of long, so here goes:

Reader recently launched liking (and a bunch of other features). One of the nice things about liking is that it's completely public*. It would therefore make sense to be pretty liberal with liking data, and in fact Reader does try to expose liking in our feeds. If you look at my shared items feed you will see a bunch of entries like:


These are the users that have liked the item, each in a <gr:likingUser> element. Users are represented by their IDs, which you can use to generate Reader shared page URLs. More interestingly, you can plug these into the Social Graph API to see who these users are.

Liking information isn't just limited to Reader shared item feeds. If you use Reader's view of a feed, for example The Big Picture's, you can see the <gr:likingUser> elements there too. This means that as a publisher you can extract this information and see which of your items Reader users find interesting.
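As a rough illustration of that extraction, here is a Python 3 sketch that counts likers per entry in an Atom feed. Several things in it are assumptions: the namespace URI bound to the gr: prefix (check it against a real feed), the placement of <gr:likingUser> as a direct child of each entry, and the sample feed itself, which is made up and uses placeholder IDs.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumed namespace URI for the gr: prefix; verify against a real feed.
GR_NS = "http://www.google.com/schemas/reader/atom/"

# Made-up sample data; the IDs are placeholders, not real users.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/schemas/reader/atom/">
  <entry>
    <title>First item</title>
    <gr:likingUser>00000000000000000001</gr:likingUser>
    <gr:likingUser>00000000000000000002</gr:likingUser>
  </entry>
  <entry>
    <title>Second item</title>
  </entry>
</feed>"""


def likers_by_title(feed_xml):
    """Map each entry's title to the list of user IDs that liked it."""
    root = ET.fromstring(feed_xml)
    result = {}
    for entry in root.findall("{%s}entry" % ATOM_NS):
        title = entry.findtext("{%s}title" % ATOM_NS, default="")
        likers = [el.text for el in entry.findall("{%s}likingUser" % GR_NS)]
        result[title] = likers
    return result


likes = likers_by_title(SAMPLE_FEED)
for title, likers in likes.items():
    print("%s: %d like(s)" % (title, len(likers)))
```

Keying on the title keeps the sketch short; real code would more likely key on the entry's id element, since titles aren't guaranteed to be unique.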

For now liking information that is included inline in the feed is limited to 100 users, mainly for performance reasons. That number may go up (or down) as we see how this feature is used. However, if you'd like to get at all of the liker information for a specific item, you can plug in an item ID into the /reader/api/0/likers API endpoint, and then get at it in either JSON or XML formats.

* I've seen some wondering what the difference between liking, sharing and starring is. To some degree that's up to each user, but one nice thing about liking is that it has less baggage associated with it. We learned that if we try to redefine existing behaviors (like sharing) users get upset.

HTML Color OneBox #

Work is the sort of place that cares about specific colors, so HTML color hex triplets come up in conversation quite often. Neil suggested that this should be a OneBox in search results. It occurred to me that this could be done via the Subscribed Link feature that we offer for search results. It turned out that subscribed links can use gadgets, which meant that an inline preview of colors was even possible. Regular expression matching also meant that I didn't have to list out every color by hand. This page has more information on the OneBox, or you can subscribe directly.

Color OneBox preview

Once you have installed this, you can search for things like #fafafa or #ccc and get an immediate preview (in fact, the # can be omitted).
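The matching itself boils down to a small regular expression. The Python sketch below is not the actual Subscribed Link definition, just one way to express "an optional #, then three or six hex digits"; the normalize_hex_color helper is made up for illustration.

```python
import re

# Optional leading '#', then exactly 6 or 3 hex digits.
HEX_COLOR_RE = re.compile(r"^#?([0-9a-fA-F]{6}|[0-9a-fA-F]{3})$")


def normalize_hex_color(query):
    """Return '#rrggbb' for a hex color query, or None if it isn't one."""
    match = HEX_COLOR_RE.match(query.strip())
    if match is None:
        return None
    digits = match.group(1).lower()
    if len(digits) == 3:
        # Expand shorthand by doubling each digit: 'ccc' becomes 'cccccc'.
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits


for query in ["#fafafa", "#ccc", "ccc", "#ggg"]:
    print(query, normalize_hex_color(query))
```

Expanding the three-digit shorthand up front means whatever renders the preview only has to handle one format.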