Twitter PubSubHubbub Bridge
During the Twitter DDoS attacks, there was a thread on the Twitter API group about using PubSubHubbub to get low latency notifications from Twitter. This would be an alternative to the streaming API that Twitter already has. The response from a Twitter engineer wasn't all that positive, and it is indeed correct that the streaming API already exists and seems to satisfy most developers' needs.
However, my interest was piqued and I thought it might be a useful exercise to see what Twitter PubSubHubbub support could look like. I therefore decided to write a simple bridge between the streaming API and a PubSubHubbub hub. The basic idea was that there would be a simple streaming client that would in turn publish events to a hub. The basic flow would be:
(created using Kushal's Diagrammr)
I'm using FriendFeed as the PubSubHubbub client, but obviously anything else could substitute for it. The "publisher" is where the bulk of the work happens. It uses the statuses/filter streaming API method* to get notified when a user of interest has posted, and then it notifies the reference hub that there is an update. It also has a companion Google App Engine app that serves feeds for Twitter updates. This is both because the hub needs a feed to crawl and because the feed needs to have a <link rel="hub"> element, something which Twitter's own feeds don't have. Unfortunately the publisher itself can't run on App Engine, since the streaming API requires long-lived HTTP connections and App Engine will not let requests execute for more than 30 seconds. I considered using the Task Queue API to create a succession of connections, but that seemed too hacky.
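To make the two publisher-side pieces concrete, here is a minimal Python sketch. It is not the bridge's actual source. It assumes the hub accepts the form-encoded "publish" ping described in the PubSubHubbub spec (hub.mode=publish plus hub.url naming the feed), and the URLs and the helper names build_publish_ping and add_hub_link are made up for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical URLs, for illustration only.
HUB_URL = "https://hub.example.com/"
FEED_URL = "https://feeds.example.com/twitter/some_user"


def build_publish_ping(hub_url, topic_url):
    """Build (without sending) the POST that tells a hub a feed has changed."""
    # The ping carries only the feed URL; the hub then crawls the feed itself.
    body = urllib.parse.urlencode({"hub.mode": "publish", "hub.url": topic_url})
    return urllib.request.Request(
        hub_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def add_hub_link(atom_xml, hub_url):
    """Return the Atom feed with a <link rel="hub"> added as the first child of <feed>."""
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    feed = ET.fromstring(atom_xml)
    link = ET.Element("{%s}link" % ATOM_NS, {"rel": "hub", "href": hub_url})
    feed.insert(0, link)
    return ET.tostring(feed, encoding="unicode")
```

The streaming client would call something like urllib.request.urlopen(build_publish_ping(HUB_URL, FEED_URL)) each time a status arrives for a followed user, while the feed-serving app would run its output through add_hub_link so that subscribers can discover the hub.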
In any case, it all seems to work, as this screencast shows:
On the right is the Twitter UI, where messages are posted. In the middle is the publisher, which receives these messages and relays them to the hub. On the left is FriendFeed, which gets updates from the hub.
Latency isn't great, and as mentioned in the group thread, Twitter might also have to deal with the hub being slow. Part of the reason for the latency is that the hub has to crawl the feed to get at the update, even though the publisher already knows exactly what the update is. This could be fixed by running a custom hub (possibly even by Twitter itself; see the "hub can be integrated into the publisher's content management system" option), with the flow becoming something like:
In the meantime, here's the source to both the publisher and the app.
* This was called the "follow" method until very recently.