Twitter PubSubHubbub Bridge

During the Twitter DDoS attacks, there was a thread on the Twitter API group about using PubSubHubbub to get low-latency notifications from Twitter, as an alternative to Twitter's existing streaming API. The response from a Twitter engineer wasn't all that positive, and it's true that the streaming API already exists and seems to satisfy most developers' needs.

However, my interest was piqued, and I thought it might be a useful exercise to see what Twitter PubSubHubbub support could look like. I therefore decided to write a simple bridge between the streaming API and a PubSubHubbub hub: a small streaming client that would in turn publish events to a hub. The basic flow would be:

Twitter PubSubHubbub flow 1
(created using Kushal's Diagrammr)

I'm using FriendFeed as the PubSubHubbub subscriber, but anything else could substitute for it. The "publisher" is where the bulk of the work happens. It uses the statuses/filter streaming API method* to get notified when a user of interest posts, and then it notifies the reference hub that there is an update. It also has a companion Google App Engine app that serves feeds for Twitter updates. The app is needed both because the hub needs a feed to crawl and because that feed needs a <link rel="hub"> element, which Twitter's own feeds don't have. Unfortunately, the publisher itself can't run on App Engine: the streaming API requires long-lived HTTP connections, and App Engine will not let a request run for more than 30 seconds. I considered using the Task Queue API to create a succession of connections, but that seemed too hacky.
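To make the publisher's job concrete, here is a minimal sketch in Python, not the actual source linked at the end of the post. It assumes the stream arrives as one JSON status per line, with blank keep-alive lines and an author ID at status["user"]["id"]. The helper names (statuses_from, atom_feed, publish_ping_body, send_ping) and both URLs are hypothetical. Only the hub.mode=publish and hub.url form parameters come from the PubSubHubbub protocol.

```python
import json
import urllib.request
from urllib.parse import urlencode
from xml.sax.saxutils import escape, quoteattr

# Assumed endpoints: the reference hub and a made-up feed URL for the bridge app.
HUB_URL = "https://pubsubhubbub.appspot.com/"
FEED_URL = "https://bridge.example.com/feed/12345"


def statuses_from(lines, tracked_user_ids):
    """Yield decoded statuses posted by tracked users.

    Assumes one JSON object per line; blank lines are treated as
    keep-alives and skipped, as are messages without a "user" (e.g. deletes).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        user = message.get("user")
        if user and user.get("id") in tracked_user_ids:
            yield message


def atom_feed(feed_url, hub_url, statuses):
    """Render a skeletal Atom feed that advertises its hub.

    A valid Atom feed also needs feed-level id, title, updated and author
    elements; they are omitted here to keep the sketch short.
    """
    entries = "".join(
        "<entry><id>tag:twitter.com,2009:%d</id><title>%s</title></entry>"
        % (s["id"], escape(s["text"]))
        for s in statuses
    )
    return (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        '<link rel="self" href=%s/>'
        '<link rel="hub" href=%s/>'
        "%s</feed>" % (quoteattr(feed_url), quoteattr(hub_url), entries)
    )


def publish_ping_body(feed_url):
    """Form body telling the hub that feed_url has new content to crawl."""
    return urlencode({"hub.mode": "publish", "hub.url": feed_url})


def send_ping(hub_url, feed_url):
    """POST the publish ping to the hub and return the HTTP status code."""
    request = urllib.request.Request(
        hub_url,
        data=publish_ping_body(feed_url).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

In this sketch the feed is rendered from the filtered statuses, and the ping carries only the feed URL. The hub still has to fetch the feed itself, which is the crawl step discussed below.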

In any case, it all seems to work, as this screencast shows:

On the right is the Twitter UI, where messages are posted. In the middle is the publisher, which receives these messages and relays them to the hub. On the left is FriendFeed, which gets updates from the hub.

Latency isn't great, and, as mentioned in the group thread, Twitter could end up having to deal with the hub being slow. Part of the reason latency isn't great is that the hub has to crawl the feed to get at the update, even though the publisher already knows exactly what the update is. This could be fixed with a custom hub (possibly even one run by Twitter itself; see the option where the hub is integrated into the publisher's content management system), with the flow becoming something like:
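To show why skipping the crawl helps, here is a rough sketch of the fan-out step in such a custom hub. It is not the reference hub's code or anything Twitter runs. It assumes a PubSubHubbub 0.3-style hub that receives the updated feed directly from the publisher (a "fat ping"). The hub then builds one content-distribution POST per subscriber callback and, when the subscriber supplied a hub.secret, adds an X-Hub-Signature header (HMAC-SHA1 of the body). The Subscription type and delivery_requests function are hypothetical names.

```python
import hashlib
import hmac
import urllib.request
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subscription:
    callback: str          # subscriber's callback URL (hypothetical)
    secret: Optional[str]  # hub.secret given when subscribing, if any


def delivery_requests(feed_xml: str, subscriptions: List[Subscription]):
    """Build one content-distribution POST per subscriber.

    The publisher handed the hub the updated feed directly, so there is
    no fetch of the feed between the publisher's ping and this fan-out.
    """
    body = feed_xml.encode("utf-8")
    outgoing = []
    for sub in subscriptions:
        headers = {"Content-Type": "application/atom+xml"}
        if sub.secret:
            # Lets the subscriber check that the content came from the hub.
            digest = hmac.new(sub.secret.encode("utf-8"), body, hashlib.sha1).hexdigest()
            headers["X-Hub-Signature"] = "sha1=" + digest
        outgoing.append(
            urllib.request.Request(sub.callback, data=body, headers=headers, method="POST")
        )
    return outgoing
```

Each request could then be sent with urllib.request.urlopen. A real hub would also need retries and handling for slow or failing subscribers, which this sketch leaves out.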

Twitter PubSubHubbub flow 2

In the meantime, here's the source for both the publisher and the app.

* This was called the "follow" method until very recently.

9 Comments

In that API group thread, Kalucki said "Technically, someone could build a service to consume from the Streaming API and push into PubSubHubBub. This would be against the EULA though."

That's what you're doing here, right? I haven't read through the EULA, but I'd be interested in where the conflict lies (since I'd love to build this functionality into Twitalytic).
Link to pubsubhubbub is busted.
Interesting way to connect Twitter real-time streams in a PubSubHubbub way. I'm interested in real-time streaming implementations on a conversational/interaction level (the openff folks are doing their best to build open social media).

If you're interested, Mihai, check it out:
http://friendfeed.com/openff
http://openff.org/wiki
Mihai - there's a (non-standard) PSHB fatpublish API that accepts the feed content in the ping so that the feed doesn't need to be crawled - http://code.google.com/p/pubsubhubbub/source/browse/trunk/nonstandard/fat_publish.py
@Anonymous, thanks, fixed.

@gina, I haven't been able to find this EULA; presumably Kalucki is referring to the agreement you sign when you get broader access to the API (I'm using the endpoint that allows tracking a maximum of 400 users).
Hey, interesting post.
I was googling the very same thing you've made and came across your post.
Any changes regarding implementing this now, or has everything remained the same?
Also (noob question alert), why does the publisher send to both the hub and the app?
@zsquare: The publisher sends to both the hub and the app so that the feed can get populated. If it used the fatpublish API that Ivan mentioned above, it could get away with just notifying the hub (and avoiding the hub -> app crawl).

As for whether this can be built now, I haven't seen any change in the EULA/ToS from Twitter, but I haven't followed this that closely either.
I take it this isn't running anymore? The app just says "hello world" now :)
@openid: Yeah, this wasn't meant to stay up longer than it took me to record the screencast :) The source (for everything, including the App Engine app) is available though (link in the last paragraph) if you want to run it yourself.
