RSS-based PageRank™ Monitoring Tool

Short Version:

Recent revelations have made it very easy to determine the checksum needed to request a URL's PageRank from Google. Some people, for commercial, egotistical or other reasons, want to know when the PR of certain URLs changes. So I've whipped up pagerank2rss.pl, a simple Perl script that outputs the PR of a list of URLs as an RSS feed. Simply drop it in your web server's cgi-bin directory and point your aggregator at it. To change the URLs it monitors, open it in a text editor and edit the %pageRankURLs hash, which maps each URL to its checksum (computed with the help of this site).
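To make the output format concrete, here is a minimal sketch of what the feed-generating stage of such a script might look like. It is written in Python rather than Perl, it is not the actual pagerank2rss.pl, and it assumes the ranks have already been fetched. The `PAGERANK_URLS` dict (a stand-in for `%pageRankURLs`), its placeholder checksums, the feed link and the feed title are all hypothetical.

```python
"""Sketch of an RSS 2.0 feed of PageRank values, served via CGI.

Hypothetical: PAGERANK_URLS, its placeholder checksums, FEED_LINK and
the feed title. Ranks are passed in rather than fetched from Google.
"""
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape

# Hypothetical stand-in for Perl's %pageRankURLs: URL -> checksum.
PAGERANK_URLS = {
    "http://www.example.com/": "6000000000",
    "http://www.example.org/": "6111111111",
}
FEED_LINK = "http://localhost/cgi-bin/pagerank2rss.pl"  # hypothetical


def build_rss(ranks, feed_link=FEED_LINK):
    """Return an RSS 2.0 document with one <item> per monitored URL.

    `ranks` maps each URL to its PageRank (an int), or None if unknown.
    """
    now = format_datetime(datetime.now(timezone.utc))  # RFC 822 date
    items = []
    for url, rank in ranks.items():
        label = "unknown" if rank is None else str(rank)
        items.append(
            "    <item>\n"
            f"      <title>{escape(url)}: PR {label}</title>\n"
            f"      <link>{escape(url)}</link>\n"
            # The guid embeds the rank, so a changed rank looks like a
            # brand-new item to the aggregator.
            f'      <guid isPermaLink="false">{escape(url)}#pr={label}</guid>\n'
            "    </item>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rss version="2.0">\n'
        "  <channel>\n"
        "    <title>PageRank monitor</title>\n"
        f"    <link>{escape(feed_link)}</link>\n"
        "    <description>PageRank of monitored URLs</description>\n"
        f"    <lastBuildDate>{now}</lastBuildDate>\n"
        + "\n".join(items)
        + "\n  </channel>\n</rss>\n"
    )


def cgi_response(body):
    """A CGI program prints its headers, a blank line, then the body."""
    return "Content-Type: application/rss+xml; charset=utf-8\n\n" + body


if __name__ == "__main__":
    # Placeholder ranks; a real script would query each URL with its checksum.
    demo = {url: None for url in PAGERANK_URLS}
    print(cgi_response(build_rss(demo)), end="")
```

Putting the rank into each item's `guid` is one way to get an aggregator to surface a new entry whenever a rank changes; whether the original script does this is not stated.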

Long Version:

Some argue that Google's PageRank is dead. Though that debate is still unsettled (and will stay so unless Google states outright that it is discontinuing PageRank), one PageRank-related thing is in fact dead. Until recently, the only way to get a site's PageRank was the Google Toolbar, which used a private "channel" to fetch the information from Google's servers. That channel appears to be a private URL that takes a page's URL and its checksum as arguments and returns that page's rank.
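The request itself boils down to a URL carrying two arguments. The sketch below only shows how such a request URL could be assembled in Python. The endpoint `rank-server.invalid` and the parameter names `ch` and `q` are hypothetical placeholders, not a description of Google's actual interface, and computing the checksum is out of scope.

```python
"""Sketch: assembling a rank request from a page URL and its checksum.

QUERY_ENDPOINT and the parameter names are hypothetical placeholders.
"""
from urllib.parse import urlencode

QUERY_ENDPOINT = "http://rank-server.invalid/query"  # hypothetical


def rank_query_url(page_url, checksum, endpoint=QUERY_ENDPOINT):
    """Return the request URL for one page/checksum pair."""
    # urlencode percent-escapes the page URL, so its own "?" and "&"
    # are not mistaken for separators in the outer query string.
    params = {"ch": checksum, "q": page_url}
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(rank_query_url("http://www.example.com/?a=1&b=2", "6000000000"))
```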

Lately, this (obfuscated) barrier has been dented, first by Prog (née Proogle) and then by the attempted eBay auction of the (reverse-engineered) algorithm. This culminated today in the public-domain release of an implementation.

What Google will do about this (if anything: Prog has been up and running for a while, and the eBay auction was not stopped, though thankfully no one was clueless enough to bid either) remains to be seen. In the meantime, there are a few interesting things that can be done with this information. The first that came to my mind was a simple script that monitors the PR of URLs and reports the results as an RSS feed. Hooked up to an aggregator, it makes it possible to see when a URL's rank changes. Presumably this is old news to SEO types who already have similar tools, but it may be useful to those who need their egos stroked on a regular basis.
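An aggregator only notices a change when the feed's content changes. If you would rather have the script detect changes itself, one possible approach (not something the original script is described as doing) is to keep the last-seen ranks in a small JSON file and compare against it on each run. The helper names and the snapshot file are hypothetical.

```python
"""Sketch: detecting PageRank changes against a saved JSON snapshot.

Hypothetical design; the snapshot path and helper names are illustrative.
"""
import json
from pathlib import Path


def load_snapshot(path):
    """Return the last saved {url: rank} dict, or {} on the first run."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text())


def diff_ranks(previous, current):
    """Return (url, old_rank, new_rank) for every URL whose rank differs.

    A URL absent from `previous` is reported with old_rank=None.
    """
    return [
        (url, previous.get(url), rank)
        for url, rank in current.items()
        if previous.get(url) != rank
    ]


def save_snapshot(path, ranks):
    """Persist the current ranks for the next run to compare against."""
    Path(path).write_text(json.dumps(ranks, indent=2, sort_keys=True))
```

Each run would load the snapshot, fetch the current ranks, emit a feed item for every tuple `diff_ranks` returns, then call `save_snapshot`. When run as a CGI script, the snapshot file has to be writable by the web server's user.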

The net result is pagerank2rss.pl, a Perl script that can be dropped in a web server's cgi-bin directory. It relies on curl, since that's available by default on Mac OS X, but with a few tweaks it could be made to use wget or LWP. The URLs to monitor are stored in the %pageRankURLs hash, along with their checksums (computed using this handy web interface). This is obviously a 20-minute, quick-and-dirty script: I could have computed the checksums myself, and the list of URLs could be made editable through a friendlier web interface. However, given the limbo-ish status of the methods it relies on, I figured it wasn't worth spending more time on something that might go away soon.
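Swapping curl for wget or a library mostly comes down to how the HTTP request is made. Here is a hedged Python sketch of a pluggable fetch step; `urllib.request` plays roughly the role LWP would play in Perl, and the function names and the `tool` parameter are hypothetical.

```python
"""Sketch: fetching a URL via curl, wget, or an HTTP library.

Hypothetical helper names; only standard curl/wget flags are used.
"""
import subprocess
import urllib.request


def fetch_command(url, tool="curl"):
    """Build the argv that writes the document at `url` to stdout."""
    if tool == "curl":
        # -s: silent (no progress meter); curl writes to stdout by default.
        return ["curl", "-s", url]
    if tool == "wget":
        # -q: quiet; -O -: write the downloaded document to stdout.
        return ["wget", "-q", "-O", "-", url]
    raise ValueError(f"unsupported tool: {tool}")


def fetch(url, tool="curl", timeout=30):
    """Return the response body as text using the chosen backend."""
    if tool == "urllib":
        # Library-based fetch, comparable to using LWP instead of curl.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    # An argv list bypasses the shell, so "&" in the URL needs no quoting.
    result = subprocess.run(
        fetch_command(url, tool),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout
```

Passing an argument list rather than a shell string means URLs containing `&` or `?` need no shell quoting, which is easy to get wrong when shelling out with backticks in Perl.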
