Automatically Generated Blogroll from NetNewsWire

NetNewsWire is my newsreader of choice, so I figured that if I were to have a blogroll, the data should come from it (as opposed to a hand-maintained list). It supports exporting its subscriptions to OPML (which could then be transformed to HTML in various ways), but only in a flat format (though group support is coming soon).

Thus, the first step in having an automatically updated blogroll was to get the full hierarchical data out of NNW. Based on this script snippet, we can use the following piece of AppleScript to extract the relevant subscription information:

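-- Accumulate one tab-separated line per subscription.
set output to ""
tell application "NetNewsWire"
  repeat with currentSubscription in every subscription
    tell currentSubscription
      -- Fields: feed URL, name, group flag, parent group name, synthetic flag, home page URL.
      set output to output & RSS URL & "\t" & ¬
                             givenName & "\t" & ¬
                             is group & "\t" & ¬
                             group's givenName & "\t" & ¬
                             synthetic & "\t" & ¬
                             home URL & "\n"
    end tell
  end repeat
end tell
-- The value of the last expression is the script's result.
output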

Although AppleScript could presumably be used to extract the subscription hierarchy, I've left that to a language I'm more comfortable with: Perl. Therefore, we need to get the output of this snippet into a Perl script. There is a Mac::AppleScript module on CPAN, but the 0.04 version number isn't too encouraging, and I generally try to keep the reliance on non-bundled modules to a minimum. However, since Mac OS X 10.2 there has been an osascript CLI program to execute AppleScript snippets, so we can use that instead. This MacDevCenter article contains an _osascript Perl subroutine that opens pipes to/from the CLI app, allowing us to embed the AppleScript snippet above.
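To make the pipe idea concrete, here is a minimal sketch of such a helper. It is not the _osascript subroutine from the article; the name run_applescript and the optional command argument are assumptions made for illustration. It relies on osascript reading the script from standard input when no script file is given, and uses only the bundled IPC::Open2 module.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper (not the article's _osascript): write $script to a
# command's stdin and return everything the command prints on stdout.
# The command defaults to osascript; passing another command is only a
# convenience for trying the helper on a machine without osascript.
sub run_applescript {
    my ($script, @command) = @_;
    @command = ('osascript') unless @command;

    my $pid = open2(my $from_child, my $to_child, @command);
    print {$to_child} $script;
    close $to_child;    # end of input: the child can now run the script

    my $output = do { local $/; <$from_child> };    # slurp all output
    close $from_child;

    waitpid $pid, 0;
    die "command exited with status " . ($? >> 8) . "\n" if $?;
    return $output;
}
```

Calling run_applescript($script) with the AppleScript above stored in $script would return the tab-separated subscription list as one string. Writing the whole script before reading is fine for a short snippet like this one; a very large input would need interleaved reads and writes.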

From the captured AppleScript output, we can extract the necessary hierarchy. However, we don't want to do this (or any of the succeeding steps) needlessly if the list of subscriptions hasn't changed. A very easy way to detect this is to compute an MD5 hash of the output (using the Digest::MD5 module) and compare it with the hash of the previously extracted subscriptions. Only if they're different do we need to proceed. We can then extract the subscriptions and groups and recursively descend them to output a very simple HTML-based hierarchy (i.e. nested <ul> and <li> elements).
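The change check and the recursive descent could look roughly like the sketch below. It is not the code from blogrollUpdate.pl: the function names are made up for illustration, and it assumes that "is group" arrives as the text "true" or "false" and that top-level items have an empty parent group name, neither of which is confirmed here.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# True if $output differs from the previous run ($previous_hash may be undef).
# md5_hex expects bytes, which is what a pipe read without decoding yields.
sub has_changed {
    my ($output, $previous_hash) = @_;
    return !defined $previous_hash || md5_hex($output) ne $previous_hash;
}

# Map each parent group name to the list of entries directly inside it.
# Assumption: top-level entries have an empty parent group name.
sub parse_subscriptions {
    my ($output) = @_;
    my %children;
    for my $line (split /\n/, $output) {
        # The synthetic flag is parsed but not used in this sketch.
        my ($rss, $name, $is_group, $parent, $synthetic, $home) = split /\t/, $line, -1;
        next unless defined $name && length $name;
        push @{ $children{ $parent // '' } }, {
            name     => $name,
            is_group => ($is_group // '') eq 'true',
            home     => $home // '',
        };
    }
    return \%children;
}

sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Recursively emit nested <ul>/<li> markup for everything under $parent.
sub render {
    my ($children, $parent, $depth) = @_;
    return '' if $depth > 16;    # guard against cyclic group data
    my $items = $children->{$parent} or return '';
    my $indent = '  ' x $depth;
    my $html = "$indent<ul>\n";
    for my $item (@$items) {
        my $label = escape_html($item->{name});
        if ($item->{is_group}) {
            $html .= "$indent  <li>$label\n"
                   . render($children, $item->{name}, $depth + 2)
                   . "$indent  </li>\n";
        }
        else {
            my $href = escape_html($item->{home});
            $html .= qq{$indent  <li><a href="$href">$label</a></li>\n};
        }
    }
    return $html . "$indent</ul>\n";
}
```

A caller would read the stored hash from a file, skip everything when has_changed returns false, and otherwise write render(parse_subscriptions($output), '', 0) to the temporary HTML file and save md5_hex($output) for the next run.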

Once we have this HTML representation of the subscription hierarchy, we want to transfer it to the web server where Movable Type resides. The easiest way (assuming your server supports it) is to use the scp command. However, we don't want to embed our password in the script. Therefore, if this isn't already set up, we must first make a public/private key pair and put the public key on the receiving end, as described by this guide, so that we can then have secure uploading (well, as long as the source machine isn't compromised).
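As a rough sketch of the upload step, the script can shell out to scp once key-based login works. The function names, user, host and paths here are placeholders rather than values from the original script. The -B flag puts scp in batch mode so it fails instead of prompting for a password, which is what we want under cron.

```perl
use strict;
use warnings;

# Build the scp argument list (hypothetical helper; all values are placeholders).
# -B: batch mode, never prompt for a password or passphrase; -q: quiet.
sub scp_command {
    my ($local_file, $user, $host, $remote_path) = @_;
    return ('scp', '-B', '-q', $local_file, "$user\@$host:$remote_path");
}

# Run scp without going through a shell and die if the copy fails.
sub upload {
    my @command = scp_command(@_);
    system(@command) == 0
        or die "scp failed with status " . ($? >> 8) . "\n";
    return 1;
}
```

Passing the arguments to system as a list avoids shell quoting problems if a file name contains spaces.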

We could just reference the uploaded page directly, but it's more attractive if we wrap it in a standard template. We can therefore make a "Blogroll" index template, where we can include a standard header, footer and all the other usual stuff. We also need a template module with the name "Blogroll Body", where the "file link" field points to the place where our blogroll gets uploaded automatically, so that it can be included in the "Blogroll" template with the command <$MTInclude module="Blogroll Body"$>.

The trick now is to rebuild this template automatically whenever a new blogroll file is uploaded. We can work up a very simple CGI script that invokes mt-rebuild as follows:

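#!/usr/bin/perl
use strict;
use warnings;

print "Content-type: text/html\n\n";

# Rebuild only the "Blogroll" index template of blog 1.
# Replace the placeholder with the path to your MT install.
`path to your MT install/mt-rebuild.pl -mode="index" -blog_id="1" -template="Blogroll"`;

# Report failure if mt-rebuild.pl exited with a non-zero status.
print $? == 0 ? "rebuilt\n" : "rebuild failed\n";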

It may be desirable to put this CGI in its own directory and protect it with an .htaccess password; otherwise someone who gets hold of its URL could attempt a DoS attack by causing ceaseless rebuilds.

The final step is to have the client-side script invoked periodically. This is typically done via a cron job, which can be installed by using one of the many GUI crontab managers or by simply running crontab -e and adding the following line (which, in this case, runs the script every day at midnight):

0 0 * * *       full path to script/blogrollUpdate.pl

blogrollUpdate.pl is available, and should be put in its own directory where it can also keep the MD5 hash, an error log and the temporary HTML file. The CGI script snippet above goes on the server. The results of the script can be seen by clicking on the Daily Reads link in the sidebar.

It looks like someone else had the same idea, although they chose to do less on the client (just the AppleScript portion) and more on the server (using PHP and MySQL). I'm also currently having a similar encoding issue; this will be something to look at.

As a sidenote, NNW seems to have some difficulty with giving the right parent group if a subscription's name has changed since you subscribed to it (e.g. Dave Hyatt's development journal used to be "Confessions of a Mozillian", but now it's "Surfin' Safari"). If these don't get placed in the right group, you may have to unsubscribe and re-subscribe to convince NNW to do the right thing.

Also, using the term "blogroll" so many times has nearly driven me mad, but there's no better, more recognizable, alternative with the same Google-juice.
