Blogger Migration Part II: Getting Data Into Blogger #

Blogger doesn't have any built-in entry importing facilities. My plan for dealing with this was to use their API to re-post all of the entries I had exported from Movable Type. A quick test showed that such entries could be back-dated, which was my main concern.

Using some sample code that I found as a starting point, it was pretty easy to write a simple importing script for entries. Since I needed to parse the Atom responses from the API (e.g. to get at entry IDs), the Universal Feed Parser came in handy. Since I had used the Convert Line Breaks option on quite a few posts, I had to HTML-ify some post bodies before sending them to Blogger. (I've turned off Blogger's similar setting: besides it being a blog-level setting rather than a per-post one, I've decided that the closer my posts are to regular HTML, the easier it will be to migrate again in the future.)
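To give an idea of what HTML-ifying a post body involves, here is a minimal sketch of a line-break converter. It is not the code from my archive, and the function name `convert_breaks` is made up for illustration. It assumes the simplest interpretation of Convert Line Breaks: blank lines separate paragraphs, and single newlines become `<br />`. Bodies that already contain block-level HTML would need more care than this.

```python
import re


def convert_breaks(text):
    """Approximate a 'convert line breaks' filter (simplified assumption).

    Blank-line-separated blocks are wrapped in <p>...</p>, and single
    newlines inside a block become <br /> tags.
    """
    text = text.replace("\r\n", "\n").strip()
    if not text:
        return ""
    # One or more blank (or whitespace-only) lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    html = []
    for para in paragraphs:
        html.append("<p>" + para.strip().replace("\n", "<br />\n") + "</p>")
    return "\n\n".join(html)
```

Only entries that were exported with the Convert Line Breaks flag set would be passed through something like this; the rest can be sent as-is.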

The Blogger API allows for posting of comments too, but unfortunately without the ability to impersonate others (even when anonymous comments are enabled). The solution was to impersonate the regular (non-API) comment posting form, which does allow for authors to be specified (but no backdating, which is why all imported comments are dated February 10th). This was made slightly trickier since a security token is required (to avoid XSRF attacks), so the posting page had to be scraped first to extract it.
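The token-scraping step can be sketched roughly as follows. This is an illustration rather than the code I actually used: it assumes the token lives in a hidden `<input>` on the comment form, and the field name `securityToken` in the usage note is a placeholder, not necessarily what Blogger's form calls it.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects the name/value pairs of all hidden <input> fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def extract_hidden_fields(page_html):
    """Return a dict of hidden form fields found in the given HTML."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    parser.close()
    return parser.fields
```

After fetching the comment page, something like `extract_hidden_fields(page)["securityToken"]` would give the value to send back along with the author name and comment body.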

Finally, since Blogger proved to be occasionally flaky when doing many API operations in a row, I had to add some simple checkpointing so that if the process failed I could restart it and it would continue from where it left off. Once I did all that, importing 350+ entries and 600+ comments took around an hour, but worked flawlessly.
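The checkpointing idea is simple enough to sketch in a few lines. The names here (`Checkpoint`, `import_all`, the `post` callback) are hypothetical and chosen for this example. The sketch assumes that every item has a stable key and that the list of completed keys is small enough to rewrite to disk after each successful post.

```python
import json
import os


class Checkpoint:
    """Remembers which items have already been imported, across runs."""

    def __init__(self, path):
        self.path = path
        self.done = set()
        if os.path.exists(path):
            with open(path) as f:
                self.done = set(json.load(f))

    def mark_done(self, key):
        self.done.add(key)
        # Write to a temporary file first so that a crash mid-write
        # cannot leave a truncated checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(self.done), f)
        os.replace(tmp, self.path)


def import_all(items, post, checkpoint):
    """Post each (key, item) pair, skipping keys finished in an earlier run."""
    for key, item in items:
        if key in checkpoint.done:
            continue
        post(item)  # may raise if the remote API is having a bad moment
        checkpoint.mark_done(key)
```

If `post` raises partway through, rerunning the script with the same checkpoint file picks up at the first item that was not recorded. An item whose post succeeded but whose checkpoint write failed would be posted twice, which is a trade-off worth keeping in mind.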

I've uploaded an archive of the code that I used for the importing. It's not the cleanest it could be, but others may find it useful too. Additionally, in the time that has passed since I began my importing, it looks like a script has appeared that does a similar import from WordPress, which may be worth a look too.

Understanding feed reader marketshare numbers #

Update on 2/25: FeedBurner has published a post discussing this same issue but providing numbers for their whole userbase, which makes it even more interesting.

Ever since the Reader team announced that we were making public subscriber counts (thanks Justin), bloggers have been excitedly posting about the bumps they're seeing in their subscriber stats. I'm obviously very happy that Reader is getting all this attention, and that we turn out to be quite popular when compared to other feed readers. However, these statistics need a bit of interpretation. Most people post charts of their subscriber counts, like this one for this blog:

FeedBurner subscribers

For web-based readers where feeds are fetched on behalf of multiple users, the subscriber number is based on what the site reports. To the best of my knowledge, with the exception of My Yahoo!, these numbers are total subscribers, even if an account is inactive. Unless the site is aggressive about cleaning up inactive accounts, these numbers are only upper bounds on the number of actual readers that you have.

A more interesting number to look at is how many viewers each item gets from each feed reader. FeedBurner provides this as part of their TotalStats package. By embedding a small tracking image in your burned posts and looking at referrers, it's possible to see these item-specific views. Here are how many views and clicks my post from yesterday got in various feed readers:

FeedBurner item use

From this it would appear that Reader has an even bigger lead over Bloglines (though given the biases in this blog's readership, I'm not reading too much into this). There are other factors involved here too. The user bases for feed readers are not identical, so if an item appeals more to one population than another, that may skew things. Additionally, some readers (especially homepage-style ones like My Yahoo!, Google Personalized Homepage and Netvibes) don't have to display the item body and allow users to jump straight to the post page. These would show up in the "Clicks" column but not in the "Views" one.

What becomes apparent is that none of these statistics provide a complete picture of your readership, but that when used together they can still give you broad trends and help you tailor your content to your audience.

Blogger Migration Part I: Getting Data Out of Movable Type #

The first step in migrating away from Movable Type was to get all of my entries and comments in a structured format that could be parsed and uploaded to Blogger*. MT doesn't hold data hostage: there is a documented import/export format. Six Apart considers the format "lossy", in that it doesn't save a complete snapshot of your blog. I decided that what it did contain was good enough, though it turned out that what it lacked (entry IDs and permalinks) did make things slightly more difficult. A search on code.google.com for Python code to parse the format turned up Transfusion, which does just that (searching for one of the magic strings in the format, CONVERT BREAKS specifically, was the easiest way to track this down).

As I was skimming through the exported entries, I saw that they weren't quite HTML. I had used MTMacro to create various shorthand tags for linking to entries, referencing images, etc. Similarly, I used MTCodeBeautifier to pretty-print code samples. None of these were getting evaluated when exporting, and even if they had been, I probably would have wanted to tweak their output anyway (e.g. to change URLs). Generally, it seemed like the time I had spent customizing my Movable Type installation with cruft-free URLs, plug-ins, etc. would be directly proportional to the time I would have to spend migrating away from it.

One of the more prevalent macros I had used was of the form <entryLink id="NNN">foo</entryLink>, which let me link to my past entries. Unfortunately, because of the missing entry IDs mentioned above, there was no easy way to turn these into actual links: the exported information contained neither entry IDs nor URLs. In the end, I converted these by hand.
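Had a mapping from entry IDs to URLs been available, the conversion could have been scripted along these lines. This is a hypothetical sketch, not something I ran: `urls_by_id` is an assumed hand-built dictionary, and the function name is made up. Macros whose ID is not in the mapping are left untouched and reported, so that they can still be fixed by hand.

```python
import re
from html import escape

# Matches <entryLink id="NNN">label</entryLink>, capturing the ID and label.
ENTRY_LINK = re.compile(r'<entryLink\s+id="(\d+)">(.*?)</entryLink>', re.DOTALL)


def expand_entry_links(html, urls_by_id):
    """Replace entryLink macros with plain links.

    Returns the rewritten HTML and a list of IDs that had no known URL.
    """
    missing = []

    def replace(match):
        entry_id, label = match.group(1), match.group(2)
        url = urls_by_id.get(entry_id)
        if url is None:
            missing.append(entry_id)
            return match.group(0)  # leave the macro in place
        return '<a href="%s">%s</a>' % (escape(url, quote=True), label)

    return ENTRY_LINK.sub(replace, html), missing
```

Even without a complete mapping, running this with an empty dictionary is a quick way to list every entry ID that still needs attention.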

That's it for the exporting part. Part II will cover the Blogger import process, and part III the template/design reasoning.

* the other migration option that I was considering was WordPress. However, the idea of having to do SQL queries to serve traffic didn't seem that appealing given my current provider's slow SQL performance. WordPress.com would have been a hosted option, but if I was going to relinquish control of the installation, it might as well be to a Google product.

Switched to Blogger #

Partly because I was fed up with Movable Type's rebuild times, but also for dog food reasons, I've moved my blog over to Blogger (custom domains was the final missing piece). Redirects should be in place and no links should break, but feed readers will most likely see a bunch of new entries (I didn't see an option in FeedBurner to suppress duplicates). Please leave a comment if you see anything amiss.

I'll be writing up more details this coming week about the work that was necessary to migrate.