Using Google Reader's reanimated corpse to browse archived data
Having gotten all my data out of Google Reader, the next step was to do something with it. I wrote a simple tool to dump data given an item ID, which let me do spot checks that the archived data was complete. A more complete browsing UI was needed, but this proved to be slow going. It's not a hard task per se, but the idea of re-implementing something that I worked on for 5 years didn't seem that appealing.
It then occurred to me that Reader is a canonical single page application: once the initial HTML, JavaScript, CSS, etc. payload is delivered, all other data is loaded via relatively straightforward HTTP calls that return JSON (this made adding basic offline support relatively easy back in 2007). Therefore, if I served the archived data in the same JSON format, I should be able to browse it using Reader's own JavaScript and CSS. Thankfully this all occurred to me the day before the Reader shutdown, so I had a chance to save a copy of Reader's JavaScript, CSS, images, and basic HTML scaffolding.
zombie_reader is the implementation of that idea. It's available as another tool in my readerisdead.com collection. Once pointed at a directory with an archive generated by reader_archive, it parses it and starts an HTTP server on port 8074. Beyond serving the static resources that were saved from Reader, the server uses web.py to implement a minimal (read-only) subset of Reader's API.
The tool required no modifications to Reader's JavaScript or CSS beyond fixing a few absolute paths¹. Even the alternate header layout (without the Google+ notification bar) was natively supported by Reader (for the cases where the shared notification code couldn't be loaded). It also only uses publicly-served (compressed/obfuscated) resources that had been sent to millions of users over the past 8 years. As the kids say these days, no copyright intended.
A side effect is that I now have a self-contained Reader installation that I'll be able to refer to years from now, when my son asks me how I spent my mid-20s. It also satisfies my own nostalgia kicks, like knowing what my first read item was. In theory I could also use this approach to build a proxy that exposes Reader's API backed by (say) NewsBlur's, and thus keep using the Reader UI to read current feeds. Beyond the technical issues (e.g. impedance mismatches, since NewsBlur doesn't store read or starred state as tags, nor does it have per-item tags in general), that seems like an overly backwards-facing option. NewsBlur has its own distinguishing features (e.g. training and "focus" mode)², and forcing it into a semi-functional Reader UI would result in something that is worse than either product.
1. And changing the logo to make it more obvious that this isn't just a stale tab from last week. The font is called Demon Sker.
2. One of the reasons why I picked NewsBlur is that it has been around long enough to develop its own personality and divergent feature set. I'll be the first to admit that Reader had its faults, and it's nice to see a product that tries to remedy them.
17 Comments
Presumably my vanilla laptop doesn't have all the necessary dependencies. Any ideas where to get the missing module from?
I have a lot of feeds that I don't need to keep (in fact, if I had thought about it, I would have unsubscribed from them before I started the archive). Can I remove them from my archive by "unsubscribing" in zombie_reader? If not, is there another way to do it?
You'll need to write a custom tool to remove items from the archive. Start by walking through the streams/ directory to find the feed whose items you'd like to remove. That'll give you a list of item IDs. Then you can find the item bodies in the items directory (grouped by the first 4 characters of the ID). Each item file may contain multiple items, so you'll need to rewrite it to remove the one(s) you don't want.
First, I copied the "base" folder to my Python27 directory in C:, and then in cmd I ran:
C:\mihaip-readerisdead>reader_archive\reader_archive.py --output_directory C:\greader
And the data started happily flowing.
So, for zombie_reader I figured I could do the same thing. I downloaded the latest .zip from readerisdead, replaced my old one and copied the new "base" folder to my Python directory. Then in cmd I ran:
C:\mihaip-readerisdead>zombie_reader\zombie_reader.py C:\greader
It returns: ImportError: No module named third_party.web
Any ideas or thoughts on how to get this to run?
C:\mihaip-readerisdead>bin\zombie_reader.bat C:\greader
@mihai
Mihai, thank you very much for this tool, among all, it's useful as reference for future attempts to build feed reader software. :)
Also - maybe it's possible to add some stub data on GitHub? Asking because if no data is saved through reader_archive, zombie_reader won't launch...
I also ran into the import error. I'm not sure it helps, but I made sure to set PYTHONPATH to the current directory, which seemed to alleviate the issue (although I was messing around with some other things as well).
In Windows 7, assuming the zip is extracted to readerisdead and the archive was put into a "download" folder in the same directory as the zombie_reader folder, the commands below seem to work well:
cd C:\Users\Admin\Desktop\readerisdead
set PYTHONPATH=%CD%
c:\Python27\python zombie_reader\zombie_reader.py download
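The PYTHONPATH workaround works because the "No module named third_party.web" error just means Python can't find the checkout root on its module search path; the bin\*.bat wrappers do the equivalent for you. A hypothetical bootstrap helper (the function name and two-levels-up layout assumption are mine) showing the same fix done in code:

```python
import os
import sys


def add_repo_root_to_path(script_path):
    """Prepend the readerisdead checkout root to sys.path.

    Assumes the script lives one directory below the root (e.g.
    zombie_reader/zombie_reader.py), so the root is two dirname()
    calls above the script's absolute path. With the root on
    sys.path, imports like third_party.web resolve regardless of
    the current working directory.
    """
    repo_root = os.path.dirname(
        os.path.dirname(os.path.abspath(script_path)))
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root
```

Setting PYTHONPATH in the shell, as above, achieves the same thing without touching any code.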
What is enlightening is that the archive I got via the Takeout tool was 180 MB, and the one with your tool is over 3 GB. That is a huge gap.
I am able to run the script, but I'm getting the following error:
[I 130710 09:07:40 zombie_reader:198] Loading archive for lcmello@XXXX
Traceback (most recent call last):
File "zombie_reader\zombie_reader.py", line 319, in <module>
main()
File "zombie_reader\zombie_reader.py", line 184, in main
_load_archive_data(archive_directory)
File "zombie_reader\zombie_reader.py", line 199, in _load_archive_data
_load_friends()
File "zombie_reader\zombie_reader.py", line 203, in _load_friends
friends = [base.api.Friend.from_json(t) for t in _data_json('friends.json')]
AttributeError: type object 'Friend' has no attribute 'from_json'
Does anyone have any idea how to fix this?
Thank you very much.
Also, if you had an older version of the code from when reader_archive was released, you'll need to update all of it, not just the zombie_reader directory.
The second part may be the explanation, though. I have updated more than the zombie_reader directory, but I did it manually and may have left something out. I'll update the whole thing and let you guys know if it worked.
Thanks!