Using Google Reader's reanimated corpse to browse archived data
Having gotten all my data out of Google Reader, the next step was to do something with it. I wrote a simple tool to dump data given an item ID, which let me do spot checks that the archived data was complete. A more complete browsing UI was needed, but this proved to be slow going. It's not a hard task per se, but the idea of re-implementing something that I worked on for 5 years didn't seem that appealing.
It then occurred to me that Reader is a canonical single page application: once the initial HTML, JavaScript, CSS, etc. payload is delivered, all other data is loaded via relatively straightforward HTTP calls that return JSON (this made adding basic offline support relatively easy back in 2007). Therefore, if I served the archived data in the same JSON format, I should be able to browse it using Reader's own JavaScript and CSS. Thankfully this all occurred to me the day before the Reader shutdown, so I had a chance to save a copy of Reader's JavaScript, CSS, images, and basic HTML scaffolding.
zombie_reader is the implementation of that idea. It's available as another tool in my readerisdead.com collection. Once pointed at a directory with an archive generated by reader_archive, it parses it and starts an HTTP server on port 8074. Beyond serving the static resources that were saved from Reader, the server uses web.py to implement a minimal (read-only) subset of Reader's API.
The tool required no modifications to Reader's JavaScript or CSS beyond fixing a few absolute paths¹. Even the alternate header layout (without the Google+ notification bar) was natively supported by Reader (for the cases where the shared notification code couldn't be loaded). It also only uses publicly-served (compressed/obfuscated) resources that had been sent to millions of users over the past 8 years. As the kids say these days, no copyright intended.
A side effect is that I now have a self-contained Reader installation that I'll be able to refer to years from now, when my son asks me how I spent my mid-20s. It also satisfies my own nostalgia kicks, like knowing what my first read item was. In theory I could also use this approach to build a proxy that exposes Reader's API backed by (say) NewsBlur's, and thus keep using the Reader UI to read current feeds. Beyond the technical issues (e.g. impedance mismatches, since NewsBlur doesn't store read or starred state as tags, nor does it have per-item tags in general), that seems like an overly backwards-facing option. NewsBlur has its own distinguishing features (e.g. training and "focus" mode)², and forcing it into a semi-functional Reader UI would result in something that is worse than either product.
1. And changing the logo to make it more obvious that this isn't just a stale tab from last week. The font is called Demon Sker.
2. One of the reasons why I picked NewsBlur is that it has been around long enough to develop its own personality and divergent feature set. I'll be the first to admit that Reader had its faults, and it's nice to see a product that tries to remedy them.
17 Comments
Presumably my vanilla laptop doesn't have all the necessary dependencies. Any ideas where to get the missing module from?
I have a lot of feeds that I don't need to keep (in fact, if I had thought about it, I would have unsubscribed from them before I started the archive). Can I remove them from my archive by "unsubscribing" in zombie_reader? If not, is there another way to do it?
You'll need to write a custom tool to remove items from the archive. Start by walking through the streams/ directory to find the feed whose items you'd like to remove. That'll give you a list of item IDs. Then you can find the item bodies in the items directory (grouped by the first 4 characters of the ID). Each item file may contain multiple items, so you'll need to rewrite it to remove the one(s) you don't want.
First, I copied the "base" folder to my Python27 directory in C:, and then in cmd I ran:
C:\mihaip-readerisdead>reader_archive\reader_archive.py --output_directory C:\greader
And the data started happily flowing.
So, for zombie_reader I figured I could do the same thing. I downloaded the latest .zip from readerisdead, replaced my old one and copied the new "base" folder to my Python directory. Then in cmd I ran:
C:\mihaip-readerisdead>zombie_reader\zombie_reader.py C:\greader
It returns: ImportError: No module named third_party.web
Any ideas or thoughts on how to get this to run?
C:\mihaip-readerisdead>bin\zombie_reader.bat C:\greader
@mihai
Mihai, thank you very much for this tool, among all, it's useful as reference for future attempts to build feed reader software. :)
Also - maybe it's possible to add some stub data on GitHub? Asking because if no data is saved through reader_archive, zombie_reader won't launch...
I also ran into the import error. I'm not sure it helps, but I made sure to set PYTHONPATH to the current directory, which seemed to alleviate the issue (although I was messing around with some other things as well).
In Windows 7, assuming the zip is extracted to readerisdead and the archive was put into a "download" folder in the same directory as the zombie_reader folder, the commands below seem to work well:
cd C:\Users\Admin\Desktop\readerisdead
set PYTHONPATH=%CD%
c:\Python27\python zombie_reader\zombie_reader.py download
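The PYTHONPATH workaround works because the "No module named third_party.web" error just means Python can't find the checkout root on its module search path; the bin\*.bat wrappers do the equivalent for you. A hypothetical bootstrap helper (the function name and two-levels-up layout assumption are mine) showing the same fix done in code:

```python
import os
import sys


def add_repo_root_to_path(script_path):
    """Prepend the readerisdead checkout root to sys.path.

    Assumes the script lives one directory below the root (e.g.
    zombie_reader/zombie_reader.py), so the root is two dirname()
    calls above the script's absolute path. With the root on
    sys.path, imports like third_party.web resolve regardless of
    the current working directory.
    """
    repo_root = os.path.dirname(
        os.path.dirname(os.path.abspath(script_path)))
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root
```

Setting PYTHONPATH in the shell, as above, achieves the same thing without touching any code.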
What is enlightening is that the archive I got via the Takeout tool was 180 MB, and the one with your tool is over 3 GB. That is a huge gap.
I am able to run the script, but I'm getting the following error:
[I 130710 09:07:40 zombie_reader:198] Loading archive for lcmello@XXXX
Traceback (most recent call last):
File "zombie_reader\zombie_reader.py", line 319, in <module>
main()
File "zombie_reader\zombie_reader.py", line 184, in main
_load_archive_data(archive_directory)
File "zombie_reader\zombie_reader.py", line 199, in _load_archive_data
_load_friends()
File "zombie_reader\zombie_reader.py", line 203, in _load_friends
friends = [base.api.Friend.from_json(t) for t in _data_json('friends.json')]
AttributeError: type object 'Friend' has no attribute 'from_json'
Does anyone have any idea how to fix this?
Thank you very much.
Also, if you had an older version of the code from when reader_archive was released, you'll need to update all of it, not just the zombie_reader directory.
The second part may be the explanation, though. I have updated more than the zombie_reader directory, but I did it manually and may have left something out. I'll update the whole thing and let you guys know if it worked.
Thanks!