More Scraped Feeds

A while ago I posted about scraped comic feeds. Sometime in the past few months, the one for Frazz disappeared, so I have taken it upon myself to produce an unofficial Frazz feed. Comics.com doesn't make this easy, since they embed what looks like a hash (unless I'm missing some pattern) in each day's comic image. Still, it was reasonably easy to parse the archive pages with Python's htmllib and scrape the necessary URLs.
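For anyone curious what that kind of scrape looks like, here is a minimal sketch. It is not the actual script: it uses html.parser from the Python 3 standard library (htmllib no longer exists there), and the class name, function name, marker substring, and example URLs are all made up for illustration. The idea is just to walk the img tags on an archive page and keep the ones whose src looks like a comic image.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ComicImageParser(HTMLParser):
    """Collect the src of every <img> whose URL contains a marker substring.

    Hypothetical helper: the marker is an assumption about how comic
    images might be told apart from logos, ads, and other page images.
    """

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src and self.marker in src:
            # Resolve relative paths against the page the HTML came from.
            self.image_urls.append(urljoin(self.base_url, src))


def find_comic_images(html, base_url, marker):
    """Return absolute URLs of all matching images found in the HTML text."""
    parser = ComicImageParser(base_url, marker)
    parser.feed(html)
    parser.close()
    return parser.image_urls


# Made-up archive page standing in for the real thing.
SAMPLE_PAGE = """
<html><body>
  <img src="/images/logo.gif">
  <img src="/comics/frazz/archive/images/frazz2005abc123.gif">
</body></html>
"""

comic_urls = find_comic_images(
    SAMPLE_PAGE, "http://example.com/archive.html", "/comics/frazz/"
)
```

Because the hash makes each image URL unpredictable, the scraper has to read the URL off the page rather than construct it from the date.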

In a similar vein, I've created a feed for recordings of Stanford's Computer Systems Laboratory Colloquium. The colloquium draws some interesting speakers, and each lecture is available online in its entirety. The schedule is available, but checking it by hand is tedious. Items tend to show up on the schedule before their recordings are posted, so the scraper script checks for a valid video URL before including an item in the feed.

Update on 11/9/2005: It occurred to me that I never like it when the source for scraping scripts like these is unavailable, since if one of them stops working, someone else willing to pick up the ball has to start from scratch. I've therefore uploaded the Python code that generates these scraped feeds.

1 Comment

FWIW, here is a scrape I do of "This Modern World" by Tom Tomorrow http://bloglines.com/sub/http://www.joegrossberg.com/tmw.rss
