Blog feed aggregation on HTML pages

For the past few days I have been immersed once again in the arcane world of blog feeds: getting my server to read some RSS and Atom feeds, digest the information, and then post it on “regular” web pages so you can get a “taste” of a bunch of blogs and then go see the ones you’re interested in.

Blogs, as you know, are a phenomenon. People have been blogging for a number of years, but just as it took years for email to become an accepted way of life (I first used email in 1970, then again in 1985, and finally in today’s incarnation from 1995 onward), it has taken a few years for blogging to catch on. I think that as people begin to enrich their lives “easily” using tools like blogs and Flickr (for photo exchange), they find that they really want to weave it all together. And blogging is the place where that can happen.

A group of folks that I meet with periodically, called NextNow, recently had a “conversion experience” when on New Year’s Day a dozen of them spent the afternoon with Robert Scoble talking about blogging. Robert “demystified” blogging for the NextNow members who joined him at Doug Engelbart’s home for the afternoon. (I was not there, and I guess that leaves me kinda in the dark.)

The members of the group got so excited that a half-dozen blogs were born within the next 48 hours. In order to support the effort, I decided to aggregate some of the posts from the new member blogs on the front page of the NextNow blog, which I had started in 2003. Aggregating dynamic content on a (static) HTML page is a challenge. HTML pages, by their very nature, are static. The way most blogs are created is 1) some blog-generating software is used (sometimes this is a web site); and 2) the HTML pages are generated by that software and loaded onto your web server. (Sometimes this is done as one “integrated” process where the creation is done on a web site and published instantly to an associated web site – blogger.com does this.) Whichever the case, the published site is a set of static HTML pages.

Although one could “ingest” the feeds from a bunch of blogs and knit those into the HTML home page of a site, you’d only have a snapshot of the blogs as of the date-and-time the HTML page was captured/created. That’s not the best.

Another way to do it is to embed JavaScript in the page that “opens” an online file and uses JavaScript statements to write the current contents of the blogs on the HTML page. Actually, your browser does the work – the JavaScript allows one to embed the contents of one or many online files into what would otherwise be a static HTML page. (I use this process to embed search results from our knowledgebase software into HTML pages.) In the process of researching this I discovered Jawfish, which is one of (many, I think) services that read RSS feeds and provide you with files you can embed in HTML.
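To make the JavaScript approach a little more concrete, here is a minimal sketch in Python of what an RSS-to-JavaScript converter might do on the server side: read an RSS 2.0 document and emit a file of document.write() calls that a browser can run. This is only an illustration of the general idea, not how Jawfish or any other service actually works; the function name rss_to_javascript and the max_items parameter are made up for the example.

```python
import html
import json
import xml.etree.ElementTree as ET


def rss_to_javascript(rss_xml: str, max_items: int = 5) -> str:
    """Turn the items of an RSS 2.0 document into document.write() calls.

    Hypothetical helper for illustration; not any real service's API.
    """
    root = ET.fromstring(rss_xml)
    lines = ['document.write("<ul>");']
    for item in root.findall("./channel/item")[:max_items]:
        # Escape the feed text so it cannot inject markup into the page.
        title = html.escape(item.findtext("title", default="").strip())
        link = html.escape(item.findtext("link", default="").strip(), quote=True)
        fragment = f'<li><a href="{link}">{title}</a></li>'
        # json.dumps yields a correctly quoted JavaScript string literal.
        lines.append(f"document.write({json.dumps(fragment)});")
    lines.append('document.write("</ul>");')
    return "\n".join(lines)
```

The generated text would be saved as a .js file on the server, and the otherwise static HTML page would pull it in with a script tag pointing at that file, so the browser writes the current list of posts each time the page loads.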

GeckoTribe has two other products which I ended up purchasing and using. They also have free GPL-license versions of their tools for individuals, but I needed some advanced capabilities and a commercial license. CaRP is a PHP-based software tool that reads RSS feeds and embeds them in PHP (web) pages. Through a “trick” it is possible to embed PHP within an HTML page and get my server to parse it and serve up a dynamic, up-to-date HTML page. CaRP can be configured many ways – the most useful is its ability to cache the RSS feeds, so it only needs to read them periodically, not every time the page is served up. I configured it to look at the feeds every 6 hours, no matter how many times it served up the combined page.

The second tool, Grouper, is required if any of the blog feeds are Atom rather than RSS. It turns out that Blogger only serves Atom feeds, and because it’s so popular, I needed to parse Atom feeds. After 10 hours or so of figuring all of these things out, I was able to get ’em configured properly. (There’s one glitch, which I assume GeckoTribe will fix: not all of the Atom objects produced by Blogger can be correctly parsed. It doesn’t cause errors; some data just doesn’t get transformed.) Final results can be seen at the NextNow blog.
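For readers who want to see the two ideas above in code (re-reading a feed only every 6 hours, and handling Atom as well as RSS), here is a rough Python sketch. It is not how CaRP or Grouper are implemented; they are PHP tools and I have not reproduced their internals. The names parse_feed and FeedCache are invented for the example, the cache lives in memory only, and the Atom branch assumes the Atom 1.0 namespace, so older Atom variants would need a different namespace string.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # assumption: Atom 1.0 feeds
CACHE_SECONDS = 6 * 60 * 60  # re-read each feed at most every 6 hours

Entries = List[Tuple[str, str]]  # (title, link) pairs


def parse_feed(xml_text: str) -> Entries:
    """Return (title, link) pairs from an RSS 2.0 or Atom 1.0 document."""
    root = ET.fromstring(xml_text)
    entries: Entries = []
    if root.tag == ATOM_NS + "feed":
        for entry in root.findall(ATOM_NS + "entry"):
            title = entry.findtext(ATOM_NS + "title", default="").strip()
            link = ""
            # Atom keeps the URL in an href attribute, not in element text.
            for el in entry.findall(ATOM_NS + "link"):
                if el.get("rel", "alternate") == "alternate":
                    link = el.get("href", "")
                    break
            entries.append((title, link))
    else:
        for item in root.findall("./channel/item"):
            title = item.findtext("title", default="").strip()
            link = item.findtext("link", default="").strip()
            entries.append((title, link))
    return entries


class FeedCache:
    """Hypothetical in-memory cache: refetch a feed only when it is stale."""

    def __init__(self, fetch: Callable[[str], str],
                 max_age: float = CACHE_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch      # function that downloads a URL's text
        self._max_age = max_age
        self._clock = clock
        self._store: Dict[str, Tuple[float, Entries]] = {}

    def get(self, url: str) -> Entries:
        now = self._clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]     # still fresh: serve without refetching
        entries = parse_feed(self._fetch(url))
        self._store[url] = (now, entries)
        return entries
```

The fetch function is passed in rather than hard-coded, so the page-serving code can call get() on every request while the network is touched at most once per feed per 6 hours.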

BTW, Wikipedia has a good list of feed tools, in case you don’t already have a way of picking up RSS and other feeds and putting them “on your desktop.” I use Safari on my Mac OS X systems, and subscribe to Bloglines as well.
