No Such Weblog

Last update: Sun Aug 17 22:48:44 CEST 2003

Links

You may notice that some items in this weblog now have links to "Related Stories". These links are not added manually, but generated automatically from RSS feeds.

The idea behind this is relatively simple, but it seems to work quite well: most stories in blogs are about something that has been published elsewhere, and they contain precise links to the sources from which the blogger is drawing information.

Thus, stories that link to the same pages are most likely related to each other.

In order to generate the list, my blogging script regularly mines a couple of RSS feeds from various weblogs for links. The result is, for each external resource encountered, a list of the weblog items that link to it.

Then, for each item in my blog, the script takes the deep links that actually identify the topic of the story (I tag them in the source file; the tags are not visible on the web) and checks where else these links show up. Each time one of these links occurs in an external weblog item, that item's score is increased. The highest-scoring items are expected to be the most relevant, and are displayed as "Related Stories". The implementation fits into some 200 lines of Perl, which I'm going to publish.
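The mining and scoring steps can be sketched roughly like this. It is not the Perl script itself, which isn't shown here. It is a hypothetical Python illustration with some assumptions of mine. The names FeedItem, build_link_index and related_stories are invented. Each shared link adds exactly one point. The exclude parameter is there so a story doesn't list itself, since this weblog's own feed is among those mined. The URLs are placeholders.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedItem:
    """One item mined from an external RSS feed (hypothetical structure)."""
    permalink: str
    title: str
    links: tuple  # external URLs found in the item's description


def build_link_index(items):
    """Map each external URL to the list of feed items that link to it."""
    index = defaultdict(list)
    for item in items:
        for url in set(item.links):  # count each URL at most once per item
            index[url].append(item)
    return index


def related_stories(topic_links, index, top_n=5, exclude=()):
    """Score feed items by how many of the tagged topic links they share."""
    scores = Counter()
    for url in set(topic_links):
        for item in index.get(url, []):
            if item.permalink not in exclude:  # e.g. skip the story itself
                scores[item] += 1  # assumption: one point per shared link
    # Highest score first; ties broken by title for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0].title))
    return ranked[:top_n]


# Placeholder data standing in for items mined from several feeds.
feed = [
    FeedItem("https://a.example.org/1", "Story A",
             ("https://example.org/report", "https://example.org/other")),
    FeedItem("https://b.example.org/7", "Story B", ("https://example.org/report",)),
    FeedItem("https://c.example.org/3", "Story C", ("https://example.org/unrelated",)),
]
index = build_link_index(feed)

# Topic links tagged on one of my own stories.
for item, score in related_stories(
        ["https://example.org/report", "https://example.org/other"], index):
    print(score, item.title)
```

Here Story A shares both tagged links and ranks first. Story B shares one, and Story C does not show up at all. Weighting rarer links more heavily would be an easy variation, but nothing in the description above suggests that is done.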

Of course, this approach makes some assumptions about the contents of an RSS feed. In particular, each item in the feed must have a link element that points to the item's permanent location, a title (used as the link text), and a description. The description is assumed to contain escaped HTML, which is then mined for links.
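The following is a minimal sketch of mining a single feed under these assumptions, using only Python's standard library. It assumes an RSS 0.9x/2.0 feed without XML namespaces. RSS 1.0 (RDF) feeds would need namespace-qualified element names. The names mine_feed and LinkCollector are hypothetical and not taken from the original script. The XML parser already undoes one level of escaping, so the description text comes out as ordinary HTML markup. An HTML parser then collects the href values.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def mine_feed(rss_xml):
    """Return (permalink, title, links) for each item having all three parts."""
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):  # assumes no XML namespaces (RSS 0.9x/2.0)
        link = item.findtext("link")
        title = item.findtext("title")
        description = item.findtext("description")
        if not (link and title and description):
            continue  # skip items missing a required element
        # ElementTree has decoded &lt; etc., so this is plain HTML now.
        collector = LinkCollector()
        collector.feed(description)
        collector.close()
        results.append((link.strip(), title.strip(), collector.links))
    return results


# Placeholder feed: the second item lacks a description and is skipped.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item>
  <title>Story A</title>
  <link>https://blog.example.org/a</link>
  <description>See &lt;a href="https://example.org/report"&gt;the report&lt;/a&gt;.</description>
</item>
<item><title>No description</title><link>https://blog.example.org/b</link></item>
</channel></rss>"""

for permalink, title, links in mine_feed(SAMPLE):
    print(permalink, title, links)
```

The link lists returned here are what would be collected into the per-resource index used in the scoring step.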

Examples of weblogs which have such feeds include this weblog, Bret Fausett's ICANN.blog, Martin Schwimmer's Trademark.blog, and Alexander Svensson's ICANNchannel.

Tue Jul 30 01:41:10 CEST 2002 #

About

This is the personal blog of Thomas Roessler.

It's mostly used for comments regarding ICANN, and matters of ICANN's Generic Names Supporting Organization and At-Large Advisory Committee (ALAC).