The FuzzyBlog :: Scott Johnson's Blog

Scott Johnson / The FuzzyGroup, Feedster / PHP Consulting / Random geeky stuff / I Blog Therefore I Am.

Tuesday, July 22, 2003

GGSearch is Coming to Feedster

I've had some interesting IM chats with Pieter, and he's updating GGSearch for Feedster. GGSearch is a client-side (Windows) tool for building Google searches more easily, and it's actually pretty darn cool. It works by building up a search query and then sending it to the browser. Pieter made a conscious decision to avoid the SOAP API for Google, and he's going to apply the *same approach to Feedster: just build up a URL. Send Pieter any comments you have on how a client-side API for Feedster might work.

*Note: We do have a SOAP API for Feedster, although it is currently undocumented.
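For the curious, here's roughly what the "just build up a URL" approach looks like. This is a minimal sketch in Python, not GGSearch's code. The base URL and the `q` parameter name are hypothetical placeholders, since Feedster's search URL format isn't spelled out here.

```python
import webbrowser
from urllib.parse import urlencode

# Hypothetical endpoint: the real Feedster search URL format is an assumption here.
FEEDSTER_SEARCH_BASE = "http://www.feedster.com/search.php"

def build_search_url(terms, base=FEEDSTER_SEARCH_BASE, **options):
    """Build a search URL from a list of terms: no SOAP, just a query string."""
    params = {"q": " ".join(terms)}   # "q" is an assumed parameter name
    params.update(options)            # any extra (hypothetical) query parameters
    return base + "?" + urlencode(params)

def open_search(terms, **options):
    """Hand the finished URL to the user's default browser, as GGSearch does."""
    return webbrowser.open(build_search_url(terms, **options))
```

Calling `open_search(["gnomedex"])` would pop the results page up in the browser. The nice part of this design is that the client never has to parse a response; the browser does all the rendering.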

When: 1:52:21 PM

Robots.txt Support is (Partially) Implemented

I just wanted to let people know that we've now partially implemented robots.txt support. Before anyone starts flaming, let me point out that the main Feedster spider, the thing that checks sites regularly, does obey robots.txt. We fetch it once per week, every Sunday. Please note that there is NO standard for how frequently a crawler should fetch it; this is simply what we decided. If you need more frequent checks, let us know.
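To make the "once a week, every Sunday" policy concrete, here's a minimal sketch using Python's standard `urllib.robotparser`. It assumes a cached robots.txt is stale once a new Sunday has started; the `fetch` hook and the `FeedsterBot` user agent are hypothetical, not how the Feedster spider is actually wired up.

```python
from datetime import datetime, timedelta
from urllib.robotparser import RobotFileParser

def last_sunday(now):
    """Midnight at the start of the most recent Sunday (today, if today is Sunday)."""
    days_since_sunday = (now.weekday() + 1) % 7   # weekday(): Monday=0 ... Sunday=6
    start = now - timedelta(days=days_since_sunday)
    return start.replace(hour=0, minute=0, second=0, microsecond=0)

class WeeklyRobotsCache:
    """Keep one parsed robots.txt per URL and refetch it once a new Sunday arrives."""

    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical hook: robots.txt URL -> file text
        self._cache = {}      # robots.txt URL -> (fetched_at, RobotFileParser)

    def parser_for(self, robots_url, now=None):
        if now is None:
            now = datetime.now()
        entry = self._cache.get(robots_url)
        if entry is None or entry[0] < last_sunday(now):
            parser = RobotFileParser()
            parser.parse(self._fetch(robots_url).splitlines())
            parser.modified()   # record the fetch time; can_fetch() denies everything until set
            entry = (now, parser)
            self._cache[robots_url] = entry
        return entry[1]

    def allowed(self, robots_url, agent, url, now=None):
        """True if robots.txt lets `agent` fetch `url`."""
        return self.parser_for(robots_url, now).can_fetch(agent, url)
```

A crawl loop would call `cache.allowed(robots_url, "FeedsterBot", feed_url)` before each fetch; only the first check after Sunday midnight actually hits the network.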

And please bear in mind that if you use robots.txt to turn off our indexing of your blog (which is fine), we will still periodically check your URL to make sure that you haven't changed your mind. If you want us to just go the heck away, never touch you, etc., then you have to let us know personally, and we'll flip the database bit that says "A real live human made an intelligent reasoned decision to opt out of Feedster so we're never bothering them again unless they specifically ask us to".
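The difference between a robots.txt opt-out and a personal opt-out boils down to a small decision rule. Here's a sketch of it; the field names (`human_opt_out`, `robots_allows_indexing`) and the action strings are hypothetical stand-ins for whatever the real database columns look like.

```python
from dataclasses import dataclass

@dataclass
class Site:
    url: str
    human_opt_out: bool = False          # hypothetical name for the "never bother them again" bit
    robots_allows_indexing: bool = True  # result of the most recent robots.txt check

def crawl_action(site):
    """Decide what the spider does with a site under the opt-out policy described above."""
    if site.human_opt_out:
        return "skip"             # a person asked us to go away: never touch the site
    if not site.robots_allows_indexing:
        return "recheck-robots"   # robots.txt says no, but keep checking in case it changes
    return "index"
```

The point of ordering the checks this way is that the human flag always wins, so a site that opted out personally never even has its robots.txt requested again.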

As per the ill-defined robots.txt spec, we look for robots.txt in the root directory of the host a weblog lives on. If your blog is located at http://radio.weblogs.com/0103807/, this means that we're looking for the file http://radio.weblogs.com/robots.txt. Subdomains and blogs are way, way, way too random for us to check every possible location.
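The usual robots.txt convention is one file per host, at the root of that host, no matter how deep in the path the blog itself sits. A minimal sketch of that lookup with Python's `urllib.parse` (this illustrates the convention, not Feedster's crawler code):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(blog_url):
    """Return the robots.txt URL at the root of the host serving `blog_url`."""
    parts = urlsplit(blog_url)
    # Keep scheme and host (including any port); drop the blog's path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

Everything after the host name is thrown away, which is exactly why a blog living in a subdirectory can't publish its own robots.txt; it has to go in the host's root.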

What our robots.txt support doesn't handle yet is images. That's because we use a separate crawler for images, and that crawler has to do a bunch of path analysis: resolving relative paths against absolute ones and dealing with the whole www vs. no-www issue. So while we're making progress, the current implementation is, indeed, buggy and hence turned off. We'll get it in as soon as possible, but with Gnomedex this week and my traveling to, of all places, Des Moines, Iowa, that isn't really likely to happen soon. At least not a stable, reliable version.
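For a sense of the path analysis involved, here's a rough sketch of the two normalization steps: resolving an image `src` against the page it appears on, and treating `www.example.com` and `example.com` as the same site. Folding away a leading `www.` is an assumption for illustration; robots.txt itself treats them as separate hosts, and the function names here are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

def normalize_host(host):
    """Fold a leading 'www.' so www.example.com and example.com compare equal (an assumption)."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host

def resolve_image(page_url, src):
    """Turn a relative or absolute <img src> into (absolute URL, comparable host key)."""
    absolute = urljoin(page_url, src)
    return absolute, normalize_host(urlsplit(absolute).hostname or "")

def same_site(page_url, src):
    """True if the image lives on the same site as the page, ignoring a leading www."""
    page_host = normalize_host(urlsplit(page_url).hostname or "")
    return resolve_image(page_url, src)[1] == page_host
```

Note that once an image URL is absolute, the robots.txt that governs it is the one on the image's own host, which may not be the host the blog page came from.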

When: 12:12:11 PM

So What About That "Technorati Cosmos", Scott?

I've had a flurry of requests recently asking why Feedster doesn't have a Technorati-like "Cosmos". So over the past few days, with the assistance of Mike from Down Under, who tossed in a critical string parser, I've been implementing one. We're calling it your "Feedster Web". We looked at the whole Nebula / Galaxy / Constellation / etc. family of names, but Richard Soderberg came up with "Web" and that one stuck. While the final code may not be ready until after Gnomedex (there's one bug that makes some domains non-"webbable"), I thought I'd at least give you a preview so you can tell me if we're on track.

See Feedster Web Demo

Please leave comments on this blog entry and I'll see if we can, well, accommodate them.
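At its core, a Cosmos-style feature answers "who links to me?", which means inverting the links found in everyone's posts. Here's a minimal sketch of that inversion, grouped by domain. It uses Python's standard `HTMLParser` in place of whatever string parser Feedster actually uses, and the `(post_url, html)` input format and function names are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkCollector(HTMLParser):
    """Collect absolute href targets from the <a> tags in one post's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))  # resolve relative links

def build_web(posts):
    """Map each linked-to domain to the set of source domains that link to it.

    `posts` is an iterable of (post_url, html) pairs: a hypothetical input format.
    """
    web = defaultdict(set)
    for post_url, html in posts:
        source = urlsplit(post_url).hostname
        collector = LinkCollector(post_url)
        collector.feed(html)
        collector.close()
        for target_url in collector.links:
            target = urlsplit(target_url).hostname
            if target and target != source:   # skip links a site makes to itself
                web[target].add(source)
    return web
```

Looking up your own domain in the result gives the set of sites linking to you, which is the raw material a "Web" page would then rank and display.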

When: 11:40:25 AM