Yesterday, Technorati released Microformats Search. My first thought was something like "finally..." It's been six months since Google released Google Base. At that time many people were pointing out that submission-based search isn't going to work in the long term because it takes more work than crawl-based search, and all other things being equal, laziness wins. The advantage of Google Base over traditional search was that it used structured data, so the obvious solution was to make the web more structured.

Microformats make the web more structured, so I thought it would be interesting to see how much structured data a big search company like Google could hope to find by crawling instead of asking for submissions. I made Microformat Base, and before long, the microformats community, the broader semantic web community, and the entire world were ... completely ignoring it. No one used it. No one talked about it. No one copied the open source.

Well, that's not entirely true. A few people on the microformats discussion list said some nice things about it. But the conversation there quickly went back to what would later become hAtom. I went on with my life, playing with other technologies in my spare time. I hoped maybe in another year or two, someone with some venture capital would pick up the idea and make the web a more interesting place.

Then yesterday, as I said, Technorati released Microformats Search, and I thought "finally..." I think I have a pretty good track record for predicting where technologies are headed, and I continue to be annoyed by how long it takes the rest of the world to catch up to my imagined future. I didn't expect Technorati to act as quickly as it did, but I made Microformat Base in a day, so it really shouldn't have taken them six months.

I was happy Technorati had caught up, but then I started reading what Technorati was writing about it. The first thing I read was on the blog of Tantek Çelik. I subscribe to his blog because he talks about interesting technology, and I like to pay attention to where things are heading. Tantek wrote, "I invite you to come take a look at this first of a kind realtime microformats search engine." At that point, I thought, "Hmm...that's an odd way of phrasing that. It almost sounds like he has no idea that I ever made Microformat Base..." Then Tantek sent an email to the microformats discussion list, writing, "There are some indexers of specific microformats right now (e.g. Reevoo and Kritx both index hReviews), but no general microformats search engine." At that point, I realized that Tantek really did have no idea I had made Microformat Base, which was surprising because I knew he had previously commented on it.

I wrote a response, saying, "Hmm... I'm pretty sure I was indexing contacts, events, and reviews several months ago...I'd assume you missed that, except that you commented on it." And Kevin Marks, who also works at Technorati, responded to that with, "Great stuff Scott, do you want to get pings relayed?" At this point, I was trying to be charitable with my take on what was going on here, but it really looked to me like Technorati was intentionally ignoring what I had done, except where they realized that I could be feeding them data.

So I wrote that in response:

What I didn't expect was this feeling that microformats are increasingly just another product owned and sold by Technorati. I'm disappointed that Technorati has apparently developed selective amnesia here regarding others' work. Tantek says "Technorati believes in the voice of the individual," but here I am, an individual, and everyone from Technorati is pretending like I don't exist except where I could contribute more data toward Technorati's profit. I have no doubt that if I had done the same work at a corporation, I wouldn't be seeing phrases like "no general microformats search engine" and "first of a kind" coming out of Technorati. And I'm certainly not the only individual who has worked on this. Dozens of individuals helped lay the groundwork for Technorati's newest product, but not a single one is acknowledged in Technorati's discussion of the Microformats Search - only corporations. This will certainly make me think twice before experimenting further with microformats in my free time.

And I sent it off and thought "bridges: burninated!" I'm not working at one of Technorati's partners, so if microformats really are a product Technorati intends to claim or imply ownership of, I have little to lose by criticizing this trend. I was content to do so in the pseudo-privacy of the microformats email list, but then Tantek responded on his public blog.

He wrote a long post, starting with "Mea culpa," including my name nine times, linking to me five times, and going on about how great it is that web workers of the world are uniting under the microformats flag of data freedom or something like that. And I guess I'm happy that Tantek is now reaffirming my romantic notion of what the web could be. The only problem is, I don't really believe it any more.

As I pointed out yesterday, a microformat search engine isn't the first project I've done a proof-of-concept for that later became a successful part of the web. In fact, nearly everything I've ever done online has followed this pattern, going back ten years to when I edited the raw compiled source of a browser plugin (a simpler task back then) to allow users to add any search engine they wanted to a field in their browser and released it for free as "AnySearch Extras". It's now ten years later, and Firefox is the most popular browser to have this same functionality built in. So about 10% of the world has caught up to what I was waiting for ten years ago.

I'm tired of waiting for the web to pay attention. The web is awful at paying attention. One might think Technorati would be a little better at paying attention than others, given what it does and its ownership of a trademark on the phrase "attention index." But experience suggests Technorati is just like the rest of the web. Interesting technology doesn't get the web's attention. Open source and open data don't get the web's attention. I've been doing both for several years. The web hasn't noticed. As Christian Montoya recently observed, money is what gets the web's attention.

It's not just that Tantek originally gave thanks only to corporations with money and ignored all the individuals working on microformats. Tantek's a busy guy, so he forgets things. But the whole web today gave notice to Technorati, with money, and ignored an individual who did the same thing, without money. Tantek wrote:

Companies take note - on the internet, there will always be smarter, more clever people building on each other's work than your secret internal committees, your architecture councils, your internal discussion forums -- no matter how many supergeniuses you think you may have hired away and locked up with golden shackles in your labs.

This has long been a popular mythology on the web, but I no longer believe it. I am the prototypical clever person building on others' work and encouraging others to build on mine. Companies can safely ignore me.

 

I want to find one that is good for searching but which also isn't in the business of turning this into this.

reynir, on Ask MetaFilter

This whole Google censorship thing was less disturbing as an abstract concept. Looking at the pictures, it's hard to avoid the conclusion that Google is exchanging principles for money.

 

At first I didn't like the results Google recently started inserting for searches I maybe should have made instead of what I actually searched for. I'm pretty smart, you see, and I don't need to be bothered by Google treating me like a fool, assuming I don't know what I'm looking for.

And that was basically my thinking up until I searched for something unfamiliar and wasn't entirely clear what I was looking for, and Google gave me some results for what I would have been searching for if I knew what I was doing. At that point I found the functionality very useful.

 

Google has begun returning results for things you didn't type, but maybe should have. For example, if you search for "opera," you get a block of results for "oprah." It looks like it will benefit people who don't know what they're looking for, and probably annoy people who do. I don't like it yet.

I was testing this out with random words when I typed in 'search' and saw that Google returned itself as the top result. And then I thought it would be interesting to try a search for 'search' on various search engines. It turns out Google is the only search engine that returns itself as the top result for 'search.'

Search Engine    Top Result
Google           Google
Altavista        Search.com
Yahoo            Search.com
Search.com       MSN Search
MSN Search       Google
AOL Search       Google
Lycos            Search.com
Ask Jeeves       Lycos
Dogpile          Altavista
Excite           Google

Google is also the most popular search engine among search engines, just slightly ahead of Search.com.
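
For anyone who wants to check the count, here is a minimal sketch that tallies the table above. The data is copied straight from the table; the variable names (top_results, tally, self_ranked) are just my own labels for this illustration.

```python
from collections import Counter

# Top result each engine returned for the query 'search',
# copied from the table above (engine -> its top result).
top_results = {
    "Google": "Google",
    "Altavista": "Search.com",
    "Yahoo": "Search.com",
    "Search.com": "MSN Search",
    "MSN Search": "Google",
    "AOL Search": "Google",
    "Lycos": "Search.com",
    "Ask Jeeves": "Lycos",
    "Dogpile": "Altavista",
    "Excite": "Google",
}

# How many engines name each site as their top result.
tally = Counter(top_results.values())

# Engines whose top result for 'search' is themselves.
self_ranked = [engine for engine, top in top_results.items() if engine == top]

for engine, count in tally.most_common():
    print(f"{engine}: {count}")
print("Returns itself first:", self_ranked)
```

By this count Google is the top result on four of the ten engines and Search.com on three, and Google is the only engine in the list that ranks itself first.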

 

As Brendan kindly pointed out in comments, MySpace has (finally) added RSS feeds for blogs, so there's no longer any need to use my MySpace feed scraping tool. Hopefully a Google search for myspace RSS will soon start returning MySpace as the top result (or at least on the first page!) rather than me. I'll try to get around to editing the tool soon so it redirects to MySpace's version of the feed, which seems to be pretty much identical.

Update: it turns out my aggregator was still showing me the previous content and MySpace's RSS feeds are a bit more limited than what I've been offering, so I might keep the tool running until they improve their own feeds.