PiggyBank and Forced Semantics

PiggyBank looks like an interesting tool. It bills itself as an extension to the Firefox web browser that turns it into a “Semantic Web browser”, letting you make use of existing information on the Web in more useful and flexible ways. Installing it is one step too many for me, but from the description it looks like something that might catch on if it were a bit simpler. At some point, I expect PiggyBank and similar tools will be more widely used, and I wonder what will happen to the web then.

Given enough context, it's not difficult to force semantics onto any website. And if the context isn't provided on the publishing side, there's no reason the reader can't provide it. I know where the movie titles are on an IMDB page, even though they aren't marked up as <h1 class="movie_title">A Great Movie</h1>. This is a lesson I learned through working on disemployed, and I'm relearning it through playing with tools like GreaseMonkey, my MySpace RSS feed tool, and the widget I'm working on (and hope to release within the next week). All of these tools use context to infer meaning from otherwise meaningless markup. More and more technology is adopting this method, but where is this leading?
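To make "using context to infer meaning" concrete, here is a minimal sketch in Python. The page markup, the tag path, and the names PathTextExtractor and extract_title are all made up for illustration; this is not how PiggyBank or GreaseMonkey actually work, just the general idea of a reader-side rule that says "the title is whatever sits at this spot in the page".

```python
from html.parser import HTMLParser


class PathTextExtractor(HTMLParser):
    """Collect text found at a fixed tag path, e.g. ('td', 'font', 'b')."""

    def __init__(self, path):
        super().__init__()
        self.path = tuple(path)
        self.stack = []    # currently open tags, outermost first
        self.matches = []  # text found at the requested path

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and tuple(self.stack[-len(self.path):]) == self.path:
            self.matches.append(text)


def extract_title(html, path=("td", "font", "b")):
    """Return the first text at `path`, or None if the page has no such spot."""
    parser = PathTextExtractor(path)
    parser.feed(html)
    parser.close()
    return parser.matches[0] if parser.matches else None


# Hypothetical page: nothing in the markup says "this is a movie title".
PAGE = """
<table>
  <tr><td><font size="5"><b>A Great Movie</b></font></td></tr>
  <tr><td><font size="2">A short plot summary.</font></td></tr>
</table>
"""

print(extract_title(PAGE))
```

The markup never says what the bold text means. The meaning lives entirely in the rule the reader supplies, which is the whole trick.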

Forcing semantics onto a website will only work so long as the website maintains enough predictable structure to know where to put the semantics. When the structure changes, everything breaks. PiggyBank, for example, will require new scrapers nearly every time a target site changes structure. This isn't stable. It's also not scalable. There are billions of websites, and it's just not going to work to write and maintain custom scripts for each one to make it more semantic. At some point website developers will need to start participating in the semantic web for it to work.
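A small Python sketch of that fragility, under made-up assumptions: both layouts and the rule below are hypothetical, and a real scraper would be more elaborate, but it would fail the same way.

```python
import re

# Hypothetical scraper rule, written against one specific page layout.
TITLE_RULE = re.compile(r'<font size="5"><b>(.*?)</b>')


def scrape_title(html):
    """Return the title if the page still matches the rule, else None."""
    match = TITLE_RULE.search(html)
    return match.group(1) if match else None


# The same content before and after an imagined redesign.
OLD_LAYOUT = '<td><font size="5"><b>A Great Movie</b></font></td>'
NEW_LAYOUT = '<div class="hdr"><span>A Great Movie</span></div>'

print(scrape_title(OLD_LAYOUT))
print(scrape_title(NEW_LAYOUT))
```

The content is identical in both versions, but the rule only knows the old structure, so after the redesign it finds nothing until someone rewrites it.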

But the current trend seems to discourage such participation in two ways. First, as tools like PiggyBank and GreaseMonkey become more popular, they give website developers a disincentive to change their markup, since any change breaks the scripts their users rely on. This is good for the stability issue I mentioned, but it's bad for the transition to a more semantic web. Second, as forced semantics tools get better and better at converting non-semantic websites into something semantic, there is little reason for the websites themselves to become more semantic.

Maybe I'm wrong, and website developers will look at something like PiggyBank, see the benefit of semantics to users, and decide to start using more descriptive XHTML or more RDF. But it seems to me more likely that we're headed towards a "semantic web" in which the semantics are forced onto websites by browsers and other intermediaries. This isn't necessarily a problem, but it isn't what most people have in mind when they talk about the semantic web.
