Dare Obasanjo writes on screen scraping: "It seems Richard Macmanus has missed the point. The issue isn't depending on a third party site for data. The problem is depending on screen scraping their HTML webpage. An API is a service contract which is unlikely to be broken without warning. A web page can change depending on the whims of the web master or graphic designer behind the site."

I completely agree that screen scraping is an undesirable practice, but I think it's actually Dare who is missing the point. No one scrapes a site that has an API, so comparing the two doesn't make much sense. Of course the API is better, but what good does that do us when we want data in a certain format and there is no API? Answer: no good at all. Scraping doesn't compete with APIs at all. It actually encourages their development, because it demonstrates that a market for the structured data already exists and it competes for the site's own visitors until the API arrives.

Case in point: I scrape MySpace and provide RSS feeds. I don't even use MySpace myself, but I want to read the weblogs of my friends who do via RSS, so I made this scraper. When I put it online, I discovered there are many other people who want MySpace RSS feeds. When these people do a Google search for "myspace rss," they currently find a full page of results, beginning with my scraper. Myspace.com only shows up on the second page. This is bad business for MySpace. They've lost control of the experience of these potential customers. They need an API.

And they got one. I don't imagine my scraper had much to do with it in this case, but I have scraped smaller sites that didn't provide a feed until my scraper was being used by a significant portion of their readers. That puts such sites in a position where they must either provide the structured data their visitors clearly want or lose those visitors.

Screen scraping brings an increased risk of breakage, as I've experienced a few times already with the MySpace scraper. But when there is no API to use instead, the structured data is worth that risk for many people. Dare writes, "Web 2.0 isn't about screenscraping." I say Web 1.9beta1 is about screen scraping.
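
To make the trade-off concrete, here is a minimal sketch in Python of a scraper that turns a page of blog links into an RSS feed. It is not the MySpace scraper. The markup it expects (links carrying a hypothetical class `post-title`), the names `PostLinkParser` and `scrape_to_rss`, and the example.com URLs are all assumptions made for illustration. It uses only the standard library.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class PostLinkParser(HTMLParser):
    """Collects (title, url) pairs from <a class="post-title" href="..."> tags.

    The class name "post-title" is a hypothetical convention of the scraped
    page, not anything a real site promises to keep.
    """

    def __init__(self):
        super().__init__()
        self.posts = []
        self._href = None   # set while we are inside a matching <a> tag
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "post-title":
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.posts.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape_to_rss(html, feed_title, feed_link):
    """Builds an RSS 2.0 document (as a string) from the links found in html."""
    parser = PostLinkParser()
    parser.feed(html)
    parser.close()
    if not parser.posts:
        # The only "contract" is the page layout, so an empty result usually
        # means the site was redesigned and the scraper needs updating.
        raise ValueError("no posts found; the page markup may have changed")

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Scraped feed for " + feed_title
    for title, url in parser.posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


# Made-up sample page; a real scraper would fetch the HTML over HTTP.
SAMPLE = '<div><a class="post-title" href="http://example.com/1">First post</a></div>'

if __name__ == "__main__":
    print(scrape_to_rss(SAMPLE, "A friend's blog", "http://example.com/"))
```

Renaming the class in the sample page, say from `post-title` to `entry-title`, is enough to make `scrape_to_rss` raise `ValueError`. That is the breakage risk in miniature: nothing in the page tells the scraper the layout is about to change.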