i did robots.txt handling for the job search over at disemployed (which, by the way, is now the most user-friendly and useful job search on the internet, in my biased opinion) by extending the open source "snoopy" php class. i thought this might be useful for others, so i tried to come up with a less complicated demonstration here on randomchaos. what i came up with turned out to be more complicated than i expected, and it's almost certainly missing something, but i nonetheless point you to browser spoof.

if it isn't obvious from the name, the tool lets you pass as other browsers (or more broadly, user agents). it will hopefully be useful for those sites that do browser sniffing and deny access to certain browsers (among other uses). it will also stop you from accessing a page if the user agent you're passing as is disallowed in the requested server's robots.txt file. to do this, it uses the new robots.inc class, which was designed for exactly this purpose, in the interest of having a standards-friendly search engine over at disemployed.
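
to make the robots.txt check concrete, here is a rough sketch of the idea. it is not the robots.inc class and not php: it uses python's standard `urllib.robotparser` as a stand-in, and the `is_allowed` helper, the sample robots.txt, the "BadBot" agent and the example.com urls are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper for illustration; robots.inc may work differently.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up robots.txt: one agent is banned outright, everyone else is
# only kept out of /private/.
SAMPLE_ROBOTS = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for agent in ("BadBot/1.0", "Mozilla/5.0"):
        for url in ("http://example.com/index.html",
                    "http://example.com/private/notes.html"):
            verdict = "allowed" if is_allowed(SAMPLE_ROBOTS, agent, url) else "blocked"
            print(f"{agent} -> {url}: {verdict}")
```

a spoofing tool would run a check like this with the spoofed user agent before making the real request, and refuse to fetch the page if the answer is no. note that python's parser only matches on the part of the user agent string before the first slash, so other implementations may treat long browser-style strings differently.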