joel on software writes:
When I discovered that the popular web development tool PHP has almost complete ignorance of character encoding issues, blithely using 8 bits for characters, making it darn near impossible to develop good international web applications, I thought, enough is enough.
to say PHP's character encoding deficiencies make it "darn near impossible to develop good international web applications" is only partially true. the only thing you really can't do with PHP and non-ASCII character sets is edit text (and you can even do that in some very limited ways). but there's nothing stopping anyone from writing a good international web application in PHP, so long as that application doesn't require text editing.
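to make the store-and-display versus edit distinction concrete, here is a rough sketch. it uses Python as a stand-in for a byte-oriented language, on the assumption that the text is stored as UTF-8; the sample string and the `truncate_bytes` helper are made up for illustration and are not taken from any real application.

```python
# Sketch: what a byte-oriented language sees when it handles UTF-8 text.
text = "日本語のレッスン"        # 8 characters of sample text (an assumption)
raw = text.encode("utf-8")      # the same text as a sequence of bytes

# Storing and displaying only passes the bytes through unchanged,
# so nothing is lost on the round trip.
assert raw.decode("utf-8") == text

# Counting bytes instead of characters gives the wrong length.
byte_length = len(raw)          # each of these characters takes 3 bytes
char_length = len(text)

# Editing by byte offset can cut a character in half.
def truncate_bytes(data: bytes, n: int) -> str:
    """Hypothetical helper: keep the first n bytes, then decode for display."""
    return data[:n].decode("utf-8", errors="replace")

broken = truncate_bytes(raw, 4)  # first character intact, second one mangled
```

passing the bytes through untouched is safe, which is why posting and displaying work. anything that counts or slices by byte is where the trouble starts.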
take my daily japanese lessons for an example. i won't be so bold as to suggest this qualifies as a good international web application, but i use PHP to post new lessons, display lessons, and organize lessons, all with non-ASCII text. i won't say it wouldn't be nice to be able to edit my lessons through a web interface, but that's not such a problem that i can't work around it. i get the impression joel hasn't actually tried to develop an international web application with PHP before declaring it "darn near impossible".
Joel, I think you're confusing not being able to build your particular web application in PHP with not being able to build web applications in PHP in general.
http://www.hut.fi/u/hsivonen/php-utf8/
Unicode in particular is the only valid project out there to promote the harmonious coexistence of multiple languages and writing systems. Even the most die-hard supporters of roman languages must concede that the future of the internet must accommodate other scripts and encodings, especially if they are invested in a particular programming language. The competition will not wait, and I want to see a great offering like PHP stay in the race.
no, no, no. i specifically said "that's not such a problem that i can't work around it." i didn't say "there's no problem." but joel did say "there's no solution", which is simply untrue. what bothers me is not the PHP-bashing specifically (i've coded in both ASP and Perl in the past, so i'm no language purist), but the effect it has already had and will continue to have on hobbyist scripters like jonathon. when someone like joel says something is impossible, people believe it and give up, and we'll all be worse off as a result. the PHP-bashing is just one part of an essay which claims that the "absolute minimum" every programmer must know about unicode is more than 10 pages long. it's not. programming isn't that hard.
Why don't you use actual UTF-8 rather than numeric character references (&#x...;)? And why is your XML response for this form in iso-8859-1, rather than UTF-8? And why is this page in iso-8859-1, rather than Unicode?
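To show what the first question is getting at, here is a small sketch in Python of turning numeric character references into actual UTF-8 bytes. The `ncr_to_utf8` name and the sample input are assumptions made for illustration. Note that `html.unescape` also decodes named references such as `&amp;`, so it is only appropriate for text content, not for markup that still has to be parsed.

```python
import html

def ncr_to_utf8(text: str) -> bytes:
    """Hypothetical helper: decode character references, then encode as UTF-8."""
    return html.unescape(text).encode("utf-8")

# Three hexadecimal numeric character references (sample input, an assumption).
encoded = ncr_to_utf8("&#x65E5;&#x672C;&#x8A9E;")
```

The page that serves those bytes would also need to declare UTF-8 as its encoding, or browsers will misread them.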