PHP does have limited unicode support

joel on software writes:

When I discovered that the popular web development tool PHP has almost complete ignorance of character encoding issues, blithely using 8 bits for characters, making it darn near impossible to develop good international web applications, I thought, enough is enough.

to say PHP's character encoding deficiencies make it "darn near impossible to develop good international web applications" is only partially true. the only thing you really can't do with PHP and non-ASCII character sets is edit text (and you can even do that in some very limited ways). but there's nothing stopping anyone from writing a good international web application in PHP, so long as that application doesn't require text editing.

take my daily japanese lessons for an example. i won't be so bold as to suggest this qualifies as a good international web application, but i use PHP to post new lessons, display lessons, and organize lessons, all with non-ASCII text. i won't say it wouldn't be nice to be able to edit my lessons through a web interface, but that's not such a problem that i can't work around it. i get the impression joel hasn't actually tried to develop an international web application with PHP before declaring it "darn near impossible".

 
 
 
For the record, we *are* developing an international web application with PHP, and it *is* darn near impossible. For a simple example based on our actual experience, suppose you needed to parse incoming email (each email can and will use a different encoding) and display the subjects of all the incoming messages on the same UTF-8 encoded web page. You can't do that without the ability to convert from an arbitrary encoding to Unicode, manipulate the Unicode internally, and then convert the Unicode to UTF-8 for display as a web page. There is zero, nil, no support for any of this in PHP.
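
To make the three steps in the comment above concrete (arbitrary encoding to Unicode, process as Unicode, Unicode to UTF-8), here is a minimal sketch in Python rather than PHP. It is only an illustration and says nothing about what PHP does or doesn't offer. The function name `subject_to_utf8` and the fallback choices (ASCII for unlabeled parts, Latin-1 for unknown charset labels) are assumptions made for this example.

```python
from email.header import decode_header


def subject_to_utf8(raw_subject: str) -> bytes:
    """Decode a MIME-encoded Subject header and re-encode it as UTF-8.

    Hypothetical helper for illustration; the fallbacks are assumptions.
    """
    parts = []
    for chunk, charset in decode_header(raw_subject):
        if isinstance(chunk, bytes):
            # Step 1: arbitrary encoding -> Unicode.
            try:
                chunk = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Unknown charset label: Latin-1 maps every byte to a character.
                chunk = chunk.decode("latin-1")
        parts.append(chunk)

    # Step 2: process the text as Unicode (here, just trim whitespace).
    text = "".join(parts).strip()

    # Step 3: Unicode -> UTF-8 for the web page.
    return text.encode("utf-8")
```

Subjects that arrive in different encodings, such as `=?iso-8859-1?q?caf=E9?=` and `=?koi8-r?q?=C4=C1?=`, both come out as UTF-8 bytes that can sit on the same page.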
 
 
 
 
And of course every web application has to do that. @_@

I think you're confusing not being able to develop the particular web application you're building in PHP with not being able to develop web applications in general, Joel.
 
 
 
 
"There is zero, nil, no support for any of this in PHP."

http://www.hut.fi/u/hsivonen/php-utf8/
 
 
 
 
This debate, which includes Joel, Jonathan, and this blog, as well as, I'm sure, others, reveals what I think is an important point. PHP, unlike many other programming languages, is simply not up to snuff when it comes to working with Unicode. It is wonderful that we share info on the myriad ways that we can overcome these difficulties to still create useful Unicode applications, but let us admit that some postings do this in an attempt to vindicate PHP in this regard. I'm afraid that this is simply denial. The majority of the PHP developer community is, as far as I can tell, not taking Unicode seriously, as many in this discussion have pointed out. Joel was right to highlight the fact, and PHP can only benefit from increased pressure on it to take Unicode seriously. Encoding issues, and Unicode in particular, are not going to just go away...

Unicode in particular is the only valid project out there to promote the harmonious coexistence of multiple languages and writing systems. Even the most die-hard supporters of roman languages must concede that the future of the internet must accommodate other scripts and encodings, especially if they are vested in a particular programming language. The competition will not wait, and I want to see a great offering like PHP remain in there.
 
 
 
 
"let us admit that some postings do this in an attempt to vindicate PHP in this regard"

no, no, no. i specifically said "that's not such a problem that i can't work around it." i didn't say "there's no problem." but joel did say "there's no solution", which is simply untrue. what bothers me is not the PHP-bashing specifically (i've coded in both ASP and Perl in the past, so i'm no language purist), but the effect it has already had and will continue to have on hobbyist scripters like jonathon. when someone like joel says something is impossible, people believe it and give up, and we'll all be worse off as a result. the PHP-bashing is just part of the essay that says the "absolute minimum" every programmer must know about unicode is more than 10 pages long. it's not. programming isn't that hard.
 
 
 
 
Just a couple questions:
Why don't you use actual UTF-8, rather than numeric character references (&#x...;)? And why is your XML response for this form in iso-8859-1, rather than UTF-8? And why is this page in iso-8859-1, rather than Unicode?
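
For readers unsure what the difference in the first question amounts to, here is a small Python sketch (the helper name `to_ncr` is made up for this example). A numeric character reference spells each code point out in plain ASCII, so it survives a page declared as iso-8859-1, while UTF-8 stores the same characters as raw multi-byte sequences that only display correctly if the page is declared as UTF-8.

```python
import html


def to_ncr(text: str) -> str:
    """Write every character as a hexadecimal numeric character reference."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


word = "日本語"

# ASCII-only markup: safe inside a page served in any ASCII-compatible charset.
as_references = to_ncr(word)

# Raw bytes: three bytes per character for this word.
as_utf8 = word.encode("utf-8")

# A browser (or html.unescape) turns the references back into the same text.
round_trip = html.unescape(as_references)
```

Both forms carry the same characters; the references just cost more bytes and are harder to read in the page source.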
 
 
 
 
i have no good answers to your questions, martin. my weblog isn't in unicode because i haven't experienced the need for unicode and i used numeric references rather than UTF-8 because it worked better for me at the time.
 
