unicode with PHP is possible

after yesterday's post in which i attempted to suggest that joel was wrong about PHP, my favorite weblogger, jonathon delacour, accepted joel's statements as truth:

I wish Joel Spolsky had published his excellent introduction to Unicode and character encoding a week earlier, because then I wouldn't have wasted a couple of hours trying to write a snippet of PHP code to convert Japanese characters to Unicode character entities.

so today i wrote a snippet of PHP code to convert japanese characters to unicode character entities. now i think we must either conclude that joel was wrong about it being darn near impossible to develop good international web applications or i am somehow capable of performing the impossible. i'll be satisfied with either conclusion.

We could also conclude, Scott, that I should leave the PHP coding to those who actually know what they're doing.

Thanks for clarifying this -- and for writing the snippet of PHP code.

I've written a new weblog entry in which I point to your posts (and eat a tasty portion of humble pie):


My main regret is that this ill-fated foray into PHP pontificating may mean I've plummeted from the top to the bottom of your list of favorite webloggers.
jonathon, you're still my favorite weblogger. you were just repeating what joel had said, and if i hadn't had experience with PHP and japanese, i would have believed it myself.
This is a bit of a cheat. The form is set up in UTF-8, so the PHP doesn't convert from a Japanese legacy encoding (iso-2022-jp, shift_jis, euc-jp) to Unicode, which is what Joel said he needed. It only converts from UTF-8 to numeric character references. Calling one of them Japanese and the other Unicode is strange: both are Japanese, and both are Unicode. The former requires large tables (as available in, e.g., the iconv library); the latter is done with an algorithm probably less than a page long.
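To give a sense of how short the UTF-8 to numeric character reference step can be, here is a minimal sketch. It is not the snippet from the post; the function name utf8_to_ncr is made up for illustration, and it assumes the input really is valid UTF-8 (which is the case when the form is served as UTF-8). ASCII is passed through unchanged, and every other character is decoded to its code point and written as a decimal reference.

```php
<?php
// Hypothetical helper, for illustration only: turn a UTF-8 string into
// ASCII text plus decimal numeric character references (&#NNNN;).
function utf8_to_ncr(string $utf8): string
{
    // The /u flag makes PCRE treat the subject as UTF-8, so each array
    // element is one whole character (1 to 4 bytes).
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('input is not valid UTF-8');
    }

    $out = '';
    foreach ($chars as $ch) {
        $len = strlen($ch);
        if ($len === 1) {
            $out .= $ch; // plain ASCII passes through unchanged
            continue;
        }
        // Lead byte: mask off the length marker bits (110, 1110 or 11110).
        $cp = ord($ch[0]) & (0xFF >> ($len + 1));
        // Continuation bytes each contribute their low 6 bits.
        for ($i = 1; $i < $len; $i++) {
            $cp = ($cp << 6) | (ord($ch[$i]) & 0x3F);
        }
        $out .= '&#' . $cp . ';';
    }
    return $out;
}

// Usage: the three bytes below are the UTF-8 encoding of U+65E5.
echo utf8_to_ncr("abc \xE6\x97\xA5"), "\n";
```

No lookup tables are involved, only bit shifting, which is why this direction is so much easier than converting from a legacy encoding.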
This is more for Joel, but there is actually an iconv API in PHP; see http://us2.php.net/manual/en/ref.iconv.php. That's what you need for your email stuff, I guess.
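For the legacy-encoding case, a rough sketch of what using that API might look like is below. The wrapper name legacy_to_utf8 is hypothetical; iconv() itself is the real PHP function. It assumes PHP was built with the iconv extension and that the underlying iconv library recognizes the encoding name you pass (names such as SHIFT_JIS, EUC-JP and ISO-2022-JP are commonly supported, but that depends on the platform).

```php
<?php
// Hypothetical wrapper, for illustration only: convert bytes in a legacy
// Japanese encoding to UTF-8 using PHP's iconv extension.
function legacy_to_utf8(string $bytes, string $from = 'SHIFT_JIS'): string
{
    // iconv() returns false (and raises a notice) when the input contains
    // a byte sequence that is illegal in the source encoding.
    $utf8 = iconv($from, 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException("could not convert from $from to UTF-8");
    }
    return $utf8;
}

// Usage: four EUC-JP bytes in, UTF-8 bytes out (printed as hex).
echo bin2hex(legacy_to_utf8("\xC6\xFC\xCB\xDC", 'EUC-JP')), "\n";
```

The catch is that you still have to know which legacy encoding the bytes are in; iconv converts, it does not guess.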
hi everyone, i want to work with PHP + Unicode to generate Urdu or any other language like Chinese. please, can anyone help me? my email address is ibnegulzar1979@hotmail.com
martin's right. this won't help you if you don't have control over the input character encoding. for most needs, the content is coming from a page controlled by the code author, so this will work. i'm working on a solution for the other cases where this won't work.
