OmidTahouri - 3 months ago
PHP Question

Foreign characters and LDAP. What encoding/charset does LDAP expect?

I am parsing XML with simplexml_load_string() and using the data within it to update Active Directory (AD) objects via LDAP.

Example XML (simplified):

<?xml version="1.0" encoding="UTF-8"?>
<users>
<user>Bìlbö Bággįnš</user>
<user>Gãńdåłf Thê Gręât</user>
<user>Śām Wīšë</user>
</users>
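
A minimal sketch of the parsing step, assuming the simplified XML above (the variable names are illustrative, not from my real script). SimpleXML hands back UTF-8 strings, so a check like this can confirm the values are still valid UTF-8 before they go anywhere near LDAP:

```php
<?php
// Assumption: the payload is the simplified sample shown above,
// and this file itself is saved as UTF-8.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<users>
<user>Bìlbö Bággįnš</user>
<user>Gãńdåłf Thê Gręât</user>
<user>Śām Wīšë</user>
</users>
XML;

$users = simplexml_load_string($xml);
if ($users === false) {
    throw new RuntimeException('Could not parse the XML payload');
}

$names = array();
foreach ($users->user as $user) {
    // Casting a SimpleXMLElement to string yields its text content.
    $name = (string) $user;
    $names[] = array(
        'name'       => $name,
        // true when the bytes form a well-formed UTF-8 sequence
        'valid_utf8' => mb_check_encoding($name, 'UTF-8'),
    );
}
```

If every valid_utf8 flag is true here, the mangling is happening somewhere after the parse.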


I first run ldap_search() to find a single user and then change their attributes. Writing the above values straight into AD over LDAP results in badly mangled characters.

For example:
Bìlbö Bággįnš


I've tried the following functions, to no avail:

utf8_encode($str);
utf8_decode($str);
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str);
iconv("UTF-8", "ASCII//TRANSLIT", $str);
iconv("UTF-8", "T.61", $str);


Ideally, I don't want to do any of these string conversions. UTF-8 should be fine, right?!

I've also noticed the following:
I have printed out the values to see how they come out. curl-ing the script in CLI will show the correct characters, but web browsers show the same as AD.
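
One possible reading of this symptom, offered as an assumption rather than a diagnosis: the script emits UTF-8 bytes without declaring a charset, so a UTF-8 terminal shows them correctly while the browser falls back to a single-byte encoding. The sketch below reproduces that kind of mangling by re-reading UTF-8 bytes as ISO-8859-1, then declares the charset explicitly before printing:

```php
<?php
// Assumption: this file is saved as UTF-8.
$name = 'Bìlbö Bággįnš';

// Treat the UTF-8 bytes as if they were ISO-8859-1 and convert them
// to UTF-8 again. Each two-byte character turns into two characters.
$asLatin1 = mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');

// Undoing the conversion recovers the original string.
$roundTrip = mb_convert_encoding($asLatin1, 'ISO-8859-1', 'UTF-8');

// Declare the charset so a browser does not have to guess.
// This must run before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/plain; charset=utf-8');
}

echo $name, "\n";
echo $asLatin1, "\n";
```

If the second line of output matches what the browser and AD show, the bytes are probably fine and something downstream is reading them in the wrong encoding.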

What's going on? Should I be looking at something else, e.g. URL encoding?
I'm hoping this is down to a simple mistake on my end.

EDIT:
I entered these characters using the AD admin GUI to see how they would come out. I can read them via LDAP fine. The correct characters are displayed in a browser, but curl-ing via CLI shows question marks instead of the foreign characters. Passing one of these returned values into mb_detect_encoding() returns UTF-8.

I then modified the same object immediately, not by writing in a new string but by reversing the existing value and saving the object. This works fine - I see the correct value (reversed) in AD.


  • Developing on Mac OS X 10.7 Lion - PHP 5.4.3

  • Running production on: Red Hat 6 - PHP 5.4.3

  • AD server: Windows 2003



UPDATE:
After a few months, I was unable to find the answer/solution to this problem.
In the end, I went with replacing the characters with their non-accented equivalents (NOT ideal, I know).

Answer

Are you using LDAP v3?

ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3);

LDAPv3 uses UTF-8 by default and expects both requests and responses to be UTF-8 encoded. See here: http://technet.microsoft.com/en-us/library/cc961766.aspx
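
A rough sketch of where that option fits, assuming a plain bind followed by a single attribute update. The helper names, host, DNs and the choice of displayName are hypothetical placeholders, not taken from the question. The protocol version has to be set after ldap_connect() and before ldap_bind():

```php
<?php
// Hypothetical helper: pass the value through only if it is valid UTF-8.
function require_utf8($value)
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }
    return $value;
}

// Hypothetical helper: replace displayName on one entry over LDAPv3.
// All arguments are placeholders supplied by the caller.
function update_display_name($host, $bindDn, $password, $entryDn, $displayName)
{
    $ldap = ldap_connect($host);
    if ($ldap === false) {
        throw new RuntimeException('Could not create the LDAP handle');
    }

    // Switch to protocol version 3 before binding.
    if (!ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3)) {
        throw new RuntimeException('Could not select LDAPv3');
    }

    if (!ldap_bind($ldap, $bindDn, $password)) {
        throw new RuntimeException('Bind failed: ' . ldap_error($ldap));
    }

    // Send the UTF-8 string as-is, with no utf8_encode() or iconv() step.
    $entry = array('displayName' => array(require_utf8($displayName)));
    $ok = ldap_modify($ldap, $entryDn, $entry);

    ldap_unbind($ldap);
    return $ok;
}
```

A call might look like update_display_name('ldap://ad.example.com', $bindDn, $password, $userDn, 'Bìlbö Bággįnš'), with every argument replaced by your own values.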