Foreign characters and LDAP. What encoding/charset does LDAP expect?

I am parsing XML with …, and using the data within it to update Active Directory (AD) objects via LDAP.

Example XML (simplified):

<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user>Bìlbö Bággįnš</user>
  <user>Gãńdåłf Thê Gręât</user>
  <user>Śām Wīšë</user>
</users>
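The question doesn't name the XML parser, so the sketch below assumes SimpleXML (an assumption). It parses the example, wrapped in a `<users>` root element so it is well-formed, and checks that every value comes out as valid UTF-8. SimpleXML returns text content as UTF-8, so the strings should already be in the encoding LDAPv3 expects without any iconv() step.

```php
<?php
// Assumption: SimpleXML is the parser. The <users> root element is added
// because an XML document may only have one root element.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user>Bìlbö Bággįnš</user>
  <user>Gãńdåłf Thê Gręât</user>
  <user>Śām Wīšë</user>
</users>
XML;

$doc = simplexml_load_string($xml);
if ($doc === false) {
    throw new RuntimeException('XML could not be parsed');
}

$names = [];
foreach ($doc->user as $user) {
    // SimpleXML hands back text content as UTF-8.
    $names[] = (string) $user;
}

foreach ($names as $name) {
    // preg_match() with the /u modifier returns false on invalid UTF-8.
    $valid = preg_match('//u', $name) === 1;
    echo $name, ' => ', $valid ? 'valid UTF-8' : 'NOT valid UTF-8', PHP_EOL;
}
```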

I first run an … to find a single user and then change their attributes. Writing the above values straight into AD via LDAP results in some pretty mangled characters showing up.

For example:
Bìlbö Bággįnš

I've tried the following functions, to no avail:

iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str);
iconv("UTF-8", "ASCII//TRANSLIT", $str);
iconv("UTF-8", "T.61", $str);

Ideally, I don't want to do any of these string conversions. UTF-8 should be fine, right?!

I've also noticed the following: when I print the values to see how they come out, curl-ing the script from the CLI shows the correct characters, but web browsers show the same mangled output as AD.
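One plausible reading of this symptom (an assumption, not confirmed in the thread): the bytes are UTF-8, the terminal decodes them as UTF-8, and the browser decodes the very same bytes as a single-byte charset such as ISO-8859-1. As far as I know, PHP 5.4 ships with an empty `default_charset`, so unless the script sends something like `header('Content-Type: text/html; charset=utf-8')` the browser has to guess. The sketch reproduces that misreading in plain PHP; `latin1_misread()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: treat every byte of $bytes as an ISO-8859-1 character
// and re-encode it as UTF-8. Applied to UTF-8 input, this produces the
// classic "mojibake" you get when software guesses the wrong charset.
function latin1_misread(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            // ASCII bytes mean the same thing in both charsets.
            $out .= chr($b);
        } else {
            // ISO-8859-1 byte 0x80-0xFF is code point U+0080-U+00FF,
            // which UTF-8 encodes as two bytes.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$original = 'Bìlbö';
echo bin2hex($original), PHP_EOL;         // 42c3ac6c62c3b6
echo latin1_misread($original), PHP_EOL;  // BÃ¬lbÃ¶
```

Each accented letter is two bytes in UTF-8, which is why each one turns into two odd-looking characters when read as Latin-1.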

What's going on? Should I be looking at something else, e.g. URL encoding?
I'm hoping this is down to a simple mistake on my end.

I entered these characters through the AD admin GUI to see how they would come out. I can read them back via LDAP fine: a browser displays the correct characters, but curl-ing via the CLI shows question marks instead of the foreign characters. Passing one of these returned values into … returns UTF-8.
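An encoding check that reports "UTF-8" doesn't prove the text is correct: a double-encoded string is still perfectly valid UTF-8. Comparing raw bytes is more telling. The hypothetical helper below (`describe_bytes()` is not from the question) prints the byte count, UTF-8 validity and hex of a value, so a value entered through the AD GUI can be compared byte for byte with one written from the XML.

```php
<?php
// Hypothetical diagnostic helper: byte length, UTF-8 validity and hex dump.
function describe_bytes(string $value): string
{
    $hex = implode(' ', str_split(bin2hex($value), 2));
    // preg_match() with /u returns false when $value is not valid UTF-8.
    $valid = preg_match('//u', $value) === 1 ? 'valid UTF-8' : 'invalid UTF-8';
    return sprintf('%d bytes, %s: %s', strlen($value), $valid, $hex);
}

echo describe_bytes('ö'), PHP_EOL;                // 2 bytes, valid UTF-8: c3 b6
echo describe_bytes("\xC3\x83\xC2\xB6"), PHP_EOL; // 4 bytes, valid UTF-8: c3 83 c2 b6 ("Ã¶")
echo describe_bytes("\xF6"), PHP_EOL;             // 1 bytes, invalid UTF-8: f6 (Latin-1 "ö")
```

A correct "ö" is `c3 b6`; if the value written from the XML shows `c3 83 c2 b6`, it has been encoded twice somewhere along the way.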

I then immediately modified the same object without writing in a new string: I just reversed the existing value and saved the object. This works fine; I see the correct (reversed) value in AD.
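The question doesn't say how the value was reversed. Anyone repeating this test should note that PHP's `strrev()` reverses bytes, which splits multi-byte UTF-8 sequences and produces invalid UTF-8. A minimal code-point-aware alternative is sketched below; `utf8_strrev()` is a hypothetical name.

```php
<?php
// Hypothetical helper: reverse a UTF-8 string by code point rather than byte.
// Note: combining marks would still be separated from their base letter.
function utf8_strrev(string $s): string
{
    // Split into code points; preg_split() returns false on invalid UTF-8.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return implode('', array_reverse($chars));
}

echo utf8_strrev('Śām Wīšë'), PHP_EOL; // ëšīW māŚ

// Byte-wise strrev() breaks the multi-byte sequences:
echo preg_match('//u', strrev('Śām')) === 1 ? 'valid' : 'invalid', PHP_EOL; // invalid
```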

  • Developing on Mac OS X 10.7 Lion - PHP 5.4.3

  • Running production on: Red Hat 6 - PHP 5.4.3

  • AD server: Windows 2003

After a few months, I still hadn't found an answer to this problem.
In the end, I went with replacing the characters with their non-accented equivalents (NOT ideal, I know).
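A minimal sketch of that workaround, assuming a hand-written lookup table (the post doesn't show how the replacement was actually done). `strip_accents()` is a hypothetical name, and the table only covers the characters in the example XML; real data would need a far larger one.

```php
<?php
// Hypothetical helper: replace accented characters with plain ASCII letters.
// The map covers only the characters used in the example XML above.
function strip_accents(string $s): string
{
    static $map = [
        'ì' => 'i', 'ö' => 'o', 'á' => 'a', 'į' => 'i', 'š' => 's',
        'ã' => 'a', 'ń' => 'n', 'å' => 'a', 'ł' => 'l', 'ê' => 'e',
        'ę' => 'e', 'â' => 'a', 'Ś' => 'S', 'ā' => 'a', 'ī' => 'i',
        'ë' => 'e',
    ];
    // strtr() with an array replaces whole (multi-byte) keys in one pass.
    return strtr($s, $map);
}

echo strip_accents('Gãńdåłf Thê Gręât'), PHP_EOL; // Gandalf The Great
```

This loses information by design, which is why fixing the encoding path is preferable.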

Are you using LDAP v3?

ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3);

LDAPv3 uses UTF-8 by default: it expects requests and responses to be UTF-8 encoded. See here:
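Putting the suggestion into the question's flow (find one user, then modify an attribute) could look roughly like the sketch below. The key detail is that `LDAP_OPT_PROTOCOL_VERSION` has to be set after `ldap_connect()` but before `ldap_bind()`. Everything else is an assumption for illustration: the function name `update_display_name()`, the `sAMAccountName` lookup, the `displayName` attribute and all connection parameters are placeholders. It is written for a current PHP (scalar type declarations need PHP 7, `ldap_escape()` needs 5.6), and it can't be exercised without a reachable directory server.

```php
<?php
// Sketch only: the name, lookup attribute and modified attribute are
// illustrative assumptions. Requires the ldap extension.
function update_display_name(string $uri, string $bindDn, string $password,
                             string $baseDn, string $sam, string $newName): bool
{
    $ldap = ldap_connect($uri); // e.g. 'ldap://dc.example.com' (placeholder)
    if ($ldap === false) {
        return false;
    }
    try {
        // Must happen BEFORE binding; with v3 the server expects UTF-8 strings.
        if (!ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3)) {
            return false;
        }
        if (!ldap_bind($ldap, $bindDn, $password)) {
            return false;
        }

        // Locate exactly one user, escaping the value for use in a filter.
        $filter = '(sAMAccountName=' . ldap_escape($sam, '', LDAP_ESCAPE_FILTER) . ')';
        $result = ldap_search($ldap, $baseDn, $filter, ['cn']);
        if ($result === false) {
            return false;
        }
        $entries = ldap_get_entries($ldap, $result);
        if ($entries === false || $entries['count'] !== 1) {
            return false;
        }

        // $newName is passed through untouched: it should already be UTF-8
        // (SimpleXML output is), so no iconv() conversion is applied.
        return ldap_modify($ldap, $entries[0]['dn'], ['displayName' => $newName]);
    } finally {
        ldap_unbind($ldap);
    }
}
```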

