I am parsing XML, with
, and using the data within it to update Active Directory (AD) objects, via LDAP.
Example XML (simplified):
<?xml version="1.0" encoding="UTF-8"?>
<user>Gãńdåłf Thê Gręât</user>
I firstly run an
to find a single user and then change their attributes. Writing the above values straight into AD over LDAP results in badly mangled characters.
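For reference, the search-then-modify flow described above might look roughly like the sketch below. The exact calls used in my script are not shown here, so the use of `ldap_search` and `ldap_modify`, the function name `update_display_name`, all of its parameters, and the `displayName` attribute are assumptions for illustration only. Setting protocol version 3 explicitly is likewise just a guess at something that may be relevant, since LDAPv3 encodes string values as UTF-8.

```php
<?php
// Sketch only: every parameter and the displayName attribute are hypothetical.
function update_display_name($host, $bindDn, $password, $baseDn, $filter, $newName)
{
    $link = ldap_connect($host);
    // LDAPv3 encodes string values as UTF-8, so request it explicitly.
    ldap_set_option($link, LDAP_OPT_PROTOCOL_VERSION, 3);
    if (!@ldap_bind($link, $bindDn, $password)) {
        return false;
    }
    // Look for exactly one matching user.
    $result  = ldap_search($link, $baseDn, $filter, array('displayName'));
    $entries = $result ? ldap_get_entries($link, $result) : array('count' => 0);
    if ($entries['count'] !== 1) {
        ldap_unbind($link);
        return false;
    }
    // Write the new value, passed through as UTF-8 without any conversion.
    $ok = ldap_modify($link, $entries[0]['dn'], array('displayName' => $newName));
    ldap_unbind($link);
    return $ok;
}
```

Nothing here converts the string, so whatever bytes arrive in `$newName` are the bytes sent to the server.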
I've tried the following functions, to no avail:
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str);
iconv("UTF-8", "ASCII//TRANSLIT", $str);
iconv("UTF-8", "T.61", $str);
Ideally, I don't want to do any of these string conversions. UTF-8 should be fine, right?!
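One way to check whether a value really is UTF-8 before it goes anywhere near LDAP is to look at its raw bytes. This is a minimal sketch, assuming the mbstring extension is available; `describe_bytes` is a hypothetical helper name, not something from my script.

```php
<?php
// Hypothetical helper: report whether a string is valid UTF-8 and show its raw bytes.
function describe_bytes($str)
{
    return array(
        'valid_utf8' => mb_check_encoding($str, 'UTF-8'),
        'hex'        => bin2hex($str),
        'bytes'      => strlen($str),
    );
}

// "ã" written as explicit UTF-8 bytes (0xC3 0xA3), so the result
// does not depend on how this file itself is saved.
$sample = "G\xC3\xA3ndalf";
print_r(describe_bytes($sample));
```

If the hex dump shows `c3a3` for "ã", the string is still UTF-8 at that point and any mangling happens later, either in transit or in how the output is displayed.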
I've also noticed the following:
I printed the values to see how they come out. Fetching the script with curl on the command line shows the correct characters, but web browsers show the same mangled characters as AD.
What's going on? Should I be looking at something else, e.g. URL encoding?
I'm hoping this is down to a simple mistake on my end.
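A possible explanation for the curl versus browser difference is that the response never declares a charset, so the browser guesses one while the terminal simply treats the bytes as UTF-8. That is an assumption, not something I have confirmed. A minimal sketch of declaring the charset explicitly (`utf8_content_type` is a hypothetical helper name):

```php
<?php
// Assumption: the script currently sends no charset, so browsers pick their own.
function utf8_content_type($mime = 'text/plain')
{
    return 'Content-Type: ' . $mime . '; charset=utf-8';
}

// Only send the header when running under a web server and before any output.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header(utf8_content_type());
}

// "ã" as explicit UTF-8 bytes.
echo "G\xC3\xA3ndalf\n";
```

If the browser output becomes correct with this header, the display problem is separate from whatever is mangling the values in AD.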
I also entered these characters using the AD admin GUI to see how they would come out. I can read them back via LDAP without problems: the correct characters are displayed in a browser, but fetching the script with curl on the command line shows question marks instead of the accented characters. Passing one of these returned values into
will return UTF-8.
I then modified the same object immediately, not by writing in a new string but by reversing the existing value and saving the object. This works fine - I see the correct (reversed) value in AD.
- Developing on: Mac OS X 10.7 Lion - PHP 5.4.3
- Running production on: Red Hat 6 - PHP 5.4.3
- AD server: Windows 2003
After a few months, I still had not found an answer or solution to this problem.
In the end, I went with replacing the characters with their non-accented equivalents (NOT ideal, I know).
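The replacement can be done in several ways; the sketch below is one possibility using `strtr` with an explicit map, not necessarily the exact code I ended up with. The function name `strip_accents` and the map are hypothetical, and the map only covers the characters from the example above, so a real one would need to be much longer. It assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper: replace accented characters with plain ASCII equivalents.
// With an array argument, strtr matches whole multi-byte sequences,
// so it is safe to use on UTF-8 strings.
function strip_accents($str)
{
    $map = array(
        'ã' => 'a', 'å' => 'a', 'â' => 'a',
        'ê' => 'e', 'ę' => 'e',
        'ń' => 'n', 'ł' => 'l',
    );
    return strtr($str, $map);
}

echo strip_accents('Gãńdåłf Thê Gręât'), "\n"; // prints "Gandalf The Great"
```

Characters missing from the map pass through unchanged, so they would still be mangled in AD.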