I need to combine differently structured XML files using PHP. What I am doing is:
The best approach I could see is using a custom callback with the
array_uintersect() function. It works in these steps:
1- Write a comparison function that calculates the similarity. Check the
array_uintersect() manual on php.net to get an idea of how this callback function has to be written. Say its name is find_similar_entries.
2- Collect the entries from the two XML files into two arrays, respectively. (For a quick way, do a
json_encode() first and then
3- Have the intersection function find the similar entries, like:
$similar_products = array_uintersect($xml_array1, $xml_array2, 'find_similar_entries');
4- Now you have similar entries collected in one array.
5- Use array_diff() to remove the similar entries from the original arrays.
6- Finally combine all three arrays into a new XML structure per your wish, using
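Steps 1 to 5 can be sketched roughly as below. This is only a minimal sketch under assumptions: the sample XML, the name field, the helper xml_entries_to_array() and the 90% threshold are all made up for illustration, and json_decode() with the associative flag is my guess at the quick array conversion. Note that it uses array_udiff() with the same callback instead of plain array_diff(), because array_diff() compares elements as strings and the entries here are nested arrays.

```php
<?php
// Hypothetical sample data; tag and field names are assumptions.
$xml1 = '<products>'
      . '<product><name>Acme Drill X200</name><price>99</price></product>'
      . '<product><name>Acme Saw S10</name><price>49</price></product>'
      . '</products>';
$xml2 = '<items>'
      . '<item><name>ACME drill x200</name><price>95</price></item>'
      . '<item><name>Bolt Hammer H5</name><price>19</price></item>'
      . '</items>';

// Step 2: turn the repeated child elements of an XML string into a list of arrays.
function xml_entries_to_array(string $xml, string $tag): array
{
    $root = simplexml_load_string($xml);
    $data = json_decode(json_encode($root), true);
    $entries = $data[$tag] ?? [];
    if ($entries === []) {
        return [];
    }
    // A single child decodes to one associative array, so wrap it in a list.
    return isset($entries[0]) ? $entries : [$entries];
}

// Step 1: callback returning 0 for "similar", otherwise an ordering value.
function find_similar_entries(array $a, array $b): int
{
    $nameA = strtolower($a['name']);
    $nameB = strtolower($b['name']);
    similar_text($nameA, $nameB, $percent);
    if ($percent >= 90.0) { // assumed threshold
        return 0;
    }
    return $nameA <=> $nameB;
}

$xml_array1 = xml_entries_to_array($xml1, 'product');
$xml_array2 = xml_entries_to_array($xml2, 'item');

// Steps 3 and 4: entries of the first array that have a similar entry in the second.
$similar_products = array_uintersect($xml_array1, $xml_array2, 'find_similar_entries');

// Step 5: what is left in each original array once the similar entries are removed.
$only_in_first  = array_udiff($xml_array1, $similar_products, 'find_similar_entries');
$only_in_second = array_udiff($xml_array2, $similar_products, 'find_similar_entries');
```

With this sample data the drill ends up in $similar_products, the saw in $only_in_first and the hammer in $only_in_second. A fuzzy callback like this is not a strict ordering, so on larger inputs the result should be checked against a plain loop.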
Note1: I used
similar_text() and SmithWatermanGotoh to calculate the similarity, and I can say they work well together. But very close product names, which may differ from each other by only a few characters, end up "identical". There is nothing you can do about that except extract the distinguishing words from the strings, like the "model name" in my case.
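To illustrate Note1, here is a small sketch that only uses similar_text() (SmithWatermanGotoh is left out). The helper extract_model(), its regular expression, the product names and the 90% threshold are hypothetical; the point is just that two names differing by one character pass the fuzzy check, and that comparing an extracted model token separates them again.

```php
<?php
// Hypothetical helper: pulls a model token such as "X200" out of a product name.
function extract_model(string $name): ?string
{
    return preg_match('/\b[a-z]+\d+\b/i', $name, $m) ? strtoupper($m[0]) : null;
}

// Fuzzy match on the whole name, then an exact match on the model token.
function is_same_product(string $a, string $b, float $threshold = 90.0): bool
{
    similar_text(strtolower($a), strtolower($b), $percent);
    if ($percent < $threshold) {
        return false;
    }
    $modelA = extract_model($a);
    $modelB = extract_model($b);
    // If both names carry a model token, it has to match exactly.
    if ($modelA !== null && $modelB !== null) {
        return $modelA === $modelB;
    }
    return true;
}

// The similarity score alone treats these two different models as the same product.
similar_text('acme drill x200', 'acme drill x210', $percent);
$fuzzy_only = $percent >= 90.0;

// With the model token check they are kept apart, while a pure case difference still matches.
$with_model = is_same_product('Acme Drill X200', 'Acme Drill X210');
$same_item  = is_same_product('Acme Drill X200', 'ACME drill x200');
```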
Note2: This method works as expected, but I think PHP's intersection functions have a bug that makes them very slow. I created a bug report for it. The intersection does not only compare the elements of the two arrays crosswise; it also compares each array's own elements with each other. This seems illogical to me, because an intersection can only be calculated by comparing at least two parties, so comparing one array from the inside is not really "intersection". This is why, if you have large files, your script will die if you just run this straightforwardly. Maybe you can do it chunk by chunk.
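The behaviour from Note2 can be made visible by counting callback calls. My understanding is that the intersection functions sort each input array with the callback before comparing them, which would explain the same-array comparisons, but take that as an assumption. The sketch below also shows a plain nested loop that only compares crosswise; cross_intersect(), the src marker and the sample names are hypothetical.

```php
<?php
// Hypothetical entries; 'src' marks which array an entry came from.
$first = [
    ['src' => 1, 'name' => 'Acme Drill X200'],
    ['src' => 1, 'name' => 'Acme Saw S10'],
    ['src' => 1, 'name' => 'Bolt Hammer H5'],
];
$second = [
    ['src' => 2, 'name' => 'acme drill x200'],
    ['src' => 2, 'name' => 'Nail Gun N7'],
    ['src' => 2, 'name' => 'Wood Glue G1'],
];

$same_array_calls = 0;
$cross_calls = 0;

// Counts how often the callback receives two entries from the same array.
$counting_callback = function (array $a, array $b) use (&$same_array_calls, &$cross_calls): int {
    if ($a['src'] === $b['src']) {
        $same_array_calls++;
    } else {
        $cross_calls++;
    }
    return strcasecmp($a['name'], $b['name']);
};

array_uintersect($first, $second, $counting_callback);

// Alternative: compare only crosswise, with a callback that returns true or false.
function cross_intersect(array $first, array $second, callable $is_similar): array
{
    $matches = [];
    foreach ($first as $key => $entry) {
        foreach ($second as $candidate) {
            if ($is_similar($entry, $candidate)) {
                $matches[$key] = $entry; // keep the original key, like array_uintersect()
                break;
            }
        }
    }
    return $matches;
}

$is_similar = function (array $a, array $b): bool {
    similar_text(strtolower($a['name']), strtolower($b['name']), $percent);
    return $percent >= 90.0; // assumed threshold
};

$matches = cross_intersect($first, $second, $is_similar);
```

The nested loop makes at most count($first) * count($second) calls, so for large files it can be run on slices produced by array_chunk() to keep each pass small.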