Paul Atkins - 7 months ago
PHP Question

PHP preg_split to find all words in a string is not working

I am using preg_split to split a string into words.

However, it does not work for one particular string fetched from a MySQL TEXT column.

If I assign the string to a variable manually, it works correctly; it fails only when the same string is fetched from the database.

Here is the simple code I am using:

// The failing string. When assigned manually like this, it works correctly.

$string = "<p><strong>Iden is lesz lehetoseg a foproba és a koncert napjan ebedet kerni a MUPA-ban. Ára 1000-1200 Ft körül várható. Azoknak, akik még nem jártak a MUPA-ban ingyenes bejarasi lehetoseget biztositunk. Tovabba segitunk a pesti szallas megszervezeseben is, ha igenyt tartotok ra.</strong></p>";

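// Remove surrounding whitespace and HTML tags.
$string = strip_tags(trim($string));

// Split on runs of non-letter characters (\PL+); /u treats pattern and subject as UTF-8.
// -1 means "no limit" (passing null for the limit is deprecated as of PHP 8.1).
$words = preg_split('/\PL+/u', $string, -1, PREG_SPLIT_NO_EMPTY);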
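A slightly more defensive version of the same call is sketched below. It passes -1 as the limit and checks for a false return, which preg_split() produces when matching fails, for example on input that is not valid UTF-8 under the /u modifier. The function name splitWords is hypothetical, and preg_last_error_msg() requires PHP 8.0 or later.

```php
<?php
// Hypothetical helper: split a string into runs of Unicode letters.
function splitWords(string $text): array
{
    // Strip tags first so markup such as <p> and <strong> is not split into words.
    $clean = trim(strip_tags($text));

    // \PL+ matches one or more characters that are NOT letters; /u enables UTF-8 mode.
    $words = preg_split('/\PL+/u', $clean, -1, PREG_SPLIT_NO_EMPTY);

    if ($words === false) {
        // preg_split() returns false on failure, e.g. malformed UTF-8 with /u.
        throw new RuntimeException('preg_split failed: ' . preg_last_error_msg());
    }

    return $words;
}

// Digits and hyphens are not letters, so "1000-1200" is treated as a separator.
print_r(splitWords('<p><strong>Ára 1000-1200 Ft körül</strong></p>'));
```

Throwing instead of silently continuing makes an encoding problem show up as an explicit error message rather than as an unexpected word list.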


Here is what preg_split returns when it is called on the string from the database:

array(1) { [0]=> string(269) "Iden is lesz lehetoseg a foproba és a koncert napjan ebedet kerni a MUPA-ban. Ára 1000-1200 Ft körül várható. Azoknak, akik még nem jártak a MUPA-ban ingyenes bejarasi lehetoseget biztositunk. Tovabba segitunk a pesti szallas megszervezeseben is, ha igenyt tartotok ra." }
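Because var_dump() output viewed in a browser is rendered as HTML, entities such as &nbsp; or unusual bytes can look like ordinary text. A minimal sketch for inspecting what the database actually returned is below; nonAsciiBytes is a hypothetical helper that reports the byte offset and hex value of every byte outside printable ASCII.

```php
<?php
// Hypothetical helper: list every byte that is not printable ASCII (0x20-0x7E).
function nonAsciiBytes(string $s): array
{
    $found = [];
    $len = strlen($s);                  // byte length, not character length
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($s[$i]);
        if ($byte < 0x20 || $byte > 0x7E) {
            $found[] = sprintf('%d:%02X', $i, $byte);  // "offset:HEX"
        }
    }
    return $found;
}

// A UTF-8 non-breaking space is the two bytes C2 A0, so this lists 1:C2 and 2:A0.
print_r(nonAsciiBytes("a\u{00A0}b"));
```

Echoing htmlspecialchars($string) instead of the raw value is another quick way to make HTML entities visible in the browser.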


Does anyone know what is causing preg_split to fail for this string?

Thanks

Answer

I tested your code with a string from the database and got the same result. Changing the regular expression to split on whitespace solves it. Use this expression:

// Split on whitespace only; -1 means "no limit".
$words = preg_split('/[\s]/', $string, -1, PREG_SPLIT_NO_EMPTY);


// var_dump($words) result:

array(42) {
  [0]=>
  string(4) "Iden"
  [1]=>
  string(2) "is"
  [2]=>
  string(4) "lesz"
  [3]=>
  string(9) "lehetoseg"
...
}
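Splitting on whitespace behaves differently from splitting on non-letters: punctuation and digits stay attached to the tokens (for example "MUPA-ban." or "Azoknak,"). The sketch below compares the two approaches on a short sample taken from the question's text; the sample and variable names are only illustrative, and \s+ is used so that a run of whitespace counts as a single separator.

```php
<?php
$sample = 'a MUPA-ban. Ára 1000-1200 Ft';

// Whitespace-only split: punctuation and digits stay inside the tokens.
$byWhitespace = preg_split('/\s+/', $sample, -1, PREG_SPLIT_NO_EMPTY);

// Letter-run split: anything that is not a letter acts as a separator.
$byLetters = preg_split('/\PL+/u', $sample, -1, PREG_SPLIT_NO_EMPTY);

print_r($byWhitespace); // elements: a, MUPA-ban., Ára, 1000-1200, Ft
print_r($byLetters);    // elements: a, MUPA, ban, Ára, Ft
```

Which result is "correct" depends on whether punctuation should be stripped from the words afterwards.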

UPDATE: The /u modifier makes the pattern treat the string as UTF-8. Your database (or the connection to it) may not be returning UTF-8, which would explain why the original expression did not work.
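If the bytes reaching preg_split() are not valid UTF-8, the function returns false under the /u modifier instead of an array. A minimal sketch of checking and converting the string before splitting is below. It assumes the mbstring extension is available and that the legacy data is ISO-8859-2 (Latin-2, commonly used for Hungarian); both are assumptions, so substitute the real source encoding. The helper name toUtf8 is hypothetical. Often the cleaner fix is on the database side, for example calling $mysqli->set_charset('utf8mb4') on a mysqli connection so that MySQL sends UTF-8.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting from an ASSUMED legacy encoding.
function toUtf8(string $text, string $assumedSource = 'ISO-8859-2'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

$latin2 = "\xE1rak"; // "árak" encoded as ISO-8859-2 (0xE1 = á)

// Invalid UTF-8 with /u: preg_split() fails and returns false.
var_dump(preg_split('/\PL+/u', $latin2, -1, PREG_SPLIT_NO_EMPTY));

// After conversion the same call returns an array containing "árak".
var_dump(preg_split('/\PL+/u', toUtf8($latin2), -1, PREG_SPLIT_NO_EMPTY));
```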