Wazime - 3 months ago

PHP regex is too greedy when trying to fetch foreign characters

It seems like a newbie question, but I have been fighting with this super simple regex for too long. I googled it and didn't find the answer.

I am looking to fetch Hebrew characters from within HTML.
This is my sample input; the weird characters are Hebrew:

<DIV>
<span>
שלום</span> inside a span
מה<b> קורה</b> is "whats up"
Peace is also שלומות in Hebrew
</div>


I want the result to contain only the Hebrew words and nothing else:


שלום

מה

קורה

שלומות


I have tried the following regex:

preg_match("/([\p{Hebrew}].*)/u", $input_line, $output_array);


but then it gets super greedy and returns everything from the first Hebrew character to the end of each line:

שלום</span> inside a span
מה<b> קורה</b> is "whats up"
שלומות in Hebrew


whereas if I try the non-greedy version:

preg_match("/([\p{Hebrew}].*?)/u", $input_line, $output_array);


I get only the first Hebrew character in each line:

ש
מ
ש
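To see why both attempts misbehave, here is a minimal sketch run on a single line of the sample (the variable names are illustrative, and the input is assumed to be UTF-8). The dot matches any character, not only Hebrew ones, so the greedy .* consumes the rest of the line, while the lazy .*? is satisfied by zero characters and leaves just the one letter matched by \p{Hebrew}.

```php
<?php
// One line of the sample input (assumed to be UTF-8 encoded).
$line = 'מה<b> קורה</b> is "whats up"';

// Greedy: one Hebrew letter, then .* swallows everything up to the end of the line.
preg_match('/(\p{Hebrew}.*)/u', $line, $greedy);

// Lazy: .*? is allowed to match nothing, so only the first Hebrew letter is captured.
preg_match('/(\p{Hebrew}.*?)/u', $line, $lazy);

echo $greedy[1], "\n"; // prints: מה<b> קורה</b> is "whats up"
echo $lazy[1], "\n";   // prints: מ
```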


I am sure this is a simple flag, but I can't find it :-(

Answer

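You've forgotten the quantifier, and there's no need for a character class. Put the + directly on \p{Hebrew} so that each match is a run of one or more Hebrew letters and nothing else: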

preg_match("/(\p{Hebrew}+)/u", $input_line, $output_array);
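Note that preg_match() stops after the first match, so on a line such as the second one it would return מה but not קורה. Here is a sketch that assumes the whole HTML fragment sits in one UTF-8 string (the $html variable name is illustrative). It uses preg_match_all() with the same pattern to collect every Hebrew word:

```php
<?php
// Sample input from the question (assumed to be UTF-8 encoded).
$html = <<<'HTML'
<DIV>
<span>
שלום</span> inside a span
מה<b> קורה</b> is "whats up"
Peace is also שלומות in Hebrew
</div>
HTML;

// preg_match() returns after the first match; preg_match_all() keeps scanning,
// collecting every run of one or more Hebrew letters.
preg_match_all('/\p{Hebrew}+/u', $html, $matches);

// With the default PREG_PATTERN_ORDER, $matches[0] holds the full-pattern matches.
foreach ($matches[0] as $word) {
    echo $word, "\n"; // prints שלום, מה, קורה, שלומות, one per line
}
```

The capturing parentheses are dropped here because $matches[0] already holds each full match. With the answer's original pattern, the same words would also appear in $matches[1].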