
PHP regex is too greedy when trying to fetch foreign characters

It seems like a newbie question, but I have been fighting with this super simple regex for too long. I googled it and didn't find the answer.

I am looking to fetch Hebrew characters from within HTML.
This is my input sample; the weird characters are Hebrew:

שלום</span> inside a span
מה<b> קורה</b> is "whats up"
Peace is also שלומות in Hebrew

I want the result to contain only the Hebrew words and nothing else.

I have tried the following regex:

preg_match("/([\p{Hebrew}].*)/u", $input_line, $output_array);

but then it gets super greedy and matches everything from the first Hebrew character to the end of each line:

שלום</span> inside a span
מה<b> קורה</b> is "whats up"
שלומות in Hebrew

while if I try the non-greedy version:

preg_match("/([\p{Hebrew}].*?)/u", $input_line, $output_array);

I get only the first Hebrew character on each line.


I am sure this is a simple flag, but I can't find it :-(
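A small reproduction of the two attempts on the first sample line may make the behaviour easier to see. This is a sketch using only the question's own patterns and input. In both patterns only the first character is tied to \p{Hebrew}; everything after it is matched by "." (any character). So the greedy form runs to the end of the line, and the lazy form is satisfied by zero extra characters.

```php
<?php
// First line of the question's sample input.
$input_line = 'שלום</span> inside a span';

// Greedy: one Hebrew character, then .* consumes the rest of the line,
// including the HTML tag and the English text.
preg_match('/([\p{Hebrew}].*)/u', $input_line, $greedy);

// Lazy: .*? is allowed to match zero characters, and nothing after it
// forces it to grow, so only the first Hebrew character is captured.
preg_match('/([\p{Hebrew}].*?)/u', $input_line, $lazy);

echo $greedy[1], "\n"; // שלום</span> inside a span
echo $lazy[1], "\n";   // ש
```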

Answer

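You've forgotten the quantifier, and there's no need for a character class. With .* the repetition applies to "any character", so the match spills into HTML tags and English text. Put the + directly on \p{Hebrew} so that only a run of Hebrew characters is matched: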

preg_match("/(\p{Hebrew}+)/u", $input_line, $output_array);
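Note that preg_match stops at the first match. On a line like מה<b> קורה</b> it returns only מה, because the tag and the space break the Hebrew run. If you want every Hebrew word on each line, preg_match_all with the same pattern should collect them all. The sketch below assumes UTF-8 input (and a UTF-8 encoded PHP file). It also assumes the lines are processed one at a time, as in the question. The $lines array is just the question's sample input.

```php
<?php
// The question's sample input, one entry per line.
$lines = [
    'שלום</span> inside a span',
    'מה<b> קורה</b> is "whats up"',
    'Peace is also שלומות in Hebrew',
];

$words = [];
foreach ($lines as $input_line) {
    // \p{Hebrew}+ matches one or more consecutive Hebrew-script characters;
    // the /u modifier makes PCRE treat pattern and subject as UTF-8.
    preg_match_all('/\p{Hebrew}+/u', $input_line, $matches);
    // $matches[0] holds every full match found on this line.
    $words[] = $matches[0];
}

foreach ($words as $lineWords) {
    echo implode(' ', $lineWords), "\n";
}
```

For the sample input this prints שלום, מה קורה and שלומות on separate lines.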
