tonytz tonytz - 7 months ago 19
PHP Question

question mark in regular expression

I saw this regular expression applied to a URL:

$url = 'http://www.domain.com/';
preg_match('/(http)(.*?)\n/', $url, $matches);


I am not sure what the question mark "?" does in this regular expression. According to regex manuals, "?" is a metacharacter equivalent to {0,1}. So what is the point of putting "?" after *, since * already represents {0,}?

Can someone please enlighten me? Thanks.

Answer

It has a different meaning when it follows another quantifier.

In this case it changes the matching behaviour of the preceding quantifier. The default behaviour is greedy, and the ? changes it to "ungreedy" (also called "lazy").

  • "Greedy" means match as much as possible

  • "Ungreedy" means match as little as possible

See the article on regular-expressions.info

For example:

a.+b will match "aabxb" in aabxb

a.+?b will match only "aab" in aabxb
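
The two patterns above can be tried directly in PHP. This is only a minimal sketch; the variable names ($subject, $greedy, $lazy) are made up for illustration.

```php
<?php
$subject = 'aabxb';

// Greedy: .+ takes as much as it can, so the match ends at the last "b".
preg_match('/a.+b/', $subject, $greedy);

// Ungreedy: .+? takes as little as it can, so the match ends at the first
// "b" that completes the pattern.
preg_match('/a.+?b/', $subject, $lazy);

echo $greedy[0], "\n"; // aabxb
echo $lazy[0], "\n";   // aab
```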

See the example here on Regexr

You may be interested in my blog post about this topic: You do know Quantifiers. Really?

About your regex

preg_match('/(http)(.*?)\n/', $url, $matches);

I don't think it makes a difference here. By default, the . matches any character except a newline (you can change this by adding the s modifier after the closing regex delimiter), so whether the question mark is there or not, the match will only run up to the first \n.
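
A quick way to check this is to run both variants without the s modifier. The sketch below assumes a made-up input string ($text) containing two newlines. It also shows that the $url from the question, which contains no newline at all, is not matched by the pattern in the first place.

```php
<?php
// Hypothetical input with two newlines (not from the question).
$text = "http://www.domain.com/\nsecond line\n";

// Without the s modifier, "." never matches "\n", so both variants
// have to stop at the first newline.
preg_match('/(http)(.*)\n/', $text, $greedy);
preg_match('/(http)(.*?)\n/', $text, $lazy);

var_dump($greedy[2] === $lazy[2]); // bool(true)
var_dump($lazy[2]);                // string(18) "://www.domain.com/"

// The string from the question has no "\n", so preg_match finds nothing.
$url = 'http://www.domain.com/';
$found = preg_match('/(http)(.*?)\n/', $url, $matches);
var_dump($found); // int(0)
```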

If you change that behaviour with the s modifier, as in preg_match('/(http)(.*?)\n/s', $url, $matches);, it does make a difference: .*\n would match up to the last \n, whereas .*?\n stops at the first \n.
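
To see the difference with the s modifier, here is a minimal sketch. The two-line input ($text) is again an assumption chosen for illustration, not something from the question.

```php
<?php
// Hypothetical input with two newlines (not from the question).
$text = "http://www.domain.com/\nsecond line\n";

// With the s modifier "." also matches "\n".
// Greedy: .* runs to the end, then backs off to the last "\n".
preg_match('/(http)(.*)\n/s', $text, $greedy);

// Ungreedy: .*? stops as soon as the first "\n" can follow.
preg_match('/(http)(.*?)\n/s', $text, $lazy);

var_dump($greedy[2]); // "://www.domain.com/\nsecond line" (newline embedded)
var_dump($lazy[2]);   // "://www.domain.com/"
```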
