toobee toobee - 4 years ago 151
Perl Question

RegEx doesn't match in Perl: Why?

I have this (German) example string

Gesundheit und einen besseren Fußball- u. Musikgeschmack!


I want to match the words that are connected by "- u."


In this case I want it to match
Fußball- u. Musikgeschmack

I wrote a regular expression that does exactly that, but it seems to behave differently when I run it as part of a Perl script.

My RegEx is this:
[ |^]*([A-Za-zÄäÖöÜüß]+[\-\\][ ]*[u][\.][A-Za-zÄäÖöÜüß ]+)

According to this website, which allows interactive RegEx testing, it selects what it should: https://regex101.com/r/tN6gB4/1

What Perl gives me is
ball- u. Musikgeschmack


I have the German special character ß in the block that matches "ball", so I don't get why it does not match "Fußball".

Answer Source

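Indeed, @stribizhev seems to be right: this is a use utf8; issue. The pragma tells Perl that the source file is UTF-8 encoded, so string and regex literals are decoded into Unicode characters. Perl then treats ß as a single character rather than as two separate bytes, and the character class behaves as intended.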

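use strict;
use warnings;
use utf8;                   # the source file itself is UTF-8
binmode(STDOUT, ":utf8");   # encode output as UTF-8

my $s = "Gesundheit und einen besseren Fußball- u. Musikgeschmack!";

# Print the capture only if the match succeeded.
if ($s =~ /[ |^]*([A-Za-zÄäÖöÜüß]+[\-\\][ ]*[u][\.][A-Za-zÄäÖöÜüß ]+)/) {
    print "$1\n";
}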

Output:

Fußball- u. Musikgeschmack
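
To make the byte versus character difference visible, here is a minimal sketch that runs the same kind of pattern against the raw UTF-8 bytes and against the decoded string. The helper name extract_pair and the simplified pattern are my own assumptions for illustration, not the asker's exact regex. Non-ASCII letters are written as code point escapes so the result does not depend on how this file is saved. One plausible way to end up with the truncated match is exactly this kind of mismatch between an undecoded string and a character-based pattern.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the "X- u. Y" phrase, or undef if none is found.
# The class lists A-Z, a-z, the German umlauts and the sharp s by code point.
sub extract_pair {
    my ($text) = @_;
    my $letter = qr/[A-Za-z\x{C4}\x{E4}\x{D6}\x{F6}\x{DC}\x{FC}\x{DF}]/;
    return $text =~ /(${letter}+-\s*u\.\s*${letter}+)/ ? $1 : undef;
}

# The sentence as raw UTF-8 bytes: the sharp s is the two bytes C3 9F.
my $bytes = "Gesundheit und einen besseren Fu\xC3\x9Fball- u. Musikgeschmack!";

# The same sentence decoded into characters: the sharp s is one character, U+00DF.
my $chars = decode('UTF-8', $bytes);

binmode(STDOUT, ':encoding(UTF-8)');
print extract_pair($bytes), "\n";   # match starts after the sharp s: "ball- u. ..."
print extract_pair($chars), "\n";   # the whole word is matched: "Fußball- u. ..."
```

The undecoded string is one unit longer than the decoded one, because the two bytes of ß are not in the character class and so the match can only begin at "ball".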

See also perlunicode for details.
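
As a possible alternative once the string is properly decoded, the Unicode property \p{L} (described in perlunicode) matches any letter, so the umlauts and ß need not be listed by hand. This is only a sketch: the helper name extract_pair_unicode and the simplified pattern are assumptions of mine, and it presumes the script file is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                              # this source file is saved as UTF-8
binmode(STDOUT, ':encoding(UTF-8)');   # encode output as UTF-8

# Hypothetical helper: \p{L} matches any Unicode letter, including ß and umlauts.
sub extract_pair_unicode {
    my ($text) = @_;
    return $text =~ /(\p{L}+-\s*u\.\s*\p{L}+)/ ? $1 : undef;
}

my $s = "Gesundheit und einen besseren Fußball- u. Musikgeschmack!";
print extract_pair_unicode($s), "\n";
```

Note that \p{L} accepts letters from every script, which is broader than the original German-only character class.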
