Artalus - 1 year ago
Perl Question

Perl: regex won't work without parentheses

I am writing a simple Perl script that checks a string for different word forms (in English and Russian) of a nickname. I use the following regex, the same one that appears in the test code below:

(gunn?er|gunn?|ганн?еру?|ганн?у?)

It is valid according to a test and Notepad++. However, on my computer this regex doesn't work in Perl unless I put additional parentheses into it. A friend whom I asked about this couldn't reproduce the behavior. Is there some setting of the script or of the Perl interpreter itself that I should change?

Edit: As requested, the code of my tests:

my $GUN = "gunner";
my $HZ = "!!!";

sub GetNickFromMsg {
    my ($msg) = @_;
    if ( $msg =~ /(gunn?er|gunn?|ганн?еру?|ганн?у?)/i ) {
        return $GUN;
    }
    return $HZ;
}

my @nicks = ("Gunner", "guner", "ганнер", "ганеру", "гану");
foreach my $n (@nicks) {
    my $res = GetNickFromMsg($n);
    print "$n -> $res\n";
}

The output I get:

Gunner -> !!!
guner -> !!!
ганнер -> !!!
ганеру -> !!!
гану -> !!!

If I change the regex to the second version, with parentheses everywhere, the output for every word form is "-> gunner", as it should be. I've tried adding use feature 'unicode_strings' to the beginning of the script and using a different regex modifier, as Casimir suggested, but it didn't help.

I run the script on a Linux server:
Linux version 4.3.0-1-amd64 (gcc version 5.3.1 20160101 (Debian 5.3.1-5)) #1 SMP Debian 4.3.3-5 (2016-01-04)
with Perl version 5.22.1.

Answer Source

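You need to add use utf8 at the top of your program to tell Perl that the source code itself contains UTF-8-encoded characters. Without it, Perl treats the Cyrillic letters in your regex as raw bytes rather than characters.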
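A minimal sketch of why the extra parentheses seemed to help: in UTF-8 the letter н (U+043D) is the two bytes 0xD0 0xBD, so without use utf8 a quantifier written after it applies only to the final byte. The patterns below spell out both readings with escapes so the script does not depend on the file's encoding; the helper name matches is purely illustrative.

```perl
use strict;
use warnings;

# What perl sees for /^н?$/ WITHOUT `use utf8`: two bytes, and only the
# second byte (0xBD) is made optional by the `?`.
my $byte_pattern = qr/^\xD0\xBD?$/;

# What perl sees for /^н?$/ WITH `use utf8`: one character, U+043D,
# which is optional as a whole.
my $char_pattern = qr/^\x{43D}?$/;

# Illustrative helper: returns 1 if the string matches the pattern, else 0.
sub matches {
    my ($re, $s) = @_;
    return $s =~ $re ? 1 : 0;
}

print matches($byte_pattern, ""),         "\n";  # 0: the byte 0xD0 is still required
print matches($byte_pattern, "\xD0"),     "\n";  # 1: half of a letter is accepted
print matches($byte_pattern, "\xD0\xBD"), "\n";  # 1
print matches($char_pattern, ""),         "\n";  # 1: the whole letter is optional
print matches($char_pattern, "\x{43D}"),  "\n";  # 1
```

Wrapping the letter in a group, as in (н)?, makes the quantifier cover both bytes, which would explain why the parenthesized version behaved as expected even without use utf8.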

You will also need to set STDOUT to handle UTF-8 encoding, otherwise you will get "Wide character in print" warnings.
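As a rough sketch of the output side, one alternative to the open pragma is to put an encoding layer on STDOUT with binmode. This assumes the script file is saved as UTF-8; the variable name $nick is just for illustration.

```perl
use strict;
use warnings;
use utf8;    # the Cyrillic literal below is read as characters, not bytes

# Encode everything printed to STDOUT as UTF-8. Without such a layer,
# printing characters above 0xFF triggers "Wide character in print".
binmode(STDOUT, ':encoding(UTF-8)');

my $nick = 'ганнер';

print length($nick), "\n";   # 6: counted in characters because of `use utf8`
print "$nick\n";
```

The open pragma with :std applies the same kind of layer to STDIN, STDOUT and STDERR at once, which is convenient when the script also reads Cyrillic input.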

Here's an edited version of your program that works correctly and provides the behaviour that you expected:


use utf8;
use strict;
use warnings 'all';

use open qw/ :std :encoding(UTF-8) /;

my $GUN = 'gunner';
my $HZ  = '!!!';

sub GetNickFromMsg {
    my ($msg) = @_;

    if ( $msg =~ /(gunn?er|gunn?|ганн?еру?|ганн?у?)/i ) {
        return $GUN;
    }

    return $HZ;
}

my @nicks = qw/ Gunner guner ганнер ганеру гану /;

foreach my $n (@nicks) {
    my $res = GetNickFromMsg($n);
    print "$n -> $res\n";
}

Output:

Gunner -> gunner
guner -> gunner
ганнер -> gunner
ганеру -> gunner
гану -> gunner
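
Note that use utf8 only affects literals in the source file. If the messages reach the script as raw bytes from somewhere that has no encoding layer (for example @ARGV or a socket, which is an assumption about how the script is used), they likely need decoding before the match. A hedged sketch using the core Encode module, with a hypothetical helper nick_in_bytes and the script file assumed to be saved as UTF-8:

```perl
use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

# Hypothetical helper: decode raw UTF-8 bytes into characters, then match.
sub nick_in_bytes {
    my ($raw_bytes) = @_;
    my $msg = decode('UTF-8', $raw_bytes);
    return $msg =~ /(gunn?er|gunn?|ганн?еру?|ганн?у?)/i ? 1 : 0;
}

# Simulate a message arriving from outside the program as UTF-8 bytes.
my $raw = encode('UTF-8', 'ГАНЕРУ');

print nick_in_bytes($raw), "\n";   # 1: decoded first, /i folds the Cyrillic case

# Matching the undecoded bytes against the character pattern finds nothing.
my $undecoded = ( $raw =~ /ганн?еру?/i ) ? 1 : 0;
print "$undecoded\n";              # 0
```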