Andrew Newby, 4 years ago
Perl Question

Regex matching issue on newer version of Perl

I've moved over to a new server, with Perl 5.22.1. I have this bit of code:

$html =~ m{
( # $1 the whole tag
<
(
?:
!--
( # $2 the attributes are all the data between
.*?
)
--
| # or
(
?:
( # $3 the name of the tag
/?\S+?\b
)
( # $4 the attributes
[^'">]*
(
?:
( # $5 just to match quotes
['"]
)
.*?\5
[^'">]*
)*
)
)
)
>
)
}gsx


...and it now gives me this error:

A fatal error has occurred:

In '(?...)', the '(' and '?' must be adjacent in regex; marked by <-- HERE in m/
( # $1 the whole tag
<
(
? <-- HERE :
!--
( # $2 the attributes are all the data between
.*?
)
--
| # or
(
?:
( # $3 the name of the tag
/?\S+?\b
)
( # $4 the attributes
[^'">]*
(
?:
( # $5 just to match quotes
['"]
)
.*?\5
[^'">]*
)*
)
)
)
>
)
/ at ./admin/GT/HTML/Parser.pm line 207.
Compilation failed in require at (eval 25) line 8.

Please enable debugging in setup for more details.


I'm not really sure what it's complaining about. Any ideas?

Answer

The ?: that marks a non-capturing group must come immediately after the opening parenthesis, even when the x modifier is used. Whitespace (including a newline) between the ( and the ? triggers the error shown in the question.

See the fixed regex declaration:

$html =~ m{
    ( # $1 the whole tag
        <
        (?:
            !--
            ( # $2 the attributes are all the data between
                .*?
            )
            --
            | # or
            (?:
                ( # $3 the name of the tag
                    /?\S+?\b
                )
                ( # $4 the attributes
                    [^'">]*
                    (?:
                        ( # $5 just to match quotes
                            ['"]
                        )
                        .*?\5
                        [^'">]*
                    )*
                )
            )
        )
        >
    )
}gsx

See this passage from the perlre documentation:

Note that anything inside a \Q...\E stays unaffected by /x. And note that /x doesn't affect space interpretation within a single multi-character construct. For example in \x{...}, regardless of the /x modifier, there can be no spaces. Same for a quantifier such as {3} or {5,}. Similarly, (?:...) can't have a space between the "{" , "?" , and ":". Within any delimiters for such a construct, allowed spaces are not affected by /x, and depend on the construct. For example, \x{...} can't have spaces because hexadecimal numbers don't have spaces in them.

I think there is a typo in that passage: the "{" should actually be "(". The sentence relevant to the current scenario is the one beginning "Similarly, (?:...)".
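To see the rule in isolation, here is a minimal sketch, assuming Perl 5.22 or later (older versions may only warn or accept the split form). The helper compile_error and the two toy patterns are made up for illustration and are not part of GT::HTML::Parser. It compiles each pattern under /x inside eval and reports whether compilation succeeded.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern string under /x and
# return the error text, or an empty string on success.
sub compile_error {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/x };
    return defined $re ? '' : $@;
}

# "(" and "?:" separated by a space: /x does not make this legal.
my $split = compile_error('( ?: a | b ) c');

# "(?:" kept together; whitespace elsewhere is still ignored under /x.
my $adjacent = compile_error('(?: a | b ) c');

print "split:    ", ($split    ? "fails to compile" : "compiles"), "\n";
print "adjacent: ", ($adjacent ? "fails to compile" : "compiles"), "\n";
```

The same check can be pointed at any pattern string that may have been reformatted by hand, which is a cheap way to catch a split "(?" before moving code to a newer Perl.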
