favoretti - 6 months ago
Python Question

Regexp to match random order words

I have the following pseudo-DSL:

< allow | deny >
< tcp | udp | any >
src < prefix | $ip | @hostgroup | any > [ port number | range | @portgroup | any ]
dst < prefix | $ip | @hostgroup | any > [ port number | range | @portgroup | any ]
[ stateful ]
[ expire YYYYMMDD ] [ log ]
[ # comment ]


The order is fixed, starting from allow up to dst and its port.
That I'm matching with the following, rather dumb, regexp:

m = re.search(r"^(allow|deny)?\s+(tcp|udp|tcpudp|any)\s+?(src\s\S+)\s*?(port\s+\S+)?\s*?(dst\s\S+)\s?(port\s+\S+)?\s*?(\S+)?\s*?(\S+)?", line)


Pardon me for the n00bness of the questions, but the parts I'm having problems with are:


  1. How can I match stateful, expire <value>, log if all 3 are optional but in case they are present I want to match them in separate groups.

  2. How can I match optional statement port <value> in such a way that the match group will contain only the value, without creating an extra matching group, i.e. without using
    (port\s+(\S+))?



Thanks!

[edit for more of a problem statement]

To elaborate a bit more, sure I can check whether one of the 3 groups contains either log or stateful, but if I use the same approach, a non-capturing group for expire, aka (?:expire\s(\S+)), I'd need to make an assumption. Unless I can somehow have order-less matching? i.e. match on (stateful|log|(?:expire\s(\S+)))?

Answer
  1. How can I match stateful, expire <value>, log if all 3 are optional but in case they are present I want to match them in separate groups.

Use capture groups that have a ? after them so that they will be optional.

Ex. \s*(stateful)?\s*(?:expire (\d{8}))?\s*(log)?
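A quick way to see what this fixed-order pattern does is to run it against a few rule tails. This is only a sketch: the name TAIL and the sample strings are made up for illustration and are not from the question.

```python
import re

# Optional trailing keywords, in the fixed order stateful, expire, log.
# TAIL is a hypothetical name used only for this illustration.
TAIL = re.compile(r"\s*(stateful)?\s*(?:expire (\d{8}))?\s*(log)?")

for tail in ("stateful expire 20991231 log", "expire 20991231", "log", ""):
    # Every optional group that is absent comes back as None.
    print(repr(tail), TAIL.match(tail).groups())

# The order still matters here: stateful after log is not picked up.
print(TAIL.match("log stateful").groups())
```

Each keyword keeps its own group number, but only as long as the keywords appear in that order.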

To allow those optional groups to appear in any order in the match string, but still always have them in the same numbered capture group, use a look-ahead (?= ).

Ex. (?=(?:.*(stateful))?)(?=(?:.*expire (\d{8}))?)(?=(?:.*(log))?)
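To check the any-order behaviour, something along these lines should work. It is a sketch under the assumption that the pattern is applied to the part of the line after dst and its port; FLAGS and parse_flags are hypothetical names, not part of the question.

```python
import re

# Each look-ahead scans the rest of the string on its own and consumes
# nothing, so the keywords can appear in any order while the group
# numbers stay fixed: 1 = stateful, 2 = expire date, 3 = log.
FLAGS = re.compile(
    r"(?=(?:.*(stateful))?)"
    r"(?=(?:.*expire (\d{8}))?)"
    r"(?=(?:.*(log))?)"
)

def parse_flags(tail):
    """Return (stateful, expire_date, log); absent items are None."""
    return FLAGS.match(tail).groups()

print(parse_flags("log expire 20991231 stateful"))
print(parse_flags("expire 20991231"))
print(parse_flags(""))
```

Since each look-ahead scans the whole remainder of the line, a word like log inside a trailing # comment would also be picked up. Stripping the comment before matching is one way around that.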

  2. How can I match optional statement port <value> in such a way that the match group will contain only the value, without creating an extra matching group, i.e. without using
    (port\s+(\S+))?

Use a non-capturing group (?: ) to put those characters together for the following ? without capturing them. (You probably want to do this for expire above also)

(?:port\s+(\S+))?
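As a rough illustration of the non-capturing group on the src/dst part, assuming the port value is a run of non-whitespace (\S+) as in the question's own pattern. ENDPOINT and the sample rules are invented for this sketch.

```python
import re

# The keywords src, dst and port are matched but not captured;
# only the address and the port value end up in groups 1 to 4.
# ENDPOINT is a hypothetical name used only for this illustration.
ENDPOINT = re.compile(
    r"src\s+(\S+)(?:\s+port\s+(\S+))?"
    r"\s+dst\s+(\S+)(?:\s+port\s+(\S+))?"
)

with_ports = ENDPOINT.search(
    "allow tcp src 10.0.0.0/8 port 1024-65535 dst @web port 443"
)
print(with_ports.groups())

# With no port clause the port groups are simply None.
no_ports = ENDPOINT.search("deny any src any dst any")
print(no_ports.groups())
```

The group numbers stay the same whether or not a port clause is present, so the calling code does not need to guess which group holds what.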

Complete Regex