xetra11 - 2 months ago
Bash Question

Regex word can be optional but only if it matches the characters

Following pattern:

(v[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2})(-[0-9]{1,2})?((-schema)?(-dev)?)((-schema)?(-dev)?)
from http://regexr.com/ is meant to be used in a shell script with grep and does match the following strings (working example):


  • Hello I am a text and this is my v1.12.33-32 version

  • Hello I am a text and this is my v1.12.33-dev version

  • Hello I am a text and this is my v1.12.33-dev-schema version

  • Hello I am a text and this is my v1.12.33-schema version

  • Hello I am a text and this is my v1.12.33-3-schema version



and so forth

So I made the words schema and dev optional. They can be omitted or used in an arbitrary order. What I don't want is for this:


  • Hello I am a text and this is my v1.12.33-foo version

  • Hello I am a text and this is my v1.12.33-asfs version



to match.

I want the optional part to be more constrained. At the moment the regex still matches whatever leading portion of the version string fits the pattern.

This for example:

Hello I am a text and this is my v1.123.33


results in an empty string while this:

Hello I am a text and this is my v1.12.33-bla

still results in
v1.12.33


Is this because of the grouping I used? That is, are the groups that do match fully still returned as the match string, even when an unwanted suffix follows?

Answer

To match only the version string, disallow extra trailing tags, yet allow trailing unmatched text, you need a regex language that supports lookahead. Standard grep / egrep regexes do not support lookahead.

You have two options:

  1. Since you seem to be relying on GNU grep anyway, you could use a Perl regex, such as
v[0-9]{1,2}(\.[0-9]{1,2}){2}(-[0-9]{1,2})?((-schema(-dev)?)?|(-dev(-schema)?)?)?(?!\S)

The negative lookahead at the end allows the match to appear at the end of the line, but also requires that if it does not end the line then the next character following the match must be whitespace (which is not itself included in the match).

  2. You could give up on completely isolating the target text via -o, and instead allow the pattern to match the trailing context, too:
v[0-9]{1,2}(\.[0-9]{1,2}){2}(-[0-9]{1,2})?((-schema(-dev)?)?|(-dev(-schema)?)?)?(\s.*)?$

In this case, you could isolate the target text in a second step, by stripping off any tail beginning with whitespace.
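
The two options can be compared side by side in a short script. This is only a sketch: it assumes GNU grep built with PCRE support for -P, the function names are made up for illustration, and [[:space:]] is used in place of \s in the second pattern because it is the portable POSIX spelling.

```shell
# Shared prefix of both patterns, copied from the answer.
base='v[0-9]{1,2}(\.[0-9]{1,2}){2}(-[0-9]{1,2})?((-schema(-dev)?)?|(-dev(-schema)?)?)?'

# Option 1: Perl regex; the negative lookahead rejects any non-whitespace
# character directly after the match. (Hypothetical helper name.)
extract_with_lookahead() {
    printf '%s\n' "$1" | grep -oP "$base"'(?!\S)'
}

# Option 2: extended regex that also consumes the tail up to end of line,
# followed by sed deleting everything from the first whitespace onward.
# (Hypothetical helper name.)
extract_two_step() {
    printf '%s\n' "$1" | grep -oE "$base"'([[:space:]].*)?$' | sed 's/[[:space:]].*$//'
}

for line in \
    'Hello I am a text and this is my v1.12.33-dev-schema version' \
    'Hello I am a text and this is my v1.12.33-foo version'
do
    printf 'lookahead=[%s] two-step=[%s]\n' \
        "$(extract_with_lookahead "$line")" "$(extract_two_step "$line")"
done
```

With these inputs, both functions should print v1.12.33-dev-schema for the first line and nothing for the second, since -foo is neither an allowed tag nor whitespace.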

Note that neither of these pays attention to text preceding the match. You have similar options for handling that portion as you do for handling the trailing portion.
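
For the Perl-regex route, one way to constrain the leading side is a negative lookbehind that mirrors the trailing lookahead. The following is a minimal sketch under the same assumption of GNU grep with -P; extract_bounded is a hypothetical helper name, not something from the question.

```shell
# (?<!\S) requires that the match starts at the beginning of the line
# or directly after whitespace; (?!\S) does the same for the end.
extract_bounded() {
    printf '%s\n' "$1" | grep -oP '(?<!\S)v[0-9]{1,2}(\.[0-9]{1,2}){2}(-[0-9]{1,2})?((-schema(-dev)?)?|(-dev(-schema)?)?)?(?!\S)'
}

# A version preceded by whitespace is extracted.
extract_bounded 'this is my v1.12.33-dev version'

# A version glued to preceding text ("rev1.12.33-dev") is rejected.
extract_bounded 'this is my rev1.12.33-dev version' || echo 'no match'
```

The first call should print v1.12.33-dev and the second should print no match, because the v is preceded by a non-whitespace character.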
