rasen58 - 4 months ago
Bash Question

Why does a space cause the remembered pattern in sed to output different things?

I'm trying to extract the value attribute from this XML line in the terminal, so I'm using sed.

abcs='<param name="abc" value="bob3" no_but_why="4"/>'

echo $abcs | sed -e 's/.*value="\(.*\)" .*/\1/'
echo $abcs | sed -e 's/.*value="\(.*\)".*/\1/'


The output is:

bob3
bob3" no_but_why="4


Why does the second command, without the space, print more than just what I wanted? Why would \1 be affected by that?

Answer

The difference comes from the greedy .* inside the capture group \(.*\) and from what the regex requires right after the group.

In the second regex, the group only has to be followed by a ". The input contains more double quotes further along (around the 4 in no_but_why="4"), and because .* is greedy it matches as much as it can, so the group runs up to the last " in the line, the one just before />.

In the first regex, the group must be followed by a " and then a space. The last place in the input where a double quote is followed by a space is right after bob3, so \(.*\) cannot extend past it and captures only bob3.
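To illustrate that the space only helps by accident, here is a small sketch using an assumed variant of the input (abcs2, which is not from the question) that has a space before />. With that input the first regex should over-match as well, since the last quote followed by a space now comes after the 4:

```shell
# Assumed variant of the input (not from the question): note the space before "/>"
abcs2='<param name="abc" value="bob3" no_but_why="4" />'

# Same regex as the first command in the question. The greedy group now
# extends to the last place where a double quote is followed by a space.
greedy=$(printf '%s\n' "$abcs2" | sed -e 's/.*value="\(.*\)" .*/\1/')
echo "$greedy"
```

This prints bob3" no_but_why="4, the same over-match as the second command in the question.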

To avoid this, use a negated character class instead of the greedy .* inside the group.

Consider these sed command examples:

sed -e 's/.*value="\([^"]*\)" .*/\1/' <<< "$abcs"
bob3

sed -e 's/.*value="\([^"]*\)".*/\1/' <<< "$abcs"
bob3

Both commands now produce the same output, bob3, because the negated character class [^"]* matches only up to the next ", not up to the very last " in the input as .* does.
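If you need this for more than one attribute, the same negated character class can be wrapped in a small function. This is only a sketch: get_attr is a hypothetical helper name, and it assumes the tag is on a single line, attribute values are double-quoted, and the attribute name is preceded by whitespace.

```shell
# Hypothetical helper (not from the original post): print the value of
# one attribute from a single-line XML tag.
#   $1 = attribute name, $2 = the XML line
get_attr() {
    # -n together with the p flag prints only when the substitution matched,
    # so a missing attribute yields empty output instead of the whole line.
    printf '%s\n' "$2" | sed -n 's/.*[[:space:]]'"$1"'="\([^"]*\)".*/\1/p'
}

abcs='<param name="abc" value="bob3" no_but_why="4"/>'
value_attr=$(get_attr value "$abcs")
name_attr=$(get_attr name "$abcs")
missing_attr=$(get_attr nope "$abcs")

echo "$value_attr"
echo "$name_attr"
```

The two echo lines print bob3 and abc. For anything beyond simple one-line tags, a real XML parser is a safer choice than sed.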