Linux Question

How do you "debug" a regular expression with sed?

I'm trying to use a regexp with sed. I've tested my regex with kiki, a GNOME application for testing regexps, and it works in kiki.

date: 2010-10-29 14:46:33 -0200; author: 00000000000; state: Exp; lines: +5 -2; commitid: bvEcb00aPyqal6Uu;


I want to replace author: 00000000000; with nothing. So I created this regexp, which works when I test it in kiki:

author:\s[0-9]{11};


But it doesn't work when I test it in sed:

sed -i "s/author:\s[0-9]{11};//g" /tmp/test_regex.txt


I know regex implementations differ, and this could be the issue. My question is: how do I at least try to "debug" what's happening with sed? Why is it not working?

Answer

My version of sed doesn't like the {11} bit. Processing the line with:

sed 's/author: [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9];//g'

works fine.

And the way I debug it is exactly what I did here. I just constructed a command:

echo 'X author: 00000000000; X' | sed ...

and removed the more advanced regex things one at a time:

  • used <space> instead of \s, didn't fix it.
  • replaced [0-9]{11} with 11 copies of [0-9], that worked.

It pretty much had to be one of those since I've used every other feature of your regex before with sed successfully.
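The elimination process described above can be scripted so that each variant is tried against the same sample line. This is only a minimal sketch: try is a hypothetical helper name, the patterns are assumed not to contain a slash, and the 11 copies of [0-9] are built in a loop purely to avoid miscounting them by hand.

```shell
#!/bin/sh
# Sample input mirroring the line from the question.
sample='X author: 00000000000; X'

# try PATTERN: report whether sed (default basic regex syntax) matched.
# A substitution that leaves the input unchanged means "no match".
# (try is a hypothetical helper name; patterns must not contain "/".)
try() {
    out=$(printf '%s\n' "$sample" | sed "s/$1//g")
    if [ "$out" = "$sample" ]; then
        printf 'no match: %s\n' "$1"
    else
        printf 'match:    %s\n' "$1"
    fi
}

# Build 11 copies of [0-9] rather than typing them out by hand.
digits=''
i=0
while [ "$i" -lt 11 ]; do
    digits="${digits}[0-9]"
    i=$((i + 1))
done

try 'author:\s[0-9]{11};'    # the original pattern
try 'author: [0-9]{11};'     # \s replaced by a literal space
try "author: ${digits};"     # {11} replaced by 11 copies of [0-9]
```

With a sed that treats unescaped braces as literal characters, as basic regular expressions do, only the last call should report a match, which points at {11} as the culprit.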

In fact, this works without the hideous 11 copies of [0-9]: you just have to escape the braces, as in [0-9]\{11\}. By default sed uses basic regular expressions, in which an unescaped { is a literal character rather than the start of a repeat count. I have to admit I didn't get around to trying that at first, since it worked okay with the multiples and I generally don't concern myself too much with brevity in sed; I tend to use it more for quick'n'dirty jobs :-)

But the brace method is a lot more concise and adaptable and it's good to know how to do it.
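For comparison, here is a short sketch of the escaped-brace form next to an extended-regex variant. It assumes a sed that accepts the -E option (GNU and BSD sed both do), and it swaps \s for the POSIX class [[:space:]] in the second command, since \s is a GNU extension rather than something every sed understands. The variable names are just for illustration.

```shell
#!/bin/sh
line='X author: 00000000000; X'

# Basic regex (sed's default): the interval braces must be escaped.
bre=$(printf '%s\n' "$line" | sed 's/author: [0-9]\{11\};//g')

# Extended regex (-E): braces act as an interval without escaping.
# [[:space:]] is used in place of \s for portability.
ere=$(printf '%s\n' "$line" | sed -E 's/author:[[:space:]][0-9]{11};//g')

# Both results should have the author field removed.
printf '%s\n' "$bre" "$ere"
```

Either form leaves the rest of the line untouched, so the choice mostly comes down to which syntax you would rather read.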
