Pablo Pablo - 2 months ago 8
Linux Question

How to output only captured groups with sed

Is there any way to tell sed to output only captured groups? For example, given the input:

This is a sample 123 text and some 987 numbers


and pattern:

/([\d]+)/


Could I get only 123 and 987 as output, formatted using back-references?

Answer

The key is to make the pattern match the whole line: the parts you don't want are matched outside the capture groups, so the replacement drops them, while the parts you do want are captured and written back with back-references.

string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'

This says:

  • use extended regular expressions (-r; BSD sed and current GNU sed spell this -E)
  • don't default to printing each line (-n)
  • exclude zero or more non-digits
  • include one or more digits
  • exclude one or more non-digits
  • include one or more digits
  • exclude zero or more non-digits
  • print the substitution (p)
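
As a sketch of how the back-references control the output format, the same pattern can be wrapped in a small function that prints the two numbers in swapped order. The function name extract_two_numbers is hypothetical, and -E is used instead of -r on the assumption that your sed accepts it (current GNU and BSD versions do):

```shell
# Hypothetical helper: prints the first two numbers of a line, swapped.
# Lines that do not contain two separate numbers produce no output,
# because -n suppresses printing and the p flag fires only on a match.
extract_two_numbers() {
  printf '%s\n' "$1" |
    sed -En 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\2 \1/p'
}

result=$(extract_two_numbers 'This is a sample 123 text and some 987 numbers')
echo "$result"   # prints: 987 123
```

Changing the replacement to something like `\1,\2` would format the same captures differently without touching the pattern.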

If you have GNU grep (it may also work in BSD, including OS X):

echo "$string" | grep -Po '\d+'

or variations such as:

echo "$string" | grep -Po '(?<=\D )(\d+)'

The -P option enables Perl Compatible Regular Expressions. See man 3 pcrepattern or man 3 pcresyntax.
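
If -P is not available, a rough fallback is to combine -o with a POSIX character class. This assumes a grep that supports -o (GNU and BSD grep both do, though POSIX does not require it); paste then joins the one-per-line matches with spaces:

```shell
string='This is a sample 123 text and some 987 numbers'

# -E: extended regular expressions; -o: print each match on its own line.
# paste -s joins those lines into one, using a space (-d ' ') as separator.
numbers=$(printf '%s\n' "$string" | grep -Eo '[[:digit:]]+' | paste -s -d ' ' -)
echo "$numbers"   # prints: 123 987
```

Unlike the sed version, this works for any number of matches per line, but it cannot reorder or reformat them the way back-references can.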