Jason - 2 months ago
Linux Question

Extract multiple values using sed

I am trying to use sed to extract multiple values from a string in which the values are separated by ",".

Working example:

Input:
echo "abc-de-aa-zzzz-1.2.3-4" | sed -E 's/(^([a-z]{3}-[a-z]{1,5}-[a-z]{1,5}-[a-z]{1,15})).*/\1/'

Output:
abc-de-aa-zzzz


I need help with the expression below:

Non-working example:

Input:
echo "abc-de-aa-zzzz-1.2.3-4,abc-de-aa-kkkk-1.2.5-4" | sed -E 's/(^([a-z]{3}-[a-z]{1,5}-[a-z]{1,5}-[a-z]{1,15})).*/\1/'

Current output:
abc-de-aa-zzzz

Desired output:
abc-de-aa-zzzz,abc-de-aa-kkkk

This output would also be acceptable:
abc-de-aa-zzzz
abc-de-aa-kkkk


Thanks,

Jason

Answer

One way is to delete only the parts that are not needed. In this case the pattern to delete is a - followed by three groups of digits separated by ., then another - and a final sequence of digits:

$ echo "abc-de-aa-zzzz-1.2.3-4,abc-de-aa-kkkk-1.2.5-4" | sed -E 's/-([0-9]+\.){2}[0-9]+-[0-9]+//g'
abc-de-aa-zzzz,abc-de-aa-kkkk
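
Because of the g flag, the deletion is applied to every version suffix on the line, so the same command should work for any number of comma-separated values. A minimal sketch with three values follows; strip_versions is a hypothetical helper name, and the third value is made-up sample data in the same format as the question's input:

```shell
# strip_versions is a hypothetical helper name, not part of the original answer.
# It deletes every "-N.N.N-N" suffix; the g flag repeats the deletion for each value.
strip_versions() {
  sed -E 's/-([0-9]+\.){2}[0-9]+-[0-9]+//g'
}

# The third value is invented sample data in the same format as the question's input.
echo "abc-de-aa-zzzz-1.2.3-4,abc-de-aa-kkkk-1.2.5-4,abc-de-aa-mmmm-10.0.12-7" | strip_versions
# prints: abc-de-aa-zzzz,abc-de-aa-kkkk,abc-de-aa-mmmm
```

The expression in the question, by contrast, is anchored with ^ and ends in .*, so it can only ever keep the first value on the line.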


Alternative solutions: extract what is required

Using grep with PCRE (the -P option of GNU grep)

$ echo "abc-de-aa-zzzz-1.2.3-4,abc-de-aa-kkkk-1.2.5-4" | grep -oP '(^|,)\K([^-]+\-){3}[^-]+'
abc-de-aa-zzzz
abc-de-aa-kkkk

Using GNU sed

$ echo "abc-de-aa-zzzz-1.2.3-4,abc-de-aa-kkkk-1.2.5-4" | sed 's/,/\n/g' | sed -E 's/^(([^-]+\-){3}[^-]+).*/\1/'
abc-de-aa-zzzz
abc-de-aa-kkkk
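
The extraction can probably also be done in a single sed call that keeps the comma-separated form. This is a sketch, not part of the original answer: first_four_fields is a hypothetical helper name, and it assumes every value has at least four "-"-separated fields before the version suffix.

```shell
# first_four_fields is a hypothetical helper name used only for this sketch.
# For each comma-separated value, keep the first four "-"-separated fields (\1)
# and drop the rest of that value; [^,]* stops at the next comma.
first_four_fields() {
  sed -E 's/(([^-,]+-){3}[^-,]+)[^,]*/\1/g'
}

echo "abc-de-aa-zzzz-1.2.3-4,abc-de-aa-kkkk-1.2.5-4" | first_four_fields
# prints: abc-de-aa-zzzz,abc-de-aa-kkkk
```

Excluding the comma from the bracket expressions keeps each match inside one value, which is what lets the g flag process the values one after another.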


If you need to combine the output into a single line delimited by ,:

$ echo "abc-de-aa-zzzz-1.2.3-4,abc-de-aa-kkkk-1.2.5-4" | grep -oP '(^|,)\K([^-]+\-){3}[^-]+' | paste -s -d,
abc-de-aa-zzzz,abc-de-aa-kkkk
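
If regular expressions are not required, the same split, extract, and join steps can be sketched with tr, cut and paste. This is an alternative to the answer's commands rather than part of it; name_fields is a hypothetical helper name, and the sketch assumes the wanted part is always the first four "-"-separated fields.

```shell
# name_fields is a hypothetical helper name used only for this sketch.
# Split on commas, keep "-"-separated fields 1 to 4 of each line, then rejoin with commas.
# The trailing "-" tells paste to read standard input.
name_fields() {
  tr ',' '\n' | cut -d- -f1-4 | paste -s -d, -
}

echo "abc-de-aa-zzzz-1.2.3-4,abc-de-aa-kkkk-1.2.5-4" | name_fields
# prints: abc-de-aa-zzzz,abc-de-aa-kkkk
```

Dropping the final paste step leaves one value per line, which matches the second output format the question accepts.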