Kris Kris - 3 months ago 12
Linux Question

How to extract multiple strings from a line using a sed regex in Linux and write them to a file?

I have an XML file with multiple lines like the ones below (I only care about the lines that start with SOURCE):

SOURCE BUSINESSNAME ="" DATABASETYPE ="Oracle" DBDNAME ="OrclExp11g" DESCRIPTION ="" NAME ="EMPLOYEES" OBJECTVERSION ="1"

SOURCE BUSINESSNAME ="" DATABASETYPE ="Oracle" DBDNAME ="OrclExp11g" DESCRIPTION ="" NAME ="HR" OBJECTVERSION ="1"


From every line that starts with SOURCE, I need to extract three values (DATABASETYPE, DBDNAME and NAME) and write them to another file, like this:

Oracle,OrclExp11g,EMPLOYEES

Oracle,OrclExp11g,HR

sed -n -e '/SOURCE /p' InputFile.XML | sed -r 's/.* NAME \=\"(.+)\" OBJECTVERSION \=\".*/\1/' > $Source_List.Out


I am new to sed, and so far I have only managed to extract one string (the NAME value) with the command above. I would really appreciate help getting all three strings out.
Thanks so much in advance!

Answer

As you guessed, sed is your friend. Wrap each value you want in a capture group, (...), and refer to the groups in the replacement as \1, \2 and so on:

$ sed -nE '/SOURCE/{s/^.*DATABASETYPE ="([^"]*)".*DBDNAME ="([^"]*)".*NAME ="([^"]*)".*$/\1,\2,\3/;p}' file >outputfile

Output

$ cat outputfile
Oracle,OrclExp11g,EMPLOYEES
Oracle,OrclExp11g,HR
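
To try this without the original XML, the sketch below writes the two sample lines from the question (plus one unrelated line) to a scratch file and runs the same command on it. The file names sample.xml and sources.out are placeholders, and GNU sed is assumed, as on most Linux systems.

```shell
#!/bin/sh
# Sample input: the two SOURCE lines from the question plus one unrelated line.
# sample.xml and sources.out are placeholder names for this sketch.
cat > sample.xml <<'EOF'
SOURCE BUSINESSNAME ="" DATABASETYPE ="Oracle" DBDNAME ="OrclExp11g" DESCRIPTION ="" NAME ="EMPLOYEES" OBJECTVERSION ="1"
TARGET BUSINESSNAME ="" DATABASETYPE ="Oracle" NAME ="IGNORED"
SOURCE BUSINESSNAME ="" DATABASETYPE ="Oracle" DBDNAME ="OrclExp11g" DESCRIPTION ="" NAME ="HR" OBJECTVERSION ="1"
EOF

# On lines containing SOURCE, capture the three quoted values,
# replace the whole line with them joined by commas, then print it.
sed -nE '/SOURCE/{s/^.*DATABASETYPE ="([^"]*)".*DBDNAME ="([^"]*)".*NAME ="([^"]*)".*$/\1,\2,\3/;p}' sample.xml > sources.out

cat sources.out
```

The TARGET line never matches /SOURCE/, so it is skipped and only the two comma-separated lines end up in sources.out.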

Notes

  • -E enables extended regular expressions (GNU sed also accepts -r, as used in the question).
  • -n suppresses sed's default output, so only the lines explicitly printed with p appear.
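
One consequence of pairing -n with a separate p command: a SOURCE line that does not match the whole pattern (for instance, one with no DBDNAME attribute) is still printed, unchanged. A possible variation, sketched here rather than taken from the answer, moves p onto the substitution as a flag so that only lines where the replacement succeeded are printed. The function name extract_sources is made up for this example, and GNU sed is assumed.

```shell
#!/bin/sh
# extract_sources (hypothetical helper): read lines on stdin and print
# DATABASETYPE,DBDNAME,NAME for each SOURCE line that has all three attributes.
extract_sources() {
    # The trailing "p" flag prints a line only if the substitution matched,
    # so together with -n, incomplete SOURCE lines are dropped.
    # The space before NAME keeps the pattern from matching a longer
    # attribute name that merely ends in NAME.
    sed -nE '/SOURCE/s/^.*DATABASETYPE ="([^"]*)".*DBDNAME ="([^"]*)".* NAME ="([^"]*)".*$/\1,\2,\3/p'
}

# The second line lacks DBDNAME, so it produces no output.
printf '%s\n' \
    'SOURCE DATABASETYPE ="Oracle" DBDNAME ="OrclExp11g" NAME ="HR" OBJECTVERSION ="1"' \
    'SOURCE DATABASETYPE ="Oracle" NAME ="BROKEN"' |
    extract_sources
```

Whether to drop or keep malformed lines depends on the use case; printing them unchanged, as the original command does, makes bad input easier to spot in the output file.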