Pete - 26 days ago
Bash Question

Extract XML Value in bash script

I'm trying to extract a value from an XML document that has been read into my script as a variable. The variable, $data, contains:

<item>
<title>15:54:57 - George:</title>
<description>Diane DeConn? You saw Diane DeConn!</description>
</item>
<item>
<title>15:55:17 - Jerry:</title>
<description>Something huh?</description>
</item>


and I wish to extract the first title value, so

15:54:57 - George:


I've been using the sed command:

title=$(sed -n -e 's/.*<title>\(.*\)<\/title>.*/\1/p' <<< $data)


but this only outputs the second title value:

15:55:17 - Jerry:


Does anyone know what I have done wrong?
Thanks!

Answer

As Charles Duffey has stated, XML is best parsed with a proper XML parsing tool. For a one-time job, the following should work.

grep -oPm1 "(?<=<title>)[^<]+"

Test:

$ echo "$data"
<item> 
  <title>15:54:57 - George:</title>
  <description>Diane DeConn? You saw Diane DeConn!</description> 
</item> 
<item> 
  <title>15:55:17 - Jerry:</title> 
  <description>Something huh?</description>
</item>
$ title=$(grep -oPm1 "(?<=<title>)[^<]+" <<< "$data")
$ echo "$title"
15:54:57 - George:
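
The answer does not say why the original sed command printed the second title, so here is a hedged guess. With $data unquoted after <<<, older bash releases (before 4.4, as far as I know) word-split the value and join it into a single line. The greedy leading .* then consumes everything up to the last <title>. Quoting the variable keeps the lines separate, and quitting after the first match limits the output to one title. The sketch below assumes each title sits on its own line, as in the sample data. It uses printf and a pipe so that it also runs under plain sh; in bash, <<< "$data" should behave the same.

```shell
#!/bin/sh
# Sample input copied from the question.
data='<item>
<title>15:54:57 - George:</title>
<description>Diane DeConn? You saw Diane DeConn!</description>
</item>
<item>
<title>15:55:17 - Jerry:</title>
<description>Something huh?</description>
</item>'

# On the first line that contains <title>: strip the tags,
# print the captured text, then quit so later titles are never read.
title=$(printf '%s\n' "$data" |
  sed -n '/<title>/{s/.*<title>\(.*\)<\/title>.*/\1/p;q;}')

echo "$title"
```

This is still line-based text matching, not XML parsing, so it shares the limits of the grep approach: a title split across lines or wrapped in CDATA would not be handled.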