Jeff82 - 4 months ago
Bash Question

Trying to figure out how to convert a function to accept piped stdin

I am working on a way to easily parse XML in bash for a specific purpose. I got this working with some code I found on this site, and then rewrote everything else around it because that code worked so well. It currently uses a function, and the data has to be in a file for me to process it. Here it is in its working state:

[ ~]$ cat testxml.xml
<!DOCTYPE PARTS SYSTEM "parts.dtd">
<?xml-stylesheet type="text/css" href="xmlpartsstyle.css"?>
<PARTS>
<TITLE>Computer Parts</TITLE>
<PART>
<ITEM>Motherboard</ITEM>
<MANUFACTURER>ASUS</MANUFACTURER>
<MODEL>P3B-F</MODEL>
<COST> 123.00</COST>
</PART>
<PART>
<ITEM>Video Card</ITEM>
<MANUFACTURER>ATI</MANUFACTURER>
<MODEL>All-in-Wonder Pro</MODEL>
<COST> 160.00</COST>
</PART>
<PART>
<ITEM>Sound Card</ITEM>
<MANUFACTURER>Creative Labs</MANUFACTURER>
<MODEL>Sound Blaster Live</MODEL>
<COST> 80.00</COST>
</PART>
<PART>
<ITEM> 20 inch Monitor</ITEM>
<MANUFACTURER>LG Electronics</MANUFACTURER>
<MODEL> 995E</MODEL>
<COST> 290.00</COST>
</PART>
</PARTS>

[ ~]$
[ ~]$ rdom () { local IFS=\> ; read -d \< E C ;} ; while rdom; do if [[ $E = 'PART' ]] || [[ $E = 'ITEM' ]] || [[ $E = 'COST' ]] ; then echo $E: $C ; fi ; done < testxml.xml | xargs -L3
PART: ITEM: Motherboard COST: 123.00
PART: ITEM: Video Card COST: 160.00
PART: ITEM: Sound Card COST: 80.00
PART: ITEM: 20 inch Monitor COST: 290.00
[ ~]$
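To see what rdom actually hands the loop, here is a minimal sketch (assuming bash; show_pairs is a made-up helper name) that prints each E/C pair with printf %q so newlines are visible. Each read -d \< stops at the next '<', and IFS=\> splits that chunk at the first '>' into the tag text (E) and whatever follows it (C).

```bash
#!/usr/bin/env bash
# Same function as in the question: read up to the next '<', then split the
# chunk at the first '>' into the tag text (E) and the text after it (C).
rdom () { local IFS=\> ; read -d \< E C ;}

# Hypothetical helper: print each (E, C) pair, quoted with %q so that
# empty strings and embedded newlines are visible.
show_pairs () {
  while rdom; do
    printf 'E=%q C=%q\n' "$E" "$C"
  done
}

printf '<PART>\n<ITEM>Motherboard</ITEM>\n</PART>\n' | show_pairs
```

On this input it should print four pairs: an empty one for the (empty) text before the first '<', PART with a lone newline as its content, ITEM with Motherboard, and /ITEM. The trailing /PART> fragment is read but never printed: read reaches end of input without finding another '<' and returns a non-zero status, and because read is rdom's last command, that status is what makes while rdom stop.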


As you can see, this pulls out the data I am looking for, and I can reformat it to suit my needs. However, I would much prefer to have it accept input from stdin, such as the following:

cat out.xml2 | IFS=\> ; until [ EOF ]; do read -d \< E C ; if [[ $E = 'PART' ]] || [[ $E = 'ITEM' ]] || [[ $E = 'COST' ]] ; then echo $E: $C ; fi ; done;


This code never ends the loop. Maybe this is impossible; I just don't understand how the first loop ends, since its condition is simply "rdom". I've tried this with a while loop and other variations, but I'm not sure how to detect that there is no more data so that the loop can end. I feel like there may be a much better way to restructure this that I'm completely missing. I like being able to use stdin because it makes one-liners easy. The actual data I am parsing is much larger and multi-dimensional; I created this example for testing, and the first example does work with the large data. In the end, I am trying to get this to parse from stdin rather than from a file. Any recommendations are much appreciated.

Jeff

Answer

Try:

$ rdom() { local IFS=\> ; while read -d \< E C ; do if [[ $E = 'PART' ]] || [[ $E = 'ITEM' ]] || [[ $E = 'COST' ]] ; then  echo $E: $C ; fi ; done; }
$ rdom <out.xml2
PART: 

ITEM: Motherboard
COST:  123.00
PART: 

ITEM: Video Card
COST:  160.00
PART: 

ITEM: Sound Card
COST:  80.00
PART: 

ITEM:  20 inch Monitor
COST:  290.00

Or, without using the function definition but still taking input from stdin:

{ IFS=\> ; while read -d \< E C ; do if [[ $E = 'PART' ]] || [[ $E = 'ITEM' ]] || [[ $E = 'COST' ]] ; then  echo $E: $C ; fi ; done; } <out.xml2

Because the question does not show desired output, I don't know if this is what you want.
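One side effect worth knowing: in the brace-group version, IFS=\> is an ordinary assignment in the current shell, so when the input comes from a redirection (as with <out.xml2) IFS stays set to > afterwards. A minimal sketch, assuming bash, that limits the change to read alone by writing the assignment as a prefix to the command. The name parse_parts is hypothetical, and -r (which keeps backslashes in the data literal) is an addition not present in the commands above.

```bash
#!/usr/bin/env bash
# Same filter as the answer, but IFS='>' is a prefix assignment, so it applies
# only to this read command and the shell's IFS is unchanged afterwards.
parse_parts () {
  local E C
  while IFS='>' read -r -d '<' E C; do
    case "$E" in
      PART|ITEM|COST) echo "$E: $C" ;;
    esac
  done
}

# Sample input without newlines, so no blank lines appear after "PART:".
printf '<PART><ITEM>Motherboard</ITEM><COST> 123.00</COST></PART>' | parse_parts
```

Both parse_parts <out.xml2 and cat out.xml2 | parse_parts should behave the same way.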

Some comments:

  1. cat out.xml2 | IFS=\> ; sends the text of out.xml2 to the variable assignment IFS=\>. After the variable assignment completes, the text is discarded.

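  2. until [ EOF ]; do read -d \< E C ; ... does not do what you want. In shell, the string EOF is just three characters, and [ EOF ] only tests whether that string is non-empty, which is always true. By contrast, while read -d \< E C ; do ... will stop when the input is exhausted, because read returns a non-zero status at end of input.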
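Both comments can be checked directly. A minimal bash sketch (count_chunks is a made-up name): the first two parts show that [ EOF ] always succeeds and that an assignment on the right-hand side of a pipe leaves the current shell's IFS alone; the last part counts how many times a while read loop body runs before end of input stops it.

```bash
#!/usr/bin/env bash
# Comment 2: [ EOF ] only checks that the string "EOF" is non-empty, so it
# always succeeds; `until [ EOF ]` therefore never runs its body.
if [ EOF ]; then echo '[ EOF ] is always true'; fi

# Comment 1: bash runs each part of a pipeline in a subshell by default,
# so this assignment does not change IFS in the current shell.
true | IFS='>'          # stands in for: cat out.xml2 | IFS=\>
[[ "$IFS" == $' \t\n' ]] && echo 'IFS is still the default'

# read returns non-zero when it hits end of input before finding its
# delimiter; that status is what ends `while read ...`.
count_chunks () {
  local n=0 chunk
  while read -r -d '<' chunk; do
    n=$((n + 1))
  done
  echo "$n"
}

printf 'a<b<c' | count_chunks   # prints 2: "c" is read, but read fails at EOF
```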

Examples with piping

To demonstrate that the above commands work with piping, not just redirection from a file, try:

cat out.xml2 | rdom

Or:

cat out.xml2 | { IFS=\> ; while read -d \< E C ; do if [[ $E = 'PART' ]] || [[ $E = 'ITEM' ]] || [[ $E = 'COST' ]] ; then  echo $E: $C ; fi ; done; }
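On the title question: a function reads the stdin it was invoked with, so rdom needs no changes to sit at the end of a pipeline. If you also want it to accept a file name, one common bash idiom is to redirect the loop from "${1:-/dev/stdin}". A minimal sketch; parts_from is a hypothetical name, and the sketch relies on bash treating /dev/stdin specially in redirections (it duplicates file descriptor 0).

```bash
#!/usr/bin/env bash
# Reads the file named in $1 if one is given, otherwise reads stdin.
parts_from () {
  local E C
  while IFS='>' read -r -d '<' E C; do
    case "$E" in
      PART|ITEM|COST) echo "$E: $C" ;;
    esac
  done < "${1:-/dev/stdin}"
}

# Usage (out.xml2 as in the answer):
#   parts_from out.xml2
#   cat out.xml2 | parts_from
```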

Alternative output format

Continuing with the use of cat as a stand-in for a pipeline:

$ cat out.xml2 | { IFS=\> ; while read -d \< E C ; do case "$E" in PART) printf "%s:" "$E";; ITEM) printf " %s: %s" "$E" "$C";; COST) printf " %s: %s\n" "$E" "$C";; esac ; done; }
PART: ITEM: Motherboard COST:  123.00
PART: ITEM: Video Card COST:  160.00
PART: ITEM: Sound Card COST:  80.00
PART: ITEM:  20 inch Monitor COST:  290.00
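The values above keep the leading space from the XML (COST:  123.00), whereas the xargs pipeline in the question trimmed it. A minimal bash sketch that trims leading and trailing whitespace with parameter expansion before printing; parts_lines is a made-up name, and the printf formats are the ones from the case statement above.

```bash
#!/usr/bin/env bash
# Variant of the case-based loop that trims whitespace around each value.
parts_lines () {
  local E C
  while IFS='>' read -r -d '<' E C; do
    # Remove leading whitespace, then trailing whitespace.
    C="${C#"${C%%[![:space:]]*}"}"
    C="${C%"${C##*[![:space:]]}"}"
    case "$E" in
      PART) printf '%s:' "$E" ;;
      ITEM) printf ' %s: %s' "$E" "$C" ;;
      COST) printf ' %s: %s\n' "$E" "$C" ;;
    esac
  done
}

printf '<PART>\n<ITEM> 20 inch Monitor</ITEM>\n<COST> 290.00</COST>\n</PART>\n' | parts_lines
# prints: PART: ITEM: 20 inch Monitor COST: 290.00
```

Trimming inside the loop avoids the extra xargs process, and it only touches the text between tags, so values with internal spaces such as "20 inch Monitor" are left intact.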