user2570205 - 6 months ago
Bash Question

Remove string until the second occurrence in shell script

My shell script does the following:

grep '<record' /data/error/usage_20160422_165920.lerr.xml|sed -e 's/&apos;//g'|cut -d ';' -f1,40,43,46


The result looks like this:

<record record_no = "1" error_code="101">;RevShare-2.txt;TWN;1


I want to strip the XML tag prefix
<record record_no = "1" error_code=
(along with the quotes and the closing >), so that the result looks like
101;RevShare-2.txt;TWN;1

The record numbers are dynamic.

EDIT: I have added
cut -d '=' -f3|tr -d '",>'
to the end of the pipeline to achieve this. It works, but it takes 3 seconds to process a file containing 20,000 records, and I receive 500 files daily. Is there a faster way to do this?

EDIT: Here is what a record looks like.

<record record_no = "1" error_code="101">&apos;&apos;;&apos;25467&apos;;&apos;&apos;;&apos;&apos;;&apos;FIRSTNAME&apos;;&apos;Manikin&apos;;&apos;1234001&apos;;&apos;12484254823&apos;;&apos;&apos;;&apos;&apos;;&apos;&apos;;&apos;103&apos;;&apos;12484254815&apos;;&apos;XXXXX9680&apos;;&apos;OFX&apos;;&apos;0&apos;;&apos;1028000002130745&apos;;&apos;20160422&apos;;&apos;0000&apos;;&apos;25467&apos;;&apos;20160422&apos;;&apos;Y&apos;;&apos;&apos;;&apos;&apos;;&apos;6&apos;;&apos;2&apos;;&apos;1&apos;;&apos;0&apos;;&apos;&apos;;&apos;263&apos;;&apos;99&apos;;&apos;N&apos;;&apos;&apos;;&apos;Idverifyprod@50&apos;;&apos;136&apos;;&apos;7, 74, 77, 80, 105, 136, 153&apos;;&apos;0&apos;;&apos;&apos;;&apos;501&apos;;&apos;RevShare-2.txt&apos;;&apos;20160422165920&apos;;&apos;000009680&apos;;&apos;TWN&apos;;&apos;1449587762538&apos;;&apos;1&apos;;&apos;1&apos;;&apos;0&apos;;&apos;&apos;;&apos;Verifier&apos;
</record>

Answer
$ awk '
BEGIN { FS=OFS=";" }
/<record/ {
    gsub(/&apos;/,"")
    gsub(/.*="|">/,"",$1)
    print $1, $40, $43, $46
}
' /data/error/usage_20160422_165920.lerr.xml
101;RevShare-2.txt;TWN;1
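
Since awk accepts several input files in one invocation, the same program could probably be run once over a whole batch instead of starting a grep/sed/cut/tr pipeline per file. The following is only a sketch under assumptions: the function name extract_fields and the generated sample data (placeholder values f2 to f46) are made up for illustration, and whether this actually reduces the 3 seconds per file has not been measured here.

```shell
#!/bin/sh
# Hypothetical helper: print error_code plus fields 40, 43 and 46
# for every <record ...> line in the files given as arguments.
extract_fields() {
    awk '
    BEGIN { FS = OFS = ";" }
    /<record/ {
        gsub(/&apos;/, "")          # drop the escaped quotes; awk re-splits the line
        gsub(/.*="|">/, "", $1)     # keep only the error_code value in field 1
        print $1, $40, $43, $46
    }
    ' "$@"
}

# Build a small sample file (assumed layout: 46 fields separated by ";").
sample=$(mktemp)
{
    printf '<record record_no = "1" error_code="101">&apos;&apos;'
    i=2
    while [ "$i" -le 46 ]; do
        printf ';&apos;f%d&apos;' "$i"
        i=$((i + 1))
    done
    printf '\n</record>\n'
} > "$sample"

# A single awk process handles any number of files (here the sample, twice).
result=$(extract_fields "$sample" "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With real data the call would look something like extract_fields /data/error/*.lerr.xml, assuming all of the files share that directory and extension.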