hatlord - 1 month ago
Ruby Question

Using Ruby and Nokogiri to Parse XML

I'm trying to parse an XML document using Ruby and Nokogiri, and one part has me stumped. The document is essentially the output of a firewall configuration. What I am trying to do is build a hash of firewall rules, which I will later output to CSV, the console, or whatever else I need.
I have cut down the XML to show my issue:

`<table index="44" title=" from PUBLIC to DMZ administrative service rules on Firewall01" ref="FILTER.BLACKLIST">
<headings>
<heading>Rule</heading>
<heading>Action</heading>
<heading>Source</heading>
<heading>Destination</heading>
<heading>Service</heading>
<heading>Log</heading>
</headings>
<tablebody>
<tablerow>
<tablecell><item>test_inbound</item></tablecell>
<tablecell><item>Allow</item></tablecell>
<tablecell><item gotoref="CONFIG.3.452">[Group] test_b2_group</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>Yes</item></tablecell>
</tablerow>
<tablerow>
<tablecell><item>host02_inbound</item></tablecell>
<tablecell><item>Allow</item></tablecell>
<tablecell><item gotoref="CONFIG.3.447">[Group] host02_group</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>Yes</item></tablecell>
</tablerow>
<tablerow>
<tablecell><item>randomhost</item></tablecell>
<tablecell><item>Allow</item></tablecell>
**<tablecell><item gotoref="CONFIG.3.383">[Group] Host_group_2</item><item gotoref="CONFIG.3.382">[Group] another_server</item></tablecell>**
<tablecell><item gotoref="CONFIG.3.510">[Group] crazy_application</item><item gotoref="CONFIG.3.511">[Group] internal_app</item><item gotoref="CONFIG.3.525">[Group] online_application</item></tablecell>
<tablecell><item gotoref="CONFIG.3.783">[Group] junos-https</item></tablecell>
<tablecell><item>No</item></tablecell>
</tablerow>
</tablebody>
</table>`


So what we have is the column headings and three firewall rules.

Here is my cut-down code:

#!/usr/bin/env ruby

require 'nokogiri'
require 'csv'

fwpol = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
rule_array = []

fwpol.xpath('./table/tablebody/tablerow').each do |item|
  rules = {}

  rules[:name] = item.xpath('./tablecell/item')[0].text
  rules[:action] = item.xpath('./tablecell/item')[1].text
  rules[:source] = item.xpath('./tablecell/item')[2].text
  rule_array << rules
end

puts rule_array


The first two hash entries, :name and :action, work perfectly because there is only ever one value in those fields. If you run the code, though, you will see that it does not print every value where a cell contains more than one item. I have made one line of the XML bold to show which cell I am referring to. I realise that I need to iterate over the values somehow, but so far my attempts have been fruitless.

Any help greatly appreciated!

Answer

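You can collect the text of multiple elements into an Array as follows. Selecting each cell by position (tablecell[3] is the third cell of the row) returns every item inside that cell, and map(&:text) turns those items into an array of strings.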

require 'nokogiri'
require 'csv'

fwpol = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
rule_array = []

fwpol.xpath('./table/tablebody/tablerow').each do |item|
  rules = {}

  rules[:name]   = item.xpath('./tablecell[1]/item').text
  rules[:action] = item.xpath('./tablecell[2]/item').text
  rules[:source] = item.xpath('./tablecell[3]/item').map(&:text)
  rule_array << rules
end

puts rule_array

The output is:

{:name=>"test_inbound", :action=>"Allow", :source=>["[Group] test_b2_group"]}
{:name=>"host02_inbound", :action=>"Allow", :source=>["[Group] host02_group"]}
{:name=>"randomhost", :action=>"Allow", :source=>["[Group] Host_group_2", "[Group] another_server"]}
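Since the question mentions writing the rules out to CSV later, here is a minimal sketch of one way to do that with the standard csv library. It assumes the hashes have the shape shown in the output above. The rule_to_row helper and the "; " separator are illustrative choices of mine, not part of the original code. Multi-value cells are joined into a single field so that each rule stays on one CSV line.

```ruby
require 'csv'

# Sample data copied from the output above; in the real script this
# would be the rule_array built inside the Nokogiri loop.
rule_array = [
  { name: 'test_inbound',   action: 'Allow', source: ['[Group] test_b2_group'] },
  { name: 'host02_inbound', action: 'Allow', source: ['[Group] host02_group'] },
  { name: 'randomhost',     action: 'Allow',
    source: ['[Group] Host_group_2', '[Group] another_server'] }
]

# Hypothetical helper: turn one rule hash into a flat CSV row.
# Array(value) wraps a plain String in an array and leaves an Array
# untouched, so single-value and multi-value fields are handled alike.
def rule_to_row(rule, separator = '; ')
  rule.values.map { |value| Array(value).join(separator) }
end

csv_text = CSV.generate do |csv|
  # Header row taken from the keys of the first rule.
  csv << rule_array.first.keys.map(&:to_s)
  rule_array.each { |rule| csv << rule_to_row(rule) }
end

puts csv_text
```

To write to a file instead of a string, CSV.open('rules.csv', 'w') takes the same block. The file name here is only an example.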