hatlord - 1 month ago
Ruby Question

How to use Ruby and Nokogiri to parse XML

This document is the output from a firewall configuration. I am trying to build a hash of firewall rules. I will later output this data to CSV/console/whatever I need:

<table index="44" title=" from PUBLIC to DMZ administrative service rules on Firewall01" ref="FILTER.BLACKLIST">
<headings>
<heading>Rule</heading>
<heading>Action</heading>
<heading>Source</heading>
<heading>Destination</heading>
<heading>Service</heading>
<heading>Log</heading>
</headings>
<tablebody>
<tablerow>
<tablecell><item>test_inbound</item></tablecell>
<tablecell><item>Allow</item></tablecell>
<tablecell><item gotoref="CONFIG.3.452">[Group] test_b2_group</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>Yes</item></tablecell>
</tablerow>
<tablerow>
<tablecell><item>host02_inbound</item></tablecell>
<tablecell><item>Allow</item></tablecell>
<tablecell><item gotoref="CONFIG.3.447">[Group] host02_group</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>Yes</item></tablecell>
</tablerow>
<tablerow>
<tablecell><item>randomhost</item></tablecell>
<tablecell><item>Allow</item></tablecell>
**<tablecell><item gotoref="CONFIG.3.383">[Group] Host_group_2</item><item gotoref="CONFIG.3.382">[Group] another_server</item></tablecell>**
<tablecell><item gotoref="CONFIG.3.510">[Group] crazy_application</item><item gotoref="CONFIG.3.511">[Group] internal_app</item><item gotoref="CONFIG.3.525">[Group] online_application</item></tablecell>
<tablecell><item gotoref="CONFIG.3.783">[Group] junos-https</item></tablecell>
<tablecell><item>No</item></tablecell>
</tablerow>
</tablebody>
</table>


The table contains the column headings and three firewall rules.

Here is my code:

#!/usr/bin/env ruby

require 'nokogiri'
require 'csv'

fwpol = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
rule_array = []

fwpol.xpath('./table/tablebody/tablerow').each do |item|
  rules = {}

  rules[:name] = item.xpath('./tablecell/item')[0].text
  rules[:action] = item.xpath('./tablecell/item')[1].text
  rules[:source] = item.xpath('./tablecell/item')[2].text
  rule_array << rules
end

puts rule_array


The first two hash entries, :name and :action, work perfectly, because there is only one value in those fields.

When I run the code, cells that contain multiple values are not printed in full: only the first <item> of the cell shows up. The bolded XML line shows what I am referring to. I need to iterate over the values somehow, but so far my attempts have been fruitless.

Answer

You can collect the text of multiple elements into an Array by selecting each <tablecell> by position and mapping over its <item> nodes:

require 'nokogiri'
require 'csv'

fwpol = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
rule_array = []

fwpol.xpath('./table/tablebody/tablerow').each do |item|
  rules = {}

  rules[:name]   = item.xpath('./tablecell[1]/item').text
  rules[:action] = item.xpath('./tablecell[2]/item').text
  rules[:source] = item.xpath('./tablecell[3]/item').map(&:text)
  rule_array << rules
end

puts rule_array

The output is:

{:name=>"test_inbound", :action=>"Allow", :source=>["[Group] test_b2_group"]}
{:name=>"host02_inbound", :action=>"Allow", :source=>["[Group] host02_group"]}
{:name=>"randomhost", :action=>"Allow", :source=>["[Group] Host_group_2", "[Group] another_server"]}