ssharma ssharma - 4 months ago 17
Ruby Question

How to fetch attributes from XML

I have an XML file like:

<?xml version="1.0" encoding="UTF-8"?>
<bulkCmConfigDataFile xmlns:es="EricssonSpecificAttributes.17.08.xsd"
xmlns:un="utranNrm.xsd" xmlns:xn="genericNrm.xsd"
xmlns:gn="geranNrm.xsd" xmlns="configData.xsd">
<fileHeader fileFormatVersion="32.615 V4.5" vendorName="Ericsson"/>
<configData dnPrefix="Undefined">
<xn:SubNetwork id="ONRM_ROOT_MO_R">
<xn:SubNetwork id="MKT_9364">
<xn:MeContext id="936426_SEYMOUR">
</xn:MeContext>
</xn:SubNetwork>
</xn:SubNetwork>
</configData>
<fileFooter dateTime="2017-05-08T10:15:53Z"/>
</bulkCmConfigDataFile>


I want to grab all the attributes from the file. I can get @doc.at('fileHeader')['vendorName'], but I am not getting the expected result, ONRM_ROOT_MO_R, for the second puts statement.

Here is my Ruby code:

#!/usr/bin/env ruby

require 'xmlsimple'
require 'nokogiri'
require 'ap'


@doc = Nokogiri::XML(File.open("seymour.xml"))
puts @doc.at('fileHeader')['vendorName']
puts @doc.at('xn:SubNetwork')['id']


The output is:

Ericsson
./bulk_cm_parse.rb:10:in `<main>': undefined method `[]' for nil:NilClass (NoMethodError)

Answer Source

Your document uses namespaces, so you have to take those into account, and your selector used the wrong namespace syntax:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<bulkCmConfigDataFile xmlns:es="EricssonSpecificAttributes.17.08.xsd"
xmlns:un="utranNrm.xsd" xmlns:xn="genericNrm.xsd"
xmlns:gn="geranNrm.xsd" xmlns="configData.xsd">
<fileHeader fileFormatVersion="32.615 V4.5" vendorName="Ericsson"/>
<configData dnPrefix="Undefined">
    <xn:SubNetwork id="ONRM_ROOT_MO_R">
        <xn:SubNetwork id="MKT_9364">
            <xn:MeContext id="936426_SEYMOUR">
            </xn:MeContext>
        </xn:SubNetwork>
    </xn:SubNetwork>
</configData>
<fileFooter dateTime="2017-05-08T10:15:53Z"/>
</bulkCmConfigDataFile>
EOT

namespaces = doc.collect_namespaces
doc.at('xn|SubNetwork', namespaces)['id'] # => "ONRM_ROOT_MO_R"

at, like search, tries to figure out whether you're using a CSS selector or an XPath expression. Your selector didn't have the usual XPath earmarks, so Nokogiri assumed you meant CSS, but the namespace wasn't delimited correctly for CSS, which puts | rather than : between the prefix and the element name.

xn:SubNetwork isn't correct for XPath either, though, because you need to tell Nokogiri where to look in the document. // means "search everywhere" in XPath-ese, and the leading // also lets Nokogiri determine that it should use XPath:

doc.at('//xn:SubNetwork', namespaces)['id'] # => "ONRM_ROOT_MO_R"

Read the "Namespaces" section of Nokogiri's "Searching a XML/HTML Document" tutorial, through to the end of the page, along with the collect_namespaces documentation.