StoutPanda StoutPanda - 5 months ago 42
Ruby Question

How to pull data from tags based on other tags

I have the following example document:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<n1:Form109495CTransmittalUpstream xmlns="urn:us:gov:treasury:irs:ext:aca:air:7.0" xmlns:irs="urn:us:gov:treasury:irs:common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:us:gov:treasury:irs:msg:form1094-1095Ctransmitterupstreammessage IRS-Form1094-1095CTransmitterUpstreamMessage.xsd" xmlns:n1="urn:us:gov:treasury:irs:msg:form1094-1095Ctransmitterupstreammessage">
<Form1095CUpstreamDetail RecordType="String" lineNum="1">
<RecordId>1</RecordId>
<CorrectedInd>0</CorrectedInd>
<irs:TaxYr>2015</irs:TaxYr>
<EmployeeInfoGrp>
<OtherCompletePersonName>
<PersonFirstNm>JOHN</PersonFirstNm>
<PersonMiddleNm>B</PersonMiddleNm>
<PersonLastNm>Doe</PersonLastNm>
</OtherCompletePersonName>
<PersonNameControlTxt/>
<irs:TINRequestTypeCd>INDIVIDUAL_TIN</irs:TINRequestTypeCd>
<irs:SSN>123456790</irs:SSN>
</Form1095CUpstreamDetail>
<Form1095CUpstreamDetail RecordType="String" lineNum="1">
<RecordId>2</RecordId>
<CorrectedInd>0</CorrectedInd>
<irs:TaxYr>2015</irs:TaxYr>
<EmployeeInfoGrp>
<OtherCompletePersonName>
<PersonFirstNm>JANE</PersonFirstNm>
<PersonMiddleNm>B</PersonMiddleNm>
<PersonLastNm>DOE</PersonLastNm>
</OtherCompletePersonName>
<PersonNameControlTxt/>
<irs:TINRequestTypeCd>INDIVIDUAL_TIN</irs:TINRequestTypeCd>
<irs:SSN>222222222</irs:SSN>
</EmployeeInfoGrp>
</Form1095CUpstreamDetail>
</n1:Form109495CTransmittalUpstream>


Using Nokogiri I want to extract the value between the
<PersonFirstNm>
,
<PersonLastNm>
and
<irs>:SSN
for each
<Form1095CUpstreamDetail>
based on the
<RecordId>
.

I tried removing namespaces as well. I posted a small snippet, but I have tried many iterations of working through the XML with no success. This is my first time using XML, so I realize I am likely missing something easy.

When I set my XPath:

require 'nokogiri'
submission_doc = Nokogiri::XML(open('1094C_Request.xml'))
submissions = submission_doc.remove_namespaces
nodes = submission.xpath('//Form1095CUpstreamDetail')


I do not seem to have any association between the
RecordId
and the tags mentioned above, and I am stuck on where to go next.

The fields are not listed as children for the
RecordId
, so I can't think of how to approach obtaining their values. I am including the full document as an example to make sure I am not excluding anything.

I have an array of values, and I would like to pull the three tags mentioned above if the RecordId is contained within the array of numbers.

Answer

First of all the xml validator reports error

The default (no prefix) Namespace URI for XPath queries is always '' and it cannot be redefined to 'urn:us:gov:treasury:irs:ext:aca:air:7.0'.

so you must set this default xmlns to "".

You can use this code.

require 'nokogiri'

doc = Nokogiri::XML(open('1094C_Request.xml'))

doc.namespaces['xmlns'] = ''

details = doc.xpath("//:Form1095CUpstreamDetail")

elem_a = ["PersonFirstNm", "PersonLastNm", "irs:SSN"]

output = details.each_with_object({}) do |element, exp|
  exp[element.xpath("./:RecordId").text] = elem_a.each_with_object({}) do |elem_n, exp_h|
    exp_h[elem_n] = element.xpath(".//#{elem_n.include?(':') ? elem_n : ":#{elem_n}"}").text
  end
end

output

p output
# {
#   "1" => {"PersonFirstNm" => "JOHN", "PersonLastNm" => "Doe", "irs:SSN" => "123456790"},
#   "2" => {"PersonFirstNm" => "JANE", "PersonLastNm" => "DOE", "irs:SSN" => "222222222"}
# }

I hope this helps