chinshr - 3 months ago
Ruby Question

Parsing XML to hash with Nori and Nokogiri with undesired result

I am attempting to convert an XML document to a Ruby hash using Nori. Instead of getting the collection directly under the root element, I get an extra node that contains the collection. This is what I am doing:

@xml = content_for(:layout)
@hash = Nori.new(:parser => :nokogiri, :advanced_typecasting => false).parse(@xml)


or

@hash = Hash.from_xml(@xml)


Where the content of @xml is:

<bundles>
<bundle>
<id>6073</id>
<name>Bundle-1</name>
<status>1</status>
<bundle_type>
<id>6713</id>
<name>BundleType-1</name>
</bundle_type>
<begin_at nil="true"></begin_at>
<end_at nil="true"></end_at>
<updated_at>2013-03-21T23:02:32Z</updated_at>
<created_at>2013-03-21T23:02:32Z</created_at>
</bundle>
<bundle>
<id>6074</id>
<name>Bundle-2</name>
<status>1</status>
<bundle_type>
<id>6714</id>
<name>BundleType-2</name>
</bundle_type>
<begin_at nil="true"></begin_at>
<end_at nil="true"></end_at>
<updated_at>2013-03-21T23:02:32Z</updated_at>
<created_at>2013-03-21T23:02:32Z</created_at>
</bundle>
</bundles>


The parser returns @hash in this format:

{"bundles"=>{"bundle"=>[{"id"=>"6073", "name"=>"Bundle-1", "status"=>"1", "bundle_type"=>{"id"=>"6713", "name"=>"BundleType-1"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}, {"id"=>"6074", "name"=>"Bundle-2", "status"=>"1", "bundle_type"=>{"id"=>"6714", "name"=>"BundleType-2"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}]}}


Instead I would like to get:

{"bundles"=>[{"id"=>"6073", "name"=>"Bundle-1", "status"=>"1", "bundle_type"=>{"id"=>"6713", "name"=>"BundleType-1"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}, {"id"=>"6074", "name"=>"Bundle-2", "status"=>"1", "bundle_type"=>{"id"=>"6714", "name"=>"BundleType-2"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}]}


Note that I control the XML, and it is always formed like the example above.

My question is also related to "Does RABL's JSON output not conform to standard? Can it?"

Answer

Imagine an XML that consists only of a list of the same tags, e.g.

<shoppinglist>
    <item>apple</item>
    <item>banana</item>
    <item>cherry</item>
    <item>pear</item>
</shoppinglist>

When you convert this into a hash, it is quite straightforward to access the items with, e.g., hash['shoppinglist']['item'][0]. But what would you expect instead? Just an array? By your logic, the items should be accessible with hash['shoppinglist'][0]. But what if the container holds different elements, e.g.

<shoppinglist>
    <date>2013-01-01</date>
    <item>apple</item>
    <item>banana</item>
    <item>cherry</item>
    <item>pear</item>
</shoppinglist>

How would you now access the items? And how the date? The problem is that the conversion to a hash has to work in the general case.
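
To make the ambiguity concrete, here is a small sketch that uses hand-written hash literals in place of parser output. The shapes are assumed to follow the nested format shown in the question; no parser is actually invoked, and the variable names are only for illustration.

```ruby
# Stand-in for the parsed shopping list that also carries a date.
# Assumption: the shape mirrors the nested output shown in the question.
nested = {
  "shoppinglist" => {
    "date" => "2013-01-01",
    "item" => ["apple", "banana", "cherry", "pear"]
  }
}

# With one key per tag name, every child can be reached unambiguously.
date       = nested["shoppinglist"]["date"]
first_item = nested["shoppinglist"]["item"][0]

# The flattened form asked for in the question: the root maps straight
# to an Array, so there is no key left under which a date could live.
flattened = { "shoppinglist" => ["apple", "banana", "cherry", "pear"] }
```

The nested form keeps both the date and the items addressable by name, which is why a general-purpose converter cannot simply drop the inner key.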

Although I do not know Nori, I am fairly sure that what you are asking for is not built in, because it makes no sense in the general case. As an alternative, you can move the bundle array up one level yourself:

@hash['bundles'] = @hash['bundles']['bundle']
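
One caveat with the one-liner: XML-to-hash converters commonly return a single Hash rather than a one-element Array when a tag occurs only once, and may return nil for an empty container. I believe this applies to Hash.from_xml and Nori, but check it against your versions. The sketch below is a slightly more defensive variant; promote_children is a hypothetical helper name, not part of either library.

```ruby
# Hypothetical helper: returns a copy of `hash` in which hash[root]
# is replaced by the list of children found under hash[root][child].
def promote_children(hash, root, child)
  container = hash[root]
  # An empty root element may parse to nil (or a string) instead of a Hash.
  children = container.is_a?(Hash) ? container[child] : nil

  list = case children
         when nil   then []           # no children at all
         when Array then children     # two or more children
         else            [children]   # a lone child, typically a Hash
         end

  hash.merge(root => list)
end
```

Usage would look like @hash = promote_children(@hash, 'bundles', 'bundle'). Because the helper uses merge, the original hash is left unchanged.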