pandaman pandaman - 6 months ago 54
JSON Question

Why does converting Nokogiri XML to JSON with Hash.from_xml remove content?

When I convert a Nokogiri object to XML and then to JSON, most of the content disappears.

Code getting the data and converting:

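require 'nokogiri'
require 'open-uri'
require 'active_support'
require 'active_support/core_ext/hash/conversions' # provides Hash.from_xml

def get_data
  # URI.open comes from open-uri; a bare open() no longer accepts URLs in Ruby 3
  doc = Nokogiri::HTML(URI.open("<url>", "User-Agent" => "Ruby/#{RUBY_VERSION}"))

  # Get interesting block of HTML
  blurb = doc.css('.entry')

  # Convert Nokogiri object to XML
  xmlBlurb = blurb.to_xml

  # Convert to JSON
  jsonBlurb = Hash.from_xml(xmlBlurb).to_json

  return jsonBlurb
end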


Somehow between xmlBlurb and jsonBlurb, I'm going from 10+ lines of XML to a single JSON object { attr: content } with only 1 attribute.

I know there are several questions on SO regarding converting XML to JSON but none that I read address this specific issue.

Does anyone know what can cause the loss of data?

Answer

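Hash.from_xml is not part of Ruby's standard library; it is an extension to the Hash class added by Rails (ActiveSupport). The method is known to lose data under various conditions during the XML-to-Hash conversion. A commonly reported case is an element that has both attributes and text content, where the attributes are dropped. Because the JSON is generated from that Hash, anything lost at that step is missing from the JSON as well.

This SO question offers some suggestions: "convert XML to ruby hash with attributes"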
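
One way around the problem is to skip Hash.from_xml and build the Hash yourself by walking the element tree. The following is only a minimal sketch under some assumptions: element_to_hash is a hypothetical helper (not part of Rails or Nokogiri), the "@name" and "#text" key conventions are an arbitrary choice, and it uses REXML, which ships with Ruby, so that it runs without extra gems. The same recursion should carry over to Nokogiri nodes with minor changes.

```ruby
require 'rexml/document'
require 'json'

# Hypothetical helper: recursively turns an element into a Hash that keeps
# its attributes, its own text, and its child elements.
def element_to_hash(element)
  hash = {}

  # Store attributes under "@name" keys so they cannot collide with children.
  element.attributes.each { |name, value| hash["@#{name}"] = value }

  # Collect the element's direct text nodes under "#text".
  text = element.texts.map(&:value).join.strip
  hash['#text'] = text unless text.empty?

  # Always group child elements into arrays, keyed by tag name.
  element.elements.each do |child|
    (hash[child.name] ||= []) << element_to_hash(child)
  end

  hash
end

# Sample input made up for illustration.
xml = '<div class="entry"><p id="intro">Hello <b>world</b></p><p>Bye</p></div>'
root = REXML::Document.new(xml).root
puts JSON.generate(root.name => element_to_hash(root))
```

Note that this keeps attributes and text but not the relative order of text and child elements in mixed content, so it is not a lossless round trip either.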
