pandaman pandaman - 1 year ago 117
JSON Question

Why does converting Nokogiri XML to JSON with Hash.from_xml remove content?

When I convert a Nokogiri object to XML and then to JSON, the majority of the content disappears.

Code getting the data and converting:

require 'nokogiri'
require 'open-uri'
require 'json'
require 'active_support/core_ext/hash/conversions' # provides Hash.from_xml

def get_data
  doc = Nokogiri::HTML(URI.open("<url>", "User-Agent" => "Ruby/#{RUBY_VERSION}"))

  # Get interesting block of HTML
  blurb = doc.css('.entry')

  # Convert Nokogiri object to XML
  xmlBlurb = blurb.to_xml

  # Convert to JSON
  jsonBlurb = Hash.from_xml(xmlBlurb).to_json

  return jsonBlurb
end

Somewhere between these two conversions, I go from 10+ lines of XML to a single JSON object, { attr: content }, with only one attribute.

I know there are several questions on SO regarding converting XML to JSON but none that I read address this specific issue.

Does anyone know what can cause the loss of data?

Answer

Hash.from_xml is a class method that Rails (Active Support) adds to Ruby's core Hash class. It is known to lose attributes under various conditions during the conversion from XML to Hash. For example, when an element has both attributes and text content, only the text is kept.

This SO question offers some suggestions: "convert XML to ruby hash with attributes".
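One way to avoid the loss is to walk the XML tree yourself and build the Hash explicitly. The following is only a minimal sketch of that idea, not the approach from the linked question: it uses REXML (bundled with standard Ruby installs) instead of Nokogiri to stay dependency-free, and `element_to_hash`, the `@` attribute prefix, and the `#text` key are all hypothetical naming choices made for this illustration.

```ruby
require "rexml/document"
require "json"

# Hypothetical helper (not part of Rails, Nokogiri, or REXML): recursively
# converts an REXML element into a Hash, keeping attributes, text, and children.
def element_to_hash(element)
  hash = {}

  # Store attributes under "@name" keys so they cannot collide with child tags.
  element.attributes.each { |name, value| hash["@#{name}"] = value }

  # Store the element's own text (if any) under "#text".
  text = element.texts.map(&:value).join.strip
  hash["#text"] = text unless text.empty?

  # Group child elements by tag name; repeated tags are collected into an Array.
  element.elements.each do |child|
    converted = element_to_hash(child)
    if hash.key?(child.name)
      existing = hash[child.name]
      hash[child.name] = existing.is_a?(Array) ? existing << converted : [existing, converted]
    else
      hash[child.name] = converted
    end
  end

  hash
end

# Sample input made up for this sketch.
xml = '<div class="entry"><p id="a">first</p><p id="b">second</p></div>'
root = REXML::Document.new(xml).root
puts JSON.pretty_generate(root.name => element_to_hash(root))
```

In the resulting JSON the attributes survive under the `@class` and `@id` keys alongside the text. Nokogiri nodes expose attributes and child elements in a similar way, so the same walk should be adaptable to the `blurb` node set from the question, though that variant is not shown here.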