josh - 4 months ago
Ruby Question

How to feed only a string to Nokogiri

I have the following sample XML:

<reg info='<root><h level="2" i="1"> something </h></root>'/>

I want to parse the XML provided in the info attribute of the reg tag, but I don't know how to feed the content of that attribute to Nokogiri.

This is what I have now:

doc = Nokogiri::HTML(URI.open(mylink))
node = doc.xpath("//houses/reg")
puts node[0]['info'].class # String
# Content of the info attribute as a string. This is what I want to feed to Nokogiri as XML.
puts node[0]['info']

How can I do this?


You need to get the text of the info attribute and use the CGI class to unescape the HTML. Then you can feed the string to Nokogiri::XML and it will be parsed. Something like this:

require "nokogiri"
require "open-uri"
require "cgi"

doc = Nokogiri::HTML(URI.open(""))
node = doc.xpath("//houses/reg")
info_string = CGI.unescapeHTML(node[0]['info'])
info_doc = Nokogiri::XML(info_string)
# info_doc is now a Nokogiri document built from that attribute.