josh - 6 months ago
Ruby Question

How to feed only a string to Nokogiri

I have the following sample XML:

<all>
  <houses>
    <reg info="&lt;root&gt;&lt;h level=&quot;2&quot; i=&quot;1&quot;&gt; something &lt;/h&gt;&lt;/root&gt;"
         other="test">
      something
    </reg>
  </houses>
</all>


I want to parse the XML stored in the info attribute of the <reg> tag, but I don't know how to feed the content of that attribute to Nokogiri.

This is what I have now:

require "nokogiri"
require "open-uri"

doc = Nokogiri::XML(URI.open(mylink))
node = doc.xpath("//houses/reg")
puts node[0]['info'].class # => String
# The content of the info attribute as a string. This is what I want to feed to Nokogiri as XML.
puts node[0]['info']


How can I do this?

Answer

You need to get the text of the info attribute, unescape it with the CGI class, and then feed the resulting string to Nokogiri::XML, which will parse it into its own document. Something like this:

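require "nokogiri"
require "open-uri"
require "cgi"

# Fetch and parse the outer XML document
doc = Nokogiri::XML(URI.open("http://example.com/foo.xml"))
node = doc.xpath("//houses/reg")
# Read the info attribute and unescape any remaining HTML entities
info_string = CGI.unescapeHTML(node[0]['info'])
# Parse the attribute's contents as a separate XML document
info_doc = Nokogiri::XML(info_string)
# info_doc is now a Nokogiri document built from the info attribute.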