Cisplatin - 1 month ago
Ruby Question

How to turn a file into a Nokogiri::XML object?

I have a sample XML file (let's call it example.xml for the sake of this question) and want to turn it into a Nokogiri object.

According to documentation and lots of other online sources, this should work:

xml = Nokogiri::XML(File.read("example.txt"))


But the value of xml.to_xml is only:

"<?xml version=\"1.0\"?>\n"


In other words, it's ignoring the rest of the file. There are many tags afterwards and none of them are in the xml object.

How do I get Nokogiri to get all the tags?

EDIT: Here's the XML I'm using:

<? xml version="1.0" encoding="UTF-8" ?>
<Document>
<Test>Test</Test>
</Document>

Answer

It looks like you are trying to parse an invalid XML doc.

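This can be fixed by removing the space between <? and xml in the XML declaration. That space is what breaks parsing; the space before the closing ?> is allowed, but is dropped here as well: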

<?xml version="1.0" encoding="UTF-8"?>
<Document>
    <Test>Test</Test>
</Document>

How I figured this out

By default, Nokogiri does not raise when it hits a parsing error. It recovers as best it can and records each problem in the document's errors array:

xml = Nokogiri::XML(File.read("example.txt"))
p xml.errors
# => [#<Nokogiri::XML::SyntaxError: xmlParsePI : no target name>, #<Nokogiri::XML::SyntaxError: Start tag expected, '<' not found>]

You can also configure Nokogiri to raise an exception if it hits a parsing error:

xml = Nokogiri::XML(File.read("example.txt")) do |config|
  config.strict
end

Both approaches show that there were problems parsing the document.