jvdh jvdh - 8 days ago 6
Javascript Question

Difference between XmlService and importxml

When trying to parse HTML as XML in Google Apps Script, this code:

var yahoo = 'http://finance.yahoo.com/q?s=aapl';
var xml = UrlFetchApp.fetch(yahoo).getContentText();
var document = XmlService.parse(xml);


will return an error like this:

Error on line 20: The entity name must immediately follow the '&' in the entity reference. (line 13, file "")

Presumably this is because the HTML is not XML-compliant in some way on line 20. What surprises me is that when you do the same thing in Google Sheets and also supply an XPath, the HTML is parsed as XML without problems:

=IMPORTXML("http://finance.yahoo.com/q?s=aapl","//div[@class='title']")


will return "Apple Inc. (AAPL)". I assume that the Sheets function has some way of cleaning the HTML to make it XML-compliant.


  • Do you think that could be the case?

  • If yes, do you have an idea how I could adapt the XML parser in Apps Script so that I can fetch HTML from Yahoo Finance and treat it as XML?



thanks in advance!

Answer

The new XmlService cannot do a lenient parse, so there is no way to do this with it right now. But you can still use the old Xml service, which supports lenient parsing (perhaps IMPORTXML uses it as well). This code works:

var yahoo = 'http://finance.yahoo.com/q?s=aapl';
var xml = UrlFetchApp.fetch(yahoo).getContentText();
var document = Xml.parse(xml, true);
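
For context: the error quoted in the question is the kind a strict XML parser typically reports when it meets an `&` that does not start an entity reference, for example the `&&` of inline JavaScript or a loose `&` in text. If you wanted to stay with XmlService, you could try cleaning the text before parsing. The following is only a minimal sketch of that idea; `escapeBareAmpersands` is a hypothetical helper, not part of Apps Script, and it addresses this one error only. Real-world HTML usually has other problems (unclosed tags, HTML-only entities such as `&nbsp;`) that would still make a strict parse fail.

```javascript
// Hypothetical helper (not an Apps Script API): escape every '&' that does
// not already start an entity reference such as &amp;, &#38; or &#x26;.
// Named references like &nbsp; are left untouched, although a strict XML
// parser may still reject them because XML does not define them.
function escapeBareAmpersands(html) {
  var bareAmpersand = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/g;
  return html.replace(bareAmpersand, '&amp;');
}

// Possible use in Apps Script (assumption: may still throw on other
// non-XML constructs in the page):
// var document = XmlService.parse(escapeBareAmpersands(xml));
```

The lenient parse shown above remains the simpler route; this sketch only illustrates what kind of cleaning a strict parser needs.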

There is also an issue report about the missing lenient-parse option in the new XmlService: https://code.google.com/p/google-apps-script-issues/issues/detail?id=3727

So I suggest you use the old way for now and keep an eye on this issue.
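
If it helps, here is one way you might isolate the choice of parser so that switching later is a small change. It is a sketch under assumptions: `parseWithFallback` is a hypothetical helper, and the two parser functions are passed in as arguments, so nothing here depends on a particular Apps Script service beyond the calls already shown above.

```javascript
// Hypothetical helper: try a strict parser first and fall back to a lenient
// one if the strict parser throws. Both parsers are supplied by the caller.
function parseWithFallback(text, strictParse, lenientParse) {
  try {
    return { document: strictParse(text), lenient: false };
  } catch (e) {
    // The strict parser rejected the input; retry with the lenient parser.
    return { document: lenientParse(text), lenient: true };
  }
}

// Possible wiring in Apps Script, using the calls from this thread:
// var result = parseWithFallback(xml,
//   function (t) { return XmlService.parse(t); },
//   function (t) { return Xml.parse(t, true); });
```

The `lenient` flag tells you which parser produced the document, which matters because the two services return different object types.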