Benja0906 Benja0906 - 9 months ago 48
HTML Question

Can't get subscript text when parsing HTML

I am parsing a website for an inorganic compound and need to get its chemical formula.

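    let data = NSData(contentsOf: URL(string: "")!)
    let doc = TFHpple(htmlData: data as! Data)

    // Select only the text nodes that are direct children of the "selflink" element.
    if let elements = doc?.search(withXPathQuery: "//*[@class='selflink']/text()") as? [TFHppleElement] {
        for element in elements {
            // The loop body was cut off in the post; printing `content` is an assumed completion.
            print(element.content)
        }
    }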

It prints "AuBr", but I need it to print the whole formula, which is "AuBr3".

This is the HTML I'm getting the formula from:

[image: screenshot of the HTML source]

How can I make it print out the whole formula with the 3 at the end?

Answer Source

Given the following HTML from the Wiki page:

    <div style="padding:0.1em 0;line-height:1.2em;"><a href="/wiki/Chemical_formula" title="Chemical formula">Chemical formula</a></div>

the following XPath expression

    string(//tr[td[1]/div/a = "Chemical formula"]/td[2])

will return:

> xmllint --xpath 'string(//tr[td[1]/div/a = "Chemical formula"]/td[2])' ~/test.html
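
The reason `string(...)` helps is that the XPath string-value of an element is the concatenation of all of its descendant text nodes, while `/text()` selects only the text nodes that are direct children. Below is a minimal sketch of that difference in plain Swift. The `Node` type and both helper functions are hypothetical, written only for this illustration (they are not part of TFHpple), and the markup in the comment is an assumption based on the question, since the original screenshot is not available.

```swift
// Hypothetical miniature DOM, only to illustrate the XPath semantics.
enum Node {
    case text(String)
    case element(name: String, children: [Node])
}

// Mirrors XPath `node/text()`: only text nodes that are direct children.
func directText(of node: Node) -> [String] {
    guard case let .element(_, children) = node else { return [] }
    return children.compactMap { child -> String? in
        if case let .text(value) = child { return value }
        return nil
    }
}

// Mirrors XPath `string(node)`: all descendant text, in document order.
func stringValue(of node: Node) -> String {
    switch node {
    case let .text(value):
        return value
    case let .element(_, children):
        return children.map(stringValue(of:)).joined()
    }
}

// Assumed markup: <a class="selflink">AuBr<sub>3</sub></a>
let formula = Node.element(name: "a", children: [
    .text("AuBr"),
    .element(name: "sub", children: [.text("3")]),
])

print(directText(of: formula))   // ["AuBr"]
print(stringValue(of: formula))  // AuBr3
```

In XPath terms, the second function corresponds to selecting every descendant text node (for example with `//text()` below the matched element instead of `/text()`) and joining the results. Whether a given parser accepts a `string(...)` query directly depends on the library, so joining the descendant text nodes yourself may be the more portable route.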