Cmag - 7 days ago
Python Question

Python lxml/beautiful soup to find all links on a web page

I am writing a script that reads a web page and builds a database of links that match certain criteria. Right now I am stuck with lxml and can't work out how to grab all the <a href>'s from the HTML...

result = self._openurl(self.mainurl)
content = result.read()
html = lxml.html.fromstring(content)
print lxml.html.find_rel_links(html,'href')

Answer

Use XPath. Something like (can't test from here):

urls = html.xpath('//a/@href')
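
To flesh that out a bit: find_rel_links() looks for <a> elements whose rel attribute equals the value you pass (e.g. rel="nofollow"), so passing 'href' will normally return nothing. Below is a minimal sketch of the XPath approach, assuming lxml is installed. The SAMPLE markup, the extract_links name and the example.com base URL are made up for illustration; in your script, content would come from result.read().

```python
import lxml.html

# Hypothetical sample page standing in for result.read() from the question.
SAMPLE = """
<html><body>
  <a href="/about">About</a>
  <a href="http://example.com/news">News</a>
  <a name="top">No href here</a>
</body></html>
"""


def extract_links(content, base_url=None):
    """Return the href value of every <a> element in an HTML string."""
    doc = lxml.html.fromstring(content)
    if base_url is not None:
        # Resolve relative links such as "/about" against the page URL.
        doc.make_links_absolute(base_url)
    # '//a/@href' selects the href attribute of every <a> element;
    # anchors that have no href simply contribute nothing.
    return [str(url) for url in doc.xpath('//a/@href')]


urls = extract_links(SAMPLE, base_url="http://example.com/")
print(urls)
```

Leaving out base_url gives you the hrefs exactly as they are written in the page, which may be enough if you only need to match them against your criteria.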