RanRag RanRag - 7 months ago 37
Python Question

Python: CSS Selector to use inside lxml.cssselect

I am trying to parse the HTML below using lxml.html, with CSSSelector instead of XPath.

link = doc.cssselect('html body div.results dl dt a')


The code above gives me content-1 and content-2 as output, but my desired output is link 1 link 2. So I replaced my code with

link = doc.cssselect('html body div.results dl dt a[href]')


but I am still getting the same output. So my question is: what is the proper CSS selector to get the href attribute?

<div class = "results">
<div> some tags here </div>
<dl>
<dt title = "My Title 1" style = "background: transparent url('/img/accept.png') no-repeat right center">
<a href = "/link 1"> content-1</a>
</dt>
</dl>

<dl>
<dt title = "My Title 2" style = "background: transparent url('/img/accept.png') no-repeat right center">
<a href = "/link 2">content-2</a>
</dt>
</dl>
</div>

Answer

I believe you cannot get the attribute value through CSS selectors. You should get the elements...

>>> elements = doc.cssselect('div.results dl dt a')

...and then get the attributes from them:

>>> for element in elements:
...     print(element.get('href'))
... 
/link 1
/link 2

Of course, list comprehensions are your friends:

>>> [element.get('href') for element in elements]
['/link 1', '/link 2']

Since CSS cannot change attribute values, I don't think it makes sense to retrieve them through CSS selectors. You can mention attributes in a selector, but only to match the elements that carry them. However, this is just speculation and I may be wrong; if I am, someone please correct me :) Well, @Tim Diggs confirms my hypothesis below :)

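EDIT: You can now do this using pseudo-elements, e.g.:

doc.cssselect('div.results dl dt a::attr(href)')

This will return the href attribute of each link. Note that ::attr() is an extension implemented by Scrapy/parsel selectors (where the call is .css() rather than .cssselect()); as far as I know, lxml's own cssselect() rejects it.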