user2968505 - 2 years ago
HTML Question

Python Scrapy: how to get only immediate children

So I have some HTML like this:

<div class="content">
<div class="infobox">
<p> text </p>
<p> more text </p>
</div>
<p> text again </p>
<p> even more text </p>
</div>

And I am using this selector:
'.content p::text'
I thought this would get only the immediate children, so I expected it to extract "text again" and "even more text". But it is also getting the text from the paragraphs inside the other div. How can I prevent this? I only want text from the paragraphs that are immediate children of the div with the class content.

Answer Source

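Scrapy supports both XPath selectors and CSS selectors, with a few extensions of its own such as ::text. In your case, you're using CSS selectors. A space between two selectors is the descendant combinator, so .content p matches every p nested at any depth inside .content. The combinator you want is >, the child combinator, which matches only direct children: .content > p::text. Scrapy's selectors are described in the section titled "Selectors" in its documentation.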
