horace_vr - 7 months ago
Python Question

BeautifulSoup4: Find elements with children tags

I am building a program to scrape webpages.

I need to find all tr elements in a page that have td children with class="table":


<tr>
<td class="table">1</td>
<td class="table">
<a href="...">...</a>
</td>
<td class="table">18</td>
</tr>


I already managed to find all td elements with class="table" using:

MySoup = soup.find_all("td", { "class" : "table" })


and also all tr elements with:


MySoup = soup.find_all("tr")


but that returns too many elements from the whole page, and it is not exactly what I need anyway.

Answer

I need to find in a page all tr elements that have td children with class="table"

soup.select('tr td.table')

If you want only the td elements that are direct children of a tr, use:

soup.select('tr > td.table')

Example:

>>> from bs4 import BeautifulSoup
>>> html = '''<tr>
    <td class="table">1</td>
    <td class="table">
        <a href="...">...</a>
    </td>
    <td class="table">18</td>
</tr><td class="table">19</td>'''
>>> soup = BeautifulSoup(html, 'lxml')
>>> soup.select('tr td.table')
[<td class="table">1</td>, <td class="table">\n<a href="...">...</a>\n</td>, <td class="table">18</td>]
>>> 
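
Note that both selectors return the matching td elements, not the rows that contain them. If you need the tr elements themselves, one possible approach is to filter the rows by their direct td children. This is only a minimal sketch: the sample HTML and the name rows are made up for illustration, and html.parser is used here just because it ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one row with td.table cells, one row without.
html = '''<table>
<tr><td class="table">1</td><td class="table">18</td></tr>
<tr><td class="other">2</td></tr>
</table>'''

soup = BeautifulSoup(html, 'html.parser')

# Keep only the rows that have at least one direct td child with class "table".
# recursive=False limits the search to direct children of each tr.
rows = [
    tr for tr in soup.find_all('tr')
    if tr.find('td', class_='table', recursive=False) is not None
]

print(len(rows))
```

Recent Beautiful Soup releases that use the soupsieve selector engine should also accept soup.select('tr:has(> td.table)') for the same purpose, but check that your installed version supports :has() before relying on it.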