Sundorer - 6 months ago
Python Question

Getting a list based on preceding text with lxml

I have some HTML like:

...
<table width="100%">
<tr class="blueborder">
<td colspan="2" class="blackbold">Some Other Text</td>
</tr>
</table>
<table width="100%">
<tr class="upcoming">
<td class="lists" >
<ul>
<li> List1 Element1</li>
<li> List1 Element2</li>
<li> List1 Element3</li>
</ul>
</td>
</tr>
</table>
<table width="100%">
<tr class="blueborder">
<td colspan="2" class="blackbold">Signaling Text</td>
</tr>
</table>
<table width="100%">
<tr class="upcoming">
<td class="lists" >
<ul>
<li> List2 Element1</li>
<li> List2 Element2</li>
<li> List2 Element3</li>
</ul>
</td>
</tr>
</table>
...


I was using
employees = root.xpath('.//td[@class = "lists"]/ul/li/text()')
but this grabs the items of both lists. I'd like to grab only list 2, but both lists have the same properties (class and so on). The only difference is that
<td colspan="2" class="blackbold">Signaling Text</td>
comes before the list I want. Is there some way to select only the list that follows this cell?

Answer

You can use the following axis to select the td with class lists that comes after the td containing the text Signaling Text:

h = """ <table width="100%">
            <tr class="blueborder">
              <td colspan="2" class="blackbold">Some Other Text</td>
            </tr>
          </table>
          <table width="100%">
        <tr class="upcoming">
          <td class="lists" >
            <ul>
              <li> List1 Element1</li>
              <li> List1 Element2</li>
              <li> List1 Element3</li>
            </ul>
          </td>
        </tr>
     </table>
      <table width="100%">
        <tr class="blueborder">
          <td colspan="2" class="blackbold">Signaling Text</td>
        </tr>
      </table>
      <table width="100%">
        <tr class="upcoming">
          <td class="lists" >
            <ul>
              <li> List2 Element1</li>
              <li> List2 Element2</li>
              <li> List2 Element3</li>
            </ul>
          </td>
        </tr>
     </table>  """

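from lxml import html

tree = html.fromstring(h)
# [1] keeps only the first td.lists after the signaling cell, so any later lists are not included
print(tree.xpath('//td[contains(.,"Signaling Text")]/following::td[@class = "lists"][1]/ul/li/text()'))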

Which would give you:

[' List2 Element1', ' List2 Element2', ' List2 Element3']
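
If you need this for more than one heading, the same query can be wrapped in a small helper. The sketch below is only an illustration: `items_after` and `SAMPLE` are hypothetical names, the markup is a shortened copy of the HTML from the question, and the label is passed as an XPath variable (`$label`) through lxml's keyword arguments. The `[1]` after the `following::` step keeps only the nearest list, which matters when the heading is followed by several lists.

```python
from lxml import html

# Shortened copy of the markup from the question (assumed structure).
SAMPLE = """
<div>
<table><tr class="blueborder"><td class="blackbold">Some Other Text</td></tr></table>
<table><tr class="upcoming"><td class="lists"><ul>
  <li> List1 Element1</li><li> List1 Element2</li>
</ul></td></tr></table>
<table><tr class="blueborder"><td class="blackbold">Signaling Text</td></tr></table>
<table><tr class="upcoming"><td class="lists"><ul>
  <li> List2 Element1</li><li> List2 Element2</li>
</ul></td></tr></table>
</div>
"""


def items_after(tree, label):
    """Return the stripped <li> texts of the first td.lists after the cell containing label."""
    query = (
        '//td[contains(., $label)]'            # cell holding the signaling text
        '/following::td[@class = "lists"][1]'  # nearest td.lists later in the document
        '/ul/li/text()'
    )
    # lxml binds keyword arguments to XPath variables such as $label.
    return [text.strip() for text in tree.xpath(query, label=label)]


tree = html.fromstring(SAMPLE)
print(items_after(tree, "Signaling Text"))   # ['List2 Element1', 'List2 Element2']
print(items_after(tree, "Some Other Text"))  # ['List1 Element1', 'List1 Element2']
```

Without the `[1]`, the call with "Some Other Text" would return the items of both lists, because every td.lists later in the document is on the following axis.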

Or, if you are sure the list you want is always the second occurrence, select it by position:

tree.xpath('(//td[@class = "lists"])[2]/ul/li/text()')
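
The parentheses in that expression matter. As a rough illustration (with `SAMPLE` again being a shortened, assumed copy of the markup from the question), the snippet below compares the parenthesized form with the unparenthesized one. Without parentheses, `[2]` is evaluated per parent element, so it asks for a td.lists that is the second such td inside its own tr, and no row here has two.

```python
from lxml import html

# Shortened copy of the markup from the question (assumed structure).
SAMPLE = """
<div>
<table><tr class="blueborder"><td class="blackbold">Some Other Text</td></tr></table>
<table><tr class="upcoming"><td class="lists"><ul>
  <li> List1 Element1</li><li> List1 Element2</li>
</ul></td></tr></table>
<table><tr class="blueborder"><td class="blackbold">Signaling Text</td></tr></table>
<table><tr class="upcoming"><td class="lists"><ul>
  <li> List2 Element1</li><li> List2 Element2</li>
</ul></td></tr></table>
</div>
"""

tree = html.fromstring(SAMPLE)

# Parenthesized: gather every td.lists in the document, then take the second one.
grouped = tree.xpath('(//td[@class = "lists"])[2]/ul/li/text()')

# Unparenthesized: [2] counts matching td siblings within each parent tr.
ungrouped = tree.xpath('//td[@class = "lists"][2]/ul/li/text()')

print(grouped)    # [' List2 Element1', ' List2 Element2']
print(ungrouped)  # []
```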