bEtTy Barnes - 1 year ago
HTML Question

Parse a table from HTML using jsoup

I've got another problem with scraping HTML text. Here's a sample of the markup I'm trying to extract from:

<table class="scripture">
<tbody>
<tr>
<td class="verse" valign="top">
<a name="2:1"></a><a class="vers" href="javascript:getParallel('LUK', 2, 1);" title="Klik om grondtekst en SV te zien">&nbsp;1&nbsp;</a>
</td>
<td class="content">
<span class="main">En het geschiedde in die dagen dat er een gebod uitging van keizer Augustus dat heel de wereld ingeschreven moest worden.</span>
</td>
</tr>
</tbody>
</table>

<table class="scripture">
<tbody>
<tr>
<td class="verse" valign="top">
<a name="2:2"></a><a class="vers" href="javascript:getParallel('LUK', 2, 2);" title="Klik om grondtekst en SV te zien">&nbsp;2&nbsp;</a>
</td>
<td class="content">
<span class="main">Deze eerste inschrijving vond plaats toen Cyrenius over Syrië stadhouder was.</span>
</td>
</tr>
</tbody>
</table>


This is similar to my problem in this link, but here I want to get both the verse text and the Scripture content. How do I achieve this?

So far this is what I've tried:

Element table = doc.select("table[class=scripture]").first();
Log.e("BB", "passage1: " + table.ownText());


But it doesn't display anything. Any help would be appreciated. Thanks.

Answer

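Your attempt prints nothing because ownText() returns only the text that is a direct child of the element, and the table has none: all of its text sits in nested td, a and span elements. Assuming that you want the content of the span inside the table that contains verse 2:2, you can select it with: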

String verse = "2:2";
// Select the span of class main located inside the table of class scripture
// that contains a td of class verse with a link whose name attribute equals verse
Element p = doc.select(
    String.format("table.scripture:has(td.verse a[name=%s]) span.main", verse)
).first();
// first() returns null when nothing matches, so guard before calling text()
if (p != null) {
    System.out.println(p.text());
}

Output:

Deze eerste inschrijving vond plaats toen Cyrenius over Syrië stadhouder was.
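
If you want every verse rather than a single one, a loop over all table.scripture elements should work as well. The following is only a minimal sketch, assuming each table has exactly the layout shown in the question; the class name VerseScraper and the method extractVerses are made up for illustration, and the sample HTML in main is a shortened version of the question's markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of jsoup.
public class VerseScraper {

    // Maps each verse number (e.g. "1") to its text, in document order.
    static Map<String, String> extractVerses(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> verses = new LinkedHashMap<>();
        for (Element table : doc.select("table.scripture")) {
            Element number = table.select("td.verse a.vers").first();
            Element content = table.select("td.content span.main").first();
            if (number == null || content == null) {
                continue; // skip tables that do not follow the assumed layout
            }
            // Depending on the jsoup version, &nbsp; may survive as U+00A0,
            // so replace it explicitly before trimming.
            String key = number.text().replace('\u00a0', ' ').trim();
            verses.put(key, content.text());
        }
        return verses;
    }

    public static void main(String[] args) {
        // Shortened sample based on the markup in the question.
        String html =
            "<table class=\"scripture\"><tbody><tr>"
            + "<td class=\"verse\"><a name=\"2:1\"></a><a class=\"vers\">&nbsp;1&nbsp;</a></td>"
            + "<td class=\"content\"><span class=\"main\">En het geschiedde in die dagen.</span></td>"
            + "</tr></tbody></table>"
            + "<table class=\"scripture\"><tbody><tr>"
            + "<td class=\"verse\"><a name=\"2:2\"></a><a class=\"vers\">&nbsp;2&nbsp;</a></td>"
            + "<td class=\"content\"><span class=\"main\">Deze eerste inschrijving vond plaats.</span></td>"
            + "</tr></tbody></table>";

        extractVerses(html).forEach((n, t) -> System.out.println(n + ": " + t));
    }
}
```

Note that text() is used here instead of ownText(), since text() also collects the text of descendant elements. On Android you would pass your already parsed doc instead of re-parsing a string, and log the entries with Log.e as in your attempt.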