bEtTy Barnes - 24 days ago 12

Parse a table from HTML using jsoup

I've got another problem with scraping HTML text. Here's a sample of the HTML I'm trying to extract from:

<table class="scripture">
<tbody>
<tr>
<td class="verse" valign="top">
<a name="2:1"></a><a class="vers" href="javascript:getParallel('LUK', 2, 1);" title="Klik om grondtekst en SV te zien">&nbsp;1&nbsp;</a>
</td>
<td class="content">
<span class="main">En het geschiedde in die dagen dat er een gebod uitging van keizer Augustus dat heel de wereld ingeschreven moest worden.</span>
</td>
</tr>
</tbody>
</table>

<table class="scripture">
<tbody>
<tr>
<td class="verse" valign="top">
<a name="2:2"></a><a class="vers" href="javascript:getParallel('LUK', 2, 2);" title="Klik om grondtekst en SV te zien">&nbsp;2&nbsp;</a>
</td>
<td class="content">
<span class="main">Deze eerste inschrijving vond plaats toen Cyrenius over Syrië stadhouder was.</span>
</td>
</tr>
</tbody>
</table>


This is similar to my problem in this link, but here I want to get both the verse text (the number in td.verse) and the Scripture content. How do I achieve this?

So far this is what I've tried:

Element table = doc.select("table[class=scripture]").first();
Log.e("BB", "passage1: " + table.ownText());


But it doesn't display anything. Any help would be appreciated. Thanks.
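A likely reason the attempt prints nothing: in jsoup, ownText() returns only the text nodes that are direct children of the element, and a table holds no text of its own (the visible text lives in the nested a and span elements). text() instead collects the text of all descendants. The sketch below contrasts the two on a trimmed copy of the first table; the class name OwnTextDemo, the helper firstScriptureTable and the shortened HTML are illustrative, not from the original post.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OwnTextDemo {
    // Trimmed-down copy of the first table from the question (valign, href and title omitted).
    static final String HTML =
        "<table class=\"scripture\"><tbody><tr>"
        + "<td class=\"verse\"><a name=\"2:1\"></a><a class=\"vers\">&nbsp;1&nbsp;</a></td>"
        + "<td class=\"content\"><span class=\"main\">En het geschiedde in die dagen</span></td>"
        + "</tr></tbody></table>";

    // Hypothetical helper: parses the HTML and returns the first table.scripture.
    static Element firstScriptureTable(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("table.scripture").first();
    }

    public static void main(String[] args) {
        Element table = firstScriptureTable(HTML);

        // ownText() only reads text nodes that are direct children of <table>.
        // All visible text sits deeper (inside <a> and <span>), so this is an empty string.
        System.out.println("ownText: [" + table.ownText() + "]");

        // text() gathers the combined, whitespace-normalized text of all descendants.
        System.out.println("text: [" + table.text() + "]");

        // Usually you want to select the element that actually holds the text.
        System.out.println("span: [" + table.select("span.main").first().text() + "]");
    }
}
```

On Android you would log these values with Log.e as in the question instead of System.out.println.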

Answer

Assuming you want the content of the span inside the table that contains verse 2:2, you can do it with:

import org.jsoup.nodes.Element;

String verse = "2:2";
// The span of class "main" inside the table of class "scripture"
// that contains a td of class "verse" with a link whose name attribute equals verse
Element span = doc.select(
    String.format("table.scripture:has(td.verse a[name=%s]) span.main", verse)
).first();
// first() returns null when nothing matches, so guard before calling text()
if (span != null) {
    System.out.println(span.text());
}

Output:

Deze eerste inschrijving vond plaats toen Cyrenius over Syrië stadhouder was.
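The selector above targets one verse at a time. If you want every verse together with its text, as the question asks, one option is to loop over each table.scripture and read both cells. This is a sketch under the assumption that every table follows the layout shown in the question; the class VerseExtractor, the method extractVerses and the inlined HTML are hypothetical names for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class VerseExtractor {
    // The two tables from the question, with valign, href and title omitted for brevity.
    static final String HTML =
        "<table class=\"scripture\"><tbody><tr>"
        + "<td class=\"verse\"><a name=\"2:1\"></a><a class=\"vers\">&nbsp;1&nbsp;</a></td>"
        + "<td class=\"content\"><span class=\"main\">En het geschiedde in die dagen dat er een gebod"
        + " uitging van keizer Augustus dat heel de wereld ingeschreven moest worden.</span></td>"
        + "</tr></tbody></table>"
        + "<table class=\"scripture\"><tbody><tr>"
        + "<td class=\"verse\"><a name=\"2:2\"></a><a class=\"vers\">&nbsp;2&nbsp;</a></td>"
        + "<td class=\"content\"><span class=\"main\">Deze eerste inschrijving vond plaats toen"
        + " Cyrenius over Syri\u00eb stadhouder was.</span></td>"
        + "</tr></tbody></table>";

    // Maps each verse reference (the name attribute, e.g. "2:1") to its text, in document order.
    static Map<String, String> extractVerses(Document doc) {
        Map<String, String> verses = new LinkedHashMap<>();
        for (Element table : doc.select("table.scripture")) {
            // The empty anchor carries the reference; the a.vers link only shows the padded number.
            Element anchor = table.select("td.verse a[name]").first();
            Element content = table.select("td.content span.main").first();
            if (anchor == null || content == null) {
                continue; // skip tables that don't match the layout shown in the question
            }
            verses.put(anchor.attr("name"), content.text());
        }
        return verses;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        for (Map.Entry<String, String> entry : extractVerses(doc).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

Using the name attribute as the key avoids the &nbsp; padding around the visible number. If you prefer the bare number, you could read the a.vers text instead and strip the U+00A0 characters yourself, since String.trim() does not remove them.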