Nico Hoppel - 8 months ago
Java Question

Web Crawler Amazon get span-Element

I'm crawling Amazon categories and already get the sales rank and the product URLs. Now I want to crawl the category itself, but I get all of the content of the category span.

<span class="zg_hrsr_ladder">in&nbsp;<a href="">B&uuml;cher</a> &gt; <a href="">Krimis & Thriller</a> &gt; <b><a href="">Deutschland</a></b></span>

This is an example snippet. With the following code

Elements category = doc.select("span.zg_hrsr_ladder"); // doc: the parsed jsoup Document

I get everything inside the span. But I only want the text of the <a> elements: "Bücher", "Krimis & Thriller" and "Deutschland". How can I get this information?


You want the text inside the <a> elements, so select the anchors within your span (append " a" to the selector) and call text() on each of the resulting elements.

Example Code

String source = "<span class=\"zg_hrsr_ladder\">in&nbsp;<a href=\"\">B&uuml;cher</a> &gt; <a href=\"\">Krimis & Thriller</a> &gt; <b><a href=\"\">Deutschland</a></b></span>";

Document htmlDocument = Jsoup.parse(source);

Elements category = htmlDocument.select("span.zg_hrsr_ladder a");

category.forEach(aElement -> {
    System.out.println(aElement.text());
});

Output

Bücher
Krimis & Thriller
Deutschland
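
If the link texts are needed as a list rather than printed one by one, the same selector can be combined with Elements.eachText(). The following is only a minimal sketch: the class name CategoryPath and the helper extractCategoryPath are hypothetical names chosen for illustration, and it assumes a jsoup version that provides eachText() (1.10.2 or later, as far as I know).

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CategoryPath {

    // Hypothetical helper (not part of the original answer):
    // returns the text of every <a> inside span.zg_hrsr_ladder, in document order.
    public static List<String> extractCategoryPath(String html) {
        Document doc = Jsoup.parse(html);
        // eachText() collects text() of each matched element that has text
        return doc.select("span.zg_hrsr_ladder a").eachText();
    }

    public static void main(String[] args) {
        String source = "<span class=\"zg_hrsr_ladder\">in&nbsp;<a href=\"\">B&uuml;cher</a> &gt; "
                + "<a href=\"\">Krimis & Thriller</a> &gt; <b><a href=\"\">Deutschland</a></b></span>";

        List<String> path = extractCategoryPath(source);
        // Join the category levels into a single breadcrumb-style string
        System.out.println(String.join(" > ", path));
    }
}
```

Returning a list keeps the category levels separate, so they can be stored or compared individually; joining them is left to the caller.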