hari anggara - 1 month ago
Java Question

Jsoup: removing unused elements

I am trying to strip the unneeded HTML tags and attributes from the output of my program.

I have already selected the element I need, but the result still contains markup that I don't want. How can I remove it?

Elements tes = doc.select("div.pd__content__row");
Elements spesifikasiProductContent = tes.select("[class=pd__spec__table]");
System.out.println(spesifikasiProductContent);


The result:

<table class="pd__spec__table">
<tbody>
<tr>
<td>Isi</td>
<td>750ml</td>
</tr>
<tr>
<td>Material</td>
<td>Tritan Material, ABS Plastic</td>
</tr>
<tr>
<td>Dimensi</td>
<td>21 X 15 X 3 Cm</td>
</tr>
<tr>
<td>Lain-lain</td>
<td>Dimensi : A5 <br> min. -20C, Max. 120C</td>
</tr>
<tr>
<td>Sertifikasi</td>
<td>CE / EU, CIQ, EEC, FDA, LFGB, SGS</td>
</tr>
<tr>
<td>Volume</td>
<td>&lt; 0.500 L</td>
</tr>
</tbody>
</table>


The expected result:

Isi 750ml
Material Tritan Material, ABS Plastic
Dimensi 21 X 15 X 3 Cm
Lain-lain
Dimensi : A5 <br> min. -20C, Max. 120C
Sertifikasi CE / EU, CIQ, EEC, FDA, LFGB, SGS
Volume &lt; 0.500 L

Answer

Call the text() method on each row element to get its text content without the markup:

public java.lang.String text()

Gets the combined text of this element and all its children. Whitespace is normalized and trimmed. For example, given HTML <p>Hello <b>there</b> now! </p>, p.text() returns "Hello there now!"

Returns:

unencoded text, or empty string if none.
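To see the documented behaviour in isolation, here is a minimal sketch that runs the Javadoc's own example through jsoup. The class name TextDemo and the helper paragraphText are made up for illustration, and selectFirst assumes a reasonably recent jsoup version (1.11 or later) on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TextDemo {
    // Hypothetical helper: parses an HTML fragment and returns
    // the text() of the first <p> element, or "" if there is none.
    static String paragraphText(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        Element p = doc.selectFirst("p");
        return p == null ? "" : p.text();
    }

    public static void main(String[] args) {
        // The <b> tag is dropped and the trailing space is trimmed.
        System.out.println(paragraphText("<p>Hello <b>there</b> now! </p>"));
    }
}
```

Running it should print Hello there now!, matching the Javadoc description above.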

Example Code

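Elements tes = doc.select("div.pd__content__row");
Elements spesifikasiProductContent = tes.select("[class=pd__spec__table]");

// Collect the text of every row, one row per line
StringBuilder cleaned = new StringBuilder();

for (Element element : spesifikasiProductContent) {
    for (Element rowElement : element.select("tr")) {
        // text() drops the tags and decodes entities such as &lt;
        cleaned.append(rowElement.text()).append("\n");
    }
}
// Print once, after all matching tables have been processed
System.out.print(cleaned);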

Output

Isi 750ml
Material Tritan Material, ABS Plastic
Dimensi 21 X 15 X 3 Cm
Lain-lain Dimensi : A5 min. -20C, Max. 120C
Sertifikasi CE / EU, CIQ, EEC, FDA, LFGB, SGS
Volume < 0.500 L
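
If the label and the value of each row are needed separately rather than as one joined line, the same text() call can be applied per cell. The following is only a sketch under the assumption that every row holds exactly two td cells, as in the table from the question; the class name SpecTable and the method parseSpecs are hypothetical names chosen for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.LinkedHashMap;
import java.util.Map;

public class SpecTable {
    // Hypothetical helper: maps the first cell of each row (the label)
    // to the second cell (the value), keeping the rows in document order.
    static Map<String, String> parseSpecs(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> specs = new LinkedHashMap<>();
        for (Element row : doc.select("table.pd__spec__table tr")) {
            Elements cells = row.select("td");
            // Skip rows that do not have a label cell and a value cell.
            if (cells.size() >= 2) {
                specs.put(cells.get(0).text(), cells.get(1).text());
            }
        }
        return specs;
    }

    public static void main(String[] args) {
        String html = "<table class=\"pd__spec__table\"><tbody>"
                + "<tr><td>Isi</td><td>750ml</td></tr>"
                + "<tr><td>Volume</td><td>&lt; 0.500 L</td></tr>"
                + "</tbody></table>";
        parseSpecs(html).forEach((label, value) ->
                System.out.println(label + ": " + value));
    }
}
```

A LinkedHashMap is used here so the specifications come back in the same order as the table rows; a plain HashMap would not guarantee that.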