Sergio Ares - 19 days ago
Android Question

Problems using jsoup to parse HTML Table

A few days ago I asked a question about parsing an HTML table with jsoup. @luksch helped me and I was able to solve it. My problem was how to parse one part of an HTML file with many TR and TD elements and select specific text in them (the "Group Block" TABLE).

HTML CODE:

<TABLE SUMMARY="Topline" WIDTH="100%">
<TR><TD HEIGHT=16>&nbsp;</TD></TR> <!-- For the menu bar -->
<TR>
<TD VALIGN=MIDDLE ALIGN=LEFT WIDTH="30%">
<FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Xymon</B></FONT
</TD>
<TD VALIGN=MIDDLE ALIGN=CENTER WIDTH="40%">
<CENTER><FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Current Status</B></FONT></CENTER>
</TD>
<TD VALIGN=MIDDLE ALIGN=RIGHT WIDTH="30%">
<FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Thu Jul 23 16:05:06 2015</B></FONT>
</TD>
</TR>
<TR>
<TD COLSPAN=3> <HR WIDTH="100%"> </TD>
</TR>
</TABLE>
<BR>
<A NAME=hosts-blk>&nbsp;</A>

<CENTER><TABLE SUMMARY="Group Block" BORDER=0 CELLPADDING=2>
<TR><TD VALIGN=MIDDLE ROWSPAN=2><CENTER><FONT COLOR="#FFFFF0" SIZE="+1">&nbsp;</FONT></CENTER></TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbd"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbd</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbgen"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbgen</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbtest"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbtest</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?conn"><FONT COLOR="#87a9e5" SIZE="-1"><B>conn</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?cpu"><FONT COLOR="#87a9e5" SIZE="-1"><B>cpu</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?disk"><FONT COLOR="#87a9e5" SIZE="-1"><B>disk</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?files"><FONT COLOR="#87a9e5" SIZE="-1"><B>files</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?hobbitd"><FONT COLOR="#87a9e5" SIZE="-1"><B>hobbitd</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?http"><FONT COLOR="#87a9e5" SIZE="-1"><B>http</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?info"><FONT COLOR="#87a9e5" SIZE="-1"><B>info</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?memory"><FONT COLOR="#87a9e5" SIZE="-1"><B>memory</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?msgs"><FONT COLOR="#87a9e5" SIZE="-1"><B>msgs</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?ports"><FONT COLOR="#87a9e5" SIZE="-1"><B>ports</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?procs"><FONT COLOR="#87a9e5" SIZE="-1"><B>procs</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?trends"><FONT COLOR="#87a9e5" SIZE="-1"><B>trends</B></FONT></A> </TD>
</TR>

<TR><TD COLSPAN=15><HR WIDTH="100%"></TD></TR>
<TR class=line>
<TD NOWRAP><A NAME="hostname1">&nbsp;</A>
<FONT SIZE="+1" COLOR="#FFFFCC" FACE="Tahoma, Arial, Helvetica"><span title="127.0.0.1">hostname1</span></FONT><TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1.&amp;SERVICE=bbd"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbd:green:268d04h25m" TITLE="bbd:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=bbgen"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbgen:green:268d04h24m" TITLE="bbgen:green:268d04h24m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=bbtest"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbtest:green:268d04h25m" TITLE="bbtest:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=conn"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="conn:green:268d04h25m" TITLE="conn:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=cpu"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="cpu:green:169d00h15m" TITLE="cpu:green:169d00h15m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=disk"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="disk:green:268d04h25m" TITLE="disk:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=files"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="files:clear:268d04h25m" TITLE="files:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=hobbitd"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="hobbitd:green:169d01h05m" TITLE="hobbitd:green:169d01h05m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:268d04h19m" TITLE="http:green:268d04h19m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=info"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="info:green:127.0.0.1" TITLE="info:green:127.0.0.1" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=memory"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="memory:green:268d04h25m" TITLE="memory:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=msgs"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="msgs:green:268d04h20m" TITLE="msgs:green:268d04h20m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=ports"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="ports:clear:268d04h25m" TITLE="ports:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=procs"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="procs:clear:268d04h25m" TITLE="procs:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=trends"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="trends:green:" TITLE="trends:green:" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
</TR>

<TR class=line>
<TD NOWRAP><A NAME="hostname2">&nbsp;</A>
<FONT SIZE="+1" COLOR="#FFFFCC" FACE="Tahoma, Arial, Helvetica"><span title="127.0.0.2">hostname2</span></FONT><TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=bbd"><IMG SRC="/hobbit/gifs/static/red.gif" ALT="bbd:red:16d06h46m" TITLE="bbd:red:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=conn"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="conn:green:16d06h46m" TITLE="conn:green:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:16d06h46m" TITLE="http:green:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=info"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="info:green:127.0.0.2" TITLE="info:green:127.0.0.2" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=trends"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="trends:green:" TITLE="trends:green:" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
</TR>

</TABLE></CENTER><BR>
<BR><BR>


I solved the first part (the "Group Block" TABLE header with bbd, bbgen, bbtest, etc.) with:

ArrayList<String> groupBlock = new ArrayList<String>();
Object[] objPlace;
Element table = document.select("TABLE").get(1); //select the second table: "Group Block"
Elements rows = table.select("TR");
for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(i);
    Elements cols = row.select("TD");
    for (Element col : cols) {
        switch (col.text()) {
            case "bbd":
            case "bbgen":
            case "bbtest":
            //...more cases
                groupBlock.add(col.text());
                break;
            default:
                break;
        }
    }
}
objPlace = groupBlock.toArray();


Now I have to parse the two hostnames (hostname1 and hostname2) and put each in a separate TextView, but the problem is that the hostnames can change in the future. In addition, I have to parse the IMG SRC in each TD, for example:

<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:268d04h19m" TITLE="http:green:268d04h19m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>


I need to extract only the IMG SRC, /hobbit/gifs/static/green.gif, and prepend the rest of the URL to it, giving http://example.com/hobbit/gifs/static/green.gif, so that I can fetch the image and put it in another field of the XML layout. I have to do this for every TD with an IMG SRC in the HTML file.

I know that once I have the image URLs I have to do something like:

InputStream input = new java.net.URL(imgSrc).openStream();
bitmap = BitmapFactory.decodeStream(input);
ImageView logoimg = (ImageView) findViewById(R.id.logo);
logoimg.setImageBitmap(bitmap);


imgSrc should be an array with all the IMG SRC values.

I don't know how to start on the previous steps; I'm a beginner with jsoup and Android.

Answer

You could query for the <td> that contains the hostname element, then go up to its parent, i.e. the <tr>. From there, get all child <td> elements again; these are the cells containing the image links you want. Something along these lines:

Document document = Jsoup.parse(html);
Element table = document.select("TABLE").get(1); 
Elements asWithName = table.select("tr>td a[name]");
for (Element aWithName : asWithName){
    String name = aWithName.attr("name");
    System.out.println("hostname="+name);
    Element tr = aWithName.parent().parent();
    for (Element td : tr.select("td")){
        Element img = td.select("img").first();
        if (img == null){
            continue;
        }
        String imgRelPath = img.attr("src");
        System.out.println("  imgRelPath="+imgRelPath);
    }
}
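
To turn the imgRelPath values printed above into full URLs, one option is to resolve each of them against the site's base URL with java.net.URI. The following is only a minimal sketch: the class name ImageUrlResolver and the method toAbsoluteUrls are made up for illustration, and http://example.com/ is the placeholder base URL taken from the question.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of jsoup or Android.
public class ImageUrlResolver {

    // Resolves each relative path (e.g. "/hobbit/gifs/static/green.gif")
    // against baseUrl and returns the absolute URLs in the same order.
    public static String[] toAbsoluteUrls(String baseUrl, List<String> relPaths) {
        URI base = URI.create(baseUrl);
        List<String> result = new ArrayList<>();
        for (String relPath : relPaths) {
            result.add(base.resolve(relPath).toString());
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Assumed input: paths collected from img.attr("src") in the loop above.
        List<String> relPaths = Arrays.asList(
                "/hobbit/gifs/static/green.gif",
                "/hobbit/gifs/static/clear.gif");
        for (String url : toAbsoluteUrls("http://example.com/", relPaths)) {
            System.out.println(url);
        }
    }
}
```

Alternatively, jsoup can do the resolution itself if you parse with a base URI, i.e. Jsoup.parse(html, "http://example.com/"), and then read img.absUrl("src") instead of img.attr("src"). Either way, the resulting array can serve as the imgSrc values the question asks for.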