Sergio Ares - 22 days ago
Android Question

How to parse HTML table using jsoup?

I am trying to parse HTML using jsoup. This is my first time working with jsoup, and I'm finding it a little hard. The HTML I am trying to parse is below. It is complicated because of the many TR and TD elements, and I don't know how to select the name of each column in table 1, "Group Block" (table 0 is "Topline", which I don't need).

I only need to select "bbd, bbgen, bbtest, conn, cpu, disk, files, hobbitd, http, info, memory, msgs, ports, procs, trends" so I can show them in a TextView defined in my XML layout. Is this possible with jsoup?

For reference, this is how I connect to the URL:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String username = "user";
String password = "pass";
String login = username + ":" + password;
// Base64-encode "user:pass" for HTTP Basic authentication (no line breaks)
String base64login = android.util.Base64.encodeToString(login.getBytes(), android.util.Base64.NO_WRAP);
Document document = Jsoup.connect("http://example.com")
        .header("Authorization", "Basic " + base64login)
        .get();
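The Authorization value is just "Basic " followed by the Base64 encoding of "user:pass". To check that value off the device (for example in a plain JVM unit test, where the Android framework classes are not available), here is a minimal sketch using java.util.Base64 (Java 8+). The class and method names are hypothetical; the basic encoder does not insert line breaks, which matches NO_WRAP.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    // Builds the value of an HTTP "Authorization" header for Basic auth:
    // "Basic " + Base64("username:password"), with no line wrapping.
    static String basicAuth(String username, String password) {
        String login = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(login.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(basicAuth("user", "pass")); // prints "Basic dXNlcjpwYXNz"
    }
}
```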


HTML code:

<TABLE SUMMARY="Topline" WIDTH="100%">
<TR><TD HEIGHT=16>&nbsp;</TD></TR> <!-- For the menu bar -->
<TR>
<TD VALIGN=MIDDLE ALIGN=LEFT WIDTH="30%">
<FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Xymon</B></FONT
</TD>
<TD VALIGN=MIDDLE ALIGN=CENTER WIDTH="40%">
<CENTER><FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Current Status</B></FONT></CENTER>
</TD>
<TD VALIGN=MIDDLE ALIGN=RIGHT WIDTH="30%">
<FONT FACE="Arial, Helvetica" SIZE="+1" COLOR="silver"><B>Thu Jul 23 16:05:06 2015</B></FONT>
</TD>
</TR>
<TR>
<TD COLSPAN=3> <HR WIDTH="100%"> </TD>
</TR>
</TABLE>
<BR>
<A NAME=hosts-blk>&nbsp;</A>


<CENTER><TABLE SUMMARY="Group Block" BORDER=0 CELLPADDING=2>
<TR><TD VALIGN=MIDDLE ROWSPAN=2><CENTER><FONT COLOR="#FFFFF0" SIZE="+1">&nbsp;</FONT></CENTER></TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbd"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbd</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbgen"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbgen</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?bbtest"><FONT COLOR="#87a9e5" SIZE="-1"><B>bbtest</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?conn"><FONT COLOR="#87a9e5" SIZE="-1"><B>conn</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?cpu"><FONT COLOR="#87a9e5" SIZE="-1"><B>cpu</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?disk"><FONT COLOR="#87a9e5" SIZE="-1"><B>disk</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?files"><FONT COLOR="#87a9e5" SIZE="-1"><B>files</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?hobbitd"><FONT COLOR="#87a9e5" SIZE="-1"><B>hobbitd</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?http"><FONT COLOR="#87a9e5" SIZE="-1"><B>http</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?info"><FONT COLOR="#87a9e5" SIZE="-1"><B>info</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?memory"><FONT COLOR="#87a9e5" SIZE="-1"><B>memory</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?msgs"><FONT COLOR="#87a9e5" SIZE="-1"><B>msgs</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?ports"><FONT COLOR="#87a9e5" SIZE="-1"><B>ports</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?procs"><FONT COLOR="#87a9e5" SIZE="-1"><B>procs</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?trends"><FONT COLOR="#87a9e5" SIZE="-1"><B>trends</B></FONT></A> </TD>
</TR>
<TR><TD COLSPAN=15><HR WIDTH="100%"></TD></TR>


EDIT:

I tried this but it doesn't work:

ArrayList<String> groupBlock = new ArrayList<String>();
Object[] objPlace;
Element table = document.select("TABLE").get(1); //select the second table: "Group Block"
Elements rows = table.select("TR");
for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(i);
    Elements col = row.select("TD");
    if (col.get(1).text().equals("bbd")) { //Check only one field for the moment
        groupBlock.add(col.get(1).text());
    }
}
objPlace = groupBlock.toArray();


Then I do:

TextView txtGroupBlock = (TextView) findViewById(R.id.txtGroupBlock);
txtGroupBlock.setText("");
for (int i = 0; i < objPlace.length; i++) {
    txtGroupBlock.append(objPlace[i].toString() + " ");
}


The error:

07-23 21:26:36.454: E/AndroidRuntime(330): FATAL EXCEPTION: AsyncTask #1
07-23 21:26:36.454: E/AndroidRuntime(330): java.lang.RuntimeException: An error occured while executing doInBackground()
07-23 21:26:36.454: E/AndroidRuntime(330): at android.os.AsyncTask$3.done(AsyncTask.java:200)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.FutureTask$Sync.innerSetException(FutureTask.java:274)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.FutureTask.setException(FutureTask.java:125)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:308)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.FutureTask.run(FutureTask.java:138)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1088)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:581)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.lang.Thread.run(Thread.java:1019)
07-23 21:26:36.454: E/AndroidRuntime(330): Caused by: java.lang.IndexOutOfBoundsException: Invalid index 1, size is 1
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.ArrayList.throwIndexOutOfBoundsException(ArrayList.java:257)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.ArrayList.get(ArrayList.java:311)
07-23 21:26:36.454: E/AndroidRuntime(330): at org.jsoup.select.Elements.get(Elements.java:544)
07-23 21:26:36.454: E/AndroidRuntime(330): at activities.monitorapp.MainActivity$Update.doInBackground(MainActivity.java:211)
07-23 21:26:36.454: E/AndroidRuntime(330): at activities.monitorapp.MainActivity$Update.doInBackground(MainActivity.java:1)
07-23 21:26:36.454: E/AndroidRuntime(330): at android.os.AsyncTask$2.call(AsyncTask.java:185)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:306)


EDIT 2:

Now I have a related problem. I need to do something similar, but with the following HTML, which directly follows the previous snippet in the same file:

...
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?procs"><FONT COLOR="#87a9e5" SIZE="-1"><B>procs</B></FONT></A> </TD>
<TD ALIGN=CENTER VALIGN=BOTTOM WIDTH=45>
<A HREF="/hobbit-cgi/hobbitcolumn.sh?trends"><FONT COLOR="#87a9e5" SIZE="-1"><B>trends</B></FONT></A> </TD>
</TR>
<TR><TD COLSPAN=15><HR WIDTH="100%"></TD></TR>

<TR class=line>
<TD NOWRAP><A NAME="hostname1">&nbsp;</A>
<FONT SIZE="+1" COLOR="#FFFFCC" FACE="Tahoma, Arial, Helvetica"><span title="127.0.0.1">hostname1</span></FONT><TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1.&amp;SERVICE=bbd"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbd:green:268d04h25m" TITLE="bbd:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=bbgen"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbgen:green:268d04h24m" TITLE="bbgen:green:268d04h24m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=bbtest"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="bbtest:green:268d04h25m" TITLE="bbtest:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=conn"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="conn:green:268d04h25m" TITLE="conn:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=cpu"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="cpu:green:169d00h15m" TITLE="cpu:green:169d00h15m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=disk"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="disk:green:268d04h25m" TITLE="disk:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=files"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="files:clear:268d04h25m" TITLE="files:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=hobbitd"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="hobbitd:green:169d01h05m" TITLE="hobbitd:green:169d01h05m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:268d04h19m" TITLE="http:green:268d04h19m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=info"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="info:green:127.0.0.1" TITLE="info:green:127.0.0.1" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=memory"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="memory:green:268d04h25m" TITLE="memory:green:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=msgs"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="msgs:green:268d04h20m" TITLE="msgs:green:268d04h20m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=ports"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="ports:clear:268d04h25m" TITLE="ports:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=procs"><IMG SRC="/hobbit/gifs/static/clear.gif" ALT="procs:clear:268d04h25m" TITLE="procs:clear:268d04h25m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=trends"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="trends:green:" TITLE="trends:green:" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
</TR>

<TR class=line>
<TD NOWRAP><A NAME="hostname2">&nbsp;</A>
<FONT SIZE="+1" COLOR="#FFFFCC" FACE="Tahoma, Arial, Helvetica"><span title="127.0.0.2">hostname2</span></FONT><TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=bbd"><IMG SRC="/hobbit/gifs/static/red.gif" ALT="bbd:red:16d06h46m" TITLE="bbd:red:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=conn"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="conn:green:16d06h46m" TITLE="conn:green:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:16d06h46m" TITLE="http:green:16d06h46m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=info"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="info:green:127.0.0.2" TITLE="info:green:127.0.0.2" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER>-</TD>
<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname2&amp;SERVICE=trends"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="trends:green:" TITLE="trends:green:" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>
</TR>

</TABLE></CENTER><BR>
<BR><BR>


Here I need to parse the two host names (hostname1 and hostname2) to show each one in a separate TextView, but the host names may change in the future, so I can't hard-code them. I also need to parse the IMG SRC in each TD, for example:

<TD ALIGN=CENTER><A HREF="/hobbit-cgi/bb-hostsvc.sh?HOST=hostname1&amp;SERVICE=http"><IMG SRC="/hobbit/gifs/static/green.gif" ALT="http:green:268d04h19m" TITLE="http:green:268d04h19m" HEIGHT="16" WIDTH="16" BORDER=0></A></TD>


I need to extract /hobbit/gifs/static/green.gif and prepend the rest of the URL to get the image: http://example.com/hobbit/gifs/static/green.gif.
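Prepending the host is standard relative-URL resolution. With jsoup, img.absUrl("src") (or img.attr("abs:src")) already does this against the document's base URI, which Jsoup.connect(...).get() sets to the fetched URL. For a plain-Java equivalent, here is a minimal sketch with java.net.URI; the class and method names are hypothetical.

```java
import java.net.URI;

public class ImageUrlResolver {
    // Resolves a root-relative path such as "/hobbit/gifs/static/green.gif"
    // against the page's URL, the same way a browser resolves IMG SRC.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // prints "http://example.com/hobbit/gifs/static/green.gif"
        System.out.println(resolve("http://example.com/", "/hobbit/gifs/static/green.gif"));
    }
}
```

Because the SRC starts with "/", only the scheme and host of the page URL are kept, so the same call works whatever page path the status page is served from.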

I know that once I have the image URL, I have to do something like:

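// Network I/O: run this off the UI thread, e.g. in doInBackground()
InputStream input = new java.net.URL(imgSrc).openStream();
Bitmap bitmap = BitmapFactory.decodeStream(input);
input.close();
// Touch views only on the UI thread, e.g. in onPostExecute()
ImageView logoimg = (ImageView) findViewById(R.id.logo);
logoimg.setImageBitmap(bitmap);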


But I'm lost on the steps before that. Any ideas? I don't know where to start.
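A possible starting point, sketched with jsoup under the assumption that the page keeps the structure shown above: each host is a TR with class "line" inside the "Group Block" table, the host name is the text of the span with a title attribute in the first cell, and every remaining cell is one status column holding either an IMG or "-". Nothing depends on the host names themselves. HostStatusParser, HostRow and parseHosts are hypothetical names. On Android, the fetch and parsing would go in doInBackground() and the TextView/ImageView updates in onPostExecute().

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HostStatusParser {
    // One host row: its name plus one image URL per status column
    // (null where the page shows "-" instead of an image).
    static class HostRow {
        final String hostname;
        final List<String> imageUrls = new ArrayList<String>();
        HostRow(String hostname) { this.hostname = hostname; }
    }

    static List<HostRow> parseHosts(Document document) {
        List<HostRow> hosts = new ArrayList<HostRow>();
        // Host rows are the <TR class=line> rows of the "Group Block" table.
        for (Element row : document.select("table[summary=Group Block] tr.line")) {
            // The host name is the text of <span title="...">, whatever it is called.
            Element span = row.select("span[title]").first();
            if (span == null) continue;
            HostRow host = new HostRow(span.text());
            // Skip the first cell (the host name); each remaining cell is one column.
            List<Element> cells = row.select("td");
            for (Element cell : cells.subList(1, cells.size())) {
                Element img = cell.select("img").first();
                // absUrl() resolves "/hobbit/gifs/..." against the document's base URI.
                host.imageUrls.add(img == null ? null : img.absUrl("src"));
            }
            hosts.add(host);
        }
        return hosts;
    }

    public static void main(String[] args) {
        String html = "<table summary=\"Group Block\">"
                + "<tr class=line><td nowrap><span title=\"127.0.0.2\">hostname2</span>"
                + "<td><a href=\"#\"><img src=\"/hobbit/gifs/static/red.gif\"></a></td>"
                + "<td>-</td></tr></table>";
        // The second argument is the base URI used by absUrl().
        Document document = Jsoup.parse(html, "http://example.com/");
        for (HostRow host : parseHosts(document)) {
            System.out.println(host.hostname + " " + host.imageUrls);
        }
    }
}
```

With the document returned by Jsoup.connect("http://example.com")...get(), the base URI is already the fetched URL, so parseHosts(document) can be called on it directly.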

Answer

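The problem is here:

if (col.get(1).text().equals("bbd")) {
    groupBlock.add(col.get(1).text());
}

The second row of the "Group Block" table (<TR><TD COLSPAN=15><HR WIDTH="100%"></TD></TR>) contains only one TD, so col.get(1) asks for index 1 of a one-element list. That is exactly what the error tells you: Invalid index 1, size is 1.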

Instead of relying on a fixed index, loop over every cell in each row and match on its text. Maybe something like this:

ArrayList<String> groupBlock = new ArrayList<String>();
Object[] objPlace;
Element table = document.select("TABLE").get(1); //select the second table: "Group Block"
Elements rows = table.select("TR");
for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(i);
    Elements cols = row.select("TD");
    // Visit every cell, so rows with a single cell (like the HR row) are safe
    for (Element col : cols){
        switch(col.text()){ // switch on String needs Java 7
        case "bbd":
        case "bbgen":
        case "bbtest":
        //...more cases if you need them
            // Stores the link target; use col.text() instead to store the column name
            groupBlock.add(col.select("a").first().attr("href"));
            System.out.println(col.text());
            break;
        default:
            break;
        }
    }
}
objPlace = groupBlock.toArray();

I am not sure what you need from the DOM, but I think you get the idea.
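If all you need is the column names, a CSS-style selector avoids index arithmetic entirely. A minimal sketch, assuming the markup stays as shown (the table keeps SUMMARY="Group Block" and each column header links to hobbitcolumn.sh); GroupBlockColumns and columnNames are hypothetical names. The host-row links point to bb-hostsvc.sh, so they are not matched.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class GroupBlockColumns {
    // Returns the column names ("bbd", "bbgen", ...) of the "Group Block" table:
    // the text of every link to hobbitcolumn.sh inside that table.
    static List<String> columnNames(Document document) {
        List<String> names = new ArrayList<String>();
        for (Element link : document.select("table[summary=Group Block] a[href*=hobbitcolumn.sh]")) {
            names.add(link.text());
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<table summary=\"Group Block\"><tr>"
                + "<td><a href=\"/hobbit-cgi/hobbitcolumn.sh?bbd\"><b>bbd</b></a></td>"
                + "<td><a href=\"/hobbit-cgi/hobbitcolumn.sh?cpu\"><b>cpu</b></a></td>"
                + "</tr><tr><td colspan=15><hr></td></tr></table>";
        Document document = Jsoup.parse(html);
        // On Android, join the names for the TextView, e.g. TextUtils.join(" ", names)
        System.out.println(columnNames(document));
    }
}
```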
