user3000019 - 1 month ago
Java Question

Trouble getting information from HTML tables in Java

I want to get information from the first table on this site:
Link

This is the code I have:

Document document = Jsoup.parse(DownloadPage("http://www.transtejo.pt/clientes/horarios" +
        "-ligacoes-fluviais/ligacao-barreiro-terreiro-do-paco/#dias-uteis"));

Elements table = document.select("table.easy-table-creator:nth-child(1) tbody");
Elements trAll = table.select("tr");

// For the Table Hour
Elements tr_first = table.select("tr:nth-child(1)");
Element tr = tr_first.get(1);
Elements td = tr.getElementsByTag("td");
for (int i = 0; i < td.size(); i++) {
    Log.d("TIME TABLE:", " " + td.get(i).text());

    for (int i1 = 1; i1 < trAll.size(); i1++) {
        Elements td_inside = trAll.get(i1).getElementsByTag("td");
        Log.d("TD INSIDE:", " " + td_inside.get(i).text());
    }
}


Right now I am able to get information, but I am also getting content from other tables: all the tables have the same class name, and I am having trouble selecting only the one I need. I am also getting an IndexOutOfBoundsException.
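The exception most likely comes from the inner loop calling td_inside.get(i) on every row: jsoup's Elements is a java.util.List, so get(i) throws IndexOutOfBoundsException as soon as a row (possibly one from another of the matched tables) has fewer td cells than the header row. A minimal sketch of a bounds check in plain Java, where each row is modeled as a List<String> of cell texts as a stand-in for Elements (an assumption so the example runs without jsoup or Android); the class and method names are hypothetical:

```java
import java.util.List;

public class SafeCells {
    // Returns the text of cell i in a row, or "" when the row has fewer than i + 1 cells.
    // This avoids IndexOutOfBoundsException when rows have different cell counts.
    static String cellOrEmpty(List<String> row, int i) {
        return i < row.size() ? row.get(i) : "";
    }

    public static void main(String[] args) {
        // Hypothetical rows: the header has three hours, the last body row is shorter.
        List<List<String>> rows = List.of(
                List.of("05H", "06H", "07H"),
                List.of("15", "15", "05"),
                List.of("45", "35"));
        List<String> header = rows.get(0);
        for (int i = 0; i < header.size(); i++) {
            System.out.println("TIME TABLE: " + header.get(i));
            for (int r = 1; r < rows.size(); r++) {
                String text = cellOrEmpty(rows.get(r), i);
                if (!text.isEmpty()) { // skip missing or empty cells
                    System.out.println("TD INSIDE: " + text);
                }
            }
        }
    }
}
```

With a bounds check like this, a short row is simply skipped for that column instead of aborting the whole loop.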

This is the log it produces:
Loglink

The kind of log I want is something like this:
the hour (TIME TABLE), then all the minutes listed below it for that hour (TD INSIDE), and then on to the next hour (...).

Thanks for your time.

[EDIT]
A better log example, for the first table:

TIME TABLE: 05H
TD INSIDE: 15
TD INSIDE: 45
TIME TABLE: 06H
TD INSIDE: 15
TD INSIDE: 35
TD INSIDE: 45
TD INSIDE: 55
TIME TABLE: 07H
TD INSIDE: 05
TD INSIDE: 15
TD INSIDE: 20
TD INSIDE: 25
TD INSIDE: 35
TD INSIDE: 40
TD INSIDE: 50
TD INSIDE: 55


(...)
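The grouping described above (each hour followed by the minutes in its column) is essentially a column-by-column read of the grid. A minimal sketch in plain Java, assuming the hours sit in the first row and each later row holds one minute value per hour column, with empty strings where an hour has no more departures. The grid is illustrative, rebuilt from the first three hours of the expected log, and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HourSchedule {
    // Reads the grid column by column: row 0 holds the hours, every later row
    // holds one minute value per hour column ("" when there is none).
    static Map<String, List<String>> groupByHour(List<List<String>> rows) {
        Map<String, List<String>> byHour = new LinkedHashMap<>(); // keeps column order
        List<String> hours = rows.get(0);
        for (int col = 0; col < hours.size(); col++) {
            List<String> minutes = new ArrayList<>();
            for (List<String> row : rows.subList(1, rows.size())) {
                // skip short rows and empty cells instead of throwing
                if (col < row.size() && !row.get(col).isEmpty()) {
                    minutes.add(row.get(col));
                }
            }
            byHour.put(hours.get(col), minutes);
        }
        return byHour;
    }

    public static void main(String[] args) {
        // Illustrative grid rebuilt from the expected log (05H, 06H, 07H only).
        List<List<String>> rows = List.of(
                List.of("05H", "06H", "07H"),
                List.of("15", "15", "05"),
                List.of("45", "35", "15"),
                List.of("", "45", "20"),
                List.of("", "55", "25"),
                List.of("", "", "35"),
                List.of("", "", "40"),
                List.of("", "", "50"),
                List.of("", "", "55"));
        groupByHour(rows).forEach((hour, minutes) -> {
            System.out.println("TIME TABLE: " + hour);
            minutes.forEach(m -> System.out.println("TD INSIDE: " + m));
        });
    }
}
```

Running main prints the same TIME TABLE / TD INSIDE sequence as the first three hours of the expected log. Collecting into a LinkedHashMap instead of logging directly keeps the hours in column order and leaves the data available for later use.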

Answer

You can do it like this:

import android.util.Log;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// .first() keeps only the first matching tbody instead of all of them
Element table = document
  .select("table.easy-table-creator:nth-child(1) tbody").first();
Elements trAll = table.select("tr");
// every row except the header row that holds the hours
Elements trAllBody = table.select("tr:not(:first-child)");

// For the Table Hour
Element trFirst = trAll.first();
Elements tds = trFirst.select("td");
for (int i = 0; i < tds.size(); i++) {
    Element td = tds.get(i);
    Log.d("TIME TABLE:", " " + td.text());

    // cells of column i in the body rows (nth-child is 1-based)
    String query = "td:nth-child(" + (i + 1) + ")";
    Elements subTds = trAllBody.select(query);
    for (int j = 0; j < subTds.size(); j++) {
        Element subTd = subTds.get(j);
        String tdText = subTd.text();
        if (!tdText.isEmpty()) { // skip empty cells
            Log.d("TD INSIDE:", " " + tdText);
        }
    }
}

Some interesting points:

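  • your table.easy-table-creator:nth-child(1) tbody selector was matching all the tables in the page, because :nth-child(1) matches any table that is the first child of its own parent, not just the first table in the document; .first() keeps only the first match;
  • by calling select again on the body rows, you can retrieve all the tds in a given column with td:nth-child(index), where index starts at 1;
  • trAllBody contains all the trs except the first one (selected with tr:not(:first-child)).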