Agustin Scalisi - 6 months ago
Java Question

Jsoup: get data from a table inside a table

This is not simple.
I am parsing a page (http://www.catedralaltapatagonia.com/invierno/partediario.php?default_tab=0).
I need the data contained in a table inside another table, but I cannot access it because I always get "Invalid index" errors.

I need these values:

[Screenshot: the cells I need]

These cells are inside a td inside a tr inside a table, and that table is inside another table.
Each column of cells is inside a div with id "meteo_info", and every td contains a div with that same id.
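Because every column reuses the id meteo_info (duplicate ids are invalid HTML, but parsers tolerate them), getElementById() only ever returns the first match. A minimal sketch of collecting every such div with select() instead, assuming a heavily simplified, hypothetical version of the markup (the class name MeteoInfoDemo and the cell values are made up for illustration):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class MeteoInfoDemo {
    // Hypothetical stand-in markup: a table inside a table, with two columns
    // whose divs share the id "meteo_info".
    static final String HTML =
        "<table><tr><td><table><tr>"
        + "<td><div id='meteo_info'><table><tr><td>Base</td><td>-2</td></tr></table></div></td>"
        + "<td><div id='meteo_info'><table><tr><td>Superior</td><td>-7</td></tr></table></div></td>"
        + "</tr></table></td></tr></table>";

    /** Returns the second cell of the first row inside every div#meteo_info. */
    static List<String> values(Document doc) {
        List<String> out = new ArrayList<>();
        // select() returns ALL matching divs; getElementById() stops at the first one.
        for (Element div : doc.select("div#meteo_info")) {
            Element cell = div.selectFirst("tr > td:nth-child(2)");
            out.add(cell != null ? cell.text() : "");
        }
        return out;
    }
}
```

On this stand-in markup, values(Jsoup.parse(MeteoInfoDemo.HTML)) yields one entry per column rather than only the first.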

I tried this, with no success:

Elements base1 = document.select("div#pd_foto_fondo");
Elements base2 = base1.select("table");
Elements base3 = base2.select("tr");
Elements base4 = base3.select("table");
Elements base5 = base4.select("tr");
Elements base6 = base5.select("td");
Element base7 = base6.get(0);
Element div1 = base7.getElementById("meteo_info");
Elements tables1 = div1.getElementsByTag("table");
Element table1 = tables1.get(0);

String text2 = table1.getElementsByTag("tr").get(3).getElementsByTag("td").get(2).text();


I run this code inside an AsyncTask's doInBackground().

TDG
Answer

First, when downloading the web page in your app, set the User-Agent header to match the browser you use on your computer. That way the server should send your app exactly the same page, with the same tags, that you see in the browser.
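With Jsoup, the User-Agent is set on the Connection before calling get(). A minimal sketch; the User-Agent string below is only an example desktop Firefox value (copy the one your own browser sends), and the class name PageFetcher and the 10-second timeout are assumptions:

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;

class PageFetcher {
    static final String URL =
        "http://www.catedralaltapatagonia.com/invierno/partediario.php?default_tab=0";

    // Example desktop Firefox User-Agent (assumption); replace it with your browser's.
    static final String DESKTOP_UA =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0";

    /** Builds the request (User-Agent and timeout) without sending it. */
    static Connection connection(String url) {
        return Jsoup.connect(url)
                .userAgent(DESKTOP_UA)
                .timeout(10_000); // milliseconds
    }

    /** Sends the request and parses the page; call it from doInBackground(), not the UI thread. */
    static Document fetch() throws IOException {
        return connection(URL).get();
    }
}
```

Splitting the request construction from fetch() keeps the header setup easy to inspect without touching the network.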
I use Firefox (FF), but other browsers work almost the same way.
Open the developer tools (F12 in FF), choose the Inspector and then the element picker (in FF, the left-most tool). Then pick one of the elements you want to get, say the Sensación Térmica of SECTOR BASE. The browser highlights the code that contains that element.
Hover over the highlighted code, right-click it and select Copy unique selector.
You can then use that selector to get the element:

Elements e = doc.select("#pd_foto_fondo > table:nth-child(5) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(1) > div:nth-child(1) > div:nth-child(3) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(4) > td:nth-child(3)"); 

and read its value with:

e.text();
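If the app receives a different page or the layout changes, a copied selector can match nothing. doc.select(...) then returns an empty Elements, whose text() is just an empty string, while get(index) on an empty list throws IndexOutOfBoundsException (on Android its message reads "Invalid index ..."), which is likely the error from the question. A small sketch using selectFirst(), which returns null when nothing matches; the helper name textOrNull and the stand-in HTML are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SelectorLookup {
    /** Text of the first element matching the selector, or null if nothing matches. */
    static String textOrNull(Document doc, String selector) {
        Element e = doc.selectFirst(selector);
        return e == null ? null : e.text();
    }

    public static void main(String[] args) {
        // Stand-in page. Like the browser, jsoup inserts the implicit <tbody>,
        // so selectors copied from the developer tools can still match.
        Document doc = Jsoup.parse(
            "<div id='pd_foto_fondo'><table><tr><td>a</td><td>b</td><td>-3</td></tr></table></div>");
        String sel = "#pd_foto_fondo > table:nth-child(1) > tbody:nth-child(1)"
            + " > tr:nth-child(1) > td:nth-child(3)";
        System.out.println(textOrNull(doc, sel));        // prints -3
        System.out.println(textOrNull(doc, "#missing")); // prints null
    }
}
```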

Now do the same for all the elements you need, and you will find a pattern: there are three tables (SECTOR BASE, SECTOR INTERMEDIO, SECTOR SUPERIOR), and their index is in the 7th step from the end of the selector, td:nth-child(1), td:nth-child(2) and td:nth-child(3) (not easy to see in such a long line):

#pd_foto_fondo > table:nth-child(5) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(1) > div:nth-child(1) > div:nth-child(3) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(4) > td:nth-child(3)
#pd_foto_fondo > table:nth-child(5) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > div:nth-child(1) > div:nth-child(3) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(4) > td:nth-child(3)
#pd_foto_fondo > table:nth-child(5) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(3) > div:nth-child(1) > div:nth-child(3) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(4) > td:nth-child(3)

Each row also has a different index, this time in the second step from the end (tr:nth-child(N)). The Sensación Térmica is:

#pd_foto_fondo > table:nth-child(5) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(1) > div:nth-child(1) > div:nth-child(3) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(4) > td:nth-child(3)

and the Viento is:

#pd_foto_fondo > table:nth-child(5) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(1) > div:nth-child(1) > div:nth-child(3) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(5) > td:nth-child(3)

(Note the 4 and the 5 in tr:nth-child(...) near the end of the last two selectors.)
You can iterate over these selectors with two nested for loops and get all the information you need.
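The two nested loops can be sketched as below: the outer loop walks the three sector columns (td:nth-child(1..3)) and the inner loop walks the data rows (tr:nth-child(N)). The class name SectorSelectors and the choice of rows are illustrative assumptions; the prefix, middle and suffix are taken verbatim from the selectors above.

```java
import java.util.ArrayList;
import java.util.List;

class SectorSelectors {
    // Fixed parts of the selector copied from Firefox; only the sector column
    // and the row number change between cells.
    static final String PREFIX = "#pd_foto_fondo > table:nth-child(5) > tbody:nth-child(1)"
        + " > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1)"
        + " > tr:nth-child(1) > td:nth-child(";
    static final String MIDDLE = ") > div:nth-child(1) > div:nth-child(3) > table:nth-child(1)"
        + " > tbody:nth-child(1) > tr:nth-child(";
    static final String SUFFIX = ") > td:nth-child(3)";

    /** Selector for one cell: sector is 1..3, row is the tr:nth-child index. */
    static String selector(int sector, int row) {
        return PREFIX + sector + MIDDLE + row + SUFFIX;
    }

    /** All selectors for sectors 1..3 and rows firstRow..lastRow, sector by sector. */
    static List<String> all(int firstRow, int lastRow) {
        List<String> out = new ArrayList<>();
        for (int sector = 1; sector <= 3; sector++) {         // BASE, INTERMEDIO, SUPERIOR
            for (int row = firstRow; row <= lastRow; row++) { // e.g. 4 = Sensación Térmica, 5 = Viento
                out.add(selector(sector, row));
            }
        }
        return out;
    }
}
```

Each generated string is then passed to doc.select(...) and read with text(), as above. all(4, 5) covers only the two rows discussed here; widen the range for the other rows you need.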