gishara gishara - 4 months ago 23
HTML Question

JSoup parsing HTML table in div

I am trying to crawl the following website:


http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget


I connect to the site and parse the HTML table as follows:

    Document doc = Jsoup
            .connect("http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget")
            .data("FLAT_TYPE", "02")
            .data("NME_NEWTOWN", "BD Bedok")
            .data("NME_STREET", "")
            .data("NUM_BLK_FROM", "")
            .data("NUM_BLK_TO", "")
            .data("dteRange", "12")
            .data("DTE_APPROVAL_FROM", "May 2015")
            .data("DTE_APPROVAL_TO", "May 2016")
            .data("AMT_RESALE_PRICE_FROM", "")
            .data("AMT_RESALE_PRICE_TO", "")
            .data("Process", "continue")
            .cookies(cookies)
            .timeout(0)
            .post();

    Element table = doc.getElementsByTag("table").first();


I also tried the following, but the table was still null:

    Element tableBody = doc.select("div[class=content]").select("table").first();


However, the table is always empty. Could someone please tell me what I am doing wrong?
Thanks in advance.

Answer

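You must add another parameter to your request: the hidden input field from the search form. Its name and value have to be read from the form page first, then sent along with the other fields and the cookies from that first response.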
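
To illustrate why a missing field matters, here is a minimal sketch (not part of the original answer) of how form fields are typically serialized into a POST body as `application/x-www-form-urlencoded` text. It uses only the JDK (Java 10 or later) and is only an approximation of what an HTTP client sends. The class name `FormBody`, the field name `hiddenFlag`, and the value `abc123` are hypothetical placeholders; the real name and value must be read from the form page.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FormBody {

    // Serializes the fields as name=value pairs joined by '&', keeping insertion order.
    public static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("FLAT_TYPE", "02");
        fields.put("Process", "continue");
        // Hypothetical hidden field; without it the body is simply one pair shorter.
        fields.put("hiddenFlag", "abc123");
        System.out.println(encode(fields));
    }
}
```

If the server expects the hidden pair and it is absent from the body, it may treat the request as invalid and return a page without the result table.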

Working code:

    import java.io.IOException;
    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    try {

        String url = "http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget";
        String userAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko)" +
                " Chrome/33.0.1750.152 Safari/537.36";

        // Step 1: GET the search form to obtain the cookies and the hidden field.
        Connection.Response response = Jsoup
                .connect(url)
                .userAgent(userAgent)
                .ignoreHttpErrors(true)
                .method(Connection.Method.GET)
                .execute();

        Document responseDocument = Jsoup.parse(response.body());

        // The last hidden input inside div.row is the extra parameter to send.
        Element rtisEnqFlagID = responseDocument.select("div.row input[type=hidden]").last();
        if (rtisEnqFlagID == null) {
            throw new IOException("Hidden form field not found in the search page");
        }
        String name = rtisEnqFlagID.attr("name");
        String value = rtisEnqFlagID.attr("value");

        // Step 2: POST the search with the hidden field and the cookies from step 1.
        Document document = Jsoup.connect(url)
                .userAgent(userAgent)
                .data("FLAT_TYPE", "02")
                .data("NME_NEWTOWN", "BD      Bedok")
                .data("NME_STREET", "")
                .data("NUM_BLK_FROM", "")
                .data("NUM_BLK_TO", "")
                .data("dteRange", "12")
                .data("DTE_APPROVAL_FROM", "May 2015")
                .data("DTE_APPROVAL_TO", "May 2016")
                .data("AMT_RESALE_PRICE_FROM", "")
                .data("AMT_RESALE_PRICE_TO", "")
                .data("Process", "continue")
                .data(name, value)
                .cookies(response.cookies())
                .post();

        // Each result is rendered as its own table inside div.content.
        Elements tableBody = document.select("div.content table");

        for (Element table : tableBody) {
            System.out.println(table);
        }

    } catch (IOException e) {
        e.printStackTrace();
    }

Output:

<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>514</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>07 to 09</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1979</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$240,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Jun 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>101</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>07 to 09</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1978</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$240,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Nov 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>113</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>10 to 12</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>44.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1978</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$244,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Mar 2016</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>535</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>01 to 03</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$250,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Jan 2016</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>534</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>04 to 06</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$248,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Nov 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>535</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>10 to 12</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$230,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Nov 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>535</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>04 to 06</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$246,500.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Oct 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>541</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>10 to 12</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1985</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$238,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Jul 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>620</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>07 to 09</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$250,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Mar 2016</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>618</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>04 to 06</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$250,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Feb 2016</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>620</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>01 to 03</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$245,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>May 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>38</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>07 to 09</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>44.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1978</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$253,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>May 2015</span></td> 
  </tr> 
 </tbody>
</table>
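
The cells printed above are plain strings. As a follow-up sketch (not part of the original answer), they could be converted to typed values with the JDK alone. The class and method names are hypothetical, and the formats are assumed from the sample output above (prices like `$240,000.00`, dates like `Jun 2015`).

```java
import java.math.BigDecimal;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class ResaleCells {

    // English month abbreviation followed by a four-digit year, e.g. "Jun 2015".
    private static final DateTimeFormatter MONTH_YEAR =
            DateTimeFormatter.ofPattern("MMM yyyy", Locale.ENGLISH);

    // Drops the currency sign and thousands separators before parsing.
    public static BigDecimal parsePrice(String cell) {
        return new BigDecimal(cell.trim().replace("$", "").replace(",", ""));
    }

    // Parses a "Resale Registration Date" cell into a year-month value.
    public static YearMonth parseMonth(String cell) {
        return YearMonth.parse(cell.trim(), MONTH_YEAR);
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("$240,000.00"));
        System.out.println(parseMonth("Jun 2015"));
    }
}
```

With jsoup, the input strings would come from the text of each `td` cell; a cell in an unexpected format makes either method throw, so real input may need validation first.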