Abhishek Abhishek - 1 month ago 18
CSS Question

Jsoup scraping text from children of div

I am trying to extract a review of the product at this link (Moto X) using Jsoup, but it throws a NullPointerException. I also want to extract the text that is shown after clicking the review's "Read More" link.

import java.io.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class JSoupEx
{
    public static void main(String[] args) throws IOException
    {
        Document doc = Jsoup.connect("https://www.flipkart.com/moto-x-play-with-turbo-charger-white-16-gb/product-reviews/itmefzwvdejejvth?pid=MOBEFM5HAFRNSJJA").get();
        Element ele = doc.select("div[class=qwjRop] > div").first();
        System.out.println(ele.text());
    }
}


Any solutions?

Answer

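The reviews are most likely not part of the HTML that Jsoup downloads: the page appears to load them afterwards with JavaScript, which Jsoup does not execute. The selector then matches nothing, first() returns null, and ele.text() throws the NullPointerException. As gherkin suggested, the network tab of the browser's developer tools shows the request that returns the reviews (in JSON format):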

https://www.flipkart.com/api/3/product/reviews?productId=MOBEFM5HAFRNSJJA&count=15&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL&start=0

With a JSON parser such as JSON.simple, we can extract information such as the review author, the helpfulness count, and the text.

Example Code

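import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

// the statements below belong inside a method, e.g. main
String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36";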
String reviewApiCall = "https://www.flipkart.com/api/3/product/reviews?productId=MOBEFM5HAFRNSJJA&count=15&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL&start=";
String xUserAgent = userAgent + " FKUA/website/41/website/Desktop";
String referer = "https://www.flipkart.com/moto-x-play-with-turbo-charger-white-16-gb/product-reviews/itmefzwvdejejvth?pid=MOBEFM5HAFRNSJJA";
String host = "www.flipkart.com";
int numberOfPages = 2; // first two pages of results will be fetched

try {
    // loop for multiple review pages
    for (int i = 0; i < numberOfPages; i++) {
        // query reviews; start advances by 15 per page to match count=15 in the URL
        Response response = Jsoup.connect(reviewApiCall + (i * 15))
                .userAgent(userAgent).referrer(referer).timeout(5000)
                .header("x-user-agent", xUserAgent).header("host", host)
                .ignoreContentType(true).execute();

        System.out.println("Response in JSON format:\n\t" + response.body() + "\n");

        // parse json response
        JSONObject jsonObject = (JSONObject) new JSONParser().parse(response.body());
        jsonObject = (JSONObject) jsonObject.get("RESPONSE");
        JSONArray jsonArray = (JSONArray) jsonObject.get("data");

        for (Object object : jsonArray) {
            jsonObject = (JSONObject) object;
            jsonObject = (JSONObject) jsonObject.get("value");
            System.out.println("Author: " + jsonObject.get("author") + "\thelpful: "
                    + jsonObject.get("helpfulCount") + "\n\t"
                    + jsonObject.get("text").toString().replace("\n", "\n\t") + "\n");
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}

Output

Response in JSON format:
    {"CACHE_INVALIDATION_TTL":"132568825671","REQUEST":null,"REQUEST-ID": [...] }

Author: Flipkart Customer   helpful: 140
    A great phone at an affordable price with
    -an outstanding camera
    -great battery life
    -an excellent display
    -premium looks
     the flipkart delivery was also fast and perfect.

Author: Vaibhav Yadav   helpful: 518
    I m writing this review after using 2 months..
    First of all ..I must say this is one of the best product ..camera quality is best in natural lights or daytime..but in low light and in the night..camera quality is not so good but it's ok..
    It has good battery backup ..last one day on 3g usage ..while using 4g ..it lasts for about 10-12 hour..
    Turbo charges is good..although ..my charger is not working..
    Only problem in this phone is ..while charging..this phone heats a lot..this may b becoz of turbo charger..if u r using other charger than it does not heat..

Author: KAPIL CHOPRA    helpful: 9
[...]

Note: output truncated ([...])
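
The example code pages through the reviews by appending i*15 to the start parameter, with 15 matching count=15 in the URL. That offset arithmetic can be pulled into a small helper, sketched below. The class ReviewPageUrls and its methods are hypothetical names introduced only for illustration. The sketch also assumes that the endpoint accepts count and start at the end of the query string and that start is the zero-based offset of the first review on a page, as the loop in the answer implies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the original answer): builds paged review URLs.
class ReviewPageUrls {
    // Endpoint taken from the answer; count and start are appended per page.
    static final String BASE = "https://www.flipkart.com/api/3/product/reviews"
            + "?productId=MOBEFM5HAFRNSJJA&ratings=ALL&reviewerType=ALL&sortOrder=MOST_HELPFUL";

    // URL for a zero-based page index; start = pageIndex * pageSize (assumed offset semantics).
    static String pageUrl(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        return BASE + "&count=" + pageSize + "&start=" + (pageIndex * pageSize);
    }

    // URLs for the first numberOfPages pages, in order.
    static List<String> pageUrls(int numberOfPages, int pageSize) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < numberOfPages; i++) {
            urls.add(pageUrl(i, pageSize));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Same paging as the answer: two pages of 15 reviews each.
        for (String url : pageUrls(2, 15)) {
            System.out.println(url);
        }
    }
}
```

Each generated URL could then be passed to Jsoup.connect(...) in place of reviewApiCall+(i*15), so the page size is defined in one place instead of being repeated as a literal.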
