User124235 - 27 days ago 7
Java Question

Jsoup imdb rating

I wrote a program that reads the names and ratings of the top 250 movies on IMDb and prints the mean rating. Here is the program:

import java.io.IOException;

import org.jsoup.*;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class da {

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            Document doc=Jsoup.connect("http://www.imdb.com/chart/top").get();
            Elements e=doc.getElementsByClass("titleColumn");
            Elements t=doc.getElementsByClass("imdbRating");
            float suma=0;
            for(int i=0;i<e.size();i++)
                suma=suma+Float.parseFloat(t.get(i).text());

            System.out.println(suma/250);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

}


My question is why 't' only needs "imdbRating": when I look at the page's HTML, the element that holds the rating has the class "ratingColumn imdbRating". (I wrote it this way by mistake, and I don't understand why it works this way and not the other way.)

Answer Source

You don't need the element e in this program. The titleColumn cells on the page only contain the movie titles, and since you only need the ratings, they are unnecessary. You can just use the t elements, which I renamed to ratings. I also cleaned up your code a little:

import java.io.IOException;

import org.jsoup.*;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class da {

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {

            Document doc = Jsoup.connect("http://www.imdb.com/chart/top").get();
            Elements ratings = doc.select(".ratingColumn.imdbRating");

            float suma = 0;

            for(int i = 0; i < ratings.size(); i++)
                suma = suma + Float.parseFloat(ratings.get(i).child(0).text());

            // Divide by the number of ratings actually found rather than a hard-coded 250
            System.out.println(suma / ratings.size());


        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }


    }

}

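EDIT: "imdbRating" works on its own because the class attribute is a space-separated list of class names. class="ratingColumn imdbRating" gives the element two classes, ratingColumn and imdbRating, and getElementsByClass matches any element that has the given class among them. To require several classes at once, use doc#select and pass it a CSS query like the one above.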
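
To make the class matching concrete, here is a minimal sketch in plain Java of how a class attribute such as "ratingColumn imdbRating" can be treated as a list of separate class names. ClassTokenDemo, hasClass and hasAllClasses are hypothetical names made up for this illustration; this is not jsoup's actual implementation, and it leaves aside details such as case handling.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not jsoup's implementation.
class ClassTokenDemo {

    // True if the whitespace-separated class attribute contains className as a whole token.
    static boolean hasClass(String classAttribute, String className) {
        List<String> tokens = Arrays.asList(classAttribute.trim().split("\\s+"));
        return tokens.contains(className);
    }

    // True only if every requested class name is present,
    // which is the idea behind a CSS query like ".ratingColumn.imdbRating".
    static boolean hasAllClasses(String classAttribute, String... classNames) {
        for (String name : classNames) {
            if (!hasClass(classAttribute, name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String attr = "ratingColumn imdbRating"; // class attribute value quoted in the question

        System.out.println(hasClass(attr, "imdbRating"));                      // true
        System.out.println(hasClass(attr, "ratingColumn"));                    // true
        System.out.println(hasClass(attr, "rating"));                          // false: not a whole token
        System.out.println(hasAllClasses(attr, "ratingColumn", "imdbRating")); // true
    }
}
```

Under this reading, asking for "imdbRating" alone succeeds because it is one of the two tokens, while the CSS query with both class names simply narrows the match to elements that carry both.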