BarneyL. BarStin - 3 months ago 20
HTTP Question

Why can't Jsoup connect to some URLs?

In my small app I'm using the Jsoup library to download HTML, but my code doesn't work with some URLs. This is my code:

public static void main(String[] args) {

    String html = null;

    // Download the HTML
    String url = "http://www.opposingviews.com";
    Connection conn = Jsoup.connect(url);
    try {
        Response resp = conn.execute();
        if (resp.statusCode() != 200) {
            System.out.println("Error: " + resp.statusCode());
        } else {
            System.out.println(Thread.currentThread().getName() + " is downloading " + url);
            //html = conn.get().html();
        }
    } catch (IOException e) {
        System.out.println(e.getStackTrace());
        System.out.println(Thread.currentThread().getName() + " cannot connect to " + url);
        System.out.println("Cannot connect");
    }
}


It doesn't work with some URLs, such as:

http://www.topix.com
http://www.wittyfeed.com
http://www.wittyfeed.com...


But it works with others, such as:
http://www.google.com, http://www.amazon.es
...

The error is:

org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590),
org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540),
org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227),
Practica1.prueba.main(prueba.java:34)


What could be causing this behavior?

Answer

First, print the exception itself when the connection fails, not just its stack trace. For http://www.topix.com the exception is:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://www.topix.com
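
As a small illustration of why printing the exception matters, here is a sketch that uses only the JDK (no Jsoup, no network). The class name `ExceptionPrintingDemo` and the sample `IOException` are made up for the example and merely stand in for whatever Jsoup throws. It compares what you get from passing `e.getStackTrace()` to `println` with what you get from printing `e` itself.

```java
import java.io.IOException;

class ExceptionPrintingDemo {

    // println(e.getStackTrace()) goes through String.valueOf(Object),
    // so for an array it yields the array's type and identity hash, not the message.
    static String viaStackTraceArray(Exception e) {
        return String.valueOf((Object) e.getStackTrace());
    }

    // println(e) uses e.toString(): the exception class name followed by its message.
    static String viaToString(Exception e) {
        return e.toString();
    }

    public static void main(String[] args) {
        // Hypothetical exception standing in for the one thrown by the HTTP call.
        Exception e = new IOException("HTTP error fetching URL. Status=403");

        System.out.println(viaStackTraceArray(e)); // array identity, no status code
        System.out.println(viaToString(e));        // class name plus message
    }
}
```

Only the second form shows the message, which is where the status code appears. Calling `e.printStackTrace()` would print both the message and the frames.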

Status 403 means the server is refusing the request, so try setting a browser-like User-Agent on the connection, like this:

Connection conn = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");

Here is your code with that change:

import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;


public class JsonExample {

    public static void main(String[] args) {

        String html = null;

        // Download the HTML
        String url = "http://www.topix.com";
        Connection conn = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");
        try {
            // By default execute() throws HttpStatusException (an IOException) on HTTP error statuses
            Response resp = conn.execute();
            if (resp.statusCode() != 200) {
                System.out.println("Error: " + resp.statusCode());
            } else {
                System.out.println(Thread.currentThread().getName() + " is downloading " + url);
                //html = conn.get().html();
            }
        } catch (IOException e) {
            // Print the exception itself so the status code and URL are visible
            System.out.println(Thread.currentThread().getName() + " cannot connect to " + url + ": " + e);
            System.out.println("Cannot connect");
        }
    }
}
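
To see the effect of the User-Agent header without depending on any external site, here is a self-contained sketch using only the JDK. It is not Jsoup code: it uses `HttpURLConnection` as the client and the JDK's built-in `com.sun.net.httpserver.HttpServer` as a local test server. The rule that the server returns 403 unless the User-Agent starts with `Mozilla/` is an assumption made for the demo; real sites apply their own, unknown rules. `UserAgentDemo`, `startPickyServer`, `fetchStatus` and `runDemo` are hypothetical names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

class UserAgentDemo {

    // Starts a local server on a free port. Assumed policy for the demo:
    // answer 403 unless the client sends a User-Agent starting with "Mozilla/".
    static HttpServer startPickyServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean looksLikeBrowser = ua != null && ua.startsWith("Mozilla/");
            // -1 means the response carries no body
            exchange.sendResponseHeaders(looksLikeBrowser ? 200 : 403, -1);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Requests "/" from the local server and returns the HTTP status code.
    // A null userAgent leaves the JDK's default User-Agent in place.
    static int fetchStatus(int port, String userAgent) throws IOException {
        URI uri = URI.create("http://127.0.0.1:" + port + "/");
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection(Proxy.NO_PROXY);
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Returns { status without a custom User-Agent, status with a browser-like one }.
    static int[] runDemo() {
        try {
            HttpServer server = startPickyServer();
            try {
                int port = server.getAddress().getPort();
                int withoutUa = fetchStatus(port, null);
                int withUa = fetchStatus(port, "Mozilla/5.0 (Windows NT 6.1)");
                return new int[] { withoutUa, withUa };
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] statuses = runDemo();
        System.out.println("Default User-Agent:      " + statuses[0]);
        System.out.println("Browser-like User-Agent: " + statuses[1]);
    }
}
```

Under the demo's assumed policy, the first request should be refused and the second accepted, which mirrors the difference that `userAgent(...)` makes in the Jsoup code above.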