OsmaK - 6 months ago
Java Question

Java - Get data from website doesn't work (403 Error)

Hi, I'm trying to write a program that gets data from "http://www.pccomponentes.com/intel_core_i5_6600_3_3ghz_box.html", but I get an error (HTTP 403).

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

public class Test {

    public static void main(String[] args) throws IOException {
        URL urlObject;
        String codigo;
        try {
            urlObject = new URL("http://www.pccomponentes.com/intel_core_i5_6600_3_3ghz_box.html");
            InputStreamReader isr = new InputStreamReader(urlObject.openStream());
            BufferedReader br = new BufferedReader(isr);
            while ((codigo = br.readLine()) != null)
                System.out.println(codigo);
            br.close();
        }
        catch (MalformedURLException e) {
            e.printStackTrace();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }

}


When I run the program, this error appears:

java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.pccomponentes.com/intel_core_i5_6600_3_3ghz_box.html
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at java.net.URL.openStream(Unknown Source)
    at Test.Test.main(Test.java:17)


The purpose of the program is to get the price of the product and print it with System.out.println. How can I do that?

Answer

I have just tested with curl and it works, but if I send the User-Agent that Java uses by default (a string of the form Java/<version>), I get the same HTTP 403 error. It seems that the webmaster of this website doesn't like Java :-)

To work around this, simply set another User-Agent by doing this:

// Also requires: import java.net.URLConnection;
urlObject = new URL("http://www.pccomponentes.com/intel_core_i5_6600_3_3ghz_box.html");
URLConnection c = urlObject.openConnection();
// Must be set before the stream is opened, otherwise the default User-Agent is sent
c.setRequestProperty("User-Agent", "<put the user agent of your choice here>");
InputStreamReader isr = new InputStreamReader(c.getInputStream());

If you don't know which User-Agent to use, use the one your browser sends. You can find it in the request headers shown in the Network tab of the browser's developer tools.
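
For completeness, here is a minimal sketch of the workaround as a whole program. The class name UserAgentFetch, the fetch helper, the "Mozilla/5.0" placeholder and the choice of UTF-8 for decoding are my own assumptions for illustration, not something the site documents, and I can't promise which User-Agent values the site will accept. The helper checks the status code itself, so you get a readable message instead of the bare exception from openStream().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UserAgentFetch {

    // Hypothetical helper: downloads a page while sending the given User-Agent
    // instead of Java's default one.
    static String fetch(String address, String userAgent) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection();
        // Headers must be set before the request is actually sent.
        connection.setRequestProperty("User-Agent", userAgent);

        // getResponseCode() sends the request and lets us report 403 and friends clearly.
        int status = connection.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            throw new IOException("Unexpected HTTP status " + status + " for " + address);
        }

        StringBuilder page = new StringBuilder();
        // Assumption: the page is UTF-8; adjust if the site uses another charset.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder value: pass your browser's User-Agent as the first argument.
        String userAgent = args.length > 0 ? args[0] : "Mozilla/5.0";
        String html = fetch(
                "http://www.pccomponentes.com/intel_core_i5_6600_3_3ghz_box.html",
                userAgent);
        System.out.print(html);
    }
}
```

Once the page comes back as a String, getting the price out of it is a separate parsing step that depends on the page's markup, which I have not looked at here.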
