Dave Dave - 7 months ago 25
Java Question

Java JSoup error fetching URL

I am creating an application that fetches values from a specific website and prints them to the console. The value is in a span, and I am using JSoup. But I am getting the error "Error fetching URL". Here is my Java code:

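import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TestSl {

    public static void main(String[] args) throws IOException {
        // Fetch and parse the page
        Document doc = Jsoup.connect("http://stackoverflow.com/questions/11970938/java-html-parser-to-extract-specific-data").get();
        // Select every span whose class attribute is exactly "hidden-text"
        Elements spans = doc.select("span[class=hidden-text]");

        // Print the text of each matching span
        for (Element span : spans) {
            System.out.println(span.text());
        }
    }
}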

And here is the error on Console:


Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://stackoverflow.com/questions/11970938/java-html-parser-to-extract-specific-data
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:216)
    at TestSl.main(TestSl.java:19)


I am out of options; I have tried everything I can think of. If possible, please post the complete code so that I can follow the answer without getting confused. :)

Answer

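A 403 status means the server refused the request. Some sites reject requests whose User-Agent header does not look like a browser's, so set the user-agent on the connection: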

.userAgent("Mozilla")

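Example (this replaces the body of main in the question's code):

Document document = Jsoup.connect("http://stackoverflow.com/questions/11970938/java-html-parser-to-extract-specific-data")
        .userAgent("Mozilla") // send a browser-like User-Agent header
        .get();
// span.hidden-text matches spans that have the class hidden-text
Elements elements = document.select("span.hidden-text");
for (Element element : elements) {
    System.out.println(element.text());
}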
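
To make it concrete what setting a user-agent changes, here is a minimal sketch that builds the same GET request with the JDK's own java.net.http API instead of JSoup and reads back the header it would carry. Nothing is sent over the network. It assumes Java 11 or later; the class name UserAgentDemo and the helpers buildRequest and userAgentOf are hypothetical names made up for this illustration, not part of JSoup or of the answer above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class UserAgentDemo {

    // Builds a GET request that carries the given User-Agent header.
    // The request is only constructed here; it is never sent.
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    // Returns the User-Agent value set on the request, or "(none)" if absent.
    static String userAgentOf(HttpRequest request) {
        return request.headers().firstValue("User-Agent").orElse("(none)");
    }

    public static void main(String[] args) {
        String url = "http://stackoverflow.com/questions/11970938/java-html-parser-to-extract-specific-data";
        HttpRequest request = buildRequest(url, "Mozilla");
        System.out.println("User-Agent: " + userAgentOf(request));
    }
}
```

In JSoup, the .userAgent("Mozilla") call plays the same role as the header(...) line here: it attaches a User-Agent request header before the page is fetched.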

source: http://stackoverflow.com/a/7523425/1048340


Perhaps this is related: http://meta.stackexchange.com/questions/277369/a-terms-of-service-update-restricting-companies-that-scrape-your-profile-informa
