Niminim - 3 months ago
HTML Question

Java - parsing source code - illegal argument exception

I fetched the page source (HTML) from Yahoo Finance and had no problem reading the data. I used this static method:

public static String readYahooHtml(String symbol) {
    In page = new In("http://finance.yahoo.com/quote/" + symbol);
    String html = page.readAll();
    if (html.contains("<title></title>")) return null;
    else return html;
}


Page example https://finance.yahoo.com/quote/AES

When I try to do the same with a page from GuruFocus:

// Given symbol, get HTML
public static String readGuruFocusHtml(String symbol) {
    In page = new In("http://www.gurufocus.com/stock/" + symbol);
    String htmlGF = page.readAll();
    if (htmlGF.contains("<title></title>")) return null;
    else return htmlGF;
}


I get the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: Could not open http://www.gurufocus.com/

Page example - http://www.gurufocus.com/stock/AES

Why is that? Is this page served differently in some way? Is there a way for a site to block access to its page source?

Edit: There's no need to debug the code, it's here just so you can see that this code works.

The entire stack trace:
Exception in thread "main" java.lang.IllegalArgumentException: Could not open http://www.gurufocus.com/
at Algorithms.Tools.In.<init>(In.java:186)
at Investing.TestData.main(TestData.java:16)

Answer

Your problem is that the server is returning HTTP 403 (Forbidden) ;) You can try adding a request property to your connection. I don't know where you open the connection, though; maybe inside the In object?

Something like this:

URLConnection connection = new URL("http://www.gurufocus.com/stock/" + symbol).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();
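To confirm that the server really answers 403 rather than failing in some other way, you can ask for the status code directly. This is only a minimal sketch: the class StatusCheck and the helpers statusOf and explain are names made up for illustration, and what the site returns today may differ from what I saw.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class StatusCheck {
    // Hypothetical helper: fetch only the HTTP status code for a URL.
    // Pass null as userAgent to keep Java's default User-Agent header.
    static int statusOf(String address, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Hypothetical helper: turn a status code into a short diagnosis.
    static String explain(int status) {
        if (status == HttpURLConnection.HTTP_OK) return "OK";
        if (status == HttpURLConnection.HTTP_FORBIDDEN) return "Forbidden: the server refused this client";
        if (status == HttpURLConnection.HTTP_NOT_FOUND) return "Not found: check the symbol";
        return "HTTP " + status;
    }

    public static void main(String[] args) throws IOException {
        String address = "http://www.gurufocus.com/stock/AES";
        String browserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
                + "(KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11";
        // Compare the default Java client with a browser-like client.
        System.out.println("default agent: " + explain(statusOf(address, null)));
        System.out.println("browser agent: " + explain(statusOf(address, browserAgent)));
    }
}
```

If the two lines differ, the User-Agent header is what the server is reacting to.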

OK, I tried it, and with this request property the request succeeds. Here is the complete code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public static void main(String[] args) {
    try {
        URL page = new URL("http://www.gurufocus.com/stock/AES");
        URLConnection connection = page.openConnection();
        // Send a browser-like User-Agent; without it the server answered 403.
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        StringBuilder a = new StringBuilder();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                connection.getInputStream(), "UTF-8"))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null)
                a.append(inputLine).append('\n'); // readLine() strips the line terminator
        }
        System.out.println(a.toString());
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
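If you would rather keep the shape of the method from the question, the same idea might look roughly like the sketch below. I am assuming you can bypass your In class here; GuruFocusReader, urlFor and readAll are hypothetical names, and returning null on any failure is just one possible choice that mirrors the original null return.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class GuruFocusReader {
    // Browser-like User-Agent taken from the answer above.
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
            + "(KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11";

    // Hypothetical helper: build the page URL for a ticker symbol.
    static String urlFor(String symbol) {
        return "http://www.gurufocus.com/stock/" + symbol;
    }

    // Hypothetical helper: read a whole stream as UTF-8 text and close it.
    // InputStream.readAllBytes() needs Java 9 or newer.
    static String readAll(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Given symbol, get HTML; returns null if the page cannot be fetched.
    public static String readGuruFocusHtml(String symbol) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(urlFor(symbol)).openConnection();
            conn.setRequestProperty("User-Agent", USER_AGENT);
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                return null; // e.g. 403 when the server rejects the client
            }
            String html = readAll(conn.getInputStream());
            return html.contains("<title></title>") ? null : html;
        } catch (IOException | UncheckedIOException e) {
            return null;
        }
    }
}
```

Usage would then be the same as before: String htmlGF = GuruFocusReader.readGuruFocusHtml("AES");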

Just to complete the answer, here is a good article about web server security and how to block bots. In this case, your call was treated as a bot ;)
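To illustrate what such blocking can look like on the server side, here is a toy sketch using the HTTP server that ships with the JDK. I do not know how GuruFocus actually decides; the rule in looksLikeBot is purely an assumption for the demo (a missing User-Agent, or Java's default one, which has the form Java/<version>), and the class name and port 8080 are made up.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class BotBlockingServer {
    // Hypothetical rule: treat a missing User-Agent, or the default one sent
    // by Java's URLConnection ("Java/<version>"), as a bot.
    static boolean looksLikeBot(String userAgent) {
        return userAgent == null || userAgent.isEmpty() || userAgent.startsWith("Java/");
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            if (looksLikeBot(agent)) {
                // 403 Forbidden; a length of -1 means no response body.
                exchange.sendResponseHeaders(403, -1);
            } else {
                byte[] body = "<html><title>ok</title></html>"
                        .getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
            }
            exchange.close();
        });
        server.start(); // serves until the process is stopped
    }
}
```

A check like this is easy to get around, as the User-Agent fix above shows, so real sites usually combine it with other measures.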