Hamza Rabbani - 2 months ago
JSON Question

Jsoup returning different output from web browser

I have this API to parse.


https://data.studentedge.com.au/api/comments/getpage?page=1&sort=Oldest&url=%2Fforums%2Fdetails%2Fany-surfers-out-there


When I open it in a web browser (with or without JavaScript enabled),
it returns this:

{"Items":[{"CommentBody":"<p>I ride a 5'9 and am from the mid north coast</p>\r\n\r\n","MemberName":"Jack F","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/21450f07-ddcc-4f19-8cba-296f22e84ee1.jpeg","PostDate":"2016-09-03T01:38:38+00:00","CommentId":"f1c50066-69b3-4a92-bc0c-a676001b174f","ParentId":null,"PosterId":"28936bc3-f705-45d6-8f94-a5b0004585c6","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":"<p>I surf everyday on Google Chrome - SA here ;)</p>\r\n\r\n","MemberName":"Bryan A","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/02a713ee-2ca1-4029-85f8-314878386621.png","PostDate":"2016-09-09T10:36:47+00:00","CommentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","ParentId":null,"PosterId":"5192fcf7-703b-4f78-b6fd-a3a000427119","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":1,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":"<p>Same... Chrome's the only thing I surf....</p>\r\n<p>My mate goes 5'10&quot; and also snowboards...</p>\r\n\r\n","MemberName":"Sandy S","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/6542e863-3f04-496d-b4aa-d6adcb16ca39.jpg","PostDate":"2016-09-09T10:51:40+00:00","CommentId":"9479d9f2-845a-48a5-8d28-a67c00b2fcd7","ParentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","PosterId":"165dc3d0-9e3d-484f-b5be-a3a100cfc691","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false}],"PageNumber":1,"Order":"Oldest"}


That is valid JSON.
But when I fetch the same URL with Jsoup, it returns this:

<html> <head></head> <body> {"Items":[{"CommentBody":" <p>I ride a 5'9 and am from the mid north coast</p>\r\n\r\n","MemberName":"Jack F","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/21450f07-ddcc-4f19-8cba-296f22e84ee1.jpeg","PostDate":"2016-09-03T01:38:38+00:00","CommentId":"f1c50066-69b3-4a92-bc0c-a676001b174f","ParentId":null,"PosterId":"28936bc3-f705-45d6-8f94-a5b0004585c6","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":" <p>I surf everyday on Google Chrome - SA here ;)</p>\r\n\r\n","MemberName":"Bryan A","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/02a713ee-2ca1-4029-85f8-314878386621.png","PostDate":"2016-09-09T10:36:47+00:00","CommentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","ParentId":null,"PosterId":"5192fcf7-703b-4f78-b6fd-a3a000427119","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":1,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":" <p>Same... Chrome's the only thing I surf....</p>\r\n <p>My mate goes 5'10" and also snowboards...</p>\r\n\r\n","MemberName":"Sandy S","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/6542e863-3f04-496d-b4aa-d6adcb16ca39.jpg","PostDate":"2016-09-09T10:51:40+00:00","CommentId":"9479d9f2-845a-48a5-8d28-a67c00b2fcd7","ParentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","PosterId":"165dc3d0-9e3d-484f-b5be-a3a100cfc691","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false}],"PageNumber":1,"Order":"Oldest"} </body></html>


Jsoup code:

    Document doc = Jsoup.connect(baseUrl + keyword)
            .followRedirects(true)
            .ignoreContentType(true)
            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0")
            .header("Accept-Encoding", "gzip, deflate")
            .header("Accept-Language", "en-US,en;q=0.5")
            .header("Host", "data.studentedge.com.au")
            .header("Origin", "https://studentedge.com.au")
            .header("Referer", "https://studentedge.com.au/forums/details/any-surfers-out-there")
            .get();
    String result = doc.html();


Note: if I use doc.text() instead, it somehow breaks the JSON.

Answer

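get() parses the response as an HTML document, which is why the JSON ends up wrapped in <html> and <body> tags and the markup inside the strings gets normalized. Use execute() and body() to get the raw response body instead: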

    String result = Jsoup.connect(baseUrl + keyword)
            .followRedirects(true)
            .ignoreContentType(true)
            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0")
            .header("Accept-Encoding", "gzip, deflate")
            .header("Accept-Language", "en-US,en;q=0.5")
            .header("Host", "data.studentedge.com.au")
            .header("Origin", "https://studentedge.com.au")
            .header("Referer", "https://studentedge.com.au/forums/details/any-surfers-out-there")
            .execute().body();
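
If Jsoup is only being used to download the JSON and not to parse any HTML, the same raw body can also be read with the JDK alone. The following is a minimal sketch of that alternative, not part of the answer above: the class name RawBodyFetch and the helpers readAll and fetch are hypothetical, and it assumes the response is UTF-8 and uncompressed (it deliberately does not send Accept-Encoding, since HttpURLConnection does not decompress gzip on its own).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class RawBodyFetch {

    // Reads the stream to the end and decodes the bytes as UTF-8 (assumed charset).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Fetches the URL and returns the body untouched: no HTML parsing,
    // so nothing is wrapped in <html>/<body> and no entities are rewritten.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RawBodyFetch <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Either way, the returned string can be handed directly to a JSON parser; the point is only that nothing should run it through an HTML parser first.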