Alexiy Alexiy - 7 months ago 50
Groovy Question

Unable to parse webpage with frameset

I'm trying to parse Groovydoc, but Jsoup doesn't find the frameset in which everything is contained.

Connection connection = Jsoup.connect('')
Document document = connection.get()
Elements element = document.getElementsByTag('frameset')
element.each { println it }


If you check the result returned by connection.get(), you can see that it contains no frameset tag:

println document

Now, if you open the site in a browser and use the developer tools to look at its HTML, you can see that the frameset you are looking for is a child of an iframe loaded from a separate source URL.

Just load the iframe URL with Jsoup to get the frameset:

Connection connection = Jsoup.connect('')
Document document = connection.get()
Elements element = document.getElementsByTag('frameset')
element.each { println it }

Or, if you do not want to hardcode the iframe source URL, look at this SO answer on how to get the source URL.
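
As a rough sketch of that second approach, the iframe's src attribute can be read from the outer document and resolved to an absolute URL before loading it. The base URL and the HTML string below are made-up placeholders, not the real Groovydoc page, and the Jsoup version in @Grab is just one that happens to exist; the inline markup only keeps the example runnable without a network call.

```groovy
@Grab('org.jsoup:jsoup:1.17.2')
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.nodes.Element

// Hypothetical outer page standing in for the real one (URL and markup are assumptions).
String baseUrl = 'http://example.com/docs/'
String outerHtml = '''
<html><body>
  <iframe src="api/index.html"></iframe>
</body></html>
'''

// Supplying a base URI lets Jsoup resolve relative src attributes.
Document outer = Jsoup.parse(outerHtml, baseUrl)

// Take the first iframe that declares a src attribute (null if there is none).
Element iframe = outer.select('iframe[src]').first()

// absUrl resolves the attribute value against the document's base URI.
String iframeUrl = iframe?.absUrl('src')
println iframeUrl

// Against a live page, the frameset would then be fetched from that URL:
// Document inner = Jsoup.connect(iframeUrl).get()
// inner.getElementsByTag('frameset').each { println it }
```

With a real page, Jsoup.connect(...).get() sets the base URI from the request URL, so absUrl('src') should work the same way on the fetched document. If the page has several iframes, the selector would need to be narrowed to the one you want.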