Mauve Ranger - 3 months ago

What Java structure is best to hold HTML trees?

For fun, I'm writing a basic parser that finds data within an HTML document. I want to find the best structure to represent the branches of the parsed file.
My criterion for the "best structure" is this: I want to easily search for a tag's relative location and access its contents, for example "the second image tag after the third h3 tag in the body" or "the title tag in the header".

I expect to search the first level of tags for the tag I'm looking for, then move into the branch associated with that tag. That is the kind of structure I'm asking about, but if there is a better way to find relative locations in an HTML document, please explain.
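To illustrate the "search the first level, then move into that branch" approach, here is a minimal hand-rolled sketch. The names `HtmlNode`, `nthChild`, `nthAfter` and `sampleDocument` are hypothetical, and the sketch assumes the h3 and img tags are direct siblings under the body; real pages often nest images inside other elements, which this simple version does not handle.

```java
import java.util.ArrayList;
import java.util.List;

public class HtmlTreeSketch {

    // Hypothetical node type: a tag name, optional text, and an ordered list of child nodes.
    static final class HtmlNode {
        final String tag;
        final String text;
        final List<HtmlNode> children = new ArrayList<>();

        HtmlNode(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        // Appends a child and returns this node, so siblings can be added in a chain.
        HtmlNode add(HtmlNode child) {
            children.add(child);
            return this;
        }

        // Position in this node's child list of the n-th child (1-based) with the given tag, or -1.
        int indexOfNth(String childTag, int n) {
            int seen = 0;
            for (int i = 0; i < children.size(); i++) {
                if (children.get(i).tag.equals(childTag) && ++seen == n) {
                    return i;
                }
            }
            return -1;
        }

        // Search this level for the n-th child with the given tag and return that branch, or null.
        HtmlNode nthChild(String childTag, int n) {
            int i = indexOfNth(childTag, n);
            return i < 0 ? null : children.get(i);
        }

        // The n-th `wanted` child that comes after the m-th `anchor` child, or null.
        HtmlNode nthAfter(String anchor, int m, String wanted, int n) {
            int start = indexOfNth(anchor, m);
            if (start < 0) {
                return null;
            }
            int seen = 0;
            for (int i = start + 1; i < children.size(); i++) {
                if (children.get(i).tag.equals(wanted) && ++seen == n) {
                    return children.get(i);
                }
            }
            return null;
        }
    }

    // Builds a small sample tree: html > (head > title), (body > [h3, img, img] repeated 3 times).
    static HtmlNode sampleDocument() {
        HtmlNode body = new HtmlNode("body", null);
        for (int i = 1; i <= 3; i++) {
            body.add(new HtmlNode("h3", "Heading " + i))
                .add(new HtmlNode("img", "image " + i + "a"))
                .add(new HtmlNode("img", "image " + i + "b"));
        }
        HtmlNode head = new HtmlNode("head", null).add(new HtmlNode("title", "My page"));
        return new HtmlNode("html", null).add(head).add(body);
    }

    public static void main(String[] args) {
        HtmlNode html = sampleDocument();
        // "the title tag in the header"
        System.out.println(html.nthChild("head", 1).nthChild("title", 1).text);
        // "the second image tag after the third h3 tag in the body"
        System.out.println(html.nthChild("body", 1).nthAfter("h3", 3, "img", 2).text);
    }
}
```

Each step only scans one node's child list, so a relative query becomes a short chain of calls starting from the root.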

So that's the question. More generally, what kinds of Java structures can represent tree data?
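In general, an n-ary tree in Java is usually modelled as a node that holds a value, a reference to its parent, and an ordered `List` of children; the JDK also ships ready-made node types such as `org.w3c.dom.Node` and `javax.swing.tree.DefaultMutableTreeNode`. The following is a minimal generic sketch; the names `TreeNode`, `addChild` and `findAll` are hypothetical, and the search is an iterative depth-first (pre-order) traversal, which visits nodes in document order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

public class GenericTree {

    // A general-purpose n-ary tree node: a value, a parent link, and an ordered list of children.
    static final class TreeNode<T> {
        private final T value;
        private final List<TreeNode<T>> children = new ArrayList<>();
        private TreeNode<T> parent;

        TreeNode(T value) {
            this.value = value;
        }

        T value() { return value; }
        TreeNode<T> parent() { return parent; }
        List<TreeNode<T>> children() { return Collections.unmodifiableList(children); }

        // Creates a child, links it to this node, and returns the child so calls can chain downward.
        TreeNode<T> addChild(T childValue) {
            TreeNode<T> child = new TreeNode<>(childValue);
            child.parent = this;
            children.add(child);
            return child;
        }

        // Depth-first, pre-order search returning every matching node in document order.
        List<TreeNode<T>> findAll(Predicate<T> matches) {
            List<TreeNode<T>> result = new ArrayList<>();
            Deque<TreeNode<T>> stack = new ArrayDeque<>();
            stack.push(this);
            while (!stack.isEmpty()) {
                TreeNode<T> node = stack.pop();
                if (matches.test(node.value)) {
                    result.add(node);
                }
                // Push children in reverse so the leftmost child is visited first.
                for (int i = node.children.size() - 1; i >= 0; i--) {
                    stack.push(node.children.get(i));
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        TreeNode<String> html = new TreeNode<>("html");
        html.addChild("head").addChild("title");
        TreeNode<String> body = html.addChild("body");
        body.addChild("h3");
        body.addChild("p").addChild("img");
        body.addChild("img");

        System.out.println(html.findAll("img"::equals).size()); // 2
    }
}
```

Storing the parent link makes it cheap to walk back up from a match, while the ordered child list preserves sibling order, which is what relative queries like "the second X after the third Y" depend on.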


Don't reinvent the wheel; just use an HTML parser like Jsoup. You will then be able to get your tags with a CSS selector, using the method Element#select(cssQuery).

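Document doc = Jsoup.parse(file, encoding); // file: a java.io.File, encoding: e.g. "UTF-8"
Elements elements = doc.select("head > title"); // any CSS selector; here, the title tag inside head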