masa masa - 5 months ago 34

Getting access to the original HTML in HtmlUnit HtmlElement?

I am using HtmlUnit to read content from a web site.

Everything works perfectly to the point where I am reading the content with:

HtmlDivision div = page.getHtmlElementById("my-id");

Reading the text of that element returns the expected String object, but I want
to get the original HTML inside it as a String object. How can I do that?

I am not willing to change HtmlUnit to something else, as the web site expects
the client to run JavaScript, and HtmlUnit seems to be capable of doing what is
required.


If by original HTML you mean the markup as HtmlUnit has already parsed and reformatted it, then you can use div.asXml(). If you really are looking for the original HTML of that element exactly as the server sent it, you won't find a way to get it (at least up to v2.14).
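A minimal sketch of the asXml() route, assuming an HtmlUnit 2.x release with the com.gargoylesoftware.htmlunit packages. The URL is a placeholder and "my-id" is taken from the question. The output is HtmlUnit's serialization of the parsed DOM, after any JavaScript has run, so it will usually differ from the bytes the server sent.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlDivision;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class DivAsXml {
    public static void main(String[] args) throws Exception {
        // Assumption: WebClient is AutoCloseable (newer 2.x releases).
        // On older releases such as 2.14, call closeAllWindows() in a
        // finally block instead of using try-with-resources.
        try (WebClient webClient = new WebClient()) {
            // Placeholder URL, not from the original question.
            HtmlPage page = webClient.getPage("https://example.com/");

            // Throws ElementNotFoundException if no element has this id.
            HtmlDivision div = page.getHtmlElementById("my-id");

            // Serializes the element and its children from the parsed DOM.
            String markup = div.asXml();
            System.out.println(markup);
        }
    }
}
```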

As a workaround, you could get the whole text of the page exactly as the server sent it, as described in this answer: How to get the pure raw HTML of a page in HTMLUnit while ignoring JavaScript and CSS?
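Once you have the raw page text (in HtmlUnit, page.getWebResponse().getContentAsString() returns it), you still have to cut the element out yourself. The following is only a rough sketch of that step, not something HtmlUnit provides: RawDivExtractor and extractDiv are hypothetical names, and the approach assumes the id attribute is quoted and that no comment or script in the page contains text that looks like a div tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawDivExtractor {

    // Matches any opening or closing div tag, case-insensitively.
    private static final Pattern DIV_TAG =
            Pattern.compile("<(/?)div\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the raw text from the opening div tag carrying the given id
     * through its matching closing tag, or null if it cannot be found.
     */
    public static String extractDiv(String rawHtml, String id) {
        // Opening div tag whose quoted id attribute equals the given id.
        Pattern open = Pattern.compile(
                "<div\\b[^>]*\\sid\\s*=\\s*[\"']" + Pattern.quote(id) + "[\"'][^>]*>",
                Pattern.CASE_INSENSITIVE);
        Matcher start = open.matcher(rawHtml);
        if (!start.find()) {
            return null;
        }

        // Walk the following div tags, tracking nesting depth.
        int depth = 1;
        int pos = start.end();
        Matcher tags = DIV_TAG.matcher(rawHtml);
        while (tags.find(pos)) {
            depth += tags.group(1).isEmpty() ? 1 : -1;
            if (depth == 0) {
                return rawHtml.substring(start.start(), tags.end());
            }
            pos = tags.end();
        }
        return null; // No matching closing tag: markup is unbalanced.
    }
}
```

With an HtmlPage in hand, the call would look like RawDivExtractor.extractDiv(page.getWebResponse().getContentAsString(), "my-id"). Real-world HTML is often malformed, so treat a null result as a normal outcome rather than an error.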

As a side note, you should probably think twice about why you need the HTML code. HtmlUnit will let you get the data out of the page, so there shouldn't be any need to store the source code rather than the information contained in it. Just my 2 cents.