I have a project which requires me to load an HTML document as a string and parse it. I am trying to determine which HTML node will exceed the height of a page (8.5 x 11 in) so I can insert an element styled with ‘page-break-after’ before it. This will be done with a .NET DLL I am producing.
I have tried using the mshtml DOM. It’s not easy to load a string value into it, and when I did manage to accomplish this, the layout properties (offsetHeight, etc.) always returned zero. The only way I have found to make this work is to save the HTML to disk, load it via SHDocVw.InternetExplorer, and then pass that to the mshtml DOM.
I’m assuming that unless the HTML is ‘rendered’ by SHDocVw, I have no offsetHeight information for mshtml to report, as this is based on screen pixels. I could be wrong.
My current code is as follows:
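Dim myIE As New SHDocVw.InternetExplorer
myIE.Navigate(htmlPath) 'htmlPath (hypothetical name): the HTML file saved to disk
Do While myIE.Busy OrElse myIE.ReadyState <> SHDocVw.tagREADYSTATE.READYSTATE_COMPLETE
    System.Threading.Thread.Sleep(50) 'wait until the page has been rendered
Loop
Dim myDoc As mshtml.HTMLDocument = CType(myIE.Document, mshtml.HTMLDocument)
Dim divTag As mshtml.IHTMLElement = myDoc.getElementById("someID")
For Each childNode As mshtml.IHTMLElement In TryCast(divTag.children, mshtml.IHTMLElementCollection)
    If childNode.offsetTop + childNode.offsetHeight > 750 Then '72 pixels = 1 inch.
        childNode.insertAdjacentHTML("beforeBegin", "<DIV style='page-break-after:always'></DIV>")
    End If
Next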
The WebBrowser control (maybe, not sure) will take your HTML string and convert it to a navigable DOM. Reuse an existing HTML parser, don't reinvent one. You'll have more hair left at the end of your project.
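
Something along these lines might work as a starting point. It is only a rough sketch, assuming the Windows Forms WebBrowser control is acceptable inside your DLL. It sets DocumentText from the string, so nothing is written to disk, and waits for DocumentCompleted before measuring. The names InsertPageBreaks and AddBreaks, the 816 pixel width, and the pageTop bookkeeping (which restarts the height count after each inserted break) are my own assumptions, not anything from your code. I have not verified that the off-screen control reports non-zero offsets in every setup.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Drawing
Imports System.Threading
Imports System.Windows.Forms

Module PageBreakSketch

    ' Loads an HTML string into an off-screen WebBrowser and returns the
    ' body HTML with page-break DIVs inserted (hypothetical helper).
    Function InsertPageBreaks(html As String, containerId As String,
                              pageHeightPx As Integer) As String
        Dim result As String = Nothing

        ' WebBrowser wraps an ActiveX control, so it needs an STA thread
        ' that runs a message loop.
        Dim worker As New Thread(
            Sub()
                Using browser As New WebBrowser()
                    browser.ScriptErrorsSuppressed = True
                    browser.Width = 816 ' assumption: 8.5 in at 96 pixels per inch
                    AddHandler browser.DocumentCompleted,
                        Sub(sender, e)
                            result = AddBreaks(browser.Document, containerId, pageHeightPx)
                            Application.ExitThread() ' ends Application.Run below
                        End Sub
                    browser.DocumentText = html
                    Application.Run()
                End Using
            End Sub)
        worker.SetApartmentState(ApartmentState.STA)
        worker.Start()
        worker.Join()
        Return result
    End Function

    Private Function AddBreaks(doc As HtmlDocument, containerId As String,
                               pageHeightPx As Integer) As String
        Dim container As HtmlElement = doc.GetElementById(containerId)
        If container Is Nothing Then Return doc.Body.OuterHtml

        ' Copy the children first, because inserting nodes while walking
        ' the live collection could change it mid-loop.
        Dim children As New List(Of HtmlElement)
        For Each child As HtmlElement In container.Children
            children.Add(child)
        Next

        Dim pageTop As Integer = 0 ' offset where the current page starts
        For Each child As HtmlElement In children
            Dim box As Rectangle = child.OffsetRectangle
            ' Break before a child that would run past the current page,
            ' unless it already sits at the top of that page.
            If box.Bottom - pageTop > pageHeightPx AndAlso box.Top > pageTop Then
                Dim breaker As HtmlElement = doc.CreateElement("DIV")
                breaker.Style = "page-break-after:always"
                child.InsertAdjacentElement(HtmlElementInsertionOrientation.BeforeBegin, breaker)
                pageTop = box.Top
            End If
        Next
        Return doc.Body.OuterHtml
    End Function

End Module
```

You would call it as InsertPageBreaks(htmlString, "someID", 750), keeping your 750 pixel limit. Note that OffsetRectangle is relative to the element's offset parent, same as offsetTop in your mshtml version, so nested layouts may need extra arithmetic.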