bloody numen - 4 months ago
C Question

How to use libxml2 to parse dirty HTML in C programming

The HTML may be dirty, for example with a premature end of data in a tag.

How can I do it? Thanks.

Answer

The libxml2 HTML parser tolerates "dirty" HTML: it recovers from errors such as unclosed tags and builds a normalized tree. See htmlDocPtr htmlParseFile(const char * filename, const char * encoding)

http://xmlsoft.org/html/libxml-HTMLparser.html