When I try to load an HTML file as XML using
This HTML file may have unneeded spaces and maybe some other errors that I would like SimpleXML to ignore.
I would suggest using PHP Simple HTML DOM. I've used it myself for everything from page scraping to manipulating HTML template files. It's very simple yet quite powerful, and it should suit your needs just fine.
Here are a few examples from their docs that show the kind of things you can do:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';
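If you would rather end up with a SimpleXML object, as in the question, one possible alternative is to parse the markup with PHP's bundled DOM extension first, since its HTML parser is lenient about stray whitespace and unclosed tags, and then pass the result to SimpleXML. This is only a rough sketch: the helper name `load_html_as_simplexml` and the sample markup are made up for illustration, and it assumes the `dom`, `libxml` and `simplexml` extensions are enabled.

```php
<?php
// Hypothetical helper (not part of any library): parse possibly malformed
// HTML with DOMDocument, then expose the repaired tree through SimpleXML.
function load_html_as_simplexml(string $markup): ?SimpleXMLElement
{
    if (trim($markup) === '') {
        return null; // nothing to parse
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($markup); // HTML parser, tolerant of bad markup

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return null;
    }

    $xml = simplexml_import_dom($dom);
    return $xml instanceof SimpleXMLElement ? $xml : null;
}

// Made-up sample input: extra spaces and an unclosed <p>.
$sample = "<html><body>  <p>First <a href='/one'>one</a>  "
        . "<img src='a.png'>  <a href='/two'>two</a></body></html>";

$xml = load_html_as_simplexml($sample);
if ($xml !== null) {
    // Print every link target, then every image source.
    foreach ($xml->xpath('//a') as $link) {
        echo $link['href'], "\n";
    }
    foreach ($xml->xpath('//img') as $img) {
        echo $img['src'], "\n";
    }
}
```

Whether this beats Simple HTML DOM depends on what you need. It avoids an extra dependency and gives you XPath, but you lose the CSS-style selectors shown above.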