RealityDysfunction - 4 years ago
C# Question

add slash to self-closing tags

I need to convert a chunk of HTML that I obtain from a page into XML. Most of the tags convert fine when I load them into an XmlDocument, except self-closing tags that are not closed (XmlDocument does not accept those). Unfortunately I cannot fix this in the page itself, since it is generated by a third-party engine, so I have to add the slashes myself. I am not that great at regex, so I need some help adding the "/" to these tags.

Appreciate any input.

Answer

I would recommend using the HTML Agility Pack to parse it. The pack can write its output as XML and will take care of closing all the tags for you (as well as CDATA wrapping and other tricky problems you may run into). For example, this is how you can convert HTML to XML:

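using System.Diagnostics;
using System.IO;
using System.Xml;

// Sample input: <img> is never closed and </body> is missing.
string html = "<HTML><body><a href ='something'> <img src='a.jpg'></a></HTML>";

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Ask the Agility Pack to emit well-formed XML instead of HTML.
doc.OptionOutputAsXml = true;

MemoryStream ms = new MemoryStream();
XmlWriter xml = XmlWriter.Create(ms);
doc.Save(xml);
xml.Flush(); // make sure everything has reached the stream

// Rewind the stream and print what was written.
ms.Position = 0;
StreamReader sr = new StreamReader(ms);
Debug.WriteLine(sr.ReadToEnd());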

Which renders the output:

<?xml version="1.0" encoding="iso-8859-1"?><html><body><a href="something"> <img src="a.jpg" /></a></body></html>
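
If pulling in a library is not an option, the regex the question asks for could look roughly like the sketch below. It is only a fallback, not a replacement for a real parser: the class and method names (VoidTagFixer, CloseVoidTags) are made up for illustration, the tag list is the set of HTML void elements, and it assumes attribute values never contain "<" or ">". It also does nothing about other unclosed tags (such as the missing </body> in the sample above), so XmlDocument may still reject the result.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class VoidTagFixer
{
    // Matches an opening void tag such as <br>, <img src='a.jpg'> or <hr/>.
    //   group 1: the tag name
    //   group 2: the attributes, if any (assumed to contain no '<' or '>')
    // The lookahead keeps "col" from matching the start of "colgroup".
    private static readonly Regex VoidTag = new Regex(
        @"<(area|base|br|col|embed|hr|img|input|link|meta|param|source|track|wbr)(?=[\s/>])([^<>]*?)\s*/?>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Rewrites every void tag so that it ends in " />".
    // Tags that are already self-closed are normalized to the same form.
    public static string CloseVoidTags(string html)
    {
        return VoidTag.Replace(html, "<$1$2 />");
    }
}
```

For example, VoidTagFixer.CloseVoidTags("<a href='something'> <img src='a.jpg'></a><br>") returns "<a href='something'> <img src='a.jpg' /></a><br />". Anything messier than that (unquoted attributes containing ">", tags inside comments or scripts) is where the Agility Pack approach above is the safer choice.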