wonea wonea - 2 months ago 5
C# Question

Best way to read, modify, and write XML

My plan is to read in an XML document using my C# program, search for particular entries which I'd like to change, and then write out the modified document. However, I've become unstuck because it's hard to differentiate between elements, whether they start or end using XmlTextReader which I'm using to read in the file. I could do with a bit of advice to put me on the right track.

The document is a HTML document, so as you can imagine, it's quite complicated.

I'd like to search for an element id within the HTML document, so for example look for this and change the src;

<img border="0" src="bigpicture.png" width="248" height="36" alt="" id="lookforthis" />


If it's actually valid XML, and will easily fit in memory, I'd choose LINQ to XML (XDocument, XElement etc) every time. It's by far the nicest XML API I've used. It's easy to form queries, and easy to construct new elements too.

You can use XPath where that's appropriate, or the built-in axis methods (Elements(), Descendants(), Attributes() etc). If you could let us know what specific bits you're having a hard time with, I'd be happy to help work out how to express them in LINQ to XML.

If, on the other hand, this is HTML which isn't valid XML, you'll have a much harder time - because XML APIs generalyl expect to work with valid XML documents. You could use HTMLTidy first of course, but that may have undesirable effects.

For your specific example:

XDocument doc = XDocument.Load("file.xml");
foreach (var img in doc.Descendants("img"))
    // src will be null if the attribute is missing
    string src = (string) img.Attribute("src");
    img.SetAttributeValue("src", src + "with-changes");