Trowa Trowa - 2 months ago 15
C# Question

Replace some char within a string (XML format)

I was given a String variable with the following content:

<main>
<Title title="Hello World" />
<Content content="bla bla bla... by <1% to ??? on other bla bla...." />
</main>


This string will eventually be passed to a stored procedure for XQuery.

As you can see, the value of the "content" attribute contains the character "<", which causes an error when I try to parse the string in the stored procedure.

My question is how to convert that "<" into &lt; (in this case, <1% to &lt;1%) in an efficient way.

I want to keep every other "<" as it is.

Thanks

Dai Dai
Answer

Since you updated your question to point out that you are dealing with XML, and the unencoded values are in attribute values rather than #text nodes, this becomes somewhat simpler: extract the attribute value using an approach similar to my previous answer, encode its entities with a library function, then write the result back into the document.

Note that CDATA only applies to #text, not attributes.

String doc =
@"<main>
<Title title=""Hello World"" />
<Content content=""bla bla bla... by <1% to ??? on other bla bla...."" />
</main>";

// Locate the opening <Content tag, then the start and end of its content="..." value.
Int32 contentOpenStart = doc.IndexOf("<Content");
Int32 contentAttribContentValueStart = doc.IndexOf("content=\"", contentOpenStart) + "content=\"".Length;
Int32 contentAttribContentValueEnd   = doc.IndexOf("\"", contentAttribContentValueStart);

// Substring's second argument is a length, not an end index.
String attributeValueOld = doc.Substring( contentAttribContentValueStart, contentAttribContentValueEnd - contentAttribContentValueStart );
String attributeValueNew = System.Net.WebUtility.HtmlEncode( attributeValueOld );

// Reassemble: everything before the value, the encoded value, everything from the closing quote on.
String doc2 = String.Concat(
    doc.Substring( 0, contentAttribContentValueStart ),
    attributeValueNew,
    doc.Substring( contentAttribContentValueEnd )
);

doc2 then contains the fixed attribute value.

Note that using HtmlEncode is not strictly correct for XML, because the set of XML entities is much smaller than HTML's: XML is only concerned with &amp;, &gt;, &lt;, &quot; and &apos;. All other characters should appear in the document as raw/native characters.
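
If you would rather escape only those five XML characters, one option is System.Security.SecurityElement.Escape, which replaces <, >, ", ' and & with their XML entities. Below is a minimal sketch of the same index-based approach wrapped in a reusable method. The class name AttributeFixer and the method EscapeAttributeValue are hypothetical names I made up for illustration. It assumes the attribute value is double-quoted, contains no double quote of its own, and holds no entities that are already escaped (those would be escaped a second time).

```csharp
using System;
using System.Security;
using System.Xml.Linq;

public static class AttributeFixer
{
    // Hypothetical helper: escapes the value of the first attributeName="..."
    // found after the first "<elementName" in doc.
    // Assumptions: the value is double-quoted, contains no '"' itself, and
    // has no already-escaped entities. Returns doc unchanged if nothing matches.
    public static string EscapeAttributeValue(string doc, string elementName, string attributeName)
    {
        string marker = attributeName + "=\"";

        int elementStart = doc.IndexOf("<" + elementName, StringComparison.Ordinal);
        if (elementStart < 0) return doc;

        int markerStart = doc.IndexOf(marker, elementStart, StringComparison.Ordinal);
        if (markerStart < 0) return doc;

        int valueStart = markerStart + marker.Length;
        int valueEnd = doc.IndexOf('"', valueStart);
        if (valueEnd < 0) return doc;

        // Substring takes (startIndex, length), so the length is end minus start.
        string oldValue = doc.Substring(valueStart, valueEnd - valueStart);

        // Escapes only <, >, ", ' and &.
        string newValue = SecurityElement.Escape(oldValue);

        return doc.Substring(0, valueStart) + newValue + doc.Substring(valueEnd);
    }

    public static void Main()
    {
        string doc =
@"<main>
<Title title=""Hello World"" />
<Content content=""bla bla bla... by <1% to ??? on other bla bla...."" />
</main>";

        string fixedDoc = EscapeAttributeValue(doc, "Content", "content");
        Console.WriteLine(fixedDoc);

        // Sanity check: the repaired string should now parse as XML, and the
        // parser decodes &lt; back to "<" when the attribute is read.
        XDocument parsed = XDocument.Parse(fixedDoc);
        Console.WriteLine(parsed.Root.Element("Content").Attribute("content").Value);
    }
}
```

Only the targeted attribute value is touched, so the "<" characters that open the tags stay as they are. If the same broken attribute can appear on several elements, you would have to call the method in a loop from the end of the previous match, which this sketch does not do.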