StepUp - 1 month ago
C# Question

Extract only the XML from a text file

I have many ".txt" files that contain ordinary text mixed with XML tags. The files are very large and there are a lot of them, so I want to extract only the XML and drop the text. I know that the XML starts with <body> and ends with </body>. I need only the <body> elements and all the tags nested inside them.


Example of file:

exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
<body>
...
</body>

exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
exampleTextexampleTextexampleTextexampleTextexampleTextexampleText

exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
exampleTextexampleTextexampleTextexampleTextexampleTextexampleText

exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
<body>
...
</body>

exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
exampleTextexampleTextexampleTextexampleTextexampleTextexampleText

exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
<body>
...
</body>

exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
exampleTextexampleTextexampleTextexampleTextexampleTextexampleText

exampleTextexampleTextexampleTextexampleTextexampleTextexampleText
exampleTextexampleTextexampleTextexampleTextexampleTextexampleText


I've tried XDocument doc = XDocument.Parse(str); but I get an exception:


Data at the root level is invalid. Line 1, position 1.

Answer

Try something like the code below. It works if every XML line starts with "<" and no line of plain text does. If that does not hold, we may need to use Regex.

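            // Requires: using System.IO; using System.Text;
            // FILENAME is the path of the input file.
            StringBuilder xml = new StringBuilder();
            using (StreamReader reader = new StreamReader(FILENAME, Encoding.UTF8))
            {
                string inputLine;
                while ((inputLine = reader.ReadLine()) != null)
                {
                    // Keep only the lines that look like XML tags.
                    if (inputLine.Trim().StartsWith("<"))
                    {
                        xml.Append(inputLine).Append('\n');
                    }
                }
            }
            // str may hold several <body> elements, so XDocument.Parse(str)
            // still needs them wrapped in a single root element.
            string str = xml.ToString();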
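
If the XML lines do not all start with "<", a regular expression can pull out each <body> block instead. The following is only a minimal sketch of that idea: the BodyExtractor class and ExtractBodies method are hypothetical names, and it assumes that <body> elements are never nested inside one another or self-closing, and that the text of one file fits in memory.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class BodyExtractor
{
    // Lazy match from an opening <body ...> tag to the nearest </body>.
    // RegexOptions.Singleline lets '.' match newlines, so a block may span lines.
    private static readonly Regex BodyBlock =
        new Regex(@"<body\b[^>]*>.*?</body>", RegexOptions.Singleline);

    public static List<XElement> ExtractBodies(string text)
    {
        var result = new List<XElement>();
        foreach (Match m in BodyBlock.Matches(text))
        {
            // Each block is parsed on its own, so the surrounding plain text
            // and the presence of several <body> roots no longer matter.
            result.Add(XElement.Parse(m.Value));
        }
        return result;
    }
}
```

It could be called as BodyExtractor.ExtractBodies(File.ReadAllText(FILENAME)); each returned XElement holds one <body> together with its nested tags. For files too large to read in one piece, a line-by-line reader that collects the text between <body> and </body> may be a better fit.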