Ryan Weil - 1 month ago
C# Question

Find String Between Two Identical Control Separators?

I'm reading from a file and need to find a string that is enclosed by two identical non-printable control characters (separators), in this case 'RS'.

[Image: the file opened in Notepad++]

How would I go about doing this? Would I need some form of regex?

Answer

RS stands for Record Separator, and it has a value of 30 (or 0x1E in hexadecimal). You can use this regular expression:

\x1E([\w\s]*?)\x1E

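That matches the RS, then any run of letters, digits, underscores or whitespace (\w also covers the underscore), and then another RS. The ? makes the quantifier lazy, so it matches as few characters as possible. Strictly speaking it is not required with this character class: RS is neither a word character nor whitespace in .NET, so the group can never run past the next RS anyway. It only starts to matter if you widen the class to something like ., which does match RS.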
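To check the point about the lazy quantifier, the sketch below runs the answer's pattern in both lazy and greedy form, and then the same comparison with `.` instead of `[\w\s]`. It is a C# 9+ top-level program; the sample text and variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "\u001Eyour string\u001E more things \u001Eanother text\u001E";

// With [\w\s] the lazy and greedy forms give the same first match,
// because RS is not in the character class.
string lazyWord = Regex.Match(text, @"\x1E([\w\s]*?)\x1E").Groups[1].Value;
string greedyWord = Regex.Match(text, @"\x1E([\w\s]*)\x1E").Groups[1].Value;

// With '.', which also matches RS, the greedy form runs on to the LAST RS.
string lazyDot = Regex.Match(text, @"\x1E(.*?)\x1E").Groups[1].Value;
string greedyDot = Regex.Match(text, @"\x1E(.*)\x1E").Groups[1].Value;

Console.WriteLine(lazyWord);   // your string
Console.WriteLine(greedyWord); // your string
Console.WriteLine(lazyDot);    // your string
// RS replaced by '|' so the spanned separators are visible:
Console.WriteLine(greedyDot.Replace('\u001E', '|')); // your string| more things |another text
```

So the `?` is harmless with `[\w\s]`, but it becomes essential once the class can itself match the separator.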

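If you prefer not to match digits (or underscores, which \w also includes), you could use [a-zA-Z\s] instead of [\w\s]. Note that this also rejects non-ASCII letters such as é, which \w accepts in .NET.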
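Because `[\w\s]` only accepts word characters and whitespace, a field that contains punctuation (for example a comma) is not matched at all. The sketch below compares the answer's pattern with a negated class `[^\x1E]*`, which accepts any character except RS. That alternative pattern and the helper name `Extract` are assumptions of this sketch, not part of the original answer; it is written as a C# 9+ top-level program.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from the answer: letters, digits, underscore and whitespace only.
const string wordPattern = @"\x1E([\w\s]*?)\x1E";

// Hypothetical alternative: anything except RS itself.
const string anyPattern = @"\x1E([^\x1E]*)\x1E";

// Returns the group-1 capture of every non-overlapping match.
static string[] Extract(string text, string pattern) =>
    Regex.Matches(text, pattern)
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

string sample = "\u001Ehello, world\u001E";
Console.WriteLine(Extract(sample, wordPattern).Length); // 0: the comma is not in [\w\s]
Console.WriteLine(Extract(sample, anyPattern)[0]);      // hello, world
```

The negated class is the more permissive choice when you only care about the separators; the answer's class is stricter and acts as a simple validation of the field contents.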

Example:

using System;
using System.Text.RegularExpressions;

string fileContents = "Something \u001Eyour string\u001E more things \u001Eanother text\u001E end.";
MatchCollection matches = Regex.Matches(fileContents, @"\x1E([\w\s]*?)\x1E");

if (matches.Count == 0)
    return; // Not found: display an error message and exit.

foreach (Match match in matches)
{
    // Groups[1] is the text captured between the two RS characters.
    Console.WriteLine(match.Groups[1].Value);
}
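Since the question reads the data from a file rather than a string literal, here is a minimal sketch of the same search applied to a file's contents. It assumes the whole file fits in memory and uses an ASCII-compatible encoding such as UTF-8 (RS is the single byte 0x1E there); `ReadRecords` is a hypothetical helper name, and the demo writes its own temporary file. It is a C# 9+ top-level program.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: reads the whole file and returns the text found
// between each pair of RS characters, using the answer's pattern.
static string[] ReadRecords(string path) =>
    Regex.Matches(File.ReadAllText(path), @"\x1E([\w\s]*?)\x1E")
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

// Demo: write a small sample file to a temporary path, read it back, clean up.
string samplePath = Path.GetTempFileName();
File.WriteAllText(samplePath, "Something \u001Eyour string\u001E more things \u001Eanother text\u001E end.");
foreach (string record in ReadRecords(samplePath))
    Console.WriteLine(record);
File.Delete(samplePath);
```

For very large files you would read in chunks instead, but then a record may straddle a chunk boundary, which this sketch does not handle.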

Regex.Matches returns a MatchCollection, and each match.Value holds the whole matched string, including the separators. match.Groups holds all the groups: Groups[0] is again the whole match (it is always present), followed by one entry for each capturing group, that is, each pair of parentheses. This regex has only one, so you just need Groups[1], the second entry in the list.
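To make the difference between `Value` and the groups concrete, the sketch below inspects a single match with `Regex.Match`. The sample text is a shortened version of the answer's example, and the variable names are illustrative; it is a C# 9+ top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Something \u001Eyour string\u001E more things";
Match match = Regex.Match(text, @"\x1E([\w\s]*?)\x1E");

// Value and Groups[0] both hold the whole match, separators included.
bool sameAsGroup0 = match.Value == match.Groups[0].Value;

// Groups[1] holds only the text captured by the parentheses.
string field = match.Groups[1].Value;

Console.WriteLine(match.Groups.Count); // 2 (the whole match plus one capturing group)
Console.WriteLine(sameAsGroup0);       // True
Console.WriteLine(field);              // your string
Console.WriteLine(match.Value.Length); // 13: "your string" plus the two RS characters
```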