Dough - 3 months ago
C# Question

Regex for searching for tags in docx files

I'm trying to use GemBox.Document to search a .docx file for a tag and retrieve the value held within it. The tag always starts with <! and ends with !>; for example, <!sometexthere!> should return sometexthere.

However, I can't get my regex to work properly. This is what I have:

var pattern = Regex.Escape("<!(.*?)!>");


Any help is appreciated. Thanks.

Answer

To get all the values, use Regex.Matches instead of Regex.Escape:

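using System.Linq;
using System.Text.RegularExpressions;

// s is the text to search; Group 1 captures what lies between <! and !>
var res = Regex.Matches(s, @"<!(.*?)!>")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();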
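For completeness, here is a minimal self-contained sketch of the same approach as a console program using top-level statements (C# 9 or later assumed). It assumes the .docx text has already been read into a string, for example with GemBox.Document; that extraction step is not shown. The helper name ExtractTagValues and the sample text are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text between every <! ... !> pair, in order.
// Assumption: the document text has already been extracted into a string.
static List<string> ExtractTagValues(string text) =>
    Regex.Matches(text, @"<!(.*?)!>")
        .Cast<Match>()
        .Select(m => m.Groups[1].Value) // Group 1 = text inside the delimiters
        .ToList();

// Hypothetical sample text standing in for the document contents.
var sample = "Dear <!name!>, your order <!orderId!> has shipped.";

foreach (var value in ExtractTagValues(sample))
{
    Console.WriteLine(value); // prints "name", then "orderId"
}
```

Because .*? is lazy, two tags on the same line produce two separate values; a greedy .* would instead match from the first <! all the way to the last !>.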

Regex.Escape is only meant for escaping literal strings that you want to use inside a regular expression pattern; for example, . becomes \. so that it matches a literal dot. Applied to your pattern, it also escapes the parentheses, the dot, the asterisk and the question mark, so the resulting pattern matches only the literal text <!(.*?)!> and captures nothing. Regex.Match searches for a single match, while Regex.Matches returns all non-overlapping matches. Since you only need the Group 1 values, the Select clause is handy here: it returns just the text captured by Group 1 in the pattern, i.e. the part between <! and !>.
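The difference between Regex.Escape, Regex.Match and Regex.Matches can be checked with a short sketch like the following (again a top-level C# program; the sample string is made up for illustration).

```csharp
using System;
using System.Text.RegularExpressions;

var text = "<!first!> and <!second!>";

// Regex.Escape turns the metacharacters ( . * ? ) into literals,
// so the escaped pattern only matches the exact string "<!(.*?)!>".
var escaped = Regex.Escape("<!(.*?)!>");
Console.WriteLine(escaped);                             // <!\(\.\*\?\)!>
Console.WriteLine(Regex.IsMatch(text, escaped));        // False
Console.WriteLine(Regex.IsMatch("<!(.*?)!>", escaped)); // True

// Regex.Match stops at the first match...
var first = Regex.Match(text, @"<!(.*?)!>");
Console.WriteLine(first.Groups[1].Value);               // first

// ...while Regex.Matches returns every non-overlapping match.
foreach (Match m in Regex.Matches(text, @"<!(.*?)!>"))
{
    Console.WriteLine(m.Groups[1].Value);               // first, then second
}
```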
