Max T - 9 months ago
C# Question

Get a substring from a document with Regex

I'm not very experienced with regex and hope to get some help from you guys:

I've got a string like this:

"... p.msochpdefault
{mso-style-name:msochpdefault;} ..."

I don't know what comes before or after this part of the string, and I don't know the content between the braces.

I've tried this, but it matches up to the last ";}" in the file, and the result does not include "p.msochpdefault":

string match = Regex.Match(str, @"p.msochpdefault(.+);}", RegexOptions.Singleline).Groups[1].Value;

How can I extract this in the right way?

Answer

There are some issues with your regex:


  1. Make sure you are searching for p.msochpdefault, not for p.MsoNormal.
  2. You have to escape the dot; otherwise it matches any character (p\.MsoNormal or p\.msochpdefault).
  3. The term .+ requires at least one character between p\.msochpdefault and ;}. If that part can ever be empty, use .* instead.
  4. You are using a greedy quantifier, which is the reason you catch the last instance of ;}. Use a lazy quantifier instead: .+? instead of .+, and .*? instead of .*. That catches the first ;}, not the last.
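Putting points 2 and 4 together, a minimal sketch might look like the following. It assumes .NET's System.Text.RegularExpressions; the class and method names (MsoStyleExtractor, ExtractWhole, ExtractInner) and the sample input are hypothetical, chosen only for illustration. It also shows why your result lacked "p.msochpdefault": Groups[1].Value returns only what the parentheses captured, while Match.Value returns the entire match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class MsoStyleExtractor
{
    // \. matches a literal dot; the lazy (.*?) stops at the FIRST ";}".
    // Singleline lets "." also match the newline after the selector.
    private static readonly Regex Pattern =
        new Regex(@"p\.msochpdefault(.*?);}", RegexOptions.Singleline);

    // Whole match, including "p.msochpdefault"; "" if there is no match.
    public static string ExtractWhole(string input)
    {
        return Pattern.Match(input).Value;
    }

    // Only the text captured by group 1 (between the name and ";}"); "" if no match.
    public static string ExtractInner(string input)
    {
        return Pattern.Match(input).Groups[1].Value;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical input: another rule follows, so a greedy .+ would
        // run on to the ";}" at the very end.
        string str = "p.MsoNormal {margin:0;}\n" +
                     "p.msochpdefault\n{mso-style-name:msochpdefault;}\n" +
                     "div.WordSection1 {page:WordSection1;}";

        Console.WriteLine(MsoStyleExtractor.ExtractWhole(str));
        Console.WriteLine(MsoStyleExtractor.ExtractInner(str));
    }
}
```

If you only need the whole rule, using Match.Value and dropping the capture group is enough; keep the group if you also want the content between the braces on its own.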

I would recommend using a regex tester. There are many online, including free ones. With such a tool you can try your regex against sample input and revise it until it matches what you expect.