opitzh - 1 year ago
C# Question

C# Regex Performance very slow

I am very new to regex. I want to parse log files with the following regex:


A log line looks like this:

2001.07.13 09:40:20|1|SomeSection|3|====== Some log message::Type: test=sdfsdf|||.\SomeFile.cpp||60|-1

A log file with approx. 3000 lines takes a very long time to parse. Do you have any hints for speeding it up? Thank you...

I use regex because my log files do not all share the same structure. This is how I use it:

string[] fileContent = File.ReadAllLines(filePath);
Regex pattern = new Regex(LogFormat.GetLineRegex(logFileFormat));

foreach (var line in fileContent)
{
    // Split log line
    Match match = pattern.Match(line);

    string logDate = match.Groups["time"].Value.Trim();
    string logLevel = match.Groups["level"].Value.Trim();
    // And so on...
}


Thank you for your help. I've tested it with the following results:

1.) Only added RegexOptions.Compiled:

From 00:01:10.9611143 to 00:00:38.8928387

2.) Used Thomas Ayoub's regex

From 00:00:38.8928387 to 00:00:06.3839097

3.) Used Wiktor Stribiżew's regex

From 00:00:06.3839097 to 00:00:03.2150095

So thank you very much for your help!!!

Answer Source

Let me "convert" my comment into an answer, now that I see what you can do about the regex performance.

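As I mentioned above, replace every .*? with [^|]*, and every run of repeated [|][|][|] with [|]{3} (or similar, depending on the number of [|]). A negated character class such as [^|]* can never run past the next pipe, so the engine has far less backtracking to do than with a lazy .*?. Also, do not use nested capturing groups; they hurt performance too!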

var logFileFormat = @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|](?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)";

Only the last .* can remain "wildcardish" since it will grab the rest of the line.
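To make the effect of the pattern concrete, here is a minimal sketch that runs it against the sample log line from the question. The class name LogLineDemo and the method Parse are made up for illustration, and the pattern is passed to Regex directly because the LogFormat.GetLineRegex helper is not shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is an assumption, not part of the question.
public static class LogLineDemo
{
    // Pattern from the answer: each field is "anything except a pipe",
    // only the last group uses .* to take the rest of the line.
    private const string LinePattern =
        @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|](?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)";

    private static readonly Regex LineRegex = new Regex(LinePattern);

    // Returns the Match for one log line; check Success before reading groups.
    public static Match Parse(string line)
    {
        return LineRegex.Match(line);
    }
}
```

Calling LogLineDemo.Parse with the sample line should give a successful match in which Groups["time"] holds the timestamp, Groups["level"] holds the digit after SomeSection, and Groups["placeholder3"] holds whatever follows the last pipe.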

Here is a comparison of your regex pattern and mine at RegexHero.


Then, use RegexOptions.Compiled:

Regex pattern = new Regex(LogFormat.GetLineRegex(logFileFormat), RegexOptions.Compiled);
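RegexOptions.Compiled pays a one-time compilation cost, so it tends to help most when a single Regex instance is built once and reused for every line. The following is only a sketch of that idea: the names LogParser, ParseLines and ParseFile are hypothetical, the pattern is hardcoded because LogFormat.GetLineRegex is not shown, and switching to File.ReadLines (which streams lines instead of loading the whole file) is an optional change that the question did not ask for.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical wrapper; all names here are assumptions for illustration.
public static class LogParser
{
    // Created once for the lifetime of the process and reused for every line,
    // so the cost of RegexOptions.Compiled is paid a single time.
    private static readonly Regex LineRegex = new Regex(
        @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|](?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)",
        RegexOptions.Compiled);

    // Parses already-loaded lines; lines that do not fit the format are skipped.
    public static IEnumerable<(string Time, string Level, string Message)> ParseLines(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            Match match = LineRegex.Match(line);
            if (!match.Success)
            {
                continue;
            }

            yield return (
                match.Groups["time"].Value.Trim(),
                match.Groups["level"].Value.Trim(),
                match.Groups["message"].Value.Trim());
        }
    }

    // Streams the file line by line instead of reading it into an array first.
    public static IEnumerable<(string Time, string Level, string Message)> ParseFile(string filePath)
    {
        return ParseLines(File.ReadLines(filePath));
    }
}
```

A caller could then write foreach (var entry in LogParser.ParseFile(filePath)) and read entry.Time, entry.Level and entry.Message. Whether the compiled option actually pays off depends on how many lines are matched per run, so it is worth measuring with a Stopwatch as in the question.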