Khanh Nguyen - 3 months ago
C# Question

Count consecutive repeated words in a string without breaking order

I have a string like this:

var str = "S3;S4;S3;S4;S5;S5;S4;S4;S4";

I would like to split it into a list like this:

{ {"S3" : 1}, {"S4" : 1}, {"S3" : 1}, {"S4" : 1}, {"S5" : 2}, {"S4" : 3} }


Basically, I want a count for each run of consecutive identical words in the sequence. I tried to use LINQ group by, but it only gives me a list of unique words with their total counts. Is there a way to maintain the order and count only the consecutive repetitions of a word?

Thanks for any suggestions or help!

This is what I have so far:

var text = "S3;S4;S5;S5;S4;S4;S3;S3;S3;S4;";
var list = text.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var grouped = from state in list
              group state by state.ToState() into g
              select new { Name = g.Key, Count = g.Count() };


I'm trying to use LINQ by the way...

Please take a look at serhiyb's answer for a solution without LINQ or Regex, and at Xiaoy312's answer for a really nice solution that uses LINQ and Regex!

Answer

It can be done by mixing Regex and a little bit of LINQ:

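// Requires: using System.Linq; using System.Text.RegularExpressions;
var result = Regex.Matches("S3;S4;S5;S5;S4;S4;S3;S3;S3;S4;", @"(?<key>.+?)(?<repeated>;\k<key>)*;")
    .Cast<Match>()
    .Select(x => new
    {
        Key = x.Groups["key"].Value,
        // One for the key itself, plus one for each captured repetition
        Count = 1 + x.Groups["repeated"].Captures.Count
    })
    .ToList();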

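The Regex matches the following:

  • (?<key>.+?) lazily matches one or more characters and puts them into a named group, key
  • (?<repeated>;\k<key>)* matches any number of repetitions of the previously matched key, each preceded by a semicolon
  • the final ; matches the semicolon that ends the run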

Result:

Key  Count
S3   1
S4   1
S5   2
S4   2
S3   3
S4   1
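
For comparison, the same run counting can be sketched with a plain loop, without Regex or LINQ. This is only an illustrative sketch, not the code from any of the answers mentioned in the question: the class name RunCounter and the method CountRuns are hypothetical, and a list of KeyValuePair<string, int> is assumed as the result type. Because it splits on ; and skips empty entries, it does not depend on the input ending with a semicolon, whereas the pattern above expects every word to be followed by one.

```csharp
using System;
using System.Collections.Generic;

public static class RunCounter
{
    // Hypothetical helper (an assumption for illustration): counts runs of
    // consecutive identical tokens while preserving their original order.
    public static List<KeyValuePair<string, int>> CountRuns(string text, char separator = ';')
    {
        var runs = new List<KeyValuePair<string, int>>();
        var tokens = text.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var token in tokens)
        {
            int last = runs.Count - 1;
            if (last >= 0 && runs[last].Key == token)
            {
                // Same word as the previous one: extend the current run.
                runs[last] = new KeyValuePair<string, int>(token, runs[last].Value + 1);
            }
            else
            {
                // A different word starts a new run with a count of 1.
                runs.Add(new KeyValuePair<string, int>(token, 1));
            }
        }

        return runs;
    }
}
```

Calling RunCounter.CountRuns("S3;S4;S3;S4;S5;S5;S4;S4;S4") on the string from the question should yield S3:1, S4:1, S3:1, S4:1, S5:2, S4:3, which is the list the question asks for.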