invarbrass invarbrass - 2 months ago 21
C# Question

Regex: Repeated capturing groups

I have to parse some tables from an ASCII text file. Here's a partial sample:

QSMDRYCELL 11.00 11.10 11.00 11.00 -.90 11 11000 1.212
RECKITTBEN 192.50 209.00 192.50 201.80 5.21 34 2850 5.707
RUPALIINS 150.00 159.00 150.00 156.25 6.29 4 80 .125
SALAMCRST 164.00 164.75 163.00 163.25 -.45 80 8250 13.505
SINGERBD 779.75 779.75 770.00 773.00 -.89 8 95 .735
SONARBAINS 68.00 69.00 67.50 68.00 .74 11 3050 2.077


The table consists of 1 column of text and 8 columns of floating point numbers. I'd like to capture each column via regex.

I'm pretty new to regular expressions. Here's the faulty regex pattern I came up with:

(\S+)\s+(\s+[\d\.\-]+){8}


But the pattern captures only the first and the last columns. RegexBuddy also emits the following warning:


You repeated the capturing group
itself. The group will capture only
the last iteration. Put a capturing
group around the repeated group to
capture all iterations.


I've consulted their help file, but I don't have a clue as to how to solve this.

How can I capture each column separately?

Answer

In C# (modified from this example):

string input = "QSMDRYCELL   11.00   11.10   11.00   11.00    -.90      11     11000     1.212";
string pattern = @"^(\S+)\s+(\s+[\d.-]+){8}$";
Match match = Regex.Match(input, pattern, RegexOptions.MultiLine);
if (match.Success) {
   Console.WriteLine("Matched text: {0}", match.Value);
   for (int ctr = 1; ctr < match.Groups.Count; ctr++) {
      Console.WriteLine("   Group {0}:  {1}", ctr, match.Groups[ctr].Value);
      int captureCtr = 0;
      foreach (Capture capture in match.Groups[ctr].Captures) {
         Console.WriteLine("      Capture {0}: {1}", 
                           captureCtr, capture.Value);
         captureCtr++; 
      }
   }
}

Output:

Matched text: QSMDRYCELL   11.00   11.10   11.00   11.00    -.90      11     11000     1.212
...
    Group 2:      1.212
         Capture 0:  11.00
         Capture 1:    11.10
         Capture 2:    11.00
...etc.
Comments