Lee Lee - 3 months ago 34
C# Question

C# Groupby then Sum after Split CSV (no Headers)

Having read the various post, I think I'm close to getting the result I want but the "sum" part is giving the wrong answer. I suspect its due to the way I call the "sum" as my csv file does not have column headers.

csv file contains the following:


222, 1

223, 2

222, 1

224, 2

222, -1


What I want to is to:


  • read the CSV file and split the data

  • group by first column

  • sum second column by group



Here is what I have done so far:

var newquery = from line in File.ReadAllLines(path_source) //read all lines in csv file
let values = line.Split(',') //split the values by separator
let price_value = values[0]
group line by price_value into g
orderby g.Key
//select g;
select new
{
price_column = g.Key,
qty_column = g.Sum(x => x[1])
};

var writesum = newquery.Select(record => record.price_column + "," + record.qty_column);

File.WriteAllLines(path_target, writesum);


The path_target file shows:


222, 45

223, 80

224, 65


This tells me that the
split
,
groupby
,
orderby
syntax is correct but the
sum
is totally wrong.

I suspect one of the reasons the results for the
sum
is wrong is due to the syntax here
qty_column = g.Sum(x => x[1])
.

As my data does not contain headers, I'm not able to call
x.something
as in the examples.

On the other hand I could be totally wrong on this point!

Any help is greatly appreciated.

Answer

First of all, group the values after they are splitted, not the raw line:

group values by price_value into g

instead of

group line by price_value into g

Next parse the string you want to sum as ints before you sum them:

qty_column = g.Sum(x => int.Parse(x[1]))

instead of

qty_column = g.Sum(x => x[1])

since x[1] is a string.

Now the result is:

var newquery = from line in File.ReadAllLines(path_source) //read all lines in csv file.
                       let values = line.Split(',') //split the values by separator
                       let price_value = values[0]
                       group values by price_value into g
                       orderby g.Key
                       //select g;
                       select new
                       {
                           price_column = g.Key,
                           qty_column = g.Sum(x => int.Parse(x[1]))
                       };

Which gives you:

222, 1

223, 2

224, 2

Anyway, fluent syntax will be somewhat more readable:

var v = new List<string> File.ReadAllLines(path_source) //read all lines in csv file.
    .Select(x => x.Split(','))
    .Select(x => new { Key = x.First(), Value = x.Last() })
    .GroupBy(x => x.Key)
    .Select(x => new
    {
        price_column = x.Key,
        qty_column = x.Sum(y => int.Parse(y.Value))
    });