Mattlinux1 Mattlinux1 - 3 months ago 50
C# Question

Stripping data from a EPPlus output, from a date range

Quick overview: The main goal is to read in the data from a set date, from the row and get the ref number form the set date e.g. Start date.

For example if i just wanted the data from the date set to the 1st of last month and upwards.

I currently have to extract some data from the excel spreadsheet example below:

Start date Ref number
29/07/2015 2342326
01/07/2016 5697455
02/08/2016 3453787
02/08/2016 5345355
02/08/2015 8364456
03/08/2016 1479789
04/07/2015 9334578


enter image description here

Output using EPPlus:

29/07/2015
2342326
29/07/2016
5697455
02/08/2016
3453787
02/08/2016
5345355
02/08/2015
8364456
03/08/2016
1479789
04/07/2015
9334578


This part is fine, but when i try to strip the output via a date range i'm getting errors, e.g. using LINQ i get the following error output.

An unhandled exception of type 'System.InvalidCastException' occurred in System.Data.DataSetExtensions.dll

Additional information: Specified cast is not valid.


LINQ code:

var rowsOfInterest = tbl.AsEnumerable()
.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
.ToList();


I've also tried to modify the from date range using the datatable:

DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");


But get the following error:

An unhandled exception of type 'System.Data.EvaluateException' occurred in System.Data.dll

Additional information: Cannot perform '>=' operation on System.String and System.Double.


The last attempt was to try and see if i can strip the date from within the loop.

Code used:

DateTime dDate;
row[cell.Start.Column - 1] = cell.Text;
string dt = cell.Text.ToString();

if (DateTime.TryParse(dt, out dDate))
{
DateTime dts = Convert.ToDateTime(dt);
}

DateTime date1 = new DateTime(2016, 7, 1);

if (dDate >= date1)
{
Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
}


This sort of works but just lists the set dates and not there values, which is understandable, If i take this route how would i get the dates with there values?

Output:

29/07/2016
02/08/2016
02/08/2016
03/08/2016


The full code example used:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.OleDb;
using System.Text.RegularExpressions;
using Microsoft.Office.Interop.Excel;
using System.Data;
using System.IO;

namespace Number_Cleaner
{
public class NumbersReport
{

//ToDo: Look in to fixing the code so it filters the date correctly with the right output data.
public System.Data.DataTable GetDataTableFromExcel(string path, bool hasHeader = true)
{
using (var pck = new OfficeOpenXml.ExcelPackage())
{
using (var stream = File.OpenRead(path))
{
pck.Load(stream);
}
var ws = pck.Workbook.Worksheets.First();
System.Data.DataTable tbl = new System.Data.DataTable();
foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
{
tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
}
var startRow = hasHeader ? 2 : 1;
for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
DataRow row = tbl.Rows.Add();
foreach (var cell in wsRow)
{

DateTime dDate;
row[cell.Start.Column - 1] = cell.Text;
string dt = cell.Text.ToString();
//Console.WriteLine(dt);

if (DateTime.TryParse(dt, out dDate))
{
DateTime dts = Convert.ToDateTime(dt);
}

DateTime date1 = new DateTime(2016, 7, 1);

if (dDate >= date1)
{
Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
}

//Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
}
}
//var rowsOfInterest = tbl.AsEnumerable()
// .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
//.ToList();
//Console.WriteLine(tbl);
//DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");

return tbl;
}
}


Modified from: How to match date to row then get the final column value using EPPlus?

Answer

Based on your code, you are storing everything in your DataTable as strings by calling cell.Text. But using that you are loosing valuable information - the cell data type. You are much better off using cell.Value which will either be a string or a double. With Excel, dates, integers, and decimal values are all stored as doubles.

The error you are seeing has to do with the fact that you store the values as string but query them like DateTime here:

.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))

and here:

"'Start date' >= #1/7/2016#"

If you look at my post here: How to parse excel rows back to types using EPPlus you will see the helper function ConvertSheetToObjects which deals pretty much with what you are trying to do. With a little modification we can turn it into something that takes a WorkSheet and converts it to a DataTable. Like the Object converstion method you should still provide it the expected structure in the form as a DataTable passed it rather then having it try to guess it by casting cell values:

public static void ConvertSheetToDataTable(this ExcelWorksheet worksheet, ref DataTable dataTable)
{
    //DateTime Conversion
    var convertDateTime = new Func<double, DateTime>(excelDate =>
    {
        if (excelDate < 1)
            throw new ArgumentException("Excel dates cannot be smaller than 0.");

        var dateOfReference = new DateTime(1900, 1, 1);

        if (excelDate > 60d)
            excelDate = excelDate - 2;
        else
            excelDate = excelDate - 1;
        return dateOfReference.AddDays(excelDate);
    });

    //Get the names in the destination TABLE
    var tblcolnames = dataTable
        .Columns
        .Cast<DataColumn>()
        .Select(dcol => new {Name = dcol.ColumnName, Type = dcol.DataType})
        .ToList();

    //Cells only contains references to cells with actual data
    var cellGroups = worksheet.Cells
        .GroupBy(cell => cell.Start.Row)
        .ToList();

    //Assume first row has the column names and get the names of the columns in the sheet that have a match in the table
    var colnames = cellGroups
        .First()
        .Select((hcell, idx) => new { Name = hcell.Value.ToString(), index = idx })
        .Where(o => tblcolnames.Select(tcol => tcol.Name).Contains(o.Name))
        .ToList();


    //Add the rows - skip the first cell row
    for (var i = 1; i < cellGroups.Count(); i++)
    {
        var cellrow = cellGroups[i].ToList();
        var tblrow = dataTable.NewRow();
        dataTable.Rows.Add(tblrow);

        colnames.ForEach(colname =>
        {
            //Excel stores either strings or doubles
            var cell = cellrow[colname.index];
            var val = cell.Value;
            var celltype = val.GetType();
            var coltype = tblcolnames.First(tcol => tcol.Name ==  colname.Name).Type;

            //If it is numeric it is a double since that is how excel stores all numbers
            if (celltype == typeof(double))
            {
                //Unbox it
                var unboxedVal = (double)val;

                //FAR FROM A COMPLETE LIST!!!
                if (coltype == typeof (int))
                    tblrow[colname.Name] = (int) unboxedVal;
                else if (coltype == typeof (double))
                    tblrow[colname.Name] = unboxedVal;
                else
                    throw new NotImplementedException($"Type '{coltype}' not implemented yet!");
            }
            else if (coltype == typeof (DateTime))
            {
                //Its a date time
                tblrow[colname.Name] = val;
            }
            else if (coltype == typeof (string))
            {
                //Its a string
                tblrow[colname.Name] = val;
            }
            else
            {
                throw new DataException($"Cell '{cell.Address}' contains data of type {celltype} but should be of type {coltype}!");
            }
        });

    }

}

To use it on something like this:

enter image description here

You would run this:

[TestMethod]
public void Sheet_To_Table_Test()
{
    //http://stackoverflow.com/questions/38915006/stripping-data-from-a-epplus-output-from-a-date-range

    //Create a test file
    var fi = new FileInfo(@"c:\temp\Sheet_To_Table.xlsx");

    using (var package = new ExcelPackage(fi))
    {
        var workbook = package.Workbook;
        var worksheet = workbook.Worksheets.First();

        var datatable = new DataTable();
        datatable.Columns.Add("Col1", typeof(int));
        datatable.Columns.Add("Col2", typeof(string));
        datatable.Columns.Add("Col3", typeof(double));
        datatable.Columns.Add("Col4", typeof(DateTime));

        worksheet.ConvertSheetToDataTable(ref datatable);

        foreach (DataRow row in datatable.Rows)
            Console.WriteLine(
                $"row: {{Col1({row["Col1"].GetType()}): {row["Col1"]}" +
                $", Col2({row["Col2"].GetType()}): {row["Col2"]}" +
                $", Col3({row["Col3"].GetType()}): {row["Col3"]}" +
                $", Col4({row["Col4"].GetType()}):{row["Col4"]}}}");

        //To Answer OP's questions
        datatable
            .Select("Col4 >= #01/03/2016#")
            .Select(row => row["Col1"])
            .ToList()
            .ForEach(num => Console.WriteLine($"{{{num}}}"));
    }
}

Which gives this in the output:

row: {Col1(System.Int32): 12345, Col2(System.String): sf, Col3(System.Double): 456.549, Col4(System.DateTime):1/1/2016 12:00:00 AM}
row: {Col1(System.Int32): 456, Col2(System.String): asg, Col3(System.Double): 165.55, Col4(System.DateTime):1/2/2016 12:00:00 AM}
row: {Col1(System.Int32): 8, Col2(System.String): we, Col3(System.Double): 148.5, Col4(System.DateTime):1/3/2016 12:00:00 AM}
row: {Col1(System.Int32): 978, Col2(System.String): wer, Col3(System.Double): 668.456, Col4(System.DateTime):1/4/2016 12:00:00 AM}
{8}
{978}