Unnikrishnan Unnikrishnan - 16 days ago 6
C# Question

Why Dates appearing differently when opened in .CSV file

I have hundreds of lines of data from a text file which is captured using Regex MatchCollection and outputting the same to as a comma demlimited (csv) file for subsequent examination in excel.

My Regex Expression is as follows:-

Regex Line3 = new Regex(@"(?<one>[0-9]{2}-[0-9]{2}-[0-9]{2})\s{1,20}114B\s{1,15}(?<two>\d{1,11})\s{1,15}(?<three>\d{1,11})\s{1,15}(?<four>\d{1,11})\s{1,30}(?<five>\d{1,11})");//<one> catpures the date data.

MatchCollection matches = Line3.Matches(line1);
foreach (Match m in matches)
{
Writer1.WriteLine("")//
//Writer1.Write(line1.Substring(1, 27) + ","); //Do not consider this.
Writer1.Write(m.Groups["one"].Value + ",");
Writer1.Write(m.Groups["two"].Value + ",");
Writer1.Write(m.Groups["three"].Value + ",");
Writer1.Write(m.Groups["four"].Value + ",");
Writer1.Write(m.Groups["five"].Value + ",");
}


My tex file will always contain uniform data that matches with the Regular Expression and my programme is beautifully capturing the required information thanks to the Regular Expressions designers genius.

But when I open the csv file in excel (double clicking the .csv), the column containing the date data appears irregularly like below.

12-04-2012,0,0,0,0, //appears right-aligned in excel.
12-04-2012,0,0,0,0, //this is how it looks like in Editpad Lite.
12-04-2012,0,0,0,0, // these dashes appears in excel as as 12/4/2012
12-04-2012,0,0,0,0, //next five lines as well.
12-04-2012,0,0,0,0,
12-04-2012,0,0,0,0, //
12-04-2012,5467,757488,846815,0,
13-04-12,0,0,0,0, //appears left aligned in excel.
13-04-12,0,0,0,0,
20-04-12,0,0,500,0,
21-04-12,1740,17905,17900,0,
21-04-12,0,0,0,0,
24-04-12,1466,31666,31420,0,


My input file looks like.

12-04-12 114B 0 0 0 0
12-04-12 114B 0 0 0 0
12-04-12 114B 0 0 0 0
12-04-12 114B 0 0 0 0
12-04-12 114B 0 0 0 0
12-04-12 114B 0 0 0 0
12-04-12 114B 5467 757488 846815 0
13-04-12 114B 0 0 0 0
13-04-12 114B 0 0 0 0
20-04-12 114B 0 0 500 0
21-04-12 114B 1740 17905 17900 0
21-04-12 114B 0 0 0 0
24-04-12 114B 1466 31666 31420 0


The output is very much uniform when I examined the .csv file with notepad. The problem crops up only whe I open the csv file in excel.

Could anyone of you help solve the reason for the inconsistency?.

Answer

If you output the dates in yyyy-MM-dd format then Excel should parse them as dates.

You can do that by converting the text dd-MM-yy into a DateTime using an appropriate CultureInfo, it is then easy to write out the date in yyyy-MM-dd format.

using System;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {

            string inFile = @"C:\temp\sampledata.txt";
            string outFile = @"C:\temp\sampledata.csv";

            //<one> catpures the date data:
            Regex re = new Regex(@"(?<one>[0-9]{2}-[0-9]{2}-[0-9]{2})\s{1,20}114B\s{1,15}(?<two>\d{1,11})\s{1,15}(?<three>\d{1,11})\s{1,15}(?<four>\d{1,11})\s{1,30}(?<five>\d{1,11})");

            using (var sr = new StreamReader(inFile))
            {
                using (var sw = new StreamWriter(outFile))
                {
                    string line1;
                    DateTime dt;
                    var ci = new CultureInfo("ur-PK");
                    while (!sr.EndOfStream)
                    {
                        line1 = sr.ReadLine();
                        MatchCollection matches = re.Matches(line1);
                        foreach (Match m in matches)
                        {
                            dt = DateTime.Parse(m.Groups["one"].Value, ci);
                            sw.Write(dt.ToString("yyyy-MM-dd") + ",");
                            sw.Write(m.Groups["two"].Value + ",");
                            sw.Write(m.Groups["three"].Value + ",");
                            sw.Write(m.Groups["four"].Value + ",");
                            sw.Write(m.Groups["five"].Value + Environment.NewLine);
                        }
                    }
                }
            }
        }
    }
}

I used "ur-PK" as you mentioned that your input file date format is what is used in India and Pakistan, but there are several -IN codes to choose from and I do not know if one of them could be incorrect for your use.

Output using the sample data you showed:

2012-04-12,0,0,0,0
2012-04-12,0,0,0,0
2012-04-12,0,0,0,0
2012-04-12,0,0,0,0
2012-04-12,0,0,0,0
2012-04-12,0,0,0,0
2012-04-12,5467,757488,846815,0
2012-04-13,0,0,0,0
2012-04-13,0,0,0,0
2012-04-20,0,0,500,0
2012-04-21,1740,17905,17900,0
2012-04-21,0,0,0,0
2012-04-24,1466,31666,31420,0

When you open the csv file in Excel, it should recognise "2012-04-12" etc. as dates regardless of the Windows date format settings. I do not have Excel to test with.

It should then display the dates in the Windows short date format setting.