JourneyMan JourneyMan - 2 months ago 8
Linux Question

Changing format from %d/%m/%y to %Y-%m-%d in a csv file using AWK on Linux (UBUNTU)

I have hundreds of csv files with date format as %d / %m / %y %H:%M:%S but i want to change them to the format %Y-%m-%d %H:%M:%S

INPUT_FILE.csv (date format == %d/%m/%y %H:%M:%S )
13/05/87 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
14/05/87 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
15/05/87 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
16/05/87 4:03:00,1.27520,1.27570,1.27490,1.27550,103202,356
17/05/87 4:04:00,1.27550,1.27640,1.27510,1.27590,103528,326
......
......
......
24/02/09 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
25/02/09 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
26/02/09 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
27/02/09 4:03:00,1.27520,1.27570,1.27490,1.27550,103202,356
28/02/09 4:04:00,1.27550,1.27640,1.27510,1.27590,103528,326


REQUIRED_OUTPUT.csv (date format == %Y-%m-%d %H:%M:%S )

1987-05-13 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
1987-05-14 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
1987-05-15 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
1987-05-16 4:03:00,1.27520,1.27570,1.27490,1.27550,103202,356
1987-05-17 4:04:00,1.27550,1.27640,1.27510,1.27590,103528,326
......
......
......
2009-02-24 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
2009-02-25 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
2009-02-26 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
2009-02-27 4:03:00,1.27520,1.27570,1.27490,1.27550,103202,356
2009-02-28 4:04:00,1.27550,1.27640,1.27510,1.27590,103528,326


I have tried a couple of AWK variations but i could not get it to work . any help

Update: my mistake , i should have mentioned that the dates or years start from 1981 to 2016

here is what i have tried so far:

awk -F, '{ gsub("/","-"); split($1, f, " "); print > ("my_data_" f[1]"v" ".csv")}' INPUT_FILE.csv


i get files split into e.g

my_data_13-05-87v.csv
my_data_14-05-87v.csv
my_data_15-05-87v.csv


with file contents as below

# for my_data_13-05-87v.csv
13-05-87 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356

# for my_data_14-05-87v.csv
14-05-87 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493

# for my_data_15-05-87v.csv
15-05-87 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427


NOTE: Years go from 1981 to 2016

I want to split the files into e.g

my_data_1987-05-13v.csv
my_data_1987-05-13v.csv
my_data_1987-05-13v.csv


with file contents as below

# for my_data_1987-05-13v.csv
1987-05-13 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356

# for my_data_1987-05-14v.csv
1987-05-14 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493

# for my_data_1987-05-15v.csv
1987-05-15 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427

Answer

You only need to redefine the input field separator to / and and reorder the first three fields. Also, if the year field has a value >16, assume 20th century, otherwise 21st. And while at it, it writes the lines to files named per dates:

$ cat script.awk
{
    print ($3>16?"19":"20") $3 "-" $2 "-" $1, $4 > my_data_$1"-"$2"-"$3".csv"
}

Run it:

$ awk -F'[/ ]' -f script.awk INPUT_FILE.csv
Comments