sigmazen sigmazen - 1 year ago 58
Bash Question

unix awk subtracting integer field from month of date field

I have a file with around 10MM records. Here is my dateSample src file:

0000000566 2017/01/01 0
0000000055 2017/01/01 0
0000000109 2017/01/01 1
0000000940 2017/01/01 0
0000000566 2017/01/01 1
0000000055 2017/01/01 1
0000000109 2017/01/01 2


I essentially need to subtract the last integer value off of the month in the date and print the new value without the integer, thus:

0000000566 2017/01/01
0000000055 2017/01/01
0000000109 2016/12/01
0000000940 2017/01/01
0000000566 2016/12/01
0000000055 2016/12/01
0000000109 2016/11/01


I've been having real troubles with date (or gdate on macOS whilst testing) and been searching in vain for the last couple of days.

It's either prefixing with a zero and dropping the m and d values:

awk '{ print (gdate -d $2 +"%Y/%m/%d") }' <$src


or suffixing with a zero and subtracting the integer from the year:

awk '{ print (gdate -d $2 +-$3 months +"%Y/%m/%d") }' <$src


or mashing the whole thing together still isn't correct:

awk '{ print gdate -d (gdate -d $2 +"%Y/%m/%d") +-$3 months +"%Y/%m/%d" }' <$src


I found the following excellent response:
Increment date with AWK for few days and months
which is doing exactly what I want, but it is running very very slow which I'm assuming is because of a command within a command.

Here is current awk (I'm using gdate because I'm running on macOS BSD for now):

awk '{ cmd=" gdate -d \"$(gdate -d \""$2"\")+\"-"$3"\"months\" \"+%Y/%m/%d\" ";
cmd | getline fmtDate; close(cmd);
print $1, fmtDate
}' <$src


So I basically need that output in a performant way.

Thanks in advance for any guidance / rewrite.

Cheers

Answer Source

If your awk supports time functions mktime and strftime (which are a GNU extension), you can simply do it like this:

awk -F'[ /]' '{print $1 " " strftime("%Y/%m/%d", mktime($2" "($3-$5)" "$4" 0 0 0"))}' file

First we convert the date into a Unix timestamp. mktime accepts dates only in "YYYY MM DD HH MM SS" format, that's why we need to construct it manually. But it does the normalization automatically, and it will happily convert "2017 -1 1 0 0 0" to the same timestamp as "2016 11 1 0 0 0".

After that we just need to convert the timestamp to "y/m/d" format and print it.


Or, you could do the date arithmetic "by hand" in the simple case that doesn't require date normalization -- if the day of the month is always <= 28. (For days greater that 28, like 31, you would also need to add clipping/clamping or overflowing to the script below, but then you would have to take care of leap years, etc.)

#!/usr/bin/awk -f

BEGIN {
    FS = "[ /]";
}

{
    mm = $2 * 12 + ($3 - 1) - $5;
    y = int(mm / 12);
    m = mm % 12 + 1;
    d = $4;
    printf("%s %04d/%02d/%02d\n", $1, y, m, d);
}

So, the idea is simple. We split the line on spaces and slashes, so we can convert year/month into a total number of months (12 * y + m). Then we subtract the month from the last column and convert the total number of months back to year/month via divmod operation.

Output:

$ ./script.awk file
0000000566 2017/01/01
0000000055 2017/01/01
0000000109 2016/12/01
0000000940 2017/01/01
0000000566 2016/12/01
0000000055 2016/12/01
0000000109 2016/11/01
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download