Danilo Piazzalunga Danilo Piazzalunga - 3 months ago 9
Bash Question

Merge two text files by upserting records (update existing ones, insert new ones)

I am looking for a UNIX shell solution to perform the equivalent of a SQL

MERGE
(or
UPSERT
) between two text files, where some fields are key fields, and other fields can be updated/overwritten.

Input data

This is my original data:

AA;123;2016-01-31;1;456.53
AA;123;2016-02-01;1;75.24
AB;123;2000-08-08;1;756.1
AB;456;2016-07-07;2;8.24
CC;123;2007-07-21;15;10.34
CC;456;2009-09-09;9;943.65
CC;789;2005-04-23;1;1345.6


where the first three fields (Product, Customer and Date) are key fields, and the last two fields (Quantity and Amount) can be updated.

This is my second file which only contains new and updated data:

AA;123;2016-01-31;7;983.63
AA;123;2016-08-24;17;1687.73
CC;456;2009-09-09;11;2161.65
DD;91;2016-08-03;5;98.48


Both files have been sorted with:

sort -t';' -k1 -k2 -k3


The first and the third records should overwrite existing lines (updating Quantity and Amount), whereas the second and the third records should be inserted as new lines.

Desired output

AA;123;2016-01-31;7;983.63
AA;123;2016-08-24;17;1687.73
AA;123;2016-02-01;1;75.24
AB;123;2000-08-08;1;756.1
AB;456;2016-07-07;2;8.24
CC;123;2007-07-21;15;10.34
CC;456;2009-09-09;11;2161.65
CC;789;2005-04-23;1;1345.6
DD;91;2016-08-03;5;98.48


I am looking for some quick solution using
sort
,
uniq
or
awk
and
perl
.

Answer

You want to overwrite the lines in file1 when the index on 1st, 2nd, 3rd fields matches in the file2. That is, the file2 has "preference" over file1, so a line in file1 is just printed if it does not have its correspondence in file2.

If so, what about printing all two files, starting by file2, and skipping those lines that are repeated file1-file2-file3-wise? Then, pipe to sort for a sorted output:

$ awk -F";" '!seen[$1, $2, $3]++' f2 f1 | sort
AA;123;2016-01-31;7;983.63
AA;123;2016-02-01;1;75.24
AA;123;2016-08-24;17;1687.73
AB;123;2000-08-08;1;756.1
AB;456;2016-07-07;2;8.24
CC;123;2007-07-21;15;10.34
CC;456;2009-09-09;11;2161.65
CC;789;2005-04-23;1;1345.6
DD;91;2016-08-03;5;98.48