Leandro Jimenez Leandro Jimenez - 4 months ago 7
Linux Question

How to exclude rows in which column one and two have the same value?

I have a file with some rows with the same name (col 1, eg gph ans stp) but in the col 2 maximum value and exclude the others rows in linux (shell, eg awk grep perl)

For example:

col1 | col2 | col3 | col4
------+------+--------+-----------
ghp | 25 | data1 | otherdata1
ghp | 23 | data2 | otherdata2
ghp | 23 | data3 | otherdata3
jkl | 58 | data4 | otherdata4
stp | 42 | data5 | otherdata5
stp | 41 | data6 | otherdata6
stp | 41 | data7 | otherdata7


result

col1 | col2 | col3 | col4
------+------+--------+-----------
ghp | 25 | data1 | otherdata2
jkl | 58 | data4 | otherdata4
stp | 42 | data5 | otherdata7


In the same file I have some rows with the same name (col 1, eg ghi ans stu) and in the col 2 the same values, I want to exclude these rows in linux (shell, eg awk grep perl)

For example

col1 | col2 | col3 | col4
------+------+--------+-----------
ghi | 23 | data1 | otherdata1
ghi | 23 | data2 | otherdata2
ghi | 23 | data3 | otherdata3
jkl | 58 | data4 | otherdata4
stu | 41 | data5 | otherdata5
stu | 41 | data6 | otherdata6
stu | 41 | data7 | otherdata7


Result:

col1 | col2 | col3 | col4
------+------+--------+-----------
jkl | 58 | data4 | otherdata4


Thanks

Answer

You can use this awk command:

awk 'NR<=2{print; next} $2>max[$1]{max[$1]=$2; a[$1]=$0} END{for (i in a) print a[i]}' file

col1 | col2 | col3   | col4
------+------+--------+-----------
ghp  | 25   | data1  | otherdata1
jkl  | 58   | data4  | otherdata4
stp  | 42   | data5  | otherdata5
Comments