Bash Question

Sum the first column after removing duplicate rows

My file content is shown below. I want the sum of the first column's values, but the 5th column (the timestamp) contains duplicates, and not every second is present. Each timestamp should be counted only once; missing seconds don't matter. What matters is that each one is UNIQUE.

18 /traffic-2.log00980-####<Aug 7, 2016 11:37:34 PM EEST
13 /traffic-2.log00980-####<Aug 7, 2016 11:37:37 PM EEST
10 /traffic-2.log00980-####<Aug 7, 2016 11:37:38 PM EEST
11 /traffic-2.log00980-####<Aug 7, 2016 11:37:39 PM EEST
18 /traffic-2.log00980-####<Aug 7, 2016 11:37:40 PM EEST
12 /traffic-2.log00980-####<Aug 7, 2016 11:37:41 PM EEST
10 /traffic-2.log00980-####<Aug 7, 2016 11:37:42 PM EEST
18 /traffic-2.log00980-####<Aug 7, 2016 11:37:43 PM EEST
11 /traffic-2.log00980-####<Aug 7, 2016 11:37:44 PM EEST
13 /traffic-2.log00980-####<Aug 7, 2016 11:37:45 PM EEST
18 /traffic-2.log00980-####<Aug 7, 2016 11:37:43 PM EEST
11 /traffic-2.log00980-####<Aug 7, 2016 11:37:44 PM EEST
13 /traffic-2.log00980-####<Aug 7, 2016 11:37:45 PM EEST
12 /traffic-2.log00980-####<Aug 7, 2016 11:37:46 PM EEST
13 /traffic-2.log00980-####<Aug 7, 2016 11:37:47 PM EEST
11 /traffic-2.log00980-####<Aug 7, 2016 11:37:48 PM EEST
17 /traffic-2.log00980-####<Aug 7, 2016 11:37:49 PM EEST
12 /traffic-2.log00980-####<Aug 7, 2016 11:37:50 PM EEST
13 /traffic-2.log00980-####<Aug 7, 2016 11:37:51 PM EEST
9 /traffic-2.log00980-####<Aug 7, 2016 11:37:54 PM EEST
9 /traffic-2.log00980-####<Aug 7, 2016 11:37:55 PM EEST
13 /traffic-2.log00980-####<Aug 7, 2016 11:37:56 PM EEST
12 /traffic-2.log00980-####<Aug 7, 2016 11:37:57 PM EEST
11 /traffic-2.log00980-####<Aug 7, 2016 11:37:58 PM EEST
7 /traffic-2.log00980-####<Aug 7, 2016 11:37:59 PM EEST
10 /traffic-2.log00980-####<Aug 7, 2016 11:38:00 PM EEST
10 /traffic-2.log00980-####<Aug 7, 2016 11:38:01 PM EEST
9 /traffic-2.log00980-####<Aug 7, 2016 11:37:55 PM EEST
13 /traffic-2.log00980-####<Aug 7, 2016 11:37:56 PM EEST
12 /traffic-2.log00980-####<Aug 7, 2016 11:37:57 PM EEST
11 /traffic-2.log00980-####<Aug 7, 2016 11:37:58 PM EEST
7 /traffic-2.log00980-####<Aug 7, 2016 11:37:59 PM EEST
10 /traffic-2.log00980-####<Aug 7, 2016 11:38:00 PM EEST
10 /traffic-2.log00980-####<Aug 7, 2016 11:38:01 PM EEST
10 /traffic-2.log00980-####<Aug 7, 2016 11:38:02 PM EEST
15 /traffic-2.log00980-####<Aug 7, 2016 11:38:03 PM EEST
13 /traffic-2.log00980-####<Aug 7, 2016 11:38:04 PM EEST

Answer

Using awk:

awk '!seen[$5]++' yourFile | awk '{ sum += $1 } END { print sum }'
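
With the sample data above, this prints 331.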

-the first awk keeps only the first row for each distinct value in the 5th column (the timestamp), dropping later duplicates

-the second awk sums the first field of the remaining rows and prints the total
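
The two commands can also be combined into a single awk pass; a minimal sketch, assuming the input file is named yourFile:

awk '!seen[$5]++ { sum += $1 } END { print sum }' yourFile

Here seen[$5] counts how many times each 5th-column value has appeared, so the action runs only on a timestamp's first occurrence and each second is summed exactly once.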
