Ajim Bagwan - 4 months ago
Bash Question

How to delete non-contiguous duplicate lines in vi without sorting?

I know how to remove contiguous duplicates in vi. Either

:%!uniq


or

:g/^\(.*\)$\n\1$/d
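Both of these only look at neighbouring lines. A quick shell check shows that uniq keeps a repeated line when another line sits between the copies, which is why it does not solve the problem below:

```shell
#!/bin/sh
# uniq compares each line only with the line directly before it,
# so the last "a" survives because "b" separates it from the others.
printf '%s\n' a a b a | uniq
# prints a, b, a (one per line)
```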


But I have a file whose data is in random order, and some duplicate lines are scattered all over the file. How do I remove all of these duplicates without disturbing the order of the lines? The first occurrence of each line should be kept and all later duplicates removed.

For example:

$ cat file1

Here's looking at you, Kid.
Casablanca
Here's looking at you, Kid.
Go ahead, make my day.
Dirty Harry
sleep 5
Go ahead, make my day.
Yippee-ki-yay


Output should be:

Here's looking at you, Kid.
Casablanca
Go ahead, make my day.
Dirty Harry
sleep 5
Yippee-ki-yay

Answer

There is a very handy awk one-liner for this:

$ awk '!a[$0]++' file
Here's looking at you, Kid.
Casablanca
Go ahead, make my day.
Dirty Harry
sleep 5
Yippee-ki-yay

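It keeps track of the lines already processed in the associative array a[], using the whole line ($0) as the key. The post-increment a[$0]++ returns the old count, so the first time a line is seen it returns 0, !0 is true, and awk's default action prints the line. Whenever the line comes again, the counter is already positive, so the condition is false and the line is not printed.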
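Written out longhand, the one-liner behaves like the sketch below. It is only meant to make the logic readable: the array name seen and the shell function name dedup are illustrative choices, not anything from the original answer.

```shell
#!/bin/sh
# dedup: print each input line the first time it appears, keeping input order.
# Hypothetical helper name; same logic as awk '!a[$0]++'.
dedup() {
    awk '
    {
        # seen[$0] is uninitialised (compares equal to 0) on first sight
        if (seen[$0] == 0) {
            print
        }
        # count this line so later copies are skipped
        seen[$0]++
    }' "$@"
}

printf '%s\n' "Go ahead, make my day." "Dirty Harry" "Go ahead, make my day." | dedup
```

Like the one-liner, it reads files given as arguments or standard input, and a line counts as a duplicate only if it matches exactly, including whitespace.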

If you want to run it in vim, do:

:%!awk '\!a[$0]++'
        ^^
       escape the ! or vim replaces it with the previous external command
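Outside vim, awk only writes to standard output, so changing the file itself (as :%! does for the buffer) needs a temporary file. A minimal sketch, assuming mktemp is available; dedup_inplace is a hypothetical helper name:

```shell
#!/bin/sh
# dedup_inplace FILE: drop repeated lines from FILE, keeping the first
# occurrence of each line and the original order.
dedup_inplace() {
    tmp=$(mktemp) || return 1
    # Filter into the temporary file first; redirecting straight back
    # into FILE would truncate it before awk reads it.
    awk '!a[$0]++' "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    # cat (rather than mv) keeps FILE's own permissions and inode.
    rm -f "$tmp"
    return "$status"
}
```

Usage: dedup_inplace file1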