Bash Question

Find duplicate entries in a text file using shell

I am trying to find duplicate *.sh entries mentioned in a text file (test.log) and delete them using a shell program. Because the full paths differ, uniq -u treats every line as unique and still prints both first_prog.sh entries even though the filename is duplicated.

cat test.log
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/first_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh


Expected output:

/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh


I tried a couple of ways with a few commands, but I have no idea how to get the above output.

rev test.log | cut -f1 -d/ | rev | sort | uniq -d


Any clue on this?

Answer

awk shines for this kind of task, but here is a non-awk solution: the first sed inserts a space after the last /, sort -k2 -u then keeps only one line per filename, and the second sed removes the space again.

$ sed 's|.*/|& |' file | sort -k2 -u | sed 's|/ |/|'

/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
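
For completeness, a sketch of the awk version: split each line on / and print it only the first time its last field (the filename) is seen, which also preserves the original input order. This assumes any POSIX awk; the array name seen is arbitrary.

$ awk -F/ '!seen[$NF]++' test.log

/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh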

Alternatively, if your paths are balanced (the same number of path components for every file), you can sort directly on the filename field; because of the leading /, the basename is the sixth /-separated field:

$ sort -t/ -k6 -u file

/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
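
If you also need to remove the duplicates from test.log itself (the question asks to delete them), a minimal sketch is to write the filtered result to a temporary file and move it back over the original; the temp-file name here is just an example:

$ awk -F/ '!seen[$NF]++' test.log > test.log.tmp && mv test.log.tmp test.log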