Graph4Me - 1 year ago
Linux Question

Find common files between two folders

Given two root folders A and B, how can I find duplicate text files between the subfolders of A and those of B?

In other words, I am considering the intersection of files from A and B.

I don't want to find duplicate files within A or within B, only files that are in both A and B.


By duplicate I mean files with the same content.

Answer Source

As suggested in the comments, I would generate an MD5 checksum for each file, just once, and then look for checksums that appear in both lists.

Something like this:

find DirA -name \*.txt -exec md5sum {} +  > /tmp/a
find DirB -name \*.txt -exec md5sum {} +  > /tmp/b

Now find all those checksums that occur in both files.

So, along these lines:

awk 'FNR==NR{md5[$1];next}$1 in md5' /tmp/[ab]

or maybe like this:

awk 'FNR==NR{s=$1;md5[s];$1="";name[s]=$0;next}$1 in md5{s=$1;$1="";print name[s] " : " $0}' /tmp/[ab]