Ruthus99 - 1 month ago
Python Question

Using glob to find duplicate filenames with the same number in it

I am currently writing a script that cycles through all the files in a folder and renames them according to a naming convention.

What I would like to achieve is the following: if the script finds two files that have the same number in the filename (e.g. '101 test' and '101 real'), it should move both files to a different folder named 'duplicates'.

My original plan was to use glob to cycle through all the files in the folder and add every file containing a certain number to a list. The length of the list would then be checked, and if it exceeded 1 (i.e. there are at least two files with the same number), the files would be moved to this 'duplicates' folder. However, for some reason this does not work.

Here is my code. I was hoping someone with more experience than me could give me some insight into how to achieve my goal. Thanks!

app = askdirectory(parent=root)

for x in range(804):
    listofnames = []
    real = os.path.join(app, '*{}*').format(x)
    for name in glob.glob(real):
        listofnames.append(name)
    y = len(listofnames)
    if y > 1:
        for names in listofnames:
            path = os.path.join(app, names)
            shutil.move(path,app + "/Duplicates")

Answer

A simple way is to collect filenames with numbers in a structure like this:

numbers = {
     101: ['101 test', '101 real'],
     93: ['hugo, 93']
}

If a list in this dict holds more than one filename, move those files.

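import os
import re
import shutil
from collections import defaultdict

# askdirectory comes from tkinter.filedialog (tkFileDialog on Python 2);
# root is the Tk root window created elsewhere in the script
app = askdirectory(parent=root)
# a magic dict: a missing key starts out as an empty list
numbers = defaultdict(list)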

# list all files in this dir
for filename in os.listdir(app):
    # \d+ matches the first run of one or more digits in the name
    match = re.search(r'\d+', filename)

    if match is None:
        # no digits found
        continue

    # extract the number
    number = int(match.group())

    # defaultdict magic
    numbers[number].append(filename)

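duplicates = os.path.join(app, "Duplicates")

for number, filenames in numbers.items():
    if len(filenames) < 2:
        # not a dupe
        continue
    # make sure the target folder exists; otherwise shutil.move would
    # rename the first file to "Duplicates" instead of moving it into a folder
    if not os.path.isdir(duplicates):
        os.makedirs(duplicates)
    for filename in filenames:
        shutil.move(os.path.join(app, filename), duplicates)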
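The same grouping can also be wrapped in two functions so it can be tried on any folder without a file dialog. This is only a minimal sketch under a few assumptions: the names group_by_number and move_duplicates are made up for illustration, sub-folders are skipped, and the Duplicates folder is created if it is missing.

```python
import os
import re
import shutil
from collections import defaultdict


def group_by_number(filenames):
    """Map the first number found in each name to the names that contain it."""
    groups = defaultdict(list)
    for filename in filenames:
        match = re.search(r'\d+', filename)
        if match is None:
            continue  # no digits in this name
        groups[int(match.group())].append(filename)
    return groups


def move_duplicates(folder, target_name="Duplicates"):
    """Move files sharing a number into folder/target_name; return the moved names."""
    target = os.path.join(folder, target_name)
    # look at regular files only, so the target folder itself is never moved
    names = [n for n in os.listdir(folder)
             if os.path.isfile(os.path.join(folder, n))]
    moved = []
    for number, filenames in group_by_number(names).items():
        if len(filenames) < 2:
            continue  # not a dupe
        os.makedirs(target, exist_ok=True)
        for filename in filenames:
            shutil.move(os.path.join(folder, filename),
                        os.path.join(target, filename))
            moved.append(filename)
    return sorted(moved)
```

Calling move_duplicates(app) would then do the whole job. Note that a glob pattern such as '*1*' matches any name containing the digit 1 anywhere (including '101 test'), whereas the regex takes the whole first run of digits. Because the number is converted with int, '007' and '7' count as the same number.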

The defaultdict magic is just shorthand for the following code with a plain dict:

    if number not in numbers:
        numbers[number] = []
    numbers[number].append(filename)
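To see what defaultdict saves, here is a small side-by-side comparison on made-up sample data (the pairs list is purely illustrative). It builds the same grouping with a defaultdict, with a plain dict and an explicit membership check, and with dict.setdefault, which is another common way to get the same effect.

```python
from collections import defaultdict

# (number, filename) pairs as the loop over the folder would produce them
pairs = [(101, '101 test'), (101, '101 real'), (93, 'hugo, 93')]

# 1) defaultdict: a missing key is created as an empty list on first access
with_default = defaultdict(list)
for number, filename in pairs:
    with_default[number].append(filename)

# 2) plain dict: create the empty list by hand before appending
plain = {}
for number, filename in pairs:
    if number not in plain:
        plain[number] = []
    plain[number].append(filename)

# 3) plain dict with setdefault: returns the existing list or inserts a new one
with_setdefault = {}
for number, filename in pairs:
    with_setdefault.setdefault(number, []).append(filename)
```

All three end up with the same contents, so the choice is mostly a matter of taste.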