Ruthus99 - 1 year ago
Python Question

Using glob to find duplicate filenames with the same number in it

I am currently writing a script that cycles through all the files in a folder and renames them according to a naming convention.

What I would like to achieve is the following: if the script finds two files that have the same number in the filename (e.g. '101 test' and '101 real'), it should move those two files to a different folder named 'duplicates'.

My original plan was to use glob to cycle through all the files in the folder and add every file containing a certain number to a list. I would then check the length of the list, and if it exceeded 1 (i.e. there are 2 files with the same number), the files would be moved to this 'duplicates' folder. However, for some reason this does not work.

Here is my code. I was hoping someone with more experience than me could give me some insight into how to achieve my goal. Thanks!

app = askdirectory(parent=root)

for x in range(804):
    listofnames = []
    real = os.path.join(app, '*{}*').format(x)
    for name in glob.glob(real):
        y = len(listofnames)
        if y > 1:
            for names in listofnames:
                path = os.path.join(app, names)
                shutil.move(path,app + "/Duplicates")

Answer Source

A simple way is to collect filenames with numbers in a structure like this:

numbers = {
     101: ['101 test', '101 real'],
     93: ['hugo, 93']
}

and if a list in this dict holds more than one filename, do the move.

import os
import re
import shutil
from collections import defaultdict

app = askdirectory(parent=root)
# a magic dict
numbers = defaultdict(list)

# list all files in this dir
for filename in os.listdir(app):
    # \d+ means a decimal number of any length
    match = re.search(r'\d+', filename)

    if match is None:
        # no digits found
        continue

    # extract the number
    number = int(match.group())

    # defaultdict magic
    numbers[number].append(filename)

for number, filenames in numbers.items():
    if len(filenames) < 2:
        # not a dupe
        continue
    for filename in filenames:
        # the "Duplicates" folder must already exist
        shutil.move(os.path.join(app, filename),
                    os.path.join(app, "Duplicates"))
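
If you want to try the approach without the folder dialog, the same steps can be wrapped in a function. This is only a sketch: the name move_duplicates and the target_name parameter are made up for illustration, and it assumes Python 3.2+ for exist_ok. It also creates the target folder before moving, because shutil.move renames the file to the destination path when that path is not an existing directory.

```python
import os
import re
import shutil
from collections import defaultdict


def move_duplicates(folder, target_name="Duplicates"):
    """Move files that share the same first number into folder/target_name.

    Hypothetical helper; returns the sorted list of moved file names.
    """
    numbers = defaultdict(list)
    for filename in os.listdir(folder):
        # skip sub-folders, including the target folder itself
        if not os.path.isfile(os.path.join(folder, filename)):
            continue
        match = re.search(r'\d+', filename)
        if match is None:
            continue  # no digits in this name
        numbers[int(match.group())].append(filename)

    target = os.path.join(folder, target_name)
    moved = []
    for filenames in numbers.values():
        if len(filenames) < 2:
            continue  # not a dupe
        # create the folder only when something will be moved into it
        os.makedirs(target, exist_ok=True)
        for filename in filenames:
            shutil.move(os.path.join(folder, filename), target)
            moved.append(filename)
    return sorted(moved)
```

Calling move_duplicates(app) after the askdirectory line would then replace both loops. Note that only the first run of digits in each name is compared, the same as in the code above.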

The defaultdict magic is just shorthand for the following code:

    if number not in numbers:
        numbers[number] = []
    numbers[number].append(filename)
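
To convince yourself that the two forms behave the same, here is a small comparison you can run on its own. The sample names are the ones from the structure above; the keys list is an assumption standing in for the numbers the regex would extract.

```python
from collections import defaultdict

names = ['101 test', '101 real', 'hugo, 93']  # sample names from the answer
keys = [101, 101, 93]                         # assumed number for each name

# explicit version: create the list the first time a key is seen
plain = {}
for number, filename in zip(keys, names):
    if number not in plain:
        plain[number] = []
    plain[number].append(filename)

# defaultdict version: the empty list is created automatically on first access
auto = defaultdict(list)
for number, filename in zip(keys, names):
    auto[number].append(filename)

# both approaches end up with the same mapping
same = plain == dict(auto)
```

Here same ends up True, and plain[101] holds both '101 test' and '101 real'.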