Xeberdee Xeberdee - 23 days ago 6
Python Question

Check for directory pattern Python

I have directories that are created by an application that have a basic structure.

G009832
- SearchData
_SearchDb_
_SearchDb_.dbsync
- WaveformCache
file1.wcf
file2.wcf
ProjectNameFile.apf
ProjectNameSettings.xml
ProjectNameSettingsBinary.abf


This folder and file structure is pretty constant, the only variable is the root folder name, which is a machine name, and the amount of *.wcf waveform cache files present in the WaveformCache folder.

I want to check that users have not added any extra content to any of these folders or to any subfolder before I delete them.

At the moment I'm checking the contents of each folder with os.walk.

illegal = []
if IsMachineFormatted(folder): # Function that checks folder is machine folder.
for root, dirs, files in os.walk(folder):
for f in files:
if f == '_SearchDb_':
pass
elif f == '_SearchDb_.dbsync':
pass
elif f.endswith('.wcf'):
pass
elif f.endswith('.apf'):
pass
elif f.endswith('.xml'):
pass
elif f.endswith('.abf'):
pass
else:
illegal.append(f)
else:
pass


This is ok and returns a list of illegal files, but it isn't very elegant and it doesn't check for extra directories etc. It will also allow any number of .xml or any other allowed types of files through.

I'm pretty sure there is a better way to do this, before I start to improve on this code.

I am not aware of comparison operators for directories with python, and I would be happy enough to return something like a list of dictionaries that contains the structure of any extra files and folders that have been added to the default directory pattern. Eg.

errors = [{'FolderName':['Naughty.txt','illegal.etc'],{'Disobedience':['Here.txt','AndHere.txt']}]

Answer

I just rewrote your code a little. It can be more elegant with any but it less effective (because it requires checking all ends). I'm not sure that I got correctly your question about error because you gave it in the wrong syntax.

Also, I recommend you to break this code to functions to avoid 5 steps nesting:

from collections import defaultdict

ALLOWED_NAMES = ['_SearchDb_', '_SearchDb_.dbsync']
ALLOWED_ENDS = ['.wcf', '.apf', '.xml', '.abf']

def ends_with_any(word, ends):
    for end in ends:
        if word.endswith(end):
            return True
    return False

def is_legal_name(filename, allowed_names, allowed_ends):
    return filename in allowed_names or ends_with_any(f, allowed_ends)

illegal = defaultdict(list)
if IsMachineFormatted(folder): # Function that checks folder is machine folder.
    for dirpath, dirs, files in os.walk(folder):
        for f in files:
            if not is_legal_name(f, ALLOWED_NAMES, ALLOWED_ENDS):
                illegal[dirpath].append(f)