Adam Hughes - 7 months ago
Python Question

Python regex numbers and underscores

I'm trying to get a list of files from a directory whose file names follow this pattern:


For example


Can't seem to get the right regex. I've tried the following:

pattern = re.compile(r'(\d{4})_(\d{2})_(\d{2}).dat')
>>> []

pattern = re.compile(r'*(\d{4})_(\d{2})_(\d{2}).dat')
>>> sre_constants.error: nothing to repeat

Regex is certainly a weak point for me. Can anyone explain where I'm going wrong?

To get the files, I'm doing:

files = [f for f in os.listdir(directory) if pattern.match(f)]

PS, how would I allow for both .dat and .DAT (a case-insensitive file extension)?



You have two issues with your expression: re.compile(r'(\d{4})_(\d{2})_(\d{2}).dat')

The first one, as a previous comment stated, is that the . right before dat should be escaped by putting a backslash (\) before it. Otherwise, Python treats it as a special character, because in regex an unescaped . matches any character (except a newline, by default).
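To make the difference concrete, here is a small sketch comparing the unescaped and escaped versions. The file names are made up for illustration and are not from the question.

```python
import re

# Hypothetical file names, used only for illustration.
good = "2015_01_31.dat"
bad = "2015_01_31xdat"   # no literal dot before "dat"

unescaped = re.compile(r'(\d{4})_(\d{2})_(\d{2}).dat')
escaped = re.compile(r'(\d{4})_(\d{2})_(\d{2})\.dat')

# "." matches any character, so the unescaped pattern accepts both names.
unescaped_hits = [bool(unescaped.match(name)) for name in (good, bad)]
# "\." matches only a literal dot, so the second name is rejected.
escaped_hits = [bool(escaped.match(name)) for name in (good, bad)]

print(unescaped_hits)  # [True, True]
print(escaped_hits)    # [True, False]
```

Note that the unescaped dot makes the pattern too permissive rather than too strict, so on its own it would not explain an empty result list.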

Besides that, your expression doesn't handle the uppercase extension. You can make a group for this with dat and DAT as the possible alternatives.
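As an alternative to listing both spellings in a group, the standard re.IGNORECASE flag could be used. This is a different technique from the one described in this answer, and it is slightly broader: it also accepts mixed-case extensions such as .Dat. The file names below are hypothetical.

```python
import re

# re.IGNORECASE makes the letters in "dat" match in any case.
pattern = re.compile(r'(\d{4})_(\d{2})_(\d{2})\.dat', re.IGNORECASE)

# Hypothetical file names, used only for illustration.
names = ["2015_01_31.dat", "2015_01_31.DAT", "2015_01_31.Dat", "2015_01_31.txt"]
matched = [n for n in names if pattern.match(n)]

print(matched)  # ['2015_01_31.dat', '2015_01_31.DAT', '2015_01_31.Dat']
```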

With both changes made, it should look like:


As an extra note, I added ?: at the beginning of the group to make it non-capturing, so the extension is left out of the match results.
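Putting the two fixes together, the expression presumably looks like the sketch below. This is a reconstruction from the description above, not the original snippet. Two further points are assumptions on my part. First, the leading * in the second attempt suggests the file names carry some prefix before the date; match() only matches at the start of the string, which would explain the empty list, so the sketch uses search() instead. (A bare * fails with "nothing to repeat" because it is a quantifier with nothing before it; "anything" would be written .* in regex.) Second, the trailing $ is added so that names with something after the extension are rejected. The file names and the run_ prefix are hypothetical.

```python
import re

# Reconstructed from the description above: escaped dot plus a
# non-capturing group (?:...) for the two extension spellings.
# The trailing $ is an addition so that e.g. ".dat.bak" is rejected.
pattern = re.compile(r'(\d{4})_(\d{2})_(\d{2})\.(?:dat|DAT)$')

# Hypothetical file names; the "run_" prefix is an assumption.
names = [
    "run_2015_01_31.dat",
    "2015_02_01.DAT",
    "notes.txt",
    "run_2015_01_31.dat.bak",
]

# match() anchors at the start of the string, so a prefixed name is missed.
with_match = [n for n in names if pattern.match(n)]
# search() scans the whole string, so the prefix no longer matters.
with_search = [n for n in names if pattern.search(n)]

print(with_match)   # ['2015_02_01.DAT']
print(with_search)  # ['run_2015_01_31.dat', '2015_02_01.DAT']

# Because of ?:, the extension group does not appear in the results.
date_parts = pattern.search("run_2015_01_31.dat").groups()
print(date_parts)   # ('2015', '01', '31')
```

With that pattern, the list comprehension from the question would become files = [f for f in os.listdir(directory) if pattern.search(f)].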