MattR MattR -4 years ago 79
Python Question

Python Regex error

I have a list of file paths, with the file name containing something I need to retrieve.


Using Pythex I created the regular expression and it picks out exactly what I want, which is everything between the last backslash and the .xlsx extension. Below is the code and the error I get:

import re
files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

pattern = re.compile('(?<=\\)?[a-zA-Z]+(?=\.xlsx)')
for x in files:
    matches = re.findall(pattern, x)

# error I get below
error: missing ), unterminated subpattern at position 0

So, following the error, I added an extra ) and it works:

pattern = re.compile('(?<=\\))?[a-zA-Z]+(?=\.xlsx)')
#                            ^ added right there

What exactly is that extra ) doing? Pythex doesn't seem to need it, and to my eye it seems unnecessary.
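
A note on the error itself, as a minimal sketch: in an ordinary (non-raw) Python string literal, '\\' is a single backslash, so the regex engine receives (?<=\)? and treats \) as an escaped literal parenthesis. That leaves the group opened at position 0 unterminated. The extra ) closes the group, but the lookbehind then tests for a literal ")" rather than a backslash. Writing the pattern as a raw string avoids this. The names broken, fixed, compile_error, and names are illustrative, and the \w+ character class is an assumption made here so that the digits in thing1 and thing2 match; it is not part of the original pattern.

```python
import re

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

# Non-raw literal: the regex engine receives (?<=\)?[a-zA-Z]+(?=\.xlsx)
# where \) is an escaped literal ")", so the "(" at position 0 is never closed.
broken = '(?<=\\)?[a-zA-Z]+(?=\\.xlsx)'
try:
    re.compile(broken)
    compile_error = None
except re.error as exc:
    compile_error = str(exc)
print(compile_error)

# Raw literal: the regex engine receives (?<=\\), a lookbehind for one
# literal backslash. Assumption: \w+ replaces [a-zA-Z]+ so digits match too.
fixed = re.compile(r'(?<=\\)\w+(?=\.xlsx)')

names = []
for path in files:
    names.extend(fixed.findall(path))
print(names)
```

Pythex takes the pattern exactly as typed, with no string-literal layer in between, which is presumably why the same text behaves differently there.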

Answer Source

You're using the wrong tool. I'd recommend the os module for what you want to accomplish:

import os

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
for file in files:
    base = os.path.basename(file)
    print(os.path.splitext(base)[0])

On Windows, this will print exactly what you want:

thing1
thing2

You can also wrap this as a one-liner inside a function as stated in comments:

import os

def get_filename(files):
    return [os.path.splitext(os.path.basename(file))[0] for file in files]

if __name__ == '__main__':
    files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
    print(get_filename(files))
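
One caveat worth sketching: os.path follows the conventions of the operating system the script runs on, so on Linux or macOS a backslash is not treated as a separator and os.path.basename returns the whole string unchanged. If the Windows-style paths might be processed elsewhere, pathlib.PureWindowsPath parses them with Windows rules on any platform. The function name get_stems below is a hypothetical helper, not part of the answer above.

```python
from pathlib import PureWindowsPath


def get_stems(paths):
    """Return each Windows-style path's file name without its extension."""
    # PureWindowsPath applies Windows path rules regardless of the host OS.
    return [PureWindowsPath(p).stem for p in paths]


files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
stems = get_stems(files)
print(stems)
```

Because the pure path classes never touch the file system, this works even when the files themselves do not exist on the machine running the code.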