MattR, 4 years ago
Python Question

Python Regex error

I have a list of file paths, with the file name containing something I need to retrieve.

C:\PATH\PATH\PATH\PATH\THE_THING_I_NEED.xlsx


Using Pythex, I created a regular expression that picks out exactly what I want: everything between the last \ and .xlsx. Below are the code and the error I get:

import re
files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

pattern = re.compile('(?<=\\)?[a-zA-Z]+(?=\.xlsx)')
for x in files:
    matches = re.findall(pattern, x)
    print(matches)

# Error I get:
error: missing ), unterminated subpattern at position 0


So, following the error message, I added an extra ) and it works:

pattern = re.compile('(?<=\\))?[a-zA-Z]+(?=\.xlsx)')
#                            ^ added right there


What exactly is that extra ) doing? Pythex doesn't seem to need it, and to my eye it seems unnecessary.

Answer

You're using the wrong tool. I'd recommend the os module for what you want to accomplish:

import os

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
for file in files:
    base = os.path.basename(file)
    print(os.path.splitext(base)[0])

This will print exactly what you want:

thing1
thing2
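
One caveat worth hedging: os.path uses the path rules of the platform the script runs on, so the output above assumes Windows. On Linux or macOS a backslash is not a separator, and os.path.basename would return the whole string. As a minimal sketch of a platform-independent alternative (swapping in pathlib, which the code above does not use), PureWindowsPath parses Windows-style paths on any OS:

```python
from pathlib import PureWindowsPath

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

# .stem is the final path component without its last suffix
names = [PureWindowsPath(f).stem for f in files]
print(names)
```
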

You can also wrap the os.path calls in a function that uses a one-line list comprehension, as suggested in the comments:

import os


def get_filename(files):
    return [os.path.splitext(os.path.basename(file))[0] for file in files]

if __name__ == '__main__':
    files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
    print(get_filename(files))
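
As for what the extra ) is doing, here is my reading, offered as a sketch rather than a definitive account. In a normal (non-raw) string literal, '\\' is a single backslash, so the regex engine receives (?<=\) followed by the rest of the pattern. It treats \) as an escaped literal parenthesis, the lookbehind group is never closed, and you get the "unterminated subpattern" error. The added ) closes the group, but the pattern then looks behind for a literal ")" instead of a backslash. Pythex takes the pattern as typed, with no Python string escaping, which is likely why it did not complain. A raw string avoids the problem. The shortened broken pattern and the [^\\]+ character class below are my own choices for illustration, not from the question (the original [a-zA-Z]+ cannot span the digit in thing1):

```python
import re

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

# Non-raw string: '\\' is ONE backslash, so the regex is (?<=\)[a-z]+
# and \) is a literal parenthesis; the lookbehind group is never closed.
try:
    re.compile('(?<=\\)[a-z]+')
    compile_error = None
except re.error as exc:
    compile_error = exc
print(compile_error)

# Raw string: \\ is an escaped backslash, and the ) after it closes the lookbehind.
pattern = re.compile(r'(?<=\\)[^\\]+(?=\.xlsx$)')

names = []
for f in files:
    # No capture groups, so findall returns the full matches
    names.extend(pattern.findall(f))
print(names)
```

That said, the os.path approach above is still simpler for this job; the regex is only shown to explain the error.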