Pushpak Dagade - 1 year ago
Python Question

Getting file extension using pattern matching in python

I am trying to find the extension of a file, given its name as a string. I know I can use the function os.path.splitext, but it does not work as expected when the file extension is .tar.gz or .tar.bz2: it splits at the last dot only, so it gives the extensions as .gz and .bz2 instead of .tar.gz and .tar.bz2 respectively.
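
For reference, here is a minimal sketch of the behaviour described above. The file names and the variable names (names, results) are illustrative assumptions, not something from my actual code:

```python
import os.path

# os.path.splitext splits at the last dot only, so a compound
# extension such as .tar.gz is cut in two.
names = ['a.tar.gz', 'a.tar.bz2', 'a.txt']
results = {name: os.path.splitext(name) for name in names}

for name in names:
    # Each result is a (root, ext) tuple; ext keeps its leading dot.
    print('%s -> %r' % (name, results[name]))
```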

So I decided to find the extension of files myself using pattern matching.

import re

print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
# prints: gz (I want this to be 'tar.gz')
print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.bz2').group('ext'))
# prints: bz2 (I want this to be 'tar.bz2')


I am using the named group (?P<ext>...) in my pattern as I also want to capture the extension.

Please help.

Answer Source
>>> import re
>>> print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
gz
>>> print(re.compile(r'^.*?[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
tar.gz

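The ? after .* makes the quantifier non-greedy. A greedy .* consumes as much as it can, so it swallows "a.tar" and leaves only gz for the ext group, which the \w+ alternative then matches. The non-greedy .*? consumes as little as possible, so it stops after "a", and the tar\.gz alternative can match the rest of the name.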
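
If you need this in more than one place, the non-greedy pattern can be wrapped in a small helper. This is only a sketch: the names EXT_PATTERN and get_extension are made up for illustration, and it assumes tar.gz and tar.bz2 are the only compound extensions you care about.

```python
import re

# Same pattern as above, with the non-greedy .*? prefix.
# The compound extensions come before the generic \w+ fallback.
EXT_PATTERN = re.compile(r'^.*?[.](?P<ext>tar\.gz|tar\.bz2|\w+)$')


def get_extension(filename):
    """Return the extension without its leading dot, or None if there is no match."""
    match = EXT_PATTERN.match(filename)
    return match.group('ext') if match else None


for name in ('a.tar.gz', 'a.tar.bz2', 'my.backup.tar.gz', 'notes.txt', 'README'):
    print('%s -> %r' % (name, get_extension(name)))
```

For a name with several dots, such as my.backup.tar.gz, the engine first tries the shortest prefix ("my"), fails to reach the end of the string from there, and keeps extending the prefix until one of the alternatives matches all the way to $. Other compound extensions would have to be added to the alternation ahead of \w+.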