Pushpak Dagade - 6 months ago
Python Question

Getting file extension using pattern matching in python

I am trying to find the extension of a file, given its name as a string. I know I can use the function os.path.splitext, but it does not work as expected when the file extension is .tar.gz or .tar.bz2: it only splits off the last part, giving the extensions as .gz and .bz2 instead of tar.gz and tar.bz2, respectively.
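For reference, here is a quick sketch of what os.path.splitext actually returns, plus one possible non-regex workaround. It assumes the compound extensions you care about are known in advance. The helper name get_extension and the COMPOUND_EXTS tuple are hypothetical, chosen only for illustration.

```python
import os.path

# Hypothetical list of multi-part extensions to recognise; extend as needed.
COMPOUND_EXTS = ('.tar.gz', '.tar.bz2')


def get_extension(filename):
    """Return the extension without its leading dot, preferring known compound ones."""
    for ext in COMPOUND_EXTS:
        if filename.endswith(ext):
            return ext[1:]  # strip the leading dot, e.g. 'tar.gz'
    # Fall back to os.path.splitext, which only splits off the last extension.
    return os.path.splitext(filename)[1][1:]


print(os.path.splitext('a.tar.gz'))  # ('a.tar', '.gz')
print(get_extension('a.tar.gz'))     # tar.gz
print(get_extension('a.tar.bz2'))    # tar.bz2
print(get_extension('notes.txt'))    # txt
```

The check is case-sensitive and returns an empty string for names without a dot, such as README.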

So I decided to find the extension of files myself using pattern matching.

import re
print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
# prints gz; I want it to be 'tar.gz'
print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.bz2').group('ext'))
# prints bz2; I want it to be 'tar.bz2'


I am using the named group (?P<ext>...) in my pattern because I also want to retrieve the extension itself.
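As an aside, a minimal illustration of how named groups are read back from a match object. The pattern here is a simplified, illustrative one (a stem group plus a single-part extension), not the pattern from the question. Note that match() returns None when the name does not fit, so calling .group() directly on the result, as above, raises AttributeError for names like README.

```python
import re

# Illustrative pattern: a named group for the stem and one for a simple extension.
pattern = re.compile(r'^(?P<stem>.+)[.](?P<ext>\w+)$')

m = pattern.match('report.pdf')
if m is not None:  # match() returns None when the name does not fit the pattern
    print(m.group('ext'))  # pdf
    print(m.groupdict())   # dict mapping 'stem' and 'ext' to the matched text

# No dot at all, so there is no match object to call .group() on.
print(pattern.match('README'))  # None
```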

Please help.

Answer
>>> import re
>>> print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
gz
>>> print(re.compile(r'^.*?[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
tar.gz

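Adding ? after * makes the quantifier non-greedy (lazy), so it matches as few characters as possible. The greedy .* consumes "a.tar" as well, which leaves [.] to match the last dot and only "gz" for the ext group. The lazy .*? stops after "a", so [.] matches the first dot and the alternation can then match tar.gz all the way to the end of the string.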
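A small sketch comparing the two patterns side by side, wrapped in a hypothetical helper named extension that returns None instead of raising when there is no match. The lazy version still returns a single-part extension for ordinary names, because .*? keeps expanding until the rest of the pattern can match up to the end of the string.

```python
import re

GREEDY = re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$')
LAZY = re.compile(r'^.*?[.](?P<ext>tar\.gz|tar\.bz2|\w+)$')


def extension(filename, pattern=LAZY):
    """Return the 'ext' group, or None if the name has no matching extension."""
    m = pattern.match(filename)
    return m.group('ext') if m else None


for name in ('a.tar.gz', 'a.tar.bz2', 'my.notes.txt', 'README'):
    print('%s: greedy=%s lazy=%s' % (name, extension(name, GREEDY), extension(name)))

# a.tar.gz: greedy=gz lazy=tar.gz
# a.tar.bz2: greedy=bz2 lazy=tar.bz2
# my.notes.txt: greedy=txt lazy=txt
# README: greedy=None lazy=None
```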