pynovice - 6 months ago
Python Question

How do I extract the file extension from a file name in Python?

The file names are dynamic and I need to extract the file extension. The file names look like this:
parallels-workstation-parallels-en_US-6.0.13976.769982.run.sh

20090209.02s1.1_sequence.txt
SRR002321.fastq.bz2
hello.tar.gz
ok.txt


For the first one I want to extract txt, for the second one I want to extract fastq.bz2, and for the third one I want to extract tar.gz.

I am using the os module to get the file extension like this:

import os.path
extension = os.path.splitext('hello.tar.gz')[1][1:]


This gives me only gz. That is fine for a file name like ok.txt, but for hello.tar.gz I want the extension to be tar.gz.
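For reference, a quick check of what os.path.splitext returns for the sample names shows that it always splits at the last dot only. This is just an illustration using the file names from the question; the variable names are made up for the example.

```python
import os.path

names = [
    "20090209.02s1.1_sequence.txt",
    "SRR002321.fastq.bz2",
    "hello.tar.gz",
    "ok.txt",
]

# os.path.splitext splits at the last dot only, so compound
# extensions such as tar.gz lose their first part.
last_suffixes = {name: os.path.splitext(name)[1][1:] for name in names}

for name, ext in last_suffixes.items():
    print(f"{name} -> {ext}")
```

So the single-suffix names come out as wanted, while the two compressed names keep only their final part.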

Answer
import os

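def splitext(path):
    # Check known compound extensions first; os.path.splitext alone
    # would split 'hello.tar.gz' into ('hello.tar', '.gz').
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    # Fall back to the standard behaviour: split at the last dot.
    return os.path.splitext(path)

assert splitext('20090209.02s1.1_sequence.txt')[1] == '.txt'
# '.fastq.bz2' is not in the list above, so only the last suffix is returned.
assert splitext('SRR002321.fastq.bz2')[1] == '.bz2'
assert splitext('hello.tar.gz')[1] == '.tar.gz'
assert splitext('ok.txt')[1] == '.txt'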

To drop the leading dot from the extension:

import os

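def splitext(path):
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            path, ext = path[:-len(ext)], path[-len(ext):]
            break
    else:
        # No compound extension matched (the loop ended without break).
        path, ext = os.path.splitext(path)
    # Strip the leading dot; an empty extension stays empty.
    return path, ext[1:]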

assert splitext('20090209.02s1.1_sequence.txt')[1] == 'txt'
assert splitext('SRR002321.fastq.bz2')[1] == 'bz2'
assert splitext('hello.tar.gz')[1] == 'tar.gz'
assert splitext('ok.txt')[1] == 'txt'
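Note that the question also asks for fastq.bz2, which the fixed list above does not cover. One possible variation, offered only as a sketch, is to treat the last suffix as a compression wrapper and keep one more suffix in front of it. The name split_compound_ext and the COMPRESSION_SUFFIXES set are hypothetical and not part of os.path; the set is an assumption and is not exhaustive.

```python
import os.path

# Assumed, non-exhaustive set of compression suffixes; adjust as needed.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz"}


def split_compound_ext(path):
    """Return (root, ext), with ext lacking its leading dot.

    If the last suffix is a known compression suffix, the suffix
    before it (e.g. '.tar' or '.fastq') is kept as part of ext.
    """
    root, ext = os.path.splitext(path)
    if ext.lower() in COMPRESSION_SUFFIXES:
        # Peel off one more suffix in front of the compression suffix.
        inner_root, inner_ext = os.path.splitext(root)
        if inner_ext:
            root, ext = inner_root, inner_ext + ext
    return root, ext[1:]


assert split_compound_ext('20090209.02s1.1_sequence.txt')[1] == 'txt'
assert split_compound_ext('SRR002321.fastq.bz2')[1] == 'fastq.bz2'
assert split_compound_ext('hello.tar.gz')[1] == 'tar.gz'
assert split_compound_ext('ok.txt')[1] == 'txt'
```

The trade-off is that any dotted part in front of a compression suffix is treated as an extension, so a name like report.2020.gz would yield 2020.gz. If that matters, an explicit list of known inner extensions is the safer choice.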