Balint Balint - 2 months ago 21
Python Question

Python: how could I access tarfile.add()'s 'name' parameter in add()'s filter method?

I would like to filter out (skip) some subdirectories while creating a tar.gz file with tarfile (Python 3.4).

Files on disk:


  • /home/myuser/temp/test1/
  • /home/myuser/temp/test1/home/foo.txt
  • /home/myuser/temp/test1/thing/bar.jpg
  • /home/myuser/temp/test1/lemon/juice.png



I tried to compress /home/myuser/temp/test1/ with tarfile.add().

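I use two modes, with and without the full path. With the full path it works, but with the short path (arcname set to the base name) I have this problem: directory exclusion doesn't work, because the TarInfo object that tarfile.add() passes to the filter method is named after the arcname parameter, not the name parameter. Here is my call: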



archive.add(entry, arcname=os.path.basename(entry), filter=self.filter_general)


Example: for the file /home/myuser/temp/test1/thing/bar.jpg, the filter receives the arcname test1/thing/bar.jpg.

Because /home/myuser/temp/test1/thing is an element of exclude_dir_fullpath, the filter method should exclude this file, but it cannot, because the filter method only gets test1/thing/bar.jpg.
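To see exactly which names reach the filter, here is a small sketch that recreates the question's layout under a temporary directory (the temporary location and the recording filter record_names are assumptions made for illustration) and collects every TarInfo.name the filter is called with. Every collected name starts with test1, none with /home/myuser/..., which is why a full-path check cannot match.

```python
import os
import tarfile
import tempfile

seen = []  # every TarInfo.name the filter is called with


def record_names(item):
    # item is a tarfile.TarInfo; its .name comes from arcname, not from the disk path
    seen.append(item.name)
    return item  # keep every member


with tempfile.TemporaryDirectory() as base:
    # Recreate the question's layout under a throwaway directory (hypothetical location).
    include_dir = os.path.join(base, "test1")
    for sub, fname in [("home", "foo.txt"), ("thing", "bar.jpg"), ("lemon", "juice.png")]:
        os.makedirs(os.path.join(include_dir, sub))
        with open(os.path.join(include_dir, sub, fname), "w") as f:
            f.write("x")

    # The archive lives next to test1, not inside it, so it is not archived itself.
    with tarfile.open(os.path.join(base, "test.tar.gz"), "w:gz") as archive:
        archive.add(include_dir, arcname=os.path.basename(include_dir), filter=record_names)

for name in seen:
    print(name)
```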

How could I access tarfile.add()'s 'name' parameter in the filter method?

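import os
import tarfile


def filter_general(item):
    exclude_dir_fullpath = ['/home/myuser/temp/test1/thing', '/home/myuser/temp/test1/lemon']
    # item is a tarfile.TarInfo whose name is the archive name (e.g. 'test1/thing/bar.jpg'),
    # so these full-path entries never match once arcname is shortened
    if any(dirname in item.name for dirname in exclude_dir_fullpath):
        print("Exclude fullpath dir matched at: %s" % item.name)  # DEBUG
        return None  # returning None drops the member (and, for a directory, its contents)
    return item


def compress_tar():
    filepath = '/tmp/test.tar.gz'
    include_dir = '/home/myuser/temp/test1/'
    # normpath() drops the trailing slash; otherwise basename() returns '' instead of 'test1'
    arcname = os.path.basename(os.path.normpath(include_dir))
    with tarfile.open(name=filepath, mode="w:gz") as archive:  # closes the archive on exit
        archive.add(include_dir, arcname=arcname, filter=filter_general)


compress_tar()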

Answer

You want a general, reusable function that filters out files by their absolute path. Filtering on the archive name alone is not enough, since whether a file should be included can depend on where it originates on disk.

First, add a root_dir parameter to your filter function and rebuild the full path from it:

def filter_general(item, root_dir):
    full_path = os.path.join(root_dir, item.name)
    # ...then test full_path against exclude_dir_fullpath, as in the question
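Filled in, the two-argument filter could look like the sketch below. It reuses the exclusion list from the question; the prefix comparison is a swapped-in, stricter check than the question's substring test, so that a hypothetical sibling such as .../thingies is not excluded by .../thing. POSIX paths are assumed.

```python
import os
import tarfile

# Exclusion list taken from the question.
EXCLUDE_DIR_FULLPATH = ['/home/myuser/temp/test1/thing', '/home/myuser/temp/test1/lemon']


def filter_general(item, root_dir):
    """Return None to drop item, or item to keep it.

    root_dir is the directory the archive names are relative to, so joining
    it with item.name rebuilds the on-disk path.
    """
    full_path = os.path.join(root_dir, item.name)
    for dirname in EXCLUDE_DIR_FULLPATH:
        # Match the directory itself or anything below it, but not a
        # sibling that merely shares the prefix (e.g. '.../thingies').
        if full_path == dirname or full_path.startswith(dirname + os.sep):
            print("Exclude fullpath dir matched at: %s" % full_path)  # DEBUG
            return None
    return item


# Quick check with hand-made TarInfo objects (no files needed).
root = '/home/myuser/temp'
kept = filter_general(tarfile.TarInfo('test1/home/foo.txt'), root)
dropped = filter_general(tarfile.TarInfo('test1/thing/bar.jpg'), root)
print(kept is not None, dropped is None)  # True True
```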

Then replace your "add to archive" line with:

root = os.path.normpath(include_dir)  # strip the trailing slash so basename() gives 'test1'
archive.add(root, arcname=os.path.basename(root), filter=lambda x: filter_general(x, os.path.dirname(root)))

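The filter function has been replaced by a lambda that also passes the directory containing the include directory (its parent). Since item.name already starts with the arcname (test1/...), joining it onto the include directory itself would repeat the root directory.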
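The path arithmetic behind passing the parent directory can be checked directly with os.path. This sketch assumes POSIX paths; the variable names are illustrative only. It also shows why a trailing slash on include_dir needs normalizing first.

```python
import os.path

include_dir = '/home/myuser/temp/test1'
item_name = 'test1/thing/bar.jpg'  # what the filter sees (arcname-based)

# Joining onto the include directory repeats 'test1':
wrong = os.path.join(include_dir, item_name)
# Joining onto its parent rebuilds the real on-disk path:
right = os.path.join(os.path.dirname(include_dir), item_name)

print(wrong)  # /home/myuser/temp/test1/test1/thing/bar.jpg
print(right)  # /home/myuser/temp/test1/thing/bar.jpg

# With a trailing slash, basename() is empty and dirname() is the directory
# itself rather than its parent, so normalize the path first:
print(repr(os.path.basename('/home/myuser/temp/test1/')))  # ''
print(os.path.dirname('/home/myuser/temp/test1/'))  # /home/myuser/temp/test1
print(os.path.basename(os.path.normpath('/home/myuser/temp/test1/')))  # test1
```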

Now the filter function knows the root directory, so you can filter by absolute path and reuse the same filter function in several places in your code.
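Putting the pieces together, here is an end-to-end sketch that recreates the question's tree in a temporary directory, archives it with the lambda-based filter, and reads the member names back. The parameterized compress_tar(filepath, include_dir, exclude_dirs) and the extra exclude_dirs argument on filter_general are a hypothetical generalization, not the question's exact signatures. The archive should contain only test1, test1/home and test1/home/foo.txt.

```python
import os
import tarfile
import tempfile


def filter_general(item, root_dir, exclude_dirs):
    # Rebuild the on-disk path from the archive name and drop excluded subtrees.
    full_path = os.path.join(root_dir, item.name)
    if any(full_path == d or full_path.startswith(d + os.sep) for d in exclude_dirs):
        return None  # for a directory, tarfile then skips its contents too
    return item


def compress_tar(filepath, include_dir, exclude_dirs):
    include_dir = os.path.normpath(include_dir)  # '.../test1/' -> '.../test1' for basename()
    root_dir = os.path.dirname(include_dir)      # parent, so root_dir + item.name is the disk path
    with tarfile.open(filepath, "w:gz") as archive:
        archive.add(include_dir, arcname=os.path.basename(include_dir),
                    filter=lambda x: filter_general(x, root_dir, exclude_dirs))


with tempfile.TemporaryDirectory() as base:
    # Recreate the question's layout under a temporary directory.
    test1 = os.path.join(base, "test1")
    for sub, fname in [("home", "foo.txt"), ("thing", "bar.jpg"), ("lemon", "juice.png")]:
        os.makedirs(os.path.join(test1, sub))
        with open(os.path.join(test1, sub, fname), "w") as f:
            f.write("x")

    out = os.path.join(base, "test.tar.gz")
    # Trailing slash on purpose, as in the question.
    compress_tar(out, test1 + "/", [os.path.join(test1, "thing"), os.path.join(test1, "lemon")])

    with tarfile.open(out, "r:gz") as archive:
        members = sorted(archive.getnames())

print(members)
```

Binding root_dir and exclude_dirs through a closure keeps the filter's signature free of globals, so the same filter_general can serve several archives with different roots.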
