I would like to filter out (skip) some subdirectories while creating a tar(gz) file with tarfile (Python 3.4).
Files on disk:
arcname = test1/thing/bar.jpg
exclude_dir_fullpath = ['/home/myuser/temp/test1/thing', '/home/myuser/temp/test1/lemon']
if any(dirname in item.name for dirname in exclude_dir_fullpath):
    print("Exclude fullpath dir matched at: %s" % item.name) # DEBUG
filepath = '/tmp/test.tar.gz'
include_dir = '/home/myuser/temp/test1'  # no trailing slash, otherwise os.path.basename() returns ''
archive = tarfile.open(name=filepath, mode="w:gz")
archive.add(include_dir, arcname=os.path.basename(include_dir), filter=filter_general)
You want a general, reusable function that filters out files by their absolute path. As I understand it, filtering on the archive name alone is not enough, because whether a file should be included can depend on where it originates on disk.
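To see why the check in the question never fires, here is a small sketch using the values from the question. The name stored on each archive member is relative to the arcname, so an absolute directory can never be a substring of it. The variable names `member_name` and `matched` are only illustrative.

```python
# Member name as tarfile stores it (relative to arcname), taken from the question.
member_name = 'test1/thing/bar.jpg'
exclude_dir_fullpath = ['/home/myuser/temp/test1/thing',
                        '/home/myuser/temp/test1/lemon']

# The absolute directories are never substrings of the relative member name,
# so the original test is always False and nothing gets excluded.
matched = any(dirname in member_name for dirname in exclude_dir_fullpath)
print(matched)  # False
```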
First, add a parameter to your filter function:
def filter_general(item, root_dir):
    # item.name is relative to the archive root; join it with root_dir to get the path on disk
    full_path = os.path.join(root_dir, item.name)
Then replace your "add to archive" line with:
archive.add(include_dir, arcname=os.path.basename(include_dir), filter=lambda x: filter_general(x,os.path.dirname(include_dir)))
The filter is now a lambda that passes the parent directory of the include directory as root_dir. Passing include_dir itself would repeat the root directory in the joined path, since item.name already starts with its base name.
Now your filter function knows the root dir and you can filter by absolute path, allowing you to reuse your filter function in several locations in your code.
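Putting the pieces together, a minimal end-to-end sketch might look like the following. Several things here are my own assumptions rather than part of the original code: the extra `exclude_dirs` parameter (instead of a global list), the `make_archive` helper, the `os.path.normpath` calls (which also guard against a trailing slash on the include directory), and matching on whole path components rather than the substring test from the question. The demo builds a throwaway directory tree instead of using the real paths.

```python
import os
import tarfile
import tempfile


def filter_general(item, root_dir, exclude_dirs):
    """Return None to skip `item`, or `item` itself to keep it."""
    # Rebuild the on-disk path from the archive-relative member name.
    full_path = os.path.normpath(os.path.join(root_dir, item.name))
    for dirname in exclude_dirs:
        # Match the directory itself or anything below it.
        if full_path == dirname or full_path.startswith(dirname + os.sep):
            return None
    return item


def make_archive(filepath, include_dir, exclude_dirs):
    # normpath drops a trailing slash, which would make basename() return ''.
    include_dir = os.path.normpath(include_dir)
    root_dir = os.path.dirname(include_dir)
    exclude_dirs = [os.path.normpath(d) for d in exclude_dirs]
    with tarfile.open(name=filepath, mode="w:gz") as archive:
        archive.add(include_dir,
                    arcname=os.path.basename(include_dir),
                    filter=lambda x: filter_general(x, root_dir, exclude_dirs))


# Demo on a temporary tree: test1/{thing,lemon,keep}/bar.jpg
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test1")
    for sub in ("thing", "lemon", "keep"):
        os.makedirs(os.path.join(src, sub))
        with open(os.path.join(src, sub, "bar.jpg"), "w") as fh:
            fh.write("x")
    out = os.path.join(tmp, "test.tar.gz")
    excluded = [os.path.join(src, "thing"), os.path.join(src, "lemon")]
    make_archive(out, src + os.sep, excluded)  # trailing separator on purpose
    with tarfile.open(out) as archive:
        names = sorted(archive.getnames())

print(names)  # ['test1', 'test1/keep', 'test1/keep/bar.jpg']
```

Returning None for a directory also stops tarfile from descending into it, so the files below an excluded directory are never visited. If you prefer to avoid the lambda, `functools.partial(filter_general, root_dir=root_dir, exclude_dirs=exclude_dirs)` should work the same way.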