In the program I maintain it is done as in:
# count the files in the archive
length = 0
command = ur'"%s" l -slt "%s"' % (u'path/to/7z.exe', srcFile)
ins, err = Popen(command, stdout=PIPE, stdin=PIPE,
ins = StringIO.StringIO(ins)
for line in ins: length += 1
proc = Popen(command, stdout=PIPE, stdin=PIPE,
out = proc.stdout
# ... count
returncode = proc.wait()
if returncode:
    raise Exception(u'Failed reading number of files from ' + srcFile)
To count the number of archive members in a zip archive in Python:
#!/usr/bin/env python
import sys
from contextlib import closing
from zipfile import ZipFile

# The archive path is the first command-line argument
with closing(ZipFile(sys.argv[1])) as archive:
    count = len(archive.infolist())
print(count)
It may use the zlib, bz2, and lzma modules, if available, to decompress the archive.
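Note that `infolist()` also returns entries for any directories stored in the archive. As a variation on the snippet above, here is a minimal sketch that skips those entries. The function name `count_zip_files` is made up for illustration, and `ZipInfo.is_dir()` assumes Python 3.6 or newer.

```python
import zipfile

def count_zip_files(path):
    """Count archive members that are not directory entries."""
    with zipfile.ZipFile(path) as archive:
        return sum(1 for info in archive.infolist() if not info.is_dir())
```

Whether directories should count as "files" depends on what the number is used for, so treat this as one possible reading of the question.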
To count the number of regular files in a tar archive:
#!/usr/bin/env python
import sys
import tarfile

# The archive path is the first command-line argument
with tarfile.open(sys.argv[1]) as archive:
    count = sum(1 for member in archive if member.isreg())
print(count)
It may support gzip, bz2, and lzma compression, depending on the version of Python.
You could find a third-party module that provides similar functionality for 7z archives.
To get the number of files in an archive using the 7z utility:
import os
import re
import subprocess

def count_files_7z(archive):
    # Run 7z in the C locale so the output format is predictable
    s = subprocess.check_output(["7z", "l", archive], env=dict(os.environ, LC_ALL="C"))
    return int(re.search(br'(\d+)\s+files,\s+\d+\s+folders$', s).group(1))
Here's a version that may use less memory if there are many files in the archive:
import os
import re
from subprocess import Popen, PIPE, CalledProcessError

def count_files_7z(archive):
    command = ["7z", "l", archive]
    p = Popen(command, stdout=PIPE, bufsize=1, env=dict(os.environ, LC_ALL="C"))
    with p.stdout:
        for line in p.stdout:
            if line.startswith(b'Error:'):  # found error
                error = line + b"".join(p.stdout)
                raise CalledProcessError(p.wait(), command, error)
    returncode = p.wait()
    assert returncode == 0
    # After the loop, line holds the last line of output (the summary)
    return int(re.search(br'(\d+)\s+files,\s+\d+\s+folders', line).group(1))
Example usage:
import sys

try:
    print(count_files_7z(sys.argv[1]))
except CalledProcessError as e:
    # On Python 3, write the raw bytes through sys.stderr.buffer
    getattr(sys.stderr, 'buffer', sys.stderr).write(e.output)
    sys.exit(e.returncode)
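Both functions call `.group(1)` directly on the result of `re.search()`, which raises `AttributeError` if the expected summary is missing from the output. A hedged sketch of a stricter parsing step follows. The helper name `parse_file_count` is hypothetical, and it assumes the "N files, M folders" wording that the regular expressions above already rely on.

```python
import re

# Same pattern as in the streaming version above, compiled once.
SUMMARY_RE = re.compile(br'(\d+)\s+files,\s+\d+\s+folders')

def parse_file_count(output):
    """Return the file count found in the given 7z output bytes.

    Raises ValueError instead of AttributeError when no summary is found.
    """
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no 'N files, M folders' summary found in 7z output")
    return int(match.group(1))
```

Either `count_files_7z` variant could pass its captured output (or last line) to such a helper so that an unexpected output format produces a descriptive error.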
To count the number of lines in the output of a generic subprocess:
from functools import partial
from subprocess import Popen, PIPE, CalledProcessError

p = Popen(command, stdout=PIPE, bufsize=-1)
with p.stdout:
    # Read fixed-size chunks until EOF (b'') and count the newlines in each
    read_chunk = partial(p.stdout.read, 1 << 15)
    count = sum(chunk.count(b'\n') for chunk in iter(read_chunk, b''))
if p.wait() != 0:
    raise CalledProcessError(p.returncode, command)
print(count)
It supports unlimited output.
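The snippet above leaves `command` undefined. As a self-contained sketch, the same logic can be wrapped in a function and tried against a child process that is certain to exist. The function name `count_output_lines` and the demo child (the current interpreter printing three lines) are assumptions for illustration only.

```python
import sys
from functools import partial
from subprocess import Popen, PIPE, CalledProcessError

def count_output_lines(command, chunk_size=1 << 15):
    """Count newline bytes in a child's stdout without storing the output."""
    p = Popen(command, stdout=PIPE, bufsize=-1)
    with p.stdout:
        read_chunk = partial(p.stdout.read, chunk_size)
        count = sum(chunk.count(b'\n') for chunk in iter(read_chunk, b''))
    if p.wait() != 0:
        raise CalledProcessError(p.returncode, command)
    return count

if __name__ == "__main__":
    # Hypothetical child process: the current interpreter printing three lines.
    child = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
    print(count_output_lines(child))
```

Only one chunk is held in memory at a time, which is why the size of the output does not matter.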
Could you explain why bufsize=-1 (as opposed to bufsize=1 in your previous answer: stackoverflow.com/a/30984882/281545)?
bufsize=-1 means use the default I/O buffer size instead of bufsize=0 (unbuffered) on Python 2. It is a performance boost on Python 2. It is the default on recent Python 3 versions. You might get a short read (lose data) on some earlier Python 3 versions where bufsize is not changed to bufsize=-1.
This answer reads in chunks and therefore the stream is fully buffered for efficiency. The solution you've linked is line-oriented. bufsize=1 means "line buffered". There is minimal difference from bufsize=-1 otherwise.
And also, what does read_chunk = partial(p.stdout.read, 1 << 15) buy us?
It is equivalent to read_chunk = lambda: p.stdout.read(1<<15) but provides more introspection in general. It is used to implement wc -l in Python efficiently.
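To make the comparison concrete, here is a small sketch that uses an in-memory stream as a stand-in for p.stdout. The helper name `count_newlines` and the tiny chunk size are assumptions chosen for the demonstration. It shows what a partial exposes and how the two-argument form of `iter()` drives the chunked reads.

```python
import io
from functools import partial

def count_newlines(stream, chunk_size=1 << 15):
    """Count newline bytes in a binary stream, reading chunk_size bytes at a time."""
    read_chunk = partial(stream.read, chunk_size)
    # iter(callable, sentinel) keeps calling read_chunk() until it returns b"" (EOF).
    return sum(chunk.count(b"\n") for chunk in iter(read_chunk, b""))

stream = io.BytesIO(b"one\ntwo\nthree\n")  # stand-in for p.stdout
read_chunk = partial(stream.read, 4)
# Unlike a lambda, a partial exposes the wrapped callable and its bound arguments.
print(read_chunk.func.__name__, read_chunk.args)
print(count_newlines(io.BytesIO(b"one\ntwo\nthree\n"), chunk_size=4))
```

A lambda would count the same lines. The partial simply keeps `func` and `args` available for inspection, which can help when debugging.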