Dalton L. Heiland - 1 month ago
Python Question

Put os.walk into a list and print in python

I want to get a list of files from a directory and its subdirectories. Then, for each file in that list, I want to run a Java program using subprocess and write its stdout to a single file. How do I do this?

import os
import subprocess

myListOfFiles = []
# Walk the tree bottom-up and collect the full path of every file.
for root, dirs, files in os.walk("/home/documents/", topdown=False):
    for name in files:
        # print(os.path.join(root, name))
        myListOfFiles.append(os.path.join(root, name))
print(myListOfFiles)

p = subprocess.Popen('Java -jar avro-tool-1.8.1.jar, '- o report $filename', stdout=subprocess.PIPE)

Answer

First, I won't address the first part (the directory scan), because your approach works, even if some comments suggest more elegant list-comprehension solutions.
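For reference, the list-comprehension variant mentioned in the comments might look like the sketch below. The helper name list_files is made up for illustration; it collects the same paths as the loop in the question.

```python
import os

def list_files(top):
    """Return the full path of every file under `top`, recursively.

    `list_files` is a hypothetical helper name, not something from the question.
    """
    return [
        os.path.join(root, name)
        for root, _dirs, files in os.walk(top)
        for name in files
    ]

# Usage with the directory from the question:
# myListOfFiles = list_files("/home/documents/")
```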

The subprocess part is not there yet, so let me address that.

Pass the command either as one single string, not two, or, even better, as a list of arguments. With a list, each element reaches the program as exactly one argument, so a filename containing spaces or unusual characters needs no manual quoting.
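To see the list form at work without needing Java, here is a small sketch that starts a child Python process which simply prints back the arguments it received. The helper echo_args and the sample filenames are invented for the demonstration.

```python
import subprocess
import sys

def echo_args(args):
    """Run a child Python process that prints each argument it received on its own line.

    `echo_args` is a hypothetical helper used only to illustrate argument passing.
    """
    child = [sys.executable, "-c", "import sys; print('\\n'.join(sys.argv[1:]))"]
    out = subprocess.check_output(child + list(args))
    return out.decode().splitlines()

# A filename with a space and one with a shell metacharacter both arrive intact,
# each as a single argument.
received = echo_args(["my file.avro", "a;b"])
```

Because no shell parses the command, the space and the semicolon have no special meaning here.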

Then, open a logfile for writing, and run your Popen commands in a loop, writing p.stdout to the open file:

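# Binary mode: p.stdout.read() returns bytes on Python 3.
with open("the_log", "wb") as logfile:
    for inputFile in myListOfFiles:
        p = subprocess.Popen(["java", "-jar", "avro-tool-1.8.1.jar", "repair", "-o", "report", inputFile], stdout=subprocess.PIPE)
        # Copy everything the process printed into the log, then reap the process.
        logfile.write(p.stdout.read())
        p.wait()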

Note that standard error will not be written to the output file unless you pass stderr=subprocess.STDOUT as an extra argument.
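A minimal sketch of that option, again using a child Python process as a stand-in for the Java tool. The names child_code and run_merged are invented for the example.

```python
import subprocess
import sys

# Stand-in child program (an assumption for the demo): writes one line to
# stdout and one line to stderr.
child_code = (
    "import sys; "
    "sys.stdout.write('out\\n'); sys.stdout.flush(); "
    "sys.stderr.write('err\\n')"
)

def run_merged(cmd):
    """Run `cmd` and return its stdout and stderr combined into one bytes object."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = p.communicate()  # the second item is None because stderr was redirected
    return output

merged = run_merged([sys.executable, "-c", child_code])
```

With stderr=subprocess.STDOUT, both streams arrive through the same pipe, so a single write to the log file captures error messages as well.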

Of course, you don't have to scan all the files into a list first and then run a second loop as above. You could move the subprocess.Popen call into your os.walk loop like this:

import os
import subprocess

# Binary mode: p.stdout.read() returns bytes on Python 3.
with open("the_log", "wb") as logfile:
    for root, dirs, files in os.walk("/home/documents/", topdown=False):
        for name in files:
            inputFile = os.path.join(root, name)
            p = subprocess.Popen(["java", "-jar", "avro-tool-1.8.1.jar", "repair", "-o", "report", inputFile], stdout=subprocess.PIPE)
            logfile.write(p.stdout.read())
            p.wait()
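As a possible variation, not part of the approach above, the log file itself can be handed to the child process as its stdout, so the output never has to be read into memory first. The sketch below assumes a hypothetical helper run_for_each and a caller-supplied build_cmd function; it also records which inputs made the command exit with a non-zero status.

```python
import subprocess
import sys

def run_for_each(paths, log_path, build_cmd):
    """Run build_cmd(path) for every path, sending stdout and stderr to one log file.

    `run_for_each` and `build_cmd` are hypothetical names for this sketch.
    Returns the list of paths whose command exited with a non-zero status.
    """
    failures = []
    with open(log_path, "wb") as logfile:
        for path in paths:
            # The child writes straight to the log file's descriptor.
            rc = subprocess.call(build_cmd(path), stdout=logfile, stderr=subprocess.STDOUT)
            if rc != 0:
                failures.append(path)
    return failures

# Usage with the command from the answer (assumes java and the jar are available):
# failed = run_for_each(
#     myListOfFiles, "the_log",
#     lambda f: ["java", "-jar", "avro-tool-1.8.1.jar", "repair", "-o", "report", f],
# )
```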