newbie newbie - 1 month ago 4x
Python Question

Hadoop commands from python

I am trying to get some stats for a directory in HDFS: the number of files and subdirectories, and the size of each. I started out thinking that I could do this in bash.

OP=$(hadoop fs -ls hdfs://mydirectory)
# count the lines of the listing held in $OP (it is text, not a file name)
echo "$OP" | wc -l

I only have this much so far, and I quickly realised that Python might be a better option for this. However, I am not able to figure out how to execute hadoop commands like hadoop fs -ls from Python.


See for your options, including how to get the return status (in case of an error). The basic code you're missing is

import commands

hdir_list = commands.getoutput('hadoop fs -ls hdfs://mydirectory')

Yes, the commands module is deprecated as of 2.6; it still works in 2.7 but was removed in Python 3. If that bothers you, switch to

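os.system(<command string>)

... which only returns the exit status and does not capture the output, or better yet use the subprocess module (introduced in 2.4), which can do both.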
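
As a rough sketch of the subprocess route, the following runs the listing, returns the exit status alongside the output, and then counts files and subdirectories. The helper names run_hadoop_ls and summarize are made up for illustration. The parsing assumes the usual eight-column hadoop fs -ls layout (permissions, replication, owner, group, size, date, time, path), which may differ between Hadoop versions.

```python
import subprocess


def run_hadoop_ls(path):
    """Run `hadoop fs -ls <path>`; return (exit status, stdout, stderr)."""
    proc = subprocess.Popen(["hadoop", "fs", "-ls", path],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out.decode("utf-8"), err.decode("utf-8")


def summarize(ls_output):
    """Split a listing into ({file path: size in bytes}, [subdirectory paths]).

    Assumes eight whitespace-separated columns per entry, with the path last.
    """
    files, dirs = {}, []
    for line in ls_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skips the "Found N items" header and blank lines
        perms, size, name = fields[0], fields[4], fields[7]
        if perms.startswith("d"):
            dirs.append(name)
        else:
            files[name] = int(size)
    return files, dirs


# Example use (needs a working hadoop client on the PATH):
# status, out, err = run_hadoop_ls("hdfs://mydirectory")
# if status == 0:
#     files, dirs = summarize(out)
#     print(len(files), len(dirs), sum(files.values()))
# else:
#     print(err)
```

Passing the command as a list avoids going through a shell, so paths containing spaces or shell metacharacters need no extra quoting.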