I am trying to get some stats for a directory in HDFS: the number of files/subdirectories, and the size of each. I started out thinking that I could do this in bash.
OP=$(hadoop fs -ls hdfs://mydirectory)
echo "$OP" | wc -l
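Note that `wc -l < "$OP"` would try to read from a *file* whose name is the contents of `$OP`; to count the lines held in the variable, pipe them in. The counting step can be checked without a cluster — here `printf` stands in for the `hadoop fs -ls` output, and the listing lines are a made-up sample:

```shell
# Stand-in for: OP=$(hadoop fs -ls hdfs://mydirectory)
OP=$(printf 'Found 2 items\nfile1\nfile2\n')
# $OP holds text, not a filename, so pipe it into wc -l;
# tr strips the leading padding some wc implementations print
count=$(echo "$OP" | wc -l | tr -d ' ')
echo "$count"   # prints 3 (including the "Found N items" header line)
```

Keep in mind that `hadoop fs -ls` prints a "Found N items" header, so the real file/subdirectory count is one less than the line count.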
Running hadoop fs -ls from Python
See https://docs.python.org/2/library/commands.html for your options, including how to get the return status (in case of an error). The basic code you're missing is
import commands
hdir_list = commands.getoutput('hadoop fs -ls hdfs://mydirectory')
Yes: deprecated in 2.6, still useful in 2.7, but removed from Python 3. If that bothers you, switch to
os.system(<code string>)
... or, better yet, use the subprocess module (introduced in 2.4). Note that subprocess.call only returns the exit status; to capture the command's output the way getoutput does, use subprocess.check_output (added in 2.7).
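A sketch of the subprocess route, assuming Python 2.7+. Since running the real command needs a cluster, the parsing is split out and exercised on a made-up sample of hadoop fs -ls output (the size is the fifth whitespace-separated column):

```python
import subprocess

def parse_ls(listing):
    """Return (entry_count, total_bytes) from hadoop fs -ls text."""
    # hadoop fs -ls prints a "Found N items" header, then one line
    # per entry: perms, replication, owner, group, size, date, time, name
    entries = [line for line in listing.splitlines()
               if line.strip() and not line.startswith('Found')]
    total = sum(int(line.split()[4]) for line in entries)
    return len(entries), total

def hdfs_stats(path):
    """Run hadoop fs -ls and summarize it (needs a working cluster)."""
    out = subprocess.check_output(['hadoop', 'fs', '-ls', path])
    return parse_ls(out.decode())

# Hypothetical sample listing, standing in for real cluster output
sample = """Found 2 items
-rw-r--r--   3 user group       1024 2015-01-01 12:00 hdfs://mydirectory/a.txt
drwxr-xr-x   - user group          0 2015-01-01 12:00 hdfs://mydirectory/sub
"""
print(parse_ls(sample))   # -> (2, 1024)
```

If the command fails, check_output raises CalledProcessError with the exit status, which covers the error-handling case the commands docs describe.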