newbie newbie - 2 months ago 17
Python Question

Hadoop commands from python

I am trying to get some stats for a directory in HDFS: the number of files and subdirectories, and the size of each. I started out thinking that I could do this in bash.

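#!/bin/bash
# Capture the directory listing, then count its lines.
OP=$(hadoop fs -ls hdfs://mydirectory)
# "$OP" holds the listing text, not a file name, so pipe it into wc.
echo "$OP" | wc -l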


I only have this much so far, and I quickly realised that Python might be a better option for this. However, I can't figure out how to execute hadoop commands like hadoop fs -ls from Python.

Answer

See https://docs.python.org/2/library/commands.html for your options, including how to get the return status (in case of an error). The basic code you're missing is

import commands

hdir_list = commands.getoutput('hadoop fs -ls hdfs://mydirectory')

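Note that the commands module was deprecated in 2.6, still works in 2.7, and was removed in Python 3. If that bothers you, switch to

os.system(<command string>)

which runs the command but returns only its exit status, not its output. Better yet, use the subprocess module (introduced in 2.4): subprocess.call also returns just the exit status, while subprocess.check_output (added in 2.7) returns the command's output.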
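
For the stats in the question, here is a minimal sketch of the subprocess route. It uses subprocess.check_output rather than subprocess.call, because call only gives back the exit status. The helper names hdfs_ls and parse_ls_output are made up for illustration, and the parser assumes the usual hadoop fs -ls column layout (permissions, replication, owner, group, size, date, time, path), so check it against the output of your Hadoop version.

```python
import subprocess


def parse_ls_output(text):
    """Split `hadoop fs -ls` output into (files, dirs) lists of (path, size)."""
    files, dirs = [], []
    for line in text.splitlines():
        # Assumed columns: perms, replication, owner, group, size, date, time, path.
        # maxsplit=7 keeps a path that contains spaces in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skips the "Found N items" header and blank lines
        perms, size, path = fields[0], int(fields[4]), fields[7]
        # Directory entries have a permission string starting with "d".
        (dirs if perms.startswith("d") else files).append((path, size))
    return files, dirs


def hdfs_ls(path):
    """Run `hadoop fs -ls <path>` and return its output as text."""
    # check_output raises CalledProcessError on a non-zero exit status.
    return subprocess.check_output(
        ["hadoop", "fs", "-ls", path], universal_newlines=True
    )
```

Calling parse_ls_output(hdfs_ls('hdfs://mydirectory')) should then give two lists whose lengths are the file and subdirectory counts. Note that -ls reports a size of 0 for directories, so for per-subdirectory totals you would probably want to run hadoop fs -du the same way.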