Katya Handler - 7 days ago
Python Question

Check if a file exists in HDFS from Python

So, I've been using the Fabric package in Python to run shell commands for various HDFS tasks.

However, whenever I run a task that checks whether a file or directory already exists in HDFS and the path is missing, Fabric aborts the whole script instead of letting me handle the result. Here is an example (I am using Python 3.5.2 and Fabric3==1.12.post1):

from fabric.api import local


local('hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/')


If the directory does not exist, this code yields:


[localhost] local: hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/
stat: `hdfs://some/nonexistent/hdfs/dir/': No such file or directory

Fatal error: local() encountered an error (return code 1) while
executing 'hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/'

Aborting.


I also tried

local('hadoop fs -test -e hdfs://some/nonexistent/hdfs/dir/')

but it caused the same issue.

How can I use Fabric to get a boolean variable that tells me whether a file or directory exists in HDFS?
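Both commands already carry the answer in their exit status: `hadoop fs -test -e` exits with 0 when the path exists and with a non-zero status otherwise, and the output above shows `-stat` returning 1 for a missing path. The abort comes from Fabric, whose local() treats any non-zero status as fatal by default. As a minimal sketch of the exit-status check on its own, here it is with the standard library's subprocess.run swapped in for Fabric, assuming the `hadoop` CLI is on PATH. The function name `hdfs_path_exists` and its `run` parameter are hypothetical; `run` only exists so the function can be tried without a cluster.

```python
import subprocess
from typing import Callable


def hdfs_path_exists(path: str, run: Callable = subprocess.run) -> bool:
    """Return True if `path` exists in HDFS, based on the exit status of
    `hadoop fs -test -e`, which is 0 only when the path exists."""
    # Unlike Fabric's local(), which aborts, subprocess.run does not raise on
    # a non-zero exit status (check defaults to False), so a missing path just
    # gives returncode != 0. Output is discarded because only the status matters.
    result = run(["hadoop", "fs", "-test", "-e", path],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

On a cluster where the directory is missing, hdfs_path_exists('hdfs://some/nonexistent/hdfs/dir/') should return False rather than stopping the script.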

2ps
Answer

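Run the command inside settings(warn_only=True), so that Fabric reports a non-zero exit status as a warning instead of aborting, and then check the succeeded attribute of the result object returned by local().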

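from fabric.api import local
from fabric.context_managers import settings

# warn_only=True: a non-zero exit status becomes a warning instead of an abort
with settings(warn_only=True):
    # capture=True returns the command's stdout instead of printing it
    result = local('hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/', capture=True)
    # succeeded is True only when the command exited with status 0
    file_exists = result.succeeded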
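The same pattern extends to telling files and directories apart, since `hadoop fs -test -d` exits with 0 only when the path is a directory. Below is a minimal sketch, assuming Fabric3 as in the question. `fabric_check` and `hdfs_path_kind` are hypothetical helper names, and Fabric is imported inside `fabric_check` so the classification logic can also be exercised with a stand-in `check` function. Anything that exists but is not a directory is reported as "file".

```python
import shlex
from typing import Callable, Optional


def fabric_check(command: str) -> bool:
    """Run `command` with Fabric's local() and report whether it exited with 0."""
    # Imported here so hdfs_path_kind can be used with another check function
    # on machines where Fabric3 is not installed.
    from fabric.api import local, settings
    # warn_only=True turns a non-zero exit status into a warning, not an abort.
    with settings(warn_only=True):
        return local(command, capture=True).succeeded


def hdfs_path_kind(path: str,
                   check: Callable[[str], bool] = fabric_check) -> Optional[str]:
    """Return "dir" or "file" for an existing HDFS path, or None if it is missing."""
    # local() runs the command through a shell, so quote the path.
    quoted = shlex.quote(path)
    # `hadoop fs -test -d` exits with 0 when the path is a directory.
    if check("hadoop fs -test -d " + quoted):
        return "dir"
    # `hadoop fs -test -e` exits with 0 when the path exists at all;
    # an existing non-directory is treated as a file here.
    if check("hadoop fs -test -e " + quoted):
        return "file"
    return None
```

With Fabric installed, hdfs_path_kind('hdfs://some/nonexistent/hdfs/dir/') would be expected to return None, and the boolean from the question is simply `hdfs_path_kind(path) is not None`.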