Neethu Neethu - 1 year ago
Bash Question

Get the last updated file in HDFS

I want the most recently updated file from one of my HDFS directories. The code should loop through the directory and its subdirectories and return the full path, including the file name, of the latest file. I was able to get the latest file on the local file system, but I'm not sure how to do it for HDFS.

find /tmp/sdsa -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head

The above code works for the local file system. I am able to get the date, time, and file name from HDFS, but how do I get the latest file using these three parameters?

This is the code I tried:

hadoop fs -ls -R /tmp/apps | awk -F" " '{print $6" "$7" "$8}'

Any help will be appreciated.

Thanks in advance.

Answer Source

This one worked for me:

hadoop fs -ls -R /tmp/app | awk '$1 ~ /^-/ {print $6" "$7" "$8}' | sort -r | head -1 | cut -d" " -f3

The output is the entire file path.
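
If you need this more than once, the same pipeline can be wrapped in a small shell function. This is only a sketch: the function name latest_hdfs_file is made up for illustration, and it assumes the usual eight-column `hadoop fs -ls -R` listing (permissions, replication, owner, group, size, date, time, path) with dates printed as YYYY-MM-DD HH:MM, so that a plain reverse text sort puts the newest entry first. It skips directories and keeps paths that contain single spaces.

```shell
# Hypothetical helper: reads `hadoop fs -ls -R` output on stdin and prints
# the path of the most recently modified file.
# Assumes 8 columns: perms, replication, owner, group, size, date, time, path.
latest_hdfs_file() {
  awk '$1 ~ /^-/ {                      # keep regular files, skip directories
    path = $8
    for (i = 9; i <= NF; i++)           # rejoin paths split on spaces
      path = path " " $i                # (runs of spaces collapse to one)
    print $6, $7, path
  }' |
    LC_ALL=C sort -r |                  # ISO-style dates sort correctly as text
    head -n 1 |
    cut -d" " -f3-                      # drop date and time, keep the path
}
```

Call it as `hadoop fs -ls -R /tmp/apps | latest_hdfs_file`. If two files share the same modification minute, the one whose path sorts last is returned, since the listing has no finer timestamp to break the tie.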

