I am new to the Hadoop ecosystem.
I recently tried Hadoop (2.7.1) on a single-node cluster without any problems and decided to move on to a multi-node cluster with 1 namenode and 2 datanodes.
However, I am facing a weird issue. Whatever job I try to run gets stuck with the following message:
on the web interface:
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register
and on the console:
16/01/05 17:52:53 INFO mapreduce.Job: Running job: job_1451083949804_0001
I finally got this solved. Posting detailed steps for future reference (for test environments only).
Hadoop (2.7.1) Multi-Node cluster configuration
execute these commands in a new terminal
[on all machines] ↴
stop-dfs.sh;stop-yarn.sh;jps
rm -rf /tmp/hadoop-$USER
[on Namenode/master only] ↴
rm -rf ~/hadoop_store/hdfs/datanode
[on Datanodes/slaves only] ↴
rm -rf ~/hadoop_store/hdfs/namenode
[on all machines] Add IP addresses and corresponding Host names for all nodes in the cluster.
sudo nano /etc/hosts
xxx.xxx.xxx.xxx master
xxx.xxx.xxx.xxy slave1
xxx.xxx.xxx.xxz slave2
# Additionally, you may need to remove lines like "xxx.xxx.xxx.xxx localhost", "xxx.xxx.xxx.xxy localhost", "xxx.xxx.xxx.xxz localhost" etc. if they exist.
# However, it's okay to keep lines like "127.0.0.1 localhost" and others.
[on all machines] Configure iptables
Allow the default or custom ports that you plan to use for the various Hadoop daemons through the firewall.
Or, much easier (test environment only), disable the firewall:
on RedHat like distros (Fedora, CentOS)
sudo systemctl disable firewalld
sudo systemctl stop firewalld
on Debian like distros (Ubuntu)
sudo ufw disable
[on Namenode/master only] Gain ssh access from the Namenode (master) to all Datanodes (slaves).
ssh-copy-id -i ~/.ssh/id_rsa.pub $USER@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub $USER@slave2
Confirm things by running ssh slave1, ssh slave2, etc.; you should get a shell on the slave without a password prompt. (Remember to exit each of your ssh sessions by typing exit or closing the terminal. To be on the safer side, I also made sure that all nodes were able to access each other, not just the Namenode/master.)
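Note that ssh-copy-id needs an existing key pair on the master; if you don't have one yet, a minimal sketch to create it (the path and flags are the OpenSSH defaults; the empty passphrase is acceptable only in a test environment):

```shell
# Create ~/.ssh and an RSA key pair with an empty passphrase
# (test environment only) -- skipped if a key already exists.
mkdir -p ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa -q
```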
[on all machines] edit core-site.xml file
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
    <description>NameNode URI</description>
  </property>
</configuration>
[on all machines] edit yarn-site.xml file
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
    <description>The hostname of the RM.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
[on all machines] modify slaves file, remove the text "localhost" and add slave hostnames
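With the hostnames from /etc/hosts above, the slaves file (in Hadoop's etc/hadoop directory) would simply contain one slave hostname per line:

```
slave1
slave2
```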
(I guess having this only on the Namenode/master would also work, but I did it on all machines anyway. Also note that in this configuration master behaves only as a resource manager; this is how I intend it to be.)
[on all machines] edit hdfs-site.xml file; set the dfs.replication property to something > 1 (at least the number of slaves in the cluster; here I have two slaves, so I set it to 2)
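For reference, the property would sit inside hdfs-site.xml's <configuration> block and look like this (the value 2 matches the two-slave setup described here):

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.</description>
</property>
```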
[on Namenode/master only] (re)format the HDFS through namenode
hdfs namenode -format
[on Namenode/master only] remove the dfs.datanode.data.dir property from master's hdfs-site.xml file.
[on Datanodes/slaves only] remove the dfs.namenode.name.dir property from each slave's hdfs-site.xml file.
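After that pruning, the node-specific parts of hdfs-site.xml end up looking roughly like this (the file: paths are an assumption based on the ~/hadoop_store directories used above; substitute your own user's home directory):

```xml
<!-- master's hdfs-site.xml: namenode dir only -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/your_user/hadoop_store/hdfs/namenode</value>
</property>

<!-- each slave's hdfs-site.xml: datanode dir only -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/your_user/hadoop_store/hdfs/datanode</value>
</property>
```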
TESTING (execute only on Namenode/master)
start-dfs.sh;start-yarn.sh
echo "hello world hello Hello" > ~/Downloads/test.txt
hadoop fs -mkdir /input
hadoop fs -put ~/Downloads/test.txt /input
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
Wait a few seconds and the mapper and reducer should begin.
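Once the job completes, the counts land in /output (e.g. hadoop fs -cat /output/part-r-00000). If you want to sanity-check what wordcount should produce for this input without the cluster, plain shell gives the same counts; note that "Hello" and "hello" are counted as distinct words:

```shell
# Split the test sentence into one word per line, then count duplicates.
# LC_ALL=C keeps "Hello" and "hello" in separate groups for sort/uniq.
# Prints counts: 1 Hello, 2 hello, 1 world
echo "hello world hello Hello" | tr ' ' '\n' | LC_ALL=C sort | uniq -c
```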
These links helped me with the issue: