Bohn Bohn - 7 months ago
Java Question

Hadoop is asking for the input path to be on localhost 9000

I am trying to run Tom White's Chapter 2 example.

When I run the command:

hadoop MaxTemperature input/ncdc/sample.txt output


The error I am getting is this:

11/12/31 18:08:28 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-mymac/mapred/staging/mymac/.staging/job_201112311807_0001
11/12/31 18:08:28 ERROR security.UserGroupInformation: PriviledgedActionException as:mymac (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/mymac/input/ncdc/sample.txt
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/mymac/input/ncdc/sample.txt


What is it that I have set wrong?

I haven't touched his source code; it can be found here:

https://github.com/tomwhite/hadoop-book/tree/3e/ch02

Answer

Your core-site.xml and hdfs-site.xml files are configured to use localhost:9000. If that isn't what you expect (which is what I gather from your post's title), what did you expect?
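For reference, the `localhost:9000` comes from the `fs.default.name` property (called `fs.defaultFS` in newer releases); a typical single-node core-site.xml looks something like this (the exact value here is an assumption based on your error message):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- All relative and unqualified paths are resolved against this filesystem -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```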

What version of Hadoop are we talking about? How did you install your Hadoop distribution? From your other question and the config files, I'm guessing you used CDH4. If you look over the instructions from Cloudera, can you see if you missed anything?

Before starting Hadoop, did you format HDFS?

$ hadoop namenode -format

Then, after starting Hadoop, do you get anything other than INFO messages?

Did you copy the input data into HDFS?

$ hadoop dfs -put /tmp/my/input/data input
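Note that a relative destination like `input` ends up under your HDFS home directory, `/user/<username>`, which is why your error message shows `hdfs://localhost:9000/user/mymac/input/ncdc/sample.txt`. A plain-Java sketch of that resolution (the user name `mymac` is taken from your log; no Hadoop classes involved):

```java
import java.net.URI;

public class HdfsPathResolution {
    public static void main(String[] args) {
        // Value of fs.default.name from core-site.xml, per the error message
        URI defaultFs = URI.create("hdfs://localhost:9000/");
        String user = "mymac";                      // Hadoop defaults to the OS user
        String relative = "input/ncdc/sample.txt";  // the path passed on the command line
        // Relative paths are resolved against the user's HDFS home: /user/<name>
        URI resolved = defaultFs.resolve("/user/" + user + "/" + relative);
        System.out.println(resolved);
        // prints hdfs://localhost:9000/user/mymac/input/ncdc/sample.txt
    }
}
```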

Finally, what do you get from simple HDFS commands such as:

$ hadoop dfs -ls /

UPDATE: Run Word Count

  1. Get HDFS up and running. Running hadoop dfs -ls / should work.
  2. Copy a folder with text file(s) into HDFS: hadoop dfs -put text_files input_folder
  3. Run hadoop dfs -ls . to see if your files got copied correctly.
  4. Find the hadoop-examples-X.Y.Z.jar file on your system.
  5. Navigate to whatever directory it's in, and run:

    $ hadoop jar hadoop-examples-*.jar wordcount input_folder output_folder

  6. You should see the progress of the MapReduce application.

  7. When it's finished, view the output with hadoop dfs -cat output_folder/*.
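As a sanity check on what the example computes, word count is just a per-word grouped sum; here's a minimal plain-Java sketch with a made-up input line, independent of Hadoop:

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {
    public static void main(String[] args) {
        // Hypothetical input line standing in for the contents of text_files
        String text = "the quick brown fox jumps over the lazy dog the end";
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.split("\\s+")) {
            counts.merge(word, 1, Integer::sum);  // the "reduce" step: sum the 1s per word
        }
        // Same tab-separated word/count format the MapReduce job writes
        counts.forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```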