change change - 1 year ago 128
Apache Configuration Question

Nutch - Job Failed - ERROR mapred.FileOutputCommitter - Mkdirs failed to create file

I am trying to follow the simple steps in the Nutch tutorial. This is my first time using Nutch.

Everything goes well until I execute the following command:

bin/nutch crawl bin/urls -dir crawl -depth 3 -topN 5 -threads 1

which gives me the following error:

log4j:ERROR setFile(null,true) call failed /usr/local/nutch/framework/apache-nutch-1.6/logs/hadoop.log (No such file or directory)
at Method)
at org.apache.log4j.FileAppender.setFile(
at org.apache.log4j.FileAppender.activateOptions(
at org.apache.log4j.DailyRollingFileAppender.activateOptions(
at org.apache.log4j.config.PropertySetter.activate(
at org.apache.log4j.config.PropertySetter.setProperties(
at org.apache.log4j.config.PropertySetter.setProperties(
at org.apache.log4j.PropertyConfigurator.parseAppender(
at org.apache.log4j.PropertyConfigurator.parseCategory(
at org.apache.log4j.PropertyConfigurator.configureRootCategory(
at org.apache.log4j.PropertyConfigurator.doConfigure(
at org.apache.log4j.PropertyConfigurator.doConfigure(
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(
at org.apache.log4j.LogManager.<clinit>(
at org.slf4j.impl.Log4jLoggerFactory.getLogger(
at org.slf4j.LoggerFactory.getLogger(
at org.slf4j.LoggerFactory.getLogger(
at org.apache.nutch.crawl.Crawl.<clinit>(
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = bin/urls
threads = 1
depth = 3
topN = 5
Injector: starting at 2013-04-02 19:08:03
Injector: crawlDb: crawl/crawldb
Injector: urlDir: bin/urls
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 0
Injector: total number of urls injected after normalization and filtering: 1
Injector: Merging injected urls into crawl db.
Exception in thread "main" Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(
at org.apache.nutch.crawl.Injector.inject(
at org.apache.nutch.crawl.Crawl.main(

My bin directory has:

  1. nutch

  2. crawl

  3. urls/seeds.txt

Not sure where the problem is.

The log has the following error:

2013-04-03 17:33:18,370 ERROR mapred.FileOutputCommitter - Mkdirs failed to create file:/usr/local/nutch/framework/apache-nutch-1.6/bin/crawl/crawldb/1971189408/_temporary

2013-04-03 17:33:21,394 WARN mapred.LocalJobRunner - job_local_0002 The temporary job-output directory file:/usr/local/nutch/framework/apache-nutch-1.6/bin/crawl/crawldb/1971189408/_temporary doesn't exist!

Answer Source

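The issue was with -dir crawl.

You need to specify a correct directory path/name. In the log above, the output path resolved to /usr/local/nutch/framework/apache-nutch-1.6/bin/crawl/crawldb, and the bin directory already contains an entry named crawl, so the crawl directory presumably could not be created there.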
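One way to rule this out before starting a crawl is to check that the path passed to -dir is either an existing directory or can be created. The sketch below is only an illustration: check_crawl_dir is a hypothetical helper, not part of Nutch, and it creates the directory as a side effect when the path is free.

```shell
#!/bin/sh
# check_crawl_dir: hypothetical helper (not part of Nutch).
# Succeeds if the given path can serve as a crawl output directory,
# i.e. it is already a directory or can be created with mkdir -p.
check_crawl_dir() {
    dir=$1

    # Already a directory: nothing to do.
    if [ -d "$dir" ]; then
        return 0
    fi

    # Exists but is not a directory (for example a script named "crawl").
    if [ -e "$dir" ]; then
        echo "'$dir' exists but is not a directory" >&2
        return 1
    fi

    # Try to create it; this also fails if a parent component is a regular
    # file or the parent directory is not writable.
    mkdir -p "$dir" 2>/dev/null
}
```

For example, check_crawl_dir /some/output/dir could be run first (the path here is a placeholder), and the same path then passed to -dir in the bin/nutch crawl command from the question.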
