I am trying to set up and run Apache Nutch 2.2.1 on my Ubuntu desktop. As a newcomer, I found some parts of the tutorial on the official website a bit confusing.
- If I were to run it on my own desktop, is it correct to go to the
to run the bin/nutch command?
- Where should I put the folder named urls (which contains the seed list seed.txt)? Is it under
Assuming I am in the right directory, I get the following error when I run the command:
```
$ bin/nutch crawl urls -dir crawl -depth 1
InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 0
Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: null, jobid=job_local1613558008_0002
```
I am following the tutorial at http://wiki.apache.org/nutch/NutchTutorial
and have not yet configured Gora, HBase, etc.
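It seems that the problem arises because the injector did not receive any URLs: both the "rejected by filters" and the "injected" counts are 0, so the generate job has nothing to work on.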
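For reference, here is a minimal shell sketch of the sanity check I have in mind, assuming it is run from the same directory where I invoke bin/nutch and that the seed directory is urls, as in the command above. The seed URL is a hypothetical placeholder, and skipping lines that start with '#' is my own assumption rather than something the tutorial states. The script creates urls/seed.txt only if it is missing or empty, counts the non-blank, non-comment lines, and runs the crawl only when at least one URL is present and bin/nutch is executable.

```shell
#!/bin/sh
# Hypothetical sanity check; run it from the directory where bin/nutch is invoked.
# SEED_DIR matches the "urls" argument passed to "bin/nutch crawl".
SEED_DIR=urls
SEED_FILE="$SEED_DIR/seed.txt"

# Create the seed directory, and write a one-line seed list only if
# seed.txt is missing or empty. The URL is a placeholder (assumption).
mkdir -p "$SEED_DIR"
if [ ! -s "$SEED_FILE" ]; then
  echo "http://nutch.apache.org/" > "$SEED_FILE"
fi

# Count lines that are neither blank nor start with '#'.
# grep -c prints 0 (and exits 1) when nothing matches, which is fine here.
count=$(grep -cv -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$SEED_FILE")
echo "seed URLs found in $SEED_FILE: $count"

# Only start the crawl when there is something to inject.
if [ "$count" -gt 0 ] && [ -x bin/nutch ]; then
  bin/nutch crawl "$SEED_DIR" -dir crawl -depth 1
fi
```

If the printed count is 0, the injector would have nothing to read, which would match the two zero counts in the log.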
Does anyone know how to solve this problem? Thanks a lot!