user1207289 - 1 month ago
Groovy Question

running hadoop wordCount example with groovy

I was trying to run the wordCount example with Groovy using this, but encountered the following error:

Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected


I found this for the above error, but could not locate a pom.xml file in my setup.

Then I came across this. How do we run it in Hadoop? Do we build a jar file and run it the same way as the Java example (which ran fine)?

What is the difference between running a Groovy example using groovy-hadoop, using this file (not sure how to run it), and using hadoop-streaming? Why would we use one method over the others?

I've installed Hadoop 2.7.1 on Mac OS X 10.10.3.

Answer

I was able to run this Groovy file with Hadoop 2.7.1. The procedure I followed was:

  1. Install Gradle.
  2. Generate the jar file using Gradle. I asked this question, which helped me build the dependencies in Gradle.
  3. Run it with Hadoop as you would a Java jar, using this command from the folder where the jar is located:

    hadoop jar buildSrc-1.0.jar in1 out4

where in1 is the input file and out4 is the output folder in HDFS.
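For step 2, a minimal build.gradle sketch could look like the following. The Groovy and Hadoop versions are assumptions matching my setup; adjust them to yours. (The project folder here happens to be named buildSrc, which is why the jar is buildSrc-1.0.jar.)

```groovy
// build.gradle - minimal sketch; versions are assumptions, adjust to your setup
apply plugin: 'groovy'

version = '1.0'

repositories {
    mavenCentral()
}

dependencies {
    // the Groovy runtime used to compile the job
    compile 'org.codehaus.groovy:groovy-all:2.4.5'
    // Hadoop client classes; provided by the cluster at runtime
    compile 'org.apache.hadoop:hadoop-client:2.7.1'
}
```

Note that the plain `jar` task produces a thin jar, so the Groovy runtime must also be visible to Hadoop at run time (for example via HADOOP_CLASSPATH, or by building a fat jar instead).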

EDIT - As the above link is broken, I am pasting the Groovy file here.

import org.apache.hadoop.conf.Configured
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.Mapper
import org.apache.hadoop.mapreduce.Reducer
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.hadoop.util.Tool
import org.apache.hadoop.util.ToolRunner


class CountGroovyJob extends Configured implements Tool {
    @Override
    int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "StartsWithCount")
        job.setJarByClass(getClass())

        // configure output and input source
        TextInputFormat.addInputPath(job, new Path(args[0]))
        job.setInputFormatClass(TextInputFormat)

        // configure mapper and reducer
        job.setMapperClass(GroovyMapper)
        job.setCombinerClass(GroovyReducer)
        job.setReducerClass(GroovyReducer)

        // configure output
        TextOutputFormat.setOutputPath(job, new Path(args[1]))
        job.setOutputFormatClass(TextOutputFormat)
        job.setOutputKeyClass(Text)
        job.setOutputValueClass(IntWritable)

        return job.waitForCompletion(true) ? 0 : 1
    }

    static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new CountGroovyJob(), args))
    }

    // static, so Hadoop can instantiate it without an enclosing instance
    static class GroovyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable countOne = new IntWritable(1);
        private final Text reusableText = new Text();

        @Override
        protected void map(LongWritable key, Text value, Mapper.Context context) {
            value.toString().tokenize().each {
                reusableText.set(it)
                context.write(reusableText,countOne)
            }
        }
    }

    // static, so Hadoop can instantiate it without an enclosing instance
    static class GroovyReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
        private IntWritable outValue = new IntWritable();
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Reducer.Context context) {
            // IntWritable exposes get(), not a .value property
            outValue.set(values.collect({ it.get() }).sum())
            context.write(key, outValue);
        }
    }
}
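For anyone who wants to see what the mapper/combiner/reducer compute without standing up a cluster, the same tokenize-and-count logic can be sketched in plain Java (since the question compares against the Java example). This is an illustration only; `WordCountSketch` is a made-up name, and the real job additionally shuffles and sorts by key between the map and reduce phases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Local sketch of the job's logic: split on whitespace (the map side),
// then sum a count of 1 per occurrence of each token (the reduce side).
public class WordCountSketch {
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;        // skip empty tokens
            counts.merge(token, 1, Integer::sum); // reducer: sum the 1s
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("to be or not to be"));
    }
}
```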