user1207289 - 1 year ago
Groovy Question

Running the Hadoop wordCount example with Groovy

I was trying to run the wordCount example with Groovy using this, but encountered an error:

Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

I found this for the above error, but could not locate a pom.xml file in my setup.

Then I came across this. How do we run this in Hadoop? Is it by making a jar file and running it the same way as the Java example (which ran fine)?

What is the difference between running a Groovy example using
and by using this file (not sure how to run this) and
? Why would we use one method over the others?

I've installed Hadoop 2.7.1 on Mac OS X 10.10.3.

Answer Source

I was able to run this Groovy file with Hadoop 2.7.1. The procedure I followed is:

  1. Install Gradle.
  2. Generate the jar file using Gradle. I asked this question, which helped me build the dependencies in Gradle.
  3. Run it with Hadoop the same way as a Java jar file, using this command from the folder where the jar is located:

    hadoop jar buildSrc-1.0.jar in1 out4

where in1 is the input file and out4 is the output folder in HDFS.
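Step 2 could be sketched roughly as the build.gradle below. This is an illustration under assumptions, not the build file actually used for this answer: the dependency versions, the use of `compileOnly` for Hadoop, and the fat-jar approach are all hypothetical choices, and it assumes a reasonably recent Gradle. Two things can be inferred from the command above: since no main class is passed to `hadoop jar`, the jar's manifest presumably declares one, and the jar name suggests a project called buildSrc with version 1.0.

```groovy
// build.gradle: a minimal sketch under assumed versions, not the original build file
plugins {
    id 'groovy'
}

version = '1.0'   // the jar is named <project-name>-1.0.jar

repositories {
    mavenCentral()
}

dependencies {
    // Groovy runtime (assumed version); bundled into the jar below,
    // since a Hadoop installation does not ship Groovy
    implementation 'org.codehaus.groovy:groovy-all:2.4.15'
    // Hadoop classes are only needed to compile; `hadoop jar` supplies them at run time
    compileOnly 'org.apache.hadoop:hadoop-client:2.7.1'
}

jar {
    // lets `hadoop jar <jar> in1 out4` find the entry point without naming it
    manifest {
        attributes 'Main-Class': 'CountGroovyJob'
    }
    // unpack the runtime dependencies (here just Groovy) into the jar
    from {
        configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
    }
    duplicatesStrategy = DuplicatesStrategy.EXCLUDE
}
```

With a file like this, `gradle jar` would place the jar under build/libs/, which is the folder to run the `hadoop jar` command from.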

EDIT: As the above link is broken, I am pasting the Groovy file here.

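import org.apache.hadoop.conf.Configured
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.Mapper
import org.apache.hadoop.mapreduce.Reducer
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.hadoop.util.Tool
import org.apache.hadoop.util.ToolRunner

class CountGroovyJob extends Configured implements Tool {
    int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "StartsWithCount")
        // lets Hadoop locate the jar that contains this job's classes
        job.setJarByClass(CountGroovyJob)

        // configure output and input source
        TextInputFormat.addInputPath(job, new Path(args[0]))
        job.setInputFormatClass(TextInputFormat)

        // configure mapper and reducer
        job.setMapperClass(GroovyMapper)
        job.setReducerClass(GroovyReducer)

        // configure output
        TextOutputFormat.setOutputPath(job, new Path(args[1]))
        job.setOutputFormatClass(TextOutputFormat)
        job.setOutputKeyClass(Text)
        job.setOutputValueClass(IntWritable)

        return job.waitForCompletion(true) ? 0 : 1
    }

    static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new CountGroovyJob(), args))
    }

    // nested classes are static so Hadoop can instantiate them by reflection
    static class GroovyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable countOne = new IntWritable(1);
        private final Text reusableText = new Text();

        // emit (word, 1) for every whitespace-separated token in the line
        protected void map(LongWritable key, Text value, Mapper.Context context) {
            value.toString().tokenize().each {
                reusableText.set(it)
                context.write(reusableText, countOne)
            }
        }
    }

    static class GroovyReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
        private IntWritable outValue = new IntWritable();

        // sum the counts emitted for each word
        protected void reduce(Text key, Iterable<IntWritable> values, Reducer.Context context) {
            int sum = 0
            for (IntWritable count : values) {
                sum += count.get()
            }
            outValue.set(sum)
            context.write(key, outValue);
        }
    }
}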
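
To see what the job above is meant to compute without starting Hadoop, the same map and reduce steps can be mimicked on an in-memory list in plain Groovy. This is only a local sketch: the names `mapPhase` and `reducePhase` and the sample input are made up for illustration, and no Hadoop classes are involved.

```groovy
// Local sketch of the word-count logic; no Hadoop classes involved.

// Map phase: emit a [word, 1] pair for every whitespace-separated token.
def mapPhase = { List<String> lines ->
    lines.collectMany { line -> line.tokenize().collect { word -> [word, 1] } }
}

// Reduce phase: group the pairs by word and sum the counts in each group.
def reducePhase = { List pairs ->
    pairs.groupBy { it[0] }.collectEntries { word, group ->
        [(word): group.sum { it[1] }]
    }
}

// Hypothetical sample input standing in for the lines of the file in1.
def counts = reducePhase(mapPhase(['hello world', 'hello hadoop']))
println counts
```

In the real job Hadoop performs the grouping between the two phases, so the reducer only has to sum the values it receives for each key.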