devopslife - 6 months ago
Java Question

Avro error on AWS EMR

I'm using spark-redshift, which uses Avro for data transfer.

Reading from Redshift works, but writing fails with:

Caused by: java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter

I tried both Amazon EMR 4.1.0 (Spark 1.5.0) and 4.0.0 (Spark 1.4.1).
I also cannot do

import org.apache.avro.generic.GenericData.createDatumWriter

either; only the following import works:

import org.apache.avro.generic.GenericData

I'm using the Scala shell.
I tried downloading several other avro-mapred and avro jars, tried setting


and adding those jars to the Spark classpath. Possibly Hadoop (EMR) needs to be tuned somehow.

Does this ring a bell for anyone?
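
A NoSuchMethodError at runtime generally means the class was loaded from a jar that does not contain the method the calling code was compiled against. A minimal sketch for checking which Avro jars on a node expose createDatumWriter, assuming a JDK with javap is installed; jar_has_method and check_avro_jars are hypothetical helper names, and the JAVAP variable is only an override hook for pointing at a specific javap binary. The failed import above is probably not informative on its own, since Scala can only import static members of a Java class and createDatumWriter appears to be an instance method.

```shell
#!/bin/sh
# jar_has_method JAR CLASS METHOD
# Succeeds if javap lists METHOD as a member of CLASS when loaded from JAR.
# Fails if the method is absent or the class is not in that jar at all.
jar_has_method() {
    "${JAVAP:-javap}" -classpath "$1" "$2" 2> /dev/null \
        | grep -q "[[:space:]]$3("
}

# check_avro_jars [ROOT]
# Report, for every jar named avro*.jar under ROOT (default /), whether
# GenericData.createDatumWriter is visible in it. Nothing is modified.
check_avro_jars() {
    find "${1:-/}" -name 'avro*.jar' -type f 2> /dev/null |
    while IFS= read -r jar; do
        if jar_has_method "$jar" org.apache.avro.generic.GenericData createDatumWriter; then
            echo "found:     $jar"
        else
            echo "not found: $jar"
        fi
    done
}

# Example (read-only, may take a while on a full filesystem):
# check_avro_jars /
```

A "not found" line covers both an older avro jar and a jar such as avro-mapred that may not contain GenericData at all, so the output is a hint about where to look rather than a verdict.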


Just for reference, here is the workaround by Alex Nastetsky:

Delete the Avro jars from the master node:

find / -name "*avro*jar" 2> /dev/null -print0 | xargs -0 -I file sudo rm file

Delete the Avro jars from the slave nodes:

yarn node -list | sed 's/ .*//g' | tail -n +3 | sed 's/:.*//g' | xargs -I node ssh node "find / -name '*avro*jar' 2> /dev/null -print0 | xargs -0 -I file sudo rm file"
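
Because the pattern `*avro*jar` matches every jar with "avro" anywhere in its name, not only the Avro library itself, a dry run may be worth doing before deleting anything. A small sketch, where list_avro_jars is a hypothetical helper name and the default root of / mirrors the commands above:

```shell
#!/bin/sh
# list_avro_jars [ROOT]
# Print every file under ROOT (default /) that the workaround's pattern
# would match, without deleting anything. Errors such as "permission
# denied" are discarded, as in the original command.
list_avro_jars() {
    find "${1:-/}" -name '*avro*jar' -type f 2> /dev/null | sort
}

# Example: review the list on the current node before running the rm step.
# list_avro_jars /
```

The same function body could be run over ssh on each node in place of the rm step to review what would be removed there.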

Setting the configs correctly, as proposed by Jonathan, is worth a shot too.