Yazad Khambata - 22 days ago
Java Question

Dump Avro bytes without schema

I am writing a simple Java tool that dumps the contents of a Kafka topic to the console. The record values are Avro objects, which I receive as byte arrays. Is there a way to print one such Avro object in a human-readable format?

KafkaConsumer<String, byte[]> kafkaConsumer = createConsumer(); // Create a consumer with my config

ConsumerRecords<String, byte[]> records = kafkaConsumer.poll(200);

for (ConsumerRecord<String, byte[]> record : records) {
    byte[] myAvroDataAsBytes = record.value();
    // TODO: How do I print these bytes without knowing the schema?
}


In the snippet above, I am looking for a way to print the contents of myAvroDataAsBytes without knowing the schema of the Avro object.

Answer

As stated in the documentation, it is not possible to parse the data without providing a schema.

[...] Avro data itself is not tagged with type information. The schema is required to parse data.

Unlike Protocol Buffers, which tags each encoded value with a field number and wire type, Avro stores no field information in the serialized data. This is a design choice and cannot be circumvented.
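To make the point concrete, here is a minimal sketch that uses only the JDK rather than the Avro library. It hand-decodes the same two bytes under two different schemas, following the Avro binary encoding rules (int and long are zig-zag variable-length integers, a string is a length followed by UTF-8 bytes, and a record is its fields concatenated in order). The class name, the two schemas and their field names are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class AvroAmbiguityDemo {

    // Reads one Avro int/long: a variable-length, zig-zag encoded integer.
    public static long readLong(ByteBuffer in) {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            raw |= (long) (b & 0x7F) << shift; // low 7 bits carry data
            shift += 7;
        } while ((b & 0x80) != 0);             // high bit set means more bytes follow
        return (raw >>> 1) ^ -(raw & 1);       // undo the zig-zag mapping
    }

    // Reads one Avro string: a length (encoded as a long) followed by UTF-8 bytes.
    public static String readString(ByteBuffer in) {
        int length = (int) readLong(in);
        byte[] utf8 = new byte[length];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Hypothetical schema A: record { string name; }
    public static String decodeAsSchemaA(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        return "{name=" + readString(in) + "}";
    }

    // Hypothetical schema B: record { int x; int y; }
    public static String decodeAsSchemaB(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        long x = readLong(in);
        long y = readLong(in);
        return "{x=" + x + ", y=" + y + "}";
    }

    public static void main(String[] args) {
        byte[] data = {0x02, 0x41};
        System.out.println(decodeAsSchemaA(data)); // prints {name=A}
        System.out.println(decodeAsSchemaB(data)); // prints {x=1, y=-33}
    }
}
```

Both readings consume the input completely and neither is detectably wrong, so nothing in the bytes tells a tool which one was intended. Once the writer's schema is available, the Avro Java library's GenericDatumReader together with DecoderFactory is the usual way to turn the bytes into a printable record.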

Many articles have been written about the pros and cons of this approach. "Schema evolution in Avro, Protocol Buffers and Thrift" by Martin Kleppmann is a very good introduction to how things work under the hood and what that entails.