kamoor - 1 month ago
Java Question

How to disable the native zlib compression library in Hadoop

I have a large number of files stored in gz format, and I am trying to run a MapReduce program (using Pig) that reads them. The problem I am running into is that Hadoop's native decompressor (ZlibDecompressor) fails to decompress some of these files because of a data check error, yet I can read the same files successfully with Java's GZIPInputStream. So my question is: is there a way to disable the native zlib library? Or is there an alternative GzipCodec in Hadoop (2.7.2) that I can use to decompress gzip input files?
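To confirm which files read cleanly with plain Java, a check along the lines of the GZIPInputStream test described above could look like this minimal sketch (the class and method names are made up for illustration). It decompresses each file to the end; GZIPInputStream verifies the CRC-32 and length trailer of the gzip data, so a file that fails that check ends up in the FAILED branch instead of being counted.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: decompresses gzip files with the JDK only (no Hadoop, no native zlib).
public class GzipCheck {

    /** Reads the whole gzip stream and returns the number of uncompressed bytes.
     *  GZIPInputStream checks the CRC-32/size trailer at the end of the data,
     *  so corruption surfaces as an IOException (typically a ZipException). */
    public static long countUncompressedBytes(InputStream in) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new BufferedInputStream(in))) {
            byte[] buf = new byte[64 * 1024];
            long total = 0;
            int n;
            while ((n = gz.read(buf)) != -1) {
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        // Check every file named on the command line and report the result per file.
        for (String path : args) {
            try (InputStream in = new FileInputStream(path)) {
                System.out.println(path + ": OK, " + countUncompressedBytes(in) + " bytes");
            } catch (IOException e) {
                System.out.println(path + ": FAILED, " + e);
            }
        }
    }
}
```

Usage: `java GzipCheck part-00000.gz part-00001.gz`. Files reported as OK here but failing in the Hadoop job point at the native decompressor rather than at the data.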

The error is given below:

org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1475882463863_0108_m_000022_0 - exited : java.io.IOException: incorrect data check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)


Thank you very much for your help.

Answer

I found the answer myself. You can set the following property to disable the native libraries; GzipCodec then falls back to its built-in Java decompressor:

io.native.lib.available=false;
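If the job is built from a Java driver rather than Pig, a minimal sketch of setting the same property programmatically could look like this, assuming the Hadoop 2.x client libraries are on the classpath (the class and job names are hypothetical).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver helper: creates a job that will not use the native zlib code.
public class NoNativeZlibDriver {

    public static Job newJob() throws IOException {
        Configuration conf = new Configuration();
        // Set before Job.getInstance, which takes its own copy of the configuration.
        conf.setBoolean("io.native.lib.available", false);
        return Job.getInstance(conf, "read-gz-without-native-zlib"); // hypothetical job name
    }
}
```

In a Pig script, the equivalent should be `set io.native.lib.available false;`, since Pig's `set` command passes Hadoop properties through to the job configuration.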

Alternatively, you can extend org.apache.hadoop.io.compress.GzipCodec to remove the native implementation only for the gzip decompressor.
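A minimal sketch of that second option, assuming Hadoop 2.x, where GzipCodec chooses between the native GzipZlibDecompressor and the pure-Java BuiltInGzipDecompressor depending on whether native zlib is loaded. The subclass name JavaGzipCodec is hypothetical; it always hands out the built-in decompressor and inherits everything else, including native compression, from GzipCodec.

```java
import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor;

/**
 * Hypothetical gzip codec that never uses the native zlib decompressor.
 * Compression is inherited unchanged from GzipCodec.
 * In practice, put it in your own package so it can be referenced by a fully qualified name.
 */
public class JavaGzipCodec extends GzipCodec {

    // Tells CodecPool which decompressor class to borrow and return for this codec.
    @Override
    public Class<? extends Decompressor> getDecompressorType() {
        return BuiltInGzipDecompressor.class;
    }

    // Always create the pure-Java decompressor (built on java.util.zip.Inflater),
    // regardless of whether native zlib was loaded.
    @Override
    public Decompressor createDecompressor() {
        return new BuiltInGzipDecompressor();
    }
}
```

To use it, the class has to be on the task classpath and listed in the io.compression.codecs property. Because it claims the same .gz extension as the stock GzipCodec, it is worth checking that CompressionCodecFactory actually resolves your input files to this codec in your Hadoop version.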