Uknight Uknight - 21 days ago 6
Android Question

GZIP Compression Java/C# difference in compression issue

I'm adding in compression to my project with the aim of improving speed in the 3G Data communication from Android app to ASP.NET C# Server.

The methods I've researched/written/tested works. However, there's added white space after compression. And they are different as well. This really puzzles me.

Is it something to do with different implementation of the GZIP classes in both Java/ASP.NET C#? Is it something that I should be concerned with or do I just move on with .Trim() and .trim() after decompressing?



Java, compressing "Mary had a little lamb" gives:

Compressed data length: 42

Base64 Compressed String: H4sIAAAAAAAAAPNNLKpUyEhMUUhUyMksKclJVchJzE0CAHrIujIWAAAA

protected static byte[] GZIPCompress(byte[] data) {
try {
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
GZIPOutputStream gZIPOutputStream = new GZIPOutputStream(byteArrayOutputStream);

gZIPOutputStream.write(data);
gZIPOutputStream.close();

return byteArrayOutputStream.toByteArray();
} catch(IOException e) {
Log.i("output", "GZIPCompress Error: " + e.getMessage());
return null;
}
}



ASP.NET C#, compressing "Mary had a little lamb"

Compressed data length: 137

Base64 Compressed String: H4sIAAAAAAAEAO29B2AcSZYlJi9tynt/SvVK1+B0oQiAYBMk2JBAEOzBiM3mkuwdaUcjKasqgcplVmVdZhZAzO2dvPfee++999577733ujudTif33/8/XGZkAWz2zkrayZ4hgKrIHz9+fB8/Ir7I6ut0ns3SLC2Lti3ztMwWk/8Hesi6MhYAAAA=

public static byte[] GZIPCompress(byte[] data)
{
using (MemoryStream memoryStream = new MemoryStream())
{
using (GZipStream gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
{
gZipStream.Write(data, 0, data.Length);
}

return memoryStream.ToArray();
}
}

Answer

I get 42 bytes on .NET as well. I suspect you're using an old version of .NET which had a flaw in its compression scheme.

Here's my test app using your code:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        var uncompressed = Encoding.UTF8.GetBytes("Mary had a little lamb");
        var compressed = GZIPCompress(uncompressed);
        Console.WriteLine(compressed.Length);
        Console.WriteLine(Convert.ToBase64String(compressed));
    }

    static byte[] GZIPCompress(byte[] data)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var gZipStream = new GZipStream(memoryStream, 
                                                   CompressionMode.Compress))
            {
                gZipStream.Write(data, 0, data.Length);
            }

            return memoryStream.ToArray();
        }
    }
}

Results:

42
H4sIAAAAAAAEAPNNLKpUyEhMUUhUyMksKclJVchJzE0CAHrIujIWAAAA

This is exactly the same as the Java data.

I'm using .NET 4.5. I suggest you try running the above code on your machine, and compare the results.

I've just decompressed the base64 data you provided, and it is a valid "compressed" form of "Mary had a little lamb", with 22 bytes in the uncompressed data. That surprises me... and reinforces my theory that it's a framework version difference.

EDIT: Okay, this is definitely a framework version difference. If I compile with the .NET 3.5 compiler, then use an app.config which forces it to run with that version of the framework, I see 137 bytes as well. Given comments, it looks like this was only fixed in .NET 4.5.

Comments