jming - 3 months ago
Java Question

Hashing raw bytes in Python and Java produces different results

I'm trying to replicate the behavior of a Python 2.7 function in Java, but I'm getting different results when running a (seemingly) identical sequence of bytes through a SHA-256 hash. The bytes are generated by manipulating a very large integer (exactly 2048 bits long) in a specific way (2nd line of my Python code example).

For my examples, the original 2048-bit integer is stored as big_int and bigInt in Python and Java respectively, and both variables contain the same number.

Python2 code I'm trying to replicate:

import hashlib
import struct
from pprint import pprint

raw_big_int = ("%x" % big_int).decode("hex")

buff = struct.pack(">i", len(raw_big_int) + 1) + "\x00" + raw_big_int

pprint("Buffer contains: " + buff)
pprint("Encoded: " + buff.encode("hex").upper())

digest = hashlib.sha256(buff).digest()

pprint("Digest contains: " + digest)
pprint("Encoded: " + digest.encode("hex").upper())

Running this code prints the following (note that the only result I'm actually interested in is the last one - the hex-encoded digest. The other 3 prints are just to see what's going on under the hood):

'Buffer contains: \x00\x00\x01\x01\x00\xe3\xbb\xd3\x84\x94P\xff\x9c\'\xd0P\xf2\xf0s,a^\xf0i\xac~\xeb\xb9_\xb0m\xa2&f\x8d~W\xa0\xb3\xcd\xf9\xf0\xa8\xa2\x8f\x85\x02\xd4&\x7f\xfc\xe8\xd0\xf2\xe2y"\xd0\x84ck\xc2\x18\xad\xf6\x81\xb1\xb0q\x19\xabd\x1b>\xc8$g\xd7\xd2g\xe01\xd4r\xa3\x86"+N\\\x8c\n\xb7q\x1c \x0c\xa8\xbcW\x9bt\xb0\xae\xff\xc3\x8aG\x80\xb6\x9a}\xd9*\x9f\x10\x14\x14\xcc\xc0\xb6\xa9\x18*\x01/eC\x0eQ\x1b]\n\xc2\x1f\x9e\xb6\x8d\xbfb\xc7\xce\x0c\xa1\xa3\x82\x98H\x85\xa1\\\xb2\xf1\'\xafmX|\x82\xe7%\x8f\x0eT\xaa\xe4\x04*\x91\xd9\xf4e\xf7\x8c\xd6\xe5\x84\xa8\x01*\x86\x1cx\x8c\xf0d\x9cOs\xebh\xbc1\xd6\'\xb1\xb0\xcfy\xd7(\x8b\xeaIf6\xb4\xb7p\xcdgc\xca\xbb\x94\x01\xb5&\xd7M\xf9\x9co\xf3\x10\x87U\xc3jB3?vv\xc4JY\xc9>\xa3cec\x01\x86\xe9c\x81F-\x1d\x0f\xdd\xbf\xe8\xe9k\xbd\xe7c5'
'Encoded: 0000010100E3BBD3849450FF9C27D050F2F0732C615EF069AC7EEBB95FB06DA226668D7E57A0B3CDF9F0A8A28F8502D4267FFCE8D0F2E27922D084636BC218ADF681B1B07119AB641B3EC82467D7D267E031D472A386222B4E5C8C0AB7711C200CA8BC579B74B0AEFFC38A4780B69A7DD92A9F101414CCC0B6A9182A012F65430E511B5D0AC21F9EB68DBF62C7CE0CA1A382984885A15CB2F127AF6D587C82E7258F0E54AAE4042A91D9F465F78CD6E584A8012A861C788CF0649C4F73EB68BC31D627B1B0CF79D7288BEA496636B4B770CD6763CABB9401B526D74DF99C6FF3108755C36A42333F7676C44A59C93EA36365630186E96381462D1D0FDDBFE8E96BBDE76335'
'Digest contains: Q\xf9\xb9\xaf\xe1\xbey\xdc\xfa\xc4.\xa9 \xfckz\xfeB\xa0>\xb3\xd6\xd0*S\xff\xe1\xe5*\xf0\xa3i'
'Encoded: 51F9B9AFE1BE79DCFAC42EA920FC6B7AFE42A03EB3D6D02A53FFE1E52AF0A369'

Now, below is my Java code so far. When I test it, I get the same value for the input buffer, but a different value for the digest. (bigInt contains a BigInteger object containing the same number as big_int in the Python example above.)

byte[] rawBigInt = bigInt.toByteArray();

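ByteBuffer buff = ByteBuffer.allocate(rawBigInt.length + 4);
// 4-byte big-endian length prefix, followed by the integer's bytes
buff.putInt(rawBigInt.length).put(rawBigInt);

System.out.print("Buffer contains: ");
System.out.println( DatatypeConverter.printHexBinary(buff.array()) );

MessageDigest hash = MessageDigest.getInstance("SHA-256");
hash.update(buff);
byte[] digest = hash.digest();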

System.out.print("Digest contains: ");
System.out.println( DatatypeConverter.printHexBinary(digest) );

Notice that in my Python example, I started the buffer off with len(raw_big_int) + 1 packed, where in Java I started with just rawBigInt.length. I also omitted the extra 0-byte ("\x00") when writing in Java. I did both of these for the same reason - in my tests, calling toByteArray() on a BigInteger returned a byte array already beginning with a 0-byte that was exactly 1 byte longer than Python's byte sequence. So, at least in my tests, len(raw_big_int) + 1 equaled rawBigInt.length, since rawBigInt began with a 0-byte and raw_big_int did not.
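The length difference described above can be reproduced with a small sketch. The class name LeadingZeroDemo and the helper encode are made up for illustration; the only library behaviour relied on is that BigInteger.toByteArray() returns a two's-complement encoding, so a positive value whose top bit is set gets an extra 0x00 sign byte in front.

```java
import java.math.BigInteger;

public class LeadingZeroDemo {
    // Hypothetical helper: parse a hex string and return BigInteger's
    // big-endian two's-complement byte encoding.
    static byte[] encode(String hex) {
        return new BigInteger(hex, 16).toByteArray();
    }

    public static void main(String[] args) {
        byte[] high = encode("e3bb"); // top bit of the first byte is set
        byte[] low = encode("63bb");  // top bit of the first byte is clear

        // A 0x00 sign byte is prepended only when the top bit is set.
        System.out.println(high.length); // 3
        System.out.println(low.length);  // 2
    }
}
```

An integer that is exactly 2048 bits long always has its top bit set, so under that assumption the extra byte should always be present. For other inputs the Python code still prepends "\x00" unconditionally, while toByteArray() might not, so the two would need to be reconciled by hand.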

Alright, that aside, here is the Java code's output:

Buffer contains: 0000010100E3BBD3849450FF9C27D050F2F0732C615EF069AC7EEBB95FB06DA226668D7E57A0B3CDF9F0A8A28F8502D4267FFCE8D0F2E27922D084636BC218ADF681B1B07119AB641B3EC82467D7D267E031D472A386222B4E5C8C0AB7711C200CA8BC579B74B0AEFFC38A4780B69A7DD92A9F101414CCC0B6A9182A012F65430E511B5D0AC21F9EB68DBF62C7CE0CA1A382984885A15CB2F127AF6D587C82E7258F0E54AAE4042A91D9F465F78CD6E584A8012A861C788CF0649C4F73EB68BC31D627B1B0CF79D7288BEA496636B4B770CD6763CABB9401B526D74DF99C6FF3108755C36A42333F7676C44A59C93EA36365630186E96381462D1D0FDDBFE8E96BBDE76335
Digest contains: E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855

As you can see, the buffer contents appear the same in both Python and Java, but the digests are obviously different. Can someone point out where I'm going wrong?

I suspect it has something to do with the strange way Python seems to store bytes - the byte-string variables show as type str in the interpreter, and when printed out by themselves have that strange format with the '\x's that is almost the same as the bytes themselves in some places, but is utter gibberish in others. I don't have enough Python experience to understand exactly what's going on here, and my searches have been fruitless.

Also, since I'm trying to port the Python code into Java, I can't just change the Python - my goal is to write Java code that takes the same input and produces the same output. I've searched around (this question in particular seemed related) but didn't find anything to help me out. Thanks in advance, if for nothing else than for reading this long-winded question! :)


In Java, you've got the data in the buffer, but the cursor positions are all wrong. After you've written your data to the ByteBuffer it looks like this, where the x's represent your data and the 0's are unwritten bytes in the buffer:

xxxxxxxxxxxxxxxxxxxx00000000000000000000000000000000000000000
                    ^ position                               ^ limit

The cursor is positioned after the data you've written. A read at this point will read from position to limit, which is the bytes you haven't written.
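The cursor state can be inspected directly. The following is a minimal sketch, not the asker's code: BufferStateDemo and stateAfterWrite are illustrative names, and the sizes (61 and 20) are chosen only to match the diagram.

```java
import java.nio.ByteBuffer;

public class BufferStateDemo {
    // Hypothetical helper: write `written` bytes into a fresh buffer of
    // `capacity` bytes and report {position, limit, remaining} without flipping.
    static int[] stateAfterWrite(int capacity, int written) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        buf.put(new byte[written]);
        return new int[] { buf.position(), buf.limit(), buf.remaining() };
    }

    public static void main(String[] args) {
        int[] partial = stateAfterWrite(61, 20);
        // position=20, limit=61, remaining=41: a read would see only unwritten bytes
        System.out.println(partial[0] + " " + partial[1] + " " + partial[2]);

        int[] full = stateAfterWrite(20, 20);
        // position=20, limit=20, remaining=0: a read would see nothing at all
        System.out.println(full[0] + " " + full[1] + " " + full[2]);
    }
}
```

The second case corresponds to a buffer allocated to exactly fit its contents, as in the question: nothing remains between position and limit, so a relative read consumes zero bytes.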

Instead, you want this:

xxxxxxxxxxxxxxxxxxxx00000000000000000000000000000000000000000
^ position          ^ limit

where the position is 0 and the limit is the number of bytes you've written. To get there, call flip(). Flipping a buffer conceptually switches it from write mode to read mode. I say "conceptually" because ByteBuffers don't have explicit read and write modes, but you should think of them as if they do.
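Applied to the hashing problem, the effect of flip() can be sketched as follows. This is an illustration under assumptions rather than a drop-in fix: FlipDigestDemo, digest, and hex are made-up names, and the three-byte payload stands in for the real 2048-bit integer. It relies on MessageDigest.update(ByteBuffer), which consumes the bytes between the buffer's position and limit.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FlipDigestDemo {
    // Hypothetical helper: upper-case hex encoding of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helper: SHA-256 over a 4-byte big-endian length prefix
    // plus the payload. `flip` decides whether the buffer is flipped first.
    static String digest(byte[] payload, boolean flip) {
        ByteBuffer buf = ByteBuffer.allocate(payload.length + 4);
        buf.putInt(payload.length).put(payload); // ByteBuffer is big-endian by default
        if (flip) {
            buf.flip(); // position -> 0, limit -> number of bytes written
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(buf); // hashes the bytes from position to limit
            return hex(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = { 0x00, (byte) 0xE3, (byte) 0xBB };
        // Without flip() nothing is hashed, so this is the SHA-256 of empty input:
        // E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
        System.out.println(digest(payload, false));
        // With flip() the written bytes are hashed.
        System.out.println(digest(payload, true));
    }
}
```

The unflipped result is the same digest the question's Java code printed, which suggests that code hashed zero bytes. Passing buff.array() to update() instead would probably also work for an exactly sized heap buffer, since array() returns the whole backing array regardless of position.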

(The opposite operation is compact(), which goes back to write mode.)
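For completeness, here is a small sketch of what compact() does to the cursors after a partial read. CompactDemo and afterCompact are illustrative names; the buffer size and contents are arbitrary.

```java
import java.nio.ByteBuffer;

public class CompactDemo {
    // Hypothetical helper: write 4 bytes, flip, read 1 byte, compact,
    // then report {position, limit, first byte in the buffer}.
    static int[] afterCompact() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put(new byte[] { 1, 2, 3, 4 }); // position=4, limit=8
        buf.flip();                          // position=0, limit=4 (ready to read)
        buf.get();                           // reads byte 1; position=1
        buf.compact();                       // unread bytes 2,3,4 move to the front
        // Now position=3 (just after the unread bytes) and limit=8 (capacity),
        // so further writes append after the bytes that were not yet read.
        return new int[] { buf.position(), buf.limit(), buf.get(0) };
    }

    public static void main(String[] args) {
        int[] state = afterCompact();
        System.out.println(state[0] + " " + state[1] + " " + state[2]); // 3 8 2
    }
}
```

If the unread bytes do not need to be preserved, clear() resets position to 0 and limit to capacity without moving any data.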