Ikke Ikke - 6 months ago
Python Question

Why is the Python calculated "hashlib.sha1" different from "git hash-object" for a file?

I'm trying to calculate the SHA-1 value of a file.

I've written this script:

import hashlib

def hashfile(filepath):
    # SHA-1 of the file's raw bytes only
    sha1 = hashlib.sha1()
    # 'with' closes the file even if reading fails
    with open(filepath, 'rb') as f:
        sha1.update(f.read())
    return sha1.hexdigest()


For a specific file I get this hash value:

8c3e109ff260f7b11087974ef7bcdbdc69a0a3b9


But when I calculate the value with git hash-object, I get this value:
d339346ca154f6ed9e92205c3c5c38112e761eb7


How come they differ? Am I doing something wrong, or can I just ignore the difference?

Answer

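You are not doing anything wrong: the two values are hashes of different inputs. Git does not hash the raw file contents alone. It prepends a header containing the object type and the size in bytes (as a decimal string), so it calculates hashes like this: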

sha1("blob " + filesize + "\0" + data)
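To reproduce git's value from Python, the same header can be prepended before hashing. The following is a minimal sketch, not git's own code: the function names git_blob_sha1 and git_hashfile are made up for illustration, and it assumes git applies no content conversion (such as line-ending normalization or filters) to the file before hashing.

```python
import hashlib

def git_blob_sha1(data: bytes) -> str:
    # Header is "blob <size in bytes, decimal>\0", followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

def git_hashfile(filepath):
    # Hypothetical helper: read the whole file in binary mode and hash it as a blob.
    with open(filepath, "rb") as f:
        return git_blob_sha1(f.read())
```

With no conversions in play, git_hashfile(path) should match the output of git hash-object for the same file, while the plain hashlib.sha1 digest of the contents will still differ because it lacks the header.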

Reference