JustRegisterMe JustRegisterMe - 1 year ago 100
Python Question

Get MD5 hash of big files in Python

I have used hashlib (which replaces md5 in Python 2.6/3.0) and it worked fine if I opened a file and put its content in


The problem is with very big files that their sizes could exceed RAM size.

How to get the MD5 hash of a file without loading the whole file to memory?

Answer Source

Break the file into 128-byte chunks and feed them to MD5 consecutively using update().

This takes advantage of the fact that MD5 has 128-byte digest blocks. Basically, when MD5 digest()s the file, this is exactly what it is doing.

If you make sure you free the memory on each iteration (i.e. not read the entire file to memory), this shall take no more than 128 bytes of memory.

One example is to read the chunks like so:

f = open(fileName)
while not endOfFile: