lakshmipathi - 1 month ago
Linux Question

different md5sum for the same file content?

I have an issue with computing md5sum. I have a recovery tool which archives each file's metadata (inode), computes the file's md5sum, and stores both in an SQLite database during installation. When a file is removed or deleted, the tool recovers it using the metadata from the SQLite database. Now I want to make sure the recovered file is exactly the same as the original, so I recompute the recovered file's md5sum as shown below. The problem is that, strangely, for a few files the content looks exactly the same (using cat) as before the file was deleted, and the stat command shows the same output (except for a different inode number), but the md5sum is different.

The following two files have the same content and the same md5sum, so having a different inode number doesn't affect md5sum.

764efa883dda1e11db47671c4a3bbd9e /test/hi1.txt
764efa883dda1e11db47671c4a3bbd9e /test/hi.txt


Any thoughts, how I should proceed with this?

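/* Fragment from inside a void function; needs <stdio.h>. */
char file_location[512] = {0};

char md5_cmd[sizeof file_location + 16];
char md5sum[33];                /* 32 hex digits + terminating NUL */
FILE *pf;
// some recovery stuff goes here...

// Recompute md5 of the recovered file.
// The path goes through the shell, so quote it (spaces would otherwise split it).
// This still breaks on paths that contain a single quote.
snprintf(md5_cmd, sizeof md5_cmd, "md5sum '%s'", file_location);

pf = popen(md5_cmd, "r");
if (!pf) {
    fprintf(stderr, "Could not open pipe\n");
    return;
}

// get data: the first 32 characters of md5sum's output are the hex digest
if (!fgets(md5sum, sizeof md5sum, pf))
    md5sum[0] = '\0';           /* no output, e.g. file not found */

// pclose() returns md5sum's exit status; non-zero means the command failed
if (pclose(pf) != 0)
    fprintf(stderr, "Error: md5sum failed.\n");

fprintf(stdout, "Md5sum is %s\n", md5sum);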

Answer

You cannot reliably compare file contents with cat. Unless you use cat -A or similar, many differences can go unnoticed: spaces vs. tabs, whitespace at the end of lines, etc.

You should compare the files with

diff -u fileA fileB

or

cmp fileA fileB
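Since the recovery tool in the question is written in C, the same byte-for-byte check could also be done inside the tool rather than by shelling out. The following is only a sketch of that idea: files_differ is a hypothetical helper, not an existing API, and it treats a read error the same as end of file.

```c
#include <stdio.h>

/* Hypothetical helper, not part of any library.
 * Returns  0 if the two files are byte-for-byte identical,
 *          1 if they differ; *offset then holds the 1-based position of the
 *            first differing byte (or the position at which the shorter file
 *            ended),
 *         -1 if either file could not be opened. */
int files_differ(const char *path_a, const char *path_b, long *offset)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = 0;
    long pos = 0;

    if (!a || !b) {
        result = -1;
    } else {
        int ca, cb;

        /* Read both files in lockstep until the bytes differ or both end. */
        do {
            ca = getc(a);
            cb = getc(b);
            pos++;
        } while (ca == cb && ca != EOF);

        if (ca != cb) {
            result = 1;
            *offset = pos;
        }
    }

    if (a) fclose(a);
    if (b) fclose(b);
    return result;
}
```

Calling files_differ(original, recovered, &off) and printing off on a result of 1 gives the position to inspect, for example with a hex dump, to see which bytes the recovery changed.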