Byyo Byyo - 11 months ago 94
C# Question

Algorithm to compare two images in C#

I'm writing a tool in C# to find duplicate images. Currently i create a MD5 checksum of the files and compare those.

Unfortunately my images can be


  • rotated by 90 degrees

  • have different dimensions (smaller image with same content)

  • have different compressions or filetypes (e.g. jpeg artifacts, see below)



enter image description hereenter image description here

what would be the best approach to solve this problem?

Answer Source

Here is a simple approach with a 256 bit image-hash (MD5 has 128 bit)

  1. resize the picture to 16x16 pixel

16x16 resized

  1. reduce colors to black/white (which equals true/false in this console output)

enter image description here

  1. read the boolean values into a BitArray, bool[], List<bool> or something - this is the hash

Code:

public static List<bool> GetHash(Bitmap bmpSource)
{
    List<bool> lResult = new List<bool>();         
    //create new image with 16x16 pixel
    Bitmap bmpMin = new Bitmap(bmpSource, new Size(16, 16));
    for (int j = 0; j < bmpMin.Height; j++)
    {
        for (int i = 0; i < bmpMin.Width; i++)
        {
            //reduce colors to true / false                
            lResult.Add(bmpMin.GetPixel(i, j).GetBrightness() < 0.5f);
        }             
    }
    return lResult;
}

I know, GetPixel is not that fast but on a 16x16 pixel image it should not be the bottleneck.

  1. compare this hash to hash values from other images and add a tolerance.(number of pixels that can differ from the other hash)

Code:

List<bool> iHash1 = GetHash(new Bitmap(@"C:\mykoala1.jpg"));
List<bool> iHash2 = GetHash(new Bitmap(@"C:\mykoala2.jpg"));

//determine the number of equal pixel (x of 256)
int equalElements = iHash1.Zip(iHash2, (i, j) => i == j).Count(eq => eq);

So this code is able to find equal images with:

  • different file formats
  • rotation (90, 180, 270) - by changing the iteration order of i and j
  • different size (same aspect is required)
  • different compression (tolerance is required in case of quality loss like jpeg artifacts) - you can accept a 99% equality to be the same image and 50% to be a different one.