Byyo Byyo - 12 days ago 11
C# Question

Algorithm to compare two images in C#

I'm writing a tool in C# to find duplicate images. Currently I create an MD5 checksum of each file and compare those.

Unfortunately, my images can:

  • be rotated by 90 degrees

  • have different dimensions (a smaller image with the same content)

  • have different compression levels or file types (e.g. JPEG artifacts)




What would be the best approach to solve this problem?

Answer

Here is a simple approach using a 256-bit image hash (MD5 has 128 bits):

  1. Resize the picture to 16x16 pixels.

[Image: the picture resized to 16x16 pixels]
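The answer's code lets the `Bitmap(Image, Size)` constructor do this resize. To show what step 1 does conceptually, here is a minimal sketch that shrinks a grayscale grid by averaging all source pixels that fall into each of the 16x16 cells (a box filter). It assumes the pixels were already converted to brightness values in [0, 1] (for example with `Color.GetBrightness()`) and stored in a `float[,]` indexed `[row, column]`; `ImageDownscale` and `Downscale` are hypothetical names, and GDI+ may interpolate differently.

```csharp
using System;

// Hypothetical helper: shrinks a brightness grid (values in [0, 1], indexed
// [row, column]) to size x size by averaging the pixels in each target cell.
public static class ImageDownscale
{
    public static float[,] Downscale(float[,] gray, int size = 16)
    {
        int h = gray.GetLength(0);
        int w = gray.GetLength(1);
        if (h == 0 || w == 0)
            throw new ArgumentException("Image must not be empty.", nameof(gray));

        var result = new float[size, size];
        for (int ty = 0; ty < size; ty++)
        {
            // Source rows covered by target row ty (always at least one row).
            int y0 = ty * h / size;
            int y1 = Math.Max((ty + 1) * h / size, y0 + 1);
            for (int tx = 0; tx < size; tx++)
            {
                // Source columns covered by target column tx.
                int x0 = tx * w / size;
                int x1 = Math.Max((tx + 1) * w / size, x0 + 1);

                float sum = 0f;
                for (int y = y0; y < y1; y++)
                    for (int x = x0; x < x1; x++)
                        sum += gray[y, x];
                result[ty, tx] = sum / ((y1 - y0) * (x1 - x0));
            }
        }
        return result;
    }
}
```

Averaging whole blocks tends to cancel out small per-pixel noise such as JPEG artifacts, which is part of why the tiny hash is robust to recompression.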

  2. Reduce the colors to black/white (dark pixels become true, light pixels become false).


  3. Read the boolean values into a BitArray, bool[], List<bool> or similar; this is the hash.

Code:

using System.Collections.Generic;
using System.Drawing;

public static List<bool> GetHash(Bitmap bmpSource)
{
    List<bool> lResult = new List<bool>();
    // create a new 16x16 pixel copy of the image (disposed when done)
    using (Bitmap bmpMin = new Bitmap(bmpSource, new Size(16, 16)))
    {
        for (int j = 0; j < bmpMin.Height; j++)
        {
            for (int i = 0; i < bmpMin.Width; i++)
            {
                // reduce colors to true (dark) / false (light)
                lResult.Add(bmpMin.GetPixel(i, j).GetBrightness() < 0.5f);
            }
        }
    }
    return lResult;
}

I know GetPixel is not fast, but on a 16x16 pixel image it should not be the bottleneck.
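Step 3 leaves the storage format open. If you keep many hashes around, one option (an assumption, not part of the answer) is to pack the 256 values returned by `GetHash` into four `ulong` words; counting differing pixels then becomes XOR plus a population count. `PackedHash`, `Pack`, `Distance` and `ToHex` are hypothetical names, and `BitOperations.PopCount` requires .NET Core 3.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical helpers: store a 256-bit hash as four 64-bit words instead of
// 256 separate bools, and compare two hashes by counting differing bits.
public static class PackedHash
{
    // Bit k of the hash goes to word k / 64, bit position k % 64.
    public static ulong[] Pack(IReadOnlyList<bool> bits)
    {
        var words = new ulong[(bits.Count + 63) / 64];
        for (int k = 0; k < bits.Count; k++)
        {
            if (bits[k])
                words[k / 64] |= 1UL << (k % 64);
        }
        return words;
    }

    // Number of pixels that differ between two packed hashes (Hamming distance).
    public static int Distance(ulong[] a, ulong[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Hashes must have the same length.");
        int diff = 0;
        for (int n = 0; n < a.Length; n++)
            diff += BitOperations.PopCount(a[n] ^ b[n]);
        return diff;
    }

    // 64-character hex string, e.g. for storing the hash next to the file name.
    public static string ToHex(ulong[] words)
    {
        var parts = new string[words.Length];
        for (int n = 0; n < words.Length; n++)
            parts[n] = words[n].ToString("x16");
        return string.Concat(parts);
    }
}
```

The hex string works like the MD5 strings the question already compares, but equal strings only catch hash-identical images; tolerant matching still needs `Distance`.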

  4. Compare this hash to the hashes of other images, allowing a tolerance (the number of pixels that may differ from the other hash).

Code:

using System.Linq;

List<bool> iHash1, iHash2;
using (var bmp1 = new Bitmap(@"C:\mykoala1.jpg"))   // dispose to release the file lock
using (var bmp2 = new Bitmap(@"C:\mykoala2.jpg"))
{
    iHash1 = GetHash(bmp1);
    iHash2 = GetHash(bmp2);
}
// determine the number of equal pixels (x of 256)
int equalElements = iHash1.Zip(iHash2, (i, j) => i == j).Count(eq => eq);
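Step 4 leaves the decision rule open. Here is a minimal sketch that turns the equal-pixel count into a yes/no answer, using the answer's 99% example as the default threshold; `HashCompare`, `Equality` and `IsSameImage` are hypothetical names. The length check matters because `Enumerable.Zip` stops at the end of the shorter sequence, so hashes of different lengths would otherwise look more similar than they are.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns the number of equal pixels into a
// same/different decision using a minimum equality ratio.
public static class HashCompare
{
    // Fraction of positions where both hashes agree (1.0 = identical hashes).
    public static double Equality(List<bool> hash1, List<bool> hash2)
    {
        if (hash1.Count != hash2.Count || hash1.Count == 0)
            throw new ArgumentException("Hashes must be non-empty and of equal length.");
        int equalElements = hash1.Zip(hash2, (i, j) => i == j).Count(eq => eq);
        return (double)equalElements / hash1.Count;
    }

    // True if at least minEquality of the pixels match.
    public static bool IsSameImage(List<bool> hash1, List<bool> hash2, double minEquality = 0.99)
        => Equality(hash1, hash2) >= minEquality;
}
```

With 256 pixels, 0.99 tolerates at most 2 differing pixels (254/256 is about 99.2%, 253/256 about 98.8%); lower the threshold if heavy compression flips more pixels.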

So this code is able to find equal images with:

  • different file formats
  • rotation (90, 180, 270 degrees), if you also compute hashes with the iteration order of i and j changed
  • different sizes (the same aspect ratio is required)
  • different compression (a tolerance is required to absorb quality loss such as JPEG artifacts): for example, you can treat 99% equality as the same image and 50% as a different one.
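The hash from `GetHash` is not rotation-invariant by itself; the rotation case works by reading the 16x16 grid in a different order and keeping the best match. A minimal sketch operating on the `List<bool>` that `GetHash` returns (row-major, index `j * 16 + i`); `RotationCompare`, `Rotate` and `BestEqualCount` are hypothetical names. Rotating the small hash instead of the full bitmap is cheap, and for the square grid it should closely match hashing a rotated image, up to resampling differences.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: compares two 16x16 hashes from GetHash (row-major,
// index = j * 16 + i) under 0, 90, 180 and 270 degree rotations.
public static class RotationCompare
{
    const int Size = 16;

    // Hash of the grid rotated clockwise by quarterTurns * 90 degrees,
    // obtained by reading the original grid in a different order.
    public static List<bool> Rotate(List<bool> hash, int quarterTurns)
    {
        if (hash.Count != Size * Size)
            throw new ArgumentException("Expected a 16x16 hash.", nameof(hash));
        int turns = ((quarterTurns % 4) + 4) % 4;
        var result = new List<bool>(Size * Size);
        for (int j = 0; j < Size; j++)          // row of the rotated grid
        {
            for (int i = 0; i < Size; i++)      // column of the rotated grid
            {
                int srcRow, srcCol;
                switch (turns)
                {
                    case 0: srcRow = j; srcCol = i; break;
                    case 1: srcRow = Size - 1 - i; srcCol = j; break;
                    case 2: srcRow = Size - 1 - j; srcCol = Size - 1 - i; break;
                    default: srcRow = i; srcCol = Size - 1 - j; break;
                }
                result.Add(hash[srcRow * Size + srcCol]);
            }
        }
        return result;
    }

    // Highest number of equal pixels between hash1 and any rotation of hash2.
    public static int BestEqualCount(List<bool> hash1, List<bool> hash2)
    {
        if (hash1.Count != Size * Size)
            throw new ArgumentException("Expected a 16x16 hash.", nameof(hash1));
        int best = 0;
        for (int turns = 0; turns < 4; turns++)
        {
            List<bool> rotated = Rotate(hash2, turns);
            int equal = 0;
            for (int k = 0; k < rotated.Count; k++)
                if (hash1[k] == rotated[k]) equal++;
            best = Math.Max(best, equal);
        }
        return best;
    }
}
```

Dividing `BestEqualCount` by 256 gives the equality ratio for the 99% rule; the cost is four comparisons per image pair instead of one.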