After completing several chapters of computer vision books, I decided to apply those methods to create a primitive bot for a game. I chose Fling, which has almost no dynamics, so all I needed to do was find the balls. Balls come in 5 different colors and can face any of 4 directions (depending on where the eyes are). I cropped each block of the field so that I can check each block for whether it contains a ball. My problem is that I'm not able to detect the balls correctly.
My first attempt was as follows. I sum the RGB values over each ball image to get an [R, G, B] array, then do the same for each block in the field. If a block's array is similar to a ball's array, I assume that the block contains a ball.
The problem is that it's hard to find a good threshold for 'similarity'. Even different empty blocks vary significantly in such sums.
Second, I tried OpenCV's matchTemplate function. It slides a template image over a source image and, together with the minMaxLoc function, gives the best score (maxVal) and its position (maxLoc). With a normalised matching method, a maxVal close to 1 means the template is probably present in the source image. I made all possible variations of the balls (20 overall) and matched each of them against the entire field. This worked well, but unfortunately it sometimes misses balls in the field or assigns two different types of ball (say green and yellow) to one ball. I tried to improve the process by matching the balls against each block rather than the entire field. This has the advantage that every block is checked, so the correct number of balls should be detected, whereas matching against the entire field gives only one location for each template: if there are two balls of the same color, matchTemplate loses the information about the second one. Surprisingly, it still produces false negatives and false positives.
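A minimal sketch of the per-block comparison described above, assuming each cropped block and each ball template have the same size. It uses plain NumPy rather than matchTemplate and computes a zero-mean normalised correlation, the same kind of score the normalised matching methods produce. The names `ncc`, `classify_block` and the threshold of 0.8 are hypothetical. Taking only the single best-scoring template per block is one way to avoid assigning two ball types to the same block.

```python
import numpy as np

def ncc(block, template):
    """Zero-mean normalised cross-correlation of two equally sized images."""
    a = block.astype(np.float64).ravel()
    b = template.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # A perfectly flat image has no structure to correlate with.
    return float(a @ b / denom) if denom > 0 else 0.0

def classify_block(block, templates, threshold=0.8):
    """Return the label of the best-scoring template, or None if no score reaches threshold."""
    scores = {label: ncc(block, t) for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

Here `templates` would be a dictionary mapping a label such as "green-left" to the corresponding template array; a block for which `classify_block` returns None is treated as empty.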
There is probably a much easier way to solve this problem (maybe a library that I don't know of yet), but for now I can't find one. Any suggestions are welcome.
The balls seem pretty distinct in terms of colour. The problems you initially described seem to be related to some of the finer, random detail present in the image - especially in the background and in the different shading/poses of the ball.
On this basis, I would say you could simplify the task significantly by applying a set of pre-processing steps to "collapse" the range of colours in the image.
There are any number of more principled ways of achieving accurate colour segmentation (which is, more formally, what you want to achieve) - but taking a more pragmatic view, here are a few quick'n'dirty hacks.
So, for example, we can initially smooth the image to reduce higher frequency components...
Then, convert to a normalised RGB representation...
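As a rough illustration of what the normalised RGB step does: each channel is divided by the sum of the three channels, so that r = R/(R+G+B) and likewise for g and b, which discards overall brightness and keeps only the colour proportions. The sketch below assumes an H x W x 3 NumPy array; the function name `normalise_rgb` and the rescaling to the 0-255 range are choices made for this example.

```python
import numpy as np

def normalise_rgb(image):
    """Scale each pixel so its channels sum to 255, discarding overall brightness."""
    img = image.astype(np.float64)
    total = img.sum(axis=2, keepdims=True)
    # Leave pure black pixels at zero instead of dividing by zero.
    total[total == 0] = 1.0
    return np.rint(255.0 * img / total).astype(np.uint8)
```

A dark and a bright pixel with the same colour proportions map to the same output value, which is why shading differences on a ball largely disappear after this step.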
Before finally posterizing it with a mean shift filtering step...
Here is the code in Python, using the OpenCV bindings, that does all this in order:
    import cv

    # get original image
    orig = cv.LoadImage('fling.png')

    # show original
    cv.ShowImage("orig", orig)

    # blur a bit to remove higher frequency variation
    cv.Smooth(orig, orig, cv.CV_GAUSSIAN, 5, 5)

    # normalise RGB
    norm = cv.CreateImage(cv.GetSize(orig), 8, 3)
    red = cv.CreateImage(cv.GetSize(orig), 8, 1)
    grn = cv.CreateImage(cv.GetSize(orig), 8, 1)
    blu = cv.CreateImage(cv.GetSize(orig), 8, 1)
    total = cv.CreateImage(cv.GetSize(orig), 8, 1)
    cv.Split(orig, red, grn, blu, None)
    # note: 8-bit addition saturates at 255
    cv.Add(red, grn, total)
    cv.Add(blu, total, total)
    # each channel becomes 255 * channel / total
    cv.Div(red, total, red, 255.0)
    cv.Div(grn, total, grn, 255.0)
    cv.Div(blu, total, blu, 255.0)
    cv.Merge(red, grn, blu, None, norm)
    cv.ShowImage("norm", norm)

    # posterize simply with mean shift filtering
    post = cv.CreateImage(cv.GetSize(orig), 8, 3)
    cv.PyrMeanShiftFiltering(norm, post, 20, 30)
    cv.ShowImage("post", post)

    # wait for a key press so the windows stay on screen
    cv.WaitKey(0)
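Once the colours have been collapsed like this, deciding what a block contains may reduce to a nearest-colour lookup. The following is only a sketch under that assumption, in plain NumPy: `mean_chromaticity`, `nearest_colour`, the reference values and the `max_dist` cutoff are all hypothetical and would need tuning against real blocks. The references must use the same channel order as the image.

```python
import numpy as np

def mean_chromaticity(block):
    """Average channel fractions over a block; each pixel's fractions sum to 1."""
    px = block.reshape(-1, 3).astype(np.float64)
    total = px.sum(axis=1, keepdims=True)
    total[total == 0] = 1.0  # black pixels contribute zeros
    return (px / total).mean(axis=0)

def nearest_colour(block, references, max_dist=0.1):
    """Label of the closest reference chromaticity, or None if none is within max_dist."""
    c = mean_chromaticity(block)
    dists = {name: float(np.linalg.norm(c - np.asarray(ref)))
             for name, ref in references.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] <= max_dist else None
```

Because the whole block is averaged, background pixels pull the result towards the background colour; restricting the average to the central part of each block is one possible refinement.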