I learned this method from an SPIE Proceedings article, which used a "twice HSV transformation" for shadow detection. In the paper, the method was stated as follows:
Firstly, the color model of the image is transformed from RGB to HSV,
and the three components of the HSV model are normalized to 0 to 255,
then the image is transformed from RGB to HSV once again. Thirdly, the
image is turned into a gray image from a color image, only the gray
value of the red component is used. Fourthly, the OTSU thresholding
method is used to produce a threshold by which the image is converted
to a binary image. Since the gray value of the shadow area is usually
smaller than those areas which are not covered by shadow, the
objective is pixels whose gray value is below the threshold, and
background is pixels whose gray value is beyond the threshold.
The second and third statements don't make much sense as written, and even the pipeline as a whole is rather suspicious. However, after re-reading that description about a dozen times, here is what I came up with. Apologies for any errors in understanding.
Let's start with the second point:
Firstly, the color model of the image is transformed from RGB to HSV, and the three components of the HSV model are normalized to 0 to 255, then the image is transformed from RGB to HSV once again
As you know, transforming an image from RGB to HSV results in another three-channel output. Depending on which platform you're using, you'll get 0-360 or 0-1 for the first channel (Hue), 0-100 or 0-255 for the second channel (Saturation), and 0-100 or 0-255 for the third channel (Value). The channels can therefore have different ranges from one another, and so each one is normalized to the 0-255 range independently. Specifically, this means that the Hue, Saturation and Value components are each rescaled so that they all span 0-255.
Once we do this, we have an HSV image where each channel ranges from 0-255. My guess is that they call this new image an RGB image because the channels all span 0-255, just like any 8-bit RGB image would. This also makes sense because an RGB-to-HSV conversion expects input channels that span 0-255, so I suspect they normalize all of the channels in the first HSV result to make it a suitable input for the next step.
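To make the normalization step concrete, here's a minimal single-pixel sketch using Python's standard `colorsys` module instead of OpenCV. The helper name `rgb_to_hsv_255` is my own, and `colorsys` returns all three channels in 0-1 (unlike OpenCV's 0-179 Hue for 8-bit images), so treat this as an illustration of the idea rather than what the paper actually did.

```python
import colorsys

def rgb_to_hsv_255(r, g, b):
    """Convert one 8-bit RGB pixel to HSV with every channel rescaled to 0-255.

    Hypothetical helper for illustration only.
    """
    # colorsys expects floats in [0, 1] and returns h, s, v in [0, 1]
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Rescale each channel independently so all three span 0-255
    return tuple(int(round(c * 255)) for c in (h, s, v))

print(rgb_to_hsv_255(255, 0, 0))  # pure red
print(rgb_to_hsv_255(0, 0, 255))  # pure blue
```

After this step the three numbers look just like an 8-bit RGB triplet, which is presumably why the paper treats the result as "RGB" again.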
Once they normalize the channels after the first HSV conversion as described above, they do another HSV conversion on this new result. Why they would do this a second time is beyond me, but that's what I gathered from the description, and it's probably what they mean by "twice HSV transformation": transform the original RGB image to HSV once, normalize that result so all channels span 0-255, then apply the HSV conversion again to this intermediate result.
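Here's how I read the "twice HSV" idea for a single pixel, again sketched with the standard `colorsys` module rather than OpenCV. The function names `hsv_255` and `twice_hsv` are made up for this example, and the 0-1 to 0-255 scaling is my assumption about what "normalized" means in the paper.

```python
import colorsys

def hsv_255(c1, c2, c3):
    """Treat (c1, c2, c3) as an 8-bit RGB triplet and return HSV scaled to 0-255."""
    h, s, v = colorsys.rgb_to_hsv(c1 / 255.0, c2 / 255.0, c3 / 255.0)
    return tuple(int(round(x * 255)) for x in (h, s, v))

def twice_hsv(r, g, b):
    """Apply the RGB-to-HSV conversion twice, renormalizing in between."""
    first = hsv_255(r, g, b)   # real RGB -> HSV, all channels in 0-255
    return hsv_255(*first)     # pretend H, S, V are R, G, B and convert again

# The first channel of the result is what later gets thresholded
print(twice_hsv(100, 100, 100))  # a mid-gray pixel
print(twice_hsv(200, 150, 90))   # a warm, brightly lit pixel
```

Note that the second conversion has no physical meaning: it just mixes the first pass's Hue, Saturation and Value together in a nonlinear way.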
Let's go to the third point:
Thirdly, the image is turned into a gray image from a color image, only the gray value of the red component is used.
After you transform the HSV image a second time, the final result is obtained by simply taking the first channel of that output, which is inherently a grayscale image and sits in the position of the "red" channel. This channel is actually the Hue from the second HSV conversion. I'm not quite sure what properties the Hue channel holds after converting the image to HSV twice, but maybe it worked for this particular method.
I decided to give this a whirl and see if this really works. Here's an example image of a shadow I found online:
The basic pipeline is to take an image, convert it to HSV, renormalize the result so that the values span 0-255 again, do another HSV conversion, then pick a threshold automatically via Otsu's method. We keep the pixels at or below that threshold to segment out the shadows.
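In case the Otsu step is unfamiliar, here's a rough plain-Python sketch of it: try every threshold from 0 to 255 and keep the one that maximizes the between-class variance of the two groups of pixels. The function name `otsu_threshold` and the toy pixel values are mine, and OpenCV's implementation may break ties differently, so this is only meant to show the idea.

```python
def otsu_threshold(values):
    """Return the threshold in 0..255 that maximizes between-class variance.

    `values` is a flat list of 8-bit gray values. Illustrative sketch only.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels with value <= t
    sum_low = 0  # sum of those pixels' gray values
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue  # nothing below the threshold yet
        if w_high == 0:
            break     # everything is below the threshold
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance, up to a constant factor of 1 / total**2
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Toy data: four dark "shadow" pixels and four bright ones
pixels = [20, 25, 30, 22, 200, 210, 190, 205]
t = otsu_threshold(pixels)
# Inverted binary threshold: values at or below t become white (255)
shadow_mask = [255 if p <= t else 0 for p in pixels]
print(t, shadow_mask)
```

The inverted mask marks the darker group as white, which is the same convention as using `cv2.THRESH_BINARY_INV` together with `cv2.THRESH_OTSU`.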
I'm going to use OpenCV with Python, as I don't have the C++ libraries set up on my computer here. In OpenCV, when converting an unsigned 8-bit image to HSV, the Saturation and Value components are automatically scaled to [0, 255], but the Hue component is scaled to [0, 179] so that the Hue (which is originally in [0, 360)) fits into the data type. As such, I scaled each Hue value by (255/179) so that the Hue gets normalized to [0, 255]. Here's the code I wrote:
    import numpy as np # Import relevant libraries
    import cv2

    # Read in image (OpenCV loads it in BGR order)
    img = cv2.imread('shadow.jpg')

    # Convert to HSV
    hsv1 = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    # Renormalize Hue channel from 0-179 to 0-255
    hsv1[:,:,0] = ((255.0/179.0)*hsv1[:,:,0]).astype('uint8')

    # Convert to HSV again
    # Remember, the H, S, V channels are now treated as R, G, B
    hsv2 = cv2.cvtColor(hsv1, cv2.COLOR_RGB2HSV)

    # Extract out the "red" channel (the first channel of the result)
    red = hsv2[:,:,0]

    # Perform Otsu thresholding and INVERT the image
    # Anything larger than the threshold is black, anything else is white
    _,out = cv2.threshold(red, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)

    # Show the image - shadow mask
    cv2.imshow('Output', out)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
This is the output I get:
Hmm... well, there are obviously some noisy pixels, but I guess it does work... kinda!