Detecting a Ball

I'm making a robot which will navigate using some artificial landmarks. These landmarks will include some solid color balls. In order to find the landmark, I need to detect these balls. I will use the following image as an example scene, a ball to taped to a cupboard:

When the image is captured, the first thing I do is to convert it to greyscale. I do it twice with two different formulas.

 [G1]   gray=  1.8*RED - 0.9*GREEN - 0.5*BLUE
 [G2]   gray=  0.3*RED + 0.6*GREEN + 0.1*BLUE

G2 is widely used for greyscale conversion. It captures changes in luminosity very well but loses color information. Here is its result:

G1 does capture color information but loses a lot of detail. I will use it to deduce an area of interest:

Thresholding it with the value 180:

If you look closely, the input image has two quirks near the ball. One is the shadow and the other is the reflection. G1 has captured the reflection and avoided the shadow. G2 has done the reverse: it can't see the red reflection because its luminosity is too close to that of the shadow's. It simply collects all of them into one shadow. When I process the result of G1 with a Sobel operator, the result is:

This is not so bad for a human to recognize a circle but it's very irregular and has actually lost some pixels along the edges. Let's display it overlaid on the input image. I cut it down by hand:

As you can see, the edges aren't quite at the outer border of the ball. This could be caused by jpeg compression mangling up the colors at the sharp transition from the cupboard to the ball. I can't disable it since it's the only format the camera can provide. Also, the edge pixels are quite irregular. Changing the threshold value doesn't help much in this case, different values simply move the irregularities elsewhere.

In any case, it's obvious that I can't use t3 or t4 as my base image. I can use t4 as a mask in order to eliminate most of the image in order to speed up the process.

Determining Area of Interest

I will implement a very simple algorithm for this. I will simply divide the image into NxN blocks and find the average pixel value within each block. If the average is above a certain level, then I will conclude that the pixel block has valuable data. Otherwise, it will be ignored. I will then cut the image to the smallest size (plus a small margin on each side) where it contains all valuable blocks. Here is the result of that.

Here, N=16 pixels, threshold=120 and 33% extension is made on all four sides. Following is the greyscale conversion using G2.

Regarding numerical constants such as thresholds or windows; I use just whatever value comes to my mind. If it doesn't work I double, halve it or modify it in another way until it does. The actual values to be used in the field will be found out by experimentation under the field's scenery and lighting conditions. It's possible that these will actually be functions of amount of ambient light, relative direction of the sun etc.

Edge Detection

I use the following Sobel operator for edge detection.

     Sy             Sx
  -1 -2 -1       -1  0  1
   0  0  0       -2  0  2
   1  2  1       -1  0  1

This gives me gradients in the same direction as the image coordinate system: an increasing brightness from left to right gives a positive X gradient. The same applies to the Y gradient.

By combining these two gradients, we get a gradient vector for each pixel. The size of this vector tells us whether the pixel is on an edge or not. There are two ways of calculating this size:

  [DIST1]   size=  SQRT(PX²+PY²)
  [DIST2]   size=  |PX| + |PY|

DIST1 is the correct calculation. Many people use DIST2 in order to save computation time. PX and PY are the results of the Sx and Sy operators at a given pixel. These have a maximum value of 1020 and a minimum value of -1020. For each plus and minus pair in the operators, the maximum difference we can have is 255. The sum of all positive factors is 4 in each case. Multiplying these, we get 1020. Here is a plot of the difference between DIST1 and DIST2, within the applicable range:

The error quite large, going up to 600 in the maximum case. I think I'll go with DIST1 for now. When I develop the rest of the system I can try DIST2. Besides, I can always chop up DIST1 into segments and use some linear approximations.

Having an accurate size for a gradient vector is quite important I think. This is because I will use some thresholding sooner or later. Accuracy of these values can make a big difference.

The maximum value of size is SQRT(2)*1020 for DIST1. This doesn't fit into 8 bits. In order to convert it to 8 bits, I can use two methods:

  [QUAN1]  size8 =  min( size , 255 )
  [QUAN2]  size8 =  size * 255 / max{∀ size}

Although I'd expect QUAN2 to do much better, it doesn't seem to be the case. The problem is that, if you have even one high value for size, it causes all other values to drop when quantized. This results in very faint lines for the edges. Faint edges means that I can't use a high threshold value to remove spurious details resulting from lighting, reflections etc. It's possible to analyse the range of size and quantize it properly by cobbling up the highest values into one value, but that will take more computation power and I'm not really sure how I would go about that. So, I'll go with saturated additions (QUAN1) for now.

t10 captures the borders of the ball perfectly in this example. I might not be so lucky with more objects around the landmark or different distribution of color.

Detecting a Circle

In order to detect the circle, we need to seperate it from other objects. In t10 there are three groups of connected pixels corresponding to the ball's shadow, reflection of the light source on the ball and the ball itself. I think that if there is a circle in an image, its border will always make a connected thick set of pixels. So, if I can isolate connected sets from each other I can decide whether each one corresponds to a circle or not. I will do this partition in a separate page.

After I've partitioned the image into connected sets, I can find the maximally distant pair of pixels in the connected set. The line segment connecting these would be the diameter of the circle, if the set in question does correspond to a circle. By translating this diameter along its normal, I can find the intersections of the resulting line segments with the connected set. The lengths of these line segments would follow the equation for the circle if the shape is right. Here is what I mean by this:

Here, the relation between D, a and b would tell us whether the shape is a circle or not. This method would be really slow since we need to find the maximal distance within a set. That is an O(N²) operation. However, it can be done faster.

Let's pick two non-neighboring points A,B and calculate the distance D between them. If we move B to L, the distance will decrease. If we move it to R, it will increase. By making use of this fact, we can determine the diameter in O(N) steps as follows:

pick A and B
D= distance(A,B)
for each neighbour K of B, calculate D_k= distance(K,A)
find K where D_k is maximum and is greater than D
If 4 doesn't fail, set B=K and goto 2.
If 4 fails, exchange A and B and goto 2. If this is the second failure in a row, we have found the local diameter: stop the algorithm.

Here is what I mean by local diameter: it's the maximal distance which can be reached from a given pair of starting points. Below are two cases 1 and 2 where the initial pairs are connected in blue and the final pairs are connected in red. It may not be very accurate I just made it by hand.

The point is: we don't need to find the biggest line segment connecting two points in the set. We just need to find the biggest segment we can reach and then we will decide whether the shape is a circle using that.

Anyway, my initial thought of translating a found diameter was cute and all but not really necessary. If I have a segment which is a local diameter, then I can find its midpoint P and measure distances of the points in the set to P. If these distances are pretty much all the same, then we have a circle. Otherwise, we don't have a circle.

Here is the result of our example image being partitioned:

I can eliminate p3 and p4 easily by looking at the area of the bounding box of the partition. If it's too small, I'm not interested even if it's a circle.

While looking for an initial pair of points, I can simply choose the top-left pixel and the bottom-right pixel. Scanning for these is very easy, I just go sequentially from the start or end of the image.

I tried my idea for finding the maximally distant pair and it failed spectacularly on t10-p1. I chose two neighbouring points in order to follow their paths, A shown in red and B shown in green.

This happens because our shape isn't exactly a circle even though we're going to classify it as one later. Here, there are some missing pixels. These may be caused by the camera or my processing but no matter.

Now that I've got two points on the exterior of the circle, I can make a third one as follows. I will draw the chord AB. When I take the normal of this chord, I'm assured that it will intersect the shape since A and B are on opposite sides of the normal and they are connected. The furthest intersection is my third point. Using these three points, I can draw a circle. Below is the result, overlaid on top of t10-p1.

t11= draw_circle(t10-p1)

The center-radius triplet I found is slightly bigger than the ball. It can be seen more clearly in the image below. In the above image, what I should do is to maximize the number of blue pixels which exist in both the image and the generated circle.

t11 overlaid on t6

The reason for getting a circle too big is pretty much obvious. Border pixels which have very little intensity are treated the same way as the other pixels which are properly contained in the ball image. This can be observed very clearly in the lower right part of the above image.

In order to fix this problem, I shouldn't just go with the first circle I find, but search for a better one in the neighbourhood. I will look at 27 circles total, varying the center coordinates as well as the radius by -1,0 and 1. The fitness metric will be the total luminosity value the circle accumulates over the edge pixels it passes. i.e. it will be the sum of the blue pixels in t11 over t10-p1. I have been using the result of the partitioning directly. I will now use pixels from the source image instead. Here is the REAL partitioning of t10:

With the above modifications, here is the best circle I've found:

t12

So far, all I've done is to fit a circle to a given connected set. I still have to decide whether the whole connected set is a circle arc or not. For this, I can use a method I previously mentioned. I could look at the distance of each point to the center of the circle. If there are too many outliers, then maybe our shape isn't a circle after all. I measured the following:

   avg( abs(dist(p,center)-r) ) for p in shape

I got 1.3 for t10-x1, 1.9 for t10-x2 and 1.6 for t10-x5. I did some test with artificial images as well. I got a maximum average error of 1.73 for artificial circles. For other shapes, the average error is really high if the shape is not close to a circle. For ellipses which are really close to circles, the average error comes out as slightly above 2. So, maybe I can blindly use the value 2 for decision making.

I think I'm now ready to write the whole ball detection function. Let's review what I have done:

     X0 : input
     X1 = threshold(greyscale-red( X0 ))
     X2 = greyscale-luminosity( X0 )
     X3 = mask X2 by X1
     X4 = threshold(quant1(sobel(X3)))
     For each connected set S in X4,
       fit circle C to S
       if the average distance error in S with
          regard to C is too high, reject S

Now, I've done it. Here is a source package for a test program. The final code is in detcir.[ch].

A New Approach

Masking the greyscale(G2) image with color information didn't work so well for scenes where there are a lot of objects near the ball I want to detect. I get a smaller image (that's not guaranteed either) but all the edges made by objects of irrelevant colors are still there and need to be eliminated. Here is an example:

What I really need is to detect the changes in relative intensity of our target color, in this case red. Just looking at the red channel doesn't work because white also has a lot of red in it. I need to somehow suppress everything which doesn't look red. I came up with the following:

  [G3]  gray = RED - GREEN/2 - BLUE/2

This is of course clamped to between 0 and 255. G3 is high when red component is high, unless green and blue are also high. This also suppresses dark reds to a degree. Here is the output of G3.

If you compare it to t3, this is much nicer. The white surface is now completely darkened. All that remains is the ball and its shadow/reflection. Applying a gaussian filter of size 5 really helps here. It serves two purposes: it reduces the effects of JPEG compression and weakens edges. Weakening edges is important in separating the ball from its shadow. For the ball, we have a lot pixels which will serve as edges even after the blurring. This is not so for the shadow. It will lose all but its most prominent edges due to this blurring. Here is what I mean:

Compare this to the edge image of G2:

Edges in p3 are much thicker for the ball itself, but less prominent for the shadow. This is because we are now actually making edges based on the change of the red color. The gaussian filter also carries the borders of the ball towards the center. This is because the pixels at the inside border of the circle have more neighbours with nonzero intensity than those at the outer border. After thresholding and partitioning, we end up with:

The algorithm makes a good attempt at even a red ball on a red table:

There are many circles associated with the ball here. These come from the disconnected arcs of the ball. These disconnects are caused by the combination of the reddish table with other objects creating edges around the ball.

The algorithm finds a circle for almost all edges shown below. I have filtered the list of circles by hiding the ones which have a radius greater than half width of the image. These are caused by flat components.

I'm sure I won't have to deal with such scenery for some time. So, I'm going to go ahead with this implementation for now:

   p0: input
   p1= G3(p0)
   p2= gauss(p1, 5, 1.0)
   p3= QUANT1(sobel(p2))
   p4= threshold(p3, 200)
   partition p4 and find circles as before

The nice thing about this is that we have only two magic numbers: color coefficient vector in G3 and the threshold value after edge detection. The former should pretty much be adjustable by examining the color of the ball to be detected. The latter is more complicated, a high value decreases the chance of detection and reduces accuracy. A low value prevents the algorithm from working at all.

Download

The above code packages are for test programs. They do contain detcir.[ch] but I want to access them directly. So, here I will maintain the latest version of detcir.c and detcir.h.