I'm making a robot which will
navigate using some artificial landmarks. These landmarks will
include some solid color balls. In order to find the landmark, I need
to detect these balls.
I will use the following image as an example
scene, a ball taped to a cupboard:
When the image is captured, the first thing I do is to convert it to
greyscale. I do it twice with two different formulas.
G2 is widely used for greyscale conversion. It captures changes in
luminosity very well but loses color information. Here is its result:
G1 does capture color information but loses a lot of detail. I will
use it to deduce an area of interest:
Thresholding it with the value 180:
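The post never spells out G1 and G2, so here is a minimal sketch of what they plausibly are: G2 as the standard luminosity formula (BT.601 weights are an assumption) and G1 as the red channel, since the pipeline later calls it "greyscale-red". The 180 threshold is from the text.

```c
#include <stdint.h>

/* G2: assumed to be the widely used luminosity formula
   (ITU-R BT.601 weights -- the post does not spell it out). */
static uint8_t g2_luminosity(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((299 * r + 587 * g + 114 * b) / 1000);
}

/* G1: a color-sensitive conversion.  The pipeline summary later
   calls it "greyscale-red", so the red channel is a plausible guess. */
static uint8_t g1_red(uint8_t r, uint8_t g, uint8_t b)
{
    (void)g; (void)b;
    return r;
}

/* Thresholding with the value 180, as in the text. */
static uint8_t threshold180(uint8_t v)
{
    return v >= 180 ? 255 : 0;
}
```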
If you look closely, the input image has two quirks near the ball. One
is the shadow and the other is the reflection. G1 has captured the reflection
and avoided the shadow. G2 has done the reverse: it can't see the red
reflection because its luminosity is too close to that of the shadow.
It simply lumps them together into one shadow. When I process the result
of G1 with a Sobel operator, the result is:
A human could still recognize a circle here, but the edge is very irregular
and has actually lost some pixels along the way.
Let's display it overlaid on the input image. I cut it down by hand:
As you can see, the edges aren't quite at the outer border of the ball.
This could be caused by JPEG compression mangling the colors at the
sharp transition from the cupboard to the ball. I can't disable it since
it's the only format the camera can provide. Also, the edge pixels are
quite irregular. Changing the threshold value doesn't help much in this
case, different values simply move the irregularities elsewhere.
In any case, it's obvious that I can't use t3 or t4 as my base image. I can
use t4 as a mask to eliminate most of the image and speed up the process.
Determining Area of Interest
I will implement a very simple algorithm for this. I will simply divide
the image into NxN blocks and find the average pixel value within each
block. If the average is above a certain level, then I will conclude that
the pixel block has valuable data. Otherwise, it will be ignored. I will
then cut the image to the smallest size (plus a small margin on each side)
that contains all valuable blocks. Here is the result, with N=16 pixels,
a threshold of 120 and a 33% extension on all four sides.
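The block-averaging scan described above can be sketched like this (assuming, for brevity, that the image dimensions are multiples of N and at least one block qualifies; names are mine):

```c
/* Area-of-interest scan: img is row-major 8-bit greyscale, w and h
   are assumed to be multiples of n, and at least one block is assumed
   to pass the threshold. */
typedef struct { int x0, y0, x1, y1; } rect_t;

static rect_t find_interest(const unsigned char *img, int w, int h,
                            int n, int thr)
{
    rect_t r = { w, h, -1, -1 };
    for (int by = 0; by < h; by += n)
        for (int bx = 0; bx < w; bx += n) {
            long sum = 0;
            for (int y = 0; y < n; y++)
                for (int x = 0; x < n; x++)
                    sum += img[(by + y) * w + bx + x];
            if (sum / (n * n) > thr) {   /* block has valuable data */
                if (bx < r.x0) r.x0 = bx;
                if (by < r.y0) r.y0 = by;
                if (bx + n - 1 > r.x1) r.x1 = bx + n - 1;
                if (by + n - 1 > r.y1) r.y1 = by + n - 1;
            }
        }
    /* 33% margin on each side, clamped to the image. */
    int mw = (r.x1 - r.x0) / 3, mh = (r.y1 - r.y0) / 3;
    r.x0 -= mw; if (r.x0 < 0) r.x0 = 0;
    r.y0 -= mh; if (r.y0 < 0) r.y0 = 0;
    r.x1 += mw; if (r.x1 >= w) r.x1 = w - 1;
    r.y1 += mh; if (r.y1 >= h) r.y1 = h - 1;
    return r;
}
```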
Following is the greyscale conversion using G2.
Regarding numerical constants such as thresholds or window sizes: I just
use whatever value comes to mind. If it doesn't work, I double it, halve it
or modify it some other way until it does. The actual values to be used in
the field will be found out by experimentation under the field's scenery
and lighting conditions. It's possible that these will actually be functions
of amount of ambient light, relative direction of the sun etc.
I use the following Sobel operators for edge detection, Sy on the left
and Sx on the right:

    Sy: -1 -2 -1        Sx: -1  0  1
         0  0  0            -2  0  2
         1  2  1            -1  0  1
This gives me gradients in the same direction as the image coordinate
system: an increasing brightness from left to right gives a positive
X gradient. The same applies to the Y gradient.
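Applied at a single pixel, the two operators look like this in C (the kernel values are from the text; the flattened addressing is my own):

```c
/* Sobel gradients at pixel (x, y); img is row-major 8-bit greyscale.
   Caller must keep 1 <= x < w-1 and 1 <= y < h-1. */
static void sobel_at(const unsigned char *img, int w, int x, int y,
                     int *px, int *py)
{
    const unsigned char *p = img + y * w + x;
    /* Sx: positive when brightness increases left to right. */
    *px = -p[-w-1] + p[-w+1] - 2*p[-1] + 2*p[1] - p[w-1] + p[w+1];
    /* Sy: positive when brightness increases top to bottom. */
    *py = -p[-w-1] - 2*p[-w] - p[-w+1] + p[w-1] + 2*p[w] + p[w+1];
}
```

A maximal left-to-right step (0 to 255) produces PX = 1020, matching the bound derived below.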
By combining these two gradients, we get a gradient vector for each
pixel. The size of this vector tells us whether the pixel is on an edge
or not. There are two ways of calculating this size:
DIST1 is the correct calculation. Many people use DIST2 in order to save
computation time. PX and PY are the results of the Sx and Sy operators
at a given pixel. These have a maximum value of 1020 and a minimum value
of -1020. For each plus and minus pair in the operators, the maximum
difference we can have is 255. The sum of all positive factors is 4
in each case. Multiplying these, we get 1020.
Here is a plot of the difference between DIST1 and DIST2 over the input
range. The error is quite large, going up to about 600 in the worst case. I think
I'll go with DIST1 for now. When I develop the rest of the system
I can try DIST2. Besides, I can always chop up DIST1 into segments
and use some linear approximations.
Having an accurate size for a gradient vector is quite important, I think.
This is because I will use some thresholding sooner or later. Accuracy
of these values can make a big difference.
The maximum value of size is SQRT(2)*1020 for DIST1. This doesn't fit into
8 bits. In order to convert it to 8 bits, I can
use two methods:
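A sketch of the two quantizers as I read them: QUAN1 saturates at 255, QUAN2 linearly scales the full range sqrt(2)*1020 down to 255. The names are the post's; the exact rounding is my assumption.

```c
#include <math.h>

/* QUAN1: saturate -- anything above 255 clips to 255. */
static unsigned char quan1(double size)
{
    return size > 255.0 ? 255 : (unsigned char)size;
}

/* QUAN2: linear scaling from [0, sqrt(2)*1020] down to [0, 255]. */
static unsigned char quan2(double size)
{
    return (unsigned char)(size * 255.0 / (sqrt(2.0) * 1020.0));
}
```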
Although I'd expect QUAN2 to do much better, it doesn't seem to be the case.
The problem is that, if you have even one high value for size,
it causes all other values to drop when quantized. This results in very
faint lines for the edges. Faint edges means that I can't use a high threshold
value to remove spurious details resulting from lighting, reflections etc.
It's possible to analyse the range of the size values and quantize them
properly by lumping the highest values together into one bucket, but that
would take more computation power and I'm not really sure how I would go
about it. So, I'll go with saturation (QUAN1) for now.
t10 captures the borders of the ball perfectly in this example. I might
not be so lucky with more objects around the landmark or different
distribution of color.
Detecting a Circle
In order to detect the circle, we need to separate it from other objects.
In t10 there are three groups of connected pixels corresponding to the
ball's shadow, reflection of the light source on the ball and the ball itself.
I think that if there is a circle in an image, its border will always
make a connected thick set of pixels. So, if I can isolate connected sets
from each other I can decide whether each one corresponds to a circle or not.
I will handle this partitioning on a separate page.
After I've partitioned the image into connected sets, I can find the maximally distant pair of pixels in
the connected set. The line segment connecting these would be the diameter
of the circle, if the set in question does correspond to a circle. By
translating this diameter along its normal, I can find the intersections
of the resulting line segments with the connected set. The lengths of these
line segments would follow the equation for the circle if the shape is right.
Here is what I mean by this:
Here, the relation between D, a and b would tell us whether the shape is
a circle or not. This method would be really slow, since we need to find
the maximal distance within a set: that is an O(N²) operation.
However, it can be done faster.
Let's pick two non-neighboring points A,B and calculate the distance D between
them. If we move B to L, the distance will decrease. If we move it to R,
it will increase. By making use of this fact, we can determine the diameter
in O(N) steps as follows:
1. Pick A and B and let D = distance(A, B).
2. For each neighbour K of B, calculate Dk = distance(K, A).
3. Find the K where Dk is maximum and greater than D.
4. If step 3 succeeds, set B = K, D = Dk and go to step 2.
5. If step 3 fails, exchange A and B and go to step 2. If this is the
   second failure in a row, we have found the local diameter: stop.
Here is what I mean by local diameter: it's the maximal distance which
can be reached from a given pair of starting points. Below are two
cases 1 and 2 where the initial pairs are connected in blue and the final
pairs are connected in red. It may not be very accurate; I just made it quickly.
The point is: we don't need to find the biggest line segment connecting
two points in the set. We just need to find the biggest segment we can
reach and then we will decide whether the shape is a circle using that.
Anyway, my initial thought of translating a found diameter was cute and
all but not really necessary. If I have a segment which is a local diameter,
then I can find its midpoint P and measure distances of the points in the
set to P. If these distances are pretty much all the same, then we have
a circle. Otherwise, we don't have a circle.
Here is the result of our example image being partitioned:
I can eliminate p3 and p4 easily by looking at the area of the bounding
box of the partition. If it's too small, I'm not interested even if it's a circle.
While looking for an initial pair of points, I can simply choose the top-left
pixel and the bottom-right pixel. Scanning for these is very easy, I just
go sequentially from the start or end of the image.
I tried my idea for
finding the maximally distant pair and it failed spectacularly on t10-p1.
I chose two neighbouring points in order to follow their paths, A shown
in red and B shown in green.
This happens because our shape isn't exactly a circle even though we're
going to classify it as one later. Here, there are some missing pixels.
These may be caused by the camera or my processing but no matter.
I've got two points on the exterior of the circle, so I can construct a
third one as follows. I will draw the chord AB. When I take the normal of this
chord, I'm assured that it will intersect the shape since A and B are on
opposite sides of the normal and they are connected. The furthest
intersection is my third point. Using these three points, I can draw a
circle. Below is the result, overlaid on top of t10-p1.
The center-radius triplet I found is slightly bigger than the ball. It can
be seen more clearly in the image below. In the above image, what I should
do is to maximize the number of blue pixels which exist in both the image
and the generated circle.
t11 overlaid on t6
The reason for getting a circle too big is pretty much obvious. Border
pixels which have very little intensity are treated the same way as the
other pixels which are properly contained in the ball image. This can
be observed very clearly in the lower right part of the above image.
In order to fix this problem, I shouldn't just go with the first circle
I find, but search for a better one in the neighbourhood. I will look
at 27 circles total, varying the center coordinates as well as the
radius by -1,0 and 1. The fitness metric will be the total luminosity
value the circle accumulates over the edge pixels it passes. i.e. it
will be the sum of the blue pixels in t11 over t10-p1. I have been
using the result of the partitioning directly. I will now use pixels
from the source image instead. Here is the REAL partitioning of t10:
With the above modifications, here is the best circle I've found:
So far, all I've done is to fit a circle to a given connected set. I
still have to decide whether the whole connected set is a circle arc
or not. For this, I can use a method I previously mentioned. I could
look at the distance of each point to the center of the circle.
If there are too many outliers, then maybe our shape isn't a circle
I measured the following:
avg( abs(dist(p,center)-r) ) for p in shape
I got 1.3 for t10-x1, 1.9 for t10-x2 and 1.6 for t10-x5. I did some
tests with artificial images as well. I got a maximum average error
of 1.73 for artificial circles. For other shapes, the average error
is really high if the shape is not close to a circle. For ellipses
which are really close to circles, the average error comes out as
slightly above 2. So, maybe I can blindly use the value 2 for the threshold.
I think I'm now ready to write the whole ball detection function.
Let's review what I have done:
X0 : input
X1 = threshold(greyscale-red( X0 ))
X2 = greyscale-luminosity( X0 )
X3 = mask X2 by X1
X4 = threshold(quant1(sobel(X3)))
For each connected set S in X4,
fit circle C to S
if the average distance error in S with
regard to C is too high, reject S
Now, I've done it. Here is a source
package for a test program. The final code is in detcir.[ch].
A New Approach
Masking the greyscale(G2) image with color information didn't work
so well for scenes where there are a lot of objects near the ball
I want to detect. I get a smaller image (that's not guaranteed
either) but all the edges made by objects of irrelevant colors
are still there and need to be eliminated. Here is an example:
What I really need is to detect the changes in relative intensity
of our target color, in this case red. Just looking at the red
channel doesn't work because white also has a lot of red in it. I need
to somehow suppress everything which doesn't look red. I came up
with the following:
[G3] gray = RED - GREEN/2 - BLUE/2
This is of course clamped to between 0 and 255. G3 is high when red
component is high, unless green and blue are also high. This also
suppresses dark reds to a degree. Here is the output of G3.
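G3 is the one conversion stated exactly, so it can be transcribed directly (the integer halving of GREEN and BLUE is my assumption about how the division is done):

```c
/* [G3] gray = RED - GREEN/2 - BLUE/2, clamped to 0..255. */
static unsigned char g3(int red, int green, int blue)
{
    int v = red - green / 2 - blue / 2;
    if (v < 0) v = 0;
    if (v > 255) v = 255;
    return (unsigned char)v;
}
```

Note that pure white maps to (almost) zero, which is why the white surface disappears in the G3 output.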
If you compare it to t3, this is much nicer. The white surface is now
completely darkened. All that remains is the ball and its shadow/reflection.
Applying a Gaussian filter of size 5 really helps here. It serves two
purposes: it reduces the effects of JPEG compression and weakens edges.
Weakening edges is important in separating the ball from its shadow.
For the ball, we have a lot of pixels which will serve as edges even
after the blurring. This is not so for the shadow. It will lose all
but its most prominent edges due to this blurring. Here is what I mean:
Compare this to the edge image of G2:
Edges in p3 are much thicker for the ball itself, but
less prominent for the shadow. This is because we are now detecting
edges based on changes in the red color. The Gaussian filter also pulls
the borders of the ball towards the center. This is because the pixels
at the inside border of the circle have more neighbours with nonzero
intensity than those at the outer border.
After thresholding and
partitioning, we end up with:
The algorithm makes a good attempt at even a red ball on a red table:
There are many circles associated with the ball here. These come from
the disconnected arcs of the ball. These disconnects are caused by
the combination of the reddish table with other objects creating
edges around the ball.
The algorithm finds a circle for almost all edges shown below.
I have filtered the list of circles by hiding
the ones which have a radius greater than half the width of the image.
These are caused by flat components.
I'm sure I won't have to deal with such scenery for some time.
So, I'm going to go ahead with this
implementation for now:
p2= gauss(p1, 5, 1.0)
p4= threshold(p3, 200)
partition p4 and find circles as before
The nice thing about this is that we have only two magic numbers: color
coefficient vector in G3 and the threshold value after edge detection.
The former should pretty much be adjustable by examining the color
of the ball to be detected. The latter is more complicated: a high
value decreases the chance of detection and reduces accuracy, while a
low value prevents the algorithm from working at all.
The above code packages are for test programs. They do contain detcir.[ch]
but I want to access them directly. So, here I will maintain the latest
version of detcir.c and detcir.h.