Improving Object Detection with Contrast Stretching (Part 2/2)

In the previous article about contrast stretching, we explored percentile contrast stretching and how to apply it to improve the performance of object detection models. Percentile contrast stretching is also called (histogram) normalization, because it normalizes the range of the pixel intensities. In this article we examine another contrast stretching method: histogram equalization. Besides comparing model performance, we also compare the preprocessing speed of histogram equalization in SAS with that of our own percentile contrast stretching implementation in Python.
A short recap of the previous article: we want to train an object detection model (Faster R-CNN) to detect tents in refugee camps. Figure 1 shows the precision-recall plot of our base case, together with the rolling mean (window=100) of a measure for the area size of the detections (sqrt(Area)). We notice two things: although many tents are found (recall=0.7), there are also many false positives (precision=0.2). The mAP score is 31%. We want to improve this result with contrast stretching, and in this article we look specifically at histogram equalization.

Figure 1: Precision-Recall plot before contrast stretching, 10 epochs

We will first explain the math behind histogram equalization for all math enthusiasts, but feel free to skip directly to the results!

The math

With histogram equalization one can distribute all pixel intensities uniformly over the range [0,255]. Instead of simply spreading out the values, which is what we did with percentile contrast stretching, we now choose a new value for each pixel intensity in such a way that the histogram becomes uniformly distributed. To do this, we look for a transformation y=f(k), where y is the new pixel intensity based on the old pixel intensity k.

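We approach the histogram of pixel intensities from a probabilistic point of view: the intensities x_i of all N pixels can be regarded as draws from a random variable X. The probability of observing intensity k is then its relative frequency,

$$p_X(k) = P(X = k) = \frac{1}{N} \sum_{i=1}^{N} I\{x_i = k\},$$

where I{x_i=k} is an indicator function:

$$I\{x_i = k\} = \begin{cases} 1 & \text{if } x_i = k, \\ 0 & \text{otherwise.} \end{cases}$$

Then the discrete cumulative distribution function (CDF) is

$$F_X(k) = P(X \le k) = \sum_{j=0}^{k} p_X(j).$$

To make y uniformly distributed on the range [0,255], we use the transformation

$$y = f(k) = 255 \cdot F_X(k).$$

We prove that this transformation results in a uniform distribution as follows. Treating the intensities as continuous, so that p_X is the derivative of F_X, the transformation Y=f(X) leads to a new distribution p_Y(y), which can be deduced with the inverse CDF method (using that f is non-decreasing):

$$F_Y(y) = P(Y \le y) = P(f(X) \le y) = P(X \le f^{-1}(y)) = F_X(f^{-1}(y)).$$

Taking the derivatives with respect to y of both sides gives

$$p_Y(y) = p_X(f^{-1}(y)) \, \frac{d}{dy} f^{-1}(y).$$

As f^{-1}(y)=k by definition, we get

$$p_Y(y) = p_X(k) \, \frac{dk}{dy} = p_X(k) \left( \frac{dy}{dk} \right)^{-1}.$$

Substituting y=f(k), so that dy/dk = 255 p_X(k), we obtain

$$p_Y(y) = p_X(k) \left( 255 \, p_X(k) \right)^{-1} = \frac{1}{255},$$

which equals the probability density function of a uniform distribution on the domain [0,255].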
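
For readers who prefer code over formulas, the idea of scaling the CDF to the range [0,255] can be sketched in a few lines of NumPy. This is a minimal illustration and not the SAS program used for the results in this article: the function name `equalize_histogram`, the rounding to integers, the per-channel application and the synthetic test slice are all assumptions made for the example.

```python
import numpy as np


def equalize_histogram(channel: np.ndarray) -> np.ndarray:
    """Map each intensity k of one 8-bit channel to y = 255 * F_X(k)."""
    # p_X(k): relative frequency of every intensity 0..255
    counts = np.bincount(channel.ravel(), minlength=256)
    p_x = counts / channel.size
    # F_X(k): cumulative sum of p_X up to and including k
    cdf = np.cumsum(p_x)
    # Lookup table with the new intensity for every old intensity
    # (rounding to an integer is an assumption of this sketch)
    lookup = np.round(255 * cdf).astype(np.uint8)
    return lookup[channel]


# Synthetic low-contrast 400x400 slice (hypothetical data, values 100..139)
rng = np.random.default_rng(0)
slice_ = rng.integers(100, 140, size=(400, 400), dtype=np.uint8)
equalized = equalize_histogram(slice_)

print("before:", slice_.min(), slice_.max(), round(float(slice_.std()), 1))
print("after: ", equalized.min(), equalized.max(), round(float(equalized.std()), 1))
```

Because the mapping is a lookup table indexed by the old intensity, the whole slice is transformed in one vectorized step and the order of the intensities is preserved.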

Results

The first row of Figure 2 shows an example of an image slice before (left) and after histogram equalization (right). The middle row shows the histogram of the 400×400 pixel values of the original slice, together with the histograms of the two stretched slices. The difference between the stretching methods is especially clear in the tails. With 2-98 percentile stretching, a larger number of pixels have value 0, because all pixels with an original value below the 2nd percentile are mapped to this value. The same holds for the value 255 and the pixels above the 98th percentile.

Figure 2: Top row: image slices (Source: Google Earth, Maxar Technologies; second and third slice are edited). Middle row: histogram of the pixel distribution of the slice above. Lower row: distribution of the standard deviation per image slice for each image band (GBR) and the mean of the standard deviation of the three channels (in orange).

We also compare the standard deviation of the pixel values in each image slice again. The standard deviations of all 58,163 image slices are plotted in the bottom row of Figure 2. What stands out is that the spread of sigma is a lot smaller for the set of images after histogram equalization. This makes sense, as we are trying to make the pixel values of each image slice uniformly distributed. The variance of the values of each slice therefore approaches the theoretical variance of a uniform distribution on [a,b]:

$$\mathrm{Var}(Y) = \frac{(b-a)^2}{12}.$$

With a=0 and b=255 we get

$$\mathrm{Var}(Y) = \frac{255^2}{12} = 5418.75.$$

This leads to a standard deviation

$$\sigma = \sqrt{5418.75} \approx 73.6,$$

which is exactly where you can find the peak in the histogram.
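
The theoretical standard deviation of a uniform distribution on [0,255] is easy to check numerically. The sketch below evaluates (b-a)^2/12 and compares it with simulated pixel values; the simulated 400×400 slice is hypothetical data and not one of the slices from our dataset.

```python
import math

import numpy as np

# Continuous uniform distribution on [a, b]
a, b = 0, 255
variance = (b - a) ** 2 / 12
sigma = math.sqrt(variance)
print(variance, round(sigma, 2))  # 5418.75 73.61

# Simulated slice with uniformly distributed 8-bit pixel values
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(400, 400))
print(round(float(pixels.std()), 2))
```

The simulated value lands slightly above the continuous result, because a discrete uniform distribution on the 256 integers 0..255 has variance (256^2-1)/12, which gives a standard deviation of roughly 73.9.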

In Figure 3 we see that the mAP has increased from 31.03% to 40.93%. The effect is slightly larger than that of the percentile contrast stretching from our previous post (mAP=40.58%).

Figure 3: Precision-Recall plot after histogram equalization, 10 epochs

Comparing processing speed

Although our two methods do not differ considerably in how much they improve model performance, we are interested in whether one method significantly outperforms the other on speed. We therefore ran both methods serially on the same virtual machine with 16 cores of 2.4 GHz (8 CPUs with 2 cores each) and 264 GB of total memory.

To do the percentile contrast stretching in Python, we wrote our own algorithm, as we could not find a function that would do this for us. How this code is written has a large impact on speed. Our first algorithm could process around 50 image slices per minute. By relying more on NumPy functions and fewer for-loops, we increased the speed to 2,500 image slices per minute.
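
To give an idea of what a loop-free version can look like, here is a minimal sketch of 2-98 percentile contrast stretching on a single channel. It is not the code behind the timings in this article: the function name `percentile_stretch`, its parameters, the handling of flat slices and the synthetic test slice are assumptions made for illustration.

```python
import numpy as np


def percentile_stretch(channel: np.ndarray, low: float = 2, high: float = 98) -> np.ndarray:
    """Linearly stretch the [low, high] percentile range of a channel to [0, 255]."""
    p_low, p_high = np.percentile(channel, [low, high])
    if p_high == p_low:
        # Flat slice: nothing to stretch, so return it unchanged (assumed behaviour)
        return channel.astype(np.uint8)
    # One vectorized expression replaces a loop over all pixels
    scaled = (channel.astype(np.float64) - p_low) / (p_high - p_low) * 255
    # Pixels outside the percentile range end up at 0 or 255
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)


# Synthetic low-contrast 400x400 slice (hypothetical data)
rng = np.random.default_rng(0)
slice_ = np.clip(rng.normal(120, 15, size=(400, 400)), 0, 255).astype(np.uint8)
stretched = percentile_stretch(slice_)

print(slice_.min(), slice_.max(), "->", stretched.min(), stretched.max())
print("share of pixels at 0:  ", (stretched == 0).mean())
print("share of pixels at 255:", (stretched == 255).mean())
```

The two printed shares illustrate the tails discussed in the results: roughly 2% of the pixels or more end up at 0 and at 255, because everything outside the percentile range is clipped.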

When coding in SAS you generally have less flexibility than when coding in Python. In SAS there might be fewer than 5 ways to reach the same outcome, while in Python there are easily more than 20, depending on the packages you use. However, this flexibility (from easy-to-read code to very efficient code) comes with a trade-off in speed. Obtaining a result is not as hard in Python as it is in SAS, but obtaining it quickly can be much harder in Python because of the large number of options you have when writing a program. A SAS program could therefore have a speed advantage, and this indeed seems to be the case. The histogram equalization of 58,163 image slices of 400×400 pixels each takes 7 minutes, a speed of about 8,300 image slices per minute, which is more than 3 times as fast as our Python code.

Although the contrast stretching methods differ, so that the difference in speed cannot be attributed completely to the difference between Python and SAS, it gives an indication of how fast SAS can be.
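
The throughput figures above follow from simple arithmetic, reproduced here using only the numbers quoted in this article (58,163 slices, 7 minutes in SAS, 2,500 slices per minute in Python). The variable names are chosen for this example.

```python
n_slices = 58_163      # image slices of 400x400 pixels
sas_minutes = 7        # reported SAS run time for histogram equalization
python_rate = 2_500    # reported Python speed in slices per minute

sas_rate = n_slices / sas_minutes         # slices per minute in SAS
speedup = sas_rate / python_rate          # SAS speed relative to Python
python_minutes = n_slices / python_rate   # implied Python run time

print(round(sas_rate), round(speedup, 2), round(python_minutes, 1))  # 8309 3.32 23.3
```

At the reported rate, the Python implementation would need a bit more than 23 minutes for the full set of slices.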

If you would like to receive the program codes of both Python and SAS, feel free to reach out via the LinkedIn post about this article!