Improving Object Detection with Contrast Stretching (Part 2/2)

In the previous article about contrast stretching, we explored percentile contrast stretching and how to apply it to obtain better performance in object detection models. Percentile contrast stretching is also called (histogram) normalization, as we normalize the range of the pixel intensities. In this article we will examine another contrast stretching method, called histogram equalization. Besides comparing model performance, we will also compare the preprocessing speed of histogram equalization in SAS with our own Python implementation of percentile contrast stretching.
A small recap of the previous article: we want to train an object detection model (Faster R-CNN) to detect tents in refugee camps. In Figure 1 the mAP plot of our base case can be seen, together with the rolling mean (window=100) of a measure for the area size of the detections (sqrt(Area)). We notice two things: although a lot of tents are found (recall=0.7), we have a lot of false positives (precision=0.2). The mAP score is 31%. We want to improve this result by using contrast stretching, and in this article we specifically look into histogram equalization.

Figure 1: Precision-Recall plot before contrast stretching, 10 epochs

We will first explain the math behind histogram equalization for all math enthusiasts, but feel free to skip directly to the results!

The math

With histogram equalization one can uniformly distribute all pixel intensities over the range [0,255]. Instead of simply spreading out these values more, which is what we did with percentile contrast stretching, we now choose a new value for each pixel intensity in such a way that the histogram becomes uniformly distributed. To do this, we are looking for a transformation $y = f(k)$, where y is the new pixel intensity based on the old pixel intensity k.

We approach the histogram of pixel intensities from a probabilistic point of view: we regard the N pixels with intensities $x_i$ as draws from a random variable X. We can then express the occurrence of one intensity value k as a probability,

$$p_X(k) = \frac{1}{N} \sum_{i=1}^{N} I\{x_i = k\},$$

where $I\{x_i = k\}$ is an indicator function:

$$I\{x_i = k\} = \begin{cases} 1 & \text{if } x_i = k, \\ 0 & \text{otherwise.} \end{cases}$$

Then the discrete cumulative distribution function (CDF) is

$$F_X(k) = P(X \le k) = \sum_{j=0}^{k} p_X(j).$$

To make y uniformly distributed on the range [0,255], we will use the transformation

$$y = f(k) = 255 \cdot F_X(k).$$
We prove that this transformation results in a uniform distribution as follows. As we introduced $p_X(k)$, the transformation $Y = f(X)$ leads to a new distribution $p_Y(y)$, which can be deduced by using the inverse CDF method:

$$F_Y(y) = P(Y \le y) = P(f(X) \le y) = P\big(X \le f^{-1}(y)\big) = F_X\big(f^{-1}(y)\big).$$

Taking the derivatives with respect to y of both sides gives

$$p_Y(y) = p_X\big(f^{-1}(y)\big) \cdot \frac{\mathrm{d}}{\mathrm{d}y} f^{-1}(y).$$

As $f^{-1}(y) = k$ by definition, we get

$$p_Y(y) = p_X(k) \cdot \frac{\mathrm{d}k}{\mathrm{d}y}.$$

Substituting $y = f(k) = 255 \cdot F_X(k)$, whose derivative is $\frac{\mathrm{d}y}{\mathrm{d}k} = 255 \cdot p_X(k)$, we obtain

$$p_Y(y) = \frac{p_X(k)}{255 \cdot p_X(k)} = \frac{1}{255},$$

which equals the probability density function of a uniform distribution on the domain [0,255].
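For readers who want to try this themselves, the transformation above fits in a few lines of numpy. This is a minimal sketch for illustration, not the SAS implementation we used; OpenCV's cv2.equalizeHist offers a ready-made single-channel equivalent.

```python
import numpy as np

def equalize_channel(channel: np.ndarray) -> np.ndarray:
    """Histogram-equalize one 2-D channel with uint8 intensities in [0, 255]."""
    # p_X(k): empirical probability of each of the 256 intensity values.
    counts = np.bincount(channel.ravel(), minlength=256)
    p = counts / channel.size
    # F_X(k): discrete CDF, and the transformation y = f(k) = 255 * F_X(k).
    lut = np.round(255 * np.cumsum(p)).astype(np.uint8)
    # Map every old intensity k to its new value y via the lookup table.
    return lut[channel]

def equalize_slice(img: np.ndarray) -> np.ndarray:
    """Equalize each of the three channels of a 400x400x3 slice independently."""
    return np.stack([equalize_channel(img[..., c]) for c in range(3)], axis=-1)
```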

Results

In the first row of Figure 2 an example of an image slice before (left) and after (right) histogram equalization can be found. The middle row of this figure shows the histogram of the 400×400 pixel values of the original slice, together with the histograms of the two stretched slices. The difference between the stretching methods is especially clear in the tails: for the 2-98 percentile stretching, a larger number of pixels ends up with value 0, as all pixels with an original value below the 2nd percentile are mapped to this value. The same holds for the value 255.

Figure 2: Top row: image slices (Source: Google Earth, Maxar Technologies; second and third slices are edited). Middle row: histograms of the pixel distributions of the slices above. Bottom row: distribution of the standard deviation per image slice for each image band (GBR) and of the mean of the standard deviations of the three channels (in orange)

We also again compare the standard deviation of the pixel values in each image slice. The standard deviations of all 58,163 image slices are plotted in the bottom row of Figure 2. What stands out is that the spread of sigma is a lot smaller for the set of images after histogram equalization. This makes sense, as we are trying to make the pixel values of each image slice uniformly distributed, so the variance of the values of each slice will approach the theoretical variance of a uniform distribution:

$$\sigma^2 = \frac{(b-a)^2}{12}.$$

With a=0 and b=255 we get

$$\sigma^2 = \frac{255^2}{12} \approx 5419.$$

This leads to a standard deviation of

$$\sigma = \frac{255}{\sqrt{12}} \approx 73.6,$$
which is exactly where you can find the peak in the histogram.
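As a quick sanity check of this value, a standalone snippet (independent of our image data):

```python
import numpy as np

# Theoretical standard deviation of Uniform(0, 255): 255 / sqrt(12).
print(255 / np.sqrt(12))                                  # 73.612...
# Empirical check: a large uniform sample shows the same sigma.
sample = np.random.default_rng(0).uniform(0, 255, 10**6)
print(sample.std())                                       # ~73.6
```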

In Figure 3 we see that the mAP has increased from 31.03% to 40.93%. The effect is slightly larger than that of the percentile contrast stretching from our previous post (mAP=40.58%).

Figure 3: Precision-Recall plot after histogram equalization, 10 epochs


Comparing processing speed

Although our two methods do not differ considerably in terms of increasing model performance, we are interested in whether one method significantly outperforms the other on speed. Therefore we run both methods serially on the same virtual machine with 16 cores at 2.4 GHz (8 CPUs with 2 cores each). The total memory of this virtual machine is 264 GB.

For the percentile contrast stretching in Python, we wrote our own algorithm, as we could not find a function that would do this for us. How you write this piece of code matters a lot for processing speed: our first algorithm could process around 50 image slices per minute, but by relying on numpy's vectorized operations instead of for-loops we increased this to 2,500 image slices per minute.
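We cannot paste our full program here, but the sketch below shows the vectorized numpy pattern that made the difference; details such as file I/O and the exact function layout are ours for illustration.

```python
import numpy as np

def stretch_channel(channel: np.ndarray, low: float = 2.0, high: float = 98.0) -> np.ndarray:
    """2-98 percentile contrast stretch of one channel to the range [0, 255]."""
    p_low, p_high = np.percentile(channel, [low, high])
    if p_high == p_low:                       # flat channel: nothing to stretch
        return channel.astype(np.uint8)
    # One vectorized pass over all 160,000 pixels instead of a Python for-loop.
    stretched = (channel.astype(np.float64) - p_low) * 255.0 / (p_high - p_low)
    # Pixels below the 2nd or above the 98th percentile are clipped to 0 / 255.
    return np.clip(stretched, 0, 255).astype(np.uint8)

def stretch_slice(img: np.ndarray) -> np.ndarray:
    """Apply the stretch to each channel of a 400x400x3 image slice."""
    return np.stack([stretch_channel(img[..., c]) for c in range(3)], axis=-1)
```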

When coding in SAS you generally have less flexibility than in Python: in SAS there might be fewer than 5 ways to get to the same outcome, while in Python there are easily more than 20, depending on which packages you use. This flexibility (from easy-to-read code to very efficient code) comes with a trade-off, however. Obtaining a result in Python is not hard, but obtaining it quickly can be, precisely because of the large number of options you have when writing a program. A SAS program could therefore have a speed advantage, and this indeed seems to be the case: the histogram equalization of 58,163 image slices of 400×400 pixels each takes 7 minutes in SAS, a speed of roughly 8,300 image slices per minute, which is more than 3 times faster than our Python code.

Although the contrast methods differ, and the difference in speed can therefore not be attributed completely to the difference between Python and SAS, it gives an indication of how fast SAS can be.

If you would like to receive the Python and SAS code, feel free to reach out via the LinkedIn post about this article!

Improving Object Detection with Contrast Stretching (Part 1/2)

An important part of training neural networks is preprocessing of the input. A lot of performance gain can be obtained by carefully examining, cleaning and transforming the input data. In this post we will consider the influence of contrast stretching of our input images on the performance of a Faster R-CNN network that detects objects; in this Hackathon these objects are white tents in refugee camps. This post is the first part of our series about contrast stretching. The second part can be found here.

Background

A Faster R-CNN is an object detection model based on conventional CNNs. A CNN (Convolutional Neural Network) is often used to classify images; Faster R-CNN extends such a CNN with region proposals, i.e. regions of the image in which an object is potentially present. The CNN part then classifies these proposals.
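Our model itself was trained in SAS, but for readers who want to experiment, torchvision ships a comparable Faster R-CNN. A minimal usage sketch (the two classes, background and tent, match our setup; the weights here are untrained):

```python
import torch
import torchvision

# Two classes: background and tent. Weights are randomly initialized here;
# in practice one would fine-tune from a pretrained backbone.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
model.eval()

# Inference on one 400x400 RGB slice, channels first and scaled to [0, 1].
image = torch.rand(3, 400, 400)              # stand-in for a real image slice
with torch.no_grad():
    detections = model([image])[0]           # dict with boxes, labels, scores
print(detections["boxes"].shape, detections["scores"][:5])
```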

Using a training set of 3,500 image slices, we train the Faster R-CNN model for 10 epochs, with 1,000 randomly chosen images in each epoch. We validate our model on a validation set consisting of 2,034 image slices containing in total 1,976 white tents.

In Figure 1 the mAP (mean Average Precision) plot can be found, together with the rolling mean (window=100) of a measure for the area size of the detections (sqrt(Area)). Keep in mind that, to create a mAP curve, the detected boxes are sorted by the probability that the box contains a tent, according to the model. We notice two things: although a lot of tents are found (recall=0.7), we have a lot of false positives (precision=0.2). Especially the sharp drop in the mAP curve at the left side of the graph is a problem: it indicates that, although the model is quite confident that these boxes contain a tent, in reality they do not contain any object. The model also seems to be more confident when assessing larger bounding boxes, as those appear (incorrectly, as just discussed) at the beginning of the mAP curve.

Figure 1: Precision-Recall plot before contrast stretching, 10 epochs
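To make the bookkeeping behind such a curve concrete, here is a minimal sketch of how precision and recall follow from confidence-sorted detections. Matching detections to ground-truth tents via IoU is omitted, and all numbers in the toy example are made up.

```python
import numpy as np

def precision_recall(scores, is_true_positive, n_ground_truth):
    """Precision and recall after each detection, most confident first."""
    order = np.argsort(scores)[::-1]                 # sort boxes by confidence
    tp = np.cumsum(np.asarray(is_true_positive)[order])
    precision = tp / np.arange(1, len(scores) + 1)   # correct fraction so far
    recall = tp / n_ground_truth                     # tents found so far
    return precision, recall

# Toy example: 5 detections, 4 tents in the ground truth.
p, r = precision_recall([0.9, 0.8, 0.6, 0.5, 0.3], [1, 0, 1, 1, 0], 4)
print(np.round(p, 2))   # [1.   0.5  0.67 0.75 0.6 ]
print(np.round(r, 2))   # [0.25 0.25 0.5  0.75 0.75]
```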

We try to improve these results by using contrast stretching. Contrast stretching should theoretically improve the learning ability of the model, as it enhances the contours of objects and emphasizes the difference between object and background. This helps the convolution layers in extracting information and features from the images. Contrast stretching can be done for an image simply by using a formula that scales up the differences in pixel values. As we would like to use colored images, we have to scale up three image ‘channels’: the red channel (R), the green channel (G) and the blue channel (B). A common convention is to represent each pixel of each channel with an integer in the range [0,255], where an RGB value of (0,0,0) equals black and (255,255,255) represents white.

The math

As we deal with image slices of 400×400 pixels, this results in three channels of 400×400 pixels, so in total one image slice can be represented by a 400×400×3 matrix with values between 0 and 255. Before we dive into the formulas, one final remark needs to be made: we are going to split this 400×400×3 matrix up into 3 matrices of 400×400, apply contrast stretching to each color channel (R, G and B), and afterwards combine the three stretched matrices into a 400×400×3 matrix again.

Stretching can then be done by transforming each pixel x with a formula like

$$x' = a + \frac{(x - x_{\min})(b - a)}{x_{\max} - x_{\min}},$$

where a represents the lower boundary to which we want to scale and b the upper boundary. Instead of using the minimum and maximum values as boundaries in our contrast stretch, we use the 2nd and 98th percentile values, respectively. An advantage of taking these percentile values is that the stretch is more robust to outliers: if only one pixel in the image channel were 0 and only one pixel were 255, no contrast stretching would occur at all with the minimum and maximum as boundaries, whereas the percentile values still generate stretching. We should be careful though: after stretching this way, we have to clip all values that end up below 0 to 0 and all values that end up above 255 to 255 (i.e. the pixels below the 2nd and above the 98th percentile). This directly shows the disadvantage of stretching with values other than the minimum and maximum: we lose some information.
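As a small worked example with made-up percentile values: suppose one channel has $p_2 = 30$ and $p_{98} = 200$, and we stretch to a=0, b=255. A pixel with value x=100 is then mapped to

$$x' = \frac{(100 - 30) \cdot 255}{200 - 30} \approx 105,$$

while a pixel with value x=20 would be mapped to roughly -15 and is therefore clipped to 0.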

To quantify the effect of the contrast stretching, we introduce a metric that represents the spread in color within one channel k of one image n: $\sigma_{k,n}$.

This standard deviation can be obtained by calculating the variance of one 400×400 matrix. If we denote the pixel at row i and column j of this matrix by $x_{i,j}$, we get

$$\sigma_{k,n}^2 = \frac{1}{I \cdot J} \sum_{i=1}^{I} \sum_{j=1}^{J} \left( x_{i,j} - \bar{x} \right)^2,$$

where $\bar{x}$ is the average pixel value in the matrix, calculated with

$$\bar{x} = \frac{1}{I \cdot J} \sum_{i=1}^{I} \sum_{j=1}^{J} x_{i,j},$$

with $I = J = 400$. Then $\sigma_{k,n} = \sqrt{\sigma_{k,n}^2}$.

Our total dataset consists of N=58,163 images.

In the upper two image slices in Figure 2 the qualitative effect of 2-98 percentile contrast stretching can be seen.

In the two diagrams below we have plotted histograms of our metric. For each image slice we calculated the standard deviation of each of the three image channels; the diagram on the right shows the result after applying the contrast stretching separately to the three channels. As we would like to monitor the effect of applying contrast stretching separately to each channel, we also plotted a histogram for the total image, obtained by taking the average of the three standard deviations (R, G and B, so K=3):

$$\bar{\sigma}_n = \frac{1}{K} \sum_{k=1}^{K} \sigma_{k,n}.$$

As can be seen, the shift in the average standard deviation per image is representative of the shift per channel. Moreover, a significant increase in standard deviation can be observed. We can calculate the average standard deviation over all N image slices with

$$\mu_\sigma = \frac{1}{N} \sum_{n=1}^{N} \bar{\sigma}_n.$$
Figure 2: Top left: original image slice (Source: Google Earth, Maxar Technologies). Top right: the same image slice after 2-98 percentile contrast stretching. Bottom: distributions of the standard deviation per image slice for each image band (GBR) and of the mean of the standard deviations of the three channels (in orange)

Notice that this standard deviation of the pixel values in an image increases a lot: the average standard deviation of all pixel values within an image slice (400×400 pixels) was $\mu_\sigma = 26$ before contrast stretching, and after stretching it is $\mu_\sigma = 64$. The values of the standard deviation are also more normally distributed. Every channel has been stretched with its own 2nd and 98th percentile values (per image slice), but as can be seen the stretching effect is the same for every channel.
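For completeness, a sketch of how these metrics can be computed per image slice with numpy (variable and function names are ours, for illustration):

```python
import numpy as np

def spread_metric(img: np.ndarray) -> tuple[np.ndarray, float]:
    """sigma_{k,n} per channel of one 400x400x3 slice, and their mean over K=3."""
    sigma_per_channel = img.reshape(-1, 3).std(axis=0)   # sigma_{k,n}, k = R, G, B
    return sigma_per_channel, float(sigma_per_channel.mean())

# mu_sigma is then the mean of the per-slice averages over all N slices:
# mu_sigma = np.mean([spread_metric(img)[1] for img in slices])
```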

Results

When training the model on this pre-processed dataset, we end up with a new mAP curve, see Figure 3.

Figure 3: Precision-Recall plot after 2-98 percentile contrast stretching, 10 epochs

When comparing Figure 1 to Figure 3, we notice several improvements. First of all, the recall increases from 0.7 to 0.8, which means 80% of the 1,976 tents are detected. The sharp drop in precision at the start of the mAP plot has also decreased. Together this leads to an improvement of the mAP score of roughly 9 percentage points. Furthermore, notice that the rolling mean of the area decreases more gradually. As we do not want the model to be too sensitive to the size of tents, this is an improvement as well. However, smaller detected bounding boxes still come with a faster drop in precision, as can be seen when following the blue graph from recall=0.6 to the right. So some improvement can still be made.

SAS EMEA Hackathon 2020 – A deeper dive into our case

As stated in our previous post, we are going to build a model that extracts information from satellite images to help estimate the number of refugees in refugee camps in Nigeria. In this post we dive deeper into this goal. We will answer the questions "How do we want to make this model work?" and "How is this going to help in estimating the number of refugees?"

How do we want to make this model work?
In Figure 1 an example of satellite imagery can be found, in which tents are clearly distinguishable. We think it is possible to extract the number of people in a refugee camp by considering the buildings in such a camp. Based on the buildings, several features can be extracted, e.g. the number of tents or the total area the tents cover. Bjorgo (2000) has shown, for 5 refugee camps, that information from satellite imagery can be used to estimate populations in refugee camps. However, it is cumbersome to count the number of tents or the covered area by hand, especially when going from 5 to 50 camps and when considering temporal variation as well (see Figure 2). We want to automate this process in such a way that it can easily be used during daily operations, such as decision making in supply distribution. We will train this object detection model using survey data collected in these refugee camps.

Figure 1: A refugee camp in Nigeria, 10 June 2015 (Source: Google Earth, Maxar Technologies)


Figure 2: A refugee camp in Nigeria, 2 January 2016 (Source: Google Earth, Maxar Technologies)

How is this going to help in estimating the number of refugees?
When looking at Figure 1, one could question the accuracy of the relationship between the tents and the number of refugees. Tents could be placed in advance to anticipate an increase in the number of refugees, or tents could be empty because of a decrease in the number of refugees in a camp. A shortage of tents could also leave refugees without a shelter, leading to an underestimation of the population when counting tents. However, after discussing these scenarios with ELVA we could quickly conclude that tents are almost never empty. Most of the work in these camps is responsive, as there is often a shortage of supplies and workforce. Also, most of the population in a camp is connected to family by phone, so even if tents were empty, this message would spread rather quickly, resulting in more refugees travelling to the camps concerned. A shortage of tents is more realistic, and in the surveys we indeed encounter people without a shelter. Even in these cases, however, it is still useful to estimate the number of people within the camp based on the number of tents, for two reasons. First of all, our model could at least give a lower bound on the number of refugees in a camp, which is better than the current situation. Secondly, our tool will not (directly) replace the surveying, so we can check our direct estimates against the observations of the surveys a month later. Finding a systematic underestimation of the number of refugees in a camp could indicate a shortage of tents in that camp. This can lead to more quantitatively based decision making in supply distribution.


References

Bjorgo, E. (2000). Using very high spatial resolution multispectral satellite sensor imagery to monitor refugee camps. International Journal of Remote Sensing, 21(3), 611-616.