Comparative Study of Illumination-Invariant Foreground Detection
https://doi.org/10.1007/s11227-018-2488-1
Abstract
Foreground detection plays a vital role in finding the moving objects of a scene. Over
the last two decades, many methods have been introduced to tackle the issue of illumination
variation in foreground detection. In this article, we propose a method to segment
moving objects under abrupt illumination change and analyze its merits and demerits
against seven other algorithms commonly used for illumination-invariant
foreground detection. The proposed method calculates the entropy of the
video scene to determine the level of illumination change and selects the
update model based on the difference in entropy values. Benchmark datasets possessing
different challenging illumination conditions are used to analyze the efficiency
of the foreground detection algorithms. Experimental studies demonstrate the performance
of the proposed algorithm against several algorithms under various illumination
conditions, as well as its low time complexity.
1 Introduction
Foreground detection is a basic step and a commonly used approach for segmenting foreground
objects in video surveillance applications. Temporal differencing, optical flow
and background subtraction are the three methods used to detect foreground objects
in a video scene. Of these three techniques, background subtraction is the most widely
used because of its accurate foreground detection and low computational cost.
In general, all background subtraction techniques [1–4] model the stationary portion
of the video scene as background and compare the current scene with the modeled
background to detect foreground objects. Background subtraction algorithms [5] range
from simple subtraction of consecutive frames to sophisticated probabilistic models.
A simple background subtraction method may not detect foreground objects
accurately if the video scene is dynamic. A dynamic video scene may contain complex
background elements [6], such as waving trees, changing lighting, chairs and escalators.
Foreground detection algorithms play a critical role in applications like public safety
[7] and traffic monitoring systems [8].
Despite the many algorithms proposed [9–15], illumination-invariant foreground
detection [16, 17] is far from being completely solved. A good foreground detection
algorithm should adapt to gradual as well as sudden illumination changes. Gradual
illumination changes may happen because of the time of day
(i.e., the movement of the sun), whereas sudden illumination changes
may be caused by clouds passing over an outdoor scene or lights switching on or off in an indoor
scene. Gradual illumination changes can be handled by the existing algorithms to
some extent, but sudden ones cannot. Recently, a block-based background
modeling algorithm and singular value decomposition (SVD)-based models [18, 19]
were presented to detect objects under varying illumination conditions. The block-based
model uses entropy and the sum of absolute differences of blocks, but it suffers from
over-segmentation, whereas the SVD-based model uses local structural information
but suffers from slow processing of video frames.
Although many works have been proposed to evaluate the performance of foreground
detection algorithms [1–4], to the best of our knowledge this article is the first to
address the adaptability and performance of different algorithms under various illumination
changes. This article provides a comparative study of commonly used foreground
detection algorithms and the proposed technique in detecting moving
objects under varying illumination. It may help engineers select appropriate
algorithms for detecting foreground objects in different kinds of video scenes.
The rest of the article is organized as follows: Sect. 2 describes seven foreground
detection algorithms used to deal with illumination variations. Section 3 presents our
foreground detection technique. Section 4 reports the experimental results of the various
algorithms on three video datasets, and finally, conclusions are drawn in Sect. 5.
2 Foreground detection algorithms

Background subtraction algorithms model the background and find the disparity
between the current frame and the background to identify foreground objects. The resulting
binary image is called the motion mask. Most background subtraction models
[2] use the following formula to calculate the motion mask:
$$M_t(k,l) = \begin{cases} 1 & \text{if } d(I_t(k,l),\, B_t(k,l)) > \tau \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

where $M_t(k,l)$ is the motion mask at time $t$, $d$ is the disparity between the current frame $I_t$ and the background model $B_t$, and $\tau$ is a threshold that varies among algorithms, usually taking a value in the range [2, 6]. If the disparity $d$ is larger than the threshold $\tau$, the pixel location $(k,l)$ is assigned 1; otherwise, it is assigned 0.
foreground detection algorithms are presented as follows.
2.1 Frame differencing

Frame differencing is the fastest and simplest of all foreground detection methods. The
previous frame of the video sequence is taken as the background, and the absolute disparity
between the current frame and the previous frame gives the motion mask. Its performance
varies with the speed of the moving foreground objects and the selected threshold value.
It can be expressed mathematically as
$$B_t(k,l) = I_{t-1}(k,l) \qquad (2)$$
where $B_t(k,l)$ is the background model at pixel $(k,l)$ and $I_{t-1}(k,l)$ is the pixel value of the previous frame at $(k,l)$.
2.2 Approximated median

The background model is estimated by applying a running median [10] to the incoming
frames as follows. Initially, the first image of the video sequence is taken
as the background model; each background pixel is then incremented by 1 if the corresponding current pixel
intensity is greater than the background pixel, or decremented by 1 if the current
pixel intensity is less than the background pixel. This method is computationally
inexpensive since it stores only one background image. Its major disadvantages
are that it updates the background slowly when sudden changes occur and that
stationary foreground objects become background after some time. The
approximated median can be expressed mathematically as
$$B_t(k,l) = \begin{cases} B_{t-1}(k,l) + 1 & \text{if } B_{t-1}(k,l) < I_t(k,l) \\ B_{t-1}(k,l) - 1 & \text{if } B_{t-1}(k,l) > I_t(k,l) \end{cases} \qquad (3)$$
where $B_t(k,l)$ is the background model at pixel location $(k,l)$, $B_{t-1}(k,l)$ is the previous background model at $(k,l)$ and $I_t(k,l)$ is the current pixel value at $(k,l)$.
2.3 Single Gaussian

The single Gaussian [9] is the simplest Gaussian model for finding the motion mask. It calculates the mean
of the video scene, subtracts the mean frame from every incoming frame and checks
whether the disparity is larger than the threshold. If the disparity exceeds $\rho$ times the standard
deviation, the pixel is labeled as a moving object; otherwise, it is labeled as
background. This model performs well under gradual illumination change but fails
when sudden illumination changes occur. After finding the motion mask, it updates the
mean value as given in Eq. (5). The single Gaussian can be denoted mathematically as
$$M_t(k,l) = \begin{cases} 1 & \text{if } |I_t(k,l) - \mu_t(k,l)| > \rho\sigma \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

$$\mu_t(k,l) = (1-\alpha)\,\mu_{t-1}(k,l) + \alpha\, I_t(k,l) \qquad (5)$$
where $M_t(k,l)$ denotes the foreground, $I_t(k,l)$ is the current pixel value at $(k,l)$, $\mu_t(k,l)$ and $\mu_{t-1}(k,l)$ are the current and previous mean values at pixel location $(k,l)$, $\sigma$ is the standard deviation, $\rho$ is a free parameter and $\alpha$ is a learning rate.
2.4 Gaussian mixture model (GMM)

The single Gaussian is sufficient if the scene is static, but in reality the scene may not be
static, so the authors of [11] proposed multimodal distributions to handle changes in the
background. Later, in [12], the authors modeled every pixel as a mixture of K Gaussians,
as given in Eq. (6), with K typically in the range [3, 5]. Every pixel is compared
with the corresponding K Gaussians to detect the foreground. The probability of $I_t$ being
one among the K Gaussians is
$$P(I_t) = \sum_{i=1}^{K} \omega_{i,t}\, \eta(I_t - \mu_{i,t}, \Sigma_{i,t}) \qquad (6)$$

where $\eta(I_t - \mu_{i,t}, \Sigma_{i,t})$ is the $i$th Gaussian with mean $\mu_{i,t}$, covariance $\Sigma_{i,t}$ and weight $\omega_{i,t}$; the covariance matrix is assumed to be $\Sigma_{i,t} = \sigma_i^2 I$. Initially, the first image of the video acts as the background and $\sigma$ is assumed to be 6. The parameters of the GMM are updated as follows:

$$\omega_{i,t} = (1-\alpha)\,\omega_{i,t-1} + \alpha\, N_{i,t} \qquad (7)$$

$$\mu_{i,t} = (1-\rho)\,\mu_{i,t-1} + \rho\, I_t \qquad (8)$$

$$\sigma_{i,t}^2 = (1-\rho)\,\sigma_{i,t-1}^2 + \rho\,(I_t - \mu_{i,t})^2 \qquad (9)$$

where $\alpha$ and $\rho$ are the learning rates, and $N_{i,t}$ is an indicator variable that equals 1 if the $i$th component is matched and 0 otherwise. From the K distributions, ordered by weight, the first H distributions are taken as the background, where H is estimated as

$$H = \arg\min_h \left( \sum_{i=1}^{h} \omega_i > \tau \right) \qquad (10)$$

where $\tau$ is a threshold. If a pixel intensity value deviates by more than 2.5 standard deviations from all of the H distributions, it is labeled as a foreground pixel.
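A full per-pixel implementation of the mixture update is lengthy; as an illustration, OpenCV ships a mixture-of-Gaussians background subtractor (MOG2) in the spirit of [12], which can be used as follows (the video file name is hypothetical, and the parameter values shown are OpenCV defaults rather than values from this article).

import cv2

# MOG2 is OpenCV's mixture-of-Gaussians background subtractor,
# a descendant of the Stauffer-Grimson model [12].
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=False)

cap = cv2.VideoCapture("video.avi")  # hypothetical input file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)   # 255 = foreground, 0 = background
cap.release()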
2.5 Sigma-delta
The sigma-delta method utilizes the approximated median [10] to model the background.
In addition to the approximated median, the authors in [15] measure the temporal activity
of the pixels by estimating the temporal standard deviation. If the difference image
obtained from the approximated median method is greater than the temporal standard
deviation, the pixel is labeled as foreground; otherwise, it is labeled as background. Since it involves
only elementary increment and decrement operations, this method is well suited to real-time
use, but if a foreground object becomes static, it is misclassified as
background.
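A minimal NumPy sketch of one sigma-delta iteration, in the style of [15]; the amplification factor N used in the activity update is illustrative.

import numpy as np

def sigma_delta_step(frame, M, V, N=4):
    """One sigma-delta step: approximated-median background M plus a
    sigma-delta estimate V of the temporal activity (a sketch after [15])."""
    I = frame.astype(np.int32)
    M = M.astype(np.int32) + np.sign(I - M.astype(np.int32))  # median tracking
    O = np.abs(I - M)                                         # difference image
    V = np.maximum(V.astype(np.int32) + np.sign(N * O - V), 1)  # activity, kept >= 1
    mask = (O > V).astype(np.uint8)   # foreground where difference exceeds activity
    return mask, M, V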
2.6 ISBS
The ISBS method updates the background with respect to the change in luminance level.
In [13], the background model is estimated as in the approximated median, and entropy
is used to detect the luminance change that drives the background update. The entropy
value varies as the video scene becomes dark or bright: entropy rises as the
scene brightens and falls as the scene darkens. Entropy is estimated
from the probability density function (pdf) determined for every incoming video frame
as follows:
$$E_t = -\sum_{l=l_{\min}}^{l_{\max}} \mathrm{pdf}(l) \log(\mathrm{pdf}(l)) \qquad (11)$$

$$\mathrm{pdf}(l) = n_l / (M \cdot N) \qquad (12)$$
where $E_t$ is the entropy at time $t$, $l$ indexes the intensity values in the video frame, $l_{\min}$ and $l_{\max}$ are the minimum and maximum intensities of the video sequence, $n_l$ is the frequency of an intensity value and $M \cdot N$ is the size of the image.
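A minimal NumPy sketch of this entropy computation, assuming an 8-bit grayscale frame:

import numpy as np

def frame_entropy(frame):
    """Entropy of a grayscale frame from its intensity pdf (Eqs. 11-12)."""
    hist, _ = np.histogram(frame, bins=256, range=(0, 256))
    pdf = hist / frame.size             # Eq. (12): n_l / (M * N)
    pdf = pdf[pdf > 0]                  # drop empty bins so log is defined
    return -np.sum(pdf * np.log(pdf))   # Eq. (11)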
2.7 ViBe
In ViBe [14], background pixels are modeled with a set of sample values, rather than with a
particular background model. New values coming from the video scene are compared
with the background samples. The background, modeled by a set of N values, is as
given below:

$$B(k,l) = \{v_1, v_2, \ldots, v_N\} \qquad (13)$$

where $B(k,l)$ denotes the background samples at pixel location $(k,l)$ and $\{v_1, v_2, \ldots, v_N\}$ are the samples at that location. To distinguish a pixel as foreground or background, a sphere $S_R(v(k,l))$ of radius R centered at the new value $v(k,l)$ is defined. The pixel is identified as background if the number of samples inside the sphere exceeds the threshold:
$$M_t(k,l) = \begin{cases} 0 & \text{if } \#\{S_R(v(k,l)) \cap \{v_1, v_2, \ldots, v_N\}\} > \tau \\ 1 & \text{otherwise} \end{cases} \qquad (14)$$
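A minimal sketch of the per-pixel classification step of Eq. (14); the radius and match count shown are the default values commonly used with ViBe [14].

import numpy as np

def vibe_classify(pixel, samples, R=20, min_matches=2):
    """ViBe classification for one pixel (Eq. 14): background (0) if enough
    stored samples lie within radius R of the new value."""
    matches = np.sum(np.abs(samples.astype(np.int32) - int(pixel)) < R)
    return 0 if matches >= min_matches else 1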
3 Proposed method
The proposed algorithm calculates the entropy of the video scene to determine the level
of illumination change that has occurred in the scene. The first frame
of the video is taken as the initial background model; the entropy
of each incoming frame is then calculated and compared with the previous entropy value to
determine the level of change in the video scene. If the change
exceeds the threshold value, the present background model is replaced by the initial
background model; otherwise, the present model is updated recursively as in the single Gaussian
model. The strategy is to reset the background model to the initial model when a sudden
illumination change takes place and to update the background model recursively when a
gradual illumination change occurs. The threshold value was found empirically and is
set to 0.06. The simplicity of the algorithm not only lets it run in real time but also
gives competitive results.
A pseudo-code of the proposed background model is given below.

Proposed Algorithm
Input: n video frames.
Output: foreground objects.
Read video.
Divide the video sequence into frames f1, f2, f3, ..., fn.
Initial background model = f1.
Estimate the entropy of the initial background model.
for i = 2 to n do
    Calculate the entropy E_current of frame fi.
    Compare E_current with the entropy E_previous of the preceding frame.
    if |E_current - E_previous| > threshold then
        Replace the background model with the initial background model.
    else
        Update the background model recursively as in the single Gaussian model.
    end if
    Subtract the background model from fi and threshold the difference to obtain the motion mask.
end for
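A minimal NumPy sketch of this strategy follows. Only the entropy threshold of 0.06 comes from the description above; the learning rate and mask threshold are illustrative.

import numpy as np

def entropy(frame):
    """Intensity entropy, as in Eqs. (11)-(12)."""
    hist, _ = np.histogram(frame, bins=256, range=(0, 256))
    pdf = hist[hist > 0] / frame.size
    return -np.sum(pdf * np.log(pdf))

def proposed_foreground(frames, entropy_tau=0.06, alpha=0.05, mask_tau=30):
    """Sketch of the proposed entropy-driven update strategy."""
    initial = frames[0].astype(np.float64)
    background = initial.copy()
    prev_entropy = entropy(frames[0])
    masks = []
    for frame in frames[1:]:
        cur_entropy = entropy(frame)
        if abs(cur_entropy - prev_entropy) > entropy_tau:
            background = initial.copy()     # sudden change: reset the model
        else:                               # gradual change: recursive update
            background = (1 - alpha) * background + alpha * frame
        masks.append((np.abs(frame - background) > mask_tau).astype(np.uint8))
        prev_entropy = cur_entropy
    return masks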
4 Experimental results
The detection quality of the algorithms is measured using recall, precision and F-measure:

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (15)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (16)$$

$$\text{F-measure} = \frac{2\,(\text{Precision})(\text{Recall})}{(\text{Precision}) + (\text{Recall})} \qquad (17)$$
where true positive (TP) is the number of foreground pixels correctly identified
as foreground, false negative (FN) is the number of foreground pixels wrongly
identified as background, and false positive (FP) is the number of background
pixels misclassified as foreground. Recall describes the fraction of the actual foreground
region that is correctly detected. Precision indicates the fraction
of the region labeled as foreground that is correctly detected. A good algorithm
should produce both high precision and high recall. The F-measure gives a
weighted average of recall and precision; it is used in addition to precision
and recall because labeling a single true positive pixel can yield high precision,
while labeling every pixel in a frame as foreground can yield high recall.
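A minimal NumPy sketch computing these metrics from a binary motion mask and its ground truth:

import numpy as np

def evaluate_mask(mask, truth):
    """Recall, precision and F-measure (Eqs. 15-17) for binary masks
    where 1 marks foreground pixels."""
    tp = np.sum((mask == 1) & (truth == 1))
    fp = np.sum((mask == 1) & (truth == 0))
    fn = np.sum((mask == 0) & (truth == 1))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return recall, precision, f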
Figure 1 shows the moving objects detected by the seven commonly used algorithms
and the proposed algorithm on three video sequences. The motion masks generated by
the eight algorithms are shown against the original video and its ground truth. The
qualitative results of the various algorithms are depicted in the figure. The frame differencing
algorithm is good at detecting changes that occur at a moderate rate but fails to
detect changes that are too slow or too fast. The approximated median algorithm is quite
fast in execution but produces many false negatives, and it incorporates
a foreground object into the background even if the object stops for a short time. Even though the
single Gaussian is less complex, it is unable to detect the changes consistently and
underperforms compared to the other methods.
The GMM method outperforms all other methods in most situations, but it is the most
complex of the compared methods and is computationally
expensive on embedded hardware. The sigma-delta method inherits the advantages of
the approximated median method and, in addition, incorporates a better thresholding
technique to detect foreground objects. The ISBS method is computationally inexpensive
but fails to detect gradual changes, and considerable noise appears
in the resulting frames. The ViBe method, which generally performs well in dynamic
scenes, fails to handle gradual and sudden illumination changes.
Fig. 1 Results of foreground detection in the light switch, time of day and lobby video sequences (rows: original image, ground truth, frame difference, approximated median, single Gaussian, GMM, sigma-delta, ISBS, ViBe, proposed method)
Table 1 Quantitative analysis of illumination-invariant algorithms

Sequence      Metric     Frame       Approximated  Single    GMM     Sigma-delta  ISBS    ViBe    Proposed
                         difference  median        Gaussian                                       method
Light switch  Recall     0.3150      0.1782        0.3740    0.7685  0.7521       0.6875  0.5563  0.6459
              Precision  0.7967      0.0420        0.0799    0.1343  0.1426       0.7420  0.1154  0.8315
              F-measure  0.4515      0.0680        0.1317    0.2286  0.2397       0.7137  0.1911  0.7270
Time of day   Recall     0.2636      0.2253        0.3873    0.6280  0.4896       0.2483  0.2830  0.4325
              Precision  0.6732      1.0000        0.9946    0.9525  0.9684       0.6051  0.9951  0.7746
              F-measure  0.3788      0.3678        0.5576    0.7569  0.6503       0.3521  0.4407  0.5551
Lobby         Recall     0.2816      0.0874        0.3010    0.7573  0.6621       0.4932  0.2932  0.5515
              Precision  0.5598      0.0449        0.1769    0.8647  0.9342       0.0776  0.0216  0.7738
              F-measure  0.3747      0.0593        0.2229    0.8075  0.7750       0.1341  0.0402  0.6440
Table 2 Comparison of FPS of various approaches (columns: sequence, resolution, frame difference, approximated median, single Gaussian, GMM, sigma-delta, ISBS, ViBe, proposed method)
5 Conclusion
References
1. Piccardi M (2004) Background subtraction techniques: a review. IEEE Int Conf Syst Man Cybern
4:3099–3104
2. Benezeth Y, Jodoin PM, Emile B, Laurent H, Rosenberger C (2012) Comparative study of background subtraction algorithms. J Electron Imaging 19(3):12
3. Ahmed SH, El-Sayed KM, Elhabian SY (2008) Moving object detection in spatial domain using background removal techniques-state-of-art. Recent Pat Comput Sci 1(1):32–54
4. Bouwmans T (2014) Traditional and recent approaches in background modeling for foreground detection: an overview. Comput Sci Rev 11–12:31–66
5. Cheung SC, Kamath C (2004) Robust techniques for background subtraction in urban traffic video. In: Proceedings of SPIE 5308, Visual Communications and Image Processing
6. Li L, Huang W, Gu IYH, Tian Q (2003) Foreground object detection from videos containing complex background. In: ACM International Conference on Multimedia, pp 2–10
7. Wahyono, Filonenko A, Jo KH (2016) Unattended object identification for intelligent surveillance systems using sequence of dual background difference. IEEE Trans Ind Inf 12(6):2247–2255
8. Wang K, Liu Y, Gou C, Wang FY (2016) A multi-view learning approach to foreground detection for
traffic surveillance applications. IEEE Trans Veh Technol 65(6):4144–4158
9. Wren CR, Azarbayejani A, Darrell T, Pentland AP (1997) Pfinder: real-time tracking of the human
body. IEEE Trans Pattern Anal Mach Intell 19(7):780–785
10. McFarlane NJB, Schofield CP (1995) Segmentation and tracking of piglets in images. Mach Vis Appl
8(3):187–193
11. Friedman N, Russell S (1997) Image segmentation in video sequences: a probabilistic approach. In: Thirteenth Conference on Uncertainty in Artificial Intelligence, pp 175–181
12. Stauffer C, Grimson WEL (1999) Adaptive background mixture models for real-time tracking. IEEE
Comput Soc Conf Comput Vis Pattern Recogn 2:252
13. Cheng FC, Huang SC, Ruan SJ (2011) Illumination-sensitive background modeling approach for
accurate moving object detection. IEEE Trans Broadcast 57(4):794–801
14. Barnich O, Van Droogenbroeck M (2011) ViBe: a universal background subtraction algorithm for
video sequences. IEEE Trans Image Process 20(6):1709–1724
15. Manzanera A, Richefeu JC (2004) A robust and computationally efficient motion detection algorithm based on Σ–Δ background estimation. In: Indian Conference on Computer Vision, Graphics and Image Processing, pp 46–51
16. Lou J, Yang H, Hu W, Tan T (2002) An illumination invariant change detection algorithm. In: Asian Conference on Computer Vision, pp 13–18
17. Holtzhausen PJ, Crnojevic V, Herbst BM (2015) An illumination invariant framework for real-time
foreground detection. J Real Time Image Process 10(2):423–433
18. Elharrouss O, Abbad A, Moujahid D, Tairi H (2018) Moving object detection zone using a block-based
background model. IET Comput Vis 12(1):86–94
19. Kim W, Kim Y (2016) Background subtraction using illumination-invariant structural complexity.
IEEE Signal Process Lett 23(5):634–638
20. Toyama K, Krumm J, Brumitt B, Meyers B (1999) Wallflower: principles and practice of background
maintenance. IEEE Int Conf Comput Vis 1:255–261
21. Li L, Huang W, Gu IYH, Tian Q (2004) Statistical modeling of complex backgrounds for foreground object detection. IEEE Trans Image Process 13(11):1459–1472