Dynamic background modeling using deep learning autoencoder network

Jeffin Gracewell · Mala John
https://doi.org/10.1007/s11042-019-7411-0
Abstract
Background modeling is a major prerequisite for a variety of multimedia applications such as video surveillance and traffic monitoring. Numerous approaches have been proposed for this task over the past few decades. However, the need for a real-time, low-cost approach based on artificial intelligence still exists. Moreover, some recently proposed approaches have not been validated on challenging applications, where their efficiency may fall short when tested. In this paper, an efficient deep learning technique based on an autoencoder network is used for modeling the background. The background model is obtained by training the network on the incoming frames of the surveillance video in an unsupervised manner. To optimize the weights of the network, a greedy layer-wise pre-training approach is used initially, and fine-tuning of the network is done using a conjugate-gradient-based back-propagation algorithm. The performance of the algorithm is validated on the application of unattended object detection in a dynamic environment. A comprehensive assessment of the proposed method using the CDNET 2014 dataset and other datasets demonstrates the efficiency of the technique in background modeling.
1 Introduction
In recent years, surveillance cameras have been deployed widely and in large numbers for various safety and security purposes [14, 16, 40, 45]. Monitoring all such surveillance video round the clock, manually, with human intervention alone, is a tedious task. Hence, a user-friendly, real-time, artificially intelligent system that can detect objects or events of interest by itself is highly desirable to reduce the burden of monitoring surveillance video. Therefore, the goal of the system is to design an approach
that can automatically identify unusual activities, such as abnormal event detection, unattended object detection, intruder detection and so on.
The fundamental technique used in surveillance is background subtraction. The purpose of background subtraction is to discriminate moving foreground objects from the background model. This is done by subtracting the current frame from the background model at each co-located pixel in the image. The accuracy of background subtraction depends on the quality of the background modeling.
Though many approaches have been proposed for background modeling over the last few decades, the need for improvement persists due to the inherent complexity of the problem. Performance is challenged by sudden changes in luminance, motion patterns created by waterfalls, waving trees, water ripples, fountains and shadows, the addition or removal of objects in the background, the opening or closing of doors, etc. Foreground extraction results are quite good for videos with a static background; however, accuracy suffers for non-static backgrounds. Surveillance videos in real-time scenarios have dynamic backgrounds and should be carefully modeled.
The simplest strategy to detect foreground regions is the frame differencing method, where the difference between the pixels in the current frame and those in a previous frame is calculated, with the previous frame acting as the background model. Clearly, this approach, although very simple, is prone to many false positive errors due to global illumination changes and other factors. As pioneering work, various simple background modeling techniques, such as simple averaging [39], the median [19], or histogram analysis over time [46], have been introduced. However, they are highly prone to error in various unconstrained real-world situations involving background clutter, outdoor dynamics, etc. In most of the previous works, the background model is updated at regular intervals for accurate results. Deng and Guo [9] proposed a self-adaptive background modeling method that improved performance by updating the Gaussian components of the background model at regular intervals.
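As a concrete point of reference for these simple models, the following is a minimal NumPy sketch of a running-average baseline in the spirit of [39]; the function name and the update rate are illustrative, not taken from the cited work.

```python
import numpy as np

def update_running_average(background, frame, alpha=0.02):
    """Exponential running-average background model (cf. [39]).

    background, frame: float32 grayscale images of equal shape.
    alpha: update rate (illustrative); larger values absorb scene
    changes into the background faster.
    """
    return (1.0 - alpha) * background + alpha * frame

# Usage sketch: seed with the first frame, then update frame by frame.
# bg = first_frame.astype(np.float32)
# for f in frames:
#     bg = update_running_average(bg, f.astype(np.float32))
```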
In order to cope with the different challenges of real-time scenarios, many authors have proposed methods with multiple background models rather than a single background model. Recently, Sajid and Cheung [36] proposed a universal multimode background subtraction method in which the training image is treated as a background model. Multiple color spaces are used for background subtraction depending on the lighting conditions: the RGB and Y color channels are used under poor lighting conditions, the Cb and Cr channels of the YCbCr color space are used under good lighting conditions, and both the YCbCr and RGB color spaces are used for foreground/background classification under intermediate lighting conditions.
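The lighting-dependent channel selection just described can be summarized schematically as follows; this is only an illustrative sketch, and the luminance thresholds and function name are assumptions, not values from [36].

```python
def select_channels(mean_luma, low=50.0, high=180.0):
    """Lighting-dependent channel selection in the spirit of UMBS [36].

    mean_luma: average Y (luma) of the frame in [0, 255].
    The thresholds `low`/`high` and this helper are illustrative
    assumptions, not values taken from [36].
    """
    if mean_luma < low:          # poor lighting: RGB plus Y
        return ["R", "G", "B", "Y"]
    if mean_luma > high:         # good lighting: chroma channels
        return ["Cb", "Cr"]
    return ["R", "G", "B", "Y", "Cb", "Cr"]  # intermediate: both spaces
```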
Several attempts have been made to provide neural-network-based background modeling techniques for foreground extraction. Gregorio and Giordano [8] proposed a background estimation method based on a weightless neural network, termed BEWiS. This method has a fast learning rate and is highly applicable to video surveillance of both long-term and live videos. Babaee et al. [1] proposed a deep learning method for background modeling based on a convolutional neural network that outperforms various other techniques.
In general, deep learning methods differ from all traditional approaches: they automatically learn features directly from the raw pixels that act as input to the network. In [17], the authors attempted to retrieve images using a deep autoencoder network and were successful on a large dataset containing about 80 million tiny images. In general, a major advantage of deep learning also lies in the fact that the performance of the system does not degrade as the amount of data increases, unlike other techniques. Moreover, Wang et al. [43] used autoencoders for data visualization applications. Also, the unsupervised pretraining used in the autoencoder acts as a pre-conditioner that optimizes the initial parameters, thereby making the network more effective for modeling the background image. Motivated by these factors, in this paper a deep-learning-based background modeling technique is proposed. The proposed method outperforms most other state-of-the-art methods on a challenging dataset. The robustness of the approach in modeling a dynamic background is demonstrated through abandoned object detection in remote video surveillance.
This paper is organized as follows. Section 2 presents the various works related to background modeling, Section 3 explains the proposed technique in detail, Section 4 elaborates on the applications of the proposed technique, Section 5 discusses the performance of the proposed technique along with its application, and Section 6 concludes the paper.
2 Related works
A plethora of work has been published on background modeling in the past decades. In general, based on the underlying methodology, background modeling can be classified into two main categories, namely pixel-based and region-based methods.
Pixel-wise methods are based on the assumption that the observation sequence of each pixel is independent of the others. These methods make use of statistical information over time to model the background. Wren et al. [44] modeled the background at each pixel location with a single Gaussian distribution. The single-Gaussian approach was then extended to mixture of Gaussians (MoG) models. Stauffer and Grimson [38] proposed the Gaussian mixture model (GMM), which models every pixel with a mixture of K Gaussian functions. An improvement to the GMM was then made using an online expectation maximization (EM) algorithm to initialize the parameters of the background model, but it suffers from high time complexity. Zivkovic [48] proposed an adaptive GMM (AGMM) to efficiently update the parameters of the GMM. This method showed improved performance in outdoor scenarios, but the need for a solution for dynamic background videos still prevailed. Elgammal et al. [11] developed statistical representations of the background and the foreground by utilizing nonparametric kernel density estimation. In order to efficiently handle backgrounds with large variations, Bayesian approaches have been introduced [3, 20]. Li et al. [20] introduced a Bayesian framework that incorporates spectral, spatial and temporal features to characterize the background appearance. Benedek et al. [3] proposed an adaptive shadow model, which improved performance for scenes under difficult lighting and coloring effects and in shadow regions.
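Returning to the per-pixel MoG idea above, the following is a simplified sketch in the spirit of Stauffer and Grimson [38]; the matching rule, learning rate and replacement heuristic are condensed illustrations, not the exact updates of [38].

```python
import numpy as np

def mog_pixel_update(x, mu, var, w, alpha=0.01, k_sigma=2.5):
    """Simplified per-pixel mixture-of-Gaussians update (cf. [38]).

    x: scalar pixel value; mu, var, w: arrays of K component parameters.
    A pixel matches a component if it lies within k_sigma standard
    deviations; all numeric parameters are illustrative.
    """
    match = np.abs(x - mu) < k_sigma * np.sqrt(var)
    if match.any():
        k = int(np.argmax(match))             # first matching component
        mu[k] += alpha * (x - mu[k])          # pull mean toward the sample
        var[k] += alpha * ((x - mu[k]) ** 2 - var[k])
    else:
        k = int(np.argmin(w))                 # replace the weakest component
        mu[k], var[k] = x, 100.0              # illustrative initial variance
    w[:] = (1.0 - alpha) * w                  # decay all weights...
    w[k] += alpha                             # ...and boost the matched one
    w[:] /= w.sum()                           # renormalize
    return mu, var, w
```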
Barnich and Droogenbroeck [2] presented a pixel-based nonparametric algorithm, termed ViBe, that detects the foreground using a random selection strategy. A few limitations of this method were later observed and corrected by Van Droogenbroeck and Paquot [10]. Their modifications include inhibition of propagation for some pixels in the updating mask, changes to the distance function and thresholding criterion, proper filtering of connected components, and so on. Lin et al. [23] proposed a method to initialize the background using a probabilistic Support Vector Machine (SVM). Output probabilities for each pixel are calculated using the SVM to classify the pixel as foreground or background, and these are used in the construction of the background model. The process of background initialization continues until no more new pixels are available for classification. Culibrk et al. [7] proposed a background segmentation method that relies on a Probabilistic Neural Network (PNN) architecture for background modeling and uses an unsupervised Bayesian classifier to classify pixels as foreground or background.
Kim et al. [15] presented a new adaptive background subtraction algorithm based on a codebook, in which each pixel is quantized into the codebook. This method achieves robust detection for compressed videos and can handle scenes containing moving backgrounds or illumination variations with limited memory. Although pixel-based background modeling methods can effectively obtain detailed shapes of foreground objects, they are easily affected by noise, illumination changes, and dynamic backgrounds.
Region-based methods make use of spatial information, in the form of image regions, to exploit the spatial dependencies of pixels in the background. Matsuyama et al. [32] proposed a correlation-based regional block matching method for background subtraction which is robust against varying illumination conditions. A few region-based methods make use of texture or pattern information to detect moving objects in the scene [13, 21, 24, 25, 35]. Li and Leung [21] used the difference in texture information between two adjacent frames to detect moving objects in the scene. Here, a discriminative texture feature called the local binary pattern (LBP), proposed by Ojala et al. [35], was used along with histogram information for modeling the background and detecting moving foreground objects. Moreover, such pattern information is also widely used in other applications, such as activity recognition in video surveillance. Bhargava et al. [5] proposed a framework based on contextual information for detecting abandoned baggage. Liu et al. [24] presented a method for activity recognition that uses temporal patterns as a mid-level feature representation of activities. This work was further extended in [25] for the recognition of complex activities by including adaptive Multi-Task Learning as an additional component, which captures the relatedness among activities and selects discriminative features. In [41], Toyama et al. proposed the Wallflower algorithm, a three-component system in which the region-level component segments the homogeneous regions of foreground objects. Maddalena and Petrosino [28] proposed a background modeling technique based on a new self-organizing method that classifies foreground and background by learning motion patterns. Furthermore, a neural-network-based mapping method is used to update the initial background model. In [47], Zhao et al. proposed a Stacked Multilayer Self-Organized Map (SMSOM) method for background modeling, in which every pixel is modeled by an SMSOM and spatial consistency is considered at each layer. This method has several merits, including a strong representational ability to learn background models for challenging scenarios and automatic determination of most network parameters. A basic sketch of the LBP feature used in the texture-based methods above is given below.
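The following is a minimal sketch of the basic 8-neighbour LBP code; the multiresolution, rotation-invariant variant of Ojala et al. [35] adds machinery that is omitted here.

```python
import numpy as np

def lbp8(img):
    """Basic 8-neighbour local binary pattern code per pixel (cf. [35]).

    img: 2-D grayscale array. Each interior pixel gets an 8-bit code where
    bit k is set if neighbour k is >= the center; borders are left zero.
    """
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out[1:-1, 1:-1] |= ((neighbour >= center).astype(np.uint8) << bit)
    return out
```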
More recently, deep learning has proved effective in image processing and has achieved superior capabilities in classification, representation learning and several other applications [12, 18, 26, 30, 34]. In [30], Marsden et al. proposed a residual deep learning architecture with multiple objectives for violent behavior detection, crowd counting, and crowd density classification. In [34], Muhammad et al. presented a CNN architecture for fire detection in a video surveillance application. Though this deep learning method performs better than the AlexNet architecture for fire detection, it suffers from a higher false alarm rate. Guo et al. [12] presented a deep learning approach for object retrieval that learns multiple deep features of a visual object in surveillance videos. Moreover, an advantage also lies in the adaptability of such systems to perform well in various applications based on their design functions. Motivated by these works, in this paper an unsupervised deep learning technique is used to model the background, which can be efficiently used in surveillance applications.
3 Methodology
3.1 Overview
A background model is a preliminary component for foreground extraction. In this paper, a method for background modeling is proposed that builds a deep network by stacking layers of autoencoders which capture the underlying background model. The deep learning network is capable of providing a latent representation of the input pattern with its hidden layers. The first T incoming frames are used for modeling the background using the deep learning architecture. With sufficient training, the output of the network gives the background model of the input frames presented to the network. An overview of the proposed architecture for background modeling is depicted in Fig. 1. The details of the training of the network and the associated background modeling are presented in the next subsection.
The deep learning architecture used herein for background modeling belongs to the class of unsupervised learning. In this work, an adaptive deep learning architecture is used that transforms the images into a low-dimensional code and decodes it back to the same input image. The architecture of the deep learning network is presented in Fig. 2. The network consists of a single input layer, a single output layer and three fully connected hidden layers stacked from two autoencoder networks. The approach involves training one layer at a time, in an unsupervised way, while freezing the parameters of the other layers. To do this, the raw input is given to the input of the first autoencoder, which transforms the raw input into a vector of lower dimension. This lower-dimensional vector is the output of the hidden layer of the autoencoder. The weights between the input and hidden layer are updated during this encoding stage of training the first autoencoder. The weight matrix of the decoding stage is the transpose of the weight matrix of the encoding stage. This procedure is repeated for the subsequent network, with the output of the hidden layer of the first autoencoder acting as the input of the next autoencoder. The second autoencoder provides an additional level of compact encoding.
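A control-flow sketch of this greedy layer-wise procedure is given below, assuming a helper `train_rbm` that trains a single layer (one such routine is sketched in the CD-1 example later in this section); all names and parameters are illustrative.

```python
import numpy as np

def greedy_layerwise_pretrain(data, layer_sizes, train_rbm):
    """Greedy layer-wise pretraining of stacked autoencoders (sketch).

    data:        (n_samples, n_features) array of flattened frames/patches.
    layer_sizes: hidden-layer widths, e.g. [1024, 256] (illustrative).
    train_rbm:   assumed helper that trains one layer and returns
                 (weights, hidden_bias); decoding reuses the transposed
                 weights, as described in the text.
    """
    layers, x = [], data
    for n_hidden in layer_sizes:
        w, b = train_rbm(x, n_hidden)            # train this layer only
        x = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # hidden codes feed the next layer
        layers.append((w, b))
    return layers
```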
The two autoencoders are trained sequentially. The basic structure of the autoencoder is shown in Fig. 3. Based on Fig. 2, during the first phase of learning, the image frames are used to train the first autoencoder network, which has three layers, namely the input layer (x), the output layer (x̂) and the hidden layer (h1). For the second autoencoder network, hidden layer (h1) acts as both the input and output layer, while layer (h2) acts as its hidden layer. The autoencoder used is a Restricted Boltzmann Machine (RBM).
Training the RBM involves representing the input with visible units, a positive phase, a negative phase, and updates of the weights and biases. In the positive phase, given the input to the visible units ($v_i^+$) of the RBM, the hidden unit states ($h_j^+$) of the RBM are updated. The individual activation probability of a hidden unit is given by

$$P\left(h_j \mid v\right) = \operatorname{sigmoid}\Big(b_j + \sum_i w_{ij}\, v_i\Big) \qquad (1)$$
where sigmoid(·) is the logistic function, $b_j$ is the bias contributing to the hidden unit and $\{w_{ij}\}$ is the set of weights associated with the hidden unit. In the negative phase, the reconstruction of the visible units $v_i^-$ and the hidden units $h_j^-$ is computed. The individual activation probability of a visible unit is given by

$$P\left(v_i \mid h\right) = \operatorname{sigmoid}\Big(a_i + \sum_j w_{ij}\, h_j\Big) \qquad (2)$$

where sigmoid(·) is the logistic function, $a_i$ is the bias contributing to the visible unit and $\{w_{ij}\}$ are the weights associated with it.
In the final step, the weights are updated. Given a training set of input frames, the visible states $v_i^+$ and hidden states $h_j^+$ are sampled from the data distribution, while $v_i^-$ and $h_j^-$ are the reconstructed visible and hidden states. The change in weight is given by

$$\Delta W_{ij} = \varepsilon\left(\langle v_i^+ h_j^+ \rangle - \langle v_i^- h_j^- \rangle\right) \qquad (3)$$

where $\varepsilon$ is the learning rate. Once the weights are pre-trained with the RBM, a back-propagation algorithm is used for fine-tuning.
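A minimal NumPy sketch of one such pre-training update (a single step of contrastive divergence, CD-1, implementing Eqs. (1)-(3)) follows; the learning rate and sampling details are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v_pos, W, a, b, eps=1e-4, rng=np.random.default_rng(0)):
    """One CD-1 update for a binary RBM, following Eqs. (1)-(3).

    Shapes: v_pos (n, nv), W (nv, nh), a (nv,), b (nh,).
    eps is an illustrative learning rate.
    """
    # Positive phase, Eq. (1): hidden probabilities given the data.
    h_pos = sigmoid(v_pos @ W + b)
    h_sample = (rng.random(h_pos.shape) < h_pos).astype(v_pos.dtype)
    # Negative phase, Eq. (2): reconstruct visibles, then hiddens again.
    v_neg = sigmoid(h_sample @ W.T + a)
    h_neg = sigmoid(v_neg @ W + b)
    # Weight/bias updates, Eq. (3): data minus model correlations.
    n = v_pos.shape[0]
    W += eps * (v_pos.T @ h_pos - v_neg.T @ h_neg) / n
    a += eps * (v_pos - v_neg).mean(axis=0)
    b += eps * (h_pos - h_neg).mean(axis=0)
    return W, a, b
```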
In general, the standard method of updating the weights and biases of a neural network is backpropagation. Here, the conjugate gradient method [6] is used for back propagation. Consider a neural network having one input layer, one output layer and one or more hidden layers. Let $W_{ij}^L$ be the weight associated with the connection between the $i$th node in layer $L-1$ and the $j$th node in layer $L$. During the forward pass, the activations ($Q^L$) in each node of all the layers in the network are computed, where $Q$ denotes the activations and $L$ denotes the output layer. After the forward pass, the error at the output layer is computed; the error signal is denoted by $\delta^L$:

$$\delta^L = Q^L - y \qquad (4)$$
where $y$ is the desired output, which in this case is the current input frame from the training dataset, since the network reconstructs its input. The error is then propagated back through the network into the hidden layers, where the error at each hidden layer, $\delta^{L-1}, \delta^{L-2}, \ldots, \delta^{2}$, is computed as

$$\delta^{L-1} = \big(W^{L}\big)^{T} \delta^{L} \odot f'\big(z^{L-1}\big) \qquad (5)$$

where $f'$ is the derivative of the activation function and $z^{L-1}$ denotes the pre-activations of layer $L-1$.
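As a minimal sketch of this error propagation, the following computes the per-layer signals of Eqs. (4)-(5), assuming sigmoid activations and row-vector conventions; the conjugate-gradient optimizer [6] that consumes these gradients is omitted.

```python
import numpy as np

def backprop_deltas(output, zs, weights, y):
    """Per-layer error signals, following Eqs. (4)-(5).

    output:  network output Q^L, shape (n, d_out).
    zs:      pre-activations of each non-input layer, [z^2, ..., z^L].
    weights: weight matrices; weights[-1] is W^L between layers L-1 and L.
    y:       target; for the autoencoder this is the input frame itself.
    """
    def sig_prime(z):
        s = 1.0 / (1.0 + np.exp(-z))
        return s * (1.0 - s)

    deltas = [output - y]                                  # Eq. (4): delta^L
    for W, z in zip(reversed(weights), reversed(zs[:-1])):
        deltas.append((deltas[-1] @ W.T) * sig_prime(z))   # Eq. (5)
    return deltas[::-1]                                    # [delta^2, ..., delta^L]
```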
This method is effective and can be used when training neural networks with large amounts of data. In general, the deep learning architecture trains itself by minimizing the error between the input image and its reconstruction. The choice of learning rate influences how quickly a stationary or moving foreground object is absorbed into the background model: larger learning rates cause the network to learn changes in the foreground faster, while lower learning rates make the network slower to adapt to sudden changes in the background. When trained on sets of incoming surveillance video frames in an unsupervised way, the deep learning architecture can probabilistically reconstruct the inputs by learning the input frames from the video.
In foreground detection, pixels are labeled as background or foreground. The background modeled image is used in background subtraction to extract the foreground object. To obtain the foreground pixel region F(x, y), the incoming image I(x, y) is subtracted from the background modeled image B(x, y), and a threshold H is applied to the absolute difference to classify each pixel as foreground or background. Based on (12), foreground pixels are labeled as ones and background pixels as zeros:

$$F(x, y) = \begin{cases} 0, & |I(x, y) - B(x, y)| < H \\ 1, & |I(x, y) - B(x, y)| \ge H \end{cases} \qquad (12)$$
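A one-line NumPy realization of Eq. (12) is sketched below; the threshold value is illustrative, as the paper does not fix one here.

```python
import numpy as np

def foreground_mask(frame, background, H=30):
    """Label pixels per Eq. (12): 1 = foreground, 0 = background.

    Casting to int16 avoids uint8 wrap-around in the difference;
    H is an illustrative threshold.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff >= H).astype(np.uint8)
```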
4 Application
The background model obtained using the deep learning approach is highly reliable in real-time applications, namely unattended object detection in video surveillance. The following subsection discusses this application in detail.
The proposed method is highly suitable for detecting unattended objects. Fig. 4 shows the general framework of unattended object detection. Here, the incoming video frames are used to train the deep learning network, which generates a background model at a regular periodic interval (T). Assuming that the static unattended object is not present while training on the initial set of incoming frames, the initial reference background is generated from frame numbers $P_k$ to $P_{k+T-1}$, where k = 0, 1, 2, 3, ... up to T-1; $P_0$ denotes the first frame, $P_1$ the second frame, and so on. The initial background model ($B_M$), generated by training on T frames, is termed the initial reference background model. The background models generated by training the deep learning autoencoder network compress the incoming frames into a latent space representation during the encoding stage and reconstruct them to generate a background model image during the decoding stage. Pre-training is done using the RBM to initialize the weights, and fine-tuning is done using backpropagation; an iterative procedure is followed to make the network learn the weights and biases. The details of network training are discussed in Section 3.2. Continuous training on the incoming video frames is performed, and the first updated background model ($B_{M+1}$) is obtained after training on 2T frames. This process continues for every set of incoming frames in the surveillance video, and each updated background model is obtained after training on a multiple of T frames. A scheduling sketch of this periodic retraining is given below.
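As a minimal sketch of this schedule, the following loop assumes a `train_background` routine standing in for the autoencoder training described above, mapping a list of frames to a background model image; the default T and the subsampling factor mirror the i-LIDS settings reported in Section 5.

```python
def periodic_background_models(frames, train_background, T=600, subsample=3):
    """Periodic background-model generation (Section 4 framework, sketch).

    Emits B_M after T frames, B_{M+1} after 2T frames, and so on;
    `train_background` is an assumed stand-in for the deep network.
    """
    models, seen = [], []
    for i, frame in enumerate(frames):
        if i % subsample:            # frame-rate reduction to cut cost
            continue
        seen.append(frame)
        if len(seen) % T == 0:       # after T, 2T, 3T, ... frames
            models.append(train_background(seen))
    return models                    # models[0] is the reference model B_M
```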
Objects that are in motion during the training of the background model do not influence the output much. Once the reference background model ($B_M$) and the updated background models ($B_{M+1}, B_{M+2}, \ldots, B_{M+n}$) are generated, an object that remains static in the scene becomes learned into the updated background models. It can be detected by subtracting each of the updated background models from the reference background model ($B_M$) in sequential order, a process termed "dual background subtraction". The binary mask of the unattended object is obtained via thresholding, and a rectangular blob is drawn around the foreground region to mark the detected abandoned object.
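The dual background subtraction step can be sketched as follows; the threshold and the persistence criterion are illustrative assumptions rather than parameters specified by the paper.

```python
import numpy as np

def detect_static_objects(reference_bm, updated_bms, H=30, persistence=2):
    """Dual background subtraction sketch: compare each updated model
    against the reference B_M; flag a region as an unattended object when
    it persists across `persistence` consecutive updated models.
    H and persistence are illustrative parameters.
    """
    masks = [np.abs(bm.astype(np.int16) - reference_bm.astype(np.int16)) >= H
             for bm in updated_bms]
    persistent = np.ones_like(masks[0], dtype=bool)
    for m in masks[-persistence:]:      # object must appear in the latest models
        persistent &= m
    return persistent.astype(np.uint8)  # binary mask; draw a box around blobs
```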
This technique can also be used for intruder detection in remote surveillance. The work can further be extended to activity recognition, similar to the methods proposed by Liu et al. [24, 25], by treating the temporal pattern of each activity as the feature for training the network. In order to enhance efficiency and compete with the adaptive Multi-Task Learning approach [25], an additional stream of a deep learning classification network can be trained by extracting features from the various activities to be classified.
5 Experimental results
In this section, the robustness of the algorithm for unattended object detection and intruder detection in remote video surveillance applications is demonstrated through the results obtained. The algorithm was implemented in MATLAB on a PC equipped with an AMD processor at 3.90 GHz and 8 GB of RAM.
The learning rate and the number of iterations play a major role in determining the accuracy of the proposed approach. Initially, the learning rate is selected using the AVSS 2007 abandoned bag dataset. In order to reduce the complexity of the network, the incoming video frames are downsized and partitioned into fixed patches of size 32 × 32. Three parameters, namely specificity, precision and F-measure, are used to evaluate the choice of learning rate. Fig. 5 depicts the results for these parameters at different learning rates.
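The patch partitioning described above can be sketched as follows; it assumes frame dimensions that are multiples of the patch size, which the resizing step can guarantee.

```python
import numpy as np

def to_patches(frame, p=32):
    """Partition a (H, W) frame into non-overlapping p x p patches and
    flatten each to a vector before feeding the network.

    Assumes H and W are multiples of p (ensured by the resize step).
    """
    H, W = frame.shape
    patches = (frame.reshape(H // p, p, W // p, p)
                    .swapaxes(1, 2)
                    .reshape(-1, p * p))
    return patches  # shape: (num_patches, 1024) for p = 32
```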
Based on the results, a learning rate of 0.0001 was selected. Being an iterative approach, the algorithm converges as the number of epochs increases from its initial value, and the accuracy remains roughly constant with further increases. Based on the experimental evaluation, 25 epochs are used for modeling the background image. The next subsection discusses the performance evaluation of the proposed method in the static environment.
To evaluate the performance of the proposed method, the CDNET 2014 video dataset, which includes a variety of indoor and outdoor environments, is used for qualitative and quantitative analysis. The dataset is significant as it includes videos covering a wide range of detection challenges distributed over various categories, viz. indoor, outdoor, night, thermal, turbulence, etc. The CDNET datasets provide ground-truth data and evaluation tools on their official homepage. Visual comparison of the results with a few state-of-the-art algorithms is done in the qualitative analysis. Fig. 6 shows the background model for a sequence from each category. Fig. 7 compares the results of the proposed algorithm with state-of-the-art algorithms, namely UMBS [36], Cp3 [22], MScale [27] and RMoG [42]. To reduce computational delay, each input image is resized to a lower dimension before being applied to the deep learning network. A single video frame from each category is used for the visual analysis. The first column displays the category name, the name of the scenario and the frame number; the input and ground truth of the selected image are shown in the second and third columns; and columns four to eight illustrate the results of the proposed algorithm and the above-mentioned state-of-the-art methods.
[Fig. 5: evaluation of accuracy, precision, specificity and F-measure for different learning rates on the AVSS 2007 dataset.]
Quantitative evaluation is done by comparing the performance metrics obtained by the proposed method with those of state-of-the-art algorithms, including UMBS [36], Cp3 [22], MScale [27], SC_SOBS [29], BMOG [31], GMM-Zivkovic [48], Euclidean distance [4] and Simplified SOBS [37]. Figures 8, 9 and 10 depict the performance comparison for the metrics Specificity, Percentage of Wrong Classifications (PWC) and Precision, respectively. Table 1 presents the overall average results obtained by the proposed algorithm over all the videos of each category. Equal importance has been given to all the metrics in the evaluation. For the specificity metric, the proposed algorithm ranks in the top two for three categories; in two categories, namely night video and dynamic background, it ranks first, with values of 0.9889 and 0.9980 respectively.
For the False Positive Rate (FPR), the proposed algorithm ranks in the top two for 3 of the 10 categories; in the night video category, it ranks first with a value of 0.01108. For the PWC metric, the night video and intermittent object motion categories rank first and second best among all categories. For Precision, the proposed algorithm ranks in the top three for 5 of the 10 categories.
From Table 1, it can be inferred that the overall average recall is 0.5449 and the overall average specificity is 0.9918. For the specificity, PWC and FPR metrics, the dynamic background category is the best among the 10 categories. For FPR, FNR and PWC, the minimum value is the best result. For the FNR metric, the baseline category has the minimum value and is rated the best among the categories. The baseline category also ranks best for the precision and F-measure metrics. Table 2 provides the overall comparison of the 10 categories of the CDNET 2014 dataset against the state-of-the-art algorithms.
The proposed method is found to be second best for the Specificity, FPR and PWC metrics. Universal Multimode Background Subtraction [36] ranks best for most of the metrics. However, this method, which uses multiple background models and color spaces for the foreground detection task, may not be suitable for real-time application because of the complexity of the multiple tasks that must be performed before the foreground pixels are extracted: background model generation, binary mask aggregation/fusion and binary mask pruning make the system more complex and time-consuming when compared with the proposed method.

[Figs. 6 and 7 show one sequence per category: Thermal (library #1156), Camera jitter (traffic #964), Shadow (peopleInShade #358), Night videos (boulevard #910), Baseline (highway #721), Dynamic background (fountain02 #1282), Bad weather (blizzard #3362), Intermittent object motion (streetLight #1498), Low framerate (tunnelExit_0_35fps #2081) and Turbulence (turbulence1 #2179).]

From Table 2,
based on the overall evaluation, it is seen that the proposed method, SC_SOBS [29] and Universal Multimode Background Subtraction [36] work better than the other approaches. Although the proposed algorithm performs generally well on certain indoor and outdoor categories, there remains room for improvement in a few challenging categories.
[Fig. 8: comparison chart of average specificity values for the CDNET 2014 dataset across the ten categories.]
In this subsection, the performance of the algorithm for the application of unattended object detection is presented. Several standard databases, namely the AVSS 2007 i-LIDS AB dataset, the PETS 2006 dataset and the CAVIAR dataset, are used for this evaluation task.
The i-LIDS AB abandoned object video sequence consists of 5474 frames, and the scene under consideration starts from frame number 252. The initial reference background
model is obtained by training on frame numbers 251 to 850.

[Fig. 9: comparison chart of the average Percentage of Wrong Classifications (PWC) for the CDNET 2014 dataset.]

[Fig. 10: comparison chart of average precision values for the CDNET 2014 dataset.]

On continuous training, the
subsequent outputs, obtained by training frames at periodic intervals in multiples of T frames, act as the updated background models. The value of T is set to 600 and, in order to reduce the computational cost, the incoming video sequence is subsampled in the time domain (frame rate reduction) by a factor of 3. The initial reference background is subtracted from the updated background models in order to detect any newly static object. If the object is detected in consecutive updated background models, it is termed an 'unattended object'. The bag remains stationary from around frame number 2012 to frame number 4805 in the original video, and the proposed method captures the unattended object. The results are shown in Fig. 11. As one can observe, transient events such as humans crossing the area under surveillance have no impact on the algorithm.
The PETS 2006 dataset S1-T1-C sequence contains 3020 frames; a man enters the scene and leaves his bag completely unattended from frame number 1915 until the end of the sequence. The interval T used for training the deep learning network is 600 and the incoming frames are subsampled by a factor of 3. Fig. 12b gives the initial background model, followed by the updated background models generated by the network. Fig. 12c shows the 3rd updated background model, obtained by training the background model along with frame numbers 1801 to 2400. Fig. 12d shows the binary mask, with white pixels indicating the presence of the unattended object, and a rectangular blob is drawn around the foreground unattended object, as seen in Fig. 12e. The same procedure is followed for the PETS 2006 dataset S5-T1-G sequence, and the results are shown in Fig. 13.

[Table 1: performance evaluation results of the proposed algorithm on the CDNET 2014 dataset. Table 2: performance comparison of various state-of-the-art algorithms on the CDNET 2014 dataset.]
The algorithm is also tested on the CAVIAR dataset, and the results for the LeftBox sequence are discussed below. The LeftBox sequence contains 863 frames in total. The value of T is set to 200, and the results for the LeftBox sequence are shown in Fig. 14.
[Fig. 11: results on the AVSS 2007 i-LIDS AB dataset — (c) 4th updated background model ($B_{M+4}$), (d) binary mask, (e) rectangular blob on the unattended foreground; (f) 6th updated background model ($B_{M+6}$), (g) binary mask, (h) rectangular blob on the unattended foreground.]

6 Conclusion

In this paper, a deep learning architecture is used to develop a background model for foreground extraction. Here, the network is trained with an input video sequence in order to
generate an output which is equivalent to a background model. Pretraining the deep network is
done with the RBM, and backpropagation is used for fine-tuning the weights. Finally, the algorithm is validated in the application of unattended object detection. Experiments on the benchmark dataset CDNET 2014 confirm that the algorithm performs reasonably well compared to the state-of-the-art methods across different scenarios and can be reliably used for real-world video surveillance applications.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
References
1. Babaee M, Dinh DT, Rigoll G (2018) A deep convolutional neural network for video sequence background subtraction. Pattern Recogn 76:635–649
2. Barnich O, Droogenbroeck MV (2011) ViBe: A Universal Background Subtraction Algorithm for Video
Sequences. IEEE Trans Image Process 20(6):1709–1724
3. Benedek C, Sziranyi T (2008) Bayesian foreground and shadow detection in uncertain frame rate surveil-
lance videos. IEEE Trans Image Process 17(4):608–621
4. Benezeth Y, Jodoin P-M, Emile B, Laurent H, Rosenberger C (2010) Comparative study of background
subtraction algorithms. J Electron Imaging 19(3)
5. Bhargava M, Chen C-C, Ryoo M, Aggarwal J (2009) Detection of object abandonment using temporal
logic. Mach Vis Appl 20(5):271–281
6. Charalambous C (1992) Conjugate gradient algorithm for efficient training of artificial neural networks. IEE
Proceedings G - Circuits, Devices and Systems 139(3):301–310
7. Culibrk D, Marques O, Socek D, Kalva H, Furht B (2007) Neural network approach to background modeling for video object segmentation. IEEE Trans Neural Netw 18(6):1614–1627
8. De Gregorio M, Giordano M (2017) Background estimation by weightless neural networks. Pattern Recogn
Lett 96. https://doi.org/10.1016/j.patrec.2017.05.029
9. Deng G, Guo K (2014) Self-adaptive background modeling research based on change detection and area
training. Proceedings of IEEE Workshop on Electronics, Computer and Applications, Ottawa, pp. 59-62
10. Droogenbroeck MV, Paquot O (2012) Background subtraction: experiments and improvements for ViBe.
In: Proceedings of IEEE Comput. Soc. Conf. Comput.Vis. Pattern Recognit. Workshops, pp. 32-37
11. Elgammal A, Duraiswami R, Harwood D, Davis LS (2002) Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proc IEEE 90(7):1151–1163
12. Guo H, Wang J, Lu H (2016) Multiple deep features learning for object retrieval in surveillance videos. IET
Comput Vis 10(4):268–271. https://doi.org/10.1049/iet-cvi.2015.0291
13. Heikkilä M, Pietikäinen M (2006) A texture-based method for modeling the background and detecting
moving objects. IEEE Trans Pattern Anal Mach Intell 28(4):657–662
14. Kamijo S, Matsushita Y, Ikeuchi K, Sakauchi M (2000) Traffic monitoring and accident detection at
intersections. IEEE Trans Intell Transp Syst 1(2):108–118
15. Kim K, Chalidabhongse T, Harwood D, Davis L (2004) Background modeling and subtraction by codebook
construction. In: Proceedings of IEEE International Conference on Image Processing, ICIP
16. Krahnstoever N, Tu P, Sebastian T, Perera A, Collins R (2006) Multiview detection and tracking of travelers
and luggage in mass transit environments. In: Proceedings of Int. Workshop Performance Eval. Tracking
Surveillance, pp. 67–74
17. Krizhevsky A, Hinton GE (2011) Using very deep autoencoders for content-based image retrieval. In:
Proceedings of 19th ESANN, Bruges, pp 27-29
18. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural
networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems -
Volume 1 (NIPS'12)
19. Laugraud B, Piérard S, Braham M, Broeck MVD (2015) Simple median-based method for stationary
background generation using background subtraction algorithms. In Proceedings of ICIAP
20. Li L, Huang W, Gu IY-H, Tian Q (2004) Statistical modeling of complex backgrounds for foreground object
detection. IEEE Trans Image Process 13(11):1459–1472
21. Li L, Leung MKH (2002) Integrating intensity and texture differences for robust change detection. IEEE
Trans Image Process 11(2):105–112
22. Liang D, Kaneko S, Hashimoto M, Iwata K, Zhao X (2015) Co-occurrence probability-based pixel pairs
background model for robust object detection in dynamic scenes. Pattern Recogn 48(4):1374–1390
23. Lin H, Liu T, Chuang J (2002) A probabilistic SVM approach for background scene initialization. In:
Proceedings of the International Conference on Image Processing, ICIP, pp. 893–8963
24. Liu Y, Nie L, Han L, Zhang L, Rosenblum DS (2015) Action2Activity: recognizing complex activities from
sensor data. In: Proceedings of the 24th international conference on artificial intelligence (IJCAI'15). AAAI
Press, pp 1617–1623
25. Liu Y, Nie L, Liu L, Rosenblum DS (2016) From action to activity: sensor-based activity recognition.
Neurocomputing 181:108–115
26. Liu T, Stathaki T (2017) Enhanced pedestrian detection using deep learning based semantic image
segmentation. In: Proceedings of 22nd International Conference on Digital Signal Processing (DSP),
London, United Kingdom, 2017, pp. 1-5
27. Lu X (2014) A multiscale spatio-temporal background model for motion detection. In: Proceedings of IEEE Int. Conference on Image Processing (ICIP)
28. Maddalena L, Petrosino A (2008) A self-organizing approach to background subtraction for visual surveillance applications. IEEE Trans Image Process 17(7):1729–1736
29. Maddalena L, Petrosino A (2012) The SOBS algorithm: what are the limits? In: Proceedings of Computer Vision and Pattern Recognition Workshops
30. Marsden M, McGuinness K, Little S, O'Connor NE (2017) ResnetCrowd: A residual deep learning
architecture for crowd counting, violent behaviour detection and crowd density level classification. In:
Proceedings of IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS),
Lecce, Italy, pp. 1-7
31. Martins I, Carvalho P, Corte-Real L, Alba-Castro JL (2017) BMOG: Boosted Gaussian Mixture Model with
Controlled Complexity. Pattern Recognition and Image Analysis. IbPRIA 2017. LNCS, Springer, pp 50-57
32. Matsuyama T, Ohya T, Habe H (2000) Background subtraction for non-stationary scenes. In: Proceedings of
Asian Conference on Computer Vision, pp. 662–667
33. Miron A, Badii A (2015) Change detection based on graph cuts. In: Proceedings of International Conference
on Systems, Signals and Image Processing (IWSSIP), London
34. Muhammad K, Ahmad J, Mehmood I, Rho S, Baik SW (2018) Convolutional neural networks based fire detection in surveillance videos. IEEE Access 6:18174–18183. https://doi.org/10.1109/ACCESS.2018.2812835
35. Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture
classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
36. Sajid H, Cheung SCS (2017) Universal Multimode Background Subtraction. IEEE Trans Image Process
26(7):3249–3260
37. Sehairi K, Chouireb F, Meunier J (2017) Comparative study of motion detection methods for video
surveillance systems. J Electron Imaging 26(2)
38. Stauffer C, Grimson E (1999) Adaptive background mixture models for realtime tracking. Proceedings of
IEEE Int Conf Comput Vis Pattern Recognit 2:246–252
39. Tang Z, Miao Z, Wan Y (2007) Background Subtraction Using Running Gaussian Average and Frame
Difference. In: Proceedings of Entertainment Computing – ICEC, pp 411-414
40. Tian Y, Wang Y, Hu Z, Huang T (2013) Selective Eigen background for background modeling and
subtraction in crowded scenes. IEEE Trans Circuits Syst Video Technol 23(11):1849–1864
41. Toyama K, Krumm J, Brumitt B, Meyers B (1999) Wallflower: principles and practice of background
maintenance. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, 1
Kerkyra, Greece, pp. 255–261
42. Varadarajan S, Miller P, Zhou H (2013) Spatial mixture of gaussians for dynamic background modelling. In:
Proceedings of IEEE International Conference on Advanced Video and Signal Based Surveillance
43. Wang Y, Yao H, Zhao S (2015) Auto-encoder based dimensionality reduction. Neurocomputing 184:232–242. https://doi.org/10.1016/j.neucom.2015.08.104
44. Wren CR, Azarbayejani A, Darrell T, Pentland AP (1997) Pfinder: Real-time tracking of the human body.
IEEE Trans Pattern Anal Mach Intell 19(7):780–785
45. Yi S, Li H, Wang X (2016) Pedestrian Behavior Modeling From Stationary Crowds With Applications to
Intelligent Surveillance. IEEE Trans Image Process 25(9):4354–4368
46. Zhang S, Yao H, Liu S (2008) Dynamic Background Subtraction Based on Local Dependency Histogram.
In: Proceedings of Eighth International Workshop on Visual Surveillance -VS2008, Marseille
47. Zhao Z, Zhang X, Fang Y (2015) Stacked Multilayer Self-Organizing Map for Background Modeling. IEEE
Trans Image Process 24(9):2841–2850
48. Zivkovic Z (2004) Improved adaptive Gaussian mixture model for background subtraction. In: Proceedings of International Conference on Pattern Recognition (ICPR) 2:28–31
Jeffin Gracewell received his B.E. degree in Electronics and Communication Engineering from Karunya University, Coimbatore, in 2010, and his M.E. degree in Communication Systems from Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai, in 2012. He is currently a research scholar at the Madras Institute of Technology, Chennai. His current research interests are in deep learning and video and image processing.
Mala John received her M.Sc and M.Tech degrees from the Indian Institute of Technology (IIT) Madras and IIT Delhi respectively, and her Ph.D from Anna University. She has been a faculty member of the Department of Electronics Engineering, Madras Institute of Technology Campus of Anna University, since 1992, where she is presently a Professor. Her interests include communication, signal processing and image processing.