
Signal Processing 89 (2009) 1723–1738

Contents lists available at ScienceDirect

Signal Processing

journal homepage: www.elsevier.com/locate/sigpro
doi:10.1016/j.sigpro.2009.03.016

Detecting abnormal human behaviour using multiple cameras


Panagiota Antonakaki*, Dimitrios Kosmopoulos, Stavros J. Perantonis
Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", Athens, Greece

Article history:
Received 6 August 2008
Received in revised form 9 March 2009
Accepted 11 March 2009
Available online 5 April 2009

Keywords:
Behaviour understanding
Trajectory
Hidden Markov Model
Support vector machine
Homography

Abstract

In this paper a bottom-up approach for human behaviour understanding is presented, using a multi-camera system. The proposed methodology, given a training set of normal data only, classifies behaviour as normal or abnormal, using two different criteria of human behaviour abnormality (short-term behaviour and trajectory of a person). Within this system a one-class support vector machine decides short-term behaviour abnormality, while we propose a methodology that lets a continuous Hidden Markov Model function as a one-class classifier for trajectories. Furthermore, an approximation algorithm, referring to the Forward Backward procedure of the continuous Hidden Markov Model, is proposed to overcome numerical stability problems in the calculation of the emission probability for very long observations. It is also shown that multiple cameras, through homography estimation, provide a more precise position of the person, leading to more robust system performance. Experiments in an indoor environment without uniform background demonstrate the good performance of the system.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Motion analysis in video and particularly human behaviour understanding has attracted many researchers [24], mainly because of its fundamental applications, which include video indexing, virtual reality, human–computer interaction and smart surveillance. Smart surveillance in itself is one of the most challenging problems in computer vision. Its goal is to automatically model and identify human behaviours, calling for human attention only when a suspicious behaviour is detected. With the increasing number of cameras in many public areas, the related research becomes more appealing and is offered more application possibilities.

This work deals with the classification of behaviours as normal or abnormal. Based on the remark that abnormal behaviour is considered to be rather infrequent (and thus abnormal), we choose to model normal behaviour and define as abnormal any behaviour deviating from that normality model. Our methodology applies two classification criteria:

(1) short-term behaviour;
(2) trajectory.

The short-term behaviour refers to the type of behaviour that can be localized in a spatio-temporal sense, i.e. is brief and within restricted space. Examples of such behaviours are walking, standing still, running, moving abruptly, etc.

In the related literature the aforementioned classification criteria are mostly treated separately and, furthermore, few works concentrate on learning only normal behaviours. The methodology provided herein provides the discrimination of anomaly due to abnormal short-term motion, as happens in the case of abrupt motion, as well as anomaly due to long-term motion, as in the case of an abnormal trajectory.

* Corresponding author. E-mail addresses: ganton@iit.demokritos.gr (P. Antonakaki), dkosmo@iit.demokritos.gr (D. Kosmopoulos), sper@iit.demokritos.gr (S.J. Perantonis).


Recently, several researchers have dealt with the problem of anomaly detection, which is the process of behaviour classification as normal or abnormal. A variety of methods, ranging from fully supervised [9,10] to semi-supervised [36] and unsupervised systems [21,22,18], have been proposed in existing literature, which we further review in Section 2. It should be noted, however, that most of the existing approaches do not use multi-camera information, except for [38], where multiple video streams are combined via a coupled Hidden Markov Model.

Our methodology contributes to current research in several ways:

- The presented approach reflects two different criteria of labelling an observed behaviour as normal or abnormal, since the final abnormality decision depends on the output of two different classifiers with independent inputs: short-term behaviour information and trajectory information.
- The behaviours are classified according to the target object's position on the ground plane, based on homography (see Section 4), which provides higher accuracy compared to pure image-based techniques.¹
- We introduce a continuous Hidden Markov Model (cHMM) as a one-class classifier, using the notion of length-normalized log-probability (see Section 6.1).
- A novel algorithm implementing a Forward Backward procedure for the emission probability estimation in HMMs is proposed, handling numerical instability resulting from long sequences (see Section 6.2).

The rest of the paper is organized as follows. In Section 2 recent literature is reviewed, hinting at the problems the proposed method tackles. Section 3 provides an overview of the proposed architecture. In Section 4 we explain briefly how homography is used to obtain information on the position of target objects on the ground plane. In Section 5 short-term behaviours are defined in terms of a set of extracted features. Section 5.2 describes in detail the classification process which is based on short-term behaviours. In Section 6, on the other hand, the classification of trajectories is presented by elaborating on how we have used a continuous Hidden Markov Model as a one-class classifier (Section 6.1). As an added value, Section 6.2 contains the description and foundation of a modified algorithm for the Forward Backward procedure of probability estimation, tackling long sequences on contemporary computers. Finally, in Section 7 we provide the experimental results and Section 8 concludes this paper through a brief discussion on the lessons learned.

¹ An early version of this work has been presented in [20].

2. Related work

A typical surveillance system is divided into two layers, which include low level and high level processes, respectively, as depicted in Fig. 1.

Fig. 1. The main framework for video surveillance systems.

The low level contains such methods as motion detection, object classification and tracking. In motion detection, research is focused on either static or adaptive background subtraction or temporal differencing algorithms, aiming to isolate the foreground pixels that participate in any kind of motion observed in a given scene. Object classification is the process of classifying detected objects into such classes as humans or vehicles, appearing in a given scene. Following motion detection and object classification, detected objects are located in the course of time and their trajectories are extracted via tracking.

High level processes use motion information from the low level in order to finally identify the type or nature of a moving object's activity. Motion-based techniques are mostly used for short-term activity classification (e.g. walking, running, fighting), and do not take into account object trajectories. These techniques actually calculate features of the motion itself and perform recognition of behaviours based on these features' values. Such methods have been presented by Bobick et al. in [5], where motion energy images (MEIs) and motion history images (MHIs) are used to classify aerobic-type exercises. Taking this work another step further, Weinland et al. in [34] focus on the extraction of motion descriptors analogous to MHIs, called motion history volumes, from multiple cameras. Then, these history volumes are classified into primitive actions. Efros et al. in [11] compute the optical flow [14] of a given object to recognize short-term behaviours through nearest-neighbour classification.

Several methods that take into account the object's trajectory for behaviour classification use the centroid of the target object [1,19,27,15] or points of interest in a given image [4]. These methods, however, fail to take into account short-term actions, for example the case where a man threateningly moves his hands. Most of the existing methods also face problems like view dependency and occlusion when they extract trajectories from one camera.

HMMs and their variations have been widely applied to trajectory classification, e.g. [7,17,2,32], due to their unsupervised training, their simplicity and computational efficiency, and mainly because motion can be viewed as a short-term stationary signal. Abstract Hidden Markov Models are used by Nguyen et al. in [26] to deal with noise and duration variation, while Wang et al. in [33] use conditional random fields for behaviour recognition in order to be able to model context dependence in behaviours. In our approach we use a continuous HMM to model the trajectory, using a methodology that allows the model to be used as a one-class classifier.

Our presented approach focuses on the anomaly detection aspect of behaviour understanding, which differentiates it from the aforementioned methods. However, recent research has provided several anomaly-detection-focused approaches that we briefly review here. These approaches can be classified based on whether they are supervised, semi-supervised or unsupervised.

In [9,10] the authors use supervised approaches that need the classes of both normal and abnormal behaviour to have an adequately large number of labelled instances, provided as a priori information. In our method, on the other hand, the training set consists only of normal instances of data. The semi-supervised method of [36], which only uses normal data, has a different approach in that it creates a set of marginally normal instances as abnormal, to constitute an estimation of the abnormal class. In our work, we have used the derived feature of length-normalized log-probability to define the normal class, without attempting to generate abnormal instances at all. On the other hand, we also take into account motion-based features used in a one-class SVM to detect further abnormalities.

A set of unsupervised methods in existing literature use large databases [37,6] containing all the observed normal behaviour patterns, matching any new instances against the instances represented in the database. In our work, we have a single composite model (including HMM and SVM classification) for all normal instances, thus avoiding the need for database storage and look-up. Jiang et al. in [18] start by representing normal trajectories by a single HMM model per trajectory, clustering and retraining these HMMs until a given condition holds. Other than the fact that, in the work presented herein, we also cover the case of short-term behaviours besides trajectory, we model the full set of normal trajectories into a single HMM from the beginning; therefore, fewer calculations are required. Lee et al. in [21] use n-cut clustering over motion energy images to determine outliers, which are then judged as abnormal. This approach is different from ours in that it requires repetition of the n-cut clustering when a new instance is to be judged. Another approach is found in [22], where a multi-layer finite state machine representation is used to model activities. According to [22], an abnormal activity is judged by the number of times a valid transition fails to be performed when matching the activity to the model state machine. Our approach uses probabilistic tools such as the HMM, instead of finite state machines, to model uncertainty within the modelling of normal activities. In [35], a single feature vector represents position, motion and shape information, which is used in a clustering process to detect abnormality. In our approach we extract separate information for each classifier, attempting to model more precisely two aspects of motion. This kind of modularity allows switching between using one or both classifiers for the detection of either abnormal short-term behaviours, abnormal trajectory, or both. Furthermore, one can use information from each classifier to determine the type of abnormality detected.

In behaviour understanding, only few works employ homography estimation. Park et al. in [28] have used homography to extract object features and, using spatio-temporal relationships between people and vehicles, extract semantic information from interactions calculated from relative positions. Ribeiro et al. in [30] have estimated homography and enabled an orthographic view of the ground plane, which eliminates the perspective distortion originating from a single camera. Then, they have calculated features in order to classify the data into four activities (active, inactive, walking, running).

In existing literature two basic assumptions are usually made in order to extract features. The first is that the targets move almost vertically to the camera z-axis, or within a range that is small compared to the distance from the camera. This assumption ensures that the size variation of moving objects is relatively small. The second assumption is that humans are planar objects, so that homography-based image rectification can be possible. However, even though this latter assumption may be true when the cameras are close to being vertical to the ground plane, as in the case of cameras viewing from high ceilings, it does not stand in general. In our method we get over these limitations, as can be deduced from the section on homography estimation (Section 4).

3. Proposed methodology

The proposed methodology is based on the fusion of data that we collect from several cameras with overlapping fields of view. We perform classification using two different one-class classifiers, a support vector machine (SVM) and a continuous Hidden Markov Model, with each classifier having different feature vectors as input. The final decision on the behaviour is made by taking into account outputs from both classifiers.

The system architecture is presented in Fig. 2. The low level addresses the problem of motion detection and blob analysis, providing the upper level with two different feature vectors per instance. We note that an object's blob is defined to be the set of foreground pixels that belong to that object. Background subtraction is applied for motion detection and a bounding box is extracted. The blobs apparent within the viewing area of each camera are used to extract the objects' principal axes. These principal axes, in combination with the corresponding homography calculations, are used to locate each object, i.e. determine the points where the target object touches the ground plane. From the coordinates of the latter points we calculate the trajectories of the objects.

Additional object information, namely the object's centroid, blob size and shape, is made available during the preprocessing step. Furthermore, a histogram is extracted from the moving object's shape, depicting the moving object's blob projection on the y-axis. The overall set of elementary features is used for the creation of the final two feature vectors per instance: one vector for each classifier.
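The following sketch illustrates this low-level stage in isolation. It uses OpenCV's Gaussian-mixture background subtractor as a stand-in for the adaptive background model adopted in Section 4; the video path, history length and minimum blob area are hypothetical placeholders rather than the authors' settings.

```python
import cv2

# Illustrative sketch of the low-level step: adaptive Gaussian-mixture background
# subtraction, blob extraction and bounding boxes (per camera).
backsub = cv2.createBackgroundSubtractorMOG2(history=250, detectShadows=False)
cap = cv2.VideoCapture("camera1.avi")  # hypothetical input stream

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = backsub.apply(frame)                       # foreground mask (the blob pixels)
    mask = cv2.medianBlur(mask, 5)                    # light noise suppression
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 500:                  # ignore tiny blobs (noise)
            continue
        x, y, w, h = cv2.boundingRect(c)              # bounding box of the moving object
        blob = mask[y:y + h, x:x + w]
        # number of blob pixels per y coordinate: the blob projection on the y-axis
        y_hist = (blob > 0).sum(axis=1)
cap.release()
```

The per-y histogram computed at the end is the blob projection used later for the entropy and standard deviation features of Section 5.1.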

Fig. 2. System overview.

The two classifiers used at this point are able to decide about the normality of the observed behaviour from two different views:

- The first classifier (a one-class support vector machine (SVM)) decides whether the short-term behaviour is normal or not, supplied with feature vectors computed by taking into account both the background subtraction and the ground plane information. The features provided as input describe the short-term motion information, which we argue constitutes the short-term behaviour information.
- The second classifier is a continuous Hidden Markov Model (cHMM), also used as a one-class classifier, which is supplied with the trajectory of every instance-object. This classifier can decide whether a given trajectory follows the model of normal trajectories.

Our method has been implemented to work in two modes: offline and real-time. In the offline mode, the decision concerns the classification of a time window of arbitrary length, which can be used, for example, for the characterization of video shots for video retrieval purposes. In its real-time aspect, the system makes a decision in every frame whether to issue alerts as the events happen. This decision is made by taking into consideration a time window of relatively small duration concerning recent camera information (images). This aspect can be used for security purposes, aiding a human supervisor.

In the recognition step, if either classifier gives an "abnormal" characterization as an output, the system characterizes the scene as abnormal. This means that we take as output the logical "or" of the outputs, given that a value of true indicates abnormality.

4. Preprocessing

The proposed methodology uses a preprocessing step that includes background subtraction for moving target segmentation and then target localization using homography information. For the background subtraction, we adopted the adaptive Gaussian mixture background model for dynamic background modelling [39]. Similar or better methods could have been used for the same purpose, without changing our overall approach, and the reader is referred to the related literature for further information.

For target localization we have employed a homography-based approach. The planar homographies are geometric entities whose role is to provide associations between points on different planes, which are the ground and the camera planes in our case. In our indoor environment the target moves on the ground plane, so mapping between planes is possible. In the following we explain briefly how the approach works.

The scene viewed by a camera comprises a predominant plane, the ground. We assume that a homogeneous coordinate system is attached to the ground plane, so that a point on the plane is expressed as P_p = (x_p1, x_p2, x_p3)^T. If this point is visible to the camera, which is a matter of proper camera configuration, the homogeneous coordinates of this point on the camera plane are given by P_c = (x_c1, x_c2, x_c3)^T. The homography H is a 3 x 3 matrix which relates P_p and P_c as follows:

  P_p = H · P_c, i.e.
  [x_p1]   [h11 h12 h13]   [x_c1]
  [x_p2] = [h21 h22 h23] · [x_c2]                                                          (1)
  [x_p3]   [h31 h32 h33]   [x_c3]

Let x_c = (x_c, y_c) and x_p = (x_p, y_p) be the inhomogeneous coordinates of a pair of matching points on the camera plane (pixel coordinates) and the ground plane, correspondingly. Then

  x_p = x_p1 / x_p3 = (h11·x_c + h12·y_c + h13) / (h31·x_c + h32·y_c + h33)                (2)

  y_p = x_p2 / x_p3 = (h21·x_c + h22·y_c + h23) / (h31·x_c + h32·y_c + h33)                (3)

Each point correspondence gives an equation, and four points are sufficient for the calculation of H up to a multiplicative factor, if no triplet of the used points contains collinear points. The calculation of H is a procedure done once, offline, and in practice many points are used to compensate for errors.

Fig. 3. View from three cameras and extraction of the principal axis projection on the ground plane from two of the cameras. In (c) the projection is not visible; however, the corresponding accumulator is still created in (d). In (d) three accumulators are visible, two of them very close to each other.
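A minimal sketch of this mapping is given below, using OpenCV to estimate H from point correspondences and then applying Eqs. (2)-(3). It is an illustration of the geometry under hypothetical calibration data, not the authors' calibration code.

```python
import numpy as np
import cv2

# H is estimated once offline from ground-plane/image correspondences (four or
# more points, no three collinear). The coordinate values below are hypothetical.
image_pts = np.array([[102, 310], [540, 295], [580, 120], [130, 105]], dtype=np.float32)
ground_pts = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 6.0], [0.0, 6.0]], dtype=np.float32)

H, _ = cv2.findHomography(image_pts, ground_pts)      # 3x3 homography, up to scale

def to_ground(xc, yc, H):
    """Apply Eqs. (2)-(3): homogeneous mapping followed by dehomogenisation."""
    p = H @ np.array([xc, yc, 1.0])
    return p[0] / p[2], p[1] / p[2]

# e.g. map the lower end of a person's principal axis to a ground-plane position
print(to_ground(320.0, 250.0, H))
```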

The positioning of each target is done similarly to [16]. A background subtraction algorithm extracts the silhouettes of the targets, which move on the ground plane. From each silhouette we extract the vertical principal axis and we project it on the ground plane by substituting (x_c, y_c, 1)^T and (x_p, y_p, 1)^T in (1). The projection from each camera casts a "line" on the ground plane, as depicted in Fig. 3. The maxima of those projected lines indicate the positions of the monitored targets, i.e. where the vertical principal axis touches the ground. The method is not strongly affected when the target pose is not vertical, because a vertical principal axis is still extracted from the silhouettes. In such cases the indicated position is not the exact position of the feet touching the ground, but the one indicated by the vertical axes, which may be a bit displaced. However, also in such cases the method still gives good position estimates.

5. Short-term behaviours

Our first source of information for evaluating behaviour is the so-called short-term behaviour. Our methodology represents short-term behaviour with a feature vector that consists of motion-based features. In the recognition step a one-class support vector machine is used, trained only with normal instances.

5.1. Feature calculation

In motion representation and analysis, our methodology uses information obtained by preprocessing, namely the object's bounding box, the object's blob and sequential positions. In Fig. 4, all preprocessing-extracted information is illustrated.

Fig. 4. (a)–(c) Frames captured from each camera with bounding boxes. (d)–(f) Background subtraction masks and blob indication per camera. (g)–(i) Histogram of the object's blob for each camera. (j) Trajectory formed by the calculated ground points.

Elaborating, from the background subtraction process we extract the position of the object's centroid inside the bounding box, the bounding box's width and height and the object's blob. Figs. 4a–c show the captured frames from each camera with the corresponding bounding boxes. Figs. 4d–f show the background subtraction masks, from which the blob is extracted.

The blob histogram is calculated based on the blob information. The histogram of the blob indicates the number of pixels that belong to the blob for every y coordinate. Figs. 4g–i show the histograms of the given blob.

From homography estimation we calculate the object's ground position and thus the trajectory, which is expressed as a sequence of (x, y) vectors on the ground plane. Fig. 4j illustrates the object's trajectory in the scene, calculated from all views.

The short-term activity is represented by a seven-dimensional feature vector, as follows:

  f = ( v(t), v̄_cT(t), R_T(t), F(t), ΔF(t), max(ΔH(t)), max(ΔSD(t)) )                       (4)

The features' calculation is presented in detail in Table 1, with the features being separated into four categories according to what type of information they depend on. The first two features, speed and algebraic mean speed, are computationally inexpensive and time-efficient, calculated only from trajectory data. The algebraic mean blob difference is also time-efficient, calculated only from the background subtraction data on the object's bounding box. Mean optical flow and mean optical flow percentage difference are derived from simple operations on optical flow. For these two features we use data from both the object's bounding box and the full images of the video sequences. Optical flow is computationally expensive, but it is robust and discriminative [14]. The last two features are computationally inexpensive, and they are extracted from the blob histogram. We have said that the histogram reflects the number of pixels that constitute the foreground object per y coordinate. But, if we normalize the histogram by the total number of the histogram's pixels, we obtain a probability distribution function (pdf), p_c(y_j), that represents the probability of an object's pixel lying at a given coordinate y_j in the bounding box.

Table 1
Features calculated and used for classification.

1. Speed (trajectory data):
   v(t) = sqrt( (x(t) - x(t-1))^2 + (y(t) - y(t-1))^2 )
2. Algebraic mean speed (trajectory data):
   v̄_cT(t) = sqrt( ( (1/T) Σ_{i=t-T+1..t} v_x(i) )^2 + ( (1/T) Σ_{i=t-T+1..t} v_y(i) )^2 )
3. Algebraic mean bounding box difference (background subtraction):
   R(t) = ( Σ_{i=1..numCam} R_i(t)_T' ) / numCam, where
   R_c(t)_T' = (1/T') Σ_{j=t-T'+1..t} ( w_c(j)·h_c(j) - w_c(j-1)·h_c(j-1) ) / ( w_c(j-1)·h_c(j-1) )
4. Mean optical flow (optical flow):
   F(t) = ( Σ_{i=1..numCam} F_i ) / numCam, where F_i is the normalized optical flow from camera i
5. Mean optical flow difference (optical flow):
   ΔF(t) = ( F(t) - F(t-1) ) / F(t-1)
6. Max entropy difference (blob histogram):
   max(ΔH(t)) = max_i ( H_i(t) - H_i(t-1) ) / H_i(t-1), with 1 ≤ i ≤ numCam, where
   H_c(t) = - Σ_{j=1..N} p_c(y_j) · log p_c(y_j), with p_c(y_j) the histogram value at location y_j for camera c and N the bounding box's height
7. Max standard deviation difference (blob histogram):
   max(ΔSD(t)) = max_i ( std_i(p_i(y)_t) - std_i(p_i(y)_{t-1}) ) / std_i(p_i(y)_{t-1}), with 1 ≤ i ≤ numCam

Taking into account that features are extracted for every single video frame and constitute the frame's feature vector, we elaborate on the calculations presented in Table 1.

(1) v(t) is the Euclidean norm (over the x- and y-axes of the ground plane) of the instantaneous object speed, calculated from the object's position in the current and the previous frame.
(2) Algebraic mean speed, v̄_cT(t), is the algebraic mean value of an object's speed within a time window that consists of the T last frames, including the frame at t_0. This value is calculated based on the algebraic sum of the x and y coordinates of the speed vector, which is more robust against noise than v(t).
(3) On the same grounds, the calculation of the mean blob difference, R(t), is based on the algebraic sum of the bounding boxes' area change within a shifting frame window T' comprising the last e.g. 5–10 frames; w_c(j), h_c(j) represent the width and the height of the blob for camera c at t = j.
(4) Optical flow, F_i, is first calculated on every frame and for each camera i, but only for the object's edges inside the bounding box. Then, the optical flow value is normalized by the number of pixels that participate in the calculation (the pixels of the edges) and by the bounding box area. Then we compute the mean optical flow value over all cameras.
(5) Mean optical flow difference is the difference between the current and the previous value of the mean optical flow, divided by the previous value. This gives the percentage of optical flow change. We calculate the feature for each camera and we keep the maximum value over all cameras.
(6) Max entropy histogram difference, max(ΔH(t)), is based on the Shannon entropy, H(t), which is a measure of the uncertainty associated with a random variable. This means that the more a given pdf resembles a uniform pdf, the greater the entropy value. The main idea is that when an abrupt motion occurs, the differences in entropy values will be significantly greater than those of a normal slow motion.
(7) Max standard deviation difference, max(ΔSD(t)), is also calculated from the object blob's histogram. The standard deviation of the histogram (std) is a measure of the spread of its values. The change of a histogram's standard deviation value from one point of view, ΔSD(t), can give us important information about the motion of the object, in that it indicates within-bounding-box movement. We calculate the feature for each camera and we keep the maximum value over all cameras as the final feature value.
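A brief sketch of how a few of these per-frame features could be computed is given below, assuming the ground positions and per-camera blob y-histograms are already available from preprocessing; the optical-flow features are omitted and the function names are illustrative rather than the authors' implementation.

```python
import numpy as np

def speed(pos_t, pos_prev):
    """Feature 1: Euclidean norm of the instantaneous ground-plane speed."""
    return float(np.hypot(pos_t[0] - pos_prev[0], pos_t[1] - pos_prev[1]))

def algebraic_mean_speed(velocities):
    """Feature 2: norm of the mean velocity vector over the last T frames."""
    v = np.asarray(velocities)          # shape (T, 2): per-frame (vx, vy)
    return float(np.hypot(v[:, 0].mean(), v[:, 1].mean()))

def entropy(hist):
    """Shannon entropy of the blob's y-histogram, normalised to a pdf."""
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def max_entropy_difference(hists_t, hists_prev):
    """Feature 6: maximum relative entropy change over all cameras."""
    return max((entropy(h1) - entropy(h0)) / entropy(h0)
               for h1, h0 in zip(hists_t, hists_prev))

def max_std_difference(hists_t, hists_prev):
    """Feature 7: maximum relative change of the histogram standard deviation."""
    return max((np.std(h1 / h1.sum()) - np.std(h0 / h0.sum())) / np.std(h0 / h0.sum())
               for h1, h0 in zip(hists_t, hists_prev))
```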
5.2. Short-term behaviours classification

The decision whether a short-term behaviour is normal or not can be taken by employing a one-class SVM, as proposed by Scholkopf [31]. The selected model does not require a labelled training set to determine the decision surface. The one-class SVM is similar to the standard SVM in that it uses kernel functions to perform implicit mappings and dot products, and in that the solution depends only on the support vectors. Such an approach can be justified by the fact that normal behaviours are easier to observe, and thus whatever deviates from them can be defined as abnormal. Thus we do not need to model abnormal behaviours explicitly and we do not need labelling of data, as long as our assumption on the sparsity of abnormality stands. This is what makes this approach unsupervised.

The one-class SVM builds a boundary that separates the training data class from the rest of the feature space. For more details the reader is referred to [23].
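The sketch below shows this training and recognition step using scikit-learn's OneClassSVM, which wraps LibSVM (the library the authors report using in Section 7); the nu and gamma values, as well as the input file, are hypothetical and would need tuning.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# One-class SVM with an RBF kernel trained on normal feature vectors only.
X_normal = np.load("normal_feature_vectors.npy")     # hypothetical (n_frames, 7) array

clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_normal)

# At recognition time, +1 means the frame's short-term behaviour lies inside the
# learned boundary (normal), -1 means it is flagged as abnormal.
x_new = X_normal[:1]
is_abnormal = clf.predict(x_new)[0] == -1
```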

6. Trajectories classification

Our second information source for evaluating behaviour is the trajectory. In a museum scenario, the trajectory of a person entering from the designated entrance, then approaching the cashier to buy a ticket, then browsing into the room and looking around, and finally exiting from the designated exit should be characterized as normal. Trajectories of persons entering from the exit without first visiting the ticket stand, or going in the wrong direction, should be labelled as abnormal.

Some works in the literature use rules to define the restricted areas and thereby distinguish normal from abnormal trajectories. We apply a one-class learning strategy, as in the short-term behaviours, by training our time series classifier using only the normal trajectories. Each sample is a position vector (x, y) of the target in the global coordinate system in each frame (calculated as described in Section 4). The extracted normal trajectories (sequences of (x, y) vectors) are used for training a continuous Hidden Markov Model [29] and constitute the model observations.

For convenience, we use the compact notation λ = (A, B, π) to indicate the complete parameter set of the model, where:

- A is the state transition probability distribution matrix;
- B is the observation probability density function per state;
- π is the initial state probability distribution.

The original Baum Welch algorithm is used for the training step, while for the recognition step we propose a modified Forward Backward procedure (see Section 6.2). The methodology presented here proposes solutions to two problems:

- the use of the Hidden Markov Model as a one-class classifier;
- the efficient likelihood calculation in the Forward Backward procedure for long sequences, taking into account current machine limitations.

6.1. One-class continuous Hidden Markov Model

The problem of discriminating between normal and abnormal trajectories concerns the definition of a measure that would give sufficiently different values for the two classes. The variable length of the trajectories poses additional difficulties. Long, normal trajectories would have cHMM generation probability values comparable to the small values of short, abnormal trajectories, so the observation's length factor needs to be removed.

If we can prove that for a normal observation sequence (O_normal) and for an abnormal one (O_abnormal) the following condition must hold:

  log P(O_abnormal | λ) / length(O_abnormal)  <<  log P(O_normal | λ) / length(O_normal)    (5)

then we will be able to use it as a classification measure. In (5) the logarithms help us sharpen the differences between values below 1, and the division by the sequence's length normalizes the computed measure.

The anomaly detection problem begins with the definition of "what can be labelled as normal". We may define as normal the trajectories for which, between two time instances t and t + 1, the probabilities of the corresponding observations are proportional to each other, and their fraction can be viewed as a random variable D. Taking into consideration that O_t is the observation sequence from time = 0 until time = t, the random variable D depends only on the model λ = (A, B, π) [29].

Thus, given the model and two consecutive observations O_t, O_{t+1}, there is a variable D, with an expected value d = E[D], such that

  P(O_{t+1}) ≈ d · P(O_t)  =>  P(O_{t+1}) / P(O_t) ≈ d                                      (6)

with 0 < t + 1 ≤ T. This assumption is derived from the facts that:

- D depends only on the model;
- normal trajectories have a high probability of being generated by the model;
- the expected value represents the average amount one "expects" as the outcome of the random trial when identical odds are repeated many times.

We can also see that 0 < d ≤ 1, because P(O_{t+1}) ≤ P(O_t).

According to (6), we can expand the calculation as follows:

  P(O_{t+1}) ≈ d · P(O_t)  =>  P(O_{t+1}) ≈ d^t · P(O_1)
  =>  log P(O_{t+1}) ≈ t · log d + log P(O_1)
  =>  log P(O_{t+1}) / (t + 1) ≈ (1 / (t + 1)) · ( t · log d + log P(O_1) )

which results, after replacing t with t - 1, in the following:

  log P(O_t) / t ≈ (1 / t) · ( (t - 1) · log d + log P(O_1) ),   for all t: 0 < t ≤ T       (7)

As abnormal, we define the trajectories for which the probability of their corresponding D value will be very low. For those trajectories, we assume that there exists a transition from time k to time k + 1 where, due to either the transition probability a_ij or the observation probability b_j(O), the D value probability (i.e. the probability of having such a D value for the given model) decreases significantly, because the value of D_{k+1} for the given time point k + 1 becomes lower than expected:

  there exists k:  P(O_{k+1}) / P(O_k) = D_{k+1},   p(D) << 1,   D_{k+1} << d               (8)

Before that k, the trajectory can be characterized as normal, i.e.

  for all t: t < k,   P(O_{t+1}) / P(O_t) = d                                               (9)

From the above we have

  log P(O_{k+1}) = log D_{k+1} + log( d^{k-1} · P(O_1) )
  =>  log P(O_{k+1}) / (k + 1) = (1 / (k + 1)) · ( log D_{k+1} + (k - 1) · log d + log P(O_1) )   (10)

For the discrimination problem (see Eq. (5)), the following must hold:

  log P(O_{k+1}) / (k + 1)  <<  log P(O_k) / k                                              (11)

By letting t = k in (7) and using (10) in (11), we have

  (1 / (k + 1)) · ( log D_{k+1} + (k - 1) · log d + log P(O_1) )
      <<  (1 / k) · ( (k - 1) · log d + log P(O_1) )                                        (12)

Because k represents time, k > 0. On the other hand, D_{k+1} and d represent values of the probabilities' ratio, so 0 < D_{k+1}, d < 1. According to that remark, we can assume that for sufficiently long sequences, e.g. for k ≥ 10, 1/k ≈ 1/(k + 1) in (12). Thus, Eq. (12) becomes

  log D_{k+1} + (k - 1) · log d + log P(O_1)  <<  (k - 1) · log d + log P(O_1)
  =>  log D << 0                                                                            (13)

Since D_{k+1} << d, D_{k+1} is a sufficiently small value that gives log D_{k+1} << 0. Given that (13) is valid, the initial assumption, Eq. (5), is true. Therefore, (5) can be used as a criterion for abnormal trajectory detection.
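A compact sketch of this one-class use of a cHMM is given below: the model is trained on normal (x, y) trajectories only and new trajectories are scored with the length-normalized log-likelihood of Eq. (5). The hmmlearn library is used here merely as a stand-in for the JaHMM implementation mentioned in Section 7.1, and the state count, placeholder data and threshold choice are assumptions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Placeholder normal trajectories; in practice these come from Section 4.
normal_trajs = [np.random.rand(300, 2), np.random.rand(450, 2)]

X = np.concatenate(normal_trajs)
lengths = [len(t) for t in normal_trajs]
chmm = GaussianHMM(n_components=6, covariance_type="full", n_iter=50).fit(X, lengths)

def normalized_loglik(traj, model):
    """log P(O|lambda) / length(O), the measure used as a one-class criterion."""
    return model.score(traj) / len(traj)

# Trajectories whose normalised log-likelihood falls below a threshold learned from
# the normal training set (e.g. its minimum, cf. Eq. (15)) are labelled abnormal.
threshold = min(normalized_loglik(t, chmm) for t in normal_trajs)
is_abnormal = normalized_loglik(np.random.rand(200, 2), chmm) < threshold
```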

6.2. Log-likelihood approximation in long sequences

As mentioned previously, continuous Hidden Markov Models have problems with long sequences. This is due to the multiplications in the Forward Backward algorithm, which is used to calculate the observation probability given the model. The constant decrease of the observation probability results in a very low value, which ends up underflowing the number storage of current computers. Solutions like sampling the trajectory only partially solve the problem.

In order to tackle the problem, one may rescale the conditional probabilities using carefully designed scaling, as proposed in [29]. We, however, have devised a method for the approximation of the log-probability of a long sequence that offers the advantage of computational simplicity and, in parallel, keeps the properties required for the classification of normal and abnormal trajectories (Eq. (5)). Our approximating methodology avoids the calculation of the scaling factor and uses integer instead of real values. We have named this method observation log-probability approximation (OLPA).

Given the trained continuous Hidden Markov Model, and within the recognition step, the Forward Backward algorithm is used [29] in order to compute the probability of a known observation sequence. This algorithm consists of the following steps:

(1) Initialization: α_1(i) = π_i · b_i(O_1).
(2) Induction: α_{t+1}(j) = [ Σ_{i=1..N} α_t(i) · a_ij ] · b_j(O_{t+1}).
(3) Termination: P(O | λ) = Σ_{i=1..N} α_T(i).

To compensate for the constant decrease in the likelihood in long sequences, we modified the above algorithm so that instead of multiplications we use additions of logarithms. Some background assumptions are given next.

By definition, if ⌊x⌋ is the floor of the number x, then |log a - ⌊log a⌋| < 1. Thus, we can approximate log P(O | λ) / length(O) with ⌊log P(O | λ)⌋ / length(O). Now, due to the fact that for long sequences P(O | λ) is far below 1 and that log a tends to -∞ as a tends to 0, one may assume that log a ≈ ⌊log a⌋. This approximation is acceptable, because the estimation error is bounded (less than 1): long normal sequences give very small cHMM probability values, due to the successive multiplications, so the magnitude of the logarithm of those probabilities is large enough for an error of 1 to be negligible. Assuming this approximation is acceptable, it can be inserted into the Forward Backward algorithm.

First, we define the functions necessary for computations in cHMM algorithms, using logarithms:

  ⌊log(a · b)⌋ = ⌊log a + log b⌋ ≈ ⌊⌊log a⌋ + ⌊log b⌋⌋ = ⌊log a⌋ + ⌊log b⌋

Additionally, the following applies for a sequence of values x_i, the largest of which is x_max:

  x_max ≤ Σ x_i ≤ n · x_max
  =>  log(x_max) ≤ log( Σ x_i ) ≤ log(n) + log(x_max)

The order of magnitude of x_i is 10^-9 or less, and that of n is 10, so log( Σ x_i ) ≈ max_i( log(x_i) ), or ⌊log( Σ x_i )⌋ ≈ ⌊max_i( log(x_i) )⌋.

According to all the above, we can conclude with a modification of the Forward Backward algorithm, using the same dynamic programming idea. Let Log a ≡ ⌊log a⌋, and let α̃ be the approximated α; then the following approximations apply:

  α̃_1(i) = Log( π_i · b_i(O_1) ) = ⌊log π_i + log b_i(O_1)⌋

  α̃_t(i) = Log( ( Σ_{j=1..N} α_{t-1}(j) · a_ij ) · b_j(O_t) )
         = ⌊log( ( Σ_{j=1..N} α_{t-1}(j) · a_ij ) · b_j(O_t) )⌋
         = ⌊log( Σ_{j=1..N} α_{t-1}(j) · a_ij ) + log b_j(O_t)⌋
         ≈ ⌊log( Σ_{j=1..N} α_{t-1}(j) · a_ij )⌋ + ⌊log b_j(O_t)⌋
         ≈ max_j( ⌊log( α_{t-1}(j) · a_ij )⌋ ) + ⌊log b_j(O_t)⌋
         ≈ max_j( ⌊log α_{t-1}(j) + log a_ij⌋ ) + ⌊log b_j(O_t)⌋
         ≈ max_j( ⌊log α_{t-1}(j)⌋ + ⌊log a_ij⌋ ) + ⌊log b_j(O_t)⌋
         = max_j( α̃_{t-1}(j) + ⌊log a_ij⌋ ) + ⌊log b_j(O_t)⌋

  P̃(O | λ) = Log( Σ_{i=1..N} α_T(i) )
           = ⌊log Σ_{i=1..N} α_T(i)⌋
           ≈ max_i( ⌊log α_T(i)⌋ )
           = max_i α̃_T(i)

According to the above approximations, we can express the algorithm as follows:

(1) Initialization: α̃_1(i) ≈ ⌊log π_i⌋ + ⌊log b_i(O_1)⌋.
(2) Induction: α̃_t(i) ≈ max_j( α̃_{t-1}(j) + ⌊log a_ij⌋ ) + ⌊log b_j(O_t)⌋.
(3) Termination: P̃(O | λ) ≈ max_i α̃_T(i).

This observation log-probability approximation helps us overcome the problem of consecutive multiplications, by making it possible to use sums of integers. Our achieved goal was to be able to calculate an approximation of the probability of a long sequence that would otherwise be impossible to compute, due to machine limitations.
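The recursion can be sketched as follows. The function assumes that the log initial probabilities, the log transition matrix and a callable returning the log emission density are available from the trained cHMM; all names are illustrative rather than the authors' implementation.

```python
import math

def olpa_log_probability(log_pi, log_A, log_b, T):
    """Approximate floor(log P(O|lambda)) for a T-long observation sequence.

    log_pi[i]   : log of the initial probability of state i
    log_A[j][i] : log of the transition probability from state j to state i
    log_b(t, i) : log emission density of observation O_t in state i
    """
    N = len(log_pi)
    # Initialization: alpha~_1(i) = floor(log pi_i) + floor(log b_i(O_1))
    alpha = [math.floor(log_pi[i]) + math.floor(log_b(0, i)) for i in range(N)]
    # Induction: replace the sum over predecessors by a max of integer additions
    for t in range(1, T):
        alpha = [max(alpha[j] + math.floor(log_A[j][i]) for j in range(N))
                 + math.floor(log_b(t, i)) for i in range(N)]
    # Termination: P~(O|lambda) = max_i alpha~_T(i)
    return max(alpha)

# The per-sequence score used for classification is the length-normalised value,
# olpa_log_probability(...) / T, compared against the threshold of Eq. (15).
```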
7. Experiments

As a scene for our experiments we have used our lab, where we installed three cameras, as illustrated in Fig. 5a, and there we tried to simulate some common scenarios.² We have simulated a protected exposition room, where only one visitor is allowed, and he or she has to follow a certain path for entering and exiting. Also, only certain short-term behaviours are allowed. As a short-term behaviour we label the action taken by a single person within a time period of 25 frames, which corresponds approximately to 1 s in the real world. An artificial barrier inserted in the scene does not allow entering the experiment area from a certain side, and there also exists an "emergency exit". When someone visits areas which are not allowed, we consider this to be a case of abnormal activity (see Fig. 5b). Similarly, when areas are visited in the wrong order (e.g. entering from the exit or exiting from the entrance) according to the modelled continuous Hidden Markov Model, this activity is also labelled as abnormal. Furthermore, we consider normal short-term activity to be something like "walking", "standing still" or "active", and in no case "running" or "abrupt motion". The experiments measure the performance of two variations of our process, namely the offline and the real-time process.

² The custom corpus used within our experiments can be made available to any interested party, via e-mail correspondence.

Our cameras are AXIS 214PTZ network cameras, from which the frames are received through HTTP requests. The communication with the cameras is performed through an IP network. For frame synchronization we used a Network Time Protocol (NTP) server which gives time stamps to each frame, so the closest frame triplet is considered to match a single time frame.

In our system, we use the LibSVM [8] library to train a one-class SVM model with a radial basis function (RBF) kernel. The training set consists of feature vectors of normal behaviours only. The radial basis function has been chosen based on experimental results, where we had used all the alternatives (polynomial, linear, sigmoid). SVM parameters were also optimized through trial and error.

In order to calculate the features associated with the optical flow, many restrictions were taken into consideration and various normalizations were applied, to avoid noise and reduce the computational cost. Problems were mainly due to our baseline background subtraction, as well as to the noise in the cameras' unfiltered image data. With the aim of reducing the computational cost, we have limited the optical flow calculation to the foreground regions only. We have also used edge detection to avoid noise in the extraction of the optical flow. It is well known that the optical flow vectors may have high values in background regions that become unoccluded by a moving target, even though these regions do not move at all. This would significantly affect our classification scheme and had to be avoided. To overcome this problem, we have applied the Canny method for edge detection [25] within the blobs' boundaries (see Fig. 6). Then, we have calculated the optical flow only for the pixels belonging to these edges.

Due to the complex background, edges from the background added noise to our calculations; thus we have made use of some of the first frames from each video in order to extract background edges and subtract them from the final optical flow calculation. This choice is justified by the fact that we expect to have the highest amount of optical flow around the edges, while the optical flow is expected to be low within homogeneous regions; thus the most useful information for our classification is not lost. In Fig. 6, all the processing steps described here can be seen. It should be noted that the learning process based on the first few frames can be considered as part of the initial system calibration (also see Section 4).

As already indicated, the background in our input data was natural (non-uniform) and we had to deal with noise. In our experiments we used classic surveillance cameras with low resolution (352 x 288), while the images captured were compressed with the JPEG compression method, resulting in loss of image quality and in the creation of artifacts that sometimes affected the background subtraction. Therefore, we used a priori knowledge of a human target's size in order to avoid bounding boxes of inexact sizes. The trivial rule used was that the bounding box can have a maximum width and height, and all other bounding boxes were to be omitted.
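The following sketch reproduces the idea of restricting the optical flow to Canny edges inside the foreground bounding box, using OpenCV's Farneback flow as an example flow estimator; the Canny thresholds, the background-edge mask and the normalization are assumptions in the spirit of the description above, not the authors' exact settings.

```python
import cv2
import numpy as np

def edge_restricted_flow(prev_box_gray, box_gray, background_edges=None):
    """Normalised optical flow F_i for one camera, restricted to edge pixels."""
    flow = cv2.calcOpticalFlowFarneback(prev_box_gray, box_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    edges = cv2.Canny(box_gray, 100, 200) > 0          # edges inside the bounding box
    if background_edges is not None:
        edges &= ~background_edges                      # drop edges learned from the empty scene
    n_edge_pixels = max(int(edges.sum()), 1)
    box_area = box_gray.shape[0] * box_gray.shape[1]
    return magnitude[edges].sum() / (n_edge_pixels * box_area)
```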

Fig. 5. (a) View of our experimental room (exposition room). (b) Normal and abnormal trajectory example. In the latter the target goes over the barrier.

The threshold for considering the size of a bounding box as acceptable was experimentally determined. Obviously, this heuristic is dependent on the input video and has serious defects, for example in the case where a target human lies on the floor or extends his hands. A more robust approach for background detection and removal should be used to eliminate the limitations posed.

Fig. 6. (a) Foreground object inside its bounding box. (b) Edges extracted with Canny inside the bounding box. (c) Edges extracted with Canny inside the box without the edges of the background.

In order to determine which of our features were the most promising for the desired classification setting, we used a subset of our data where both normal and abnormal instances had been labelled. Using an information gain criterion and a 10-fold cross validation methodology, we have found that the most promising feature is the max entropy difference (see Table 1 in Section 5.1). The overall ranking of the other features based on the information gain criterion is: algebraic mean speed, max standard deviation difference, speed, mean optical flow, algebraic mean blob rate and mean optical flow rate. Of course the labelled data were only used in this process, which we hoped would offer more intuition on which features offer higher discriminative potential.

7.1. Testing the one-class cHMM assumption

To see whether the P(O_{t+1}) / P(O_t) ratio for normal trajectories can be described by a predefined probability density function, which can in turn be represented by its expected value, we trained a cHMM model with normal trajectories only.³ Then, we generated several sequences O using this cHMM. These sequences should obviously be considered normal. We then calculated the ratio P(O_{t+1}) / P(O_t) for all values of t, i.e. for all subsequences of the individual O sequences. In Fig. 7 we show the results as the logarithm of the probability ratio, log P(O_{t+1}) / P(O_t), to offer more detail, since the magnitude of the probability values is very low. What Fig. 7 shows is that a normal distribution appears to offer a good approximation of the actual distribution of ratio values, even though the ratio values appear to be bounded.

7.2. OLPA performance

For long observation sequences we expect, based on the analysis in Section 6.2, that our probability calculation algorithm (OLPA) will give results strongly correlated to those returned by the Forward Backward procedure. Experiments show that, indeed, the Forward Backward algorithm and the OLPA algorithm have strongly correlated results in short observation sequences as well. We have performed a t-test to show that the mean values of the distributions of the normalized log P (returned by the Forward Backward algorithm) and the normalized Log P (returned by OLPA) are the same within statistical error (p-value < 0.05). Additionally, we have calculated the Pearson and Kendall correlations (to allow for non-Gaussian data) between the two probability estimations and, as illustrated in Fig. 8, the samples of the two distributions are very strongly correlated (> 0.98), with a p-value much lower than the usual threshold of 0.05.

³ We have used the JaHMM library [13].
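Such an agreement check can be reproduced along the following lines with SciPy, assuming the two arrays of per-sequence normalized scores have been collected; the file names are placeholders.

```python
import numpy as np
from scipy import stats

norm_logP_fb = np.load("normalized_logP_forward_backward.npy")   # hypothetical
norm_logP_olpa = np.load("normalized_logP_olpa.npy")             # hypothetical

t_stat, t_p = stats.ttest_ind(norm_logP_fb, norm_logP_olpa)      # compare mean values
r_pearson, p_pearson = stats.pearsonr(norm_logP_fb, norm_logP_olpa)
tau_kendall, p_kendall = stats.kendalltau(norm_logP_fb, norm_logP_olpa)

print(f"t-test p={t_p:.3g}, Pearson r={r_pearson:.3f}, Kendall tau={tau_kendall:.3f}")
```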

Fig. 7. Fraction of logarithms of cHMM probabilities in normal trajectories.

Fig. 8. Correlation between samples of the two distributions, normalized log P (Forward Backward algorithm) and normalized Log P (OLPA).

Our next experiments were performed in two steps: offline training and testing, and real-time testing.

7.3. Offline experiments

We have performed 10-fold cross validation to test the effectiveness of our system using the offline approach. Fifteen videos with normal and five videos with abnormal behaviours were captured. Each of the videos lasts between 3000 and 6000 frames and contains one to five different long-term behaviours, resulting in a total of 42 normal behaviours and 22 abnormal behaviours. Each behaviour has been performed by one of three different actors, through random selection. Out of the 22 abnormal behaviours, 14 are abnormal based on the motion features (e.g. abrupt motion) and 19 are abnormal based on the trajectory, which means that some behaviours are abnormal for both criteria used. It should be noted that the same activities performed by different actors can differ greatly. The videos with normal behaviours illustrate a person entering the room, buying a ticket, browsing and looking around for several minutes, and exiting the room using a preset path. The abnormal behaviours consist of running, abrupt motion or an unexpected trajectory.

Our experiments, for offline testing, consist of a test set formed by four normal behaviours per fold, as well as the 22 abnormal behaviours, which were used in all the folds. In the offline procedure each classifier makes a decision on the whole behaviour's abnormality. The system signals abnormality if any of the constituent classifiers has indicated abnormality.

Fig. 9. (a) Percentage normality in normal and abnormal behaviours for the support vector machine. (b) Output of the continuous Hidden Markov Model for normal and abnormal behaviours. Black colour is for normal behaviours and red for abnormal behaviours. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The final decision on the observed behaviour's abnormality is taken by thresholding both classifiers' (SVM and cHMM) outputs. The thresholds are automatically calculated in the training step, which takes place offline before the operation of our system. To be more specific, during the training step, videos with normal behaviours are input to the system, features are calculated and two classifier models (one-class SVM and cHMM) are trained and stored. Then, using n-fold cross validation to ascertain generality, the cHMM's output probabilities are stored in order to be processed and used to extract the thresholds based on distributional characteristics (mean value, standard deviation and minimum value; also see Eq. (14)). For the decision concerning the SVM classifier, we also extract a threshold which indicates the maximum number of abnormal frames we allow within a normal, predefined-length sequence of frames. Therefore, SVM decisions are also used to determine this second threshold. At this point the system is considered to be calibrated. In case someone wishes to apply the system at a different location, only the training step needs to be repeated and the system will be applicable to the new environment.

The experiments prove that the system is highly automated, as minimal human interference is needed during the training step, and the results are very encouraging. We remind the reader that in the background subtraction step the first 250 frames are used for training, where no person is inside the scene. Those frames are used to extract the background edges (also see Section 5.1). Features identifying short-term behaviour are extracted and used to train a one-class SVM with a radial basis function kernel. Simultaneously, trajectories were extracted in order to be inserted into a continuous HMM for training.

The threshold values have been calculated based on the training set. In Fig. 9, distributions of SVM and cHMM outputs for normal as well as abnormal behaviours are shown. Fig. 9a depicts the normality percentage for normal and abnormal behaviours within a time window that includes the whole behaviour, i.e. how many feature vectors are recognized as normal in the entire behaviour. We used a t-test in order to ensure that the two density functions are different, and the resulting p-value was below 1%. Because the two pdfs are not Gaussian, we have also applied the Kolmogorov–Smirnov (KS) test [3], which does not require normal pdfs. The Kolmogorov–Smirnov test indicated that, indeed, the normal and abnormal samples come from different pdfs (p-value = 2.09e-07). Fig. 9b shows the cHMM's output for normal and abnormal behaviours. The two tests (t-test and KS test) were also applied to these results, with both p-values substantially below 1%. According to the remark that the normal and abnormal pdfs are different for both classifiers, thresholding their outputs was a logical decision.

For SVM-based classification we set the threshold to be the following function of the mean and the standard deviation of the distribution of the number of allowed abnormal frames within a normal sequence:

  threshold_SVM = mean(Hsvm_normal) - 2.5 · std(Hsvm_normal)                                (14)

For HMM outputs, the minimum value of the distribution of normalized log-probabilities of the normal instances was considered to be the threshold value that separates normal trajectories from abnormal ones:

  threshold_HMM = min(Hhmm_normal)                                                          (15)

where Hsvm is the histogram of the SVM's outputs and Hhmm is the histogram of the HMM's outputs.
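A minimal sketch of this calibration step is given below, assuming the per-behaviour SVM outputs and the normalized cHMM log-probabilities of the normal training videos have been collected into arrays; the file names are placeholders.

```python
import numpy as np

# Classifier outputs collected on the normal training videos (hypothetical files).
svm_outputs_normal = np.load("svm_outputs_normal_videos.npy")
hmm_norm_loglik = np.load("hmm_normalized_logP_normal_videos.npy")

# Eq. (14): threshold on the distribution of per-behaviour SVM outputs
threshold_svm = svm_outputs_normal.mean() - 2.5 * svm_outputs_normal.std()

# Eq. (15): minimum normalised log-probability observed on normal trajectories
threshold_hmm = hmm_norm_loglik.min()
```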

7.4. Real-time experiments

In both the online and the offline approach, the same training set (and therefore the same models) and the same thresholds have been used. The only difference is that in the online approach the system emits a decision for every frame instead of for the whole behaviour. The system performance in both approaches is encouraging, as will be shown in the following paragraphs.

Real-time experiments follow a slightly different approach. Each frame is labelled as normal or abnormal depending on both classifiers' decisions. All the videos together contain 34 479 normal frames, i.e. frames for which the behaviour should be judged as normal, and 5260 abnormal frames. Of the abnormal frames, 1251 exhibit motion-based abnormality and 4537 exhibit trajectory-based abnormality. The SVM classifier classifies each frame, but the SVM-based decision also takes into account the labels of the previous 24 frames, based on the percentage of abnormal frames within this history of 25 frames. The cHMM returns a normalized log-probability value which characterizes the object's sampled trajectory since the object's first appearance in the scene and up to the current frame. The final system result for each frame is the logical "or" of these two outputs, where the value "true" indicates a decision of abnormality for the given frame.
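A sketch of this per-frame fusion rule is given below; how exactly the SVM history percentage is compared against the threshold of Eq. (14) is an assumption here, so the snippet should be read as illustrative rather than as the authors' implementation.

```python
from collections import deque

def make_frame_decider(threshold_svm, threshold_hmm, window=25):
    """Per-frame fusion: SVM decision over a 25-frame history OR cHMM decision."""
    history = deque(maxlen=window)            # SVM labels of the most recent frames

    def decide(svm_label_abnormal, hmm_norm_loglik):
        history.append(1 if svm_label_abnormal else 0)
        svm_abnormal = (sum(history) / len(history)) > threshold_svm
        hmm_abnormal = hmm_norm_loglik < threshold_hmm
        return svm_abnormal or hmm_abnormal   # True means "abnormal" for this frame

    return decide
```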
7.5. Overall system performance

Precision and recall have been calculated for the offline and the real-time experiments. For each approach we give the performance for the SVM and HMM classifier models separately, as well as for the whole system, in Table 2.

Table 2
Precision and recall for the 3-camera system on our dataset. The column "Overall" indicates the performance of the combined decision.

             SVM                    HMM                    Overall
             Precision  Recall      Precision  Recall      Precision  Recall
Offline
  Normal     0.9048     0.9286      1          0.9762      1          0.9286
  Abnormal   0.7674     0.7071      0.95       1           0.88       1
Real-time
  Normal     0.9875     0.9228      0.9960     0.9770      0.9960     0.9105
  Abnormal   0.2419     0.6788      0.8478     0.9704      0.8478     0.9375

Even though the overall system performance is very satisfactory, we should note that the precision for motion-based abnormal instances, obtained through the use of the SVM classifier, appears to be low. This indicates that we should further optimize the SVM parameter values for the given classification problem, as it has been seen in the literature that SVM performance can be highly dependent on the selected parameters. However, the simultaneous use of both classifiers helps the system perform well for the given dataset.
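Per-class precision and recall values such as those in Tables 2-4 can be computed from ground-truth and predicted labels as sketched below; the label lists are placeholder data.

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = ["normal", "normal", "abnormal", "normal", "abnormal"]   # placeholder labels
y_pred = ["normal", "abnormal", "abnormal", "normal", "abnormal"]

prec, rec, _, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=["normal", "abnormal"], zero_division=0)
print(dict(zip(["normal", "abnormal"], zip(prec, rec))))
```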
7.6. Multiple cameras vs. one camera problem by compensating for any missing camera data. In
addition, as we can observe from Tables 2 and 3, cHMM
To clarify the reasons for using multiple cameras precision and recall in both offline and real-time experi-
instead of one camera, we have performed a set of ments, are greater with multiple cameras than with only
experiments only with the data of one camera from our one camera. On the other hand, precision and recall in
lab dataset. The system’s results (precision and recall) are both offline and real-time experiments for SVM are in
shown in Table 3. As we can see the system’s performance most cases higher in the single camera system than in the
is lower than the one produced by multiple cameras, due multi-camera system. These observations have led us to
two main conclusions. The first is that our assumption
that multiple cameras provide us with a more precise
position of the object (more accurate trajectory) is correct.
Table 2
The second is that our application of trivial fusion of
Precision and recall for the 3-camera system on our dataset.
motion data from different cameras—we just calculated
SVM HMM Overall mean feature values over the three cameras—can cause a
decrease of performance and should be avoided. Future
Precision Recall Precision Recall Precision Recall
work should research how motion feature values from
Offline different cameras should be combined.
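For illustration, the "trivial" fusion referred to above amounts to averaging the per-camera motion feature vectors frame by frame, as sketched below with hypothetical feature values; a weighted or reliability-aware combination would be one possible alternative for future work.

```python
# Sketch of naive per-frame fusion: average the motion feature vectors
# of the cameras in which the target is currently visible.
import numpy as np

def fuse_mean(feats_per_camera: np.ndarray, visible: np.ndarray) -> np.ndarray:
    """feats_per_camera: (n_cameras, n_features); visible: boolean mask per camera."""
    if not visible.any():
        raise ValueError("target visible in no camera")
    return feats_per_camera[visible].mean(axis=0)

feats = np.array([[1.0, 0.2], [1.4, 0.1], [0.0, 0.0]])  # hypothetical features
visible = np.array([True, True, False])                  # target out of camera 3
print(fuse_mean(feats, visible))
```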
To allow for a more solid comparison, we have additionally evaluated the system on a commonly used dataset. The corpus chosen is the set of video sequences available for result comparison from the PETS04 workshop [12]. The sequences have already been used by the CAVIAR project. The system's performance when applied on these data is depicted in Table 4. It is worth mentioning that:
• The scenarios in this dataset are different from the scenarios we have assumed.
• No restricted areas have been defined; therefore cHMM performance is not included in the results, since the results of the cHMM indicated normal trajectories and were, therefore, useless.
• In the CAVIAR dataset there is no explicit definition of normality and abnormality. Thus, we have considered "running" and "fighting" to be abnormal, while all the rest were considered to be normal.

Table 4
Precision and recall of the single-camera system, when applied on the CAVIAR dataset.

              Offline overall         Real-time overall
              Precision   Recall      Precision   Recall
Normal        0.8882      0.775       0.7625      0.7309
Abnormal      0.3129      0.5125      0.2273      0.2582
a single camera view. There were 11 normal behaviour
ARTICLE IN PRESS
P. Antonakaki et al. / Signal Processing 89 (2009) 1723–1738 1737

videos4 and 4 abnormal.5 The extracted different beha- Our experimental results demonstrated the good
viours were a total of 43 normal and 8 abnormal ones. The performance of the system in the task of recognizing
number of frames was 12 188 normal and 2669 abnormal. human behaviour’s abnormality in a somewhat noisy
In the CAVIAR dataset evaluation of performance, the environment, with different scenarios of action and
detection of abnormal behaviour appears to be more participation of different actors. The experiments were
difficult than in our dataset. Given this difference in implemented in offline and real-time conditions, with
performance, we have sought the reasons for the decrease similar results, implying the robustness of the method.
in efficiency and found some possible causes. In our use of Furthermore, experiments with a single camera version of
the CAVIAR dataset, we used the whole videos described the system provide us the incentive to consider another,
as cases of ‘‘walking’’, ‘‘browsing’’ and ‘‘meeting’’ as input more robust method for the fusion of data in order to
for normal behaviour. We then discovered that a quick improve performance.
(running) motion can be found within a walking video, The multiple camera methodology has, so far, been
inducing noise in the discriminative ability of the speed- tested on scenarios with only one object inside the scene,
based features. Then we saw that occlusion may have without taking account any interactions between actors. It
caused problems, due to the fact that there are data from would be worthwhile to further investigate the effective-
only one camera. The edge-detection process and the ness of our system using more features, such as the
optical flow extraction fail when, for example, two people distance of the object from each camera, in order to
are too close the each other and fighting. In these cases improve the motion-based discriminatory performance of
the positioning of the targets with respect to the camera the system. However, other methodologies could also be
highly affects the method concerning the use of optical tested in the place of the SVM classifier.
flow, but only when a single camera is used. The use of
three cameras and proper fusion of information may offer
better optical edge detection and, thus, optical flow
values. The two identified problems partially explain the Acknowledgements
loss of recall for abnormal instances, even though more
experiments should be conducted to verify these findings. This work is being co-funded by the Greek General
One final comment would be that abnormality in such Secretariat of Research and Technology and the European
actions as fighting can be detected much more easily if Union via a PENED project.
one uses interaction information between actors, which
was not within the scope of this work. References

8. Conclusion and future work

In this paper, we have presented a set of theoretical and practical tools for the domain of behaviour recognition, which have been integrated within a unified, automatic, bottom-up system that uses multiple cameras to perform human behaviour recognition in an indoor environment without a uniform background. The approach's innovation is fourfold:

• We propose the application of two different criteria of human behaviour abnormality within a single methodology that needs only normal data for training.
• We have shown that the application of multiple cameras can be fruitful when it comes to determining abnormality based on the trajectory.
• We have presented a methodology that lets a continuous Hidden Markov Model function as a one-class classifier, with very promising experimental results.
• We have offered an alternative to the Forward-Backward algorithm for the recognition step of cHMMs, in order to overcome arithmetic underflow in the case of very long observation sequences without loss of precision.
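As a point of reference for the last item, the most common remedy for this underflow is to carry out the forward recursion entirely in the log domain with log-sum-exp. The sketch below illustrates that standard alternative on a hypothetical two-state Gaussian-emission model; it is not the specific approximation proposed in this work.

```python
# Log-domain forward pass for a 1-D Gaussian-emission HMM (illustrative only).
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def log_forward(obs, log_pi, log_A, means, stds):
    """Return log P(obs | model) without ever forming raw probabilities."""
    n_states = len(log_pi)
    # Log emission probabilities for every (time, state) pair.
    log_b = np.stack([norm.logpdf(obs, means[j], stds[j]) for j in range(n_states)], axis=1)
    log_alpha = log_pi + log_b[0]                      # initialization
    for t in range(1, len(obs)):
        # log_alpha[j] = logsumexp_i(log_alpha[i] + log_A[i, j]) + log_b[t, j]
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_b[t]
    return logsumexp(log_alpha)                        # termination

# Hypothetical 2-state model and a very long observation sequence.
log_pi = np.log([0.6, 0.4])
log_A = np.log([[0.9, 0.1], [0.2, 0.8]])
obs = np.random.default_rng(1).normal(0.0, 1.0, size=10_000)
print(log_forward(obs, log_pi, log_A, means=[0.0, 3.0], stds=[1.0, 1.0]))
```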
Our experimental results demonstrated the good performance of the system in the task of recognizing abnormality in human behaviour in a somewhat noisy environment, with different scenarios of action and the participation of different actors. The experiments were implemented in offline and real-time conditions, with similar results, implying the robustness of the method. Furthermore, the experiments with a single-camera version of the system give us the incentive to consider another, more robust method for the fusion of data in order to improve performance.

The multiple-camera methodology has, so far, been tested on scenarios with only one object inside the scene, without taking into account any interactions between actors. It would be worthwhile to further investigate the effectiveness of our system using more features, such as the distance of the object from each camera, in order to improve the motion-based discriminatory performance of the system. However, other methodologies could also be tested in the place of the SVM classifier.

Acknowledgements

This work is being co-funded by the Greek General Secretariat of Research and Technology and the European Union via a PENED project.
References

[1] F. Bashir, A. Khokhar, D. Schonfeld, View-invariant motion trajectory-based activity classification and recognition, Multimedia Systems 12 (1) (2006) 45–54.
[2] F. Bashir, W. Qu, A. Khokhar, D. Schonfeld, HMM-based motion recognition system using segmented PCA, in: Proceedings of the IEEE International Conference on Image Processing (ICIP), Genoa, Italy, vol. 3, 2005, pp. 1288–1291.
[3] Z. Birnbaum, F. Tingey, One-sided confidence contours for probability distribution functions, The Annals of Mathematical Statistics 22 (4) (1951) 592–596.
[4] M. Black, A. Jepson, A probabilistic framework for matching temporal trajectories: condensation-based recognition of gestures and expressions, in: Proceedings of the European Conference on Computer Vision (ECCV), Freiburg, Germany, vol. 1406, 1998, pp. 909–924.
[5] F. Bobick, W. Davis, The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (3) (2001) 257–267.
[6] O. Boiman, M. Irani, Detecting irregularities in images and in video, International Journal of Computer Vision 74 (1) (2007) 17–31.
[7] C. Bregler, J. Malik, Learning appearance based models: mixtures of second moment experts, Advances in Neural Information Processing Systems 9 (2) (1997) 845.
[8] C. Chang, C. Lin, LIBSVM: a library for support vector machines, software available at <http://www.csie.ntu.edu.tw/~cjlin/libsvm>, vol. 80, 2001, pp. 604–611.
[9] H. Dee, D. Hogg, Detecting inexplicable behaviour, in: British Machine Vision Conference, London, UK, 2004, pp. 477–486.
[10] T. Duong, H. Bui, D. Phung, S. Venkatesh, Activity recognition and abnormality detection with the switching hidden semi-Markov model, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, vol. 1, 2005, pp. 838–845.
[11] A. Efros, C. Berg, G. Mori, J. Malik, Recognizing action at a distance, in: Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV), Nice, France, vol. 2, 2003, pp. 726–733.
[12] R. Fisher, The PETS04 surveillance ground-truth data sets, in: International Workshop on Performance Evaluation of Tracking and Surveillance, 2004.
[13] J. Francois, Jahmm - hidden Markov model (HMM): an implementation in Java, 2006.
[14] B. Horn, B. Schunck, Determining optical flow, Artificial Intelligence 17 (1–3) (1981) 185–203.
[15] W. Hu, D. Xie, T. Tan, A hierarchical self-organizing approach for learning the patterns of motion trajectories, IEEE Transactions on Neural Networks 15 (1) (2004) 135–144.
[16] W. Hu, M. Hu, X. Zhou, T. Tan, J. Lou, Principal axis-based correspondence between multiple cameras for people tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (4) (2006) 663–671.
[17] A. Ivanov, F. Bobick, Recognition of visual activities and interactions by stochastic parsing, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 852–872.
[18] F. Jiang, Y. Wu, A. Katsaggelos, Abnormal event detection from surveillance video by dynamic hierarchical clustering, in: IEEE International Conference on Image Processing (ICIP), San Antonio, TX, USA, vol. 5, 2007, pp. 145–148.
[19] N. Johnson, D. Hogg, Learning the distribution of object trajectories for event recognition, Image and Vision Computing 14 (8) (1996) 609–615.
[20] D. Kosmopoulos, P. Antonakaki, K. Valasoulis, D. Katsoulas, Monitoring human behavior in an assistive environment using multiple views, in: 1st International Conference on Pervasive Technologies Related to Assistive Environments (PETRA '08), Athens, Greece, 2008.
[21] C. Lee, M. Ho, W. Wen, C. Huang, T. Hsin-Chu, Abnormal event detection in video using N-cut clustering, in: International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), Pasadena, CA, USA, 2006.
[22] D. Mahajan, N. Kwatra, S. Jain, P. Kalra, S. Banerjee, A framework for activity recognition and detection of unusual activities, in: Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), 2004, pp. 15–21.
[23] L. Manevitz, M. Yousef, One-class SVMs for document classification, Journal of Machine Learning Research 2 (2) (2001) 139–154.
[24] T. Moeslund, A. Hilton, V. Krüger, A survey of advances in vision-based human motion capture and analysis, Computer Vision and Image Understanding 104 (2–3) (2006) 90–126.
[25] H. Neoh, A. Hazanchuk, Adaptive edge detection for real-time video processing using FPGAs, in: CD Proceedings of the 2004 Global Signal Processing Expo (GSPx) and International Signal Processing Conference (ISPC), Santa Clara, California, September 27-30, 2004.
[26] N. Nguyen, H. Bui, S. Venkatesh, G. West, Recognizing and monitoring high-level behaviors in complex spatial environments, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2003, pp. 620–625.
[27] J. Owens, A. Hunter, Application of the self-organising map to trajectory classification, in: Proceedings of the IEEE International Workshop on Visual Surveillance, Dublin, Ireland, 2000, pp. 77–83.
[28] S. Park, M. Trivedi, Analysis and query of person–vehicle interactions in homography domain, in: Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, Santa Barbara, CA, USA, 2006, pp. 101–110.
[29] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257–286.
[30] P. Ribeiro, J. Santos-Victor, Human activity recognition from video: modeling, feature selection and classification architecture, in: Proceedings of the International Workshop on Human Activity Recognition and Modelling, Beijing, 2005, pp. 61–78.
[31] B. Scholkopf, J. Platt, J. Shawe-Taylor, A. Smola, R. Williamson, Estimating the support of a high-dimensional distribution, Neural Computation 13 (7) (2001) 1443–1471.
[32] G. Sukthankar, K. Sycara, Automatic recognition of human team behaviors, in: Proceedings of the Workshop on Modeling Others from Observations at the International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005.
[33] T. Wang, J. Li, Q. Diao, W. Hu, Y. Zhang, C. Dulong, Semantic event detection using conditional random fields, in: Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2006, pp. 109–114.
[34] D. Weinland, R. Ronfard, E. Boyer, Motion history volumes for free viewpoint action recognition, in: IEEE International Workshop on Modeling People and Human Interaction, 2005.
[35] T. Xiang, S. Gong, Video behavior profiling for anomaly detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (5) (2008) 893–908.
[36] D. Zhang, D. Gatica-Perez, S. Bengio, I. McCowan, Semi-supervised adapted HMMs for unusual event detection, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2005, pp. 611–618.
[37] H. Zhong, J. Shi, M. Visontai, Detecting unusual activity in video, in: Proceedings of IEEE Computer Vision and Pattern Recognition, vol. 2, 2004, pp. 819–826.
[38] H. Zhou, D. Kimber, Unusual event detection via multi-camera video mining, in: Proceedings of the 18th International Conference on Pattern Recognition (ICPR), vol. 3, 2006, pp. 1161–1166.
[39] Z. Zivkovic, F. van der Heijden, Efficient adaptive density estimation per image pixel for the task of background subtraction, Pattern Recognition Letters 27 (7) (2006) 773–780.