Learning to Detect Events with Markov-Modulated Poisson Processes
Alexander Ihler
Toyota Technological Institute at Chicago
Jon Hutchins
Department of Computer Science
University of California, Irvine
and
Padhraic Smyth
Department of Computer Science
University of California, Irvine
Time-series of count data occur in many different contexts, including internet navigation logs,
freeway traffic monitoring, and security logs associated with buildings. In this paper we describe
a framework for detecting anomalous events in such data using an unsupervised learning approach.
Normal periodic behavior is modeled via a time-varying Poisson process model, which in turn is
modulated by a hidden Markov process that accounts for bursty events. We outline a Bayesian
framework for learning the parameters of this model from count time series. Two large real-world
data sets of time series counts are used as test beds to validate the approach, consisting of
freeway traffic data and logs of people entering and exiting a building. We show that the proposed
model is significantly more accurate at detecting known events than a more traditional threshold-
based technique. We also describe how the model can be used to investigate different degrees of
periodicity in the data, including systematic day-of-week and time-of-day effects, and to make
inferences about different aspects of events such as number of vehicles or people involved. The
results indicate that the Markov-modulated Poisson framework provides a robust and accurate
framework for adaptively and autonomously learning how to separate unusual bursty events from
traces of normal human activity.
Categories and Subject Descriptors: I.5.1 [Pattern Recognition]: Models—statistical; G.3 [Probability and Statistics]: Probabilistic Algorithms
General Terms: Algorithms
Additional Key Words and Phrases: Event detection, Markov modulated, Poisson
1. INTRODUCTION
Advances in sensor and storage technologies allow us to record increasingly detailed
pictures of human behavior. Examples include logs of user navigation and search
Portions of this work have appeared at the ACM Conference on Knowledge Discovery and Data
Mining (SIGKDD), 2006.
Fig. 1. Jittered scatterplot of the number of people entering on any weekday over a fifteen week
period, shown as a function of the time of day (in half-hour intervals). Although certain points
(e.g., set A) clearly represent unusual periods of increased activity, it is less clear which, if any, of
the values in set B represent something similar.
on the internet, RFID traces, security video archives, and loop-sensor records of
freeway traffic. These time series often reflect the underlying hourly, daily, and
weekly rhythms of natural human activity. At the same time, the time series are
often corrupted by events corresponding to bursty periods of unusual behavior. Ex-
amples include anomalous bursts of activity on a network, large occasional meetings
in a building, traffic accidents, and so forth.
In this paper we address the problem of identifying such events, by learning the
patterns of both normal behavior and events from historical data. While at first
glance this problem might seem relatively straightforward, the problem becomes
difficult in an unsupervised context when events are not labeled or tagged in the
data. Learning a model of normal behavior requires the removal of the abnormal
events from the historical record—but detecting the abnormal events can be ac-
complished reliably only by knowing the baseline of normal behavior. This leads
to a “chicken and egg” problem that is the main focus of this paper.
We focus in particular on time series data where time is discrete and N (t) is a
measurement of the number of individuals or objects engaged in some activity over
the time-interval [t − 1, t], e.g., counts of the number of people who enter a building
every 15 minutes, or the number of vehicles that pass a certain location on the
freeway every 5 minutes. As an example, Figure 1 shows counts of the estimated
number of people entering a building over time from an optical sensor at the front
door of a UC Irvine (UCI) campus building. The data are “jittered” slightly by
Gaussian noise to give a better sense of the density of counts at each time. There
are parts of this signal which are clearly periodic, and other parts which are obvious
outliers; but there are many samples which fall into a gray area. For example, the
points in set (A) in Figure 1 are clearly far from the typical behavior for their time
periods, while set (B) contains many points which are somewhat unusual but may or may not
be due to the presence of an event. In order to separate the two, we need to
define a model of uncertainty (how unusual is the measurement?), and additionally
incorporate a notion of event persistence, i.e., the idea that a single, somewhat
unusual measurement may not signify anything but several in a row could indicate
the presence of an event.
ACM Journal Name, Vol. V, No. N, August 2007.
Fig. 2. Example of freeway traffic data for Fridays for a particular on-ramp. (a) Average time
profile for normal, non game-day Fridays (dark curve) and data for a particular Friday (6/10/05)
with a baseball game that night (light curve). (b) Average time profile over all Fridays (dark
curve) superposed on the same Friday data (light curve) as in panel (a).
Fig. 3. (a) Entry data for the main entrance of the Calit2 building for three weeks,
beginning 7/23/05 (Sunday) and ending 8/13/05 (Saturday). (b) Exit data for the
same door over the same time period.
Fig. 4. (a) One week of traffic data (light curve) from Sunday to Saturday (June 5-11), with the
estimated normal traffic profile (estimated by the proposed model described later in the paper)
superposed as a dark curve. (b) Ground truth list of events (baseball games).
Fig. 5. (a) Scatter-plot of empirical (mean,variance) pairs observed in the data, compared with
the theoretical distribution expected for Poisson-distributed random variables (dashed lines show
±2-sigma interval). The presence of events makes these distributions quite dissimilar. (b) After
removing about 5% of data thought to be influenced by events, the data show much closer corre-
spondence to the Poisson assumption. (c) Building data after removing about 5% of observations
thought to have events present, along with corresponding confidence intervals.
Fig. 6. [L]: Illustration of the baseline threshold model set to detect the event on the second day,
with (a) original freeway traffic time series (light curve) for May 17-18, and mean profile as used
by the threshold model (dark curve), (b) events detected by the threshold method, and (c) ground
truth (known events) in the bottom panel. Note the false alarms. [R]: Using a lower threshold to
detect the full duration of the large event on the second day, causing many more false alarms.
events interspersed in the data are sufficiently few compared to the amount of non–
event observations, and if they are sufficiently noticeable in the sense that they
cause a dramatic change in activity. However, these assumptions do not always
hold, and we can observe several modes of failure in such a simple model.
One way this model can fail is due to the “chicken and egg” problem mentioned
in the introduction and illustrated in Figure 2. As discussed earlier, the presence
of large events distorts the estimated rate of “normal” behavior, which causes the
threshold test to miss the presence of other events around that same time.
A second type of failure occurs when there is a slight change in traffic level
which is not of sufficient magnitude to be noticed; however, the change is sustained
over a period of several observations signaling the presence of a persistent event. In
Figure 6[L], the event indicated for the first day can easily be found by the threshold
model by setting the threshold low enough to detect the event but high
enough that there are no false alarms. In order for the threshold model to detect
the event on the second day, however, the threshold must be decreased, which also
causes the detection of a few false alarms over the two-day period. Anomalies
detected by the threshold model are shown in Figure 6[L](b), while the known
events (baseball games) are displayed in panel (c).
A third weakness of the threshold model is its difficulty in capturing the duration
of an event. In order to detect not only the presence of the event on the second day
but also its duration, the threshold must be lowered to the point that the number
of false alarms becomes quite prohibitive, as illustrated in Figure 6[R]. Note that
the traffic event, corresponding to people departing the game, begins at or near the
end of the actual game time.
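As a concrete illustration, the baseline detector discussed in this section can be sketched as follows. The per-slot mean and standard deviation estimates and all names are our own assumptions for exposition, not the exact implementation used in the experiments:

```python
import math

def threshold_detector(counts, num_weeks, slots_per_week, eps):
    """Flag slot t as an event when its count exceeds the historical mean
    for that time-of-week slot by more than eps standard deviations."""
    means, stds = [], []
    for s in range(slots_per_week):
        # Pool the observations for this time-of-week slot across weeks.
        obs = [counts[w * slots_per_week + s] for w in range(num_weeks)]
        m = sum(obs) / len(obs)
        v = sum((x - m) ** 2 for x in obs) / len(obs)
        means.append(m)
        stds.append(math.sqrt(v))
    return [n > means[t % slots_per_week] + eps * stds[t % slots_per_week]
            for t, n in enumerate(counts)]
```

Lowering eps detects more events but, as Figure 6 illustrates, also admits more false alarms.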
In the remaining sections of the paper we discuss a more sophisticated proba-
bilistic model that accounts for these different aspects of the problem, and show
(in Section 7) that it can be used to obtain significantly more accurate detection
performance than the simple thresholding method.
5. PROBABILISTIC MODELING
Section 4, and in particular the failures of Figure 6, motivate the use of a probabilis-
tic model capable of reasoning simultaneously about the rate of normal behavior
(intuitively corresponding to the periodic portion of the data) and the presence and
duration of events (relatively rare deviations from the norm). Let us assume that
the two processes are additive, so that
N (t) = N0 (t) + NE (t), N (t) ≥ 0 (2)
where N0 (t) is the number of occurrences attributed to the normal building oc-
cupancy, and NE (t) represents the change in the number of occurrences which is
attributed to an event at time t (positive or negative); the non-negativity condition
indicates that we cannot observe fewer than zero counts. We discuss modeling each
of the variables N0 , NE in turn. Note that, although the models described here
are defined for discrete time periods, it may also be possible to extend them to
continuous time measurements [Scott 1998; 2002].
Fig. 7. (a) The effect of δd(t) , as seen over a week of building exit data. The relative rates over the
weekend (Sunday, Saturday) are much lower than those on weekdays. (b) The effect of ηd(t),h(t)
in modulating the Poisson rate of building exit data over a single day. There is a noticeable peak
around lunchtime, and a heavy bias towards the end of the day.
common parent nodes that are outside the plate [Buntine 1994]. The plates indicate
that there are multiple variables λ(t) and N0 (t), one for each value of t ∈ {1 . . . T },
and that the λ(t) variables are conditionally independent of each other given λ0 ,
σ, and η. A key point is that, given N0 (t), the parameters λ0 , δ, and η are all
independent of N (t).
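The periodic rate λ(t) factors into an overall average rate λ0, a day-of-week effect δd(t), and a time-of-day effect ηd(t),h(t). A minimal sketch of this decomposition (function and variable names are ours; half-hour slots, i.e., 48 per day, are assumed as in the building data):

```python
def rate(t, lam0, delta, eta, slots_per_day=48):
    """Periodic Poisson rate lambda(t) = lam0 * delta[d] * eta[d][h],
    where d is the day-of-week and h the time-of-day slot of t.
    delta averages to 1 over the week and each eta[d] averages to 1
    over the day, so lam0 is interpretable as the overall average rate."""
    d = (t // slots_per_day) % 7   # day-of-week index, 0..6
    h = t % slots_per_day          # time-of-day slot within the day
    return lam0 * delta[d] * eta[d][h]
```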
By choosing conjugate prior distributions for these variables we can ensure that
the inference computations in Section 6 have a simple closed form:
λ0 ∼ Γ(λ; aL , bL )
(1/7) [δ1 , . . . , δ7 ] ∼ Dir(α1d , . . . , α7d )
(1/D) [ηj,1 , . . . , ηj,D ] ∼ Dir(α1h , . . . , αDh )
where Γ is the Gamma distribution and Dir the Dirichlet distribution.
Fig. 8. (a) Graphical model for λ(t) and N0 (t). The parameters λ0 , δ, and η (the periodic
components of λ(t)) couple the distributions over time. (b) Graphical model for z(t) and N (t).
The Markov structure of z(t) couples the variables over time [in addition to the coupling of N0 (t)
from (a)].
We introduce a variable z(t) ∈ {0, +1, −1} indicating whether no event, a positive
event, or a negative event is present at time t, and define the probability distribution
over z(t) to be Markov in time, with transition probability matrix
Mz = [ z00  z0+  z0−
       z+0  z++  z+−
       z−0  z−+  z−− ]
with each row summing to one, e.g., z00 + z0+ + z0− = 1. These variables can
be interpreted in terms of intuitive characteristics of the system; for example, the
length of each time period between events is geometric with expected value 1/(1 −
z00 ), the length of each positive event is geometric with expected value 1/(1 − z++ ),
and so forth. We give the transition probability variables priors specified as
[z00 , z0+ , z0− ] ∼ Dir(z ; [aZ00 , aZ0+ , aZ0− ])
and similarly for the other matrix rows, where Dir(·) is again the Dirichlet distri-
bution.
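A sketch of sampling the state sequence z(t) forward from the transition matrix Mz (a hypothetical helper for intuition, not the paper's inference code, which draws z from its posterior):

```python
import random

STATES = (0, +1, -1)   # no event, positive event, negative event

def sample_chain(Mz, T, seed=0):
    """Sample z(1..T) forward from the 3x3 transition matrix Mz, whose
    rows (indexed by current state 0, +, -) each sum to one."""
    rng = random.Random(seed)
    idx = {0: 0, +1: 1, -1: 2}
    z = [0]                       # start in the no-event state
    for _ in range(T - 1):
        row = Mz[idx[z[-1]]]
        u, c = rng.random(), 0.0
        for s, p in zip(STATES, row):
            c += p
            if u < c:
                z.append(s)
                break
        else:
            z.append(STATES[-1])  # guard against rounding error
    return z
```

For example, with z00 = 0.98 the gaps between events are geometric with mean 1/(1 − 0.98) = 50 slots.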
Given z(t), we can model the increase or decrease in observation counts due to
the event, NE (t), as Poisson with rate γ(t)
NE (t) ∼ 0                          z(t) = 0
NE (t) ∼ P( z(t)·NE ; γ(t) )        z(t) ≠ 0
Fig. 9. (a) Entry data, along with λ(t), over a period of three weeks (Sept. 25–
Oct. 15). Also shown are (b) the posterior probability of an event being present,
p(z(t) 6= 0), and (c) the periods of time in which an event was scheduled for the
building. All but one of the scheduled events are detected, along with a few other
time periods (such as a period of greatly heightened activity on the first Saturday).
likelihood functions
p(N (t)|z(t)) = P(N (t); λ(t))                         z(t) = 0
                Σi P(N (t) − i; λ(t)) NBin(i)          z(t) = +1
                Σi P(N (t) + i; λ(t)) NBin(i)          z(t) = −1
(where the parameters of NBin(·) are as in (4)). Then, for t ∈ {T, . . . , 1}, we draw
samples
Z(t) ∼ p( z(t) | z(t + 1) = Z(t + 1), {N (t′ ), t′ ≤ t} ).
Given z(t) = Z(t), we can then determine N0 (t) and NE (t) by sampling. If z(t) =
0, we simply take N0 (t) = N (t); if z(t) = +1 we draw N0 (t) from the discrete
distribution
N0 (t) ∼ f+ (i) ∝ P(N (t) − i; λ(t)) NBin(i; aE , bE /(1 + bE ))
and if z(t) = −1 from the distribution
N0 (t) ∼ f− (i) ∝ P(N (t) + i; λ(t)) NBin(i; aE , bE /(1 + bE ))
then setting NE (t) = N (t) − N0 (t). Note that, if z(t) = +1, N0 takes values in
{0 . . . N }; if z(t) = −1, however, N0 has no fixed upper limit. In practice, for
computational efficiency we truncate the distribution (imposing an upper limit) at
the point given by P(N (t) + i; λ(t)) < 10^−4 .
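The discrete draw of N0(t) under a positive event can be sketched directly from f+ (Poisson and negative-binomial pmfs written out via log-gamma; the NBin parameterization used here is an assumption chosen to be consistent with the form above):

```python
import math, random

def poisson_pmf(k, lam):
    """Poisson probability P(k; lam), computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def nbin_pmf(i, a, p):
    """NBin(i; a, p) = Gamma(i+a)/(Gamma(a) i!) * p^a * (1-p)^i
    (assumed to match the text's NBin(i; aE, bE/(1+bE)))."""
    return math.exp(math.lgamma(i + a) - math.lgamma(a) - math.lgamma(i + 1)
                    + a * math.log(p) + i * math.log1p(-p))

def sample_N0_pos(N, lam, aE, bE, rng=None):
    """Draw N0 = N - i with i ~ f+(i) proportional to
    P(N - i; lam) * NBin(i; aE, bE/(1+bE)): the positive-event case,
    where N0 takes values in {0..N}."""
    rng = rng or random.Random(0)
    p = bE / (1.0 + bE)
    w = [poisson_pmf(N - i, lam) * nbin_pmf(i, aE, p) for i in range(N + 1)]
    u = rng.random() * sum(w)
    c = 0.0
    for i, wi in enumerate(w):
        c += wi
        if u < c:
            return N - i
    return 0  # numerical fallback
```

For a negative event the same construction applies with P(N + i; λ) and no upper bound on i, so in practice the support is truncated where that Poisson term becomes negligible.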
When N (t) is unobserved (missing), N0 (t) and NE (t) are coupled only through
z(t) and the positivity condition on N (t). Thus, when z(t) ≠ −1 (positive or no
event), N0 and NE can be drawn independently, and when z(t) = −1 (negative
event) they can be drawn fairly easily through rejection sampling, i.e., repeat-
edly drawing the variables independently until they satisfy the positivity condition.
Overall, missing data are relatively rare, with essentially no observations missing
in the building data and about 7% of observations missing in the traffic data (due
to loop sensor errors or down-time).
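The rejection-sampling step for missing data can be sketched as follows (Knuth's simple Poisson sampler is used here only for self-containment; all names are ours):

```python
import math, random

def poisson_draw(lam, rng):
    """Knuth's simple Poisson sampler (adequate for moderate rates)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def sample_missing(lam, gamma, rng=None):
    """When N(t) is unobserved and z(t) = -1, draw (N0, NE) by rejection:
    sample independently, repeat until the implied N(t) = N0 + NE is >= 0."""
    rng = rng or random.Random(0)
    while True:
        n0 = poisson_draw(lam, rng)
        ne = -poisson_draw(gamma, rng)   # a negative event removes counts
        if n0 + ne >= 0:
            return n0, ne
```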
and similar forms for the other zij . As noted by Scott [2002], Markov–modulated
Poisson processes appear to be relatively sensitive to the selection of prior distri-
butions over the zij and γ(t), perhaps because there are no direct observations of
the processes they describe. This appears to be particularly true for our model,
which has considerably more freedom in the anomaly process (i.e., in γ(t)) than the
telephony application of Scott [2002]. However, for an event detection application
such as those under consideration, we have fairly strong ideas of what constitutes
a “rare” event, e.g., approximately how often we expect to see events occur (say,
1–2 per day) and how long we expect them to last (perhaps an hour or two).
We can leverage this information to form relatively strong priors on the transition
parameters of z(t), constraining its marginal behavior to match these expectations.
This avoids over-explanation of the data, such as using the event process to compensate for the fact
that the “normal” data exhibits slightly larger than expected variance for Poisson
data (see Section 4.1). By adjusting these priors one can also increase or decrease
the model’s sensitivity to deviations and thus the number of events detected; see
Section 7.
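As an illustration of how such prior knowledge can be encoded, the following hypothetical helper converts an expected event frequency and duration into Dirichlet pseudocounts for the rows of Mz; the specific numbers, and the token mass on the direct +/− cross-transition, are our own assumptions:

```python
def transition_priors(slots_per_day=48, events_per_day=1.0,
                      event_len_slots=2.0, strength=1000.0):
    """Dirichlet pseudocounts for the rows [z.0, z.+, z.-] of Mz.

    Encodes: expected gap between events of slots_per_day/events_per_day
    slots (geometric), and expected event length of event_len_slots slots.
    """
    # Row for z = 0: probability of leaving the no-event state.
    leave0 = events_per_day / slots_per_day
    row0 = [strength * (1 - leave0),
            strength * leave0 / 2,      # start of a positive event
            strength * leave0 / 2]      # start of a negative event
    # Rows for z = +1 / -1: stay with probability 1 - 1/event_len_slots.
    stay = 1.0 - 1.0 / event_len_slots
    rowp = [strength * (1 - stay), strength * stay, 1.0]  # token +/- mass
    rown = [strength * (1 - stay), 1.0, strength * stay]
    return row0, rowp, rown
```

With the defaults, the implied z00 gives a mean gap of one day between events, and the implied z++ gives events lasting about two slots (one hour), as discussed above.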
Fig. 10. Data for Oct. 3, 2005, along with rate λ(t) and probability of event p(z ≠ 0). At 3:30 P.M.
an event was held in the building atrium, causing anomalies in both the incoming and outgoing
data over most of the time period.
Fig. 11. [L]: A Friday evening game, Apr. 29, 2005. Shown are (a) the prediction of normal
activity, λ(t); (b) the estimated probability of an event, p(z ≠ 0); and (c) the actual game time.
[R]: The threshold model's prediction for the same day.
Fig. 12. (a) Freeway data for May 17-18, 2005, along with rate λ(t); (b) probability of event
p(z ≠ 0); (c) actual event times.
to detect the event on day two. Our model detects both events with no false alarms,
and nicely shows the duration of the predicted events.
Table I compares the accuracies of the Markov-modulated Poisson process (MMPP)
model described in Section 5 and the baseline threshold model of Section 4.2 on
validation data not used in training the models for both the building and freeway
traffic data respectively. For each row in the table, the MMPP model parameters
Table I. Accuracies of predictions for the two data sets, in terms of the percentages of known
events found by each model, for different total numbers of events predicted. There were 29 known
events in the building data, and 78 in the freeway data.
were adjusted so that a specific number of events were detected, by adjusting the
priors on the transition probability matrix. The threshold model was then modified
to find the same number of events as the MMPP model by adjusting its threshold
ǫ.
In both data sets, for a fixed number of predicted events (each row), the number
of true events detected by the MMPP model is significantly higher than that of
the baseline model. This validates the intuitive discussion of Section 4.2 in which
we outlined some of the possible limitations of the baseline approach, namely its
inability to solve the “chicken and egg” problem and the fact that it does not
explicitly represent event persistence. As mentioned earlier, the events detected by
the MMPP model that are not in the ground truth list may plausibly correspond to
real events rather than false alarms, such as unscheduled activities for the building
data and accidents and non-sporting events for the freeway traffic data.
Negative events (corresponding to lower than expected activity) tend to be more
rare than positive events (higher than expected activity), but can play an important
role in the model. For example, in the building data the presence of holidays
(during which very little activity is observed) can corrupt the estimates of normal
behavior. If known, these days can be explicitly removed (marked as missing) [Ihler
et al. 2006], but by treating such periods as negative events the model can be made
robust to such periods. Although negative events are quite rare in the building data
(comprising about 5% of the events detected), they are more common in the traffic
data set (about 40% of events). Figure 13 shows a typical example of a negative
traffic event. Here, only three cars were observed on the ramp during a 15-minute
period with normally high traffic, followed by a 30-minute period with much higher
than normal activity. We might speculate that the initial negative event could be
due to an accident or construction, which shut down the ramp for a short period,
followed by a positive event during which the resulting build-up of cars was finally
allowed onto the highway.
8. OTHER INFERENCES
Given that our model is capable of detecting and separating the influence of unusual
periods of activity, we may also wish to use the model to estimate other quantities
of interest. For example, we can separate out the normal patterns of behavior and
use a goodness-of-fit test to answer questions about the degree of heterogeneity in
the data and thus the underlying human behavior. Alternatively, we might wish
to use our estimates of the amounts of abnormal behavior to infer other, indirectly
Fig. 13. Negative and positive events: (a) Freeway data and estimated profile for May 6; (b) when
the number of observed cars drops sharply, the probability of a negative event is high; (c) the
decrease is followed by a short but large increase in traffic, detected as a positive event.
related aspects of the event, such as its popularity or importance. We discuss each
of these cases next.
8.1 Testing Heterogeneity
One question we may wish to ask about the data is, how time-varying is the process
itself? For example, how different is Friday afternoon from the afternoon of any other
weekday? By increasing the number of degrees of freedom in our model, we improve its
potential for accuracy but may increase the amount of data required to learn the
model well. This also has important consequences in terms of data representation
(for example, compression), which may need to be a time–dependent function as
well. Thus, we may wish to consider testing whether the data we have acquired
thus far supports a particular degree of heterogeneity.
We can phrase many of these questions as tests over sub-models which require
equality among certain subsets of the variables. For example, we may wish to test
for the presence of the day effect, and determine whether a separate effect for each
day is warranted. Specifically, we might test between three possibilities:
D0 : δ1 = . . . = δ7 (all days the same)
D1 : δ1 = δ7 , δ2 = . . . = δ6 (weekends, weekdays the same)
D2 : δ1 ≠ . . . ≠ δ7 (all day effects separate)
We can compare these various models by estimating each of their marginal like-
lihoods [Gelfand and Dey 1990]. The marginal likelihood is the likelihood of the
data under the model, having integrated out the uncertainty over the parameter
values, e.g.,
p(N |D2 ) = ∫ p(N |λ0 , δ, η) p(λ0 , δ, η) dλ0 dδ dη
Since uncertainty over the parameter values is explicitly accounted for, there is no
need to penalize for an increasing number of parameters. Moreover, we can use the
same posterior samples drawn during the MCMC process (Section 6) to find the
marginal likelihood, using the estimate of Chib [1995].
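As a toy illustration of marginalizing out parameters (not the Chib estimator itself), the marginal likelihood of T i.i.d. Poisson counts under a conjugate Gamma(a, b) prior on the rate has the closed form p(N) = [b^a / Γ(a)] · Γ(a + ΣNt) / (b + T)^(a + ΣNt) · Π 1/Nt!, which in code is:

```python
import math

def log_marglik_poisson(counts, a, b):
    """log p(N | a, b) for i.i.d. Poisson counts with lambda ~ Gamma(a, b)
    (shape a, rate b), integrating lambda out analytically."""
    T, S = len(counts), sum(counts)
    out = a * math.log(b) - math.lgamma(a)                  # prior normalizer
    out += math.lgamma(a + S) - (a + S) * math.log(b + T)   # posterior normalizer
    out -= sum(math.lgamma(n + 1) for n in counts)          # 1 / prod N_t!
    return out
```

Comparing such values across constrained sub-models mirrors the comparisons reported in Tables II and III, where the integral has no closed form and is instead estimated from the MCMC output.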
Computing the marginal likelihoods for each of the models D0 , . . . , D2 for the
building data, and normalizing by the number of observations T , we obtain the
values shown in Table II. From these values, it appears that D0 (all days the same)
Table II. Average log marginal likelihood of the data (exit and entry) under various day–
dependency models: D0 , all days the same; D1 , weekends and weekdays separate; and D2 , each
day separate. There does not appear to be a significant change in behavior among weekend days
or among weekdays. Parameters ηi,j were unconstrained.
Table III. Average log marginal likelihood under various time-of-day dependency models for the
building data: T0 , all days have the same time profile; T1 , weekend days and weekdays share time
profiles; T2 , each day has its own individual time profile. The model appears to slightly prefer T1 ,
indicating strong profile similarities among weekdays and among weekends. Parameters δj were
unconstrained.
is a considerably worse model, and that D1 and D2 are essentially equal, indicating
that either model will do an equally good job of predicting behavior.
We can derive similar tests for other symmetries that might exist. For example,
we might wonder whether every day has the same time profile. (Note that this is
possible, since Sunday might be a severely squashed version of Monday, i.e., fewer
people come to work, but they follow a similar hourly pattern.) Alternatively,
is each day of the week unique, or (again) might all weekdays be the same, and
similarly weekend days? Our tests become
T0 : ∀i, η1,i = . . . = η7,i (same time every day)
T1 : ∀i, η1,i = η7,i , η2,i = . . . = η6,i (weekends, weekdays)
T2 : ∀i, η1,i ≠ . . . ≠ η7,i (all time effects separate)
The results, shown in Table III, show a small but distinct preference for T1 , indi-
cating that although weekends and weekdays have differing profiles, one can better
predict behavior by combining data across weekdays and weekends. Other tests,
such as whether Fridays differ from other days, can be accomplished using similar
estimates.
8.2 Estimating Event Attendance
Along with estimating the probability that an unusual event is taking place, as part
of the inference procedure our model also estimates the number of counts which
appear to be associated with that event. Marginalizing over the other variables,
we obtain a distribution over how many additional (or fewer) people seem to be
entering or leaving the building or the number of extra (or missing) vehicles entering
the freeway during a particular time period. One intriguing use for this information
is to provide a score, or some measure of popularity, of each event.
As an example, taking our collection of LA Dodgers baseball games, we compute
Fig. 14. The attendance of each baseball game (y-axis) shows correlation with the number of
additional (event–related) vehicles detected by the model (x-axis).
and sum the posterior mean of extra (event-related) vehicles observed, NE (t), dur-
ing the duration of the event detection. Figure 14 shows that our estimate of the
number of additional cars is positively correlated with the actual overall attendance
recorded for the games (correlation coefficient 0.67). Similar attendance scores can
be computed for the building data, and other quantities such as event duration can be
estimated, though for these examples no ground truth exists for comparison.
9. CONCLUSION
We have described a framework for building a probabilistic model of time–varying
counting processes, in which we observe a superposition of both time–varying but
regular (periodic) and aperiodic processes. We then applied this model to two
different time series of counts of the number of people entering and exiting through
the main doors of a campus building and the number of vehicles entering a freeway,
both over several months. We described how the parameters of the model may
be estimated using MCMC sampling methods, while simultaneously detecting the
presence of anomalous increases or decreases in the counts. This detection process
naturally accumulates information over time, and by virtue of having a model
of uncertainty provides a natural way to compare potentially anomalous events
occurring on different days or times.
Using a probabilistic model also allows us to pose alternative models and test
among them in a principled way. Doing so we can answer questions about how the
observed behavior varies over time, and how predictable that behavior is. Finally,
we described how the information obtained in the inference process can be used to
provide an interesting source of feedback, for example estimating event popularity
and attendance.
Although the current model is very effective in accurately detecting events in the
data sets we looked at, it is also a relatively simple model and there are a number
of potential extensions that could improve its performance for specific applications.
For example, the Poisson parameters for nearby time-periods could be coupled in
the model to share information during learning and encourage smoothness in the
inferred mean profiles over time. Similarly, it could be valuable to incorporate
additional exogenous variables into the proposed model, e.g., allowing both normal
and event-based intensities to be dependent on factors such as weather. The event
process could be generalized from a Markov to a semi-Markov process to handle
events with specific (non-geometric) duration signatures. In principle all of these
extensions could be handled via appropriate extensions of the graphical models
and Bayesian estimation techniques described earlier in the paper, with attendant
increases in modeling and computational complexity.
A further interesting direction for future work is to simultaneously model multiple
correlated time series, such as those arising from door counts from multiple doors
(and perhaps from more than one type of sensor) as well as multiple time series
from different loop sensors along a freeway. More sensors provide richer information
about occupancy and behavioral patterns, but it is an open question how these co-
varying data streams should be combined, and to what degree their parameters can
be shared.
Acknowledgments
The authors would like to thank Chris Davison and Anton Popov for their as-
sistance with logistics and data collection, and Shellie Nazarenus for providing a
list of scheduled events for the Calit2 building. This material is based upon work
supported in part by the National Science Foundation under Award Numbers ITR-
0331707, IIS-0431085, and IIS-0083489.
REFERENCES
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. 1970. A maximization technique occurring
in statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical
Statistics 41, 1 (February), 164–171.
Buntine, W. 1994. Operations for learning with graphical models. Journal of Artificial Intelli-
gence Research 2, 159–225.
Chen, C., Petty, K., Skabardonis, A., Varaiya, P., and Jia, Z. 2001. Freeway performance
measurement system: mining loop detector data. 80th Annual Meeting of the Transportation
Research Board, Washington, D.C. http://pems.eecs.berkeley.edu/.
Chib, S. 1995. Marginal likelihood from the Gibbs output. Journal of the American Statistical
Association 90, 432 (Dec.), 1313–1321.
Gelfand, A. E. and Dey, D. K. 1990. Bayesian model choice: asymptotics and exact calculations.
Journal of the Royal Statistical Society, Series B 56, 3, 501–514.
Gelfand, A. E. and Smith, A. F. M. 1990. Sampling-based approaches to calculating marginal
densities. Journal of the American Statistical Association 85, 398–409.
Geman, S. and Geman, D. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian
restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 6
(Nov.), 721–741.
Guralnik, V. and Srivastava, J. 1999. Event detection from time series data. In KDD ’99:
Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and
data mining. ACM Press, New York, NY, USA, 33–42.
Heffes, H. and Lucantoni, D. M. 1984. A Markov-modulated characterization of packetized
voice and data traffic and related statistical multiplexer performance. IEEE Journal on Selected
Areas in Communications 4, 6, 856–868.
Ihler, A., Hutchins, J., and Smyth, P. 2006. Adaptive event detection with time–varying Pois-
son processes. In KDD ’06: Proceedings of the twelfth ACM SIGKDD international conference
on Knowledge discovery and data mining. ACM Press, New York, NY, USA, 207–216.
Jordan, M. I., Ed. 1998. Learning in Graphical Models. MIT Press, Cambridge, MA.
Keogh, E., Lonardi, S., and Chiu, B. Y. 2002. Finding surprising patterns in a time series
database in linear time and space. In KDD ’02: Proceedings of the eighth ACM SIGKDD
international conference on Knowledge discovery and data mining. ACM Press, New York,
NY, USA, 550–556.
Kleinberg, J. 2002. Bursty and hierarchical structure in streams. In KDD ’02: Proceedings of
the eighth ACM SIGKDD international conference on Knowledge discovery and data mining.
ACM Press, New York, NY, USA, 91–101.
Papoulis, A. 1991. Probability, Random Variables, and Stochastic Processes, 3rd ed. McGraw-
Hill Inc., New York, NY.
Salmenkivi, M. and Mannila, H. 2005. Using Markov chain Monte Carlo and dynamic pro-
gramming for event sequence data. Knowledge and Information Systems 7, 3, 267–288.
Scott, S. 1998. Bayesian methods and extensions for the two state Markov modulated Poisson
process. Ph.D. thesis, Harvard University, Dept. of Statistics.
Scott, S. 2002. Detecting network intrusion using a Markov modulated nonhomogeneous Poisson
process. http://www-rcf.usc.edu/∼sls/mmnhpp.ps.gz.
Scott, S. L. and Smyth, P. 2003. The Markov modulated Poisson process and Markov Poisson
cascade with applications to web traffic data. Bayesian Statistics 7, 671–680.
Svensson, A. 1981. On a goodness of fit test for multiplicative Poisson models. The Annals of
Statistics 9, 4, 697–704.