
Real-Time Prediction of Taxi Demand Using Recurrent Neural Networks

Jun Xu, Rouhollah Rahmatizadeh, Ladislau Bölöni, Senior Member, IEEE, and Damla Turgut, Member, IEEE

Abstract— Predicting taxi demand throughout a city can help to organize the taxi fleet and minimize the wait-time for passengers and drivers. In this paper, we propose a sequence learning model that can predict future taxi requests in each area of a city based on the recent demand and other relevant information. Remembering information from the past is critical here, since taxi requests in the future are correlated with information about actions that happened in the past. For example, someone who requests a taxi to a shopping center may also request a taxi to return home after a few hours. We use one of the best sequence learning methods, long short-term memory, which has a gating mechanism to store the relevant information for future use. We evaluate our method on a dataset of taxi requests in New York City by dividing the city into small areas and predicting the demand in each area. We show that this approach outperforms other prediction methods, such as feed-forward neural networks. In addition, we show how adding other relevant information, such as weather, time, and drop-offs, affects the results.

Index Terms— Taxi demand prediction, time series forecasting, recurrent neural networks, mixture density networks.

Fig. 1. Taxi demand pattern in two different areas of New York City.
I. INTRODUCTION

TAXI drivers need to decide where to wait for passengers in order to pick up someone as soon as possible. Passengers also prefer to quickly find a taxi whenever they are ready for pickup. Effective taxi dispatching can help both drivers and passengers minimize the wait-time to find each other. Drivers do not have enough information about where passengers and other taxis are and intend to go. Therefore, a taxi center can organize the taxi fleet and efficiently distribute it according to the demand from the entire city [1], [2]. Such a taxi center is especially needed in the future, when self-driving taxis will need to decide where to wait and pick up passengers. To build such a taxi center, an intelligent system that can predict the future demand throughout the city is required.

Predicting taxi demand is challenging because it is correlated with many pieces of underlying information. One of the most relevant sources of information is historical taxi trips. Thanks to Global Positioning System (GPS) technology, taxi trip information can be collected from GPS-enabled taxis [3], [4]. Fig. 1 shows an example of taxi request patterns in two areas. Analyzing this data shows that there are repetitive patterns in the data that can help to predict the demand in a particular area at a specific time. Several previous studies have shown that it is possible to learn from past taxi data [5]–[8].

In this paper, we propose a real-time method for predicting taxi demand in different areas of a city. We divide a big city into smaller areas and aggregate the number of taxi requests in each area during a small time period (e.g., 20 minutes). In this way, past taxi data becomes a data sequence of the number of taxi requests in each area. Then, we train a Long Short-Term Memory (LSTM) [9] recurrent neural network (RNN) with this sequential data. The network input is the current taxi demand and other relevant information, while the output is the demand in the next time-step. The reason we use an LSTM recurrent neural network is that it can be trained to store all the relevant information in a sequence to predict particular outcomes in the future. In addition, taxi demand prediction is a time series forecasting problem in which an intelligent sequence analysis model is required. LSTMs are the state-of-the-art sequence learning models and are widely used in applications such as unsegmented handwriting generation [10] and natural language processing [11]. LSTM is capable of learning long-term dependencies by utilizing gating mechanisms to store information. Therefore, it can, for instance, remember how many people have requested taxis to attend a concert and, after a couple of hours, use this information to predict that the same number of people will request taxis from the concert location to different areas.

However, predicting real-valued numbers is tricky because simply learning the average of the values in the dataset often does not give a valid solution. It will also confuse the LSTM in the next time-step, since the network has not seen the average before. Therefore, we add Mixture Density Networks (MDN) [12] on top of the LSTM. In this way, instead of directly predicting a demand value, we output a mixture distribution of the demand. A sample can be drawn from this probability distribution and treated as the predicted taxi demand.

The remainder of this paper is organized as follows. Section II introduces related work on prediction applications using past taxi data and sequential learning applications of LSTMs. Section III shows how we encode the huge number of GPS records and gives a brief explanation of recurrent neural networks. Section IV describes the proposed sequence learning model, as well as the training and testing procedures. In Section V, we show the performance metrics and present the experiment results. Lastly, in Section VI we conclude the paper.

Manuscript received May 31, 2017; revised August 16, 2017; accepted September 15, 2017. The Associate Editor for this paper was S. Djahel. (Corresponding author: Jun Xu.) The authors are with the University of Central Florida, Orlando, FL 32816 USA (e-mail: junxu@eecs.ucf.edu). Digital Object Identifier 10.1109/TITS.2017.2755684
II. RELATED WORK

A. Prediction Applications Using Past Taxi Data

There are few previous research works on taxi demand prediction. Zhang et al. [5] propose a passenger hot-spots recommendation system for taxi drivers. By analyzing the historical taxi data, they extract hot-spots in each time-step and assign a hotness score to each of them. This hotness score is predicted in each time-step and, combined with the driver's location, the top-k hot-spots are recommended. Zhao et al. [6] define a maximum predictability for taxi demand at the street-block level. They compute the real entropy of past taxi demand sequences, which proves that taxi demand is highly predictable. They also implement three prediction algorithms to validate their maximum predictability theory. Moreira-Matias et al. [13] propose a framework consisting of three different prediction models. In each time-step, the predicted demand is a weighted ensemble of the predictions from the three models. The ensemble weights are updated with the individual prediction performances of previous time-steps in a sliding window. Their framework can make short-term demand predictions for the 63 taxi stands in the city of Porto, Portugal. Davis et al. [14] use time-series modeling to forecast taxi travel demand in the city of Bengaluru, India. This information can be given to the drivers in a mobile application so that they know where the demand is higher.

In addition, dispatching centers based on historical and real-time taxi data have been modeled in some studies. Zhang et al. [7] propose a real-time taxi dispatching application. In their system, two kinds of passengers are defined to model real-time taxi demand: previously left-behind passengers and future arriving passengers. Both left-behind and arriving passengers can be simulated at the dispatch center based on the real-time GPS traces of each taxi. A demand inference model called Dmodel is proposed, using hidden Markov chains to model the state changes of both left-behind and arriving passengers. Miao et al. [8] propose a dispatching framework for balancing taxi supply in a city. Their goal is to match taxi demand and supply and minimize taxi idle driving distance. In their work, the next time-step taxi demand is calculated as the mean value of repeated samples from historical data.

More prediction applications using historical taxi information can be found on topics such as taxi sharing and destination prediction. Yuan et al. [15] present a recommender system for taxi drivers and people expecting to take a taxi. They make this recommendation by learning from taxi GPS traces and the mobility patterns of passengers. Ma et al. [16] propose a taxi ride-sharing system that efficiently serves real-time requests sent by taxi users. Rong et al. [17] model passengers seeking taxis as a Markov Decision Process (MDP) and propose a method to increase the revenue efficiency of taxi drivers. Azevedo et al. [18] look further into the future and investigate the problem of improving the mobility intelligence of self-driving vehicles through large-scale data analysis. For a more extensive survey of different approaches to analyzing and learning from taxi GPS traces, the reader is referred to a survey [19] that focuses on this topic.

B. Sequential Learning Applications With LSTMs

de Brébisson et al. [20] propose to use recurrent neural networks to predict the taxi destination given the beginning of the taxi trip's GPS trace. In contrast, in our work the network learns the past taxi demand pattern and continuously predicts when and where a new taxi trip will start.

There are many other applications in which LSTMs have been widely used. Graves [10] proposed online handwriting sequence generation with LSTMs. In this application, the data sequence consists of x and y pen coordinates and the points in the sequence where the pen is lifted off. His model can generate highly realistic handwriting. Rahmatizadeh et al. [21] propose to use LSTMs to learn sequential trajectories for a robot arm. Their goal is to make the robot perform complex manipulation tasks in the real world. Other successful applications include language modeling [22], speech recognition [23] and visual recognition [24]. LSTMs perform very well in all these sequential learning applications.

Overall, the aforementioned works on taxi demand prediction motivated us to rely on historical taxi trip information to predict future taxi demand. In terms of taxi demand prediction, most works either use a weighted method on previous taxi demands or a time series fitting model to fit the demand sequence. The problem is that when the data sequence is very long, the performance of these approaches is poor. Furthermore, a time series fitting model has to be trained separately for each area; hence, the patterns learned in one area cannot be used in other areas.

One of the differences between our work and the previous works is that our model can capture long-term dependencies between events in a sequence that happen very far away from each other. We train our network on sequences that are as long as a week, and this can easily be extended to a month or a year if we have enough computational power to train the network. Another advantage of our model is that we predict all the areas of a city at once using a single model. With this formulation, the patterns learned by the LSTM in one area can be used in other areas. Additionally, our model predicts the entire probability distribution of taxi demands instead of deterministically predicting the number of requests for each area. This approach gives a more realistic prediction as it takes the uncertainty into account while predicting.

III. MATHEMATICAL MODEL

In this section, we first introduce how we convert the high-resolution GPS data into the number of taxi requests in each small area of the city. Then, we briefly explain the mathematical formulation of recurrent neural networks.

Fig. 2. A recurrent neural network.

Fig. 3. An unrolled recurrent neural network.

A. GPS Data Encoding

In order to be able to accurately predict the demand, we divide the entire city into small areas. It is desirable to predict taxi demand in small areas so that the drivers know exactly where to go. However, learning to predict taxi demand in very small areas is difficult. So, we need to choose an area size which is both easy to predict and sufficiently accurate for the drivers. In this paper, we use the Geohash library [25], which can divide a geographical area into smaller subareas with arbitrary precision. Geohash is a geocoding system with a hierarchical spatial data structure that subdivides space into buckets of grid shape. The size of the grid is determined by the number of characters used in the geohash code.

In our experiment, we use taxi data from 1/1/2013 through 6/30/2016 in NYC [26], which includes around 600 million taxi trips after data filtering. The dataset specifies the GPS location and the timestamp for each drop-off and pick-up event. In our experiments, we divide the entire New York City into around 6500 areas, with a Geohash precision of 7. Each area is smaller than 153 m × 153 m under this precision. We then count the number of taxi requests during every time-step. In this way, the historical taxi data in each area becomes a sequence of request counts. These data sequences are fed into the LSTM for sequential pattern learning.
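This encoding step maps naturally onto a few lines of code. The following is a minimal sketch, assuming a pandas DataFrame of raw pickup records and the python-geohash package; the column names are illustrative and not the paper's actual pipeline.

```python
# A minimal sketch of the encoding step described above. Assumes a pandas
# DataFrame of pickup records (with a datetime64 pickup_datetime column)
# and the python-geohash package; column names are illustrative.
import geohash  # pip install python-geohash
import pandas as pd

def encode_pickups(trips: pd.DataFrame, precision: int = 7,
                   step: str = "20min") -> pd.DataFrame:
    """Aggregate raw pickup events into per-area counts per time-step."""
    trips = trips.copy()
    # Map each GPS pickup to a Geohash cell (precision 7 ~ 153 m x 153 m).
    trips["area"] = [
        geohash.encode(lat, lon, precision)
        for lat, lon in zip(trips["pickup_lat"], trips["pickup_lon"])
    ]
    # Bucket pickups into fixed-length time-steps (e.g. 20 minutes).
    trips["slot"] = trips["pickup_datetime"].dt.floor(step)
    # One row per time-step, one column per area, cell = number of requests.
    counts = (trips.groupby(["slot", "area"]).size()
                   .unstack(fill_value=0)
                   .sort_index())
    return counts  # successive rows form the demand sequence fed to the LSTM
```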
B. Recurrent Neural Networks

The sequential nature of taxi demand data leads us to the choice of a model that can handle time series data. Recurrent neural networks (RNNs) are one of the most popular models that can process sequential data very well. The idea behind RNNs is to store relevant parts of the input and use this information while predicting the output in the future. Unlike feed-forward neural networks, which only predict the output based on the current input, RNNs contain memory in which some important information from the past inputs can be stored. For instance, when we train RNNs on a language modeling task in which we generate text one character at each time-step, it is better to store what characters the network has predicted in the previous time-steps, since the next character depends on the previous predictions.

RNNs are called recurrent because they perform the same computation on every element of a sequence, with the output being conditioned on the previous computations. A typical RNN structure is given in Fig. 2. As we can see, the RNN processes input x, stores hidden state h, and outputs y at each time-step t. A loop allows information to be passed over from one step to the next. All Ws are the shared weights among different time-steps. To train these weights, we unroll the network for a finite number of time-steps, as shown in Fig. 3.

When the network is unrolled, it is clearer why it is being used for sequence learning and how the information is being passed to the future. The computation at each time-step can be formulated as follows:
- $x_t$ is the input at time-step t.
- $h_t$ is the hidden state at time-step t. It is calculated based on the previous hidden state and the current input with the application of a non-linearity. In most RNN implementations, this non-linearity is a hyperbolic tangent: $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$.
- $y_t$ is the output at time-step t. We can decide what it looks like according to the task. For example, in predicting the next word in a sentence, the output $y_t$ can be a probability distribution over a vocabulary.

All parameters $W_{xh}$, $W_{hh}$ and $W_{hy}$ are shared among the unrolled time-steps. So the network actually performs the same computation at each time-step, but with different inputs $x_t$. This greatly reduces the total number of parameters in the network and avoids over-fitting on smaller datasets. The hidden state $h_t$ is the main feature of RNNs. It works as the network memory, which captures useful information about what happened in all the previous time-steps.

Currently, the most commonly used type of RNN is the Long Short-Term Memory network (LSTM). LSTMs are a special kind of RNN, capable of learning long-term dependencies due to their gating mechanism. They were introduced by Hochreiter & Schmidhuber [9], and were refined and popularized by many people in the following years.
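The recurrence in the list above can be written out directly. The following numpy sketch implements the unrolled forward pass with the tanh update; the layer sizes are illustrative, and this is the vanilla RNN rather than the paper's LSTM.

```python
# A numpy sketch of the vanilla RNN recurrence described above; shapes are
# illustrative. The same weights are applied at every time-step.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 10, 32, 5

W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))
b_h = np.zeros(n_hidden)

def rnn_forward(xs):
    """Run the unrolled RNN over a sequence xs of shape (T, n_in)."""
    h = np.zeros(n_hidden)          # h_0: initial hidden state
    ys = []
    for x_t in xs:                  # identical computation at each step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)   # hidden-state update
        ys.append(W_hy @ h)         # task-specific readout y_t
    return np.stack(ys), h

ys, h_T = rnn_forward(rng.normal(size=(7, n_in)))  # e.g. a 7-step sequence
```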

IV. TAXI DEMAND PREDICTION MODEL

In this section, we discuss the sequence learning model. The number of taxi requests in each area depends on many underlying factors unavailable to our model. This naturally causes uncertainty in the model. So instead of forecasting a deterministic taxi demand value, we use a stochastic model that can predict the entire probability distribution of taxi demands in all areas. We then use this probability distribution to decide the taxi demand for each area.
A. Mixture Density Networks

The most successful applications of neural networks have been achieved on classification tasks. When it comes to predicting real-valued data, the choice of network structure is very important. The idea of mixture density networks (MDNs) [12] is to use the outputs of a neural network to parameterize a mixture distribution. Unlike a model with a mean squared error (MSE) cost, which is deterministic, MDNs can model stochastic behaviors. They can be used in prediction applications in which an output may have multiple possible outcomes. In our application, rather than directly predicting the number of taxi requests, the neural network outputs the parameters of a mixture model. These parameters are the mean and variance of each Gaussian kernel, and also the mixing coefficient of each kernel, which shows how probable that kernel is. Given the parameters of the mixture distribution, we can draw a sample from it and use this sample as the final prediction.
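To make this concrete, the following toy sketch (a 1-D mixture, not the paper's full multivariate head) shows how raw network outputs can be turned into mixture parameters and how a prediction is drawn as a sample.

```python
# A toy 1-D illustration of the MDN idea: raw network outputs are split into
# mixing coefficients, means, and standard deviations, and a sample is drawn
# from the resulting Gaussian mixture. The paper's actual head is multivariate.
import numpy as np

def softmax(z):
    z = z - z.max()                     # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mdn_sample(raw, M, rng):
    """raw: network output of length 3*M -> one sample from the mixture."""
    w = softmax(raw[:M])                # mixing coefficients, sum to 1
    mu = raw[M:2 * M]                   # means, used as-is
    sigma = np.exp(raw[2 * M:])         # exp keeps scales positive
    k = rng.choice(M, p=w)              # pick a kernel with probability w_k
    return rng.normal(mu[k], sigma[k])  # sample from the chosen Gaussian

rng = np.random.default_rng(0)
pred = mdn_sample(rng.normal(size=15), M=5, rng=rng)  # one predicted value
```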

B. LSTM-MDN Sequence Learning Model

As described in Section III, we divide the entire city into small areas and convert the past taxi data into week-long data sequences. Fig. 4 shows the structure of the data sequence at one time-step. For each time-step t, the data consists of two parts: $e_t$ and $d_t$. $e_t$ represents the number of pickups in each area, and its length is the number of small areas in the entire city. $d_t$ represents the date, day of the week, hours, minutes and other impacting factors at time-step t. The input to the network at each time-step is $x_t = \{e_t, d_t\}$, and the network tries to predict $y_t = e_{t+1}$.

Fig. 4. The input data structure at one time-step.

The sequence learning model is created based on an LSTM recurrent neural network and MDNs. Fig. 5 shows the structure of the unrolled LSTM-MDN learning model. The total unrolling length is a hyper-parameter that can be set according to the testing scenario. The LSTM can encode the useful information of the past in a single layer or in multiple layers. The input to each layer is the output of the previous layer concatenated with the network input. Each LSTM layer predicts its output based on its current input and its internal state. The concatenation of the outputs of all layers is used to predict the output of the network, which is compared with $y_t$, the real demand value from the dataset, to form the error signal. We use two LSTM layers, in which each layer contains 1200–1500 neurons based on the specific testing scenario.

Fig. 5. The LSTM-MDN learning model unrolled through time-steps.

As shown in Fig. 5, the outputs of the LSTM layers are mixture density parameters, $M \times (N + 2)$ in total, in which M is the number of Gaussian kernels and N is the number of areas in the city. For each Gaussian kernel, we have N neurons for the means $\mu_k(x)$, one neuron for the variance $\sigma_k(x)$, and another neuron for the mixing coefficient $w_k(x)$. To satisfy the constraint $\sum_{k=1}^{M} w_k(x) = 1$, the corresponding neurons are passed through a softmax function. The softmax function is regularly used in multiclass classification methods to "squash" a vector of n arbitrary real values z into a set of values that add up to 1 and can be interpreted as probabilities:

$$\mathrm{softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{n} e^{z_k}} \qquad (1)$$

The neurons corresponding to the variances $\sigma_k(x)$ are passed through an exponential function, and the neurons corresponding to the means $\mu_k(x)$ are used without any further changes. The probability density of the next output $y_t$ can be modeled using a weighted sum of M Gaussian kernels:

$$p(y_t \mid x) = \sum_{k=1}^{M} w_k(x)\, g_k(y_t \mid x) \qquad (2)$$

where $g_k(y_t \mid x)$ is the $k$-th multivariate Gaussian kernel. Note that both the mixing coefficients and the Gaussian kernels are conditioned on the complete history of the inputs up to the current time-step, $x = \{x_1, \ldots, x_t\}$. The multivariate Gaussian kernel can be represented as:

$$g_k(y_t \mid x) = \frac{1}{(2\pi)^{N/2}\,\sigma_k(x)^{N}} \exp\!\left(-\frac{\lVert y_t - \mu_k(x)\rVert^2}{2\,\sigma_k(x)^2}\right) \qquad (3)$$

where the vector $\mu_k(x)$ is the center of the $k$-th kernel. We do not calculate the full covariance matrix for each kernel, since this form of Gaussian mixture model is general enough to approximate any density function [27]. Finally, we can define the error function in terms of the negative log-likelihood:

$$E_t = -\ln \sum_{k=1}^{M} w_k(x)\, g_k(y_t \mid x) \qquad (4)$$

After the model is trained, we can make a prediction for time-step t + 1 by inputting the taxi demand at time-step t. As we can see in Fig. 6, we use the output, which is the mixture density parameters, to parameterize a Gaussian mixture distribution. A sample can then be drawn from this distribution, and this sample is the prediction of the next time-step's taxi demand, $\hat{e}_{t+1}$. This prediction can be repeated in a loop to predict taxi demand for multiple time-steps.

Fig. 6. The LSTM-MDN model performing one prediction for time t + 1.
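For illustration, Eqs. (1)–(4) for one time-step can be sketched in numpy as follows, assuming the raw LSTM outputs have already been split into the three parameter groups described above; the log-sum-exp form is an implementation detail added here for numerical stability, not something specified in the paper.

```python
# A numpy sketch of Eqs. (1)-(4) at one time-step. w_raw, mu, sigma_raw are
# the M x (N + 2) raw outputs of the LSTM head; y_t is the real demand vector
# over the N areas. Log-sum-exp keeps the loss numerically stable.
import numpy as np

def logsumexp(z):
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def mdn_loss(w_raw, mu, sigma_raw, y_t):
    """w_raw: (M,), mu: (M, N), sigma_raw: (M,), y_t: (N,) -> scalar E_t."""
    M, N = mu.shape
    log_w = w_raw - logsumexp(w_raw)            # log of the softmax, Eq. (1)
    sigma = np.exp(sigma_raw)                   # one shared sigma per kernel
    sq = ((y_t - mu) ** 2).sum(axis=1)          # ||y_t - mu_k(x)||^2
    # log g_k(y_t | x) for the isotropic multivariate Gaussian of Eq. (3)
    log_g = (-0.5 * N * np.log(2 * np.pi)
             - N * np.log(sigma)
             - sq / (2 * sigma ** 2))
    # E_t = -ln sum_k w_k g_k, Eq. (4), via log-sum-exp over the kernels
    return -logsumexp(log_w + log_g)

rng = np.random.default_rng(0)
M, N = 5, 6494                                  # kernels, areas (Table I scale)
E_t = mdn_loss(rng.normal(size=M), rng.normal(size=(M, N)),
               rng.normal(size=M), rng.normal(size=N))
```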
C. LSTM-MDN-Conditional Model

In the LSTM-MDN model, the probability distributions of taxi demands in all areas are predicted at the same time-step. This means that the prediction in each area is conditioned on all areas of all previous time-steps. However, the taxi demand in an area might be related not only to the past, but also to the taxi demands of other areas in the current time-step. So instead of outputting a joint distribution for all areas in a single time-step, we ask the network to predict the conditional distribution of each area at a single time-step. This approach has been adopted in other works such as Pixel RNNs [28] and the Neural Autoregressive Distribution Estimator (NADE) [29]. Fig. 7 shows the idea of generating $y_t^i$ conditioned on all the previously predicted demands. Here $y_t^i$ represents the predicted taxi demand in area i at time-step t.

Fig. 7. Generating conditional distributions in a sequential way.

We call this approach the LSTM-MDN-Conditional model. It has the same input $x_t$ as the LSTM-MDN model, but each $x_t$ only leads to one area's taxi demand output. Unlike the LSTM-MDN model, which predicts taxi demands for all areas at once, this model sequentially predicts the demand for each area in a conditional way. Fig. 8 shows the unrolled LSTM-MDN-Conditional model structure for one time-step prediction.

Fig. 8. The unrolled LSTM-MDN-Conditional model for one time-step prediction.

This LSTM-MDN-Conditional model not only learns demand patterns from past taxi data, but also takes into account the current demands in other areas. Training such a model takes much longer than the LSTM-MDN model because the LSTM needs to be unrolled for many more time-steps. For a city with N areas, the model needs to be unrolled N times more compared to the LSTM-MDN model. Fig. 9 shows a density map of real and predicted taxi demands over the entire city.

Fig. 9. The density map of real demand and the predicted demand. Red areas show high demand for taxis, yellow areas show lower demand, and there is no demand in other areas. The figure illustrates that the difference between the prediction and the real value is very small.
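This conditional factorization can be sketched as a NADE-style sampling loop over areas; `predict_area_distribution` below is a hypothetical stand-in for one unrolled step of the conditional network, not the paper's actual interface.

```python
# Sketch of the conditional factorization
# p(y_t) = prod_i p(y_t^i | y_t^1 .. y_t^{i-1}, past).
# predict_area_distribution is a hypothetical callback standing in for one
# unrolled LSTM-MDN-Conditional step; it returns mixture parameters for area i.
import numpy as np

def sample_conditional_demand(predict_area_distribution, n_areas, rng):
    """Sequentially sample the demand of every area at one time-step."""
    sampled = []
    for i in range(n_areas):
        # Distribution for area i, conditioned on areas 0..i-1 just sampled
        # (and, inside the model, on the whole past sequence).
        w, mu, sigma = predict_area_distribution(i, np.array(sampled))
        k = rng.choice(len(w), p=w)              # choose a mixture kernel
        sampled.append(rng.normal(mu[k], sigma[k]))
    return np.array(sampled)                     # y_t for all areas
```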
V. EXPERIMENTAL STUDY

In this section, we evaluate the proposed LSTM-MDN and LSTM-MDN-Conditional models on a dataset of taxi requests and see how well they can predict taxi demand in the future. In addition, we compare our models with two other baselines and show that they outperform both.

A. Experimental Setup

We evaluate the performance of the proposed network models with the New York City taxi trip dataset [26]. There are two kinds of cabs in NYC: the yellow cabs, which operate mostly in Manhattan, and the green cabs, which operate mostly in the suburbs. The dataset contains taxi trips from January 2009 through June 2016 for both yellow and green cabs. Each taxi trip has pickup time and location information. In this application, we use the most recent 3.5 years of data, from January 2013 through June 2016, which contains over 600 million taxi trips after data filtering. We use 80% of the data for training and keep the remaining 20% for validation. The network model is implemented in the Blocks [30] framework, which is built on top of Theano [31]. We stop the training when the validation error does not change for 20 epochs. The training takes 2-4 hours on a GTX 1080 for each of the experiments. After training, the time it takes for the network to predict the demand is less than a second. Note that the prediction time is more important than the training time. This is because the model can be trained once, but once deployed it needs to predict the demand in a loop to provide this information in real-time.

Theoretically, the LSTM can be trained using arbitrary sequence lengths. However, constrained by the computational power, we use each week of data as a sequence and cut it into time-steps of different lengths. For example, if the time-step length is 60 minutes, the sequence length is 24 × 7. If the time-step length is 20 minutes, the sequence length is 24 × 3 × 7. For the 60-minute case, the encoded input data shape is (182, 168, 6494), in which 182 is the total number of sequences in the dataset, i.e., the number of weeks in the 3.5 years, 168 is the sequence length (24 × 7), and 6494 is the number of features, consisting of the number of areas and impacting factors such as the date, day of the week and other information. Table I lists the parameters of the experiments.

TABLE I. EXPERIMENTAL PARAMETERS
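Under these assumptions, cutting the encoded data into week-long sequences is a simple reshape; the sketch below reproduces the (182, 168, 6494) shape for the 60-minute case with placeholder data.

```python
# A sketch of cutting 3.5 years of encoded data into week-long sequences for
# the 60-minute case, yielding the (182, 168, 6494) tensor described above.
# `features` is assumed to hold one row per time-step (area counts + factors).
import numpy as np

steps_per_week = 24 * 7                            # 168 steps of 60 minutes
features = np.zeros((182 * steps_per_week, 6494))  # placeholder for real data

n_weeks = features.shape[0] // steps_per_week      # 182 weeks in 3.5 years
sequences = features[:n_weeks * steps_per_week].reshape(
    n_weeks, steps_per_week, -1)                   # -> (182, 168, 6494)

split = int(0.8 * n_weeks)                         # 80% train / 20% validation
train, valid = sequences[:split], sequences[split:]
```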
on another two strategies: the fully connected feed-forward
B. Performance Metrics and Baselines

To systematically examine the performance of our prediction approach, we report results with two widely used prediction error metrics: the Symmetric Mean Absolute Percentage Error (sMAPE) [13], [32] and the Root Mean Square Error (RMSE) [33], [34]. $Y_{i,t}$ is the real taxi demand in area i at time-step t, while $\hat{Y}_{i,t}$ is the predicted taxi demand. The sMAPE and RMSE in area i over time-steps 1 to T are:

$$sMAPE_i = \frac{1}{T} \sum_{t=1}^{T} \frac{|Y_{i,t} - \hat{Y}_{i,t}|}{Y_{i,t} + \hat{Y}_{i,t} + c} \qquad (5)$$

$$RMSE_i = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left(Y_{i,t} - \hat{Y}_{i,t}\right)^2} \qquad (6)$$

The constant c in Eq. 5 is a small number (c = 1 in this application) to avoid division by zero when both $Y_{i,t}$ and $\hat{Y}_{i,t}$ are 0. Similarly, when evaluating the prediction performance over the entire city, the sMAPE and RMSE of all areas at time-step t are:

$$sMAPE_t = \frac{1}{N} \sum_{i=1}^{N} \frac{|Y_{i,t} - \hat{Y}_{i,t}|}{Y_{i,t} + \hat{Y}_{i,t} + c} \qquad (7)$$

$$RMSE_t = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(Y_{i,t} - \hat{Y}_{i,t}\right)^2} \qquad (8)$$

Here N is the total number of areas in the city. From a statistical perspective, RMSE shows the difference of the predicted value from the real value, while sMAPE describes a percentage error.

To evaluate the performance of the proposed models, we compare the outcomes with prediction approaches based on two other strategies: fully connected feed-forward neural networks and a naive statistical average.

1) Fully Connected Feed-Forward Neural Network Predictor: Feed-forward neural networks are commonly used for classification and regression problems. Feed-forward neurons have connections from their input to their output. The main difference between feed-forward neural networks and recurrent neural networks is that in RNNs, the recurrent connection from the output to the input at the next time-step makes the network capable of storing information. In this approach, the layers are fully connected, which means that the neurons of two adjacent layers are all connected together.
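Eqs. (5)–(8) translate directly into numpy; the sketch below implements the city-wide variants (Eqs. 7 and 8) with c = 1, as stated above.

```python
# Numpy sketch of the city-wide error metrics of Eqs. (7) and (8), with the
# constant c = 1 used in the paper to avoid division by zero.
import numpy as np

def smape_t(y_true, y_pred, c=1.0):
    """Symmetric mean absolute percentage error over all N areas, Eq. (7)."""
    return np.mean(np.abs(y_true - y_pred) / (y_true + y_pred + c))

def rmse_t(y_true, y_pred):
    """Root mean square error over all N areas, Eq. (8)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([12., 0., 3.])   # real demand in three areas
y_pred = np.array([10., 0., 5.])   # predicted demand
print(smape_t(y_true, y_pred), rmse_t(y_true, y_pred))
```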

2) Naive Statistical Average Predictor: This approach predicts based on the mean value of past demands in a sliding window. For example, if it is 10:00 am on Monday, the predicted value is the average of the demands at 10:00 am in the past 5 Mondays. While we use the term "naive", even this approach requires the maintenance of long-term, detailed statistics of both the spatial and temporal distributions of pickups. It is very likely a good approximation of what taxi companies currently deploy.
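A sketch of this baseline follows, assuming past demand counts are arranged as a (weeks, slots-per-week, areas) array; the array layout is an assumption made here for illustration.

```python
# Sketch of the naive statistical average baseline: predict the mean demand
# of the same time slot over the last `window` weeks (e.g. 10:00 on Mondays).
import numpy as np

def naive_predict(demand_by_week, week_idx, slot_idx, window=5):
    """demand_by_week: (weeks, slots_per_week, areas) array of past counts."""
    start = max(0, week_idx - window)
    past = demand_by_week[start:week_idx, slot_idx]   # same slot, past weeks
    return past.mean(axis=0)                          # per-area average

weeks = np.random.default_rng(0).poisson(8., size=(20, 168, 4))
pred = naive_predict(weeks, week_idx=10, slot_idx=34)  # e.g. Monday 10:00
```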

C. Performance Results

First, we report the prediction sMAPE and RMSE over the entire city (all prediction areas). Second, we show the prediction performance in some specific areas as time passes. Lastly, we analyze the importance of different impacting factors on prediction performance. For the four predictors based on LSTM-MDN-Conditional, LSTM-MDN, fully connected feed-forward neural networks and the naive statistical average, we use LSTM-C, LSTM, FC and Naive for short, respectively.

We do not include the LSTM-C predictor in the entire-city performance comparison. We only evaluate the LSTM-C model on specific areas. This is because conditioning only neighboring areas on each other should be enough. Two areas that are very far from each other most probably will not affect each other. In addition, if we condition more areas on each other, the LSTM will have a difficult time relating events that happen many time-steps away from each other.

1) Performance Over the Entire City: To evaluate the prediction performance over the entire city, which includes about 6500 areas, we compare the performance of the LSTM predictor, the FC predictor and the Naive predictor in terms of RMSE and sMAPE from Eq. 7 and Eq. 8.

We report the sMAPE and RMSE over the entire city in Fig. 10 and Fig. 11. As we can see, though they are different prediction error metrics, they share some common patterns. For instance, both of them reach their minimum values at about 3am and peak at about 8am and 10pm. In both figures, LSTM shows better prediction performance than the FC and Naive predictors. In Fig. 10, sMAPE shows the mean absolute percentage error, which gives us a way to calculate the prediction accuracy and observe that it is more than 80%. In Fig. 11, RMSE shows the root mean squared difference between the predicted demands and the real demands. Note that, for the real demands in different areas, we have min = 0, max = 535 and standard deviation σ = 12.0. The time-step length is 60 minutes.

Fig. 10. Prediction performance of different approaches according to sMAPE.

Fig. 11. Prediction performance of different approaches according to RMSE.

Fig. 12 reports the error bars of the prediction RMSE over the entire city, with the standard deviation over one week. We show this RMSE with different time-step lengths for the LSTM predictor. Basically, a smaller time-step length means a smaller number of pickups in each time slot, which does affect the final RMSE. To avoid this, we sum up the predicted number of pickups every 60 minutes. As we can see in Fig. 12, the model has the minimum RMSE at a time-step length of either 10 or 20 minutes. Overall, the RMSEs are very close under different time-step lengths.

Fig. 12. RMSE of different time-step lengths. With the real number of pickups, min = 0, max = 535 and standard deviation σ = 12.0.

2) Performance at Specific Areas: We compare the prediction performances of the LSTM-C, LSTM, FC and Naive predictors in specific areas. First of all, we select two areas whose taxi demands over a week are shown in Fig. 13-a and Fig. 13-b. The reason we select these two areas is that both of them show regular demand patterns on both weekdays and weekends. The first area is a working area, while the second area is one of the most popular areas (in terms of taxi requests) according to our analysis of all the past taxi data in NYC.

Fig. 13. Comparison in areas with different demand patterns.

The first area is close to the intersection of West 40th Street and 9th Ave in downtown, while the second area is close to the intersection of West 33rd Street and 7th Ave.

The time-step length used here is 60 minutes in both areas. The right part of Fig. 13 shows the prediction sMAPE for each day in both areas. We include 5 continuous weeks of prediction results and show each standard deviation with an error bar. As we can see from Fig. 13, in both areas, the LSTM-C and LSTM models outperform the FC and Naive models on most days. This shows that LSTM is good at learning sequential information, even when the sequence length is as long as a week. LSTM-C can give more accurate prediction results than LSTM, because it considers both past information and the current conditional information of other areas. But it requires a lot of computational power to train such a conditional model. FC sometimes results in larger errors than the Naive predictor, due to the irregularities in sequence patterns.

3) Importance of Impacting Factors: In this part, we report the importance of the different impacting factors in our prediction models. The impacting factors in our model include Date & Time, Day of week, Weather and Drop-offs. Date & Time includes the year, month, date and time of each time-step. Day of week represents which day of the week that time-step falls on. Weather includes 4 different types: rain, snow, fog and thunder. We get the official weather information of NYC from the National Oceanic and Atmospheric Administration (NOAA). It includes climate observations from three land-based stations in the city. We also include the number of Drop-offs as an impacting factor, to see if there is any relation between the pickups and drop-offs in each area. To show the impact of these inputs on prediction performance, we conduct two experiments. Both experiments are implemented on the LSTM-MDN model, since we evaluate the prediction performance for the entire city.

TABLE II. MODEL WITH DIFFERENT IMPACTING FACTORS: I

In experiment I, we want to show the impact of each single factor on prediction performance. We design different models, shown in Table II, with each single factor as the model input. The control group here is the model with only past taxi pickups as input. All models are expected to output the next time-step taxi demand in the city. We then train each model and show the prediction performance in Fig. 14.

As shown in Fig. 14, pickup is the most valuable information in making future taxi demand predictions. In models B and C, since no past taxi trip information is provided, the models are trying to remember the mapping from the input to the real taxi demand at each time-step. In this case, the LSTM works similarly to the feed-forward neural networks. Model D has the worst performance for two reasons: one reason is that it is hard for the model to make a taxi demand prediction based only on weather information. Another reason is that we only have climate information from three land-based stations, which is not a good descriptor of the whole city. In model E, we use the number of drop-offs in each area as input. It is interesting to find that it can give a prediction performance close to model C, which means that the drop-off pattern has a close relation to the pickup pattern. However, the drop-off information is not as useful as the pickup information. This is because the network needs to remember the drop-off information to use it later. But the pickup information is the most informative feature, probably because it does not change much in one time-step (from input to output).

Fig. 14. Prediction performance on different single impacting factors.

In experiment II, we add impacting factors to the number of pickups as model inputs, to see if they really improve the prediction accuracy. Table III shows the input to each model. In the last group, model F, we include pickups and all impacting factors as input.

TABLE III. MODEL WITH DIFFERENT IMPACTING FACTORS: II

Fig. 15 shows each model's prediction performance. It can be seen that all the models have close performances. The reason is that the pickup information is so informative that it makes the effect of the other factors very small. Models D and E have the worst performance compared with the other models. In model F, we include all impacting factors. As we can see, the median prediction error is about 17%, which means a median accuracy of around 83% can be obtained.

Fig. 15. Prediction performance on different impacting factors.

Overall, the experimental results show that LSTM-C and LSTM outperform the other prediction approaches. This is because LSTM can see and process information from the previous time-steps. For instance, if a group of people request taxis to go to a concert, it will remember this information and use it to predict that after a couple of hours there will be almost the same number of requests in the concert area. The FC network can find the best mapping from the time and geographical information to the number of requests without having access to the demand in the previous time-steps. This limitation causes larger errors in its prediction. The Naive approach is even more restricted, since it has access to only a small history of the demand in one area, unlike the FC, which is trained on all the historical data of all areas.

To sum up, for better prediction, we need to use a model that is very powerful and properly conditions the output on all the available information. In addition, the best prediction performance is achieved when all the impacting factors considered in this work are available as input to the network.
VI. CONCLUSION

We propose a sequence learning model based on recurrent neural networks and mixture density networks to predict taxi demand in different areas of a city. By learning from historical taxi demand patterns, the proposed LSTM-MDN model can make taxi demand predictions for the entire city. Three and a half years of taxi trip data from New York City is used to train our model. Experimental results show that the LSTM-MDN model can achieve a good accuracy of around 83% at the city level. We further extend the LSTM-MDN model to a conditional model in which each prediction is made not only based on past taxi information, but also conditioned on the current demands in other areas. We show that this approach can further improve the prediction performance. In addition, we show that our models outperform two other prediction models based on fully connected feed-forward neural networks and a naive statistical average.

This work can be extended by adding more information to the input of the network, such as where businesses, shops, restaurants, etc. are located. In addition, we can organize the taxis in a city and distribute them in real-time according to the demand prediction made by our model. This can help a lot in situations where there is large demand in some areas but the taxi drivers are competing with each other for passengers in another area of the city. A central taxi dispatch system would be especially beneficial when, in the future, self-driving cars need to be organized automatically to respond to the taxi requests in a city. Such a system can save a lot of time for people who need taxis. In addition, it can save much of the gas that is currently being spent by taxis searching for passengers.

REFERENCES

[1] N. J. Yuan, Y. Zheng, L. Zhang, and X. Xie, "T-finder: A recommender system for finding passengers and vacant taxis," IEEE Trans. Knowl. Data Eng., vol. 25, no. 10, pp. 2390–2403, Oct. 2013.

[2] K. T. Seow, N. H. Dang, and D.-H. Lee, "A collaborative multiagent taxi-dispatch system," IEEE Trans. Autom. Sci. Eng., vol. 7, no. 3, pp. 607–616, Jul. 2010.
[3] P. Santi, G. Resta, M. Szell, S. Sobolevsky, S. H. Strogatz, and C. Ratti, "Quantifying the benefits of vehicle pooling with shareability networks," Proc. Nat. Acad. Sci. USA, vol. 111, no. 37, pp. 13290–13294, 2014.
[4] X. Ma, H. Yu, Y. Wang, and Y. Wang, "Large-scale transportation network congestion evolution prediction using deep learning theory," PLoS ONE, vol. 10, no. 3, p. e0119044, 2015.
[5] K. Zhang, Z. Feng, S. Chen, K. Huang, and G. Wang, "A framework for passengers demand prediction and recommendation," in Proc. IEEE SCC, Jun. 2016, pp. 340–347.
[6] K. Zhao, D. Khryashchev, J. Freire, C. Silva, and H. Vo, "Predicting taxi demand at high spatial resolution: Approaching the limit of predictability," in Proc. IEEE BigData, Dec. 2016, pp. 833–842.
[7] D. Zhang, T. He, S. Lin, S. Munir, and J. A. Stankovic, "Taxi-passenger-demand modeling based on big data from a roving sensor network," IEEE Trans. Big Data, vol. 3, no. 1, pp. 362–374, Sep. 2017.
[8] F. Miao et al., "Taxi dispatch with real-time sensing data in metropolitan areas: A receding horizon control approach," IEEE Trans. Autom. Sci. Eng., vol. 13, no. 2, pp. 463–478, Apr. 2016.
[9] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[10] A. Graves. (2013). "Generating sequences with recurrent neural networks." [Online]. Available: https://arxiv.org/abs/1308.0850
[11] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. NIPS, Dec. 2014, pp. 3104–3112.
[12] C. M. Bishop, Mixture Density Networks. Birmingham, U.K.: Aston University, 1994.
[13] L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas, "Predicting taxi–passenger demand using streaming data," IEEE Trans. Intell. Transp. Syst., vol. 14, no. 3, pp. 1393–1402, Sep. 2013.
[14] N. Davis, G. Raina, and K. Jagannathan, "A multi-level clustering approach for forecasting taxi travel demand," in Proc. IEEE ITSC, Dec. 2016, pp. 223–228.
[15] J. Yuan, Y. Zheng, L. Zhang, X. Xie, and G. Sun, "Where to find my next passenger," in Proc. ACM UbiComp, Sep. 2011, pp. 109–118.
[16] S. Ma, Y. Zheng, and O. Wolfson, "T-share: A large-scale dynamic taxi ridesharing service," in Proc. IEEE ICDE, Apr. 2013, pp. 410–421.
[17] H. Rong, X. Zhou, C. Yang, Z. Shafiq, and A. Liu, "The rich and the poor: A Markov decision process approach to optimizing taxi driver revenue efficiency," in Proc. ACM CIKM, Oct. 2016, pp. 2329–2334.
[18] J. Azevedo, P. M. d'Orey, and M. Ferreira, "On the mobile intelligence of autonomous vehicles," in Proc. IEEE NOMS, Apr. 2016, pp. 1169–1174.
[19] P. S. Castro, D. Zhang, C. Chen, S. Li, and G. Pan, "From taxi GPS traces to social and community dynamics: A survey," ACM Comput. Surv., vol. 46, no. 2, p. 17, 2013.
[20] A. de Brébisson, É. Simon, A. Auvolat, P. Vincent, and Y. Bengio. (2015). "Artificial neural networks applied to taxi destination prediction." [Online]. Available: https://arxiv.org/abs/1508.00021
[21] R. Rahmatizadeh, P. Abolghasemi, A. Behal, and L. Bölöni. (2016). "Learning real manipulation tasks from virtual demonstrations using LSTM." [Online]. Available: https://arxiv.org/abs/1603.03833
[22] A. Karpathy, J. Johnson, and L. Fei-Fei. (2015). "Visualizing and understanding recurrent networks." [Online]. Available: https://arxiv.org/abs/1506.02078
[23] A. Graves, A.-R. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. IEEE ICASSP, May 2013, pp. 6645–6649.
[24] K. Simonyan and A. Zisserman. (2014). "Very deep convolutional networks for large-scale image recognition." [Online]. Available: https://arxiv.org/abs/1409.1556
[25] G. Niemeyer. (2008). Tips & Tricks About Geohash. [Online]. Available: http://geohash.org/site/tips.html
[26] NYC Taxi & Limousine Commission. Taxi and Limousine Commission (TLC) Trip Record Data. Accessed: Dec. 2016. [Online]. Available: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
[27] G. J. McLachlan and K. E. Basford, Mixture Models: Inference and Applications to Clustering, vol. 84. New York, NY, USA: Marcel Dekker, 1988.
[28] A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, "Pixel recurrent neural networks," in Proc. ICML, Jun. 2016, pp. 1747–1756.
[29] H. Larochelle and I. Murray, "The neural autoregressive distribution estimator," in Proc. AISTATS, Jun. 2011, pp. 29–37.
[30] B. van Merriënboer et al. (2015). "Blocks and fuel: Frameworks for deep learning." [Online]. Available: https://arxiv.org/abs/1506.00619
[31] R. Al-Rfou et al. (2016). "Theano: A Python framework for fast computation of mathematical expressions." [Online]. Available: https://arxiv.org/abs/1605.02688
[32] P. Lopez-Garcia, E. Onieva, E. Osaba, A. D. Masegosa, and A. Perallos, "A hybrid method for short-term traffic congestion forecasting using genetic algorithms and cross entropy," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 2, pp. 557–569, Feb. 2016.
[33] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: A deep learning approach," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 865–873, Apr. 2015.
[34] M. Yang, Y. Liu, and Z. You, "The reliability of travel time forecasting," IEEE Trans. Intell. Transp. Syst., vol. 11, no. 1, pp. 162–171, Mar. 2010.

Jun Xu received the M.S. degree in electrical engineering from the Beijing University of Posts and Telecommunications, China. He is currently pursuing the Ph.D. degree in computer science with the Department of Computer Science, University of Central Florida. His research interests include mobility models, agent path planning, and machine learning.

Rouhollah Rahmatizadeh received the B.S. degree in computer engineering from the Sharif University of Technology, Tehran, Iran, in 2012, and the M.S. degree in computer science from the University of Central Florida (UCF), Orlando, in 2014, where he is currently pursuing the Ph.D. degree in computer science. His research interests include machine learning, robotics, and wireless sensor networks.

Ladislau Bölöni (SM'05) received the B.Sc. degree (Hons.) in computer engineering from the Technical University of Cluj-Napoca, Romania, in 1993, and the M.Sc. and Ph.D. degrees from the Computer Sciences Department, Purdue University, in 1999 and 2000, respectively. He is currently a Professor with the Department of Computer Science, University of Central Florida (with a secondary joint appointment with the Department of Electrical and Computer Engineering). His research interests include cognitive science, autonomous agents, grid computing, and wireless networking. He is a member of the ACM, AAAI, and the Upsilon Pi Epsilon Honorary Society. He received a fellowship from the Computer and Automation Research Institute of the Hungarian Academy of Sciences for the 1994–1995 academic year.

Damla Turgut (M'03) received the B.S., M.S., and Ph.D. degrees from the Computer Science and Engineering Department, University of Texas at Arlington. She is currently an Associate Professor with the Department of Computer Science, University of Central Florida. Her research interests include wireless ad hoc, sensor, underwater and vehicular networks, cloud computing, and considerations of privacy in the Internet of Things. She is also interested in applying big data techniques to improving STEM education for women and minorities. She serves as a member of the Editorial Board and of the Technical Program Committee of ACM and IEEE journals and international conferences. She is a member of the ACM and the Upsilon Pi Epsilon Honorary Society. Her recent honors and awards include the University Excellence Award in Professional Service in 2017 and being featured in the UCF Women Making History series in 2015. She was a co-recipient of the Best Paper Award at IEEE ICC 2013.
