Exposure To Risk Increases The Excess of Zero Accident Claims Frequency in Automobile Insurance
The Research Institute of Applied Economics (IREA) in Barcelona was founded in 2005, as a research
institute in applied economics. Three consolidated research groups make up the institute: AQR, RISK
and GiM, and a large number of members are involved in the Institute. IREA focuses on four priority
lines of investigation: (i) the quantitative study of regional and urban economic activity and analysis of
regional and local economic policies, (ii) study of public economic activity in markets, particularly in the
fields of empirical evaluation of privatization, the regulation and competition in the markets of public
services using state of industrial economy, (iii) risk analysis in finance and insurance, and (iv) the
development of micro and macro econometrics applied for the analysis of economic activity, particularly
for quantitative evaluation of public policies.
IREA Working Papers often represent preliminary work and are circulated to encourage discussion.
Citation of such a paper should account for its provisional character. For that reason, IREA Working
Papers may not be reproduced or distributed without the written consent of the author. A revised version
may be available directly from the author.
Any opinions expressed here are those of the author(s) and not those of IREA. Research published in
this series may include views on policy, but the institute itself takes no institutional policy positions.
Abstract
Jens Perch Nielsen: Cass Business School, City, University of London, 106 Bunhill Row,
London EC1Y 8TZ, United Kingdom.
Acknowledgements
The study was supported by ICREA Academia, the Spanish Ministry of Economy and Competitiveness, and the ERDF
under grants ECO2015-66314-R and ECO2016-76203-C2-2-P.
1. INTRODUCTION AND MOTIVATION
According to the World Health Organization (2017), road traffic injuries are responsible
for more than 1.2 million deaths every year. Indeed, they are the leading cause of
the beginning of 2013 until the end of 2015, there was a 16% increase in the number of
insurance companies have begun to collect telematics data about drivers’ exposure to
traffic (i.e. distance driven and vehicle location) and their driving behaviour (excess
speed and aggressiveness). This information can improve the insurance ratemaking
process(2-10) and also allows conclusions to be drawn about how to make driving safer.
insured vehicle to record and store relevant information about variables that change over
time, including, for example, the number of kilometres driven per day by the insured,
the percentage of kilometres driven above the speed limit, and the percentage of
advance, given that, previously, automobile insurance companies could only use
variables related to certain fixed characteristics of the insured (for example age, gender,
or number of years since the driver’s license was issued) and the vehicle (age of the
Most automobile insurance databases contain many policy holders with zero claims.
This high frequency of ‘zeros’ may be due to the presence of insureds that have no wish
to claim for small accidents in order to avoid a premium increase or, alternatively, it
might be due to the relative lack of use they make of their vehicles. If the vehicle is
parked in a garage, it is not exposed to the risk of accident. Here, we analyse distance
driven as a measure of exposure to risk and examine its role in the probability of an
insured having zero claims. We show how to differentiate those drivers that almost
never use their vehicles (and so have little exposure to the risk of an accident) from
those that are good drivers, i.e. those who, despite recording high mileages, are not
even though we are aware that some accidents are not reported to the insurance
We discover a positive relationship between the distance driven and the number of
excess zeros observed in the number of claims. We argue that this is due to a learning
effect, where good drivers are more frequent than expected among those that drive long
distances. The overall effect of the driving distance variable is positive. However, even
though driving longer distances should result in a higher premium, there is a
discount owing to the increased proportion of zeros in the claim frequencies, which we
attribute to a learning effect. The overall effect is still an increase in the premium, although not as much as we
Our research is innovative because (1) we introduce telematics covariates while dealing
with the excess of zeros and (2) we discuss the implications for new insurance products
and traffic safety that are obtained on the basis of distance driven.
Various studies have explored the potential of telematics when applied to risks of road
several papers have examined the impact of new technologies on road safety and how
driving habits can be measured,(13-20) while others have focused specifically on mileage
and new risk factors that might be included in the ratemaking process (see (2) for an
extended review). Recently, it has been proven that including standard telematics
be able to tailor their products to the customers’ risk profile. (21) The objective for the
insurance industry is to penalize high risk drivers with higher premiums by taking into
the speed limits or not respecting safety distances. We show that having information
about the annual distance driven by the insured improves the ratemaking process
considerably not only because it is a measure of exposure to risk, but because of the
crucial role it plays in the analysis of the absence of claims, i.e. the probability of not
claiming or, in other words, the probability of zero claims. See the following papers on
predict the number of automobile claims in insurance. The Poisson regression model is
a special case of the generalized linear model class and serves as a benchmark
model.(24,25) However, various corrections have to be made when the
probability of zero is larger than the probability under the Poisson assumption, a
so-called excess of zeros. Various papers suggest that this excess is caused by
we wish to differentiate those drivers that have no claims because they rarely use their
vehicles during the year (in the extreme case, making no use of the vehicle at all) from
those that have no claims despite being frequent drivers. To do this, we propose using a
zero-inflated Poisson (ZIP) model corrected by distance (kilometres driven per year by
the driver). While various studies have used ZIP models(28-30) and applied them to the
context of automobile insurance(31-33), none of these contributions has analysed the role
From the empirical point of view, we draw on a real automobile claims database for a
sample of insureds. This includes individual details about annual mileage travelled and
other aspects of driving behaviour, which enable us to study the effects of various
indicators on the probability of making a claim. We highlight the implications of this for
The rest of the paper is structured as follows. In section 2, we present the methodology
used when including distance as an offset variable in the ZIP model. The database and
some descriptive results are presented in section 3 and our main results obtained with
the models specified are analysed in section 4. Finally, a discussion and the main
2. METHODOLOGY
A Poisson regression with an offset variable is the logical way to include an exposure to
risk variable in our model. Here, therefore, we opt to use a Poisson model with offset
Zero-inflated Poisson (ZIP) regression is a model for count data with an excess of zeros.
It assumes that with probability p the only possible observation is 0, and with
probability 1–p, a Poisson (λ) random variable is observed. For example, in a different
context, the same model can be used in quality control. When a manufacturing
system is properly aligned, defects are nearly impossible and p is large. But when
the machine is misaligned, defects may occur according to a Poisson (λ) distribution.
This same principle is also plausible in motor insurance when modelling the number of
accidents per year. Some drivers hardly use their vehicle or use it very rarely, so for
Both the probability of no accidents and the mean number of defects λ in the imperfect
state (when people use their cars) may depend on covariates that are defined for each
individual. Here, we have not included subscript i to refer to the i-th observation in a
sample of size n, to make notation easier. Sometimes p and λ are unrelated; but on other
In either case, ZIP regression models are easy to fit. Maximum likelihood estimates
(MLE) are approximately normal in large samples, and confidence intervals can be
constructed by inverting likelihood ratio tests or using the approximate normality of the
MLE. The estimation can be performed with standard statistical software, such as R or
SAS, but the interpretation of the results of a ZIP regression model is not
experiment involving soldering defects on printed wiring boards, two sets of conditions
resulted in roughly the same mean number of defects; however, the perfect state was
more likely under one set of conditions and the mean number of defects in the imperfect
state was smaller under the other set. In other words, ZIP regression can show not only
which conditions give the lower mean number of defects but also why the means are
lower.
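The point that equal means can arise from different mechanisms can be illustrated with a minimal sketch. The two parameter settings below are hypothetical (they are not taken from the soldering experiment or from our data); they only show that two ZIP configurations can share the same mean (1 − p)λ while differing in the probability of the perfect state.

```python
import math


def zip_pmf(k: int, p: float, lam: float) -> float:
    """P(Y = k) for a zero-inflated Poisson with perfect-state probability p."""
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    return p * (k == 0) + (1.0 - p) * poisson_k


def zip_mean(p: float, lam: float) -> float:
    """E(Y) = (1 - p) * lam."""
    return (1.0 - p) * lam


# Two hypothetical conditions with the same mean but different mechanisms.
cond_a = {"p": 0.50, "lam": 0.40}  # perfect state likely, imperfect state worse
cond_b = {"p": 0.20, "lam": 0.25}  # perfect state rarer, imperfect state milder

for name, cond in (("A", cond_a), ("B", cond_b)):
    # Prints the label, the mean, and the probability of observing a zero.
    print(name, round(zip_mean(**cond), 3), round(zip_pmf(0, **cond), 3))
```

Both conditions have the same mean, yet condition A produces more zeros because its perfect state is more likely. This is the distinction that a single Poisson mean cannot capture.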
Notice that formally we introduce an extended model of zero claims in insurance using
distance driven as the exposure to risk variable. However, while this simple model
important effect. When factors other than just mileage are included in the model, then
essentially the extension suggested here also serves as a bias correction. If the effect of
risk exposure through distance is not log-linear, for example, then our extended model
adjusts for that. With the data provided herein, the adjustment via our extended model
improved considerably when mileage was included, and only marginally when further
variables were included. Finally, therefore, we opted only to include mileage in the
extension of the model, thus facilitating a straightforward interpretation. In this way, the
excess zeros in our extended model are simply interpreted as a function of miles driven.
In the zero part of the model, we have only a Bernoulli variable that distinguishes
between the zero event (no claim) versus the non-zero event (at least one claim), so the
expectation for this binary response random variable is exactly the probability of excess
zero claims, which should be limited to the [0,1] interval. For this reason, we have no
offset in this part and the parameter of the log-distance is not necessarily equal to one.
Below we first introduce the simple Poisson model with and without exposure as it has
traditionally been presented. Exposure, in our study, is equivalent to miles driven per
year.
2.1. The Poisson model
Let us assume that, given 𝑥𝑖, the dependent variable 𝑌𝑖 follows a Poisson distribution
with parameter 𝜆𝑖, which is a function of the linear combination of parameters and
regressors, 𝛽0 + 𝛽1 𝑥𝑖1 + ⋯ + 𝛽𝑘 𝑥𝑖𝑘. Indeed,
When exposure to risk is introduced, then an offset is included in the model. Let us call
𝑇𝑖 the exposure factor for policy holder i (i=1,…,n), in our case Ti=ln(𝐷𝑖 ), where 𝐷𝑖
indicates distance travelled. Then the model can incorporate this factor as follows:
Under this model, the probability of zero using the Poisson distribution is calculated as
positive by definition, then the probability of zero claims declines naturally as distance
driven increases.
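As a worked illustration, and assuming the standard log-link form in which the coefficient of the offset is fixed at one and $\lambda_i = \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})$ (an assumption consistent with the expressions $D_i \lambda_i$ and $\exp(-D_i \lambda_i)$ used for the zero-inflated model), the Poisson model with exposure and its probability of zero claims can be sketched as:

```latex
\begin{aligned}
E(Y_i \mid x_i, T_i) &= \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + T_i) = D_i \lambda_i, \\
P(Y_i = 0 \mid x_i, T_i) &= \exp(-D_i \lambda_i), \\
\frac{\partial}{\partial D_i} P(Y_i = 0 \mid x_i, T_i) &= -\lambda_i \exp(-D_i \lambda_i) < 0 .
\end{aligned}
```

The negative derivative is what makes the probability of zero claims decline with distance in the model without zero inflation.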
We are now ready to extend the traditional Poisson regression models above to include
excess zeros via ZIP models. This extension is also introduced with and without
exposure.
2.3. The zero-inflated Poisson model
where 𝑝𝑖 is the probability of the perfect, zero defect state and (1-𝑝𝑖 ) is the probability
of the complementary state. The new 𝑌 ∗ variable follows a Poisson distribution with
parameter exp(𝛽0 + 𝛽1 𝑥𝑖1 + ⋯ + 𝛽𝑘 𝑥𝑖𝑘 ) and captures the claims distribution that is not
contaminated by the excess of zeros. Note that 𝑝𝑖 may depend on some covariates.
Under this model, the probability of suffering k accidents, when k is bigger than or
Here we assume that 𝑝𝑖 is the probability of an excess of zeros for the i-th observation
$$
p_i = \frac{\exp(\alpha_0 + \alpha_1 \ln(D_i))}{1 + \exp(\alpha_0 + \alpha_1 \ln(D_i))} \tag{4}
$$
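The Poisson model for $Y^*$ is specified as follows, with an exposure: $E(Y_i^* \mid x_i, T_i) = D_i \lambda_i$. The probability of zero claims and the probability of $k$ claims ($k \geq 1$) are then

$$
P(Y_i = 0) = \frac{\exp(\alpha_0 + \alpha_1 \ln(D_i))}{1 + \exp(\alpha_0 + \alpha_1 \ln(D_i))} + \frac{1}{1 + \exp(\alpha_0 + \alpha_1 \ln(D_i))} \exp(-D_i \lambda_i)
$$

$$
P(Y_i = k) = \frac{1}{1 + \exp(\alpha_0 + \alpha_1 \ln(D_i))} \, \frac{\exp(-D_i \lambda_i) \, D_i^k \lambda_i^k}{k!}
$$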
Using the definition of the expectation of a discrete random variable, the expectation of $Y_i$ is

$$
E(Y_i \mid x_i, T_i) = (1 - p_i) E(Y_i^* \mid x_i, T_i) = \frac{1}{1 + \exp(\alpha_0 + \alpha_1 \ln(D_i))} D_i \lambda_i = D_i^* \lambda_i \tag{5}
$$

where $D_i^* = \dfrac{D_i}{1 + \exp(\alpha_0 + \alpha_1 \ln(D_i))}$ is a transformation of the original exposure $D_i$. So, when
Let us study the transformation. If 𝛼1 > 1, when 𝐷𝑖 is large then 𝐷 ∗ 𝑖 tends to zero, but
when 𝛼1 < 1 then 𝐷∗ 𝑖 increases when 𝐷𝑖 increases. On the other hand, when 𝐷𝑖 tends
If we examine the logistic regression part (4), we observe that 𝑝𝑖 can be understood
again as a transformation of the exposure into the [0,1] interval, which tends to zero
when 𝐷𝑖 tends to zero if 𝛼1 is positive. Moreover, the derivative of (5) with respect to
𝐷𝑖 shows how much the expected claims would change as a function of 𝐷𝑖 and indicates
that if 𝛼1 is significantly different from zero, then the derivative is not 𝐷𝑖 times 𝜆𝑖 .
Since insurance premiums are based on expected number of claims, this is an important
result as it potentially shows that insurance prices should not necessarily be linearly
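The behaviour of the transformed exposure can be checked numerically. The following is a minimal sketch, assuming hypothetical values for 𝛼0 and 𝜆 (the names `ALPHA0` and `LAM` are ours and are not estimates from the data); it evaluates 𝐷* for one value of 𝛼1 below one and one above one, together with a finite-difference approximation to the derivative of (5).

```python
import math


def transformed_exposure(d: float, alpha0: float, alpha1: float) -> float:
    """D* = D / (1 + exp(alpha0 + alpha1 * ln D)), defined for D > 0."""
    return d / (1.0 + math.exp(alpha0 + alpha1 * math.log(d)))


def expected_claims(d: float, lam: float, alpha0: float, alpha1: float) -> float:
    """Equation (5): E(Y) = D* * lambda."""
    return transformed_exposure(d, alpha0, alpha1) * lam


def marginal_claims(d, lam, alpha0, alpha1, h=1e-6):
    """Central finite-difference approximation to dE(Y)/dD."""
    up = expected_claims(d + h, lam, alpha0, alpha1)
    down = expected_claims(d - h, lam, alpha0, alpha1)
    return (up - down) / (2.0 * h)


LAM = 0.05     # hypothetical claim rate per unit of distance
ALPHA0 = -1.0  # hypothetical logit intercept

for alpha1 in (0.5, 1.5):  # one value below 1 and one above 1
    for d in (1.0, 10.0, 100.0, 1000.0):
        print(alpha1, d,
              round(transformed_exposure(d, ALPHA0, alpha1), 4),
              round(marginal_claims(d, LAM, ALPHA0, alpha1), 6))
```

With these illustrative values, 𝐷* keeps growing with distance when 𝛼1 is below one and shrinks towards zero for large distances when 𝛼1 is above one. In both cases the marginal effect differs from 𝜆, which is the sense in which expected claims are not proportional to distance.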
3. DATA
We use information on the risk exposure and number of claims for 25,014 insureds with
car insurance coverage throughout 2011, that is, individuals exposed to the risk for a
full year. Note that in our case these data concern drivers up to a maximum age of 37,
given that the insurance product was sold primarily to young drivers. Our aim is to
discriminate between good and bad drivers in this portfolio segment and to identify the
influence of driving short distances.(16) Claim frequencies are presented in Table I, with
an expected value of 0.23 claims per person. Table I has information on the frequency
of all reported claims. The sum of reported claims that were not at fault is 3,108, while
the sum of claims at fault is 2,652. Overall 5,760 claims were reported. Descriptive
statistics for the risk exposure indicator (kilometres per year) are presented in Table II,
where we analyse drivers with and without claims separately. The rest of the indicators,
both those derived from traditional ratemaking factors and those obtained from
telematic devices, are presented in Table III, where we also present the definitions of
5 7 1 1
6 1 0 0
One insured driver had 6 claims, of which 2 were at fault and 4 were not at fault.
The results presented in Table II in relation to the annual distance travelled by the
insured drivers reveal differences between those with no claims and those with claims.
If we focus on the 25% of drivers that travelled the smallest distance over the year (1st
quartile), we observe that the insureds that claim at least one accident drove more
kilometres per year than those with no claims – the respective quartile values being 4.87
vs. 4.00. A similar pattern of behaviour is observed for the second (median) and third
quartiles with those making claims driving larger distances than those with no claims.
This result was as expected and is a clear indication of a relationship between claims
The Mann-Whitney test is a nonparametric test of the null hypothesis that it is equally
likely that a randomly selected value from one sample is less than or greater than a
randomly selected value from a second sample. The Mann-Whitney test shows that the
differences in the mean for the exposure risk regressor (Table II), as well as for the
other classical and telematic regressors (Table III) are statistically significant in the
cases of drivers with no claims and drivers with claims, with the exception of vehicle
age (p-value=0.331) and the percentage of kilometres driven over the speed limit
rejected when using the Kolmogorov-Smirnov test. The Kolmogorov-Smirnov test is a
distributions that can be used to compare the statistical distribution of two samples.
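For readers who wish to reproduce this kind of two-sample comparison, the following is a minimal sketch using only the Python standard library. The data are simulated (the lognormal parameters are hypothetical, not our portfolio), the Mann-Whitney p-value relies on the large-sample normal approximation without a tie correction, and only the Kolmogorov-Smirnov statistic (not its p-value) is computed.

```python
import bisect
import math
import random


def mann_whitney_u(x, y):
    """U statistic of x against y (ties count one half) and a two-sided
    p-value from the normal approximation, without tie correction."""
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    n, m = len(x), len(y)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2.0))


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)

    def ecdf(sample, t):
        return bisect.bisect_right(sample, t) / len(sample)

    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in xs + ys)


# Simulated annual distances (thousands of km) for two hypothetical groups.
random.seed(0)
no_claims = [random.lognormvariate(1.4, 0.6) for _ in range(300)]
with_claims = [random.lognormvariate(1.6, 0.6) for _ in range(300)]

u_stat, p_value = mann_whitney_u(with_claims, no_claims)
print(u_stat, p_value, ks_statistic(with_claims, no_claims))
```

In practice a statistical package would normally be used; the sketch only makes explicit what the two statistics measure.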
From a univariate point of view, drivers that made a claim for at least one accident are,
on average, younger than those that made no claim and have held their driving licence
for fewer years. A similar conclusion can be drawn in the case of ownership of a
powerful vehicle, where those insureds making at least one claim present a higher value
than those making no claims. Unexpectedly, in the case of cars parked overnight in a
garage, the percentage value is higher among those who made at least one claim than it
is among those who made no claim. We would expect such cars to be safer, but it
appears that this variable may be closely related to car type, with powerful, more
expensive cars being kept in garages. As for the new driving behaviour indicators
derived from telematics, driving at night and driving in urban areas present larger mean
Table III. Explanatory variables* included in the models and descriptive statistics

                                                                                    All sample         Drivers with       Drivers with
                                                                                                       no claims          claims
Variable             Description                                                    Mean    Std. Dev.  Mean    Std. Dev.  Mean    Std. Dev.
Age                  Age of the insured driver (in years)                           27.57   3.09       27.65   3.09       27.18   3.10
Age2                 Age squared of the insured driver
Male (%)             Sex of the insured driver (1 if male, 0 if female)             48.91   -          48.61   -          50.32   -
Age Driving Licence  Experience of the insured driver                               7.17    3.05       7.27    3.07       6.73    2.94
Vehicle age          Age of the insured vehicle                                     8.75    4.17       8.76    4.19       8.69    4.11
Power                Power of the insured vehicle                                   97.22   27.77      96.98   27.83      98.36   27.46
Parking (%)          1 if the vehicle is parked in a garage overnight, 0 otherwise  77.38   -          77.21   -          78.17   -
4. RESULTS
Tables IV and V present the zero-inflated Poisson models including exposure to risk
(kilometres driven per year) as the offset variable in the models as discussed in section
models, their results being obtained using SAS, PROC GENMOD. To compare the
models, we use the Akaike Information Criterion (AIC), calculated as twice the number
of parameters in the model minus twice the value of the log-likelihood at its maximum.
The best model is the one that presents the smallest AIC value.¹

¹ The AIC penalizes the number of parameters less strongly than the Bayesian information criterion
(BIC), which multiplies the number of parameters by the logarithm of the number of observations
rather than by two, as the AIC does.
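The two criteria can be written as short functions. The sketch below also uses the identity BIC − AIC = k(ln n − 2) to back out the number of parameters k implied by the AIC and BIC values reported in Table IV. This is a consistency check of our own, and it assumes that every model was fitted to all 25,014 insureds described in section 3.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return math.log(n_obs) * n_params - 2.0 * log_lik


def implied_n_params(aic_value: float, bic_value: float, n_obs: int) -> float:
    """Recover k from BIC - AIC = k * (ln n - 2)."""
    return (bic_value - aic_value) / (math.log(n_obs) - 2.0)


N_OBS = 25_014  # number of insureds reported in section 3 (assumed sample size)
reported = {    # (AIC, BIC) pairs as reported in Table IV
    "All variables": (28_877.112, 28_999.019),
    "Non-telematics": (29_427.423, 29_508.694),
    "Telematics": (29_005.172, 29_070.189),
}

for name, (aic_value, bic_value) in reported.items():
    print(name, round(implied_n_params(aic_value, bic_value, N_OBS), 2))
```

Under the stated assumption, the implied parameter counts come out very close to whole numbers (about 15, 10 and 8 for the three specifications), which suggests that the reported AIC and BIC values are mutually consistent.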
Table IV highlights a clear improvement in the results when considering all the model
regressors (the lowest AIC value being obtained for the first specification). These
results seem to validate the conclusions drawn in previous studies, (2–4) in which the
relevance of the new indicators related to distance travelled and driving habits is
highlighted, but where they are used in conjunction with the classical regressors.
the logit model in its zero-inflation part. On first inspection, the positive sign of the
parameter associated with the log-distance in the logistic part might seem surprising,
and it could easily be misinterpreted. This value (0.404 in the first column) does not
mean that the greater the distance driven, the greater the probability of the insured
having zero claims. Rather, it means that the greater the distance driven, the greater
the proportion of excess zero claims, indicating a deviation from the Poisson
Table IV. Zero-inflated Poisson model with offsets (Log of km per year in 000s). All types of claim.

                                        All variables        Non-telematics       Telematics
                                        Estimate  p-value    Estimate  p-value    Estimate  p-value
Poisson part
Intercept                               -2.148    0.045      -0.829    0.440      -3.461    <.001
Age                                     -0.094    0.232      -0.123    0.121
Age2                                    0.002     0.221      0.002     0.131
Male                                    -0.068    0.029      -0.011    0.719
Age Driving Licence                     -0.059    <.001      -0.067    <.001
Vehicle Age                             0.014     <.001      0.017     <.001
Power                                   0.003     <.001      0.001     0.017
Parking                                 0.029     0.420      0.032     0.381
Zero-inflation part
Intercept (Logit)                       -0.847    <.001      -1.639    <.001      -0.795    <.001
Log of km per year (thousands) (Logit)  0.404     <.001      0.824     <.001      0.406     <.001
AIC                                     28,877.112           29,427.423           29,005.172
BIC                                     28,999.019           29,508.694           29,070.189
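To make the interpretation of the logit coefficient concrete, the sketch below evaluates equation (4) with the zero-inflation estimates from the first column of Table IV (−0.847 and 0.404). The Poisson rate `LAM` is a hypothetical constant introduced only for illustration; in the fitted model it depends on each driver's covariates.

```python
import math

ALPHA0, ALPHA1 = -0.847, 0.404  # Table IV, "All variables", zero-inflation part


def excess_zero_prob(km_thousands: float) -> float:
    """Equation (4): p = logistic(alpha0 + alpha1 * ln D)."""
    eta = ALPHA0 + ALPHA1 * math.log(km_thousands)
    return 1.0 / (1.0 + math.exp(-eta))


def total_zero_prob(km_thousands: float, lam: float) -> float:
    """P(Y = 0) = p + (1 - p) * exp(-D * lambda)."""
    p = excess_zero_prob(km_thousands)
    return p + (1.0 - p) * math.exp(-km_thousands * lam)


LAM = 0.05  # hypothetical Poisson rate per thousand km

for d in (1, 5, 10, 20, 40):
    print(d, round(excess_zero_prob(d), 3), round(total_zero_prob(d, LAM), 3))
```

With this illustrative rate, the proportion of excess zeros rises with distance while the overall probability of zero claims still falls, which is the distinction drawn in the text.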
In the case of the classical variables, all the parameters for gender, driving experience,
vehicle age and the power of the vehicle are statistically significant. Thus, we find a
higher expected number of claims for women drivers than for men,
vehicles as opposed to owners of newer and less powerful cars. As for the new telematic
regressors, two – the percentage of kilometres per year driven over the speed limit and
the percentage of urban kilometres driven per year – are significant in explaining the
expected number of claims. Thus, the number of claims increases as these two
we compare the results of the second and third specifications (columns 2 and 3,
respectively), the best results are obtained for the model that only includes variables
Our model predicts the highest number of expected claims for younger women, with
little driving experience, driving old and powerful vehicles, driving in urban zones, and
exceeding the speed limit. Note that this result is in line with the results reported by
Mercer in 1989.(23)
Previous research(34) has shown that it may be interesting to include Age and Gender
interaction in the model. The results for all the models, which are available from the
authors, show that this interaction is not significant. In practice, Gender cannot be used
for pricing insurance in the EU, but it can certainly be used for risk evaluation and it can
conclusion for this sample is that there is no interaction between Age and Gender. There
are potentially two reasons for this: (1) the sample consists of drivers aged less than 37
years, so Age may not have enough range to show a significantly different effect by
Gender or (2) as found by other authors, the influence of Gender is masked by the fact
that men on average drive significantly longer distances than women. The relationship
using average trip distance for a Belgian sample(36) or even taking both average trip
distance and total distance in another European portfolio sample.(37) They all concluded
that Gender differences in the risk of accidents are, to a large extent, attributable to the
Similar results are obtained when only claims at fault are considered (Table V), with the
exception that the age of the driver is now significant while gender is not. Here, again, a
better goodness of fit is obtained for the specification that includes all variables (both
telematic and non-telematic) and the model that includes only the telematics variables
(the lowest AIC value being obtained for column 1). As in Table IV, a lower
AIC is obtained for the specification using only telematic variables as opposed to that
Table V. Zero-inflated Poisson model with offsets (Log of km per year in 000s). Claims for which the policyholder was at fault

                                        All variables        Non-telematics       Telematics
                                        Estimate  p-value    Estimate  p-value    Estimate  p-value
Poisson part
Intercept                               -0.697    0.653      0.278     0.857      -3.892    <.001
Age                                     -0.224    0.050      -0.224    0.049
Age2                                    0.004     0.039      0.004     0.045
Male                                    0.000     0.998      0.076     0.093
Age Driving Licence                     -0.083    <.001      -0.088    <.001
Vehicle Age                             0.013     0.015      0.016     0.004
Power                                   0.001     0.163      0.001     0.351
Parking                                 -0.035    0.497      -0.025    0.637
Zero-inflation part
Intercept (Logit)                       -0.228    0.151      -0.765    <.001      -0.140    0.358
Log of km per year (thousands) (Logit)  0.442     <.001      0.743     <.001      0.441     <.001
AIC                                     16,912.217           17,125.313           17,004.642
BIC                                     17,034.124           17,206.584           17,069.659
The age of at-fault drivers is inversely related to the expected number of claims, that is,
a higher number of accidents are expected among younger drivers. However, the
the two variables. Inexperienced drivers (measured in terms of the number of years in
which they have been in possession of a driving licence) and drivers of old vehicles
show a higher expected number of claims than that recorded by their more experienced
counterparts and drivers of newer vehicles. In common with the result in Table IV, the
percentage of kilometres per year driven over the speed limit, and additionally here the
claims in which the driver is at fault. The percentage of kilometres driven at night is
significant at the 10% level when we only consider the telematic variables, but the AIC
value for this model (17,004.642) is higher than that obtained for the first model (16,912.217).
Results for the models on not-at-fault claims lead to similar conclusions. We have
not discussed the not-at-fault cases because, in insurance premium calculation, it is
claims at fault that are of primary interest. Claims at fault indicate that the driver has caused an
accident, while not at fault means that the accident was due to someone else. If the
accident is caused by someone else, then the insured driver should not pay a higher
Comparisons with the classical Poisson model with offsets (without considering zero
inflation), both for the total sample and for claims where the policyholder is at fault, are
not included here, as that model does not enable us to see the impact of distance on the excess
of zeros. These results are available on request from the authors. The goodness of fit
results are always better in the zero-inflated models because they take into account
differences between false zeros (non-risk exposure) and true zeros (risk exposure and
zero claims).
In a similar context, it has been shown that prediction models for hurricane power
outage can be improved by a new two-step outage prediction model and the inclusion of
additional environmental variables that increase the overall accuracy(38). Our model also
prediction of the number of claims and this can be done in a two stage model
approach(2).
We performed a holdout analysis, testing the models against test sets
that were not used in the training process. We chose a 70% training sample
and a 30% holdout sample. In all cases we confirmed the conclusions on the
significance of the parameter found in the initial analysis. The chi-squared test of
differences between observed and fitted frequencies was equal to 946.7 for the whole
sample. The holdout analysis indicates very similar values (1,041.3 with 6 degrees of
freedom in the training sample and 1,005.9 with 6 degrees of freedom in the test sample).
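The mechanics of such a holdout check can be sketched as follows. This is not the procedure run on our data: the claim counts are simulated from a ZIP distribution with hypothetical parameters (`P_TRUE`, `LAM_TRUE`), no covariates are used, and the true parameters stand in for estimates that would normally be obtained from the training sample.

```python
import math
import random
from collections import Counter


def zip_pmf(k, p, lam):
    """P(Y = k) under a zero-inflated Poisson with excess-zero probability p."""
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    return (p if k == 0 else 0.0) + (1.0 - p) * poisson_k


def draw_zip(rng, p, lam):
    """Draw one ZIP count (Poisson part via Knuth's multiplication method)."""
    if rng.random() < p:
        return 0
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def split_holdout(rows, train_share=0.7, seed=0):
    """Shuffle a copy of rows and cut it into training and holdout parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(train_share * len(rows))
    return rows[:cut], rows[cut:]


def chi_squared(counts, p, lam, max_k=6):
    """Pearson statistic of observed vs. fitted frequencies; counts >= max_k are pooled."""
    n = len(counts)
    observed = Counter(min(c, max_k) for c in counts)
    probs = [zip_pmf(k, p, lam) for k in range(max_k)]
    probs.append(1.0 - sum(probs))  # pooled upper tail
    return sum((observed.get(k, 0) - n * q) ** 2 / (n * q) for k, q in enumerate(probs))


P_TRUE, LAM_TRUE = 0.5, 0.45  # hypothetical values, not estimates from the paper
rng = random.Random(42)
sample = [draw_zip(rng, P_TRUE, LAM_TRUE) for _ in range(10_000)]
train, test = split_holdout(sample)
print(len(train), len(test))
print(chi_squared(train, P_TRUE, LAM_TRUE), chi_squared(test, P_TRUE, LAM_TRUE))
```

Comparing the statistic on the training and holdout parts, as in the text, indicates whether the fit deteriorates on data not used for estimation.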
In order to evaluate variable importance, we estimated the models using
standardized covariates, so that the coefficients can be compared. This analysis reveals
that the most important factor determining the risk of a crash is the percentage of
urban driving, followed by the age of the driving licence. The third factor is the
percentage of distance driven above the speed limit. The least relevant factors are the age of the vehicle,
the gender of the driver, the percentage of distance driven at night, and parking in a garage.
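The effect of standardization can be approximated from the reported tables, because rescaling a covariate by its standard deviation multiplies its coefficient in the linear predictor by that standard deviation. The sketch below applies this to three classical covariates, using the rounded coefficients of Table IV (first column) and the full-sample standard deviations of Table III. It is a rough reconstruction under those assumptions, not the standardized estimates themselves.

```python
# (coefficient in Table IV "All variables", std. dev. in Table III "All sample")
reported = {
    "Age Driving Licence": (-0.059, 3.05),
    "Vehicle age": (0.014, 4.17),
    "Power": (0.003, 27.77),
}


def standardized_effect(coef: float, std_dev: float) -> float:
    """Change in the linear predictor for a one-standard-deviation increase."""
    return coef * std_dev


# Rank covariates by the absolute size of their standardized effect.
ranking = sorted(
    reported,
    key=lambda name: abs(standardized_effect(*reported[name])),
    reverse=True,
)

for name in ranking:
    print(name, round(standardized_effect(*reported[name]), 3))
```

Among these three covariates, driving experience has the largest standardized effect and vehicle age the smallest, in line with the ordering described above. The telematics covariates are omitted because their coefficients are not reproduced here.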
5. CONCLUSIONS
We have shown that the part of the zero accident frequency not explained by traditional
insurance risk factors increases with the distance driven by the policy holder. This
means that when considering policy holders with the same characteristics but with
different exposures to risk in terms of distance driven per year, we can conclude that
those with a greater exposure present a larger proportion of excess zero claims than
those with less exposure. This can be understood as an indication of a learning effect, or
in terms of distance driven, that even if exposure to risk increases with distance driven,
the probability of not making a claim also increases compared to that of drivers in the
group that drive a shorter distance. This finding is evidence of the fact that good drivers
– if we identify them with those reporting no claims – are more frequent than expected
among the group of drivers that drive long distances than among those that drive shorter
This conclusion has a direct impact on the future design of PAYD insurance products,
insofar as the premium paid should not be strictly proportional to the distance driven.
Moreover, the premium should take into account the learning effect analysed here. One
possible solution would be to make the marginal increase in the insurance price per
kilometre driven dependent on the accumulated distance. Here, we have shown that this
relationship is not linear, as the zero-inflation part plays a
significant role. Taking the derivative of (5) makes this non-linearity immediately
apparent.
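For completeness, the derivative can be written out explicitly. The following is our own worked step, treating $\lambda_i$ as not depending on $D_i$ and writing $\eta_i = \alpha_0 + \alpha_1 \ln(D_i)$, so that $p_i = \exp(\eta_i) / (1 + \exp(\eta_i))$ as in (4):

```latex
\begin{aligned}
\frac{\partial E(Y_i \mid x_i, T_i)}{\partial D_i}
  &= \lambda_i \, \frac{\partial}{\partial D_i} \left( \frac{D_i}{1 + \exp(\eta_i)} \right)
   = \lambda_i \, \frac{1 + (1 - \alpha_1) \exp(\eta_i)}{\left( 1 + \exp(\eta_i) \right)^2} \\
  &= \lambda_i \, (1 - p_i)(1 - \alpha_1 p_i).
\end{aligned}
```

When $\alpha_1 = 0$ the derivative is constant in $D_i$, so expected claims are proportional to distance. When $\alpha_1 \neq 0$ the marginal expected number of claims per kilometre varies with the distance already driven, through $p_i$.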
The probability of excess zeros increases with distance. The coefficient for the
logarithm of the number of kilometres driven per year in the logit model (which predicts
zero inflation) is positive, i.e. the probability of observing false zeros increases with
increasing distance. Moreover, we have shown that the ZIP model gives better results in
terms of goodness of fit than those obtained with the classical Poisson model (non-zero-
Here, therefore, we have shown both the significance of the impact of the distance
variable coefficient and the positive relationship between traffic violations involving
excess speed and urban driving with the expected number of claims. These results are in
line with reports issued by official traffic institutions where it is argued that speed limit
driving is rewarded.(20)
Previous traffic studies published in Risk Analysis (22,23) have stressed the desirability of
including risk exposure in terms of distance driven. We have shown that indeed vehicle
percentages of kilometres driven at night, over the speed limit, and in urban zones,
among others can be included in the ratemaking process thus improving the results
obtained when just using classical driver variables, such as age and gender. This raises
the question of whether pay-as-you-drive insurance should also consider a different price per mile
Our study shows that ZIP models with mileage as their offset variable can improve the
definition of drivers’ risk profiles and provide valuable policy guidelines that might be
consequence of a higher expected number of claims) could discourage the use of private
vehicles in cities, as called for by various European institutions (not least to reduce
violations, with an increase in the premium for drivers with a tendency to exceed the
speed limit.
REFERENCES
1. World Health Organization (2017). “10 facts on global road safety. Updated July
3. Lemaire J, Park SC, Wang KC. The use of annual mileage as a rating variable.
5. Paefgen J, Staake T, Fleisch E. Multivariate exposure modelling of accident risk:
8. Sivak M, Luoma J, Flannagan MJ, Bingham CR, Eby DW, Shope JT. Traffic
10. Edlin AS. Per-mile premiums for auto insurance. In: Arnott R, Greenwald B,
An approach with zero-inflated Poisson models for panel data. Journal of Risk
12. Vickrey W. Auto accidents, tort law, externalities and insurance: an economist’s
13. Shafique MA, Hato E. Use of acceleration data for transportation mode
14. Xu Y, Shaw SL, Zhao Z, Yin L, Fang Z, Li Q. Understanding aggregate human
mobility patterns using passive mobile phone location data: a home based
15. Ellison AB, Bliemer MCJ, Greaves SP. Evaluating changes in driver behaviour:
a risk profiling approach. Accident Analysis and Prevention, 2015; 75: 298-309.
16. Ayuso M, Guillen M, Perez-Marin AM. Time and distance to first accident and
17. Underwood G. On-road behaviour of younger and older novices during the first
six months of driving. Accident Analysis and Prevention, 2013; 58: 235-243.
18. Jun J, Guensler R, Ogle J. Differences in observed speed patterns between crash-
21. Baecke P, Bocca L. The value of vehicle telematics data in insurance risk
23. Mercer GW. Traffic accidents and convictions: group totals versus rate per
24. Gourieroux C, Monfort A, Trognon A. Pseudo maximum likelihood methods:
26. Chiappori PA, Salanié B. Testing for asymmetric information in insurance mar-
165.
28. Cameron AC, Trivedi PK. Regression analysis of count data. Cambridge
Berlin, 2003.
31. Boucher JP, Denuit M, Guillen M. Risk classification for claim counts: a
regression models of motor vehicle crashes: balancing statistical fit and theory.
33. Sarul LS, Sahin S. An application of claim frequency data using zero inflated
34. Mercer GW. Influences on Passenger Vehicle Casualty Accident Frequency and
Restraint Device Use. Accident Analysis and Prevention, 1987; 19: 231-236.
https://doi.org/10.1111/rssc.12283.
37. Wüthrich MV. Covariate selection from telematics car driving data. European
38. McRoberts DB, Quiring SM, Guikema SD. Improving hurricane power
https://doi.org/10.1111/risa.12728.
Research Institute of Applied Economics Working Paper 2014/17