
Curating A COVID-19 Data Repository and Forecasting County-Level Death Counts in The United States

This document summarizes a project that collected COVID-19 data from various sources and used that data to develop models for forecasting county-level COVID-19 death counts in the United States over the next week. The models include exponential and linear predictors fitted to different aspects of the spatial and temporal COVID-19 data. Forecasts from different models are combined using ensembling. Prediction intervals are also developed to quantify the uncertainty of the forecasts. Evaluation shows the forecasts adapt to the exponential and sub-exponential nature of outbreaks and the prediction intervals cover over 90% of recorded deaths for most counties from April 11 to May 10. The forecasts are being used by a nonprofit to distribute medical supplies.


Curating a COVID-19 data repository and forecasting county-level death counts in the United States
Nick Altieri1,†, Rebecca L Barter1, James Duncan6, Raaz Dwivedi2, Karl Kumbier3,
Xiao Li1, Robert Netzorg2, Briton Park1, Chandan Singh*2, Yan Shuo Tan1,
Tiffany Tang1, Yu Wang1, Bin Yu*1,2,4,5,6

arXiv:2005.07882v1 [stat.AP] 16 May 2020


1 Department of Statistics, University of California, Berkeley
2 Department of EECS, University of California, Berkeley
3 Department of Pharmaceutical Chemistry, University of California, San Francisco
4 Chan Zuckerberg Biohub, San Francisco
5 Center for Computational Biology, University of California, Berkeley
6 Division of Biostatistics, University of California, Berkeley
May 19, 2020

†Authors ordered alphabetically. All authors contributed significantly to this work.


*Corresponding authors
This project was initiated on March 21, 2020, with the goal of aiding the allocation of supplies to
different hospitals in the U.S., in partnership with the non-profit Response4Life.

Abstract
As the COVID-19 outbreak continues to evolve, accurate forecasting continues to play an extremely
important role in informing policy decisions. In this paper, we collate a large data repository
containing COVID-19 information from a range of different sources.1 We use this data to develop
several predictors and prediction intervals for forecasting the short-term (e.g., over the next week)
trajectory of COVID-19-related recorded deaths at the county-level in the United States. Specifically,
using data from January 22, 2020, to May 10, 2020, we produce several different predictors and
combine their forecasts using ensembling techniques, resulting in an ensemble we refer to as Combined
Linear and Exponential Predictors (CLEP). Our individual predictors include county-specific
exponential and linear predictors, an exponential predictor that pools data together across counties,
and a demographics-based exponential predictor. In addition, we use the largest prediction errors
in the past five days to assess the uncertainty of our death predictions, resulting in prediction
intervals that we refer to as Maximum (absolute) Error Prediction Intervals (MEPI). We show that
MEPI is an effective method in practice with a 94.5% coverage rate when averaged across counties.
Our forecasts are already being used by the non-profit organization, Response4Life, to determine
the medical supply need for individual hospitals and have directly contributed to the distribution of
medical supplies across the country. We hope that our forecasts and data repository can help guide
necessary county-specific decision-making and help counties prepare for their continued fight against
COVID-19.

1 All collected data, modeling code, forecasts, and visualizations are updated daily and available at
https://github.com/Yu-Group/covid19-severity-prediction.

1 Introduction
In recent months, the COVID-19 pandemic has dramatically changed the shape of our global society and economy
to an extent modern civilization has never experienced. Unfortunately, the vast majority of countries, the
United States included, were thoroughly unprepared for the situation we now find ourselves in. There
are currently many new efforts aimed at understanding and managing this evolving global pandemic.
This paper, together with the data we have collated, represents one such effort.
Our goal is to provide access to a large data repository (that combines data collected by a range of
different sources) and to provide a predictor to forecast short-term COVID-19 mortality at the county-
level in the United States along with uncertainty assessments of our predictors in the form of intervals.
Predicting the short-term impact of the virus in terms of the number of deaths (e.g., over the next
week) is critical for many reasons. Not only can it help elucidate the overall impacts of the virus, but it
can also help guide difficult policy decisions, such as when to impose/ease lock-downs. While many other
studies focus on predicting the long-term trajectory of COVID-19, these approaches are currently difficult
to verify due to a lack of long-term COVID-19 data. On the other hand, predictions for immediate short-
term trajectories are much easier to verify and are likely much more accurate than long-term forecasts.
Moreover, other predictive efforts have focused on modeling COVID-19 case- or death-counts at the
national or state-level, rather than the more fine-grained county-level that we consider in this paper.
As a result of researchers across academia and industry collectively refocusing their efforts towards
combating this universal viral threat we face, there are now a large number of papers covering many
dimensions of COVID-19 (see Section 6 for related work). However, to the best of the authors’ knowledge,
there is no related work addressing county-level predictions of COVID-19.
The predictions we produce in this paper focus on confirmed death counts, rather than confirmed
cases since confirmed cases fail to accurately capture the true prevalence of the virus due to limited
testing availability. Moreover, comparing different counties based on confirmed cases is difficult since
some counties have performed many more tests than others: the number of positive tests does not equal
the number of actual cases. We note that the confirmed death count is also likely to be an under-count
of the number of true COVID-19 deaths (since it seems as though in many cases only deaths occurring
in hospitals are being counted).2 Nonetheless, the confirmed death count is believed to be more reliable
than the confirmed case count.
In Section 2, we introduce our data repository and summarize the data sources contained within.
This data includes a wide variety of COVID-19 related information in addition to the county-level case-
and death-counts.
In Section 3, we introduce our predictive approach, wherein we fit a range of different exponential
and linear predictor models using the data. Each predictor captures a different aspect of the behaviors
exhibited by COVID-19, both spatially and temporally, i.e., across regions and time. The predictions
generated by the different methods are combined using an ensembling technique, which we refer to as
Combined Linear and Exponential Predictors (CLEP).
In Section 4, we develop uncertainty estimates for our predictors in the form of prediction intervals,
which we call Maximum (absolute) Error Prediction Intervals (MEPI). The ideas behind these intervals
come from conformal inference [44].
Section 5 details the evaluation of the predictors and the prediction intervals. Overall, we find that
our predictions are adaptive to the exponential and sub-exponential nature of the COVID-19 outbreak, and
our prediction intervals are reasonably narrow and cover the recorded number of deaths for more than
90% of days from April 11 to May 10 for most of the counties in the US.

2 https://www.nytimes.com/interactive/2020/04/28/us/coronavirus-death-toll-total.html
Finally, we describe related work by other authors in Section 6, we discuss the impact of our work
in distributing medical supplies across the country in Section 7, and conclude in Section 8. Supporting
material is provided in the Appendix.
Making both data and the methods used in this paper accessible to others is key to ensuring the
usefulness of these resources. Thus the data, code, and predictors we discuss in this paper are available
as open-source on GitHub and are updated daily. The results in this paper contain case and death
information in the U.S. from January 22, 2020, to May 10, 2020, but the data and forecasts in the
GitHub repository are updated daily. We also provide several visualizations of the daily updated data
and forecasts.3 In Figure 1, we provide a high-level summary of the contributions made in this work.

[Figure 1 schematic. Panels: "COVID-19 Data Repository" (COVID-19 Cases/Deaths + County-level Data + Hospital-level Data), "Multiple county-level predictors" (Predictor 1, Predictor 2, Predictor 3), "CLEP Ensemble + MEPI intervals" (ensemble predictor, prediction intervals), and "Visualizations". Plot axes: COVID-19 deaths versus time.]

Figure 1: An overview of the paper. We curate an extensive data repository combining data from multiple
data sources. We then build several predictors for county-level predictions of cumulative COVID-19 death
counts, and develop an ensembling procedure (CLEP) and a prediction interval scheme (MEPI) for these
predictions. Both CLEP and MEPI are generic machine learning methods and can be of independent
interest (see Sections 3.6 and 4.1 respectively). All the data, predictions, and visualizations are publicly
available on GitHub.

2 COVID-19 data repository


One of our primary contributions is the curation of a COVID-19 data repository that we have made
publicly available on GitHub. It is also updated daily with new information. Specifically, we have
compiled and cleaned a large corpus of hospital- and county-level data from 20+ public sources to
aid data science efforts to combat COVID-19. The repository currently includes data on COVID-19-
related cases, deaths, demographics, health resource availability, health risk factors, social vulnerability,
and other COVID-19-related information. In Table 1, we provide an overview of the county-level and
hospital-level data sources in our repository. The full corpus of data, along with further details and
extensive documentation, are available on GitHub. Note that similar but complementary county-level
data was recently aggregated and released in another study [33].
3 GitHub repo: https://github.com/Yu-Group/covid19-severity-prediction. Visualizations available at
https://covidseverity.com/ and https://geodacenter.github.io/covid/map.html, in collaboration with the
Center for Spatial Data Science at the University of Chicago.

Dataset | Description

COVID-19 Cases/Deaths Data
USAFacts* [10] | Daily cumulative number of reported COVID-19-related deaths and confirmed cases by US county, dating back to January 22, 2020 (updated daily)
New York Times [20] | Similar to the USAFacts dataset, but it includes aggregated death counts only in New York City without county breakdowns

Demographics and Health Resource Availability
Area Health Resources Files* [5] | Includes data on health facilities, professions, resource scarcity, economic activity, training programs, and socioeconomic characteristics (2018-2019)
Social Vulnerability Index [19] | Reports the CDC’s measure of social vulnerability from 2018
Health Professional Shortage Areas [14] | Provides data on areas having shortages of primary care, as designated by the Health Resources & Services Administration
Kaiser Health News ICU Beds* [16] | Number of hospitals, hospital employees, and ICU beds in each county
USDA Poverty Estimates [18] | Poverty estimates and median household income for each county

Health Risk Factors
County Health Rankings & Roadmaps* [2] | Estimates of various health outcomes and health behaviors (e.g., percentage of adult smokers, percentage of adults with obesity)
Interactive Atlas of Heart Disease and Stroke* [17] | Estimated heart disease and stroke death rate per 100,000 (all ages, all races/ethnicities, both genders, 2014-2016) from the CDC
US Chronic Respiratory Disease Mortality Rates* [21] | Estimated mortality rates of chronic respiratory diseases (1980-2014) from the Institute for Health Metrics and Evaluation
CMS Chronic Conditions [7] | Prevalence of 21 chronic conditions based upon CMS administrative enrollment and claims data for Medicare beneficiaries in the fee-for-service program
Compressed Mortality File [8] | Overall mortality rates (2012-2016) for each county from the National Center for Health Statistics
Diagnosed Diabetes Atlas* [12] | Estimated percentage of people who have been diagnosed with diabetes per county (2016) from the CDC

Social Mobility
JHU Date of Interventions Data [33] | Dates that counties (or states governing them) took measures to mitigate the spread by restricting gatherings
Google Community Mobility Reports [13] | Reports relative movement trends over time by geography and across different categories of places (e.g., retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential)
Apple Mobility Trends [4] | Uses data from Apple Maps to report a relative (to January 13th, 2020) volume of directions requests per country/region, sub-region or city

Miscellaneous
County Adjacency File* [9] | Lists each US county and its neighboring counties; from the US Census
Airline Origin and Destination Survey [3] | Survey data including origin, destination, and itinerary details from a 10% sample of airline tickets in 2019; from the Bureau of Transportation Statistics
County Presidential Data [25] | County-level returns for presidential elections from 2000 to 2016 according to official state election data records

Hospital-level
HIFLD Hospital Data [15] | Includes number of ICU beds, location, and number of employees for US hospitals; from Homeland Infrastructure Foundation-Level Data
Definitive Healthcare: USA Hospital Beds [11] | Provides data on number of licensed beds, staffed beds, ICU beds, and the bed utilization rate for hospitals in the US
Centers for Medicare & Medicaid Services Case Mix Index File [6] | Reports the Case Mix Index (CMI) for each hospital, which is a measure of the diversity, clinical complexity, and resource needs of the hospital’s patients
Centers for Medicare & Medicaid Services Teaching Hospitals [1] | Lists teaching hospitals along with address (2020)

Table 1: COVID-19 Data Repository: Overview of county- and hospital-level data sets. Data sets marked
with an asterisk (*) were used in the predictors discussed in this work.

In Sections 3, 4 and 5, we primarily use the county-level case and death reports provided by USAFacts
from January 22, 2020 to May 10, 2020, along with some county-level demographics and health
data (the datasets marked with an asterisk (*) in Table 1).

3 Predictors for forecasting short-term death counts


As a first step, we provide a visualization of the COVID-19 outbreak across the United States in Figure 2.
We plot (a) the cumulative recorded death counts due to COVID-19 up to May 10 (top panel), and (b)
the new death counts from May 1 to May 10 (bottom panel). Each bubble denotes the count in a county,
a darker and larger bubble denotes a higher death count, and lack of bubble denotes that the count is
zero. The top panel captures the extent of the outbreak in a region, while the bottom panel captures
the recent trends in the outbreak. The color scale in the two plots is different to better illustrate the
respective counts in each plot, but we keep the scales for the bubble size the same to provide a comparison
between the extent and recent trend of COVID-19. Overall, Figure 2 clearly shows that the COVID-19
outbreak in the United States is very dynamic both in time and across different regions. The worst-
affected regions include the states of New York, New Jersey, Massachusetts, Michigan, and parts of
Illinois, Florida, Louisiana, Georgia, Washington, and California. Moreover, the majority of these areas
continue to face a substantial COVID-19 burden in early May.
We fit several different statistical predictor models to capture the dynamic behavior of COVID-19
death counts. Since each method captures slightly different trends in the data, we also compute various
weighted combinations of these models. The five predictors we introduce in this paper are:

1. A separate-county exponential predictor (the “separate” predictors): a series of predictors
built for each county using only data from that county, used to predict deaths in that county.

2. A shared-county exponential predictor (the “shared” predictor): a single predictor built
using data from all counties, used to predict death counts for individual counties.

3. An expanded shared-county exponential predictor (the “expanded shared” predictor):
a predictor similar to the shared-county exponential predictor, but also includes COVID-19 case
numbers and neighboring county cases and deaths as predictive features.

4. A demographics shared-county exponential predictor (the “demographics shared” predictor):
a predictor also similar to the shared-county exponential predictor, but also includes
various county demographic and health-related predictive features.

5. A separate-county linear predictor (the “linear” predictor): a predictor similar to the
separate county exponential predictors, but uses a simple linear format, rather than the exponential
format.

An overview of these predictors is presented in Table 2. In order to combine the different trends
captured by each of these predictors, we also fit various combinations of them, which we refer to as
Combined Linear and Exponential Predictors (CLEP). CLEP produces a weighted average of the predictions
from the individual predictors, where we borrow the weighting scheme from prior work [42]. In this
weighting scheme, a higher weight is given to those predictors with more accurate predictions,
especially on recent time points. In practice, we find that the CLEP that combines only the expanded
shared predictor and the linear predictor consistently has the best predictive performance. For the
rest of this section, we expand upon the individual predictor models and the weighting procedure for
the CLEP ensembles.

Figure 2: Visualization of the COVID-19 outbreak in the US. We depict the cumulative recorded death
counts up to May 10 in the top panel and newly recorded death counts for the period May 1-10 in the
bottom panel. Each bubble denotes the death count for a county (lack of bubble denotes a zero count).
The bubble size (area) is proportional to the death counts in the region. The two panels’ bubble sizes
are on the same scale, but the color scale is different as shown respectively on each plot. Kings County,
NY in the New York metropolitan area suffered the largest number of deaths overall (6003 deaths). The
areas most significantly affected over the course of the pandemic continued to face a massive burden even
in early May. For the period May 1 to May 10, Cook County, IL in the Chicago metropolitan area saw
the largest number of new deaths (644 deaths).

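Predictor name | Type | Fit separately to each county? | Fit jointly to all counties? | Use neighboring counties? | Use demographics?
Separate | Exponential | Yes | No | No | No
Shared | Exponential | No | Yes | No | No
Expanded shared | Exponential | No | Yes | Yes | No
Demographics shared | Exponential | No | Yes | No | Yes
Linear | Linear | Yes | No | No | No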

Table 2: Overview of the 5 predictors used here. The best model is a combination of the expanded
shared predictor and the linear predictor (see Section 3.6).

3.1 The separate-county exponential predictors (the “separate” predictors)


The separate-county exponential predictor aims to capture the reported exponential growth of COVID-
19 deaths [37]. We approximate an exponential curve for death count separately for each county using
the most recent 5 days of data from that county. These predictors have the following form:

$$\mathbb{E}(\text{deaths}_t \mid t) = e^{\beta_0 + \beta_1 t}, \quad t = 1, \ldots, 5, \tag{1}$$

where $t$ denotes the day, and we fit a separate predictor for each county. The coefficients $\beta_0$ and $\beta_1$ are fit
for each county using maximum likelihood estimation under a Poisson generalized linear model (GLM)
with $t$ as the independent variable and $\text{deaths}_t$ as the observed variable. If the first death in a county
occurred less than 5 days prior to fitting the predictor, only the days from the first death were used
for the fit. If there is only one day’s worth of data, we simply predict the most recent value for future
values. We also fit exponential predictors to the full time-series (as opposed to just the most recent 5
days) of available data for each county, but due to the rapidly shifting trends, these performed worse
than our 5-day predictors. We also found that predictors fit using 6 days of data yielded similar results
to predictors fit using 5 days of data, and using 4 days of data performed slightly worse.
To handle possible overdispersion of the data (when the variance is larger than the mean), we also
explored estimating $\beta_0, \beta_1$ by fitting a negative binomial regression model (in place of the Poisson GLM)
with inverse-scale parameter taking values in {0.05, 0.15, 1}. However, we found that this approach yields
a larger mean absolute error than the Poisson GLM for counties with more than 10 deaths.
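The 5-day Poisson fit described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a hand-written Newton iteration in place of a GLM library routine, and the function names `fit_poisson_exponential` and `predict_separate` are hypothetical.

```python
import math


def fit_poisson_exponential(deaths, n_iter=100, tol=1e-10):
    """Poisson maximum likelihood fit of E[deaths_t] = exp(b0 + b1 * t), t = 1..n.

    Newton's method on the two coefficients (an assumed stand-in for a GLM
    routine). `deaths` holds the n most recent daily cumulative counts.
    """
    n = len(deaths)
    ts = range(1, n + 1)
    b0 = math.log(max(sum(deaths) / n, 1e-8))  # start from a flat curve at the mean
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * t) for t in ts]
        # Score: gradient of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(deaths, mu))
        g1 = sum(t * (y - m) for t, y, m in zip(ts, deaths, mu))
        # Fisher information: the 2x2 negative Hessian.
        h00 = sum(mu)
        h01 = sum(t * m for t, m in zip(ts, mu))
        h11 = sum(t * t * m for t, m in zip(ts, mu))
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def predict_separate(deaths, k, window=5):
    """Predict cumulative deaths k days ahead for one county."""
    # Keep only days since the first death within the most recent window.
    recent = [d for d in deaths[-window:] if d > 0]
    if len(recent) <= 1:
        # At most one day of data: carry the latest value forward.
        return float(deaths[-1])
    b0, b1 = fit_poisson_exponential(recent)
    return math.exp(b0 + b1 * (len(recent) + k))
```

For example, `predict_separate(county_series, k=3)` would return a 3-day-ahead forecast from the last five entries of a hypothetical list `county_series`.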

3.2 The shared-county exponential predictor (the “shared” predictor)


To incorporate additional data into our predictions, we fit a predictor that combines data across different
counties. Rather than producing a separate predictor model for each county (as in the separate predictor
approach above), we instead produce a single shared predictor that pools information from counties
across the nation. The shared predictor is then used to predict future deaths in the individual counties.
The data underlying the shared predictor is slightly different from the separate county predictors.
Instead of only including the most recent 5 days of data from each county, we include all days after the
third death in each county. Thus the data from many of the counties extend substantially further back
than 5 days. By using data that extends much further back, the early-stage data from counties that
are now much further along could inform the predictions for current earlier-stage counties. Instead of
basing the exponential predictor prediction on time t (as was the case for the separate predictors above),
we base the prediction on the (logarithm of the) previous day’s death count. This makes the counties
comparable since the outbreaks began at different time points in each county. The shared predictor is
given as follows:

$$\mathbb{E}(\text{deaths}_t \mid t) = e^{\beta_0 + \beta_1 \log(\text{deaths}_{t-1} + 1)}, \tag{2}$$

where the coefficients β0 and β1 are fitted by maximizing the log-likelihood corresponding to Poisson
GLM (like that in the separate county predictor (1)).
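To make the pooling concrete, the sketch below builds the pooled training pairs for equation (2) and iterates the fitted curve forward. It rests on two assumptions that the text does not spell out: "all days after the third death" is read as every day whose previous-day count is at least 3, and multi-day forecasts are obtained by feeding each prediction back in. The function names are hypothetical.

```python
import math


def pooled_training_pairs(county_deaths, min_deaths=3):
    """Build (x, y) = (log(deaths_{t-1} + 1), deaths_t) pairs pooled over counties.

    `county_deaths` maps a county id to its list of daily cumulative deaths.
    ASSUMPTION: a day is included once the previous day's count is >= min_deaths.
    """
    pairs = []
    for series in county_deaths.values():
        for prev, cur in zip(series, series[1:]):
            if prev >= min_deaths:
                pairs.append((math.log(prev + 1), cur))
    return pairs


def forecast_shared(b0, b1, last_deaths, k):
    """Apply equation (2) k times, feeding each prediction back in (assumed)."""
    pred = float(last_deaths)
    for _ in range(k):
        pred = math.exp(b0 + b1 * math.log(pred + 1))
    return pred
```

Because the feature is the previous day's count rather than calendar time, a county that reached 50 deaths in March and one that reaches 50 deaths in May contribute comparable training pairs.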

3.3 The expanded shared predictor


Next, we expand the shared county exponential predictor to include other COVID-19 dynamic (time-
series) features. In particular, we include the number of confirmed cases in the county as this may give
an additional indication to the severity of an outbreak, as well as the number of confirmed deaths and
cases in neighboring counties. Let $\text{cases}_t$, $\text{neigh\_deaths}_t$, and $\text{neigh\_cases}_t$ respectively denote the number of
cases in the county at time $t$, the total number of deaths across all neighboring counties at time $t$, and
the total number of cases across all neighboring counties at time $t$. Then our (expanded) predictor to
predict the number of confirmed deaths $k$ days into the future is given by

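$$\mathbb{E}[\text{deaths}_t \mid t] = e^{\beta_0 + \beta_1 \log(\text{deaths}_{t-1} + 1) + \beta_2 \log(\text{cases}_{t-k} + 1) + \beta_3 \log(\text{neigh\_deaths}_{t-k} + 1) + \beta_4 \log(\text{neigh\_cases}_{t-k} + 1)}, \tag{3}$$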

where the coefficients $\{\beta_i\}_{i=0}^{4}$ are shared across all counties and are fitted using the Poisson GLM. When
fitting the predictor at time $t$, we use the death counts for the county up to $t − 1$; however, we only use
the new features (cases in the current county, cases in neighboring counties, and deaths in neighboring
counties) up to time t − k. While predicting the death count for a given county k days into the future
(for time t + k), we iteratively use the daily sequential predictions for the death counts for that county,
and use the information for the other features only up to time t (the time up to which we have data
available)4 . It may be possible to jointly predict the new features along with the number of deaths, but
we leave this to future work.
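The recursive k-day-ahead procedure can be illustrated as follows. This is a sketch under stated assumptions: the coefficients `beta` are taken as already fitted on the raw log features (feature scaling and regularization are omitted), and the function name `forecast_expanded` is hypothetical.

```python
import math


def forecast_expanded(beta, deaths, cases, neigh_deaths, neigh_cases, k):
    """k-day-ahead forecast obtained by applying equation (3) recursively.

    All four series are equal-length lists whose last entry is day t.
    The prediction for day t + j uses the (predicted) deaths of day t + j - 1
    and the other features from day t + j - k, which is never later than day t.
    """
    b0, b1, b2, b3, b4 = beta
    t = len(deaths) - 1  # index of the last observed day
    if k < 1 or t + 1 - k < 0:
        raise ValueError("need k >= 1 and at least k days of history")
    pred = float(deaths[t])
    for j in range(1, k + 1):
        lag = t + j - k  # index of day (t + j) - k
        eta = (
            b0
            + b1 * math.log(pred + 1)
            + b2 * math.log(cases[lag] + 1)
            + b3 * math.log(neigh_deaths[lag] + 1)
            + b4 * math.log(neigh_cases[lag] + 1)
        )
        pred = math.exp(eta)
    return pred
```

Only the death count is rolled forward with predictions; the case and neighboring-county features are always read from observed data, which is why they enter with a lag of k days.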
For this predictor, we found it beneficial to implement feature scaling and regularization. We scaled
all features to have mean 0 and variance 1 and applied elastic net with an equal penalty on the $\ell_1$ and $\ell_2$
regularization terms. The regularization penalty of 0.01 was chosen through cross-validation on previous
days’ data.

3.4 The demographics shared predictor


The demographics shared county exponential predictor (the “demographics shared” predictor) is again
very similar to the shared predictor. However, it includes several static county demographic and
healthcare-related features to address the fact that some counties will be affected more severely than
others, for instance due to (a) their population makeup, e.g., older populations are likely to experience a
higher death rate than younger populations, (b) their hospital preparedness, e.g., if a county has very few
ICU beds relative to its population, it might experience a higher death rate since the number of ICU
beds is correlated strongly (0.96) with the number of ventilators [41], and (c) their population health,
e.g., age, smoking history, diabetes, cardiovascular disease, and respiratory diseases are all considered to
be likely risk factors for acute COVID-19 infection [30, 40, 31, 29, 46].

4 More precisely, first we estimate $\widehat{\text{deaths}}_{t+1}$ using $(\text{deaths}_t, \text{cases}_{t-k+1}, \text{neigh\_deaths}_{t-k+1}, \text{neigh\_cases}_{t-k+1})$. Then, for
$j = 1, 2, \ldots, k − 1$, we recursively plug in $(\widehat{\text{deaths}}_{t+j}, \text{cases}_{t-k+j+1}, \text{neigh\_deaths}_{t-k+j+1}, \text{neigh\_cases}_{t-k+j+1})$ in
equation (3) to estimate $\widehat{\text{deaths}}_{t+j+1}$, and finally obtain an estimate $\widehat{\text{deaths}}_{t+k}$ for $k$ days ahead.
For a county $c$, given a set of demographic and healthcare-related features $d^c_1, \ldots, d^c_m$ (such as median
age, population density, or number of ICU beds), the demographics shared predictor is given by

$$\mathbb{E}[\text{deaths}_t \mid t, c] = e^{\beta_1 \log(\text{deaths}_{t-1} + 1) + \beta_0 + \beta_{d_1} d^c_1 + \cdots + \beta_{d_m} d^c_m}, \tag{4}$$

where the coefficients $\{\beta_0, \beta_1, \beta_{d_1}, \ldots, \beta_{d_m}\}$ are fitted by maximizing the log-likelihood of the
corresponding Poisson generalized linear model. The features we choose fall into three categories:

1. County density and size: population density per square mile (2010), population estimate (2018)

2. County healthcare resources: number of hospitals (2018-2019), number of ICU beds (2018-2019)

3. County health demographics: median age (2010), percentage of the population who are smokers (2017),
percentage of the population with diabetes (2016), deaths due to respiratory diseases per 100,000 (2017),
deaths due to heart diseases per 100,000 (2014-2016).

3.5 The separate county linear predictor (the “separate linear” predictor)
We also fit a linear version of the separate county predictors based on the most recent 4 days of data in
each county. The motivation for the linear model is that some counties are now exhibiting sub-exponential
growth. For these counties, the exponential predictors introduced in the previous section may not be a
good fit to the data. The separate linear predictors are given by

$$\mathbb{E}[\text{deaths}_t \mid t] = \beta_0 + \beta_1 t, \tag{5}$$

where we fit the coefficients β0 and β1 using ordinary least squares. In the following section, we introduce
the Combined Linear and Exponential Predictor (CLEP), which incorporates the abilities of our expo-
nential predictors (to deal with exponential trends) and linear predictor (to deal with sub-exponential
trends). In practice, we found that combining the expanded shared predictor and the linear predictor has
the best predictive performance.
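As a small illustration of the linear predictor, the sketch below fits equation (5) by ordinary least squares on the last four days and extrapolates. The closed-form simple-regression formulas are standard; the function names and the single-day fallback are assumptions made for the example.

```python
def fit_linear(deaths, window=4):
    """Ordinary least squares fit of deaths_t = b0 + b1 * t on the last `window` days."""
    recent = deaths[-window:]
    n = len(recent)
    ts = range(1, n + 1)
    t_mean = sum(ts) / n
    y_mean = sum(recent) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    if sxx == 0:
        # Single day of data (assumed fallback): flat line at the observed value.
        return y_mean, 0.0
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, recent))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


def predict_linear(deaths, k, window=4):
    """Extrapolate the fitted line k days past the last observed day."""
    b0, b1 = fit_linear(deaths, window)
    n = min(len(deaths), window)
    return b0 + b1 * (n + k)
```

With cumulative counts of 10, 12, 14, 16 over the last four days, the fitted slope is 2 deaths per day, so the 3-day-ahead forecast is 22.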

3.6 The combined predictors: CLEP


Finally, we consider various combinations of the five predictors we have introduced using an ensemble
approach similar to that described in [42]. The Combined Linear and Exponential Predictors (CLEPs)
are developed as follows.
Let us first consider the procedure for generating a combined predictor for any two of our predictors.
Let $\hat{y}^1_{t+k}$ and $\hat{y}^2_{t+k}$ be the predictions of (cumulative) deaths by day $t + k$ made on day $t$ by the two
predictors that we can index arbitrarily by predictor 1 and 2. Note that on day t we only have access
to complete confirmed cases and recorded deaths data up to day t − 1, because recorded deaths and
confirmed cases are not fully updated until the end of the day. The prediction of the combined estimates
of deaths by day t + k can be written as

$$\hat{y}^{\text{combined}}_{t+k} = w^1_t \hat{y}^1_{t+k} + w^2_t \hat{y}^2_{t+k}, \tag{6}$$

where $w^1_t \geq 0$ and $w^2_t \geq 0$ represent the weights of the first and second predictors respectively, and
$w^1_t + w^2_t = 1$. We select weights for the two predictors based on their past predictive performance,
using an exponential decay term (a function of $t$). As a result, more recent predictive performance has
more influence on the weight term than less recent performance. Let $\hat{y}^m_i$ (where $m = 1, 2$) denote the
predicted number of deaths from predictor $m$ for day $i$, $y_i$ denote the recorded deaths for day $i$, and
$\ell(\hat{y}^m_i, y_i)$ denote a loss function (used for measuring predictive performance). Then following [42], the
exponential weighting term $w^m_t$ for predictor $m$ applied on day $t$ is given by

$$w^m_t \propto \exp\left(-c(1 - \mu) \sum_{i=t_0}^{t-1} \mu^{t-i}\, \ell(\hat{y}^m_i, y_i)\right), \tag{7}$$

where $\mu \in (0, 1)$ and $c > 0$ are tuning parameters, $t_0$ represents some past time point, and $t$ represents
the day on which the prediction is calculated. Since $\mu < 1$, the $\mu^{t-i}$ term represents the greater influence
given to more recent predictive performance. Note that the loss terms $\ell(\hat{y}^m_i, y_i)$ used in the weights are
calculated based on the 3-day-ahead predictions generated over the course of a week, starting with the
predictor built 11 days ago (for predicting counts 8 days ago) up to the predictor built 4 days ago (for
predicting yesterday’s counts). We found that using the 3-day predictive performance of each model
across the past week in the weights performed well.
In [42], the authors choose $\ell(\hat{y}^m_i, y_i) = |\hat{y}^m_i - y_i|$ as their loss function, since their errors roughly had
a Laplacian distribution. In our case, we found that using this loss function led to vanishing weights
due to the heavy-tailed nature of our error distribution. To help address this, we apply a logarithm to
the predictions and the true values, and define $\ell(\hat{y}^m_i, y_i) = |\log(1 + \hat{y}^m_i) - \log(1 + y_i)|$, where we add a
one inside the logarithm to handle potential zero values. We found that this transformation improved
performance in practice.
To generate our predictions, we use the default value of c in [42] which is 1. However, we change the
value of µ from the default of 0.9 to 0.5 for two reasons: (i) we found µ = 0.5 yielded better empirical
performance, and (ii) it ensured that performance more than a week ago had little influence over the
predictor. We chose t0 = t − 7 (i.e., we aggregate the predictions of the past week into the weight term),
since we found that performance did not improve by extending further back than 7 days. (Moreover, the
information from more than a week ago effectively has a vanishing effect due to our choice of µ.) Thus, in
practice, we used weights for predictor m of the form:

$$w_t^m \propto \exp\left(-0.5\sum_{i=t-7}^{t-1}(0.5)^{t-i}\,\bigl|\log(1 + \hat{y}_i^m) - \log(1 + y_i)\bigr|\right), \tag{8}$$

where $\hat{y}_i^m$ is the 3-day-ahead prediction from predictor $m$ trained on data up to time $i - 3$. We compute
these weights separately for each county.
Our CLEP ensemble approach can be easily extended to more than two predictors. Given $M$ predictors,
we compute each weight $w_t^m$ in the same way, and then normalize them so that $\sum_{m=1}^{M} w_t^m = 1$.
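
A minimal Python sketch of the weighting scheme in equations (6) to (8), generalized to M predictors, is given below. It is an illustration under stated assumptions, not the code used to produce the results in this paper: the function names `clep_weights` and `clep_predict` and the list-based input layout are hypothetical, and the inputs are assumed to hold, for a single county, each predictor's 3-day-ahead predictions for days t − 7, ..., t − 1 together with the recorded counts for those days.

```python
import math


def clep_weights(preds, actual, mu=0.5, c=1.0):
    """Return normalized ensemble weights, one per predictor.

    preds:  list of M lists; preds[m][j] is predictor m's 3-day-ahead
            prediction for the j-th day of the window (oldest first).
    actual: recorded counts for the same days (oldest first).
    Names and input layout are illustrative assumptions.
    """
    n = len(actual)
    scores = []
    for p in preds:
        # Day j of the window lies (n - j) days before day t, so its loss
        # is discounted by mu ** (n - j); yesterday's loss gets mu ** 1.
        weighted_loss = sum(
            mu ** (n - j) * abs(math.log1p(p[j]) - math.log1p(actual[j]))
            for j in range(n)
        )
        scores.append(math.exp(-c * (1 - mu) * weighted_loss))
    total = sum(scores)
    # Normalize so that the weights sum to one.
    return [s / total for s in scores]


def clep_predict(weights, forecasts):
    """Combine the predictors' k-day-ahead forecasts as in equation (6)."""
    return sum(w * f for w, f in zip(weights, forecasts))
```

With a seven-day window and the defaults `mu=0.5`, `c=1.0`, the exponent matches equation (8). Two predictors with identical recent losses receive equal weights, and a predictor with larger recent log-scale errors is down-weighted.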

4 Prediction Intervals via Conformal Inference


Accurate assessment of the uncertainty of forecasts is necessary to help determine how much emphasis to
put on the predictions, for instance, when making policy decisions. As such, the next goal of the paper
is to quantify the uncertainty of our predictions by creating prediction intervals. A common method to
do so involves constructing (probabilistic) model-based confidence intervals, which rely heavily on the
probabilistic assumptions made about the data. However, due to the highly dynamic nature of COVID-19,
assumptions on the distribution of death and case rate are challenging to check. Moreover, such
prediction intervals based on probability models are likely to be invalid when the underlying probability
model does not hold to the desired extent. For instance, a recent study [35] reported that the 95%
uncertainty credible intervals for state-level daily mortality predicted by the initial IHME model [22],
had a coverage of a mere 27% to 51% of recorded death counts over March 29 to April 2. The authors
of the IHME model noted this behavior, and have since updated their uncertainty intervals so that they
now provide more than 95% coverage (where coverage is defined below in equation (10a)). However,
while the previous releases of the intervals were based on asymptotic confidence intervals, the IHME
authors have not precisely described the methodology for their more recent intervals.

4.1 Maximum-absolute-Error Prediction Interval (MEPI)


We now introduce a generic method to construct prediction intervals for sequential data. In particular,
we build on the ideas from conformal inference [44] and make use of the past errors made by a predictor
to estimate the uncertainty for its future predictions.
To construct prediction intervals for county-level cumulative death counts caused by COVID-19, we
calculate the largest (normalized absolute) error for the death count predictions generated over the past
5 days for the county of interest and use this value (the “maximum absolute error”) to create an interval
surrounding the future (e.g., tomorrow’s) prediction. We call this interval the Maximum absolute Error
Prediction Interval (MEPI).
Mathematically, let $y_t$ be the actual recorded cumulative deaths for day $t$, and $\hat{y}_t$ denote the estimate
for $y_t$ made $k$ days earlier, i.e., on day $t - k$, by a prediction algorithm. We call $\hat{y}_t$ the $k$-day-ahead
prediction for day $t$. We define the normalized absolute error, $\Delta_t$, of the prediction, $\hat{y}_t$, to be

$$\Delta_t := \frac{|y_t - \hat{y}_t|}{|\hat{y}_t|}.$$

We use the normalization so that yt is equal to either ybt (1 − ∆t ) or ybt (1 + ∆t ). This normalization
addresses the fact that the counts are increasing over time, and thus the un-normalized errors, |yt − ybt |,
also tend to be increasing over time. The normalization ensures that the errors across time are comparable
in magnitude, which is essential for the exchangeability of the errors (see Section 4.3).
To compute the k-day-ahead prediction interval for day t + k, to be computed on day t, we first
compute the k-day-ahead prediction ybt+k using a CLEP. Next, we compute the normalized errors for the
most recent 5 days ∆t , ∆t−1 , ..., ∆t−4 (5 days was chosen to balance the trade-off between coverage and
length, see Appendix A.2 for more details). The largest of these normalized errors is then used to define
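the maximum absolute error prediction intervals (MEPI) for the k-day-ahead prediction as follows:

$$\widehat{\mathrm{PI}}_{t+k} := \Bigl[\max\bigl\{\hat{y}_{t+k}(1 - \Delta_{\max}),\; y_t\bigr\},\;\; \hat{y}_{t+k}(1 + \Delta_{\max})\Bigr], \tag{9a}$$
$$\text{where } \Delta_{\max} := \max_{0 \le j \le 4} \Delta_{t-j}. \tag{9b}$$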

We construct these intervals separately for each county. The lower bound takes a maximum with the
last observed value yt to account for the fact that the cumulative count is non-decreasing. This maximum
ensures that the lower bound for the interval is not smaller than the last observed value.
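
The construction in equations (9a) and (9b) can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used for the reported results: the function names are hypothetical, the inputs are assumed to be the recorded counts and the matching k-day-ahead predictions for the five most recent days, and the predictions are assumed to be nonzero so that the normalized error is defined.

```python
def normalized_error(actual, predicted):
    """Normalized absolute error |y - yhat| / |yhat| (assumes yhat != 0)."""
    return abs(actual - predicted) / abs(predicted)


def mepi(pred_future, last_observed, recent_actual, recent_pred):
    """Return (lower, upper) of the MEPI for the k-day-ahead prediction.

    pred_future:   k-day-ahead prediction for day t + k.
    last_observed: recorded cumulative count on day t.
    recent_actual, recent_pred: recorded counts and their k-day-ahead
        predictions for days t-4, ..., t (five values each).
    """
    delta_max = max(
        normalized_error(y, yhat) for y, yhat in zip(recent_actual, recent_pred)
    )
    # Cumulative counts cannot decrease, so the lower bound is floored
    # at the last observed value.
    lower = max(pred_future * (1 - delta_max), last_observed)
    upper = pred_future * (1 + delta_max)
    return lower, upper
```

For example, if the largest normalized error over the past five days is 0.1 and the prediction for day t + k is 200, the interval is roughly [180, 220], unless the last observed count already exceeds 180, in which case the lower end is raised to that count.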

4.2 Evaluation metrics
A good prediction interval should both contain the true value most of the time (have good coverage)
and have a reasonable width/length.5 Indeed, one can trivially create very wide prediction intervals that
would always contain the target of interest. We thus consider two metrics to measure the performance
of prediction intervals: coverage and normalized length.
Let $y_t$ denote a positive real-valued time-series of interest, which in this case is the target variable:
COVID-19 deaths ($t$ denotes the time index). Let $\{\widehat{\mathrm{PI}}_t = [a_t, b_t]\}$ denote the sequence of prediction
intervals produced by an algorithm. The coverage of this prediction interval over a specified period,
Coverage(T ), denotes the fraction of days in this period for which the prediction interval contained the
observed cumulative death counts, for that county. This notion of coverage for streaming data has been
used extensively in prior works on conformal inference [44] and can be calculated for a given evaluation
period T (which we set to be from April 11 to May 10) as follows:

$$\mathrm{Coverage}(T) = \frac{1}{|T|} \sum_{t \in T} \mathbb{I}\bigl(y_t \in \widehat{\mathrm{PI}}_t\bigr). \tag{10a}$$

The average normalized length of the prediction intervals, NL(T ), is calculated as follows:

$$\mathrm{NL}(T) = \frac{1}{|T|} \sum_{t \in T} \frac{b_t - a_t}{y_t}. \tag{10b}$$

In practice, we replace the denominator on the RHS of expression (10b) with max{1, yt } to avoid possible
division by 0.
Importantly, the definitions of coverage (10a) and the average length (10b) are entirely data-driven
and do not rely on any probabilistic or generative modeling assumptions.
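
As an illustration, the two metrics (10a) and (10b) can be computed as in the following Python sketch, which includes the max{1, yt} safeguard in the denominator. The function names and the representation of each interval as an `(a, b)` tuple are assumptions made for this example.

```python
def coverage(actual, intervals):
    """Fraction of days whose recorded count lies inside its interval (10a)."""
    hits = sum(1 for y, (a, b) in zip(actual, intervals) if a <= y <= b)
    return hits / len(actual)


def normalized_length(actual, intervals):
    """Average interval length divided by max(1, recorded count) (10b)."""
    total = sum((b - a) / max(1, y) for y, (a, b) in zip(actual, intervals))
    return total / len(actual)
```

Both functions take the recorded counts and the prediction intervals for the days of a single county's evaluation period, so they need no distributional assumptions, only the observed values and the interval endpoints.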

4.3 Exchangeability of the errors


While the ideas from MEPI are a special case of conformal prediction intervals [44, 43], there are some
key differences. Where conformal inference uses the raw errors in predictions, MEPI uses the normalized
errors, and where conformal inference uses a percentile (e.g., the 95th percentile) of the errors, MEPI uses
the maximum. Furthermore, we only make use of the previous five days instead of the full sequence of
errors. The reason for these choices is that the validity of prediction intervals constructed
in this manner relies crucially on the assumption that the sequence of errors is exchangeable. Our
choices are designed to make this assumption more reasonable. Due to the dynamic nature of COVID-19,
considering a longer period (e.g., substantially longer than five days) would mean that it is less
likely that the errors across the different days are exchangeable. Meanwhile, the normalization of the
errors eliminates a potential source of non-exchangeability by removing the sequential growth of the
errors resulting from the increasing nature of the counts themselves. Since we only use five time points to
construct the interval, the 95th percentile can be rounded up to the maximum.
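
To see what using the maximum implies, the following short calculation may help. It is a sketch under idealized assumptions that are not claimed to hold exactly in practice: the six normalized errors are exchangeable, there are no ties among them, and the prediction for day t + k is positive. Under these assumptions the future error is equally likely to take any of the six ranks, so

```latex
\mathbb{P}\Bigl(\Delta_{t+k} \le \max_{0 \le j \le 4} \Delta_{t-j}\Bigr)
  = 1 - \mathbb{P}\bigl(\Delta_{t+k} \text{ has rank } 6\bigr)
  = 1 - \tfrac{1}{6}
  = \tfrac{5}{6} \approx 83\%.
```

The event on the left-hand side is the event that the recorded count for day t + k falls between the prediction scaled by (1 − ∆max) and the prediction scaled by (1 + ∆max).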
Figure 3 provides empirical evidence that the exchangeability of the normalized residuals/errors under
our 5-day window is indeed reasonable. We rank the errors {∆t+5 , ∆t , ∆t−1 , . . . , ∆t−4 } in increasing order
so that the largest error has a rank of 6. If the errors were exchangeable, then for each of them, the
rank has a uniform distribution on {1, 2, 3, 4, 5, 6}, and in particular has a mean of 3.5. To approximate
this numerically, we measure the rank of the errors ∆t+k (with k = 5) and ∆t−j , j = 0, . . . , 4, for each day t between
March 27 and April 10, and take an average. Figure 3 plots the results for each of the six worst-affected counties
as well as six randomly selected counties (see Section 5.2 for more discussion on these counties).

5 We use the terms width and length for an interval interchangeably in this paper.
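
The rank-averaging check described above can be sketched as follows. This is an illustrative Python sketch, not the code behind Figure 3: the function names are hypothetical, each window is assumed to list the six normalized errors for one day t in a fixed order (for days t − 4, ..., t, then t + 5), and ties, if any, are broken by position.

```python
def ranks(values):
    """Rank values in increasing order: smallest gets 1, largest gets len(values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def average_ranks(error_windows):
    """Average, over days, the rank of the error at each position in the window."""
    width = len(error_windows[0])
    totals = [0] * width
    for window in error_windows:
        for pos, r in enumerate(ranks(window)):
            totals[pos] += r
    return [s / len(error_windows) for s in totals]
```

If the errors were exchangeable, each entry of the returned list would be close to 3.5 when averaged over many days; an entry for t + 5 that is consistently near 6 would indicate that future errors tend to exceed recent ones.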

[Figure 3: two 2×3 grids of panels, one panel per county. (a) Six worst-affected counties: Kings County, NY; Queens County, NY; Bronx County, NY; New York County, NY; Cook County, IL; Wayne County, MI. (b) Six randomly-selected counties: Suffolk County, NY; Bergen County, NJ; Oakland County, MI; Monmouth County, NJ; Broward County, FL; Dougherty County, GA. x-axis: Error (t − 4, t − 3, t − 2, t − 1, t, t + 5); y-axis: Average rank of normalized error.]

Figure 3: Average rank of the six errors {∆t+5 , ∆t , ∆t−1 , . . . , ∆t−4 } of our CLEP (with the expanded
shared and linear predictors), computed over t = March 27, . . . , April 10, for six worst affected counties
(top panel) and six random counties (bottom panel). We rank the errors {∆t+5 , ∆t , ∆t−1 , . . . , ∆t−4 } in
increasing order so that the largest error has a rank of 6. If {∆t+5 , ∆t , ∆t−1 , . . . , ∆t−4 } are exchangeable
for any day t, then the expected average rank for each of the six errors would be 3.5 (dashed black line).

See Appendix A.2 for more details on these choices and some theoretical guarantees for coverage.

5 Results
In this paper, we focused on short-term (up to 7 days) predictive accuracy. In this section, we first
present and compare the results of our various predictors, and then give further examinations of the best
performing predictor: the CLEP ensemble predictor with the expanded shared predictor and the linear
predictors. Finally, we report the performance of the coverage and length of the MEPIs for this CLEP.

5.1 Empirical performance of various predictors
Table 3 summarizes the Mean Absolute Errors (MAEs) of our predictions for cumulative recorded deaths
using both raw and log-scale. Each row in Table 3 corresponds to a single predictor, and we report
different statistics of errors made for k-day-ahead predictions for k ∈ {3, 5, 7} over the period t ∈ T =
{March 22, . . . , May 10}. For a given day t, let Ct , t ∈ T denote the collection of counties in US that have
at least 10 cumulative deaths, and let ybtc and ytc denote the predicted and recorded cumulative death
count of county c ∈ Ct on day t. Then the raw and log-scale MAE on day t are

$$\text{Raw-scale MAE}_t = \frac{1}{|C_t|} \sum_{c \in C_t} |\hat{y}_t^c - y_t^c|, \quad \text{and} \tag{11a}$$
$$\text{Log-scale MAE}_t = \frac{1}{|C_t|} \sum_{c \in C_t} |\log(\hat{y}_t^c + 1) - \log(y_t^c + 1)|. \tag{11b}$$

For any given predictor, we compute these errors for each day and report the 10th percentile (p10), 50th
percentile (median), and 90th percentile (p90) values over the period T . From the table, we find that
the CLEP ensemble that combines the expanded shared exponential predictor and the separate county
linear predictors has the best overall performance.

                         | 3-day-ahead prediction | 5-day-ahead prediction | 7-day-ahead prediction
Predictor                |    p10  median     p90 |    p10  median     p90 |    p10  median     p90
separate                 |   8.21   16.18   29.08 |   9.13   19.96   53.77 |  10.15   27.35  110.89
shared                   |  11.55   15.05   21.44 |  12.29   19.76   36.74 |  12.52   23.33   59.25
demographics             |  15.22   42.09   49.15 |  17.69   46.33  103.04 |  21.02   59.70  233.13
expanded shared          |   9.73   12.06   16.24 |  10.12   15.45   24.29 |  10.25   18.62   38.18
linear                   |   6.43    9.58   15.44 |   7.46   12.07   20.44 |   7.85   14.87   26.08
CLEP (expanded + linear) |   5.84    9.25   15.56 |   6.65   11.05   21.40 |   7.07   13.55   25.91

(a) Summary statistics of raw scale MAE

                         | 3-day-ahead prediction | 5-day-ahead prediction | 7-day-ahead prediction
Predictor                |    p10  median     p90 |    p10  median     p90 |    p10  median     p90
separate                 |   0.11    0.20    0.62 |   0.13    0.28    0.95 |   0.14    0.35    1.34
shared                   |   0.11    0.17    0.31 |   0.12    0.21    0.40 |   0.13    0.26    0.49
demographics             |   0.16    0.20    0.46 |   0.17    0.27    0.58 |   0.18    0.32    0.88
expanded shared          |   0.09    0.14    0.31 |   0.10    0.17    0.44 |   0.11    0.20    0.55
linear                   |   0.10    0.15    0.49 |   0.11    0.19    0.68 |   0.12    0.22    1.08
CLEP (expanded + linear) |   0.08    0.13    0.33 |   0.09    0.16    0.44 |   0.10    0.19    0.52

(b) Summary statistics of log scale MAE

Table 3: Summary statistics of Mean Absolute Errors (Top: raw scale; Bottom: log scale) of our
predictors for cumulative death counts in the past 50 days (March 22, 2020 to May 10, 2020). “p10”,
“median”, and “p90” denote the 10th-percentile, median, and 90th-percentile of the 50 mean absolute
errors computed daily from March 22, 2020 to May 10, 2020. “k-day-ahead prediction” (k = 3, 5, 7)
indicates predictions made k days ahead of each day from March 22, 2020 to May 10, 2020. The smallest
error in each column is displayed in bold.

In Figure 4, we compare the raw scale and log scale MAEs as a function of time over the past 50
days for the expanded shared exponential predictor, the separate county linear predictor, and the CLEP
that combines the two. We found that the MAEs of the CLEP are often similar to, and usually slightly
smaller than the smaller MAE of the two single predictors. Putting together the results from Table 3
and Figure 4, we find that the adaptive combination used for building our ensemble predictor CLEP is
able to leverage the advantages of linear and exponential predictors, and improves upon the MAE of
single predictors.

[Figure 4: two line plots with Date on the x-axis (March 22 to May 10). Left panel y-axis: Raw scale MAE; right panel y-axis: Log scale MAE. Legend: linear, expanded shared, CLEP.]

Figure 4: Plots of mean absolute error (MAE) for different predictors over the last 50 days, March 22
to May 10. We plot the raw scale MAE in the left panel, and the log-scale MAE in the right panel.
Results are shown for expanded shared exponential predictor (orange line), the separate county linear
predictor (red line), and the CLEP that combines the two predictors (blue line). The exponential model
performed well early in the outbreak (March through early April), and the linear predictor seems to be
doing better since mid-April. We can see that CLEP is able to adaptively combine these two predictors
and obtain the best performance overall.

5.2 County-wise visualizations


We now provide several visualizations of our 5-day-ahead CLEP predictions (based on the best-performing
CLEP of the expanded shared and linear separate predictor models) and corresponding MEPIs for certain
counties, for the period April 11 to May 10. Since there are over 3,000 counties in the United States, we
present results for two sets of counties:
• Six worst-affected counties: Queens County, NY; Kings County, NY; Bronx County, NY; New York
County, NY; Cook County, IL; and Wayne County, MI.
• Six randomly-selected counties: Suffolk County, NY; Bergen County, NJ; Oakland County, MI;
Monmouth County, NJ; Broward County, FL; Dougherty County, GA.
In Figure 5, we present the results for the worst-affected counties in the top panel (a), and for the
randomly selected counties in the bottom panel (b). The solid black line denotes the recorded death
counts, dashed blue line denotes the CLEP 5-day-ahead predictions, and shaded blue region denotes the
corresponding MEPI (prediction interval). The predictions and prediction intervals for a given day t (t =
April 11, . . . , May 10) are based on data up to day t − 5. Since the predictions are updated daily to
reflect latest trends in cumulative deaths for each county, we do not expect the predictions (the dashed
blue lines) to be monotonically increasing over time.
From Figure 5(a), we observe that, among the worst-affected counties, the CLEP appears to fit the
data very well for Cook County, IL, and Wayne County, MI. However, our predictor greatly over-predicts
the number of deaths in four NY counties on April 19, 20, and 21 (the predictions were made on April
14, 15, and 16, respectively). On April 14, New York City reported 3,778 additional deaths that had
occurred since March 11 and had been classified as “probable”. This change led to a sharp uptick of
recorded cumulative deaths in Kings, Queens, Bronx, and New York County on April 14. Due to this
reporting lag, our CLEP predicted that the four counties were experiencing exponential growth when
they were actually experiencing linear growth. As a result, it greatly over-predicts the number of recorded
deaths in the following days.
From Figure 5(b), we find that our predictors and MEPI perform well for five of the six randomly
selected counties (Broward County FL, Dougherty County GA, Monmouth County NJ, Bergen County
NJ, and Oakland County MI). However, the performance is slightly worse for Suffolk County, NY, due
to a sudden uptick in cumulative deaths on May 5.

[Figure 5: two 2×3 grids of line charts, one panel per county. x-axis: Date; y-axis: Cumulative deaths; legend: Recorded deaths, 5-day predictions, Prediction intervals. (a) Worst-affected counties: Kings County, NY; Queens County, NY; Bronx County, NY; New York County, NY; Cook County, IL; Wayne County, MI. (b) Randomly-selected counties: Suffolk County, NY; Bergen County, NJ; Oakland County, MI; Monmouth County, NJ; Broward County, FL; Dougherty County, GA.]

Figure 5: A grid of line charts displaying the performance of CLEP and MEPI for the cumulative death
counts due to COVID-19 as a function of time over the past month (April 11 to May 10). The observed data
are shown in black, 5-day-ahead CLEP predictions are shown as dashed blue lines, and the corresponding
5-day-ahead MEPIs are shown as shaded blue regions. The predictions and prediction intervals for day t
(t = April 11, . . . , May 10) are based on data up to day t − 5.

5.3 Empirical performance of MEPI
We now present the performance of our MEPI at the county level for cumulative death counts, with
respect to both coverage (10a) and average normalized length (10b) (see Section 4.2). Since the average
performance may change with time, we report the results for two time periods: the past month—T1 =
{April 11, April 12, . . . , May 10}, and the past 15 days—T2 = {April 26, . . . , May 10}. We evaluate the
5-day-ahead MEPIs, i.e., k = 5 in equation (9a), designed with the CLEP that combines the expanded
shared and separate linear predictors. We summarize the results in Figure 6 and discuss details below.
Coverage: In panels (a) and (c) of Figure 6, we plot the histogram of observed coverage (10a) across all
counties in the US for the recent month (T1 ) and recent 15 days (T2 ) respectively. Panel (e) of Figure 6
shows the observed coverage for the 447 counties that had at least 10 deaths on May 1 (since our time
period ends on May 10, we want at least 10 days of 10+ deaths) and is based on the time period for each
county starting from either April 11 or the day of 10 deaths (if this day is after April 11) until May 10.
The median number of days since 10 deaths is 27. From these plots, we observe that for the majority of
the counties, we achieve excellent coverage.
Over the past month (April 11 to May 10), the average observed coverage across all counties is 94.5%.
There is not a large difference in coverage when comparing the longer and shorter
time periods (Figure 6(a) and (c)), but the coverage decreases slightly when restricting to counties with
at least 10 deaths (median of observed coverage for these counties is 87%). This is expected: counties
with zero or very few deaths have prediction intervals with very good coverage, and the restriction excludes them.
For the counties with poor coverage, we show in Appendix A.1 that there is usually a sharp uptick
in the number of recorded deaths on some day, possibly due to recording errors or backlogs of counts.
Modeling these upticks and obtaining coverage for such events is beyond the scope of this paper.

Normalized length: In panels (b) and (d) of Figure 6, we plot the histogram of observed average
normalized length (10b) across different counties in the US for the recent month (T1 ) and recent 15 days
(T2 ) respectively. Panel (f) covers the same counties as did Panel (e): those with at least 10 deaths in
the relevant time period.
Recall that the normalized length is defined as the length of the MEPI divided by the recorded number of
deaths (10b). More than 70% of counties in the US had recorded 2 or fewer COVID-19 deaths by May
1. For these counties, a normalized length of 2 means the actual length of the prediction interval
is 4 or less. It is thus not surprising that the average normalized length of MEPI for a
non-trivial fraction of counties is larger than 2 in panels (b) and (d).
When considering counties with at least 10 deaths in panel (f), the average normalized length over
these (county-specific) periods is much smaller. The median of the average normalized length of these
counties is 0.58. Overall, Figure 6 shows that our MEPIs provide a reasonable balance between coverage
and length, especially when the cumulative counts are not too small.

[Figure 6: six histograms (y-axis: Count). Left column x-axis: Coverage %; right column x-axis: Average normalized length. (a), (b) Evaluation period: April 11–May 10. (c), (d) Evaluation period: April 26–May 10. (e) Coverage for selected counties. (f) Average length for selected counties.]

Figure 6: Plots showing the performance of 5-day-ahead MEPIs for county-level cumulative
death counts. We plot histograms of observed coverage and average normalized length for a few settings.
For the top two panels (a and b) we compute the histogram for all the counties over the recent month
(April 11–May 10), and for the middle two panels (c and d) over the recent 15 days (April 26–May 10).
For the bottom two panels (e and f), we only include counties that have had at least 10 deaths on May
1, and compute the histogram over a county-specific period where we only include those days for which a
county’s cumulative deaths is at least 10. Overall these plots suggest that the MEPIs provide
high coverage while being reasonably narrow. Refer to the text for further discussion.

6 Related Work
Several recent works have tried to predict the number of cases and deaths related to COVID-19. Even
more recently, the Centers for Disease Control and Prevention (CDC) has started aggregating results
from several models.6 But to the best of our knowledge, all of these predictions are either made at the
country level or the state level, and none of these works have predicted deaths and cases at the county
level. In addition, directly comparing other models’ forecasting results to our own can be difficult for
several other reasons: (1) the predictors mostly make strong assumptions and typically do not involve
data-fitting, (2) we do not have access to a direct implementation of their predictors (or results), and
(3) their predictors focus on substantially longer time horizons. Keeping these points in mind, we now
provide a brief summary of recent work on predictive modeling for COVID-19.

6 Forecasts available at https://www.cdc.gov/coronavirus/2019-ncov/covid-data/forecasting-us.html

Two recent works [36] and [28] have modeled the death counts at the state level in the US. The earlier
versions of the Institute for Health Metrics and Evaluation (IHME) [36] model were based on Farr’s Law
with feedback from Wuhan data; the Imperial College predictor [28] uses an individual-based simulation
predictor with parameters chosen based on prior knowledge. On the topic of Farr’s Law, we note that a
1990 paper [24] used Farr’s Law to predict that the total cases from the AIDS pandemic would diminish
by the mid-1990s and the total number of cases would be around 200,000 in the entire lifetime of the
AIDS pandemic. It is now estimated that 32 million people have died from the disease so far. While the
AIDS pandemic is very different from the COVID-19 pandemic, it is still useful to keep this historical
performance in mind.
Another approach uses exponential smoothing from time-series predictors to estimate day-level COVID-
19 cases [26]. In addition, several works use compartment epidemiological predictors such as SIR, SEIR,
and SIRD [27, 39, 23] to provide simulations at the national level. Other works [38, 32] simulate the
effect of social distancing policies either in the future for the US, or in a retrospective manner for China.
Finally, several papers estimate epidemiological parameters retrospectively based on data from China
[45, 34].

7 Impact: a hospital-level severity index for distributing medical supplies
Together with the non-profit response4life7 , our models have been used to determine which hospitals
are most urgently in need of medical supplies, and have subsequently been directly involved in the
distribution of medical supplies across the country. To do this, we translate our forecasts into the COVID
pandemic severity index, which is a simple measure of the COVID-19 outbreak severity for each hospital.
To generate this hospital-level severity index, we divided the total county-level deaths among all
of the hospitals in the county proportional to their number of employees. Next, for each hospital, we
computed its percentile among all US hospitals with respect to total deaths so far and also with respect to
predicted new deaths in the next seven days. These two percentiles are then averaged to obtain a single
score for each hospital. Finally, this score is quantized evenly into three categories: low, medium, and
high severity. Evaluation and refinement of this index are ongoing, as more hospital-level data becomes
available.
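
The steps above can be sketched in Python as follows. This is a rough illustration under several assumptions that the text does not pin down, not the deployed index: the function names and input layout are hypothetical, a hospital's percentile is taken to be the fraction of hospitals with a value less than or equal to its own, and "quantized evenly" is interpreted as cutting the averaged score at 1/3 and 2/3.

```python
from collections import defaultdict


def percentile_ranks(values):
    """Fraction of entries less than or equal to each value (one convention)."""
    n = len(values)
    return [sum(1 for v in values if v <= x) / n for x in values]


def severity_index(hospitals, county_deaths, county_pred_new):
    """Map hospital name -> (score, category).

    hospitals: list of dicts with keys "name", "county", "employees".
    county_deaths, county_pred_new: dicts mapping county -> total deaths so
    far and predicted new deaths over the next seven days.
    """
    staff_total = defaultdict(float)
    for h in hospitals:
        staff_total[h["county"]] += h["employees"]

    deaths, new_deaths = [], []
    for h in hospitals:
        # Split county-level counts in proportion to hospital employees.
        share = h["employees"] / staff_total[h["county"]]
        deaths.append(share * county_deaths[h["county"]])
        new_deaths.append(share * county_pred_new[h["county"]])

    result = {}
    pairs = zip(hospitals, percentile_ranks(deaths), percentile_ranks(new_deaths))
    for h, p_total, p_new in pairs:
        score = (p_total + p_new) / 2
        # Assumed cut points; the text only says "quantized evenly".
        if score <= 1 / 3:
            label = "low"
        elif score <= 2 / 3:
            label = "medium"
        else:
            label = "high"
        result[h["name"]] = (score, label)
    return result
```

Other readings of the quantization step, such as splitting hospitals into three equally sized groups by score, would change only the final labelling.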

7 https://response4life.org/

8 Conclusion
In this paper, we made three key contributions. We (1) introduced a data repository containing COVID-19-related
information from a variety of public sources, (2) used this data to develop predictors for
short-term forecasting at the county level, and (3) introduced a novel yet simple method for producing
prediction intervals for these predictors. By focusing on county-level predictions, our forecasts are at
a finer geographic resolution than those from other relevant studies. By comparing our predictions
to real observed data, we found that our predictions are accurate and that our prediction intervals
are reasonably narrow and yet provide good coverage. We hope that these results will be useful for
individuals, businesses, and policymakers to plan and cope with the COVID-19 pandemic and that our
data repository and forecasting and interval methodology will be useful for academic purposes. Indeed,
our results are already being used to determine the hospital-level need for medical supplies and have been
directly influential in determining the distribution of these supplies.
Our data repository will be useful both for educational purposes and for other teams interested
in analyzing the data underlying the COVID-19 pandemic. Our MEPI methodology can be applied to other
models for COVID-19 forecasting, as well as to time series analysis more broadly.

Acknowledgements
The authors would like to thank many people for help with this effort. This includes the response4Life
team and volunteers; Max Shen’s IEOR group at Berkeley (Junyu Cao, Shunan
Jiang, Pelagie Elimbi Moudio); Aaron Kornblith and David Jaffe for advice from a medical perspective;
and helpful input from many others including Ying Lu, Tina Eliassi-Rad, Jas Sekhon, Philip Stark, Jacob
Steinhardt, Nick Jewell, Valerie Isham, SriSatish Ambati, Rob Crockett, Marty Elisco, Valerie Karplus,
Marynia Kolak, Andreas Lange, Qinyun Lin, Suzanne Tamang, Brian Yandell, and Tarek Zohdi. Special
thanks to Sam Scarpino for sharing data with us.

References
[1] 2020 reporting cycle: Teaching hospital list. Centers for Medicares & Medicaid Ser-
vices (CMS). Accessed on 04-01-2020 at https://www.cms.gov/OpenPayments/Downloads/
2020-Reporting-Cycle-Teaching-Hospital-List-PDF-.pdf.
[2] Adult smoking data. County Health Rankings & Roadmaps. Accessed on 04-02-2020 at
https://www.countyhealthrankings.org/explore-health-rankings/measures-data-sources/
county-health-rankings-model/health-factors/health-behaviors/tobacco-use/adult-smoking.
[3] Airline origin and destination survey. US Department of Transportation, Bureau of Transportation Statistics.
Accessed on 04-20-2020 at https://transtats.bts.gov/Databases.asp?Mode_ID=1&Mode_Desc=Aviation&
Subject_ID2=0.
[4] Apple maps mobility trends reports. Apple Maps. Accessed on 05-15-2020 at https://www.apple.com/
covid19/mobility.
[5] Area health resources files. National Center for Health Workforce Analysis, Bureau of Health Workforce,
Health Resources and Services Administration. Accessed on 04-02-2020 at https://data.hrsa.gov/data/
download.
[6] Case mix index file. Centers for Medicares & Medicaid Services (CMS). Accessed on 04-
01-2020 at https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/
FY2020-IPPS-Final-Rule-Home-Page-Items/FY2020-IPPS-Final-Rule-Data-Files.
[7] CMS chronic conditions data (2017). Centers for Medicare & Medicaid Services (CMS).
Accessed on 04-02-2020 at https://www.cms.gov/Research-Statistics-Data-and-Systems/
Statistics-Trends-and-Reports/Chronic-Conditions/CC_Main.
[8] Compressed mortality file (2012-2016). National Center for Health Statistics (NCHS). Accessed on 04-02-
2020 at https://wonder.cdc.gov/cmf-icd10.html.
[9] County adjacency files. United States Census Bureau. Accessed on 05-15-2020 at https://www.census.gov/
geographies/reference-files/2010/geo/county-adjacency.html.
[10] COVID-19 deaths dataset. USA Facts. Accessed on 03-31-2020 at https://www.reuters.com/article/
us-health-coronavirus-who/covid-19-spread-map.

20
[11] Definitive healthcare: USA hospital beds. Definitive Healthcare. Accessed on
04-01-2020 at https://coronavirus-resources.esri.com/datasets/definitivehc::
definitive-healthcare-usa-hospital-beds.
[12] Diagnosed diabetes atlas. Centers for Disease Control and Prevention, Division of Diabetes Translation, US
Diabetes Surveillance System. Accessed on 04-02-2020 at https://www.cdc.gov/diabetes/data.
[13] Google COVID-19 community mobility reports. Google. Accessed on 05-15-2020 at https://www.google.
com/covid19/mobility/.
[14] Health professional shortage areas. Health Resources and Services Administration. Accessed on 04-04-2020
at https://data.hrsa.gov/data/download.
[15] HIFLD hospital data. Homeland Infrastructure Foundation-Level Data (HIFLD). Accessed on 04-02-2020 at
https://hifld-geoplatform.opendata.arcgis.com/datasets/6ac5e325468c4cb9b905f1728d6fbf0f_0.
[16] ICU beds by county. Kaiser Health News. Accessed on 04-02-2020 at https://khn.org/news/
as-coronavirus-spreads-widely-millions-of-older-americans-live-in-counties-with-no-icu-beds/.
[17] Interactive atlas of heart disease and stroke. Centers for Disease Control and Prevention, Division for Heart
Disease and Stroke Prevention. Accessed on 04-02-2020 at https://www.cdc.gov/dhdsp/maps/atlas/index.
htm.
[18] Poverty estimates for the U.S., states, and counties, 2018. United States Department of Agriculture,
Economic Research Service. Accessed on 04-24-2020 at https://www.ers.usda.gov/data-products/
county-level-data-sets/download-data/.
[19] Social vulnerability index. Centers for Disease Control and Prevention, Geospatial Research, Analysis, and
Services Program. Accessed on 04-03-2020 at https://svi.cdc.gov/data-and-tools-download.html.
[20] The New York Times data. Accessed on 04-01-2020 at
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.
[21] United States chronic respiratory disease mortality rates by county 1980-2014. Institute for Health Metrics
and Evaluation (IHME). Accessed on 04-02-2020 at http://ghdx.healthdata.org/record/ihme-data/
united-states-chronic-respiratory-disease-mortality-rates-county-1980-2014.
[22] COVID-19: What's New for April 5, 2020. Accessed on 04-13-2020 at
http://www.healthdata.org/sites/default/files/files/Projects/COVID/Estimation_update_040520_3.pdf.
[23] M. Becker and C. Chivers. Announcing CHIME, a tool for covid-19 capacity planning. Accessed on 04-02-
2020 at http://predictivehealthcare.pennmedicine.org/2020/03/14/accouncing-chime.html.
[24] D. J. Bregman and A. D. Langmuir. Farr’s Law Applied to AIDS Projections. JAMA, 263(11):1522–1525,
03 1990.
[25] M. E. Data and S. Lab. County Presidential Election Returns 2000-2016, 2018.
[26] H. H. Elmousalami and A. E. Hassanien. Day level forecasting for Coronavirus disease (COVID-19) spread:
Analysis, modeling and recommendations. arXiv preprint arXiv:2003.07778, 2020.
[27] D. Fanelli and F. Piazza. Analysis and forecast of COVID-19 spreading in China, Italy and France. Chaos,
Solitons & Fractals, 134:109761, 2020.
[28] N. Ferguson, D. Laydon, G. Nedjati-Gilani, N. Imai, K. Ainslie, M. Baguelin, S. Bhatia,
A. Boonyasiri, Z. Cucunubá, G. Cuomo-Dannenburg, et al. Impact of non-pharmaceutical inter-
ventions (NPIs) to reduce COVID19 mortality and healthcare demand, 2020. Accessed on 04-02-
2020 at https://spiral.imperial.ac.uk/bitstream/10044/1/77482/5/Imperial%20College%20COVID19%
20NPI%20modelling%2016-03-2020.pdf.
[29] K. J. Goh, S. Kalimuddin, and K. S. Chan. Rapid progression to acute respiratory distress syndrome: Review
of current understanding of critical illness from COVID-19 infection. Annals of the Academy of Medicine,
Singapore, 49(1):1, 2020.
[30] W. Guan, W. Liang, Y. Zhao, H. Liang, Z. Chen, Y. Li, X. Liu, R. Chen, C. Tang, T. Wang, et al.
Comorbidity and its impact on 1590 patients with COVID-19 in China: A nationwide analysis. European
Respiratory Journal, 2020.
[31] W. Guan, Z. Ni, Y. Hu, W. Liang, C. Ou, J. He, L. Liu, H. Shan, C. Lei, D. S. Hui, et al. Clinical
characteristics of Coronavirus disease 2019 in China. New England Journal of Medicine, 2020.

[32] S. Hsiang, D. Allen, S. Annan-Phan, K. Bell, I. Bolliger, T. Chong, H. Druckenmiller, A. Hultgren, L. Y.
Huang, E. Krasovich, P. Lau, J. Lee, E. Rolf, J. Tseng, and T. Wu. The effect of large-scale anti-contagion
policies on the Coronavirus (COVID-19) pandemic. medRxiv, 2020.
[33] B. D. Killeen, J. Y. Wu, K. Shah, A. Zapaishchykova, P. Nikutta, A. Tamhane, S. Chakraborty, J. Wei,
T. Gao, M. Thies, et al. A county-level dataset for informing the united states’ response to covid-19. arXiv
preprint arXiv:2004.00756, 2020.
[34] A. J. Kucharski, T. W. Russell, C. Diamond, Y. Liu, J. Edmunds, S. Funk, and R. M. Eggo. Early dynamics
of transmission and control of COVID-19: A mathematical modelling study. medRxiv, 2020.
[35] R. Marchant, N. I. Samia, O. Rosen, M. A. Tanner, and S. Cripps. Learning as we go: An examination of
the statistical accuracy of covid19 daily death count predictions. arXiv preprint arXiv:2004.04734, 2020.
[36] C. J. Murray and I. H. M. E. COVID-19 health service utilization forecasting team. Forecasting COVID-
19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months.
medRxiv, 2020.
[37] S. Nebehay and K. Kelland. COVID-19 cases and deaths rising, debt relief needed for poorest nations: WHO.
Reuters. Accessed on 04-01-2020 at https://www.reuters.com/article/us-health-coronavirus-who/
covid-19-infections-growing-exponentially-deaths-nearing-50000-who-idUSKBN21J6IL?il=0.
[38] C. M. Peak, R. Kahn, Y. H. Grad, L. M. Childs, R. Li, M. Lipsitch, and C. O. Buckee. Modeling the
comparative impact of individual quarantine vs. active monitoring of contacts for the mitigation of COVID-
19. medRxiv, 2020.
[39] S. Pei and J. Shaman. Initial simulation of SARS-CoV2 spread and intervention effects in the continental
US. medRxiv, 2020.
[40] D. Qi, X. Yan, X. Tang, J. Peng, Q. Yu, L. Feng, G. Yuan, A. Zhang, Y. Chen, J. Yuan, X. Huang, X. Zhang,
P. Hu, Y. Song, C. Qian, Q. Sun, D. Wang, J. Tong, and J. Xiang. Epidemiological and clinical features of
2019-nCoV acute respiratory disease cases in Chongqing municipality, China: A retrospective, descriptive,
multiple-center study. medRxiv, 2020.
[41] L. Rubinson, F. Vaughn, S. Nelson, S. Giordano, T. Kallstrom, T. Buckley, T. Burney, N. Hupert, R. Mutter,
M. Handrigan, et al. Mechanical ventilators in US acute care hospitals. Disaster medicine and public health
preparedness, 4(3):199–206, 2010.
[42] G. D. Schuller, B. Yu, D. Huang, and B. Edler. Perceptual audio coding using adaptive pre-and post-filters
and lossless compression. IEEE Transactions on Speech and Audio Processing, 10(6):379–390, 2002.
[43] G. Shafer and V. Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research,
9(Mar):371–421, 2008.
[44] V. Vovk, A. Gammerman, and G. Shafer. Algorithmic learning in a random world. Springer Science &
Business Media, 2005.
[45] C. Wang, L. Liu, X. Hao, H. Guo, Q. Wang, J. Huang, N. He, H. Yu, X. Lin, A. Pan, S. Wei, and T. Wu.
Evolving epidemiology and impact of non-pharmaceutical interventions on the outbreak of Coronavirus
disease 2019 in Wuhan, China. medRxiv, 2020.
[46] F. Zhou, T. Yu, R. Du, G. Fan, Y. Liu, Z. Liu, J. Xiang, Y. Wang, B. Song, X. Gu, et al. Clinical course
and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort
study. The Lancet, 2020.

A Further discussion on MEPI


We now provide a more detailed discussion on a couple of aspects of MEPI.

A.1 Counties with poor coverage


While Figure 6 shows that MEPIs achieve coverage above 83% for the vast majority of counties over the
April 11–May 10 period, some counties have coverage below the targeted level. We briefly investigate
the counties in Figure 6(a) where the coverage of MEPIs for cumulative death counts is below 0.8.
Among these counties, Figure 7 shows the cumulative deaths from April 11 to May 10 for the 24
worst-affected counties. In most of them, there is a sharp uptick in the number of recorded deaths.
However, a surge in recorded deaths attributed to COVID-19 does not mean that the deaths occurred on
a single day. For example, a reporting-lag correction on April 19 resulted in 276 additional reported
deaths in Philadelphia (top row, second panel from left). It is very difficult to construct efficient
prediction intervals that remain valid under such data corrections.

[Figure 7 plot: 24 panels, one per county, each showing cumulative recorded deaths (vertical axis)
against date from April 11 to May 10 (horizontal axis). Panels: Queens, NY; Philadelphia, PA;
Norfolk, MA; Marion, IN; Delaware, PA; Erie, NY; Bucks, PA; Washington, DC; Clark, NV; Maricopa, AZ;
Providence, RI; Lucas, OH; Northampton, PA; Monroe, NY; Pima, AZ; Lake, IN; Lehigh, PA; Dutchess, NY;
Kane, IL; Johnson, IN; Albany, NY; Alameda, CA; San Juan, NM; Stark, OH.]
Figure 7: Overview of counties where the coverage of 5-day-ahead MEPIs for cumulative death counts
is below 0.8 (in Figure 6(a)). Among these counties, we choose the 24 worst-affected counties, and for
each of them we plot the cumulative deaths from April 11 to May 10. For most of the counties
shown here, there is a sharp uptick in the number of deaths.

A.2 MEPI vs conformal inference


Recall that MEPI (9a) can be viewed as a special case of a conformal prediction interval [43, 44]. We
now discuss this connection further, along with the assumptions under which the MEPI would
achieve good coverage.
A general recipe in conformal inference with streaming data is to compute the past several errors of
the prediction model and use an s-percentile value for some suitable s (e.g., s = 95) to construct the
prediction interval for the future observation. At a high level, conformal prediction intervals rely simply
on the assumption that the sequence of errors is exchangeable. Roughly speaking, the proof proceeds
as follows: The exchangeability of the residuals ensures that the rank of the future residual is uniform
in the pool of the residuals. Hence the probability that the future residual falls in the top (100 − s)%
of the pool is at most (100 − s)%, which yields the promised s% coverage. For more details, we refer
the reader to the excellent tutorial [43] and the book [44].
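
As a rough illustration of the generic recipe above, the following Python sketch builds a symmetric
interval from the s-th percentile of past absolute errors. The function name `conformal_interval` and
the plain empirical percentile rule are assumptions made for this example; the references [43, 44] give
the precise construction, including finite-sample corrections that are omitted here.

```python
import math


def conformal_interval(past_errors, prediction, s=95.0):
    """Interval around `prediction` whose half-width is the s-th percentile
    of the past absolute errors (simple empirical rule, an assumption here)."""
    if not past_errors:
        raise ValueError("need at least one past error")
    if not 0 < s <= 100:
        raise ValueError("s must lie in (0, 100]")
    ordered = sorted(abs(e) for e in past_errors)
    n = len(ordered)
    # Smallest order statistic that is at least as large as s% of the pool.
    k = max(1, min(n, math.ceil(s * n / 100.0)))
    radius = ordered[k - 1]
    return prediction - radius, prediction + radius
```

With 20 past errors and s = 95, the half-width is the 19th smallest error, so at most one past error
in the pool exceeds it.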
The MEPI scheme deviates from the general conformal recipe in two ways: we use normalized errors
instead of unnormalized errors, and we compute a maximum over the past 5 days instead of a
95th-percentile threshold over the entire past. Loosely speaking, both choices try to make the errors
more exchangeable. Given the dynamic nature of COVID-19 and our adaptive predictors, the prediction
errors do not remain exchangeable over a long period. Moreover, given that we take only 5 data points
to bound the future error, computing a maximum over them is a more conservative choice (e.g., when
compared to taking the median or a percentile-based cut-off); see footnote 8.
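
A minimal sketch of the MEPI idea described above, under stated assumptions: the normalized error is
taken here to be the relative error |y_t / ŷ_t − 1|, and the interval is formed by scaling the forecast
by one plus or minus the largest of the last five such errors. The exact definition is equation (9a),
which lies outside this appendix, so the function names and this form of the interval are illustrative
only.

```python
def normalized_error(observed, predicted):
    """Relative error |observed / predicted - 1| (assumed form of the normalized error)."""
    if predicted == 0:
        raise ValueError("predicted value must be nonzero")
    return abs(observed / predicted - 1.0)


def mepi_sketch(past_observed, past_predicted, future_prediction, window=5):
    """Interval for a future count from the maximum normalized error
    over the last `window` days (hypothetical helper, not the paper's code)."""
    if len(past_observed) != len(past_predicted):
        raise ValueError("observed and predicted series must have equal length")
    if len(past_observed) < window:
        raise ValueError("not enough past days for the requested window")
    recent = zip(past_observed[-window:], past_predicted[-window:])
    delta_max = max(normalized_error(y, y_hat) for y, y_hat in recent)
    lower = future_prediction * (1.0 - delta_max)
    upper = future_prediction * (1.0 + delta_max)
    return lower, upper
```

For example, if the largest relative error over the last five days is 10% and the forecast is 200
deaths, this sketch returns an interval of roughly 180 to 220.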

Normalized vs unnormalized errors: We now provide some numerical evidence to support our
choice of normalized errors over the $\ell_1$ errors to define the MEPI. Figure 8(a) shows the
(unnormalized) $\ell_1$ errors $|\hat{y}_t - y_t|$ of our CLEP for the six worst-affected counties over
the same period as Figure 3 (March 27–April 10). We found that in all of these counties, the $\ell_1$
errors on days $t-4$, $t-3$, $t-2$, $t-1$, $t$ and $t+5$ do not appear to be exchangeable. Recall that
under exchangeability, the expected average rank of each of these six $\ell_1$ errors would be 3.5.
However, for all six counties, the average rank of the absolute error on day $t+5$ is larger than 4.
This indicates that the future absolute error tends to be higher than past errors, and using the
$\ell_1$ error $|\hat{y}_t - y_t|$ in place of the normalized error $\Delta_t$ can lead to
substantial underestimation of future prediction uncertainty.
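
The rank check described above can be sketched as follows. The helper names are hypothetical, rank 1
denotes the smallest error, and the enumeration over all orderings merely confirms the baseline value
of 3.5 for six exchangeable errors without ties; it is not the analysis code behind Figure 8.

```python
from itertools import permutations


def rank_of_last(window):
    """Rank (1 = smallest) of the final entry of `window`; ties share the average rank."""
    last = window[-1]
    smaller = sum(1 for e in window if e < last)
    equal = sum(1 for e in window if e == last)  # counts the last entry itself
    return smaller + (equal + 1) / 2.0


def average_rank_of_last(windows):
    """Average rank of the final (future) error across many windows."""
    ranks = [rank_of_last(w) for w in windows]
    return sum(ranks) / len(ranks)


def expected_rank_if_exchangeable(n):
    """Mean of the ranks 1..n, the baseline under exchangeability without ties."""
    return (n + 1) / 2.0


# All 720 orderings of six distinct errors, each equally likely under exchangeability.
all_orderings = [list(p) for p in permutations(range(6))]
baseline = average_rank_of_last(all_orderings)
```

An observed average rank for day t + 5 that sits well above `expected_rank_if_exchangeable(6)` is the
kind of departure from exchangeability that the text reports for the unnormalized errors.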

Longer time window: In Figure 8(b), we show the rank distribution of normalized errors over a
longer window of 10 days. We found that due to the highly dynamic nature of COVID-19, these errors
appear to be less exchangeable. Under exchangeability conditions, the expected average rank of each of
these 11 errors would be 6. However, we found that the average rank substantially deviates from this
expected value for many days in this longer window for all displayed counties.
Overall, we believe that the observations from Figures 3 and 8, taken together, provide reasonable
justification for the two choices we made to define MEPI (9a), namely, the 5-day window (versus the
entire past), and the choice of normalized errors (versus the unnormalized absolute errors).

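Theoretical guarantees for coverage: In order to obtain a rough baseline coverage for the MEPIs,
we now mimic some of the theoretical computations from the conformal literature. For a given county,
a fixed time $t$, and a parameter $k$, if the six errors
$E_t = \{\Delta_{t+k}, \Delta_t, \Delta_{t-1}, \Delta_{t-2}, \Delta_{t-3}, \Delta_{t-4}\}$
are exchangeable, then we have

$$\mathbb{P}\big(y_{t+k} \in \widehat{\mathrm{PI}}_{t+k}\big) = \mathbb{P}\big(\Delta_{t+k} < \Delta^{\max}\big) = 1 - \mathbb{P}\big(\Delta_{t+k} = \Delta^{\max}\big) = \frac{5}{6} \approx 0.83. \tag{12}$$

Here we read $\Delta^{\max}$ as the largest of the six errors in $E_t$: under exchangeability (and
ignoring ties), each of the six errors is equally likely to be the largest, so the future error
$\Delta_{t+k}$ is the largest with probability 1/6.
Thus, one might expect Coverage(T) ≈ 83% to hold for large |T|, where the coverage was defined
in equation (10a). However, we now elaborate on why establishing theoretically that MEPI achieves this
coverage remains a challenging task.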
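
The value 5/6 in equation (12) can be checked by brute force. The sketch below (function name
hypothetical) enumerates all orderings of six distinct errors, treats each ordering as equally likely,
which is what exchangeability without ties implies, and counts how often the future error is not the
largest.

```python
from fractions import Fraction
from itertools import permutations


def coverage_under_exchangeability(window=5):
    """Exact probability that the future error is below the maximum of the
    `window` past errors, when all orderings of the errors are equally likely."""
    n = window + 1  # the past errors plus one future error
    covered = 0
    total = 0
    for order in permutations(range(n)):
        total += 1
        future, past = order[0], order[1:]
        if future < max(past):
            covered += 1
    return Fraction(covered, total)


baseline_coverage = coverage_under_exchangeability(window=5)
```

The same enumeration with a shorter window gives a lower baseline, which illustrates the trade-off
between a short window (more plausible exchangeability) and the nominal coverage level.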
On the one hand, the probability in equation (12) is taken over the randomness in the errors, and the
time index $t+k$ remains fixed. This observation, in conjunction with the law of large numbers, implies
the following: Over multiple independent runs of the time series, for a given county and a given time
$t+k$, the fraction of runs for which the MEPI $\widehat{\mathrm{PI}}_{t+k}$ contains the observed value
$y_{t+k}$ converges to 5/6 as the number of runs goes to infinity. However, analyzing such a fraction
over several different independent runs of the COVID-19 outbreak is not relevant for our work.
On the other hand, the evaluation metric we consider is the average coverage of the MEPI over
a single run of the time series, cf. the definition (10a) of Coverage(T). Thus, we require an online
version of the law of large numbers in order to guarantee that Coverage(T) → 83% as |T| → ∞. Such
a law of large numbers, established in prior work [43], has been crucial for establishing theoretical
guarantees in conformal inference. In our case, this law (Proposition 1, Section 3.4 [43]) guarantees that,
when the entire sequence of errors $\{\Delta_t,\ t \in T\}$ for a given county is exchangeable, the
corresponding Coverage(T) ≈ 83% when the period T is large. Unfortunately, such an assumption is both
hard to check and unlikely to hold for the prediction errors obtained from CLEP for the COVID-19
cumulative death counts.
Despite the challenges listed above, our results in Section 5.3 showed that MEPIs with CLEP achieved
good coverage with narrow widths.

Footnote 8: Furthermore, in order to compute a 95th-percentile value, we need to consider errors for at
least the past 20 days; exchangeability is not likely to hold over such a long horizon.

[Figure 8 plot: two groups of six panels, one panel per county (Kings County, NY; Queens County, NY;
Bronx County, NY; New York County, NY; Cook County, IL; Wayne County, MI). Vertical axis: average rank
of absolute error. Horizontal axis: error (day index). (a) Rank distribution of absolute unnormalized
errors over 5 days. (b) Rank distribution of normalized errors over a longer window of 10 days.]
Figure 8: Rank distribution of absolute unnormalized errors of our CLEP (with the expanded shared
and linear predictors) for the six worst-affected counties (top panel) and rank distribution of normalized
errors over a longer window (bottom panel). These plots (when compared to Figure 3) provide further
numerical support for our choice of 5 days and the normalized errors for defining the MEPI.
