Effectiveness of A Lock-Down To Combat COVID-19 Using Regression Analysis: A Case Study in India
Abstract
Ever since its outbreak in the city of Wuhan in China, COVID-19 has spread around the
world at a rapid rate and has now been identified in over 187 countries. To combat the
spread of the virus and to isolate the affected, one of the most significant measures taken
by various countries around the world is to enforce a complete lock-down across the
nation. This paper aims to examine the effectiveness of one such lock-down, enforced
in India by the Government on the 24th of March 2020 to curb the spread of COVID-19.
To estimate its effectiveness, Regression Analysis, a sub-branch of Predictive
Analytics, has been applied using Python. Following this approach, it is observed that
the 41-day lock-down has been highly effective in limiting the total number of
COVID-19 positive cases in India.
Keywords: Regression, Log-Linear Model, Statistical Analysis in Python, Coronavirus
1. Introduction
Coronaviruses are a large family of viruses responsible for a
gamut of illnesses, including the common cold, Middle East Respiratory Syndrome
(MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV). A novel
Coronavirus (nCoV) is a new strain that has not been previously identified in
humans. In general, the human population has no acquired immunity to the new
Coronavirus, and while most people who contract the virus will experience no or mild
symptoms, some will develop severe or even life-threatening symptoms. As of 2nd
May 2020, more than 3.39 million cases of COVID-19 have been reported in 187
countries and territories, resulting in more than 241,000 deaths. More than 1.06 million
people have recovered. Owing to this, COVID-19 has been declared a global
pandemic by the World Health Organisation (WHO).
In India, the first positive case of COVID-19 was reported on 30th January 2020. Since
then, the number of cases has been rising constantly. In order to curb the spread of the
disease, the Government of India declared a nationwide lock-down on 24th March 2020,
initially for a span of 21 days, i.e., up to 14th April 2020, but subsequently extended the
period up to 3rd May 2020 and later, with various relaxations, extended it to 17th May
2020.
The primary objective of this paper is to examine the effect of the lock-down enforced
for a period of 41 days (from 24th March 2020 to 3rd May 2020) in curbing the number of
COVID-19 positive cases in India.
2. Methodology
In order to estimate the effectiveness of the lock-down enforced on 24th of March 2020,
the approach of Predictive Analytics has been applied using Python.
Predictive Analytics is a statistical technique that involves studying and analysing the
current data to make predictions about the future through the application of various means
such as Data Collection, Data Modelling and Statistics, to name a few. It is an extremely
useful technique that can be applied in various spheres to get an estimate of the
behavioural trends and patterns of the data in the future.
Predictive Analytics has been used here to predict the total
number of COVID-19 positive cases had the lock-down not been enforced. This number
has then been compared to the total number of COVID-19 positive cases that were
recorded while the 41-day lock-down, as ordered by the Government of India, was in
force. The difference in the number of COVID-19 positive cases between these two
scenarios is used to estimate the effectiveness of the nationwide
lock-down in India in combating the spread of the coronavirus.
The main steps followed in the paper to achieve this objective are listed
as follows:
1. Data Collection
2. Statistical Modelling
3. Linear Regression Analysis
4. Transforming the data into a Log-Linear Model
5. Construction of the Best-Fit Line
6. Future Prediction
In the following sections, the above mentioned steps are elaborated in detail,
supplemented by the relevant mathematical calculations. The analysis carried out in this
paper has been programmed in Python using the NumPy, Pandas, and Matplotlib
libraries.
in order to avoid any discrepancies to the accuracy of the predictions of the model and is
thus omitted from Table 1.
04-Apr-20 3072
05-Apr-20 3577
06-Apr-20 4281
07-Apr-20 4789
08-Apr-20 5274
09-Apr-20 5865
10-Apr-20 6761
11-Apr-20 7529
12-Apr-20 8447
13-Apr-20 9352
14-Apr-20 10815
15-Apr-20 11933
16-Apr-20 12759
17-Apr-20 13835
18-Apr-20 14792
19-Apr-20 16116
20-Apr-20 17656
21-Apr-20 18985
22-Apr-20 20471
23-Apr-20 21700
24-Apr-20 23452
25-Apr-20 24942
26-Apr-20 26917
27-Apr-20 28380
28-Apr-20 29974
29-Apr-20 31787
30-Apr-20 33610
01-May-20 35365
02-May-20 37776
03-May-20 40263
A Scatter Plot is a graph of plotted points that shows the relationship between two sets of
numeric data. The obtained data in Table 1 is represented in the form of a Scatter Plot
wherein the total number of COVID-19 positive cases has been expressed as a function of
the corresponding date of occurrence, as shown in Figure. 1.
The Scatter Plot thus obtained closely resembles an exponential curve.
By definition, a quantity grows at an exponential rate if it increases at a rate
proportional to its current size. Thus, the obtained graph implies that the greater
the number of people affected by the virus in the early stages of its outbreak, the
greater the number of people who will be affected in the later stages. The consistent
doubling of cases in a fixed period is the hallmark of exponential growth. [3][4]
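As a rough illustration of the doubling behaviour described above, the sketch below estimates an average daily growth rate and the implied doubling time from the first and last entries of the Table 1 excerpt (3072 cases on 04-Apr-20 and 40263 cases on 03-May-20). It assumes purely exponential growth over that span, which is a simplification, and the function name `doubling_time` is introduced here for illustration only; it is not taken from the paper's code.

```python
import math
from datetime import date


def doubling_time(count_start, count_end, days):
    """Return (daily growth rate, doubling time in days), assuming
    pure exponential growth N(t) = N0 * exp(r * t) over the interval."""
    rate = math.log(count_end / count_start) / days
    return rate, math.log(2) / rate


# First and last rows of the Table 1 excerpt.
days_elapsed = (date(2020, 5, 3) - date(2020, 4, 4)).days
rate, t_double = doubling_time(3072, 40263, days_elapsed)
print(f"growth rate per day: {rate:.4f}, doubling time: {t_double:.1f} days")
```

Because these dates fall inside the lock-down period, the result describes the average growth during the lock-down rather than the unchecked growth before it.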
The number of new infections that a single infectious individual will cause during their
infectious period is known as the basic reproduction number of a disease. The initial
exponential growth rate of a virus is an important measure of the severity of the spread,
and is also closely related to the basic reproduction number. This number is key to
determining how widespread a virus will become. For COVID-19, estimates of the basic
reproduction number have it somewhere between 1.03 and 1.67. [5][6][7]
It is important to note that Linear Regression can only be applied if the variables share
a linear relationship, that is, if the curve is of the linear form shown in Equation. 1,
where 'm' denotes the slope of the line and 'b' denotes its y-intercept:
y = mx + b    (1)
(2)
This discrepancy in the approach is resolved by transforming the data into the Log-Linear
Model as described in the following section.
2.4. Transforming the Data into Log-Linear Model
In order to remove the discrepancy in the approach and to apply Linear Regression, it is
necessary to transform the exponential curve obtained in Figure. 2. into a linear curve.
This requires a few mathematical simplifications.
It is known that the exponential curve is of the form depicted in Equation. 2. It can
be re-written as follows:
(3)
Taking logarithm on both sides of the above equation, the following equation is obtained:
(4)
Upon further simplification of the above equation, one can arrive at the following
equation:
(5)
On application of the logarithm rules, the above equation can be simplified to:
(6)
It is known that,
(7)
(8)
Assuming that,
(9)
(10)
(11)
Upon comparing Equation. 11 to Equation. 1, it is observed that they are now in similar
forms:
(12)
(13)
This process, in which an exponential curve is converted into a linear curve through the
application of logarithms, is known as Logarithmic Transformation. [9]
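The transformation can be sketched in generic notation as follows. The sketch assumes an exponential curve of the form y = A e^{kx}; the symbols A, k and Y are chosen here for illustration and may differ from those used in Equations 2 to 13.

```latex
\begin{align*}
y &= A e^{kx} \\
\ln y &= \ln\left(A e^{kx}\right) = \ln A + kx \\
Y &= mx + b, \qquad \text{where } Y = \ln y,\quad m = k,\quad b = \ln A
\end{align*}
```

Under this assumption, the slope of the transformed line is the exponential growth rate, and its y-intercept is the logarithm of the initial value.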
In the scatter plots throughout this paper, the x-axis represents the date and the
y-axis represents the total number of COVID-19 positive cases recorded on the
corresponding date. As seen in the above mathematical derivation, Logarithmic
Transformation involves computing the logarithm of the y-axis values. The
logarithmic y values are then plotted on the y-axis in place of the original values.
Plotting the transformed y-axis values against the x-axis values results in a new
plot that is nearly a straight line. When natural log values are used for the
dependent variable (y) and the independent variable (x) is kept in its original
linear scale, the resulting graph is called a Log-Linear model.
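A minimal sketch of this transformation, using only the first ten rows of the Table 1 excerpt and Python's standard `math` module (the paper itself reports using NumPy and Pandas). The variable names are illustrative. If the transformed points lay exactly on a straight line, the day-to-day steps in the logarithm would all be equal; here they are only roughly so.

```python
import math

# Cumulative positive cases from the Table 1 excerpt (04-Apr-20 to 13-Apr-20).
cases = [3072, 3577, 4281, 4789, 5274, 5865, 6761, 7529, 8447, 9352]
days = list(range(len(cases)))  # x-axis: days since 04-Apr-20 (linear scale)

# y-axis: natural logarithm of the case counts.
log_cases = [math.log(c) for c in cases]

# On a perfectly straight line, consecutive differences would all be equal.
steps = [b - a for a, b in zip(log_cases, log_cases[1:])]
for day, value in zip(days, log_cases):
    print(day, round(value, 3))
print("min step:", round(min(steps), 3), "max step:", round(max(steps), 3))
```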
The above explained process of Logarithmic Transformation is applied to the data
recorded before the lock-down was enforced as well as to the overall data collected during
the enforcement of the lock-down. The Scatter Plots, thus obtained, have been depicted in
Figure. 3. and Figure. 4. respectively.
As observed in Figure. 3. and Figure. 4., the Log-Linear Models obtained closely
resemble straight lines. This enables one to apply
Linear Regression to these logarithmically transformed curves.
A Line of Best Fit can be roughly determined by eye, by drawing a
straight line on a scatter plot so that the number of points above the line and below the
line is about equal (and the line passes through as many points as possible). However, a
more accurate way of finding the line of best fit is the Least Squares Method, as described
below.
The straight line that constitutes the Best Fit to a set of data points in the x-y plane is
typically calculated by minimizing the sum of the squares of the vertical distances from
the points to the line, a method that was introduced by Legendre and Gauss more than two
hundred years ago.
The Line of Best Fit is drawn for the logarithmically transformed scatter plot of the
data that was recorded before the lock-down was enforced. To do so, the
Least Squares Method has been applied as described in detail below. The line of best fit
is of the standard form:
y = mx + b    (14)
where 'm' represents the slope of the line and 'b' represents the y-intercept of the line.
To arrive at the equation of the Best-Fit line, the following values are first calculated
from the data presented in Table 1: the sum of the x-axis values, the sum of the
logarithmically transformed y-axis values, the sum of the squares of the x-axis values,
and the sum of the products of the x-axis and logarithmically transformed y-axis
values. The total number of observations 'n' is computed as well.
(15)
(16)
(17)
(18)
(19)
Substituting the above values, the slope 'm' and the y-intercept 'b' of the Best-Fit Line are
computed using the following formulae:
(20)
(21)
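The computation described above can be sketched with the standard closed-form least-squares expressions, m = (n Σxy − Σx Σy) / (n Σx² − (Σx)²) and b = (Σy − m Σx) / n. The function name `least_squares_line` is hypothetical, and the check uses synthetic points rather than the paper's data; in the paper's setting, `ys` would hold the logarithmically transformed case counts.

```python
def least_squares_line(xs, ys):
    """Return (m, b) of the least-squares line y = m*x + b.

    Uses the standard closed-form expressions built from
    n, sum(x), sum(y), sum(x^2) and sum(x*y).
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - m * sum_x) / n
    return m, b


# Synthetic check: points lying exactly on y = 2x + 1.
m, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)
```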
After applying the above formulae to the data, the values of 'm' and 'b' are
found to be:
(22)
(23)
Using the values of the slope and intercept obtained from the above formulae, the
Best-Fit line has been plotted over the graph in Figure. 2. to obtain the graph depicted in
Figure. 5.
This Line of Best Fit depicted in Figure. 5. summarises all the points in the Scatter
Plot, since it minimises the sum of the squared vertical distances from the points to
the line. In other words, this Line of Best Fit represents the data of all the COVID-19
positive cases that were recorded before the lock-down had been enforced.
number of COVID-19 positive cases before the lock-down was enforced in the country.
The green line represents the extension of the Best-Fit line over the rest of
the data, which includes the total number of COVID-19 cases recorded in the country
while the lock-down was in force. The observations arising from the methodology followed
in the above sections are presented in the following section.
2.7. Observations
The most striking observation made with respect to Figure. 6. at the end of this process is
the significant difference between the slopes of the extended Best-Fit Line and the
logarithmically transformed line. The slope of the line depicting the
total number of COVID-19 positive cases during the lock-down is much smaller than
the slope of the line that represents the total number of COVID-19 positive
cases had the lock-down not been enforced. This gives visual evidence of the effectiveness
of the 41-day lock-down. To strengthen this argument, the mathematical
basis of the observation is presented below.
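As an illustration of the slope being compared, the sketch below estimates the least-squares slope of the logarithmically transformed counts for the part of the lock-down covered by the Table 1 excerpt (04-Apr-20 to 03-May-20). It uses only the standard library, and the variable names are illustrative. The pre-lock-down slope is not recomputed here, so the result is meant only to be set against the value of 'm' obtained for the Best-Fit Line.

```python
import math

# Table 1 excerpt: cumulative positive cases, 04-Apr-20 to 03-May-20.
cases = [
    3072, 3577, 4281, 4789, 5274, 5865, 6761, 7529, 8447, 9352,
    10815, 11933, 12759, 13835, 14792, 16116, 17656, 18985, 20471, 21700,
    23452, 24942, 26917, 28380, 29974, 31787, 33610, 35365, 37776, 40263,
]
xs = list(range(len(cases)))        # days since 04-Apr-20
ys = [math.log(c) for c in cases]   # log-transformed counts

# Least-squares slope written in terms of deviations from the means.
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
print(f"log-scale slope during this part of the lock-down: {slope:.4f} per day")
```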
The two cases that need to be compared in order to arrive at a suitable conclusion are as
follows:
1. The total number of COVID-19 positive cases after 41 days had the lock-down not
been enforced. (As represented by the Best-Fit Line in Figure. 6.)
2. The total number of COVID-19 positive cases after 41 days of the lock-down
enforcement. (As represented by the scatter plot that has been obtained via Logarithmic
Transformation in Figure. 6.)
The mathematical observation can be made by arriving at the y-axis values corresponding
to the x-axis date of 3rd of May 2020. The values obtained upon doing so are 'y1' and 'y2'
respectively for the above mentioned cases:
(24)
(25)
(26)
(27)
The values obtained from Equation. 26 and Equation. 27 are representative of the total
number of COVID-19 positive cases without and with the enforcement of the lock-down
respectively. It is thus estimated that the total number of cases in the country, had the
lock-down not been enforced, would have been approximately 4.8 million. On the other
hand, the number of cases recorded with the lock-down in place is close to 40 thousand
at the end of the second lock-down extension on 3rd May 2020.
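The comparison above can be sketched in two parts. The helper `cases_from_log_line` (a hypothetical name) shows how a value read from a log-linear line is converted back to a case count, assuming natural logarithms were used. The ratio is then computed from the two figures reported in the text, not from a refit of the data.

```python
import math


def cases_from_log_line(m, b, x):
    """Back-transform a log-linear prediction ln(y) = m*x + b to a count y."""
    return math.exp(m * x + b)


# Figures reported in the text for 3rd May 2020.
projected_without_lockdown = 4_800_000  # approximate, from the extended Best-Fit Line
recorded_with_lockdown = 40_263         # Table 1 entry for 03-May-20

ratio = projected_without_lockdown / recorded_with_lockdown
print(f"projected / recorded: about {ratio:.0f}x")
```

Because the projection extrapolates an exponential trend over 41 days, small changes in the fitted slope would change this ratio considerably.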
3. Conclusion
In this paper, Regression Analysis has been applied to the daily count of
COVID-19 positive cases in India recorded from the first reported case on
30th of January 2020 until the end of the second
extension of the enforced lock-down, that is, the 3rd of May 2020.
Upon following the methodology that involved the steps of Data Collection, Statistical
Modelling, Linear Regression Analysis, Logarithmic Transformation in order to obtain a
Log-Linear Model, Construction of the line of Best-Fit and obtaining the Future
Predictions, various visual and mathematical observations have been made as elucidated
in the previous section.
According to the results obtained through data visualisation and modelling, it can be
concluded that had the lock-down not been enforced, there would have been close to
4.8 million COVID-19 positive cases in India by the 3rd of May 2020. With the
lock-down enforced from the 24th of March 2020, the total number of COVID-19
positive cases in the country by the 3rd of May 2020 was around 40 thousand.
Thus, it is concluded that the 41-day lock-down enforced by the Government of India to
tackle the Coronavirus pandemic has been highly effective, reducing the total number of
COVID-19 positive cases in the country from a potential 4.8 million
to around 40 thousand over its duration.
References
[1] G.L. Knatterud, F.W. Rockhold, S.L. George, F.B. Barton, C.E. Davis, W.R. Fairweather, T. Honohan,
R. Mowery, R. O’Neill "Guidelines for quality assurance in multicenter trials: a position paper",
Controlled Clinical Trials, 19 (1998), pp. 477-493.
[2] J. Mitlöhner, S. Neumaier, J. Umbrich and A. Polleres, "Characteristics of Open Data CSV Files," 2016
2nd International Conference on Open and Big Data (OBD), Vienna, 2016, pp. 72-79.
[3] J. Ma, J. Dushoff, B.M. Bolker, D.J.D. Earn "Estimating initial epidemic growth rates", Bulletin of
Mathematical Biology, 76 (2013), pp. 245-260
[4] E. Pourabbas, A. d’Onofrio, M. Rafanelli "A method to estimate the incidence of communicable
diseases under seasonal fluctuations with application to cholera", Applied Mathematics and Computation, 118
(2001), pp. 161-174
[5] J. Wallinga, M. Lipsitch "How generation intervals shape the relationship between growth rates and
reproductive numbers", Proceedings of the Royal Society B: Biological Sciences, 274 (2006), pp. 599-604.
[6] F.J. Richards "A flexible growth function for empirical use", Journal of Experimental Botany, 10
(1959), pp. 290-300.
[7] L. Zhong, L. Mu, J. Li, J. Wang, Z. Yin and D. Liu, "Early Prediction of the 2019 Novel Coronavirus
Outbreak in the Mainland China Based on Simple Mathematical Model," in IEEE Access, vol. 8, pp. 51761-
51769, 2020.
[8] A. Schneider, G. Hommel, M. Blettner "Linear Regression Analysis", Dtsch Arztebl Int, 107(44)
(Nov 2010), PMC2992018.
[9] M. T. Abuelma'atti and N. A. Tassaduq, "A new implementation for the logarithmic/exponential
function generator," 2014 International Symposium on Intelligent Signal Processing and Communication
Systems (ISPACS), Kuching, 2014, pp. 127-132.
[10] M. B. Jain, M. K. Nigam and P. C. Tiwari, "Curve fitting and regression line method based seasonal
short term load forecasting," 2012 World Congress on Information and Communication Technologies,
Trivandrum, 2012, pp. 332-337.
Authors