REGRESSION ANALYSIS
TOPIC – AUTOCORRELATION
TEACHER – MANGESH KUTEKAR
Autocorrelation measures the correlation between a time series and a lagged copy of itself. The sample autocorrelation coefficient at lag k is:

r_k = \frac{\sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2}

where:
X_t = time series value at time t
X̄ = mean of the time series
n = total number of observations
k = lag
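A minimal NumPy sketch of this formula (the data values below are illustrative, not from the notes):

```python
import numpy as np

def sample_autocorr(x, k):
    """Sample autocorrelation r_k of series x at lag k."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    num = np.sum((x[: n - k] - xbar) * (x[k:] - xbar))  # sum over t = 1..n-k
    den = np.sum((x - xbar) ** 2)                       # sum over t = 1..n
    return num / den

x = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]  # illustrative values
print(sample_autocorr(x, k=1))
```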
Types of Autocorrelation:
1. Positive Autocorrelation – When high values are followed by high values and low
values by low values, indicating a persistent trend. Example: Stock prices showing
momentum.
2. Negative Autocorrelation – When high values are followed by low values and vice
versa, creating an alternating pattern. Example: Temperature fluctuations between
day and night.
3. No Autocorrelation – When there is no clear relationship between past and present
values, meaning the data points are randomly distributed. Example: White noise or
purely random processes.
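The three patterns can be illustrated by simulating simple AR(1)-style series and checking the sign of their lag-1 autocorrelation; the coefficients below are arbitrary choices for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
eps = rng.standard_normal(n)

def ar1(phi):
    """Simulate x_t = phi * x_{t-1} + noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def lag1(x):
    xbar = x.mean()
    return np.sum((x[:-1] - xbar) * (x[1:] - xbar)) / np.sum((x - xbar) ** 2)

print("positive:", lag1(ar1(0.8)))   # high follows high: r_1 near +0.8
print("negative:", lag1(ar1(-0.8)))  # alternating pattern: r_1 near -0.8
print("none:", lag1(eps))            # white noise: r_1 near 0
```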
Lag-1 Autocorrelation
Definition
Lag-1 autocorrelation measures the correlation between a time series and its
values at a lag of 1 time step. It helps in identifying short-term dependencies in
data.
The lag-1 autocorrelation coefficient is:

r_1 = \frac{\sum_{t=1}^{n-1} (X_t - \bar{X})(X_{t+1} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2}

where:
X_t = value of the time series at time t
X̄ = mean of the time series
n = total number of observations
Example
For daily temperature readings, the lag-1 autocorrelation pairs each day's value with the next day's: Monday with Tuesday, Tuesday with Wednesday, and so on.
Interpretation:
r_1 > 0 (Positive Autocorrelation): If today's value is high, tomorrow's value is also likely to be high.
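As a quick check, pandas offers Series.autocorr, which computes a Pearson-correlation variant of r_1 (values can differ slightly from the textbook formula); the series below is hypothetical:

```python
import pandas as pd

# Hypothetical daily readings; Series.autocorr correlates the series
# with itself shifted by `lag` (a Pearson variant of r_1).
s = pd.Series([20.0, 22.0, 23.0, 21.0, 25.0, 27.0, 26.0, 28.0, 30.0, 29.0])
print(s.autocorr(lag=1))
```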
Lag-k Autocorrelation
Definition
Lag-k autocorrelation measures the correlation between a time series and its values at a lag of
k time steps. It helps in identifying dependencies over longer periods.
The lag-k autocorrelation coefficient is:

r_k = \frac{\sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2}

where:
X_t = value of the time series at time t
X̄ = mean of the time series
k = lag (number of time steps)
n = total number of observations
Example
Consider monthly sales data for a store. If we calculate the lag-2 autocorrelation, we check how sales in January relate to sales in March, February to April, and so on.
Interpretation:
r_k > 0 (Positive Autocorrelation): Past values have a strong influence on future values.
r_k < 0 (Negative Autocorrelation): Past values show an inverse relationship with future values.
r_k ≈ 0 (No Autocorrelation): No clear pattern between past and future values.
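A short sketch using statsmodels' acf function on hypothetical monthly sales figures; index 2 of the result is the lag-2 value discussed above:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

# Hypothetical monthly sales for a store, January onwards
sales = np.array([200, 220, 210, 250, 240, 260, 255, 270, 265, 280, 275, 290])
r = acf(sales, nlags=4)
print(r[2])  # lag-2 autocorrelation: January vs. March, February vs. April, ...
```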
Partial Autocorrelation
Definition:
Partial Autocorrelation measures the direct relationship between a time series and its past
values at a given lag, after removing the influence of all intermediate lags. It helps in
identifying the actual lag order for time series models like AR (AutoRegressive) models.
Formula:
The partial autocorrelation function (PACF) at lag k is computed from the Yule-Walker equations, typically via the Durbin-Levinson recursion:

\phi_{kk} = \frac{r_k - \sum_{j=1}^{k-1} \phi_{k-1,j} \, r_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j} \, r_j}

where:
φ_kk = partial autocorrelation at lag k
r_k = autocorrelation coefficient at lag k
φ_kj = lower-order partial autocorrelation coefficients
Example:
Autocorrelation at lag 2 considers both Day-1 and Day-2 while predicting Day-3. Partial autocorrelation at lag 2 removes the effect of Day-2 (the intermediate lag) and measures only the direct relationship between Day-3 and Day-1.
Interpretation
Partial Autocorrelation (PACF) tells us how much a past value directly influences the present
value, after removing the effects of intermediate lags.
If today’s stock price is related to the price 3 days ago, PACF removes the influence
of days 1 and 2 to show the true relationship between today and day 3.
It helps in identifying the correct number of lags for models like AR and ARIMA by
focusing only on direct dependencies.
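A minimal sketch using statsmodels' pacf on a simulated AR(1) series, where only lag 1 should show a large direct effect:

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

rng = np.random.default_rng(1)
x = np.zeros(300)
for t in range(1, 300):          # simulated AR(1): x_t = 0.7 x_{t-1} + noise
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()

print(pacf(x, nlags=5))  # large spike at lag 1, near-zero values afterwards
```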
Consequences of Autocorrelation in Regression Analysis
Introduction
Autocorrelation occurs when the residuals (errors) in a regression model are correlated with
each other over time. This violates one of the key assumptions of Ordinary Least Squares
(OLS) regression, which assumes that errors are independent. When autocorrelation is
present, it can lead to unreliable estimates, misleading statistical tests, and poor forecasting
accuracy. Understanding its consequences is crucial to ensure valid and interpretable
regression results.
Detecting Autocorrelation
Introduction:
Autocorrelation occurs when past values in a time series influence future values, leading to
dependence between observations. Detecting autocorrelation is crucial in time series
analysis, as it affects model assumptions, statistical inferences, and forecasting accuracy.
Several methods, both graphical and statistical, can help identify the presence of
autocorrelation in data.
1. Autocorrelation Function (ACF) Plot → The ACF plot visually represents how a time
series is correlated with its previous values at different lags. If significant spikes
appear at specific lags, it indicates autocorrelation.
2. Durbin-Watson Test → A statistical test used to detect first-order autocorrelation in regression models. A value significantly below 2 suggests positive autocorrelation, while a value significantly above 2 indicates negative autocorrelation.
3. Ljung-Box Test → A hypothesis test that checks whether autocorrelation exists across
multiple lags. A low p-value (< 0.05) suggests that autocorrelation is present in the
time series.
4. Partial Autocorrelation Function (PACF) Plot → Unlike ACF, the PACF plot removes
indirect effects from intermediate lags, helping to identify the direct influence of past
values on the current value. This is useful for selecting the appropriate order of an AR
model.
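The sketch below exercises all four tools on a simulated trending series; the OLS-on-time regression is an assumption added here so the Durbin-Watson test has residuals to work with:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(2)
x = pd.Series(np.cumsum(rng.standard_normal(200)))  # trending (random-walk) series

plot_acf(x, lags=20)    # 1. ACF plot: slow decay indicates autocorrelation
plot_pacf(x, lags=20)   # 4. PACF plot: direct effect of each lag

# 2. Durbin-Watson needs regression residuals: regress x on time
resid = sm.OLS(x, sm.add_constant(np.arange(len(x)))).fit().resid
print(durbin_watson(resid))

# 3. Ljung-Box across lags 1..10
print(acorr_ljungbox(x, lags=[10]))

plt.show()
```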
Conclusion:
Identifying autocorrelation is essential for building reliable time series models and making
accurate predictions. By using ACF/PACF plots for visualization and statistical tests like the
Durbin-Watson and Ljung-Box tests, analysts can confirm whether autocorrelation exists. If
present, appropriate techniques like differencing, transformation, or ARIMA modeling
should be applied to improve model accuracy.
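For instance, first differencing (y_t = x_t − x_{t−1}) is a one-line operation in pandas; the series below is hypothetical:

```python
import pandas as pd

x = pd.Series([100, 104, 109, 115, 122, 130])  # hypothetical trending values
print(x.diff().dropna())  # first difference removes the upward trend
```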
Autocorrelation Function (ACF) Plot
The Autocorrelation Function (ACF) plot, also known as a correlogram, is a key tool in time
series analysis that visually represents how a time series is related to its past values at
different time lags. By analyzing the ACF plot, patterns such as trends, seasonality, and
dependencies can be identified, providing valuable insights into the structure of the data.
Partial Autocorrelation Function (PACF) Plot
The Partial Autocorrelation Function (PACF) plot measures the direct relationship between a
time series and its lagged values while eliminating the effects of intermediate lags. Unlike
the ACF plot, which considers cumulative correlations, the PACF plot isolates the influence of
each lag individually.
The PACF at lag k is computed from the Yule-Walker equations via the Durbin-Levinson recursion:

\phi_{kk} = \frac{r_k - \sum_{j=1}^{k-1} \phi_{k-1,j} \, r_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j} \, r_j}

where:
φ_kk is the partial autocorrelation at lag k
r_k is the autocorrelation coefficient at lag k
φ_kj represents lower-order partial autocorrelations
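A sketch of this recursion (the Durbin-Levinson algorithm) in NumPy, taking autocorrelations r_0..r_K as input; the demo sequence r_k = 0.7^k corresponds to an AR(1) process, for which the PACF should vanish beyond lag 1:

```python
import numpy as np

def pacf_durbin_levinson(r, kmax):
    """PACF values phi_kk for k = 0..kmax from autocorrelations r[0..kmax]."""
    phi = np.zeros((kmax + 1, kmax + 1))
    out = np.zeros(kmax + 1)
    out[0] = 1.0
    phi[1, 1] = out[1] = r[1]
    for k in range(2, kmax + 1):
        num = r[k] - np.sum(phi[k - 1, 1:k] * r[k - 1:0:-1])
        den = 1.0 - np.sum(phi[k - 1, 1:k] * r[1:k])
        phi[k, k] = out[k] = num / den
        for j in range(1, k):  # update lower-order coefficients phi_kj
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
    return out

r = 0.7 ** np.arange(6)  # autocorrelations of an AR(1) process with phi = 0.7
print(pacf_durbin_levinson(r, 5))  # ~0.7 at lag 1, ~0 beyond it
```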
Conclusion
The PACF plot helps in identifying the direct relationship between time series observations
and their past values while removing indirect effects. It is useful for model selection,
particularly in determining the order of an AR model in ARIMA forecasting.
Ljung-Box Test
The Ljung-Box test is a statistical hypothesis test that checks whether a time series exhibits
autocorrelation at multiple lags. It helps determine whether the series is random or
dependent on past values.
The test statistic Q is computed as:

Q = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n - k}

where:
n = number of observations
m = maximum lag being tested
r_k = autocorrelation at lag k
Under the null hypothesis of no autocorrelation, Q follows a chi-square distribution with m degrees of freedom.
1. If the p-value is large (p > 0.05): Fail to reject the null hypothesis → No significant
autocorrelation is detected.
2. If the p-value is small (p < 0.05): Reject the null hypothesis → The time series shows
significant autocorrelation.
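In practice the test is run with statsmodels' acorr_ljungbox; the white-noise and random-walk series below are simulated to show the two outcomes:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
noise = rng.standard_normal(200)   # white noise: expect a large p-value
walk = np.cumsum(noise)            # random walk: expect a tiny p-value

print(acorr_ljungbox(noise, lags=[10]))
print(acorr_ljungbox(walk, lags=[10]))
```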
Conclusion
The Ljung-Box test is useful for checking whether a time series is random (white noise) or
exhibits statistically significant dependencies over multiple lags. It is often used in model
validation to ensure that residuals in a fitted model are uncorrelated.
Durbin-Watson Test
The Durbin-Watson (DW) test is a statistical test used to detect autocorrelation in residuals
(errors) of a regression model. It checks whether successive errors in a time series regression
are correlated, which is a sign of autocorrelation.
The test statistic is:

DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}

where:
e_t = residual (error term) at time t
e_{t-1} = residual at the previous time step
n = number of observations
The DW statistic ranges from 0 to 4, with interpretations:
DW ≈ 2 → No autocorrelation
DW well below 2 (toward 0) → Positive autocorrelation
DW well above 2 (toward 4) → Negative autocorrelation
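A minimal sketch with statsmodels' durbin_watson applied to the residuals of a simple trend regression; the data are simulated so that the errors are positively autocorrelated:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
t = np.arange(100)
y = 2.0 * t + np.cumsum(rng.standard_normal(100))  # trend plus random-walk errors

resid = sm.OLS(y, sm.add_constant(t)).fit().resid
print(durbin_watson(resid))  # well below 2 here, signalling positive autocorrelation
```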
Case Study: BSE SENSEX Closing Prices
Stock market indices such as the BSE SENSEX are essential for risk assessment, investment strategies, and economic analysis.
By applying statistical techniques such as autocorrelation and partial autocorrelation
analysis, we can examine the persistence of price movements, detect seasonality, and
identify dependencies in the data. These insights are crucial for forecasting future stock
prices, developing trading strategies, and making informed financial decisions.
PACF Plot
(Figure omitted.)
Conclusion
The ACF plot suggests that the SENSEX time series is non-stationary, with a strong positive
correlation at lower lags and a gradual decay over time. This implies that past stock prices
significantly impact future prices, making forecasting models like ARIMA more suitable after
applying differencing to remove trends.
Durbin-Watson Test
(Test output omitted.)
Ljung-Box Test
The Ljung-Box test produced a test statistic of 454.18 at lag 10 with a p-value of 2.68e-91.
This extremely low p-value (far below 0.05) leads us to reject the null hypothesis of no
autocorrelation. In other words, there is very strong evidence that the BSE SENSEX closing
price series exhibits significant autocorrelation at the tested lags.
Final Thoughts:
The analysis reveals that the SENSEX time series exhibits significant short-term
dependencies and trends. While an AR(1) model is a good candidate for modeling
the time series, the non-stationary nature of the data indicates that differencing is
necessary before applying ARIMA.
The Durbin-Watson test supports the independence of the residuals after
differencing, and the Ljung-Box test shows significant autocorrelation in the raw data,
further emphasising the need for differencing.
Overall, the ARIMA model (with differencing to remove trends) would be the most
suitable approach for forecasting BSE SENSEX closing prices.
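A sketch of that workflow with statsmodels' ARIMA class; the `close` series here is a simulated stand-in for the actual SENSEX closing prices, which are not reproduced in these notes:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Simulated stand-in for the SENSEX closing-price series
rng = np.random.default_rng(5)
close = pd.Series(50000 + np.cumsum(rng.standard_normal(500) * 100))

# AR(1) on first differences = ARIMA(1, 1, 0), as suggested above
result = ARIMA(close, order=(1, 1, 0)).fit()
print(result.summary())
print(result.forecast(steps=5))  # next five predicted closing values
```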