
MSC STATISTICS AND DATA SCIENCE (2024-2026)

REGRESSION ANALYSIS
TOPIC – AUTOCORRELATION
TEACHER – MANGESH KUTEKAR

NAME - SHUBHI YADAV


SAP ID - 86062400023
ROLL NO - A075
Autocorrelation
Autocorrelation is a mathematical representation of the degree of similarity between a
given time series and a lagged version of itself over successive time intervals. It's
conceptually similar to the correlation between two different time series, but
autocorrelation uses the same time series twice: once in its original form and once lagged
one or more time periods.

r_k = \frac{\sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2}

where:
 X_t = time series value at time t
 \bar{X} = mean of the time series
 n = total number of observations
 k = lag
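
A minimal Python sketch of this formula, using NumPy (the numbers below are invented purely for illustration):

import numpy as np

def autocorr(x, k):
    # Sample autocorrelation r_k: covariance of the series with itself
    # shifted by k steps, divided by the overall variance.
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    num = np.sum((x[:n - k] - xbar) * (x[k:] - xbar))  # sum over t = 1..n-k
    den = np.sum((x - xbar) ** 2)                      # sum over t = 1..n
    return num / den

series = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]  # toy data
print(autocorr(series, k=1))  # lag-1 autocorrelation
print(autocorr(series, k=2))  # lag-2 autocorrelation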

Types of Autocorrelation:
1. Positive Autocorrelation – When high values are followed by high values and low
values by low values, indicating a persistent trend. Example: Stock prices showing
momentum.
2. Negative Autocorrelation – When high values are followed by low values and vice
versa, creating an alternating pattern. Example: Temperature fluctuations between
day and night.
3. No Autocorrelation – When there is no clear relationship between past and present
values, meaning the data points are randomly distributed. Example: White noise or
purely random processes.

Autocorrelation is widely used in various fields, including:


 Time Series Analysis & Forecasting – Identifies patterns, seasonality, and trends for
better predictions.
 Economics & Finance – Detects serial correlation in stock prices, returns, and
economic indicators.
 Image Processing & Computer Vision – Helps in texture analysis, image denoising,
and pattern recognition.
 Statistics & Machine Learning – Checks for autocorrelation in residuals to improve
model accuracy.
Because autocorrelation carries information about the trend in a set of historical data, it is also useful in technical analysis of the equity market.
Importance of Autocorrelation:
 Time Series Analysis – Autocorrelation helps detect trends, seasonality, and patterns
in time-dependent data like stock prices and weather conditions.
 Forecasting & Modeling – ARIMA and other time series models use autocorrelation
to improve prediction accuracy.
 Regression Analysis – Autocorrelation in residuals violates independence
assumptions, affecting model efficiency.
 Finance & Risk Analysis – Helps assess market efficiency, detect momentum effects,
and improve trading strategies.
 Signal & Image Processing – Enhances signal clarity by filtering noise in radar, sonar,
and pattern recognition.
Uses of Autocorrelation in Real Life
1. Stock Market Analysis – Autocorrelation helps detect trends and momentum in stock
prices, allowing investors to make informed trading decisions.
2. Weather Forecasting – Identifies seasonal patterns in temperature, rainfall, and
climate data, improving prediction accuracy.
3. Economics & Business – Used in sales forecasting to understand demand cycles and
optimize inventory management.
4. Signal Processing – Helps filter noise and enhance signal quality in radar, sonar, and
communication systems.
5. Medical Research – Analyzes ECG and EEG signals to detect abnormalities in heart
rate and brain activity.

Difference Between Autocorrelation and Multicollinearity:
 Autocorrelation – correlation between observations of the same variable across time (e.g., a series with its lagged values, or successive regression residuals).
 Multicollinearity – correlation among the independent variables of a regression model.
In short, autocorrelation is a time-dependence problem in the series or its errors, while multicollinearity is a cross-variable problem among the predictors.

Autocorrelation in Time Series Analysis
├── Time Series Analysis
│ ├── Autocorrelation Analysis
│ │ ├── Lag-1 Autocorrelation
│ │ ├── Lag-k Autocorrelation
│ │ ├── Partial Autocorrelation (PACF)
│ ├── Forecasting Models (AR, MA, ARIMA, etc.)

Lag-1 Autocorrelation
Definition
Lag-1 autocorrelation measures the correlation between a time series and its
values at a lag of 1 time step. It helps in identifying short-term dependencies in
data.

r_1 = \frac{\sum_{t=1}^{n-1} (X_t - \bar{X})(X_{t+1} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2}

where:
 X_t = value of the time series at time t
 \bar{X} = mean of the time series
 n = total number of observations
Example

Consider a daily temperature dataset. If the lag-1 autocorrelation is positive, today's temperature is strongly related to yesterday's. If it is negative, a high temperature today is likely to be followed by a low temperature tomorrow.

Interpretation:
 r_1 > 0 (Positive Autocorrelation): If today's value is high, tomorrow's is also likely to be high.
 r_1 < 0 (Negative Autocorrelation): If today's value is high, tomorrow's is likely to be low.
 r_1 ≈ 0 (No Autocorrelation): No relationship between consecutive values.

Lag-k Autocorrelation

Definition

Lag-k autocorrelation measures the correlation between a time series and its values at a lag of
k time steps. It helps in identifying dependencies over longer periods.

r_k = \frac{\sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2}

where:
 X_t = value of the time series at time t
 \bar{X} = mean of the time series
 k = lag (number of time steps)
 n = total number of observations

Example
Consider monthly sales data for a store. If we calculate the lag-2 autocorrelation, we check how sales in January relate to sales in March, February to April, and so on.

Interpretation:
 r_k > 0 (Positive Autocorrelation): Past values have a strong influence on future values.
 r_k < 0 (Negative Autocorrelation): Past values show an inverse relationship with future values.
 r_k ≈ 0 (No Autocorrelation): No clear pattern between past and future values.
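
In practice, these lag-k values are usually computed with a library rather than by hand. A hedged sketch with statsmodels (the monthly sales figures are invented):

import numpy as np
from statsmodels.tsa.stattools import acf

monthly_sales = np.array([200, 220, 250, 230, 210, 240,
                          270, 260, 235, 225, 255, 280], dtype=float)

r = acf(monthly_sales, nlags=4)  # r[0] is always 1; r[k] is the lag-k value
print(r[2])                      # lag-2 autocorrelation (Jan vs Mar, etc.)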

Partial Autocorrelation (PACF)

Definition:

Partial Autocorrelation measures the direct relationship between a time series and its past
values at a given lag, after removing the influence of all intermediate lags. It helps in
identifying the actual lag order for time series models like AR (AutoRegressive) models.

Formula:

The Partial Autocorrelation function (PACF) at lag k is computed using the Yule-Walker
equations or through recursive regression techniques, such as:

\phi_{kk} = \frac{r_k - \sum_{j=1}^{k-1} \phi_{k-1,j}\, r_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\, r_j}, \quad \phi_{kj} = \phi_{k-1,j} - \phi_{kk}\, \phi_{k-1,k-j}

where:
 r_k = regular autocorrelation at lag k
 \phi_{kk} = partial autocorrelation at lag k
 \phi_{kj} = partial autocorrelations at smaller lags
Example

Consider a daily stock price series:

Autocorrelation at lag 2 considers both Day-1 and Day-2 while relating Day-1 to Day-3.

Partial autocorrelation at lag 2 removes the effect of the intermediate day (Day-2) and measures only the direct relationship between Day-3 and Day-1.

Interpretation

 PACF is significant at lag k → direct influence of X_{t-k} on X_t.
 PACF is close to 0 at lag k → no direct influence after removing intermediate effects.

Partial Autocorrelation (PACF) tells us how much a past value directly influences the present value, after removing the effects of intermediate lags.

Think of it like this:

 If today’s stock price is related to the price 3 days ago, PACF removes the influence
of days 1 and 2 to show the true relationship between today and day 3.
 It helps in identifying the correct number of lags for models like AR and ARIMA by focusing only on direct dependencies, as the sketch below illustrates.
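
A small simulation can make the ACF/PACF contrast concrete. The sketch below generates an AR(1) series, for which lag 2 shows a nonzero ACF (an indirect effect through lag 1) but a near-zero PACF; the coefficient 0.7 and the seed are arbitrary choices:

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(0)
x = np.zeros(200)
for t in range(1, 200):          # simulate an AR(1) process
    x[t] = 0.7 * x[t - 1] + rng.normal()

print(acf(x, nlags=2)[2])   # lag-2 ACF: clearly nonzero (indirect effect)
print(pacf(x, nlags=2)[2])  # lag-2 PACF: near zero once lag 1 is removed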
Consequences of Autocorrelation in Regression Analysis

Introduction
Autocorrelation occurs when the residuals (errors) in a regression model are correlated with
each other over time. This violates one of the key assumptions of Ordinary Least Squares
(OLS) regression, which assumes that errors are independent. When autocorrelation is
present, it can lead to unreliable estimates, misleading statistical tests, and poor forecasting
accuracy. Understanding its consequences is crucial to ensure valid and interpretable
regression results.

1. Biased Standard Errors → Autocorrelation affects the estimated standard errors of regression coefficients, making them unreliable. This can lead to incorrect conclusions about the significance of variables in the model.
2. Inefficient Estimates → While Ordinary Least Squares (OLS) estimates remain
unbiased, they are no longer the Best Linear Unbiased Estimators (BLUE) because
they do not have the minimum variance, leading to less precise estimates.
3. Inflated R² and t-Statistics → The presence of autocorrelation can artificially
increase the R² value, making the model appear to fit the data better than it
actually does. Similarly, the t-statistics of coefficients may be inflated, leading to
incorrect statistical inferences.
4. Misleading Hypothesis Tests → Since standard errors are biased, the p-values
obtained from t-tests and F-tests may be inaccurate, resulting in false acceptance
or rejection of hypotheses about the regression coefficients.
5. Invalid Confidence Intervals → Autocorrelation distorts the confidence intervals of
estimated coefficients, making them either too narrow or too wide. This reduces
the reliability of interval estimates and weakens the credibility of regression
analysis.
6. Forecasting Errors → In time series or econometric models, autocorrelation can
cause large forecasting errors. Since past errors are correlated, the model may
underestimate or overestimate future values, leading to poor decision-making.
Conclusion:
Autocorrelation is a serious issue in regression analysis as it violates one of the key
assumptions of OLS, leading to inefficient estimators, misleading statistical tests, and
unreliable predictions. To address autocorrelation, methods such as Generalized Least
Squares (GLS), Cochrane-Orcutt transformation, or Newey-West standard errors should be
applied to improve model reliability and accuracy.
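
As one illustration of these remedies, Newey-West (HAC) standard errors can be requested directly from statsmodels OLS; the data below are simulated placeholders, not a worked case study:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 1)))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=100)

ols = sm.OLS(y, X).fit()                                       # classical SEs
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(ols.bse)  # standard errors that assume independent errors
print(hac.bse)  # autocorrelation-robust (Newey-West) standard errors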
Methods to Identify Autocorrelation in Time Series Data

Introduction:
Autocorrelation occurs when past values in a time series influence future values, leading to
dependence between observations. Detecting autocorrelation is crucial in time series
analysis, as it affects model assumptions, statistical inferences, and forecasting accuracy.
Several methods, both graphical and statistical, can help identify the presence of
autocorrelation in data.

Methods to Identify Autocorrelation:

1. Autocorrelation Function (ACF) Plot → The ACF plot visually represents how a time
series is correlated with its previous values at different lags. If significant spikes
appear at specific lags, it indicates autocorrelation.
2. Durbin-Watson Test → A statistical test used to detect first-order autocorrelation in
regression models. A value significantly lower than 2 suggests positive
autocorrelation, while a value above 2 indicates negative autocorrelation.
3. Ljung-Box Test → A hypothesis test that checks whether autocorrelation exists across
multiple lags. A low p-value (< 0.05) suggests that autocorrelation is present in the
time series.
4. Partial Autocorrelation Function (PACF) Plot → Unlike ACF, the PACF plot removes
indirect effects from intermediate lags, helping to identify the direct influence of past
values on the current value. This is useful for selecting the appropriate order of an AR
model.

Conclusion:
Identifying autocorrelation is essential for building reliable time series models and making
accurate predictions. By using ACF/PACF plots for visualization and statistical tests like the
Durbin-Watson and Ljung-Box tests, analysts can confirm whether autocorrelation exists. If
present, appropriate techniques like differencing, transformation, or ARIMA modeling
should be applied to improve model accuracy.
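
Of the remedies mentioned, differencing is the simplest; a short sketch with pandas (the series is a toy example):

import pandas as pd

series = pd.Series([100, 103, 108, 115, 119, 126])  # toy upward-trending data
diffed = series.diff().dropna()  # X_t - X_{t-1}; removes a linear trend
print(diffed.tolist())           # [3.0, 5.0, 7.0, 4.0, 7.0]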
Autocorrelation Function (ACF) Plot

The Autocorrelation Function (ACF) plot, also known as a correlogram, is a key tool in time
series analysis that visually represents how a time series is related to its past values at
different time lags. By analyzing the ACF plot, patterns such as trends, seasonality, and
dependencies can be identified, providing valuable insights into the structure of the data.

The autocorrelation coefficient at lag k is calculated using the formula

r_k = \frac{\sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2}
Interpreting the ACF Plot to Detect Autocorrelation


To determine whether autocorrelation is present in a dataset using the ACF plot, follow
these steps:
1. Identify Significant Lags:
o Examine the ACF plot for spikes that exceed the significance limits (typically
represented by dashed horizontal lines).
o These limits are set at ±1.96/√N, where N is the number of observations.
o A spike outside these bounds indicates a statistically significant
autocorrelation at that lag.
2. Observe the Pattern of Spikes:
o Gradual Decay: A slow decline in autocorrelation suggests that the time
series is non-stationary and may require differencing to remove trends.
o Alternating Signs: If autocorrelations alternate between positive and
negative, the data may exhibit seasonality or cyclic patterns.
o Regular Spikes: If significant spikes appear at consistent intervals (e.g., every
12 lags for monthly data), this confirms seasonality.
3. Assess the Overall Structure:
o All Lags Within Bounds: If all autocorrelations remain within the threshold
limits, the series lacks autocorrelation and appears random.
o Significant Spikes at Certain Lags: The presence of large spikes at specific lags
indicates that past values strongly influence future values, confirming
autocorrelation in the series.
Conclusion
The ACF plot is essential for detecting autocorrelation in time series data. It helps identify
dependencies, assess stationarity, and detect trends or seasonality, guiding the selection of
forecasting models like ARIMA for better predictions.
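
A correlogram like the one described above can be drawn with statsmodels, which adds the ±1.96/√N significance band automatically; the cumulative-sum series here is synthetic, chosen only to show the slow decay typical of a non-stationary series:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=300))  # a trending (non-stationary) series

plot_acf(x, lags=40)  # expect large, slowly decaying spikes
plt.show()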

Partial Autocorrelation Function (PACF) Plot

The Partial Autocorrelation Function (PACF) plot measures the direct relationship between a
time series and its lagged values while eliminating the effects of intermediate lags. Unlike
the ACF plot, which considers cumulative correlations, the PACF plot isolates the influence of
each lag individually.

The PACF at lag k is computed using the following equation based on the Yule-Walker
equations:

\phi_{kk} = \frac{r_k - \sum_{j=1}^{k-1} \phi_{k-1,j}\, r_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\, r_j}

where:
 \phi_{kk} is the partial autocorrelation at lag k
 r_k is the autocorrelation coefficient at lag k
 \phi_{kj} represents lower-order partial autocorrelations

Interpreting the PACF Plot to Detect Autocorrelation

1. Spikes at Certain Lags: If only a few spikes are significant, it suggests an autoregressive (AR) process in which dependencies exist for a limited number of past values.
2. Decay Pattern: If the PACF plot shows no significant spikes beyond a certain lag, it suggests that only the first few lags influence future values, which is useful for model selection (e.g., determining the AR order in ARIMA models).

Conclusion
The PACF plot helps in identifying the direct relationship between time series observations
and their past values while removing indirect effects. It is useful for model selection,
particularly in determining the order of an AR model in ARIMA forecasting.
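
A companion sketch for the PACF plot; for the simulated AR(1) series below, only the lag-1 spike should stand out, illustrating the cutoff used to pick the AR order:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

rng = np.random.default_rng(3)
x = np.zeros(300)
for t in range(1, 300):          # AR(1): only lag 1 should be significant
    x[t] = 0.7 * x[t - 1] + rng.normal()

plot_pacf(x, lags=40)
plt.show()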

Ljung-Box Test

The Ljung-Box test is a statistical hypothesis test that checks whether a time series exhibits
autocorrelation at multiple lags. It helps determine whether the series is random or
dependent on past values.
The test statistic Q is computed as

Q = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k}

where:
 n = number of observations
 m = maximum lag being tested
 r_k = autocorrelation at lag k

Under the null hypothesis of no autocorrelation, Q follows a chi-square distribution with m degrees of freedom.

Interpreting the Ljung-Box Test to Detect Autocorrelation

1. If the p-value is large (p > 0.05): Fail to reject the null hypothesis → No significant
autocorrelation is detected.
2. If the p-value is small (p < 0.05): Reject the null hypothesis → The time series shows
significant autocorrelation.

Conclusion
The Ljung-Box test is useful for checking whether a time series is random (white noise) or
exhibits statistically significant dependencies over multiple lags. It is often used in model
validation to ensure that residuals in a fitted model are uncorrelated.
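
A minimal sketch of the test in statsmodels; since the input here is white noise, a large p-value (no autocorrelation) is the expected outcome:

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(4)
x = rng.normal(size=250)  # white noise: the null should not be rejected

result = acorr_ljungbox(x, lags=[10])  # DataFrame with lb_stat, lb_pvalue
print(result)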

Durbin-Watson Test

The Durbin-Watson (DW) test is a statistical test used to detect autocorrelation in residuals
(errors) of a regression model. It checks whether successive errors in a time series regression
are correlated, which is a sign of autocorrelation.

The Durbin-Watson statistic (DW) is calculated as

DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}

where:
 e_t = residual (error term) at time t
 e_{t-1} = residual at the previous time step
 n = number of observations
The DW statistic ranges from 0 to 4, with interpretations:
 DW ≈ 2 → No autocorrelation
 DW < 2 → Positive autocorrelation
 DW > 2 → Negative autocorrelation

Interpreting the Durbin-Watson Test to Detect Autocorrelation

1. If DW is close to 2: No significant autocorrelation is detected.
2. If DW is significantly less than 2: Suggests positive autocorrelation, meaning past errors influence future errors in the same direction.
3. If DW is significantly greater than 2: Indicates negative autocorrelation, meaning past errors are inversely related to future errors.
Conclusion
The Durbin-Watson test is a simple yet effective method to quantify the presence of
autocorrelation in regression residuals. If significant autocorrelation is found, adjustments
such as time series modeling (ARIMA) or GLS regression may be needed.
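
A short sketch of the test on OLS residuals with statsmodels; the regression data are simulated with independent errors, so a value near 2 is expected:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
X = sm.add_constant(rng.normal(size=(100, 1)))
y = X @ np.array([2.0, 1.0]) + rng.normal(size=100)  # iid errors

resid = sm.OLS(y, X).fit().resid
print(durbin_watson(resid))  # close to 2 when residuals are uncorrelated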

BSE SENSEX Data


Introduction
The BSE SENSEX (Bombay Stock Exchange Sensitive Index) is a key indicator of India's stock
market performance, comprising the top 30 financially strong and actively traded companies
listed on the BSE. It reflects overall market trends, investor confidence, and economic
conditions.
This dataset contains historical closing prices of the BSE SENSEX over a specific period. The data provides valuable insights into market trends, price fluctuations, and volatility, which are essential for risk assessment, investment strategies, and economic analysis.
By applying statistical techniques such as autocorrelation and partial autocorrelation
analysis, we can examine the persistence of price movements, detect seasonality, and
identify dependencies in the data. These insights are crucial for forecasting future stock
prices, developing trading strategies, and making informed financial decisions.
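
The analysis reported in the rest of this section could be reproduced along the following lines. This is a sketch: the file name sensex.csv and its Date/Close columns are assumptions about how the data are stored, not a real source:

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.stats.stattools import durbin_watson

close = pd.read_csv("sensex.csv", parse_dates=["Date"],
                    index_col="Date")["Close"]

plot_acf(close, lags=40)   # correlogram of the raw closing prices
plot_pacf(close, lags=40)  # partial autocorrelations
plt.show()

print(acorr_ljungbox(close, lags=[10]))      # autocorrelation in raw series
print(durbin_watson(close.diff().dropna()))  # DW on the differenced series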

PACF Plot

Understanding the PACF Plot


 The x-axis (lags) represents different time intervals, showing how past values at specific
lags directly influence the present value.
 The y-axis (partial autocorrelation values) measures the direct correlation between the
time series and its lagged version, after removing the effect of any intermediate lags.
 The blue shaded region represents the 95% confidence interval—values outside this
range indicate statistically significant lags.

Interpretation of the PACF Plot


a) Significant Lag at Lag 1
 The PACF plot shows a high correlation at lag 1, meaning that the immediate past
value has a strong direct influence on the current price.
 This suggests that the BSE SENSEX closing prices follow an autoregressive process,
where today's price is heavily dependent on yesterday's price.
b) Sudden Drop After Lag 1
 After lag 1, the partial autocorrelation values drop sharply and remain mostly within
the confidence interval.
 This indicates that only the most recent lag has a direct effect, while other lags
influence the series indirectly.
 This is a strong indication of an AR(1) process, meaning that the time series can be
well modeled using just the first lag.
c) Small and Insignificant Correlations at Higher Lags
 The PACF plot shows no statistically significant spikes beyond lag 1 (all are within the
confidence bounds).
 This suggests that there is no strong direct relationship between further past values
and the present value after accounting for the intermediate lags.
 This reinforces the idea that the time series is best explained with a low-order
autoregressive model, likely AR(1).
d) No Strong Seasonality
 If there were seasonality in the data, we would observe significant spikes at regular
intervals (e.g., every 12 lags for monthly data). However, in this case, we do not see
such patterns, suggesting that there is no strong seasonal component in the data.
Conclusion
 The PACF plot suggests that an AR(1) model is a good fit for this time series, as only
lag 1 has a significant impact while higher-order lags do not contribute much.
 The absence of significant lags beyond the first confirms that the stock prices are
mainly influenced by their most recent values, with minimal direct impact from older
observations.
 If we were to build an ARIMA model, we might consider AR(1) as the autoregressive
term (p=1).
ACF Plot

Interpretation of the ACF Plot


1. Strong Positive Autocorrelation at Small Lags
o The plot shows high positive autocorrelation for small lag values (e.g., lag 1,
lag 2, etc.), indicating that the SENSEX closing prices are highly dependent on
their recent past values. This suggests a strong short-term relationship in the
data.
2. Gradual Decline in Autocorrelation
o The autocorrelation values decrease gradually as the lag increases. This
pattern is characteristic of non-stationary time series, where past values
continue to influence future values over time. A slow decay often suggests
the presence of a trend in the data.
3. Crossing Zero and Becoming Negative
o Around lag 25-30, some values fall below the zero line, meaning the series
exhibits weak negative autocorrelation at longer lags. This could indicate
periodicity or cyclic behavior in stock market movements.
4. Confidence Interval (Shaded Region)
o The shaded area represents the 95% confidence interval. Any bars extending
beyond this region are statistically significant autocorrelations. In this plot,
many lags have significant correlations, confirming the presence of
autocorrelation in the time series.

Conclusion
The ACF plot suggests that the SENSEX time series is non-stationary, with a strong positive
correlation at lower lags and a gradual decay over time. This implies that past stock prices
significantly impact future prices, making forecasting models like ARIMA more suitable after
applying differencing to remove trends.

Durbin-Watson Test

Durbin-Watson Statistic: 1.93245920192261


The Durbin-Watson statistic of 1.93 is very close to the ideal value of 2, which indicates that
there is little to no evidence of autocorrelation in the differenced price series. This suggests
that the residuals (the differences between consecutive values) are approximately
independent. In practical terms, it means that past errors are not significantly influencing
future errors, supporting the reliability of models that assume independent residuals.

Ljung-Box Test

Ljung-Box Test Results:


lb_stat lb_pvalue
10 454.177923 2.683943e-91

The Ljung-Box test produced a test statistic of 454.18 at lag 10 with a p-value of 2.68e-91.
This extremely low p-value (far below 0.05) leads us to reject the null hypothesis of no
autocorrelation. In other words, there is very strong evidence that the BSE SENSEX closing
price series exhibits significant autocorrelation at the tested lags.
Final Thoughts:
 The analysis reveals that the SENSEX time series exhibits significant short-term
dependencies and trends. While an AR(1) model is a good candidate for modeling
the time series, the non-stationary nature of the data indicates that differencing is
necessary before applying ARIMA.
 The Durbin-Watson test supports the independence of the residuals after
differencing, and the Ljung-Box test shows significant autocorrelation in the raw data,
further emphasising the need for differencing.
 Overall, the ARIMA model (with differencing to remove trends) would be the most
suitable approach for forecasting BSE SENSEX closing prices.
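
A closing sketch of that suggestion: ARIMA(1, 1, 0), i.e., an AR(1) model on the first-differenced series, fitted with statsmodels (close is the closing-price Series loaded earlier; the order comes from the ACF/PACF discussion above):

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(close, order=(1, 1, 0)).fit()  # d=1 applies the differencing
print(model.summary())
print(model.forecast(steps=5))  # five-step-ahead closing-price forecast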
