AI For Real World Application - Notes (Module-6)
Definition:
A time series is an ordered sequence of values, typically denoted as $x_t$, where $t$ represents the time index (e.g., $t = 1, 2, 3, \ldots$).
○ Trend: The long-term movement in the data. It shows whether the series is
increasing, decreasing, or stagnant over time.
○ Seasonality: Regular, repeating patterns or cycles of behavior observed in the
series over fixed intervals (e.g., daily, monthly, or yearly).
○ Cyclic Variations: Irregular patterns in the data due to business or economic
cycles that do not have a fixed period.
○ Residual (Noise): Random variation or fluctuations that cannot be explained by the trend, seasonality, or cyclic components (a decomposition sketch follows this list).
○ Univariate Time Series: Involves a single variable recorded over time (e.g., stock
prices).
○ Multivariate Time Series: Involves multiple variables that are often interrelated
(e.g., temperature and humidity data).
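These components can be separated in practice with statsmodels' seasonal_decompose; a minimal sketch, where the synthetic monthly data and period=12 are purely illustrative assumptions:
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly data: trend + yearly seasonality + noise (illustrative)
rng = np.random.default_rng(0)
t = np.arange(48)
values = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 48)
series = pd.Series(values, index=pd.date_range("2020-01-01", periods=48, freq="MS"))

# Additive decomposition into trend, seasonal, and residual parts
result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().head())
print(result.seasonal.head())
print(result.resid.dropna().head())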
1. Forecasting: Predicting future values based on past observations (e.g., sales forecasting,
stock price prediction).
2. Anomaly Detection: Identifying unusual patterns or outliers in the data.
3. Trend Analysis: Analyzing long-term movements to inform strategic decisions.
4. Seasonal Adjustment: Removing seasonal effects to understand the underlying trends.
1. Deterministic Models: Assume the time series can be represented using mathematical
functions (e.g., linear or exponential trend models).
2. Stochastic Models:
○ Autoregressive (AR) Models: Models that use past values of the series to predict future values (see the AR sketch after this list).
○ Moving Average (MA) Models: Models that use past forecast errors to improve
predictions.
○ ARMA (Autoregressive Moving Average) Models: Combines AR and MA
models.
○ ARIMA (Autoregressive Integrated Moving Average) Models: Extends
ARMA by adding differencing to handle non-stationarity.
○ Seasonal ARIMA (SARIMA): Incorporates seasonal components into ARIMA
models.
○ State-Space Models: Includes Kalman filters to model time-varying processes.
3. Machine Learning Approaches: Methods such as gradient-boosted trees, support vector regression, and neural networks trained on lagged features.
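As an illustration of the AR family above, a minimal sketch fitting an AR(2) model with statsmodels (the data values are made up):
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

# Illustrative data
series = pd.Series([12, 14, 15, 13, 16, 18, 17, 19, 21, 20, 22, 23])

# AR(2): predict x_t from x_{t-1} and x_{t-2}
model = AutoReg(series, lags=2).fit()
print(model.params)             # intercept and lag coefficients
print(model.forecast(steps=3))  # forecasts for the next 3 points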
3. Autocorrelation: The correlation of a time series with its own lagged values.
1. Data Collection: Gather and preprocess the time series data, ensuring it is complete and
free of errors.
2. Exploratory Data Analysis (EDA):
○ Plot the time series to visualize trends and patterns.
○ Decompose the series into components (trend, seasonality, and residual).
3. Stationarity Testing: Check if the series is stationary or requires transformations.
4. Model Selection: Choose an appropriate model based on data characteristics.
5. Parameter Estimation: Estimate model parameters using techniques like maximum
likelihood or least squares.
6. Validation: Evaluate the model's performance using metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), or Mean Absolute Percentage Error (MAPE) (computed in the sketch after this list).
7. Forecasting: Predict future values based on the chosen model.
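The validation metrics from step 6 can be computed directly; a minimal numpy sketch with made-up actual and predicted values:
import numpy as np

# Illustrative actuals and forecasts
actual = np.array([100.0, 110.0, 120.0, 130.0])
predicted = np.array([98.0, 112.0, 118.0, 135.0])

mae = np.mean(np.abs(actual - predicted))           # Mean Absolute Error
rmse = np.sqrt(np.mean((actual - predicted) ** 2))  # Root Mean Squared Error
mape = np.mean(np.abs((actual - predicted) / actual)) * 100  # MAPE in percent

print(f"MAE={mae:.2f}, RMSE={rmse:.2f}, MAPE={mape:.2f}%")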
Common Challenges
1. Seasonality and Trends: Decomposing and accurately modeling these components can
be complex.
2. Non-Stationarity: Many real-world time series are non-stationary and require
transformations.
3. High Variability: Extreme fluctuations and noise can make modeling difficult.
4. Data Gaps: Missing data points can lead to biased models.
5. Overfitting: Overly complex models may fit the noise instead of the actual signal.
1. Python:
○ Libraries: pandas, numpy, statsmodels, scikit-learn, prophet,
sktime, tensorflow, keras.
2. R:
○ Libraries: forecast, TSA, tseries.
3. Software:
○ MATLAB, SAS, Excel (for basic analysis).
A stationary time series is one whose statistical properties, such as mean, variance, and
autocorrelation, remain constant over time. Stationarity is a critical concept in time series
analysis because many analytical methods and models (e.g., ARIMA) assume that the underlying
data is stationary.
Types of Stationarity
1. Strict Stationarity: The joint distribution of any collection of observations is invariant to shifts in time.
2. Weak (Covariance) Stationarity:
○ Only the first two moments (mean and variance) are constant, and the autocovariance depends only on the lag.
○ This is a less strict condition and is more commonly assumed in time series analysis.
Importance of Stationarity
1. Modeling Requirements: Many models like AR, MA, ARMA, and ARIMA assume
stationarity. For non-stationary data, transformations like differencing are required.
2. Predictability: A stationary series is easier to forecast because its statistical properties do
not change over time.
3. Simplified Analysis: With constant statistical properties, analysis and hypothesis testing
become more robust.
1. Visual Inspection:
○ Plot the time series: Look for trends, seasonality, or changing variance.
○ Plot the autocorrelation function (ACF): A stationary series typically has rapidly
decaying autocorrelations.
2. Statistical Tests: Formal tests such as the Augmented Dickey-Fuller (ADF) test or the KPSS test (an ADF sketch follows).
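A minimal sketch of the ADF test with statsmodels (the data values are illustrative):
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Illustrative trending series
series = pd.Series([120, 130, 115, 140, 135, 145, 150, 160, 155, 165, 170, 180])

# ADF null hypothesis: the series has a unit root (is non-stationary)
stat, pvalue = adfuller(series)[:2]
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
# A large p-value means we cannot reject non-stationarity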
Methods to Achieve Stationarity
1. Differencing:
○ Subtract the previous value from the current value to remove trends.
$y_t = x_t - x_{t-1}$
○ For seasonal effects, use seasonal differencing: $y_t = x_t - x_{t-m}$, where $m$ is the seasonal period (see the pandas sketch after this list).
2. Transformation:
○ Apply a log or power transformation to stabilize changing variance.
3. Detrending:
○ Separate the series into trend, seasonal, and residual components, then analyze the
residuals.
4. Smoothing:
○ Dampen short-term fluctuations with moving averages or exponential smoothing (covered in the smoothing section below).
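A minimal pandas sketch of ordinary and seasonal differencing (the data values are illustrative, and m = 4 is chosen just for the example):
import pandas as pd

series = pd.Series([120, 130, 115, 140, 135, 145, 150, 160, 155, 165, 170, 180])

# First difference: y_t = x_t - x_{t-1} (removes a linear trend)
diff1 = series.diff().dropna()

# Seasonal difference: y_t = x_t - x_{t-m}, here with m = 4
seasonal_diff = series.diff(4).dropna()

print(diff1.head())
print(seasonal_diff.head())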
Practical Applications of Stationarity
1. Financial Analysis:
○ Stationary returns in stock markets are analyzed rather than raw prices.
2. Econometrics: Stationarity underlies unit-root testing and cointegration analysis of economic indicators.
● Variants:
○ Simple Exponential Smoothing: Assumes no trend or seasonality.
○ Holt’s Method: Extends exponential smoothing to include trends.
○ Holt-Winters Method: Accounts for both trend and seasonality (a sketch follows this list of methods).
3. Locally Weighted Scatterplot Smoothing (LOWESS or LOESS)
● Definition: Fits a weighted local regression around each point, assigning higher weights (e.g., via kernel functions) to nearby observations.
● Advantage: Smooths data while maintaining structure.
7. Moving Median Smoothing
● Definition: Replaces each data point with the median of a window of neighboring
values.
● Advantages:
○ Robust to outliers.
○ Effective for reducing sharp spikes.
8. Fourier Transform Smoothing
● Definition: Transforms the data into the frequency domain and removes
high-frequency components (noise).
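As an example from the smoothing family above, a minimal Holt-Winters sketch with statsmodels (the quarterly toy data and the additive trend/seasonality settings are illustrative assumptions):
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Toy quarterly data with trend and seasonality (period = 4)
series = pd.Series([30, 21, 29, 31, 40, 28, 36, 40, 52, 35, 45, 50])

# Holt-Winters: level + additive trend + additive seasonality
model = ExponentialSmoothing(series, trend="add", seasonal="add",
                             seasonal_periods=4).fit()
print(model.fittedvalues.head())
print(model.forecast(4))  # forecast one full seasonal cycle ahead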
Steps for Applying Smoothing
1. Visualize Raw Data: Plot the data to understand its structure and variability.
2. Select a Technique: Choose a smoothing method based on the nature of the data
(e.g., linear trend, seasonal patterns).
3. Apply Smoothing:
○ Use computational tools (e.g., Python, R) to implement the technique.
○ Adjust parameters like window size or smoothing constant.
4. Evaluate the Result:
○ Compare the smoothed series to the raw series.
○ Ensure the smoothed data retains key patterns while removing noise.
Applications of Smoothing
Advantages of Smoothing
Disadvantages of Smoothing
1. Based on Purpose:
Example of Smoothing
Raw Data: Monthly sales data with irregular spikes and seasonality.
Implementation in Python
import pandas as pd
import matplotlib.pyplot as plt
# Example data
data = [120, 130, 115, 140, 135, 145, 150, 160, 155, 165, 170, 180]
series = pd.Series(data)
# Moving Average: 3-point rolling mean; the first window-1 values are NaN
window = 3
smoothed = series.rolling(window=window).mean()
# Plot
plt.plot(series, label="Original")
plt.plot(smoothed, label="Smoothed (Moving Average)", color="red")
plt.legend()
plt.show()
This approach demonstrates how smoothing can be applied to remove noise and reveal the underlying trend in the data.
Autocorrelation measures the relationship between a time series and its lagged version
over successive time periods. The autocorrelation function (ACF) quantifies these
correlations and provides insights into the time series' structure, such as trends,
seasonality, or randomness.
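In pandas, the lag-$k$ autocorrelation can be computed directly; a minimal sketch with made-up data:
import pandas as pd

series = pd.Series([12, 14, 15, 13, 16, 18, 17, 19, 21, 20])

# Correlation of the series with itself shifted by k steps
for k in range(1, 4):
    print(f"lag {k}: r = {series.autocorr(lag=k):.3f}")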
Key Concepts
1. Lag: The number of time steps separating the two observations being compared.
2. Autocorrelation Function (ACF):
The ACF plots the autocorrelation coefficients $r_k$ for different lags $k$. It helps identify trends, seasonality, and randomness in the series.
3. Stationarity:
○ Stationary series: ACF decays rapidly.
○ Non-stationary series: ACF decays slowly or remains high.
4. Partial Autocorrelation Function (PACF):
The PACF plots partial autocorrelations for different lags, showing the direct effect of lag $k$ on $x_t$ without mediation by intermediate lags.
● Use:
○ Identify the order of an autoregressive (AR) process.
○ Distinguish between AR and MA models in ARIMA.
For a moving average (MA) process:
● ACF: Significant spikes up to the order of the MA process, then cuts off.
● PACF: Decays exponentially or oscillates.
● The horizontal bands around zero in ACF/PACF plots represent the 95%
confidence interval.
1. Model Identification: Use the ACF and PACF patterns to choose the AR order $p$ and MA order $q$ for an ARIMA model.
1. Compute Autocorrelation:
○ Calculate $r_k$ for various lags using statistical software (e.g., Python, R).
2. Generate Plots:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Example data
data = [12, 14, 15, 13, 16, 18, 17, 19, 21, 20]
series = pd.Series(data)

# Side-by-side axes; PACF lags must stay below half the sample size
fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# Plot ACF
plot_acf(series, ax=ax[0], lags=4, title="Autocorrelation Function")
# Plot PACF
plot_pacf(series, ax=ax[1], lags=4, title="Partial Autocorrelation Function")

plt.tight_layout()
plt.show()
ACF vs PACF
● Decay: The ACF decays gradually for an AR process and cuts off for an MA process; the PACF decays gradually for an MA process and cuts off for an AR process.
Practical Considerations
1. Sample Size: Short series give noisy autocorrelation estimates with wide confidence bands.
3. White Noise: For white noise, the autocorrelations at all non-zero lags should fall within the confidence bands.
Summary
● The ACF quantifies the relationship between a time series and its lags, while the
PACF measures direct correlations after accounting for intermediate lags.
● Both are essential tools for time series analysis, particularly for identifying
ARIMA model components, detecting seasonality, and understanding series
behavior.
ARIMA models are statistical tools for analyzing and forecasting time series data. The
model combines three key components: Autoregression (AR), Differencing
(Integration, I), and Moving Averages (MA). These models are widely used for
understanding patterns in time series and making future predictions.
Model Notation
An ARIMA model is written as ARIMA($p$, $d$, $q$), where $p$ is the autoregressive order, $d$ the degree of differencing, and $q$ the moving-average order.
4. Estimate Parameters:
○ Estimate the model coefficients for the chosen $p$, $d$, and $q$ using statistical tools (e.g., Python, R).
5. Evaluate the Model:
○ Check residual diagnostics, e.g. with the Ljung-Box test (see the sketch after this list).
○ Ensure no significant autocorrelation remains.
6. Forecast:
○ Use the model to predict future values and evaluate accuracy using metrics
like Mean Squared Error (MSE) or Mean Absolute Error (MAE).
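A minimal sketch of the Ljung-Box residual check with statsmodels (the data and the ARIMA(1, 1, 1) order are illustrative choices):
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

series = pd.Series([120, 130, 115, 140, 135, 145, 150, 160, 155, 165, 170, 180])
fitted = ARIMA(series, order=(1, 1, 1)).fit()

# Null hypothesis: residuals show no autocorrelation up to the given lag;
# small p-values indicate remaining structure (a poor fit)
print(acorr_ljungbox(fitted.resid, lags=[5]))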
Extensions of ARIMA
○ Auto-ARIMA: Automatically selects the best values of $p$, $d$, and $q$ based on criteria like AIC or BIC (a sketch using the pmdarima package follows).
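A minimal sketch, assuming the third-party pmdarima package is installed (the data values are illustrative):
import pandas as pd
import pmdarima as pm

series = pd.Series([120, 130, 115, 140, 135, 145, 150, 160, 155, 165, 170, 180])

# Stepwise search over (p, d, q) that minimizes AIC
model = pm.auto_arima(series, seasonal=False, information_criterion="aic")
print(model.order)                 # the selected (p, d, q)
print(model.predict(n_periods=3))  # out-of-sample forecasts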
Advantages of ARIMA
Limitations of ARIMA
Practical Example
Python Implementation
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt

# Example data: monthly sales (illustrative)
series = pd.Series([120, 130, 115, 140, 135, 145, 150, 160, 155, 165, 170, 180])

# Fit an illustrative ARIMA(1, 1, 1) model and forecast 3 steps ahead
fitted = ARIMA(series, order=(1, 1, 1)).fit()
forecast = fitted.forecast(steps=3)

# Plot results
plt.plot(series, label='Original Data')
plt.plot(range(len(series), len(series) + len(forecast)), forecast, label='Forecast', color='red')
plt.legend()
plt.show()
Applications
Summary
Deep learning has emerged as a transformative approach for time series analysis,
providing advanced techniques to model complex temporal relationships, handle
high-dimensional data, and predict future trends. Unlike traditional methods like
ARIMA, which rely on predefined statistical assumptions, deep learning methods can
automatically learn from raw data, making them ideal for tackling diverse and non-linear
time series problems.
Core Concepts
○ Uses neural networks with multiple layers to learn features directly from
the data.
○ Handles both univariate (single variable) and multivariate (multiple
variables) series.
1. Forecasting: Predicting future values of a series (e.g., demand or load forecasting).
3. Classification: Assigning a label to an entire series or to segments of it.
1. Non-linear Modeling: Neural networks capture non-linear temporal dependencies that fixed-form statistical models miss.
1. Data Requirements: Deep models typically need large amounts of training data to generalize well.
4. Interpretability: Learned representations are harder to interpret than the coefficients of classical models.
1. Data Preparation: Window the series into fixed-length input/output pairs and scale the values (see the sketch at the end of this section).
2. Evaluation:
○ Metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), RMSE.
○ Check residuals for patterns to ensure model quality.
1. Regularization: Apply dropout, weight decay, or early stopping to curb overfitting.
2. Hyperparameter Tuning:
○ Optimize learning rates, number of layers, and units using grid search or Bayesian optimization.
3. Data Augmentation: Generate extra training windows, e.g. by slicing, jittering, or scaling existing series.
4. Transfer Learning: Reuse models or representations learned on related series when target data is scarce.
5. Hybrid Models:
○ Combine deep learning models with traditional methods like ARIMA for better performance.
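To make the workflow concrete, a minimal Keras sketch of windowed LSTM forecasting; the synthetic sine data, window size, layer sizes, and epoch count are all illustrative assumptions:
import numpy as np
import tensorflow as tf

# Synthetic series: a noisy sine wave (illustrative)
rng = np.random.default_rng(0)
t = np.arange(400, dtype="float32")
series = np.sin(0.1 * t) + rng.normal(0, 0.1, 400).astype("float32")

# Window into (past `window` values -> next value) supervised pairs
window = 20
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]  # shape: (samples, window, 1 feature)

# Small LSTM regressor for one-step-ahead forecasting
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Forecast the next value from the last observed window
print(model.predict(series[-window:].reshape(1, window, 1), verbose=0))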