Understanding Time Series
Forecasting Methods:
Moving averages,
ARIMA (AutoRegressive Integrated Moving Average),
Exponential smoothing techniques.
Study Notes- Understanding Time Series
Time series:
Time series analysis is a vital technique in data science used to derive meaningful insights from
time-based data.
Whether it's forecasting sales, predicting stock prices, or analyzing seasonal trends, time series
analysis offers effective methods to uncover patterns in sequential data.
A time series is commonly decomposed into three components:
Seasonality (S): Recurring patterns observed at regular intervals, such as monthly sales spikes.
Trends (T): Long-term movements or directions in the data over time.
Irregularity (I): Unpredictable or random fluctuations caused by unforeseen events.
Seasonality (S):
For example:
Daily Seasonality: Website traffic may peak during certain hours of the day.
Weekly Seasonality: Retail stores might see higher sales on weekends.
Monthly or Yearly Seasonality: Industries like tourism or retail may experience higher activity
during holidays or specific seasons.
Explanation:
Seasonality is important because it highlights predictable variations that can impact decision-
making. Recognizing these patterns helps businesses and analysts prepare for changes in
demand, allocate resources efficiently, and improve forecasting accuracy. For instance,
understanding seasonal trends in sales can help a business stock the right inventory during peak
periods, ensuring better customer satisfaction and profitability.
Trends (T):
Trends represent the overall long-term direction in time series data, reflecting consistent
upward or downward movement over a period. They capture the general trajectory of change
in the data, providing insight into the underlying structure and magnitude of shifts over time.
Key Points:
Definition: A trend indicates whether the data exhibits a sustained increase, decrease, or
remains relatively stable (horizontal) over a long period.
Purpose in Analysis: Identifying and modeling trends helps analysts isolate systematic patterns
from short-term fluctuations, making it easier to uncover deeper insights and improve
forecasting accuracy.
Application: Trends are often removed or detrended during analysis to focus on other
components like seasonality or irregularities.
Types of Trends:
1. Upward Trend: Data consistently increases over time (e.g., rising stock prices or population
growth).
2. Downward Trend: Data consistently decreases over time (e.g., declining sales of outdated
products).
3. Horizontal Trend: Data remains relatively stable without significant change over time.
4. Non-linear Trend: Data follows a curved pattern rather than a straight line, indicating complex
growth or decline.
5. Damped Trend: Data shows a trend that slows or stabilizes over time, such as saturation in
market adoption.
Importance:
Understanding trends is critical for identifying the overall trajectory of a system or process,
enabling informed decision-making and effective long-term planning. For example, recognizing
an upward trend in demand can prompt businesses to expand production or allocate additional
resources.
Irregularity (I):
Irregularity, also called noise or residuals, refers to random and unpredictable fluctuations in
time series data that cannot be explained by seasonality or trends. These anomalies represent
unexpected events or variations that deviate from the established patterns in the data.
Irregular components highlight the portion of the data not accounted for by predictable trends
or seasonal patterns.
Understanding and isolating these fluctuations is essential for accurate forecasting, as they may
affect the reliability of the model.
By recognizing and accounting for irregularities, analysts can build robust models that better
distinguish between systematic patterns and random noise.
Residual
A residual is the difference between an observed value and its predicted value, providing a
measure of how well a model aligns with real-world data. In statistics, models are built using
experimental data to analyze trends and make predictions. A smaller residual indicates a more
accurate model, whereas larger residuals suggest potential issues with the model's fit (e.g.,
using a linear model for non-linear data). The example below illustrates residuals in a simple
linear regression context:
The line of best fit, shown in blue, represents a model for the heights of boys across different
ages. Residuals are depicted as dotted red lines connecting each data point to the line of best
fit. Data points above the line have positive residuals, those below have negative residuals, and
points directly on the line have residuals of zero. By definition, the sum of all residuals should
equal zero, although in practice, slight deviations may occur due to rounding.
The Simple Moving Average (SMA) is a widely used technique for time series forecasting. It
calculates the average of a data series over a defined number of periods or intervals. This
method operates on the assumption that historical data reflects future trends, leveraging past
values to predict future outcomes.
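The SMA forecast can be sketched as follows. The series here is synthetic stand-in data, not the article's wind power dataset, but the 6-month window matches the one the article uses:

```python
import numpy as np
import pandas as pd

# Illustrative monthly series standing in for the article's wind power data
rng = np.random.default_rng(0)
idx = pd.date_range("2019-01-01", periods=36, freq="MS")
values = 50 + 10 * np.sin(2 * np.pi * np.arange(36) / 12) + rng.normal(0, 3, 36)
series = pd.Series(values, index=idx)

# Hold out the last 6 months as the test set
train, test = series.iloc[:-6], series.iloc[-6:]

# SMA forecast: the mean of the last `window` observations,
# held flat across the whole forecast horizon
window = 6
sma_forecast = pd.Series(train.rolling(window).mean().iloc[-1], index=test.index)
print(sma_forecast)
```

Note that the forecast is a single repeated value, which is exactly why SMA struggles with seasonal or trending data.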
In the above code, a moving average window of 6 months is used. Below is the plot of the test,
train, and forecasted data sets.
The model's accuracy is below 50%, indicating it is not well-suited for this dataset. While the SMA
method is straightforward and easy to understand, it has limitations. It may fail to capture intricate
patterns or shifts in the underlying data and assigns equal weight to all points in the selected window,
which might not be ideal for every time series. To address these shortcomings and achieve more precise
forecasts, advanced techniques such as exponential smoothing, ARIMA, or machine learning algorithms
can be employed.
Simple Exponential Smoothing (SES) is a straightforward forecasting technique for time series
data, where greater weight is given to more recent observations using an exponentially
decreasing approach.
Below is an example code snippet where the SimpleExpSmoothing class from the statsmodels
package is imported to build the model.
Here we have used a smoothing level of 0.001 to get the best accuracy. Below is the plot of the
test, train, and forecasted data sets.
The model is roughly 53.2% accurate, which is certainly better than before but still not a good
model.
SARIMA
While ARIMA is suitable for stationary time series data with a constant mean and variance,
SARIMA is tailored for datasets with seasonal trends, making it ideal for applications like sales
forecasting, temperature predictions, and more.
Confirming Stationarity
To build effective SARIMA models, it’s essential to confirm the stationarity of the time series
data. Two popular statistical tests used for this purpose are:
Augmented Dickey-Fuller (ADF) Test: This test checks for the presence of a unit root, which
indicates non-stationarity. If the p-value from the ADF test is below the chosen significance
level, the data is deemed stationary.
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test: This test evaluates stationarity by assessing the
null hypothesis that the series is trend-stationary.
By addressing both seasonal and non-seasonal aspects of time series data, SARIMA provides
robust and accurate forecasts for complex datasets.
In the above code, dataFinal is the data that we prepared in the last article. The code gives a
p-value of 0.016, which is below the 0.05 significance level; according to this test, the series is
stationary.
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test: The KPSS test evaluates the null hypothesis
that the time series is stationary. If the p-value exceeds the chosen significance level, the data is
considered stationary.
The code below gives a p-value of 0.1, which is greater than the 0.05 significance level, so the
null hypothesis is not rejected; according to this test, the series is stationary.
Now let's apply the SARIMA method to get the model parameters which will be used for
fitting. Below is the output, which shows the multiple steps taken to evaluate the model and
provide its parameters.
Now, with the model parameters in hand, let's fit the model and make predictions for the test
data.
The forecast is now ready; let's plot it to check how it looks alongside the train and test data sets.
The above looks pretty good and better than the previous models. Let's check the error and
accuracy. The model demonstrates an impressive accuracy of approximately 98.5%, making it
the most accurate among all the models tested. Consequently, this is the final model selected
for forecasting.
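An accuracy figure of this kind is typically computed as 100% minus the mean absolute percentage error (MAPE); a minimal sketch with hypothetical numbers:

```python
import numpy as np

# MAPE-based accuracy: accuracy = 100 - mean absolute percentage error
def forecast_accuracy(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100
    return 100 - mape

# Hypothetical test values and forecasts (not the article's numbers)
actual = [52.0, 48.5, 50.2, 49.8]
predicted = [51.3, 49.1, 50.9, 49.2]
print(round(forecast_accuracy(actual, predicted), 1))  # prints 98.7
```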
Conclusion
In our analysis of the wind power dataset, we evaluated three forecasting models: Simple
Moving Average, Simple Exponential Smoothing, and SARIMA. The models exhibited varying
levels of accuracy, highlighting their suitability for this data.
The Simple Moving Average method, using a 6-month moving window, achieved less
than 50% accuracy, indicating its limitations in capturing the intricate patterns and
variability of the dataset.
The Simple Exponential Smoothing method showed a slight improvement, with an
accuracy of roughly 53.2%. While better than the Moving Average approach, it still lacked the
precision needed for effective forecasting.
The SARIMA model outperformed both methods, achieving a remarkable accuracy of
98.5%. By incorporating seasonal, autoregressive, and moving average components,
SARIMA successfully captured the complex patterns and dependencies inherent in the
wind power data.
These results underscore SARIMA's effectiveness for time series data with seasonal
fluctuations.