
Gas Prod

The document discusses analyzing a time series gas production dataset using R. It finds the dataset exhibits trend, seasonality and white noise. The series is non-stationary but becomes stationary after differencing. An ARIMA model is developed using auto.arima and manual steps to forecast the next 12 periods. The accuracy of the model is reported.

Uploaded by

Tushar Gupta
Copyright
© All Rights Reserved

Table of Contents

1 Project Objective
2 Data Analysis
2.1 Environment Set up and Data Import
2.1.1 Install necessary Packages and Invoke Libraries
2.1.2 Set up working Directory
2.1.3 Import and Read the Dataset
2.2 R Functions Used
2.3 Descriptive Statistics Analysis
2.4 Plots and Components
2.5 Periodicity
2.6 Deseasonal Series
3 Stationary Series
4 ARIMA Model
5 Accuracy

1 Project Objective
For this assignment, you are requested to download the forecast package in R. The package
contains methods and tools for displaying and analysing univariate time series forecasts, including
exponential smoothing via state space models and automatic ARIMA modelling. Explore
the gas (Australian monthly gas production) dataset in the forecast package to do the following:

• Read the data as a time series object in R. Plot the data.
• What do you observe? Which components of the time series are present in this dataset?
• What is the periodicity of the dataset?
• Is the time series stationary? Inspect it visually and conduct an ADF test. Write down
the null and alternate hypotheses for the stationarity test. De-seasonalise the series if
seasonality is present.
• Develop an ARIMA model to forecast the next 12 periods. Use both manual and auto.arima
(show and explain all the steps).
• Report the accuracy of the model.

2 Data Analysis

2.1 Environment Set up and Data Import

2.1.1 Install necessary Packages and Invoke Libraries


Use this section to install the necessary packages and load the associated libraries. Keeping
all package calls in one place improves code readability.
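
A minimal sketch of this set-up step is given below. The forecast package is named in the assignment; tseries is an assumption here, chosen because its adf.test function prints output in the format shown later in this report. The CRAN mirror URL is also just an illustrative choice.

```r
# Packages assumed for this analysis:
#   forecast - gas dataset, auto.arima, forecast, accuracy
#   tseries  - adf.test (assumption: not named explicitly in the report)
required_pkgs <- c("forecast", "tseries")

# Install only the packages that are not already available
missing_pkgs <- setdiff(required_pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Load everything in one place
library(forecast)
library(tseries)
```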

2.1.2 Set up working Directory


Setting the working directory at the start of the R session makes importing and exporting data
files and code files easier. The working directory is the location/folder on the PC that
holds the data, code and other files related to the project.

2.1.3 Import and Read the Dataset


Since the data ship with the forecast package, we load them from there.
The data are already a time series (ts) object, so no conversion is needed.
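
As a sketch, the dataset can be pulled straight from the package namespace and checked for its time series attributes. The variable name gas simply mirrors the dataset name.

```r
library(forecast)

# The dataset is exported by the forecast package as a ts object
gas <- forecast::gas

class(gas)       # should report a time series class
start(gas)       # first year and month
end(gas)         # last year and month
frequency(gas)   # observations per year
length(gas)      # number of observations
```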

2.2 R Functions Used


➢ Different R functions have been used in the analysis:
➢ str:- to show the structure of the data
➢ summary:- to summarise the data
➢ ts.plot:- to plot the data
➢ anyNA:- to find missing values
➢ window:- to take a subset of the data in time order
➢ hist:- to plot a histogram
➢ stl:- to decompose the data into its components
➢ diff:- to difference the series and make it stationary
➢ acf, pacf:- for autocorrelation and partial autocorrelation
➢ forecast, arima, auto.arima, Box.test, accuracy:- commands used for ARIMA modelling

2.3 Descriptive Statistics Analysis

➢ The original dataset contains 476 observations of 1 variable.
➢ The data are a time series.
➢ The series runs from January 1956 to August 1995.
➢ The frequency of the data is 12.
➢ The dataset contains no outliers.
➢ There are no missing values; the data were cleaned using the tsclean command.

Summary

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1646    2675   16788   21415   38629   66600
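
The checks described in this section could be run roughly as follows. This is a sketch only: the report does not show its own calls, and the name gas_clean is hypothetical.

```r
library(forecast)

gas <- forecast::gas

str(gas)       # structure of the ts object
summary(gas)   # five-number summary plus mean
anyNA(gas)     # TRUE if any value is missing

# tsclean() replaces outliers and interpolates missing values;
# counting the changed points shows how much cleaning took place
gas_clean <- tsclean(gas)
sum(gas_clean != gas)
```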

2.4 Plotting: Data and Components of Data

KEY OBSERVATIONS FROM DATA PLOTTING

➢ From the plot it can be seen that gas production was roughly constant for more than a decade (till 1970)
and increased after that. The series shows both trend and seasonality, and the seasonal swings grow with the level, so it is a multiplicative series.
➢ Gas production is highest in July, followed by August and June. Production is low in the first months of the year,
rises in the middle months and falls again towards the end of the year.
➢ Gas production has increased over the years.
➢ There are no outliers present in the data.
➢ Components present are TREND, SEASONALITY AND AN IRREGULAR (RANDOM) COMPONENT
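
One way to inspect these components is an STL decomposition. The sketch below assumes the log of the series is decomposed (so that a multiplicative pattern becomes additive) and that the seasonal pattern is held fixed with s.window = "periodic"; neither setting is stated in the report.

```r
library(forecast)

gas_log <- log(forecast::gas)

# STL splits the series into seasonal, trend and remainder parts
decomp <- stl(gas_log, s.window = "periodic")

head(decomp$time.series)   # columns: seasonal, trend, remainder
# plot(decomp) draws the data and the three components in stacked panels
```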

2.5 Periodicity in Data

Since the data record monthly gas production, the periodicity is monthly.
The frequency of the series is 12 (twelve observations per year).
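
The periodicity can be read off the ts object, and averaging by calendar month gives a quick view of the seasonal pattern. A small sketch, with monthly_mean as a hypothetical name:

```r
gas <- forecast::gas

frequency(gas)   # number of observations per year

# cycle() gives the position within the year (1 = January, ..., 12 = December)
monthly_mean <- tapply(as.numeric(gas), as.integer(cycle(gas)), mean)
names(monthly_mean) <- month.abb
round(monthly_mean)
```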

2.6 Deseasonalised Data
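
A deseasonalised series could be obtained along the following lines. This is a sketch under the assumption that the seasonal component is estimated with stl on the log scale and removed with seasadj from the forecast package; the report does not show which method it used.

```r
library(forecast)

gas_log <- log(forecast::gas)
decomp  <- stl(gas_log, s.window = "periodic")

# seasadj() returns the series with the seasonal component removed;
# exp() brings the result back to the original units
gas_deseason <- exp(seasadj(decomp))

head(gas_deseason, 12)
```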

3 STATIONARY SERIES
➢ A common assumption in many time series techniques is that the data are stationary.
➢ A stationary process has the property that the mean, variance and autocorrelation structure
do not change over time.
➢ Stationarity can be defined in precise mathematical terms, but for our purpose we mean a flat
looking series, without trend, with constant variance over time, a constant autocorrelation
structure over time and no periodic fluctuations (seasonality).
➢ The correlation between the t-th term in the series and the (t+m)-th term in the series is
the same for all time periods t; it depends only on the lag m.

Technique To Transform Series Into Stationary


➢ Differencing Data:- That is, given the series Z(t), we create the new series
Y(t) = Z(t) - Z(t-1). The differenced data will contain one less point than the
original data. Although you can difference the data more than once, one difference
is usually sufficient.
➢ Detrending:-If the data contain a trend, we can fit some type of curve to the data
and then model the residuals from that fit. Since the purpose of the fit is to simply
remove long term trend, a simple fit, such as a straight line, is typically used.
➢ Log:-For non-constant variance, taking the logarithm or square root of the series
may stabilize the variance. For negative data, you can add a suitable constant to
make all the data positive before applying the transformation. This constant can
then be subtracted from the model to obtain predicted (i.e., the fitted) values and
forecasts for future points.
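
The differencing and log steps can be illustrated on a tiny made-up vector (the numbers are purely illustrative, not gas data):

```r
# Illustrative values only
z <- c(10, 12, 15, 19, 24)

# First difference: each value minus the previous one
y <- diff(z)
y                        # 2 3 4 5
length(y)                # one point fewer than z

# Log first, then difference: approximately the period-on-period growth rate
y_log <- diff(log(z))
round(y_log, 3)
```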

3.1 ADF TEST


The Augmented Dickey-Fuller (ADF) test checks whether the series is stationary or
non-stationary.
A p-value below 0.05 lets us reject the null hypothesis and treat the series as stationary.

HYPOTHESIS FOR ADF TEST

NULL HYPOTHESIS (H0):- Series is non-stationary (it has a unit root)

ALTERNATE HYPOTHESIS (Ha):- Series is stationary

ADF TEST - NON-STATIONARY TIME SERIES

Since the p-value (0.2764) is more than 0.05, we cannot reject the null hypothesis, and we conclude
that the series is non-stationary.

Augmented Dickey-Fuller Test

data: gas
Dickey-Fuller = -2.7131, Lag order = 7, p-value = 0.2764
alternative hypothesis: stationary
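
Output in this format is what adf.test from the tseries package prints, so the test above was presumably run along these lines. The package choice is an assumption; the report does not show the call.

```r
library(forecast)   # gas dataset
library(tseries)    # adf.test (assumed source of the output above)

# By default adf.test uses lag order trunc((n - 1)^(1/3))
adf_raw <- adf.test(forecast::gas)

adf_raw$statistic   # Dickey-Fuller statistic
adf_raw$parameter   # lag order
adf_raw$p.value     # compare against 0.05
```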

ACF AND PACF PLOT

➢ Autocorrelation: correlation of the series with its own lagged values
➢ Autocorrelations of different orders give insight into the structure of the time series
➢ The ACF helps determine the MA order q of the series; -1 ≤ ACF ≤ 1 and ACF(0) = 1
➢ Partial autocorrelation adjusts for the intervening periods; the PACF helps determine the AR order p
➢ Theoretically (p, q) may take any value, but values higher than 2 are usually not preferred in
practical situations

The ACF and PACF are useful for identifying p and q only if the series is stationary

TRANSFORMATION TO STATIONARY SERIES USING LOG AND DIFFERENCING

❖ TAKING THE LOG OF THE SERIES STABILIZES THE VARIANCE

❖ DIFFERENCING STABILIZES THE MEAN OF THE SERIES

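ADF TEST FOR STATIONARY SERIES

Augmented Dickey-Fuller Test

data: gasdiff
Dickey-Fuller = -10.319, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

Since the p-value is less than 0.05, we reject the null hypothesis and conclude that the
transformed series is stationary.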
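
The transformation and re-test can be sketched as below. The reported lag order of 5 suggests that gasdiff was built from a subset of the series, but the window is not stated, so the full series is used here as an assumption and the numbers will differ from the output above. The name gas_log_diff is hypothetical.

```r
library(forecast)
library(tseries)

# Log to stabilise the variance, then first difference to stabilise the mean
gas_log_diff <- diff(log(forecast::gas))

# adf.test may warn that the true p-value is smaller than the printed one
adf_diff <- adf.test(gas_log_diff)
adf_diff$p.value

# Autocorrelation and partial autocorrelation up to 36 lags (three years)
acf_vals  <- acf(gas_log_diff,  lag.max = 36, plot = FALSE)
pacf_vals <- pacf(gas_log_diff, lag.max = 36, plot = FALSE)
```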

ACF AND PACF PLOT

4 ARIMA MODEL
➢ AR is a special and simpler form of a general class of models: ARIMA(p, d, q)

➢ AR (Autoregressive): Current observation is regressed on past observations

Y(t) = β1 Y(t-1) + β2 Y(t-2) + β3 Y(t-3) + ... + βp Y(t-p) + ε(t)

➢ MA (Moving Average): Current observation is regressed on past forecast errors

Y(t) = ε(t) + α1 ε(t-1) + α2 ε(t-2) + ... + αq ε(t-q)

where α1, α2, ..., αq are the moving average parameters and | α1 | < 1

➢ The ACF and PACF plots suggest the values of q and p; d is the number of differences
needed to make the series stationary

➢ Theoretically (p, q) may take any value, but values higher than 2 are usually not preferred
in practical situations

➢ ARIMA(p, d, q) identifies a non-seasonal model which needs to be differenced d times
to make it stationary and contains p AR terms and q MA terms
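
To make the AR definition concrete, the sketch below simulates an AR(2) process with base R and re-estimates its coefficients. The coefficients 0.6 and 0.2, the sample size and the seed are arbitrary illustrative choices, not values from this report.

```r
set.seed(42)   # arbitrary seed for reproducibility

# Simulate Y(t) = 0.6 Y(t-1) + 0.2 Y(t-2) + e(t)
y <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 500)

# Fit ARIMA(2, 0, 0): two AR terms, no differencing, no MA terms
fit_ar2 <- arima(y, order = c(2, 0, 0))

coef(fit_ar2)   # ar1 and ar2 should land near the simulated values
```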

4.1 FITTING ARIMA MODEL

After a few iterations over combinations of (p,d,q), we conclude that the best suited
values are (p,d,q) = (2,1,2) for the non-seasonal part and (P,D,Q) = (2,1,2) for the seasonal part.
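
The kind of iteration described here could look like the small grid search below. Several things are assumptions made for illustration: the training window starting in January 1970, the log transform, the fixed seasonal part (0,1,1) and the use of AIC to rank candidates. Fits that fail are recorded as NA rather than stopping the loop.

```r
gas_log <- log(forecast::gas)
train   <- window(gas_log, start = c(1970, 1))   # assumed window, not from the report

# Candidate non-seasonal orders with d fixed at 1
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- NA_real_

for (i in seq_len(nrow(grid))) {
  fit <- tryCatch(
    arima(train,
          order    = c(grid$p[i], 1, grid$q[i]),
          seasonal = list(order = c(0, 1, 1), period = 12),
          method   = "ML"),
    error = function(e) NULL   # some orders may fail to converge
  )
  if (!is.null(fit)) grid$aic[i] <- fit$aic
}

grid[order(grid$aic), ]   # lowest AIC first
```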

ASSUMPTIONS OF ARIMA

➢ If the original series is not stationary, differencing is necessary

➢ Check for model adequacy through residual plots and against fitted values

➢ Plot ACF(residuals)
Residuals should be approximately normally distributed.
Residuals should not be autocorrelated.

➢ Do a portmanteau test (BOX-LJUNG TEST) to check whether
the residuals are independent. The p-value must be greater than 0.05.
H0: Residuals are independent
Ha: Residuals are not independent

ONLY IF ALL THE ASSUMPTIONS ARE FULFILLED IS THE ARIMA MODEL CONSIDERED
SUITABLE FOR FORECASTING
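
The Box-Ljung check can be illustrated on simulated data with the base R function Box.test. The two series, the lag of 20 and the seed below are illustrative assumptions; in the analysis the test is applied to the model residuals.

```r
set.seed(123)   # arbitrary seed

white_noise <- rnorm(300)                                    # independent values
correlated  <- arima.sim(model = list(ar = 0.8), n = 300)    # strongly autocorrelated

# H0: the values are independent (no autocorrelation up to the chosen lag)
bt_white <- Box.test(white_noise, lag = 20, type = "Ljung-Box")
bt_corr  <- Box.test(correlated,  lag = 20, type = "Ljung-Box")

bt_white$p.value   # a large p-value is consistent with independence
bt_corr$p.value    # a tiny p-value rejects independence
```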

4.1.1 MANUAL ARIMA MODEL WITH NON SEASONAL EFFECT(2,1,2)

Since the residuals of this model do not satisfy the assumptions made above (the Box-Ljung
test rejects independence), we won’t consider this model for forecasting.

Coefficients:
         ar1      ar2      ma1     ma2
      1.7180  -0.9797  -1.8485  0.9999
s.e.  0.0184   0.0188   0.0559  0.0506

sigma^2 estimated as 0.003845: log likelihood = 108.77, aic = -207.55

BOX TEST

Box-Ljung test
data: gas.arima.fit$residuals
X-squared = 174.44, df = 50, p-value = 1.11e-15

FORECAST MODEL

4.1.2 MANUAL ARIMA MODEL WITH SEASONAL EFFECT(2,1,2)(2,1,2)

Since this model meets the assumptions (the Box-Ljung p-value is greater than 0.05), we will
consider this model for forecasting.

Coefficients:
          ar1      ar2     ma1      ma2    sar1     sar2     sma1    sma2
      -0.8669  -0.3009  0.3835  -0.1606  0.7900  -0.5407  -1.6498  0.9984
s.e.   0.3598   0.2419  0.3668   0.3006  0.2633   0.2241   0.9490  1.0718

sigma^2 estimated as 0.001374: log likelihood = 117.37, aic = -216.74

BOX TEST

Box-Ljung test

data: gas.arima.fit_season$residuals
X-squared = 50.505, df = 50, p-value = 0.4534

MODEL FORECAST

4.1.3 AUTO ARIMA MODEL

auto.arima searches over combinations of parameters and picks the one with the lowest
information criterion (AICc by default; AIC or BIC can be chosen instead). AIC (Akaike
Information Criterion) and BIC (Bayesian Information Criterion) are estimators used to
compare models. The lower these values, the better the model.

ARIMA(0,1,1)(0,1,1)[12]
Coefficients:
          ma1     sma1
      -0.5437  -0.8052
s.e.   0.0937   0.2372
sigma^2 estimated as 0.002052: log likelihood=114.2
AIC=-222.4 AICc=-222.04 BIC=-215.61
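
A sketch of the automatic fit and a 12-period forecast follows. The training window (log series from January 1970) is an assumption, since the report does not state which subset it used, so the selected order and coefficients may differ from the output above. The names auto_fit and fc are hypothetical.

```r
library(forecast)

gas_log <- log(forecast::gas)
train   <- window(gas_log, start = c(1970, 1))   # assumed window

# Stepwise search over seasonal and non-seasonal orders
auto_fit <- auto.arima(train)
auto_fit

# Forecast the next 12 periods (on the log scale)
fc <- forecast(auto_fit, h = 12)
fc$mean
```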

BOX TEST

Box-Ljung test

data: auto.fit$residuals
X-squared = 61.286, df = 50, p-value = 0.1316

MODEL FORECAST

5 ACCURACY OF MODEL

ARIMA MODEL WITH SEASONAL EFFECT

                       ME       RMSE        MAE         MPE      MAPE
Training set 0.0009694977 0.03433724 0.02543251 0.008201871 0.2371457
Test set     0.0045605091 0.05058495 0.04131225 0.038808571 0.3841784

                  MASE        ACF1 Theil's U
Training set 0.3649114 -0.01063201        NA
Test set     0.5927576  0.12740599  0.508929

AUTO ARIMA

                       ME       RMSE        MAE         MPE      MAPE
Training set 0.0002340178 0.04106044 0.02937808 0.001000298 0.2740695
Test set     0.0350038511 0.05821483 0.04413084 0.321383923 0.4075100

                  MASE        ACF1 Theil's U
Training set 0.4215233 -0.02145511        NA
Test set     0.6331993  0.10362446 0.5987873
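
Tables of this shape are what accuracy from the forecast package returns when given a forecast and a hold-out set. The sketch below assumes a split that keeps the last 12 months as the test set and uses the ARIMA(0,1,1)(0,1,1)[12] order reported for auto.arima; the report's own split is not stated, so the numbers will not match exactly.

```r
library(forecast)

gas_log <- log(forecast::gas)

# Assumed split: train up to August 1994, test on the final 12 months
train <- window(gas_log, start = c(1970, 1), end = c(1994, 8))
test  <- window(gas_log, start = c(1994, 9))

fit <- Arima(train, order = c(0, 1, 1), seasonal = c(0, 1, 1))
fc  <- forecast(fit, h = length(test))

# Rows: training set and test set; columns: ME, RMSE, MAE, MPE, MAPE, ...
acc <- accuracy(fc, test)
acc[, c("RMSE", "MAE", "MAPE", "MASE")]
```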

BEST MODEL

TEST-SET MAPE IS ALMOST THE SAME FOR BOTH MODELS (0.38 FOR ARIMA WITH SEASON, 0.41 FOR AUTO ARIMA)

BUT LOWER AIC VALUE MEANS BETTER MODEL

HENCE, AUTO ARIMA IS CHOSEN AS THE BETTER MODEL

MODEL               AIC       MAPE

ARIMA WITH SEASON   -216.74   0.38

AUTO ARIMA          -222.4    0.41
