Science and Technology Journals
Article Info
Received: 27-04-2023 Revised: 10-05-2023 Accepted: 22-05-2023 Published: 30-05-2023
Abstract- Power consumption prediction is a difficult task because of its fluctuating nature. If the expected
demand is excessively high in comparison to the actual demand, the transformer may be damaged. Predicting the
temperature of the transformer oil is an efficient way to verify the transformer's safety status. In this
study, we therefore propose a bimodal architecture for predicting oil temperature from a sequence of prior temperatures.
Our model was evaluated on the ETTm1, ETTm2 and ETTh1 datasets and achieved an RMSE of 0.41375, MAE of
0.3031 and MAPE of 8.292% on the ETTm1 test set, an RMSE of 0.4105, MAE of 0.3090 and MAPE of 6.678% on
the ETTm2 test set, and an RMSE of 0.6762, MAE of 0.4690 and MAPE of 11.23% on the ETTh1 test set.
1 Introduction
The electric power distribution problem concerns distributing electricity to different areas according to their
sequential usage. It is challenging to predict the future demand of a specific location, however, because demand
fluctuates with the day of the week, season, weather, temperature, and other factors. No system currently in
use can provide an accurate long-term forecast from extremely long real-world data. Any erroneous
prediction can harm the transformer's electrical components. Because there is no reliable way to anticipate
future power use, managers must make decisions based on empirical estimates that are far higher than the actual
demand. If the prediction is inaccurate, the entire transformer can be damaged. On the other hand, a
transformer's electrical status may be inferred from its oil temperature, so predicting the oil temperature is an
efficient way to assess whether the transformer is operating safely, and it can help us avoid unnecessary waste.
Initially, statistical techniques such as ARIMA [1], [2], SARIMA and ARIMAX, and traditional machine
learning techniques such as GBRT and SVR [3], were used for time series forecasting (TSF). Because of their inability to capture long-range
dependencies within a time series, their performance was not up to the mark. Deep learning-based approaches
such as RNN, LSTM [4], and GRU have since been proposed for TSF and have shown promising results. A sophisticated
deep neural network is required to extract temporal relationships, since the time series data we work with is
growing more complex and diverse, ranging from univariate to multivariate to today's large-scale time
series.
The Transformer architecture [5] not only captures long-range dependencies but, through its self-attention
mechanism, can also concentrate on the segments of the sequence that are most crucial for prediction. Since its
introduction, the Transformer has been applied to a wide range of tasks, from NLP and speech recognition to
human-motion recognition, and there has since been a surge of Transformer-based models for TSF.
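As a brief illustration of the mechanism, scaled dot-product self-attention can be sketched as follows (a generic PyTorch sketch with assumed dimensions, not the implementation used in this work):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq_len, d); w_q, w_k, w_v: (d, d) projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # (batch, L, L)
    return F.softmax(scores, dim=-1) @ v                    # context vectors

x = torch.randn(8, 192, 64)              # 8 windows of 192 steps, 64-dim embedding
w = [torch.randn(64, 64) for _ in range(3)]
out = self_attention(x, *w)              # (8, 192, 64)
```

Every output position is a weighted combination of all input positions, which is what allows the model to attend to the most relevant part of the history regardless of how far back it lies.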
The major contributions of this manuscript are:
1. We propose a bimodal architecture consisting of two branches, a sequence Transformer branch and an
LSTM-CNN branch.
2. The proposed model achieves an RMSE of 0.41375, MAE of 0.3031 and MAPE of 8.292% on the ETTm1
test set, an RMSE of 0.4105, MAE of 0.3090 and MAPE of 6.678% on the ETTm2 test set, and an RMSE
of 0.6762, MAE of 0.4690 and MAPE of 11.23% on the ETTh1 test set.
2 Related Work
Theoretical guarantees exist for conventional time series forecasting techniques such as the ARIMA model [1] and
the Holt-Winters seasonal approach [6], but they apply only to univariate forecasting problems, which limits their
use on complicated time series data. Deep learning-based TSF algorithms have the potential to produce more
accurate forecasts than traditional methods thanks to recent increases in processing power and data availability
[7], [8]. As seen in Fig. 1, earlier RNN-based TSF algorithms [9], [10] condense the historical data into internal
memory states that are iteratively updated with fresh inputs at every time step. The implementation of RNN-
based models is severely constrained by the gradient vanishing/exploding problem [11] and their inefficient
training process [12].
Fig. 1 Schematic of an RNN-based TSF model: inputs are processed by a recurrent layer to produce the outputs.
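For reference, an RNN-based one-step forecaster of the kind shown in Fig. 1 can be sketched as follows (a generic PyTorch illustration with assumed dimensions, not any specific model from the cited works):

```python
import torch
import torch.nn as nn

class RNNForecaster(nn.Module):
    """LSTM that compresses the input window into a hidden state
    and maps the final state to a one-step forecast."""
    def __init__(self, n_features=1, hidden_size=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                 # x: (batch, seq_len, n_features)
        _, (h_n, _) = self.rnn(x)         # h_n: (1, batch, hidden_size)
        return self.head(h_n[-1])         # (batch, 1) next-step prediction

model = RNNForecaster()
window = torch.randn(32, 192, 1)          # 32 windows of 192 past temperatures
pred = model(window)                      # (32, 1)
```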
Due to the efficacy and robustness of the self-attention mechanism, Transformer-based models [5] have
recently replaced RNN models in practically all sequence modeling applications. Many
Transformer-based TSF approaches (see Fig. 2) have been proposed in the literature [13], [7], [14], [15], [16], [17], [18], [19].
Leveraging their impressive long-sequence modeling ability, these works frequently concentrate on the difficult
long-term time series forecasting problem.
Fig. 2 Schematic of a Transformer-based TSF model: inputs pass through a projection layer and an attention layer to produce the output.
3 Proposed Model
In this section, the proposed model is explained in detail.
Fig. 3 Overall architecture of the proposed bimodal model: the input sequence is embedded and fed to two branches, a sequence Transformer branch (encoder repeated x3) and an LSTM-CNN branch (Bi-LSTM, image formation block and CNN block), whose features are merged and passed through a linear layer to produce the output.
Fig. 4 Structure of an encoder block: input layer/embedding, LayerNorm, multi-head self-attention (MHSA), LayerNorm and a feed-forward network, producing the encoder output.
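Since only the schematics of Figs. 3 and 4 are reproduced above, the following PyTorch sketch illustrates one plausible realisation of the two branches and the feature-merging head; all layer sizes, the replacement of the image formation block by a 1-D convolution, and the mean-pooling merge are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class BimodalForecaster(nn.Module):
    """Illustrative sketch: Transformer branch + Bi-LSTM/CNN branch,
    merged features, linear head (dimensions are assumptions)."""
    def __init__(self, d_model=64):
        super().__init__()
        # Branch 1: sequence Transformer (embedding + 3 encoder blocks)
        self.embed = nn.Linear(1, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=128,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=3)
        # Branch 2: Bi-LSTM followed by a CNN block
        # (the paper's image formation block is replaced here by a
        #  1-D convolution as a stand-in)
        self.bilstm = nn.LSTM(1, d_model // 2, batch_first=True,
                              bidirectional=True)
        self.cnn = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
            nn.ReLU())
        # Feature merging + linear output layer
        self.head = nn.Linear(2 * d_model, 1)

    def forward(self, x):                      # x: (batch, seq_len, 1)
        t = self.encoder(self.embed(x))        # (batch, seq_len, d_model)
        l, _ = self.bilstm(x)                  # (batch, seq_len, d_model)
        c = self.cnn(l.transpose(1, 2)).transpose(1, 2)
        merged = torch.cat([t.mean(dim=1), c.mean(dim=1)], dim=-1)
        return self.head(merged)               # (batch, 1) oil temperature

pred = BimodalForecaster()(torch.randn(8, 192, 1))   # (8, 1)
```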
4 Experimental Setup
4.1 Datasets
The Electricity Transformer Temperature (ETT) dataset [20] gathers electrical data over two years (July 2016 to July
2018) from two transformers in China, including oil temperature and load data collected every 15 minutes (ETTm) or
every hour (ETTh). Each dataset was divided into train, validation and test sets in the ratio 8:1:1.
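A chronological 8:1:1 split of this kind can be obtained as sketched below; the file name and the 'OT' (oil temperature) column follow the public ETT release and are assumptions about the exact preprocessing used here.

```python
import pandas as pd

df = pd.read_csv("ETTm1.csv")          # public ETT file; 'OT' is the oil temperature
n = len(df)
n_train, n_val = int(0.8 * n), int(0.1 * n)

train = df.iloc[:n_train]                       # first 80%, in time order
val   = df.iloc[n_train:n_train + n_val]        # next 10% for validation
test  = df.iloc[n_train + n_val:]               # remaining ~10% for testing
```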
4.3 Hardware
The models were trained on an NVIDIA TITAN RTX GPU (24 GB VRAM).
4.4 Hyperparameters
The model was trained with AdamW using an initial learning rate of 3e-4 and the StepLR learning rate
scheduler. It was trained for 100 epochs with a batch size of 64, an input window size of
192 and an output horizon of 1. The models were implemented in PyTorch and trained on the NVIDIA TITAN RTX GPU.
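The training configuration above can be set up along the following lines; the StepLR step size and decay factor are not reported, so the values below are placeholders, and the linear model is a stand-in for the proposed architecture.

```python
import torch
import torch.nn as nn

model = nn.Linear(192, 1)              # stand-in for the proposed bimodal model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
# step_size and gamma are not reported in the paper; placeholder values
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)

for epoch in range(100):
    # ... one pass over the training DataLoader (batch size 64, window 192) ...
    scheduler.step()
```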
MAE: The Mean Absolute Error (MAE) is defined as the average of the absolute differences between the ground-truth
values and the predicted values.
Mathematically, MAE is calculated as:
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (3)
MAPE: The Mean Absolute Percentage Error (MAPE), also known as the Mean Absolute Percentage Deviation (MAPD), is
used to gauge the accuracy of the forecast and is expressed as a percentage. It is computed by averaging, over all
time steps, the absolute difference between the actual and predicted values divided by the absolute actual value.
Mathematically, MAPE is calculated as:
\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \quad (4)
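As a minimal sketch (assuming NumPy arrays of ground-truth and predicted values, with no zero ground-truth entries for MAPE), Equations (3) and (4) and the RMSE reported in the abstract can be computed as follows:

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # assumes y_true contains no zeros
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

y_true = np.array([30.2, 29.8, 31.0])      # example oil temperatures
y_pred = np.array([30.0, 30.1, 30.6])
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```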
Fig. 5 Ground-truth time series and time series predicted by the proposed model on the ETTm2 test set.
7 References
[1] G. E. P. Box and G. M. Jenkins, “Some Recent Advances in Forecasting and Control,” Journal of the Royal Statistical Society Series C: Applied Statistics, vol. 23, no. 2, pp. 158-179, 1974.
[2] P. Chujai and N. Kerdprasop, “Time Series Analysis of Household Electrical Consumption with ARIMA and ARMA
Models,” in International MultiConference of Engineers and Computer Scientists 2013 Vol I, Hong Kong, 2013.
[3] H. Drucker, L. Kaufman, A. Smola and V. Vapnik, “Support vector regression machines,” in NIPS, 1996.
[4] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, pp. 1735-1780, 1997.
[5] A. Vaswani and N. Shazeer, “Attention Is All You Need,” in Neural Information Processing Systems, Long Beach, 2017.
[6] C. C. Holt, “Forecasting seasonals and trends by exponentially weighted moving averages,” International Journal of
Forecasting, vol. 20, pp. 5-10, 2004.
[7] B. Lim, N. Loeff and T. Pfister, “Temporal Fusion Transformers for interpretable multi-horizon time series forecasting,”
International Journal of Forecasting, vol. 37, pp. 1748-1764, 2021.
[8] B. N. Oreshkin, D. Carpov, N. Chapados and Y. Bengio, “N-BEATS: Neural basis expansion analysis for interpretable time series forecasting,” in ICLR, 2020.
[9] S. S. Rangapuram, M. W. Seeger and J. Gasthaus, “Deep state space models for time series forecasting,” in NIPS, 2018.
[10] D. Salinas, V. Flunkert, J. Gasthaus and T. Januschowski, “DeepAR: Probabilistic forecasting with autoregressive recurrent
networks,” International Journal of Forecasting, vol. 36, pp. 1181-1191, 2020.
[11] Y. Bengio, P. Simard and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
[12] F. A. Gers, D. Eck and J. Schmidhuber, “Applying LSTM to time series predictable through time-window approaches,” in ICANN, Springer, 2001.
[13] S. Li, X. Jin, Y. Xuan and X. Zhou, “Enhancing the locality and breaking the memory bottleneck of transformer on time
series forecasting,” in NIPS, 2019.
[14] N. Wu and B. Green, “Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case,” arXiv preprint, 2020.
[15] K. Kondo and M. Kimura, “Sequence to sequence with attention for influenza prevalence prediction using google trends,” in
Proceedings of the 2019 3rd International Conference on Computational Biology and Bioinformatics, New York, 2019.
[16] L. S. Saoud and H. AlMarzouqi, “Cascaded Deep Hybrid Models for Multistep Household Energy Consumption Forecasting,” arXiv preprint, 2022.
[17] I. Sutskever, O. Vinyals and Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” arXiv, 2014.
[18] H. Wu, J. Xu and J. Wang, “Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting,” in NIPS, 2021.
[19] S. Liu, H. Yu, C. Liao and J. Li, “Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting,” in ICLR, 2022.
[20] H. Zhou, S. Zhang and J. Peng, “Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting,” in
AAAI, 2021.