
SSL Assignment Report 1

Zincenco Petru, Pătrașcu Adrian-Octavian


April 17, 2025

1 Abstract
Our chosen assignment [3] involves forecasting the energy consumption and production time-series for
prosumers (consumers who also produce energy, mainly via solar panels) on the Enefit platform in
Estonia. We focus on relevant data features such as prosumer installed capacity, client metadata, his-
torical and forecast weather data from nearby stations, natural gas prices, and electricity market prices.
We model this time-series data using Gated Recurrent Units (GRUs), LSTMs with Attention
(LSTM-Attn), MLPs, and XGBoost [5], [9], [10]. We also open-source our results and experiments at
[15].

Contents

1 Abstract
2 Dataset Preprocessing
  2.1 Data Loading and Cleaning
  2.2 Data Merging and Alignment
  2.3 Feature Extraction and Engineering
  2.4 Normalization and Data Splitting
3 Models Used
  3.1 MLP Model – Base Model
  3.2 LSTM with Attention – Improved Sequential Modeling
  3.3 GRU Model – Enhanced Temporal Dynamics
  3.4 XGBoost Gradient Boosting – Best Performing Model
4 Results
  4.1 MLP
  4.2 LSTM-Attn
  4.3 GRU (window = 50)
  4.4 GRU (window = 100)
  4.5 XGBoost
5 Future Work
  5.1 Data Augmentation
    5.1.1 Common Time Series Augmentation Techniques
    5.1.2 Generative Time Series Synthetization Techniques
    5.1.3 Evaluating Synthesized Data Quality
  5.2 Models
    5.2.1 Boosting-based methods
    5.2.2 Traditional Statistical Models
    5.2.3 Transformer Models
A Dataset Description
  A.1 train.csv
  A.2 gas_prices.csv
  A.3 client.csv
  A.4 electricity_prices.csv
  A.5 forecast_weather.csv
  A.6 historical_weather.csv
B Weather Visualization

2 Dataset Preprocessing
Of the plethora of features listed in Appendix A, we pruned most of them to save compute and to
check the models' performance using the most intuitive and descriptive features available.

2.1 Data Loading and Cleaning


We load and clean each source file as follows:

• train.csv: core energy target data (datetime, target, is_consumption, data_block_id).
• client.csv: client metadata (installed_capacity, county, product_type).
• forecast_weather.csv: weather forecasts (e.g. direct_solar_radiation, surface_solar_radiation_downwards).
• historical_weather.csv: observed weather (temperature, dewpoint, rain).
• gas_prices.csv & electricity_prices.csv: day-ahead market prices.
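As an illustration, the loading step can be sketched with pandas. The helper name and the drop-missing-rows policy are our assumptions; the column names are those of train.csv:

```python
import io
import pandas as pd

def load_and_clean(source, datetime_col="datetime"):
    """Read one CSV source, parse timestamps, and drop incomplete rows."""
    df = pd.read_csv(source, parse_dates=[datetime_col])
    return df.dropna().reset_index(drop=True)

# Tiny in-memory stand-in for train.csv (values are invented, columns are real).
raw = io.StringIO(
    "datetime,target,is_consumption,data_block_id\n"
    "2022-09-01 00:00:00,12.5,1,0\n"
    "2022-09-01 01:00:00,,1,0\n"       # missing target -> dropped
    "2022-09-01 02:00:00,10.1,1,0\n"
)
train = load_and_clean(raw)            # keeps the 2 complete rows
```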

2.2 Data Merging and Alignment


The pipeline merges these CSV sources primarily on data_block_id and datetime, ensuring each record
is enriched with:

• Energy metrics from train.csv.
• Prosumer characteristics from client.csv.
• Forecast weather features.
• Historical weather for validation.
• Gas and electricity price indicators.
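A minimal merge sketch with pandas, assuming left joins on the shared keys (the frame contents are invented; the key and price columns come from the dataset):

```python
import pandas as pd

train = pd.DataFrame({
    "datetime": pd.to_datetime(["2022-09-01 00:00", "2022-09-01 01:00"]),
    "data_block_id": [0, 0],
    "target": [12.5, 10.1],
})
gas = pd.DataFrame({
    "data_block_id": [0],
    "lowest_price_per_mwh": [80.0],
    "highest_price_per_mwh": [95.0],
})
# A left join keeps every energy record and attaches the matching price block.
merged = train.merge(gas, on="data_block_id", how="left")
```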

2.3 Feature Extraction and Engineering


After merging, we derive features meaningful for solar prosumers:

• Sun Intensity Estimation: a proxy from direct_solar_radiation and surface_solar_radiation_downwards.
• Temporal Features: decompose timestamps into hour, day, month.
• Rolling Statistics: moving averages and standard deviations on energy and weather.
• Market Context: normalized gas and electricity prices.
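The temporal and rolling features can be derived as below; the window length of 3 and the toy target values are arbitrary demo choices:

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.date_range("2022-09-01", periods=6, freq="h"),
    "target": [5.0, 6.0, 4.0, 7.0, 8.0, 6.0],
})
# Temporal decomposition of the timestamp.
df["hour"] = df["datetime"].dt.hour
df["day"] = df["datetime"].dt.day
df["month"] = df["datetime"].dt.month
# Rolling statistics over the energy target.
df["roll_mean_3"] = df["target"].rolling(3).mean()
df["roll_std_3"] = df["target"].rolling(3).std()
```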

2.4 Normalization and Data Splitting


Numerical features are scaled for stable training. The final dataset is then split into training, validation,
and test sets in temporal order to prevent leakage.
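A sketch of the leakage-free split and scaling. The 70/15/15 fractions are illustrative, since the report does not state the exact ratios:

```python
import numpy as np

def temporal_split(X, train_frac=0.70, val_frac=0.15):
    """Split rows in temporal order: oldest -> train, newest -> test."""
    n = len(X)
    i, j = int(n * train_frac), int(n * (train_frac + val_frac))
    return X[:i], X[i:j], X[j:]

X = np.arange(100, dtype=float).reshape(-1, 1)   # rows already time-ordered
train, val, test = temporal_split(X)

# Standardize using training-set statistics only, to avoid leakage.
mu, sigma = train.mean(axis=0), train.std(axis=0)
train_s, val_s, test_s = [(part - mu) / sigma for part in (train, val, test)]
```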

3 Models Used
In this project, four distinct models have been employed to forecast energy consumption and production.
Each model leverages a unique architecture tailored to capture the data’s temporal and feature-based
characteristics. They are described below in the order of experimentation and performance:

3.1 MLP Model – Base Model


Architecture: A basic feed-forward neural network composed of several fully-connected (dense) layers
with ReLU activation functions. This model serves as the starting point by learning nonlinear combina-
tions of the input features without explicit temporal dependencies.
Input Data:
• A fixed-length feature vector that aggregates historical energy targets, weather parameters, and
client metadata over a predefined time window (e.g., past 24 hours).
Output Data:
• A single prediction per instance that represents the immediate future energy consumption/production
value.
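The MLP could look like the following PyTorch sketch; the layer widths and depth are our guesses, since the report specifies only dense layers with ReLU:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Feed-forward regressor: dense layers with ReLU, one output per instance."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

model = MLP(in_dim=32)                 # 32 aggregated features, e.g. past 24h
y_hat = model(torch.randn(8, 32))      # batch of 8 -> 8 scalar predictions
```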

3.2 LSTM with Attention – Improved Sequential Modeling


Architecture: An LSTM-based recurrent neural network enhanced with an attention mechanism
(LSTM-Attn). The LSTM units process the sequential data, while the attention layer dynamically
weights the hidden states to focus on the most relevant parts of the input sequence. This approach was
adopted after a simpler RNN failed to produce satisfactory predictions.
Input Data:
• Sequential time series data where each timestep includes features such as past energy values,
corresponding timestamps, weather parameters, and client information.
Output Data:
• A sequence of predictions over the forecast horizon, benefiting from the model’s ability to selectively
emphasize important temporal features.
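One common way to realize the attention layer is a learned score per hidden state, softmax-normalized over time. The PyTorch sketch below is our interpretation, not the report's exact architecture; for brevity it emits a single next-step value rather than a full horizon:

```python
import torch
import torch.nn as nn

class LSTMAttn(nn.Module):
    """LSTM whose hidden states are pooled by learned attention weights."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)       # one attention score per timestep
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, seq, features)
        h, _ = self.lstm(x)                     # (batch, seq, hidden)
        w = torch.softmax(self.score(h), dim=1) # weights over the timesteps
        ctx = (w * h).sum(dim=1)                # attention-weighted hidden state
        return self.head(ctx)

model = LSTMAttn(in_dim=16)
y_hat = model(torch.randn(4, 24, 16))           # 24-step input window
```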

3.3 GRU Model – Enhanced Temporal Dynamics


Architecture: A gated recurrent unit (GRU) network that efficiently models sequential dependen-
cies using a simplified gating mechanism compared to LSTM. This results in faster training and fewer
parameters while capturing long-term dependencies effectively.
Input Data:
• Sequential time series data (same format as for the LSTM-Attn model): energy metrics, weather
features, and client metadata over a designated time window.
Output Data:
• A sequence of predictions corresponding to future time intervals. The GRU's efficient structure
provides improved performance relative to the LSTM-Attn model.
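A minimal GRU forecaster in PyTorch, predicting from the final hidden state; the hidden size is illustrative, while the window of 50 matches subsection 4.3:

```python
import torch
import torch.nn as nn

class GRUForecaster(nn.Module):
    """GRU whose final hidden state feeds a linear regression head."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                   # x: (batch, seq, features)
        h, _ = self.gru(x)                  # (batch, seq, hidden)
        return self.head(h[:, -1])          # predict from the last timestep

model = GRUForecaster(in_dim=16)
y_hat = model(torch.randn(4, 50, 16))       # window = 50
```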

3.4 XGBoost Gradient Boosting – Best Performing Model


Architecture: A gradient boosting framework that builds an ensemble of decision trees. XGBoost [2]
is optimized for speed and performance by using regularization to prevent overfitting and by efficiently
handling missing values, making it well-suited for tabular data with engineered features.
Input Data:
• A fixed-length vector of carefully aggregated and engineered features derived from historical energy
data, statistical summaries, weather forecasts, and market prices.
Output Data:
• A single prediction per sample. The optimized tree ensemble delivers the highest forecasting
accuracy among all the methods tested.

4 Results
In this section we report each model’s forecast performance and show their prediction vs. ground-truth
curves plus training/validation loss plots.

Table 1: Mean Absolute Error (MAE) summary for all models

Model               MAE (kWh)
MLP                 136.43
LSTM-Attn           173.24
GRU (window = 50)   158.65
GRU (window = 100)  116.48
XGBoost             80.88

4.1 MLP

(a) Prediction vs. Ground Truth (b) Training and Validation Loss

Figure 1: MLP Results

4.2 LSTM-Attn

(a) Prediction vs. Ground Truth (b) Training and Validation Loss

Figure 2: LSTM-Attn Results

4.3 GRU (window = 50)

(a) Prediction vs. Ground Truth (b) Training and Validation Loss

Figure 3: GRU (window = 50) Results

4.4 GRU (window = 100)

(a) Prediction vs. Ground Truth (b) Training and Validation Loss

Figure 4: GRU (window = 100) Results

4.5 XGBoost

Figure 5: XGBoost: prediction vs. ground truth.

5 Future Work
5.1 Data Augmentation
We want to employ several augmentation techniques to make our models more robust and prevent
overfitting. Time series are particularly hard to augment because their temporal structure is
inherently tied to their behavior. We therefore categorize the augmentation techniques into two avenues:

5.1.1 Common Time Series Augmentation Techniques


1. Noise: small random jittering increases the models' resilience to minor sensor misreadings of
weather or energy consumption [8].
2. Magnitude Scaling: scaling installed_capacity (subsection A.3) by different factors simulates
different users with different energy consumption capabilities.
3. Window Slicing: slicing the historical data into overlapping segments should be beneficial,
provided the window is large enough to capture relevant features and changes in the environment.
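The three techniques above can be sketched in NumPy; the sine stand-in series, noise level, and window sizes are arbitrary demo choices:

```python
import numpy as np

rng = np.random.default_rng(42)
series = np.sin(np.linspace(0.0, 6.0, 200))   # stand-in hourly energy series

def jitter(x, sigma=0.03):
    """Add small Gaussian noise (simulates sensor misreadings)."""
    return x + rng.normal(scale=sigma, size=x.shape)

def magnitude_scale(x, factor):
    """Rescale the series (simulates a different installed capacity)."""
    return x * factor

def window_slices(x, window, stride):
    """Cut the history into overlapping fixed-length segments."""
    return np.stack([x[i:i + window]
                     for i in range(0, len(x) - window + 1, stride)])

aug = jitter(series)
scaled = magnitude_scale(series, 1.5)
slices = window_slices(series, window=50, stride=25)
```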

5.1.2 Generative Time Series Synthetization Techniques


VAEs, GANs, and Transformers could learn the complex mapping from context (weather forecasts, client
profile, time of day/year, data_block_id) to the target energy profile (target, is_consumption).
This would allow generating realistic energy series for specific, perhaps underrepresented, conditions
(e.g., extreme weather events, new types of prosumers, future data_block_id characteristics).
Specific generative models under consideration for this purpose include TimeGAN [14], VAE-GAN
[7], and TTS-GAN [1].

5.1.3 Evaluating Synthesized Data Quality


The quality of the generated time series data is evaluated on two criteria [4]: statistical fidelity
and downstream task performance.
• Statistical Fidelity: We compare the distributions and temporal characteristics of synthetic
versus real data. This involves analyzing basic statistics (mean, standard deviation, min/max),
correlation patterns (Pearson, Autocorrelation Function, Partial Autocorrelation Function), and
overall distribution similarity using the Kolmogorov-Smirnov test.
• Downstream Performance: The practical utility is assessed using a Train-on-Synthetic, Test-
on-Real (TSTR) framework, where models trained solely on synthetic data are evaluated on their
ability to generalize to the real test dataset.
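Statistical fidelity checks such as the Kolmogorov-Smirnov test can be implemented directly. This NumPy sketch computes the two-sample KS statistic; the Gaussian samples merely stand in for real versus synthetic series values:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 1000)          # stand-in for real series values
good_synth = rng.normal(0.0, 1.0, 1000)    # well-matched synthetic sample
bad_synth = rng.normal(3.0, 1.0, 1000)     # clearly shifted distribution
```

Low statistic values indicate distributional similarity; a TSTR evaluation would additionally train a forecaster on the synthetic sample and measure its MAE on held-out real data.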

5.2 Models
5.2.1 Boosting-based methods
LightGBM [6] is known to be a faster and less resource-intensive variant of XGBoost. Running faster
experiments with no loss in accuracy would be very useful.
CatBoost [13] uses oblivious decision trees (the same split criterion is applied across an entire
level of the tree). Its sophisticated built-in handling of categorical features should leverage the
importance of features such as county or is_business, capturing complex interactions.

5.2.2 Traditional Statistical Models


Models such as ARIMA, SARIMA, and Prophet should make for interesting experiments, and could also
broaden the feature engineering that supplies more complex inputs to the ML methods [11].

5.2.3 Transformer Models


Transformers are data-hungry and expensive to run, making them a far-fetched fit for our task. They
are high-risk, high-reward models that can easily develop strong biases and need substantial patching.
Since this is still an ongoing area of research, we do not expect better results than the boosting
methods. We will experiment with PatchTST [12].

References
[1] Mikolaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande,
Luis C. Cobo, and Karen Simonyan. High fidelity speech synthesis with adversarial networks. arXiv
preprint arXiv:1909.11646, 2019.
[2] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the
22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages
785–794. ACM, 2016.
[3] Enefit. Enefit - Predict Energy Behavior of Prosumers. Kaggle Competition, 2023. Accessed:
2025-04-17.

[4] F. Haddad. How to evaluate the quality of the synthetic data – measuring from the perspective of
fidelity, utility, and privacy. AWS Machine Learning Blog, 2022. Accessed: 2025-04-17.
[5] hyd. 1st place solution. Kaggle Discussion Forum, Enefit - Predict Energy Behavior of Prosumers
Competition, 2024. Accessed: 2025-04-17.
[6] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan
Liu. LightGBM: A highly efficient gradient boosting decision tree. In Isabelle Guyon, Ulrike von
Luxburg, Sébastien Bengio, Hanna M. Wallach, Rob Fergus, Sanjay Vishwanathan, and Roman
Garnett, editors, Advances in Neural Information Processing Systems 30, pages 3146–3154. Curran
Associates, Inc., 2017.
[7] Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. Autoen-
coding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.
[8] Mad Devs. Basic Data Augmentation Method Applied to Time Series. Mad Devs Blog, Jan 2024.
Published: 2024-01-15. Accessed: 2025-04-17.
[9] Matt Motoki. 6th place solution. Kaggle Discussion Forum, Enefit - Predict Energy Behavior of
Prosumers Competition, 2024. Accessed: 2025-04-17.
[10] Jakob Khalil Musone and Thomas Deckers. Enefit - Predict Energy Behavior of Prosumers.
https://github.com/Musone/Predict-Energy-Behavior-of-Prosumers, 2025. GitHub reposi-
tory. Accessed: 2025-04-17.
[11] Neptune.ai. ARIMA vs Prophet vs LSTM for Time Series Prediction. Neptune.ai Blog, Jan 2025.
Published: 2025-01-24. Accessed: 2025-04-17.
[12] Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth
64 words: Long-term forecasting with transformers. In International Conference on Learning Rep-
resentations, 2023. arXiv preprint arXiv:2211.14730.

[13] Liudmila Ostroumova Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush,
and Andrey Gulin. CatBoost: unbiased boosting with categorical features. In Advances in Neural
Information Processing Systems, pages 6639–6649, 2018.
[14] Jinsung Yoon, Daniel Jarrett, and Mihaela van der Schaar. Time-series generative adversarial
networks. In Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.

[15] Petru-Costin Zincenco. Predict-Energy-Behavior-Research. https://github.com/IGrilex/Predict-Energy-Behavior-Research, 2025. GitHub repository. Accessed: 2025-04-17.

A Dataset Description
This appendix details the structure and parameters of the dataset files provided for the competition. All
datasets use EET/EEST time (UTC+2/UTC+3), with datetime columns typically indicating the start
of a 1-hour interval, except for some instantaneous weather measurements noted below.

A.1 train.csv
Summary: Core training data with hourly energy targets per segment.

Table 2: Dataset Parameters: train.csv

Parameter           Summary          Description
county              County ID        Estonian county identifier code.
is_business         Business Flag    True if prosumer is a business, False otherwise.
product_type        Contract Code    Contract type: 0: Combined, 1: Fixed, 2: General service, 3: Spot.
target              Energy Target    Hourly energy (kWh, consumption/production) per segment.
is_consumption      Target Type      True if target is consumption, False if production.
datetime            Timestamp        Start time of the 1-hour period (EET/EEST).
data_block_id       Data Block Link  Groups data available at the same forecast time. Lag exists for historical data.
row_id              Row Identifier   Unique row ID.
prediction_unit_id  Segment ID       Unique ID for the county/business/product combination.

A.2 gas_prices.csv


Summary: Daily min/max day-ahead natural gas prices.

Table 3: Dataset Parameters: gas_prices.csv

Parameter              Summary          Description
origin_date            Available Date   Date when day-ahead prices became available.
forecast_date          Relevant Date    Date for which prices are relevant.
lowest_price_per_mwh   Min Gas Price    Lowest daily day-ahead gas price (€/MWh).
highest_price_per_mwh  Max Gas Price    Highest daily day-ahead gas price (€/MWh).
data_block_id          Data Block Link  Groups data available at the same forecast time.

A.3 client.csv
Summary: Client segment info, including installed solar capacity.

Table 4: Dataset Parameters: client.csv

Parameter           Summary          Description
product_type        Contract Code    Client segment's contract type code.
county              County ID        County code for the client segment.
eic_count           EIC Count        Number of consumption points (EICs) in the segment.
installed_capacity  Solar Capacity   Total installed solar panel capacity (kW) for the segment.
is_business         Business Flag    True if the segment represents businesses.
date                Record Date      Date this client segment data applies to.
data_block_id       Data Block Link  Groups data available at the same forecast time.

A.4 electricity_prices.csv


Summary: Hourly day-ahead electricity market prices.

Table 5: Dataset Parameters: electricity_prices.csv

Parameter      Summary            Description
origin_date    Available Date     Date when day-ahead prices became available.
forecast_date  Valid Hour         Start time of the 1-hour period (EET/EEST) the price is valid for.
euros_per_mwh  Electricity Price  Day-ahead electricity price (€/MWh).
data_block_id  Data Block Link    Groups data available at the same forecast time.

A.5 forecast_weather.csv


Summary: Weather forecast data from ECMWF, available at prediction time.

Table 6: Dataset Parameters: forecast_weather.csv

Parameter                          Summary           Description
latitude / longitude               Coordinates       Location of the forecast grid point.
origin_datetime                    Forecast Time     Timestamp (EET/EEST) when the forecast was generated.
hours_ahead                        Lead Time         Hours between generation and validity time (forecast covers 48h).
temperature                        Temperature       Forecast air temperature at 2m (°C), estimated at end of hour.
dewpoint                           Dew Point         Forecast dew point temperature at 2m (°C), estimated at end of hour.
cloudcover_low/_mid/_high/_total   Cloud Cover %     Forecast cloud cover (%) at end of hour (Low: 0-2km, Mid: 2-6km, High: 6+km, Total).
10_metre_u/v_wind_component        Wind Components   Forecast eastward (u) and northward (v) wind speed at 10m (m/s), estimated at end of hour.
data_block_id                      Data Block Link   Groups data available at the same forecast time.
forecast_datetime                  Valid Time        Start time (EET/EEST) the forecast applies to (origin + hours_ahead).
direct_solar_radiation             Direct Radiation  Forecast direct solar energy on a perpendicular plane, accumulated hourly (Wh/m²).
surface_solar_radiation_downwards  Global Radiation  Forecast total (direct + diffuse) solar energy on a horizontal surface, accumulated hourly (Wh/m²).
snowfall                           Snowfall          Forecast hourly snowfall accumulation (meters water equivalent).
total_precipitation                Precipitation     Forecast hourly liquid precipitation accumulation (rain + melted snow, meters).

A.6 historical_weather.csv
Summary: Observed historical weather data. Units/definitions may differ from forecasts.

Table 7: Dataset Parameters: historical_weather.csv

Parameter                         Summary            Description
datetime                          Measurement Hour   Start time of the 1-hour period (EET/EEST).
temperature                       Temperature        Observed air temperature at 2m (°C) at end of hour.
dewpoint                          Dew Point          Observed dew point temperature at 2m (°C) at end of hour.
rain                              Rain               Observed hourly rainfall (mm). *Unit differs from forecast.*
snowfall                          Snowfall           Observed hourly snowfall (cm). *Unit differs from forecast.*
surface_pressure                  Pressure           Observed surface air pressure (hPa), assumed at end of hour.
cloudcover_low/_mid/_high/_total  Cloud Cover %      Observed cloud cover (%) at end of hour (Low: 0-3km, Mid: 3-8km, High: 8+km, Total). *Bands differ from forecast.*
windspeed_10m                     Wind Speed         Observed wind speed magnitude at 10m (m/s) at end of hour. *Not components.*
winddirection_10m                 Wind Direction     Observed wind direction at 10m (degrees) at end of hour.
shortwave_radiation               Global Radiation   Observed global horizontal radiation (direct + diffuse), accumulated hourly (Wh/m²).
direct_solar_radiation            Direct Radiation   Observed direct solar energy on a perpendicular plane, accumulated hourly (Wh/m²).
diffuse_radiation                 Diffuse Radiation  Observed diffuse solar energy on a horizontal plane, accumulated hourly (Wh/m²).
latitude / longitude              Coordinates        Location of the weather station.
data_block_id                     Data Block Link    Groups data available at the same forecast time. Historical block ID usually follows forecast block ID.

B Weather Visualization

Figure 6: Weather forecast stations across Estonia as provided in the dataset
