Stacking
Stacking is one of the most popular ensemble modeling techniques in machine learning.
Several weak learners are trained in parallel, and their predictions are combined by a
meta learner so that the ensemble makes better predictions than any of the individual
models.
This ensemble technique works by feeding the combined predictions of multiple weak
learners to a meta learner, which learns from them so that a stronger output prediction
model can be achieved.
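Stacking is available off the shelf in scikit-learn. The short sketch below is only an
illustration, not part of the original text: the toy dataset, the two base learners, and the
logistic-regression meta learner are arbitrary choices.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset; any classification data works here.
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Level-0 (base) models, trained in parallel on the same data.
base_models = [
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(random_state=42)),
]

# Level-1 (meta) model that learns how to combine the base predictions.
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))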
Architecture of Stacking
The architecture of the stacking model is designed in such a way that it consists of two
or more base (learner) models and a meta-model that combines the predictions of the
base models. The base models are called level-0 models, and the meta-model is
known as the level-1 model. The stacking ensemble method therefore involves the original
(training) data, primary-level models, primary-level predictions, a secondary-level
model, and a final prediction. The basic architecture of stacking can be summarized as
follows:
o Original data: The original dataset, which is divided into n folds and serves as the
training and test data.
o Base models: These models are also referred to as level-0 models. They are fitted
on the training data, and their outputs are the level-0 predictions.
o Level-0 Predictions: Each base model is trained on part of the training data and
makes its own predictions; these are known as the level-0 predictions.
o Meta Model: The architecture of the stacking model consists of one meta-
model, which helps to best combine the predictions of the base models. The
meta-model is also known as the level-1 model.
o Level-1 Prediction: The meta-model learns how to best combine the predictions
of the base models. It is trained on predictions the base models make on data
they were not fitted to: that held-out data is fed to the base models, their
predictions are collected, and these predictions, together with the expected
outputs, provide the input and output pairs of the training dataset used to fit the
meta-model (a code sketch of this data flow follows this list).
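As a minimal sketch of this data flow (the models and dataset below are illustrative
assumptions), scikit-learn's cross_val_predict can generate the out-of-fold level-0
predictions on which the meta-model is then fitted:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=42)

# Level-0 predictions: each base model predicts on data it was not trained on.
base_models = [KNeighborsClassifier(), RandomForestClassifier(random_state=42)]
level0 = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Level-1: the meta-model is fitted on the out-of-fold predictions.
meta_model = LogisticRegression().fit(level0, y)
# For inference, the base models are refitted on the full training data
# and their predictions are fed to meta_model.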
Steps to implement Stacking models:
o Split the training dataset into n folds, for example with RepeatedStratifiedKFold, as
this is the most common approach to preparing training datasets for meta-models.
o Now a base model is fitted on the first n-1 folds, and it makes predictions for the
nth fold.
o The prediction made in the above step is added to the x1_train list.
o Repeat steps 2 & 3 for the remaining n-1 folds, which gives an x1_train array of size n.
o Now the model is trained on all n folds and makes predictions for the test (sample)
data.
o Add these predictions to the y1_test list.
o In the same way, x2_train, y2_test, x3_train, and y3_test can be obtained by using
Models 2 and 3 for training, respectively, giving the level-0 predictions of every base
model.
o Now train the meta-model on the level-0 predictions, using these predictions as
features for the model.
o Finally, the meta learner can make the stacking model's final predictions on the test
data, as the sketch below shows.
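The steps above can also be written out by hand. The sketch below is a hedged
illustration: for readability it uses a single StratifiedKFold pass (RepeatedStratifiedKFold
works the same way, just repeated), and the two base models and the dataset are
arbitrary stand-ins for Model 1 and Model 2.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
base_models = [KNeighborsClassifier(), RandomForestClassifier(random_state=42)]

# Out-of-fold predictions on the training data (the meta-model's features)
# and full-data predictions on the test data.
meta_train = np.zeros((len(X_train), len(base_models)))
meta_test = np.zeros((len(X_test), len(base_models)))
for j, model in enumerate(base_models):
    # Fit on n-1 folds, predict the held-out nth fold, collect the result.
    for train_idx, val_idx in kfold.split(X_train, y_train):
        model.fit(X_train[train_idx], y_train[train_idx])
        meta_train[val_idx, j] = model.predict_proba(X_train[val_idx])[:, 1]
    # Refit on all training data and predict the test data.
    model.fit(X_train, y_train)
    meta_test[:, j] = model.predict_proba(X_test)[:, 1]

# Train the meta-model on the level-0 predictions and evaluate on test data.
meta_model = LogisticRegression().fit(meta_train, y_train)
print("Stacked accuracy:", meta_model.score(meta_test, y_test))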
Voting ensembles:
This is one of the simplest ensemble methods in this family, and it uses different
algorithms to prepare all members individually. Unlike the stacking method, the voting
ensemble uses simple statistics instead of learning how to best combine the predictions
from the base models.
The voting ensemble also differs from the stacking ensemble in that it does not weigh
models based on each member's performance; here, all models are considered to have
the same skill level.
Member Assessment: In the voting ensemble, all members are assumed to have the
same skill level.
Combine with Model: Instead of learning how to combine the prediction from each
member, it uses simple statistics, e.g., the mean or median, to get the final prediction.
Weighted Average Ensemble:
A closely related variant drops the equal-skill assumption.
Combine with Model: It considers the weighted average of the prediction from each
member separately, with weights typically set according to each member's performance
(see the sketch below, which covers both variants).
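Both variants are available in scikit-learn's VotingClassifier; in the sketch below
(member models and weights are illustrative assumptions), plain voting treats all
members equally, while the weights argument gives the weighted-average behavior:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

members = [
    ("lr", LogisticRegression()),
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(random_state=42)),
]

# Plain voting: all members are assumed to have the same skill level.
vote = VotingClassifier(estimators=members, voting="soft")
vote.fit(X_train, y_train)
print("Voting accuracy:", vote.score(X_test, y_test))

# Weighted average: each member contributes according to its weight.
weighted = VotingClassifier(estimators=members, voting="soft",
                            weights=[1, 1, 2])
weighted.fit(X_train, y_train)
print("Weighted accuracy:", weighted.score(X_test, y_test))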
Blending Ensemble:
Blending is an approach similar to stacking, with one specific change in configuration:
whereas stacking typically uses k-fold cross-validation to prepare the out-of-sample
predictions for the meta-model, blending uses a single holdout split. In this method, the
training dataset is first split into a training set and a validation set, and then the learner
models are trained on the training set. Predictions are then made on the validation set
and the test set; the validation predictions are used as features to build a new model,
which is later used to make the final predictions on the test set, using the test-set
prediction values as features.
Combine with Model: A linear model (e.g., linear regression or logistic regression).
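A hedged sketch of blending (the models, split sizes, and dataset are illustrative): a
single holdout split supplies the validation predictions that become the meta-model's
features, and a logistic regression serves as the combining linear model.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Unlike stacking's k folds, blending carves one validation set out of
# the training data.
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train,
                                              test_size=0.3, random_state=42)

base_models = [KNeighborsClassifier(), RandomForestClassifier(random_state=42)]
val_preds, test_preds = [], []
for model in base_models:
    model.fit(X_fit, y_fit)
    # Validation predictions become the meta-model's training features ...
    val_preds.append(model.predict_proba(X_val)[:, 1])
    # ... and test predictions are its features at inference time.
    test_preds.append(model.predict_proba(X_test)[:, 1])

meta_model = LogisticRegression().fit(np.column_stack(val_preds), y_val)
print("Blending accuracy:",
      meta_model.score(np.column_stack(test_preds), y_test))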