Random Forest
Ensemble techniques fall into two groups:
• 1. Bagging – Each model is trained on a different subset drawn from the training data with replacement, and the final output is decided by majority voting.
For example, Random Forest.
• 2. Boosting – Weak learners are combined into a strong learner by building models sequentially, with each model correcting the errors of the previous ones, so that the final model has the highest accuracy.
For example, AdaBoost, XGBoost.
Random forest works on the Bagging principle.
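As a quick contrast of the two techniques, here is a minimal sketch using scikit-learn's BaggingClassifier and AdaBoostClassifier; the synthetic dataset and parameter values are illustrative assumptions, not from the original material.

```python
# Contrast bagging (independent models, majority vote) with boosting
# (sequential weak learners). Dataset and settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging: 50 decision trees, each fit on its own bootstrap sample,
# combined by majority voting (random forest works the same way).
bagging = BaggingClassifier(n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)

# Boosting: 50 weak learners (decision stumps by default) trained one
# after another, each focusing on the rows the previous ones got wrong.
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
boosting.fit(X_train, y_train)

print("Bagging accuracy: ", bagging.score(X_test, y_test))
print("Boosting accuracy:", boosting.score(X_test, y_test))
```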
Bagging
• Bagging, also known as Bootstrap Aggregation, is the ensemble technique used by random forest.
• Bagging draws random samples from the data set.
• Each model is built from a sample (a bootstrap sample) drawn from the original data with replacement; this is known as row sampling.
• This step of row sampling with replacement is called the bootstrap step.
• Each model is then trained independently and generates its own result.
• The final output is obtained by majority voting after combining the results of all models.
• This step of combining all the results and generating the output by majority voting is known as aggregation (a from-scratch sketch of these two steps follows below).
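Here is a minimal from-scratch sketch of the two steps above, written against scikit-learn decision trees on a synthetic dataset; the loop structure and variable names are our own illustration, not a library API.

```python
# Bootstrap: draw rows with replacement. Aggregation: majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)

models = []
for _ in range(25):                       # odd count avoids vote ties
    # Bootstrap step: sample n row indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregation step: every tree votes; the majority class wins.
votes = np.array([m.predict(X) for m in models])   # shape (25, n_rows)
final = (votes.mean(axis=0) >= 0.5).astype(int)    # works for 0/1 labels
print("Ensemble accuracy on the training rows:", (final == y).mean())
```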
Example:
• Bootstrap samples (Bootstrap sample 01, Bootstrap sample 02, and Bootstrap sample 03) are drawn from the actual data with replacement, which means each sample is likely to contain duplicate rows rather than all unique ones.
• The models (Model 01, Model 02, and Model 03) built from these bootstrap samples are trained independently. Each model generates a result as shown.
• The Happy emoji has the majority over the Sad emoji, so by majority voting the final output is the Happy emoji.
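To see the duplicate-rows point concretely, here is a tiny sampling demo; the ten-row data set is a made-up stand-in for training rows.

```python
# Sampling with replacement repeats some rows and skips others entirely
# (on average roughly a third of the rows are left out).
import numpy as np

rng = np.random.default_rng(1)
data = np.arange(10)                      # stand-in for 10 training rows
sample = rng.choice(data, size=10, replace=True)

print("Bootstrap sample:", sample)
print("Unique rows drawn:", len(np.unique(sample)), "of", len(data))
```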
• Example 2: Consider a fruit basket as the data, as shown in the figure below.
• n samples are taken from the fruit basket, and an individual decision tree is constructed for each sample.
• Each decision tree generates an output, as shown in the figure.
• The final output is decided by majority voting: in the figure below, the majority of the decision trees output an apple rather than a banana, so the final output is an apple.
This algorithm is widely used in e-commerce, banking, medicine, the stock market, etc.
For example, in the banking industry it can be used to identify which customers are likely to default on a loan.
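Below is a hedged sketch of that loan-default use case with scikit-learn's RandomForestClassifier; the data here is synthetic, so the features merely stand in for real attributes such as income or credit history.

```python
# Train a random forest on a synthetic binary task standing in for
# "will this customer default on the loan?" (labels 0/1 are made up).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

clf = RandomForestClassifier(n_estimators=100, random_state=7)
clf.fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))
```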
Advantages and Disadvantages of the Random Forest Algorithm
• Advantages
• 1. It can be used for both classification and regression problems.
• 2. It reduces overfitting, since the output is based on majority voting (classification) or averaging (regression).
• 3. It performs well even if the data contains null/missing values.
• 4. Each decision tree is created independently of the others, so the algorithm parallelizes naturally.
• 5. It is highly stable, as the final answer is averaged over a large number of trees.
• 6. It maintains diversity because not all attributes are considered while building each decision tree (though this is not true in all cases).
• 7. It is relatively immune to the curse of dimensionality: since each tree does not consider all the attributes, the effective feature space is reduced.
• 8. We don't strictly have to segregate data into train and test sets, as roughly a third of the rows (about 37% on average) are never drawn into a given tree's bootstrap sample; these out-of-bag rows serve as built-in validation data (see the sketch below).
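Point 8 is what scikit-learn exposes as the out-of-bag (OOB) score; a short sketch on synthetic data follows.

```python
# With oob_score=True each tree is evaluated on the rows its bootstrap
# sample never drew, giving a validation estimate with no test split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=3)

clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=3)
clf.fit(X, y)
print("Out-of-bag accuracy estimate:", clf.oob_score_)
```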
• Disadvantages