Random Forest

Random forest is a machine learning algorithm that uses ensemble learning. It builds multiple decision trees during training and outputs the mode of the trees' classes (classification) or the mean of their predictions (regression). Each tree is built by bagging, i.e. sampling with replacement from the training data, which reduces variance and helps prevent overfitting. Random forest handles both classification and regression, copes with missing values and with continuous or categorical variables, and is widely used in areas like banking, e-commerce, and medicine because of its accuracy and its ability to handle large datasets.


Random Forest

What is Random Forest?


• Random forest is a supervised machine learning algorithm that is
widely used in classification and regression problems.
• It builds decision trees on different samples of the data and takes their
majority vote for classification or their average for regression.
• One of the most important features of the random forest algorithm is
that it can handle data sets containing both continuous variables (as in
regression) and categorical variables (as in classification).
• It generally performs better on classification problems; a minimal usage sketch follows below.
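The slides do not include code at this point, but a minimal sketch of the idea in Python with scikit-learn (using the built-in iris dataset purely for illustration) looks like this:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small illustrative dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators is the number of decision trees whose majority vote forms the prediction
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))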
Working of Random Forest Algorithm
• Before looking at how random forest works, we must first look at the ensemble
technique.
• Ensemble simply means combining multiple models: a collection of models is used to make
predictions rather than a single model.

• Ensembles use two main types of methods:

• 1. Bagging – Different training subsets are created from the training data by sampling with
replacement, and the final output is based on majority voting.
For example, Random Forest.
• 2. Boosting – Weak learners are combined into a strong learner by training models sequentially,
each one correcting the errors of the previous ones.
For example, AdaBoost, XGBoost.
• Random forest works on the bagging principle; a short sketch contrasting the two methods follows.
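A minimal sketch contrasting a bagging ensemble (random forest) with a boosting ensemble (AdaBoost) in scikit-learn; the synthetic dataset and the hyperparameters are assumptions chosen only for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: trees are trained independently on bootstrap samples and vote
bagging_model = RandomForestClassifier(n_estimators=100, random_state=0)
# Boosting: weak learners are trained sequentially, each focusing on the previous errors
boosting_model = AdaBoostClassifier(n_estimators=100, random_state=0)

print("Random forest CV accuracy:", cross_val_score(bagging_model, X, y, cv=5).mean())
print("AdaBoost CV accuracy:", cross_val_score(boosting_model, X, y, cv=5).mean())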
Bagging
• Bagging, also known as bootstrap aggregation, is the ensemble technique used by
random forest.
• Bagging chooses random samples from the data set.
• Each model is built from a sample (a bootstrap sample) drawn from the
original data with replacement; this is known as row sampling.
• This step of row sampling with replacement is called bootstrapping.
• Each model is then trained independently and produces its own result.
• The final output is based on majority voting after combining the results of all models.
• This step of combining all the results and generating the output by
majority voting is known as aggregation. A hands-on sketch of these steps follows.
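A minimal sketch of the bagging steps done by hand with NumPy and scikit-learn decision trees; the synthetic data and the number of trees are assumptions for illustration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
n_trees = 25

trees = []
for _ in range(n_trees):
    # Bootstrap step: sample row indices with replacement (row sampling)
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Aggregation step: every tree votes and the majority class is the final output
votes = np.array([tree.predict(X) for tree in trees])   # shape: (n_trees, n_samples)
majority = (votes.mean(axis=0) >= 0.5).astype(int)      # majority vote for 0/1 labels
print("Ensemble training accuracy:", (majority == y).mean())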
Example 1:
• Here bootstrap samples are drawn from the actual data (Bootstrap
sample 01, Bootstrap sample 02, and Bootstrap sample 03) with
replacement, which means each sample is likely to contain some rows
more than once rather than only unique rows.
• The models (Model 01, Model 02, and Model 03) obtained from these
bootstrap samples are trained independently. Each model generates a
result as shown.
• The Happy emoji has the majority when compared to the Sad emoji, so
based on majority voting the final output is the Happy emoji.
• Example 2: Consider the fruit basket as the data, as shown in the figure
below.
• Now n samples are taken from the fruit basket and an individual
decision tree is constructed for each sample.
• Each decision tree generates an output as shown in the figure.
• The final output is decided by majority voting: in the figure, most of the
decision trees output apple rather than banana, so the final output is
taken as apple.
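A tiny sketch of that majority-vote step in Python; the individual tree outputs listed here are made up to mirror the fruit example:

from collections import Counter

# Hypothetical outputs of the individual decision trees
tree_outputs = ["apple", "apple", "banana", "apple"]
final_output, votes = Counter(tree_outputs).most_common(1)[0]
print(final_output, "wins with", votes, "votes")   # -> apple wins with 3 votes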
This algorithm is widely used in e-commerce, banking, medicine, the stock market, etc.

For example, in the banking industry it can be used to identify which customers are likely to default on a loan, as in the hypothetical sketch below.
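A hypothetical sketch of this use case; the file name "loans.csv" and the column names are assumptions made up for illustration, not details from the slides:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("loans.csv")                        # assumed file: one row per customer
X = df[["income", "loan_amount", "credit_history"]]  # assumed feature columns
y = df["defaulted"]                                  # assumed 0/1 target column

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Probability of default per test customer, obtained by averaging the trees' votes
default_probability = model.predict_proba(X_test)[:, 1]
print(default_probability[:5])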
Advantages and Disadvantages of Random Forest
Algorithm
• Advantages
• 1. It can be used in both classification and regression problems.
• 2. It reduces overfitting, as the output is based on majority voting or averaging.
• 3. It performs well even if the data contains null/missing values.
• 4. Each decision tree is created independently of the others, so training can be
parallelized.
• 5. It is highly stable, as the answer is averaged over a large number of trees.
• 6. It maintains diversity, since not all attributes are considered while building each
decision tree (though this is not true in all configurations).
• 7. It is relatively robust to the curse of dimensionality: since each tree does not consider all
the attributes, the effective feature space is reduced.
• 8. We do not strictly have to segregate the data into train and test sets, because roughly one
third of the rows (the out-of-bag samples) are never seen by the tree built from a given
bootstrap sample; these can be used for validation, as sketched below.
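A minimal sketch of the out-of-bag idea with scikit-learn; the synthetic dataset is an assumption used only for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# oob_score=True evaluates each tree on the rows left out of its bootstrap sample
model = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
model.fit(X, y)
print("Out-of-bag accuracy estimate:", model.oob_score_)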
• Disadvantages

• 1. A random forest is far less interpretable than a single decision tree,
where a decision can be explained by following one path down the tree.
• 2. Training time is longer than for simpler models due to its complexity,
and whenever it has to make a prediction, every decision tree has to
generate an output for the given input before the results are combined.
Coding in Python – Random Forest
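The code from the original slide did not survive extraction. As a stand-in, here is a minimal sketch of a typical random forest workflow in Python, this time for regression with RandomForestRegressor on scikit-learn's built-in diabetes dataset (chosen only for illustration, not necessarily what the slide used):

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_jobs=-1 trains the independent trees in parallel on all CPU cores
reg = RandomForestRegressor(n_estimators=300, n_jobs=-1, random_state=42)
reg.fit(X_train, y_train)

pred = reg.predict(X_test)                  # average of the individual trees' predictions
print("R^2 on test data:", r2_score(y_test, pred))
print("Feature importances:", reg.feature_importances_.round(3))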
