
Decision Tree

Dr. Shubhra Goyal


shubhrag@regenesys.net
REGENESYS’ INTEGRATED LEADERSHIP AND MANAGEMENT MODEL
• Holistic focus on the individual (SQ, EQ, IQ, and PQ)
• Interrelationships between individual, team, institution and the external environment are dynamic (systemic)
• Strategy affects individual, team, organisational, and environmental performance
• Delivery requires alignment of strategy, structure, systems and culture
REGENESYS GRADUATE ATTRIBUTES
Ground Rules
• Be open-minded
• Listen carefully
• Avoid doing other unrelated tasks when attending the session so that you can be FOCUSED
• Raise your hand if you have any query so that we can ensure one conversation at a time
• When speaking, use “I think”, “I feel”, etc. (you are a very important aspect of this learning)
• Respect the opinions of others
• Give constructive feedback
• Build on the ideas of others rather than destroying them
• Have fun and ENJOY the learning experience!
Know Your Mentor
Dr. Shubhra Goyal
Qualifications
• Ph.D. in Machine Learning
• Master’s in Computer Science
13 years of experience
Teaching
• Corporate Trainer
• Assistant Professor
Areas of specialization
• AI & Data Science
• Database Management
• Software Engineering
Publications: research papers
www.linkedin.com/in/dr-shubhra-goyal-03a4a922/
shubhrag@regenesys.net
Understanding Random Forests
Dr. Shubhra Goyal
Agenda
• Understanding what Random Forest is
• Exploring the core concepts behind it: decision trees, bagging, and ensemble learning
• Learning how Random Forest works step by step
• Studying the core methods and hyperparameters
• Comparing it with single decision trees
• Looking at real-world use cases
• Discussing the pros and cons
• Wrapping up with a quiz to reinforce the concepts learned
If one person makes a decision, it might be wrong. What if 100
people vote on it? Would that be more reliable?
Why or why not?
What is Random Forest?
Random Forest is a popular ensemble learning method that builds multiple decision trees and merges their outputs to improve accuracy and reduce overfitting.
It can be used for both classification and regression problems. Instead of relying on a single decision tree, Random Forest combines the decisions of many trees to make a final prediction.
This approach increases the model's performance and robustness.
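As a minimal, illustrative sketch (assuming scikit-learn and a toy dataset, neither of which comes from the slides), a Random Forest classifier can be trained and evaluated like this:

```python
# Minimal sketch: Random Forest classification with scikit-learn
# (library and dataset choices are illustrative assumptions).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 trees each predict a class; the forest reports the majority vote.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```

For regression problems, RandomForestRegressor is used the same way, with predictions averaged across trees instead of decided by vote.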
Why is it Called "Random
Forest"?
The name "Random Forest" reflects two key characteristics.
First, it is a "forest" because it consists of multiple decision
trees working together.
Second, it is "random" because each tree is built using a
random subset of the training data and a random subset of
features.
This randomness helps ensure that the trees are different
from each other, which improves the overall performance of
the model.
Key Concepts Behind Random Forest
Ensemble Learning:
Ensemble learning is a machine learning technique that
combines predictions from multiple models to solve a
problem more effectively than using a single model. It
works on the principle that a group of diverse models can
provide more accurate and reliable results than any
individual model. The three main types of ensemble
learning methods are bagging, boosting, and stacking.
Bagging:
Bagging stands for Bootstrap Aggregating. It involves
creating multiple subsets of the original training data using
random sampling with replacement. A separate model is
trained on each of these subsets. In Random Forest, these
models are decision trees. The final prediction is made by
aggregating the outputs from all trees: using majority
voting for classification or averaging for regression.
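As a rough sketch of bagging in practice (scikit-learn's BaggingClassifier and the synthetic dataset below are illustrative assumptions, not part of the slides):

```python
# Bagging sketch: many decision trees, each fit on a bootstrap sample
# (random sampling with replacement), combined by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(),  # base model trained on each bootstrap subset
    n_estimators=50,           # number of bootstrap subsets / trees
    bootstrap=True,            # sample with replacement
    random_state=0,
)
print("Bagged trees CV accuracy:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```

Random Forest is essentially this bagging scheme with decision trees as the base models, plus extra randomness in the features each tree considers.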
Decision Trees (Quick Recap)
A decision tree is a tree-like model used to make decisions based on input features.
It splits data into branches using feature-based questions, leading to an outcome at the leaf nodes.
While decision trees are easy to understand and interpret, they tend to overfit the data if not carefully controlled.
This means they may perform well on training data but poorly on unseen data.
If one tree makes an incorrect decision, but
many trees disagree, who would you trust
more: one or many? Why?
Random Forest to the Rescue
Random Forest addresses the limitations of single decision
trees by using many trees to make a collective decision.
Since each tree sees a slightly different view of the data,
their combined decision is more accurate and stable.
By averaging many predictions or using majority voting, the
model becomes less likely to overfit or make large errors.
How Random Forest Works
The Random Forest algorithm follows three main steps:
1. Select random samples from the dataset (with replacement).
2. Build a decision tree for each sample using a random subset of features.
3. Aggregate the outputs from all trees to form the final prediction.
This approach allows the model to reduce variance and improve accuracy.
Algorithm Steps
1. Draw multiple bootstrap samples from the original data.
2. For each bootstrap sample:
• Select a random subset of features.
• Train a decision tree without pruning.
3. For prediction:
• Classification: use majority voting among all trees.
• Regression: take the average of all tree outputs.
This process ensures the model is both accurate and robust.
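A minimal from-scratch sketch of these steps, using NumPy and scikit-learn decision trees (both assumptions; note that real Random Forest implementations pick a fresh random feature subset at every split rather than once per tree, and class labels here are assumed to be non-negative integers):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, n_features=None, seed=0):
    """Steps 1-2: bootstrap samples + per-tree feature subsets + unpruned trees."""
    rng = np.random.default_rng(seed)
    n_samples, n_total = X.shape
    n_features = n_features or max(1, int(np.sqrt(n_total)))
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n_samples, size=n_samples)           # bootstrap (with replacement)
        cols = rng.choice(n_total, size=n_features, replace=False)  # random feature subset
        tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])  # no pruning
        forest.append((tree, cols))
    return forest

def predict_forest(forest, X):
    """Step 3: aggregate by majority vote (classification)."""
    votes = np.array([tree.predict(X[:, cols]) for tree, cols in forest]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), axis=0, arr=votes)
```

For regression, the aggregation step would average the tree outputs instead of voting.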
Illustrations
Randomness in Random Forest
Random Forest introduces two types of randomness to reduce overfitting and increase generalization:
• Data sampling: each tree is trained on a randomly selected subset of the training data.
• Feature sampling: at each split in a tree, only a random subset of features is considered.
This ensures the trees are diverse, which helps in reducing the chance of correlated errors.
Hyperparameters in Random Forest
Some important hyperparameters in Random Forest include:
• n_estimators: number of trees in the forest.
• max_features: number of features considered at each split.
• max_depth: maximum depth each tree is allowed to grow.
• min_samples_split: minimum number of samples required to split an internal node.
• min_samples_leaf: minimum number of samples required to be at a leaf node.
Tuning these parameters helps improve model performance and prevent overfitting.
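For illustration, these hyperparameters map directly onto scikit-learn's RandomForestClassifier constructor (the specific values below are arbitrary starting points, not recommendations from the slides):

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,      # number of trees in the forest
    max_features="sqrt",   # features considered at each split
    max_depth=10,          # maximum depth each tree may grow to
    min_samples_split=5,   # min samples required to split an internal node
    min_samples_leaf=2,    # min samples required at a leaf node
    random_state=42,
)
```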
Comparison with Decision Trees

Feature            Decision Tree   Random Forest
Overfitting        High            Low
Stability          Low             High
Accuracy           Moderate        High
Interpretability   Easy            More complex

Random Forests improve on decision trees by reducing variance and increasing stability, though at the cost of being less interpretable.
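A quick, illustrative way to see this gap in practice (the dataset and library choice are assumptions; exact numbers will vary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A single unconstrained tree vs. a 100-tree forest, both cross-validated.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("Decision tree CV accuracy :", cross_val_score(tree, X, y, cv=5).mean())
print("Random forest CV accuracy :", cross_val_score(forest, X, y, cv=5).mean())
```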
When to Use Random Forest
Random Forest is ideal in situations where accuracy is more important than interpretability.
It performs well with:
• Large datasets with many features
• Noisy or missing data
• Problems where overfitting is a concern
It is particularly useful when individual decision trees do not perform well.
Real-World Applications
Random Forest is widely used in various fields:
• Finance: credit scoring, risk assessment
• Healthcare: disease prediction, diagnosis support
• E-commerce: customer churn prediction, recommendation systems
• Security: fraud detection, intrusion detection
• Business: sales forecasting, customer segmentation
Strengths of Random Forest
•Handles both classification and regression tasks
•Naturally reduces overfitting
•Can handle missing and unbalanced data
•Provides feature importance scores
•Robust to outliers and noise in the data
Limitations of Random Forest
•Can be computationally intensive with large datasets
•Less interpretable than single decision trees
•Requires careful tuning of hyperparameters
•Model size can become very large
Feature Importance
Random Forest can calculate the importance of each feature by measuring how much each feature improves the splits at decision nodes.
This helps in:
• Identifying key factors influencing the output
• Reducing dimensionality
• Improving model interpretability and performance
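As a sketch, scikit-learn exposes these scores through the feature_importances_ attribute after fitting (the dataset below is an illustrative assumption):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Higher scores mean a feature contributed more to improving splits.
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```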
Example Scenario
Scenario: predicting customer churn in a telecom company.
• Features: Age, Contract Type, Monthly Charges, Tenure
• Build multiple decision trees with bootstrapped samples
• Each tree votes on whether the customer will churn or not
• The final decision is based on the majority vote
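A hypothetical, toy-sized version of this scenario (the data values, encoding, and column names below are invented purely for illustration):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Invented toy data; ContractType: 0 = month-to-month, 1 = annual.
df = pd.DataFrame({
    "Age":            [25, 42, 31, 58, 36, 47],
    "ContractType":   [0, 1, 0, 1, 0, 1],
    "MonthlyCharges": [70.5, 55.0, 89.9, 45.2, 99.0, 60.3],
    "Tenure":         [3, 24, 6, 48, 2, 36],
    "Churn":          [1, 0, 1, 0, 1, 0],
})

X, y = df.drop(columns="Churn"), df["Churn"]
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each tree votes "churn" or "stay"; the forest reports the majority vote.
new_customer = pd.DataFrame([{"Age": 29, "ContractType": 0,
                              "MonthlyCharges": 85.0, "Tenure": 4}])
print("Predicted churn:", model.predict(new_customer)[0])
```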
Why might combining the results of many simple models be more powerful than one very complex model?
Think about: ensemble learning and the importance of model diversity.
Best Practices
• Use cross-validation to tune hyperparameters
• Normalize or scale data if necessary
• Handle missing data before training
• Start with fewer trees and gradually increase
• Monitor performance to avoid overfitting
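As a sketch of the first of these practices, hyperparameters can be tuned with cross-validated grid search (the grid values and synthetic dataset are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print("Best parameters :", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```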
Random Forest vs Other Ensemble Methods

Method              Key Feature
Random Forest       Uses bagging with decision trees
AdaBoost            Focuses on misclassified samples
Gradient Boosting   Learns from previous errors in sequence

Random Forest is best when you want simplicity and robustness with less tuning.
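For illustration only, the three methods in the table can be compared side by side with default settings (synthetic data; the numbers are not benchmarks):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=800, n_features=15, random_state=1)

for name, clf in [("Random Forest", RandomForestClassifier(random_state=1)),
                  ("AdaBoost", AdaBoostClassifier(random_state=1)),
                  ("Gradient Boosting", GradientBoostingClassifier(random_state=1))]:
    print(name, round(cross_val_score(clf, X, y, cv=5).mean(), 3))
```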
Review of Key Concepts
Let’s recap what we have learned:
•Random Forest builds multiple trees using random data and
features
•It aggregates outputs for better accuracy
•It overcomes limitations of individual decision trees
•Key concepts include ensemble learning, bagging, and
feature importance
Let’s discuss based on our
understanding
1. What is the main idea behind Random Forest?
2. What does bagging mean in Random Forest?
3. How does Random Forest prevent overfitting?
4. Which model is more accurate: single decision tree or
Random Forest?
5. Name one key hyperparameter in Random Forest.
Answers
1. What is the main idea behind Random Forest?
• Combining multiple decision trees to improve accuracy
2. What does bagging mean in Random Forest?
• Creating random data subsets with replacement
3. How does Random Forest prevent overfitting?
• By using multiple trees trained on random samples and features
4. Which model is more accurate: a single decision tree or Random Forest?
• Random Forest
5. Name one key hyperparameter in Random Forest.
• n_estimators, max_depth, etc.
Thank You
