Random Forest
Ground Rules
• Be open-minded
• Listen carefully
• Avoid doing other unrelated tasks when attending the session so that you can be FOCUSED.
• Raise your hand if you have any query so that we can ensure one conversation at a time
• When speaking, use “I think”, “I feel”, etc. (you are a very important aspect of this learning)
• Respect the opinions of others
• Give constructive feedback
• Build on the ideas of others rather than destroying them
Have fun and ENJOY the learning experience!
Know Your Mentor
Qualification
• Ph.D. in Machine Learning
• Master's in Computer Science
13 years of experience
Teaching
• Corporate Trainer
• Assistant Professor
Areas of specialization
• AI & Data Science
• Database Management
• Software Engineering
Publications – Research papers
Dr. Shubhra Goyal
www.linkedin.com/in/dr-shubhra-goyal-03a4a922/
shubhrag@regenesys.net
Understanding Random Forests
DR. SHUBHRA GOYAL
Agenda
• Ensemble Learning
• Bagging
• Decision Trees (Quick Recap)
• Randomness in Random Forest
• Hyperparameters in Random Forest
• Comparison with Decision Trees
• When to Use Random Forest
• Real-World Applications
• Review of Key Concepts
Ensemble Learning:
Ensemble learning is a machine learning technique that
combines predictions from multiple models to solve a
problem more effectively than using a single model. It
works on the principle that a group of diverse models can
provide more accurate and reliable results than any
individual model. The three main types of ensemble
learning methods are bagging, boosting, and stacking.
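A minimal sketch of the three styles, assuming scikit-learn and a synthetic dataset (the specific models and settings are illustrative, not prescribed by these slides):

```python
# Illustrative sketch: bagging, boosting, and stacking in scikit-learn (assumed available).
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              StackingClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

ensembles = {
    # Bagging: many trees trained on bootstrap samples, predictions aggregated
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    # Boosting: models trained sequentially, each focusing on the previous errors
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: a meta-model combines the predictions of several base models
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()),
                    ("forest", RandomForestClassifier(random_state=0))],
        final_estimator=LogisticRegression()),
}

for name, model in ensembles.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```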
Bagging:
Bagging stands for Bootstrap Aggregating. It involves
creating multiple subsets of the original training data using
random sampling with replacement. A separate model is
trained on each of these subsets. In Random Forest, these
models are decision trees. The final prediction is made by
aggregating the outputs from all trees: using majority
voting for classification or averaging for regression.
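A hand-rolled sketch of the bagging idea, assuming NumPy and scikit-learn (the dataset, tree count, and labels are illustrative); a Random Forest does this internally and adds feature sampling at each split:

```python
# Hand-rolled bagging: bootstrap sampling with replacement + majority voting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    # Bootstrap: sample rows with replacement to build each tree's training set
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Aggregate: majority vote across all trees for each sample
votes = np.stack([t.predict(X) for t in trees])      # shape (n_trees, n_samples)
majority = (votes.mean(axis=0) >= 0.5).astype(int)   # valid for 0/1 labels
print("training accuracy of the ensemble:", (majority == y).mean())
```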
Decision Trees (Quick Recap)
A decision tree makes predictions by repeatedly splitting the data on feature values; a Random Forest builds many such trees:
1. Create multiple bootstrap samples of the training data (random sampling with replacement).
2. Train a separate decision tree on each sample.
3. For prediction:
• Classification: use majority voting among all trees.
• Regression: take the average of all tree outputs.
This process ensures the model is both accurate and robust.
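A minimal sketch of the two aggregation modes, assuming scikit-learn and synthetic data (model settings here are illustrative):

```python
# Voting for classification vs. averaging for regression (scikit-learn assumed).
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

Xc, yc = make_classification(n_samples=200, random_state=0)
Xr, yr = make_regression(n_samples=200, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xc, yc)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xr, yr)

# Classification: the forest combines the trees' votes (scikit-learn averages
# the trees' class probabilities, which acts like a soft majority vote).
print(clf.predict(Xc[:5]))

# Regression: the forest's prediction is the average of the individual trees.
per_tree = np.stack([t.predict(Xr[:5]) for t in reg.estimators_])
print(np.allclose(per_tree.mean(axis=0), reg.predict(Xr[:5])))  # True
```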
Illustrations
Randomness in Random Forest
Random Forest introduces two types of randomness to reduce overfitting and increase generalization, as sketched below:
• Data sampling: Each tree is trained on a randomly selected subset of the training data.
• Feature sampling: At each split in a tree, only a random subset of features is considered.
This ensures the trees are diverse, which helps in reducing the chance of correlated errors.
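Both kinds of randomness map onto scikit-learn parameters; a minimal sketch, assuming that API (the parameter values are illustrative only):

```python
# The two sources of randomness expressed as constructor parameters.
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=200,
    bootstrap=True,       # data sampling: each tree sees a bootstrap sample of the rows
    max_samples=0.8,      # optional: fraction of rows drawn for each bootstrap sample
    max_features="sqrt",  # feature sampling: subset of features considered at each split
    random_state=42,
)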
Hyperparameters in Random Forest
Some important hyperparameters in Random Forest include:
• n_estimators: Number of trees in the forest.
• max_features: Number of features considered at each split.
• max_depth: Maximum depth each tree is allowed to grow.
• min_samples_split: Minimum number of samples required to split an internal node.
• min_samples_leaf: Minimum number of samples required to be at a leaf node.
Tuning these parameters helps improve model performance and prevent overfitting.
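A hedged sketch of tuning these hyperparameters with a grid search, assuming scikit-learn (the grid values and dataset are illustrative only):

```python
# Grid search over the hyperparameters listed above (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_features": ["sqrt", 0.5],
    "max_depth": [None, 10],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```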
Comparison with Decision Trees
Random Forests improve on decision trees by reducing variance and increasing stability, though at the cost
of being less interpretable.
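A minimal sketch of such a comparison on synthetic data, assuming scikit-learn; the forest typically shows a higher mean score and a lower spread across folds:

```python
# Single decision tree vs. Random Forest under 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

for name, model in [("single tree", DecisionTreeClassifier(random_state=0)),
                    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```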
When to Use Random Forest
Random Forest is ideal in situations where accuracy is more important
than interpretability.
It performs well with:
• Large datasets with many features
• Noisy or missing data
• Problems where overfitting is a concern
It is particularly useful when individual decision trees do not perform well.
Real-World Applications
Random Forest is a good default choice when you want simplicity and robustness with relatively little tuning.
Review of Key Concepts
Let’s recap what we have learned:
• Random Forest builds multiple trees using random data and features
• It aggregates outputs for better accuracy
• It overcomes limitations of individual decision trees
• Key concepts include ensemble learning, bagging, and feature importance (see the sketch below)
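A small sketch of inspecting feature importance after fitting, assuming scikit-learn (the dataset and feature names are made up for illustration):

```python
# Feature importance scores from a fitted Random Forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, n_informative=3, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Higher scores mean the feature contributed more to the trees' splits.
for name, score in zip(["f0", "f1", "f2", "f3", "f4"], forest.feature_importances_):
    print(name, round(score, 3))
```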
Let’s discuss based on our understanding
1. What is the main idea behind Random Forest?
2. What does bagging mean in Random Forest?
3. How does Random Forest prevent overfitting?
4. Which model is more accurate: single decision tree or
Random Forest?
5. Name one key hyperparameter in Random Forest.
Answers
1. Random Forest combines many decision trees, each trained on random subsets of the data and features, and aggregates their predictions.
2. Bagging (bootstrap aggregating) means training each tree on a random sample of the data drawn with replacement, then aggregating the outputs by voting or averaging.
3. It prevents overfitting through random data sampling and random feature selection, which keep the trees diverse, and by averaging out the errors of individual trees.
4. A Random Forest is generally more accurate and more stable than a single decision tree.
5. Examples include n_estimators, max_features, max_depth, min_samples_split, and min_samples_leaf.