Information Gain - Towards Data Science

This document discusses several machine learning concepts including decision trees, entropy, feature selection, and different feature selection methods. It provides an in-depth overview of decision tree fundamentals and how they work to partition data based on features. It also explains key feature selection concepts like entropy, information gain, and the need for feature selection to improve model performance. Finally, it describes different feature selection methods including filter, wrapper and embedded methods and provides examples of how each works.



Towards Data Science

A Medium publication sharing concepts, ideas and codes.


Information Gain

Huy Bui · Mar 31, 2020

Decision Tree Fundamentals


Learning about Gini Impurity, Entropy, and how to construct a decision tree


When talking about decision trees, I always imagine a list of questions I would ask
my girlfriend when she does not know what she wants for dinner: Do you want
something with noodles? How much do you want to spend? Asian or Western?
Healthy or junk food?

Making a list of questions to narrow down the options is essentially the idea behind
decision trees. More formally, a decision tree is an algorithm that partitions
observations into groups of similar data points based on their features.
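
To make this concrete, here is a minimal sketch, not from the original article, that builds such a tree with scikit-learn on the bundled Iris data set; the entropy criterion and the depth limit are illustrative choices.

# Minimal sketch: fit a decision tree that partitions observations by their features.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# criterion="entropy" grows the tree by maximizing information gain at each split;
# criterion="gini" would use Gini impurity instead.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
tree.fit(X_train, y_train)

print(export_text(tree))                       # the learned list of "questions"
print("test accuracy:", tree.score(X_test, y_test))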

A decision tree is a supervised learning model that has a tree-like…
Read more · 7 min read

Renu Khandelwal · Oct 24, 2019

Feature Selection: Identifying the best input features


In this article, we will understand what feature selection is, how it differs from
dimensionality reduction, how feature importance helps, and different techniques,
such as the filter, wrapper, and embedded methods, for identifying the best features,
with code in Python.

What is Feature Selection?


Feature selection is also referred to as attribute selection or variable selection
and is part of feature engineering. It is the process of selecting a subset of the most
relevant attributes or features in the data set for predictive modeling.


The selected features help predictive models to identify hidden business insights.

If we need to predict the salary of people in IT, then based on our common
understanding we would need the number of years of experience, skill set, work
location, and current designation. These are a few of the key features helpful for
salary prediction. If the data set contains the height of the person, we know that
feature is irrelevant to salary prediction and hence should not be included in the
selected features.

Feature selection is the process of deciding which relevant original features
to include and which irrelevant features to exclude for predictive modeling.

Difference between Feature Selection and Dimensionality Reduction


Both feature selection and dimensionality reduction aim to reduce the number of
attributes or features in the data set.

The key difference is that in feature selection we do not change the original
features, whereas in dimensionality reduction we create new features from the
original ones. This transformation of features using dimensionality reduction is
often irreversible.

Feature selection is based on certain statistical methods like filter, wrapper and
embedded methods that we will discuss in this article.

For dimensionality reduction, we use techniques like Principal Component Analysis (PCA).
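
A minimal sketch of this distinction, assuming the breast-cancer data set bundled with scikit-learn as a stand-in for your own data: feature selection keeps a subset of the original columns unchanged, while PCA builds new components from all of them.

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Feature selection: keep 5 of the original 30 columns, values unchanged.
X_selected = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

# Dimensionality reduction: build 5 brand-new components from all 30 columns.
X_reduced = PCA(n_components=5).fit_transform(X)

print(X_selected.shape, X_reduced.shape)       # both (569, 5), but with different meanings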

Need for Feature Selection


Helps train the model faster: with a reduced number of relevant features,
training is much faster.

Increases model interpretability and simplifies the model: it reduces the
complexity of the model by including only the most relevant features, which makes
it easier to interpret. This is very helpful in explaining the predictive model.

Improves accuracy of the model: we include only features that are relevant
for our prediction, which increases the accuracy of the model. Irrelevant features
introduce noise and reduce accuracy.

Reduces overfitting: overfitting occurs when the predictive model does not
generalize well to test or unseen data. To reduce overfitting, we need to remove
noise from the data set and include only the features that most influence the
prediction. Noise comes from irrelevant features; when a predictive model learns
this noise during training, it will not generalize well to unseen data.

Different methods for Feature Selection


Filter

Wrapper

Embedded methods

Filter method for feature selection


The filter method ranks each feature based on some univariate metric and then
selects the highest-ranking features. Some of the univariate metrics are:

Variance: removes constant and quasi-constant features.

Chi-square: used for classification; it is a statistical test of independence that
determines whether two variables are dependent.

Correlation coefficients: remove duplicate (highly correlated) features.

Information gain or mutual information: assesses the dependency of an
independent variable on the target variable. In other words, it determines the
ability of the independent features to predict the target variable.
Advantages of Filter methods
Filter methods are model agnostic

Rely entirely on features in the data set

Computationally very fast

Based on different statistical methods


The disadvantage of Filter methods
The filter method looks at individual features to identify their relative
importance. A feature may not be useful on its own but may be an important
influencer when combined with other features. Filter methods may miss such
features.
Filter criteria for selecting the best feature
Select independent features with

High correlation with the target variable

Low correlation with other independent variables

High information gain or mutual information with the target variable
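
The criteria above can be applied in a few lines with scikit-learn; this is a minimal sketch, again using the bundled breast-cancer data set as a stand-in for your own feature matrix, and the thresholds are illustrative.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import VarianceThreshold, SelectKBest, mutual_info_classif

data = load_breast_cancer()
X, y = data.data, data.target

# Step 1: drop constant and quasi-constant features (variance filter).
var_filter = VarianceThreshold(threshold=0.01).fit(X)
X_var = var_filter.transform(X)

# Step 2: rank the remaining features by mutual information with the target
# and keep the top 10 -- a model-agnostic, univariate ranking.
mi_filter = SelectKBest(mutual_info_classif, k=10).fit(X_var, y)

kept = np.array(data.feature_names)[var_filter.get_support()][mi_filter.get_support()]
print("selected features:", list(kept))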

Wrapper method for feature selection


The wrapper method searches for the best subset of input features to predict the
target variable. It selects the features that provide the best accuracy of the model.
Wrapper methods use inferences based on the previous model to decide if a new
feature needs to be added or removed.

Wrapper methods include:

Exhaustive search: evaluates all possible combinations of input features to find
the subset that gives the best accuracy for the selected model. It becomes
computationally very expensive as the number of input features grows.

Forward selection: start with an empty feature set, keep adding one input
feature at a time, and evaluate the accuracy of the model. This process continues
until we reach a certain accuracy with a predefined number of features.

Backward selection: start with all the features and keep removing one feature
at a time, evaluating the accuracy of the model. The feature set that yields the
best accuracy is retained.

Always evaluate the accuracy of the model on the test data set.
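
Forward selection as described above can be sketched with scikit-learn's SequentialFeatureSelector; the estimator, scoring metric, and number of features to keep below are illustrative assumptions, not prescriptions.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=5000)

# Forward selection: start from an empty set and greedily add the feature that
# most improves cross-validated accuracy, until 8 features have been chosen.
sfs = SequentialFeatureSelector(model, n_features_to_select=8,
                                direction="forward", scoring="accuracy", cv=5)
sfs.fit(X_train, y_train)

# Always evaluate the resulting feature subset on the held-out test data.
model.fit(sfs.transform(X_train), y_train)
print("test accuracy:", model.score(sfs.transform(X_test), y_test))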
Advantages
Models dependencies among the input features

Dependent on the model selected

Selects the feature subset that gives the highest model accuracy
Disadvantages:
Computationally very expensive, as training happens on every evaluated
combination of input features

Not model agnostic

Embedded method for feature selection


Embedded methods use the qualities of both filter and wrapper feature selection
methods. Feature selection is embedded in the machine learning algorithm.

Filter methods do not incorporate learning and are only about feature selection.
Wrapper methods use a machine learning algorithm to evaluate subsets of
features without incorporating knowledge about the specific structure of the
classification or regression function and can, therefore, be combined with any
learning machine.

Embedded feature selection algorithms include

Decision Tree

Regularization: L1 (Lasso) and L2 (Ridge) regularization

By fitting the model using these machine learning techniques, we obtain feature
importances that can be used to select the most relevant features.
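
As a minimal sketch of the embedded approach, the L1 (Lasso) penalty can be combined with scikit-learn's SelectFromModel, which keeps only the features whose fitted coefficients are non-zero; the alpha value here is an assumption for illustration.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)   # Lasso is sensitive to feature scale
y = data.target

# Embedded selection: the L1 penalty drives irrelevant coefficients to exactly zero
# while the model is being fitted, so selection happens inside training itself.
selector = SelectFromModel(Lasso(alpha=0.05)).fit(X, y)

kept = np.array(data.feature_names)[selector.get_support()]
print("features kept by Lasso:", list(kept))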

Read more on L1 and L2 regularization here

In the next article, we will implement some of the feature selection methods in
Python using the filter method.

References:
http://ijcsit.com/docs/Volume%202/vol2issue3/ijcsit2011020322.pdf

https://arxiv.org/pdf/1907.07384.pdf

http://people.cs.pitt.edu/~iyad/DR.pdf

https://link.springer.com/chapter/10.1007%2F978-3-540-35488-8_6


ayşe bilge gündüz · Nov 11, 2019

Machine Learning 101: ID3 Decision Tree and Entropy Calculation (1)
This series contains the machine learning course notes I collected during the
coursework phase of my Ph.D.
Training Approaches
Machine learning training approaches divide into three:

1. Supervised Learning

2. Unsupervised Learning

3. Reinforcement Learning

ID3 Decision Tree


This approach is a supervised, non-parametric type of decision tree.

It is mostly used for classification and regression.

A tree consists of internal decision nodes and terminal leaves, and the terminal
leaves hold the outputs. The outputs are class values in classification and numeric
values in regression.

The aim of splitting data into subsets in a decision tree is to make each subset as
homogeneous as possible. The disadvantage of decision tree algorithms is that
they are greedy approaches. A greedy algorithm is any algorithm that follows
the problem-solving heuristic…
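
For reference, the entropy that ID3 uses to measure how homogeneous a subset is can be computed in a few lines; this is a generic sketch, not code from the article.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy: H(S) = -sum(p_i * log2(p_i)) over the class proportions p_i.
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

# A perfectly homogeneous subset has entropy 0; a 50/50 split has entropy 1.
print(entropy(["yes", "yes", "yes", "yes"]))   # 0.0
print(entropy(["yes", "yes", "no", "no"]))     # 1.0
print(entropy(["yes"] * 9 + ["no"] * 5))       # about 0.940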
Read more · 5 min read


Lukas Molzberger · Sep 23, 2019


Using Information Gain for the Unsupervised Training of Excitatory Neurons
Looking for a biologically more plausible way to train a neural network.

Traditionally, artificial neural networks have been trained using the delta rule and
backpropagation. But this contradicts findings from neuroscience about how the
brain functions: there simply is no gradient error signal propagated backwards
through biological neurons (see here and here). Besides, the human brain can find
patterns in its audiovisual training data by itself, without the need for training
labels. When a parent shows a cat to a child, the child doesn't use this information
to learn every detail of what…
Read more · 11 min read


Azika Amelia · Sep 6, 2019

Decision tree: Part 2/2


Calculating Entropy and Information gain by hand
This post is the second in the "Decision tree" series. The first post develops an
intuition about decision trees and gives you an idea of where to draw a decision
boundary; in this post, we'll see how a decision tree does it.

Spoiler: It involves some mathematics.

We'll be using a really tiny dataset for easy visualization and follow-through. However,
in practice, such datasets would definitely overfit. This dataset decides whether you
should buy a car given 3 features: Age, Mileage, and whether or not the car is road tested.
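
As a preview of that calculation, here is a small sketch of information gain computed by hand; the toy labels below are illustrative only, not the article's actual data set.

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    # IG = H(parent) - sum over feature values of (|child| / |parent|) * H(child)
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        child = [l for l, v in zip(labels, feature_values) if v == value]
        remainder += (len(child) / total) * entropy(child)
    return entropy(labels) - remainder

# Illustrative toy data: should we buy the car, split on whether it is road tested?
buy         = ["yes", "yes", "no", "no", "yes", "no"]
road_tested = ["yes", "yes", "no", "no", "yes", "yes"]

print("IG(buy | road_tested) =", round(information_gain(buy, road_tested), 3))   # 0.459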
Read more · 4 min read

