Information Gain
When talking about decision trees, I always imagine a list of questions I would ask
my girlfriend when she does not know what she wants for dinner: Do you want to eat
something with noodles? How much do you want to spend? Asian or Western?
Healthy food or junk food?
Making a list of questions to narrow down the options is essentially the idea behind
decision trees. More formally, a decision tree is an algorithm that partitions
observations into groups of similar data points based on their features.
A decision tree is a supervised learning model that has a tree-like…
Renu Khandelwal · Oct 24, 2019
The selected features help predictive models identify hidden business insights.
The key difference between feature selection and dimensionality reduction is that in
feature selection we do not change the original features, whereas in
dimensionality reduction we create new features from the original ones.
This transformation of the features through dimensionality reduction is often irreversible.
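To make that distinction concrete, here is a minimal sketch (not from the article) contrasting a scikit-learn filter selector, which keeps two of the original columns unchanged, with PCA, which builds two new features; the Iris data and k = 2 are illustrative choices.

```python
# Feature selection vs. dimensionality reduction: a minimal sketch.
# The dataset and k=2 are illustrative choices, not from the article.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keeps 2 of the original columns unchanged.
selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Dimensionality reduction: creates 2 new features as linear
# combinations of all original columns; the transform is lossy,
# so it generally cannot be inverted exactly.
reduced = PCA(n_components=2).fit_transform(X)

print(selected.shape, reduced.shape)  # (150, 2) (150, 2)
```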
Feature selection relies on statistical methods such as the filter, wrapper, and
embedded methods that we will discuss in this article.
Improves the accuracy of the model: we include only features that are relevant
to our prediction, which increases the accuracy of the model. Irrelevant features
introduce noise and reduce accuracy.
Wrapper methods
Embedded methods
Forward selection: start with an empty feature set and keep adding one input
feature at a time, evaluating the accuracy of the model after each addition. The
process continues until we reach a target accuracy or a predefined number of features.
Backward selection: start with all the features and keep removing one
feature at a time, evaluating the accuracy of the model after each removal. The
feature set that yields the best accuracy is retained.
Always evaluate the accuracy of the model on a held-out test set, as in the sketch below.
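As an illustration of this wrapper approach, here is a minimal sketch using scikit-learn's SequentialFeatureSelector (assuming scikit-learn >= 0.24); the logistic-regression estimator and the target of two features are illustrative choices, not from the article.

```python
# Wrapper-style forward and backward selection: a minimal sketch.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

est = LogisticRegression(max_iter=1000)

# Forward: start from an empty set, add one feature at a time.
forward = SequentialFeatureSelector(
    est, n_features_to_select=2, direction="forward").fit(X_train, y_train)

# Backward: start from all features, remove one at a time.
backward = SequentialFeatureSelector(
    est, n_features_to_select=2, direction="backward").fit(X_train, y_train)

# Evaluate the chosen subset on held-out data, as the article advises.
acc = est.fit(forward.transform(X_train), y_train).score(
    forward.transform(X_test), y_test)
print(forward.get_support(), backward.get_support(), acc)
```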
Advantages:
Models dependencies between the input features
Selects the feature subset that yields the highest model accuracy
Disadvantages:
Computationally very expensive, since a model is trained on every candidate
combination of input features
Filter methods do not involve learning; they are purely about feature selection.
Wrapper methods use a machine learning algorithm to evaluate the subsets of
features without incorporating knowledge about the specific structure of the
classification or regression function, and can therefore be combined with any
learning algorithm.
Decision Tree
By fitting a model with these machine learning techniques, we obtain feature
importance scores as a by-product, which we can use to select features and improve accuracy.
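As a minimal sketch of such an embedded method (the random forest and the Iris data are illustrative choices, not from the article), a tree-based model yields importance scores simply by being fitted:

```python
# Embedded feature selection: importances as a by-product of fitting.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# Rank the original features by their learned importance.
for name, score in zip(data.feature_names, model.feature_importances_):
    print(f"{name}: {score:.3f}")
```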
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
A tree consists of internal decision nodes and terminal leaves, and each terminal
leaf has an output: a class label in classification, or a numeric value in regression.
The aim of splitting the data into subsets in a decision tree is to make each subset as
homogeneous as possible. The disadvantage of decision tree algorithms is that
they are greedy approaches; a greedy algorithm is any algorithm that follows
the problem-solving heuristic…
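The homogeneity referred to here is usually quantified with entropy, and a candidate split is scored by its information gain: the parent's entropy minus the weighted entropies of the children. A minimal sketch, with made-up toy labels:

```python
# Scoring a split by information gain: a minimal sketch.
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Reduction in entropy achieved by splitting parent into left/right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 1, 1, 1])        # mixed labels, entropy = 1.0 bit
left, right = parent[:3], parent[3:]         # perfectly homogeneous halves
print(information_gain(parent, left, right)) # 1.0: the split removes all impurity
```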
Traditionally, artificial neural networks have been trained using the delta rule and
backpropagation. But this contradicts findings from neuroscience about how the
brain functions: there is simply no gradient error signal that is
propagated backwards through biological neurons (see here and here). Besides, the
human brain can find patterns in its audiovisual training data by itself, without the
need for training labels. When a parent shows a cat to a child, the child doesn't use
this information to learn every detail of what…
We’ll be using a really tiny dataset for easy visualization and follow-through. In
practice, however, a model trained on such a small dataset would definitely overfit. This
dataset decides whether you should buy a car given 3 features: age, mileage, and
whether or not the car is road tested.
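For illustration, here is a minimal sketch of that kind of dataset and a tree fitted to it; all values below are made up for the example, not the article's data.

```python
# A toy buy/don't-buy car dataset and a fitted decision tree: a sketch.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: age (years), mileage (thousands of km), road_tested (0/1)
X = [[2, 30, 1],
     [10, 150, 0],
     [5, 60, 1],
     [8, 120, 0],
     [3, 40, 1],
     [12, 200, 0]]
y = [1, 0, 1, 0, 1, 0]  # 1 = buy, 0 = don't buy

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["age", "mileage", "road_tested"]))
```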