Unit 4 Notes
UNIT IV
ENSEMBLE TECHNIQUES AND UNSUPERVISED LEARNING
UNIVERSITY QUESTIONS
1. Assume an image has a pixel size of 240*180. Elaborate on how K-means clustering can be
used to achieve lossy data compression of that image.
2. Explain in detail about combining multiple classifiers by voting.
3. [i] What are bagging and boosting? Give an example.
[ii] Outline the steps in the AdaBoost algorithm with an example.
4. Elaborate on the steps in the expectation-maximization algorithm.
• The first step is to create multiple classification/regression models using some training
dataset. Each base model can be created using different splits of the same training
dataset with the same algorithm, using the same dataset with different algorithms, or by
other methods.
• When combining multiple independent and diverse decisions each of which is at least
more accurate than random guessing, random errors cancel each other out, and correct
decisions are reinforced. Human ensembles are demonstrably better.
• Use a single, arbitrary learning algorithm but manipulate training data to make it learn
multiple models.
CLASSIFIER COMBINATION RULES
Common rules for combining the outputs of the base classifiers include the sum, weighted sum,
median, minimum, maximum, and product rules applied to the base learners' outputs.
ENSEMBLE LEARNING
• The idea of ensemble learning is to employ multiple learners and combine their
predictions.
• Ensemble methods combine several decision tree classifiers to produce better
predictive performance than a single decision tree classifier. The main principle behind the
ensemble model is that a group of weak learners come together to form a strong learner, thus
increasing the accuracy of the model.
SIMPLE ENSEMBLE TRAINING METHODS
Simple ensemble training methods typically just involve the application of statistical
summary techniques, such as determining the mode (max voting), the mean (averaging), or a
weighted average of a set of predictions.
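A minimal sketch of these combination rules, assuming three hypothetical base models whose predictions (the arrays below) are made-up values for illustration:

import numpy as np

# Hypothetical 0/1 class predictions from three base classifiers for 5 samples
pred_1 = np.array([0, 1, 1, 0, 1])
pred_2 = np.array([0, 1, 0, 0, 1])
pred_3 = np.array([1, 1, 1, 0, 0])

# Max voting: a sample is labelled 1 if the majority of the three models say 1
votes = pred_1 + pred_2 + pred_3
print("Max voting:", (votes >= 2).astype(int))          # [0 1 1 0 1]

# Hypothetical class-1 probabilities from the same three models
prob_1 = np.array([0.6, 0.9, 0.7, 0.2, 0.8])
prob_2 = np.array([0.5, 0.8, 0.4, 0.3, 0.7])
prob_3 = np.array([0.7, 0.7, 0.6, 0.1, 0.4])

# Averaging: simple mean of the predicted probabilities
print("Averaging:", (prob_1 + prob_2 + prob_3) / 3)

# Weighted average: weights (summing to 1) reflect how much each model is trusted
w = [0.5, 0.3, 0.2]
print("Weighted average:", w[0] * prob_1 + w[1] * prob_2 + w[2] * prob_3)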
ADVANCED ENSEMBLE TRAINING METHODS
There are three primary advanced ensemble training techniques, each of which is
designed to deal with a specific type of machine learning problem.
"Bagging" techniques are used to decrease the variance of a model's predictions, with
variance referring to how much the model's predictions change when it is trained on
different samples of the same data.
"Boosting" techniques are used to combat the bias of models.
Finally, "stacking" is used to improve predictions in general.
BAGGING/BOOTSTRAP AGGREGATING
Bagging is a voting method whereby base-learners are made different by training them over
slightly different training sets.
• Given a training set of size n, create m samples of size n by drawing n examples
from the original data, with replacement. Each bootstrap sample will on average contain
about 63.2% of the unique training examples; the rest are replicates. Bagging then combines
the m resulting models using a simple majority vote.
PSEUDOCODE
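The pseudocode itself is not reproduced in these notes; the following is a minimal Python sketch of the bagging procedure just described, using decision trees as base learners (names such as bagging_fit and n_models are illustrative, not from the notes):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=10, random_state=0):
    # Train n_models trees, each on a bootstrap sample: n draws with replacement
    rng = np.random.RandomState(random_state)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.choice(n, size=n, replace=True)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Combine the m resulting models by simple majority vote
    # (assumes integer class labels 0..K-1)
    all_preds = np.array([m.predict(X) for m in models])   # shape (n_models, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), axis=0, arr=all_preds)

# usage: models = bagging_fit(X_train, y_train); y_pred = bagging_predict(models, X_test)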
BOOSTING
In boosting, models are built sequentially. A first model is trained on the training data,
and then a second model is built which tries to correct the errors made by the first
model. This procedure is continued and models are added until either the complete training
data set is predicted correctly or the maximum number of models is added.
STEPS
• Draw a random subset of training samples d1 without replacement from the training
set D to train a weak learner C1
• Draw a second random training subset d2 without replacement from the training set D and
add 50 percent of the samples that were previously misclassified by C1, in order to
train a weak learner C2
• Find the training samples d3 in the training set D on which C1 and C2 disagree to
train a third weak learner C3
• Combine all the weak learners via majority voting.
ADVANTAGES OF BOOSTING
• Improved accuracy
• Robustness to overfitting
• Better interpretability
DISADVANTAGES OF BOOSTING
• Sensitive to noisy data and outliers, which can lead to overfitting.
• Requires careful tuning of different hyperparameters.
ADABOOST ALGORITHM
• AdaBoost (Adaptive Boosting) learns a sequence of weak classifiers. After each round, the
training samples that were misclassified are given higher weights so that the next weak
learner focuses on them; the weak learners are finally combined by a weighted majority vote.
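A brief illustrative sketch of AdaBoost with scikit-learn's AdaBoostClassifier, using decision stumps as the weak learners; the synthetic dataset and parameter values are assumptions made purely for demonstration:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Weak learner: a decision stump (tree of depth 1)
stump = DecisionTreeClassifier(max_depth=1)

# 50 boosting rounds; misclassified samples are re-weighted after each round
# (note: scikit-learn versions before 1.2 use base_estimator= instead of estimator=)
model = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))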
STACKING
• It is a popular ensemble machine learning technique used to combine the predictions of
multiple models in order to build a new model and improve model performance.
• Stacking enables us to train multiple models to solve similar problems and, based on
their combined output, build a new model with improved performance.
• It can ensemble classification or regression models.
• It consists of two layers of estimators:
• the first layer consists of all the baseline models that are used to predict the
outputs on the test dataset.
• the second layer consists of a meta-classifier or meta-regressor which takes all the
predictions of the baseline models as input and generates new predictions.
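A minimal sketch of such a two-layer stack with scikit-learn's StackingClassifier; the choice of baseline models (random forest, SVC) and meta-classifier (logistic regression) is an assumption made for illustration:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# First layer: baseline models whose predictions feed the meta-classifier
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# Second layer: meta-classifier trained on the baseline models' predictions
stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Test accuracy:", stack.score(X_test, y_test))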
K-MEANS CLUSTERING
• K-Means groups the data into K clusters. Hence each cluster has data points with some
commonalities, and it is well separated from the other clusters.
• Let's take the number of clusters as K=2; that is, we will try to group the dataset into
two different clusters.
• We need to choose some random K points, or centroids, to form the clusters. These points
can be either points from the dataset or any other points. So, here we are selecting the
below two points as centroids, which are not part of our dataset. Consider the below image:
• Now we will assign each data point of the scatter plot to its closest K-point or
centroid.
• To do this, we calculate the distance between each data point and the two centroids;
equivalently, we can draw a median line between the two centroids. Consider the below image:
• From the above image, it is clear that the points on the left side of the line are nearer
to the K1 or blue centroid, and the points to the right of the line are closer to the yellow
centroid. Let's color them blue and yellow for clear visualization.
• As we need to refine the clusters, we will repeat the process by choosing new centroids.
To choose the new centroids, we will compute the center of gravity of the points in each
cluster and move each centroid there, as below:
• Next, we will reassign each datapoint to the new centroid. For this, we will repeat the
same process of finding a median line. The median will be like below image:
• From the above image, we can see that one yellow point is on the left side of the
line, and two blue points are to the right of the line. So, these three points will be
reassigned to the new centroids.
• As we got the new centroids so again will draw the median line and reassign the data
points. So, the image will be:
• We can see in the above image that there are no points on the wrong side of the
line, which means our model has converged. Consider the below image:
• As our model is ready, so we can now remove the assumed centroids, and the
two final clusters will be as shown in the below image:
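A minimal from-scratch sketch of the procedure walked through above (assign each point to its nearest centroid, then move each centroid to the center of gravity of its points); the small 2-D dataset and K=2 are assumptions for illustration, and empty-cluster handling is omitted:

import numpy as np

def kmeans(X, k=2, n_iters=10, random_state=0):
    rng = np.random.RandomState(random_state)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: new centroid = mean ("center of gravity") of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Six 2-D points that form two well-separated groups
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.5, 7.5], [7.8, 8.3]])
labels, centroids = kmeans(X, k=2)
print(labels)        # e.g. [0 0 0 1 1 1]
print(centroids)

For the image-compression question listed at the start of this unit, the same idea applies: run K-means on the pixel colour values, store only the K centroid colours plus each pixel's cluster index, and reconstruct every pixel from its centroid; the compression is lossy because each pixel is replaced by its nearest centroid colour.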
K-NEAREST NEIGHBOUR (K-NN) ALGORITHM
• The K-NN algorithm assumes similarity between the new case/data and the
available cases and puts the new case into the category that is most similar to
the available categories.
• K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
• It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead it stores the dataset and, at the time of
classification, performs an action on the dataset.
• The KNN algorithm at the training phase just stores the dataset, and when it gets
new data, it classifies that data into the category that is most similar to the
new data.
STEPS:
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to the training samples.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors is
maximum.
Step-6: Our model is ready.
EXAMPLE:
Suppose there are two categories, i.e., Category A and Category B, and we have a
new data point x1; in which of these categories will this data point lie? To solve this type
of problem, we need the K-NN algorithm. With the help of K-NN, we can easily identify the
category or class of a particular data point. Consider the below diagram:
Suppose we have a new data point and we need to put it in the required category.
• Firstly, we will choose the number of neighbors, so we will choose k=5.
• Next, we will calculate the Euclidean distance between the new data point and the
existing data points. The Euclidean distance is the distance between two points, which we
have already studied in geometry; for points (x1, y1) and (x2, y2) it is
d = sqrt((x2 - x1)^2 + (y2 - y1)^2).
• As we can see in Figure 4.9, 3 of the 5 nearest neighbors are from Category A and 2 are
from Category B; hence this new data point must belong to Category A.
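A short sketch of the same idea using scikit-learn's KNeighborsClassifier with k=5 and the default Euclidean metric; the 2-D points and labels below are made-up stand-ins for Category A and Category B:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training points labelled with their category
X = np.array([[1, 2], [2, 3], [3, 3], [2, 2], [6, 5], [7, 7], [8, 6], [7, 5]])
y = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# k = 5 neighbours, Euclidean distance (the default metric)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

# New data point x1 to classify: majority class among its 5 nearest neighbours
x1 = np.array([[3, 4]])
print("Predicted category:", knn.predict(x1))
print("Distances to the 5 nearest neighbours:", knn.kneighbors(x1)[0])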
ADVANTAGES OF KNN ALGORITHM:
• It is simple to implement.
• It is robust to the noisy training data
• It can be more effective if the training data is large.
DISADVANTAGES OF KNN ALGORITHM:
• It always needs to determine the value of K, which may be complex at times.
• The computation cost is high because of calculating the distance between the
data points for all the training samples.
EXPECTATION-MAXIMIZATION ALGORITHM
STEPS
1. Given a set of incomplete data, consider a set of starting parameters
2. Expectation step(E-step) – Using the observed available data of the dataset, estimate
the values of the missing data
3. Maximization step(M step)-complete data generated after E step is used in order to
update the parameters
4. Repeat steps 2 and 3 until convergence
FORMULA FOR E STEP AND M STEP (two-coin example)
k - no. of heads observed in a set of flips
n - no. of flips in the set
E step: for each set of flips, compute the likelihood of the observed heads under each coin,
L_A = Θ1^k (1 - Θ1)^(n - k) and L_B = Θ2^k (1 - Θ2)^(n - k), and the responsibilities
P(A) = L_A / (L_A + L_B) and P(B) = 1 - P(A).
For coin A: expected heads = P(A) * no. of heads, expected tails = P(A) * no. of tails
For coin B: expected heads = P(B) * no. of heads, expected tails = P(B) * no. of tails
M step: Θ1 = (total expected heads for A) / (total expected heads + expected tails for A),
and similarly for Θ2.
After 10 iterations
Θ1 = 0.80
Θ2 = 0.52
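A minimal numerical sketch of the E and M steps above, assuming the commonly used two-coin dataset (5, 9, 8, 4 and 7 heads out of 10 flips per set) and initial guesses Θ1 = 0.6 and Θ2 = 0.5, which reproduce the values quoted above after 10 iterations:

import numpy as np

# Observed data: number of heads in each of 5 sets of 10 flips;
# which coin (A or B) produced each set is the hidden/latent variable
heads = np.array([5, 9, 8, 4, 7])
n = 10
theta_1, theta_2 = 0.6, 0.5          # initial guesses for coin A and coin B

def likelihood(k, theta):
    # Likelihood of k heads in n flips for a coin with head probability theta
    return theta ** k * (1 - theta) ** (n - k)

for _ in range(10):
    # E step: responsibility of each coin for each set of flips
    lik_A, lik_B = likelihood(heads, theta_1), likelihood(heads, theta_2)
    p_A = lik_A / (lik_A + lik_B)
    p_B = 1 - p_A
    # Expected heads/tails attributed to each coin
    heads_A, tails_A = p_A * heads, p_A * (n - heads)
    heads_B, tails_B = p_B * heads, p_B * (n - heads)
    # M step: re-estimate each coin's head probability from the expected counts
    theta_1 = heads_A.sum() / (heads_A.sum() + tails_A.sum())
    theta_2 = heads_B.sum() / (heads_B.sum() + tails_B.sum())

print(round(theta_1, 2), round(theta_2, 2))   # approximately 0.8 and 0.52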
USE/APPLICATION
• Used to fill in the missing data in a sample
• Used as the basis of unsupervised learning of clusters
• Used for estimating the parameters of Hidden Markov Models (HMMs)
• Used for discovering the values of latent variables
ADVANTAGES
• The likelihood is guaranteed to increase (or at least not decrease) with each iteration
• E step and M step are easy for many problems in terms of implementation
DISADVANTAGES
• Has slow convergence
• It makes convergence to the local optima only
• It requires both forward and backward probabilities
PART A
1. What is Ensemble method?
Ensemble methods are techniques that aim at improving the accuracy of results in
models by combining multiple models instead of using a single model. The combined
models increase the accuracy of the results significantly. This has boosted the
popularity of ensemble methods in machine learning
2. Which are the performance factors that influence KNN algorithm?
1. The distance function or distance metric used to determine the nearest neighbors
2. The Decision rule used to derive a classification from the K-Nearest neighbors.
3. The number of neighbors used to classify the new example.
3. List the properties of K-Means algorithm.
1. There are always K clusters
2. There is always at least one item in each cluster.
3. The clusters are non-hierarchical and they do not overlap
4. How do GMMs differentiate from K-means clustering?
GMMs and K-means are both clustering algorithms used for unsupervised learning
tasks. However, the basic difference between them is that K-means is a distance-based
clustering method while a GMM is a distribution-based clustering method.
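A small sketch contrasting the two on the same synthetic data; the blob shapes and parameters are assumptions chosen so that the distance-based and distribution-based assignments can differ:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Two overlapping, elongated blobs (illustrative synthetic data)
X = np.vstack([rng.normal(loc=[0, 0], scale=[3.0, 0.5], size=(100, 2)),
               rng.normal(loc=[4, 2], scale=[0.5, 1.5], size=(100, 2))])

# Distance-based: hard assignment of each point to its nearest centroid
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Distribution-based: each cluster is a Gaussian with its own mean and covariance;
# points are assigned to the component that gives them the highest probability
gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

# Cluster indices are arbitrary, so compare the assignments up to a label swap
disagreements = min(np.sum(km_labels != gmm_labels), np.sum(km_labels != 1 - gmm_labels))
print("Points assigned differently by the two methods:", disagreements)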
5. What is 'Overfitting' in Machine learning?
In machine learning, overfitting occurs when a statistical model describes random error or
noise instead of the underlying relationship. Overfitting is normally observed when a model
is excessively complex, because it has too many parameters with respect to the amount of
training data. A model that has been overfit exhibits poor performance on new data.
6. What are the two paradigms of ensemble methods?
The two paradigms of ensemble methods are
Sequential ensemble methods
Parallel ensemble methods
7. What is Error-Correcting Output Codes?
The main classification task is defined in terms of a number of subtasks that are
implemented by the base learners. The idea is that the original task of separating
one class from all other classes may be a difficult problem. We want to define a set of
simpler classification problems, each specializing in one aspect of the task, and by
combining these simpler classifiers, we get the final classifier.
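A brief sketch with scikit-learn's OutputCodeClassifier, which implements this idea by giving each class a binary code word and training one binary base learner per code bit; the base learner and code_size used here are assumptions made for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)

# Each class receives a random binary code word; a new sample is assigned the
# class whose code word is closest to the vector of base-learner outputs
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000), code_size=2.0, random_state=0)
ecoc.fit(X, y)
print("Training accuracy:", ecoc.score(X, y))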