Assignment 3
The figure shows part of the Iris dataset. The two classes can easily be separated with a
straight line. The left plot shows the decision boundaries of three possible linear
classifiers. The dashed line does not even separate the classes properly. The
solid line on the right represents the decision boundary of an SVM classifier: it
stays as far away from the closest training instances as possible, fitting the widest
possible street between the classes. This is called large margin classification.
The hard margin and soft margin problems are both convex quadratic optimization
problems with linear constraints. Such problems are known as quadratic programming (QP) problems.
The general problem formulation is given below.
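One standard way to write the general QP formulation (the notation below is a common convention and may differ slightly from the formulation in the original figure):

Minimize (over p):  (1/2)·pᵀ·H·p + fᵀ·p
subject to:         A·p ≤ b

where p is an n_p-dimensional parameter vector, H is an n_p × n_p matrix, f is an n_p-dimensional vector, A is an n_c × n_p constraint matrix, b is an n_c-dimensional vector, and n_c is the number of constraints.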
To apply a 2nd-degree polynomial transformation to a two-dimensional training set, you
train a linear SVM classifier on the transformed training set. The 2nd-degree polynomial
mapping function is φ(x) = φ((x1, x2)) = (x1², √2·x1·x2, x2²).
Notice that the dot product of the transformed vectors is equal to the square of the dot
product of the original vectors: φ(a)ᵀφ(b) = (aᵀb)².
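A quick numpy check of that identity (the two sample points below are made up purely for illustration):

import numpy as np

# 2nd-degree polynomial mapping phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
def phi(x):
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

a = np.array([2.0, 3.0])   # hypothetical 2D points
b = np.array([-1.0, 0.5])

print(phi(a) @ phi(b))     # dot product in the transformed space -> 0.25
print((a @ b) ** 2)        # square of the original dot product   -> 0.25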
2. Discuss non-linear SVM classification. How can you use the polynomial
kernel and the Gaussian RBF kernel?
For a nonlinear dataset, we can add features to transform it into a linearly
separable dataset. For example, adding a second feature
x2 = (x1)² makes the resulting 2D dataset perfectly linearly separable.
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

# Generate the moons dataset (the noise level is a typical choice)
X, y = make_moons(n_samples=100, noise=0.15)

polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge"))
])
polynomial_svm_clf.fit(X, y)
Instead of actually adding polynomial features, an SVM can use the kernel trick, which
gives the same result as if many high-degree polynomial features had been added, without
actually adding them. The following code applies the Gaussian RBF kernel trick to the moons dataset.
from sklearn.svm import SVC

rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001))
])
rbf_kernel_svm_clf.fit(X, y)
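The question also asks about the polynomial kernel. A minimal sketch using the same pipeline with a 3rd-degree polynomial kernel (the coef0=1 and C=5 values are a common choice assumed here, not taken from the text above):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Polynomial kernel: behaves as if many polynomial features were added,
# without actually computing them (kernel trick)
poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
])
poly_kernel_svm_clf.fit(X, y)   # X, y: the moons dataset from the snippet above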
The plots show models trained with different values of the
hyperparameters gamma (γ) and C. Increasing gamma makes the bell-shaped
curve narrower (left plot), and as a result each instance’s range of influence is
smaller: the decision boundary ends up being more irregular, wiggling around
individual instances. Conversely, a small gamma value makes the bell-shaped
curve wider, so instances have a larger range of influence, and the decision
boundary ends up smoother. So γ acts like a regularization hyperparameter: if
your model is overfitting, you should reduce it, and if it is underfitting, you
should increase it.
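A rough sketch of how such a comparison could be reproduced (the particular gamma/C grid below is an assumption, chosen to span small and large values):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Train one RBF-kernel SVM per (gamma, C) combination to compare decision boundaries
hyperparams = [(0.1, 0.001), (0.1, 1000), (5, 0.001), (5, 1000)]
svm_clfs = []
for gamma, C in hyperparams:
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="rbf", gamma=gamma, C=C))
    ])
    clf.fit(X, y)   # X, y: the moons dataset from the earlier snippet
    svm_clfs.append(clf)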
3. Explain how decision trees are trained, visualized and used in making
predictions.
Decision Trees are versatile Machine Learning algorithms and are the fundamental
components of Random Forests, which are among the most powerful Machine Learning
algorithms available. This answer shows how to train, visualize, and make predictions with Decision Trees.
Figure 6-1. Iris Decision Tree
The following code trains a DecisionTreeClassifier on the iris dataset
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)
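To visualize the trained tree, one option is Scikit-Learn's export_graphviz; the output filename below is only illustrative:

from sklearn.tree import export_graphviz

# Write the tree structure to a .dot file; render it with Graphviz, e.g.
#   dot -Tpng iris_tree.dot -o iris_tree.png
export_graphviz(
    tree_clf,
    out_file="iris_tree.dot",
    feature_names=iris.feature_names[2:],
    class_names=iris.target_names,
    rounded=True,
    filled=True
)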
Figure 6-1 shows how the tree makes predictions to classify an iris flower. You start at
the root node (depth 0, at the top): this node asks whether the flower's petal length is
smaller than 2.45 cm. If it is, then you move down to the root's left child
node (depth 1, left). In this case it is a leaf node (i.e., it does not have any
child nodes), so it does not ask any questions: you simply look at the
predicted class for that node, and the Decision Tree predicts that the flower is an
Iris-Setosa (class=setosa).
Now suppose you find another flower, but this time the petal length is
greater than 2.45 cm. You now move down to the root's right child node
(depth 1, right), which is not a leaf node, so it asks another question: is
the petal width smaller than 1.75 cm? If it is, then the flower is most likely
an Iris-Versicolor (depth 2, left). If not, it is likely an Iris-Virginica (depth
2, right).
A node's gini attribute measures its impurity: G_i = 1 − Σ_k (p_i,k)², where p_i,k is the
ratio of class-k instances among the training instances in node i. For example, the depth-2
left node has a gini score equal to 1 − (0/54)² − (49/54)² − (5/54)² ≈ 0.168.
Suppose you find a flower whose petals are 5 cm long and 1.5 cm wide. The corresponding
leaf node is the depth-2 left node, so the Decision Tree should output the
following probabilities: 0% for Iris-Setosa (0/54), 90.7% for Iris-Versicolor (49/54),
and 9.3% for Iris-Virginica (5/54). And of course if you ask it to predict the class, it
should output Iris-Versicolor (class 1) since it has the highest probability. Let’s check
this:
>>> tree_clf.predict_proba([[5, 1.5]])
array([[0. , 0.90740741, 0.09259259]])
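Asking for the class directly returns class 1, i.e. Iris-Versicolor (the class with the highest estimated probability above):

>>> tree_clf.predict([[5, 1.5]])
array([1])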
Once all predictors are trained, the ensemble makes predictions very much like
bagging or pasting, except that the predictors' votes are weighted. The following code
trains an AdaBoost classifier based on 200 Decision Stumps (Decision Trees with max_depth=1):
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=200,
    algorithm="SAMME.R", learning_rate=0.5)
ada_clf.fit(X_train, y_train)   # X_train, y_train: assumed to come from an earlier train/test split
13. What are the main differences between linear and nonlinear Support
Vector Machines?
The fundamental idea behind SVMs is best explained with some pictures.
The figure shows part of the iris dataset. The two classes can clearly be separated
easily with a straight line (they are linearly separable). The left plot
shows the decision boundaries of three possible linear classifiers. The
model whose decision boundary is represented by the dashed line is so
bad that it
does not even separate the classes properly. The other two models work
perfectly on this training set, but their decision boundaries come so close
to the instances that these models will probably not perform as well on
new instances. In contrast, the solid line in the plot on the right represents
the decision boundary of an SVM classifier; this line not only separates
the two classes but also stays as far away from the closest training
instances as possible. You can think of an SVM classifier as fitting the
widest possible street (represented by the parallel dashed lines) between
the classes.
This is called large margin classification.
The following Scikit-Learn code loads the iris dataset, scales the features,
and then trains a linear SVM model (using the LinearSVC class with C =
1 and the hinge loss function, described shortly) to detect Iris-Virginica
flowers. The resulting model is represented on the left of Figure 5-4.
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
iris = datasets.load_iris()
X = iris["data"][:, (2, 3)] # petal length, petal width
y = (iris["target"] == 2).astype(np.float64) # Iris-Virginica
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])
svm_clf.fit(X, y)
You can then use the model to make predictions:
>>> svm_clf.predict([[5.5, 1.7]])
array([1.])
Nonlinear SVM Classification
Although linear SVM classifiers are efficient and work surprisingly well
in many cases, many datasets are not even close to being linearly
separable. One approach to handling nonlinear datasets is to add more
features, such as polynomial features (as you did in Chapter 4); in some
cases this can result in a linearly separable dataset.
Consider the left plot in Figure 5-5: it represents a simple dataset with
just one feature x1. This dataset is not linearly separable, as you can see.
But if you add a second feature x2 = (x1)², the resulting 2D dataset is
perfectly linearly separable.
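A rough numpy sketch of this transformation (the sample values of x1 below are made up purely for illustration):

import numpy as np

# Hypothetical 1D dataset: one class in the middle, the other at both ends,
# so no single threshold on x1 can separate them
x1 = np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4], dtype=float)
y = (np.abs(x1) >= 2).astype(int)

# Add the squared feature: in (x1, x2) space a horizontal line such as
# x2 = 2.5 now separates the two classes perfectly
X_2d = np.column_stack([x1, x1 ** 2])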
14. Explain the concept of Gini impurity and how it is used in decision tree
algorithms.
Gini impurity is a measure of node impurity: a metric used in decision
tree algorithms to measure how “pure” a node is in terms of its
class distribution. A pure node means all data points in that node belong
to a single class, while an impure node contains data points from multiple
classes.
The Gini impurity ranges from 0 (pure node) to a maximum value that
depends on the number of classes.
For example:
• If all samples in a node are of the same class, Gini=0
• If there are two classes with equal proportions, Gini=0.5
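A minimal Python sketch that checks these examples numerically (the function name and the sample counts are ours, chosen only for illustration):

import numpy as np

def gini(class_counts):
    # Gini impurity: 1 minus the sum of squared class proportions
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([10, 0]))     # 0.0    -> pure node, all samples in one class
print(gini([5, 5]))      # 0.5    -> two classes in equal proportions
print(gini([0, 49, 5]))  # ~0.168 -> the depth-2 left node of the iris tree above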
18. Describe the Bayes Optimal Classifier and its theoretical importance in
classification problems.
Answer given in class
19. What is a Bayesian Belief Network, and how does it represent
probabilistic relationships between variables?
Answer given in class
20. Construct a regression using the following data, which consists of 10 data
instances and three attributes: 'Assessment', 'Assignment', and 'Project'.
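The data table for this question did not survive extraction, so no numeric answer can be reconstructed here. A minimal sketch of the procedure, assuming 'Assessment' and 'Assignment' are the predictors and 'Project' is the target (the random placeholder values below only stand in for the missing 10-row table and are not the assignment's data):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(10, 2))   # placeholder 'Assessment' and 'Assignment' columns
y = rng.uniform(0, 100, size=10)        # placeholder 'Project' column

reg = LinearRegression().fit(X, y)
print(reg.intercept_, reg.coef_)        # fitted regression coefficients
print(reg.predict([[70, 80]]))          # predicted 'Project' score for a new instance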