ML Module 2
CSC 701
• What is Regression?
• Regression is a supervised learning technique that models the relationship between one or more independent (input) variables and a continuous dependent (output) variable.
• B0 = ȳ - B1 * x̄
• i.e., B0 = (Mean of Y) - B1 × (Mean of X)
Sr. No.   Diameter (X) in inches   Price (Y) in dollars
1         8                        10
2         10                       13
3         12                       16
4         20                       ?
The goal of linear regression is to find the regression line that best approximates (or
fits) the relationship between the input variable(s) and the output variable. This
involves calculating the values of b0 (the intercept) and b1 (the slope coefficient).
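As a quick check of these formulas on the table above, here is a minimal sketch in Python, using the intercept formula b0 = ȳ - b1·x̄ and the standard slope formula b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² (variable names are illustrative):

# Least-squares fit for the diameter/price table above.
X = [8, 10, 12]    # diameter in inches
Y = [10, 13, 16]   # price in dollars

n = len(X)
mean_x = sum(X) / n   # x̄ = 10
mean_y = sum(Y) / n   # ȳ = 13

# b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / \
     sum((x - mean_x) ** 2 for x in X)
# b0 = ȳ - b1·x̄
b0 = mean_y - b1 * mean_x

print(b1, b0)         # 1.5, -2.0
print(b0 + b1 * 20)   # predicted price for a 20-inch diameter -> 28.0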
Linear Regression
Subject   Age (X)   Glucose Level (Y)
1         43        99
2         21        65
3         25        79
4         42        75
5         57        87
6         59        81
7         55        ?
Linear Regression - Step I

Subject   Age (X)   Glucose Level (Y)   XY      X²      Y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Σ         247       486                 20485   11409   40022

(Subject 7 is excluded here because its glucose level is the value to be predicted.)
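Linear Regression - Step II

Plugging these sums (with N = 6) into the standard least-squares formulas, the arithmetic follows directly from the table:

b1 = (N·ΣXY - ΣX·ΣY) / (N·ΣX² - (ΣX)²)
   = (6 × 20485 - 247 × 486) / (6 × 11409 - 247²)
   = (122910 - 120042) / (68454 - 61009)
   = 2868 / 7445 ≈ 0.3852

b0 = (ΣY - b1·ΣX) / N = (486 - 0.3852 × 247) / 6 ≈ 65.14

Predicted glucose level for Subject 7 (Age 55): ŷ = 65.14 + 0.3852 × 55 ≈ 86.3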
Ordinary Least Squares
• Ordinary Least Squares is the procedure used to estimate the values of the regression coefficients.
• It works on the squared residuals: given a regression line through the data, we calculate the distance from each data point to the line, square it, and sum all of the squared errors together.
• This sum of squared errors is the quantity that ordinary least squares seeks to minimize.
Gradient Descent
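A minimal sketch of batch gradient descent applied to simple linear regression, assuming a mean-squared-error cost and a fixed learning rate (the data reuses the diameter/price table; the hyperparameters are illustrative):

# Batch gradient descent for y = b0 + b1*x, minimizing mean squared error.
X = [8, 10, 12]
Y = [10, 13, 16]

b0, b1 = 0.0, 0.0   # initial coefficients
lr = 0.005          # learning rate (illustrative)
n = len(X)

for _ in range(100_000):
    # Gradients of MSE = (1/n) Σ (y - (b0 + b1*x))²
    grad_b0 = (-2 / n) * sum(y - (b0 + b1 * x) for x, y in zip(X, Y))
    grad_b1 = (-2 / n) * sum((y - (b0 + b1 * x)) * x for x, y in zip(X, Y))
    b0 -= lr * grad_b0
    b1 -= lr * grad_b1

print(b0, b1)  # approaches the closed-form solution (-2.0, 1.5)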
Multivariate Regression:
Multivariate regression deals with situations where there are multiple dependent variables, each predicted by the same set of independent variables. In other words, it simultaneously predicts multiple outcomes from the same set of predictors.
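A minimal sketch of this idea with scikit-learn, whose LinearRegression fits multiple outputs at once (the data below is illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression

# Each row of X holds the shared predictors for one sample.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])
# Each row of Y holds two dependent variables predicted simultaneously.
Y = np.array([[3.0, 1.0], [3.1, 2.2], [7.2, 2.9], [7.1, 4.1], [10.0, 5.0]])

model = LinearRegression().fit(X, Y)   # one model, multiple outputs
print(model.predict([[2, 3]]))         # one prediction per dependent variable, shape (1, 2)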
Logistic Regression
For example, given the humidity level, it may either rain or not rain. Similarly, when applying for a loan or for college admission, the application is either approved or rejected.
Logistic regression maps any real value to another value within the range of 0 to 1. Because the output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, it forms a curve like the "S" form. The S-form curve is called the Sigmoid function or the logistic function.
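Written as a formula, sigmoid(z) = 1 / (1 + e^(-z)). A minimal sketch in Python:

import math

def sigmoid(z):
    # Maps any real z into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(-5), sigmoid(0), sigmoid(5))  # ~0.0067, 0.5, ~0.9933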
In linear regression, the relationship between the dependent variable and the independent variable must be linear.
In logistic regression, a linear relationship between the dependent and independent variables is not required.
What is a Decision Tree?
• A decision tree is a flowchart-like structure used to
make decisions or predictions.
• It consists of nodes representing decisions or tests on
attributes, branches representing the outcome of
these decisions, and leaf nodes representing final
outcomes or predictions.
• Each internal node corresponds to a test on an
attribute, each branch corresponds to the result of the
test, and each leaf node corresponds to a class label or
a continuous value.
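To make the structure concrete, here is a minimal sketch in Python that represents such a tree as nested nodes (all attribute names and values are illustrative):

# A toy decision tree as nested dicts.
# Internal nodes test an attribute; leaf nodes hold the final prediction.
tree = {
    "attribute": "degree",                   # test at the internal node
    "branches": {
        "Masters":   {"leaf": "placed"},     # leaf node: class label
        "Bachelors": {"attribute": "score",
                      "branches": {
                          "high": {"leaf": "placed"},
                          "low":  {"leaf": "not placed"},
                      }},
    },
}

def predict(node, sample):
    # Follow branches until a leaf is reached.
    while "leaf" not in node:
        node = node["branches"][sample[node["attribute"]]]
    return node["leaf"]

print(predict(tree, {"degree": "Bachelors", "score": "high"}))  # -> placed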
Masters 3 placed
Bachelors 3 placed
Masters 3 placed
Bachelors 1 placed
Masters 4 placed
Decision Tree Using Regression
Find the Best Split:
Evaluate the candidate splits on each attribute and choose the one that minimizes the variance (or mean squared error) of the target variable within the resulting subsets.
Create Nodes:
Split the data into two subsets based on the best split. Create a decision node that tests the
chosen attribute.
Recursively Split:
Repeat the above steps for each child node until you reach a stopping point (like a maximum
depth of the tree or a minimum number of samples per leaf).
Prediction:
To predict the value for a new instance, follow the path down the tree using the decision
nodes and return the average value of the target variable in the final leaf node.
2 Yes 50
3 No 60
4 Yes 65
5 No 70
6 Yes 80
7 No 85
8 Yes 90
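A minimal sketch with scikit-learn fitting a regression tree to rows like those above, treating the first column as a numeric feature, the Yes/No column as a binary feature, and the last column as the target (these column meanings are assumptions, since the table does not name them):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Rows encoded from the table above: Yes -> 1, No -> 0.
X = np.array([[2, 1], [3, 0], [4, 1], [5, 0], [6, 1], [7, 0], [8, 1]])
y = np.array([50, 60, 65, 70, 80, 85, 90])

# Stopping points: maximum depth and minimum samples per leaf.
reg = DecisionTreeRegressor(max_depth=2, min_samples_leaf=2)
reg.fit(X, y)

# Prediction = average target value in the leaf the new instance falls into.
print(reg.predict([[5, 1]]))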
Confusion matrix (N = 165):

                 Predicted NO    Predicted YES    Total
Actual NO        50 [TN]         10 [FP]          60
Actual YES       5 [FN]          100 [TP]         105
Total            55              110              165

Precision = TP / (TP + FP) = 100 / 110 ≈ 0.91
Recall (TPR) = TP / (TP + FN) = 100 / 105 ≈ 0.95
Even if the data is imbalanced, we can determine whether our model is working well. For that, the values of TPR and TNR should be high, and FPR and FNR should be as low as possible.
With the help of TP, TN, FN, and FP, other performance metrics can be calculated.
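As a quick sketch, these rates can be computed directly from the four counts in the matrix above (the function and variable names are illustrative):

def rates(tp, tn, fp, fn):
    # True/false positive and negative rates from confusion-matrix counts.
    tpr = tp / (tp + fn)   # recall / sensitivity
    tnr = tn / (tn + fp)   # specificity
    fpr = fp / (fp + tn)   # = 1 - TNR
    fnr = fn / (fn + tp)   # = 1 - TPR
    return tpr, tnr, fpr, fnr

print(rates(tp=100, tn=50, fp=10, fn=5))
# -> (~0.952, ~0.833, ~0.167, ~0.048)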
Now that we have defined the terms, let's calculate the Cohen's Kappa score. Let the total number of observations (TP + FP + FN + TN) be N; that is, N = TP + FP + FN + TN.
• The first step is to calculate the probability that both the
raters are in perfect agreement:
• Observed Agreement, Po = (TP + TN) / N
• In our example, this would be:
• Po = (45+15)/100=0.6
• Next, we need to calculate the expected probability that both the raters are in agreement by chance. This is calculated by adding the probability that both raters agree by chance that the class is positive to the probability that both raters agree by chance that the class is negative, where each of these probabilities is the product of the two raters' marginal probabilities.
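Written out in the standard way: Pe = P(yes) + P(no), where P(yes) = ((TP + FP) / N) × ((TP + FN) / N) and P(no) = ((FN + TN) / N) × ((FP + TN) / N), and the Cohen's Kappa score is κ = (Po - Pe) / (1 - Pe). A minimal sketch in Python, applied to the earlier 165-observation confusion matrix (the 45/15 example above does not list its off-diagonal counts):

def cohen_kappa(tp, fp, fn, tn):
    # Cohen's Kappa from the four confusion-matrix counts.
    n = tp + fp + fn + tn
    po = (tp + tn) / n                          # observed agreement
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)   # chance agreement on "yes"
    p_no = ((fn + tn) / n) * ((fp + tn) / n)    # chance agreement on "no"
    pe = p_yes + p_no                           # expected agreement by chance
    return (po - pe) / (1 - pe)

print(cohen_kappa(tp=100, fp=10, fn=5, tn=50))  # = 0.80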