Linear Regression
Machine Intelligence
Lecture # 5
Fall 2023
1
Tentative Course Topics
2
Agenda
• Regression Problem: Univariate Linear Regression
• Optimization Technique: Gradient Descent
• Regression Problem: Univariate Linear Regression Example
• Regression Problem: Multivariate Linear Regression
• Optimization Technique: Normal Equations
3
Regression Problem
4
Linear Regression with One Variable
• Linear regression with one variable is also known as "univariate" linear regression.
• We want to predict a single output value y from a single input value x.

[Figure: scatter plot of training data, output y versus input x (a supervised learning example)]
The Hypothesis Function
hθ(x) = θ₀ + θ₁x    (hypothesis)
Andrew Ng 7
Example
• Suppose we have the following set of training data:

    input x    output y
    0          4
    1          7
    2          7
    3          8

• Now we can make a random guess about our hθ, for example:
      hθ(x) = 2 + 2x
• For an input of 1, this hypothesis predicts hθ(1) = 4, while the actual output is 7, so it is off by 3.

[Figure: the training points together with the guessed line hθ(x) = 2 + 2x]
9
Cost Function
• We can measure the accuracy of our hypothesis function by using a cost function.
• This takes an average of all the results of the hypothesis with inputs from the x's compared to the actual outputs y's:

    J(θ₀, θ₁) = (1/(2m)) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
10
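To make the cost function concrete, here is a minimal NumPy sketch (the function name compute_cost and the array names are illustrative, not from the slides), evaluated on the small training set from the previous example:

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = (1/(2m)) * sum((h(x) - y)^2)."""
    m = len(y)                            # number of training examples
    predictions = theta0 + theta1 * x     # h_theta(x) for every example
    errors = predictions - y
    return np.sum(errors ** 2) / (2 * m)

# Training data from the earlier example
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([4.0, 7.0, 7.0, 8.0])
print(compute_cost(2.0, 2.0, x, y))       # cost of the guess h(x) = 2 + 2x -> 1.75
```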
Simplified Example

    input x    output y
    1          1
    2          2
    3          3

Simplified hypothesis (fix θ₀ = 0):  hθ(x) = θ₁ · x

Cost function:  J(θ₁) = (1/(2m)) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

[Figures: the three training points plotted with candidate lines such as hθ(x) = 0.5x and hθ(x) = 0]

For θ₁ = 1 the line passes through all three training points, so J(1) = 0.
Simplified Example

J(0) ≈ 2.3
J(0.5) ≈ 0.58
J(1) = 0

[Figure: on the left, the training data with the candidate lines; on the right, J(θ₁) plotted against θ₁ with the three values above marked. The minimum is at θ₁ = 1.]
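These three values can be reproduced with a short sketch (assuming the same squared-error cost as above, with θ₀ fixed at 0):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(y)

for theta1 in (0.0, 0.5, 1.0):
    J = np.sum((theta1 * x - y) ** 2) / (2 * m)   # h(x) = theta1 * x, theta0 = 0
    print(f"J({theta1}) = {J:.2f}")               # approx. 2.33, 0.58, 0.00
```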
Summary of cost function
15
Summary of cost function
16
(for fixed θ₀, θ₁, hθ(x) is a function of x)        (J is a function of the parameters θ₀, θ₁)

Each choice of parameters gives a line through the data on the left and a single point on the contour plot of J(θ₀, θ₁) on the right; all points on the same contour have the same value of J(θ₀, θ₁). For example:

• hθ(x) = 800 − 0.15x   (θ₀ = 800, θ₁ = −0.15)
• hθ(x) = 500 − 0.5x    (θ₀ = 500, θ₁ = −0.5)
• hθ(x) = 100 + 0.1x    (θ₀ = 100, θ₁ = 0.1), the closest of the three to our objective, the minimum of J
Quick Summary until now
Hypothesis:  hθ(x) = θ₀ + θ₁x
Cost Function:  J(θ₀, θ₁) = (1/(2m)) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Goal:  minimize J(θ₀, θ₁) over θ₀, θ₁
Our strategy until now: keep changing θ₀ and θ₁ to reduce J(θ₀, θ₁) until we hopefully end up at a minimum!
20
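As a naive illustration of this strategy (not what we will actually do; gradient descent, introduced next, searches far more efficiently), one could simply try many candidate parameter pairs on a grid and keep the one with the smallest cost. A minimal sketch, reusing the small training set from the earlier example:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([4.0, 7.0, 7.0, 8.0])
m = len(y)

best = (None, None, np.inf)                   # (theta0, theta1, cost)
for t0 in np.linspace(0.0, 10.0, 101):        # candidate theta_0 values
    for t1 in np.linspace(-2.0, 4.0, 61):     # candidate theta_1 values
        J = np.sum((t0 + t1 * x - y) ** 2) / (2 * m)
        if J < best[2]:
            best = (t0, t1, J)
print("best (theta0, theta1, J):", best)
```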
Optimization Technique
Gradient Descent
Optimization
• Optimization is the process of finding the set of parameters/weights that minimizes the cost function.
22
Optimization
• Strategy 2: Following the Gradient
  ◮ Compute the best direction along which we should change our parameter (weight) vector, the direction that is mathematically guaranteed to be the direction of steepest descent.
  ◮ This direction is related to the gradient of the cost function.
23
Problem setup
24
Gradient Descent
Andrew Ng 25
Gradient Descent
Andrew Ng 27
Gradient Descent
• We take steps down the cost function in the direction of steepest descent, and the size of each step is determined by the parameter α, called the learning rate.
• The gradient descent algorithm is:

  Repeat until convergence {
      θⱼ := θⱼ − α · ∂J(θ₀, θ₁)/∂θⱼ        (simultaneously for j = 0 and j = 1)
  }

[Figure: J plotted as a function of a single parameter, with successive gradient descent steps marked moving toward the minimum]
29
GD algorithm
Plugging the linear regression cost function into the update rule gives (repeat until convergence, updating both parameters simultaneously):
  θ₀ := θ₀ − α · (1/m) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
  θ₁ := θ₁ − α · (1/m) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
30
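A minimal NumPy sketch of these updates for univariate linear regression (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0                     # initial guess
    for _ in range(n_iters):
        errors = theta0 + theta1 * x - y          # h(x) - y for all examples
        grad0 = errors.sum() / m                  # dJ/dtheta0
        grad1 = (errors * x).sum() / m            # dJ/dtheta1
        theta0 -= alpha * grad0                   # simultaneous update
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([4.0, 7.0, 7.0, 8.0])
print(gradient_descent(x, y))
```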
Two ways to compute the gradient
• There are two ways to compute the gradient:
1) Numerical gradient: A slow, approximate, but easy-to-implement way. It is approximate (since we have to pick a small value of h, while the true gradient is defined as the limit as h goes to zero) and can be very computationally expensive to compute.
2) Analytic gradient: A fast, exact but more error-prone way that requires
calculus. It allows us to derive a direct formula for the gradient (no
approximations) that is also very fast to compute.
32
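A small sketch contrasting the two approaches for the simplified cost J(θ₁) = (1/(2m)) Σᵢ (θ₁x⁽ⁱ⁾ − y⁽ⁱ⁾)² with θ₀ fixed at 0 (the centered-difference step h and the names are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(y)

def J(theta1):
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

theta1 = 0.5
h = 1e-5
numerical_grad = (J(theta1 + h) - J(theta1 - h)) / (2 * h)   # approximate, needs a small h
analytic_grad = np.sum((theta1 * x - y) * x) / m             # exact formula from calculus
print(numerical_grad, analytic_grad)                         # the two should agree closely
```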
Gradient Descent variants
• There are three variants of gradient descent, based on the amount of data used to calculate the gradient: batch, stochastic, and mini-batch gradient descent.
33
Batch Gradient Descent
• Batch gradient descent (also called vanilla gradient descent) calculates the error for each observation in the dataset but performs an update only after all observations have been evaluated.
• One cycle through the entire training dataset is called a training epoch.
Therefore, it is often said that batch gradient descent performs model
updates at the end of each training epoch.
34
Batch Gradient Descent
35
Stochastic Gradient Descent (SGD)
• Stochastic gradient descent, often abbreviated SGD, is a variation of the
gradient descent algorithm that calculates the error and updates the model
for each example in the training dataset.
• The noisy update process can allow the model to avoid local minima (i.e. premature convergence).
• SGD is usually faster than batch gradient descent, but its frequent updates cause a higher variance in the error, which can sometimes jump around instead of decreasing steadily.
36
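A minimal sketch of SGD for univariate linear regression, performing one update per training example per epoch (the shuffling, names, and hyperparameters are illustrative):

```python
import numpy as np

def sgd(x, y, alpha=0.05, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):          # visit the examples in random order
            error = theta0 + theta1 * x[i] - y[i]  # error on a single example
            theta0 -= alpha * error                # update from that one example
            theta1 -= alpha * error * x[i]
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([4.0, 7.0, 7.0, 8.0])
print(sgd(x, y))
```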
Mini-Batch Gradient Descent
• Mini-batch gradient descent seeks to find a balance between the robustness of
stochastic gradient descent and the efficiency of batch gradient descent.
• It is the most common implementation of gradient descent used in the field of deep
learning.
• It splits the training dataset into small batches that are used to calculate model error
and update model coefficients.
37
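A sketch of the mini-batch variant for the same univariate model (the batch size of 2 and the names are illustrative):

```python
import numpy as np

def minibatch_gd(x, y, alpha=0.05, n_epochs=100, batch_size=2, seed=0):
    rng = np.random.default_rng(seed)
    theta0, theta1 = 0.0, 0.0
    m = len(y)
    for _ in range(n_epochs):
        idx = rng.permutation(m)                         # shuffle once per epoch
        for start in range(0, m, batch_size):
            b = idx[start:start + batch_size]            # indices of one mini-batch
            errors = theta0 + theta1 * x[b] - y[b]
            theta0 -= alpha * errors.mean()              # average gradient over the batch
            theta1 -= alpha * (errors * x[b]).mean()
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([4.0, 7.0, 7.0, 8.0])
print(minibatch_gd(x, y))
```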
(for fixed θ₀, θ₁, hθ(x) is a function of x)        (J is a function of the parameters θ₀, θ₁)

[Slides 39-47: a sequence of figures showing successive gradient descent steps. The current (θ₀, θ₁) moves across the contour plot of J(θ₀, θ₁) toward the minimum, while the corresponding line hθ(x) fits the training data better and better.]
Learning Rate
• The gradient tells us the direction, but it does not tell us how far along this
direction we should step.
• The learning rate (step size) determines how big the step would be on
each iteration. It determines how fast or slow we will move towards the
optimal weights.
48
Learning Rate
• If the learning rate is too large, gradient descent may overshoot the minimum and fail to converge.
• If the learning rate is very small, it will take a long time to converge and become computationally expensive.
• The most commonly used rates are: 0.001, 0.003, 0.01 (default), 0.03, 0.1, 0.3.
49
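A quick sketch of the effect of the learning rate: run a fixed number of batch gradient descent iterations with several values of α and compare the final cost (all names and values are illustrative):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([4.0, 7.0, 7.0, 8.0])
m = len(y)

def final_cost(alpha, n_iters=100):
    t0, t1 = 0.0, 0.0
    for _ in range(n_iters):
        errors = t0 + t1 * x - y
        t0 -= alpha * errors.sum() / m
        t1 -= alpha * (errors * x).sum() / m
    return np.sum((t0 + t1 * x - y) ** 2) / (2 * m)

for alpha in (0.001, 0.01, 0.1, 1.0):
    # very small alpha: barely moves in 100 iterations; too large: the cost can blow up
    print(alpha, final_cost(alpha))
```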
Regression Problem
50
Multiple variables (Features)
• Linear regression with multiple variables is also known as "multivariate linear
regression".
51
Multiple variables (Features)
52
Hypothesis

hθ(x) = θ₀x₀ + θ₁x₁ + θ₂x₂ + ⋯ + θₙxₙ

For convenience of notation, define x₀ = 1. Then

    x = [x₀, x₁, x₂, …, xₙ]ᵀ ∈ ℝⁿ⁺¹        θ = [θ₀, θ₁, θ₂, …, θₙ]ᵀ ∈ ℝⁿ⁺¹
Hypothesis for Multiple Features

With x, θ ∈ ℝⁿ⁺¹ written as column vectors (and x₀ = 1), the hypothesis can be written compactly as

    hθ(x) = θᵀx = θ₀x₀ + θ₁x₁ + ⋯ + θₙxₙ
GD for Multiple Variables
Repeat until convergence, updating all parameters simultaneously:
  θⱼ := θⱼ − α · (1/m) Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾        for j = 0, 1, …, n
57
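A vectorized sketch of the multivariate update, using a design matrix X of shape m × (n+1) whose first column is all ones and a parameter vector θ ∈ ℝⁿ⁺¹ (names are illustrative):

```python
import numpy as np

def gradient_descent_multi(X, y, alpha=0.01, n_iters=1000):
    """Batch gradient descent for h(x) = X @ theta, where X includes the x0 = 1 column."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(n_iters):
        gradient = X.T @ (X @ theta - y) / m   # vector of partial derivatives dJ/dtheta_j
        theta -= alpha * gradient              # simultaneous update of all parameters
    return theta
```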
Gradient Descent in Practice I - Feature Scaling
• We can speed up gradient descent by having each of our input values in roughly the same
range.
• This is because θ will:
  – descend quickly on small ranges,
  – descend slowly on large ranges, and
  – oscillate inefficiently down to the optimum when the variables are very uneven.
58
Gradient Descent in Practice I - Feature Scaling
https://www.blog.nipunarora.net/ml_multi_variate_linear_regression/ 59
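One common way to put the features on a similar scale is mean normalization, xⱼ := (xⱼ − μⱼ) / sⱼ, where μⱼ is the mean and sⱼ the standard deviation (or the range) of feature j. A minimal sketch, applied to the raw feature columns before the x₀ = 1 column is added (names and data are illustrative):

```python
import numpy as np

def scale_features(X):
    """Mean-normalize each feature column: (x - mean) / std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma     # keep mu and sigma to scale future inputs the same way

# Features with very uneven ranges: size in feet^2 and number of bedrooms
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
X_scaled, mu, sigma = scale_features(X)
print(X_scaled)
```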
Feature Selection
• We can improve our features and the form of our hypothesis function.
• We can combine multiple features into one. For example, we can combine x₁ and x₂ into a new feature x₃ by taking x₁ · x₂.
  – Ex: if x₁ is the length and x₂ is the width of a house, we can combine them into a new feature area = length × width.
63
Optimization Technique:
Normal Equations
Better values for θ
• The normal equation solves for θ analytically, instead of iterating.
• Minimize J(θ) by setting its partial derivatives to zero:
    ∂J(θ)/∂θⱼ = 0   for every j
  and solve for θ₀, θ₁, …, θₙ. This yields the closed-form solution θ = (XᵀX)⁻¹ Xᵀ y.
67
Example: m = 4

x₀   Size (feet²)   Number of      Number of     Age of home    Price ($1000)
     x₁             bedrooms x₂    floors x₃     (years) x₄     y
1    2104           5              1             45             460
1    1416           3              2             40             232
1    1534           3              2             30             315
1    852            2              1             36             178
68
Notation: m training examples; n features.

x⁽ⁱ⁾ = [x₀⁽ⁱ⁾, x₁⁽ⁱ⁾, …, xₙ⁽ⁱ⁾]ᵀ ∈ ℝⁿ⁺¹ is the feature vector of the i-th example (with x₀⁽ⁱ⁾ = 1).

Stacking the examples as rows gives the design matrix X ∈ ℝ^(m × (n+1)), and stacking the targets gives y ∈ ℝᵐ:

    X = [ (x⁽¹⁾)ᵀ ; (x⁽²⁾)ᵀ ; … ; (x⁽ᵐ⁾)ᵀ ]        y = [ y⁽¹⁾, y⁽²⁾, …, y⁽ᵐ⁾ ]ᵀ

Example with one feature (n = 1):

    x⁽ⁱ⁾ = [1, x₁⁽ⁱ⁾]ᵀ        X = [ 1  x₁⁽¹⁾ ; 1  x₁⁽²⁾ ; ⋮ ; 1  x₁⁽ᵐ⁾ ]        y = [ y⁽¹⁾, …, y⁽ᵐ⁾ ]ᵀ
69
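A sketch of the normal equation θ = (XᵀX)⁻¹Xᵀy applied to the four-example table above. With only m = 4 examples and n + 1 = 5 parameters, XᵀX is not invertible here, so the sketch uses the pseudo-inverse np.linalg.pinv; with more examples than parameters, solving (XᵀX)θ = Xᵀy with np.linalg.solve would be the usual choice:

```python
import numpy as np

# Design matrix from the table: columns x0 = 1, size, bedrooms, floors, age
X = np.array([[1.0, 2104.0, 5.0, 1.0, 45.0],
              [1.0, 1416.0, 3.0, 2.0, 40.0],
              [1.0, 1534.0, 3.0, 2.0, 30.0],
              [1.0,  852.0, 2.0, 1.0, 36.0]])
y = np.array([460.0, 232.0, 315.0, 178.0])     # prices in $1000

theta = np.linalg.pinv(X.T @ X) @ X.T @ y      # theta = (X^T X)^+ X^T y
print(theta)
print(X @ theta)                               # predictions for the four training examples
```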
Gradient Descent Vs. Normal Equation

Gradient descent
• Need to choose α
• Needs many iterations
• Works well even when n is massive (millions); better suited to big data
• What is a big n, though? 100 or even 1,000 is still (relatively) small; if n is around 10,000, look at using gradient descent

Normal equation
• No need to choose α
• No need to iterate
• Needs to compute (XᵀX)⁻¹, the inverse of an n × n matrix
• With most implementations, computing a matrix inverse grows as O(n³), so it can be much slower for large n
70
Example 2:
X= y=
71
Check?
• Suppose you have m = 25 training examples with n = 6 features. The normal equation is θ = (XᵀX)⁻¹ Xᵀ y.
• For the given values of m and n, what are the dimensions of θ, X, and y in this equation?
  X is m × (n + 1) = 25 × 7
  y is an m-vector: 25 × 1
  θ is an (n + 1)-vector: 7 × 1
72