
4.7 Least Squares
The least squares method is used to find the parameters of a proposed model for a given set of data when the model does not fit the data perfectly. The least squares method is best explained with an example, and the calculations are easily accomplished by solving the associated least squares problem in MATLAB.

Often a set of data is presented for analysis, and we want to model the data with a simple function. One approach for analyzing a data set with n data values is to build a degree n − 1 interpolating polynomial and fit it through all of the data. In general this is a bad idea. Polynomial interpolation has some advantages in practice, but interpolation is rarely used on its own. For example, consider the table of data that relates the number of skiers at Copper Mountain to severe leg injuries.

Skiers   Injuries
2402     61
2742     55
2002     50
3210     61
4086     62
4514     61
6370     91
4557     58
4051     61
8086     144

To interpolate a polynomial through all the data points requires a 9th degree polynomial, which provides a poor model for the data. The model is useless for predicting future injuries, as shown by the graph below. Clearly this 9th degree polynomial model cannot be used predictively.
Consider the actual data presented in the second graph. How can we build a better model (a better predictor) of severe leg injuries? If we choose to model the data with a simple linear function
    f(t) = x0 + x1 t,
then we face another problem. With only 2 unknowns, x0 and x1, how can we hope to satisfy all 10 data points? We are trying to satisfy these equations:
𝑥0 + 𝑥1 (2402) = 61
𝑥0 + 𝑥1 (2742) = 55
𝑥0 + 𝑥1 (2002) = 50
𝑥0 + 𝑥1 (3210) = 61
𝑥0 + 𝑥1 (4086) = 62
𝑥0 + 𝑥1 (4514) = 61
𝑥0 + 𝑥1 (6370) = 91
𝑥0 + 𝑥1 (4557) = 58
𝑥0 + 𝑥1 (4051) = 61
𝑥0 + 𝑥1 (8086) = 144
Looking at the data, we know there is no line passing through all of these data points. Finding the line that comes closest to satisfying all of these equations is called the least squares problem, and we discuss the details in the following pages.
To find the solution to the least squares problem, we find a function, i.e. a line or curve, that
minimizes the sum of squares error for a given data set. Finding this solution is equivalent to
determining the best approximate solution to a linear system of equations.

Theorem
Let A be m × n with m > n. If the system Ax⃗ = b⃗ is inconsistent, then the solution to the linear system
    AᵀAx⃗ = Aᵀb⃗
is the least squares solution, x⃗_ls. The vector x⃗_ls minimizes the sum of squares error, i.e. for all x⃗ ∈ ℝⁿ, ‖b⃗ − Ax⃗‖ is minimized when x⃗ = x⃗_ls.

A maps x⃗_ls as close as possible to b⃗. In this context, “close” means that the norm ‖b⃗ − Ax⃗‖ is minimized by the vector x⃗_ls. This error minimization occurs when the error vector
    err⃗ = b⃗ − Ax⃗
is orthogonal to the range of the matrix mapping A. After a least squares solution and error vector are determined, we find the least squares error.

The least squares error is simply the magnitude of the error vector:
    least squares error = ‖err⃗‖.
When solving a least squares problem, we seek the best approximation to the data, i.e. the least squares solution; the error vector; and the least squares error.
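For readers following along in MATLAB, the entire procedure fits in a few lines. The sketch below uses a small made-up system (the matrix A and vector b are illustrative only, not data from this section), and the variable names x_ls, err, and lse are our own; it forms the normal equations, solves them, and reports the error vector and the least squares error.

```matlab
% Minimal sketch: least squares for an inconsistent system A*x = b (m > n).
A = [1 1; 1 2; 1 3];        % hypothetical 3-by-2 coefficient matrix
b = [1; 2; 2];              % hypothetical right-hand side

x_ls   = (A'*A) \ (A'*b);   % solve the normal equations A'*A*x = A'*b
approx = A*x_ls;            % best approximation to b in the range of A
err    = b - approx;        % error vector
lse    = norm(err)          % least squares error (magnitude of the error vector)
```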

Before we solve the problem with the injured skiers, we will discuss the geometric interpretation of least squares. Consider solving the system Ax⃗ = b⃗ given here:

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

This system is clearly inconsistent. The columns of the coefficient matrix A, [1, 0, 0]ᵀ and [0, 0, 1]ᵀ, span the x₁x₃-plane. The linear system Ax⃗ = b⃗ is really asking the question, “Which vector x⃗ is mapped by A to the vector b⃗?” The answer for this problem is that no such vector exists. The least squares question is, “What vector in the span of the columns of A is ‘closest’ to the vector b⃗?”

Consider the picture where the columns of A are black vectors and b⃗ is the solid, green vector. The blue vector x⃗_ls is the solution to the least squares problem AᵀAx⃗ = Aᵀb⃗. The dashed, green vector, Ax⃗_ls, in the x₁x₃-plane is the vector in the plane closest to b⃗. Note: Ax⃗_ls differs from b⃗ only by the orthogonal red error vector err⃗ in the x₂ direction. Can you see why Ax⃗_ls minimizes ‖b⃗ − Ax⃗‖?

[Figure: the vectors b⃗, Ax⃗_ls, x⃗_ls, and err⃗ drawn with the columns of A in x₁x₂x₃-space.]

For this example we have the system AᵀAx⃗ = Aᵀb⃗:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{giving}\quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

This solution is x⃗_ls = [1, 1]ᵀ, and the vector Ax⃗_ls = [1, 0, 1]ᵀ is the vector in the range of A closest to b⃗. The error vector is the thin, red vector, b⃗ − Ax⃗_ls = err⃗ = [0, 1, 0]ᵀ. The least squares error is the magnitude of the error vector, i.e. the least squares error for this problem is ‖err⃗‖ = 1.
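A quick MATLAB check of this small example (a sketch; the variable names are ours) reproduces the least squares solution, the closest vector, the error vector, and the error of 1.

```matlab
% Verify the geometric example.
A = [1 0; 0 0; 0 1];
b = [1; 1; 1];

x_ls  = (A'*A) \ (A'*b)     % expected: [1; 1]
Ax_ls = A*x_ls              % expected: [1; 0; 1], the closest vector to b in range(A)
err   = b - Ax_ls           % expected: [0; 1; 0]
lse   = norm(err)           % expected: 1
```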

Now we return to the problem with skiers on Copper Mountain. The data are given in the table above, so we build the linear system of equations using the function
    f(s) = x0 + x1 s,
where s is the number of skiers and f(s) is the number of injuries. Note: this is a linear model.

The system Ax⃗ = b⃗, where x⃗ = [x0, x1]ᵀ, is given by

$$\begin{bmatrix} 1 & 2402 \\ 1 & 2742 \\ 1 & 2002 \\ 1 & 3210 \\ 1 & 4086 \\ 1 & 4514 \\ 1 & 6370 \\ 1 & 4557 \\ 1 & 4051 \\ 1 & 8086 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 61 \\ 55 \\ 50 \\ 61 \\ 62 \\ 61 \\ 91 \\ 58 \\ 61 \\ 144 \end{bmatrix}$$
Multiplying both sides of the equation by Aᵀ gives

$$\begin{bmatrix} 10 & 42020 \\ 42020 & 207809010 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 704 \\ 3377399 \end{bmatrix}$$

Building the augmented matrix and finding the RREF yields values for x0 and x1:

$$\begin{bmatrix} 1 & 0 & 14.018 \\ 0 & 1 & 0.013423 \end{bmatrix}$$

Thus, the least squares function is given here, followed by a graph of the data and the function:
    f(s) ≈ 14.018 + 0.013423 s
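The same coefficients can be obtained in MATLAB. The sketch below (variable names skiers, injuries, A, and x_ls are ours) builds the coefficient matrix with a column of ones and a column of skier counts and solves the normal equations.

```matlab
% Linear least squares fit of injuries vs. skiers.
skiers   = [2402; 2742; 2002; 3210; 4086; 4514; 6370; 4557; 4051; 8086];
injuries = [61; 55; 50; 61; 62; 61; 91; 58; 61; 144];

A = [ones(size(skiers)) skiers];   % columns correspond to x0 and x1
b = injuries;
x_ls = (A'*A) \ (A'*b)             % should be close to the text's 14.018 and 0.013423
```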

Continuing, what happens when we substitute the solution x⃗_ls = [x0, x1]ᵀ ≈ [14.018, 0.013423]ᵀ back into the system Ax⃗ = b⃗? The resulting vector Ax⃗_ls is the vector in the span of the columns of A that is closest to the vector b⃗, i.e. it is the best approximation. Following is the actual calculation.

$$A\vec{x}_{ls} \approx \begin{bmatrix} 1 & 2402 \\ 1 & 2742 \\ 1 & 2002 \\ 1 & 3210 \\ 1 & 4086 \\ 1 & 4514 \\ 1 & 6370 \\ 1 & 4557 \\ 1 & 4051 \\ 1 & 8086 \end{bmatrix} \begin{bmatrix} 14.018 \\ 0.013423 \end{bmatrix} \approx \begin{bmatrix} 46.3 \\ 50.8 \\ 40.9 \\ 57.1 \\ 68.8 \\ 74.6 \\ 99.5 \\ 75.1 \\ 68.4 \\ 122.6 \end{bmatrix}$$

This vector is the best approximation to b⃗ = [61, 55, 50, 61, 62, 61, 91, 58, 61, 144]ᵀ, and the error vector is

$$\vec{err} = \vec{b} - A\vec{x}_{ls} = \begin{bmatrix} 61 \\ 55 \\ 50 \\ 61 \\ 62 \\ 61 \\ 91 \\ 58 \\ 61 \\ 144 \end{bmatrix} - \begin{bmatrix} 46.3 \\ 50.8 \\ 40.9 \\ 57.1 \\ 68.8 \\ 74.6 \\ 99.5 \\ 75.1 \\ 68.4 \\ 122.6 \end{bmatrix} = \begin{bmatrix} 14.7 \\ 4.2 \\ 9.1 \\ 3.9 \\ -6.8 \\ -13.6 \\ -8.5 \\ -17.1 \\ -7.4 \\ 21.4 \end{bmatrix}$$

The least squares error is ‖b⃗ − Ax⃗_ls‖ = ‖err⃗‖. For this problem ‖err⃗‖ ≈ 38.
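This error computation is easy to verify in MATLAB. The sketch below (variable names are ours) substitutes the coefficients quoted above back into the system and computes the best approximation, the error vector, and the least squares error.

```matlab
% Substitute the linear-model coefficients back into the system.
skiers   = [2402; 2742; 2002; 3210; 4086; 4514; 6370; 4557; 4051; 8086];
injuries = [61; 55; 50; 61; 62; 61; 91; 58; 61; 144];
A = [ones(size(skiers)) skiers];

x_ls   = [14.018; 0.013423];   % coefficients of the linear model from the text
approx = A*x_ls;               % best approximation to the injury data
err    = injuries - approx;    % error vector
lse    = norm(err)             % least squares error, roughly 38
```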
The same procedure can be used to minimize the sum of squares error for a quadratic function.
Because the linear model provides only marginal predictive ability, we advance to a quadratic
model which has 3 degrees of freedom. Consider
    q(s) = x0 + x1 s + x2 s²
and build the system of linear equations, Ax⃗ = b⃗, from the data using x⃗ = [x0, x1, x2]ᵀ. The system is

$$\begin{bmatrix} 1 & 2402 & 5769604 \\ 1 & 2742 & 7518564 \\ 1 & 2002 & 4008004 \\ 1 & 3210 & 10304100 \\ 1 & 4086 & 16695396 \\ 1 & 4514 & 20376196 \\ 1 & 6370 & 40576900 \\ 1 & 4557 & 20766249 \\ 1 & 4051 & 16410601 \\ 1 & 8086 & 65383396 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 61 \\ 55 \\ 50 \\ 61 \\ 62 \\ 61 \\ 91 \\ 58 \\ 61 \\ 144 \end{bmatrix}$$
In reality this system is badly scaled and produces a least squares system, AᵀAx⃗ = Aᵀb⃗, that cannot be solved accurately in standard 64-bit (double-precision) arithmetic. However, the problem can easily be scaled to improve its solvability. We will not discuss the scaling details here. Instead, we simply build the least squares system by applying the transpose of the coefficient matrix, Aᵀ.
$$\begin{bmatrix} 10 & 42020 & 207809010 \\ 42020 & 207809010 & 1184046347504 \\ 207809010 & 1184046347504 & 7527999527654179 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 704 \\ 3377399 \\ 19185675699 \end{bmatrix}$$

Solving the scaled system (not this one) yields the desired quadratic model. The RREF for the scaled system yields values for the unknown coefficients:

$$\begin{bmatrix} 1 & 0 & 0 & 83.298 \\ 0 & 1 & 0 & -0.019364 \\ 0 & 0 & 1 & 0.0000032948 \end{bmatrix}$$

Using the coefficient values we build the model:
    q(s) = 83.298 − 0.019364 s + 0.0000032948 s²
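One way to sidestep the scaling issue in MATLAB, sketched below, is to apply the backslash operator directly to the rectangular matrix A: for a rectangular system, backslash computes the least squares solution through an orthogonal (QR-type) factorization rather than forming AᵀA. This is not the scaling approach described above, but it should produce coefficients close to those in the text. Variable names are ours.

```matlab
% Quadratic least squares fit without forming the badly scaled normal equations.
skiers   = [2402; 2742; 2002; 3210; 4086; 4514; 6370; 4557; 4051; 8086];
injuries = [61; 55; 50; 61; 62; 61; 91; 58; 61; 144];

A = [ones(size(skiers)) skiers skiers.^2];   % columns correspond to x0, x1, x2
x_ls = A \ injuries                          % least squares solve; expect values near
                                             % 83.298, -0.019364, 0.0000032948
q = @(s) x_ls(1) + x_ls(2)*s + x_ls(3)*s.^2; % the fitted quadratic model
```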
A graph of the data with the quadratic model is provided. This model is clearly better, and we can determine the best approximation and the error vector:

$$A\vec{x}_{ls} \approx \begin{bmatrix} 55.8 \\ 55.0 \\ 57.7 \\ 55.1 \\ 59.2 \\ 63.0 \\ 93.6 \\ 63.4 \\ 58.9 \\ 142.2 \end{bmatrix}, \qquad \vec{err} \approx \begin{bmatrix} 5.20 \\ 0.026 \\ -7.75 \\ 5.91 \\ 2.81 \\ -2.03 \\ -2.64 \\ -5.48 \\ 2.08 \\ 1.85 \end{bmatrix}$$

Calculating ‖err⃗‖ gives the least squares error, i.e. ‖err⃗‖ ≈ 13.4.
The quadratic model gives its worst estimate of
the data at a specific data point, (2002, 50). The
estimated data value is (2002, 57.7), and the error
at that point is approximately −7.75.

The least squares method can be used to provide a wide variety of models. If a model has unknown coefficients that appear in a linear way, the least squares equation AᵀAx⃗ = Aᵀb⃗ for minimizing the error will work. For example, consider the model
    g(s) = x3 e^(0.0001 s) + x2 s² + x1 s + x0.
Because the coefficients appear in a linear way, this model has a least squares solution. A graph of the least squares solution using g(s) is included.
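As a sketch of how such a model is fit in MATLAB (the coefficient values are not given in the text, so none are asserted here; variable names are ours), one simply adds a column of e^(0.0001 s) values to the coefficient matrix.

```matlab
% Least squares fit of g(s) = x3*e^(0.0001 s) + x2*s^2 + x1*s + x0.
skiers   = [2402; 2742; 2002; 3210; 4086; 4514; 6370; 4557; 4051; 8086];
injuries = [61; 55; 50; 61; 62; 61; 91; 58; 61; 144];

A = [ones(size(skiers)) skiers skiers.^2 exp(0.0001*skiers)];  % columns: x0, x1, x2, x3
x_ls = A \ injuries;           % least squares coefficients [x0; x1; x2; x3]
g = @(s) x_ls(1) + x_ls(2)*s + x_ls(3)*s.^2 + x_ls(4)*exp(0.0001*s);
```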

EXERCISES
MATLAB: Create the script leastsquaresYourLastName.m and use it to store these problems.
Use the following data set on all of the problems below:
Time 1 2 3 4 5 6 7 8 9 10 11 12
Distance 5.9 6.7 18.4 25.3 26.0 33.7 44.8 31.3 35.6 50.4 40.0 60.4
1) Use the least squares method to find the best linear model (i.e., f(t) = x0 + x1 t) for the data by completing these tasks:
a) Build the inconsistent system of linear equations, Ax⃗ = b⃗, using the model/data provided.
b) Convert to the associated consistent system of linear equations, AᵀAx⃗ = Aᵀb⃗.
c) Determine the coefficients of the model for the data set.
d) Find the best approximation to the data.
e) Find the error vector between the actual data and the best approximation.
f) Find the least squares error.
g) Find the data point for which the model provides its worst estimate.
h) Graph the data and the model.
2) Repeat Problem 1 but find the best quadratic model (i.e., q1(t) = x0 + x1 t + x2 t²) for the data.
3) It may make sense that distance = 0 when time = 0. Repeat Problem 1 but find the best quadratic model through the origin (i.e., q2(t) = x1 t + x2 t²) for the data.
4) Repeat Problem 1 but find the best cubic model (i.e., c(t) = x0 + x1 t + x2 t² + x3 t³) for the data. Can this model be forced through the origin?
5) Repeat Problem 1 but find the best logarithmic model (i.e., h(t) = x0 + x1 ln(t)) for the data. Can this model be forced through the origin?
