MITOCW - MITRES - 18-007 - Part4 - Lec1 - 300k.mp4: Herbert Gross
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
HERBERT GROSS: Hi. As I was standing here wondering how to begin today's lesson, an old story came to mind of the professor who passed out an examination to his class, and one
of the students said, "Professor, this is the same test you gave us last week". And
the professor said, "I know, but this time I changed the answers." And I was thinking
of this in terms of the fact that much of the new mathematics is essentially the old
mathematics with some of the answers changed.
One of the topics that we used to belittle in the traditional curriculum, because it was too easy, was the topic called linear equations. And it turns out that in the study of modern mathematics, linear equations have taken on a new importance.
I could've called today's lesson something old, something new, meaning that the old topic that we were going to revisit would be that of linear functions, and the new topic would be how it manifests itself in the modern curriculum, in the sense that one introduces a subject called linear algebra, or matrix algebra, as a standard portion of a modern calculus course, whereas in the traditional calculus courses it essentially never appeared.
Instead I picked a more conservative title for today's lesson, I simply call it "Linearity
Revisited". And as I say, it goes back to when we were in junior high school or high
school, when we were taught that linear functions were very nice. For example,
given the equation y equals mx plus b-- the linear equation meaning what? It graphs as a straight line, and the two variables are related linearly: y is a constant multiple of x, plus a constant. We were told to solve for x in terms of y. And what we found was that if y equals mx
plus b, this was true if and only if x was equal to y minus b over m. What we showed was that to a given value of x there corresponded a unique value of y, and conversely, to a given value of y there corresponded a unique value of x.
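In symbols-- my summary here, not the professor's words-- the claim is that, provided m is not 0,

$$y = mx + b \iff x = \frac{y - b}{m},$$

so a linear function with m not equal to 0 is invertible, and its inverse is again linear.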
And to put this into the language of functions, what we were saying was that if f(x)
equals mx plus b, then f inverse exists. In other words, what we're saying is that no
two different x values can give you the same y value if the function has the form y
equals mx plus b. And just about the time that we were learning to enjoy this kind of
an equation our dream world was shattered, and we were told it's too bad, but most
functions aren't linear.
We were given things like y equals x to the seventh plus x to the fifth, and we found
that we couldn't solve for x very conveniently in terms of y. And that's what began
our intermediate algebra and advanced algebra courses. In other words, the fact
that most functions are non-linear. Now an interesting thing occurred though. Let
me just emphasize this. And this is the key point.
In terms of calculus, we discovered-- and here's a key word coming up-- most functions are locally linear. Now that sounds a little bit like a tongue twister, but actually we saw this back in the first part of the course, when we talked about delta y sub tan. We were saying that to study f(x) near x equals a, we saw that f(a + delta x) minus f(a) was equal to f prime of a times delta x, plus k delta x, where the limit of k, as delta x went to 0, was 0 itself-- provided, of course, that f was differentiable at x equals a; otherwise you couldn't write down f prime of a here.
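Written out-- my transcription of the board equation-- this says:

$$\Delta f = f(a + \Delta x) - f(a) = f'(a)\,\Delta x + k\,\Delta x, \qquad \lim_{\Delta x \to 0} k = 0.$$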
The interesting point is this. If you look just at the first term over here, it expresses delta f as a linear function of delta x. The part that makes this thing non-linear is the term called k delta x, but that's the term that's going to 0 as a second-order infinitesimal. So what we're really saying is this: provided that f is differentiable at x equals a, then locally-- meaning near x equals a-- we can say that delta f is approximately f prime of a times delta x. That's what we call delta f sub tan, recall. And what we mean by approximately here
is that error k delta x goes to 0 very, very rapidly as delta x goes to 0. And what we
mean by locally is this: suppose f prime exists also when x equals b. We can again compute delta f near x equals b. Now delta f is equal to what? Approximately f prime of b times delta x, plus that error term which goes to 0 very rapidly.
We again call this thing here delta f tan, but the thing to keep in mind is that since f prime of a need not equal f prime of b, delta f tan is different at a and at b. In other words, even though it's always true, where f is differentiable, that delta f is approximately delta f tan, the value of delta f tan depends on the value of x that we're near. And that's what we mean by saying that approximating delta f by delta f tan is only a local approximation.
Now I think that sometimes by putting these things into words it sounds harder than
it really is. So I think what might be nice is if we just look at a specific illustration, a
problem which I deliberately picked to be as simple a nonlinear example as I can
think of.
Let me come back to our old friend, the function f(x) equals x squared, which as I
say, is about as simple a non-linear function we can get into. Now we know that f(x)
equals x squared plots as the curve y equals x squared, the parabola. Let's take a
couple of points on this parabola-- say the point (1, 1) and the point (2, 4)-- and draw in the tangent lines to the curve at these two points. And we know what? That
the equation of the tangent line to the curve at (1, 1) is y minus 1 over x minus 1
equals the slope. Since y is equal to x squared the slope is 2x, when x is 1 the slope
is 2. So the equation of this tangent line is given by y minus 1 over x minus 1 equals
2.
Similarly, the equation of the tangent line to the curve at (2, 4) is y minus 4 over x minus 2 equals 4. So now I've introduced three functions that I can talk about: my original function, f(x) is x squared; the straight line tangent at (2, 4), which, solving for y in terms of x, is the linear function g(x) equals 4x minus 4; and the straight line tangent at (1, 1), which corresponds to the function h(x) equals 2x minus 1.
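A quick numeric sanity check-- my own sketch in Python, not something from the lecture-- shows the tangent lines tracking the parabola:

```python
# Compare f(x) = x^2 with its tangent lines: h at (1, 1), g at (2, 4).

def f(x): return x * x        # the non-linear function
def h(x): return 2 * x - 1    # tangent line at x = 1
def g(x): return 4 * x - 4    # tangent line at x = 2

for dx in (0.1, 0.01, 0.001):
    err_h = f(1 + dx) - h(1 + dx)   # error near x = 1
    err_g = f(2 + dx) - g(2 + dx)   # error near x = 2
    print(dx, err_h, err_g)         # both errors are exactly dx**2
```

In both cases the error is dx squared: it goes to 0 like a second-order infinitesimal, which is the sense in which each tangent line can replace the curve locally.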
Now the interesting point, of course, is that these two functions here are linear.
They are completely different functions. Notice not only pictorially are they different,
but algebraically their slopes are different and their y-intercepts are different. Back in part one of our course, we talked about this geometrically, saying that near the point of tangency, the tangent line serves as a good approximation to the curve itself.
What were we really saying then? What we were saying was that near the point of
tangency, g(x), which was a linear function, could replace f(x) which was a nonlinear
function.
Of course, when we moved too far away from a given point, then if we wanted to say that f(x) still had a linear approximation, we had to pick a different linear function. By the
way, again because we were dealing with one independent variable and one
dependent variable, it was very easy to invent the concept of a graph. As we shall show in a little while, the concept of linearity extends to several variables, but you can't
draw the graph as nicely.
So let me now revisit the same result here, only without reference to the graph.
What we're saying is that our function is mapping the real number line into the real
number line. In other words instead of putting x and y at right angles to each other,
let's put x and y horizontally, parallel to one another. And what we're saying is that f maps the interval from 0 to 2 on the x-axis onto the interval from 0 to 4 on the y-axis.
Now what does h do? Remember, h is the function 2x minus 1. h maps the interval from 0 to 2 onto the interval from minus 1 to 3. And you see, this is all this diagram means: f maps 0 into 0, it maps 1 into 1, it maps 2 into 4. f is the function which
squares the input to yield the output. And correspondingly, h maps 0 into minus 1, it
maps 1 into 1, and it maps 2 into 3.
Now the interesting point is that f and h are very different. In fact, the only time f and
h have the same output is when x equals 1. Which of course we knew from before, because of how h(x) was constructed: h(x) was constructed to be the line tangent to the parabola y equals x squared at the point x equals 1, y equals 1. So that should be no great surprise.
But if we didn't know that, notice that algebraically we could equate f(x) to h(x) and conclude, therefore, that x squared must equal 2x minus 1. We then transpose, and get that the quantity x minus 1, squared, must be 0, whence x must equal 1. And what we have is that near x equals 1, x squared "behaves like" 2x minus 1-- and I put this in quotation marks because making that precise is the hardest part of the course that's going to follow.
And what we mean by that is this-- at least in terms of a picture. If I pick a small
interval surrounding x equals 1 on the x-axis, and a small interval-- like a thick dot--
surrounding y equals 1 on the y-axis here. Then as a mapping from this domain into
this range, I can essentially not distinguish f from h.
The error is so small that as the size of the interval shrinks, the error goes to 0 even
faster. And therefore, if I stay close enough locally to the point in question-- I cannot
tell the difference between the non-linear function and the linear function.
But what I have to be careful about is this: whereas x squared can be replaced by 2x minus 1 near x equals 1, near x equals 2, x squared can be replaced only by 4x minus 4.
You might say, well look, don't these two straight lines intersect at a particular point? The answer is yes, they do. But even at the point where they intersect, there is no neighborhood in which these lines can serve as approximations for one another.
Those are two straight lines that intersect at a constant angle, and as soon as you leave the point of intersection there is a significant error-- meaning an error which does not go to 0 more rapidly than the change in x. You don't have that higher-order infinitesimal over here. At any rate, leaving this to the exercises and the supplementary notes for you to get more out of, in summary let's just say this: if f is continuously differentiable at x equals a, then locally-- meaning near x equals a-- f behaves linearly.
f(x) is approximately f(a) plus f prime of a times the quantity x minus a. And you see, once a is chosen, this is a number and this is a number; delta x-- that is, x minus a-- is the only variable on the right-hand side.
So what we're saying is that f(x) is what? A linear function of delta x. And since this is all review, the more interesting point is this: we did this simply to refresh your memories as to how linearity was playing a big role in calculus of a single variable.
Now what we're going to do is extend the result to several variables. Let me just say
that at the outset. That this concept does extend to n variables, but n equals 2
yields a particularly good geometric insight.
For example, let's suppose I look at two equations in two unknowns. Well actually, I'll use u and v instead; let those be the variables. We can also think of this as a function: I have u(x, y) is x squared minus y squared, whereas v(x, y) is 2xy. Notice that these are not linear, because here we have things appearing to the second power-- squares-- and here we have what? The variables multiplying one another. These are not linear equations, but the beautiful point-- if you look at it this way-- is
even without a picture, I can think of this as a mapping which maps two dimensional
space into two dimensional space. And how does this mapping take place? It maps
the point or the pair, or the 2-tuple-- whichever way you want to say it-- (x, y) into
the 2-tuple (u, v), where u is x squared minus y squared, and v is 2 xy.
In other words, f-bar-- and notice I put the bar underneath simply to indicate that E2
is a vector space, and we have a function that's mapping what? A vector into a
vector, so I indicate that f is a vector function here. It maps a vector into a vector.
And how does the mapping take place? It maps the 2-tuple (x, y) into the 2-tuple (x squared minus y squared, 2xy)-- that is, into (u, v).
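In symbols-- my shorthand for what's on the board-- the mapping is:

$$\bar{f} : E^2 \to E^2, \qquad \bar{f}(x, y) = (x^2 - y^2,\ 2xy) = (u, v).$$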
Now the thing is that as long as we only have n equals 2, we can still draw a picture,
but not a picture as nice as what existed when n was equal to 1. See, pictorially, f-bar maps the xy-plane into what we can call the uv-plane. And notice that since the domain of f-bar has two degrees of freedom-- a two-dimensional vector space-- the domain of f-bar is the entire xy-plane, whereas the range of f-bar is the entire uv-plane.
In other words, I can now view f-bar as a mapping which carries points in the xy
plane into points in the uv plane, and this will be exploited more later in the course,
but the idea is this. Let's take a look for the time being. Let's see what f-bar does to
the point (2, 1).
Remember u is x squared minus y squared, so at the point (2, 1), u becomes what?
2 squared minus 1 squared, which is 3. On the other hand, 2xy is 2 times 2 times 1,
which is 4. So f-bar can be viewed as mapping the point (2, 1) into the point (3, 4).
Now you recall that calculus isn't interested so much in what's happening at a particular point as near the point. In other words, the natural question is: what is f-bar of (2 + delta x, 1 + delta y), when delta x and delta y are quite small? That's the local question.
What we're saying is we know that (2, 1) maps into (3, 4). We also know or we'd like
to believe that a point near (2, 1) maps into a point near (3, 4). Well if we call this
point (2 + delta x, 1 + delta y), then the corresponding image over here should be (3 + delta u, 4 + delta v).
What we can say is that whatever the image of (2 + delta x, 1 + delta y) is it has the
form (3 + delta u, 4 + delta v), and all we have to do is find delta u and delta v. This
is the pictorial idea of what's happening. Now the point is that delta u and delta v are
very difficult to find exactly; after all, u and v are non-linear functions, and to invert them is not easy. The thing that's easy to find is delta u tan and delta v tan. Remember, delta u tan was the partial of u with respect to x times delta x, plus the partial of u with respect to y times delta y. Since u is x squared minus y squared, that means delta u tan is 2x delta x minus 2y delta y. We're interested in this at the point (2, 1).
Letting x be 2 and y be 1, we see that delta u tan is 4 delta x minus 2 delta y. Since v is equal to 2xy, the partial of v with respect to x is 2y, and the partial of v with respect to y is 2x. Therefore, delta v sub tan is 2y delta x plus 2x delta y. Since we're evaluating this at x equals 2, y equals 1, we see that delta v tan is 2 delta x plus 4 delta y.
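Here is a small numeric check-- my sketch, not part of the lecture-- that these linear parts really do capture delta u and delta v near (2, 1):

```python
# Compare the true changes in u = x^2 - y^2 and v = 2xy near (2, 1)
# with the linear parts delta u tan = 4 dx - 2 dy and
# delta v tan = 2 dx + 4 dy computed in the lecture.

def u(x, y): return x * x - y * y
def v(x, y): return 2 * x * y

x0, y0 = 2.0, 1.0
for dx, dy in ((0.1, 0.05), (0.01, 0.005), (0.001, 0.0005)):
    du = u(x0 + dx, y0 + dy) - u(x0, y0)   # true change in u
    dv = v(x0 + dx, y0 + dy) - v(x0, y0)   # true change in v
    err_u = du - (4 * dx - 2 * dy)         # equals dx**2 - dy**2
    err_v = dv - (2 * dx + 4 * dy)         # equals 2 * dx * dy
    print(err_u, err_v)                    # second-order small
```

Both errors shrink like the square of the step size-- exactly the second-order behavior being described.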
Now here's the key point. This is always delta u tan; this is always delta v tan. Where the local part comes in is that we know that, because u and v are continuously differentiable functions of x and y, near the point (2, 1) we can replace delta u by delta u sub tan and delta v by delta v sub tan, and we wind up with what? Delta u is approximately 4 delta x minus 2 delta y, and delta v is approximately 2 delta x plus 4 delta y. But the key point now is that this is a system of linear equations. You see, delta u is a linear combination of delta x and delta y, and delta v is also a linear combination of delta x and delta y.
Notice how linear systems come into play. Now I've been emphasizing the case n equals 2 just so we could draw a picture, but notice that the same thing happens no matter how many variables we have. In fact, let me just summarize this in terms of x and y first, and then we'll generalize it to n variables in a minute.
The key point is this for two variables-- and what happens for two variables happens for any number. As we've often done in this course, we emphasize the two-variable case because we can still visualize the picture, even though the graph idea is hard to see, because we're mapping two dimensions into two dimensions.
But at least the domain and the range are easy to see separately. Now, if u is a continuously differentiable function of x and y near the point (x0, y0), then delta u is exactly the partial of u with respect to x times delta x, plus the partial of u with respect to y times delta y, plus an error term, k1 delta x plus k2 delta y, where k1 and k2 go to 0 as delta x and delta y go to 0.
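In symbols-- my transcription of the statement, with the partials evaluated at (x0, y0):

$$\Delta u = \frac{\partial u}{\partial x}\,\Delta x + \frac{\partial u}{\partial y}\,\Delta y + k_1\,\Delta x + k_2\,\Delta y, \qquad k_1, k_2 \to 0 \ \text{ as } \ (\Delta x, \Delta y) \to (0, 0).$$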
If we just look at this part alone, delta u is linear, up to this correction term-- an infinitesimal. And the reason I keep harping on this point is that no matter how
complex the theory gets in the rest of this particular block, the key step is always
going to be that when you have a continuously differentiable function, then-- as long as you stay local-- you can essentially throw away the nasty part.
You can essentially throw away this error term, because it goes to 0 so rapidly that if you stay close enough to the point (x0, y0), no harm comes from neglecting this term. What you must be careful about is that as soon as you pick a large enough neighborhood so that this term is no longer negligible, then even though this part here is still delta u sub tan, delta u sub tan is no longer a good approximation for delta u. And the same result holds for any number of variables: we can let w be a continuously differentiable function of the n variables x1 up to xn.
As is mentioned in the text-- I don't remember whether we've mentioned this in previous lectures or not-- it's rather interesting that when you deal with more than three independent variables, we somehow don't like to use the name delta w sub tan.
Instead we replace the word tangent by lin as an abbreviation for linear. The key
point being what? That this thing that we call delta w sub lin, or if you like to call it
sub tan, what's in the name? Call it whatever you want. The point is that this thing
that we call delta w sub lin, or delta w sub tan, is the partial of f with respect to x1, evaluated at a-bar, times delta x1, plus terms like that all the way up to the partial of f with respect to xn, evaluated at a-bar, times delta xn.
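Written compactly-- my notation, with a-bar the chosen point (a1, ..., an):

$$\Delta w_{\text{lin}} = \frac{\partial f}{\partial x_1}(\bar{a})\,\Delta x_1 + \cdots + \frac{\partial f}{\partial x_n}(\bar{a})\,\Delta x_n.$$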
And the key point is that once you have chosen a specific point a-bar, notice that the coefficients of delta x1 up to delta xn are numbers. They're not variables; they are numbers once a-bar is chosen. So what is delta w lin-- why do we call it linear? Notice that this expression here is a linear combination of delta x1 up to delta xn.
In other words, they're what? Sums of terms, each involving a delta x times a constant. What we're saying is that nice functions-- and what's a nice function? A nice function is one which is continuously differentiable-- a nice function near a particular point can be approximated by a linear function, where the error will be
very small as long as you stay near the point in question.
You remember at the beginning of my lecture I said something old, something new.
This finishes the old part of the course. In other words, what I've tried to motivate for
you here is why, if we were remodeling the pre-calculus curriculum, much more
emphasis should be paid to linear equations. Granted that most functions in real life
are non-linear, the point remains that locally, functions are linear. OK?
That's the key point. Locally we deal with linear functions. Therefore, since all non-
linear functions may be viewed as being linear locally, this motivates why we should
really study systems of linear equations. In other words, this motivates the subject
called linear systems. Now what is a linear system? Essentially a linear system is m
equations in n unknowns.
In many cases m and n are taken to be equal, but what kind of equations are they?
They are equations where all the variables appear separately, to the first power, multiplied only by constants. And by the way, let me introduce this double-subscript notation, rather than introducing umpteen different symbols for the constants.
Notice that a very nice device here is to pick one symbol like an a, and then use two
subscripts. The first subscript tells you what row the coefficient is referring to, and the second one which column. In terms of the equations, the first subscript tells you which equation you're dealing with, and the second subscript tells you which variable the coefficient multiplies.
For example, this is what? This is the coefficient of x sub 1 in the first equation. This is the coefficient of x sub n in the first equation. This is the coefficient of x sub n in the m-th equation. Think of this as the row and the column, if you will. And what we're
saying then is that the solutions of this type of system of equations are really
controlled by the coefficients of the x's.
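For reference, here is the general shape of the system being described-- my reconstruction of the board:

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\ &\ \,\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m. \end{aligned}$$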
In other words, by the numbers a sub ij, where i and j take on-- well i takes on all
values from what? The number of rows. i goes from 1 to m, and j goes from 1 to n.
But the a's become very important, and this is what ultimately is going to motivate
what we mean by a matrix, but before I come to that let me give you just one
example of what I mean by saying that the equations are governed by the
coefficients of the x's, not by the constants on the right hand side.
By the way, notice the convention that when you have two equations with two
unknowns rather than call the unknowns x1 and x2, it's conventional to call the
unknowns x and y. Let's take a particularly simple system here-- x plus y equals b1,
x minus y equals b2. If we add these two equations, we get 2x is b1 plus b2, so x is one-half of b1 plus b2; if we subtract them, we get 2y is b1 minus b2, so y is one-half of b1 minus b2.
Notice that this tells us how to solve for x and y in terms of b1 and b2. Namely, to
find x you take half the sum of the two b's. To find y, you take half the difference.
Now certainly the solution depends on the values of b1 and b2. I'm not saying you
don't change the answers by changing the constants on this side. What I am saying
is that the structure by which you find the answers does not depend on b1 and b2;
What we're saying is no matter what b1 and b2 are in this particular problem, to find
x and y we take half the sum of the b's, and we take half the difference. In other
words, the solution depends on b1 and b2 numerically, but not structurally.
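As a tiny illustration-- my sketch, not the professor's-- the structure can be packaged as a recipe that works for every choice of b1 and b2:

```python
# The recipe for solving x + y = b1, x - y = b2 does not depend on
# the particular values of b1 and b2 -- only on the coefficients.

def solve(b1, b2):
    x = (b1 + b2) / 2      # half the sum of the b's
    y = (b1 - b2) / 2      # half the difference
    return x, y

print(solve(5, 1))     # (3.0, 2.0), since 3 + 2 = 5 and 3 - 2 = 1
print(solve(10, 4))    # (7.0, 3.0): different numbers, same structure
```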
Well the whole idea is this-- and this is what we so often do in mathematics.
Because the solution to our equations depends on the coefficients of the x's, we
somehow want to focus our attention on the coefficients. And we don't need the x's
in there, because we can sort of think of the x's as being a place value type of
situation. In other words, x1 can be thought of as labeling the first column, x2 the second column. The first equation can be thought of as the first row; the second
equation, the second row.
And what this motivates is a concept called an m by n matrix. Now this sounds like a very ominous term, an m by n matrix, but the point is that it's not ominous at all. In fact, the word matrix essentially indicates an array, and that's all this thing is. By an m by n matrix, we simply mean a rectangular array of numbers arranged in m rows and n columns.
In other words, the first number tells you the number of rows, and the second
number tells you the number of columns. Now there's certainly nothing logical about
that in terms of our game idea. Just memorize this, it's a rule of the game or a
definition. Somebody could've said, why didn't you give the columns first and then
the rows? Well we could've, but one of them had to come first.
And the convention is that one refers to the rows first, and then the columns. An m by n matrix is usually enclosed in parentheses or brackets-- it happens to be brackets right now. But if I write down this array-- what is it
now? [1, 1, 1; 1, -1, 2]. This is a rectangular array of numbers consisting of two rows and three columns-- a 2 by 3 matrix. Now again, we don't want to invent this thing vacuously; let's keep track of what this means in terms of a system of equations. Suppose, as in our first example, we have the system z1 = y1 + y2 + y3, z2 = y1 - y2 + 2y3. What is the matrix of coefficients here? Well, the matrix would be what? The coefficient of the first variable in the first equation is 1, of the second variable in the first equation is 1, of the third variable in the first equation is 1. You see? Second equation, first variable: coefficient is 1. Second equation, second variable: coefficient is -1.
Second equation, third variable: coefficient is 2. So using our matrix coding system, the matrix of coefficients would be what? [1, 1, 1; 1, -1, 2]. Which is exactly the
matrix that we wrote down over here. And to put this into a different perspective, to see what we're driving at, let's take a second example where we start out with three equations and four unknowns-- three linear equations in four unknowns-- and then we'll write the matrix for this afterwards. Suppose y1 = x1 + 2x2 + x3 + x4, y2 = 2x1 - x2 - x3 + 3x4, and y3 = 3x1 + x2 + 2x3 - x4. Then my first row would be [1, 2, 1, 1], my second row would be [2, -1, -1, 3], and my third row would be [3, 1, 2, -1].
Again, notice this: in this coding system, the number of rows corresponds to the number of equations, and the number of columns corresponds to the number of variables being combined linearly. To summarize this again, the
matrix of coefficients in our second example is the 3 by 4 matrix [1, 2, 1, 1; 2, -1, -1,
3; 3, 1, 2, -1].
Well again, let's recall that when we do mathematics, we don't like to introduce
notation for the sake of notation. Simply to be able to have a way of conveniently writing the coefficients, but not being able to use it efficiently, would be a rather stupid thing to do.
Why invent new notation if it's not going to help us effectively solve new problems?
This is why in mathematics we've been emphasizing the game idea whereby what
we really care about is structure. We care about structure, not about the terms
themselves. And to motivate what I'm driving at, let me return to examples one and
two. And bring up a question that has great impact-- and even if we don't appreciate
it right now in terms of a practical application, let's at least see what's happening.
You'll notice that if I look at these systems of equations over here, the first system tells me how to express z1 and z2 in terms of y1, y2 and y3. On the other hand, the second system of equations tells me how to express y1, y2, and y3 in terms of x1, x2, x3 and x4. Now, without belaboring the point, because the arithmetic is quite trivial here, a very natural question comes up next. Let's look at our old friend the chain rule again.
Since the z's are expressed in terms of the y's, and the y's are expressed in terms of the x's, it seems that by direct substitution I should be able to express the z's in terms of the x's: namely, I replace y1 by this linear combination of the x's, I replace y2 by this linear combination of the x's, and I replace y3 by this linear combination of the x's.
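For instance, carrying out the substitution for z1-- my arithmetic, using the two systems written down earlier:

$$z_1 = y_1 + y_2 + y_3 = (x_1 + 2x_2 + x_3 + x_4) + (2x_1 - x_2 - x_3 + 3x_4) + (3x_1 + x_2 + 2x_3 - x_4) = 6x_1 + 2x_2 + 2x_3 + 3x_4.$$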
I then combine the y's in terms of the x's as indicated here, and that should give me the z's in terms of the x's. Leaving that hopefully as a trivial exercise, we come to
the next example that I'd like to mention here, and that is suppose you were told to
express z1 and z2 in terms of x1, x2, x3 and x4. The point is that with the kind of arithmetic mentioned before, we could easily show that z1 = 6x1 + 2x2 + 2x3 + 3x4, and similarly for z2. The question is, could we have gotten this without going by way of the y's? In other words, is there a way of replacing the y's by the x's, and then finding the z's in terms of the x's, in a convenient, mechanical way that will save us many steps?
Not so much in these easy examples where you have 2 by 3, and 3 by 4 systems,
but cases where you might have 10 equations, and 10 unknowns. Or 10 equations
and 12 unknowns. And the answer is, there is a way. Of course you knew there was
going to be a way. Otherwise we wouldn't be leading up to it in this particular way,
and as so often happens, there is a real-life situation that motivates why we invent something called matrix algebra.
In terms of our present illustration, the chain rule that we were just talking about-- expressing the z's in terms of the y's, and then the y's in terms of the x's-- motivates what we mean by matrix "multiplication". And you may notice that I put "multiplication" here in quotation marks. The reason I put it in quotation marks is that
unfortunately the word "multiplication" has a connotation of multiplying numbers
together.
Don't think of it that way. Think of multiplication as meaning what? A way of combining two matrices to form another matrix. There's going to be no logic behind this other than one very famous piece of logic: namely, knowing what the answer is supposed to be in advance.
I remember when I was an undergraduate in college. The big type of humor that
was going around at that time was the idea of, somebody would give you the
answer, and you have to make up the question. Oh they were silly little things like, if
the answer to the question was 9w what was the question? And the question would
be, do you spell your last name with a V, Herr Wagner? And the answer would be 9w. And these were funny jokes at that time; I don't know whether they're funny now or not. But the funny point is this: this joke, which might not be that funny, is
exactly how we motivate definitions and rules in mathematics. We start with the answer, and then go back and make up the question. We know in advance that
somehow or other, the matrix that expresses the z's in terms of the y's is given by
this. And the matrix that expresses the y's in terms of the x's, is given by this matrix.
Somehow or other, what we would like to do is invent a way of combining these two matrices to give me the matrix that expresses this answer. In other words, I start out knowing what the answer is supposed to be: the matrix that expresses the z's in terms of the x's is the matrix whose first row is [6, 2, 2, 3], and whose second row is [5, 5, 6, -4]. In other words, the matrix would be what? [6, 2, 2, 3; 5, 5, 6, -4].
And without even looking at any mechanical rule, the question that comes up is: how can I invent a rule that will tell me how to multiply this 2 by 3 matrix by this 3 by 4 matrix to obtain this 2 by 4 matrix?
Now look in the notes, I'm going to do this in great detail. There will be many
exercises on this for you to sharpen your teeth on. But for now I just want to hit this
main point because the lecture is quite long. Your attention span probably is starting
to be taxed. And so I just want to show you what the recipe is because my feeling is
that this is something you have to hear before you can really read it without
becoming panicked by the notation.
The idea is this. First of all, to multiply two matrices, all we ever require is that the number of columns in the first matrix equals the number of rows in the second matrix. And if that sounds complicated to you, simply think in terms of the chain rule again. The number of columns in the first matrix tells you how many unknowns appear in the first system, and that number of unknowns gives you the number of equations in the second
system. In other words, the number of columns in the first matrix must match the
number of rows in the second matrix. Notice we don't care about the number of
rows in the first one matching the number of columns in the second, all we care is
that the number of columns in the first matrix-- namely three here-- match the
number of rows of the second, which is three.
Then the rule works in a very interesting mechanical way that makes use of the dot
product. Namely what you do is, suppose I want to find the term in the product of
these two matrices that occupies the second row, third column. What I do is I take
the second row-- in other words, I take the row that comes from the first matrix-- and I take the column from the second matrix. In other words, I have what? Second row,
third column. And I form the usual dot product that we've talked about. I dot the
second row with the third column.
In this case, that dot product is 1 times 1, plus minus 1 times minus 1, plus 2 times 2, which is 6. So in this product matrix, the term in the second row, third column will be 6.
Now leaving it as an exercise for the time being, and reading it in the supplementary
notes, I'm sure you'll be able to put this all together. It's not nearly as difficult as it sounds. I think the most difficult part is rationalizing why one would invent such a definition in
the first place. The answer is very simple: we invent the definition to solve a
particular problem. Coming back here again, all I'm saying is that if I invent-- for
example let me just give you one more checking out point here.
Let me see what the term would be in the first row, second column. To find the term
in the first row, second column, I take the first row of the first matrix. Dot it with the
second column of the second matrix. See first row dotted with second column, the
answer will give me what? The term in the product that's in the first row, second
column. Let's check that: 1 times 2, plus 1 times minus 1, plus 1 times 1 is 2, and the term in the first row, second column should be 2. It is. You see, there's no more motivation for how we
multiply these two matrices than the fact that it solves the problem that we want
solved.
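To make the recipe concrete, here is a small sketch in code-- my own illustration, using the two coefficient matrices from the examples:

```python
# A expresses the z's in terms of the y's; B expresses the y's in
# terms of the x's. Their product expresses the z's in terms of the x's.

A = [[1, 1, 1],            # 2 by 3
     [1, -1, 2]]
B = [[1, 2, 1, 1],         # 3 by 4
     [2, -1, -1, 3],
     [3, 1, 2, -1]]

def matmul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n  # columns of A must match rows of B
    # entry (i, j) is the i-th row of A dotted with the j-th column of B
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(p)]
            for i in range(m)]

print(matmul(A, B))        # [[6, 2, 2, 3], [5, 5, 6, -4]]
```

The result is exactly the 2 by 4 matrix of the z's in terms of the x's that we wrote down before.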
To find the term that's in the i-th row, j-th column of the product, dot the i-th row of
the first matrix with the j-th column of the second matrix. More generally, you can
always multiply an m by n matrix by an n by p matrix. What's the key factor? You
don't care about the number of rows in the first, you don't care about the number of
columns in the second. What you do care about is what?
That the number of columns in the first matrix be equal to the number of rows in the
second, and if you do that when you multiply an m by n matrix by an n by p matrix,
notice that the result will be what? An m by p matrix. In other words, the number of rows is governed by the number of rows in the first matrix, and the number of columns is governed by the number of columns in the second matrix.
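In modern subscript notation-- mine, not used in the lecture-- the whole rule reads:

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik}\,b_{kj}, \qquad i = 1, \dots, m, \quad j = 1, \dots, p,$$

where A is m by n and B is n by p, so that AB is m by p.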
Notice by the way, that this tells us right away that when we want to multiply two
matrices it makes a difference in which order that they're written. If we were to take
that 2 by 3 matrix, and the 3 by 4 matrix, and interchange them, we don't have the
appropriate match up of rows and columns. You can't dot a 2-tuple with a 4-tuple.
The very fact that we say dot the row with the column-- the dot product is only defined for two n-tuples where the n is the same for both.
Let me summarize today's lecture by saying that, in overview, hopefully we have re-established the need for linear systems of equations; and secondly, once we have understood what the need for linear systems is, we are now introducing a mechanism whereby we can solve linear systems more efficiently than the way we were taught to solve them in the past.
You see what I'm going to do for the next few lectures now is concentrate on a new
game called the game of matrix algebra. But that will unfold gradually as we develop
the next two lectures. And so until our next lecture, so long.
Funding for the publication of this video was provided by the Gabriella and Paul Rosenbaum Foundation. Help OCW continue to provide free and open access to MIT courses by making a donation at ocw.mit.edu/donate.