Linear Algebra for Physics
Nikolaos A. Papadopoulos · Florian Scheck
Springer (2024)
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2024
What is a matrix? What is linear algebra? Most of those who open this book already
have an idea or perhaps a good understanding of both.
When one first comes across linear algebra, one may naively think that it is a
standalone area of mathematics, distinct from the rest. On the other hand, matrices
seem to simply be rectangular arrays of numbers and the reader may think that they
already know everything there is to know about them. However, once the reader has
gone through the first chapters of this book, they will experience that matrices are
omnipresent throughout the book, and it will gradually become clear that matrices
and linear algebra are two sides of the same coin. In fact, matrices are what you
can do with them, and linear algebra itself is ultimately the mathematical theory of
matrices.
A physicist constantly uses coordinates, which essentially are also matrices. This fact is particularly pleasing for a physicist and could greatly facilitate access to, and understanding of, linear algebra.
So why is linear algebra important for physicists?
The well-known mathematician Raoul Bott stated that 80% of mathematics is
linear algebra. According to our own experience with physics, we would state that
almost 90% of mathematics in physics is linear algebra. Furthermore, according to
our experience with physics students, the most challenging subject in mathematics for
Bachelor students is linear algebra. Students usually have hardly any problem with
the rest of mathematics, such as, for example, calculus which is already known from
school. The main challenge with linear algebra seems to be that it is underestimated, both from the curriculum point of view and by the students themselves. The reason for this underestimation is probably the widespread idea that, linear algebra being “linear”, it is trivial and plain, easy to learn and use.
Our intention is therefore, among other things, to contribute with this book to remedying this asymmetric relation between linear algebra and the rest of mathematics.
Finally, we would like to add that with the advent of gauge theory, structures connected to symmetries became much more important for physics than they ever were before. This means that virtuosity in handling linear algebra is needed, since linear structures are one of the chief instruments for describing symmetries.
Our first and foremost Acknowledgment goes to Christiane Papadopoulos, who tire-
lessly and devotedly prepared the LaTeX manuscript with extreme efficiency, and
beyond. We also extend our gratitude to our students. One of us, N.P., particularly
thanks the numerous physics students who, over the past few years, have contributed
significantly to our understanding of the role of Linear Algebra in physics through
their interest, questions, and bold responses in several lectures in this area. Our thanks
also go to many colleagues and friends. Andrés Reyes contributed significantly to
determining the topics to be considered in the early stages of the book. Our physics
colleague, Rolf Schilling, provided extensive support and highly constructive criti-
cism during the initial version of the manuscript. Similar appreciation is extended
to Christian Schilling for several selected chapters. We vividly remember how much
we learned about current mathematics many years ago from Matthias Kreck. The
numerous discussions with him about mathematics and physics have left traces in
this book. The same applies to Stephan Klaus, Stephan Stolz, and Peter Teichner.
One of us, N.P., has benefited greatly from discussions with Margarita Kraus and
Hans-Peter Heinz. Equally stimulating have been the discussions and collaborations
with mathematician Vassilios Papakonstantinou, which have lasted for decades until
today. We would like to express our heartfelt gratitude to mathematician Athanasios
Bouganis, whose advice in the final stages of the book was crucial. Last but not least,
we would like to thank mathematician Daniel Funck from Durham University, who
significantly improved not only the linguistic quality of the book but also its overall
content with great care and dedication.
We also thank the staff of Springer Nature and, in particular, Ute Heuser, who
strongly supported this endeavor.
About This Book
In this book, we present a full treatment of linear algebra devoted to physics students, both undergraduate and graduate, since it contains parts which are relevant for both. Although the mathematical level is similar to that of comparable mathematical textbooks, with definitions, propositions, proofs, etc., here the subject is presented using the vocabulary corresponding to the reader's experience from their physics lectures.
This is achieved by the special emphasis given to the role of bases in a vector space.
As a result, the student will realize that indices, as many as they may be, are not
enemies but friends since they give additional information about the mathematical
object we are using.
The book begins with an introductory chapter, the second chapter, which provides
a general overview of the subject and its relevance for physics. Then, the theory is
developed in a structured way, starting with basic structures like vector spaces and
their duals, bases, matrix operations, and determinants. After that, we recapitulate
the role of indices in linear algebra and give a simple but mathematically accurate
introduction to tensor calculus.
The subject material up to Chap. 8 may be considered as the elementary part of linear algebra and tensor calculus. A detailed discussion of eigenvalues and eigenvectors is followed by Chap. 9 on operators on inner product spaces, which includes, among many other things, a full discussion of the spectral theorem. This is followed by a thorough presentation of tensor algebra in Chaps. 3, 8, and 14, which takes full advantage of the material developed in the first chapters, thus making the introduction of the standard formalism of multilinear algebra nothing else but a déjà vu.
Chapter 1 includes material that is usually left for the appendix. However, as we wanted to highlight the usefulness of this chapter, especially for physicists, we placed it as Chap. 1.
All chapters contain worked examples. The exercises and the hints are intended mainly for physics students. Our approach is in many regards quite different from the standard approach in the mathematical literature. We therefore hope that students of both physics and mathematics will benefit a great deal from it.
Where the organization of the book is concerned, the first eight chapters deal with what we would call elementary linear algebra and are therefore perfectly suitable
for bachelor students. It covers what is commonly needed in everyday physics. The
remaining chapters give a perspective and allow insights into what is interesting
and important beyond this. Hence the subjects of Chap. 9 up to the last one can be
considered as the advanced linear algebra part of the book.
Everything is written from a physicist’s perspective but respecting a stringent
mathematical form.
Chapter 1
The Role of Group Action
In this chapter, we briefly present some prerequisites for the book concerning mathematical structures in calculus and geometry.
We introduce and discuss in detail, especially for the benefit of physicists, the
notion of quotient spaces in connection with equivalence relations.
The last two sections deal with group actions, which are the mathematical face of
what we meet as symmetries in physics.
This chapter could be considered as an appendix, but we set it at the beginning of the book to point out its significance. For physicists, it can be skipped on a first reading.
In this chapter, we are dealing informally with the various mathematical structures
needed in physics. Some of these structures will often be introduced without proof
and used intuitively, as in the literature in physics. It is a fact that in physics, we
often have to rely on our intuition, sometimes more than we would like to. This may
cause difficulties, not only for mathematically oriented readers. We try to avoid this
as much as possible in the present book. Therefore, we will rely here on our intuition
no more than necessary, and be precise enough to avoid misunderstandings.
We treat set theory as understood; we here discuss only the notion of quotient
space. Quotient spaces appear in many areas of physics, but in most cases they are
not recognized as such. In this chapter, we will concentrate on general and various
aspects of group actions and the revision of some essential definitions. In this context,
we also introduce the notion of an equivariant map that respects the group actions of
the input and output spaces in a compatible way.
Usually, we call a set with some structure a space. So we can talk about topological
spaces, metric spaces, Euclidean and semi-Euclidean spaces, affine spaces, vector
spaces, tensor spaces, dual spaces, etc. Almost every set we meet in physics also has
a manifold structure. This is the case for the spaces mentioned above. Some examples of manifolds that the reader already knows are the three-dimensional Euclidean space we live in, any two-dimensional smooth surface, and any smooth curve (one-dimensional manifold) we may think of or see in it. We may also think of manifolds of any dimension k < n inside Rⁿ, which is itself the simplest manifold we can have in n dimensions.
In this book, we will talk freely about manifolds (smooth manifolds) without
defining them, and we expect the reader to know at least intuitively what we refer to.
We start with some general remarks that might be quite useful for many readers.
Equivalence relations first appear in real life when we want to talk about objects that
are not absolutely the same but show clear similarities. In this way, we can observe
rough structures more clearly and more precisely. We get new sets, often with far
fewer elements. In mathematics, such a set is called a quotient set and the elements
of this quotient set are called equivalence classes. Each such element, called an
equivalence class, is itself a special subset of the originally given set.
We could consider as an example from real life the set of the inhabitants of the
European Union. A possible equivalence relation exists if we consider the inhabitants
of each European state as equivalent. In this case, the quotient set is the set of
European states and the elements are the corresponding states Germany, France,
Cyprus and so on.
It would be surprising if we could not also apply, in mathematics and in particular in linear algebra, constructions that we use in everyday life constantly and often unconsciously. So, in the following, we describe this phenomenon and some of its consequences within the framework of mathematical formalism.
In mathematics, it is well-known that we can construct new sets with set operations
like, for example, union and intersection. However, building quotient spaces is a
much more complex operation than obtaining a new set or a new manifold by union
or intersection. This happens when we have to talk not about equal elements but
about equivalent elements. Here we use the equivalence relation to construct a new
topological space or a new manifold. As this approach to constructions of quotient
spaces may seem highly abstract, we have to be much more precise than we usually
would be in the standard physics literature (at least at the beginning) and not rely,
entirely on our physical or geometric intuition. Interestingly enough, we use many
quotient spaces in physics intuitively and often without being aware of the precise
mathematical situation. A prominent example occurs in special relativity, when we have to recover our three-dimensional Euclidean space from a given four-dimensional space of events (spacetime points). As is well-known, a point of the Euclidean space within this setup is not a point but a straight line in the four-dimensional space of events. This is, for example, the set of all the events, at all times, of a freely moving point
particle. Mathematically speaking, all the events on this straight line are equivalent elements (events) of the spacetime and form an equivalence class, or simply a class, or a coset. For example, the space point “London” is the equivalence class of all events along a straight line. The set of all such equivalence classes or cosets (parallel straight lines of free point particles) fills the whole spacetime. This set of equivalence classes is called the quotient space. So it is evident that the three-dimensional Euclidean space is described by a quotient space of the four-dimensional spacetime.
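For a concrete illustration, fix one inertial observer and use their coordinates (this choice of observer is the only extra assumption here); the equivalence relation behind the construction then reads: two events are equivalent if they occupy the same spatial position.
\[
X=\mathbb{R}^4,\qquad (t,\vec{x})\sim(t',\vec{x}\,') \;:\Longleftrightarrow\; \vec{x}=\vec{x}\,',
\]
\[
[(t,\vec{x})]=\{(s,\vec{x}): s\in\mathbb{R}\}\ \text{(a straight line in spacetime)},\qquad
X/\!\sim\;=\;\{[(t,\vec{x})]\}\;\cong\;\mathbb{R}^3 .
\]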
Such a construction is a mathematically precise approach to obtaining new spaces
out of given ones. Even if it seems quite abstract, for this procedure we need only
elementary mathematics from set theory. The usefulness of this formalism in many
applications justifies its introduction here. Later, we shall also use this approach to
show precisely that the basis dependent (component-wise with indices) tensor for-
malism is equivalent to the coordinate-free formalism. In this sense, tensor formalism
as mostly used in physics, can also be considered as a coordinate-free formulation.
Therefore, in this book, we try to use the coordinate-free formalism and the tensor
formalism at the same level and take advantage of both.
We first recall the definition of an equivalence relation: a relation ∼ on a set X is an equivalence relation if it is reflexive, symmetric, and transitive. For x ∈ X, the subset [x] := {x′ ∈ X : x′ ∼ x} is called the equivalence class or coset of x relative to the equivalence relation ∼. The new set of equivalence classes, which we call the quotient space of X determined by the equivalence relation ∼, is given by:

X/∼ := {[x] : x ∈ X}.

The map

π : X → X/∼, x ↦ π(x) := [x],
which may also be called canonical map, canonical projection or quotient map.
So we have again

π⁻¹([x]) = {x′ ∈ X : x′ ∼ x} ⊂ X.
We may call any element x′ ∈ π⁻¹([x]) ⊂ X a representative of the class [x]. If the elements x and x′ are different, for example because they have different properties, we ignore these differing features and consider x and x′ as essentially identical. So we may identify all the elements equivalent to x with x. We thus obtain a new object, a new element [x], and a new set X/∼ = {[x], . . . }, both with completely different properties than x and X = {x, . . . }.
It is clear that [x] is not an element of X, so we have [x] ∉ X. The set of representatives of [x], the fiber π⁻¹([x]), is a subset of X (i.e., π⁻¹([x]) ⊂ X). It is a slight misuse of notation when sometimes we mean π⁻¹([x]) and write “[x] ⊂ X”. Note again that for the element [x] ∈ X/∼ we may use different names: equivalence class, coset, or fiber (i.e., π⁻¹([x])). All this is demonstrated symbolically in Fig. 1.1.
In many cases, the quotient map .π is defined in such a way that . X and . X/∼
have the same algebraic (or geometric) structure. In this case, .π is a homomorphism
relative to the relevant structure. See for example the proposition in Sect. 2.6.
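A minimal worked example, with the integers in place of X and the relation “same parity”:
\[
X=\mathbb{Z},\quad x\sim x' :\Longleftrightarrow x-x'\in 2\mathbb{Z},\qquad
[0]=2\mathbb{Z},\quad [1]=1+2\mathbb{Z},
\]
\[
X/\!\sim\;=\{[0],[1]\},\qquad \pi(7)=[1],\qquad \pi^{-1}([1])=1+2\mathbb{Z}\subset X .
\]
Here the quotient map \(\pi\) is even a group homomorphism, \(\mathbb{Z}\to\mathbb{Z}/2\mathbb{Z}\).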
We are now going to give two very simple and essential geometrical examples
of quotient spaces. The first example corresponds to the special relativity case men-
tioned earlier in this section. The second one corresponds to a pure geometric case.
X = R² − {0⃗} = { x = ξ⃗ : ξ⃗ ∈ R² and ξ⃗ ≠ 0⃗ }.
So, as shown in Fig. 1.2, this is the two-dimensional plane without the zero
point. We denote with .R+ the positive numbers .R+ := {ξ ∈ R : ξ > 0}.
For each .x ∈ X , the x-ray . A(x) is given by . A(x) = R+ x.
We denote the equivalence class or coset of x by [x]. This can also be defined formally like this (for X = R² − {0⃗}):

x ∼ z :⟺ z = λx for some λ ∈ R₊, so that [x] := {z ∈ X : z ∼ x} = A(x).   (1.1)

In Eq. (1.1), we see that [x] is the set A(x), the ray which contains x.
We further see that for two equivalence classes the following is true: either they are
equal or they are disjoint. This means that for . A(x) and . A(z) either . A(x) = A(z) or
. A(x) ∩ A(z) = ∅. The quotient space . X/∼ is given by the set of rays:
. X/ ∼ = {[x]} = {A(x) : x ∈ X }.
We can also describe this by a set of suitable representatives. For example, we may
choose the circle with radius one . S 1 = {x ∈ X : ||x|| = 1} and the bijective map:
Φ : X/∼ → S¹, A(x) ↦ x/||x||.
So we determined for every equivalence class [x] ≡ A(x) one and only one representative, the point x/||x|| ∈ S¹ ⊂ R². That is, for each ray a single point on the circle, and so we get the bijection

X/∼ ≅ S¹.
There are of course innumerable such bijections which characterize the quotient
space, . X/ ∼, but . S 1 seems to be the most pleasant.
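For instance, the ray through the point \((3,4)\) is represented on the circle by
\[
x=(3,4),\qquad A(x)=\mathbb{R}_{+}(3,4),\qquad \Phi\big(A(x)\big)=\frac{x}{\|x\|}=\left(\tfrac{3}{5},\tfrac{4}{5}\right)\in S^1 .
\]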
G × G → G, (a, b) ↦ a ∗ b ≡ ab,

φ : G × X → X, (g, x) ↦ φ(g, x) ≡ gx,

id_X : X → X, x ↦ id_X(x) = x.
Even if it seems quite trivial, it turns out that the identity map, .id X , is a very
important map in mathematics and physics.
The map .φ above gives two families of partial maps which we denote by .(φg )G
and .(φx ) X .
φ_g : X → X, x ↦ φ_g(x) := φ(g, x) = gx,

and

φ_x : G → X, g ↦ φ_x(g) := φ(g, x) = gx.
We may call φ_g a g-transformation and φ_x an x orbit maker! There are two more corresponding maps φ̂ and φ̃. Using Trf(X) ≡ Bij(X) for the bijective maps on X, we have:
φ̂ : G → Trf(X), g ↦ φ̂(g) := φ_g : X → X.
The map, .φ̂, converts an abstract group element into a transformation .φg in . X . If we
denote the set of all maps between . X and .Y by . Map(X, Y ) = { f, . . . },
. f : X −→ Y
x −→ f (x) = y.
.φ̃ : X −→ Map(G, X )
x |−→ φ̃(x) := φx : G −→ X.
The map, .φ̃, converts the point .x into the map .φx , a kind of an “orbit maker”!
According to our convention, Trf(X) is the set of bijections on X so that we have Trf(X) ≡ Bij(X). The official name for Trf(X) is the symbol S(X), the group of permutations of X. The map φ̂ is a group homomorphism:

φ̂(gh) = φ̂(g) ∘ φ̂(h).
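The homomorphism property is a one-line consequence of the defining property \((gh)x=g(hx)\) of a left action:
\[
\hat{\varphi}(gh)(x)=(gh)x=g(hx)=\varphi_g\big(\varphi_h(x)\big)=\big(\hat{\varphi}(g)\circ\hat{\varphi}(h)\big)(x)
\quad\text{for all } x\in X .
\]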
ψ : X × G → X,

with

ψ̂ : G → Trf(X), g ↦ ψ̂(g).
So we write

G × X → X

when we consider a left action, as indicated by the order G × X: the group G acts from the left. In the case

Y × G → Y,

we consider a right action, as indicated by Y × G: the group G acts from the right.
It is important to realize that left and right actions correspond to two different
maps. In particular, as was shown above, a left action leads to a homomorphism
.φ̂(h) ◦ φ̂(g) = φ̂(hg). A right action leads to an antihomomorphism .Ψ̂(g) ◦ Ψ̂(h) =
Ψ̂(hg).
Ψ′ : X × G → X, (x, g) ↦ Ψ′(x, g) := g⁻¹x.
Since this leads to an antihomomorphism, .Ψ ' is a right action even if the group
element .g −1 acts on .x, as we see, from the left. We may also indicate this fact
by writing . R̄ := L g−1 . We will need this fact later.
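The antihomomorphism property of \(\Psi'\) can be checked directly:
\[
\big(\hat{\Psi}'(g)\circ\hat{\Psi}'(h)\big)(x)=g^{-1}\big(h^{-1}x\big)=(hg)^{-1}x=\hat{\Psi}'(hg)(x).
\]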
The left or right action is also relevant for what follows. It is obvious that for every given point x₀ ∈ X, the left group action leads, for example, to a left orbit, which we denote by Gx₀ and which is the subset of X given by:

Gx₀ := {gx₀ : g ∈ G} ⊆ X.

A subgroup J_{x₀} of G (that is, J_{x₀} < G) is connected with the orbit Gx₀ at the position x₀. This subgroup characterizes the orbit entirely, naturally together with G. This leads to the following definition.
The subgroup J_{x₀} := {g ∈ G : gx₀ = x₀} is called the isotropy group or stability group of G with respect to x₀. Different terms for the same thing sometimes indicate their importance.
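A concrete example: let, for instance, \(G=SO(2)\) act on \(X=\mathbb{R}^2\) by rotations. Then
\[
Gx_0=\{R_\theta x_0:\theta\in[0,2\pi)\}=\text{the circle of radius } \|x_0\|\quad (x_0\neq 0),\qquad J_{x_0}=\{e\},
\]
\[
G\,0=\{0\},\qquad J_{0}=SO(2)=G ,
\]
so the isotropy group may change from orbit to orbit.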
We will now give a definition of a few other essential types of action relevant
to some aspects of linear algebra and physics in general, for example, gravity and
especially cosmology.
The group G is said to act transitively on X if the orbit of one (and hence of every) point is the whole space. So we have Gx₀ = X. Equivalently, for any x and x′ in X there exists a g ∈ G such that x′ = gx.
Definition 1.8 Free action. The group G acts freely on X if for all x₀ ∈ X, the isotropy group J_{x₀} is trivial (i.e., J_{x₀} = {e}). In other words, G acts on X without fixed points.
Gx₀ ≅ G/H.

So we may consider the quotient space G/H as a model of the orbit Gx₀, in the same sense as Rⁿ may be considered as the model of a real vector space V with dim V = n. This is the first application of Sect. 1.2. The relevant equivalence relation is determined by a second group H with H < G. It turns out that this second group H is given, as expected, by the data of the orbit Gx₀ and is precisely the stability group J_{x₀}. So we have

H := J_{x₀}.
Given .G and . H as above, in order to obtain the quotient space .G/H , we have to
consider the right action of . H on .G:
G × H → G, (g, h) ↦ gh ≡ R_h g.
It is clear that . H acts freely on .G. This follows directly from the group axioms and
Definition 1.8 (Free action): If.gh = g, we also have.g −1 gh = g −1 g. Since.g −1 g = e,
it follows that .eh = e and .h = e and so, the isotropy group . Jg = {e} is trivial. This
means that . H acts freely on .G.
We now define the following equivalence relation on G: g′ ∼ g if and only if g′ = gh for some h ∈ H. So we have

[g] := {g′ : g′ = gh, h ∈ H} ≡ gH.

This means that g′ and g are in the same (right) H orbit gH in G.
This equivalence relation leads to the quotient space, as defined in Sect. 1.2:

G/H := {[g] : g ∈ G} = {gH : g ∈ G}.   (1.3)

Here the cosets [g] are H orbits in G. Since the action of H on G is free, every such H orbit in G is bijectively related to the subgroup H. So we get gH ≅ H (bijectively) for all g ∈ G, and G/H consists of all such H orbits. For this reason, G/H is also called the H orbit space in G, and we may draw it symbolically (Fig. 1.3).
This is one further example of a quotient space. Figure 1.3 and Eq. (1.3) also show
explicitly that the quotient space .G/H is given by the set of all positions .g H, g ∈ G
that the subgroup . H takes by the natural .G-action. This means nothing else but that
. G/H by itself is an orbit of the . G-action. Therefore . G/H is a homogeneous space.
At the same time, .G/H is generally a model for every .G orbit .Gxo in . X .
Summing up the above discussion, we can state that this type of quotient space
leads to a kind of universal relation: Every .G orbit in . X has the same structure and is
called a homogeneous space. So for a given .G orbit .Gxo ⊆ X there exists a subgroup
H of G (in fact, this subgroup H is the isotropy group J_{x₀}) such that

Gx₀ ≅ G/H.
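A standard example, relevant for physics: let \(G=SO(3)\) act on the sphere \(X=S^2\subset\mathbb{R}^3\). The action is transitive, the isotropy group of a point \(x_0\) consists of the rotations about the axis through \(x_0\), and we obtain
\[
J_{x_0}\cong SO(2),\qquad S^2=Gx_0\cong SO(3)/SO(2).
\]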
By assumption, . X is not a group but has the same “number” of elements, that
is, the same cardinality, as the group .G.
We use this fact in Sect. 3.2. If we work with all (linear) coordinate systems
simultaneously, it signifies that we are de facto coordinate-free. This is true for linear
algebra as well as for tensor calculus.
This proves that tensor calculus, in the basis dependent component formulation,
is completely equivalent to and not less valuable than the corresponding basis free
formulation.
Consider two G-spaces X and Y with the group actions

φ : G × X → X and ψ : G × Y → Y,

and a map F : X → Y between them. The interesting case occurs when we demand F to commute with the group actions φ and ψ. This leads to the notion of an equivariant map:

F ∘ φ_g = ψ_g ∘ F for all g ∈ G.
Note that, in the same sense in which the interesting maps between groups are the group homomorphisms, here, for G-spaces X and Y, the interesting maps are the equivariant maps.
This can also be expressed by the following commutative diagram:

        F
   X ────────→ Y
   │           │
  φ_g         ψ_g
   ↓           ↓
   X ────────→ Y
        F
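A simple example of an equivariant map, taking, for instance, \(G=SO(2)\) acting on \(X=Y=\mathbb{R}^2\) by rotations, is the scaling map:
\[
F:\mathbb{R}^2\to\mathbb{R}^2,\quad F(x):=2x,\qquad
F(gx)=2(gx)=g(2x)=gF(x)\quad\text{for all } g\in SO(2),
\]
since every rotation is a linear map and therefore commutes with scalar multiplication.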
Summary
Chapter 2
A Fresh Look at Vector Spaces
We start at the level of vector spaces, and we first consider quite generally a vector
space as it is given only by its definition, an abstract vector space.
We have to investigate, to compare, and use vector spaces to describe, whenever
this is possible, parts of the physical reality. The most appropriate way is to use maps
that are in harmony, that is, compatible with a vector space structure. We have to use
linear maps, also called vector space homomorphisms.
It turns out that the physical reality demands additional structures which we have
to impose on an abstract vector space. The most prominent structure of this kind is a
positive definite scalar product (special symmetric bilinear form), the inner product,
and in this way we obtain an inner product vector space or a Euclidean vector space
which is strongly connected with our well-known (affine) Euclidean space. We also have, of course, semi-Euclidean vector spaces, where the scalar product is no longer positive definite.
It is interesting that instead of adding structures, as in Sect. 2.3 with a symmetric bilinear form, we could also “subtract” structures from vector spaces. This means we can consider a vector space as a “special” manifold, a linear manifold, which is usually called an affine space.
The discussion in Sect. 1.3 allows us to define a vector space that emphasizes the
point of view of group action.
In preparation of this first approach, we have to define what a field is, for example
the real and complex numbers. These are also called scalars and are used essentially
to stretch vectors, an operation which is also called scaling. We already know what
a group is. In some sense, a group G contains a perfect symmetry structure. It is characterized by one operation, by the existence of a neutral element e with eg = ge = g for all g in G, and by the property that for every g in G the inverse g⁻¹ exists.
α · (β · γ) = (α · β) · γ;
α · (β + γ) = α · β + α · γ;
(α + β) · γ = α · γ + β · γ.
This means that for every α ∈ K with α ≠ 0, there exists an element α⁻¹ so that α · α⁻¹ = α⁻¹ · α = 1. It is easy to recognize the following equivalent definition.
α · (β + γ) = α · β + α · γ;
(α + β) · γ = α · γ + β · γ.
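For instance, \(\mathbb{Q}\), \(\mathbb{R}\), and \(\mathbb{C}\) satisfy these axioms, whereas the integers do not:
\[
2\in\mathbb{Z},\qquad 2^{-1}=\tfrac12\notin\mathbb{Z}\quad\Longrightarrow\quad (\mathbb{Z},+,\cdot)\ \text{is not a field.}
\]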
In the case of a vector space, we have a new situation. We have to consider two
different sets. The main set is the vector space .V = {u, v, w, . . . } and the second set
is the field .K = {α, β, γ , . . . }. .(V, +) is an additive group. In addition, we see that
the second operation is an external operation between K and V. To be more precise: we get (K∗, ·) = (K − {0}, ·), the multiplicative group of the field K, which acts on V. The notion of group action was discussed in Sect. 1.3 in general terms. This means here that every scalar λ ∈ K, λ ≠ 0, can expand or shrink every element (vector) of V. All this is summarized in the following definition.
.+ : V × V −→ V,
(v, w) |−→ v + w.
Scalar multiplication:
K × V → V, (α, v) ↦ αv,
.αv = vα;
(iii) The distributive law holds:
.(α + β)v = αv + βv and .α(v + w) = αv + αw.
Part (ii) of the definition refers to the group action, as discussed rather generally in
Sect. 1.3.
We restrict ourselves from the beginning and throughout this book to finite-dimensional vector spaces. Only a few examples refer to infinite-dimensional vector spaces. We use Greek letters like α, β, ξ, λ for the scalars since we want to underline the scalar action on vectors.
In physics, the two fields .R and .C play the leading role; therefore, we restrict .K
to these two fields.
Although .R and .C vector spaces correspond to the same linear structure, in some
cases they have different properties, for example in the spectral theorems, which are
very important in physics. That is why it is necessary to distinguish them.
It is interesting to realize that the scalar action of K on the abelian group V creates the vector space we know, an object which is very “compact” and at the same time very flexible. This is due to the existence of a basis and, with it, the notion of dimension, especially in the finite-dimensional case we consider here. Thus every vector space V is entirely characterized by its dimension and the scalar action on it.
As we saw, all the above spaces, rings, fields, and vector spaces, start with an
abelian group and after that another operation is introduced. In linear algebra, for
square matrices and linear operators in a vector space, we can further define another
operation. This results in an algebra. In accordance with our procedure in this section,
there are two ways to arrive at an algebra. We can either start from a ring and then
introduce a scalar multiplication or start with a vector space and then introduce a
vector multiplication. In the following definition, we consider both.
Definition (Algebra). An algebra A over K is a vector space A together with a multiplication A × A → A, (a, b) ↦ ab, which is bilinear: for all a, b, c ∈ A and λ ∈ K, (a + b)c = ac + bc, a(b + c) = ab + ac, and λ(ab) = (λa)b = a(λb) holds. Equivalently, an algebra is a ring (A, +, ·) together with a scalar multiplication K × A → A compatible with the ring multiplication in the above sense.
Here we see explicitly that an algebra is a vector space in which we can take the
product of vectors. Or, equivalently, we see that an algebra is a ring in which we can
multiply each element by a scalar.
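The square matrices provide the standard example: \(\mathbb{R}^{2\times2}\) with matrix multiplication is an algebra, since the product is bilinear with respect to the vector space structure,
\[
(\lambda A)B=\lambda(AB)=A(\lambda B),\qquad (A+B)C=AC+BC\qquad (A,B,C\in\mathbb{R}^{2\times2},\ \lambda\in\mathbb{R}).
\]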
Example 2.1
. V = R0 := {0}.
This is the simplest, but trivial, vector space. Here, and in most examples below, the verification of the vector space axioms is very straightforward.
Example 2.2
. V = R = R1 = {v ∈ R}.
This is the simplest (nontrivial) vector space we can have. Both, scalars .α and
vectors .v, are real numbers .(α, v ∈ R).
Example 2.3
V = R² = R × R := { x = [ξ¹ ξ²]ᵀ : ξ¹, ξ² ∈ R }.   (2.1)

We may also write x = e₁ξ¹ + e₂ξ² with e₁ = [1 0]ᵀ and e₂ = [0 1]ᵀ. We would like to clarify that scalar multiplication is commutative, that is, we have (Definition 2.4) for example e₁ξ¹ = ξ¹e₁.
This is the simplest typical example of a vector space. The scalars ξ¹, ξ² are the components, coefficients, or coordinates of the vector x. R² may also be seen as the coordinate plane. It is clear that the data of the vector x ∈ R² is the list of length two (ξ¹, ξ²), ξ¹, ξ² ∈ R, but we choose, as is common practice for good reasons, to present this as a column, which is a 2 × 1 matrix, as indicated in Eq. (2.1) with square brackets. We assume, as usual in physics, that the reader learnt very early to use matrices. In physics, we also like arrows! Here too, we shall use them, especially when we want to emphasize that these vectors belong to the standard vector space Rⁿ (here n = 2) written as columns. For this reason we freely use both notations, x ∈ R² and x ≡ ξ⃗ = [ξ¹ ξ²]ᵀ, ξ¹, ξ² ∈ R. Here, the symbol “≡” indicates that we use a different notation for the same thing.
This difference between a list and a matrix seems at first to be quite pedantic. But in linear algebra, whenever we consider linear combinations of vectors in Rⁿ or vectors in V, it is better to talk of a linear combination with respect to a list of vectors.
Example 2.4
V = Rⁿ = { x = [ξ¹ ⋯ ξⁱ ⋯ ξⁿ]ᵀ : ξ¹, . . . , ξⁿ ∈ R }.   (2.2)
. x = ξ→ ≡ (ξ i )n ≡ (ξ i ).
The symbol “.≡” here indicates only a different notation of the same object.
We write the list of numbers .ξ 1 , ξ 2 , . . . , ξ n as a column of length or size .n, a
.n × 1 matrix, written as a vertical list (without commas of course). It is clear
that in general we cannot distinguish a .n × 1 matrix (column) from a vertical
list of length .n.
In physics, the vector space .Rn is extremely relevant and well-known, especially
for .n = 3. The vector space .R3 is the model for our Euclidean space which we
may denote by . E 3 in order to distinguish it from .R3 . It should be clear that the
Euclidean space . E 3 with its elements, the points . p ∈ E 3 (we denote elements of . E 3
with “. p”, “.q”, etc.), is not a vector space, and is therefore different to .R3 whose
elements we denote with “x⃗” and “ξ⃗”, and so on. As we know, E³ is a homogeneous space whereas R³ is not. The presence of the element 0⃗ = [0 0 0]ᵀ ∈ R³, the neutral element (0⃗ + ξ⃗ = ξ⃗), makes R³ nonhomogeneous.
are not numbers and that we cannot add points. Nevertheless, we consider .R3 , not
only in physics, as a perfect model for . E 3 . This enables us, among other things, to
do calculations choosing a bijective correspondence (. p ↔ ξ→ ) between the points . p
in . E 3 and the three numbers .ξ 1 , ξ 2 , ξ 3 . The three numbers .ξ 1 , ξ 2 , ξ 3 describe the
position of the point . p. This allows for example to formulate Newton’s axioms in
coordinates and to perform all the calculations we need in Newtonian mechanics.
Similarly, .Rn is the model of . E n for .n ∈ N.
Very often in physics and mathematics, we identify .ξ→ with . p and we also
call .ξ→ (the three numbers) a point. We furthermore use .ξ→ to denote a point . p, a
position which is not a vector. Here, we have to distinguish the following cases:
A translation of a vector leads to the same vector. If we consider a translation
of a point, we may think that we obtain another point. In other words, this
identification means that we may consider a vector space as a manifold (a linear
manifold).
Example 2.5
V = R^{2×2} := { A = [α β; γ δ] : α, β, γ, δ ∈ R }.

This is the vector space of 2 × 2 matrices. 0 ≡ [0] ≡ [0 0; 0 0] is the zero. If we write

A = [α¹₁ α¹₂; α²₁ α²₂] = (αⁱₛ) with αⁱₛ ∈ R, i, s ∈ I(2)
and B = (βⁱₛ), C = (γⁱₛ), the scalar multiplication is given by

R × R^{2×2} → R^{2×2}, (α, B) ↦ αB := [αβ¹₁ αβ¹₂; αβ²₁ αβ²₂].
Example 2.6
V = R^{2×3} := { A = [α¹₁ α¹₂ α¹₃; α²₁ α²₂ α²₃] = (αⁱₛ) }.
Example 2.7
V = R^N := Map(N, R), the space of all real sequences, with generic element

x : N → R, n ↦ x(n) := ξₙ ∈ R.

Zero, addition, and scalar multiplication are defined pointwise:

(x + y)(n) := x(n) + y(n) and (α · x)(n) := αx(n).
This means that .ζn := (ξ + η)n := ξn + ηn and .(α · x)n := αξn . Finally, we
write again .z = x + y and .αx. Note that for .Rn , we can write in the above
notation .Rn ≡ R I (n) .
Example 2.8
V = R^(N₀) is the vector space of terminating sequences, that is, sequences with only finitely many nonzero terms. Zero, addition, and scalar multiplication are given as in the previous example. It turns out that this vector space is equivalent to the space of polynomials denoted by
R[x] = { α(x) := Σ_{k=0}^{m} αₖxᵏ, m ∈ N₀ }.

We thus have

R^(N₀) ≅ R[x].
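The identification simply reads off coefficients; for instance,
\[
\alpha(x)=1+2x+3x^2\ \in\ \mathbb{R}[x]\quad\longleftrightarrow\quad (1,2,3,0,0,\dots)\ \in\ \mathbb{R}^{(\mathbb{N}_0)} .
\]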
Example 2.9
. V = R X ≡ Map(X, R) ≡ F(X ) := { f, g, . . . },
with . f given by
. f : X −→ R
x |−→ f (x).
Zero, addition and scalar multiplication are in analogy to the Example 2.7. We
therefore have for zero the constant map 0̂:

0̂ : X → R, x ↦ 0̂(x) := 0 ∈ R for all x ∈ X,

and, for all x ∈ X and α ∈ R,

( f + g)(x) := f(x) + g(x) and (α f)(x) := α f(x).
Example 2.10
. V = C 0 (X ).
V is the set of all continuous functions. Zero, addition, and scalar multiplication are defined as in the previous Example 2.9. From analysis, we know that C⁰(X) is a vector space: for example, the sum of two continuous functions is again a continuous function.
Example 2.11
. V = C 1 (X ).
Example 2.12
. V = L(X ).
Example 2.13
. V = Sol(A, 0).
α₁ξ¹ + α₂ξ² + · · · + αₙξⁿ = 0.   (2.3)

Σ_{s=1}^{n} αₛξˢ = αₛξˢ.   (2.4)

Ax = 0.   (2.6)
The set of all solutions Sol(A, 0) of Eq. (2.3) is a vector space: if x and y are solutions of Eq. (2.3), their sum and their scalar multiples are also solutions of Eq. (2.3). If x, y ∈ Sol(A, 0), that is, Ax = 0 and Ay = 0, then it follows directly that

A(x + y) = Ax + Ay = 0 + 0 = 0,
A(λx) = λAx = λ0 = 0  (λ ∈ R).
All the above equations, Eqs. (2.3) to (2.6), contain linear combinations. As we shall see in Sect. 3.1 and Remark 3.1, it might not be an exaggeration to claim that linear combinations are the most important operation in linear algebra. The precise definition of a linear combination is given in Sect. 3.1, Definition 3.3.
In order to decide whether an element is a vector, one has to know where it belongs. The elements of a vector space are, of course, vectors. The elements of a circle are not vectors; we call them points. This clarifies the apparent paradox that x and y are not vectors and at the same time x and y are vectors.
All this reminds us of Euclid's approach to geometry. Euclid never tells us what a point is; he only says how points behave towards each other. If we compare this with Comment 2.2, we see the difference. We consider the elements x and y on the circle S¹ not as vectors but, since S¹ is not a vector space, as points in S¹.
At this stage, it is natural to address subsets of .V which have the same structure as
the vector space .V . This means that a subset .U should be by itself also a vector space.
In this sense, we may say that for the subset .U of .V , we would like to stay in the
vector space category, and we write .U < V . This leads to the following definition.
Example 2.16
Given .V = R2 = R × R.
(i) The x-axis U₁ := R × {0} = [1 0]ᵀR = { [x 0]ᵀ : x ∈ R } and the y-axis U₂ := {0} × R = [0 1]ᵀR = { [0 x]ᵀ : x ∈ R } are subspaces of V, and we write U₁, U₂ < V;
(ii) For every v different from zero (v ≠ 0⃗), the set U_v := Rv = { x ∈ R² : x = λv, λ ∈ R } is a subspace of V;
(iii) The set of solutions of the equation α₁ξ¹ + α₂ξ² = 0 with A = [α₁ α₂] ∈ R^{1×2} is a subspace of V. Assuming that A ≠ 0 ≡ [0 0], we may write for the above set of solutions U = Sol(A, 0⃗):

U = Rv₀ with v₀ = [α₂ −α₁]ᵀ.
Example 2.18 Given that V = Map(R, R), C¹(R) and C⁰(R) are subspaces of V, we may write

C¹(R) < C⁰(R) < V.
The sum u₁ + u₂ = (1, 1), for example, belongs neither to the x-axis nor to the y-axis, and consequently not to their union. To simplify our notation, we write here u₁ = (1, 0) instead of u₁ = [1 0]ᵀ.
For every mathematical structure, it is essential and beneficial to use the maps that preserve that structure. Here, these so-called homomorphisms are the linear maps.
A few further comments about linear maps: We would like to remind the reader that we do not assume that they come upon this definition for the first time, and the same holds for most definitions and many propositions and theorems in this book. But we are convinced that essential and fundamental facts have to be repeated, and thus this is by no means a loss of time. It is further an excellent opportunity to fix our notation. In this sense, we recall that a subspace U of V (denoted by U < V to distinguish it from the symbol ⊆ for subsets) is a vector space in its own right by the restriction of addition and scalar multiplication to U.
f₀ : V → V′, v ↦ f₀(v) := 0 (the zero map);
f : V → V, v ↦ f(v) := v (the identity map).

For the scaling map f_λ : V → V, v ↦ f_λ(v) := λv, the linearity follows from

f_λ(αv₁ + βv₂) = λ(αv₁ + βv₂) = αλv₁ + βλv₂ = αf_λ(v₁) + βf_λ(v₂).
. f : V −→ V
v = u + w |−→ f (v) := u.
The linearity is clear, and thus the image of . f is .im f = f (V ) = U . Note that
for this projection, we have to use both the subspace .U , and the subspace .W .
A parallel projection corresponds to the usual parallelogram rule when adding
two vectors. Here, we did not use the dot product in .R3 which means that we
do not have to use orthogonality.
In physics, we use for the most part orthogonal projections. These are, as we will
show later, connected with the inner product: here the dot product (see Sect. 10.4). In
this case, we only need to know the space .U = im f . The corresponding complement
(here .W ) is given by the orthogonality.
f : V → V,
[ξ¹ ξ²]ᵀ ↦ [α¹₁ξ¹ + α¹₂ξ²  α²₁ξ¹ + α²₂ξ²]ᵀ = [α¹ₛξˢ  α²ₛξˢ]ᵀ.

With the matrix form, the two above equations lead to a single one:

f(ξ⃗) = Aξ⃗.

The linearity follows from

f(λξ⃗ + μη⃗) = A(λξ⃗) + A(μη⃗) = αⁱₛ(λξˢ) + αⁱₛ(μηˢ) = λαⁱₛξˢ + μαⁱₛηˢ = λf(ξ⃗) + μf(η⃗).
D : C¹(R) → C⁰(R), f ↦ Df := df/dx.

As is commonly known from analysis, D is a linear map (α, β ∈ R):

D(αf₁ + βf₂) = d/dx(αf₁ + βf₂) = α df₁/dx + β df₂/dx = αDf₁ + βDf₂.
J : L¹(X) → R, f ↦ J(f) := ∫_X f(x) dx.
From all these examples of vector spaces and linear maps, we may learn that if we have thousands of vector spaces, we expect millions of linear maps. In addition, taking into account the above examples, we may observe that the image of a vector space under a linear map is again a vector space.
The above notation with the symbol “<” indicates that the sets Hom(V, V′) and Map(V, V′) are vector spaces. The set Iso(V, V′) is not a vector space since, for example, the sum of two such bijective maps is not necessarily bijective.
If we introduce in End(V) the composition as multiplication,

g f := g ∘ f for f, g ∈ End(V),

it is not difficult to see that End(V) is a ring. Hence with the above composition,
we have an additional multiplicative operation in .End(V ). It is isomorphic to a
matrix ring, but we have not yet seen this (see Sect. 3.3, Corollary 3.8 and Comment
3.3). Similar connections of linear maps to matrices are dominant throughout linear
algebra. As we shall see, with the help of bases we can entirely describe, that is,
represent linear maps and their properties by matrices. In this sense, if we ask what
linear algebra is, we may simply state that linear algebra is the theory of matrices.
For v′ ∈ V′, the preimage (or fiber) of v′ is given by f⁻¹(v′) := {v ∈ V : f(v) = v′} ⊆ V. The kernel is often called null space and the image is called range.
We note some direct conclusion from the definition:
Linear maps themselves build vector spaces. The most prominent example is the dual space of V, denoted by V∗ := Hom(V, K). It contains all homomorphisms from V to K, that is, all linear functions on V. We denote the elements of V∗ preferentially by Greek letters, for example α, β, . . . , ξ, η, θ, . . ., and we thus have V∗ = {α, β, ξ, η, θ, . . .}. An element ξ ∈ V∗ is a linear function:
.ξ : V −→ K,
v |−→ ξ(v).
As we shall see, an additional structure such as an inner product on an abstract vector space allows us to identify the two spaces V and V∗. But we can only fully understand the above mentioned subtlety if we deal explicitly with the dual space V∗, in both cases: V without and V with an inner product. We must admit that we cannot understand tensors if we do not understand broadly the role of the dual space V∗. Therefore we do not avoid V∗, as is usually done in introductions to elementary theoretical physics; on the contrary, we underline its role in the present book. As already mentioned, the role of V∗ is important for understanding tensors.
(|) : V × V → K, (w, v) ↦ (w | v).
In physics, we usually define linearity in the second slot, as above in (i), not in
the first one.
For .K = R, we call .(V, (|)) Euclidean vector space or real inner product space.
For .K = C, we call .(V, (|)) unitary vector space or complex inner product space or
Hilbert space (for finite dimensions). It corresponds usually to a finite-dimensional
subspace of the Hilbert space in quantum mechanics.
From this definition it follows by direct calculation that we have
(a) additivity in the first slot: (w + u | v) = (w | v) + (u | v);
(b) conjugate homogeneity in the first slot: (λw | v) = λ̄(w | v).
For (a) we have from (i) and (ii):

(w + u | v) = \overline{(v | w + u)} = \overline{(v | w) + (v | u)} = \overline{(v | w)} + \overline{(v | u)} = (w | v) + (u | v).
The inner product (|) is called symmetric in the case of an R vector space and Hermitian in the case of a C vector space. As a result, we see altogether that the real inner product is a positive definite symmetric bilinear form. The complex inner product is, analogously, what is called a positive definite Hermitian sesquilinear form. Note that (v | v) is real and nonnegative, for a complex vector space too.
Bearing in mind the very important applications in physics, it is instructive to discuss again and separately the situation for a real vector space. This leads to some more definitions for the special case of K = R, where the scalars are real numbers.
(|) is now a bilinear form, symmetric and positive definite. If we write σ(x, y) ≡ (x | y) for x, y ∈ V, this means:
(i) σ is a bilinear form, that is, linear in each of its two slots;
(ii) σ is symmetric: σ(x, y) = σ(y, x);
(iii) σ is positive definite, which means:
(a) σ(x, x) is never negative,
(b) σ(x, x) is never zero if x ≠ 0.
This leads to further definitions.
From (b) it results that .σ is nondegenerate. Note that for nondegenerate bilinear
forms, .σ (x, x) can be negative.
The use of inner products leads to different structures and properties for the vectors
in .V . The most important ones are the norm, the orthogonality, the Pythagorean the-
orem, the orthogonal decomposition, the Cauchy-Schwarz inequality, the triangular
inequalities, and the parallelogram equality.
||λv||2 = (λv | λv) = λ̄(v | λv) = λ̄(v | v)λ = λ̄λ(v | v) = |λ|2 ||v||2 .
.
Note that for a real vector space, if we have .||u + v||2 = ||u||2 + ||v||2 , it follows that
.u and .v are orthogonal:
P_b : V → V, v ↦ P_b(v) := b (b | v)/(b | b) ∈ Kb.

A direct inspection shows that P_b is a linear map and projects the vectors of V orthogonally onto the one-dimensional subspace Kb. P_b, as any projection operator, is idempotent:

P_b² = P_b.
P_b P_b v = P_b v_b = b (b | v_b)/(b | b)
          = b (1/(b | b)) · (b | b)(b | v)/(b | b)
          = b (b | v)/(b | b)
          = P_b v.
P_e v = e (e | v).

If we write e_b := b/||b||, we obviously get P_b = P_{e_b} since

P_b v = e_b (e_b | v).
Note that up to now we have used the orthogonal decomposition of the vector v relative to the vector b ≠ 0! We may write

id_V v = v = v_b + v_c = P_b v + v_c = P_b v + (id_V − P_b)v

for all v ∈ V. This leads to the very useful decomposition of the identity id_V into the two projection operators:

id_V = P_b + (id_V − P_b).
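In \(\mathbb{R}^2\) with the canonical inner product and, for instance, \(b=e_1\), this decomposition is just the splitting into the two coordinate directions:
\[
v=\begin{bmatrix}\xi^1\\ \xi^2\end{bmatrix},\qquad
P_{e_1}v=e_1(e_1\mid v)=\begin{bmatrix}\xi^1\\ 0\end{bmatrix},\qquad
(\mathrm{id}_V-P_{e_1})v=\begin{bmatrix}0\\ \xi^2\end{bmatrix}.
\]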
|(u | v)| ≤ ||u|| ||v||.
Proof For the inequality, we assume that u ≠ 0 and we define the projection as P_u(v) = u (u | v)/(u | u) =: v_u and v_c := v − v_u. Hence we have v = v_u + v_c with (u | v_c) = 0, and the Pythagorean theorem gives

||v||² = ||v_u||² + ||v_c||² ≥ ||v_u||² = |(u | v)|² / ||u||²,

which is the assertion. That is why the Cauchy-Schwarz inequality indicates simply that an orthogonal projection is not longer than the original vector. Further, if ||v_c|| = 0 (so that v_u = v), then v and u are collinear.
||u + v||² = (u + v | u + v) = (u | u) + (u | v) + (v | u) + (v | v) = ||u||² + 2 Re(u | v) + ||v||².
The Cauchy-Schwarz inequality for |(u | v)| leads furthermore to the triangle inequality

||u + v|| ≤ ||u|| + ||v||.

For the scalars themselves, the corresponding statements read:

(α | β) = 0 ⇔ α = 0 or β = 0;
(α | α) = 0 ⇔ α = 0;
|α + β| ≤ |α| + |β|.
x = ξ⃗ = [ξ¹ ξ²]ᵀ ∈ R²  and  xᵀ = ξ̰ = [ξ₁ ξ₂] ∈ (R²)∗,

with ξ̰ ↦ ξ̰ᵀ ↦ (ξ̰ᵀ)ᵀ = ξ̰.
With the above preparation, we can express the canonical inner product for n = 2 in various forms, as we saw, for example with x = ξ⃗ and y = η⃗, where we have:

ξ̰ y = xᵀy = (x | y) = (ξ⃗ | η⃗) = Σ_{s=1}^{2} ξˢηˢ = ξˢδₛᵣηʳ = ξₛηˢ.
Using the transpose T and the explicit symbols for columns and rows, x = ξ⃗, xᵀ = ξ̰ and ξ̰ᵀ = x, we may again write for (|):

(x | y) = xᵀ1y = xᵀy = ξ̰ η⃗  with  1 = (δʳₛ) = [1 0; 0 1].
.U ⊥ = {v ∈ V : (v | u) = 0 for all u ∈ U }.
U₀ = v₀R := {u : u := λv₀, λ ∈ R} ≤ V.

The orthogonal space U₀⊥, as shown in Fig. 2.3, is given in this case by any w ∈ V, w ≠ 0, with (w | v₀) = 0. So the subspace U₀⊥ equals wR and we may write

V = U₀ + U₀⊥.
(V, (|)σ) is again a Euclidean vector space. All aspects discussed in example (i) apply one to one here as well.
(iii) R², with a symmetric nondegenerate bilinear form given, for example, by the matrix S = [σ₁ 0; 0 σ₂], with σ₁ positive and σ₂ negative. We can assume that σ₁ = 1 and σ₂ = −1. We get S = [1 0; 0 −1].
.(V, (|)σ ) is a semi-Euclidean vector space. This is our first model for
the two-dimensional spacetime of special relativity: In other words, it
is the vector space that corresponds to the two-dimensional Minkowski
spacetime.
0 ≤ |(u | v)| / (||u|| ||v||) ≤ 1.

As a result, we obtain

−1 ≤ (u | v) / (||u|| ||v||) ≤ 1.

This allows the unique determination of a real number ϕ ∈ [0, π], the angle between the two vectors u, v ∈ R² − {0}:

cos ϕ := (u | v) / (||u|| ||v||).

It is clear that this definition applies to any Euclidean vector space V, for any two vectors u, v ∈ V − {0}.
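For instance, with the canonical inner product in \(\mathbb{R}^2\),
\[
u=(1,0),\quad v=(1,1):\qquad
\cos\varphi=\frac{(u\mid v)}{\|u\|\,\|v\|}=\frac{1}{1\cdot\sqrt2}=\frac{1}{\sqrt2},
\qquad \varphi=\frac{\pi}{4}.
\]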
If we write for the transpose of A: Aᵀ = (α(T)ˢᵢ) with α(T)ˢᵢ = αⁱₛ, the inner product takes the form:

(A | B)_M := Σ_{i,s} αⁱₛ βⁱₛ = α(T)ˢᵢ βⁱₛ = tr(AᵀB).

It is a good exercise to verify the identification

(R^{2×2}, (|)_M) ≅ (R⁴, (|))

and to see that (|)_M is indeed an inner product in V = R^{2×2}. It is also obvious that all the above relations may be extended to every n ∈ N. The space (R^{n×n}, (|)_M) of n × n matrices is also a Euclidean vector space.
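A quick numerical check of the trace formula, for instance:
\[
A=\begin{bmatrix}1&2\\3&4\end{bmatrix},\quad B=\begin{bmatrix}5&6\\7&8\end{bmatrix}:\qquad
(A\mid B)_M=1\cdot5+2\cdot6+3\cdot7+4\cdot8=70=\operatorname{tr}(A^{\mathsf T}B).
\]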
( f | g) := ∫_R f(x)g(x) dx,  f, g ∈ L²(R),

and

|| f ||₂ := ( ∫_R f(x)² dx )^{1/2}.
The definition of a bilinear form ϕ applies also when we consider two different vector spaces, V and W:

ϕ : V × W → R, (v, w) ↦ ϕ(v, w).

The following naturally given bilinear form is a very instructive and remarkable example. We take V and its dual V∗ and denote the bilinear form by the symbol (,):

(,) : V∗ × V → R, (ξ, v) ↦ (ξ, v) := ξ(v).

Note that in this case, we write (,) and not (|) as for the inner product.
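Concretely, in \(\mathbb{R}^n\) this pairing is just “row times column”; for instance, for \(n=2\),
\[
\xi=[1\ \ 2]\in(\mathbb{R}^2)^*,\quad v=\begin{bmatrix}3\\4\end{bmatrix}\in\mathbb{R}^2:\qquad
(\xi,v)=\xi(v)=1\cdot3+2\cdot4=11 .
\]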
(z | w) := z̄w.

The norm takes the form ||z|| := |z| = (z̄z)^{1/2} and the Cauchy-Schwarz inequality is given by |z̄w| ≤ |z| |w|.
If we consider an abstract .n-dimensional vector space, it has only the linear structure.
We can of course introduce other structures, such as a volume form (see Sect. 7.5)
or an inner product, or even a particular basis. The standard vector space .Kn has all
of these structures. Therefore, we can say that the vector space .Kn has the maxi-
mum structure an .n-dimensional vector space can have! The reader should already
be familiar with .Rn from a first course in analysis. In Sect. 2.1, Example 2.3, we
introduced some elements of this section for the case .n = 2, so our discussion here
can be considered as a review thereof or as an extension to general .n ∈ N .
As discussed in Example 2.4, Kⁿ is the set of all finite sequences of numbers of length n, which we may also call an n-tuple or a list of length or size n. We regard this n-tuple as a column. The space Kⁿ is given by
Kⁿ = { x = [ξ¹ ⋯ ξʲ ⋯ ξⁿ]ᵀ : ξʲ ∈ K, j ∈ I(n) := {1, 2, . . . , n} }.
We also use the notation x = (ξʲ)ₙ ≡ (ξʲ) ≡ ξ⃗, and we omit the index n when its value is clear. We may say that ξʲ is the jth coordinate or the jth coefficient or the jth component of x ∈ Kⁿ.
If we apply the standard addition and scalar multiplication by adding and multi-
plying the corresponding entries, the set .Kn is indeed a vector space since it fulfills
all the axioms of the definition of a vector space. .Kn is the standard vector space; it is
the model of a vector space with dimension .n, and, as is well-known from analysis,
it is locally also the model of a manifold!
It is evident that .Kn can be identified with the .n × 1-matrices with entries in .K.
The greatest advantage of .Kn is its canonical basis:
E := (e₁, . . . , eₙ) with e₁ = [1 0 ⋯ 0]ᵀ, . . . , eₙ = [0 ⋯ 0 1]ᵀ.
Matrices are closely connected with linear maps. It is well-known that a matrix. F with
elements in .K, may, if we wish, describe a linear map . f as a matrix multiplication.
This was also demonstrated in Example 2.23. Here we use the letter. F to demonstrate
the close connection between a matrix and a map, and in our mind we may always
identify . f with . F:
. f : Kn −→ Km ,
x |−→ f (x) := F x.
. z i = ϕ ij x j . (2.7)
However, this is not enough for us. Since we always use Greek letters for scalars, we set x = (ξʲ)ₙ ≡ ξ⃗, j ∈ I(n), and z = (ζⁱ)ₙ ≡ ζ⃗, i ∈ I(m). In addition, we change the index j into the indices s, r ∈ I(n) and we write x = (ξˢ)ₙ. In this book, we systematically use this kind of indices and notation, and we may call it “Smart Indices Notation”. Hopefully, the reader will soon realize the usefulness of this notation. We therefore write for Eq. (2.7)

ζⁱ = ϕⁱₛξˢ.   (2.8)
We can verify that . f is indeed a linear map in both notations. The matrix . F = (ϕsi )
contains all the information about the map . f .
Before proceeding further, we would like to remark that we restrict ourselves to Rⁿ for simplicity reasons and because of its direct, relevant connections to applications in physics. Using the canonical inner product
(|) : Rⁿ × Rⁿ → R, (x, y) ↦ (x | y) := Σ_{s=1}^{n} ξˢηˢ = ξₛηˢ,
we may interpret rows (1 × n matrices) with scalar entries as elements of the dual space of Rⁿ and write:

(Rⁿ)∗ := { ξ∗ ≡ ξ̰ = [ξ₁ . . . ξₙ] : ξᵢ ∈ R, i ∈ {1, 2, . . . , n} }.
We used the matrix notation ξ̰ = [ξ₁ ξ₂ . . . ξₙ] ∈ R^{1×n} and not the notation for the corresponding (horizontal) list (ξ₁, ξ₂, . . . , ξₙ) (see Comment 2.1). In order to use the Einstein convention, we have to write the indices of the coefficients of ξ̰ downstairs and we get, with y = (ηˢ) = η⃗,

ξ̰ : Rⁿ → R, y ↦ ξ̰(y) = ξₛηˢ = [ξ₁ ⋯ ξₙ][η¹ ⋯ ηⁿ]ᵀ.
The Einstein convention is the simplest and nicest way to express matrix multi-
plications. The same holds for linear combinations. After all, both operations are
essentially the same thing. Therefore we can write symbolically in an obvious nota-
tion:

[∗ ∗ ∗ ⋯ ∗ ∗] [• • • ⋯ • •]ᵀ = [∗·• + ∗·• + ∗·• + ⋯ + ∗·• + ∗·•]   (2.9)
which is
. [1 × n][n × 1] = [1 × 1].
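A numerical instance of Eq. (2.9):
\[
[1\ \ 2\ \ 3]\begin{bmatrix}4\\5\\6\end{bmatrix}=[1\cdot4+2\cdot5+3\cdot6]=[32].
\]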
The above Eq. 2.9 is nothing else but the usual row by column rule for the
multiplication of matrices. We assumed tacitly that between the entries “.∗” and
“..”, the multiplication and correspondingly the addition are already defined.
At the same time, with this symbolic notation, we also generalized the matrix
multiplication, even in the case where the entries are more general objects than
scalars. This means for example that if the entries “.∗” and “..” are themselves
matrices, we obtain also the matrix multiplication of block matrices.
We use the row matrix .[ξ ] ∈ R1×n to express the linear map
.ξ ∈ Hom(Rn , R) = (Rn )∗ .
This leads to the natural identification of.R1×n with.(Rn )∗ . Thus we can call.ξ ∈ (Rn )∗
a linear function, a linear form, a linear functional or a covector, and we may also
write:
ξ ≡ ξ̰ ≡ [ξ] ≡ (x| : Rⁿ → R, y ↦ ξ(y) = [ξ]y = (x | y).
(x |∈ (Rn )∗ corresponds to the Dirac notation and is widely used in quantum mechan-
.
ics (see also Sect. 6.4).
Furthermore, if we use the transpose, we set .ξi = ξ i and we have
. T : Rn −→ (Rn )∗ ,
x |−→ x T = [ξ1 . . . ξn ].
We can connect Rⁿ with (Rⁿ)∗ and in addition we can, if we want, identify (Rⁿ)∗ with Rⁿ. This is an identification we often make in physics from the beginning, using the transpose T:

y ≡ |y) ↦ (y| ≡ y̰ ≡ yᵀ ∈ (Rⁿ)∗.
In the previous sections, we started with an abstract vector space .V given only by
its definition, and we introduced the inner product as an additional structure on
it. Besides its geometric signification, this other structure facilitates the formalism
considerably within linear algebra, which is very pleasant for applications in physics
too.
On the other hand, removing some of its structure is necessary when starting with V if we want to consider the vector space V as a manifold. This special manifold
is also called linear manifold or affine space. The well-known Euclidean space is
also an affine space, but its associated vector space is an inner product vector space,
a Euclidean vector space. An affine space has two remarkable properties: on the one hand, it is homogeneous, and on the other hand, it contains just enough structure for straight lines to exist inside it, just as every $\mathbb{R}^n$ does. Similarly, any vector space has the same capacity to contain straight lines. But at the same time, a vector space is not homogeneous since its most important element, the zero, is the obstacle to
homogeneity. To obtain the associated affine space starting from a vector space, we
have to ignore the unique role of its zero, and to ignore that we can add and multiply
vectors by scalars. We obtain an affine space with the same elements as the vector
space we started with and which we now may call points. The action of a vector
space .V gives the precise procedure for this construction on a set . X , precisely in the
sense of group actions in Sect. 1.3. The group which acts on . X is the abelian group
of .V :
$\tau: V \times X \to X, \quad (v, x) \mapsto \tau(v, x) = \tau_v(x) = x + v.$
From this follows, according to Comment 2.2 in Sect. 2.1, that .V and . X have
the same cardinality or, in simple words, have the same “number” of elements: For
$x_0 \in X$, we have $V \times \{x_0\} \to X$, and the map
$V \xrightarrow{\ \cong\ } X, \quad v \mapsto x_0 + v,$
is bijective.
We can give another equivalent and perhaps more geometric description of an
affine space, with essentially the following property:
$\Delta: X \times X \to T(X), \quad (x_0, x_1) \mapsto \overrightarrow{x_0 x_1},$
such that for $x_0, x_1, x_2 \in X$, the triangle equation $\overrightarrow{x_0 x_2} = \overrightarrow{x_0 x_1} + \overrightarrow{x_1 x_2}$ holds. This means that two points $x_0$ and $x_1$ in $X$ determine an arrow $\overrightarrow{x_0 x_1} \in T(X)$ which we
may also interpret as a translation. We may consider the straight line through the
points .x0 and .x1 as an affine subspace of . X . Here, we see explicitly that an affine
space can contain straight lines. It is interesting to notice that for every .x0 ∈ X , we
have the set
$T_{x_0}X = \{x_0 + \overrightarrow{x_0 x} : x \in X\},$
which is the tangent space of . X at the point .x0 . In this sense, we may regard .T (X ) as
the universal tangent space of . X and the letter .T could mean not only “translation”
but also “tangent” space.
The above construction is equally valid if we start with a vector space $V$ ($X = V$). The affine space is now the triple $\mathbf{V} := (V, T(V), \tau)$. In this case, the arrow $\overrightarrow{v_0 v_1}$ is given by the difference $\overrightarrow{v_0 v_1} = v_1 - v_0$, and the vector space $V$ is now considered as an affine space! See also Comment 2.2 in Sect. 2.1.
Starting with a vector space .V allows to give another, more direct definition of
an affine space in .V , considered as a subset, not a subspace, of .V . This construction
is more relevant to linear algebra. It also leads to a new kind of very useful vector
spaces, the quotient vector spaces which are discussed in Sect. 2.6.
$A = v_0 + U := \{v_0 + u : u \in U\}.$
As we saw in Sect. 1.3, we may also write . A(v0 ) = U v0 , that means that the affine
space . A(v0 ) is exactly the orbit of the action .U on the vector .v0 . Here, the action of
.U on .v0 is the additive action of the commutative subgroup .U (of the commutative
group $V$). This justifies the notation $A(v_0) = Uv_0 = \{u + v_0 = v_0 + u : u \in U\} = U + v_0$.
$A(v_0) = f^{-1}(w_0) = v_0 + U.$
In Fig. 2.4, we see the corresponding affine spaces associated to the linear map
.f . They are all parallel to each other and have the same dimension.
We know from Sect. 1.2, in quite general terms, what a quotient space is. We now
have the opportunity to discuss a very important application which is probably the
most important example of a quotient space in linear algebra. (Fig. 2.4 shows the affine subspaces in $V$ relative to the subspace $U = \ker f < V$, with $w = f(v)$, $w_1 = f(v_1)$, $w_0 = f(v_0)$.) In what follows, we would like to stay within the vector space $V$, using the notation of Sect. 2.5. We
consider the set of all such affine subspaces associated with a given subspace .U of
. V for all points .v ∈ V :
$A(U) := \{A(v) : v \in V\}.$
As shown in Sect. 2.5, all these affine subspaces are parallel and have the same
dimension. Here, it is also intuitively clear, as mentioned in Remark 1.1 in Sect. 1.2,
that this gives us a disjoint decomposition of the vector space .V .
It turns out that we can introduce a vector space structure on the set . A(U ) and
that we may thus obtain a new vector space out of .V and the given subspace .U .
Therefore we can talk about a vector space . A(U ) with elements (again vectors of
course) which are the affine spaces in .V associated with .U . The elements of . A(U )
are by construction equivalence classes or cosets. Every such class is in this case also
a vector since, as we will see below, we can introduce naturally a linear structure on
. A(U ). This makes . A(U ) a vector space and we denote this vector space by . V /U . As
sets, . A(U ) and .V /U are bijectively connected. . A(U ) is simply a set as was defined
above, and .V /U is this set with the vector space structure. Clearly, this new vector
space itself cannot be a subspace of .V , but we stay, in a more general mathematical
language, in the vector space category.
Another way to obtain similar new vector spaces, is to use the notion of equivalence
classes as discussed in Sect. 1.2. We notice that the subspace.U induces an equivalence
class on .V :
$v' \sim v :\Leftrightarrow v' - v \in U.$
The coset $[v]$ of this equivalence class of $v$ is exactly the additively written orbit of the $U$ action on $v$, and we have for this orbit
$[v] = Uv = A(v) = v + U.$
$V/U := \{v + U : v \in V\} \qquad (2.10)$
which we may call the quotient space of $V$ modulo $U$. It is evident that a bijection of sets $A(U) \underset{\mathrm{bij}}{\cong} V/U$ holds.
Furthermore, we are now in the position to use the formalism of Sect. 1.2 to
introduce a vector space structure on the set .V /U = {[v]}. Consequently, in the end,
we will be in the position to add and scalar multiply the affine subspaces, the classes
or cosets, associated with .U . This means for example that for a given subspace .U
with $\dim U = 1$, all straight lines parallel to $U$ behave exactly as vectors of a vector
space. This also means that, in general, at the end of our construction, we may expect
to have
$V/U \cong \mathbb{K}^m$ for an appropriate $m$.
Since all these spaces are parallel and have the same dimension, and are elements of
the set . A(U ) = V /U , it seems quite natural to look at the concrete space .V /U and
try to define the addition and, similarly, the scalar multiplication in the following
way:
$[v] + [w] := (v + U) + (w + U) = v + w + U,$
$\lambda \cdot [v] := \lambda(v + U) := \lambda v + U.$
We have only to make sure that these operations are well-defined. That is, for $[v'] = [v]$ and $[w'] = [w]$, we have $v' + w' + U = v + w + U$, and $\lambda v + U = \lambda v' + U$.
Taking into account the definition of the equivalence class (see Definition 1.2), we
see immediately that this is indeed the case and that .V /U is a concrete vector space.
(Fig. 2.5 shows the vector space $\mathbb{R}^2$ with the $x$- and $y$-axes and the affine spaces parallel to $U = u\mathbb{R}$.)
Since we may write the equivalence relation $v' \sim v$ as $v' + U = v + U$ (and similarly for $w$), we have
$v' + w' + U = v + U + w + U + U = v + w + U.$
Similarly,
$\lambda v' + U = \lambda(v + U) = \lambda v + \lambda U + U = \lambda v + U.$
So the operations
$V/U \times V/U \to V/U, \quad (a + U, b + U) \mapsto (a + b) + U, \quad \text{and}$
$\mathbb{K} \times V/U \to V/U, \quad (\lambda, a + U) \mapsto \lambda a + U$
are well-defined, and we have the canonical projection
$\pi: V \to V/U, \quad v \mapsto [v] \equiv v + U.$
Thus, the zero of $V/U$ is $\ker\pi = U \equiv [\vec 0]$.
The map $\pi$ is linear and surjective (an epimorphism). The dimension formula, a.k.a. rank-nullity (see Corollary 3.2 in Sect. 3.3), takes here the form $\dim(V/U) = \dim V - \dim U$.
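A hedged numerical sketch of the quotient construction (not the book's own construction; the subspace and vectors are chosen here for illustration): each coset $v + U$ is represented by the component of $v$ orthogonal to $U$, which makes the well-definedness of the operations and the dimension formula easy to check.

```python
import numpy as np

U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])             # U = span(e1, e2), a 2-dim subspace of R^3
P_U = U @ np.linalg.pinv(U)            # orthogonal projector onto U

def coset_rep(v):
    """Canonical representative of the coset v + U (component orthogonal to U)."""
    return v - P_U @ v

v = np.array([1.0, 2.0, 5.0])
v_prime = v + U @ np.array([7.0, -3.0])           # another representative of [v]
assert np.allclose(coset_rep(v), coset_rep(v_prime))

w = np.array([0.0, 1.0, -2.0])
# [v] + [w] = [v + w] -- independent of the chosen representatives
assert np.allclose(coset_rep(v + w), coset_rep(v_prime + w))

dim_V, dim_U = 3, np.linalg.matrix_rank(U)
print("dim(V/U) =", dim_V - dim_U)     # 1, as in the dimension formula above
```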
We now understand even better what a vector space is. The elements of a
vector space can be anything. The only thing we can demand of these elements is that they behave well, that is, that we can add them and multiply them by scalars. In consequence, we understand that it is the behavior that matters. Here,
we come across the same situation as we met with the Euclidean axioms. We do
not know what the essence of a vector is. Nevertheless, it does not matter since
we have the definition of a vector space.
Now we can ask ourselves: what are the other benefits of the notion of quotient
space for linear algebra? We present two examples that might also be relevant in physics.
As will be discussed in Sect. 3.5 in Proposition 3.11, for every given subspace $U$ of $V$ there exist many complementary subspaces $W$ in $V$, satisfying $U \oplus W \cong V$ (see Definition 2.19). In this situation, the following holds: the quotient vector space $V/U$ is isomorphic to $W$, that is, $W \cong V/U$, and consequently to any complementary subspace of $U$ in $V$. We can therefore say that $V/U$ represents all these complementary subspaces of $U$.
The second example is the following isomorphism theorem which we give without
the proof.
For a linear map $f \in \operatorname{Hom}(V, V')$ we have
$V/\ker f \cong \operatorname{im} f,$
the isomorphism $\bar f$ being induced by $f$ itself, $\bar f([v]) = f(v)$. If $f$ is in addition surjective, then
$V/\ker f \cong V'.$
This means that we can describe the vector space .V ' solely with data of the vector
space .V .
It is further worthwhile noting that quotient vector spaces play an important role
not only in finding and describing new spaces but also in formulating theorems or
various proofs in linear algebra. As already mentioned, in physics we also often use
quotient spaces intuitively even if we do not apply explicitly the above formalism.
We are now going to answer the question that arose in Comment 2.5 of Sect. 2.1.
How can we uniquely obtain a subspace of $V$ from a union of two or more subspaces?
There is a natural construction that leads to the desired result.
$U_1 + \dots + U_m := \{u_1 + u_2 + \dots + u_m : u_i \in U_i,\ i \in I(m)\}.$
In the case of a direct sum, we may say that the list of vector spaces
.(U1 , . . . , Um ) is (block) linearly independent too (for the definition, see Sect.
3.1). As a consequence, a direct sum is a form of linear independence.
$V = U_1 \oplus \cdots \oplus U_m,$
$U_1 \oplus U_2 \xrightarrow{\ f\ } U_2' \oplus U_1',$
which is a form of the fundamental theorem of linear maps (see Theorem 5.2 in
Sect. 5.3).
Example 2.34 $V = \mathbb{R}^2$.
As in the examples in 2.16 in Sect. 2.1.2, $U_1$ and $U_2$ are the x-axis and the y-axis. We have of course $U_1 \cap U_2 = \{0\}$ and we have the direct decomposition of $\mathbb{R}^2$:
$\mathbb{R}^2 = U_1 \oplus U_2 \equiv (x\text{-axis}) \oplus (y\text{-axis}).$
At this point, the question arises concerning a criterion for a sum to be a direct
sum. A simple answer is given in the following proposition for .m = 2.
(An example in $\mathbb{R}^3$ is shown in Fig. 2.7b.)
$U \cap Y = \{0\}.$
$0 = u + y \Leftrightarrow u = -y \Rightarrow u \in Y \Rightarrow u \in U \text{ and } u \in Y \Rightarrow u \in U \cap Y \Rightarrow u = 0 \text{ and also } y = 0.$
It is of interest to notice, here without proof, that for more than two sub-
spaces, we have the following result: .U1 + · · · + Um is a direct sum if and only
if
$U_j \cap \sum_{i \ne j}^{m} U_i = \{0\} \quad \text{for all } j \in I(m) := \{1, 2, \dots, m\}.$
This is equivalent to the uniqueness or, generalizing, to the linear independence relation: if $\sum_{i=1}^{m} u_i = 0$ with $u_i \in U_i$, $i \in I(m)$, then $u_i = 0$ for all $i \in I(m)$.
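A small rank-based check (our own sketch, with subspaces invented for illustration): a sum is direct exactly when the dimensions add up, which is easy to test when the subspaces are given by spanning columns.

```python
import numpy as np

def is_direct_sum(*subspaces):
    """U1 + ... + Um is direct iff dim(U1 + ... + Um) = dim U1 + ... + dim Um."""
    total = np.hstack(subspaces)
    return np.linalg.matrix_rank(total) == sum(np.linalg.matrix_rank(U) for U in subspaces)

U1 = np.array([[1.0], [0.0], [0.0]])                 # x-axis in R^3
U2 = np.array([[0.0], [1.0], [0.0]])                 # y-axis
U3 = np.array([[1.0], [1.0], [0.0]])                 # a line inside U1 + U2

print(is_direct_sum(U1, U2))          # True:  U1 ∩ U2 = {0}
print(is_direct_sum(U1, U2, U3))      # False: U3 lies in U1 + U2
```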
In an affine space and in every abstract vector space, the notion of parallelism is part
of the structure. Consequently we may say that the vectors .u and .v in .V are parallel
whenever .u = λv for some scalar .λ ∈ K. See also Proposition 3.11 in Sect. 3.4 which
leads immediately to the definition of a (parallel) projection.
$P: V \to W < V, \quad v \mapsto P(v) = w,$
$V = \ker P \oplus \operatorname{im} P,$
$P \text{ is idempotent: } P^2 = P.$
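A quick illustration (the projection below is an example assumed for this sketch, projecting onto the x-axis parallel to the line spanned by $(1,1)$): idempotence and the decomposition $V = \ker P \oplus \operatorname{im} P$ can be checked directly.

```python
import numpy as np

P = np.array([[1.0, -1.0],
              [0.0,  0.0]])                       # a (non-orthogonal) projection in R^2

assert np.allclose(P @ P, P)                      # P is idempotent: P^2 = P

v = np.array([3.0, 2.0])
w = P @ v                                         # component in im P
k = v - w                                         # component in ker P
assert np.allclose(P @ k, 0.0)
assert np.allclose(w + k, v)                      # V = ker P ⊕ im P
```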
The importance of projection operators stems also from the fact that every
operator, especially in a complex vector space, always contains essentially pro-
jection operators as components. In particular, this is present in connection with
the spectral decomposition of operators. In quantum mechanics and symmetries
in physics, idempotent operators are particularly important.
Fig. 2.9 The vector space $\mathbb{R}^2$ with its standard subspaces ($x$- and $y$-axes), $\mathbb{R}_1 \equiv \mathbb{R}_1(0) = \mathbb{R}e_1$ and $\mathbb{R}_2 \equiv \mathbb{R}_2(0) = \mathbb{R}e_2$; $T_{p_0}\mathbb{R}^2 \equiv \mathbb{R}^2(p_0) = p_0 + \mathbb{R}^2$: the tangent spaces at the points $p_0$, $q_0$, $p_1$ and their standard subspaces $\mathbb{R}_1(p_0)$ and $\mathbb{R}_2(p_0)$
As a next step, we fix a point $p_0 \in \mathbb{R}^2$ and we add the vector $v$ so that we have $p_0 + v$, a new point which we consider as the point $p = p_0 + v \in \mathbb{R}^2$. The tangent space at $p_0$ is usually written
$\mathbb{R}^2_{p_0} = p_0 + \mathbb{R}^2.$
We denote by $\mathbb{R}^2_{q_0}$ and $\mathbb{R}^2_{p_1}$ the tangent spaces at the points $q_0$ and $p_1$. Note that in the literature both notations, $T_{p_0}\mathbb{R}^2$ and $\mathbb{R}^n_{p_0}$, are used.
All of the above is illustrated in Fig. 2.9.
It leads to the following definitions.
$p_0 + w \equiv (p_0, w).$
It is clear that we represent . p0 by two numbers (its coordinates) and the vector
part .v also by two numbers since we are in .R2 . Similarly, we may take another point
of application .q0 such that:
$v_{q_0} = q_0 + v \quad \text{or} \quad w_{q_0} = q_0 + w.$
Without doubt, the tangent vector $v_{q_0} = q_0 + v$ is different from the tangent vector $v_{p_0} = p_0 + v$. The point $q_0$ is different from $p_0$, but $v_{q_0}$ and $v_{p_0}$ are parallel to each other.
The tangent vectors $u_{p_0}$ and $v_{q_0}$ are equal if and only if $q_0 = p_0$ and $u = v$.
You may ask where the connection with physics is. In physics, we meet such objects very early! The instantaneous velocity of a point particle moving in $\mathbb{R}^2$ is exactly what we call here a tangent vector. Instantaneous velocity is, as we know, a vector, but it is never alone. The point of its application is the momentary position of the moving particle in $\mathbb{R}^2$.
$\alpha: \mathbb{R} \to \mathbb{R}^2, \quad t \mapsto \alpha(t),$
$T_p\mathbb{R}^2 \equiv \mathbb{R}^2_p := \{v_p := (p, v) : v \in \mathbb{R}^2\},$
consisting of all tangent vectors with $p$ the point of application, is called the tangent space of $\mathbb{R}^2$ at the point $p$.
Now, we may ask whether $T_p\mathbb{R}^2 \equiv \mathbb{R}^2_p$ is really a vector space. Yes, it is:
Addition: $u_p + v_p := p + u + v = p + (u + v) = (p, u + v).$
Scalar multiplication: $\lambda v_p := p + \lambda v = (p, \lambda v).$
The canonical identification is
$\mathbb{R}^2 \to T_p\mathbb{R}^2, \quad v \mapsto (p, v).$
$p \mapsto T_p\mathbb{R}^2.$
This is called the tangent space of $\mathbb{R}^2$ or, equivalently, the tangent bundle of $\mathbb{R}^2$.
$T\mathbb{R}^2 = \mathbb{R}^2 \times \mathbb{R}^2 = \{(p, v) : p \in \mathbb{R}^2, v \in \mathbb{R}^2\}.$
It is, in this special case, in bijection with the vector space $\mathbb{R}^4$: $T\mathbb{R}^2 \underset{\mathrm{bij}}{\cong} \mathbb{R}^4$.
Remark 2.13 Vector space and its dual, tangent space and its dual.
The dual of $\mathbb{R}^2$ is
$(\mathbb{R}^2)^* = \{\text{rows of length } 2\} = \{[\varphi_1\ \varphi_2] : \varphi_1, \varphi_2 \in \mathbb{R}\}.$
Correspondingly, we have
$T^*\mathbb{R}^2 \underset{\mathrm{bij}}{\cong} \mathbb{R}^2 \times (\mathbb{R}^2)^*.$
because this facilitates both our notation and our explanations. Going from vectors to vector fields, we may describe this by the map $F$ which we call a vector field. We use the usual simplified notation:
$F: \mathbb{R}^2 \to \mathbb{R}^2 \times \mathbb{R}^2\ (= T\mathbb{R}^2), \quad p \mapsto (p, F(p)).$
Similarly, a covector field is a map
$\Theta: \mathbb{R}^2 \to \mathbb{R}^2 \times (\mathbb{R}^2)^*\ (= T^*\mathbb{R}^2), \quad p \mapsto (p, \Theta(p)).$
We follow the same procedure with the basis and the cobasis of a vector space.
We then obtain a basis field and a cobasis field. This means that we get a family of
canonical basis vectors, related to $\mathbb{R}^2$.
The covectors $\varepsilon^1(p)$ and $\varepsilon^2(p)$ are the linear forms (covectors, linear functionals, linear functions)
$\varepsilon^1(p), \varepsilon^2(p): T_p\mathbb{R}^2 \to \mathbb{R}.$
For the canonical basis field which corresponds to the Cartesian coordinates $p = (x^1, x^2)$, we write:
$\frac{\partial}{\partial x^i} \equiv e_i \quad \text{and} \quad dx^i \equiv \varepsilon^i.$
The duality relation is given by
$dx^i\!\left(\frac{\partial}{\partial x^j}\right) = \frac{\partial x^i}{\partial x^j} = \delta^i_j.$
This may partially explain the use of the $\frac{\partial}{\partial x^i}$, $dx^j$ notation. Here we used the notation $e_i(p)$, $\varepsilon^i(p)$ because we wanted to emphasize the linear algebra background.
Summary
Beginning with the elementary part of linear algebra, we introduced and discussed
the first steps of all necessary mathematical concepts which the physics student
should already know and in fact has to use from day one in any theoretical physics
lecture. The reader is probably already familiar with most of these concepts, at least
in special cases. But we now offered careful definitions and a catalogue of basic
properties which will better equip readers to follow the pure physics content of any
introductory lecture in theoretical physics. Moreover, this introduction provided the
reader a solid foundation to explore linear algebra in the subsequent chapters.
Right from the outset, we introduced the concept of the dual vector space. This
aspect is often overlooked in physics lectures, leading to difficulties later on, espe-
cially in comprehending tensors.
In physics, abstract vector spaces are rarely encountered. For instance, in New-
tonian mechanics, we begin with an affine Euclidean space and immediately utilize
its model, the corresponding Euclidean vector space. What we accomplished in this
chapter, mathematically speaking, was jumping from an abstract vector space to an
affine Euclidean space and then back to an inner product space with significantly
more structure than the initial abstract linear structure. The reader learned to manip-
ulate structures within mathematical objects, adding and subtracting structures, a
skill utilized throughout the book.
We also introduced the concept of a quotient space. While not typically utilized
in physics until the end of master’s studies, standard quotient spaces in differential
topology and geometry are increasingly relevant in modern fields of physics, such
as gravitation and cosmology.
Finally, we address a topic never found in linear algebra books but crucial for
readers and one of the most important applications of linear algebra to physics:
Newtonian mechanics begins with the concept of velocity, particularly the velocity
at a fixed point. This velocity is essentially a vector in a tangential space where
motion occurs. Hence, in physics, we often encounter families of isomorphic vector
spaces, as discussed at the end of this chapter.
Exercise 2.1 The zero, the neutral element .0 of a vector space.V , is uniquely defined.
Prove that .0 is the only zero in .V .
Exercise 2.2 The inverse of any vector of a vector space .V is uniquely defined.
Prove that for any .v ∈ V there is only one additive inverse.
Exercise 2.3 A scalar times the vector .0 gives once more .0.
Prove that .λ0 = 0 for all .λ ∈ K.
Exercise 2.4 The number .0 ∈ K times a vector gives the zero vector.
Show that for all .v ∈ V , .0v = 0V .
Exercise 2.5 The number .−1 times a vector gives the inverse vector.
Show that for all $v \in V$, $(-1)v = -v$ holds.
Exercise 2.6 If a scalar times a vector gives zero, then at least one of the two is
zero.
Prove that if $\lambda \in \mathbb{K}$, $v \in V$, and $\lambda v = 0$, then either $\lambda = 0$ or $v = 0$.
Show that
(i) if $A, B \in \mathcal{C}$, then $AB \in \mathcal{C}$;
(ii) $AB = BA$;
(iii) if $J = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, then $JJ = -\mathbb{1}_2$;
(iv) check that, as a field, $\mathcal{C}$ is isomorphic to the complex numbers: $\mathcal{C} \cong \mathbb{C}$.
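A sketch of the matrix model of $\mathbb{C}$ suggested by this exercise (the precise definition of the set $\mathcal{C}$ is assumed here): $a + ib$ is represented by the real $2\times 2$ matrix $a\mathbb{1} + bJ$, and the claims (ii)–(iv) can be spot-checked numerically.

```python
import numpy as np

def as_matrix(z: complex) -> np.ndarray:
    """Represent a + ib by the 2x2 real matrix [[a, -b], [b, a]] (assumed model)."""
    a, b = z.real, z.imag
    return np.array([[a, -b],
                     [b,  a]])

J = as_matrix(1j)
assert np.allclose(J @ J, -np.eye(2))                               # J J = -1_2

z, w = 1 + 2j, 3 - 1j
assert np.allclose(as_matrix(z) @ as_matrix(w), as_matrix(z * w))   # multiplication matches C
assert np.allclose(as_matrix(z) @ as_matrix(w),
                   as_matrix(w) @ as_matrix(z))                     # AB = BA
```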
In the following four exercises, it is instructive to check that all the vector
space axioms hold on the given vector space. First, to check the axioms of the
commutative group .V and then the axioms connected to the .K scalar action on
the commutative group .V .
Show in each case that the given set, with the obvious operations, is a vector space:
$V = \{S \in \mathbb{R}^{n\times n} : S^T = S\},$
$W = \{H \in \mathbb{C}^{n\times n} : H^\dagger = H\},$
$V = \mathbb{R}^X = \operatorname{Map}(X, \mathbb{R}).$
Exercise 2.12 Square matrices have more structure than a vector space. They form
an algebra (see Definition 2.3).
Show that .Kn×n is an algebra.
The next four exercises are dealing with some aspects of subspaces.
Exercise 2.13 The union of two subspaces of a vector space .V is in general not a
subspace (see Comment 2.5). However, this exercise shows an exception.
Prove that the union of two subspaces .U and .W of .V is a subspace of .V if and only
if one of them is contained in the other.
$U \le W \le V \quad \text{or} \quad W \le U \le V.$
In the following two examples, we consider very special linear maps: Linear
functions, also called linear functionals or linear forms.
Exercise 2.21 The bidual $(V^*)^* = \operatorname{Hom}(V^*, \mathbb{R})$.
Show that for every $v \in V$ the map
$v^\#: V^* \to \mathbb{R}, \quad \xi \mapsto v^\#(\xi) := \xi(v),$
is linear, so that $v^\# \in (V^*)^*$.
Exercise 2.22 We consider the vector space of square matrices .V = Rn×n , and its
dual .V ∗ = (Rn×n )∗ .
Show that the trace $\operatorname{tr} \in (\mathbb{R}^{n\times n})^*$, given by
$\operatorname{tr}: \mathbb{R}^{n\times n} \to \mathbb{R}, \quad A \mapsto \operatorname{tr}(A) := \sum_{i=1}^{n} \alpha_{ii},$
is indeed a linear form.
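A quick numerical spot check (our own, with random matrices) that the trace behaves as an element of $(\mathbb{R}^{n\times n})^*$, i.e., as a linear form on the space of square matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
lam = 2.5

assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))   # additive
assert np.isclose(np.trace(lam * A), lam * np.trace(A))         # homogeneous
```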
$(v|u) = \bar v^T u = v^\dagger u = \sum_{i=1}^{n} \bar v^i u^i, \quad \text{where } v^i, u^i \in \mathbb{C}.$
Exercise 2.27 The polarization identity for a real inner product space $V$.
Show that for $u, v \in V$,
$(u|v) = \tfrac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right).$
$T_a: V \to V, \quad v \mapsto T_a(v) = a + v.$
Exercise 2.35 An affine subspace . A in a real vector space .V contains all its straight
lines.
Show that a subset . A in .V is an affine subset of .V if and only if for all .u, v ∈ A and
.λ ∈ R, .λv + (1 − λ)u ∈ A.
$A = \left\{ \sum_{i=0}^{k} v_i\alpha^i : \alpha^i \in \mathbb{K},\ i \in \{0, \dots, k\},\ \sum_{i=0}^{k} \alpha^i = 1 \right\}.$
Exercise 2.37 Affine subset in a real vector space .V from a list of vectors.
Show that for a list of vectors .(v0 , . . . , vk ) in .V , there exists an affine subset . A of .V
given by:
$A = \{v_0 + (v_i - v_0)\alpha^i : \alpha^1, \dots, \alpha^k \in \mathbb{R},\ i \in I(k)\}.$
$v \mapsto v + U,$
$W \cong V/U.$
Exercise 2.40 Isomorphism theorem for linear maps.
Let $f$ be a linear map, $f \in \operatorname{Hom}(V, V')$, with $V, V'$ vector spaces. Show that $f$ induces an isomorphism
$\bar f: V/\ker f \xrightarrow{\ \cong\ } \operatorname{im} f.$
In this chapter, we discuss in detail the basics of linear algebra. The first important
concepts in a vector space are linear combinations of vectors and related notions like generating systems, linearly independent and linearly dependent systems. This leads directly to the central notions of bases of vector spaces and their dimension.
Bases allow us to perform concrete calculations for vectors needed in physics,
such as assigning a list of numbers, the coordinates. This enormous advantage has
its price. The representation of an abstract vector by a list of numbers depends on
the chosen basis. Any theoretical calculation we do should obviously not depend on
the choice of a basis. We discuss in detail the satisfactory but demanding solution to
this problem in Sect. 3.2 which can be skipped on a first reading.
We then demonstrate a suitable choice of basis for the representation of linear
maps, and we discuss the origin of tensors in an elementary manner. Finally, we
provide an important application for physics, and show that the transition from Newtonian mechanics to Lagrangian mechanics is nothing but the transition from a linearly dependent to a linearly independent system.
As mentioned above, one main reason for the existence of a basis stems from the
scalar action of .K which gives the possibility of scaling the elements of a basis and
thus to obtain, after adding, all the elements of the given vector space .V .
In addition, we need an abstract and fundamental property that characterizes a
basis: the notion of linear independence or linear dependence (see Definition 3.4).
Before we can understand this, however, one must first understand the notions of
linear combinations and span.
$\alpha^1 v_1 + \alpha^2 v_2 + \dots + \alpha^k v_k$
with the coefficients or scalars $\alpha^i \in \mathbb{K}$ and the vectors $v_i \in V$, $i \in I(k)$, $k \in \mathbb{N}$. $\operatorname{span}(v_1, \dots, v_k)$ denotes the set of all such linear combinations:
$\operatorname{span}(v_1, \dots, v_k) := \left\{ \sum_{i=1}^{k} \alpha^i v_i : \alpha^i \in \mathbb{K} \right\}.$
A linear combination is trivial if all the coefficients are zero, and otherwise,
it is nontrivial. Since the linear combination is by definition a finite sum of
vectors, we use the Einstein convention for the sum, and we denote
$\alpha^1 v_1 + \alpha^2 v_2 + \dots + \alpha^k v_k = \sum_{i=1}^{k} \alpha^i v_i =: (\alpha^i v_i)_k = \alpha^i v_i \in V.$
We study a list of columns in $\mathbb{K}^n$: $A = (\vec a_1, \vec a_2, \dots, \vec a_k)$ and the associated $n \times k$-matrix with the columns $\vec a_s \in \mathbb{K}^n$, $s \in I(k)$, given by $[A] = [\vec a_1\ \vec a_2\ \dots\ \vec a_k]$. We write $[A]$ for this matrix to distinguish it from the list $A = (\vec a_1, \vec a_2, \dots, \vec a_k)$. Note that we use three kinds of brackets: “()” usually for a list, “[]” for a matrix, and the standard “{}” for a set. We identify of course $[A]$ with $A$ automatically, but sometimes it is useful to make the difference. We can then define the map
$\psi_A: \mathbb{K}^k \to \mathbb{K}^n, \quad \vec\lambda = \begin{bmatrix}\lambda^1\\ \vdots\\ \lambda^k\end{bmatrix} \mapsto \psi_A(\vec\lambda) := [A]\vec\lambda = [\vec a_1\ \vec a_2\ \dots\ \vec a_k]\begin{bmatrix}\lambda^1\\ \vdots\\ \lambda^k\end{bmatrix} = \vec a_s\lambda^s.$
The range $\operatorname{im}\psi_A$ is of course also the set of all possible linear combinations of the list $A$, and so we have $\operatorname{im}\psi_A \le \mathbb{K}^n$. We see immediately that $\operatorname{im}\psi_A$ is a subspace of $\mathbb{K}^n$ because it is the image of a linear map. Thus we also have another proof that $\operatorname{span} A$ is a subspace of $\mathbb{K}^n$.
Usually, we use the map .ψ A when the list . A is a basis. Then the map .ψ A is a basis
isomorphism and is also called a parametrization. This is extensively discussed in
Sect. 3.2.1. But here, . A is taken as an arbitrary list, as it is not necessary to have a
basis to obtain just a linear combination (see Comment 3.1). The same can be applied
to a linear combination with .as ∈ V instead of .a→s ∈ Kn :
$\psi_A: \mathbb{K}^k \to V, \quad \vec\lambda \mapsto [a_1\ a_2\ \dots\ a_k]\begin{bmatrix}\lambda^1\\ \vdots\\ \lambda^k\end{bmatrix} = a_s\lambda^s \in V.$
So we again get $[A] = [a_1 \dots a_k]$ and $\operatorname{im}\psi_A \le V$ and, as above, we can also define for $A = (a_1, \dots, a_k)$
$[A]: \mathbb{K}^k \to V, \quad \vec\lambda \mapsto [A]\vec\lambda = [a_1 \cdots a_k]\vec\lambda = [a_1 \cdots a_k]\begin{bmatrix}\lambda^1\\ \vdots\\ \lambda^k\end{bmatrix} \in V.$
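A concrete instance of the map $\psi_A$ (the columns and coefficients are invented for illustration): $\psi_A(\vec\lambda) = [A]\vec\lambda$ is exactly the linear combination $a_s\lambda^s$ of the columns of $[A]$, and $\operatorname{im}\psi_A = \operatorname{span} A$.

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0],
              [1.0, 1.0, 4.0]])        # [A] = [a_1 a_2 a_3], columns in R^3
lam = np.array([2.0, -1.0, 0.5])

psi = A @ lam                          # psi_A(lambda) = [A] lambda
by_hand = 2.0 * A[:, 0] - 1.0 * A[:, 1] + 0.5 * A[:, 2]
assert np.allclose(psi, by_hand)       # the same linear combination a_s lambda^s

# im psi_A = span(a_1, a_2, a_3); here a_3 = 2 a_1 + 2 a_2, so the image is a plane
print("dim im psi_A =", np.linalg.matrix_rank(A))   # 2
```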
$\alpha^i v_i + \beta^i v_i = (\alpha^i + \beta^i)v_i; \qquad \blacksquare$
Remark 3.1 The set.U = span(v1 , . . . , vk ) is the smallest subspace of.V con-
taining the list .(v1 , . . . , vk ).
Proof Indeed, a subspace .W of .V containing all .vi of the above list, contains also
all linear combinations of them: .span(v1 , . . . , vk ) ⊂ W . This means that every such
. W is necessarily bigger than .U . Hence .U < W . ∎
An element of $\mathbb{R}^n$ is a vector taken as a column, which is an $n \times 1$ matrix with scalar entries which are, here, real numbers. According to our given convention, this corresponds to a colist of numbers.
Until now, column and colist can be considered as synonyms, particularly if
we consider the elements of .Rn as points. But sometimes, as stated above,
it is necessary and useful to draw the following distinction, for instance if
we multiply matrices: A list and a colist are simply there as a set, as a given
data, and if we want, we can later define some algebraic operations and it is
usually uniquely clear from the context what we mean. Matrices are of course
the well-known algebraic objects. In other words, a .1 × n-matrix with vector
entries (columns) is more than simply a list. For a colist of length .m with any
element, we may write for example symbolically:
$\begin{pmatrix} *^1 \\ *^2 \\ \vdots \\ *^m \end{pmatrix}.$
We may add and multiply the elements $*$ of the above matrix with each other. So we may now write for the $m \times 1$-matrix (column) with entries $*$:
$\begin{bmatrix} *^1 \\ \vdots \\ *^m \end{bmatrix},$
$[*] \quad \text{or} \quad [*^1 + \dots + *^n].$
$v = \begin{bmatrix} v^1 \\ \vdots \\ v^i \\ \vdots \\ v^m \end{bmatrix}$
and
$a = \begin{bmatrix} \alpha^1 \\ \vdots \\ \alpha^i \\ \vdots \\ \alpha^m \end{bmatrix},$
with $v^i, \alpha^i \in \mathbb{R}$, $i \in I(m)$.
$\theta = (\vartheta_1, \dots, \vartheta_s, \dots, \vartheta_n), \quad \vartheta_s \in \mathbb{R},\ s \in I(n).$
We may associate this with the $1 \times n$ row matrix, with the same symbol $\theta$ and without the commas:
$\theta = [\vartheta_1\ \vartheta_2 \cdots \vartheta_n].$
For the sake of completeness, we write the associated colists of scalars, here numbers, for the vectors $v$ and $a$ above:
$\begin{pmatrix} v^1 \\ \vdots \\ v^m \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \alpha^1 \\ \vdots \\ \alpha^m \end{pmatrix}.$
For the list of vectors $(v_1, \dots, v_k)$ we write
$[v_1 \dots v_k],$
which is a row matrix with vector entries. Similarly, for the list of covectors $\theta^1, \dots, \theta^k \in V^*$ we write the colist
$\begin{pmatrix} \theta^1 \\ \vdots \\ \theta^k \end{pmatrix}.$
Note that a vector in .Kn corresponds to a colist (vertically written list) of scalars and
a covector in .Kn to a list of scalars.
We can now proceed with a few additional and important definitions to get to
the notion of a basis in vector spaces which is a very special list of vectors with
appropriate properties.
. V = span(a1 , . . . , ak )
is equivalent to the usual and more abstract algebraic formulation, as we shall show
below. This sounds good, but it is difficult to check.
As we see, we have to check here a yes-no question. This, and all the equivalent
definitions of linearly independent and linearly dependent, refer to an abstract vector
space without any other structures, as, for example, volume and scalar product. Even
more, it should be clear, that in this section, we do not know yet what a basis in a
vector space is. This situation makes it difficult to recognize directly the geometric
character of the above definition.
For our demonstration, we therefore choose our standard vector space .Rn , n ∈
N. Here, we know of course the dimension, the volume, and the distance, and we
have the canonical basis . E = (e1 , . . . , en ) in .Rn . In order to proceed, we have to
remember what we mean by a .k-volume in a fixed .Rn where .(k < n). This is in itself
interesting enough. We consider the .k-vectors .(a1 , . . . , ak ) which usually should
define a nondegenerate .k- parallelepiped . Pk := Pk (a1 , . . . , ak ). We know its volume
.vol k (Pk ) = vol k (a1 , . . . , ak ), a positive number (we do not need the orientation,
here), which we may call .k-volume. It is clear that in the case that . Pk is degenerate,
we have $\operatorname{vol}_k(a_1, \dots, a_k) = 0$. Similarly, if we consider $k < n$ (in particular $k \ne n$), we have, in an obvious notation, $\operatorname{vol}_n(a_1, \dots, a_k) = 0$. For all that, we have of course
our experience with our three-dimensional Euclidean space. The generalization to
every fixed .n ∈ N is quite obvious. In connection with this, it is useful to think of
the following sequence of subspaces given by
We may also see immediately the following results. The $n$-dimensional volume satisfies
$\operatorname{vol}_n(P_k) \begin{cases} > 0 & \text{if } k = n \text{ and } P_k \text{ is nondegenerate}, \\ = 0 & \text{if } k < n, \\ \text{is not defined} & \text{if } n < k. \end{cases}$
It is not surprising that this can be expressed with the help of determinants. The
parallelepiped . Pk (a1 , . . . , ak ) corresponds to the matrix . Ak = [a1 · · · ak ] and the
Euclidean volume is given by $\operatorname{vol}_k(P_k) = |\det A_k|$ (for $k = n$). Taking into account that every list $(a_1, \dots, a_m)$ also corresponds to a parallelepiped, the above result may be stated differently:
If the list . Am = (a1 , . . . , am ) “produces” enough dimension (enough “space”),
which means that .dim(span Am ) = m, we have .volm (Am ) positive (nonzero). If this
list “produces” not enough dimension, which means that .dim(span Am ) < m, we
have $\operatorname{vol}_m(A_m) = 0$. Furthermore, since we are only interested in whether the value is positive or zero, we may reduce this to a yes-or-no alternative.
In the above sense, we can say, to simplify, that linearly independent means “enough
space” and linearly dependent “not enough space”.
The list $(a_1, \dots, a_m)$ is linearly independent if the equation $a_i\xi^i = 0$ has only the trivial solution $\xi^i = 0$ for all $i$. Otherwise, the list $(a_1, \dots, a_m)$ is linearly dependent, which means that the equation $a_i\xi^i = 0$ has a nontrivial solution.
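A numerical version of this criterion (the vectors are example data, not from the text): the equation $a_i\xi^i = 0$, i.e., $[A]\vec\xi = 0$, has only the trivial solution exactly when $\operatorname{rank}[A]$ equals the length of the list.

```python
import numpy as np

def is_linearly_independent(*vectors):
    """True iff [A] xi = 0 has only the trivial solution."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

a1, a2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
a3 = a1 + 2 * a2                        # deliberately dependent

print(is_linearly_independent(a1, a2))       # True
print(is_linearly_independent(a1, a2, a3))   # False: xi = (1, 2, -1) is a nontrivial solution
```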
The next lemma shows a property of a linearly independent list which underlines
the importance of being linearly independent. As we shall see, this property turns
out to be an essential property of a basis in .V .
$0 = a_i\xi^i - a_i\eta^i = a_i(\xi^i - \eta^i) \in U,$
$\xi^i - \eta^i = 0 \Leftrightarrow \xi^i = \eta^i.$
The next lemma tells us essentially that the informal definition (see Definition
3.4) and Definition 3.5 are equivalent. It is closer to the geometric aspects of the
definition. This means that in a linearly independent list, a vector loss leads to the
loss of spanning space. For a linearly dependent list, there is always a redundant
vector, the absence of which leaves the spanning space of the list invariant. For
example, in the next lemma the vector .a j is redundant.
Proof We show that (i) .⇔ (ii) and (i) .⇔ (iii) which establishes the result. For this
purpose, we define
$a_i\xi^i = 0 \qquad (3.1)$
has a nontrivial solution. This means that, for example, there is some $j$ for which $\xi^j \ne 0$. Without loss of generality, by scaling if necessary, we may put $\xi^j = -1$ and obtain
$a_j = a_s\alpha^s, \quad \alpha^s \in \mathbb{K}. \qquad (3.3)$
(3.3)
If $v \in \operatorname{span} A$, we have
$v = a_i v^i = a_s v^s + a_j v^j \quad \text{with } j = j_0 \text{ fixed},\ v^i \in \mathbb{K}. \qquad (3.4)$
We insert Eq. (3.3) into Eq. (3.4) with $j = j_0$ fixed ($v^i \in \mathbb{K}$) and we so obtain
$v = a_s v^s + (a_s\alpha^s)v^j, \qquad (3.5)$
which is
$v = a_s(v^s + \alpha^s v^j) \quad \text{with } v^s + \alpha^s v^j \in \mathbb{K}, \qquad (3.6)$
and
$-a_j + a_s\alpha^s = 0.$
The next lemma refers to a special feature of a spanning list. A spanning list is
very “near” to a linearly dependent list.
Proof This is almost trivial: Since $V = \operatorname{span} A_m$, we have $v \in \operatorname{span} A_m$, which means that $-v = a_i\xi^i$, $i \in I(m)$. This is equivalent to $v + a_i\xi^i = 0$, which shows that $A_{m+1}$ is linearly dependent. ∎
But in our case, here in this section, we cannot use the above considerations. In
this section, up to now with an abstract vector space $V$, we have not yet defined what a basis is. Nor do we know yet what the dimension of a vector space is. The following proposition will help us, among other things, to prove the existence of a basis in $V$.
The length of a linearly independent list is less than or equal to the length of a spanning list in $V$.
different numbering.
The next step leads quite similarly to $C_2 = (a_1, a_2, c_3, \dots, c_m)$. Proceeding in the same way, we obtain $C_{r-1} = (a_1, \dots, a_{r-1}, c_r, \dots, c_m)$, a spanning list again.
So far, we discussed two important properties for a given fixed number of vectors $(a_1, \dots, a_m)$ in $V$. Such a list of vectors can be linearly independent or not, and spanning or not. The possibility of a list of vectors being both linearly independent and spanning seems more attractive than the other three possibilities. This leads to
the definition of a basis for a finitely generated vector space .V which we consider in
this book.
Proof We show that (i) .⇔ (ii), (i) .⇔ (iii), and (i) .⇔ (iv) which clearly establishes
the result.
(i) .⇒ (ii): Given (i), every .v ∈ V is a linear combination of vectors in . B, since
. B, according to (i), is also linearly independent. The above linear independence and
uniqueness lemma states that this linear combination is unique. So we proved (ii).
(ii) .⇒ (i): Given (ii), every vector .v is a linear combination of vectors in . B, so
. B spans . V . Since this linear combination is unique, the linear independence and
uniqueness lemma tells that . B is also linearly independent. This proves (i). So we
proved the statement (i) $\Leftrightarrow$ (ii).
(i) .⇒ (iii): Given (i), we have to show that the linearly independent list . B is
maximal. Since . B spans .V , if we add any vector .v ∈ V to . B, we get .(B, v) =
(b1 , . . . , bn , v), and according to the above remark, this list is now linearly dependent,
and not linearly independent any more. This means that . B is linearly independent
and maximal. This proves (iii).
(iii) .⇒ (i): We have to show that . B is linearly independent and spans .V . Since . B
is already linearly independent and maximal, we have only to show that . B spans .V .
. B being maximally linearly independent, if we add any .v ∈ V , we get .(B, v) which
is now linearly dependent. Therefore, .v is a linear combination of . B. So . B spans .V
and (i) is proven. This is why the statement (i) .⇔ (iii) is proven too.
(i) .⇒ (iv): (i) means that . B = (b1 , . . . , bn ) spans .V and is linearly independent.
According to the linearly dependent lemma above, if we delete a vector of this list
and write, for example . B0 = (b1 , . . . , bn−1 ), then . B0 does not span .V any more. So
. B spans . V and is minimal. This proves (iv).
(iv) $\Rightarrow$ (i): We start with a spanning list $B$ that is minimal. This means, for example, that the list $B_0 = (b_1, \dots, b_{n-1})$ does not span $V$ any more: $\operatorname{span}(B_0) \ne \operatorname{span}(B)$. In this
case, the linearly dependent lemma tells us that . B is linearly independent. So. B spans
. V and is linearly independent. This proves (i) and we proved the statement (i) .⇔
(iv). ∎
We considered all the above conditions in detail and will do so as well in what
follows because a basis is our best friend in linear algebra!
The existence of a basis is given by the following proposition:
Proof This can be seen as follows. Since .V is finitely generated, we can start by
a spanning list: Say .span(v1 , . . . , vm ) = V . If the list is not linearly independent,
we throw out some vectors of it until we obtain a minimally spanning list. This is,
according to the above proposition, a basis of .V . ∎
The existence of bases does not mean a priori that every basis for .V has the same
number of vectors (the same length). But, as the next corollary shows, it does.
In a finitely generated vector space $V$, every basis has the same finite number of vectors.
Proof Here we can apply Proposition 3.2: the length of a linearly independent list is less than or equal to the length of a spanning list. We start with the two bases $B$ and $C$. $B$ is linearly independent and $C$ spans $V$, so we have $\#(B) \le \#(C)$. Similarly, $C$ is linearly independent and $B$ spans $V$, so that $\#(B) \ge \#(C)$. It follows that $\#(B) = \#(C)$. ∎
This means that the number of vectors in a basis, the length of a basis, is universal
for all bases in a vector space .V , and this is what we call the dimension of .V .
The dimension of a finitely generated vector space is the length of any basis.
We denote the dimension by $\dim_\mathbb{K} V$. The dimension depends on the field $\mathbb{K}$. If the field for $V$ is clear, we may write $\dim V$, but we have to keep in mind that, for example, $\dim_\mathbb{R} V \ne \dim_\mathbb{C} V$. It now becomes more apparent that the characteristic data of a
vector space .V are the field .K and its dimension. This also justifies the isomorphism
$V \cong \mathbb{K}^n$. But since $\mathbb{K}^n$ has much more structure than the abstract vector space $V$ with $\dim_\mathbb{K} V = n$, the isomorphism refers only to those structures in $\mathbb{K}^n$ which correspond to the structure of $V$.
Now that we have a basis . B = (b1 , . . . , bn ) for .V , we may ask how many bases
exist for .V . As we already mentioned, Poincaré might have proposed this question to
Einstein. We will discover that this goes very deeply into what relativity is (see also
Chap. 4). Apart from this, to understand linear algebra, it is fundamental to have a
good understanding of the space of bases. But here, the question initially arises what
the individual bases are useful for.
It is therefore helpful to discuss what a given basis makes of an abstract vector
space and, in particular, what a basis makes of a vector.
A given basis . B = (b1 , . . . , bn ) determines for each abstract vector .n numbers,
its coordinates. This leads to the parameterization of the given abstract vector space,
using the standard vector space .Kn and, in particular, it allows to describe each
$v = \sum_{i=1}^{n} \xi^i b_i = (\xi^i b_i)_n.$
We shall mostly denote scalars by small Greek letters, vectors by small Latin letters, and matrices by capital letters; covectors are also denoted by small Greek letters, taking care not to confuse them with scalars.
The scalars .ξ i are also called coefficients or components of .v with respect to
the basis . B. We use the Einstein convention for the summation and in addition
some obvious notations, as usual, setting, for example, .(ξ i bi )n = ξ i bi whenever no
confusion is possible. Further on, we consider the column vector or column matrix $\vec\xi$:
$\vec\xi = \begin{bmatrix} \xi^1 \\ \vdots \\ \xi^n \end{bmatrix} = (\xi^i)_n = (\xi^i)$
as an element of .Rn identifying .Rn with .Rn×1 (the column-matrices) and using again
an obvious notation. What follows corresponds to Sect. 3.1 and to the notation pre-
sented there. For a fixed . B, this leads to a bijection between the elements of .Rn
and .V , as .ξ→ ↔ v, or more precisely to a linear bijection or isomorphism .ψ B (basis
isomorphism):
$\psi_B: \mathbb{R}^n \to V, \quad \vec\xi \mapsto \psi_B(\vec\xi) := \xi^i b_i = [B]\vec\xi = [b_1 \dots b_n]\begin{bmatrix} \xi^1 \\ \vdots \\ \xi^n \end{bmatrix}.$
$\varphi_B: V \to \mathbb{R}^n, \quad v \mapsto \varphi_B(v) = v_B = \vec\xi \in \mathbb{R}^n.$
The linear map .φ B , given by the basis . B, is also called a representation. It is, as well
as .ψ B , a linear bijection, an isomorphism, given by the basis . B and therefore also
called basis isomorphism.
We might want to identify $\psi_B$ with $B$, $\psi_B^{-1} = \varphi_B$ with $B^{-1}$, and $[B]$ with $B$, and write
$B: \mathbb{R}^n \xrightarrow{\ \cong\ } V \quad \text{and} \quad B^{-1}: V \xrightarrow{\ \cong\ } \mathbb{R}^n. \qquad (3.7)$
$B(V) = \mathrm{Iso}(\mathbb{R}^n, V) = \{\psi_B\}.$
This point of view allows determining the spaces . B(V ) and finding the correct
behavior under bases and coordinate changes. It also allows us to determine precisely
what a coordinate-free notion means in a formalism, mainly when this formalism
depends explicitly on coordinates. This is also the case with tensor calculus.
What follows is a very interesting and important example and application of Sect.
1.3 about the group action and the definitions there. The key observation is that the
group .Gl(n) of linear transformations (“transformation” in this book is a synonym
for “bijection”) in .Rn , surprisingly acts also on the space . B(V ), even if .V is an
abstract vector space where the dimension .n is not visible as with .Rn . On the other
hand, it is clear that we have the following .Gl(n) actions on .Rn :
$Gl(n) \times \mathbb{R}^n \to \mathbb{R}^n, \quad (g, \vec\xi) \mapsto g\vec\xi, \qquad (3.8)$
and, with $g: \mathbb{R}^n \to \mathbb{R}^n$ and $[B]: \mathbb{R}^n \cong V$, the composition
$\mathbb{R}^n \xrightarrow{\ g\ } \mathbb{R}^n \xrightarrow{\ [B]\ } V, \quad \text{that is, } [B] \circ g. \qquad (3.9)$
The following proposition concerning the .Gl(n) action on . B(V ) which we give
without proof, answers our question about the space of bases in .V .
$Gl(n) \xrightarrow{\ \mathrm{bij}\ } B(V), \quad g \mapsto \Psi(g) := B_0 g, \quad B_0 \in B(V).$
This means that there is a bijection between the elements of . B(V ) and the elements
of .Gl(n) (i.e., . B ↔ g). Because of transitivity, for a fixed basis . B0 , for each . B, we
have .∃!g ∈ G with . B = B0 g.
. B = B0 g or B(V ) = B0 Gl(n).
. B(V ) is an orbit of .Gl(n) relative to . B0 . Note that . B(V ) is the so-called .Gl(n) torsor.
A space which is an orbit of a group .G is called a homogeneous space (see
Remark 1.2). So . B(V ) is a homogeneous space of the group .Gl(n). A free action
means that for every . B1 and . B2 ∈ B(V ) there exists a unique .g12 ∈ Gl(n) so that
. B2 = B1 g12 . This is analogous to the connection between a vector space . V and its
associated affine space. The homogeneous space . B(V ) corresponds to the affine
space . X = (V, T (V ), τ ) as discussed in Sect. 2.5 and the group .Gl(n) corresponds
to the abelian group .V .
The group .Gl(n) is also called the structure group of the vector space .V ∼ = Rn .
Relativity here means that the geometric object .v ∈ V can be represented by an
element of $\mathbb{R}^n$ relative to the coordinate system $B$ as $\psi_B^{-1}(v) = v_B \in \mathbb{R}^n$. This is what we may call the $Gl(n)$ relativity, and in connection with this, the $Gl(n)$ group is also called the structure group and, usually in physics, the symmetry group of the theory.
Our next step is to construct a new vector space .Ṽ which contains in a precise way
all the representations of the vectors .v in .V and can be identified with our original
vector space .V . As we shall see, the result is a coordinate-free formulation of the
. Gl(n)-relativity of . V . Coordinate-free here means, by the explicit use of coordinate
systems, that we work with all coordinate systems simultaneously. In other words,
the tensor calculus, as applied in physics and engineering, which depends explicitly
on coordinates, can be formulated in a precise coordinate-independent way. In this
sense, it is equivalent to any coordinate-free formulation if done right in a consistent
notation.
The vector space $\widetilde V$ is given as a set $\widetilde V = \{\tilde z, \tilde y, \tilde x, \dots\}$ of $Gl(n)$-equivariant maps from $B(V)$ to $\mathbb{R}^n$ (see Definition 1.9). As we saw, $B(V)$ is a right $Gl(n)$ space (see Definition 1.4) and we consider $\mathbb{R}^n$ as a left $Gl(n)$ space. As we saw in Eqs. (3.8) and (3.9), both actions are canonically given. This justifies the equivariance property
and (3.9), both actions are canonically given. This justifies the equivariance property
we demand, so we have for .z̃ ∈ Ṽ
. z̃ : B(V ) −→ Rn ,
B |−→ z̃(B),
with
. z̃(Bg) = g −1 z̃(B). (3.10)
See also Comment 1.1 on the meaning of the right action. This may also be shown
by the commutative diagram
$\begin{array}{ccc} B(V) & \xrightarrow{\ \tilde z\ } & \mathbb{R}^n \\ g\downarrow & & \downarrow g^{-1} \\ B(V) & \xrightarrow{\ \tilde z\ } & \mathbb{R}^n \end{array} \qquad (3.11)$
In Eq. (3.11), we interpret the .Gl(n) action on .Rn also as a right action:
$\mathbb{R}^n \times Gl(n) \to \mathbb{R}^n, \quad (\vec\xi, g) \mapsto g^{-1}\vec\xi. \qquad (3.12)$
Definition 3.8 The equivariant vector space $\widetilde V$.
Taking into account Eqs. (3.10), (3.11), and (3.12), the vector space $\widetilde V$ can be written, with $\mathcal{B} := B(V)$, as
$\widetilde V := \widetilde{\operatorname{Map}}(\mathcal{B}, \mathbb{R}^n) = \operatorname{Map}_{\mathrm{equ}}\big(B(V), \mathbb{R}^n\big). \qquad (3.13)$
For good reasons, we may call $\widetilde V$ the equivariant vector space of $V$. We have here an example of a “complicated” vector space (see Comment 2.9 in Sect. 2.6)! $\widetilde V$ is a vector space since its elements are vector-valued ($\mathbb{R}^n$-valued) maps. Furthermore, $\dim\widetilde V = \dim V$ holds since $\tilde z \in \widetilde V$ is equivariant and the group $Gl(n)$ acts transitively on $B(V)$. So if we define $\tilde z$ at one given $B_0 \in B(V)$, then its value is also given, by the equivariance property, in every other basis $B$, for example $B = B_0 g$. This leads to
The vector space $\widetilde V = \widetilde{\operatorname{Map}}(\mathcal{B}, \mathbb{R}^n)$ is canonically isomorphic to $V$, so we have $V \underset{k}{\cong} \widetilde V$.
Proof We already know that $\widetilde V$ is a vector space and that its dimension is $\dim\widetilde V = n$. Addition and scalar multiplication are given pointwise: for $\tilde z, \tilde y \in \widetilde V$, $\tilde z, \tilde y: B(V) \to \mathbb{R}^n$, we set $(\tilde z + \tilde y)(B) := \tilde z(B) + \tilde y(B)$ and $(\lambda\tilde z)(B) := \lambda\tilde z(B)$. The canonical map is
$k: V \to \widetilde V = \widetilde{\operatorname{Map}}(\mathcal{B}, \mathbb{R}^n), \quad v \mapsto k(v) =: \tilde v,$
with
$\tilde v(B) := B^{-1}(v) \in \mathbb{R}^n. \qquad (3.15)$
$B: \mathbb{R}^n \to V, \qquad B \circ g: \mathbb{R}^n \xrightarrow{\ g\ } \mathbb{R}^n \xrightarrow{\ B\ } V, \qquad (B \circ g)^{-1}: V \xrightarrow{\ B^{-1}\ } \mathbb{R}^n \xrightarrow{\ g^{-1}\ } \mathbb{R}^n,$
so that $\tilde v(Bg) = (B \circ g)^{-1}(v) = g^{-1}\big(B^{-1}(v)\big) = g^{-1}\tilde v(B)$. This represents simultaneously the effect of any change of basis and shows that $k(v) \equiv \tilde v$ is indeed equivariant. The proposition is proven, and the identification between $V$ and $\widetilde V$ is established. ∎
For example, $[v]_B$ is an $n \times 1$ column matrix with scalar entries. The diagram (3.11) and Eq. (3.12) express a change of basis via equivariance: $B \mapsto B' = Bg$ and we get
$\vec v_{B'} = \tilde v(B') = \tilde v(Bg) = g^{-1}\tilde v(B) = g^{-1}\vec v_B. \qquad (3.18)$
If we set $g^{-1} = h$, we have
$\vec v_{B'} = \tilde v(B') = \tilde v(Bh^{-1}) = h\,\tilde v(B) = h\,\vec v_B. \qquad (3.19)$
The last equation is the usual form for a change of basis for the coefficient vectors.
In what follows, we recall change of basis in the standard form, as usually done in
physics.
Taking a second basis $C = (c_1, \dots, c_n)$ for $V$, we have analogously $\tilde v(C) = v_C = \vec v_C = [v^i]_C \in \mathbb{R}^n$. So there exists a matrix $T \in Gl(n)$ with scalar entries $\tau^i_s \in \mathbb{R}$,
$T = (\tau^i_s) \equiv [\tau^i_s],$
so that
$\psi_B = \psi_C \circ T \quad \text{or equivalently} \quad B = CT \Leftrightarrow C = BT^{-1}. \qquad (3.20)$
So we have for $v \in V$
$v = \psi_B(\vec v_B) = \psi_C(\vec v_C) \qquad (3.21)$
in various notations:
$v = \psi_B(\vec v_B) \equiv [B][v^i]_B = B\vec v_B = C\vec v_C = \psi_C(\vec v_C). \qquad (3.22)$
The result of the map $\tilde v$ is again given by $\tilde v(C) = \vec v_C$. Using Eqs. (3.20), (3.21), and (3.22), we can write
$C\vec v_C = B\vec v_B = CT\vec v_B, \quad \text{hence} \quad \vec v_C = T\vec v_B. \qquad (3.23)$
The result of Eq. (3.23), $\vec v_C = T\vec v_B$, is exactly the equivariance property of $\tilde v$.
The appearance here of .v ∈ V as the map .ṽ which is explicitly coordinate (basis)
dependent, legitimizes the formalism of coordinates of linear algebra and the tensor
calculus to be as rigorous as any coordinate-free formulation.
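The change-of-basis relations (3.20)–(3.23) can be checked numerically. In the sketch below the two bases of $\mathbb{R}^2$ are invented for illustration; $B = CT$ and the coefficient vectors transform as $\vec v_C = T\vec v_B$.

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [0.0, 1.0]])             # columns: basis B = (b_1, b_2)
C = np.array([[2.0, 0.0],
              [0.0, 1.0]])             # columns: basis C = (c_1, c_2)

T = np.linalg.inv(C) @ B               # B = C T
v_B = np.array([3.0, -1.0])            # coordinates of v in basis B
v = B @ v_B                            # the vector itself: v = psi_B(v_B)

v_C = T @ v_B                          # change of basis for the coefficients
assert np.allclose(C @ v_C, v)         # psi_C(v_C) gives the same vector v
```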
There is, in addition, a different formalism which shows another aspect of coordinate
independence. There is a second canonical isomorphism of vector spaces:
$\bar V \cong V,$
where .V̄ is defined in analogy to the vector bundle formalism and by using again
the .Gl(n) action on .B(V ) and .Rn . We will discuss this here shortly since it offers
a different but equivalent point of view. Besides that, it is a very interesting and
important example of Sect. 1.2 dealing with quotient spaces.
We consider the set . M := B(V ) × Rn = {(B, x→)} which is the space of pairs
(basis, coordinate vector). . M is canonically a .Gl(n) space. Setting .G = Gl(n) we
have an action defined as:
$M \times G \to M, \quad \big((B, \vec x), g\big) \mapsto (Bg, g^{-1}\vec x) =: (B, \vec x)g.$
It is not difficult to recognize that this (basis, coordinate vector) class corresponds bijectively to a unique vector in $V$. We expect for example $[B, \vec x] \leftrightarrow \psi_B(\vec x)$. This leads to the definition ($G := Gl(n)$):
$\bar V := M/G = \big(B(V) \times \mathbb{R}^n\big)/Gl(n) = \{[B, \vec x]\}.$
We see that $\bar V$ is a $G$ orbit space since every element $[B, \vec x] = (B, \vec x)G$ is a $G$ orbit; see also Remark 1.2 on homogeneous spaces. Then the following proposition is valid.
$\bar V = \big(B(V) \times \mathbb{R}^n\big)/Gl(n)$ is a vector space and is canonically isomorphic to $V$.
$\lambda: \bar V \to V, \quad [B, \vec x] \mapsto \psi_B(\vec x).$
We may call $\bar V$ the associated vector space to $V$ and $\lambda$ the isomorphism of the structure, since the vector space $\bar V$ can also be considered as a model for the abstract vector space.
Scalar multiplication is given by
$\alpha[B, \vec x] := [B, \alpha\vec x], \quad \alpha \in \mathbb{R}. \qquad \blacksquare$
So we have, with the isomorphism $\bar V \cong V$,
$\mathbb{R}^n \cong V \cong \widetilde V \cong \bar V\,!$
We have already demonstrated the usefulness of a basis for .V . This makes it possible
to express an abstract vector .v ∈ V simply by a list of numbers (scalars). This allows
us to communicate to everybody this special vector just by numbers. The price for
this achievement is for example that, in the end, one basis is not enough and that we
have to consider all the bases . B(V ) altogether. This is demonstrated in Sect. 3.2 by
the equivariant map .ṽ : B(V ) → Rn . Stated differently, the price is that instead of a
single element .v, we have to know a special function .ṽ or the equivalence class .[v].
That is, we believe, a fair price!
It gives us even more: with given bases in .V and .V ' , we can, in addition, describe
a linear map . f ∈ Hom(V, V ' ) with a finite amount of numbers. This list of numbers
is organized as a matrix, as is well known, and there is a linear bijection, i.e., an isomorphism, between linear maps and matrices. In addition, many properties and many
proofs within the category of vector spaces can be easily formulated by explicitly
using bases. This is what we are going to demonstrate in what follows. Even more,
we are going to realize in this book that bases are our best friends in linear algebra.
The proposition below shows that a linear map is uniquely determined by its values on the basis vectors of the domain space:
$f(v) := f(\xi^i b_i) := \xi^i w_i. \qquad (3.24)$
This shows that the value $f(v)$ is given uniquely: the coefficients $\xi^i$ are uniquely defined and the values $w_i = f(b_i)$ are uniquely given. Therefore, there exists at most one such map. ∎
The following proposition shows that we can choose a tailor-made basis . B0 ,
leading to further essential conclusions, as for example Theorem 3.1 below about
the normal form of linear maps which reveals its geometric character and many very
important corollaries.
and we obtain that only $0 = (\rho^\mu z_\mu)_k$ is left.
Since .(z μ )k is a basis for .ker f , it follows that .ρμ = 0. This shows that . B0 is
also linearly independent. The list . B0 spans .V and is linearly independent. We so
managed to find . B0 , a tailor-made basis for .V ! ∎
Even more, we could say that these corollaries summarize the entire representation theory of linear maps with $V' \ne V$, or essentially also of the endomorphisms $\operatorname{Hom}(V, V)$ if we take two different bases (see the singular value decomposition,
SVD, in Sect. 12.2). On the other hand, if we use for the description of the endomor-
phisms .Hom(V, V ) only one basis, then the problem is more challenging and leads
to more advanced linear algebra (see Chaps. 9 up to 13). We consider, as usual in
linear algebra, finite-dimensional vector spaces.
Proof From the lengths of the bases and for the basis-independent subspaces $\ker f$, $\operatorname{im} f$ and $V$, with $k = \dim(\ker f)$, $r = \dim(\operatorname{im} f)$ and $n = \dim V$, and the basis $B_0$ above, we see: $k + r = n$. ∎
Proof Taking into account Definition 2.18 and Exercise 2.38 about affine spaces and
linear maps, and . f −1 (w) = A(v) = v + ker f from Fig. 2.4, we get .dim f −1 (w) =
$\dim A(v) = \dim\ker f$. We so obtain directly from the rank-nullity theorem $\dim f^{-1}(w) = \dim V - \dim(\operatorname{im} f)$ and $\dim f^{-1}(w) = \dim(\ker f)$. ∎
For . f : V → V ' linear and .dim V = dim V ' , the following conditions are
equivalent:
(i) . f is injective,
(ii) . f is surjective,
(iii) . f is bijective. ∎
For .V a vector space and . B = (b1 , . . . , bn ) a basis for .V , there exists one
canonical isomorphism .ψ B (basis-isomorphism)
This shows that in this case we do not have to distinguish between linear maps and
matrices, the above . F x→ being of course a matrix multiplication.
Proof Notice that . f (ei ) ≡ Fei := f i are the columns of the matrix . F (see Example
2.23 and Sect. 2.4). So we have . F = [ f 1 . . . f n ]. ∎
Given two vector spaces .V with basis . B = (v1 , . . . , vn ) and .V ' with basis
.C = (w1 , . . . , wm ).
Then for every linear map $f: V \to V'$, there exists precisely one matrix $F = (\varphi^i_r) \in \mathbb{K}^{m\times n}$ such that $f(v_r) = w_i\varphi^i_r$ for $r \in I(n)$ and $i \in I(m)$. The map
$M_{CB}: \operatorname{Hom}(V, V') \to \mathbb{K}^{m\times n}, \quad f \mapsto M_{CB}(f) := F,$
is a linear bijection.
Proof We use the Einstein convention. The position of the indices .i and .r upstairs
and downstairs respectively refers also to the basis transformation properties
(.Gl(n), Gl(m)). As .C is a basis for .V ' , the linear combinations .wi ϕri are uniquely
determined and with the index .r fixed,
$f_r = \begin{bmatrix} \varphi^1_r \\ \vdots \\ \varphi^i_r \\ \vdots \\ \varphi^m_r \end{bmatrix}$
and
$(\lambda f)(v_r) = \lambda w_i\varphi^i_r = w_i(\lambda\varphi^i_r).$
So we have, with $M := M_{CB}$,
$M(f + g) = M(f) + M(g), \quad M(\lambda f) = \lambda M(f).$
Since $B$ is a basis for $V$, $f$ is, by Proposition 3.8, uniquely defined by the condition $f(v_s) := w_i\varphi^i_s$. Therefore, $F$ determines uniquely the values of $f$: $F = [f_1 \dots f_n]$, $f(B) = [C]F$ and $M_{CB}(f) = F$. So $M_{CB}$ is bijective. ∎
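A concrete computation of $M_{CB}(f)$ (the map and both bases below are invented for illustration): the $r$-th column of $F$ holds the $C$-coordinates of $f(b_r)$, so that $f(B) = [C]F$.

```python
import numpy as np

F_std = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0]])    # f : R^3 -> R^2 in the standard bases
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])        # columns: basis B of R^3
C = np.array([[2.0, 1.0],
              [0.0, 1.0]])             # columns: basis C of R^2

f_of_B = F_std @ B                      # columns: f(b_1), f(b_2), f(b_3)
F = np.linalg.solve(C, f_of_B)          # M_CB(f): C-coordinates of those columns

assert np.allclose(C @ F, f_of_B)       # f(B) = [C] F, as in the proof above
print(F)
```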
$M_{C_0B_0}(f) = \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix} \quad \text{where} \quad \mathbb{1}_r = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix} \in \mathbb{R}^{r\times r}.$
Proof From Proposition 3.9 we have essentially the tailor-made bases $B_0$ and $C_0$ for $V$ and $V'$:
$B_0 = (b_1, \dots, b_r, z_1, \dots, z_k) \in B(V).$
Then
The name “normal form” is historical. Behind it, however, are the notions
of equivalence, relation and quotient space as introduced in Sect. 1.2. Theo-
rem 3.1 is a very prominent example. It corresponds to the simplest possible
representation in every equivalence class.
Similarly, for example, the normal form $M_0 = \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix}$ for an $m \times n$-matrix $M$ is a special representative of the corresponding matrices which are equivalent to the given matrix $M$. This corresponds to an improved form of the
reduced row echelon form (rref). Thus, the set of all these “normal” matrices
.{M0 } is bijective to the corresponding quotient space, as discussed in Sect. 1.2.
If we take .m ≤ n without loss of generality, we have:
So we get:
$\mathbb{K}^{m\times n}/{\sim} \ \underset{\mathrm{bij}}{\cong}\ \left\{ \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix} : r \in I_0(m) \right\}.$
We observe that the set $\mathbb{K}^{m\times n}$, which has infinite cardinality, has a finite quotient space $\mathbb{K}^{m\times n}/{\sim}$:
$\mathbb{K}^{m\times n}/{\sim} \ \underset{\mathrm{bij}}{\cong}\ I_0(m) = \{0, 1, 2, \dots, m\}.$
$\begin{array}{ccc} V & \xrightarrow{\ f\ } & V' \\ \Phi\uparrow & & \uparrow\Phi' \\ V & \xrightarrow{\ g\ } & V' \end{array}$
There is a simple question: Does the same simple normal form (see Theorem 3.1) for a linear map $f \in \operatorname{Hom}(V, V')$ also apply to the case where $V' = V$ and $C_0 = B_0$, that is, for an endomorphism $f \in \operatorname{Hom}(V, V)$? The answer here is,
no. This is a difficult problem and it leads to the Jordan form. It is the question
of diagonalization or non-diagonalization of endomorphisms (operators) and
square matrices (see Chap. 9, Sect. 9.5 ).
But what would such a simple normal form mean? The operator . f , for
example, would in any case have a representation with a diagonal matrix, that
is, a direct decomposition of the space .V , a discrete list of .n = dim(V ) scalars
$(\lambda_1, \lambda_2, \dots, \lambda_n)$ and
$f(u_i) = \lambda_i u_i, \quad i \in I(n), \qquad V = U_1 \oplus U_2 \oplus \cdots \oplus U_n, \qquad f|_{U_i} = \lambda_i\,\mathrm{id}_{U_i}.$
We could simply represent each such one-dimensional space $U_i$ as a null space of $f - \lambda_i\,\mathrm{id}_V$:
$U_i = \ker(f - \lambda_i\,\mathrm{id}_V).$
It is clear that if for a given index, for instance $i = 1$, the scalar $\lambda_1$ is zero, then we have $U_1 = \ker f$ and, in addition,
$\operatorname{im} f = U_2 \oplus U_3 \oplus \cdots \oplus U_n, \quad \text{so that} \quad \ker f \oplus \operatorname{im} f = V.$
It may be plausible that the search for such a simple decomposition of the
vector space .V cannot be straightforward. See also Proposition 3.13.
$M_{CB}(f) \equiv M_{BC}(f) \equiv M^C_B(f) \equiv f_{CB} \equiv [f]_{CB} \equiv F_{CB} \equiv [f(b_1)_C \dots f(b_n)_C] \equiv [F(B)]_C.$
For fixed $B$ and $C$ we can define a basis for $\operatorname{Hom}(V, V')$: the maps $f_{ir} \equiv f^i_r: V \to V'$ with
$f_{ir}(v_s) = \begin{cases} w_i & \text{for } s = r, \\ 0 & \text{for } s \ne r. \end{cases}$
The entry $1$ is in the $r$th column and in the $i$th row. All the other entries are zero.
This is another proof of Corollary 3.8: $\{f_{ir}\}$ is a basis of $\operatorname{Hom}(V, V')$ and $\{E_{ir}\}$ is a basis of $\mathbb{K}^{m\times n}$. It should be evident that an isomorphism is a map that
sends a basis to a basis.
It is instructive to start with the direct product or Cartesian product we know very
well because the comparison with the direct sum provides interesting insights for
both. We first briefly recall its definition:
$U_1 \times \cdots \times U_m := \{(u_1, \dots, u_m) : u_1 \in U_1, \dots, u_m \in U_m\}$
with the componentwise operations
$(u_1, \dots, u_m) + (u'_1, \dots, u'_m) := (u_1 + u'_1, \dots, u_m + u'_m), \quad \lambda(u_1, \dots, u_m) := (\lambda u_1, \dots, \lambda u_m).$
Using the definition in Sect. 2.7, the following results for given subspaces .U1 , ..., Um
of .V are directly obtained.
(i) $U_1 + \cdots + U_m < V$;
(ii) $U_1 + \cdots + U_m = \operatorname{span}(U_1 \cup \cdots \cup U_m)$;
(iii) $\dim(U_1 + \cdots + U_m) \le \dim U_1 + \cdots + \dim U_m$.
For the sum of two vector spaces, .U1 and .U2 , particularly .U1 ∩ U2 /= {0} not being
excluded, we have .dim(U1 + U2 ) ≤ dim(U1 ) + dim(U2 ).
The exact relation is given by the following statement.
bases of .U1 , U2 , and .U3 = U1 ∩ U2 , we may obtain new bases . B1' and . B2' for .U1 and
.U2 .
∎
. V = U ⊕ W.
It is evident that the choice of .W is not unique. So we may have, for example, another
subspace .Y , such that again .V = U ⊕ Y.
$$V = \ker f \oplus \operatorname{im} f$$
holds? For this problem, the spaces $\ker f$, $\ker f^2$, $\operatorname{im} f^2$, $\operatorname{im} f$ are relevant, as is their behavior. All these spaces are $f$-invariant subspaces of $V$ and in particular the relation
$$\ker f \le \ker f^2 \tag{3.25}$$
holds. This follows from
$$x_0 \in \ker f \;\Rightarrow\; f x_0 = 0 \;\Rightarrow\; f^2 x_0 = f(f x_0) = f(0) = 0,$$
and so $x_0 \in \ker f^2$.
Proof We already saw that .ker f ≤ ker f 2 (3.25). So assertion (iii) is equivalent to
condition (iv). Therefore, it is enough to show that (i) and (ii) are equivalent to (iv).
We now show that (iv) .⇔ (ii) and (ii) .⇔ (i) which establishes the result.
– (iv) .⇒ (ii)
Given $\ker f^2 \le \ker f$, we have to show that $\ker f \cap \operatorname{im} f = \{0\}$: Let $z \in \ker f \cap \operatorname{im} f$, which means $z \in \operatorname{im} f$, that is, $z = f(x)$, and also $z \in \ker f$, which means $0 = f(z) = f(f(x)) = f^2(x)$, that is, $x \in \ker f^2$. Assertion (iv) leads to $x \in \ker f$, which means $f(x) = 0$, and to $z = f(x) = 0$, which proves $\ker f \cap \operatorname{im} f = \{0\}$, which is assertion (ii).
– (ii) $\Rightarrow$ (iv)
Given $\ker f \cap \operatorname{im} f = \{0\}$, we have to show $\ker f^2 \le \ker f$: Let $x \in \ker f^2$. Then $f^2(x) = 0$, which means $f(f(x)) = 0$ and $f(x) \in \ker f$. Since $f(x) \in \operatorname{im} f$, we have $f(x) \in \ker f \cap \operatorname{im} f = \{0\}$, so that $f(x) = 0$, $x \in \ker f$, and $\ker f^2 \le \ker f$, which proves (iv).
– (i) $\Rightarrow$ (ii)
The implication (i) $\Rightarrow$ (ii) is clear by Proposition 2.3, since the direct sum $U_1 \oplus U_2 = V$ means that $U_1 \cap U_2 = \{0\}$.
– (ii) .⇒ (i)
Given .ker f ∩ im f = {0}, we have to show .ker f + im f = V . According to
Proposition 2.3, .ker f ∩ im f = {0} means direct sum:
. ker f + im f = ker f ⊕ im f.
Proof We first show that $\Phi$ is injective, that is, $\ker \Phi = \{0\}$: If $\Phi(z_1, \ldots, z_m) = 0$, we have $z_1 + \cdots + z_m = 0$. Since the sum on the right-hand side above is direct, the uniqueness of the decomposition of $0$ leads to $z_1 = 0, \ldots, z_m = 0$, which shows that $\ker \Phi = \{0\}$ and $\Phi$ is injective. Furthermore, the rank-nullity theorem,
$$\dim(\operatorname{im}\Phi) = \dim(U_1 \times \cdots \times U_m),$$
As we saw, there are various possibilities to construct a new vector space out of the
two vector spaces .U1 and .U2 . The role of the two bases . B1 and . B2 is particularly
important in this construction. As we saw (Corollary 3.10), for the case of the direct
sum .U1 ⊕ U2 , we have .U1 ⊕ U2 = span(B1 ⊔ B2 ) with .dim(U1 ⊕ U2 ) = dim U1 +
dim U2 . Note that we may also write . B1 ⊔ B2 = (B1 , B2 ) and likewise .U1 ⊕ U2 =
span(B1 , B2 ).
Now we may ask the provocative question: If we take the Cartesian product . B1 ×
B2 instead of the disjoint union . B1 ⊔ B2 , what can we say about the corresponding
vector space .W = span(B1 × B2 )?
Using the same notation as in the corollary above, the basis $B_W$ of $W$ is given by $B_W = B_1 \times B_2$.
Let us now, for simplicity, consider real vector spaces. As we know, an abstract vector space is completely determined by its dimension, so we have $\dim W = \dim U_1 \cdot \dim U_2$. The vector space $W$ is what is called a tensor product of $U_1$ and $U_2$, and we write $W = U_1 \otimes U_2$.
In addition, it is clear that for $W$ nothing changes if we write for the basis vectors $(a_s, b_i) \equiv a_s b_i \equiv a_s \otimes b_i$, and for good reasons they may be called the product or even tensor product of the basis vectors $a_s$ and $b_i$. So we may write $B_W = B_1 \times B_2 = \{a_s \otimes b_i\}$, and we have for $w \in U_1 \otimes U_2$
$$w = w^{si}\, a_s \otimes b_i \quad \text{with } w^{si} \in \mathbb{R}.$$
It may also be clear that the new vector space $W = U_1 \otimes U_2$ is not a subspace of $V$. This justifies our characterization of the above question as provocative.
The tensor space .U1 ⊗ U2 depends only on .U1 and .U2 , regardless of where these
come from.
As one may already realize, we can hardly find a subject that does not use tensors
in physics. In the Chaps. 8 (First Look at Tensors) and 14 (Tensor Formalism), we
are going to learn much more about tensor products.
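As a concrete sketch (our own coordinate model, not part of the text), the products $a_s \otimes b_i$ can be represented by Kronecker products of coordinate columns; the resulting vectors form a basis of $W = U_1 \otimes U_2$, confirming $\dim W = \dim U_1 \cdot \dim U_2$.

```python
import numpy as np

# Bases B1 of U1 (dim 2) and B2 of U2 (dim 3), written as coordinate columns.
B1 = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
B2 = [np.eye(3)[:, i] for i in range(3)]

# Model a_s (x) b_i by the Kronecker product; the 2*3 = 6 resulting vectors
# are linearly independent and span the tensor product space.
BW = [np.kron(a, b) for a in B1 for b in B2]
print(np.linalg.matrix_rank(np.column_stack(BW)))   # 6 = dim U1 * dim U2
```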
$$\psi^i : \mathbb{R}^n \longrightarrow \mathbb{R}, \qquad (q^1, \ldots, q^n) \longmapsto \psi^i(q^1, \ldots, q^n) \equiv x^i(q^s), \qquad i \in I(m),\ s \in I(n),$$
describing the configuration space $Q$ of dimension $n$, see Fig. 3.1. The variables $q^s$ are
the usual generalized coordinates used in classical mechanics. For our demonstration,
the time dependence is not relevant and is therefore left out. For the mass .m of the
particles, we take, without loss of generality, .m i = m = 1 for all .i ∈ I (m). It is clear
that we cannot solve the Newtonian equation in the usual form (with .m = 1 and the
force . F):
$$\ddot{x}^i(t) = F^i, \tag{3.26}$$
$$dx^i = \frac{\partial x^i}{\partial q^s}\, dq^s. \tag{3.27}$$
For this construction, in order to use linear algebra, we direct our attention to
the point $p_0 \in Q$. We consider the vector space $V = T_{p_0}\mathbb{R}^m \cong \mathbb{R}^m$ and its subspace $W := T_{p_0} Q \cong \mathbb{R}^n$. Heuristically, we have to "project" Eq. (3.24, 3.26) onto the configuration space $Q$ and particularly onto the corresponding vector space $W$ at the position $p_0 \in Q$. This leads to the work done by the force $F$ with respect to the displacement $dx$ in $W$. Using the dot product $\langle\,\cdot\,|\,\cdot\,\rangle$ in $\mathbb{R}^m$ and the Newtonian equation (3.24, 3.26) we obtain
$$\langle \ddot{x} \mid dx \rangle = \langle F \mid dx \rangle \tag{3.28}$$
$$\ddot{x}_i\, dx^i = F_i\, dx^i. \tag{3.29}$$
It is clear that we cannot drop the .d x i in Eq. (3.28, 3.29). The covectors .d x i are
linearly dependent as we saw in Eq. (3.26, 3.27). The covectors .dq s ( p0 ) are by
definition linearly independent. At the same time it is also clear that we have to use
Eq. (3.26, 3.27) in order to express Eq. (3.29) with the covectors $dq^s(p_0)$. The latter are linearly independent and so we obtain:
$$\ddot{x}_i \frac{\partial x^i}{\partial q^s}\, dq^s = F_i \frac{\partial x^i}{\partial q^s}\, dq^s. \tag{3.31}$$
Since the $dq^s$ are linearly independent, we may write for all $s \in I(n)$:
$$\ddot{x}_i \frac{\partial x^i}{\partial q^s} = F_i \frac{\partial x^i}{\partial q^s}. \tag{3.32}$$
From now on, we can proceed exactly the same way as in the physical literature. For
the sake of completeness, we continue with our simplified prerequisites (.m i = 1 and
time independence) and we obtain the Lagrangian equation: For the right hand-side
of the Eq. (3.32), we may write:
$$F_s := F_i \frac{\partial x^i}{\partial q^s}. \tag{3.33}$$
The quantities $F_s$ are called generalized forces. In the case of the existence of a potential $V$, we may write
$$F_s = -\frac{\partial V}{\partial q^s} + \frac{d}{dt}\frac{\partial V}{\partial \dot{q}^s}. \tag{3.34}$$
For the left-hand side of Eq. (3.32), using essentially the product rule of differentia-
tion, we obtain:
$$\ddot{x}_i \frac{\partial x^i}{\partial q^s} = \frac{d}{dt}\left(\dot{x}_i \frac{\partial x^i}{\partial q^s}\right) - \dot{x}_i \frac{d}{dt}\left(\frac{\partial x^i}{\partial q^s}\right) = \frac{d}{dt}\left(\dot{x}_i \frac{\partial x^i}{\partial q^s}\right) - \dot{x}_i \frac{\partial \dot{x}^i}{\partial q^s}. \tag{3.35}$$
Using Eq. (3.27), we get:
$$\frac{dx^i}{dt} \equiv \dot{x}^i = v^i(q^s, \dot{q}^s) = \frac{\partial x^i}{\partial q^s}\,\dot{q}^s, \tag{3.36}$$
$$\frac{\partial \dot{x}^i}{\partial \dot{q}^s} = \frac{\partial x^i}{\partial q^s} \tag{3.37}$$
and
$$\frac{\partial x^i}{\partial q^s} = \frac{\partial \dot{x}^i}{\partial \dot{q}^s} = \frac{\partial v^i}{\partial \dot{q}^s}. \tag{3.38}$$
and
$$\frac{d}{dt}\left(\frac{\partial}{\partial \dot{q}^s}(T - V)\right) - \frac{\partial}{\partial q^s}(T - V) = 0. \tag{3.42}$$
With
$$L := T - V, \tag{3.43}$$
these are the Lagrangian equations. As we see, the essential part of the derivation of the Lagrangian equations, up to Eq. (3.32), is just an application of linear algebra.
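As an optional illustration (our own example; the planar pendulum, the symbols q, m, g, l and the use of SymPy are assumptions made here for the sketch), the derivation above can be reproduced symbolically: choose an embedding $x^i(q)$, form $L = T - V$, and apply Eq. (3.42).

```python
import sympy as sp

t = sp.symbols('t')
q = sp.Function('q')(t)                 # one generalized coordinate
m, g, l = sp.symbols('m g l', positive=True)

# Embedding x^i = psi^i(q): a planar pendulum, x = l sin q, y = -l cos q.
x, y = l * sp.sin(q), -l * sp.cos(q)

T = sp.Rational(1, 2) * m * (sp.diff(x, t)**2 + sp.diff(y, t)**2)   # kinetic energy
V = m * g * y                                                       # potential energy
L = T - V                                                           # Eq. (3.43)

qdot = sp.diff(q, t)
# Lagrangian equation (3.42): d/dt (dL/d qdot) - dL/dq = 0
EL = sp.diff(sp.diff(L, qdot), t) - sp.diff(L, q)
print(sp.simplify(EL))   # ~ m*l**2*q'' + m*g*l*sin(q)  (set equal to zero)
```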
Summary
We have examined the role of bases from all angles. This role is extremely positive, particularly concerning physics. We have also pointed out certain potential drawbacks when expressing basis-dependent statements, but for these drawbacks a satisfactory response was provided: we showed that one can look at all possible bases simultaneously to avoid coordinate dependence.
In order to define what a basis of a vector space is, it was necessary to introduce
several elementary concepts, such as the concept of a generating system of a vector
space and linear dependence or independence. When the number of elements in a
generating list is finite, we call such a vector space “finitely generated”. These are
precisely the vector spaces we discuss in this book. The associated dimension of a
vector space was then simply defined by the number of elements in a basis.
In the remaining chapter, some of the most important advantages resulting from
the use of bases were discussed. Perhaps the most significant is that through bases,
abstract vectors and abstract maps can be expressed by a finite number of scalars.
Thus, bases enable concrete calculations of a theory to be performed and compared
with experiments whose results are essentially numerical.
Bases allow us, in particular, to maintain the geometric character of a linear map. By choosing suitably tailored bases, one can find the simplest possible representation matrix of the corresponding map, which in most cases has nonzero entries only on the diagonal. This is the so-called normal form of a linear map; it may be considered essentially as the fundamental theorem of linear maps.
With the help of bases, this chapter presented the first and probably the easiest
access to tensors. At the end of the chapter, a perhaps surprising application of linear
algebra to classical mechanics was discussed.
Exercise 3.1 The span of a list in a vector space is the smallest subspace containing
this list.
Let .V be a vector space and . A = (a1 , . . . , ak ) a list of vectors in .V . Show that .span A
is the smallest subspace of .V containing all the vectors of the list . A.
Exercise 3.3 If the list .(a1 , . . . , ar ) in a vector space .V is linearly independent and
.v ∈ V , then show that the list .(a1 , . . . , ar , v) is linearly independent if and only if
.v ∈
/ span(a1 , . . . , ar ).
Exercise 3.4 This exercise shows that a list which is longer than a spanning list is always linearly dependent.
Suppose that the list . Am = (a1 , . . . , am ) of .m vectors spans the vector space .V
(.span Am = V ). Show that any list . Am+1 with .m + 1 vectors (not necessarily con-
taining . Am ) is always linearly dependent.
Exercise 3.5 Suppose that the vectors .(a1 , . . . , ar ) are linearly independent. Show
that either .(a1 , . . . , ar ) is a basis in .V or that there are vectors .(ar +1 , . . . , an ) such
that .(a1 , . . . , ar , ar +1 , . . . , an ) is a basis of .V .
Exercise 3.6 Using the extension of a linearly independent list to a basis as in the
previous Exercise 3.5, we obtain the following result.
Show that all bases of a vector space have the same length. This means that if . B(V )
is the set of bases in .V and . B1 , B2 ∈ B(V ), then .l(B1 ) = l(B2 ).
Now we give another definition of the dimension of a vector space which does not rely on a property of a basis, as is usual in the literature. Only the notions of linear dependence and linear independence will be used.
and
$$\dim V := \min N(V).$$
Note that for finitely generated vector spaces, as we consider them in this book,
the set .N(V ) is nonempty, .N(V ) /= ∅ and if .V /= {0}, then .dim V ≥ 1.
Exercise 3.7 Given the above definition, show that if .dim V = min N(V ) = n, then
the length .l(B) of any basis . B of .V is given by .l(B) = n.
The next two exercises are almost trivial. Using the definition of dimension by
N(V ) might make them even easier to prove.
Exercise 3.8 Let $U$ be a subspace of the vector space $V$. Show that
$$\dim U \le \dim V.$$
Exercise 3.9 Let $U$ be a subspace of $V$ with $\dim U = \dim V$. Show that
$$U = V.$$
Exercise 3.10 Linearly independent list in a vector space $V$ with length equal to $\dim V$.
Let $A = (a_1, \ldots, a_k)$ be a linearly independent list of vectors in $V$ with $\dim V = n$. Show that if $k = n$, then $A$ is a basis of $V$.
Exercise 3.11 Spanning list in a vector space $V$ with length equal to $\dim V$.
Let $A = (a_1, \ldots, a_k)$ be a spanning list of vectors in $V$ with $\dim V = n$. Show that if $k = n$, then $A$ is a basis of $V$.
$$a_s\lambda^s = A\vec{\lambda} \quad \text{with } \lambda^s \in \mathbb{K},\ \vec{\lambda} \in \mathbb{K}^n,$$
$$\Psi_A : \mathbb{K}^n \longrightarrow V, \qquad e_s \longmapsto a_s.$$
Exercise 3.14 If $f$ is a linear map, $f \in \operatorname{Hom}(\mathbb{K}^n, \mathbb{K}^m)$, show that there exist scalars $\varphi^i_s \in \mathbb{K}$ with $s \in I(n)$ and $i \in I(m)$ such that for every $\vec{v} = e_s v^s \in \mathbb{K}^n$, $v^s \in \mathbb{K}$,
$$F : \begin{bmatrix} v^1 \\ \vdots \\ v^n \end{bmatrix} \longmapsto \begin{bmatrix} \varphi^1_s v^s \\ \vdots \\ \varphi^m_s v^s \end{bmatrix}$$
holds.
Exercise 3.15 The image of a basis already determines a linear map. This ensures
the existence of a linear map as was required in Proposition 3.8.
Let . B = (b1 , . . . , bn ) be a basis of a vector space .V and .(w1 , . . . , wn ) any list of
vectors in a second vector space .W . Show that there exists a unique linear map
. f : V → W such that . f (bs ) = ws ∀ s ∈ I (n).
Exercise 3.17 The preimage of a linear map preserves the linear independence in
the following sense.
Let . f be a linear map . f : V → V ' . Show that the list .(v1 , . . . , vr ) in .V is linearly
independent if the list .( f (v1 ), . . . , f (vr )) in .V ' is linearly independent.
Exercise 3.19 The inverse map of a bijective linear map is also linear.
If the map . f : V → V ' is an isomorphism, show that the inverse map
$f^{-1} : V' \longrightarrow V$ is also linear.
Exercise 3.20 All isomorphic vector spaces have the same dimension.
Show that vector spaces (finite-dimensional) are isomorphic if and only if they have
the same dimension.
is a basis of .V ' .
The following three exercises concern sums and direct sums of a vector space.
Exercise 3.25 Let .U1 , . . . , Um be subspaces of a vector space .V . Verify the follow-
ing results (see Definition 3.12 and the pages thereafter):
(i) .U1 + · · · + Um ≤ V ;
(ii) .U1 + · · · + Um = span(U1 ∪ . . . ∪ Um ) ;
(iii) .dim(U1 + · · · + Um ) ≤ dim U1 + · · · + dim Um .
Exercise 3.26 Equivalent conditions for a direct sum of subspaces of a vector space.
Let .U1 , . . . , Um be subspaces of a vector space .V and .U = U1 + · · · + Um . Show
that the following conditions for a direct sum
$$U = U_1 \oplus \cdots \oplus U_m$$
are equivalent.
The next exercises concern another point of view on the origin of tensors (Sect. 3.5) and need some preparation. We have to compare the Cartesian product with the tensor product; the role of the scalar field is different. For this comparison we consider the following two exercises.
and the scalar action of the field $\mathbb{K}$ explicitly by the dot $\cdot$. So we have, as usual, for $\lambda_1, \lambda_2 \in \mathbb{K}$ and $(u, v) \in U \times V$:
$$(\lambda_1 u_1 + \lambda_2 u_2, v) = \lambda_1(u_1, v) + \lambda_2(u_2, v) \quad\text{and}\quad (u, \lambda_1 v_1 + \lambda_2 v_2) = \lambda_1(u, v_1) + \lambda_2(u, v_2).$$
In Newtonian mechanics, it turns out that we only need the first Newtonian law to
determine the structure of spacetime.
First Newtonian law: Every body continues in its state of rest or of uniform
rectilinear motion, except if it is compelled by forces acting on it to change that state.
This law refers particularly to a trajectory .x→(t) of a mass point. It corresponds to
the well-known equation of motion without the presence of a force:
$$\frac{d^2}{dt^2}\,\vec{x}(t) = 0. \tag{4.1}$$
This means that we here postulate the existence of a special reference frame in
which the solutions of the above equation are straight lines with constant velocity
(vanishing acceleration). Without going into details, we realize intuitively that the
space where the movement takes place must be a manifold which contains straight
lines. For instance, vector spaces and affine spaces (see Sect. 2.5) are such manifolds
which may contain straight lines as a subspace. On the other hand, there is not enough
room for straight lines in a sphere or a cube.
$$M \cong E^1 \times E^3.$$
Our intention is not to discuss the physical relevance of the law of inertia but
to compare the physical situation with mathematics, especially with linear algebra.
We hope that this helps appreciate the different roles of mathematics and physics
and their connection. This relation can be demonstrated within linear algebra in a
very transparent way. As we saw, in physics, we have to postulate the existence of
an inertial frame. This is a huge step in understanding our world. Its validity has
. Aut(V ) = Gl(n).
The group .Aut(V ) consists precisely of those transformations which respect the
linear structure of .V . We here consider first an abstract vector space without further
structure. This can be described by the action of the group .Gl(n) on . B(V ). With the
basis . B = (b1 , . . . , bn ) and .g = (γsi ) ∈ Gl(n); γsi ∈ R; i, s ∈ I (n) = {1, . . . , n}; we
have:
It is well-known that .Gl(n) acts on . B(V ) freely and transitively from the right,
from which follows that the two sets, even having quite different structures, are still
bijective (see Proposition 3.5):
$$B(V) \underset{bij}{\cong} Gl(n).$$
For this reason, we may call .Gl(n) the structure group of .V . Indeed, the above action
of .Gl(n) on . B(V ) completely characterizes the linear structure on the set .V via the
set of the basis . B(V ) of .V . This is the deeper meaning of the isomorphism between
groups
$$\operatorname{Aut}(V) \cong Gl(n).$$
This discussion within linear algebra is the model that significantly clarifies the
corresponding discussion within the Newtonian mechanics and spacetime. So we
can apply the above procedure also in Newtonian mechanics.
As already stated, the transformations which are implied by the law of inertia and
the relativity principle are given by the Galilean group. To simplify, we here consider
the part of the Galilean group which is connected with the identity, $G$ (a.k.a. $G \equiv G_1 \equiv Gal_+^{\uparrow}$). This is the so-called proper orthochronous Galilean group $Gal_+^{\uparrow}$. In an obvious notation, the element $g \in G$ that maps inertial frames to inertial frames is given by the following expression:
$$t \longmapsto t + s, \qquad \vec{x} \longmapsto \vec{x}\,' := R\vec{x} + \vec{w}t + \vec{a}. \tag{4.2}$$
$$IF(M) \times G \longrightarrow IF(M), \qquad (IF, g) \longmapsto IF' := IF\,g,$$
$$IF(M) \underset{bij}{\cong} G.$$
The Galilean group is the structure group of spacetime; this means that we have the
isomorphism of groups:
$$\operatorname{Aut}(M) \cong G.$$
That means that the Galilean group ultimately determines the Newtonian spacetime
. M structure: We first see from the above expression for the Galilean transformations
that . M is an affine space with additional structure. We have now to determine and
describe the additional structure in this affine space. It is helpful to return to our linear
algebra model. We learned from the equation of motion in Newtonian mechanics that
our model of spacetime as a vector space .V is not realistic and should be an affine
space. So we have indeed to take the affine space corresponding to the vector space
. V , which we denote by . A given by the triple .(A, V, ψ) as discussed in Sect. 2.5 with
.ψ the action of . V on . A.
$$\psi : V \times A \longrightarrow A, \qquad (v, p) \longmapsto p + v.$$
This action is by definition free and transitive. We therefore have the bijection
$$V \underset{bij}{\cong} A.$$
This is still not enough. We need additional structures in the space . A. A realistic
and typical example would be to introduce a Euclidean structure to . A , for example,
our three-dimensional Euclidean space. This transforms . A into a Euclidean space
. E. This is done by definition in the corresponding vector space . V (also called the
$$\tau' = \tau, \qquad \vec{\xi}\,' = \vec{w}\tau + R\vec{\xi}. \tag{4.4}$$
$$(1)\ \tau, \qquad (2)\ \|\vec{\xi}\,\| \text{ if } \tau = 0 \quad \left(\|\vec{\xi}\,\| = \sqrt{\langle \xi \mid \xi \rangle}\right). \tag{4.5}$$
$$M = \left(A^4,\ \Delta t,\ \|\Delta\vec{x}\| \text{ if } \Delta t = 0\right).$$
The invariant .τ = Δt is the duration between two events that characterize the
absolute time as assumed by Newton. The second invariant .|| ξ→ ||=|| Δ→ x || is the
Euclidean distance between two simultaneous events. Therefore, it is clear that the
spacetime $M$ of Newtonian mechanics is not a Euclidean or semi-Euclidean or an affine space with certain scalar products. The reason for this "complication" is the velocity transformation in the Galilean transformations (Eq. (4.2)).
Despite this, time and space are each separately regarded as Euclidean spaces . E 1
and . E 3 , as considered by Newton. We believe, and hopefully, the reader can also
see, that a good understanding of linear algebra is necessary to clarify the structure
of spacetime completely.
In electrodynamics, both the law of inertia and the relativity principle hold. As in the
previous section, the spacetime of electrodynamics that we denote now by .M is an
affine space . A4 as in Newtonian mechanics but with a different additional structure.
To determine the additional structure, we have to use the laws of electrodynamics.
Following Einstein, we take from electrodynamics only the existence of a photon.
This turns out to be entirely sufficient to determine the structure of spacetime in
electrodynamics. From our experience with Newtonian mechanics, we learned that
we have to search for the relevant invariants. Therefore, it is quite reasonable to use
the velocity of light .c. For this reason, we consider the speed of light (of a photon)
in two different frames of inertia, and we have:
In $IF$ with coordinates $(t, \vec{x})$:
$$c^2 = \frac{\|\Delta\vec{x}\|^2}{\Delta t^2}. \tag{4.6}$$
In $IF'$ with coordinates $(t', \vec{x}\,')$:
$$c'^2 = \frac{\|\Delta\vec{x}\,'\|^2}{\Delta t'^2}. \tag{4.7}$$
The invariance of the velocity of light is given by
$$c' = c. \tag{4.8}$$
$$c^2\Delta t'^2 - \|\Delta\vec{x}\,'\|^2 = c^2\Delta t^2 - \|\Delta\vec{x}\|^2 = 0. \tag{4.10}$$
To proceed, we make an assumption that is, in principle, not necessary but which
simplifies our derivation significantly in a very transparent way. We assume that the
Eq. (4.10) is valid also in the form:
$$c^2\Delta t'^2 - \|\Delta\vec{x}\,'\|^2 = c^2\Delta t^2 - \|\Delta\vec{x}\|^2 = K \tag{4.11}$$
with . K ∈ R. This means that the invariant . K could also be different from zero.
Defining .Δx 0 := cΔt which has the dimension of length as .Δx i with .i ∈ {1, 2, 3}
and .μ, ν ∈ {0, 1, 2, 3}, Eq. (4.11) takes first the form
$$(\Delta x'^0)^2 - \|\Delta\vec{x}\,'\|^2 = (\Delta x^0)^2 - \|\Delta\vec{x}\|^2 \tag{4.12}$$
with
$$S = (\sigma_{\mu\nu}) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}. \tag{4.14}$$
The expressions in Eqs. (4.13) and (4.14) correspond to the relativistic scalar product (a symmetric nondegenerate bilinear form), which is not positive definite. $\Delta s^2$ is invariant and we have, for example, for the two different frames of inertia $IF$ and $IF'$:
$$\Delta s^2(IF') = \Delta s^2(IF), \tag{4.15}$$
which shows that .Δs 2 is universal. This means that the spacetime .M of electrody-
namics is an affine space . A4 with a scalar product given by the matrix . S = (σμν ),
a symmetric covariant tensor called also metric tensor or Minkowski Metric ten-
sor. After diagonalization, this tensor has the canonical form given by Eq. (4.14).
Concluding, we may state that the spacetime .M of electrodynamics is given by the
pair
$$\mathcal{M} = (A^4, S). \tag{4.16}$$
Since here the scalar product . S is not positive definite as in the case of a Euclidean
space,.M is here known as a semi-Euclidean or pseudo-Euclidean space or Minkowski
spacetime.
It is interesting to notice that the space .M in electrodynamics is mathematically
much simpler than the space . M in Newtonian mechanics. The space .M is formally
almost a Euclidean space, whereas in. M, as we saw, the invariants are mathematically
not as simple as a scalar product. On the other hand, the physics of .M, the spacetime
of electrodynamics (special relativity), is much more complicated and complex than
the physics of . M, the spacetime of Newtonian mechanics because the duration .τ of
the two events is not any more an invariant. At the same time, .M, the semi-Euclidean
or Minkowski spacetime, is the spacetime of elementary particle physics or simply
the spacetime of physics without gravity. This causes all the well-known difficulties
which enter into relativistic physics.
Now having found the structure of spacetime .M, it is equally interesting to deter-
mine its structure group .G.
$$\operatorname{Aut}(\mathcal{M}) = G. \tag{4.17}$$
Since we know that the space .M is a semi-Euclidean space, we expect that the
structure group .G consists of semi-Euclidean transformations. So .G is isomorphic
to the well-known Poincaré (Poin) or inhomogeneous Lorentz group. This is a special
affine group similar to a Euclidean group (affine Euclidean group). We may write
. G = Poin and we have in an obvious notation:
or in matrix form
$$x \longmapsto x' = \Lambda x + a, \tag{4.20}$$
with the homogeneous Lorentz group $O(1, 3)$ given by
$$O(1, 3) = \{\Lambda : \Lambda^T \sigma \Lambda = \sigma\}. \tag{4.21}$$
As we see, all the mathematics we used in this chapter belong formally to linear
algebra. After this experience, we may expect that all the mathematics we need for
symmetries in physics also belongs to linear algebra.
Summary
In this chapter, we discussed one of the most important applications of linear algebra
to physics. Starting from two fundamental theories of physics, Newtonian mechan-
ics and electrodynamics, essentially using linear algebra alone, we described the
structure of spacetime.
Using Newton’s axioms, specifically employing the principles of inertia and rela-
tivity, we derived the spacetime structure of Newtonian mechanics, which is famously
associated with the Galilean group.
For the description of the spacetime of electrodynamics, which simultaneously
represents the spacetime of elementary particle physics and essentially the space-
time of all physics if one wants to exclude gravitational interaction, we followed
Einstein’s path: from electrodynamics, we only adopted the properties of the photon,
the elementary particle associated closely with the electromagnetic force. With the
photon, the principle of relativity, and linear algebra, we described the spacetime of
physics without gravity. This spacetime is famously also closely connected with a group of transformations, the Poincaré group.
One of the most important properties of matrices is the ability to add them and, under certain conditions, to multiply them. This makes matrices look like numbers,
perhaps something like super numbers. This possibility opens up when you consider
matrices as linear maps. More precisely, it turns out that the composition of maps
induces the product of matrices:
We consider the linear maps . f : U → V and .g : V → W .
Let $X = (u_1, \ldots, u_n) \equiv (u_r)_n$ be a basis of $U$, $Y = (v_1, \ldots, v_p) \equiv (v_\mu)_p$ a basis
of .V , and . Z = (w1 , . . . , wm ) ≡ (wi )m a basis of .W . For the indices we choose
.r ∈ I (n), μ ∈ I ( p), and.i ∈ I (m). The composition of. f and.g is given by.h = g ◦ f
with .h : U → W .
Suppose now that we do not know anything about matrix multiplication. We want
to define the matrix multiplication to be compatible with the composition of linear
maps, that is, obtain a homomorphism between linear maps and matrices.
The values of . f, g, h at basis vectors are given:
$$F \equiv f_{YX}, \qquad G \equiv g_{ZY}, \qquad H = h_{ZX}.$$
Thus we see (Eqs. 5.1, 5.2, 5.3) the homomorphism between linear maps and matrices
underlining the role of bases:
$$h_{ZX} = g_{ZY}\, f_{YX}. \tag{5.4}$$
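As a quick numerical check (an illustrative sketch with our own choice of dimensions and random matrices), representing maps by matrices turns composition into matrix multiplication, exactly as in Eq. (5.4).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 4, 3, 2

F = rng.standard_normal((p, n))     # F = f_{YX} represents f : U -> V
G = rng.standard_normal((m, p))     # G = g_{ZY} represents g : V -> W

u = rng.standard_normal(n)          # a coordinate vector in K^n

# Applying the maps one after the other ...
h_of_u = G @ (F @ u)
# ... agrees with the representation of the composition, H = h_{ZX} = G F.
H = G @ F
print(np.allclose(h_of_u, H @ u))   # True
```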
Remark 5.1 Products of linear maps and matrix multiplications have the same
algebraic properties, so that we have
To make the analysis of the above results in algebraic terminology easier, we change
the notation of matrices to
$$A := G, \qquad B := F, \qquad C := H, \tag{5.5}$$
so we have
$$A = (\alpha^i_\mu), \qquad B = (\beta^\mu_r), \qquad C := (\gamma^i_r). \tag{5.6}$$
$$A = (a_\mu)_p = [a_1 \ldots a_p] : \text{a row } (1 \times p\text{-matrix}) \text{ with columns (vectors) as entries},$$
$$A = (\alpha^i)^m = \begin{bmatrix} \alpha^1 \\ \vdots \\ \alpha^m \end{bmatrix} : \text{a column } (m \times 1\text{-matrix}) \text{ with rows (covectors) as entries}.$$
Then for the columns of . A, we have .aμ ∈ Rm and for the rows (covectors) of . A, we
have .αi ∈ (R p )∗ . We have similar expressions for . B and .C in Eq. (5.6) and in Eqs.
(5.8) and (5.9) below. In Eqs. (5.7), (5.8) and (5.9), we also see the block matrix form
of . A, B and .C, with blocks columns or rows. The matrix multiplication is given by
the following map, written as juxtaposition:
This also leads to various aspects of matrix multiplication: Summarizing and using
an obvious notation, we write
For the various components of the product matrix $C$, we have, using Eqs. (5.7), (5.8), and (5.9), the following very compact and transparent expressions:
Equation (5.10) is the standard form of multiplication. This is actually the well-
known low level multiplication.
Equation (5.11) means that the linear combination of the columns of the first
matrix . A with the coefficients of the .r th column of the second matrix . B gives the
.r th column of the product matrix .C or it means simply that all columns of .C = AB
are linear combinations of the columns of . A. In Eq. (5.11) we see explicitly that the
action of the matrix . A on the column .br gives the column .cr of the product.
Equation (5.12) is the analog for rows: the linear combination of the rows of the second matrix $B$ with the coefficients of the $i$th row of the first matrix $A$ gives the $i$th row of the matrix $C$. Equivalently, the right action of the matrix $B$ on the $i$th row of the matrix $A$ gives the $i$th row of the product.
In Eq. (5.13), for fixed .μ, the product .αμ β μ is the matrix product between an
.m × 1 and a .1 × n-matrix.
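The four facets just described can be checked directly. The following NumPy sketch (our own dimensions and random data) computes the same product in the entrywise, column, row, and outer-product pictures, labelled here as in Eqs. (5.10)–(5.13).

```python
import numpy as np

rng = np.random.default_rng(2)
m, p, n = 2, 3, 4
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))
C = A @ B

# (5.10) entrywise:       c^i_r = alpha^i_mu beta^mu_r
C_entry = np.array([[A[i, :] @ B[:, r] for r in range(n)] for i in range(m)])
# (5.11) column picture:  r-th column of C = A @ (r-th column of B)
C_cols = np.column_stack([A @ B[:, r] for r in range(n)])
# (5.12) row picture:     i-th row of C = (i-th row of A) @ B
C_rows = np.vstack([A[i, :] @ B for i in range(m)])
# (5.13) sum of outer products of columns of A with rows of B
C_outer = sum(np.outer(A[:, mu], B[mu, :]) for mu in range(p))

print(all(np.allclose(C, X) for X in (C_entry, C_cols, C_rows, C_outer)))  # True
```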
The identification between linear maps and matrices provides the notion of rank
for matrices as well. If we denote by . f A the linear map related to the matrix . A, then
we can also define a rank for matrices:
$$\mathbb{K}^n \longrightarrow \mathbb{K}^m, \qquad \vec{\xi} \longmapsto f_A(\vec{\xi}\,) := A\vec{\xi} = a_s \xi^s \in \mathbb{K}^m.$$
So we have
$$\operatorname{rank} A := \operatorname{rank} f_A = \dim(\operatorname{im} f_A).$$
This definition corresponds to the column rank ($c\,\operatorname{rank}(A) \equiv \operatorname{rank}(A)$), as defined in the next section, Sect. 5.2.
Linear maps not only provide the notion of rank for matrices, they also provide
the corresponding estimate for the rank of the product of two matrices. In an obvious
notation, we define the following vector space and linear maps:
Let
$$V := \mathbb{K}^n, \quad V' := \mathbb{K}^p, \quad V'' := \mathbb{K}^m, \qquad f := f_B, \quad g := f_A, \quad h := f_C := g \circ f, \qquad \bar{g} := g|_{\operatorname{im} f},$$
$$V \xrightarrow{\;f\;} V' \xrightarrow{\;g\;} V'', \qquad V \xrightarrow{\;f\;} \operatorname{im} f \xrightarrow{\;\bar{g}\;} \operatorname{im}\bar{g}.$$
For the vector spaces .V, V ' , V '' and the linear maps . f : V → V ' and .g :
V ' → V '' , the following inequalities hold:
Proof Define $\bar{g} := g|_{\operatorname{im} f}$. We consider the image of $\bar{g}$. Then we have:
$$\operatorname{im}\bar{g} = \operatorname{im}(g \circ f) \tag{5.14}$$
and
$$\dim(\operatorname{im}\bar{g}) \le \dim(\operatorname{im} f) \quad\text{and}\quad \operatorname{rank}(\bar{g}) \le \operatorname{rank}(f). \tag{5.16}$$
Using the rank-nullity theorem and Eq. (5.16), this leads for the .rank(g ◦ f ) to
and to
$$\operatorname{rank}(g \circ f) \ge \operatorname{rank}(f) - \dim(\ker g). \tag{5.22}$$
and
$$\operatorname{rank}(g \circ f) \ge \operatorname{rank}(f) + \operatorname{rank}(g) - \dim V'. \tag{5.24}$$
This corresponds for matrices, using the above notation, to the corollary:
Since we have
$$c_r = a_\mu \beta^\mu_r, \tag{5.26}$$
every column of $AB$ is a linear combination of the columns of $A$. This means that
$$\operatorname{span} AB \le \operatorname{span} A$$
and
$$c\,\operatorname{rank} AB \le c\,\operatorname{rank} A. \tag{5.27}$$
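Both rank estimates can be observed numerically. The following sketch (our own sizes and random matrices; illustrative only) checks the upper bound $\operatorname{rank}(AB) \le \min(\operatorname{rank} A, \operatorname{rank} B)$ and the lower bound $\operatorname{rank}(AB) \ge \operatorname{rank} A + \operatorname{rank} B - \dim V'$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m = 6, 4, 5
B = rng.standard_normal((p, n))      # f = f_B : K^n -> K^p
A = rng.standard_normal((m, p))      # g = f_A : K^p -> K^m
rank = np.linalg.matrix_rank

rAB, rA, rB = rank(A @ B), rank(A), rank(B)
# rank(g o f) <= min(rank f, rank g)  and  rank(g o f) >= rank f + rank g - dim V'
print(rAB <= min(rA, rB), rAB >= rA + rB - p)    # True True
```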
Perhaps the most important parameter of a matrix is its rank. We recall that if we consider an $m \times n$-matrix $A = (\alpha^i_s)$, it is, as we know, just a rectangular array with scalar (number) entries. We may think of $A$ as a list of $n$ vectors (columns),
$$A = (a_1, \ldots, a_s, \ldots, a_n), \quad a_s \in \mathbb{R}^m,\ s \in I(n),$$
which we write vertically. We may call it a colist; this was already discussed in Sect. 3.1. Similarly, we may think of the rows:
$$B = \begin{pmatrix} \alpha^1 \\ \vdots \\ \alpha^i \\ \vdots \\ \alpha^m \end{pmatrix}.$$
If we want to stress that by $A$ we mean the matrix face of $A$ (the matrix $A$, $A \equiv [A]$), or analogously for $B$, we write
$$A = [A] = [a_1 \cdots a_s \cdots a_n] \quad\text{or}\quad [B] = \begin{bmatrix} \alpha^1 \\ \vdots \\ \alpha^i \\ \vdots \\ \alpha^m \end{bmatrix}.$$
Usually, we identify $A$ with $[A]$ and $B$ with $[B]$. But the list $A$ is different from the colist $B$ ($A \ne B$). The elements of the list $A$ belong to $\mathbb{R}^m$, whereas the elements of the list $B$ ($B = (\alpha^1, \ldots, \alpha^m)$) belong to $(\mathbb{R}^n)^*$.
We have $R(A) \cong C(A^T) \le \mathbb{R}^n$. Note that $A^T$ is the column face of the rows of $A$. This leads to the following definition:
It is furthermore clear that if we set $t := r\,\operatorname{rank}(A)$ and $c := c\,\operatorname{rank}(A)$, then $t$ is the number of linearly independent rows and $c$ is the number of linearly independent columns.
We are now able to formulate a main theorem of elementary matrix theory.
Theorem 5.1 Row and column rank, the first fundamental theorem of linear
algebra.
The row rank of a matrix is equal to the column rank: .r rank(A) =
c rank(A), so that .t = c.
Proof The goal is to reduce $A$ to a specific $t \times r$-matrix $\tilde{A}$, with $t$ linearly independent rows and $r$ linearly independent columns, if possible. Without loss of generality, we choose the first $t$ rows to be linearly independent and we call the remaining, linearly dependent rows superfluous. Similarly, we choose the first $r$ columns to be linearly independent and we call the rest (that is, the linearly dependent columns) superfluous. For the rows, in order to express the linear dependence, we split the index $i$ into $j$ and $\mu$. Taking $\sigma \in I(n)$, $i \in I(m)$, $j \in I(t)$ and $\mu \in \{t+1, \ldots, m\}$, with $\rho^\mu_j \in \mathbb{K}$, we have:
$$\alpha^\mu = \rho^\mu_j\, \alpha^j \quad\text{and}\quad \alpha^\mu_\sigma = \rho^\mu_j\, \alpha^j_\sigma. \tag{5.28}$$
For the columns, we consider their column rank. According to our choice above, we rearrange and write the $r$ linearly independent columns first in the list, and we have
$$a_s \lambda^s = 0 \quad\text{or equivalently}\quad \alpha^i_s \lambda^s = 0 \quad\text{for all } i \in I(m) \qquad (\alpha^j_s \lambda^s = 0 \text{ and } \alpha^\mu_s \lambda^s = 0), \tag{5.30}$$
Now we throw out the $m - t$ superfluous rows and are thus left with the shortened columns, which we denote by $\bar{a}_s$, and the shortened matrix or list $\bar{A} = (\bar{a}_1, \ldots, \bar{a}_s, \ldots, \bar{a}_n)$.
The point is that the row operation above does not affect the column rank and we
get the equality:
$$c\,\operatorname{rank}(\bar{a}_1, \ldots, \bar{a}_r) = c\,\operatorname{rank}(a_1, \ldots, a_r). \tag{5.32}$$
This is equivalent to the statement that also the shortened column list .(ā1 , . . . , ār )
is linearly independent like the given list .(a1 , . . . , ar ). This means that we have to
show the assertion:
If
$$\bar{a}_s \lambda^s = 0 \quad (\alpha^j_s \lambda^s = 0\ \ \forall\, j \in I(t)), \tag{5.33}$$
it follows that
$$\lambda^s = 0 \quad \forall\, s \in I(r). \tag{5.34}$$
This can be shown as follows: the equations $a_s\lambda^s = 0$, or $\alpha^i_s\lambda^s = 0$ with $i \in I(m)$, contain the equations $\bar{a}_s\lambda^s = 0$, or $\alpha^j_s\lambda^s = 0$ with $j \in I(t)$, $t < m$. So what is left is to check the equations
We may think of the number $\operatorname{rank}(A)$ as the quintessence of the matrix $A$; it indicates the "true" size of $A$. The following sections will justify this point of view.
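As a one-line numerical illustration of Theorem 5.1 (our own example), the column rank of $A$ and the column rank of $A^T$ (that is, the row rank of $A$) always agree.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 8))   # a 5 x 8 matrix of rank 3

# Column rank of A versus column rank of A^T (= row rank of A); they coincide.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))     # 3 3
```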
given by the matrix . F, acting as a linear map . f . We identify . f with . F and, slightly
misusing the notation (see Comment 5.2 below), we have:
$$\begin{array}{ccc} \mathbb{R}^n & & \mathbb{R}^m \\ \| & & \| \\ \ker f & & \operatorname{coker} f \\ \oplus & & \oplus \\ \operatorname{coim} f & \cong & \operatorname{im} f \ni \vec{0}\,' \end{array}$$
$$f : \mathbb{R}^n \longrightarrow \mathbb{R}^m, \qquad x \longmapsto f(x) := Fx.$$
F x is the matrix multiplication. If . E := (e1 , . . . , en ) and . E ' := (e1' , . . . , em' ) are the
.
canonical bases in .Rn and .Rm , we may also think the matrix . F as a representation
of the map . f with respect to the bases . E and . E ' and in our notation . F = f E ' E ∈
Rm×n . In components (coefficients, coordinates), the map . f is given by the following
expressions:
To reveal the role of .rank f , we would like to point out that given the map
. f ∈ Hom(V, V ' ), with .dim V = n and .dim V ' = m, the subspaces .ker f in .V
and .im f in .V ' are always uniquely defined. If we want to consider two abstract
vector spaces .V and .V ' , that is, without a scalar product and without using any
specific bases, this leads to a nonunique decomposition of .V and .V ' . So we may
have for example:
$$V \cong \ker f \oplus U_1 \xrightarrow{\;f\;} \operatorname{im} f \oplus \Omega_1 \cong V'$$
and
$$V \cong \ker f \oplus U_2 \xrightarrow{\;f\;} \operatorname{im} f \oplus \Omega_2 \cong V'$$
with
$$U_1 \cong U_2 \cong \operatorname{im} f \quad\text{or, equivalently,}\quad \dim U_1 = \dim U_2 = \dim(\operatorname{im} f) = \operatorname{rank} f = r,$$
$$\Omega_1 \cong \Omega_2 \not\cong \ker f \quad\text{or, equivalently,}\quad m - r = \dim\Omega_1 = \dim\Omega_2 \ne \dim(\ker f) = n - r.$$
$$V \cong \ker f \oplus \operatorname{coim} f \xrightarrow{\;f\;} \operatorname{im} f \oplus \operatorname{coker} f \cong V'.$$
In the present case with .(Rn , E) and .(Rm , E ' ), with . E and . E ' the cor-
responding canonical (standard) basis in .Rn and .Rm , the situation is now
$F^T$ is closely connected with the dot products in $\mathbb{R}^n$ and $\mathbb{R}^m$. This gives
$$g : \mathbb{R}^m \longrightarrow \mathbb{R}^n, \qquad w \longmapsto g(w) := F^T w.$$
So we get
$$\mathbb{R}^n \xrightarrow{\;f \equiv F\;} \mathbb{R}^m, \qquad \mathbb{R}^n \xleftarrow{\;g \equiv F^T\;} \mathbb{R}^m.$$
Now we have $\operatorname{im} g = C(F^T)$. This is the row space of $F$ in the form of columns, and it is a subspace of $\mathbb{R}^n$. The null space of $F^T$, $N(F^T) = \ker g$, is a subspace of $\mathbb{R}^m$. As we already showed in Theorem 5.1, $\dim C(F^T) = \dim C(F) = \operatorname{rank}(F) = r$. So we also obtain
$$\dim(\operatorname{im} f) = \dim(\operatorname{im} g) = \operatorname{rank} f = r. \tag{5.41}$$
In addition, we can show that .im g = C(F T ) and .ker f = N (F) are not only
complementary but also orthogonal:
Given $z \in N(F)$ and $v \in C(F^T)$, we have, with $w \in \mathbb{R}^m$, $Fz = 0$ and $F^T w = v$. The transpose of the last equation is given by
$$v^T = w^T F. \tag{5.42}$$
Hence $v^T z = w^T F z = 0$,
or
$$\ker f \perp \operatorname{im} g. \tag{5.44}$$
Similarly, we obtain
$$C(F) \perp N(F^T). \tag{5.45}$$
It is clear that now .coim f = C(F T ) and .coker f = N (F T ) are uniquely deter-
mined. It is interesting that in this case we obtain with .Rn and .Rm not only uniquely
the decomposition
$$\mathbb{R}^n = \ker f \oplus \operatorname{coim} f \xrightarrow{\;f\;} \operatorname{im} f \oplus \operatorname{coker} f = \mathbb{R}^m, \tag{5.46}$$
but even more: the unique orthogonal decomposition, denoted by the symbol $\Theta$:
$$\mathbb{R}^n = \ker f \mathbin{\Theta} \operatorname{coim} f \xrightarrow{\;f\;} \operatorname{im} f \mathbin{\Theta} \operatorname{coker} f. \tag{5.47}$$
$$\mathbb{R}^n = \ker F \mathbin{\Theta} \operatorname{im} F^T \xrightarrow{\;F\;} \operatorname{im} F \mathbin{\Theta} \ker F^T = \mathbb{R}^m. \tag{5.48}$$
For good reasons, this theorem may be considered as the second fundamental theorem
of elementary linear algebra. It states more precisely the situation at the beginning
of this section, written as:
The result of the above theorem may be represented symbolically by Fig. 5.1.
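The four fundamental subspaces and their orthogonality can be exhibited numerically, for instance via the singular value decomposition. The following sketch (our own sizes, random data, and tolerance; a plausible way to compute the subspaces, not the text's method) checks the orthogonal decompositions of Eq. (5.48).

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 4, 6, 2
F = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-2 map R^n -> R^m

# SVD: the columns of V (resp. U) split R^n (resp. R^m) into the four subspaces.
U, s, Vt = np.linalg.svd(F)
row_space  = Vt[:r, :].T      # coim F = im F^T   (dim r)
null_space = Vt[r:, :].T      # ker F             (dim n - r)
col_space  = U[:, :r]         # im F              (dim r)
left_null  = U[:, r:]         # coker F = ker F^T (dim m - r)

# Orthogonality of R^n = ker F (+) im F^T and R^m = im F (+) ker F^T:
print(np.allclose(row_space.T @ null_space, 0),      # ker F  _|_ im F^T
      np.allclose(col_space.T @ left_null, 0),       # im F   _|_ ker F^T
      np.allclose(F @ null_space, 0),                # F annihilates ker F
      np.allclose(left_null.T @ F, 0))               # ker F^T annihilates im F
```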
will be a useful preparation for the presentation of linear maps too. Section 3.2 gives an in-depth treatment, but what follows here is self-contained.
We start with an abstract vector space .V with .dim(V ) = n, and, as we already
know, we need a basis . B = (b1 , . . . , bs , . . . , bn ) in order to construct the appropriate
representation. This provides the following basis isomorphism:
$$\phi_B : V \longrightarrow \mathbb{K}^n, \qquad b_s \longmapsto \phi_B(b_s) := e_s, \quad s \in I(n), \tag{5.49}$$
or
$$v \longmapsto \phi_B(v) := \begin{bmatrix} v^1 \\ \vdots \\ v^n \end{bmatrix} = v_B = \vec{v}_B, \tag{5.50}$$
or
$$\phi_B(B) := E.$$
Comment 5.3 Comparison of the use of bases with the theory of manifolds.
– .B = (b1 , . . . , bn ), a list,
– .[B] = [b1 · · · bn ], a matrix,
– .φ B , a chart,
– .ψ B , a parametrization,
and we write
$$v = \psi_B(\vec{v}) = [B]\vec{v} = [b_1 \cdots b_n] \begin{bmatrix} v^1 \\ \vdots \\ v^n \end{bmatrix}.$$
We might now ask ourselves what the main purpose of a basis in practice is. With
a given basis, we can replace an abstract vector space by a concrete vector space,
and since this consists of number lists, we can calculate not only with vectors, but
we can also send these vectors elsewhere, for example, from Nicosia to Berlin. This
is possible if the observers at different positions previously agreed on the basis to be
used. So we can think that we gained a lot. But what is the price for this gain?
We lose uniqueness: any other basis yields quite different values (coordinates) for $v$. So we actually need to know how to go from a basis $B$ to any other basis $C$.
In Sect. 3.2, we learned that the best way to think about the abstract vector .v is to
present .v with all its representations simultaneously. This means to consider all the
bases, . B, C, D, ..., at the same time. But concretely, it is actually enough to use only
one more basis, for example .C, and to determine the transition from . B to .C. This
means, we use the set of all bases . B(V ) (think of relativity!). Thus, we can think that
we can present this .v with all its representations simultaneously and we know that
we can reach and use all the bases in .V , as shown and discussed in Sect. 3.2. We can
describe this, using the bases . B and .C, in the commutative diagram in Fig. 5.2:
We choose a new basis $C = (c_1, \ldots, c_n)$ and the corresponding basis isomorphism $\phi_C$. The transition map is given by $T_{CB} = \phi_C \circ \phi_B^{-1}$, which we identify immediately with the matrix $T := T_{CB} = (\tau^\mu_s)$, $\tau^\mu_s \in \mathbb{K}$. In what follows, we take $\mu, \nu, s, r \in I(n)$. The matrix $T$ is invertible (so that $T \in Gl(n)$), so we have $TT^{-1} = \mathbb{1}_n$ and $T^{-1}T = \mathbb{1}_n$, with $T^{-1} = (\bar{\tau}^s_\mu)$, or equivalently
$$\tau^r_\mu\, \bar{\tau}^\mu_s = \delta^r_s \quad\text{and}\quad \bar{\tau}^s_\mu\, \tau^\mu_r = \delta^s_r.$$
Since, with our smart indices, we distinguish clearly $\mu, \nu$ from $s, r$, we may write
$$\bar{\tau}^s_\mu \equiv \tau^s_\mu.$$
In this sense, $\bar{\tau}^s_\mu$ is a pleonasm and is used only if we want to emphasize that $\tau^s_\mu$ belongs to $T^{-1}$. It is unfortunate that the notation can be confused with the transpose $T^T$, and one should notice that generally $T^{-1} \ne T^T$.
Now, we can describe the change of basis also by this definition:
(i) From
$$v = b_s v^s_B \overset{!}{=} c_\mu \tau^\mu_s v^s_B \overset{!}{=} c_\mu v^\mu_C \quad\Longrightarrow\quad v^\mu_C = \tau^\mu_s v^s_B,$$
or, from
$$v = b_s v^s_B \quad \forall\, v_B \in \mathbb{K}^n,$$
we obtain
$$b_s = c_\mu \tau^\mu_s.$$
It is interesting to realize that the matrix .T acts on the basis from the right
and on the coefficient vectors from the left:
This is also an example for the discussion in Sect. 1.3, particularly for Remark 1.3.
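A short numerical sketch of this double role of $T$ (our own data; the bases are encoded as matrices whose columns are the basis vectors): the basis transforms from the right, $[B] = [C]\,T$, while the coefficient vectors transform from the left, $v_C = T\,v_B$, and both describe the same vector.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
B = rng.standard_normal((n, n))       # columns b_s: the "old" basis of R^n
T = rng.standard_normal((n, n))       # an invertible transition matrix T = T_CB
C = B @ np.linalg.inv(T)              # new basis with b_s = c_mu tau^mu_s, i.e. [B] = [C] T

v_B = rng.standard_normal(n)          # coefficients of v in the basis B
v   = B @ v_B                         # the vector itself
v_C = T @ v_B                         # T acts on the coefficient vector from the left

print(np.allclose(C @ v_C, v))        # True: same vector, new coordinates
```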
$$\begin{array}{ccc} V & \xrightarrow{\;\mathrm{id}_V\;} & V \\ \phi_B \downarrow & & \downarrow \phi_C \\ \mathbb{R}^n_B & \xrightarrow{\;\mathrm{id}_{CB}\;} & \mathbb{R}^n_C \end{array}$$
and we have
$$\mathrm{id}_{CB} \circ \phi_B = \phi_C \circ \mathrm{id}_V.$$
The meaning of .TC B is .TC B ≡ idC B . This is what we call in physics a pas-
sive symmetry transformation. It corresponds to a coordinate change while the
physical system remains fixed.
It is clear that the set of all representations of .V is given by the set of bases
in .V, B(V ) = {B}. If we consider the set of all bases in .V , every basis . B
leads to a basis isomorphism .φ B or to a linear chart .(V, φ B ). It is also called
a representation of the vector space .V by .n × 1-matrices or by columns of
length .n, or by the coefficient vectors in .Kn .
As discussed in Eqs. (5.49) and (5.50) at the beginning of this section, if we want,
we can also use the same notation for all these representations with the letter . M (like
matrix) and so get for every . B ∈ B(V ) the isomorphism
$$M : V \longrightarrow \mathbb{K}^n \equiv \mathbb{K}^{n\times 1},$$
In this sense, we here use a “universal” notation. In the next section, we shall see that
we may use the same universal notation for the representation of linear maps too.
In this section, we use a similar notation and apply the results of the previous sections.
For the representation of linear maps, we return now to the general case where $V$ and $V'$ are vector spaces without further structure. In order to describe a given linear map
$$f : V \longrightarrow V', \quad f \in \operatorname{Hom}(V, V'),$$
$$\begin{array}{ccc} V & \xrightarrow{\;f\;} & V' \\ \phi_B \downarrow & & \downarrow \phi_{B'} \\ \mathbb{R}^n_B & \xrightarrow{\;F\;} & \mathbb{R}^m_{B'} \end{array}$$
So we have $\phi_B(b_s) = e_s$ and $\phi_{B'}(b'_i) = e'_i$ with $s \in I(n)$ and $i \in I(m)$. The map $f$ is, according to Proposition 3.8 in Sect. 3.3, determined uniquely by its values on the basis $B$:
$$f(b_s) = b'_i \varphi^i_s, \quad \varphi^i_s \in \mathbb{K}, \tag{5.51}$$
$$f_{B'B} \equiv F := (\varphi^i_s). \tag{5.52}$$
As mentioned in the previous Sect. 5.4, .(φ B , V ) and (.φ B ' , V ' ) are what is called in
the theory of manifolds (local) charts. In linear algebra, these are of course global
charts. Equations (5.51) and (5.52) exposed in the form of a diagram as above, give:
$$\begin{array}{ccc} b_s & \xrightarrow{\;f\;} & f(b_s) = b'_i \varphi^i_s \\ \phi_B \downarrow & & \downarrow \phi_{B'} \end{array}$$
So we have, from $\phi_B(b_s) = e_s$, $\phi_{B'}(b'_i) = e'_i$, using linearity and Eqs. (5.51) and (5.52),
This justifies the equations in the above diagram and the following correspondence:
$$B \ni b_s \xrightarrow{\;f\;} b'_i \in B', \qquad e_s \xrightarrow{\;F\;} e'_i.$$
This shows, stated in simple words, that we can perfectly describe what happens at the "top" level of $V \xrightarrow{\;f\;} V'$ at the "bottom" level of $\mathbb{R}^n \xrightarrow{\;F\;} \mathbb{R}^m$. This is the essence
of the representation theory of linear maps. Using here in addition the universal
notation with the symbol . M, we can write
$$v \overset{F}{\longmapsto} w = F(v),$$
$$M(v) \overset{M(f)}{\longmapsto} M(w) = M(f)M(v),$$
or
$$\vec{v}_B \longmapsto \vec{w}_{B'} = F\vec{v}_B. \tag{5.61}$$
So we have
$$w^i_{B'} = \varphi^i_s v^s_B, \qquad w^i_{B'},\, v^s_B \in \mathbb{K}, \tag{5.63}$$
which is
$$\vec{w}_{B'} = F\vec{v}_B. \tag{5.64}$$
Since in the above diagram we already have the equation $Fe_s = e'_i\varphi^i_s$, it is obvious that
$$Fe_s = f_s. \tag{5.66}$$
This means that the $s$th column of the matrix $F$, which represents the map $f$, is the image of the canonical basis vector $e_s \in \mathbb{K}^n$. In addition, it means that the $s$th column of $F$ gives the coefficients of the value of $f$ on the $s$th basis vector $b_s$ in $V$. These coefficients correspond to the basis $B'$ in $V'$, as expected.
At this stage, we have to explain how . F changes by transforming the basis . B into
the basis .C in .V and the basis . B ' into the basis .C ' in .V ' . For simplicity’s sake we
call the new matrix . F̄ := f C ' C and so have to consider the transition from . F to . F̄.
This is given by the following proposition.
From the bases . B and .C in .V and . B ' and .C ' in .V ' , the matrix . FC ' C of . f is
given by . FC ' C = T ' FB ' B T −1 where the corresponding transition matrices
are given by
The second diagram leads to . f C ' C ◦ T = T ' ◦ f B ' B and to . f C ' C = T ' ◦ f B ' B ◦ T −1 .
We can also achieve the above result directly using only the tensor formalism: putting the right indices in the right place, we obtain
$$f_{B'B} \equiv F_B = (\varphi^i_s), \qquad f_{C'C} \equiv F_C = (\eta^j_r).$$
We start with $F_B = (\varphi^i_s)$ and we want to obtain $F_C = (\eta^j_r)$, using $T = (\tau^s_r)$ and $T' = (\tau'^j_i)$. We so obtain
$$\eta^j_r = \tau'^j_i\, \tau^s_r\, \varphi^i_s = \tau'^j_i\, \varphi^i_s\, \tau^s_r. \tag{5.67}$$
This corresponds to
$$F_C = T' F_B T^{-1}. \tag{5.68}$$
Usually, if the involved bases are evident from the context, they are not included in the notation. Here, seizing this opportunity, we would like to express all the relevant isomorphisms by the same letter $M$, using in a way a universal notation:
or
$$M(w) = M(f)\, M(v), \tag{5.70}$$
and
$$\vec{w} = F\vec{v}. \tag{5.71}$$
We now turn to perhaps the very first application of linear algebra, which is also one of the most important: solving a system of linear equations. With the results from
the previous chapters, tackling this problem and providing the corresponding proofs
is quite straightforward.
(iv) The homogeneous equation system has only the trivial solution:
$$\mathcal{L}(A) = \{0\}.$$
So we have
$$\mathcal{L}(A, b) = p_b + \mathcal{L}(A).$$
This means that $\mathcal{L}(A, b)$ is an affine space with the corresponding subspace $\mathcal{L}(A) \le \mathbb{K}^n$, with
$$\dim \mathcal{L}(A) = n - \operatorname{rank} A.$$
If we take the opportunity and utilize tailor-made bases for the map
. f A : Kn → Km ,
$$\tilde{A} = \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{K}^{m\times n},$$
$$\tilde{A}\tilde{x}_b = \tilde{b}$$
Proof We have
$$Ax_b = b \;\Longleftrightarrow\; F'Ax_b = F'b \;\Longleftrightarrow\; F'Ax_b = \begin{bmatrix} \bar{b} \\ 0 \end{bmatrix} \quad\text{with } \bar{b} \in \mathbb{K}^r \text{ and } \tilde{b} := F'b = \begin{bmatrix} \bar{b} \\ 0 \end{bmatrix}. \tag{5.72}$$
$$\tilde{x}_b := F^{-1}x_b \;\Longleftrightarrow\; x_b = F\tilde{x}_b, \tag{5.73}$$
leads to
$$Ax_b = b \;\Longleftrightarrow\; F'AF\tilde{x}_b = \tilde{b} \;\Longleftrightarrow\; F'AFF^{-1}x_b = F'b \;\Longleftrightarrow\; \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix}\tilde{x}_b = F'b.$$
This leads to
$$\tilde{x}_b = \begin{bmatrix} \bar{b} \\ \lambda \end{bmatrix}.$$
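The structure of the solution set, $\mathcal{L}(A, b) = p_b + \mathcal{L}(A)$ with $\dim\mathcal{L}(A) = n - \operatorname{rank} A$, can be verified numerically. The following sketch (our own small example; the use of least squares and the SVD to produce a particular solution and a kernel basis are assumptions of the sketch, not the text's procedure) checks that $p_b$ plus any kernel vector again solves $Ax = b$.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])        # rank A = 1, so dim L(A) = n - rank A = 2
b = np.array([6.0, 12.0])              # b lies in im A, hence L(A, b) is nonempty

# A particular solution p_b (least squares returns one, since the system is consistent).
p_b, *_ = np.linalg.lstsq(A, b, rcond=None)

# A basis of the solution space L(A) = ker A, read off from the SVD of A.
_, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
N = Vt[r:, :].T                        # columns span ker A, dim = 3 - 1 = 2

# Every p_b + N @ lambda solves A x = b: the solution set is the affine space p_b + L(A).
lam = np.array([0.7, -1.3])
print(np.allclose(A @ (p_b + N @ lam), b))     # True
```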
Summary
What is a matrix? This is where we started. By the end of this chapter, hopefully the
following became clear: a matrix is what you can do with it! In short, linear algebra
is what you can do with matrices.
Initially, we focused on matrix multiplication and its relationship to the com-
position of linear maps. We explored the four facets of matrix multiplication. A
fundamental theorem of linear algebra is the fact that the row rank of a matrix equals
its column rank. Many aspects related to this theorem are also considered in the
exercises.
The various roles of a matrix, as a linear map, as a basis isomorphism (as a supplier
of coordinates), as a representation matrix of abstract vectors and linear maps, along
with the associated change of basis formalism, were discussed extensively.
This led to the fundamental theorem of linear maps, which relates the kernels and images of a matrix with those of its transpose.
In the next five exercises, you are asked to prove once more the fundamental
theorem of linear algebra which yields the equality of row and column ranks
of a matrix. All these proofs, for which one needs only very elementary facts,
show very interesting aspects of the structure of matrices.
and that
$$\operatorname{rank} A^T \le \operatorname{rank} A,$$
which leads to
$$\operatorname{rank} A^T = \operatorname{rank} A,$$
and thus to
$$\text{row rank } A = \text{column rank } A.$$
$$T \equiv T_{CB} = (\tau^i_s),$$
as given in Comments 5.4 and 5.5, can also be directly expressed by the coordinates of the basis vectors $b_s$, $s \in I(n)$, of the basis $B = (b_1, \ldots, b_n)$ in the vector space $V$. Show that the matrix $T_{CB}$ in the notation of Comments 5.4 and 5.5 is given by
$$T_{CB} = [\phi_C(b_1) \ldots \phi_C(b_n)],$$
$$T_{BC} = [\phi_B(c_1) \ldots \phi_B(c_n)].$$
Exercise 5.8 The representation of a linear map . f ∈ Hom(V, V ' ) by using the
canonical basis isomorphism between .V and .Kn (.V ' and .Km ) with basis . B =
(b1 , . . . , bn ) in a vector space .V (. B ' = (b1' , . . . , bm' ) in .V ' ) can be expressed
symbolically as follows:
$$\begin{array}{ccc} b_s & \xrightarrow{\;f\;} & b'_i \varphi^i_s \\ e_s & \xrightarrow{\;F\;} & e'_i \varphi^i_s \end{array}$$
(i) $f$ is invertible;
(ii) the columns of the matrix $M(f) = F$ are linearly independent in $\mathbb{K}^n$;
(iii) the columns of $F$ span $\mathbb{K}^n$;
(iv) the homogeneous equation system has only the trivial solution.
$$n - \operatorname{rank} A.$$
(ii) If $p_b$ is a solution of $Ax = b$, then all solutions of $Ax = b$ are given by:
So we have
$$\mathcal{L}(A, b) = p_b + \mathcal{L}(A).$$
This means that $\mathcal{L}(A, b)$ is an affine space with the corresponding subspace $\mathcal{L}(A) \le \mathbb{K}^n$. With $r = \operatorname{rank} A$, we have
$$\dim \mathcal{L}(A) = n - r.$$
As mentioned in Sect. 5.3, a linear map . f between .Kn and .Km determines uniquely
the four subspaces .ker f, coim f ≤ Kn , .coker f , and .im f ≤ Km which give essen-
tial information about this map.
If we consider . f between two abstract vector spaces .V and .V ' with no additional
structure, .coker f and .coim f are not uniquely defined. If we want subspaces fixed
similarly as for . f ∈ Hom(Kn , Km ), we have to additionally consider the dual spaces
of .V and .V ' , .V ∗ and .V '∗ . This is what we are going to show in what follows.
Beforehand, we have to note that dual spaces play a unique role in linear algebra and
a crucial role in many areas of mathematics, for example, in analysis and functional
analysis. They are essential for a good understanding of tensor calculus; they appear
in special and general relativity and are ubiquitous in quantum mechanics. The Dirac
notation in quantum mechanics is perhaps the best demonstration of the presence of
dual spaces in physics.
$$\xi : V \longrightarrow \mathbb{K}, \qquad v \longmapsto \xi(v),$$
so that .ξ is a linear map between the vector spaces .V and .V ' = K. We choose the
basis . B = (b1 , . . . , bn ) of .V and . B ' = (1) of .K and the matrix representation of .ξ
$$\xi_{B'B} \equiv [\xi_1 \ldots \xi_n] \equiv (\xi_i)_n \equiv (\vec{\xi}\,)^T \equiv M(\xi) \in \mathbb{K}^{1\times n}, \qquad \xi_i \in \mathbb{K},\ \vec{\xi} \equiv (\xi^i) \text{ with } \xi^i = \xi_i \text{ and } i \in I(n).$$
$$\varepsilon^i(e_j) = \delta^i_j, \qquad \varepsilon^i \equiv e_i^* = e_i^T \in (\mathbb{K}^n)^* = \mathbb{K}^{1\times n},$$
for example $\varepsilon^1 = [1\,0 \cdots 0\,0], \ldots, \varepsilon^n = [0\,0 \cdots 0\,1]$. It should be clear that any linear map, here the linear function $\xi \in V^* = \operatorname{Hom}(V, \mathbb{K})$, can also have the
Proposition 6.1 shows also the following: For every basis . B in .V , there exists
a dual isomorphism .Ψ B from .V to .V ∗ which can be regarded as giving .V the
vector structure of .V ∗ . This isomorphism is given exactly by the dual basis . B ∗ .
Using the notation of Proposition 6.1, we have
$$\Psi_B : V \longrightarrow V^*,$$
$$\Phi_{B^*} : V^* \longrightarrow (\mathbb{K}^n)^* \equiv \mathbb{K}^{1\times n}, \qquad \beta^i \longmapsto \Phi_{B^*}(\beta^i) := \varepsilon^i.$$
So we have, for $\xi = \xi_i\beta^i$,
$$\Phi_{B^*}(\xi) = \xi_{B'B}.$$
As we already saw, the basis $B$ similarly determines the coordinates $\xi_i$ of the covector $\xi \in V^*$ in the cobasis $B^* = (\beta^1, \ldots, \beta^n)$. Taking $\xi = \xi_j\beta^j$, $\xi_j \in \mathbb{K}$, we have
$$\xi(b_i) = \xi_j\beta^j(b_i) = \xi_j\delta^j_i = \xi_i.$$
We can think that, besides the category of vector spaces, there is also the category of covector spaces, that is, that there exists to each $V$ the associated $V^*$. Analogously, we can think that to each linear map $f$ there exists the associated dual linear map $f^*$.
$$f^*(\eta + \theta) = (\eta + \theta) \circ f = \eta \circ f + \theta \circ f = f^*\eta + f^*\theta \quad\text{and}\quad f^*(\lambda\eta) = \lambda\eta \circ f = \lambda(\eta \circ f) = \lambda f^*\eta.$$
$$f + g \longmapsto (f + g)^* = f^* + g^*, \qquad \lambda f \longmapsto (\lambda f)^* = \lambda f^*.$$
A direct comparison between $f$ and its dual $f^*$ shows that $f^*$ points in the opposite direction:
$$f : V \longrightarrow V', \qquad f^* : V^* \longleftarrow (V')^*.$$
$$(\xi, v) := \xi(v), \qquad \xi \in V^*,\ v \in V.$$
Thus the above definition $f^*\eta(v) = \eta(fv)$ can also take the form:
$$(f^*\eta, v) = (\eta, fv).$$
then
$$(g \circ f)^*(\eta) = \eta \circ (g \circ f) = (\eta \circ g) \circ f = g^*(\eta) \circ f = f^*\big(g^*(\eta)\big) = (f^* \circ g^*)(\eta) \in U^*.$$
$$((g \circ f)^*\eta, v) = (\eta, (g \circ f)v) = (\eta, g(fv)) = (g^*\eta, fv) = (f^*(g^*\eta), v) = ((f^* \circ g^*)\eta, v) \;\Rightarrow\; (g \circ f)^*\eta = (f^* \circ g^*)\eta.$$
$$\begin{array}{ccc} V & \xrightarrow{\;f\;} & W \\ {\scriptstyle f^*\eta = \eta\circ f} & \searrow & \downarrow {\scriptstyle\eta} \\ & & \mathbb{K} \end{array}$$
The representation of the dual map . f ∗ is, as expected, deeply connected with the
representation of . f : . M( f ∗ ) = M( f )T !
Proof For this proof, we use only the corresponding bases and cobases. We have
$$\beta^r(b_s) = \delta^r_s, \quad r, s \in I(n), \qquad\text{and}\qquad \gamma^i(c_j) = \delta^i_j, \quad i, j \in I(m).$$
We define
$$f^*(\gamma^i) = \beta^r\chi^i_r, \tag{6.3}$$
$$F^* := f_{B^*C^*} = (\chi^i_r). \tag{6.4}$$
$$\Longrightarrow \chi^i_s = \varphi^i_s.$$
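A small numerical sketch of the result $M(f^*) = M(f)^T$ (our own random data): the defining property $f^*\eta(v) = \eta(fv)$ holds in coordinates when the dual map is represented by the transposed matrix.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 3, 2
F = rng.standard_normal((m, n))     # represents f : V -> V'
eta = rng.standard_normal(m)        # coordinates of a covector eta in V'^*
v = rng.standard_normal(n)

# Defining property f*(eta)(v) = eta(f v); in coordinates the dual map acts by F^T.
print(np.allclose((F.T @ eta) @ v, eta @ (F @ v)))   # True
```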
As we saw in Sect. 5.3, the matrix . F ∈ Hom(Kn , Km ) determines uniquely the four
subspaces
$$\ker F,\ \operatorname{im} F^T \le \mathbb{K}^n \qquad\text{and}\qquad \operatorname{im} F,\ \ker F^T \le \mathbb{K}^m, \tag{6.7}$$
which give important information about the map . F. For . f ∈ Hom(V, V ' ) this is not
possible if .V and .V ' have no additional structure. The reason is that only .ker f and
.im f are uniquely defined by . f , but .coim f and .coker f are not uniquely defined by
$f$. Only if we choose bases $B$ and $B'$ for $V$ and $V'$ are the complements of $\ker f$ and $\operatorname{im} f$ also fixed, by $f$ and $(B, B')$. So we may write:
$$V \cong \ker f \oplus \operatorname{coim}_B f \xrightarrow{\;f\;} \operatorname{im} f \oplus \operatorname{coker}_B f \cong V'. \tag{6.8}$$
We need $B$ and $B'$ since, as mentioned, when $V, V'$ are abstract vector spaces, we do not possess anything analogous to $F^T$ as in the case when we consider $\mathbb{K}^n$ and $\mathbb{K}^m$. As we shall see later in Sect. 6.3, if $V$ and $V'$ are Euclidean or unitary vector spaces, the adjoint $f^{ad}$ will play the role of $F^T$ and we can find a basis-free version of $\operatorname{coim}_B f$ and $\operatorname{coker}_B f$, induced directly from $f$.
Here, with $V$ and $V'$ abstract vector spaces without further structure, if we want to find a kind of basis-free decomposition of $V$ and $V'$ induced by $f$, we have to make use of the dual point of view and consider $f^* \in \operatorname{Hom}(V'^*, V^*)$. As we already know, for a given $f$ the dual $f^*$,
$$f^* : V'^* \longrightarrow V^*, \qquad \eta \longmapsto f^*(\eta) := \eta \circ f, \tag{6.9}$$
is uniquely determined. Now the subspaces .im f ∗ ≤ V ∗ and .ker f ∗ ≤ V '∗ are also
uniquely determined by . f ∗ . These two subspaces, .im f ∗ , ker f ∗ , which correspond
to .coim B f and .coker B f , are a kind of substitute for .im F T and .ker F T , respectively.
So we get the big picture for . f as given by the proposition in form of a diagram:
$$\begin{array}{ccccc} V^* & \cong & \operatorname{coker}_B f^* \oplus \operatorname{im} f^* & \xleftarrow{\;f^*\;} & \operatorname{coim}_{B'} f^* \oplus \ker f^* \;\cong\; V'^* \\ & & {\scriptstyle B}\ \cong\uparrow & & \uparrow\cong\ {\scriptstyle B'} \\ V & \cong & \ker f \oplus \operatorname{coim}_B(f) & \xrightarrow{\;f\;} & \operatorname{im} f \oplus \operatorname{coker}_B(f) \;\cong\; V'. \end{array} \tag{6.10}$$
Proof The proof is obtained straightforwardly almost by inspection, using the dual
bases for .V and .V ' . We may also write symbolically for the uniquely defined sub-
spaces:
$$\begin{array}{ccc} \cdots \oplus \operatorname{im} f^* & \xleftarrow{\;f^*\;} & \cdots \oplus \ker f^* \\ \ker f \oplus \cdots & \xrightarrow{\;f\;} & \operatorname{im} f \oplus \cdots. \end{array}$$
$$\operatorname{im} f^* \cong \operatorname{coim}_B(f) \cong \operatorname{im} f \cong \operatorname{coim}_B(f)^*, \tag{6.11}$$
$$\ker f \cong \operatorname{coker}_B(f)^* \quad\text{and}\quad \ker f^* \cong \operatorname{coker}_B(f). \tag{6.12}$$
Proposition 6.4 may also be considered as a synopsis of the results of the second
fundamental theorem of linear algebra (see Theorem 5.2) for the general case of an
abstract vector space.
If we use the notation of an annihilator, further results are obtained.
$$W^* := \operatorname{span}(\beta^1, \ldots, \beta^l) = \{\lambda_s\beta^s : \lambda_s \in \mathbb{K}\}$$
is a subspace of $V^*$ ($W^* \le V^*$) and $W^* \cong W$. We also notice that $W^*$ annihilates $U$:
so that $W^*$ is a subspace of $U^0$:
$$W^* \le U^0. \tag{6.13}$$
$$w = \mu_j\alpha^j + \lambda_s\beta^s, \qquad \mu_j, \lambda_s \in \mathbb{K}, \tag{6.14}$$
and
$$w(a_i) \overset{!}{=} 0 \quad \text{for all } i \in I(k). \tag{6.15}$$
$$= \mu_j\delta^j_i + 0 = \mu_i \overset{!}{=} 0. \tag{6.16}$$
This leads to
$$w = \lambda_s\beta^s \in U^0. \tag{6.17}$$
$$\operatorname{im} f^* = (\ker f)^0. \tag{6.19}$$
This follows directly by setting .ker f = U and using tailor-made bases in the
proof of Proposition 6.5. Here, we give a basis-independent proof for Eq. (6.18):
Proof In Eq. (6.13), we have
$$V^* \xleftarrow{\;f^*\;} V'^* \ge \ker f^*, \qquad V \xrightarrow{\;f\;} V' \ge \operatorname{im} f.$$
We have to show
(a) .ker f ∗ ≤ (im f )0 and
(b) .ker f ∗ ≥ (im f )0 .
which gives .ker f ∗ = (im f )0 .
For (a): We consider the sequence of the following implications:
$$\eta_0 \in \ker f^* \;\Rightarrow\; f^*\eta_0 = 0^* \in V^*,$$
$$f^*\eta_0(v) = 0^*(v) = 0 \;\Rightarrow\; \eta_0(fv) = 0 \;\Rightarrow\; \eta_0(fV) = 0 \;\text{ or }\; \eta_0(\operatorname{im} f) = 0.$$
So (a) is proven.
For (b): We start with $\theta \in (\operatorname{im} f)^0$. Then we have, for all $v \in V$,
$$\theta(fv) = 0 \quad\text{and}\quad f^*\theta(v) = 0,$$
that is, $f^*\theta = 0^*$ and $\theta \in \ker f^*$, which proves (b).
∎
Proof (i) If $f$ is injective, then we have $\ker f = \{0\}$. Using Propositions 6.5 and 6.6, we obtain
$$(\ker f)^0 = V^*, \qquad \dim V^* = n,$$
and so
$$\dim(\operatorname{im} f^*) = \dim(\ker f)^0 = n,$$
and so
$$\operatorname{im} f^* = V^*,$$
thus $f^*$ is surjective.
(ii) Similar to the proof for (i).
∎
We consider a linear map. f ∈ Hom(V, V ' ) between.V = (V, (|)) and.V ' = (V ' , (|)),
two inner product vector spaces. In our approach, we always mean a finite-
dimensional vector space by an inner product vector space, usually a Euclidean
or unitary vector space. Here we obtain the same picture of the four relevant sub-
spaces of . f as in the case . f ≡ F ∈ Hom(Kn , Km ). The role of the transpose . F T
(giving .ker F T and .im F T ) is taken over now by the adjoint map . f ad of . f . We will
discuss this new linear algebra notion in this section too. The existence of adjoint and
self-adjoint operators is omnipresent in physics. Especially for quantum mechanics,
it is interesting to note that self-adjoint operators describe the physical observables
on a Hilbert space. It is well-known that for finite dimensions, the notion of a Hilbert
space is equivalent to that of a unitary space.
So here we have the opportunity to look first at finite-dimensional spaces, which are much easier than infinite-dimensional spaces, in order to understand the Hilbert space structure and observe its geometric significance.
The existence of an inner product in .V allows a second dual isomorphism which
is basis-independent.
$$f: V \longrightarrow V', \quad v \longmapsto f(v),$$
$$f(u+v) = f(u) + f(v) \quad\text{and, for } \lambda \in \mathbb{C},\quad f(\lambda v) = \bar\lambda\, f(v) \ \text{ or } \ f(\lambda v) = \lambda\, f(v). \tag{6.22}$$
It is clear that for a Euclidean vector space, the semilinear map $j$ is a linear map.
$$\begin{array}{ccc} V^* & \xleftarrow{\ f^*\ } & V'^* \\ \uparrow{\scriptstyle j} & & \uparrow{\scriptstyle J} \\ V & \xleftarrow{\ f^{\mathrm{ad}}\ } & V' \end{array}$$
The isomorphisms $j$ and $J$ are the corresponding dual isomorphisms as given in Eqs. (6.20) and (6.24) below:
$$J: V' \longrightarrow V'^*, \tag{6.24}$$
and we obtain
$$j \circ f^{\mathrm{ad}} = f^* \circ J \tag{6.25}$$
or equivalently
$$f^{\mathrm{ad}} = j^{-1} \circ f^* \circ J, \tag{6.26}$$
$$f^{\mathrm{ad}}: V' \longrightarrow V, \quad w \longmapsto f^{\mathrm{ad}}(w).$$
We can obtain the analytic expression (6.23) for $f^{\mathrm{ad}}$ from (6.25) as follows:
$$(j \circ f^{\mathrm{ad}})(w) = (f^* \circ J)(w)$$
$$\Leftrightarrow\ j(f^{\mathrm{ad}} w) = f^*(J w) \in V^*$$
$$\Leftrightarrow\ (j(f^{\mathrm{ad}} w))(v) = J(w)(f v) \in \mathbb{K} \quad\text{for any } v \in V \text{ and } w \in V'$$
$$\Leftrightarrow\ (f^{\mathrm{ad}} w \mid v) = (w \mid f v)$$
$$\Leftrightarrow\ (v \mid f^{\mathrm{ad}} w) = (f v \mid w). \tag{6.27}$$
The additivity is quite clear because for .v ∈ V , .w ∈ V ' . The semi lin-
earity follows from
∎
(iii) $\mathrm{ad}$ is an involution: $(f^{\mathrm{ad}})^{\mathrm{ad}} = f$. For any $w \in V'$, $v \in V$,
$$(w \mid (f^{\mathrm{ad}})^{\mathrm{ad}} v) = (f^{\mathrm{ad}} w \mid v) = (w \mid f v). \tag{6.29}$$
∎
(iv) $(g \circ f)^{\mathrm{ad}} = f^{\mathrm{ad}} \circ g^{\mathrm{ad}}$.
Using an obvious notation as above for $j$, $J$, and $K$, and (6.26), we obtain
$$(g \circ f)^{\mathrm{ad}} = j^{-1} \circ (g \circ f)^* \circ K = j^{-1} \circ f^* \circ g^* \circ K$$
$$\Rightarrow\ (g \circ f)^{\mathrm{ad}} = (j^{-1} \circ f^* \circ J) \circ (J^{-1} \circ g^* \circ K) = f^{\mathrm{ad}} \circ g^{\mathrm{ad}}. \tag{6.30}$$
We are now in the position to determine the relation of $\ker f^{\mathrm{ad}}$ and $\operatorname{im} f^{\mathrm{ad}}$ to $\operatorname{im} f$ and $\ker f$. Then:
(i) $\ker f^{\mathrm{ad}} = (\operatorname{im} f)^\perp$.
(ii) $\operatorname{im} f^{\mathrm{ad}} = (\ker f)^\perp$.
$$y \in \ker f^{\mathrm{ad}} \ \Leftrightarrow\ f^{\mathrm{ad}} y = 0 \ \Leftrightarrow\ (v \mid f^{\mathrm{ad}} y) = 0 \ \forall v \in V \ \Leftrightarrow\ (f v \mid y) = 0 \ \forall v \in V \ \Leftrightarrow\ y \in (\operatorname{im} f)^\perp.$$
Similarly,
$$\operatorname{im} f^{\mathrm{ad}} = (\ker f)^\perp.$$
This result shows that $f$ and $f^{\mathrm{ad}}$ lead uniquely, through “$\ker$” and “$\operatorname{im}$”, to an orthogonal decomposition of $V$ and $V'$.
In this way, we obtained for a general . f ∈ Hom(V, V ' ), with the inner product
vector spaces .(V, (|)) and .(V ' , (|)' ), the same connection with the four . f -relevant
subspaces as with . F ∈ Hom(Rn , Rm ) in Theorem 5.2. Therefore, it may be consid-
ered as another face of the same theorem:
Theorem 6.1 The fundamental theorem of linear maps for inner product vector spaces.
Any map $f: V \to V'$ decomposes as follows:
$$V = \ker f \oplus \operatorname{im} f^{\mathrm{ad}} \ \xrightarrow{\ f\ }\ \operatorname{im} f \oplus \ker f^{\mathrm{ad}} = V'.$$
$$F^{\mathrm{ad}} = F^\dagger, \quad\text{where } F^\dagger := \bar F^T.$$
We obtain from $f v_a = w_i \varphi^i_a$, $\varphi^i_a \in \mathbb{K}$,
$$(v_b \mid f^{\mathrm{ad}} w_i) = \chi^b_i,$$
$$(v_b \mid f^{\mathrm{ad}} w_i) = (f v_b \mid w_i) = \overline{(w_i \mid f v_b)} = \bar\varphi^i_b.$$
The comparison of the last two equations leads to the result
$$\chi^b_i = \bar\varphi^i_b.$$
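As a quick numerical illustration of this result (added here, not part of the original text), the following NumPy sketch checks the defining relation (6.27) and that the conjugate transpose indeed plays the role of the adjoint matrix; the matrix sizes and the random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A complex matrix F representing f: C^3 -> C^2 in orthonormal bases.
F = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
F_ad = F.conj().T  # candidate for the adjoint: the conjugate transpose

v = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # v in V = C^3
w = rng.standard_normal(2) + 1j * rng.standard_normal(2)   # w in V' = C^2

inner = lambda a, b: np.vdot(a, b)  # (a|b) = conj(a)^T b, antilinear in the first slot

# Check (v | f^ad w) = (f v | w), Eq. (6.27):
lhs = inner(v, F_ad @ w)
rhs = inner(F @ v, w)
print(np.isclose(lhs, rhs))  # True
```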
Quantum mechanics is done in a Hilbert space $H$; that is the realm of quantum mechanics. Here, we consider finite-dimensional vector spaces and therefore we also consider finite-dimensional Hilbert spaces. An $n$-dimensional Hilbert space is a $\mathbb{C}$ vector space with inner product, $H = (V, (\,|\,))$, $\dim V = n$. If we choose an orthonormal basis $C = (c_1, \ldots, c_n)$, then we have the following isomorphism:
$$H \cong \mathbb{C}^n.$$
So we can identify the Hilbert space $H$ with $\mathbb{C}^n$. The inner product here is also called a Hermitian product, and $(\,|\,)$ is nothing else but the Dirac bra ket! But Dirac goes one step further and decomposes the bra ket $(\,|\,)$ into the two maps $(\,|$ (bra) and $|\,)$ (ket), where $|\,) = \mathrm{id}_H \in \mathrm{Hom}(H, H)$:
$$|\,): H \longrightarrow H, \quad v \longmapsto |v) = v,$$
$$(\,|: H \longrightarrow H^*, \quad v \longmapsto (v|,$$
with
$$(v|: H \longrightarrow \mathbb{C}, \quad u \longmapsto (v|u) = v^\dagger u.$$
So the result is that we have in fact $|v) = v$ and $(v| \ne v$, definitively. So the new object is only the map $(\,| \in \mathrm{Hom}(H, H^*)$. However, this is nothing else but the well-known canonical isometry between $H$ and $H^*$ (see Chap. 11.2 and also the canonical dual isomorphism as in Definition 6.3). At this point, to facilitate our intuition, we prefer considering a real vector space. So we set now $H = \mathbb{R}^n \equiv \mathbb{R}^{n \times 1}$. This is no restriction for the following considerations.
We notice immediately that, because of the equation $(v|u) = v^T u$, the equality $(\,| = (\cdot)^T: H \to H^*$ holds too. Thus, the transpose $T$, when restricted to $H$, is nothing else but the map "bra" $= (\,|$ taken from the bra ket. So we have just to call the symbol $|\,) = \mathrm{id}_H$ "ket", as Dirac did. But what is the difference between the transpose $T$ and bra? Bra is only defined on $H$, while the transpose $T$ is defined on $H$ as well as on $H^*$. So we have the well-known relations:
$$T: \mathbb{R}^n \longrightarrow (\mathbb{R}^n)^*$$
This means that when we are using $|v)$, we see $v \in \mathbb{R}^n$ but explicitly never $\xi = \xi_v \in (\mathbb{R}^n)^*$. This facilitates the identification of $H$ with $H^*$. In coordinates (coefficients), using the canonical basis $E = (e_1, \ldots, e_n)$ and the canonical cobasis $E^* = (\varepsilon^1, \ldots, \varepsilon^n)$, $(e_i|e_s) = \delta_{is}$, $\varepsilon^i(e_s) = \delta^i_s$, $i, s \in I(n)$, we have
$$|v) = v = e_s v^s, \ v^s \in \mathbb{R}, \quad\text{and}\quad (v| = v^T = v_i \varepsilon^i \ \text{ with } \ v_i = v^i.$$
$$(v|u) = \bar v^T u = \sum_{i=1}^{n} \bar v^i u^i = v_i u^i.$$
Then, $(\,|$ corresponds to the conjugate transpose $\dagger$ and we can write for $v \in H$:
$$|v) = e_i v^i, \ v^i \in \mathbb{C}, \quad\text{and}\quad (v| = v_i \varepsilon^i \ \text{ with } \ v_i = \bar v^i.$$
As we know, the symbol $(\,|\,)$ denotes the Hermitian product for a $\mathbb{C}$ vector space, which is a sesquilinear (“one and a half times linear”) map
$$(\,|\,): H \times H \longrightarrow \mathbb{C}.$$
$$A = \sum_{s,i}^{n} \alpha^i_s\, |e_i)(e_s|.$$
Remark 6.11 The extended identity $\sum_{i=1}^{n} |e_i)(e_i|$.
According to Remark 6.9, the expression $\sum_{i=1}^{n} |e_i)(e_i|$ can be identified with $\mathrm{id}_H$ or $\mathrm{id}_{H^*}$.
The present Dirac formalism also leads directly, as in Sect. 5.1, to the expression for matrix multiplication. With $A = (\alpha^i_s)$, $B = (\beta^j_r)$, $C = (\gamma^i_r)$, $i, j, r, s \in I(n)$, and $C = AB$, using Remark 6.10 we get:
This leads to
$$\gamma^i_r = \alpha^i_j\, \beta^j_r.$$
∎
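The following NumPy sketch (an illustration added here, not from the original) expresses a matrix as a sum of ket-bra outer products $|e_i)(e_s|$ and checks the resolution of the identity $\sum_i |e_i)(e_i| = \mathrm{id}$ as well as the component formula $\gamma^i_r = \alpha^i_j \beta^j_r$ for $C = AB$.

```python
import numpy as np

n = 4
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
E = np.eye(n)  # columns e_1, ..., e_n of the canonical basis

# Resolution of the identity: sum_i |e_i)(e_i| = 1_n.
identity = sum(np.outer(E[:, i], E[:, i]) for i in range(n))
print(np.allclose(identity, np.eye(n)))  # True

# A = sum_{s,i} alpha^i_s |e_i)(e_s|  (outer products weighted by the entries of A).
A_rebuilt = sum(A[i, s] * np.outer(E[:, i], E[:, s]) for i in range(n) for s in range(n))
print(np.allclose(A_rebuilt, A))  # True

# Matrix multiplication in components: gamma^i_r = alpha^i_j beta^j_r.
C = np.einsum('ij,jr->ir', A, B)
print(np.allclose(C, A @ B))  # True
```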
Summary
The role of the dual vector space and the dual map was thoroughly discussed. This
is an area that is often neglected in physics. The last section of this chapter on Dirac
formalism in quantum mechanics illustrates that this does not have to be the case. Dual
maps in situations where only abstract vector spaces are available serve as a certain
substitute for adjoint maps, which, as we have seen, are defined on inner product
vector spaces.
Here, we also observed the dual version of the four fundamental subspaces of a
linear map. The annihilator of a subspace, as a subspace in the dual space, also played
an important role. We showed that an inner product space is naturally isomorphic to its dual space.
Following this, within the duality of inner product vector spaces, the adjoint of a given linear map was introduced. The four fundamental subspaces are
naturally most visible in the inner product space situation using the adjoint map.
Finally, as mentioned, Dirac formalism was addressed.
$$T: \mathbb{K}^{m \times n} \longrightarrow \mathbb{K}^{n \times m}, \quad A \longmapsto A^T$$
In the following two exercises, we see explicitly the role of the dual space $(\mathbb{K}^m)^*$ in determining the rows of a matrix $A \in \mathbb{K}^{m \times n}$. Similarly, we see that $(\mathbb{K}^n)^*$ determines the rows of $A^T$.
Exercise 6.3 We consider the matrix $A \in \mathbb{K}^{m \times n}$ as a linear map $A \in \mathrm{Hom}(V, V')$ with $V = \mathbb{K}^n$ and $V' = \mathbb{K}^m$, and we write:
$$A = (\alpha^i_s) = [a_1 \ldots a_n] = \begin{bmatrix} \alpha'^1 \\ \vdots \\ \alpha'^m \end{bmatrix},$$
where the $a_s$, $s \in I(n)$, are the columns and the $\alpha'^i$, $i \in I(m)$, the rows of $A$.
Show that
$$a_s{}^T = \varepsilon^s A^T \quad\text{and}\quad \alpha'^i{}^T = A^T e'_i.$$
With the more advanced tools of this chapter, it will be even easier to prove the row-column-rank theorem in Exercise 6.6. Beforehand, let us recall the connection between $\operatorname{im} f$ (with $f \in \mathrm{Hom}(V, V')$) and the column rank of $M(f)$ in Exercise 6.5.
Exercise 6.5 Let . f be a linear map . f ∈ Hom(V, V ' ). Show that .dim im f equals
the column rank of . M( f ), the matrix of . f .
Exercise 6.6 The row-column-rank theorem.
Consider the linear map . f = A ∈ Hom(Kn , Km ), then show that the row rank of . A
equals the column rank of . A.
Exercise 6.7 The form of any matrix of rank equal to 1.
Use the experience gained with the proof in Exercise 5.4 to show that for any matrix $A \in \mathbb{R}^{m \times n}$, the rank of $A$ is 1 if and only if the matrix $A$ is of the form $A = w\xi$ with $w \in \mathbb{R}^m$ and $\xi \in (\mathbb{R}^n)^*$. In this case, we can also write:
$$A = w v^T = |w)(v|.$$
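As a small numerical aside (not in the original text), the sketch below builds a rank-1 matrix as an outer product $|w)(v| = w v^T$ and confirms its rank; the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(5)   # w in R^m, m = 5
v = rng.standard_normal(3)   # v in R^n, n = 3

A = np.outer(w, v)           # A = w v^T = |w)(v|, an m x n matrix
print(A.shape, np.linalg.matrix_rank(A))  # (5, 3) 1
```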
Exercise 6.8 The dual basis covectors select the coordinates of the vectors in $V$, and the basis vectors select the coordinates of the covectors in $V^*$.
If $(b_1, \ldots, b_n)$ is a basis in $V$ and $(\beta^1, \ldots, \beta^n)$ its dual basis, so that $\beta^s(b_r) = \delta^s_r$, $s, r \in I(n)$, show that for any vector $v \in V$ and any covector $\xi \in V^*$,
(i) $v = b_s\, \beta^s(v) \in V$;
(ii) $\xi = \xi(b_s)\, \beta^s \in V^*$.
Exercise 6.9 There always exists an element of the dual space which annihilates any
given proper subspace of the corresponding vector space.
Let .U be a subspace of a vector space .V . If .dim U < dim V , then show that there
exists some .ξ ∈ V ∗ such that .ξ(U ) = 0.
Exercise 6.11 Here is another proof of Proposition 6.5. This proof is basis free.
For $U$ a subspace of a vector space $V$, the dimension of the annihilator $U^0$ is given by
$$\dim U + \dim U^0 = \dim V.$$
Consider the inclusion map
$$i: U \longrightarrow V, \quad u \longmapsto i(u) = u \in V.$$
The next four exercises deal with various simple relations between two sub-
spaces and the corresponding annihilators.
Exercise 6.12 If $U_1$ and $U_2$ are subspaces of $V$ with $U_1 \le U_2$, then show that $U_2^0 \le U_1^0$.
Exercise 6.13 If $U_1$ and $U_2$ are subspaces of $V$ with $U_2^0 \le U_1^0$, then show that $U_1 \le U_2$.
Exercise 6.14 If $U_1$ and $U_2$ are subspaces of $V$, then show that $(U_1 + U_2)^0 = U_1^0 \cap U_2^0$.
Exercise 6.15 If $U_1$ and $U_2$ are subspaces of $V$, then show that $(U_1 \cap U_2)^0 = U_1^0 + U_2^0$.
Definition 6.5 The double dual of $V$, here denoted by $V^{**}$, is the dual of the dual space $V^*$:
$$V^{**} := \mathrm{Hom}(V^*, \mathbb{R}).$$
$V^{**}$ is canonically isomorphic to $V$:
$$\Psi: V \longrightarrow V^{**}, \quad v \longmapsto \Psi(v) \ \text{ with } \ (\Psi(v))(\xi) \equiv v^{\#}(\xi) := \xi(v)$$
for $v \in V$ and $\xi \in V^*$.
Exercise 6.16 Show the following assertion: $\Psi \equiv (\cdot)^{\#}$ is a linear map from $V$ to $V^{**}$.
The determinant is one of the most exciting and essential functions in mathematics
and physics. Its significance stems from the fact that it is a profoundly geometric
object. It possesses many manifestations. Its domain is usually the .n × n-matrices,
and it may also be called the determinant function. Another form is the map from
the Cartesian product .V n = V × . . . × V to .K (where .dim V = n) which is linear
in every component with the additional property (alternating) that if two vectors are
identical, the result is zero. This is usually called a multilinear alternating form or a
determinant form or even a volume form on .V . In connection with this, the notion
of orientation is illuminated by a determinant.
In what follows, we start with the algebraic point of view for determinants, and
in doing so, we derive and discuss most of the properties of determinants. Later we
address the geometric point of view. In addition, we define the determinant of a linear
operator, which is essentially a third manifestation of determinants.
From the algebraic point of view, the use of elementary operations and elementary
matrices offers some advantages since the expressions and proofs become clearer and
shorter. We start with a few remarks on the notations and definitions. We first consider
the .m × n canonical basis matrices (see also Comment 3.3, and the Examples 2.5
and 2.6) given by:
$$E_{is} := \left((\varepsilon_{is})^j_r\right) \tag{7.1}$$
with
$$(\varepsilon_{is})^j_r := \delta_{ij}\,\delta^r_s, \quad\text{for } i, j \in I(m),\ r, s \in I(n). \tag{7.2}$$
$$F_{is} = \mathbb{1}_n + E_{is} \tag{7.4}$$
$$F_k(\lambda) = \mathbb{1}_n + (\lambda - 1)E_{kk}, \quad \lambda \in \mathbb{K},\ i, s, k \in I(n) \tag{7.5}$$
For example (the extra 1 of $E_{is}$ sits in row $i$ and the $s$th column, and the $\lambda$ sits in the $k$th diagonal position):
$$F_{is} = \mathbb{1}_n + E_{is} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \qquad F_k(\lambda) = \begin{bmatrix} 1 & & & & \\ & 1 & & & 0 \\ & & \lambda & & \\ & 0 & & 1 & \\ & & & & 1 \end{bmatrix}.$$
$$F_{is}^{-1} = \mathbb{1}_n - E_{is} \quad\text{and}\quad F_k^{-1}(\lambda) = F_k\!\left(\tfrac{1}{\lambda}\right). \tag{7.6}$$
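A minimal NumPy sketch (added here as an illustration; the helper names are ours) of the elementary matrices $F_{is}$ and $F_k(\lambda)$ and of the inverse relations (7.6):

```python
import numpy as np

def E(n, i, s):
    """Canonical basis matrix with a single 1 in row i, column s (0-based)."""
    M = np.zeros((n, n))
    M[i, s] = 1.0
    return M

def F_shear(n, i, s):
    """F_is = 1_n + E_is, with i != s."""
    return np.eye(n) + E(n, i, s)

def F_scale(n, k, lam):
    """F_k(lambda) = 1_n + (lambda - 1) E_kk."""
    return np.eye(n) + (lam - 1.0) * E(n, k, k)

n, i, s, k, lam = 5, 1, 3, 2, 2.5
print(np.allclose(np.linalg.inv(F_shear(n, i, s)), np.eye(n) - E(n, i, s)))      # True
print(np.allclose(np.linalg.inv(F_scale(n, k, lam)), F_scale(n, k, 1.0 / lam)))  # True
```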
For the transpose, we have
$$F_{ij}^{\,T} = F_{ji}, \qquad F_k(\lambda)^T = F_k(\lambda), \tag{7.7}$$
and
$$P_{is} \equiv P_{si}.$$
For the elementary row operations with the lower triangular matrix $F'_{si}(\lambda)$, $(i < s)$, we have:
$$F'_{si}(\lambda): \quad \alpha^i \longmapsto \alpha^i + \lambda\,\alpha^s \quad\text{and}\quad \alpha^j \longmapsto \alpha^j \ \text{ for } j \ne i, \tag{7.8}$$
It is well known that the action on matrices $A \in \mathbb{K}^{m \times n}$ and $B \in \mathbb{K}^{n \times n}$ with $\operatorname{rank} B = n$ of a sequence of elementary matrices as given in Remark 7.1 leads to
$$F'_1 F'_2 \cdots F'_{l'}\, A = \begin{bmatrix} \mathbb{1}_r & * \\ 0 & * \end{bmatrix} \tag{7.10}$$
and to
$$F'_1 F'_2 \cdots F'_{k'}\, B = \mathbb{1}_n, \tag{7.11}$$
and
$$B\, F_1 \cdots F_k = \mathbb{1}_n. \tag{7.13}$$
Remark 7.3 Normal form of matrices and elementary matrices. Using the tailor-made bases (see Proposition 3.8 and Theorem 3.1), we obtained for an $m \times n$-matrix $A$ with $\operatorname{rank} A = r$ the normal form $\begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix}$.
The same can be obtained using elementary row and column operations as in Comment 7.1, and in Eqs. (7.10) and (7.12) in Comment 7.2. Hence
$$A = F' \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix} F$$
with
$$F' := F'_1 F'_2 \ldots F'_{l'} \quad\text{and}\quad F := F_1 F_2 \ldots F_l. \tag{7.16}$$
$F'_1, F'_2, \ldots, F'_{l'}$ are elementary matrices from $Gl(m)$; $F_1, F_2, \ldots, F_l$ are elementary matrices from $Gl(n)$.
$$B = F' A, \qquad B = A F.$$
$$(\Delta 1)\quad \Delta(A F_{is}) = \Delta(A), \quad i \ne s,$$
$$(\Delta 2)\quad \Delta\!\left(A F_k(\lambda)\right) = \lambda\,\Delta(A) \quad \forall \lambda \in \mathbb{K}.$$
Axiom $\Delta 1$ means that $\Delta$ is invariant if we add to a given column another column. This is a specific additive property of $\Delta$. Geometrically, this means that $\Delta$ is shear invariant.
Figure 7.1 shows, in two dimensions, the shear invariance of the area of a parallelogram. As is well known, this area is given by Euclidean geometry. Here, it corresponds to a certain determinant function which is usually denoted by $\det$ (see Definition 7.4).
Property .Δ2 means that if we scale a given column by .λ, the same happens to .Δ.
We may call this homogeneity or the scaling property of .Δ.
Using the definition given in Sect. 7.1, we set $B := A F_{is}$ and $B' := A F_k(\lambda)$. It is obvious that $A$ and $B$ are column equivalent, $A \underset{c}{\sim} B$, and similarly $A$ and $B'$ are column equivalent, $A \underset{c}{\sim} B'$.
We may also express the properties .Δ1 and .Δ2 using, as usual, the identification
between a list and the corresponding matrix:
shortly, by
From the axioms .Δ1 and .Δ2, four very important properties follow:
Proof
(i) From Remarks 7.3 and 7.6, for $r = \operatorname{rank} A < n$ (that is, $r \ne n$) and the properties of elementary matrices, we obtain the following sequence of equations with scalars $\lambda_1, \ldots, \lambda_l$ and $\lambda'_1, \ldots, \lambda'_{l'}$:
$$\Delta(A) = \Delta\!\left(F'_1 F'_2 \cdots F'_{l'} \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix} F_1 F_2 \cdots F_l\right),$$
$$\vdots$$
$$\Delta(A) = \Delta\!\left(F'_1 \cdots F'_{l'} \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix}\right)\lambda_1 \cdots \lambda_l,$$
$$\text{and}\quad \Delta(A) = \lambda'_1 \lambda'_2 \cdots \lambda'_{l'}\;\Delta\!\begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix}\;\lambda_1 \cdots \lambda_l.$$
Since for example the last columns are zero, we obtain also .Δ(A) = 0. This
proves (i).
(ii) In this case, we may assume that .r = n, for example .Δ(A) = Δ(F1n ). Anal-
ogously as before, we have .Δ(A) = λ' Δ(1n ) with .λ' /= 0. It is now clear that
if .Δ(1n ) = 0, it follows .Δ = 0 as well. So (ii) is proven.
(iii) In the proof of (ii), we found .Δ(A) = λ' Δ(1n ) with .λ' /= 0, so (iii) is already
proven.
(iv) The result of (iii) means essentially by itself that the dimension of the space of the determinant functions is 1: $\Delta(A) = \lambda\,\Delta(\mathbb{1}_n)$ signifies that finally, for every matrix $A$, it is the value $\Delta(\mathbb{1}_n)$ that counts. Since $\Delta(\mathbb{1}_n) \in \mathbb{K}$, we have:
$$\Delta_n := \{\Delta\} \;\cong\; \Delta(\mathbb{1}_n)\,\mathbb{K} \;\cong\; \mathbb{K},$$
$$\Delta(\mathbb{1}_n) = \lambda_\Delta\, \Delta_0(\mathbb{1}_n). \tag{7.18}$$
The axioms .Δ1, Δ2, and .Δ3 determine uniquely the function .Δ0 ∈ Δn with
.Δ0 (1n ) = 1.
Proof For $\Delta \in \Delta_n$ with $\Delta(\mathbb{1}_n) = 1$, according to Proposition 7.1 (iii) and Eqs. (7.18) and (7.19), we have:
$$\Delta = \lambda_\Delta\, \Delta_0 \quad\text{and}\quad \Delta(\mathbb{1}_n) = \lambda_\Delta\, \Delta_0(\mathbb{1}_n).$$
Since $\Delta(\mathbb{1}_n) \overset{!}{=} 1$ and $\Delta_0(\mathbb{1}_n) = 1$, this gives
$$1 = \lambda_\Delta \cdot 1 \quad\text{and so}\quad \lambda_\Delta = 1.$$
Definition 7.4 We define the standard $\det$ to be $\Delta_0$ and write $\det := \Delta_0$. Then
$$\Delta = \lambda_\Delta \det. \tag{7.20}$$
∎
Having shown the uniqueness of .det, we are going now to show also its existence
inductively with respect to the dimension .n.
Axioms .Δ1, Δ2, and .Δ3 determine uniquely the determinant function .Δ =
det : Kn×n → K.
For low dimensions we have, as is well-known, in an obvious notation the
following results:
$$\Delta 1:\ \ \det_2(a+b,\, b) = \det\begin{bmatrix} \alpha^1+\beta^1 & \beta^1 \\ \alpha^2+\beta^2 & \beta^2 \end{bmatrix} = (\alpha^1+\beta^1)\beta^2 - (\alpha^2+\beta^2)\beta^1 = \alpha^1\beta^2 + \beta^1\beta^2 - \alpha^2\beta^1 - \beta^2\beta^1 = \alpha^1\beta^2 - \alpha^2\beta^1 = \det_2(a, b).$$
$$\Delta 2:\ \ \det_2(\lambda a,\, b) = \det\begin{bmatrix} \lambda\alpha^1 & \beta^1 \\ \lambda\alpha^2 & \beta^2 \end{bmatrix} = \lambda\alpha^1\beta^2 - \lambda\alpha^2\beta^1 = \lambda \det_2(a, b).$$
$$\Delta 3:\ \ \det_2(\mathbb{1}_2) = \det\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1.$$
$$\det_3 A = \alpha^1 \det_2(\bar b, \bar c) - \beta^1 \det_2(\bar a, \bar c) + \gamma^1 \det_2(\bar a, \bar b), \tag{7.21}$$
with
$$[a\ b\ c] := \begin{bmatrix} \alpha^1 & \beta^1 & \gamma^1 \\ \alpha^2 & \beta^2 & \gamma^2 \\ \alpha^3 & \beta^3 & \gamma^3 \end{bmatrix} \quad\text{and}\quad [\bar a\ \bar b\ \bar c] := \begin{bmatrix} \alpha^2 & \beta^2 & \gamma^2 \\ \alpha^3 & \beta^3 & \gamma^3 \end{bmatrix}.$$
We have again to show that the axioms $(\Delta 1)$, $(\Delta 2)$, and $(\Delta 3)$ are valid:
$\Delta 3$: is clear, $\det_3(\mathbb{1}_3) = 1$.
$\Delta 2$: We have, for example, for the second column
$$\det(a, \lambda b, c) = \det\begin{bmatrix} \alpha^1 & \lambda\beta^1 & \gamma^1 \\ \alpha^2 & \lambda\beta^2 & \gamma^2 \\ \alpha^3 & \lambda\beta^3 & \gamma^3 \end{bmatrix},$$
$$\det(a, \lambda b, c) = \alpha^1 \det_2(\lambda\bar b, \bar c) - \lambda\beta^1 \det_2(\bar a, \bar c) + \gamma^1 \det_2(\bar a, \lambda\bar b)$$
$$= \alpha^1 \lambda \det_2(\bar b, \bar c) - \lambda\beta^1 \det_2(\bar a, \bar c) + \gamma^1 \lambda \det_2(\bar a, \bar b) = \lambda \det(a, b, c).$$
$$\det(a+b, b, c) = (\alpha^1+\beta^1)\det_2(\bar b, \bar c) - \beta^1 \det_2(\bar a + \bar b, \bar c) + \gamma^1 \det_2(\bar a + \bar b, \bar b)$$
$$= \alpha^1 \det_2(\bar b, \bar c) + \beta^1 \det_2(\bar b, \bar c) - \beta^1 \det_2(\bar a, \bar c) - \beta^1 \det_2(\bar b, \bar c) + \gamma^1 \det_2(\bar a, \bar b) = \det(a, b, c).$$
$$\det_n(a_1, \ldots, a_n) := \sum_{k=1}^{n} (-1)^{1+k}\, \alpha^1_k\, \det_{n-1}(b_1, \ldots, \check b_k, \ldots, b_n),$$
using
$$(a_1, a_2, \ldots, a_n) = \begin{bmatrix} \alpha^1_1 & \alpha^1_2 & \ldots & \alpha^1_n \\ b_1 & b_2 & \ldots & b_n \end{bmatrix} \quad\text{with } b_1, \ldots, b_n \in \mathbb{K}^{n-1},$$
where $(b_1, \ldots, \check b_i, \ldots, b_n)$ indicates the list $(b_1, \ldots, b_n)$ but omitting $b_i$. We see that $\det_n$ is linear in every column, so $\Delta 2$ is valid since linearity contains the homogeneity $(\Delta 2)$. In order to show that $\Delta 1$ is valid as well, we proceed in an analogous way as in the case of $n = 3$.
$$\det_3 A = \varepsilon_{ijk}\, \alpha^i \beta^j \gamma^k.$$
$\Delta 4$: $\det_3$ is linear in every column, which of course includes the axiom $\Delta 2$.
Remark 7.9 The definitions .(D1, D2) and .(Δ1, Δ2) are equivalent.
Proof
We summarize some of the most important properties of $\det$ below. Most properties follow directly from the defining properties $\Delta 1$ and $\Delta 2$ (or $D1$ and $D2$) and the normalization $\det(\mathbb{1}_n) = 1$ ($\Delta 3$).
Interestingly, we do not have to explicitly use the permutation group .(Sn ) at this
level.
(i) The determinant of a matrix is linear in every column. This is equivalent to
the determinant being a multilinear map on .Kn .
(ii) The determinant remains unchanged if we add a linear combination of some
columns to a different column. This corresponds geometrically, as we saw in
Sect. 7.2 and Fig. 7.1, to the shear transformation invariance of the determi-
nant.
(iii) The determinant is zero if the matrix columns are linearly dependent. This is
closely connected with the next property.
(iv) The determinant changes sign if two columns are interchanged. This means
that the determinant is an alternating multilinear form.
(v) Multiplication law: If .Δ is the normalized determinant (with .Δ(1n ) = 1),
then
.Δ(AB) = Δ(A)Δ(B) if Δ(1n ) = 1.
Proof Let $\Delta_1$ and $\Delta_2$ be determinant functions. From Proposition 7.1 (i), it first follows that
$$\Delta_1(\mathbb{1}_n)\,\Delta_2(A) = \Delta_2(\mathbb{1}_n)\,\Delta_1(A), \tag{7.23}$$
since $\Delta_3(A) := \Delta_1(\mathbb{1}_n)\Delta_2(A) - \Delta_2(\mathbb{1}_n)\Delta_1(A)$ is also a $\det$ function and, with $\Delta_3(\mathbb{1}_n) = \Delta_1(\mathbb{1}_n)\Delta_2(\mathbb{1}_n) - \Delta_2(\mathbb{1}_n)\Delta_1(\mathbb{1}_n) = 0$, we have $\Delta_3 = 0$, so that (7.23) holds.
Setting $\Delta_1(B) := \Delta(AB)$ and $\Delta_2(B) := \Delta(B)$, we have $\Delta_1(\mathbb{1}_n) = \Delta(A)$ and $\Delta_2(\mathbb{1}_n) = \Delta(\mathbb{1}_n) = 1$, and (7.23) gives $\Delta(AB) = \Delta(A)\Delta(B)$.
∎
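A quick numerical check of the multiplication law (added as an illustration, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# det(AB) = det(A) det(B)
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
```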
(vi) Any determinant function $\Delta$ is transposition invariant: $\Delta(A^T) = \Delta(A)$.
Proof This follows from the fact that every invertible matrix $A$ is a product of elementary matrices (see Comments 7.1 and 7.2, and Remark 7.3):
$$A = F_1 F_2 \cdots F_m.$$
∎
(vii) The multilinearity leads to the expression
$$\det(\lambda A) = \lambda^n \det A.$$
(viii) For an upper triangular matrix, the determinant function $\Delta$ is given by the product of the diagonal elements:
$$\Delta\begin{bmatrix} a_1 & & * \\ & \ddots & \\ 0 & & a_n \end{bmatrix} = a_1 a_2 \cdots a_n.$$
Proof Using those elementary operations which leave the determinant functions invariant, we obtain
$$\Delta\begin{bmatrix} a_1 & & * \\ & \ddots & \\ 0 & & a_n \end{bmatrix} = \Delta\begin{bmatrix} a_1 & & 0 \\ & \ddots & \\ 0 & & a_n \end{bmatrix} = a_1 \cdots a_n\, \Delta(\mathbb{1}_n) = a_1 \cdots a_n.$$
∎
(ix) Let $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ be a block matrix with $A \in \mathbb{K}^{r \times s}$, $B \in \mathbb{K}^{r \times (n-s)}$, $C \in \mathbb{K}^{(m-r) \times s}$, and $D \in \mathbb{K}^{(m-r) \times (n-s)}$; then the following holds: $\det\begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix} = \det A \det D$.
Proof If we define $\Delta(A) := \det\begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix}$, which is a det function, then using $\Delta(A) = \Delta(\mathbb{1})\det A$ and $\Delta(\mathbb{1}) = \det D$, we obtain
$$\det\begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix} = \det D \det A = \det A \det D. \qquad \blacksquare$$
(x) Using the same notation as in (ix), the following holds: $\det\begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = \det A \det D$.
(xii) All the properties of the determinant that refer to the columns also hold when
replacing columns with rows.
(xiii) Cofactor expansion concerning the columns (rows).
From a given matrix . A ≡ (as )n ≡ (αis ), we define various matrices with
respect to the fixed position .(i, s).
where the .ith row and the .sth column have been deleted.
If we use elementary matrix operations, we see that the entry $\gamma_{is}$ is given by the expressions
or equivalently by
$$\sum_{k=1}^{n} \alpha^{\#}_{ik}\, \alpha_{ks} = \delta_{is}\,(\det A).$$
Proof The calculation of the components of the matrix $A^{\#} A$ is given by ($i$ fixed)
$$\sum_{k=1}^{n} \alpha^{\#}_{ik}\, \alpha_{ks} = \sum_{k=1}^{n} \det(a_1, \ldots, a_{i-1}, e_k, a_{i+1}, \ldots, a_n)\,\alpha_{ks}$$
$$= \det\Big(a_1, \ldots, a_{i-1}, \sum_k e_k \alpha_{ks}, a_{i+1}, \ldots, a_n\Big)$$
$$= \det(a_1, \ldots, a_{i-1}, a_s, a_{i+1}, \ldots, a_n).$$
If $s \ne i$, we have
$$\det(a_1, \ldots, a_{i-1}, a_s, a_{i+1}, \ldots, a_n) = 0$$
since $\det(\ldots a_s \ldots a_s \ldots) = 0$.
If $s = i$, we have
$$\sum_{k=1}^{n} \alpha^{\#}_{ik}\, \alpha_{ks} = \det(a_1, \ldots, a_{i-1}, a_i, a_{i+1}, \ldots, a_n) = \det A.$$
So we have altogether
$$\sum_{k=1}^{n} \alpha^{\#}_{ik}\, \alpha_{ks} = \delta_{is}\,(\det A). \qquad \blacksquare$$
$$\sum_{k=1}^{n} \alpha_{ik}\, \alpha^{\#}_{ks} = \delta_{is} \det A.$$
$$\det A = \sum_{k=1}^{n} (-1)^{1+k}\, \alpha_{1k}\, \det \check A_{1k}.$$
This is the recursion formula which was used in the proof of the existence of $\det$.
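The recursion formula can be turned directly into (inefficient but instructive) code; the following Python sketch, added here for illustration, expands along the first row and compares the result with NumPy's determinant.

```python
import numpy as np

def det_recursive(A):
    """Cofactor (Laplace) expansion along the first row."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for k in range(n):
        # A with row 0 and column k deleted (the matrix denoted \check{A}_{1k}).
        minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)
        total += (-1) ** k * A[0, k] * det_recursive(minor)
    return total

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
print(np.isclose(det_recursive(M), np.linalg.det(M)))  # True
```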
As we saw so far, the determinant gives essential information about .(n × n)-matrices
and subsequently about linear maps between vector spaces of the same dimension,
for example, about linear operators (endomorphism).
In addition, determinants also have a deep geometric significance. We restrict
ourselves to .R- vector spaces to simplify the explanations and support the intuition.
The determinant of an operator. f : V → V measures how this. f changes the volume
of solids in .V . In addition, since .det f is a scalar with positive and negative values,
it also measures how this . f changes the orientation in .V . The determinant by itself
turns out to be essentially a subtle geometric structure that defines the volume and
the orientation in .V .
It is important to note that this is a new geometric structure on .V called a volume
form. It may be, or rather has to be defined directly on an abstract vector space
(a vector space without a scalar product on it). Despite this, if we already have a
Euclidean vector space .V , this induces a specific volume form on .V . Hence, the
volume form is a weaker geometric structure than a scalar product.
We want to demonstrate these ideas in the simplest nontrivial case. We consider the two-dimensional Euclidean space $\mathbb{R}^2$ with its standard basis $(e_1, e_2)$. It is understood that our discussion is also valid for $\mathbb{R}^3, \mathbb{R}^4, \ldots, \mathbb{R}^n$.
We start with a parallelogram $P(a_1, a_2)$ given by $a_1 = \begin{bmatrix} \alpha^1_1 \\ \alpha^2_1 \end{bmatrix}$ and $a_2 = \begin{bmatrix} \alpha^1_2 \\ \alpha^2_2 \end{bmatrix}$. The area of $P(a_1, a_2)$ is given by the usual formula. For its square, we have
$$\mathrm{volume}_2(a_1, a_2)^2 = (a_1 \mid a_1)(a_2 \mid a_2) - (a_1 \mid a_2)^2$$
and
$$(f a_1, f a_2) = (a_1, a_2)\, F. \tag{7.28}$$
$$\mathrm{volume}_2(f a_1, f a_2)^2 = (f a_1 \mid f a_1)(f a_2 \mid f a_2) - (f a_1 \mid f a_2)^2. \tag{7.29}$$
$$(f a_1 \mid f a_1)(f a_2 \mid f a_2) - (f a_1 \mid f a_2)^2 = (\varphi^1_1 \varphi^2_2 - \varphi^2_1 \varphi^1_2)^2 \left((a_1 \mid a_1)(a_2 \mid a_2) - (a_1 \mid a_2)^2\right). \tag{7.30}$$
This is in fact
$$\mathrm{volume}_2(f a_1, f a_2)^2 = (\det F)^2\, \mathrm{volume}_2(a_1, a_2)^2, \tag{7.31}$$
and with the definition $P := P(a_1, a_2)$ for the parallelogram $(a_1, a_2)$ and $P' := P(f a_1, f a_2)$, Eq. (7.31) may be written as
$$\mathrm{volume}_2(P')^2 = (\det F)^2\, \mathrm{volume}_2(P)^2.$$
$$\mathrm{volume}_2(c_1, c_2)^2 = (\det G)^2\, \mathrm{volume}_2(e_1, e_2)^2 = (\det G)^2\, \mathrm{vol}_2^2. \tag{7.33}$$
The result is that we may define on $\mathbb{R}^2$, and in every two-dimensional vector space $V$, a signed volume which is completely independent of the presence or absence of a scalar product. Slightly more generally, we may write, as an example for $n = 2$, using the notation of Sect. 7.2, the following definition:
$$D: \mathbb{R}^2 \times \mathbb{R}^2 \longrightarrow \mathbb{R}, \quad (a_1, a_2) \longmapsto D(a_1, a_2) := \Delta(A),$$
$$D_\circ(e_1, e_2) = \det(\mathbb{1}_2) = 1.$$
(iv) The interpretation of the signs can be read off from the example
$$D_\circ(e_1, e_2) = \det[e_1\ e_2] = \det\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = +1$$
and
$$D_\circ(e_2, e_1) = \det[e_2\ e_1] = \det\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = -1,$$
or more generally
$$D_\circ(a_1, a_2) = \det[a_1\ a_2] = \det A, \qquad D_\circ(a_2, a_1) = \det[a_2\ a_1] = -\det A.$$
The sign of a given volume form characterizes the “standard” or the “non-
standard” orientation of a basis.
In this case, if .det F = −1, then the basis . B ' = ( f a1 , f a2 ) has a different ori-
entation to the basis . B = (a1 , a2 ). This means that the linear map . F changes the
orientation.
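A short numerical illustration (ours, not from the text): the sign of the determinant decides whether two bases of $\mathbb{R}^2$ are consistently oriented.

```python
import numpy as np

a1, a2 = np.array([1.0, 0.0]), np.array([0.3, 1.2])
B = np.column_stack([a1, a2])           # basis B = (a1, a2)
B_swapped = np.column_stack([a2, a1])   # interchanging the two basis vectors
B_flipped = np.column_stack([-a1, a2])  # reversing the direction of a1

print(np.linalg.det(B) > 0)          # True: standard orientation
print(np.linalg.det(B_swapped) > 0)  # False: opposite orientation
print(np.linalg.det(B_flipped) > 0)  # False: opposite orientation
```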
So far, we actually used the term orientation in a common way. In the next section,
we are going to focus our attention on a more profound discussion of this term.
The last discussion is a good introduction to the notion of orientation on a real vector
space. Orientation is a special structure that can be introduced on an abstract vector
space.
Orientation plays a very important role in both physics and mathematics, and has
a great impact on daily life. The scientists who are confronted with this concept have
a good intuitive understanding of it. Here, we are going to give the precise definition
of it. Section 1.2 and our discussion in Sect. 7.5 will be very helpful.
On a vector space .V , we first consider all the bases. The reason is that bases,
and in particular the relations between them, are the source of additional structures
in an abstract vector space. So we choose a basis . A = (a1 , . . . , an ) and a second
basis . B = (b1 , . . . , bn ). There exist certain relations between them, given by the
determinant of their transition matrix. It is clear that the transition matrix is invertible
and therefore its determinant is nonzero. Here however, we are only interested in
whether this determinant is positive or negative. This determines an equivalence relation on the set of bases $B(V)$ of $V$, which we call orientation.
We say two bases are orientation equivalent if and only if the determinant of their
transition matrix is positive. In this case, the two bases are consistently oriented. As
we learned in Sect. 1.2, this equivalence relation leads to a class decomposition of
the set of bases, and so to a quotient space consisting of subsets of bases.
Furthermore, it is evident that this quotient space consists only of two elements,
only of two subsets of . B(V ): The bases consistently oriented to the chosen basis . A
(with the positive determinant of the corresponding transition matrix), and the bases
having opposite orientations to the chosen basis . A (with the negative determinant of
the corresponding transition matrix).
The next definition summarizes the above considerations.
In this case, we write $B \overset{\sim}{\mathrm{or}} A$ and we say also that $A$ and $B$ are consistently oriented. Note that $A = BT$ means equally well
$$[a_1 \cdots a_n] = [b_1 \cdots b_n]\, T$$
or equivalently
Comment 7.4 $\overset{\sim}{\mathrm{or}}$ is an equivalence relation.
Proof We need to show that $\overset{\sim}{\mathrm{or}}$ is (i) reflexive, (ii) symmetric, and (iii) transitive.
(i) $A \overset{\sim}{\mathrm{or}} A$ since $A = A\,\mathbb{1}_n$, so $\overset{\sim}{\mathrm{or}}$ is reflexive;
(ii) $B \overset{\sim}{\mathrm{or}} A \Leftrightarrow A \overset{\sim}{\mathrm{or}} B$ since, if $A = BT$ with $\det T > 0$, then $B = AT^{-1}$ and $\det T^{-1} > 0$, so $\overset{\sim}{\mathrm{or}}$ is symmetric;
(iii) $A \overset{\sim}{\mathrm{or}} B$ and $B \overset{\sim}{\mathrm{or}} C$ $\Rightarrow$ $A \overset{\sim}{\mathrm{or}} C$ since, if $A = BT$ and $B = CT'$, then, with $\det T > 0$ and $\det T' > 0$, we get $A = BT = CT'T$, with $\det(T'T) = \det T \det T' > 0$, so $\overset{\sim}{\mathrm{or}}$ is transitive. ∎
The subset $Gl^+(n) \subseteq Gl(n)$, defined by $Gl^+(n) := \{T \in Gl(n) : \det T > 0\}$, is a subgroup. ∎
Using this, we can affirm the following for . A, B ∈ B(V ): . A has an equal orientation
with . B if there is some .T ∈ Gl + (n) with . A = BT .
Definition 7.7 The quotient space $B(V)/\overset{\sim}{\mathrm{or}}$.
For a given basis $A = (a_1, \ldots, a_n) \in B(V)$, we call the set of bases given by
$$\mathrm{or}(A) := \{B = (b_1, \ldots, b_n) \in B(V) : B \overset{\sim}{\mathrm{or}} A\}$$
an orientation of $V$. The corresponding quotient space is given by
$$B(V)/\overset{\sim}{\mathrm{or}} := \{\mathrm{or}(A) : A \in B(V)\}.$$
The bases $A$ and $B$ above represent the same equivalence class or coset which we call, as stated, orientation.
It is easy to obtain, for example, $\mathrm{or}(\bar A)$, an opposite orientation to the given $\mathrm{or}(A)$ with $A = (a_1, a_2, \ldots, a_n)$: we take $\mathrm{or}(\bar A)$ with $\bar A = (-a_1, a_2, \ldots, a_n)$ and we observe that $\mathrm{or}(\bar A) \ne \mathrm{or}(A)$, since we may write $\bar A = A\, T'$ with
$$T' = \begin{bmatrix} -1 & 0 & \cdots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix}$$
and $\det T' = -1 < 0$.
Remark 7.12 The cardinality of $B(V)/\overset{\sim}{\mathrm{or}}$ is 2.
From Remark 7.11 and Example 7.1, we can see that we have
$$B(V)/\overset{\sim}{\mathrm{or}} = \{[A], [\bar A]\}.$$
This means, as expected and widely known, that there are only two orientations on a real vector space.
Now, we can also specify what we specifically mean by an oriented vector space.
Comment 7.5 $B(V)/\overset{\sim}{\mathrm{or}}$ as an orbit space.
There are many ways of defining the determinant of a matrix. Previously (see Sect. 7.2), we first defined determinants as special functions on the set of square matrices,
$$\Delta: \mathbb{K}^{n \times n} \to \mathbb{K}, \quad A \mapsto \Delta(A).$$
In order to make this clear, we used the name determinant function. It was very natural to consider the same object also as a function of the $n$ columns $(a_1, \ldots, a_n)$ of a given matrix $A = [a_1 \ldots a_n]$. This is why we may now use the equivalent definition with the letter $D$, just for distinction:
$$D: \mathbb{K}^n \times \ldots \times \mathbb{K}^n \longrightarrow \mathbb{K}, \quad (a_1, \ldots, a_n) \longmapsto D(a_1, \ldots, a_n),$$
$$\Delta_n(\mathbb{K}^n) = \{D \ldots\},$$
and we get
$$\Delta_n = \Delta_n(\mathbb{K}^n) \cong \mathbb{K}.$$
This definition can be used to extend the concept of determinant to the case of an abstract vector space $V$ with $\dim V = n$. In this case, we are talking about multilinear forms on $V$, or $n$-linear forms, or determinant forms on $V$. The space of determinant forms on $V$ is denoted similarly by
$$\Delta_n(V) = \{D \ldots\}.$$
Permutations appear here because determinants are not only multilinear but also alternating. This leads further to the explicit form of the determinant known as the Leibniz formula.
Therefore, it is helpful to summarize some of the relevant properties of the sym-
metric group .(Sn ) and the role of the sign of a given permutation.
Definition 7.9 The sign $\varepsilon_\pi$ of a permutation $\pi = \begin{pmatrix} 1 & 2 & \ldots & n \\ \pi_1 & \pi_2 & \ldots & \pi_n \end{pmatrix} \in S_n$ is $-1$ if the number $\Phi(\pi)$ of pairs in the list $(\pi_1, \ldots, \pi_n)$ with $\frac{\pi_j - \pi_i}{j - i} < 0$ is odd, and the sign is $+1$ otherwise.
That is,
$$\varepsilon_\pi = (-1)^{\Phi(\pi)}. \tag{7.37}$$
Using, without proof, the fact that every permutation is a product of a certain number $t(\pi) \in \mathbb{N}_0$ of transpositions,
$$\pi = \tau_1 \tau_2 \cdots \tau_{t(\pi)},$$
and taking into account that every transposition $\tau$ has negative sign, $\varepsilon_\tau = -1$, we also have
$$\varepsilon_\pi = (-1)^{t(\pi)}. \tag{7.39}$$
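The following Python sketch (an added illustration) computes the sign both by counting the inverted pairs $\Phi(\pi)$ of Definition 7.9 and via a decomposition into transpositions, for a small example.

```python
from itertools import combinations

def sign_by_inversions(perm):
    """epsilon_pi = (-1)^Phi(pi), Phi = number of pairs i < j with perm[i] > perm[j]."""
    phi = sum(1 for i, j in combinations(range(len(perm)), 2) if perm[i] > perm[j])
    return (-1) ** phi

def sign_by_transpositions(perm):
    """Sort the list by swaps; each swap is a transposition contributing a factor -1."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        while perm[i] != i + 1:
            j = perm[i] - 1
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign

pi = (3, 1, 4, 2)
print(sign_by_inversions(pi), sign_by_transpositions(pi))  # both -1
```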
with
$$P_{\pi \circ \sigma} = P_\pi P_\sigma. \tag{7.41}$$
The above homomorphism shows that the sign $\varepsilon$ is also a group homomorphism:
$$\varepsilon: S_n \longrightarrow \mathbb{Z}_2 \cong \{+1, -1\}$$
with
$$\varepsilon_{\pi \circ \sigma} = \varepsilon_\pi\, \varepsilon_\sigma. \tag{7.43}$$
$$D: V^n \longrightarrow \mathbb{K}, \quad (v_1, \ldots, v_n) \longmapsto D(v_1, \ldots, v_n) \in \mathbb{K},$$
$$\Delta_n(V) = \{D \ldots\}.$$
As in Sect. 7.2, we obtain here again the result that the set of $n$-linear alternating forms is a one-dimensional vector space:
$$\Delta_n(V) \cong \mathbb{K}. \tag{7.47}$$
Furthermore, if we use the multilinearity of $D$ and the above relation (v), we obtain the explicit expression for the determinant: the very important Leibniz formula.
$$\det A = \det(a_1, \ldots, a_n), \quad a_i \in \mathbb{K}^n;$$
for $i_1, \ldots, i_n \in I(n)$, we may write
$$\det A = \det(e_{i_1}\alpha^{i_1}_1, e_{i_2}\alpha^{i_2}_2, \ldots, e_{i_n}\alpha^{i_n}_n),$$
$$\det A = \det(e_{i_1}, e_{i_2}, \ldots, e_{i_n})\,\alpha^{i_1}_1 \cdots \alpha^{i_n}_n.$$
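Since $\det(e_{i_1}, \ldots, e_{i_n})$ vanishes whenever two of the indices coincide, and equals the sign $\varepsilon_\pi$ when $(i_1, \ldots, i_n)$ is a permutation $\pi$ of $(1, \ldots, n)$, the last display yields the Leibniz formula; we record it here for convenience (a standard identity, added as a reference):

$$\det A = \sum_{\pi \in S_n} \varepsilon_\pi\, \alpha^{\pi(1)}_1\, \alpha^{\pi(2)}_2 \cdots \alpha^{\pi(n)}_n.$$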
In Eq. (7.47), we may see immediately that every nontrivial determinant form,
also called for good reasons volume form or simply volume, can be used as a basis
of .Δn (V ).
$$\det f := \det F_B.$$
Now we have to show that this definition is well defined, that is, basis-independent, which will justify the above notation $\det f$.
In order to check the basis independence, we choose a second basis $C \in B(V)$ and we expect to show that
$$\det F_C = \det F_B. \tag{7.49}$$
This follows from the commutative diagram given below, in an obvious notation, with $T = T_{CB}$:
$$\begin{array}{ccc} \mathbb{K}^n_C & \xrightarrow{\ F_C\ } & \mathbb{K}^n_C \\ \uparrow{\scriptstyle T} & & \uparrow{\scriptstyle T} \\ \mathbb{K}^n_B & \xrightarrow{\ F_B\ } & \mathbb{K}^n_B \end{array}$$
We therefore obtain from Eq. (7.50), $F_C = T F_B T^{-1}$, taking the determinant and using the rules mentioned in Sect. 7.4, $\det F_C = \det F_B$.
$$f^* D(v_1, \ldots, v_n) := D(f v_1, \ldots, f v_n).$$
$$\det: \mathrm{End}(V) \longrightarrow \mathbb{K}, \quad f \longmapsto \det f \tag{7.51}$$
which means
$$D(f v_1, \ldots, f v_n) = (\det f)\, D(v_1, \ldots, v_n), \tag{7.53}$$
or, equivalently,
$$\det f = \frac{D(f v_1, \ldots, f v_n)}{D(v_1, \ldots, v_n)}. \tag{7.54}$$
This definition reveals the geometric character of the determinant of a given endo-
morphism . f ∈ End(V ) ≡ Hom(V, V ). Equations (7.53) and (7.54) show that .det f
characterizes the scaling of the . f -transformation on any volume . D in .V .
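A numerical illustration of Eq. (7.54), added here (not from the original): for a linear map on $\mathbb{R}^3$, the ratio of the signed volumes spanned by the image vectors and by the original vectors reproduces $\det F$.

```python
import numpy as np

rng = np.random.default_rng(5)
F = rng.standard_normal((3, 3))   # the operator f in the canonical basis
V = rng.standard_normal((3, 3))   # columns: three vectors v1, v2, v3

D_before = np.linalg.det(V)       # D(v1, v2, v3)
D_after = np.linalg.det(F @ V)    # D(f v1, f v2, f v3)
print(np.isclose(D_after / D_before, np.linalg.det(F)))  # True
```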
Summary
The determinant, alongside the identity, the exponential function, and a few others,
is one of the most important maps in mathematics and physics. Therefore, like in
most books on linear algebra, we dedicated an entire chapter to determinants.
Initially, we adopted an algebraic approach for defining and understanding the
properties of determinants. Here, elementary matrices and elementary row operations
were the primary tools. Then, we delved into the geometric aspects of determinants.
In doing so, we introduced and extensively discussed the concept of orientation.
The role of permutations and the permutation group was also introduced in a
suitable manner.
Finally, the concept of determinants was extended to operators, not just matrices,
and we presented the corresponding geometric interpretation.
Elementary matrices induce column and row operations (see Remark 7.2). In
connection with the proof of Theorem 5.1, it follows by construction that for
instance elementary column operations do not affect the column rank of a
matrix. The following Exercise 7.1 shows that they do not affect the row rank
either. The same applies when we replace column by row.
Exercise 7.1 Use the result of Exercise 6.2 to show that elementary column opera-
tions on a given matrix . A ∈ Km×n do not affect the row rank of . A.
We now apply the above Exercise 7.1 to prove once more the following theo-
rem.
For the next six exercises one can find proofs in the literature which generally differ from the ones given in this chapter. The reader is asked to choose our proofs, or to try to find different proofs.
Exercise 7.3 Show that the set of determinant functions on a vector space $V$, or equivalently the set of determinant forms, is a vector space of dimension one.
Exercise 7.5 Use the above Exercise 7.4 to prove the following.
Let $\Delta$ be a determinant function with $\Delta(\mathbb{1}_n) = 1$ and $A, B \in \mathbb{K}^{n \times n}$. Show that
$$\Delta(AB) = \Delta(A)\Delta(B).$$
$$\Delta(A^T) = \Delta(A).$$
then
$$\det\begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix} = \det A \det D.$$
then
$$\det\begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = \det A \det D.$$
with
$$a_i \in \mathbb{R}^2, \quad \varphi^\mu_i \in \mathbb{R}, \quad F = (\varphi^s_i), \quad i, s, \mu \in I(2),$$
so we can write
$$[f(a_1)\ f(a_2)] = [a_1\ a_2]\, F.$$
Prove
$$\mathrm{vol}_2(f a_1, f a_2)^2 = (f a_1 \mid f a_1)(f a_2 \mid f a_2) - (f a_1 \mid f a_2)^2,$$
with $P_{\pi\circ\sigma} = P_\pi P_\sigma$.
Note that if we compare the entries of the matrix $P_\pi$ with the squares of a chessboard, and set a rook on every entry 1, the zeros correspond exactly to the area of activity of the rooks. Therefore we could call $P$ the chess representation of the group $S_n$.
Exercise 7.12 Check that for a matrix $A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} \in \mathbb{R}^{2 \times 2}$ with $\det A \ne 0$,
$$A^{-1} = \frac{1}{\det A}\begin{bmatrix} \delta & -\beta \\ -\gamma & \alpha \end{bmatrix}.$$
In Sect. 3.5, we discussed in very elementary terms an origin of tensors. Yet there are many ways to introduce tensors. Some of them are very abstract. We believe that the most direct way, which is also very appropriate for physics, is to use bases of $V$ and $V^*$. This definition is, of course, basis-dependent, but as we have often seen thus far, this is not a disadvantage. In this chapter, we shall proceed on this path and leave a basis-independent definition for later (see Chap. 14).
Thus far, we have gained considerable experience with bases. We already used
the simplest possible tensors (e.g. scalars, vectors, and covectors) with their indices
on various occasions. Therefore, it is very instructive to summarize first what we
already know from linear algebra about the use of indices and tensors. This leads us
to the following section.
One important application (one may even say the chief application) of bases is that they enable us to represent abstract vectors with coordinates and to do concrete calculations with them. One can say that bases “give indices to vectors” since we label basis vectors in a definite order. It turns out that indices, as they are used in linear algebra, and in particular in this book, are quite helpful since they give additional information about the properties and structures of the mathematical objects they are related to. We accomplish this using the Einstein summation convention and some straightforward conventions we add. In this way, we obtain valuable indices which we call smart indices.
There are two different kinds of indices, corresponding to the vectors in $V$ and the covectors in $V^*$. These two possibilities appear clearly and efficiently when written as upper or lower indices. This fits precisely with the Einstein convention. Regardless of the Einstein convention, using indices up or down is much more functional than
writing them left or right, as is usually done in the mathematical literature and not
seldom in the physics literature.
For the sake of simplicity, we consider a real vector space .V with dimension .n
and its dual .V ∗ , and we are going to summarize our experience with indices in linear
algebra till now. Our primary purpose is to revise some subjects and show typical
examples of how the indices enter the various expressions. Thus, we also see the
positive influence of the chosen conventions on a good understanding of the path to
a given expression.
$$B^* = (\beta^s) = (\beta^1, \ldots, \beta^n) \quad\text{and}\quad C^* = (\gamma^j) = (\gamma^1, \ldots, \gamma^n),$$
with the corresponding coefficients given by $v^r_B, v^i_C, \xi^B_s, \xi^C_j \in \mathbb{R}$. For the change of basis we use a regular matrix
$$T = T_{CB} = (\tau^i_s) \in Gl(n),$$
Note that the indices .r, s correspond to the bases . B and . B ∗ and the indices .i, j
to the bases .C and .C ∗ . We use different kinds of indices for different bases even
when they correspond to the same vector space. This distinction is not usual in
the literature, but it prevents confusion. In connection with this, we point out that
coefficients (components) of vectors have upper indices and coefficients of covectors
have lower indices. However, vectors themselves have lower indices and covectors
themselves upper indices. This is, of course, consistent with the Einstein Convention
and in the usual matrix formalism also leads to the following expressions:
$$\vec v_B = \begin{bmatrix} v^1_B \\ \vdots \\ v^n_B \end{bmatrix}, \quad \underset{\sim}{\xi}^B = [\xi^B_1 \ldots \xi^B_n], \quad \vec v_C = \begin{bmatrix} v^1_C \\ \vdots \\ v^n_C \end{bmatrix}, \quad \underset{\sim}{\xi}^C = [\xi^C_1 \ldots \xi^C_n], \tag{8.6}$$
$$B := [b_1 \cdots b_n], \quad B^* := \begin{bmatrix} \beta^1 \\ \vdots \\ \beta^n \end{bmatrix}, \quad C := [c_1 \cdots c_n], \quad C^* := \begin{bmatrix} \gamma^1 \\ \vdots \\ \gamma^n \end{bmatrix}. \tag{8.8}$$
$$\vec v_B, \vec v_C \in \mathbb{R}^n, \quad \underset{\sim}{\xi}^B, \underset{\sim}{\xi}^C \in (\mathbb{R}^n)^*, \quad v \in V, \ \xi \in V^*. \tag{8.9}$$
$$(\,\cdot\,)^i = \tau^i_s\, (\cdot)^s, \tag{8.10}$$
This also leads to an invariant expression which is very important, not only in physics:
Note that, writing the symbol $\underset{\sim}{\xi}$ to indicate the row with entries $\xi_s \in \mathbb{R}$, $\underset{\sim}{\xi} = [\xi_1 \cdots \xi_n]$, we may also write $\underset{\sim}{\xi} \equiv \underset{\sim}{\xi}^B \equiv \xi^B \in (\mathbb{R}^n)^*$ and $\vec v \equiv \vec v_B \equiv v_B \in \mathbb{R}^n$ too. We would like to reiterate that we mostly use the symbol “$\equiv$” to indicate that we use different notations for the same objects. We admit that the use of $\vec v_B$ and $\underset{\sim}{\xi}^B$ instead of $v_B$ and $\xi^B$ is a pleonasm.
Note that we often identify the list $B = (b_1, b_2, \ldots, b_n)$ with the matrix row ($1 \times n$) with entries given by the vectors $b_1, \ldots, b_n$, which we denote by $[B] := [b_1\ b_2\ \ldots\ b_n]$, and we denote both by $B$. For the cobasis $B^*$, we analogously write the symbol $B^*$ for the column ($n \times 1$ matrix) with entries the covectors $\beta^1, \ldots, \beta^n$, as in Eq. (8.8). We also apply this to $C, D, \ldots$ and $C^*, D^*, \ldots$.
We are now coming to the next important illustration of using smart indices, by
summarizing again our results concerning the representation of linear maps.
We consider the map . f ∈ Hom(V, V ' ) with .dim V ' = m and the basis . B ' =
(bQ' ) = (b1' , · · · , bm' ), .Q ∈ I (m). Suppose . f is given by the equations
$$\begin{array}{ccc} V & \xrightarrow{\ f\ } & V' \\ \downarrow{\scriptstyle\psi_B} & & \downarrow{\scriptstyle\psi_{B'}} \\ \mathbb{R}^n_B & \xrightarrow{\ F\ } & \mathbb{R}^m_{B'} \end{array}$$
We choose our indices: $r$ for $v$, $Q$ for $w$, $\varphi^Q_r$ for $F$, and we write $v^r, w^Q \in \mathbb{R}$. This leads directly to the expression:
$$w^Q_{B'} = \varphi^Q_r\, v^r_B. \tag{8.17}$$
This is, of course, also the result of the direct calculation, which in the matrix formalism takes the well-known form:
$$\vec w_{B'} = F\, \vec v_B, \tag{8.18}$$
or even
$$\vec w = \overrightarrow{f(v)} \quad\text{or}\quad \vec w = F\, \vec v,$$
Our last example is a change of basis for the representation of the map . f .
Here, the change of basis leads us immediately to the correct result with the help of our smart indices, in contrast to the matrix formalism, where this is less straightforward, especially if you are not very familiar with commutative diagrams.
Now, we would like to obtain the representation of $f$ relative to the new bases $C$ and $C'$, as indicated in the following diagram:
$$\begin{array}{ccc} V & \xrightarrow{\ f\ } & V' \\ \downarrow{\scriptstyle\psi_C} & & \downarrow{\scriptstyle\psi_{C'}} \\ \mathbb{R}^n_C & \xrightarrow{\ \tilde F\ } & \mathbb{R}^m_{C'} \end{array}$$
with
$$\tilde F = f_{C'C} = (\tilde\varphi^\alpha_i), \quad \alpha \in I(m). \tag{8.19}$$
We take
$$C' = (c'_\alpha) = (c'_1, \ldots, c'_m)$$
and
$$w = b'_Q\, w^Q_{B'} = c'_\alpha\, w^\alpha_{C'}, \tag{8.20}$$
with
$$T' = T_{C'B'} = (\tau'^\alpha_Q), \qquad w^\alpha_{C'} = \tau'^\alpha_Q\, w^Q_{B'}. \tag{8.21}$$
We have taken into account that the indices $i$ and $\alpha$ correspond to the new bases $C$ and $C'$, and the indices $r$ and $Q$ to the bases $B$ and $B'$. Doing so, the result is
$$\tilde\varphi^\alpha_i = \tau'^\alpha_Q\, \varphi^Q_r\, \bar\tau^r_i.$$
This result was also found in Sect. 5.5, “Linear Maps and Matrix Representa-
tions”.
To justify the path leading from vectors to tensors, both intuitively and correctly, we
need to explain the notion of a free vector space. This will also help to understand
better our approach from vectors to tensors via the explicit use of bases. In addition,
this will show that both tensors and their indices are something quite natural and, in
a way, inevitable.
Starting with an abstract vector space .V , we may also choose a basis that is simply
a list of .n = dim V linearly independent vectors in .V .
We may start now with an arbitrary list of arbitrary objects, that is, elements.
Using a list of arbitrary objects in any given set as a basis, we can formally also
determine a vector space called a free vector space. This brings us to the following
definition:
For a given set $S = \{s(1), \ldots, s(n)\}$ of cardinality $n$, the free $\mathbb{R}$-vector space on $S$ is
$$\mathbb{R}S := \{s_i v^i : s_i := s(i),\ v^i \in \mathbb{R},\ i \in I(n)\}.$$
We can show immediately that .RS is a vector space and we write .V = RS. The
set we started with, . S = {s(1), . . . , s(n)}, is a basis of .V so that .dim V = n and
we may change the notation and write . B = S, taking . B = (b1 , . . . , bn ) with .b1 :=
s(1), . . . , bn := s(n).
Repeating the above construction for a new set $S(2) := \{s(i,j) := b_i b_j,\ i, j \in I(n)\}$, we obtain a new vector space $T^2 V := \mathbb{R}S(2)$ with the basis $B_2 = S(2) = \{b_{ij} := b_i b_j\}$. We call $T^2 V$ a tensor space of rank 2. This is the simplest nontrivial tensor space over $V$. It is also clear that $T^2 V$ is given by
$$T^2 V = \{v^{ij}\, b_i b_j : v^{ij} \in \mathbb{R}\}.$$
This is also called a contravariant tensor of rank $k$ or $k$-tensor. Since every contravariant tensor of rank $k$ can be written as a linear combination of tensor products of vectors, we can also justify the following expression for $T^k V$:
$$T^k V = \underbrace{V \otimes \cdots \otimes V}_{k\text{-times}}.$$
It is evident that the same construction can also be made for $V^*$. So we have $B^* = (\beta^1, \ldots, \beta^n)$ and
$$T^k V^* = \underbrace{V^* \otimes \cdots \otimes V^*}_{k\text{-times}}.$$
The set $\{\beta^{j_1} \otimes \cdots \otimes \beta^{j_k} : j_1, \ldots, j_k \in I(n)\}$ is a basis of $T^k V^*$.
In our construction, there is no restriction or additional property on the set $S$ preventing it from being a basis for a vector space. Hence we can also take the set $\{b_{i_1} \otimes \cdots \otimes b_{i_k} \otimes \beta^{j_1} \otimes \cdots \otimes \beta^{j_l}\}$ as a basis of a vector space which we denote by $T^k_l V$. We thus obtain another tensor space given by
$$T^k_l V = \{v^{i_1 \ldots i_k}{}_{j_1 \ldots j_l}\, b_{i_1} \otimes \cdots \otimes b_{i_k} \otimes \beta^{j_1} \otimes \cdots \otimes \beta^{j_l}\},$$
which we call a mixed tensor space of type $(l, k)$. We can also write
$$T^k_l V = \underbrace{V \otimes \cdots \otimes V}_{k\text{-times}} \otimes \underbrace{V^* \otimes \cdots \otimes V^*}_{l\text{-times}}.$$
Using the notation given in Eqs. (8.1) and (8.2), the change of basis $b_r = c_i \tau^i_r$ leads, for the coefficients of $A$, to the transformation given, as expected, by
$$\alpha^{i_1 \cdots i_k}{}_{j_1 \cdots j_l} = \tau^{i_1}_{r_1} \tau^{i_2}_{r_2} \cdots \tau^{i_k}_{r_k}\; \alpha^{r_1 r_2 \cdots r_k}{}_{s_1 s_2 \cdots s_l}\; \bar\tau^{s_1}_{j_1} \bar\tau^{s_2}_{j_2} \cdots \bar\tau^{s_l}_{j_l}.$$
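A NumPy sketch (ours, not from the text) of this transformation law for a mixed tensor of type $(1,1)$, using `np.einsum`; here `T` plays the role of $(\tau^i_r)$ and `T_inv` the role of $(\bar\tau^s_j)$:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
T = rng.standard_normal((n, n))      # change-of-basis matrix (tau^i_r), assumed invertible
T_inv = np.linalg.inv(T)             # (bar-tau^s_j)

alpha = rng.standard_normal((n, n))  # coefficients alpha^r_s of a (1,1)-tensor in the old basis

# New coefficients: alpha'^i_j = tau^i_r alpha^r_s bar-tau^s_j
alpha_new = np.einsum('ir,rs,sj->ij', T, alpha, T_inv)
print(np.allclose(alpha_new, T @ alpha @ T_inv))  # True: for a (1,1)-tensor this is a similarity transform
```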
Symmetric and antisymmetric tensors are special kinds of tensors. They are essen-
tial for mathematics, especially for differential geometry, and everywhere in physics.
We restrict ourselves here to the covariant tensors since similar considerations apply
to contravariant tensors too. For mixed tensors, a similar approach is not really rele-
vant.
Symmetric tensors are tensors whose coefficients, in any basis, stay unchanged under the interchange of any pair of indices. Antisymmetric (totally antisymmetric) or alternating tensors are tensors whose coefficients change sign under the interchange of any pair of indices.
In physics, we have two prominent examples of symmetric tensors: the metric tensor $g_{\mu\nu}$ and the energy-momentum tensor $T_{\mu\nu}$.
$$\tau_{i_1 \cdots i_a \cdots i_b \cdots i_k} = -\,\tau_{i_1 \cdots i_b \cdots i_a \cdots i_k}.$$
Summary
This chapter concludes the section of the book that we consider elementary linear
algebra.
Here, we summarized and reiterated our notation for linear algebra and tensor
formalism. This notation primarily involves the systematic selection of indices and
their positioning, whether upper or lower, in the entries of the representation matrices
of linear maps.
Several advantages of this notation were mentioned, and multiple examples facil-
itated the reader’s understanding of using these “smart indices”, as we prefer to call
them.
Following that, we presented our second elementary introduction to tensors. This
introduction, like the one in Chap. 3, is dependent on the basis but represents a certain
generalization concerning the dimension and the rank of the tensor space considered
and the corresponding tensor notation.
Chapter 9
The Role of Eigenvalues and Eigenvectors
This is one of the most important topics of linear algebra. In physics, the experimental
results are usually numbers. In quantum mechanics, these numbers correspond to
eigenvalues of observables which we describe with special linear operators in Hilbert
spaces or in finite-dimensional subspaces thereof. Eigenvalues are also relevant for
symmetries in physics. Eigenvalues and eigenvectors help significantly to clarify the
structure of operators (endomorphisms). We recall that an operator in linear algebra
is a linear map of a vector space to itself. We denote the set of operators on .V by
.End(V ) ≡ Hom(V, V ).
In this chapter, after some preliminaries and definitions, we will discuss the role
of eigenvalues and eigenvectors of diagonalizable and nondiagonalizable operators.
For any given operator . f on a .C vector space, we get to a corresponding direct sum
decomposition of .V , using only very elementary notions. This decomposition leads
to the more refined Jordan decomposition of .V . First, we consider the situation on
an abstract vector space without other structures. Later, in Chap. 10, we shall discuss
vector spaces with an inner product structure.
The first step towards understanding a part of the structure of a given operator
f ∈ End(V ), was presented in Chap. 6. After choosing a basis, we obtained a decom-
.
position:
. V = ker f ⊕ coim f.
This also shows the general direction we have to choose to investigate the structure
of any arbitrary operator in .V . We need to find a finer decomposition of .V induced
by the operator . f . We hope to find a list of subspaces,
. V = U1 ⊕ · · · ⊕ Ui ⊕ · · · ⊕ Uω
in such a way that each .Ui is as small as possible, that is, their dimensions are as
small as possible. The restriction of . f to every such subspace must of course be an
endomorphism : . f |U j ∈ End(U j ). In other words, every .U j must be . f invariant.
The subspaces .ker f and .im f are, indeed, both . f invariant subspaces. The most
we can expect in this situation, is for every . f invariant subspace .Ui to be one-
dimensional. Such one-dimensional. f invariant subspaces lead to the specific scalars,
the eigenvalues of . f , and special vectors, the eigenvectors of . f . These are special
characteristics and geometric properties of every operator . f . Furthermore, if, for
example, the operator . f is connected with an observable, as in quantum mechanics,
then the eigenvalues correspond to the results that the experiments produce. These
have to be compared with the theoretical results given from the calculations of such
eigenvalues.
However, to clearly understand what is going on, we must consider the most
general case without the inner product (metric) structure. This is also justified because
eigenvalues and eigenvectors are independent of any isometric structure.
We cannot expect that every operator . f will induce such one-dimensional direct
sum decompositions of .V . The existence of this fundamental property of an operator
. f is connected with the diagonalization problem we shall discuss below. Diagonal-
izable operators are a pleasant special case from the mathematical point of view.
Fortunately, the most physically relevant operators are diagonalizable too. Almost
all diagonalizable operators in physics are so-called normal operators, defined only
on inner product vector spaces (see Sect. 10.5).
One more comment has to be made. The theory of eigenvalues and eigenvectors
differs when considering a complex or real vector space. The formalism within a
complex vector space seems more natural and straightforward than within a real
vector space. It should be clear that both real and complex vector spaces, are equally
relevant and essential in physics.
In what follows, we start with the theory of eigenvalues and eigenvectors on a
.K—vector space for .K ∈ {C, R} and we may think in most cases of a .C—vector
space and only when there is a difference to the .R—vector space formulation, we
shall comment on that appropriately.
Since the formalism with complex vector spaces is easier to deal with, we generally think of complex vector spaces. If there is a difference, we bear this difference in mind when we restrict ourselves to the real vector space framework.
We are led to the notion of eigenvalues and eigenvectors if we think of the smallest
possible nontrivial subspace.U of.V . Then.U is of course a one-dimensional subspace
and is determined by a nonzero vector .u (i.e., .u /= 0), so we have .U = span(u) =
{αu : α ∈ K}. We would like .U to also be consistent with the operator . f . That is, we
would like .U to also be . f - invariant,
. f (U ) < U.
It is important to realize from the beginning that it is the eigenelement, the pair
(.λ, v), which is uniquely defined. If the eigenvalue .λ is given, there are always many
eigenvectors belonging to this .λ: all the nonzero .u ∈ U = span(u) = {αu : α ∈ K}.
Furthermore, it is also possible that some other vector .w ∈ V \U exists which fulfills
the same eigenvalue equations as above.
. f (w) = λw.
or equivalently
. E(λ, f ) = ker( f − λidV ).
In other words, $E(\lambda, f)$ is the set of all eigenvectors corresponding to $\lambda$, with the inclusion of the vector $0$ in order for $E(\lambda, f)$ to be a vector space.
9.3 Examples
Eigenvectors: all $v \in V \setminus \{0\}$. Eigenvalue: $\lambda = 1$, because $f v = \mathrm{id}_V\, v = 1 \cdot v$.
Eigenvectors: $\ker f \setminus \{0\}$. Eigenvalue: $\lambda = 0$, because $f v = 0 \cdot v$. Hence, the eigenspace $E(\lambda = 0, f) = \ker f$.
$$f^2 = f. \tag{9.1}$$
Eigenvectors: $\operatorname{im} f \setminus \{0\}$ with eigenvalue $\lambda_1 = 1$; $\ker f \setminus \{0\}$ with eigenvalue $\lambda_2 = 0$.
$$V = E(1, f) \oplus E(0, f) = \operatorname{im} f \oplus \ker f.$$
$$f v = \lambda v. \tag{9.2}$$
$$f^2 v = \lambda^2 v \ \Rightarrow\ \lambda v = \lambda^2 v. \tag{9.4}$$
$$V = \ker f \oplus \operatorname{im} f. \tag{9.5}$$
$$\mathrm{id} = f + (\mathrm{id} - f). \tag{9.6}$$
For $\lambda = 0$, we have $f v = 0 \cdot v$ and so $E(\lambda = 0) \equiv E(0) = \ker f$.
For $\lambda = 1$, we have $f v = v$ and so $E(\lambda = 1) \equiv E(1) = \operatorname{im} f$.
$$V = \operatorname{im} P_0 \oplus \operatorname{im} P_1, \tag{9.8}$$
$$f = 0 \cdot P_0 + 1 \cdot P_1. \tag{9.10}$$
$$f^2 = \mathrm{id}. \tag{9.11}$$
Eigenvectors: $\operatorname{im} P \setminus \{0\}$ with eigenvalue $\lambda_1 = 1$; $\ker P \setminus \{0\}$ with eigenvalue $\lambda_2 = -1$.
$$V = E(1, f) \oplus E(-1, f).$$
$$f v = \lambda v \quad\text{and}\quad f^2 v = \lambda^2 v,$$
$$P := \tfrac{1}{2}(\mathrm{id} + f)$$
$$P v = \tfrac{1}{2}(\mathrm{id} + f)v = \tfrac{1}{2}v + \tfrac{1}{2}f v = \tfrac{1}{2}v + \tfrac{1}{2}v = v,$$
hence $v \in \operatorname{im} P$.
Let $v \in \operatorname{im} P$; then we have $P v = v$ and we obtain:
$$P v = v \ \Leftrightarrow\ \tfrac{1}{2}(\mathrm{id} + f)v = v \ \Leftrightarrow\ \tfrac{1}{2}v + \tfrac{1}{2}f v - v = 0 \ \Leftrightarrow\ \tfrac{1}{2}f v - \tfrac{1}{2}v = 0 \ \Leftrightarrow\ f v = v.$$
$$E(1) = \operatorname{im} P. \tag{9.12}$$
∎
Similarly, one can show that $E(-1, f) = \ker P$.
Using the procedure of Example 9.3, we see that we can write $\ker P = \operatorname{im}(\mathrm{id} - P)$ and we get $E(-1) = \operatorname{im}(\mathrm{id} - P)$, such that we have the direct sum decomposition of $V$:
$$V = E(1, f) \oplus E(-1, f) = \operatorname{im} P \oplus \operatorname{im}(\mathrm{id} - P).$$
Using Eq. (9.12) and the above results, we define $P_1 := P$ and $P_{-1} := \mathrm{id} - P$. The spectral decomposition of $f$ is given by
$$f = 1 \cdot P_1 + (-1) P_{-1}. \tag{9.14}$$
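A concrete NumPy check of this construction (added illustration): for the involution $f = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, the maps $P_1 = \tfrac{1}{2}(\mathrm{id}+f)$ and $P_{-1} = \mathrm{id} - P_1$ are projections and $f = P_1 - P_{-1}$.

```python
import numpy as np

f = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # an involution: f @ f = id
I = np.eye(2)

P1 = 0.5 * (I + f)           # projection onto E(1)
Pm1 = I - P1                 # projection onto E(-1)

print(np.allclose(f @ f, I))                                   # True
print(np.allclose(P1 @ P1, P1), np.allclose(Pm1 @ Pm1, Pm1))   # True True
print(np.allclose(f, 1 * P1 + (-1) * Pm1))                     # True
```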
$$f b_m = 0.$$
Thus $\lambda = 0$ is the only eigenvalue of $f$ and $E(0) = \ker f$ (see also Definition 9.12, Lemma 9.4, and Proposition 9.6). In this example, there is no basis of eigenvectors and so $f$ is not diagonalizable. As we shall see later, in Theorem 9.4, the fact that $f$ is a nilpotent operator is not an accident; it is at the heart of non-diagonalizability.
These examples show that the terms eigenvalue, eigenvector, and eigenspace are
very natural ingredients of operators. This applies regardless of how they can be
specifically determined.
Example 9.6 $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$
$$P_1 := A \quad\text{and}\quad P_0 := \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$
Example 9.7 $A = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$
We see explicitly that $v_1 \perp v_0$. So we get $E(1) = \mathbb{R}v_1 = \operatorname{im} A$ and $E(0) = \mathbb{R}v_0 = \ker A$.
(Figure: in $\mathbb{R}^2$, a vector $\vec\xi$ and its image $A\vec\xi$ on the line $\mathbb{R}v_1 = E(1)$; the line $\mathbb{R}v_0 = E(0)$ is also shown.)
Example 9.8 $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$
So we have explicitly:
$$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \end{bmatrix} = (-1)\begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
There are two eigenvalues, $\lambda_1 = 1$ and $\lambda_2 = -1$, with eigenvectors $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $v_{-1} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, and $E(1) = \mathbb{R}_1$, $E(-1) = \mathbb{R}_2$. For the corresponding projections, we get
$$P_1: \mathbb{R}^2 \longrightarrow E(1), \qquad P_{-1}: \mathbb{R}^2 \longrightarrow E(-1).$$
$$A = 1 \cdot P_1 + (-1) P_{-1}$$
$$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = 1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + (-1)\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$
(Figure: in $\mathbb{R}^2$, a vector $\vec\xi$ and its image $A\vec\xi$, mirrored at the $\mathbb{R}_1$ axis.)
cos ϕ sin ϕ
Example 9.9 . A = sin ϕ − cos ϕ
. A = 1 P1 + (−1)P−1 .
Example 9.10 $A = \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}$
This is a rotation by the angle $\varphi$. Unless $\varphi \in \pi\mathbb{Z}$, there is no proper nontrivial subspace of $\mathbb{R}^2$ that stays $A$-invariant, and thus there are neither (real) eigenvalues nor eigenvectors. Observe that this behavior occurs only because we are working over $\mathbb{R}$; the situation is quite different over $\mathbb{C}$!
Example 9.11 $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$
$A$ is a nilpotent matrix:
$$A^2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad A^3 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
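A NumPy illustration (added here) of Examples 9.10 and 9.11: a real rotation by $\varphi \notin \pi\mathbb{Z}$ has no real eigenvalues (over $\mathbb{C}$ they are $e^{\pm i\varphi}$), and the nilpotent matrix has only the eigenvalue 0.

```python
import numpy as np

phi = 0.7
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])   # rotation by phi (Example 9.10)
N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])               # nilpotent matrix (Example 9.11)

eig_R = np.linalg.eigvals(R)
print(eig_R)                                               # complex pair cos(phi) +/- i sin(phi)
print(np.allclose(sorted(eig_R.imag), sorted([-np.sin(phi), np.sin(phi)])))  # True: not real

print(np.allclose(np.linalg.eigvals(N), 0.0, atol=1e-6))   # True: lambda = 0 is the only eigenvalue
```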
In what follows, we consider . f fixed and write . E λ ≡ E(λ, f ). As we see, the restric-
tion of . f to . E λ < V, f | Eλ acts only as a multiplication by .λ, the simplest nontrivial
action an operator can do. It is interesting to notice that for .λ1 /= λ2 , it follows that
. E λ1 ∩ E λ2 = {0} so that . E λ1 and . E λ2 are linearly independent (see Definition 3.12).
This means that the eigenvectors corresponding to distinct eigenvalues are linearly
independent. This is shown in the following proposition.
Proof Suppose that the eigenvectors $v_1, \ldots, v_r$ are linearly dependent. This will lead to the contradiction that one of them must be zero. (By definition, every eigenvector is nonzero.)
$$v_k = \alpha^\mu v_\mu, \quad \mu \in I(k-1),\ \alpha^\mu \in \mathbb{K}. \tag{9.16}$$
Acting by $f$ on (9.16) and comparing with $\lambda_k$ times (9.16),
$$\lambda_k v_k = \lambda_k \alpha^\mu v_\mu = \alpha^\mu \lambda_k v_\mu, \tag{9.19}$$
we obtain
$$0 = \alpha^\mu (\lambda_\mu - \lambda_k)\, v_\mu. \tag{9.20}$$
Since $(\lambda_\mu - \lambda_k) \ne 0$ and the $(v_1, \ldots, v_{k-1})$ are linearly independent, it follows that $\alpha^\mu = 0$ for all $\mu \in I(k-1)$. Equation (9.16) then shows that $v_k = 0$; this is in contradiction to the fact that $v_k$ is an eigenvector and therefore nonzero. This completes the proof, and the list $(v_1, \ldots, v_r)$ is linearly independent. ∎
From the above proposition, we can directly deduce the following corollary.
$$E_{\lambda_1} + \cdots + E_{\lambda_r} = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_r};$$
$$\sum_{i=1}^{r} \dim E_{\lambda_i} \le \dim V.$$
since $E_{\lambda_i} \le V$, $W \le V$, and the above sum is direct, which means that the subspaces $(E_{\lambda_1}, \ldots, E_{\lambda_r})$ are linearly independent. This leads directly to the result:
So we have:
$$f v = \lambda v \ \Leftrightarrow\ F \vec v_B = \lambda \vec v_B. \tag{9.22}$$
$$\phi_B \circ f = F \circ \phi_B. \tag{9.23}$$
This means that by using matrices, we can obtain everything we want to know
about operators. In particular, the eigenvalues of an operator . f are given by the
eigenvalues of the corresponding representation . f B ≡ F.
How do we find the eigenvalues of a matrix? To answer this, we first notice that
.λ is an eigenvalue of . F if and only if the equation
$$(F - \lambda \mathbb{1}_n)\, \vec v = 0 \tag{9.27}$$
$$\chi_F(\lambda) = 0.$$
The equation $\det(x\mathbb{1} - F) = 0$ holds if and only if the matrix $x\mathbb{1} - F$ is not invertible, which is equivalent to the statement:
$$\ker(x\mathbb{1} - F) \ne \{0\}.$$
. det[x1 − F] = 0.
. χ f (x) := χ f B (x).
This definition is well defined since it is independent of the chosen basis . B. The
following lemma shows this.
f_B = φ_B ∘ f ∘ φ_B^{-1}   and   f_C = φ_C ∘ f ∘ φ_C^{-1}.
Hence
f_C = φ_C ∘ f ∘ φ_C^{-1} = φ_C ∘ (φ_B^{-1} ∘ f_B ∘ φ_B) ∘ φ_C^{-1},
so with T := φ_C ∘ φ_B^{-1},
f_C = T ∘ f_B ∘ T^{-1}.
This proves also that .det f := det f B is well defined and that the determinant of
the operator .xidV − f is the characteristic polynomial of the operator . f :
χ_f(x) = det(x id_V - f).

As expected, there is a relation between the geometric multiplicity n_λ = dim E_λ and the algebraic multiplicity m_λ, so

χ_f(x) = (x - λ)^{n_λ + m'_λ} Q_2(x),   (m'_λ ≥ 0).
Example 9.12  F = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}.
The eigenvalue equation det[x1 - F] = 0 for the above F is given by
det\begin{pmatrix} x-1 & -2 \\ -2 & x-4 \end{pmatrix} = 0. This leads to

(x - 1)(x - 4) - 4 = 0  ⇔  x² - 5x + 4 - 4 = 0  ⇔  x² - 5x = 0.

So the eigenvalues of F are λ_1 = 0 and λ_2 = 5. One can calculate an eigenvector corresponding to λ_1 from the matrix equation

\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} ξ^1 \\ ξ^2 \end{pmatrix} = 0   or by the system   ξ^1 + 2ξ^2 = 0,  2ξ^1 + 4ξ^2 = 0.

The solution gives the eigenvector v_0 = \begin{pmatrix} 2 \\ -1 \end{pmatrix} and the eigenspace E(0) = ker F = Rv_0. Similarly, for the eigenvalue λ_2 = 5 we have

\begin{pmatrix} 1-5 & 2 \\ 2 & 4-5 \end{pmatrix}\begin{pmatrix} ξ^1 \\ ξ^2 \end{pmatrix} = 0  ⇔  \begin{pmatrix} -4 & 2 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} ξ^1 \\ ξ^2 \end{pmatrix} = 0,

that is, 2ξ^1 - ξ^2 = 0, with a solution ξ^2 = 2ξ^1. An eigenvector is v_5 = \begin{pmatrix} 1 \\ 2 \end{pmatrix} and E(5) = Rv_5.
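For readers who like to experiment, this example can be checked numerically. The following is a minimal sketch in Python (assuming NumPy is available; the variable names are illustrative):

    import numpy as np

    F = np.array([[1.0, 2.0],
                  [2.0, 4.0]])

    # Eigenvalues of F should be 0 and 5 (possibly returned in a different order);
    # the columns of V are corresponding eigenvectors.
    eigenvalues, V = np.linalg.eig(F)
    print(eigenvalues)

    # Check F v = lambda v for each eigenpair.
    for lam, v in zip(eigenvalues, V.T):
        assert np.allclose(F @ v, lam * v)

Note that the eigenvector returned for λ = 0 is a scalar multiple of (2, -1), as computed above.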
Example 9.13  F = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.

Example 9.14  F = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.

χ_F(x) = det\begin{pmatrix} x-\tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{2} & x-\tfrac{1}{2} \end{pmatrix} = (x - \tfrac{1}{2})(x - \tfrac{1}{2}) - \tfrac{1}{4} = x² - \tfrac{1}{2}x - \tfrac{1}{2}x + \tfrac{1}{4} - \tfrac{1}{4} = x² - x = x(x - 1).

This leads to x(x - 1) = 0 and to the eigenvalues of F, λ_1 = 0 and λ_2 = 1.
Example 9.15  F = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
F is an involution: F² = 1.
The characteristic polynomial is given by
χ_F(x) = det\begin{pmatrix} x & -1 \\ -1 & x \end{pmatrix} = x² - 1.
Example 9.16  F = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
Here the eigenvalue equation reads x³ = 0. The only solution is x = 0 and we have only one eigenvalue λ = 0, as expected.
For a linear map . f ∈ Hom(V, V ' ), the problem of diagonalization was solved in
Sect. 3.3 and in Theorem 3.1. To accomplish this, we simply had to choose two
tailor-made bases . B0 and . B0' in .V and .V ' . For an operator . f ∈ End(V ), the situation
is very different and much more difficult. Here, it is natural to look for only one
tailor-made basis . B0 (one vector space, one basis) and we expect (or hope) to obtain
a diagonal matrix
f_{B_0 B_0} ≡ f_{B_0} = diag(λ_1, ..., λ_s, ..., λ_n)   with λ_s ∈ K.
However, we cannot expect to find a diagonal representation for every operator. This
leads to the diagonalizability question and to the following equivalent definitions.
Suppose the basis B_0 is given by the list (v_1, ..., v_n). The values f v_s of the basis vectors v_s are given as usual by the expression f v_s = v_i (f_{B_0})^i{}_s = λ_s v_s.
This shows that the basis vectors of . B0 are eigenvectors of the map . f . Hence a
tailor-made basis . B0 is an eigenvector basis of . f . This is Definition 9.8.
Conversely, if Definition 9.8 holds,. f has a basis of eigenvectors.C = (c1 , . . . , cn )
and then we have . f cs = λs cs . This means . f C = (λs δsi ), and we see immediately that
f_C = diag(λ_1, ..., λ_n).
Proof From Proposition 9.1 we know that if the eigenvalues λ_1, ..., λ_n are distinct, then the list of the n corresponding eigenvectors v_1, ..., v_n is linearly independent. Since dim V = n, it follows that (v_1, ..., v_n) is a basis of V. This basis consists of eigenvectors, so by Proposition 9.3, f is diagonalizable. ∎
. V = U1 ⊕ · · · ⊕ U j ⊕ · · · ⊕ Ur (9.31)
. f j : U j −→ U j so that
. f = f 1 ⊕ · · · ⊕ f j ⊕ · · · ⊕ fr . (9.32)
. P j : V −→ U j .
The following proposition summarizes the two aspects: the direct sum decomposition of V and the complete orthogonal system of projections.
. V = U1 ⊕ · · · ⊕ Ur
with .U j = im P j .
Proof (i)
The above direct sum allows us to write, for v ∈ V and u_j ∈ U_j, j ∈ I(r):
v = u 1 + · · · + u j + · · · + ur .
.
If we define projectors . P j :
. P j : V −→ U j
v |−→ P j v := u j .
We need to show that .(P1 , . . . , Pr ) is a direct decomposition of identity, that is, that
– . P j is linear;
– . Pi P j = δi j ;
– . P1 + . . . + Pr = id.
(See Exercise 9.4).
Conditions (i) and (ii) of Definition 9.9 hold and so. P1 , . . . , Pr is the corresponding
abstract decomposition. ∎
Proof (ii)
We have to show that for v ∈ V there is a unique decomposition
v = u_1 + ··· + u_j + ··· + u_r.
Existence follows from
v = id_V v = (P_1 + ··· + P_j + ··· + P_r) v = P_1 v + ··· + P_r v = u_1 + ··· + u_r.
For uniqueness, suppose that
y_1 + ··· + y_j + ··· + y_r = 0   with y_j ∈ U_j = im P_j,  say y_j = P_j x_j.
Applying P_j and using P_i P_j = δ_{ij} P_j, we obtain
0 = P_j(y_1 + ··· + y_r) = P_j y_j = P_j P_j x_j = P_j² x_j = P_j x_j = y_j
for all j ∈ I(r). Therefore, the decomposition is unique so (ii) and with it Proposition 9.5 is proven. ∎
We are now in the position to give a geometric characterization of
diagonalizability.
. V = E λ1 ⊕ · · · ⊕ E λr .
(b) . P1 + · · · + Pr = idV ;
(c) . f = λ1 P1 + · · · + λr Pr .
According to Definition 9.9, the properties (a) and (b) state that the list.(P1 , . . . , Pr ) is
a direct decomposition of the identity. Assertion (iv) states that there exists a spectral
decomposition of .V induced by the operator . f .
B_0 = (B_1, B_2, ..., B_r)        (9.33)
with B_j a basis of E_{λ_j}, B_j := (b_1^{(j)}, ..., b_{n_j}^{(j)}), j ∈ I(r), and
n_j = dim E_{λ_j}.        (9.34)
Since V = span(B_1, ..., B_r),
n = dim V = n_1 + ··· + n_r.        (9.35)
From
χ_f(x) = (x - λ_1)^{m_1} ··· (x - λ_r)^{m_r},        (9.36)
we also have the equation
n = m_1 + ··· + m_r.        (9.37)
E_{λ_1} + E_{λ_2} = E_{λ_1} ⊕ E_{λ_2}.

f - λ_j id = (λ_1 P_1 + ··· + λ_j P_j + ··· + λ_r P_r) - (λ_j P_1 + ··· + λ_j P_r)
           = (λ_1 - λ_j)P_1 + ··· + (λ_{j-1} - λ_j)P_{j-1} + (λ_{j+1} - λ_j)P_{j+1} + ··· + (λ_r - λ_j)P_r,

0 = (f - λ_j id)(x),

E_{λ_j} = U_j,

V = E_{λ_1} ⊕ ··· ⊕ E_{λ_r},

and we have (iii). Thus, (iii) and (iv) are equivalent. So Theorem 9.1 is proven. ∎
For this theorem, an interesting and clarifying conclusion is given by the following
statement.
Proof We use the notation of Theorem 9.1 and the proof there. The proof is straightforward since we may write α = (j, μ) and V_α = V_{j,μ} := span(b_μ^{(j)}), μ ∈ I(n_j). ∎
. F = P D P −1 ⇔ D = P −1 F P ⇔ F P = P D. (9.38)
The relation F P = P D says that the columns of P are eigenvectors of F and the diagonal entries of D are the corresponding eigenvalues of F. In other words, the matrix P, considered as a list of n columns (vectors in K^n), is an eigenbasis of F. We can also see it as follows: for
P := (c_1, ..., c_n)   and   D := diag(λ_1, ..., λ_n),
the relation F P = P D reads column by column F c_s = λ_s c_s, s ∈ I(n). Grouping equal eigenvalues together, we may write
D = diag(λ_1, ..., λ_1, λ_2, ..., λ_2, ..., λ_r, ..., λ_r).        (9.42)
After having discussed the question of diagonalizability, we may now ask what it
is suitable for.
There are a lot of reasons. In physics, we sometimes call it the decoupling proce-
dure, which shows an important aspect. In mathematical terms, when we are looking
for the most straightforward possible representation of an operator, we have to choose
an appropriate basis which we may, in this case, also call a tailor-made basis. It turns
out that this basis consists of eigenvectors. So the diagonalization reveals the true
face of an operator and its geometric properties. In the case where . f corresponds
to a physical observable, the eigenvalues of . f are exactly the physical values of the
experiment. In addition, diagonalization allows a considerable simplification in the
calculations.
At this point, it is instructive to compare the situation between .Hom(V, V ' ) and
'
.End(V ). In the case of .Hom(V, V ), the diagonalization is relatively straightforward
to obtain. As we saw in Sect. 3.3, Theorem 3.1, and Remark 3.3, we have at our
disposal two bases, one in .V and the other one in .V ' . In the case of .End V , it seems
natural to work with one basis since the domain and the codomain are identical,
and the problem is far more complex. This leads to the question of normal form for
endomorphisms. This problem was firstly discussed in Remark 3.4 (On the normal
form of endomorphisms) and Proposition 3.12 where we showed some obstacles to
using diagonalization.
But there is a chance! The space End(V) has much more structure than the space Hom(V, V'): End(V) is an algebra, so an operator f can be raised to powers and combined into linear combinations of powers of f, in contrast to linear maps in Hom(V, V').
We have
f^k := f ∘ ··· ∘ f   (k times)  ∈ End(V),
with f^0 = id. This also means that we may talk about linear combinations of powers of f.
ϕ(x) = α_m x^m + ··· + α_2 x² + α_1 x + α_0.        (9.44)

f ↦ ϕ(f) := α_m f^m + ··· + α_2 f² + α_1 f + α_0 f^0.        (9.45)

This means that for the exploration of the subtle properties of f we may use any of the operators ϕ(f) in (9.45) with ϕ ∈ K[x]. This gives us additional information about the one operator f! In this spirit, we already took advantage of this when we used the linear polynomials l_{λ_j} ∈ K[x] given by

l_{λ_j}(x) := x - λ_j,
Now, if we continue along this path, we come to very interesting insights about the operator f. We consider for example the polynomial p_m(x) = x^m and write f^m := p_m(f). So if we have to calculate a power of F ∈ K^{n×n}, we may use the diagonalization, as in Eq. (9.38), and we get
F = P D P^{-1},
and we have
F² = F F = P D P^{-1} P D P^{-1} = P D² P^{-1},
and similarly
F^m = P D^m P^{-1}.
Since D^m is computed entry by entry on the diagonal, this is much simpler than computing the power of F directly.
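A minimal numerical sketch of this decoupling effect, using the matrix of Example 9.12 (our own illustration in Python, assuming NumPy):

    import numpy as np

    F = np.array([[1.0, 2.0],
                  [2.0, 4.0]])          # the diagonalizable matrix of Example 9.12
    m = 10

    # Diagonalization F = P D P^{-1}: the columns of P are eigenvectors of F.
    eigenvalues, P = np.linalg.eig(F)
    D_m = np.diag(eigenvalues ** m)      # D^m is computed entrywise on the diagonal
    F_m = P @ D_m @ np.linalg.inv(P)

    # Same result as multiplying F by itself m times.
    assert np.allclose(F_m, np.linalg.matrix_power(F, m))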
Example 9.17 Example of non-diagonalizability: A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
At first, we consider this matrix as real and then as complex. The matrix A leads to the following map:
A : R² → R²,  (e_1, e_2) ↦ (e_2, -e_1).
(Figure: rotation by ϕ = π/2 in R², mapping u to u' = A u and the axis R_1 = e_1R to R_2 = e_2R.)
The eigenvalues of A are the solutions of the equation
χ_A(x) = 0,        (9.48)
with
χ_A(x) = det(x1 - A) = x² + 1.        (9.49)
Since
χ_A(x) > 0   for all x ∈ R,        (9.50)
there are no real eigenvalues. Considered as a complex matrix, A gives the map
A : C² → C²,  (e_1, e_2) ↦ (e_2, -e_1).
The eigenvalues of . A are now determined by Eqs. (9.48) and (9.49). Hence we
obtain the eigenvalues .λ1 = −i and .λ2 = i. The corresponding eigenvectors
are given by:
b_1 = \begin{pmatrix} 1 \\ i \end{pmatrix}   and   b_2 = \begin{pmatrix} 1 \\ -i \end{pmatrix}.
Thus A b_1 = -i b_1 and A b_2 = i b_2. The eigenbasis of A is given by:
B = [b_1  b_2] = \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}.
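The difference between the real and the complex point of view is easy to observe numerically. A small sketch in Python (assuming NumPy; the names are illustrative):

    import numpy as np

    A = np.array([[0.0, -1.0],
                  [1.0,  0.0]])          # rotation by pi/2; no real eigenvalues

    # Over C, the eigenvalues are -i and i (up to ordering) with eigenvectors
    # proportional to (1, i) and (1, -i).
    eigenvalues, B = np.linalg.eig(A)
    print(eigenvalues)

    b = np.array([1.0, 1.0j])
    assert np.allclose(A @ b, -1.0j * b)  # A b1 = -i b1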
Example 9.18 Example of non-diagonalizability: C = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
For this matrix, however, we see that even in a complex vector space we have:
C : C² → C²,  (e_1, e_2) ↦ (0, e_1).
There is no way to make the matrix C better, that is, a diagonal one. Here, the eigenvalue equation
χ_C(x) = det\begin{pmatrix} x & -1 \\ 0 & x \end{pmatrix} = x² = 0        (9.51)
yields only the eigenvalue λ = 0, whose eigenspace E(0) = ker C = Ce_1 is one-dimensional, so there is no eigenbasis.
Especially, as we shall see, the action of nonlinear polynomials turns out to be very helpful again. To proceed, we have to restrict ourselves to complex vector spaces or, more generally, to operators on K-vector spaces whose characteristic polynomials decompose into linear factors, because here the theory is much easier. In addition, this is also the first and most significant step towards the theory of operators on real vector spaces.
Our aim in this section is not to develop the whole theory but to give a helpful idea of what we may expect if the operator f is nondiagonalizable.
The theory of diagonalizability leads us to the conclusion that, for a nondiagonalizable operator, the direct sum of the eigenspaces is not sufficient to decompose the whole vector space V. In this case, we obtain the strict inequality:

⊕_{j=1}^{r} E_{λ_j} < V.        (9.53)
As we see, at the number m a certain saturation occurs: from there on, equality holds forever.
Proof In the proof of Proposition 3.12, we saw that the inequality ker g ≤ ker g² holds. It is an easy exercise (see Exercise 9.5) that for every power k ∈ N: ker g^k ≤ ker g^{k+1}. Thus we have the following sequence of inclusions:

ker g ≤ ker g² ≤ ··· ≤ ker g^{m-1} ≤ ker g^m = ker g^{m+1} = ker g^{m+2} = ··· .
Conversely, if
x ∈ ker g^{k+1} \ ker g^k,
then g x ∈ ker g^k \ ker g^{k-1},
since g^k(g x) = g^{k+1} x = 0 while g^{k-1}(g x) = g^k x ≠ 0.
Hence, if ker g^{k+1} = ker g^k, we see through induction that ker g^N = ker g^k for all N > k. Then there are two possibilities. Either there is some m ∈ N such that ker g^m = ker g^{m+1}, and the chain stabilizes from m on, or we would obtain an infinite chain of vector spaces of ever increasing dimension. But since the dimensions are bounded above by dim V = n (as ker g^k ≤ V for all k ∈ N), this is not possible. ∎
and
f w = λw + w' ∉ E_λ.

(f - λ id_V)^N v = 0.        (9.54)

Thus we see that W_λ is the set of all generalized eigenvectors of f with respect to λ, the vector 0 included. Additionally, we see that the eigenspace E_λ is contained in W_λ:

E_λ ≤ W_λ.
. f : Wλ j −→ Wλ j ∀ j ∈ I (r ).
. h m = 0 with m ∈ N.
. N m = 0.
. N m v = λm v = 0 and thus λ = 0.
. E W (N ) = {0}.
The result of this lemma is that a nilpotent matrix is similar to a strictly upper triangular matrix:
\begin{pmatrix} 0 & & * \\ & \ddots & \\ 0 & & 0 \end{pmatrix}.
The next proposition shows that a nilpotent matrix is similar to a strictly upper
triangular matrix.
Proof
– (i) .⇒ (ii)
Let . A be nilpotent. Lemma 9.4 shows that .λ = 0 is an eigenvalue and so . A is
similar to the matrix
N = \begin{pmatrix} 0 & * \\ \vec{0} & N_1 \end{pmatrix}   with N_1 ∈ K^{(n-1)×(n-1)}.
The matrix N_1 is, after an induction argument on n, a strictly upper triangular matrix of the form
N_1 = \begin{pmatrix} 0 & & * \\ & \ddots & \\ 0 & \cdots & 0 \end{pmatrix}.
It follows that
χ_A(ξ) = det(ξ1_n - A) = det\begin{pmatrix} ξ & & * \\ & \ddots & \\ 0 & & ξ \end{pmatrix} = ξ^n.
In our case, the linear polynomial l_{λ_j} with L_j := l_{λ_j}(f) ∈ End(V), together with the generalized eigenspace W_{λ_j}, leads to a nilpotent operator; note that
(f - λ_j id_V)|_{E_{λ_j}}        (9.58)
is by definition the zero operator on the eigenspace E_{λ_j}! Denoting the restrictions of f and id_V to the generalized eigenspace W_{λ_j} by f_j and id_j, and writing h_j for the nilpotent operator (f - λ_j id_V)|_{W_{λ_j}}, we get
f_j = λ_j id_j + h_j,   j ∈ I(r).        (9.60)
This leads us to the expectation that there exists a decomposition of the operator . f in
partial operators . f j , each one characterized by the eigenvalue .λ j , and that all these
. f j should have the same structure.
We may expect a universal structure for the operators . f j , j ∈ I (r ), with .r the
number of distinct eigenvalues. The following theorem shows that this is the case.
We prove it by using very elementary methods, as demonstrated by [7, pg. 238]. Note
that most proofs used in the literature use much more advanced techniques for this
kind of theorem. For simplicity, the formalism is given at the level of a matrix:
F_1 ⊕ F_2 ⊕ ··· ⊕ F_r,
F_j = λ_j id_j + H_j ∈ K^{m_j×m_j},   j ∈ I(r),        (9.62)
with r the number of distinct eigenvalues of F and H_j a strictly upper triangular matrix of the form
H_j = \begin{pmatrix} 0 & & * \\ & \ddots & \\ 0 & & 0 \end{pmatrix}.        (9.63)
Note that the Jordan approach, using higher level mathematical instruments, contin-
ues to decompose each .Km j into . f -invariant, . f -irreducible vector spaces.
.χ F (x) = (x − λ1 )m 1 · · · (x − λr )m r . (9.64)
with
.χG (x) = (x − λ1 )m 1 −1 (x − λ2 )m 2 · · · (x − λr )m r . (9.66)
with
. F j = λ j I j + H j ∈ K(m j ×m j ) j ∈ {2, . . . , r } (9.68)
and
. F1∗ = λ1 1 + H1 ∈ K(m 1 −1)×(m 1 −1) . (9.69)
From Eqs. (9.65), (9.67) and (9.68), it follows that . F is similar to a block matrix
given by
C = \begin{pmatrix} F_1 & C_2 & \cdots & C_r \\ 0 & F_2 & & \\ & & \ddots & \\ 0 & & 0 & F_r \end{pmatrix}   with   C_j ∈ K^{m_1×m_j}.        (9.70)
Now, we have to show that .C is similar to the matrix . F. This means, we would like
to obtain something of the form
F = B^{-1} C B = \begin{pmatrix} F_1 & Y_2 & \cdots & Y_r \\ 0 & F_2 & & 0 \\ & & \ddots & \\ 0 & & 0 & F_r \end{pmatrix}        (9.71)
with
. Y j = 0. (9.72)
For this purpose we choose the following invertible matrix . B with a similar form as
C in Eq. (9.70):
B = \begin{pmatrix} 1_{m_1} & B_2 & \cdots & B_r \\ 0 & 1_{m_2} & & 0 \\ & & \ddots & \\ 0 & & 0 & 1_{m_r} \end{pmatrix}.        (9.73)
By setting . F j = λ j I j + H j , we get
. Y j = (λ1 − λ j )B j + H1 B j − B j H j + C j . (9.75)
The question is whether we can choose the. B j so that.Y j = 0 for every. j ∈ {2, . . . , r }.
If we divide the expression in Eq. (9.75) by .(λ1 − λ j ) /= 0, this does not change the
form of Eq. (9.75). So we may assume for some fixed . j, without loss of generality,
that .λ1 − λ j = 1, .∀ j ∈ {2, . . . , r }, just for our proof. Now the question is whether
we can solve the equation
0 = B j + H1 B j − B j H j + C j ,
. (9.76)
. X + H1 X − X H j = −C j . (9.77)
Setting
. L ≡ H1 , R ≡ H j and C0 ≡ −C j (9.78)
. X + L X − X R = C0 with X ∈ Cm 1 ×m j . (9.79)
The system of Eq. (9.79) has a unique solution if the homogeneous equation
. X + LX − XR = 0 (9.80)
has only the zero solution . X = 0. Lemma 9.5, shows that this is true. The solution
X obtained for Eq. (9.79) or equivalently . B j for Eq. (9.75), corresponds to .Y j = 0
.
in Eq. (9.71) and . F is similar to the block diagonal form in Eq. (9.61). This proves
the theorem. ∎
The lemma we used ensures that for the nilpotent matrices . L and . R(L ≡ H1 , R ≡
H j ) the homogeneous Eq. (9.80) has only the trivial solution .(X = 0).
X^(1) := L X - X R,
X^(2) := L X^(1) - X^(1) R = L² X - 2 L X R + X R²,
X^(3) := L X^(2) - X^(2) R = L³ X - 3 L² X R + 3 L X R² - X R³,
and in general
X^(l) = ∑_{k=0}^{l} (-1)^k \binom{l}{k} L^{l-k} X R^k.
In particular, every term of
X^(2n) = ∑_{k=0}^{2n} (-1)^k \binom{2n}{k} L^{2n-k} X R^k
contains either a factor L^{2n-k} with 2n-k ≥ n or a factor R^k with k ≥ n, and both L^n and R^n vanish since L and R are strictly upper triangular matrices of size at most n. Hence X^(2n) = 0. The homogeneous equation X + L X - X R = 0 gives X = -X^(1), and iterating,
X = -X^(1) = X^(2) = -X^(3) = ··· = X^(2n) = 0.
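In practice, equation (9.79) is a Sylvester-type matrix equation, (1 + L)X + X(-R) = C_0, which standard numerical libraries can solve. The following is only a sketch of this reformulation (our addition; it assumes Python with NumPy and SciPy, and the matrices are illustrative):

    import numpy as np
    from scipy.linalg import solve_sylvester

    # Two strictly upper triangular (hence nilpotent) matrices L and R, and a right-hand side C0.
    L  = np.array([[0.0, 1.0], [0.0, 0.0]])
    R  = np.array([[0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [0.0, 0.0, 0.0]])
    C0 = np.ones((2, 3))

    # X + L X - X R = C0  is the Sylvester equation  (1 + L) X + X (-R) = C0.
    X = solve_sylvester(np.eye(2) + L, -R, C0)
    assert np.allclose(X + L @ X - X @ R, C0)

The unique solvability is exactly what the lemma guarantees when L and R are nilpotent.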
The structure theorem is, in particular, valid for all complex matrices since
the fundamental theorem of algebra states that every nonconstant polynomial
decomposes into linear factors over .C. The above formulation of the theorem
has the advantage that real and complex matrices are treated uniformly.
A = λ1 + N ∈ K^{m×m},
(λ1 + N)(λ1 + N) = λ²1 + λN + Nλ + N².
. χ F (F) = 0 ∈ Kn×n .
we can apply Theorem 9.2 to reduce to the case where . F is a pre-Jordan matrix. This
leads to the following expression:
χ_F(F) = \begin{pmatrix} χ_F(F_1) & & & 0 \\ & χ_F(F_2) & & \\ & & \ddots & \\ 0 & & & χ_F(F_r) \end{pmatrix},
with H_j nilpotent. So we have H_j^{m_j} = 0 and χ_F(F_j) = 0 ∈ K^{m_j×m_j} for all j ∈ I(r). This leads to
χ_F(F) = 0 ∈ K^{n×n}.
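The Cayley-Hamilton relation is easy to verify numerically for a concrete matrix. A minimal sketch in Python (assuming NumPy; the random matrix and Horner evaluation are our own illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    F = rng.standard_normal((4, 4))

    # np.poly(F) returns the coefficients of chi_F, highest degree first.
    coeffs = np.poly(F)

    # Evaluate chi_F at the matrix F by Horner's scheme.
    chi_F_of_F = np.zeros_like(F)
    for c in coeffs:
        chi_F_of_F = chi_F_of_F @ F + c * np.eye(4)

    assert np.allclose(chi_F_of_F, 0)     # Cayley-Hamilton: chi_F(F) = 0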
At this point, two questions arise simultaneously. Firstly, what exactly does the
expression
.χ F (F) = 0 (9.82)
in the Cayley-Hamilton Theorem 9.2 mean? Secondly, are there other polynomials
that fulfill the same relation? In Eq. (9.82), χ_F ≡ χ ∈ K[x] is the characteristic polynomial of F ∈ K^{n×n}, so (9.82) says precisely that the list
(1_n, F, F², ..., F^{n-1}, F^n)        (9.85)
is linearly dependent. Since the space K^{n×n} is n²-dimensional and a list like
(1_n, F, F², ..., F^{n²-1}, F^{n²})        (9.86)
is always linearly dependent, we also see that the .n + 1 elements in (9.85) which are
fewer than the .n 2 + 1 elements in (9.86), are already linearly dependent.
In connection with this, a logical question is whether a list of powers of . F with
even smaller length than .n + 1 could also be linearly dependent. It is clear that here
we talk about a vector space which we denote by .K[F] and which is generated by
the powers of a matrix . F:
. span(1n , F, F 2 , F 3 , . . . ). (9.87)
So we define
K[F] := span(1_n, F, F², F³, ...).
It is further clear that the vector space K[F] is also a commutative sub-algebra of the matrix algebra K^{n×n}.
All this leads to the notion of minimal polynomials.
Let
I_F := {χ ∈ K[x] : χ(F) = 0}.        (9.89)
Recall that an ideal I of an algebra A is characterized by
a, b ∈ I ⇒ a + b, a - b ∈ I,   and   i ∈ I, a ∈ A ⇒ ai ∈ I.
Using the formalism of Sect. 1.3, we can state that there is an action of polynomials on the annihilators of F:
K[x] × I_F → I_F,
(χ, ϕ) ↦ ϕχ ∈ I_F.
It is interesting to see that, using the definition χ_F(x) = det(x1 - F), we can write χ_F(F) = 0 as a statement of linear dependence. Hence the list (1_n, A, A², ..., A^m) is linearly dependent and therefore there exist coefficients λ_s ∈ K, s ∈ I(m), not all of them zero, so that
λ_s A^s = 0.        (9.92)
Setting
χ(x) := λ_s x^s,        (9.93)
we obtain an annihilating polynomial of A. Since the set of annihilators of A, I_A, is an ideal of K[x], it follows from the theory of polynomials that
I_A = K[x]μ
for a monic polynomial μ ≡ μ_F of minimal degree, the minimal polynomial, with
μ_F(F) = 0.        (9.95)
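The minimal polynomial can be found numerically by looking for the first power of A that is a linear combination of the lower powers. The following is a rough sketch of that idea (our addition, assuming Python with NumPy; function and variable names are ours):

    import numpy as np

    A = np.array([[2.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 3.0]])       # chi_A(x) = (x-2)^2 (x-3),  mu_A(x) = (x-2)(x-3)

    def minimal_polynomial_coeffs(A, tol=1e-10):
        """Monic coefficients (lowest degree first) of the minimal polynomial of A."""
        n = A.shape[0]
        powers = [np.linalg.matrix_power(A, k).ravel() for k in range(n + 1)]
        for d in range(1, n + 1):
            # Try to write A^d as a combination of 1, A, ..., A^{d-1} (least squares).
            M = np.column_stack(powers[:d])
            coeffs, residual, *_ = np.linalg.lstsq(M, powers[d], rcond=None)
            if np.linalg.norm(M @ coeffs - powers[d]) < tol:
                return np.append(-coeffs, 1.0)   # represents x^d - sum_k coeffs[k] x^k
        raise RuntimeError("unreachable for a square matrix")

    print(minimal_polynomial_coeffs(A))          # [6., -5., 1.]  i.e.  x^2 - 5x + 6

For the matrix above, the minimal polynomial has degree 2 although the characteristic polynomial has degree 3, illustrating that μ_A divides χ_A.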
The next proposition shows that the eigenvalues of an operator are also zeros of the
minimal polynomial.
Using Eq. (9.96) and .μ(A) = (A − λ1)Q(A), we obtain from Eq. (9.97):
The characteristic polynomial and the minimal polynomial have exactly the same
zeros even though they may have different multiplicities. For example, for an operator
. A on a .C-vector space:
χ_A(x) = (x - λ_1)^{m_1} (x - λ_2)^{m_2} ··· (x - λ_r)^{m_r}        (9.98)
and
μ_A(x) = (x - λ_1)^{d_1} (x - λ_2)^{d_2} ··· (x - λ_r)^{d_r}.        (9.99)

χ_F(x) = (x - λ_1)^{m_1} (x - λ_2)^{m_2} ··· (x - λ_r)^{m_r},        (9.101)
μ_F(x) = (x - λ_1)(x - λ_2) ··· (x - λ_r),
. B0−1 F B0 = Δ (9.102)
commute.
Using Eqs. (9.102) and (9.103), we obtain
= diag(F_1, ..., F_j, ..., F_r)        (9.107)
with N'_j again a strictly upper triangular matrix. Taking ϕ(F_j), we obtain
ϕ(F_j) = ϕ(λ_j) 1_j + N''_j,
with N''_j strictly upper triangular for all j ∈ I(r). Since ψ(λ_j) = 0 for every j, it follows that ψ(F) is nilpotent. Now, since ψ(F) is both nilpotent and diagonalizable, it follows from (iii) that ψ(F) is zero. Therefore, the above polynomial ψ(x) is the minimal polynomial as it is already minimal: μ(x) = ψ(x). This proves (iv).
F_2 - λ_2 1_2 = 0_{m_2}, ..., F_r - λ_r 1_r = 0_{m_r}.
Theorem 9.4, with the assertion (iii), could also be formulated differently:
A matrix F is diagonalizable if and only if its characteristic polynomial decomposes into linear factors and F is semisimple.
This can even be shortened if we write for the decomposability of the characteristic
polynomial .χ F of the matrix . F:
A matrix . F is diagonalizable if and only if . F is decomposable and semisimple. This
is a pure algebraic point of view for diagonalizability.
The structure theorem in the previous section showed that for every operator in a
complex vector space, there exists a basis that leads to an upper triangular matrix
representation. As in the case of diagonalization, special bases play a crucial role.
But in this section, we will not discuss an . f -invariant decomposition of the given
vector space .V . We are going to use a procedure which allows to consider the vector
space .V as a whole. This is, in addition, particularly relevant and useful for proving
the spectral theorem (see Sect. 10.6) and consequently for a better understanding of
that theorem.
It is clear that Proposition 9.9 (iv) is basis-independent. Note that the above . f -
invariant “flag” is not what we usually mean by an . f -invariant decomposition of .V .
The existence of a basis . B0 , as in the above definition, was shown in the structure
theorem in Sect. 9.5. It is still instructive and useful to see a second proof.
χ_f(x) = det(x1_n - f_{BB}) = (x - ϕ_{11})(x - ϕ_{22}) ··· (x - ϕ_{nn}),
so it decomposes into linear factors. If, on the other hand, the characteristic polynomial χ_f is given by χ_f(x) = (x - λ_1) ··· (x - λ_n), where λ_1, ..., λ_n are the eigenvalues of f (repetition of course is allowed), we show the assertion by induction on n.
For .n = 1, . f B B ∈ K1×1 is already upper triangular.
We start with an eigenvector .v1 ≡ b1 for .λ1 : f v1 = λ1 v1 and choose a basis
. B = (b1 , u 2 , . . . , u n ) of . V . We set . B1 = {v1 }, . B2 = (u 2 , u 3 , . . . , u n ), U1 = span B1 ,
. V = U1 ⊕ U2 .
We can write f_{BB} as
f_{BB} = \begin{pmatrix} λ_1 & h_{B_1 B_2} \\ 0 & g_{B_2 B_2} \end{pmatrix},
and we can proceed by induction, applying the induction hypothesis to the lower right block g_{B_2 B_2}.
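Numerically, the triangularization discussed here corresponds to the Schur decomposition, which is available in standard libraries. A small illustrative sketch (our addition, assuming Python with NumPy and SciPy):

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(1)
    F = rng.standard_normal((4, 4))

    # Complex Schur form: F = Q T Q^H with Q unitary and T upper triangular;
    # the diagonal of T carries the eigenvalues of F.
    T, Q = schur(F, output='complex')

    assert np.allclose(Q @ T @ Q.conj().T, F)
    assert np.allclose(np.tril(T, k=-1), 0)   # T is upper triangular
    assert np.allclose(np.sort_complex(np.diag(T)),
                       np.sort_complex(np.linalg.eigvals(F)))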
Summary
This chapter marks the beginning of the section of the book that we consider advanced
linear algebra. From now on, eigenvalues and eigenvectors take center stage.
Initially, we extensively presented the meaning, usefulness, and application of
eigenvalues and eigenvectors in physics, facilitating the reader’s entry into this
sophisticated area of linear algebra with many examples.
The question of diagonalization and the description of this process were the central
focus of this chapter. Highlights included two theorems. The first, the equivalence
theorem of diagonalizability, addressed the geometric aspects of this question. The
second theorem, concerning the algebraic perspective of diagonalizability, required
rather advanced preparation.
To understand the question of diagonalization properly, one must also understand
what non-diagonalizability means. Here too, we eased access to this question with
examples and theory. This theory led to the so-called pre-Jordan form, as we refer to it here, which every operator on a complex vector space, diagonalizable or not, possesses. The highlight here was the structure theorem (pre-Jordan form).
At the end of this chapter, we also discussed triangularization.
E_{λ_1} + ··· + E_{λ_r} = E_{λ_1} ⊕ ··· ⊕ E_{λ_r};

∑_{i=1}^{r} dim E_{λ_i} ≤ dim V.
Exercise 9.4 Direct sum and direct decomposition. (See Proposition 9.5)
If . V = U1 ⊕ · · · ⊕ U j ⊕ · · · ⊕ Ur is a direct sum of subspaces .U1 , . . . , Ur , then
show that there is a list .(P1 , . . . , Pr ) of projections in .V with a direct decomposition
of the identity such that for each . j ∈ I (r ), .U j = im P j (See Definition 9.9). This
means that if we define projections . P j :
. P j : V −→ U j
v |−→ P j v := u j .
We need to show that .(P1 , . . . , Pr ) is a direct decomposition of identity, that is, that
– . P j is linear;
– . Pi P j = δi j ;
– . P1 + . . . + Pr = id.
Exercise 9.6 For an operator f ∈ Hom(V, V), show that for every k ∈ N,
im f^{k+1} ≤ im f^k.
Exercise 9.8 Let F be the matrix F = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} (as in Example 9.18), and U = span(e_1). Show that there is no (complementary) F-invariant subspace Ū of K² such that
K² = U ⊕ Ū,
The following five exercises are applications of Comment 9.3 about the action
of polynomials on the algebra .Kn×n in connection with diagonalization and
spectral decomposition induced by a diagonalizable matrix, as discussed in
Theorems 9.1 and 9.4.
ev_A : K[x] → K[A],
       ϕ ↦ ϕ(A),
       ϕ_s x^s ↦ ϕ_s A^s,  s ∈ N,
is an algebra homomorphism.
ker(ev_A) = I_A ≤ K[x].

ϕ_j(x) := \frac{μ(x)}{x - λ_j}   and   ψ_j(x) := \frac{ϕ_j(x)}{ϕ_j(λ_j)},   x ∈ K;  j ∈ I(r),
(i) .ψi ψ j − δi j ψi ∈ I A , i, j ∈ I (r ),
(ii) .ψ1 + · · · + ψr − 1 ∈ I A ,
(iii) .λ1 ψ1 + · · · + λr ψr − id ∈ I A , with .id(x) = x.
So the evaluation map .ev A leads directly to the desired result.
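For a concrete diagonalizable matrix, the polynomials ψ_j of this exercise produce the spectral projectors P_j = ψ_j(A) explicitly. A small numerical sketch of this construction, in the simple case where the minimal polynomial has only simple roots (our addition, assuming Python with NumPy):

    import numpy as np

    A = np.diag([2.0, 2.0, 5.0])                 # diagonalizable, eigenvalues 2 and 5
    eigenvalues = [2.0, 5.0]                     # distinct eigenvalues lambda_1, lambda_2

    # psi_j(x) = prod_{k != j} (x - lambda_k) / (lambda_j - lambda_k);  then P_j = psi_j(A).
    projectors = []
    for lam_j in eigenvalues:
        P = np.eye(3)
        for lam_k in eigenvalues:
            if lam_k != lam_j:
                P = P @ (A - lam_k * np.eye(3)) / (lam_j - lam_k)
        projectors.append(P)

    P1, P2 = projectors
    assert np.allclose(P1 + P2, np.eye(3))                        # psi_1 + psi_2 = 1
    assert np.allclose(P1 @ P2, 0) and np.allclose(P1 @ P1, P1)   # orthogonal idempotents
    assert np.allclose(2.0 * P1 + 5.0 * P2, A)                    # spectral decomposition of A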
Exercise 9.12 Let.V be a vector space and. f ∈ Hom(V, V ). Show that the following
subspaces are . f -invariant:
(i) .ker f ,
(ii) .im f ,
(iii) .U such that .U ≤ ker f ,
(iv) . W such that . W ≥ im f .
Exercise 9.13 Let .V be a vector space and . f, g ∈ Hom(V, V ) with the property
f ◦ g = g ◦ f . Show that .im f is .g-invariant.
.
Exercise 9.20 Show that the evaluation map commutes with the adjoint representa-
tion of the linear group: Let . M ∈ Kn×n , . F ∈ G L(n, K) and .ϕ ∈ K[x]. Show that
ϕ(F M F −1 ) = Fϕ(M)F −1 .
.
Exercise 9.22 This exercise is significant in connection with the minimal polynomial
of an operator.
Let . f ∈ Hom(V, V ), let .v ∈ V with .v /= 0, and let .μ be a polynomial of smallest
possible degree such that .μ( f )v = 0. Show that if .μ(λ) = 0, this .λ is an eigenvalue
of . f .
Choose, step by step, first a basis of ker f and then extend it to a basis of ker f², and so on. The result is a basis of V; show that with respect to this basis, the matrix F has the desired form.
The following two exercises are first a wrong and then a more direct proof of
the Cayley-Hamilton theorem (see Theorem 9.3).
Show, using the expression .(x1n − F)(x1n − F)# = det(x1n − F)1n (see Proposi-
tion 7.4), by a direct calculation:
.χ F (F) = 0 ∈ Kn×n .
Exercise 9.27 This exercise answers the question whether for every given polyno-
mial.φ(x) = x n + ϕn−1 x n−1 + · · · + ϕ1 x + ϕ0 , ϕ0 , · · · , ϕn−1 ∈ K a corresponding
matrix . F ∈ Kn×n exists, such that the characteristic polynomial of . F is exactly the
given polynomial .φ(x).
Check that the matrix
F = \begin{pmatrix} 0 & & & & -ϕ_0 \\ 1 & 0 & & & -ϕ_1 \\ & 1 & \ddots & & \vdots \\ & & \ddots & 0 & -ϕ_{n-2} \\ & & & 1 & -ϕ_{n-1} \end{pmatrix}
has exactly the given polynomial φ(x) as its characteristic polynomial.
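This companion-matrix construction is easy to test numerically. A minimal sketch (our addition, assuming Python with NumPy; the helper name and the sample polynomial are illustrative):

    import numpy as np

    def companion(phi):
        """Companion matrix of x^n + phi[n-1] x^(n-1) + ... + phi[1] x + phi[0]."""
        n = len(phi)
        F = np.zeros((n, n))
        F[1:, :-1] = np.eye(n - 1)        # ones on the subdiagonal
        F[:, -1] = -np.asarray(phi)       # last column carries -phi_0, ..., -phi_{n-1}
        return F

    phi = [6.0, -5.0, -2.0]               # x^3 - 2 x^2 - 5 x + 6 = (x-1)(x+2)(x-3)
    F = companion(phi)

    # np.poly returns the characteristic polynomial, highest coefficient first.
    assert np.allclose(np.poly(F), [1.0, -2.0, -5.0, 6.0])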
In this chapter, we summarize and complete some important facts about inner product spaces, real and complex ones, mostly on a more advanced level than in Chaps. 2 and 6. In this context, the notions of orthogonality, orthogonal complement, orthogonal projection, and orthogonal expansion are discussed once again.
In Sect. 10.4, we motivate and discuss normal operators in some detail. We give
both an algebraic and a geometric definition of normal operators. We explain a
surprising and gratifying analogy of normal operators to complex and real numbers.
The highlight of this chapter and in fact of linear algebra altogether, are the spectral
theorems that are treated at the end of this chapter.
Vector spaces with additional structures are especially welcome in physics and math-
ematics. One such structure which we particularly like, is an inner product space.
This has to do with our surrounding space being a Euclidean space. It seems nat-
ural, especially in applications, to prefer spaces with more structure than a pure
abstract vector space in mathematics. In connection with inner product spaces, there
is the marvelous Pythagorean theorem. In physics, whenever a mathematical space
is needed as a model for physical reality, there is the tendency to burden it immedi-
ately with as many structures as possible and often with even more structures than
required. Therefore, in physics, when we are talking about vector spaces, we usually
mean an inner product vector space or, as it is also called, a metric vector space or a
vector space with a metric.
As we saw, we distinguish between real and complex vector spaces for abstract
vector spaces. The same is true for inner products; in the real case, we talk about
Euclidean vector spaces, and in the complex case, about unitary vector spaces.
The standard Euclidean vector space in n-dimensions is, as we already know, .Rn
with the usual real dot product. The standard unitary vector space in n-dimensions
(n-complex dimensions) is .Cn with the usual complex dot product (Hermitian inner
product). Inner product vector spaces are characterized by the property of having
positive definite scalar products. In physics, especially in the real case, there are
also nondegenerate scalar products, and so we also talk about semi-Euclidean or
pseudo-Euclidean vector spaces. This is the case in special and general relativity.
In this chapter, before coming to operators, we review a few facts about inner
product vector spaces and discuss some critical applications of the metric structure
connected with orthogonality.
For a vector space.V with dimension.n over a field.K = R or.C, the inner product.(−|−)
was defined in Sect. 10.3. For the standard vector space .Kn , the standard (canonical)
inner product .(−|−)0 is given in the form of the complex dot product:
(u|v)_0 := ū^1 v^1 + ··· + ū^n v^n = ū_i v^i,

(V, (|)) ≅ (K^n, (|)_0).

|| · ||² : V → R⁺ ∪ {0},
v ↦ ||v||² = (v|v).
As we see, the inner product can be expressed in terms of its quadratic form.
We first recall the definition of the orthogonality for vectors .u, v ∈ V . We say .u and
v are orthogonal if .(u|v) = 0 and we write .u ⊥ v.
.
. M ⊥ = {v ∈ V : (u|v) = 0 ∀u ∈ M}.
Proof If (v_1, ..., v_k), k ∈ I(n), is a list of nonzero orthogonal vectors and if v_i λ^i = 0 with λ^i ∈ K, then for every j ∈ I(k)
0 = (v_j | v_i λ^i) = λ^i (v_j|v_i) = λ^j (v_j|v_j) = λ^j ||v_j||².
Since each v_j ≠ 0, positive definiteness tells us that λ^j = 0 for all j ∈ I(k). Thus, (v_1, ..., v_k) is linearly independent. ∎
b_1 := a_1,   with
c_i := \frac{b_i}{||b_i||},   i ∈ I(n),   and
b_k := a_k - ∑_{μ=1}^{k-1} c_μ (c_μ|a_k),   k ∈ {2, ..., n}.
Set
A_i := (a_1, ..., a_i),
B_i := (b_1, ..., b_i),
C_i := (c_1, ..., c_i),
and V_i := span A_i.
So, for j < n,
(c_j|b_n) = (c_j|a_n) - ∑_{μ=1}^{n-1} (c_j|c_μ)(c_μ|a_n) = (c_j|a_n) - (c_j|a_n) = 0.
Since c_n and b_n are colinear, we have also (c_j|c_n) = 0 and C is indeed an orthonormal basis of V. ∎
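The Gram-Schmidt recursion above translates almost line by line into code. The following is a sketch for the real (Euclidean) case; for the complex case the first factor in each inner product would have to be conjugated (our addition, assuming Python with NumPy):

    import numpy as np

    def gram_schmidt(A):
        """Orthonormalize the columns a_1, ..., a_n of A (assumed linearly independent)."""
        n = A.shape[1]
        C = np.zeros_like(A, dtype=float)
        for k in range(n):
            # b_k = a_k - sum_{mu<k} c_mu (c_mu|a_k)
            b = A[:, k] - C[:, :k] @ (C[:, :k].T @ A[:, k])
            C[:, k] = b / np.linalg.norm(b)
        return C

    A = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    C = gram_schmidt(A)
    assert np.allclose(C.T @ C, np.eye(3))        # the columns of C are orthonormal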
Proof Section 9.7 and the proposition made there showed that there exists a
basis . B0 = (b1 , . . . , bn ) of .V so that . f B0 B0 is triangular, and that .span B j =
span(b1 , . . . , b j ) is . f -invariant for all . j ∈ I (n). We apply the above proposition
concerning the Gram-Schmidt orthogonalization to the basis . B0 , with .span(C j ) =
span(c1 , . . . , c j ) = span(b1 , . . . , b j ) for all . j ∈ I (n). So we conclude that .span C j
is also . f -invariant for all . j ∈ I (n) and that . f CC is triangular. ∎
As we saw, metric structures (inner products) on vector spaces lead to the notion of
orthogonality and the orthogonal complement. This allows a refinement of the direct
product and the parallel projection to the orthogonal sum and orthogonal projection.
These are pure geometric properties well-known from Euclidean geometry and show
once more the entanglement between geometry and algebra in linear algebra. We are
first going to study some elementary properties of orthogonal complements in the
following propositions:
(ii) .U ∩ U ⊥ = {0};
The symbol .< stands for “subspace of” as throughout this book.
Proof (i)
Since .(0|u) = 0 for .u ∈ U , it then follows that .0 ∈ U ⊥ . By the linearity of the scalar
product .(v|·), we have, when .w, z ∈ U ⊥ , .(w|u) = 0, and .(z|u) = 0 for every .u ∈ U .
It then follows that .(w + z|u) = (w|u) + (z|u) = 0 + 0 = 0 so that .w + z ∈ U ⊥ .
Similarly .λw ∈ U ⊥ . ∎
Proof (ii)
Let .z ∈ U ∩ U ⊥ . Then .z ∈ U , .z ∈ U ⊥ so that .(z|z) = 0. Hence .z = 0 and thus .U ∩
U ⊥ = {0}. ∎
Proof (iii)
Let w̄ ∈ W^⊥. Then, as U ⊆ W, we have (w̄|u) = 0 ∀ u ∈ U. Thus w̄ ∈ U^⊥ and so W^⊥ ⊆ U^⊥. ∎
Proof (iv)
For all v ∈ V, we have (v|0) = 0 and so v ∈ {0}^⊥, which means that {0}^⊥ = V. For v ∈ V^⊥, we have (v|v) = 0 and so v = 0, which means that V^⊥ = {0}. ∎
.U1 Θ U2 Θ · · · Θ Uk .
U ⊕ U ⊥ = V.
.
This means that every subspace.U uniquely induces a direct sum decomposition
of.V , an orthogonal decomposition. We may also write as above.U Θ U ⊥ = V.
u = u_i (u_i|v).
Let
ũ := v - u_i (u_i|v).
We now show that ũ ∈ U^⊥ or, equivalently, that (u_j|ũ) = 0 for all j ∈ I(k):
(u_j|ũ) = (u_j|v - u) = (u_j|v) - (u_j|u_i)(u_i|v) = (u_j|v) - (u_j|v) = 0.

(U^⊥)^⊥ = U.
Proof
(i) We show that .U < (U ⊥ )⊥ .
Let .u ∈ U . Then whenever .w ∈ U ⊥ , .(u|w) = 0. Hence .u ∈ (U ⊥ )⊥ and so
⊥ ⊥
.U ⊆ (U ) .
(ii) We show that .(U ⊥ )⊥ ⊂ U :
Let .w̄ ∈ (U ⊥ )⊥ . The orthogonal decomposition of .w̄ ∈ V relative to .U is given
(see Proposition 10.5) by .w̄ = u + z (so .w̄ − u = z) with .u ∈ U and .z ∈ U ⊥ .
Since .u ∈ U from (i), we have .u ∈ (U ⊥ )⊥ and so .z = w̄ − u ∈ (U ⊥ )⊥ . As we
see,.z ∈ U ⊥ ∩ (U ⊥ )⊥ = {0} (Proposition 10.4). So we have.z = 0 and.w̄ − u =
0 ⇒ w̄ = u ∈ U which means that .(U ⊥ )⊥ ⊂ U . So with (i) and (ii) we obtain
⊥ ⊥
.(U ) = U.
∎
Analogous to orthogonal sums, orthogonal projections also lead to more refined and
“perfect” projections than parallel projections.
P_U : V → U,
v ↦ P_U(v) := u.

For a one-dimensional subspace U = Ku,
P_U(v) = u \frac{(u|v)}{(u|u)},
or equivalently, for an orthonormal basis (u_1, ..., u_k) of U,
P_U = ∑_{j=1}^{k} |u_j)(u_j|.
Then, for every u ∈ U:
(i) ||v - u||² ≥ ||P_U v - u||², since ||v - P_U v||² ≥ 0;
(ii) ||v - u||² = ||v - P_U v||² if and only if u = P_U v.
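Concretely, if the orthonormal vectors u_j are collected as the columns of a matrix Q, the projector P_U = Σ_j |u_j)(u_j| becomes Q Q^T. A minimal numerical sketch (our addition, assuming Python with NumPy):

    import numpy as np

    # Orthonormal basis (u_1, u_2) of a 2-dimensional subspace U of R^3 (columns of Q).
    Q = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])

    P_U = Q @ Q.T                                 # the orthogonal projection onto U

    v = np.array([1.0, 2.0, 3.0])
    assert np.allclose(P_U @ P_U, P_U)            # P_U is idempotent
    assert np.allclose(P_U @ v, [1.0, 2.0, 0.0])  # the orthogonal projection of v onto U

    # Best-approximation property: P_U v is at least as close to v as any other u in U.
    u = Q @ np.array([5.0, -1.0])                 # an arbitrary element of U
    assert np.linalg.norm(v - P_U @ v) <= np.linalg.norm(v - u)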
One could say that normal operators, particularly those on a complex vector space,
are mathematically the nicest operators one can hope to have: They are diagonal-
izable (even if not the only diagonalizable operators) and they have an orthogonal
eigenbasis. As we shall see later, normal operators are the only diagonalizable oper-
ators with an orthonormal basis. They lead to the so-called complex and real spectral
theorems, which underlines their beauty and usefulness. One could also say that they
are the most important operators to a physicist. Self-adjoint operators and isometries
are both normal. We call self-adjoint operators on complex and real vector spaces
also Hermitian and symmetric, respectively. Isometries for complex and real vector
spaces (always finite-dimensional in our approach), are also known as unitary and
orthogonal, respectively.
There is no doubt that an inner product vector space has much more structure than
an abstract vector space. This is why it is, at least in physics, much more pleasant
to have at our disposal an inner product vector space than having only an abstract
vector space with the same dimension. In addition, when dealing with operators in
an inner product vector space, it is natural to be interested in operators that interact
well with the inner product structures. Stated differently, if an operator has nothing
to do with the inner product, then the inner product is not relevant to this operator,
so there is no necessity to introduce a metric in addition to the linear structure. This
justifies dealing with inner product spaces and special operators with a characteristic
intrinsic connection to the inner product. It turns out that these operators are the
“normal” operators and it may appear rather surprising that the normal operators are
not exclusively those operators which preserve the inner product. This is what we
would expect from our experience in similar situations. It may also partially explain
why, in physics, the notion of a normal operator is often absent.
is, first, at the technical level, what is needed for the formulation of the specific
connection of the normal operator with the help of the inner product. At the same
time it is quite clear that the structure “inner product” is a necessary prerequisite
for the definition of what is “adjoint”. In the case of .V = Kn and .V ' = Km , the
inner product is the canonically given dot product. So we have . f ad = f † as was
shown in Sect. 6.3 and Proposition 6.9. From the discussion in Sect. 6.3, it follows
in addition that . f ad is a kind of a substitution for the inverse of . f . Now if we take
this assumption concerning “inverse” seriously and look for a weaker condition for
invertibility, it turns out that this idea can lead us to the definition of normal operators:
If . f ad was really an inverse operator .( f ad = f −1 ), we would have . f ◦ f ad = idV
and . f ad ◦ f = idV . A weaker condition would then be: . f ad ◦ f = f ◦ f ad . This
is exactly the definition of a normal operator! There is in addition a surprisingly
pleasant analogy to complex and real numbers which leads also to the notion of
normal, unitary and self-adjoint operators. For complex and real numbers we recall
the following well-known relation: if z ∈ C\{0} and x, y, r, ϕ ∈ R,
z = x + iy,   z̄z = zz̄,   |z| := √(z̄z),   z = |z| \frac{z}{|z|},   |z| = r = √(x² + y²)
positive or zero (nonnegative), and \frac{z}{|z|} = e^{iϕ} with |\frac{z}{|z|}| = |e^{iϕ}| = 1.
If we ask for an operator . f in .(V, (|)) in a .C vector space with metric structure
which corresponds to .z and its properties shown above, using the analogy .z |→ z̄
with . f |→ f ad , we are led to the normal operators which in addition contain, again
in analogy to the complex numbers, the nonnegative and self-adjoint operators. So for
a normal operator. f , we expect the relation. f ad ◦ f = f ◦ f ad . This produces imme-
diately two obvious special cases for a normal operator: . f ad ◦ f = idV ( f ad = f −1 )
and . f ad = f which are also the most important ones, the isometries and the self-
adjoint operators. Further, if we define f := h_1 + i h_2 with the two self-adjoint operators h_1^{ad} = h_1 and h_2^{ad} = h_2, it follows that f normal is equivalent to the commutative relation h_1 h_2 = h_2 h_1. All displayed above may show that the introduction of
the notions “adjoint” and “normal” is not made by accident but are connected with
deep structures and are of tremendous relevance for mathematics and physics.
In order to proceed, we shortly recall the definitions of adjoint and self-adjoint
operators given in Sect. 6.3 and Definition 6.4. The adjoint operator of . f ∈ End(V )
is given by:
(f^{ad} w|v) := (w|f v)   for all v, w ∈ V.
It is clear that for the notions adjoint, self-adjoint, and normal operator, the existence of a metric structure on V is required, which in our case is expressed by the inner product s = (·|·):
s : V × V → K.

f is normal ⇔ f^{ad} f - f f^{ad} = 0
            ⇔ (w|(f^{ad} f - f f^{ad})v) = 0   ∀ w, v ∈ V
            ⇔ (w|f^{ad} f v) = (w|f f^{ad} v)
            ⇔ (f w|f v) = (f^{ad} w|f^{ad} v).
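For matrices, the defining relation f^{ad} f = f f^{ad} becomes M^H M = M M^H, which is easy to test. A small sketch (our addition, assuming Python with NumPy; the two sample matrices are illustrative):

    import numpy as np

    theta = 0.3
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])      # a rotation: normal (even orthogonal)
    B = np.array([[0.0, 1.0],
                  [0.0, 0.0]])                           # nilpotent, not normal

    def is_normal(M):
        """Check f^ad f = f f^ad, i.e. M^H M = M M^H, up to rounding."""
        return np.allclose(M.conj().T @ M, M @ M.conj().T)

    print(is_normal(A), is_normal(B))                    # True False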
In what follows, we need to specify more of what we know about . f and . f ad . For
this reason, we have to consider separately for. f ∈ End(V ) whether.V is a complex or
real vector space. This means that we have to take into account whether. f is a.C-linear
or a.R-linear operator. Since.C-linearity is a stronger condition than.R linearity, it leads
as expected to more substantial results. This fact may be helpful in understanding the
results that follow. These are also extremely useful for understanding crucial aspects
of quantum mechanics theory.
Proof Suppose .u, v ∈ V . We use a modified version of the polarization identity (see
Exercise 2.31):
We observe that the terms on the right-hand side are of the form required by the
condition. This implies that the right-hand side is zero and so the left-hand side is
zero too. Now set .v = f w. Then .( f w| f w) = 0 and so . f w = 0. Since .w is arbitrary,
we obtain . f = 0̂. ∎
We are here in the situation where we have to distinguish between a complex and a real vector space. For a real vector space (i.e., for an R-linear operator f), Proposition 10.10 is not valid: in R² the rotation by 90°, given by
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} ≠ 0̂,
satisfies (v|f v) = 0 for all v ∈ R².
It is interesting to notice that the above proposition is also valid for a real vector
space in the special case of a self-adjoint operator.
Proof For a self-adjoint operator in a complex vector space, it was already proven
in Proposition 10.10. Therefore, we can assume that .V is a real inner product vector
space (. f is only .R-linear), then we have for the appropriate modified version of the
polarization identity in Sect. 10.2,
which holds if . f is symmetric and .(v| f v) real (see Exercise 2.28). So we obtain the
desired result as in Proposition 10.10. ∎
Proposition 10.12 states that all the zeros of this χ_f are real.
For K = R we get χ_f ∈ R[x]. The matrix F of f is real and symmetric: F̄ = F and F^T = F. This indicates in addition that F^† = F holds, so the result carries over to the real symmetric case.
||(f - λ id_V)v|| = ||(f^{ad} - λ̄ id_V)v||,   so that
(f - λ id_V)v = 0  ⇔  (f^{ad} - λ̄ id_V)v = 0.
This shows that when f is normal, f^{ad} and f have the same eigenvectors and their corresponding eigenvalues are complex conjugates. ∎
In the following proposition, we see that for normal operators, the eigenvectors corresponding to different eigenvalues are linearly independent and orthogonal. Indeed, for eigenvectors v_1, v_2 of a normal f with distinct eigenvalues λ_1 ≠ λ_2,
(λ_1 - λ_2)(v_2|v_1) = (v_2|f v_1) - (f^{ad} v_2|v_1) = (v_2|f v_1) - (v_2|f v_1) = 0.
to deal with. Therefore, the mathematical literature will usually refer to complex and
real spectral theorems separately. We follow this course for instructional reasons, but
we show in a corollary that if we have a slightly different perspective, it is possible
to consider only one spectral theorem for the general field .K.
As discussed in Sect. 9.4, the diagonalizability of an operator f on an n-dimensional vector space V implies both the existence of a diagonal representation f_B = diag(λ_1, ..., λ_n) and the existence of a basis B = (b_1, ..., b_n) consisting of eigenvectors b_s corresponding to the eigenvalues λ_s for every s ∈ I(n). It is reasonable
to call . B an eigenbasis of . f . We already know that not every operator . f has the priv-
ilege to be diagonalizable or equivalently to have an eigenbasis. In a complex vector
space, the normal operators are precisely those operators which have the privilege not
only to be diagonalizable but in addition to be diagonalizable with an orthonormal
eigenbasis!
This is essentially the content of the spectral theorems:
f c_s = ∑_{i=1}^{n} c_i ϕ^i_s   and   f^{ad} c_s = ∑_{i=1}^{n} c_i (ϕ^{ad})^i_s = ∑_{i=1}^{n} c_i ϕ̄^s_i.        (10.3)

||f c_s||² = ||f^{ad} c_s||²,   or equivalently   ∑_{i=1}^{n} |ϕ^i_s|² = ∑_{i=1}^{n} |ϕ^s_i|².        (10.4)
This means that the norms of the corresponding columns of F and F^† are equal. This leads, by induction using Eq. (10.2), directly to the result that only the diagonal elements of F are nonzero. So F is diagonal and f is diagonalizable. This proves the Theorem. ∎
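The statement of the complex spectral theorem can be observed numerically for a self-adjoint (hence normal) matrix, for which NumPy returns the orthonormal eigenbasis directly. A small sketch (our addition; the sample matrix is illustrative):

    import numpy as np

    H = np.array([[2.0, 1.0 - 1.0j],
                  [1.0 + 1.0j, 3.0]])                # Hermitian, hence normal
    eigenvalues, U = np.linalg.eigh(H)

    assert np.allclose(U.conj().T @ U, np.eye(2))                 # the eigenbasis is orthonormal
    assert np.allclose(U.conj().T @ H @ U, np.diag(eigenvalues))  # U^H H U is diagonal
    assert np.allclose(eigenvalues.imag, 0)                       # self-adjoint: real eigenvalues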
f(b_i) = ∑_{j=1}^{i} ϕ^j_i b_j   for some ϕ^j_i ∈ R,

b_i ∈ span{c_1, ..., c_i}   and   c_i ∈ span{b_1, ..., b_i},
Proof The proof goes through as in the complex spectral theorem since Schur’s
theorem can also be applied here! ∎
There exists a more direct formulation of the spectral theorem if we combine it
with Theorem 9.1. We thus obtain a spectral decomposition of every normal operator
. f parametrized by the set of its eigenvalues:
(iv) . f = λ1 P1 + λ2 P2 + · · · + λr Pr .
Proof This theorem is simply Theorem 9.1 with the additional information that the V_i are orthogonal. The orthogonality part follows from the spectral theorem over K. ∎
Now that we clarified the structure of all normal operators, we would like to
discuss their content. Returning to the relation . f ad ◦ f = f ◦ f ad , we observe
As usual, the notation “.≤”, “.<” indicates a subspace with a similar structure.
It turns out that, as we shall see in Chap. 11, isometries and self-adjoint operators
are indeed the essential part of the normal operators.
Summary
In physics, when referring to a vector space, it almost always implies an inner prod-
uct vector space. In this chapter, we covered everything related to an inner prod-
uct space. This mainly includes concepts associated with orthogonality. Previously
known results were reiterated, summarized, and supplemented.
The normal operators, those endomorphisms relating to the inner product, were
extensively motivated and discussed here with emphasis on their analogies to com-
plex and real numbers. It was noted that precisely these operators are the most
commonly used in physics.
The spectral theorem applies to normal operators. Here, it was shown for complex
vector spaces that normal operators, such as isometries, self-adjoint operators, and
nonnegative operators, possess the best properties regarding their eigenvalues and
eigenvectors.
It was demonstrated that normal operators are the only ones that have orthog-
onal and orthonormal eigenbases. Moreover, self-adjoint operators even have real
eigenvalues. This allows them to act as observables in quantum mechanics.
Finally, it was pointed out that most operators describing symmetries in physics
are elements of unitary or orthogonal groups, and they also belong to the set of normal
operators.
In the first five exercises, you learn how to express a covector with the help of a
corresponding vector as scalar product. Furthermore, you learn to distinguish
a basis dependent isomorphism from a basis free one (canonical isomorphism)
with the example of a vector space .V and its dual .V ∗ .
Exercise 10.1 Riesz representation theorem. Let .V be an inner product vector space
and .ξ a covector (linear function, linear form, .ξ ∈ V ∗ := Hom(V, K)). Show that
there exists a unique vector .u such that we can express .ξ(v) with all .v ∈ V as a
scalar product:
.ξ(v) = (v|u).
Show that . f is normal if and only if .u and .w are linearly dependent (.w = λu for
some .λ ∈ K).
Exercise 10.5 Canonical isomorphism V ≅_{can} V*.
Let (V, s) be an n-dimensional Euclidean vector space with s ≡ (|) a symmetric positive definite bilinear form. Show that the map
ŝ : V → V*,
u ↦ ŝ(u) := s(u, ·) ≡ (u|·) ≡ û ∈ V*,
is a canonical (basis independent) isomorphism.

s_a(v) = v - 2 \frac{(a|v)}{(a|a)} a   for all v ∈ V,
(α_i β^i)² ≤ (α_i α^i)(β_k β^k),
Exercise 10.8 Let .V be a vector space and .u, v ∈ V with .||u|| = ||v||. Show that
||αu + βv|| = ||βu + αv|| for all .α, β ∈ R.
.
Exercise 10.9 Let .V1 and .V2 be inner product vector spaces and .(V1 × V2 , s ≡
(·, ·|·, ·)) given by
.(u 1 , u 2 |v1 , v2 ) := (u 1 |v1 ) + (u 2 |v2 ).
The next two exercises refer to Proposition 10.7 and to Comment 10.2.
Exercise 10.12 Let .V be a vector space and . P an operator with . P 2 = P such that
.ker P is orthogonal to.im P. Check that. V = ker P ⊕ im P and show that there exists
a subspace .U such that . P = PU .
be given by
. f (v) = w(u|v),
with .w ∈ V ' and .u, v ∈ V . Determine the adjoint operator . f ad . Write the result in
the Dirac formalism.
Exercise 10.16 Let
. F ∈ Hom(Kn , Kn )
be given by
. F[e1 · · · en ] = [0 e1 · · · en−1 ].
Determine . F ad ∈ Hom(Kn , Kn ).
Exercise 10.17 Let . f ∈ Hom(V, V ). Show that .λ ∈ K is an eigenvalue of . f if and
only if .λ̄ is an eigenvalue of . f ad .
Exercise 10.18 Let . f ∈ Hom(V, V ' ). Show that the following assertions hold:
(i) .dim ker f ad − dim ker f = dim V ' − dim V ;
(ii) .rank f ad = rank f .
Exercise 10.19 Let .V be a complex inner product vector space. Show that the set
of self-adjoint operators is not a complex vector space.
Exercise 10.20 Let . f, g ∈ Hom(V, V ) be self-adjoint operators. Show that .g ◦ f
is self-adjoint if and only if . f ◦ g = g ◦ f .
Exercise 10.21 Let .V be an inner product vector space and . P an operator with
P 2 = P. Show that . P is an orthogonal projection if and only if . P ad = P.
.
. V = E(λ1 , f ) ⊕ · · · ⊕ E(λr , f ).
In this chapter, we proceed with special normal operators such as the nonnegative
operators. The analogical comparison of normal operators with the complex num-
bers continues in that we now introduce nonnegative operators which correspond to
nonnegative real numbers.
Subsequently, we discuss isometries. These are closely related to symmetries in
physics, in particular to symmetries in quantum mechanics.
We then use properties of operators in complex vector spaces to derive properties
of operators in real vector spaces. An instrument for this method is complexification
which we explain in detail. In this way, we obtain the spectral theorem for real normal
operators which have not been accessible so far.
As discussed in Sect. 10.5, the normal operators are precisely those operators that
have a remarkable analogy to the complex numbers. This analogy goes one step
further and extends to positive and nonnegative numbers. This analogy also allows
us to speak of the root of a nonnegative operator. It is useful to remember first the
situation with the complex numbers. A complex number .z is positive if .z is real and
positive. This is equivalent to the existence of some w ≠ 0 such that z = w̄w, or to having a positive square root √z ≡ ⁺√z (positive). Likewise, z is nonnegative if z is
positive or zero. Coming now to the operators, the expected analogy with the complex
numbers is the following: The self-adjoint operators correspond to real numbers; the
positive operators which are always self-adjoint, correspond to positive numbers,
and the nonnegative operators correspond, of course, to nonnegative numbers. This
leads to this definition:
Warning: Confusingly, some authors use the term “positive” to mean “nonnegative”.
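The square root of a nonnegative operator mentioned in the analogy above can be computed from the spectral decomposition. A minimal numerical sketch for a real symmetric nonnegative matrix (our addition, assuming Python with NumPy; the sample matrix is illustrative):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])                   # symmetric with eigenvalues 1 and 3, hence nonnegative

    # Spectral decomposition A = U diag(w) U^T, then take square roots of the eigenvalues.
    w, U = np.linalg.eigh(A)
    assert np.all(w >= 0)                        # A is indeed nonnegative
    sqrt_A = U @ np.diag(np.sqrt(w)) @ U.T

    assert np.allclose(sqrt_A @ sqrt_A, A)       # (sqrt A)^2 = A
    assert np.allclose(sqrt_A, sqrt_A.T)         # the root is again self-adjoint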
11.2 Isometries
We repeat the geometric definition of an isometry valid for every .K-vector space with
an inner product.
This proposition specifies the following. (ii) shows that an isometry preserves the
inner product. It is compatible with the inner product and is therefore exactly what
we expect of an operator associated with the inner product.
(iii) and (iv) show that an isometry transforms an orthonormal basis into an
orthonormal basis.
(v), (vi), and (vii) show not only that an isometry is invertible, but also that we can
express this inverse simply by the adjoint .( f −1 = f ad ). This leads to the important
fact that the isometries form a group.
Proof (i) .⇒ (ii) We prove (ii) expressing the inner product by the norm, as given by
the polarization identity in Sect. 10.2.
Proof (ii) ⇒ (iii) For a given orthonormal list (c_1, ..., c_k) of vectors, (c_i|c_j) = δ_{ij}, i, j ∈ I(k), the preservation of the inner product gives
(f c_i|f c_j) = (c_i|c_j) = δ_{ij}   ∀ i, j ∈ I(k).
f^{ad} ∘ f = f ∘ f^{ad} = id_V.
... = (v|v) = ||v||².
So f^{ad} is also an isometry.
Since (i) implies (v), we have f^{ad} ∘ f = id_V, so
f^{ad} ∘ f ∘ f^{-1} = f^{-1}  ⇒  f^{ad} = f^{-1}.
Finally, id_V = f^{ad} ∘ f, together with
(v|f^{ad} f v) = (f v|f v),
gives ||f v||² = ||v||².
The sequence of proofs (i) ⇒ (ii) ⇒ ··· ⇒ (vii) ⇒ (i) is complete and so is the proof of the Proposition. ∎
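For matrices, the equivalences above say that an isometry of K^n is exactly a unitary (or, over R, orthogonal) matrix. A small numerical sketch (our addition, assuming Python with NumPy; the rotation is an illustrative example):

    import numpy as np

    theta = 0.7
    f = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])     # a rotation: an isometry of R^2

    v = np.array([3.0, -4.0])
    assert np.isclose(np.linalg.norm(f @ v), np.linalg.norm(v))   # ||f v|| = ||v||
    assert np.allclose(f.T @ f, np.eye(2))                        # f^ad f = id, so f^ad = f^{-1}
    # Over C the eigenvalues of an isometry lie on the unit circle.
    assert np.allclose(np.abs(np.linalg.eigvals(f)), 1.0)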
The structure of isometries was given essentially by the spectral theorems in Sect.
10.6 since isometries are a subset of normal operators. However, here we have to
distinguish between .C- and .R-vector spaces. This is also clear with the following
two corollaries.
||f v|| = ||v||   ∀ v ∈ U,
so f is an isometry. ∎
Proof ⇒
If V has an orthonormal basis (c_1, ..., c_n) of eigenvectors, f c_j = λ_j c_j, and (Theorem 10.2) the eigenvalues are real (λ_j ∈ R), it follows for every j ∈ I(n):
It is easier to deal with operators in complex vector spaces than in real vector spaces.
As we already saw, over real vector spaces, only in the very special case where an
operator whose characteristic polynomial splits into linear factors, is it possible to
proceed analogously to the case of complex vector spaces. Therefore, we generally
expect the structure of normal operators in a real vector space to be quite different
from the corresponding case in a complex vector space. In contrast to the complex
spectral theorem, a rotation on a two-dimensional real vector space is generally not
diagonalizable. However, as we know, when going from R to C and even from Rⁿ to Cⁿ, there are connections between real and complex vector spaces. These connections can be exploited; we first recall the construction of C = R_c from R:
R_c ≅ C ≅ R × R = {z = (ζ_0, ζ_1) : ζ_0, ζ_1 ∈ R}.
With
ζ_0, ζ_1, ξ_0, ξ_1 ∈ R   and   i := (0, 1),
we obtain:
C × R_c → R_c,
(z, x) ↦ zx := (ζ_0 ξ_0 - ζ_1 ξ_1, ζ_0 ξ_1 + ζ_1 ξ_0),
C = R_c = R × R = R + iR.
We are ready to generalize the above formalism to a real vector space .U . We hope
that the attentive reader will also accept our choice of the index .(0, 1) instead of
(1, 2); we write (ζ_0, ζ_1) instead of (ζ_1, ζ_2)!
C × U_c → U_c,
(ζ_0 + iζ_1, u_0 + iu_1) ↦ (ζ_0 + iζ_1)(u_0 + iu_1) := ζ_0 u_0 - ζ_1 u_1 + i(ζ_0 u_1 + ζ_1 u_0),
ζ_0, ζ_1 ∈ R and u_0, u_1 ∈ U.
Proof We have only to test that the list . B is linearly independent and spanning. Set-
ting .bs λs = 0, with .λs := λs0 + iλs1 , λs0 , λs1 ∈ R, s ∈ I (n), we obtain .bs (λs0 + iλs1 ) =
bs λs0 + ibs λs1 = 0. This means that .bs λs0 = 0 and .bs λs1 = 0.
From the linear independence of . B in .U , we obtain .λs0 = 0, λs1 = 0 and so .λs = 0
for all .s ∈ I (n), and . B is linearly independent in .Uc .
B is also spanning in U_c: If v ∈ U_c, v = (v_0, v_1) ∈ U × U, we have v_0 = b_s λ^s_0 and v_1 = b_s λ^s_1 with λ^s_0, λ^s_1 ∈ R, and therefore v = v_0 + iv_1 = b_s (λ^s_0 + iλ^s_1). ∎
The extension of .U to .Uc = U + iU leads as expected to the question of whether
one can extend maps . f : U → V on real vector spaces to maps . f c : Uc → Vc on
their complexifications. Indeed, one can, with the following definition.
f (u 0 + iu 1 ) := f u 0 + i f u 1 .
. c
. c f ≡ f × f : U × U −→ V × V
(u 0 , u 1 ) |−→ ( f u 0 , f u 1 ).
= f (λ0 u 0 − λ1 u 1 ) + i f (λ0 u 1 + λ1 u 0 )
= λ0 f u 0 − λ1 f u 1 + i(λ0 f u 1 + λ1 f u 0 )
= (λ0 + iλ1 )( f u 0 + i f u 1 )
= λ f (v).
A : Rⁿ → Rⁿ,  u⃗ ↦ Au⃗,
A_c : Cⁿ → Cⁿ,  u⃗_0 + iu⃗_1 ↦ A_c(u⃗_0 + iu⃗_1) := Au⃗_0 + iAu⃗_1.
This is obviously what we would do intuitively: Science is nothing but intuition that
became rational.
In this context, the question of how to represent . f c can easily be answered: If we
have . f B B = F ∈ Rn×n , then .( f c ) B B = F again. This follows from the fact that, as
the above proposition shows, any basis of .U is also a basis of .Uc . With the above
notation we have:
A : R² → R²,  u⃗ ↦ Au⃗,
has no eigenvalues and the spectrum (i.e., the set of eigenvalues) of A is empty: σ(A) = ∅. The complexification operator is given by
A_c : C² → C²,  v⃗ ↦ A_c v⃗ := Av⃗.
This is not the case for every operator on a real vector space. But the existence of
an eigenvalue in the complex setting guarantees at least the presence of an invariant,
one or two-dimensional invariant subspace corresponding to the subspace of the real
operator. This is the content of the following proposition.
Proof If V is a real vector space, f ∈ End(V) and f_c ∈ End(V_c), then there exists an eigenvalue λ ∈ C of f_c:
f_c v = λv.
Writing λ = λ_0 + iλ_1 and v = u_0 + iu_1 (with λ_0, λ_1 ∈ R and u_0, u_1 ∈ V), this leads to
f u_0 = λ_0 u_0 - λ_1 u_1   and   f u_1 = λ_0 u_1 + λ_1 u_0.
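These two relations can be observed numerically: a complex eigenvector of the complexification yields, via its real and imaginary parts, a one- or two-dimensional f-invariant real subspace. A small sketch (our addition, assuming Python with NumPy; the sample matrix is illustrative):

    import numpy as np

    A = np.array([[0.0, -2.0],
                  [1.0,  0.0]])                   # real matrix without real eigenvalues

    # Complex eigenpair of the complexified operator A_c.
    eigenvalues, V = np.linalg.eig(A)
    lam, v = eigenvalues[0], V[:, 0]
    u0, u1 = v.real, v.imag                       # v = u0 + i u1
    lam0, lam1 = lam.real, lam.imag               # lambda = lambda0 + i lambda1

    # A u0 and A u1 stay inside span{u0, u1}, with the coefficients of the proposition.
    assert np.allclose(A @ u0, lam0 * u0 - lam1 * u1)
    assert np.allclose(A @ u1, lam0 * u1 + lam1 * u0)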
We can now give a complete description of normal operators on real vector spaces.
The prerequisite for the notion of the normal operator is, of course, the existence of
an inner product vector space. So here, we are dealing with normal operators on a
Euclidean vector space. We also know from the previous section that every operator
on a real or a complex vector space has an invariant subspace of dimension 1 or 2.
Therefore, we must first give a complete description of normal operators in
dimensions 1 and 2. For that purpose, and to prepare the following procedure, it is
advantageous to first discuss the very pleasant properties of normal operators
in real and complex vector spaces, in the context of their restrictions on invariant
subspaces. The following two propositions illustrate this.
Proof Given

$F = \begin{bmatrix} A & C \\ 0 & D \end{bmatrix} \quad\text{and}\quad F^\dagger = \begin{bmatrix} A^\dagger & 0 \\ C^\dagger & D^\dagger \end{bmatrix},$

we have

$F^\dagger F = \begin{bmatrix} A^\dagger & 0 \\ C^\dagger & D^\dagger \end{bmatrix}\begin{bmatrix} A & C \\ 0 & D \end{bmatrix} = \begin{bmatrix} A^\dagger A & A^\dagger C \\ C^\dagger A & C^\dagger C + D^\dagger D \end{bmatrix}$

and

$F F^\dagger = \begin{bmatrix} A & C \\ 0 & D \end{bmatrix}\begin{bmatrix} A^\dagger & 0 \\ C^\dagger & D^\dagger \end{bmatrix} = \begin{bmatrix} A A^\dagger + C C^\dagger & C D^\dagger \\ D C^\dagger & D D^\dagger \end{bmatrix}.$

If $F$ is normal, then

$F^\dagger F = F F^\dagger.$
Comparing the upper left blocks gives $A^\dagger A = A A^\dagger + C C^\dagger$.
The expression $\operatorname{tr}(A^\dagger A)$ is very interesting: firstly, we have the symmetry equation $\operatorname{tr}(A^\dagger A) =
\operatorname{tr}(A A^\dagger)$, and secondly $\operatorname{tr}(A^\dagger A)$ is a sum of squares, so we have

$\operatorname{tr}(A^\dagger A) = \sum_{s,i} |\alpha_{si}|^2. \qquad (11.3)$

Taking the trace of $A^\dagger A = A A^\dagger + C C^\dagger$ therefore leads to

$\operatorname{tr}(C C^\dagger) = 0. \qquad (11.5)$

Since $\operatorname{tr}(C C^\dagger)$ is again a sum of squares, this gives $C = 0$, so $F$ is block diagonal. ∎
Proof (i)
Let $C_r := (c_1, \ldots, c_r)$ be an orthonormal basis of $U$. We extend $C_r$ to an orthonormal
basis $C$ of $V$, $C = (C_r, B_s) = (c_1, \ldots, c_r, b_1, \ldots, b_s)$, so that $r + s = n = \dim V$.
Then $B_s$, being orthogonal to $U$, is a basis of $U^\perp$. Since $U$ is $f$-invariant, the
representation of $f$ with respect to the basis $C$ is given by the block matrix $f_C \equiv F$:

$F = \begin{bmatrix} F_1 & F_2 \\ 0 & F_3 \end{bmatrix}.$
$\big((f|_U)^{\mathrm{ad}} u_2 \,\big|\, u_1\big) = \big(u_2 \,\big|\, f|_U\, u_1\big) = \big(u_2 \,\big|\, f u_1\big) = \big(f^{\mathrm{ad}} u_2 \,\big|\, u_1\big) = \big(f^{\mathrm{ad}}|_U(u_2) \,\big|\, u_1\big),$

so that $(f|_U)^{\mathrm{ad}} = f^{\mathrm{ad}}|_U =: f^{\mathrm{ad}}_U$. Writing $f_U := f|_U$, we obtain

$f_U^{\mathrm{ad}} f_U = (f^{\mathrm{ad}})_U f_U = (f^{\mathrm{ad}} f)_U,$

and analogously $f_U f_U^{\mathrm{ad}} = (f f^{\mathrm{ad}})_U$, so the restriction $f_U$ of a normal operator $f$ to an invariant subspace $U$ is again normal.
This completes our preparation for normal operators. As we know from the last
section, every normal operator in a Euclidean vector space has a one-dimensional or a
two-dimensional invariant subspace. We will now determine the structure of normal operators, starting
with these low-dimensional cases.
Nothing needs to be said in the one-dimensional case because, in this case, every
operator is a normal operator and invariant subspace here means eigenspace. The
two-dimensional case is not quite trivial. It is clarified in the following proposition.
Proof For any orthonormal basis $C = (c_1, c_2)$, the matrices $f_C \equiv F$ and $f_C^{\mathrm{ad}} \equiv F^T$
are given by

$F = \begin{bmatrix} \alpha & \gamma \\ \beta & \delta \end{bmatrix} \quad\text{and}\quad F^T = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}.$

The normality condition $F^T F = F F^T$ gives, comparing entries,

$\alpha\beta + \gamma\delta = \alpha\gamma + \beta\delta \qquad (11.7)$

and

$\gamma^2 = \beta^2, \quad\text{so}\quad \gamma = \pm\beta.$
After having obtained all the above results, we expect that a normal operator $f$ on a
real inner product vector space will have an orthogonal decomposition, consisting of
normal operators restricted to one-dimensional or two-dimensional Euclidean vector
spaces. This means that we will have an orthogonal decomposition of $V$, for which we use
the symbol "$\Theta$":
$V = U_1 \,\Theta \cdots \Theta\, U_i \,\Theta \cdots \Theta\, U_s, \qquad s \in \mathbb{N},\ i \in I(s),$
$f = f_1 \,\Theta\, f_2 \,\Theta \cdots \Theta\, f_i \,\Theta \cdots \Theta\, f_s,$

with normal $f_i \in \operatorname{End}(U_i)$, and if $\dim U_i = 2$, with $f_i$ of the type given in the previous
proposition. This is given in the next theorem.
orthonormal basis with respect to which the matrix of . f |U ⊥ has the expected block
diagonal form. This basis of .U ⊥ , together with the basis of .U , gives an orthonormal
basis of .V with respect to which the matrix of . f has the form given in the above
theorem. ∎
Modulo reordering the basis vectors of an orthonormal basis .C, we obtain directly
the following corollary.
$F = 1 \,\Theta \cdots \Theta\, 1 \,\Theta\, (-1) \,\Theta \cdots \Theta\, (-1) \,\Theta\, F_1 \,\Theta \cdots \Theta\, F_j \,\Theta \cdots \Theta\, F_r,$

with

$F_j = \begin{bmatrix} \cos\varphi_j & -\sin\varphi_j \\ \sin\varphi_j & \cos\varphi_j \end{bmatrix}.$
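To make the block structure concrete, here is a small sketch (our illustration, using SciPy's block_diag helper, not code from the text) that assembles an orthogonal matrix of exactly this form from a choice of signs and rotation angles, and checks that it is indeed orthogonal.

import numpy as np
from scipy.linalg import block_diag   # convenient for building block-diagonal matrices

def rotation_block(phi):
    """2x2 rotation block F_j as in the corollary."""
    return np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])

# Example: two eigenvalues +1, one eigenvalue -1, and two rotation blocks.
F = block_diag(1.0, 1.0, -1.0, rotation_block(0.3), rotation_block(1.2))

# F is orthogonal: F^T F = identity.
assert np.allclose(F.T @ F, np.eye(F.shape[0]))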
Summary
In this chapter, we first briefly discussed the nicest operators in mathematics: non-
negative operators and isometries. Both are special normal operators and therefore
subjects of the spectral theorem.
Our main concern was to examine operators in real vector spaces, and especially
normal operators in real inner product spaces.
Here, the question of diagonalization was much more challenging than in complex
vector spaces.
The process of complexification was extensively discussed. This approach allows
us to relate the question of diagonalization to the known results in complex vector
spaces. In this way, the spectral theorem was also applied to real vector spaces.
Exercise 11.1 Show that the sum of two positive operators is a positive operator.
Exercise 11.3 Show that a nonnegative operator in a vector space .V has a unique
nonnegative square root.
Exercise 11.5 Let $F$ be a matrix $F \in \mathbb{R}^{n\times n}$ and $\varphi_F : \operatorname{Sym}(n) \to \mathbb{R}^{n\times n}$ be the linear
map given by $\varphi_F(S) := F^T S F$. Prove the following assertions:
(i) $\varphi_F \in \operatorname{End}(\operatorname{Sym}(n))$;
(ii) .φ F is bijective if and only if . F is invertible.
Exercise 11.6 Consider the function $f(\vec{x}) = \vec{x}^{\,T} S \vec{x}$ to show that the matrix
$S = \begin{bmatrix} \alpha & \beta \\ \beta & \delta \end{bmatrix} \in \operatorname{Sym}(2)$ is positive definite if and only if $\alpha > 0$ and $\alpha\delta - \beta^2 > 0$.
Exercise 11.7 Let . S = (σis ) ∈ Sym(n) with .i, s ∈ I (n). Show that the following
conditions are necessary for . S to be positive definite.
(i) .σii > 0;
(ii) .σii σss − σis2 > 0 for every .i < s.
Exercise 11.8 Let . S ∈ Sym(n) and . F ∈ Gl(n). Show that the following assertions
are equivalent.
(i) . S is positive definite;
(ii) . F T S F is positive definite.
Exercise 11.10 For $S = (\sigma_{is})$ with $i, s \in I(n)$ and $S \in \operatorname{Sym}(n)$, show that for any
$m \in \mathbb{N}$ the following assertions hold.
The next exercise shows again the analogy between real numbers and self-
adjoint operators. We have $x^2 + 2\beta x + \gamma = (x + \beta)^2 + \gamma - \beta^2 > 0$ if $\gamma -
\beta^2 > 0$. We may expect a similar relation with a self-adjoint operator $f$ in place of
the real number $x$.
is invertible.
In the following three subsections, we will discuss some of the most important special
cases and applications of standard operators and linear maps, generally. We start with
orthogonal operators, isometries in real vector spaces, including reflections.
Next, we return to linear maps, .Hom(V, V ' ), on inner product vector spaces and
explain in some detail the role of the singular value decomposition (SVD). This leads
smoothly to the polar decomposition of square matrices.
We finally discuss Sylvester's law of inertia and briefly investigate its con-
nection with special relativity.
Prime examples of operators include what we usually call rotations and reflections.
Since we have described rotations in the previous Chap. 11, we now broaden our
study to reflections too. In two dimensions, the reader ought to be familiar with
both. In higher dimensions, the story becomes a little less straightforward. Some
important observables in physics are connected mainly with reflections, such as in
quantum mechanics and elementary particle physics. It is interesting to notice that
some observables which are described by reflections are involved in some of
the still unsolved problems in physics. As an example, we mention observables which
are connected with the CPT theorem.
We start, as in the previous sections, with the two-dimensional case since this
shows all the essential geometric properties that also appear in higher dimensions.
Isometries are special cases of normal operators. Nontrivial normal operators in
two dimensions on real vector spaces have the form

$\begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix}, \quad\text{with}\ \alpha, \beta \in \mathbb{R},$
as given in Theorem 11.1. Isometries are normal operators of this form with only one
additional condition: $\alpha^2 + \beta^2 = 1$, which means we can find some $\varphi \in \mathbb{R}$ such that
$\alpha = \cos\varphi$ and $\beta = \sin\varphi$, and so the matrix takes the form

$A \equiv A(\varphi) = \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}, \qquad \varphi \in \mathbb{R}.$

The corresponding operator

$A : \mathbb{R}^2 \longrightarrow \mathbb{R}^2,\quad \vec{u} \longmapsto A(\varphi)\vec{u},$

is a rotation by the angle $\varphi$. Recall the orthogonal and special orthogonal groups
$O(2) = \{A \in \mathbb{R}^{2\times 2} : A^T A = 1_2\}$ and $SO(2) = \{A \in O(2) : \det A = 1\}$.
These sets, $O(2)$ and $SO(2)$, are subgroups of $Gl(2, \mathbb{R}) \equiv Gl(2)$ and we have the
relation $SO(2) < O(2) < Gl(2)$. If we consider the elements of the group $SO(2)$
as points $(\cos\varphi, \sin\varphi)$ in $\mathbb{R}^2$, we see the equivalence $SO(2) \cong S^1$. As for the rest of $O(2)$: since
$A \in O(2)$ implies

$1 = \det(A^T A) = \det(A^T)\det(A) = \det(A)^2,$

we see that $\det A = \pm 1$ and

$O(2) - SO(2) = \{B \in O(2) : \det B = -1\}.$
A typical element of $O(2) - SO(2)$ is

$B \equiv B(\varphi) = \begin{bmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{bmatrix}.$

The main difference between $A$ and $B$ is that $B$ is a symmetric matrix and is therefore
diagonalizable.
The points of . H are the fixed points of . B. It is interesting to see that . B factorizes!
$B = \begin{bmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{bmatrix} = \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$

The matrix $S := \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ is a reflection with the eigenvectors $e_1$ and $e_2$ and of course
$\det S = -1$. So the group $O(2)$ can be described as $O(2) = SO(2) \cup SO(2)\,S$.
For a vector $a \neq 0$, the orthogonal projection onto $\operatorname{span}(a)$ is given by

$P_a = \frac{|a)(a|}{(a|a)} \equiv \frac{a a^T}{a^T a},$

and the corresponding reflection is

$S_a := \operatorname{id} - 2\frac{|a)(a|}{(a|a)} \equiv \operatorname{id} - 2P_a.$

So we have, for the matrix $S$ above,

$S = \operatorname{id} - 2|e_2)(e_2|.$
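The formula $S_a = \operatorname{id} - 2P_a$ is easy to check numerically. The following NumPy sketch (an illustration with our own naming, not code from the text) builds the reflection matrix for an arbitrary $a \in \mathbb{R}^n$ and verifies that it is symmetric, orthogonal, involutive, and fixes every vector orthogonal to $a$.

import numpy as np

def reflection(a):
    """Reflection S_a = id - 2 a a^T / (a^T a) across the hyperplane orthogonal to a."""
    a = np.asarray(a, dtype=float)
    n = a.size
    P = np.outer(a, a) / (a @ a)      # orthogonal projection onto span(a)
    return np.eye(n) - 2.0 * P

a = np.array([1.0, 2.0, 2.0])
S = reflection(a)

assert np.allclose(S, S.T)                     # symmetric
assert np.allclose(S @ S.T, np.eye(3))         # orthogonal
assert np.allclose(S @ S, np.eye(3))           # involutive: S^2 = id
assert np.allclose(S @ a, -a)                  # a is reflected to -a
w = np.array([2.0, -1.0, 0.0])                 # w is orthogonal to a
assert np.allclose(S @ w, w)                   # vectors in the hyperplane are fixed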
At this point, it is also interesting to ask what happens when we compose two reflec-
tions $S_b S_a$ with $b \neq \pm a$. It turns out that $S_b S_a$ is a rotation. The fixed points of
$S_b S_a$ are given by $H_a \cap H_b = \{0\}$! Since $S_b S_a$ is an orthogonal operator and its only
fixed point is zero, $S_b S_a$ must be a rotation. An explicit calculation can also show
this:

$B(\varphi)B(\psi) = \begin{bmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{bmatrix}\begin{bmatrix} \cos\psi & \sin\psi \\ \sin\psi & -\cos\psi \end{bmatrix} = \begin{bmatrix} \cos(\varphi - \psi) & -\sin(\varphi - \psi) \\ \sin(\varphi - \psi) & \cos(\varphi - \psi) \end{bmatrix} = A(\varphi - \psi).$
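The identity $B(\varphi)B(\psi) = A(\varphi - \psi)$, and the (non)commutativity of $SO(2)$ and $O(2)$, can be verified numerically with the following small sketch (an illustration of ours, not from the text).

import numpy as np

def A(phi):   # rotation
    return np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])

def B(phi):   # reflection
    return np.array([[np.cos(phi),  np.sin(phi)],
                     [np.sin(phi), -np.cos(phi)]])

phi, psi = 0.7, 0.2
# The product of two reflections is the rotation by phi - psi.
assert np.allclose(B(phi) @ B(psi), A(phi - psi))
# Rotations commute; reflections (and hence O(2) as a whole) do not.
assert np.allclose(A(phi) @ A(psi), A(psi) @ A(phi))
assert not np.allclose(B(phi) @ B(psi), B(psi) @ B(phi))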
As we now see, we can express every rotation in .R2 by a composition of two reflec-
tions. As shown in the above equation, a possibly less pleasant result is that the group
. O(2) is a non-abelian group (noncommutative). Nevertheless, its subgroup . S O(2)
is a commutative group: . A(ϕ)A(ψ) = A(ϕ + ψ).
All the above results can be generalized to any dimension $n$, in particular
Theorem 11.1 of the previous section. The following formulation is slightly different
from Corollary 11.4:
then

$V = U \,\Theta\, H \quad\text{and}\quad s_a|_H = \operatorname{id}_H \quad\text{with}\quad s_a|_U = -\operatorname{id}_U.$

This means that all vectors $w \in H$ are fixed points of $s_a$: $s_a(w) = w$. The
map $s_a$ describes a reflection of all vectors $u \in U$ through zero and a reflection of the
vectors $v \in V - H$ across the hyperplane $H$. Hence $s_a$ is a symmetric, involutive, and
orthogonal operator on $V$. We are coming now to the proof.
Proof For the four different points of Proposition 12.1, an explicit calculation leads to

(i): $s_a(a) = a - 2\dfrac{(a|a)}{(a|a)}a = a - 2a = -a.$

(ii): $(w|s_a(v)) = (w|v) - 2\dfrac{(a|v)(a|w)}{(a|a)}.$ This is explicitly symmetric in $v, w$.

(iii):
$s_a(s_a(v)) = s_a(v) - 2\dfrac{(a|s_a v)}{(a|a)}a$
$\qquad = v - 2\dfrac{(a|v)}{(a|a)}a - \dfrac{2}{(a|a)}\Big[(a|v) - 2\dfrac{(a|v)(a|a)}{(a|a)}\Big]a$
$\qquad = v - 2\dfrac{(a|v)}{(a|a)}a - 2\dfrac{(a|v)}{(a|a)}a + 4\dfrac{(a|v)}{(a|a)}a = v,$ utilizing (ii).
$f : S^{(n-1)}(r) \longrightarrow S^{(n-1)}(r),\quad v \longmapsto f(v),$

since $\|f(v)\| = \|v\| = r$.
The next proposition shows a key property of reflections that determines the role
of reflections within the orthogonal operators. A reflection can connect two distinct
vectors on a sphere:
Proof We choose the reflection with $a = u - v$ and we observe that, since $\|u\| = \|v\|$, we have $(u - v|u) =
\tfrac{1}{2}\|u - v\|^2 = \tfrac{1}{2}(u - v|u - v)$. So we have

$s_{u-v}(u) = u - 2\dfrac{(u - v|u)}{(u - v|u - v)}(u - v) = u - 2\dfrac{\tfrac{1}{2}(u - v|u - v)}{(u - v|u - v)}(u - v) = u - (u - v) = v.$
Since $\dim W = n - 1$, we can apply the induction hypothesis: on $W$ we need at most
$n - 1$ reflections,

$g|_W = s_{b_r} \circ \cdots \circ s_{b_1}, \qquad r \leq n - 1.$

This leads to

$f = s_a \circ g = s_a \circ s_{b_r} \circ \cdots \circ s_{b_1}.$
In this section, we return to general linear maps $f \in \operatorname{Hom}(V, V')$ on inner product
vector spaces. We discover that if we take not one but two special orthonormal
bases, we obtain extremely pleasant results.
Up to this point, we have had frequent opportunities to see the importance of
bases for understanding the algebraic and geometric structure of linear maps. In this
section, we realize this fact once more. Considering a linear map $f : V \to V'$, the
choice of tailor-made bases $B = [b_1 \ldots b_n]$ and $B' = [b'_1 \ldots b'_m]$ for $V$ and $V'$ (of
dimension $n$ and $m$, respectively) led us in Chap. 3 to the equation

$f B = B' \Sigma_1$

and to a matrix representation of $f$ of the form $f_{B'B} \equiv \Sigma_1 = \begin{bmatrix} 1_r & 0 \\ 0 & 0 \end{bmatrix}$, which is the
normal form.
The matrix .∑1 is as simple as possible. The number .r , the rank of . f , is an impor-
tant geometric property. But by choosing such perfect tailor-made bases, we lost
some other important geometric properties of . f . In the case of the endomorphisms
$f \in \operatorname{Hom}(V, V) \equiv \operatorname{End}(V)$, since we consider only one vector space $V$, it seems
unnatural to consider more than one basis $B$, and this makes such a classification much more difficult.
For the privileged, diagonalizable endomorphisms there exists an eigenbasis $B^\circ = [b^\circ_1 \ldots b^\circ_n]$ with

$f b^\circ_s = \lambda_s b^\circ_s.$
Apart from this, there are even more privileged endomorphisms, the normal oper-
ators. As we know from the spectral theorems, the tailor-made bases are given by
the orthonormal bases .C = [c1 , . . . , cn ]. And one of the results was that these are
the only linear maps that are orthogonally diagonalizable:
$f C = C \Delta.$
There is no doubt that normal operators are the “nicest” operators there are! It is
remarkable that in physical theories like quantum mechanics, they are essentially the
only ones we need. But from a mathematical point of view, there is the question of
universality. We would like to have a normal form for all endomorphisms! To achieve
this, we have to go one step back and try to use two bases for one vector space! This
will lead us to what we call singular value decomposition (SVD) which is applicable
to all . f ∈ End(V ) over real or complex inner product spaces and by construction to
all . f ∈ Hom(V, V ' ).
The appropriate tailor-made bases for . f are two orthonormal bases that are con-
nected to the two diagonalizable self-adjoint operators . f ad f and . f f ad . The use of
orthonormal bases instead of general bases preserves the information about eigen-
values even when considering general linear maps . f ∈ Hom(V, V ' ).
This is why we discuss the SVD for . f ∈ Hom(V, V ' ), the special case . f ∈
End(V ) is completely trivially included in the general situation. As we shall see,
a kind of miracle makes the whole procedure possible: The map . f essentially trans-
forms the eigenvectors, the positive part of the eigenbasis of . f ad f into the eigen-
vectors of . f f ad , and we are led to the following theorem:
Let $V, V'$ be inner product spaces of dimensions $n, m$ respectively and
$f : V \to V'$ a linear map of rank $r$. Then there are orthonormal bases $U =
[u_1 \ldots u_n]$ of $V$ and $W = [w_1 \ldots w_m]$ of $V'$ and positive scalars $\sigma_1 \geq \sigma_2 \geq
\cdots \geq \sigma_r > 0$, the so-called singular values of $f$, such that

$f(u_s) = \sigma_s w_s \ \text{ if } s \in I(r) \quad\text{and}\quad f(u_s) = 0 \ \text{ if } s > r.$

This can be written as

$f\,U = W \Sigma \qquad (12.1)$

or equivalently by

$f = W \Sigma\, U^{\mathrm{ad}}. \qquad (12.2)$

If we use the same letters $U$ and $W$ for the corresponding bases in $\mathbb{K}^n$ and $\mathbb{K}^m$, we have
the unitary (orthogonal) matrices

$U = [\vec{u}_1 \ldots \vec{u}_n] \quad\text{and}\quad W = [\vec{w}_1 \ldots \vec{w}_m].$

At the level of matrices, with $F \in \mathbb{K}^{m\times n}$ representing $f$, this reads

$F U = W \Sigma \qquad (12.4)$

$F = W \Sigma\, U^\dagger. \qquad (12.5)$
$f = \sigma_1 w_1 u_1^\dagger + \cdots + \sigma_r w_r u_r^\dagger, \qquad (12.6)$
$F = \sigma_1 \vec{w}_1 \vec{u}_1^{\,\dagger} + \cdots + \sigma_r \vec{w}_r \vec{u}_r^{\,\dagger} \quad\text{or equivalently} \qquad (12.7)$
$F = \sigma_1 |w_1)(u_1| + \cdots + \sigma_r |w_r)(u_r|. \qquad (12.8)$
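The rank-one expansion of Eqs. (12.6)-(12.8) can be checked directly with NumPy's built-in SVD; the following sketch (our illustration, with our own variable names) also verifies that the right singular vectors are eigenvectors of $F^T F$.

import numpy as np

# A real 3x2 matrix of rank 2 (a stand-in for the map f: V -> V').
F = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

W, sigma, Uh = np.linalg.svd(F, full_matrices=True)   # F = W @ diag(sigma) @ Uh, Uh = U^dagger

# Rank-one expansion F = sum_s sigma_s |w_s)(u_s|, cf. Eqs. (12.6)-(12.8).
F_rebuilt = sum(sigma[s] * np.outer(W[:, s], Uh[s, :]) for s in range(len(sigma)))
assert np.allclose(F, F_rebuilt)

# The u_s are eigenvectors of F^T F with eigenvalues sigma_s^2.
assert np.allclose(F.T @ F @ Uh[0, :], sigma[0]**2 * Uh[0, :])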
$\lambda_s \neq 0 \ \text{ if } s \in I(r) \quad\text{and}\quad \lambda_s = 0 \ \text{ if } s > r,$

$f u_s = \sigma_s w_s \quad \text{if } s \in I(r) \qquad (12.10)$
$\text{and}\quad f u_s = 0 \quad \text{if } s > r. \qquad (12.11)$

It turns out that $(w_s)_r$ is an orthonormal basis of $\operatorname{im} f = \operatorname{span}(w_1, \ldots, w_r) < V'$:
When $s, t \in I(r)$, we have

$(\sigma_t w_t | \sigma_s w_s) = (f u_t | f u_s) = (f^{\mathrm{ad}} f u_t | u_s). \qquad (12.12)$

This leads to

$\bar{\sigma}_t \sigma_s (w_t | w_s) = (\sigma_t^2 u_t | u_s) = \sigma_t^2 (u_t | u_s) = \sigma_t^2 \delta_{ts} \qquad (12.13)$

and so

$(w_t | w_s) = \delta_{ts} \qquad (12.14)$

if $s, t \in I(r)$, as above.
If $s > r$, since $f u_s = 0 \Leftrightarrow f^{\mathrm{ad}} f u_s = 0$, we have from $f^{\mathrm{ad}} f u_s = \lambda_s u_s = 0$ that $\lambda_s =
0$ and $\sigma_s = 0$, and also $f u_s = 0$. ∎
for $s > r$.
This means that

$f^{\mathrm{ad}} w_t = u_s (u_s | f^{\mathrm{ad}} w_t) = u_s (f u_s | w_t) = u_s (\sigma_s w_s | w_t) = \sigma_s u_s (w_s | w_t) = \sigma_s u_s \delta_{st}, \qquad (12.18)$

so that

$f^{\mathrm{ad}} w_s = \sigma_s u_s. \qquad (12.19)$

We test:

$f^{\mathrm{ad}} f u_s = \sigma_s (f^{\mathrm{ad}} w_s) = \sigma_s (\sigma_s u_s) = \sigma_s^2 u_s. \qquad (12.21)$
The singular value decomposition (SVD) can be used to give a direct proof of the
polar decomposition (see below) of an endomorphism or a square matrix. The polar
decomposition gives a factorization of every square matrix as a product of a unitary
and a nonnegative matrix. This means that we can express every square matrix by two
special normal operators which we can completely describe and understand by the
spectral theorems. In addition, the analogy to complex numbers which was explained
in Sect. 10.5 appears again: Just as we have $z = e^{i\varphi}|z|$ with $\varphi \in \mathbb{R}$ for every $z \in \mathbb{C}$, so
we have $A = Q P$ for every $A \in \mathbb{K}^{n\times n}$, with $Q$ unitary and $P$ a nonnegative
matrix. The analogy $e^{i\varphi} \sim Q$ and $|z| \sim P$ should be clear. This leads to the following
result.
Proof Using the SVD as in the above proposition for matrices, we can express $A$ by
the equation

$A = W \Sigma\, U^\dagger.$

$U$ and $W$ are unitary matrices and $\Sigma$ is a diagonal matrix with nonnegative entries.
Using $U^\dagger U = 1_n$, we obtain

$A = W 1_n \Sigma\, U^\dagger = W U^\dagger (U \Sigma\, U^\dagger).$

With $Q := W U^\dagger$ unitary and $P := U \Sigma\, U^\dagger$ nonnegative, this is

$A = Q P,$

which is assertion a.
To prove assertion b of Proposition 12.3, we assume that

$A = Q P = Q_0 P_0,$

again with $Q_0$ unitary and $P_0$ nonnegative. Now, since $A$ is invertible, $P$ and $P_0$ are
also invertible and therefore positive.
Consider

$A^\dagger A = (Q P)^\dagger Q P = P^\dagger Q^\dagger Q P = P Q^\dagger Q P = P^2$

and

$A^\dagger A = (Q_0 P_0)^\dagger Q_0 P_0 = P_0^2.$
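The construction in the proof translates almost literally into NumPy; the following sketch (our illustration, not from the text) computes $A = QP$ from the SVD and checks the claimed properties.

import numpy as np

def polar_decomposition(A):
    """Polar decomposition A = Q P with Q unitary and P nonnegative, built from the SVD."""
    W, sigma, Uh = np.linalg.svd(A)
    Q = W @ Uh                                  # Q = W U^dagger, unitary
    P = Uh.conj().T @ np.diag(sigma) @ Uh       # P = U Sigma U^dagger, nonnegative
    return Q, P

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
Q, P = polar_decomposition(A)

assert np.allclose(Q @ P, A)
assert np.allclose(Q.conj().T @ Q, np.eye(2))            # Q is unitary (orthogonal here)
assert np.all(np.linalg.eigvalsh(P) >= -1e-12)           # P is nonnegative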
If you did not want to understand the structure of spacetime, you would probably not
need, as a physicist, to read this section. If Einstein had not discovered the theory of
relativity, the problem discussed in this section would not be so relevant for physical
investigations. In particular, as we have special relativity, we may ask how many
essentially different scalar products a real vector space can carry. We consider a symmetric bilinear form
$s : V \times V \longrightarrow \mathbb{R},\quad (u, v) \longmapsto s(u, v).$
We set $\tau_{\mu\nu} \equiv \tau_{\nu\mu}$ for comparison with the usual notation in the mathematical literature.
At the level of matrices, we find
$S_B = T_{CB}^{\,T} S_C T_{CB} \equiv T^T S_C T \quad\text{or equivalently} \qquad (12.25)$
$S_C = T_{BC}^{\,T} S_B T_{BC} \equiv (T^{-1})^T S_B T^{-1}. \qquad (12.26)$

$f_{BB} = T_{BC} f_{CC} T_{CB} = T^{-1} f_{CC} T \quad\text{or equivalently} \qquad (12.27)$
$f_{CC} = T_{CB} f_{BB} T_{BC} = T f_{BB} T^{-1}. \qquad (12.28)$
As one can see, the difference is the .T −1 that appears for endomorphism transforma-
tions. This means that if we know precisely what a given matrix represents, we know
its transformation law. This is extremely important to know, not only in relativity but
across physics. It is useful to give or recall some further definitions to proceed.
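Before turning to those definitions, the different transformation behavior just described can be checked numerically; the following NumPy sketch (our illustration, with our own variable names) contrasts the congruence rule for bilinear forms with the similarity rule for endomorphisms.

import numpy as np

rng = np.random.default_rng(0)
n = 3
T = rng.normal(size=(n, n))            # change-of-basis matrix T = T_CB (invertible with probability 1)
S_C = np.diag([1.0, 1.0, -1.0])        # a symmetric bilinear form in the basis C
F_C = rng.normal(size=(n, n))          # an endomorphism in the basis C

# Bilinear form: S_B = T^T S_C T   (congruence), cf. Eq. (12.25)
S_B = T.T @ S_C @ T
# Endomorphism: F_B = T^{-1} F_C T  (similarity), cf. Eq. (12.27)
F_B = np.linalg.inv(T) @ F_C @ T

# Invariants behave differently: eigenvalues are similarity invariants,
# while under congruence only the signs of the eigenvalues (the signature) survive.
print(np.linalg.eigvals(F_C), np.linalg.eigvals(F_B))       # same spectra
print(np.linalg.eigvalsh(S_C), np.linalg.eigvalsh(S_B))     # same signs, different values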
We observed above an equivalence relation corresponding to symmetric bilinear
forms. This is usually called congruence: The matrices . A and . B are congruent if an
invertible matrix .G ∈ Gl(m) exists so that . A = G T BG holds. In this sense, . S B and
. SC are congruent. The rank is mainly defined for a linear map, but it may also be
defined for a symmetric bilinear form:
$\operatorname{rank}(s) = \operatorname{rank}(S_B).$
We say $s$ is degenerate if there is some $v \in V \setminus \{0\}$ such that $s(v, u) = 0$ for all
$u \in V$. Otherwise, we say $s$ is nondegenerate.
The subspace $U_0$ of $V$ given by $U_0 := \{v \in V : s(v, u) = 0 \ \text{for all}\ u \in V\}$ collects
exactly these degenerate directions.
The analogous definition is valid for a symmetric or a Hermitian matrix; in an obvious
notation we have to replace $s(v, v)$ or $s(u, u)$ by $\vec{v}^{\,\dagger} S \vec{v}$ and $\vec{u}^{\,\dagger} S \vec{u}$.
We are now returning to our classification problem for symmetric bilinear forms
which corresponds, in the case of spacetime, to the classification of all theoretically
possible models for a flat (linear) spacetime, that is, for special relativity. Mathe-
matically speaking, we have to consider all possible pairs $(V, s)$ with $s$ not necessarily
positive definite, but possibly only semidefinite or even degenerate. This means we have
to consider also non-Euclidean geometries. As is well known, it took science more
than 2000 years to get to that point!
Here we restrict ourselves, of course, to the mathematical problem and, without
loss of generality and for the sake of simplicity, to the level of matrices.
The first step is to diagonalize a representation, a matrix like $S_B$. The spectral
theorem tells us that we can even use an orthogonal matrix $Q \in O(n)$. This fits very
nicely with the transformation formula for scalar products since whenever $Q \in O(n)$,
$Q^{-1} = Q^T$! So the eigenvalue equations at the level of matrices give us

$S_B Q = Q \Sigma_B \quad\text{and}\quad \Sigma_B = Q^T S_B Q \qquad (12.29)$

with

$\Sigma_B = \begin{bmatrix} \sigma_1^B & & 0 \\ & \ddots & \\ 0 & & \sigma_n^B \end{bmatrix} \quad\text{and}\quad \sigma_\mu^B \in \mathbb{R}. \qquad (12.30)$
the reason for the non-invariance in the transformation formula for scalar products
above (Eq. (12.29)): in a general change of basis $S_B = T^T S_C T$ we have, to the left of $S_C$, the matrix
$T^T$ instead of the matrix $T^{-1}$. This scales the coefficients in $S_C$.
This is clear in the case $\dim V = 1$. The basis is given by $b \in V$ and $b \neq 0$. So we have
$\sigma^B = s(b, b)$. A second basis given by $c \in V$ and $c \neq 0$ with $c = \lambda b$, $\lambda \in \mathbb{R}$, $\lambda \neq 0$,
yields $\sigma^C = s(c, c) = \lambda^2 s(b, b) = \lambda^2 \sigma^B$: the value of the coefficient changes, but its sign does not.
Proof As we see from the above discussion, we may, according to the spectral
theorem, diagonalize the matrix $S = S_B$. We obtain the diagonal matrix $\Sigma_B$ using the
orthogonal matrix $Q \in O(n)$. This corresponds to an ordered orthonormal basis of
eigenvectors, with

$\Sigma_B = \operatorname{diag}\big(\sigma_i^{(+)}, \sigma_j^{(-)}, \sigma_k^{(0)}\big) \equiv \operatorname{diag}\big(\sigma_{(B)i}^{(+)}, \sigma_{(B)j}^{(-)}, \sigma_{(B)k}^{(0)}\big). \qquad (12.32)$

This leads to

$\Sigma_B^0 = \begin{bmatrix} 1_{n_+} & & 0 \\ & -1_{n_-} & \\ 0 & & 0_{n_0} \end{bmatrix}. \qquad (12.34)$

We set

$U_B^{(+)} := \operatorname{span}\big(\vec{q}_1^{\,(+)}, \ldots, \vec{q}_{n_+}^{\,(+)}\big), \quad U_B^{(-)} := \operatorname{span}\big(\vec{q}_1^{\,(-)}, \ldots, \vec{q}_{n_-}^{\,(-)}\big) \quad\text{and} \qquad (12.35)$
$U_B^{(0)} := \operatorname{span}\big(\vec{q}_1^{\,(0)}, \ldots, \vec{q}_{n_0}^{\,(0)}\big). \qquad (12.36)$
It is clear that for nonzero $u_B^+ \in U_B^{(+)}$, $u_B^- \in U_B^{(-)}$ and $u_B^0 \in U_B^{(0)}$, we have $s(u^+, u^+) > 0$,
$s(u^-, u^-) < 0$ and $s(u^0, u^0) = 0$.
At this point, we may have realized that the numbers $(n_B^+, n_B^-, n_B^0)$ are well defined
since $n^0 = \dim V - \operatorname{rank}(s)$ and $n_B^+$ ($n_B^-$) corresponds to the maximal dimension of
a subspace on which the restriction of $s$ is positive (negative) definite. Apparently, we here have
a direct sum:

$V = U_B^{(+)} \oplus U_B^{(-)} \oplus U_B^{(0)}. \qquad (12.37)$

We can of course follow the same procedure for any other basis $C$ and $S_C$, and we
come to similar expressions with subspaces $U_C^{(+)}, U_C^{(-)}, U_C^{(0)}$ and numbers $(n_C^+, n_C^-, n_C^0)$.
This means that if a subspace $Z$ has the property that for all nonzero $z \in Z$, $s(z, z) > 0$
holds, then its dimension is not bigger than $n_B^+$, so that $\dim Z \leq n_B^+ = \dim U_B^{(+)}$. This
can also be proven by contradiction:
If we had $\dim Z > n_B^+$, then $\dim Z + \dim\big(U_B^{(-)} \oplus U_B^{(0)}\big) > n$, so there would exist some $\tilde{z} \in Z$, $\tilde{z} \neq 0$,
with $\tilde{z} \in U_B^{(-)} \oplus U_B^{(0)}$, which signifies $s(\tilde{z}, \tilde{z}) \leq 0$ or, equivalently, that
$\dim\big(Z \cap (U_B^{(0)} \oplus U_B^{(-)})\big) \geq 1$. This is in contradiction to the hypothesis $s(z, z) > 0$
for all nonzero $z \in Z$. So $\dim Z \leq n_B^+$ holds.
Therefore, if we take $Z = U_C^{(+)}$ with $\dim U_C^{(+)} = n_C^+$, we have $n_C^+ \leq n_B^+$. Inter-
changing $B$ and $C$, we get $n_B^+ \leq n_C^+$. This leads to $n_B^+ = n_C^+$ and shows that

$(n_B^+, n_B^-, n_B^0) = (n_C^+, n_C^-, n_C^0). \qquad (12.39)$
We see that the numbers .(n + , n − , n 0 ) are basis-independent and depend only on the
scalar product .s. This means further that they are invariant under the congruence.
This proves the theorem. ∎
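Sylvester's law of inertia is easy to observe numerically: the triple $(n^+, n^-, n^0)$ can be read off from the eigenvalue signs and does not change under congruence. The following sketch (our illustration, not from the text) checks this for a Minkowski-like form.

import numpy as np

def signature(S, tol=1e-10):
    """Return (n_plus, n_minus, n_zero) for a real symmetric matrix S."""
    eigenvalues = np.linalg.eigvalsh(S)
    n_plus = int(np.sum(eigenvalues > tol))
    n_minus = int(np.sum(eigenvalues < -tol))
    n_zero = len(eigenvalues) - n_plus - n_minus
    return n_plus, n_minus, n_zero

# Minkowski-like form diag(1, -1, -1, -1) and a random congruent representative G^T S G.
S = np.diag([1.0, -1.0, -1.0, -1.0])
G = np.random.default_rng(1).normal(size=(4, 4))       # invertible with probability 1
S_congruent = G.T @ S @ G

print(signature(S))            # (1, 3, 0)
print(signature(S_congruent))  # the same triple, as Sylvester's law of inertia asserts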
Summary
Three important applications of the last three chapters to operators were discussed.
Firstly, the orthogonal group in two dimensions was thoroughly examined. Then,
the structure of orthogonal operators was presented for arbitrary dimensions, cor-
responding to the spectral theorem for the orthogonal group. Reflections were also
presented as a multiplicatively generating system of the orthogonal group.
Next, the singular value decomposition (SVD) was introduced, a highly relevant
method in the field of modern numerical linear algebra. It allows for a universal
representation of any operator through the use of two specially tailored orthonormal
bases.
Finally, in terms of the structure of spacetime in special relativity, the classification
of symmetric, nondegenerate, bilinear forms was discussed. This was a well-known
mathematical problem that was addressed as early as the 19th century by the math-
ematician James Joseph Sylvester.
We focus our attention on the different transformation behavior of vectors and cov-
ectors under a change of basis, briefly reviewing what we already know. This includes
our conventions and notations, which should not be underestimated here. We try to
use as clear a notation as possible, especially for the desiderata in physics.
We start with an abstract $n$-dimensional vector space $V$, $\dim V = n$, and its dual
$V^*$. We choose two bases $B = (b_1, \ldots, b_n) \equiv (b_s)_n$ and $C = (c_1, \ldots, c_n) \equiv (c_i)_n$ of $V$,
with the corresponding dual bases
$B^* := (\beta^1, \ldots, \beta^n) \equiv (\beta^s)_n \quad\text{and}\quad C^* := (\gamma^1, \ldots, \gamma^n) \equiv (\gamma^i)_n,$

given by

$\beta^s(b_r) = \delta_r^s \quad\text{and}\quad \gamma^i(c_j) = \delta_j^i. \qquad (13.1)$

The change of basis transformation matrix $T = (\tau_s^i)$ with the transformation coefficients $\tau_s^i$ is given by:

$B = C\,T \quad\text{or}\quad C = B\,T^{-1}. \qquad (13.6)$

Note the correspondence of Eq. (13.6) to Eq. (13.4). Using the above duality relation
(Eq. 13.1), we obtain

$\beta^s = \bar{\tau}_i^s \gamma^i \quad\text{or}\quad \gamma^i = \tau_s^i \beta^s. \qquad (13.7)$
In matrix form, this reads

$B^* = T^{-1} C^* \quad\text{or}\quad C^* = T B^*. \qquad (13.8)$
and we obtain

$v_B^s = \bar{\tau}_i^s v_C^i \ \text{ or } \ v_C^i = \tau_s^i v_B^s \quad\text{and}\quad \xi_s^B = \xi_i^C \tau_s^i \ \text{ or } \ \xi_i^C = \xi_s^B \bar{\tau}_i^s. \qquad (13.10)$

Note the correspondence of Eq. (13.10) to Eqs. (13.7) and (13.4). The matrix form
of the above equation is expressed as

$v_B = T^{-1} v_C \ \text{ or } \ v_C = T v_B \quad\text{and}\quad \xi^B = \xi^C T \ \text{ or } \ \xi^C = \xi^B T^{-1}, \qquad (13.11)$

with

$v_B^s,\ v_C^i,\ \xi_s^B,\ \xi_i^C \in \mathbb{K}.$
$g : V \longrightarrow V^*,\quad b_s \longmapsto g(b_s) = \beta^s. \qquad (13.14)$

Needless to say, we have here a basis-dependent isomorphism $g \equiv g_B$, and the bases $B$ and $B^*$ are the tailor-made bases of $g$. So we have the representation
$g_{B^*B} = 1_n. \qquad (13.15)$

The same can be done with the second pair of bases $C$ and $C^*$ and we again get the representation

$g_{C^*C} = 1_n. \qquad (13.16)$
Since no other structure is present in .V , the explicit use of bases is needed for any
isomorphism. So we may conclude this “obvious” isomorphism is not canonical as it
depends on extra information. In fact, one can prove that no “canonical” isomorphism
exists for general vector spaces.
Before proceeding, let us have a short break. There is a point about duality
that we have to think over. The question arises of what will happen if we consider further duals. What happens if we consider $(V^*)^*$ and $((V^*)^*)^*$, and so
forth? Here, we are lucky when $V$ is finite-dimensional. In contrast to what happens in an infinite-dimensional vector space, the duality operation stops. We set
$(V^*)^* \equiv V^{**} := \operatorname{Hom}(V^*, \mathbb{K})$. The reason for this break is that between $V^{**}$ and $V$,
there exists a basis-independent (canonical) isomorphism:

$\operatorname{ev} : V \longrightarrow V^{**} = \operatorname{Hom}(V^*, \mathbb{K}),\quad v \longmapsto \operatorname{ev}(v) \equiv v^\# \ \text{ with } \ v^\#(\xi) := \xi(v). \qquad (13.17)$

A direct inspection shows that the map $\operatorname{ev}$ is linear, injective, and surjective, and we
have

$V \underset{\mathrm{can}}{\cong} V^{**}. \qquad (13.18)$

Identifying $v^\#$ with $v$, we may simply write

$v(\xi) := \xi(v)! \qquad (13.19)$

This is a fundamental relation used implicitly in tensor formalism and which justifies
the dual nomenclature.
We can now come to duality with the additional structure of an inner product in
a vector space and so bring together duality and orthogonality. The scalar (inner)
product changes the connection between $V$ and $V^*$ drastically, as it provides,
without using a basis, not only a canonical isomorphism between $V$ and $V^*$, but even
more: a canonical isometry between $V$ and $V^*$. If we want to, we can identify $V$ and
∗
. V as vector spaces with a scalar product, here meaning a “nondegenerate bilinear
form”.
In physics, without discussing this point, we make this identification from aca-
demic infancy, for example, in Newtonian mechanics, in electrodynamics, and in
special relativity. Since the most critical applications of duality in physics, in special
and general relativity and relativistic field theory, concern real vector spaces, we
restrict ourselves in what follows to real vector spaces. For simplicity’s sake, we go
one step further and discuss the case of a positive definite scalar product, that is,
we consider Euclidean vector spaces. The formalism is precisely the same for the
more general case of a nondegenerate scalar product. In addition, it is comfortable
to always have in mind our three-dimensional Euclidean vector space.
We consider an $n$-dimensional Euclidean vector space $(V, s)$ with a symmetric
positive definite bilinear form $s$:

$s : V \times V \longrightarrow \mathbb{R}.$

The first pleasant achievement is that we get from $s$ a canonical isomorphism between
$V$ and $V^*$:

$\hat{s} : V \longrightarrow V^*,\quad u \longmapsto \hat{s}(u) := s(u, \cdot) = (u|\cdot) \equiv \hat{u} \in V^*, \qquad (13.20)$

so that

$V \underset{s}{\cong} V^*. \qquad (13.21)$
Proof The map $\hat{u}$ is a linear form since $\hat{u}(v) = s(u, v)$ and $s$ is a bilinear form.
Here, $V$ is a real vector space and $s$ is linear in both arguments. ⊓⊔
$\check{s} : V^* \longrightarrow V,\quad \xi \longmapsto \check{s}(\xi) \equiv \check{\xi} \equiv u(\xi) \in V,$

so that

$(u(\xi)|v) \equiv (\check{\xi}|v) := \xi(v) \in \mathbb{R}. \qquad (13.22)$

The second achievement of the inner product, also called a metric in physics, is that
$s$ canonically induces in $V^*$ a metric which we denote by $s^*$. Therefore both $(V, s)$
and $(V^*, s^*)$ are Euclidean vector spaces. The previously defined map $\hat{s} : V \to V^*$
is now an isometry and not "only" an isomorphism (see Comment 13.1 below).

$s^* : V^* \times V^* \longrightarrow \mathbb{R},$

and in the basis $B^*$ it is represented by the matrix

$S^* = (\sigma^{ij}) \qquad (13.27)$

with $\sigma^{ij} := s^*(\beta^i, \beta^j)$.
It is interesting that the matrices $S^*$ and $S = (\sigma_{ij})$ with $\sigma_{ij} = s(b_i, b_j)$ represent
the isometry between $V$ and $V^*$.
The map $\hat{s}$ and its inverse $\check{s} \equiv (\hat{s})^{-1}$ produce a new basis in $V^*$ and $V$:

Proof (i)
Since $\hat{s}(b_i) \in V^*$, we may write $\hat{s}(b_i) = \lambda_{ij}\beta^j$ with $\lambda_{ij} \in \mathbb{R}$. On the other hand, we
have $\hat{s}(b_i) = \hat{b}_i \equiv \beta_i$. This leads to $\lambda_{ij}\beta^j(b_k) = \beta_i(b_k) = (b_i|b_k)$, giving $\lambda_{ik} = \sigma_{ik}$.
Furthermore,

$S = S\,S^*\,S \quad\text{since}\quad 1_n = S^* S \quad (S^* = S^{-1}).$

This means that $\sigma^{ik}\sigma_{kj} = \delta_j^i$. Here, we do not write the index $B$ in $S$ since it is not
necessary.

Proof (iii)
Analogously to (i), since $\check{\beta}^i \in V$, we have $\check{\beta}^i = \mu^{ik}b_k$ with $\mu^{ik} \in \mathbb{R}$. Taking the scalar
product with $b_j$ on both sides, we obtain $\mu^{ik}(b_k|b_j) = (\beta^i|b_j)$. This leads to

$\mu^{ik}\sigma_{kj} = \beta^i(b_j) = \delta_j^i,$

which means $\mu^{ik} = \sigma^{ik}$. ⊓⊔
All this shows that, mathematically, the two Euclidean vector spaces $(V, s)$ and
$(V^*, s^*)$ are indistinguishable and can be identified. This means we are left with only
one vector space $V$ but with two bases, $(b_1, \ldots, b_n)$ and $(b^1, \ldots, b^n)$, in $V$, and the
following relations:

$(b^i|b_j) = \delta_j^i, \qquad b^i = \sigma^{ik}b_k, \qquad b_i = \sigma_{ik}b^k.$

These two bases, $(b_i)_n$ and $(b^i)_n$, are used in many fields in physics and are
often called reciprocal. The following discussion clarifies this situation even more.
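The reciprocal basis is easy to compute in coordinates: one inverts the Gram matrix. The following NumPy sketch (our illustration, not from the text) builds the reciprocal basis of a non-orthogonal basis of $\mathbb{R}^3$ and checks the duality relation.

import numpy as np

# A non-orthogonal basis of R^3, stored as the columns of a matrix.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

S = B.T @ B                      # Gram matrix sigma_ij = (b_i | b_j)
S_inv = np.linalg.inv(S)         # sigma^{ij}, the matrix S* = S^{-1}

# Reciprocal basis vectors b^i = sigma^{ik} b_k (columns of B_recip).
B_recip = B @ S_inv

# Check the duality relation (b^i | b_j) = delta^i_j.
assert np.allclose(B_recip.T @ B, np.eye(3))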
The representations $S$ and $S^*$ of the isometry between $V$ and $V^*$ are symmetric but
not diagonal. We expect that if we find a tailor-made basis, we can diagonalize $S$ and
$S^*$. We can indeed do this if we choose an orthonormal basis $C = (c_1, \ldots, c_n)$, with
its dual $C^* = (\gamma^1, \ldots, \gamma^n)$ and $\gamma^i(c_j) = \delta_j^i$. The matrices $S_C$ and $S_C^*$ also represent
the scalar products $s$ and $s^*$, as they were introduced before. It is evident that with
the basis $C$ we have

$\sigma_{ij}^C = s(c_i, c_j) = \delta_{ij} \quad\text{and so} \qquad (13.28)$
$S_C = 1_n \quad\text{and}\quad S_C^* = 1_n. \qquad (13.29)$

Furthermore,

$c^i = \sigma_C^{ij} c_j = \delta_{ij} c_j = c_i \quad\text{and} \qquad (13.30)$
$\gamma_i = \sigma_{ij}^C \gamma^j = \gamma^i, \qquad (13.31)$

which means

$\check{C}^* = C \quad\text{and}\quad \hat{C} = C^*. \qquad (13.32)$

For this reason, we often use expressions (see Eq. 13.30) like

$v = c_s v^s = c_s (c_s|v). \qquad (13.33)$
So, in this case, with an orthonormal basis in .V , we need only to consider one basis in
V and one basis in its dual .V ∗ . In most cases in physics and geometry, it is reasonable
.
to work with an orthonormal basis. Nevertheless, there are cases where choosing an
orthonormal basis is not possible, for example, when selecting coordinate bases on a
manifold with nonzero curvature. This is one of the main reasons why we considered
the general situation in the discussions of this section.
Summary
This chapter was introduced to dispel certain biases in physics related to the dual
space of a vector space and its necessity.
In physics, vector spaces with an inner product are initially used. As demonstrated
here, this renders the dual space practically obsolete and therefore leads to a series
of misunderstandings. However, a residual form remains known as the reciprocal
basis, which finds broad application in certain areas of physics. Reciprocal bases are
clearly visible and distinguishable when nonorthogonal bases are used.
In the context of tensors, it is also important to differentiate whether the tensors
are defined on an abstract vector space or on an inner product space. For example, the
tensors we have discussed so far have all been tensors in an abstract vector space. It
is only in the next chapter that tensors defined on an inner product space will appear
for the first time. These are precisely the tensors that are preferred in physics.
Chapter 14
Tensor Formalism
Without any exaggeration, we can say that in physics tensor formalism is needed
and used as much as linear algebra itself. For example, it is impossible to understand
electrodynamics, relativity, and many aspects of classical mechanics without tensors.
Therefore, there is no doubt that a better understanding of tensor formalism leads to
a better understanding of physics.
Engineers and physicists first came across tensors in terms of indices to describe
certain states of solids. They were first realized as very complicated, unusual objects
with many indices, and their mathematical significance was even questionable. Later,
mathematicians found out that these objects correspond to a very precise and exciting
mathematical structure. It turned out that this structure is a generalization of linear
algebra. This is multilinear algebra and can be considered the modern name for what
physicists usually refer to as tensor calculus.
This chapter will discuss tensor formalism, also known as multilinear algebra, in
a basis-independent way. But of course, as we know from linear algebra, we cannot
do without basis-dependent representations of tensors. From Sect. 3.5 and Chap. 8
we already know what tensors are, and we know at least one possibility to arrive at
tensor spaces. Before, this was obtained by explicitly utilizing bases of vector spaces.
Now, we would like to achieve this differently, which will allow us to further expand
and consolidate the theory of tensors.
We start with the most general definition of a multilinear map and consider a vector-
valued multilinear map. Dealing with various special cases later will allow a better
understanding of the topic.
$f : V_1 \times \cdots \times V_k \longrightarrow Z,\quad (v_1, \ldots, v_k) \longmapsto f(v_1, \ldots, v_k) \in Z.$

The set of all such multilinear maps is denoted by

$T(V_1, \ldots, V_k; Z).$

It is a vector space with the pointwise operations

$(f + g)(v_1, \ldots, v_k) := f(v_1, \ldots, v_k) + g(v_1, \ldots, v_k)$

and

$(\lambda f)(v_1, \ldots, v_k) := \lambda f(v_1, \ldots, v_k).$

For $Z = \mathbb{R}$ we write

$T(V_1, \ldots, V_k; \mathbb{R}) \equiv T(V_1, \ldots, V_k),$

and if all the $V_i$ equal $V$, we write $T^k(V; Z)$ or $T^k(V, \mathbb{R}) \equiv T^k(V)$. In the latter case we have

$\varphi : \underbrace{V \times \cdots \times V}_{k\text{-times}} \longrightarrow \mathbb{R},\quad (v_1, \ldots, v_k) \longmapsto \varphi(v_1, \ldots, v_k).$

Such $\varphi$ are also called $k$-linear forms or $k$-forms or $k$-tensors or, more precisely,
covariant $k$-tensors.
When we talk about multilinear maps, we can ask what differences there are between
the properties of multilinear and linear maps. This leads us to the following comment
and remark.
Although the image of a linear map, as we know, is a vector space, the image
of a multilinear map is not, in general. A simple way to verify this fact is to
consider a bilinear map given by the product of two polynomials. Consider the
$(n + 1)$-dimensional and $(2n + 1)$-dimensional spaces of polynomials (here with $n = 1$):
We set

$V = \mathbb{R}[x]_1 = \{\alpha_0 + \alpha_1 x : \alpha_0, \alpha_1 \in \mathbb{R}\}$

and

$W = \mathbb{R}[x]_2 = \{\alpha_0 + \alpha_1 x + \alpha_2 x^2 : \alpha_0, \alpha_1, \alpha_2 \in \mathbb{R}\}.$
If we consider . f ∈ T 2 (V ; W ), given by
. f : V × V −→ W
(ϕ, χ) |−→ f (ϕ, χ) = ϕ · χ,
and take $\varphi, \chi \in \{1, x\}$, we see that $1, x^2 \in \operatorname{im} f$ since $1 \cdot 1 = 1$ and $x \cdot x =
x^2$. Yet, we observe that $1 + x^2 \notin \operatorname{im} f$: it would otherwise factor into real linear
polynomials and thus have real roots, but as we know, both roots of $1 + x^2 = (x - i)(x + i)$ are purely
imaginary.
Proof To avoid risking confusion due to burdensome notation, we will only prove
this in the case of a bilinear map . f ∈ T (U, V ; Z ): We take . B = (b1 , . . . , bk ), a basis
of .U , and .C = (c1 , . . . , cl ), a basis of .V . We set
$f(b_s, c_i) = z_{si} \in Z, \qquad s \in I(k),\ i \in I(l).$
This shows the uniqueness since every single one of .z si , u sB , vCi is uniquely given.
This . f is bilinear as the partial maps . f u : V → Z and . f v : U → Z are linear. For
. f u we have for example,
A fact that is valid only for covariant tensors is the connection of covariant
tensors on different vector spaces: Any linear map of vector spaces,

$F : V \longrightarrow W,\quad v \longmapsto F(v) = w,$

induces the pullback

$F^* : T^k(W) \longrightarrow T^k(V),\quad \psi \longmapsto F^*\psi,$

given by $(F^*\psi)(v_1, \ldots, v_k) := \psi(Fv_1, \ldots, Fv_k)$. For $k = 1$ this is the dual map

$F^* : W^* \longrightarrow V^*,\quad \eta \longmapsto F^*\eta, \quad\text{given by}\quad (F^*\eta)(v) := \eta(Fv).$
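In coordinates, pulling back a bilinear form along a matrix is simply conjugation by that matrix; the following NumPy sketch (our illustration, with our own variable names) verifies $(F^*\psi)(v, v') = \psi(Fv, Fv')$.

import numpy as np

rng = np.random.default_rng(2)
F = rng.normal(size=(4, 3))        # a linear map F: V = R^3 -> W = R^4
Psi = rng.normal(size=(4, 4))      # a bilinear form psi on W: psi(w, w') = w^T Psi w'

# Pullback F*psi on V: (F*psi)(v, v') = psi(Fv, Fv') = v^T (F^T Psi F) v'.
Psi_pulled = F.T @ Psi @ F

v, v2 = rng.normal(size=3), rng.normal(size=3)
assert np.isclose((F @ v) @ Psi @ (F @ v2), v @ Psi_pulled @ v2)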
$T^1(V, \mathbb{R}) \equiv \operatorname{Hom}(V, \mathbb{R}) \equiv V^*$
The following examples also give a good idea of the corresponding multiplication,
the tensor product.
$\xi\eta : V \times V \longrightarrow \mathbb{R},\quad (u, v) \longmapsto \xi\eta(u, v) := \xi(u)\eta(v). \qquad (14.1)$

$P : V^* \times V^* \longrightarrow T^2(V),\quad (\xi, \eta) \longmapsto P(\xi, \eta) := \xi\eta \in T^2(V). \qquad (14.2)$
$fg : X \times Y \longrightarrow \mathbb{R},\quad (x, y) \longmapsto fg(x, y) := f(x)g(y). \qquad (14.3)$

This product can also be called a tensor product. The product $fg$ is nat-
urally induced since it is based on the multiplication on their common
codomain $\mathbb{R}$.
$\Theta : V^* \times V^* \longrightarrow V^* \otimes V^*,\quad (\xi, \eta) \longmapsto \Theta(\xi, \eta) := \xi \otimes \eta. \qquad (14.5)$

So we obtain

$V^* \otimes V^* = \operatorname{span}\{\beta^i \otimes \beta^j : i, j \in I(n)\}. \qquad (14.7)$
indexed by the set $\{(i, j) : i, j \in I(n)\}$, since every set can be used as a basis of a vector
space!

$V^* \otimes V^* \cong \mathbb{R}\big(I(n) \times I(n)\big). \qquad (14.8)$

In the same way we may form

$\Theta : V \times V \longrightarrow V \otimes V.$
As we see, via the same kind of product $\Theta$, we also obtain a tensor space
$V \otimes V$ over $V$. For this reason, we see various symbols in the literature which are
used for such tensor products. Moreover,

$V^* \otimes V^* \cong T^2(V). \qquad (14.10)$

Indeed, both have the same dimension, and the exact relationship between $\xi\eta$
and $\xi \otimes \eta$ is given by the following equation:
$V^* \times V^* \xrightarrow{\ \Theta\ } V^* \otimes V^* \xrightarrow{\ \tilde{P}\ } T^2(V), \qquad P = \tilde{P} \circ \Theta.$
So we have:
$\xi \otimes \eta \otimes \theta \in V^* \otimes V^* \otimes V^*,$

given by

$(\xi \otimes \eta \otimes \theta)(u, v, w) = \xi(u)\eta(v)\theta(w), \qquad (14.13)$

and

$T^3(V) \cong V^* \otimes V^* \otimes V^*. \qquad (14.14)$
Taking into account the above considerations and coming back to Comment
14.1, we can additionally give a new interpretation of the product of polynomials:
Taking $f = \Theta$, we have $\Theta(\varphi, \chi) = \varphi \otimes \chi$. This means that we can also consider
the product of polynomials as an example of a tensor product.

The tensor product

$\Theta : T^k(V) \times T^l(V) \longrightarrow T^{k+l}(V),\quad (T, S) \longmapsto T \otimes S,$

is bilinear:

$(\lambda_1 T_1 + \lambda_2 T_2) \otimes S = \lambda_1 (T_1 \otimes S) + \lambda_2 (T_2 \otimes S)$

and

$T \otimes (\mu_1 S_1 + \mu_2 S_2) = \mu_1 (T \otimes S_1) + \mu_2 (T \otimes S_2).$

It is also associative,

$L \otimes (T \otimes S) = (L \otimes T) \otimes S,$

which is easy to verify. This means that we can write tensor products of several
tensors without parentheses.
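In coordinates, the tensor product of covariant tensors corresponds to the outer product of their coefficient arrays. The following NumPy sketch (our illustration, not from the text) checks the evaluation rule $(\xi \otimes \eta)(u, v) = \xi(u)\eta(v)$ and the bilinearity just stated.

import numpy as np

rng = np.random.default_rng(3)
n = 3
xi, eta = rng.normal(size=n), rng.normal(size=n)   # covectors (1-tensors) as coefficient arrays
u, v = rng.normal(size=n), rng.normal(size=n)      # vectors

# Coefficients of the 2-tensor xi (x) eta are the outer product xi_i eta_j.
T = np.multiply.outer(xi, eta)

# Evaluation (xi (x) eta)(u, v) = xi(u) * eta(v):
assert np.isclose(np.einsum('ij,i,j->', T, u, v), (xi @ u) * (eta @ v))

# Bilinearity in the first slot: (2*xi + xi2) (x) eta = 2*(xi (x) eta) + xi2 (x) eta.
xi2 = rng.normal(size=n)
lhs = np.multiply.outer(2 * xi + xi2, eta)
rhs = 2 * np.multiply.outer(xi, eta) + np.multiply.outer(xi2, eta)
assert np.allclose(lhs, rhs)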
Since $T^k(V)$ is a vector space, we can choose a basis and determine the coefficients
of $T \in T^k(V)$: there is a unique expansion in the basis
of $T^k(V)$ given by

$T = \tau_{i_1 \ldots i_k}\, \beta^{i_1} \otimes \cdots \otimes \beta^{i_k}.$

This gives, based on Eq. (14.16), $\lambda_{i_1 \ldots i_k} = T_{i_1 \ldots i_k}$, and (a) holds.
To show (b), we set $\lambda_{i_1 \ldots i_k}\, \beta^{i_1} \otimes \cdots \otimes \beta^{i_k} = 0$ and, by the same computa-
tion as above, we obtain $\lambda_{i_1 \ldots i_k} = 0$, so (b) is also valid and the proposition is
proven. ∎
Notice that with this proof we once again showed that the coefficients of $T$
are given by $T(b_{j_1}, \ldots, b_{j_k})$. Additionally, this result makes the expression

$T^k(V) \cong \underbrace{V^* \otimes \cdots \otimes V^*}_{k\text{-times}},$

or even the identification

$T^k(V) = \underbrace{V^* \otimes \cdots \otimes V^*}_{k\text{-times}},$

plausible.
$\xi^{i_1} = \xi^{i_1}_j \beta^j, \ \ldots,\ \xi^{i_k} = \xi^{i_k}_j \beta^j, \qquad (14.17)$

with

$\xi^{i_1}_j, \ldots, \xi^{i_k}_j \in \mathbb{R}. \qquad (14.18)$
We may ask: Well, we see the product, the so-called tensor product. But
where is the algebra? Because in Definition 14.3, we leave both $T^k(V)$ and
$T^l(V)$.
The answer is that we have to take all the tensor spaces together and so
obtain the tensor algebra over $V$:

$T^*(V) := T^0(V) \oplus T^1(V) \oplus \cdots \oplus T^k(V) \oplus \cdots.$
In Example 14.5 in the last section, we also saw, in addition to the covariant tensor
space .V ∗ ⊗ V ∗ , the tensor space of contravariant tensors .V ⊗ V . It is reasonably
possible that as a student of physics, the first tensor space we meet is .V ⊗ V . We
first started with covariant tensors because, coming from analysis, it seems quite
natural to introduce and discuss the tensor product within covariant tensors. This
was also demonstrated in Example 14.5.
It is, therefore, necessary to explain the connection between covariant and con-
travariant tensors. As the reader has probably already realized, this connection has a
name: duality. To clarify the role of duality thoroughly, it is helpful to review what we
know and to consider all the relevant possibilities which appear. If we start by com-
paring .V with its dual .V ∗ , we also have to compare .V ∗ with its dual .(V ∗ )∗ ≡ V ∗∗ ,
the so-called double dual of .V . We cannot stop here since we have also to compare
∗∗
.V with .(V ∗∗ )∗ and so on. But when the dimension of .V is finite, this procedure
stops.
As we saw in Eq. 13.17 in Sect. 13.1, it turns out that $V^{**}$ is canonically isomorphic
to $V$: $V^{**} \underset{\mathrm{can}}{\cong} V$. As in Sect. 14.1, $V$ is an abstract vector space without further
structure. We know that in this case $V$ and $V^*$ are isomorphic, but this isomorphism
is basis dependent: $V \underset{B}{\cong} V^*$ (noncanonical). So we have, for example,

$\psi_B : V \longrightarrow V^*,\quad b_i \longmapsto \beta^i, \qquad (14.19)$

whereas the evaluation map is a canonical isomorphism:

$\operatorname{ev} : V \underset{\mathrm{can}}{\overset{\cong}{\longrightarrow}} V^{**},\quad v \longmapsto v^\# := \operatorname{ev}(v) \in \operatorname{Hom}(V^*, \mathbb{R}), \qquad (14.20)$

given by

$v^\# : V^* \longrightarrow \mathbb{R},\quad \xi \longmapsto v^\#(\xi) := \xi(v). \qquad (14.21)$
This is the reason why we may identify .V with .V ∗∗ : V ∗∗ = V , and we do not make
a distinction between the elements .v # and .v. This identification is beneficial since it
simplifies a lot the tensor formalism. On the other hand, if we want to profit from
this simplification, we have to get a good understanding of this identification.
Proceeding similarly as in Sect. 14.1, we consider contravariant tensors essentially
by exchanging .V and .V ∗ . We now consider multilinear functions on .V ∗ instead of
considering them on .V . So the contravariant tensor of order .k, which we denote by
$T^k(V^*)$, satisfies

$T^k(V^*) \cong \underbrace{V \otimes \cdots \otimes V}_{k\text{-times}}.$
Taking into account Proposition 14.1 and using the notation $B^\# = (b_1^\#, \ldots, b_n^\#)$ for
a basis in $V^{**}$ and the identification $b_i^\# = b_i$, we obtain for $T \in T^k(V^*)$

$T = \tau^{i_1 \cdots i_k}\, b_{i_1} \otimes \cdots \otimes b_{i_k}.$

That should be compared to the corresponding result in Proposition 14.1 for a covariant
tensor:
$T \in T^k(V):\quad T = \tau_{i_1 \cdots i_k}\, \beta^{i_1} \otimes \cdots \otimes \beta^{i_k}.$

With the identifications made above, we write

$T_k(V) = \underbrace{V \otimes \cdots \otimes V}_{k\text{-times}} \quad\text{and}\quad T^k(V) = \underbrace{V^* \otimes \cdots \otimes V^*}_{k\text{-times}}.$
We start with an example of a mixed tensor, the following canonical bilinear form
given by $V^*$ and $V$, which is the evaluation of covectors on vectors:

$\varphi : V^* \times V \longrightarrow \mathbb{R},\quad (\xi, v) \longmapsto \varphi(\xi, v) := \xi(v). \qquad (14.22)$

This bilinear form $\varphi$ is nondegenerate and inherently utilizes the canonical isomor-
phism between $V$ and $V^{**}$, given by the partial map $\psi_v$:

$\tilde{\varphi} : V \longrightarrow (V^*)^*,\quad v \longmapsto \psi_v := \varphi(\,\cdot\,, v).$

This $\psi_v$ is the same as the map $v^\#$ of Eq. (14.21) and we notice again the identification
$\psi_v = v^\#$. With this preparation, it is easy to define a mixed tensor:

$T_l^k(V) = \underbrace{V^* \otimes \cdots \otimes V^*}_{k\text{-times}} \otimes \underbrace{V \otimes \cdots \otimes V}_{l\text{-times}}. \qquad (14.25)$

Using again the reasoning of Proposition 14.1, we may denote a mixed tensor $S \in
T_l^k(V)$ by:

$S = \sigma_{i_1 \cdots i_k}^{\ j_1 \cdots j_l}\, \beta^{i_1} \otimes \cdots \otimes \beta^{i_k} \otimes b_{j_1} \otimes \cdots \otimes b_{j_l}. \qquad (14.26)$
Mixed tensors are the most general case in which we need a change of basis
formula. Using the notation of Example 8.1, we consider the bases $B = (b_s)$, $B^* =
(\beta^s)$ and $C = (c_i)$, $C^* = (\gamma^i)$ of $V$ and $V^*$ correspondingly, with $\beta^r(b_s) = \delta_s^r$ and
$\gamma^j(c_i) = \delta_i^j$, and $r, s, i, j \in I(n)$. The change of basis is given by

$T = T_{CB} = (\tau_s^i) \in Gl(n)$

and

$T^{-1} = T_{BC} = (\bar{\tau}_i^s).$

We denote the coefficients of a mixed tensor with respect to $(B, B^*)$ by $\sigma(B)^{s_1 \cdots s_l}_{r_1 \cdots r_k}$ and with respect to
$(C, C^*)$ by $\sigma(C)^{i_1 \cdots i_l}_{j_1 \cdots j_k}$.
So the transition map at the level of tensors is:

$\sigma(C)^{i_1 \cdots i_l}_{j_1 \cdots j_k} = \tau^{i_1}_{s_1} \cdots \tau^{i_l}_{s_l}\, \bar{\tau}^{r_1}_{j_1} \cdots \bar{\tau}^{r_k}_{j_k}\, \sigma(B)^{s_1 \cdots s_l}_{r_1 \cdots r_k}.$
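The transition formula is conveniently evaluated with np.einsum. The following sketch (our illustration, with our own index conventions) transforms the coefficient array of a (1,1)-tensor and checks that the result agrees with the familiar similarity rule for endomorphisms.

import numpy as np

rng = np.random.default_rng(4)
n = 3
T = rng.normal(size=(n, n))            # change of basis T = T_CB
T_inv = np.linalg.inv(T)               # T^{-1} = T_BC

# Coefficients sigma(B)^j_i of a mixed (1,1)-tensor: first axis contravariant (j), second covariant (i).
sigma_B = rng.normal(size=(n, n))

# sigma(C)^j_i = tau^j_s  taubar^r_i  sigma(B)^s_r
sigma_C = np.einsum('js,ri,sr->ji', T, T_inv, sigma_B)

# For a (1,1)-tensor this is exactly the similarity transformation of the matrix (sigma^j_i).
assert np.allclose(sigma_C, T @ sigma_B @ T_inv)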
In the last section, the vector space .V was an abstract vector space without further
structure. This section considers tensors on a vector space with additional structures.
The most crucial additional structure on an abstract vector space in physics is a
Euclidean or semi-Euclidean structure. We add to the vector space .V a symmetric
nondegenerate bilinear form .s ∈ T 2 (V ) which is, as we know, a special covariant
tensor of rank 2. So we obtain .(V, s), a Euclidean vector space if .s is positive definite
or a semi-Euclidean vector space (e.g. Minkowski space in special relativity) if .s is
symmetric and nondegenerate.
In both cases, the connection between $V$ and $V^*$ changes drastically. As we
saw in Sect. 13.2, given the metric $s$, there exists a canonical isometry between
$V$ and $V^*$:

$(V, s) \underset{\mathrm{can}}{\cong} (V^*, s^*).$
This means that we identify not only $V$ with $V^{**}$ but also $V$ with $V^*$. The identification
$V = V^*$ essentially indicates that we can forget the dual $V^*$ and that the vector
space $V$ alone is relevant for the tensor formalism. This means that the distinction
between covariant and contravariant tensors is obsolete: Given $(V, s)$, we also have
the identification $T^k(V^*) = T^k(V)$ or equivalently
$\underbrace{V^* \otimes \cdots \otimes V^*}_{k\text{-times}} = \underbrace{V \otimes \cdots \otimes V}_{k\text{-times}}.$
We therefore have only one kind of tensor. The distinction between covariant and
contravariant tensors is only formal and refers to the representation (the coefficients)
of a tensor relative to a given basis in .V . This follows directly from our discussion in
Sect. 13.2. The canonical isometry brings the basis . B ∗ = (β 1 , . . . , β n ) in .V ∗ down
to .V (see Remark 13.2):
$\check{s}(\beta^1) = b^1, \ \ldots,\ \check{s}(\beta^n) = b^n.$
Accordingly, we may call .τ i1 ···ik a contravariant coefficient and .τi1 ···ik a covariant
coefficient for the only given tensor .T . Similarly, we can continue with mixed coeffi-
cients of the type .(k, l). If we considered .V without a metric, as in a previous section,
we would have for .T ∈ Tlk (V )
$T = \tau^{j_1 \cdots j_l}_{i_1 \cdots i_k}\, b_{j_1} \otimes \cdots \otimes b_{j_l} \otimes \beta^{i_1} \otimes \cdots \otimes \beta^{i_k},$

with $i_1, \ldots, i_m \in I(n)$, $m = k + l$.
and an abstract vector space .W of the same dimension .(dim W = n k ). The two
vector spaces .T k (V ) and .W are isomorphic as abstract vector spaces because of their
identical dimensions.
It should also be evident that a vector space like .V ⊗ · · · ⊗ V has more structure
than the vector space .W . This is so for at least two reasons.
Firstly, the vector space .V ⊗ · · · ⊗ V is different from .W since its elements have
the form of a linear combination of .v1 ⊗ · · · ⊗ vk , and secondly we have some kind
of product on it (the tensor product). The identification .T k (V ) = V ∗ ⊗ · · · ⊗ V ∗ and
the explicit presence of the tensor product symbol .⊗ make this algebraic structure
visible (see Comment 14.4).
On the other hand, such an algebraic structure is not available on the abstract vector
space .W . Furthermore, the comparison between the multilinear maps in .T k (V ∗ ) and
the tensor product .V ⊗ · · · ⊗ V or equivalently the multilinear maps .T k (V ) and
$V^* \otimes \cdots \otimes V^*$, is an instructive one: Expressions like $V \otimes \cdots \otimes V$ or $V^* \otimes \cdots \otimes
V^*$ are by definition purely algebraic objects. Since we start with an abstract vector
space $V$, whose elements are just vectors and not maps, the elements of $V \otimes \cdots \otimes V$ are
vectors (tensors), not maps or even multilinear forms like the elements of .T k (V ∗ ).
The same holds for .V ∗ and .V ∗ ⊗ · · · ⊗ V ∗ since by definition going from .V ∗ to
∗ ∗
. V ⊗ · · · ⊗ V , we follow the same procedure as going from . V to . V ⊗ · · · ⊗ V . This
We now return to Example 14.5 in Sect. 14.1 and once more discuss the relation of
tensors as multilinear maps to linear maps we found there (see diagram in Example
14.5), from a more general point of view. A tensor space is also a special vector space
that allows us to consider multilinear maps as linear maps. The domain of this special
linear map is the tensor space which corresponds to the given multilinear map. This
leads to the following proposition:
Let $U$ and $V$ be two real vector spaces. There exists a vector space $U \otimes V$
together with a bilinear map

$\Theta : U \times V \longrightarrow U \otimes V, \qquad (14.27)$

with the following universal property: for every vector space $Z$ and every bilinear map

$\varphi : U \times V \longrightarrow Z, \qquad (14.28)$

there exists a unique linear map

$\tilde{\varphi} : U \otimes V \longrightarrow Z,$

such that $\varphi = \tilde{\varphi} \circ \Theta$. This gives the canonical isomorphism

$T(U, V; Z) \cong \operatorname{Hom}(U \otimes V, Z). \qquad (14.29)$
Proof The use of bases makes the proof quite transparent. Let $B = (b_1, \ldots, b_k)$ be a basis
of $U$, $C = (c_1, \ldots, c_l)$ a basis of $V$, and set $z_{si} := \varphi(b_s, c_i) \in Z$.
There is a unique linear map $\tilde{\varphi} \in \operatorname{Hom}(U \otimes V, Z)$ defined on the basis elements
by

$\tilde{\varphi} : U \otimes V \longrightarrow Z,\quad b_s \otimes c_i \longmapsto \tilde{\varphi}(b_s \otimes c_i) := z_{si}. \qquad (14.30)$

So we have

$\tilde{\varphi}(\Theta(b_s, c_i)) = \varphi(b_s, c_i)$

or equivalently

$\tilde{\varphi} \circ \Theta = \varphi.$

This establishes

$T(U, V; Z) \cong \operatorname{Hom}(U \otimes V, Z) \qquad (14.31)$
and to

$T^2(U, V) \cong \operatorname{Hom}(U \otimes V, \mathbb{R}) \equiv (U \otimes V)^*. \qquad (14.32)$

If we use the assertion of Example 14.5 in Sect. 14.1 and the isomorphism
$V^* \otimes V^* \cong T^2(V)$ of Eq. (14.10), and after the identification $T^2(U, V) = U^* \otimes
V^*$, we obtain additionally the equation

$(U \otimes V)^* = U^* \otimes V^*. \qquad (14.33)$
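The universal property can be illustrated numerically: identifying $U \otimes V$ with coefficient matrices and $\Theta$ with the outer product, a bilinear map factors through the outer product via a linear functional. The following sketch (our illustration, with our own function names) does this for the dot product.

import numpy as np

rng = np.random.default_rng(5)
k, l = 3, 3

# A bilinear map phi: U x V -> R, here the standard dot product on R^3 x R^3.
def phi(u, v):
    return float(u @ v)

# Coefficients z_si = phi(b_s, c_i) on the standard bases.
Z = np.array([[phi(e_s, e_i) for e_i in np.eye(l)] for e_s in np.eye(k)])

# The induced linear map phi~ on U (x) V, identified with k x l matrices;
# on the basis elements b_s (x) c_i (the matrix units) it returns z_si.
def phi_tilde(M):
    return float(np.sum(Z * M))

u, v = rng.normal(size=k), rng.normal(size=l)
# Theta(u, v) = u (x) v corresponds to the outer product, and phi = phi~ o Theta.
assert np.isclose(phi(u, v), phi_tilde(np.outer(u, v)))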
The above proposition along with its proof gives a good understanding of the prop-
erties of linear and multilinear maps. Tensor spaces as domains of linear maps of the
form $V \otimes \cdots \otimes V$, corresponding to multilinear maps $T^k(V^*)$, can be considered as
algebraic manifestations of multilinear maps. This makes some properties of multi-
linear maps more transparent and justifies once more the identification of a priori
different objects like

$T^k(V) = (\underbrace{V \otimes \cdots \otimes V}_{k\text{-times}})^* = \underbrace{V^* \otimes \cdots \otimes V^*}_{k\text{-times}}; \qquad (14.34)$

$T^k(V^*) = (\underbrace{V^* \otimes \cdots \otimes V^*}_{k\text{-times}})^* = \underbrace{V \otimes \cdots \otimes V}_{k\text{-times}}. \qquad (14.35)$
This section summarizes and discusses some of the essential identifications within
tensor formalism which stem from duality and the universal properties of tensor
products. We also comment shortly on the relevant proofs since this gives a better
understanding of the key relations. When we consider duality, we use the identifi-
cation $V^{**} = V$ and the expected identification $(U \otimes V)^* = U^* \otimes V^*$, for which, on
this occasion, we prove the corresponding canonical isomorphism.
We first discuss the following isomorphisms.
(i) $(U \otimes V)^* \cong T(U, V)$;
(ii) $U^* \otimes V^* \cong T(U, V)$;
(iii) $(U \otimes V)^* \cong U^* \otimes V^*$.
Proposition 14.2 in Sect. 14.5.2 leads to the bijection .ϕ ↔ ϕ̃ with .ϕ ∈ T (U, V ) and
ϕ̃ ∈ (U ⊗ V )∗ . It leads to the canonical isomorphism .Θ̃:
.
$T(U, V) \longrightarrow (U \otimes V)^*,\quad \varphi \longmapsto \tilde{\varphi},$

given by the canonical isomorphism

$\tilde{\Theta} : T(U, V) \overset{\cong}{\longrightarrow} (U \otimes V)^*.$

So (i) is proven. ∎
$U^* \times V^* \xrightarrow{\ \Theta\ } U^* \otimes V^* \xrightarrow{\ \tilde{\psi}\ } T(U, V), \qquad \psi = \tilde{\psi} \circ \Theta.$
Here, we need to show that the linear map $\tilde{\psi}$ is a bijection. Since $\dim(U^* \otimes V^*) =
\dim T(U, V)$, we only have to show that $\tilde{\psi}$ is injective; see Example 14.5 in Sect.
14.1, with the corresponding notation $P = \psi$ and $\tilde{P} = \tilde{\psi}$. So we now have

$\psi : U^* \times V^* \longrightarrow T(U, V),$

$\tilde{\psi} : U^* \otimes V^* \longrightarrow T(U, V),\quad \xi \otimes \eta \longmapsto \tilde{\psi}(\xi \otimes \eta) = \psi(\xi, \eta) = \xi\eta,$

$\tilde{\Theta} : T(U^*, V^*; T(U, V)) \longrightarrow \operatorname{Hom}(U^* \otimes V^*, T(U, V)),\quad \psi \longmapsto \tilde{\psi}.$

Suppose now that $T = \lambda_{si}\,\beta^s \otimes \gamma^i$ lies in the kernel of $\tilde{\psi}$. This means $\tilde{\psi}(T)(b_s, c_i) = \lambda_{si} = 0$ for all $s$ and $i$. We obtain $T = 0 \in U^* \otimes V^*$, so $\tilde{\psi}$ is indeed injective, and $\tilde{\psi}$ is the canonical isomorphism:
$U^* \otimes V^* \cong T(U, V) \quad\text{(isomorphism (ii))},$

which after identification we write as

$U^* \otimes V^* = T(U, V).$

So (ii) is proven. ∎

$(U \otimes V)^* \cong U^* \otimes V^* \quad\text{(isomorphism (iii))}.$

Altogether, we have the identifications

$T(U, V) = (U \otimes V)^*, \qquad T(U, V) = U^* \otimes V^*, \qquad (U \otimes V)^* = U^* \otimes V^*,$
$T(U^*, V^*) = (U^* \otimes V^*)^*, \qquad T(U^*, V^*) = U \otimes V, \qquad (U^* \otimes V^*)^* = U \otimes V.$
with

$T^{k+1}(V, Z^*) = T(\underbrace{V, \ldots, V}_{k\text{-times}}, Z^*),$

using the canonical evaluation

$Z^* \times Z \longrightarrow \mathbb{R},\quad (\xi, z) \longmapsto \xi(z) \in \mathbb{R}.$
$F : T^k(V; Z) \longrightarrow T^{k+1}(V, Z^*),\quad \psi \longmapsto \psi^\#,$

where

$\psi^\#(v_1, \ldots, v_k, \xi) := \xi(\psi(v_1, \ldots, v_k)) \in \mathbb{R}. \qquad (14.36)$

The analogy with $Z \,\tilde{=}\, Z^{**}$ becomes visible if we fix $(v_1, \ldots, v_k)$ and set $\tilde{\psi} :=
\psi(v_1, \ldots, v_k) \in Z$ and $\tilde{\varphi} := \varphi(v_1, \ldots, v_k, \xi) \in \mathbb{R}$. The canonical isomorphism
$Z \,\tilde{=}\, Z^{**}$ corresponds to

$T^k(V; Z) \cong T^{k+1}(V, Z^*),$

which after identification we write as

$T^k(V; Z) = T^{k+1}(V, Z^*).$
Corollary 14.2 From this, we can also obtain the following interesting special
cases. When $k = 1$ we have

$\operatorname{Hom}(V, Z) \equiv T^1(V; Z)$

or

$\operatorname{Hom}(V, Z) = V^* \otimes Z.$

When $k = 0$ we have

$T^0(V; Z) = T^{(1)}(V, Z^*),$
$T^0(V; Z) = T^1(Z^*) \quad\text{or}\quad Z = Z^{**}.$
The tensor contraction, also known as the contraction or trace map, is a simple
operation on tensor spaces that we already know. It is essentially the evaluation of
a covector applied to a vector giving a scalar. It was used at the beginning of Sect.
14.3 about mixed tensors and at the beginning of the proof of Proposition 14.4. The
importance of this operation explains why we repeat it here. We consider the map again:

$\varphi : V^* \times V \longrightarrow \mathbb{R},\quad (\xi, v) \longmapsto \varphi(\xi, v) := \xi(v).$

By the universal property, $\varphi$ factors as $\varphi = \tilde{\varphi} \circ \Theta$ through

$V^* \times V \xrightarrow{\ \Theta\ } V^* \otimes V \xrightarrow{\ \tilde{\varphi}\ } \mathbb{R},$

and we write $C := \tilde{\varphi}$:

$C : V^* \otimes V \longrightarrow \mathbb{R},\quad \xi \otimes v \longmapsto C(\xi \otimes v) = \xi(v).$
The map $C$ is called tensor contraction. As we see, both maps, $\varphi$ and $C$ above, are
basis-independent. This $C$ can also be considered as an operator on tensor spaces:

$C : T_1^1(V) \longrightarrow T_0^0(V).$

From Corollary 14.2 in the previous section, there follows the canonical
isomorphism

$\operatorname{Hom}(V, V) \overset{\cong}{\underset{\iota}{\longrightarrow}} V^* \otimes V,$

and the trace is given by

$\operatorname{tr} : \operatorname{Hom}(V, V) \longrightarrow \mathbb{R},\quad f \longmapsto \operatorname{tr}(f) := C \circ \iota(f).$
We would now like to extend the tensor contraction $C$ to tensors of type $(k, l)$:

$C_j^i : T_l^k(V) \longrightarrow T_{l-1}^{k-1}(V).$

To keep track of the factors, we write

$T_l^k(V) = V_1^* \otimes \cdots \otimes V_k^* \otimes V_1 \otimes \cdots \otimes V_l.$

Then $C_j^i$ contracts the $i$-th covector factor with the $j$-th vector factor:

$C_j^i : V_1^* \otimes \cdots \otimes V_k^* \otimes V_1 \otimes \cdots \otimes V_l \longrightarrow \mathbb{R} \otimes V_1^* \otimes \cdots \otimes V_{k-1}^* \otimes V_1 \otimes \cdots \otimes V_{l-1},$
$\xi^1 \otimes \cdots \otimes \xi^i \otimes \cdots \otimes \xi^k \otimes v_1 \otimes \cdots \otimes v_j \otimes \cdots \otimes v_l \longmapsto \xi^i(v_j)\, \xi^1 \otimes \cdots \otimes \xi^{k-1} \otimes v_1 \otimes \cdots \otimes v_{l-1}.$
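On coefficient arrays, a contraction is just an einsum over one upper and one lower index. The following sketch (our illustration; the axis ordering is our own convention, not the book's) shows the trace as the contraction of a (1,1)-tensor and one contraction of a pure (2,3)-tensor.

import numpy as np

rng = np.random.default_rng(6)
n = 3

# A (1,1)-tensor with coefficients t^j_i (array axes: upper j, lower i);
# contracting its single upper with its single lower index gives the trace.
t = rng.normal(size=(n, n))
assert np.isclose(np.einsum('jj->', t), np.trace(t))

# A pure (2,3)-tensor xi1 (x) xi2 (x) v1 (x) v2 (x) v3, axes ordered as (i1, i2, j1, j2, j3).
xi1, xi2 = rng.normal(size=n), rng.normal(size=n)
v1, v2, v3 = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
T = np.einsum('a,b,c,d,e->abcde', xi1, xi2, v1, v2, v3)

# Contraction pairing the first covector slot with the third vector slot.
C_T = np.einsum('abcda->bcd', T)
expected = (xi1 @ v3) * np.einsum('b,c,d->bcd', xi2, v1, v2)
assert np.allclose(C_T, expected)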
Example 14.7 $T = \xi^1 \otimes \xi^2 \otimes v_1 \otimes v_2 \otimes v_3$
If we take $T = \xi^1 \otimes \xi^2 \otimes v_1 \otimes v_2 \otimes v_3$, we obtain, for example,

$C_1^1 T = \xi^1(v_1)\, \xi^2 \otimes v_2 \otimes v_3.$

Similarly, we get the other contractions $C_j^i T$ by pairing the $i$-th covector with the $j$-th vector.
We can also consider .Tlk (V ) as a multilinear form. In this case, the analogous
procedure for the application of contraction .C ij leads to the following development:
Consider .T ∈ Tlk (V ), with
$T : \underbrace{V \times \cdots \times V}_{k\text{-times}} \times \underbrace{V^* \times \cdots \times V^*}_{l\text{-times}} \longrightarrow \mathbb{R}.$
We first fix the vectors $v_1, \ldots, v_{k-1}$ and the covectors $\xi^1, \ldots, \xi^{l-1}$ and we then
consider a basis $B = (b_1, \ldots, b_n)$ in $V$ and its dual $B^* = (\beta^1, \ldots, \beta^n)$ so that
$\beta^i(b_s) = \delta_s^i$, $i, s \in I(n)$. We define:

$(C_j^i T)(v_1, \ldots, v_{k-1}, \xi^1, \ldots, \xi^{l-1}) := \sum_{s=1}^n T(v_1, \ldots, b_s, \ldots, v_{k-1},\ \xi^1, \ldots, \beta^s, \ldots, \xi^{l-1}),$
Taking the vector .bs ∈ V in position . j, and the covector .β s ∈ V ∗ in position .i.
With these definitions, we have $(C_j^i T)(v_1, \ldots, v_{k-1}, \xi^1, \ldots, \xi^{l-1}) \in \mathbb{R}$, so that
$C_j^i T$ is evidently a multilinear form of type $(k - 1, l - 1)$ and $C_j^i$ a contraction of $T$:

$C_j^i : T_l^k(V) \longrightarrow T_{l-1}^{k-1}(V).$
$C_3^1 T(v_1, v_2, \xi^2) = \sum_{s=1}^n T(v_1, v_2, b_s, \beta^s, \xi^2).$

$C_2^1 T(v_1, v_3, \xi^2) = \sum_{s=1}^n T(v_1, b_s, v_3, \beta^s, \xi^2).$

In terms of coefficients,

$C_3^1 T(b_{i_1}, b_{i_2}, \beta^{j_1}) = \sum_{s=1}^n T(b_{i_1}, b_{i_2}, b_s, \beta^s, \beta^{j_1}) = \tau_{i_1 i_2 s}^{\ s j_1},$

and similarly:

$C_2^1 T(b_{i_1}, b_{i_2}, \beta^{j_1}) = \sum_{s=1}^n T(b_{i_1}, b_s, b_{i_2}, \beta^s, \beta^{j_1}) = \tau_{i_1 s i_2}^{\ s j_1}.$
Summary
This was the third time we delved into tensors. The first two approaches in Sect. 3.5
and Chap. 8 were basis-dependent to facilitate understanding. Simultaneously, our
index notation, which we have used throughout linear algebra, significantly facilitated
this understanding.
In this chapter, it was time for the basis-free and coordinate-free treatment of
tensor formalism. In this sense, we could affirm that a tensor is a multilinear map.
Since we initially considered abstract vector spaces, the distinction between covari-
ant, contravariant, and mixed tensors was necessary. Following that, we introduced
and discussed an inner product vector space, establishing and discussing the corre-
sponding tensor quantities.
After this, the universal property of the tensor product was introduced, allowing
for a deeper understanding of the concept of tensors. Finally, using the universal
property, several commonly used relationships, essentially involving the dual space
of a tensor product of two vector spaces, were proven.