
Nikolaos A. Papadopoulos · Florian Scheck

Linear Algebra for Physics


Nikolaos A. Papadopoulos
Institute of Physics (WA THEP)
Johannes Gutenberg University of Mainz
Mainz, Germany

Florian Scheck (Deceased)
Mainz, Germany

ISBN 978-3-031-64907-3 ISBN 978-3-031-64908-0 (eBook)


https://doi.org/10.1007/978-3-031-64908-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

If disposing of this product, please recycle the paper.


All humans by nature desire to know.
—Aristotle
Metaphysics A 1.980a21

To the extent that [a science] is concerned
with what is prior in account and simpler, to
that extent the more exactness it has (and this
is what simplicity does).
—Aristotle
Metaphysics M 3.1078a9-11
To Constantin Carathéodory
and Hermann Weyl
Preface

What is a matrix? What is linear algebra? Most of those who open this book already
have an idea or perhaps a good understanding of both.
When one first comes across linear algebra, one may naively think that it is a
standalone area of mathematics, distinct from the rest. On the other hand, matrices
seem to simply be rectangular arrays of numbers and the reader may think that they
already know everything there is to know about them. However, once the reader has
gone through the first chapters of this book, they will find that matrices are
omnipresent throughout it, and it will gradually become clear that matrices
and linear algebra are two sides of the same coin. In fact, matrices are what you
can do with them, and linear algebra itself is ultimately the mathematical theory of
matrices.
A physicist constantly uses coordinates, which are essentially matrices as well. This
fact is particularly pleasing for a physicist and can greatly facilitate access to and
understanding of linear algebra.
So why is linear algebra important for physicists?
The well-known mathematician Raoul Bott stated that 80% of mathematics is
linear algebra. According to our own experience with physics, we would state that
almost 90% of mathematics in physics is linear algebra. Furthermore, according to
our experience with physics students, the most challenging subject in mathematics for
Bachelor students is linear algebra. Students usually have hardly any problem with
the rest of mathematics, such as, for example, calculus, which is already known from
school. The main difficulty with linear algebra seems to be that it is underestimated,
both by the curriculum and by the students themselves. The reason
for this underestimation is probably the widespread idea that, with linear algebra
being “linear”, it is trivial and plain, easy to learn and use.
Our intention is therefore, among others, to contribute with this book to ameliorate
this asymmetric relation between linear algebra and the rest of mathematics.


Finally, we would like to add that with the advent of gauge theory, structures
connected to symmetries have become more important for physics than ever
before. This means that virtuosity in handling linear algebra is
needed, since linear structures are among the chief instruments for describing symmetries.

Mainz, Germany
Nikolaos A. Papadopoulos
Florian Scheck
Acknowledgments

Our first and foremost acknowledgment goes to Christiane Papadopoulos, who tirelessly
and devotedly prepared the LaTeX manuscript with extreme efficiency, and
beyond. We also extend our gratitude to our students. One of us, N.P., particularly
thanks the numerous physics students who, over the past few years, have contributed
significantly to our understanding of the role of Linear Algebra in physics through
their interest, questions, and bold responses in several lectures in this area. Our thanks
also go to many colleagues and friends. Andrés Reyes contributed significantly to
determining the topics to be considered in the early stages of the book. Our physics
colleague, Rolf Schilling, provided extensive support and highly constructive criti-
cism during the initial version of the manuscript. Similar appreciation is extended
to Christian Schilling for several selected chapters. We vividly remember how much
we learned about current mathematics many years ago from Matthias Kreck. The
numerous discussions with him about mathematics and physics have left traces in
this book. The same applies to Stephan Klaus, Stephan Stolz, and Peter Teichner.
One of us, N.P., has benefited greatly from discussions with Margarita Kraus and
Hans-Peter Heinz. Equally stimulating have been the discussions and collaborations
with mathematician Vassilios Papakonstantinou, which have lasted for decades until
today. We would like to express our heartfelt gratitude to mathematician Athanasios
Bouganis, whose advice in the final stages of the book was crucial. Last but not least,
we would like to thank mathematician Daniel Funck from Durham University, who
significantly improved not only the linguistic quality of the book but also its overall
content with great care and dedication.
We also thank the staff of Springer Nature and, in particular, Ute Heuser, who
strongly supported this endeavor.

About This Book

In this book, we present a full treatment of linear algebra devoted to physics students,
both undergraduate and graduate, since it contains parts that are relevant for both.
Although the mathematical level is similar to that of comparable mathematics
textbooks, with definitions, propositions, proofs, etc., the subject is presented here
using a vocabulary that corresponds to the experience the reader has gained in their physics lectures.
This is achieved by the special emphasis given to the role of bases in a vector space.
As a result, the student will realize that indices, as many as they may be, are not
enemies but friends since they give additional information about the mathematical
object we are using.
The book begins with an introductory chapter, the second chapter, which provides
a general overview of the subject and its relevance for physics. Then, the theory is
developed in a structured way, starting with basic structures like vector spaces and
their duals, bases, matrix operations, and determinants. After that, we recapitulate
the role of indices in linear algebra and give a simple but mathematically accurate
introduction to tensor calculus.
The subject material up to Chap. 8 may be considered as the elementary part of
linear algebra and tensor calculus. A detailed discussion of eigenvalues and eigenvectors
(Chap. 9) is followed by Chap. 10 on operators on inner product spaces, which includes,
among many other things, a full discussion of the spectral theorem. This is followed
by a thorough presentation of tensor algebra in Chaps. 3, 8, and 14, which takes full
advantage of the material developed in the first chapters, thus making the introduction
of the standard formalism of multilinear algebra nothing else but a déjà vu.
Chapter 1 includes material that is usually left for the appendix. However, as
we wanted to highlight the usefulness of this chapter, especially for physicists, we
placed it as Chap. 1.
All chapters contain worked examples. The exercises and the hints are destined
mainly for physics students. Our approach is in many regards quite different from
the standard approach in the mathematical literature. We therefore hope that students
of both physics and mathematics will benefit a great deal from it.
Where the organization of the book is concerned, the first eight chapters deal
with what we would call elementary linear algebra and are therefore perfectly suitable
for Bachelor students. They cover what is commonly needed in everyday physics. The
remaining chapters give a perspective and allow insights into what is interesting
and important beyond this. Hence the subjects from Chap. 9 up to the last chapter can be
considered as the advanced linear algebra part of the book.
Everything is written from a physicist’s perspective but respecting a stringent
mathematical form.
Contents

1 The Role of Group Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 Some Prerequisites: Mathematical Structures . . . . . . . . . . . . . . . . . 1
1.2 Quotient Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Group Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Equivariant Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 A Fresh Look at Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.1 Examples of Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.2 Examples of Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2 Linear Maps and Dual Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.1 Examples of Linear Maps . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 Vector Spaces with Additional Structures . . . . . . . . . . . . . . . . . . . . 38
2.3.1 Examples of Vector Spaces with a Scalar Product . . . . . . 44
2.4 The Standard Vector Space and Its Dual . . . . . . . . . . . . . . . . . . . . . 50
2.5 Affine Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.6 Quotient Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.7 Sums and Direct Sums of Vector Spaces . . . . . . . . . . . . . . . . . . . . . 62
2.7.1 Examples of Direct Sums . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.8 Parallel Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.9 Family of Vector Spaces in Newtonian Mechanics . . . . . . . . . . . . 67
2.9.1 Tangent and Cotangent Spaces of R2 . . . . . . . . . . . . . . . . 67
2.9.2 Canonical Basis and Co-Basis Fields of R2 . . . . . . . . . . . 71
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3 The Role of Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.1 On the Way to a Basis in a Vector Space . . . . . . . . . . . . . . . . . . . . . 81
3.2 Basis Dependent Coordinate Free Representation . . . . . . . . . . . . . 96
3.2.1 Basic Isomorphism Between V and Rn . . . . . . . . . . . . . . . 97
3.2.2 The Space of Bases in V and the Group Gl(n) . . . . . . . . 98


3.2.3 The Equivariant Vector Space of V . . . . . . . . . . . . . . . . . . 100


3.2.4 The Associated Vector Space of V . . . . . . . . . . . . . . . . . . 104
3.3 The Importance of Being a Basis – Hymn to Bases . . . . . . . . . . . . 105
3.4 Sum and Direct Sum Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.5 The Origin of Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.6 From Newtonian to Lagrangian Equations . . . . . . . . . . . . . . . . . . . 120
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4 Spacetime and Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.1 Newtonian Mechanics and Linear Algebra . . . . . . . . . . . . . . . . . . . 131
4.2 Electrodynamics and Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . 136
Reference and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5 The Role of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.1 Matrix Multiplication and Linear Maps . . . . . . . . . . . . . . . . . . . . . . 141
5.2 The Rank of a Matrix Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
5.3 A Matrix as a Linear Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5.4 Vector Spaces and Matrix Representations . . . . . . . . . . . . . . . . . . . 154
5.5 Linear Maps and Matrix Representations . . . . . . . . . . . . . . . . . . . . 160
5.6 Linear Equation Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
6 The Role of Dual Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.1 Dual Map and Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.2 The Four Fundamental Spaces of a Linear Map . . . . . . . . . . . . . . . 181
6.3 Inner Product Vector Spaces and Duality . . . . . . . . . . . . . . . . . . . . . 185
6.4 The Dirac Bra Ket in Quantum Mechanics . . . . . . . . . . . . . . . . . . . 191
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
7 The Role of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
7.1 Elementary Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
7.2 The Algebraic Aspects of Determinants . . . . . . . . . . . . . . . . . . . . . 203
7.3 Second Definition of the Determinant . . . . . . . . . . . . . . . . . . . . . . . 210
7.4 Properties of the Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
7.5 Geometric Aspects of the Determinants . . . . . . . . . . . . . . . . . . . . . . 215
7.6 Orientation on an Abstract Vector Space . . . . . . . . . . . . . . . . . . . . . 218
7.7 Determinant Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
7.7.1 The Role of the Group of Permutations . . . . . . . . . . . . . . 222
7.7.2 Determinant Form and Permutations . . . . . . . . . . . . . . . . . 224
7.7.3 The Leibniz Formula for the Determinant . . . . . . . . . . . . 225
7.8 The Determinant of Operators in V . . . . . . . . . . . . . . . . . . . . . . . . . 225
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8 First Look at Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.1 The Role of Indices in Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . 231
8.2 From Vectors in V to Tensors in V . . . . . . . . . . . . . . . . . . . . . . . . . . 236

9 The Role of Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . 241


9.1 Preliminaries on Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . 241
9.2 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
9.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
9.3.1 Eigenvalues, Eigenvectors, Eigenspaces . . . . . . . . . . . . . . 244
9.3.2 Eigenvalues, Eigenvectors, Eigenspaces
of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
9.3.3 Determining Eigenvalues and Eigenvectors . . . . . . . . . . . 257
9.4 The Question of Diagonalizability . . . . . . . . . . . . . . . . . . . . . . . . . . 258
9.5 The Question of Non-diagonalizability . . . . . . . . . . . . . . . . . . . . . . 268
9.6 Algebraic Aspects of Diagonalizability . . . . . . . . . . . . . . . . . . . . . . 283
9.7 Triangularization and the Role of Bases . . . . . . . . . . . . . . . . . . . . . 290
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
10 Operators on Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
10.1 Preliminary Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
10.2 Inner Product Spaces Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
10.3 Orthonormal Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
10.4 Orthogonal Sums and Orthogonal Projections . . . . . . . . . . . . . . . . 303
10.5 The Importance of Being a Normal Operator . . . . . . . . . . . . . . . . . 308
10.6 The Spectral Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
11 Positive Operators–Isometries–Real Inner Product Spaces . . . . . . . . 323
11.1 Positive and Nonnegative Operators . . . . . . . . . . . . . . . . . . . . . . . . . 323
11.2 Isometries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
11.3 Operators in Real Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
11.4 Normal Operators on Real Vector Spaces . . . . . . . . . . . . . . . . . . . . 333
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
12 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
12.1 Orthogonal Operators–Geometric Aspects . . . . . . . . . . . . . . . . . . . 343
12.1.1 The Role of Reflections . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
12.2 Singular Value Decomposition (SVD) . . . . . . . . . . . . . . . . . . . . . . . 349
12.3 The Scalar Product and Spacetime . . . . . . . . . . . . . . . . . . . . . . . . . . 354
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
13 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
13.1 Duality on an Abstract Vector Space . . . . . . . . . . . . . . . . . . . . . . . . 361
13.2 Duality and Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
14 Tensor Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
14.1 Covariant Tensors and Tensor Products . . . . . . . . . . . . . . . . . . . . . . 371
14.1.1 Examples of Covariant Tensors . . . . . . . . . . . . . . . . . . . . . 375
14.2 Contravariant Tensors and the Role of Duality . . . . . . . . . . . . . . . . 382
14.3 Mixed Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384

14.4 Tensors on Semi-Euclidean Vector Spaces . . . . . . . . . . . . . . . . . . . 385


14.5 The Structure of a Tensor Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
14.5.1 Multilinearity and the Tensor Product . . . . . . . . . . . . . . . . 387
14.5.2 The Universal Property of Tensors . . . . . . . . . . . . . . . . . . 387
14.6 Universal Property of Tensors and Duality . . . . . . . . . . . . . . . . . . . 390
14.7 Tensor Contraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
Chapter 1
The Role of Group Action

In this chapter, we briefly present some prerequisites for the book concerning
mathematical structures in calculus and geometry.
We introduce and discuss in detail, especially for the benefit of physicists, the
notion of quotient spaces in connection with equivalence relations.
The last two sections deal with group actions, which are the mathematical face of
what we meet as symmetries in physics.
This chapter could be considered as an appendix, but we set it at the beginning of
the book to point out its significance. For physicists, it can be skipped on a first
reading.

1.1 Some Prerequisites: Mathematical Structures

In this chapter, we are dealing informally with the various mathematical structures
needed in physics. Some of these structures will often be introduced without proof
and used intuitively, as in the literature in physics. It is a fact that in physics, we
often have to rely on our intuition, sometimes more than we would like to. This may
cause difficulties, not only for mathematically oriented readers. We try to avoid this
as much as possible in the present book. Therefore, we will rely here on our intuition
no more than necessary, and be precise enough to avoid misunderstandings.
We treat set theory as understood; we here discuss only the notion of quotient
space. Quotient spaces appear in many areas of physics, but in most cases they are
not recognized as such. In this chapter, we will concentrate on general and various
aspects of group actions and the revision of some essential definitions. In this context,
we also introduce the notion of an equivariant map that respects the group actions of
the input and output spaces in a compatible way.
Usually, we call a set with some structure a space. So we can talk about topological
spaces, metric spaces, Euclidean and semi-Euclidean spaces, affine spaces, vector

spaces, tensor spaces, dual spaces, etc. Almost every set we meet in physics also has
a manifold structure. This is the case in the spaces mentioned above. Some examples
of manifolds that the reader already knows are the three-dimensional Euclidean
space we live in, any two-dimensional smooth surface, and any smooth curve (one-
dimensional manifold) we may think of or see in it. We may also think of manifolds
in any dimension k < n in Rⁿ, which is itself the simplest manifold we can have in
n dimensions.

In this book, we will talk freely about manifolds (smooth manifolds) without
defining them, and we expect the reader to know at least intuitively what we refer to.

1.2 Quotient Spaces

We start with some general remarks that might be quite useful for many readers.
Equivalence relations first appear in real life when we want to talk about objects that
are not absolutely the same but show clear similarities. In this way, we can observe
rough structures more clearly and more precisely. We get new sets, often with far
fewer elements. In mathematics, such a set is called a quotient set and the elements
of this quotient set are called equivalence classes. Each such element, called an
equivalence class, is itself a special subset of the originally given set.
We could consider as an example from real life the set of the inhabitants of the
European Union. A possible equivalence relation exists if we consider the inhabitants
of each European state as equivalent. In this case, the quotient set is the set of
European states and the elements are the corresponding states Germany, France,
Cyprus and so on.
It would be surprising if we could not also apply in mathematics, and in particular
in linear algebra, constructions that we carry out in everyday life constantly and often
unconsciously. So, in the following, we describe this phenomenon and some of its
consequences within the framework of mathematical formalism.
In mathematics, it is well-known that we can construct new sets with set operations
like, for example, union and intersection. However, building quotient spaces is a
much more complex operation than obtaining a new set or a new manifold by union
or intersection. This happens when we have to talk not about equal elements but
about equivalent elements. Here we use the equivalence relation to construct a new
topological space or a new manifold. As this approach to constructions of quotient
spaces may seem highly abstract, we have to be much more precise than we usually
would be in the standard physics literature (at least at the beginning) and not rely
entirely on our physical or geometric intuition. Interestingly enough, we use many
quotient spaces in physics intuitively and often without being aware of the precise
mathematical situation. A prominent example from special relativity arises when we have to
recover our three-dimensional Euclidean space from a given four-dimensional space of
events (spacetime points). As is well known, within this setup a point of the Euclidean space
is not a point but a straight line in the four-dimensional space of
events. This is, for example, the set of all the events at all times of a freely moving point
particle. Mathematically speaking, all the events on this straight line are equivalent
elements (events) of spacetime and form an equivalence class, or simply a class or
a coset. For example, the space point "London" is the equivalence class of all events
along one such straight line. The set of all these equivalence classes or cosets (parallel straight
lines of free point particles) fills the whole spacetime. This set of equivalence classes
is called the quotient space. So it is evident that the three-dimensional Euclidean space
is described by a quotient space of the four-dimensional spacetime.
Such a construction is a mathematically precise approach to obtaining new spaces
out of given ones. Even if it seems quite abstract, for this procedure we need only
elementary mathematics from set theory. The usefulness of this formalism in many
applications justifies its introduction here. Later, we shall also use this approach to
show precisely that the basis-dependent (component-wise, with indices) tensor
formalism is equivalent to the coordinate-free formalism. In this sense, tensor formalism,
as mostly used in physics, can also be considered as a coordinate-free formulation.
Therefore, in this book, we try to use the coordinate-free formalism and the tensor
formalism at the same level and take advantage of both.
We first remember the definition of an equivalence relation:

Definition 1.1 Equivalence relation.


On a set X, an equivalence relation is a relation given by a subset
R ⊂ X × X, but instead of (x, x') ∈ R we usually write x ∼ x'.
The relation ∼ (or ∼_R) is an equivalence relation if it is:
(i) reflexive: x ∼ x;
(ii) symmetric: x ∼ x' ⇔ x' ∼ x;
(iii) transitive: x ∼ y and y ∼ z ⇒ x ∼ z.

Definition 1.2 Equivalence class and quotient space.

We call the subset

[x] := {x' : x' ∼ x, x ∈ X} ⊂ X

the equivalence class or coset of x relative to the equivalence relation ∼. The
new set of equivalence classes, which we call the quotient space of X determined
by the equivalence relation ∼, is given by:

X/∼ := {[x] : x ∈ X}.

We have a natural, surjective map,



π : X → X/∼,
    x ↦ π(x) := [x],

which may also be called the canonical map, canonical projection, or quotient map.
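As a concrete illustration of Definitions 1.1 and 1.2, the following small Python sketch (our own addition, not part of the original text) builds the equivalence classes, the quotient set X/∼, and the canonical projection π for the relation "equal remainder modulo 3" on a small set of integers.

    # A minimal sketch: the quotient set X/~ for the relation x ~ x' iff x = x' (mod 3).
    X = range(12)

    def equivalent(x, y):
        # the equivalence relation ~ on X
        return x % 3 == y % 3

    def cls(x):
        # the equivalence class (coset) [x] = {x' in X : x' ~ x}
        return frozenset(y for y in X if equivalent(y, x))

    # the quotient set X/~ = {[x] : x in X}; distinct classes are disjoint
    quotient = {cls(x) for x in X}

    def pi(x):
        # canonical projection pi : X -> X/~ , x |-> [x]
        return cls(x)

    print(len(quotient))      # 3 classes
    print(sorted(pi(7)))      # [1, 4, 7, 10], the class of 7

Note how distinct classes either coincide or are disjoint, which is exactly the partition of X discussed in the following remark.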

Remark 1.1 Equivalence relation and disjoint decomposition.

It is important to notice that any equivalence relation on X induces a disjoint
decomposition of X (called a partition), given by the fibers π⁻¹([x]) ⊂ X.
We may represent this symbolically as in Fig. 1.1. This will also be demonstrated
in Sect. 2.6 in connection with quotient vector spaces.

It is easy to realize intuitively that, conversely, a given partition of X also introduces an
equivalence relation on X.
At this point some explanations are necessary.
All the points of X equivalent to x are also given geometrically by the fibers
of π:

π⁻¹(π(x)) = π⁻¹([x]) ⊂ X.

So we have again

π⁻¹([x]) = {x' : x' ∼ x, x ∈ X} ⊂ X.

Fig. 1.1 Disjoint decomposition of X



We may call any element x' ∈ π⁻¹([x]) ⊂ X a representative of the class [x]. If
the elements x and x' differ, for example because they have different
properties, we ignore these differing features and consider x and x' as essentially
identical. So we may identify all the other elements equivalent to x with x. We thus
obtain a new object, a new element [x], and a new set X/∼ = {[x], . . . }, both with
completely different properties from x and X = {x, . . . }.
It is clear that [x] is not an element of X, so we have [x] ∉ X. The set of representatives
of [x], the fiber π⁻¹([x]), is a subset of X (i.e., π⁻¹([x]) ⊂ X). It is a slight
misuse of notation when sometimes we mean π⁻¹([x]) and write "[x] ⊂ X". Note
again that for the element [x] ∈ X/∼ we may use different names: equivalence class,
coset, or fiber (i.e., π⁻¹([x])). All this is demonstrated symbolically in Fig. 1.1.
In many cases, the quotient map π is defined in such a way that X and X/∼
have the same algebraic (or geometric) structure. In this case, π is a homomorphism
relative to the relevant structure. See for example the proposition in Sect. 2.6.
We are now going to give two very simple and essential geometrical examples
of quotient spaces. The first example corresponds to the special relativity case mentioned
earlier in this section. The second one corresponds to a purely geometric case.

Example 1.1 Rays in R² as quotient space.

We consider the following subset:

X = R² − {0→} = {x = ξ→ : ξ→ ∈ R² and ξ→ ≠ 0→}.

So, as shown in Fig. 1.2, this is the two-dimensional plane without the zero
point. We denote with R+ the positive numbers, R+ := {ξ ∈ R : ξ > 0}.
For each x ∈ X, the x-ray A(x) is given by A(x) = R+x.

Fig. 1.2 Rays in R²



We now define the equivalence relation ∼ on X which is given by the x-rays:
for x ∈ X, let A(x) be the set of points in X which are in relation with x:

x' ∼ x ⇔ x' ∈ R+x = A(x).                                    (1.1)

We denote the equivalence class or coset of x by [x]. This can also be defined
formally like this (for X = R² − {0→}):

[x] := {x' ∈ X : x' ∼ x}.                                      (1.2)

In Eq. (1.1), we see that [x] is the set A(x), the ray which contains x.

Note also that by this construction we obtain a disjoint decomposition of X (see
Fig. 1.2). Obviously the following holds:

From x'' ∼ x it follows that A(x'') = A(x).

We further see that for two equivalence classes the following is true: either they are
equal or they are disjoint. This means that for A(x) and A(z) either A(x) = A(z) or
A(x) ∩ A(z) = ∅. The quotient space X/∼ is given by the set of rays:

X/∼ = {[x]} = {A(x) : x ∈ X}.

We can also describe this by a set of suitable representatives. For example, we may
choose the circle with radius one, S¹ = {x ∈ X : ||x|| = 1}, and the bijective map:

Φ : X/∼ −→ S¹,
    A(x) ↦ x/||x||.

So we have determined for every equivalence class [x] ≡ A(x) one and only one
representative, the point x/||x|| ∈ S¹ ⊂ R². That is, for each ray a single point on the
circle, and so we get the bijection:

X/∼ ≅ S¹.

There are of course innumerable such bijections which characterize the quotient
space X/∼, but S¹ seems to be the most pleasant.
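The bijection X/∼ ≅ S¹ of Example 1.1 can also be checked numerically. The following sketch (our own illustration, not from the book) tests whether two nonzero vectors of R² lie on the same ray and picks the representative x/||x|| on the unit circle.

    import numpy as np

    def equivalent(x, y, tol=1e-12):
        # x ~ y  iff  y lies on the ray A(x) = R+ x, i.e. y = lam * x with lam > 0
        lam = np.dot(x, y) / np.dot(x, x)
        return lam > 0 and np.allclose(y, lam * x, atol=tol)

    def representative(x):
        # Phi(A(x)) = x / ||x||, one point of S^1 for each ray
        return x / np.linalg.norm(x)

    x = np.array([3.0, 4.0])
    y = np.array([1.5, 2.0])                  # y = 0.5 * x, so y ~ x
    print(equivalent(x, y))                    # True: same ray
    print(representative(x))                   # [0.6 0.8]
    print(np.allclose(representative(x), representative(y)))   # True: same representative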

1.3 Group Actions

There is hardly an area in mathematics or theoretical physics where groups and,
particularly, group actions on manifolds are irrelevant. Symmetries are present in
nature; they lie at the heart of the laws of nature. Group actions are the mathematical
face of what we call symmetries in physics. The notion of a group arises naturally


if we consider the bijective maps from a given set to itself. The composition of two
bijective maps is, again, a bijective map. The identity is immediately present, and it
is evident that for every bijective map, its inverse exists. Additionally, composition
is associative. Throughout this book, we would like to refer to an action of a group
element, or a bijective map, as a transformation. It is plausible that the set of all
transformations (bijective maps) in a given set leads to the notion of a group:

Definition 1.3 Group.


A group is a set G with a binary operation

G × G −→ G,
(a, b) ↦ a ∗ b ≡ ab

satisfying the following axioms:

(i) associativity: (a ∗ b) ∗ c = a ∗ (b ∗ c);
(ii) existence of a neutral element e with e ∗ a = a ∗ e = a;
(iii) existence of inverse elements: for every a ∈ G there exists an element
a⁻¹ ∈ G such that a ∗ a⁻¹ = a⁻¹ ∗ a = e.
A group is abelian or commutative if a ∗ b = b ∗ a.
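For readers who like to experiment, the axioms of Definition 1.3 can be verified exhaustively for a small finite group. The sketch below (our own addition, not from the book) checks them for Z₆, the integers {0, …, 5} with addition modulo 6.

    n = 6
    G = range(n)
    op = lambda a, b: (a + b) % n      # the binary operation G x G -> G
    e = 0                              # candidate neutral element
    inv = lambda a: (-a) % n           # candidate inverse of a

    assert all(op(op(a, b), c) == op(a, op(b, c)) for a in G for b in G for c in G)  # (i) associativity
    assert all(op(e, a) == a == op(a, e) for a in G)                                  # (ii) neutral element
    assert all(op(a, inv(a)) == e == op(inv(a), a) for a in G)                        # (iii) inverses
    assert all(op(a, b) == op(b, a) for a in G for b in G)                            # abelian
    print("(Z_6, + mod 6) is an abelian group")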

We believe that groups would be useless in physics if there were no actions or
realizations of groups. Indeed, everywhere in physics, we have actions of groups.
Therefore, we introduce and apply group actions from the beginning:

Definition 1.4 Left group action.


The group G acts on the set X (from the left) if there is a map

φ : G × X −→ X,
    (g, x) ↦ φ(g, x) ≡ gx,

and the following two conditions are valid:

(i) "Compatibility": for g, h ∈ G, h(gx) = (hg)x, and
(ii) "Identity": ex = x (that is, φ(e, x) = id_X(x)).

This means that e ∈ G acts as the identity map

id_X : X −→ X,
       x ↦ id_X(x) = x.

Even if it seems quite trivial, it turns out that the identity map id_X is a very
important map in mathematics and physics.
The map φ above gives two families of partial maps which we denote by (φg)_G
and (φx)_X:

φg : X −→ X,
     x ↦ φg(x) := φ(g, x)

and

φx : G −→ X,
     g ↦ φx(g) := φ(g, x).

We may call φg a g-transformation and φx an x orbit maker! There are two more
corresponding maps, φ̂ and φ̃. Using Trf(X) ≡ Bijective(X) for the bijective maps in
X, we have:

φ̂ : G −→ Trf(X),
    g ↦ φ̂(g) := φg : X −→ X.

The map φ̂ converts an abstract group element into a transformation φg in X. Let us
denote the set of all maps between X and Y by Map(X, Y) = {f, . . . }, with

f : X −→ Y,
    x ↦ f(x) = y.

If we have X = Y, we write Trf(X) ⊂ Map(X, X). The second map, φ̃, which
corresponds to φx, is given by

φ̃ : X −→ Map(G, X),
    x ↦ φ̃(x) := φx : G −→ X.

The map φ̃ converts the point x into the map φx, a kind of "orbit maker"!
According to our convention, Trf(X) is the set of bijections on X, so that we have
Trf(X) ≡ Bij(X). The official name for Trf(X) is S(X), the group of
all permutations of X. The map φ̂ is a group homomorphism (G-homomorphism):

φ̂(gh) = φ̂(g)φ̂(h).

This follows immediately from the G-action. For each x ∈ X:
(φ̂(h) ◦ φ̂(g))(x) = (φh ◦ φg)(x) = φh(φg(x))
                  = φh(gx) = hgx = φhg(x) = φ̂(hg)(x).

So we have φ̂(h) ◦ φ̂(g) = φ̂(hg).


As we see, the abstract group multiplication ∗ corresponds in this consistent way
to the composition of transformations on X. The map φ̂ is a realization of the group
G as a particular G-transformation group on X.
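To see φ̂ at work, one can let the symmetric group S₃ act on X = {0, 1, 2}. The following sketch (our own illustration, not the authors') realizes each abstract group element as a transformation φg of X and checks the homomorphism property φ̂(h) ◦ φ̂(g) = φ̂(hg).

    from itertools import permutations

    X = (0, 1, 2)
    G = list(permutations(X))          # S_3; a permutation g sends i to g[i]

    def mult(h, g):
        # group multiplication: (h*g)(x) = h(g(x))
        return tuple(h[g[x]] for x in X)

    def phi(g, x):
        # left action  phi : G x X -> X, (g, x) |-> gx
        return g[x]

    def phi_hat(g):
        # phi_hat(g) = phi_g, the g-transformation of X
        return lambda x: phi(g, x)

    # compatibility h(gx) = (hg)x, i.e. phi_hat(h) o phi_hat(g) = phi_hat(hg)
    print(all(phi_hat(h)(phi_hat(g)(x)) == phi_hat(mult(h, g))(x)
              for h in G for g in G for x in X))     # True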
There is also an analogous action on X from the right-hand side (right group action):

ψ : X × G −→ X,
    (x, g) ↦ ψ(x, g) = ψg(x) = xg

with

ψ̂ : G −→ Trf(X),
    g ↦ ψ̂(g) := ψg.

The map ψ̂ is now an antihomomorphism:

ψ̂(hg) = ψ̂(g) ◦ ψ̂(h).

Indeed, we have

(ψ̂(g) ◦ ψ̂(h))(x) = (ψg ◦ ψh)(x) = ψg(ψh(x)) = ψg(xh) = (xh)g = xhg
                  = ψhg(x) = ψ̂(hg)(x),

so that ψ̂(g) ◦ ψ̂(h) = ψ̂(hg).

If there is no danger of confusion, we may also write anonymously Lg := φg and
Rg := ψg. For this reason, given the map

G × X → X,

we may call the set

X : left G-space,

since we consider, as indicated by G × X, a left action: the group G acts from the left.
In the case

Y × G → Y,

we may call the set

Y : right G-space,

since we consider, as indicated by Y × G, a right action: the group G acts from the right.
It is important to realize that left and right actions correspond to two different
maps. In particular, as was shown above, a left action leads to a homomorphism
φ̂(h) ◦ φ̂(g) = φ̂(hg). A right action leads to an antihomomorphism ψ̂(g) ◦ ψ̂(h) = ψ̂(hg).

Comment 1.1 The meaning of the right action.

We would like to point out again that g ↦ Rg is an antihomomorphism. We have,
for example, R_ab = R_b ∘ R_a and L_ab = L_a ∘ L_b. Here we meet
a tricky technical point in the notation. If the action leads to a homomorphism,
as above with φ̂, it is commonly referred to as a left action. In the case of an
antihomomorphism, as with ψ̂, we are talking about a right action. This is why
we write, for example:

ψ' : X × G −→ X,
     (x, g) ↦ ψ'(x, g) := g⁻¹x.

Since this leads to an antihomomorphism, ψ' is a right action even if the group
element g⁻¹ acts on x, as we see, from the left. We may also indicate this fact
by writing R̄ := L_{g⁻¹}. We will need this fact later.

The left or right action is also relevant for what follows. It is obvious that for every
given point x₀ ∈ X, the left group action leads for example to a left orbit, which we
denote by Gx₀ and which is the subset of X given by:

Gx₀ = {gx₀ : g ∈ G}.

A subgroup Jx₀ of G (that is, Jx₀ < G) is connected with the orbit Gx₀ at the position
x₀. This subgroup characterizes the orbit entirely, naturally together with G. This
leads to the following definition.
leads to the following definition.

Definition 1.5 Isotropy group.


The isotropy subgroup of G with respect to x₀ is given by Jx₀ := {g ∈ G : gx₀ = x₀}.

Jx₀ is often called the isotropy group or stability group or even stabilizer subgroup
of G with respect to x₀. Different terms for the same thing sometimes indicate its
importance.

We will now give a definition of a few other essential types of action relevant
to some aspects of linear algebra and physics in general, for example, gravity and
especially cosmology.

Definition 1.6 Transitive action. The group G acts transitively on X if X is
an orbit of G.

So we have Gx₀ = X.
Equivalently, for any x and x' in X there exists a g ∈ G such that x' = gx.

Definition 1.7 Effective action.

The group G acts effectively if the only element of G that fixes every
x ∈ X is e. That is, if g ∈ G has gx = x for all x ∈ X, then g = e.

In other words, only the neutral element e acts on X as the identity.

Definition 1.8 Free action. The group G acts freely on X if for all x₀ ∈ X,
the isotropy group Jx₀ is trivial (i.e., Jx₀ = {e}). In other
words, G acts on X without fixed points.
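Orbits, isotropy groups, and the three kinds of action just defined can be probed directly on a small example. In the sketch below (our own addition, not from the book), the cyclic subgroup C₃ of S₃ generated by a 3-cycle acts transitively and freely on X = {0, 1, 2}, while the full group S₃ acts transitively and effectively but not freely.

    from itertools import permutations

    X = (0, 1, 2)
    S3 = list(permutations(X))                        # the full symmetric group
    r = (1, 2, 0)                                     # the 3-cycle 0 -> 1 -> 2 -> 0
    C3 = [(0, 1, 2), r, tuple(r[r[i]] for i in X)]    # cyclic subgroup generated by r

    def orbit(G, x0):
        # Gx0 = {g x0 : g in G}
        return {g[x0] for g in G}

    def isotropy(G, x0):
        # J_x0 = {g in G : g x0 = x0}
        return [g for g in G if g[x0] == x0]

    def transitive(G): return orbit(G, 0) == set(X)
    def free(G):       return all(isotropy(G, x) == [(0, 1, 2)] for x in X)
    def effective(G):  return [g for g in G if all(g[x] == x for x in X)] == [(0, 1, 2)]

    print(transitive(C3), free(C3), effective(C3))    # True True True
    print(transitive(S3), free(S3), effective(S3))    # True False True
    print(isotropy(S3, 0))                            # [(0, 1, 2), (0, 2, 1)], a nontrivial stabilizer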

Remark 1.2 On homogeneous spaces.

Generally, any orbit Gx₀ is a homogeneous space. It is determined by the
group G and a specific subgroup H of G. More precisely, the structure of the
orbit Gx₀ is given by the quotient space G/H, so that

Gx₀ ≅ G/H.

So we may consider the quotient space G/H as a model of the orbit Gx₀ in the
same sense as Rⁿ may be considered as the model of a real vector space V with
dim V = n. This is the first application of Sect. 1.2. The relevant equivalence
relation is given below.

Note that the orbit Gx₀, which corresponds to the action of the group G on
X and which in general is not a group, is described by the two groups G and
H with H < G. It turns out that the second group H is given, as expected, by
the data of the orbit Gx₀ and is precisely the stability group Jx₀. So we have
H := Jx₀.

Given G and H as above, in order to obtain the quotient space G/H, we have to
consider the right action of H on G:

G × H −→ G,
(g, h) ↦ gh ≡ Rh g.

It is clear that H acts freely on G. This follows directly from the group axioms and
Definition 1.8 (Free action): if gh = g, we also have g⁻¹gh = g⁻¹g. Since g⁻¹g = e,
it follows that eh = e and h = e, and so the isotropy group Jg = {e} is trivial. This
means that H acts freely on G.
We now define the following equivalence relation:

g' ∼_H g :⇔ g' = gh for some h ∈ H.

So we have

[g] := {g' : g' = gh, h ∈ H} ≡ gH.

This means that g' and g are in the same (right) H orbit gH in G.
This equivalence relation leads to the quotient space, as defined in Sect. 1.2:

G/H := {[g] = gH : g ∈ G}.                                     (1.3)

Here the cosets [g] are H orbits in G. Since the action of H on G is free, every such
H orbit in G is bijectively related to the subgroup H. So we get gH ≅ H (bijectively) for all
g ∈ G, and G/H consists of all such H orbits. For this reason G/H is also called the H
orbit space in G, and we may draw it symbolically as in Fig. 1.3.
This is one further example of a quotient space. Figure 1.3 and Eq. (1.3) also show
explicitly that the quotient space G/H is given by the set of all positions gH, g ∈ G,
that the subgroup H takes under the natural G-action. This means nothing else but that
G/H by itself is an orbit of the G-action. Therefore G/H is a homogeneous space.
At the same time, G/H is generally a model for every G orbit Gx₀ in X.

Fig. 1.3 H orbit space in G



Summing up the above discussion, we can state that this type of quotient space
leads to a kind of universal relation: every G orbit in X has the same structure and is
called a homogeneous space. So for a given G orbit Gx₀ ⊆ X there exists a subgroup
H of G (in fact, this subgroup H is the isotropy group Jx₀) such that

Gx₀ ≅ G/H.

This shows the importance of quotient spaces. Numerous applications in theoretical
physics make use of them, for example in cosmology, in various aspects of
symmetries, in quantum mechanics, and in numerous cases in linear algebra.
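The cosets gH and the quotient G/H can be listed explicitly for a small group. The sketch below (our own illustration, with G = S₃ and H the two-element subgroup generated by a transposition as example choices) shows that the cosets either coincide or are disjoint, that each is bijective to H, and that G/H has as many elements as the orbit Gx₀ it models.

    from itertools import permutations

    X = (0, 1, 2)
    G = list(permutations(X))                 # S_3, acting on X by g.x = g[x]
    def mult(g, h): return tuple(g[h[x]] for x in X)   # (g*h)(x) = g(h(x))

    H = [(0, 1, 2), (0, 2, 1)]                # subgroup generated by the transposition 1 <-> 2

    def coset(g):
        # [g] = gH = {gh : h in H}, the right H-orbit of g in G
        return frozenset(mult(g, h) for h in H)

    G_mod_H = {coset(g) for g in G}           # the quotient space G/H
    print(len(G), len(H), len(G_mod_H))       # 6 2 3
    print(all(len(c) == len(H) for c in G_mod_H))      # True: each coset is bijective to H

    # The isotropy group of x0 = 0 under the action of G on X is exactly H,
    # and the orbit G.0 = X has 3 elements, matching |G/H|: Gx0 is modeled by G/H.
    print([g for g in G if g[0] == 0] == H, len({g[0] for g in G}))   # True 3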

Remark 1.3 Transitive and free .G-action.

There is still one further important fact. If we have simultaneously a transitive
and a free G-action on X, then X is bijective to the group G (see Definitions
1.6 and 1.8). So we get:

X ≅ G   (as sets, i.e., bijectively).

By assumption, X is not a group but has the same "number" of elements, that
is, the same cardinality, as the group G.

We use this fact in Sect. 3.2. If we work with all (linear) coordinate systems
simultaneously, it signifies that we are de facto coordinate-free. This is true for linear
algebra as well as for tensor calculus.
This proves that tensor calculus, in the basis-dependent component formulation,
is completely equivalent to, and not less valuable than, the corresponding basis-free
formulation.

1.4 Equivariant Maps

We consider two (left) G-spaces X and Y:

φ : G × X → X   and   ψ : G × Y → Y,

and the map

F : X → Y.

The interesting case occurs when we demand F to commute with the group actions
φ and ψ. This leads to the notion of an equivariant map:

Definition 1.9 F is an equivariant map if F ◦ φg = ψg ◦ F holds for all g ∈ G.

Note that the interesting maps between groups are the group homomorphisms; in
the same sense, here we have G-spaces X and Y, and the interesting maps now are
the equivariant maps.
This can also be expressed by the following commutative diagram:

        F
   X ------> Y
   |         |
  φg        ψg
   |         |
   v         v
   X ------> Y
        F

In the case X = Y, we have F ◦ φg = ψg ◦ F.
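Equivariance, too, can be verified numerically in a toy example (our own sketch, not from the book): S₃ acts on X = {0, 1, 2} by permutation and on Y, the two-element subsets of X, by the induced permutation; the map F sending a point to its complement commutes with both actions.

    from itertools import permutations

    Xset = (0, 1, 2)
    G = list(permutations(Xset))

    def act_X(g, x):
        # action phi on X
        return g[x]

    def act_Y(g, s):
        # induced action psi on Y = two-element subsets of X
        return frozenset(g[a] for a in s)

    def F(x):
        # F : X -> Y, the complement map F(x) = X \ {x}
        return frozenset(Xset) - {x}

    # equivariance: F(phi_g(x)) = psi_g(F(x)) for all g in G and x in X
    print(all(F(act_X(g, x)) == act_Y(g, F(x)) for g in G for x in Xset))   # True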

Summary

This chapter should normally be in the appendix. However, because we wanted to
highlight the usefulness of this chapter, especially for physicists, we placed it as
chapter one.
Firstly, we extensively motivated and discussed the significance of the quotient
space. Its relationship to the corresponding equivalence relation was elucidated.
The connection between the underlying set of the quotient space and the canonical
surjection was also clearly highlighted graphically and with a typical example.
Group actions are fundamental in physics, considering the role of symmetries
and the fundamental forces of physics. They are also relevant at every step in linear
algebra. Therefore, this section became necessary, even though it may appear quite
challenging upon first reading. The various aspects and definitions allow for a clear
distinction between the different structures that arise in linear algebra.
Finally, the very brief section on equivariant maps was a useful addition, although
it will only become relevant in the third chapter.

References

1. W.M. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry (Academic Press, 1986)
2. T. Bröcker, T. Tom Dieck, Representations of Compact Lie Groups (Springer, 2013)
3. M. DeWitt-Morette, C. Dillard-Bleick, Y. Choquet-Bruhat, Analysis, Manifolds and Physics (North-Holland, 1978)
4. M. Göckeler, Th. Schücker, Differential Geometry, Gauge Theories, and Gravity (Cambridge University Press, 1989)
5. K.-H. Goldhorn, H.-P. Heinz, M. Kraus, Moderne mathematische Methoden der Physik. Band 1 (Springer, 2009)
6. K. Jänich, Topologie (Springer, 2006)
7. J.M. Lee, Introduction to Smooth Manifolds. Graduate Texts in Mathematics (Springer, 2013)
8. S. Roman, Advanced Linear Algebra (Springer, 2005)
9. C. Von Westenholz, Differential Forms in Mathematical Physics (Elsevier, 2009)
Chapter 2
A Fresh Look at Vector Spaces

We start at the level of vector spaces, and we first consider quite generally a vector
space as it is given only by its definition, an abstract vector space.
We have to investigate, compare, and use vector spaces to describe, whenever
this is possible, parts of physical reality. The most appropriate way is to use maps
that are in harmony with, that is, compatible with, the vector space structure. We have to use
linear maps, also called vector space homomorphisms.
It turns out that the physical reality demands additional structures which we have
to impose on an abstract vector space. The most prominent structure of this kind is a
positive definite scalar product (special symmetric bilinear form), the inner product,
and in this way we obtain an inner product vector space or a Euclidean vector space
which is strongly connected with our well-known (affine) Euclidean space. We have,
of course, also semi-Euclidean vector spaces where the scalar product is no more
positive definite.
It is interesting that instead of adding, as in Sect. 2.3 with a symmetric bilinear
form, we could also “subtract” structures from vector spaces. This means we can
consider a vector space as a "special" manifold, a linear manifold, which is usually
called an affine space.

2.1 Vector Spaces

The discussion in Sect. 1.3 allows us to define a vector space in a way that emphasizes the
point of view of group action.
In preparation for this first approach, we have to define what a field is, for example
the real and complex numbers. These are also called scalars and are used essentially
to stretch vectors, an operation which is also called scaling. We already know what
a group is. In some sense, a group G contains a perfect symmetry structure. It is
characterized by one operation, by the existence of a neutral element e with eg =
ge = g for all g in G, and by the property that for every g in G the inverse g⁻¹ exists.

Hence we have gg⁻¹ = g⁻¹g = e. We often also write e = 1. In the case that G is
commutative (abelian) and the operation is additive, we usually write e = 0. If the
operation is multiplicative, we write e = 1.
On the way to determining what a field is, it seems useful to make a stop and first
define what a ring is.

Definition 2.1 Ring.


A ring is a set R with two operations, + (addition) and · (multiplication), so
that the following properties hold:
(i) (R, +) is an abelian additive group with neutral element 0, and the
inverse of α ∈ R is given by −α. Hence we have α + (−α) = α − α = 0;

(ii) (R, ·): the multiplication is associative, for α, β, γ ∈ R:

α · (β · γ) = (α · β) · γ.

The neutral element of multiplication is called the unit 1 (α · 1 = 1 · α = α);

(iii) the distributive laws hold for α, β, γ ∈ R:

α · (β + γ) = α · β + α · γ;
(α + β) · γ = α · γ + β · γ.

A ring R is called commutative if α · β = β · α for all α, β ∈ R.


In this case, both operations, addition as well as multiplication, are commutative.
In general, a ring need not be commutative and, in fact, it does not have to have
inverse elements for each of its elements.
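A standard illustration (our own, anticipating the role of matrices later in the book) is the set of real 2 × 2 matrices with entrywise addition and matrix multiplication: it is a ring, but neither commutative nor a field.

    import numpy as np

    A = np.array([[0., 1.],
                  [0., 0.]])
    B = np.array([[0., 0.],
                  [1., 0.]])

    print(A @ B)     # [[1. 0.] [0. 0.]]
    print(B @ A)     # [[0. 0.] [0. 1.]]  -> multiplication is not commutative
    print(np.allclose(A @ (B + B), A @ B + A @ B))   # True: a distributive law holds
    # A is nonzero but has no multiplicative inverse (det A = 0), so this ring is not a field.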
As we shall see, a field is a commutative ring with one more condition: the
existence of inverse elements for multiplication too.

Definition 2.2 Field (1).


A commutative ring K is called a field if each nonzero element has a multiplicative
inverse.

This means that for every α ∈ K with α ≠ 0, there exists an element α⁻¹ so that
α · α⁻¹ = α⁻¹ · α = 1. It is easy to recognize the following equivalent definition.

Definition 2.3 Field (2).


A field is a set K with two operations, addition (+) and multiplication (·),
so that
(i) (K, +) is a commutative group with neutral element 0;

(ii) (K \ {0}, ·) is a commutative group with neutral element 1;

(iii) the distributive laws hold: for all α, β, γ ∈ K, we have

α · (β + γ) = α · β + α · γ;
(α + β) · γ = α · γ + β · γ.

In the case of a vector space, we have a new situation. We have to consider two
different sets. The main set is the vector space V = {u, v, w, . . . } and the second set
is the field K = {α, β, γ, . . . }. (V, +) is an additive group. In addition, we see that
the second operation is an external operation between K and V. To be more precise:
we get (K, ·), the multiplicative group of the field K, which acts on V. The notion of
group action was discussed in Sect. 1.3 in general terms. This means here that every
scalar λ ∈ K, λ ≠ 0, can expand or shrink every element (vector) of V. All this is
summarized in the following definition.

Definition 2.4 Vector space.


A vector space is an (additive) abelian group V on which the field K acts
through its multiplicative group.

More precisely, we denote by K the field of real or complex numbers (K ∈
{R, C}). A vector space over K is a set V with two operations:
Addition:

+ : V × V −→ V,
    (v, w) ↦ v + w.

Scalar multiplication:

K × V −→ V,
    (α, v) ↦ αv,

with the following properties:

(i) V is an abelian group;

(ii) K acts on V by
α(βv) = (αβ)v for all α, β ∈ K and v ∈ V;
1v = v for all v ∈ V; and
αv = vα;

(iii) the distributive laws hold:
(α + β)v = αv + βv and α(v + w) = αv + αw.

Part (ii) of the definition refers to the group action, as discussed rather generally in
Sect. 1.3.
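The axioms of Definition 2.4 can be spot-checked numerically; the sketch below (our own, with V = R³ and K = R as example choices) verifies the scalar action and the distributive laws for a pair of random vectors.

    import numpy as np

    rng = np.random.default_rng(0)
    v, w = rng.normal(size=3), rng.normal(size=3)
    alpha, beta = 2.0, -0.5

    print(np.allclose(alpha * (beta * v), (alpha * beta) * v))     # alpha(beta v) = (alpha beta) v
    print(np.allclose(1.0 * v, v))                                  # 1 v = v
    print(np.allclose((alpha + beta) * v, alpha * v + beta * v))    # (alpha + beta) v = alpha v + beta v
    print(np.allclose(alpha * (v + w), alpha * v + alpha * w))      # alpha(v + w) = alpha v + alpha w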
We restrict ourselves from the beginning and throughout this book to finite-dimensional
vector spaces. Only a few examples refer to infinite-dimensional vector
spaces. We use Greek letters like α, β, ξ, λ for the scalars, since we want to underline
the scalar action on vectors.
In physics, the two fields R and C play the leading role; therefore, we restrict K
to these two fields.
Although R and C vector spaces correspond to the same linear structure, in some
cases they have different properties, for example in the spectral theorems, which are
very important in physics. That is why it is necessary to distinguish them.
It is interesting to realize that the scalar action of K on the abelian group V creates the
vector space we know, an object which is very "compact" and at the same time very
flexible. This is due to the existence of a basis and, with it, the notion of dimension,
especially in the finite-dimensional case we consider here. Thus every vector space
V is entirely characterized by its dimension and the scalar action on it.
As we saw, all the above spaces, rings, fields, and vector spaces, start with an
abelian group and after that another operation is introduced. In linear algebra, for
square matrices and linear operators in a vector space, we can further define another
operation. This results in an algebra. In accordance with our procedure in this section,
there are two ways to arrive at an algebra. We can either start from a ring and then
introduce a scalar multiplication or start with a vector space and then introduce a
vector multiplication. In the following definition, we consider both.

Definition 2.5 Algebra.


An algebra over the field K is a set A together with three operations, addition,
multiplication, and scalar multiplication, for which the following holds:
(i) The set A is a vector space over K under addition and scalar multiplication.
(ii) The set A is, in addition, a ring under addition and multiplication.
(iii) For a scalar λ ∈ K and a, b ∈ A,

λ(ab) = (λa)b = a(λb)

holds.

Here we see explicitly that an algebra is a vector space in which we can take the
product of vectors. Or, equivalently, we see that an algebra is a ring in which we can
multiply each element by a scalar.
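The square matrices mentioned above are the prototype of an algebra: a vector space under entrywise addition and scaling, a ring under matrix multiplication, and condition (iii) of Definition 2.5 holds. A short numerical check (our own sketch):

    import numpy as np

    rng = np.random.default_rng(1)
    A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
    lam = 3.0

    print(np.allclose(lam * (A @ B), (lam * A) @ B))    # lambda(AB) = (lambda A)B
    print(np.allclose(lam * (A @ B), A @ (lam * B)))    # lambda(AB) = A(lambda B)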

2.1.1 Examples of Vector Spaces

There are thousands of examples of vector spaces. We prefer to discuss examples
of real vector spaces. These are more commonly used at the beginning of studies in
physics than complex vector spaces. Real vector spaces are quite familiar to physicists,
and this facilitates to a great extent the understanding of the structures we discuss
here.
In what follows, we are using an obvious notation, trying to avoid cumbersome
definitions and pathological situations. In addition, we introduce the reader to the
notation we use as systematically as possible.

Example 2.1
V = R⁰ := {0}.

This is the simplest, but trivial, vector space we have. Here, and in most examples
below, the verification of the vector space axioms is very straightforward.

Example 2.2
V = R = R¹ = {v ∈ R}.

This is the simplest (nontrivial) vector space we can have. Both the scalars α and
the vectors v are real numbers (α, v ∈ R).

Example 2.3

V = R² = R × R := { x = [ ξ¹ ]  : ξ¹, ξ² ∈ R }.                  (2.1)
                        [ ξ² ]

We may also write x = e₁ξ¹ + e₂ξ² with

e₁ = [ 1 ]   and   e₂ = [ 0 ] .
     [ 0 ]              [ 1 ]

We would like to clarify that multiplication is commutative, that is, we have (Definition
2.4) for example e₁ξ¹ = ξ¹e₁.

This is the simplest typical example of a vector space. The scalars .ξ 1 , ξ 2 are the
components, coefficients, coordinates of the vector .x. .R2 may also be seen as the
coordinate plane. It is clear that the data of the vector .x ∈ R2 is the list of length
two .(ξ 1 , ξ 2 ) ξ 1 , ξ 2 ∈ R, but we choose, as is common practice for good reasons,
to present this as a column which is a .2 × 1 matrix, as indicated in Eq. (2.1) with
square brackets. We assume, as usual in physics, that the reader learnt very early
to use matrices. In physics, we also like arrows! Here too, and we shall use them,
especially when we want to emphasize that these vectors belong to the standard
vector space $\mathbb{R}^n$ (here $n = 2$) written as columns. For this reason we freely use both
notations, $x \in \mathbb{R}^2$ and $x \equiv \vec{\xi} = \begin{bmatrix} \xi^1 \\ \xi^2 \end{bmatrix}$, $\xi^1, \xi^2 \in \mathbb{R}$. Here, the symbol “$\equiv$” indicates
that we use a different notation for the same thing.

Comment 2.1 Lists and Matrices.

We write as usual a 2-tuple or a list of length 2, horizontally, with round


brackets, symbolically like .(∗, ∗), with a comma. We write a row in .R2 which
is a .1 × 2-matrix also horizontally but without comma (because this is the com-
mon way to write matrices), and with square brackets symbolically like .[∗∗] in
analogy to a column .[∗∗ ] in .R2 as in Eq. (2.1).
On the other hand, if we want to write a vertical list of length 2 simply as
data, and want to distinguish this list from the corresponding .2 × 1-matrix, we
write it with round brackets but necessarily without comma, symbolically like
$\begin{pmatrix} * \\ * \end{pmatrix}$. We proceed similarly in the case of $\mathbb{R}^n$.

This difference between a list and a matrix seems at first to be quite pedantic.
But in linear algebra, whenever we consider linear combinations of vectors in
$\mathbb{R}^n$ or vectors in $V$, it is better to talk of a linear combination with respect to a list
of length $n$ rather than of a linear combination of the entries of a $1 \times n$-matrix.


It is clear that in most cases we identify lists with matrices. For example, we
identify the standard basis in $\mathbb{R}^n$, which is the list $(e_1, \cdots, e_n)$, with the matrix
$[e_1 \cdots e_n] = \mathbb{1}_n$:

$(e_1, \cdots, e_n) = [e_1 \cdots e_n]$,

and similarly we may do the identification (symbolically)

$\begin{pmatrix} * \\ \vdots \\ * \end{pmatrix} = \begin{bmatrix} * \\ \vdots \\ * \end{bmatrix}$.

Example 2.4
$V = \mathbb{R}^n = \left\{ x = \begin{bmatrix} \xi^1 \\ \vdots \\ \xi^i \\ \vdots \\ \xi^n \end{bmatrix} : \xi^1, \ldots, \xi^n \in \mathbb{R} \right\}$.   (2.2)

This is an extension of Example 2.3 to all $n \in \mathbb{N} := \{1, 2, 3, \ldots\}$. $n$ is,
as we know, the dimension of the vector space $\mathbb{R}^n$. The precise definition of
the dimension is given in Sect. 3.1, after we know precisely what a basis in a
vector space is.
If we define
.i ∈ I (n) := {1, 2, . . . , n},

we may write, as usual in various situations,

$x = \vec{\xi} \equiv (\xi^i)_n \equiv (\xi^i)$.

The symbol “$\equiv$” here indicates only a different notation for the same object.
We write the list of numbers $\xi^1, \xi^2, \ldots, \xi^n$ as a column of length or size $n$, an
$n \times 1$ matrix, written as a vertical list (without commas, of course). It is clear
that in general we cannot distinguish an $n \times 1$ matrix (column) from a vertical
list of length $n$.

In physics, the vector space .Rn is extremely relevant and well-known, especially
for .n = 3. The vector space .R3 is the model for our Euclidean space which we
may denote by $E^3$ in order to distinguish it from $\mathbb{R}^3$. It should be clear that the
Euclidean space $E^3$ with its elements, the points $p \in E^3$ (we denote elements of $E^3$
with “$p$”, “$q$”, etc.), is not a vector space, and is therefore different from $\mathbb{R}^3$, whose
elements we denote with “$\vec{x}$” and “$\vec{\xi}$”, and so on. As we know, $E^3$ is a homogeneous
space whereas $\mathbb{R}^3$ is not. The presence of the element $\vec{0} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \in \mathbb{R}^3$, the neutral
element ($\vec{0} + \vec{\xi} = \vec{\xi}$), makes $\mathbb{R}^3$ nonhomogeneous. It is clear that the points of $E^3$
are not numbers and that we cannot add points. Nevertheless, we consider $\mathbb{R}^3$, not
only in physics, as a perfect model for . E 3 . This enables us, among other things, to
do calculations choosing a bijective correspondence (. p ↔ ξ→ ) between the points . p
in . E 3 and the three numbers .ξ 1 , ξ 2 , ξ 3 . The three numbers .ξ 1 , ξ 2 , ξ 3 describe the
position of the point . p. This allows for example to formulate Newton’s axioms in
coordinates and to perform all the calculations we need in Newtonian mechanics.
Similarly, .Rn is the model of . E n for .n ∈ N.

Comment 2.2 Identification of vectors with points.

Very often in physics and mathematics, we identify .ξ→ with . p and we also
call .ξ→ (the three numbers) a point. We furthermore use .ξ→ to denote a point . p, a
position which is not a vector. Here, we have to distinguish the following cases:
A translation of a vector leads to the same vector; if we translate a point,
however, we obtain another point. In other words, this
identification means that we may consider a vector space as a manifold (a linear
manifold).

Example 2.5
$V = \mathbb{R}^{2\times 2} := \left\{ A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} : \alpha, \beta, \gamma, \delta \in \mathbb{R} \right\}$.

This is the vector space of $2 \times 2$-matrices. $0 \equiv [0] \equiv \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ is the zero. If we
write

$A = \begin{bmatrix} \alpha^1_1 & \alpha^1_2 \\ \alpha^2_1 & \alpha^2_2 \end{bmatrix} = (\alpha^i_s)$ with $\alpha^i_s \in \mathbb{R}$, $i, s \in I(2)$

and

$B = (\beta^i_s)$, $C = (\gamma^i_s)$,

the addition is given component-wise, $\alpha^i_s + \beta^i_s = \gamma^i_s$, which is $A + B = C$. The
scalar multiplication is given by

$\mathbb{R} \times \mathbb{R}^{2\times 2} \longrightarrow \mathbb{R}^{2\times 2}$,
$(\alpha, B) \longmapsto \alpha B := \begin{bmatrix} \alpha\beta^1_1 & \alpha\beta^1_2 \\ \alpha\beta^2_1 & \alpha\beta^2_2 \end{bmatrix}$.

Example 2.6
$V = \mathbb{R}^{2\times 3} := \left\{ A = \begin{bmatrix} \alpha^1_1 & \alpha^1_2 & \alpha^1_3 \\ \alpha^2_1 & \alpha^2_2 & \alpha^2_3 \end{bmatrix} = (\alpha^i_s) : \alpha^i_s \in \mathbb{R},\ i \in I(2),\ s \in I(3) \right\}$.

Correspondingly to the previous example, we thereby have the vector space of
$2 \times 3$-matrices.

Example 2.7

. V = RN := {x := (ξn )n ≡ (ξ1 , ξ2 , ξ3 , . . . ), ξ1 , ξ2 , ξ3 , . . . ∈ R}.

We here use the notation .Y X ≡ Map(X, Y ) for the maps .Map(X, Y ) = { f :


X → Y } between the sets . X and .Y . .V is the vector space of sequences and we
can interpret the sequence .x as the map

. x : N −→ R
n |−→ x(n) := ξn ∈ R.

Zero is .0 = (0, 0, . . . ). Addition and scalar multiplication are given compo-


nentwise. For .z = x + y and .α · x .(y = (ηn )n , z = (ζn )n ), we write:

. z(n) := (x + y)(n) := x(n) + y(n)

and
(α · x)(n) := αx(n).
.

This means that .ζn := (ξ + η)n := ξn + ηn and .(α · x)n := αξn . Finally, we
write again .z = x + y and .αx. Note that for .Rn , we can write in the above
notation .Rn ≡ R I (n) .

Example 2.8

. V = R(N0 ) := {α := (α0 , α1 , . . . , αm ) : α0 , α1 , . . . αm ∈ R and


m ∈ N0 = {0, 1, 2, . . . } }.

.V is the vector space of the interrupted sequences. Zero, addition, and scalar
multiplication are given as in the previous example. It turns out that this vector
space is equivalent to the space of polynomials denoted by
$\mathbb{R}[x] = \left\{ \alpha(x) := \sum_{k=0}^{m} \alpha_k x^k,\ m \in \mathbb{N}_0 \right\}$.

Then we get the vector space isomorphism (without proof)

$\mathbb{R}^{(\mathbb{N}_0)} \cong \mathbb{R}[x]$.

Example 2.9

. V = R X ≡ Map(X, R) ≡ F(X ) := { f, g, . . . },

with . f given by

. f : X −→ R
x |−→ f (x).

Zero, addition, and scalar multiplication are defined in analogy to Example 2.7.
For the zero we therefore have the constant map $\hat{0}$:

0̂ : X −→ R,
.

x |−→ 0̂(x) := 0 ∈ R

for all .x ∈ X ,

( f + g)(x) : = f (x) + g(x) and


.

(α f )(x) := α f (x).

Example 2.10
. V = C 0 (X ).

V is the set of all continuous functions. Zero, addition, and scalar multiplication
.
are defined as in the previous Example 2.9. From analysis, we know that.C 0 (X )
is a vector space: For example, the addition of two continuous functions is a
continuous function too.

Example 2.11
. V = C 1 (X ).

The set of differentiable functions. Analogously to the previous Examples 2.9


and 2.10, we know that .C 1 (X ) has a vector space structure too.

Example 2.12
. V = L(X ).

The set of integrable functions on $X$. We may take as an example the
interval $X = [-1, 1] \subset \mathbb{R}$. $\mathcal{L}(X)$ is also a vector space since addition and scalar
multiplication of integrable functions produce integrable functions.

Example 2.13
. V = Sol(A, 0).

The set of solutions (. Sol) of a homogeneous linear equation given by a matrix


.A.
Suppose $A \in \mathbb{K}^{1\times n}$ is the row

$A = [\alpha_1\ \alpha_2 \cdots \alpha_n] \equiv (\alpha_s)_n$, $\alpha_s \in \mathbb{R}$, $s \in I(n)$,

and $Sol(A, 0)$ is the set of solutions of the equation

$\alpha_1\xi^1 + \alpha_2\xi^2 + \cdots + \alpha_n\xi^n = 0$.   (2.3)

If we use the Einstein convention for the summation,

$\sum_{s=1}^{n} \alpha_s\xi^s = \alpha_s\xi^s$,   (2.4)

Equation 2.3 takes the form

$\alpha_s\xi^s = 0$   (2.5)

or, with .x = ξ→ as matrix equation, the form

. Ax = 0. (2.6)

The set of all solutions . Sol(A, 0) of Eq. 2.3, is a vector space: If .x and . y are
solutions of Eq. 2.3, the sum and the scalar product are also solutions of Eq.
2.3. If .x, y ∈ Sol(A, 0) : Ax = 0, Ay = 0, then it follows directly that

. A(x + y) = Ax + Ay = 0 + 0 = 0,
A(λx) = λAx = λ0 = 0(λ ∈ R)

are also valid.
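A small numerical check (ours, assuming Python with NumPy; the row $A$ and the particular solutions are arbitrary choices) of exactly this closure property:

    # Sketch assuming Python with NumPy: solutions of A x = 0 are closed
    # under addition and scalar multiplication, as shown above.
    import numpy as np

    A = np.array([[1.0, 2.0, 3.0]])      # a row, A in R^{1x3}
    x = np.array([3.0, 0.0, -1.0])       # A x = 0
    y = np.array([2.0, -1.0, 0.0])       # A y = 0
    lam = -4.2

    print(A @ x, A @ y)                  # both [0.]
    print(A @ (x + y))                   # [0.]  -> x + y in Sol(A, 0)
    print(A @ (lam * x))                 # [0.]  -> lam x in Sol(A, 0)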

Remark 2.1 .R- and .C-vector spaces.

If we replace .R by .C in all the above examples, nothing changes formally


in the existing structures. Therefore, the discussion applies equally well to real
and complex vector spaces.

Comment 2.3 Linear combinations.

All the above equations, Eqs. (2.3) to (2.6), contain linear combinations. As
we shall see in Sect. 3.1 and Remark 3.1, it is no exaggeration to claim
that the linear combination is the most important operation in linear algebra. The
precise definition of a linear combination is given in Sect. 3.1, Definition 3.3.

The next example can have a very sobering effect on physicists!

Example 2.14 What is a vector?.


We consider the circle with radius .r in .R2 . See Fig. 2.1.
$S^1 = \left\{ x = \begin{bmatrix} \xi \\ \eta \end{bmatrix} \in \mathbb{R}^2 : \sqrt{\xi^2 + \eta^2} = r > 0 \right\}$.

Fig. 2.1 Circle . S 1 in .R2

We choose for example the elements $x, y \in S^1 \subset \mathbb{R}^2$ and $z \in \mathbb{R}^2$ but $z \notin S^1$.
We may ask: Are $x$ and $y$ vectors? The answer is not clear because the question
is not clear. The elements $x$ and $y$ of $S^1$ are obviously not vectors since $S^1$
is not a vector space. The elements $x, y, z$ of $\mathbb{R}^2$ are obviously vectors since
$\mathbb{R}^2$ is a vector space. To decide whether an element is a vector or not, we need

to know where it belongs to. The elements of a vector space are, of course,
vectors. The elements of a circle are not vectors, we call them points. This
clarifies the paradox that .x and . y are not vectors and at the same time .x and . y
are vectors.

Comment 2.4 What is a point?

All this reminds us of Euclid's approach to geometry. Euclid never tells
us what a point is; he only says how the points behave towards each other. If we
compare this with Comment 2.2, we see the difference. We consider the elements
$x$ and $y$ on the circle $S^1$ not as vectors but as points in $S^1$, since $S^1$ is not a
vector space.

At this stage, it is natural to address subsets of .V which have the same structure as
the vector space .V . This means that a subset .U should be by itself also a vector space.
In this sense, we may say that for the subset .U of .V , we would like to stay in the
vector space category, and we write .U < V . This leads to the following definition.

Definition 2.6 Subspace.


A subset .U of .V is called subspace of .V if .U is also a vector space (with the
same structure).

Remark 2.2 Criterion for a subspace.

A necessary and sufficient condition (a criterion) for a subset $U \subseteq V$ to be
a subspace of $V$ ($U < V$) is that for all $\lambda \in \mathbb{K}$ and $u, v \in U$ we have
$0, u + v, \lambda u \in U$. This means in detail:
(i) The neutral element .0 belongs to .U ;
(ii) .U is closed under addition;
(iii) .U is closed under scalar multiplication.

2.1.2 Examples of Subspaces

Example 2.15 .{0} and .V are clearly subspaces of .V .

Example 2.16
Given .V = R2 = R × R.
(i) The $x$-axis $U_1 := \mathbb{R} \times \{0\} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}\mathbb{R} = \{\begin{bmatrix} x \\ 0 \end{bmatrix} : x \in \mathbb{R}\}$ and
the $y$-axis $U_2 := \{0\} \times \mathbb{R} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\mathbb{R} = \{\begin{bmatrix} 0 \\ x \end{bmatrix} : x \in \mathbb{R}\}$
are subspaces of $V$ and we write $U_1, U_2 < V$;
(ii) For every $v$ different from zero ($v \neq \vec{0}$), the set $U_v := \mathbb{R}v = \{x \in \mathbb{R}^2 : x = \lambda v, \lambda \in \mathbb{R}\}$ is a subspace of $V$;
(iii) The set of solutions of the equation $\alpha_1\xi^1 + \alpha_2\xi^2 = 0$ with $A = [\alpha_1\ \alpha_2] \in \mathbb{R}^{1\times 2}$ is a subspace of $V$. Assuming that $A \neq 0 \equiv [0\ 0]$, we may write for
the above set of solutions $U = Sol(A, 0)$:

$U = \mathbb{R}v_0$ with $v_0 = \begin{bmatrix} \alpha_2 \\ -\alpha_1 \end{bmatrix}$.

Example 2.17 Standard subspaces in .R3 .


Given that $V = \mathbb{R}^3 = \mathbb{R}_1 \times \mathbb{R}_2 \times \mathbb{R}_3$, the sets $U_{1,2}, U_{1,3}, U_{2,3}$ and $U_0$ are subspaces
of $V$:
(i) $U_{1,2} = \mathbb{R}^2 \times \{0\} = \left\{ \begin{bmatrix} x \\ y \\ 0 \end{bmatrix} : x, y \in \mathbb{R} \right\} < V$;
(ii) $U_{2,3} = \left\{ \begin{bmatrix} 0 \\ y \\ z \end{bmatrix} : y, z \in \mathbb{R} \right\} < V$;
(iii) $U_{1,3} = \left\{ \begin{bmatrix} x \\ 0 \\ z \end{bmatrix} : x, z \in \mathbb{R} \right\} < V$;
(iv) $U_0 = Sol(A, \vec{0})$, the set of solutions of the equation $A\vec{x} = \vec{0}$ with $A = [\alpha_1\ \alpha_2\ \alpha_3] \in \mathbb{R}^{1\times 3}$,
is also a subspace of $V$.

Example 2.18 Given that $V = \mathrm{Map}(\mathbb{R}, \mathbb{R})$, the sets $C^1(\mathbb{R})$ and $C^0(\mathbb{R})$ are subspaces
of $V$, and we may write

$C^1(\mathbb{R}) < C^0(\mathbb{R}) < V$.

Comment 2.5 Union of two subspaces.


Similarly, it is natural to require for the union $U_1 \cup U_2$ of two subspaces
$U_1$ and $U_2$ the same structure as for $V$. But we may immediately realize that this
is, in general, impossible. (We may choose $u_1, u_2 \in U_1 \cup U_2$ so that $u_1 + u_2 \notin U_1 \cup U_2$.) The notion of the sum $U_1 + U_2$ with $u_1 + u_2 \in U_1 + U_2$ is defined in
Sect. 2.7, Definition 2.19. There exists uniquely, as we shall see in Sect.
2.7, a smallest subspace of $V$ which contains the set $U_1 \cup U_2$.
Example: In $V = \mathbb{R}^2$ we take $U_1 := \mathbb{R}_1 = \{(x, 0) : x \in \mathbb{R}\}$ and $U_2 := \mathbb{R}_2 = \{(0, y) : y \in \mathbb{R}\}$. In this case it is clear that for the example $u_1 = (1, 0)$ and $u_2 = (0, 2)$, we get $u_1, u_2 \in U_1 \cup U_2$ but $u_1 + u_2 = (1, 2) \notin U_1 \cup U_2$. It is evident that $U_1 \cup U_2$ is not a vector space, nor even a subspace of
$V = \mathbb{R}^2$. Geometrically speaking, this is perfectly clear. The vector $(1, 2)$, for
example, belongs neither to the $x$-axis nor to the $y$-axis, and consequently not
to their union. To simplify our notation, we write here $u_1 = (1, 0)$ instead of
$u_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

Remark 2.3 Intersection of two subspaces.


It is interesting to notice, and easy to see, that the intersection
$U_1 \cap U_2$ (in contrast to $U_1 \cup U_2$) is a subspace of $V$. This is also plausible because
membership in the intersection is a much stronger condition than membership in the union.

2.2 Linear Maps and Dual Spaces

For every mathematical structure, it is essential and beneficial to use the maps that
preserve that structure. Here, these so-called homomorphisms are the linear
maps.

Definition 2.7 Linear maps and .Hom(V, V ' ).


A map $f$ from $V$ to $V'$, $f : V \to V'$, is linear (or a homomorphism) if
(i) $f(u + v) = f(u) + f(v)$  $\longrightarrow$ additivity;
(ii) $f(\alpha v) = \alpha f(v)$  $\longrightarrow$ homogeneity;
or, equivalently, if $f(\alpha u + \beta v) = \alpha f(u) + \beta f(v)$ holds.
$\mathrm{Hom}(V, V')$ is the set of all linear maps from $V$ to $V'$.
.

A few further comments about linear maps: We would like to remind the reader that
we do not assume that they come upon this definition for the first time and the same
holds for most definitions and many propositions and theorems in this book. But
we are convinced that essential and fundamental facts have to be repeated and thus
this is by no means a loss of time. It is furthermore an excellent opportunity to fix our
notation. In this sense, we remember that a subspace $U$ of $V$ (denoted by $U < V$ to
distinguish it from the symbol $\subseteq$ for subsets) is a vector space in its own right by the
restriction of addition and scalar multiplication to $U$.

2.2.1 Examples of Linear Maps

Example 2.19 The zero map . f 0 ∈ Hom(V, V ' ).


The value of . f 0 is zero at each .v ∈ V :

$f_0 : V \longrightarrow V'$,
$v \longmapsto f_0(v) := 0' \in V'$.


$f_0$ is linear since $f_0(\lambda v) = f_0(u + v) = 0'$. We have

$f_0(\alpha v_1 + \beta v_2) = 0' = 0' + 0' = f_0(\alpha v_1) + f_0(\beta v_2)$.

Example 2.20 The identity map . f = id.


This is given by

. f : V −→ V,
v |−→ f (v) := v.

. f is linear since . f (αv1 + βv2 ) = αv1 + βv2 = α f (v1 ) + β f (v2 ).

Example 2.21 The map . f λ = λid, λ ∈ R is linear.


It is linear since

$f_\lambda(\alpha v_1 + \beta v_2) = \lambda(\alpha v_1 + \beta v_2) = \alpha\lambda v_1 + \beta\lambda v_2 = \alpha f_\lambda(v_1) + \beta f_\lambda(v_2)$.

Maps similar to . f λ in the form . f λ : U → U for .U < V are widely used in


quantum mechanics and in physics wherever symmetries are relevant.

Example 2.22 A parallel projection map . f = PU .


For $V = \mathbb{R}^3$, we define

$U := \left\{ \begin{bmatrix} \xi^1 \\ \xi^2 \\ 0 \end{bmatrix} : \xi^1, \xi^2 \in \mathbb{R} \right\}$ and $W := \left\{ \begin{bmatrix} 0 \\ 0 \\ \xi \end{bmatrix} : \xi \in \mathbb{R} \right\}$.

It is apparent that we have .V = U + W as we shall see later in Definition 2.19


and Sect. 2.8. For each .v ∈ V , .v = u + w, u ∈ U and .w ∈ W .

. f : V −→ V
v = u + w |−→ f (v) := u.

The linearity is clear, and thus the image of . f is .im f = f (V ) = U . Note that
for this projection, we have to use both the subspace .U , and the subspace .W .
A parallel projection corresponds to the usual parallelogram rule when adding
two vectors. Here, we did not use the dot product in .R3 which means that we
do not have to use orthogonality.

In physics, we use for the most part orthogonal projections. These are, as we will
show later, connected with the inner product: here the dot product (see Sect. 10.4). In
this case, we only need to know the space .U = im f . The corresponding complement
(here .W ) is given by the orthogonality.

Example 2.23 . f = f A , a matrix induced map.


For $V = \mathbb{R}^n$ with $n = 2$, we define $v = \vec{\xi} = \begin{bmatrix} \xi^1 \\ \xi^2 \end{bmatrix}$ and $A = \begin{bmatrix} \alpha^1_1 & \alpha^1_2 \\ \alpha^2_1 & \alpha^2_2 \end{bmatrix} \in \mathbb{R}^{2\times 2}$.
The matrix $A$ induces the map:

$f : V \longrightarrow V$,
$\begin{bmatrix} \xi^1 \\ \xi^2 \end{bmatrix} \longmapsto \begin{bmatrix} \alpha^1_1\xi^1 + \alpha^1_2\xi^2 \\ \alpha^2_1\xi^1 + \alpha^2_2\xi^2 \end{bmatrix} = \begin{bmatrix} \alpha^1_s\xi^s \\ \alpha^2_s\xi^s \end{bmatrix}$.

Using the Einstein convention for the summation, we obtain componentwise,
with $s \in I(2) = \{1, 2\}$:

$\left(f\vec{\xi}\right)^1 = \alpha^1_s\xi^s$ and $\left(f\vec{\xi}\right)^2 = \alpha^2_s\xi^s$.

In matrix form, the two above equations lead to a single one:

$f(\vec{\xi}) = A\vec{\xi}$.

A quite straightforward calculation shows the linearity of $f \equiv f_A$. With
$\lambda, \mu \in \mathbb{R}$, we obtain:

$f(\lambda\vec{\xi} + \mu\vec{\eta}) = A(\lambda\vec{\xi}) + A(\mu\vec{\eta}) = \alpha^i_s(\lambda\xi^s) + \alpha^i_s(\mu\eta^s) = \lambda\alpha^i_s\xi^s + \mu\alpha^i_s\eta^s = \lambda f(\vec{\xi}) + \mu f(\vec{\eta})$.
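The same linearity can be spot-checked numerically; a minimal sketch (ours, assuming Python with NumPy; the matrix, vectors and scalars are arbitrary):

    # Sketch assuming Python with NumPy: the matrix-induced map f_A(x) = A x
    # satisfies f_A(lam x + mu y) = lam f_A(x) + mu f_A(y).
    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    x = np.array([1.0, -1.0])
    y = np.array([0.5, 2.0])
    lam, mu = 2.0, -3.0

    lhs = A @ (lam * x + mu * y)
    rhs = lam * (A @ x) + mu * (A @ y)
    print(np.allclose(lhs, rhs))   # True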

Example 2.24 Differentiation $D := \dfrac{d}{dx}$.
For $V = C^1(\mathbb{R})$, whose elements we denote by $f$, we define the map

$D : C^1(\mathbb{R}) \longrightarrow C^0(\mathbb{R})$,
$f \longmapsto Df := \dfrac{df}{dx}$.
As is commonly known from analysis, $D$ is a linear map ($\alpha, \beta \in \mathbb{R}$):

$D(\alpha f_1 + \beta f_2) = \dfrac{d}{dx}(\alpha f_1 + \beta f_2) = \alpha\dfrac{df_1}{dx} + \beta\dfrac{df_2}{dx} = \alpha Df_1 + \beta Df_2$.
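For polynomials, which lie in $C^1(\mathbb{R})$, this linearity can be illustrated directly; a sketch (ours, assuming Python with NumPy; the polynomials and coefficients are arbitrary):

    # Sketch assuming Python with NumPy: D(a f1 + b f2) = a D f1 + b D f2,
    # checked on polynomials via NumPy's Polynomial class.
    import numpy as np
    from numpy.polynomial import Polynomial as P

    f1 = P([1.0, 2.0, 3.0])        # 1 + 2x + 3x^2
    f2 = P([0.0, -1.0, 0.0, 4.0])  # -x + 4x^3
    a, b = 2.0, -0.5

    lhs = (a * f1 + b * f2).deriv()
    rhs = a * f1.deriv() + b * f2.deriv()
    print(np.allclose(lhs.coef, rhs.coef))  # True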

Example 2.25 Integration.


Let $V = \mathcal{L}^1(X) = \{f\}$ be the space of integrable functions on the interval $X = [0, 1] \subset \mathbb{R}$. We consider the integral

$J : \mathcal{L}^1(X) \longrightarrow \mathbb{R}$,
$f \longmapsto J(f) := \int_X f(x)\,dx$.

Once again we know from analysis that $J$ is a linear map:

$J(\alpha f_1 + \beta f_2) = \int_X (\alpha f_1 + \beta f_2)\,dx = \int_X \alpha f_1\,dx + \int_X \beta f_2\,dx = \alpha\int_X f_1\,dx + \beta\int_X f_2\,dx = \alpha J(f_1) + \beta J(f_2)$.

From all these examples of vector spaces and linear maps, we may learn that if
we have thousands of vector spaces, we may expect millions of linear maps. In addition,
taking into account the above examples, we may observe that the image of a vector
space under a linear map is again a vector space.

Remark 2.4 Some useful well-known facts about linear maps.

A homomorphism . f ∈ Hom(V, V ' ) is called an isomorphism if . f is bijec-


tive: . f ∈ I so(V, V ' ). A homomorphism is called an endomorphism or a linear
operator if .V = V ' : f ∈ End(V ) := Hom(V, V ), or is called an automor-
phism if .V = V ' and . f is bijective: . f ∈ Aut (V ). In consequence, we have
the obvious hierarchy of spaces.

. I so(V, V ' ) ⊂ Hom(V, V ' ) < Map(V, V ' ).

The above notation with the symbol “$<$” indicates that the sets $\mathrm{Hom}(V, V')$
and $\mathrm{Map}(V, V')$ are vector spaces. The set $Iso(V, V')$ is not a vector space
since, for example, the sum of two such bijective maps is not necessarily
bijective.

In the case .V = V ' if we take the composition as multiplication,

. g f := g ◦ f for f, g ∈ End(V ),

it is not difficult to see that .End(V ) is a ring. Hence with the above composition,
we have an additional multiplicative operation in .End(V ). It is isomorphic to a
matrix ring, but we have not yet seen this (see Sect. 3.3, Corollary 3.8 and Comment
3.3). Similar connections of linear maps to matrices are dominant throughout linear
algebra. As we shall see, with the help of bases we can entirely describe, that is,
represent linear maps and their properties by matrices. In this sense, if we ask what
linear algebra is, we may simply state that linear algebra is the theory of matrices.

Definition 2.8 Kernel and image.


The kernel and image of a linear map . f are given by:

. ker f := {v ∈ V : f (v) = 0V ' } and im f := f (V ).

For .v ' ∈ V ' , the preimage (or fiber) of .v ' ∈ V ' is given by

. f −1 (v ' ) := {v ∈ V : f (v) = v ' }.

The kernel is often called null space and the image is called range.
We note some direct conclusions from the definition:

Remark 2.5 Some important properties of linear maps.

Let . f : V → V ' . Then:


– . f (0V ) = 0V ' .
– If .U and .W are subspaces of .V and .V ' respectively, (i.e., .U ≤ V and .W ≤
V ' ), then . f (U ) ≤ V ' , . f −1 (W ) ≤ V and .ker f, im f are subspaces of .V and
'
. V as well.

– If . f is an isomorphism, then . f −1 : V ' → V is an isomorphism, that is, . f −1


is also linear, as . f .
– . f injective .⇔ ker f = 0V and . f surjective .⇔ im f = V ' .
– If .w ∈ .im f , then for every .v0 ∈ f −1 (w), . f −1 (w) = v0 + ker f := {v0 +
u : u ∈ ker f }.
Furthermore, if we know what the dimension of a vector space is (see
Definition 3.7), we can also give a number for. f which we call rank:.rank( f ) :=
dim(im f ).

Linear maps themselves form vector spaces. The most prominent example is the dual
space of $V$, denoted by $V^*$. It contains all linear functions from $V$ to $\mathbb{K}$: $V^* :=
\mathrm{Hom}(V, \mathbb{K})$.

Definition 2.9 The dual space .V ∗ .


For a given vector space .V over .K, the dual space of .V is given by .V ∗ =
Hom(V, K).

This contains all homomorphisms, here all linear functions from.V to.K. We denote
the elements of .V ∗ preferentially by greek letters, for example .α, β, . . . , ξ, η, θ, . . .
and we thus have .V ∗ = {α, β, ξ, η, θ, . . .}. .ξ ∈ V ∗ is a linear function:

.ξ : V −→ K,
v |−→ ξ(v).

with $\xi(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1\xi(v_1) + \lambda_2\xi(v_2)$ for $\lambda_1, \lambda_2 \in \mathbb{K}$. $\xi$ is also called a linear
form or a linear functional. $V^*$ is of course a vector space.

Comment 2.6 On the connection between .V and .V ∗ .

Remarkably enough, $V^*$ has the same dimension as $V$. We consider only
finite-dimensional vector spaces throughout this book. It turns out that $V$ and
$V^*$ are isomorphic ($V \cong V^*$). But for a given abstract vector space $V$, no natural
isomorphism exists between $V$ and $V^*$. An isomorphism $\Phi : V \to V^*$ depends
on the chosen basis, and there exist many of them. Although $V$ and $V^*$ are
isomorphic, these two vector spaces are imperceptibly different. This causes a
lot of difficulties, especially in connection with the use of tensors in physics.
This is why we are going to deal with it later.
We could cure this problem by introducing a scalar product in the vector space
. V . A Euclidean or a semi-Euclidean vector space with more structure than an

abstract vector space allows to identify the two spaces .V and .V ∗ . But we can
only fully understand the above mentioned subtlety if we deal explicitly with
the dual space .V ∗ , in both cases we take .V without and with the inner product.
We must admit that we cannot understand tensors if we do not understand
broadly the role of the dual space .V ∗ . Therefore we do not avoid, as is usually
done in introductions to elementary theoretical physics, but on the contrary, we
underline the role of $V^*$ in the present book. As already mentioned, the role of
$V^*$ is important for understanding tensors.

2.3 Vector Spaces with Additional Structures

We now introduce the most prominent example of an additional structure in an


abstract vector space.

Definition 2.10 Inner products.


An inner product on .V is given by the function

(|) :
. V × V −→ K,
(w, v) |−→ (w | v).

It has the following properties:


(i) linearity in the second slot (argument):
.(w | u + v) = (w | u) + (w | v) and .(w | λv) = λ(w | v);

(ii) conjugate symmetry:


$(v \mid w) = \overline{(w \mid v)}$;

(iii) positive definiteness :


.(v | v) > 0 if .v / = 0.

In physics, we usually define linearity in the second slot, as above in (i), not in
the first one.
For .K = R, we call .(V, (|)) Euclidean vector space or real inner product space.
For .K = C, we call .(V, (|)) unitary vector space or complex inner product space or
Hilbert space (for finite dimensions). It corresponds usually to a finite-dimensional
subspace of the Hilbert space in quantum mechanics.
It follows from this definition by direct calculation,
(a) additivity in the first slot .(w + u | v) = (w | v) + (u | v);
(b) and .(λw | v) = λ̄(w | v).
For (a) we have from (i) and (ii)

$(w + u \mid v) = \overline{(v \mid w + u)} = \overline{(v \mid w) + (v \mid u)} = \overline{(v \mid w)} + \overline{(v \mid u)} = (w \mid v) + (u \mid v)$.

For (b) we have by analogy

$(\lambda w \mid v) = \overline{(v \mid \lambda w)} = \overline{(v \mid w)\lambda} = \overline{(v \mid w)}\,\bar{\lambda} = \bar{\lambda}(w \mid v)$.

The inner product $(\mid)$ is called symmetric in the case of an $\mathbb{R}$ vector space and Hermitian
in the case of a $\mathbb{C}$ vector space. As a result, we see altogether that the real inner
product is a positive definite symmetric bilinear form. The complex inner product is
analogously a positive definite Hermitian sesquilinear form. Note that
$(v \mid v)$ is real and nonnegative for a complex vector space too.
Bearing in mind the very important applications in physics, it is instructive to
discuss again and separately the situation for a real vector space. This leads to some
more definitions for the special case of .K = R where the scalars are real numbers.

Definition 2.11 The inner product on a .R-vector space .V .

$(\mid)$ is now a bilinear form, symmetric and positive definite. If we write
$\sigma(x, y) \equiv (x \mid y)$ for $x, y \in V$, this means:
(i) .σ is a bilinear form:

σ (x1 + x2 , y) = σ (x1 , y) + σ (x2 , y)


.

σ (x, y1 + y2 ) = σ (x, y1 ) + σ (x, y2 ).

Therefore .σ is linear in both slots;


(ii) .σ is symmetric:
.σ (x, y) = σ (y, x);

(iii) .σ is positive definite. Hence we have:


(a) .0 ≤ σ (x, x) and
(b) .σ (x, x) = 0 ⇔ x = 0.

This means:
(a) .σ (x, x) is never negative,
(b) .σ (x, x) is never zero if .x /= 0.
This leads to further definitions.

Definition 2.12 Non-degenerate bilinear form.


$\sigma$ is called nondegenerate in the second variable if $\sigma(x, y) = 0$ for
all $x \in V$ implies $y = 0$.
$\sigma$ is nondegenerate if it is nondegenerate in both variables.

From (b) it results that .σ is nondegenerate. Note that for nondegenerate bilinear
forms, .σ (x, x) can be negative.

Definition 2.13 Semi-Euclidean vector space.


.(V, σ ) with a nondegenerate, that is not necessarily positive definite .σ , is

called a semi-Euclidean vector space.

The use of inner products leads to different structures and properties for the vectors
in .V . The most important ones are the norm, the orthogonality, the Pythagorean the-
orem, the orthogonal decomposition, the Cauchy-Schwarz inequality, the triangular
inequalities, and the parallelogram equality.

Definition 2.14 Norm or length, .||v||.



For $v \in V$, the norm of $v$, denoted by $\|v\|$, is given by $\|v\| := \sqrt{(v \mid v)}$.

This leads to positive definiteness, $\|v\| = 0 \Leftrightarrow v = 0$, and to positive homogeneity,
$\|\lambda v\| = |\lambda|\,\|v\|$. Both are easily recognizable since for $v \in V$ we have
$\|v\|^2 = (v \mid v) = 0$ if and only if $v = 0$. Similarly, we have for $\lambda \in \mathbb{C}$,

$\|\lambda v\|^2 = (\lambda v \mid \lambda v) = \bar{\lambda}(v \mid \lambda v) = \bar{\lambda}(v \mid v)\lambda = \bar{\lambda}\lambda(v \mid v) = |\lambda|^2\|v\|^2$.

Definition 2.15 Orthogonality.


The vectors .u, v ∈ V are orthogonal if .(u | v) = 0.
It follows that .0 ∈ V is orthogonal to every vector .(0 | v) = 0.

Theorem 2.1 The Pythagorean theorem.


If $u$ and $v$ are orthogonal in $V$, that is, $(u \mid v) = 0$, then we have:

||u + v||2 = ||u||2 + ||v||2 .


.

Note that for a real vector space, if we have .||u + v||2 = ||u||2 + ||v||2 , it follows that
.u and .v are orthogonal:

.0 = ||u + v||2 − ||u||2 − ||v||2 = (u | v) + (v | u) = 2(u | v) ⇒ (u | v) = 0.

This means that .||u + v||2 = ||u||2 + ||v||2 ⇔ (u | v) = 0.


For a complex vector space, we obtain analogously

.||u + v||2 = ||u||2 + ||v||2 ⇔ (u | v) + (v | u) = 0 ⇔ Re(u | v) = 0.

Definition 2.16 Orthogonal projection and decomposition.


We consider $v, b \in V$ with $b \neq 0$. The orthogonal projection of $v$ onto $b$ is
given by

$P_b v = b\,\dfrac{(b \mid v)}{(b \mid b)}$.

We define $v_b := b\,\frac{(b \mid v)}{(b \mid b)}$, so $v = v_b + v_c$ with $v_c = v - v_b$. A simple calculation shows
that $(b \mid v_c) = 0$, so $v_c$ is orthogonal to $b$ (Fig. 2.2).
This leads to the orthogonal projection $P_b$:

$P_b : V \longrightarrow V$,
$v \longmapsto P_b(v) := b\,\dfrac{(b \mid v)}{(b \mid b)} \in \mathbb{K}b$.

A direct inspection shows that $P_b$ is a linear map and projects the vectors of $V$
orthogonally onto the one-dimensional subspace $\mathbb{K}b$. $P_b$, as any projection operator, is
idempotent:

$P_b^2 = P_b$.

Fig. 2.2 The orthogonal projection

This follows directly from the following: for all $v \in V$,

$P_b P_b v = P_b v_b = b\,\dfrac{(b \mid v_b)}{(b \mid b)} = \dfrac{b}{(b \mid b)}\,(b \mid v_b) = b\,\dfrac{1}{(b \mid b)}\,\dfrac{(b \mid v)}{(b \mid b)}\,(b \mid b) = b\,\dfrac{(b \mid v)}{(b \mid b)} = P_b v$.

This shows that $P_b P_b = P_b$.
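The same two properties, $(b \mid v_c) = 0$ and the idempotence of $P_b$, can be verified numerically; a minimal sketch (ours, assuming Python with NumPy; the vectors are arbitrary):

    # Sketch assuming Python with NumPy: orthogonal projection onto b
    # and its idempotence, using the dot product of R^3.
    import numpy as np

    b = np.array([2.0, 1.0, -1.0])
    v = np.array([1.0, 3.0, 0.5])

    def P_b(x):
        # P_b x = b (b|x)/(b|b)
        return b * (b @ x) / (b @ b)

    v_b = P_b(v)
    v_c = v - v_b
    print(np.isclose(b @ v_c, 0.0))            # True: (b | v_c) = 0
    print(np.allclose(P_b(P_b(v)), P_b(v)))    # True: P_b P_b = P_b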


At this point, it is interesting to observe that for the projection we may also write
$P_b v = |b)\,\dfrac{(b \mid v)}{(b \mid b)} = \dfrac{|b)(b|}{(b \mid b)}\,|v)$. This refers to the Dirac notation we use in quantum
mechanics (see Sect. 6.4).
If we start with a vector $e$ of norm $1$, that is, $\|e\| = 1$, we simply get

$P_e v = e\,(e \mid v)$.

If we write $e_b := \dfrac{b}{\|b\|}$, we obviously get $P_b = P_{e_b}$ since

$P_b v = e_b\,(e_b \mid v)$.

Note that up to now we used the orthogonal decomposition of the vector $v$ relative
to the vector $b \neq 0$! We may write

$\mathrm{id}_V\, v = v = v_b + v_c = P_b v + v_c = P_b v + (\mathrm{id}_V - P_b)v$

for all .v ∈ V . This leads to the very useful decomposition of the identity .idV into
the two projection operations:

idV = Pb + (idV − Pb ).
.

The existence of orthogonal projections is one of the most practical applications of


the inner product. See also Sect. 10.4.
As an example, we show this by the proof of the following proposition.

Proposition 2.1 The Cauchy-Schwarz inequality: .∀u, v ∈ V

. | (u | v) |≤ ||u||||v||.

The equality holds if and only if .u and .v are colinear.

Proof For the inequality, we assume that $u \neq 0$ and we define the projection as
$P_u(v) = u\,\frac{(u \mid v)}{(u \mid u)} =: v_u$ and $v_c := v - v_u$. Hence we have $v = v_u + v_c$. Since

$\|v\|^2 = \|v_u + v_c\|^2 = \|v_u\|^2 + \|v_c\|^2$,

it follows that $\|v\| \geq \|v_u\|$, such that

$\|v\| \geq \|v_u\| = \|u\|\,\left|\dfrac{(u \mid v)}{(u \mid u)}\right| = \|u\|\,\dfrac{|(u \mid v)|}{\|u\|^2} = \dfrac{|(u \mid v)|}{\|u\|}$.

That is why the Cauchy-Schwarz inequality simply states that an orthogonal projection
is not longer than the original vector. Furthermore, equality holds exactly if $\|v_c\| = 0$
(i.e., $v_u = v$), and then $v$ and $u$ are colinear.

Corollary 2.1 Triangular inequality


||u + v|| ≤ ||u|| + ||v||.
.

Proof This follows essentially from the Cauchy-Schwarz inequality.

$\|u + v\|^2 = (u + v \mid u + v) = (u \mid u) + (u \mid v) + (v \mid u) + (v \mid v) = \|u\|^2 + 2\,\mathrm{Re}(u \mid v) + \|v\|^2 \leq \|u\|^2 + 2\,|(u \mid v)| + \|v\|^2$.

The Cauchy-Schwarz inequality for .|(u | v)| leads furthermore to the inequality

||u + v||2 ≤ ||u||2 + 2||u||||v|| + ||v||2 = (||u|| + ||v||)2 .


.

Thus we have .||u + v|| ≤ ||u|| + ||v||. ∎
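Both inequalities are easy to check on concrete vectors; a quick numerical illustration (ours, assuming Python with NumPy; the random vectors are an arbitrary choice):

    # Sketch assuming Python with NumPy: Cauchy-Schwarz and triangle
    # inequalities for the standard inner product of R^3.
    import numpy as np

    rng = np.random.default_rng(1)
    u, v = rng.standard_normal(3), rng.standard_normal(3)

    print(abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v))             # True
    print(np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v))  # True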

2.3.1 Examples of Vector Spaces with a Scalar Product

Example 2.26 .(V, (|)) = R.


Here .V is the most simple nontrivial Euclidean vector space we can have. The
canonical or standard scalar product is given by .(α | β) := αβ and the norm
by .||α|| := |α| ≥ 0. Orthogonality leads to the following results:

.(α | β) = 0 ⇔ α = 0 or β = 0;
(α | α) = 0 ⇔ α = 0.

The Cauchy-Schwarz and triangular inequalities take the same form, as in


elementary analysis:

|(α | β)| ≤ |α| |β| and


.

|α + β| ≤ |α| + |β|.

Example 2.27 .(V, (|)σ ) = (R2 , (|)σ ).


With .(|)σ we denote a family of scalar products, like symmetric nondegenerate
bilinear forms that make.V a Euclidean, in (i) and (ii), or semi-Euclidean vector
space, in (iii).
(i) For .R2 , with the canonical or standard inner product .(|), that is, a positive
definite scalar product, we have:
If we set $x = \vec{\xi} = \begin{bmatrix} \xi^1 \\ \xi^2 \end{bmatrix} = (\xi^s)$, $y = \vec{\eta} = \begin{bmatrix} \eta^1 \\ \eta^2 \end{bmatrix} = (\eta^r)$ with $s, r \in I(2) = \{1, 2\}$ and $\xi_s = \xi^s$, $\eta_r = \eta^r \in \mathbb{R}$, the transpose $T$ is given by:

$V \stackrel{T}{\longleftrightarrow} V^*$,
$x = \vec{\xi} = \begin{bmatrix} \xi^1 \\ \xi^2 \end{bmatrix} \longmapsto x^T = \underset{\sim}{\xi} := [\xi^1\ \xi^2] = [\xi_1\ \xi_2]$,
$\underset{\sim}{\xi} = [\xi_1\ \xi_2] \longmapsto \underset{\sim}{\xi}^{\,T} = \begin{bmatrix} \xi^1 \\ \xi^2 \end{bmatrix}$.

At this point, we introduced a new symbol for good reasons, $\underset{\sim}{\xi} \in \mathbb{R}^{1\times n}$.
We have to make a difference in our notation between the elements of
$\mathbb{R}^2$ and the elements of $(\mathbb{R}^2)^*$. We therefore write

$x = \vec{\xi} = \begin{bmatrix} \xi^1 \\ \xi^2 \end{bmatrix} \in \mathbb{R}^2$

and

$x^T = \underset{\sim}{\xi} = [\xi_1\ \xi_2] \in (\mathbb{R}^2)^*$.

Furthermore, in order to use the Einstein summation convention, we
denote the coefficients of the linear forms (covectors) by writing the index
downstairs: for $\xi \in (\mathbb{R}^2)^*$ we write $\xi = \underset{\sim}{\xi} = [\xi_1\ \xi_2]$. Consequently, $\underset{\sim}{\xi}$
always corresponds to a row and $\vec{\xi}$ to a column vector in $\mathbb{R}^2$, and similarly
for $(\mathbb{R}^n)^*$ and $\mathbb{R}^n$.

It is obvious that $T$ transforms columns to rows and rows to columns. This
also demonstrates the vector space isomorphism between $\mathbb{R}^n$ and $(\mathbb{R}^n)^*$.
Thus we get:

$x \longmapsto x^T \longmapsto (x^T)^T = x$ and
$\underset{\sim}{\xi} \longmapsto \underset{\sim}{\xi}^{\,T} \longmapsto (\underset{\sim}{\xi}^{\,T})^T = \underset{\sim}{\xi}$.

This shows that $T$ is an involution: $T^2 = \mathrm{id}$.

With the above preparation, we can express the canonical inner product for
$n = 2$ in various forms, as we saw, for example with $x = \vec{\xi}$ and $y = \vec{\eta}$,
where we have:

$\underset{\sim}{\xi}\, y = x^T y = (x \mid y) = (\vec{\xi} \mid \vec{\eta}) = \sum_{s=1}^{2}\xi^s\eta^s = \xi^s\delta_{sr}\eta^r = \xi_s\eta^s$.

The symbol $\delta^r_s$, the Kronecker symbol, is given by

$\delta^r_s \equiv \delta_{sr} := \begin{cases} 1 & \text{if } s = r \\ 0 & \text{if } s \neq r. \end{cases}$

Using the transpose $T$ and the explicit symbols for columns and rows,
$x = \vec{\xi}$, $x^T = \underset{\sim}{\xi}$ and $\underset{\sim}{\xi}^{\,T} = x$, we may again write for $(\mid)$:

$(x \mid y) = x^T \mathbb{1}\, y = x^T y = \underset{\sim}{\xi}\,\vec{\eta}$ with $\mathbb{1} = (\delta_{sr}) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.

Here, in $V = \mathbb{R}^2$, orthogonality is no longer trivial, as it is in $\mathbb{R}$. For a given
subspace $U \leq V$, the orthogonal space (perpendicular space) $U^\perp$ relative to $U$
is given by

$U^\perp = \{v \in V : (v \mid u) = 0 \text{ for all } u \in U\}$.

If we take for example $v_0 \in V$, $v_0 \neq 0$, we can define the subspace

$U_0 = v_0\mathbb{R} := \{u : u := \lambda v_0, \lambda \in \mathbb{R}\} \leq V$.

The orthogonal space $U_0^\perp$, as shown in Fig. 2.3, is given in this case by a
$w \in V$, $w \neq 0$, with $(w \mid v_0) = 0$. So the subspace $U_0^\perp$ is equal to $w\mathbb{R}$ and
we may write

$V = U_0 + U_0^\perp$.

Fig. 2.3 Orthogonality in .R2


(ii) $\mathbb{R}^2$ with an inner product given by the matrix $S = (\sigma_{sr}) = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}$ with $\sigma_1$
and $\sigma_2$ positive:

$(x \mid y)_\sigma := x^T S y = \xi^s\sigma_{sr}\eta^r$.

$(V, (\mid)_\sigma)$ is again a Euclidean vector space. All aspects discussed in example (i) apply one to one here as well.
(iii) $\mathbb{R}^2$ with a symmetric nondegenerate bilinear form given, for example,
by the matrix $S = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}$, with $\sigma_1$ positive and $\sigma_2$ negative. We can
assume that $\sigma_1 = 1$ and $\sigma_2 = -1$. We get $S = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.
$(V, (\mid)_\sigma)$ is a semi-Euclidean vector space. This is our first model for
the two-dimensional spacetime of special relativity: in other words, it
is the vector space that corresponds to the two-dimensional Minkowski
spacetime.
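The difference between cases (ii) and (iii) is easy to see numerically; a sketch (ours, assuming Python with NumPy; the matrices and the test vector are arbitrary choices):

    # Sketch assuming Python with NumPy: (x|y)_sigma = x^T S y for a positive
    # definite S (Euclidean) and for the Minkowski-type S = diag(1, -1),
    # which is nondegenerate but not positive definite.
    import numpy as np

    def form(S, x, y):
        return x @ S @ y

    S_euclid = np.diag([2.0, 3.0])     # sigma_1, sigma_2 > 0
    S_mink   = np.diag([1.0, -1.0])    # sigma_1 = 1, sigma_2 = -1

    x = np.array([1.0, 2.0])
    print(form(S_euclid, x, x))        # 14.0 > 0
    print(form(S_mink, x, x))          # -3.0: (x|x)_sigma can be negative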

Remark 2.6 The angle.

In this example, we find a very important application of the Cauchy-Schwarz


inequality which allows the definition of an angle. If we take $u, v \in \mathbb{R}^2$, $u$ and
$v$ not zero, the Cauchy-Schwarz inequality $|(u \mid v)| \leq \|u\|\,\|v\|$ leads to

$0 \leq \dfrac{|(u \mid v)|}{\|u\|\,\|v\|} \leq 1$.

As a result, we obtain

$-1 \leq \dfrac{(u \mid v)}{\|u\|\,\|v\|} \leq 1$.

This allows the unique determination of a real number $\varphi \in [0, \pi]$, the angle
between the two vectors $u, v \in \mathbb{R}^2 - \{0\}$:

$\cos\varphi := \dfrac{(u \mid v)}{\|u\|\,\|v\|}$.

It is clear that this definition applies to any Euclidean vector space .V , for any
two vectors .u, v ∈ V − {0}.
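A short sketch of this definition in use (ours, assuming Python with NumPy; the two vectors are arbitrary):

    # Sketch assuming Python with NumPy: the angle between two nonzero
    # vectors, well defined thanks to the Cauchy-Schwarz inequality.
    import numpy as np

    u = np.array([1.0, 0.0])
    v = np.array([1.0, 1.0])
    cos_phi = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    phi = np.arccos(cos_phi)
    print(np.degrees(phi))   # 45.0 (approximately)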

Example 2.28 .(V, (|)) = (R2×2 , (|) M ).


For the two matrices

$A = \begin{bmatrix} \alpha^1_1 & \alpha^1_2 \\ \alpha^2_1 & \alpha^2_2 \end{bmatrix} = (\alpha^i_s)$ and $B = (\beta^i_s)$, $i, s \in I(2) = \{1, 2\}$,

an inner product is given by

$(A \mid B)_M := \alpha^1_1\beta^1_1 + \alpha^1_2\beta^1_2 + \alpha^2_1\beta^2_1 + \alpha^2_2\beta^2_2$.

If we define the trace of $A$ by

$\mathrm{tr}\,A := \alpha^1_1 + \alpha^2_2 = \sum_{i=1}^{2}\alpha^i_i \equiv \alpha^i_i \in \mathbb{R}$

and write for the transpose of $A$: $A^T = (\alpha(T)^s_i)$ with $\alpha(T)^s_i = \alpha^i_s$, the inner
product takes the form:

$(A \mid B)_M := \sum_{i,s}\alpha^i_s\beta^i_s = \alpha(T)^s_i\beta^i_s = \mathrm{tr}(A^T B)$.

The corresponding norm is given by

$\|A\|_M := (\mathrm{tr}\,A^T A)^{1/2} = \left(\sum_{i,s}(\alpha^i_s)^2\right)^{1/2}$.

It is easy to recognize the isomorphism

$(\mathbb{R}^{2\times 2}, (\mid)_M) \cong (\mathbb{R}^4, (\mid))$

and to see that .(|) M is indeed an inner product in .V = R2×2 . It is also obvious
that all the above relations may be extended to every .n ∈ N. The space .(Rn×n ,
(|) M ), of .n × n matrices, is also a Euclidean vector space.
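Both the trace formula and the isomorphism with $\mathbb{R}^4$ can be illustrated numerically; a sketch (ours, assuming Python with NumPy; the two matrices are arbitrary):

    # Sketch assuming Python with NumPy: (A|B)_M = tr(A^T B) equals the
    # entrywise sum and the dot product of the flattened matrices.
    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    B = np.array([[5.0, 6.0], [7.0, 8.0]])

    ip_trace = np.trace(A.T @ B)
    ip_sum   = np.sum(A * B)
    ip_flat  = A.flatten() @ B.flatten()
    print(ip_trace, ip_sum, ip_flat)   # 70.0 70.0 70.0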

Example 2.29 . L 2 (R).


It is interesting to consider the next example, taken from analysis, even if it
corresponds to an infinite dimension vector space that we do not discuss in this
book. We denote by . L 2 (R) the vector space of the class of square integrable
functions in .R, as used in quantum mechanics. Without going into detail, we
can give
$(f \mid g) := \int_{\mathbb{R}} f(x)g(x)\,dx$, $f, g \in L^2(\mathbb{R})$,

and

$\|f\|_2 := \sqrt{\int_{\mathbb{R}} f(x)^2\,dx}$.

Comment 2.7 Bilinear forms in .V × W and .V ∗ × V .

The definition of a bilinear form.ϕ applies also when we consider two different
vector spaces, .V and .W :

ϕ : V × W −→ R,
.

(v, w) |−→ ϕ(v, w) ∈ R.

The following naturally given bilinear form is a very instructive and remarkable
example. We take .V and its dual .V ∗ and denote the bilinear form by the symbol
.(, ):

$(\,,\,) : V^* \times V \longrightarrow \mathbb{R}$,
.

(ξ, v) |−→ (ξ, v) := ξ(v) ∈ R.

Note that in this case, we write .(, ) and not .(|) as for the inner product.

We complete this section by presenting a few characteristic examples which now


refer to .C vector spaces.

Example 2.30 .(V, (|)) = C.


For the canonical inner product .(|), we use the complex conjugation, as usual.
If .z, w ∈ C, z = x + i y |−→ z̄ = x − i y, x, y ∈ R, we have

(z | w) := z̄w.
.

The norm takes the form $\|z\| := |z| = (\bar{z}z)^{1/2}$ and the Cauchy-Schwarz
inequality is given by $|\bar{z}w| \leq |z|\,|w|$.

Example 2.31 .(V, (|)) = (C2 , (|)).


For the canonical inner product $(\mid)$, we have to use the Hermitian conjugate
of $v \in \mathbb{C}^2$, given by $v^\dagger := \bar{v}^T$. More explicitly, this means that $v = \begin{bmatrix} v^1 \\ v^2 \end{bmatrix}$
with $v^1, v^2 \in \mathbb{C}$ is by definition a column, and $v^\dagger = [\bar{v}_1\ \bar{v}_2]$ a row (with
$\bar{v}_1 = \overline{v^1}$, $\bar{v}_2 = \overline{v^2}$). The norm is given by

$\|v\| = (v^\dagger v)^{1/2} = (\bar{v}_1 v^1 + \bar{v}_2 v^2)^{1/2} = (\bar{v}_s v^s)^{1/2}$, $s \in I(2)$.

Example 2.32 $(V, (\mid)) = (\mathbb{C}^{2\times 2}, (\mid)_M)$.


For $A, B \in \mathbb{C}^{2\times 2}$, the standard inner product $(\mid)_M$ is given by

$(A \mid B)_M := \mathrm{tr}(A^\dagger B)$ and the norm by $\|A\|_M := (\mathrm{tr}(A^\dagger A))^{1/2}$.
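A sketch of these Hermitian inner products in Examples 2.31 and 2.32 (ours, assuming Python with NumPy; the vectors and the matrix are arbitrary; note that NumPy's vdot conjugates its first argument, matching the physics convention of linearity in the second slot):

    # Sketch assuming Python with NumPy: complex inner products; (v|v) and
    # (A|A)_M are real and nonnegative.
    import numpy as np

    v = np.array([1.0 + 2.0j, 3.0 - 1.0j])
    w = np.array([0.5j, 2.0])
    print(np.vdot(v, w))                 # v^dagger w (first argument conjugated)
    print(np.vdot(v, v).real)            # ||v||^2 = 15.0, imaginary part is 0

    A = np.array([[1.0 + 1.0j, 0.0], [2.0, 1.0j]])
    print(np.trace(A.conj().T @ A).real) # ||A||_M^2, real and nonnegative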

Remark 2.7 Analogy to vector spaces over .C.


All the above relations can undoubtedly be extended from every .Rn to .Cn .
Using the simplest possible models allowed us to clarify the notation we intend
to use systematically throughout the book.

2.4 The Standard Vector Space and Its Dual

If we consider an abstract .n-dimensional vector space, it has only the linear structure.
We can of course introduce other structures, such as a volume form (see Sect. 7.5)
or an inner product, or even a particular basis. The standard vector space .Kn has all
of these structures. Therefore, we can say that the vector space .Kn has the maxi-
mum structure an .n-dimensional vector space can have! The reader should already
be familiar with .Rn from a first course in analysis. In Sect. 2.1, Example 2.3, we
introduced some elements of this section for the case .n = 2, so our discussion here
can be considered as a review thereof or as an extension to general .n ∈ N .
As discussed in Example 2.4, .Kn is the set of all finite sequences of numbers with
length .n which we may also call .n-tuple or list of length or size .n. We regard this
$n$-tuple as a column. Any $\mathbb{K}^n$ is given by

$\mathbb{K}^n = \left\{ x = \begin{bmatrix} \xi^1 \\ \vdots \\ \xi^j \\ \vdots \\ \xi^n \end{bmatrix} : \xi^j \in \mathbb{K},\ j \in I(n) := \{1, 2, \ldots, n\} \right\}$.

We also use the notation .x = (ξ j )n ≡ (ξ j ) ≡ ξ→ and we omit the index .n when its
value is clear. We may say that .ξ j is the . jth coordinate or the . jth coefficient or the
$j$th component of $x \in \mathbb{K}^n$.

If we apply the standard addition and scalar multiplication by adding and multi-
plying the corresponding entries, the set .Kn is indeed a vector space since it fulfills
all the axioms of the definition of a vector space. .Kn is the standard vector space; it is
the model of a vector space with dimension .n, and, as is well-known from analysis,
it is locally also the model of a manifold!
It is evident that .Kn can be identified with the .n × 1-matrices with entries in .K.
The greatest advantage of $\mathbb{K}^n$ is its canonical basis:

$E := (e_1, \ldots, e_n)$ with $e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \ldots, e_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$.

Matrices are closely connected with linear maps. It is well-known that a matrix. F with
elements in .K, may, if we wish, describe a linear map . f as a matrix multiplication.
This was also demonstrated in Example 2.23. Here we use the letter. F to demonstrate
the close connection between a matrix and a map, and in our mind we may always
identify . f with . F:

. f : Kn −→ Km ,
x |−→ f (x) := F x.

This means that we have . F ∈ Km×n and . f ∈ Hom(Kn , Km ). Usually, the .m × n-


matrix $F$ is denoted by $F = (\varphi_{ij})$ with $\varphi_{ij} \in \mathbb{K}$, $j \in I(n) := \{1, \ldots, n\}$, $i \in I(m) :=
\{1, \ldots, m\}$, and we have $z := f(x)$ and $z^i = \sum_{j=1}^{n}\varphi_{ij}x^j$, $z^i, x^j \in \mathbb{K}$, as usual in
literature. In special relativity, as in tensor analysis, we actually wish for the type
of relevant transformation we may perform to be indicated by the position of the
indices (up or down). We therefore put initially .ϕ ij := ϕi j . In addition, it is a great
advantage to use the Einstein convention for the summation which is almost implied
by this notation. Therefore we first write . F = (ϕ ij ) and we have

. z i = ϕ ij x j . (2.7)

However, this is not enough for us. Since we always use greek letters for scalars, we
set .x = (ξ j )n ≡ ξ→ , j ∈ I (n), z = (ζ i )n ≡ ζ→ , i ∈ I (m). In addition, we change
the index . j into the indices .s, r ∈ I (n) and we write .x = (ξ s )n . In this book, we

systematically use this kind of indices and notation, and we may call this “Smart
Indices Notation”. Hopefully, the reader will soon realize the usefulness of this
notation. We therefore write for the Eq. 2.7

ζ i = ϕsi ξ s .
. (2.8)

We can verify that . f is indeed a linear map in both notations. The matrix . F = (ϕsi )
contains all the information about the map . f .
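The component formula (2.8) is exactly a matrix-vector product, and the Einstein summation can be spelled out explicitly; a sketch (ours, assuming Python with NumPy; the matrix and vector are arbitrary):

    # Sketch assuming Python with NumPy: zeta^i = phi^i_s xi^s of Eq. (2.8)
    # written once with an explicit sum over s (einsum) and once as F @ xi.
    import numpy as np

    F  = np.array([[1.0, 2.0, 0.0],
                   [0.0, 1.0, -1.0]])     # F in R^{2x3}, so f: R^3 -> R^2
    xi = np.array([1.0, 2.0, 3.0])

    zeta_einsum = np.einsum('is,s->i', F, xi)     # sum over the repeated index s
    zeta_matmul = F @ xi
    print(np.allclose(zeta_einsum, zeta_matmul))  # True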
Before proceeding further, we would like to remark that we restrict ourselves to
$\mathbb{R}^n$ for simplicity, and because there are direct, relevant connections to
Euclidean geometry, especially for $n = 3$, and to special relativity ($n = 4$). We must
mention again that we consider the elements of $\mathbb{R}^n$ as columns. But nevertheless,
we need the rows too! If we take the dot product and the matrix multiplication into
account, and also take .ξ s = ξs and .ηs = ηs , we have:

$(\mid) : \mathbb{R}^n \times \mathbb{R}^n \longrightarrow \mathbb{R}$,
$(x, y) \longmapsto (x \mid y) := \sum_{s=1}^{n}\xi^s\eta^s = \xi_s\eta^s$,

we may interpret rows ($1 \times n$-matrices) with scalar entries as elements of the dual
space of $\mathbb{R}^n$ and write:

$(\mathbb{R}^n)^* := \left\{ \xi^* \equiv \underset{\sim}{\xi} = [\xi_1 \ldots \xi_n] : \xi_i \in \mathbb{R},\ i \in \{1, 2, \ldots, n\} \right\}$.

We used the matrix notation $\underset{\sim}{\xi} = [\xi_1\ \xi_2 \ldots \xi_n] \in \mathbb{R}^{1\times n}$ and not the notation for the
corresponding (horizontal) list $(\xi_1, \xi_2, \ldots, \xi_n)$ (see Comment 2.1). In order to use the
Einstein convention, we have to write the indices of the coefficients of $\underset{\sim}{\xi}$ downstairs
and we get, with $y = (\eta^s) = \vec{\eta}$,

$\xi : \mathbb{R}^n \longrightarrow \mathbb{R}$,
$y \longmapsto \xi(y) = \xi_s\eta^s = [\xi_1 \cdots \xi_n]\begin{bmatrix} \eta^1 \\ \vdots \\ \eta^n \end{bmatrix}$.

The Einstein convention is the simplest and nicest way to express matrix multi-
plications. The same holds for linear combinations. After all, both operations are
essentially the same thing. Therefore we can write symbolically in an obvious notation:

$[\,*\ *\ \cdots\ *\,]\begin{bmatrix} \cdot \\ \cdot \\ \vdots \\ \cdot \end{bmatrix} = [\,*\cdot + *\cdot + \cdots + *\cdot\,]$   (2.9)

which is
. [1 × n][n × 1] = [1 × 1].

Remark 2.8 A generalization of matrix multiplication.

The above Eq. 2.9 is nothing else but the usual row by column rule for the
multiplication of matrices. We assumed tacitly that between the entries “.∗” and
“..”, the multiplication and correspondingly the addition are already defined.
At the same time, with this symbolic notation, we also generalized the matrix
multiplication, even in the case where the entries are more general objects than
scalars. This means for example that if the entries “.∗” and “..” are themselves
matrices, we obtain also the matrix multiplication of block matrices.

We use the row matrix .[ξ ] ∈ R1×n to express the linear map

.ξ ∈ Hom(Rn , R) = (Rn )∗ .

This leads to the natural identification of.R1×n with.(Rn )∗ . Thus we can call.ξ ∈ (Rn )∗
a linear function, a linear form, a linear functional or a covector, and we may also
write:

$\xi \equiv \underset{\sim}{\xi} \equiv [\xi] \equiv (x\,| : \mathbb{R}^n \longrightarrow \mathbb{R}$,
$y \longmapsto \xi(y) = [\xi]\,y = (x \mid y)$.

(x |∈ (Rn )∗ corresponds to the Dirac notation and is widely used in quantum mechan-
.
ics (see also Sect. 6.4).
Furthermore, if we use the transpose, we set .ξi = ξ i and we have

. T : Rn −→ (Rn )∗ ,
x |−→ x T = [ξ1 . . . ξn ].
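A sketch of a covector acting as a row on a column (ours, assuming Python with NumPy; the columns are arbitrary):

    # Sketch assuming Python with NumPy: a covector as a 1 x n row acting on
    # a column by matrix multiplication, with the transpose relating R^n
    # and (R^n)^*.
    import numpy as np

    x = np.array([[1.0], [2.0], [3.0]])   # a column, x in R^{3x1}
    y = np.array([[4.0], [0.0], [-1.0]])

    xi = x.T                              # the row x^T in R^{1x3}
    print(xi @ y)                         # [[1.]]: xi(y), a 1x1 "matrix"
    print((x.T @ y).item())               # the same number, the dot product (x|y)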

Comment 2.8 The use of the transpose .T.

We can connect .Rn with .(Rn )∗ and in addition we can, if we want, identify
n ∗
.(R ) with .R . This is an identification we are often doing in physics from the
n

beginning without even mentioning the duality behind it.


However, a note of caution is in order here. In the case of an abstract vector
space .V , we cannot ignore .V ∗ because there is no basis-independent connection
between .V and .V ∗ . The situation changes drastically if we introduce an inner
product in .V . We see this immediately in the case of .Rn since the canonical
inner product (dot product as above) already exists here. We have altogether
$y^T(x) = y^T x = (y \mid x)$. It is clear that here $\xi^i = \xi_i \in \mathbb{R}$ and we write

$y \equiv |y) \longmapsto (y| \equiv \underset{\sim}{y} \equiv y^T \in (\mathbb{R}^n)^*$.

See also the Dirac notation in Sect. 6.4.

2.5 Affine Spaces

In the previous sections, we started with an abstract vector space .V given only by
its definition, and we introduced the inner product as an additional structure on
it. Besides its geometric signification, this other structure facilitates the formalism
considerably within linear algebra, which is very pleasant for applications in physics
too.
On the other hand, removing some of its structure is necessary when starting with
. V . It is essential to consider the vector space . V as a manifold. This special manifold
is also called linear manifold or affine space. The well-known Euclidean space is
also an affine space, but its associated vector space is an inner product vector space,
a Euclidean vector space. An affine space has two remarkable properties: On the one
hand, it is homogeneous and on the other hand, it contains just enough structure for
straight lines to exist inside it, just as every $\mathbb{R}^n$ does. Similarly, any vector space
has the same capacity to contain straight lines. But at the same time, a vector space
is not homogeneous since its most important element, the zero, is the obstacle to
homogeneity. To obtain the associated affine space starting from a vector space, we
have to ignore the unique role of its zero, and to ignore that we can add and multiply
vectors by scalars. We obtain an affine space with the same elements as the vector
space we started with and which we now may call points. The action of a vector
space .V gives the precise procedure for this construction on a set . X , precisely in the
sense of group actions in Sect. 1.3. The group which acts on . X is the abelian group
of .V :

Definition 2.17 Affine space.


An affine space . X associated to the vector space .V is a triple .(X, V, τ ).We
may call the elements of . X points and the elements of .V vectors or translations
(using also the notation .V = T (X )) with

.τ : V × X −→ X,
(v, x) |−→ τ (v, x) = τv (x) = x + v.

The action .τ of .V on . X is free and transitive.

From this it follows, according to Comment 2.2 in Sect. 2.1, that $V$ and $X$ have
the same cardinality or, in simple words, the same “number” of elements: for
$x_0 \in X$, we have $V \times \{x_0\} \to X$, and the map

$V \longrightarrow X$,
$v \longmapsto x_0 + v$,

is bijective.
We can give another equivalent and perhaps more geometric description of an
affine space, with essentially the following property:

. X a set, T (X ) the associated vector space and Δ the map:

$\Delta : X \times X \longrightarrow T(X)$,
$(x_0, x_1) \longmapsto \overrightarrow{x_0 x_1}$,

such that for $x_0, x_1, x_2 \in X$, the triangle equation $\overrightarrow{x_0 x_2} = \overrightarrow{x_0 x_1} + \overrightarrow{x_1 x_2}$ holds. This
means that two points $x_0$ and $x_1$ in $X$ determine an arrow $\overrightarrow{x_0 x_1} \in T(X)$ which we
may also interpret as a translation. We may consider the straight line through the
points .x0 and .x1 as an affine subspace of . X . Here, we see explicitly that an affine
space can contain straight lines. It is interesting to notice that for every .x0 ∈ X , we
have the set
$T_{x_0}X = \{x_0 + \overrightarrow{x_0 x},\ x \in X\}$,

which is the tangent space of . X at the point .x0 . In this sense, we may regard .T (X ) as
the universal tangent space of . X and the letter .T could mean not only “translation”
but also “tangent” space.
The above construction is equally valid if we start with a vector space $V$ ($X = V$).
The affine space is now the triple $\mathbb{V} := (V, T(V), \tau)$. In this case, the arrow $\overrightarrow{v_0 v_1}$ is
given by the difference $\overrightarrow{v_0 v_1} = v_1 - v_0$, and the vector space $V$ is now considered
as an affine space! See also Comment 2.2 in Sect. 2.1.

Starting with a vector space .V allows to give another, more direct definition of
an affine space in .V , considered as a subset, not a subspace, of .V . This construction
is more relevant to linear algebra. It also leads to a new kind of very useful vector
spaces, the quotient vector spaces which are discussed in Sect. 2.6.

Definition 2.18 Affine space in .V .


.U is a vector subspace of . V . For a given .v0 ∈ V , an affine space . A =
A(v0 ) ⊂ V , associated to .U , is given by the set

$A = v_0 + U := \{v_0 + u : u \in U\}$.

As we saw in Sect. 1.3, we may also write . A(v0 ) = U v0 , that means that the affine
space . A(v0 ) is exactly the orbit of the action .U on the vector .v0 . Here, the action of
.U on .v0 is the additive action of the commutative subgroup .U (of the commutative
group .V ). This justifies the notation . A(v0 ) = U v0 = {u + v0 = v0 + u : u ∈ U } =
U + v0 .

Example 2.33 Affine spaces as solutions of linear equations.


Given a linear map
. f : V −→ V ' ,

affine spaces appear very naturally as solutions of a linear equation . f (x) = w0


or equivalently as a preimage (or fiber) of .w0 . As discussed in Remarks 2.4
and 2.5 in Sect. 2.3, .ker f and .im f are subspaces of .V and .V ' respectively.
The preimage of .w0 is given by . f −1 (w0 ).
For . f (v0 ) = w0 with .w0 /= 0, we may write, as above, taking .U := ker f

. A(v0 ) = f −1 (w0 ) = v0 + U.

In Fig. 2.4, we see the corresponding affine spaces associated to the linear map
.f . They are all parallel to each other and have the same dimension.
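This structure of a solution set is easy to exhibit numerically; a sketch (ours, assuming Python with NumPy; the matrix, the right-hand side, the particular solution and the kernel vector are arbitrary choices verified in the code):

    # Sketch assuming Python with NumPy: the solution set of f(x) = A x = w0
    # is the affine space v0 + ker A, a particular solution shifted by the kernel.
    import numpy as np

    A  = np.array([[1.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0]])
    w0 = np.array([1.0, 2.0])

    v0 = np.array([-1.0, 2.0, 0.0])   # a particular solution: A v0 = w0
    u  = np.array([1.0, -1.0, 1.0])   # a kernel element:      A u  = 0

    print(np.allclose(A @ v0, w0), np.allclose(A @ u, 0.0))   # True True
    for t in (-2.0, 0.5, 3.0):
        print(np.allclose(A @ (v0 + t * u), w0))              # True for every t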

2.6 Quotient Vector Spaces

We know from Sect. 1.2, in quite general terms, what a quotient space is. We now
have the opportunity to discuss a very important application which is probably the
most important example of a quotient space in linear algebra. In what follows, we

Fig. 2.4 Affine subspaces in .V relative to the subspace .U = ker f < V with .w = f (v), w1 =
f (v1 ), w0 = f (v0 )

would like to stay within the vector space .V , we use the notation of Sect. 2.5. We
consider the set of all such affine subspaces associated with a given subspace .U of
. V for all points .v ∈ V :
. A(U ) := {A(v) : v ∈ V }.

As shown in Sect. 2.5, all these affine subspaces are parallel and have the same
dimension. Here, it is also intuitively clear, as mentioned in Remark 1.1 in Sect. 1.2,
that this gives us a disjoint decomposition of the vector space .V .
It turns out that we can introduce a vector space structure on the set . A(U ) and
that we may thus obtain a new vector space out of .V and the given subspace .U .
Therefore we can talk about a vector space . A(U ) with elements (again vectors of
course) which are the affine spaces in .V associated with .U . The elements of . A(U )
are by construction equivalence classes or cosets. Every such class is in this case also
a vector since, as we will see below, we can introduce naturally a linear structure on
. A(U ). This makes . A(U ) a vector space and we denote this vector space by . V /U . As
sets, . A(U ) and .V /U are bijectively connected. . A(U ) is simply a set as was defined
above, and .V /U is this set with the vector space structure. Clearly, this new vector
space itself cannot be a subspace of .V , but we stay, in a more general mathematical
language, in the vector space category.
Another way to obtain similar new vector spaces, is to use the notion of equivalence
classes as discussed in Sect. 1.2. We notice that the subspace $U$ induces an equivalence
relation on $V$:

$v \sim v' :\Leftrightarrow v' - v \in U$.

The coset [.v] of this equivalence class of .v is exactly the additively written orbit of
the .U action on .v and we have for this orbit

[v] = U v = A(v) = v + U.
.

The set of cosets of .U in .V is given by

. V /U := {v + U : v ∈ V } (2.10)

which we may call the quotient space of .V modulo .U . It is evident that a bijection
of sets $A(U) \cong_{\mathrm{bij}} V/U$ holds.
Furthermore, we are now in the position to use the formalism of Sect. 1.2 to
introduce a vector space structure on the set .V /U = {[v]}. Consequently, in the end,
we will be in the position to add and scalar multiply the affine subspaces, the classes
or cosets, associated with .U . This means for example that for a given subspace .U
with .dim = 1, all straight lines parallel to .U behave exactly as vectors of a vector
space. This also means that, in general, at the end of our construction, we may expect
to have
. V /U ∼= Km ,

and as we will see, with .m = dim V − dim U .


In the two following equivalent figures, we represent the various affine subspaces
of $\mathbb{R}^2$ relative to a given subspace $U$ ($U \leq \mathbb{R}^2$) and the quotient space $V/U$. Our
figures should be self-explanatory, with $A(v) = v + U = [v]$.
The $x$- and $y$-axes are denoted by $\mathbb{R}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}\mathbb{R}$ and $\mathbb{R}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\mathbb{R}$. $A(v), A(v'), A(u_0)$ are
elements of $V/U$, and $U = \mathbb{R}u_0 = A(u_0) = [0]$.
Keeping it general, we may also represent Fig. 2.5 symbolically as in Fig. 2.6.
Note that

. A(v1 ) = π −1 (π(v1 )) = π −1 ([v1 ]),



A(v0 ) = π −1 ([0]),
A(v) = π −1 (π(v)) = π −1 ([v]).

Since all these spaces are parallel and have the same dimension, and are elements of
the set . A(U ) = V /U , it seems quite natural to look at the concrete space .V /U and
try to define the addition and, similarly, the scalar multiplication in the following
way:

$[v] \mathbin{\dot{+}} [w] := (v + U) + (w + U) = v + w + U$,
$\lambda \cdot [v] := \lambda(v + U) := \lambda v + U$.

We have only to make sure that these operations are well-defined. That is for .[v ' ] =
[v] and .[w ' ] = [w], we have .v ' + w ' + U = v + w + U , and .λv + U = λv ' + U .
Taking into account the definition of the equivalence class (see Definition 1.2), we
see immediately that this is indeed the case and that .V /U is a concrete vector space.
Since we may write the equivalence relation

v ∼ v ' ⇔ v ' − v ∈ U ⇔ v ' = v + u, u ∈ U ⇔ [v ' ] = v + U,


.

Fig. 2.5 The vector space .R2 with the .x- and . y-axes and the affine spaces parallel to .U = u R

Fig. 2.6 Figure 2.5 symbolically

and equally with .w and .w ' , we get

v ' + w ' + U = v + U + w + U + U = v + w + U.
.

This shows that

(v ' + w ' ) ∼ (v + w) ⇔ [v ' + w ' ] = [v + w] ⇔ [v]+. [w] = [v ' ]+. [w ' ].


.

Similarly,

$\lambda v' + U = \lambda(v + U) = \lambda v + \lambda U + U = \lambda v + U$.

This shows that

λv ' ∼ λv ⇔ [λv ' ] = [λv] ⇔ λ · [v ' ] = λ · [v].


.

It is interesting to notice that the natural map (canonical map) .π ,

π : V −→ V /U, v |−→ π(v) := [v],


.

is a linear vector space homomorphism. It is evident since we may write

. π(v + w) = [v + w] = [v]+. [w] = π(v)+. π(w)


and
π(λv) = λ[v] = λ · π(v).

Hence, we may revert to use “.+” instead of “.+. ”:

. [v] + [w] := [v + w] and λ[v] := [λv].

We herewith established the following very interesting proposition:

Proposition 2.2 Quotient vector space .V /U .


Let .V be a vector space and .U < V , then the quotient space .V /U = {a +
U : a ∈ V } is a vector space with addition and scalar multiplication as follows:

. V /U × V /U −→ V /U,

(a + U, b + U ) |−→ (a + b) + U and
K × V /U −→ V /U,
(λ, a + U ) |−→ λa + U.

The canonical map or quotient map .π is given by:

π : V −→ V /U,
.

v |−→ [v] ≡ v + U.


Thus, the zero of .V /U is .ker π = U ≡ [0].

The map .π is linear and surjective (epimorphism). The dimension formula, a.k.a
rank-nullity (see Corollary 3.2 in Sect. 3.3), takes here the form

. dim V = dim U + dim V /U.
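The dimension formula can be checked in a concrete case. The following is a minimal numerical sketch, not part of the text, assuming NumPy is available; the subspace U and the vectors v, w are purely illustrative choices in V = R³.

```python
import numpy as np

# V = R^3, U = span of the columns of U_basis (an illustrative choice)
U_basis = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.0]])          # U = x-y-plane, so dim U = 2

dim_V = 3
dim_U = np.linalg.matrix_rank(U_basis)

# Two vectors lie in the same coset [v] = v + U iff their difference is in U,
# i.e. appending the difference to U_basis does not increase the rank.
def same_coset(v, w):
    diff = (v - w).reshape(-1, 1)
    return np.linalg.matrix_rank(np.hstack([U_basis, diff])) == dim_U

v = np.array([1.0, 2.0, 5.0])
w = np.array([-3.0, 7.0, 5.0])            # differs from v by an element of U
print(same_coset(v, w))                   # True: [v] = [w] in V/U
print(dim_V - dim_U)                      # 1 = dim V/U by the dimension formula
```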

Comment 2.9 What is the essence of a vector space?

We now understand even better what a vector space is. The elements of a
vector space can be anything. The only one thing we can demand from these
elements, is that they behave well; that is, that we can add and multiply by
scalars. In consequence, we understand that it is the behavior that matters. Here,
we come across the same situation as we met with the Euclidean axioms. We do
not know what the essence of a vector is. Nevertheless, it does not matter since
we have the definition of a vector space.

Now we can ask ourselves: what are the other benefits of the notion of quotient
space for linear algebra? We present two examples that might also be relevant in
physics.
As will be discussed in Sect. 3.5 in Proposition 3.11, for every given subspace
U of V there exist many complementary subspaces W in V, satisfying U ⊕ W = V
(see Definition 2.19). In this situation, the following holds: the quotient
vector space V/U is isomorphic to W, that is, W ≅ V/U, and consequently to any
complementary subspace of U in V. We can therefore say that V/U represents all
these complementary subspaces of U.
The second example is the following isomorphism theorem which we give without
the proof.

Theorem 2.2 The first isomorphism theorem:


Let f : V → V′ be a linear map. Then f induces a canonical isomorphism f̄:

V/ker f ≅ im f.

If f is a surjective map, this canonical isomorphism is given by

V/ker f ≅ V′.

This means that we can describe the vector space .V ' solely with data of the vector
space .V .
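For a linear map given by a concrete matrix, the statement dim(V/ker f) = dim(im f) can be verified numerically. A minimal sketch, not part of the text, assuming NumPy and SciPy are available; the matrix F is an arbitrary illustration.

```python
import numpy as np
from scipy.linalg import null_space   # orthonormal basis of ker f as columns

# f : R^4 -> R^3 given by an illustrative matrix F
F = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])

rank_f  = np.linalg.matrix_rank(F)        # dim(im f)
dim_ker = null_space(F).shape[1]          # dim(ker f)
dim_V   = F.shape[1]

# dim(V / ker f) = dim V - dim(ker f) must equal dim(im f)
print(dim_V - dim_ker == rank_f)          # True
```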

It is further worthwhile noting that quotient vector spaces play an important role
not only in finding and describing new spaces but also in formulating theorems or
various proofs in linear algebra. As already mentioned, in physics we also often use
quotient spaces intuitively even if we do not apply explicitly the above formalism.

2.7 Sums and Direct Sums of Vector Spaces

We are now going to answer the question that arose in Comment 2.5 of Sect. 2.1.
How can we obtain uniquely a subspace of.V from a union of two or more subspaces?
There is a natural construction that leads to the desired result.

Definition 2.19 Sum of subspaces.


For the given subspaces .U1 , . . . , Um of .V , the sum of .U1 , . . . , Um is given
by the following expression:

U1 + · · · + Um := {u1 + u2 + · · · + um : ui ∈ Ui, i ∈ I(m)}.

We may also write

U = Σ_{i=1}^m Ui = U1 + · · · + Um.

We can see immediately that .U is a subspace of .V , that is, .U < V . In addition, .U


is the smallest subspace of V containing the subspaces U1, . . . , Um. This is easy to
perceive since every subspace W that contains U1, . . . , Um must also contain all sums
u1 + · · · + um and hence contains U. The following construction seems even more perfect:

Definition 2.20 Direct sum of subspaces.


The sum of the subspaces U1, . . . , Um is a direct sum, denoted by

U1 ⊕ · · · ⊕ Um = ⊕_{i=1}^m Ui,

if every element of the sum u ∈ U1 + · · · + Um has a unique decomposition as a
sum

u = u1 + · · · + um with ui ∈ Ui, i ∈ I(m).

Remark 2.9 Direct sum and zero.

Definition 2.20 is equivalent to the following statement. The zero element
has a unique decomposition in U := ⊕_{i=1}^m Ui: that is, if u1 + · · · + um = 0, then
u1 = 0, . . . , um = 0.

Proof If the zero has a unique decomposition, we may check uniqueness for any u ∈ U:
if u = u1 + · · · + um and u = w1 + · · · + wm are two decompositions of the same vector,
we get for the difference 0 = (u1 − w1) + · · · + (um − wm) and then u1 = w1, . . . , um = wm. ∎

Remark 2.10 Direct sum and linear independence.

In the case of a direct sum, we may say that the list of vector spaces
.(U1 , . . . , Um ) is (block) linearly independent too (for the definition, see Sect.
3.1). As a consequence, a direct sum is a form of linear independence.

Comment 2.10 Direct sums and linear maps.

The direct sum decomposition of .V ,

. V = U1 ⊕ · · · ⊕ Um ,

is very interesting, especially if this decomposition is induced by an operator


(endomorphism) . f ∈ Hom(V, V ) ≡ End(V ) since such decompositions char-
acterize the geometric properties of . f . As we shall see later, this also leads
to a decomposition of every . f ∈ End(V ). In the more general case of a linear
map . f ∈ Hom(V, V ' ), if we choose a basis in .V and .V ' , we always obtain the
decomposition of the form

. V = U1 ⊕ U2 and V ' = U2' ⊕ U1'

with .U1 = ker f and .U2' = im f . So . f is given by

U1 ⊕ U2 —f→ U2′ ⊕ U1′

which is a form of the fundamental theorem of linear maps (see Theorem 5.2 in
Sect. 5.3).

2.7.1 Examples of Direct Sums

Example 2.34 V = R².
As in the examples in 2.16 in Sect. 2.1.2, U1 and U2 are the x-axis and the y-axis.
We have of course U1 ∩ U2 = {0} and we have the direct decomposition of R²:

R² = U1 ⊕ U2 ≡ (x-axis) ⊕ (y-axis).

Example 2.35 V = Pol(5), all real polynomials with degree < 5.
If we denote the odd polynomials in Pol(5) by U1, and the even polynomials in
Pol(5) by U2, we get U1 ∩ U2 = {0} and the direct sum

Pol(5) = U1 ⊕ U2.
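As a small computational illustration of Example 2.35 (a sketch, not from the text, assuming NumPy and representing a polynomial by its list of coefficients in increasing degree), the decomposition into odd and even parts is unique:

```python
import numpy as np

# p(x) = 3 + 2x - x^2 + 4x^3 + 0.5x^4, coefficients ordered by increasing degree
p = np.array([3.0, 2.0, -1.0, 4.0, 0.5])

even = p.copy(); even[1::2] = 0.0   # keep even-degree coefficients  -> U2
odd  = p.copy(); odd[0::2]  = 0.0   # keep odd-degree coefficients   -> U1

assert np.allclose(even + odd, p)   # Pol(5) = U1 (+) U2: the decomposition is unique
print(odd, even)
```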

At this point, the question arises concerning a criterion for a sum to be a direct
sum. A simple answer is given in the following proposition for .m = 2.

Example 2.36 Direct sums in .R3 .


We consider U1 and U2, two arbitrarily chosen one-dimensional subspaces
in R³ with U1 ∩ U2 = {0}, and we obtain a two-dimensional subspace V = U1 ⊕ U2, as
shown in Fig. 2.7a.
Similarly, we choose U1 and U2, two arbitrarily chosen subspaces in R³ with
dim U1 = 1, dim U2 = 2 and U1 ∩ U2 = {0}, and we obtain V = U1 ⊕ U2 = R³ (Fig. 2.7b).

Fig. 2.7 Direct Sums in .R3

Proposition 2.3 Direct sum of two subspaces.

U + Y is a direct sum if and only if

U ∩ Y = {0}.

Proof If.U + Y is a direct sum, we have to show that.U ∩ Y = {0}. Suppose.z ∈ U ∩


Y , with .z ∈ U and .z ∈ Y , and .(−z) ∈ Y . We get .z + (−z) = 0 and also .0 + 0 = 0.
The unique representation of .0 leads to .z = 0 and .(−z) = 0. It follows that we get
. z = 0 and .U ∩ Y = {0}.
If .U ∩ Y = {0}, we have to show that .U + Y is direct: We have only to show that
the decomposition of .0 is unique. If .0 = u + y with .u ∈ U and . y ∈ Y , we have to
show that .u = 0 and y = 0:

0 = u + y ⇔ u = −y,
.

⇒ u ∈ Y,
⇒ u ∈ U and u ∈ Y ⇒ u ∈ U ∩ Y,
⇒ u = 0 and also y = 0.

Thus, the proposition holds. ∎
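Numerically, the criterion of Proposition 2.3 can be tested with ranks: the sum U + Y is direct exactly when dim(U + Y) = dim U + dim Y, which is equivalent to U ∩ Y = {0}. The following is a minimal sketch, not from the text, assuming NumPy; the bases chosen below are illustrative.

```python
import numpy as np

def is_direct_sum(U, Y):
    """U, Y: matrices whose columns span two subspaces of the same R^n."""
    r = np.linalg.matrix_rank
    # dim(U + Y) = dim U + dim Y  <=>  U ∩ Y = {0}
    return r(np.hstack([U, Y])) == r(U) + r(Y)

U  = np.array([[1.], [0.], [0.]])                 # x-axis in R^3
Y  = np.array([[0., 0.], [1., 0.], [0., 1.]])     # y-z-plane in R^3
Y2 = np.array([[1., 0.], [0., 1.], [0., 0.]])     # x-y-plane, contains the x-axis

print(is_direct_sum(U, Y))    # True:  R^3 = U ⊕ Y
print(is_direct_sum(U, Y2))   # False: U ∩ Y2 ≠ {0}
```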



Remark 2.11 Direct sums for more than two subspaces.

It is of interest to notice, here without proof, that for more than two sub-
spaces, we have the following result: U1 + · · · + Um is a direct sum if and only
if

Uj ∩ Σ_{i≠j} Ui = {0} for all j ∈ I(m) := {1, 2, . . . , m}.

This is equivalent to the uniqueness or, generalizing, to the linear independence
relation: If Σ_{i=1}^m ui = 0 with ui ∈ Ui, i ∈ I(m), then ui = 0 for all i ∈ I(m).

2.8 Parallel Projections

In an affine space and in every abstract vector space, the notion of parallelism is part
of the structure. Consequently we may say that the vectors .u and .v in .V are parallel
whenever .u = λv for some scalar .λ ∈ K. See also Proposition 3.11 in Sect. 3.4 which
leads immediately to the definition of a (parallel) projection.

Definition 2.21 (Parallel) Projection.


Given a direct sum .V = U ⊕ W , the linear operator

. P : V −→ W < V,
v |−→ P(v) = w,

is given by the relation .v = u + w, u ∈ U and .w ∈ W . . P is called (parallel)


projection along .U (see Fig. 2.8).

As we see, parallel projection is essentially the well-known parallelogram rule


known in physics. It is clear that this projection depends on both, .U and .W : P =
P(U, W ). Furthermore, if we set .U = ker P and .W = im P, we have:

. V = ker P ⊕ im P

and of course .ker P ∩ im P = {0}.


We see directly that P|W = idW and P|U = 0̂U hold, where 0̂U is the null operator.
Furthermore, the algebraic characterization of a projection is very interesting:

P is idempotent: P² = P.
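A parallel projection can be written down explicitly once bases of U and W are chosen: expand v in the combined basis and keep only the W-part. The following sketch, not from the text, assumes NumPy; the bases u and w are illustrative. It also checks the idempotence P² = P.

```python
import numpy as np

# R^2 = U ⊕ W with illustrative bases
u = np.array([[1.0], [1.0]])     # U = R·(1, 1)
w = np.array([[1.0], [0.0]])     # W = x-axis

B = np.hstack([u, w])            # basis (u, w) of R^2
# P projects onto W along U: in the basis (u, w) it is diag(0, 1)
P = B @ np.diag([0.0, 1.0]) @ np.linalg.inv(B)

v = np.array([3.0, 2.0])         # v = 2*u + 1*w
print(P @ v)                     # the W-component of v:  (1, 0)
print(np.allclose(P @ P, P))     # True: P is idempotent
```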

Fig. 2.8 Parallel projection in .R2

Comment 2.11 The importance of projection operators.

The importance of projection operators stems also from the fact that every
operator, especially in a complex vector space, always contains essentially pro-
jection operators as components. In particular, this is present in connection with
the spectral decomposition of operators. In quantum mechanics and symmetries
in physics, idempotent operators are particularly important.

2.9 Family of Vector Spaces in Newtonian Mechanics

We now come to a completely different situation in which linear algebra is pointing


to analysis, differential geometry, and of course physics. We have to consider not
only one vector space, as is usual in linear algebra, but an infinite number of them
and they all are identical to each other. They are parametrized by the points of the
space we are interested in and which for convenience we take here to be .R2 .

2.9.1 Tangent and Cotangent Spaces of .R2

In the following construction, the elements of R² = {p, q, . . . } are used as indices.
They parametrize the various vector spaces. These are canonically isomorphic to
R² at the points p, q, . . . ∈ R². For this construction, we consider for example the
elements p, q ∈ R² as points and the elements u, v ∈ R² as free vectors.

Fig. 2.9 The vector space R² with its standard subspaces (x- and y-axes), R₁ ≡ R₁(0) = Re₁ and
R₂ ≡ R₂(0) = Re₂; T_p0 R² ≡ R²(p0) = p0 + R²: the tangent spaces at the points p0, q0, p1 and
their standard subspaces R₁(p0) and R₂(p0)

As a next step, we fix a point p0 ∈ R² and we add the vector v so that we have
p0 + v, a new point which we consider as the point p = p0 + v ∈ R². It is usually
symbolized by an arrow from p0 to p. This is demonstrated in Fig. 2.9.


We can do likewise with another vector .w ∈ R2 , and we then obtain another end
point . p ' = p0 + w.
We may do the same with all .v, w, . . . ∈ R2 . We may think that we thus obtained
a new vector space which we denote by

.R2p0 = p0 + R2 .

We denote by R²_q0 and R²_p1 the tangent spaces at the points q0 and p1. Note that in the
literature both notations, T_p0 R² and R²_p0, are used.
All of the above is illustrated in Fig. 2.9.
It leads to the following definitions.

Definition 2.22 Tangent vector .v p0 .

A tangent vector .v p0 in .R2 consists of two parts, the point of application


. p0 and the vector part .v. We may write: .v p0 ≡ p0 + v = ( p0 , v) and .w p0 =

p0 + w ≡ ( p0 , w).

It is clear that we represent p0 by two numbers (its coordinates) and the vector
part v also by two numbers since we are in R². Similarly, we may take another point
of application q0 such that:

v_q0 = q0 + v or w_q0 = q0 + w.

Without doubt, the tangent vector v_q0 = q0 + v is different from the tangent vector
v_p0 = p0 + v. The point q0 is different from p0, but v_q0 and v_p0 are parallel to each
other.
The tangent vectors v_p0 and w_q0 are equal if and only if p0 = q0 and v = w.

Remark 2.12 Tangent vectors in physics.

You may ask where the connection with physics is. In physics, we meet
such objects very early! The instantaneous velocity of a point particle moving
in .R2 is exactly what we call here a tangent vector. Instantaneous velocity is,
as we know, a vector, but it is never alone. The point of its application is the
momentary position of the moving particle in .R2 .

If we consider a curve α in R²,

α : R −→ R²,
t |−→ α(t),

we get α̇(0) := (α(0), dα/dt(0)) = (p0, v). Furthermore, we know that the force which
may act on this point particle is a vector which has the same point of application.
All this is usually understood in physics first on an intuitive level.

Coming back to tangent vectors in mathematics, we may consider all possible
tangent vectors for every fixed p. This means that we fix p and vary the vector
v ∈ R². This leads to the following definition:

Definition 2.23 Tangent space at . p.


Given a point . p of .R2 , the set

. T p R2 ≡ R2p := {v p := ( p, v) : v ∈ R2 }

consisting of all tangent vectors with . p the point of application, is called the
tangent space of .R2 at the point . p.

Now, we may ask whether T_p R² ≡ R²_p is really a vector space. Yes, it is:

Addition: u_p + v_p := p + u + v = p + (u + v) = (p, u + v).
Scalar multiplication: λv_p := p + λv = (p, λv).

Then T_p R² is a vector space and it is clear that T_p R² is isomorphic to R² (T_p R² ≅ R²):

R² −→ T_p R²,
v |−→ (p, v).

We further observe that we have such a vector space for every p ∈ R²:

p −→ T_p R².

We consequently obtain a family of vector spaces canonically isomorphic to R²
which is parametrized by p ∈ R². If we consider the disjoint union of all these vector
spaces, using the symbol ⊔ for a disjoint union, we may write:

T R² := ⊔_{p∈R²} T_p R².

This is called the tangent space of .R2 or equivalently, also the tangent bundle of .R2 .
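In code, a tangent vector is simply the pair (point of application, vector part), and the vector space operations of T_p R² act on the vector part only. The following is a purely illustrative sketch, not from the text, assuming NumPy.

```python
import numpy as np

# A tangent vector v_p = (p, v): point of application p, vector part v
def add(vp, wp):
    p, v = vp
    q, w = wp
    assert np.allclose(p, q), "can only add tangent vectors at the same point"
    return (p, v + w)

def scale(lam, vp):
    p, v = vp
    return (p, lam * v)

p0 = np.array([1.0, 2.0])
vp = (p0, np.array([1.0, 0.0]))
wp = (p0, np.array([0.0, 3.0]))
print(add(vp, wp))            # (p0, (1, 3)) ∈ T_p0 R^2
print(scale(2.0, vp))         # (p0, (2, 0))
```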

Comment 2.12 A tangent bundle is a special vector bundle.

A tangent bundle is a special case of a vector bundle. Vector bundles play an


important role in gauge theories. If we take into account the coordinates of R²
and T R², we expect to have:

T R² = R² × R² = {(p, v) : p ∈ R², v ∈ R²}.

It is, in this special case, in bijection with the vector space R⁴: T R² ≅ R⁴.

Remark 2.13 Vector space and its dual, tangent space and its dual.

At this point, we have also to remember that an abstract vector space V, as
all vector spaces, is never alone. Its dual vector space V* is always present.
We remember for example that the dual of

R² = {columns of length 2} = { [α^1 α^2]ᵀ : α^1, α^2 ∈ R }

is

(R²)* = {rows of length 2} = { [ϕ1 ϕ2] : ϕ1, ϕ2 ∈ R }.

Note that we write a coefficient (component or coordinate) of a vector (column)
in R² with the index upstairs but a coefficient of a covector or linear form
(row) with the index downstairs. This leads, as expected, for every tangent
vector space T_p R² at p, to its dual T*_p R² ≡ (T_p R²)*. Similarly, the dual of
the tangent bundle T R² is denoted by T* R² and called the cotangent space or
cotangent bundle. In this case too, it is easy to recognize the bijection:

T* R² ≅ R² × (R²)*.

2.9.2 Canonical Basis and Co-Basis Fields of .R2

In mathematics and in physics, some aspects of analysis occur when we go from a


vector space to a vector bundle (tangent bundle). We have to deal with vector fields
and covector fields. We would like to choose, as above, the space .R2 as an example

because this facilitates both our notation, and our explanations. Going from vectors
to vector fields, we may describe this by the map . F which we call a vector field. We
use the usual simplified notation:

. F : R2 −→ R2 × R2 (= T R2 ),
p |−→ ( p, F( p)).

Similarly, we may write for a covector field the map .Θ:

. Θ : R2 −→ R2 × (R2 )∗ (= T ∗ R2 ),
p |−→ ( p, Θ( p)).

As we see, with F we denote a family of vectors, parametrized by p ∈ R², and with
Θ we denote a family of covectors, parametrized by p ∈ R².
We follow the same procedure with the basis and the cobasis of a vector space.
We then obtain a basis field and a cobasis field. This means that we get a family of
canonical basis vectors, related to R²:

E : p |−→ (p; e1(p), e2(p)) := (p; (e1, e2)).

We represent this similarly as in Fig. 2.9. Similarly, we get a family of canonical
cobasis vectors:

E* : p |−→ (p; ε^1(p), ε^2(p)) = (p; (ε^1, ε^2)).

The covectors ε^1(p) and ε^2(p) are the linear forms (covectors, linear functionals,
linear functions)

ε^1(p), ε^2(p) : T_p R² −→ R

with the duality relation

ε^i(p)(e_j(p)) = δ^i_j , i, j ∈ {1, 2}.

(ε^1(p), ε^2(p)) is the dual basis to (e1(p), e2(p)). Note that both [e1 e2] and the column
[ε^1 ε^2]ᵀ are special 2 × 2-matrices. Here, we even have [e1 e2] = 1₂ and [ε^1 ε^2]ᵀ = 1₂. The basis
vector field E is represented symbolically in Fig. 2.10.
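Concretely, once a basis of T_p R² is written as the columns of a matrix, the dual basis (the cobasis) is given by the rows of the inverse matrix, since that is exactly what the duality relation ε^i(e_j) = δ^i_j demands. A brief sketch, not from the text, assuming NumPy; the non-standard basis below is illustrative.

```python
import numpy as np

E = np.array([[1.0, 1.0],     # columns: basis vectors e_1(p), e_2(p) of T_p R^2
              [0.0, 2.0]])

Eps = np.linalg.inv(E)        # rows: the dual basis epsilon^1(p), epsilon^2(p)

# duality relation: epsilon^i(e_j) = delta^i_j
print(np.allclose(Eps @ E, np.eye(2)))   # True

# For the standard basis E = 1_2, the cobasis matrix is also 1_2, as in the text.
```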

Remark 2.14 Calculus notation.

Traditionally, in calculus, we use another notation, for good reasons, as


explained there.

Fig. 2.10 The standard basis field in .R2

For the canonical basis field which corresponds to the Cartesian coordinates
p = (x^1, x^2), we write:

∂/∂x^i ≡ e_i and dx^i ≡ ε^i.

The duality relation is given by

dx^i(∂/∂x^j) = ∂x^i/∂x^j = δ^i_j.

This may partially explain the use of the ∂/∂x^i, dx^j notation. Here we used
the notation e_i(p), ε^i(p) because we wanted to emphasize the linear algebra
background.

Summary

Beginning with the elementary part of linear algebra, we introduced and discussed
the first steps of all necessary mathematical concepts which the physics student
should already know and in fact has to use from day one in any theoretical physics
lecture. The reader is probably already familiar with most of these concepts, at least

in special cases. But we now offered careful definitions and a catalogue of basic
properties which will better equip readers to follow the pure physics content of any
introductory lecture in theoretical physics. Moreover, this introduction provided the
reader a solid foundation to explore linear algebra in the subsequent chapters.
Right from the outset, we introduced the concept of the dual vector space. This
aspect is often overlooked in physics lectures, leading to difficulties later on, espe-
cially in comprehending tensors.
In physics, abstract vector spaces are rarely encountered. For instance, in New-
tonian mechanics, we begin with an affine Euclidean space and immediately utilize
its model, the corresponding Euclidean vector space. What we accomplished in this
chapter, mathematically speaking, was jumping from an abstract vector space to an
affine Euclidean space and then back to an inner product space with significantly
more structure than the initial abstract linear structure. The reader learned to manip-
ulate structures within mathematical objects, adding and subtracting structures, a
skill utilized throughout the book.
We also introduced the concept of a quotient space. While not typically utilized
in physics until the end of master’s studies, standard quotient spaces in differential
topology and geometry are increasingly relevant in modern fields of physics, such
as gravitation and cosmology.
Finally, we address a topic never found in linear algebra books but crucial for
readers and one of the most important applications of linear algebra to physics:
Newtonian mechanics begins with the concept of velocity, particularly the velocity
at a fixed point. This velocity is essentially a vector in a tangential space where
motion occurs. Hence, in physics, we often encounter families of isomorphic vector
spaces, as discussed at the end of this chapter.

Exercises with Hints

Exercise 2.1 The zero, the neutral element .0 of a vector space.V , is uniquely defined.
Prove that .0 is the only zero in .V .

Exercise 2.2 The inverse of any vector of a vector space .V is uniquely defined.
Prove that for any .v ∈ V there is only one additive inverse.

Exercise 2.3 A scalar times the vector .0 gives once more .0.
Prove that .λ0 = 0 for all .λ ∈ K.

Exercise 2.4 The number .0 ∈ K times a vector gives the zero vector.
Show that for all .v ∈ V , .0v = 0V .

Exercise 2.5 The number .−1 times a vector gives the inverse vector.
Show that for all v ∈ V, (−1)v = −v holds.

Exercise 2.6 If a scalar times a vector gives zero, then at least one of the two is
zero.
Prove that if λ ∈ K, v ∈ V, and λv = 0, then either λ = 0 or v = 0.

Exercise 2.7 A matrix face of the field of complex numbers.


Consider the set of real matrices given by

C = { A = [α0 −α1; α1 α0] : α0, α1 ∈ R }.

Show that
(i) if A, B ∈ C, then AB ∈ C;
(ii) AB = BA;
(iii) if J = [0 −1; 1 0], then JJ = −1₂;
(iv) check that, as a field, C is isomorphic to the complex numbers: C ≅ ℂ.

In the following four exercises, it is instructive to check that all the vector
space axioms hold on the given vector space. First, to check the axioms of the
commutative group .V and then the axioms connected to the .K scalar action on
the commutative group .V .

Exercise 2.8 Show that .V = Kn is a vector space.

Exercise 2.9 Show that .V = Km×n is a vector space.

Exercise 2.10 Show that the set of symmetric matrices

. V = {S ∈ Rn×n : S T = S}

is a vector space over R, but the set of Hermitian matrices

W = {H ∈ Cn×n : H† = H}

is not a vector space over C.

Exercise 2.11 Show that the space of functions on the set . X ,

. V = R X = Map(X, R),

is a vector space.

Exercise 2.12 Square matrices have more structure than a vector space. They form
an algebra (see Definition 2.3).
Show that .Kn×n is an algebra.

The next four exercises are dealing with some aspects of subspaces.

Exercise 2.13 The union of two subspaces of a vector space .V is in general not a
subspace (see Comment 2.5). However, this exercise shows an exception.
Prove that the union of two subspaces .U and .W of .V is a subspace of .V if and only
if one of them is contained in the other.

.U ≤ W ≤ V or W ≤ U ≤ V.

Exercise 2.14 On the other hand, the intersection of subspaces of .V is always a


subspace of .V (see Remark 2.3).
Prove that if .U, W ≤ V , then .U ∩ W ≤ V .

Exercise 2.15 A special addition of two subspaces of .V .


Show that when .U ≤ V , then .U + V = V .

Exercise 2.16 The vector space complement of a given subspace .U of .V is not


uniquely defined.
Show that if .U ⊕ W1 = V and .U ⊕ W2 = V , we get in general .W1 /= W2 .

Exercise 2.17 Linear maps take .0 to .0' .


Show that for a linear map . f ∈ Hom(V, V ' ), then . f (0) = 0' .

Exercise 2.18 The null space is a subspace.


Show that for a map . f ∈ Hom(V, V ' ), .ker f is a subspace of .V ' .

Exercise 2.19 The range is a subspace.


Show that for a map . f ∈ Hom(V, V ' ), .im f is a subspace of .V ' .

Exercise 2.20 Criterion for injectivity.


Show that a linear map . f ∈ Hom(V, V ' ) is injective if and only if .ker f = {0}.

In the following two examples, we consider very special linear maps: Linear
functions, also called linear functionals or linear forms.

Exercise 2.21
(V*)* = Hom(V*, R).
Consider a vector v ∈ V as a linear form on V*, denoted by v#, where v# ∈ Hom(V*, R)
is given by

v# : V* −→ R,
ξ |−→ v#(ξ) := ξ(v).

Show that v# is indeed a linear function.

Exercise 2.22 We consider the vector space of square matrices V = Rn×n, and its
dual V* = (Rn×n)*.
Show that the trace tr ∈ (Rn×n)*, given by

tr : Rn×n −→ R,
A |−→ tr(A) := Σ_{i=1}^n αii,

is indeed a linear function.

Exercise 2.23 Properties of the standard inner product on Cⁿ:

(v|u) = v̄ᵀu = v†u = Σ_{i=1}^n v̄^i u^i where v^i, u^i ∈ C.

Let u, v ∈ Cⁿ and λ ∈ C. Show that
(i) (v|u) equals the complex conjugate of (u|v);
(ii) (λv|u) = λ̄(v|u);
(iii) (v|λu) = λ(v|u).

Exercise 2.24 Equality in the Cauchy-Schwarz inequality.


Show that |(u|v)| = ||u|| ||v|| if and only if u = 0 or v = λu for some λ ∈ R.

Exercise 2.25 Equality in the triangular inequality.


Show that .||u + v|| = ||u|| + ||v|| if and only if either .u = 0 or if there is a scalar .λ
with .0 ≤ λ such that .v = λu.

Exercise 2.26 The parallelogram law.


Show that for elements .u and .v in an inner product vector space,

.||u + v||2 + ||u − v||2 = 2||u||2 + 2||v||2 .



Exercise 2.27 The polarization identity for a real inner product space .V .
Show that for .u, v ∈ V ,

.4(u|v) = ||u + v||2 − ||u − v||2 .

Exercise 2.28 Extended polarization identity for a symmetric operator . f ,.(u| f v) =


( f u|v), in a real inner product vector space .V .
Show that
.4(u| f v) = (u + v| f (u + v)) − (u − v| f (u − v)).

Exercise 2.29 Expectation values of a zero operator in a complex inner product


space .V .
Show that . f ∈ Hom(V, V ) is the operator .0̂ if and only if .(v| f v) = 0 for all .v ∈ V .

Exercise 2.30 Polarization identity for a complex inner product space .V .


Show that for all .v, w ∈ V ,

4(v|w) = ||v + w||2 − ||v − w||2 − i(||v + iw||2 − ||v − iw||2 ).


.

Exercise 2.31 Extended polarization identity for an operator . f in a complex inner


product space .V .
Show that for all .v, w ∈ V ,

4(v| f w) = (v + w| f (v + w)) − (v − w| f (v − w))−


.

−i((v + iw| f (v + iw)) − (v − iw| f (v − iw))).

Exercise 2.32 Affine subset of a subspace in a real vector space .V .


Let A = a + U be an affine subset with U a subspace of V and a ∈ V. Show that U is
uniquely defined by A.

Exercise 2.33 Translations in a real vector space V. Definition: Any vector a ∈ V
leads to a translation:

Ta : V −→ V,
v |−→ Ta(v) = a + v.

Check the following properties:
(i) T0 = idV;
(ii) Tb ◦ Ta = Tb+a;
(iii) Ta⁻¹ = T−a and hence Ta is bijective;
(iv) when a ≠ 0, Ta is not a linear map.

Exercise 2.34 Affine subsets in a real vector space .V and translations.


Let . A be a subset of .V with .a ∈ A. If .T−a (A) is a subspace of .V , show that . A is an
affine subset of .V and that for any .a ' ∈ A, .T−a ' (A) = T−a (A).

Exercise 2.35 An affine subspace . A in a real vector space .V contains all its straight
lines.
Show that a subset . A in .V is an affine subset of .V if and only if for all .u, v ∈ A and
.λ ∈ R, .λv + (1 − λ)u ∈ A.

Exercise 2.36 Equivalence proposition for an affine space.


Let V be a real vector space and A ⊂ V a subset. Then show that the following
assertions are equivalent:
(i) A is an affine subset of V (there exists a subspace U of V and a ∈ V such
that A = a + U);
(ii) there is a vector space V′, a linear map f : V → V′, and a vector a′ ∈ V′ such
that A = f⁻¹(a′);
(iii) there is a list of vectors (v0, . . . , vk) in A such that

A = { Σ_{i=0}^k vi α^i : α^i ∈ K, i ∈ {0, . . . , k}, and Σ_{i=0}^k α^i = 1 }.

Exercise 2.37 Affine subset in a real vector space .V from a list of vectors.
Show that for a list of vectors (v0, . . . , vk) in V, there exists an affine subset A of V
given by:

A = { v0 + Σ_{i=1}^k (vi − v0)α^i : α^1, . . . , α^k ∈ R }.

Exercise 2.38 Quotient space and its basis.


Let U be a subspace of a vector space V, (u1, . . . , ur) a basis of U, and (v1 +
U, . . . , vl + U) a basis of V/U. Show that (u1, . . . , ur, v1, . . . , vl) is a basis of V.
Exercise 2.39 Quotient space and vector space complement.
Let U be a subspace of the vector space V. Show that any complement W of U such
that V = U ⊕ W has the following properties:

dim W = dim V/U.

The natural surjection V → V/U, v ↦ v + U, restricted to W gives an isomorphism

W ≅ V/U.
Exercise 2.40 Isomorphism theorem for linear maps.
Let f ∈ Hom(V, V′) be a linear map, with V, V′ vector spaces. Show that f induces
an isomorphism

f̄ : V/ker f ≅ im f

and, when f is surjective,

f̄ : V/ker f ≅ V′.

Chapter 3
The Role of Bases

In this chapter, we discuss in detail the basics of linear algebra. The first important
concepts in a vector space are linear combinations of vectors and related notions like
generating systems and linearly independent and linearly dependent systems. This leads
directly to the central notions of bases of vector spaces and their dimension.
Bases allow us to perform concrete calculations for vectors needed in physics,
such as assigning a list of numbers, the coordinates. This enormous advantage has
its price. The representation of an abstract vector by a list of numbers depends on
the chosen basis. Any theoretical calculation we do should obviously not depend on
the choice of a basis. We discuss in detail the satisfactory but demanding solution to
this problem in Sect. 3.2 which can be skipped on a first reading.
We then demonstrate a suitable choice of basis for the representation of linear
maps, and we discuss the origin of tensors in an elementary manner. Finally, we
provide an important application for physics, and show that the transition from New-
tonian mechanics to Lagrangian mechanics is nothing but the transition from a linearly
dependent to a linearly independent system.

3.1 On the Way to a Basis in a Vector Space

The multiplicative action of .K on an abelian group .V , as given in Definition 2.4, is


what makes the abelian group the vector space .V we know. We see immediately that
for every fixed vector .v /= 0, we can obtain, by a scalar multiplication, all vectors of
the one-dimensional subspace .vK = Kv in the direction .v. Analogously, for every
direction (or equivalently, every one-dimensional subspace), we need to fix only one
vector to describe the corresponding subspace. It may further come as a surprise that
within linear algebra, we often only need a finite number of such vectors to describe
all elements of V, which contains all these one-dimensional subspaces. This leads to
the existence of a basis in .V .


As mentioned above, one main reason for the existence of a basis stems from the
scalar action of .K which gives the possibility of scaling the elements of a basis and
thus to obtain, after adding, all the elements of the given vector space .V .
In addition, we need an abstract and fundamental property that characterizes a
basis: the notion of linear independence or linear dependence (see Definition 3.4).
Before we can understand this, however, one must first understand the notions of
linear combinations and span.

Definition 3.1 Linear combination and span.


The following expression is a linear combination of vectors in V:

α^1 v1 + α^2 v2 + · · · + α^k vk

with the coefficients or scalars α^i ∈ K and the vectors vi ∈ V, i ∈ I(k) and
k ∈ N. span(v1, . . . , vk) denotes the set of all such linear combinations:

span(v1, . . . , vk) := { Σ_{i=1}^k α^i vi : α^i ∈ K }.

A linear combination is trivial if all the coefficients are zero, and otherwise
it is nontrivial. Since a linear combination is by definition a finite sum of
vectors, we use the Einstein convention for the sum, and we denote

α^1 v1 + α^2 v2 + · · · + α^k vk = Σ_{i=1}^k α^i vi =: (α^i vi)_k = α^i vi ∈ V.

Usually, if there is no ambiguity, we drop the index k. We write v = α^i vi and
we mean that the vector v is a linear combination of the vectors v1, . . . , vk.
We can write, just as well, v = vi α^i since the left or right action of K on V is
the same.

Comment 3.1 The importance of linear combinations.

We consider linear combinations as the most important operation in linear


algebra. It is very difficult to make a statement in linear algebra without using
the term linear combination. For this reason, it might be useful to discuss various
aspects of linear combinations. We first discuss linear combinations in .Kn and
then the analogous linear combinations in a given vector space .V ' .

We study a list of columns in Kⁿ: A = (a→1, a→2, . . . , a→k) and the associ-
ated n × k-matrix with the columns a→s ∈ Kⁿ, s ∈ I(k), given by [A] =
[a→1 a→2 . . . a→k]. We write [A] for this matrix to distinguish it from the list
A = (a→1, a→2, . . . , a→k). Note that we use three kinds of brackets: “()” usually
for a list, “[]” for a matrix, and the standard “{}” for a set. We identify of course
[A] with A automatically, but sometimes it is useful to make the difference, and
we will do so also here when necessary.
we will do so also here when necessary.

To build formally a linear combination of k columns in Kⁿ, we may also use
the map ψ_A, expressed as a matrix multiplication (see Remark 2.8) or as a linear
combination:

ψ_A : K^k −→ Kⁿ,
λ→ = [λ^1, . . . , λ^k]ᵀ |−→ ψ_A(λ→) := [A]λ→ = [a→1 a→2 . . . a→k] [λ^1, . . . , λ^k]ᵀ = a→s λ^s.

We therefore get for

span A = span(a→1, . . . , a→k),

where

span A = im ψ_A = ψ_A(K^k) := { [A]λ→ = a→s λ^s : λ^1, . . . , λ^k ∈ K }.

The range im ψ_A is of course also the set of all possible linear combinations of the
list A, and so we have im ψ_A ≤ Kⁿ. We see immediately that im ψ_A is a subspace in
Kⁿ because it is the image of a linear map. Thus we have also another proof that
span A is, as im ψ_A, a subspace of V, span A ≤ V (see Proposition 3.1).
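Numerically, membership in span A = im ψ_A and the dimension of this subspace are both rank computations. A short sketch, not from the text, assuming NumPy; the list of columns is illustrative.

```python
import numpy as np

# list A = (a_1, a_2, a_3) of columns in R^3, written as the matrix [A]
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 0.]])   # a_3 = a_1 + a_2

print(np.linalg.matrix_rank(A))          # 2 = dim(span A)

# a vector lies in span A iff appending it does not raise the rank
v = np.array([[2.], [3.], [0.]])
print(np.linalg.matrix_rank(np.hstack([A, v])) == 2)   # True: v ∈ im ψ_A
```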


In this context, the dimension of the subspace .im ψ A is also relevant in order to
assign an important number to the list . A. Of course, in our notation, the same is true
for the matrix .[A]). This leads to the next definition.

Definition 3.2 Rank of a list.


rank(A) := dim(im ψ A ) = dim(span A).
.

Usually, we use the map .ψ A when the list . A is a basis. Then the map .ψ A is a basis
isomorphism and is also called a parametrization. This is extensively discussed in
Sect. 3.2.1. But here, . A is taken as an arbitrary list, as it is not necessary to have a
basis to obtain just a linear combination (see Comment 3.1). The same can be applied
to a linear combination with .as ∈ V instead of .a→s ∈ Kn :

ψ_A : K^k −→ V,
λ→ |−→ [a1 a2 . . . ak] [λ^1, . . . , λ^k]ᵀ = as λ^s ∈ V.

So we again get .[A] = [a1 . . . ak ] and .im ψ A ≤ V and, as above, we can also define
for . A = (a1 , . . . , ak )

. rank A = rank[A] := dim(im ψ A ) = dim(span A).

If we want, we can also identify .ψ A with .[A] and write

[A] : K^k −→ V,
λ→ |−→ [A]λ→ = [a1 · · · ak]λ→ = [a1 · · · ak] [λ^1, . . . , λ^k]ᵀ ∈ V.

Since .span A := im ψ A , we have already proved the following proposition, taking


A = (v1 , . . . , vk ) and .U = span(v1 , . . . , vk ).
.

Proposition 3.1 The set .span(A) is a vector space.

.U = span(A) with . A = (v1 , . . . , vk ) is a subspace of .V . .(U < V ).

Proof We present a second, different, proof.


This can easily be shown by using the criteria for a subspace given in Remark 2.2.
i. The zero, .0 ∈ U , since .0 = 0v1 + · · · + 0vk ∈ U ;
ii. The set, .U , is closed under addition:

αi vi + β i vi = (αi + β i )vi ;
.

iii. The set, .U , is closed under scalar multiplication:

λ(αi vi ) = (λαi )vi .


.



Remark 3.1 The set.U = span(v1 , . . . , vk ) is the smallest subspace of.V con-
taining the list .(v1 , . . . , vk ).

Proof Indeed, a subspace .W of .V containing all .vi of the above list, contains also
all linear combinations of them: .span(v1 , . . . , vk ) ⊂ W . This means that every such
. W is necessarily bigger than .U . Hence .U < W . ∎

Comment 3.2 Lists and colists.


Here, we repeat some facts and add some notations (conventions). Since in
this book we consider only finite-dimensional vector spaces, we have to use
mostly finite families of elements belonging to a given set. We call such a finite
family quite naturally a list. We may call the length or the size of the list, that is, the
number of its elements, the cardinality of the list. Very often, the length of a list is
either well known or not relevant; in this case we do not specify the length. A
list of length n is exactly what is usually called an n-tuple. But we consider the
use of the name list as more elegant and more realistic (see also Example 3.1).
It is clear that in this book, by definition, the length of a list is always finite and,
as for a family, the order matters and repetitions are allowed. As is natural, we
write a list horizontally. However, in many cases, and in particular in connection
with matrices (e.g. for a list of rows), we may also need to write a list vertically.
Therefore, in this book we introduce and sometimes use the name “colist”
for a vertically written list. So the word “colist” always refers to a vertically
written list. But in some cases, very seldom, we allow ourselves to use the name
“list” in a general way, meaning horizontally or vertically written, when it does
not cause confusion.
The distinction between list and colist is especially useful in connection with
our conventions about the position of indices (the tensor formalism convention).
We write a list of vectors always as a horizontal list and, correspondingly,
use lower index notation. Accordingly, we write a list of covectors always as
a vertical list (a colist) and, correspondingly, use the upper index notation. For
scalars, depending on the situation, we write their indices again, as in every
horizontal list, below, and in a colist of scalars we write the indices above as in
every vertical list. The general rule is that if an element has its index below, it
belongs to a list (written horizontally) and if an element has its index above, it
belongs to a colist (written vertically).

Example 3.1 List, colist, and matrices.


The above considerations can easily be demonstrated if we take the vector
space .Rn and its dual .(Rn )∗ :

(Rn )∗ = Hom(Rn , R).


.

An element of .Rn is a vector taken as column which is a n.×1 matrix with scalar
entries which are, here, real numbers. According to our given convention, this
corresponds to a colist of numbers.
Until now, column and colist can be considered as synonyms, particularly if
we consider the elements of .Rn as points. But sometimes, as stated above,
it is necessary and useful to draw the following distinction, for instance if
we multiply matrices: A list and a colist are simply there as a set, as a given
data, and if we want, we can later define some algebraic operations and it is
usually uniquely clear from the context what we mean. Matrices are of course
the well-known algebraic objects. In other words, a .1 × n-matrix with vector
entries (columns) is more than simply a list. For a colist of length .m with any
element, we may write for example symbolically:
(*^1, *^2, . . . , *^m), written vertically with round brackets.

For an m×n-matrix, we may write symbolically (m = 3, n ∈ N)

⎡ * · · · * · · · * ⎤
⎢ * · · · * · · · * ⎥
⎣ * · · · * · · · * ⎦ .

We may add and multiply the elements * of the above matrix with each other.
So we may now write for the m × 1-matrix (column) with entries *:

[*^1, . . . , *^m]ᵀ

and for a 1 × 1-matrix, we may have

[*] or [*^1 + · · · + *^n].

Taking v, a ∈ R^m, we have for instance the columns

v = [v^1, . . . , v^i, . . . , v^m]ᵀ   and   a = [α^1, . . . , α^i, . . . , α^m]ᵀ,

with v^i, α^i ∈ R, i ∈ I(m).

For the covector θ ∈ (Rⁿ)*, we may write the list

θ = (ϑ1, . . . , ϑs, . . . , ϑn),   ϑs ∈ R, s ∈ I(n).

We may associate this with the 1 × n row matrix, denoted by the same symbol θ
and written without the commas:

θ = [ϑ1 ϑ2 · · · ϑn].

For the sake of completeness, we also write the colists of scalars (here numbers)
associated with the vectors v and a above: the vertically written

(v^1, . . . , v^m)   and   (α^1, . . . , α^m).

Example 3.2 List and colist in .V .


We are going now to apply the above considerations to an abstract vector space
V. For a list of vectors v1, . . . , vk ∈ V, we write for a list

(v1, . . . , vk)

and for the corresponding 1 × k-matrix with these vectors as entries:

[v1 . . . vk]

which is a row-matrix with vector entries. Similarly, we write for the list of
the covectors

θ^1, . . . , θ^k ∈ V*

the colist

(θ^1, . . . , θ^k), written vertically,

and for the k × 1-matrix with the above covectors as entries

[θ^1, . . . , θ^k]ᵀ

which is a column-matrix with covector entries.

Note that a vector in .Kn corresponds to a colist (vertically written list) of scalars and
a covector in .Kn to a list of scalars.
We can now proceed with a few additional and important definitions to get to
the notion of a basis in vector spaces which is a very special list of vectors with
appropriate properties.

Definition 3.3 Spanning list, spanning set.


A list . A or a set . A in .V spans .V if every vector in .V can be written as a
linear combination of vectors in . A, that is, if .V = span A holds. In this case,
we say that . A spans or generates .V . In this book, . A is typically finite and we
can write . A = (a1 , . . . , ak ), k ∈ N. So we get

. V = span(a1 , . . . , ak )

and the vector space .V is called finitely generated.

As usual in linear algebra, we consider mainly finitely generated vector spaces.


The notion of linear independence is fundamental in linear algebra. It has a
deeply geometrical character. But the usual definition is given in an algebraic form
and seems quite abstract. For this reason, and to get a feeling of what is going on,
we need some preparation and we therefore construct a model of what we mean by
“linearly independent” and “linearly dependent”, geometrically speaking.
We start with an informal definition. In some sense, it points indirectly to the geo-
metric character of “linearly independent” and “linearly dependent”. This definition

is equivalent to the usual and more abstract algebraic formulation, as we shall show
below. This sounds good, but it is difficult to check.

Definition 3.4 Linearly independent and linearly dependent.


(Informal definition)
We say that the vectors .a1 , . . . , am are linearly dependent, if there exists a
vector in the list which is a linear combination of the others. Otherwise, they
are linearly independent. Alternatively, we may define linear independence
as follows: We say that the vectors .a1 , . . . , am are linearly independent if no
vector in this list is a linear combination of the others. Otherwise, this list is
linearly dependent.

As we see, we have to check here a yes-no question. This, and all the equivalent
definitions of linearly independent and linearly dependent, refer to an abstract vector
space without any other structures, as, for example, volume and scalar product. Even
more, it should be clear, that in this section, we do not know yet what a basis in a
vector space is. This situation makes it difficult to recognize directly the geometric
character of the above definition.
For our demonstration, we therefore choose our standard vector space .Rn , n ∈
N. Here, we know of course the dimension, the volume, and the distance, and we
have the canonical basis . E = (e1 , . . . , en ) in .Rn . In order to proceed, we have to
remember what we mean by a k-volume in a fixed Rⁿ where k ≤ n. This is in itself
interesting enough. We consider the k vectors (a1, . . . , ak) which usually should
define a nondegenerate k-parallelepiped Pk := Pk(a1, . . . , ak). We know its volume
volk(Pk) = volk(a1, . . . , ak), a positive number (we do not need the orientation
here), which we may call the k-volume. It is clear that in the case that Pk is degenerate,
we have volk(a1, . . . , ak) = 0. Similarly, if we consider k < n, in particular k ≠ n,
we have in an obvious notation voln(a1, . . . , ak) = 0. For all that, we have of course
we have in an obvious notation .voln (a1 , . . . , ak ) = 0. For all that, we have of course
our experience with our three-dimensional Euclidean space. The generalization to
every fixed .n ∈ N is quite obvious. In connection with this, it is useful to think of
the following sequence of subspaces given by

R1 < R2 < R3 < · · · < Rk < · · · < Rn .


.

We may also see immediately the following results for the n-dimensional volume:

vol_n(Pk)  is positive        if k = n and Pk is nondegenerate,
           = 0                if k < n,
           is not determined  if n < k.

It is not surprising that this can be expressed with the help of determinants. The
parallelepiped Pk(a1, . . . , ak) corresponds to the matrix Ak = [a1 · · · ak] and the
Euclidean volume is given by the Gram determinant, volk(Pk)² = det(Akᵀ Ak), which
for k = n reduces to volk(Pk) = |det Ak|. Taking into account that every list
(a1, . . . , am) also corresponds to a parallelepiped, the above result may be stated
differently:
If the list . Am = (a1 , . . . , am ) “produces” enough dimension (enough “space”),
which means that .dim(span Am ) = m, we have .volm (Am ) positive (nonzero). If this
list “produces” not enough dimension, which means that .dim(span Am ) < m, we
have .volm ( Am ) = 0. Furthermore, since we are only interested in values positive or
zero, we may define an equivalence relation yes or no.

[Am] = yes if volm(Am) ≠ 0;
[Am] = no  if volm(Am) = 0.

This is our model for linearly independent and linearly dependent:

. Am = (a1 , . . . , am ) is linearly independent if and only if [Am ] = yes;


Am = (a1 , . . . , am ) is linearly dependent if and only if [Am ] = no.

In the above sense, we can say, to simplify, that linearly independent means “enough
space” and linearly dependent “not enough space”.
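This yes/no volume test is easy to run numerically: for A_m = [a_1 ⋯ a_m] the squared m-volume is the Gram determinant det(A_mᵀ A_m), which vanishes exactly when the list is linearly dependent. A minimal sketch, not from the text, assuming NumPy; the vectors are illustrative.

```python
import numpy as np

def vol_squared(*vectors):
    A = np.column_stack(vectors)
    return np.linalg.det(A.T @ A)        # Gram determinant = vol_m(A_m)^2

a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([1.0, 1.0, 0.0])
print(vol_squared(a1, a2))               # ≈ 1  -> "yes": linearly independent

a3 = 2 * a1 - a2                         # a redundant vector
print(vol_squared(a1, a2, a3))           # ≈ 0  -> "no": linearly dependent
```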

Example 3.3 Linearly dependent and linearly independent lists.


The following lists of vectors in .Rn and in .V , with .dim V = n, are linearly
dependent.
(i) (0, v2, . . . , vm), (v1, 0, . . . , vm), (v1, . . . , 0);
(ii) (v, v, v3, . . . , vm), (v, v2, v, . . . , vm), (v, . . . , v);
(iii) (v1, v2, v3, . . . , vm) with v3 = v1 + v2;
(iv) (v1, v2, v3, . . . , vm) with v3 = v1λ^1 + v2λ^2, λ^1, λ^2 ∈ R.
The list. A1 = (v) with.v /= 0 is linearly independent. The list. A∅ = ∅ is linearly
independent since no vector is a linear combination of the rest. Now we are
ready for the usual, more abstract, definition.

Definition 3.5 Linearly independent and linearly dependent.


A list of vectors a1, . . . , am is called linearly independent if ai ξ^i = 0 implies
that the only possibility is ξ^i = 0 for all i ∈ {1, . . . , m}. Otherwise, the list
a1, . . . , am is linearly dependent. This means that there exist ξ^i ∈ K, not all of
them zero, such that ai ξ^i = 0.

The above definition can be formulated differently:



The list .(a1 , . . . , am ) is linearly independent if the equation .ai ξ i = 0 has only the
trivial solution: .ξ i = 0 for all .i. Or else, the list .(a1 , . . . , am ) is linearly dependent
which means that the equation .ai ξ i = 0 has a nontrivial solution.
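Definition 3.5 translates directly into a computation: the list (a1, . . . , am) is linearly independent iff the homogeneous system [A]ξ = 0 has only the trivial solution, i.e. iff the rank of [A] equals m. A short sketch, not from the text, assuming NumPy; the columns are illustrative.

```python
import numpy as np

def linearly_independent(*vectors):
    A = np.column_stack(vectors)
    # only the trivial solution of A @ xi = 0  <=>  rank A = number of vectors
    return np.linalg.matrix_rank(A) == A.shape[1]

a1 = np.array([1.0, 0.0, 2.0])
a2 = np.array([0.0, 1.0, 1.0])
print(linearly_independent(a1, a2))            # True
print(linearly_independent(a1, a2, a1 + a2))   # False: a nontrivial solution exists
```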
The next lemma shows a property of a linearly independent list which underlines
the importance of being linearly independent. As we shall see, this property turns
out to be an essential property of a basis in .V .

Lemma 3.1 Linear independence and uniqueness.

Given a list .a1 , . . . , am of vectors in .V , the following statements are equiv-


alent:
(i) The vectors .(a1 , . . . , am ) are linearly independent;
(ii) If u = ai ξ^i is a linear combination representing the vector u (u ∈ U =
span(a1, . . . , am)) with the coefficients ξ^i ∈ K, i ∈ I(m), the coefficients
ξ^i are uniquely determined.

Proof We start with (i). The list .(a1 , . . . , am ) is linearly independent.


If we write .u = ai ξ i and .u = ai η i , for .ξ i , η i ∈ K, subtracting we have:

0 = ai ξ i − ai η i = ai (ξ i − η i ) ∈ U.
.

Since by assumption (i), .(a1 , . . . , am ) is linearly independent, we obtain

.ξi − ηi = 0 ⇔ ξi = ηi

which shows that (ii) holds.


If we start with (ii), which states that every representation of .u ∈ U is unique
and set .u = 0 ∈ U , we have .0 = ai λi . One solution of this equation is .λi = 0 for
all .i. By assumption of uniqueness, (ii), it follows that this is the only solution. By
definition (see Definition 3.1), the list .(a1 , . . . , am ) is then linearly independent and
so (i) holds. ∎

The next lemma tells us essentially that the informal definition (see Definition
3.4) and Definition 3.5 are equivalent. It is closer to the geometric aspects of the
definition. This means that in a linearly independent list, a vector loss leads to the
loss of spanning space. For a linearly dependent list, there is always a redundant
vector, the absence of which leaves the spanning space of the list invariant. For
example, in the next lemma the vector .a j is redundant.

Lemma 3.2 Linear dependence or a redundant vector.


For a list of vectors.(a1 , . . . , a j , . . . , am ) in V, the following three statements
are equivalent:
(i) The vectors .a1 , . . . , a j , . . . , am are linearly dependent;
(ii) There exists an index . j with .a j ∈ span(a1 , . . . , a j−1 , a j+1 , . . . , am ), that
is, one of the vectors is a linear combination of the rest;
(iii) There exists an index . j with:

. span(a1 , . . . , am ) = span(a1 , . . . , a j−1 , a j+1 , . . . , am ).

Proof We show that (i) .⇔ (ii) and (i) .⇔ (iii) which establishes the result. For this
purpose, we define

. A := (a1 , . . . , am ), Ā := Ā j := (a1 , . . . , a j−1 , a j+1 , . . . , am ).

We set .i ∈ I (m) and .s ∈ {1, . . . , j − 1, j + 1, . . . , m}.


(i).⇒ (ii): We have to show that.a j is a linear combination of the rest:.a j ∈ span( Ā).
From (i), it follows that the equation

ai ξ^i = 0                                  (3.1)

has a nontrivial solution. This means that, for example, there is some j for which
ξ^j ≠ 0. Without loss of generality, by scaling if necessary, we may put ξ^j = −1 and
we obtain from Eq. (3.1)

as ξ^s − 1·aj = 0                           (3.2)
(3.2)

and .a j = as ξ s , which proves (ii).


(ii) .⇒ (i): (ii) means that we can choose .−a j = as ξ s which is .as ξ s + a j = 0. This
proves (i).
(i) ⇒ (iii): From (i) we can conclude that, if we take without loss of generality
α^j = −1,

aj = as α^s,  α^s ∈ K.                      (3.3)

If v ∈ span A, we have

v = ai v^i = as v^s + aj v^j with j = j0 fixed, v^i ∈ K.          (3.4)

We insert Eq. (3.3) into Eq. (3.4) with j = j0 fixed (v^i ∈ K) and we so obtain

v = as v^s + (as α^s)v^j,                                          (3.5)

which is

v = as (v^s + α^s v^j) with v^s + α^s v^j ∈ K.                     (3.6)

Equation (3.6) shows that (iii) is proven.


“(iii) .⇒ (i)”: (iii) says that if .a j ∈ span A, we have also .a j ∈ span Ā. This means
that, as above,
.a j = as α
s

and
. − a j + as αs = 0.

This proves (i) and altogether the above lemma. ∎

The next lemma refers to a special feature of a spanning list. A spanning list is
very “near” to a linearly dependent list.

Lemma 3.3 Spanning and linearly dependent list.

. A m = (a1 , . . . , am ) is a spanning list in . V , . V = span(A m ). If we add to


. Am one more vector, we get . Am+1 = (a1 , . . . , am , v) with .v ∈ V which is a
linearly dependent list.

Proof This is almost trivial: Since.V = span Am , we have.v ∈ span Am which means
that.−v = ai ξ i , i ∈ I (m). This is equivalent to.v + ai ξ i = 0 which shows that. Am+1
is linearly dependent. ∎

The next proposition concerns a relationship between a linearly independent list


and a spanning list. This relationship is completely evident in our above “model”,
following Definition 3.4, with .V = Rn since whenever .k < n, span(e1 , . . . , ek ) <
span(e1 , . . . , en ) = Rn . Equivalently, if . Ak = (a1 , . . . , ak ), like .(e1 , . . . , ek ), is a lin-
early independent list, then we have again.span(Ak ) < Rn . So the length (cardinality)
of any linearly independent list is less or equal to the length of a spanning list.

But in our case, here in this section, we cannot use the above considerations. In
this section, up to now with an abstract vector space .V , we have not yet defined what
a basis is. We neither know what a dimension of a vector space is. The following
proposition will help us, among other things, to prove the existence of a basis in .V .

Proposition 3.2 The length of a linearly independent and a spanning list.

The length of a linearly independent list is less than, or equal to the length
of a spanning list in .V .

Proof Since .V is an abstract vector space, we do not have enough structures to do


calculations. We have to try very elementary steps. We try to exchange the vectors in
the spanning list with those from the linearly independent list. This is the exchange
procedure, which was very important in the older literature. We start with . A ≡ Ar :=
(a1 , . . . , ar ), a linearly independent list and with .C ≡ Cm = (c1 , . . . , cm ), a span-
ning list. We have to show that r ≤ m, that is, ♯(Ar) ≤ ♯(Cm). We add a1 to C and
we so obtain a new list (a1, C) which is now, according to our preceding lemma
(3.3), a linearly dependent list and, of course, again a spanning list. Using the linear
dependence Lemma 3.2, we can throw out one of the vectors in C, and we obtain
C1 = (a1, c2, . . . , cm), a spanning list again, with length m and possibly with a
different numbering.
The next step leads quite similarly to C2 = (a1, a2, c3, . . . , cm). Proceeding in the same
way, we obtain Cr−1 = (a1, . . . , ar−1, cr, . . . , cm), a spanning list again.
The last step leads to Cr = (a1, . . . , ar, cr+1, . . . , cm), again a spanning list. As we
see, ♯(Cr) = ♯(C), of course. It is clear that we get r ≤ m (♯(Ar) ≤ ♯(Cm)). ∎


see, .*(Cr ) = *(C), of course. It is clear that we get .r < m (*(Ar ) < *(Cm )). ∎

So far, we discussed two important properties for a given fixed number of vectors
(a1 , . . . , am ) in .V . Such a list of vectors can be linearly independent or not and
.
spanning or not. The possibility of a list of vectors being both, linearly independent
and spanning, seems more attractive than the other three possibilities. This leads to
the definition of a basis for a finitely generated vector space .V which we consider in
this book.

Definition 3.6 Basis.


A list . B = (b1 , . . . , bn ) of vectors in .V is a basis for .V if it is linearly inde-
pendent and spans .V .

We consider four equivalent definitions for a basis in .V .

Proposition 3.3 Equivalent definitions for a basis.

The following four statements are equivalent.


(i) . B = (b1 , . . . , bn ) is a basis for.V that is linearly independent and spanning
.V ;
(ii) Every vector .v ∈ V is a unique linear combination of vectors in . B;
(iii) . B = (b1 , . . . , bn ) is a maximally linearly independent list in .V ;
(iv) . B = (b1 , . . . , bn ) is a minimally spanning list in .V .

Proof We show that (i) .⇔ (ii), (i) .⇔ (iii), and (i) .⇔ (iv) which clearly establishes
the result.
(i) .⇒ (ii): Given (i), every .v ∈ V is a linear combination of vectors in . B, since
. B, according to (i), is also linearly independent. The above linear independence and

uniqueness lemma states that this linear combination is unique. So we proved (ii).
(ii) ⇒ (i): Given (ii), every vector $v$ is a linear combination of vectors in $B$, so
$B$ spans $V$. Since this linear combination is unique, the linear independence and
uniqueness lemma tells us that $B$ is also linearly independent. This proves (i). So we
proved the statement (i) ⇔ (ii).
(i) .⇒ (iii): Given (i), we have to show that the linearly independent list . B is
maximal. Since . B spans .V , if we add any vector .v ∈ V to . B, we get .(B, v) =
(b1 , . . . , bn , v), and according to the above remark, this list is now linearly dependent,
and not linearly independent any more. This means that . B is linearly independent
and maximal. This proves (iii).
(iii) .⇒ (i): We have to show that . B is linearly independent and spans .V . Since . B
is already linearly independent and maximal, we have only to show that . B spans .V .
. B being maximally linearly independent, if we add any .v ∈ V , we get .(B, v) which
is now linearly dependent. Therefore, .v is a linear combination of . B. So . B spans .V
and (i) is proven. This is why the statement (i) .⇔ (iii) is proven too.
(i) .⇒ (iv): (i) means that . B = (b1 , . . . , bn ) spans .V and is linearly independent.
According to the linearly dependent lemma above, if we delete a vector of this list
and write, for example . B0 = (b1 , . . . , bn−1 ), then . B0 does not span .V any more. So
. B spans . V and is minimal. This proves (iv).
(iv) ⇒ (i): We start with $B$, a minimal spanning list. This means, for example, that
the list $B_0 = (b_1, \ldots, b_{n-1})$ does not span $V$ any more: $\mathrm{span}(B_0) \ne \mathrm{span}(B)$. In this
case, the linear dependence lemma tells us that $B$ is linearly independent. So $B$ spans
. V and is linearly independent. This proves (i) and we proved the statement (i) .⇔

(iv). ∎
We considered all the above conditions in detail and will do so as well in what
follows because a basis is our best friend in linear algebra!
The existence of a basis is given by the following proposition:

Proposition 3.4 Basis existence.

Every finitely generated vector space .V possesses a basis.

Proof This can be seen as follows. Since .V is finitely generated, we can start by
a spanning list: Say .span(v1 , . . . , vm ) = V . If the list is not linearly independent,
we throw out some vectors of it until we obtain a minimally spanning list. This is,
according to the above proposition, a basis of .V . ∎
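
In $\mathbb{R}^n$ the "throwing out" of redundant vectors in this proof can be carried out numerically. The following Python sketch (our own illustration, with an arbitrarily chosen spanning list of $\mathbb{R}^2$) keeps a vector only if it is not a linear combination of the vectors kept so far; what remains is a minimal spanning list, i.e. a basis:

```python
import numpy as np

def extract_basis(spanning_list):
    """Drop every vector that does not increase the rank; the survivors
    form a minimal spanning list (a basis) of the span."""
    basis = []
    for v in spanning_list:
        candidate = basis + [v]
        if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
            basis.append(v)
    return basis

# a spanning list of R^2 with redundant vectors (made-up example)
L = [np.array([1., 2.]), np.array([2., 4.]), np.array([0., 1.]), np.array([3., 1.])]
B = extract_basis(L)
print(len(B))   # 2 = dim R^2
```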
The existence of bases does not mean a priori that every basis for .V has the same
number of vectors (the same length). But, as the next corollary shows, it does.

Corollary 3.1 On the cardinality (length) of a basis.

In a finitely generated vector space $V$, every basis has the same finite number
of vectors.

Proof Here we can apply Proposition 3.2: the length of a linearly independent
list is less than or equal to the length of a spanning list. We start with the two bases $B$ and
$C$. $B$ is linearly independent and $C$ spans $V$. So we have $l(B) \le l(C)$. Similarly, we
can say that $C$ is linearly independent and $B$ spans $V$ so that we have $l(B) \ge l(C)$.
It follows that $l(B) = l(C)$. ∎

This means that the number of vectors in a basis, the length of a basis, is universal
for all bases in a vector space .V , and this is what we call the dimension of .V .

Definition 3.7 Dimension.

The dimension of a finitely generated vector space is the length of any basis.

We denote the dimension by $\dim_{\mathbb{K}} V$. The dimension depends on the field $\mathbb{K}$. If the
field for $V$ is clear, we may write $\dim V$, but we have to know that, for example,
$\dim_{\mathbb{R}} V \ne \dim_{\mathbb{C}} V$. It now becomes more apparent that the characteristic data of a
vector space $V$ are the field $\mathbb{K}$ and its dimension. This also justifies the isomorphism
$V \cong \mathbb{K}^n$. But since $\mathbb{K}^n$ has much more structure than the abstract vector space $V$
with $\dim_{\mathbb{K}} V = n$, the isomorphism refers to those structures in $\mathbb{K}^n$ which correspond
only to the structure of $V$.

3.2 Basis Dependent Coordinate Free Representation

Now that we have a basis . B = (b1 , . . . , bn ) for .V , we may ask how many bases
exist for .V . As we already mentioned, Poincaré might have proposed this question to
Einstein. We will discover that this goes very deeply into what relativity is (see also
Chap. 4). Apart from this, to understand linear algebra, it is fundamental to have a
good understanding of the space of bases. But here, the question initially arises what
the individual bases are useful for.
It is therefore helpful to discuss what a given basis makes of an abstract vector
space and, in particular, what a basis makes of a vector.
A given basis . B = (b1 , . . . , bn ) determines for each abstract vector .n numbers,
its coordinates. This leads to the parameterization of the given abstract vector space,
using the standard vector space .Kn and, in particular, it allows to describe each

abstract vector by .n numbers. We also speak of the representation of a vector space


and the representation of a vector by an .n × 1-matrix (a column).

3.2.1 Basic Isomorphism Between $V$ and $\mathbb{R}^n$

To facilitate our discussion in the remaining part of this section, we consider a


real vector space with .dim V = n. A basis . B for .V is given by a list of .n linearly
independent vectors.
. B = (b1 , . . . , bn ).

This allows us to write every vector $v \in V$ as a (unique) linear combination with $\xi^i \in \mathbb{R}$ and $i \in I(n) := \{1, 2, \ldots, n\}$:

$$v = \sum_{i=1}^{n} \xi^i b_i = (\xi^i b_i)_n.$$

We shall mostly denote scalars by small Greek letters, vectors by small Latin letters,
and matrices by capital letters; covectors are also denoted by small Greek letters,
taking care not to confuse them with the notation for scalars.
The scalars .ξ i are also called coefficients or components of .v with respect to
the basis . B. We use the Einstein convention for the summation and in addition
some obvious notations, as usual, setting, for example, .(ξ i bi )n = ξ i bi whenever no
confusion is possible. Further on, we consider the column vector or column-matrix $\vec{\xi}$:

$$\vec{\xi} = \begin{bmatrix} \xi^1 \\ \vdots \\ \xi^n \end{bmatrix} = (\xi^i)_n = (\xi^i)$$

as an element of .Rn identifying .Rn with .Rn×1 (the column-matrices) and using again
an obvious notation. What follows corresponds to Sect. 3.1 and to the notation pre-
sented there. For a fixed . B, this leads to a bijection between the elements of .Rn
and .V , as .ξ→ ↔ v, or more precisely to a linear bijection or isomorphism .ψ B (basis
isomorphism):

$$\psi_B : \mathbb{R}^n \longrightarrow V, \qquad \vec{\xi} \longmapsto \psi_B(\vec{\xi}\,) := \xi^i b_i = [B]\vec{\xi} = [b_1 \ldots b_n] \begin{bmatrix} \xi^1 \\ \vdots \\ \xi^n \end{bmatrix}.$$

If we think in terms of manifolds, a basis $B$ induces here a (global) linear parametrization for the abstract vector space $V$ and equivalently a (global) linear chart or coordinate map $\psi_B^{-1} =: \phi_B$ from $V$ to $\mathbb{R}^n$:

.φ B : V −→ Rn ,
v |−→ φ B (v) = v B = ξ→ ∈ Rn .

The linear map .φ B , given by the basis . B, is also called a representation. It is, as well
as .ψ B , a linear bijection, an isomorphism, given by the basis . B and therefore also
called basis isomorphism.
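
In the model space, $\psi_B$ is simply multiplication by the matrix $[B]$ whose columns are the basis vectors, and $\phi_B = \psi_B^{-1}$ amounts to solving a linear system. A minimal Python sketch (numpy assumed, with a made-up basis of $\mathbb{R}^3$):

```python
import numpy as np

# columns of B_mat are the basis vectors b_1, b_2, b_3 (a made-up basis of R^3)
B_mat = np.array([[1., 1., 0.],
                  [0., 1., 1.],
                  [0., 0., 1.]])

xi = np.array([2., -1., 3.])          # coordinate column xi
v = B_mat @ xi                        # psi_B: coordinates -> vector
xi_back = np.linalg.solve(B_mat, v)   # phi_B = psi_B^{-1}: vector -> coordinates
print(np.allclose(xi, xi_back))       # True
```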
We might want to identify $\psi_B$ with $B$, $\psi_B^{-1} = \phi_B$ with $B^{-1}$, and $[B]$ with $B$, and write

$$B : \mathbb{R}^n \xrightarrow{\;\cong\;} V \quad\text{and}\quad B^{-1} : V \xrightarrow{\;\cong\;} \mathbb{R}^n. \tag{3.7}$$

3.2.2 The Space of Bases in $V$ and the Group $Gl(n)$

As already stated, it is fundamental in mathematics, as well as in physics, to have a


good understanding of the space of bases in a given vector space. Therefore, our aim
is now to determine and discuss the space $B(V) := \{B, \ldots\}$ of all bases of $V$. We
therefore use the identification $\psi_B \equiv [B] \equiv [b_1 \ldots b_n] \equiv B$ and we have to consider
the space of basis isomorphisms

$$B(V) = \mathrm{Iso}(\mathbb{R}^n, V) = \{\psi_B\}.$$

This point of view allows determining the spaces . B(V ) and finding the correct
behavior under bases and coordinate changes. It also allows us to determine precisely
what a coordinate-free notion means in a formalism, mainly when this formalism
depends explicitly on coordinates. This is also the case with tensor calculus.
What follows is a very interesting and important example and application of Sect.
1.3 about the group action and the definitions there. The key observation is that the
group .Gl(n) of linear transformations (“transformation” in this book is a synonym
for “bijection”) in .Rn , surprisingly acts also on the space . B(V ), even if .V is an
abstract vector space where the dimension .n is not visible as with .Rn . On the other
hand, it is clear that we have the following .Gl(n) actions on .Rn :

$$Gl(n) \times \mathbb{R}^n \longrightarrow \mathbb{R}^n, \qquad (g, \vec{\xi}\,) \longmapsto g\,\vec{\xi} \tag{3.8}$$

and also (with .[B] ≡ ψ B )

$$B(V) \times Gl(n) \longrightarrow B(V), \qquad (B, g) \longmapsto \psi_B \circ g \equiv [B] \circ g \equiv Bg. \tag{3.9}$$

This action is naturally given by the diagrams

$$g : \mathbb{R}^n \longrightarrow \mathbb{R}^n, \qquad [B] : \mathbb{R}^n \xrightarrow{\;\cong\;} V, \qquad\text{and}\qquad \mathbb{R}^n \xrightarrow{\;g\;} \mathbb{R}^n \xrightarrow{\;[B]\;} V, \quad\text{i.e., } [B] \circ g.$$

The following proposition concerning the .Gl(n) action on . B(V ) which we give
without proof, answers our question about the space of bases in .V .

Proposition 3.5 $B(V) \underset{\mathrm{bij}}{\cong} Gl(n)$.

The group $Gl(n)$ acts on $B(V)$ freely and transitively. So we get the bijection

$$Gl(n) \xrightarrow{\;\mathrm{bij}\;} B(V), \qquad g \longmapsto \Psi(g) := B_0\, g, \quad B_0 \in B(V).$$

This means that there is a bijection between the elements of . B(V ) and the elements
of .Gl(n) (i.e., . B ↔ g). Because of transitivity, for a fixed basis . B0 , for each . B, we
have .∃!g ∈ G with . B = B0 g.

. B = B0 g or B(V ) = B0 Gl(n).

. B(V ) is an orbit of .Gl(n) relative to . B0 . Note that . B(V ) is the so-called .Gl(n) torsor.
A space which is an orbit of a group .G is called a homogeneous space (see
Remark 1.2). So . B(V ) is a homogeneous space of the group .Gl(n). A free action
means that for every . B1 and . B2 ∈ B(V ) there exists a unique .g12 ∈ Gl(n) so that
. B2 = B1 g12 . This is analogous to the connection between a vector space . V and its

associated affine space. The homogeneous space . B(V ) corresponds to the affine
space . X = (V, T (V ), τ ) as discussed in Sect. 2.5 and the group .Gl(n) corresponds
to the abelian group .V .
The group $Gl(n)$ is also called the structure group of the vector space $V \cong \mathbb{R}^n$.
Relativity here means that the geometric object $v \in V$ can be represented by an
element of $\mathbb{R}^n$ relative to the coordinate system $B$ as $\psi_B^{-1}(v) = v_B \in \mathbb{R}^n$.

In addition, every different coordinate system $C \equiv \psi_C \in B(V)$ is good enough to
represent $V$ and can be obtained from $B$ by a transformation $\bar{g} \in Gl(n)$. So we obtain
$C = B\bar{g}$ and $\psi_C^{-1}(v) = v_C \in \mathbb{R}^n$. Obviously, the vector space $V$ is characterized by

the .Gl(n) relativity and in connection with this, the .Gl(n) group is also called the
structure group, and usually in physics, the symmetry group of the theory.

3.2.3 The Equivariant Vector Space of $V$

Our next step is to construct a new vector space .Ṽ which contains in a precise way
all the representations of the vectors .v in .V and can be identified with our original
vector space .V . As we shall see, the result is a coordinate-free formulation of the
. Gl(n)-relativity of . V . Coordinate-free here means, by the explicit use of coordinate
systems, that we work with all coordinate systems simultaneously. In other words,
the tensor calculus, as applied in physics and engineering, which depends explicitly
on coordinates, can be formulated in a precise coordinate-independent way. In this
sense, it is equivalent to any coordinate-free formulation if done right in a consistent
notation.
The vector space $\tilde{V}$ is given as a set $\tilde{V} = \{\tilde{z}, \tilde{y}, \tilde{x}, \ldots\}$ of $Gl(n)$-equivariant maps
from $B(V)$ to $\mathbb{R}^n$ (see Definition 1.9). As we saw, $B(V)$ is a right $Gl(n)$ space (see
Definition 1.4) and we consider $\mathbb{R}^n$ as a left $Gl(n)$ space. As we saw in Eqs. (3.8)
and (3.9), both actions are canonically given. This justifies the equivariance property
we demand, so we have for .z̃ ∈ Ṽ

. z̃ : B(V ) −→ Rn ,
B |−→ z̃(B),

with
. z̃(Bg) = g −1 z̃(B). (3.10)

See also Comment 1.1 on the meaning of the right action. This may also be shown
by the commutative diagram


. B(V ) −→ Rn
g↓ ↓ g −1
B(V ) −→ Rn . (3.11)

In Eq. (3.11), we interpret the .Gl(n) action on .Rn also as a right action:

$$\mathbb{R}^n \times Gl(n) \longrightarrow \mathbb{R}^n, \qquad (\vec{\xi}, g) \longmapsto g^{-1}\vec{\xi}. \tag{3.12}$$

Definition 3.8 The equivariant vector space $\tilde{V}$.

Taking into account Eqs. (3.10), (3.11), and (3.12), the vector space $\tilde{V}$ can
be written, with $\mathcal{B} := B(V)$, as

$$\tilde{V} := \widetilde{\mathrm{Map}}(\mathcal{B}, \mathbb{R}^n) = \mathrm{Map}_{equ}\big(B(V), \mathbb{R}^n\big). \tag{3.13}$$

For good reasons, we may call $\tilde{V}$ the equivariant vector space of $V$. We have here an
example of a "complicated" vector space (see Comment 2.9 in Sect. 2.6)! $\tilde{V}$ is a vector
space since its elements are vector valued maps ($\mathbb{R}^n$-valued). Furthermore, $\dim \tilde{V} = \dim V$
holds since $\tilde{z} \in \tilde{V}$ is equivariant and the group $Gl(n)$ acts transitively on $B(V)$. So
if we define $\tilde{z}$ at one given $B_0 \in B(V)$, then its value is also given by the equivariance
property in every other basis $B$, for example, $B = B_0 g$. This leads to

$$\tilde{z}(B) = \tilde{z}(B_0 g) = g^{-1}\tilde{z}(B_0). \tag{3.14}$$

The vector space $\tilde{V}$ has the same dimension: $\dim \tilde{V} = \dim \mathbb{R}^n = \dim V = n$. The
rest is shown by the following proposition:

Proposition 3.6 The equivariant vector space of .V .

The vector space $\tilde{V} = \widetilde{\mathrm{Map}}(\mathcal{B}, \mathbb{R}^n)$ is canonically isomorphic to $V$, so we
have $V \underset{k}{\cong} \tilde{V}$.

Proof We already know that $\tilde{V}$ is a vector space and that its dimension is $\dim \tilde{V} = n$.
Addition and scalar multiplication are given as follows:

For $\tilde{z}, \tilde{y} \in \tilde{V}$, that is, $\tilde{z}, \tilde{y} : B(V) \longrightarrow \mathbb{R}^n$,

we have $(\tilde{z} + \tilde{y})(B) := \tilde{z}(B) + \tilde{y}(B)$,

and $(\lambda\tilde{z})(B) := \lambda\tilde{z}(B)$.

The canonical isomorphism $V \underset{k}{\cong} \tilde{V}$ is defined by

$$k : V \longrightarrow \tilde{V} = \widetilde{\mathrm{Map}}(\mathcal{B}, \mathbb{R}^n), \qquad v \longmapsto k(v) =: \tilde{v}$$

with

$$\tilde{v}(B) := B^{-1}(v) \in \mathbb{R}^n \tag{3.15}$$

and applying various identifications, we have

$$\tilde{v}(B) = [B]^{-1}(v) = B^{-1}(v) = \psi_B^{-1}(v) = \phi_B(v) = v_B = \vec{v}_B \in \mathbb{R}^n. \tag{3.16}$$

We use the notation .ṽ(B) = v→B ∈ Rn which might be more familiar.


It is evident that $k(v) = \tilde{v}$ depends only on $v$ in this canonical way, independently
of a specific $B$ or, more precisely, $\tilde{v}$ depends on all $B$ simultaneously in an equivariant
way, as in Eqs. (3.10) and (3.11). We could also say that this property is consistent
with the group $Gl(n)$ action. What is now left is to show that the explicitly defined
map $\tilde{v}(B) := B^{-1}(v)$ is indeed an equivariant map. So we have altogether:

$$B : \mathbb{R}^n \longrightarrow V, \qquad B \circ g : \mathbb{R}^n \xrightarrow{\;g\;} \mathbb{R}^n \xrightarrow{\;B\;} V, \qquad (B \circ g)^{-1} : V \xrightarrow{\;B^{-1}\;} \mathbb{R}^n \xrightarrow{\;g^{-1}\;} \mathbb{R}^n,$$

$$\tilde{v}(Bg) = (Bg)^{-1}(v) = g^{-1} \circ B^{-1}(v) = g^{-1}\tilde{v}(B). \tag{3.17}$$

This represents simultaneously the effect of any change of basis and shows that
$k(v) \equiv \tilde{v}$ is indeed equivariant. The proposition is proven, and the identification
between $V$ and $\tilde{V}$ is established. ∎

This means that geometrically we can work with .V or with .V ~, it is completely


equivalent. The identification of.ṽ ≡ v should now be clear. For every basis. B,.ṽ gives
its representation with this basis (coordinate system). .ṽ ≡ v is the quintessence of all
the representations of the given abstract vector .v ∈ V relative to all authorized (here
. Gl(n)) coordinate systems simultaneously. In this sense, .ṽ ≡ v is coordinate-free!
In order to reformulate our results in a more familiar formalism, we have to
necessarily consider, once again, some of the various notations which also appear in
the literature. So we have for example for the basis . B in .V

$$\psi_B \equiv B \equiv (b_1, \ldots, b_n) \equiv [b_1 \ldots b_n] \equiv [B].$$

The expression .[b1 . . . bn ] which we sometimes also abbreviate by .[bi ], is a .1 × n-


matrix, that is, a row-matrix with vector entries. It could also represent (as the list
.(b1 , . . . , bn )) the isomorphism .ψ B . This was also presented in the previous section.
What we are doing here is to identify the linear map .ψ B : Rn → V (which is a
parametrization of the abstract vector space .V with the standard vector space .Rn )
with the list of the basis vectors $(b_1, \ldots, b_n)$, and with the $1 \times n$-matrix $[b_1 \ldots b_n]$
with basis vector entries. Then we denote all these by the symbol . B which we call
simply a basis. For the equivariant map .ṽ we may write, as above, similarly:

ṽ(B) ≡ v→B ≡ (v iB )n ≡ [v] B ≡ v B ∈ Rn .


.

For example, $[v]_B$ is an $n \times 1$ column-matrix with scalar entries. The diagram (3.11)
and Eq. (3.12) express a change of basis via equivariance: $B \overset{g}{\longmapsto} B' = Bg$ and we
get

$$\vec{v}_{B'} = \tilde{v}(B') = \tilde{v}(Bg) = g^{-1}\tilde{v}(B) = g^{-1}\vec{v}_B. \tag{3.18}$$

If we set $g^{-1} = h$, we have

$$\vec{v}_{B'} = \tilde{v}(B') = \tilde{v}(Bh^{-1}) = h\,\tilde{v}(B) = h\,\vec{v}_B. \tag{3.19}$$

The last equation is the usual form for a change of basis for the coefficient vectors.
In what follows, we recall change of basis in the standard form, as usually done in
physics.
Taking a second basis.C = (c1 , . . . , cn ) for.V , we have analogously.ṽ(C) = vC =
v→C = [v i ]C ∈ Rn . So there exists a matrix .T ∈ Gl(n) with scalar entries .τsi ∈ R:

.i, s ∈ I (n) := {1, · · · , n},

. T = (τsi ) ≡ [τsi ],

so that
ψ B = ψC ◦ T or equivalently B = C T ⇔ C = BT −1 .
. (3.20)

So we have for $v \in V$

$$v = \psi_B(\vec{v}_B) = \psi_C(\vec{v}_C) \tag{3.21}$$

in various notations:

$$v = \psi_B(\vec{v}_B) \equiv [B][v^i]_B = B\vec{v}_B = C\vec{v}_C = \psi_C(\vec{v}_C). \tag{3.22}$$

The result of the map $\tilde{v}$ is again given by $\tilde{v}(C) = \vec{v}_C$. Using Eqs. (3.20), (3.21), and
(3.22), we can write:

$$\vec{v}_C = \tilde{v}(C) = \tilde{v}(BT^{-1}) = T\tilde{v}(B) = T\vec{v}_B. \tag{3.23}$$

The result of Eq. (3.23), .v→C = T v→B , is exactly the result of the equivariant property
of .ṽ.
The appearance here of .v ∈ V as the map .ṽ which is explicitly coordinate (basis)
dependent, legitimizes the formalism of coordinates of linear algebra and the tensor
calculus to be as rigorous as any coordinate-free formulation.
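
The transformation rule $\vec{v}_C = T\vec{v}_B$ of Eq. (3.23) is easy to check numerically: if $B = CT$, i.e. $C = BT^{-1}$, the coefficient columns transform with $T$. A Python sketch with made-up bases of $\mathbb{R}^2$ (numpy assumed):

```python
import numpy as np

B = np.array([[1., 1.],
              [0., 1.]])              # columns: basis B of R^2
T = np.array([[2., 1.],
              [1., 1.]])              # an invertible matrix, T in Gl(2)
C = B @ np.linalg.inv(T)              # C = B T^{-1}, equivalently B = C T

v = np.array([3., -2.])               # a vector of R^2 (the "abstract" vector)
v_B = np.linalg.solve(B, v)           # coordinates of v with respect to B
v_C = np.linalg.solve(C, v)           # coordinates of v with respect to C

print(np.allclose(v_C, T @ v_B))      # True: v_C = T v_B, as in Eq. (3.23)
```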

3.2.4 The Associated Vector Space of $V$

There is, in addition, a different formalism which shows another aspect of coordinate
independence. There is a second canonical isomorphism of vector spaces:

. V̄ ∼
= V,

where .V̄ is defined in analogy to the vector bundle formalism and by using again
the .Gl(n) action on .B(V ) and .Rn . We will discuss this here shortly since it offers
a different but equivalent point of view. Besides that, it is a very interesting and
important example of Sect. 1.2 dealing with quotient spaces.
We consider the set . M := B(V ) × Rn = {(B, x→)} which is the space of pairs
(basis, coordinate vector). . M is canonically a .Gl(n) space. Setting .G = Gl(n) we
have an action defined as:

$$M \times G \longrightarrow M, \qquad \big((B, \vec{x}\,), g\big) \longmapsto (Bg, g^{-1}\vec{x}\,) =: (B, \vec{x}\,)g.$$

For every pair .(C, y→) we consider its .G orbit:

.(C, y→)G := {(C, y→) g = (Cg, g −1 y→) : g ∈ G}

and we define the equivalence class

.[C, y→] := (C, y→)G.

It is not difficult to recognize that this (basis, coordinate vector) class corresponds
bijectively to a unique vector in .V . We expect for example .[B, x→] ↔ ψ B (→ x ). This
leads to the definition (.G := Gl(n)):

. V̄ := (B(V ) × Rn ) / G = {[B, x→]}.

We see that .V̄ is a .G orbit space since every element .[B, x→] = (B, x→)G is a .G orbit;
see also Remark 1.2 on homogeneous spaces. Then the following proposition is valid.

Proposition 3.7 .V̄ , the associated vector space to .V .

$\bar{V} = (B(V) \times \mathbb{R}^n)/Gl(n)$ is a vector space and is canonically isomorphic
to $V$:

$$\lambda : \bar{V} \longrightarrow V, \qquad [B, \vec{x}\,] \longmapsto \psi_B(\vec{x}\,) = x \in V.$$

We may call .V̄ the associated vector space to .V and .λ the isomorphism of the
structure since the vector space .V̄ can also be considered as a model for the
abstract vector space.

Proof The vector space structure of $\bar{V}$ is not trivial to reveal:

Addition: $[B, \vec{x}\,] + [C, \vec{y}\,] := ?$
We set $C = Bg$ and we have

$$[C, \vec{y}\,] = [Bg, \vec{y}\,] = [B, g^{-1}\vec{y}\,] = [B, \vec{z}\,]$$

with $\vec{z} = g^{-1}\vec{y}$, then

$$[B, \vec{x}\,] + [C, \vec{y}\,] = [B, \vec{x}\,] + [B, \vec{z}\,] := [B, \vec{x} + \vec{z}\,].$$

Scalar multiplication:
$$\alpha[B, \vec{x}\,] := [B, \alpha\vec{x}\,], \quad \alpha \in \mathbb{R}.$$

So we obtain the isomorphism $\bar{V} \cong V$. ∎

Remark 3.2 The four isomorphic vector spaces.

The following isomorphisms are valid:

$$\mathbb{R}^n \cong V \cong \tilde{V} \cong \bar{V}\,!$$

The isomorphism $\mathbb{R}^n \cong V$ is not canonical, that is, it depends on the specific
$B \in B(V)$ we chose. However, $V \cong \tilde{V} \cong \bar{V}$ are, as stated above, canonical
isomorphisms.

3.3 The Importance of Being a Basis: Hymn to Bases

We have already demonstrated the usefulness of a basis for .V . This makes it possible
to express an abstract vector .v ∈ V simply by a list of numbers (scalars). This allows
us to communicate to everybody this special vector just by numbers. The price for

this achievement is for example that, in the end, one basis is not enough and that we
have to consider all the bases . B(V ) altogether. This is demonstrated in Sect. 3.2 by
the equivariant map .ṽ : B(V ) → Rn . Stated differently, the price is that instead of a
single element .v, we have to know a special function .ṽ or the equivalence class .[v].
That is, we believe, a fair price!
It gives us even more: with given bases in $V$ and $V'$, we can, in addition, describe
a linear map $f \in \mathrm{Hom}(V, V')$ with a finite number of scalars. This list of numbers
is organized as a matrix, as is well known, and there is a linear bijection or iso-
morphism between linear maps and matrices. In addition, many properties and many
proofs within the category of vector spaces can be easily formulated by explicitly
using bases. This is what we are going to demonstrate in what follows. Even more,
we are going to realize in this book that bases are our best friends in linear algebra.
The proposition below shows that a linear map is uniquely determined by the
values of the basis vectors for the domain space:

Proposition 3.8 Basis and linear map.

Given a basis $B = (b_1, \ldots, b_n)$ of $V$, the linear map $f \in \mathrm{Hom}(V, V')$
is given uniquely by the values $f(b_i) = w_i \in V'$, $i \in I(n) = \{1, 2, \ldots, n\}$.

Proof For every .v the basis . B delivers a unique expression .v = ξ i bi .


The coefficients .ξ i ∈ K are uniquely defined. This follows from the fact that . B is
also a linearly independent list and from Lemma 3.1 on linear independence and
uniqueness. We define by linearity:

. f (v) := f (ξ i bi ) := ξ i wi . (3.24)

This shows that the value $f(v)$ is given uniquely: the coefficients $\xi^i$ are uniquely
defined and the values $w_i = f(b_i)$ are uniquely given. Therefore, there exists at most
one such map. ∎
The following proposition shows that we can choose a tailor-made basis . B0 ,
leading to further essential conclusions, as for example Theorem 3.1 below about
the normal form of linear maps which reveals its geometric character and many very
important corollaries.

Proposition 3.9 Tailor-made bases and linear map.


Let $f : V \to V'$ be a linear map, $(w_1, \ldots, w_r)$ be a basis of $\mathrm{im} f$, and
$(z_1, \ldots, z_k)$ be a basis of $\ker f$. We choose arbitrary vectors $b_i \in f^{-1}(w_i)$, $i \in I(r)$.
Then the list $B_0 := (b_1, \ldots, b_r, z_1, \ldots, z_k)$ is a basis for $V$.

Proof We show first that .span B0 = V .


For .v ∈ V we have . f (v) = (η i wi )r , η i ∈ K.
Define .u ∈ V by .u := η i bi .
So we have
$$f(u) = f(\eta^i b_i) = \eta^i f(b_i) = (\eta^i w_i)_r$$

and we obtain

. f (v) = f (u) =⇒ f (v − u) = 0 =⇒ v − u ∈ ker f =⇒


v − u = (ξ μ z μ )k , μ ∈ I (k) =⇒
v = u + (ξ μ z μ )k = (η i bi )r + (ξ μ z μ )k =⇒ v ∈ span B0 .

We show that . B0 is linearly independent: We choose .0 = (λi bi )r + (ρμ z μ )k . Apply-


ing . f to this equation, we obtain, as .{wi } is a basis for .im f ,
$$0 = \lambda^i f(b_i) + 0 \implies \lambda^i w_i = 0 \implies \lambda^i = 0 \quad \forall i \in I(r)$$

so .0 = (ρμ z μ )k is left.
Since .(z μ )k is a basis for .ker f , it follows that .ρμ = 0. This shows that . B0 is
also linearly independent. The list . B0 spans .V and is linearly independent. We so
managed to find . B0 , a tailor-made basis for .V ! ∎

It is clear that this proposition determines the numbers .dim(ker f ) = k and


.dim(im f ) = r . Combined with .dim V = n and .dim V ' = m, it shows important
geometric aspects of the map . f .
From this proposition and the proof, we obtain the following corollaries directly.
These characterize substantially the structure of the vector space homomorphisms
' '
.Hom(V, V ) with . V / = V and subsequently also the structure of linear algebra itself.

Even more, we could say that these corollaries summarize the entire representa-
tion theory of linear maps with .V ' /= V or essentially also the endomorphisms
.Hom(V, V ) if we take two different bases (see the singular value decomposition,

SVD, in Sect. 12.2). On the other hand, if we use for the description of the endomor-
phisms .Hom(V, V ) only one basis, then the problem is more challenging and leads
to more advanced linear algebra (see Chaps. 9 up to 13). We consider, as usual in
linear algebra, finite-dimensional vector spaces.

Corollary 3.2 : Rank-nullity theorem.


dim(ker f ) + dim(im f ) = dim V .
.

Proof From the length of the bases and for the basis-independent subvector spaces
ker f , .im f and .V , .k = dim(ker f ), r = dim(im f ) and .n = dim V , and the basis
.

above . B0 , we see: .k + r = n ∎
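
For a matrix $F \in \mathbb{R}^{m \times n}$, viewed as a linear map $\mathbb{R}^n \to \mathbb{R}^m$, the theorem can be checked directly: the rank gives $\dim(\mathrm{im} f)$, and a basis of the null space gives $\dim(\ker f)$. A Python sketch with a made-up matrix (it assumes SciPy is available for the null-space computation):

```python
import numpy as np
from scipy.linalg import null_space

F = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])      # a map R^4 -> R^3; the third row is row1 + row2

n = F.shape[1]                        # dim V = 4
r = np.linalg.matrix_rank(F)          # dim(im f)
k = null_space(F).shape[1]            # dim(ker f): size of a null-space basis

print(r, k, r + k == n)               # 2 2 True
```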

Corollary 3.3 Dimension of an affine space . f −1 (w).

For $w \in \mathrm{im} f \subseteq V'$, $\dim f^{-1}(w) = \dim(\ker f)$ holds.

Proof Taking into account Definition 2.18 and Exercise 2.38 about affine spaces and
linear maps, and . f −1 (w) = A(v) = v + ker f from Fig. 2.4, we get .dim f −1 (w) =
dim A(v) = dim ker f . We so obtain directly from the rank-nullity theorem,
−1
.dim f (w) = dim V − dim(im f ) and .dim f −1 (w) = dim(ker f ). ∎

Corollary 3.4 Equivalence for equal dimensions.

For . f : V → V ' linear and .dim V = dim V ' , the following conditions are
equivalent:
(i) . f is injective,
(ii) . f is surjective,
(iii) . f is bijective. ∎

See Exercise 3.22.

Corollary 3.5 Criterion for injectivity.

Let . f : V → V ' be linear, . B = (b1 , . . . , bn ) be a basis for .V and . f (bi ) =


wi for .i ∈ I (n). Then . f is injective if and only if the list .(w1 , . . . , wn ) is
linearly independent.

Proof If the list $(w_1, \ldots, w_n)$ is linearly independent, Proposition 3.9 gives $r = n$ and
$B_0 = (b_1, \ldots, b_r, z_1, \ldots, z_k)$ with $k = 0$, since $\mathrm{span} B_0 = V$. Thus, $\dim(\ker f) = 0$ and so
$\ker f = \{0\}$. This shows the injectivity. ∎

Corollary 3.6 Criterion for isomorphism.

The map . f ∈ Hom(V, V ' ) is an isomorphism if and only if for a basis . B =


(b1 , . . . , bn ) in .V and a basis . B ' = (b1' , . . . , bn' ) in .V ' , . f (bs ) = bs' , s ∈ I (n)
holds.

Corollary 3.7 Canonical basis and basis isomorphism.

For .V a vector space and . B = (b1 , . . . , bn ) a basis for .V , there exists one
canonical isomorphism .ψ B (basis-isomorphism)

ψ B : Kn −→ V with ψ B (ei ) = bi , i ∈ I (n).


.

Let . E = (e1 , . . . , en ) be the canonical basis for .Kn . .ψ B may be considered as


a parametrization of .V . With the inverse map .φ B := ψ −1 B , the pair .(V, φ B ) is
a global linear coordinate chart on .V .

This was used many a time so far!

Corollary 3.8 Identification .Hom(Kn , Km ) with .Km×n .

For every linear map


. f : Kn −→ Km ,

there exists precisely one matrix . F ∈ Km×n such that . f (→


x ) = F x→.

This shows that in this case we do not have to distinguish between linear maps and
matrices, the above . F x→ being of course a matrix multiplication.
Proof Notice that . f (ei ) ≡ Fei := f i are the columns of the matrix . F (see Example
2.23 and Sect. 2.4). So we have . F = [ f 1 . . . f n ]. ∎
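
In code, Corollary 3.8 says: given any linear map $f : \mathbb{R}^n \to \mathbb{R}^m$ as a plain function, its matrix is recovered column by column as $F = [f(e_1) \ldots f(e_n)]$. A Python sketch with a made-up map (numpy assumed):

```python
import numpy as np

def f(x):
    # a made-up linear map R^3 -> R^2, given only as a function
    return np.array([x[0] + 2 * x[1], 3 * x[2] - x[0]])

n = 3
E = np.eye(n)                                            # standard basis e_1, ..., e_n
F = np.column_stack([f(E[:, i]) for i in range(n)])      # columns are f(e_i)

x = np.array([1., -2., 0.5])
print(np.allclose(f(x), F @ x))                          # True: f(x) = F x
```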

Corollary 3.9 Representation of linear maps by matrices.

Given two vector spaces $V$ with basis $B = (v_1, \ldots, v_n)$ and $V'$ with basis
$C = (w_1, \ldots, w_m)$.
Then for every linear map $f : V \to V'$, there exists precisely one matrix
$F = (\varphi_r^i) \in \mathbb{K}^{m \times n}$ such that $f(v_r) = w_i \varphi_r^i$ for $r \in I(n)$ and $i \in I(m)$. The
map

$$M_{CB} : \mathrm{Hom}(V, V') \longrightarrow \mathbb{K}^{m \times n}, \qquad f \longmapsto F := M_{CB}(f)$$

is an isomorphism: $\mathrm{Hom}(V, V') \cong \mathbb{K}^{m \times n}$.

For the given bases. B and.C ,. F = MC B ( f ) is a representation of the linear


map . f by the matrix . F.

Proof We use the Einstein convention. The position of the indices .i and .r upstairs
and downstairs respectively refers also to the basis transformation properties
(.Gl(n), Gl(m)). As .C is a basis for .V ' , the linear combinations .wi ϕri are uniquely
determined and with the index .r fixed,
$$f_r = \begin{bmatrix} \varphi_r^1 \\ \vdots \\ \varphi_r^i \\ \vdots \\ \varphi_r^m \end{bmatrix}$$

is the .r th column of the matrix . F.


We now show that . MC B is linear: For a second map .g with matrix .G = (γri ) we
have

$$(f + g)(v_r) = f(v_r) + g(v_r) = w_i \varphi_r^i + w_i \gamma_r^i = w_i(\varphi_r^i + \gamma_r^i)$$

and

$$(\lambda f)(v_r) = \lambda w_i \varphi_r^i = w_i(\lambda\varphi_r^i).$$

So we have with . M := MC B

. M( f + g) = M( f ) + M(g)
M(λ f ) = λM( f ).

Since . B is a basis for .V , . f is, by Proposition 3.8, uniquely defined by the con-
dition . f (vs ) := wi ϕis . Therefore, . F determines uniquely the values of . f : F =
[ f 1 . . . f n ], f (B) = [C]F and . MC B ( f ) = F. So . MC B is bijective. ∎
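
Numerically, for a map between the model spaces the representing matrix $F = M_{CB}(f)$ with respect to bases $B$ (domain) and $C$ (target) is obtained by writing each image $f(b_r)$ in the basis $C$, i.e. by solving $C\,F = f(B)$ column-wise. A Python sketch with made-up data (numpy assumed):

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [0., 1., 1.]])          # f in the standard bases, as a map R^3 -> R^2

B = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])          # columns: basis B of the domain R^3
C = np.array([[2., 1.],
              [1., 1.]])              # columns: basis C of the target R^2

fB = A @ B                            # images f(b_r), written in the standard basis
F = np.linalg.solve(C, fB)            # M_CB(f): coordinates of f(b_r) with respect to C

print(np.allclose(C @ F, A @ B))      # True: C F = f(B)
```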

The most impressive consequence of the usefulness of tailor-made bases is that


they can be used to obtain the normal form for linear maps. The following theorem
reveals the geometric character of a linear map. It relates directly to the vector
spaces involved and their dimensions. Therefore, it could also be considered as the
fundamental theorem of linear maps.

Theorem 3.1 Normal form of linear maps.


Given $f : V \to V'$ linear, $n = \dim V$, $m = \dim V'$. There exist bases $B_0$ for
$V$ and $C_0$ for $V'$ so that

$$M_{C_0 B_0}(f) = \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix} \quad\text{where}\quad \mathbb{1}_r = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix} \in \mathbb{R}^{r \times r}.$$

Proof From Proposition 3.9 we have essentially the tailor-made bases . B0 and .C0 for
V and .V '
.
. B0 = (b1 , . . . , br , z 1 , . . . , z k ) ∈ B(V ).

We extend the basis in .im f, (w1 , . . . , wr ) ∈ B(im f ), to a basis .C0 of .V ' :

C0 = (w1 , . . . , wr , wr +1 , . . . , wm ) ∈ B(V ' ).


.

Then

. f (bi ) = wi for i ∈ I (r ) and


f (z j ) = 0 for j ∈ I (k).
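
The theorem can be made concrete for a matrix $A \in \mathbb{R}^{m \times n}$. The following Python sketch builds one possible pair of tailor-made bases from the singular value decomposition (this particular choice is ours, not the only one): the first $r$ columns of $B_0$ are preimages of a basis of $\mathrm{im} A$, the remaining columns span $\ker A$, and $C_0$ extends that image basis; the representing matrix then takes the normal form.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])                 # a made-up map R^3 -> R^2 of rank 1

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))                   # rank of A

B0 = Vt.T.copy()                             # columns: candidate basis of R^3
B0[:, :r] = B0[:, :r] / s[:r]                # b_i = v_i / sigma_i, so A b_i = u_i
C0 = U                                       # columns: basis of R^2 extending (u_1, ..., u_r)

N = np.linalg.solve(C0, A @ B0)              # M_{C0 B0}(f) = C0^{-1} A B0
print(np.round(N, 10))                       # [[1. 0. 0.], [0. 0. 0.]]
```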

Remark 3.3 Normal form.

The name “normal form” is historical. Behind it, however, are the notions
of equivalence relation and quotient space as introduced in Sect. 1.2. Theorem 3.1 is a very prominent example. It corresponds to the simplest possible
representative in every equivalence class.

Similarly, for example, the normal form $M_0 = \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix}$ for an $m \times n$-matrix
$M$ is a special representative of the corresponding matrices which are equivalent to the given matrix $M$. This corresponds to an improved form of the
reduced row echelon form (rref). Thus, the set of all these “normal” matrices
$\{M_0\}$ is bijective to the corresponding quotient space, as discussed in Sect. 1.2.
If we take $m \le n$ without loss of generality, we have:

$$\mathbb{K}^{m \times n}/\!\sim\; = \{[M] : M \in \mathbb{K}^{m \times n},\ \mathrm{rank}(M) = r,\ r \in I_0(m)\}.$$

So we get:

$$\mathbb{K}^{m \times n}/\!\sim\; \underset{\mathrm{bij}}{=} \Big\{ \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix} \;\Big|\; r \in I_0(m) \Big\}.$$

We observe that the set $\mathbb{K}^{m \times n}$ with infinite cardinality has the quotient space
$\mathbb{K}^{m \times n}/\!\sim$ which is finite:

$$\mathbb{K}^{m \times n}/\!\sim\; \underset{\mathrm{bij}}{=} I_0(m) = \{0, 1, 2, \ldots, m\}.$$

The relevant equivalence relation .∼ here, is given by the following definition:

Definition 3.9 Equivalent linear maps and equivalent matrices.


The linear maps . f and .g, . f , .g ∈ Hom(V, V ' ) are equivalent: . f ∼ g if and
only if there are automorphisms .Φ and .Φ' (bijective linear maps in .V and .V ' )
and a commutative diagram:

f
. V −→ V '
Φ↑ ↑ Φ'
V −→ V '
g

so that . f ◦ Φ = Φ' ◦ g or equivalently .g = Φ'−1 ◦ f ◦ Φ.


Similarly, the matrices . A and . B, . A, B, ∈ Km×n are equivalent (. A ∼ B) if
and only if . F and . F ' , invertible matrices, exist so that . B = F '−1 A F holds.

Definition 3.10 Similar operators and similar matrices.


In Definition 3.9, if .V = V ' , Φ = Φ' and . F = F ' , then . f, g ∈ Hom(V, V )
are similar if and only if . f ◦ Φ = Φ ◦ g or, equivalently, if .g = Φ−1 ◦ f ◦ Φ
holds. . A, B ∈ Kn×n are similar if and only if . B = F −1 AF holds.

Remark 3.4 On the normal form of endomorphisms.

There is a simple question: Does the same simple normal form (see Theorem
3.1) for a linear map $f \in \mathrm{Hom}(V, V')$ also apply to the case where $V' = V$ and
$C_0 = B_0$, that is, for an endomorphism $f \in \mathrm{Hom}(V, V)$? The answer here is
no. This is a difficult problem and it leads to the Jordan form. It is the question
of diagonalization or non-diagonalization of endomorphisms (operators) and
square matrices (see Chap. 9, Sect. 9.5).
But what would such a simple normal form mean? The operator . f , for
example, would in any case have a representation with a diagonal matrix, that
is, a direct decomposition of the space .V , a discrete list of .n = dim(V ) scalars
.(λ1 , λ2 , . . . λn ) and

. f (u i ) = λi u i , i ∈ I (n),

into one-dimensional subspaces .Ui , with .u i ∈ Ui ,



. V = U1 ⊕ U2 ⊕ · · · ⊕ Un ,

and correspondingly into a decomposition of . f of the form

f
. |U i = λi idU i .

We could simply represent each such one-dimensional space .Ui as a null space
of . f − λi idV :
.Ui = ker( f − λi id V ).

It is clear that if for a given index, for instance .i = 1, the scalar .λ1 is zero, then
we have .U1 = ker f and, in addition,

. im f = U2 ⊕ U3 ⊕ · · · ⊕ Un .

We thus obtain also a direct .ker f − im f decomposition

. ker f ⊕ im f = V.

It may be plausible that the search for such a simple decomposition of the
vector space .V cannot be straightforward. See also Proposition 3.13.

Comment 3.3 Notations for matrices of linear maps.


For the matrix . MC B ( f ), there exist many different notations in the literature,
and we are going to add a few more! The symbol “.≡” here means that a different
notation is used for the same object.

$$M_{CB}(f) \equiv M_B^C(f) \equiv M_{BC}(f) \equiv f_{CB} \equiv [f]_{CB} \equiv F_{CB} \equiv [f(b_1)_C \ldots f(b_n)_C] \equiv [F(B)]_C.$$

For fixed $B$ and $C$ we can define a basis for $\mathrm{Hom}(V, V')$: $f_{ir} \equiv f_r^i : V \to V'$ with

$$f_{ir}(v_s) := \begin{cases} w_i & \text{for } s = r, \\ 0 & \text{for } s \ne r. \end{cases}$$

Then $M_{CB}(f_{ir}) = E_{ir}$ where $E_{ir} = (0, \ldots, 0, e_i, 0, \ldots, 0)$ with $e_i$ in the $r$th
position:

$$E_{ir} \equiv E_r^i := \begin{bmatrix} 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 \end{bmatrix}.$$

The entry $1$ is in the $r$th column and in the $i$th row; all the other entries are zero.
This is another proof of Corollary 3.8. $\{f_{ir}\}$ is a basis of $\mathrm{Hom}(V, V')$ and
$\{E_{ir}\}$ is a basis of $\mathbb{K}^{m \times n}$. It should be evident that an isomorphism is a map that
sends a basis to a basis.

3.4 Sum and Direct Sum Revisited

It is instructive to start with the direct product or Cartesian product we know very
well because the comparison with the direct sum provides interesting insights for
both. We first briefly recall its definition:

Definition 3.11 Direct product (Cartesian product) of vector spaces.


The direct product of .U1 × · · · × Um is given by

U1 × · · · × Um := {(u 1 , . . . , u m ) : u 1 ∈ U1 , . . . , u m ∈ Um }
.

with addition .(u 1 , . . . , u m ) + (w1 , . . . , wm ) := (u 1 + w1 , . . . , u m + wm ) and


scalar multiplication by

λ(u 1 , . . . , u m ) := (λu 1 , . . . , λu m ).
.

Remark 3.5 The dimension of the direct product.


The dimension of .U1 × · · · × Um is given by

. dim(U1 × · · · × Um ) = dim U1 + · · · + dim Um .

A characteristic property of the direct product is directly related to the notion of


linear independence and we could call it a (block) linear independence. This leads
to the following definition:

Definition 3.12 Linear independence of a list of vector spaces.


We call a list of vector spaces $(U_1, \ldots, U_m)$ linearly independent if the following holds:
Any list of the form $A = (a_1, \ldots, a_m)$ with nonzero vectors $a_1 \in U_1, \ldots, a_m \in U_m$ is linearly
independent.

Using the definition in Sect. 2.7, the following results for given subspaces $U_1, \ldots, U_m$
of $V$ are directly obtained.
(i) $U_1 + \cdots + U_m \le V$;
(ii) $U_1 + \cdots + U_m = \mathrm{span}(U_1 \cup \cdots \cup U_m)$;
(iii) $\dim(U_1 + \cdots + U_m) \le \dim U_1 + \cdots + \dim U_m$.
For the sum of two vector spaces $U_1$ and $U_2$, with $U_1 \cap U_2 \ne \{0\}$ in particular not being
excluded, we have $\dim(U_1 + U_2) \le \dim(U_1) + \dim(U_2)$.
The exact relation is given by the following statement.

Proposition 3.10 Dimension of a sum of two vector spaces.

dim(U1 + U2 ) = dim U1 + dim U2 − dim(U1 ∩ U2 ).


.

Proof In an obvious notation taking

. B1 = (a1 , . . . , ak ), B2 = (b1 , . . . , bl ), B3 = (c1 , . . . , cm ),

bases of .U1 , U2 , and .U3 = U1 ∩ U2 , we may obtain new bases . B1' and . B2' for .U1 and
.U2 .

. B1' = (c1 , . . . , cm , am+1 , . . . , ak ) and B2' = (c1 , . . . , cm , bm+1 , . . . bl ).

Then we obtain a basis . B4 for .U1 + U2 given by

. B4' = (c1 , . . . , cm , am+1 , . . . , ak , bm+1 , . . . bl ).

So we can read immediately .dim(U1 + U2 ) = m + (k − m) + (l − m) = k + l −


m. This is
. dim(U1 + U2 ) = dim U1 + dim U2 − dim U1 ∩ U2 .
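
The dimension formula can be checked numerically for subspaces of $\mathbb{R}^n$ given by spanning lists: $\dim(U_1 + U_2)$ is the rank of the combined list, and $\dim(U_1 \cap U_2)$ then follows from the formula. A Python sketch with made-up subspaces of $\mathbb{R}^3$ (numpy assumed):

```python
import numpy as np

U1 = np.array([[1., 0.],
               [0., 1.],
               [0., 0.]])             # columns span U1, the x-y plane
U2 = np.array([[0., 0.],
               [1., 0.],
               [0., 1.]])             # columns span U2, the y-z plane

dim_U1 = np.linalg.matrix_rank(U1)
dim_U2 = np.linalg.matrix_rank(U2)
dim_sum = np.linalg.matrix_rank(np.hstack([U1, U2]))   # dim(U1 + U2)
dim_cap = dim_U1 + dim_U2 - dim_sum                    # dim(U1 ∩ U2) by the formula

print(dim_U1, dim_U2, dim_sum, dim_cap)                # 2 2 3 1
```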



Corollary 3.10 Basis of the direct sum of two vector spaces.


A basis . B of the direct sum .U1 ⊕ U2 is given by the disjoint union of the
two bases . B1 and . B2 of .U1 and .U2 :

. B = B1 ⊔ B2 hence dim(U1 ⊕ U2 ) = dim U1 + dim U2 .

Note: The symbol “.⊔” in . B1 ⊔ B2 here means disjoint union. If we consider


. B1 , B2 as lists, we may write . B as a new list . B = (B1 , B2 ).
Proof This follows directly from the above proof. Here we have .m = 0 so
that . B1 = (a1 . . . . , ak ), B2 = (b1 , . . . , bl ) which is . B1 ∩ B2 = ∅ so that . B =
(a1 , . . . , ak , b1 , . . . , bl ) and of course . B = B1 ⊔ B2 which means also .dim(U1 ⊕
U2 ) = dim U1 + dim U2 . ∎
Another result that is very important and almost evident is given below.

Proposition 3.11 A complementary subspace to .U .


If .U is a subspace of .V , there is always a complementary subspace .W such
that .V = U ⊕ W .

Proof Using appropriate bases in .V and .U , as in the proof of Proposition 3.10,


we see immediately the result: $B_1 = (a_1, \ldots, a_k)$ is a basis of $U$ and $B =
(a_1, \ldots, a_k, c_1, \ldots, c_l)$ a basis of $V$. We set $W = \mathrm{span}(c_1, \ldots, c_l)$. We so have
. V = U + W and since .U ∩ W = {0}, we obtain

. V = U ⊕ W.

It is evident that the choice of .W is not unique. So we may have, for example, another
subspace .Y , such that again .V = U ⊕ Y.

Remark 3.6 Complement in set theory.


The set-theoretic complement .U c of .U in .V is different: .U c = V \ U /= W.

Remark 3.7 On .ker f − im f decomposition of an operator.


The notion of an $f$-invariant subspace of $V$ is central here. A subspace $U$ of $V$ is
$f$-invariant if $f(U) \subseteq U$ holds. Remark 3.4 and Theorem 3.1 can lead to the
following question. Can an operator . f ∈ Hom(V, V ) lead to an . f -invariant
decomposition of .V such that

. V = ker f ⊕ im f

holds? For this problem, the spaces .ker f, ker f 2 , im f 2 , im f are relevant,
as is their behavior. All these spaces are . f -invariant subspaces of .V and in
particular the relation
$$\ker f \le \ker f^2 \tag{3.25}$$

holds. This follows from

$$x_0 \in \ker f \Rightarrow f x_0 = 0 \Rightarrow f^2 x_0 = f(f x_0) = f(0) = 0$$

and so $x_0 \in \ker f^2$.

The following proposition provides the answer to the above question.

Proposition 3.12 .ker f − im f decomposition.


Let $f \in \mathrm{Hom}(V, V)$. Then the following assertions are equivalent.
(i) $V = \ker f \oplus \mathrm{im} f$;
(ii) $\ker f \cap \mathrm{im} f = \{0\}$;
(iii) $\ker f^2 = \ker f$;
(iv) $\ker f^2 \le \ker f$.

Proof We already saw that .ker f ≤ ker f 2 (3.25). So assertion (iii) is equivalent to
condition (iv). Therefore, it is enough to show that (i) and (ii) are equivalent to (iv).
We now show that (iv) .⇔ (ii) and (ii) .⇔ (i) which establishes the result.
– (iv) .⇒ (ii)
Given .ker f 2 ≤ ker f , we have to show that .ker f ∩ im f = {0}: Let .z ∈ ker f ∩
im f which means .z ∈ im f or .z = f (x), and also .z ∈ ker f which means .0 =
f (z) = f ( f (x)) = f 2 (x) or .x ∈ ker f 2 . Assertion (iv) leads to .x ∈ ker f which
means . f (x) = 0, and to .z = f (x) = 0, which proves .ker f ∩ im f = {0} which
is assertion (ii).

– (ii) .⇒ (iv)
Given $\ker f \cap \mathrm{im} f = \{0\}$, we have to show $\ker f^2 \le \ker f$: Let $x \in \ker f^2$. Then
$f^2(x) = 0$, which means $f(f(x)) = 0$ and $f(x) \in \ker f$. Since $f(x) \in \mathrm{im} f$,
we have $f(x) \in \ker f \cap \mathrm{im} f = \{0\}$ so that $f(x) = 0$, $x \in \ker f$ and $\ker f^2 \le
\ker f$, which proves (iv).
– (i) ⇒ (ii)
The implication (i) ⇒ (ii) is clear by Proposition 2.3 since the direct sum $U_1 \oplus
U_2 = V$ means that $U_1 \cap U_2 = \{0\}$.
– (ii) .⇒ (i)
Given .ker f ∩ im f = {0}, we have to show .ker f + im f = V . According to
Proposition 2.3, .ker f ∩ im f = {0} means direct sum:

. ker f + im f = ker f ⊕ im f.

The rank-nullity theorem (Corollary 3.2) states that .dim(ker f ) + dim(im f ) =


$\dim V$. Since $\ker f \oplus \mathrm{im} f$ is a subspace of $V$, we obtain $\ker f \oplus \mathrm{im} f = V$ which
proves (i) and completes the proof of the proposition. ∎
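
For a square matrix $F$ the criterion is easy to test numerically: since $\ker F \le \ker F^2$ always holds, condition (iv) (and hence the decomposition $V = \ker f \oplus \mathrm{im} f$) is equivalent to $\mathrm{rank}(F^2) = \mathrm{rank}(F)$. A Python sketch contrasting a projection with a nilpotent matrix (both made up; numpy assumed):

```python
import numpy as np

def has_ker_im_decomposition(F):
    # ker F^2 = ker F  <=>  rank(F^2) = rank(F), since ker F is contained in ker F^2
    return np.linalg.matrix_rank(F @ F) == np.linalg.matrix_rank(F)

P = np.array([[1., 0.],
              [0., 0.]])              # projection: R^2 = ker P (+) im P
N = np.array([[0., 1.],
              [0., 0.]])              # nilpotent: ker N = im N, no such decomposition

print(has_ker_im_decomposition(P))    # True
print(has_ker_im_decomposition(N))    # False
```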

Proposition 3.13 Direct product and direct sum.

The linear map .Φ : U1 × · · · × Um −→ U1 ⊕ · · · ⊕ Um given by


.(u 1 , . . . , u m ) |−→ Φ(u 1 , . . . , u m ) = u 1 + · · · + u m is an isomorphism:
.(U1 × · · · × Um ) ∼ = U1 ⊕ · · · ⊕ Um .

Proof We first show that .Φ is injective, that is, .ker Φ = {0}: If .Φ(z 1 , . . . , z m ) = 0,
we have .z 1 + · · · + z m = 0. Since the sum on the right hand side above is direct, the
uniqueness of the decomposition of $0$ leads to $z_1 = 0, \ldots, z_m = 0$, which shows that
$\ker \Phi = \{0\}$ and that $\Phi$ is injective. Furthermore, the rank-nullity theorem,

. dim(ker Φ) + dim(im Φ) = dim(U1 × . . . × Um ),

with .dim(ker Φ) = 0 gives

. dim(im Φ) = dim(U1 × . . . × Um ),

which shows that .Φ is also surjective. So we proved that .Φ is an isomorphism. ∎



This means that the dimension of .U1 ⊕ . . . ⊕ Um is also given by

Corollary 3.11 $\dim(U_1 \oplus \cdots \oplus U_m) = \dim(U_1) + \cdots + \dim(U_m)$.

3.5 The Origin of Tensors

As we saw, there are various possibilities to construct a new vector space out of the
two vector spaces .U1 and .U2 . The role of the two bases . B1 and . B2 is particularly
important in this construction. As we saw (Corollary 3.10), for the case of the direct
sum .U1 ⊕ U2 , we have .U1 ⊕ U2 = span(B1 ⊔ B2 ) with .dim(U1 ⊕ U2 ) = dim U1 +
dim U2 . Note that we may also write . B1 ⊔ B2 = (B1 , B2 ) and likewise .U1 ⊕ U2 =
span(B1 , B2 ).
Now we may ask the provocative question: If we take the Cartesian product . B1 ×
B2 instead of the disjoint union . B1 ⊔ B2 , what can we say about the corresponding
vector space .W = span(B1 × B2 )?
Using the same notation as in the corollary above, the basis . BW of .W is given by

. BW = {(as , bi ) : as ∈ B1 , bi ∈ B2 } s ∈ I (k), i ∈ I (l).

Let us now for simplification reasons consider real vector spaces. As we know,
an abstract vector space is completely determined by its dimension, so we have:
.dim W = dim U1 dim U2 . The vector space . W is what is called a tensor product of
.U1 and .U2 and we write . W = U1 ⊗ U2 .

In addition, it is clear that for .W nothing changes if we write for the basis vectors
.(as , bi ) ≡ as bi ≡ as ⊗ bi and for good reasons they may be called product or even
tensor product of the basis vectors .as and .bi . So we may write . BW = B1 × B2 =
{as ⊗ bi } and we have for .w ∈ U1 ⊗ U2

w = w si as ⊗ bi with w si ∈ R.
.

It may also be clear that the new vector space .W , .W = U1 ⊗ U2 is not a subspace of
V . This justifies our characterization of the above question as provocative.
.
The tensor space .U1 ⊗ U2 depends only on .U1 and .U2 , regardless of where these
come from.
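
In coordinates, the product basis vectors $a_s \otimes b_i$ correspond to Kronecker products of coordinate columns, and $\dim(U_1 \otimes U_2) = \dim U_1 \cdot \dim U_2$ becomes visible directly. A Python sketch (numpy assumed, with the coordinate columns of two standard bases):

```python
import numpy as np

dim_U1, dim_U2 = 2, 3
E1 = np.eye(dim_U1)                   # coordinate columns of a basis of U1
E2 = np.eye(dim_U2)                   # coordinate columns of a basis of U2

# product basis: all Kronecker products a_s (x) b_i
product_basis = [np.kron(E1[:, s], E2[:, i])
                 for s in range(dim_U1) for i in range(dim_U2)]

M = np.column_stack(product_basis)
print(len(product_basis), np.linalg.matrix_rank(M))   # 6 6, i.e. dim U1 * dim U2
```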
As one may already realize, we can hardly find a subject that does not use tensors
in physics. In the Chaps. 8 (First Look at Tensors) and 14 (Tensor Formalism), we
are going to learn much more about tensor products.

3.6 From Newtonian to Lagrangian Equations

As we shall see, the transition from Newtonian to Lagrangian equations is essentially


the transition from a linearly dependent to a linearly independent system. In Newto-
nian mechanics, the Newton Axioms tell us for example the equations of motion for
a point particle with .n degrees of freedom in .Rn . The motion of this point particle
takes place in .Rn without further conditions, except of course the physical forces
that act on this particle. We use Cartesian coordinates for Newton’s equation, these
correspond to a chosen inertial frame. Usually, the space . Q in which the motion takes
place, is called configuration space, and we have here . Q = Rn . If the configuration
space . Q is not .Rn but rather a manifold or equivalently a .n-dimensional surface in
.R , that is, . Q ⊆ R , we have to derive the Lagrangian equations, starting from the
m m

Newtonian equations. In this sense, the Lagrangian equation may be considered as


the appropriate Newtonian equation for the motion in a manifold (Fig. 3.1).
An instrument to derive the Lagrangian equation starting from the Newtonian
equation is traditionally D'Alembert's principle. The crucial step here has to do
with linear algebra. It is the transition from a linearly dependent system, usually
expressed in a Cartesian coordinate system in .Rm , to a linearly independent system.
This linearly independent system corresponds to an appropriate coordinate system
in the submanifold . Q with .dim Q = n < m. This fact is often underestimated. Here,
we do the opposite and we take the point of view of linear algebra in order to perform
the same derivation, starting from the Newtonian equation in .Rm .
The motion is constrained to be on the surface $Q$ which we may parametrize with
$\psi = (\psi^i)$, using a simplified notation as follows: $i \in I(m)$, $s \in I(n)$,

Fig. 3.1 From Newtonian to Lagrangian equations



and the functions

$$\psi^i : \mathbb{R}^n \longrightarrow \mathbb{R}, \qquad (q^1, \ldots, q^n) \longmapsto \psi^i(q^1, \ldots, q^n) \equiv x^i(q^s),$$

describing the configuration space. Q of dimension.n, see Fig. 3.1. The variables.q s are
the usual generalized coordinates used in classical mechanics. For our demonstration,
the time dependence is not relevant and is therefore left out. For the mass .m of the
particles, we take, without loss of generality, .m i = m = 1 for all .i ∈ I (m). It is clear
that we cannot solve the Newtonian equation in the usual form (with .m = 1 and the
force . F):
$$\ddot{x}^i(t) = F^i, \tag{3.26}$$

since for the Cartesian coordinates $x^i$ we have the infinitesimal constraints

$$dx^i = \frac{\partial x^i}{\partial q^s}\, dq^s. \tag{3.27}$$

For this construction, in order to use linear algebra, we direct our attention to
the point . p0 ∈ Q. We consider the vector space .V = T p0 Rm ∼ = Rm and its subspace
. W := T p0 Q ∼ = R . Heuristically, we have to “project” Eq. (3.24, 3.26) onto the con-
n

figuration space . Q and particularly onto the corresponding vector space .W at the
position . p0 ∈ Q. This leads to the work done by the force . F with respect to the dis-
placement .d x in .W . Using the dot product .<|> in .Rm and the Newtonian equation
(3.24, 3.26) we obtain
$$\langle \ddot{x} \,|\, dx \rangle = \langle F \,|\, dx \rangle \tag{3.28}$$

and the corresponding tensor equation ($F_i = F^i$):

$$\ddot{x}_i\, dx^i = F_i\, dx^i. \tag{3.29}$$

In this representation, we may consider $\ddot{x}_i$ and $F_i$ as scalars and $dx^i(p_0)$ as an element
of the dual of $W$:

$$dx^i(p_0) \in W^* = (T_{p_0} Q)^*. \tag{3.30}$$

It is clear that we cannot drop the .d x i in Eq. (3.28, 3.29). The covectors .d x i are
linearly dependent as we saw in Eq. (3.26, 3.27). The covectors .dq s ( p0 ) are by
definition linearly independent. At the same time it is also clear that we have to use
Eq. (3.26, 3.27) in order to express Eq. (3.29) with the covectors .dq i ( p0 ). The latter
are linearly independent and so we obtain:

$$\ddot{x}_i \frac{\partial x^i}{\partial q^s}\, dq^s = F_i \frac{\partial x^i}{\partial q^s}\, dq^s. \tag{3.31}$$

Since .dq s are linearly independent, we may write for all .s ∈ I (n):

$$\ddot{x}_i \frac{\partial x^i}{\partial q^s} = F_i \frac{\partial x^i}{\partial q^s}. \tag{3.32}$$

From now on, we can proceed exactly the same way as in the physical literature. For
the sake of completeness, we continue with our simplified prerequisites (.m i = 1 and
time independence) and we obtain the Lagrangian equation: For the right-hand side
of Eq. (3.32), we may write:

$$F_s := F_i \frac{\partial x^i}{\partial q^s}. \tag{3.33}$$

The quantities $F_s$ are called generalized forces. In the case of the existence of a
potential $V$, we may write

$$F_s = -\frac{\partial V}{\partial q^s} + \frac{d}{dt}\frac{\partial V}{\partial \dot{q}^s}. \tag{3.34}$$

For the left-hand side of Eq. (3.32), using essentially the product rule of differentia-
tion, we obtain:
$$\ddot{x}_i \frac{\partial x^i}{\partial q^s} = \frac{d}{dt}\Big(\dot{x}_i \frac{\partial x^i}{\partial q^s}\Big) - \dot{x}_i \frac{d}{dt}\Big(\frac{\partial x^i}{\partial q^s}\Big) = \frac{d}{dt}\Big(\dot{x}_i \frac{\partial x^i}{\partial q^s}\Big) - \dot{x}_i \frac{\partial \dot{x}^i}{\partial q^s}. \tag{3.35}$$

We write $v^i = \dot{x}^i$ for the velocity coefficients. Taking into account that the kinetic
energy $T$ of the system is given by $T = \sum_{i=1}^{m} \frac{1}{2} v_i v^i$ where $v_i = v^i$, and using Eq.
(3.27), we get:

$$\frac{dx^i}{dt} \equiv \dot{x}^i = v^i(q^s, \dot{q}^s) = \frac{\partial x^i}{\partial q^s}\,\dot{q}^s, \tag{3.36}$$

$$\frac{\partial \dot{x}^i}{\partial \dot{q}^s} = \frac{\partial x^i}{\partial q^s} \tag{3.37}$$

and

$$\frac{\partial x^i}{\partial q^s} = \frac{\partial \dot{x}^i}{\partial \dot{q}^s} = \frac{\partial v^i}{\partial \dot{q}^s}. \tag{3.38}$$

This way, Eq. (3.35) takes the form


$$\ddot{x}_i \frac{\partial x^i}{\partial q^s} = \frac{d}{dt}\Big(v_i \frac{\partial v^i}{\partial \dot{q}^s}\Big) - v_i \frac{\partial v^i}{\partial q^s} = \frac{d}{dt}\Big(\frac{\partial}{\partial \dot{q}^s}\big(\tfrac{1}{2} v_i v^i\big)\Big) - \frac{\partial}{\partial q^s}\big(\tfrac{1}{2} v_i v^i\big),$$

$$\text{so}\qquad \ddot{x}_i \frac{\partial x^i}{\partial q^s} = \frac{d}{dt}\Big(\frac{\partial T}{\partial \dot{q}^s}\Big) - \frac{\partial T}{\partial q^s}. \tag{3.39}$$

From Eqs. (3.32) and (3.34), we have:

$$\ddot{x}_i \frac{\partial x^i}{\partial q^s} = F_s = -\frac{\partial V}{\partial q^s} + \frac{d}{dt}\frac{\partial V}{\partial \dot{q}^s}. \tag{3.40}$$

The equations (3.39) and (3.40) give


$$\frac{d}{dt}\Big(\frac{\partial T}{\partial \dot{q}^s}\Big) - \frac{\partial T}{\partial q^s} = \frac{d}{dt}\Big(\frac{\partial V}{\partial \dot{q}^s}\Big) - \frac{\partial V}{\partial q^s}, \tag{3.41}$$

and

$$\frac{d}{dt}\Big(\frac{\partial}{\partial \dot{q}^s}(T - V)\Big) - \frac{\partial}{\partial q^s}(T - V) = 0. \tag{3.42}$$

$$L := T - V \tag{3.43}$$

is the Lagrangian function, and as we know from Eq. (3.42),


$$\frac{d}{dt}\Big(\frac{\partial L}{\partial \dot{q}^s}\Big) - \frac{\partial L}{\partial q^s} = 0 \tag{3.44}$$

are the Lagrangian equations. As we see, the essential part for the derivation of the
Lagrangian equation up to Eq. (3.32), is just an application of linear algebra.
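
The chain of steps (3.27)-(3.44) can be reproduced symbolically for a concrete constrained system. The following Python sketch uses SymPy and, as our own example (not from the text), the planar pendulum with the constraint $x = l\sin q$, $y = -l\cos q$ and mass $1$; it builds $L = T - V$ in the generalized coordinate $q$ and forms the Lagrangian equation (3.44):

```python
import sympy as sp

t, l, g = sp.symbols('t l g', positive=True)
q = sp.Function('q')(t)                      # generalized coordinate q(t)

# constraint: motion on a circle of radius l (configuration space Q, dim Q = 1)
x = l * sp.sin(q)
y = -l * sp.cos(q)

T = sp.Rational(1, 2) * (sp.diff(x, t)**2 + sp.diff(y, t)**2)   # kinetic energy
V = g * y                                                       # potential energy
L = sp.simplify(T - V)                                          # Lagrangian, Eq. (3.43)

qd = sp.diff(q, t)
lagrange_eq = sp.simplify(sp.diff(sp.diff(L, qd), t) - sp.diff(L, q))   # Eq. (3.44)
print(sp.Eq(lagrange_eq, 0))    # l**2 * q'' + g*l*sin(q) = 0
```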

Summary

We have examined the role of bases from all angles. This role is extremely positive,
particularly concerning physics. We have also pointed out certain potential drawbacks
when expressing certain basis dependent statements. But for these drawbacks, a
satisfactory response was provided. We showed that one can look at all possible
bases simultaneously to avoid coordinate dependence.
In order to define what a basis of a vector space is, it was necessary to introduce
several elementary concepts, such as the concept of a generating system of a vector
space and linear dependence or independence. When the number of elements in a
generating list is finite, we call such a vector space “finitely generated”. These are

precisely the vector spaces we discuss in this book. The associated dimension of a
vector space was then simply defined by the number of elements in a basis.
In the remainder of the chapter, some of the most important advantages resulting from
the use of bases were discussed. Perhaps the most significant is that through bases,
abstract vectors and abstract maps can be expressed by a finite number of scalars.
Thus, bases enable concrete calculations of a theory to be performed and compared
with experiments whose results are essentially numerical.
Bases allow us, in particular, to maintain the geometric character of a linear map.
By choosing suitable tailored bases, one can find the simplest possible representa-
tion matrix of the corresponding map, which in most cases only has entries on the
diagonal that are nonzero. This is the so-called normal form of a linear map; it may
be considered essentially as the fundamental theorem of linear maps.
With the help of bases, this chapter presented the first and probably the easiest
access to tensors. At the end of the chapter, a perhaps surprising application of linear
algebra to classical mechanics was discussed.

Exercises with Hints

Exercise 3.1 The span of a list in a vector space is the smallest subspace containing
this list.
Let .V be a vector space and . A = (a1 , . . . , ak ) a list of vectors in .V . Show that .span A
is the smallest subspace of .V containing all the vectors of the list . A.

Exercise 3.2 A linearly independent sublist in a linearly dependent list.


If in the linearly dependent list.(a1 , . . . , ar , v) of vectors in.V , the sublist.(a1 , . . . , ar )
is linearly independent, show that the vector .v is a linear combination of the list
.(a1 , . . . , ar ).

The following exercise is a variation of Exercise 3.2.

Exercise 3.3 If the list .(a1 , . . . , ar ) in a vector space .V is linearly independent and
.v ∈ V , then show that the list .(a1 , . . . , ar , v) is linearly independent if and only if
.v ∈
/ span(a1 , . . . , ar ).

Exercise 3.4 This exercise shows that any list that is longer than a spanning list by
one vector is always linearly dependent.
Suppose that the list . Am = (a1 , . . . , am ) of .m vectors spans the vector space .V
(.span Am = V ). Show that any list . Am+1 with .m + 1 vectors (not necessarily con-
taining . Am ) is always linearly dependent.

The following exercise proves the existence of a basis in a finitely generated


vector space by extending a linearly independent list, using the results of the
previous Exercise 3.3.

Exercise 3.5 Suppose that the vectors .(a1 , . . . , ar ) are linearly independent. Show
that either .(a1 , . . . , ar ) is a basis in .V or that there are vectors .(ar +1 , . . . , an ) such
that .(a1 , . . . , ar , ar +1 , . . . , an ) is a basis of .V .

Exercise 3.6 Using the extension of a linearly independent list to a basis as in the
previous Exercise 3.5, we obtain the following result.
Show that all bases of a vector space have the same length. This means that if . B(V )
is the set of bases in .V and . B1 , B2 ∈ B(V ), then .l(B1 ) = l(B2 ).

Now we give another definition for the dimension of a vector space which
does not rely on a property of a basis, as usual in literature. Only the notion of
linearly dependent or linearly independent will be used.

Definition: Dimension of a vector space .V :


We consider the following set of integers given by

N(V ) := {m ∈ N : any m + 1 vectors of V are linearly dependent}


.

and
. dim V := min N(V ).

Note that for finitely generated vector spaces, as we consider them in this book,
the set .N(V ) is nonempty, .N(V ) /= ∅ and if .V /= {0}, then .dim V ≥ 1.

Exercise 3.7 Given the above definition, show that if .dim V = min N(V ) = n, then
the length .l(B) of any basis . B of .V is given by .l(B) = n.

The next two exercises are almost trivial. Using the definition of dimension by
N(V ) might make them even easier to prove.
.

Exercise 3.8 Subspace and its dimension.


If .U is a subspace of a vector space .V , then show that

. dim U ≤ dim V.

Exercise 3.9 Subspaces with the maximal dimension.


If .U is a subspace of a vector space .V and .dim U = dim V , then show that

U = V.
.

In the next two exercises, if we know the dimension of a vector space, we


consider one additional condition for a list to become a basis.

Exercise 3.10 Linearly independent list in a vector space .V with length equal to
.dim V .
Let . A = (a1 , . . . , ak ) be a linearly independent list of vectors in .V with .dim V = n.
Show that if .k = n, then . A is a basis of .V .

Exercise 3.11 Spanning list in a vector space .V with length equal .dim V .
Let . A = (a1 . . . , ak ) be a spanning list of vectors in .V with .dim V = n. Show that
if .k = n, then . A is a basis of .V .

Exercise 3.12 Linear combinations and the basis map.


Let $A = (a_1, \ldots, a_k)$ be a list in the vector space $V$, a linear combination

$$a_s \lambda^s = A\vec{\lambda} \quad\text{with}\quad \lambda^s \in \mathbb{K},\ \vec{\lambda} \in \mathbb{K}^k,$$

and the basis map $\Psi_A$ given by

$$\Psi_A : \mathbb{K}^k \longrightarrow V, \qquad e_s \longmapsto a_s.$$

Show the following assertions:

(i) $A$ is a basis in $V$ if and only if $\Psi_A$ is an isomorphism;
(ii) $A$ is a linearly independent list if and only if $\Psi_A$ is an injection;
(iii) $A$ is a spanning list if and only if $\Psi_A$ is a surjection.

The following two exercises correspond to simple examples of linear maps.



Exercise 3.13 Linear maps on a one-dimensional vector space.


If . f ∈ Hom(V, V ) with .dim V = 1, show that . f is a scalar multiplication, that is,
there is a .λ ∈ K such that :

. f (v) = λv for all v ∈ V.

Exercise 3.14 If $f$ is a linear map, $f \in \mathrm{Hom}(\mathbb{K}^n, \mathbb{K}^m)$, show that there exist scalars
$\varphi_s^i \in \mathbb{K}$ with $s \in I(n)$ and $i \in I(m)$ such that for every $\vec{v} = e_s v^s \in \mathbb{K}^n$, $v^s \in \mathbb{K}$ and
$(e_s)_n$ the standard basis in $\mathbb{K}^n$,

$$F : \begin{bmatrix} v^1 \\ \vdots \\ v^n \end{bmatrix} \longmapsto \begin{bmatrix} \varphi_s^1 v^s \\ \vdots \\ \varphi_s^m v^s \end{bmatrix} \quad\text{holds.}$$

The following exercises concern properties of linear maps.

Exercise 3.15 The image of a basis already determines a linear map. This ensures
the existence of a linear map as was required in Proposition 3.8.
Let . B = (b1 , . . . , bn ) be a basis of a vector space .V and .(w1 , . . . , wn ) any list of
vectors in a second vector space .W . Show that there exists a unique linear map
. f : V → W such that . f (bs ) = ws ∀ s ∈ I (n).

Exercise 3.16 Any linear map preserves linear dependence.


Let . f be a linear map . f : V → V ' . If the list .(v1 , . . . , vk ) in .V is linearly dependent,
show that the list .( f (v1 ), . . . , f (vk )) in .V ' is also linearly dependent.

Exercise 3.17 The preimage of a linear map preserves the linear independence in
the following sense.
Let . f be a linear map . f : V → V ' . Show that the list .(v1 , . . . , vr ) in .V is linearly
independent if the list .( f (v1 ), . . . , f (vr )) in .V ' is linearly independent.

Exercise 3.18 If a linear map is injective, it preserves the linear independence.


Let . f be an injective linear map . f : V → V ' . If the list .(v1 , . . . , vr ) in .V is linearly
independent, show that the list .( f (v1 ), . . . , f (vr )) in .V ' is also linearly independent.

Exercise 3.19 The inverse map of a bijective linear map is also linear.
If the map . f : V → V ' is an isomorphism, show that the inverse map

. f −1 : V ' −→ V

is also an isomorphism (bijective and linear).



Exercise 3.20 All isomorphic vector spaces have the same dimension.
Show that two finite-dimensional vector spaces are isomorphic if and only if they have
the same dimension.

Exercise 3.21 Criterion for isomorphism. (Corollary 3.6)


Show that the map . f ∈ Hom(V, V ' ) is an isomorphism if and only if for any basis
. B = (b1 , . . . , bn ) of . V

. f (B) = { f (b1 ), . . . , f (bn )}

is a basis of .V ' .

Exercise 3.22 Equivalence for equal dimensions. (Corollary 3.4)


Show that if . f : V → V ' is linear and .dim V = dim V ' , then the following conditions
are equivalent:
(i) . f is injective ;
(ii) . f is surjective ;
(iii) . f is bijective .

Exercise 3.23 Injectivity and dimensions.


Let .V and .V ' be vector spaces with .dim V > dim V ' . Show that no . f ∈ Hom(V, V ' )
is injective.

Exercise 3.24 Surjectivity and dimensions.


Let .V and .V ' be vector spaces with .dim V < dim V ' . Show that no . f ∈ Hom(V, V ' )
is surjective.

The following three exercises concern sums and direct sums of a vector space.

Exercise 3.25 Let .U1 , . . . , Um be subspaces of a vector space .V . Verify the follow-
ing results (see Definition 3.12 and the pages thereafter):
(i) .U1 + · · · + Um ≤ V ;
(ii) .U1 + · · · + Um = span(U1 ∪ . . . ∪ Um ) ;
(iii) .dim(U1 + · · · + Um ) ≤ dim U1 + · · · + dim Um .

Exercise 3.26 Equivalent conditions for a direct sum of subspaces of a vector space.
Let .U1 , . . . , Um be subspaces of a vector space .V and .U = U1 + · · · + Um . Show
that the following conditions for a direct sum

U = U1 ⊕ · · · ⊕ Um
.

are equivalent.

(i) Every .u ∈ U has a unique representation .u = u 1 + · · · + u m with .u j ∈ U j for


each . j ∈ I (m);
(ii) Whenever .u 1 + · · · + u m = 0 with .u j ∈ U j for each . j ∈ I (m), we have .u j = 0
for all . j ∈ I (m);
(iii) For every . j ∈ I (m), .U j ∩ (U1 + · · · + U j−1 + U j+1 + · · · + Um ) = 0.

Exercise 3.27 Equivalent conditions for a direct sum decomposition of a vector


space.
Let.U1 , . . . , Um be subspaces of a vector space.V . Show that the following conditions
are equivalent:
(i) .V = U1 ⊕ · · · ⊕ Um ;
(ii) For every . j ∈ I (m) and every basis . B j = (b1j , . . . , bnj j ) of .U j , the concatenated
list . B = (B1 , . . . , Bm ) is a basis of .V ;
(iii) .V = U1 + · · · + Um and .dim V = dim U1 + · · · + dim Um .

The next exercises offer another point of view on the origin of tensors
(Sect. 3.5) and need some preparation. We have to compare the Cartesian
product with the tensor product; the role of the scalar field is different in the
two constructions. For this comparison we consider the following two exercises.

Exercise 3.28 Cartesian product.


Let .U and .V be two vector spaces with .dim U = k and .dim V = l. We denote their
Cartesian product by
.U × V ≡ (U × V, ·)

and the scalar action of the field .K explicitly by the dot .·. So we have, as usual, for
.λ ∈ K and .(u, v) ∈ U × V :

λ · (u, v) ≡ λ(u, v) := (λu, λv).


.

Show that.(U × V, ·) is a vector space and that its dimension is.dim(U × V ) = k + l.

Exercise 3.29 Tensor product.


Let .U and .V be two vector spaces with .dim U = k and .dim V = l. We denote their
tensor product by .U ⊗ V = (U × V, Θ) and the scalar action on .U ⊗ V by .Θ. The
scalar action of .K is given by

.λ Θ (u, v) ≡ λ(u, v) = (λu, v) = (u, λv),



and we have the bilinearity conditions

. (λ1 u 1 + λ2 u 2 , v) = λ1 (u 1 , v) + λ2 (u 2 , v) and
(u, λ1 v1 + λ2 v2 ) = λ1 (u, v1 ) + λ2 (u, v2 ).

To distinguish from .U × V in Exercise 3.28, we write .(u, v) as .u ⊗ v from now on.


The elements of .U ⊗ V are linear combinations of elements of the type .u ⊗ v. Show that .U ⊗ V
is a vector space and that its dimension is .dim U ⊗ V = k · l.
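The two dimension counts in Exercises 3.28 and 3.29 can be checked numerically for the coordinate spaces .U = Rk and .V = Rl . The following sketch is our addition (not part of the exercises); it uses Python with NumPy, and the helper names are ours. It builds a spanning set of the Cartesian product by padding the two bases with zeros, a spanning set of the tensor product from all Kronecker products of basis vectors, and then reads off the dimensions as matrix ranks.

    import numpy as np

    k, l = 2, 3
    U = np.eye(k)            # columns: basis of U = R^k
    V = np.eye(l)            # columns: basis of V = R^l

    # Basis of the Cartesian (direct) product U x V: embed both bases in R^(k+l)
    prod_basis = np.hstack([np.vstack([U, np.zeros((l, k))]),
                            np.vstack([np.zeros((k, l)), V])])

    # Basis of the tensor product U (x) V: all Kronecker products of basis vectors
    tensor_basis = np.column_stack([np.kron(U[:, i], V[:, j])
                                    for i in range(k) for j in range(l)])

    print(np.linalg.matrix_rank(prod_basis))    # k + l = 5
    print(np.linalg.matrix_rank(tensor_basis))  # k * l = 6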

Chapter 4
Spacetime and Linear Algebra

It is well known that Newtonian mechanics and electrodynamics are fundamental


theories of physics and also theories of our physical spacetime. Any theory of space-
time is, of course, a geometrical theory. As we already saw, linear algebra is, in
many aspects, also geometric. The surprise is that linear algebra, along with some
group theory, allows us to describe spacetime geometry in Newtonian mechanics and
electrodynamics. Consequently, to substantiate this surprise, we will show how linear
algebra relates to the two central principles of Newtonian mechanics and electrodynamics,
in particular to the law of inertia and the relativity principle.
For thousands of years, it was taken for granted that space and time are given a priori
and that physics had to be formulated within this framework. This position, plausible
in itself, was held in science for more than two thousand years; it has its roots in the
fascination and power of the Euclidean axioms. But since Gauss and Riemann we have
suspected, and since Einstein we have known, that it is physics that determines the
structure of spacetime. This is demonstrated
in this chapter.

4.1 Newtonian Mechanics and Linear Algebra

In Newtonian mechanics, it turns out that we only need the first Newtonian law to
determine the structure of spacetime.
First Newtonian law: Every body continues in its state of rest or of uniform
rectilinear motion, except if it is compelled by forces acting on it to change that state.
This law refers particularly to a trajectory .x→(t) of a mass point. It corresponds to
the well-known equation of motion without the presence of a force:


. d2 x→(t)/dt 2 = 0. (4.1)
This means that we here postulate the existence of a special reference frame in
which the solutions of the above equation are straight lines with constant velocity
(vanishing acceleration). Without going into details, we realize intuitively that the
space where the movement takes place must be a manifold which contains straight
lines. For instance, vector spaces and affine spaces (see Sect. 2.5) are such manifolds
which may contain straight lines as a subspace. On the other hand, there is not enough
room for straight lines in a sphere or a cube.

Definition 4.1 Reference frames (choice of coordinates) in which the first
Newtonian law above has indeed the analytic form . d2 x→(t)/dt 2 = 0 are called
inertial frames.

In other words, with the above equation of motion Newton, like Galilei, postulates
the existence of an inertial frame. This is the principle of the law of inertia:
A force-free body remains at rest or in a state of rectilinear and uniform motion
if the spatial reference system and the time scale are chosen appropriately.
The law of inertia leads first to a four-dimensional affine spacetime, and it is
valid for both Newtonian mechanics and electrodynamics (special relativity) with
additional different geometric structures.
This may seem a rather harmless formulation at first, but it is a massive step for
physics. There are not many manifolds in which straight lines have enough room.
As already stated above, the underlying space has to be an affine space with additional structures.
For Newton, presumably, all the above discussion was only a question of consis-
tency since he assumed, according to the spirit of that time, that a priori the motion
takes place in a Euclidean space . E 3 , which is a three-dimensional affine space with
an inner product (dot product). His equation of motion was consistent with the math-
ematically given physical space or spacetime. Where time is concerned, Newton also
assumes a priori an affine one-dimensional Euclidean space . E 1 .
Although the notion of spacetime was not current in those times, we may assume
that his point of view of spacetime would be a four-dimensional manifold . M. If . M is
not precisely equal to the Cartesian product . E 1 × E 3 , it is at least isomorphic to it:

. M ∼= E 1 × E 3 .

Our intention is not to discuss the physical relevance of the law of inertia but
to compare the physical situation with mathematics, especially with linear algebra.
We hope that this helps appreciate the different roles of mathematics and physics
and their connection. This relation can be demonstrated within linear algebra in a
very transparent way. As we saw, in physics, we have to postulate the existence of
an inertial frame. This is a huge step in understanding our world. Its validity has

to be examined and tested by experiments, of course. It is understood that here is


not the place for such discussions. But still, turning back to linear algebra, we may
first assume, for the moment, as a model for the comparison, that our spacetime
within linear algebra corresponds to a vector space. In this case, in mathematics, the
existence of an inertial frame corresponds to the existence of a basis in a vector space
which has only to be proven. As we see, the situation in mathematics is obvious. It is
also clear that we cannot do the same in physics. The existence of a frame of inertia
has to be postulated.
After having postulated in physics the existence of an inertial frame and having
proven in linear algebra the existence of a basis in a vector space (see Proposition
3.4), we may now ask naively how many inertial frames exist and, in analogy, how
many bases exist in a vector space. Stated differently, we look for the set of Newtonian
inertial frames (which we denote by . I F(M) with . M the Newtonian spacetime), and
similarly, for the set of bases . B(V ) in a vector space .V . This leads to the principle
of relativity.
The equations describing the laws of physics have the same form in all admissible
frames of reference.
Bearing this in mind, we have to determine the admissible transformations
which transform one inertial frame into another. We expect that the set of all these
transformations forms a group, called for good reasons the Galilean group (the group
of Galilean transformations).
In linear algebra we already answered this question analogously: The set of all
bases . B(V ) of an .n-dimensional vector space .V is governed by the group of automor-
phisms .Aut(V ) of .V , the invertible linear transformations of .V , which is isomorphic to
the real group .Gl(n). So we have the isomorphism between groups:

. Aut(V ) ∼= Gl(n).

The group .Aut(V ) consists precisely of those transformations which respect the
linear structure of .V . We here consider first an abstract vector space without further
structure. This can be described by the action of the group .Gl(n) on . B(V ). With the
basis . B = (b1 , . . . , bn ) and .g = (γsi ) ∈ Gl(n); γsi ∈ R; i, s ∈ I (n) = {1, . . . , n}; we
have:

. B(V ) × Gl(n) −→ B(V )
(B, g) |−→ B ' := Bg = [bi γ1i , . . . , bi γni ].

It is well-known that .Gl(n) acts on . B(V ) freely and transitively from the right,
from which follows that the two sets, even having quite different structures, are still
bijective (see Proposition 3.5):

. B(V ) ∼=bij Gl(n).

For this reason, we may call .Gl(n) the structure group of .V . Indeed, the above action
of .Gl(n) on . B(V ) completely characterizes the linear structure on the set .V via the
set of the basis . B(V ) of .V . This is the deeper meaning of the isomorphism between
groups
. Aut(V ) ∼
= Gl(n).

This discussion within linear algebra is the model that significantly clarifies the
corresponding discussion within the Newtonian mechanics and spacetime. So we
can apply the above procedure also in Newtonian mechanics.
As already stated, the transformations which are implied by the law of inertia and
the relativity principle are given by the Galilean group. To simplify, we here consider
the part of the Galilean group which is connected with the identity, .G (a.k.a. .G ≡
G1 ≡ Gal↑(+) ). This is the so-called proper orthochronous Galilean group .Gal↑(+) . In
an obvious notation, the element .g ∈ G that maps inertial frames to inertial frames
is given by the following expression:

. t |−→ t + s
x→ |−→ x→ ' := R x→ + w→ t + a→ (4.2)

with . R ∈ S O(3), .w→, a→ ∈ R3 , .s ∈ R. . R corresponds to a rotation and .a→ to a translation
in space. .w→ corresponds to a velocity transformation and .s to a time translation.
Velocity transformation .w→ means that for a given inertial frame . I F, any other
frame . F ' which moves with constant velocity .w→ relative to . I F is also an inertial
frame .(F ' = I F ' ). It is clear that all the above expressions are mathematics which
belong to linear algebra. But linear algebra is in addition useful as it gives an analogy,
a simple model, for the situation in physics.
It also helps to clarify the role of the Galilean group in determining the structure
of spacetime in Newtonian mechanics. So we may now proceed similarly as in linear
algebra. We already know the Galilean group in physics, which was determined by
the law of inertia and the relativity principle. We consider the set of inertial frames
. I F(M) in the spacetime of Newtonian mechanics. The Galilean group . G acts freely
and transitively from the right on . I F(M):

. I F(M) × G −→ I F(M)
(I F, g) |−→ I F ' := I Fg.

So we have as in linear algebra before the bijection

. I F(M) ∼=bij G.

The Galilean group is the structure group of spacetime; this means that we have the
isomorphism of groups:

. Aut(M) ∼
= G.

That means that the Galilean group ultimately determines the structure of the Newtonian
spacetime . M: We first see from the above expression for the Galilean transformations
that . M is an affine space with additional structure. We have now to determine and
describe the additional structure in this affine space. It is helpful to return to our linear
algebra model. We learned from the equation of motion in Newtonian mechanics that
our model of spacetime as a vector space .V is not realistic and should be an affine
space. So we have indeed to take the affine space corresponding to the vector space
. V , which we denote by . A given by the triple .(A, V, ψ) as discussed in Sect. 2.5 with

.ψ the action of . V on . A.

. ψ : V × A −→ A
(v, p) |−→ p + v.

This action is by definition free and transitive. We therefore have the bijection

. V ∼=bij A.

This is still not enough. We need additional structures in the space . A. A realistic
and typical example would be to introduce a Euclidean structure to . A , for example,
our three-dimensional Euclidean space. This transforms . A into a Euclidean space
. E. This is done by definition in the corresponding vector space . V (also called the

difference space of . A), and we obtain a Euclidean vector space. So we have .V ≡


(V, < | >). We have to distinguish between the notion of a Euclidean space which
is always an affine space (affine Euclidean space) and a Euclidean vector space. An
inner product or, more generally, a scalar product (symmetric nondegenerate bilinear
form) on an affine space is defined directly on the difference space of the affine space,
which is exactly the corresponding vector space.
A simplified but quite realistic linear algebra model of spacetime inspired by
Newtonian mechanics would be a Euclidean space . E as discussed above.
We come back to Newtonian mechanics. We found that the spacetime . M in New-
tonian mechanics is an affine space given by the triplet (. A4 , V 4 , ψ) with .V 4 the
four-dimensional difference space of . A4 with possibly additional structure. If we
denote by .x a (global) coordinate chart on . M, we have for this coordinate system in
an obvious notation:

. x(A4 ) = {(t, x→) ∈ R4 },
x(V 4 ) = {(t2 − t1 , x→2 − x→1 ) =: (τ , ξ→ ) ∈ R4 }. (4.3)

The Galilean transformation (Eq. 4.2) in the coordinate of .V 4 is given by



. τ ' = τ,
ξ→ ' = w→ τ + R ξ→ . (4.4)

This leads immediately to the two invariants:

. (1) τ ,
(2) || ξ→ || if τ = 0 (where || ξ→ || = √< ξ→ | ξ→ > ). (4.5)
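As an illustrative numerical check (our addition, not part of the text), one can apply a concrete Galilean transformation of the form (4.4) to a spacetime difference vector .(τ , ξ→ ) and confirm the two invariants (4.5): .τ is always preserved, and .|| ξ→ || is preserved whenever .τ = 0. A minimal NumPy sketch, with an arbitrarily chosen rotation and velocity:

    import numpy as np

    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])   # a rotation in SO(3)
    w = np.array([0.3, -1.2, 0.5])                          # velocity transformation

    def galilei(tau, xi):
        # Eq. (4.4): tau' = tau,  xi' = w*tau + R xi
        return tau, w * tau + R @ xi

    tau, xi = 0.0, np.array([1.0, 2.0, 2.0])   # two simultaneous events (tau = 0)
    tau2, xi2 = galilei(tau, xi)
    print(tau2 == tau)                                          # tau is invariant
    print(np.isclose(np.linalg.norm(xi2), np.linalg.norm(xi)))  # ||xi|| invariant for tau = 0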

Summarizing, we can say that we found that the Newtonian spacetime . M is


an affine space . A4 with an additional structure. This can be described by the two
invariants in Eq. (4.5) .τ and .|| ξ→ ||, the latter being the length or norm defined only if
.τ = 0. So we may write

. M = (A4 , Δt, || Δ→
x || if Δt = 0).

The invariant .τ = Δt is the duration between two events that characterize the
absolute time as assumed by Newton. The second invariant .|| ξ→ ||=|| Δ→ x || is the
Euclidean distance between two simultaneous events. Therefore, it is clear that the
spacetime . M of the Newtonian mechanics is not a Euclidean or semi-Euclidean or
an affine space with certain scalar products. The reason for this “complication” is
the velocity transformation in Galilean transformations. (Eq. (4.2)).
Despite this, time and space are each separately regarded as Euclidean spaces . E 1
and . E 3 , as considered by Newton. We believe, and hopefully, the reader can also
see, that a good understanding of linear algebra is necessary to clarify the structure
of spacetime completely.

4.2 Electrodynamics and Linear Algebra

In electrodynamics, both the law of inertia and the relativity principle hold. As in the
previous section, the spacetime of electrodynamics that we denote now by .M is an
affine space . A4 as in Newtonian mechanics but with a different additional structure.
To determine the additional structure, we have to use the laws of electrodynamics.
Following Einstein, we take from electrodynamics only the existence of a photon.
This turns out to be entirely sufficient to determine the structure of spacetime in
electrodynamics. From our experience with Newtonian mechanics, we learned that
we have to search for the relevant invariants. Therefore, it is quite reasonable to use
the velocity of light .c. For this reason, we consider the speed of light (of a photon)
in two different frames of inertia, and we have:

. In I F with coordinates (t, x→) :     c2 = || Δ→x ||2 / Δt 2 . (4.6)
. In I F ' with coordinates (t ' , x→ ' ) :  c' 2 = || Δ→x ' ||2 / Δt ' 2 . (4.7)
The invariance of velocity of light given by

c' = c
. (4.8)

leads to the equation


. || Δ→x ' ||2 / Δt ' 2 = || Δ→x ||2 / Δt 2 = c2 (4.9)

or equivalently to the equation

. c2 Δt ' 2 − || Δ→x ' ||2 = c2 Δt 2 − || Δ→x ||2 = 0. (4.10)

To proceed, we make an assumption that is, in principle, not necessary but which
simplifies our derivation significantly in a very transparent way. We assume that the
Eq. (4.10) is valid also in the form:

. c2 Δt ' 2 − || Δ→x ' ||2 = c2 Δt 2 − || Δ→x ||2 = K (4.11)

with . K ∈ R. This means that the invariant . K could also be different from zero.
Defining .Δx 0 := cΔt, which has the dimension of length like .Δx i with .i ∈ {1, 2, 3},
and taking .μ, ν ∈ {0, 1, 2, 3}, Eq. (4.11) first takes the form

. (Δx ' 0 )2 − || Δ→x ' ||2 = (Δx 0 )2 − || Δ→x ||2 (4.12)

and we may define


. Δs 2 := σμν Δx μ Δx ν (4.13)

with

. S = (σμν ) =
⎡ 1  0  0  0 ⎤
⎢ 0 −1  0  0 ⎥
⎢ 0  0 −1  0 ⎥
⎣ 0  0  0 −1 ⎦ . (4.14)

The expressions in Eqs. (4.13) and (4.14) correspond to the relativistic scalar product
(a symmetric nondegenerate bilinear form), which is not positive
definite. .Δs 2 is invariant and we have, for example, for the two different frames of
inertia . I F and . I F ' :

Δs 2 (I F ' ) = Δs 2 (I F)
. (4.15)

which shows that .Δs 2 is universal. This means that the spacetime .M of electrody-
namics is an affine space . A4 with a scalar product given by the matrix . S = (σμν ),
a symmetric covariant tensor, also called the metric tensor or Minkowski metric tensor.
After diagonalization, this tensor has the canonical form given by Eq. (4.14).
Concluding, we may state that the spacetime .M of electrodynamics is given by the
pair
. M = (A4 , S). (4.16)

Since here the scalar product . S is not positive definite as in the case of a Euclidean
space,.M is here known as a semi-Euclidean or pseudo-Euclidean space or Minkowski
spacetime.
It is interesting to notice that the space .M in electrodynamics is mathematically
much simpler than the space . M in Newtonian mechanics. The space .M is formally
almost a Euclidean space, whereas in. M, as we saw, the invariants are mathematically
not as simple as a scalar product. On the other hand, the physics of .M, the spacetime
of electrodynamics (special relativity), is much more complicated and complex than
the physics of . M, the spacetime of Newtonian mechanics, because the duration .τ between
two events is no longer an invariant. At the same time, .M, the semi-Euclidean
or Minkowski spacetime, is the spacetime of elementary particle physics or simply
the spacetime of physics without gravity. This causes all the well-known difficulties
which enter into relativistic physics.
Now having found the structure of spacetime .M, it is equally interesting to deter-
mine its structure group .G.
. Aut(M) = G. (4.17)

Since we know that the space .M is a semi-Euclidean space, we expect that the
structure group .G consists of semi-Euclidean transformations. So .G is isomorphic
to the well-known Poincaré (Poin) or inhomogeneous Lorentz group. This is a special
affine group similar to a Euclidean group (affine Euclidean group). We may write
. G = Poin and we have in an obvious notation:

. Poin = {(a, Δ) : a = (αμ ) ∈ R4 , Δ = (λμν ) ∈ O(1, 3)}. (4.18)

The Poincare transformations are given by


. x μ |−→ x ' μ := λμν x ν + αμ (4.19)

or in a matrix form
. x |→ x ' = Δx + a. (4.20)

. Δ corresponds to a special linear transformation (semiorthogonal transformation)
given by:

. O(1, 3) = {Δ : ΔT σΔ = σ}. (4.21)
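To make Eqs. (4.13)–(4.15) and (4.21) concrete, here is a small numerical illustration (our addition): a boost along the .x-axis with rapidity .χ (the standard form, assumed here purely for the example) is semiorthogonal with respect to .σ and therefore leaves .Δs 2 invariant.

    import numpy as np

    sigma = np.diag([1.0, -1.0, -1.0, -1.0])      # Minkowski metric, Eq. (4.14)

    chi = 0.8                                      # rapidity of a boost along x
    L = np.eye(4)
    L[0, 0] = L[1, 1] = np.cosh(chi)
    L[0, 1] = L[1, 0] = -np.sinh(chi)              # boost matrix (assumed sign convention)

    print(np.allclose(L.T @ sigma @ L, sigma))     # True: L is in O(1,3), Eq. (4.21)

    dx = np.array([2.0, 1.0, -0.5, 0.3])           # a spacetime difference vector
    ds2 = dx @ sigma @ dx
    ds2_prime = (L @ dx) @ sigma @ (L @ dx)
    print(np.isclose(ds2, ds2_prime))              # True: Delta s^2 invariant, Eq. (4.15)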

As we see, all the mathematics we used in this chapter belong formally to linear
algebra. After this experience, we may expect that all the mathematics we need for
symmetries in physics also belongs to linear algebra.

Summary

In this chapter, we discussed one of the most important applications of linear algebra
to physics. Starting from two fundamental theories of physics, Newtonian mechan-
ics and electrodynamics, essentially using linear algebra alone, we described the
structure of spacetime.
Using Newton’s axioms, specifically employing the principles of inertia and rela-
tivity, we derived the spacetime structure of Newtonian mechanics, which is famously
associated with the Galilean group.
For the description of the spacetime of electrodynamics, which simultaneously
represents the spacetime of elementary particle physics and essentially the space-
time of all physics if one wants to exclude gravitational interaction, we followed
Einstein’s path: from electrodynamics, we only adopted the properties of the photon,
the elementary particle associated closely with the electromagnetic force. With the
photon, the principle of relativity, and linear algebra, we described the spacetime of
physics without gravity. This spacetime is famously also closely connected with a
group of transformations, the Poincaré group.



Chapter 5
The Role of Matrices

What is a matrix? We consider matrices as elements of .Km×n . We have first to clarify


a potential source of confusion. Matrices by themselves are not tensors in the sense
of tensor calculus. They do not have a specific transformation rule, so, for example,
the transformation property of a matrix representing a linear map is different to the
transformation property of a matrix representing a given scalar product. Therefore
we should always clarify how to use matrices in a given situation. Matrices seem
to be very flexible objects, and we use this feature in mathematics and physics. We
do not consider matrices only with scalars, but we also use matrices with entries
other than scalars, for example, entries with vectors or covectors (linear forms), or
even matrices. The most prominent property of all matrices is that we can add and,
under certain conditions, multiply matrices. The multiplication rule seems at first
sight quite complicated, but it turns out to be a reasonable and practical approach.
In most cases, matrices are used as a representation of linear maps. In addition, the
multiplication rule is justified by the composition of linear maps.

5.1 Matrix Multiplication and Linear Maps

One of the most important features of matrices is the ability to add and, under
certain conditions, to multiply them. This makes matrices look like numbers,
perhaps something like super numbers. This possibility opens up when you consider
matrices as linear maps. More precisely, it turns out that the composition of maps
induces the product of matrices:
We consider the linear maps . f : U → V and .g : V → W .
Let . X = (u 1 , . . . , u n ) ≡ (u r )n be a basis of .U, Y = (v1 , . . . , v p ) ≡ (vμ ) p a basis
of .V , and . Z = (w1 , . . . , wm ) ≡ (wi )m a basis of .W . For the indices we choose
.r ∈ I (n), μ ∈ I ( p), and.i ∈ I (m). The composition of. f and.g is given by.h = g ◦ f
with .h : U → W .


Suppose now that we do not know anything about matrix multiplication. We want
to define the matrix multiplication to be compatible with the composition of linear
maps, that is, obtain a homomorphism between linear maps and matrices.
The values of . f, g, h at basis vectors are given:

. f (u r ) = vμ ϕrμ , g(vμ ) = wi ψμi , h(u r ) = wi χri , (5.1)

for ϕrμ , ψμi , χri ∈ K.


.

The composition .h = g ◦ f leads to


. h(u r ) = (g ◦ f )(u r ) = g( f (u r )) = g(vμ ϕrμ ) = g(vμ )ϕrμ = wi ψμi ϕrμ and
h(u r ) = wi χri = wi ψμi ϕrμ . (5.2)

So, comparing (5.1) with (5.2), we obtain

. ψμi ϕrμ = χri (5.3)

which is the standard matrix multiplication in tensor notation. The corresponding


matrices are given by

. F = (ϕrμ ) G = (ψμi ) H = (χri )

and we have .G F = H . More precisely, we may write

. F ≡ fY X G ≡ gZ Y H = hZX.

Thus we see (Eqs. 5.1, 5.2, 5.3) the homomorphism between linear maps and matrices
underlining the role of bases:
.h Z X = g Z Y f Y X . (5.4)
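The homomorphism (5.4) between the composition of linear maps and matrix multiplication can be checked numerically. In the following sketch (ours), we take .U = Rn , V = R p , W = Rm with the standard bases, so that the maps are given directly by their matrices; the composition applied to a vector then agrees with the product matrix applied to the same vector.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, m = 4, 3, 2
    F = rng.normal(size=(p, n))     # represents f : U -> V  w.r.t. bases X, Y
    G = rng.normal(size=(m, p))     # represents g : V -> W  w.r.t. bases Y, Z

    f = lambda u: F @ u
    g = lambda v: G @ v
    h = lambda u: g(f(u))           # h = g o f

    H = G @ F                       # Eqs. (5.3)/(5.4): h_ZX = g_ZY f_YX
    u = rng.normal(size=n)
    print(np.allclose(h(u), H @ u)) # True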

Remark 5.1 Products of linear maps and matrix multiplications have the same
algebraic properties, so that we have

.associativity : l ◦ (g ◦ f ) = (l ◦ g) ◦ f , L(G F) = (LG)F,


distributivity : g ◦ ( f 1 + f 2 ) = g ◦ f 1 + g ◦ f 2 , G(F1 + F2 ) = G F1 + G F2 .

To make the analysis of the above results in algebraic terminology easier, we change
the notation of matrices to

. A := G B := F C := H, (5.5)

so we have
. A = (αiμ ) B = (βrμ ) C := (γri ). (5.6)

We demonstrate various ways to present a matrix, taking as example the matrix . A
in Eqs. (5.6) and (5.7), . A ∈ Rm× p . So we write

. A = (aμ ) p = [a1 · · · a p ] : a row (1 × p-matrix) with columns (vectors) as entries,
. A = (αi )m = (α1 , . . . , αm )T : a column (m × 1-matrix) with rows (covectors) as entries.

Then for the columns of . A, we have .aμ ∈ Rm and for the rows (covectors) of . A, we
have .αi ∈ (R p )∗ . We have similar expressions for . B and .C in Eq. (5.6) and in Eqs.
(5.8) and (5.9) below. In Eqs. (5.7), (5.8) and (5.9), we also see the block matrix form
of . A, B and .C, with columns or rows as blocks. The matrix multiplication is given by
the following map, written as juxtaposition:

. Km× p × K p×n −→ Km×n ,
(A, B) |−→ C := AB.

This also leads to various aspects of matrix multiplication: Summarizing and using
an obvious notation, we write

. A = (αiμ ) = (aμ ) p = (αi )m , (5.7)


. B= (βrμ ) = (br )n = (β μ ) p , (5.8)
C=
. (γri ) = (cr )n = (γ i )m . (5.9)

For the various components of the product matrix .C, we have, using Eqs. (5.7), 5.8,
and (5.9), the following very compact and transparent expressions:

. αiμ βrμ = γri , (5.10)
. aμ βrμ = cr ,     Abr = cr , (5.11)
. αiμ β μ = γ i ,     αi B = γ i , (5.12)
. aμ β μ = C. (5.13)

Equation (5.10) is the standard form of multiplication. This is actually the well-
known low level multiplication.

Equation (5.11) means that the linear combination of the columns of the first
matrix . A with the coefficients of the .r th column of the second matrix . B gives the
.r th column of the product matrix .C or it means simply that all columns of .C = AB
are linear combinations of the columns of . A. In Eq. (5.11) we see explicitly that the
action of the matrix . A on the column .br gives the column .cr of the product.
Equation (5.12) is analog to rows: the linear combination of the rows of the second
matrix . B with the .ith row coefficients of the first matrix . A gives the .ith row of the
matrix .C. Equivalently, the right action of the matrix . B on the .ith row of the matrix
. A, gives the .ith row of the product.
In Eq. (5.13), for fixed .μ, the product .αμ β μ is the matrix product between an
.m × 1 and a .1 × n-matrix.
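The four readings (5.10)–(5.13) of the product .C = AB can be verified directly with NumPy (our sketch): the entrywise sums, the column picture, the row picture, and the sum of column-times-row (outer) products.

    import numpy as np

    rng = np.random.default_rng(2)
    m, p, n = 3, 4, 5
    A = rng.normal(size=(m, p))
    B = rng.normal(size=(p, n))
    C = A @ B

    # (5.10) entrywise:          C[i, r] = sum_mu A[i, mu] B[mu, r]
    print(np.allclose(C, np.einsum('ip,pr->ir', A, B)))
    # (5.11) column picture:     r-th column of C is A times r-th column of B
    print(np.allclose(C[:, 0], A @ B[:, 0]))
    # (5.12) row picture:        i-th row of C is i-th row of A times B
    print(np.allclose(C[0, :], A[0, :] @ B))
    # (5.13) outer-product sum:  C = sum_mu (mu-th column of A)(mu-th row of B)
    print(np.allclose(C, sum(np.outer(A[:, mu], B[mu, :]) for mu in range(p))))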
The identification between linear maps and matrices provides the notion of rank
for matrices as well. If we denote by . f A the linear map related to the matrix . A, then
we can also define a rank for matrices:

Definition 5.1 Rank of a matrix.


The rank of an .m × n-matrix . A is given by the rank of the linear map:

. f A : Kn −→ Km
ξ→ |−→ f A (ξ→ ) := A ξ→ = as ξ s ∈ Km .

So we have

. rank(A) := rank( f A ) := dim(im( f A )) ≡ dim(im A).

This definition corresponds to the column rank (.c rank(A) ≡ rank(A)), as defined in
the next section, Sect. 5.2.
Linear maps not only provide the notion of rank for matrices, they also provide
the corresponding estimate for the rank of the product of two matrices. In an obvious
notation, we define the following vector space and linear maps:
Let

. V := Kn , V ' := K p , V '' := Km ,
f := f B , g := f A , h := f C := g ◦ f,
ḡ := g| im f

and suppose we have maps:

. V −f→ V ' −g→ V '' ,
V −f→ im f −ḡ→ im ḡ.

This leads to the following proposition.

Proposition 5.1 Rank inequalities for the composition .g ◦ f .

For the vector spaces .V, V ' , V '' and the linear maps . f : V → V ' and .g :
V ' → V '' , the following inequalities hold:

. rank( f ) + rank(g) − dim V ' ≤ rank(g ◦ f ) ≤ min{rank f, rank(g)}.

Proof Define .ḡ := g|im f . We consider the image of .ḡ. Then we have:

. im ḡ = im(g ◦ f ) (5.14)

and the following inequalities:


. im ḡ ⊆ im g (5.15)

and
. dim(im ḡ) ≤ dim(im f ) and rank(ḡ) ≤ rank( f ). (5.16)

Equations (5.15) and (5.16) lead to

. rank(g ◦ f ) ≤ min{rank( f ), rank(g)}. (5.17)

To get the other inequality, consider the kernel of .ḡ. Then

. ker ḡ = ker g ∩ im f. (5.18)

The rank-nullity theorem gives

. dim(ker g) = dim V ' − rank(g) (5.19)

and the following inequalities:

. ker ḡ ≤ ker g and dim(ker ḡ) ≤ dim(ker g). (5.20)

Using the rank-nullity theorem and Eq. (5.16), this leads for the .rank(g ◦ f ) to

. rank(g ◦ f ) = rank(ḡ) = rank( f ) − dim(ker ḡ) (5.21)

and to
. rank(g ◦ f ) ≥ rank( f ) − dim(ker g). (5.22)

Inserting Eq. (5.18), we obtain

. rank(g ◦ f ) ≥ rank( f ) − {dim V ' − rank(g)} (5.23)

and
. rank(g ◦ f ) ≥ rank( f ) + rank(g) − dim V ' . (5.24)

The inequalities of Eqs. (5.17) and (5.24) give

. rank( f ) + rank(g) − dim V ' ≤ rank(g ◦ f ) ≤ min{rank( f ), rank(g)}.

This corresponds for matrices, using the above notation, to the corollary:

Corollary 5.1 Rank inequalities.

For matrices . A ∈ Km× p and . B ∈ K p×n , and . AB ∈ Km×n , it holds that

. rank(A) + rank(B) − p ≤ rank(AB) ≤ min{rank(A), rank(B)}.
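Corollary 5.1 is easy to probe numerically (our sketch): for generic random matrices the product typically attains the upper bound, while a small rank-deficient example shows that the lower bound can also be sharp.

    import numpy as np
    from numpy.linalg import matrix_rank as rank

    rng = np.random.default_rng(3)
    m, p, n = 5, 4, 6
    A = rng.normal(size=(m, p))
    B = rng.normal(size=(p, n))

    lower = rank(A) + rank(B) - p
    upper = min(rank(A), rank(B))
    print(lower <= rank(A @ B) <= upper)     # True

    # A rank-deficient example: rank(A1) = rank(B1) = 1 but rank(A1 B1) drops to 0
    A1 = np.outer([1.0, 0.0], [1.0, 0.0])    # 2x2, rank 1
    B1 = np.outer([0.0, 1.0], [0.0, 1.0])    # 2x2, rank 1
    print(rank(A1 @ B1))                     # 0 = 1 + 1 - 2: lower bound attained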

Comment 5.1 Rank inequality from matrix multiplication.

A direct application of the matrix multiplication, as in Eq. (5.11), leads also


to the inequality:
. rank AB ≤ rank A. (5.25)

Since we have
. cr = aμ βrμ , (5.26)

every column of. AB is a linear combination of the columns of. A. This means that

. span AB ≤ span A

and
.c rank AB ≤ c rank A. (5.27)

which is by definition .rank AB ≤ rank A. ∎


See Definitions 5.3 and 5.4 in Sect. 5.2 below.

5.2 The Rank of a Matrix Revisited

Perhaps the most important parameter of a matrix is its rank. Recall that
an .m × n-matrix . A = (αis ) is just a
rectangular array with scalar (number) entries. We may think that . A is a list of .n
vectors (columns),

. A = (a1 , . . . , as , . . . , an ), as ∈ Rm , s ∈ I (n),

or that . A is a list of .m rows

. (α1 , . . . , αi , . . . , αm ), αi ∈ (Rn )∗ , i ∈ I (m),

which we write vertically. We may call it a colist; this was already discussed in
Sect. 3.1:

. B = (α1 , . . . , αi , . . . , αm )T .

If we want to stress that by . A we mean the matrix face of . A (the matrix . A, A ≡ [A])
or analogously for . B, we write
. A = [A] = [a1 · · · as · · · an ]   or   [B] = (α1 , . . . , αi , . . . , αm )T .

Usually, we identify . A with .[A] and .[B]. But the list . A is different from the colist . B,
.(A ≠ B). The elements of the list . A belong to .Rm whereas the elements of the colist
. B (the list .α1 , . . . , αm ) belong to .(Rn )∗ .

In what follows, we do not initially consider the matrix . A as a map. We stay at a


very elementary level and use only the terms linearly independent and .span. These
were introduced in Sect. 3.1.
We are only interested in the .rank of the above two lists . A and . B. More precisely,
we only want to compare the .rank of the above different lists. The .rank of a list
is given by the dimension of the spanned subspace or, equivalently, by the size of
a maximal linearly independent sublist. Therefore, we consider the two relevant
subspaces which are related to the matrix . A:

Definition 5.2 Column space and row space of the matrix . A.


The column space .C(A) := span(a1 , . . . , an ) ≤ Rm ;
The row space . R(A) := span(α1 , . . . , αm ) ≤ (Rn )∗ .

We have . R(A) ∼= C(AT ) ≤ Rn . Note that . AT is the column face of the rows of . A. This
leads to the following definition:

Definition 5.3 Row and column rank.


The row rank of the matrix . A is given by .r rank(A) := dim C(AT ).
The column rank of the matrix . A is given by .c rank(A) := dim C(A).

It is furthermore clear that if we set .t := r rank(A) and .c := c rank(A), then .t is the maximal
number of linearly independent rows and .c the maximal number of linearly independent columns.
We are now able to formulate a main theorem of elementary matrix theory.

Theorem 5.1 Row and column rank, the first fundamental theorem of linear
algebra.
The row rank of a matrix is equal to the column rank: .r rank(A) =
c rank(A), so that .t = c.

Proof The goal is to reduce . A to a specific .t × r -matrix . Ã, with .t linearly
independent rows and .r linearly independent columns, if possible. Without loss of
generality, we choose the first .t rows to be linearly independent and we call the
remaining, the linearly dependent rows, superfluous. Similarly, we choose the first .r
columns to be linearly independent and we call the rest (that is, the linearly dependent
columns) superfluous. For the rows, in order to express the linear dependence, we split
the index.i in. j and.μ. Taking.σ ∈ I (n), i ∈ I (m) , j ∈ I (t) and.μ ∈ {t + 1, . . . , m}
with .ρμj ∈ K, we have:

. αμ = ρμj α j and ασμ = ρμj ασj . (5.28)

For the columns, we consider their column rank. According to our
choice above, we rearrange and write the .r linearly independent columns
first in the list, and we have

. c rank(A) = c rank(a1 , . . . , ar , ar +1 , . . . , an ) = c rank(a1 , . . . , ar ). (5.29)
We may also write the column .aσ in block form as .aσ = (ασ( j) , ασμ )T . The linear independence of the
columns .(a1 , . . . , as , . . . , ar ), .(s ∈ I (r ), and .i, j, μ as above), means that: If

. as λs = 0 or, equivalently,
αis λs = 0 for all i ∈ I (m), (5.30)
(that is, αsj λs = 0 and αsμ λs = 0),

then it follows that


.λs = 0 for all s ∈ I (r ). (5.31)

Now we throw out the .m − t superfluous rows and are thus left with the
shortened columns, which we denote by .ās , forming the shortened matrix or list . Ā =
(ā1 , . . . , ās , . . . , ān ).
The point is that the row operation above does not affect the column rank and we
get the equality:
.c rank(ā1 , . . . , ār ) = c rank(a1 , . . . , ar ). (5.32)

This is equivalent to the statement that the shortened column list .(ā1 , . . . , ār )
is linearly independent, just like the given list .(a1 , . . . , ar ). This means that we have to
show the assertion:
If
. ās λs = 0 (i.e. αsj λs = 0 ∀ j ∈ I (t)), (5.33)

it follows that
. λs = 0 ∀ s ∈ I (r ). (5.34)

Given that the equation

. as λs = 0 (see Eq. 5.30) (5.35)

leads to the equation

. λs = 0 (see Eq. 5.31), (5.36)

it is sufficient to show that the equation

. ās λs = 0 (see Eq. 5.33) (5.37)

leads to the equation

. as λs = 0. (5.38)

This can be shown as follows: The equations .as λs = 0 or .αis λs = 0 contain the
equations .ās λs = 0 or .αsj λs = 0, . j ∈ I (t), i ∈ I (m), t < m. So what is left is to
check the equations

.αsμ λs = 0, μ ∈ {t + 1, . . . , m}. (5.39)

The next calculation shows that Eq. (5.39) is indeed valid.


From Eq. (5.28), we have .αsμ = ρμj αsj . Using this in Eq. (5.39), together with .αsj λs = 0,
we obtain

. αsμ λs = ρμj αsj λs = ρμj (αsj λs ) = 0. (5.40)

So it is proven that .c rank( Ā) = r and .c rank( Ā) = c rank(A).


This means that by row operations, the column rank stays invariant. The row rank
stays in any case invariant by construction.
The result is that . Ā is a .t × n-matrix with .c rank( Ā) = r and .r rank( Ā) = t.
Using now the same procedure as before but interchanging the role of rows and
columns, we can get rid of the superfluous columns to create a new matrix with the
same row rank as originally.
This new matrix is now the .t × r -matrix . Ā̄. In this matrix, both the rows and the
columns are by construction linearly independent.
This is only possible if the equation .t = r holds, and so we have proved the
equation .r rank(A) = c rank(A). ∎
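Theorem 5.1 can be illustrated numerically (our sketch): for any matrix, the dimension of the column space equals the dimension of the row space, that is, .rank(A) = rank(AT ), even for a very non-square, rank-deficient matrix.

    import numpy as np

    rng = np.random.default_rng(4)
    # Build a 7 x 4 matrix of rank 2 by multiplying a 7 x 2 and a 2 x 4 factor
    A = rng.normal(size=(7, 2)) @ rng.normal(size=(2, 4))

    c_rank = np.linalg.matrix_rank(A)      # dimension of the column space C(A)
    r_rank = np.linalg.matrix_rank(A.T)    # dimension of the row space R(A) = C(A^T)
    print(c_rank, r_rank)                  # 2 2
    print(c_rank == r_rank)                # True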

The assertion of this theorem is highly nontrivial.


With this result, we can give the usual definition of the rank of a matrix.

Definition 5.4 Rank of a matrix.


rank(A) := c rank(A).
.

We may think that the number .rank(A) is the quintessence of the matrix . A and
indicates the “true” size of . A.
The following sections will justify this point of view.

5.3 A Matrix as a Linear Map

In this section, we will discuss the geometric properties of an .m × n-matrix . F con-
sidered as a linear map, and the role of the rank. We will then compare this with the general
situation of two abstract vector spaces .V and .V ' . For the sake of
simplicity, we consider .R vector spaces. We will take into consideration that in .Rn (and .Rm ), we
have the canonical Euclidean structure given by the dot product and the canonical
basis . E = (e1 , . . . , en ).
The scalar product allows an enjoyable result: a unique decomposition where
.coim f , a subspace of .Rn (coim f ≤ Rn ), is the (orthogonal) complement of .ker f ,
and .coker f , a subspace of .Rm (coker f ≤ Rm ), is the (orthogonal) complement of .im f ,
given by the matrix . F acting as a linear map . f . We identify . f with . F and, slightly
misusing the notation (see Comment 5.2 below), we have:

. Rn = ker f ⊕ coim f −f→ im f ⊕ coker f = Rm with
coim f ∼= im f or, equivalently,
dim(coim f ) = dim(im f ) = rank f.

So we have, more precisely:

. Rn = ker f ⊕ coim f ,   Rm = coker f ⊕ im f ,   coim f ∼= im f ,   0→ ' ∈ im f.

But .coker f ≇ ker f in general or, equivalently, .dim(coker f ) ≠ dim(ker f ), since we have .n ≠
m in general.
In an abstract vector space there is no such decomposition. In order to obtain a
similar behavior, we have to consider, as we will see in Sect. 6.2, the dual spaces .V ∗
and .V ' ∗ , or to introduce explicitly a scalar product in .V and .V ' (see below and in Sect.
6.3). Here, we have .V = Rn and .V ' = Rm . The linear map . f is given in our smart
notation with .r, s ∈ I (n) and .i ∈ I (m) by the matrix . F = (ϕis ) = ( f s )n = (φi )m :
ϕis ∈ R, fr ∈ Rm , φi ∈ (Rn )∗ :

. f : Rn −→ Rm
x |−→ f (x) := F x.

F x is the matrix multiplication. If . E := (e1 , . . . , en ) and . E ' := (e1' , . . . , em' ) are the
.
canonical bases in .Rn and .Rm , we may also think of the matrix . F as a representation
of the map . f with respect to the bases . E and . E ' and in our notation . F = f E ' E ∈
Rm×n . In components (coefficients, coordinates), the map . f is given by the following
expressions:

. Setting y = f (x) with x = es ξ s ≡ (ξ s )n ≡ ξ→


and y = ei' η i ≡ (η i )m ≡ η→, s ∈ I (n), i ∈ I (m) and ξ s , η i ∈ R.

. f (x) is given by .η i = ϕis ξ s or equivalently by . f (es ) = ei' ϕis , ei' ∈ Rm . This is


apparent in the following elementary steps:

. y = ei' η i = f (x) = f (es ξ s ) = f (es )ξ s = ei' ϕis ξ s = ei' (ϕis ξ s ) ⇒ η i = ϕis ξ s .



Remark 5.2 . F as a row-matrix with columns as entries.

We have . f (es ) = f s and of course . F = [ f 1 . . . f n ], a .1 × n-matrix with
columns as entries.

Comment 5.2 The role of a rank (.rank f = r ).

To reveal the role of .rank f , we would like to point out that given the map
. f ∈ Hom(V, V ' ), with .dim V = n and .dim V ' = m, the subspaces .ker f in .V
and .im f in .V ' are always uniquely defined. If we want to consider two abstract
vector spaces .V and .V ' , that is, without a scalar product and without using any
specific bases, this leads to a nonunique decomposition of .V and .V ' . So we may
have for example:

. V ∼= ker f ⊕ U1 −f→ im f ⊕ Ω1 ∼= V ' and
V ∼= ker f ⊕ U2 −f→ im f ⊕ Ω2 ∼= V ' with
U1 ∼= U2 ∼= im f or, equivalently,
dim U1 = dim U2 = dim(im f ) = rank f = r.

In general, we have .n ≠ m and

. Ω1 ∼= Ω2 ≇ ker f or, equivalently,
m − r = dim Ω1 = dim Ω2 ≠ dim(ker f ) = n − r.

We see this directly if we take a tailor-made basis for . f as in Proposition 3.9.


It is usual to call the equivalence class .U1 ∼= U2 ∼= . . . =: coim f and the iso-
morphism class .Ω1 ∼= Ω2 ∼= . . . =: coker f . It is clear that .coim f and .coker f are
uniquely defined, since .coim f is the quotient with respect to the above equivalence relation.
The same holds for .coker f . So

. V ∼= ker f ⊕ coim f −f→ im f ⊕ coker f ∼= V ' .

Remark 5.3 The role of the transpose.

In the present case with .(Rn , E) and .(Rm , E ' ), with . E and . E ' the cor-
responding canonical (standard) basis in .Rn and .Rm , the situation is now

significantly better than in Comment 5.2. We have the transpose matrix


.F T ∈ Rn×m and concretely this also leads to a unique decomposition. In our
case, we have .im f ≡ C(F) := span( f 1 , . . . , f n ) ≤ Rm and .ker f ≡
N (F) := {x ∈ Rn : F x = 0} ≤ Rn with .C(F) the column space and . N (F) the
null space of . F.

. F T is closely connected with the dot products in .Rn and .Rm . This gives
. g : Rm −→ Rn ,
w |−→ g(w) := F T w.

So we get
. Rn −→ Rm with the map f ≡ F,
. Rn ←− Rm with the map g ≡ F T .

Now we have .im g = C(F T ). This is the row space of . F in form of columns and it
is a subspace of .Rn . The null space of . F T , . N (F T ) = ker g, is a subspace of .Rm . As
we already showed in Theorem 5.1, .dim C(F T ) = dim C(F) = rank(F) = r . So we
also obtain
. dim(im f ) = dim(im g) = rank f = r. (5.41)

In addition, we can show that .im g = C(F T ) and .ker f = N (F) are not only
complementary but also orthogonal:
Given .z ∈ N (F) and .v ∈ C(F T ), we have, with some .w ∈ Rm :
. F z = 0 and . F T w = v. The transpose of the last equation is given by

. v T = w T F. (5.42)

Using Eq. (5.42) and . F z = 0 in .v T z, we obtain .v T z = w T F z = 0.


The equation .v T z = 0 indicates the orthogonality .v ⊥ z. So now the subspaces
. N (F) and .C(F T ) in .Rn are orthogonal and we may write

. N (F) ⊥ C(F T ) (5.43)

or
. ker f ⊥ im g. (5.44)

Similarly, we obtain
.C(F) ⊥ N (F T ) (5.45)

for .C(F) and . N (F T ) ≤ Rm .



It is clear that now .coim f = C(F T ) and .coker f = N (F T ) are uniquely deter-
mined. It is interesting that in this case we obtain with .Rn and .Rm not only uniquely
the decomposition

. Rn = ker f ⊕ coim f −f→ im f ⊕ coker f = Rm , (5.46)

but even more: the unique orthogonal decomposition denoted by the symbol .Θ:

. Rn = ker f Θ coim f −f→ im f Θ coker f. (5.47)

Since we have . f ≡ F, we can also write this equivalently as

. Rn = ker F Θ im F T −F→ im F Θ ker F T = Rm . (5.48)

So the following theorem was proven.

Theorem 5.2 The fundamental theorem of linear maps.


Given a matrix . F ∈ Hom(Rn , Rm ), the following orthogonal decomposition
holds:
. Rn = ker F Θ im F T → im F Θ ker F T = Rm .

For good reasons, this theorem may be considered as the second fundamental theorem
of elementary linear algebra. It states more precisely the situation at the beginning
of this section, written as:

. Rn = ker F ⊕ coim F → im F ⊕ coker F = Rm .

The result of the above theorem may be represented symbolically by Fig. 5.1.
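Theorem 5.2 can be made concrete with a small numerical experiment (our addition): for a rank-deficient . F, orthonormal bases of .im F, .ker F T , .im F T and .ker F can be read off the singular value decomposition, and the orthogonality relations (5.43) and (5.45) can be checked directly.

    import numpy as np

    rng = np.random.default_rng(5)
    n, m, r = 5, 4, 2
    F = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))   # an m x n matrix of rank r

    U, s, Vt = np.linalg.svd(F)
    rank = int(np.sum(s > 1e-10))

    im_F   = U[:, :rank]        # orthonormal basis of im F     (subspace of R^m)
    ker_Ft = U[:, rank:]        # orthonormal basis of ker F^T  (subspace of R^m)
    im_Ft  = Vt[:rank, :].T     # orthonormal basis of im F^T   (subspace of R^n)
    ker_F  = Vt[rank:, :].T     # orthonormal basis of ker F    (subspace of R^n)

    print(np.allclose(F @ ker_F, 0))        # ker F really is the null space
    print(np.allclose(ker_F.T @ im_Ft, 0))  # ker F orthogonal to im F^T in R^n, Eq. (5.43)
    print(np.allclose(im_F.T @ ker_Ft, 0))  # im F orthogonal to ker F^T in R^m, Eq. (5.45)
    print(rank + ker_F.shape[1] == n)       # dim(im F^T) + dim(ker F) = n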

5.4 Vector Spaces and Matrix Representations

If we want to perform concrete calculations with abstract vectors, we have to use


matrices. For vectors, it is standard to use columns and, for maps, rectangular matri-
ces. In physics, we often start with concrete calculations, that is to say, we use
matrices of the appropriate type or size from the beginning. Nevertheless, it is often
useful and necessary to also take an overview of the abstract situation. So firstly, we
are going to present an abstract vector space with columns, and the presentation of
linear maps will be treated in the next section. Although we already covered most of
the necessary background knowledge concerning vector spaces, especially in Sect.
3.2, we are now going to review and summarize briefly some of the results, as this
will be a useful preparation for the presentation of linear maps too. Section 3.2 gives
an in-depth treatment, but what follows here is self-contained.

Fig. 5.1 The fundamental theorem of linear maps symbolically
We start with an abstract vector space .V with .dim(V ) = n, and, as we already
know, we need a basis . B = (b1 , . . . , bs , . . . , bn ) in order to construct the appropriate
representation. This provides the following basis isomorphism:

.φ B :V −→ Kn ,
bs |−→ φ B (bs ) := es s ∈ I (n), (5.49)
or v |−→ φ B (v) := (v 1 , . . . , v n )T = v B = v→B . (5.50)

The elements of the list .(v 1 , . . . , v s , . . . , v n ), v s ∈ K are called coordinates, coeffi-


cients, or components. With . E = (e1 , . . . , es , . . . , en ), we denote the canonical basis
in.Kn . In Eq. (5.50), we also use the notation.v→B instead of.v B when we want to empha-
size that .v B belongs to .Kn . The index . B has to be used if there is also another basis
involved. In Eq. (5.49), the basis isomorphism .φ B is given by the transformation
from the basis list . B to the basis list . E (see Proposition 3.8):

(φ B (b1 ), . . . , φ B (bs ) . . . , φ B (bn )) := (e1 , . . . , es , . . . , en )


.

or
φ B (B) := E.
.

Comment 5.3 Comparison of the use of bases with the theory of manifolds.

In Sect. 3.2 we used exclusively the isomorphism.ψ B with.ψ −1 B = φ B , in linear


algebra both are called basis isomorphisms. In the theory of manifolds, we use
the notations .(V, φ B ), a (global) chart and .(V, ψ B ), a (global) parametrization.
We can now see the four faces of a basis:

– .B = (b1 , . . . , bn ), a list,
– .[B] = [b1 · · · bn ], a matrix,
– .φ B , a chart,

– .ψ B , a parametrization,
and we write .v = ψ B (→v ) = [B] v→ = [b1 · · · bn ] (v 1 , . . . , v n )T .
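A small numerical illustration of the chart/parametrization picture (our sketch, for .V = R3 with a non-standard basis . B): the parametrization .ψ B sends a coefficient column .v→ to .[B]→v , and the chart .φ B recovers the coefficients by solving the linear system .[B]→v = v.

    import numpy as np

    B = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])        # columns b_1, b_2, b_3: a basis of R^3

    def psi_B(coeffs):                      # parametrization: coefficients -> vector
        return B @ coeffs

    def phi_B(v):                           # chart: vector -> coefficients w.r.t. B
        return np.linalg.solve(B, v)

    v_coeff = np.array([2.0, -1.0, 3.0])    # the coordinate column v_B
    v = psi_B(v_coeff)
    print(np.allclose(phi_B(v), v_coeff))   # True: phi_B inverts psi_B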

We might now ask ourselves what the main purpose of a basis in practice is. With
a given basis, we can replace an abstract vector space by a concrete vector space,
and since this consists of number lists, we can calculate not only with vectors, but
we can also send these vectors elsewhere, for example, from Nicosia to Berlin. This
is possible if the observers at different positions previously agreed on the basis to be
used. So we can think that we gained a lot. But what is the price for this gain?
We lose uniqueness: any other basis yields quite different values, coordinates for
.v. So we actually need to know how to go from a basis . B to any other basis .C.

In Sect. 3.2, we learned that the best way to think about the abstract vector .v is to
present .v with all its representations simultaneously. This means to consider all the
bases, . B, C, D, ..., at the same time. But concretely, it is actually enough to use only
one more basis, for example .C, and to determine the transition from . B to .C. This
means, we use the set of all bases . B(V ) (think of relativity!). Thus, we can think that
we can present this .v with all its representations simultaneously and we know that
we can reach and use all the bases in .V , as shown and discussed in Sect. 3.2. We can
describe this, using the bases . B and .C, in the commutative diagram in Fig. 5.2:
We choose a new basis .C = (c1 , . . . , cn ) and the corresponding basis isomorphism
.φC . The transition map is given by . TC B = φC ◦ φ B−1 , which we identify immediately
with the matrix .T := TC B = (τsμ ), τsμ ∈ K. In what follows, we take .μ, ν, s, r ∈ I (n).
The matrix .T is invertible (so that .T ∈ Gl(n)), so we have .T T −1 = 1n and .T −1 T =
1n , with .T −1 = (τ̄μs ), or equivalently

. τμr τ̄sμ = δsr and
τ̄sμ τνs = δνμ .



Fig. 5.2 Vector space and representations . B and .C

Since, with our smart indices, we distinguish clearly .μ, ν, from .s, r , we may write

. τ̄μs ≡ τμs .

In this sense, .τ̄μs is a pleonasm and is used only if we want to emphasize that .τμs
belongs to .T −1 . It is unfortunate that the notation can be confused with the transpose
T −1
. T , and one should notice that generally, . T /= T T .
Now, we can describe the change of basis also by this definition:

Definition 5.5 Change of basis.


For a given basis . B = (b1 , . . . , bs , . . . , bn ) and .C = (c1 , . . . , cμ , . . . , cn )
in .V , we write .bs = cμ τsμ or, equivalently, .[B] = [C]T :

. [b1 · · · bn ] = [c1 · · · cn ]
⎡ τ11 . . . τn1 ⎤
⎢      ·       ⎥
⎣ τ1n . . . τnn ⎦ .

The following lemma is commonly used in physics.

Lemma 5.1 Change of basis.


The assertions (i) and (ii) are equivalent.
μ
(i) .bs = cμ τs ,
μ μ
(ii) .vC = τs v sB ,
μ
with .vC , v sB ∈ K or in matrix form .v→C = T v→B .

Proof (i) .⇒ (ii)


Given .v = bs v sB and .v = cμ vCμ , we have

. v = bs v sB = cμ τsμ v sB = cμ vCμ
=⇒ vCμ = τsμ v sB ,

which proves (ii). ∎

Proof (ii) .⇒ (i)


Given
. v = cμ vCμ = cμ τsμ v sB (by (ii))
and v = bs v sB for all v→B ∈ Kn ,

we obtain
. bs = cμ τsμ ,

which proves (i). ∎
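Lemma 5.1 can be verified numerically (our sketch, in .R2 ): choose two bases . B and .C, compute .T from .[B] = [C]T , i.e. .T = [C]−1 [B], and check that the coordinate columns transform as .v→C = T v→B .

    import numpy as np

    Bmat = np.array([[1.0, 1.0],
                     [0.0, 1.0]])          # columns: basis B of R^2
    Cmat = np.array([[2.0, 0.0],
                     [1.0, 1.0]])          # columns: basis C of R^2

    T = np.linalg.solve(Cmat, Bmat)        # [B] = [C] T  =>  T = [C]^{-1} [B]

    v = np.array([3.0, -2.0])              # a vector of R^2
    v_B = np.linalg.solve(Bmat, v)         # coordinates of v w.r.t. B
    v_C = np.linalg.solve(Cmat, v)         # coordinates of v w.r.t. C

    print(np.allclose(v_C, T @ v_B))       # True: v_C = T v_B, Lemma 5.1 (ii)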

Comment 5.4 Right and left action.

It is interesting to realize that the matrix .T acts on the basis from the right
and on the coefficient vectors from the left:

. [B] = [C]T ⇔ [C] = [B]T −1 and
v→C = T v→B ⇔ v→B = T −1 v→C .

This is also an example for the discussion in Sect. 1.3, particularly for Remark 1.3.

Comment 5.5 .TC B = idC B .

If we want we can use an equivalent diagram with a different notation from


the one given in Fig. 5.2, for the same situation.

             idV
        V −−−−−→ V
     φB ↓         ↓ φC
       RnB −−−−−→ RnC
            idC B

and we have
idC B ◦ φ B = φC ◦ idV .
.

The meaning of .TC B is .TC B ≡ idC B . This is what we call in physics a pas-
sive symmetry transformation. It corresponds to a coordinate change while the
physical system remains fixed.

In the theory of manifolds as well as in linear algebra, we have a cycle relation:

Lemma 5.2 Composition of change of basis.


If we have.vC = TC B v B and.v D = TDC vC , . B, C, D ∈ B(V ), the cycle relation
. TD B = TDC TC B , holds.

This is explained in the following commutative diagram (Fig. 5.3).

Remark 5.4 The set of all representations of .V .

It is clear that the set of all representations of .V is given by the set of bases
in .V, B(V ) = {B}. If we consider the set of all bases in .V , every basis . B
leads to a basis isomorphism .φ B or to a linear chart .(V, φ B ). It is also called
a representation of the vector space .V by .n × 1-matrices or by columns of
length .n, or by the coefficient vectors in .Kn .

As discussed in Eqs. (5.49) and (5.50) at the beginning of this section, if we want,
we can also use the same notation for all these representations with the letter . M (like
matrix) and so get for every . B ∈ B(V ) the isomorphism

. M : V −→ Kn ≡ Kn×1 ,
v |−→ M(v) = v→.

Fig. 5.3 Composition of change of basis: .TD B = TDC ◦ TC B

In this sense, we here use a “universal” notation. In the next section, we shall see that
we may use the same universal notation for the representation of linear maps too.

5.5 Linear Maps and Matrix Representations

In this section, we use a similar notation and apply the results of the previous sections.
For the representation of linear maps, we return now to the general case where
'
. V and . V are vector spaces without further structure. In order to describe a given
linear map
' '
. f : V −→ V , f ∈ Hom(V, V ),

we have to choose a basis . B = (b1 , . . . , bn ) and . B ' = (b1' , . . . , bm' ) corresponding


to .V and .V ' . The connection with the matrix . F ≡ f B ' B ∈ Km×n which represents the
map is shown in the following commutative diagram:

              f
        V −−−−−→ V '
     φB ↓         ↓ φ B '
       RnB −−−−−→ RmB '
              F

So we have .φ B (bs ) = es and .φ'B (bi' ) = ei' with .s ∈ I (n) and .i ∈ I (m). The map
. f is, according to Proposition 3.8 in Sect. 3.3, determined uniquely by its values on
the basis . B:
. f (bs ) = bi' ϕis , ϕis ∈ K. (5.51)

This also determines uniquely the matrix

. f B ' B ≡ F := (ϕis ). (5.52)

Equation (5.51) can also be written as follows:

. f (b1 , . . . , bn ) = ( f (b1 ), . . . , f (bn )) (5.53)


.or

f [b1 . . . bn ] = [ f (b1 ) . . . f (bn )],


f [b1 . . . bn ] = [bs' ϕs1 . . . bs' ϕsn ]
and
f [b1 . . . bn ] = [(b1' ) . . . (bm' )]F (5.54)
.or

f [B] = [B ' ]F. (5.55)

Equations (5.54) and (5.55) correspond to the above diagram:

.φ B ' ◦ f = FB ' B ◦ φ B or (5.56)


.φ B ' ◦ f = F ◦ φ B . (5.57)

This also gives

    F = φ_B′ ∘ f ∘ φ_B^{-1} .   (5.58)

As mentioned in the previous Sect. 5.4, (φ_B, V) and (φ_B′, V′) are what is called in the theory of manifolds (local) charts. In linear algebra, these are of course global charts. Equations (5.51) and (5.52), expressed in the form of a diagram as above, give:

               f
    b_s -----------> f(b_s) = b′_i φ^i_s
     |                     |
    φ_B                   φ_B′
     ↓                     ↓
    e_s -----------> F(e_s) = e′_i φ^i_s
               F

So we have, from φ_B(b_s) = e_s and φ_B′(b′_i) = e′_i, using linearity and Eqs. (5.51) and (5.52),

    φ_B′( f(b_s) ) = φ_B′( b′_i φ^i_s ) = φ_B′(b′_i) φ^i_s   and   (5.59)
    φ_B′( b′_i φ^i_s ) = e′_i φ^i_s .   (5.60)

This justifies the equations in the above diagram and the following correspondence:

          b_s   --f-->   b′_i
    B →                         → B′
          e_s   --F-->   e′_i

This shows, stated in simple words, that we can perfectly describe what happens at the “top” level V —f→ V′ by what happens at the “bottom” level R^n —F→ R^m. This is the essence of the representation theory of linear maps. Using here in addition the universal notation with the symbol M, we can write

    v  —F→  w = F(v) ,
    M(v)  —M(f)→  M(w) = M(f) M(v) ,
    or   v⃗_B  ↦  w⃗_B′ = F v⃗_B .   (5.61)

We obtain Eq. (5.62) using again Eqs. (5.51) and (5.52):

    f(b_s v^s_B) = f(b_s) v^s_B = b′_i φ^i_s v^s_B =: b′_i w^i_B′ .   (5.62)

So we have

    w^i_B′ = φ^i_s v^s_B ,   w^i_B′, v^s_B ∈ K ,   (5.63)

which is

    w⃗_B′ = F v⃗_B   (5.64)

with v⃗_B ∈ K^n and w⃗_B′ ∈ K^m. If we consider the various faces of the representation matrix F ∈ K^{m×n}, we can write:
    F = (φ^i_s) = [ f_1 ⋯ f_s ⋯ f_n ] =
        ⎡ φ^1 ⎤
        ⎢  ⋮  ⎥
        ⎢ φ^i ⎥                                    (5.65)
        ⎢  ⋮  ⎥
        ⎣ φ^m ⎦ .

Since in the above diagram we already have the equation F e_s = e′_i φ^i_s, it is obvious that

    F e_s = f_s .   (5.66)

This means that the sth column of the matrix F which represents the map f is the value of F on the canonical basis vector e_s ∈ K^n. In addition, it means that the sth column of F gives the coefficients of the value of f on the sth basis vector b_s of V. These coefficients correspond to the basis B′ in V′, as expected.
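For concreteness, here is a small NumPy sketch of this construction; the linear map f and the two bases are represented by arbitrarily chosen arrays, purely for illustration.

    import numpy as np

    # An arbitrary linear map f : R^3 -> R^2, written in the standard coordinates of R^3.
    f = np.array([[1., 2., 0.],
                  [0., 1., -1.]])

    # Bases B of R^3 and B' of R^2, stored column-wise.
    B      = np.array([[1., 1., 0.],
                       [0., 1., 1.],
                       [0., 0., 1.]])
    Bprime = np.array([[1., 1.],
                       [0., 2.]])

    # phi_B sends a vector to its coefficient column: phi_B(v) = B^{-1} v.
    # Hence F = phi_B' o f o phi_B^{-1} becomes, in matrix form (Eq. (5.58)):
    F = np.linalg.inv(Bprime) @ f @ B

    # The s-th column of F consists of the B'-coefficients of f(b_s):  F e_s = f_s.
    s = 1
    coeffs = np.linalg.solve(Bprime, f @ B[:, s])   # B'-coefficients of f(b_s)
    assert np.allclose(F[:, s], coeffs)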
At this stage, we have to explain how . F changes by transforming the basis . B into
the basis .C in .V and the basis . B ' into the basis .C ' in .V ' . For simplicity’s sake we
call the new matrix . F̄ := f C ' C and so have to consider the transition from . F to . F̄.
This is given by the following proposition.

Proposition 5.2 Change of representation under a change of bases.

Given the bases B and C in V and B′ and C′ in V′, the matrix F_C′C of f is given by F_C′C = T′ F_B′B T^{-1}, where the corresponding transition matrices

    T ≡ T_CB   and   T′ ≡ T_C′B′

are given by

    B = C T   and   B′ = C′ T′ .



Proof The proof is given by the two commutative diagrams:

The second diagram leads to f_C′C ∘ T = T′ ∘ f_B′B and to f_C′C = T′ ∘ f_B′B ∘ T^{-1}.

We can also achieve the above result directly, using only the tensor formalism. Putting the right indices in the right place, we obtain

    r, s ∈ I(n) ,   i, j ∈ I(m) ,   T = (τ^r_s) ,   T′ = (τ′^j_i) ,
    f_B′B ≡ F_B = (φ^i_s) ,   f_C′C ≡ F_C = (η^j_r) .

We start with F_B = (φ^i_s) and we want to obtain F_C = (η^j_r), using T = (τ^s_r) and T′ = (τ′^j_i). We so obtain

    η^j_r = τ′^j_i τ^s_r φ^i_s = τ′^j_i φ^i_s τ^s_r .   (5.67)

This corresponds to

    F_C = T′ F_B T^{-1} .   (5.68)
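A quick numerical consistency check of Eq. (5.68), with all matrices chosen arbitrarily for illustration (NumPy):

    import numpy as np

    rng = np.random.default_rng(0)

    # Representation of f with respect to (B, B'):
    F_B = rng.normal(size=(2, 3))

    # Invertible transition matrices T = T_CB in GL(3) and T' = T_C'B' in GL(2).
    T      = rng.normal(size=(3, 3)) + 3 * np.eye(3)
    Tprime = rng.normal(size=(2, 2)) + 3 * np.eye(2)

    # Representation of the same map f with respect to (C, C'), Eq. (5.68):
    F_C = Tprime @ F_B @ np.linalg.inv(T)

    # Both representations send the coefficients of the same vector
    # to the coefficients of the same image vector:
    v_B = rng.normal(size=3)
    v_C = T @ v_B
    assert np.allclose(Tprime @ (F_B @ v_B), F_C @ v_C)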

If the involved bases are evident from the context, they are often not included in the notation. Here, seizing this opportunity, we would like to express all the relevant isomorphisms by the same letter M, using in a way a universal notation:

    M : Hom(V, V′) → K^{m×n} ,
        f ↦ M(f) = F ,
    M : V → K^{n×1} ,
        v ↦ M(v) = v⃗ ,
    M : V′ → K^{m×1} ,
        w ↦ M(w) = w⃗ .

So the equation f(v) = w can be represented by

    M(f(v)) = M(f) M(v) ∈ K^{m×1} ,   (5.69)

or

    M(w) = M(f) M(v) ,   (5.70)

and

    w⃗ = F v⃗ .   (5.71)

5.6 Linear Equation Systems

We now turn to perhaps the very first application of linear algebra, which is also one of the most important: solving a system of linear equations. With the results from
the previous chapters, tackling this problem and providing the corresponding proofs
is quite straightforward.

Definition 5.6 Linear system of equations.


Let A be a coefficient matrix A ∈ K^{m×n}, given by A = (a_s) = (α^i_s) with α^i_s ∈ K, a_s ∈ K^m, i ∈ I(m) and s ∈ I(n). A linear system of equations is given by the following three forms:
(i) α^i_s ξ^s = β^i : tensor form;
(ii) a_s ξ^s = b, b = (β^i)_m ∈ K^m : vector form;
(iii) A x = b, x = ξ⃗ ∈ K^n : matrix form.
Every .xb ≡ ξ→b ∈ Kn with . Axb = b is called a solution of the linear system. We
denote the set of all solutions of the linear system by

.L(A, b) := {xb : Axb = b}.



We call the equation of the form A x = 0, with b = 0, homogeneous; when b ≠ 0, we call it inhomogeneous. We denote the set of solutions of the homogeneous equation by L(A) := L(A, 0).

After formulating this system of equations, the following questions arise:


(i) Existence. Under what conditions on . A and .b is there an .x so that . Ax = b
holds?
(ii) Universal solvability. Is the equation . Ax = b solvable for all .b ∈ Km ?
(iii) Unique solvability. If there is a solution, when is it unique?
(iv) Representation of solutions of the linear systems of equations. How do we
describe all solutions of the equation system?
With the results provided so far, these questions can already be answered. This leads to the following corresponding propositions. Most proofs are quite direct and are left as exercises to the reader. Denoting by f_A the linear map corresponding to the matrix (or list) A, f_A : K^n → K^m, we get the following propositions.

Proposition 5.3 Existence.


The following assertions are equivalent:
(i) The equation system . Ax = b is solvable.
(ii) .b ∈ im f A ≡ span A.
(iii) .rank A = rank(A, b).
(iv) Whenever . y ∈ Km , . AT y = 0 ⇔ bT y = 0.

Proposition 5.4 Universal solvability.


The following assertions are equivalent:
(i) The equation system . Ax = b is universally solvable.
(ii) .rank A = m.
(iii) .im f A = Km .

Proposition 5.5 Unique solvability.


The following assertions are equivalent:
(i) The equation system . Ax = b is uniquely solvable.
(ii) .ker f A = {0}.
(iii) .rank A = n.

(iv) The homogeneous equation system has only the trivial solution:

L(A) = {0}.
.

Proposition 5.6 Representation of solutions of the linear systems of equa-


tions.
(i) The set .L(A) of the solutions .x0 of the homogeneous system of equa-
tions, . Ax0 = 0, is a subspace of .Kn , L(A) ≤ Kn and its dimension is
.dim L(A) = n − rank A.

(ii) If p_b is a solution of A x = b, then all solutions of A x = b are given by:

    x_b = p_b + x_0   with   x_0 ∈ L(A) = ker f_A .

So we have

    L(A, b) = p_b + L(A) .

This means that .L(A, b) is an affine space with the corresponding subspace
.L(A) ≤ Kn , with
. dim L(A) = n − rank A.
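As a numerical illustration of this affine structure (the matrix and right-hand side below are chosen arbitrarily), one can compute a particular solution with a least-squares routine and a basis of L(A) from the singular value decomposition:

    import numpy as np

    A = np.array([[1., 2., 1., 0.],
                  [0., 1., 1., 1.]])
    b = np.array([3., 2.])

    # A particular solution p_b (here the system is solvable, so lstsq returns one).
    p_b, *_ = np.linalg.lstsq(A, b, rcond=None)

    # A basis of L(A) = ker f_A from the SVD: the last n - rank(A) right-singular vectors.
    U, sigma, Vt = np.linalg.svd(A)
    r = np.linalg.matrix_rank(A)
    N = Vt[r:].T                      # columns span ker A, dim = n - rank A

    # Every p_b + N @ lam solves A x = b:
    lam = np.array([0.7, -1.3])
    x = p_b + N @ lam
    assert np.allclose(A @ x, b)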

Proposition 5.7 Linear system of equations for .n = m.


If A is a coefficient matrix A ∈ K^{n×n}, then the following assertions are equivalent:
(i) A x = b is solvable for any b ∈ K^n.
(ii) A x = b is uniquely solvable for some b ∈ K^n.
(iii) A x = 0 possesses only the trivial solution.
(iv) A x = b is uniquely solvable for any b ∈ K^n.
(v) A is invertible, and the solution of A x_b = b is given by x_b = A^{-1} b.

If we take the opportunity and utilize tailor-made bases for the map

    f_A : K^n → K^m ,

with corresponding matrix A ∈ K^{m×n}, we obtain a decomposition F′ A F = Ã with F′ and F invertible matrices of rank m and n, respectively, and Ã of the form

    Ã = [ 1_r  0 ]
        [ 0    0 ]  ∈ K^{m×n} .

This way, one can solve the equation system

    Ã x̃_b = b̃

very comfortably. We first get the result

    L(A, b) = L(F′A, F′b) ≡ L(F′A, b̃) .

The following proposition summarizes the solution for this case.

Proposition 5.8 Tailor-made solution of the linear equation system.

Suppose A x = b is solvable. Let F′ and F be as before, and r = rank A. Then

    F′ b = [ b̄ ]
           [ 0 ]    where b̄ ∈ K^r .

The general solution has the form:

    x_b = F [ b̄ ]
            [ λ ]    with λ ∈ K^{n−r} and b̄ ∈ K^r .

Proof We have

    A x_b = b   ⇔   F′ A x_b = F′ b   ⇔   F′ A x_b = [ b̄ ]
                                                      [ 0 ]    with b̄ ∈ K^r ,   and

    b̃ := F′ b = [ b̄ ]
                 [ 0 ] .   (5.72)

This shows the first statement of the proposition.

For the general solution of the linear equation, setting

    x̃_b := F^{-1} x_b   ⇔   x_b = F x̃_b ,   (5.73)

leads to

    A x_b = b   ⇔   F′ A F x̃_b = b̃   ⇔   F′ A F F^{-1} x_b = F′ b   ⇔   [ 1_r  0 ] x̃_b = F′ b .
                                                                          [ 0    0 ]

The result corresponds to an improved form of the reduced row echelon form. The image of the matrix [ 1_r 0 ; 0 0 ] is of the required form, so F′ b is in the image if and only if F′ b = [ b̄ ; 0 ].

Solving the above equation, writing x̃_b = [ ξ ; λ ], we obtain

    ξ = b̄   and   λ ∈ K^{n−r} with λ free.

This leads to

    x̃_b = [ b̄ ]
           [ λ ] .

Using Eqs. (5.72) and (5.73), we obtain

    x_b = F [ b̄ ]
            [ λ ] ,

which proves the proposition. ∎

Summary

What is a matrix? This is where we started. By the end of this chapter, hopefully the
following became clear: a matrix is what you can do with it! In short, linear algebra
is what you can do with matrices.
Initially, we focused on matrix multiplication and its relationship to the com-
position of linear maps. We explored the four facets of matrix multiplication. A
fundamental theorem of linear algebra is the fact that the row rank of a matrix equals
its column rank. Many aspects related to this theorem are also considered in the
exercises.
The various roles of a matrix, as a linear map, as a basis isomorphism (as a supplier
of coordinates), as a representation matrix of abstract vectors and linear maps, along
with the associated change of basis formalism, were discussed extensively.

This led to the fundamental theorem of linear maps, which relates the kernel and image of a matrix with those of its transpose.

Exercises with Hints

Exercise 5.1 Preservation of rank under multiplication by invertible matrices (see


Comment 5.1).
For the matrices . A ∈ Km×n , . F ' ∈ G L(m, K), and . F ∈ G L(n, K), show that

. rank(F ' A) = rank(A), rank(AF) = rank(A) and


rank(F ' AF) = rank A.
Exercise 5.2 As the dot product is positive definite, it leads directly to the following
equality for the null space.
For any matrix A ∈ K^{n×n}, show that A and A†A have the same null space, that is, ker A†A = ker A.

In the next five exercises, you are asked to prove once more the fundamental
theorem of linear algebra which yields the equality of row and column ranks
of a matrix. All these proofs, for which one needs only very elementary facts,
show very interesting aspects of the structure of matrices.

Exercise 5.3 The row—column—rank theorem.


Choose a basis B for the column space of a matrix A ∈ K^{m×n}. Show that the factorization A = BC is unique. By comparison with A^T = C^T B^T, show that the row rank of A equals its column rank.
Exercise 5.4 The row—column—rank theorem.
If the column rank of a matrix . A ∈ Km×n is .r , take the first .r columns of . A to be
linearly independent and write . Br for the corresponding basis of the column space
of . A. Show that now the factorization . A = Br C gives directly the result row rank .C
= column rank . A .= r .
Exercise 5.5 The row—column—rank theorem.
Choose the row rank of a matrix . A ∈ Km×n to be .t and rearrange the rows so that the
first t rows are linearly independent. Show by inspection that each column of A is a linear combination of the standard basis vectors e′_1, …, e′_t ∈ K^m and that

. column rank A ≤ row rank A

and furthermore that


. row rank A = column rank A.

Exercise 5.6 The row—column—rank theorem.


Use the results of the Exercises 5.1 and 5.2 to show that

. rank A = rank AT A ≤ rank AT

and that
. rank AT ≤ rank A,

which leads to
. rank AT = rank A,

and thus to
row rank A = column rank A.
.

Exercise 5.7 The change-of-basis matrix

    T ≡ T_CB = (τ^i_s) ,

as given in Comments 5.4 and 5.5, can also be expressed directly by the coordinates of the basis vectors b_s, s ∈ I(n), of the basis B = (b_1, …, b_n) in the vector space V.
Show that the matrix .TC B in the notation of Comments 5.4 and 5.5 is given by

T
. CB = [φC (b1 ) . . . φC (bn )].

Similarly, the matrix .TBC is given by

T
. BC = [φ B (c1 ) . . . φ B (cn )].

Exercise 5.8 The representation of a linear map . f ∈ Hom(V, V ' ) by using the
canonical basis isomorphism between .V and .Kn (.V ' and .Km ) with basis . B =
(b1 , . . . , bn ) in a vector space .V (. B ' = (b1' , . . . , bm' ) in .V ' ) can be expressed
symbolically as follows:

               f
    b_s  ---------->  b′_i φ^i_s
     |                   |
     ↓                   ↓
    e_s  ---------->  e′_i φ^i_s
               F

with s ∈ I(n), i ∈ I(m), F = (φ^i_s), φ^i_s ∈ K, e_s ∈ R^n, and e′_i ∈ R^m.

Explain the following relation using the corresponding commutative diagram:

    f(b_s) = b′_i φ^i_s   ⇔   F e_s = e′_i φ^i_s .



Exercise 5.9 Rank of a linear map and its representing matrix.


Suppose the rank of a linear map . f ∈ Hom(V, V ' ) is .r . Show that there exists at
least one basis in .V and one basis in .V ' such that, relative to these bases, only the
first .r columns and the first .r rows of the matrix . F = M( f ) of . f are nonzero.
Exercise 5.10 Let . A and . B be two matrices for which addition and multiplication
are defined. Show that

(AB)T = B T AT and ((A)T )T = A.


.

Exercise 5.11 Let . A and . B be two invertible matrices in .Kn×n .


Show that .(AB)−1 = B −1 A−1 .
Exercise 5.12 Let f ∈ Hom(V, V′), let B = (b_1, …, b_n) be a basis of V and B′ = (b′_1, …, b′_n) a basis of V′. Prove that the following are equivalent:

(i) f is invertible;
(ii) the columns of the matrix M(f) = F are linearly independent in K^n;
(iii) the columns of F span K^n.

The following five exercises concern systems of linear equations. They form a collection of well-known results. The proofs can also be seen as fairly direct applications of Chaps. 2, 3, and this chapter.

Exercise 5.13 Existence.


Prove that the following assertions are equivalent:
(i) The equation system . Ax = b is solvable.
(ii) .b ∈ im f A ≡ span A.
(iii) .rank A = rank(A, b).
(iv) If . y ∈ Km , . AT y = 0 ⇔ bT y = 0.
Exercise 5.14 Universal solvability.
Prove that the following assertions are equivalent:
(i) The equation system . Ax = b is universally solvable.
(ii) .rank A = m.
(iii) .im f A = Km .
Exercise 5.15 Unique solvability.
Prove that the following assertions are equivalent:
(i) The equation system . Ax = b is uniquely solvable.
(ii) .ker f A = {0}.
(iii) .rank A = n.

(iv) The homogeneous equation system has only the trivial solution.

Exercise 5.16 Representation of solutions of the linear systems of equations.


Prove the two following assertions:
(i) The set L(A) of the solutions x_0 of the homogeneous system of equations, A x_0 = 0, is a subspace of K^n, L(A) ≤ K^n, and its dimension is dim L(A) = n − rank A.
(ii) If p_b is a solution of A x = b, then all solutions of A x = b are given by:

    x_b = p_b + x_0   with   x_0 ∈ L(A) = ker f_A .

So we have

    L(A, b) = p_b + L(A) .

This means that .L(A, b) is an affine space with the corresponding subspace .L(A) ≤
Kn . With .r = rank A, we have

. dim L(A) = n − r.

Exercise 5.17 Linear system of equations for .n = m.


Let A be a coefficient matrix A ∈ K^{n×n}. Prove that then the following assertions are equivalent:
(i) A x = b is solvable for any b ∈ K^n.
(ii) A x = b is uniquely solvable for some b ∈ K^n.
(iii) A x = 0 possesses only the trivial solution.
(iv) A x = b is uniquely solvable for any b ∈ K^n.
(v) A is invertible, and the solution of A x_b = b is given by x_b = A^{-1} b.

References and Further Reading

1. S. Axler, Linear Algebra Done Right (Springer Nature, 2024)


2. S. Bosch, Lineare Algebra (Springer, 2008)
3. G. Fischer, B. Springborn, Lineare Algebra. Eine Einführung für Studienanfänger. Grundkurs Mathematik (Springer, 2020)
4. S.H. Friedberg, A.J. Insel, L.E. Spence, Linear Algebra. (Pearson, 2013)
5. S. Hassani, Mathematical Physics: A Modern Introduction to its Foundations (Springer, 2013)
6. K. Jänich, Mathematik 1. Geschrieben für Physiker (Springer, 2006)
7. N. Johnston, Advanced Linear and Matrix Algebra (Springer, 2021)
8. N. Johnston, Introduction to Linear and Matrix Algebra (Springer, 2021)
9. M. Koecher, Lineare Algebra und analytische Geometrie (Springer, 2013)
10. G. Landi, A. Zampini, Linear Algebra and Analytic Geometry for Physical Sciences (Springer,
2018)
11. J.M. Lee, Introduction to Smooth Manifolds. Graduate Texts in Mathematics (Springer, 2013)
12. J. Liesen, V. Mehrmann, Linear Algebra (Springer, 2015)
13. P. Petersen, Linear Algebra (Springer, 2012)

14. S. Roman, Advanced Linear Algebra (Springer, 2005)


15. B. Said-Houari, Linear Algebra (Birkhäuser, 2017)
16. A.J. Schramm, Mathematical Methods and Physical Insights: An Integrated Approach
(Cambridge University Press, 2022)
17. G. Strang, Introduction to Linear Algebra (SIAM, 2022)
18. R.J. Valenza, Linear Algebra. An Introduction to Abstract Mathematics (Springer, 2012)
Chapter 6
The Role of Dual Spaces

As mentioned in Sect. 5.3, a linear map . f between .Kn and .Km determines uniquely
the four subspaces ker f, coim f ≤ K^n, and coker f, im f ≤ K^m, which give essential information about this map.
If we consider . f between two abstract vector spaces .V and .V ' with no additional
structure, .coker f and .coim f are not uniquely defined. If we want subspaces fixed
similarly as for . f ∈ Hom(Kn , Km ), we have to additionally consider the dual spaces
of .V and .V ' , .V ∗ and .V '∗ . This is what we are going to show in what follows.
Beforehand, we have to note that dual spaces play a unique role in linear algebra and
a crucial role in many areas of mathematics, for example, in analysis and functional
analysis. They are essential for a good understanding of tensor calculus; they appear
in special and general relativity and are ubiquitous in quantum mechanics. The Dirac
notation in quantum mechanics is perhaps the best demonstration of the presence of
dual spaces in physics.

6.1 Dual Map and Representations

We consider a vector space.V and its dual.V ∗ = Hom(V, K) with.dim V = dim V ∗ =


n. An element of .V ∗ is called a linear form or sometimes a linear function, linear
functional, one form or even covector. We denote the elements of .V ∗ by small greek
letters.α, β, · · · , ξ, η, · · · , taking care not to confuse them with the notation of scalars
which we also denote by small greek letters. For .ξ ∈ V ∗ we have

.ξ : V −→ K,
v |−→ ξ(v),

so that .ξ is a linear map between the vector spaces .V and .V ' = K. We choose the
basis . B = (b1 , . . . , bn ) of .V and . B ' = (1) of .K and the matrix representation of .ξ

is given in various notations by

    ξ_B′B ≡ [ξ_1 … ξ_n] ≡ (ξ_i)_n ≡ (ξ⃗)^T ≡ M(ξ) ∈ K^{1×n} ,
    ξ_i ∈ K ,   ξ⃗ ≡ (ξ^i)   with   ξ^i = ξ_i   and   i ∈ I(n) .

Taking v = b_i v^i and using the anonymous or “universal” M notation, we have

    M(v) = ⎡ v^1 ⎤
           ⎢  ⋮  ⎥  = v⃗ ∈ K^{n×1}   (v^i ∈ K) ,   and
           ⎣ v^n ⎦

    M(ξ(v)) = M(ξ) M(v) = [ξ_1 … ξ_n] ⎡ v^1 ⎤
                                       ⎢  ⋮  ⎥  = ξ_i v^i = (ξ⃗ | v⃗) .
                                       ⎣ v^n ⎦
v

The chosen basis . B of .V uniquely determines a specific basis . B ∗ of .V ∗ which


has the simplest possible relation to the basis . B. This is expressed in the following
proposition:

Proposition 6.1 Dual basis.


If B = (b_1, …, b_n) is a basis of V, then B* = (β^1, …, β^n) ≡ (b*_1, …, b*_n) with β^i(b_j) = δ^i_j (or b*_i(b_j) = δ_ij), i, j ∈ I(n), is a basis of V*.
B* is called the dual basis or cobasis of B. This B* is uniquely defined.

Proof Since B* ⊆ V* is a linearly independent list of cardinality n = dim V*, it follows that B* is a basis of V*: if we set λ_i β^i = 0, it follows that for all j ∈ I(n), λ_i β^i(b_j) = λ_j and λ_j = 0. In addition, span B* ≤ V*, and since dim V* = n, span B* = V*. So B* is a basis of V*. The basis B* is uniquely defined since every β^i is given by the n numbers β^i(b_j) = δ^i_j. ∎

Remark 6.1 Canonical dual basis.

For V = K^n and V* = (K^n)*, if we take the canonical basis E = (e_1, …, e_n), then E* = (ε^1, …, ε^n) ≡ (e*_1, …, e*_n) is given by

    ε^i(e_j) = δ^i_j ,   ε^i ≡ e*_i = e_i^T ∈ (K^n)* = K^{1×n} ,

for example ε^1 = [1 0 ⋯ 0 0], …, ε^n = [0 0 ⋯ 0 1]. It should be clear that any linear map, here the linear function ξ ∈ V* = Hom(V, K), can also have the representation ξ_B′B. We thereby use the notation previously used to reflect explicitly the basis dependence of the representation.
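Numerically, for V = K^n with a basis B given by the columns of an invertible matrix, the dual basis covectors are simply the rows of B^{-1} (for the canonical basis these are the rows e_i^T of the identity). The following NumPy sketch, with an arbitrarily chosen B, illustrates this together with β^i(b_j) = δ^i_j; the last lines anticipate Remark 6.3 below.

    import numpy as np

    # A basis of R^3, stored column-wise.
    B = np.array([[1., 1., 0.],
                  [0., 1., 1.],
                  [1., 0., 1.]])

    # The dual basis covectors beta^1, ..., beta^n are the rows of B^{-1}:
    B_dual = np.linalg.inv(B)

    # beta^i(b_j) = delta^i_j:
    assert np.allclose(B_dual @ B, np.eye(3))

    # beta^i picks out the i-th coordinate of v relative to B:
    v = np.array([2., -1., 4.])
    coords = B_dual @ v                 # coefficients v^i with v = b_i v^i
    assert np.allclose(B @ coords, v)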

Comment 6.1 Basis induced dual isomorphism.

Proposition 6.1 shows also the following: For every basis . B in .V , there exists
a dual isomorphism .Ψ B from .V to .V ∗ which can be regarded as giving .V the
vector structure of .V ∗ . This isomorphism is given exactly by the dual basis . B ∗ .
Using the notation of Proposition 6.1, we have

Ψ B :V −→ V ∗
.

bs |−→ Ψ(bs ) = bs∗ ≡ β s .

Remark 6.2 Representation of the dual basis.


As expected, with the basis B* the linear chart Φ_B* is given by:

    Φ_B* : V* → (K^n)* ≡ K^{1×n} ,
           β^i ↦ Φ_B*(β^i) := ε^i .

So we have, for ξ = ξ_i β^i,

    Φ_B*(ξ) = Φ_B*(ξ_i β^i) = ξ_i Φ_B*(β^i) = ξ_i ε^i = [ξ_1 … ξ_n] .

This is consistent with the previous result:

    Φ_B*(ξ) = ξ_B′B .

Remark 6.3 Dual basis and coordinates of vectors.

The covector β^i, as an element of the cobasis B* = (β^1, …, β^n), determines the ith coordinate of v ∈ V in the basis B = (b_1, …, b_n):

    for v = b_j v^j, v^j ∈ K, we have β^i(v) = β^i(b_j v^j) = β^i(b_j) v^j = δ^i_j v^j = v^i .

As we already saw, the basis B similarly determines the coordinate ξ_i of the covector ξ ∈ V* in the cobasis B* = (β^1, …, β^n). Taking ξ = ξ_i β^i, ξ_i ∈ K, we have

    ξ(b_i) = ξ_j β^j(b_i) = ξ_j δ^j_i = ξ_i .

We can think that, besides the category of vector spaces, there is also the category of covector spaces, that is, that to each V there exists the associated V*. Analogously, we can think that to each linear map f there exists the associated dual linear map f*.

Definition 6.1 Dual map . f ∗ .


If f ∈ Hom(V, W), then the dual map f* ∈ Hom(W*, V*) is defined by

    f*(η) := η ∘ f ∈ V* .

We can see that f* is a linear map:

    f*(η + θ) = (η + θ) ∘ f = η ∘ f + θ ∘ f = f*η + f*θ   and
    f*(λη) = (λη) ∘ f = λ(η ∘ f) = λ f*η .

The map * : f ↦ f* is linear as well. It can be verified by

    f + g ↦ (f + g)* = f* + g* ,
    λ f ↦ (λ f)* = λ f* .

For example: (f + g)*(η) = η ∘ (f + g) = η ∘ f + η ∘ g = f*η + g*η.



Comment 6.2 . f and the dual . f ∗ .

A direct comparison between f and its dual f* shows that f* points in the opposite direction:

    V  —f→  V′ ,
    V*  ←f*—  (V′)* .

This makes it difficult to handle f*. The following notation can help. We define a new symbol ( , ):

    (ξ, v) := ξ(v) ,   ξ ∈ V* ,  v ∈ V .

Thus the above definition f*η(v) = η(f v) can also take the form:

    (f*η, v) = (η, f v) .

Proposition 6.2 Composition of dual maps.

Let . f ∈ Hom(U, V ) and .g ∈ Hom(V, W ), then .(g ◦ f )∗ = f ∗ ◦ g ∗ .

Proof First proof.


If

. f ∗ ∈ Hom(V ∗ , U ∗ ), g ∗ ∈ Hom(W ∗ , V ∗ ), (g ◦ f )∗ ∈ Hom(W ∗ , U ∗ ) and η ∈ W ∗ ,

then

    (g ∘ f)*(η) = η ∘ (g ∘ f) = (η ∘ g) ∘ f
               = g*(η) ∘ f = f*(g*(η)) = (f* ∘ g*)(η) ∈ U* .

Since this holds for all η ∈ W*, we obtain (g ∘ f)* = f* ∘ g*. ∎

Proof Second proof, using Comment 6.2.



    ((g ∘ f)*η, v) = (η, (g ∘ f)v) = (η, g(f v))
                   = (g*η, f v) = (f*(g*η), v) = ((f* ∘ g*)η, v)
    ⇒ (g ∘ f)*η = (f* ∘ g*)η .

Remark 6.4 . f ∗ as pullback.

The map f* is the pullback associated to f, as shown in the commutative diagram:

              f
       V ----------> W
         \          /
  f*η = η∘f        η
           ↘      ↙
              K
The representation of the dual map . f ∗ is, as expected, deeply connected with the
representation of . f : . M( f ∗ ) = M( f )T !

Proposition 6.3 Representation of the dual map.


Let f : V → V′ be a linear map. Let B, C be bases of V, V′, respectively, and B*, C* their dual bases, and F = f_CB. Then the representation matrix of f* : (V′)* → V* is given by f*_{B*C*} = F^T.

Proof For this proof, we use only the corresponding bases and cobases. We have

    β^r(b_s) = δ^r_s ,   r, s ∈ I(n) ,   and   γ^i(c_j) = δ^i_j ,   i, j ∈ I(m) ,

    with   F = f_CB = (φ^i_s) ,   (6.1)

    f(b_s) = c_i φ^i_s .   (6.2)

We define

    f*(γ^i) = β^r χ^i_r .   (6.3)

For the matrix representation of f* we write

    F* := f*_{B*C*} = (χ^i_r) .   (6.4)

We determine F* by the following sequence of equations:

    (f*γ^i)(b_s) = γ^i ∘ f (b_s) = γ^i(f b_s) = γ^i(c_j φ^j_s)
                 = γ^i(c_j) φ^j_s = δ^i_j φ^j_s = φ^i_s .   (6.5)

Equation (6.3) leads to

    (f*γ^i)(b_s) = (β^r χ^i_r)(b_s) = χ^i_r β^r(b_s) = χ^i_r δ^r_s = χ^i_s   (6.6)

    ⟹ χ^i_s = φ^i_s .

So we obtain f*(γ^i) = β^s φ^i_s. If we compare with f(b_s) = c_i φ^i_s, we see that F* = F^T, or in a different notation, M(f*) = M(f)^T. ∎
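In coordinates this is easy to check numerically: if a functional η on K^m is represented by a row vector, then its pullback f*η = η ∘ f is represented by that row times F, i.e. its coefficient column is F^T times the coefficient column of η. A small NumPy sketch with arbitrarily chosen arrays:

    import numpy as np

    rng = np.random.default_rng(1)
    F = rng.normal(size=(2, 3))        # represents f : R^3 -> R^2 (canonical bases)

    eta_row = rng.normal(size=(1, 2))  # a functional eta on R^2, written as a row

    # The pullback f* eta acts on v as eta(F v) = (eta_row @ F) v,
    # so its row representation is eta_row @ F, i.e. its coefficient column is
    # F.T @ eta_row.T -- the matrix of f* is F^T.
    v = rng.normal(size=3)
    assert np.allclose(eta_row @ (F @ v), (eta_row @ F) @ v)
    assert np.allclose((eta_row @ F).T, F.T @ eta_row.T)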

6.2 The Four Fundamental Spaces of a Linear Map

As we saw in Sect. 5.3, the matrix F ∈ Hom(K^n, K^m) determines uniquely the four subspaces

    ker F, im F^T ≤ K^n   and   im F, ker F^T ≤ K^m   (6.7)

which give important information about the map F. For f ∈ Hom(V, V′) this is not possible if V and V′ have no additional structure. The reason is that only ker f and im f are uniquely defined by f, but coim f and coker f are not uniquely defined by f. Only if we choose bases B and B′ for V and V′ are the complements of ker f and im f also fixed by f and (B, B′). So we may write:

    V ≅ ker f ⊕ coim_B f  —f→  im f ⊕ coker_B f ≅ V′ .   (6.8)

We need B and B′ since, as mentioned, when V, V′ are abstract vector spaces we do not possess anything analogous to F^T, as we do in the case of K^n and K^m. As we shall see later in Sect. 6.3, if V and V′ are Euclidean or unitary vector spaces, the adjoint f^ad will play the role of F^T, and we can find a basis-free version of coim_B f and coker_B f, induced directly by f.
Here, with V and V′ abstract vector spaces without further structure, if we want to find a kind of basis-free decomposition of V and V′ induced by f, we have to make use of the dual point of view and consider f* ∈ Hom(V′*, V*).
As we already know, for a given f the dual map

    f* : V′* → V* ,
         η ↦ f*(η) := η ∘ f ,   (6.9)

is uniquely determined. Now the subspaces .im f ∗ ≤ V ∗ and .ker f ∗ ≤ V '∗ are also
uniquely determined by . f ∗ . These two subspaces, .im f ∗ , ker f ∗ , which correspond
to .coim B f and .coker B f , are a kind of substitute for .im F T and .ker F T , respectively.
So we get the big picture for f, as given by the following proposition in the form of a diagram:

Proposition 6.4 The four subspaces of a linear map.

                                f*
    V* ≅ coker_B f* ⊕ im f*   ←———   coim_B f* ⊕ ker f* ≅ V′*
              B ↑≅                          ≅↑ B′
    V  ≅ ker f ⊕ coim_B(f)    ———→   im f ⊕ coker_B(f)  ≅ V′ .        (6.10)
                                f

Proof The proof is obtained straightforwardly, almost by inspection, using the dual bases for V and V′. We may also write symbolically, for the uniquely defined subspaces:

                        f*
    ⋯ ⊕ im f*   ←———   ⋯ ⊕ ker f*
                        f
    ker f ⊕ ⋯   ———→   im f ⊕ ⋯ .

This exhibits the four relevant subspaces that are basis-independent. ∎

Comment 6.3 Isomorphisms of the four fundamental subspaces.

From the last proposition, we obtain immediately the following isomorphisms:

    im f* ≅ coim_B(f) ≅ im f ≅ coim_B(f)* ,   (6.11)
    ker f ≅ coker_B(f)*   and   ker f* ≅ coker_B(f) .   (6.12)

Proposition 6.4 may also be considered as a synopsis of the results of the second
fundamental theorem of linear algebra (see Theorem 5.2) for the general case of an
abstract vector space.
If we use the notation of an annihilator, further results are obtained.

Definition 6.2 Annihilator of a subspace.


For U ≤ V, the annihilator of U, denoted by U^0, is given by U^0 := {ξ ∈ V* : ξ(u) = 0 for all u ∈ U}.

We can directly verify that .U 0 is a subspace of .V ∗ .



Proposition 6.5 Dimension of the annihilator .U 0 .

For .U ⊆ V and .U 0 as above, we have .dim U + dim U 0 = dim V . This


means that the annihilator of .U has the dimension of any complement of .U in
. V . So if we have .U ⊕ W ∼
= V , then .dim U 0 = dim W .

Proof Let . A := (a1 , . . . , ak ) be a basis of .U , and extend it to a basis .C :=


(a1 , . . . , ak , b1 , . . . , bl ) of .V . Let .C ∗ = (α 1 , . . . , α k , β 1 , . . . , β l ) be its dual basis.
In what follows, we set .i, j ∈ I (k) and .r, s ∈ I (l). Note that the choice . B :=
(b1 , . . . , bl ) is a basis of .W := span(b1 , . . . , bl ) and .W ≤ V .
We notice that we set .β s (ai ) = 0 for all .i ∈ I (k) and also .β s (br ) = δrs for all
.s ∈ I (l), as usual. This means that

    W* := span(β^1, …, β^l) = {λ_s β^s : λ_s ∈ K}

is a subspace of .V ∗ (W ∗ ≤ V ∗ ) and .W ∗ ∼
= W . We also notice that .W ∗ annihilates .U :

.β s (U ) = 0 for all s ∈ I (l),

so that .W ∗ is a subspace of .U 0 :
. W ∗ ≤ U 0. (6.13)

It is left to show that U^0 ≤ W* holds: if w ∈ U^0, we have

. w = μ j α j + λs β s , μ j , λs ∈ K (6.14)

and
    w(a_i) = 0   for all i ∈ I(k) .   (6.15)

Equations (6.14) and (6.15) lead to

    w(a_i) = μ_j α^j(a_i) + λ_s β^s(a_i) = μ_j δ^j_i + 0 = μ_i = 0 .   (6.16)

This leads to

    w = λ_s β^s ∈ W* .   (6.17)

We showed that U^0 ≤ W*, and together with Eq. (6.13) we get W* = U^0. Now it is clear that for W*, the dual of W, we have dim W* = dim W = dim U^0. This proves dim U + dim U^0 = dim V. ∎
0

Proposition 6.6 Annihilators of .im f and .ker f .

The following equations hold:

. ker f ∗ = (im f )0 , (6.18)

. im f ∗ = (ker f )0 . (6.19)

This follows directly by setting .ker f = U and using tailor-made bases in the
proof of Proposition 6.5. Here, we give a basis-independent proof for Eq. (6.18):
Proof In Eq. (6.13), we have

    V*  ←f*—  V′* ≥ ker f* ,
    V   —f→   V′  ≥ im f .

We have to show
(a) .ker f ∗ ≤ (im f )0 and
(b) .ker f ∗ ≥ (im f )0 .
which gives .ker f ∗ = (im f )0 .
For (a): We consider the following sequence of implications:

    η_0 ∈ ker f*   ⇒   f*η_0 = 0* ∈ V* ,

so that for all v ∈ V this leads to

    f*η_0(v) = 0*(v) = 0   ⇒   η_0(f v) = 0 ,
                               ⇒   η_0(f V) = 0   or   η_0(im f) = 0 ,

which means that

    η_0 ∈ (im f)^0   and   ker f* ≤ (im f)^0 .

So (a) is proven.
For (b): We start with θ ∈ (im f)^0. Then we have for all v ∈ V

    θ(f v) = 0   and   f*θ(v) = 0 .

Hence f*θ = 0* ∈ V* and θ ∈ ker f*, so that (im f)^0 ≤ ker f*. This proves (b).

(a) and (b) both hold, so that

    (im f)^0 = ker f* .



Proposition 6.7 Injective and surjective relations between . f and . f ∗ .


Let f ∈ Hom(V, V′) and f* its dual. Then:
(i) if f is injective, then f* is surjective;
(ii) if f is surjective, then f* is injective.

Proof (i) If f is injective, then we have ker f = {0}. Using Propositions 6.5 and 6.6, we obtain

    (ker f)^0 = V* ,   dim(ker f)^0 = dim V* = n ,

and so

    dim(im f*) = dim(ker f)^0 = n ,

and so

    im f* = V* ,

thus f* is surjective.
(ii) Similar to the proof of (i). ∎

6.3 Inner Product Vector Spaces and Duality

We consider a linear map. f ∈ Hom(V, V ' ) between.V = (V, (|)) and.V ' = (V ' , (|)),
two inner product vector spaces. In our approach, we always mean a finite-
dimensional vector space by an inner product vector space, usually a Euclidean
or unitary vector space. Here we obtain the same picture of the four relevant sub-
spaces of . f as in the case . f ≡ F ∈ Hom(Kn , Km ). The role of the transpose . F T
(giving .ker F T and .im F T ) is taken over now by the adjoint map . f ad of . f . We will
discuss this new linear algebra notion in this section too. The existence of adjoint and
self-adjoint operators is omnipresent in physics. Especially for quantum mechanics,
it is interesting to note that self-adjoint operators describe the physical observables
on a Hilbert space. It is well-known that for finite dimensions, the notion of a Hilbert
space is equivalent to that of a unitary space.
So here we have the opportunity to look first at finite-dimensional spaces, which are much easier than infinite-dimensional ones, in order to understand the Hilbert space structure and observe its geometric significance.
The existence of an inner product in .V allows a second dual isomorphism which
is basis-independent.

Definition 6.3 Dual isomorphism induced by the inner product.

Consider the map

    j : V → V* ,
        v ↦ j(v) := (v | ·) ,

which means that

    j(v)(u) := (v | u) .   (6.20)

Remark 6.5 Antilinear and semilinear map.

There is here a slight difference between a Euclidean and a unitary vector


space. For a Euclidean vector space, . j is a linear map, whereas, for a unitary
vector space, . j is what is called a .C̄-linear or antilinear map.

. j (u + v) = j (u) + j (v) and j (λv) = λ̄ j (v). (6.21)

To describe both possibilities, we uniformly use the name semilinear map if we mean that a map f is linear or antilinear. For example, f : V → V′, v ↦ f(v), is called semilinear if

    f(u + v) = f(u) + f(v)   and, for λ ∈ C,   f(λv) = λ̄ f(v)   or   f(λv) = λ f(v) .   (6.22)

It is clear that for a Euclidean vector space, the semilinear map . j is a linear map.

Definition 6.4 The adjoint map, . f ad .


For f ∈ Hom(V, V′), the adjoint of f is the map f^ad : V′ → V uniquely defined by the property

    (v | f^ad w) := (f v | w)   (6.23)

for all v ∈ V and all w ∈ V′.

It is equivalent to define the adjoint . f ad of . f via the commutative diagram:



                f*
        V*  <----------  V′*
        ↑                 ↑
        j                 J
        |       f^ad      |
        V   <----------  V′

The isomorphisms j and J are the corresponding dual isomorphisms, as given in Eqs. (6.20) and (6.24) below:

    J : V′ → V′* ,
        w ↦ J(w) := (w | ·) ,   (6.24)

and we obtain

    j ∘ f^ad = f* ∘ J   (6.25)

or, equivalently,

    f^ad = j^{-1} ∘ f* ∘ J .   (6.26)

We see that . f ad can be considered as the manifestation of . f ∗ .


Equations (6.25) and (6.26) are equivalent to Eq. (6.23). At the level of vector
spaces we have:

. f ad : V ' −→ V,
w |−→ f ad (w).

We can obtain the analytic expression (6.23) for . f ad from (6.25) as follows:

.( j ◦ f ad )(w) = ( f ∗ ◦ J )(w)
⇔ j ( f ad w) = f ∗ (J w) ∈ V ∗
⇔ ( j ( f ad w))(v) = J (w)( f v) ∈ K for any v ∈ V and w ∈ V '
⇔ ( f ad w|v) = (w| f v)
⇔ (v| f ad w) = ( f v|w) (6.27)

which is the second definition (see Eq. (6.23)) of . f ad .



Remark 6.6 Some properties of f^ad and the ad-action.

(i) The map f^ad is linear.
    j and J are semilinear, but in the composition (6.26) both of them appear. Therefore the “semi” is “annihilated” and f^ad becomes linear.
(ii) The operation ad is antilinear: for f, g ∈ Hom(V, V′),
    (f + g)^ad = f^ad + g^ad   and   (λ f)^ad = λ̄ f^ad ,  λ ∈ C.
    The additivity is quite clear from the defining property (6.23), applied to v ∈ V and w ∈ V′. The semilinearity follows from

        (v | (λ f)^ad w) = (λ f v | w) = λ̄ (f v | w) = λ̄ (v | f^ad w)
                        = (v | λ̄ f^ad w)   ⇒   (λ f)^ad = λ̄ f^ad .   (6.28)

(iii) ad is an involution: (f^ad)^ad = f. For any w ∈ V′ and v ∈ V,

        (w | (f^ad)^ad v) = (f^ad w | v) = (w | f v) .   (6.29)

(iv) (g ∘ f)^ad = f^ad ∘ g^ad.
    Using an obvious notation as above for j, J and K, and (6.26), we obtain

        (g ∘ f)^ad = j^{-1} ∘ (g ∘ f)* ∘ K = j^{-1} ∘ f* ∘ g* ∘ K
        ⇒ (g ∘ f)^ad = (j^{-1} ∘ f* ∘ J) ∘ (J^{-1} ∘ g* ∘ K) = f^ad ∘ g^ad .   (6.30)

We are now in the position to determine the relation of .ker f ad and .im f ad to .im f
and .ker f :

Proposition 6.8 Kernel image relation between . f ad and . f .


Let V and V′ be inner product spaces,

    f : V → V′   be a linear map, and   (6.31)
    f^ad : V′ → V   be the adjoint map.   (6.32)

Then:
(i) ker f^ad = (im f)^⊥ ;
(ii) im f^ad = (ker f)^⊥ .

Proof (i) Let y ∈ V′. Then y ∈ ker f^ad:

    ⇔ f^ad y = 0
    ⇔ (v | f^ad y) = 0  ∀ v ∈ V
    ⇔ (f v | y) = 0  ∀ v ∈ V
    ⇔ y ∈ (im f)^⊥ .

Hence ker f^ad = (im f)^⊥. This shows (i).

(ii) Since (f^ad)^ad = f, we may replace f with f^ad in (i) and we obtain ker f = (im f^ad)^⊥. Taking the orthogonal complement gives

    im f^ad = (ker f)^⊥ .

This result also gives a geometric interpretation of f^ad:

    V = ker f ⊕ im f^ad   and   V′ = im f ⊕ ker f^ad .   (6.33)

This result shows that f and f^ad lead uniquely, through “ker” and “im”, to an orthogonal decomposition of V and V′.
In this way we have obtained, for a general f ∈ Hom(V, V′) with inner product vector spaces (V, (|)) and (V′, (|)′), the same connection with the four f-relevant subspaces as for F ∈ Hom(R^n, R^m) in Theorem 5.2. Therefore, it may be considered as another face of the same theorem:

Theorem 6.1 The fundamental theorem of linear maps for inner product vector spaces.

Any map f : V → V′ decomposes as follows:

    V = ker f ⊕ im f^ad  —f→  im f ⊕ ker f^ad = V′ .

Furthermore, we know that dim(im f^ad) = dim(im f) = rank f = r. So again, if dim(ker f) = k and dim(ker f^ad) = l, then

    dim V = k + r   and   dim V′ = r + l .
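For V = R^n and V′ = R^m with the standard inner products, f^ad is just the transpose, and the four subspaces can be read off from the singular value decomposition. A short NumPy sketch (the matrix is chosen arbitrarily for illustration):

    import numpy as np

    A = np.array([[1., 2., 0., 1.],
                  [2., 4., 0., 2.],
                  [0., 0., 1., 1.]])      # f = A : R^4 -> R^3, f^ad = A.T

    U, sigma, Vt = np.linalg.svd(A)
    r = int(np.sum(sigma > 1e-12))        # rank f

    row_space  = Vt[:r].T                 # im  f^ad = (ker f)^perp   (in R^4)
    null_space = Vt[r:].T                 # ker f                     (in R^4)
    col_space  = U[:, :r]                 # im  f                     (in R^3)
    left_null  = U[:, r:]                 # ker f^ad = (im f)^perp    (in R^3)

    # Orthogonality of the decompositions R^4 = ker f + im f^ad, R^3 = im f + ker f^ad:
    assert np.allclose(row_space.T @ null_space, 0)
    assert np.allclose(col_space.T @ left_null, 0)
    assert np.allclose(A @ null_space, 0)
    assert np.allclose(A.T @ left_null, 0)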



Remark 6.7 im f = (ker f^ad)^⊥.

Taking the orthogonal complement on both sides of (i) in Proposition 6.8 gives

    im f = (ker f^ad)^⊥ .

Proposition 6.9 Representation of . f ad .

The representation F^ad of f^ad is given, for orthonormal bases in V and V′, by the representation F of f:

    F^ad = F^† ,   where F^† := F̄^T .

Proof For orthonormal bases B = (v_a)_n in V and C = (w_i)_m in V′,

    (v_a | v_b) = δ_ab ,  a, b ∈ I(n) ,   and   (w_i | w_j) = δ_ij ,  i, j ∈ I(m) .

We obtain from f v_a = w_i φ^i_a, φ^i_a ∈ K,

    (w_j | f v_a) = (w_j | w_i) φ^i_a = δ_ji φ^i_a = φ^j_a .

Taking f^ad w_i = v_a χ^a_i, χ^a_i ∈ K, we obtain analogously

    (v_b | f^ad w_i) = χ^b_i ,
    (v_b | f^ad w_i) = (f v_b | w_i) = conj( (w_i | f v_b) ) = φ̄^i_b .

The comparison of the last two equations leads to the result

    χ^b_i = φ̄^i_b .

So we have f^ad w_i = v_a φ̄^i_a, which means F^ad = F^†. ∎

Remark 6.8 F^ad = F^T for Euclidean vector spaces.

In the case of Euclidean vector spaces,

    φ^i_a ∈ R ,   so that   φ̄^i_a = φ^i_a   and   F^ad = F^T .

This is precisely the result for f ∈ Hom(R^n, R^m) as discussed in Sect. 5.4 and Theorem 5.2.
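In coordinates (orthonormal bases), this is again easy to verify numerically. The following NumPy sketch uses an arbitrary complex matrix and the convention (v|u) = v†u adopted in this chapter:

    import numpy as np

    rng = np.random.default_rng(2)
    F = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))   # f : C^2 -> C^3
    F_ad = F.conj().T                                            # F^ad = F-dagger

    v = rng.normal(size=2) + 1j * rng.normal(size=2)             # v in C^2
    w = rng.normal(size=3) + 1j * rng.normal(size=3)             # w in C^3

    inner = lambda x, y: np.vdot(x, y)        # (x|y) = x-dagger y, antilinear in x

    # Defining property (6.23):  (v | f^ad w) = (f v | w)
    assert np.isclose(inner(v, F_ad @ w), inner(F @ v, w))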

6.4 The Dirac Bra Ket in Quantum Mechanics

Quantum mechanics is done in a Hilbert space H: this is the realm of quantum mechanics. Here, we consider finite-dimensional vector spaces and therefore we also consider finite-dimensional Hilbert spaces. An n-dimensional Hilbert space is a C vector space with inner product, H = (V, (|)), dim V = n. If we choose an orthonormal basis C = (c_1, …, c_n), then we have the following isomorphism:

    H ≅ C^n .

So we can identify the Hilbert space H with C^n. The inner product here is also called a Hermitian product, and (|) is nothing else but the Dirac Bra Ket! But Dirac goes one step further and decomposes the Bra Ket (|) into two maps, (|) → (| |) → (| and |), with the ket |) = id_H ∈ Hom(H, H):

    |) : H → H ,
         v ↦ |v) := id_H(v) = v ,

and (| ∈ Hom(H, H*):

    (| : H → H* ,
         v ↦ (v|

with

    (v| : H → C ,
          u ↦ (v|u) = v^† u .

So the result is that we have in fact |v) = v and (v| ≠ v, definitively. So the new object is only the map (| ∈ Hom(H, H*). However, this is nothing else but the well-known canonical isometry between H and H* (see Chap. 11.2 and also the canonical dual isomorphism as in Definition 6.3). At this point, to facilitate our intuition, we prefer considering a real vector space. So we set now H = R^n ≡ R^{n×1}. This is no restriction for the following considerations.
We notice immediately that, because of the equation (v|u) = v^T u, the equality (| = (·)^T : H → H* holds too. Thus the transpose T, when restricted to H, is nothing else but the map “Bra” = (| taken from Bra Ket. So we have just to call the symbol

|) = id_H, “Ket”, as Dirac did. But what is the difference between the transpose T and Bra? Bra is only defined on H, while the transpose T is defined on H as well as on H*. So we have the well-known relations:

    T : R^n → (R^n)* ,        column ↦ row ,
    and
    T : (R^n)* → R^n ,        row ↦ column ,
    but only
    (| : R^n → (R^n)* ,       column ↦ row .

This means that when we are using |v), we see v ∈ R^n but explicitly never ξ = ξ_v ∈ (R^n)*. This facilitates the identification of H with H*. In coordinates (coefficients), using the canonical basis E = (e_1, …, e_n) and the canonical cobasis E* = (ε^1, …, ε^n), with (e_i | e_s) = δ_is, ε^i(e_s) = δ^i_s, i, s ∈ I(n), we have

    |v) = v = e_s v^s ,  v^s ∈ R ,   and
    (v| = v^T = v_i ε^i   with   v_i = v^i .

If we consider the standard quantum mechanics case, where we take H = C^n, we have, with (v| = v^† = v̄^T,

    (v|u) = v̄^T u = Σ_{i=1}^{n} v̄^i u^i = v_i u^i .

Then (| corresponds to the conjugate transpose †, and we can write, for v ∈ H:

    |v) = e_i v^i ,  v^i ∈ C ,   and
    (v| = v_i ε^i   with   v_i = v̄^i .

The conjugate transpose of v ≡ |v) is:

    (v| = v̄^1 ε^1 + ⋯ + v̄^n ε^n = [v_1 ⋯ v_i ⋯ v_n] = v^† ∈ (C^n)* = H* .

Remark 6.9 Comparison of .(|) with .|)(|.

As we know, the symbol (|) denotes the Hermitian product for a C vector space, which is a sesquilinear (“one and a half times linear”) map

    (|) : H × H → C .

What is |)(| ? We first consider |v)(u| ∈ Hom(H, H) = C^{n×n}. Then |v)(u| is a remarkable map as well as a remarkable matrix, since |v)(u| = v ū^T = v u^† is a matrix with rank(|v)(u|) = 1. We may think of |v)(u|, as a map, as acting from the left on H = C^n. On the other hand, |v)(u|, as a matrix, also acts quite naturally from the right on H* = (C^n)*. We can thus also interpret |v)(u| ∈ Hom(H*, H*)!

Remark 6.10 Canonical basis in .Cn×n in Dirac’s Bra Ket formalism.

The canonical basis in C^{n×n} is given by E := {E_is : i, s ∈ I(n)}, with basis matrices given by E_is = |e_i)(e_s| = ((ε_is)^j_r) = (δ_ij δ^r_s). It is interesting that rank(E_is) = 1 (see also Eq. (7.1)). We thus get for the matrix A the expression:

    A = Σ_{s,i=1}^{n} α^i_s |e_i)(e_s| .

Remark 6.11 The extended identity Σ_{i=1}^{n} |e_i)(e_i|.

According to Remark 6.9, the expression Σ_{i=1}^{n} |e_i)(e_i| can be identified with id_H or id_{H*}.

Remark 6.12 Matrix multiplication.

The present Dirac formalism also leads directly, as in Sect. 5.1, to the expression for matrix multiplication. With A = (α^i_s), B = (β^j_r), C = (γ^i_r), i, j, r, s ∈ I(n), and using Remark 6.10, we get:

    C = A B = α^i_s |e_i)(e_s| β^j_r |e_j)(e_r|
            = α^i_s β^j_r |e_i)(e_s|e_j)(e_r| = α^i_s β^j_r |e_i) δ^s_j (e_r|
            = α^i_j β^j_r |e_i)(e_r| .

This leads to

    γ^i_r = α^i_j β^j_r .
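A compact NumPy sketch of these Bra Ket manipulations, with small arbitrarily chosen matrices and np.outer standing in for |e_i)(e_s|:

    import numpy as np

    n = 3
    e = np.eye(n)                                  # canonical basis e_1, ..., e_n as columns

    # Basis matrices E_is = |e_i)(e_s| of C^{n x n}, each of rank 1:
    E = {(i, s): np.outer(e[:, i], e[:, s]) for i in range(n) for s in range(n)}
    assert all(np.linalg.matrix_rank(E_is) == 1 for E_is in E.values())

    # Completeness: sum_i |e_i)(e_i| = id_H
    assert np.allclose(sum(np.outer(e[:, i], e[:, i]) for i in range(n)), np.eye(n))

    # Reassembling a matrix A from its entries: A = sum_{i,s} alpha^i_s |e_i)(e_s|
    rng = np.random.default_rng(3)
    A = rng.normal(size=(n, n))
    B = rng.normal(size=(n, n))
    assert np.allclose(sum(A[i, s] * E[(i, s)] for i in range(n) for s in range(n)), A)

    # Matrix multiplication gamma^i_r = alpha^i_j beta^j_r:
    C = np.einsum('ij,jr->ir', A, B)
    assert np.allclose(C, A @ B)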



Summary

The role of the dual vector space and the dual map was thoroughly discussed. This
is an area that is often neglected in physics. The last section of this chapter on Dirac
formalism in quantum mechanics illustrates that it doesn’t have to be the case. Dual
maps in situations where only abstract vector spaces are available serve as a certain
substitute for adjoint maps, which, as we have seen, are defined on inner product
vector spaces.
Here, we also observed the dual version of the four fundamental subspaces of a
linear map. The annihilator of a subspace, as a subspace in the dual space, also played
an important role. We showed that an inner product space is naturally isomorphic to its dual space.
Following this, within the duality of inner product vector spaces, the adjoint of a given linear map was introduced. The four fundamental subspaces are naturally most visible in the inner product space situation, using the adjoint map.
Finally, as mentioned, Dirac formalism was addressed.

Exercises with Hints

Exercise 6.1 Show that transposition

. T : Km×n −→ Kn×m
A |−→ AT

is a linear and invertible map.


Exercise 6.2 Any nontrivial linear function is always surjective. Show that .im ξ =
K for any .ξ ∈ Hom(V, K)\{0}.

In the following two exercises, we see explicitly the role of the dual space (K^m)* in determining the rows of a matrix A ∈ K^{m×n}. Similarly, we see that (K^n)* determines the rows of A^T.

Exercise 6.3 We consider the matrix A ∈ K^{m×n} as a linear map A ∈ Hom(V, V′) with V = K^n and V′ = K^m, and we write:

    A = (α^i_s) = [a_1 … a_n] = ⎡ α^1 ⎤
                                 ⎢  ⋮  ⎥ ,
                                 ⎣ α^m ⎦

with i ∈ I(m), s ∈ I(n), α^i_s ∈ K, a_s ∈ K^m, and α^i ∈ (K^n)*.
Show that

    a_s = A e_s   and   α^i = ε′^i A ,

where

    (e_s, ε^s) is the canonical dual basis pair in K^n ;
    (e′_i, ε′^i) is the canonical dual basis pair in K^m .

Exercise 6.4 Use a notation similar to that of Exercise 6.3.
Let A^T : K^m → K^n, with

    A^T = (α^s_i) = [a_1 … a_m] = ⎡ α′^1 ⎤
                                   ⎢  ⋮   ⎥ ,
                                   ⎣ α′^n ⎦

    a_i ∈ K^n ,   α′^s ∈ (K^m)* .

Show that

    a_i = A^T e′_i   and   α′^s = ε^s A^T .

With the more advanced tools of this chapter, it will be even easier to prove the row-column-rank theorem in Exercise 6.6. Beforehand, let us recall, in Exercise 6.5, the connection between im f (with f ∈ Hom(V, V′)) and the column rank of M(f).

Exercise 6.5 Let . f be a linear map . f ∈ Hom(V, V ' ). Show that .dim im f equals
the column rank of . M( f ), the matrix of . f .
Exercise 6.6 The row-column-rank theorem.
Consider the linear map . f = A ∈ Hom(Kn , Km ), then show that the row rank of . A
equals the column rank of . A.
Exercise 6.7 The form of any matrix of rank 1.
Use the experience gained with the proof in Exercise 5.4 to show that, for any matrix A ∈ R^{m×n}, the rank of A is 1 if and only if A is of the form A = wξ with w ∈ R^m and ξ ∈ (R^n)*. In this case, we can also write:

    A = w v^T = |w)(v| .
Exercise 6.8 The dual basis covectors select the coordinates of the vectors in .V
and the basis vectors select the coordinates of covectors in .V ∗ .
If .(b1 , . . . , bn ) is a basis in .V and .(β 1 , . . . , β n ) its dual basis, so that .(β s (br )) =
δrs , s, r ∈ I (n), show that for any vector .v ∈ V and any covector .ξ ∈ V ∗ ,

(i) .v = bs β s (v) ∈ V ;
(ii) .ξ = ξ(bs )β s ∈ V ∗ .

Exercise 6.9 There always exists an element of the dual space which annihilates any
given proper subspace of the corresponding vector space.
Let .U be a subspace of a vector space .V . If .dim U < dim V , then show that there
exists some .ξ ∈ V ∗ such that .ξ(U ) = 0.

Exercise 6.10 The annihilator is a subspace.


If U is a subspace of a vector space V, then show that the annihilator U^0 of U is a subspace of the dual V*:

    U^0 ≤ V* .

Exercise 6.11 Here is another proof of Proposition 6.5. This proof is basis-free.
For U a subspace of a vector space V, the dimension of the annihilator U^0 is given by

    dim U + dim U^0 = dim V .

Prove the above assertion using the inclusion map

    i : U → V ,
        u ↦ i(u) = u ∈ V ,

and the rank-nullity theorem.

The next four exercises deal with various simple relations between two sub-
spaces and the corresponding annihilators.

Exercise 6.12 If .U1 and .U2 are subspaces of .V with .U1 ≤ U2 , then show that .U20 ≤
U10 .

Exercise 6.13 If .U1 and .U2 are subspaces of .V with .U20 ≤ U10 , then show that
.U1 ≤ U2 .

Exercise 6.14 If .U1 and .U2 are subspaces of .V , then show that .(U1 + U2 )0 = U10 ∩
U20 .

Exercise 6.15 If .U1 and .U2 are subspaces of .V , then show that .(U1 ∩ U2 )0 = U10 +
U20 .

The notion of double dual space .V ∗∗ of a vector space .V is very important,


especially to understand the tensor formalism. This is why it will be present
at times in the next chapters.

Definition 6.5 The double dual of V, here denoted by V**, is the dual of the dual space V*:

    V** := Hom(V*, R) .

V** is canonically isomorphic to V:

    Ψ : V → V** ,
        v ↦ Ψ(v) ,   (Ψ(v))(ξ) ≡ v^#(ξ) := ξ(v)

for v ∈ V and ξ ∈ V*.

Exercise 6.16 Show the following assertion: Ψ ≡ (·)* is a linear map from V to V**.

Exercise 6.17 Show the following assertion: .Ψ is an isomorphism from .V to .V ∗∗ .

References and Further Reading

1. S. Axler, Linear Algebra Done Right. (Springer Nature, 2024)


2. S. Bosch, Lineare Algebra. (Springer, 2008)
3. G. Fischer, B. Springborn, Lineare Algebra. Eine Einführung für Studienanfänger. Grundkurs Mathematik (Springer, 2020)
4. S.H. Friedberg, A.J. Insel, L.E. Spence, Linear Algebra (Pearson, 2013)
5. K. Jänich, Mathematik 1, Geschrieben für Physiker (Springer, 2006)
6. N. Jeevanjee, An Introduction to Tensors and Group Theory for Physicists. (Springer, 2011)
7. J. Liesen, V. Mehrmann, Linear Algebra. (Springer, 2015)
8. P. Petersen, Linear Algebra. (Springer, 2012)
Chapter 7
The Role of Determinants

The determinant is one of the most exciting and essential functions in mathematics
and physics. Its significance stems from the fact that it is a profoundly geometric
object. It possesses many manifestations. Its domain is usually the .n × n-matrices,
and it may also be called the determinant function. Another form is the map from
the Cartesian product .V n = V × . . . × V to .K (where .dim V = n) which is linear
in every component with the additional property (alternating) that if two vectors are
identical, the result is zero. This is usually called a multilinear alternating form or a
determinant form or even a volume form on .V . In connection with this, the notion
of orientation is illuminated by a determinant.
In what follows, we start with the algebraic point of view for determinants, and
in doing so, we derive and discuss most of the properties of determinants. Later we
address the geometric point of view. In addition, we define the determinant of a linear
operator, which is essentially a third manifestation of determinants.

7.1 Elementary Matrix Operations

From the algebraic point of view, the use of elementary operations and elementary
matrices offers some advantages since the expressions and proofs become clearer and
shorter. We start with a few remarks on the notations and definitions. We first consider
the .m × n canonical basis matrices (see also Comment 3.3, and the Examples 2.5
and 2.6) given by:
    E_is := ((ε_is)^j_r)   (7.1)

with

    (ε_is)^j_r := δ_ij δ^r_s ,   for i, j ∈ I(m), r, s ∈ I(n) .   (7.2)


So we can write, as in Comment 3.3,

    E_is = ⎡ 0  ⋯  0  ⋯  0 ⎤
           ⎢ ⋮      ⋮      ⋮ ⎥
           ⎢ 0  ⋯  1  ⋯  0 ⎥   ← ith row                 (7.3)
           ⎢ ⋮      ⋮      ⋮ ⎥
           ⎣ 0  ⋯  0  ⋯  0 ⎦
                   ↑
               sth column

Now we consider various matrices F ∈ K^{n×n} of the form:

    F_is = 1_n + E_is ,   (7.4)
    F_k(λ) = 1_n + (λ − 1) E_kk ,   λ ∈ K,  i, s, k ∈ I(n) .   (7.5)

For example (here n = 5, with i = 2, s = 4 and k = 3):

    F_is = 1_n + E_is = ⎡ 1 0 0 0 0 ⎤
                        ⎢ 0 1 0 1 0 ⎥  ← ith row
                        ⎢ 0 0 1 0 0 ⎥
                        ⎢ 0 0 0 1 0 ⎥
                        ⎣ 0 0 0 0 1 ⎦
                                ↑
                            sth column

    F_k(λ) =            ⎡ 1         ⎤
                        ⎢   1     0 ⎥
                        ⎢     λ     ⎥  ← kth row
                        ⎢ 0     1   ⎥
                        ⎣         1 ⎦
                              ↑
                          kth column

Comment 7.1 Inversion and transpose of the “F” matrices.

It is clear that F_is and F_k(λ) (with λ ≠ 0) belong to Gl(n, K): since F_is(1_n − E_is) = 1_n and F_k(λ) F_k(1/λ) = 1_n, this is easy to check. The inverses are given by

    F_is^{-1} = 1_n − E_is   and   F_k^{-1}(λ) = F_k(1/λ) .   (7.6)

For the transpose, we have

    F_ij^T = F_ji ,
    F_k(λ)^T = F_k(λ) .   (7.7)

This follows directly from Eqs. 7.4 and 7.5.

Remark 7.1 The elementary matrices.


The elementary matrices in the literature are usually given by
(i) P_is, the exchange of the two columns (or rows) i and s,
(ii) F_k(λ),
(iii) F_is(λ) := 1_n + λ E_is.
Elementary operations are obtained using F_is and F_k(λ). For an m × n-matrix A we obtain:
(a) A F_is from A by adding the ith column to the sth column;
(b) A F_k(λ) from A by multiplying the kth column by λ.
Analogously, we have F′_is A and F′_k(λ) A, the corresponding row operations for the matrix A ∈ K^{m×n}, with F′_is, F′_k(λ) ∈ K^{m×m}.
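A small NumPy sketch of these constructions (indices are 0-based in the code, and the example matrix is arbitrary):

    import numpy as np

    def E(i, s, n):
        """Canonical basis matrix E_is with a single 1 in row i, column s."""
        M = np.zeros((n, n))
        M[i, s] = 1.0
        return M

    def F(i, s, n):
        """Elementary matrix F_is = 1_n + E_is (i != s)."""
        return np.eye(n) + E(i, s, n)

    def F_scale(k, lam, n):
        """Elementary matrix F_k(lambda) = 1_n + (lambda - 1) E_kk."""
        return np.eye(n) + (lam - 1.0) * E(k, k, n)

    A = np.arange(12, dtype=float).reshape(3, 4)   # an arbitrary 3 x 4 matrix

    # Right multiplication: column operation (add column i to column s).
    B = A @ F(1, 3, 4)
    expected = A.copy()
    expected[:, 3] += A[:, 1]
    assert np.allclose(B, expected)

    # Left multiplication: row operation (scale row k by lambda).
    C = F_scale(0, 5.0, 3) @ A
    assert np.allclose(C[0], 5.0 * A[0]) and np.allclose(C[1:], A[1:])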

Remark 7.2 Left and right actions by elementary matrices.


The elementary column (row) operations for a matrix. A ∈ Km×n correspond
exactly to the right (left) multiplication of . A by elementary matrices from
. Gl(n, K), respectively .(Gl(m, K)).

Comment 7.2 Elementary row operations and notation.


We use the notation

    E_si ≡ E_is ,   F_si ≡ F_is ,   F_si(λ) ≡ F_is(λ) ,

and

    P_is ≡ P_si .

For the elementary row operations with the lower triangular matrix F′_si(λ) (i < s), we have:

    F′_si(λ) :  α^i ↦ α^i + λ α^s   and   α^j ↦ α^j   for j ≠ i ,   (7.8)

and with the diagonal matrix F′_i(λ), we have:

    F′_i(λ) :  α^i ↦ λ α^i   and   α^j ↦ α^j   for j ≠ i .   (7.9)

It is well known that the action on matrices A ∈ K^{m×n} and B ∈ K^{n×n} with rank B = n by a sequence of elementary matrices, as given in Remark 7.1, leads to

    F′_1 F′_2 ⋯ F′_{l′} A = [ 1_r  ∗ ]
                            [ 0    ∗ ]   (7.10)

and to

    F′_1 F′_2 ⋯ F′_{k′} B = 1_n .   (7.11)

Analogously, the column operations lead to the results

    A F_1 ⋯ F_l = [ 1_r  0 ]
                  [ ∗    ∗ ]   (7.12)

and

    B F_1 ⋯ F_k = 1_n .   (7.13)

Remark 7.3 Normal form of matrices and elementary matrices.

Using the tailor-made bases (see Proposition 3.8 and Theorem 3.1), we obtained for an m × n-matrix A with rank(A) = r a representation Ã of A in the form

    Ã = [ 1_r  0 ]
        [ 0    0 ] .   (7.14)

The same can be obtained using elementary row and column operations as in Comment 7.1 and in Eqs. (7.10) and (7.12) in Comment 7.2. Hence

    Ã = F′_1 … F′_{l′} A F_1 … F_l   or   Ã = F′ A F   (7.15)

with

    F′ := F′_1 F′_2 … F′_{l′}   and   F := F_1 F_2 … F_l .   (7.16)

F′_1, F′_2, …, F′_{l′} are elementary matrices from Gl(m); F_1, F_2, …, F_l are elementary matrices from Gl(n).

Remark 7.4 Criterion for invertibility.

. A is invertible if and only if . A is a product of elementary matrices.

Remark 7.5 Equivalent, row equivalent, and column equivalent matrices.

The relation Ã ∼ A defined by Ã = F′AF, as above, is an equivalence relation, and we say that Ã and A are equivalent. This was defined in Sect. 3.3 in Definition 3.9. It turns out that in the context of row or column operations, the notions of row equivalence (∼_r) and column equivalence (∼_c) are relevant.

So we have the following definition:

Definition 7.1 Row equivalence and column equivalence.


Matrices A and B in K^{m×n} are row equivalent, denoted by A ∼_r B, if and only if there exists a matrix F′ ∈ Gl(m) such that

    B = F′ A .

Similarly, A and B in K^{m×n} are column equivalent, denoted by A ∼_c B, if and only if there exists a matrix F ∈ Gl(n) such that

    B = A F .

7.2 The Algebraic Aspects of Determinants

We start with an algebraic definition of “determinant”.



Definition 7.2 Determinant function.


A map Δ : K^{n×n} → K is a determinant function if

    (Δ1)  Δ(A F_is) = Δ(A) ,   i ≠ s ,
    (Δ2)  Δ(A F_k(λ)) = λ Δ(A)   for all λ ∈ K .

Axiom Δ1 means that Δ is invariant if we add one column to another given column. This is a specific additive property of Δ. Geometrically, it means that Δ is shear invariant.
Figure 7.1 shows, in 2 dimensions, the shear invariance of the area of a parallelogram. As is well known, this area is given by Euclidean geometry. Here, it corresponds to a certain determinant function which is usually denoted by det (see Definition 7.4).
Property Δ2 means that if we scale a given column by λ, the same happens to Δ. We may call this homogeneity or the scaling property of Δ.
Using the definitions given in Sect. 7.1, we set B := A F_is and B′ := A F_k(λ). It is obvious that A and B are column equivalent, A ∼_c B, and similarly A and B′ are column equivalent, A ∼_c B′.
We may also express the properties Δ1 and Δ2 using, as usual, the identification between a list and the corresponding matrix:

Fig. 7.1 The shear invariance of the area of a parallelogram



    A = (a_1, a_2, …, a_n) = [a_1 a_2 … a_n] ∈ K^{n×n} ,

shortly, by

    (Δ1)  Δ(… a_r + a_s … a_s …) = Δ(… a_r … a_s …) ,
    (Δ2)  Δ(… λ a_r …) = λ Δ(… a_r …) .
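Both properties are easy to observe numerically for the standard determinant det (introduced in Definition 7.4 below); a small NumPy sketch with an arbitrarily chosen matrix:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.normal(size=(3, 3))

    # (Delta 1): adding one column to another leaves det unchanged (shear invariance).
    B = A.copy()
    B[:, 2] += A[:, 0]
    assert np.isclose(np.linalg.det(B), np.linalg.det(A))

    # (Delta 2): scaling one column by lambda scales det by lambda.
    lam = 2.5
    C = A.copy()
    C[:, 1] *= lam
    assert np.isclose(np.linalg.det(C), lam * np.linalg.det(A))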

Remark 7.6 Zero scaling.

The scalar .λ is allowed to be zero! This means that if a column in . A is zero,


then .Δ(A) is zero too.

Remark 7.7 Determinant functions as vector space.

The space of determinant functions .Δn = {Δ . . . } is a vector space: This


is evident since the space of functions .Δ : K n×n → K is a vector space. The
additional conditions in axioms .Δ1 and .Δ2 in Definition 7.2 do not affect the
vector space structure since they respect linear combinations.

From the axioms .Δ1 and .Δ2, four very important properties follow:

Comment 7.3 Elementary transformations act like scalars.


We can summarize the above axioms by the following characteristic property:
Every elementary transformation acts on determinant functions like a scalar.
This leads to the next proposition. In the proof of it, we will see explicitly how
it works.

Proposition 7.1 First implications for the determinant functions.

(i)   Let matrix A ∈ K^{n×n} have rank(A) < n; then Δ(A) = 0.
(ii)  If Δ(1_n) = 0, then Δ = 0.
(iii) If rank(A) = n, then there is a scalar λ' ≠ 0 so that Δ(A) = λ' Δ(1_n).
(iv)  dim(Δ_n) = 1.

Proof
(i) From Remarks 7.3 and 7.6, for r = rank A < n, that is, r ≠ n, and the prop-
erties of elementary matrices, we obtain the sequence of equations with scalars
λ_1, …, λ_l and λ'_1, …, λ'_{l'}:

    Δ(A) = Δ(F'_1 F'_2 ⋯ F'_{l'} [ 1_r  0 ] F_1 F_2 ⋯ F_l),
                                 [ 0    0 ]
          ⋮
    Δ(A) = Δ(F'_1 ⋯ F'_{l'} [ 1_r  0 ]) λ_1 ⋯ λ_l,
                            [ 0    0 ]
    Δ(A) = λ'_1 λ'_2 ⋯ λ'_{l'} Δ([ 1_r  0 ]) λ_1 ⋯ λ_l.
                                 [ 0    0 ]

Since, for example, the last column is zero, we obtain also Δ(A) = 0. This
proves (i).
(ii) In this case, we may assume that r = n, so that, for example, Δ(A) = Δ(F 1_n). Anal-
ogously as before, we have Δ(A) = λ' Δ(1_n) with λ' ≠ 0. It is now clear that
if Δ(1_n) = 0, it follows that Δ = 0 as well. So (ii) is proven.
(iii) In the proof of (ii), we found Δ(A) = λ' Δ(1_n) with λ' ≠ 0, so (iii) is already
proven.
(iv) The result of (iii) means essentially by itself that the dimension of the space of
the determinant functions is 1: Δ(A) = λ Δ(1_n) signifies that finally, for every
matrix A, it is the value Δ(1_n) that counts. Since Δ(1_n) ∈ K, we have:

    Δ_n := {Δ} ≅ Δ(1_n) K ≅ K

and dim(Δ_n) = 1. This proves (iv) and therefore Proposition 7.1.


Furthermore, the relation Δ(A) = λ' Δ(1_n) leads to the standard determinant, det:
We take Δ_0 with

    Δ_0(1_n) = 1.                                                       (7.17)

Then we have Δ(1_n) = λ_Δ ∈ K and we may write

    Δ(1_n) = λ_Δ Δ_0(1_n).                                              (7.18)

This means that we also get

    Δ = λ_Δ Δ_0   and   Δ(A) = λ_Δ Δ_0(A).                              (7.19)

This leads to the following definition and proposition.



Definition 7.3 Normalization of the determinant function.


In addition to (.Δ1) and (.Δ2) above, we normalize (.Δ3): For .Δ0 ∈ Δn , we take
.Δ0 (1n ) = 1.

Proposition 7.2 Uniqueness of .Δ0 .

The axioms .Δ1, Δ2, and .Δ3 determine uniquely the function .Δ0 ∈ Δn with
.Δ0 (1n ) = 1.

Proof For Δ ∈ Δ_n with Δ(1_n) = 1, according to Proposition 7.1 (iii) and Eqs.
(7.18) and (7.19), we have:

    Δ = λ_Δ Δ_0   and   Δ(1_n) = λ_Δ Δ_0(1_n).

Since Δ(1_n) = 1 and Δ_0(1_n) = 1, this gives

    1 = λ_Δ · 1   and   λ_Δ = 1.

So Δ = Δ_0 and Δ_0 is uniquely defined. ∎

Definition 7.4 We define the standard .det to be .Δ0 and write .det := Δ0 .

Corollary 7.1 .Δ and .det.

Using Eq. 7.19 and Definition 7.4, we can write

Δ = λΔ det .
. (7.20)


Having shown the uniqueness of det, we are now going to show its existence,
inductively with respect to the dimension n.

Proposition 7.3 The existence of the determinant.

Axioms Δ1, Δ2, and Δ3 determine uniquely the determinant function Δ =
det : K^{n×n} → K.
For low dimensions we have, as is well known, in an obvious notation the
following results:

Proof For n = 1: The existence is clear.

Proof For n = 2: If we put

    A = (a, b) ≡ [a b] = [ α^1  β^1 ]
                         [ α^2  β^2 ],

we can set det_2(A) := α^1 β^2 − α^2 β^1.
We verify the axioms (Δ1), (Δ2), and (Δ3):

    Δ1 :  det_2(a + b, b) = det_2 [ α^1+β^1  β^1 ]  = (α^1 + β^1)β^2 − (α^2 + β^2)β^1
                                  [ α^2+β^2  β^2 ]
                          = α^1 β^2 + β^1 β^2 − α^2 β^1 − β^2 β^1 = α^1 β^2 − α^2 β^1 = det_2(a, b).

    Δ2 :  det_2(λa, b) = det_2 [ λα^1  β^1 ]  = λα^1 β^2 − λα^2 β^1 = λ det_2(a, b).
                               [ λα^2  β^2 ]

    Δ3 :  det_2(1_2) = det_2 [ 1  0 ]  = 1.
                             [ 0  1 ]

So the existence for n = 2 is proven.
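As a quick numerical illustration, the axioms (Δ1)–(Δ3) can also be checked for det_2 on concrete columns; the following minimal Python/NumPy sketch does this, where det2 is a small helper written only for this check.

import numpy as np

def det2(a, b):
    # det_2 of the 2x2 matrix with columns a and b
    return a[0] * b[1] - a[1] * b[0]

rng = np.random.default_rng(0)
a, b = rng.random(2), rng.random(2)
lam = 3.7

# (Delta 1): shear invariance, det_2(a + b, b) = det_2(a, b)
assert np.isclose(det2(a + b, b), det2(a, b))
# (Delta 2): scaling one column scales det_2
assert np.isclose(det2(lam * a, b), lam * det2(a, b))
# (Delta 3): normalization on the standard basis
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert np.isclose(det2(e1, e2), 1.0)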


Proof For n = 3: It is interesting and useful to prove the existence of det_3 using the
well-known iterative expression for A = [a_1 a_2 a_3] = [a b c]:

    det_3 A = α^1 det_2(b̄, c̄) − β^1 det_2(ā, c̄) + γ^1 det_2(ā, b̄),   (7.21)

with
    [a b c] := [ α^1  β^1  γ^1 ]                 [ā b̄ c̄] := [ α^2  β^2  γ^2 ]
               [ α^2  β^2  γ^2 ]       and                  [ α^3  β^3  γ^3 ].
               [ α^3  β^3  γ^3 ]

We have again to show that the axioms (Δ1), (Δ2), and (Δ3) are valid:

    Δ3 :  is clear, det_3(1_3) = 1.

    Δ2 :  We have, for example, for the second column

          det(a, λb, c) = det [ α^1  λβ^1  γ^1 ]
                              [ α^2  λβ^2  γ^2 ]
                              [ α^3  λβ^3  γ^3 ],

          det(a, λb, c) = α^1 det_2(λb̄, c̄) − λβ^1 det_2(ā, c̄) + γ^1 det_2(ā, λb̄)
                        = α^1 λ det_2(b̄, c̄) − λβ^1 det_2(ā, c̄) + γ^1 λ det_2(ā, b̄).

Using Eq. (7.21), we obtain

    λ det(a, b, c).

This proves Δ2.

What still needs to be shown is that axiom Δ1 (the shear invariance) is also valid.
We prove this in the case of the first column as the other two cases are similar:
det_3(a + b, b, c) = det_3(a, b, c), using det_2(ā + b̄, b̄) = det_2(ā, b̄):

    det(a + b, b, c)
      = (α^1 + β^1) det_2(b̄, c̄) − β^1 det_2(ā + b̄, c̄) + γ^1 det_2(ā + b̄, b̄)
      = α^1 det_2(b̄, c̄) + β^1 det_2(b̄, c̄) − β^1 det_2(ā, c̄) − β^1 det_2(b̄, c̄) + γ^1 det_2(ā, b̄)
      = α^1 det_2(b̄, c̄) − β^1 det_2(ā, c̄) + γ^1 det_2(ā, b̄)
      = det_3(a, b, c).

So the existence for n = 3 is also proven.


We now proceed to the proof of the existence of the determinant. For the proof of
the existence of det_n, we use induction with respect to n. We start with an appropriate
generalization of Eq. (7.21). Assume det_{n−1} satisfies axioms Δ1, Δ2 and Δ3.

Proof Proof of Proposition 7.3 for n ∈ N.

    det_n(a_1, …, a_n) := α^1_1 det_{n−1}(b̌_1, b_2, …, b_n) − α^1_2 det_{n−1}(b_1, b̌_2, …, b_n) + ⋯
                          + (−1)^{n+1} α^1_n det_{n−1}(b_1, …, b_{n−1}),

using
    (a_1, a_2, …, a_n) = [ α^1_1  α^1_2  …  α^1_n ]
                         [  b_1    b_2   …   b_n  ]      with b_1, …, b_n ∈ K^{n−1},

where (b_1, …, b̌_i, …, b_n) indicates the list (b_1, …, b_n) but omitting b_i. We see that
det_n is linear in every column:

    α^1_1 det_{n−1} B_1 − α^1_2 det_{n−1} B_2 + ⋯ + (−1)^{n+1} α^1_n det_{n−1} B_n

with

    B_1 = (b̌_1, b_2, …, b_n),  B_2 = (b_1, b̌_2, b_3, …, b_n),  …,  B_n = (b_1, …, b_{n−1}),

so Δ2 is valid since linearity contains the homogeneity (Δ2). In order to show that
Δ1 is valid as well, we proceed in an analogous way as in the case of n = 3 and we
can show, e.g., that

    det_n(a_1 + a_2, a_2, …, a_n) = det(a_1, …, a_n).                   (7.22)

Δ3 is evidently also valid: det_n(1_n) = 1.

So the existence is also proven for n ∈ N. ∎


We therefore consider the existence of the determinant established.
Additionally, we have proven that .Δ is a multilinear map with respect to the
columns. Written in a symbolic way, we have the property

    (Δ4)  Δ(a_r + a_s) = Δ(a_r) + Δ(a_s),
          Δ(λ a_r) = λ Δ(a_r).

A more precise formulation of multi-linearity is given by the following definition:


Δ is column-multilinear if for every .i ∈ I (n) and fixed .a1 , . . . ai−1 , ai+1 , . . . an , the
.
map
. x → Δ(a1 , a2 , . . . , ai−1 , x, ai+1 . . . , an )

is linear. In the following remark, we discuss the question of multilinearity as example


for .n = 3, using the Leibniz formula.

Remark 7.8 Column linearity.

The expression of Eq. (7.21) has the form

    det_3 A = ε_{ijk} α^i β^j γ^k.

So every factor is linear in α, β, γ separately; for example, (α + ζ)βγ = αβγ +
ζβγ, and β ↦ λβ leads to λ ε_{ijk} α^i β^j γ^k. This means that det_3 A (as well as det_2 A)
has in addition the following property:

    Δ4 :  det_3 is linear in every column, which of course includes the axiom Δ2.

This leads to a further definition of determinants which is very common in the
literature.

7.3 Second Definition of the Determinant

The definition is given by the following two axioms:

    (D1)  Δ : K^{n×n} → K,  A ↦ Δ(A),  is multilinear with respect to the columns,
    (D2)  Δ(A) = 0 if rank(A) < n.

The property D2 is also called alternating.

Remark 7.9 The definitions .(D1, D2) and .(Δ1, Δ2) are equivalent.

Proof

From (Δ1, Δ2) it follows that Δ is multilinear, so D1 holds.
Axiom D2 follows from Proposition 7.1 (i), so we have (Δ1, Δ2) ⇒ (D1, D2).
Conversely, axiom Δ2 follows from D1, since homogeneity follows from the linearity of Δ.
From D2 and the multilinearity (D1), since rank(b, b, …) < n gives
Δ(b, b, …) = 0, we have for example

    Δ(a + b, b, …) = Δ(a, b, …) + Δ(b, b, …)   and therefore   Δ(a + b, b, …) = Δ(a, b, …).

So Δ1 follows and we have (D1, D2) ⇒ (Δ1, Δ2). ∎

7.4 Properties of the Determinants

We summarize some of the most important properties of det below. Most properties
follow directly from the axioms Δ1 and Δ2 (or D1 and D2) and the normalization

    det(1_n) = 1   (Δ3).

Interestingly, we do not have to explicitly use the permutation group .(Sn ) at this
level.
(i) The determinant of a matrix is linear in every column. This is equivalent to
the determinant being a multilinear map on .Kn .
(ii) The determinant remains unchanged if we add a linear combination of some
columns to a different column. This corresponds geometrically, as we saw in
Sect. 7.2 and Fig. 7.1, to the shear transformation invariance of the determi-
nant.
(iii) The determinant is zero if the matrix columns are linearly dependent. This is
closely connected with the next property.
(iv) The determinant changes sign if two columns are interchanged. This means
that the determinant is an alternating multilinear form.
(v) Multiplication law: If .Δ is the normalized determinant (with .Δ(1n ) = 1),
then
.Δ(AB) = Δ(A)Δ(B) if Δ(1n ) = 1.

Proof Let Δ_1 and Δ_2 be determinant functions. From Proposition 7.1 (ii), it
first follows that

    Δ_1(1_n) Δ_2(A) = Δ_2(1_n) Δ_1(A)                                   (7.23)

since Δ_3(A) := Δ_1(1_n) Δ_2(A) − Δ_2(1_n) Δ_1(A) is also a determinant function and, with
Δ_3(1_n) = Δ_1(1_n) Δ_2(1_n) − Δ_2(1_n) Δ_1(1_n) = 0, we have Δ_3 = 0, so that (7.23)
holds.
Setting

    Δ_1(B) := Δ(AB),   which is also a determinant function,
    Δ_1(1_n) = Δ(A),

and using (7.23) for Δ_1 and Δ, we obtain

    Δ(1_n) Δ_1(B) = Δ_1(1_n) Δ(B),
    Δ_1(B) = Δ(A) Δ(B)   and
    Δ(AB) = Δ(A) Δ(B).


(vi) Any determinant function .Δ is transposition invariant:

Δ(AT ) = Δ(A) holds.


.

Proof This follows from the fact that every invertible matrix A is a product of
elementary matrices (see Comments 7.1 and 7.2, and Remark 7.3):

    A = F_1 F_2 ⋯ F_m,

and for every elementary matrix F_j, j ∈ I(m),

    det F_j^T = det F_j

holds (see Comment 7.1). So we have, using the multiplication law,

    det A^T = det(F_1 ⋯ F_m)^T = det(F_m^T ⋯ F_1^T) = det F_m^T ⋯ det F_1^T,
    det A^T = det F_m ⋯ det F_1 = det A.


(vii) The multi-linearity leads to the expression

. det(λA) = λn det A.
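The multiplication law (v), the transposition invariance (vi), and the scaling rule (vii) lend themselves to a quick numerical check; the following minimal NumPy sketch verifies them for random matrices.

import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.random((n, n))
B = rng.random((n, n))
lam = 2.5

det = np.linalg.det
# (v)   det(AB) = det(A) det(B)
assert np.isclose(det(A @ B), det(A) * det(B))
# (vi)  det(A^T) = det(A)
assert np.isclose(det(A.T), det(A))
# (vii) det(lam * A) = lam^n det(A)
assert np.isclose(det(lam * A), lam**n * det(A))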
(viii) For an upper triangular matrix, the determinant function Δ is given by the
product of the diagonal elements:

    Δ [ a_1     ∗  ]
      [     ⋱      ]  = a_1 a_2 ⋯ a_n.
      [ 0      a_n ]

Proof Using those elementary operations which leave the determinant func-
tions invariant, we obtain

    Δ [ a_1     ∗  ]      Δ [ a_1     0  ]
      [     ⋱      ]   =    [     ⋱      ]  = a_1 ⋯ a_n Δ(1_n) = a_1 ⋯ a_n.
      [ 0      a_n ]        [ 0      a_n ]

(ix) Let [ A B ; C D ] be a block matrix with A ∈ K^{r×s}, B ∈ K^{r×(n−s)}, C ∈ K^{(m−r)×s}, and
D ∈ K^{(m−r)×(n−s)}; then the following holds:  det [ A 0 ; 0 D ] = det A det D.

Proof
    If we define Δ(A) := det [ A 0 ; 0 D ], which is a determinant function,
    using Δ(A) = Δ(1_n) det A and Δ(1_n) = det D,
    we obtain det [ A 0 ; 0 D ] = det D det A = det A det D.

(x) Using the same notation as in (ix), the following holds: det [ A B ; 0 D ] = det A det D.

Proof If A is invertible, we have:

    [ A B ]   [ A 0 ] [ 1  A^{-1}B ]
    [ 0 D ] = [ 0 D ] [ 0     1    ],   hence det [ A B ; 0 D ] = det A det D.

(xi) . A is invertible if and only if .det A /= 0.

Proof If . A is invertible, it means that there exists a . B so that . AB = 1n and


det(AB) = det A det B = 1 ⇒ det A /= 0.
.

If .det A /= 0 it means that .rank A = n. Otherwise, .rank A < n, and we


would have .det A = 0.
But .rank A = n means that .ker A = 0 and thus . A is invertible. ∎

(xii) All the properties of the determinant that refer to the columns also hold when
replacing columns with rows.
(xiii) Cofactor expansion concerning the columns (rows).
From a given matrix A ≡ (a_s)_n ≡ (α_{is}), we define various matrices with
respect to the fixed position (i, s):

    A_{is} := (a_1, …, a_{s−1}, e_i, a_{s+1}, …, a_n),

    A(1)_{is} := [ A_11  0  A_12 ]
                 [  0    1   0   ]      where the 1 is in the ith row and the sth column,
                 [ A_21  0  A_22 ]

    Ǎ_{is} := [ A_11  A_12 ]
              [ A_21  A_22 ]            the (n − 1) × (n − 1) matrix

where the ith row and the sth column have been deleted.

Let γ_{is} := det A_{is}; then C := (γ_{is}) is called the cofactor matrix of A.

If we use elementary matrix operations, we see that the entry γ_{is} is given by
the expressions

    γ_{is} = det A_{is} = det(A(1)_{is}) = (−1)^{i+s} det Ǎ_{is}.       (7.24)

Proposition 7.4 The cofactor expansion (Laplace expansion).

The adjunct A^# of A is given by A^# = (α^#_{ij}) where α^#_{ij} = γ_{ji}, or, equiva-
lently, A^# = C^T. Then the cofactor expansion with respect to the columns is
given by

    A^# A = A A^# = (det A) 1_n

or equivalently by

    Σ_{k=1}^n α^#_{ik} α_{ks} = δ_{is} (det A).

Proof The calculation of the components of the matrix A^# A is given by (i fixed)

    Σ_{k=1}^n α^#_{ik} α_{ks} = Σ_{k=1}^n det(a_1, …, a_{i−1}, e_k, a_{i+1}, …, a_n) α_{ks}
                              = det(a_1, …, a_{i−1}, Σ_k e_k α_{ks}, a_{i+1}, …, a_n)
                              = det(a_1, …, a_{i−1}, a_s, a_{i+1}, …, a_n).

    If s ≠ i, we have
        det(a_1, …, a_{i−1}, a_s, a_{i+1}, …, a_n) = 0
    since det(… a_s … a_s …) = 0.
    If s = i, we have
        Σ_{k=1}^n α^#_{ik} α_{ks} = det(a_1, …, a_{i−1}, a_i, a_{i+1}, …, a_n) = det A.

So we have altogether

    Σ_{k=1}^n α^#_{ik} α_{ks} = δ_{is} (det A).  ∎

The corresponding expression for the rows is given by

    Σ_{k=1}^n α_{ik} α^#_{ks} = δ_{is} det A.

Remark 7.10 Laplace expansion of a determinant.

With i = 1 and s = 1, we obtain Σ_{k=1}^n α_{1k} α^#_{k1} = det A, or equivalently

    det A = Σ_{k=1}^n (−1)^{1+k} α_{1k} det Ǎ_{1k}.

This is the recursion formula which was used in the proof of the existence of
det.
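As an illustration, this recursion formula can be implemented directly; in the following Python sketch, det_laplace is an illustrative helper that expands along the first row and is compared with NumPy's determinant.

import numpy as np

def det_laplace(A):
    # Cofactor (Laplace) expansion along the first row:
    # det A = sum_k (-1)^(1+k) * alpha_{1k} * det(A with row 1 and column k deleted)
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for k in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)
        total += (-1) ** k * A[0, k] * det_laplace(minor)
    return total

M = np.random.default_rng(2).random((5, 5))
assert np.isclose(det_laplace(M), np.linalg.det(M))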

7.5 Geometric Aspects of the Determinants

As we saw so far, the determinant gives essential information about .(n × n)-matrices
and subsequently about linear maps between vector spaces of the same dimension,
for example, about linear operators (endomorphism).
In addition, determinants also have a deep geometric significance. We restrict
ourselves to .R- vector spaces to simplify the explanations and support the intuition.
The determinant of an operator. f : V → V measures how this. f changes the volume
of solids in .V . In addition, since .det f is a scalar with positive and negative values,
it also measures how this . f changes the orientation in .V . The determinant by itself
turns out to be essentially a subtle geometric structure that defines the volume and
the orientation in .V .
It is important to note that this is a new geometric structure on .V called a volume
form. It may be, or rather has to be defined directly on an abstract vector space
(a vector space without a scalar product on it). Despite this, if we already have a
Euclidean vector space .V , this induces a specific volume form on .V . Hence, the
volume form is a weaker geometric structure than a scalar product.

We want to demonstrate these ideas in the simplest nontrivial case. We consider the
two-dimensional Euclidean space R² with its standard basis (e_1, e_2). It is understood
that our discussion is also valid for R³, R⁴, …, Rⁿ.
We start with a parallelogram P(a_1, a_2) given by

    a_1 = [ α^1_1 ]          a_2 = [ α^1_2 ]
          [ α^2_1 ]   and          [ α^2_2 ].

The area of P(a_1, a_2) is given by the usual formula. For its square, we have

    volume_2(a_1, a_2)² = ‖a_1‖² ‖a_2‖² sin² α,                         (7.25)

with α the angle α = ∠(a_1, a_2) and ‖a_i‖² = (a_i | a_i), hence

    volume_2(a_1, a_2)² = (a_1 | a_1)(a_2 | a_2) − (a_1 | a_2)².        (7.26)

Applying the linear map f to a_1, a_2 and volume_2, we have from

    f(a_i) = a_μ ϕ^μ_i,   ϕ^s_i ∈ R,   F = (ϕ^s_i),   i, μ, s ∈ I(2),   (7.27)

and

    (f a_1, f a_2) = (a_1, a_2) F,                                      (7.28)

    volume_2(f a_1, f a_2)² = (f a_1 | f a_1)(f a_2 | f a_2) − (f a_1 | f a_2)².   (7.29)

A straightforward calculation gives the very interesting result

    (f a_1 | f a_1)(f a_2 | f a_2) − (f a_1 | f a_2)²
        = (ϕ^1_1 ϕ^2_2 − ϕ^2_1 ϕ^1_2)² ((a_1 | a_1)(a_2 | a_2) − (a_1 | a_2)²).    (7.30)

This is in fact

    volume_2(f a_1, f a_2)² = (det F)² volume_2(a_1, a_2)²,             (7.31)

and with the definition P := P(a_1, a_2) for the parallelogram (a_1, a_2) and P' :=
P(f a_1, f a_2), Eq. (7.31) may be written as

    (volume_2 P')² = (det F)² (volume_2 P)².                            (7.32)
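Equation (7.31) can be checked numerically: the squared area computed from the Gram determinant of Eq. (7.26) scales with (det F)². The following minimal NumPy sketch illustrates this.

import numpy as np

def area_squared(a1, a2):
    # Gram determinant: (a1|a1)(a2|a2) - (a1|a2)^2, cf. Eq. (7.26)
    return a1 @ a1 * (a2 @ a2) - (a1 @ a2) ** 2

rng = np.random.default_rng(3)
a1, a2 = rng.random(2), rng.random(2)
F = rng.random((2, 2))
# (f a1, f a2) = (a1, a2) F, cf. Eq. (7.28): the new columns are those of A F
A = np.column_stack([a1, a2])
fa1, fa2 = (A @ F).T

assert np.isclose(area_squared(fa1, fa2),
                  np.linalg.det(F) ** 2 * area_squared(a1, a2))   # Eq. (7.31)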

There are a few important remarks to be made:


(i) The ratio volume_2 P' / volume_2 P is independent of the dot product in R².
(ii) A nontrivial result is obtained if both (a_1, a_2) and (f a_1, f a_2) are linearly
independent, that is, B := (a_1, a_2) and B' := (f a_1, f a_2) are bases in R² and of
course f is an isomorphism (det F ≠ 0).
(iii) We remember (see Proposition 3.1) that every arbitrary basis C in R² can be
obtained from the standard basis E = (e_1, e_2) by applying an isomorphism g in
R²: C = g(E), or equivalently C = E G with G = g_{CE} = (γ^i_s). This
means that if we fix volume_2(e_1, e_2) = vol_2 ∈ R, then volume_2(c_1, c_2) is given
by

    volume_2(c_1, c_2)² = (det G)² volume_2(e_1, e_2)² = (det G)² vol_2².   (7.33)

To simplify things, we may put volume_2(e_1, e_2) = 1 and we have

    volume_2(c_1, c_2)² = (det G)².                                     (7.34)

We can go one step further and define

. volume2 (c1 , c2 ) := det G. (7.35)

The result is that we may define on .R2 and in every two-dimensional vector
space .V , a signed volume which is completely independent of the presence or
not of a scalar product. Slightly more generally we may write, as example for
.n = 2, using the notation of Sect. 7.2, the following definition:

Definition 7.5 Volume form on .R2 .


A volume form . D is a determinant form on .R2 .

. D: R2 × R2 −→ R,
(a1 , a2 ) |−→ D(a1 , a2 ) := Δ(A)

with . A = [a1 a2 ]. For .Δ◦ (12 ) = 1, we have .Δ◦ = det and

. D◦ (e1 , e2 ) = det(12 ) = 1.

(iv) The interpretation of the signs can be read off from the example

    D∘(e_1, e_2) = det[e_1 e_2] = det [ 1  0 ]  = +1
                                      [ 0  1 ]
and
    D∘(e_2, e_1) = det[e_2 e_1] = det [ 0  1 ]  = −1,
                                      [ 1  0 ]
or more generally

    D∘(a_1, a_2) = det[a_1 a_2] = det A,
    D∘(a_2, a_1) = det[a_2 a_1] = − det A.

The sign of a given volume form characterizes the “standard” or the “non-
standard” orientation of a basis.

(v) From Eq. (7.32), we may write

. volume2 P ' = (det F) volume2 P. (7.36)

In this case, if .det F = −1, then the basis . B ' = ( f a1 , f a2 ) has a different ori-
entation to the basis . B = (a1 , a2 ). This means that the linear map . F changes the
orientation.

So far, we actually used the term orientation in a common way. In the next section,
we are going to focus our attention on a more profound discussion of this term.

7.6 Orientation on an Abstract Vector Space

The last discussion is a good introduction to the notion of orientation on a real vector
space. Orientation is a special structure that can be introduced on an abstract vector
space.
Orientation plays a very important role in both physics and mathematics, and has
a great impact on daily life. The scientists who are confronted with this concept have
a good intuitive understanding of it. Here, we are going to give the precise definition
of it. Section 1.2 and our discussion in Sect. 7.5 will be very helpful.
On a vector space .V , we first consider all the bases. The reason is that bases,
and in particular the relations between them, are the source of additional structures
in an abstract vector space. So we choose a basis . A = (a1 , . . . , an ) and a second
basis . B = (b1 , . . . , bn ). There exist certain relations between them, given by the
determinant of their transition matrix. It is clear that the transition matrix is invertible
and therefore its determinant is nonzero. Here however, we are only interested in
whether this determinant is positive or negative. This determines the equivalence
relation between the set of bases . B(V ) of .V which we call orientation.
We say two bases are orientation equivalent if and only if the determinant of their
transition matrix is positive. In this case, the two bases are consistently oriented. As
we learned in Sect. 1.2, this equivalence relation leads to a class decomposition of
the set of bases, and so to a quotient space consisting of subsets of bases.
Furthermore, it is evident that this quotient space consists only of two elements,
only of two subsets of . B(V ): The bases consistently oriented to the chosen basis . A
(with the positive determinant of the corresponding transition matrix), and the bases
having opposite orientations to the chosen basis . A (with the negative determinant of
the corresponding transition matrix).
The next definition summarizes the above considerations.

Definition 7.6 Orientation as an equivalence relation.

Two bases A = (a_1, …, a_n) and B = (b_1, …, b_n) of a real vector space V
have the same orientation, denoted by “or”, if the transition matrix T, given
by A = BT where T ∈ Gl(n) ⊂ R^{n×n}, has positive determinant.

In this case, we write B or A and we say also that A and B are consistently oriented.
Note that A = BT means equally well

    [a_1 ⋯ a_n] = [b_1 ⋯ b_n] T

or equivalently

    a_i = b_s τ^s_i   with   T = (τ^s_i),  τ^s_i ∈ R,  and  i, s ∈ I(n).


Comment 7.4 or is an equivalence relation.

Proof We need to show that or is (i) reflexive, (ii) symmetric, and (iii) transitive.

(i) A or A,
    since A = A 1_n, so or is reflexive;
(ii) B or A ⇔ A or B,
    since, if A = BT with det T positive, then B = A T^{-1} and det T^{-1} is
    positive, so or is symmetric;
(iii) A or B and B or C ⇒ A or C,
    since, if A = BT and B = C T', then, with det T > 0 and det T' > 0, we
    get A = BT = C T' T, with det(T' T) = det T' det T > 0, so or is transitive.

Remark 7.11 .Gl + (n) < Gl(n).

The subset .Gl + (n) ⊆ Gl(n), defined by .Gl + (n) := {T ∈ Gl(n) : det T >
0}, is a subgroup..


Using this, we can affirm the following for . A, B ∈ B(V ): . A has an equal orientation
with . B if there is some .T ∈ Gl + (n) with . A = BT .


Definition 7.7 The quotient space . B(V )/or.
For a given basis . A = (a1 , . . . , an ) ∈ B(V ), we call the set of bases, given by

. or(A) := {B = (b1 , . . . , bn ) ∈ B(V ) : B or A}

an orientation of .V .
The corresponding quotient space is given by

. B(V )/or := {or(A) : A ∈ B(V )}.

The bases . A and . B above represent the same equivalence class or coset which we
call, as stated, orientation.

Example 7.1 An opposite orientation to a given one.

It is easy to obtain, for example, or(Ā), an orientation opposite to the given
or(A) with A = (a_1, a_2, …, a_n): we take or(Ā) with Ā = (−a_1, a_2, …, a_n)
and we observe that or(Ā) ≠ or(A), since we may write

    (−a_1, a_2, …, a_n) = (a_1, a_2, …, a_n) T'

with

    T' = [ −1  0  ⋯  0 ]
         [  0  1      0 ]
         [  ⋮     ⋱    ]
         [  0  0  ⋯  1 ]

and det T' = −1.


Remark 7.12 The cardinality of . B(V )/or is .2.

From Remark 7.11 and Example 7.1, we can see that we have

. B(V )/or = {[A], [ Ā]}.

This means, as expected and widely known, that there are only two orientations
on a real vector space.

Now, we can also specify what we specifically mean by an oriented vector space.

Definition 7.8 Oriented vector space.


An oriented vector space is the pair .(V, or) with .V a real vector space, with
.dim V = n and an orientation denoted by or given by Definitions 7.6 and 7.7.


Comment 7.5 B(V)/or as an orbit space.

The last discussion on orientation as a quotient space was essentially an
application of Sect. 1.2.
We now come to a pleasant application of Sect. 1.3 on group actions. It turns
out that, using the terminology of that section, the quotient space B(V)/or is at
the same time a right orbit space of the group action of Gl⁺(n) on Gl(n). Taking
into account Remark 7.11, we realize that for g_1 ∈ Gl(n) with, for example,
det g_1 = −1, we can write

    Gl(n) = Gl⁺(n) ∪ g_1 Gl⁺(n)   with   Gl⁺(n) ∩ g_1 Gl⁺(n) = ∅.

We so obtain a disjoint decomposition of Gl(n). As a result, by applying the
terminology of Sect. 1.3, we get the following isomorphism:

    B(V)/or ≅ Gl(n)/Gl⁺(n).

7.7 Determinant Forms

There are many ways of defining the determinant of a matrix. In the past (see Sect.
7.2), we first defined determinants as special functions on the set of square matrices

.Δ : Kn×n → K,
A |→ Δ(A).

In order to make this clear, we used the name determinant function. It was very natural
to consider the same object also as a function of the n columns (a_1, …, a_n) of a given
matrix A = [a_1 … a_n]. This is why we may now use the equivalent definition with
the letter D just for distinction:

    D :  K^n × ⋯ × K^n  −→  K,
         (a_1, …, a_n)  ↦  D(a_1, …, a_n),

with . D multilinear and alternating. We denote the space of determinant forms by

Δn (Kn ) = {D . . . },
.

and we get
. Δn = Δn (Kn ) ∼
= K.

This definition can be used to extend the concept of determinant to the case of an
abstract vector space.V with.dim V = n. In this case, we are talking about multilinear
forms on .V or n-linear forms or determinant forms on .V . The space of determinant
forms on .V are denoted similarly by

Δn (V ) = {D . . . }.
.

The concept of determinant can be extended further in this direction, as we shall


see, also to an endomorphism on .V .
It is essential to notice that all the properties of determinant functions are valid
for determinant forms if we exchange the words column and matrix with the words
vector and endomorphism. In addition, we have to consider the role of permutation
in connection with determinants.

7.7.1 The Role of the Group of Permutations

Permutations are appearing here because the determinants are not only multilinear
but also alternating. This leads further to the explicit form of the determinant known
as the Leibniz formula.
Therefore, it is helpful to summarize some of the relevant properties of the sym-
metric group .(Sn ) and the role of the sign of a given permutation.

Definition 7.9 The sign ε_π of a permutation

    π = (  1    2   ⋯   n  )
        ( π_1  π_2  ⋯  π_n )  ∈ S_n

is −1 if the number Φ(π) of pairs (i, j), i < j, in the list (π_1, …, π_n) for which
(π_j − π_i)/(j − i) is negative is odd, and the sign is +1 otherwise.

That is,
    ε_π = (−1)^{Φ(π)}.                                                  (7.37)

This can also be written as

    ε_π = ∏_{i<j} (π_j − π_i)/(j − i).                                  (7.38)

Using, without proof, the fact that every permutation is a product of a certain number
of transpositions, t(π) ∈ N_0,

    π = τ_1 τ_2 ⋯ τ_{t(π)},

and taking into account that every transposition τ has negative sign ε_τ = −1, we also
have
    ε_π = (−1)^{t(π)}.                                                  (7.39)

We can also show that if we represent π by the matrix P_π, defined as

    P_π := [e_{π_1} ⋯ e_{π_n}] = (δ_{i,π_i}),                           (7.40)

then P is a group homomorphism:

    P :  S_n −→ Gl(n, N_0) < Gl(n),
         π ↦ P_π,

with
    P_{π∘σ} = P_π P_σ.                                                  (7.41)

The sign of π is now given by

    ε_π = det(P_π).                                                     (7.42)

The above homomorphism shows that the sign ε is also a group homomorphism:

    ε :  S_n −→ Z_2 ≅ {+1, −1}

with
    ε_{π∘σ} = ε_π ε_σ.                                                  (7.43)

Comment 7.6 The permutation sign.

    ε_π = (−1)^{Φ(π)} = (−1)^{t(π)} = det(P_π).                         (7.44)

Here, we summarize the various definitions of the permutation sign.
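The characterizations of the permutation sign collected in Eq. (7.44) can be compared directly; the following Python sketch computes ε_π both by counting inversions, Eq. (7.37), and as det(P_π), Eq. (7.42).

import numpy as np
from itertools import combinations

def sign_by_inversions(p):
    # (-1)^(number of pairs i < j with p[j] < p[i]), cf. Eq. (7.37)
    inv = sum(1 for i, j in combinations(range(len(p)), 2) if p[j] < p[i])
    return (-1) ** inv

def sign_by_matrix(p):
    # P_pi = [e_{pi_1} ... e_{pi_n}], cf. Eqs. (7.40) and (7.42)
    n = len(p)
    P = np.zeros((n, n))
    for col, row in enumerate(p):
        P[row, col] = 1.0
    return int(round(np.linalg.det(P)))

pi = [2, 0, 3, 1]   # a permutation of {0, 1, 2, 3} in zero-based notation
print(sign_by_inversions(pi), sign_by_matrix(pi))   # both give -1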



7.7.2 Determinant Form and Permutations

Coming back to the determinant forms on V (dim V = n), we write

. D: Vn −→ K
(v1 , . . . , vn ) |−→ D(v1 , . . . , vn ) ∈ K,

with . D .n-linear (.n-multilinear) and alternating.


We denoted the space of determinant forms by

.Δn (V ) = {D . . . }.

One of the most important results concerning permutations is the relation

. D(vπ1 , . . . , vπn ) = επ D(v1 , . . . , vn ) (7.45)

which describes the alternating property of . D.


By repeating essentially the results of Sect. 7.2 for the determinant function Δ,
we obtain altogether the following equivalent conditions for an n-linear form D to
have the alternating property.
(i) . D(. . . vi . . . , v j . . . ) = −D(. . . v j . . . vi . . . ),
(ii) . D(. . . vi . . . , vi . . . ) = 0,
(iii) . D(. . . , vi = 0, . . . , . . . ) = 0,
(iv) if .(v1 , . . . , vn ) are linearly dependent, then . D(v1 , . . . , vn ) = 0 ,
(v)
. D(vπ1 , . . . , vπn ) = επ D(v1 , . . . , vn ), π ∈ Sn . (7.46)

As in Sect. 7.2, we obtain here again the result that the set of n-linear alternating
forms is a one-dimensional vector space:

    Δ_n(V) ≅ K.                                                         (7.47)

Furthermore, if we use the multilinearity of D and the above relation (v), we obtain
the explicit expression for the determinant: the very important Leibniz formula.

7.7.3 The Leibniz Formula for the Determinant

Proposition 7.5 The Leibniz formula.

Let A = (α^i_s) be a matrix. Then

    det A = Σ_{π∈S_n} ε_π α^{π_1}_1 α^{π_2}_2 ⋯ α^{π_n}_n.              (7.48)

Proof Using the multilinearity of the determinant, we get

    det A = det(a_1, …, a_n),   a_i ∈ K^n;

for i_1, …, i_n ∈ I(n), we may write

    det A = det(e_{i_1} α^{i_1}_1, e_{i_2} α^{i_2}_2, …, e_{i_n} α^{i_n}_n),
    det A = det(e_{i_1}, e_{i_2}, …, e_{i_n}) α^{i_1}_1 ⋯ α^{i_n}_n.

Using the relation (v) in Eq. (7.46), we obtain immediately

    det A = Σ_{π∈S_n} ε_π α^{π_1}_1 α^{π_2}_2 ⋯ α^{π_n}_n.

In Eq. (7.47), we may see immediately that every nontrivial determinant form,
also called for good reasons volume form or simply volume, can be used as a basis
of .Δn (V ).
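The Leibniz formula (7.48) can be implemented literally, although its n! terms make it practical only for small n; the following Python sketch compares such an implementation with NumPy's determinant.

import numpy as np
from itertools import permutations, combinations

def sign(p):
    inv = sum(1 for i, j in combinations(range(len(p)), 2) if p[j] < p[i])
    return (-1) ** inv

def det_leibniz(A):
    # det A = sum over permutations pi of eps_pi * prod_s A[pi(s), s], cf. Eq. (7.48)
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[p[s], s] for s in range(n)])
               for p in permutations(range(n)))

M = np.random.default_rng(4).random((4, 4))
assert np.isclose(det_leibniz(M), np.linalg.det(M))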

7.8 The Determinant of Operators in V

Initially, the determinant of an operator in .V is a basis dependent extension from


the determinant of a given matrix representation of an endomorphism (operator)
. f ∈ Hom(V, V ) to the endomorphism . f itself. This definition is reasonable since it
turns out that it is in fact nonetheless basis independent.

Definition 7.10 The Determinant of an operator.


For a given endomorphism . f ∈ Hom(V, V ) with a matrix representation . FB ,
with respect to a basis . B ∈ B(V ), the determinant of . f is given by:

. det f := det FB .

Now, we have to show that this definition is well defined, that it is basis independent
which will justify the above notation as .det f .
In order to check the basis independence, we choose a second basis .C ∈ B(V )
and we expect to show that
. det FC = det FB . (7.49)

This follows from the commutative diagram given below, in an obvious notation, with
T = T_{CB}:

                  F_C
          K^n_C ------> K^n_C
            ^             ^
          T |             | T
            |             |
          K^n_B ------> K^n_B
                  F_B

This shows instantly that . FC ◦ T = T ◦ FB or, equivalently, expressed by matrix


multiplication:
−1
. FC = T FB T . (7.50)

We therefore obtain from Eq. (7.50), taking the determinant and using the rules
mentioned in Sect. 7.4,

    det F_C = det(T F_B T^{-1}) = det T det F_B det T^{-1}
            = det T det F_B (det T)^{-1} = det F_B.

This shows that Definition 7.10 is well defined.
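Numerically, this basis independence is the invariance of the determinant under similarity transformations, Eq. (7.50); the following minimal NumPy check illustrates it.

import numpy as np

rng = np.random.default_rng(5)
n = 4
F_B = rng.random((n, n))                 # representation of f in the basis B
T = rng.random((n, n)) + n * np.eye(n)   # a well-conditioned change of basis T = T_CB
F_C = T @ F_B @ np.linalg.inv(T)         # Eq. (7.50)

assert np.isclose(np.linalg.det(F_C), np.linalg.det(F_B))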


An elegant geometric definition of .det f is obtained by the pullback . f ∗ of a given
determinant form . D:

. f ∗ D(v1 , . . . , vn ) := D( f v1 , . . . , f vn ).

This may also be written as . f ∗ D = D ◦ ( f × . . . × f ).


The determinant is a function on the endomorphisms of .V denoting .(End(V ) ≡
Hom(V, V )):

. det : End(V ) −→ K,
f | −→ det f (7.51)

given by the relation


. f ∗ D = (det f )D (7.52)

which means

    D(f v_1, …, f v_n) = (det f) D(v_1, …, v_n),                        (7.53)

or, equivalently
D( f v1 , . . . , f vn )
. det f = . (7.54)
D(v1 , . . . , vn )

This definition reveals the geometric character of the determinant of a given endo-
morphism . f ∈ End(V ) ≡ Hom(V, V ). Equations (7.53) and (7.54) show that .det f
characterizes the scaling of the . f -transformation on any volume . D in .V .

Summary

The determinant, alongside the identity, the exponential function, and a few others,
is one of the most important maps in mathematics and physics. Therefore, like in
most books on linear algebra, we dedicated an entire chapter to determinants.
Initially, we adopted an algebraic approach for defining and understanding the
properties of determinants. Here, elementary matrices and elementary row operations
were the primary tools. Then, we delved into the geometric aspects of determinants.
In doing so, we introduced and extensively discussed the concept of orientation.
The role of permutations and the permutation group was also introduced in a
suitable manner.
Finally, the concept of determinants was extended to operators, not just matrices,
and we presented the corresponding geometric interpretation.

Exercises with Hints

Elementary matrices induce column and row operations (see Remark 7.2). In
connection with the proof of Theorem 5.1, it follows by construction that for
instance elementary column operations do not affect the column rank of a
matrix. The following Exercise 7.1 shows that they do not affect the row rank
either. The same applies when we replace column by row.

Exercise 7.1 Use the result of Exercise 6.2 to show that elementary column opera-
tions on a given matrix . A ∈ Km×n do not affect the row rank of . A.

We now apply the above Exercise 7.1 to prove once more the following theo-
rem.

Exercise 7.2 The row—column—rank theorem


Use the fact that elementary column operations affect neither the column nor the row
rank, and that, similarly, elementary row operations affect neither the row nor the column
rank of a matrix, to show that the row rank of a matrix A ∈ K^{m×n} equals its column
rank.

For the next six exercises one can find proofs in the literature which generally
differ from the ones given in this chapter. The reader is asked to choose our
proofs, or to try to find different proofs.

Exercise 7.3 Show that the set of determinant functions on a vector space .V , or
equivalently the set of determinant forms is a vector space of dimension one.

Exercise 7.4 Let Δ_1 and Δ_2 be two determinant functions, Δ_1, Δ_2 : K^{n×n} → K, and
A ∈ K^{n×n}. Show that

    Δ_1(1_n) Δ_2(A) = Δ_2(1_n) Δ_1(A).

Exercise 7.5 Use the above Exercise 7.4 to prove the following exercise.
Let .Δ be a determinant function with .Δ(1n ) = 1 and . A, B ∈ Kn×n . Show that

Δ(AB) = Δ(A)Δ(B).
.

Exercise 7.6 Let .Δ be a determinant function and . A ∈ Kn×n . Show that

Δ(AT ) = Δ(A).
.

Exercise 7.7 The determinant of block diagonal matrices.

Let A ∈ K^{s×s}, D ∈ K^{r×r}, and s + r = n. Show that when

    [ A  0 ]
    [ 0  D ]  ∈ K^{n×n},

then
    det [ A  0 ]  = det A det D.
        [ 0  D ]

Exercise 7.8 The determinant of upper block triangular matrices.

Let A ∈ K^{s×s}, B ∈ K^{s×r}, D ∈ K^{r×r}, and s + r = n. Show that when

    [ A  B ]
    [ 0  D ]  ∈ K^{n×n},

then
    det [ A  B ]  = det A det D.
        [ 0  D ]

Exercise 7.9 On the geometric interpretation of determinants.

Let f be a linear map f ∈ Hom(R², R²) given by

    f(a_i) = a_μ ϕ^μ_i,

with
    a_i ∈ R²,  ϕ^μ_i ∈ R,  F = (ϕ^s_i),  i, s, μ ∈ I(2),

so we can write

    [f(a_1) f(a_2)] = [a_1 a_2] F.

Prove

    vol_2(f a_1, f a_2)² = (f a_1 | f a_1)(f a_2 | f a_2) − (f a_1 | f a_2)²,

and hence prove

    vol_2(f a_1, f a_2)² = (det F)² vol_2(a_1, a_2)².

Exercise 7.10 Let π be a permutation, π ∈ S_n. Show that π is a product of trans-
positions. Apply induction according to r(π), where r(π) is the maximal
r ∈ {0, 1, …, n} with the property π(i) = i for i = 1, …, r. If
r(π) = n, we get π = id, and this is an empty product of transpositions.

Exercise 7.11 Let . Pπ = [eπ1 . . . eπn ] = (δiπi ) be a permutation matrix . Pπ ∈ Rn×n .


Show that . P is a group homomorphism

. P: Sn −→ Gl(n, Z) < Gl(n)


π |−→ Pπ

with . Pπ◦σ = Pπ Pσ .
Note that if we compare the entries of the matrix . Pπ with the squares of a chess-
board, and set on every entry .1 a rook, the zeros correspond exactly to the area of
activity of the rook. Therefore we could call . P the chess-representation of the group
. Sn .

Exercise 7.12 Check that for a matrix

    A = [ α  β ]  ∈ R^{2×2}   with det A ≠ 0,
        [ γ  δ ]

    A^{-1} = (1 / det A) [  δ  −β ]
                         [ −γ   α ].

References and Further Reading

1. S. Axler, Linear Algebra Done Right (Springer Nature, 2024)


2. S. Bosch, Lineare Algebra (Springer, 2008)
3. G. Fischer, B. Springborn, Lineare Algebra. Eine Einführung für Studienanfänger. Grundkurs
Mathematik (Springer, 2020)
4. S.H. Friedberg, A.J. Insel, L.E. Spence, Linear Algebra (Pearson, 2013)
5. K. Jänich, Mathematik 1. Geschrieben für Physiker (Springer, 2006)
6. N. Johnston, Introduction to Linear and Matrix Algebra (Springer, 2021)
7. M. Koecher, Lineare Algebra und analytische Geometrie (Springer, 2013)
8. G. Landi, A. Zampini, Linear Algebra and Analytic Geometry for Physical Sciences (Springer,
2018)
9. P. Petersen, Linear Algebra (Springer, 2012)
10. S. Roman, Advanced Linear Algebra (Springer, 2005)
11. G. Strang, Introduction to Linear Algebra (SIAM, 2022)
Chapter 8
First Look at Tensors

In Sect. 3.5, we discussed in very elementary terms an origin of tensors. Yet there
are many ways to introduce tensors. Some of them are very abstract. We believe that
the most direct way, which is also very appropriate for physics, is to use bases of

V and V∗. This definition is, of course, basis-dependent, but as we have often seen

thus far, this is not a disadvantage. In this chapter, we shall proceed on this path and
leave a basis-independent definition for later (see Chap. 14).
Thus far, we have gained considerable experience with bases. We already used
the simplest possible tensors (e.g. scalars, vectors, and covectors) with their indices
on various occasions. Therefore, it is very instructive to summarize first what we
already know from linear algebra about the use of indices and tensors. This leads us
to the following section.

8.1 The Role of Indices in Linear Algebra

One important application (one may even say chief application) of bases, is they
enable to represent abstract vectors with coordinates, and do concrete calculations
with them. One can say, that bases “give indices to vectors” since we label bases
vectors with an order. It turns out that indices, as they are used in linear algebra,
and in particular in this book, are quite helpful since they give additional informa-
tion about the properties and structures of the mathematical objects they are related
to. We accomplish this using the Einstein Summation Convention and with some
straightforward conventions we add. In this way, we obtain valuable indices which
we call smart indices.
There are two different kinds of indices corresponding to the vectors in .V and the
covectors in.V ∗ . These two possibilities appear clearly and efficiently when written as
upper or down indices. This meets precisely with the Einstein Convention. Regardless
of the Einstein Convention, using indices up or down is much more functional than

writing them left or right, as is usually done in the mathematical literature and not
seldom in the physics literature.
For the sake of simplicity, we consider a real vector space .V with dimension .n
and its dual .V ∗ , and we are going to summarize our experience with indices in linear
algebra till now. Our primary purpose is to revise some subjects and show typical
examples of how the indices enter the various expressions. Thus, we also see the
positive influence of the chosen conventions on a good understanding of the path to
a given expression.

Example 8.1 Change of basis.

We start with two different bases in .V .

. B = (br ) = (b1 , . . . , bn ) and C = (ci ) = (c1 , . . . , cn ),

together with their dual bases

. B ∗ = (β s ) = (β 1 , . . . , β n ) and C ∗ = (γ j ) = (γ 1 , . . . , γ n ).

. B ∗ and .C ∗ are the corresponding bases in .V ∗ dual to . B and .C:


    β^s(b_r) = δ^s_r   and   γ^j(c_i) = δ^j_i.

We use an obvious notation throughout the examples, and we demonstrate how


easily and directly the indices themselves lead to the change of relations of
bases. We take .r, s, i, j ∈ I (n), then .v ∈ V and .ξ ∈ V ∗ are given by

    v = b_r v^r_B = c_i v^i_C   and                                     (8.1)
    ξ = ξ^B_s β^s = ξ^C_j γ^j,                                          (8.2)

with the corresponding coefficients given by.vrB , vCi , ξsB , ξ Cj ∈ R. For the change
of basis we use a regular matrix

. T = TC B = (τsi ) ∈ Gl(n),

with its inverses


. T −1 ≡ T̄ = (τ̄ir ) τri , τ̄ir ∈ R.

The change of basis is given by

    b_r = c_j τ^j_r   or   c_i = b_r τ̄^r_i,   and                      (8.3)
    γ^i = τ^i_s β^s   or   β^r = τ̄^r_i γ^i.                            (8.4)
The corresponding coefficients are given by

    v^i_C = τ^i_s v^s_B   and   ξ^C_i = ξ^B_s τ̄^s_i.                   (8.5)

Note that the indices .r, s correspond to the bases . B and . B ∗ and the indices .i, j
to the bases .C and .C ∗ . We use different kinds of indices for different bases even
when they correspond to the same vector space. This distinction is not usual in
the literature, but it prevents confusion. In connection with this, we point out that
coefficients (components) of vectors have upper indices and coefficients of covectors
have lower indices. However, vectors themselves have lower indices and covectors
themselves upper indices. This is, of course, consistent with the Einstein Convention
and in the usual matrix formalism also leads to the following expressions:
    v→_B = [v^1_B … v^n_B]^T,   ξ_B = [ξ^B_1 … ξ^B_n],
    v→_C = [v^1_C … v^n_C]^T,   ξ_C = [ξ^C_1 … ξ^C_n],                  (8.6)

    v = B v→_B = C v→_C   and   ξ = ξ_B B∗ = ξ_C C∗,                    (8.7)

    B := [b_1 ⋯ b_n],   B∗ := [β^1 … β^n]^T,
    C := [c_1 ⋯ c_n],   C∗ := [γ^1 … γ^n]^T,                            (8.8)

    v→_B, v→_C ∈ R^n,   ξ_B, ξ_C ∈ (R^n)∗,   v ∈ V,   ξ ∈ V∗.           (8.9)

Note that by the notation ξ_B we express explicitly that ξ_B is a row. Furthermore,



this choice is related to the transformation behavior of basis vectors, cobasis elements
(covectors), the coefficients of vectors, and the coefficients of covectors as indicated
in Eqs. (8.3) (8.4) and (8.5). We may express this very important information in the
suggestive expressions in Eqs. (8.9), (8.10) and (8.11) below.
The upper indices are transformed by a matrix .T = (τri ), and the lower indices
are transformed by a matrix .T −1 ≡ T̄ = (τ̄ir ) which we may express symbolically:

( · )i = τsi (·)s ,
. (8.10)

. ( · )i = (·)s τ̄is . (8.11)


If we want to write ξ_B and ξ_C as columns,

    ξ→_B := [ξ^1_B … ξ^n_B]^T   and   ξ→_C := [ξ^1_C … ξ^n_C]^T,

with ξ^s_B = ξ^B_s and ξ^i_C = ξ^C_i, then we find within the matrix formalism

    v→_C = T v→_B   and   ξ→_C = (T^{-1})^T ξ→_B.                       (8.12)

This also leads to an invariant expression which is very important, not only in physics:

    (·)_s (−)^s = (·)_i (−)^i = invariant.                              (8.13)
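A quick NumPy sketch of Eqs. (8.12) and (8.13): transforming vector components with T and covector components with (T^{-1})^T leaves the pairing ξ_s v^s unchanged.

import numpy as np

rng = np.random.default_rng(6)
n = 3
T = rng.random((n, n)) + n * np.eye(n)   # an invertible change of basis

v_B = rng.random(n)           # components of a vector v in the basis B
xi_B = rng.random(n)          # components of a covector xi in the cobasis B*

v_C = T @ v_B                             # Eq. (8.12)
xi_C = np.linalg.inv(T).T @ xi_B          # Eq. (8.12)

# Eq. (8.13): the pairing is invariant under the change of basis
assert np.isclose(xi_B @ v_B, xi_C @ v_C)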

Note that writing the symbol .ξ to indicate the row with entries .ξs ∈ R : ξ =

[ξ1 · · · ξn ], we may also write .ξ ≡ ξ B ≡ ξ B ∈ (Rn )∗ and .v→ ≡ v→B ≡ v B ∈ Rn too.
∼ ∼
We would like to reiterate that we mostly use the symbol “≡” to indicate that we use
different notations for the same objects. We admit that the use of .v→B and .ξ B instead

of .v B and .ξ B is a pleonasm.

Comment 8.1 Notation for lists and matrices.

Note that we often identify the list . B = (b1 , b2 , . . . , bn ) with matrix row
(.1 × n) with entries given by vectors .b1 , . . . , bn which we denote by .[B] :=
[b1 b2 . . . bn ], and we denote both by . B. For the cobasis . B ∗ , we analogously
write the symbol . B ∗ for the column (.n × 1-matrix) with entries the covectors
∗ ∗
.β , . . . , β , as in Eq. (8.8). We also apply this to .C, D, . . . and .C , D , . . . .
1 n

Sometimes this kind of flexibility in the notation is quite useful to avoid


possible confusion.

We are now coming to the next important illustration of using smart indices, by
summarizing again our results concerning the representation of linear maps.

Example 8.2 Representation of linear maps.

We consider the map f ∈ Hom(V, V') with dim V' = m and the basis B' =
(b'_ρ) = (b'_1, …, b'_m), ρ ∈ I(m). Suppose f is given by the equations

    f(b_r) = b'_ρ ϕ^ρ_r.                                                (8.14)

We define the matrix F by

    F = f_{B'B} = (ϕ^ρ_r) ∈ R^{m×n}.                                    (8.15)

Then, using a suggestive notation, we have a commutative diagram:

                  f
            V --------> V'
            |           |
        ψ_B |           | ψ_{B'}
            v           v
          R^n_B ------> R^m_{B'}
                  F

The corresponding coefficients (v^r_B, w^ρ_{B'} ∈ R), taking v ∈ V, w ∈ V', and
f(v) = w, are given by

    w = b'_ρ w^ρ_{B'}.                                                  (8.16)

We choose our indices: r for v, ρ for w, ϕ^ρ_r for F, and we write v^r, w^ρ ∈ R.
This leads directly to the expression:

    w^ρ_{B'} = ϕ^ρ_r v^r_B.                                             (8.17)

This is, of course, also the result of the direct calculation, which in the matrix
formalism takes the well-known form:

    w→_{B'} = F v→_B,                                                   (8.18)

or even

    w→ = f(v)→   or   w→ = F v→,

as mostly used in physics.

Our last example is a change of basis for the representation of the map . f .

Example 8.3 Change of basis for the representation of . f ∈ Hom(V, V ' ).

Here, the change of basis leads us immediately to the correct result with the
help of our smart indices. In contrast to the matrix formalism, where this is
less straightforward, especially if you are not very familiar with commutative
diagrams.
Now, we would like to obtain the representation of f relative to the new bases
C and C', as indicated in the following diagram:

                  f
            V --------> V'
            |           |
        ψ_C |           | ψ_{C'}
            v           v
          R^n_C ------> R^m_{C'}
                  F̃

with
    F̃ = f_{C'C} = (ϕ̃^α_i),   α ∈ I(m).                                 (8.19)

We take
    C' = (c'_α) = (c'_1, …, c'_m)

and
    w = b'_ρ w^ρ_{B'} = c'_α w^α_{C'},                                  (8.20)

with
    T' = T_{C'B'} = (τ'^α_ρ),

    w^α_{C'} = τ'^α_ρ w^ρ_{B'}.                                         (8.21)

In accordance with the convention concerning the choice of indices, we obtain
the matrix F̃ in a straightforward manner by following the right indices. Since
we want to have the coefficients ϕ̃^α_i of F̃ in terms of the coefficients ϕ^ρ_r of F, we
can write:

    ϕ̃^α_i = τ'^α_ρ ϕ^ρ_r τ̄^r_i.                                        (8.22)

We have taken into account that the indices i and α correspond to the new bases
C and C', and the indices r and ρ to the bases B and B'. Doing so, the result is
uniquely determined. The corresponding expression in the matrix formalism is
given from Eq. (8.22) by

    F̃ = T' F T^{-1}.                                                    (8.23)

This result was also found in Sect. 5.5, “Linear Maps and Matrix Representa-
tions”.
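The index computation (8.22) and the matrix formula (8.23) express the same statement; the following NumPy sketch, with arbitrarily chosen dimensions, checks this using einsum.

import numpy as np

rng = np.random.default_rng(7)
n, m = 3, 4
F = rng.random((m, n))                        # phi^rho_r : row index rho, column index r
T = rng.random((n, n)) + n * np.eye(n)        # T = T_CB, change of basis in V
T_prime = rng.random((m, m)) + m * np.eye(m)  # T' = T_C'B', change of basis in V'
T_bar = np.linalg.inv(T)

# Eq. (8.22): phi-tilde^alpha_i = tau'^alpha_rho  phi^rho_r  tau-bar^r_i
F_tilde_indices = np.einsum('ap,pr,ri->ai', T_prime, F, T_bar)
# Eq. (8.23): F-tilde = T' F T^{-1}
F_tilde_matrix = T_prime @ F @ T_bar

assert np.allclose(F_tilde_indices, F_tilde_matrix)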

8.2 From Vectors in V to Tensors in V

To justify the path leading from vectors to tensors, both intuitively and correctly, we
need to explain the notion of a free vector space. This will also help to understand
better our approach from vectors to tensors via the explicit use of bases. In addition,
this will show that both tensors and their indices are something quite natural and, in
a way, inevitable.
Starting with an abstract vector space .V , we may also choose a basis that is simply
a list of .n = dim V linearly independent vectors in .V .
We may start now with an arbitrary list of arbitrary objects, that is, elements.
Using a list of arbitrary objects in any given set as a basis, we can formally also

determine a vector space called a free vector space. This brings us to the following
definition:

Definition 8.1 A free vector space.

For a given set S = {s(1), …, s(n)} of cardinality n, the free R-vector space
on S is
    RS := {s_i v^i : s_i := s(i), v^i ∈ R, i ∈ I(n)}.

It is a vector space with dim(RS) = n.

We can show immediately that .RS is a vector space and we write .V = RS. The
set we started with, . S = {s(1), . . . , s(n)}, is a basis of .V so that .dim V = n and
we may change the notation and write . B = S, taking . B = (b1 , . . . , bn ) with .b1 :=
s(1), . . . , bn := s(n).
Repeating the above construction for a new set . S(2) := {s(i, j) := bi b j , i, j ∈
I (n)}, we obtain a new vector space .T 2 V := RS(2) with the basis . B2 = S(2) =
{bi j := bi b j }. We call .T 2 V a tensor space of rank 2. This is the simplest nontrivial
tensor space over .V . It is also clear that .T 2 V is given by

. T 2 V = span S(2) = {v i j bi b j : v i j ∈ R}.

For any fixed k ∈ N_0 = {0, 1, 2, …}, we generalize this construction, T^k V = RS(k),
as follows:
    T^0 V := R   and   T^1 V := V.

So we have, in general, T^k V := RS(k) with S(k) given by

    S(k) = {b_{i_1} b_{i_2} ⋯ b_{i_k} : i_1, …, i_k ∈ I(n)}.

It is clear that S(k) is a basis of T^k V and that dim T^k V = n^k. T^k V is given by

    T^k V = {v^{i_1…i_k} b_{i_1} b_{i_2} ⋯ b_{i_k} : v^{i_1…i_k} ∈ R}.

An expression like .i 1 i 2 . . . i k is usually called a multiindex.


It is reasonable to think when we go from .bi ∈ B to .bi b j that a kind of multiplica-
tion is in action. We may write .bi b j = bi ⊗ b j with .⊗ a symbol of a product which
is called tensor product. This way we may generally write

. T k V = {v i1 ...ik bi1 ⊗ bi2 . . . ⊗ bik : v i1 ...ik ∈ R}.

This is also called a contravariant tensor of rank .k or .k-tensor. Since every con-
travariant tensor of rank .k can be written as a linear combination of tensor products
of vectors, we can also justify the following expression for .T k V :
    T^k V = V ⊗ ⋯ ⊗ V   (k times).

It is evident that the same construction can also be made for .V ∗ . So we have . B ∗ =
(β 1 , . . . , β n ) and

. T k V ∗ = {vi1 ...ik β i1 ⊗ . . . ⊗ β ik : vi1 ...ik ∈ R}.

So we have analogously to the above expression for .T k V ∗ :

    T^k V∗ = V∗ ⊗ ⋯ ⊗ V∗   (k times).

This is called covariant tensor of rank .k or .k-tensor. The expressions “contravariant”


and “covariant” to .T k V and .T k V ∗ are purely historical nomenclatures. It is clear
that here,

    S∗(k) := {β^{i_1} β^{i_2} ⋯ β^{i_k} : i_1, i_2, …, i_k ∈ I(n)},

is a basis of .T k V ∗ .
In our construction, there are no restrictions and no additional properties required of the
set S for it to be a basis of a vector space. Hence we can also have the set

. S = S(l, k) = {bi1 . . . bik β j1 . . . β jl : k, l ∈ N0 and i 1 , . . . , i k , j1 . . . jl ∈ I (n)}

as a basis of a vector space which we denote by .Tlk V . We thus obtain another tensor
space given by

    T^k_l V = {v^{i_1…i_k}_{j_1…j_l} b_{i_1} ⊗ ⋯ ⊗ b_{i_k} ⊗ β^{j_1} ⊗ ⋯ ⊗ β^{j_l}},

which we call a mixed tensor space of type (l, k). We can also write

    T^k_l V = V ⊗ ⋯ ⊗ V (k times) ⊗ V∗ ⊗ ⋯ ⊗ V∗ (l times).

A tensor A ∈ T^k_l V is an element of a vector space (tensor space); as stated above,
its representation relative to the bases (B, B∗) of (V, V∗) is given by

    A = α^{r_1…r_k}_{s_1…s_l} b_{r_1} ⊗ ⋯ ⊗ b_{r_k} ⊗ β^{s_1} ⊗ ⋯ ⊗ β^{s_l}.

In the bases (C, C∗) of (V, V∗), the corresponding representation is given by

    A = α^{i_1…i_k}_{j_1…j_l} c_{i_1} ⊗ ⋯ ⊗ c_{i_k} ⊗ γ^{j_1} ⊗ ⋯ ⊗ γ^{j_l}.



For the corresponding coefficients of A in the bases B and C, we may write

    A_B := (α^{r_1⋯r_k}_{s_1⋯s_l})   and   A_C := (α^{i_1⋯i_k}_{j_1⋯j_l}).

Using the notation given in Eqs. (8.1) and (8.2), the change of basis b_r = c_i τ^i_r leads,
for the coefficients of A, to the transformation given, as expected, by

    α^{i_1⋯i_k}_{j_1⋯j_l} = τ^{i_1}_{r_1} τ^{i_2}_{r_2} ⋯ τ^{i_k}_{r_k}  α^{r_1 r_2⋯r_k}_{s_1 s_2⋯s_l}  τ̄^{s_1}_{j_1} τ̄^{s_2}_{j_2} ⋯ τ̄^{s_l}_{j_l}.
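For a concrete case, the coefficients α^i_j of a type-(1,1) tensor transform with one factor of τ and one factor of τ̄; the following NumPy sketch illustrates the k = l = 1 case of the general law using einsum.

import numpy as np

rng = np.random.default_rng(8)
n = 3
T = rng.random((n, n)) + n * np.eye(n)   # tau^i_r, the change of basis b_r = c_i tau^i_r
T_bar = np.linalg.inv(T)                 # tau-bar^s_j

A_B = rng.random((n, n))                 # coefficients alpha^r_s of a (1,1)-tensor in basis B

# alpha^i_j = tau^i_r alpha^r_s tau-bar^s_j  (the k = l = 1 case of the general law)
A_C = np.einsum('ir,rs,sj->ij', T, A_B, T_bar)

# the same statement in matrix form
assert np.allclose(A_C, T @ A_B @ T_bar)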

Symmetric and antisymmetric tensors are special kinds of tensors. They are essen-
tial for mathematics, especially for differential geometry, and everywhere in physics.
We restrict ourselves here to the covariant tensors since similar considerations apply
to contravariant tensors too. For mixed tensors, a similar approach is not really rele-
vant.
Symmetric tensors are tensors whose coefficients, in any basis, stay unchanged under
any interchange of indices. Antisymmetric (totally antisymmetric) or alternating tensors
are tensors whose coefficients change sign when interchanging any pair of indices.

Definition 8.2 Symmetric .k-tensors.

A.k-tensor.τ is symmetric if its coefficients.τi1 ···i k , in any basis, are unchanged


by any permutation of the indices .i 1 , . . . , i k .

In physics, we have two prominent examples of symmetric tensors: the metric tensor
g_{μν} and the energy-momentum tensor T_{μν}.

Definition 8.3 Alternating tensors.

A k-tensor τ is alternating if its coefficients τ_{i_1⋯i_k}, in any basis, change
sign when interchanging any pair of indices:

    τ_{i_1⋯i_a⋯i_b⋯i_k} = − τ_{i_1⋯i_b⋯i_a⋯i_k}.

There are two other equivalent definitions:

– τ is alternating if, whenever two indices are equal, as in τ_{i_1⋯i_a⋯i_a⋯i_k}, then
  τ_{i_1⋯i_a⋯i_a⋯i_k} = 0;
– τ is alternating if for any permutation of the indices π ∈ S_k, the relation
  τ_{π(i_1)⋯π(i_k)} = sgn(π) τ_{i_1⋯i_k} holds (sgn(π) ≡ ε_π).

The most prominent example of an alternating tensor in mathematics and physics,


is the volume form in the corresponding dimensions.

Summary

This chapter concludes the section of the book that we consider elementary linear
algebra.
Here, we summarized and reiterated our notation for linear algebra and tensor
formalism. This notation primarily involves the systematic selection of indices and
their positioning, whether upper or lower, in the entries of the representation matrices
of linear maps.
Several advantages of this notation were mentioned, and multiple examples facil-
itated the reader’s understanding of using these “smart indices”, as we prefer to call
them.
Following that, we presented our second elementary introduction to tensors. This
introduction, like the one in Chap. 3, is dependent on the basis but represents a certain
generalization concerning the dimension and the rank of the tensor space considered
and the corresponding tensor notation.
Chapter 9
The Role of Eigenvalues and Eigenvectors

This is one of the most important topics of linear algebra. In physics, the experimental
results are usually numbers. In quantum mechanics, these numbers correspond to
eigenvalues of observables which we describe with special linear operators in Hilbert
spaces or in finite-dimensional subspaces thereof. Eigenvalues are also relevant for
symmetries in physics. Eigenvalues and eigenvectors help significantly to clarify the
structure of operators (endomorphisms). We recall that an operator in linear algebra
is a linear map of a vector space to itself. We denote the set of operators on .V by
.End(V ) ≡ Hom(V, V ).
In this chapter, after some preliminaries and definitions, we will discuss the role
of eigenvalues and eigenvectors of diagonalizable and nondiagonalizable operators.
For any given operator . f on a .C vector space, we get to a corresponding direct sum
decomposition of .V , using only very elementary notions. This decomposition leads
to the more refined Jordan decomposition of .V . First, we consider the situation on
an abstract vector space without other structures. Later, in Chap. 10, we shall discuss
vector spaces with an inner product structure.

9.1 Preliminaries on Eigenvalues and Eigenvectors

The first step towards understanding a part of the structure of a given operator
f ∈ End(V ), was presented in Chap. 6. After choosing a basis, we obtained a decom-
.
position:
. V = ker f ⊕ coim f.

This also shows the general direction we have to choose to investigate the structure
of any arbitrary operator in .V . We need to find a finer decomposition of .V induced
by the operator . f . We hope to find a list of subspaces,


Ui where i ∈ I (ω) for some ω ∈ N


.

and a direct sum decomposition of .V ,

. V = U1 ⊕ · · · ⊕ Ui ⊕ · · · ⊕ Uω

in such a way that each .Ui is as small as possible, that is, their dimensions are as
small as possible. The restriction of . f to every such subspace must of course be an
endomorphism : . f |U j ∈ End(U j ). In other words, every .U j must be . f invariant.
The subspaces .ker f and .im f are, indeed, both . f invariant subspaces. The most
we can expect in this situation, is for every . f invariant subspace .Ui to be one-
dimensional. Such one-dimensional. f invariant subspaces lead to the specific scalars,
the eigenvalues of . f , and special vectors, the eigenvectors of . f . These are special
characteristics and geometric properties of every operator . f . Furthermore, if, for
example, the operator . f is connected with an observable, as in quantum mechanics,
then the eigenvalues correspond to the results that the experiments produce. These
have to be compared with the theoretical results given from the calculations of such
eigenvalues.
However, to clearly understand what is going on, we must consider the most
general case without the inner product (metric) structure. This is also justified because
eigenvalues and eigenvectors are independent of any isometric structure.
We cannot expect that every operator . f will induce such one-dimensional direct
sum decompositions of .V . The existence of this fundamental property of an operator
. f is connected with the diagonalization problem we shall discuss below. Diagonal-

izable operators are a pleasant special case from the mathematical point of view.
Fortunately, the most physically relevant operators are diagonalizable too. Almost
all diagonalizable operators in physics are so-called normal operators, defined only
on inner product vector spaces (see Sect. 10.5).
One more comment has to be made. The theory of eigenvalues and eigenvectors
differs when considering a complex or real vector space. The formalism within a
complex vector space seems more natural and straightforward than within a real
vector space. It should be clear that both real and complex vector spaces, are equally
relevant and essential in physics.
In what follows, we start with the theory of eigenvalues and eigenvectors on a
.K—vector space for .K ∈ {C, R} and we may think in most cases of a .C—vector
space and only when there is a difference to the .R—vector space formulation, we
shall comment on that appropriately.

9.2 Eigenvalues and Eigenvectors

As already mentioned, the theory of eigenvalues and eigenvectors depends on the


field .K = {C, R}. For example, we shall see that the real and complex versions of the
spectral theorem are significantly different (see Chaps. 10 and 11). Since the theory
with complex vector spaces is easier to deal with, we generally think of complex
vector spaces. If there is a difference, we bear this difference in mind when we
restrict ourselves to the real vector space framework.
We are led to the notion of eigenvalues and eigenvectors if we think of the smallest
possible nontrivial subspace.U of.V . Then.U is of course a one-dimensional subspace
and is determined by a nonzero vector .u (i.e., .u /= 0), so we have .U = span(u) =
{αu : α ∈ K}. We would like .U to also be consistent with the operator . f . That is, we
would like .U to also be . f - invariant,

. f (U ) < U.

It follows that the equation


. f (u) = λu

should be valid. Further, if we take .v ∈ U with .v = αu, α /= 0, we have

. f (v) = f (αu) = α f (u) = αλu = λαu = λv.

This leads to the fundamental definition :

Definition 9.1 Eigenvalue and eigenvector.


For a given operator . f ∈ End(V ), a number .λ ∈ K is called eigenvalue of . f
if there exists a nonzero vector .v ∈ V \{0} so that . f (v) = λv. The vector .v is
called an eigenvector of . f and corresponds to the eigenvalue .λ. We may call
the pair (.λ, v) an eigenelement.

It is important to realize from the beginning that it is the eigenelement, the pair
(.λ, v), which is uniquely defined. If the eigenvalue .λ is given, there are always many
eigenvectors belonging to this .λ: all the nonzero .u ∈ U = span(u) = {αu : α ∈ K}.
Furthermore, it is also possible that some other vector .w ∈ V \U exists which fulfills
the same eigenvalue equations as above.

. f (w) = λw.

This leads us to an . f -invariant subspace of .V called the .λ- eigenspace.

Definition 9.2 Eigenspace . E(λ, f ).


Let . f ∈ End(V ) and .λ ∈ K. The eigenspace of . f corresponding to the eigen-
value .λ denoted by . E(λ, f ) is given by

. E(λ, f ) = {v ∈ V : f (v) = λv}



or equivalently
. E(λ, f ) = ker( f − λidV ).

In other words, . E(λ, f ) is the set of all eigenvectors corresponding to .λ, together
with the vector .0, so that . E(λ, f ) is a vector space.
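Before turning to the examples, here is a minimal numerical sketch (our own illustration, not part of the text; it assumes Python with NumPy, and the matrix F below is a hypothetical example) of how eigenelements (λ, v) can be computed and checked against the eigenvalue equation f(v) = λv:

import numpy as np

# A concrete operator on K^2, represented by a (hypothetical) matrix.
F = np.array([[2.0, 1.0],
              [0.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(F)   # columns of 'eigenvectors' are eigenvectors

for k in range(len(eigenvalues)):
    lam = eigenvalues[k]
    v = eigenvectors[:, k]
    # Check the eigenvalue equation F v = lambda * v for the eigenelement (lam, v).
    assert np.allclose(F @ v, lam * v)
    print("eigenelement:", lam, v)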

9.3 Examples

9.3.1 Eigenvalues, Eigenvectors, Eigenspaces

Example 9.1 . f = idV

Eigenvectors Eigenvalue
all .v ∈ V \ {0} .λ=1

because . f v = idV v = 1 v.

Example 9.2 . f ∈ Hom(V, V ) arbitrary

Eigenvectors Eigenvalue
.kerf \ {0} .λ=0

because . f v = 0 v.
Hence, the eigenspace . E(λ = 0, f ) = ker f .

Example 9.3 Projections


We consider . f ∈ Hom(V, V ) given by the relation

. f 2 = f. (9.1)

Eigenvectors Eigenvalue
.im f \ {0} .λ1 =1
.ker f \ {0} .λ2 = 0

and .V has the decomposition:

. V = E(1, f ) ⊕ E(0, f )
= im f ⊕ ker f.

We prove this as follows:


Proof Let .v be an eigenvector with eigenvalue .λ so that

. f v = λv.    (9.2)

Thus, we have

. f²v = f (λv) = λ²v.    (9.3)

Using Eqs. (9.1) and (9.2), we obtain

. f v = λ2 v ⇒ λv = λ2 v. (9.4)

Since.v /= 0, we get.λ = λ2 . This leads to the two eigenvalues.λ1 = 0 and.λ2 =


1. The operator . f is diagonalizable and thus we are led to a decomposition
(spectral decomposition) of .V and . f : Since . f 2 = f , we have also .ker f 2 =
ker f and by Proposition 3.12, we get:

. V = ker f ⊕ im f. (9.5)

The decomposition of.V = ker f ⊕ im f gives a decomposition of the identity


into projections as follows. With the decomposition of .id ≡ idV ,

id = f + (id − f ).
. (9.6)

We now show that .id − f is also a projection operator:

.Since (id − f ) ◦ (id − f ) = id ◦ id + f ◦ f − id ◦ f − f ◦ id


= id + f 2 − 2 f
= id + f − 2 f
= id − f.

Hence .(id − f ) ◦ (id − f ) = id − f and .id − f is a projection.

For λ = 0, we have
. f v = 0v and so E(λ = 0) ≡ E(0) = ker f.
For λ = 1, we have fv = v and so E(λ = 1) ≡ E(1) = im f.

Using the fact that.ker f = im(id − f ) and.im f = ker(id − f ), we can write


alternatively for Eq. (9.5)

. V = im(id − f ) ⊕ im f = E(0) ⊕ E(1). (9.7)

If we use for both projection operators, the eigenvalue notation . P0 := id − f


and . P1 := f , we again get from Eqs. (9.5) and (9.7)

. V = im P0 ⊕ im P1 , (9.8)

and for Eq. (9.6)


id = P0 + P1 ,
. (9.9)

and for the spectral decomposition of . f ≡ P1 , the “trivial” relation

. f = 0 P0 + 1 P1 . (9.10)

This is what is generally expected of a diagonalizable operator. A further example


of an exemplary character as well, especially useful for various symmetries in physics,
is the following one.

Example 9.4 Involutions


We consider . f ∈ Hom(V, V ) given by the relation

. f 2 = id. (9.11)

We define the projection . P := ½(idV + f ) and we have

Eigenvectors Eigenvalues
.im P \ {0} .λ1 =1
.ker P \ {0} .λ2 = −1

and .V has the decomposition

. V = E(1, f ) ⊕ E(−1, f ).

We prove this as follows:


Proof We try, as in Example 9.3, to determine a connection to projections.
Using the relation (9.11) with the eigenvalue equation

. f v = λv and f 2 v = λ2 v,

we obtain .v = λ2 v which leads to .1 = λ2 and to .λ1 = 1 and .λ2 = −1. Taking


into account the relation (9.11), we define

. P := ½(id + f )

which is a projection operator since:

. P² = ½(id + f ) ◦ ½(id + f ) = ¼{id + 2 f + f²}
     = ¼{id + 2 f + id}
     = ½(id + f ).
So P² = P.

We show that .im P = E(λ = 1, f ):


Let .v ∈ E(1), then we have . f v = v and we obtain:

. Pv = ½(id + f )v = ½v + ½ f v = ½v + ½v = v,

hence .v ∈ im P.
Let .v ∈ im P, then we have . Pv = v and we obtain:

. Pv = v ⇔ ½(id + f )v = v ⇔ ½v + ½ f v − v = 0 ⇔ ½ f v − ½v = 0 ⇔ f v = v

such that .v ∈ E(1). So we established

. E(1) = im P. (9.12)


Similarly, one can show that . E(−1, f ) = ker P.
Using the procedure of Example 9.3, we see that we can write .ker P =
im(id − P) and we get . E(−1) = im(id − P) such that we have the direct
sum decomposition of .V :

. V = E(1) ⊕ E(−1). (9.13)

Using Eq. (9.12) and the above results, we define. P1 := P and. P−1 := id − P.
The spectral decomposition of . f is given by

. f = 1 P1 + (−1)P−1 . (9.14)
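As a quick numerical sketch of this example (our own illustration, not the book's; Python with NumPy assumed, and the concrete reflection below is a hypothetical choice), one can take an involution f, form P := ½(id + f), and verify the properties used above as well as the spectral decomposition (9.14):

import numpy as np

# A concrete involution: a reflection with f^2 = id (cf. Example 9.9 with phi = pi/3).
phi = np.pi / 3
f = np.array([[np.cos(phi),  np.sin(phi)],
              [np.sin(phi), -np.cos(phi)]])
I = np.eye(2)
assert np.allclose(f @ f, I)                      # involution: f^2 = id

P1 = 0.5 * (I + f)                                # projection onto E(1, f)
Pm1 = I - P1                                      # projection onto E(-1, f)

assert np.allclose(P1 @ P1, P1)                   # P1 is idempotent
assert np.allclose(P1 @ Pm1, np.zeros((2, 2)))    # "orthogonal" idempotents: P1 Pm1 = 0
assert np.allclose(P1 + Pm1, I)                   # decomposition of the identity
assert np.allclose(f, 1 * P1 + (-1) * Pm1)        # spectral decomposition, Eq. (9.14)
print("spectral decomposition of the involution verified")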

Example 9.5 Nilpotent operators


Let . f ∈ Hom(V, V ) be nilpotent. We consider a subspace .U ≡ U ( f, v0 ) of
.V with .v0 ∈ V and . f m v0 = 0 with . f m−1 v0 /= 0. The subspace .U is given by

.U = span(v0 , f v0 , f 2 v0 , . . . , f m−1 v0 ). (9.15)

We define .bs := f s−1 v0 , s ∈ I (m). By definition, . f is a special nilpotent


operator and . B = (b1 , . . . , bm ) is a basis of .U (see Exercise 9.1). We have
. f bμ = bμ+1 for .μ ∈ I (m − 1) and

. f bm = 0.

Thus .λ = 0 is the only eigenvalue of . f and . E(0) = ker f (see also Definition
9.12, Lemma 9.4, and Proposition 9.6). In this example, there is no basis of
eigenvectors and so. f is not diagonalizable. As we shall see later, in Theorem
(9.4), the fact that . f is a nilpotent operator is not an accident, it is the heart of
non-diagonalizability.

These examples show that the terms eigenvalue, eigenvector, and eigenspace are
very natural ingredients of operators. This applies regardless of how they can be
specifically determined.

9.3.2 Eigenvalues, Eigenvectors, Eigenspaces of Matrices

Example 9.6  A = [ 1 0 ; 0 0 ]  (matrices are written row by row, rows separated by semicolons)

This is a projection, as in Example 9.3. Additionally, we can test directly the
relation A² = A. Further, here A is a symmetric matrix which, as it turns out,
leads to an orthogonal projection on the x-axis:

Aξ→ = A [ ξ¹ ; ξ² ] = [ ξ¹ ; 0 ] = ξ¹ e1 ∈ R1 ≡ e1 R.

We further recognize directly that ker A = R2 ≡ e2 R. So we have im A = R1
and ker A = R2 . Since 1 − A = [ 0 0 ; 0 1 ], we get im(1 − A) = R2 . We write, as
in Example 9.3,

P1 := A  and  P0 := [ 0 0 ; 0 1 ],

and we denote the orthogonal decomposition of A given by R1 and R2
(R1 ⊥ R2 ):

A = 1 P1 + 0 P0 .

Example 9.7  A = ½ [ 1 1 ; 1 1 ]

This is again an orthogonal projection since A² = A and Aᵀ = A hold. There
are two eigenvalues, λ1 = 1 and λ2 = 0, with eigenvectors v1 = [ 1 ; 1 ] and
v0 = [ 1 ; −1 ]:

½ [ 1 1 ; 1 1 ] [ 1 ; 1 ] = [ 1 ; 1 ]   and   ½ [ 1 1 ; 1 1 ] [ 1 ; −1 ] = [ 0 ; 0 ].

We see explicitly that v1 ⊥ v0 . So we get E(1) = Rv1 = im A and E(0) =
Rv0 = ker A.

(Figure: orthogonal projection in the R1 –R2 plane; a vector ξ→ is mapped to Aξ→ on the line Rv1 = E(1), while the direction Rv0 = E(0) is projected to zero.)

Example 9.8  A = [ 1 0 ; 0 −1 ]

We have

[ 1 0 ; 0 −1 ] [ 1 0 ; 0 −1 ] = [ 1 0 ; 0 1 ].

That is, A² = 1 and Aᵀ = A. This is a direct reflection which is also an
involution, see Example 9.9. Here, and in the next example, we see an abstract
involution in a concrete situation:

Aξ→ = A [ ξ¹ ; ξ² ] = [ ξ¹ ; −ξ² ].

So we have explicitly:

[ 1 0 ; 0 −1 ] [ 1 ; 0 ] = [ 1 ; 0 ]   and   [ 1 0 ; 0 −1 ] [ 0 ; 1 ] = [ 0 ; −1 ] = (−1) [ 0 ; 1 ].

There are two eigenvalues, λ1 = 1 and λ2 = −1, with eigenvectors v1 = [ 1 ; 0 ]
and v−1 = [ 0 ; 1 ], and E(1) = R1 , E(−1) = R2 . For the corresponding projections,
we get

P1 : R² −→ E(1),   P−1 : R² −→ E(−1).

This leads to the spectral decomposition of A:

A = 1 P1 + (−1)P−1
[ 1 0 ; 0 −1 ] = 1 [ 1 0 ; 0 0 ] + (−1) [ 0 0 ; 0 1 ]

(Figure: the reflection A maps a vector ξ→ to Aξ→ by reflecting it across the R1 -axis.)

Example 9.9  A = [ cos ϕ  sin ϕ ; sin ϕ  −cos ϕ ]

A is a reflection (involution). Using almost the same notation as in Example
9.8, the eigenvectors are given here by

v1 = [ cos(ϕ/2) ; sin(ϕ/2) ]   and   v−1 = [ cos(ϕ/2 + π/2) ; sin(ϕ/2 + π/2) ].

Formally, we obtain the same result as in Example 9.8:

A = 1 P1 + (−1)P−1 .

Example 9.10  A = [ cos ϕ  −sin ϕ ; sin ϕ  cos ϕ ]

This is a rotation by the angle ϕ. Unless ϕ ∈ πZ, there is no one-dimensional
subspace of R² that stays A-invariant and thus there are neither eigenvalues nor eigenvectors.
Observe that this behavior is only because we are working over .R, and it is
quite different over .C!

Example 9.11  A = [ 0 1 0 ; 0 0 1 ; 0 0 0 ]

A is a nilpotent matrix:

A² = [ 0 0 1 ; 0 0 0 ; 0 0 0 ],   A³ = [ 0 0 0 ; 0 0 0 ; 0 0 0 ].

By inspection, we see that there is only one eigenvalue, λ = 0, with eigenvector
v0 = e1 = [ 1 ; 0 ; 0 ] and eigenspace

E(0) = ker A = R1 = Re1 .

Definition 9.3 Geometric multiplicity.


The dimension .n λ of . E(λ, f ) is called the geometric multiplicity of the eigen-
value .λ.

In what follows, we consider . f fixed and write . E λ ≡ E(λ, f ). As we see, the restric-
tion of . f to . E λ < V, f | Eλ acts only as a multiplication by .λ, the simplest nontrivial
action an operator can do. It is interesting to notice that for .λ1 /= λ2 , it follows that
. E λ1 ∩ E λ2 = {0} so that . E λ1 and . E λ2 are linearly independent (see Definition 3.12).
This means that the eigenvectors corresponding to distinct eigenvalues are linearly
independent. This is shown in the following proposition.

Proposition 9.1 Linear independence of eigenvectors.


Let . f ∈ End(V ). If .λ1 , . . . , λr are distinct eigenvalues of . f , then correspond-
ing eigenvectors .v1 , . . . , vr are linearly independent.

Proof Suppose that the eigenvectors.v1 , . . . , vr are linearly dependent. This will lead
to the contradiction that one of them must be zero. (By definition, every eigenvector
is a nonzero vector.) Since the list .(v1 , . . . , vr ) is by assumption linearly dependent,


one of the vectors in this list must be a linear combination of the preceding vectors.
Let .k be the smallest such index. This means that the vectors .v1 , . . . , vk−1 are linearly
independent. So .vk is a linear combination of the preceding vectors:

vk = αμ vμ ,   μ ∈ I (k − 1),  αμ ∈ K.    (9.16)

Acting by f , we obtain

f (vk ) = f (αμ vμ ) = αμ ( f vμ ),    (9.17)

=⇒ λk vk = αμ λμ vμ .    (9.18)

On the other hand, we multiply Eq. (9.16) by .λk and we have

λk vk = λk αμ vμ = αμ λk vμ .    (9.19)

Subtracting Eq. (9.18) from (9.19), we obtain

0 = αμ (λμ − λk )vμ .    (9.20)

Since .(λμ − λk ) /= 0 for all μ ∈ I (k − 1) and the list .(v1 , . . . , vk−1 ) is linearly independent, it follows that

.αμ = 0 ∀μ ∈ I (k − 1). (9.21)

Equation (9.16) shows that .vk = 0, this is in contradiction to the fact that .vk is an
eigenvector and therefore nonzero. This completes the proof and the list .(v1 , . . . , vr )
is linearly independent. ∎

From the above proposition, we can directly deduce the following corollary.

Corollary 9.1 Direct sum of eigenspaces.


If the eigenvalues .(λ1 , . . . , λr ) of the operator . f are distinct, then
(i) the sum of the eigenspaces is direct:

. E λ1 + · · · + E λr = E λ1 ⊕ · · · ⊕ E λr ;

(ii) the following inequality is valid:


dim E λ1 + · · · + dim E λr ≤ dim V.

Corollary 9.2 Number of distinct eigenvalues.


The operator . f ∈ End(V ) can have at most .n = dim V distinct eigenvalues.

Proof Proof of Corollaries 9.1 and 9.2.


We choose a basis . Bi := (v1(i) , . . . vn(i)i ) of . E λi , i ∈ I (r ). Then the list .(B1 ∪ B2 ∪
. . . ∪ Br ) is a basis of
. W := E λ1 ⊕ · · · ⊕ E λr

since E λi ≤ V , W ≤ V and the above sum is direct, which means that the subspaces
(E λ1 , . . . , E λr ) are linearly independent. This leads directly to the result:

dim W = dim E λ1 + · · · + dim E λr ≤ dim V.

Everything concerning an operator . f ∈ End(V ) is precisely valid for a matrix


. F if it is considered as an endomorphism . F ∈ End(Kn ) and, in particular, if it is
regarded as a representation . F ≡ f B of . f relative to a basis . B. In this sense, the
eigenvalue problem of. f corresponds precisely to the eigenvalue problem of. F ≡ f B .
The following statement shows this.

Proposition 9.2 Representation of eigenvectors.


Given an operator . f ∈ End(V ) and a basis . B = (b1 , . . . , bn ) in .V . Then
.v is an eigenvector of . f corresponding to the eigenvalue .λ if and only if
.φ B (v) =: v B ∈ Kn is an eigenvector of . F ≡ f B corresponding to the same
eigenvalue .λ. Recall that .φ B is the canonical basis isomorphism between .V
and .Kn , as given by the commutative diagram:

               f
         V ---------> V
         |            |
     φB  |            |  φB
         v            v
         Kn --------> Kn
               F
F

So we have :
. f v = λv ⇔ Fv B = λv B . (9.22)

Proof The above diagram explains everything.

.φB ◦ f = F ◦ φB . (9.23)

Let .(λ, v) be an eigenelement of . f . Then we have the following equivalences:

. f v = λv ⇔ φ B ◦ f (v) = φ B (λv), (9.24)


. ⇔ F ◦ φ B (v) = λφ B (v), (9.25)
. ⇔ Fv B = λv B . (9.26)

This proves the equivalence of Eq. (9.22). ∎

This means that by using matrices, we can obtain everything we want to know
about operators. In particular, the eigenvalues of an operator . f are given by the
eigenvalues of the corresponding representation . f B ≡ F.
How do we find the eigenvalues of a matrix? To answer this, we first notice that
.λ is an eigenvalue of . F if and only if the equation

(F − λ1n ) v→ = 0    (9.27)

has a nontrivial solution or, equivalently, if .(F − λ1n ) is singular or if
.det(λ1n − F) = 0. This leads to the following definition:

Definition 9.4 The characteristic polynomial of a matrix.


The polynomial.χ F (x) = det(x1n − F) is called the characteristic polynomial
of the matrix . F ∈ Kn×n and is given by

.χ F (x) = x n + χn−1 x n−1 + · · · + χ2 x 2 + χ1 x + χ0

with the coefficients .χ0 , χ1 , . . . , χn−1 ∈ K.

The roots of .χ F are the eigenvalues of . F.

Remark 9.1 Criterion for eigenvalues.


A scalar .λ is an eigenvalue of . F if and only if

χ F (λ) = 0.
.

The equation .det(x1 − F) = 0 holds if and only if the matrix .x1 − F is not
invertible, which is equivalent to the statement:

. ker(x1 − F) /= {0}.

This gives us a computational way to calculate the eigenvalues of a matrix . F.


One simply finds the solutions of the eigenvalue equation:

. det[x1 − F] = 0.
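As a small computational aside (our own sketch, not the book's; Python with NumPy assumed), the coefficients of the characteristic polynomial and its roots, i.e. the eigenvalues, can be obtained directly:

import numpy as np

F = np.array([[1.0, 2.0],
              [2.0, 4.0]])            # the matrix that appears later in Example 9.12

chi = np.poly(F)                      # coefficients of chi_F(x) = det(x*1 - F), highest degree first
print("chi_F coefficients:", chi)     # here: [1, -5, 0], i.e. x^2 - 5x

eigenvalues = np.roots(chi)           # the roots of chi_F are the eigenvalues
print("eigenvalues:", eigenvalues)    # approximately 5 and 0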

Definition 9.5 Characteristic polynomial of an operator.


The characteristic polynomial of the operator . f is defined by the same poly-
nomial as above if we take . F ≡ f B for some given basis . B ∈ B(V ):

. χ f (x) := χ f B (x).

This definition is well defined since it is independent of the chosen basis . B. The
following lemma shows this.

Lemma 9.1 Basis invariance.


Let . f ∈ End(V ) and .dim V = n. For the two bases . B, C ∈ B(V ) with the
corresponding basis isomorphisms .φ B and .φC and the representations of . f :

fB = φB ◦ f ◦ φB⁻¹  and  fC = φC ◦ f ◦ φC⁻¹ .

Then .det f C = det f B .

Proof We define T := φC ◦ φB⁻¹ . We have f = φB⁻¹ ◦ fB ◦ φB . This leads to

fC = φC ◦ f ◦ φC⁻¹ = φC ◦ (φB⁻¹ ◦ fB ◦ φB ) ◦ φC⁻¹ ,
fC = T ◦ fB ◦ T⁻¹ .

Then .det f C = det(T f B T −1 ) = det T det f B det T −1 = det f B . ∎

This proves also that .det f := det f B is well defined and that the determinant of
the operator .xidV − f is the characteristic polynomial of the operator . f :

χ f (x) := det(x idV − f ).


.

Note that the characteristic polynomial is a monic polynomial with .deg(χ f ) = n =


dim V .

Definition 9.6 Algebraic multiplicity of an eigenvalue.


If the characteristic polynomial has the form

.χ f (x) = (x − λ)m λ Q(x)

with .Q(λ) /= 0, the exponent .m λ is called the algebraic multiplicity of the


eigenvalue .λ.

As expected, there is a relation between the geometric.n λ = dim E λ and the algebraic
multiplicity .m λ .

Lemma 9.2 Geometric and algebraic multiplicity.


If . f : V → V , .λ is an eigenvalue and .n λ , m λ are the geometric and algebraic
multiplicity respectively, then .n λ < m λ .

Proof We choose a basis of .V ,

. B := (c1 , . . . , cn λ , bn λ +1 , . . . , bm 'λ , bn λ +m 'λ +1 , . . . , bn ),

where .(c1 , . . . , cn λ ) is an eigenbasis of . E λ . The matrix of . f with respect to this basis


is given by the block matrix . f B :
F ≡ fB = [ λ·1nλ  M1 ; 0  M2 ]

In an obvious notation we have

χf (x) = χF (x) = (x − λ)^(nλ) χ2 (x)   and   χ2 (x) = (x − λ)^(m'λ) Q2 (x)   with   Q2 (λ) ≠ 0,

so

χf (x) = (x − λ)^(nλ + m'λ) Q2 (x),   (m'λ ≥ 0).

So we obtain mλ = nλ + m'λ and nλ ≤ mλ . ∎
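As an illustration of Lemma 9.2 (our own numerical sketch, not part of the text; Python with NumPy assumed, and the matrix below is a hypothetical example), take a matrix with a repeated eigenvalue: the algebraic multiplicity is read off from the characteristic polynomial, the geometric multiplicity is dim ker(F − λ1):

import numpy as np

# lambda = 2 has algebraic multiplicity 2 but only a one-dimensional eigenspace.
F = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0
n = F.shape[0]

chi = np.poly(F)                                             # x^2 - 4x + 4 = (x - 2)^2, so m_lambda = 2
geometric = n - np.linalg.matrix_rank(F - lam * np.eye(n))   # dim E_lambda = dim ker(F - lam*1)
print("chi_F coefficients:", chi)
print("geometric multiplicity n_lambda =", geometric)        # 1, hence n_lambda <= m_lambda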



9.3.3 Determining Eigenvalues and Eigenvectors

Example 9.12  F = [ 1 2 ; 2 4 ]

The eigenvalue equation det[x 1 − F] = 0 for the above F is given by
det [ x−1  −2 ; −2  x−4 ] = 0. This leads to

(x − 1)(x − 4) − 4 = 0 ⇔ x² − 5x + 4 − 4 = 0 ⇔ x² − 5x = 0.

So the eigenvalues of F are λ1 = 0 and λ2 = 5. One can calculate an eigenvector
corresponding to λ1 from the matrix equation

[ 1 2 ; 2 4 ] [ ξ¹ ; ξ² ] = 0   or by the system   ξ¹ + 2ξ² = 0 ,  2ξ¹ + 4ξ² = 0.

The solution gives the eigenvector v0 = [ 2 ; −1 ] and the eigenspace E(0) =
ker F = Rv0 . Similarly, for the eigenvalue λ2 = 5 we have

[ 1−5  2 ; 2  4−5 ] ξ→ = 0 ⇔ [ −4  2 ; 2  −1 ] ξ→ = 0.

This leads to the system −4ξ¹ + 2ξ² = 0 ,  2ξ¹ − ξ² = 0, with a solution ξ² = 2ξ¹. An eigenvector
of λ = 5 is given by v5 = [ 1 ; 2 ]. The corresponding eigenspace is given by

E(5) = Rv5 .

We observe that v0 ⊥ v5 since Fᵀ = F.
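A quick numerical cross-check of this example (our own sketch, not the book's; Python with NumPy assumed):

import numpy as np

F = np.array([[1.0, 2.0],
              [2.0, 4.0]])
eigenvalues, eigenvectors = np.linalg.eig(F)
print(np.sort(eigenvalues))                  # approximately [0, 5]

v0 = np.array([2.0, -1.0])                   # eigenvector for lambda = 0
v5 = np.array([1.0,  2.0])                   # eigenvector for lambda = 5
assert np.allclose(F @ v0, 0 * v0)
assert np.allclose(F @ v5, 5 * v5)
assert np.isclose(v0 @ v5, 0.0)              # v0 and v5 are orthogonal since F is symmetric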

Example 9.13  F = [ 1 0 ; 0 0 ]

We already know everything from Examples 9.8 and 9.6. Therefore we now
check the results: χF (x) = det [ x−1  0 ; 0  x ] = (x − 1)x. This leads as expected to
(x − 1)x = 0 and to the eigenvalues λ1 = 0 and λ2 = 1.

Example 9.14  F = ½ [ 1 1 ; 1 1 ]

Here we proceed as in Example 9.13 and determine the eigenvalues of F:

χF (x) = det [ x−½  −½ ; −½  x−½ ] = (x − ½)(x − ½) − ¼
       = x² − ½x − ½x + ¼ − ¼ = x² − x = x(x − 1).

This leads to x(x − 1) = 0 and to the eigenvalues of F, λ1 = 0 and λ2 = 1.

Example 9.15  F = [ 0 1 ; 1 0 ]

F is an involution: F² = 1.
The characteristic polynomial is given by

χF (x) = det [ x  −1 ; −1  x ] = x² − 1.

So we get χF (x) = 0 ⇔ x² − 1 = 0 and the solutions are λ1 = 1 and λ2 = −1.

Example 9.16  F = [ 0 1 0 ; 0 0 1 ; 0 0 0 ]

This is a nilpotent matrix as in Example 9.11.

χF (x) = det [ x  −1  0 ; 0  x  −1 ; 0  0  x ] = x³.

So we have the equation for the eigenvalues

x³ = 0.

The only solution is x = 0 and we have only one eigenvalue, λ = 0, as expected.

9.4 The Question of Diagonalizability

For a linear map . f ∈ Hom(V, V ' ), the problem of diagonalization was solved in
Sect. 3.3 and in Theorem 3.1. To accomplish this, we simply had to choose two
tailor-made bases . B0 and . B0' in .V and .V ' . For an operator . f ∈ End(V ), the situation
is very different and much more difficult. Here, it is natural to look for only one
tailor-made basis . B0 (one vector space, one basis) and we expect (or hope) to obtain
a diagonal matrix
f B0 B0 ≡ f B0 = diag(λ1 , . . . , λs , . . . , λn )   with λs ∈ K.

However, we cannot expect to find a diagonal representation for every operator. This
leads to the diagonalizability question and to the following equivalent definitions.

Definition 9.7 Diagonalizability 1.


A map . f ∈ End(V ) is diagonalizable if there is a basis . B0 of .V so that the
representation . f B0 is a diagonal matrix.

Definition 9.8 Diagonalizability 2.


The endomorphism . f ∈ End V is diagonalizable, if the vector space .V has a
basis .C consisting of eigenvectors of . f .

Proposition 9.3 Definitions 9.8 and 9.7 are equivalent.

Proof If Definition 9.7 holds, f is diagonalizable with a diagonal representative

f B0 = diag(λ1 , . . . , λn ).    (9.28)

Then the entries of this matrix are

f B0 = [ϕ^i_s ]   with   ϕ^i_s = λs δ^i_s .    (9.29)

Suppose the basis . B0 is given by the list .(v1 , . . . , vn ). The values . f vs of the basis
vector .vs are given as usual by the expression

. f vs = vi ϕis = vi λs δsi = vs λs , (9.30)


f vs = λs vs .

This shows that the basis vectors of . B0 are eigenvectors of the map . f . Hence a
tailor-made basis . B0 is an eigenvector basis of . f . This is Definition 9.8.
Conversely, if Definition 9.8 holds,. f has a basis of eigenvectors.C = (c1 , . . . , cn )
and then we have . f cs = λs cs . This means . f C = (λs δsi ), and we see immediately that
f C = diag(λ1 , . . . , λn ).

This is Definition 9.7. That means .C = B0 , an eigenvector basis, is a tailor-made


basis. This completes the proof of the equivalence of Definition 9.7 and Definition
9.8. ∎
The following proposition gives a sufficient condition for the diagonalizability:

Proposition 9.4 Diagonalizability by distinct eigenvectors.


If .V has dimension .n and . f ∈ End(V ) has .n distinct eigenvalues, then . f is
diagonalizable.

Proof From Proposition 9.1 we know that if the eigenvalues .λ1 , . . . , λn are distinct,
then the list of the .n corresponding eigenvectors .v1 , . . . , vn is a linearly independent
set. Since .dim V = n, it follows that .(v1 , . . . , vn ) is a basis of .V . This basis consists
of eigenvectors. So by Proposition 9.3, . f is diagonalizable. ∎

Comment 9.1 Abstract decomposition.

The best way to analyze the structure of an operator . f ∈ Hom(V, V ) is to


decompose .V as finely as possible into . f -invariant subspaces:

. V = U1 ⊕ · · · ⊕ U j ⊕ · · · ⊕ Ur (9.31)

with corresponding endomorphism decomposition . f j := f |U j

. f j : U j −→ U j so that

. f = f 1 ⊕ · · · ⊕ f j ⊕ · · · ⊕ fr . (9.32)

For this purpose, it is useful to first consider an abstract decomposition of .V


without the given operator . f . Furthermore, if later we want to establish the
connection with the operator . f , it is helpful to use a corresponding projection
system (. P1 , . . . , Pr ) when decomposing .V with:

. P j : V −→ U j .

This results in a “complete orthogonal system” of idempotents (projections).


We describe this with the following definition.

Definition 9.9 Abstract decomposition and projection operators.


We call a list of idempotents (projections) (. P1 , . . . , P j , . . . , Pr ) a direct decom-
position of identity if for .i, j ∈ I (r )
(i) . Pi P j = δi j Pi and
(ii) . P1 + · · · + P j + · · · + Pr = idV .
The property (i) says that the list (. P1 , . . . , Pr ) is a linearly independent list of
idempotents. Further, it is called “orthogonal” even though there is no mention
of an inner product space. The property (ii) is a decomposition of the identity.

The following proposition summarizes the two aspects, the direct sum decomposition
of .V and the complete orthogonal system of projection.

Proposition 9.5 Direct sum and direct decomposition of the identity.

(i) If . V = U1 ⊕ · · · ⊕ U j ⊕ · · · ⊕ Ur is a direct sum of the subspaces


.U1 , . . . , Ur , then there is a list .(P1 , . . . , Pr ) of projections in . V with a
direct decomposition of the identity such that for each . j ∈ I (r ), .U j =
im P j .
(ii) Conversely, let . P1 , . . . , Pr be a direct decomposition of identity in .V ,
then there is a direct sum decomposition

. V = U1 ⊕ · · · ⊕ Ur

with .U j = im P j .

Proof (i)
The above direct sum allows to write for .v ∈ V and .u j ∈ U j , j ∈ I (r ):

v = u 1 + · · · + u j + · · · + ur .
.

If we define projectors . P j :

. P j : V −→ U j
v |−→ P j v := u j .

We need to show that .(P1 , . . . , Pr ) is a direct decomposition of identity, that is, that
– . P j is linear;
– . Pi P j = δi j Pi ;
– . P1 + . . . + Pr = id.
(See Exercise 9.4).
Conditions (i) and (ii) of Definition 9.9 hold and so. P1 , . . . , Pr is the corresponding
abstract decomposition. ∎

Proof (ii)
We have to show that for .v ∈ V there is a unique decomposition:

.v = u 1 + · · · + u j + · · · + ur

with .u j ∈ im P j = U j , j ∈ I (r ). The property (ii) leads to

v = idV v = (P1 + · · · + P j + · · · + Pr ) v = P1 v + · · · + Pr v = u 1 + · · · + u r .
.

So the decomposition holds. We show the uniqueness of the decomposition. We start


with the representation :

y1 + · · · + y j + · · · + yr = 0.

With y j ∈ U j , x j ∈ V and y j = P j x j , using the property (i) of Definition 9.9, we
obtain

0 = P j (y1 + · · · + yr ) = P j y j = P j P j x j = P j ² x j = P j x j = y j

for all. j ∈ I (r ). Therefore, the decomposition is unique so (ii) and with it Proposition
9.5 is proven. ∎
We are now in the position to give a geometric characterization of
diagonalizability.

Theorem 9.1 Equivalence theorem of diagonalizability.


If . f ∈ End(V ) and .dim V = n, then the following statements are equivalent.
(i) The map . f is diagonalizable.
(ii) The characteristic polynomial .χ f decomposes into .n linear factors and
the geometric multiplicity is equal to the algebraic multiplicity .(n λ =
m λ ) for all eigenvalues of . f .

(iii) If .λ1 , . . . , λr are all the eigenvalues of . f (assumed distinct), then we


have the direct sum decomposition of .V :

. V = E λ1 ⊕ · · · ⊕ E λr .

(iv) There exists a linearly independent list of idempotents . P1 , . . . , Pr ∈


End(V ) with the following properties:
(a) . Pi P j = 0 for .i /= j and .i, j ∈ I (r );

(b) . P1 + · · · + Pr = idV ;

(c) . f = λ1 P1 + · · · + λr Pr .

According to Definition 9.9, the properties (a) and (b) state that the list.(P1 , . . . , Pr ) is
a direct decomposition of the identity. Assertion (iv) states that there exists a spectral
decomposition of .V induced by the operator . f .

Proof Part (i) .⇒ (ii).


Since . f is diagonalizable, the basis . B0 of eigenvectors of . f is given by the list

. B0 = (B1 , B2 , . . . , Br ) (9.33)
( j) ( j)
with . B j the basis of . E λ j , B j := (b1 , . . . , bn j ) j ∈ I (r ) and

.n j = dim E λ j . (9.34)

Since .V = span(B1 , . . . , Br ),

.n = dim V = n 1 + · · · + n r . (9.35)

Since . f is diagonalizable, we further get from the characteristic polynomial

χ f (x) = (x − λ1 )m 1 · · · (x − λr )m r ,
. (9.36)

the equation
.n = m 1 + · · · + mr . (9.37)

Since .n j ≤ m j , we obtain from Eqs. (9.35) and (9.37) that .n j = m j for all . j ∈ I (r ). This
proves Part (ii). ∎

Proof Part (ii) .⇒ (iii).


As we saw in Sect. 9.2, when .λ1 /= λ2 , the sum of . E λ1 and . E λ2 is direct:

. E λ1 + E λ2 = E λ1 ⊕ E λ2 .

This can be generalized for all the eigenvalues of . f . So we have . E := E λ1 ⊕ · · · ⊕


E λr , a subspace of .V . Since .dim E = n 1 + · · · + n r = dim V = n, it follows that
. E = V and Part (iii) is proven. ∎

Proof Part (iii) .⇒ (i).


If B j = (b1^( j) , . . . , bnj^( j) ) is an eigenvector basis of E λ j , then

B = (b1^(1) , . . . , bn1^(1) , . . . , b1^(r ) , . . . , bnr^(r ) )

is an eigenbasis of E = V . This means that

B = B0   and   f B = diag( λ1 , . . . , λ1 , . . . , λr , . . . , λr ),

with .m 1 the multiplicity of .λ1 and .m r the multiplicity of .λr .


So . f is diagonalizable and Part (i) is proven. ∎

Proof Part (iii) .⇔ (iv)


Suppose assertion (iii) holds. Then there is an . f -invariant decomposition of .V with
subspaces .U j = E λ j . Then Proposition 9.5 implies that there is a direct decompo-
sition of the identity .(P1 , . . . , Pr ) with .im Pi = Ui for each .i ∈ I (r ), which then
satisfies condition (iv)(a) and (iv)(b).
For condition (iv)(c), since this decomposition is . f -invariant, we have . f (U j ) ⊆
U j for all . j ∈ I (r ) and because .U j = E λ j = ker( f − λ j id), we get

. f |U j = λ j idU j = (λ1 P1 + · · · + λr Pr )|U j

for every . j ∈ I (r ). Consequently, we see . f = λ1 P1 + · · · + λr Pr and hence (iv)


holds.
Conversely, suppose we have (iv). Then by Proposition 9.5, we have a decompo-
sition
. V = U1 ⊕ . . . ⊕ Ur

with .Ui = im Pi for each .i ∈ I (r ). Then, since



. f − λ j id = (λ1 P1 + · · · + λ j P j + · · · + λr Pr ) − (λ j P1 + · · · + λ j Pr )
= (λ1 − λ j )P1 + · · · + (λ j−1 − λ j )P j−1 + (λ j+1 − λ j )P j+1 + · · · + (λr − λ j )Pr

and .λi − λ j /= 0 whenever .i /= j, we see

. E λ j = ker( f − λ j id) = ker((λ1 − λ j )P1 + . . . + (λr − λ j )Pr )


⊆ im P j
= Uj

because .(λi − λ j )Pi P j = 0 ∀ i /= j.


Further, if.x ∈ ker( f − λ j id), there is a unique decomposition of.x = u 1 + · · · +
u r with each .u i ∈ Ui and so

0 = ( f − λ j id)(x)
= (λ1 − λ j )u 1 + · · · + (λ j−1 − λ j )u j−1 + (λ j+1 − λ j )u j+1 + · · · + (λr − λ j )u r .

By linear independence, each .(λi − λ j )u i = 0, and thus .u i = 0 whenever .i /= j.


Hence, .x = u j ∈ U j and . E λ j ⊆ U j . Thus

E λ j = U j ,
V = E λ1 ⊕ · · · ⊕ E λr

and we have (iii). Thus, (iii) and (iv) are equivalent. So Theorem 9.1 is proven. ∎
For this theorem, an interesting and clarifying conclusion is given by the following
statement.

Corollary 9.3 : Diagonalizability and direct decomposition.


If . f is diagonalizable, this is equivalent to the direct sum decomposition of .V
in one-dimensional . f -invariant subspaces:
V = V1 ⊕ · · · ⊕ Vn

with dim Vα = 1 for all α ∈ I (n). This corresponds to a decomposition of f
with appropriate repetitions of the eigenvalues λα :

f = λ1 PV1 ⊕ · · · ⊕ λn PVn .

Proof We use the notation of Theorem 9.1 and the proof there. The proof is straight-
forward since we may write .α = ( j, μ) and .Vα = V j,μ := span(bμ( j) )
μ ∈ I (n j ). ∎

Comment 9.2 Diagonalizability of matrices.

All the formalisms mentioned above apply, of course, to matrices. A matrix


.F ∈ Kn×n may also be considered as an operator on the vector space .Kn , F ∈
End(Kn ). In addition, matrices are needed when using basis isomorphisms to
investigate the properties of operators on an abstract vector space .V . In this
sense, the question of diagonalizability is more direct for matrices.
A matrix . F is diagonalizable if there is an invertible matrix . P and a diagonal
matrix . D so that any one of the following equalities is satisfied.

. F = P D P −1 ⇔ D = P −1 F P ⇔ F P = P D. (9.38)

We immediately see from matrix multiplication on column vectors of . P that


the diagonalization of . F is equivalent to the existence of .n linearly independent
eigenvectors of . F, as proven in the Theorem 9.1. The .n columns of . P are the
.n linearly independent eigenvectors of . F and the diagonal entries of . D are the

corresponding eigenvalues of . F.
In other words the matrix . P considered as a list of .n columns (vectors in .Kn )
is an eigenbasis of . F. We can also see it as follows:
For
. P := (c1 , . . . , cn ) and D := diag(λ1 . . . , λn ),

we may write in the same notation

. F P = F(c1 , . . . , cn ) = (Fc1 , . . . , Fcn ) (9.39)

and

P D = [c1 . . . cn ] diag(λ1 , . . . , λn ) = (λ1 c1 , . . . , λn cn ).    (9.40)

Equations (9.38), (9.39), and (9.40) lead to the eigenvalue equation

. Fcs = λs cs ∀s ∈ I (n). (9.41)

Again, we see here immediately that the columns of . P = (c1 , . . . , cn ) have to
be linearly independent, whereas the scalars .(λ1 , . . . , λn ) need not be distinct
from each other. So the diagonal of the diagonal matrix . D is given by

. D = diag(λ1 , . . . , λ1 , λ2 , . . . , λ2 , . . . , λr , . . . , λr ), (9.42)

with .(λ1 , . . . , λr ) distinct .(r < n).
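A minimal numerical sketch of Eq. (9.38) (our own illustration, not the book's; Python with NumPy assumed, and the symmetric matrix below is a hypothetical example): the columns of the matrix P returned by np.linalg.eig form an eigenbasis, and F = P D P⁻¹.

import numpy as np

F = np.array([[2.0, 1.0],
              [1.0, 2.0]])                        # symmetric, hence diagonalizable

eigenvalues, P = np.linalg.eig(F)                 # columns of P: eigenvectors c_1, ..., c_n
D = np.diag(eigenvalues)                          # D = diag(lambda_1, ..., lambda_n)

assert np.allclose(F @ P, P @ D)                  # F P = P D, cf. Eqs. (9.39)/(9.40)
assert np.allclose(F, P @ D @ np.linalg.inv(P))   # F = P D P^{-1}, Eq. (9.38)
print("eigenvalues:", eigenvalues)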



After having discussed the question of diagonalizability, we may now ask what it
is suitable for.
There are a lot of reasons. In physics, we sometimes call it the decoupling proce-
dure, which shows an important aspect. In mathematical terms, when we are looking
for the most straightforward possible representation of an operator, we have to choose
an appropriate basis which we may, in this case, also call a tailor-made basis. It turns
out that this basis consists of eigenvectors. So the diagonalization reveals the true
face of an operator and its geometric properties. In the case where . f corresponds
to a physical observable, the eigenvalues of . f are exactly the physical values of the
experiment. In addition, diagonalization allows a considerable simplification in the
calculations.
At this point, it is instructive to compare the situation between .Hom(V, V ' ) and
.End(V ). In the case of .Hom(V, V ' ), the diagonalization is relatively straightforward

to obtain. As we saw in Sect. 3.3, Theorem 3.1, and Remark 3.3, we have at our
disposal two bases, one in .V and the other one in .V ' . In the case of .End V , it seems
natural to work with one basis since the domain and the codomain are identical,
and the problem is far more complex. This leads to the question of normal form for
endomorphisms. This problem was firstly discussed in Remark 3.4 (On the normal
form of endomorphisms) and Proposition 3.12 where we showed some obstacles to
using diagonalization.
But there is a chance! The space .End(V ) has much more structure than the space
.Hom(V, V ' ). .End(V ) is an algebra: the operator . f can be raised to powers and we can
form linear combinations of powers of . f , in contrast to linear maps in .Hom(V, V ' ).

Comment 9.3 The action of polynomials on the algebra .End(V ).

We have

. f^k := f ◦ · · · ◦ f  (k times) ∈ End(V ).

This also means that we may talk about linear combinations of powers of . f ,
with . f 0 = id.

αm f m + αm−1 f m−1 + · · · + α2 f 2 + α1 f + α0 id.


. (9.43)

As we immediately see, this is connected to a polynomial .ϕ ∈ K[x].

ϕ(x) = αm x m + · · · + α2 x 2 + α1 x + α0 .
. (9.44)

If we consider also the map

. f |−→ ϕ( f ) := αm f m + · · · + α2 f 2 + α1 f + α0 f 0 , (9.45)

we obtain altogether the action of polynomials on the algebra .End(V ) which is


a further example in our discussion in Sect. 1.3:

K[x] × End(V ) −→ End(V )


.

(ϕ, f ) |−→ ϕ( f ). (9.46)

This means for the exploration of the subtle properties that we may use any of the
operators .ϕ( f ) in (9.45) with .ϕ ∈ K[x]. This gives us additional information about
the one operator . f ! In this spirit, we already took advantage of this when we used
the linear polynomials .lλ j ∈ K[x] given by

. lλ j (x) := x − λ j ,

to determine the eigenspace . E λ j by

. E λ j ≡ E(λ j , f ) := ker lλ j ( f ) = ker( f − λ j I dV ). (9.47)

Now, if we continue along this path, we come to very interesting insights about
the operator . f . We consider for example the polynomial . pm (x) = x m and write
.f
m
:= pm ( f ). So if we have to calculate the power of . F ∈ Kn×n , we may use the
diagonalization, as in Eq. (9.38), and we get

. F = P D P −1

and we have
. F² = F F = P D P⁻¹ P D P⁻¹ = P D² P⁻¹

and similarly

. F^m = P D^m P⁻¹ .

Since . D^m = diag(λ1^m , . . . , λn^m ), a perfectly simple expression, we obtain every
power of . F directly.
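A short numerical sketch of this decoupling trick (ours, not the book's; Python with NumPy assumed, with a hypothetical matrix):

import numpy as np

F = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, P = np.linalg.eig(F)
m = 5

Dm = np.diag(eigenvalues ** m)             # D^m = diag(lambda_1^m, ..., lambda_n^m)
Fm_via_diag = P @ Dm @ np.linalg.inv(P)    # F^m = P D^m P^{-1}

assert np.allclose(Fm_via_diag, np.linalg.matrix_power(F, m))
print("F^5 computed via the diagonalization agrees with direct matrix multiplication")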

9.5 The Question of Non-diagonalizability

To understand diagonalizability more thoroughly, we have to consider some aspects
of non-diagonalizability. We start with two examples of matrices, A = [ 0 −1 ; 1 0 ]
and C = [ 0 1 ; 0 0 ], and check whether they are diagonalizable or not.

Example 9.17 Example of non-diagonalizability: A = [ 0 −1 ; 1 0 ].
At first, we consider these matrices as real and then as complex matrices. The
matrix A leads to the following map:

A : R² −→ R² ,  (e1 , e2 ) |−→ (e2 , −e1 ).

So we have for u ∈ R², u' := A u.

(Figure: the rotation by ϕ = π/2 maps the line Ru = uR spanned by u to the line Ru' = u'R spanned by u' = Au; the coordinate axes are R1 = e1 R and R2 = e2 R.)

This map is a rotation by ϕ = π/2. Therefore, there is no one-dimensional linear


subspace that is invariant under the map A. There are therefore no eigenvectors
or eigenvalues, thus the matrix A is not diagonalizable in the real vector space
R². The fact that there are no real eigenvalues also follows from the solution

of the equation:
.χ A (x) = 0, (9.48)

given by the characteristic polynomial


χA (x) = det[x1 − A] = det [ x  1 ; −1  x ] = x² + 1.    (9.49)

Since
χA (x) > 0,    (9.50)

there exist no eigenvalues of this . A ∈ R2×2 . Therefore the set of eigenvalues


of . A is empty. We now consider . A as a complex matrix and the map:

. A: C2 −→ C2
(e1 , e2 ) | −→ (e2 , −e1 ).

The eigenvalues of . A are now determined by Eqs. (9.48) and (9.49). Hence we
obtain the eigenvalues .λ1 = −i and .λ2 = i. The corresponding eigenvectors
are given by:

b1 = [ 1 ; i ]   and   b2 = [ 1 ; −i ].

Thus Ab1 = −i b1 and Ab2 = i b2 . The eigenbasis of A is given by:

B = [b1 b2 ] = [ 1  1 ; i  −i ].

We notice that in a complex vector space, as expected, the eigenvalue equa-


tion .χ A (x) = 0 always has a solution and thus there is always at least one
eigenvector.
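A brief numerical cross-check of this example (ours, not the book's; Python with NumPy assumed): NumPy works over the complex numbers, so it finds the eigenvalues ±i even though the matrix has no real eigenvalue.

import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])                # rotation by pi/2, as in Example 9.17

eigenvalues, B = np.linalg.eig(A)          # over C: the eigenvalues are -i and +i
print("eigenvalues:", eigenvalues)         # approximately [0+1j, 0-1j] (order may differ)

# The eigenvectors are complex; over R there is no eigenvector at all.
for k in range(2):
    assert np.allclose(A @ B[:, k], eigenvalues[k] * B[:, k])
assert np.iscomplexobj(eigenvalues)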

Example 9.18 Example of non-diagonalizability: C = [ 0 1 ; 0 0 ].
For the matrix C = [ 0 1 ; 0 0 ], however, we see that even in a complex vector space
we have:

C : C² −→ C² ,  (e1 , e2 ) |−→ (0→, e→1 ).

There is no way to make the matrix C better, that is, a diagonal one. Here, the
eigenvalue equation

χC (x) = det [ x  −1 ; 0  x ] = x² = 0    (9.51)

provides only a single eigenvalue λ1 = 0, thus the set of eigenvalues is
E W = {λ1 = 0}.
Here, we obviously do not have enough eigenvectors. We conclude for the
moment that if there are not enough eigenvectors to obtain an eigenbasis, then
the corresponding matrix is not diagonalizable.
Further, we obtain for C:

C² = [ 0 1 ; 0 0 ] [ 0 1 ; 0 0 ] = [ 0 0 ; 0 0 ].    (9.52)

We call a matrix . N nilpotent if for some .m ∈ N we have . N m = 0 (see also Def-


inition 9.12: Nilpotent operator, nilpotent matrix). This result tells us that nilpotent
matrices . N , such as the matrix .C in Eq. (9.52), are connected to the question of
non-diagonalizability.
The instrument to investigate non-diagonalizability is again the action of polyno-
mials on operators we introduced in Comment 9.3.

Especially, as we shall see, the action of nonlinear polynomials turns out to be very
helpful again. To proceed, we have to restrict ourselves to complex vector spaces or,
more generally, to operators in.C vector spaces with characteristic polynomials which
decompose into linear factors because here, the theory is much easier. In addition,
this is also the first and most significant step towards the theory of operators to real
vector spaces.
Our aim in this section, is not to develop the whole theory but to give a helpful
idea of what we may expect if the operator . f is nondiagonalizable.
The theory of diagonalizability leads us to the conclusion that, for a nondiagonal-
izable operator, the direct sum of the eigenspaces is not sufficient to decompose the
whole vector space .V . In this case, we obtain strict inequality:
E λ1 ⊕ · · · ⊕ E λr < V    (9.53)

where .r < dim V is the number of distinct eigenvalues.


How can we deal with this problem? We first encountered this situation in Sect.
3.4 and Remark 3.4 (On the normal form of endomorphisms). In Proposition 3.12
(.ker f − im f decomposition), we saw a possible hint for dealing with this problem.
The central point is to first search for the finest possible . f -invariant decomposition
of .V . Here also, . f -invariant subspaces play a central role. Therefore, we consider
the following proposition:

Lemma 9.3 Sequence of increasing zero spaces.


For an operator .g ∈ Hom(V, V ), there is a number .m ∈ N such that the fol-
lowing sequence holds:

. ker g ≤ ker g 2 ≤ . . . ≤ ker g m−1 ≤ ker g m = ker g m+1 = ker g m+2 = . . . .

As we see, at the number .m, there is a certain saturation such that thereafter the
equality follows endlessly.
Proof In the proof of Proposition 3.12, we saw that the inequality .ker g ≤ ker g 2
holds. It is an easy exercise (see Exercise 9.5) that for every power .k ∈ N : ker g k ≤
ker g k+1 . Thus we have the following sequence of inequalities:

. ker g ≤ ker g 2 ≤ . . . ≤ ker g k ≤ ker g k+1 ≤ ker g k+2 ≤ . . . .

So we have to prove that there exists an .m ∈ N so that the following holds:

. ker g < ker g m−1 < ker g m = ker g m+1 = ker g m+2 = . . . .

First note that

if
. x ∈ ker g k+1 \ ker g k ,
then g x ∈ ker g k \ ker g k−1 ,

since

. x ∈ ker g k+1 \ ker g k ⇔ g k+1 x = 0 and g k x /= 0


⇔ g k (g x) = 0 and g k−1 (g x) /= 0
⇔ g x ∈ ker g k \ ker g k−1 .

Conversely,

.if ker g k = ker g k−1 ,


then ker g k+1 = ker g k .

Hence, if .ker g k+1 = ker g k , we see through induction that .ker g N = ker g k ∀N > k.
Then there are two possibilities. Either there is some .m ∈ N such that

. ker g m = ker g m+1 ,

in which case we are done.


Or we have an infinitely increasing sequence

. ker g < ker g² < . . . < ker g^m < ker g^(m+1) < . . .

of vector spaces of ever increasing dimension. But since the dimensions are bounded
above by .dim V = n (as .ker g k ≤ V ∀ k ∈ N), this is not possible. ∎

Remark 9.2 The .m = dim V case.


One can see from the last paragraph of the proof that in fact .m ≤ dim V so
.ker g = ker g
n n+1
.

Remark 9.3 Criterion for non-diagonalizability.


Theorem 9.1 told us that if there are not enough eigenvectors and corre-
spondingly not enough eigenspaces, then there is no diagonalizability. The
eigenspace of a fixed operator . f for a fixed eigenvalue .λ, is given by
. E λ = ker( f − λid V ). This shows the relevance of Lemma 9.3. If we set

.g := f − λidV , then if we have .ker g² /= ker g or, equivalently, if we have
.ker g < ker g² , then we lose diagonalizability. So if

. ker( f − λidV ) < ker( f − λidV )2

holds, then there exists a .w /= 0 such that .w ∈/ E λ and .( f − λidV )w /= 0.
Therefore

.( f − λidV )w = w' /= 0,  w ∈/ E λ ,

and

. f w = λw + w' ∈/ E λ .

So . f is not diagonalizable. However, subspaces of the form .ker( f −


λidV )m are not only relevant but additionally lead to the answer of the ques-
tion of an appropriate direct . f -invariant decomposition of .V . This leads to the
Jordan decomposition of .V which will be described later, see Theorem 9.2 in
Sect. 9.5.

Fortunately, there are the so called “generalized eigenvectors” as elements of the


generalized eigenspaces .W (λ j , f ), given by the following definitions:

Definition 9.10 Generalized eigenvector.


Let .V be an .n-dimensional vector space, . f ∈ Hom(V, V ) and .λ ∈ K an eigen-
value of . f . Then a vector .v is a generalized eigenvector corresponding to .λ if
.v /= 0 and there exists . N ∈ N such that

.( f − λidV )^N v = 0.    (9.54)

Remark 9.4 The . N = dim V case.


From Remark 9.2, one can just as well define

. W (λ, f ) ≡ ker( f − λidV )dim V .



Definition 9.11 Generalized eigenspace.


For an operator . f ∈ Hom(V, V ) with .dim V = n and .λ ∈ K, the generalized
eigenspace of . f corresponding to the eigenvalue .λ is defined by

. W (λ, f ) = ker( f − λidV )n . (9.55)

For a fixed . f , we write


. Wλ := W (λ, f ).

Thus we see that .Wλ is the set of all generalized eigenvectors of . f with respect to
λ, the vector .0 included. Additionally, we see that the eigenspace . E λ is contained in
.

the generalized eigenspace .Wλ . So we have

. E λ ≤ Wλ .

We now obtain the direct sum decomposition of .V :


. V = Wλ1 ⊕ · · · ⊕ Wλr .    (9.56)

This decomposition is . f -invariant:

. f : Wλ j −→ Wλ j ∀ j ∈ I (r ).

It is interesting to notice that the nonlinear polynomials . Pλ j (x) := (x − λ j )n are


relevant. Their mathematical formalism, which led to the above result, is similar to
the formalism which led to the diagonalizability theorem. It is worth knowing that
in connection with all the generalized eigenspaces .Wλ j , a special kind of operators,
the nilpotent operators play a crucial role. The problem is reduced to studying the
representations of nilpotent operators.
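As a numerical sketch of Definition 9.11 and the decomposition (9.56) (our own illustration, not part of the text; Python with NumPy assumed, and the matrix below is a hypothetical non-diagonalizable example), one can compare dim E(λ) with dim W(λ, f) = n − rank((F − λ1)^n):

import numpy as np

# Eigenvalue 2 with a 2x2 Jordan-type block, eigenvalue 3 simple.
F = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
n = F.shape[0]

def eigenspace_dim(F, lam):
    return n - np.linalg.matrix_rank(F - lam * np.eye(n))

def generalized_eigenspace_dim(F, lam):
    M = np.linalg.matrix_power(F - lam * np.eye(n), n)   # (F - lam*1)^n
    return n - np.linalg.matrix_rank(M)

for lam in (2.0, 3.0):
    print(lam, "dim E =", eigenspace_dim(F, lam),
          "dim W =", generalized_eigenspace_dim(F, lam))
# For lam = 2: dim E = 1 but dim W = 2; the eigenspaces alone do not fill V,
# while the generalized eigenspaces do: 2 + 1 = 3 = dim V.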

Definition 9.12 Nilpotent operator, nilpotent matrix.


An operator .h ∈ End(V ) is nilpotent if some power of .h is zero:

. h m = 0 with m ∈ N.

Similarly, a matrix . N ∈ Kn×n is nilpotent if for some .m ∈ N

. N m = 0.

Lemma 9.4 Eigenvalues of a nilpotent matrix.


The only eigenvalue of a nilpotent matrix is zero.

Proof We first show that .λ = 0 is an eigenvalue of a nilpotent matrix . N ∈ Kn×n . Let


. N^m = 0 and . N^(m−1) /= 0, then there is some .u ∈ Kn such that

. N^(m−1) u = v /= 0,  N v = N^m u = 0→  and  v ∈ ker N /= {0}.


So .λ = 0 is an eigenvalue of . N . We further show that .λ = 0 is the only eigenvalue of


N : Let .λ be an eigenvalue of . N , so there is some .v ∈ Kn , v /= 0 such that . N v = λv.
.
Then, since . N m = 0, we have

. N m v = λm v = 0 and thus λ = 0.

The set of eigenvalues of . N is given by

. E W (N ) = {0}.

The result of this lemma is that a nilpotent matrix is similar to a strictly upper
triangular matrix, that is, a matrix with zeros on and below the main diagonal and
arbitrary entries (∗) above it.

Definition 9.13 Upper triangular and strictly upper triangular matrix.


A matrix . A is called upper triangular if there are only zeros below the main
diagonal, that is, .α^i_s = 0 for .i > s. Further, . A is called strictly upper triangular
if there are zeros on and below the main diagonal, that is,

.α^i_s = 0 for i ≥ s.

The next proposition shows that a nilpotent matrix is similar to a strictly upper
triangular matrix.

Proposition 9.6 On the structure of a nilpotent matrix.


Let . A ∈ Kn×n be a nilpotent matrix. The following equivalence holds:
(i) . A is nilpotent,
(ii) . A is similar to a strictly upper triangular matrix.
In this case, we have . An = 0 and the characteristic polynomial, .χ A (ξ) = ξ n .

Proof

– (i) ⇒ (ii)
Let A be nilpotent. Lemma 9.4 shows that λ = 0 is an eigenvalue and so A is
similar to the matrix

N = [ 0  ∗ ; 0→  N1 ]   with N1 ∈ K^(n−1)×(n−1) .

The matrix N1 is, after an induction argument on n, a strictly upper triangular
matrix, and so N is also a strictly upper triangular matrix.

– (ii) ⇒ (i)
We just have to show that a strictly upper triangular matrix A ∈ K^n×n has A^n = 0.
Again using induction on n, we write

A = [ 0  α ; 0→  B ]   with α ∈ (K^(n−1))∗ and B ∈ K^(n−1)×(n−1)

with B strictly upper triangular and B^(n−1) = 0. We then take

A A = [ 0  α ; 0→  B ] [ 0  α ; 0→  B ] = [ 0  αB ; 0→  B² ]   such that   A^n = [ 0  αB^(n−1) ; 0→  B^n ].

Since, as above, B^(n−1) = 0 and B^n = 0, we also get A^n = 0. This proves (i).

It follows that

χA (ξ) = det(ξ1n − A) = ξ^n ,

since ξ1n − A is upper triangular with all diagonal entries equal to ξ.

So Proposition 9.6 is proven. ∎

This proposition shows that a nontrivial nilpotent matrix is not diagonalizable. We


will see later that (see Theorem 9.4 (iii)) this is not just an example but one of the
two obstructions to diagonalizability. For complex vector spaces, it is even the only
obstruction. We therefore supplement this proposition with the following corollary.

Corollary 9.4 Nilpotent, not diagonalizable.


Let . N /= 0 be a nilpotent matrix. Then it follows that . N is not diagonalizable.

Proof This follows directly from Proposition 9.6. ∎

In our case, the linear polynomial .lλ j with . L j := lλ j ( f ) ∈ End(V ), together with the
generalized eigenspace, .Wλ j leads to the nilpotent operator

. h j := ( f − λ j idV )|Wλ j . (9.57)

This should be compared with the corresponding operator

. ( f − λ j idV )| Eλ j , (9.58)

which is by definition the zero operator of the eigenspace . E λ j ! Denoting the restric-
tions of . f and .idV on the generalized eigenspace .Wλ j by . f j and .id j , we get

. f j := f |Wλ j , id j := id|Wλ j . (9.59)

We may write Eq. (9.57) as

. f j = λ j id j + h j , j ∈ I (r ). (9.60)

This leads us to the expectation that there exists a decomposition of the operator . f in
partial operators . f j , each one characterized by the eigenvalue .λ j , and that all these
. f j should have the same structure.
We may expect a universal structure for the operators . f j , j ∈ I (r ), with .r the
number of distinct eigenvalues. The following theorem shows that this is the case.
We prove it by using very elementary methods, as demonstrated by [7, pg. 238]. Note
that most proofs used in the literature use much more advanced techniques for this
kind of theorem. For simplicity, the formalism is given at the level of a matrix:

Theorem 9.2 Structure theorem (pre-Jordan form).


If the characteristic polynomial of a matrix . F ∈ Kn×n decomposes into linear
factors, the matrix . F is similar to a block diagonal matrix:
diag(F1 , F2 , . . . , Fr ),    (9.61)

which we can also represent as follows:

. F1 ⊕ F2 ⊕ · · · ⊕ Fr .

Every block is given by

. F j = λ j id j + H j ∈ Km j ×m j , j ∈ I (r ) (9.62)

with .r the number of distinct eigenvalues of . F and . H j a strictly upper triangular
matrix:

. H j = [ 0  ∗ ; 0  0 ]  (schematically: zeros on and below the main diagonal).    (9.63)

We call the upper triangular matrices . F j pre-Jordan matrices.


This gives the following direct sum decomposition of .Kn into . f -invariant
subspaces:
.K^n ≅ K^m1 ⊕ · · · ⊕ K^m j ⊕ · · · ⊕ K^mr .

Note that the Jordan approach, using higher level mathematical instruments, contin-
ues to decompose each .Km j into . f -invariant, . f -irreducible vector spaces.

Proof The proof goes through by induction on .n = dim V . For .n = 1, F ≡ F1 =


λ1 and the theorem is trivially valid. We assume that the theorem is proven for
all matrices belonging to .K(n−1)×(n−1) . For the matrix . F ∈ Kn×n , the characteristic
polynomial is given by

.χ F (x) = (x − λ1 )m 1 · · · (x − λr )m r . (9.64)

F is similar to the matrix


[ λ1  ∗ ; 0  G ]   with G ∈ K^(n−1)×(n−1)    (9.65)

and the characteristic polynomial factors as

χF (x) = (x − λ1 ) χG (x),

with

χG (x) = (x − λ1 )^(m1−1) (x − λ2 )^m2 · · · (x − λr )^mr .    (9.66)

By induction the matrix .G is similar to the matrix


diag(F1∗ , F2 , . . . , Fr )    (9.67)

with
. F j = λ j I j + H j ∈ K(m j ×m j ) j ∈ {2, . . . , r } (9.68)

and
. F1∗ = λ1 1 + H1 ∈ K(m 1 −1)×(m 1 −1) . (9.69)

From Eqs. (9.65), (9.67) and (9.68), it follows that . F is similar to a block matrix
given by

    [ F1  C2  · · ·  C j  · · ·  Cr ]
C = [ 0   F2                    0  ]   and   C j ∈ K^m1×m j .    (9.70)
    [                 ⋱             ]
    [ 0   0                     Fr ]

Now, we have to show that .C is similar to the matrix . F. This means, we would like
to obtain something of the form
                [ F1  Y2  · · ·  Yr ]
F = B⁻¹ C B  =  [ 0   F2         0  ]    (9.71)
                [            ⋱      ]
                [ 0   0          Fr ]

with
. Y j = 0. (9.72)

For this purpose we choose the following invertible matrix . B with a similar form as
C in Eq. (9.70):

    [ 1m1  B2  · · ·  Br ]
B = [ 0    1m2        0  ] .    (9.73)
    [             ⋱      ]
    [ 0    0         1mr ]

Then, when we conjugate .C by . B, we obtain a matrix . B⁻¹ C B of the form given in
Eq. (9.71), where

.Y j = F1 B j − B j F j + C j .    (9.74)

By setting . F j = λ j I j + H j , we get

. Y j = (λ1 − λ j )B j + H1 B j − B j H j + C j . (9.75)

The question is whether we can choose the. B j so that.Y j = 0 for every. j ∈ {2, . . . , r }.
If we divide the expression in Eq. (9.75) by .(λ1 − λ j ) /= 0, this does not change the
form of Eq. (9.75). So we may assume for some fixed . j, without loss of generality,
that .λ1 − λ j = 1, .∀ j ∈ {2, . . . , r }, just for our proof. Now the question is whether
we can solve the equation

0 = B j + H1 B j − B j H j + C j ,    (9.76)

with unknowns the entries of . B j ≡ X ∈ C^m1×m j . We have to solve a system of .m1 m j
linear equations with .m1 m j variables:

. X + H1 X − X H j = −C j . (9.77)

Setting
. L ≡ H1 , R ≡ H j and C0 ≡ −C j (9.78)

we have to solve generically the system

. X + L X − X R = C0 with X ∈ Cm 1 ×m j . (9.79)

The system of Eq. (9.79) has a unique solution if the homogeneous equation

. X + LX − XR = 0 (9.80)

has only the zero solution . X = 0. Lemma 9.5, shows that this is true. The solution
X obtained for Eq. (9.79) or equivalently . B j for Eq. (9.75), corresponds to .Y j = 0
.
in Eq. (9.71) and . F is similar to the block diagonal form in Eq. (9.61). This proves
the theorem. ∎

The lemma we used ensures that for the nilpotent matrices . L and . R(L ≡ H1 , R ≡
H j ) the homogeneous Eq. (9.80) has only the trivial solution .(X = 0).

Lemma 9.5 If . L ∈ Ks×s , R ∈ K t×t , s, t ∈ N are nilpotent matrices, the sys-


tem
.X − L X + X R = 0 (9.81)

has only the solution . X = 0.

Proof Let . X ∈ Kn×n and define . X (i) iteratively as follows:



X (1) = L X − X R
X (2) = L X (1) − X (1) R = L² X − 2L X R + X R²
X (3) = L³ X − 3L² X R + 3L X R² − X R³
 ⋮
X (l) = Σ_{k=0}^{l} (−1)^k (l choose k) L^(l−k) X R^k .

Since L and R are nilpotent, we have L^n = 0 and R^n = 0 where n = dim V . If we
take l = 2n, we already have X (2n) = 0: every term of

X (2n) = Σ_{k=0}^{2n} (−1)^k (2n choose k) L^(2n−k) X R^k

vanishes, since for each k we have n ≤ k or n ≤ 2n − k, so L^(2n−k) X R^k = 0 for all k ∈
{0, 1, . . . , 2n}. Suppose, now, that X is a solution to X − L X + X R = 0.

Then
. X = L X − X R = X (1)
= X (2) = . . . = X (2n) = 0.

So . X is the trivial solution . X = 0. ∎
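In practice, an equation of the type (9.79) can also be solved numerically by vectorization. The following sketch is our own addition (Python with NumPy assumed); it uses the standard Kronecker-product identities for row-major vectorization, which are not part of the book's argument, and random nilpotent matrices as a hypothetical test case.

import numpy as np

# Solve X + L X - X R = C0 by vectorization:
# with row-major vec, vec(L X) = kron(L, I) vec(X) and vec(X R) = kron(I, R.T) vec(X).
rng = np.random.default_rng(0)

s, t = 3, 2
L = np.triu(rng.standard_normal((s, s)), k=1)    # strictly upper triangular, hence nilpotent
R = np.triu(rng.standard_normal((t, t)), k=1)
C0 = rng.standard_normal((s, t))

M = np.eye(s * t) + np.kron(L, np.eye(t)) - np.kron(np.eye(s), R.T)
X = np.linalg.solve(M, C0.flatten()).reshape(s, t)

assert np.allclose(X + L @ X - X @ R, C0)        # the unique solution promised by Lemma 9.5
print("residual:", np.max(np.abs(X + L @ X - X @ R - C0)))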

Comment 9.4 Structure theorem with complex matrices.

The structure theorem is, in particular, valid for all complex matrices since
the fundamental theorem of algebra states that every nonconstant polynomial
decomposes into linear factors over .C. The above formulation of the theorem
has the advantage that real and complex matrices are treated uniformly.

Corollary 9.5 Upper triangular decomposition of .V in generalized


eigenspaces.
If the characteristic polynomial.χ f of an operator. f decomposes into linear
factors, there exists a representation which we may call “pre-Jordan”, as in
the above structure theorem, and a decomposition of .V ∼ = Kn in . f -invariant
generalized eigenspaces W_{λ_j} = ker(f - λ_j id_V)^n :

V = W_{λ_1} ⊕ W_{λ_2} ⊕ · · · ⊕ W_{λ_r}.



A simple proof of the Cayley-Hamilton theorem is a nice application of Theorem 9.2.


This proof is also essentially based on properties of nilpotent matrices, together with
the action of polynomials on matrices which appear in Proposition 9.2 and which we
call pre-Jordan matrices. We first need the following lemma.

Lemma 9.6 Action of a polynomial on a pre-Jordan matrix.


For a matrix in the form

. A = λ1 + N ∈ Km×m

with . N a strictly upper triangular matrix, the action of a polynomial .ϕ ∈ K[x]


is given by
'
.ϕ(A) = ϕ(λ)1 + N

with . N ' also a strictly upper triangular matrix.

Example 9.19 Proof by example.


We do not prove the above statement but give an example for ϕ(x) = x². If we use the fact that

(λ1 + N)(λ1 + N) = λ²1 + λN + Nλ + N N = λ²1 + 2λN + N²,

then N' := 2λN + N² is again a strictly upper triangular matrix, so that ϕ(A) = ϕ(λ)1 + N' is again a pre-Jordan matrix.

Theorem 9.3 Cayley-Hamilton theorem.


Let F ∈ K^{n×n} be a matrix with characteristic polynomial χ_F(x) = (x - λ_1)^{m_1} · · · (x - λ_r)^{m_r}. Then

. χ F (F) = 0 ∈ Kn×n .

Proof Since we can use the relations

F = T F̃ T^{-1} with T ∈ Gl(n) and
ϕ(F) = T ϕ(F̃) T^{-1} with ϕ ∈ K[x],

we can apply Theorem 9.2 to reduce to the case where . F is a pre-Jordan matrix. This
leads to the following expression:
χ_F(F) = \begin{bmatrix} χ_F(F_1) & & & 0 \\ & χ_F(F_2) & & \\ & & \ddots & \\ 0 & & & χ_F(F_r) \end{bmatrix}.

Lemma 9.6, applied to every χ_F(F_j) ∈ K^{m_j × m_j}, j ∈ I(r), gives the result that

χ_F(F_j) is proportional to (F_j - λ_j 1_j)^{m_j} ≡ H_j^{m_j},

with H_j nilpotent. So we have H_j^{m_j} = 0 and χ_F(F_j) = 0 ∈ K^{m_j × m_j} for all j ∈ I(r). This leads to

χ_F(F) = 0 ∈ K^{n×n}

and to the proof of Theorem 9.3. ∎
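
A quick numerical illustration of Theorem 9.3 (a Python sketch with numpy; the matrix is an arbitrary example, not one from the text): evaluating the characteristic polynomial on the matrix itself gives the zero matrix.

import numpy as np

F = np.array([[2., 1., 0.],
              [0., 2., 1.],
              [0., 0., 3.]])          # arbitrary example matrix

# Coefficients of chi_F(x) = det(x*1 - F), highest power first.
coeffs = np.poly(F)

# Evaluate chi_F(F) by Horner's scheme on matrices.
chi_F_of_F = np.zeros_like(F)
for c in coeffs:
    chi_F_of_F = chi_F_of_F @ F + c * np.eye(3)

print(np.allclose(chi_F_of_F, 0))     # True, as the Cayley-Hamilton theorem asserts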

9.6 Algebraic Aspects of Diagonalizability

At this point, two questions arise simultaneously. Firstly, what exactly does the
expression
.χ F (F) = 0 (9.82)

in the Cayley-Hamilton Theorem 9.3 mean? Secondly, are there other polynomials that fulfill the same relation? In Eq. (9.82), χ_F ≡ χ ∈ K[x] is the characteristic polynomial of F ∈ K^{n×n},

χ(x) = x^n + χ_{n-1} x^{n-1} + · · · + χ_2 x^2 + χ_1 x + χ_0, with χ_i ∈ K,  (9.83)

and the following expression holds:

. F n + χn−1 F n−1 + · · · + χ2 F 2 + χ1 F + χ0 1n = 0 ∈ Kn×n . (9.84)

This means that the list of matrices

(1n , F, F 2 , . . . , F n−1 , F n )
. (9.85)

is linearly dependent. Since the space K^{n×n} is n²-dimensional and a list like

(1_n, F, F², . . . , F^{n²-1}, F^{n²})  (9.86)

is always linearly dependent, we also see that the n + 1 elements in (9.85), which are fewer than the n² + 1 elements in (9.86), are already linearly dependent.
In connection with this, a logical question is whether a list of powers of . F with
even smaller length than .n + 1 could also be linearly dependent. It is clear that here
we talk about a vector space which we denote by .K[F] and which is generated by
the powers of a matrix . F:

. span(1n , F, F 2 , F 3 , . . . ). (9.87)

So we define

Definition 9.14 .K[F]: The matrix polynomials of . F.

K[F] := {ϕ(F) : ϕ ∈ K[x]}.


. (9.88)

It is further clear that the vector space .K[F] is also a commutative sub-algebra of the
matrix algebra .Kn×n .
All this leads to the notion of minimal polynomials.
Let
. I F := {χ ∈ K[x] : χ(F) = 0} (9.89)

be the set of all polynomials which annihilate the matrix . F. We call .χ ∈ I F an


annihilator of the matrix . F. Note that . I F < K[x] is an ideal in .K[x]. This means that
. I F is an additive subgroup of .K[x] and that for any .χ ∈ I F and .ϕ ∈ K[x], .ϕχ ∈ I F .
Thus we can say that any scaling of an annihilator of . F by a polynomial, leads back
to an annihilator of . F.
For a commutative algebra .A, we can generally give the following definition for
an ideal:

Definition 9.15 Ideal.


An ideal of a commutative algebra .A is a subset . I of .A that is closed under
addition and subtraction,

.a, b ∈ I ⇒ a + b, a − b ∈ I,

and also under multiplication by elements of .A, that is,

i ∈ I, a ∈ A ⇒ ai ∈ I.
.

Using the formalism of Sect. 1.3, we can state that there is an action of polynomials
on the annihilators of . F:

K[x] × I_F −→ I_F
(ϕ, χ) |−→ ϕχ ∈ I_F.

It is interesting to see that, using the definition .χ F (x) = det(x1 − F), we can write
for .χ F (F) = 0:

F^n - (tr F) F^{n-1} + · · · + (-1)^n det(F) 1_n = 0.  (9.90)

Hence,

χ F (x) = x n + χn−1 x n−1 + · · · + χ2 x 2 + χ1 x + χ0 ,


.

with χn−1 = − Tr(F) and χ0 = (−1)n det(F).

The following proposition shows the existence of minimal polynomials.

Proposition 9.7 Existence of minimal polynomials.


For a given matrix. A ∈ Kn×n , there exists a unique monic polynomial.μ ∈ K[x]
with smallest positive degree such that .μ(A) = 0.

Proof Since .K[A] is a subspace of .Kn×n , the following holds:

m := dim K[A] ≤ dim K^{n×n} = n².  (9.91)

Hence the list (1_n, A, A², . . . , A^m) is linearly dependent and therefore there exist coefficients λ_s ∈ K, s ∈ {0, 1, . . . , m}, not all of them zero, so that

λ_s A^s = 0  (9.92)

holds (with summation over s). This corresponds to the polynomial

χ(x) := λ_s x^s,  (9.93)

with 0 ≠ χ ∈ I_A, such that

χ(A) = 0.  (9.94)

Since the set of annihilators of . A, I A , is an ideal of .K[x], it follows from the theory
of polynomials that
. I A = K[x]μ,

where .μ is a uniquely determined monic polynomial with minimal degree and


deg μ = m.
. ∎
There exists another proof without the use of the theory of polynomials.
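
The minimal polynomial can also be found numerically by looking for the first power A^k that is a linear combination of the lower powers. A small Python sketch (using numpy; the function name and the example matrix are illustrative choices, not from the text):

import numpy as np

def minimal_polynomial(A, tol=1e-10):
    # Returns the monic coefficients [1, c_{k-1}, ..., c_0] of the minimal polynomial of A.
    n = A.shape[0]
    powers = [np.eye(n).flatten()]            # vec(A^0), vec(A^1), ...
    for k in range(1, n + 1):
        target = np.linalg.matrix_power(A, k).flatten()
        basis = np.column_stack(powers)
        # Least-squares solution of basis @ c = vec(A^k); exact iff A^k depends on lower powers.
        c, res, *_ = np.linalg.lstsq(basis, target, rcond=None)
        if np.linalg.norm(basis @ c - target) < tol:
            return np.concatenate(([1.0], -c[::-1]))   # x^k - c_{k-1}x^{k-1} - ... - c_0
        powers.append(target)

# A is diagonalizable with eigenvalues 2, 2, 3, so mu_A(x) = (x-2)(x-3) = x^2 - 5x + 6.
A = np.diag([2.0, 2.0, 3.0])
print(minimal_polynomial(A))                  # approximately [1., -5., 6.]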

Definition 9.16 Minimal polynomials.


The minimal polynomial μ_F(x) of a matrix F is the unique monic polynomial of smallest positive degree that annihilates F, that is,

μ_F(F) = 0.  (9.95)

The next proposition shows that the eigenvalues of an operator are also zeros of the
minimal polynomial.

Proposition 9.8 Eigenvalues and zeros of the minimal polynomial.


For a matrix . A ∈ Kn×n and .λ ∈ K, the following conditions are equivalent:
(i) .λ is an eigenvalue of . A.
(ii) .λ is a zero of the minimal polynomial of . A.

Proof (i) .⇒ (ii)


Let .v /= 0 be an eigenvector of . A with eigenvalue .λ. Then . Ak v = λk v and in fact
for a polynomial .ϕ ∈ K[x], .ϕ(A)v = ϕ(λ)v holds. Additionally, when .ϕ = μ, we
obtain .μ(A)v = μ(λ)v, and since .μ(A) = 0, we have .μ(λ)v = 0 and .μ(λ) = 0. This
is the assertion (ii).

Proof (ii) .⇒ (i)


Let .μ(λ) = 0. We have to show that there is some .v ∈ V \ {0} so that . Av = λv or
.(A − λ1)v = 0. We write
.μ(x) = (x − λ)Q(x). (9.96)

Since .μ is the minimal polynomial of . A, we have .Q(A) /= 0. Let .W := im Q(A) < Kn


so that .W = {Q(A)u : u ∈ Kn } /= 0. We choose .0 /= v ∈ W so that there is a .u ∈ Kn
with .Q(A)u = v. Hence

. Av − λv = (A − λ1)v = (A − λ1)Q(A)u. (9.97)

Using Eq. (9.96) and .μ(A) = (A − λ1)Q(A), we obtain from Eq. (9.97):

(A - λ1)v = μ(A)u = 0.

This shows that (i) holds and Proposition 9.8 is proven. ∎

The characteristic polynomial and the minimal polynomial have exactly the same
zeros even though they may have different multiplicities. For example, for an operator
. A on a .C-vector space:

. χ A (x) = (x − λ1 )m 1 (x − λ2 )m 2 · · · (x − λr )m r (9.98)

and
μ A (x) = (x − λ1 )d1 (x − λ2 )d2 · · · (x − λr )dr
. (9.99)

with d_j ≤ m_j, j ∈ I(r). Further, for all A ∈ K^{n×n},

deg μ_A ≤ deg χ_A = dim K^n = n.  (9.100)

The characteristic polynomial gives direct information about the eigenvalues of . A


because χ_A(x) is equal by definition to det(x1 - A), which is calculable. The minimal
polynomial is more difficult to determine. However, it provides indirect information
about the eigenvectors and thus about the diagonalizability of . A. As we will see
(Theorem 9.4), if the minimal polynomial has only simple zeros, then this guarantees
the diagonalizability of . A.
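
The difference between the multiplicities in Eqs. (9.98) and (9.99) can also be seen numerically. A Python sketch (numpy; the matrices and the helper function are arbitrary illustrations, not taken from the text): both matrices below have χ(x) = (x - 2)²(x - 3), but only for the diagonalizable one does the product of simple factors already vanish.

import numpy as np

A = np.diag([2., 2., 3.])                      # diagonalizable
B = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 3.]])                   # Jordan-type block, not diagonalizable

def annihilated_by_simple_factors(M):
    # Checks whether (M - 2*1)(M - 3*1) = 0, i.e. whether (x-2)(x-3) annihilates M.
    P = (M - 2*np.eye(3)) @ (M - 3*np.eye(3))
    return np.allclose(P, 0)

print(annihilated_by_simple_factors(A))        # True : mu_A(x) = (x-2)(x-3), simple zeros
print(annihilated_by_simple_factors(B))        # False: mu_B(x) = (x-2)^2 (x-3)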
Theorem 9.2, together with the above preparations such as the commutative alge-
bra .K[A] in Definition 9.14 and the minimal polynomials .μ A , Definition 9.16, pro-
vides also a pure algebraic criterion for the diagonalizability of an operator or a
matrix. For the sake of simplicity, we treat here the matrix version. We define the
term semisimple for a commutative algebra because it is useful for a concise summary
of our results.

Definition 9.17 Semisimple commutative algebra.


We call a commutative algebra .A semisimple if .A has no nonzero nilpotent
elements. In our case where .A = K[A], we can say even more concisely that
an element . A ∈ K[A] is semisimple if .K[A] is semisimple.

Theorem 9.4 Diagonalizability, algebraic perspective.


If the characteristic polynomial of a matrix F ∈ K^{n×n} decomposes into linear factors,

.χ F (x) = (x − λ1 )m 1 (x − λ2 )m 2 · · · (x − λr )m r (9.101)

with .λ j ∈ K, j ∈ I (r ) distinct and with a minimal polynomial .μ F , then the


following statements are equivalent:
(i) The matrix F is diagonalizable.
(ii) Every element of .K[F] is diagonalizable.

(iii) The commutative algebra .K[F] contains no nonzero nilpotent elements


or, equivalently, the commutative algebra .K[F] is semisimple.
(iv) The minimal polynomial .μ F is

μ F (x) = (x − λ1 )(x − λ2 ) · · · (x − λr ),
.

or, equivalently, all zeros of .μ F are simple.

Proof (i) .⇒ (ii)


Since . F is diagonalizable, there exists a basis . B0 in .Kn such that

. B0−1 F B0 = Δ (9.102)

holds, where .Δ is a diagonal matrix with entries .λ1 , . . . , λr (including possible


multiplicities). The matrix ϕ(F) for ϕ ∈ K[x] is an element of K[F]. The following
relation is valid between the polynomial and the adjoint .G L(n) action on matrices
in .Kn×n :
−1 −1
. B0 ϕ(F)B0 = ϕ(B0 F B0 ). (9.103)

Hence the two actions

. F |−→ ϕ(F) and


F |−→ B0−1 F B0

commute.
Using Eqs. (9.102) and (9.103), we obtain

. B0−1 ϕ (F) B0 = ϕ(Δ). (9.104)

Since .ϕ(Δ) is also diagonal, (ii) holds.


Proof (ii) .⇒ (iii)
Assume that N ∈ K[F] is nilpotent. By assertion (ii), N is diagonalizable. Corollary
9.4 tells us that . N is zero. It follows that no nilpotent nonzero element exists in .K[F].
This proves (iii).

Proof (iii) .⇒ (iv)


We first show that the polynomial

.ψ(x) = (x − λ1 )(x − λ2 ) · · · (x − λr ) (9.105)

with .ψ(λ j ) = 0 j ∈ I (r ), leads to a nilpotent matrix .ψ(F). According to Theorem


9.2, . F is similar to
. F1 ⊕ · · · ⊕ F j ⊕ · · · ⊕ Fr (9.106)

= \begin{bmatrix} F_1 & & & & \\ & \ddots & & & \\ & & F_j & & \\ & & & \ddots & \\ & & & & F_r \end{bmatrix}  (9.107)

with the pre-Jordan matrix


. Fj = λ j 1 j + N j (9.108)

with . N j a strictly upper triangular matrix. The action of polynomials on . F j is given


by Lemma 9.6:
'
.ϕ(F j ) = ϕ(λ j ) 1 j + N j (9.109)

with . N 'j again a strictly upper triangular matrix. Taking .ϕ(F j ), we obtain

ψ(F_j) = ψ(λ_j) 1_j + N''_j = 0 · 1_j + N''_j = N''_j  (9.110)

with N''_j strictly upper triangular for all j ∈ I(r). Thus ψ(F) is nilpotent. Now, since ψ(F) is a nilpotent element of K[F], it follows from (iii) that ψ(F) is zero. Hence ψ annihilates F; since every eigenvalue of F is a zero of the minimal polynomial (Proposition 9.8), no annihilating polynomial can have smaller degree, so ψ is the minimal polynomial: μ_F(x) = ψ(x). This proves (iv).

Proof (iv) .⇒ (i)


By Theorem 9.2, we have as above F_1 ⊕ · · · ⊕ F_j ⊕ · · · ⊕ F_r with F_j = λ_j 1_j + N_j, j ∈ I(r). We take, without loss of generality, j = 1 as an example.

μ_F(F_1) = (F_1 - λ_1 1_1)(F_1 - λ_2 1_1) · · · (F_1 - λ_r 1_1).  (9.111)

Consider F_1 - λ_j 1_1 with j ≠ 1. Then F_1 - λ_j 1_1 = λ_1 1_1 + N_1 - λ_j 1_1 = (λ_1 - λ_j) 1_1 + N_1, which is invertible since λ_1 - λ_j ≠ 0 and N_1 is strictly upper triangular. Hence,

(F_1 - λ_2 1_1)(F_1 - λ_3 1_1) · · · (F_1 - λ_r 1_1) =: B,  (9.112)

where B is an invertible matrix. From Eqs. (9.111) and (9.112) we obtain

μ_F(F_1) = (F_1 - λ_1 1_1) B = 0,

and since B is invertible,

F_1 - λ_1 1_1 = 0_1 ≡ 0_{m_1} ∈ K^{m_1 × m_1}.

Similarly, we obtain for j = 2, . . . , r

F_2 - λ_2 1_2 = 0_{m_2}, . . . , F_r - λ_r 1_r = 0_{m_r}.

This proves (i) and thus the diagonalizability Theorem. ∎



Theorem 9.4, with the assertion (iii), could also be formulated differently:
A matrix F is diagonalizable if and only if its characteristic polynomial decomposes into linear factors and F is semisimple.
This can even be shortened if we say, for the decomposability of the characteristic polynomial χ_F of the matrix F, that F itself is decomposable:
A matrix F is diagonalizable if and only if F is decomposable and semisimple. This is a purely algebraic point of view of diagonalizability.
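
Criterion (iv) of Theorem 9.4 can be turned directly into a numerical test. A Python sketch (numpy; the tolerance handling is simplistic and the example matrices are arbitrary illustrations, not from the text):

import numpy as np

def is_diagonalizable(F, tol=1e-8):
    # Algebraic test from Theorem 9.4: F is diagonalizable iff
    # prod_j (F - lambda_j * 1) = 0 over the *distinct* eigenvalues lambda_j.
    eigvals = np.linalg.eigvals(F)
    distinct = []
    for lam in eigvals:
        if all(abs(lam - mu) > tol for mu in distinct):
            distinct.append(lam)
    P = np.eye(F.shape[0], dtype=complex)
    for lam in distinct:
        P = P @ (F - lam * np.eye(F.shape[0]))
    return np.allclose(P, 0, atol=tol)

print(is_diagonalizable(np.array([[2., 1.], [1., 2.]])))    # True  (eigenvalues 1 and 3)
print(is_diagonalizable(np.array([[0., 1.], [0., 0.]])))    # False (nilpotent, mu(x) = x^2)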

9.7 Triangularization and the Role of Bases

The structure theorem in the previous section showed that for every operator in a
complex vector space, there exists a basis that leads to an upper triangular matrix
representation. As in the case of diagonalization, special bases play a crucial role.
But in this section, we will not discuss an . f -invariant decomposition of the given
vector space V. We are going to use a procedure which allows us to consider the vector
space .V as a whole. This is, in addition, particularly relevant and useful for proving
the spectral theorem (see Sect. 10.6) and consequently for a better understanding of
that theorem.

Proposition 9.9 Equivalence definitions for triangularization.


Let.V be a vector space with.dim V = n and. f ∈ End(V ) and. B0 = (b1 , . . . , bn )
a basis of .V with . f bs = bi ϕis , .s, i, j ∈ I (n) and . FB0 B0 = (ϕis ) ∈ Kn,n . Then
the following definitions are equivalent:
(i) f is upper triangular;
. B0 ,B0

(ii) For each . j ∈ I (n), . f b j ∈ span(b1 , . . . , b j );


(iii) The vector space .span(b1 , . . . , b j ) is . f -invariant for each . j ∈ I (n);
(iv) There exists an . f -invariant “flag”:

. V0 < V1 < V2 · · · < Vn = V

with .dim V j = j and . f V j ⊆ V j for all . j ∈ I (n).

It is clear that Proposition 9.9 (iv) is basis-independent. Note that the above . f -
invariant “flag” is not what we usually mean by an . f -invariant decomposition of .V .
The existence of a basis . B0 , as in the above definition, was shown in the structure
theorem in Sect. 9.5. It is still instructive and useful to see a second proof.

Proposition 9.10 Triangularization on a .K vector space.


There exists a basis . B0 of .V so that the matrix . FB0 B0 is upper triangular if and
only if the characteristic polynomial .χ f of . f decomposes into linear factors
over .K (which is always the case in a complex vector space).

Proof If f_{B_0 B_0} is upper triangular, the characteristic polynomial χ_f of f is given by

χ_f(x) = det \begin{bmatrix} x - ϕ^1_1 & & * \\ & \ddots & \\ 0 & & x - ϕ^n_n \end{bmatrix} = (x - ϕ^1_1) · · · (x - ϕ^n_n),

so it decomposes into linear factors. If, on the other hand, the characteristic polynomial χ_f is given by χ_f(x) = (x - λ_1) · · · (x - λ_n), where λ_1, . . . , λ_n are the eigenvalues of f (repetition of course is allowed), we show the assertion by induction
on .n.
For .n = 1, . f B B ∈ K1×1 is already upper triangular.
We start with an eigenvector .v1 ≡ b1 for .λ1 : f v1 = λ1 v1 and choose a basis
. B = (b1 , u 2 , . . . , u n ) of . V . We set . B1 = {v1 }, . B2 = (u 2 , u 3 , . . . , u n ), U1 = span B1 ,

and .U2 = span B2 . So we have

. V = U1 ⊕ U2 .

We define h : U_2 → U_1 and g : U_2 → U_2 with μ, ν ∈ {2, . . . , n} by

h(u_μ) = b_1 ϕ^1_μ and g(u_μ) = u_ν ϕ^ν_μ.

On U_2, f : U_2 → V is then given by f(u) = h(u) + g(u), u ∈ U_2.

We can write f_{BB} as

f_{BB} = \begin{pmatrix} λ_1 & h_{B_1 B_2} \\ 0 & g_{B_2 B_2} \end{pmatrix}

and we get

χ f (x) = (x − λ1 )χg (x) and χg (x) = (x − λ2 ) . . . (x − λn ).


.

The characteristic polynomial of .g decomposes into linear factors, so the induction


hypothesis goes through, and so a basis B_2^0 = (b_2, . . . , b_n) exists such that the matrix g_{B_2^0 B_2^0} is upper triangular.
Now we are in the position to define a new tailor-made basis for . f , . B0 :=
(B1 , B20 ) = (b1 , . . . , bn ) so that the matrix . f B0 B0 is upper triangular. This proves
the proposition. ∎
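
Numerically, such a triangularizing (orthonormal) basis is produced by the Schur decomposition. A Python sketch (using scipy, which is an assumption about the available toolbox; the matrix is an arbitrary example, not from the text):

import numpy as np
from scipy.linalg import schur

F = np.array([[3., 1., 2.],
              [0., 1., -1.],
              [4., 0., 2.]])            # arbitrary real example

# Complex Schur form: F = Q T Q^H with Q unitary and T upper triangular.
T, Q = schur(F, output='complex')

print(np.allclose(Q @ T @ Q.conj().T, F))            # True: similarity holds
print(np.allclose(np.tril(T, -1), 0))                # True: T is upper triangular
print(np.sort_complex(np.diag(T)))                   # eigenvalues of F on the diagonal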

Summary

This chapter marks the beginning of the section of the book that we consider advanced
linear algebra. From now on, eigenvalues and eigenvectors take center stage.
Initially, we extensively presented the meaning, usefulness, and application of
eigenvalues and eigenvectors in physics, facilitating the reader’s entry into this
sophisticated area of linear algebra with many examples.
The question of diagonalization and the description of this process were the central
focus of this chapter. Highlights included two theorems. The first, the equivalence
theorem of diagonalizability, addressed the geometric aspects of this question. The
second theorem, concerning the algebraic perspective of diagonalizability, required
rather advanced preparation.
To understand the question of diagonalization properly, one must also understand
what non-diagonalizability means. Here too, we eased access to this question with
examples and theory. This theory led to the so-called pre-Jordan form, as we refer
to it here, which every diagonalizable and nondiagonalizable operator in a complex
vector space possesses. The highlight here was the structure theorem (pre-Jordan
form).
At the end of this chapter, we also discussed triangularization.

Exercises with Hints

Exercise 9.1 In Example 9.5, the subspace U(f, v_0) is f-invariant. We call it an f-cyclic subspace: Let f ∈ Hom(V, V) be nilpotent. We consider a subspace U ≡ U(f, v_0) of V with v_0 ∈ V and f^m v_0 = 0 with f^{m-1} v_0 ≠ 0. The subspace U is given by

U = span(v_0, f v_0, f² v_0, . . . , f^{m-1} v_0).

We define b_s := f^{s-1} v_0, s ∈ I(m).
Show that B = (b_1, . . . , b_m) is a basis of U.

Exercise 9.2 Direct sum of eigenspaces. (See Corollary 9.1)


If the eigenvalues .(λ1 , . . . , λr ) of the operator . f are distinct, then show that
(i) the sum of the eigenspaces is direct:

. E λ1 + · · · + E λr = E λ1 ⊕ · · · ⊕ E λr ;

(ii) the following inequality is valid:


\sum_{i=1}^{r} \dim E_{λ_i} ≤ \dim V.

Exercise 9.3 Linear independence of eigenvectors.


Prove Proposition 9.1 by induction : Let. f ∈ End(V ). If.λ1 , . . . , λr are distinct eigen-
values of . f , then corresponding eigenvectors .v1 , . . . , vr are linearly independent.

Exercise 9.4 Direct sum and direct decomposition. (See Proposition 9.5)
If . V = U1 ⊕ · · · ⊕ U j ⊕ · · · ⊕ Ur is a direct sum of subspaces .U1 , . . . , Ur , then
show that there is a list .(P1 , . . . , Pr ) of projections in .V with a direct decomposition
of the identity such that for each . j ∈ I (r ), .U j = im P j (See Definition 9.9). This
means that if we define projections . P j :

. P j : V −→ U j
v |−→ P j v := u j .

We need to show that .(P1 , . . . , Pr ) is a direct decomposition of identity, that is, that

– . P j is linear;
– P_i ∘ P_j = δ_{ij} P_j;
– . P1 + . . . + Pr = id.

Exercise 9.5 For an operator . f ∈ Hom(V, V ), show that for every .k ∈ N,

. ker f k ≤ ker f k+1 .

(See Lemma 9.3)

Exercise 9.6 For an operator f ∈ Hom(V, V), show that for every k ∈ N,

. im f k+1 ≤ im f k .

Exercise 9.7 Sequence of falling ranges.


Show that for an operator . f ∈ Hom(V, V ), there is a number .m ∈ N such that the
following sequence holds:

im f m+2 = im f m+1 = im f m ≤ im f m−1 ≤ im f m−2 ≤ · · · ≤ im f.


.

Exercise 9.8 Let F be the matrix F = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} (as in Example 9.18), and U = span(e_1). Show that there is no (complementary) F-invariant subspace Ū of U, such that

K² = U ⊕ Ū,

which means that . F is not diagonalizable.



The following five exercises are applications of Comment 9.3 about the action
of polynomials on the algebra .Kn×n in connection with diagonalization and
spectral decomposition induced by a diagonalizable matrix, as discussed in
Theorems 9.1 and 9.4.

Exercise 9.9 Show that the evaluation map .ev A , A ∈ Kn×n ,

ev A :
. K[x] −→ K[A],
ϕ |−→ ϕ(A),
ϕs x s |−→ ϕs As , s ∈ N,

is an algebra homomorphism.

Exercise 9.10 For a matrix . A ∈ Kn×n , show that

. ker(ev A ) = I A ≤ K[x]

such that we have


K[x]/I_A ≅ K[A]!

For a diagonalizable matrix, we can use the minimal polynomial to obtain


the corresponding spectral decomposition. This means we determine the
eigenspaces of a diagonalizable matrix directly from the minimal polynomial
and the evaluation map.

Exercise 9.11 Let . A be a diagonalizable matrix . A ∈ Kn×n with minimal polynomial


.μ(x) = (x − λ1 ) . . . (x − λr ) where .r ≤ n. Use the minimal polynomial to construct

the linearly independent idempotents P_j ∈ K[A], j ∈ I(r), with the following properties:
(i) P_i P_j = 0 if i, j ∈ I(r) and i ≠ j,
(ii) . P1 + · · · + Pr = 1n and
(iii) . A = λ1 P1 + · · · + λr Pr .
For this construction, show that there exist polynomials given by

ϕ_j(x) := μ(x)/(x - λ_j) and ψ_j(x) := ϕ_j(x)/ϕ_j(λ_j), x ∈ K; j ∈ I(r)

with the following properties:



(i) .ψi ψ j − δi j ψi ∈ I A , i, j ∈ I (r ),
(ii) .ψ1 + · · · + ψr − 1 ∈ I A ,
(iii) .λ1 ψ1 + · · · + λr ψr − id ∈ I A , with .id(x) = x.
So the evaluation map .ev A leads directly to the desired result.

Exercise 9.12 Let.V be a vector space and. f ∈ Hom(V, V ). Show that the following
subspaces are . f -invariant:
(i) .ker f ,
(ii) .im f ,
(iii) .U such that .U ≤ ker f ,
(iv) . W such that . W ≥ im f .

Exercise 9.13 Let .V be a vector space and . f, g ∈ Hom(V, V ) with the property
f ◦ g = g ◦ f . Show that .im f is .g-invariant.
.

Compare the following two exercises with Example 9.17.

Exercise 9.14 Let . F be a map . F ∈ Hom(R2 , R2 ) given by


F = \begin{bmatrix} 0 & 4 \\ 1 & 0 \end{bmatrix}.

Find the eigenvectors and eigenvalues of . F.

Exercise 9.15 Let F be a map F ∈ Hom(R², R²) given by


F = \begin{bmatrix} 0 & -4 \\ 1 & 0 \end{bmatrix}.

Find the eigenvectors and eigenvalues of . F.

Compare the following two exercises with Example 9.14.

Exercise 9.16 Determine the eigenvalues and eigenvectors of the matrix


F = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} ∈ Hom(K², K²).

Exercise 9.17 Determine the eigenvalues and eigenvectors of the matrix


F = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} ∈ Hom(K³, K³).

Exercise 9.18 Let. f ∈ Hom(V, V ) be invertible and.λ ∈ K. Show that.λ is an eigen-


value of . f if and only if .λ−1 is an eigenvalue of . f −1 !

The following exercise is relevant for Sect. 12.2: SVD.

Exercise 9.19 Singular Value Decomposition (SVD).


Let V be a vector space and f, g ∈ Hom(V, V). Show that g ∘ f and f ∘ g have the
same eigenvalues.

The next two exercises are relevant for Theorem 9.4.

Exercise 9.20 Show that the evaluation map commutes with the adjoint representa-
tion of the linear group: Let . M ∈ Kn×n , . F ∈ G L(n, K) and .ϕ ∈ K[x]. Show that

ϕ(F M F −1 ) = Fϕ(M)F −1 .
.

Exercise 9.21 Example of a diagonalizable nilpotent operator.


Show that a self-adjoint nilpotent operator . f ∈ Hom(V, V ) is zero: . f = 0.

Exercise 9.22 This exercise is significant in connection with the minimal polynomial
of an operator.
Let . f ∈ Hom(V, V ), let .v ∈ V with .v /= 0, and let .μ be a polynomial of smallest
possible degree such that .μ( f )v = 0. Show that if .μ(λ) = 0, this .λ is an eigenvalue
of . f .

Nilpotent operators play a central role in the question of diagonalizability (see


Theorem 9.4). Proposition 9.6 showed that a nilpotent operator is similar to a
strictly upper triangular matrix. Here, a more direct proof is demanded.

Exercise 9.23 Representation matrix of a nilpotent operator.


Let. f be a nilpotent operator,. f ∈ Hom(V, V ) and.dim V = n. Show that there exists
a representation matrix . F of . f which has the strict upper triangular form:
F = \begin{bmatrix} 0 & & * \\ & \ddots & \\ 0 & & 0 \end{bmatrix}.

Choose step by step, first a basis of ker f, and then extend this to a basis of ker f² and so on. The result is a basis of V; show that with respect to this basis, the matrix F has the desired form.

The following two exercises are first a wrong and then a more direct proof of
the Cayley-Hamilton theorem (see Theorem 9.3).

Exercise 9.24 Explain why the following proof is incorrect:

.χ F (F) = det(F1 − F) = det(F − F) = det(0) = 0 !

Exercise 9.25 Cayley-Hamilton theorem: second proof.


Let . F be a matrix .Kn×n with characteristic polynomial

χ F (x) = det(x1n − F).


.

Show, using the expression .(x1n − F)(x1n − F)# = det(x1n − F)1n (see Proposi-
tion 7.4), by a direct calculation:

.χ F (F) = 0 ∈ Kn×n .

Exercise 9.26 Check by an explicit calculation that, for the characteristic polynomial χ_A(x) = det(x1_n - A) of a matrix A ∈ K^{n×n} and a matrix B ∈ K^{n×n}, the expression χ_A(B) ∈ K^{n×n} does not mean that

χ_A(B) equals det(A - B).


.

Exercise 9.27 This exercise answers the question whether for every given polyno-
mial.φ(x) = x n + ϕn−1 x n−1 + · · · + ϕ1 x + ϕ0 , ϕ0 , · · · , ϕn−1 ∈ K a corresponding
matrix . F ∈ Kn×n exists, such that the characteristic polynomial of . F is exactly the
given polynomial .φ(x).
Check that the matrix

F = \begin{bmatrix} 0 & & & & -ϕ_0 \\ 1 & 0 & & & -ϕ_1 \\ & 1 & \ddots & & \vdots \\ & & \ddots & 0 & -ϕ_{n-2} \\ 0 & & & 1 & -ϕ_{n-1} \end{bmatrix}

has the characteristic polynomial

.χ F (x) := det(x1n − F) = φ(x).
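
A quick numerical sanity check of this companion-type construction (a Python sketch with numpy; the test polynomial is an arbitrary illustrative choice, not from the text):

import numpy as np

# Coefficients phi_0, ..., phi_{n-1} of the test polynomial
# phi(x) = x^4 + 2x^3 - x^2 + 3x + 5.
phi = np.array([5., 3., -1., 2.])        # phi_0, phi_1, phi_2, phi_3
n = len(phi)

# Matrix of Exercise 9.27: ones on the subdiagonal, last column -phi_0, ..., -phi_{n-1}.
F = np.zeros((n, n))
F[1:, :-1] = np.eye(n - 1)
F[:, -1] = -phi

# np.poly returns the coefficients of det(x*1 - F), highest power first.
print(np.poly(F))                        # [1., 2., -1., 3., 5.], i.e. phi(x)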



References and Further Reading

1. S. Axler, Linear Algebra Done Right (Springer Nature, 2024)


2. S. Bosch, Lineare Algebra (Springer, 2008)
3. G. Fischer, B. Springborn, Lineare Algebra. Eine Einführung für Studienanfänger. Grundkurs
Mathematik (Springer, 2020)
4. S.H. Friedberg, A.J. Insel, L.E. Spence, Linear Algebra (Pearson, 2013)
5. K. Jänich, Mathematik 1. Geschrieben für Physiker (Springer, 2006)
6. N. Johnston, Advanced Linear and Matrix Algebra (Springer, 2021)
7. M. Koecher, Lineare Algebra und analytische Geometrie (Springer, 2013)
8. G. Landi, A. Zampini, Linear Algebra and Analytic Geometry for Physical Sciences (Springer,
2018)
9. J. Liesen, V. Mehrmann, Linear Algebra (Springer, 2015)
10. P. Petersen, Linear Algebra (Springer, 2012)
11. S. Roman, Advanced Linear Algebra (Springer, 2005)
12. B. Said-Houari, Linear Algebra (Birkhäuser, 2017)
13. A.J. Schramm, Mathematical Methods and Physical Insights: An Integrated Approach (Cam-
bridge University Press, 2022)
14. G. Strang, Introduction to Linear Algebra (SIAM, 2022)
15. R.J. Valenza, Linear Algebra. An Introduction to Abstract Mathematics (Springer, 2012)
16. R. Walter, Lineare Algebra und analytische Geometrie (Springer, 2013)
Chapter 10
Operators on Inner Product Spaces

In this chapter, we summarize and complete some important facts about inner
product spaces, real and complex ones, mostly on a more advanced level than in
Chaps. 2 and 6. In this context, the notions of orthogonality, orthogonal compliment,
orthogonal projection, and orthogonal expansion are discussed once again.
In Sect. 10.4, we motivate and discuss normal operators in some detail. We give
both an algebraic and a geometric definition of normal operators. We explain a
surprising and gratifying analogy of normal operators to complex and real numbers.
The highlight of this chapter and in fact of linear algebra altogether, are the spectral
theorems that are treated at the end of this chapter.

10.1 Preliminary Remarks

Vector spaces with additional structures are especially welcome in physics and math-
ematics. One such structure which we particularly like, is an inner product space.
This has to do with our surrounding space being a Euclidean space. It seems nat-
ural, especially in applications, to prefer spaces with more structure than a pure
abstract vector space in mathematics. In connection with inner product spaces, there
is the marvelous Pythagorean theorem. In physics, whenever a mathematical space
is needed as a model for physical reality, there is the tendency to burden it immedi-
ately with as many structures as possible and often with even more structures than
required. Therefore, in physics, when we are talking about vector spaces, we usually
mean an inner product vector space or, as it is also called, a metric vector space or a
vector space with a metric.
As we saw, we distinguish between real and complex vector spaces for abstract
vector spaces. The same is true for inner products; in the real case, we talk about
Euclidean vector spaces, and in the complex case, about unitary vector spaces.
The standard Euclidean vector space in n-dimensions is, as we already know, .Rn


with the usual real dot product. The standard unitary vector space in n-dimensions
(n-complex dimensions) is .Cn with the usual complex dot product (Hermitian inner
product). Inner product vector spaces are characterized by the property of having
positive definite scalar products. In physics, especially in the real case, there are
also nondegenerate scalar products, and so we also talk about semi-Euclidean or
pseudo-Euclidean vector spaces. This is the case in special and general relativity.
In this chapter, before coming to operators, we review a few facts about inner
product vector spaces and discuss some critical applications of the metric structure
connected with orthogonality.

10.2 Inner Product Spaces Revisited

For a vector space.V with dimension.n over a field.K = R or.C, the inner product.(−|−)
was defined in Sect. 10.3. For the standard vector space .Kn , the standard (canonical)
inner product .(−|−)0 is given in the form of the complex dot product:

(u|v)_0 := ū¹v¹ + · · · + ūⁿvⁿ = u_i v^i,

with u, v ∈ Kⁿ, u^i, v^i ∈ K, u_i := ū^i, and i ∈ I(n). We can always construct an


isomorphism of inner product spaces:

.(V, (|)) ∼
= (Kn , (|)0 ).

The quadratic form, .|| · ||2 , is given by

|| · ||2
. : V −→ R+ ∪ {0},
v |−→ ||v||2 = (v|v).

At this point, it is interesting to notice the so called “polarization identities” which


connects the inner product to the corresponding quadratic form: If .V is a .R vector
space, we have
4(u|v) = ||u + v||² - ||u - v||².

If .V is a .C vector space, we have

.4(u|v) = ||u + v||2 − ||u − v||2 − i{||u + iv||2 − ||u − iv||2 }.

As we see, the inner product can be expressed in terms of its quadratic form.
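
A quick numerical check of the complex polarization identity (a Python sketch with numpy; the vectors are random and the inner product is the standard one, conjugate-linear in the first argument):

import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=3) + 1j * rng.normal(size=3)
v = rng.normal(size=3) + 1j * rng.normal(size=3)

def ip(a, b):
    # Standard Hermitian inner product, conjugate-linear in the first argument.
    return np.vdot(a, b)

def sq(a):
    return ip(a, a).real      # squared norm ||a||^2

rhs = sq(u + v) - sq(u - v) - 1j * (sq(u + 1j * v) - sq(u - 1j * v))
print(np.isclose(4 * ip(u, v), rhs))     # True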

10.3 Orthonormal Bases

We first recall the definition of the orthogonality for vectors .u, v ∈ V . We say .u and
v are orthogonal if .(u|v) = 0 and we write .u ⊥ v.
.

Definition 10.1 Orthogonal complement.


If . M is a subset of .V , the set . M ⊥ of all vectors in .V which are orthogonal to
. M is called the orthogonal complement of . M in . V :

. M ⊥ = {v ∈ V : (u|v) = 0 ∀u ∈ M}.

Definition 10.2 Orthogonal and orthonormal basis.


A basis. B = (b1 , . . . , bn ) of.V of dimension.n is called orthogonal if.(bi |b j ) =
β j δi j with .β j > 0 ∀i, j ∈ I (n).
A basis .C = (c1 , . . . , cn ) of .V is called orthonormal if .(ci |c j ) = δi j ∀i, j ∈
I (n).

Proposition 10.1 Orthogonality and linear independence.


A list of nonzero orthogonal vectors is linearly independent.

Proof If (v_1, . . . , v_k), k ∈ I(n), is a list of nonzero orthogonal vectors and if v_i λ^i = 0 with λ^i ∈ K, then

0 = ||v_i λ^i||² = (v_i λ^i | v_j λ^j) = \sum_i |λ^i|² ||v_i||².

Since each v_i ≠ 0, positive definiteness tells us that λ^i = 0 ∀ i ∈ I(k). Thus, (v_1, . . . , v_k) is linearly independent. ∎

Proposition 10.2 Orthonormal expansion.


For .v ∈ V and an orthonormal basis .(ci ), the expansion .v = c1 (c1 |v) + · · · +
cn (cn |v) holds. This is the widely used notation in quantum mechanics (see
Sect. 6.4):
.|v) = |c1 )(c1 |v) + · · · + |cn )(cn |v).

Proof Since (c_1, . . . , c_n) is a basis, we have v = c_i v^i. Thus,

(c_j|v) = (c_j|c_i v^i) = (c_j|c_i) v^i = δ_{ji} v^i = v^j.

So we get v = c_i (c^i|v), (c^i ≡ c_i). ∎

Proposition 10.3 (Gram-Schmidt orthogonalization)


If .(a1 , . . . , an ) is a basis of .V , then there exists an orthonormal basis
.(c1 , . . . , cn ) of . V with

. span(a1 , . . . , a j ) = span(c1 , . . . , c j ) for each j ∈ I (n).

Proof Consider the following inductive definitions:

b_1 := a_1,
c_i := b_i / ||b_i||,  i ∈ I(n),  and
b_k := a_k - \sum_{μ=1}^{k-1} c_μ (c_μ | a_k),  k ∈ {2, . . . , n}.

Set

. Ai := (a1 , . . . , ai ),
Bi := (b1 , . . . , bi ),
Ci := (c1 , . . . , ci ),
and Vi := span Ai .

We are going to show the result by induction on .n = dim V .


For n = 1 we have V_1 = span C_1 = span B_1 ≡ span(a_1). By induction hypothesis, C_{n-1} = (c_1, . . . , c_{n-1}) is an orthonormal basis of V_{n-1} = span(a_1, . . . , a_{n-1}), with C_j = (c_1, . . . , c_j) an orthonormal basis of V_j, j ∈ I(n - 1).
We still have to prove that C_n ≡ C = (c_1, . . . , c_n) is also an orthonormal basis of V_n ≡ V. Since by assumption a_n ∉ V_{n-1}, we have b_n ≠ 0 and so c_n = b_n/||b_n|| is well defined with ||c_n|| = 1.
To show that c_n is orthogonal to V_{n-1}, we have for k = n, setting for simplicity (c^μ| ≡ (c_μ| in order to use the Einstein convention, b_n = a_n - c_μ (c^μ|a_n), and we consider μ, j ∈ I(n - 1):

(c j |bn ) = (c j |cn ||bn ||)


.

= (c j |cn )||bn ||.



On the other hand,

(c j |bn ) = (c j |(an − cμ (cμ |an )))


.

= (c j |an ) − (c j |cμ )(cμ |an ).

So

(c j |bn ) = (c j |an ) − δ jμ (cμ |an )


.

= (c j |an ) − (c j |an ) = 0.

Since.cn and.bn are colinear, we have also.(c j |cn ) = 0 and.C is indeed an orthonormal
basis of .V . ∎
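
The inductive definitions above translate directly into code. A Python sketch (numpy; the input vectors form an arbitrary illustrative basis of R³, and the function name is our own choice):

import numpy as np

def gram_schmidt(A):
    # Orthonormalize the columns of A (assumed linearly independent),
    # following b_k = a_k - sum_mu c_mu (c_mu | a_k), c_k = b_k / ||b_k||.
    n = A.shape[1]
    C = np.zeros_like(A, dtype=float)
    for k in range(n):
        b = A[:, k].astype(float)
        for mu in range(k):
            b = b - C[:, mu] * (C[:, mu] @ A[:, k])   # subtract projection onto c_mu
        C[:, k] = b / np.linalg.norm(b)
    return C

A = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])            # columns a_1, a_2, a_3
C = gram_schmidt(A)
print(np.allclose(C.T @ C, np.eye(3)))  # True: the c_i are orthonormal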

Corollary 10.1 Schur’s Theorem.


Suppose .V is an inner product .K-vector space and . f ∈ End(V ). If the charac-
teristic polynomial .χ f of . f decomposes into linear factors over .K, then there
exists an orthonormal basis .C of . f so that . f CC is an upper triangular matrix.
This is, of course, especially valid for a .C-vector space.

Proof Section 9.7 and the proposition made there showed that there exists a
basis . B0 = (b1 , . . . , bn ) of .V so that . f B0 B0 is triangular, and that .span B j =
span(b1 , . . . , b j ) is . f -invariant for all . j ∈ I (n). We apply the above proposition
concerning the Gram-Schmidt orthogonalization to the basis . B0 , with .span(C j ) =
span(c1 , . . . , c j ) = span(b1 , . . . , b j ) for all . j ∈ I (n). So we conclude that .span C j
is also . f -invariant for all . j ∈ I (n) and that . f CC is triangular. ∎

10.4 Orthogonal Sums and Orthogonal Projections

As we saw, metric structures (inner products) on vector spaces lead to the notion of
orthogonality and the orthogonal complement. This allows a refinement of the direct
product and the parallel projection to the orthogonal sum and orthogonal projection.
These are pure geometric properties well-known from Euclidean geometry and show
once more the entanglement between geometry and algebra in linear algebra. We are
first going to study some elementary properties of orthogonal complements in the
following propositions:

Proposition 10.4 Properties of orthogonal complements.


For the subspaces .U and .W of .V , the following is valid:
(i) .U ⊥ < V ;

(ii) .U ∩ U ⊥ = {0};

(iii) .U < W ⇒ W ⊥ < U ⊥ ;

(iv) .{0}⊥ = V and V ⊥ = {0}.

The symbol .< stands for “subspace of” as throughout this book.
Proof (i)
Since .(0|u) = 0 for .u ∈ U , it then follows that .0 ∈ U ⊥ . By the linearity of the scalar
product .(v|·), we have, when .w, z ∈ U ⊥ , .(w|u) = 0, and .(z|u) = 0 for every .u ∈ U .
It then follows that .(w + z|u) = (w|u) + (z|u) = 0 + 0 = 0 so that .w + z ∈ U ⊥ .
Similarly .λw ∈ U ⊥ . ∎
Proof (ii)
Let .z ∈ U ∩ U ⊥ . Then .z ∈ U , .z ∈ U ⊥ so that .(z|z) = 0. Hence .z = 0 and thus .U ∩
U ⊥ = {0}. ∎
Proof (iii)
Let w̄ ∈ W^⊥. Then, as U ⊆ W, we have (w̄|u) = 0 ∀ u ∈ U. Thus w̄ ∈ U^⊥ and so W^⊥ ⊆ U^⊥. ∎
Proof (iv)
For all v ∈ V, we have (v|0) = 0 and so v ∈ {0}^⊥, which means that {0}^⊥ = V. For v ∈ V^⊥, we have (v|v) = 0 and so v = 0, which means that V^⊥ = {0}. ∎

Definition 10.3 Orthogonal sum.


Let .U1 , . . . , Uk ≤ U be subspaces of .U . We say .U1 + · · · + Uk is an
orthogonal sum if:
(i) .Ui ∩ U j = {0} whenever .i /= j (so the sum is a direct sum),
(ii) Whenever .i /= j, .Ui ⊥ U j .
We denote orthogonal sums with .Θ as follows:

.U1 Θ U2 Θ · · · Θ Uk .

Proposition 10.5 Orthogonal complement.


If .U is a subspace of .V , then:

U ⊕ U ⊥ = V.
.

This means that every subspace.U uniquely induces a direct sum decomposition
of.V , an orthogonal decomposition. We may also write as above.U Θ U ⊥ = V.

Proof Let v ∈ V and let (u_1, . . . , u_k) be an orthonormal basis of U. Setting (u^i| ≡ (u_i|, we define

u := u_i (u^i|v) ∈ U and ũ := v - u = v - u_i (u^i|v),

so that v = u + ũ.
We now show that .ũ ∈ U ⊥ or, equivalently, that .(u j |ũ) = 0 for all . j ∈ I (k):

(u j |ũ) = (u j |v − u)
.

= (u j |v) − (u j |u i )(u i |v)


= (u j |v) − δ ji (u i |u)
= (u j |v) − (u j |v) = 0.

So we may use the notation .ũ ≡ u ⊥ and .v = u + u ⊥ . Of course .u and .u ⊥ depend


on .v.

Corollary 10.2 .dim U + dim U ⊥ = dim V .

As expected, the property of “.⊥” is an involution:

Proposition 10.6 Orthogonality as involution. If .U is a subspace of .V , then

. (U ⊥ )⊥ = U.

Proof
(i) We show that U ≤ (U^⊥)^⊥:
Let u ∈ U. Then whenever w ∈ U^⊥, (u|w) = 0. Hence u ∈ (U^⊥)^⊥ and so U ⊆ (U^⊥)^⊥.
(ii) We show that (U^⊥)^⊥ ⊆ U:
Let w̄ ∈ (U^⊥)^⊥. The orthogonal decomposition of w̄ ∈ V relative to U is given (see Proposition 10.5) by w̄ = u + z (so w̄ - u = z) with u ∈ U and z ∈ U^⊥. Since u ∈ U, from (i) we have u ∈ (U^⊥)^⊥ and so z = w̄ - u ∈ (U^⊥)^⊥. As we see, z ∈ U^⊥ ∩ (U^⊥)^⊥ = {0} (Proposition 10.4). So we have z = 0 and w̄ - u = 0 ⇒ w̄ = u ∈ U, which means that (U^⊥)^⊥ ⊆ U. With (i) and (ii) we obtain (U^⊥)^⊥ = U. ∎


Analogous to orthogonal sums, orthogonal projections also lead to more refined and
“perfect” projections than parallel projections.

Definition 10.4 Orthogonal projections.


Suppose .U is a subspace of .V . As .U Θ U ⊥ = V , every element .v ∈ V can be
decomposed as .v = u + u ⊥ with .u ∈ U, u ⊥ ∈ U ⊥ . The orthogonal projection
. PU is the projection given by:

. PU : V −→ U
v |−→ PU (v) := u.

If we take into account an orthogonal or an orthonormal basis in .U , we can express


P_U more directly. In Sect. 10.3 we already saw that for U := span(u), P_U is given by

P_U(v) = u (u|v)/(u|u).

More generally, we notice that if we choose an orthonormal basis B_U = (u_1, . . . , u_k) with (u_i|u_j) = δ_{ij}, i, j ∈ I(k), in U, then

P_U(v) = \sum_{j=1}^{k} u_j (u_j|v)

or equivalently

P_U = \sum_{j=1}^{k} |u_j)(u_j|.

Here, we use a formalism which is routine, especially in quantum mechanics (see


Sect. 6.4), (.|u j ) ≡ u j and .(u j | ∈ U ∗ , the dual of .U ). For the orthogonal projection
. PU , the same relations as those for the parallel projections also hold. They are given
in the next proposition.

Proposition 10.7 Properties of orthogonal projections.


(i) P_U|_U = id_U;
(ii) P_U|_{U^⊥} = 0̂, the zero operator;
(iii) im P_U = U, ker P_U = U^⊥;
(iv) P_U² = P_U;
(v) P_U + P_{U^⊥} = id_V. ∎
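
These properties are easy to verify numerically once P_U is built from an orthonormal basis of U as above. A Python sketch (numpy; the subspace U is an arbitrary plane in R³, not an example from the text):

import numpy as np

# U = span of two arbitrary vectors in R^3.
A = np.array([[1., 0.],
              [1., 1.],
              [0., 2.]])

# Orthonormal basis of U via QR (equivalent to Gram-Schmidt on the columns).
Q, _ = np.linalg.qr(A)                      # columns u_1, u_2

P = Q @ Q.T                                 # P_U = sum_j |u_j)(u_j|
P_perp = np.eye(3) - P                      # P_{U^perp}

print(np.allclose(P @ P, P))                # (iv) P_U^2 = P_U
print(np.allclose(P + P_perp, np.eye(3)))   # (v) P_U + P_{U^perp} = id_V
print(np.allclose(P @ A, A))                # (i) P_U acts as the identity on U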

Comment 10.1 Comparison with parallel projections.

Perhaps the most important property of an orthogonal projection which makes


the difference to parallel projections, is not clearly visible in this last proposition.
To define . PU , we just have to fix one subspace .U since the complement .U ⊥
is uniquely defined. On the other hand, we need two subspaces to determine a
parallel projection. In a similar notation as above, we have to write for a parallel
projection
. PU,W : V → U with V = U ⊕ W.

Comment 10.2 Cauchy-Schwarz inequality and projections.

The expression (v) in Proposition 10.7 leads us directly to a kind of


generalization of the Cauchy-Schwarz inequality of Sect. 10.3:

||P_U v|| ≤ ||v||.

A very interesting application of orthogonal projections in various fields in math-


ematics, physics, and many other sciences, refers to the minimization problem as
given in the following proposition.

Proposition 10.8 Minimization.


Let .U be a subspace of .V and .u ∈ U . Then
(i) ||v - P_U v|| ≤ ||v - u|| for v ∈ V;
(ii) .||v − PU v|| = ||v − u|| if and only if .u ≡ u 0 = PU v.

Proof As .(v − PU v|PU v − u) = 0, we have by Pythagoras

.||v − u||2 = ||v − PU v + PU v − u||2 = ||v − PU v||2 + ||PU v − u||2 .

Then
(i) ||v - u||² ≥ ||v - P_U v||², since ||P_U v - u||² ≥ 0;
(ii) ||v - u||² = ||v - P_U v||²
– iff ||P_U v - u||² = 0,
– iff P_U v = u. ∎

10.5 The Importance of Being a Normal Operator

One could say that normal operators, particularly those on a complex vector space,
are mathematically the nicest operators one can hope to have: They are diagonal-
izable (even if not the only diagonalizable operators) and they have an orthogonal
eigenbasis. As we shall see later, normal operators are the only diagonalizable oper-
ators with an orthonormal basis. They lead to the so-called complex and real spectral
theorems, which underlines their beauty and usefulness. One could also say that they
are the most important operators to a physicist. Self-adjoint operators and isometries
are both normal. We call self-adjoint operators on complex and real vector spaces
also Hermitian and symmetric, respectively. Isometries for complex and real vector
spaces (always finite-dimensional in our approach), are also known as unitary and
orthogonal, respectively.
There is no doubt that an inner product vector space has much more structure than
an abstract vector space. This is why it is, at least in physics, much more pleasant
to have at our disposal an inner product vector space than having only an abstract
vector space with the same dimension. In addition, when dealing with operators in
an inner product vector space, it is natural to be interested in operators that interact
well with the inner product structures. Stated differently, if an operator has nothing
to do with the inner product, then the inner product is not relevant to this operator,
so there is no necessity to introduce a metric in addition to the linear structure. This
justifies dealing with inner product spaces and special operators with a characteristic
intrinsic connection to the inner product. It turns out that these operators are the
“normal” operators and it may appear rather surprising that the normal operators are
not exclusively those operators which preserve the inner product. This is what we
would expect from our experience in similar situations. It may also partially explain
why, in physics, the notion of a normal operator is often absent.

The operator “adjoint”,

.ad : Hom(V, V ) −→ Hom(V, V )


f |−→ f ad

is, first, at the technical level, what is needed for the formulation of the specific
connection of the normal operator with the help of the inner product. At the same
time it is quite clear that the structure “inner product” is a necessary prerequisite
for the definition of what is “adjoint”. In the case of .V = Kn and .V ' = Km , the
inner product is the canonically given dot product. So we have . f ad = f † as was
shown in Sect. 6.3 and Proposition 6.9. From the discussion in Sect. 6.3, it follows
in addition that . f ad is a kind of a substitution for the inverse of . f . Now if we take
this assumption concerning “inverse” seriously and look for a weaker condition for
invertibility, it turns out that this idea can lead us to the definition of normal operators:
If . f ad was really an inverse operator .( f ad = f −1 ), we would have . f ◦ f ad = idV
and . f ad ◦ f = idV . A weaker condition would then be: . f ad ◦ f = f ◦ f ad . This
is exactly the definition of a normal operator! There is in addition a surprisingly
pleasant analogy to complex and real numbers which leads also to the notion of
normal, unitary and self-adjoint operators. For complex and real numbers we recall
the following well-known relation: if .z ∈ C\{0} and .x, y, r, ϕ ∈ R,
z = x + iy,  z̄z = zz̄,  |z| := \sqrt{z̄z},  z = |z| \, \frac{z}{|z|},  |z| = r = \sqrt{x² + y²}

positive or zero (nonnegative), and z/|z| = e^{iϕ} with |z/|z|| = |e^{iϕ}| = 1.
If we ask for an operator . f in .(V, (|)) in a .C vector space with metric structure
which corresponds to .z and its properties shown above, using the analogy .z |→ z̄
with . f |→ f ad , we are led to the normal operators which in addition contain, again
in analogy to the complex numbers, the nonnegative and self-adjoint operators. So for
a normal operator. f , we expect the relation. f ad ◦ f = f ◦ f ad . This produces imme-
diately two obvious special cases for a normal operator: . f ad ◦ f = idV ( f ad = f −1 )
and . f ad = f which are also the most important ones, the isometries and the self-
adjoint operators. Further, if we define . f := h 1 + i h 2 with the two self-adjoint oper-
1 = h 1 and .h 2 = h 2 , it follows that . f normal is equivalent to the commu-
ators .h ad ad

tative relation .h 1 h 2 = h 2 h 1 . All displayed above may show that the introduction of
the notions “adjoint” and “normal” is not made by accident but are connected with
deep structures and are of tremendous relevance for mathematics and physics.
In order to proceed, we shortly recall the definitions of adjoint and self-adjoint
operators given in Sect. 6.3 and Definition 6.4. The adjoint operator of . f ∈ End(V )
is given by:
(f^{ad} w|v) := (w|f v) for all v, w ∈ V.

We say in addition, . f is self-adjoint if . f ad = f .


We are now in a position to define a normal operator.

Definition 10.5 Normal operator (algebraic definition).


The operator . f ∈ End(V ) is normal if it commutes with its adjoint: . f ad ◦ f =
f ◦ f ad .

It is clear that for the notion adjoint, self-adjoint and normal operator, the existence
of a metric structure on .V is required which in our case is expressed by the inner
product .s = (−|·):

s : V × V −→ K
.

(u, v) |−→ s(u, v) ≡ (u|v)

as defined in Sect. 10.3.


For normal operators, we can also give the following equivalent definition:

Proposition 10.9 Normal operator (geometric definition).


f ∈ End(V ) is normal if and only if
.

.|| f ad v|| = || f v|| for all v ∈ V.

Proof The following sequence of equivalences leads to a pleasant result.

f is normal ⇔ f^{ad} f - f f^{ad} = 0
⇔ (w|(f^{ad} f - f f^{ad}) v) = 0 ∀ w, v ∈ V
⇔ (w| f^{ad} f v) = (w| f f^{ad} v) ∀ w, v ∈ V
⇔ (f w| f v) = (f^{ad} w| f^{ad} v).

The expression .( f w| f v) = ( f ad w| f ad v) ∀ w, v ∈ V is equivalent to .|| f v||2 =


|| f ad v||2 ∀v ∈ V because .(w|v) can be written in terms of its norm .|| · || through
the polarization identity (see Sect. 10.2). ∎
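
At the matrix level, both characterizations are immediate to test. A Python sketch (numpy; the matrices and vectors are arbitrary illustrations, not from the text):

import numpy as np

def is_normal(A, tol=1e-12):
    # Algebraic definition: A commutes with its adjoint A^dagger.
    return np.allclose(A.conj().T @ A, A @ A.conj().T, atol=tol)

# A rotation (an isometry) is normal; a Jordan block is not.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
J = np.array([[0., 1.],
              [0., 0.]])
print(is_normal(R), is_normal(J))        # True False

# Geometric definition: ||A^dagger v|| = ||A v||, checked on a random vector.
v = np.random.default_rng(1).normal(size=2)
print(np.isclose(np.linalg.norm(R.T @ v), np.linalg.norm(R @ v)))   # True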

In what follows, we need to specify more of what we know about . f and . f ad . For
this reason, we have to consider separately for. f ∈ End(V ) whether.V is a complex or
real vector space. This means that we have to take into account whether. f is a.C-linear
or a.R-linear operator. Since.C-linearity is a stronger condition than.R linearity, it leads
as expected to more substantial results. This fact may be helpful in understanding the
results that follow. These are also extremely useful for understanding crucial aspects
of quantum mechanics theory.

Proposition 10.10 Zero expectation value of an operator.


Suppose .V is a complex vector space and . f ∈ End(V ). If .(v| f v) = 0 for all
v ∈ V, then f = 0̂.

Proof Suppose .u, v ∈ V . We use a modified version of the polarization identity (see
Exercise 2.31):

4(v| f u) = (v + u| f (v + u)) - (v - u| f (v - u)) - i((v + iu| f (v + iu)) - (v - iu| f (v - iu))).

We observe that the terms on the right-hand side are of the form required by the
condition. This implies that the right-hand side is zero and so the left-hand side is
zero too. Now set .v = f w. Then .( f w| f w) = 0 and so . f w = 0. Since .w is arbitrary,
we obtain . f = 0̂. ∎

We are here in the situation where we have to distinguish between a complex and a
real vector space. For a real vector space (i.e., for an R-linear operator f), Proposition 10.10 is not valid: in R² the rotation by 90°, f = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} ≠ 0̂, nevertheless satisfies (v| f v) = 0 ∀ v ∈ R².
It is interesting to notice that the above proposition is also valid for a real vector
space in the special case of a self-adjoint operator.

Comment 10.3 .C-linearity versus .R-linearity.

The assertion here that .V is a complex inner product space is important.


The condition (v| f v) = 0 for all v means that for every matrix representation f_B with respect to an orthonormal basis B, the diagonal is zero. Stated geometrically, the vector f v is always orthogonal to v
or, as in quantum mechanics, the expectation of the observable . f is zero. It then
follows that the operator . f itself is zero.

Proposition 10.11 Zero expectation values for a self-adjoint operator. If . f is


a self-adjoint operator and if
(v| f v) = 0 for all .v ∈ V , then . f = 0̂.
.

Proof For a self-adjoint operator in a complex vector space, it was already proven
in Proposition 10.10. Therefore, we can assume that .V is a real inner product vector

space (. f is only .R-linear), then we have for the appropriate modified version of the
polarization identity in Sect. 10.2,

4(v| f u) = (v + u| f (v + u)) - (v - u| f (v - u)),

which holds if f is symmetric and (v| f v) real (see Exercise 2.28). So we obtain the
desired result as in Proposition 10.10. ∎

Before discussing the spectral theorems, it is helpful to study some properties of


eigenvalues and eigenvectors of normal and self-adjoint operators. This leads first to
Propositions 10.12, 10.13, and 10.14.

Proposition 10.12 Eigenvalues of self-adjoint operators.


Every eigenvalue of a self-adjoint operator is real.

Proof Let .(V, (−|−)) be an inner product space, . f = f ad a self-adjoint operator


on V, and v ∈ V\{0} an eigenvector with eigenvalue λ. Then

λ(v|v) = (v|λv) = (v| f v) = ( f v|v) = λ̄(v|v).


.

Since .(v|v) /= 0, we have .λ̄ = λ. ∎

Comment 10.4 Characteristic polynomials of self-adjoint operators.

The characteristic polynomial of self-adjoint operators decomposes into


linear factors.
This follows essentially from Proposition 10.12 above: for K = C, the fundamental theorem of algebra states that the characteristic polynomial χ_f of a self-adjoint operator on a C vector space factorizes into linear factors, and Proposition 10.12 states that all the zeros of this χ_f are real.
For K = R we get χ_f ∈ R[x]. The matrix F of f is real and symmetric: F̄ = F and Fᵀ = F. This means in addition that F^† = F holds. So the result of the case K = C above also applies to the case K = R.

A comparison between the eigenelements of . f ad and . f is given in the following


relation.

Proposition 10.13 Eigenvalues and eigenvectors of a normal operator, and


its adjoint.
Suppose . f is normal and .(λ, v) is an eigenelement of . f , then .(λ̄, v) is an
eigenelement of . f ad .

Proof If . f is normal, . f − λ I dV is also normal since . f − λ I dV and . f ad − λ̄ I dV


commute as we see after a direct calculation. From Proposition 10.9, we obtain

||( f − λ I dV )v||
. = ||( f ad − λ̄ I dV )v|| so that
( f − λ I dV )v = 0 ⇔ (f ad
− λ̄ I dV )v = 0.

This shows when . f is normal, . f ad and . f have the same eigenvectors and their
corresponding eigenvalues are complex conjugate. ∎

In the following proposition, we see that for normal operators, the eigenvectors
corresponding to different eigenvalues are linearly independent and orthogonal.

Proposition 10.14 Eigenvectors of distinct eigenvalues are orthogonal. Let V


be an inner product space over .K ∈ {R, C} and let . f ∈ End(V ) be normal. If
.(λ1 , v1 ) and .(λ2 , v2 ) are eigenelements so that .λ1 / = λ2 , then .(v1 |v2 ) = 0.

Proof For .(λ2 − λ1 ) /= 0 we have

(λ2 − λ1 )(v2 |v1 ) = λ2 (v2 |v1 ) − λ1 (v2 |v1 )


.

= (λ¯2 v2 |v1 ) − (v2 |λ1 v1 ).

Using that . f is normal and Proposition 10.13, we obtain:

(λ_2 - λ_1)(v_2|v_1) = (f^{ad} v_2|v_1) - (v_2| f v_1)
= (v_2| f v_1) - (v_2| f v_1) = 0.

Since λ_2 - λ_1 ≠ 0, it follows that (v_2|v_1) = 0. ∎

10.6 The Spectral Theorems

As already pointed out, whether a matrix is diagonalizable depends on the field .R or


C we are working over. In general, the situation for a complex vector space is simpler
.

to deal with. Therefore, the mathematical literature will usually refer to complex and

real spectral theorems separately. We follow this course for instructional reasons, but
we show in a corollary that if we have a slightly different perspective, it is possible
to consider only one spectral theorem for the general field .K.
As discussed in Sect. 9.4, the diagonalizability of an operator f on an n-dimensional vector space V implies both the existence of a diagonal representation f_B = diag(λ_1, . . . , λ_n) and the existence of a basis B = (b_1, . . . , b_n) consisting of eigenvectors b_s corresponding to the eigenvalues λ_s for every s ∈ I(n). It is reasonable
to call . B an eigenbasis of . f . We already know that not every operator . f has the priv-
ilege to be diagonalizable or equivalently to have an eigenbasis. In a complex vector
space, the normal operators are precisely those operators which have the privilege not
only to be diagonalizable but in addition to be diagonalizable with an orthonormal
eigenbasis!
This is essentially the content of the spectral theorems:

Theorem 10.1 The complex spectral theorem .(K = C).

Let .V be an .n-dimensional complex vector space and . f ∈ End(V ). Then


. f is normal if and only if .V has an orthonormal basis of eigenvectors of . f .

Proof Suppose that an orthonormal eigenbasis C = (c_1, . . . , c_n) exists. Then


.f is diagonalizable and the matrix . F := f C can be written as . F = diag(λ1 , . . . , λn ).
The adjoint. f Cad = F † is also diagonal and so. F † F = F F † . Thus. f ad ◦ f = f ◦ f ad
and . f is normal.
Suppose that . f is normal. Then, by Schur’s theorem (see Corollary 10.1), . f is
triangularizable so there exists an orthonormal basis .C = (c1 , . . . , cn ) and . f C ≡
F = (ϕis ) is an upper triangular matrix: .ϕis = 0 for .i > s, i, s ∈ I (n).
Then the matrix . F is given by
F = \begin{bmatrix}
ϕ^1_1 & ϕ^1_2 & \cdots & \cdots & ϕ^1_n \\
0 & ϕ^2_2 & \cdots & \cdots & \vdots \\
\vdots & 0 & ϕ^3_3 & \cdots & \vdots \\
\vdots & & \ddots & ϕ^{n-1}_{n-1} & ϕ^{n-1}_n \\
0 & 0 & \cdots & 0 & ϕ^n_n
\end{bmatrix}.

The adjoint of . f is given by

f^{ad}_C = F^† = ((ϕ^{ad})^i_s) with (ϕ^{ad})^i_s = \bar{ϕ}^s_i.  (10.1)

Equation (10.1) tells us that F^† is lower triangular, and schematically we can write:

F = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \end{bmatrix} \quad and \quad F^{ad} = \begin{bmatrix} * & 0 & 0 \\ * & * & 0 \\ * & * & * \end{bmatrix}.  (10.2)

F and F^{ad} represent f and f^{ad}. This means:

f c_s = \sum_{i=1}^{n} c_i ϕ^i_s and f^{ad} c_s = \sum_{i=1}^{n} c_i (ϕ^{ad})^i_s = \sum_{i=1}^{n} c_i \bar{ϕ}^s_i.  (10.3)

Since f is normal, we have in addition:

||f c_s||² = ||f^{ad} c_s||², or equivalently \sum_{i=1}^{n} |ϕ^i_s|² = \sum_{i=1}^{n} |ϕ^s_i|².  (10.4)
i=1 i=1

This means that the norms of the corresponding columns of . F and . F † are equal.
Proceeding column by column by induction, using Eq. (10.4) and the shape in Eq. (10.2), this leads directly to the result that only the diagonal
elements of . F are nonzero. So . F is diagonal and . f is diagonalizable. This proves
the Theorem. ∎
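
Numerically, the complex spectral theorem is reflected in the fact that a normal matrix is unitarily diagonalizable. A Python sketch (numpy and scipy; the normal matrix below, a circulant, is an arbitrary illustration, not from the text):

import numpy as np
from scipy.linalg import schur

# A real circulant matrix: it is normal (it commutes with its transpose) but not symmetric.
N = np.array([[1., 2., 0.],
              [0., 1., 2.],
              [2., 0., 1.]])
print(np.allclose(N @ N.T, N.T @ N))          # True: N is normal

# The complex Schur form of a normal matrix is already diagonal,
# so the Schur basis is an orthonormal eigenbasis.
T, Q = schur(N, output='complex')
print(np.allclose(T, np.diag(np.diag(T))))    # True: T is diagonal
print(np.allclose(Q.conj().T @ Q, np.eye(3))) # True: the eigenbasis is orthonormal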

Theorem 10.2 The real spectral theorem (K = R).

Let V be an n-dimensional real vector space and f ∈ End(V). Then f is self-adjoint if and only if V has an orthonormal basis of eigenvectors of f.

Proof Suppose first that f is self-adjoint. Its characteristic polynomial decomposes into linear factors over R (Comment 10.4), so by Schur's theorem there is a basis B = (b_1, . . . , b_n) such that f_B is upper triangular, that is,

f(b_i) = \sum_{j=1}^{i} ϕ^j_i b_j for some ϕ^j_i ∈ R.

Through the Gram-Schmidt algorithm, one can construct an orthonormal basis C = (c_1, . . . , c_n) such that for each i ∈ I(n),

b_i ∈ span{c_1, . . . , c_i} and c_i ∈ span{b_1, . . . , b_i},

so that the change of basis matrix T = T_{BC} is triangular. Hence f_C = T f_B T^{-1} is upper triangular. But since C is orthonormal and f is self-adjoint, we have (f_C)^† = f_C; an upper triangular matrix that equals its own transpose is diagonal. Thus f_C must be diagonal, and C is an orthonormal eigenbasis for f.
Conversely, if C is an orthonormal eigenbasis of f, then f_C is a real diagonal matrix, so (f_C)^† = f_C and f is self-adjoint. ∎

Remark 10.1 Since a self-adjoint operator is a normal operator, the complex self-adjoint case was already covered by the previous theorem. The proof given here also works for K = C and thus provides a second proof for that case.

Corollary 10.3 Real and complex spectral theorems.


If the characteristic polynomial of . f ∈ End(V ) decomposes into linear factors
over .K ∈ {R, C}, then . f is normal if and only if . f has an orthonormal basis of
eigenvectors of . f .

Proof The proof goes through as in the complex spectral theorem since Schur’s
theorem can also be applied here! ∎
There exists a more direct formulation of the spectral theorem if we combine it
with Theorem 9.1. We thus obtain a spectral decomposition of every normal operator
. f parametrized by the set of its eigenvalues:

Theorem 10.3 Spectral decomposition theorem.

Let . f be a normal operator on an inner product space .V with the distinct


eigenvalues .λ1 , . . . , λr and a characteristic polynomial which decomposes
into linear factors. If .V j is the eigenspace corresponding to the eigenvalue .λ j ,
. j ∈ I (r ) and . P j is the orthogonal projection of . V on . V j , then the following
statements hold:
(i) .V = V1 Θ V2 Θ · · · Θ Vr .
(ii) .Pi P j = δi j Pi for .i, j ∈ I (r ).
(iii) .id V = P1 + P2 + · · · + Pr .

(iv) . f = λ1 P1 + λ2 P2 + · · · + λr Pr .

Proof This theorem is simply Theorem 9.1 with the additional information that the
.Vi are mutually orthogonal. The orthogonality part follows from the spectral theorem over .K. ∎
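A minimal numerical sketch of the statements (i)–(iv), using NumPy on a real symmetric (hence normal) matrix chosen purely for illustration:

```python
import numpy as np

# Real symmetric (hence normal) matrix with a repeated eigenvalue.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

eigvals, C = np.linalg.eigh(A)              # orthonormal eigenvectors in the columns of C
distinct = np.unique(np.round(eigvals, 10)) # the distinct eigenvalues lambda_1, ..., lambda_r

# Orthogonal projection P_j onto the eigenspace of each distinct eigenvalue.
projections = []
for lam in distinct:
    cols = C[:, np.isclose(eigvals, lam)]
    projections.append((lam, cols @ cols.T))     # P_j as a sum of |c)(c| over the eigenspace

# (ii) P_i P_j = delta_ij P_i, (iii) sum_j P_j = id, (iv) f = sum_j lambda_j P_j
print(np.allclose(projections[0][1] @ projections[1][1], 0))          # True
print(np.allclose(sum(P for _, P in projections), np.eye(3)))         # True
print(np.allclose(sum(lam * P for lam, P in projections), A))         # True
```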

Comment 10.5 Normal operators revisited.

Now that we clarified the structure of all normal operators, we would like to
discuss their content. Returning to the relation . f ad ◦ f = f ◦ f ad , we observe

the two “trivial” realizations as discussed in Sect. 10.5: . f ad ◦ f = I dV = f ◦


f ad and . f ad = f . It turns out that the first relation corresponds to isometries
or equivalently to unitary or orthogonal operators and the second relation to
self-adjoint operators. At the matrix level, one writes . A† A = 1n = A A† which
corresponds to the unitary or orthogonal matrices . A ∈ U (n) or . A ∈ O(n) if . A ∈
Cn×n or . A ∈ Rn×n respectively. Similarly, the condition . A† A = A A† follows
trivially when . A† = A which means that if . A is complex, then . A belongs to the
Hermitian matrices . H (n), or if . A is real, then . A belongs to . S(n), the symmetric
matrices. It is interesting to notice that .U (n) is a group, that . O(n) may be considered as a subgroup of .U (n) so that . O(n) ≤ U (n) ≤ Gl(n, C), and that . H (n) is a real vector space, with

. S(n) ≤ H (n) < Cn×n .

As usual, the notation “.≤”, “.<” indicates a subspace with a similar structure.

It turns out that, as we shall see in Chap. 11, isometries and self-adjoint operators
are indeed the essential part of the normal operators.
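At the matrix level, these two realizations are easy to check numerically. The following NumPy sketch (with randomly generated matrices, purely for illustration) confirms that Hermitian and unitary matrices are normal:

```python
import numpy as np

rng = np.random.default_rng(0)

def is_normal(A, tol=1e-10):
    """Check the defining relation A^dagger A == A A^dagger."""
    Ad = A.conj().T
    return np.allclose(Ad @ A, A @ Ad, atol=tol)

X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# A Hermitian matrix (A^dagger = A) ...
H = (X + X.conj().T) / 2
# ... and a unitary matrix (A^dagger A = 1_n), obtained from a QR decomposition.
Q, _ = np.linalg.qr(X)

print(is_normal(H), np.allclose(H.conj().T, H))               # True True
print(is_normal(Q), np.allclose(Q.conj().T @ Q, np.eye(3)))   # True True
```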

Summary

In physics, when referring to a vector space, it almost always implies an inner prod-
uct vector space. In this chapter, we covered everything related to an inner prod-
uct space. This mainly includes concepts associated with orthogonality. Previously
known results were reiterated, summarized, and supplemented.
The normal operators, those endomorphisms relating to the inner product, were
extensively motivated and discussed here with emphasis on their analogies to com-
plex and real numbers. It was noted that precisely these operators are the most
commonly used in physics.
The spectral theorem applies to normal operators. Here, it was shown for complex
vector spaces that normal operators, such as isometries, self-adjoint operators, and
nonnegative operators, possess the best properties regarding their eigenvalues and
eigenvectors.
It was demonstrated that normal operators are the only ones that have orthog-
onal and orthonormal eigenbases. Moreover, self-adjoint operators even have real
eigenvalues. This allows them to act as observables in quantum mechanics.
Finally, it was pointed out that most operators describing symmetries in physics
are elements of unitary or orthogonal groups, and they also belong to the set of normal
operators.

Exercises with Hints

In the first five exercises, you learn how to express a covector with the help of a
corresponding vector as scalar product. Furthermore, you learn to distinguish
a basis dependent isomorphism from a basis free one (canonical isomorphism)
with the example of a vector space .V and its dual .V ∗ .

Exercise 10.1 Riesz representation theorem. Let .V be an inner product vector space
and .ξ a covector (linear function, linear form, .ξ ∈ V ∗ := Hom(V, K)). Show that
there exists a unique vector .u such that we can express .ξ(v) with all .v ∈ V as a
scalar product:
.ξ(v) = (v|u).

Exercise 10.2 Let .(a1 , . . . , ar ) be a linearly independent list of vectors in .V not


necessarily orthonormal. Show that there exists a vector .u ∈ V such that .(as |u) is
positive for all .s ∈ I (r ).
Exercise 10.3 An isomorphism .V ≅ V ∗ depending on a basis . B.
Let .V be a vector space with . B = (b1 , . . . , bn ) a basis of .V and .V ∗ = Hom(V, R)
its dual with basis . B ∗ = (β 1 , . . . , β n ), such that .β i (bs ) = δsi , i, s ∈ I (n). Show that
there exists a map .ψ B : V → V ∗ , that is, an isomorphism between .V and .V ∗ .
Exercise 10.4 Let .V be an inner product vector space, . f ∈ Hom(V, V ), and .u, w ∈
V , such that
. f (v) = w(u|v) for all v ∈ V.

Show that . f is normal if and only if .u and .w are linearly dependent (.w = λu for
some .λ ∈ K).

Exercise 10.5 Canonical isomorphism .V ≅ V ∗ .
Let .(V, s) be an .n-dimensional Euclidean vector space with .s ≡ (|) a symmetric
positive definite bilinear form. Show that the map

$$
\hat{s} : V \longrightarrow V^{*}, \qquad
u \longmapsto \hat{s}(u) := s(u, \cdot\,) \equiv (u|\cdot\,) \equiv \hat{u} \in V^{*},
$$

is a canonical isomorphism between .V and .V ∗ .


Exercise 10.6 Transitive action (see Definition 1.6 ) of reflections on spheres.
Given two distinct vectors .u and .w with .||u|| = ||w||, show that there exists a third
vector .a ∈ V and a map

$$
s_a(v) = v - 2\,\frac{(a|v)}{(a|a)}\, a \quad\text{for all } v \in V,
$$

such that .sa (u) = w and .sa (w) = u.



Exercise 10.7 Use the Cauchy-Schwarz inequality to show that

$$
\Big(\sum_{i} \alpha_i \beta^i\Big)^{2} \le \Big(\sum_{i} \alpha_i \alpha^i\Big)\Big(\sum_{k} \beta_k \beta^k\Big),
$$

with .αi = α i , .βi = β i ∈ R, and .i, k ∈ I (n).

Exercise 10.8 Let .V be a vector space and .u, v ∈ V with .||u|| = ||v||. Show that
||αu + βv|| = ||βu + αv|| for all .α, β ∈ R.
.

Exercise 10.9 Let .V1 and .V2 be inner product vector spaces and .(V1 × V2 , s ≡
(·, ·|·, ·)) given by
.(u 1 , u 2 |v1 , v2 ) := (u 1 |v1 ) + (u 2 |v2 ).

Show that .s is an inner product in .V1 × V2 .

The next two exercises refer to Proposition 10.7 and to Comment 10.2.

Exercise 10.10 Let .U be a subspace of an inner product space .V and . PU an


orthogonal projection. Show the following assertions:
(i) . PU is a linear operator;
(ii) . PU |U = idU ;
(iii) . PU |U ⊥ = 0̂, the zero operator;

(iv) .ker PU = U ⊥ ;
(v) .im PU = U .

Exercise 10.11 Let .U be a subspace of an inner product space .V and . PU an


orthogonal projection. Show the following assertions:
(i) . PU2 = PU ;
(ii) For all .v ∈ V , .v − PU (v) ∈ U ⊥ ;
(iii) .idV − PU = PU ⊥ .

Exercise 10.12 Let .V be a vector space and . P an operator with . P 2 = P such that
.ker P is orthogonal to.im P. Check that. V = ker P ⊕ im P and show that there exists
a subspace .U such that . P = PU .

Exercise 10.13 Let.U be a subspace of a vector space.V and. f ∈ Hom(V, V ). Show


that if
. PU ◦ f ◦ PU = f ◦ PU ,

then .U is an . f -invariant subspace of . f .



Exercise 10.14 Let .U be a subspace of .V with basis . A = (a1 , . . . , ar ) and . B =


(a1 , . . . , ar , b1 , . . . , bl ) a basis of .V . The Gram-Schmidt procedure applied to . B
produces an orthonormal basis . E = (c1 , . . . , cr , d1 , . . . , dl ) = (C, D). Show that .C
and . D are orthonormal bases of .U and .U ⊥ respectively.
Exercise 10.15 Let
. f ∈ Hom(V, V ' )

be given by
. f (v) = w(u|v),

with .w ∈ V ' and .u, v ∈ V . Determine the adjoint operator . f ad . Write the result in
the Dirac formalism.
Exercise 10.16 Let
. F ∈ Hom(Kn , Kn )

be given by
. F[e1 · · · en ] = [0 e1 · · · en−1 ].

Determine . F ad ∈ Hom(Kn , Kn ).
Exercise 10.17 Let . f ∈ Hom(V, V ). Show that .λ ∈ K is an eigenvalue of . f if and
only if .λ̄ is an eigenvalue of . f ad .
Exercise 10.18 Let . f ∈ Hom(V, V ' ). Show that the following assertions hold:
(i) .dim ker f ad − dim ker f = dim V ' − dim V ;
(ii) .rank f ad = rank f .
Exercise 10.19 Let .V be a complex inner product vector space. Show that the set
of self-adjoint operators is not a complex vector space.
Exercise 10.20 Let . f, g ∈ Hom(V, V ) be self-adjoint operators. Show that .g ◦ f
is self-adjoint if and only if . f ◦ g = g ◦ f .
Exercise 10.21 Let .V be an inner product vector space and . P an operator with
P 2 = P. Show that . P is an orthogonal projection if and only if . P ad = P.
.

Exercise 10.22 Let . f ∈ Hom(V, V ) be a normal operator. Show that

. ker f ad = ker f and


im f ad = im f.
Exercise 10.23 Let . f ∈ Hom(V, V ) be a normal operator. Show that .ker f m =
ker f for every .m ∈ N.
Exercise 10.24 Let .V be a real inner product vector space and . f ∈ Hom(V, V ).
Show that . f is self-adjoint if and only if all pairs of eigenvectors corresponding to
distinct eigenvalues .λ1 , . . . , λr of . f are orthogonal and

. V = E(λ1 , f ) ⊕ · · · ⊕ E(λr , f ).
Chapter 11
Positive Operators–Isometries–Real
Inner Product Spaces

In this chapter, we proceed with special normal operators such as the nonnegative
operators. The analogical comparison of normal operators with the complex num-
bers continues in that we now introduce nonnegative operators which correspond to
nonnegative real numbers.
Subsequently, we discuss isometries. These are closely related to symmetries in
physics, in particular to symmetries in quantum mechanics.
We then use properties of operators in complex vector spaces to derive properties
of operators in real vector spaces. An instrument for this method is complexification
which we explain in detail. In this way, we obtain the spectral theorem for real normal
operators which have not been accessible so far.

11.1 Positive and Nonnegative Operators

As discussed in Sect. 10.5, the normal operators are precisely those operators that
have a remarkable analogy to the complex numbers. This analogy goes one step
further and extends to positive and nonnegative numbers. This analogy also allows
us to speak of the root of a nonnegative operator. It is useful to remember first the
situation with the complex numbers. A complex number .z is positive if .z is real and
positive. This is equivalent to the existence of some .w /= 0 such that .z = w̄w, or to
.z having a positive square root .√z > 0. Likewise, .z is nonnegative if .z is
positive or zero. Coming now to the operators, the expected analogy with the complex
numbers is the following: The self-adjoint operators correspond to real numbers; the
positive operators which are always self-adjoint, correspond to positive numbers,
and the nonnegative operators correspond, of course, to nonnegative numbers. This
leads to this definition:


Definition 11.1 Positive and nonnegative operators.


A self-adjoint operator . f on an inner product space .V is positive if .(v| f v) > 0
for all .v ∈ V \{0}, and nonnegative if .(v| f v) ≥ 0 for all .v ∈ V .

Warning: Confusingly, some authors use the term “positive” to mean “nonnegative”.

Definition 11.2 Square root.


An operator .g is a square root of another operator . f if .g 2 = f .

Proposition 11.1 below gives some properties of nonnegative operators.

Proposition 11.1 Nonnegative operator.


The following statements are equivalent:
(i) An operator . f is nonnegative.
(ii) The eigenvalues of . f are nonnegative.
(iii) . f has a nonnegative square root .g.

(iv) There exists an operator .g so that . f = g ad g.

Proof (i) .⇒ (ii)


Suppose (i) holds. Then . f is self-adjoint and .(v| f v) ≥ 0 for all .v ∈ V . By the spec-
tral theorem, there is an orthonormal eigenbasis .(c1 , . . . , cn ) of . f with eigenvalues
.(λ1 , . . . , λn ). Then for each . j ∈ I (n), .(c j | f c j ) = (c j |λ j c j ) = λ j (c j |c j ) = λ j ≥ 0.
This proves (ii).
Proof (ii) .⇒ (iii)
Suppose (ii) holds. We can define a linear operator .g on the basis .C = (c1 , . . . , cn ) by .g c j := √λ j c j . Then .√λ j ≥ 0 and .g 2 c j = √λ j √λ j c j = λ j c j for all . j ∈ I (n), so that .g 2 = f . This .g is a nonnegative square root of . f .
Proof (iii) .⇒ (iv)
Suppose (iii) holds. Then it follows that (iv) is also valid since .g is self-adjoint and
. f = g g = g ad g (and of course . f ad = f ).

Proof (iv) .⇒ (i)


If . f = g ad g holds, then .(v| f v) = (v|g ad gv) = (gv|gv) ≥ 0 by the definition of .(|).
So . f is nonnegative. This completes the proof of the Proposition. ∎
In the proof of (ii) ⇒ (iii), we have .λ j ≥ 0 and we chose for .g the nonnegative roots .√λ j ≥ 0. This means that the nonnegative square root .g of . f is uniquely defined. This is in perfect analogy to the complex numbers: a positive .z has a unique positive square root .√z > 0.
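The construction of the square root in the proof of (ii) ⇒ (iii) translates directly into a computation. Here is a minimal NumPy sketch; the matrix is chosen for illustration, and the input S = Fᵀ F is nonnegative by statement (iv):

```python
import numpy as np

def sqrt_nonnegative(S):
    """Nonnegative square root of a nonnegative self-adjoint matrix S,
    following the proof: g c_j = sqrt(lambda_j) c_j on an orthonormal eigenbasis."""
    eigvals, C = np.linalg.eigh(S)
    eigvals = np.clip(eigvals, 0.0, None)      # remove tiny negative round-off
    return C @ np.diag(np.sqrt(eigvals)) @ C.conj().T

F = np.array([[1.0, 2.0],
              [0.0, 3.0]])
S = F.T @ F                                    # nonnegative by construction
g = sqrt_nonnegative(S)

print(np.allclose(g @ g, S))                   # True: g is a square root of S
print(np.all(np.linalg.eigvalsh(g) >= -1e-12)) # True: g is itself nonnegative
```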

11.2 Isometries

We repeat the geometric definition of an isometry valid for every .K-vector space with
an inner product.

Definition 11.3 Isometry.


An operator . f on an inner product space .(V, (|)) is an isometry if it preserves
the norm
.|| f v|| = ||v|| for all v ∈ V.

In addition, if .K = C, an isometry . f is also called a unitary operator . f and if


.K = R, an isometry . f is called an orthogonal operator . f .

The term “orthogonal” is a tradition and actually means “orthonormal”!


The following proposition gives a series of equivalent descriptions of what an
isometry is.

Proposition 11.2 Equivalent formulations of an isometry.


If . f is an operator on .V , the following statements are equivalent:
(i) . f is an isometry (.|| f v|| = ||v||).
(ii) .( f u| f v) = (u|v) for all .u, v ∈ V .
(iii) For every orthonormal list .(c1 , . . . , ck ) of vectors in .V , .( f c1 , . . . , f ck )
is also orthonormal.
(iv) There exists an orthonormal basis .(c1 , . . . , cn ) of .V so that
.( f c1 , . . . , f cn ) is orthonormal.
(v) . f ad ◦ f = I dV = f ◦ f ad .
(vi) . f ad is an isometry.
(vii) . f is invertible and . f −1 = f ad .

This proposition specifies the following. (ii) shows that an isometry preserves the
inner product. It is compatible with the inner product and is therefore exactly what
we expect of an operator associated with the inner product.
(iii) and (iv) show that an isometry transforms an orthonormal basis into an
orthonormal basis.
(v), (vi), and (vii) show not only that an isometry is invertible, but also that we can
express this inverse simply by the adjoint .( f −1 = f ad ). This leads to the important
fact that the isometries form a group.

Proof (i) .⇒ (ii) We prove (ii) expressing the inner product by the norm, as given by
the polarization identity in Sect. 10.2.

Proof (ii) .⇒ (iii) For a given orthonormal list .(c1 , . . . , ck ) of the vectors .(ci |c j ) =
δi j i, j ∈ I (k), the preservation of the inner product gives

( f ci | f c j ) = (ci |c j ) = δi j ∀ i, j ∈ I (k).
.

Proof (iii) .⇒ (iv): If .(c1 . . . . , cn ) is an orthonormal basis in .V , we have as above


for .k = n
.( f ci | f c j ) = (ci |c j ) = δi j i, j ∈ I (n).

Proof (iv) .⇒ (v) Consider

(v| f ad ◦ f v) = ( f v| f v) = (v|v) = (v|idV v)


.

for all .v ∈ V and so we have .(v|( f ad ◦ f − idV )v) = 0. Since . f ad ◦ f is self-adjoint, Proposition 10.11, valid for complex and real vector spaces, gives . f ad ◦ f − idV = 0, i.e., . f ad ◦ f = idV . It follows that . f is invertible with . f −1 = f ad , which means

. f ad ◦ f = f ◦ f ad = idV .
Proof (v) .⇒ (vi)

.|| f ad v||2 = ( f ad v| f ad v) = (v| f ◦ f ad v) = (v|idV v) = (v|v) = ||v||2 .

So . f ad is also an isometry.

Proof (vi) .⇒ (vii) If . f ad is an isometry, then, applying the implication (i) ⇒ (v) to . f ad and using .( f ad )ad = f , we obtain

. f ad ◦ f = idV = f ◦ f ad ,

so . f is invertible and

. f ad ◦ f ◦ f −1 = f −1 ⇒ f ad = f −1 .

Proof (vii) .⇒ (i) If . f −1 = f ad , then

.idV = f ad ◦ f,

with
.(v| f ad f v) = ( f v| f v)

gives

|| f v||2 = ( f v| f v) = (v| f ad ◦ f v) = (v|idV v) = (v|v) = ||v||2 .


.

The sequence of proofs (i) .⇒ (ii) .⇒ · · · ⇒ (vii) .⇒ (i) is complete and so is the proof
of the Proposition. ∎
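Several of the equivalent statements can be checked numerically for a concrete isometry. A short NumPy sketch (the unitary matrix is generated randomly, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# A unitary matrix (an isometry of C^3), obtained from a QR decomposition.
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Q, _ = np.linalg.qr(X)

v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))   # (i)   the norm is preserved
print(np.allclose(Q.conj().T @ Q, np.eye(3)))                 # (v)   Q^ad Q = id
print(np.allclose(np.linalg.inv(Q), Q.conj().T))              # (vii) Q^{-1} = Q^ad
```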

The structure of isometries was given essentially by the spectral theorems in Sect.
10.6 since isometries are a subset of normal operators. However, here we have to
distinguish between .C- and .R-vector spaces. This is also clear with the following
two corollaries.

Corollary 11.1 Complex spectral theorems for isometries.


An operator . f is an isometry on a complex vector space .V if and only if .V
has an orthonormal basis of eigenvectors of . f and all eigenvalues have the
absolute value .1 (i.e., .|λ| = 1).

Proof We have only to prove

|λi | = 1 for every i ∈ I (n).


.

For an orthonormal eigenbasis .(c1 , . . . , cn ) of . f , we have . f ci = λi ci and .|| f ci || =


||ci || = 1 so that

1 = ||ci || = || f ci || = ||λi ci || = |λi |||ci || = |λi |.


.

Conversely, if . f has such an eigenbasis, then for each eigenvector .ci ,

|| f ci || = ||λi ci || = |λi | ||ci || = ||ci ||


.

so that by linearity and Pythagoras,

.|| f v|| = ||v|| ∀ v ∈ V,

so . f is an isometry. ∎
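A short numerical illustration of the corollary (the unitary matrix is again generated randomly, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)

# Eigenvalues of a unitary matrix all lie on the unit circle.
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(X)                                  # a unitary matrix

eigvals, C = np.linalg.eig(Q)
print(np.abs(eigvals))                                  # all entries are 1 (up to round-off)
print(np.max(np.abs(C.conj().T @ C - np.eye(4))))       # close to 0: orthonormal eigenbasis
```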

Corollary 11.2 Real spectral theorem for isometries.


Let . f be an operator on a real inner product vector space .V , then . f has an
orthonormal eigenbasis with eigenvalues of absolute value .1 if and only if it
is both, orthogonal (isometry) and self-adjoint.

Proof .⇒
If .V has an orthonormal basis .(c1 , . . . , cn ) of eigenvectors, . f c j = λ j c j , and

.|λ j | = 1, j ∈ I (n), then . f is self-adjoint .( f ad = f ) by the real spectral theorem (Theorem 10.2), since the eigenvalues are real .(λ j ∈ R). It follows for every . j ∈ I (n):

( f ad ◦ f )(c j ) = ( f ◦ f )(c j ) = f ( f c j ) = f (λ j c j ) = λ2j c j = 1c j .


.

So we have. f ad ◦ f = I dV which means that. f is orthogonal. So. f is both orthogonal


and self-adjoint. This was one direction of the proof. ∎
Proof .⇐
If. f is self-adjoint and orthogonal at the same time, then, by the real spectral theorem,
an orthonormal eigenbasis .(c1 , . . . , cn ) exists with . f ci = λi ci . By the fact that . f is
also orthogonal, we have .|| f ci || = ||ci || and so

1 = ||ci || = || f ci || = ||λi ci || = |λi |||ci || = |λi |1 = |λi |.


.

This is the other direction of the proof. ∎


This corollary shows that an orthogonal operator must be self-adjoint in order to be diagonalizable over .R. This means that, in general, an orthogonal operator will be nondiagonalizable (see Examples 9.10 and 9.17). Hence, we have to discuss separately normal
operators and particularly isometries in a real inner product space.

11.3 Operators in Real Vector Spaces

It is easier to deal with operators in complex vector spaces than in real vector spaces.
As we already saw, over real vector spaces it is only in the very special case of an operator whose characteristic polynomial splits into linear factors that it is possible to proceed analogously to the case of complex vector spaces. Therefore, we generally
expect the structure of normal operators in a real vector space to be quite different
from the corresponding case in a complex vector space. In contrast to the complex
spectral theorem, a rotation on a two-dimensional real vector space is generally not
diagonalizable. However, as we know, when going from .R to .C and even from .Rn to
.Cn , there are connections between real and complex vector spaces. These connections

also affect the behavior of the corresponding operator. Therefore, it is reasonable to


examine how the results in the complex case affect the operators in the real case.
The instrument for such a procedure is complexification. As .R is embedded natu-
rally in .C and .Rn in .Cn , a real vector space .U can be embedded naturally in a complex
vector space .Uc , the complexification of .U .
The best way to get a feeling for this rather abstract situation with operators, is
to consider what happens with matrices. How do we obtain information about real
matrices from what we know about the structure of complex matrices? And yet, this
is not the first time we are faced with complexification!
The introduction of complex numbers was already a case of complexification,
though on the lowest possible level. It is instructive to recall the steps to construct

.C = Rc from .R:

.Rc ≅ C ≅ R × R = {z = (ζ0 , ζ1 ) : ζ0 , ζ1 ∈ R}.

What is completely new about .Rc ≅ C with respect to .R × R is that we have an
additional notion of complex multiplication. Complex multiplication on .R2 makes it
starting with the following procedure, taking

. z := (ζ0 , ζ1 ) = ζ0 + iζ1 , x := (ξ0 , ξ1 ) = ξ0 + iξ1 ,

with
ζ , ζ1 , ξ0 , ξ1 ∈ R and i := (0, 1),
. 0

we obtain:

. C × Rc −→ Rc
(z, x) |−→ zx := (ζ0 ξ0 − ζ1 ξ1 , ζ0 ξ1 + ζ1 ξ0 ).

The above definition implies that

ii = (0, 1)(0, 1) = (−1, 0)


.

which justifies the identification

C = Rc = R × R = R + iR.
.

We are ready to generalize the above formalism to a real vector space .U . We hope
that the attentive reader will also accept our choice of the index .(0, 1) instead of
.(1, 2): we write .(ζ0 , ζ1 ) instead of .(ζ1 , ζ2 )!

Definition 11.4 Complexification of .U .


The complexification .Uc of .U is given by .Uc := U × U = U + iU and the
complex scalar multiplication by

.C × Uc −→ Uc ,
(ζ0 + iζ1 , u 0 + iu 1 ) | −→ (ζ0 + iζ1 )(u 0 + iu 1 ) := ζ0 u 0 − ζ1 u 1 + i(ζ0 u 1 + ζ1 u 0 ),
ζ0 , ζ1 ∈ R and u 0 , u 1 ∈ U.

It is important to distinguish between the three vector spaces .U , .U × U , and .Uc . Both .U and .U × U are real vector spaces, with real dimensions .dim U = n and .dim U × U = 2n. The vector space .Uc , however, is a complex vector space with the definition given
above. Still, nobody can hinder us from also considering .Uc as a real vector space
given by .Uc = U × U with dimension .dimR Uc = 2n of course! But what is the
dimension of .Uc as a complex vector space?
For the dimension of .Uc (where we denote .dim Uc ≡ dimC Uc ), a basis of .U plays
again a central role:

Proposition 11.3 Basis of .Uc .


If .U is a real vector space (.dim U = n) with a basis . B = (b1 , . . . , bn ), then
this same . B is a basis of the complex vector space .Uc .

Proof We have only to test that the list . B is linearly independent and spanning. Set-
ting .bs λs = 0, with .λs := λs0 + iλs1 , λs0 , λs1 ∈ R, s ∈ I (n), we obtain .bs (λs0 + iλs1 ) =
bs λs0 + ibs λs1 = 0. This means that .bs λs0 = 0 and .bs λs1 = 0.
From the linear independence of . B in .U , we obtain .λs0 = 0, λs1 = 0 and so .λs = 0
for all .s ∈ I (n), and . B is linearly independent in .Uc .
. B is also spanning in .Uc : If .v ∈ Uc , v = (v0 , v1 ) ∈ U × U , we have .v0 = bs λs0 and .v1 = bs λs1 since . B is a basis in .U . So we obtain

.v = (v0 , v1 ) = (bs λs0 , bs λs1 ) = bs (λs0 , λs1 ) = bs (λs0 + iλs1 ) = bs λs
with .λs ∈ C. So . B is indeed spanning in .Uc and . B is a basis in .Uc . ∎

Corollary 11.3 The dimension of .Uc is .dim Uc ≡ dimC Uc = n = dimR U ≡


dim U .


The extension of .U to .Uc = U + iU leads as expected to the question of whether
one can extend maps . f : U → V on real vector spaces to maps . f c : Uc → Vc on
their complexifications. Indeed, one can, with the following definition.

Definition 11.5 Complexification of . f .


If .U is a real vector space, and . f : U → V a real linear map, the complexifi-
cation . f c of . f, f c ∈ Hom(Uc , Vc ) is given with .u 0 , u 1 ∈ U by

. f c (u 0 + iu 1 ) := f u 0 + i f u 1 .

This means equivalently that we have

. f c ≡ f × f : U × U −→ V × V
(u 0 , u 1 ) |−→ ( f u 0 , f u 1 ).

Remark 11.1 . f c is .C-linear.


Indeed, . f c is a .C-linear operator, as .v = u 0 + iu 1 and .λ = λ0 + iλ1 , u 0 , u 1 ∈
U, λ0 , λ1 ∈ R, we obtain

. f c (λv) = f c ((λ0 + iλ1 )(u 0 + iu 1 )) = f c (λ0 u 0 − λ1 u 1 + i(λ0 u 1 + λ1 u 0 ))
= f (λ0 u 0 − λ1 u 1 ) + i f (λ0 u 1 + λ1 u 0 )
= λ0 f u 0 − λ1 f u 1 + i(λ0 f u 1 + λ1 f u 0 )
= (λ0 + iλ1 )( f u 0 + i f u 1 )
= λ f c (v).

We can of course apply the notion of complexification immediately to matrices


Rn×n ⊆ Cn×n because a matrix . A ∈ Rn×n can be understood as an operator, . A ∈
.

End(Rn ). So for a real matrix . A

. A : Rn −→ Rn , u→ |−→ Au→ ,
. Ac : Cn −→ Cn , u→0 + i u→1 |−→ Ac (u→0 + i u→1 ) := Au→0 + i Au→1 .

This is obviously what we would do intuitively: Science is nothing but intuition that
became rational.
In this context, the question of how to represent . f c can easily be answered: If we
have . f B B = F ∈ Rn×n , then .( f c ) B B = F again. This follows from the fact that, as
the above proposition shows, any basis of .U is also a basis of .Uc . With the above

notation we have:

. f bs = bt ϕst ϕst ∈ R s, t ∈ I (n) and


f c bs = bt ϕst .

This is consistent with . F = (ϕst ) ∈ Rn×n and . Fc = (ϕst ) ∈ Rn×n ⊆ Cn×n .


The last but not trivial question is what happens with the eigenvalues and eigen-
vectors of . f and . f c . This leads us back to our original discussion of what can be
learned from the complexification . f c of . f for the . f itself. The best way to under-
stand this problem is to consider a simple example at the level of matrices in low
dimensions.
We start with a real matrix in two dimensions, a rotation through the angle .π/2 which
we know to be normal but nondiagonalizable:
$$
A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\quad\text{and}\quad
A_c = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.
$$

The characteristic polynomial is given by


$$
\chi_A(x) = \det\!\big(x\,\mathbb{1}_2 - A\big)
= \det \begin{bmatrix} x & 1 \\ -1 & x \end{bmatrix} = x^2 + 1
$$

and it has no real solution. This means that the operator

. A : R2 −→ R2
u→ |−→ Au→

has no eigenvalues and the spectrum (i.e., the set of eigenvalues) of . A is empty, given
by .σ (A) = ∅. The complexification operator is given by

. Ac : C2 −→ C2 , v→ |−→ Ac v→ := Av→ ,

with .v→ = u→0 + i u→1 ≡ (u→0 , u→1 ), u→0 , u→1 ∈ R2 . The eigenvalues of . Ac are the solutions of

.χ A (x) = x 2 + 1 = 0 : λ1 = i and λ2 = −i.

So the spectrum of . Ac is .σ (Ac ) = {i, −i}. For the corresponding eigenvectors, we may choose the two orthogonal vectors
$$
b_1 = \begin{bmatrix} 1 \\ -i \end{bmatrix}
\quad\text{and}\quad
b_2 = \begin{bmatrix} 1 \\ i \end{bmatrix}.
$$
We therefore have an eigenbasis given by . B = (b1 , b2 ) with .⟨b1 |b2 ⟩ = 0.
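A short NumPy check of this example (the eigenvalues of the complexification and the orthogonality of its eigenvectors):

```python
import numpy as np

# The real rotation by pi/2 has no real eigenvalues; its complexification has +-i.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# Over R: the characteristic polynomial x^2 + 1 has only the non-real roots +-i.
print(np.roots([1.0, 0.0, 1.0]))                 # [ 0.+1.j  0.-1.j ]

# Over C: A_c acts by the same matrix on C^2.
eigvals, B = np.linalg.eig(A.astype(complex))
print(eigvals)                                   # i and -i
print(np.isclose(B[:, 0].conj() @ B[:, 1], 0))   # True: the eigenvectors are orthogonal
```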
After this short excursion, we want to find out what we can generally learn about
an operator on a real vector space from an operator on a complex vector space. As we
already saw, every operator in a complex vector space can be triangularized. This is
possible since every operator on a complex vector space has at least one eigenvalue.

This is not the case for every operator on a real vector space. But the existence of
an eigenvalue in the complex setting guarantees at least the presence of a one- or two-dimensional invariant subspace of the real operator. This is the content of the following proposition.

Proposition 11.4 . f -invariant spaces in real vector spaces.


Every operator in a real vector space has an invariant subspace of dimension
1 or 2.

Proof If .V is a real vector space, . f ∈ End(V ) and . f c ∈ End(Vc ), then there exists
an eigenvalue .λ ∈ C of . f c :
. f c v = λv.

We can write .λ = λ0 + iλ1 , λ0 , λ1 ∈ R and .v = u 0 + iu 1 , u 0 , u 1 ∈ V , so we have


f v = f u 0 + i f u 1 and the eigenvalue equation . f u 0 + i f u 1 = (λ0 + iλ1 )(u 0 +
. c

iu 1 ). This leads to

. f u 0 = λ0 u 0 − λ1 u 1 and f u 1 = λ0 u 1 + λ1 u 0 .

We may define .U := spanR (u 0 , u 1 ) and, as seen above, .U is an . f -invariant subspace


of .V with dimension 1 or 2. ∎

11.4 Normal Operators on Real Vector Spaces

We can now give a complete description of normal operators on real vector spaces.
The prerequisite for the notion of the normal operator is, of course, the existence of
an inner product vector space. So here, we are dealing with normal operators on a
Euclidean vector space. We also know from the previous section that every operator
on a real or a complex vector space has an invariant subspace of dimension 1 or 2.
Therefore, we must first describe the complete prescription of normal operators in
dimensions 1 and 2. For that purpose and to prepare the following procedure, it is
advantageous to primarily discuss the very pleasant properties of normal operators
in real and complex vector spaces, in the context of their restrictions on invariant
subspaces. The following two propositions illustrate this.

Lemma 11.1 Normal matrices.

A block matrix
$$
F = \begin{bmatrix} A & C \\ 0 & D \end{bmatrix}
$$
with the blocks . A, C, and . D is normal if and only if the matrix .C is zero (.C = 0) and the matrices . A and . D are normal.

Proof Given
$$
F = \begin{bmatrix} A & C \\ 0 & D \end{bmatrix}
\quad\text{and}\quad
F^{\dagger} = \begin{bmatrix} A^{\dagger} & 0 \\ C^{\dagger} & D^{\dagger} \end{bmatrix},
$$
we have
$$
F^{\dagger} F =
\begin{bmatrix} A^{\dagger} & 0 \\ C^{\dagger} & D^{\dagger} \end{bmatrix}
\begin{bmatrix} A & C \\ 0 & D \end{bmatrix}
= \begin{bmatrix} A^{\dagger} A & A^{\dagger} C \\ C^{\dagger} A & C^{\dagger} C + D^{\dagger} D \end{bmatrix}
$$
and
$$
F F^{\dagger} =
\begin{bmatrix} A & C \\ 0 & D \end{bmatrix}
\begin{bmatrix} A^{\dagger} & 0 \\ C^{\dagger} & D^{\dagger} \end{bmatrix}
= \begin{bmatrix} A A^{\dagger} + C C^{\dagger} & C D^{\dagger} \\ D C^{\dagger} & D D^{\dagger} \end{bmatrix}.
$$

If . F is normal, then
. F † F = F F †.

This leads to the condition


. A† A = A A† + CC † . (11.1)

Since, at this point, we cannot yet assume that . A† A = A A† , we proceed by taking the trace of the above equation. We first need the following definition:

Definition 11.6 Trace of a matrix.


The trace .tr(A) of a square matrix . A = (αsi ) ∈ Kn×n is the sum of the diagonal
entries of . A:
$$
\operatorname{tr}(A) := \sum_{s=1}^{n} \alpha^{s}_{s} \equiv \sum_{s=1}^{n} \alpha_{ss}.
$$

The expression .tr(A† A) is very interesting: firstly, the symmetry equation .tr(A† A) =
tr(A A† ) and secondly .tr(A† A) is a sum of squares so we have

$$
\operatorname{tr}(A^{\dagger} A) = \sum_{s,\mu} \overline{\alpha^{\mu}_{s}}\,\alpha^{\mu}_{s}
= \sum_{s,\mu} \alpha^{\mu}_{s}\,\overline{\alpha^{\mu}_{s}} = \operatorname{tr}(A A^{\dagger}) \tag{11.2}
$$

and

$$
\operatorname{tr}(A^{\dagger} A) = \sum_{s,i=1}^{n} |\alpha^{i}_{s}|^{2}. \tag{11.3}
$$

Taking the trace of Eq. (11.1) leads to

. tr(A† A) = tr(A A† ) + tr(CC † ). (11.4)



With the result of Eq. (11.2), we obtain

. tr(CC † ) = 0. (11.5)

From the result in Eq. (11.3), applied to .C, it follows that .C = 0. But then Eq. (11.1) gives . A† A = A A† , and comparing the remaining blocks of . F † F and . F F † gives . D † D = D D † , so . A and . D are normal; the converse direction is immediate. ∎
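Both the trace identities and the key step of the lemma can be illustrated numerically; in the following NumPy sketch the matrices are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# Eqs. (11.2)/(11.3): tr(A^dagger A) = tr(A A^dagger) = sum of |entries|^2.
print(np.isclose(np.trace(A.conj().T @ A), np.trace(A @ A.conj().T)))   # True
print(np.isclose(np.trace(A.conj().T @ A), np.sum(np.abs(A) ** 2)))     # True

# Lemma 11.1 on a small example: a block upper triangular matrix with normal
# diagonal blocks is normal only if the off-diagonal block vanishes.
A1, D1 = np.eye(2), 2 * np.eye(2)               # both blocks are normal
C1 = np.array([[1.0, 0.0], [0.0, 0.0]])         # nonzero off-diagonal block
F = np.block([[A1, C1], [np.zeros((2, 2)), D1]])
print(np.allclose(F.conj().T @ F, F @ F.conj().T))   # False: F is not normal
```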

Proposition 11.5 Restriction of a normal operator.


Let . f be a normal operator . f on an inner product vector space .V , and .U an
. f -invariant subspace of . V . Then

(i) The subspace .U ⊥ is also . f -invariant;
(ii) .U and .U ⊥ are . f ad -invariant;
(iii) The restrictions to .U satisfy .( f |U )ad = ( f ad )|U ;
(iv) . f |U and . f |U ⊥ are normal.

Remark 11.2 This means that, if we use the notation . fU := f |U , we have


. fU , fUad ∈ End(U ) and . fU , fU ⊥ , fUad , f ad U ⊥ are normal.

Proof (i)
Let.Cr := (c1 , . . . , cr ) be an orthonormal basis of.U . We extend.Cr to an orthonormal
basis .C of .V , .C = (Cr , Bs ) = (c1 , . . . , cr , b1 , . . . , bs ), so that .r + s = n = dim V .
Then . Bs , being orthogonal to .U , is a basis of .U ⊥ . Since .U is . f -invariant, the
representation of . f , with respect to the basis .C, is given by the block matrix . f C ≡ F:
$$
F = \begin{bmatrix} F_1 & F_2 \\ 0 & F_3 \end{bmatrix}.
$$

Since . f is normal, . F is a normal matrix. By Lemma 11.1, . F2 = 0 and . F is a block


diagonal matrix:
$$
F = \begin{bmatrix} F_1 & 0 \\ 0 & F_3 \end{bmatrix}.
$$

This shows instantly that .U ⊥ is . f -invariant. ∎


Proof (ii)
The matrix of . f ad , with respect to the basis .C, is given by
$$
f^{\mathrm{ad}}_C \equiv F^{\dagger} = \begin{bmatrix} F_1^{\dagger} & 0 \\ 0 & F_3^{\dagger} \end{bmatrix}
$$

and this shows again that .U and .U ⊥ are . f ad -invariant. ∎



Proof (iii) First proof.


For .u 1 , u 2 ∈ U and the definitions . f |U (u 1 ) = f u 1 and . f ad |U (u 2 ) = f ad (u 2 ) using
.f
ad
(U ) < U from (ii), we obtain

(( f |U )ad u 2 |u 1 ) = (u 2 | f |U u 1 ) = (u 2 | f u 1 ) = ( f ad u 2 |u 1 ) = ( f ad |U (u 2 )|u 1 )
.

which shows that .( f |U )ad = f ad |U . This also justifies the notation . f ad U :

. f ad U := f ad |U !

Proof (iii) Second proof.


The representation matrix . f Cad ≡ F † shows this result directly: we have .( f |U )Cr = F1 and .( f ad |U )Cr = F1† . ∎

Proof (iv) First proof.


Observe:

. fUad fU = ( f ad )U fU = ( f ad f )U = ( f f ad )U = fU ( f ad )U = fU fUad .

This shows that . fU is normal. ∎

Proof (iv) Second proof.


It follows from Lemma 11.1 that . F normal is equivalent to . F1 and . F3 being
normal. ∎

This completes our preparation for normal operators. As we know from the last
section, every normal operator in a Euclidean vector space has a one-dimensional or a
two-dimensional invariant subspace. We will now determine the structure of normal operators, starting with these low-dimensional cases.
Nothing needs to be said in the one-dimensional case because, in this case, every
operator is a normal operator and invariant subspace here means eigenspace. The
two-dimensional case is not quite trivial. It is clarified in the following proposition.

Proposition 11.6 Normal operators in a two-dimensional real vector space.

Suppose a normal operator. f on a two-dimensional Euclidean vector space


. V , has no one-dimensional . f -invariant subspace (i.e., no real eigenvalues).
In that case, every orthonormal basis gives a representation of . f by a matrix
of the form:
$$
F = \begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix}
$$

with .α, β ∈ R, β /= 0. Without loss of generality, we may also assume that


.β > 0.

Proof For any orthonormal basis .C = (c1 , c2 ), the matrices . f C ≡ F and . f Cad ≡ F T
are given by
$$
F = \begin{bmatrix} \alpha & \gamma \\ \beta & \delta \end{bmatrix}
\quad\text{and}\quad
F^{T} = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}.
$$

Note that . f is normal precisely when . F is normal: . F T F = F F T . This leads to the


two independent equations
$$
\alpha^2 + \beta^2 = \alpha^2 + \gamma^2 \tag{11.6}
$$

and
. αβ + γ δ = αγ + βδ. (11.7)

From Eq. (11.6), we obtain

γ 2 = β 2 and γ = ±β.
.

We exclude the case .γ = β (and in particular the case .β = 0) because it leads to . F T = F. But a symmetric matrix has a real eigenvalue and hence a one-dimensional . f -invariant eigenspace, contrary to our assumption. So we have to take .β /= 0 and .γ = −β. From Eq. (11.7), we obtain .αβ − βδ =
−αβ + βδ and .α − δ = −α + δ which leads to .α = δ and to
$$
F = \begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix}.
$$

After having obtained all the above results, we expect that a normal operator. f on a
real inner product vector space will have an orthogonal decomposition, consisting of
normal operators restricted to one-dimensional or two-dimensional Euclidean vector
spaces. This means that we will have an orthogonal decomposition of .V , we use the
symbol “.Θ” for it:

. V = U1 Θ · · · Θ Ui Θ · · · Θ Us s ∈ N, i ∈ I (s),

with .dim Ui = 1 or .dim Ui = 2 and the corresponding decomposition of . f :

. f = f1 Θ f2 Θ · · · Θ fi Θ · · · fs ,

with normal . f i ∈ End(Ui ) and if .dim Ui = 2 with . f i of the type given in the previous
proposition. This is given in the next theorem.

Theorem 11.1 Normal operators in real vector spaces.


Let . f be an operator on a real inner product vector space .V . Then the
following are equivalent:
(i) . f is normal;
(ii) There is an orthogonal decomposition of .V into one-dimensional or two-
dimensional. f -invariant subspaces of.V , with a corresponding orthonor-
mal basis of .V with respect to which . f has a block diagonal represen-
tation such that each block is a .1 × 1 matrix (with . Fi = αi ) or a .2 × 2
matrix of the form [ ]
αi −βi
. Fi = ,
βi αi

with .αi , βi ∈ R, .s ∈ N, i ∈ I (s) and .βi > 0.

Proof We prove (ii) by induction on .dim V . We assume that .dim V ≥ 3 (if .dim V = 1 or .dim V = 2 there is nothing to be proven). Let .U be a one-dimensional . f -invariant subspace of .V or, if no such one-dimensional subspace exists, a two-dimensional . f -invariant subspace of .V according to Proposition 11.6.
If .dim U = 1, any nonzero vector of .U is an eigenvector of . f in .U . We normalize
this vector to norm 1 and we obtain an orthonormal basis in .U . In this case, the
matrix of . f |U is given by . F1 = (α1 ), α1 ∈ R. If .dim U = 2, then the matrix of . f |U
is given by Proposition 11.6. This matrix has, with respect to an orthonormal basis
of .U , the form:
$$
F_1 = \begin{bmatrix} \alpha_1 & -\beta_1 \\ \beta_1 & \alpha_1 \end{bmatrix},
$$

with .α1 , β1 ∈ R and .β1 > 0.


The subspace .U ⊥ of .V has fewer dimensions than .V . According to Proposition
11.5, .U ⊥ is also an . f -invariant subspace of .V and . f |U ⊥ is a normal operator in
⊥ ⊥
.U . Therefore we can apply the hypothesis of induction on .U : There exists an

orthonormal basis with respect to which the matrix of . f |U ⊥ has the expected block

diagonal form. This basis of .U ⊥ , together with the basis of .U , gives an orthonormal
basis of .V with respect to which the matrix of . f has the form given in the above
theorem. ∎

Modulo reordering the basis vectors of an orthonormal basis .C, we obtain directly
the following corollary.

Corollary 11.4 Spectral theorem for orthogonal operators.


Let . f be an orthogonal operator in a Euclidean vector space .V with .dim V =
n. Then there exists an orthonormal basis.C such that the matrix. F representing
. f can be given in the following block diagonal form with dimensions .k + l +
2r = n:

. F = 1 Θ · · · Θ 1 Θ (−1) Θ · · · Θ (−1) Θ F1 Θ · · · Θ F j Θ · · · Θ Fr .

That is, there are .k trivial eigenvalues with .λ = 1, .l trivial eigenvalues


with .λ = −1 and orthogonal matrices . F j , j ∈ I (r ) given by the angles
.ϕ j ∈ [0, 2π], ϕ j / = 0, π, 2π and

$$
F_j = \begin{bmatrix} \cos\varphi_j & -\sin\varphi_j \\ \sin\varphi_j & \cos\varphi_j \end{bmatrix}.
$$
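A short NumPy sketch which builds an orthogonal matrix in exactly this block diagonal form and checks its properties (computing the block form of a given matrix would require a real Schur decomposition, which we do not use here); the angle is chosen for illustration:

```python
import numpy as np

# An orthogonal operator in the block diagonal form of the corollary:
# one eigenvalue +1, one eigenvalue -1, and one 2x2 rotation block F_1.
phi = 0.7
F1 = np.array([[np.cos(phi), -np.sin(phi)],
               [np.sin(phi),  np.cos(phi)]])
F = np.zeros((4, 4))
F[0, 0], F[1, 1] = 1.0, -1.0
F[2:, 2:] = F1

print(np.allclose(F.T @ F, np.eye(4)))                  # True: F is orthogonal
print(np.sort_complex(np.linalg.eigvals(F)))            # -1, exp(-i*phi), exp(+i*phi), 1
print(np.allclose(np.abs(np.linalg.eigvals(F)), 1.0))   # all eigenvalues on the unit circle
```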

Summary

In this chapter, we first briefly discussed the nicest operators in mathematics: non-
negative operators and isometries. Both are special, normal operators and therefore
subjects of the spectral theorem.
Our main concern was to examine operators in real vector spaces, and especially
normal operators in real inner product spaces.
Here, the question of diagonalization was much more challenging than in complex
vector spaces.
The process of complexification was extensively discussed. This approach allows
us to relate the question of diagonalization to the known results in complex vector
spaces. In this way, the spectral theorem was also applied to real vector spaces.

Exercises with Hints

Exercise 11.1 Show that the sum of two positive operators is a positive operator.

Exercise 11.2 Show that a nonnegative operator is positive if and only if it is


invertible.

Exercise 11.3 Show that a nonnegative operator in a vector space .V has a unique
nonnegative square root.

In the next exercises, we consider symmetric matrices . S (symmetric opera-


tors). We define the vector space of symmetric matrices (see Exercise 2.10) by
.Sym(n) := {S ∈ Rn×n : S T = S}.

Exercise 11.4 Show that .dim Sym(n) = 21 n(n + 1).

Exercise 11.5 Let . F be a matrix . F ∈ Rn×n and .φ F : Sym(n) → Rn×n be the linear
map given by .φ F (S) := F T S F. Prove the following assertions:
(i) .φ F ∈ End(Sym(n));
(ii) .φ F is bijective if and only if . F is invertible.
Exercise 11.6 Consider the function . f (x) = x T Sx to show that the matrix
$$
S = \begin{bmatrix} \alpha & \beta \\ \beta & \delta \end{bmatrix} \in \operatorname{Sym}(2)
$$
is positive definite if and only if

α > 0 and αδ − β 2 > 0.


.

Exercise 11.7 Let . S = (σis ) ∈ Sym(n) with .i, s ∈ I (n). Show that the following
conditions are necessary for . S to be positive definite.
(i) .σii > 0;
(ii) .σii σss − σis2 > 0 for every .i < s.

Exercise 11.8 Let . S ∈ Sym(n) and . F ∈ Gl(n). Show that the following assertions
are equivalent.
(i) . S is positive definite;
(ii) . F T S F is positive definite.

Exercise 11.9 Criterion for positive definite matrices.


For . S = (σis ) with .i, s ∈ I (n), show that the following assertions are equivalent.
(i) . S is positive definite;
(ii) There exists . F ∈ Gl(n) such that . S = F T F;
(iii) All main minors
$$
\delta_k = \delta_k(S) := \det \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1k} \\ \vdots & & \vdots \\ \sigma_{k1} & \cdots & \sigma_{kk} \end{bmatrix}
$$
are positive.

Exercise 11.10 For . S = (σis ) with .i, s ∈ I (n) and . S ∈ Sym(n), show that for any
m ∈ N the following assertions hold.
.

(i) .ker S m = ker S;


(ii) .im S m = im S;
(iii) .rank S m = rank S.
Exercise 11.11 Let . S, T be positive definite matrices and suppose . ST = T S. Show
that .T S is also positive definite.
Exercise 11.12 Let. S, T ∈ Sym(n) and. S be positive definite. Show that there exists
some . B ∈ Gl(n) with . B T S B = 1n and . B T T B = D with . D diagonal.
Exercise 11.13 Check that the statements of Exercises 11.6, 11.7, and 11.8, also
hold for positive semidefinite (nonnegative) matrices if we interchange the symbols
.> and .≥.

Exercise 11.14 Criterion for positive semidefinite matrices.


For . S = (σis ) ∈ Sym(n), i, s ∈ I (n), show that the following assertions are equiva-
lent:
(i) . S is positive semidefinite (nonnegative);
(ii) There exists . F ∈ Rn×n such that . S = F T F.
Exercise 11.15 Let . S ∈ Sym(n) be positive semidefinite. Show that . S is positive definite if and only if .det S /= 0.
Exercise 11.16 Let . A ∈ R2×2 be given by
$$
A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}.
$$
Show that . A has real eigenvalues if and only if
$$
(\alpha - \delta)^2 + 4\beta\gamma \ge 0.
$$

The next exercise shows again the analogy between real numbers and self-adjoint operators. We have .x 2 + 2βx + γ = (x + β)2 + γ − β 2 > 0 if .γ − β 2 > 0. We may expect a similar relation with a self-adjoint operator . f in place of the real number . x .

Exercise 11.17 If . f ∈ Hom(V, V ) is self-adjoint and .β, γ ∈ R such that .(γ −


β 2 ) > 0, then show that
. f 2 + 2β f + γ id

is invertible.
Chapter 12
Applications

In the following three subsections, we will discuss some of the most important special
cases and applications of standard operators and linear maps, generally. We start with
orthogonal operators, isometries in real vector spaces, including reflections.
Next, we return to linear maps, .Hom(V, V ' ), on inner product vector spaces and
explain in some detail the role of singular value decomposition (SVD). This leads
smoothly to the polar decomposition of square matrices.
We finally discuss Sylvester’s law of inertia and briefly investigate its connection with special relativity.

12.1 Orthogonal Operators–Geometric Aspects

Prime examples of operators contain what we usually call rotations and reflections.
Since we have described rotations in the previous Chap. 11, we now broaden our
study to reflections too. In two dimensions, the reader ought to be familiar with
both. In higher dimensions, the story becomes a little less straightforward. Some
important observables in physics are connected mainly with reflections, such as in
quantum mechanics and elementary particle physics. It is interesting to notice that
some observables which are also described by reflections, are involved in some of
the still unsolved problems in physics. As example, we mention observables which
are connected with the CPT theorem.
We start, as in the previous sections, with the two-dimensional case since this
shows all the essential geometric properties that also appear in higher dimensions.
Isometries are special cases of normal operators. Nontrivial normal operators in
two dimensions and real vector spaces have the form
$$
\begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix},
\quad\text{with } \alpha, \beta \in \mathbb{R},
$$


as given by Theorem 11.1. Isometries satisfy only one additional condition: .α2 + β 2 = 1, which means we can find some .ϕ ∈ R such that .α = cos ϕ and .β = sin ϕ, and so the matrix takes the form:

$$
A \equiv A(\varphi) = \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}, \qquad \varphi \in \mathbb{R}.
$$

This may also be expressed differently: . A is an operator which transforms an


orthonormal basis and particularly the canonical orthonormal basis in .R2 to another
orthonormal basis of .R2 :

A : R2 −→ R2
.

E = [e1 e2 ] |−→ [a1 a2 ] = A.


If we consider the unit circle
$$
S^1 := \left\{ e(\varphi) := \begin{bmatrix} \cos\varphi \\ \sin\varphi \end{bmatrix} : \varphi \in \mathbb{R} \right\},
$$
we may choose .a1 = e(ϕ) and .a2 = e(ϕ + π/2) (which corresponds to .a2 = (− sin ϕ, cos ϕ)T , so that .a2 ⊥ a1 ) and obtain . A as above. Of course, if .{a1 , a2 } is orthonormal, then so is .{a1 , −a2 }, and thus we obtain a second possibility, parametrized again by the angle .ϕ, giving
$$
B \equiv B(\varphi) = \begin{bmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{bmatrix}.
$$

It is immediately apparent that . A corresponds to a rotation around the angle .ϕ and it


turns out that . B is a reflection over the line .span(e(ϕ/2)). For both . A and . B, we have . AT A = 12 and . B T B = 12 , but .det A = 1 and .det B = −1. So we may define:

. O(2) := {A ∈ R2×2 : AT A = 12 } and


.S O(2) := {A ∈ R2×2 : AT A = 12 and det A = 1}.

These sets, . O(2) and . S O(2), are subgroups of .Gl(2, R) ≡ Gl(2) and we have the
relation . S O(2) < O(2) < Gl(2). If we identify each element of . S O(2) with its first column, a point in .R2 , we see that . S O(2) ≅ S 1 . As for the rest of . O(2), since
. A ∈ O(2) implies

1 = det(AT A)
.

= det(AT ) det(A)
= det(A)2 ,

we see that
. O(2) − S O(2) = {B ∈ O(2) : det B = −1}.

The main difference between . A and . B is that . B is a symmetric matrix and is therefore
diagonalizable.

It is easy to check that the eigenvectors of . B are given by .(c1 , c2 ) = (e(ϕ/2), e(ϕ/2 + π/2)) with the corresponding eigenvalues .λ1 = 1 and .λ2 = −1. So we have . Bc1 = c1
and . Bc2 = −c2 . If we define the lines .U = span(c2 ) = Rc2 and . H = U ⊥ , we obtain
the orthogonal decomposition .R2 = U Θ H . Then . B describes a reflection of the
vectors .v ∈ R2 in the line . H so that we have:

. B|U = −idU and B| H = id H .

The points of . H are the fixed points of . B. It is interesting to see that . B factorizes!
$$
B = \begin{bmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{bmatrix}
= \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.
$$
The matrix . S := diag(1, −1) is a reflection with the eigenvectors .e1 and .e2 and of course
.det S = −1. So the group . O(2) can be described as:

O(2) = {A, AS : A ∈ S O(2)} or


.

O(2) = S O(2) ∪ S O(2)S with S O(2) ∩ S O(2)S = ∅.

A reflection can be expressed in terms of a vector .a and a corresponding projection,

$$
P_a = \frac{|a)(a|}{(a|a)} \equiv \frac{a\,a^{T}}{a^{T} a},
$$
as such
$$
S_a := \mathrm{id} - 2\,\frac{|a)(a|}{(a|a)} \equiv \mathrm{id} - 2 P_a .
$$

So we have
. S = id − 2|e2 )(e2 |

and

. B = id − 2|e(ϕ/2 + π/2))(e(ϕ/2 + π/2)| and id = |e1 )(e1 | + |e2 )(e2 |.

At this point, it is also interesting to ask what happens when we compose reflec-
tions . Sb Sa {b /= ±a}. It turns out that . Sb Sa is again a rotation. The fixed points of
. Sb Sa are given by . Ha ∩ Hb = {0}! Since . Sb Sa is an orthogonal operator and the only
fixed point is zero, . Sb Sa must be a rotation. An explicit calculation can also show
this:
$$
B(\varphi)B(\psi) =
\begin{bmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{bmatrix}
\begin{bmatrix} \cos\psi & \sin\psi \\ \sin\psi & -\cos\psi \end{bmatrix}
= \begin{bmatrix} \cos(\varphi-\psi) & -\sin(\varphi-\psi) \\ \sin(\varphi-\psi) & \cos(\varphi-\psi) \end{bmatrix}
= A(\varphi - \psi).
$$

As we now see, we can express every rotation in .R2 by a composition of two reflec-
tions. As shown in the above equation, a possibly less pleasant result is that the group
. O(2) is a non-abelian group (noncommutative). Nevertheless, its subgroup . S O(2)
is a commutative group: . A(ϕ)A(ψ) = A(ϕ + ψ).
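These relations between rotations and reflections in O(2) can also be verified numerically; a minimal NumPy sketch with angles chosen for illustration:

```python
import numpy as np

def A_rot(phi):   # rotation by the angle phi
    return np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])

def B_ref(phi):   # reflection over the line spanned by e(phi/2)
    return np.array([[np.cos(phi),  np.sin(phi)],
                     [np.sin(phi), -np.cos(phi)]])

phi, psi = 1.1, 0.4
print(np.allclose(B_ref(phi) @ B_ref(psi), A_rot(phi - psi)))   # two reflections = rotation
print(np.allclose(A_rot(phi) @ A_rot(psi), A_rot(phi + psi)))   # SO(2) is commutative
print(np.isclose(np.linalg.det(B_ref(phi)), -1.0))              # det B = -1
```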
All the above results can be generalized to any dimension .n, in particular Theorem 11.1 of the previous chapter. The following formulation is slightly different
from Corollary 11.4:

Corollary 12.1 Orthogonal operators.


If . f is an orthogonal operator on a Euclidean vector space, then the following
statements are equivalent:
– . f is orthogonal.
– There is an orthogonal decomposition of .V into one-dimensional or two-
dimensional . f -invariant subspaces of .V : .V = U1 Θ U2 Θ · · · Θ Us .s ∈ N,
.i ∈ I (s), and a corresponding orthonormal basis of . V . With respect to this
basis, . f has a block diagonal representation such that each block is a .1 × 1-
matrix .(Fi = ±1) or a .2 × 2-matrix of the form
$$
F_i = \begin{bmatrix} \cos\varphi_i & -\sin\varphi_i \\ \sin\varphi_i & \cos\varphi_i \end{bmatrix}
$$

with .ϕi ∈ R, .i ∈ I (s). ∎

We will now discuss the role of reflections as special orthogonal operators on an


n-dimensional Euclidean vector space .V . The definition is the same as in the .R2
.
case above. It is characterized by the orthogonal decomposition of .V into a one-
dimensional subspace .U and its orthogonal complement . H := U ⊥ so that .dim H =
n − 1 and .V = U Θ H . This is explained in the following proposition.

12.1.1 The Role of Reflections

Proposition 12.1 Reflections in .n dimensions, reflections in .V .


Let .V be an .n-dimensional real inner product space, .a ∈ V \{0} and .sa : V →
V be the map:
$$
s_a(v) = v - 2\,\frac{(a|v)}{(a|a)}\,a .
$$

Then the following statements are valid:



(i) .sa (a) = −a.
(ii) .(w|sa (v)) = (sa (w)|v), w, v ∈ V .


(iii) .sa ◦ sa = id.
(iv) .sa is an orthogonal operator on . V (i.e., .sa ∈ O(V )).

Some additional explanation may be quite useful. If we define

.U ≡ U (a) := Ra ≡ span(a) and H ≡ H (a) := U ⊥ = a ⊥ ,

then

. V = U Θ H and sa | H = id H with sa |U = −idU .
. V = U Θ H and sa i H = id H with sa iU = −idU .

This means that all vectors .w ∈ H are fixed points of .sa : .sa (w) = w. The map .sa describes a reflection of the vectors .u ∈ U through zero and a reflection of the vectors .v ∈ V \ H over the hyperplane . H . Hence .sa is a symmetric, involutive, and orthogonal operator on .V . We now come to the proof.
Proof For the four different points of Proposition 12.1, an explicit calculation leads
to
(i): $s_a(a) = a - 2\,\frac{(a|a)}{(a|a)}\,a = a - 2a = -a.$

(ii): $(w|s_a(v)) = (w|v) - 2\,\frac{(a|v)(a|w)}{(a|a)}$. This is explicitly symmetric in .v, w.

(iii):
$$
\begin{aligned}
s_a(s_a(v)) &= s_a(v) - 2\,\frac{(a|s_a v)}{(a|a)}\,a \\
&= v - 2\,\frac{(a|v)}{(a|a)}\,a - 2\,\frac{1}{(a|a)}\Big[(a|v) - 2\,\frac{(a|v)(a|a)}{(a|a)}\Big]\,a \\
&= v - 2\,\frac{(a|v)}{(a|a)}\,a - 2\,\frac{(a|v)}{(a|a)}\,a + 4\,\frac{(a|v)}{(a|a)}\,a = v, \quad\text{utilizing (ii).}
\end{aligned}
$$

(iv): .(sa w|sa v) = (w|sa sa v) = (w|v), utilizing (iii). ∎
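The four statements can be checked numerically for concrete vectors; a minimal NumPy sketch, with the vectors chosen purely for illustration:

```python
import numpy as np

def s_a(a, v):
    """Reflection over the hyperplane a^perp: s_a(v) = v - 2 (a|v)/(a|a) a."""
    a, v = np.asarray(a, dtype=float), np.asarray(v, dtype=float)
    return v - 2.0 * (a @ v) / (a @ a) * a

a = np.array([1.0, 2.0, 2.0])
v = np.array([0.5, -1.0, 3.0])
w = np.array([2.0, 0.0, -1.0])

print(np.allclose(s_a(a, a), -a))                                # (i)   s_a(a) = -a
print(np.isclose(w @ s_a(a, v), s_a(a, w) @ v))                  # (ii)  s_a is symmetric
print(np.allclose(s_a(a, s_a(a, v)), v))                         # (iii) s_a is an involution
print(np.isclose(np.linalg.norm(s_a(a, v)), np.linalg.norm(v)))  # (iv)  s_a is orthogonal
```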

We will now discuss additional properties of orthogonal operators, particularly a


surprising connection of reflections to all other orthogonal transformations.
An obvious fact is that orthogonal transformations act on the sphere of radius .r
given by
(n−1)
.S (r ) := {v ∈ V : ||v|| = r }.

In other words, whenever . f ∈ O(V ):

. f : S (n−1) (r ) −→ S (n−1) (r )
v |−→ f (v)
since || f (v)|| = ||v|| = r.

The next proposition shows a key property of reflections that determines the role
of reflections within the orthogonal operators. A reflection can connect two distinct
vectors on a sphere:

Proposition 12.2 Transitive action of reflections on spheres.


Let .V be a real inner product space. Suppose two distinct vectors .u and .v satisfy .||u|| = ||v||. Then there exists a third vector .a ∈ V such that

.sa (u) = v and sa (v) = u.

Proof We choose the reflection with .a = u − v and we observe that .(u − v|u) =
1/2||u − v||2 = 1/2(u − v|u − v). So we have

$$
s_{u-v}(u) = u - 2\,\frac{(u-v|u)}{(u-v|u-v)}\,(u-v)
= u - 2\cdot\tfrac{1}{2}\,\frac{(u-v|u-v)}{(u-v|u-v)}\,(u-v)
= u - (u-v) = v .
$$

Finally, .sa (v) = u follows simply as .sa is an involution. ∎
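A minimal numerical illustration of the proof: with a = u − v, the reflection exchanges u and v (the two vectors of equal norm are chosen for illustration):

```python
import numpy as np

def s_a(a, v):
    """Reflection s_a(v) = v - 2 (a|v)/(a|a) a."""
    return v - 2.0 * (a @ v) / (a @ a) * a

u = np.array([3.0, 0.0, 4.0])          # ||u|| = 5
v = np.array([0.0, 5.0, 0.0])          # ||v|| = 5
a = u - v

print(np.allclose(s_a(a, u), v), np.allclose(s_a(a, v), u))   # True True
```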


This result shows that the whole group . O(V ) acts on the sphere . S (n−1) (r ) transitively
(see Definition 1.6 and also generally Sect. 1.3 on group actions). The following
theorem is an observation made by Élie Cartan. When it was published, it was indeed
a surprise because of the delay with which it appeared and because of the statement
itself. Cartan’s point was simply that the orthogonal group . O(V ) is generated by
reflections:

Theorem 12.1 Reflections and orthogonal operators. (Élie Cartan)


Let .V be an .n-dimensional real inner product space. Every orthogonal oper-
ator . f ∈ O(V ) is a product of at most .n = dim V reflections.

Proof We will prove this theorem by induction on .n = dim V . For .n = 1, we have


. V = R and.(y|x) = yx so that.sa (x) = −x ∀x ∈ R, so that in the notation of Propo-
sition 12.1, .U = R, . H = {0} and . O(V ) = {±id}. Suppose now, for all dimensions
smaller than .n, that the theorem holds. Since . f ∈ O(V ) and . f /= I d, we consider
.v0 ∈ V with . f (v0 ) = v0' /= v0 and of course .|| f (v0 )|| = ||v0 ||. So we can apply the
above Proposition 12.2 to .v0 and .v0' : there exists a nonzero vector .a and a reflection
.sa so that .sa (v0' ) = v0 , i.e., .sa ( f v0 ) = v0 and .sa ◦ f (v0 ) = v0 . We set . Z := Rv0 and
. W := (Rv0 )⊥ ≡ v0⊥ . Since .sa ◦ f is an orthogonal transformation, we see immedi-
ately that . Z and .W are .sa ◦ f -invariant Euclidean subspaces of .V with
.sa ◦ f | Z = id Z and g := sa ◦ f |W ∈ O(W ).

Since .dim W = n − 1, we can apply the induction hypothesis and we have at most
.n − 1 reflections on .W :

.sb1 , . . . , sbr where bi ∈ W, i ∈ I (r ), r ≤ n − 1,

so that

.sbr ◦ · · · ◦ sb1 |W ∈ O(W ) and g = sa ◦ f |W = sbr ◦ · · · ◦ sb1 |W .

Since .bi ∈ W ⊆ V and .z ∈ Z = W ⊥ , we have .sbi z = z for all .i ∈ I (r ) and so .sbr ◦


· · · ◦ sb1 (z) = z or .sbr ◦ · · · ◦ sb1 | Z = I d Z .
This means that we can consistently extend the map .g from .W to .V = W Θ Z:

. g = sa ◦ f = sbr ◦ · · · ◦ sb1 ∈ O(V ).

This leads to
. f = sa ◦ g = sa ◦ sbr ◦ · · · ◦ sb1 .

This proves the factorization of . f into at most .n reflections. ∎

12.2 Singular Value Decomposition (SVD)

In this section, we return to general linear maps . f ∈ Hom(V, V ' ) on inner product
vector spaces. We discover that if we take not one but two special orthonormal
bases, we obtain extremely pleasant results.
Up to this point, we have had frequent opportunities to see the importance of
bases for understanding the algebraic and geometric structure of linear maps. In this
section, we realize this fact once more. Considering a linear map . f : V → V ' , the
choice of a tailor-made basis . B = [b1 . . . bn ] and . B ' = [b1' . . . bm' ] for .V and .V ' (of
dimension .n and .m) respectively, led us in Chap. 3 to the equation

. f B = B ' ∑1
and to a matrix representation of . f of the form
$$
f_{B'B} \equiv \Sigma_1 = \begin{bmatrix} \mathbb{1}_r & 0 \\ 0 & 0 \end{bmatrix},
$$
which is the normal form.
The matrix .∑1 is as simple as possible. The number .r , the rank of . f , is an impor-
tant geometric property. But by choosing such perfect tailor-made bases, we lost
some other important geometric properties of . f . In the case of the endomorphisms
. f ∈ Hom(V, V ) ≡ End(V ), since we consider only one vector space . V , it seems

unnatural to consider more than one basis . B, and this makes such a classification

much more complicated. It is more difficult to determine the corresponding normal


form of . f . For endomorphisms, the theory leads in the end to what is called a Jordan
normal form of . f . But, within .End(V ), there are “privileged” endomorphisms that
allow a most simple matrix representation while at the same time keeping the impor-
tant geometric properties of . f . These are the diagonalizable endomorphisms. The
corresponding tailor-made bases, here . B ◦ = [b1◦ , . . . , bn◦ ], are given by the eigenvec-
tors .bs◦ with eigenvalues .λs , .s ∈ I (n):

. f bs◦ = λs bs◦

which we may express also by the equation


f B° = B° Δ   with   Δ = diag(λ_1, ..., λ_n).

Apart from this, there are even more privileged endomorphisms, the normal oper-
ators. As we know from the spectral theorems, the tailor-made bases are given by
the orthonormal bases .C = [c1 , . . . , cn ]. And one of the results was that these are
the only linear maps that are orthogonally diagonalizable:

. f C = C Δ.

There is no doubt that normal operators are the “nicest” operators there are! It is
remarkable that in physical theories like quantum mechanics, they are essentially the
only ones we need. But from a mathematical point of view, there is the question of
universality. We would like to have a normal form for all endomorphisms! To achieve
this, we have to go one step back and try to use two bases for one vector space! This
will lead us to what we call singular value decomposition (SVD) which is applicable
to all . f ∈ End(V ) over real or complex inner product spaces and by construction to
all . f ∈ Hom(V, V ' ).
The appropriate tailor-made bases for . f are two orthonormal bases that are con-
nected to the two diagonalizable self-adjoint operators . f ad f and . f f ad . The use of
orthonormal bases instead of general bases preserves the information about eigen-
values even when considering general linear maps . f ∈ Hom(V, V ' ).
This is why we discuss the SVD for f ∈ Hom(V, V'); the special case f ∈ End(V) is trivially included in the general situation. As we shall see, a kind of miracle makes the whole procedure possible: the map f essentially transforms the eigenvectors of f^ad f belonging to the positive eigenvalues into eigenvectors of f f^ad, and we are led to the following theorem:

Theorem 12.2 Singular value decomposition (SVD).

Let .V, V ' be inner product spaces of dimensions .n, m respectively and
. f : V → V ' a linear map of rank .r . Then there are orthonormal bases .U =
[u_1 ... u_n] of V and W = [w_1 ... w_m] of V', and positive scalars σ_1 ≥ σ_2 ≥ ··· ≥ σ_r, the so-called singular values of f, such that

. f (u s ) = σs ws if s ∈ I (r ) and
f (u s ) = 0 if s > r.

Here, .u s are eigenvectors of . f ad f with eigenvalues .λs = σs2 whenever .s ∈ I (r ), and


λ_s = σ_s² = 0 if s > r.

This may be expressed by the compact notation:

. f U = W∑ (12.1)

or equivalently by
. f = W ∑U ad (12.2)

with the matrix


Σ = diag(σ_1, ..., σ_r, 0, ..., 0)   (an m × n matrix),   uniquely defined.    (12.3)

If we use the same letters .U and .W for corresponding bases in .Kn and .Km , we have
the unitary (orthogonal) matrices

U = [u→_1 ... u→_n]   and   W = [w→_1 ... w→_m].

For the representation . F := f W U of . f , we may write the above equation in the


standard matrix form with entries consisting of scalars only:

F U = W Σ    (12.4)
F = W Σ U^†.    (12.5)

Remark 12.1 If we denote by U* = (u*_s)_n the dual basis of U = (u_s)_n (i.e., u*_s(u_r) = δ_{sr}), we may express all the above equations as a sum of rank-1 operators (matrices):

f = σ_1 w_1 u_1^† + ··· + σ_r w_r u_r^†,    (12.6)
F = σ_1 w→_1 u→_1^† + ··· + σ_r w→_r u→_r^†,   or equivalently    (12.7)
F = σ_1 |w_1)(u_1| + ··· + σ_r |w_r)(u_r|.    (12.8)
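As a numerical illustration (a sketch assuming NumPy; the 3 × 4 matrix is randomly chosen and is not from the text), one can verify Eqs. (12.4), (12.5) and the rank-one expansion (12.7) directly:

import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((3, 4))              # an arbitrary F : R^4 -> R^3 (m = 3, n = 4)

W, sigma, U_dagger = np.linalg.svd(F)        # NumPy returns W, the singular values, and U^dagger
U = U_dagger.T                               # columns u_1, ..., u_n

Sigma = np.zeros((3, 4))                     # Sigma as in Eq. (12.3)
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)

print(np.allclose(F @ U, W @ Sigma))         # Eq. (12.4): F U = W Sigma
print(np.allclose(F, W @ Sigma @ U.T))       # Eq. (12.5): F = W Sigma U^dagger

F_rank1 = sum(sigma[s] * np.outer(W[:, s], U[:, s]) for s in range(len(sigma)))
print(np.allclose(F, F_rank1))               # Eq. (12.7): F = sum_s sigma_s w_s u_s^dagger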

Proof Theorem 12.2


By Proposition 11.1, . f ad f ∈ End(V ) is a nonnegative operator (positive semidefi-
nite). An orthonormal eigenbasis .(u 1 , . . . , u n ) of . f ad f is guaranteed by the spectral
theorems, with the corresponding eigenvalues λ_s, which we can order so that:

λ_s ≠ 0 if s ∈ I(r)   and   λ_s = 0 if s > r,
with   λ_1 ≥ ··· ≥ λ_r > 0   and   λ_s = σ_s².    (12.9)



Since ker f^ad f = ker f (because (v| f^ad f v) = (f v| f v)), we can uniquely define σ_s := √λ_s when s ∈ I(r) and σ_s := 0 when s > r. Setting w_s := (1/σ_s) f(u_s) for s ∈ I(r), we have

f u_s = σ_s w_s   if s ∈ I(r)    (12.10)
and   f u_s = 0   if s > r.    (12.11)

It turns out that (w_s)_r is an orthonormal basis of im f = span(w_1, ..., w_r) ≤ V': When s, t ∈ I(r), we have

(σ_t w_t | σ_s w_s) = (f u_t | f u_s) = (f^ad f u_t | u_s).    (12.12)

This leads to

σ̄_t σ_s (w_t|w_s) = (σ_t² u_t | u_s) = σ_t² (u_t|u_s) = σ_t² δ_{ts}    (12.13)

and so

(w_t|w_s) = δ_{ts}.    (12.14)

We extend .(w1 , . . . , wr ) to an orthonormal basis in .V ' : .(w1 , . . . , wr , wr +1 , . . . , wm )


and so we have
f u_s = σ_s w_s    (12.15)

if s ∈ I(r), as above.
If s > r, then since f u_s = 0 ⇔ f^ad f u_s = 0 and f^ad f u_s = λ_s u_s = 0 (because λ_s = 0 and σ_s = 0), we indeed have f u_s = 0. ∎

Comment 12.1 SVD direct.

It is interesting to notice that if f ∈ Hom(V, V') is given by the data f u_s = σ_s w_s for s ∈ I(r) and f u_s = 0 for s > r, where U = (u_1, ..., u_n) and W = (w_1, ..., w_m) are orthonormal bases in V and V', then it follows that U is an eigenbasis of f^ad f with eigenvalues λ_s = σ_s² for s ∈ I(r) and λ_s = 0 for s > r.
This means that

f^ad f u_s = σ_s² u_s   if s ∈ I(r)   and    (12.16)
f^ad f u_s = 0   if s > r.    (12.17)

Proof A direct calculation leads to the above result: We expand . f ad wt ∈ V


according to the orthonormal basis .U :

. f ad wt = u s (u s | f ad wt ) = u s ( f u s |wt )
= u s (σs ws |wt )
= σs u s (ws |wt ) = σs u s δst (12.18)

so that
. f ad ws = σs u s . (12.19)

Starting again with


. f u s = σ s ws , (12.20)

we obtain

f^ad f u_s = σ_s (f^ad w_s) = σ_s (σ_s u_s) = σ_s² u_s.    (12.21)

For .s > r we have . f ad f u s = 0 ⇔ f u s = 0. ∎
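A quick numerical check of this comment (again a NumPy sketch with an arbitrarily chosen matrix, an assumption for illustration) confirms that the columns of U are eigenvectors of F^†F with eigenvalues σ_s²:

import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((3, 4))
W, sigma, U_dagger = np.linalg.svd(F)
U = U_dagger.T

for s in range(len(sigma)):                                          # s = 1, ..., r (here r = 3)
    print(np.allclose(F.T @ F @ U[:, s], sigma[s]**2 * U[:, s]))     # Eq. (12.16)
print(np.allclose(F.T @ F @ U[:, 3], np.zeros(4)))                   # Eq. (12.17): the last column spans ker F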

The singular value decomposition (SVD) can be used to give a direct proof of the polar decomposition (see below) of an endomorphism or of a square matrix. The polar decomposition gives a factorization of every square matrix as a product of a unitary and a nonnegative matrix. This means that we can express every square matrix by two special normal operators which we can completely describe and understand by the spectral theorems. In addition, the analogy to complex numbers which was explained in Sect. 10.5 appears again: Just as we have for every z ∈ C, z = e^{iϕ}|z| with ϕ ∈ R, so we have for every A ∈ K^{n×n}, A = QP with Q a unitary matrix and P a nonnegative matrix. The analogy e^{iϕ} ∼ Q and |z| ∼ P should be clear. This leads to the following result.

Proposition 12.3 Polar decomposition.


For any square matrix . A ∈ Kn×n , there exists a unitary matrix . Q and a non-
negative matrix . P such that
(a) . A = Q P and
(b) if . A is invertible, this decomposition is unique.

Proof Using the SVD (Theorem 12.2) at the level of matrices, we can express A by the equation

A = W Σ U^†.

U and W are unitary matrices and Σ is a diagonal matrix with nonnegative entries. Using U^†U = 1_n, we obtain

A = W 1_n Σ U^† = W U^† U Σ U^† = (W U^†)(U Σ U^†).

We set . Q := W U † and . P := U ∑U † . It is obvious that . Q is unitary and . P nonneg-


ative with the same rank as .∑. So the above equation leads to

. A=QP

which is assertion a.
To prove assertion b of Proposition 12.3, we assume that

. A = Q P = Q 0 P0

again with . Q 0 unitary and . P0 nonnegative. Now, since . A is invertible, . P and . P0 are
also invertible and therefore positive.

Consider   A†A = (QP)†QP = P†Q†QP = P 1_n P = P²

and   A†A = (Q_0 P_0)†Q_0 P_0 = P_0².

We obtain P² = P_0², and since P and P_0 are positive and the positive square root of A†A is unique, P_0 = P. With A = Q_0 P_0 = Q P we obtain Q_0 = A P_0^{-1} = A P^{-1} = Q. This shows that the decomposition A = QP is unique. ∎
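A minimal computational sketch of Proposition 12.3 (assuming NumPy; the matrix A is random and only serves as an example) reads:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))               # a random (almost surely invertible) square matrix

W, sigma, U_dagger = np.linalg.svd(A)
U = U_dagger.T

Q = W @ U.T                                   # Q = W U^dagger, orthogonal
P = U @ np.diag(sigma) @ U.T                  # P = U Sigma U^dagger, nonnegative

print(np.allclose(A, Q @ P))                  # A = Q P
print(np.allclose(Q.T @ Q, np.eye(4)))        # Q^dagger Q = 1_n
print(np.all(np.linalg.eigvalsh(P) >= 0))     # P is nonnegative
print(np.allclose(P @ P, A.T @ A))            # P^2 = A^dagger A: P is the positive square root of A^dagger A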

12.3 The Scalar Product and Spacetime

If you did not want to understand the structure of spacetime, you would probably not
need, as a physicist, to read this section. If Einstein had not discovered the theory of
relativity, the problem discussed in this section would not be so relevant for physical
investigations. In particular, as we have special relativity, we may ask how many

kinds of “special relativities” could be possible, in principle. Special relativity is


known to be characterized by three space dimensions and one time dimension. This
ratio could be (theoretically) different, especially if we want to consider models of
spacetime with larger spacetime dimension. In other words, we would like to classify
all possible special relativities. Today, this is useful for instructional reasons and also
in connection, for example, with cosmological models. Mathematicians have already
solved this mathematical problem. It is the classification of all symmetric bilinear
forms which James Joseph Sylvester found in the middle of the 19th century. The
discussion of this problem here allows us to examine the role of matrices. They are
used to represent, for example, linear maps or bilinear forms. We expect that this
additional and, in a way, technical point will be particularly useful for physicists.
We consider symmetric bilinear forms on a real vector space, both for the sake of simplicity and because of their relevance in physics. We start with the representation of a symmetric bilinear form on a real vector space, here also called a scalar product, and the corresponding transformation formula under a change of basis.
We consider an .n-dimensional vector space .V with a bilinear, symmetric scalar
product .s:

.s: V × V −→ R
(u, v) |−→ s(u, v).

We choose a basis B = (b_1, ..., b_n) and define a matrix S_B = (σ^B_{μν}) with σ^B_{μν} := s(b_μ, b_ν), μ, ν ∈ I(n). Since s is a symmetric scalar product, it is clear that S_B is a symmetric matrix. If u = b_μ u^μ and v = b_ν v^ν ∈ V, we obtain

s(v, u) = s(b_μ v^μ, b_ν u^ν) = v^μ σ^B_{μν} u^ν = v_B^T S_B u_B.    (12.22)

If we choose a second basis C = (c_1, ..., c_n), we have σ^C_{μν} := s(c_μ, c_ν), S_C := (σ^C_{μν}) and s(v, u) = v_C^T S_C u_C. The change of basis from B to C is given by b_s = c_μ τ^μ_s, τ^μ_s ∈ R, and we may define the transition matrix

T ≡ T_CB := (τ^μ_t).

We also have B = C T, using matrix notation with vector entries, B = [b_1, ..., b_n] and C = [c_1, ..., c_n] for B and C. So we have

σ^B_{μt} = s(b_μ, b_t) = s(c_ν τ^ν_μ, c_r τ^r_t)    (12.23)

and

σ^B_{μt} = τ^ν_μ s(c_ν, c_r) τ^r_t = τ^ν_μ σ^C_{νr} τ^r_t ≡ τ_{νμ} σ^C_{νr} τ_{rt},   with μ, t ∈ I(n).    (12.24)

We set τ^ν_μ ≡ τ_{νμ} for comparison with the usual notation in the mathematical literature.
At the level of matrices, we find

S_B = T_CB^T S_C T_CB ≡ T^T S_C T   or equivalently    (12.25)
S_C = T_BC^T S_B T_BC ≡ (T^{-1})^T S_B T^{-1}.    (12.26)

This is the transition formula for bilinear forms.


It is important to compare this with the transformation formula for endomorphisms. If f ∈ End(V) has the representations f_BB and f_CC with respect to the bases B and C, then we have

f_BB = T_BC f_CC T_CB = T^{-1} f_CC T   or equivalently    (12.27)
f_CC = T_CB f_BB T_BC = T f_BB T^{-1}.    (12.28)

As one can see, the difference is that the transpose T^T appearing for bilinear forms is replaced by the inverse T^{-1} for endomorphism transformations. This means that only if we know precisely what a given matrix represents do we know its transformation law. This is extremely important, not only in relativity but across physics. It is useful to give or recall some further definitions before we proceed.
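To see the two transformation laws side by side, here is a small numerical sketch (assuming NumPy; the matrices are arbitrary examples, not from the text): the same invertible T acts once by congruence on a bilinear form and once by similarity on an endomorphism.

import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((3, 3))               # change-of-basis matrix, almost surely invertible
S_C = np.diag([2.0, 1.0, -1.0])               # a symmetric bilinear form, given in the basis C
F_CC = np.diag([2.0, 1.0, -1.0])              # an endomorphism with the same matrix in the basis C

S_B = T.T @ S_C @ T                           # congruence, Eq. (12.25)
F_BB = np.linalg.inv(T) @ F_CC @ T            # similarity, Eq. (12.27)

print(np.sort(np.linalg.eigvals(F_BB).real))  # eigenvalues survive the similarity: -1, 1, 2
print(np.sort(np.linalg.eigvalsh(S_B)))       # the eigenvalues of S_B are different numbers ...
ev = np.linalg.eigvalsh(S_B)
print((np.sum(ev > 0), np.sum(ev < 0)))       # ... but the signs (2 positive, 1 negative) survive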
We observed above an equivalence relation corresponding to symmetric bilinear forms. This is usually called congruence: the matrices A and B are congruent if an invertible matrix G ∈ Gl(n) exists such that A = G^T B G holds. In this sense, S_B and S_C are congruent. The rank is primarily defined for a linear map, but it may also be defined for a symmetric bilinear form:

Definition 12.1 Rank of a bilinear form.


Suppose s is a bilinear form on V. The rank of s, rank(s), is the rank of any representing matrix, that is, for any basis B of V,

rank(s) = rank(S_B).

We say s is degenerate if there is some v ∈ V \ {0} such that s(v, u) = 0 for all u ∈ V. Otherwise, we say s is nondegenerate. The subspace U_0 of V given by

U_0 := {v ∈ V : s(v, u) = 0 for all u ∈ V}

may also be called the degenerate subspace of s.


If s is nondegenerate, then we have of course U_0 = {0} and rank(s) = dim V. We further recall the following definitions, which are valid for symmetric and Hermitian forms.
• s is positive definite if s(v, v) > 0 for all v ≠ 0, v ∈ V,
• s is negative definite if s(v, v) < 0 for all v ≠ 0, v ∈ V,
• s is indefinite if there exist u and v in V so that s(v, v) > 0 and s(u, u) < 0.

The analogous definition is valid for a symmetric or a Hermitian matrix; in an obvious notation we have to replace s(v, v) or s(u, u) by v→† S v→ and u→† S u→.
We now return to our classification problem for symmetric bilinear forms which corresponds, in the case of spacetime, to the classification of all theoretically possible models for a flat (linear) spacetime, that is, for special relativity. Mathematically speaking, we have to consider all possible pairs (V, s) where s need not be positive definite: it may be indefinite, semidefinite, or even degenerate. This means we also have to consider non-Euclidean geometries. As is well known, it took science more than 2000 years to get to that point!
Here we restrict ourselves, of course, to the mathematical problem and, without loss of generality and for the sake of simplicity, to the level of matrices.
The first step is to diagonalize a representation, a matrix like S_B. The spectral theorem tells us that we can even use an orthogonal matrix Q ∈ O(n). This fits very nicely with the transformation formula for scalar products since whenever Q ∈ O(n), Q^{-1} = Q^T! So the eigenvalue equations at the level of matrices give us

S_B Q = Q Σ_B   and   Σ_B = Q^T S_B Q    (12.29)

with

Σ_B = diag(σ^B_1, ..., σ^B_n)   and   σ^B_μ ∈ R.    (12.30)

The scalars (numbers) σ^B_μ are the eigenvalues of S_B if we consider S_B as an endomorphism of R^n. Here, however, S_B represents the bilinear form s given in the basis B.
The process of diagonalization leads mathematically to the result in Eq. (12.30). But the notion of eigenvalue is not relevant for a bilinear form, because the scalars σ^B_1, ..., σ^B_n are not invariant; we indicate this fact explicitly by the index B. We can see the reason for the non-invariance in the transformation formula for scalar products above (Eq. (12.25)): under a general change of basis, S_C is multiplied on the left by the transpose T^T instead of by the inverse T^{-1}. This rescales the coefficients in S_C, and we obtain in general

(σ^C_1, ..., σ^C_n) ≠ (σ^B_1, ..., σ^B_n).

This is clear in the case dim V = 1. The basis is given by b ∈ V, b ≠ 0, so we have σ^B = s(b, b). A second basis, given by c ∈ V, c ≠ 0, with c = λb, λ ∈ R, λ ≠ 0, leads to σ^C = s(c, c) = s(λb, λb) = λ² s(b, b) = λ² σ^B. It is interesting to notice that, λ² being positive, the sign of σ^B is conserved. The so-called Sylvester's law of inertia determines exactly the invariance we are looking for: the matrices A and B have the same invariants if an invertible matrix G ∈ Gl(n) exists so that A = G^T B G. It turns out that the invariants of the congruence are three numbers: the number of positive (n_+), negative (n_−), and zero (n_0) entries in the diagonal matrix Σ_B. This means that we can write this Σ_B as a block diagonal matrix:
means that we can write this .∑ B as a block diagonal matrix:
Σ_B = diag(Σ^(+), Σ^(−), Σ^(0)) = diag(σ^(+)_1, ..., σ^(+)_{n_+}, σ^(−)_1, ..., σ^(−)_{n_−}, σ^(0)_1, ..., σ^(0)_{n_0})

with σ^(+)_1, ..., σ^(+)_{n_+} positive, σ^(−)_1, ..., σ^(−)_{n_−} negative, and σ^(0)_1 = 0, ..., σ^(0)_{n_0} = 0.
For a compact formulation of the above, we propose this definition.

Definition 12.2 The inertia of s is the triple of numbers (n_+, n_−, n_0) (with n = n_+ + n_− + n_0).

All the above discussion leads to the following theorem.

Theorem 12.3 Sylvester’s law of inertia.


The inertia .(n + , n − , n 0 ) of a symmetric bilinear form .s or of a symmetric
matrix. S is invariant under the congruence. This means that we have.n + (S B ) =
n + (SC ), .n − (S B ) = n − (SC ), and .n 0 (S B ) = n 0 (SC ).

Proof As we see from the above discussion, we may, according to the spectral
theorem, diagonalize the matrix . S = S B . We obtain the diagonal matrix .∑ using the
orthogonal matrix Q ∈ O(n). This corresponds to an ordered orthonormal basis

Q = (q^(+)_i, q^(−)_j, q^(0)_k)   with   i ∈ I(n_+), j ∈ I(n_−), k ∈ I(n_0),    (12.31)

with

Σ_B = diag(σ^(+)_i, σ^(−)_j, σ^(0)_k) ≡ diag(σ(B)^(+)_i, σ(B)^(−)_j, σ(B)^(0)_k)    (12.32)

and σ^(+)_i positive, σ^(−)_j negative, and σ^(0)_k = 0 for all i ∈ I(n_+), j ∈ I(n_−), k ∈ I(n_0). Now we can scale the basis vectors in Q and we obtain Q̄ = (q̄^(+)_i, q̄^(−)_j, q̄^(0)_k) with

q̄^(+)_i = (1/√σ^(+)_i) q^(+)_i,   q̄^(−)_j = (1/√|σ^(−)_j|) q^(−)_j   and   q̄^(0)_k = q^(0)_k.    (12.33)

This leads to

Σ_B^0 = diag(1_{n_+}, −1_{n_−}, 0_{n_0}).    (12.34)

We consider the subspaces

U_B^(+) := span(q^(+)_1, ..., q^(+)_{n_+}),   U_B^(−) := span(q^(−)_1, ..., q^(−)_{n_−})   and    (12.35)
U_B^(0) := span(q^(0)_1, ..., q^(0)_{n_0}).    (12.36)

It is clear that for .u +B ∈ U B(+) , .u −B ∈ U B(−) and .u 0B ∈ U B(0) , we have .s(u + , u + ) > 0,
.s(u − , u − ) < 0 and .s(u 0 , u 0 ) = 0.
At this point, we may anticipate that the numbers (n_B^+, n_B^−, n_B^0) are well defined, since n^0 = dim V − rank(s) and n_B^+ (respectively n_B^−) will turn out to be the maximum dimension of a subspace on which the restriction of s is positive (negative) definite. Evidently, we have here a direct sum:

V = U_B^(+) ⊕ U_B^(−) ⊕ U_B^(0).    (12.37)

We can of course follow the same procedure for any other basis C and S_C, and we come to similar expressions:

n_C^+ = dim U_C^(+),   n_C^− = dim U_C^(−),   n_C^0 = dim U_C^(0).    (12.38)

This means that if a subspace Z has the property that s(z, z) > 0 holds for all nonzero z ∈ Z, then its dimension is not bigger than n_B^+, that is, dim Z ≤ n_B^+ = dim U_B^(+). This can be proven by contradiction:

If dim Z > n_B^+, then dim Z + dim(U_B^(−) ⊕ U_B^(0)) > n_+ + n_− + n_0 = dim V,

so that Z ∩ (U_B^(−) ⊕ U_B^(0)) ≠ {0}, and there exists a nonzero z̃ ∈ Z with z̃ ∈ U_B^(−) ⊕ U_B^(0), which signifies s(z̃, z̃) ≤ 0. This is in contradiction to the hypothesis s(z, z) > 0 for all nonzero z ∈ Z. So dim Z ≤ n_B^+ holds.
Therefore, if we take Z = U_C^(+) with dim U_C^(+) = n_C^+, we have n_C^+ ≤ n_B^+. Interchanging B and C, we get n_B^+ ≤ n_C^+. This leads to n_B^+ = n_C^+; together with n_B^0 = n_C^0 = dim V − rank(s), and hence n_B^− = n_C^−, this shows that

(n_B^+, n_B^−, n_B^0) = (n_C^+, n_C^−, n_C^0).    (12.39)

We see that the numbers (n_+, n_−, n_0) are basis-independent and depend only on the scalar product s. This means further that they are invariant under congruence. This proves the theorem. ∎
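The following NumPy sketch (the Minkowski-type matrix, the tolerance and the random congruence are illustrative assumptions) computes the inertia from the eigenvalue signs and confirms its invariance under congruence:

import numpy as np

def inertia(S, tol=1e-10):
    ev = np.linalg.eigvalsh(S)                # S is assumed symmetric
    return (int(np.sum(ev > tol)), int(np.sum(ev < -tol)), int(np.sum(np.abs(ev) <= tol)))

S = np.diag([1.0, -1.0, -1.0, -1.0])          # a Minkowski-type symmetric bilinear form
rng = np.random.default_rng(4)
G = rng.standard_normal((4, 4))               # almost surely invertible

print(inertia(S))                             # (1, 3, 0)
print(inertia(G.T @ S @ G))                   # (1, 3, 0) again: the inertia is congruence invariant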

Summary

Three important applications of the last three chapters to operators were discussed.
Firstly, the orthogonal group in two dimensions was thoroughly examined. Then,
the structure of orthogonal operators was presented for arbitrary dimensions, cor-
responding to the spectral theorem for the orthogonal group. Reflections were also
presented as a multiplicatively generating system of the orthogonal group.
Next, the singular value decomposition (SVD) was introduced, a highly relevant
method in the field of modern numerical linear algebra. It allows for a universal

representation of any operator through the use of two specially tailored orthonormal
bases.
Finally, in terms of the structure of spacetime in special relativity, the classification
of symmetric, nondegenerate, bilinear forms was discussed. This was a well-known
mathematical problem that was addressed as early as the 19th century by the math-
ematician James Joseph Sylvester.

References and Further Reading

1. G. Fischer, B. Springborn, Lineare Algebra: Eine Einführung für Studienanfänger, Grundkurs Mathematik (Springer, 2020)
2. S.H. Friedberg, A.J. Insel, L.E. Spence, Linear Algebra (Pearson, 2013)
3. N. Johnston, Advanced Linear and Matrix Algebra (Springer, 2021)
4. M. Koecher, Lineare Algebra und analytische Geometrie (Springer, 2013)
5. G. Strang, Introduction to Linear Algebra (SIAM, 2022)
Chapter 13
Duality

In physics, there is a tendency to believe that dual vector spaces .V ∗ = Hom(V, K)


are unnecessary and superfluous. This can be fallacious and cause unnecessary obsta-
cles to understanding crucial aspects of linear algebra related to theoretical physics.
Duality is also essential for understanding tensor formalism, the subject of the fol-
lowing chapter. A particular difficulty with the dual space is that we have to know
precisely in what situations one must and in what situations one need not pay close
attention to it. Even in this case, the elements of .V ∗ , which appear as elements of
. V , have a different transformation behavior from the standard elements in . V . This
is why our intention in this section is also to help avoid such confusion.
To clarify this situation, we have to proceed slowly and repeat, when necessary,
some well-known facts. So we will distinguish precisely the formalism corresponding
to an abstract vector space (without an inner product) from a vector space with a scalar
product. We believe that this section is necessary to understand tensor formalism.
This makes it indispensable for a good understanding of special and general relativity
and relativistic field theory.

13.1 Duality on an Abstract Vector Space

We focus our attention on the different transformation behavior of vectors and cov-
ectors by changing bases, and briefly reviewing what we already know. This includes
our conventions and notations, which should not be underestimated here. We try to
use as clear a notation as possible, especially for the desiderata in physics.
We start with an abstract n-dimensional vector space V, dim V = n, and its dual V*. We choose two bases

B := (b1 , . . . , bn ) ≡ (bs )n and C := (c1 , . . . , cn ) ≡ (ci )n ,


with the corresponding dual bases

B ∗ := (β 1 , . . . , β n ) ≡ (β s )n and C ∗ := (γ 1 , . . . , γ n ) ≡ (γ i )n

given by
.β s (br ) = δrs and γ i (c j ) = δ ij (13.1)

where .r, s, i, j ∈ I (n).

Remark 13.1 Cobasis notation.


A consistent notation within our sign convention is to write .B := [β 1 . . . β n ]T
and .C := [γ 1 . . . γ n ]T vertically. Yet, if it is useful for our demonstration, we
identify the corresponding lists, colists, and matrices.

For a vector .v ∈ V and a covector .ξ ∈ V ∗ , we have the following expression, using


the Einstein convention:

v = bs v sB = ci vCi and ξ = ξsB β s = ξiC γ i .


. (13.2)

The coefficients (components) .v sB , vCi , ξsB , ξiC ∈ K can be expressed by

v_B^s = β^s(v),   v_C^i = γ^i(v)   and   ξ_s^B = ξ(b_s),   ξ_i^C = ξ(c_i).    (13.3)

The change of basis transformation matrix .T = (τsi ) with the transformation coeffi-
cient .τsi is given by:

b_s = c_i τ^i_s   or   c_i = b_s τ̄^s_i   and   T̄ = (τ̄^s_i) = T^{-1}.    (13.4)

The coefficients τ^i_s, τ̄^s_i can also be expressed by

τ^i_s = γ^i(b_s)   and   τ̄^s_i = β^s(c_i).    (13.5)

The matrix form of the change of basis is given by

. B = C T or C = BT −1 . (13.6)

Note the correspondence of Eqs. (13.6) to (13.4). Using the above duality relation
(Eq. 13.1), we obtain
β^s = τ̄^s_i γ^i   or   γ^i = τ^i_s β^s.    (13.7)

The corresponding matrix form (in the column notation of Remark 13.1 for the cobases) is given by

B = T^{-1} C   or   C = T B.    (13.8)

For the coefficients of v and ξ, we write

v_B := [v_B^1, ..., v_B^n]^T,   v_C := [v_C^1, ..., v_C^n]^T,   ξ^B := [ξ_1^B ... ξ_n^B],   ξ^C := [ξ_1^C ... ξ_n^C],    (13.9)

and we obtain

v_B^s = τ̄^s_i v_C^i   or   v_C^i = τ^i_s v_B^s   and   ξ_s^B = ξ_i^C τ^i_s   or   ξ_i^C = ξ_s^B τ̄^s_i.    (13.10)

Note the correspondence of Eq. (13.10) to Eqs. (13.7) and (13.4). The matrix form
of the above equation is expressed as

v_B = T^{-1} v_C   or   v_C = T v_B   and   ξ^B = ξ^C T   or   ξ^C = ξ^B T^{-1}.    (13.11)

Hence,

vector coefficients transform like T, covector coefficients like (T^{-1})^T,
cobasis elements transform like T, basis elements like (T^{-1})^T.

We may also express this fact slightly differently:

v^s transforms like β^s,   ξ_s transforms like b_s,
v^i transforms like γ^i,   ξ_i transforms like c_i.    (13.12)

Note that .v s , ξs , v i , ξi ∈ K, .bs , ci ∈ V and .β s , γ i ∈ V ∗ .
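A small numerical sketch of Eqs. (13.10) and (13.11) (assuming NumPy; the bases and data below are arbitrary examples in R³, not from the text) shows that the vector itself and the pairing ξ(v) are unchanged under the change of basis:

import numpy as np

rng = np.random.default_rng(5)
n = 3
T = rng.standard_normal((n, n))               # T = T_CB, almost surely invertible
B = np.eye(n)                                 # columns b_1, ..., b_n (standard basis of R^3)
C = B @ np.linalg.inv(T)                      # columns c_i, so that B = C T as in Eq. (13.6)

v_B = rng.standard_normal(n)                  # coefficients of a vector v in the basis B
v_C = T @ v_B                                 # Eq. (13.11)
print(np.allclose(B @ v_B, C @ v_C))          # b_s v^s_B = c_i v^i_C: the same vector v

xi_B = rng.standard_normal(n)                 # coefficients of a covector xi (a row) in the cobasis of B
xi_C = xi_B @ np.linalg.inv(T)                # Eq. (13.11)
print(np.isclose(xi_B @ v_B, xi_C @ v_C))     # xi(v) is independent of the basis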


We therefore see explicitly the importance of the position of the indices (upstairs
and downstairs). This is a great advantage of our tensor notation within linear algebra.
It allows to see immediately why and how.v ∈ V and.ξ ∈ V ∗ are coordinate invariant:

v = bs v sB = ci vCi and ξ = ξsB bs = ξiC γ i ,


. (13.13)

with
v s , vCi , ξsB , ξiC ∈ K.
. B

In our discussion, the distinction between .V and .V ∗ is at least implicitly always


present. If we use the basis . B and its dual .B, we can determine an isomorphism:

. g : V −→ V ∗
bs |−→ g(bs ) = β s . (13.14)

Needless to say that we have here a basis dependent isomorphism .g ≡ g B and that
the bases . B and .B are the tailor-made bases of .g. So we have the representation

g_{BB} = 1_n.    (13.15)

The same can be done with the second pair of bases C and its dual, and we again get the representation

g_{CC} = 1_n.    (13.16)

Since no other structure is present in .V , the explicit use of bases is needed for any
isomorphism. So we may conclude this “obvious” isomorphism is not canonical as it
depends on extra information. In fact, one can prove that no “canonical” isomorphism
exists for general vector spaces.
Before proceeding, let us have a short break. There is a point about duality
that we have to think over. The question arises of what will happen if we con-
sider further duals. What happens if we consider .(V ∗ )∗ and .((V ∗ )∗ )∗ , and so
forth. Here, we are lucky when .V is finite-dimensional. In contrast to what hap-
pens in an infinite-dimensional vector space, the duality operation stops. We set
(V*)* ≡ V** := Hom(V*, K). The reason for this break is that between V** and V, there exists a basis-independent (canonical) isomorphism:

.ev : V −→ V ∗∗ = Hom(V ∗ , K)
v |−→ ev(v) ≡ v # : v # (ξ ) := ξ(v). (13.17)

A direct inspection shows that the map .ev is linear, injective, and surjective and we
have

V ≅_can V**.    (13.18)

We therefore can identify .V ∗∗ with .V and write .v # = v and define

v(ξ ) := ξ(v)!
. (13.19)

This is a fundamental relation used implicitly in tensor formalism and which justifies
the dual nomenclature.

13.2 Duality and Orthogonality

We can now come to duality with the additional structure of an inner product on a vector space and so bring together duality and orthogonality. The scalar (inner) product changes the connection between V and V* drastically, as it provides, without using a basis, not only a canonical isomorphism between V and V*, but even more, a canonical isometry between V and V*. If we want to, we can identify V and V* as vector spaces with a scalar product, here meaning a "nondegenerate bilinear form".

In physics, without discussing this point, we make this identification from aca-
demic infancy, for example, in Newtonian mechanics, in electrodynamics, and in
special relativity. Since the most critical applications of duality in physics, in special
and general relativity and relativistic field theory, concern real vector spaces, we
restrict ourselves in what follows to real vector spaces. For simplicity’s sake, we go
one step further and discuss the case of a positive definite scalar product, that is,
we consider Euclidean vector spaces. The formalism is precisely the same for the
more general case of a nondegenerate scalar product. In addition, it is comfortable
to always have in mind our three-dimensional Euclidean vector space.
We consider an .n-dimensional Euclidean vector space .(V, s) with a symmetric
positive definite bilinear form .s:

s : V × V −→ R
.

(u, v) |−→ s(u, v) ≡ (u|v).

The first pleasant achievement is that we get from.s a canonical isomorphism between
V and .V ∗ :
.

Proposition 13.1 Canonical isomorphism.


The map

.ŝ : V −→ V ∗
u |−→ ŝ(u) := s(u, ·) = (u|·) ≡ û ∈ V ∗ (13.20)

is a canonical isomorphism between .V and .V ∗ :

. V∼
=V ∗ .
s (13.21)

Proof The map û is a linear form since û(v) = s(u, v) and s is a bilinear form. Here, V is a real vector space and s is linear in both arguments; in particular, ŝ itself is linear.
– ŝ is injective since ker ŝ = {0}: If ŝ(u) = û = 0* ∈ V*, then û(v) = 0*(v) = 0 ∀ v ∈ V. This means that û(v) = (u|v) = 0 ∀ v ∈ V. Setting v = u, we also have (u|u) = 0, so that u = 0 and ŝ is injective.
– ŝ is also surjective: Since ŝ is injective, dim ŝ(V) = dim V = dim V*, and with ŝ(V) ≤ V* we have ŝ(V) = V*.
– ŝ is surjective and injective, and so ŝ is a vector space isomorphism. ∎


Remark 13.2 Notation for the canonical isomorphism (isometry).


We believe that the following notation is quite suggestive and useful: For the
inverse of .ŝ we use the symbol .š ≡ (ŝ)−1 .

.š : V ∗ −→ V
ξ |−→ š(ξ ) ≡ ξ̌ ≡ u(ξ ) ∈ V

so that
(u(ξ )|v) ≡ (ξ̌ |v) := ξ(v) ∈ R.
. (13.22)

The second achievement of the inner product, also called a metric in physics, is that
s canonically induces in .V ∗ a metric which we denote by .s ∗ . Therefore both, .(V, s)
.
and .(V ∗ , s ∗ ), are Euclidean vector spaces. The previously defined map .ŝ : V → V ∗
is now an isometry and not “only” an isomorphism (see Comment 13.1 below).

Definition 13.1 Induced inner product on .V ∗ .


Given an inner product space (.V, s), the inner product on .V ∗ is given by

s ∗ : V ∗ × V ∗ −→ R
.

(ξ, ζ ) |−→ s ∗ (ξ, ζ ) ≡ (ξ |ζ )∗ ≡ (ξ |ζ ), (13.23)

where .s(ξ̌ , ζ̌ ) = s ∗ (ξ, ζ ) or equivalently by

(û, v̂) |→ s ∗ (û, v̂) := s(u, v).


. (13.24)

With this metric, .(V ∗ , s ∗ ) is also a Euclidean vector space.

Comment 13.1 The canonical isometry.


(i) Equations (13.23) and (13.24) mean that .(ŝ(u)|ŝ(v)) ≡ (û|v̂)∗ = (u|v) so
that .ŝ : V → V ∗ is a canonical isometry .V ∼ =V ∗ .
s
(ii) We have
.||û|| = ||u||, ||ξ̌ || = ||ξ || (13.25)

and from Eqs. (13.23) and (13.24),

.(ξ |v̂)∗ ≡ (ξ |v̂) = ξ(v). (13.26)

Notice the correspondence between Eqs. (13.26) and (13.22).



If we choose the bases . B = (b1 , . . . , bn ) and .B = (β 1 , . . . , β n ) with .β i (b j ) =


δ ij , i, j ∈ I (n), the representation of .s ∗ is given by the matrix . S ∗ :

. S ∗ = (σ i j ) (13.27)

with .σ i j := s ∗ (β i , β j ).
It is interesting that the matrices . S ∗ and . S = (σi j ) with .σi j = s(bi , b j ) represent
the isometry between .V and .V ∗ .
The map ŝ and its inverse š ≡ (ŝ)^{-1} produce a new basis in V* and in V:

B̂ := (b̂_1, ..., b̂_n)   and   B̌ := (β̌^1, ..., β̌^n).

There are now two bases in V*: B* and B̂.
To keep with our convention of using Latin letters for vectors in V and Greek letters for covectors in V*, we have to rename the elements b̂_i and β̌^i and to write:
– β_i := b̂_i and b^i := β̌^i, so we write
– B̂ = (β_1, ..., β_n) compared to B* = (β^1, ..., β^n), both in V*;
– B̌ = (b^1, ..., b^n) compared to B = (b_1, ..., b_n), both in V.
We state and prove all the above assertions in the next proposition.

Proposition 13.2 Representation of the isomorphism .ŝ : V → V ∗ .


Let .V be a real inner product space. With all notation defined as above, the
following statements hold true.
(i) β_i = σ_{ij} β^j and S = (σ_{ij}) is invertible;
(ii) S* = S^{-1}, i.e. S*S = 1_n and σ^{ik} σ_{kj} = δ^i_j;
(iii) b^i = σ^{ij} b_j.

Proof (i)
Since ŝ(b_i) ∈ V*, we may write ŝ(b_i) = λ_{ij} β^j with λ_{ij} ∈ R. On the other hand, we have ŝ(b_i) = b̂_i ≡ β_i. This leads to λ_{ij} β^j(b_k) = β_i(b_k), giving

λ_{ij} δ^j_k = β_i(b_k) ≡ b̂_i(b_k) = (b_i|b_k) = σ_{ik}

and so λ_{ik} = σ_{ik}. S is invertible since ŝ is bijective.


Proof (ii)
As ŝ is an isometry, we have

σ_{ij} = s(b_i, b_j) = s*(b̂_i, b̂_j) ≡ s*(β_i, β_j) =(i)= s*(σ_{ik} β^k, σ_{jl} β^l) = σ_{ik} s*(β^k, β^l) σ_{jl} = σ_{ik} σ^{kl} σ_{lj},

or equivalently in matrix form

S = S S* S,   hence   1_n = S* S,   that is,   S* = S^{-1}.

This means that σ^{ik} σ_{kj} = δ^i_j. Here, we do not write the index B on S since it is not necessary.

Proof (iii)
Analogously to (i), since β̌^i ∈ V, we have β̌^i = μ^{ik} b_k with μ^{ik} ∈ R. Taking the scalar product with b_j on both sides, we obtain μ^{ik}(b_k|b_j) = (β̌^i|b_j). This leads to

μ^{ik} σ_{kj} = β^i(b_j)   and so   μ^{ik} σ_{kj} = δ^i_j,   which means   μ^{ik} = σ^{ik}. ∎

All this shows that, mathematically, the two Euclidean vector spaces (V, s) and (V*, s*) are indistinguishable and can be identified. This means we are left with only one vector space V but with two bases, (b_1, ..., b_n) and (b^1, ..., b^n), in V, and the following relations:

(b_i|b_j) = σ_{ij},   (b^i|b^j) = σ^{ij}   and   (b^i|b_j) = δ^i_j.

This can be shown by a direct calculation:

(b^i|b^j) ≡ (β̌^i|β̌^j) = (isometry) = (β^i|β^j)* = σ^{ij},
(b^i|b_j) = (b^i|σ_{jk} b^k) = (b^i|b^k) σ_{jk} = σ^{ik} σ_{jk} = δ^i_j.

These two bases, (b_i)_n and (b^i)_n, are used in many fields of physics and are often called reciprocal. The following discussion clarifies this situation even more.
The representations S and S* of the isometry between V and V* are symmetric but not diagonal. We expect that if we find a tailor-made basis, we can diagonalize S and S*. We can indeed do this if we choose an orthonormal basis C = (c_1, ..., c_n), with its dual C* = (γ^1, ..., γ^n) and γ^i(c_j) = δ^i_j. The matrices S_C and S_C^* also represent the scalar products s and s*, as they were introduced before. It is evident that with the basis C we have

σ^C_{ij} = s(c_i, c_j) = δ_{ij}   and so    (13.28)
S_C = 1_n   and   S_C^* = 1_n.    (13.29)

Furthermore, if we consider C and Č, the two bases in V, and C* and Ĉ, the two bases in V*, we have

c^i = σ_C^{ij} c_j = δ^{ij} c_j = c_i   and    (13.30)
γ_i = σ^C_{ij} γ^j = γ^i,   which means    (13.31)
Č = C   and   Ĉ = C*.    (13.32)

For this reason, we often use expressions (see Eq. 13.30) like

v = cs v s = cs (cs |v).
. (13.33)

So, in this case, with an orthonormal basis in .V , we need only to consider one basis in
V and one basis in its dual .V ∗ . In most cases in physics and geometry, it is reasonable
.

to work with an orthonormal basis. Nevertheless, there are cases where choosing an
orthonormal basis is not possible, for example , when selecting coordinate bases on a
manifold with nonzero curvature. This is one of the main reasons why we considered
the general situation in the discussions of this section.
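As a concrete sketch (the nonorthogonal basis below is an assumed example, computed with NumPy), the reciprocal basis b^i = σ^{ij} b_j of a basis of Euclidean R³ can be obtained from the Gram matrix:

import numpy as np

B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])               # columns b_1 = (1,0,0), b_2 = (1,1,0), b_3 = (0,1,1): nonorthogonal

S = B.T @ B                                   # Gram matrix sigma_ij = (b_i | b_j)
B_rec = B @ np.linalg.inv(S)                  # columns are the reciprocal vectors b^1, b^2, b^3

print(np.allclose(B_rec.T @ B, np.eye(3)))               # (b^i | b_j) = delta^i_j
print(np.allclose(B_rec.T @ B_rec, np.linalg.inv(S)))    # (b^i | b^j) = sigma^{ij}

v = np.array([1.0, 2.0, 3.0])
v_up = B_rec.T @ v                            # contravariant coefficients v^i = (b^i | v)
print(np.allclose(B @ v_up, v))               # v = b_i v^i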

Summary

This chapter was introduced to dispel certain biases in physics related to the dual
space of a vector space and its necessity.
In physics, vector spaces with an inner product are initially used. As demonstrated
here, this renders the dual space practically obsolete and therefore leads to a series
of misunderstandings. However, a residual form remains, known as the reciprocal basis, which finds broad application in certain areas of physics. Reciprocal bases are
clearly visible and distinguishable when nonorthogonal bases are used.
In the context of tensors, it is also important to differentiate whether the tensors
are defined on an abstract vector space or on an inner product space. For example, the
tensors we have discussed so far have all been tensors in an abstract vector space. It
is only in the next chapter that tensors defined on an inner product space will appear
for the first time. These are precisely the tensors that are preferred in physics.
Chapter 14
Tensor Formalism

Without any exaggeration, we can say that in physics tensor formalism is needed
and used as much as linear algebra itself. For example, it is impossible to understand
electrodynamics, relativity, and many aspects of classical mechanics without tensors.
Therefore, there is no doubt that a better understanding of tensor formalism leads to
a better understanding of physics.
Engineers and physicists first came across tensors in terms of indices to describe
certain states of solids. They were first realized as very complicated, unusual objects
with many indices, and their mathematical significance was even questionable. Later,
mathematicians found out that these objects correspond to a very precise and exciting
mathematical structure. It turned out that this structure is a generalization of linear
algebra. This is multilinear algebra and can be considered the modern name for what
physicists usually refer to as tensor calculus.
This chapter will discuss tensor formalism, also known as multilinear algebra, in
a basis-independent way. But of course, as we know from linear algebra, we cannot
do without basis-dependent representations of tensors. From Sect. 3.5 and Chap. 8
we already know what tensors are, and we know at least one possibility to arrive at
tensor spaces. Before, this was obtained by explicitly utilizing bases of vector spaces.
Now, we would like to achieve this differently, which will allow us to further expand
and consolidate the theory of tensors.

14.1 Covariant Tensors and Tensor Products

We start with the most general definition of a multilinear map and consider a vector-
valued multilinear map. Dealing with various special cases later will allow a better
understanding of the topic.


Definition 14.1 Multilinear vector valued map.


Suppose .V1 , . . . , Vk and . Z are vector spaces. A map

. f : V1 × · · · × Vk −→ Z
(v1 , . . . , vk ) |−→ f (v1 , . . . , vk ) ∈ Z

is said to be multilinear if it is linear in every variable individually, so with


.λ, μ ∈ R:

f(v_1, ..., λu_i + μv_i, ..., v_k) = λ f(v_1, ..., u_i, ..., v_k) + μ f(v_1, ..., v_i, ..., v_k).

We denote the space of such multilinear maps by

. T (V1 , . . . Vk ; Z ).

It is clear that if we take .k = 1, we have a linear map . f ∈ T (V ; Z ) ≡ Hom(V, Z ).


If we take .k = 2, . f ∈ T (V1 , V2 ; Z ) is a vector valued bilinear form and for .k = 3
we may talk about trilinear forms. Since . Z is a vector space, it is evident that . Z also
induces a vector structure on .T (V1 , . . . , Vk ; Z ):

.( f + g)(v1 , . . . , vk ) := f (v1 , . . . , vk ) + g(v1 , . . . , vk )

and
(λ f )(v1 , . . . , vk ) := λ f (v1 , . . . , vk ).
.

If we take . Z = R, we may also write

. T (V1 , . . . Vk ; R) ≡ T (V1 , . . . Vk ).

and in this case, we call . f a multilinear .R valued function or a multilinear form. If


we take .V1 = · · · = Vk =: V , we may also write

. T k (V ; Z ) or T k (V, R) ≡ T k (V ).

Definition 14.2 A covariant .k-tensor.


A covariant tensor on .V is a real valued multilinear function of .k vectors
of .V :

ϕ : V × ··· × V   (k times)   −→ R
(v_1, ..., v_k) |−→ ϕ(v_1, ..., v_k).

Such ϕ are also called k-linear forms or k-forms or k-tensors or, more precisely, covariant k-tensors.

When we talk about multilinear maps, we can ask what differences there are between
the properties of multilinear and linear maps. This leads us to the following comment
and remark.

Comment 14.1 The image of multilinear maps.

Although the image of a linear map, as we know, is a vector space, the image
of a multilinear map is not, in general. A simple way to verify this fact is to
consider a bilinear map given by the product of two polynomials. Consider the
.(n + 1)-dimensional and .(2n + 1)-dimensional spaces of polynomials;

R[x]n := span(1, x, . . . , x n ) and R[x]2n := span(1, x, . . . , x 2n ).


.

We set
. V = R[x]1 = {α0 + α1 x : α0 , α1 ∈ R}

and
W = R[x]_2 = {α_0 + α_1 x + α_2 x² : α_0, α_1, α_2 ∈ R}.

If we consider . f ∈ T 2 (V ; W ), given by

. f : V × V −→ W
(ϕ, χ) |−→ f (ϕ, χ) = ϕ · χ,

and take .ϕ, χ ∈ {1, x}, we see that .1, x 2 ∈ im f since .1 · 1 = 1 and .x · x =
x 2 . Yet, we observe that .1 + x 2 ∈
/ im f because otherwise .1 + x 2 would have
real roots, but as we know, both roots of .1 + x 2 = (x − i)(x + i) are strictly
imaginary.

Our next remark is more pleasant.



Remark 14.1 The role of bases in multilinear maps.

Just as in the case of linear maps, any multilinear map. f ∈ T (V1 , . . . , Vk ; Z )


is uniquely determined by its values on the basis vectors of the .Vs .

Proof To avoid risking confusion due to burdensome notation, we will only prove
this in the case of a bilinear map . f ∈ T (U, V ; Z ): We take . B = (b1 , . . . , bk ), a basis
of .U , and .C = (c1 , . . . , cl ), a basis of .V . We set

. f (bs , ci ) = z si ∈ Z ,
s ∈ I (k), i ∈ I (l),

and for .u = bs u sB ∈ U and .v = ci vCi ∈ V , we define:

. f (u, v) := f (bs , ci )u sB vCi = z si u sB vCi .

This shows the uniqueness since every single one of .z si , u sB , vCi is uniquely given.
This . f is bilinear as the partial maps . f u : V → Z and . f v : U → Z are linear. For
. f u we have for example,

f_u(v) = f(u, v) = z_{si} u_B^s v_C^i = (z_{si} u_B^s) v_C^i,

which is explicitly linear in v. ∎

Remark 14.2 Maps of covariant tensors, the pullback.

A fact that is valid only for covariant tensors, is the connection of covariant
tensors on different vector spaces: Any linear map of vector spaces,

. F: V −→ W
v |−→ F(v) = w

induces a linear map . F ∗

. F ∗ : T k (W ) −→ T k (V )
ψ |−→ F ∗ ψ,

given by .(F ∗ ψ)(v1 , . . . , vk ) := ψ(Fv1 , . . . , Fvk ).



This generalizes the dual map of covectors:

. F ∗ : W ∗ −→ V ∗
η |−→ F ∗ η, given by
(F ∗ η)(v) := η(Fv).

We may call F* the pullback of covariant tensors.


We now present several examples which illuminate various aspects of multilinear
maps and thus also proceed towards the definition of a tensor. This should also show
how natural the notion of tensor product is, even if it looks very complicated.

14.1.1 Examples of Covariant Tensors

Example 14.1 Bilinear Maps


As we have already seen in Comment 14.1 and Remark 14.1, multiplication
provides an example of a bilinear map.

Example 14.2 Bilinear Forms


The inner product in (V, s) and the dot product in (Rⁿ, ⟨·|·⟩) are, as we know, symmetric bilinear forms.

Example 14.3 Constant (Scalars)


When .k = 0, we define .T 0 (V ) = R and the elements of .T 0 (V ) are usually
called scalars.

Example 14.4 Linear Forms


When .k = 1 we have

. T 1 (V, R) ≡ Hom(V, R) ≡ V ∗

and any element of .V ∗ is called a covector, a linear form, a linear functional,


or a .1-form. The different names for .ξ ∈ V ∗ demonstrate appropriately the
importance of .V ∗ . Indeed, this .V ∗ is very special in the context of multilinear
forms because, as we shall see, it generates multiplicatively .T k (V ) for every
.k ∈ {2, 3, . . . }.

The following examples also give a good idea of the corresponding multiplication,
the tensor product.

Example 14.5 On the Way to Tensor Products


This example shows a basic covariant tensor that is nontrivial and created by
two covectors leading to an elementary tensor product. Given the covectors .ξ
and .η, we can obtain in a natural way from the Cartesian product .V ∗ × V ∗ ,
the bilinear form .ξη on .V :

ξη :
. V ×V −→ R
(u, v) |−→ ξη(u, v) := ξ(u)η(v). (14.1)

This is obviously a nontrivial covariant tensor .ξη ∈ T 2 (V ).


At the same time, we obtain a bilinear product P from the Cartesian product V* × V* to the bilinear forms on V:

. P: V∗ × V∗ −→ T 2 (V )
(ξ, η) |−→ P(ξ, η) := ξη ∈ T 2 (V ). (14.2)

Comment 14.2 The bilinear form .ξη and analysis.


The bilinear map . P can also be defined on all functions, not only on
linear maps. Given two sets . X and .Y and the two functions . f and .g
correspondingly on . X and .Y (we write . f ∈ F(X ) ≡ Map(X, R) and .g ∈
F(Y )), we may obtain a new function on the Cartesian product . X × Y :

. fg : X ×Y −→ R
( f, g) |−→ f g(x, y) := f (x)g(y). (14.3)

At the same time, we obtain the product . P:

. P: F(X ) × F(Y ) −→ F(X × Y )


( f, g) |−→ f g. (14.4)

This product can also be called a tensor product. The product . f g is nat-
urally induced since it is based on the multiplication on their common
codomain .R.

There is also a complementary view concerning the above product . P given in


Eq. (14.2). Instead of.T 2 (V ), we may introduce a second vector space.V ∗ ⊗ V ∗
which is directly algebraically generated from the Cartesian product .V ∗ × V ∗
by a new product which we denote here by .Θ. In this way, we obtain a second
bilinear map given by

Θ:
. V∗ × V∗ −→ V∗ ⊗ V∗
(ξ, η) |−→ Θ(ξ, η) := ξ ⊗ η. (14.5)

If we use a basis . B ∗ = (β 1 , . . . , β n ) = (β i )n ⊆ V ∗ , we can make this more


concrete with .i, j ∈ I (n):

.(β i , β j ) |−→ Θ(β i , β j ) := β i ⊗ β j . (14.6)

So we obtain
. V ∗ ⊗ V ∗ = span{β i ⊗ β j : i, j ∈ I (n)}. (14.7)

In addition, we can see that V* ⊗ V* is a free vector space (see Definition 8.1) over B* × B* = {(β^i, β^j) : i, j ∈ I(n)}, or equivalently over the set I(n) × I(n) = {(i, j) : i, j ∈ I(n)}, since every set can be used as a basis of some vector space!

V* ⊗ V* ≅ R(I(n) × I(n)).    (14.8)

As expected, .Θ is the tensor product and .V ∗ ⊗ V ∗ is a tensor space. It turns


out that this tensor product .Θ has a more general character. This means that
we also have a tensor product for .V . We denote it by the same symbol:

Θ : V × V −→ V ⊗ V
.

(u, v) |−→ Θ(u, v) := u ⊗ v. (14.9)

As we see, via the same kind of product .Θ, we also obtain a tensor space
V ⊗ V over .V . For this reason, we see various symbols in literature which are
.

used instead of our character .Θ, such as the symbol .⊗.


To fully understand what is happening, we have to compare .ξη with .ξ ⊗ η. We
do not expect, of course, .ξη and .ξ ⊗ η to be equal, and for the same reason,
we do not expect .T 2 (V ) and .V ∗ ⊗ V ∗ to be the same.
What we expect is the two vector spaces above to be equivalent, that is, we
expect a canonical isomorphism

. V∗ ⊗ V∗ ∼
= T 2 (V ). (14.10)

Indeed, both have the same dimension and the exact relationship between .ξη
and .ξ ⊗ η is given by the following equation:

ξ ⊗ η(u, v) := ξη(u, v) = ξ(u)η(v).    (14.11)

This can also be expressed by the commutative diagram:


(commutative diagram: Θ : V* × V* → V* ⊗ V*, P : V* × V* → T²(V), and the induced linear map P̃ : V* ⊗ V* → T²(V) with P = P̃ ∘ Θ)

So we have:

. P̃(ξ ⊗ η)(u, v) = P(ξ, η)(u, v) = ξη(u, v) = ξ(u)η(v). (14.12)

We observe that . P is a bilinear and . P̃ a linear map. Because of the above


isomorphism .V ∗ ⊗ V ∗ ∼ = T 2 (V ), we may identify .ξ ⊗ η with .ξη and write
.ξ ⊗ η(u, v) = ξη(u, v), and we use the unambiguous notation .ξ ⊗ η, as is
often done in the literature.

Our intention is to give an informal introduction to tensor products with these


examples. It might be instructive to see the same results for .k = 3, as in the
following example.

Example 14.6 Tensor Product


Starting with the covectors.ξ, η, θ ∈ V ∗ and proceeding similarly as in example
5, we may write for their tensor product,

ξ ⊗ η ⊗ θ ∈ V∗ ⊗ V∗ ⊗ V∗
.

given by
.(ξ ⊗ η ⊗ θ)(u, v, w) = ξ(u)η(v)θ(w) (14.13)

and we expect a canonical isomorphism given by

. T 3 (V ) ∼
= V ∗ ⊗ V ∗ ⊗ V ∗. (14.14)

Comment 14.3 Product of polynomials as tensor product.

Taking into account the above considerations and coming back to Comment
14.1, we can additionally give a new interpretation of the product of polynomials:
Taking f = Θ, we have Θ(ϕ, χ) = ϕ ⊗ χ. This means that we can also consider the product of polynomials as an example of a tensor product.

We are now ready for the general definition of a tensor product.

Definition 14.3 Tensor product on .V .


For two given covariant tensors,.T ∈ T k (V ) and. S ∈ T l (V ), the tensor product
.⊗ is defined by

Θ : T k (V ) × T l (V ) −→ T k+l (V )
.

(T, S) |−→ T ⊗S(v1 , . . . , vk , vk+1 , . . . , vk+l ) := T (v1 , . . . , vk )S(vk+1,...,vk+l ).

We say .T ⊗ S is a covariant tensor of rank or order .k + l.



From the multilinearity of .T and . S follows the multilinearity of their product


since the expression
. T (v1 , . . . , vk ) S(vk+1 , . . . , vk+l )

is linear in each argument .vi , i ∈ I (k + l) separately. It is further clear that the


operation
Θ
.(T, S) |→ T ⊗ S

is bilinear; since when .T, T1 , T2 ∈ T k (V ), S, S1 , S2 ∈ T l (V ) and .λ1 , λ2 , μ1 , μ2 ∈


R:
.(λ1 T1 + λ2 T2 ) ⊗ S = λ1 (T1 ⊗ S) + λ2 (T2 ⊗ S)

and
. T ⊗ (μ1 S1 + μ2 S2 ) = μ1 (T ⊗ S1 ) + μ2 (T ⊗ S2 ).

As we see, tensor product is bilinear, just as every product should be.


In addition, it is associative:

. L ⊗ (T ⊗ S) = (L ⊗ T ) ⊗ S

which is also easy to verify. This means that we can write tensor products of several
tensors without parentheses.
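In coefficients, the tensor product is simply an outer product of coefficient arrays. The following NumPy sketch (an assumed example for V = R³, with random data) illustrates Definition 14.3 for k = 2, l = 1:

import numpy as np

rng = np.random.default_rng(6)
T = rng.standard_normal((3, 3))        # coefficients tau_{ij} of a covariant 2-tensor
S = rng.standard_normal(3)             # coefficients of a covector (1-tensor)

TS = np.einsum('ij,k->ijk', T, S)      # coefficients of T (x) S, a covariant 3-tensor

u, v, w = rng.standard_normal((3, 3))  # three test vectors
lhs = np.einsum('ijk,i,j,k->', TS, u, v, w)          # (T (x) S)(u, v, w)
rhs = np.einsum('ij,i,j->', T, u, v) * (S @ w)       # T(u, v) * S(w)
print(np.isclose(lhs, rhs))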
Since.T k (V ) is a vector space, we can choose a basis and determine the coefficients
of .T ∈ T k (V ).

Proposition 14.1 Basis in .T k (V ).


Let V be a vector space. If B = (b_1, ..., b_n) is a basis of V and B* = (β^1, ..., β^n) a basis of V* with β^j(b_i) = δ^j_i, i, j ∈ I(n), then there is a basis of T^k(V) given by

B^(k) = {β^{i_1} ⊗ ... ⊗ β^{i_k} : i_1, ..., i_k ∈ I(n)}.    (14.15)

It follows that every T ∈ T^k(V) has the decomposition

T = τ_{i_1...i_k} β^{i_1} ⊗ ... ⊗ β^{i_k},

with τ_{i_1...i_k} ∈ R, and dim T^k(V) = n^k.

Proof For a basis we have to show that


(i) .span(B (k) ) = T k (V ) ,
(ii) . B (k) is linearly independent .

We first show (i).


The multilinear form .T is uniquely determined by its values on certain lists of basis
vectors like .(bi1 , . . . , bi k ) by Remark 14.1. Therefore, .T is defined by the values

. T (bi1 , . . . , bi k ) = τi1 ...ik . (14.16)

That . B (k) spans .T k (V ) means that we have .T as a linear combination

. T = λi1 ...ik β i1 ⊗ . . . ⊗ β ik with λi1 ...ik ∈ R.

Using the multilinearity of .T , we may determine .λi1 ...ik , taking

T(b_{j_1}, ..., b_{j_k}) = λ_{i_1...i_k} (β^{i_1} ⊗ ... ⊗ β^{i_k})(b_{j_1}, ..., b_{j_k})
= λ_{i_1...i_k} β^{i_1}(b_{j_1}) ... β^{i_k}(b_{j_k})
= λ_{i_1...i_k} δ^{i_1}_{j_1} ... δ^{i_k}_{j_k}
= λ_{j_1...j_k}.

This gives, based on Eq. (14.16), λ_{i_1...i_k} = τ_{i_1...i_k}, and (i) holds.
To show (ii), we set λ_{i_1...i_k} β^{i_1} ⊗ ... ⊗ β^{i_k} = 0 and by the same computation as above, we obtain λ_{i_1...i_k} = 0, so (ii) is also valid and the proposition is proven. ∎

Comment 14.4 Determination of the coefficients of tensors.

Notice that with this proof we once again showed that the coefficients of .T
are given by .T (b j1 , . . . , b jk ). Additionally, this result makes the expression

. T k (V ) ∼
= ,V ∗ ⊗ ·,,
· · ⊗ V ,∗
k-times

and the corresponding identification

. T k (V ) = ,V ∗ ⊗ ·,,
· · ⊗ V ,∗
k-times

plausible.

As we see, every covariant tensor can be written as a linear combination of tensor


products of covectors. Further more, we may always assume that every .T ∈ T k (V )
can be written as .T = μi1 ...ik ξ i1 ⊗ . . . ⊗ ξ ik with .ξ i1 , . . . , ξ ik ∈ V ∗ not necessarily
basis vectors of .V ∗ . We see this if we take

ξ^{i_1} = ξ^{i_1}_j β^j, ..., ξ^{i_k} = ξ^{i_k}_j β^j,    (14.17)

with

ξ^{i_1}_j, ..., ξ^{i_k}_j ∈ R    (14.18)

and do the same calculation as above.

Definition 14.4 Decomposable tensors.


A tensor of the form ξ^1 ⊗ ... ⊗ ξ^k is called decomposable, ξ^1, ..., ξ^k ∈ V*.

We stated above that the set of decomposable tensors spans .T k (V ).

Comment 14.5 Tensor algebra on a vector space.

We may ask: Well, we see the product, the so-called tensor product. But where is the algebra? After all, in Definition 14.3 the product leaves both T^k(V) and T^l(V).

The answer is that we have to take all the tensor spaces together and so to
obtain the tensor algebra over .V :

. T ∗ (V ) := T 0 (V ) ⊕ T 1 (V ) ⊕ · · · ⊕ T k (V ) ⊕ · · · .

14.2 Contravariant Tensors and the Role of Duality

In Example 14.5 in the last section, we also saw, in addition to the covariant tensor
space V* ⊗ V*, the tensor space of contravariant tensors V ⊗ V. It may well be that, as a student of physics, the first tensor space one meets is V ⊗ V. We nevertheless started with covariant tensors because, coming from analysis, it seems quite natural to introduce and discuss the tensor product within covariant tensors. This was also demonstrated in Example 14.5.
It is, therefore, necessary to explain the connection between covariant and contravariant tensors. As the reader has probably already realized, this connection has a name: duality. To clarify the role of duality thoroughly, it is helpful to review what we know and to consider all the relevant possibilities which appear. If we start by comparing V with its dual V*, we also have to compare V* with its dual (V*)* ≡ V**, the so-called double dual of V. We cannot stop here since we also have to compare V** with (V**)* and so on. But when the dimension of V is finite, this procedure stops.

As we saw in Eq. (13.17) in Sect. 13.1, it turns out that V** is canonically isomorphic to V: V** ≅_can V. As in Sect. 14.1, V is an abstract vector space without further structure. We know that in this case V and V* are isomorphic, but this isomorphism is basis dependent: V ≅_B V* (noncanonical). So we have, for example,

ψ_B : V −→ V*
b_i |−→ β^i    (14.19)

with β^i(b_j) = δ^i_j, i, j ∈ I(n). The identification between V and V*, in the case of an abstract vector space V, is not possible. Again, if we want to compare V* with (V*)* ≡ V** ≡ Hom(V*, R), the same is true for the same reason.
Yet, if we want to compare .V with .V ∗∗ , the situation changes drastically. Here,
there exists a canonical isomorphism and an identification is possible:


= ∗∗
.ev :V →V
can
v |−→ v := ev(v) ∈ Hom(V ∗ , R),
#
(14.20)

given by

.v # :V ∗ −→ R
ξ |−→ v # (ξ) := ξ(v). (14.21)

This is the reason why we may identify .V with .V ∗∗ : V ∗∗ = V , and we do not make
a distinction between the elements .v # and .v. This identification is beneficial since it
simplifies a lot the tensor formalism. On the other hand, if we want to profit from
this simplification, we have to get a good understanding of this identification.
Proceeding similarly as in Sect. 14.1, we consider contravariant tensors essentially
by exchanging .V and .V ∗ . We now consider multilinear functions on .V ∗ instead of
considering them on V. So we obtain the space of contravariant tensors of order k, which we denote by T^k(V*), with

T^k(V*) ≅ V ⊗ ··· ⊗ V   (k times).

Taking into account Proposition 14.1 and using the notation . B # = (b1# , . . . , bn# ) for
a basis in .V ∗∗ and the identification .bi# = bi , we obtain for .T ∈ T k (V ∗ )

T = b^#_{i_1} ⊗ ··· ⊗ b^#_{i_k} τ^{i_1···i_k} = b_{i_1} ⊗ ··· ⊗ b_{i_k} τ^{i_1···i_k}.

This should be compared to the corresponding result in Proposition 14.1 for a covariant tensor.

T ∈ T^k(V):   T = τ_{i_1···i_k} β^{i_1} ⊗ ··· ⊗ β^{i_k}.

For T^k(V*) we also find in the literature the notation T_k(V) := T^k(V*). So we have, including the corresponding identifications, the expressions

T_k(V) = V ⊗ ··· ⊗ V   (k times)   and   T^k(V) = V* ⊗ ··· ⊗ V*   (k times).

14.3 Mixed Tensors

We start with an example of a mixed tensor, the following canonical bilinear form
given by .V ∗ and .V which is the evaluation of covectors on vectors:

. φ : V ∗ × V −→ R
(ξ, v) |−→ φ(ξ, v) := ξ(v). (14.22)

This bilinear form .φ is nondegenerate and inherently utilizes the canonical isomor-
phism between .V and .V ∗∗ , given by the partial map .ψv .

φ̃ : V −→ (V ∗ )∗
.

v |−→ ψv . with (14.23)



ψ_v : V* −→ R
ξ |−→ ψ_v(ξ) = φ(ξ, v) = ξ(v).    (14.24)

This ψ_v is the same as the map v^# of Eq. (14.21), and we notice again the identification ψ_v = v^#. With this preparation, it is easy to define a mixed tensor.

Definition 14.5 Mixed tensors.


Let V be a (as usual, finite-dimensional) vector space and k and l be nonnegative integers. The space of mixed tensors on V of type (k, l) is given by

T^k_l(V) = V* ⊗ ··· ⊗ V*   (k times)   ⊗   V ⊗ ··· ⊗ V   (l times).    (14.25)

Using again the reasoning of Proposition 14.1, we may denote a mixed tensor S ∈ T^k_l(V) by:

S = σ^{j_1···j_l}_{i_1···i_k} β^{i_1} ⊗ ··· ⊗ β^{i_k} ⊗ b_{j_1} ⊗ ··· ⊗ b_{j_l}.    (14.26)

The corresponding coefficients of S in the basis (B, B*) are given by

σ^{j_1···j_l}_{i_1···i_k} = S(b_{i_1}, ..., b_{i_k}, β^{j_1}, ..., β^{j_l}) ∈ R.

Mixed tensors are the most general case for which we need a change of basis formula.
Using the notation of Example 8.1, we consider the bases B = (b_s), B* = (β^s) and
C = (c_i), C* = (γ^i) of V and V*, respectively, with β^r(b_s) = δ^r_s and
γ^j(c_i) = δ^j_i, and r, s, i, j ∈ I(n).

The change of basis is given by the regular matrix

T = T_CB = (τ^i_s) ∈ Gl(n)    and    T^{-1} = T_BC = (τ̄^s_i).

We denote the coefficients of S in the basis (B, B*) by σ(B)^{s_1···s_l}_{r_1···r_k} and
in the basis (C, C*) by σ(C)^{i_1···i_l}_{j_1···j_k}.
So the transition map at the level of tensors is:

σ(C)^{i_1···i_l}_{j_1···j_k} = τ^{i_1}_{s_1} ··· τ^{i_l}_{s_l} τ̄^{r_1}_{j_1} ··· τ̄^{r_k}_{j_k} σ(B)^{s_1···s_l}_{r_1···r_k}.
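
The following Python/NumPy sketch (our own illustration; the identification of τ^i_s with
the (i, s) entry of the matrix T is an assumption of the sketch) checks this law for a
tensor of type (1, 1), where it reduces to the familiar similarity transformation
σ(C) = T σ(B) T^{-1}.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    sigma_B = rng.normal(size=(n, n))   # coefficients sigma(B)^s_r (upper index = row)
    T = rng.normal(size=(n, n))         # assumed invertible change-of-basis matrix
    Tbar = np.linalg.inv(T)

    # index form: sigma(C)^i_j = tau^i_s taubar^r_j sigma(B)^s_r
    sigma_C = np.einsum('is,rj,sr->ij', T, Tbar, sigma_B)

    print(np.allclose(sigma_C, T @ sigma_B @ Tbar))   # True: same law in matrix form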

14.4 Tensors on Semi-Euclidean Vector Spaces

In the last section, the vector space V was an abstract vector space without further
structure. This section considers tensors on a vector space with additional structure.
The most crucial additional structure on an abstract vector space in physics is a
Euclidean or semi-Euclidean structure. We add to the vector space V a symmetric
nondegenerate bilinear form s ∈ T^2(V), which is, as we know, a special covariant tensor
of rank 2. We thus obtain (V, s): a Euclidean vector space if s is positive definite, or a
semi-Euclidean vector space (e.g., Minkowski space in special relativity) if s is merely
symmetric and nondegenerate.
In both cases, the connection between V and V* changes drastically. As we saw in
Sect. 13.2, given the metric s, there exists a canonical isometry between V and V*:

(V, s) ≅_can (V*, s*).

This means that we identify not only V with V** but also V with V*. The identification
V = V* essentially indicates that we can forget the dual V* and that the vector space V
alone is relevant for the tensor formalism. This means that the distinction between
covariant and contravariant tensors is obsolete: given (V, s), we also have the
identification T^k(V*) = T^k(V) or equivalently


V* ⊗ ··· ⊗ V*   (k times)   =   V ⊗ ··· ⊗ V   (k times).

We therefore have only one kind of tensor. The distinction between covariant and
contravariant tensors is only formal and refers to the representation (the coefficients)
of a tensor relative to a given basis in V. This follows directly from our discussion in
Sect. 13.2. The canonical isometry brings the basis B* = (β^1, ..., β^n) of V* down to V
(see Remark 13.2):

š(β^1) = b^1, ..., š(β^n) = b^n.

So we have two bases in V, the original B = (b_1, ..., b_n) and B̌ = (b^1, ..., b^n),
given by b^j = σ^{ji} b_i (Proposition 13.2), with s(b_i, b^j) = < b_i | b^j > = δ_i^j.
We call B a covariant basis and B̌ a contravariant basis of V. For a given tensor
T ∈ T^k(V), we have two corresponding representations given by

T = τ^{i_1···i_k} b_{i_1} ⊗ ··· ⊗ b_{i_k} = τ_{i_1···i_k} b^{i_1} ⊗ ··· ⊗ b^{i_k}.

Accordingly, we may call τ^{i_1···i_k} a contravariant coefficient and τ_{i_1···i_k} a
covariant coefficient of one and the same tensor T. Similarly, we can continue with mixed
coefficients of type (k, l). If we considered V without a metric, as in the previous
sections, we would have for T ∈ T^k_l(V)

T = τ^{j_1···j_l}_{i_1···i_k} b_{j_1} ⊗ ··· ⊗ b_{j_l} ⊗ β^{i_1} ⊗ ··· ⊗ β^{i_k}.

Here, we consider V with a metric, (V, s), so we have

T = τ^{j_1···j_l}_{i_1···i_k} b_{j_1} ⊗ ··· ⊗ b_{j_l} ⊗ b^{i_1} ⊗ ··· ⊗ b^{i_k}.

In this case, T is a tensor of rank m = k + l. We further see that a fixed basis B in
fact leads to different representations of the same tensor T ∈ V ⊗ ··· ⊗ V ((k+l) times).
So we may write symbolically

τ^{i_1···i_m} ⟷ τ_{j_1···j_m} ⟷ τ^{i_1···i_k}_{j_1···j_l},

with i_1, ..., i_m, j_1, ..., j_m ∈ I(n) and m = k + l.
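
The passage between these coefficient types is "raising and lowering indices" with the
matrix of s and of its inverse. A minimal Python/NumPy sketch (our own illustration,
using the Minkowski metric of special relativity as an example):

    import numpy as np

    # matrix of the metric s in the basis B (Minkowski metric as an example)
    s = np.diag([1.0, -1.0, -1.0, -1.0])
    s_inv = np.linalg.inv(s)        # here s_inv equals s, but we keep it general

    rng = np.random.default_rng(3)
    tau_up = rng.normal(size=(4, 4))                      # contravariant tau^{ij}
    tau_down = np.einsum('ik,jl,kl->ij', s, s, tau_up)    # tau_{ij} = s_{ik} s_{jl} tau^{kl}
    tau_up_again = np.einsum('ik,jl,kl->ij', s_inv, s_inv, tau_down)

    print(np.allclose(tau_up_again, tau_up))   # True: raising undoes lowering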

14.5 The Structure of a Tensor Space

In this section, we introduce the universal property of tensors.



14.5.1 Multilinearity and the Tensor Product

Suppose, as always, that V is a vector space of dimension n. We start with a comparison
between the vector space

T^k(V) = V* ⊗ ··· ⊗ V*   (k times)

and an abstract vector space W of the same dimension (dim W = n^k). The two vector
spaces T^k(V) and W are isomorphic as abstract vector spaces because of their identical
dimensions.
It should also be evident that a vector space like .V ⊗ · · · ⊗ V has more structure
than the vector space .W . This is so for at least two reasons.
Firstly, the vector space .V ⊗ · · · ⊗ V is different from .W since its elements have
the form of a linear combination of .v1 ⊗ · · · ⊗ vk , and secondly we have some kind
of product on it (the tensor product). The identification .T k (V ) = V ∗ ⊗ · · · ⊗ V ∗ and
the explicit presence of the tensor product symbol .⊗ make this algebraic structure
visible (see Comment 14.4).
On the other hand, such an algebraic structure is not available on the abstract vector
space W. Furthermore, the comparison between the multilinear maps in T^k(V*) and the
tensor product V ⊗ ··· ⊗ V, or equivalently between the multilinear maps in T^k(V) and
V* ⊗ ··· ⊗ V*, is an instructive one: expressions like V ⊗ ··· ⊗ V or V* ⊗ ··· ⊗ V* are
by definition purely algebraic objects. Since we start with an abstract vector space V,
whose elements are just vectors and not maps, the elements of V ⊗ ··· ⊗ V are vectors
(tensors), not maps, let alone multilinear forms like the elements of T^k(V*). The same
holds for V* and V* ⊗ ··· ⊗ V*, since in going from V* to V* ⊗ ··· ⊗ V* we follow, by
definition, the same procedure as in going from V to V ⊗ ··· ⊗ V. This means that in such
an algebraic construction we discard any structure of V* beyond that of an abstract
vector space. With this in mind, we can say that V* ⊗ ··· ⊗ V* is a purely algebraic
object, as opposed to the multilinear forms in T^k(V).

14.5.2 The Universal Property of Tensors

We now return to Example 14.5 in Sect. 14.1 and once more discuss, from a more general
point of view, the relation found there between tensors as multilinear maps and linear
maps (see the diagram in Example 14.5). A tensor space is also a special vector space
that allows us to consider multilinear maps as linear maps. The domain of this special
linear map is the tensor space which corresponds to the given multilinear map. This
leads to the following proposition:

Proposition 14.2 Universal property of tensor products.

Let U and V be two real vector spaces. There exists a vector space U ⊗ V together with a
bilinear map

Θ : U × V −→ U ⊗ V,                                                  (14.27)

with the following "universal" property:

For every vector space Z, together with a bilinear map

ϕ : U × V −→ Z,                                                      (14.28)

there exists a unique linear map

ϕ̃ : U ⊗ V −→ Z,

such that ϕ = ϕ̃ ◦ Θ.

This corresponds to the commutative diagram

         Θ            ϕ̃
U × V −−−→ U ⊗ V −−−→ Z ,      ϕ = ϕ̃ ◦ Θ.

So we have dim(U ⊗ V) = dim U · dim V and the canonical isomorphism

T(U, V; Z) ≅ Hom(U ⊗ V, Z).                                          (14.29)

Proof The use of bases makes the proof quite transparent. Let

. B = (b1 , . . . , bn ) = (bs )n and C = (c1 , . . . , cm ) = (ci )m

be the corresponding bases of U and V. We consider the set B × C = {(b_s, c_i) :
s ∈ I(n), i ∈ I(m)}. The two bilinear maps Θ and ϕ are, as we know, uniquely determined
by their values on the set B × C. We define U ⊗ V to be the free vector space generated
by B × C,

U ⊗ V := R(B × C),

with basis b_s ⊗ c_i := Θ(b_s, c_i) (see Definition 8.1).

It is clear that dim(U ⊗ V) = n · m by construction. This corresponds to the n · m values
ϕ(b_s, c_i) := z_{si} ∈ Z of the bilinear map ϕ ∈ T(U, V; Z).

There is a unique linear map ϕ̃ ∈ Hom(U ⊗ V, Z) defined on the basis elements by

ϕ̃ : U ⊗ V −→ Z
     b_s ⊗ c_i |−→ ϕ̃(b_s ⊗ c_i) := z_{si}.                          (14.30)

So we have

ϕ̃(Θ(b_s, c_i)) = ϕ(b_s, c_i),

or equivalently

ϕ̃ ◦ Θ = ϕ.

The bijectivity ϕ ↔ ϕ̃ holds by construction, and we have a canonical isomorphism

T(U, V; Z) ≅ Hom(U ⊗ V, Z).                                          (14.31)

Remark 14.3 The isomorphism T(U, V; Z) ≅ Hom(U ⊗ V, Z) given above was constructed using
a choice of bases of U and V. It turns out that one can prove that all choices lead to
the same isomorphism, and thus that this isomorphism is a canonical isomorphism.
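
The content of Proposition 14.2 can also be made concrete in coordinates: a bilinear map
ϕ : U × V → Z is fixed by its values z_{si} = ϕ(b_s, c_i), and the induced linear map ϕ̃
acts on the flattened outer product of the coordinate vectors. The following Python/NumPy
sketch (our own illustration; all names are ad hoc) checks ϕ = ϕ̃ ◦ Θ numerically.

    import numpy as np

    rng = np.random.default_rng(4)
    n, m, p = 3, 4, 2                       # dim U, dim V, dim Z
    z = rng.normal(size=(n, m, p))          # z[s, i] = phi(b_s, c_i), a vector in Z

    def phi(u, v):
        # the bilinear map, on coordinate vectors u of U and v of V
        return np.einsum('s,i,sip->p', u, v, z)

    def theta(u, v):
        # Theta : U x V -> U (x) V, realized as the flattened outer product
        return np.outer(u, v).reshape(n * m)

    phi_tilde = z.reshape(n * m, p)         # the unique linear map U (x) V -> Z

    u, v = rng.normal(size=n), rng.normal(size=m)
    print(np.allclose(phi(u, v), theta(u, v) @ phi_tilde))   # True: phi = phi_tilde o Theta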

Corollary 14.1 Universal property of tensors with Z = R.

The special case Z = R leads to

         Θ            ϕ̃
U × V −−−→ U ⊗ V −−−→ R ,      ϕ = ϕ̃ ◦ Θ,

and to

T^2(U, V) ≅ Hom(U ⊗ V, R) ≡ (U ⊗ V)*.                                (14.32)

If we use the assertion of Example 14.5 in Sect. 14.1 and the isomorphism
V* ⊗ V* ≅ T^2(V) of Eq. (14.10), and after the identification T^2(U, V) = U* ⊗ V*,
we obtain additionally the equation

(U ⊗ V)* = U* ⊗ V*.                                                  (14.33)
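
Concretely, the identification (U ⊗ V)* = U* ⊗ V* means that the covector ξ ⊗ η acts on a
decomposable tensor u ⊗ v by ξ(u) η(v). A short Python/NumPy check (our own illustrative
sketch):

    import numpy as np

    rng = np.random.default_rng(5)
    n, m = 3, 4
    u, v = rng.normal(size=n), rng.normal(size=m)
    xi, eta = rng.normal(size=n), rng.normal(size=m)

    # pair the flattened coefficient arrays of xi (x) eta and u (x) v
    lhs = np.outer(xi, eta).reshape(-1) @ np.outer(u, v).reshape(-1)
    print(np.isclose(lhs, (xi @ u) * (eta @ v)))   # True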

The above proposition, along with its proof, gives a good understanding of the properties
of linear and multilinear maps. Tensor spaces of the form V ⊗ ··· ⊗ V, serving as domains
of the linear maps that correspond to the multilinear maps in T^k(V*), can be considered
as algebraic manifestations of multilinear maps. This makes some properties of multilinear
maps more transparent and justifies once more the identification of a priori different
objects like

T^k(V) = (V ⊗ ··· ⊗ V)* = V* ⊗ ··· ⊗ V*   (k times each);            (14.34)
T^k(V*) = (V* ⊗ ··· ⊗ V*)* = V ⊗ ··· ⊗ V   (k times each).           (14.35)

14.6 Universal Property of Tensors and Duality

This section summarizes and discusses some of the essential identifications within the
tensor formalism which stem from duality and the universal property of tensor products.
We also comment briefly on the relevant proofs, since this gives a better understanding
of the key relations. When we consider duality, we use the identification V** = V and the
expected identification (U ⊗ V)* = U* ⊗ V*, for which, on this occasion, we prove the
corresponding canonical isomorphism.

We first discuss the following isomorphisms.

Proposition 14.3 Canonical isomorphisms in tensor formalism.

(i)   (U ⊗ V)* ≅ T(U, V);
(ii)  U* ⊗ V* ≅ T(U, V);
(iii) (U ⊗ V)* ≅ U* ⊗ V*.

Proof of (i): The isomorphism (i) is best obtained from the commutative diagram

         Θ            ϕ̃
U × V −−−→ U ⊗ V −−−→ R ,      ϕ = ϕ̃ ◦ Θ.

Proposition 14.2 in Sect. 14.5.2 gives the bijection ϕ ↔ ϕ̃ with ϕ ∈ T(U, V) and
ϕ̃ ∈ (U ⊗ V)*. This leads to the canonical isomorphism Θ̃:

Θ̃ : T(U, V) −→ (U ⊗ V)*
     ϕ |−→ ϕ̃ ,

given by the canonical isomorphism

T(U, V) ≅ (U ⊗ V)*   (via Θ̃),

and to the identification

T(U, V) = (U ⊗ V)*.

So (i) is proven. ∎

Proof of (ii): The isomorphism (ii) is obtained again from Proposition 14.2 and the
commutative diagram

           Θ              ψ̃
U* × V* −−−→ U* ⊗ V* −−−→ T(U, V) ,      ψ = ψ̃ ◦ Θ.

Here, we need to show that the linear map ψ̃ is a bijection. Since
dim(U* ⊗ V*) = dim T(U, V), we only have to show that ψ̃ is injective; see Example 14.5
in Sect. 14.1, with the corresponding notation P = ψ and P̃ = ψ̃. So we now have

ψ : U* × V* −→ T(U, V),

ψ̃ : U* ⊗ V* −→ T(U, V),
     ξ ⊗ η |−→ ψ̃(ξ ⊗ η) = ψ(ξ, η) = ξη,

and, with Θ, Θ̃ as in the universal property:

Θ̃ : T(U*, V*; T(U, V)) −→ Hom(U* ⊗ V*, T(U, V)),
     ψ |−→ ψ̃.

Finally, we have to show the injectivity of ψ̃. Let (b_1, ..., b_n) = (b_s)_n be a basis
of U and (c_1, ..., c_m) = (c_i)_m a basis of V, and suppose that

ψ̃(ξ ⊗ η)(b_s, c_i) = ξη(b_s, c_i) = ξ(b_s) η(c_i) = 0  for all s and i.

Then ξ(b_s) η(c_i) = 0 for all s and i forces ξ ⊗ η = 0 in U* ⊗ V*. Hence ψ̃ is indeed
injective, and ψ̃ is the canonical isomorphism:

U* ⊗ V* ≅ T(U, V)   (isomorphism (ii)),

corresponding to the identification

U* ⊗ V* = T(U, V).

So (ii) is proven. ∎

Proof of (iii): From the isomorphisms in (i) and (ii), we also obtain directly

(U ⊗ V)* ≅ U* ⊗ V*   (isomorphism (iii)).

Summarizing, we have, altogether, the identifications

T(U, V) = (U ⊗ V)*,   T(U, V) = U* ⊗ V*,   (U ⊗ V)* = U* ⊗ V*

and, by passing to the dual spaces, we obtain

T(U*, V*) = (U* ⊗ V*)*,   T(U*, V*) = U ⊗ V,   (U* ⊗ V*)* = U ⊗ V.

The following proposition is useful in general relativity, particularly in connection


with curvature.

Proposition 14.4 Canonical isomorphism between vector-valued multilinear maps and
multilinear forms.

Let V, Z be vector spaces and k ∈ N. There exists the following canonical isomorphism:

T^k(V; Z) ≅ T^{(k+1)}(V, Z*),

with

T^{(k+1)}(V, Z*) = T(V, ..., V  (k times), Z*).

Proof This isomorphism is a generalization of the canonical isomorphism between vectors
and bidual vectors, Z ≅ Z**, and the proof goes along the same lines. We consider the
dual pairing

Z* × Z −→ R
(ξ, z) |−→ ξ(z) ∈ R.

The evaluation map z# ∈ Z** is given, as usual, by z#(ξ) = ξ(z) whenever ξ ∈ Z*. There is
a similar relationship for tensor maps too:

F : T^k(V; Z) −→ T^{k+1}(V, Z*)
     ψ |−→ ψ# ,

where

ψ#(v_1, ..., v_k, ξ) := ξ(ψ(v_1, ..., v_k)) ∈ R.                     (14.36)

The analogy with Z ≅ Z** becomes visible if we fix (v_1, ..., v_k) and set
z := ψ(v_1, ..., v_k) ∈ Z, so that ψ#(v_1, ..., v_k, ξ) = ξ(z) = z#(ξ) ∈ R. The canonical
isomorphism Z ≅ Z** thus corresponds to

T^k(V; Z) ≅ T^{k+1}(V, Z*).

So the proposition is proven. ∎


We get the identification

. T k (V ; Z ) = T k+1 (V, Z ∗ ).

Corollary 14.2 From this, we can also obtain the following interesting special
cases. When .k = 1 we have

. Hom(V, Z ) ≡ T 1 (V ; Z )

and the identification

. Hom(V, Z ) = T (2) (V, Z ∗ )

or
. Hom(V, Z ) = V ∗ ⊗ Z .

When .k = 0, we may write .T (0) (V ; Z ) ≡ Z and we get

. T 0 (V ; Z ) = T (1) (V, Z ∗ ),

which is the same as

. T 0 (V ; Z ) = T 1 (Z ∗ ) or
Z = Z ∗∗ .
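
The identification Hom(V, Z) = V* ⊗ Z can be made tangible in coordinates: the
decomposable element ξ ⊗ z acts on a vector v as the rank-one map v |−→ ξ(v) z, and every
linear map is a sum of such rank-one maps. A small Python/NumPy sketch (our own
illustration, not the book's notation):

    import numpy as np

    rng = np.random.default_rng(6)
    n, p = 4, 3
    xi = rng.normal(size=n)        # a covector in V*
    z = rng.normal(size=p)         # a vector in Z
    v = rng.normal(size=n)

    A = np.outer(z, xi)            # matrix of the rank-one map xi (x) z : V -> Z
    print(np.allclose(A @ v, (xi @ v) * z))    # True: (xi (x) z)(v) = xi(v) z

    # Conversely, any matrix B is the sum of the rank-one maps beta^i (x) B(b_i):
    B = rng.normal(size=(p, n))
    print(np.allclose(B, sum(np.outer(B[:, i], np.eye(n)[i]) for i in range(n))))  # True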

14.7 Tensor Contraction

The tensor contraction, also known as the contraction or trace map, is a simple operation
on tensor spaces that we already know. It is essentially the evaluation of a covector on
a vector, giving a scalar. It was used at the beginning of Sect. 14.3 about mixed tensors
and at the beginning of the proof of Proposition 14.4. The importance of this operation
justifies repeating it here. We consider the map again:

φ : V* × V −→ R
    (ξ, v) |−→ φ(ξ, v) := ξ(v),

or equivalently the map corresponding to the universal property of tensor products in
Proposition 14.2:

           Θ            φ̃
V* × V −−−→ V* ⊗ V −−−→ R ,      φ = φ̃ ◦ Θ.

Here, we denote the relevant map by C:

C : V* ⊗ V −→ R
    ξ ⊗ v |−→ C(ξ ⊗ v) = ξ(v).

The map C is called the tensor contraction. As we see, both maps φ and C above are basis
independent. This C can also be considered as an operator on tensor spaces:

C : T^1_1(V) −→ T^0_0(V).

Comment 14.6 If we apply this reasoning to an operator f ∈ Hom(V, V), we recognize that
this corresponds to taking the trace of f.

From Corollary 14.2 in the previous section, there follows the canonical isomorphism

Hom(V, V) ≅ V* ⊗ V   (via the isomorphism ι),

and the trace of f is given by the composition

tr : Hom(V, V) −→ R
      f |−→ tr(f) := C ◦ ι(f).

With a given basis B of V, this takes the form tr(f) := tr(f_BB) := tr((ϕ^i_s)) = ϕ^i_i ∈ R,
which is the well-known sum of the diagonal elements of the representation matrix f_BB.
From the transformation properties of the matrix f_BB, we see again that tr(f) does not
depend on the chosen basis B.
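
A quick numerical check of this basis independence (our own Python/NumPy sketch, with an
assumed invertible change-of-basis matrix):

    import numpy as np

    rng = np.random.default_rng(7)
    n = 5
    f_BB = rng.normal(size=(n, n))              # matrix of f in the basis B
    T = rng.normal(size=(n, n))                 # assumed invertible change of basis
    f_CC = T @ f_BB @ np.linalg.inv(T)          # matrix of the same f in the basis C

    tr_B = np.einsum('ii->', f_BB)              # the contraction C o iota, i.e. phi^i_i
    tr_C = np.einsum('ii->', f_CC)
    print(np.isclose(tr_B, tr_C))               # True: tr(f) is basis independent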

We would now like to extend the tensor contraction C to tensors of type (k, l):

C^i_j : T^k_l(V) −→ T^{k-1}_{l-1}(V).

For this purpose, we first consider the tensor space V* ⊗ ··· ⊗ V* ⊗ V ⊗ ··· ⊗ V of type
(k, l). We have to specify the position of one covector factor V*_i with i ∈ I(k) and one
vector factor V_j with j ∈ I(l), and we can write

T^k_l(V) = V*_1 ⊗ ··· ⊗ V*_k ⊗ V_1 ⊗ ··· ⊗ V_l.

With appropriate notation and numeration, the operation C^i_j is given by:

C^i_j : V*_1 ⊗ ··· ⊗ V*_k ⊗ V_1 ⊗ ··· ⊗ V_l −→ R ⊗ V*_1 ⊗ ··· ⊗ V*_{k-1} ⊗ V_1 ⊗ ··· ⊗ V_{l-1}
        ξ^1 ⊗ ··· ⊗ ξ^i ⊗ ··· ⊗ ξ^k ⊗ v_1 ⊗ ··· ⊗ v_j ⊗ ··· ⊗ v_l |−→ ξ^i(v_j) ξ^1 ⊗ ··· ⊗ ξ^{k-1} ⊗ v_1 ⊗ ··· ⊗ v_{l-1}.

Evidently, C^i_j(··· ⊗ ξ^i ⊗ ··· ⊗ v_j ⊗ ···) is a tensor of type (k − 1, l − 1).

Example 14.7 T = ξ^1 ⊗ ξ^2 ⊗ v_1 ⊗ v_2 ⊗ v_3.
If we take T = ξ^1 ⊗ ξ^2 ⊗ v_1 ⊗ v_2 ⊗ v_3, we obtain

C^1_3(ξ^1 ⊗ ξ^2 ⊗ v_1 ⊗ v_2 ⊗ v_3) = ξ^1(v_3) ξ^2 ⊗ v_1 ⊗ v_2 ∈ T^1_2(V).

Similarly, we get

C^1_2(ξ^1 ⊗ ξ^2 ⊗ v_1 ⊗ v_2 ⊗ v_3) = ξ^1(v_2) ξ^2 ⊗ v_1 ⊗ v_3 ∈ T^1_2(V).
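
In components, such a contraction is a single einsum over the coefficient array. A
Python/NumPy sketch of the first contraction in Example 14.7 (our own illustration):

    import numpy as np

    rng = np.random.default_rng(8)
    n = 3
    xi1, xi2 = rng.normal(size=n), rng.normal(size=n)
    v1, v2, v3 = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)

    # coefficient array of T = xi1 (x) xi2 (x) v1 (x) v2 (x) v3
    T = np.einsum('i,j,k,l,m->ijklm', xi1, xi2, v1, v2, v3)

    C13_T = np.einsum('ijkli->jkl', T)      # contract covector slot 1 with vector slot 3
    expected = (xi1 @ v3) * np.einsum('j,k,l->jkl', xi2, v1, v2)
    print(np.allclose(C13_T, expected))     # True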

We can also consider elements of T^k_l(V) as multilinear forms. In this case, the
analogous procedure for the contraction C^i_j leads to the following development.
Consider T ∈ T^k_l(V), with

T : V × ··· × V  (k times)  ×  V* × ··· × V*  (l times)  −→ R.

We first fix the vectors v_1, ..., v_{k-1} and the covectors ξ^1, ..., ξ^{l-1}, and we
then consider a basis B = (b_1, ..., b_n) of V and its dual B* = (β^1, ..., β^n), so that
β^i(b_s) = δ^i_s, i, s ∈ I(n). We define:

C^i_j T(v_1, ..., v_{k-1}, ξ^1, ..., ξ^{l-1})
      := Σ_{s=1}^{n} T(v_1, ..., v_{j-1}, b_s, v_j, ..., v_{k-1}, ξ^1, ..., ξ^{i-1}, β^s, ξ^i, ..., ξ^{l-1}),

inserting the vector b_s ∈ V in position j and the covector β^s ∈ V* in position i.
With these definitions, we have (C^i_j T)(v_1, ..., v_{k-1}, ξ^1, ..., ξ^{l-1}) ∈ R, so
that C^i_j T is evidently a multilinear form of type (k − 1, l − 1) and C^i_j a
contraction of T corresponding to (i, j):

C^i_j : T^k_l(V) −→ T^{k-1}_{l-1}(V).

Example 14.8 T(v_1, v_2, v_3, ξ^1, ξ^2).

For T(v_1, v_2, v_3, ξ^1, ξ^2) with i = 1 and j = 3:

C^1_3 T(v_1, v_2, ξ^2) = Σ_{s=1}^{n} T(v_1, v_2, b_s, β^s, ξ^2).

If instead i = 1 and j = 2, then:

C^1_2 T(v_1, v_3, ξ^2) = Σ_{s=1}^{n} T(v_1, b_s, v_3, β^s, ξ^2).

Example 14.9 Example 14.8 in coefficients.

For the coefficients relative to the bases B = (b_s) and B* = (β^r), with
s, r, i_1, i_2, j_1 ∈ I(n), in the above examples we have:

C^1_3 T(b_{i_1}, b_{i_2}, β^{j_1}) = Σ_{s=1}^{n} T(b_{i_1}, b_{i_2}, b_s, β^s, β^{j_1})
                                   = τ_{i_1 i_2 s}^{s j_1},

and similarly:

C^1_2 T(b_{i_1}, b_{i_2}, β^{j_1}) = Σ_{s=1}^{n} T(b_{i_1}, b_s, b_{i_2}, β^s, β^{j_1})
                                   = τ_{i_1 s i_2}^{s j_1}.
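
In terms of the coefficient array, these contractions are again one-line einsum calls. A
Python/NumPy sketch (our own illustration; the array layout
tau[i_1, i_2, i_3, j_1, j_2] = T(b_{i_1}, b_{i_2}, b_{i_3}, β^{j_1}, β^{j_2}) is an
assumption of the sketch):

    import numpy as np

    rng = np.random.default_rng(9)
    n = 3
    tau = rng.normal(size=(n, n, n, n, n))   # coefficients of a tensor of type (3, 2)

    C13 = np.einsum('abssj->abj', tau)   # (C^1_3 T)_{i1 i2}^{j1} = tau_{i1 i2 s}^{s j1}
    C12 = np.einsum('asbsj->abj', tau)   # (C^1_2 T)_{i1 i2}^{j1} = tau_{i1 s i2}^{s j1}

    print(C13.shape, C12.shape)          # (3, 3, 3) (3, 3, 3): type (2, 1) in each case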

Summary

This was the third time we delved into tensors. The first two approaches, in Sect. 3.5
and Chap. 8, were basis dependent in order to facilitate understanding. At the same time,
the index notation we have used throughout linear algebra significantly supported this
understanding.
In this chapter, it was time for the basis-free and coordinate-free treatment of the
tensor formalism. In this sense, we could affirm that a tensor is a multilinear map.
Since we initially considered abstract vector spaces, the distinction between covariant,
contravariant, and mixed tensors was necessary. Following that, we considered a vector
space equipped with an inner product and established the corresponding tensor quantities.
After this, the universal property of the tensor product was introduced, allowing for a
deeper understanding of the concept of a tensor. Finally, using the universal property,
several commonly used relationships, essentially involving the dual space of a tensor
product of two vector spaces, were proven.

Index

A Covector fields, 71
Abelian group, 19, 20, 54, 81, 99
Affine space, 17, 55, 56, 138
Algebra, 20 D
Algebra End(V), 267 Decomposition, 117, 260, 261, 281
Algebraic multiplicity, 256 Determinant, 203
Alternating tensors, 239 Determinant function, 204
Angle, 47 Diagonalizability, 259, 260, 262, 265, 266, 287
Annihilators, 184
Dimension, 96, 108, 115, 125
Direct product, 114, 118
B Direct sum, 62, 118, 252
Basis, 94, 95, 106, 380 Disjoint composition, 221
Dual, 37, 50, 71, 86
Basis isomorphism, 109
Dual basis, 176
Bijective, 108
Dual map, 178
Bilinear form, 40, 49, 355
Dual space, 37

C E
Canonical basis, 72 Effective action, 11
Canonical dual basis, 176 Eigen element, eigenelement, 243
Canonical isometry, 366 Eigenspace, 243
Canonical isomorphism, 365 Eigenvalue, 243
Canonical map, 4, 60 Eigenvector, 243
Cauchy-Schwarz, 43 Elementary matrices, 199
Cayley-Hamilton, 282 Elementary row operations, 201
Change of basis, 157, 235 Equivalence classes, 2, 3, 6, 57, 58, 106, 111,
Characteristic polynomial, 254, 287 152, 220
Colist, 86, 87, 147, 362 Equivalence relation, 1–4, 6, 11, 12, 14, 58, 90,
Column rank, 148 112, 152, 203, 218, 219, 356
Column space, 148 Equivariant map, 1, 13, 14, 100, 102
Complementary subspace, 116 Equivariant vector space, 101
Complement in set theory, 116 Euclidean space, 138
Complexification, 329, 330 Euclidean vector space, 17, 39, 44, 47, 48, 54,
Complex spectral theorem, 314, 316, 327 74, 135, 186, 190, 215, 299, 333, 336,
Contravariant tensor, 382 346, 365, 385
Covariant k-tensor, 372
Covector, 53, 71, 87, 97, 175, 362


F Matrix polynomial, 284


Field, 18 Minimal polynomial, 284–286
Free action, 11, 12, 99 Minkowski spacetime, 138
Free vector space, 237 Multilinear map, 372–374
Fundamental theorem, 148, 154 Multilinear vector valued map, 372

G N
Generalized eigenvector, 273 Newton Axioms, 120
Geometric multiplicity, 251, 256 Newtonian mechanics, 74, 120, 365
Gram-Schmidt orthogonalization, 302 Nilpotent, 277, 288
Group, 6–12, 14, 17, 98, 131, 133, 134, 138, Nilpotent matrix, 274–276
139, 221, 223, 227, 317, 325, 348, 359 Nilpotent operator, 248, 274
Group action, 1, 6–8, 13, 54, 98, 101, 104 Non-diagonalizability, 248, 268–270, 272, 276
G-space, 9 Nondiagonalizable, 271, 328
Norm, 40
Normal form, 111, 112
H Normal matrices, 333
Homogeneous, 165 Normal operator, 308, 310, 316, 335, 337, 338
Homogeneous equation, 165
Homomorphism, 36
O
Orientation, 219
I Oriented vector space, 220
Ideal, 284 Orthogonal basis, 301
Image, 36 Orthogonal complement, 304, 305
Inhomogeneous, 165 Orthogonality, 41
Injective, 108, 185 Orthogonal operator, 346, 348
Inner product, 38, 39, 366 Orthogonal projection, 41, 306
Isometry, 325, 327, 366 Orthogonal sum, 304
Isomorphism, 36 Orthonormal basis, 301
Isotropy group, 10–13 Orthonormal expansion, 301

K P
Kernel, 36 Parallel projection, 66
Permutation sign, 223
Photon, 136
L Point, 24, 29
Lagrangian equation, 120, 123 Polynomials, 267
Laplace expansion, 214 Pullback, 180
Left group action, 7, 10, 158
Leibniz formula, 225
Linear combination, 28, 82 Q
Linearly dependent, 89, 90, 92, 93 Quotient map, 4, 5, 60
Linearly independent, 63, 89–91, 93, 115, 251 Quotient space, 1, 3, 5, 6, 11–14, 56, 58, 60–62,
Linear maps, 32, 36 79, 111, 218, 220, 221
Linear system, 164, 172 Quotient vector space, 60
List, 22, 86, 87, 234, 362

R
M Rank, 83, 356
Matrix, 22, 86, 106, 113, 234, 362 Rank inequalities, 146
Matrix multiplication, 53, 142, 143, 193 Rank-nullity, 107

Rank of a matrix, 144, 147, 150 Surjective, 108, 185


Real spectral theorem, 308, 314, 315, 328 Sylvester’s law of inertia, 358
Real vector spaces, 21 Symmetric k-tensor, 239
Reduced row echelon form, 111, 168
Reflection, 346, 348
Representation, 97, 107, 110–112, 163, 234, T
235 Tailor-made basis, 106, 166
Representation change, 162, 235 Tailor-made solution, 167
Representation of eigenvectors, 253 Tangent bundle, 71
Representations of V, 159 Tangent space, 55, 69
Right group action, 9, 158 Tangent vector, 55, 69, 71
Ring, 18, 21 Tensor, 119, 129, 236, 372, 390
Tensor algebra, 382
Tensor product, 129, 376, 379, 388
S Transitive action, 11, 318, 348
Scalars, 17, 20, 21, 39, 85, 88, 97 Transpose, 54
Semi-Euclidean, 138 Triangularization, 290, 291
Semi-Euclidean vector space, 17, 40
Semisimple algebra, 287
Similar matrices, 112 U
Singular Value Decomposition (SVD), 351, Universal property, 388–390
353 Upper triangular matrix, 275
Span, 82
Special relativity, 2, 5, 47, 51, 52, 132, 138,
343, 354, 355, 357, 360, 365, 385 V
Spectral decomposition, 316 Vector, 24, 97, 362
Spectral theorem, 242, 314, 316, 317, 323, 324, Vector bundle, 71
352, 357 Vector field, 72
Standard vector, 50 Vector space, 19
Structure theorem (pre-Jordan), 277, 281 Vector valued map, 101
Subspace, 30 Velocity of light c, 136
