
MECHANICS II

APPM2023

Course Notes by
Warren Carlson

2024
Preface

These are the course notes for APPM2023. These notes are intended to complement the lectures
and other course material, and are by no means intended to be complete. Students should consult
the references at the end of this document and those provided by the course instructor to find
additional material, and different views on the subject matter.
This material is under development. Please report any errors or problems (no matter how
big or small). Suggestions are gratefully received.

School of Computer Science and Applied Mathematics,


University of the Witwatersrand,
Johannesburg,
South Africa

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs


License. To view a copy of this license, visit,

• http://creativecommons.org/licenses/by-nc-nd/2.5/

Send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105,
USA. In summary, this means you are free to make copies provided it is not for commercial
purposes and you do not change the material.

Contributors

This document was prepared with the help of Dario Fanucchi, Matthew Woolway, Sheldon Herbst
and Kendall Born. I am indebted to these individuals for their help in preparing this material.

Foreword

We live today in an era of discovery, where new phenomena are constantly being observed and
explained, where thousands of scientists are competing in an undying quest to discover the secret
language of the universe. We live in a world of intellectual abundance, a world where precious
gems of knowledge, mined for so conscientiously by our predecessors and contemporaries are
just a mouse-click away. It is certainly exciting to witness this lightning-quick development of
human understanding, which expands into new domains with more pace and agility than the
most riveting game of sports. What then, might lead us to the next breakthrough? What might
decode the beautiful language of that grand and silent masterpiece - the universe? Perhaps it is
prudent to heed the words of Newton that we can only progress by ‘Standing on the Shoulders of
Giants’.
With this inspiration, let us embark on a journey of the mind to the very origins of our scientific
understanding. Our journey begins in the year 2010 and we travel slowly backwards, noticing
as we go the thousands of journal articles and PhD theses published in the time we are winding back.
Soon we have passed to a time before social networking was a concept, and before cell phones had
colour screens. Our journey begins to speed up now, and we see the information revolution and
the internet disappearing. Let us go further still, before humans had travelled into space, before
television, before the Wright Brothers developed the first Aeroplane, before radio was discovered,
before highways and skyscrapers and cars and trucks. Let us go back to before Charles Babbage
first conceived his Analytical Engine, and continue still, to before electricity was understood and
before railways were ever built. Now we are travelling at a breathtaking speed beyond the first
telescope, crafted by Galileo Galilei, and beyond the first use of a magnetic compass. We are
reaching the furthest stretches of known history - around us is a desolate plain. Human beings
are living in caves and hiding out in the vast unknown. Let us pause here, and begin our walk back
to the present.
Humans, from the very outset, have been driven by the urge to understand the world. Even
in the distant past we observe the prehistoric man carving out a tool from rock and wielding
it to best his prey. Further forward we see the discovery of fire, the taming of wolves and the
logging of the stars above. We now jump to the time of the Ancient Greeks. We see sundials and
water clocks for telling the time of day, pulleys and levers for moving heavy objects, channels of
water irrigating farms and cities, and the great Library of Alexandria - a veritable treasure trove
of all the knowledge of the time. Archimedes, who lived at this time, is regarded now as one of
the three most influential figures in the history of mathematics (the other two being Newton

and Gauss). It was Archimedes who first gave serious thought to the language of the universe,
in devising methods for obtaining volumes and areas of various solids, and considering basic
ideas in mechanics and mathematics. These ideas were sporadic, but nonetheless foundational
to future understanding of the world. One important contribution to our understanding of the
universe by the Ancient Greeks in general is the formal study of Geometry. Euclid’s Geometry of
flat space was to become the cornerstone of classical mechanics in later years.
Moving forward to the 16-th century, we skip past ages of development in various
parts of the world. Here we meet a man by the name of Galileo Galilei, well regarded as one of the
greatest thinkers of all time, who reasoned deeply about the nature of the physical world. Galileo’s
famous observation that objects of different mass fall with the same acceleration (9.8 m s−2 ) is
still quoted today. One of the most paradigm-changing notions introduced by Galileo is that of
an inertial reference frame: The physics of a system, it seems, obeys the same laws no matter
what angle we look at it from, no matter where we look at it from, and no matter how fast we are
moving, provided that we are moving at a constant velocity. These are the symmetries of classical
mechanics, and it was Galileo’s discovery of the last one in particular that led to Newton’s First
Law of Motion. In the late 16-th and early 17-th century, we pause to consider the work of one
Johannes Kepler, who charted the motions of the planets through space, and first posed the theory
that they followed elliptical orbits around the sun. We notice the theory of the geocentric universe
losing all credibility, and being replaced with a heliocentric model of the solar system. Yet Kepler
has no theoretical framework that explains the motions of the planets.
Now we take our journey forward to the late 17-th century, where we meet the Eponymous Hero
of all Newtonian Mechanics: Isaac Newton Himself. Newton was an extremely able mathematician,
who independently developed Calculus (concurrently with Leibniz) and who invented much of
the Physics that is now regarded as classical optics and classical mechanics. He developed the
first sound theory of gravitation, and formulated classical mechanics in three fundamental laws.
The first of these is a restatement of Galileo’s inertial reference frames, the second is the equation
of motion obeyed by all classical systems, and the third gives meaning to interactions between
different objects in a system and different systems. Newton’s gravitational and mechanical theories
make predictions that agree astonishingly well with observation. Indeed, for several centuries
after this point, the scientific community believed his work to be the language of the universe.
Not until the advent of Relativity and Quantum Mechanics in the early 20-th century was this
to be disputed. Newton introduced to the world a physical theory both rich and accurate, and
naturally this served as a starting point for an explosion of research and discovery. The field of
Classical Mechanics, founded upon Newton’s Three Laws, flourished. Several thinkers - Bernoulli,
d’Alembert, Snell, Lagrange and Hamilton among others - extended Newton’s ideas to different
circumstances and reformulated his laws into mathematical frameworks that gave them greater
versatility and applicability.
Indeed, moving forward to the 20-th century we see that it is Hamilton’s formulation of Classical
Mechanics that inspires the mathematical formulation of Quantum Mechanics by Schrödinger,
Heisenberg, Dirac and others. Moving further forward still we see Richard Feynman, inspired by

the Lagrangian approach to classical mechanics, and using it to develop his elegant Path Integral
approach to Quantum Physics.
And so we return to the present, back past all the welcome discoveries and developments
that we left behind earlier and back to our opening question. How then do we make the next
breakthrough? Perhaps, like Feynman, we might draw some inspiration from the elegant language
of Lagrangian Mechanics. Like so many of the most powerful theories in Mathematics and
Physics, Lagrangian Mechanics has a depth to it that can only be fully appreciated after a long
and thoughtful perusal of its workings. Thus I encourage you to explore this rich field thoroughly,
and to find the hidden connections between this and other branches of mathematics (notably
the calculus of variations) and physics (notably quantum physics). May this be the beginning of
your journey.
Dario Fanucchi

Notes on the topics covered
I provide here some remarks, comments and discussions on the topics covered in the first part
of the course. These notes are not comprehensive and are sometimes a little off-topic, but are
intended to complement and clarify what was covered in the lectures. The level of detail here is
variable. In places there is just a sketch of something covered in more detail elsewhere (like the
lectures), and at times there is much more detail. I’ve also tried to supply pointers herein as to
where one might obtain further enrichment on the topics covered. In the Historical Anecdote,
which begins these notes, I have given a personal motivation for this course and Lagrangian
Mechanics in general. The pragmatist can skip this section and not risk losing any important
knowledge.
It is important to keep in mind that the physics we study in this course is, in effect, Newtonian
Physics. We will rework the familiar F⃗ = m a⃗ into very elegant forms with far reaching
consequences so that it is barely recognisable in the end; but nonetheless we will not escape from
the fundamental shortcomings of Classical Physics. The physics we study breaks down at speeds
close to the speed of light, and at scales where quantum effects are important. Nevertheless,
classical physics is a very good approximation to reality for a very large class of problems in our
world. Moreover, the elegant new language we will develop here to speak about classical physics
is precisely the most natural language in which to speak about all of the more sophisticated
modern theories today. In this sense, we will reveal in this course very deep ideas that run
beneath much of modern physics and mathematics.

Contents

Preface i

Contributors iii

Foreword v

Contents ix

1 Introduction 1
1.1 Development of a Physical Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Frames of Reference and Galilean Symmetries . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Newton’s Laws of Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Classical Mechanics as a State Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Algebra and Geometry 15


2.1 Coordinate Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Euclidean Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Coordinate Systems and Their Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.1 The 2-Dimensional Cartesian Coordinate System . . . . . . . . . . . . . . . . . . . 25
2.3.2 The 3-Dimensional Cartesian Coordinate System . . . . . . . . . . . . . . . . . . . 30
2.3.3 Other Linear Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.4 Curvilinear Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.5 Transformation Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4 Coordinate Curves and Coordinate Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.1 Parametric Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.2 Tangent Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4.3 Cotangent Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.4.4 Tangent and Cotangent Vector Component Relations . . . . . . . . . . . . . . . . 42
2.4.5 The Metric Tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.4.6 Tensor Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.4.7 1-Dimensional Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.4.8 2-Dimensional Plane Polar Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . 49

2.4.9 2-Dimensional Elliptical Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.4.10 3-Dimensional Polar Cylindrical Coordinates . . . . . . . . . . . . . . . . . . . . 54
2.4.11 3-Dimensional Polar Spherical Coordinates . . . . . . . . . . . . . . . . . . . . . . . 57
2.4.12 Other Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.5 Coordinate Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3 Newtonian Mechanics 77
3.1 Transforming Newton’s Second Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2 Generalised Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.3 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.4 Forces of Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.5 Work and Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

4 Lagrangian Formalism 107


4.1 Single Particle Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.1.1 Coordinate Systems and Computations . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.1.2 A Sanity Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.2 Conservative Forces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.2.1 Path Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.2.2 Potential Energy and Kinetic Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.2.3 Conditions for a Force to be Conservative . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.2.4 Computing the Potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.2.5 Known Potential Energies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.3 The Variational Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.3.1 A Variational Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.3.2 Optimization of Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4.4 The Euler-Lagrange Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.4.1 A General Scheme for Conservative Systems . . . . . . . . . . . . . . . . . . . . . . . 132
4.4.2 Including Conservative and Non-Conservative Forces . . . . . . . . . . . . . 137
4.5 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.6 Conjugate and Cyclic Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

5 Multiple Particle Systems 151


5.1 Describing Multiple Particle Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.1.1 Discrete and Continuous Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.1.2 Centre of Mass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.1.3 Kinetic Energy of a System of Particles . . . . . . . . . . . . . . . . . . . . . . 160
5.2 Rigid Body Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

5.2.1 Euler Angles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
5.2.2 Moment of Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
5.2.3 Lagrange’s Equations for Rigid Bodies . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
5.2.4 Inertial Tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.2.5 Principal Axes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

Appendices 177

A General Mathematics 179


A.1 Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
A.2 Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
A.3 Vector Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
A.4 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
A.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

Bibliography 195

Chapter 1

Introduction

We understand our environment by constructing mental models that capture some important
aspect of the environment. A model of a system helps us to understand that system by making
predictions that we can observe and test. In this way, we develop expectations of the behaviour
of the system. A physical theory is an organisation of mental models for a given system. A good
theory should not only describe a given system well, it should also give insight into the behaviour
of another system. Below is a description of the development of a physical theory.

1.1 Development of a Physical Theory


The formal method to define a new theory begins with an informed guess of the rules that govern
the behaviour of a given system. The data that informs this guess comes from the collection of prior
knowledge acquired from observations of other systems and possibly adapting existing models
from other systems. A theory is a philosophy by which we may organise our understanding and
expectations. If a theory gives the incorrect prediction for a physically observed phenomenon
then the theory is wrong and a new guess is needed. The British statistician George Box famously
stated “All models are wrong, but some models are useful,” and we should be cognisant of this
when building our models. The construction of useful models is part of the process of building
better theories. Figure 1.1 gives a graphical representation of this process. This course shall focus
on the initial guess and the intermediate step of computing the consequences of this guess
for the behaviour of mechanical systems.

Guess → Compute Consequences → Test/Check

Figure 1.1: A diagrammatic representation of the development and evaluation of a physical theory
starting with an initial guess, followed by the determination of the logical consequences of the
guess and comparison of the expected consequences with the measurements or observations.

A theory is a conceptual abstraction of the elements needed to describe a system. It is possible
to produce multiple abstractions that describe a single phenomenon. These abstractions can
be conceptually and mathematically distinct. Two theories that imply identical consequences

would be physically indistinguishable, even though the characters of these theories could be very
different.
The value of a theory lies not only in how well it might describe one given system, but in how
it might inform the description of other systems. Two conceptually and mathematically distinct
theories of a single phenomenon motivate distinct patterns of thought and provide different
insights into that phenomenon. Each theory will admit different modifications and motivate
distinct ideas for new theories. Most importantly, different theories admit different ideas of natural
modifications and extensions. An extension of one theory to a broader collection of phenomena
might require less effort than what is required to extend another theory. As such, the value of one
theory might exceed that of another by measure of its usefulness in guessing new, better theories.
The collection of ideas shared by multiple theories can itself provide the inspiration for a
more general theory. It is therefore advantageous to know multiple theories at any instant.
A single theory might possess aspects that we might improve upon. Given two theories, one
might be considered better than the other depending on how one evaluates each with regard to
the following questions:

1. Which theory makes more accurate predictions?

2. Which theory is closer to the truth?

3. Which theory allows more easily computed consequences?

4. Which theory is most easily understood?

5. Which theory is most easily extended?

The relative rankings of two theories with respect to these questions might lead us to judge one
theory as better than the other, but such a ranking should not necessarily lead us to disregard one
theory in favour of another. In general, more powerful formulations should explain more while
using fewer assumptions and admitting fewer exceptions.
Given a collection of facts about some system, how much about that system do we actually
know? Answering this question requires us to understand the distinction between information,
knowledge and the philosophy of the models that we build. Information about a system is
a collection of uncurated facts about that system. Knowledge is the collection of ideas that we
generate from the data by abstracting the patterns in the collection of facts. A philosophy is a
formal arrangement of the collection of ideas about the system and how these ideas are supported
by the facts. In this sense information, knowledge and philosophy are distinct levels of abstraction
of the description of a system and its relation to other systems.
These notes contain a description of the relevant topics necessary to build a new formulation of
Classical Mechanics to compete with the standard Newtonian formulation. The new and standard
formulations will be physically indistinguishable. The purpose of these notes is to make clear
the numerous advantages of this new formulation over the standard formulation of Classical
Mechanics by providing a formulation wherein

1. physical consequences of mechanical configurations are more quickly and easily computed
than in the standard formulation;

2. the behaviours of mechanical systems are more easily understood than in the standard
formulation;

3. extensions are possible that are either more difficult to achieve or not possible in the standard
formulation.

This new formulation is not a replacement of the standard formulation, but one with a number
of conceptual and computational advantages over the standard formulation in certain problems.
The first task in developing this new formulation is to understand the formal structure of the
standard Newtonian Mechanics and build new abstractions from this formal structure.

1.2 Frames of Reference and Galilean Symmetries


The laws of classical physics are a mathematical abstraction of observed facts about classical
physical systems. These laws are invariant under a collection of transformations known collectively
as the Galilean Group,

Translation in space: x⃗ ′ = x⃗ + c⃗.

Translation in time: t ′ = t + k .

Velocity transformations: x⃗ ′ = x⃗ + v⃗ t .

Rotation: x⃗ ′ = R x⃗ , R is an orthogonal matrix with det (R ) = 1.
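As a quick numerical sanity check, consider the velocity transformation in particular. The short sketch below is our own illustration (the trajectory, numbers and function names are not part of the course material); it verifies that a Galilean boost leaves the acceleration of a trajectory, and hence the form of Newton’s second law, unchanged.

```python
# Sketch: a 1-D trajectory under constant acceleration, viewed from the
# lab frame and from a frame boosted by a constant velocity v.  The
# second time-derivative (the acceleration) agrees in both frames.

def x(t):
    # position under constant acceleration -9.8 (units of m and s assumed)
    return 2.0 * t - 4.9 * t * t

def boosted(t, v=3.0):
    # Galilean velocity transformation: x' = x + v t
    return x(t) + v * t

def accel(f, t, h=1e-4):
    # central second-difference approximation to f''(t)
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

a_lab = accel(x, 1.0)
a_boost = accel(boosted, 1.0)
print(round(a_lab, 3), round(a_boost, 3))  # both ≈ -9.8
```

The boost adds a term linear in t, so it cannot change any second derivative; the same cancellation is what makes all inertial frames equivalent.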

These Laws are encapsulated in a set of statements known as Newton’s Laws of Motion. The proof
that these laws of motion are invariant under Galilean Group transformations is left as an exercise
to the reader. We often refer to the invariance of physical laws under these transformations as
the symmetries of classical physics. By translational symmetry, there is no fixed ‘origin’ in the
universe that can be explicitly labelled with the zero vector (in space or space-time). This leads to
the following definition.

Definition 1 (Affine Space) An affine space is a space comprising a set of points where the
difference vector between any two points in an affine space is defined, but the sum of two points is
not.

Remark 1 We think of the classical universe as an Affine Space.
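The distinction in Definition 1 can be made concrete in code. The toy classes below are our own illustration (not part of the notes): subtracting two points yields a displacement vector, and adding a vector to a point yields a point, but the sum of two points is deliberately left undefined.

```python
# Toy model of an affine space: Point and Vec are distinct types.
# Point - Point -> Vec is defined; Point + Point is a TypeError.

class Vec:
    def __init__(self, *c):
        self.c = c

class Point:
    def __init__(self, *c):
        self.c = c
    def __sub__(self, other):
        # difference of two points is a displacement vector
        return Vec(*[a - b for a, b in zip(self.c, other.c)])
    def __add__(self, other):
        # only a vector may be added to a point
        if not isinstance(other, Vec):
            raise TypeError("cannot add two points of an affine space")
        return Point(*[a + b for a, b in zip(self.c, other.c)])

p, q = Point(1.0, 2.0), Point(4.0, 6.0)
d = q - p          # displacement from p to q
print(d.c)         # (3.0, 4.0)
```

Choosing an origin amounts to picking one distinguished Point; every other point is then labelled by its difference vector from that origin, which is exactly the identification with R^N described below.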

Generally, when solving problems we specify an origin by singling out a point in the affine
space. All other points are defined relative to that origin by the difference vectors, and so the
position of a point x in space is given by a sequence of N real valued numbers that label that

point. The collection of all such sequences of N numbers defines an N-dimensional vector space R^N.
The fact that the laws of physics are invariant under velocity transformations leads to the
concept of an inertial reference frame, wherein any two observers moving relative to one another
with a constant velocity will observe the same forces on some system that they are both watching.
This is the substance of the Galilean principle of inertia. An inertial reference frame is one that
is not accelerating. All such reference frames are equivalent by the fourth transformation of the
Galilean Group, and hence the laws of physics will be the same in any inertial reference frame.
A non-inertial reference frame is one that is accelerating. The laws of physics are not the same in
reference frames that are accelerating relative to each other. To account for this in an accelerating
reference frame, ‘fictitious forces’ are introduced into the system as a result of the motion of
the system. An observer in a rotating reference frame observes the deflected motion of objects
moving relative to the observer. This deflected motion is commonly interpreted as an applied force
that causes a deviation in the motion of objects from straight line motion. Example 1.1 gives a
demonstration of this phenomenon.

Example 1.1 (Coriolis Effect) Suppose observers a and b stand at diametrically opposed points
on a rotating disc, see Figure 1.2. Observer a throws a ball to observer b. In the stationary
reference frame, the ball follows a straight line from a to b. In the time it takes the ball to move
from a to b , observer a moves to a ′ and observer b moves to b ′ . Suppose we consider the flight path
of the ball from the perspectives of a and b . If a and b remain at fixed points on the rotating disc,
then a = a ′ and b = b ′ , however, the trajectory followed by the ball now deviates from the original
straight path since a and b move relative to the path of the ball. The path of the ball is deflected
away from b and toward a in the perspective of the observers in the rotating reference frame. This
deflection appears as the consequence of a ‘fictitious force’ known as the Coriolis Effect.


Figure 1.2: The relative motion of a ball thrown from an observer a to an observer b as seen in a
stationary reference frame and a reference frame corotating at a rate ω.
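The deflection in Example 1.1 can also be seen numerically. The sketch below is our own construction (the disc geometry, rotation rate and step count are arbitrary choices, not from the notes): it takes the straight stationary-frame path of the ball and re-expresses it in coordinates corotating with the disc, where the path visibly curves even though no real transverse force acts.

```python
import math

def rotating_frame_path(w=1.0, steps=5):
    """Straight path from a=(-1,0) to b=(1,0), seen in a frame
    rotating at angular rate w (rotate coordinates by -w*t)."""
    pts = []
    for i in range(steps + 1):
        t = i / steps                      # ball crosses the disc in unit time
        x, y = -1.0 + 2.0 * t, 0.0         # straight line in the stationary frame
        c, s = math.cos(-w * t), math.sin(-w * t)
        pts.append((c * x - s * y, s * x + c * y))
    return pts

path = rotating_frame_path()
# mid-flight the y-coordinate is no longer zero: the path is deflected
print(path[2])
```

The deflection grows with the rotation rate w, matching the intuition that the fictitious Coriolis force vanishes in a non-rotating frame.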

The Galilean Group describes the symmetries of Classical Physics. The symmetries of
relativistic physics are described by the Lorentz Group. This accounts for the change in the laws
of physics as the speed of the observer approaches the speed of light.

1.3 Newton’s Laws of Motion


Newtonian Mechanics is a means of reasoning about the motion of objects. To make this clear, it
is necessary to define the following concepts.

Definition 2 (Particle) A particle is an idealised object without physical extent, having, at any
given instant, a well-defined position and momentum. A particle has no dimension.

Definition 3 (State) The state of a collection of particles is the intrinsic information needed to
determine the motion of each particle in the collection.

We shall refer to a collection of particles as a system of particles or a system. In Newtonian
Mechanics, the state of a system is a function of the position and momentum of each particle.

Remark 2 The parameters that label a particle in a system may change in time. As such, a particle
has a state. Additionally, a particle may have a non-zero mass. This mass parameter may vary.
Therefore, it is sometimes useful to define the mass of a particle as part of the state of the particle.
At speeds much less than the speed of light, the momentum of a particle is approximately equal to
the product of its mass and velocity. If the mass is not constant, then it is more accurate to consider
the state of the particle as its position and momentum than its position and velocity. For a many
particle system, we consider the state of the system to be the positions and velocities (or momenta)
of all the particles in the system.

The state of a particle is all the intrinsic information that we need to fully determine its motion.
In Newtonian Mechanics, the state of a particle is characterised by its position and momentum
at a given instant in time. Assuming we know all the (external) actions applied to a particle at
a given instant, as well as the (intrinsic) state of the particle, we can completely determine its
motion. It is a property of nature that position and momentum are the most general state variables
needed to characterise the state of a system. This is true even in Relativistic extensions to Classical
Mechanics, where the mass of a particle is not constant in time.

Remark 3 It is possible to imagine a world in which something else (say acceleration) was also
important for state, but it is not so in reality. In Aristotelian Mechanics, only position was thought
to be necessary to characterise the state of a particle.

The mathematical equations of motion are intricately connected to the definition of state.
In Newtonian Mechanics, the equations of motion are second order differential equations in
position. Hence they can be used to solve for the motion of a particle if the initial position and

velocity of the particle are known. In an alternate universe in which the equations of motion were
third-order, the initial acceleration would also be required, and thus acceleration would also be
part of the state of a particle. In Aristotle’s universe, the equations of motion are first order, and
hence the state is determined by the position alone.
The transitions between states in Newtonian Mechanics are governed by a set of rules:

NI: A particle will continue to move in a ‘straight line’ with constant velocity unless acted upon
by a force.

NII: When a resultant force is applied to a particle, the rate of change of the momentum of the
particle is equal to the magnitude of the resultant force, and acts in the direction of that
force.

NIII: For every force or action there is an equal but opposite force or reaction.

These rules are commonly known as Newton’s Laws of Motion.


NI is a statement about how a system maintains its state. It suggests that the ‘state of motion’
of the particle remains constant unless some force acts on it. We can always pick an inertial
reference frame in which such an object is stationary. The notion of state and the notion of an
inertial reference frame are both implied by NI (Galileo’s Law of Inertia).
NII is a statement about the resistance to change in state of a system when it is acted upon by
an agent of change. It suggests that an agent F⃗ that acts to change the state of motion of a particle
effects a change in the momentum p⃗ as

F⃗ = d p⃗ /d t .

A particle tends to resist such a change more if it is heavier and less if it is lighter. When the mass
of the particle is constant, we get the more familiar result,

F⃗ = m a⃗

which implies that the acceleration is proportional to the applied force, and inversely proportional
to the mass of the particle.
NIII is a statement about the consequence of changing the state of a system. Of these rules,
NIII provides a rule that governs the interactions in a system. It gives a rule for dealing with forces
exerted by particles on each other. This is often explained by thinking of pushing a car down the
road. The force we exert on the car to make it move is the same as the resisting force exerted by
the car on us, making it difficult to push.
More information on Newton’s Laws, particularly with regard to problem solving, can be found
in a Physics text like [1]. The remainder of this course shall focus mainly on NII and formulating
equations for the motion of systems that are compatible with NII.

1.4 Classical Mechanics as a State Machine
The state description of classical mechanics describes the mechanism by which a system which
occupies one well defined state at one instant will, under the action of a given rule, transition
to a new well defined state. There may be many such states and transition rules that govern
the state of a system. We consider in detail the consequences of this formal construction of
Classical Mechanics as a mathematical machine that manages system states. Before continuing,
let’s introduce some definitions.

Definition 4 (State Machine) A state machine is a collection of ‘states’ and ‘rules’ for moving
between these states. A finite state machine is a state machine with a finite number of states. An
infinite state machine is a state machine with an infinite number of states. The ‘state’ of a state
machine is exactly the information required to determine what will happen next.

Figure 1.3: A finite state machine with 6 states. The states are drawn as vertices (dots), and the
rules for transition between the states are drawn as directed edges (arrows) between them. If the
system begins in state 1, it will transition to state 2 at the first time step. At the second time step it
will transition to state 3, and eventually to state 4, after which it will return to state 1 and repeat
this process. If the system begins in state 5, it will transition to state 3 in the first time step, and to
state 4 in the second time step, and so on. If the system begins in state 6, it will remain in state 6
at each subsequent time step.

Definition 5 (Deterministic State Machine) A state machine is called deterministic if there is at
most one possible transition from every state. Given the current state of a deterministic system, it is
possible to completely determine what all future states of that system will be.

Definition 6 (Reversible State Machine) A state machine is called reversible if there is at most
one possible transition into every state. Given the current state of a reversible system, it is possible to
work backwards and completely determine what all past states of that system were.

Remark 4 The labels deterministic and reversible are used to describe state machines.

Figure 1.3 depicts a deterministic but not reversible state machine; if the system is in state 3
we cannot tell whether it came from state 2 or state 5. Figure 1.4 demonstrates the key difference

Non-Deterministic Non-Reversible
Figure 1.4: The non-deterministic state machine has at least one state with more than one
transition rule from that state, while a non-reversible state machine has at least one state with
more than one transition to that state.

between non-deterministic and non-reversible state machines. It is possible for a state machine
to be both deterministic and reversible, deterministic but not reversible, reversible but not
deterministic or neither deterministic nor reversible.
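These definitions can be checked mechanically. The following Python sketch, an illustration rather than part of the notes' formal development, encodes a state machine as a dictionary from each state to its list of possible successors and classifies the machine of Figure 1.3 by counting outgoing and incoming transitions; the function names are our own.

```python
# A sketch: a state machine as a dictionary mapping each state to the list
# of states it can transition to.

def is_deterministic(transitions):
    """Deterministic: at most one possible transition out of every state."""
    return all(len(targets) <= 1 for targets in transitions.values())

def is_reversible(transitions):
    """Reversible: at most one possible transition into every state."""
    incoming = {}
    for source, targets in transitions.items():
        for target in targets:
            incoming[target] = incoming.get(target, 0) + 1
    return all(count <= 1 for count in incoming.values())

# The machine of Figure 1.3: 1 -> 2 -> 3 -> 4 -> 1, with 5 -> 3 and 6 -> 6.
figure_1_3 = {1: [2], 2: [3], 3: [4], 4: [1], 5: [3], 6: [6]}

print(is_deterministic(figure_1_3))  # True: one edge leaves each state
print(is_reversible(figure_1_3))     # False: state 3 is entered from both 2 and 5
```

The second check makes the failure of reversibility in Figure 1.3 explicit: state 3 has two incoming transitions, so the past cannot be reconstructed from the present state.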

Definition 7 (Phase Space) The set of all states in a state machine is called the phase space of the
state machine. This terminology is most often used for infinite state machines.

A discrete infinite state machine is an infinite collection of states and a set of transition rules
for stepping between these states at each time step. A continuous infinite state machine is an
infinite collection of states and a collection of rules for determining the state at any real value of
time t > 0 from a given initial state. Classical Mechanics is a continuous infinite state machine
that is both deterministic and reversible. The state of a particle is its position and momentum;
and the transition rule is Newton’s second law of motion (NII).
The phase space of a particle is defined on its paired position and momentum. It can be useful
to consider the path a particle traverses through its phase space as a function of time. The state of
a system of particles is the collection of position and momentum pairs of all the particles in the
system. The entire (classical) universe can be thought of as having some state at a given point
in time – the collection of the positions and momenta of every particle in the universe. In our
classical picture of the universe, time is thought of as a continuous parameter, and we think of
the universe as a succession of ‘states’ for different values of this parameter. The laws of physics
tell us how to move up and down in this succession. Figure 1.5 demonstrates the phase-space for
a damped oscillator that includes the path of the oscillator through this space.

Remark 5 In general, the state pair s = ( x⃗ , p⃗ ) of a system keeps track of the position and the
momentum. It is often convenient to represent the phase-space of a particle of fixed non-zero mass
as the collection of all pairs s̃ = ( x⃗ , x⃗˙ ) and keeping track of only the positions and velocities. This is
valid when the particle has fixed mass and moves at low (non-relativistic) speeds. This is depicted
in Figure 1.5.


Figure 1.5: The phase-space of a damped oscillator of fixed mass contains the collection of all state
pairs s (t ) = (x (t ), ẋ (t )), where x (t ) and ẋ (t ) are the position of the oscillator and the velocity of
the oscillator at a time t , respectively. Grey curves denote the collection of possible paths through
phase-space that an oscillator might follow. The red curve corresponds to the ordered sequence
of states traversed by a specific oscillator with a state s (0) = (x (0), ẋ (0)) at time t = 0. This oscillator
traverses a time ordered sequence of states on the time interval t ∈ (0, ∞). This sequence of states
terminates at the critical point pc at a time t → ∞.

Note that Newton’s Second Law, for a particle with constant mass m , is a statement of the
change of the state of a system in the form of a differential equation,

F⃗ = d p⃗ /d t = m d v⃗ /d t = m d² x⃗ /d t ²

which is a first order differential equation in velocity, but a second order differential equation in
position. This means that to solve this differential equation we require two pieces of information,
usually expressed as an initial pair ( x⃗ (0), v⃗(0)) that serves as the initial conditions that specify
the solution to the differential equation in position and velocity. An infinitesimal change d v⃗ is
incurred by the application of a force F⃗ for the duration δt . From first principles,

d v⃗ /d t = lim_{δt → 0} [ v⃗(t + δt ) − v⃗(t ) ] / δt .

Now consider the discrete change in the velocity over the corresponding discrete change in time
δt ,

δ v⃗ = [ v⃗(t + δt ) − v⃗(t ) ] / δt .
Rearranging the terms in this equation leads to the statement

v⃗(t + δt ) = v⃗(t ) + δt δ v⃗(t ) = v⃗(t ) + δt F⃗ (t )/m

or in integral form

v⃗(t + δt ) = v⃗(t ) + ∫_t^{t +δt } F⃗ (t )/m d t

where the velocity at time t + δt is now a function of the velocity at time t and the change in
velocity incurred over the duration δt . Similarly, for position,

x⃗ (t + δt ) = x⃗ (t ) + δt v⃗(t )

or in integral form

x⃗ (t + δt ) = x⃗ (t ) + ∫_t^{t +δt } v⃗(t ) d t .

Notice that each of these is a linear function of δt , so their action is unique. We require only
information about the system at time t to determine the position and velocity of the system at
t + δt . The functions
( x⃗ (t ), v⃗(t )) → ( x⃗ (t + δt ), v⃗(t + δt ))

are the transition rules between states at successive times t and t + δt , and the state after the
application of the rule is unique given a particular initial state. If the applied force F⃗ (t ) is known
at each instant t , then for each state ( x⃗ (t ), v⃗(t )) the update rules are completely determined at
each instant. Under these conditions, Newton’s Second Law gives rise to a state machine that is
deterministic. Under a similar set of conditions, it can also be shown that Newton’s Second Law
gives rise to a state machine that is also reversible, which is left as an exercise to the reader.
The validity of these update rules can be understood in terms of the infinitesimal nature of the
update procedure of both the position x⃗ (t ) and velocity v⃗(t ) vectors. When the time step δt is
infinitesimal and the force F⃗ (t ) is continuous, the update procedure listed above provides the first
order correction to the Taylor Series Expansion of the pair ( x⃗ (t ), v⃗(t )). At first order, this update
rule gives an approximation to the state at t + δt and accounting for all corrections beyond first
order is given by a sum over all orders in the infinitesimal corrections to these functions. This
gives rise to the integral form of the update functions, where the integrals are taken with respect
to the infinitesimal d t . Additional information on and discussion of Taylor Series Expansions is
listed in Section A.2.
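The update rules above amount to a first-order (Euler) integration scheme, which can be sketched directly in Python. The damped-spring force law and all numerical values below are illustrative assumptions, not quantities fixed by the text.

```python
# A minimal numerical sketch of the update rules above: first-order (Euler)
# integration of Newton's second law for a particle of constant mass.

def step(x, v, t, dt, force, m=1.0):
    """Advance the state (x, v) from time t to t + dt."""
    x_new = x + dt * v                    # x(t + dt) = x(t) + dt * v(t)
    v_new = v + dt * force(x, v, t) / m   # v(t + dt) = v(t) + dt * F(t) / m
    return x_new, v_new

def damped_spring(x, v, t, k=1.0, c=0.2):
    """An assumed force law: a spring with a small damping term."""
    return -k * x - c * v

# Trace the phase-space path s(t) = (x(t), v(t)) from an initial state.
x, v, t, dt = 1.0, 0.0, 0.0, 1e-3
for _ in range(50000):
    x, v = step(x, v, t, dt, damped_spring)
    t += dt

# The damped oscillator spirals toward the critical point (0, 0) in phase space.
print(abs(x) < 0.1 and abs(v) < 0.1)  # True
```

For a finite time-step the rule only approximates the true transition between states, as discussed above; shrinking δt recovers the exact first-order behaviour in the limit.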

1.5 Exercises
Exercise 1.1 Show that Newton’s Second Law of Motion is invariant under Galilean
transformations, and a constant force.

Exercise 1.2 Classify the state machine associated with each of the following graphical
representations:

[Six diagrams, not reproduced here: representations 1–3 are directed graphs on the states
1, 2, 3, 4, 5 and 6, and representations 4–6 are phase portraits in the (x , ẋ ) plane.]

Exercise 1.3 Consider each of the following systems as a state machine and give a graphical
representation of each, including all necessary labels. In each case, classify the state machine as
either finite or infinite, and state whether it is deterministic and/or reversible.

1. A traffic signal on a South African road. (Hint: What is the sequence of the lights?).

2. A lamp switch. (Hint: What are the possible settings for a lamp switch?)

3. An oven thermostat. (Hint: How is an oven thermostat different from a lamp switch?)

4. A cart rolling along a rail and slowing to a stop, in 1 dimension. (Hint: What does the position
of the cart do relative to its speed?)

5. A paperclip sliding across a smooth table top, at constant speed, in 2 dimensions. (Hint: What
does constant motion look like in phase space?)

6. A mass oscillating up and down at the end of a spring that hangs from a fixed mount point,
in 1 dimension. (Hint: What does oscillatory motion look like in phase space?)

7. An asteroid in a decaying orbit about the sun, in 3 dimensions (Hint: asteroids orbit in a
2-dimensional plane.)

Exercise 1.4 Draw a deterministic state machine and a non-deterministic state machine, with at
least five states.

Exercise 1.5 Draw a reversible state machine and an irreversible state machine, with at least five
states.

Exercise 1.6 The following Mathematica code generates a graphical representation of the behaviour
of a Cellular Automaton associated with Rule 30.

ArrayPlot[CellularAutomaton[30, {{1}, 0}, 100]]

The state update rule is visualized using

RulePlot[CellularAutomaton[30]]

Explain how this update rule, in conjunction with a sequence of filled and blank cells, defines an
infinite state machine. Then explain how altering the update rule corresponds to a different state
machine.
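For readers without access to Mathematica, a Python analogue of the same automaton may help. This sketch is our own illustration, not the code the exercise refers to: each cell is updated from its three-cell neighbourhood, and periodic boundary conditions are used for simplicity, whereas Mathematica pads the row with a blank background.

```python
# Rule 30: the new value of a cell is the bit of the number 30 selected by
# the three-cell neighbourhood (left, centre, right) read as a binary value.

RULE = 30

def step(cells):
    """One time step of the automaton on a tuple of 0/1 cells."""
    n = len(cells)
    out = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighbourhood = (left << 2) | (centre << 1) | right  # a value 0..7
        out.append((RULE >> neighbourhood) & 1)              # that bit of 30
    return tuple(out)

# A single filled cell on a blank background, as in CellularAutomaton[30, {{1}, 0}].
row = tuple(1 if i == 10 else 0 for i in range(21))
for _ in range(5):
    print("".join("#" if c else "." for c in row))
    row = step(row)
```

Here each row of cells is a state and `step` is the transition rule, so the automaton is exactly a deterministic state machine; replacing 30 by another rule number changes the transition rule and hence the machine.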

Exercise 1.7 Show that Classical Mechanics is deterministic, then determine the conditions under
which Classical Mechanics is reversible.

Exercise 1.8 Consider the statement of Newton’s Second Law, applied to a point particle moving
along the x -axis. Construct phase portraits for the system subject to each of the following forces

1. F⃗ = −4 x̂

2. F⃗ = −5x x̂

3. F⃗ = −5 ẋ x̂

In each case, proceed as follows:

1. Construct the state update rules as a pair of update functions.

2. Construct a table containing three columns to contain the time t , position x and velocity ẋ
data.

3. Set each entry in the table to zero.

4. Initialize the table so that the first row contains the initial data t 0 = 0, x (t 0 ) = x0 and ẋ (t 0 ) = v0

5. Apply the update rules iteratively for a given fixed time-step δt to update each row of the
table.

6. Generate a phase portrait of the system by plotting the corresponding table data.

Use a time-step δt = 10−6 , m = 1 and initial conditions x (0) = 2 and ẋ (0) = 3. How do the phase
portraits change when the time-step is changed to δt = 10−1 ? Explain why changing the time-step
changes the phase portraits.
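The tabulation procedure in the steps above can be sketched in Python. This is an outline of the method rather than a full solution: the force F⃗ = −5x x̂ from case 2 is used, and the time-step is deliberately coarser than the one specified so that the example runs quickly.

```python
# A sketch of the tabulation procedure: build a table of (t, x, xdot) rows by
# iterating the state update rules, then read off the phase portrait data.

dt, m = 1e-3, 1.0
steps = 1000

# Steps 2-4: a table of (t, x, xdot) rows, initialised with the given data.
table = [(0.0, 2.0, 3.0)]

# Step 5: apply the update rules iteratively to fill in the table.
for _ in range(steps):
    t, x, xdot = table[-1]
    force = -5.0 * x
    table.append((t + dt, x + dt * xdot, xdot + dt * force / m))

# Step 6: the (x, xdot) columns are the phase portrait data; with a plotting
# library one would now plot x against xdot.
t, x, xdot = table[-1]
print(round(t, 3))  # 1.0
```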

Chapter 2

Algebra and Geometry

Geometry and algebra form the basis of the quantitative study of the world. Geometry defines the
spatial relation among points and curves. Algebra is the formal set of rules by which computations
involving geometry are made. In this chapter we shall discuss the geometric and algebraic basis
for the description of mechanical systems. We shall begin with the coordinate free description of
length in a geometric context, then coordinate systems as a method for describing the relative
positions of points in space. We then discuss the algebraic formulation of length measure given a
coordinate system and promote the algebra to a general setting.

2.1 Coordinate Grids


Our ultimate goal in Mechanics is to understand the mechanism by which systems evolve. This
immediately implies that we should have some good description of the system in terms of its
location in space and time. The objective of this section is to get some understanding of the
conditions under which we can give a good description of a system that we want to study.
René Descartes was the first to introduce the concept of coordinate systems to geometry. He
assigned a pair of real numbers to each point in the plane - an (x , y ) coordinate pair. The plane
thus parameterised is known as the Cartesian plane. Here we shall study several examples of
coordinates on the Cartesian plane and other spaces.
In the simplest case, we can study the motion of an object that moves along a straight line. As a
precursor to a more general discussion to follow, we study the motion of a small bead that moves
along a straight wire with infinite length. In this case, we might ask where the bead is located at
some point in time, and then again at another point in time. If we are to make intuitive sense
of how the bead moves, we should have a way to denote the position of the bead along the wire.
The most natural description of the position of the bead in its motion is the distance of the bead
from some predefined point. We shall call this reference point the origin or datum point. With
this choice of origin, we can define the position of the bead a using a single (signed) real valued
number. We write a ∈ R where a positive value of a denotes a position to the right-hand side of
the origin, and a negative value of a denotes a position to the left-hand side of the origin.

Since the real numbers R have a natural ordering, it is possible to decide if the bead has moved
toward the left or toward the right of its previous position by simply comparing the current a f
value of the bead position with a previously measured value of a i . We compute the displacement
vector
a⃗ = a f − a i

defined as the directed line segment whose initial point and final point are a i and a f , respectively.
When this directed line segment is placed with its initial point at a i , the final point will correspond
to a f . In this way, we can locate the final position of the bead a f using only its initial data a i and
the displacement vector a⃗.
The set of all displacement vectors a⃗ and b⃗ along the wire defines a vector space V where

αa⃗ + β b⃗ ∈ V

when α, β ∈ R. This means that there exists a vector c⃗ ∈ V such that

c⃗ = αa⃗ + β b⃗ .

Note that the definition of each of these displacement vectors and, indeed, the vector space V is
defined in terms of the differences between the points in R corresponding to points along the
wire. The space R is an example of an affine space.
The definition of an affine space uses the idea of differences between points rather than their
sum. This makes intuitive sense when considering displacement vectors, but the following
question now arises: ‘Why should we need a definition of this form if we already know how to add
real numbers?’ The answer is quite simple - for any pair of numbers a and b , we can compute
their sum c = a + b and claim that the sum of the two points is now well defined. However,
we could equally compute the difference a = c − b or b = c − a . This demonstrates that the
definition is, at least, compatible with the summation of real numbers in R. However, an affine
space gives us something more general, which is still valid when the above argument is not. For
example, we can consider a displacement vector, connecting two points in the plane. There exists
no composition rule to combine two points in the plane to get a third point, however there does
exist a composition rule to add two displacement vectors in the plane to get a third displacement
vector. The interpretation of the third displacement vector is unchanged from that in the one
dimensional case, where the vector defines a displacement between points in the plane. Therefore,
once the collection of all displacement vectors is defined, the collection of points needed to define
them is no longer needed. This is not the case for the space of points themselves. A byproduct of
this definition is that there is also no preferred choice of origin. Can you explain why this might
be the case?

Remark 6 Physical space, the space where we live, is an affine space.

At this point we should note that we have not specified how the position a has been assigned.
To do this, we should introduce a ruler, with a regular increment, to demarcate the position of the

bead. A ruler with a regular increment has the benefit that increasing the geometric distance of
the bead from the origin by some fixed factor is associated with a re-scaling of the numerical value
of a by the same factor. Additionally, we might choose to transfer the distance information on
the ruler, that we will call the metric, onto the wire where the bead moves. Transferring the metric
information from the ruler to the wire corresponds to placing a unique numerical value to each
point on the wire, while maintaining the ordering of the numbers on the ruler, and hence allows us
to read off the position information of the bead using only the numbers associated with the points on
the wire, without the need for the ruler. Additionally, the distance between any two bead positions
is simply the magnitude of the displacement vector connecting the bead positions. We call this
process of assigning a unique numerical value to each point on the wire coordinatisation. At this
point we have associated the space of positions along the wire with the space of real numbers R.
It should be clear that choosing coordinates is not a unique process. As an example, we could
swap out the original ruler that has some predefined unit, perhaps millimeters, with another ruler
with a different unit, say inches. Since we know that one inch corresponds to 25.4 millimeters, we
can use a conversion factor 25.4 to translate between the different distance measurement scales
on the millimeter ruler and the inch ruler. This conversion factor is the factor associated with
changing the metric from millimeters to inches. If the bead is at a position a measured
using the inch ruler, then it will have a position

a ′ = 25.4a

as measured on the millimeter ruler. For each different ruler, there will be a new conversion
factor. We can think of the exchange of rulers as a transformation f on the metric information
that we place on the wire. Later we shall see that there are choices of coordinatisation where the
conversion factor changes depending on where the coordinates are studied. This rescaling gives
a coordinate transformation

f : R → R where a ↦ f (a )

subject to the following restriction

a < b ⇐⇒ f (a ) < f (b ) for all a , b ∈ R.

This means that if a point a is to the left of a point b on the wire before the change of coordinate
is applied, then the transformed coordinate f (a ) is also to the left of the point f (b ) under the new
coordinate system. When f is also linear, that is, for any scalar c

f (c a ) = c f (a )

then f is an example of an affine transformation.

Definition 8 (Affine Transformation) Suppose X is an affine space with a , b ∈ X and let f be a
function

f : X → X , b − a ↦ f (b ) − f (a ) ,

then f is an affine transformation.

Remark 7 An affine transformation describes any function that preserves lines, parallelism and
relative scales but not distance information or angles.

The exchange of the millimeter ruler for an inch ruler in the description of the position of the
bead on the wire is an example of an affine transformation on the coordinate grid on the wire that
allows us to describe the position of the bead. We shall encounter many such transformations
and it is important to understand the connection between different coordinate descriptions of
physical systems and effects of changing coordinate systems when describing certain physical
quantities. Before continuing with coordinate transformations, let’s consider some interesting
examples of coordinate systems on some well known spaces. Example 2.1 continues the
discussion of placing coordinates onto a 1-dimensional space.

Example 2.1 (Coordinate grid on a Circle) The circle S 1 is a 1-dimensional space. This means
that every point on the circle is described by a single number. Since there is no preferred point of
origin in S 1 , we can choose an origin denoted 0 and proceed to assign coordinates to the rest of the
space, see Figure 2.1. There are many ways to coordinatize S 1 , but we shall consider these technical
details later in this text.
The simplest way to assign coordinates on S 1 is to associate points on the circumference with
the distance along the circumference from the chosen origin using a mapping function f . Suppose
that S 1 has a circumference of 2π. We can map a finite interval corresponding to the semi-open
subset I = [0, 2π) ⊂ R to S 1 , see Figure 2.1. By this mapping

f : R → S 1 where f (x ) = x mod 2π and x ∼ x + 2π

and the length of the interval I matches the circumference of the circle. The equivalence relation
x ∼ x + 2π (induced by f ) means that the points x and x + 2π are mapped to the same point on S 1 . This means
that f is a many-to-one mapping and that f describes a valid one-to-one mapping only on the
restricted subset [0, 2π) ⊂ R.
After the interval [0, 2π) is mapped to the circle, f maps [2π, 4π) onto S 1 and so on for each
interval [2πn , 2π(n + 1)) and for each positive n ∈ Z. The negative real numbers are similarly
mapped to the circle under the generalization to n ∈ Z. Clearly, f is a many-to-one mapping, and
f maps the entire space R onto S 1 by winding the real line multiple times over the circle.
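The many-to-one character of f is easy to exhibit numerically. The following Python sketch, an illustration using the language's mod operator, shows that points separated by whole multiples of 2π share an image on S 1.

```python
import math

# The wrapping map f(x) = x mod 2*pi of Example 2.1: many-to-one on R,
# one-to-one when restricted to [0, 2*pi).

def f(x):
    return x % (2 * math.pi)

a = 0.7
print(math.isclose(f(a), f(a + 2 * math.pi)))  # True: the same point on S^1
print(math.isclose(f(a), f(a - 6 * math.pi)))  # True: winding three turns back
```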

Example 2.8 provides an alternative mapping R → S 1 that, unlike that in Example 2.1 which
maps subsets of R onto S 1 , maps each point in R to a unique point in S 1 . Example 2.2 extends
the ideas from coordinatizing the 1-dimensional line to the 2-dimensional plane. This is done by
constructing a regular coordinate grid containing multiple copies of R.

Figure 2.1: A mapping of the semi-open interval [0, 2π) to S 1 . Starting at x = 0 the positive real
line is wrapped around the circumference of the circle, such that the point x = 2π coincides with
the image of x = 0.

Example 2.2 (Coordinate grid on the Cartesian Plane) The 2-dimensional Cartesian plane can
be covered with a 2-dimensional grid that assigns a pair of numbers to each point on the plane.
The assignment of points follows from the placement of a single copy of the real line R on the plane.
Then, at each point x along this line, a second copy of R is placed such that each point y on
the subsequent copies is aligned to form a grid. This grid is formed by the Cartesian product of the
two copies of R to form a new space R × R = R2 . Each point on the plane corresponds to a unique
point (x , y ) ∈ R2 , an ordered sequence where the first entry, x , is
taken from the first copy of R and the second entry, y , is taken from the second copy
of R in the product R × R. Figure 2.2 shows the grid formed by the multiple copies of R that covers the
2-dimensional plane.

Figure 2.2: Coordinate grid on the Cartesian plane built from copies of the real line R to form
the 2-dimensional product space R2 that covers the entirety of the 2-dimensional plane. Each
horizontal line (blue) corresponds to one copy of R and each vertical line (red) corresponds to another
copy of R, and the arrows at the end of each line indicate that the lines representing each copy
extend to an infinite distance in each direction.

An arbitrary point in the plane corresponds to a point on the grid with associated coordinate
pair (x , y ). Since each copy of the real line in the pair assigns a unique point in R2 , each pair (x , y )
uniquely defines a point on the Cartesian plane. This one-to-one mapping between points in the
plane and points in R2 ensures that this coordinatization is valid everywhere on the plane.

There is no unique process to assign coordinates to a space. An example of this fact is
demonstrated in Example 2.3, where a non-uniform coordinate grid is assigned to the two
dimensional plane.

Example 2.3 (Riemann Normal Coordinates on the Cartesian Plane) An alternative coordinate
system on the 2-dimensional Cartesian plane can be constructed by choosing an origin on the plane
and then laying out a copy of the positive real line. This single copy R defines a 1-dimensional
subspace of the plane. Then, using a protractor, we can lay out additional copies of R that
intersect at a given point, each having an infinitesimal angular separation from the last, forming a
radial grid, see Figure 2.3. This coordinate system is called the Riemann Normal Coordinates (RNC)
and corresponds to the familiar plane polar coordinates on R2 .

Figure 2.3: Riemann normal coordinates correspond to the standard plane polar coordinates in
two dimensions. Radial lines (blue) emanating from the origin denote distance from the origin,
while the angular displacement between adjacent radial lines correspond to a given angle of
rotation about the origin. Circular (red) lines correspond to the loci of points at fixed radial
distance from the origin.

Notice that the RNC fail to be one-to-one at the origin, since this point has a unique radial
coordinate r = 0, but non-unique angular position.

The Riemann Normal Coordinates are a good starting point to assign coordinates to a patch of
a given space. In general, we do not expect any single coordinate system to cover the whole
of the space we want to study, but in the case of the 2-dimensional plane and sphere S 2 , we can

use these coordinates to cover the whole space, see Example 2.4. In this case, we see that RNC
assign coordinates to the whole of S 2 , but this coordinate system is not one-to-one at two points.

Example 2.4 (Coordinate grid on the Sphere) A non-trivial example of Riemann normal
coordinates from Example 2.3 follows by considering the case of the sphere S 2 . Choose as the origin
the north pole of the sphere S 2 , and lay out an instance of the Riemann Normal coordinates. This
corresponds, under an appropriate rescaling, to the standard spherical polar coordinates on the
sphere, with fixed radial position.

Figure 2.4: Polar coordinate grid on a sphere S 2 . Radial lines (blue) emanating from the origin
denote distance from the origin, while the angular displacement between adjacent radial lines
correspond to a given angle of rotation about the origin. Circular (red) lines correspond to the
loci of points at fixed radial distance from the origin.

Notice that in the case of S 2 , the antipodal point of the origin, corresponding to the south pole,
is at a fixed “radial” distance from the origin. Beyond this antipodal point, the radial lines wrap
back along S 2 . This means that the Riemann normal coordinates provide a one-to-one mapping
between a finite subset of R2 onto S 2 . There are two points on S 2 where the coordinate mapping
breaks down: the north pole (with many possible angular displacement assignments) and the south
pole (with many possible radial distance assignments). More generally, RNC can be extended to
more than two dimensions.
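The breakdown at the antipode can be made concrete with a short Python sketch; the unit-radius parameterisation below is an illustrative assumption consistent with standard spherical polar coordinates.

```python
import math

# Riemann normal coordinates on the unit sphere: a point at geodesic distance
# r from the north pole, heading in the direction theta, sits at these
# Cartesian coordinates in three dimensions.

def rnc_to_xyz(r, theta):
    return (math.sin(r) * math.cos(theta),
            math.sin(r) * math.sin(theta),
            math.cos(r))

# At the antipode r = pi, every direction theta lands on the same point, the
# south pole (0, 0, -1): the coordinate map fails to be one-to-one there.
p1 = rnc_to_xyz(math.pi, 0.0)
p2 = rnc_to_xyz(math.pi, 2.0)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(p1, p2)))  # True
```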

The torus is another example of a 2-dimensional space with interesting geometric and
topological properties that is easily coordinatized using a rectangular coordinate grid. This is
demonstrated in Example 2.5.

Example 2.5 (Coordinate grid on the Torus) The 2-dimensional surface of the torus T 2
corresponds to a product of copies of the circle, attached in a specific way so that the joined copies
form a surface with a handle. If we zoom into any single point on T 2 , we find a local
description that suggests that the space is formed by a product of two copies of the circle, that is

T 2 ≃ S 1 × S 1 . However, this is not a good description of the whole torus since there is a special
arrangement of circles to form this surface. Figure 2.5 gives one choice of coordinates on the surface
of the torus.

Figure 2.5: Coordinate grid on a torus. The red circle with circumference cred and blue circle
with circumference cblue define the size of the torus. “Cutting” the torus along these two lines
transforms the 2-dimensional surface of the torus in three dimensions into a flat, 2-dimensional
rectangle corresponding to a subset of the 2-dimensional Cartesian plane.

We can add parameters to the surface of the torus by “cutting” the surface along the red line and
straightening the resulting shape to form a cylinder, and then cutting along the blue line and then
flattening the torus into a rectangle. We can then assign edge lengths cred and cblue , corresponding
to the circumference of the red and blue circles drawn on T 2 . We can identify the red edges of
this rectangle with a pair of adjacent vertical lines in Figure 2.2 and, similarly, identify the blue
edges of this rectangle with a pair of adjacent horizontal lines in Figure 2.2. The surface T 2 now
corresponds to a rectangular piece of the Cartesian plane when the red edges of length cred are
identified (this means that the red line on the left-hand side of the rectangle is the red line on the
right-hand side of the rectangle) and, similarly, the blue edges are identified. Clearly, there is more
than one choice of rectangle in Figure 2.2 that can be matched, so there is a many-to-one matching
between R2 and T 2 . Then, up to a constant factor, each cred × cblue rectangular block on the Cartesian
plane is a good coordinate patch for T 2 .
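The many-to-one matching can be made concrete with a short numerical sketch (the function name and circumference values below are illustrative, not from the text): reducing each Cartesian coordinate modulo the corresponding circumference selects a representative in a single cred × cblue patch.

```python
def torus_patch(x, y, c_red=2.0, c_blue=1.0):
    """Map a point of the Cartesian plane onto one c_red-by-c_blue
    fundamental rectangle coordinatizing the torus T^2.  Points that
    differ by whole multiples of either circumference are identified,
    so the mapping from R^2 to T^2 is many-to-one."""
    return (x % c_red, y % c_blue)

# Going three times around the red circle and five times (backwards)
# around the blue circle returns to the same point on the torus.
p = torus_patch(0.3, 0.4)
q = torus_patch(0.3 + 3 * 2.0, 0.4 - 5 * 1.0)
```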

The length along a path that coincides with one of the coordinate curves on the space is easily
determined by simply computing the difference between the final and initial position along that
path. This is identical to the process by which we measure the length of an object using a ruler,
since the construction of the coordinate grid transfers the information from the ruler onto the
space we wish to study. However, if the coordinate curve does not have a constant sized increment,
or if the path deviates from the coordinate curve, then the path length is not determined by this
simple procedure and a new method of computing lengths is needed. We shall discuss this method
in the next section.

2.2 Euclidean Geometry
The solution to the problem of measuring the distance between marked points on a triangle
drawn on a flat plane was known in the ancient world to the Greeks, Egyptians, Mesopotamians
and Babylonians. The problem is described as follows. Given a right-angled triangle with given
length and height, what is the extent of the diagonal edge? There exist many constructions for
this length. An intuitive construction of this length relies on only the relation between similar
triangles in R2 .
Consider the triangle in Figure 2.6. Label the vertices of a triangle; an edge joining vertices in
a triangle is a directed line element that is labelled by the vertices it intersects. In general, we may
refer to a given edge A⃗B with length AB . We can prove by construction the similarity of these
triangles as follows. Consider triangles △AB C and △AB D ; then

C ÂB = D ÂB [common angle]
A B̂ C = A D̂ B [90◦ ]
A Ĉ B = A B̂ D [sum of internal angles in a triangle].

By equality of internal angles, we conclude that triangles △AB C and △AD B are similar and write
△AB C ∼ △AD B . Similar arguments supply △AB C ∼ △D C B and △AB D ∼ △D C B . The size of
an angle is a measure of displacement of a point in space relative to some given reference point
from a third point. This displacement defines the ratios of the edge length between these points.
The similarity of triangles in the collection {△AB C , △AB D , △D C B } implies a corresponding set
of relations among ratios of edge lengths from one triangle with each of the other triangles in the
collection. So,

AB /AC = AD /AB and D C /B C = B C /AC

and

AB ² = AD · AC and B C ² = D C · AC

and the sum of these becomes

AB ² + B C ² = AD · AC + D C · AC = (AD + D C ) · AC

where AD + D C = AC , from which we determine

AB ² + B C ² = AC ². (2.1)

Equation (2.1) is the celebrated Pythagorean Theorem in Euclidean geometry.

Remark 8 (Pythagorean Triple) An integral solution to (2.1) is called a Pythagorean Triple.
Pythagorean triples are generated by the relation

(n ² − m ²)² + (2n m )² = (n ² + m ²)² (2.2)

for n , m ∈ Z. Note that this formulation follows directly from the factorisation of quartic
polynomials.
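The generating relation (2.2) can be checked directly; a minimal sketch (function name ours):

```python
def pythagorean_triple(n, m):
    """Generate a Pythagorean triple from integers n > m > 0 via
    (n^2 - m^2)^2 + (2 n m)^2 = (n^2 + m^2)^2."""
    return (n * n - m * m, 2 * n * m, n * n + m * m)

a, b, c = pythagorean_triple(2, 1)   # the classic triple (3, 4, 5)
```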

Figure 2.6: The triangle △AB C with right-angle A B̂ C has internal angles equal to those in triangles
△AD B and △D C B . Equality of the internal angles among these triangles ensures that the ratios
of the appropriate edge lengths among triangles is maintained.

We shall use (2.1) as the basis of measurement in Euclidean space. We shall use this idea
of length to define an operation that maps vectors to scalars that is compatible with our usual
ideas of length. To construct an expression for the lengths of vectors in the plane, we may
use the Law of Cosines which follows from the geometry of the plane. Given the labelled triangle
construction in Figure 2.7, we find the following relations among the edges,
(b⃗ − a⃗) · (b⃗ − a⃗) = ∥b⃗ − a⃗∥²
a⃗ · a⃗ + b⃗ · b⃗ − 2 a⃗ · b⃗ = ∥a⃗∥² + ∥b⃗∥² − 2 ∥a⃗∥ ∥b⃗∥ cos (θ )
a⃗ · b⃗ = ∥a⃗∥ ∥b⃗∥ cos (θ ) . (2.3)

By the Law of Cosines, we find a natural relationship between the magnitudes of two vectors
and the angle between them, and the component-wise multiplication and sum of two vectors. We
call this the dot or inner product. Clearly, the inner product of a vector with itself returns the
Pythagorean measure of its length. We have used only algebra and the Euclidean law of cosines
to reach this outcome. We can extend this definition to more than two dimensions by adding
more components to each vector, without changing any part of the formal relation or reference to
coordinate directions.
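This coordinate-free character is easy to exercise numerically. The sketch below (helper names ours) recovers the angle between two vectors from (2.3), and works unchanged for any number of components:

```python
import math

def dot(u, v):
    """Component-wise multiply-and-sum, in any dimension."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Pythagorean length: the inner product of a vector with itself."""
    return math.sqrt(dot(u, u))

def angle_between(u, v):
    """Invert (2.3): u . v = |u| |v| cos(theta)."""
    return math.acos(dot(u, v) / (norm(u) * norm(v)))
```

For u = (1, 0) and v = (1, 1) this returns π/4, as the geometry of the plane demands, and the same code accepts 3- or n-component vectors without modification.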


Figure 2.7: The Law of Cosines implies that the lengths of adjacent edges and the angle between
them encode the length of the third edge of any triangle in the plane.

The notion of relative distance defined by (2.1) and the Law of Cosines shall form the basis
of all measurements of length to follow in these notes. In particular, the dot product shall define
the mathematical machinery that we shall use to measure lengths of objects.

Remark 9 Notice that the construction of (2.1) did not rely on the orientation of the triangle in
Figure 2.6 in the plane. Similarly, the Law of Cosines formulation in (A.1) did not rely on the

orientation of the triangle in Figure 2.7; rather, it required only the relative orientation of the vectors

a⃗ and b⃗ . Indeed, the derivation of the relations between these quantities did not need any mention
of special points or point descriptions. The only assumption needed to derive these quantities is that
we work in the flat, 2-dimensional Euclidean plane. The independence of these results from any
choice of description highlights a fundamental insight into the nature of these quantities: their
existence is independent of how we choose to describe them. Moreover, since these statements are
true independently of our choice of coordinate description, these quantities must hold true for any
choice of coordinates, and changing from one coordinate description to another will not change these
quantities. We shall consider this idea again later when we consider coordinate transformations,
which will naturally lead to the concept of tensors.

In Euclidean geometry, we have the familiar descriptions of length, area and volume. These
descriptions are encoded in the associated vector products. Additional discussion on vector
products can be found in Section A.3. Moreover, when the quantities we want to measure are
conveniently aligned with the coordinate axes, then we recover the familiar descriptions of these
quantities. For example, when the edge of an object we wish to study is aligned with one of the
coordinate axes, then we can use the unit of measure provided by the coordinate grid to “measure”
the corresponding length. We can find correspondingly simple descriptions for area and volume.

2.3 Coordinate Systems and Their Properties


The state of a particle depends on its position and momentum. For us to reason about these
properties, we need a coordinate system that maps spatial positions onto a mathematical
framework. Recall that the classical universe is an Affine Space. In whichever coordinate system
we work, we must first select a reference point in our system and call it the origin.
René Descartes was the first to introduce the concept of coordinate systems to geometry. He
assigned a pair of real numbers (x , y ) to each point in the plane. The plane thus parameterised is
known as the Cartesian plane. In the following subsection, we shall focus our attention on the
description of points in the Cartesian plane and the transformations among different descriptions.

2.3.1 The 2-Dimensional Cartesian Coordinate System


We begin by choosing two orthogonal directions in the space R2 and label them with the
corresponding unit vectors x̂ and ŷ . These unit vectors, being orthogonal, are linearly
independent. Thus, since R2 is a 2-dimensional space,
these unit vectors form an orthonormal basis for the plane. Every point in the plane can then be
expressed as some linear combination of these unit vectors.
Figure 2.8 shows how the coordinate position of a marked point in R2 is described by a position
vector p⃗ with a given extent in each of the coordinate unit vector directions. The general position
p⃗ can be written as
p⃗ = a x̂ + b ŷ

where a = p⃗ · x̂ and b = p⃗ · ŷ are the projections of p⃗ along x̂ and ŷ . The pair (a , b ) are called the
coordinates of p⃗ . In a purely symbolic sense, we may think of the above as
p⃗ = ( x̂ ŷ ) (a , b )⊤

where standard row and column matrix multiplication is used to evaluate this product. The unit
vectors x̂ and ŷ above are usually omitted on account of an implicit understanding of which basis
is being used. Indeed, since every vector in the plane can now be uniquely identified with its
coordinates, we can simply identify p⃗ with the tuple of coordinates (a , b ) and write
p⃗ := (a , b )⊤

where the position in the column of numbers is sufficient to define the different coordinate
direction components.


Figure 2.8: The coordinate direction axes in a 2-dimensional coordinate system that define the
coordinate position of a marked point with position vector p⃗ = (a , b )⊤ .

When constructing a 2-dimensional Cartesian coordinate system to describe a given problem,


we must choose three pieces of information, namely, the origin, the x̂ direction and the ŷ direction.
Notice that the requirement that the x̂ and ŷ directions are orthogonal implies that any choice of
one leaves one of two choices for the other (see Figure 2.9).
Assume we have chosen a single point in the Affine Space of the Classical Universe to represent
the origin of our coordinate system. There are infinitely many 2-dimensional Cartesian Coordinate
Systems rooted at this single point. For example, there are many coordinate systems, rooted at this
choice of origin, that are related by a rotation operation. Consider three 2-dimensional Cartesian
coordinate systems that are rotated with respect to each other, as in Figure 2.10. Without loss of

Figure 2.9: The choice of orthogonal coordinate direction axes in a 2-dimensional coordinate
system.

generality, assume that one of these is the standard x − y Cartesian coordinate system, which is
represented with a horizontal x -axis and a vertical y -axis. Consider a point p with coordinates
(x , y ) relative to a set of coordinate axes x̂ and ŷ , (x ′ , y ′ ) relative to a set of coordinate axes x̂ ′ and
ŷ ′ , and (x ′′ , y ′′ ) relative to coordinate axes x̂ ′′ , ŷ ′′ , where x̂ ′ and ŷ ′ are obtained by rotating x̂ and
ŷ through an angle θ ; and x̂ ′′ and ŷ ′′ are obtained by rotating x̂ ′ and ŷ ′ through an angle φ. Let
p⃗ be the directed line segment starting at the origin of the coordinate system and ending at p .
Then p⃗ = p p̂ where p = ∥p⃗ ∥ is the length of the vector p⃗ . This is presented in Figure 2.10.


Figure 2.10: Multiple rotations of a coordinate system with a designated point P marked by the
displacement vector p⃗ . The x̂ ′ , ŷ ′ -coordinate system is rotated by θ relative to the x̂ , ŷ -coordinate
system, and the x̂ ′′ , ŷ ′′ -coordinate system is rotated by φ relative to the x̂ ′ , ŷ ′ -coordinate system.
The vector p⃗ now has a different component representation in each coordinate system.

Next, consider the relative angular displacements of each set of coordinate axes. The
following identities are useful in the discussion to follow,

cos (α + β ) = cos (α) cos (β ) − sin (α) sin (β )
sin (α + β ) = cos (α) sin (β ) + sin (α) cos (β ) .

From Figure 2.10 we find


x = p cos (α) and y = p sin (α)

and
x ′ = p cos (α − θ ) and y ′ = p sin (α − θ ) .

We can decompose the compound angle expressions using the angular composition identities

x ′ = p (cos (α) cos (−θ ) − sin (α) sin (−θ ))
    = p cos (α) cos (θ ) + p sin (α) sin (θ )
    = x cos (θ ) + y sin (θ )

and

y ′ = p (cos (α) sin (−θ ) + sin (α) cos (−θ ))
    = −p cos (α) sin (θ ) + p sin (α) cos (θ )
    = −x sin (θ ) + y cos (θ ) .

We can express these relations in matrix form as

(x ′ , y ′ )⊤ = [cos (θ ) sin (θ ); − sin (θ ) cos (θ )] (x , y )⊤ .

The matrix

R (θ ) = [cos (θ ) sin (θ ); − sin (θ ) cos (θ )]
specifies the rotation. Note that R (θ ) carries coordinate pair (x , y ) through an angle θ to (x ′ , y ′ )
in the new coordinate system. The matrix R is called a rotation matrix. Note also, R (−θ ) carries
coordinate pair (x ′ , y ′ ) through an angle θ to (x , y ) in the original coordinate system, such that a
rotation through an angle θ followed by a rotation through an angle −θ leaves the point (x , y )
unchanged. Additionally,
R (−θ ) = [cos (θ ) − sin (θ ); sin (θ ) cos (θ )] = [cos (θ ) sin (θ ); − sin (θ ) cos (θ )]⊤

where ⊤ denotes the matrix transpose. Therefore,


R (−θ )R (θ ) = R ⊤ (θ )R (θ ) = [cos (θ ) − sin (θ ); sin (θ ) cos (θ )] [cos (θ ) sin (θ ); − sin (θ ) cos (θ )] = [1 0; 0 1]

or
R ⊤ (θ )R (θ ) = 1

which implies that


R ⊤ (θ ) = R −1 (θ ).

Successive rotations can be represented by successive multiplications of the rotation matrix. If P


has coordinates (x ′′ , y ′′ ) with respect to x̂ ′′ and ŷ ′′ , where these axes make an angle φ with x̂ ′ and
ŷ ′ then

x ′′ = x ′ cos (φ) + y ′ sin (φ) and y ′′ = −x ′ sin (φ) + y ′ cos (φ)

which is now a rotation of (x ′ , y ′ ) through an angle φ. In matrix form


(x ′′ , y ′′ )⊤ = [cos (φ) sin (φ); − sin (φ) cos (φ)] (x ′ , y ′ )⊤ = [cos (φ) sin (φ); − sin (φ) cos (φ)] [cos (θ ) sin (θ ); − sin (θ ) cos (θ )] (x , y )⊤

which is equal to a rotation through an angle φ + θ . Then,


(x ′′ , y ′′ )⊤ = [cos (φ + θ ) sin (φ + θ ); − sin (φ + θ ) cos (φ + θ )] (x , y )⊤

and φ + θ is the angle between the x̂ , ŷ axes and the x̂ ′′ , ŷ ′′ axes.

Remark 10 Rotating a vector about the origin of a fixed coordinate system through an angle θ is
equivalent to fixing that point and rotating the coordinate axes about the origin through an angle
−θ .
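The defining properties of R (θ ) are easy to verify numerically; a minimal sketch using plain lists (helper names ours):

```python
import math

def R(theta):
    """The 2-by-2 rotation matrix acting on coordinate pairs."""
    return [[math.cos(theta),  math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]

def matmul(A, B):
    """Product of two 2-by-2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    """Matrix transpose of a 2-by-2 matrix."""
    return [[A[j][i] for j in range(2)] for i in range(2)]

# R^T(theta) R(theta) is the identity, and R(phi) R(theta) = R(phi + theta).
identity_check = matmul(transpose(R(0.3)), R(0.3))
composition_check = matmul(R(0.5), R(0.3))
```

Comparing `composition_check` entry-by-entry with `R(0.8)` confirms that successive rotations compose by adding their angles.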

Not all 2-dimensional Cartesian coordinate systems rooted at the same origin can be related
by a rotation. For instance, the coordinate system resulting from swapping the x and y axes is not
a rotation of the original coordinate system. The transformation matrix in this case is given by
(xp , yp )⊤ = [0 1; 1 0] (x ′p , y ′p )⊤ .

It is clear that any 2-dimensional Cartesian coordinate system rooted at the same origin as the
original (x , y )-system is either a rotation of the (x , y )-system, a swap of axes, or both a swap of
axes and some rotation of the result. This latter transformation is simply the composition of the
other two,

(xp , yp )⊤ = [0 1; 1 0] [cos (φ) − sin (φ); sin (φ) cos (φ)] (x ′p , y ′p )⊤ .
Schematically, such transformations can be written as

r⃗p = M r⃗p′

where M is either the rotation matrix presented earlier or the composition matrix above. Notice
that M ⊤ M = 1, so M is an orthogonal matrix. We write M ∈ O (2) to mean M is an orthogonal
matrix of size 2. O (2) is the orthogonal group of order 2. Observe that if M is a pure rotation then

det (M ) = 1 and if M involves a switching of axes then det (M ) = −1. The class of all pure rotations
of 2-dimensional Cartesian coordinate systems is called S O (2), the special orthogonal group of
order 2. Notice that the length of a vector is unchanged by rotation and this fact extends to higher
dimensions.
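The determinant criterion distinguishing the two cases can be checked with a short computation (helper name ours):

```python
import math

def det2(M):
    """Determinant of a 2-by-2 matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

rotation = [[math.cos(0.7),  math.sin(0.7)],
            [-math.sin(0.7), math.cos(0.7)]]   # pure rotation: det +1
swap = [[0, 1],
        [1, 0]]                                # axis swap: orthogonal, det -1
```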

2.3.2 The 3-Dimensional Cartesian Coordinate System


A 3-dimensional Cartesian coordinate system is fully specified by four pieces of information. It
is necessary to assign to some point in the system the special property of being the origin of the
system. Thinking of all other points in the universe by their difference vector from the origin, we
now have a vector space R3 . The other three parameters are the orthogonal unit vectors x̂ , ŷ and
ẑ . As in two dimensions, the coordinates of any point can be written in terms of these unit vectors

p⃗ = p x x̂ + p y ŷ + p z ẑ = ( x̂ ŷ ẑ ) (p x , p y , p z )⊤ := (p x , p y , p z )⊤ .

As before, once we have chosen two of these (say x̂ and ŷ ) we have two possible choices for the
other (say ẑ ). In this instance, there is a mathematical significance attached to this choice: it
determines the handedness of the resulting coordinate system.


Figure 2.11: The relative orientation of the left-handed and right-handed coordinate systems.

Remark 11 (Handedness of a Coordinate System) In a Right-Handed coordinate system, if you


place your right hand at the origin, and point your fingers down the x -axis, and then curl your
hand towards the y -axis, your thumb will point up the z -axis. In a Left-Handed coordinate system
we apply the same rule replacing the right hand with the left hand.

The right-hand rule is used to determine the direction of the third axis given two other
coordinate axis directions in three dimensions. We use this to give an orientation among triples
of orthogonal coordinate unit vectors and define the vector cross-product of two vectors to give a
formal mathematical operation that generates this orientation. We usually prefer to work in a


Right-Handed coordinate system. Given the ordered sequence of orthonormal vectors x̂ , ŷ , ẑ ,
the organisation of these vectors into a right-handed system assigns x̂ × ŷ = ẑ , whereas the
organisation of these vectors into a left-handed system assigns x̂ × ŷ = −ẑ . The usual
cross-product formulas are phrased for right-handed coordinate systems.
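The orientation convention can be encoded directly; a small sketch (function name ours) implementing the usual right-handed cross-product formula:

```python
def cross(u, v):
    """Right-handed cross product of two 3-dimensional vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

x_hat, y_hat, z_hat = (1, 0, 0), (0, 1, 0), (0, 0, 1)
# In a right-handed frame, cross(x_hat, y_hat) gives z_hat; reversing
# the order flips the sign, which is the left-handed assignment.
```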
If we fix the origin, two sets of 3-dimensional Cartesian Coordinate systems are related to each
other by a 3-dimensional Orthogonal Transformation

r⃗p = M r⃗p′

where M ∈ O (3). This fact is easy to take on face-value as a generalization of the result for 2-
dimensional Cartesian Coordinates, and indeed, this result generalizes correctly to n -dimensional
space. This generalisation deserves some discussion; to this end, suppose we relate two sets of
Cartesian Coordinates in n dimensions by an invertible linear transformation Q . Then distances
between points in space should be preserved by Q . No matter what n -dimensional Cartesian
space we consider, the distance d(a⃗ , b⃗ ) ∈ R between two prescribed points, with position vectors
a⃗ and b⃗ , should be the same. Thus, we require

d(a⃗ , b⃗ ) = d(Q a⃗ ,Q b⃗ ).

Stated differently,

(a⃗ − b⃗ )⊤ (a⃗ − b⃗ ) = (Q a⃗ − Q b⃗ )⊤ (Q a⃗ − Q b⃗ )

where (a⃗ − b⃗ ) has a representation as a column of coefficients in the chosen basis and has transpose
(a⃗ − b⃗ )⊤ . Similar statements are true for (Q a⃗ − Q b⃗ ) and (Q a⃗ − Q b⃗ )⊤ . Simplifying this yields

(a⃗ − b⃗ )⊤ (a⃗ − b⃗ ) = (a⃗ − b⃗ )⊤Q ⊤Q (a⃗ − b⃗ ).

Since this must be true for all a⃗ , b⃗ ∈ Rn , it follows that

x⃗ ⊤ x⃗ = x⃗ ⊤Q ⊤Q x⃗ ∀ x⃗ ∈ Rn .

Let us name this central matrix L = Q ⊤Q . Choose x⃗ = (0, 0, . . . , 0, 1, 0, 0, . . . 0)⊤ with the 1 in the i -th
position. Then,
1 = x⃗ ⊤ x⃗ = x⃗ ⊤ L x⃗ = L i i .

Since this holds for all i , the diagonal elements of L must all be 1’s.
We can learn more about the structure of the transformation matrix L by considering the
following neat construction. Choose x⃗ = (0, 0, . . . , 0, 1, 0, . . . , 0, 1, 0, . . . , 0)⊤ with 1’s in the i -th and
j -th positions and 0 everywhere else. This yields

2 = x⃗ ⊤ x⃗ = x⃗ ⊤ L x⃗ = L i i + L j j + L i j + L j i .

We know from the previous result that L i i + L j j = 1 + 1 = 2. Combining these statements results in

Li j + L ji = 0

but L = Q ⊤Q and so L ⊤ = L , meaning that L i j = L j i . Putting these last two results together, we
get L i j = 0 for i ≠ j . Thus L is a diagonal matrix with 1’s on the diagonal and zero everywhere else:
the identity matrix. We have shown that Q ⊤Q = I , and hence Q must be an orthogonal matrix
to preserve distances. This confirms that, in general, transformations between n-dimensional
Cartesian (orthogonal) coordinate systems are via orthogonal matrices. On the other hand, for every
orthogonal transformation, the angle between two vectors is preserved by the transformation.
This is because the inner product is preserved,

x ⊤ y = x ⊤ 1 y = x ⊤Q ⊤Q y = (Q x )⊤ (Q y ).

So, if we started with an orthonormal basis, an orthogonal transformation will keep our basis
orthonormal, and hence every orthogonal transformation takes a Cartesian coordinate system to
another Cartesian Coordinate System.
In summary, every transformation between Cartesian coordinate systems is an orthogonal
transformation, and every orthogonal transformation maps Cartesian coordinate systems to
Cartesian coordinate systems. In other words the transformations mapping Cartesian coordinate
systems to Cartesian coordinate systems are precisely the orthogonal transformations.
We remark further that the orthogonal matrices always have a determinant of either +1 or −1,

det (Q )² = det (Q ) det (Q ) = det (Q ⊤ ) det (Q ) = det (Q ⊤Q ) = det (1) = 1.
Thus det (Q ) = ±1. Those with determinant +1 can be thought of as rotations. We call the collection
of all of these matrices the special orthogonal group S O (n ). Those with determinant −1 can be
thought of as first interchanging axes and then rotating. In 3-dimensions, transformations of
determinant +1 preserve the handedness of the coordinate system, whilst transformations of
determinant −1 reverse the handedness of the coordinate system.
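These invariance properties can be confirmed numerically for a concrete Q ∈ S O (3), here a rotation about the z -axis (helper names ours):

```python
import math

def Qz(theta):
    """Rotation about the z-axis: an orthogonal matrix with det +1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c,  -s,  0.0],
            [s,   c,  0.0],
            [0.0, 0.0, 1.0]]

def apply(M, v):
    """Matrix-vector product in 3 dimensions."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def dist(u, v):
    """Euclidean distance d(u, v), as in the text."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

a, b = [1.0, 2.0, 3.0], [-2.0, 0.5, 1.0]
Q = Qz(0.9)
# d(a, b) equals d(Qa, Qb), and the inner product a . b is preserved.
```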

2.3.3 Other Linear Coordinate Systems


In linear algebra, any set of linearly independent vectors that spans the space of interest can be
thought of as a basis for that space. These vectors need not be of unit length, and they need not
be mutually orthogonal. Any point in space can be expressed as a linear combination of these
basis vectors. The coefficients in this expansion are the coordinates of the point with respect to
that basis.
We now consider the transformation equations between standard Cartesian coordinates with
orthonormal basis { x̂1 , x̂2 , x̂3 } and some linear coordinate system with basis {v⃗1 , v⃗2 , v⃗3 }. These
vectors need not be of unit length and they need not be mutually orthogonal. The only requirement
is that they are linearly independent.
Let us consider an arbitrary point in space p⃗ . This point is written in both the original Cartesian
coordinates and in the linear coordinate system as follows,

p⃗ = p 1 x̂1 + p 2 x̂2 + p 3 x̂3 ,
p⃗ = p ′1 v⃗1 + p ′2 v⃗2 + p ′3 v⃗3 .

x̂i⊤ acts on the basis vector x̂ j as the dot product of basis vectors and, since the standard basis is
orthonormal,

x̂i⊤ x̂ j = x̂i · x̂ j = 1 if i = j , and 0 if i ≠ j .

Then, operating with x̂i⊤ on p⃗ yields

x̂1⊤ p⃗ = x̂1 · p⃗ = p 1 x̂1 · x̂1 + p 2 x̂1 · x̂2 + p 3 x̂1 · x̂3 = p 1
x̂2⊤ p⃗ = x̂2 · p⃗ = p 1 x̂2 · x̂1 + p 2 x̂2 · x̂2 + p 3 x̂2 · x̂3 = p 2
x̂3⊤ p⃗ = x̂3 · p⃗ = p 1 x̂3 · x̂1 + p 2 x̂3 · x̂2 + p 3 x̂3 · x̂3 = p 3 .

Similarly,

x̂1⊤ p⃗ = x̂1 · p⃗ = p ′1 x̂1 · v⃗1 + p ′2 x̂1 · v⃗2 + p ′3 x̂1 · v⃗3
x̂2⊤ p⃗ = x̂2 · p⃗ = p ′1 x̂2 · v⃗1 + p ′2 x̂2 · v⃗2 + p ′3 x̂2 · v⃗3
x̂3⊤ p⃗ = x̂3 · p⃗ = p ′1 x̂3 · v⃗1 + p ′2 x̂3 · v⃗2 + p ′3 x̂3 · v⃗3 .

It then follows that

(p 1 , p 2 , p 3 )⊤ = [x̂1⊤ v⃗1 x̂1⊤ v⃗2 x̂1⊤ v⃗3 ; x̂2⊤ v⃗1 x̂2⊤ v⃗2 x̂2⊤ v⃗3 ; x̂3⊤ v⃗1 x̂3⊤ v⃗2 x̂3⊤ v⃗3 ] (p ′1 , p ′2 , p ′3 )⊤ = X ⊤V (p ′1 , p ′2 , p ′3 )⊤ .

In the above, X is the matrix whose column vectors are x̂1 , x̂2 and x̂3 while V is the matrix whose
column vectors are v⃗1 , v⃗2 and v⃗3 . We are free to write the matrices X and V in any coordinate
system, as long as we use the same one for both. If we use the original Cartesian Coordinates,
then X = 1, and the columns of V are the coordinates of the vectors v⃗1 , v⃗2 and v⃗3 in our original
Cartesian Coordinate System.
With X and V defined as above, the same result can be derived directly

p⃗ = X (p 1 , p 2 , p 3 )⊤ = V (p ′1 , p ′2 , p ′3 )⊤ and (p 1 , p 2 , p 3 )⊤ = X −1V (p ′1 , p ′2 , p ′3 )⊤ . (2.4)

Since X is orthogonal, X −1 = X ⊤ . In the general case of transforming between any two linear
coordinate systems, the last step no longer applies, but the remainder of the argument is valid,
and (2.4) gives the transformation rule between any two sets of linear coordinate systems. We
can continue in this line to show in a different way that the matrix of transformation between two
Orthonormal coordinate systems is orthogonal.
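As a concrete instance of (2.4) in two dimensions, take X to be the identity (the original Cartesian basis) and a non-orthogonal V (the basis vectors and point below are illustrative values of ours):

```python
def inv2(M):
    """Inverse of a 2-by-2 matrix given as nested lists."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

def apply(M, v):
    """Matrix-vector product in 2 dimensions."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

# Columns of V are the basis vectors v1 = (1, 0) and v2 = (1, 1):
# linearly independent, but neither orthogonal nor unit length required.
V = [[1.0, 1.0],
     [0.0, 1.0]]
p = [3.0, 2.0]                  # Cartesian coordinates of the point
p_prime = apply(inv2(V), p)     # coordinates in the v1, v2 basis
```

Reconstructing the point via p = V p′ recovers (3, 2), since 1 · v1 + 2 · v2 = (3, 2).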

2.3.4 Curvilinear Coordinates


When dealing with orthonormal coordinate systems it is quite clear what the coordinate axes are.
This concept generalizes to all coordinate systems. In particular, it generalizes to non-orthogonal

coordinate systems as well. If we set all the coordinates to zero except for one, and allow that one
coordinate to vary, we trace out the coordinate axis of that coordinate. The reader should check
that this concept yields the standard x −, y − and z -axes in the Cartesian Coordinate systems. In
arbitrary linear coordinate systems, the coordinate axes are lines emanating from the origin along
the directions of the basis vectors.
A related concept is that of coordinate curves. Instead of setting the other coordinates to zero
if we simply fix them as some prescribed constants, while allowing our chosen coordinate to vary,
we obtain a coordinate curve for that coordinate. Each coordinate is then associated with a family
of coordinate curves. In linear coordinate systems, these curves are lines parallel to the basis
vectors.
Finally, we consider the idea of coordinate surfaces. In a 3-dimensional coordinate system, we
can set one of the coordinates to be a constant value and allow the other two to vary. The figure
that they trace out is called a coordinate surface. It should be clear that the coordinate surfaces of
linear coordinate systems are planes.
So far we have covered only linear coordinate systems. That is, we’ve considered coordinate
systems that made use of the vector-space nature of 2-dimensional and 3-dimensional space
to assign coordinates to each point. In all of these systems the coordinate curves are lines and
the coordinate surfaces are planes. However, it is not necessary to confine ourselves to linear
representations of space. Indeed, any parameters that unambiguously label every point in space
can be thought of as coordinates. It is possible, and often useful, to use nonlinear coordinate
systems for solving problems.
We consider here a family of coordinate systems collectively known as Curvilinear
Coordinates. The name Curvilinear Coordinates is derived from the fact that the coordinate
curves and coordinate surfaces are not necessarily straight lines and planes in these coordinate
systems, but curved lines and curved surfaces. For every set of curvilinear coordinates we
construct, we will consider the following aspects: transformation equations, directional
derivatives and length measurements along curves. The translation between different coordinate
systems corresponds to finding transformation equations between coordinate systems and
taking into account the change in geometry of the coordinate curves.

2.3.5 Transformation Equations


Consider the curvilinear coordinates s 1 , s 2 , . . . , s n . In order for a unique set of curvilinear
coordinates to correspond to every point in space, we need to stipulate a correspondence
between the Cartesian Coordinates of some point (in a well-specified Cartesian System), and the
Curvilinear Coordinates of the same point. We achieve this through transformation equations

x 1 = x 1 (s 1 , s 2 , . . . , s n ),
x 2 = x 2 (s 1 , s 2 , . . . , s n ),
⋮
x n = x n (s 1 , s 2 , . . . , s n ).


It is customary to refer to the coordinates with superscripts (s i ) rather than subscripts (si ). There
is a reason for this, but for now we simply accept this as convention.
We require that these transformation equations are well-behaved. We mean by this

1. The equations must be locally invertible. In some neighbourhood of every point there is an
expression for the curvilinear coordinates in terms of the Cartesian Coordinates. If this was
not the case the coordinates would not ‘uniquely’ label the points in some region of space.

2. The Jacobian Matrix of the transformation must be non-singular. This is necessary to


ensure that the number of curvilinear coordinates at a point corresponds to the number of
coordinates that describe that point, and ensures that the transformation is well defined.

3. The transformation equations must be differentiable, and the derivatives must be


continuous. Often they are smooth (infinitely differentiable). This requirement ensures that
the coordinate curves are indeed curved, as opposed to jagged.

The inverse function theorem tells us that point 1 above follows from points 2 and 3. Thus, we
can always invert the equations of transformation at some given point in space, so that we can
write

x 1 = x 1 (s 1 , s 2 , . . . , s n )          s 1 = s 1 (x 1 , x 2 , . . . , x n )
⋮                              and           ⋮                                    (2.5)
x n = x n (s 1 , s 2 , . . . , s n )          s n = s n (x 1 , x 2 , . . . , x n ).
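The non-singular Jacobian requirement can be checked concretely. For the plane polar coordinates (s 1 , s 2 ) = (r, θ ), a sketch (helper names ours):

```python
import math

def jacobian_polar(r, theta):
    """Jacobian of (x, y) = (r cos(theta), r sin(theta)) with respect
    to (r, theta): rows are (dx/dr, dx/dtheta) and (dy/dr, dy/dtheta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def det2(M):
    """Determinant of a 2-by-2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# det = r (cos^2 + sin^2) = r, so the Jacobian is singular only at the
# origin r = 0, exactly where the angular coordinate is ambiguous.
```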
A more formal motivation for these facts will follow after we’ve defined tangent vectors. Next we
consider two familiar examples of invertible coordinate transformations.

Example 2.6 (Rectilinear Coordinate to Rectilinear Coordinate Inversion) Consider the


following reparametrisation of the standard x − y -coordinate system

x ′ (x , y ) = x − y
y ′ (x , y ) = x + y .

We can invert this coordinate transformation directly to find

x (x ′ , y ′ ) = (1/2)(x ′ + y ′ )
y (x ′ , y ′ ) = (1/2)(y ′ − x ′ ).
Clearly, the forward and reverse transformations are simple linear transformations of the coordinate
on either side of the equality sign. We can associate to this transformation a simple matrix transform
(x ′ , y ′ )⊤ = [1 −1; 1 1] (x , y )⊤

for which we now have the inverse transformation

(x , y )⊤ = (1/2) [1 1; −1 1] (x ′ , y ′ )⊤ .

In the case of this transformation, it follows that the representation of any vector r⃗ in the x − y -
coordinate system has a corresponding representation r⃗′ in the x ′ − y ′ -coordinate system, where

r⃗′ = A −1 r⃗ and r⃗ = A r⃗′

where the primed and unprimed coordinate axes are related by

X⃗ ′ = A X⃗ and X⃗ = A −1 X⃗ ′ .

The proof of these statements is straightforward and is left as an exercise to the reader. It is now
simple to verify that the primed coordinate system corresponds to a rotation of the unprimed
coordinate system through an angle of −π/4, combined with a uniform scaling by a factor of √2.

Example 2.7 (Plane Polar Coordinate to Rectilinear Coordinates Inversion) Consider the
standard plane polar coordinates on the x − y -plane,

x (r, θ ) = r cos (θ )
y (r, θ ) = r sin (θ ) .

We can invert this coordinate transformation by noting that

y r sin (θ ) y 
= = tan (θ ) or θ = arctan .
x r cos (θ ) x
Now, we find

r = x / cos (θ ) = x / cos (arctan (y /x )) = x √(1 + (y /x )²) = √(x ² + y ²)

or

r = y / sin (θ ) = y / sin (arctan (y /x )) = x √(1 + (y /x )²) = √(x ² + y ²)

and we have used the identities

cos (arctan (θ )) = 1/√(1 + θ ²) and sin (arctan (θ )) = θ /√(1 + θ ²).
In each case, we recover the familiar coordinate transformation functions

r (x , y ) = √(x ² + y ²) and θ (x , y ) = arctan (y /x ).

At this point it is important to note the subtlety of the inverse trigonometric function arctan(y/x), which returns values of the angle θ only on the open interval (−π/2, π/2). The interested reader should consider a reformulation of this inverse function which will return the correct value of θ on the complete angular interval [0, 2π).
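One standard reformulation uses the two-argument arctangent, which inspects the signs of x and y separately and so can distinguish all four quadrants. A minimal sketch (the helper name to_polar is our own) that returns θ on [0, 2π):

```python
import math

def to_polar(x, y):
    """Invert x = r cos(theta), y = r sin(theta), with theta in [0, 2*pi).

    math.atan2 uses the signs of both arguments to select the correct
    quadrant, unlike arctan(y/x), which cannot tell (x, y) from (-x, -y).
    """
    r = math.hypot(x, y)        # sqrt(x**2 + y**2)
    theta = math.atan2(y, x)    # returned in (-pi, pi]
    if theta < 0:
        theta += 2 * math.pi    # shift to [0, 2*pi)
    return r, theta

# A point in the third quadrant, where arctan(y/x) alone would be wrong:
r, theta = to_polar(-1.0, -1.0)
```

Here θ comes out as 5π/4, whereas arctan((−1)/(−1)) = arctan(1) would wrongly report π/4.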

2.4 Coordinate Curves and Coordinate Surfaces
We will normally distinguish a particular curvilinear coordinate system according to the shapes
of the coordinate curves (in the 2-dimensional case) or coordinate surfaces (in the 3-dimensional
case). We will derive these objects for each coordinate system we discuss.

2.4.1 Parametric Curves


Parametric curves give a simple way to assign coordinates from one space to another. We can assign coordinates on the real line R to the real line R by the linear transformation

f :R→R
f (x ) = a x + b = x ′

where a, b ∈ R. The mapping f assigns to each x ∈ R a unique image x′ ∈ R.


For each choice of a , b ∈ R there corresponds a different map f . Clearly, the choice of f is not
unique. When the spaces involved in the coordinate mapping are more complicated than R, we
should expect that the mapping function f will be more complicated, too. As an example of a
more complicated mapping, suppose we consider a coordinatisation map f that assigns to each
point in R a point on the unit circle S 1 . Next, we consider one example of such a mapping known
as a projective map.

Example 2.8 (Projection Mapping R → S¹) An efficient way to assign numbers to each point on the circle using a continuous mapping from the real line is by considering displacement vectors that connect points on the line to points on the circle. Suppose we mark the origin O as the zero point on the real line R and as the centre of the unit circle S¹, choose the point P as the point on the circle directly above O, and let Z be the point r ∈ R, see Figure 2.12.
Clearly

O = (0, 0)ᵀ ,   P = (0, 1)ᵀ   and   Z = (r, 0)ᵀ .
We can think of P as the ‘north pole’ of the circle. Now construct the following position vectors

p⃗ = P − O = (0, 1)ᵀ − (0, 0)ᵀ = (0, 1)ᵀ   and   v⃗ = Z − P = (r, 0)ᵀ − (0, 1)ᵀ = (r, −1)ᵀ .

Notice that p⃗ connects the origin to the north pole and v⃗ connects the north pole to the point in R. The line element P Z will intersect the circle S¹ at some point on its circumference. This line element will be directed along v⃗ but will have a length that depends on the position of Z. We can construct a new vector that joins the north pole to the point of intersection between the line element P Z and the circle using a linear combination of the vectors p⃗ and v⃗,

w⃗ = p⃗ + λ v⃗ = (λr, 1 − λ)ᵀ   and   ∥w⃗∥ = 1


Figure 2.12: The projection mapping from the circle to the real line allows us to map the entirety of the real line, in a smooth and continuous manner, to the circle. Points in R are placed in one-to-one correspondence with points in S¹ such that for each marked point a on S¹ there exists a point Z on the real line that uniquely parameterizes the point a on the circle. Note that this mapping is invertible everywhere except at a = P, which is the image of both −∞ and ∞ in R.

where λ is a scaling parameter which adjusts the contribution of v⃗ in the definition of w⃗ and changes the length of w⃗ along the line connecting the north pole and Z. An appropriate choice of λ will define the displacement vector w⃗ connecting the north pole to the point a on the circle. The parameter λ is called a Lagrange Multiplier.
If we enforce the requirement that the radius of the circle is 1, then we can determine the value of the multiplier λ. We can break down the calculation as follows. We compute the squared magnitude of w⃗ as a function of λ and set the value equal to 1,

∥w⃗∥² = (λr, 1 − λ)ᵀ · (λr, 1 − λ)ᵀ = λ²r² + (1 − λ)² = 1 ,

and then solve for λ as a function of the parameters of the problem,

λ = 2/(r² + 1) .

Finally, we recover a component-wise expression for the value of w⃗,

w⃗ = ( 2r/(r² + 1), (r² − 1)/(r² + 1) )ᵀ .

Now for every value of the parameter r, we assign a unique point to the circle. This argumentation can be generalized to higher dimensions.
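The result of Example 2.8 is easy to verify numerically. In the sketch below, the helper name project_to_circle is our own; it evaluates w⃗ for a given r using λ = 2/(r² + 1) and confirms that the image lies on the unit circle:

```python
import math

def project_to_circle(r):
    """Map the real number r to the point w on the unit circle found by
    intersecting the line through the north pole P = (0, 1) and Z = (r, 0)
    with the circle, using lambda = 2 / (r**2 + 1)."""
    lam = 2.0 / (r**2 + 1.0)
    # w = p + lambda * v with p = (0, 1) and v = (r, -1)
    return (lam * r, 1.0 - lam)

w = project_to_circle(3.0)
norm = math.hypot(*w)   # should be exactly 1: w lies on the unit circle
```

Note that project_to_circle(0.0) returns (0, −1), the south pole, in agreement with the figure, while large |r| approaches the north pole P.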

Remark 12 Lagrange Multipliers are often used in optimisation problems. A single Lagrange
multiplier is used in Example 2.8 to assign a specific value to a component in a vector sum, subject
to a geometrical restriction. Lagrange Multipliers are especially useful in problems where we

38
compare the lengths of parallel vectors. We shall revisit this concept later on when considering the
constrained motion of objects.

Suppose we consider some general functions f that take input parameters r, s and t and output values in R, R² or R³. The generic functions f map elements from one space with some number of dimensions (the number of inputs to f) into another space with a different number of dimensions (the number of outputs from f). The functions a, b and c describe the components of the outputs of f. In each case, the number of input parameters determines the dimension of the image of f, regardless of the number of dimensions of the space where this output is sent. For simplicity, only the input and output sequences are listed.

t 7→ f (t ): f maps the real number line, R, back to itself, where f can only stretch or compress parts of it.

(r, s ) 7→ (a (r, s ), b (r, s )): f maps the 2-dimensional plane back to the 2-dimensional plane, where f can rotate, stretch or compress parts of it.

(r, s ) 7→ (a (r, s ), b (r, s ), c (r, s )): f maps the 2-dimensional plane into a subspace of 3-dimensional Euclidean space. The outcome is some continuous surface in 3 dimensions.

t 7→ (a (t ), b (t ), c (t )): f maps the real line to a 1-dimensional curve in 3-dimensional Euclidean space. The outcome of this mapping is some smooth, continuous line in 3 dimensions.

In general, we shall use the word curve to mean a continuously connected, smooth, subspace of
another space. In particular, we shall consider curves in R3 . It is important to note that there is
a class of coordinate transformation for each of the parameters r, s and t that leave the output
unchanged and we consider these next.

Remark 13 We have already encountered the idea of an affine transformation. We can think of an affine transformation more generally as any transformation that preserves collinearity (i.e., all points lying on a line initially still lie on a line after transformation) and ratios of distances (e.g., the midpoint of a line segment remains the midpoint after transformation).

In particular, we may transform the parameter t along 1-dimensional curves such that it lies in the range from 0 to 1 and think of this number as marking the fraction of the total length of the curve (t = 0 at the beginning of the curve and t = 1 at the end).

2.4.2 Tangent Vectors


One of the reasons we require the transformation equations to be differentiable is that the
derivatives give us useful information about the structure of the transformation. In particular,
consider what happens when we fix all the coordinates at some point and allow one of them to

vary infinitesimally. The motion thus produced is in the ‘characteristic direction’ of that coordinate. It is tangent to the coordinate curve at that point. Formally, we write the tangent vector associated with the coordinate sⁱ as

e⃗ᵢ = ∂/∂sⁱ ( x¹(s¹, s², …, sⁿ), x²(s¹, s², …, sⁿ), …, xⁿ(s¹, s², …, sⁿ) )ᵀ = ∂x⃗/∂sⁱ .

This idea is best understood through examples. The most basic example is Cartesian coordinates. Example 2.9 demonstrates the simplest transformation equations, corresponding to the identity transformation of the coordinate curves. In this case the transformation equations are trivial.

Example 2.9 (Trivial Cartesian Coordinate Transformation) Consider the identity coordinate
transformation of the Cartesian coordinate system,

x = x (x , y , z ) = x , y = y (x , y , z ) = y , and z = z (x , y , z ) = z .

So, the tangent vectors are

e⃗ₓ = ∂/∂x (x, y, z)ᵀ = (1, 0, 0)ᵀ = x̂ ,   e⃗_y = ∂/∂y (x, y, z)ᵀ = (0, 1, 0)ᵀ = ŷ   and   e⃗_z = ∂/∂z (x, y, z)ᵀ = (0, 0, 1)ᵀ = ẑ .

These are the unit vectors in the x, y and z directions respectively.

For a general linear coordinate system with basis vectors {v⃗1 , v⃗2 , v⃗3 }, the tangent vectors at
any point can be easily shown to be exactly the basis vectors. This makes sense - the coordinate
curves run parallel to the basis vectors.
The tangent vectors are not always unit vectors. When they are not unit vectors, we can
normalize them to obtain unit vectors in the coordinate directions. The sizes of the tangent
vectors are called the metric coefficients, hᵢ, of the coordinate system. We write,

e⃗ᵢ = hᵢ ŝᵢ ,

where ŝᵢ is a unit vector in the same direction as e⃗ᵢ that we can think of as the unit vector in the i-th coordinate direction.
Consider the Jacobian Matrix, J, of the coordinate transformation,

             ⎛ ∂x¹/∂s¹  ⋯  ∂x¹/∂sⁿ ⎞
J = dx⃗/ds⃗ =  ⎜    ⋮      ⋱     ⋮    ⎟ .
             ⎝ ∂xⁿ/∂s¹  ⋯  ∂xⁿ/∂sⁿ ⎠

Clearly, the columns of J are the tangent vectors

J = ( e⃗₁  e⃗₂  ⋯  e⃗ₙ₋₁  e⃗ₙ ) .

The matrix will be singular if and only if the tangent vectors are linearly dependent; but this
means that the tangent space would collapse (its dimension would decrease), and the coordinate
curves would coincide. We do not want this to occur because it implies that locally (very zoomed
in), the coordinate space itself will collapse and not be invertible. This explains why a sensible
transformation must have a non-singular Jacobian Matrix at all points, and keeps our intuition
sharp about a requirement that might otherwise seem arbitrary and unnatural. The interested
reader should read up on the Inverse Function Theorem (or the Constant Rank Theorem) to formally
understand this point.
For the well behaved coordinate systems that we will consider, it follows that the tangent
vectors are always linearly independent and will hence always form a basis for the tangent space,
which is always n -dimensional. When we zoom in very close to a point in coordinate space,
the space begins to look like the tangent space. If we consider the region of space enclosed by
varying each coordinate over an increasingly small interval, this region will begin to resemble the
fundamental parallelogram or parallelepiped of the tangent space. The (signed) area/volume
of this entity is given by the determinant of the matrix formed by placing the tangent vectors
as column vectors – i.e. the Jacobian Matrix. This reasoning leads us to a result well known in
Multivariable Calculus

dx¹ dx² ⋯ dxⁿ = |det(J)| ds¹ ds² ⋯ dsⁿ .    (2.6)

In many of the examples we study, the tangent vectors will be orthogonal. In this case the unit
vectors will be orthonormal, and we will have
det(J) = det( e⃗₁ ⋯ e⃗ₙ ) = det( h₁ŝ₁ ⋯ hₙŝₙ ) = h₁ ⋯ hₙ det( ŝ₁ ⋯ ŝₙ ) .

This last matrix of unit vectors is an orthogonal matrix - we can see this by pre-multiplying it by its transpose

⎛ ŝ₁ᵀ ⎞                  ⎛ ŝ₁ᵀŝ₁  ŝ₁ᵀŝ₂  ⋯  ŝ₁ᵀŝₙ ⎞   ⎛ 1 0 ⋯ 0 ⎞
⎜ ŝ₂ᵀ ⎟ ( ŝ₁ ŝ₂ ⋯ ŝₙ ) = ⎜ ŝ₂ᵀŝ₁  ŝ₂ᵀŝ₂  ⋯  ŝ₂ᵀŝₙ ⎟ = ⎜ 0 1 ⋯ 0 ⎟ = 1
⎜  ⋮  ⎟                  ⎜   ⋮      ⋮    ⋱    ⋮   ⎟   ⎜ ⋮ ⋮ ⋱ ⋮ ⎟
⎝ ŝₙᵀ ⎠                  ⎝ ŝₙᵀŝ₁  ŝₙᵀŝ₂  ⋯  ŝₙᵀŝₙ ⎠   ⎝ 0 0 ⋯ 1 ⎠

Now the determinant of an orthogonal matrix is either +1 or −1, with the sign depending on the order in which we supply the rows. The absolute value of this determinant is always 1, and thus we get

|det(J)| = h₁ h₂ ⋯ hₙ ,

with the convention that we have chosen all the hi to be positive. Thus when the tangent vectors
are orthogonal, we can compute the area/volume elements very easily - we simply take the
product of the metric coefficients. We have already considered some applications of the matrix
determinant, and this topic shall appear again in various forms in these notes. To this end, Section A.4 contains additional details and discussion on this and related matters.
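For the plane polar coordinates of Example 2.7, the ideas above can be checked directly: the columns of J are the tangent vectors, their lengths are the metric coefficients h_r = 1 and h_θ = r, and |det(J)| = h_r h_θ reproduces the familiar area element dx dy = r dr dθ. A minimal numerical sketch (the helper name polar_jacobian is our own):

```python
import numpy as np

def polar_jacobian(r, theta):
    """Jacobian of x = r cos(theta), y = r sin(theta); its columns are
    the tangent vectors e_r and e_theta."""
    e_r = np.array([np.cos(theta), np.sin(theta)])
    e_theta = np.array([-r * np.sin(theta), r * np.cos(theta)])
    return np.column_stack([e_r, e_theta])

r, theta = 2.0, 0.7
J = polar_jacobian(r, theta)

h_r = np.linalg.norm(J[:, 0])       # metric coefficient h_r = 1
h_theta = np.linalg.norm(J[:, 1])   # metric coefficient h_theta = r
detJ = abs(np.linalg.det(J))        # area element factor: dx dy = r dr dtheta
```

With orthogonal tangent vectors, detJ equals the product of the metric coefficients, as claimed above.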

2.4.3 Cotangent Vectors
It is worth remarking at this point that there are actually two kinds of direction vectors associated with any change of coordinates. The first kind is the tangent vectors discussed above, which we can think of as the partial derivatives ∂/∂sⁱ. The second kind is a set of vectors known as the cotangent vectors. These are associated with the differentials dsⁱ. From the chain rule we can write out the differential as

dsⁱ = (∂sⁱ/∂x¹) dx¹ + (∂sⁱ/∂x²) dx² + ⋯ + (∂sⁱ/∂xⁿ) dxⁿ .
This last expression looks like a dot product, and can be rewritten as

dsⁱ = ( ∂sⁱ/∂x¹, ∂sⁱ/∂x², …, ∂sⁱ/∂xⁿ )ᵀ · ( dx¹, dx², …, dxⁿ )ᵀ .

This leads us to directly associate the differential dsⁱ with the vector

e⃗ⁱ = ( ∂sⁱ/∂x¹, ∂sⁱ/∂x², …, ∂sⁱ/∂xⁿ )ᵀ = ∂sⁱ/∂x⃗ .
We call this vector, evaluated at some point, the i -th cotangent vector at the given point. For
Cartesian coordinates, the tangent and cotangent vectors coincide.
The inner products between the cotangent vectors are known as the components of the contravariant metric tensor

gⁱʲ = e⃗ⁱ · e⃗ʲ .

The cotangent vectors are always linearly independent for the well behaved systems that we will
consider. This follows from the result of the next section.

2.4.4 Tangent and Cotangent Vector Component Relations


In general it is not necessary for the tangent vectors to be orthogonal. Likewise the cotangent vectors are not necessarily orthogonal. However, the tangent and cotangent vectors satisfy a condition of mutual orthogonality

e⃗ᵢ · e⃗ʲ = e⃗ʲ · e⃗ᵢ = ( ∂sʲ/∂x¹, ∂sʲ/∂x², …, ∂sʲ/∂xⁿ )ᵀ · ( ∂x¹/∂sⁱ, ∂x²/∂sⁱ, …, ∂xⁿ/∂sⁱ )ᵀ = Σₖ₌₁ⁿ (∂sʲ/∂xᵏ)(∂xᵏ/∂sⁱ) = ∂sʲ/∂sⁱ .    (2.7)

Here the second-last step follows from the chain rule, and the last follows from the fact that the sⁱ are functionally independent (not constrained or related to each other by any equations). This relationship can be written more elegantly by making use of the Kronecker δ-function,

δʲᵢ = 1 if i = j, and 0 otherwise.    (2.8)

We can state the mutual orthogonality relation as

e⃗ᵢ · e⃗ʲ = δʲᵢ .    (2.9)

Because of this relation, the tangent and cotangent vectors are called reciprocal bases. As will become apparent shortly, this is a very useful result. We now obtain some geometric insight into the mutual orthogonality identity in Figure 2.13.


Figure 2.13: Mutual orthogonality of tangent and cotangent vectors. In figure (a) we begin with our original basis vectors (tangent vectors). Figure (b) shows how the mutual orthogonality condition completely determines the vector e⃗²; firstly the constraint that it must be orthogonal to e⃗₁ forces it to lie on the blue line illustrated above. Secondly the constraint that the dot product e⃗² · e⃗₂ = 1 forces it to make an acute angle with the vector e⃗₂ and determines its length completely. Finally, figure (c) illustrates the same idea for choosing the other reciprocal vector, e⃗¹.

It is clear from the example that the mutual orthogonality condition completely specifies the basis {e⃗¹, …, e⃗ⁿ} once the basis {e⃗₁, …, e⃗ₙ} is known. The geometric construction does not rely on the direct calculation of the cotangent vectors. Notice how neither set of basis vectors is orthogonal in this example, but mutual orthogonality still applies.
The purely algebraic method presented above generalizes to any number of dimensions. We consider the Jacobian matrix, J, whose columns are the tangent vectors. Because the coordinate transformations are well behaved, J is invertible and so J⁻¹ exists and is unique. Let the rows of J⁻¹ be r⃗₁ᵀ, r⃗₂ᵀ, …, r⃗ₙᵀ. Then,

⎛ 1 0 ⋯ 0 ⎞           ⎛ r⃗₁ᵀ ⎞                  ⎛ r⃗₁·e⃗₁  r⃗₁·e⃗₂  ⋯  r⃗₁·e⃗ₙ ⎞
⎜ 0 1 ⋯ 0 ⎟           ⎜ r⃗₂ᵀ ⎟                  ⎜ r⃗₂·e⃗₁  r⃗₂·e⃗₂  ⋯  r⃗₂·e⃗ₙ ⎟
⎜ ⋮ ⋮ ⋱ ⋮ ⎟ = J⁻¹ J = ⎜  ⋮  ⎟ ( e⃗₁ e⃗₂ ⋯ e⃗ₙ ) = ⎜   ⋮      ⋮    ⋱    ⋮   ⎟ .
⎝ 0 0 ⋯ 1 ⎠           ⎝ r⃗ₙᵀ ⎠                  ⎝ r⃗ₙ·e⃗₁  r⃗ₙ·e⃗₂  ⋯  r⃗ₙ·e⃗ₙ ⎠

It clearly follows that the rows of J⁻¹ are mutually orthogonal with the columns of J, and that this property defines J⁻¹, so that these are the unique vectors satisfying it. Clearly then the rows of J⁻¹ are the cotangent vectors, r⃗ᵢ = e⃗ⁱ.

One consequence of the above argument is that, for our well behaved coordinate systems,

⎛ ∂x¹/∂s¹  ⋯  ∂x¹/∂sⁿ ⎞⁻¹    ⎛ ∂s¹/∂x¹  ⋯  ∂s¹/∂xⁿ ⎞
⎜    ⋮      ⋱     ⋮    ⎟   =  ⎜    ⋮      ⋱     ⋮    ⎟ .
⎝ ∂xⁿ/∂s¹  ⋯  ∂xⁿ/∂sⁿ ⎠      ⎝ ∂sⁿ/∂x¹  ⋯  ∂sⁿ/∂xⁿ ⎠

Stated differently, the Jacobian of the inverse of some transformation is the inverse of the Jacobian
of the transformation.
As a corollary we find that if the tangent vectors are orthonormal, then J⁻¹ = Jᵀ, and so the tangent vectors and the cotangent vectors coincide. The algebraic manipulations presented here are general and are used to construct (A.12) that relates the determinant of the Jacobian matrix to the determinant of the metric. What happens when the tangent vectors are orthogonal, but not orthonormal?
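As a concrete check of the argument above, the sketch below builds the polar-coordinate Jacobian, inverts it numerically, and confirms both that J⁻¹J = 1 and that the rows of J⁻¹ agree with the gradients of r(x, y) and θ(x, y), that is, with the cotangent vectors:

```python
import numpy as np

# Plane polar coordinates (Example 2.7): the tangent vectors e_r and
# e_theta are the columns of the Jacobian J of (x, y) w.r.t. (r, theta).
r, theta = 1.5, 0.4
J = np.array([[np.cos(theta), -r * np.sin(theta)],
              [np.sin(theta),  r * np.cos(theta)]])

# The cotangent vectors are the rows of the inverse Jacobian.
J_inv = np.linalg.inv(J)

# Mutual orthogonality e_i . e^j = delta_i^j is exactly J_inv @ J = 1.
delta = J_inv @ J

# The rows of J_inv should match the gradients of r(x, y) and theta(x, y).
x, y = r * np.cos(theta), r * np.sin(theta)
grad_r = np.array([x, y]) / np.hypot(x, y)        # gradient of sqrt(x^2 + y^2)
grad_theta = np.array([-y, x]) / (x**2 + y**2)    # gradient of arctan(y/x)
```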

2.4.5 The Metric Tensor


The Cartesian tangent vectors are orthogonal, but clearly the tangent vectors are not orthogonal
for all coordinate systems. In a linear coordinate system, for instance, the tangent vectors are only
orthogonal if the basis vectors are orthogonal. The inner products between the tangent vectors
are known as the components of the covariant metric tensor

gᵢⱼ = e⃗ᵢ · e⃗ⱼ .    (2.10)

Clearly hᵢ = √(gᵢᵢ). For an orthogonal system, gᵢⱼ is zero whenever i ≠ j. For a general coordinate
system, the tangent vectors at a given point form a basis for a linear space rooted at that point.
We call this space the tangent space of the coordinate space at that point. The tangent vectors
are sometimes referred to as the direction vectors at a point. The tangent space can be thought of
as a very zoomed in picture of coordinate space near a point. Formally, it is the linearisation of
coordinate space at the point. As we zoom closer and closer to the point, the coordinate space
‘flattens out’ into the tangent space.
We may construct an object g whose components are exactly the gᵢⱼ of (2.10). By indexing into g we may extract each component gᵢⱼ of g in a given computation. Now, the dot product of a vector u⃗ with a vector v⃗ in N dimensions, in a given coordinate basis {e⃗ₖ}, is expressed as

u⃗ · v⃗ = ( u¹e⃗₁ + u²e⃗₂ + ⋯ + uᴺe⃗_N ) · ( v¹e⃗₁ + v²e⃗₂ + ⋯ + vᴺe⃗_N )
      = Σᵢ₌₁ᴺ Σⱼ₌₁ᴺ uⁱ vʲ ( e⃗ᵢ · e⃗ⱼ )
      = Σᵢ₌₁ᴺ Σⱼ₌₁ᴺ gᵢⱼ uⁱ vʲ .

Remark 14 (Summation Convention) It is common to omit the summation symbols and write
instead
u⃗ · v⃗ = gᵢⱼ uⁱ vʲ ,

where the summation over the indices i and j is implicit. This implicit summation is referred to as
a summation convention.

A point in space has two basic sets of coordinates:

Tangent Basis: Coordinates in terms of the tangent vectors (known as the Contravariant Components of the Vector)

v⃗ = v¹e⃗₁ + v²e⃗₂ + ⋯ + vᵐe⃗ₘ .    (2.11)

Conventionally these components are written with a superscript vⁱ.

Cotangent Basis: Coordinates in terms of the cotangent vectors (known as the Covariant Components of the Vector)

v⃗ = v₁e⃗¹ + v₂e⃗² + ⋯ + vₘe⃗ᵐ .    (2.12)

Conventionally these components are written with a subscript vᵢ.

This is nothing especially new - we know from linear algebra that we can write the same vector in terms of two different bases. What is special is the two particular bases we have chosen. Because of mutual orthogonality, we can determine the coordinates very easily. We simply take dot products with the tangent and cotangent vectors, respectively,

e⃗ᵢ · v⃗ = e⃗ᵢ · ( v₁e⃗¹ + v₂e⃗² + ⋯ + vₘe⃗ᵐ ) = vᵢ
e⃗ⁱ · v⃗ = e⃗ⁱ · ( v¹e⃗₁ + v²e⃗₂ + ⋯ + vᵐe⃗ₘ ) = vⁱ .

Thus finding the covariant or contravariant components of some point is as easy as taking the dot product with the tangent or cotangent vectors, respectively. Clearly, it follows from the relationships of the dot products that for given vectors u⃗ and v⃗,

u⃗ · v⃗ = gᵢⱼ uⁱ vʲ = δᵢʲ uⁱ vⱼ = gⁱʲ uᵢ vⱼ = gᵢⱼ uʲ vⁱ .    (2.13)

We interpret covariant components uᵢ as the elements of row matrices and contravariant components vʲ as the elements of column matrices. Then we may replace the explicit reference to summation indices in the summation convention for computing inner products with the multiplication rules of linear algebra and write instead

u⃗ · v⃗ = Uᵀ g V    (2.14)

where U is the column matrix with the components of u⃗, V is the column matrix with the components of v⃗ and g is the metric tensor. This result will be useful when we are transforming vector equations from one coordinate system to another. We can replace the explicit reference to summation indices by employing matrix algebra.

Example 2.10 (Simple Matrix Dot Product) Consider the dot product between the basis unit
vectors x̂ and ŷ in the standard 2-dimensional rectilinear coordinate system. Clearly, x̂ · ŷ = 0. We
may rewrite these vectors as column matrices

x̂ = (1, 0)ᵀ   and   ŷ = (0, 1)ᵀ

and use the 2-dimensional identity matrix

1 = ⎛1 0⎞ = ⎛ x̂·x̂  x̂·ŷ ⎞
    ⎝0 1⎠   ⎝ ŷ·x̂  ŷ·ŷ ⎠

as the metric tensor to write

x̂ · ŷ = (1  0) ⎛1 0⎞ (0, 1)ᵀ = (1  0) (0, 1)ᵀ = 0 .
               ⎝0 1⎠

Similarly, we find

x̂ · x̂ = (1  0) ⎛1 0⎞ (1, 0)ᵀ = (1  0) (1, 0)ᵀ = 1
               ⎝0 1⎠

and

ŷ · ŷ = (0  1) ⎛1 0⎞ (0, 1)ᵀ = (0  1) (0, 1)ᵀ = 1
               ⎝0 1⎠

as expected. More generally, in an orthonormal coordinate basis (defined using a system of mutually orthogonal unit vectors), e⃗ᵢ · e⃗ⱼ = δᵢⱼ, then

gⁱʲ = gᵢⱼ   and   uⁱ = uᵢ

and so

u⃗ · v⃗ = (u¹  u²) ⎛1 0⎞ (v¹, v²)ᵀ = (u¹  u²) (v¹, v²)ᵀ = u¹v¹ + u²v² = u₁v₁ + u₂v₂
                 ⎝0 1⎠

as expected, and the matrix multiplication is explicit. By similar argument,

u⃗ · u⃗ = (u¹  u²) ⎛1 0⎞ (u¹, u²)ᵀ = (u¹  u²) (u¹, u²)ᵀ = u¹u¹ + u²u² = ∥u⃗∥² ,
                 ⎝0 1⎠

again, as expected.
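Example 2.10 uses an orthonormal basis, for which the metric is the identity. The content of (2.14) is clearer in a non-orthogonal basis. In the sketch below the basis vectors e⃗₁ = (1, 0)ᵀ and e⃗₂ = (1, 1)ᵀ are an arbitrary choice of ours; the dot product computed as Uᵀ g V agrees with the ordinary Cartesian dot product of the same two vectors:

```python
import numpy as np

# A non-orthogonal linear coordinate system with tangent (basis) vectors
# e_1 = (1, 0) and e_2 = (1, 1), stored as the columns of E.
E = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Covariant metric tensor g_ij = e_i . e_j, assembled as E^T E.
g = E.T @ E

# Contravariant components of two vectors in this basis.
U = np.array([2.0, 1.0])
V = np.array([-1.0, 3.0])

# Dot product via the metric, u . v = U^T g V ...
dot_metric = U @ g @ V

# ... agrees with the Cartesian dot product of the same vectors E @ U, E @ V.
dot_cartesian = (E @ U) @ (E @ V)
```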

2.4.6 Tensor Algebra

Before proceeding to some examples, we should first pause to consider the usefulness of these objects and the corresponding notational conveniences. Consider the case of a system of linear equations in two variables x and y,

ax +by =e
c x +d y = f

and suppose that a, b, c, d, e and f are constants. The structure of these equations is of the form of a sum over the products of a coefficient and one of the variables, set equal to a given constant. We can rewrite this in a compact form by defining the matrices

A = ⎛a b⎞ = ⎛A¹₁ A¹₂⎞ ,   z⃗ = ⎛z¹⎞ = ⎛x⎞   and   u⃗ = ⎛u¹⎞ = ⎛e⎞
    ⎝c d⎠   ⎝A²₁ A²₂⎠        ⎝z²⎠   ⎝y⎠            ⎝u²⎠   ⎝f⎠

where each matrix is an organized list of elements and each entry in each matrix is referenced by a set of indices. In this way, entries in z⃗ and u⃗ are referenced by a single index, denoting the row of each corresponding entry, while entries in A are referenced by a pair of indices denoting the row and column of each entry. Then

A¹₁ = a , A¹₂ = b , A²₁ = c , A²₂ = d ,   z¹ = x , z² = y   and   u¹ = e , u² = f .

The system of equations is now written as

A z⃗ = u⃗

where, using standard matrix multiplication,

A z⃗ = ⎛A¹₁ A¹₂⎞ ⎛z¹⎞ = ⎛A¹₁z¹ + A¹₂z²⎞ = ⎛u¹⎞ .
      ⎝A²₁ A²₂⎠ ⎝z²⎠   ⎝A²₁z¹ + A²₂z²⎠   ⎝u²⎠

Clearly,

A¹₁z¹ + A¹₂z² = u¹   and   A²₁z¹ + A²₂z² = u² .

We rewrite each of these equations as an indexed sum over matrix elements

Σᵢ₌₁² A¹ᵢ zⁱ = u¹   and   Σᵢ₌₁² A²ᵢ zⁱ = u² .

In each case, the value of the j-th entry of u⃗ is given by a sum whose index iterates over a subset of the elements of A and z⃗, and the summation of terms runs over the column entries, in a given row, of A and over the row entries of z⃗. We can rewrite each of these sums as

Σᵢ₌₁² Aʲᵢ zⁱ = uʲ

where j can take either the value 1 or the value 2. If we make use of a summation convention, then the entries of u⃗ can be written as

Aʲᵢ zⁱ = uʲ    (2.15)

and for each choice of index j, the sum over paired upper and lower indices i corresponds to a sum over paired column and row entries, where the lower and upper limits of the summation over i are implied. The unpaired row index j in the sum Aʲᵢ zⁱ denotes the corresponding row index of the element uʲ of u⃗. The position of the unpaired and unsummed index j in Aʲᵢ zⁱ on the left-hand side determines the position of the index j on the right-hand side. Following this reasoning, the consistency of the notation implies that

zⱼ Aʲᵢ = vᵢ    (2.16)

corresponding to a row vector

z⃗ᵀ A = ( a x + c y   b x + d y ) .

Now, the unpaired column index i in the sum zⱼ Aʲᵢ denotes the corresponding column index of the element vᵢ, corresponding to the entries in a row vector. Then the position of the unpaired and unsummed index i in zⱼ Aʲᵢ on the left-hand side determines the position of the index i on the right-hand side.

Remark 15 The free index j in (2.15) labels the j-th row of a column vector. This agrees with our expectation from the rules of matrix multiplication, where the product of a square matrix and a column vector is again a column vector. Similarly, the free index i in (2.16) labels the i-th column of a row vector. This also agrees with our expectation from the rules of matrix multiplication, where the product of a row vector and a square matrix is again a row vector.

Remark 16 Combining the statements of (2.9) and (2.10), it follows that

gᵏⁱ gᵢⱼ = δᵏⱼ    (2.17)

so we understand gᵏⁱ to be the inverse operator of gᵢⱼ. Moreover, combining (2.11) and (2.12) with the appropriate upper-indexed or lower-indexed components of the metric tensor allows us to raise or lower indices,

gₖᵢ zⁱ = zₖ   and   gᵏⁱ zᵢ = zᵏ

using a summation convention, thereby allowing us to convert between row and column representations of vectors and tensors. Notice, again, that the index structure indicates the kind of object produced by the operation. This makes sense in terms of vectors and covectors whose dot product is a number. This is reproduced in the algebra of matrix multiplication

Zᵀ Z = zᵢ zⁱ = gᵢⱼ zʲ zⁱ = z⃗ · z⃗ = ∥z⃗∥²

where Z is the column matrix corresponding to the vector z⃗ and Zᵀ is the row matrix corresponding to the covector of z⃗.

We shall use the technology of this section in each of the subsequent sections. In each case
where it is used, the notions of index manipulation and summation is implied, unless otherwise
stated.
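The summation convention maps directly onto numerical linear algebra. As a sketch with arbitrary sample values of ours, NumPy's einsum lets us write the contractions (2.15) and (2.16) with explicit index strings:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # coefficient matrix with entries A^j_i
z = np.array([5.0, 6.0])     # contravariant components z^i

# A^j_i z^i = u^j : contraction over the paired index i leaves a column vector.
u = np.einsum('ji,i->j', A, z)

# z_j A^j_i = v_i : contraction over the row index j leaves a row vector.
v = np.einsum('j,ji->i', z, A)
```

The index strings make the pairing explicit: the repeated letter is summed, and the surviving letter fixes whether the result indexes rows or columns.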

2.4.7 1-Dimensional Curves


The first interesting geometric quantity that we want to compute is the length of a curve given
by a function y = f (t ). We make use of the vector calculus of the preceding sections to write
the position vector of a point on the curve and then construct a sequence of line elements at
each point along that curve whose length we can sum. Suppose that the parametrisation is such
that the end points of the curve correspond to t = 0 and t = 1. The length of a 1-dimensional
parametric curve, parameterised by t , with tangent v⃗(t ) is simply the integral of the magnitude of
the tangent vector. We start by constructing a vector that defines a point on the curve at a time t ,

a⃗(t ) = t x̂ + f (t ) ŷ (2.18)

and then compute the velocity of the point a⃗ along the path

v⃗(t) = da⃗/dt

corresponding to the tangent vector along the curve at a time t. The length along the path Δℓ(t) that is traversed by a point starting at a⃗(t), at a speed ∥v⃗(t)∥ and for a period of time Δt is given by

Δℓ(t) = ∥Δa⃗(t)∥ = ∥ (da⃗(t)/dt) Δt ∥ = ∥v⃗(t)∥ Δt .

We rewrite this portion of the path in the limit that Δt → 0 and recover the infinitesimal element of the path

dℓ² = v⃗(t) · v⃗(t) dt² = gᵢⱼ vⁱ(t) vʲ(t) dt² .

Integrating over all such infinitesimal path elements returns the length over the entire path

ℓ = ∫_{y=f(x)} dℓ = ∫₀¹ dt √( v⃗(t) · v⃗(t) )    (2.19)

and the metric tensor is necessary to compute the dot product of the tangent vector with itself. The formulation can be extended to any smooth parametric curve a⃗(t) in any number of dimensions.
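Equation (2.19) can be approximated numerically by sampling the curve and summing the lengths of the chords between consecutive sample points, which stand in for the infinitesimal elements dℓ. A minimal sketch (the helper name curve_length is our own):

```python
import numpy as np

def curve_length(a, t0=0.0, t1=1.0, n=2000):
    """Approximate the length of the parametric curve t -> a(t) by summing
    the lengths of chords between n + 1 evenly spaced sample points; each
    chord approximates ||v(t)|| dt."""
    t = np.linspace(t0, t1, n + 1)
    points = np.array([a(ti) for ti in t])   # samples along the curve
    segments = np.diff(points, axis=0)       # chords standing in for v dt
    return np.linalg.norm(segments, axis=1).sum()

# The parabola a(t) = (t, t^2) on [0, 1]; its exact length is
# sqrt(5)/2 + arcsinh(2)/4.
length = curve_length(lambda t: np.array([t, t**2]))
```

For a straight segment the chord sum is exact for any n; for curved paths the approximation improves as n grows.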
We now consider several example calculations for the length, area and volume in different coordinate systems.

2.4.8 2-Dimensional Plane Polar Coordinates


There is always more than one way to assign coordinates to a space. It will be useful to be able to
translate between these different coordinate systems. As was stated, RNC correspond to the classic example of 2-dimensional plane polar coordinates. We call these curvilinear coordinates in 2-dimensions since there exist functions that translate the coordinate curves in one choice of coordinates to coordinate curves in another. Here we consider two parameters, the distance
of a point from the origin r and its angle with the x -axis θ . The parameters r and θ completely
characterise any point in the plane. We can write transformation equations for this coordinate
system
x = r cos (θ ) and y = r sin (θ ) . (2.20)

These equations can be inverted - thus every point corresponds to a unique pair (r, θ ). It is
understood that we restrict θ to be in some interval of length 2π so that no two values of θ
correspond to the same physical angle. Figure 2.3 demonstrates the standard (x , y ) and (r, θ )
coordinate description of a point in the 2-dimensional plane.
Are these transformation equations well-behaved? Certainly they are invertible - without even
using equations we can see that every point in the plane has a unique polar representation and a
unique Cartesian representation, and thus there is a one-to-one correspondence between them.

The equations are also differentiable. As for the Jacobian Matrix, we shall leave the answer to this
question to the discussion of tangent vectors. It turns out that it is always non-singular (which
comes as no surprise, since we convinced ourselves that they were invertible). Recall that these
are the family of curves formed when we let one coordinate vary while keeping the other constant.
If we vary r while keeping θ constant, we get a ray emanating from the origin at the angle θ out to infinity. These are the radial coordinate curves depicted in Figure 2.3. These curves are lines, and thus we see that there is no curvature associated with the r coordinate. If we vary θ while keeping r constant we get a circle of radius r and centre the origin. These are the circular coordinate curves depicted in Figure 2.3. Clearly these curves are not straight lines, so there is
curvature associated with the θ coordinate. Fundamental geometry then tells us that the angle
between the two families of curves is always 90◦ . We will see this more formally when we consider
the tangent vectors for polar coordinates.
Notice that the coordinate curves for each coordinate are a family of curves, not just one curve.
When we fix the other parameter, then we get a specific curve in the family. For instance the
θ -curves are the circles centre the origin, but if we specifically look at the curve when r = 5, then
we single out the circle of radius 5. Similarly the r -curves are the rays from the origin, but if we
specifically consider the curve where θ = 90◦ then we single out the positive y -axis.
We often make use of polar coordinates when solving problems involving circular motion or
circular symmetry. Circles are coordinate curves in polar coordinates, so motion on a circle is
described by just one coordinate. Thus, it is convenient to reason in polar coordinates about
physical systems where there is circular motion or circular symmetry.
It is a very simple matter to obtain the tangent vectors for any coordinate system once we have
written down the transformation equations
e⃗ᵣ = ∂/∂r ( x(r, θ), y(r, θ) )ᵀ = ∂/∂r ( r cos(θ), r sin(θ) )ᵀ = ( cos(θ), sin(θ) )ᵀ ,
e⃗_θ = ∂/∂θ ( x(r, θ), y(r, θ) )ᵀ = ∂/∂θ ( r cos(θ), r sin(θ) )ᵀ = r ( −sin(θ), cos(θ) )ᵀ .

We notice that e⃗ᵣ is a unit vector, and e⃗_θ has length r. So, we can write

r̂ = ( cos(θ), sin(θ) )ᵀ   and   θ̂ = ( −sin(θ), cos(θ) )ᵀ ,

and

e⃗ᵣ = r̂   and   e⃗_θ = r θ̂ .
The unit vectors r̂ and θ̂ are often used when solving problems about circular motion. The former
points radially outwards, and the latter points tangentially and counter-clockwise. In Figure 2.14,
notice how the unit vectors are indeed tangent to the corresponding coordinate curves.
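The lengths of these tangent vectors can be checked numerically. The sketch below (Python; the helper name polar_to_cartesian is ours, not part of the notes) approximates e⃗r and e⃗θ by central finite differences of the transformation equations and confirms that |e⃗r| = 1 while |e⃗θ| = r.

```python
import math

def polar_to_cartesian(r, theta):
    # Transformation equations x = r cos(theta), y = r sin(theta).
    return (r * math.cos(theta), r * math.sin(theta))

def tangent_vectors(r, theta, h=1e-6):
    # Central-difference approximations of the tangent vectors e_r and e_theta.
    xp, yp = polar_to_cartesian(r + h, theta)
    xm, ym = polar_to_cartesian(r - h, theta)
    e_r = ((xp - xm) / (2 * h), (yp - ym) / (2 * h))
    xp, yp = polar_to_cartesian(r, theta + h)
    xm, ym = polar_to_cartesian(r, theta - h)
    e_theta = ((xp - xm) / (2 * h), (yp - ym) / (2 * h))
    return e_r, e_theta

e_r, e_theta = tangent_vectors(2.5, 0.7)
print(math.hypot(*e_r))      # ~1.0: e_r is already a unit vector
print(math.hypot(*e_theta))  # ~2.5: the coefficient of curvature of theta is r
```

The same finite-difference recipe works for any of the coordinate systems in this section, which makes it a useful sanity check on hand-computed tangent vectors.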
Earlier we remarked that the two classes of coordinate curves in polar coordinates are
orthogonal. This is easy to show by using the tangent vectors
\[
\vec{e}_r \cdot \vec{e}_\theta = \begin{pmatrix} \cos(\theta) & \sin(\theta) \end{pmatrix} \begin{pmatrix} -r\sin(\theta) \\ r\cos(\theta) \end{pmatrix} = -r\cos(\theta)\sin(\theta) + r\cos(\theta)\sin(\theta) = 0.
\]

Figure 2.14: Unit vectors tangent to coordinate curves in 2-dimensions.
Thus the unit vectors are orthogonal for all values of r and θ , and hence everywhere in space.
Clearly the two families of coordinate curves are always orthogonal.
We noticed earlier that r̂ is a unit vector, and hence the r coordinate does not curve (or stretch)
space. Indeed the coordinate curves are straight rays. We also saw that θ has a coefficient of
curvature of r . Thus θ is a curved coordinate. This makes sense physically as θ is an angle, and
its corresponding coordinate curves are circles.

Example 2.11 (Circumference of a Circle) We can determine the circumference of a circle in the
(x , y )-plane by starting with a polar coordinate representation of a position of a point on the circle
using (2.20). In this coordinate system, a point p⃗ (t ) = r cos (θ ) x̂ + r sin (θ ) ŷ traces a circular path
of radius r , centred at the origin of the coordinate system, that is parameterised by t . Tangents to this
path are given by the parametric curve p⃗˙ (t ). Then, it follows
\[
\vec{p}(t) = \begin{pmatrix} r\cos(\theta(t)) \\ r\sin(\theta(t)) \end{pmatrix} \quad\text{and}\quad \dot{\vec{p}}(t) = \dot{r} \begin{pmatrix} \cos(\theta(t)) \\ \sin(\theta(t)) \end{pmatrix} + r\dot{\theta} \begin{pmatrix} -\sin(\theta(t)) \\ \cos(\theta(t)) \end{pmatrix} = \dot{r}\,\hat{r} + r\dot{\theta}\,\hat{\theta}.
\]

The radial coordinate is constant for motion along the circumference of a circle, so ṙ = 0 and we
have immediately
p⃗˙ (t ) = r θ̇ θ̂ .

Next we use (2.19) to integrate the lengths of each tangent vector along the path defined by the circumference of the circle
\[
\ell = \int_0^1 dt\, \sqrt{\dot{\vec{p}}(t) \cdot \dot{\vec{p}}(t)} = \int_0^1 dt\, \sqrt{r^2 \dot{\theta}^2\, \hat{\theta} \cdot \hat{\theta}} = r \int_0^1 dt\, \dot{\theta} = r \int_0^1 dt\, \frac{d\theta}{dt} = r \int_0^{2\pi} d\theta
\]
where the change of coordinates θ = 2πt ensures that a single circuit of the circle occurs in the time
interval t ∈ [0, 1]. Then
\[
\ell = r \int_0^{2\pi} d\theta = 2\pi r
\]

which is exactly what we should expect from the standard definition of the circumference of a circle.
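A quick numerical cross-check of this arc-length computation: the sketch below (Python; helper names are ours) sums the lengths of many short chords along the parameterised path p⃗(t) with θ(t) = 2πt and recovers 2πr.

```python
import math

def p(t, r=3.0):
    # Point on the circle of radius r, with the reparameterisation theta(t) = 2*pi*t.
    theta = 2 * math.pi * t
    return (r * math.cos(theta), r * math.sin(theta))

def circumference(n=20000, r=3.0):
    # Sum the lengths of n small chords along one circuit of the path.
    total = 0.0
    prev = p(0.0, r)
    for i in range(1, n + 1):
        cur = p(i / n, r)
        total += math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        prev = cur
    return total

print(circumference())  # ~ 2*pi*3 ≈ 18.85
```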

Example 2.12 (Area of a disk) We now consider the area element in 2-dimensional Polar
Coordinates. We may obtain this in several ways. Our first method is a purely geometric argument.
Consider the area element enclosed between coordinate curves of infinitesimal distance apart as
depicted in Figure 2.15.

Figure 2.15: The 2-dimensional polar coordinate area element, with side lengths d r and r d θ .

The area element is highlighted in Figure 2.15. We notice that because the curves are at 90 degrees
to each other, the area element starts to resemble a rectangle as we make its sides very small (the
curvature on the ‘circle’ side flattens out in the limit). Thus we can write the area of the infinitesimal
element as the product of the lengths of the sides. The area element is clearly given by

d A = d r (r d θ ) = r d r d θ .

Notice how the orthogonality of the coordinate curves was required for this argument to work. Had
the curves not been orthogonal, simply multiplying the side lengths in the limit would not give the
correct area element. Notice also that the ‘side lengths’ of our infinitesimal area elements are our
coefficients of curvature. This is no coincidence. With these two points in mind, we can now see that
our geometric argument is in fact analogous to the argument presented in general in the preceding
section, that the Jacobian Determinant can be given as the product of the coefficients of curvature
provided that the tangent vectors are orthogonal. Thus in this case we could immediately write out

d A = (1)(r ) d r d θ = r d r d θ .

The third way of deriving this result is to evaluate the Jacobian determinant directly. This yields the
same answer. As an example of the use of the area element, we will now compute the area of the
disk of radius R ,
\[
\text{Area} = \int_S dA = \int_0^R dr \int_0^{2\pi} d\theta\, r = 2\pi \int_0^R dr\, r = \pi R^2.
\]

We will see more of this type of integral when we study rigid bodies later in the course.
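The area element d A = r d r d θ can also be checked by direct numerical integration. In the sketch below (Python; function names are ours) the θ integral factors out as 2π because the integrand does not depend on θ, and a midpoint Riemann sum over r recovers πR².

```python
import math

def disk_area(R, n=1000):
    # Midpoint Riemann sum of the area element dA = r dr dtheta; the theta
    # integral contributes a factor of 2*pi since the integrand is independent of theta.
    dr = R / n
    return sum((i + 0.5) * dr * dr * 2 * math.pi for i in range(n))

print(disk_area(2.0))  # ~ pi * 2**2 ≈ 12.566
```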

2.4.9 2-Dimensional Elliptical Coordinates


An example of a coordinate system in which the tangent vectors are not in general orthogonal is
the elliptic coordinate system (u , φ), in which for some prescribed a and b , the transformation
equations are given by
 
x = a u cos φ and y = b u sin φ .

Here we allow φ to vary between 0 and 2π and u > 0. It should be clear that polar coordinates
are a special case of elliptic coordinates in which a = b = 1 and r = u. We get a stretched polar
coordinate system when a = b ̸= 1. The name should leave little surprise that the coordinate
curves in elliptic coordinates are rays and ellipses. This is depicted in Figure 2.16.


Figure 2.16: 2-dimensional elliptic coordinate curves. Notice that this coordinate system has a
distorted area element. It is not in general the same as the product of the coefficients of curvature.
This is because the coordinate curves intersect at an angle, and the infinitesimal unit of area is
now a parallelogram.

It is evident in Figure 2.16 that the curves are not in general orthogonal. This is best seen by
computing the tangent vectors
\[
\vec{e}_u = \begin{pmatrix} a\cos(\phi) \\ b\sin(\phi) \end{pmatrix} \quad\text{and}\quad \vec{e}_\phi = \begin{pmatrix} -ua\sin(\phi) \\ ub\cos(\phi) \end{pmatrix}.
\]

The dot product is
\[
\vec{e}_u \cdot \vec{e}_\phi = \begin{pmatrix} a\cos(\phi) & b\sin(\phi) \end{pmatrix} \begin{pmatrix} -au\sin(\phi) \\ bu\cos(\phi) \end{pmatrix} = \frac{1}{2} u \left( b^2 - a^2 \right) \sin(2\phi).
\]
Clearly, this inner product is zero for all φ when a = b . When a ̸= b then the tangent
vectors are orthogonal only when φ = πn or φ = π/2 + πn and n ∈ Z.

Remark 17 There is no closed form expression for the circumference of an ellipse in terms of
elementary functions of its semi-axis lengths a and b . This quantity must be determined numerically,
usually using computer-based numerical methods that implement the arc-length formula of (2.19).
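Such a numerical computation is short to write. The sketch below (Python; the function name is ours) applies the midpoint rule to the arc-length integrand |dp⃗/dt| = √(a² sin²t + b² cos²t) for the parameterisation x = a cos t, y = b sin t; for a = b it reduces to the circle circumference 2πa as a check.

```python
import math

def ellipse_circumference(a, b, n=200000):
    # Midpoint-rule arc length of x = a cos(t), y = b sin(t) over [0, 2*pi],
    # integrating |dp/dt| = sqrt(a^2 sin^2 t + b^2 cos^2 t).
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += math.hypot(a * math.sin(t), b * math.cos(t)) * dt
    return total

print(ellipse_circumference(1.0, 1.0))  # circle check: ~ 2*pi
print(ellipse_circumference(2.0, 1.0))  # no elementary closed form for this one
```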

Example 2.13 (Area of an Ellipse in 2-Dimensional Elliptic Coordinates) The coordinate grid
in Figure 2.16 is distorted in a way that distinguishes it from that of the polar coordinate grid in
Figure 2.3. The coordinate curves intersect such that the dot product between tangent vectors
along the coordinate curves varies along the path of each coordinate curve. Clearly e⃗u · e⃗φ is only
zero across all values of u , φ if we set a = b . For all other elliptical systems, the coordinate curves
are not generally orthogonal. There will be places where the coordinate curves are orthogonal. Can
you see this geometrically? Can you obtain these positions algebraically? This means that the
coefficients of curvature cannot be directly used in this case to obtain the area element. Instead we
must compute the determinant of the Jacobian Matrix directly
\[
\det \begin{pmatrix} \vec{e}_u & \vec{e}_\phi \end{pmatrix} = \det \begin{pmatrix} a\cos(\phi) & -ua\sin(\phi) \\ b\sin(\phi) & ub\cos(\phi) \end{pmatrix} = abu\cos^2(\phi) + abu\sin^2(\phi) = abu.
\]
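This determinant can be cross-checked numerically without computing any derivatives by hand. The sketch below (Python; helper names are ours) builds the Jacobian of (x, y) with respect to (u, φ) from central finite differences of the elliptic transformation equations and compares its determinant against abu.

```python
import math

def to_cartesian(a, b, u, phi):
    # Elliptic transformation equations x = a u cos(phi), y = b u sin(phi).
    return (a * u * math.cos(phi), b * u * math.sin(phi))

def jacobian_det(a, b, u, phi, h=1e-6):
    # Finite-difference Jacobian of (x, y) with respect to (u, phi).
    xu = (to_cartesian(a, b, u + h, phi)[0] - to_cartesian(a, b, u - h, phi)[0]) / (2 * h)
    yu = (to_cartesian(a, b, u + h, phi)[1] - to_cartesian(a, b, u - h, phi)[1]) / (2 * h)
    xp = (to_cartesian(a, b, u, phi + h)[0] - to_cartesian(a, b, u, phi - h)[0]) / (2 * h)
    yp = (to_cartesian(a, b, u, phi + h)[1] - to_cartesian(a, b, u, phi - h)[1]) / (2 * h)
    return xu * yp - xp * yu

print(jacobian_det(3.0, 2.0, 0.8, 1.1))  # ~ a*b*u = 4.8
```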

So the area element in elliptic coordinates is d A = a b u d u d φ . We compute the area of an ellipse
with an x -radius of a and a y -radius of b :

\[
\text{Area} = \int_S dA = \int_0^1 du \int_0^{2\pi} d\phi\, abu = 2\pi ab \int_0^1 du\, u = \pi ab.
\]

Notice that while this is easy to compute by other methods, with elliptic coordinates we could
just as easily compute the area of an elliptical sector between two specified angles:

\[
\text{Area} = \int_S dA = \int_0^1 du \int_\alpha^\beta d\phi\, abu = ab \int_0^1 du\, u \int_\alpha^\beta d\phi = \frac{\beta - \alpha}{2}\, ab.
\]

We next consider some examples in 3-dimensions.

2.4.10 3-Dimensional Polar Cylindrical Coordinates


The 3-dimensional Polar Cylindrical Coordinate system uses three coordinates (φ, ρ, z ) as shown
in Figure 2.17. The coordinate φ defines an angle between the x -axis and some plane of interest
(which passes through the z -axis) and is commonly referred to as the azimuthal angle or azimuthal

Figure 2.17: 3-dimensional polar cylindrical coordinate unit vectors.

coordinate. The coordinate ρ then gives the distance to travel in that plane without lifting. Finally,
the coordinate z gives the height of the point of interest.
Using these definitions and Figure 2.17, we can write down the transformation equations
 
\[
x = \rho\cos(\phi), \quad y = \rho\sin(\phi) \quad\text{and}\quad z = z \tag{2.21}
\]

where it is understood that ρ ≥ 0, 0 ≤ φ ≤ 2π. We can think of polar cylindrical coordinates as a


extension of 2-dimensional polar coordinates into 3-dimensions by the addition of the z
coordinate. These transformation equations are clearly well-behaved. Now consider the
coordinate curves for cylindrical coordinates. If we keep both ρ and z fixed and vary φ, we get a
circle of an arbitrary radius centred at an arbitrary point on the z -axis. This is the φ-coordinate
curve. If we keep both φ and z fixed and allow ρ to vary, we get a ray emanating from an arbitrary
point on the z -axis in some arbitrary direction in a plane parallel to the x − y plane. This is the ρ
- coordinate curve. If we keep both ρ and φ constant and allow z to vary we get a vertical line
passing through an arbitrary point in the x − y plane specified by our choice of ρ and φ. This is
the z -coordinate curve.

Remark 18 For each of the ‘coordinate curves’ above, we’ve described a member of a family of curves.
The coordinate curves for a coordinate are always a family of curves.

Now we consider the coordinate surfaces for cylindrical coordinates. If we keep ρ fixed and
allow φ and z to vary, then we get a cylinder of radius ρ and infinite height centred on the z -axis.
For different values of ρ we will get cylinders of different radius, and this family of surfaces is the
family of coordinate surfaces associated with varying φ and z . We can call it the φ, z family of
coordinate surfaces. This coordinate surface is what gives cylindrical coordinates their name.

Keeping φ fixed and varying the other two parameters produces a half-plane with one side along
the z -axis. Similarly, keeping z fixed and varying the other two parameters produces a plane
parallel to the x − y plane.
When motion or symmetry exists along one of these coordinate surfaces or coordinate curves,
then cylindrical coordinates will be a good choice of coordinates for the problem. Notice that the
coordinate surfaces that were planes aren’t interesting to us as we can consider planar motion
using one of our 2-dimensional coordinate systems. Thus the cylinder is the truly interesting
object here. When our system is constrained to move in a spiral or some other motion on the
surface of a cylinder, then Polar Cylindrical Coordinates may be a good choice.
As in any other coordinate system, we can obtain the tangent vectors directly from the
transformation equations
\[
\vec{e}_\rho = \begin{pmatrix} \cos(\phi) & \sin(\phi) & 0 \end{pmatrix}^\top = \hat{\rho},
\]
\[
\vec{e}_\phi = \begin{pmatrix} -\rho\sin(\phi) & \rho\cos(\phi) & 0 \end{pmatrix}^\top = \rho\,\hat{\phi},
\]
\[
\vec{e}_z = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}^\top = \hat{z}.
\]

It is also immediate what the sizes of these vectors are, and hence their relationships to the
corresponding unit vectors. We notice in particular that the only coordinate with curvature is φ.
This makes sense as it is the only coordinate for which the coordinate curve was not a line or ray.
The unit vectors in cylindrical coordinates are illustrated in Figure 2.17. As per usual these vectors
are tangential to the coordinate curves and in the direction of increase of their corresponding
coordinates. (ẑ points upwards, ρ̂ points outwards, φ̂ points counter-clockwise).
Figure 2.17 suggests that the tangent vectors are orthogonal. Indeed, this should be quite clear
from thinking of cylindrical coordinates as polar coordinates with an additional coordinate z .
Formally, we can compute the pair-wise inner products between the tangent vectors
 
\[
\vec{e}_\rho \cdot \vec{e}_\phi = \begin{pmatrix} \cos(\phi) & \sin(\phi) & 0 \end{pmatrix} \begin{pmatrix} -\rho\sin(\phi) \\ \rho\cos(\phi) \\ 0 \end{pmatrix} = 0,
\]
\[
\vec{e}_\rho \cdot \vec{e}_z = \begin{pmatrix} \cos(\phi) & \sin(\phi) & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = 0,
\]
\[
\vec{e}_z \cdot \vec{e}_\phi = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} -\rho\sin(\phi) \\ \rho\cos(\phi) \\ 0 \end{pmatrix} = 0.
\]

Thus indeed we see that the tangent vectors are orthogonal at all points in space. The families of
coordinate curves are then orthogonal at all points in space.
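These three inner products can be confirmed numerically at any sample point. A minimal sketch (Python; function names are ours):

```python
import math

def cylindrical_tangents(rho, phi):
    # Tangent vectors read off from the cylindrical transformation equations (2.21).
    e_rho = (math.cos(phi), math.sin(phi), 0.0)
    e_phi = (-rho * math.sin(phi), rho * math.cos(phi), 0.0)
    e_z = (0.0, 0.0, 1.0)
    return e_rho, e_phi, e_z

def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

e_rho, e_phi, e_z = cylindrical_tangents(2.0, 0.9)
print(dot(e_rho, e_phi), dot(e_rho, e_z), dot(e_z, e_phi))  # all ~0
```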

Example 2.14 (Volume of a Cylinder in 3-Dimensional Polar Cylindrical Coordinates) In
3-dimensional polar cylindrical coordinates, we consider three coordinates (φ, ρ, z ), see
Figure 2.17. The orthogonality of the tangent vectors enables us to write down the volume element
directly in terms of a product of the coefficients of curvature
directly in terms of a product of the coefficients of curvature

d V = (ρ d φ )(1 d ρ )(1 d z ) = ρ d φ d ρ d z

and is shown graphically in Figure 2.18. As a simple application we compute the volume of a
cylinder of radius R and height H
\[
\text{Volume} = \int_{\text{cylinder}} dV = \int_0^H dz \int_0^R d\rho \int_0^{2\pi} d\phi\, \rho = \int_0^H dz \int_0^R d\rho\, \rho \int_0^{2\pi} d\phi = \pi R^2 H.
\]

Figure 2.18: The volume element in cylindrical coordinates.
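The cylinder volume can be cross-checked by summing the volume element ρ dφ dρ dz numerically; in the sketch below (Python; the function name is ours) the φ and z integrals factor out as 2π and H since the integrand depends only on ρ.

```python
import math

def cylinder_volume(R, H, n=1000):
    # Midpoint Riemann sum of the volume element dV = rho dphi drho dz.
    drho = R / n
    return sum((i + 0.5) * drho * drho * 2 * math.pi * H for i in range(n))

print(cylinder_volume(2.0, 5.0))  # ~ pi * 2**2 * 5 ≈ 62.83
```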

2.4.11 3-Dimensional Polar Spherical Coordinates


In 3-dimensional polar spherical coordinates we consider coordinates (ρ, θ , φ), where φ is once again the
angle between the x -axis and the plane the point of interest makes with the z -axis and is called,
again, the azimuthal angle or azimuthal coordinate, and θ is the angle within this plane between the
z -axis and the point of interest and is called the polar angle or polar coordinate. The coordinate
ρ in this case is the distance from the origin to the point of interest and is commonly called the
radial position or radial coordinate. Note in particular that ρ for spherical coordinates is different
from ρ for cylindrical coordinates. See Figure 2.19.
As before, we shall relate this spherical coordinate system to other coordinate systems. From
Figure 2.19 and some elementary trigonometry, we arrive at the following transformation

Figure 2.19: 3-Dimensional Polar Spherical Coordinate System.

equations for spherical coordinates
\[
x = \rho\sin(\theta)\cos(\phi), \quad y = \rho\sin(\theta)\sin(\phi) \quad\text{and}\quad z = \rho\cos(\theta) \tag{2.22}
\]

where it is understood that ρ ≥ 0, 0 ≤ φ ≤ 2π and 0 ≤ θ ≤ π. (Challenge question, why do we not


allow θ to vary all the way to 2π?) Notice that as we now have two angles we expect both of them
to have a curvature.
If we keep θ and φ constant and allow ρ to vary, we obtain a ray from the origin extending
to infinity. This is a ρ-coordinate curve for spherical coordinates. If we keep ρ and θ constant
and allow φ to vary, we get a circle centred somewhere on the z -axis and in a plane parallel to the
z = 0 plane. This is a φ-coordinate curve. If we keep φ and ρ constant and allow θ to vary we
obtain a semi-circle centred at the origin and with diameter on the z -axis (radius ρ, in a plane at
angle φ with the x -axis). This is a θ -coordinate curve.
If we keep φ constant and allow the other two parameters to vary we obtain a half-plane
with one edge along the z -axis. If we keep ρ constant and allow the other two parameters to vary, we
obtain a sphere centred at the origin. If we keep θ constant and allow the other two parameters
to vary, we obtain a cone with its apex at the origin and its axis of symmetry along the z -axis.
These last two coordinate surfaces are of interest to us in problem solving. The sphere gives the
coordinates the name Spherical Coordinates. But unlike with Cylindrical Coordinates there is a
second non-trivial coordinate surface - the Cone. These coordinates are thus also suitable for
motion that is constrained to a cone, or for conical spirals etc, as well as problems with conical
symmetry.

As always we obtain the tangent vectors by taking partial derivatives of the transformation
equations
   
\[
\vec{e}_\rho = \frac{\partial}{\partial\rho} \begin{pmatrix} \rho\sin(\theta)\cos(\phi) \\ \rho\sin(\theta)\sin(\phi) \\ \rho\cos(\theta) \end{pmatrix} = \begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix} = \hat{\rho},
\]
\[
\vec{e}_\theta = \frac{\partial}{\partial\theta} \begin{pmatrix} \rho\sin(\theta)\cos(\phi) \\ \rho\sin(\theta)\sin(\phi) \\ \rho\cos(\theta) \end{pmatrix} = \begin{pmatrix} \rho\cos(\theta)\cos(\phi) \\ \rho\cos(\theta)\sin(\phi) \\ -\rho\sin(\theta) \end{pmatrix} = \rho \begin{pmatrix} \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) \\ -\sin(\theta) \end{pmatrix} = \rho\,\hat{\theta},
\]
\[
\vec{e}_\phi = \frac{\partial}{\partial\phi} \begin{pmatrix} \rho\sin(\theta)\cos(\phi) \\ \rho\sin(\theta)\sin(\phi) \\ \rho\cos(\theta) \end{pmatrix} = \begin{pmatrix} -\rho\sin(\theta)\sin(\phi) \\ \rho\sin(\theta)\cos(\phi) \\ 0 \end{pmatrix} = \rho\sin(\theta) \begin{pmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{pmatrix} = \rho\sin(\theta)\,\hat{\phi}.
\]

Here we see the coefficients of curvature are non-trivial for both θ and φ, as expected.
We wish to compute the angles between the tangent vectors. Before we begin we observe that
the corresponding unit vectors have the same direction as the tangent vectors and are easier to
work with. So, we compute the inner products between the unit vectors,
 
€ Š cos (θ ) cos φ
ρ̂ ⊤ θ̂ = sin (θ ) cos φ sin (θ ) sin φ cos (θ )  cos (θ ) sin φ 
  

− sin (θ )
= sin (θ ) cos (θ ) cos2 φ + sin2 φ − cos (θ ) sin (θ )
 

= sin (θ ) cos (θ ) − cos (θ ) sin (θ )


=0
 
€ Š − sin φ
ρ̂ ⊤ φ̂ = sin (θ ) cos φ sin (θ ) sin φ cos (θ )  cos φ 
  

0
   
= sin (θ ) − sin φ cos φ + sin φ cos φ
=0
 
€ Š − sin φ 

 
θ̂ φ̂ = cos (θ ) cos φ cos (θ ) sin φ − sin (θ )  cos φ 
0
   
= cos (θ ) − cos φ sin φ + sin φ cos φ
= 0.

The tangent vectors are orthogonal.
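These three computations can be confirmed at any sample point with a short numerical check (Python; function names are ours), which also verifies that the unit vectors have length one.

```python
import math

def spherical_unit_vectors(theta, phi):
    # Unit vectors read off from the spherical tangent vectors above.
    rho_hat = (math.sin(theta) * math.cos(phi),
               math.sin(theta) * math.sin(phi),
               math.cos(theta))
    theta_hat = (math.cos(theta) * math.cos(phi),
                 math.cos(theta) * math.sin(phi),
                 -math.sin(theta))
    phi_hat = (-math.sin(phi), math.cos(phi), 0.0)
    return rho_hat, theta_hat, phi_hat

def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

r_hat, t_hat, p_hat = spherical_unit_vectors(1.1, 2.3)
print(dot(r_hat, t_hat), dot(r_hat, p_hat), dot(t_hat, p_hat))  # all ~0
print(dot(r_hat, r_hat), dot(t_hat, t_hat), dot(p_hat, p_hat))  # all ~1
```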

Example 2.15 (Volume of a Sphere in 3-Dimensional Polar Spherical Coordinates) This
enables us to think of the infinitesimal volume element as a rectangular box (as opposed to a
parallelepiped), and most importantly to write the volume element in terms of the product of the
coefficients of curvature
d V = ρ 2 sin (θ ) d ρ d θ d φ

and is shown graphically in Figure 2.20. As a simple application we compute the volume of a sphere
of radius R
\[
\text{Volume} = \int_{\text{sphere}} dV = \int_0^{2\pi} d\phi \int_0^\pi d\theta \int_0^R d\rho\, \rho^2 \sin(\theta) = 2\pi \frac{R^3}{3} \int_0^\pi d\theta\, \sin(\theta) = \frac{4}{3}\pi R^3.
\]


Figure 2.20: The volume element in spherical polar coordinates.
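The sphere volume can likewise be checked by summing the volume element ρ² sin(θ) dρ dθ dφ numerically; the sketch below (Python; the function name is ours) factors the φ integral out as 2π and applies a midpoint Riemann sum over ρ and θ.

```python
import math

def sphere_volume(R, nr=200, nt=200):
    # Midpoint Riemann sum of dV = rho^2 sin(theta) drho dtheta dphi.
    drho = R / nr
    dtheta = math.pi / nt
    total = 0.0
    for i in range(nr):
        rho = (i + 0.5) * drho
        for j in range(nt):
            theta = (j + 0.5) * dtheta
            total += rho * rho * math.sin(theta) * drho * dtheta * 2 * math.pi
    return total

print(sphere_volume(1.0))  # ~ 4*pi/3 ≈ 4.189
```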

2.4.12 Other Coordinate Systems


It is impossible to provide an exhaustive list of all possible coordinate systems and their
associated transformation equations. There are infinitely many well-behaved
transformation equations and consequent coordinate systems. Below are some examples of
coordinate transformation equations slightly off the beaten track. For each of these coordinate
systems, it is left to the reader to perform the following steps,

1. Identify the correct ranges for the varying coordinates.

2. Identify what part of the x − y plane or x − y − z space is being mapped.

3. Identify coordinate curves/surfaces.

4. Find tangent vectors and cotangent vectors.

5. Investigate Orthogonality of tangent vectors and other properties.

6. Find area/volume elements.

Try using the Mathematica instruction

ParametricPlot[{formula for x, formula for y},
               {u, umin, umax},
               {v, vmin, vmax}]

where u and v are the two coordinates appearing in the formulas

to view the region covered by the coordinates

1. Canonical Hyperbolic Coordinates in 2-dimensions,

x = r (cosh (t ) − 1) and y = sinh (t )

where r, t ∈ R are varying.

2. Modified Hyperbolic Coordinates in 2-dimensions,

x = a r (cosh (t ) − 1) and y = b sinh (t )

where a , b > 0 are fixed.

3. Ellipsoid Coordinates in 3-dimensions,


 
x = a ρ sin (θ ) cos φ , y = b ρ sin (θ ) sin φ and z = c ρ cos (θ )

where a , b , c > 0 are fixed.
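As a worked instance of steps 4–6 for the ellipsoid coordinates, the Jacobian determinant can be estimated without any hand differentiation. The sketch below (Python; all function names are ours) builds the three tangent vectors by central finite differences and compares the resulting determinant with abcρ² sin(θ), which is the spherical volume element scaled by the fixed factors a, b and c.

```python
import math

def ellipsoid_to_cartesian(a, b, c, rho, theta, phi):
    # Ellipsoid transformation equations from item 3 above.
    return (a * rho * math.sin(theta) * math.cos(phi),
            b * rho * math.sin(theta) * math.sin(phi),
            c * rho * math.cos(theta))

def tangent(a, b, c, coords, k, h=1e-5):
    # Central-difference tangent vector along coordinate k of (rho, theta, phi).
    up = list(coords); up[k] += h
    dn = list(coords); dn[k] -= h
    p = ellipsoid_to_cartesian(a, b, c, *up)
    m = ellipsoid_to_cartesian(a, b, c, *dn)
    return [(pi - mi) / (2 * h) for pi, mi in zip(p, m)]

def triple(u, v, w):
    # Scalar triple product u . (v x w): the Jacobian determinant with these columns.
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            + u[1] * (v[2] * w[0] - v[0] * w[2])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

a, b, c = 2.0, 3.0, 0.5
rho, theta, phi = 1.2, 0.8, 2.1
e = [tangent(a, b, c, (rho, theta, phi), k) for k in range(3)]
print(triple(*e))                            # finite-difference determinant
print(a * b * c * rho**2 * math.sin(theta))  # analytic a*b*c*rho^2*sin(theta)
```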

In the case of a curve traced by the position vector r⃗, we notice that there are as many tangent
vectors to the curve as there are independent parameters that define the curve. For example,

1-dimensional curve: Suppose that r⃗ = r⃗(t ), then there is exactly one independent parameter
that defines the curve and exactly one tangent vector given by

\[
\vec{e}_t = \left( \frac{d\vec{r}}{dt} \right).
\]
This tangent vector points along the curve traced by r⃗ in the direction of increasing
parameter t . Then at each value t = t 0 , there corresponds a point on the curve located at
r⃗(t 0 ) and a one dimensional space based at this point that is tangent to this curve located at
t 0 . This tangent space at t = t 0 is a copy of the 1-dimensional real line and is represented
graphically as an infinite line that is tangent to the curve at r⃗(t 0 ).

2-dimensional curve: Suppose that r⃗ = r⃗(θ , φ), then there are exactly two independent
parameters that define the curve and exactly two tangent vectors given by
\[
\vec{e}_\theta = \left( \frac{\partial\vec{r}}{\partial\theta} \right) \quad\text{and}\quad \vec{e}_\phi = \left( \frac{\partial\vec{r}}{\partial\phi} \right).
\]

These tangent vectors point along the curve traced by r⃗, where e⃗θ points in the direction of
increasing parameter θ , while e⃗φ points in the direction of increasing parameter φ. Then
at each point (θ0 , φ0 ) in parameter space, there corresponds a point on the curve located
at r⃗(θ0 , φ0 ) and a 2-dimensional space that is tangent to this curve, given by the span of
e⃗θ and e⃗φ . This tangent space at (θ0 , φ0 ) is a copy of the 2-dimensional real plane and is
represented graphically as an infinite flat sheet that is tangent to the curve at r⃗(θ0 , φ0 ).

n -dimensional curve: Suppose that r⃗ = r⃗(x 1 , x 2 , . . . , x n ), then there are exactly n independent
parameters that define the curve and exactly n tangent vectors given by
\[
\vec{e}_i = \left( \frac{\partial\vec{r}}{\partial x_i} \right) \quad\text{for } i = 1, 2, \ldots, n .
\]
These tangent vectors point along the curve traced by r⃗, where e⃗i points in the direction
of increasing parameter x i . Then at each point x⃗0 = (x 10 , . . . , x n0 ) in parameter space, there
corresponds a point on the curve located at r⃗( x⃗0 ) and an n -dimensional space that is tangent
to this curve, given by the span of {e⃗1 , . . . , e⃗n }. This tangent space at x⃗0 is a copy of the n -
dimensional real space. It is generally not possible to generate graphical depictions of this
tangent space, but it corresponds to an n -dimensional generalisation of the previous two
examples.

2.5 Coordinate Transformations


The following discussion considers how the components of tangent vectors transform when the
coordinate system used to define the system is replaced with a new one. The reason that this is a
necessary consideration is that while changing the coordinate system necessarily changes the
components of each tangent vector, the objects constructed from these tangent vectors are related
by well-defined transformation rules that will be necessary for later computations. In particular,
these transformations are linear functions of the coordinate transformation functions at each
point, and the set of transformation rules provides a convenient encoding for the computations to
come.
Recall that a scalar field assigns a value to each point in some domain. A vector field is a
function that assigns a vector to every point in some domain. Formally,

V :S →T

where S and T are open subsets of vector spaces. To each x⃗ ∈ S , we assign y⃗ ∈ T according to
the rule y⃗ = V ( x⃗ ). When we associate some coordinate system with S and T , then we can think
of V as associated with the function that takes the coordinates of x⃗ and maps them onto the
coordinates of y⃗
\[
\vec{V}\left(x^1, x^2, \ldots, x^n\right) = \begin{pmatrix} y^1\left(x^1, x^2, \ldots, x^n\right) \\ y^2\left(x^1, x^2, \ldots, x^n\right) \\ \vdots \\ y^m\left(x^1, x^2, \ldots, x^n\right) \end{pmatrix}.
\]
One way of visualizing a vector field is by attaching a small vector to every point in space. So at
the x⃗ point we would ‘draw’ the vector V⃗ ( x⃗ ). Consider the following vector fields,
\[
\vec{V}(x, y) = \begin{pmatrix} y^1(x, y) \\ y^2(x, y) \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix} \tag{2.23}
\]

which is simply the constant vector field that assigns a single fixed vector value to each point in
R2 . A more complicated example is
 
\[
\vec{V}(x, y) = \begin{pmatrix} y^1(x, y) \\ y^2(x, y) \end{pmatrix} = \begin{pmatrix} \frac{y}{\sqrt{x^2 + y^2}} \\ -\frac{x}{\sqrt{x^2 + y^2}} \end{pmatrix} \tag{2.24}
\]

which corresponds to a unit (normalised) vector field in 2-dimensions. We can verify that each
vector V⃗ (x , y ) is normalised by showing that V⃗ (x , y ) · V⃗ (x , y ) = 1 for each value of x and y .
Similarly,
\[
\vec{V}(x, y, z) = \begin{pmatrix} y^1(x, y, z) \\ y^2(x, y, z) \\ y^3(x, y, z) \end{pmatrix} = \begin{pmatrix} \frac{-2y}{\sqrt{x^4 + 4y^2 + z^2}} \\ \frac{x^2}{\sqrt{x^4 + 4y^2 + z^2}} \\ \frac{-z}{\sqrt{x^4 + 4y^2 + z^2}} \end{pmatrix} \tag{2.25}
\]

is an example of a unit (normalised) vector field in 3-dimensions. Again, we can verify that each
vector V⃗ (x , y , z ) is normalised by showing that V⃗ (x , y , z ) · V⃗ (x , y , z ) = 1 for each value of x , y and
z . The vector fields in (2.23), (2.24) and (2.25) are depicted graphically in Figure 2.21, Figure 2.22
and Figure 2.23, respectively. In all three examples, we have implicitly made the assumption
that both the original space and the target space are parameterized with Cartesian coordinates.
This enabled us to write out equations for our vector fields taking coordinates as arguments and
returning vectors in coordinate representation. Other examples of vector fields include the flow
of a fluid in some container (at each point the fluid has a vector direction/velocity of flow), the
electric field around some charge, and the ‘wind velocity’ field often shown on weather reports.
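The normalisation claim for (2.24) is easy to verify by direct evaluation. A minimal sketch (Python; the function name is ours) checks V⃗ · V⃗ = 1 at a few sample points; note the field is undefined at the origin, where the denominator vanishes.

```python
import math

def V(x, y):
    # The unit vector field of (2.24); undefined at the origin.
    r = math.hypot(x, y)
    return (y / r, -x / r)

for point in [(1.0, 0.0), (0.3, -2.0), (5.0, 4.0)]:
    vx, vy = V(*point)
    print(vx * vx + vy * vy)  # ~1.0 at every point away from the origin
```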
Recall that when transforming scalar fields we transformed what we put in as an argument to the
function. For vector fields we must also transform what comes out of the function. Depending on
whether we are considering covariant or contravariant coordinates, this will transform differently.
The question now arises ‘how do the components of the vector field change for a given change of
coordinate system?’
Our task is to determine the transformation rules for the components of vector fields for each
change of coordinates. Let us begin by considering another simple example of a 2-Dimensional
vector field.

Example 2.16 (Simple 2-Dimensional Vector Field) Consider a vector field defined by

f⃗(x , y ) = x x̂ + y ŷ .

We can think of this as two scalar components

f x (x , y ) = x and f y (x , y ) = y .

Figure 2.21: The constant 2-dimensional vector field defined in (2.23).

Figure 2.22: The 2-dimensional normalised vector field defined in (2.24).

We know that these are both covariant and contravariant components for f⃗ in Cartesian coordinates.
Let us now convert this to polar coordinates. First we transform the scalar components. This part is
familiar from transforming scalar functions

f x (r, θ ) = f x (x (r, θ ), y (r, θ )) = f x (r cos (θ ) , r sin (θ )) = r cos (θ )


f y (r, θ ) = f y (x (r, θ ), y (r, θ )) = f y (r cos (θ ) , r sin (θ )) = r sin (θ )

where the components are still Cartesian. To convert these to polar coordinates we have to decide
whether we wish to consider the covariant components or the contravariant components. We will

Figure 2.23: The 3-dimensional normalised vector field defined in (2.25). The different arrow
colours correspond to different z -values in 3-dimensional space.
colours correspond to different z -values in 3-dimensional space.

generally be interested in the covariant components


\[
f_r = \vec{e}_r \cdot \vec{f} = \begin{pmatrix} \cos(\theta) & \sin(\theta) \end{pmatrix} \begin{pmatrix} r\cos(\theta) \\ r\sin(\theta) \end{pmatrix} = r\cos^2(\theta) + r\sin^2(\theta) = r
\]
and
\[
f_\theta = \vec{e}_\theta \cdot \vec{f} = \begin{pmatrix} -r\sin(\theta) & r\cos(\theta) \end{pmatrix} \begin{pmatrix} r\cos(\theta) \\ r\sin(\theta) \end{pmatrix} = -r^2\sin(\theta)\cos(\theta) + r^2\sin(\theta)\cos(\theta) = 0.
\]

This gives rise to f⃗(r, θ ) = r ê_r . As we expect from the shape of the vector field, the tangential
component is zero. The vector field f⃗(x , y ) is depicted graphically in Figure 2.24.
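The projections in this example can be reproduced numerically. The sketch below (Python; the function name is ours) evaluates f⃗ at the Cartesian image of (r, θ) and dots it with the polar tangent vectors, recovering f_r = r and f_θ = 0.

```python
import math

def covariant_components(r, theta):
    # Project f = x x_hat + y y_hat onto the polar tangent vectors e_r and e_theta.
    x, y = r * math.cos(theta), r * math.sin(theta)
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-r * math.sin(theta), r * math.cos(theta))
    f_r = x * e_r[0] + y * e_r[1]
    f_theta = x * e_theta[0] + y * e_theta[1]
    return f_r, f_theta

print(covariant_components(2.0, 0.6))  # ~ (2.0, 0.0), i.e. f = r e_r
```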

Now we generalise the important aspects of coordinate transformations by considering the


space spanned by the collection of tangents to the curve r⃗. Consider a 1-dimensional curve
defined by the position vector r⃗(t ) in the vicinity of some fixed point t 0 . Expanding r⃗(t 0 + δt ) in a
Taylor series gives

\[
\vec{r}(t_0 + \delta t) = \vec{r}(t_0) + \left( \frac{d\vec{r}}{dt} \right)_{t_0} \delta t + \frac{1}{2} \left( \frac{d^2\vec{r}}{dt^2} \right)_{t_0} \delta t^2 + \ldots
\]

then r⃗(t 0 + δt ) − r⃗(t 0 ) is defined at t 0 and is a function of δt only. For sufficiently small δt
\[
\delta\vec{r}\,\big|_{t_0}(\delta t) = \vec{r}(t_0 + \delta t) - \vec{r}(t_0) \approx \left( \frac{d\vec{r}}{dt} \right)_{t_0} \delta t
\]

and δ r⃗|t 0 (δt ) is a vector defined at t 0 and tangent to the curve defined by r⃗(t ). Under
parametrisation
\[
t = t(\tau), \quad t_0 = t(\tau_0), \quad \delta t = \left( \frac{dt}{d\tau} \right) \delta\tau \quad\text{and}\quad \frac{d}{dt} = \left( \frac{d\tau}{dt} \right) \frac{d}{d\tau}
\]

Figure 2.24: The 2-dimensional vector field f⃗(x , y ).

then as a function of τ

\[
\delta\vec{r}\,\big|_{\tau_0}(\delta\tau) = \vec{r}(\tau_0 + \delta\tau) - \vec{r}(\tau_0) = \left( \frac{dt}{d\tau} \right) \left( \frac{d\vec{r}}{dt} \right)_{\tau_0} \delta\tau + \frac{1}{2} \left( \frac{dt}{d\tau} \right)^2 \left( \frac{d^2\vec{r}}{dt^2} \right)_{\tau_0} \delta\tau^2 + \ldots = \left( \frac{d\vec{r}}{d\tau} \right)_{\tau_0} \delta\tau + \frac{1}{2} \left( \frac{d^2\vec{r}}{d\tau^2} \right)_{\tau_0} \delta\tau^2 + \ldots
\]
and for sufficiently small δτ
\[
\delta\vec{r}\,\big|_{\tau_0}(\delta\tau) \approx \left( \frac{d\vec{r}}{d\tau} \right)_{\tau_0} \delta\tau.
\]

Now define, on the tangent space, the differential d r⃗, such that

\[
d\vec{r} = \left( \frac{d\vec{r}}{dt} \right) dt \tag{2.26}
\]
and for any change of coordinate t = t (τ)
\[
d\vec{r} = \left( \frac{d\vec{r}}{dt} \right) dt = \left( \frac{d\vec{r}}{dt} \right) \left( \frac{dt}{d\tau} \right) d\tau = \left( \frac{d\vec{r}}{d\tau} \right) d\tau. \tag{2.27}
\]
By this construction, the relationship between the differential d r⃗ and the elements of the Taylor
expansion of r⃗ is clear.
Repeating the above construction for a 2-dimensional curve defined by the position vector
r⃗(u , v ) in the vicinity of some fixed point (u 0 , v0 ) yields

\[
\vec{r}(u_0 + \delta u, v_0 + \delta v) = \vec{r}(u_0, v_0) + \left( \frac{\partial\vec{r}}{\partial u} \right)_{(u_0, v_0)} \delta u + \left( \frac{\partial\vec{r}}{\partial v} \right)_{(u_0, v_0)} \delta v + \left( \frac{\partial^2\vec{r}}{\partial u \partial v} \right)_{(u_0, v_0)} \delta u\, \delta v + \frac{1}{2} \left( \frac{\partial^2\vec{r}}{\partial u^2} \right)_{(u_0, v_0)} \delta u^2 + \frac{1}{2} \left( \frac{\partial^2\vec{r}}{\partial v^2} \right)_{(u_0, v_0)} \delta v^2 + \ldots
\]

and for sufficiently small δu and δv

\[
\delta\vec{r}\,\big|_{(u_0, v_0)}(\delta u, \delta v) = \vec{r}(u_0 + \delta u, v_0 + \delta v) - \vec{r}(u_0, v_0) \approx \left( \frac{\partial\vec{r}}{\partial u} \right)_{(u_0, v_0)} \delta u + \left( \frac{\partial\vec{r}}{\partial v} \right)_{(u_0, v_0)} \delta v
\]

and δ r⃗|(u 0 ,v0 ) (δu , δv ) is a vector defined at (u 0 , v0 ) and tangent to the curve defined by r⃗(u, v ). Similarly,
define, on the tangent space, the differential d r⃗, such that
\[
d\vec{r} = \left( \frac{\partial\vec{r}}{\partial u} \right) du + \left( \frac{\partial\vec{r}}{\partial v} \right) dv.
\]

Now consider a change of coordinates

u = u (x , y ) and v = v (x , y )

whose inverses x = x (u , v ) and y = y (u , v ) can be rewritten as

x = x (u (x , y ), v (x , y )) and y = y (u (x , y ), v (x , y )).

The following statements are a direct consequence of the change of coordinates above

\[
\frac{\partial}{\partial u} = \left( \frac{\partial x}{\partial u} \right) \frac{\partial}{\partial x} + \left( \frac{\partial y}{\partial u} \right) \frac{\partial}{\partial y} \quad\text{and}\quad du = \left( \frac{\partial u}{\partial x} \right) dx + \left( \frac{\partial u}{\partial y} \right) dy
\]
\[
\frac{\partial}{\partial v} = \left( \frac{\partial x}{\partial v} \right) \frac{\partial}{\partial x} + \left( \frac{\partial y}{\partial v} \right) \frac{\partial}{\partial y} \quad\text{and}\quad dv = \left( \frac{\partial v}{\partial x} \right) dx + \left( \frac{\partial v}{\partial y} \right) dy
\]

and following (2.7), we have
\[
\frac{dx}{dx} = \left( \frac{\partial x}{\partial u} \right) \left( \frac{\partial u}{\partial x} \right) + \left( \frac{\partial x}{\partial v} \right) \left( \frac{\partial v}{\partial x} \right) \quad\text{and}\quad \frac{dx}{dy} = \left( \frac{\partial x}{\partial u} \right) \left( \frac{\partial u}{\partial y} \right) + \left( \frac{\partial x}{\partial v} \right) \left( \frac{\partial v}{\partial y} \right)
\]
\[
\frac{dy}{dy} = \left( \frac{\partial y}{\partial u} \right) \left( \frac{\partial u}{\partial y} \right) + \left( \frac{\partial y}{\partial v} \right) \left( \frac{\partial v}{\partial y} \right) \quad\text{and}\quad \frac{dy}{dx} = \left( \frac{\partial y}{\partial u} \right) \left( \frac{\partial u}{\partial x} \right) + \left( \frac{\partial y}{\partial v} \right) \left( \frac{\partial v}{\partial x} \right). \tag{2.28}
\]

Then
\[
\begin{aligned}
d\vec{r} &= \left( \frac{\partial\vec{r}}{\partial u} \right) du + \left( \frac{\partial\vec{r}}{\partial v} \right) dv \\
&= \left\{ \left( \frac{\partial x}{\partial u} \right) \left( \frac{\partial\vec{r}}{\partial x} \right) + \left( \frac{\partial y}{\partial u} \right) \left( \frac{\partial\vec{r}}{\partial y} \right) \right\} \left\{ \left( \frac{\partial u}{\partial x} \right) dx + \left( \frac{\partial u}{\partial y} \right) dy \right\} \\
&\quad + \left\{ \left( \frac{\partial x}{\partial v} \right) \left( \frac{\partial\vec{r}}{\partial x} \right) + \left( \frac{\partial y}{\partial v} \right) \left( \frac{\partial\vec{r}}{\partial y} \right) \right\} \left\{ \left( \frac{\partial v}{\partial x} \right) dx + \left( \frac{\partial v}{\partial y} \right) dy \right\} \\
&= \left\{ \left[ \left( \frac{\partial x}{\partial u} \right) \left( \frac{\partial u}{\partial x} \right) + \left( \frac{\partial x}{\partial v} \right) \left( \frac{\partial v}{\partial x} \right) \right] \left( \frac{\partial\vec{r}}{\partial x} \right) + \left[ \left( \frac{\partial y}{\partial u} \right) \left( \frac{\partial u}{\partial x} \right) + \left( \frac{\partial y}{\partial v} \right) \left( \frac{\partial v}{\partial x} \right) \right] \left( \frac{\partial\vec{r}}{\partial y} \right) \right\} dx \\
&\quad + \left\{ \left[ \left( \frac{\partial x}{\partial u} \right) \left( \frac{\partial u}{\partial y} \right) + \left( \frac{\partial x}{\partial v} \right) \left( \frac{\partial v}{\partial y} \right) \right] \left( \frac{\partial\vec{r}}{\partial x} \right) + \left[ \left( \frac{\partial y}{\partial u} \right) \left( \frac{\partial u}{\partial y} \right) + \left( \frac{\partial y}{\partial v} \right) \left( \frac{\partial v}{\partial y} \right) \right] \left( \frac{\partial\vec{r}}{\partial y} \right) \right\} dy \\
&= \left\{ \left( \frac{dx}{dx} \right) \left( \frac{\partial\vec{r}}{\partial x} \right) + \left( \frac{dy}{dx} \right) \left( \frac{\partial\vec{r}}{\partial y} \right) \right\} dx + \left\{ \left( \frac{dx}{dy} \right) \left( \frac{\partial\vec{r}}{\partial x} \right) + \left( \frac{dy}{dy} \right) \left( \frac{\partial\vec{r}}{\partial y} \right) \right\} dy \\
&= \left( \frac{\partial\vec{r}}{\partial x} \right) dx + \left( \frac{\partial\vec{r}}{\partial y} \right) dy.
\end{aligned}
\]

Following (2.28),
\[
d\vec{r} = \left( \frac{\partial\vec{r}}{\partial u} \right) du + \left( \frac{\partial\vec{r}}{\partial v} \right) dv = \left( \frac{\partial\vec{r}}{\partial x} \right) dx + \left( \frac{\partial\vec{r}}{\partial y} \right) dy. \tag{2.29}
\]
Equations (2.27), (2.28) and (2.29) generalise to n dimensions as
\[
d\vec{r} = \left( \frac{\partial\vec{r}}{\partial x^i} \right) dx^i. \tag{2.30}
\]

Similarly, given x i (u 1 , u 2 , . . . , u n ), (2.28) generalises to
\[
\left( \frac{dx^i}{dx^j} \right) = \left( \frac{\partial x^i}{\partial u^k} \right) \left( \frac{\partial u^k}{\partial x^j} \right), \tag{2.31}
\]
where the repeated index k is summed over.

This is a good point to add some commentary about these transformations.

Remark 19 Note that the transformation rule in (2.30) defines the transformation in the tangent
space at a point. This is different to the transformation rule in (2.6) and highlights the fact that,
unlike regular numbers or functions, the length element d x , the area element d A = d x d y and the
volume element d V = d x d y d z in multivariable calculus are not simple numbers or regular
functions. Rather, these are examples of a special kind of object whose purpose is to define the
measure in integrals and are not related to differentiation. A more detailed discussion of this
distinction can be found in good reference texts on Multivariable Calculus and Integration Theory,
and is outside the scope of this text.

Local Transformation Rules: Following the discussion above, each x i in (2.5) can be Taylor
expanded about a given point s⃗0 in some direction δ⃗ s , and each s i in (2.5) can be Taylor
expanded about a given point x⃗0 in some direction δ x⃗ , to give
\[
\delta x^i \big|_{\vec{s}_0} = x^i(\vec{s}_0 + \delta\vec{s}) - x^i(\vec{s}_0) = \left( \frac{\partial x^i}{\partial s^j} \right) \delta s^j + \ldots
\]
\[
\delta s^i \big|_{\vec{x}_0} = s^i(\vec{x}_0 + \delta\vec{x}) - s^i(\vec{x}_0) = \left( \frac{\partial s^i}{\partial x^j} \right) \delta x^j + \ldots
\]

For sufficiently small δs j and δx j it follows that
\[
\delta x^i \big|_{\vec{s}_0} \approx \left( \frac{\partial x^i}{\partial s^j} \right) \delta s^j \quad\text{and}\quad \delta s^i \big|_{\vec{x}_0} \approx \left( \frac{\partial s^i}{\partial x^j} \right) \delta x^j
\]
which defines two sets of linear equations of the form
\[
A^i_{\;j}\, x^j = s^i \quad\text{and}\quad \left( A^{-1} \right)^i_{\;j}\, s^j = x^i \tag{2.32}
\]

where A is an invertible matrix that transforms coordinates $x^i$ to $s^i$ in the vicinity of $\vec{x}_0$. When this is true, we say that the system is locally invertible. This transformation becomes exact at $\vec{x}_0$ in the limit that $\lVert\delta\vec{x}\rVert \to 0$. Similar statements can be made about the inverse transformation. When this is true for all $\delta\vec{x}$, each Taylor series becomes a linear equation and (2.5) defines an invertible linear system. Note, however, that there may still exist points $\vec{x}_0$ where this process breaks down and A is non-invertible. This occurs when one or more equations in the system (2.5) are multivalued or are not linearly independent of the other equations. Plane polar coordinates in (2.20), cylindrical polar coordinates in (2.21) and spherical polar coordinates in (2.22) are examples where this occurs; these are non-invertible at ρ = r = 0. More information on matrix inverses is contained in Section A.4.

General Invariance: The transformation of the vector components is accompanied by a corresponding transformation of the basis vectors. Expanding (2.30) in terms of basis vector components reveals

$$d\vec{r} = dr^i\,\hat{e}_i = \frac{\partial r^i}{\partial x^j}\,dx^j\,\hat{e}_i = dx^j\left(\frac{\partial r^i}{\partial x^j}\,\hat{e}_i\right) = dx^j\,\hat{\varepsilon}_j. \tag{2.33}$$

To simplify our notation, let $A^i{}_j = \frac{\partial r^i}{\partial x^j}$, then

$$dr^i = A^i{}_j\,dx^j \qquad\text{and}\qquad \hat{\varepsilon}_j = \hat{e}_i\,A^i{}_j. \tag{2.34}$$

We can find the transformation rule for $\hat{e}_i$ using the inverse of A, written as $A^{-1}$, where

$$A^i{}_j\left(A^{-1}\right)^j{}_k = \delta^i{}_k \tag{2.35}$$

where $\delta^i{}_k$ is the Kronecker δ-function in (2.8), such that

$$\hat{\varepsilon}_j\left(A^{-1}\right)^j{}_k = \hat{e}_i\,A^i{}_j\left(A^{-1}\right)^j{}_k = \hat{e}_i\,\delta^i{}_k = \hat{e}_k.$$


The transformation rules are now given by

$$dr^i = A^i{}_j\,dx^j \qquad\text{and}\qquad \hat{e}_i = \hat{\varepsilon}_j\left(A^{-1}\right)^j{}_i. \tag{2.36}$$

The components $dr^i$ have a linear transformation law that opposes that of the associated tangent vectors $\hat{e}_i$. Indeed, the change in $dr^i$ and $\hat{e}_i$ for a given change of coordinates is such that the value of $d\vec{r}$ is independent of the choice of coordinates. We see this directly in the computation of the magnitude of $d\vec{r}$ in the basis $\{\hat{\varepsilon}_1, \hat{\varepsilon}_2, \ldots, \hat{\varepsilon}_n\}$ given the transformation rules in (2.36)

$$d\vec{r}\cdot d\vec{r} = dr^i\,\hat{e}_i \cdot dr^j\,\hat{e}_j = A^i{}_k\,dx^k\,\hat{e}_i \cdot A^j{}_l\,dx^l\,\hat{e}_j = \left(A^i{}_k\,\hat{e}_i\right)\cdot\left(A^j{}_l\,\hat{e}_j\right)dx^k\,dx^l = \hat{\varepsilon}_k\cdot\hat{\varepsilon}_l\,dx^k\,dx^l.$$

This implies that

$$\hat{e}_i\cdot\hat{e}_j\,dr^i\,dr^j = \hat{\varepsilon}_k\cdot\hat{\varepsilon}_l\,dx^k\,dx^l, \qquad \hat{e}_i\cdot\hat{e}_j = \hat{\varepsilon}_k\,A^k{}_i\cdot\hat{\varepsilon}_l\,A^l{}_j \qquad\text{and}\qquad \hat{\varepsilon}_i\cdot\hat{\varepsilon}_j = \left(A^{-1}\right)^k{}_i\left(A^{-1}\right)^l{}_j\,g_{kl}. \tag{2.37}$$

This is a general property of tensors, whose individual components change according to the choice of coordinate system. Note, however, that the quantity defined by this kind of object is independent of the choice of coordinate system used to describe it.
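The opposing transformation laws can be checked with a small numerical sketch (our own illustration; the matrix A and the components dx are arbitrary choices): the components transform with A, the basis vectors mix with the opposite rule, and the squared length is unchanged.

```python
# Hypothetical invertible change of basis (an arbitrary choice of ours):
# eps_j = e_i A^i_j with e the standard orthonormal basis, so the new
# basis vectors are the columns of A; components follow dr^i = A^i_j dx^j,
# as in (2.34).
A = [[2.0, 1.0],
     [1.0, 1.0]]

dx = [0.3, -0.7]   # components with respect to the eps-basis

# Components with respect to the standard basis.
dr = [A[0][0] * dx[0] + A[0][1] * dx[1],
      A[1][0] * dx[0] + A[1][1] * dx[1]]

# New basis vectors: the columns of A.
eps = [[A[0][0], A[1][0]],
       [A[0][1], A[1][1]]]

# The same geometric vector, expanded in either basis, gives the same
# Cartesian components.
v_eps = [dx[0] * eps[0][i] + dx[1] * eps[1][i] for i in range(2)]

# Induced metric g_kl = eps_k . eps_l, and the squared length computed
# in each description; the two numbers coincide.
g = [[sum(eps[k][i] * eps[l][i] for i in range(2)) for l in range(2)]
     for k in range(2)]
len2_eps = sum(g[k][l] * dx[k] * dx[l]
               for k in range(2) for l in range(2))
len2_std = dr[0] ** 2 + dr[1] ** 2
```

The components change and the basis changes, but the expanded vector and its squared length come out identical in both descriptions, which is exactly the invariance stated above.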

Equation (2.34) distinguishes tensor components according to their transformation rule, and it motivates the following definition.

Definition 9 (Tensor Transformation Law) The contravariant components $x^i(\vec{s})$ of a tensor X transform according to the rule

$$x^i(\vec{s}) = \frac{\partial x^i}{\partial s^j}\,s^j(\vec{x}),$$

whereas the covariant components $x_i(\vec{s})$ of a tensor X transform according to the rule

$$x_i(\vec{s}) = \frac{\partial s^j}{\partial x^i}\,s_j(\vec{x}).$$

It follows immediately that the metric tensor $g_{ij}$ has an action on the contravariant vector $x^j$ such that, following the summation convention,

$$g_{ij}\,x^i\,x^j = x_j\,x^j$$

where we have used the convenience notation

$$g_{ij}\,x^i = x^i\,g_{ij} = x_j.$$

Similarly, we can define the inverse of this operation such that

$$\left(g^{-1}\right)^{ij}x_j = \left(g^{-1}\right)^{ij}g_{jk}\,x^k = \delta^i{}_k\,x^k = x^i.$$

Since the operation of $g_{ij}$ in the inner product is linear, the inverse operation $\left(g^{-1}\right)^{ij}$, if it exists, is linear and the combined action of $\left(g^{-1}\right)^{ij}g_{jk}$ is also linear. This has the immediate consequence that $\left(g^{-1}\right)^{ij}g_{jk}$ acts linearly in

$$\left(g^{-1}\right)^{ij}g_{jk}\,x^k = \left(g^{-1}\right)^{ij}x_j = x^i$$

and satisfies

$$\left(g^{-1}\right)^{ij}g_{jk} = \delta^i{}_k.$$

These operations are summarised in the following definition.

Definition 10 (The Inverse Metric) Given the metric tensor $g_{ij}$, define the inverse metric $g^{jk}$ such that

$$g_{ij}\,g^{jk} = \delta_i{}^k.$$

Following the definition of the inverse metric in Definition 10, the operations

$$g_{ij}\,x^i = x_j \qquad\text{and}\qquad g^{ij}\,x_i = x^j$$

respect the tensor transformation law in Definition 9 and correspond to lowering the index of the contravariant vector component $x^i$ to produce a covariant vector component $x_i$ (and vice versa), such that the operation

$$g_{ij}\,x^i\,x^j = x_j\,x^j = g^{ij}\,x_i\,x_j$$

is well-defined. Note that Definition 9 coincides with our original definition of covariant and contravariant components with respect to how they are used in matrix multiplication. The interpretation of these transformation rules is that a linear transformation of a tensor quantity acts index-wise on the tensor.
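The index gymnastics above can be made concrete with a small numerical sketch (our own illustration; it uses the diagonal spherical-polar metric that appears in the exercises below, evaluated at a sample point, with arbitrary components $x^i$):

```python
import math

# Metric of spherical polar coordinates at a sample point
# (rho and theta are arbitrary illustrative values).
rho, theta = 2.0, 0.6
g = [[1.0, 0.0, 0.0],
     [0.0, rho ** 2, 0.0],
     [0.0, 0.0, (rho * math.sin(theta)) ** 2]]

# Because g is diagonal here, its inverse is the reciprocal diagonal.
g_inv = [[1.0 / g[i][i] if i == j else 0.0 for j in range(3)]
         for i in range(3)]

x_up = [0.5, -1.0, 2.0]        # contravariant components x^i

# Lowering an index: x_j = g_ij x^i.
x_dn = [sum(g[i][j] * x_up[i] for i in range(3)) for j in range(3)]

# Raising with the inverse metric recovers the original components,
# i.e. g^{jk} x_k = x^j, so g^{-1} undoes the action of g.
x_up_again = [sum(g_inv[j][k] * x_dn[k] for k in range(3))
              for j in range(3)]

# The scalar g_ij x^i x^j = x_j x^j is the same number either way.
s1 = sum(g[i][j] * x_up[i] * x_up[j] for i in range(3) for j in range(3))
s2 = sum(x_dn[j] * x_up[j] for j in range(3))
```

Lowering then raising an index is the identity operation, and the contraction $x_j x^j$ does not depend on whether the metric acts on the first or the second factor.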

Remark 20 (Free and Dummy Indices) In each transformation rule in (2.37), Definition 9 and Definition 10 there are two kinds of indices: those that are repeated in upper and lower positions with respect to the tensor symbol, and those that are not repeated. Since, by the summation convention, we must sum over repeated indices, there is no special information associated with the label used for a repeated index and we can freely relabel these indices as we please. We call such an index a dummy index, since it is simply a place-holder variable whose purpose is to track the elements in the implicit summation. However, those indices that are not repeated have a special importance, as they specify a given component of a tensor in a computation. We call such an index a free index, since its value is not associated with any other index in a given computation.

2.6 Exercises
Exercise 2.1 Show that the exchange of a millimeter ruler with an inch ruler corresponds to
applying an affine transformation to the space whose original metric is given in millimeters.

Exercise 2.2 Embed T 2 into R3 and then give an explicit realization of a coordinate mapping from
a rectangular subset U ⊂ R2 onto T 2 . Show that this mapping is many-to-one outside of U .

Exercise 2.3 Show by direct construction that the RNC provide a many-to-one mapping between R2 and S 2 and then define a subset of R2 where the RNC provide a one-to-one mapping from R2 onto S 2 . Embed S 2 into R3 and then give an explicit coordinatization of S 2 for an appropriate transformation of Riemann Normal Coordinates.

Exercise 2.4 Show that the spherical polar coordinates in R3 coincide with the RNC on S 2 , centered
at the origin, when the radial coordinate ρ = 1 is fixed. Find the relations between the RNC
coordinates and the angular variables in the spherical polar coordinates.

Exercise 2.5 Use the standard definition of the vector dot product in N-dimensional Euclidean space in rectilinear coordinates to prove that

$$\vec{a}\cdot\vec{b} = \sum_{i=1}^{N} a^i\,b^i.$$

Exercise 2.6 Consider a marked point $p \in \mathbb{R}^3$. Let α, β and γ denote the angles subtended at the origin by the vector $\vec{p}$ and each of the coordinate axes $\hat{x}$, $\hat{y}$ and $\hat{z}$. Show that

$$\cos^2(\alpha) + \cos^2(\beta) + \cos^2(\gamma) = 1.$$
Exercise 2.7 Compute the cotangent vectors in elliptic coordinates directly and then by inverting
the Jacobian matrix. Make sure you get the same results. Verify that they are mutually orthogonal
with the tangent vectors.

Exercise 2.8 Show that the cotangent vectors in an arbitrary linear system are the rows of the
inverse transformation matrix associated with that system (hint, V −1 V = I ). Clearly these do not
coincide with the tangent vectors in general.

Exercise 2.9 Show that the matrix of transformation between two orthonormal coordinate systems
is orthogonal.

Exercise 2.10 Verify that the area of the parallelogram whose adjacent edges are given by the vectors $\vec{a}$ and $\vec{b}$ is

$$A = \lVert\vec{a}\rVert\,\lVert\vec{b}\rVert\sin(\theta),$$

where θ is the smallest angle between $\vec{a}$ and $\vec{b}$.

Exercise 2.11 Suppose A is a linear transformation (matrix) and the vector r⃗ transforms to a vector
r⃗′ by
r⃗′ = A r⃗.

Show this corresponds to a corresponding transformation of the coordinate axes given by

X⃗ ′ = A −1 X⃗ .

Exercise 2.12 Consider the points a = (a 1 , a 2 ) and b = (b1 , b2 ) in R2 , in x y -coordinates, and the
displacement vector
v⃗ = b − a

joining a to b . Answer the following questions. (Hint: Passive (coordinate axis) transformations
act opposite to active (point coordinate) transformations.)

1. Construct the metric tensor g in this coordinate system. (Hint: It is the 2-dimensional identity matrix.)

2. Use the metric tensor g to compute the length of v⃗. (Hint: Use matrix multiplication.)

3. Shift the origin in the $xy$-coordinate system by a constant vector $\vec{s} = \alpha\hat{x} + \beta\hat{y}$, for fixed α and β, to define new $x'$- and $y'$-coordinate axes.

a) compute the positions of a and b with respect to the x ′ y ′ -coordinate system.

b) determine v⃗ with respect to the x ′ y ′ -coordinate system.

c) compute the metric g′ with respect to the x ′ y ′ -coordinate system.

d) Use the metric tensor g′ to compute the length of v⃗ in the x ′ y ′ -coordinate system.

e) Do the points a and b change when changing from the x y -coordinate system to the
x ′ y ′ -coordinate system?
f) Do the points a and b have different descriptions in the x y - and x ′ y ′ -coordinate
systems?
g) Does the length of v⃗ differ in the x y - and x ′ y ′ -coordinate systems?

4. Define a rotation matrix that will rotate the x - and y - coordinate axes, through some fixed
angle θ , to define the x ′′ - and y ′′ -coordinate axes. (Hint: Use matrix multiplication to get
the new point and vector component values.)

a) compute the positions of a and b with respect to the x ′′ y ′′ -coordinate system.


b) determine v⃗ with respect to the x ′′ y ′′ -coordinate system.
c) compute the metric g′′ with respect to the x ′′ y ′′ -coordinate system.
d) Use the metric tensor g′′ to compute the length of v⃗ in the x ′′ y ′′ -coordinate system.
e) Do the points a and b change when changing from the x y -coordinate system to the
x ′′ y ′′ -coordinate system?
f) Do the points a and b have different descriptions in the x y - and x ′′ y ′′ -coordinate
systems?
g) Does the length of v⃗ differ in the x y - and x ′′ y ′′ -coordinate systems?

5. Rescale the x- and y-coordinate axes by some fixed factor K to define new $x'''$- and $y'''$-coordinate axes.

a) compute the positions of a and b with respect to the x ′′′ y ′′′ -coordinate system.
b) determine v⃗ with respect to the x ′′′ y ′′′ -coordinate system.
c) compute the metric g′′′ with respect to the x ′′′ y ′′′ -coordinate system.
d) Use the metric tensor g′′′ to compute the length of v⃗ in the x ′′′ y ′′′ -coordinate system.
e) Do the points a and b change when changing from the x y -coordinate system to the
x ′′′ y ′′′ -coordinate system?
f) Do the points a and b have different descriptions in the x y - and x ′′′ y ′′′ -coordinate
systems?
g) Does the length of v⃗ differ in the x y - and x ′′′ y ′′′ -coordinate systems?

6. Use the forms of the metric tensors and the vector v⃗ in each of the different coordinate systems
to explain why the length of v⃗ is identical in each coordinate system.

Exercise 2.13 Suppose a particle moves along the surface of a ball with position given by

$$\vec{p} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

in the standard rectilinear $xyz$-coordinate system. Answer the following questions.

1. Rewrite the position of the particle with respect to the corresponding x̂ , ŷ and ẑ unit vectors.

2. Show, by direct calculation, that the tangents to the x-, y- and z-coordinate curves are

$$\vec{e}_x = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \vec{e}_y = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \qquad\text{and}\qquad \vec{e}_z = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

What do these tangent vectors tell us? Are these tangent vectors also unit vectors?

3. Show, by direct calculation, that the metric tensor in the $xyz$-coordinate system is

$$g = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Note: the metric tensor is written as a matrix, with rows and columns, only to demonstrate the reorganization of elements with respect to the matrix multiplication $V^\top g V$, where V is a 3-dimensional column vector.

4. Suppose that the particle moves on a curve parameterized by t ∈ [0, 1] such that

x (t ) = 0, y (t ) = 0 and z (t ) = 1 − 2t .

Describe the motion of the particle along this path. Show, by direct calculation, that this path
has length ℓ = 2. Does this make sense? Explain your answer.

5. Suppose that the particle moves on a curve parameterized by t ∈ [0, 1] such that

$$x(t) = \frac{2t}{t^2+1}, \qquad y(t) = 0 \qquad\text{and}\qquad z(t) = \frac{t^2-1}{t^2+1}.$$

Describe the motion of the particle along this path. Show, by direct calculation, that this path has length $\ell = \frac{\pi}{2}$. Does this make sense? Explain your answer.

Exercise 2.14 Suppose a particle moves along the surface of a ball with position, in the standard rectilinear $xyz$-coordinate system, given by

$$\vec{p} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \rho\sin(\theta)\cos(\phi) \\ \rho\sin(\theta)\sin(\phi) \\ \rho\cos(\theta) \end{pmatrix}$$

where ρ, θ and φ are the radial, polar-angle and azimuthal-angle positions of the particle on the sphere. Answer the following questions.
1. Show, by direct calculation, that the tangents to the ρ-, θ- and φ-coordinate curves are

$$\vec{e}_\rho = \begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix}, \qquad \vec{e}_\theta = \rho\begin{pmatrix} \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) \\ -\sin(\theta) \end{pmatrix} \qquad\text{and}\qquad \vec{e}_\phi = \rho\sin(\theta)\begin{pmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{pmatrix}.$$

2. Show, by direct calculation, that the radial, polar- and azimuthal-unit vectors are

$$\hat{\rho} = \begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix}, \qquad \hat{\theta} = \begin{pmatrix} \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) \\ -\sin(\theta) \end{pmatrix} \qquad\text{and}\qquad \hat{\phi} = \begin{pmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{pmatrix}.$$
3. Rewrite the position of the particle with respect to the ρ̂, θ̂ and φ̂.

4. Re-interpret $\vec{p}(\rho,\theta,\phi)$ as a column vector with coordinate components ρ, θ and φ. Write $\vec{p}$ as a column vector in the ρθφ-coordinate system. Compare this column vector representation of $\vec{p}$ with that in the $xyz$-coordinate system and explain why

$$\vec{p} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \rho \\ \theta \\ \phi \end{pmatrix} \qquad\text{but}\qquad x \neq \rho, \quad y \neq \theta \quad\text{and}\quad z \neq \phi$$

is a consistent statement.

5. Show that the metric tensor in the ρθφ-coordinate system is

$$g = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \rho^2 & 0 \\ 0 & 0 & \rho^2\sin^2(\theta) \end{pmatrix}.$$

6. Why is the metric tensor in the $xyz$-coordinate system different from that in the ρθφ-coordinate system?

7. Suppose that ρ = ρ(t), θ = θ(t) and φ = φ(t), and show that the velocity of the particle is

$$\dot{\vec{p}}(t) = \dot{\rho}(t)\hat{\rho} + \rho(t)\dot{\theta}(t)\hat{\theta} + \rho(t)\sin(\theta)\dot{\phi}(t)\hat{\phi},$$

and give an interpretation for each of the quantities $\dot{\rho}(t)$, $\rho(t)\dot{\theta}(t)$ and $\rho(t)\sin(\theta)\dot{\phi}(t)$.

8. Suppose that the particle moves on a curve parameterized by t ∈ [0, 1] such that

$$\rho(t) = 1, \qquad \theta(t) = \frac{\pi}{2} \qquad\text{and}\qquad \phi(t) = 8t.$$

Describe the motion of the particle in 3 dimensions as it travels along this path. Show, by direct calculation, that this path has length ℓ = 8. Does this make sense? Explain your answer.
9. Suppose that the particle moves on a curve parameterized by t ∈ [0, 1] such that

ρ(t ) = 1, θ (t ) = πt and φ(t ) = 0.

Describe the motion of the particle in 3 dimensions, as it travels along this path. Show, by
direct calculation, that this path has length ℓ = π.

Exercise 2.15 Show that the area of the outer curved part of a hollow paper cylinder of radius R
and height h in 3-dimensions is
A = 2πR h .

Hint: use the following procedure

1. use the cylindrical coordinate system to determine the vectors tangent to the surface of the cylinder. Suppose that the base of the cylinder corresponds to z = 0.

2. determine the area element by computing the cross product of vectors tangent to the curved
part of the cylinder.

3. integrate the area element of the cylinder.

4. cut the cylinder along its edge, and press it flat to form a rectangle, then measure the lengths of each side of this rectangle, compute its area and compare this answer with that obtained by performing the integral. (Should these answers coincide?)

Exercise 2.16 Show that the volume of a solid ball of radius R in 3-dimensions is

$$V = \frac{4}{3}\pi R^3.$$
Hint: use the following procedure

1. use the spherical polar coordinate system to determine the vectors tangent to the surface of the sphere. Suppose that the north pole corresponds to the point θ = 0.

2. compute the metric tensor g in this coordinate system.

3. determine the volume element by computing the scalar triple product of the tangent vectors.

4. compare the volume element with $\sqrt{|\det(g)|}$, where $|\det(g)|$ is the absolute value of the determinant of the metric tensor (when written as a matrix).

5. integrate the volume element of the sphere over the northern hemisphere and then double the value of the integral to get the volume of the entire sphere.

6. integrate the volume element of the sphere over the entire sphere and compare your answer with that obtained by doubling the volume of the northern hemisphere. (Should these answers coincide?)

Chapter 3

Newtonian Mechanics

We now consider some formal aspects of the classical formulation of Newtonian mechanics. We shall use ideas from this classical vector-calculus-based theory to build the Lagrangian reformulation later.

3.1 Transforming Newton’s Second Law


We start by considering how NII,

$$\vec{F} = \frac{d\vec{p}}{dt} = m\frac{d^2\vec{r}}{dt^2} = m\ddot{\vec{r}},$$

transforms under a change of coordinates from Cartesian to some other system.

Example 3.1 (Newton’s Second Law to 2-dimensional Polar Coordinates (I)) In 2-dimensional
Cartesian coordinates, we write NII for a single particle of mass m as

Fx = m ẍ and Fy = m ÿ .

We wish to write similar equations in polar coordinates for the components r and θ . To obtain the
transformed equations, we transform the vector equation. One way to do this is by transforming
each of the scalar components

$$F_x = m\ddot{x} = m\frac{d^2}{dt^2}\big(r\cos(\theta)\big).$$

Then, the first derivative is

$$\dot{x} = \frac{d}{dt}\big(r\cos(\theta)\big) = \dot{r}\cos(\theta) - r\dot{\theta}\sin(\theta).$$

Differentiating again we get

$$\ddot{x} = \frac{d}{dt}\big(\dot{r}\cos(\theta) - r\dot{\theta}\sin(\theta)\big) = \ddot{r}\cos(\theta) - 2\dot{r}\dot{\theta}\sin(\theta) - r\ddot{\theta}\sin(\theta) - r\dot{\theta}^2\cos(\theta).$$

So we can write

$$F_x(r,\theta) = m\big(\ddot{r}\cos(\theta) - 2\dot{r}\dot{\theta}\sin(\theta) - r\ddot{\theta}\sin(\theta) - r\dot{\theta}^2\cos(\theta)\big).$$

A similar reasoning applies to the second component

$$F_y(r,\theta) = m\big(\ddot{r}\sin(\theta) + 2\dot{r}\dot{\theta}\cos(\theta) + r\ddot{\theta}\cos(\theta) - r\dot{\theta}^2\sin(\theta)\big).$$

We now obtain the 'generalized' components of force in the r and θ directions

$$Q_r = \vec{e}_r\cdot\vec{F} = \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix}\cdot\begin{pmatrix} F_x \\ F_y \end{pmatrix} = \cos(\theta)\,F_x + \sin(\theta)\,F_y$$

$$Q_\theta = \vec{e}_\theta\cdot\vec{F} = \begin{pmatrix} -r\sin(\theta) \\ r\cos(\theta) \end{pmatrix}\cdot\begin{pmatrix} F_x \\ F_y \end{pmatrix} = -r\sin(\theta)\,F_x + r\cos(\theta)\,F_y$$

where the new components of force are expressed in terms of the old components.
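The differentiation above is mechanical but error-prone, so it is worth checking numerically. In the sketch below (our own illustration; the trajectory r(t), θ(t) is an arbitrary choice) the analytic expression for $\ddot{x}$ is compared with a central second difference of $x(t) = r(t)\cos(\theta(t))$:

```python
import math

# An arbitrary smooth trajectory, chosen for illustration only.
def r(t):      return 1.0 + 0.1 * math.sin(t)
def rdot(t):   return 0.1 * math.cos(t)
def rddot(t):  return -0.1 * math.sin(t)
def th(t):     return 0.5 * t * t
def thdot(t):  return t
def thddot(t): return 1.0

def x(t):
    """Cartesian coordinate x = r cos(theta) along the trajectory."""
    return r(t) * math.cos(th(t))

def xddot_analytic(t):
    """The transformed expression for x-double-dot derived above."""
    c, s = math.cos(th(t)), math.sin(th(t))
    return (rddot(t) * c - 2.0 * rdot(t) * thdot(t) * s
            - r(t) * thddot(t) * s - r(t) * thdot(t) ** 2 * c)

def xddot_numeric(t, h=1e-4):
    """Central second difference of x(t)."""
    return (x(t + h) - 2.0 * x(t) + x(t - h)) / (h * h)

# The two agree to within the finite-difference error.
gap = abs(xddot_analytic(0.8) - xddot_numeric(0.8))
```

Any sign error or dropped term in the chain-rule computation would show up immediately as a large discrepancy between the two values.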

The decomposition of force into components expressed in Example 3.1 maintains the (x, y)-coordinate description of force in the transformed coordinate system. It will generally be more useful to completely express the components of force using the components in the new (r, θ)-coordinates with the associated coordinate direction vectors. This new decomposition is demonstrated in Example 3.2.

Example 3.2 (Transforming Newton’s Second Law in 2-Dimensional Polar Coordinates (II))
We now present an alternative approach for transforming NII to polar coordinates. To begin with, consider the Cartesian coordinate vector

$$\vec{r} = \begin{pmatrix} x \\ y \end{pmatrix}.$$

This can be immediately written in polar coordinates

$$\vec{r} = \begin{pmatrix} r\cos(\theta) \\ r\sin(\theta) \end{pmatrix} = r\hat{r}.$$

Thus, getting time derivatives of $\vec{r}$ seems to boil down to getting time derivatives of the unit vectors. This is easy

$$\frac{d\hat{r}}{dt} = \frac{d}{dt}\begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix} = \begin{pmatrix} -\sin(\theta)\,\dot{\theta} \\ \cos(\theta)\,\dot{\theta} \end{pmatrix} = \dot{\theta}\begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix} = \dot{\theta}\hat{\theta},$$

$$\frac{d\hat{\theta}}{dt} = \frac{d}{dt}\begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix} = \begin{pmatrix} -\cos(\theta)\,\dot{\theta} \\ -\sin(\theta)\,\dot{\theta} \end{pmatrix} = -\dot{\theta}\begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix} = -\dot{\theta}\hat{r}.$$

It is now a very simple matter to get $\ddot{\vec{r}}$,

$$\ddot{\vec{r}} = \frac{d^2}{dt^2}\big(r\hat{r}\big) = \frac{d}{dt}\big(\dot{r}\hat{r} + r\dot{\theta}\hat{\theta}\big) = \ddot{r}\hat{r} + 2\dot{r}\dot{\theta}\hat{\theta} + r\ddot{\theta}\hat{\theta} - r\dot{\theta}^2\hat{r}.$$

Grouping components we get

$$\ddot{\vec{r}} = \big(\ddot{r} - r\dot{\theta}^2\big)\hat{r} + \big(r\ddot{\theta} + 2\dot{r}\dot{\theta}\big)\hat{\theta}.$$

Hence, Newton’s Second Law states

$$\vec{F} = m\big(\ddot{r} - r\dot{\theta}^2\big)\hat{r} + m\big(r\ddot{\theta} + 2\dot{r}\dot{\theta}\big)\hat{\theta}.$$

It is now simple to obtain the generalized components of force in the radial and tangential directions

$$Q_r = \vec{e}_r\cdot\vec{F} = \hat{r}\cdot\vec{F} = m\big(\ddot{r} - r\dot{\theta}^2\big)\,\hat{r}\cdot\hat{r} + m\big(r\ddot{\theta} + 2\dot{r}\dot{\theta}\big)\,\hat{r}\cdot\hat{\theta}.$$

Since the unit vectors are orthonormal in polar coordinates, the first inner product is 1 and the second is 0. Thus

$$Q_r = m\big(\ddot{r} - r\dot{\theta}^2\big).$$

Similarly,

$$Q_\theta = \vec{e}_\theta\cdot\vec{F} = r\hat{\theta}\cdot\vec{F} = m\big(\ddot{r} - r\dot{\theta}^2\big)\,r\hat{\theta}\cdot\hat{r} + m\big(r\ddot{\theta} + 2\dot{r}\dot{\theta}\big)\,r\hat{\theta}\cdot\hat{\theta}.$$

Again we make use of orthogonality to obtain

$$Q_\theta = m\big(r\ddot{\theta} + 2\dot{r}\dot{\theta}\big)\,r.$$


Notice that the curvature has appeared here in the shape of the metric coefficient, r . Qθ is not the
component of the force vector in the θ̂ direction, but rather the generalized component of ‘force’
associated with the θ coordinate. This component doesn’t have units of force, but of torque.

Remark 21 The component of force $Q_r = m\big(\ddot{r} - r\dot{\theta}^2\big)$ along $\hat{r}$ indicates that there is an associated acceleration along $\hat{r}$ that is a function of $\ddot{r}$, r and $\dot{\theta}$. Most notably, if we enforce that r is constant, then $\dot{r} = \ddot{r} = 0$; however, the component of acceleration along $\hat{r}$ does not vanish. Instead, it reduces to $Q_r = -mr\dot{\theta}^2$, which suggests a resultant acceleration along $-\hat{r}$ with a magnitude that is proportional to the square of the angular rate of change and the distance r from the center of the coordinate system. This centripetal (center-seeking) acceleration corresponds to a 'fictitious force' that is not apparent in the (x, y)-coordinate description of the motion. Such 'fictitious forces' are a common feature of curvilinear coordinate descriptions of mechanical systems.

Summarizing our results

$$m\ddot{r} = Q_r + mr\dot{\theta}^2 \qquad\text{and}\qquad m\ddot{\theta} = \frac{1}{r^2}Q_\theta - \frac{2m\dot{r}\dot{\theta}}{r}.$$
Notice how the equations of motion have undergone a change of form when we changed coordinates. There are extra terms on the right hand side that didn’t exist in the Cartesian coordinate version of NII. Also, notice that the transformed ‘forces’ are not necessarily even forces. In this case $Q_r$ is a force and $Q_\theta$ is a torque. If we want to consider what the equations of motion look like in terms of real forces, we can break up the force vector into rectilinear components in the $\hat{r}$ and $\hat{\theta}$ directions

$$F_r = \hat{r}\cdot\vec{F} = \vec{e}_r\cdot\vec{F} = Q_r$$

$$F_\theta = \hat{\theta}\cdot\vec{F} = \frac{1}{r}\vec{e}_\theta\cdot\vec{F} = \frac{1}{r}Q_\theta.$$

These are called the physical components of force, and they are real forces that one might measure. We can write the equations of motion in terms of these real forces

$$m\ddot{r} = F_r + mr\dot{\theta}^2$$

$$m\ddot{\theta} = \frac{1}{r}F_\theta - \frac{2m\dot{r}\dot{\theta}}{r}.$$
Next, the transformation is performed by using the derivatives of the spherical unit vectors, and
taking advantage of the orthogonality of the spherical unit vectors (the second method presented
above).

Example 3.3 (Newton’s Second Law to Polar Spherical Coordinates) Recall

$$\hat{\rho} = \begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix}, \qquad \hat{\theta} = \begin{pmatrix} \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) \\ -\sin(\theta) \end{pmatrix} \qquad\text{and}\qquad \hat{\phi} = \begin{pmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{pmatrix}.$$

So,

$$\frac{d\hat{\rho}}{dt} = \frac{d}{dt}\begin{pmatrix} \sin(\theta)\cos(\phi) \\ \sin(\theta)\sin(\phi) \\ \cos(\theta) \end{pmatrix} = \dot{\theta}\begin{pmatrix} \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) \\ -\sin(\theta) \end{pmatrix} + \dot{\phi}\begin{pmatrix} -\sin(\theta)\sin(\phi) \\ \sin(\theta)\cos(\phi) \\ 0 \end{pmatrix} = \dot{\theta}\hat{\theta} + \dot{\phi}\sin(\theta)\hat{\phi}$$

$$\frac{d\hat{\theta}}{dt} = \frac{d}{dt}\begin{pmatrix} \cos(\theta)\cos(\phi) \\ \cos(\theta)\sin(\phi) \\ -\sin(\theta) \end{pmatrix} = \dot{\theta}\begin{pmatrix} -\sin(\theta)\cos(\phi) \\ -\sin(\theta)\sin(\phi) \\ -\cos(\theta) \end{pmatrix} + \dot{\phi}\begin{pmatrix} -\cos(\theta)\sin(\phi) \\ \cos(\theta)\cos(\phi) \\ 0 \end{pmatrix} = -\dot{\theta}\hat{\rho} + \dot{\phi}\cos(\theta)\hat{\phi}$$

$$\frac{d\hat{\phi}}{dt} = \frac{d}{dt}\begin{pmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{pmatrix} = \dot{\phi}\begin{pmatrix} -\cos(\phi) \\ -\sin(\phi) \\ 0 \end{pmatrix}.$$

This is better than we could have hoped - in trying to eliminate the last parameter, we got exactly the vector we need, and we can write

$$\frac{d\hat{\phi}}{dt} = -\dot{\phi}\big(\sin(\theta)\hat{\rho} + \cos(\theta)\hat{\theta}\big).$$
The other two identities are summarised below for convenience,

$$\frac{d\hat{\theta}}{dt} = -\dot{\theta}\hat{\rho} + \dot{\phi}\cos(\theta)\hat{\phi}$$

$$\frac{d\hat{\rho}}{dt} = \dot{\theta}\hat{\theta} + \dot{\phi}\sin(\theta)\hat{\phi}.$$

Using these results we can compute the derivatives of position $\vec{r} = r\hat{\rho}$ and so

$$\dot{\vec{r}} = \dot{r}\hat{\rho} + r\dot{\theta}\hat{\theta} + r\dot{\phi}\sin(\theta)\hat{\phi}.$$

Therefore,

$$\ddot{\vec{r}} = \ddot{r}\hat{\rho} + \dot{r}\big(\dot{\theta}\hat{\theta} + \dot{\phi}\sin(\theta)\hat{\phi}\big) + \dot{r}\dot{\theta}\hat{\theta} + r\ddot{\theta}\hat{\theta} + r\dot{\theta}\big({-\dot{\theta}\hat{\rho}} + \dot{\phi}\cos(\theta)\hat{\phi}\big)$$
$$\qquad + \dot{r}\dot{\phi}\sin(\theta)\hat{\phi} + r\ddot{\phi}\sin(\theta)\hat{\phi} + r\dot{\phi}\dot{\theta}\cos(\theta)\hat{\phi} - r\dot{\phi}\sin(\theta)\,\dot{\phi}\big(\sin(\theta)\hat{\rho} + \cos(\theta)\hat{\theta}\big).$$

Grouping like terms yields

$$\ddot{\vec{r}} = \big(\ddot{r} - r\dot{\theta}^2 - r\dot{\phi}^2\sin^2(\theta)\big)\hat{\rho} + \big(r\ddot{\theta} + 2\dot{r}\dot{\theta} - r\dot{\phi}^2\sin(\theta)\cos(\theta)\big)\hat{\theta} + \big(r\ddot{\phi}\sin(\theta) + 2\dot{r}\dot{\phi}\sin(\theta) + 2r\dot{\theta}\dot{\phi}\cos(\theta)\big)\hat{\phi}.$$

We can then write the result in terms of the real (not generalized) components of force by taking inner products with each of these unit vectors on both sides of the equations of motion

$$F_\theta = m\big(r\ddot{\theta} + 2\dot{r}\dot{\theta} - r\dot{\phi}^2\sin(\theta)\cos(\theta)\big)$$

$$F_\phi = m\big(r\ddot{\phi}\sin(\theta) + 2\dot{r}\dot{\phi}\sin(\theta) + 2r\dot{\theta}\dot{\phi}\cos(\theta)\big).$$

Newton’s Laws are barely recognizable when written in polar coordinates. Clearly we would have
to have a good motivation for going through all this effort.

That motivation might stem from the motion of the system being constrained to take place on
a sphere or a cone. In such cases, the equations above might prove useful. Nevertheless, if we
need to transform NII to some new coordinate system for every new problem we solve, then we
are going to be spending a lot of time transforming equations that could be better spent elsewhere.
We notice that our problems come from the fact that NII is not transformation invariant: it has different forms in different coordinate systems. The equations of motion we’ll meet later (Lagrange’s Equations) will be transformation invariant, and hence much more elegant.

Example 3.4 (Oscillatory Motion) We consider the related ideas of circular motion (Figure 3.1) and the motion of a simple pendulum as depicted in Figure 3.2. First consider an object constrained to move in a circle. In polar coordinates this means that r is constant. Hence $\ddot{r} = \dot{r} = 0$. Now recall that the velocity vector is given in polar coordinates by

$$\vec{v} = \dot{r}\hat{r} + r\dot{\theta}\hat{\theta}$$

and the acceleration vector is given by

$$\vec{a} = \big(\ddot{r} - r\dot{\theta}^2\big)\hat{r} + \big(2\dot{r}\dot{\theta} + r\ddot{\theta}\big)\hat{\theta}.$$
Plugging into these equations the requirements r̈ = ṙ = 0 we get

v⃗ = r θ̇ θ̂ and a⃗ = r θ̈ θ̂ − r θ̇ 2 r̂ .

So the velocity is tangential and of magnitude r θ̇ and the acceleration has both a tangential and a
radial component. If we further specify that the motion has constant angular velocity θ̇ = ω then
θ̈ = 0 and the tangential acceleration vanishes. In this last case the motion is characterised by

Velocity: $r\omega$ tangential to the circle.

Acceleration: $r\omega^2$ radially inwards.

Since the velocity is $v = r\omega$, we can write $\omega = \frac{v}{r}$, and hence we can write the acceleration in terms of the velocity

$$\vec{a} = -\frac{v^2}{r}\hat{r}.$$

These are the well known velocity and acceleration of a particle moving at a constant angular velocity.

Figure 3.1: A pendulum comprising a bob of mass m attached to the end of a rigid, massless rod
of length l that is free to swing from an anchor point without the influence of any outside agent
will swing smoothly on a circle about the anchor point.

Now consider the simple pendulum, comprising a bob of mass m attached to a massless rod of length L and allowed to swing freely under the influence of gravity, as depicted in Figure 3.2. We can write out the transformed NII for polar coordinates

$$m\ddot{r} = F_r + mr\dot{\theta}^2 \qquad\Rightarrow\qquad F_r = m\big(\ddot{r} - r\dot{\theta}^2\big)$$

$$m\ddot{\theta} = \frac{1}{r}F_\theta - \frac{2m\dot{r}\dot{\theta}}{r} \qquad\Rightarrow\qquad F_\theta = m\big(r\ddot{\theta} + 2\dot{r}\dot{\theta}\big)$$

To completely determine the motion of the simple pendulum, we must determine which forces are acting on the system. Clearly the forces in play are the weight of the bob and the tension in the rod.

Figure 3.2: A simple pendulum is attached at an anchor point. A Cartesian coordinate system and a polar coordinate system are attached to the anchor point of the pendulum. The angle θ in the polar coordinates is as illustrated. Aligning coordinate systems in this way keeps the transformation equations between the Cartesian coordinate system and the polar coordinate system the same as what we would expect. We now notice that we are constrained to fix the radial component of the bob to r = L, and hence $\dot{r} = \ddot{r} = 0$.

According to the coordinates in Figure 3.2, the vector $\hat{r}$ points downward and to the right, away from the anchor point at O. For the pendulum in this configuration, the weight of the bob acts in the $-\hat{y}$ direction and the tension in the $-\hat{r}$ direction, so

$$\vec{W} = -mg\hat{y} \qquad\text{and}\qquad \vec{T} = -T\hat{r}.$$

The projection of the weight $\vec{W}$ in the $\hat{r}$ direction is given by

$$\hat{r}\cdot\vec{W} = \hat{r}\cdot mg(-\hat{y}) = +mg\cos(\theta)$$

where θ is the angle between $\hat{r}$ and $-\hat{y}$. It is clear that this should be the case, since θ is the angle between the cord of the pendulum, directed along $\hat{r}$, and the line running parallel to the y-axis. Similarly, the angle between $\vec{W}$ and $\hat{r}$ is also θ. In contrast, the $\hat{\theta}$ direction, as measured at m, is orthogonal to $\hat{r}$ and is oriented in the positive direction of the x–y-plane, following the right-hand rule. Then $\hat{\theta}$ has a component that extends in the positive $\hat{y}$-direction and a component that extends in the positive $\hat{x}$-direction. The projection of the weight $\vec{W}$ in the $\hat{\theta}$ direction is

$$\hat{\theta}\cdot\vec{W} = \hat{\theta}\cdot mg\left(-\hat{y}\right) = mg\cos\left(\theta + \frac{\pi}{2}\right) = -mg\sin(\theta)$$

where the angle between the positive $\hat{\theta}$-direction and the $-\hat{y}$-direction is $\theta + \frac{\pi}{2}$. This follows from the fact that $\hat{\theta}$ and $\hat{r}$ are orthogonal and the smallest angle between $\hat{r}$ and $-\hat{y}$ is θ. In contrast, $\vec{T}$ is directed only along $-\hat{r}$ and has no $\hat{\theta}$ component, so

$$F_r = \hat{r}\cdot\vec{W} + \hat{r}\cdot\vec{T} = +mg\cos(\theta) - T,$$

$$F_\theta = \hat{\theta}\cdot\vec{W} + \hat{\theta}\cdot\vec{T} = -mg\sin(\theta).$$

The restriction r = L = constant implies $\dot{r} = \ddot{r} = 0$, so that $m\ddot{r} = F_r + mL\dot{\theta}^2 = 0$ when r = L is fixed. Plugging these into the transformed NII gives

$$T = m\big(L\dot{\theta}^2 + g\cos(\theta)\big) \qquad\text{and}\qquad m\ddot{\theta} = -m\frac{g}{L}\sin(\theta).$$

The r-component gives us the required value of tension (the force of constraint), and the θ-component gives us the equation of motion

$$\ddot{\theta} + \frac{g}{L}\sin(\theta) = 0.$$

This equation of motion is independent of mass. Thus, whether we chose a 1kg bob or a 2000kg
bob, the motion executed by the pendulum (provided the rod did not bend) would be exactly the
same. This differential equation is difficult to solve exactly. We thus consider approximations
valid in certain regimes. In particular, we ask how the system might behave when θ is small (near
equilibrium).
We begin by writing out a Taylor expansion of sin(θ),

$$\sin(\theta) = \theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + O\big(\theta^9\big).$$

Since the angle is small, we expect $\theta^3$ to be very small. As a first approximation, we then neglect terms beyond the first one in this expansion and obtain $\sin(\theta) \approx \theta$. With this approximation, we rewrite our differential equation

$$\ddot{\theta} + \frac{g}{L}\theta = 0.$$

This equation is very easy to solve

$$\theta(t) = A\cos\left(\sqrt{\frac{g}{L}}\,t\right) + B\sin\left(\sqrt{\frac{g}{L}}\,t\right)$$

where A and B are determined by the initial conditions, or equivalently,

$$\theta(t) = R\cos\left(\sqrt{\frac{g}{L}}\,t + \phi\right),$$

where R and φ are determined by the initial conditions. From this last expression we can clearly see that for small angles θ approximately executes simple harmonic motion of amplitude R and phase offset φ.
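The quality of the small-angle approximation can be tested numerically. The sketch below (our own illustration; the values of g, L, the release angle and the step size are arbitrary choices) integrates the full equation $\ddot{\theta} + \frac{g}{L}\sin(\theta) = 0$ with a semi-implicit Euler scheme and compares the result against the simple-harmonic solution at a sample time:

```python
import math

g, L = 9.8, 1.0          # illustrative values
theta0 = 0.05            # a small release angle, starting from rest

def simulate(theta0, t_end, dt=1e-4):
    """Semi-implicit Euler integration of the full nonlinear
    pendulum equation theta'' = -(g/L) sin(theta)."""
    theta, omega, t = theta0, 0.0, 0.0
    while t < t_end:
        omega += -(g / L) * math.sin(theta) * dt
        theta += omega * dt
        t += dt
    return theta

t_end = 1.0
exact = simulate(theta0, t_end)
# Small-angle (simple harmonic) prediction with the same initial data.
sho = theta0 * math.cos(math.sqrt(g / L) * t_end)
```

For a small release angle the two angles agree closely; repeating the experiment with a large release angle (say $\theta_0 = 1.5$) shows the approximation degrade, since the nonlinear pendulum swings with a longer period than the small-angle formula predicts.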

3.2 Generalised Coordinates


We have already seen examples of parametric curves, where the shape and position of the curve is fixed by the parametric function that defines the curve, and any single point along the curve is uniquely specified by the parameter value that gives that position along the curve. Consider, for example, a point mass confined to move along a straight line in the two dimensional plane. The assignment of position coordinates to the point mass needs only specify its position along the line, because it is not free to move in an arbitrary way in the plane. Limiting the number of parameters that are needed to encode the position of a particle leads naturally to the idea of generalised coordinates.


Figure 3.3: A point mass moving on a one dimensional path can be described by a parametric curve $\vec{a}(t)$, where the parameter t can be thought of as a time parameter that updates the current position of the particle. The actual representation of $\vec{a}(t)$ as a function will depend on the choice of coordinate system used.

Use general geometric considerations to assign a minimal set of coordinates to a system.


Generalised coordinates are a non-unique encoding of the position of a system. Time is not
considered as one of these parameters.
Sometimes it is convenient to abandon the familiar coordinate systems, and to use very ad hoc
geometrical constructions to describe positions. This can take many forms, and it is not possible
to make very useful general statements about this process. Any set of parameters that completely,
and usually without redundancy, specifies the position of all parts of a system, can be called a
set of generalised coordinates. Time is not considered as one of these parameters, even if one
must specify the time in order to interpret the generalised coordinates. This is because time
moves on in any case, and does not give us direct information about the location of the parts of the
system. Forgetting the dynamical consequences of the passing of time, consider how you would set
up a system for an initial condition: choosing the time at which this initial condition applies gives
no information about what you intend the initial system configuration to be.
The space consisting of all possible configurations of a system of particles is known as the
configuration space of the system. We’ll consider first the configuration space of a single particle
under various circumstances. A single particle living in 2-dimensional space with no limitations
on its movement can be placed anywhere in the plane. Thus the whole 2-dimensional plane (R2 )
is the Configuration Space of the particle. Similarly, a single particle in 3-dimensional space with
no limitations on its movement can be placed anywhere in 3-dimensional space. So R3 is the
configuration space of the particle.
A simple planar pendulum comprises a single bob of mass that is constrained to move at a
constant distance from the anchor point. The space of possible configurations is the set of all
points at this fixed distance from the anchor point - a circle. The circle illustrated in Figure 3.1 is
the configuration space for this particle. The path followed by the simple pendulum is a portion
of this circle and corresponds to a one dimensional curve in the vertical plane, so we say that the
configuration space for this system is one dimensional. Since the angular displacement of the
pendulum determines the position of the pendulum in the plane, we say that the configuration
space of the simple pendulum is the circle S 1 .

Figure 3.4: A double pendulum comprising a simple pendulum with a mass m1 attached to a
rigid massless rod anchored at a fixed point, and a second simple pendulum comprising a mass
m2 attached to a second massless rod anchored at m1 . The angles θ1 and θ2 and the position
vectors r⃗1 and r⃗2 label the two bobs.

Now consider a planar double-pendulum, where two rods of lengths L 1 and L 2 are attached
at a hinge and the particle is placed at the end of the second rod, as in Figure 3.4. The system
can now be configured in such a way that the particle ends up anywhere in space with distance
between L 1 − L 2 and L 1 + L 2 from the origin. Clearly this space is 2-dimensional. The locus of
points which correspond to all the possible positions of the double pendulum is an annulus with
inner radius |L 1 − L 2 | and outer radius |L 1 + L 2 |.
A 3-dimensional ‘simple pendulum’ comprises a rod of fixed length that is anchored to a
freely moving hinge at one end while the other end has a bob with mass attached. The bob
can now be positioned everywhere on a sphere around the origin of radius equal to the length
of the rod. The configuration space of the system is thus the surface of the sphere. Since this
surface is 2-dimensional, we say the configuration space in this instance has dimension 2. Since
this surface is the sphere, we write the configuration space as S 2 . Similar reasoning to that for the
planar double-pendulum leads us to the conclusion that the locus of positions of the second bob
of a 3-dimensional double-pendulum is a sphere with a hollowed centre - a spherical shell with
inner radius |L 1 − L 2 | and outer radius L 1 + L 2 .
When we introduce several particles, the configuration space C is the set of all possible
configurations of all of them. It becomes harder to visualize such spaces, as often they have
dimension greater than 2 or 3, and thus cannot be thought of as geometric figures. However, it is
understood that they resemble (at least locally), curves or surfaces in higher dimensional space.
Mathematicians call such objects Manifolds.
In the simplest case of N particles each living in 3-dimensional space without limitations, the
configuration space is simply the collection of all possible positions of each particle. This is the
product space
C = R3 × R3 × · · · × R3 ,

formed by the configuration of each particle in the system independently assuming some position
in R3 . We can identify this space with R3N . Each configuration of the system is given by a tuple

( x1 ¹ , x1 ² , x1 ³ , . . . , xN ¹ , xN ² , xN ³ ) ∈ C ,




of 3N numbers. Clearly it is 3N -dimensional. When we add constraints to this system, the
configuration space will become a surface of some dimension less than 3N that can be embedded
in R3N much as our geometrically constrained configuration spaces above were embedded in the
unconstrained spaces.
In general, the dimension of these configuration spaces is the smallest number of parameters
needed to describe the whole space. This definition sits well with what we’ve said above for the
simpler spaces. We need two parameters to describe every point in the plane and three to describe
every point in space. So these are 2- and 3-dimensional manifolds respectively. Similarly, the
configuration space of a simple pendulum was a circle - this is 1-dimensional because we can
parameterize it with a single parameter, s ∈ [0, 2π), as follows

x = r cos (s ) and y = r sin (s ) .

Likewise the space of the double pendulum in the plane is parameterized by two angles and is
hence 2-dimensional. Notice that the dimension of the space does not depend on the
parametrization we use, but rather is the minimum number of parameters necessary in any
parametrization of the space. We could, for instance, have parameterized the same circle above
with
x = r cos (2πu ) and y = r sin (2πu ) , for u ∈ [0, 1).

What is important is that both of these parametrizations involve a single parameter, making the
space one dimensional. Indeed, for some spaces we need to patch different parametrizations
together in different parts of the space, but the number of parameters in these parametrizations is
always the same, and this is the dimension of the space.
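The claim that different one-parameter descriptions cover the same circle can be checked directly. The sketch below (with an illustrative radius r) verifies that the angle parametrization and a rescaled one-parameter alternative both land on the curve x² + y² = r²:

```python
import math

r = 1.5  # illustrative radius

def on_circle(x, y):
    # Test whether a point satisfies x^2 + y^2 = r^2 (up to round-off).
    return abs(x**2 + y**2 - r**2) < 1e-12

# Parametrization 1: angle s in [0, 2*pi).
def p1(s):
    return r * math.cos(s), r * math.sin(s)

# Parametrization 2: a rescaled parameter u in [0, 1).
def p2(u):
    return r * math.cos(2 * math.pi * u), r * math.sin(2 * math.pi * u)

# Both use a single parameter and trace out the same one-dimensional space.
assert all(on_circle(*p1(s)) for s in [0.0, 1.0, 2.5])
assert all(on_circle(*p2(u)) for u in [0.0, 0.25, 0.9])
```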

Example 3.5 An example should serve to consolidate all the ideas introduced above. We consider
three points living in 2-dimensional space. To begin with let us suppose that there are no constraints
on any of the points. Each is thus free to move anywhere in the plane. The configuration space of
the first point is then R2 , and so are the configuration spaces of the second and third points.
Since there are no constraints linking the coordinates, we can write the configuration space for
the whole system as
C = R2 × R2 × R2 = R6 .

We could choose any coordinates (Cartesian, Polar, Elliptic, Hyperbolic, etc.) to represent each of
the particles - the important feature being that there are two degrees of freedom associated with
each particle, and six degrees of freedom in the system. The configuration space is 6-dimensional in
this case.

3.3 Constraints
It is useful to know the dimension of the configuration space, as this corresponds to the number
of parameters needed to specify the position of the system. We shall spend some time
understanding the counting of these dimensions. It is common to replace the term dimension of
the configuration space with the less wordy term defined as follows.

Definition 11 (Degrees of Freedom) The Degrees of Freedom of a system are the number of
independent coordinates (not including time) required to specify completely the position of each
and every particle or component part of the system.

Remark 22 The dimension of the configuration space of a system is the number of degrees of
freedom of that system; the two terms are equivalent. The number of degrees of freedom indicates
the number of independent ways that the system can move freely. Stated differently, it is the
dimension of the configuration space C , where each number in a tuple c ∈ C is a coordinate that
can be altered to change the position of the system.

Next we consider the degrees of freedom of a system in some simple examples.

Example 3.6 (Point Mass Moving in the Plane) Consider a point mass that is free to move in the
two dimensional plane. Since the particle is free to move in any direction in the x - y -plane, but not
in the z -direction, then regardless of the specific choice of parameterization, at most 2 parameters
are needed to specify the motion of the particle in the plane - here, positions in the x - and y -
directions. So the particle has two degrees of freedom - even though the ambient coordinate system
includes the z -direction.

Figure 3.5: A point mass moving in the plane, with position labelled by the displacement r⃗ and
the angle θ .

If we choose to describe the position of the particle using polar coordinates, then the position
depends on the radial distance from the origin and the angular displacement from the x -axis. So,
a change of coordinates does not change the number of parameters needed to describe the position
of the particle. We can now consider reducing the number of degrees of freedom by considering
limitations to the free motion of the system.

Finally, we consider the most subtle kind of configuration space - a time-evolving
configuration space. This construct occurs for systems that have explicitly time-dependent
constraints.

Example 3.7 Consider a particle constrained to the floor of a moving lift as discussed earlier. Before
adding this constraint the particle would have been free to move in 3-dimensional space and the
configuration space would have been R3 . With the constraint in place we suspect a 2-dimensional
configuration space - the floor of the lift.
Notice that at each instant in time, the possible positions of the particle are those positions
making up the floor of the lift at that time instant. Thus our configuration space is a moving plane.
We can parameterize it with two parameters; r and s as follows

S = {(r, s , h (t )) : r ∈ R, s ∈ R}

Notice that the parameterization is dependent on time, and thus the manifold will be different
at different points in time. Think of it as a time-evolving surface. In this case the evolution is
simple - it moves upwards, but in principle it can be any prescribed evolution. It is important to
notice the following distinction - the motion of the lift is known, so the configuration space of the
particle is 2-dimensional. If we did not know the motion of the lift in advance and allowed it to fall
down the shaft, we would have to solve for its motion, as well as that of the particle on the floor
of the lift. Because we do know the motion of the lift in advance, we can infer the z -coordinate of
our particle at any instant in time. This means that the z -coordinate is predetermined and not a
degree of freedom for the particle. It is only free to move in the x and y directions, and hence the
configuration space is 2-dimensional.
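The distinction between the two free parameters and the predetermined coordinate can be made concrete in a short sketch. Here the lift motion h(t) is a hypothetical choice (constant upward velocity v = 2), used only to illustrate the parametrization:

```python
# A sketch of the time-evolving configuration space of Example 3.7:
# at each time t the particle may sit anywhere on the plane z = h(t).

def h(t):
    # Hypothetical prescribed lift motion: constant upward velocity v = 2.
    return 2.0 * t

def configuration(r, s, t):
    # Two free parameters (r, s); the z-coordinate is fixed by the constraint.
    return (r, s, h(t))

x, y, z = configuration(1.0, -3.0, t=0.5)
assert (x, y) == (1.0, -3.0)   # the two genuine degrees of freedom
assert z == h(0.5)             # z is predetermined, not a degree of freedom
```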

It often occurs that one or more physical or geometric restrictions are imposed on a given
system. These restrictions lead to the following definition.

Definition 12 (Equation of Constraint) Each equation among the state variables of a given system
that reduces the number of degrees of freedom of that system is an equation of constraint.

Each equation of constraint reduces the collection of all conceivable motions of the system
to some smaller subset. We now see that Definition 11 captures the important point - that the
number of degrees of freedom is the number of parameters required to fully specify the position
or configuration of the system, given that we already know all the constraints on the system’s
motion. We may choose these parameters in different ways, but whichever we choose, the number
will be the same.
We can formally count the number of degrees of freedom of a system in multiple dimensions.
A multiparticle system in 3 spatial dimensions with n particles gives rise to 3n Cartesian
coordinates. If a particle is free to move along the x , y and z directions, then that particle has
three degrees of freedom - one for each direction. If a system contains more than one particle,
then we sum the number of degrees of freedom of each particle to find the number of degrees
of freedom of the system. This makes sense, since each particle needs 3 numbers to describe its
position, and so 3n numbers are needed to describe the positions of n particles. Since each
constraint eliminates one degree of freedom, m constraints will eliminate m degrees of freedom.
Given m equations of constraint it follows that there are

N = 3n − m generalised coordinates qi , for i = 1, . . . , N .

The number of generalised coordinates corresponds to the number of degrees of freedom, and
each qi corresponds to an independent way in which the system can move. It is convention in
Lagrangian mechanics to write qi for the i -th generalised coordinate and pi for the i -th
generalised momentum.
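This counting rule is simple enough to encode directly. The helper below is a hypothetical function written only for illustration; it computes N = 3n − m and reproduces the counts discussed in this section:

```python
def degrees_of_freedom(n_particles, n_constraints, dim=3):
    """Number of generalised coordinates for n particles in `dim` spatial
    dimensions, subject to independent holonomic constraints."""
    n = dim * n_particles - n_constraints
    if n < 0:
        raise ValueError("more constraints than coordinates")
    return n

# A free particle in 3-dimensional space: 3 degrees of freedom.
assert degrees_of_freedom(1, 0) == 3
# A particle confined to a circle in 3-d (two constraints): 1 degree of freedom.
assert degrees_of_freedom(1, 2) == 1
# Two particles joined by a rigid rod in 3-d: 6 - 1 = 5 degrees of freedom.
assert degrees_of_freedom(2, 1) == 5
```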

Example 3.8 (Coupled Point Masses Moving in the Plane) Consider a solid rod that is free to slide
along the z = 0 plane in three dimensions. Since the rod is an extended object, it is clear that more
than two numbers are necessary to describe the motion of the rod since specifying the position of
any one part of the rod is insufficient to describe all points along the rod. One method to define
the position of the rod is to consider first the position of one point on the rod, here the midpoint c .
Then each point along the rod lies on the line segment joining a and b , where a and b are each
some distance l from c . Once the position of c is specified, then the relative position of any other
point on the rod can be determined.

Figure 3.6: A rigid rod sliding in the plane, with midpoint c located by the displacement vector r⃗,
end points a and b , and orientation given by the angle θ .

We can describe the position of c using the displacement vector r⃗, and the orientation of the
line segment joining a and b is given by an angle θ . These are the generalised coordinates of the
rod in the plane. If we restrict the description of the system to include only the end points of the rod,
then we require 2 numbers to describe the position of c in the plane, and one number to describe
the angular orientation of the rod in the plane. Therefore, we need 3 numbers to define the position
and orientation of the rod in the plane. Notice that this is fewer than the number of parameters
needed to describe the unrestricted motion of two particles in 3-dimensions, which is 6. Next we
make some general observations about generalised coordinates.

Cartesian coordinates explicitly belong to one or another particle in the system. Generalised
coordinates may at times look like the Cartesian coordinates of one or another particle, but they
do not fundamentally belong to any one particle; rather, they are an attribute of the entire system.
The idea of generalised coordinates comes about when we take into account constraints on the
system. What follows are some details about the kinds of constraints that are interesting for us.
In particular, we will be interested in a special kind of constraint called Holonomic Constraints.

Definition 13 (Holonomic Constraints) Holonomic constraints can be written as a system of
algebraic equations of position q 1 , q 2 , . . . , q m and time t only,

f (q 1 , q 2 , . . . , q m , t ) = 0,

and reduce the number of degrees of freedom. Non-holonomic constraints are those that cannot be
written as algebraic equations of position and time.

The interrelation between coordinates in each holonomic constraint serves to reduce the
number of independent parameters in the system: the equation of constraint allows us to express
some coordinates explicitly in terms of the others, making some redundant. Obviously, this is
not possible for non-holonomic constraints, which cannot be written as equations among the
coordinates. We thus classify constraints according to whether they allow us to eliminate degrees
of freedom in a system.
A (continuous) holonomic constraint implies that one of the coordinates can (at least locally)
be written as a function of the other coordinates, and so eliminates unnecessary coordinates. This
follows from the Implicit Function Theorem and the Inverse Function Theorem. The categories of
holonomic constraints include

Rheonomic: time t appears explicitly.

Scleronomic: time t does not appear explicitly.

Following the implicit function theorem, a holonomic constraint allows us to locally write one
coordinate as a function of the other coordinates in the neighbourhood of some point. Also,
following the inverse function theorem, a holonomic constraint can be inverted, again locally,
to change which coordinate is eliminated from the set of independent parameters. Together,
these theorems allow us to (at least locally) reduce the complexity of systems that we study. There
are additional subcategories of holonomic constraints, known as Rheonomic and Scleronomic
constraints. Interested readers should look into the formal statements of each of these theorems
and subcategories. Next I consider an example of a holonomic constraint on the motion of a
particle.

Example 3.9 (Motion on a circle in three dimensions) Consider a particle with position given by

p⃗ = x x̂ + y ŷ + z ẑ

subject to the constraints


x ² + y ² − a ² = 0 and z = 0.

Given any one coordinate value, we can use the equations of constraint to determine the others.
We have already seen parametric curves on the circle and on the sphere. The requirement
that a particle is constrained to move only on the surface of the sphere limits the total number of
dimensions in which it moves. The degrees of freedom limit the number of free parameters needed
to specify the position of a particle during its motion. Clearly, we need only one parameter to define
the motion on a circle - the angular displacement of the particle along the circle, or alternatively,
the distance along the circumference of the circle - either is an appropriate generalised coordinate.
In Cartesian coordinates, we find

y (x ) = ± √(a ² − x ² ) ,

or, we could use the angular displacement tan (θ ) = y /x , then

p⃗ (x ) = ( x , ± √(a ² − x ² ) , 0 ) or p⃗ (θ ) = ( a cos (θ ) , a sin (θ ) , 0 ) .

In each case, one free parameter determines position.
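Both parametrizations can be tested against the equations of constraint directly. The following sketch (with an illustrative radius a) confirms that each one-parameter family satisfies x² + y² − a² = 0 and z = 0:

```python
import math

a = 2.0  # illustrative circle radius

def satisfies_constraints(x, y, z):
    # Check the two holonomic constraints of Example 3.9.
    return abs(x**2 + y**2 - a**2) < 1e-12 and z == 0.0

# Cartesian parametrization: x free, y = +/- sqrt(a^2 - x^2), z = 0.
def p_cartesian(x, sign=+1):
    return (x, sign * math.sqrt(a**2 - x**2), 0.0)

# Angular parametrization: theta free, z = 0.
def p_angular(theta):
    return (a * math.cos(theta), a * math.sin(theta), 0.0)

assert satisfies_constraints(*p_cartesian(1.2))
assert satisfies_constraints(*p_cartesian(-0.7, sign=-1))
assert satisfies_constraints(*p_angular(2.1))
```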

For comparison, let’s consider some examples of non-holonomic constraints.
Recall that a holonomic constraint is one that can be expressed by an equation

f (q 1 , q 2 , . . . , q m , t ) = 0.


Assuming that q 1 , q 2 , . . . , q (m −1) are known, we can use the equation above to determine (via the
implicit function theorem) the possible values of q m satisfying the constraint. Thus knowing
m − 1 of the coordinates enables us to determine the other one. It is then possible to find m − 1
parameters and some parametrization for the new surface, which is intrinsically m −1 dimensional.
Thus the constraint has eliminated one degree of freedom. In general, the number of degrees of
freedom of a system of N particles in 3-dimensional space is 3N − q , where q is the number of
independent Holonomic Constraints acting on the system.

We can parameterize the configuration space of a system in many different ways. Let us
specify one such parametrization, (q 1 , q 2 , . . . , q m ), of a system of N particles by the
transformation equations

x1 = x1 (q 1 , q 2 , . . . , q m , t ) ,
x2 = x2 (q 1 , q 2 , . . . , q m , t ) ,
..
.
x3N = x3N (q 1 , q 2 , . . . , q m , t ) .


We call the parameters q 1 , q 2 , . . . , q m generalized coordinates if they satisfy the following two
conditions

1. They fully specify the possible motions of the system.

2. They do not obey any holonomic constraints between them.

The first point tells us that it must be possible to write down transformation equations from our
generalized coordinates to the Cartesian Coordinates of all the particles in the system. The second
point says that we must be free to vary the Generalized Coordinates however we like without
Holonomic constraints between them; this limits the number of coordinates chosen for our
system to be exactly equal to the number of degrees of freedom of the system. Together, these
points state that Generalized Coordinates are a parametrization of the configuration space.
The most important thing to take home from this is that generalized coordinates do not obey
holonomic constraints - all the constraint information is encoded in the transformation equations.

Remark 23 There is never only one choice of generalized coordinates for a system. For instance for
a particle moving freely in a plane we may choose to use Cartesian coordinates, polar coordinates,
elliptic coordinates, hyperbolic coordinates, etc. Provided some set of coordinates satisfies the
conditions above, we can regard them as generalized coordinates for the system.


Example 3.10 Let us consider Cartesian coordinates. Then particle 1 has coordinates (x1 , y1 ),
particle 2 has coordinates (x2 , y2 ) and particle 3 has coordinates (x3 , y3 ). A general point in the
product space R2 × R2 × R2 is of the form ((x1 , y1 ), (x2 , y2 ), (x3 , y3 )), which we associate with an
element of R6 given by (x1 , y1 , x2 , y2 , x3 , y3 ). We can consider these parameters as generalized
coordinates for




the system because they are not constrained and they completely describe the configuration space of
the system. We now add a single constraint to the system. We require that particle 1 and particle 2
are a fixed distance, L 12 apart. We can think of this as connecting the two particles by a rigid,
massless rod. This constraint is Holonomic as it can be expressed by

(x2 − x1 )² + (y2 − y1 )² − L12 ² = 0.

Thus we expect the degrees of freedom to drop from six to five.

The new configuration space can be thought of as the product of three spaces - the first is the
position of the first particle, the third is the position of the third particle, and the second is the
possible locations of the second particle given the position of the first - this is a circle of radius L 12
about the first particle.

Example 3.11 The circle manifold is denoted S 1 , so we can write the configuration space as

C = R2 × S 1 × R2 .

The first and third components here are 2-dimensional, while the circle is 1-dimensional. Thus, the
total dimension of C is 5. In other words the system has five degrees of freedom and we search for
five generalized coordinates.
This time a set of three pairs of Cartesian coordinates no longer parameterizes our space; we
need to find some other set of coordinates. A neat choice is to keep (x1 , y1 ) and (x3 , y3 ), but
instead of (x2 , y2 ) we simply consider the angle formed between particles 1 and 2, measured from
the horizontal at particle 1. We call this angle φ12 . Thus we can parameterize C with
(x1 , y1 , φ12 , x3 , y3 ). There are 5
generalized coordinates, and hence 5 degrees of freedom. In other words, the configuration space
is 5 dimensional. We can write our transformation equations back to the Cartesian coordinates
of each of the particles. Clearly the transformation equations for x1 , y1 , x3 , and y3 are trivial. The
other two are given by
 
x2 = x1 + L 12 cos φ12 and y2 = y1 + L 12 sin φ12 .
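These transformation equations carry all the constraint information: the generalised coordinates can be varied freely, and the rod constraint holds automatically. A minimal numerical sketch (with an illustrative rod length L12):

```python
import math

L12 = 1.0  # illustrative fixed rod length

def particle_2(x1, y1, phi12):
    # Transformation equations from the generalised coordinates
    # (x1, y1, phi12) to the Cartesian coordinates of particle 2.
    return (x1 + L12 * math.cos(phi12), y1 + L12 * math.sin(phi12))

# However we vary the unconstrained coordinates, the holonomic constraint
# (x2 - x1)^2 + (y2 - y1)^2 = L12^2 is satisfied automatically.
for (x1, y1, phi12) in [(0.0, 0.0, 0.3), (2.0, -1.0, 1.9), (-5.0, 4.0, 4.4)]:
    x2, y2 = particle_2(x1, y1, phi12)
    assert abs((x2 - x1)**2 + (y2 - y1)**2 - L12**2) < 1e-12
```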

Now let us add a second constraint to the system. We require that the distance between the first
and third particles is constant. This can once again be thought of as connecting them by a rigid,
mass-less rod. This constraint is Holonomic

(x3 − x1 )² + (y3 − y1 )² − L13 ² = 0.

Moreover it is independent of the previous constraint as it constrains the third particle. We thus
expect the degrees of freedom of the problem to reduce again by 1. Hence we expect that the
configuration space for this new system is 4 dimensional. Indeed, once we know the position of the
first particle, the possible positions of the third particle describe a circle of radius L 13 centred at the
first particle. Thus we can write the configuration space as

C = R2 × S 1 × S 1 .

Clearly this space has dimension four, and so this system has four degrees of freedom.
As before, we can parameterize this space by considering the angle between particle 1 and
particle 3. We can call this angle φ13 . (Note that every circle is fully parameterized by an angle,
because the radius is constant for a circle. Thus the space S 1 can be parameterized by an angle). Thus

an arbitrary point in C can be parameterized by (x1 , y1 , φ12 , φ13 ). The transformation equations to
get back x2 , y2 , x3 , y3 are given by
 
x2 = x1 + L 12 cos φ12 , y2 = y1 + L 12 sin φ12 ,

 
x3 = x1 + L 13 cos φ13 , y3 = y1 + L 13 sin φ13 .

We note in passing that the four parameters (x1 , y1 , φ12 , φ13 ) are free to move without holonomic
constraints between them. The constraints are implemented by the transformation equations.
We now add yet another constraint to the system. This time we require that the angle formed
between the two rods is a constant, θ . This constraint is Holonomic. To show this we’ll write it in
terms of our previous set of coordinates

φ12 − φ13 − θ = 0.

Since we used only coordinates and not velocities or other quantities in this expression, the constraint
is holonomic. Moreover since it relates φ12 and φ13 we can see that it is independent of previous
constraints. Thus we expect that it will reduce the number of degrees of freedom of the system by
one. We thus expect that the resulting Configuration space will have dimension 3. Indeed, directly
from the equation of constraint, we infer

φ13 = φ12 − θ .

It is redundant to include φ13 as a coordinate since the system is fully specified by φ12 alone (or
indeed by any one consistently defined angle). Therefore, we should write the configuration space as

C = R2 × S 1

The generalized coordinates in this case are (x1 , y1 , φ12 ). The transformation equations from
these to the original Cartesian coordinates are

x1 = x1 , y1 = y1 ,
 
x2 = x1 + L 12 cos φ12 , y2 = y1 + L 12 sin φ12 ,
 
x3 = x1 + L 13 cos φ12 − θ , y3 = y1 + L 13 sin φ12 − θ .

For an important purpose, we now add one final constraint to the system. With all the existing
constraints in place, we additionally require that the distance between particle 2 and particle 3 is
fixed to the value L 23 .
This constraint is Holonomic

(x3 − x2 )² + (y3 − y2 )² − L23 ² = 0,

but it does not reduce the degrees of freedom of the system any further. Why?
The answer to this question is that this constraint is not independent of previous constraints.
In other words we had enough information from previous constraints to show that the distance
between particle 2 and particle 3 is constant and solve for this distance. Indeed, we could write out
Cartesian coordinates for particles 2 and 3 from the transformation equations
r⃗2 = ( x1 + L12 cos (φ12 ) , y1 + L12 sin (φ12 ) ) and r⃗3 = ( x1 + L13 cos (φ12 − θ ) , y1 + L13 sin (φ12 − θ ) ) .

The distance between these points is given by

L23 ² = ( L13 cos (φ12 − θ ) − L12 cos (φ12 ) )² + ( L13 sin (φ12 − θ ) − L12 sin (φ12 ) )²
= L12 ² + L13 ² − 2 L12 L13 ( cos (φ12 ) cos (φ12 − θ ) + sin (φ12 ) sin (φ12 − θ ) )
= L12 ² + L13 ² − 2 L12 L13 cos (θ ) ,

which is an answer that we should expect from the cosine rule.
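This independence from the coordinates can be confirmed numerically. The sketch below (with illustrative values for L12, L13 and θ) evaluates the transformation equations at several values of the remaining coordinates and checks the squared distance against the cosine-rule value:

```python
import math

# Illustrative fixed parameters of the three-particle system.
L12, L13, theta = 1.0, 1.5, 0.6

def positions(x1, y1, phi12):
    # Transformation equations for particles 2 and 3.
    r2 = (x1 + L12 * math.cos(phi12), y1 + L12 * math.sin(phi12))
    r3 = (x1 + L13 * math.cos(phi12 - theta), y1 + L13 * math.sin(phi12 - theta))
    return r2, r3

# The squared distance predicted by the cosine rule.
d23_sq = L12**2 + L13**2 - 2 * L12 * L13 * math.cos(theta)

# The distance is the same for every value of the remaining coordinates,
# so the final constraint carries no new information.
for (x1, y1, phi12) in [(0.0, 0.0, 0.1), (3.0, -2.0, 2.7), (-1.0, 5.0, 5.5)]:
    (x2, y2), (x3, y3) = positions(x1, y1, phi12)
    assert abs((x3 - x2)**2 + (y3 - y2)**2 - d23_sq) < 1e-9
```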


This value is independent of all coordinates (L 12 , L 13 and θ are fixed parameters, not coordinates)
and is hence a fixed constant. Thus there is no new information in the constraint that particles 2
and 3 be at a fixed distance apart (and indeed this fixed distance is determined by the other
constraints).
This last point shows that we can only reduce the degrees of freedom of a system by imposing an
independent constraint that prevents the system from moving freely in a certain direction. To
eliminate a degree of freedom, such a constraint must also be holonomic.
The constraints we considered in this example were independent of time, but explicitly time
dependent constraints also reduce the degrees of freedom of a system if they are Holonomic. For
instance, if L 12 , L 23 and θ are known functions of time, the reasoning above still holds for any instant
in time. The shape of the configuration space would then change with time to accommodate these
changing external parameters.

We now consider several examples of generalized coordinates and equations of constraint
applied to dynamical problems. First we consider the familiar case of a simple pendulum. Next
we consider two related examples of motion constrained to an inclined plane.

Example 3.12 (Simple Pendulum Constraint) For the simple pendulum, the constraint on the
system is holonomic. In Cartesian Coordinates we wrote it as x 2 + y 2 = L 2 , or equivalently

f (x , y , t ) = x 2 + y 2 − L 2 = 0

Similarly, in polar coordinates the constraint could be expressed as

f (r, θ , t ) = r − L = 0

Example 3.13 (Particle Rolling on a Raised Platform) Suppose that a particle is free to roll on the
floor of a lift and the lift’s height is any prescribed function of time h (t ), see Figure 3.7. Then we can
write the constraint specifying that the particle is bound to the floor of the lift with the equation

z = h (t ) .

This general constraint is clearly holonomic

f (x , y , z , t ) = z − h (t ) = 0.

Suppose that the lift is moving at a constant velocity v upwards. Then the constraint on the particle
is that its height must match the height of the floor of the lift at all times. In other words

z =vt.

The constraint

f (x , y , z , t ) = z − v t = 0

is clearly holonomic and depends explicitly on time.

Figure 3.7: A particle rolling on a rising horizontal plane moving with velocity v⃗ .

Example 3.14 (A Rolling Tilted Wheel) Consider the case of a wheel rolling along a straight line
in the 2-dimensional plane, where the line along which the wheel rolls intersects the horizontal
axis at some angle, see Figure 3.8. As before the no-slipping constraint states that the velocity of
the point on the wheel in contact with the ground is zero. As before, this velocity is given by the
sum of two terms that act in opposite directions - the velocity of the whole wheel and the tangential
velocity due to rotation at the edge of the wheel

vtip = v − R θ̇ = 0 ⇒ v = R θ̇ .

Now the orientation of the disc in the plane is captured by the dotted axis. The no-slipping constraint
additionally tells us that the motion of the wheel must be perpendicular to this axis (otherwise the
wheel would slip sideways). Notice that this doesn’t constrain the wheel to move in a straight line, as
the acceleration is permitted to be in a direction different from the velocity - think back to circular
motion, where the velocity was tangent to the circle, for instance. To quantify this expression we
consider a top-view of the wheel on the plane, see Figure 3.8.
At an instant in time the wheel’s orientation looks as in Figure 3.8. Let the axis of the wheel
intersect the x -axis at an angle of φ as shown in the diagram. Some simple geometry then gives the
results
 
ẋ = v x = v sin φ and ẏ = v y = −v cos φ .
Substituting into these two equations our constraint on v , we get two differential equations
 
ẋ = R sin (φ ) θ̇ and ẏ = −R cos (φ ) θ̇ .

But since x , y , θ and φ are all functions of t and we only have these two equations relating them, we
cannot integrate these equations to obtain velocity-independent constraints. Thus the constraint of
no slipping for this problem is non-holonomic.
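The non-holonomic character of this constraint can be made concrete with a small numerical experiment. The sketch below (with an illustrative radius R = 1; the particular loop is an arbitrary choice) integrates ẋ = R sin(φ) θ̇ and ẏ = −R cos(φ) θ̇ around a closed loop in (θ, φ) and shows that (x, y) does not return to its starting point, something that could never happen for a holonomic constraint.

```python
import math

R = 1.0  # wheel radius (illustrative value)

def roll(x, y, phi, dtheta):
    # Integrate xdot = R sin(phi) thetadot and ydot = -R cos(phi) thetadot
    # over a segment where phi is held fixed while theta changes by dtheta.
    return x + R * math.sin(phi) * dtheta, y - R * math.cos(phi) * dtheta

x, y = 0.0, 0.0
# A closed loop in (theta, phi): roll at phi = 0, pivot in place to phi = pi/2
# (pivoting alone moves nothing since theta is fixed), roll back, pivot back.
x, y = roll(x, y, 0.0, 1.0)           # theta: 0 -> 1 at phi = 0
x, y = roll(x, y, math.pi / 2, -1.0)  # theta: 1 -> 0 at phi = pi/2
# (theta, phi) have returned to their starting values, but (x, y) has not:
print(round(x, 6), round(y, 6))  # -> -1.0 -1.0
```

Had the constraint been holonomic, a closed loop in the constrained coordinates would necessarily close in (x, y) as well.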

Figure 3.8: A wheel tilted at an angle φ.

Example 3.15 (A Block Sliding down an Inclined Plane) Consider the motion of a block on an
inclined plane with wedge angle θ and coordinates as in Figure 3.9. The constraint that the block
does not lift from or fall through the slope is represented as x sin (θ ) − y cos (θ ) = 0, or equivalently,
y /x = tan (θ ). For the same problem but with coordinates now chosen such that the x ′ -axis lies parallel
to the inclined plane and the y ′ -axis is perpendicular to the inclined plane, we would write the same
equation of constraint as y ′ = 0. If we allow the block to lift from the slope but not to fall through it,
the constraint becomes y ′ ≥ 0.

Figure 3.9: A block on the slope with wedge angle θ , showing the normal force N , the friction f , and the weight components mg sin (θ ) and mg cos (θ ).

For the Block on the Slope problem, the constraint that the block does not fall through or lift
from the slope is Holonomic. In our first set of coordinates it was

x sin (θ ) − y cos (θ ) = 0,

so we would write f (x , y , t ) = x sin (θ ) − y cos (θ ) = 0. In the second set of coordinates the same
constraint is

f (x ′ , y ′ , t ) = y ′ = 0.

However, the condition that the block not fall through the slope, whilst allowing it to lift from the
slope, is Non-Holonomic. We write this condition as

y ′ ≥ 0.

The inequality cannot be rewritten as an equality, and hence we cannot express the constraint as
f (x ′ , y ′ , t ) = 0 for any f .

Example 3.16 (A Wheel Rolling Down an Inclined Plane) Consider a wheel rolling down an
inclined plane as depicted in Figure 3.10. In this case, the constraint that the wheel does not lift
from or fall through the slope requires that the wheel follow a line in the (x , y )-coordinate system.
This line corresponds to all points that keep fixed the relationship between the x and y -coordinate
positions of the wheel as it moves along the slope and is represented as

y /x = tan (θ ) ,

or equivalently,

x sin (θ ) = y cos (θ ) .

This makes sense as it implies that the motion of the wheel along the slope occurs at a fixed angle θ .

Figure 3.10: A wheel on a slope with wedge angle θ .

For a wheel rolling without slipping along a straight line, the angle of the wheel and the distance
it has rolled are related. Consider the image above. It is intuitively clear that the distance the wheel
has rolled will be equal to the arc-length that has been in contact with the ground over the interval
of the rolling. Thus the distance along the ground and the arc-distance along the circle are equal. If
the wheel has radius R , we can then write the equation of constraint

d = Rθ.

If we placed a coordinate axis (x − y ) on the left hand side of Figure 3.10, then there would also
be the constraint that the wheel could not fall through or lift from the line. These two constraints
would then take the form
x = Rθ and y = 0.

The argument for the first constraint presented above was lacking somewhat in rigour. We now
derive this equation directly from the constraint that there be no slipping between the wheel and the
ground. This statement is the same as saying that the velocity of the point on the wheel in contact
with the ground is zero at all times (if this were not the case, then it would slide). This velocity is the
sum of two components - the velocity of the wheel, and the tangential velocity at the edge of the

wheel due to rotation. We observe that these components are in opposite directions, and so we can
write the magnitude of the resultant velocity as

vtip = ẋ − vtangential = ẋ − R θ̇ .

This velocity must be zero for no slipping to occur. So we must have

ẋ − R θ̇ = 0.

We can integrate this equation over time to obtain

x − x0 = R (θ − θ0 ).

This is the same constraint as the one we had above when we set x0 = θ0 = 0. For this constraint we must
allow ourselves to think of values of θ beyond 2π as corresponding to several rotations of the wheel.
This derivation has taught us that equations of constraint can depend on velocities or positions. In
this case we were able to integrate the velocity equation to obtain a position-dependent equation,
but this is not always possible.

3.4 Forces of Constraint


In the simple pendulum problem considered earlier, the two forces acting on the system were
gravity and the tension in the rod. This last force is something that we know must exist in order to
satisfy the condition that the rod remains of constant length and the bob remains fixed to one
end throughout the motion. In other words, the force was necessary in order for the system to
satisfy certain constraints. We call such a force a force of constraint.
We know that the value of the force of gravity is −mg x̂ but we must solve for the force of
tension in the rod so that it is consistent with the constraint that the bob remains at a fixed distance
from the origin. This is a general feature of forces of constraint - we do not know their values in
advance. Because of this last point, forces of constraint can cause some degree of difficulty when
solving complex systems using Newton’s Laws directly. This is one of the principal motivations
for Lagrange’s Formulation of Mechanics.
Forces of constraint often act perpendicular to the direction of motion of the system. For the
pendulum example, for instance, the force of constraint is radial and the motion of the bob is
tangential. For a block on a slope, the normal force (which is the force that prevents the block from
falling through the slope, and is hence the force of constraint for the system) is at right angles to
the surface of the slope. Because these forces act in a direction perpendicular to the motion
of the system, we notice that they do not do any work on the system. This turns out to be an
important observation, and is the foundation upon which the Lagrangian formulation is built.
Consider the lift-floor example for instance. The normal force points vertically upwards, and
the motion of the particle over some short period of time will be along the floor of the lift, which
itself has moved slightly in the upward direction. Thus there is a component of the normal force in

the direction of motion and d W = N⃗ · d r⃗ ̸= 0. Thus forces associated with time-varying constraints
do work on the particle. Later, we will find a work-around for this by defining the notion of virtual
work. We will then see that even time-varying constraint forces do no virtual work.

3.5 Work and Energy


A force only does work in a mechanical system if it causes some motion of the system. For example,
if one pushes against a wall but does not move it, then the state of the wall is unchanged. Thus, the
force applied in this case cannot be thought of as doing any work. Sometimes only part of the
force does any work. For instance, if one pushes a block at 45◦ to the ground, only the horizontal
component of the force will actually move the block, and thus only the horizontal component of
the force will do any work on the block. Formally, for a constant force, we define the work done
on the system from position 1 to position 2 as

W12 = F⃗ · (⃗r2 − r⃗1 ) = F⃗ · ∆⃗r ,

where the dot denotes the scalar product. Notice that W is a scalar quantity associated with
the force and the distance travelled due to the force.
In reality forces are seldom constant, but we can assume them to be constant over very short
periods of time (or very small displacements of the system), so we can write

d W = d r⃗ · F⃗ .

This can be integrated as we move the particle from position r⃗1 to position r⃗2 , yielding

W12 = ∫_{r⃗1}^{r⃗2} d r⃗ · F⃗ ,

where the path integral is performed along whatever path the particle traced out. We can
parameterize this path integral with the time parameter as follows

W12 = ∫_{r⃗1}^{r⃗2} d r⃗ · F⃗ = ∫_{t 1}^{t 2} d t (d r⃗/d t ) · (d (m v⃗)/d t ) = m ∫_{t 1}^{t 2} d t v⃗ · (d v⃗/d t ) = (1/2) m ∫_{t 1}^{t 2} d t (d/d t )(v⃗ · v⃗),

where in the last step we have made use of the product rule of differentiation. From here the
Fundamental Theorem of Calculus gives the result

W12 = (1/2) m v⃗ · v⃗ |_{t 2} − (1/2) m v⃗ · v⃗ |_{t 1} .

We thus introduce a quantity called the Kinetic Energy of the Particle

T = (1/2) m v⃗ · v⃗.

And we can write down the simple relation

W12 = T2 − T1 .

The kinetic energy can be thought of as a function of time, and the work W12 , expressed in terms
of kinetic energy, can also be thought of as the work done by all forces acting on the particle in
bringing it from its state at one point in time to its state at another.
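The relation W12 = T2 − T1 can be checked numerically in a simple setting. The following sketch (a crude Euler integration with illustrative values for the mass and a constant force) accumulates dW = F v dt along the motion and compares the total to the change in kinetic energy.

```python
m, F = 2.0, 3.0        # mass and constant 1-D force (illustrative values)
dt, n = 1e-4, 10000    # time step and number of steps
x, v = 0.0, 1.0        # initial position and velocity
W = 0.0
T1 = 0.5 * m * v * v
for _ in range(n):
    W += F * v * dt    # dW = F dx = F v dt
    v += (F / m) * dt  # Newton's second law
    x += v * dt
T2 = 0.5 * m * v * v
print(abs(W - (T2 - T1)) < 1e-2)  # -> True (up to discretization error)
```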
We cannot write kinetic energy as a function of only position, and so the above equation
cannot be used directly to reason about the work done in bringing the particle from one position
to another. Indeed, for many forces this question makes no sense, as the details of the motion of
the particle may well be important, and in such cases the force will do different amounts of work
when the particle travels on different paths between the points in space. Among all the interesting
forces that we can study we note the following special kind of force.

Definition 14 (Conservative Force) A force F⃗ that can be written as

F⃗ = −∇U ,

where U : Rn → R is a function only of position is called a conservative force. The function U is


called the potential associated with the force F⃗ .

A key property of conservative forces is that the work done by a conservative force is
independent of the path along which the force acts. In such a case, it follows that:

W12 = ∫_{r⃗1}^{r⃗2} d r⃗ · F⃗ = −∫_{r⃗1}^{r⃗2} d r⃗ · ∇U = U (⃗r1 ) − U (⃗r2 ) .

The last step follows from an extension of the fundamental theorem of calculus to many variables.
If the work done by some force in moving the particle from place to place is independent of the
path travelled by the particle, but dependent only on the starting and ending points of that path,
then we call the force path independent. We can (and will, later) show that all path independent
forces are conservative - that is, for such forces we can write F⃗ = −∇U , where U : Rn → R is a
function only of position and is called the Potential Energy of the particle associated with the force.

Remark 24 More generally, we shall use the gradient ∇U (⃗r ) to express the total infinitesimal
change of U , d U = ∇U (⃗r ) · d r⃗. Note that ∇U is a vector valued function

∇U (⃗r ) = Σ_{i =1}^{N} (∂ U (⃗r )/∂ xi ) x̂ i

in RN .

In the case where the net force acting on the system is a conservative force, we can equate the
two expressions we have for the work done by the force between positions/states 1 and 2 to get

W12 = T2 − T1 = U1 − U2 ,

and hence
T1 + U1 = T2 + U2 .

Since time/positions 1 and 2 were arbitrary, this tells us that the quantity T + U is constant. This
is called the total energy of the system, and it is conserved (constant) when the forces in question
are conservative (this is in fact why such forces are known as ‘conservative’ forces).
Note that we often show that a force is conservative by path independence - that is we work
out an expression for W12 by integration and notice that it is independent of path. There are
also other means for showing that a force is conservative that we’ll cover later. For now, let us
make use of path independence to show that the force of Gravity is conservative and compute its
potential energy. We know that in Cartesian coordinates, we can write the force of gravity acting
on a particle as F⃗ = −m g ẑ (Here ẑ refers to the unit vector in the z -direction). Now if we travel
along some path from r⃗1 = (x1 , y1 , z 1 )⊤ to r⃗2 = (x2 , y2 , z 2 )⊤ parameterized by
 
r⃗(t ) = (x (t ), y (t ), z (t ))⊤ ,

where t ∈ [a , b ], r⃗(a ) = r⃗1 and r⃗(b ) = r⃗2 , then we have the integral

W12 = ∫ d r⃗ · F⃗ = ∫_{a}^{b} d t (d r⃗/d t ) · F⃗ (⃗r (t )) = −mg ∫_{a}^{b} d t z ′ (t ) = −mg (z (b ) − z (a )) .

So we have
W12 = mg (z 1 − z 2 ) .

Clearly, this integral is independent of path. Moreover, we can from this decide on our potential
function
U (x , y , z ) = mg z .

Notice that because the work is defined in terms of differences in potential energy and force is
defined in terms of its gradient, it is possible to add a constant to the potential function without
altering the dynamics of the system under description. Thus we can indeed set

U (x , y , z ) = mg z + K .

It is common to set some reference height, z 0 , and by letting K = −mg z 0 , define

U (x , y , z ) = mg (z − z 0 ).
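As a quick symbolic check that the constant K does not alter the force, one can compute −∇U directly. The sketch below uses the sympy library (an assumed tool, not part of the course material):

```python
import sympy as sp

x, y, z, m, g, K = sp.symbols('x y z m g K', real=True)
U = m * g * z + K  # potential with an arbitrary constant offset
# F = -grad U: the offset K contributes nothing to the force
F = [-sp.diff(U, c) for c in (x, y, z)]
print(F)  # -> [0, 0, -g*m]
```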

Example 3.17 (Raising a Particle above the Surface of the Earth) The force of attraction
between two particles of mass m1 and m2 separated by a distance r is given by Newton’s Law of
Universal Gravitation
F⃗ = −G (m1 m2 /r 2 ) r̂ ,
where G = 6.6743 × 10−11 m 3 k g −1 s −2 is Newton’s Gravitational constant. We can study the
motion of a single particle near the surface of the earth by replacing m1 with the mass of the Earth
M E = 5.9722 × 1024 k g , and replacing m2 with the mass of test particle m.
Let’s compute the work done against gravity when we raise a test particle from the surface of the
earth to some point h above the surface of the earth. Then,

W = ∫ d r⃗ · F⃗ = −G ME m ∫_{R}^{R +h} d r /r 2 = G ME m ((1/(R + h )) − (1/R )) = −G ME m h /(R 2 (1 + h /R )) .

Clearly, the force F⃗ is conservative, so the work done in moving a particle subject to this force is
independent of the path taken in moving the particle. Let g = G ME /(R 2 (1 + h /R )) denote the gravitational
acceleration of the test particle at a height h above the surface of the earth. Then,

W = −mg h .

When h ≪ R , then

g = G ME /R 2 = 9.81932 m s −2 ,

which is the familiar gravitational acceleration used previously. It is easy to see that the force is
conservative since the work done by this force depends only on the radial position of the test
particle, and not on the path taken in moving it.

Remark 25 The appearance of a negative sign in the value of the work done in raising a test particle
above the surface of the earth in Example 3.17 indicates that the work done in moving the particle
from R r̂ to (R + h )r̂ is done not by the force F⃗ , but rather by an external agent (the
person who moves the particle) working against gravity. When the test particle is allowed to fall from a
height R + h down to a height R above the surface of the earth, it is the force of attraction between
the particles that makes the test particle fall down and so it does work on the particle. When the
particle falls down, the work done by gravity through this distance is W = +mg h . We associate the
potential energy of the test particle with the work needed to move it.
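The ratio between the approximation mgh and the exact expression above is exactly (1 + h/R), so the approximation degrades linearly in h/R. A short numeric sketch (the numerical Earth radius below is an assumed value, as it is not quoted in the text):

```python
G = 6.6743e-11   # m^3 kg^-1 s^-2
ME = 5.9722e24   # kg
R = 6.371e6      # m, mean Earth radius (assumed value)
m = 1.0          # kg test particle
g0 = G * ME / R**2

for h in (1.0, 1e3, 1e5, 1e6):
    W_exact = G * ME * m * h / (R**2 * (1 + h / R))
    W_approx = m * g0 * h
    # m g h overestimates the magnitude by exactly the factor (1 + h/R)
    print(f"h = {h:>9.0f} m, ratio = {W_approx / W_exact:.6f}")
```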

All of the above results apply to one single particle, but the ideas of Work and Energy extend
readily to systems involving many particles. When we want to reason about the kinetic or potential
energy of a system of particles we simply add up the Kinetic/Potential Energies of each particle in
the system. Thus if we have N particles with positions {⃗r1 , r⃗2 , . . . , r⃗N } and velocities {v⃗1 , v⃗2 , . . . , v⃗N }
we can write the Kinetic Energy of the system as

T = (1/2) Σ_{i =1}^{N} mi v⃗i · v⃗i .
This result can be justified in a similar manner to the result for a single particle (by computing the
work done on a system by the net force on the system), and this justification is left to the reader.
Similarly, the potential energy of a system is the sum of the potential energies of each of
the particles in the system. So, for instance, the gravitational potential energy of a system of N
particles is given by
U = g Σ_{i =1}^{N} mi z i .
Later in the course we will expand these expressions to see how they affect rigid body dynamics.
Below, an alternate (more direct) derivation of the conservation of energy is presented. This is
often the only derivation presented in physics courses, but one might argue this to be less intuitive
than the earlier derivation based on work.
d/d t (T + U ) = d/d t Σ_{i =1}^{N} ( (1/2) mi v⃗i · v⃗i + Ui )
 = Σ_{i =1}^{N} ( (1/2) mi (d/d t )(v⃗i · v⃗i ) + (d/d t ) Ui )
 = Σ_{i =1}^{N} ( mi a⃗i · v⃗i + (∂ Ui /∂ xi )(d xi /d t ) + (∂ Ui /∂ yi )(d yi /d t ) + (∂ Ui /∂ z i )(d z i /d t ) )
 = Σ_{i =1}^{N} ( mi a⃗i · v⃗i − F⃗i · v⃗i )
 = Σ_{i =1}^{N} ( F⃗i · v⃗i − F⃗i · v⃗i )
 = 0,

where we have used F⃗i = −∇Ui in the fourth line and Newton’s second law, mi a⃗i = F⃗i , in the fifth.

The derivation above applies to systems of particles. It is left as an exercise to adapt the work-based
derivation to systems of particles.
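The conservation of T + U can also be observed numerically. The sketch below (illustrative values throughout) drops a particle under constant gravity using a semi-implicit Euler step and checks that the total energy barely drifts:

```python
m, g = 1.0, 9.81     # mass and gravitational acceleration (illustrative values)
dt, n = 1e-4, 20000  # time step and number of steps (2 s of fall)
z, v = 100.0, 0.0    # initial height and velocity
E0 = 0.5 * m * v * v + m * g * z
for _ in range(n):
    v -= g * dt      # update velocity first (semi-implicit Euler)
    z += v * dt
E = 0.5 * m * v * v + m * g * z
print(abs(E - E0) / E0 < 1e-3)  # -> True
```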

3.6 Exercises
Exercise 3.1 Extend the single particle work-based derivation from Section 3.5 to systems of
particles.

Exercise 3.2 Consider the earth and Newton’s Law of Universal Gravitation

GMm
F⃗ = − r̂
r2
where M is the mass of the earth, m is the mass of a test particle and G is Newton’s Gravitational
Constant. Suppose that the earth has radius R . Prove that the work done to move a particle vertically
above the surface of the earth is independent of the path taken.

Exercise 3.3 Consider the earth and Newton’s Law of Universal Gravitation

GMm
F⃗ = − r̂
r2
where M is the mass of the earth, m is the mass of a test particle and G is Newton’s Gravitational
Constant. Suppose that the earth has radius R . Now suppose that a particle, initially at rest on the
surface of the earth, is raised to a height of h ≪ R above the surface of the earth. Show that the work
needed to raise this particle has magnitude mg h . Find an expression for g in terms of R , M
and G .

Exercise 3.4 Explain why the commonly used expression for gravitational potential energy,

U = mg h ,

is an approximation to the actual gravitational potential energy and give some limits on the validity
of this approximation. Demonstrate how this approximation degrades using a concrete example
with numerical values.

Chapter 4

Lagrangian Formalism

When solving problems with Lagrange’s Equations, one of the most important steps is to obtain
an expression for Kinetic Energy. Listed below are some techniques for obtaining the Kinetic
Energy expression in our set of generalized coordinates. When there is more than one particle in
the system we generally consider the Kinetic Energy of each particle individually and then add up
the results to get the total Kinetic Energy.

4.1 Single Particle Dynamics


In what follows we shall consider the Kinetic Energy of a single particle. There are two main
approaches that we can use to study the dynamical motion of a single particle: brute
force computation, and a general expansion of the transformed Newtonian equations in a judicious
choice of coordinate system. We shall study these next.

4.1.1 Coordinate Systems and Computations


Brute force computation is the most reliable method as it will always work. We compute the Cartesian
components of the velocity vector in terms of the generalized coordinates and compute its magnitude. In
symbols,

T = (1/2) m (ẋ 2 + ẏ 2 + ż 2 ) = (1/2) m r⃗˙ · r⃗˙ = (1/(2m )) p⃗ · p⃗ .
We work out each of ẋ , ẏ , ż in the required coordinates and add. As an example let us consider
polar cylindrical coordinates
 
x = ρ cos φ , y = ρ sin φ and z = z.

Then,

ẋ = cos (φ ) ρ̇ − ρ sin (φ ) φ̇ , ẏ = sin (φ ) ρ̇ + ρ cos (φ ) φ̇ and ż = ż .

Summing the squares of these quantities yields

r⃗˙ · r⃗˙ = (cos (φ ) ρ̇ − ρ sin (φ ) φ̇ )2 + (sin (φ ) ρ̇ + ρ cos (φ ) φ̇ )2 + ż 2
 = ρ̇ 2 (cos2 (φ ) + sin2 (φ )) + ρ 2 φ̇ 2 (sin2 (φ ) + cos2 (φ )) − 2 f + 2 f + ż 2
 = ρ̇ 2 + ρ 2 φ̇ 2 + ż 2 ,

where f = ρ ρ̇ φ̇ cos (φ ) sin (φ ). Thus the Kinetic Energy in Polar Cylindrical Coordinates is given by

T = (1/2) m (ρ̇ 2 + ρ 2 φ̇ 2 + ż 2 ).
When there is no extra information about the coordinates, this is generally the best approach to
use.
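The brute force computation above is mechanical enough to delegate to a computer algebra system. A sketch using the sympy library (an assumed tool, not part of the course material) that reproduces the cylindrical kinetic energy:

```python
import sympy as sp

t = sp.symbols('t')
m = sp.symbols('m', positive=True)
rho, phi, z = (sp.Function(name)(t) for name in ('rho', 'phi', 'z'))

# Cartesian components in terms of polar cylindrical coordinates
x = rho * sp.cos(phi)
y = rho * sp.sin(phi)

# T = (1/2) m (xdot^2 + ydot^2 + zdot^2), simplified symbolically
T = sp.Rational(1, 2) * m * (sp.diff(x, t)**2 + sp.diff(y, t)**2 + sp.diff(z, t)**2)
T = sp.simplify(T)

expected = sp.Rational(1, 2) * m * (rho.diff(t)**2 + rho**2 * phi.diff(t)**2 + z.diff(t)**2)
print(sp.simplify(T - expected) == 0)  # -> True
```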
While brute force computations are sure to work, it is often possible to greatly simplify
computations by a judicious choice of coordinates to describe the system. For each choice of
coordinates, there is a corresponding transformed set of equations that define the dynamics of
the system. The transformation equation for some particle

r⃗ = r⃗ (q 1 , q 2 , . . . , q m , t )
yields the transformation of velocity


r⃗˙ = Σ_j (∂ r⃗/∂ q j ) q̇ j + ∂ r⃗/∂ t .

Now the inner product of the velocity with itself will give

r⃗˙ · r⃗˙ = ( Σ_j (∂ r⃗/∂ q j ) q̇ j + ∂ r⃗/∂ t ) · ( Σ_k (∂ r⃗/∂ q k ) q̇ k + ∂ r⃗/∂ t ) .

Expanding this out yields

r⃗˙ · r⃗˙ = Σ_{j ,k } ((∂ r⃗/∂ q j ) · (∂ r⃗/∂ q k )) q̇ j q̇ k + 2 Σ_j ((∂ r⃗/∂ q j ) · (∂ r⃗/∂ t )) q̇ j + (∂ r⃗/∂ t ) · (∂ r⃗/∂ t ).

Notice the presence of the tangent vectors in this expression. Writing the i -th tangent vector as e⃗i ,
the above equation becomes

r⃗˙ · r⃗˙ = Σ_{j ,k } (e⃗j · e⃗k ) q̇ j q̇ k + 2 Σ_j (e⃗j · (∂ r⃗/∂ t )) q̇ j + (∂ r⃗/∂ t ) · (∂ r⃗/∂ t ).

The dot product e⃗j · e⃗k is usually denoted g j k - a component of the covariant metric tensor.

Remark 26 This all simplifies into just the first term if we allow for a ‘time tangent vector’ e⃗t
and let the indices j and k run over it as well, but this is not the convention.

In general then, the Kinetic Energy can be written as the sum of three terms

T = T0 + T1 + T2 ,

where

T0 = (1/2) m Σ_{j ,k } g j k q̇ j q̇ k , T1 = m Σ_j (e⃗j · (∂ r⃗/∂ t )) q̇ j and T2 = (1/2) m (∂ r⃗/∂ t ) · (∂ r⃗/∂ t ).
If for some coordinate system we have computed the tangent vectors, it may be worthwhile to
work out the Kinetic Energy directly from this expression. Also note that if the transformation
equations are not explicitly dependent on time, then T = T0 and the other two terms vanish. By
symmetry of the inner product

g j k = e⃗j · e⃗k = e⃗k · e⃗j = g k j .

Thus, we can rewrite the expression for T0 as

T0 = (1/2) m Σ_{j ,k } g j k q̇ j q̇ k = (1/2) m Σ_j (h j )2 (q̇ j )2 + m Σ_{j <k } g j k q̇ j q̇ k ,

where h j is the j -th metric coefficient, (h j )2 = g j j .


There are several different coordinate systems that may be useful when computing the
dynamics of particles. Several of these will already be familiar to you. Let’s investigate how each
choice of coordinate system reveals something different about the problem of computing the
dynamics of a single particle.

Rectilinear Coordinates: If the tangent vectors are orthogonal, the general expression for Kinetic
Energy can be simplified quite considerably. In this case:
g j k = e⃗j · e⃗k = (h j )2 if j = k , and g j k = 0 otherwise,

where h j are non-zero constants. So, T0 becomes

T0 = (1/2) m Σ_j (h j )2 (q̇ j )2 .

In three dimensions this is more simply

T0 = (1/2) m ((h1 )2 (q̇ 1 )2 + (h2 )2 (q̇ 2 )2 + (h3 )2 (q̇ 3 )2 ).

Polar Coordinates: In polar coordinates the tangent vectors are orthogonal, viz. e⃗r = r̂ , e⃗θ = r θ̂ .
Thus the metric coefficients are

hr = ∥e⃗r ∥ = 1 and hθ = ∥e⃗θ ∥ = r.

Since the transformation equations have no explicit time dependence, it follows that T = T0
and

T = (1/2) m (12 ṙ 2 + r 2 θ̇ 2 ) = (1/2) m (ṙ 2 + r 2 θ̇ 2 ).

Cylindrical Coordinates: Cylindrical Coordinates are also orthogonal, with metric coefficients
given by

hρ = ∥e⃗ρ ∥ = 1, hz = ∥e⃗z ∥ = 1, and hφ = ∥e⃗φ ∥ = ρ.

It follows that

T = (1/2) m (ρ̇ 2 + ż 2 + ρ 2 φ̇ 2 ).
Spherical Coordinates: Yet again we remark that these are orthogonal coordinates, and the metric
coefficients are

hρ = ∥e⃗ρ ∥ = 1, hθ = ∥e⃗θ ∥ = ρ, and hφ = ∥e⃗φ ∥ = ρ sin (θ ) .

Thus,

T = (1/2) m (ρ̇ 2 + ρ 2 θ̇ 2 + ρ 2 sin2 (θ ) φ̇ 2 ).
Notice how it is almost immediate to write down the Kinetic Energy in orthogonal coordinate
systems with this method once we know the metric coefficients. For general coordinate
systems, the same is true once we know the components of the covariant metric tensor.
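The tangent vectors and the diagonal metric components g j j = (h j )² can be generated mechanically from the transformation equations. A sketch for spherical coordinates using the sympy library (an assumed tool):

```python
import sympy as sp

rho, theta, phi = sp.symbols('rho theta phi', positive=True)
# Position vector in spherical coordinates
r = sp.Matrix([rho * sp.sin(theta) * sp.cos(phi),
               rho * sp.sin(theta) * sp.sin(phi),
               rho * sp.cos(theta)])

q = (rho, theta, phi)
e = [r.diff(qi) for qi in q]               # tangent vectors e_j = dr/dq^j
g = [sp.simplify(ei.dot(ei)) for ei in e]  # diagonal metric components g_jj = h_j^2
print(g)
```

The printed list recovers g_ρρ = 1, g_θθ = ρ² and g_φφ = ρ² sin²(θ), matching the metric coefficients quoted above.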

4.1.2 A Sanity Check


We derived Lagrange’s Equations from Newton’s Laws and the assumption that the net work done
by the forces of constraint vanishes. Therefore we should expect to get Newton’s Laws back for
an unconstrained particle moving under the influence of some force, F⃗ . Let us now test this claim.
To do this, we consider a particle moving in 2-dimensional space. Since there are no constraints
on the system we are free to choose any parameterisation of 2-dimensional space. To start with,
we choose Cartesian Coordinates. The generalized components of force are then

Q x = e⃗x · F⃗ = Fx (∂ x /∂ x ) + Fy (∂ y /∂ x ) = Fx ,
Q y = e⃗y · F⃗ = Fx (∂ x /∂ y ) + Fy (∂ y /∂ y ) = Fy .

The Kinetic Energy is


T = (1/2) m (ẋ 2 + ẏ 2 ).
And so Lagrange’s Equations are

d/d t (∂ T /∂ ẋ ) − ∂ T /∂ x = Fx ⇒ d/d t (m ẋ ) = Fx ⇒ m ẍ = Fx ,
d/d t (∂ T /∂ ẏ ) − ∂ T /∂ y = Fy ⇒ d/d t (m ẏ ) = Fy ⇒ m ÿ = Fy .

These are precisely Newton’s Equations in 2-dimensional Cartesian Coordinates - exactly what
we were hoping to find.

One of the beautiful properties of Lagrange’s Equations of Motion is their transformation
invariance. The equations will represent the same motion regardless of the coordinates chosen
(provided they are correct generalized coordinates for the system of choice). As an illustration of
this we will use them to solve for the motion of a particle in free space undergoing force F⃗ , but
this time using polar coordinates (r, θ ) as our generalized coordinates for the system. Recall the
transformation equations
x = r cos (θ ) and y = r sin (θ ) .

It remains to write down expressions for the generalized components of force and the Kinetic
Energy in Polar coordinates. These are both quite easy to obtain. First we consider the Generalized
Components of Force

Q r = e⃗r · F⃗ = r̂ · F⃗ = Fr ,
Qθ = e⃗θ · F⃗ = r θ̂ · F⃗ = r Fθ .

Now the Kinetic Energy - this is immediate by the trick for orthogonal coordinates above

T = (1/2) m (ṙ 2 + r 2 θ̇ 2 ).
2
Thus we can write down Lagrange’s Equations in Polar Coordinates

d/d t (∂ T /∂ ṙ ) − ∂ T /∂ r = Q r ⇒ d/d t (m ṙ ) − m r θ̇ 2 = Fr ⇒ m r̈ = Fr + m r θ̇ 2 ,
d/d t (∂ T /∂ θ̇ ) − ∂ T /∂ θ = Qθ ⇒ d/d t (m r 2 θ̇ ) = r Fθ ⇒ m r 2 θ̈ = r Fθ − 2m r ṙ θ̇ .

Thus the equations of motion in Polar Coordinates are given by

m r̈ = Fr + m r θ̇ 2 and m θ̈ = (1/r ) Fθ − 2m ṙ θ̇ /r .
Indeed, these are the same equations of motion we obtained by transforming Newton’s Laws, but
this time we got them quite easily.
It should be quite apparent now that we can transform Newton’s Equations to any coordinate
system simply by writing out Lagrange’s Equations in that coordinate system. Indeed, this is true
for generalized coordinates on arbitrary constraint surfaces as well (after all that is how we arrived
at Lagrange’s Equations in the first place).
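These manipulations can likewise be automated. The sketch below uses the sympy library (an assumed tool) to build the left-hand sides of Lagrange's equations in polar coordinates and recover the equations of motion above:

```python
import sympy as sp

t = sp.symbols('t')
m, Fr, Fth = sp.symbols('m F_r F_theta')
r = sp.Function('r')(t)
th = sp.Function('theta')(t)

# Kinetic energy in polar coordinates
T = sp.Rational(1, 2) * m * (r.diff(t)**2 + r**2 * th.diff(t)**2)

# Left-hand sides of Lagrange's equations: d/dt(dT/dqdot) - dT/dq
lhs_r = sp.diff(sp.diff(T, r.diff(t)), t) - sp.diff(T, r)
lhs_th = sp.diff(sp.diff(T, th.diff(t)), t) - sp.diff(T, th)

# Setting these equal to Q_r = F_r and Q_theta = r F_theta gives the
# equations of motion derived in the text.
print(sp.expand(lhs_r))   # = m rddot - m r thetadot^2
print(sp.expand(lhs_th))  # = m r^2 thetaddot + 2 m r rdot thetadot
```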

4.2 Conservative Forces


Recall from Definition 14 that a force F⃗ is conservative if there exists a scalar function U such that
F⃗ = −∇U . Mathematicians don’t include the minus sign when defining conservative vector fields.
For Physics it makes sense to include the minus sign because of what it implies for Work and Energy.
Here U is a function of position only (no explicit time dependence, not a function of velocity).
In this section we shall spend some time acquainting ourselves with some of the properties of
Conservative forces and their applications to Lagrangian Mechanics.

4.2.1 Path Independence
As stated previously, the work done by a force is independent of the path travelled by the particle
if and only if the force is conservative,

Conservative Force ⇐⇒ Path Independence

We now prove this result.

Theorem 1 (Work done by conservative forces) The work done by a conservative force is
independent of the path along which the force is applied.
Proof. Suppose F⃗ = −∇U is conservative and let γ : R → Rn be a path given by the position
vector r⃗(t ) on the domain [t 1 , t 2 ], such that r⃗(t 1 ) = a⃗ and r⃗(t 2 ) = b⃗ . Then γ is a path from a⃗ to b⃗ .
We want to show that the work done by F⃗ between a⃗ and b⃗ is independent of the choice of γ.
Define

h (t ) = U (⃗r (t )) ,

then, the chain rule gives us

h ′ (t ) = (∂ U /∂ r1 )(d r1 /d t ) + (∂ U /∂ r2 )(d r2 /d t ) + · · · + (∂ U /∂ rn )(d rn /d t ) = ∇U · (d r⃗/d t ).

But by definition F⃗ = −∇U , and so

h ′ (t ) = −F⃗ · (d r⃗/d t ) .

Then the work done in moving along this path is

Wa b ,γ = ∫_γ d r⃗ · F⃗ = ∫_{t 1}^{t 2} d t F⃗ · (d r⃗/d t ) = −∫_{t 1}^{t 2} d t h ′ (t ) = h (t 1 ) − h (t 2 ) ,

where the last step follows from the Fundamental Theorem of Calculus. We write

Wa b ,γ = U (⃗r (t 1 )) − U (⃗r (t 2 )) = U (a⃗) − U (b⃗ ).

Notice that the work done is just the difference of the potential energy function evaluated at the start
and the end of the path. Thus it is independent of the path itself. So, we have shown that the work
done by a conservative force is independent of path,

Conservative Force ⇒ Path Independence

provided the domain is connected and open. We now show that if the work done by some force
along all paths between two points is the same, then we can find U such that F⃗ = −∇U . We present
here a constructive proof in which we build the function U and show that it satisfies the necessary
property.

Figure 4.1: The path of a particle in motion in 3-dimensional space.

Fix some point p⃗ = (x0 , y0 , z 0 ) in the domain of F⃗ . Now define the function U as

U (⃗s ) = −∫_γ d r⃗ · F⃗ ,

where γ is any path from p⃗ to s⃗ = (x , y , z ). Note that this function is well-defined because the work
done is independent of the path, γ.
Since the domain is an open set we can find some open ball, B , around s⃗ contained in the
domain. We choose q ∈ B with the same coordinates as s⃗ except in the first coordinate

q⃗ = (x1 , y , z ).

4.1. Now we choose any curve C from p⃗ to q⃗ and then follow the straight line L from q⃗
See Figure 4.1
to s⃗. This new curve is as good as any for the computation of U .
Z Z
s ) = − d r⃗ · F⃗ − d r⃗ · F⃗
U (⃗
C L

Now because q⃗ is independent of x , so is the first integral above. Thus the partial derivative of U
with respect to x is

∂ U /∂ x = −(∂ /∂ x ) ∫_C d r⃗ · F⃗ − (∂ /∂ x ) ∫_L d r⃗ · F⃗ .

The first term above is then clearly zero. Thus,

∂ U /∂ x = −(∂ /∂ x ) ∫_L d r⃗ · F⃗ = −(∂ /∂ x ) ∫_{x1}^{x} d t Fx (t , y , z ) = −Fx (x , y , z ),

where the last step follows from the Fundamental Theorem of Calculus. It is easy to see how to
construct paths to compute ∂ U /∂ y and ∂ U /∂ z in a similar fashion. It follows that

F⃗ = −∇U ,

and so if the work done by a force is independent of path, the force is Conservative! Thus Conservative
Forces are precisely those forces that do the same work on any path between two points.
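Path independence for gravity can be illustrated numerically by evaluating the work integral along two different paths with the same endpoints (the particular paths below are arbitrary illustrative choices):

```python
import math

def work(path, n=10000):
    # Riemann-sum approximation of W = int F . dr for F = (0, -m g)
    m, g = 1.0, 9.81
    W = 0.0
    for i in range(n):
        _, y0 = path(i / n)
        _, y1 = path((i + 1) / n)
        W += -m * g * (y1 - y0)  # F_x = 0 contributes nothing
    return W

straight = lambda t: (t, t)  # straight line from (0, 0) to (1, 1)
wiggly = lambda t: (math.sin(math.pi * t / 2), t**3 * (2 - t**2))  # same endpoints
print(round(work(straight), 6), round(work(wiggly), 6))  # -> -9.81 -9.81
```

Both paths give the same work, −mg Δy, exactly as the theorem demands.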

Path Independence ⇐⇒ No work on a Closed Loop

Another way of phrasing the path independence criterion is to say that the work done on any closed
loop is zero. One part of this is easy to see - if a force is path independent then it is conservative, and
the work done by the force when the particle is moved from a⃗ to b⃗ along any path γ is

Wa b ,γ = U (a⃗) − U (b⃗ ).

But if the path is a closed loop then the starting point is the end-point, i.e. a⃗ = b⃗ , and then clearly

Wγ = U (a⃗) − U (b⃗ ) = 0.

So the work done on any closed path is zero.


For the converse we are given that the work done by the force around any closed loop is zero and
we must show that the work done from one place to another is independent of path.

Figure 4.2: Arbitrary closed paths.

Consider Figure 4.2. Let the top path and the bottom path be any two arbitrary paths
between a⃗ and b⃗ . Let the bottom path be γ1 and the top path be γ2 . The reverse of the top path,
−γ2 , goes from b⃗ to a⃗. Now we can travel from a⃗ to b⃗ along γ1 and then back to a⃗ again along −γ2
and we will have followed a closed loop. Thus, the work done is zero

∫_{γ1} d r⃗ · F⃗ + ∫_{−γ2} d r⃗ · F⃗ = 0.

Reversing the parameterization to get back γ2 we get

∫_{γ1} d r⃗ · F⃗ − ∫_{γ2} d r⃗ · F⃗ = 0.

Hence,

∫_{γ1} d r⃗ · F⃗ = ∫_{γ2} d r⃗ · F⃗ .

Thus the work done is independent of the path travelled. Thus for any force on an open, connected
domain the following statements are equivalent:

1. The Force is Conservative: there is a function U of position such that F⃗ = −∇U .

2. The work done by the force is independent of path.

3. The work done by the force around any closed loop is zero.
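These equivalences can be checked concretely on a simple example. The sketch below (a minimal illustration, assuming the SymPy library is available) takes the force F⃗ = −∇U for U = x² + y² and evaluates the work integral ∫ d r⃗ · F⃗ along two different paths between the same end-points; both give the same value, U(a⃗) − U(b⃗).

```python
import sympy as sp

t = sp.symbols('t')

# Conservative force from the potential U = x^2 + y^2, so F = -grad U = (-2x, -2y).
F = lambda x, y: (-2*x, -2*y)

def work(x, y):
    """Work integral of F along the parameterised path (x(t), y(t)), t in [0, 1]."""
    Fx, Fy = F(x, y)
    integrand = Fx*sp.diff(x, t) + Fy*sp.diff(y, t)
    return sp.integrate(integrand, (t, 0, 1))

# Two different paths from (0, 0) to (1, 1): a straight line and a parabola.
W_line = work(t, t)
W_parabola = work(t, t**2)

# Both equal U(start) - U(end) = 0 - 2 = -2, independent of the path.
print(W_line, W_parabola)  # -2 -2
```

The same check fails for a non-conservative force such as (−y, x), for which the two paths give different values.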

4.2.2 Potential Energy and Kinetic Energy


As remarked previously, the potential energy is not unique in that we can add any constant to its value without changing the force involved (show this). It is generally convenient to define a reference point somewhere in space at which we set the potential energy to be zero. After this, the potential energy will be unique. Let the reference point be r⃗0. In moving from any point r⃗ to this reference point, our conservative force will do the following work:

W_{r⃗,r⃗0} = U(r⃗) − U(r⃗0) = U(r⃗) − 0 = U(r⃗).

We can now attach an unambiguous physical meaning to the potential energy: the potential energy of a particle is the work that the conservative force(s) in the system would do to bring that particle from its current location to the reference point. It was shown in Chapter 1 that in the case where the net force is a conservative force, when we move a particle from one state to another the work done can be expressed as

W12 = T2 − T1   and   W12 = U1 − U2.

Comparing the two equations, we get T1 + U1 = T2 + U2, indicating that the quantity T + U is conserved. The first equation gives us a physical interpretation of Kinetic Energy. The Kinetic Energy of a Particle is equal to the work that was done by the net force on the system to bring the particle from rest to its current velocity.
Physicists like to think of the Kinetic Energy as the amount of energy encoded in the motion
of the particle, and Potential Energy as the energy stored inside the particle that has not yet been
converted to motion (hence the name potential energy). If a particle has a potential energy of
zero we would say all of its energy is being used, whilst if it has a Kinetic Energy of zero, we would
say all of its energy is stored. The Physics of conservative systems is essentially just the process of
converting energy from one type to another. When both conservative and non-conservative forces
are acting in the system, the total energy is no longer conserved. Instead we have the following relationships:

W12 = T2 − T1 = U1 − U2 + ∫_γ d r⃗ · F⃗ ^(nc),
where F⃗ ^(nc) is the non-conservative part of the total force acting on the particle (notice how the work done by this part is path-dependent). Comparing these equations we get

(T2 + U2) − (T1 + U1) = ∫_γ d r⃗ · F⃗ ^(nc).

We can think of the left-hand-side above as the change in total energy of the system. So, we can write

ΔE = E2 − E1 = ∫_γ d r⃗ · F⃗ ^(nc).

The force of Friction, for example, is a non-conservative force that decreases the total energy of
the mechanical system.
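As a concrete sketch of this energy bookkeeping (plain Python with made-up numbers), consider a block sliding on a horizontal surface, so U is constant and the only non-conservative force is kinetic friction of magnitude μmg opposing the motion. The relation above predicts E2 − E1 = −μmg d after sliding a distance d, which agrees with the kinematic result v2² = v1² − 2μg d.

```python
import math

# Illustrative (made-up) parameters.
m, g = 2.0, 9.8      # mass (kg) and gravitational acceleration (m/s^2)
mu = 0.3             # coefficient of kinetic friction
v1 = 5.0             # initial speed (m/s)
d = 1.5              # sliding distance (m)

# Kinematics under the constant deceleration mu*g gives the final speed.
v2 = math.sqrt(v1**2 - 2*mu*g*d)

# On a horizontal surface U is constant, so the change in total energy is just
# the change in kinetic energy.
T1, T2 = 0.5*m*v1**2, 0.5*m*v2**2
W_nc = -mu*m*g*d     # work done by friction, which opposes the displacement

print(abs((T2 - T1) - W_nc) < 1e-9)  # True: Delta E equals the friction work
```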

Remark 27 In a deep sense in physics, energy is always conserved. When a mechanical system
appears to lose energy, this is normally because we are not accounting for some part of the system
into which the energy is ‘seeping’. Friction, for instance, is a very complicated electrostatic process
through which electric energy is released (if you rub your feet on the carpet and then touch a metal
surface a spark is produced from the charge of electric energy), sound energy is released (the sound
of an object sliding on a table, for instance, in which each sound wave carries energy with it), heat
energy is released, etc. If we were to perform a very careful bookkeeping of all the aspects of a system
(keep track of all electric energy, follow all the sound waves, etc), then energy would still be conserved.
The reason we don’t do this in practice is because it would add an immense amount of complexity
to the problem for little immediate gain. In order to do modelling, we need to make simplifications.
Thus treating some forces as ‘non-conservative’ forces is often useful and necessary in our problems.
Forces like friction are called dissipative because they dissipate the energy of the system into various
forms that we are not keeping track of.

4.2.3 Conditions for a Force to be Conservative


Earlier we showed that Conservative Forces are precisely those forces that do the same work over
different paths with the same endpoints. We also showed that they are precisely those forces that
do no work when the particle is moved in a closed loop. However, these are not easy conditions to
use when we want to decide whether a given force is conservative. In this section we consider a
much more convenient method, which we’ll derive from Stokes’s Theorem.

Theorem 2 (Stokes) Let D ⊂ R³ be an open set. Given an oriented surface S ⊂ D with boundary ∂S, and some continuously differentiable function

F⃗ : D → R³,

we have

∫_{∂S} d r⃗ · F⃗ = ∫_S dS n̂ · (∇ × F⃗).
For proof of this theorem, you are encouraged to refer to Cain & Herod’s excellent online book,
referenced in the bibliography. Spivak’s ‘Calculus on Manifolds’ generalizes the result neatly – even
Penrose has something to say about it in his book ‘Road to Reality’.

In order to state the result that we want regarding an easy condition for determining whether
a force is conservative, we need one more concept - that of a Simply Connected Set.

Definition 15 (Simply Connected Set) We call a set S ⊂ R3 simply connected if every closed curve
in S is the boundary of some surface contained entirely in S .

In much of the literature, simply connected sets are defined as sets for which any closed loop in the set can be continuously deformed to a point in the set. This is, of course, equivalent to the above definition.
An informal way of reasoning about simply connected sets in two dimensions is to think of a set as simply connected if it has no ‘holes’ in it. In three dimensions, we can get past a point-like hole by making a surface that ‘bends around’ it; excluded lines or planes of singularities, however, would still violate the condition. Figure 4.3 presents some examples of simply connected sets. These examples are not too surprising. Any closed curve we draw in one of these figures is the boundary of some surface also inside the figure. Thus they are Simply Connected.

Figure 4.3: Examples of simply connected sets. (a) and (b) are examples of simply connected sets. (c) is an example that differs from the first two in that it is not convex. Nevertheless, any closed curve within this figure is the boundary of some surface contained in the figure. There are also no ‘holes’ in the figure, so it passes our intuitive test for simple connectivity.

Next we consider some examples of sets that are not simply connected. Figure 4.4 presents examples of such sets. The closed curve indicated in Figure 4.4(a) is a curve inside the set which does not bound a surface contained in the set. Thus, this set is not simply connected. We can also think of this as a circle with a giant ‘hole’ in the middle, preventing it from being simply connected.

As a final example, the set in Figure 4.4(b) is clearly also not simply connected. Informally we see this directly by observing the ‘hole’. Alternatively we could draw in a closed curve that is not the boundary of some surface. All the machinery is now in place, and so we can state and prove the following theorem, which is a simple corollary to Stokes’s Theorem:

Corollary 1 If a force, F⃗, is defined on a simply connected domain, D, and ∇ × F⃗ = 0⃗ everywhere on that domain, then F⃗ is conservative.
Figure 4.4: Examples of sets that are not simply connected. Sets (a) and (b) are not simply connected. The closed curve indicated in (a) is a curve inside the set which does not bound a surface contained in the set, so the set is not simply connected. We can also think of this as a circle with a giant ‘hole’ in the middle, preventing it from being simply connected.

Proof. We show that the work done around any closed loop is zero. By previous results this
implies that the force is conservative. Given some closed loop, γ, completely contained in D , we can
by the fact that D is simply connected find a surface, S , in D for which γ is a boundary. In other
words γ = ∂ S , and so by Stokes’s Theorem:
∫_γ F⃗ · d r⃗ = ∫_S dS n̂ · (∇ × F⃗) = ∫_S dS n̂ · 0⃗ = 0.

Thus F⃗ is conservative. ■

In three dimensions, we can write out the curl as:

∇ × F⃗ = (∂/∂x, ∂/∂y, ∂/∂z) × (Fx, Fy, Fz)
       = ( ∂Fz/∂y − ∂Fy/∂z, ∂Fx/∂z − ∂Fz/∂x, ∂Fy/∂x − ∂Fx/∂y ) = 0⃗.

This implies that the following quantities are equal:

∂Fz/∂y = ∂Fy/∂z ,    ∂Fx/∂z = ∂Fz/∂x ,    ∂Fy/∂x = ∂Fx/∂y .

So the condition that ∇ × F⃗ = 0⃗ is equivalent to:

∂Fi/∂xj = ∂Fj/∂xi ,    1 ≤ i, j ≤ 3.

We remark that this result is true of all conservative forces:

∂Fi/∂xj = −∂/∂xj (∂U/∂xi) = −∂²U/∂xj∂xi = −∂²U/∂xi∂xj = −∂/∂xi (∂U/∂xj) = ∂Fj/∂xi ,

by the symmetry of mixed partial derivatives.
Thus if a force has a simply connected domain, it is conservative if and only if ∇ × F⃗ = 0⃗. When the domain of the force is not simply connected and ∇ × F⃗ = 0⃗, then we have to use some other means to determine if the force is conservative or not.
We may now update our list of equivalent statements for conservative forces (assuming that
the domain on which the force is defined is open and simply connected):

1. The Force is Conservative: there is a function U of position such that F⃗ = −∇U .

2. The work done by the force is independent of path.

3. The work done by the force around any closed loop is zero.

4. If the force is the only force acting on a system, then energy is conserved in that system (the
force is not dissipative).

5. ∇ × F⃗ = 0⃗.

6. ∂Fi/∂xj = ∂Fj/∂xi for 1 ≤ i, j ≤ 3.

4.2.4 Computing the Potential


Once we have convinced ourselves that a given force is conservative, we are faced with the problem
of how to compute the potential energy associated with the force. There are essentially two ways
of going about this. We consider each in turn.

The Direct Method - Integration: The most intuitive method for computing the potential energy is to use the definition directly. It is the recommended approach because it still works when the domain of F⃗ has to be restricted due to singularities. Recall the definition of potential energy as the work that the conservative force would do to bring the particle from its current location to the reference point. This gives a recipe for finding the potential energy.

1. Establish some reference point at which the potential energy is chosen to be zero.
2. Compute the work done in moving the particle (on any path) from a given point in
space to this reference point.

The fact that we can do this computation along any path gives us quite a bit of freedom in
how we come to the result. A word of warning: the path must at all times remain inside the
domain we’ve specified for our force, F⃗ . As an example, consider the following force:
 
F⃗ = ( −2x − y cos(z), z − x cos(z), y + x y sin(z) ).

This force is defined everywhere in R3 and thus its domain is simply connected. The curl
of this force is clearly zero (check). These two facts together tell us that the force is indeed
conservative.
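The ‘clearly zero’ claim is quick to verify symbolically; the sketch below (assuming SymPy is available) evaluates each component of ∇ × F⃗ for this force.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# The example force from the text.
Fx = -2*x - y*sp.cos(z)
Fy = z - x*sp.cos(z)
Fz = y + x*y*sp.sin(z)

# The three components of curl F, written out as in the text.
curl = (sp.diff(Fz, y) - sp.diff(Fy, z),
        sp.diff(Fx, z) - sp.diff(Fz, x),
        sp.diff(Fy, x) - sp.diff(Fx, y))

print([sp.simplify(c) for c in curl])  # [0, 0, 0]
```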

119
We can choose the reference point to be anywhere in the domain of the force. When the
force corresponds to some physical force, there might be a preferred point, but since this
example was concocted out of thin air, we’ll simply take the origin: U (0, 0, 0) = 0. Thus we
can get the potential energy by integrating along any path from (x , y , z ) to the origin.

Figure 4.5: Path of motion of a particle under the action of a conservative force. The path runs along the coordinate directions from (x, y, z) through (x, y, 0) and (x, 0, 0) to the origin (0, 0, 0).

One such path is suggested in Figure 4.5. Here we integrate along the coordinate axes. When we’re working in Cartesian Coordinates and a path like this is feasible, it is usually a good choice because it simplifies our integration into three easy parts, as shown below:
U(r⃗) = W_{r⃗,0} = ∫_γ d r⃗ · F⃗

     = −∫_0^z dt Fz(x, y, t) − ∫_0^y dt Fy(x, t, 0) − ∫_0^x dt Fx(t, 0, 0)

     = −∫_0^z dt (y + x y sin(t)) − ∫_0^y dt (0 − x cos(0)) − ∫_0^x dt (−2t − 0 cos(0))

     = −(y z + x y − x y cos(z)) + x y + x²

     = x y cos(z) − y z + x².

The overall minus signs arise because each leg of the path is traversed from the coordinate value z, y or x down to 0.

We can easily verify that F⃗ = −∇U . Following the path independence of work done in this
potential, we could have chosen a different path and obtained the same answer that we did
above. Another commonly used path is the straight line from the point to the origin.

The problem above was set up so that U (0, 0, 0) = 0, but we can easily change that now that
we have a potential function by adding some constant to the potential function. Say we
want U (1, 1, 1) = 0. Then we simply change our function by a constant:

U′ = U − U(1, 1, 1) = x y cos(z) − y z + x² − (cos(1) − 1 + 1) = x y cos(z) − y z + x² − cos(1).

This new function will satisfy U ′ (1, 1, 1) = 0. Naturally, the addition of a constant does not
change the underlying force - it is a routine exercise to check that F⃗ = −∇U ′ .
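Both checks - that F⃗ = −∇U and that the shifted potential U′ gives the same force - are one-liners symbolically; a sketch (assuming SymPy):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

F = sp.Matrix([-2*x - y*sp.cos(z), z - x*sp.cos(z), y + x*y*sp.sin(z)])

# The potential found by the direct method, and the shifted potential U'.
U = x*y*sp.cos(z) - y*z + x**2
Up = U - sp.cos(1)

grad = lambda f: sp.Matrix([sp.diff(f, v) for v in (x, y, z)])

print(sp.simplify(F + grad(U)))     # Matrix([[0], [0], [0]]): F = -grad U
print(sp.simplify(F + grad(Up)))    # same: the constant shift changes nothing
print(Up.subs({x: 1, y: 1, z: 1}))  # 0
```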

Finding the Anti-derivative: Once we know that U exists, we can attempt to find it analytically as follows:

−∂U/∂x = Fx  ⇒  −U(x, y, z) = ∫ dx Fx + K1(y, z)
−∂U/∂y = Fy  ⇒  −U(x, y, z) = ∫ dy Fy + K2(x, z)
−∂U/∂z = Fz  ⇒  −U(x, y, z) = ∫ dz Fz + K3(x, y)

We can hope to solve for the functions K1, K2, K3 by comparing these three results. The method is best illustrated by example, so we consider the same force as before:

F⃗ = ( −2x − y cos(z), z − x cos(z), y + x y sin(z) ).

The equations above then become:

∫ dx Fx = ∫ dx (−2x − y cos(z)) = −x² − x y cos(z) + K1(y, z) = −U(x, y, z)
∫ dy Fy = ∫ dy (z − x cos(z)) = z y − x y cos(z) + K2(x, z) = −U(x, y, z)
∫ dz Fz = ∫ dz (y + x y sin(z)) = y z − x y cos(z) + K3(x, y) = −U(x, y, z)

Collecting these results we have:

U(x, y, z) = x² + x y cos(z) − K1(y, z)
           = −z y + x y cos(z) − K2(x, z)
           = −y z + x y cos(z) − K3(x, y)

Comparing the second and third equations we notice:

K 2 (x , z ) = K 3 (x , y )

Differentiating with respect to y gives 0 = ∂K3/∂y and hence K3 is only a function of x. Similarly, differentiating with respect to z shows that K2 is only a function of x. Now comparing the
first and second equations above we see:
x² − K1(y, z) = −y z − K2(x).
Hence:
x² + K2(x) = K1(y, z) − y z.
Differentiating with respect to x yields:
2x + K2′(x) = 0.
Thus giving:
K2(x) = −x² + C1.
Substituting this back into the previous equation we get:
x² − x² + C1 = K1(y, z) − y z.
So:
K1(y, z) = y z + C1.
Also, we saw earlier that K2 = K3. So we can write:
K3(x) = −x² + C1.
Substituting these values back into the expression for U gives:
U(x, y, z) = x² + x y cos(z) − y z − C1.
And we can set C1 to any value we like. If we desire (as before) that U(0, 0, 0) = 0, then we could solve for C1:
0 = U(0, 0, 0) = 0² + 0 − 0 − C1.
Hence C1 = 0 and the potential energy is given by:
U(x, y, z) = x² + x y cos(z) − y z.

4.2.5 Known Potential Energies


The two forms of physical potential energy considered up to this point are listed below for
completeness.

Force     Potential Energy      Meaning of symbols
Gravity   m g h                 h is the height above some reference point, m is the mass of the particle and g is the gravitational acceleration
Spring    (1/2) k (l − l0)²     l is the length of the spring and l0 is the rest length of the spring with spring constant k

Other examples of potential energies come from electrostatics, optics, magnetism, molecular dynamics, etc. We will not consider the physical meaning of potential energies other than the two given above, but we might derive them from a given conservative force as in the previous section.
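Each row of the table can be checked by differentiation; the sketch below (assuming SymPy) confirms that −dU/dh recovers the constant gravitational force −mg, and −dU/dl recovers the Hooke’s-law restoring force −k(l − l0).

```python
import sympy as sp

m, g, h, k, l, l0 = sp.symbols('m g h k l l_0')

U_gravity = m*g*h
U_spring = sp.Rational(1, 2)*k*(l - l0)**2

# F = -dU/dx recovers the familiar forces.
print(-sp.diff(U_gravity, h))  # -g*m: the constant gravitational force
print(-sp.diff(U_spring, l))   # -k*(l - l_0): the restoring force
```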

4.3 The Variational Calculus
In this section we make contact with ideas of systems under the influence of conservative forces
and relate them to Optimisation Theory. The following treatments will rely on three tools from
Calculus, namely (i) curve parametrisation, (ii) Taylor Series expansions and (iii) integration by
parts, and the Optimisation Theory idea of path minimisation. The idea of path minimisation is a
critical component in the formulation of Lagrangian Mechanics. We shall highlight some general
features of simple parametric motion that we can generalise. This generalisation introduces the
idea of geodesics that define solutions to parametric motion as a minimisation problem via the
Calculus of Variations.

4.3.1 A Variational Approach


This section provides a brief overview of the Calculus of Variations. The treatment here makes no
claim at being rigorous, and mostly intuitive arguments have been used. The idea is not so much
to present the theory of the Calculus of Variations as it is to give the reader a taste of the problems
and solution methods employed. Students are encouraged to consult the extensive resources on
this vast and fascinating topic, listed in the bibliography of these notes.
The Calculus of Variations is the calculus of functionals. A functional is a map from a space
of functions to the real numbers. We can think of a functional as a map that takes a curve as
input and produces a real number as output. One obvious example is the length of the curve.
We can often write a functional as an integral of a (kernel) function over the curve in question.
Example 4.1 provides a simple example of such a functional.

Example 4.1 (Length of a Curve in Space) Consider the length functional in 2-dimensional space.
This is a map that takes any curve in the x y -plane and assigns to it a single real number, its length.
It is the arc-length of the curve and is expressed as the integral of unity along the curve. More
formally, consider a curve γ that begins at a point a and ends at a point b . The length of the curve
ℓ(γ) is

ℓ(γ) = ∫_γ ds,

where ds is the length element along the curve. This can be re-expressed as

ℓ(γ) = ∫_γ √(dx² + dy²)

following the Pythagorean Theorem. Suppose that the curve γ is parameterised by t ∈ [0, 1], then
γ(t) = ( x(t), y(t) )

such that a = γ(0) and b = γ(1). It follows that we can rewrite the line element as

ds = √( (dx/dt)² + (dy/dt)² ) dt,

so the integral becomes

ℓ(γ) = ∫_0^1 dt √(ẋ² + ẏ²).

This is easily extended to three dimensions,

ℓ(γ) = ∫_0^1 dt √(ẋ² + ẏ² + ż²).
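The length functional is easy to evaluate numerically for any parameterised curve. The sketch below (assuming NumPy) approximates ℓ(γ) = ∫_0^1 dt √(ẋ² + ẏ²) with the trapezoid rule for the unit circle traversed once, which should give 2π.

```python
import numpy as np

# Unit circle traversed once for t in [0, 1]: x = cos(2*pi*t), y = sin(2*pi*t).
t = np.linspace(0.0, 1.0, 100001)
xdot = -2*np.pi*np.sin(2*np.pi*t)
ydot = 2*np.pi*np.cos(2*np.pi*t)

# Arc-length functional: trapezoid rule for the integral of sqrt(xdot^2 + ydot^2).
speed = np.sqrt(xdot**2 + ydot**2)
length = float(np.sum(0.5*(speed[1:] + speed[:-1])*np.diff(t)))

print(abs(length - 2*np.pi) < 1e-6)  # True: the circumference is 2*pi
```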

Now let’s consider an example to develop our intuition for the mathematical treatments to
come. The outcome of this analysis will be to confirm our intuition for simple cases, to guide
our thinking for more complicated cases, and then to develop a set of general principles that will
allow us to formulate equations of motion for general systems.
Let’s consider some basic ideas of particle motion. Firstly, particles move along paths
(parametric curves). There are many possible paths that a particle might follow between the
starting and end points of its motion. A path in phase space is an ordered sequence of positions
and momenta followed by a system over its lifetime. A path in physical space is the set of
positions in a path in state space. We shall assign to each path a numerical value called energy,
that measures the ‘activity’ along the path - a very curvy path has more stuff happening
(changing directions, accelerations, and so on) than a less curvy path path. Later, we shall see
that particles follow paths of minimum energy through phase space and minimum energy paths
corresponds to ‘shortest’ paths is real space.
For a general system, its path through phase space contains all state information for all particles
in that system. A single path is phase space corresponds to all paths of the individual elements of
the system. The motion of every part of the system is then described by the minimum energy path
of the system. The path through phase space determine complete information for the evolution
of a system. The motion of every part of the system is then described by the minimum energy
path of the system. Let’s consider some simple examples of particles with constant motion (paths
of shortest length) to see how to put this together. In the following example, the key observation of
constant motion is that there can be no acceleration, and we shall use this in next in Example 4.2
4.2.

Example 4.2 (Constant Motion in the Cartesian Plane) Consider the constant motion of a
particle moving in the 2-dimensional Cartesian plane with position coordinates x and y .
Constant motion means no acceleration, no changing of direction, no excitement. We can
rearrange the t dependence and find that each component of r⃗ is written in the form of constant
times time plus constant. In particular r⃗ is written using constant vectors r⃗1 and r⃗2 , then

ẍ = 0 ⇒ ẋ = a 1 ⇒ x (t ) = a 1 t + a 2
ÿ = 0 ⇒ ẏ = b1 ⇒ y (t ) = b1 t + b2

and
 
r⃗(t) = x(t) x̂ + y(t) ŷ = (a1 x̂ + b1 ŷ) t + (a2 x̂ + b2 ŷ) = r⃗1 t + r⃗2.

We can generalise this to any number of spatial dimensions in Cartesian coordinates and find a similar expression.
The length of path along the curve is then

ℓ = ∫_0^t dt′ √( r⃗̇(t′) · r⃗̇(t′) ) = t √(a1² + b1²).

Note that by the Pythagorean Theorem and the Triangle Inequality,

∥r⃗ + δr⃗∥ ≤ ∥r⃗∥ + ∥δr⃗∥,

so the most direct path is also the shortest path. Now parameterise the curve such that the particle traverses the path in the interval t ∈ [0, 1]; then ℓ(t) = t √(a1² + b1²), and t is an affine parameter that measures the fraction of the total length that the particle has moved along the path.

The observation that constant motion follows from zero acceleration allows us to generalise
the ideas presented in Example 4.2 to other systems. Then adding a change in the speed of a
particle at each point along its path allows us to include changes in the path. When these changes
are small, the resulting path is ‘near by’ the original path.
The key idea in calculus of variations is that we can find a solution to a given problem by
considering functions that are ‘near by’ the actual solution, to determine properties of the solution,
and ultimately, the solution itself. We do this by studying variations about the solution given some
small shifts in the parameters of the solution. To study small variations about a given function,
we can use a Taylor series expansion about an arbitrary point on the function, here δ f is given by
the first order variation of f given a variation of x by δx . Note that the variation in f goes to zero
as the variation in x goes to zero. Consider a the Taylor Series Expansion of a f (x ) about a

x = a + δx , δx = x − a and a = x − δx

Then

X f (m ) (a )
f (x ) = f (a + δx ) = f (a ) + (δx )m .
m =1
m!
To first order in δx
f (a + δx ) → f (a ) + f ′ (a )δx .

as δx → 0, δ f → 0. Figure 4.6 gives a visual depiction of a curve y and a variation of this by some
function δy to give a varied path y + δy . The most important feature of the variations that we
consider is that the variation goes to zero at boundary of the intervals, here marked at (xi , yi ) and
(x f , y f ). Our aim will be to study the constrained optimisation problem that is bounded by the
arbitrary variation δy and so determine the unknown function y for each problem of interest.
We look at this next.

Figure 4.6: The variation of a path y(x) by δy to produce the varied path y + δy between the fixed end-points (xi, yi) and (xf, yf). The variation δy is zero at xi and xf.

Example 4.3 (Constant Motion on the Cylinder) Consider the motion of a particle with position p = (x, y, z) on a cylinder, in Cartesian coordinates:

x → x(θ) = R cos(θ)  ⇒  dx = −R sin(θ) dθ
y → y(θ) = R sin(θ)  ⇒  dy = R cos(θ) dθ
z → z(θ)             ⇒  dz = (dz/dθ) dθ

Then,

ℓ = ∫_{p1}^{p2} √(dx² + dy² + dz²) = ∫_{θ1}^{θ2} dθ √( R² + (dz/dθ)² ).

The cylinder has radius R and the path extends from p1 to p2 . Rewriting the coordinates in terms
of the dynamical parameter θ, the path must now extend from θ1 to θ2, and z(θ) is a yet-to-be-determined function of θ. Using the arc length in Cartesian coordinates, we rewrite the integral for the length
of the curve on the cylinder. To find the path corresponding to constant motion on the cylinder, we
must first find the path with the minimum length, this means determining z (θ ) - we shall do this
by introducing some concepts from the Calculus of Variations.
We can determine the equation for the shortest path between two fixed points (here specified
by θ1 and θ2 ) on the surface of the cylinder by determining the variation of the path length δℓ
with respect to a variation in the position z when the dynamical variable θ is varied. We only
have specific information about the variation δθ at the terminal points of the motion, therefore,
we must rewrite the variation δℓ to make use of this information. Rewriting the variation in δℓ
using integration by parts removes the integral dependence on the derivative of δz , which is totally
arbitrary, except at the end points.

Consider an arbitrary path variation between fixed endpoints p1 and p2 ,

θ → θ + δθ ⇒ z → z + δz ⇒ ℓ → ℓ + δℓ

and

δℓ = ∫_{θ1}^{θ2} dθ δ√( R² + (dz/dθ)² ) = ∫_{θ1}^{θ2} dθ [ (dz/dθ) / √( R² + (dz/dθ)² ) ] (d/dθ) δz.

Proceed using integration by parts to replace (d/dθ)δz with δz, and so remove the dependence of δℓ on (d/dθ)δz, using

∫_a^b du v = [u v]_a^b − ∫_a^b dv u.

Fixed end-points implies zero variation at the boundary, so

δℓ = − ∫_{θ1}^{θ2} dθ δz [ R² (d²z/dθ²) / ( R² + (dz/dθ)² )^{3/2} ];

now the integrand has multiplicative dependence on δz only.


After integrating by parts, the boundary term is eliminated since the variations δθ and δz are
zero at the fixed boundary. The integrand is now proportional to the variation δz . For a minimum
length path, any variation δz must correspond to a variation δℓ that will increase the length of the
path. A minimum length path admits no such variation no matter what the variation δz . For any
variation δz ,
δℓ = 0  ⇒  d²z/dθ² = 0  ⇒  z(θ) = a θ + b.
Returning to the length calculation,

ℓ = ∫_{θ1}^{θ2} dθ √(R² + a²) = √(R² + a²) (θ2 − θ1) = √(R² + a²) Δθ.

Reparametrising with θ = Δθ t + θ1, for t ∈ [0, 1], gives

r⃗(t) = R cos(Δθ t + θ1) x̂ + R sin(Δθ t + θ1) ŷ + (a Δθ t + a θ1 + b) ẑ.

Requiring that δℓ is zero for any δz imposes a restriction on the value of the integrand that is independent of δz. In this case, the integrand is zero only when d²z/dθ² = 0. This gives an equation of motion for z that we can solve by direct integration. Again, we may introduce an affine parameter t to specify the position along the path. Notice the linear dependence of z on the parameter θ - we can demonstrate this graphically in Figure 4.7.
Figure 4.7 depicts the path of minimum length along the cylinder given the initial values θ1 , θ2 ,
a and b . Clearly, when considered as a 2-dimensional sheet, path followed along the cylinder is a

127
(R θ1 , a θ2 + b ) (R θ2 , a θ2 + b )

r~(θ )

(R θ1 , a θ1 + b ) (R θ2 , a θ1 + b )

Figure 4.7: Geodesic motion on a cylinder corresponds to a ‘straight’ line traced on the surface
formed by splitting the cylinder along its length and pressing it flat into a plane. In this case the
motion is traced with respect to the independent variable θ that tracks the position r⃗(θ ) that
follows a point p across the surface of the cylinder. The motion initiates at p = p1 corresponding
to θ = θ1 and terminats at p = p2 corresponding to θ = θ2 .

‘straight’ line in the usual sense with respect to the position along the vertical z -direction versus the
horizontal θ direction. Therefore, the curved line in 3-dimensions on the cylinder is a ‘straight’ line
on the surface of the cylinder. Obviously, changing the values of each of these initial values with
change the path in 3-dimensions, which remains a ‘straight’ line in 2-dimensions.
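A numerical check (assuming NumPy, with illustrative values of R, a and b) confirms that the linear profile z(θ) = aθ + b minimises ℓ = ∫ dθ √(R² + (dz/dθ)²): any perturbation δz that vanishes at the end-points lengthens the path, and the straight profile reproduces the closed form √(R² + a²) Δθ.

```python
import numpy as np

R, a, b = 1.0, 0.5, 0.2          # illustrative cylinder radius, slope and offset
th1, th2 = 0.0, np.pi            # end-points theta_1 and theta_2
theta = np.linspace(th1, th2, 200001)

def length(z):
    """l = integral of sqrt(R^2 + (dz/dtheta)^2) dtheta, by the trapezoid rule."""
    dz = np.gradient(z, theta)
    f = np.sqrt(R**2 + dz**2)
    return float(np.sum(0.5*(f[1:] + f[:-1])*np.diff(theta)))

z_straight = a*theta + b
l_min = length(z_straight)

# Perturbations vanishing at the end-points (delta z = 0 at th1 and th2)
# always lengthen the path.
for eps in (0.1, 0.3):
    assert length(z_straight + eps*np.sin(theta)) > l_min

# The straight profile matches the closed form sqrt(R^2 + a^2) * (th2 - th1).
print(abs(l_min - np.sqrt(R**2 + a**2)*(th2 - th1)) < 1e-6)  # True
```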

Before we generalise ideas of paths of minimum length, it will be useful to introduce the following definition.

Definition 16 (Riemannian Manifold) A Riemannian manifold (M, g) is a real, smooth manifold M equipped with a metric g on the space of tangent vectors TpM at each point p. The metric g is a smooth function on the coordinate mappings (U, p) on M, such that

g_ij(p) = g( e⃗i(p), e⃗j(p) ) : U → R,

and e⃗i(p) is the i-th coordinate tangent at p.

So far we have seen how to measure lengths of curves in a few different spaces. In each
case, the method that we used made sense because there was a meaningful way to associate
positions on that space, with a coordinate system. Then we measured the distance between points
along some path connecting those points using the tangent vectors along that path. Mapping
coordinates from Euclidean space onto another space allows us to carry the inner product, as a
length measuring function in Euclidean space, to the other space. Any space that we can equip in
this way is called a Riemannian Manifold. We shall study Classical Mechanics on Riemannian
Manifolds.

Definition 17 (Geodesic) Given a Riemannian manifold (M, g) with metric tensor g, the length ℓ of a continuously differentiable curve γ : [a, b] → M is

ℓ(γ) = ∫_a^b dt √( g_{γ(t)}( γ̇(t), γ̇(t) ) ).

The distance between two points p1 = γ(a) and p2 = γ(b) is

d(p1, p2) = inf_γ ℓ(γ)

over all continuous, piece-wise continuously differentiable curves γ.

In Riemannian geometry, all geodesics are locally distance-minimising paths. Here, locally
means only in the neighbourhood of a point. As an example, consider two arbitrary points on a
circle. Clearly, one direction along the circle between these two points is shorter than the other.
The distance between these two points is a global minimum. However, we compute lengths by
integrating along adjacent points on a curve. Geodesics are constructed using information about
neighbouring points. This information is local to each point on a given path. The distance is the minimum over all possible paths, which uses global information. The interested student might consider more formal definitions of Geodesics in Affine spaces in more advanced courses in Mathematics.
Constant motion corresponds to motion on a ‘straight’ path and ‘straight’ paths have minimum
length. Paths of minimum length are geodesics in a given space. Calculus of Variations can be used
to determine the equations of motion for a path of minimum length. Next we shall extend the
use of calculus of variations to develop a general path ‘energy’ function for classical mechanical
systems.
We could have chosen to write the integral in Example 4.3 using a different coordinate system.
Regardless of the formulation of the integral, the function ℓ takes a curve as input and returns a real
number. This is a generally useful tool in studying problems and in particular, we are interested
in problems where this functional acts by taking an integral over some curve of the form,
I(γ) = ∫_{ti}^{tf} dt L( q1(t), q2(t), . . . , qn(t), q̇1(t), q̇2(t), . . . , q̇n(t), t ),

where t parameterises the path γ in some n-dimensional manifold with coordinates (q1(t), q2(t), . . . , qn(t)). Several interesting functionals take this form, as we shall discover shortly, but first we develop the first order optimality condition for such functionals in the Calculus of Variations. We begin with an important lemma.

Lemma 1 (Fundamental Lemma of the Calculus of Variations) Given some C^k (k times continuously differentiable) function f on some interval [a, b], if

∫_a^b dx f(x) h(x) = 0     (4.1)

for all C^k functions h defined on [a, b] satisfying h(a) = h(b) = 0, then f(x) = 0 for all x ∈ [a, b].
Proof. Let r(x) be some C^k function that is strictly positive on (a, b) and zero at a and at b. For example, we could choose r(x) = (x − a)(b − x). Since (4.1) is zero for any C^k function h that vanishes at the end-points of the interval, we can choose

h(x) = f(x) r(x)

on the interval [a , b ]. Since f and r are C k , h is also C k and it vanishes at the end-points of the
interval. Then,
Zb Zb
d x f (x )h (x ) = d x f 2 (x )r (x ) = 0. (4.2)
a a

Now, the integrand is strictly non-negative over the entire interval because r is non-negative by
construction and f 2 is the square of a real number at each point x on the interval. Since the integral
is zero and the integrand is non-negative over the entire interval, we conclude that the integrand
is zero everywhere on the interval. Note, however that r is non-zero on the interval (a , b ), so the
only way that the entire integral can be zero is if f (x ) = 0 at each point x on the interval. Therefore
f (x ) = 0 for all x ∈ [a , b ]. ■

The Fundamental Lemma of the Calculus of Variations will be useful next when we reformulate
Newtonian Mechanics as a problem in optimisation.

4.3.2 Optimization of Functionals


One of the primary uses of the Calculus of Variations is for finding optimal curves according to
some criteria. For instance, we might want to find the curve of shortest length between two fixed
points in space. Let us consider the family of curves between two fixed points P and Q . We wish
to find the curve between these points that optimizes the value of some functional that can be
written as the integral of some function L over the curve. For now, consider the one dimensional
problem with parameter t and a curve given by q (t ) defined on the interval t ∈ [t 0 , t 1 ]. Fix P = q (t 0 )
and Q = q (t 1 ) for all curves q . We want to optimise

    I(γ) = ∫_{t₀}^{t₁} dt L(q(t), q̇(t), t).

To reason about this problem, call the optimal curve γ and parameterize it by q0 (t ). Now consider
any other curve between the same end-points, and let this curve be parameterized by q(t). At each
time value we can define the difference between these curves

δ(t ) = q (t ) − q0 (t ).

Since the end-points of the curves coincide we have that δ(t 0 ) = δ(t 1 ) = 0. Now fix some arbitrary
C 1 function δ(t ) with t ∈ [t 0 , t 1 ] that is zero at the end-points. Consider the family of curves

parameterised by the real valued parameter α

qα (t ) = q0 (t ) + αδ(t ).

We shall refer to δ(t) as the variation of q₀(t). Clearly, for this family of curves, α = 0 gives the optimum curve q₀(t). For each value of α the functional I takes some real value,

    I(α) = ∫_{t₀}^{t₁} dt L(qα(t), q̇α(t), t).

Since I(α) is a real-valued function of α, the criterion for the optimum at α = 0 is dI/dα |_{α=0} = 0. This leads to

    0 = dI/dα |_{α=0} = ∫_{t₀}^{t₁} dt d/dα [ L(qα(t), q̇α(t), t) ] |_{α=0}

      = ∫_{t₀}^{t₁} dt { (∂L/∂qα)(dqα/dα) + (∂L/∂q̇α)(dq̇α/dα) } |_{α=0}

      = ∫_{t₀}^{t₁} dt { (∂L/∂q) δ(t) + (∂L/∂q̇) δ̇(t) }.

The second term can be integrated using integration by parts to produce


    ∫_{t₀}^{t₁} dt (∂L/∂q̇) δ̇(t) = [ (∂L/∂q̇) δ(t) ]_{t=t₀}^{t=t₁} − ∫_{t₀}^{t₁} dt (d/dt)(∂L/∂q̇) δ(t) = − ∫_{t₀}^{t₁} dt (d/dt)(∂L/∂q̇) δ(t),

where the boundary terms are eliminated by the fact that δ(t 0 ) = δ(t 1 ) = 0. Using this result, we
find, after some simplification
    ∫_{t₀}^{t₁} dt { ∂L/∂q − (d/dt)(∂L/∂q̇) } δ(t) = 0.

Since the choice of δ(t ) was arbitrary, this statement must be true for any δ(t ). Then, by the
Fundamental Lemma of Variational Calculus we conclude that the integrand is zero and so
    ∂L/∂q − (d/dt)(∂L/∂q̇) = 0.    (4.3)

This is the familiar Euler-Lagrange Equation. This means that the optimal path q must satisfy the Euler-Lagrange equation. We extend this result to many variables,

    ∂L/∂qᵢ − (d/dt)(∂L/∂q̇ᵢ) = 0,    (4.4)

which is the familiar many-variable Euler-Lagrange equation for each variable i.
We shall see the Euler-Lagrange equations appear again in the next section.
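The variational argument above can be illustrated numerically. The sketch below (a minimal illustration with our own function names and end-points, not taken from the notes) evaluates the arc-length functional I(γ) = ∫ √(1 + q̇²) dt for the straight line between two fixed end-points and for a perturbed curve with a variation that vanishes at the end-points; the straight line gives the smaller value, as the Euler-Lagrange analysis predicts.

```python
import math

def path_length(q, t0=0.0, t1=1.0, n=2000):
    # Approximate I(q) = ∫ sqrt(1 + q̇²) dt with finite-difference slopes on
    # n small panels; each term is exactly the length of one polyline segment.
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        a, b = t0 + k * h, t0 + (k + 1) * h
        qdot = (q(b) - q(a)) / h
        total += math.sqrt(1.0 + qdot * qdot) * h
    return total

straight = lambda t: 2.0 * t                                  # line from (0, 0) to (1, 2)
perturbed = lambda t: 2.0 * t + 0.3 * math.sin(math.pi * t)   # same end-points, varied path

print(path_length(straight))   # ≈ sqrt(5) ≈ 2.2360
print(path_length(perturbed))  # strictly larger than the straight-line value
```

Any variation with fixed end-points increases the discretised length, in agreement with the first-order optimality condition derived above.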

4.4 The Euler-Lagrange Equations
The Euler-Lagrange equations are derived following an optimisation procedure applied to a
general Lagrangian. In this section we consider the Euler-Lagrange equations for physical systems
involving kinetic and potential energies.

4.4.1 A General Scheme for Conservative Systems


When the forces acting on the system are conservative, then we can write the net force as

F⃗ = −∇U

and so we can write the generalized component of force in the j -th coordinate as
X ∂ x ‹ X∂ U ‹ ∂ x ‹ 
∂U
‹
i i
Qj = Fi =− =− .
i
∂qj i
∂ xi ∂qj ∂qj

Lagrange’s equations become

d dT ∂T ∂U
 ‹  ‹  ‹
− = − .
d t d q̇ j ∂qj ∂qj

We can rewrite this as


d dT ∂T ∂U
 ‹  ‹  ‹
− + = 0.
d t d q̇ j ∂qj ∂qj
By definition, U is a function of position only, so

    ∂U/∂q̇_j = 0.

Since adding zero to an equation does not change the equation, we can add this additional term
to the above expressions to get

    (d/dt)( ∂T/∂q̇_j − ∂U/∂q̇_j ) − ( ∂T/∂q_j − ∂U/∂q_j ) = 0,

which we can rewrite as

    (d/dt)( ∂(T − U)/∂q̇_j ) − ∂(T − U)/∂q_j = 0.
Define the Lagrangian of the system
L = T −U . (4.5)

Lagrange’s Equations become,


    (d/dt)(∂L/∂q̇_j) − ∂L/∂q_j = 0,    (4.6)
which are the celebrated Euler-Lagrange Equations for a conservative system.
Systems with only conservative forces are now remarkably simple to solve - we simply compute
the kinetic and potential energies and plug them into the above equations. Notice how force
doesn’t appear in these equations at all - no force diagrams, no calculating components of force,

no adding up of forces. All we need is the kinetic energy of the system and the potential energy of
the system and we can solve for its motion.
The Euler-Lagrange Equations have many other special properties, notably in the Calculus of Variations and Control Theory, and they solve a special class of optimization problems. We may touch on some of these topics later in the course. These equations are also a cornerstone of modern physics: they underlie Quantum Field Theory and String Theory, and a deep understanding of them is essential for the serious theoretical physicist. This is a case where the rabbit-hole goes very deep indeed: the equations above show up everywhere from Computer Vision algorithms to High Energy Physics. We see them here in their original context. They are certainly worth memorizing.
Our general scheme for solving a system in Lagrangian Mechanics can now be updated to the
case where all the forces in the system are conservative.

1. The first step, as always, is to find suitable generalized coordinates for our system and to
write down the transformation equations between our generalized coordinates and some
fixed Cartesian Coordinate system.

2. Find the Kinetic Energy T in terms of the generalized coordinates.

3. Find the Potential Energy U in terms of the generalized coordinates.

4. Set L = T − U and obtain the Equation of Motion for the System.

5. Solve the equation, if possible.

Note that the equations that arise from this formulation will often not be soluble by standard analytical techniques; most often they require numerical treatment.
Next we consider a non-trivial example, which is completely soluble within the strict confines of an elastic deformation. We now study the example of a mechanical system whose motion is
governed by the compression and extension of a spring, subject to Hooke’s Law. The restoring force
of the spring is conservative and results in the simple mechanical oscillation of the block-spring
system.

Example 4.4 (Block on a Spring) Consider the system comprising a spring that is attached to a
wall at one end and to a block of mass m at the other. The anchor point of the spring is fixed, while
the block slides on a smooth horizontal surface, as shown in Figure 4.8.
The system is free to move in a single direction. The motion of the block is well parametrised by a single generalised coordinate x marking the position of the block along the x̂-direction. Suppose the unstretched spring has length l and stiffness k. By Hooke's law of mechanical deformation, the restoring force applied by the spring when it is deformed by δ⃗ = (l − x) x̂ is

    F⃗ = −k δ⃗.


Figure 4.8: A block attached to a spring that smoothly deforms by compression (or extension) along its length. The deformation of the spring causes the spring to respond with a restoring force that is proportional to the amount of the distortion and is oriented so as to oppose the deformation. The block is free to slide on the smooth horizontal surface under the influence of the restoring force of the spring.

The work done to distort the spring by an amount x, so that its free end moves from l to l − x, is

    W = ∫ dx⃗ · F⃗ = ∫_l^{l−x} dx′ x̂ · ( −k (l − x′) x̂ )

      = −k ∫_l^{l−x} dx′ (l − x′)

      = −k [ l x′ − ½ x′² ]_l^{l−x}

      = −k [ −x l − ½ (x² − 2 l x) ]

      = −k [ −x l − ½ x² + l x ]

      = ½ k x².
Suppose the block is allowed to move over a closed path γ, starting at some position x0 then moving
away before returning to x0 . Since the problem is 1-dimensional, this corresponds to moving some
distance away from an initial point x0 in the x̂ -direction and then returning to x0 . Then the work
done by this force on this path is

    W = ∮_γ dx⃗ · F⃗ = −k [ l x − ½ x² ]_{x₀}^{x₀} = 0.

Since x0 is an arbitrarily chosen point, the work done by this force on any closed path is zero.
Therefore, F⃗ is conservative.
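This closed-path statement is easy to check numerically. The sketch below (an illustrative check with our own function names and arbitrarily chosen values of k, l and the path, not from the notes) accumulates the work done by the spring force along a path that goes out and comes back to its starting point; the total vanishes, while the work over the outward leg alone does not.

```python
# Numerically accumulate the work done by the spring force F(x) = -k (l - x)
# along a closed path x0 -> x0 + d -> x0, using a trapezoidal rule per step.
k, l = 3.0, 1.0                   # illustrative stiffness and natural length
force = lambda x: -k * (l - x)

def work(xs):
    # Trapezoidal accumulation of ∫ F dx over the piecewise path through xs.
    total = 0.0
    for a, b in zip(xs, xs[1:]):
        total += 0.5 * (force(a) + force(b)) * (b - a)
    return total

n, x0, d = 1000, 0.2, 0.5
out = [x0 + d * i / n for i in range(n + 1)]   # x0 -> x0 + d
back = out[::-1]                               # x0 + d -> x0
print(work(out))                               # nonzero work on the outward leg
print(work(out + back[1:]))                    # ≈ 0 over the closed path
```

The outward and return contributions cancel exactly, which is the discrete counterpart of the closed-path integral above.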
The corresponding Lagrangian for the spring block system comprises two terms, the kinetic term
T and the potential energy term U , namely

    T = ½ m x⃗̇ · x⃗̇ = ½ m ẋ².

Note that the block remains at fixed height for the duration of its motion, so the gravitational potential does not change during the course of the motion of the block. As such we can treat it as a constant C. Next, we write down the Lagrangian

    L = T − U = ½ m ẋ² − ½ k x² − C.
Since there is only a single dynamical variable x in this system, the Euler-Lagrange equations comprise a single differential equation,

    (d/dt)(∂L/∂ẋ) = ∂L/∂x,

where

    ∂L/∂x = −k x,   ∂L/∂ẋ = m ẋ   and   (d/dt)(∂L/∂ẋ) = m ẍ.
Notice that C does not appear in any of the terms in the Euler-Lagrange equation. This is an important feature of the Lagrangian formalism: shifting the Lagrangian by a constant does not affect the equations of motion. So, the equation of motion is

m ẍ + k x = 0.

This differential equation is second-order, linear and homogeneous, and is easily solved by any number of techniques. Here is a demonstration of solution by means of an exponential trial solution.
Suppose x (t ) = Ae λt is a solution to this system, then

ẋ = λAe λt = λx and ẍ = λ ẋ = λ2 x .

Then,
mλ2 x + k x = 0.

Since x = 0 corresponds to the trivial solution, we require instead that x ̸= 0 and

    λ² + k/m = 0.

Solving this quadratic equation for λ gives

    λ± = ± √(−k/m) = ± i √(k/m).

Substituting this pair of roots λ± into the trial solution and using the identity

e i θ = cos (θ ) + i sin (θ )

yields
    x(t) = A exp( −i √(k/m) t ) + B exp( i √(k/m) t )

         = (A + B) cos( √(k/m) t ) + (B − A) i sin( √(k/m) t )

         = P cos( √(k/m) t ) + Q sin( √(k/m) t ),

where P and Q are determined by initial conditions.
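The closed-form solution above can be checked against a direct numerical integration of the equation of motion. The sketch below (a minimal illustration with our own function names and arbitrarily chosen values of m, k and the initial data, not from the notes) advances m ẍ + k x = 0 with a classical fourth-order Runge-Kutta step and compares the result with P cos(ωt) + Q sin(ωt), where ω = √(k/m).

```python
import math

# Integrate m ẍ + k x = 0 and compare with x(t) = P cos(ω t) + Q sin(ω t).
m, k = 2.0, 8.0
omega = math.sqrt(k / m)

def rk4_step(state, dt):
    # state = (x, v) with ẋ = v and v̇ = -(k/m) x.
    def deriv(s):
        x, v = s
        return (v, -k / m * x)
    def add(s, ds, h):
        return (s[0] + h * ds[0], s[1] + h * ds[1])
    k1 = deriv(state)
    k2 = deriv(add(state, k1, dt / 2))
    k3 = deriv(add(state, k2, dt / 2))
    k4 = deriv(add(state, k3, dt))
    return (state[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            state[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

P, Q = 1.0, 0.5                          # fixed by x(0) = P and ẋ(0) = Q ω
state, dt, steps = (P, Q * omega), 1e-3, 5000
for _ in range(steps):
    state = rk4_step(state, dt)
t = steps * dt
exact = P * math.cos(omega * t) + Q * math.sin(omega * t)
print(state[0], exact)                   # the two agree to high accuracy
```

The agreement between the integrated and closed-form values illustrates that the trial-solution method above produced the correct general solution.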

Next we combine Example 3.4 and Example 4.4 to consider a pendulum that undergoes an elastic deformation while swinging. In this case, there are two dynamical variables with corresponding interaction terms in the Lagrangian, giving rise to a pair of coupled equations of motion for the angular and radial generalized coordinates.

Example 4.5 (The Simple Elastic Pendulum) Consider again the simple rigid pendulum from Example 3.4. We consider a modification of the simple pendulum problem where the mass-less rod is now a spring that is allowed to expand and contract, but not to bend. We assume that the rest-length of the spring is l₀. As per usual we set up a reference system of Cartesian Coordinates. The constraints on the bob are not holonomic in this instance, and there are thus two degrees of freedom associated with the system (explore this statement).
We can thus choose any coordinates in 2-dimensional space. We maximally exploit the symmetry
of the system by using polar coordinates (in these coordinates, the extension of the spring is in the r̂
direction). Thus we choose our generalized coordinates for the system to be (r, θ ). The kinetic energy
is

    T = ½ m r⃗̇ · r⃗̇ = ½ m (ṙ² + r² θ̇²).
For the potential energy we need to sum the gravitational and spring potential energies (why does it
make sense to do this?). The gravitational potential energy, taking the origin as a reference point, is

Ugravity = −mg r cos (θ ) .

The spring potential energy is given by


    U_spring = ½ k (l − l₀)² = ½ k (r − l₀)²,
where we have made the observation that the distance of the bob from the origin is also the length of the spring. (Here we see the advantage of using polar coordinates.) Thus, the total potential
energy on the system is
    U = U_gravity + U_spring = ½ k (r − l₀)² − m g r cos(θ).
Therefore,
    L = T − U = ½ m (ṙ² + r² θ̇²) − ½ k (r − l₀)² + m g r cos(θ).
Computing the corresponding Euler-Lagrange equations for conservative forces yields

    m r̈ = m r θ̇² − k (r − l₀) + m g cos(θ)

    θ̈ = − (g/r) sin(θ) − (2/r) ṙ θ̇.
Notice again that shifting the value of the Lagrangian by a constant C has no effect on the equations of motion of the system. It should not surprise us that when the length of the pendulum r is constant, the second equation reduces to that of the simple pendulum. Notice that in this case the first equation will cease to make sense, because r is no longer a generalized coordinate.
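Since the coupled equations of motion for the elastic pendulum have no simple closed form, a numerical integration is the natural way to explore them. The sketch below (an illustrative integration with our own function names and arbitrarily chosen parameters and initial data, not from the notes) advances the two equations of motion with a Runge-Kutta step and checks that the total energy E = T + U stays constant along the computed trajectory, as it must for this conservative system.

```python
import math

# Integrate the elastic-pendulum equations
#   r̈ = r θ̇² - (k/m)(r - l0) + g cos θ,   θ̈ = -(g/r) sin θ - (2/r) ṙ θ̇,
# and monitor conservation of E = T + U.
m, k, l0, g = 1.0, 50.0, 1.0, 9.81

def deriv(s):
    r, rdot, th, thdot = s
    rddot = r * thdot ** 2 - (k / m) * (r - l0) + g * math.cos(th)
    thddot = -(g / r) * math.sin(th) - (2.0 / r) * rdot * thdot
    return (rdot, rddot, thdot, thddot)

def rk4(s, dt):
    a = deriv(s)
    b = deriv(tuple(x + dt / 2 * d for x, d in zip(s, a)))
    c = deriv(tuple(x + dt / 2 * d for x, d in zip(s, b)))
    e = deriv(tuple(x + dt * d for x, d in zip(s, c)))
    return tuple(x + dt / 6 * (p + 2 * q + 2 * u + v)
                 for x, p, q, u, v in zip(s, a, b, c, e))

def energy(s):
    r, rdot, th, thdot = s
    T = 0.5 * m * (rdot ** 2 + r ** 2 * thdot ** 2)
    U = 0.5 * k * (r - l0) ** 2 - m * g * r * math.cos(th)
    return T + U

state = (1.1, 0.0, 0.4, 0.0)       # slightly stretched, displaced from vertical
E0 = energy(state)
for _ in range(20000):
    state = rk4(state, 1e-4)
print(abs(energy(state) - E0))     # remains small: energy is conserved
```

Monitoring a conserved quantity in this way is a standard sanity check on both the derived equations of motion and the integrator.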

4.4.2 Including Conservative and Non-Conservative Forces


When some of the forces in the system are conservative and others are not, the net force can be
written as a sum of a conservative part and a non-conservative part

F⃗ = −∇U + F⃗ ′ .

Now, the j -th generalised component of force is


    Q_j = − ∂U/∂q_j + Q′_j,

where Q′_j is the generalised component associated with the non-conservative forces. We can follow the same procedure as before to construct the Euler-Lagrange equations, producing

    (d/dt)(∂L/∂q̇_j) − ∂L/∂q_j = Q′_j.    (4.7)
As before, the Lagrangian is the difference between the kinetic and potential energy of the system, L = T − U. This is the most general form of Lagrange's equations, where the Lagrangian takes care of the conservative forces and what remains is for us to compute the generalised components of the non-conservative forces.

Remark 28 A complete treatment of non-conservative forces in the Lagrangian formalism is deferred to a more advanced course.

4.5 Constraints
In general, a simple mechanical system might have a collection of forces acting on it so as to constrain, or limit, the motion of objects within that system. An example of this might be a normal force that is exerted on a part of a mechanical system so as to limit its motion. For example, a ball resting on a table top is maintained in its vertical position at its contact point with the table by the normal force (the reaction of the table to the weight of the ball) at the point where it rests on the table.
We might try to eliminate the constraint that the ball is held at fixed height above the ground
by the contact force supplied by the table by adding the force of constraint to the collection of
forces that we must consider when analysing the system. When this is included in the calculations,
the force equations are balanced such that the ball maintains its vertical position.

Another way to study the system is to construct the space of allowed ball positions which
explicitly allows only positions of the ball that are at a given height above the ground, as specified
by the position of the ball on the table. If it is not possible to construct all allowable positions
directly, then a collection of constraints must be added so as to reduce the space of all possible
positions that the ball can occupy.
A complete and systematic analysis of such a system must take these constraints into account. The benefit of generalised coordinates is that they eliminate the forces of constraint. The benefit of the Lagrangian approach is that it eliminates the need to consider constraint forces, replacing the concept of constraint forces with a collection of geometric constraints instead.
Let’s return to the Least Action Principle and consider a 2-coordinates system connected by a
(holonomic) constraint. Consider a 2-coordinates system connected by a (holonomic) constraint,
  
f qi , q̇i , t = 0

the variation of the action is then


Zt f Zt f N 
∂L d ∂L
X ‹  ‹‹
δS = δ L dt = dt − δqi .
i =1
∂ qi dt ∂ q̇i
ti ti

or by least action principle


Zt f N 
∂L d ∂L
X ‹  ‹‹
0= dt − δqi
i =1
∂ qi dt ∂ q̇i
ti

Since we must now introduce a collection of constraints among the parameters qᵢ, we can no longer assert that each summand over the index i is linearly independent from the others. Therefore, we cannot separate the single variation into a collection of equations in a single variable alone. Instead, we must keep the entire expression under the integration as a single equation.
The variation δq₁ depends on the variations δq₂, δq₃, and so on. To circumvent this issue, note that the holonomic constraint function takes a fixed value at each point in time, and so the variation of the constraint function is zero. We now explicitly break out each summand under the integration sign. Specialising to two coordinates, since the constraint is holonomic,

    δf = (∂f/∂q₁) δq₁ + (∂f/∂q₂) δq₂ = 0

or

    δq₂ = − [ (∂f/∂q₁) / (∂f/∂q₂) ] δq₁.

Substituting this value for the variation δq₂ into the expression for the action gives

    δS = ∫_{tᵢ}^{t_f} dt δq₁ { [ ∂L/∂q₁ − (d/dt)(∂L/∂q̇₁) ] − [ ∂L/∂q₂ − (d/dt)(∂L/∂q̇₂) ] (∂f/∂q₁)/(∂f/∂q₂) }.
We now require that the variation δS vanishes for arbitrary δq₁, so the integrand must vanish for all time. So,

    ∂L/∂q₁ − (d/dt)(∂L/∂q̇₁) = [ ∂L/∂q₂ − (d/dt)(∂L/∂q̇₂) ] (∂f/∂q₁)/(∂f/∂q₂);

we can rearrange this, since it is separable, to get

    [ ∂L/∂q₁ − (d/dt)(∂L/∂q̇₁) ] / (∂f/∂q₁) = [ ∂L/∂q₂ − (d/dt)(∂L/∂q̇₂) ] / (∂f/∂q₂).

This equation is now separated: each side depends only on q₁ or only on q₂, which can be varied independently while maintaining the equality. So far, we have managed to separate the qᵢ dependence, but there could still be dependence on the time (implicit or explicit). Therefore each side of this equation is necessarily independent of the qᵢ, but possibly depends on t. Then,

    −λ = [ ∂L/∂q₁ − (d/dt)(∂L/∂q̇₁) ] / (∂f/∂q₁)   and   −λ = [ ∂L/∂q₂ − (d/dt)(∂L/∂q̇₂) ] / (∂f/∂q₂),

where the separation function λ is called a Lagrange Multiplier. Lagrange multipliers are useful whenever we need to manage an unknown constant functional dependence in any constrained optimization problem. This allows us to rewrite our equations of motion as
    0 = ∂L/∂q₁ − (d/dt)(∂L/∂q̇₁) + λ (∂f/∂q₁)

    0 = ∂L/∂q₂ − (d/dt)(∂L/∂q̇₂) + λ (∂f/∂q₂).
In this case we find two equations of motion, in two coordinates, connected by one equation of constraint. We can generalize this to n coordinates and m equations of constraint,

    0 = ∂L/∂qᵢ − (d/dt)(∂L/∂q̇ᵢ) + Σⱼ₌₁ᵐ λⱼ (∂fⱼ/∂qᵢ),

where i = 1, . . . , n and j = 1, . . . , m. Some rearranging gives

    ∂L/∂qᵢ + Σⱼ₌₁ᵐ λⱼ (∂fⱼ/∂qᵢ) = (d/dt)(∂L/∂q̇ᵢ).

Inspecting each term, we have a generalised force term

    (d/dt)(∂L/∂q̇ᵢ) = dpᵢ/dt

and another force term

    ∂L/∂qᵢ + Σⱼ₌₁ᵐ λⱼ (∂fⱼ/∂qᵢ),

where

• Fᵢ = ∂L/∂qᵢ is a conservative force (depending only on position), and

• Qᵢ = Σⱼ₌₁ᵐ λⱼ (∂fⱼ/∂qᵢ) is a generalized force of constraint.

Generalized forces could also include torques.

Example 4.6 (A disc rolling without slipping down an inclined plane) Suppose a disc is placed on an incline such that the disc does not slide down the slope, but is allowed to roll down the slope instead. Suppose further that the length of the incline is l, the distance from the start of the rolling is y, the radius of the disc is r, the angle of the incline is α, and the angular displacement as the disc rolls is θ, see Figure 4.9.

Figure 4.9: A cylinder placed on an inclined plane that is subject to a no-slipping condition must roll down the plane: there is no relative motion between the material point of the cylinder and the point of the plane at which contact is made; instead, the contact point moves along both the cylinder and the plane.

The equation of constraint on the motion of the disc is

f (y , θ ) = y − r θ = 0

Construct the Lagrangian

    T = ½ m ẏ² + ¼ m r² θ̇²   and   V = m g (l − y) sin(α),

then

    L = ½ m ẏ² + ¼ m r² θ̇² − m g (l − y) sin(α).
Including the Lagrange multipliers in the equations of motion we find

    0 = ∂L/∂y − (d/dt)(∂L/∂ẏ) + λ (∂f/∂y)   and   0 = ∂L/∂θ − (d/dt)(∂L/∂θ̇) + λ (∂f/∂θ).

Plugging in the expressions for L and f gives

    0 = m g sin(α) − m ÿ + λ   and   0 = − ½ m r θ̈ − λ.

Eliminating the Lagrange Multiplier gives

    0 = m g sin(α) − m ÿ − ½ m r θ̈.

From the equation of constraint we also have

    y = r θ   and   ÿ = r θ̈   ⟹   ÿ = (2/3) g sin(α).

Then

    λ = − ⅓ m g sin(α)

and

    ÿ = 2 g sin(α)/3   and   θ̈ = 2 g sin(α)/(3 r).
Now we could integrate each of these equations twice to recover the position y and the angle θ through which the disc has rolled. Or, since we have the no-slipping condition, it is possible to integrate one of these equations and then use the above geometric construction to determine the other.
The negative sign of λ indicates that the associated generalised force is directed up the slope: it opposes the motion with respect to the increasing y coordinate. This corresponds to the force of friction that is responsible for making the disc roll down the slope instead of sliding down the slope, and to the torque acting on the disc due to the friction at the point of contact.
Let us now look at the generalised forces,

    Q_y = λ (∂f/∂y) = λ = − ⅓ m g sin(α)   and   Q_θ = λ (∂f/∂θ) = −r λ = ⅓ m g r sin(α),

where r θ̇ = ẏ and ẏ² = r² θ̇², so that

    L = ¾ m ẏ² + m g (y − l) sin(α)   and   ÿ = (2/3) g sin(α).

The Lagrangian then depends on only one parameter, y, and we can find the equation of motion for y directly. It is easy to see that if it is not necessary to determine the generalised forces, then using the constraints directly is a much faster way to determine the equations of motion.
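One qualitative consequence of the result ÿ = (2/3) g sin α is that a rolling disc always takes longer to descend a given incline than a frictionless slider, for which ÿ = g sin α. The sketch below (a small illustrative calculation with our own variable names and arbitrarily chosen values of l and α, not from the notes) compares the two descent times using the constant-acceleration relation t = √(2l/a).

```python
import math

# For constant acceleration a from rest, y(t) = a t²/2, so the time to cover
# a distance l is t = sqrt(2 l / a).  Compare the rolling disc,
# a = (2/3) g sin α, with a frictionless slider, a = g sin α.
g, l, alpha = 9.81, 2.0, math.radians(30.0)

a_roll = 2.0 * g * math.sin(alpha) / 3.0
a_slide = g * math.sin(alpha)

t_roll = math.sqrt(2.0 * l / a_roll)
t_slide = math.sqrt(2.0 * l / a_slide)

print(t_roll / t_slide)   # sqrt(3/2) ≈ 1.2247, independent of l, α, r and m
```

The ratio √(3/2) is independent of the incline geometry: the fraction of the driving force consumed in spinning the disc up is fixed by its rotational inertia.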

Example 4.7 (The Simple Atwood Machine) An Atwood machine is a mechanical assembly comprising masses m₁ and m₂ suspended over a massless, frictionless pulley, of radius r, by a massless inextensible cord of length l, see Figure 4.10. If the masses m₁ and m₂ are different, then the loading of the pulley will be uneven and the system will start to move. Our task is to determine the equations of motion for this system. Suppose we choose a horizontal datum line that passes through the centre of the pulley.
Since the cord is inextensible, its total length l is fixed and equal to the sum of the lengths of the cord hanging below the datum line and the portion of the cord running over the half-circumference of the top of the pulley. The fixed length of the cord fixes the relative position and motion of each mass,

l = y1 + y2 + πr =⇒ y2 = l − πr − y1


Figure 4.10: The simple Atwood machine comprises masses m₁ and m₂ suspended over a massless pulley, of radius r, by a massless inextensible cord.

l˙ = ẏ1 + ẏ2 = 0 =⇒ ẏ1 = − ẏ2

We shall use the constraint on the relative motion of each mass to fix the dynamics of the system.
Next we set up a Lagrangian for the system by considering the kinetic and potential energy of each block. The kinetic energy of each block is expressed as before in the usual way,

    T₁ = ½ m₁ ẏ₁²   and   T₂ = ½ m₂ ẏ₂².
2 2
Each block is subject to gravity and will have a potential energy measured relative to the datum
line, where we have chosen upward to be the positive vertical direction,

V1 = −m1 g y1 and V2 = −m2 g y2

In total we find
    T = ½ (m₁ ẏ₁² + m₂ ẏ₂²)   and   V = −g (m₁ y₁ + m₂ y₂).
Following the constraint, we find that the velocities of the blocks, whatever they may be, are equal in magnitude. So as one block moves down with a given speed, the other moves up at the same speed. This allows us to reduce the number of variables from two to one and simplify the kinetic and potential energy expressions for the system,

    T = ½ ẏ₁² (m₁ + m₂)   and   V = −y₁ g (m₁ − m₂) − m₂ g (l − π r)

and so

    L = ½ ẏ₁² (m₁ + m₂) + y₁ g (m₁ − m₂) + m₂ g (l − π r).
The equation of motion is given by

    ∂L/∂y₁ − (d/dt)(∂L/∂ẏ₁) = 0.

After some algebra, we see that the Lagrangian for the system is a function of one variable, only.
We can now solve the equation of motion for this system in a single variable y1 which describes
the position of mass m1 as a function of time and then use the equation of constraint to reconstruct
y2 which gives the position of mass m2 . Computing each term in the Euler-Lagrange equation gives

    ∂L/∂y₁ = g (m₁ − m₂)   and   (d/dt)(∂L/∂ẏ₁) = (m₁ + m₂) ÿ₁.

So,

    ÿ₁ = g (m₁ − m₂)/(m₁ + m₂).

We can simplify this expression if we define the reduced mass of the system μ = (m₁ − m₂)/(m₁ + m₂). Then

ÿ1 − g µ = 0.

Again the constant term in the Lagrangian does not contribute to the equations of motion. Also, the motion depends only on the reduced mass of the system. When μ > 0, m₁ moves downward with increasing speed and m₂ moves upward with increasing speed. When μ < 0 the situation is reversed: m₂ moves downward with increasing speed and m₁ moves upward with increasing speed. Clearly, the system is balanced when m₁ = m₂, which corresponds to a reduced mass μ = 0. When μ is zero, the masses move at whatever initial speed the system had at the initial time.
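The solution can be illustrated numerically. The sketch below (an illustrative calculation with our own function names and arbitrarily chosen masses and dimensions, not from the notes) uses the closed-form uniformly accelerated motion ÿ₁ = g μ from rest, reconstructs y₂ through the inextensible-cord constraint, and checks that the total cord length l = y₁ + y₂ + π r is preserved at every instant.

```python
import math

# Atwood machine: ÿ1 = g μ with μ = (m1 - m2)/(m1 + m2).  From rest,
# y1(t) = y1(0) + g μ t²/2, and the constraint gives y2 = l - π r - y1.
m1, m2, g = 3.0, 2.0, 9.81
l, r = 5.0, 0.1
mu = (m1 - m2) / (m1 + m2)

def y1(t, y1_0=1.0):
    # Position of m1 below the datum line, starting from rest at y1_0.
    return y1_0 + 0.5 * g * mu * t * t

def y2(t):
    # Position of m2, fixed by the inextensible-cord constraint.
    return l - math.pi * r - y1(t)

# The cord length y1 + y2 + π r equals l at every sampled time:
for t in (0.0, 0.5, 1.0, 2.0):
    print(y1(t) + y2(t) + math.pi * r)   # always equal to l
```

With m₁ > m₂ here, μ > 0 and y₁ grows: m₁ descends while m₂ rises by exactly the same amount, as the constraint demands.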

4.6 Conjugate and Cyclic Coordinates


We have already seen that a judicious choice of generalized coordinates will greatly simplify computations. There are other simplifying opportunities to exploit when studying the Euler-Lagrange equations of a given system. The simplest arises for coordinates that do not participate in the dynamics of a given problem. In such cases, the corresponding Euler-Lagrange equations are immediately integrated to give the functional form of these variables. We shall look at this simplifying property of some systems in detail.
Consider the following Lagrangian

    L = T − V = ½ m (ẋ² + ẏ² + ż²) − V(x, y, z),
where V (x , y , z ) is a smooth, continuous function of x , y and z . The corresponding
Euler-Lagrange equations are
    − ∂V/∂qᵢ − (d/dt)(m q̇ᵢ) = 0,

where qᵢ ∈ {x, y, z}. Then

• Fᵢ = − ∂V/∂xᵢ is the force associated with the potential and

• pᵢ = m ẋᵢ is the i-th component of the linear momentum.

So the Euler-Lagrange equations for this Lagrangian are compatible with NII in rectilinear
coordinates.
The generalized momentum p_k is conjugate to the generalized coordinate q_k such that

    ∂L/∂q_k − dp_k/dt = 0,

where

    p_k = ∂L/∂q̇_k   and   ṗ_k = ∂L/∂q_k.
This gives rise to the notion of cyclic coordinates.

Definition 18 (Cyclic Coordinates) A cyclic coordinate is one that does not explicitly appear in
the Lagrangian.

If L does not contain q_k, then q_k is ignorable: its evolution is not determined directly by one of the equations of motion following from the Euler-Lagrange equations, and it can be determined by other means. Suppose q_k is cyclic. Then

    ṗ_k = ∂L/∂q_k = 0,

so p_k is constant with respect to time. For each generalised coordinate that is missing from the Lagrangian, there corresponds a conserved conjugate generalized momentum. This is a special case of the more general Noether Theorem.

Theorem 3 (Noether’s Theorem) For each symmetry of the Lagrangian, there corresponds a
conserved quantity.

Corollary 2 For each generalised coordinate that does not appear in the Lagrangian, there
corresponds a conserved quantity.

Remark 29 The proof of Theorem 3 is omitted, but follows from the procedure of the variational calculus in Section 4.3. A more general discussion of symmetries of the Lagrangian is deferred to more advanced courses.

As an application, we consider in the next example a bead moving across the surface of a
cylinder.

Example 4.8 (Bead on a Cylinder) Suppose we have a cylinder, vertically aligned, of radius r .
Place a mass m on the surface of the cylinder that is constrained to be on the cylinder and subject to
a force.
Suppose F⃗ = −k r⃗, where V = ½ k (x² + y² + z²), and suppose r² = x² + y². Then

    T = ½ m (r² θ̇² + ż²)   and   V = ½ k (r² + z²).

Figure 4.11: Bead on a Cylinder.

Now

    L = ½ m (r² θ̇² + ż²) − ½ k (r² + z²).

The corresponding generalised momenta are

    p_r = ∂L/∂ṙ = 0,   p_z = ∂L/∂ż = m ż   and   p_θ = ∂L/∂θ̇ = m r² θ̇.
∂ ṙ ∂ ż ∂ θ̇
Here, pθ corresponds to angular momentum, pr corresponds to radial momentum, and pz
corresponds to linear momentum. Note that θ is cyclic, so pθ is conserved (constant), but z is not.
Let us now confirm that p_θ is conserved. The equation of motion for θ reduces to

    ṗ_θ = m r² θ̈ = 0,

corresponding to θ̈ = 0, which implies θ̇ is constant. Therefore p_θ = m r² θ̇ is constant and is a conserved quantity.
The equation of motion for z reduces to

    z̈ + (k/m) z = 0,

which is the equation for the harmonic oscillator. Therefore p_z is not conserved; rather, z oscillates.
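Both conclusions can be confirmed numerically. The sketch below (an illustrative integration with our own function names and arbitrarily chosen parameters, not from the notes) advances the bead's equations of motion, θ̈ = 0 and z̈ = −(k/m) z, with a Runge-Kutta step, and checks that the conjugate momentum p_θ = m r² θ̇ stays fixed while z swings back and forth.

```python
import math

# Integrate the bead's equations θ̈ = 0 and z̈ = -(k/m) z and check that
# p_θ = m r² θ̇ is constant while z oscillates.
m, k, r = 1.0, 4.0, 0.5

def rk4(s, dt):
    # state s = (θ, θ̇, z, ż)
    def deriv(s):
        th, thdot, z, zdot = s
        return (thdot, 0.0, zdot, -(k / m) * z)
    a = deriv(s)
    b = deriv(tuple(x + dt / 2 * d for x, d in zip(s, a)))
    c = deriv(tuple(x + dt / 2 * d for x, d in zip(s, b)))
    e = deriv(tuple(x + dt * d for x, d in zip(s, c)))
    return tuple(x + dt / 6 * (p + 2 * q + 2 * u + v)
                 for x, p, q, u, v in zip(s, a, b, c, e))

state = (0.0, 2.0, 0.3, 0.0)           # θ, θ̇, z, ż at t = 0
p_theta0 = m * r ** 2 * state[1]
zs = []
for _ in range(10000):
    state = rk4(state, 1e-3)
    zs.append(state[2])

print(m * r ** 2 * state[1] - p_theta0)  # 0: p_θ is conserved
print(min(zs), max(zs))                  # z oscillates between about ±0.3
```

The cyclic coordinate θ winds steadily around the cylinder at constant θ̇, while the non-cyclic coordinate z executes simple harmonic motion, exactly as the Euler-Lagrange analysis predicts.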

4.7 Exercises
Exercise 4.1 Compute the Equations of Motion of a free particle in Cylindrical and Spherical
Coordinates by using Lagrange’s Equations of Motion.

Exercise 4.2 Given a system of n particles, with all the forces conservative and having the SAME
potential function, show that the net force is conservative with a potential function given by the
sum of the potential function at the position of each particle in the system.

Exercise 4.3 Verify that F⃗ = −∇U .

Exercise 4.4 Because of path independence, we could have chosen a different path and obtained
the same answer that we did above. Another commonly used path is the straight line from the point
to the origin. Use this path as an exercise to check that you get the same answer for the potential
function.

Exercise 4.5 Check that F⃗ = −∇U ′.

Exercise 4.6 (Parametric Motion on a Horizontal plane) Consider the standard x −y -coordinate
system, where x captures the horizontal position information and y captures the vertical position
information. Consider a coin, of radius R , thickness T and mass M , rolling without slipping along
x in the positive direction. Suppose that we mark a point on the circumference of the coin and
define θ as the angle subtended at the center of the coin, by the radial line joining the center of the
coin to the marked point, and horizontal line passing through the center of the coin. Suppose that
we parameterise the motion of the coin by tracking the center of mass of the coin.

1. Explain why the following


r⃗(t ) = R θ x̂ + R ŷ

is the correct parametrisation for the position of the center of mass of the rolling coin. (Hint:
Consider the contact point of the circumference of the coin and the horizontal line.)

2. Explain why a positive value of θ corresponds to the coin moving in the − x̂ direction, while a negative value of θ corresponds to the coin moving in the positive direction. (Hint: Consider the relative rotation of the coin and the relationship between the contact point and the direction of rotation.)

3. Explain why the gravitational potential energy of the coin is constant.

4. Show that the mass density of the coin is

    μ = M/(π R² T)

(Hint: Do the volumetric integral in the appropriate coordinate system, with the appropriate
bounds.)

5. Show that the rotational inertia of the coin is

    I = M R²/2

(Hint: Do the volumetric integral in the appropriate coordinate system, with the appropriate
bounds.)

6. Show that the kinetic energy of the coin as it rolls is
    T = (3/4) M R² θ̇²
(Hint: Consider the two pieces of the kinetic energy separately and then simplify their sum.)

7. Write down an appropriate Lagrangian for this system.

8. Show that the coin rolls with constant angular velocity by solving the corresponding equations of
motion for the system.

9. Now suppose that the coin is allowed to slip as it rolls. Show that the new Lagrangian for this
system is

L = (1/2) M ẋ² + (1/4) M R² θ̇²

and describe the qualitative differences between the case where the coin is allowed to slip as it
rolls and the case where it does not.

10. Determine the equations of motion for the case where slipping is allowed and describe the
relationship between the horizontal and rotational motion of the coin.
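The energy bookkeeping in this exercise can be checked with a computer algebra system. The course examples use Mathematica; the following SymPy sketch is our own construction and is not part of the exercise. It confirms the kinetic energy of part 6 and the free-rolling equation of motion of part 8:

```python
import sympy as sp

# A SymPy sketch (our own; the course examples use Mathematica) checking the
# kinetic energy and equation of motion of the freely rolling coin.
t = sp.symbols('t')
M, R = sp.symbols('M R', positive=True)
theta = sp.Function('theta')
thetadot = sp.diff(theta(t), t)

# Rolling without slipping: the centre moves at speed R*thetadot, and the
# rotational inertia of the coin is I = M R^2 / 2 (part 5).
v = R * thetadot
I = M * R**2 / 2
T = sp.Rational(1, 2) * M * v**2 + sp.Rational(1, 2) * I * thetadot**2

# The gravitational potential is constant (part 3), so L = T up to a constant.
L = T

# Euler-Lagrange equation in theta: d/dt (dL/d thetadot) - dL/d theta.
EL = sp.diff(sp.diff(L, thetadot), t) - sp.diff(L, theta(t))
```

Simplifying T gives (3/4)MR²θ̇², and the Euler-Lagrange equation reduces to (3/2)MR²θ̈ = 0, so the angular velocity is constant.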

Exercise 4.7 (Parametric Motion on an Incline) Consider the standard x − y -coordinate system.
Consider a coin, of radius R , thickness T and mass M , rolling without slipping on a fixed incline.
Suppose that the incline makes an angle of α with the horizontal plane and the coin rolls down
the slope in the positive x̂ direction. Suppose that we mark a point on the circumference of the
coin and define θ as the angle subtended at the center of the coin, by the radial line joining the
center of the coin to the marked point, and the horizontal line passing through the center of the coin.
Suppose that we parameterise the motion of the coin by tracking the center of mass of the coin.

1. Explain why the following


r⃗(t ) = X (t ) x̂ + Y (t ) ŷ

with

X (t ) = X₀ + R θ (t ) cos (α)  and  Y (t ) = Y₀ − R θ (t ) sin (α) + R ,

where X₀ and Y₀ give the initial position of the center of the coin with respect to the incline, is the
correct parametrisation of the rolling coin. (Hint: Consider the position of the contact point of the
circumference of the coin and the incline.)

2. Explain why a positive value of θ corresponds to the coin moving in the − x̂ direction, while a
negative value of θ corresponds to the coin moving in the positive direction. (Hint: Consider
the relative rotation of the coin and the relationship between the contact point and the
direction of rotation.)

3. Does the coin roll up or down the incline when θ is a positive and increasing function? Explain
your reasoning. (Hint: Consider the relative rotation of the coin and the relationship between
the contact point and the direction of rotation.)

4. Explain why the two gravitational potentials

U₁ = M g (Y₀ − R θ (t ) sin (α) + R )  and  U₂ = −M g R θ (t ) sin (α)

are each equally valid choices for the gravitational potential energy of the system and explain
under what conditions these are valid choices for the potential energy of the system. (Hint:
Consider the implication of a shift in the value of the potential by a constant value.)

5. Write down an appropriate Lagrangian for this system.

6. Show that the coin undergoes constant acceleration, with magnitude depending on the angle α.

7. Determine the equations of motion for the case where slipping is allowed and describe the
relationship between the horizontal and rotational motion of the coin.
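As a check on parts 5 and 6, the following SymPy sketch (our own construction; it assumes the potential is taken to decrease as the coin rolls down, U = −M g R θ(t) sin(α) up to an additive constant) derives the angular acceleration:

```python
import sympy as sp

# A SymPy sketch (our own; the course tooling is Mathematica) of the incline
# problem: rolling without slipping, with the potential decreasing as the coin
# rolls down, U = -M g R theta sin(alpha) up to an additive constant.
t = sp.symbols('t')
M, R, g, alpha = sp.symbols('M R g alpha', positive=True)
theta = sp.Function('theta')
thetadot = sp.diff(theta(t), t)

T = sp.Rational(3, 4) * M * R**2 * thetadot**2   # kinetic energy, as in Exercise 4.6
U = -M * g * R * theta(t) * sp.sin(alpha)
L = T - U

# Euler-Lagrange equation, solved for the angular acceleration.
EL = sp.diff(sp.diff(L, thetadot), t) - sp.diff(L, theta(t))
thetaddot = sp.solve(sp.Eq(EL, 0), sp.diff(theta(t), t, 2))[0]
```

The solver returns θ̈ = 2g sin(α)/(3R), which depends on α but not on θ or θ̇, so the acceleration is constant in time.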

Exercise 4.8 (Pendulum on a Cart) Consider the system comprising the simple pendulum with a
ball of mass mball suspended by a mass-less cord of length ℓ below a cart of mass mcart that is
free to move on a one-dimensional path (x , f (x )), where

1. f (x ) = a , where a ∈ R

2. f (x ) = x 2

and the entire system is assembled near the surface of the earth where the gravitational potential
energy for a mass m is m g h where h is the height above the bottom of the motion of the system and
g = 9.81 m s⁻² is the gravitational acceleration near the surface of the earth.

1. Draw a diagram for this system and label the Generalised coordinates for this system in your
construction for each choice of f . (Hint: Consider the examples of pendulums in the notes.)

2. Construct the Lagrangian for this coupled system, for each f (x ), as a function of mball , mcart , ℓ
and g . (Hint: Construct the system Lagrangian as a sum of the cart Lagrangian and the ball
Lagrangian.)

3. Find the Euler-Lagrange Equations of motion for this system

a) by hand
b) using a computer algebra package of your choice

The resulting equations will be a set of coupled second order, non-linear differential equations
in the generalised coordinates.

4. Generate the phase space of each generalised coordinate for the following initial conditions

a) The cart and the pendulum are initially at rest, with the cart at the origin of the coordinate
system, and the pendulum is raised to the right of the cart such that the cord is
parallel to the horizontal.

b) The cart moves with velocity ẋ = 1 to the right, while the pendulum is initially at rest
and hangs directly below the cart.

Use mball = mcart = 1, ℓ = 1 in these computations. (Hint: Use a computer algebra package of
your choice to generate the x and ẋ solutions for each generalised coordinate x and construct
plots of x versus ẋ .)

5. Explain the expected behaviour of the system given the phase portrait information. Give
specific reference to the behaviour when f is changed.

You may find the following Mathematica example functions useful in the implementation and
numerical evaluation of the code part of the exercise,

(* list of coordinates *)
COORDS = {x, theta};

(* Numerical solver: returns interpolating functions for x and theta *)
Sol[g_, mcart_, mball_, l_, x0_, theta0_, xdot0_, thetadot0_, TEND_] :=
  Module[{xtheta},
    xtheta = NDSolveValue[
      Flatten[{
        Eqns[g, mcart, mball, l],
        COORDS[[1]][0] == x0,
        COORDS[[2]][0] == theta0,
        COORDS[[1]]'[0] == xdot0,
        COORDS[[2]]'[0] == thetadot0
      }],
      COORDS,
      {t, 0, TEND}
    ];
    Return[xtheta];
  ];

(* Reconstruct solutions from the numerical solver *)
swing = Sol[GRAVITY, MCART, MBALL, LENGTH, x0, theta0, xDOT0, thetaDOT0, TEND];

(* extract coordinate components from the solutions *)
xSol[t_] := First[swing][t];
thetaSol[t_] := Last[swing][t];

(* plot phase space for thetaSol as a parametric function *)
ParametricPlot[{thetaSol[t], thetaSol'[t]}, {t, 0, TEND}]
and the Eqns[g, mcart, mball, l] expression is a comma-separated list of each of the Euler-
Lagrange Equations, written in braces ({...}). Note that x0, theta0, xdot0 and thetadot0 contain
the initial condition information for the numerical solver.
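For students who prefer to avoid a computer algebra package altogether, the dynamics for the flat track f (x ) = a can also be integrated directly. The following Python sketch is our own construction, not part of the course code: with θ measured from the downward vertical, the Euler-Lagrange equations of the cart and ball reduce to a 2×2 linear system for the accelerations, which we integrate with a hand-rolled fourth-order Runge-Kutta step:

```python
import math

# A self-contained Python sketch (names and structure are our own, not part of
# the course code) of the cart-pendulum of Exercise 4.8 on the flat track
# f(x) = a. With theta measured from the downward vertical, the Euler-Lagrange
# equations reduce to the linear system
#   (mc + mb) xdd + mb*l*cos(th)*thdd = mb*l*thd^2*sin(th)
#   cos(th)*xdd + l*thdd              = -g*sin(th)
G, MCART, MBALL, LENGTH = 9.81, 1.0, 1.0, 1.0

def accelerations(th, thd):
    """Solve the 2x2 linear system above for (xddot, thetaddot)."""
    c, s = math.cos(th), math.sin(th)
    det = LENGTH * (MCART + MBALL * s * s)
    b1 = MBALL * LENGTH * thd * thd * s
    b2 = -G * s
    xdd = (LENGTH * b1 - MBALL * LENGTH * c * b2) / det
    thdd = ((MCART + MBALL) * b2 - c * b1) / det
    return xdd, thdd

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step for (x, th, xd, thd)."""
    def f(s):
        _, th, xd, thd = s
        xdd, thdd = accelerations(th, thd)
        return (xd, thd, xdd, thdd)
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def momentum(state):
    """Total horizontal momentum, conserved on the flat track."""
    _, th, xd, thd = state
    return (MCART + MBALL) * xd + MBALL * LENGTH * thd * math.cos(th)

# Initial condition 4(a): everything at rest, cord horizontal.
state = (0.0, math.pi / 2, 0.0, 0.0)
for _ in range(2000):
    state = rk4_step(state, 0.001)
```

Because the track is flat and no external horizontal force acts, the total horizontal momentum (mcart + mball)ẋ + mball ℓ θ̇ cos θ is conserved, which makes a convenient sanity check on the integrator.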

Chapter 5

Multiple Particle Systems

Mainly single-particle systems have been considered in the examples in previous chapters, but
the derivation of Lagrange’s equations of motion allowed for any number of particles to be present
in the system. We now consider the implications of including more than one particle in a system
where interactions among these particles are now included.

5.1 Describing Multiple Particle Systems


The Lagrangian formalism is readily extended to incorporate multiple particle systems. Before
proceeding to applications, we shall first review some methods of describing multiple particle
systems.

5.1.1 Discrete and Continuous Descriptions


A discrete system of particles is a collection of N particles with masses {m1 , m2 , . . . , mN } and positions
{r⃗1 , r⃗2 , . . . , r⃗N }. The total kinetic energy of the system is

T = \sum_{i=1}^{N} \frac{1}{2} m_i \dot{\vec{r}}_i \cdot \dot{\vec{r}}_i = \frac{1}{2} \sum_{i=1}^{N} m_i v_i^2 .

The total potential energy of the system is then

U = \sum_{i=1}^{N} U_i .

In particular, for gravitational potential energy, this becomes

U = \sum_{i=1}^{N} m_i g z_i = g \sum_{i=1}^{N} m_i z_i .

If there are non-conservative forces, we must compute their generalised components of force,

Q_j = \sum_{i=1}^{N} F_i \frac{\partial x_i}{\partial q_j} .

We regard F_j as the net j -th component of force in Cartesian coordinates,

F_j = \sum_{i=1}^{N} F_j^{(i)} ,

where F_j^{(i)} is the force in the j -th component acting on the i -th particle. Then for a system of
particles,

Q_j = \sum_{i=1}^{N} \sum_{k} F_k^{(i)} \frac{\partial x_k^{(i)}}{\partial q_j} .
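As a concrete illustration (our own example, not from the text), consider a single particle on a pendulum of length l with generalised coordinate θ, acted on by a horizontal force of magnitude F. The generalised force is Q_θ = F_x ∂x/∂θ + F_y ∂y/∂θ, which a short SymPy sketch evaluates:

```python
import sympy as sp

# Our own illustration (not from the text): a particle on a pendulum of length
# l with generalised coordinate theta, acted on by a horizontal force F.
theta, l, F = sp.symbols('theta l F')
x = l * sp.sin(theta)
y = -l * sp.cos(theta)
Fx, Fy = F, sp.Integer(0)

# Generalised force: Q_theta = F_x dx/dtheta + F_y dy/dtheta.
Q_theta = Fx * sp.diff(x, theta) + Fy * sp.diff(y, theta)
```

The result simplifies to Q_θ = F l cos(θ), the torque-like component of the applied force along the θ direction.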
For many systems (like the every-day objects with which we interact), the number of particles
in the system is very large, and the scale of the individual particles is extremely small relative
to the scale of the system (like particles in a fluid or traffic on a congested highway). In such
circumstances, it is sensible to think of the system as a continuous mass distribution rather than
a discrete collection of objects.
Suppose we consider a distribution of particles within a 2-dimensional box. In some places
there will be a much denser packing of particles than in others. Now suppose there is a sliding
’window’ over the system where we may count the number of particles that fit inside the window
as it slides from one place in the box to another. If the window has a fixed size, then dividing
the number of particles within the window, when the window is at a given position (x , y ) in the
box, by the area of the window yields a density ρ(x , y ) of the system at the location of the
window. Notice that the density ρ(x , y ) is well defined at all points in space even though there
might be gaps between particles. Therefore, we have produced a function of two continuous
variables. This could also be extended to 3-dimensions.
Now we make the assumption that the window is very small relative to our every-day scale,
and call its area d A. Then the total amount of mass fitting inside the window at a point is an
infinitesimal quantity of mass d m located at that point. This gives us the relationship

d m = d A ρ(x , y ) .

Formally, the argument requires that, in the limit as d A approaches 0, the ratio d m /d A approaches
ρ at each point. Notice in particular that the integral of the mass density over the region gives the
total mass of the system

M = \int_A d m = \int_A d A \, \rho(x , y ) .
We can apply a similar argument to particles living in a volume or along a line and produce similar
results.
For continuous objects, we integrate over the mass distribution instead of summing over the
system,

\sum_i \to \int \quad \text{and} \quad m_i \to d m ,

and so we can write out kinetic and potential energies for continuous systems

T = \int d T = \int d m \, \frac{1}{2} v^2 \qquad (5.1)

and

U = \int d U , \qquad (5.2)

in particular

U_{gravity} = g \int d m \, z \qquad (5.3)

where d m is the mass differential, which we think of as a small ‘piece’ of mass. The total mass
follows from the density distribution ρ(r⃗). If the system is one dimensional, then the density
is a line density and d m = d x ρ(x ). If the system is two dimensional, then the density is an
area density and d m = d x d y ρ(x , y ), or equivalently, d m = d A ρ(r⃗), where it may be more
convenient to specify ρ in coordinates other than Cartesian coordinates. Finally, when the system
is three dimensional, the density is a volume density and d m = d x d y d z ρ(x , y , z ) or, equivalently,
d m = d V ρ(r⃗). In the special case where ρ is constant with respect to its coordinate dependence,
the mass density of the body in question is said to be constant and we speak of a uniform
continuous body, with

M = \int_V d m = \int_V d V \, \rho = \rho \int_V d V = \rho V .

Hence,

\rho = \frac{M}{V} .

Similarly, a 2-dimensional uniform continuous body with area A and mass M has density ρ = M /A
and a 1-dimensional uniform continuous body of length L and mass M has density ρ = M /L .

5.1.2 Centre of Mass


In mathematics (and even more so in physics) it is important to have an intuitive grasp of a
concept before working directly with the equations. The more we perfect our intuition about a
problem, the more meaning we can assign to the formulae and equations that result. In this spirit,
we open this section with a motivational problem.

Example 5.1 (A Pivot Balance) Given two particles of equal weight connected together by a thin
mass-less rod, see Figure 5.1, where along the rod should a pivot be placed so that the system remains
balanced? The answer seems intuitively obvious: halfway.
Now we consider a slightly different problem: this time let the mass on the left be 1kg and the
mass on the right be 2kg. Intuitively we suspect that the pivot should be placed closer to the heavy
mass in order for the system to be in balance. How close? That is a slightly more difficult question.
Let us solve this problem using Lagrangian Mechanics (this is instructive in the context of this
course even though arguments appealing to torque can certainly also be used for this problem).
We’ll generalize it slightly so that instead of 1kg and 2kg we have two masses of arbitrary mass m1
and m2 .
Let the rod have a total length L and place the pivot point at a distance d from the left mass.
We wish to solve for d such that the system is in equilibrium and the angle about the pivot is zero.


Figure 5.1: A simple mass balance comprising a rigid, massless rod of length L placed atop a
fulcrum at a distance d from one end. Masses m1 and m2 are fixed at opposing ends of the rigid
rod.

To do this, we must first solve for the motion of the system in general, and then find out what the
equilibrium condition is. It is important to note that the distance d is not a generalized coordinate.
This is because once we have chosen where to place the pivot, we fix it there and do not allow it to
move subject to the laws of physics. The only motion of the system is to rotate about the pivot point.
Thus there is precisely one degree of freedom for the system.
To begin with, we fix a reference Cartesian Coordinate System with its origin at the pivot point.
A good generalized coordinate for the system is the angle θ , which describes the rotation of the rod
about the pivot. We must produce transformation equations that give us back the coordinates of
each of the masses m1 and m2 ,

x1 = −d cos (θ ) x2 = (L − d ) cos (θ )
y1 = −d sin (θ ) y2 = (L − d ) sin (θ )

The transformation equations show conclusively that there is one degree of freedom, and one
coordinate θ . The only force acting on the system is the force of Gravity, which is a conservative
force. We can thus use Lagrange’s Equations for a Conservative Force in this instance. The Kinetic
Energy is the sum of the Kinetic Energy of each particle,

T = ½ m₁ d² θ̇² + ½ m₂ (L − d )² θ̇² = ½ (m₁ d² + m₂ (L − d )²) θ̇² .
Similarly, we compute the total potential energy of the system,

U = m1 g y1 + m2 g y2 = −m1 g d sin (θ ) + m2 g (L − d ) sin (θ ) = (m2 L − m1 d − m2 d ) g sin (θ ) .

Therefore, the Lagrangian is

L = T − U = ½ (m₁ d² + m₂ (L − d )²) θ̇² + (m₁ d + m₂ d − m₂ L ) g sin (θ ) .
The equation of motion for this system is

(m₁ d² + m₂ (L − d )²) θ̈ = (m₁ d + m₂ (d − L )) g cos (θ ) .

Now we return to the original problem: can we find a value d such that the system is balanced
when it is horizontal? This question decodes as follows. When θ = 0 and θ̇ = 0, what is the value
of d such that there is no rotation? Direct substitution of these requirements into the equation of
motion yields

(m₁ d² + m₂ (L − d )²) (0) = (m₁ d + m₂ (d − L )) g cos (0) ,

or

0 = (m₁ d + m₂ (d − L )) g .

After a little algebra, we find

d = \frac{m_2}{m_1 + m_2} L .
This enables us to give a precise answer to the question with the 1kg and 2kg masses, namely

d = \frac{2}{1 + 2} L = \frac{2}{3} L ,

so the pivot must be placed 2/3 of the length of the rod away from the 1kg mass.
We can generalise this result by introducing a 1-dimensional coordinate system in which the
coordinates x1 and x2 of the masses m1 and m2 are determined relative to the coordinate of the
pivot x c . In this case, we have

d = x c − x1 and L = x2 − x1 .

Substituting this into the above equations yields,

x_c = \frac{m_1 x_1 + m_2 x_2}{m_1 + m_2} . \qquad (5.4)

This gives a coordinate formula for the centre of the system, which is the point at which we can
place the pivot so that the system will balance.
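The equilibrium condition of this example is easy to verify symbolically. Below is a SymPy sketch (SymPy is our substitution here; the worked example above is done by hand):

```python
import sympy as sp

# SymPy check (our own; the worked example is by hand) of the pivot balance:
# solve the equilibrium condition for the pivot distance d.
m1, m2, L, d, g = sp.symbols('m1 m2 L d g', positive=True)

# Equilibrium condition from the equation of motion with theta = thetadot = 0.
equilibrium = sp.Eq((m1 * d + m2 * (d - L)) * g, 0)
d_balance = sp.solve(equilibrium, d)[0]
```

Solving reproduces d = m₂L/(m₁ + m₂), and substituting m₁ = 1, m₂ = 2 gives d = 2L/3 as above.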

It is instructive to extend the discussion presented in Example 5.1, this time with an arbitrary
number of masses, and see what formula results. This is presented here as a guided exercise.
Consider a problem with masses fixed to a rod. We want to know where to place the pivot so that
the resulting system is in equilibrium. To begin with we must find the equation of motion of the
system. To do this, we remark once more that the only degree of freedom is θ . Set up the system
with the origin at the pivot point, and let the distance along the rod to each mass be given as
d 1 , d 2 , . . . , d n , with the convention that d i is negative if it lies to the left of the pivot and positive if
it lies to the right of the pivot. Then follow the following procedure,

1. Obtain Transformation Equations between the generalized coordinate θ and the coordinates
of each one of the masses in the system.

2. Write an expression for the total Kinetic Energy of the system.

3. Write an expression for the total Potential Energy of the system.

4. Write down Lagrange’s Equation for this system. The equation that you obtain here will look
similar to what we obtained previously.

5. Set θ = 0 and θ̇ = 0 to find an equation relating the positions that must be satisfied for the
rod to be in equilibrium. As before, we would now like to consider what this would look
like in a coordinate system in which the coordinates of the masses are x1 , x2 , . . . , xn and the
coordinate of the correct pivot point is x c .

6. Find the relationship between d i and xi .

7. Reduce the system of equations by back substitution.

8. Solve for x c .

Following this process produces

x_c = \frac{m_1 x_1 + m_2 x_2 + \ldots + m_n x_n}{m_1 + m_2 + \ldots + m_n} = \frac{\sum_{i=1}^{n} m_i x_i}{\sum_{i=1}^{n} m_i} . \qquad (5.5)

We can use (5.5) to define the centre of mass of objects in more than one dimension. This gives
rise to the following definition.

Definition 19 (Discrete Centre of Mass) Consider a collection of particles where the i -th particle has
mass m_i and is located at position r⃗_i . The centre of mass r⃗_cm of the collection of particles is

\vec{r}_{cm} = \frac{\sum_{i=1}^{n} m_i \vec{r}_i}{\sum_{i=1}^{n} m_i} . \qquad (5.6)

Note in each case the appearance of the total mass

M = \sum_{i=1}^{n} m_i

in the definition of the centre of mass. We can decompose (5.6) into vector components in the
rectilinear x y z -coordinate system as follows,

x_{cm} = \frac{1}{M} \sum_{i=1}^{n} m_i x_i , \quad y_{cm} = \frac{1}{M} \sum_{i=1}^{n} m_i y_i \quad \text{and} \quad z_{cm} = \frac{1}{M} \sum_{i=1}^{n} m_i z_i .
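A quick numerical check of these component-wise formulas (the masses and positions below are made up for illustration):

```python
# A small numerical check (masses and positions made up) of the discrete
# centre-of-mass formula, computed component-wise.
masses = [1.0, 2.0, 3.0]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 1.0)]

M = sum(masses)
r_cm = tuple(sum(m * p[k] for m, p in zip(masses, positions)) / M
             for k in range(3))
# For this data, r_cm is (5/6, 1/2, 1/2).
```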

This is extendable to any given choice of coordinate system. The form of these component wise
decompositions of the centre of mass is reminiscent of that found in (5.5) in the one dimensional
case. In the continuous approximation, the summation in each of these expressions is promoted
to an integral over a continuous variable, and gives rise to the following definition.

Definition 20 (Continuous Centre of Mass) Consider a continuous mass distribution ρ(r⃗)
evaluated at position r⃗. The centre of mass r⃗_cm of the distribution is

\vec{r}_{cm} = \frac{\int_V d m \, \vec{r}}{\int_V d m} = \frac{\int_V d V \, \rho(\vec{r}) \, \vec{r}}{\int_V d V \, \rho(\vec{r})} . \qquad (5.7)

Again, we have

M = \int_V d m

and

x_{cm} = \frac{1}{M} \int_V d m \, x , \quad y_{cm} = \frac{1}{M} \int_V d m \, y \quad \text{and} \quad z_{cm} = \frac{1}{M} \int_V d m \, z .

The centre of mass of a system is also the centre of gravity of the system in the sense that the force
of gravity tends to act at this point on a system of particles. We make this precise now.

Theorem 4 (Gravitational Potential Energy of a System) The gravitational potential energy of a
system (discrete or continuous) is

U_{gravitational} = M g z_{cm} .

Proof. Consider first the discrete case with N particles where the i -th particle has mass m_i and
is located at position \vec{r}_i = (x_i , y_i , z_i )^\top. Let the gravitational force act in the −ẑ -direction. Then the
gravitational potential energy of the i -th particle is

U_{gravitational}^{(i)} = g m_i z_i .

The total gravitational potential energy of the system is then the sum of all gravitational potential
energies for all N particles,

U_{gravitational} = \sum_{i=1}^{N} U_{gravitational}^{(i)} = g \sum_{i=1}^{N} m_i z_i = g M \frac{1}{M} \sum_{i=1}^{N} m_i z_i = g M z_{cm} .

Similarly, in the continuous case,

U_{gravitational} = \int d U_{gravitational} = \int d m \, g z = g M \frac{1}{M} \int d m \, z = g M z_{cm} . ■
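The discrete case of this theorem is easy to check numerically. The following sketch (with randomly generated masses and heights of our own choosing) compares the direct sum of the particle potential energies against M g z_cm:

```python
import random

# A numerical check (our own illustration) of the discrete case of Theorem 4:
# the sum of m_i g z_i over the particles agrees with M g z_cm.
random.seed(0)
g = 9.81
masses = [random.uniform(0.5, 2.0) for _ in range(10)]
heights = [random.uniform(-1.0, 1.0) for _ in range(10)]

U_direct = sum(m * g * z for m, z in zip(masses, heights))
M = sum(masses)
z_cm = sum(m * z for m, z in zip(masses, heights)) / M
U_cm = M * g * z_cm
```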

Therefore, the gravitational potential energy of the entire system can be computed easily if
the total mass and centre of mass of the system are known. Next we consider the effect of Newton's
Second Law of motion as applied to a system of particles.

Theorem 5 (External Force Applied to a System) The net external force F⃗ (e) acting on a system of
particles will accelerate the system according to

F⃗ (e) = M a⃗cm .

Proof. Consider the discrete case with N particles where the i -th particle has mass m_i and is
located at position \vec{r}_i = (x_i , y_i , z_i )^\top. Split the total force in the system into the external force
applied to the system and the internal force components applied by the particles on one another.
Denote by \vec{F}_i^{(e)} the external force applied to the i -th particle and by \vec{F}_{ji} the force exerted on
particle i by particle j . Then the total force experienced by particle i is

\vec{F}_i = \vec{F}_i^{(e)} + \sum_{j \neq i} \vec{F}_{ji} .

By Newton’s Third Law of motion, the force exerted on particle j by particle i is equal in magnitude
and opposite in direction to the force exerted on particle i by particle j , so

F⃗i j = −F⃗j i .

The net force on the entire system is then the sum of all forces acting on all particles,

\vec{F} = \sum_i \vec{F}_i = \sum_i \vec{F}_i^{(e)} + \sum_i \sum_{j \neq i} \vec{F}_{ji} = \sum_i \vec{F}_i^{(e)} + \sum_i \sum_{j < i} \vec{F}_{ji} - \sum_i \sum_{i < j} \vec{F}_{ij} .

Clearly, the last two sums cancel, so the internal forces on the system cancel. The net force acting on
the system is then the sum of external forces acting on the system only,

\vec{F} = \sum_i \vec{F}_i^{(e)} = \vec{F}^{(e)} .

By Newton’s Second Law of Motion, we can write the total force acting on the i -th particle as

F⃗i = mi r⃗¨i .

The position vector r⃗i can now be decomposed as the sum of two vectors, the position of the centre
of mass of the system r⃗cm and the position of the particle with respect to the centre of mass r̃⃗i , so

r⃗i = r⃗cm + r̃⃗i .

Then,
F⃗i = mi r⃗¨cm + r̃⃗¨i .
€ Š

Summing over all particles yields,


\vec{F} = \sum_i \vec{F}_i
  = \sum_i m_i \left( \ddot{\vec{r}}_{cm} + \ddot{\tilde{\vec{r}}}_i \right)
  = \ddot{\vec{r}}_{cm} \sum_i m_i + \frac{d^2}{dt^2} \sum_i m_i \tilde{\vec{r}}_i
  = M \vec{a}_{cm} + \frac{d^2}{dt^2} \sum_i m_i \left( \vec{r}_i - \vec{r}_{cm} \right)
  = M \vec{a}_{cm} + \frac{d^2}{dt^2} \left( M \vec{r}_{cm} - M \vec{r}_{cm} \right)
  = M \vec{a}_{cm} .

So the application of an external force to a system acts to cause the centre of mass of the system to
accelerate as if the force were acting on a single particle of mass M situated at the centre of mass of
the system. ■
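The cancellation of internal forces in this proof can be illustrated numerically: generate arbitrary pairwise forces obeying Newton's third law, and check that they drop out of the total. This is our own illustration, not part of the text:

```python
import random

# A numerical illustration (our own) of the internal-force cancellation used in
# the proof of Theorem 5: pairwise forces obeying Newton's third law drop out
# of the total, leaving only the sum of the external forces.
random.seed(1)
N = 6
external = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(N)]
# pair[(a, b)] is the force exerted on particle b by particle a, for a < b.
pair = {(a, b): [random.uniform(-1, 1) for _ in range(3)]
        for a in range(N) for b in range(a + 1, N)}

def force_on(i):
    """Total force on particle i: external plus all internal contributions."""
    f = list(external[i])
    for j in range(N):
        if j == i:
            continue
        a, b = min(i, j), max(i, j)
        sign = 1.0 if i == b else -1.0   # Newton's third law: F_ij = -F_ji
        for k in range(3):
            f[k] += sign * pair[(a, b)][k]
    return f

total = [sum(force_on(i)[k] for i in range(N)) for k in range(3)]
total_external = [sum(external[i][k] for i in range(N)) for k in range(3)]
```

Component by component, the total force equals the total external force, exactly as the proof requires.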

Next we consider a simple example of computing the centre of mass for a uniform solid.

Example 5.2 (Centre of Mass of Uniform Hemisphere) Consider a uniform solid hemisphere of
radius R where the hemisphere is oriented so that the equator of the sphere lies in the x y -plane with
the axis of symmetry of the hemisphere passing through the origin. We may compute the centre
of mass of the hemisphere in a multitude of ways; here we consider one of them. For more on how to
compute these sorts of integrals, students are encouraged to consult references on multivariable
calculus and integration methods.
Observe that by symmetry, the centre of mass must lie along the line of symmetry that passes
through the north pole of the hemisphere and the origin of the coordinate system (the z -axis).
We can see this by taking any point on the volume contained in the hemisphere and reflecting it
about the z -axis to produce a pair of points with centre of mass that lies on the z -axis. Hence, we
can think of integrating over the whole system as corresponding to summing over such pairs with
centre of mass on the z -axis. Therefore, the centre of mass of the entire system must lie on the z -axis.
We can formalise this by reducing the problem to a symmetric sum over points in some cross-section
of the hemisphere, where the z -axis lies in the plane of the cross-section, and then evaluating the
volume of the hemisphere as a solid of rotation about the z -axis. Thus the x -centre of mass and the
y -centre of mass of the system must both be zero. If the paragraph above was not sufficient to convince
you of this fact, compute these integrals and make sure that they do indeed evaluate to zero.
What remains is to compute the z -centre of mass of the system, only. Using the rule for continuous
systems, we find

z_{cm} = \frac{\int_V d m \, z}{\int_V d m} = \frac{\int_V d V \, \rho z}{\int_V d V \, \rho} = \frac{\rho \int_V d V \, z}{\rho \int_V d V} = \frac{1}{V} \int_V d V \, z ,

where V is the volume of the hemisphere. Now, to evaluate the integrals explicitly, we consider a
convenient choice of coordinates. Cartesian coordinates can be used to perform this computation,
however the calculus is significantly simplified by considering spherical polar coordinates. Students
are encouraged to implement Cartesian coordinates to evaluate these integrals. Recall that the
volume element in spherical polar coordinates is

d V = d r \, d\theta \, d\phi \, r^2 \sin(\theta) ,

where r denotes the radial coordinate, written r so as to avoid confusion with the density ρ to appear.
The limits of integration for this problem are those coordinate parameters that allow for total
volumetric coverage of the hemisphere,

0 \leq r \leq R , \quad 0 \leq \theta \leq \frac{\pi}{2} \quad \text{and} \quad 0 \leq \phi \leq 2\pi .
In addition, the z position of any mass element d m in the hemisphere is simply r cos (θ ). Then, the
centre of mass is
z_{cm} = \frac{1}{V} \int_0^{\pi/2} d\theta \int_0^{2\pi} d\phi \int_0^{R} d r \, r \cos(\theta) \, r^2 \sin(\theta)
  = \frac{1}{V} \int_0^{\pi/2} d\theta \int_0^{2\pi} d\phi \int_0^{R} d r \, r^3 \cos(\theta) \sin(\theta)
  = \frac{1}{V} \frac{R^4}{4} \int_0^{\pi/2} d\theta \int_0^{2\pi} d\phi \, \frac{1}{2} \sin(2\theta)
  = \frac{1}{V} \, 2\pi \frac{R^4}{8} \int_0^{\pi/2} d\theta \, \sin(2\theta)
  = \frac{\pi R^4}{4V} .
Now, V is the volume of the hemisphere, which is simply half the volume of a sphere of radius R , so

V = \int_0^{\pi/2} d\theta \int_0^{2\pi} d\phi \int_0^{R} d r \, r^2 \sin(\theta) = \frac{2\pi R^3}{3} \int_0^{\pi/2} d\theta \, \sin(\theta) = \frac{2}{3} \pi R^3 .

Direct substitution of this value for V into the integral for z_cm yields,

z_{cm} = \frac{\pi R^4}{4} \cdot \frac{3}{2\pi R^3} = \frac{3}{8} R .

We conclude that the centre of mass of the uniform hemisphere of radius R is located on the central
axis of the hemisphere at a distance (3/8) R from the base.
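The integrals in this example can be verified with a computer algebra system; below is a SymPy sketch (our choice of tool):

```python
import sympy as sp

# SymPy verification (our choice of tool) of the hemisphere integrals above.
r, th, ph, R = sp.symbols('r theta phi R', positive=True)

# Volume element in spherical polar coordinates, upper hemisphere:
# 0 <= r <= R, 0 <= theta <= pi/2, 0 <= phi <= 2*pi.
dV = r**2 * sp.sin(th)
V = sp.integrate(dV, (r, 0, R), (th, 0, sp.pi / 2), (ph, 0, 2 * sp.pi))
z_moment = sp.integrate(r * sp.cos(th) * dV,
                        (r, 0, R), (th, 0, sp.pi / 2), (ph, 0, 2 * sp.pi))
z_cm = sp.simplify(z_moment / V)
```

The computed volume is 2πR³/3 and the centre of mass simplifies to 3R/8, in agreement with the calculation above.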

5.1.3 Kinetic Energy of a System of Particles


The centre of mass plays an important role in the computation of the gravitational potential
energy of a system. Next we consider its relation to the kinetic energy of a system. To begin,

consider again the position of the i -th particle in a system specified relative to the centre of mass
of the system
r⃗i = r⃗cm + r̃⃗i ,

where, again r⃗cm is the position of the centre of mass and r̃⃗i is the position of the i -th particle
relative to the centre of mass. We now present the following theorem.

Theorem 6 (Kinetic Energy of a System of Particles) The kinetic energy of a system of N particles,
where the i -th particle has mass m_i and velocity \tilde{\vec{v}}_i relative to the centre of mass of the system, which
has velocity \vec{v}_{cm} , is

T = \frac{1}{2} M \vec{v}_{cm} \cdot \vec{v}_{cm} + \frac{1}{2} \sum_{i=1}^{N} m_i \tilde{\vec{v}}_i \cdot \tilde{\vec{v}}_i ,

and is the sum of the kinetic energy of the centre of mass and the internal kinetic energy of the
constituent components of the system.
Proof. Compute the total kinetic energy of the system as the sum of kinetic energies of each
particle in the system,

T = \sum_{i=1}^{N} T_i = \frac{1}{2} \sum_{i=1}^{N} m_i \dot{\vec{r}}_i \cdot \dot{\vec{r}}_i
  = \frac{1}{2} \sum_{i=1}^{N} m_i \left( \dot{\vec{r}}_{cm} + \dot{\tilde{\vec{r}}}_i \right) \cdot \left( \dot{\vec{r}}_{cm} + \dot{\tilde{\vec{r}}}_i \right)
  = \frac{1}{2} \sum_{i=1}^{N} m_i \left( \dot{\vec{r}}_{cm} \cdot \dot{\vec{r}}_{cm} + 2 \dot{\vec{r}}_{cm} \cdot \dot{\tilde{\vec{r}}}_i + \dot{\tilde{\vec{r}}}_i \cdot \dot{\tilde{\vec{r}}}_i \right)
  = \frac{1}{2} \sum_{i=1}^{N} m_i \left( \dot{\vec{r}}_{cm} \cdot \dot{\vec{r}}_{cm} + \dot{\tilde{\vec{r}}}_i \cdot \dot{\tilde{\vec{r}}}_i \right) + \sum_{i=1}^{N} m_i \dot{\vec{r}}_{cm} \cdot \dot{\tilde{\vec{r}}}_i
  = \frac{1}{2} M \vec{v}_{cm} \cdot \vec{v}_{cm} + \frac{1}{2} \sum_{i=1}^{N} m_i \tilde{\vec{v}}_i \cdot \tilde{\vec{v}}_i + \dot{\vec{r}}_{cm} \cdot \frac{d}{dt} \sum_{i=1}^{N} m_i \left( \vec{r}_i - \vec{r}_{cm} \right)
  = \frac{1}{2} M \vec{v}_{cm} \cdot \vec{v}_{cm} + \frac{1}{2} \sum_{i=1}^{N} m_i \tilde{\vec{v}}_i \cdot \tilde{\vec{v}}_i + \dot{\vec{r}}_{cm} \cdot \frac{d}{dt} \left( M \vec{r}_{cm} - M \vec{r}_{cm} \right)
  = \frac{1}{2} M \vec{v}_{cm} \cdot \vec{v}_{cm} + \frac{1}{2} \sum_{i=1}^{N} m_i \tilde{\vec{v}}_i \cdot \tilde{\vec{v}}_i .
Therefore, the kinetic energy of a system of N particles is the sum of kinetic energy of the centre of
mass and the internal kinetic energy of the constituent components of the system. ■
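Theorem 6 is easy to verify numerically for a discrete system with arbitrary masses and velocities (a made-up example of our own):

```python
import random

# A numerical check (our own made-up system) of Theorem 6: the total kinetic
# energy splits into the centre-of-mass part plus the internal part.
random.seed(2)
N = 8
masses = [random.uniform(0.5, 2.0) for _ in range(N)]
velocities = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(N)]

M = sum(masses)
v_cm = [sum(m * v[k] for m, v in zip(masses, velocities)) / M for k in range(3)]

T_total = 0.5 * sum(m * sum(vk * vk for vk in v)
                    for m, v in zip(masses, velocities))
T_cm = 0.5 * M * sum(vk * vk for vk in v_cm)
T_internal = 0.5 * sum(m * sum((v[k] - v_cm[k]) ** 2 for k in range(3))
                       for m, v in zip(masses, velocities))
```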

If no further constraints are imposed on the motion of the system, then we cannot reason
further about the internal kinetic energy other than to compute it directly. An example of such
a system would be a gas or several unrelated particles. There is one class of system for which
the kinetic energy is relatively simple to calculate: solid objects that are not allowed to
deform by bending, stretching or compressing. Such an object is called a rigid body. We discuss
these types of object next.

5.2 Rigid Body Mechanics


Let us start with a useful definition.

Definition 21 (Rigid Body) A system of particles is called a rigid body if the distance between every
pair of particles in the system is fixed.

The constraint that the distance between every two particles is fixed is a Holonomic constraint.
Therefore, we expect the number of degrees of freedom of a rigid body comprising N particles to be
much fewer than the 3N degrees of freedom of N freely moving particles. Indeed, we consider some
examples below and then derive some general results for the degrees of freedom of a rigid body.

Example 5.3 (Degrees of Freedom of Rigid Bodies in 3-Dimensions) In three dimensions each
particle in the system has three degrees of freedom. Let d = 3 count the number of degrees of freedom
of a single particle in 3 dimensions. Now consider the following cases.

Two Particles: For two particles there are N = 2d = 6 degrees of freedom. If the body is rigid, then
there exists a single equation of constraint among the six degrees of freedom. This reduces the
number of degrees of freedom from six to N = 2d − 1 = 5.

Three Particles: For three particles there are N = 3d = 9 degrees of freedom. If the body is rigid,
then there exists one equation of constraint for each pair of particles, giving a total of three
constraints. This reduces the number of degrees of freedom from nine to N = 3d − 3 = 6.

Four Particles: For four particles there are N = 4d = 12 degrees of freedom. If the body is rigid, then
there exists one equation of constraint for each pair of particles, giving a total of six constraints.
This reduces the number of degrees of freedom from twelve to N = 4d − 6 = 6.

Remark 30 For completeness, we should show that each equation of constraint is functionally
independent of the others. Alternatively, we could show that each constraint reduces the total degrees
of freedom by one by illustrating which motion is restricted by each constraint equation. We shall
proceed with the assumption that the constraints are functionally independent.

We now reason that any rigid body with more than four particles has six degrees of freedom.
To do this, first notice that the number of constraints increases quadratically as the number
of particles in the system is increased linearly. This means that the constraints cannot all be
functionally independent. Indeed, we claim that knowing the position of four particles on any
rigid body is sufficient to determine the positions of all other particles in the body. We prove this
next.

Proposition 1 (Trilateration of Points in Space) The positions of four known points in a rigid
body and their distances from any other point p in the body is enough to fix the position of p relative
to the four known points.
Proof. Consider a rigid body comprising five particles and suppose that the four known particle
positions are r⃗₁ , r⃗₂ , r⃗₃ and r⃗₄ . Now consider some arbitrary particle in the system with position
r⃗ᵢ . The distances d₁ᵢ , d₂ᵢ , d₃ᵢ and d₄ᵢ between each of the four known particles and this particle
are known constraints. Given only r⃗₁ , we know that the relative position of r⃗ᵢ with respect to
r⃗₁ must be somewhere on the sphere of radius d₁ᵢ centred at r⃗₁ . Since the location of r⃗₂ is also
known, we also know that r⃗ᵢ must lie somewhere on the sphere of radius d₂ᵢ centred at r⃗₂ .
Therefore, for each of these conditions to hold true, r⃗ᵢ must lie on the intersection of these two
spheres, which is a circle. Adding a third point r⃗₃ implies that r⃗ᵢ must now lie on the intersection
of three spheres. The intersection of three spheres corresponds to two points in 3-dimensional space,
a pair of diametrically opposed points on the previously constructed circle of intersecting
spheres. Finally, adding the information of the fourth point selects only one of these
points. Hence, the information from the four known particle positions r⃗₁ , r⃗₂ , r⃗₃ and r⃗₄ and their
corresponding distances d₁ᵢ , d₂ᵢ , d₃ᵢ and d₄ᵢ from r⃗ᵢ uniquely determines the position of r⃗ᵢ in
3-dimensional space. So, knowing the position of four points in a rigid body is enough to
know where every single particle is in the body. ■
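The trilateration argument can be illustrated numerically. With the four anchor points chosen below (our own example), subtracting the sphere equations pairwise gives decoupled linear equations that recover the unknown point:

```python
import math

# A numerical illustration (our own) of trilateration: recover an unknown point
# from its distances to four known, non-coplanar anchor points. With the unit
# anchors below, subtracting the sphere equations gives decoupled linear
# equations: 2 x_k = 1 - d_{k+1}^2 + d_1^2.
anchors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
p_true = (0.3, 0.4, 0.5)
distances = [math.dist(a, p_true) for a in anchors]

d1_sq = distances[0] ** 2
p_recovered = tuple((1.0 - distances[k + 1] ** 2 + d1_sq) / 2.0
                    for k in range(3))
```

Note that using all four distances removes the reflection ambiguity that remains after intersecting only three spheres, exactly as in the proof above.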

Remark 31 (Trilateration and Triangulation) The process by which the position of one point is
determined from its distance from a collection of other points is called trilateration, see Figure 5.2.
It is related to triangulation, which is commonly implemented in global positioning
systems used for navigation.

Figure 5.2: The relative positions of three points are completely fixed once the distances between
each pair of points are fixed. A triple of points is fixed in relative position by enforcing that each
point lies on the circumference of intersecting circles of fixed radius centred at the other points.
For each pair of points positioned in this way, there exist two choices for the placement of a third
equidistant point. The congruence of all such point placements fixes the relative positions of the
points.

Following the discussion above, it is clear that if the four known positions form a rigid body, then
the system requires only six generalised coordinates to fully specify its position and orientation
in space. Therefore, every rigid body comprising more than two particles has exactly six degrees
of freedom in 3-dimensional space.

5.2.1 Euler Angles


Following the discussion in the previous section, it is clear that a rigid body has a limited number
of degrees of freedom. This means that there is a finite number of parameters that we must specify
to uniquely determine the position and orientation of a rigid body in space. In 3-dimensional
space these degrees of freedom are specified by the following generalised coordinates

xcm , ycm , zcm , θ , φ, ψ

The generalised coordinates xcm , ycm and z cm specify the position in space of the centre of mass of
the rigid body and correspond to the translational degrees of freedom of the body. The remaining
generalised coordinates θ , φ and ψ specify the relative orientation of the rigid body in space. The
generalised coordinates θ and φ are already familiar to us from spherical coordinates and are
called the pitch and yaw of the body, respectively. The ψ coordinate specifies an additional degree
of freedom of the rigid body to rotate about its own axis and is known as roll. The pitch, yaw and
roll are commonly referred to as the Euler Angles and specify the position of a body subject to the
rotations about specific direction axes in space.
Now that we have seen that forces acting on the axis of rotation only further complicate the
orientation, it becomes necessary to properly keep track of how the set of axes fixed in the
rigid body changes relative to our frame. We shall do this using the Euler angles discussed
above, as they are a set of angles which tell us how a rigid body is oriented. The Euler angles
together with the centre of mass vector produce 6 degrees of freedom which completely specify the
position and orientation of a rigid body. As a demonstration, let us begin with a rigid body whose
set of axes coincides with our fixed set. How do we most efficiently get the body to some arbitrary
orientation where the basis vectors are {êx, êy, êz}? Consider the following four step process,

1. Since both sets of axes are aligned, the body's set of axes is given by {î, ĵ, k̂}.

2. Rotate the body through an angle φ about k̂, so that the body's axes are at {êx′, êy′, k̂}.

3. Next, rotate the body through an angle θ about êy′ to bring the body into an orientation
aligned with {êx′′, êy′, êz}.

4. Finally, rotate the body through an angle ψ about êz to bring the body into the orientation
{êx, êy, êz}.

These three angles are precisely the Euler angles, and we may express the angular velocity as a
vector in terms of the Euler angles,

ω⃗ = φ̇ k̂ + θ̇ êy′ + ψ̇ êz

In order for us to find the angular momentum or kinetic energy, we need to express the angular
velocity in terms of the principal axes {ê1, ê2, ê3}. Note that, in the directions of {ê1, ê2, ê3},

k̂ = (−sin(θ), 0, cos(θ))⊤ .

Now, if we are dealing with a body that has rotational symmetry about ê3, then it is not necessary
to rotate ê1′ and ê2′ into ê1 and ê2. Then {ê1, ê2, ê3} forms a set of principal axes and the angular
velocity of the body is

ω⃗ = −φ̇ sin(θ) ê1 + θ̇ ê2 + (ψ̇ + φ̇ cos(θ)) ê3 .

Therefore, the motion of a rigid body can be understood as the composition of two
transformations, namely a translation to relocate the centre of mass of the body to some new
position and a rotation to re-orient the rigid body about its translated centre of mass. In the next
section we shall consider the effects of rotation on the motion of a rigid body.
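The three-rotation construction can be sketched numerically. In the sketch below (the angle values are arbitrary choices), the body axes are the columns of the composed rotation matrix, and the space-fixed k̂ expressed in body components reproduces (−sin(θ), 0, cos(θ)) when ψ = 0.

```python
import numpy as np

def Rz(a):
    """Rotation by angle a about the z-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def Ry(a):
    """Rotation by angle a about the y-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def euler_zyz(phi, theta, psi):
    # Intrinsic z-y'-z'' rotations compose by right-multiplication.
    return Rz(phi) @ Ry(theta) @ Rz(psi)

phi, theta = 0.7, 0.4
R = euler_zyz(phi, theta, 0.0)
k_body = R.T @ np.array([0.0, 0.0, 1.0])  # space k-hat in body components
```

The transpose of an orthogonal rotation matrix maps space components to body components, which is why `R.T` appears in the last line.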

5.2.2 Moment of Inertia


As seen previously, any motion of a rigid body can be decomposed into a translation and a
rotation. The rotation component of this motion can be fully described by specifying an axis of
rotation and an angle of rotation about the chosen axis. As a rigid body moves through space,
the axis might change, but at any instant the rotation of the body is about a fixed axis. Here we
consider the effect of rotation on the kinetic energy of a system.
Note that if a system rotates with angular velocity ω about some axis, then the speed of some
point in the system at a distance r⊥ from the axis of rotation is ωr⊥. Following the development of
the motion of a single particle in spherical and polar coordinate systems, we note the contribution
to the kinetic energy of the particle corresponding to motion through angular displacements with
angular velocities θ̇ and φ̇ at some distance r from the centre of the coordinate system,

T = (1/2) m (ṙ² + r² θ̇² + r² sin²(θ) φ̇²)

where r sin(θ) is the perpendicular distance of the particle from the axis of rotation. For a particle
at fixed r and θ rotating about the ẑ-axis with angular velocity ω = φ̇, we find

T = (1/2) m r⊥² ω² .

Correspondingly, a mass element dm at perpendicular distance r⊥ from the axis contributes a
kinetic energy

dT = (1/2) dm r⊥² ω² .
If we fix some point on the axis, then we can write

v⃗ = r⊥ ωn̂ = r sin (θ ) ωn̂ = (â × r⃗) ω

where â is the unit vector pointing along the axis of rotation. For a discrete system, the speed of
the i -th particle about the axis of rotation is

vi = ∥â × r⃗i ∥ ω = ri ⊥ ω.

Summing all contributions to the rotational kinetic energy for a system containing N particles
gives

T = (1/2) Σᵢ₌₁ᴺ mᵢ rᵢ⊥² ω² = (1/2) (Σᵢ₌₁ᴺ mᵢ rᵢ⊥²) ω² = (1/2) I(â, r⃗0) ω²

where

I(â, r⃗0) = Σᵢ₌₁ᴺ mᵢ rᵢ⊥²

is the rotational inertia of the rigid body and accounts for all the contributions to the rotational
kinetic energy of the system about the â-axis with respect to reference point r⃗0. We extend this
idea to the case of continuous mass distributions,

I(â, r⃗0) = ∫_V dm r⊥² .

The rotational inertia is dependent on our choice of axis and our choice of reference point. The
next theorem allows us to relate the rotational inertia with respect to one axis of rotation and
reference point to another reference point with the same axial direction.
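The discrete formula can be sketched in a few lines (the masses and positions below are arbitrary test data): the perpendicular distance is obtained from the cross product, since ∥â × r⃗∥ = r⊥ for a unit axis â.

```python
import numpy as np

def rotational_inertia(masses, positions, axis_dir, ref_point):
    """I(a_hat, r0) = sum_i m_i r_{i,perp}^2 about the line through
    ref_point with direction axis_dir (normalised internally)."""
    a = np.asarray(axis_dir, dtype=float)
    a = a / np.linalg.norm(a)
    rel = np.asarray(positions, dtype=float) - np.asarray(ref_point, dtype=float)
    r_perp2 = np.sum(np.cross(a, rel)**2, axis=1)  # |a x r|^2 = r_perp^2
    return float(np.sum(np.asarray(masses) * r_perp2))

# Two unit masses on the x-axis, rotated about the z-axis through the origin.
I_z = rotational_inertia([1.0, 1.0],
                         [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
                         [0.0, 0.0, 1.0], [0.0, 0.0, 0.0])
```

Rotating the same pair about the x̂-axis instead gives zero, since both masses then lie on the rotation axis.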

Theorem 7 (Parallel Translation of Axis) Let the moment of inertia of a rigid body about some
axis â through its centre of mass be Icm. Then the moment of inertia of the same body about an
axis parallel to the first and at a distance d from it is

Id = Icm + M d²     (5.8)

where M is the total mass of the system.


Proof. Suppose the displacement vector between the centre of mass and the axis of rotation is d⃗.
The moment of inertia about the axis defined by the shift d⃗ is

Iax = ∫_V dm r⊥,ax²
    = ∫_V dm r⃗⊥,ax · r⃗⊥,ax
    = ∫_V dm (d⃗ + r⃗⊥,cm) · (d⃗ + r⃗⊥,cm)
    = ∫_V dm d⃗ · d⃗ + 2 ∫_V dm d⃗ · r⃗⊥,cm + ∫_V dm r⃗⊥,cm · r⃗⊥,cm
Figure 5.3: The rotational inertia depends on the choice of rotation axis. An object with mass
M and centre of mass cm has a corresponding inertia Icm. The inertia associated with rotating
the object about a parallel axis at a distance d from the centre of mass is then the sum of Icm
and the term d²M associated with the new rotation axis.

    = d² ∫_V dm + 2 ∫_V dm d⃗ · r⃗⊥,cm + ∫_V dm r⃗⊥,cm · r⃗⊥,cm
    = d² M + 2 d⃗ · ∫_V dm r⃗⊥,cm + Icm

Note that ∫_V dm r⃗⊥,cm = 0, since r⃗⊥,cm is measured relative to the centre of mass. So,

Iax = d² M + Icm

which corresponds to (5.8). ■

Remark 32 In the above proof, the integral only vanished because it involved the centre of mass!
This theorem therefore does not apply to two arbitrary parallel axes – but only to an arbitrary axis
and an axis through the centre of mass, see Figure 5.3.
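Theorem 7 is easy to verify numerically for a cloud of point masses (the random data below are arbitrary): shifting the rotation axis away from the centre of mass by a perpendicular displacement d adds exactly M d².

```python
import numpy as np

rng = np.random.default_rng(0)
masses = rng.uniform(0.5, 2.0, size=50)
pos = rng.normal(size=(50, 3))

M = masses.sum()
cm = (masses[:, None] * pos).sum(axis=0) / M

def inertia_about(axis, point):
    # sum_i m_i |axis x (r_i - point)|^2 for a unit axis
    a = axis / np.linalg.norm(axis)
    rel = pos - point
    return float(np.sum(masses * np.sum(np.cross(rel, a)**2, axis=1)))

axis = np.array([0.0, 0.0, 1.0])
d = 1.7
I_cm = inertia_about(axis, cm)
# Parallel axis at perpendicular distance d from the centre of mass:
I_d = inertia_about(axis, cm + np.array([d, 0.0, 0.0]))
```

The cross term 2 d⃗ · Σᵢ mᵢ r⃗ᵢ,cm vanishes by construction of the centre of mass, which is exactly why the check succeeds.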

Another Theorem relating the moment of inertia about different axes is the Lamina Theorem.
This applies exclusively to flat 2-dimensional objects being rotated in various ways. This theorem
is not discussed in this course. Next we consider a collection of illustrative examples.

Example 5.4 (Moment of Inertia of a Rod About One End) Consider a thin uniform rod of mass
M and length L. The rod is rotated about its left-most end. Assume that the rod is sufficiently thin
that the vertical dimension of the rod can be ignored. We can measure the x-position of any point
along the rod relative to its left-most end with the positive x̂-axis directed along the length of the
rod.

Consider a small piece of the rod of mass dm. Since the rod is uniform, the mass per unit
length along the extent of the rod is constant. Therefore, the rod has uniform line density,

ρ = M / L .

The infinitesimal piece of mass dm is a function of length,

dm = ρ dx = (M/L) dx .

The contribution to the moment of inertia along the entire length of the rod is now an integral,

I = ∫₀ᴸ dm r⊥² = (M/L) ∫₀ᴸ dx x² = (M/L) (L³/3) = (1/3) M L² .
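The integral above can be checked by a direct Riemann sum over small slices of the rod; the values of M and L below are arbitrary choices for the sketch.

```python
import numpy as np

M, L = 2.0, 3.0
N = 200_000
dx = L / N
x_mid = (np.arange(N) + 0.5) * dx   # midpoint of each slice, measured from the end
dm = (M / L) * dx                   # mass of each slice, with rho = M/L
I_end = np.sum(dm * x_mid**2)       # sum of dm * r_perp^2
```

The midpoint sum converges to (1/3) M L² as the slice width shrinks.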

It is useful to consider the moment of inertia about a different point along its length. For
simplicity, consider rotation about the centre of mass of the rod. This is considered in the next
example.

Example 5.5 (Moment of Inertia of a Rod About its Centre of Mass) Consider the same setup as
in Example 5.4 with the centre of rotation placed at the centre of mass. The only change to the problem
now corresponds to choosing the bounds of integration to be [−L/2, L/2]. Students are encouraged to
motivate why this is true. Then,

I = ∫₋L/2^{L/2} dm r⊥² = (M/L) · (1/3) · (L³/4) = (1/12) M L² .

Notice how the position of the centre of rotation has an effect on the moment of inertia of the
rod in Example 5.4 and Example 5.5. We can explain this difference using Theorem 7 as follows.
The distance from the tip of the rod to the centre of mass of the rod is L/2, so the new axis of rotation
will be a distance L/2 from the old one. Therefore, by Theorem 7,

Id = Icm + M d² ,

which implies that

Icm = Id − M d² = (1/3) M L² − M (L/2)² = (1/12) M L² ,

which matches the value that was calculated in Example 5.5.

Remark 33 When using Theorem 7, one of the axes must go through the centre of mass. If we
simply have two arbitrary axes at a distance d from each other then this theorem will not necessarily
hold. If we do wish to compare two axes, neither of which passes through the centre of mass, we
must compare both with the axis through the centre of mass separately.

5.2.3 Lagrange’s Equations for Rigid Bodies


When the system under consideration involves rigid bodies possibly translating and/or rotating
about known axes, we are now well-equipped to solve for the motion of the system using
Lagrange's Equations. For each rigid body, we write out the potential energy due to gravity,

Ug = M g zcm .

If there are non-conservative forces acting on the rigid body, we recall that the net force acting on a
system is just the sum of external forces acting on the system, and we can compute the generalised
components of force in terms of the net forces F⃗ᵢ net,

Qj = Σᵢ F⃗ᵢ net · (∂ r⃗ᵢ / ∂ qj) .

Finally, if the rigid body is rotating about an axis through its centre of mass and the angle of this
rotation is given by φ, then the kinetic energy can be written as

T = (1/2) M vcm² + (1/2) Icm φ̇² ,

where Icm is the moment of inertia of the body about the appropriate axis. With all of this in place,
we can compute the Euler-Lagrange equations and solve for the motion of the system.

We will now make use of our earlier computation of the moment of inertia of a uniform rod
rotated about its tip. We consider the problem of the rod-pendulum. Consider a thin uniform rod
of length L used as a pendulum.

Example 5.6 (The Uniform Rod Pendulum) Consider a pendulum comprising a uniform,
inextensible rod that is able to swing about one end anchored at a fixed pivot point. The pendulum
hangs and swings freely below this pivot point. The pendulum clearly has just one degree of
freedom, and a good choice of generalised coordinate is the angle with the vertical, θ.

Symmetry clearly places the centre of mass of the rod half way along its length. The height of
this point is (L/2) cos(θ) below the anchor point. Then, the potential energy of the rigid body is

Ug = M g hcm = −M g (L/2) cos(θ) .

If we choose as our axis of rotation the pivot at the anchored end of the rod, then we notice that the
rod rotates about this point without translation. Thus we can compute the kinetic energy,

T = (1/2) I θ̇² = (1/6) M L² θ̇² .

The Lagrangian for this system is

L = T − U = (1/6) M L² θ̇² + M g (L/2) cos(θ) .

The Euler-Lagrange equation for this system is

(1/3) M L² θ̈ = −M g (L/2) sin(θ) .

Simplifying this yields,

θ̈ + (3g/2L) sin(θ) = 0 .

Note that this is reminiscent of a simple pendulum of length (2/3)L.

Figure 5.4: A rod pendulum.

Remark 34 The equation of motion in Example 5.6 has some noteworthy properties. The first of
these is that, like that of the simple pendulum, the equation of motion of the rod pendulum is
independent of the mass of the rod. The second is that under the relabelling L′ = (2/3)L the equation of
motion for the rod pendulum is

θ̈ + (g/L′) sin(θ) = 0

which is the equation of motion of a pendulum of length L′. We call L′ the effective length of the
rod pendulum. Any rigid body pendulum has an equation of motion identical to a point pendulum
with some appropriate effective length L′.
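The effective-length statement can be tested by integrating the rod-pendulum equation of motion numerically. The sketch below (step size and initial angle are arbitrary choices) uses a standard fourth-order Runge-Kutta step; for a small release angle the motion should return to its starting state after one period of a point pendulum of length L′ = 2L/3.

```python
import numpy as np

g, L = 9.81, 1.0
omega2 = 3.0 * g / (2.0 * L)   # theta'' = -omega2 * sin(theta)

def rk4_step(state, dt):
    """One Runge-Kutta step for state = (theta, theta_dot)."""
    def f(s):
        return np.array([s[1], -omega2 * np.sin(s[0])])
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + (dt / 6.0) * (k1 + 2*k2 + 2*k3 + k4)

T_eff = 2.0 * np.pi * np.sqrt(2.0 * L / (3.0 * g))  # period for length L' = 2L/3
dt = 1e-4
state = np.array([0.01, 0.0])   # small release angle, starting at rest
for _ in range(int(round(T_eff / dt))):
    state = rk4_step(state, dt)
```

For larger amplitudes the period grows beyond this small-angle value, just as for the point pendulum.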

5.2.4 Inertia Tensor


The problem of describing orientations in 2-dimensions requires specifying a single rotation
about a given point in the plane. In this case, the axis of rotation is clear. In more
than two dimensions there are considerably more directions about which to rotate a given rigid
body. The previous discussion on rotational inertia can be generalised to describe rotations about
any given axis in 3-dimensions. This can be done by considering the Euler angle description of
rotation, where the orientation of a rigid body in 3-dimensions is described relative to some set of
rotations about three mutually orthogonal axes.

In this section, we shall find great use for the angular velocity about some specific direction in
space. In particular, this angular velocity can be decomposed as a vector describing the rates of
rotation about a given set of orthogonal axes and is denoted

ω⃗ = (ω1, ω2, ω3)⊤

where each component ωᵢ is the angular velocity about a given axis êᵢ. In terms of these
components, ω⃗ can be rewritten as

ω⃗ = (ω⃗ · ê1) ê1 + (ω⃗ · ê2) ê2 + (ω⃗ · ê3) ê3

where ω⃗ · êᵢ is the magnitude of ω⃗ along the êᵢ-axis. For the Euler angle construction above,
these components are ω⃗ = −φ̇ sin(θ) ê1 + θ̇ ê2 + (ψ̇ + φ̇ cos(θ)) ê3.
Equipped with the Euler angles to describe orientations, we generalise the angular inertia
of a rigid body to 3-dimensions. Before we do so, consider the following thoughts. Presently, with
the exception of a few special cases (like the Lamina theorem), if we change the direction of the
axis about which we wish to know the moment of inertia, then the computation itself needs to
be started from scratch. This, it transpires, is highly redundant. In fact, with a small amount of
pre-computation it is possible to derive an expression for the moment of inertia when rotating an
object about any arbitrary axis.
We begin by directly asking the question: what is the moment of inertia about an arbitrary
axis? Recall that the expression for the moment of inertia about some axis is

I = ∫_V dm r⊥²

where r⊥ is defined as the perpendicular distance to the axis in question. Ignoring the integration
for the moment, we unpack what the distance r⊥ means in the context of an arbitrary axis, whose
(unit) direction vector is given by n̂,

r⊥² = r⃗⊥ · r⃗⊥
    = (r⃗ − (r⃗ · n̂) n̂) · (r⃗ − (r⃗ · n̂) n̂)
    = r⃗ · r⃗ − 2 (r⃗ · n̂)² + (r⃗ · n̂)² n̂ · n̂
    = r⃗ · r⃗ − (r⃗ · n̂)²
    = ∥r⃗∥² (n̂ · n̂) − (r⃗ · n̂)² .

Note that we have used the fact that n̂ · n̂ = 1. We can write this in a more convenient manner
that is related to matrix multiplication. Recall that the dot product in matrix algebra corresponds
to multiplication of a covector and a vector, or simply the product of a row vector and a column
vector. So we rewrite the above computation as

r⊥² = ∥r⃗∥² n̂⊤n̂ − (r⃗⊤n̂)²
    = n̂⊤ (∥r⃗∥² 1) n̂ − (n̂⊤r⃗)(r⃗⊤n̂)
    = n̂⊤ (∥r⃗∥² 1) n̂ − n̂⊤ (r⃗ r⃗⊤) n̂
    = n̂⊤ (∥r⃗∥² 1 − r⃗ r⃗⊤) n̂
where 1 is the identity matrix. We can evaluate the quantity in the brackets explicitly,

∥r⃗∥² 1 − r⃗ r⃗⊤ = (x² + y² + z²) 1 − [ x²  xy  xz ; yx  y²  yz ; zx  zy  z² ]
              = [ y²+z²  −xy  −xz ; −yx  x²+z²  −yz ; −zx  −zy  x²+y² ] ,

where the matrices are written row by row, with rows separated by semicolons. So,

r⊥² = n̂⊤ [ y²+z²  −xy  −xz ; −yx  x²+z²  −yz ; −zx  −zy  x²+y² ] n̂ .
We can now substitute this into the equation for the moment of inertia to find

I = ∫_V dm r⊥² = n̂⊤ [ Ixx  Ixy  Ixz ; Iyx  Iyy  Iyz ; Izx  Izy  Izz ] n̂ ,     (5.9)

where,

Ixx = ∫_V dm (y² + z²) ,    Ixy = Iyx = −∫_V dm xy ,
Iyy = ∫_V dm (x² + z²) ,    Ixz = Izx = −∫_V dm xz ,
Izz = ∫_V dm (x² + y²) ,    Iyz = Izy = −∫_V dm yz .

Notice that the matrix above is independent of the axis about which the object is being rotated – it
is a property of the object itself. We call this matrix the Inertia Tensor of the object and denote it

Î = [ Ixx  Ixy  Ixz ; Iyx  Iyy  Iyz ; Izx  Izy  Izz ] .     (5.10)

The inertia tensor is symmetric and its entries are real numbers, with nonnegative entries on the
diagonal. Once we have computed the inertia tensor, it is very easy to compute the moment of
inertia when the object is rotated about any axis whatsoever.
We can think of the angular velocity of an object as a vector ω⃗ whose direction indicates the
axis about which the object is (instantaneously) rotating, and whose length gives the magnitude
of the angular velocity. Using this notation we have

ω⃗ = ω n̂

and we can write the rotational component of the kinetic energy of the rigid body as

T = (1/2) I ω² = (1/2) (n̂⊤ Î n̂) ω² = (1/2) (ω n̂)⊤ Î (ω n̂) = (1/2) ω⃗⊤ Î ω⃗ .

Thus the Kinetic Energy due to rotation of a rigid body can be written directly in terms of the
Inertia tensor and the angular velocity vector as above.
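For point masses the construction above reduces to Î = Σᵢ mᵢ (∥r⃗ᵢ∥² 1 − r⃗ᵢ r⃗ᵢ⊤), and the quadratic form n̂⊤ Î n̂ must agree with the direct sum Σᵢ mᵢ rᵢ⊥². A sketch with arbitrary random data:

```python
import numpy as np

def inertia_tensor(masses, pos):
    """I = sum_i m_i (|r_i|^2 * Identity - outer(r_i, r_i))."""
    I = np.zeros((3, 3))
    for m, r in zip(masses, pos):
        I += m * (np.dot(r, r) * np.eye(3) - np.outer(r, r))
    return I

rng = np.random.default_rng(1)
masses = rng.uniform(0.1, 1.0, 20)
pos = rng.normal(size=(20, 3))

I = inertia_tensor(masses, pos)
n = np.array([1.0, 2.0, 2.0]) / 3.0   # a unit axis through the origin
I_axis = n @ I @ n                    # moment of inertia about n via the tensor
I_direct = float(np.sum(masses * np.sum(np.cross(pos, n)**2, axis=1)))
```

The tensor is computed once; after that, the moment of inertia about any axis is a single quadratic form, which is exactly the redundancy saving described above.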

5.2.5 Principal Axes


As a matrix, the moment of inertia tensor is real and symmetric. This means that its eigenvalues
are real numbers, and its eigenvectors are pairwise orthogonal real vectors. It follows that the
Inertia Tensor can always be diagonalised by an orthogonal matrix (the matrix of its eigenvectors).
Geometrically, we can interpret this last statement as affirming the existence of a Cartesian
coordinate system in which the inertia tensor is diagonal. The Cartesian direction vectors in this
coordinate system are known as the principal axes of the object under study. Generally these axes
can be guessed from the symmetry of the object. In principal axes, the inertia tensor looks like

Î = [ I1  0  0 ; 0  I2  0 ; 0  0  I3 ] .

Working in these axes is beneficial and simplifying. For instance, the (rotational) kinetic energy
can now be written as

T = (1/2) I1 ω1² + (1/2) I2 ω2² + (1/2) I3 ω3² .
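Numerically, the principal axes are found by an eigendecomposition of the symmetric inertia tensor. The tensor below is a hypothetical example, not tied to a specific body; `np.linalg.eigh` is the appropriate routine for real symmetric matrices, returning real eigenvalues and orthonormal eigenvectors.

```python
import numpy as np

# A hypothetical symmetric inertia tensor.
I = np.array([[ 4.0, -1.0,  0.5],
              [-1.0,  3.0, -0.2],
              [ 0.5, -0.2,  5.0]])

vals, V = np.linalg.eigh(I)     # principal moments and principal axes (columns of V)
I_principal = V.T @ I @ V       # the tensor expressed in the principal-axis frame
```

Conjugating by the matrix of eigenvectors diagonalises the tensor, which is the matrix statement of "rotating into the principal-axis frame".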
Next we consider the classic example of a uniform spinning top.

Example 5.7 (The Uniform Spinning Top) Consider the uniform cone spinning top, where the
apex of the cone is fixed at the origin and is allowed to rotate in any way about this fixed point
(including spin, precession and nutation). In order to solve for the motion of the top, we must
first compute the inertia tensor for the top as a rigid body. We do this in a system of axes with the
ẑ -axis pointing along the axis of symmetry of the cone, and the x̂ and ŷ axes chosen arbitrarily as
orthogonal axes perpendicular to the ẑ -axis. Let the cone have a height h and base radius R . Then
we can parameterize the cone in cylindrical coordinates as

0 ≤ z ≤ h ,    0 ≤ φ ≤ 2π    and    0 ≤ r ≤ (z/h) R .

Clearly then all integrals will occur over this region. Since the cone is a uniform rigid body, the cone
has constant density ρ. The total mass of the cone is

M = ∫_V dm = ∫_V dV ρ = (1/3) π h R² ρ .

Clearly,

ρ = 3M / (π h R²) .

We can also compute the components of the inertia tensor,

Ixx = ρ ∫_V dV (y² + z²) = ρ ∫₀^{2π} dφ ∫₀^h dz ∫₀^{zR/h} dr r (r² sin²(φ) + z²) = (3M/20) (R² + 4h²) .

By symmetry, we also find

Iyy = (3M/20) (R² + 4h²) .

Next,

Izz = ρ ∫_V dV (x² + y²) = ρ ∫₀^{2π} dφ ∫₀^h dz ∫₀^{zR/h} dr r (r² sin²(φ) + r² cos²(φ))
    = (3M/(π h R²)) ∫₀^{2π} dφ ∫₀^h dz ∫₀^{zR/h} dr r³
    = (3/10) M R² .

It turns out that all the off-diagonal terms are zero in this case. This means that the axes we have
chosen are principal axes for the cone. The inertia tensor is

Î = (3M/20) [ R²+4h²  0  0 ; 0  R²+4h²  0 ; 0  0  2R² ] .

The kinetic energy of the top can be computed immediately in terms of the angular velocities ω1, ω2
and ω3,

T = (3M/40) (R² + 4h²) (ω1² + ω2²) + (3M/20) R² ω3² .
These are not candidates for generalized coordinates, however, because they are quantities expressed
inside a frame of reference living “on” the top itself. It remains to find suitable generalized coordinates
for this problem and write the kinetic energy in terms of them. In this example, we choose Euler
angles (θ, φ, ψ), so

ω1 = φ̇ sin(θ) sin(ψ) + θ̇ cos(ψ)
ω2 = φ̇ sin(θ) cos(ψ) − θ̇ sin(ψ)
ω3 = φ̇ cos(θ) + ψ̇ .

Then, the kinetic energy of the spinning top is

T = (3M/40) (R² + 4h²) (θ̇² + φ̇² sin²(θ)) + (3M/20) R² (ψ̇ + φ̇ cos(θ))² .

The potential energy is given by the height of the centre of mass of the cone. Clearly, the centre of
mass lies on the axis of symmetry of the cone. To determine the height of the centre of mass, we
compute

zcm = (1/M) ∫ dm z = (3/4) h .

So, the centre of mass is three quarters of the height of the cone above its apex. Then, in the body-axes
of the cone,

r⃗cm = (3/4) h ẑ ,

and using the Euler angles, we find

r⃗cm = (3/4) h ( sin(θ) sin(φ), −sin(θ) cos(φ), cos(θ) )⊤ .

The ẑ-component of this vector gives the height of the centre of mass, so we write the potential
energy as

U = (3/4) M g h cos(θ) .

Therefore, the Lagrangian for this system is

L = (3M/40) (R² + 4h²) (θ̇² + φ̇² sin²(θ)) + (3M/20) R² (ψ̇ + φ̇ cos(θ))² − (3/4) M g h cos(θ) .

Notice that each term contains a factor M. Following our experience computing the Euler-Lagrange
equations, we note that the motion of the top will be independent of the mass M.
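The closed-form moments for the cone can be spot-checked by Monte Carlo integration, sampling points uniformly inside the cone (apex at the origin, axis along ẑ). The dimensions, sample size, and tolerances below are arbitrary choices for the sketch.

```python
import numpy as np

R, h, M = 1.0, 2.0, 3.0
rng = np.random.default_rng(42)
N = 400_000

# Uniform sampling inside the cone: the cross-sectional area grows like z^2,
# so z = h * u^(1/3); within each disc, r = (z R / h) * sqrt(v).
u, v, phi = rng.random(N), rng.random(N), 2.0 * np.pi * rng.random(N)
z = h * u**(1.0 / 3.0)
r = (z * R / h) * np.sqrt(v)
x, y = r * np.cos(phi), r * np.sin(phi)

m = M / N   # each sample point carries an equal share of the mass
Ixx_mc = np.sum(m * (y**2 + z**2))
Izz_mc = np.sum(m * (x**2 + y**2))
```

With this many samples the estimates agree with (3M/20)(R² + 4h²) and (3/10)MR² to well under one percent.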

Remark 35 There are many other facts that can be deduced directly from the Lagrangian, and
indeed it is unwise to proceed directly from here to the Equations of Motion without first extracting
a lot of very useful information. In the next part of the course we will discover how to extract
information directly from the Lagrangian, and hence to make powerful statements about complex
differential equations. However, these discussions are deferred to another course on this topic.

5.3 Exercises

Exercise 5.1 Given the density of a hemisphere of radius R which varies radially according to
ρ(r) = c r for some constant c,

1. Show that c = 2M / (π R⁴).

2. Compute the centre of mass of the hemisphere.

Appendices

Appendix A

General Mathematics

Here we shall review some general mathematical tools for use in this course.

A.1 Differentiation
Here we shall review some details of differentiation. The analysis that follows makes use of the
following classical results from trigonometry,

sin(x + y) = sin(x) cos(y) + cos(x) sin(y)
cos(x + y) = cos(x) cos(y) − sin(x) sin(y)

and Real Analysis,

lim_{δx→0} (cos(δx) − 1)/δx = 0    and    lim_{δx→0} sin(δx)/δx = 1 .
Now consider the following derivatives,

d/dx sin(x) = lim_{δx→0} [sin(x + δx) − sin(x)] / δx
            = lim_{δx→0} { sin(x) (cos(δx) − 1)/δx + cos(x) sin(δx)/δx }
            = sin(x) lim_{δx→0} (cos(δx) − 1)/δx + cos(x) lim_{δx→0} sin(δx)/δx
            = cos(x) .

Similarly,

d/dx cos(x) = lim_{δx→0} [cos(x + δx) − cos(x)] / δx
            = lim_{δx→0} { cos(x) (cos(δx) − 1)/δx − sin(x) sin(δx)/δx }
            = cos(x) lim_{δx→0} (cos(δx) − 1)/δx − sin(x) lim_{δx→0} sin(δx)/δx
            = − sin(x) .

Clearly,

d/dx sin(x) = cos(x)    and    d/dx cos(x) = − sin(x) .

We shall make use of these results in the sections that follow.
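These limits and derivatives are easy to probe with finite difference quotients; the sample points and step size below are arbitrary choices for the sketch.

```python
import math

def dsin(x, dx=1e-6):
    """Forward difference quotient approximating d/dx sin at x."""
    return (math.sin(x + dx) - math.sin(x)) / dx

# The two Real Analysis limits underpinning the derivation:
lim_cos = (math.cos(1e-6) - 1.0) / 1e-6   # should approach 0
lim_sin = math.sin(1e-6) / 1e-6           # should approach 1
```

The forward difference error shrinks linearly with the step, so the quotient matches cos(x) to roughly the size of the step used.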

A.2 Power Series


Here we shall consider the statement of Taylor's Theorem and its proof. There are many ways to prove
Taylor's Theorem and the choice presented here uses the Mean Value Theorem. This construction
of the proof gives a direct relation between the form of the series expansion approximation to a
given function and the form of the error of this approximation.
Taylor’s Theorem gives a procedure to generate a polynomial approximation to the value of a
function in the vicinity of a point and an estimate for the corresponding error in the approximation.
The difference between the true value of the function and its polynomial approximation is called
the remainder. The Mean Value Theorem is a statement relating the ratio of the differences of two
continuous and differentiable functions to the ratio of their first derivatives.

Remark 36 The version of Taylor’s theorem below has been written in a form that will be most useful
for our purposes, however the more familiar form of this theorem is easily recovered by defining
a = x + δx and expressing each statement in terms of x and δx in terms of x and a . Similarly, the
statement δx → 0 becomes x → a .

Theorem 8 (Taylor's Theorem) Let n ≥ 1 be an integer and let the function f : R → R be (n + 1)-
times differentiable at the point x ∈ R, with

|dⁿf/dxⁿ| ≤ A Bⁿ n! .

Then there exist functions Tn(x, δx) and Rn(δx) such that

f(x + δx) = Tn(x, δx) + Rn(δx)

where

Tn(x, δx) = Σ_{k=0}^{n} [f⁽ᵏ⁾(x)/k!] (δx)ᵏ    and    Rn(δx) = [f⁽ⁿ⁺¹⁾(ξ)/(n + 1)!] (δx)ⁿ⁺¹

where x < ξ < x + δx, and

lim_{δx→0} Rn(δx) = 0 .

Proof. Define the following auxiliary functions,

F_{n,s}(t) = Σ_{k=0}^{n} [f⁽ᵏ⁾(t)/k!] (s − t)ᵏ    and    G_{n,s}(t) = (s − t)ⁿ⁺¹

and note that

F_{n,x+δx}(x) = Σ_{k=0}^{n} [f⁽ᵏ⁾(x)/k!] (δx)ᵏ    and    F_{n,x+δx}(x + δx) = f(x + δx)

and

G_{n,x+δx}(x + δx) = 0 .

It is clear that these functions are continuous and n-times differentiable whenever f is n-times
differentiable. The Generalised Mean Value Theorem gives the following result,

[F_{n,x+δx}(x + δx) − F_{n,x+δx}(x)] / [G_{n,x+δx}(x + δx) − G_{n,x+δx}(x)] = F′_{n,x+δx}(ξ) / G′_{n,x+δx}(ξ)

where x < ξ < x + δx, and

F′_{n,x+δx}(t) = d/dt Σ_{k=0}^{n} [f⁽ᵏ⁾(t)/k!] (x + δx − t)ᵏ
              = Σ_{k=0}^{n} [f⁽ᵏ⁺¹⁾(t)/k!] (x + δx − t)ᵏ − Σ_{k=0}^{n−1} [f⁽ᵏ⁺¹⁾(t)/k!] (x + δx − t)ᵏ
              = [f⁽ⁿ⁺¹⁾(t)/n!] (x + δx − t)ⁿ

and

G′_{n,x+δx}(t) = d/dt (x + δx − t)ⁿ⁺¹ = −(n + 1)(x + δx − t)ⁿ .

Rewriting the statement of the Mean Value Theorem gives

F_{n,x+δx}(x + δx) = F_{n,x+δx}(x) + [G_{n,x+δx}(x + δx) − G_{n,x+δx}(x)] F′_{n,x+δx}(ξ) / G′_{n,x+δx}(ξ) .

Then, direct substitution of the definitions of the auxiliary functions into the statement of the
Mean Value Theorem and simplifying gives

f(x + δx) = F_{n,x+δx}(x) + [f⁽ⁿ⁺¹⁾(ξ)/(n + 1)!] (δx)ⁿ⁺¹ .

Defining

Tn(x, δx) = F_{n,x+δx}(x)    and    Rn(δx) = [f⁽ⁿ⁺¹⁾(ξ)/(n + 1)!] (δx)ⁿ⁺¹

allows us to write

f(x + δx) = Tn(x, δx) + Rn(δx)

where Tn(x, δx) is the n-th order Taylor polynomial approximating f in the vicinity of x and
Rn(δx) is the remainder corresponding to the error in the polynomial approximation to f as a
function of the displacement δx away from x. Now suppose that

|dⁿf/dxⁿ| ≤ A Bⁿ n!

for fixed, positive constants A, B ∈ R. Then

|Rn(δx)| = |f⁽ⁿ⁺¹⁾(ξ)/(n + 1)!| (δx)ⁿ⁺¹ ≤ A Bⁿ⁺¹ (δx)ⁿ⁺¹

and

lim_{δx→0} |Rn(δx)| ≤ A Bⁿ⁺¹ lim_{δx→0} (δx)ⁿ⁺¹ = 0

such that Rn(δx) → 0 as δx → 0. ■
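The theorem can be illustrated with sin, whose derivatives cycle through sin, cos, −sin, −cos and are all bounded by 1 (so A = B = 1 works). The expansion point, step, and order below are arbitrary choices; the observed error must not exceed the remainder bound (δx)ⁿ⁺¹/(n + 1)!.

```python
import math

def taylor_sin(x, dx, n):
    """n-th order Taylor polynomial of sin about x, evaluated at x + dx."""
    derivs = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]
    return sum(derivs[k % 4] * dx**k / math.factorial(k) for k in range(n + 1))

x, dx, n = 0.3, 0.2, 4
approx = taylor_sin(x, dx, n)
exact = math.sin(x + dx)
# |f^(n+1)| <= 1 for sin, so the Lagrange remainder is bounded by:
bound = dx**(n + 1) / math.factorial(n + 1)
```

Raising the order n shrinks the bound factorially, which is the practical content of the convergence statement in the proof.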

Remark 37 The mild condition

|dⁿf/dxⁿ| ≤ A Bⁿ n!

on the behaviour of the derivatives of f in Theorem 8 is a sufficient condition to ensure the
convergence of the series expansion of f in the limit that n is taken to infinity. To see this, we need
only consider the case where

|dⁿf/dxⁿ| = A Bⁿ n!

since this defines an upper bound on all functions of the type described in Theorem 8. Then,
including all terms in the infinite series expansion of f gives

f(x + δx) = f(x) + A Σ_{n=1}^{∞} Bⁿ (δx)ⁿ = f(x) + A B δx / (1 − B δx)

where we have used the identity

Σ_{n=0}^{∞} xⁿ = 1 / (1 − x)

to rewrite the series in δx. Clearly,

lim_{δx→0} f(x + δx) = f(x) .

However, this functional form contains a singularity at δx = 1/B. This sets a limit on the range of
values for which the Taylor series coincides with the value of the function, namely

0 ≤ δx < 1/B .

This restriction on the value of δx also ensures that the remainder goes to zero. Note, however,
that the summed expression for f is not a polynomial and contains a singularity. The Taylor series
expansion is valid when δx < 1/B but the sum of the geometric series is valid for all values of δx
excluding 1/B. The behaviour of f is discontinuous about δx = 1/B, corresponding to a transition
from the region where f and its Taylor series agree to a region where they do not. This transition is
known as a Stokes Phenomenon.

Remark 38 The Stokes phenomenon is associated with critical behaviour of systems modelled
by Gevrey-1 type functions, whose theory is described in advanced courses in Analysis. It also
demonstrates that functions whose derivatives grow faster than a given bound described here
contain singularities and are not differentiable at a given point. While the intrepid student may be
interested in learning more about these functions and how they arise in this course, it should be
noted that in most cases the functions that we shall encounter are generally free of such pathologies
or appear in regions sufficiently far from the regions of interest so as to not enter into our calculations.

A.3 Vector Products


A vector is an element of a vector space. This means that a vector a⃗ is an element of a set V which
has the following composition rules:

Addition: Given any two vectors a⃗, b⃗ ∈ V, there exists a vector c⃗ = a⃗ + b⃗, called the sum of a⃗ and
b⃗, for which c⃗ ∈ V.

Scalar Multiplication: Given a vector a⃗ ∈ V and any scalar c ∈ S, there exists a vector b⃗ = c a⃗,
such that b⃗ ∈ V. If the set of scalars S is the set of real numbers R then V is called a real
vector space, and if S is the set of complex numbers C then V is called a complex vector
space.

In this way it is clear that vector spaces are closed under the operations of addition of vectors and
multiplication of vectors by scalars.
This description of a vector is an abstraction of the commonly referred to statement that ‘a
vector is an object that has both a magnitude and a direction’. Indeed, this general description is
sufficient for our purposes provided that we can understand the following two ideas. Consider
the vector a⃗ ∈ V , then

Magnitude: the magnitude ∥a⃗∥ ∈ R is a real number that defines the ‘length’ of a⃗ in V .

Direction: the ‘direction’ a⃗/ ∥a⃗∥ ∈ V is a vector with magnitude 1 and has the same orientation
as a⃗ in V. We can compare the direction of any two vectors in V independently of their
magnitudes.

In general, a vector space V is defined by the span of its basis vectors B. This means that there
exists a nonunique set of ‘direction vectors’ B = {e⃗1, e⃗2, . . . , e⃗n} such that every vector v⃗ ∈ V can
be expressed as a linear combination of vectors in B,

v⃗ = Σ_{k=1}^{n} vᵏ e⃗k

where the ordered sequence of coefficients (v¹, v², . . . , vⁿ) are called the components of v⃗ and the
natural number n is called the dimension of V.

A basis in which every element has unit length is called normal. A basis in which every element
is orthogonal to every other element is called orthogonal. A basis which is both normal and
orthogonal is called orthonormal.

Remark 39 It is computationally convenient to choose orthonormal bases and we shall regularly


do so.

There are several useful conventions to express vectors in terms of these components that
we consider when the need arises. When explicit reference to the components of a vector is
unnecessary, there are also several useful notations to express vectors in terms of magnitude and
direction. It is common to write v = ∥v⃗∥ to denote the magnitude of v⃗ and v̂ to denote the vector,
of unit magnitude, in the direction of v⃗. Then,

v⃗ = v v̂

and we call v̂ a unit vector. For the remainder of this discussion, consider vectors a⃗, b⃗ ∈ Rⁿ expressed in terms of the normalised basis.
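These ideas translate directly into a short computation. The following sketch is not part of the notes; Python is used purely for illustration, with a vector represented as a plain list of components in an orthonormal basis:

```python
# Illustrative sketch (not part of the notes): magnitude and unit vector
# of a vector given its components in an orthonormal basis.
import math

def magnitude(v):
    """Return ||v||, the Euclidean length of v."""
    return math.sqrt(sum(c * c for c in v))

def unit(v):
    """Return v_hat = v / ||v||, the unit vector in the direction of v."""
    m = magnitude(v)
    return [c / m for c in v]

v = [3.0, 4.0, 0.0]           # an arbitrary test vector
v_hat = unit(v)
assert magnitude(v) == 5.0
assert math.isclose(magnitude(v_hat), 1.0)   # a direction has magnitude 1
# Reassembling v = ||v|| v_hat recovers the original components.
assert all(math.isclose(magnitude(v) * c, o) for c, o in zip(v_hat, v))
```

The last assertion checks the decomposition v⃗ = v v̂ componentwise.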

Remark 40 We use the notation ∥·∥ to represent the function that maps a given vector to its
magnitude. Explicitly,
∥·∥ : V → R+ ∪ {0}

where R⁺ ∪ {0} is the set of nonnegative real values. The specifics of this function depend on the choice of basis; however, its value is independent of the choice of basis. In contrast, there exists a different function, written |·|, that maps a real number to a nonnegative real number and is defined by

    |·| : R → R⁺ ∪ {0}

    |x| = −x if x < 0,    and    |x| = x if 0 ≤ x.

It is clear that the norm ∥·∥ and absolute value |·| are different mathematical objects that act on
different spaces and serve distinct purposes.

Next we consider the matter of vector composition via multiplication. The core property of multiplication, as a mathematical operation, is that it is distributive over addition. The following constructions of vector products are predicated on the assumption that multiplication, however it is defined, carries this property. Below we consider two useful definitions of vector multiplication that have this property.
First consider the case of a product of two vectors that gives a real number. This product is
often called the dot or scalar product of vectors and is defined independently of the number of
dimensions.

Definition 22 (The Dot Product) Given vectors a⃗ and b⃗, their dot product is given by

    a⃗ · b⃗ = ∥a⃗∥ ∥b⃗∥ cos(θ)    (A.1)

where θ is the angle between the vectors a⃗ and b⃗.

Consider two vectors p⃗ and q⃗ that span a 2-dimensional plane P ⊂ Rn . In this plane, we can
consider any pair of unit basis vectors B = {ê1 , ê2 } such that, relative to this basis,

    p⃗ = p^1 ê_1 + p^2 ê_2 = p cos(α) ê_1 + p sin(α) ê_2    and    q⃗ = q^1 ê_1 + q^2 ê_2 = q cos(β) ê_1 + q sin(β) ê_2

where α and β are respectively the angles denoting the orientation of each vector with respect to
the basis B in P . Then,

    p⃗ · q⃗ = p^1 q^1 + p^2 q^2
          = ∥p⃗∥ cos(α) ∥q⃗∥ cos(β) + ∥p⃗∥ sin(α) ∥q⃗∥ sin(β)
          = ∥p⃗∥ ∥q⃗∥ (cos(α) cos(β) + sin(α) sin(β))
          = ∥p⃗∥ ∥q⃗∥ cos(α − β)

where α − β measures the angle between p⃗ and q⃗. Additionally, since cosine is an even function, cos(θ) = cos(−θ), we recover

    p⃗ · q⃗ = ∥p⃗∥ ∥q⃗∥ cos(α − β) = ∥q⃗∥ ∥p⃗∥ cos(β − α) = q⃗ · p⃗

such that the value of the dot product is independent of the order of the multiplication.
A key idea in the construction of vector products is that multiplication is distributive over
addition. This is clear from the expansion

(a + b )(c + d ) = a c + a d + b c + b d

where a , b , c , d ∈ R. We now consider the distributive property of the dot product over addition.
To this end, suppose c⃗ = a⃗ + b⃗ where a⃗ = a â and b⃗ = b b̂, and consider the product

    c⃗ · c⃗ = (a⃗ + b⃗) · (a⃗ + b⃗)
          = a⃗ · a⃗ + a⃗ · b⃗ + b⃗ · a⃗ + b⃗ · b⃗
          = ∥a⃗∥² cos(0) + ∥b⃗∥² cos(0) + 2 ∥a⃗∥ ∥b⃗∥ cos(θ)
          = a² + b² + 2ab cos(θ).

Now suppose that â and b̂ are oriented such that the angle θ between them is π/2, then cos(θ) = 0 and

    ∥c⃗∥² cos(0) = a² + b²

which is reminiscent of the Pythagorean Theorem. Then, it follows that the dot product is a map
from the space of pairs of vectors to the space of real numbers

·:V ×V →R

and the norm of a vector is a special case of the dot product such that for all v⃗ ∈ V ,

v⃗ · v⃗ = ∥v⃗∥2 .

More generally, given an n-dimensional orthonormal basis, the dot product of vectors a⃗, b⃗ ∈ Rⁿ follows directly from the definition of the dot product and the distributivity of multiplication over addition,

    a⃗ · b⃗ = (∑_{j=1}^{n} a^j ê_j) · (∑_{k=1}^{n} b^k ê_k) = ∑_{j=1}^{n} ∑_{k=1}^{n} a^j b^k ê_j · ê_k = ∑_{j=1}^{n} a^j b^j

where

    ê_j · ê_k = cos(θ_{jk}) = 1 if j = k,    and    0 if j ≠ k,

since the angle θ_{jk} between the basis vectors ê_j and ê_k of an orthonormal basis is zero when j = k and π/2 otherwise. Then

    a⃗ · b⃗ = ∑_{j=1}^{n} a^j b^j

which recovers the familiar form of the dot product in component form, as listed in Equation (2.1). Clearly, the dot product of a vector with itself recovers the familiar sum of squares of its components and generalises the Pythagorean Theorem to n dimensions.
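The component form and the angle form of the dot product can be checked against one another numerically. The sketch below is not part of the notes (Python is used purely for illustration) and mirrors the two-dimensional p⃗, q⃗ calculation above:

```python
# Illustrative check (not part of the notes): the component form of the dot
# product agrees with Definition 22, ||p|| ||q|| cos(theta), and is symmetric.
import math

def dot(a, b):
    """Componentwise dot product: sum_j a^j b^j."""
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))   # ||v||^2 = v . v

# Two unit vectors in the plane, at angles alpha and beta to the e_1 axis.
alpha, beta = 1.1, 0.4
p = [math.cos(alpha), math.sin(alpha)]
q = [math.cos(beta), math.sin(beta)]

assert math.isclose(dot(p, q), math.cos(alpha - beta))  # ||p|| ||q|| cos(alpha - beta)
assert math.isclose(dot(p, q), dot(q, p))               # order does not matter
assert math.isclose(dot(p, p), norm(p) ** 2)            # v . v = ||v||^2
```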
Similarly, we can construct a product of vectors that gives a vector. This product is often called
the cross product of vectors and is defined only in three dimensions.

Definition 23 (The Cross Product) Given vectors a⃗ and b⃗, their cross product is given by

    a⃗ × b⃗ = ∥a⃗∥ ∥b⃗∥ sin(θ) n̂    (A.2)

where θ is the angle between the vectors a⃗ and b⃗ and n̂ is the unit vector that is normal to the plane containing a⃗ and b⃗.

Then using the definition of the cross product,

    c⃗ × c⃗ = (a⃗ + b⃗) × (a⃗ + b⃗)
    c² sin(0) n̂ = a⃗ × a⃗ + a⃗ × b⃗ + b⃗ × a⃗ + b⃗ × b⃗
    0⃗ = ab (â × b̂ + b̂ × â)

which suggests that

    â × b̂ = −b̂ × â    (A.3)

where n̂ is a unit vector that is orthogonal to both â and b̂ and whose direction is determined up to a sign. This shows the anti-commutativity of the cross product. The sign is fixed by an orientation convention chosen at the outset. Note that n̂ is uniquely defined in three dimensions only, since it is only in three dimensions that the direction orthogonal to a plane is unique.

This is obvious since in higher dimensions there are more orthogonal directions resulting in an
ambiguity that cannot be resolved by any convention. It follows that the cross product is a map
from pairs of vectors to vectors, where

×:V ×V →V

and the Cartesian product V × V = V² is the space of all pairs of vectors in V.


Similarly, we construct a component form of the cross product given a 3-dimensional, right-handed, orthonormal basis. Consider vectors a⃗, b⃗ ∈ R³; then using the definition of the cross product and the distributivity of multiplication over addition,

    a⃗ × b⃗ = (∑_{j=1}^{3} a^j ê_j) × (∑_{k=1}^{3} b^k ê_k) = ∑_{j=1}^{3} ∑_{k=1}^{3} a^j b^k ê_j × ê_k

where

ê1 × ê2 = ê3 ê2 × ê3 = ê1 ê3 × ê1 = ê2 .

Then, using the anti-commutativity of the cross product from (A.3), we find

    a⃗ × b⃗ = (a^2 b^3 − a^3 b^2) ê_1 − (a^1 b^3 − a^3 b^1) ê_2 + (a^1 b^2 − a^2 b^1) ê_3    (A.4)

which recovers the familiar form of the cross product in component form. Clearly, the cross
product of the vector with itself is zero.
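The component form (A.4), the anti-commutativity (A.3), and the orthogonality of a⃗ × b⃗ to both factors can all be spot-checked numerically. A minimal sketch (not part of the notes; the vectors are arbitrary test values):

```python
# Illustrative check (not part of the notes) of the component form of the
# cross product in a right-handed orthonormal basis of R^3.
def cross(a, b):
    """Component form (A.4): the three signed 2x2 sub-determinants."""
    return [a[1] * b[2] - a[2] * b[1],
            -(a[0] * b[2] - a[2] * b[0]),
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = cross(a, b)
assert c == [-3.0, 6.0, -3.0]
assert cross(b, a) == [-x for x in c]       # a x b = -(b x a)
assert dot(c, a) == 0 and dot(c, b) == 0    # a x b is orthogonal to both
assert cross(a, a) == [0.0, 0.0, 0.0]       # the cross product with itself is zero
```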

Remark 41 There is a theorem regarding the existence of normed division algebras that determines the dimensions in which a vector cross product can exist. It can be shown that a cross product of vectors is defined in 0, 1, 3, and 7 dimensions only, corresponding to the normed division algebras of the Real Numbers, the Complex Numbers, the Quaternions and the Octonions, respectively. There exist no other dimensions for which such a product exists. Interested students may encounter these objects in advanced courses in Algebra or Group Theory, however, we shall not consider them in this course.

A consequence of Definition 22 and Definition 23 is that given a⃗ and b⃗,

    (a⃗ × b⃗) · a⃗ = (a⃗ × b⃗) · b⃗ = 0.    (A.5)

The definitions of the dot product and cross product ensure that (A.5) is uniquely and unambiguously defined. Clearly, a⃗ and b⃗ are colinear when θ = 0 and anti-parallel when θ = π. Similarly, these vectors are orthogonal when θ = π/2. Moreover, only colinear components of a⃗ and b⃗ contribute to the value of their dot product, while only orthogonal components contribute to their cross product.

A.4 Determinants

We have already seen the appearance of the determinant of a matrix in the context of orthogonal transformations and in the signed area and volume elements. Since the determinant shall appear in several guises throughout these notes, it is instructive to consider what the determinant of a matrix is on a more fundamental level. To this end, the following examples demonstrate the role of the determinant in inverting a linear system.

Definition 24 (Invertible Matrix) Suppose A is a square matrix and there exists a matrix B such
that
AB = B A = 1

then A is invertible and B is the inverse of A. If no such matrix B exists, then A is singular.

Example A.1 (Invertible 2 × 2 Linear System) Consider the linear system

    a_{11} x^1 + a_{12} x^2 = b^1
    a_{21} x^1 + a_{22} x^2 = b^2

which we write in matrix notation as

    A x⃗ = b⃗

where

    A = [ a_{11}  a_{12} ]      x⃗ = [ x^1 ]      and      b⃗ = [ b^1 ]
        [ a_{21}  a_{22} ]           [ x^2 ]                    [ b^2 ]

Eliminating x^1 from these equations gives

    x^2 = (a_{11} b^2 − a_{21} b^1) / (a_{11} a_{22} − a_{21} a_{12}).

Similarly, eliminating x^2 gives

    x^1 = (a_{22} b^1 − a_{12} b^2) / (a_{11} a_{22} − a_{21} a_{12}).

Define

    det(A) = a_{11} a_{22} − a_{21} a_{12}.

In each case, the system has non-trivial solutions only when

    det(A) ≠ 0.

Then, the inverse of this system reduces to

    x^1 = (a_{22} b^1 − a_{12} b^2) / det(A)      and      x^2 = (a_{11} b^2 − a_{21} b^1) / det(A).

Written in matrix form, this becomes

    x⃗ = A^{−1} b⃗

where

    A^{−1} = (1 / det(A)) [  a_{22}  −a_{12} ]
                          [ −a_{21}   a_{11} ]

is the inverse of A and det(A) is a sum of signed elementary products of elements of A.
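As a quick numerical illustration (not part of the notes), the 2 × 2 inversion formula above can be implemented directly; the coefficients and right-hand side below are arbitrary test values:

```python
# Illustrative sketch (not part of the notes): solve a 2x2 system using the
# explicit formulas of Example A.1.
def solve2(a11, a12, a21, a22, b1, b2):
    """Solve A x = b via det(A) = a11 a22 - a21 a12."""
    det = a11 * a22 - a21 * a12
    if det == 0:
        raise ValueError("A is singular: no unique solution")
    x1 = (a22 * b1 - a12 * b2) / det
    x2 = (a11 * b2 - a21 * b1) / det
    return x1, x2

# Test system: 1*x1 + 2*x2 = 5 and 3*x1 + 4*x2 = 11, with solution (1, 2).
assert solve2(1, 2, 3, 4, 5, 11) == (1.0, 2.0)
```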

The next example gives a more general presentation for the inverse of a matrix.

Example A.2 (Invertible 3 × 3 Linear System) Consider the linear system

    a_{11} x^1 + a_{12} x^2 + a_{13} x^3 = b^1
    a_{21} x^1 + a_{22} x^2 + a_{23} x^3 = b^2
    a_{31} x^1 + a_{32} x^2 + a_{33} x^3 = b^3.

We can rewrite this system as

    A x⃗ = b⃗

where

    A = [ a_{11}  a_{12}  a_{13} ]      x⃗ = [ x^1 ]      and      b⃗ = [ b^1 ]
        [ a_{21}  a_{22}  a_{23} ]           [ x^2 ]                    [ b^2 ]
        [ a_{31}  a_{32}  a_{33} ]           [ x^3 ]                    [ b^3 ]

Again, the system can be inverted to give

    x⃗ = A^{−1} b⃗

where

    A^{−1} = (1 / det(A)) [ a_{22}a_{33} − a_{23}a_{32}   a_{13}a_{32} − a_{12}a_{33}   a_{12}a_{23} − a_{13}a_{22} ]
                          [ a_{23}a_{31} − a_{21}a_{33}   a_{11}a_{33} − a_{13}a_{31}   a_{13}a_{21} − a_{11}a_{23} ]
                          [ a_{21}a_{32} − a_{22}a_{31}   a_{12}a_{31} − a_{11}a_{32}   a_{11}a_{22} − a_{12}a_{21} ]

is the inverse of A, which is well defined provided that

    det(A) = a_{11}a_{22}a_{33} − a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} − a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31} − a_{13}a_{22}a_{31}

is non-zero. Again, det(A) is a sum of signed elementary products of elements of A.
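The 3 × 3 inverse above can likewise be checked numerically by verifying that A A^{−1} reproduces the identity matrix. A sketch (not part of the notes; the test matrix is an arbitrary invertible example):

```python
# Illustrative check (not part of the notes) of the 3x3 inverse of Example A.2:
# A^{-1} is the adjugate divided by det(A), and A A^{-1} = 1.
def det3(A):
    """The explicit six-term expansion of det(A) for a 3x3 matrix."""
    return (A[0][0]*A[1][1]*A[2][2] - A[0][1]*A[1][0]*A[2][2]
            + A[0][2]*A[1][0]*A[2][1] - A[0][0]*A[1][2]*A[2][1]
            + A[0][1]*A[1][2]*A[2][0] - A[0][2]*A[1][1]*A[2][0])

def inv3(A):
    """Adjugate entries copied from Example A.2, divided by det(A)."""
    d = det3(A)
    adj = [
        [A[1][1]*A[2][2]-A[1][2]*A[2][1], A[0][2]*A[2][1]-A[0][1]*A[2][2], A[0][1]*A[1][2]-A[0][2]*A[1][1]],
        [A[1][2]*A[2][0]-A[1][0]*A[2][2], A[0][0]*A[2][2]-A[0][2]*A[2][0], A[0][2]*A[1][0]-A[0][0]*A[1][2]],
        [A[1][0]*A[2][1]-A[1][1]*A[2][0], A[0][1]*A[2][0]-A[0][0]*A[2][1], A[0][0]*A[1][1]-A[0][1]*A[1][0]],
    ]
    return [[entry / d for entry in row] for row in adj]

A = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 1.0]]   # det(A) = 7
Ainv = inv3(A)
I = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
     for i in range(3)]
for i in range(3):
    for j in range(3):
        assert abs(I[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```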

The presence of a sum of signed elementary products of elements in the previous examples is a common feature of matrix inverses. The following definition is useful in the description of matrix inverses and in the study of the solubility of linear systems.

Definition 25 (Matrix Determinant) Suppose A is an n × n square matrix. The determinant of A, denoted det(A), is the sum of all signed elementary products of A,

    det(A) = ∑_{σ ∈ S_n} sgn(σ) A_{1σ(1)} A_{2σ(2)} . . . A_{nσ(n)}    (A.6)

where σ is an element of S_n, the group of permutations of n elements, and sgn(σ) is a function that returns +1 if σ is an even permutation and −1 otherwise.

Remark 42 A permutation σ of n ordered elements (1, 2, 3, . . . , n) is called even if it can be composed of an even number of transpositions, otherwise it is odd. For example, σ(j) = j takes each element in the sequence to itself, so it has no transpositions and is even. However, the permutation σ with σ(1) = 2, σ(2) = 1 and σ(k) = k otherwise has one transposition, which swaps elements 1 and 2, so it is odd.
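Definition 25 and Remark 42 can be turned into a short computation: generate all permutations, count transpositions to find sgn(σ), and sum the signed elementary products. A sketch (not part of the notes; Python is used purely for illustration):

```python
# Illustrative sketch (not part of the notes) of Definition 25: det(A) as a
# signed sum over all permutations of the column indices.
import math
from itertools import permutations

def sign(sigma):
    """sgn(sigma): +1 for an even permutation, -1 for an odd one.

    The parity is found by counting inversions, each of which
    corresponds to one transposition of adjacent elements.
    """
    inversions = sum(1 for i in range(len(sigma))
                     for j in range(i + 1, len(sigma)) if sigma[i] > sigma[j])
    return 1 if inversions % 2 == 0 else -1

def det(A):
    """Determinant via the permutation expansion (A.6)."""
    n = len(A)
    return sum(sign(sigma) * math.prod(A[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))

assert det([[1, 2], [3, 4]]) == 1 * 4 - 3 * 2          # matches Example A.1
assert det([[2, 0, 1], [1, 3, 0], [0, 1, 1]]) == 7     # an arbitrary 3x3 test
```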

In Example A.1 and Example A.2, det(A) acts as an overall scaling factor on the inverse transformation. When the linear transformation A acts on a set of basis vectors {ê_i}, this rescaling accounts for the distortion of the coordinate grid under the transformation: the lengths of the basis vectors change, and det(A) measures this distortion of the coordinate grid. For an orthogonal transformation with det(A) = 1, the transformation does not change the length of any vector and corresponds to an even number of reflections about some collection of coordinate axes, which is a rotation. For an orthogonal transformation with det(A) = −1, the transformation does not change the length of any vector and corresponds to an odd number of reflections about some collection of coordinate axes, which is a combination of rotations and reflections.

Remark 43 When det(A) = 0, there must exist at least one row (or column) of A that is a linear combination of the other rows (or columns), as can be seen from elementary row (or column) operations on A. When this is true, A maps a subspace of vectors to the zero vector, so the transformation is not a one-to-one mapping and A is not invertible. We say that A is singular, since it collapses a subspace on which it acts to a point, thereby reducing the dimension of the space: A is a mapping from a space of n dimensions to a space of lower dimension. An example of this appears in the problem of gimbal lock in mechanical inertial navigation systems, such as those found in spacecraft, where it can cause catastrophic and irrecoverable loss of navigation.

For a 3 × 3 matrix, the summation in (A.6) is economically expressed as

    det(A) = ε_{ijk} A_{1i} A_{2j} A_{3k}    (A.7)

where the values of the coefficients are easily read off the expansion, so

    ε_{123} = ε_{231} = ε_{312} = 1    and    ε_{213} = ε_{132} = ε_{321} = −1

and ε_{ijk} is zero otherwise. An immediate consequence of this is that under the exchange of indices,

    ε_{ijk} = ε_{jki} = ε_{kij},    ε_{jik} = ε_{ikj} = ε_{kji} = −ε_{ijk}    and    ε_{iik} = ε_{iki} = ε_{kii} = 0

for all choices of i, j, k = 1, 2, 3. Moreover, exchanging the row labels 1 and 2 on the components of A, followed by a relabelling of the summation indices, we find

    ε_{ijk} A_{2i} A_{1j} A_{3k} = ε_{jik} A_{2j} A_{1i} A_{3k} = −ε_{ijk} A_{1i} A_{2j} A_{3k} = ε_{213} ε_{ijk} A_{1i} A_{2j} A_{3k} = ε_{213} det(A).

More generally, we find

    ε_{pqr} det(A) = ε_{ijk} A_{pi} A_{qj} A_{rk}.    (A.8)

We can eliminate the free indices in (A.8) by multiplying by ε_{pqr} and summing over the indices. Then

    ε_{pqr} ε_{pqr} det(A) = 3! det(A)

and we recover the most general form of the determinant,

    det(A) = (1/3!) ε_{pqr} ε_{ijk} A_{pi} A_{qj} A_{rk}    (A.9)
and the details of the summation are left as an exercise. Next we verify some important properties
of determinants.
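Before turning to these properties, identity (A.9) admits a direct numerical check: build ε from the parity of its indices and carry out the sixfold sum. A sketch (not part of the notes; the test matrix is arbitrary, and indices run 0, 1, 2 rather than 1, 2, 3):

```python
# Illustrative check (not part of the notes) of (A.9):
# det(A) = (1/3!) eps_{pqr} eps_{ijk} A_{pi} A_{qj} A_{rk}.
from itertools import product

def eps(i, j, k):
    """Levi-Civita symbol in three dimensions (indices 0, 1, 2)."""
    if {i, j, k} != {0, 1, 2}:
        return 0      # any repeated index gives zero
    return 1 if (i, j, k) in {(0, 1, 2), (1, 2, 0), (2, 0, 1)} else -1

def det_eps(A):
    """Sum over all six indices, then divide by 3! = 6."""
    total = 0
    for p, q, r, i, j, k in product(range(3), repeat=6):
        total += eps(p, q, r) * eps(i, j, k) * A[p][i] * A[q][j] * A[r][k]
    return total / 6

A = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]
assert det_eps(A) == 7   # agrees with the explicit expansion of Example A.2
```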
Given the 3 × 3 matrices A and B,

    ε_{pqr} det(A) = ε_{ijk} A_{pi} A_{qj} A_{rk}    and    ε_{ijk} det(A) = ε_{pqr} A_{pi} A_{qj} A_{rk}

where ε_{ijk} A_{pi} A_{qj} A_{rk} corresponds to a cofactor matrix expansion along row p, then row q and then along row r, while ε_{pqr} A_{pi} A_{qj} A_{rk} corresponds to a cofactor matrix expansion along column i, then column j and then along column k. Then

    ε_{ijk} A_{1i} A_{2j} A_{3k} = ε_{ijk} A_{i1} A_{j2} A_{k3}.

This exchange of rows and columns motivates the following statement:

    det(A) = det(A⊤).    (A.10)

Now consider the determinant of the product of A and B,

    ε_{pqr} det(AB) = ε_{ijk} (AB)_{pi} (AB)_{qj} (AB)_{rk}
                    = ε_{ijk} A_{ps} B_{si} A_{qt} B_{tj} A_{ru} B_{uk}
                    = A_{ps} A_{qt} A_{ru} ε_{ijk} B_{si} B_{tj} B_{uk}
                    = A_{ps} A_{qt} A_{ru} ε_{stu} det(B)
                    = ε_{stu} A_{ps} A_{qt} A_{ru} det(B)
                    = ε_{pqr} det(A) det(B)

where εp q r corresponds to an overall sign. Similarly, this result is easily generalised to arbitrary
n × n matrices and recovers
det (AB ) = det (A) det (B ) . (A.11)

An immediate consequence of (A.11) is that det(AB) = det(BA), even when AB ≠ BA. These results are easily generalised to arbitrary n × n matrices, requiring only that each expression is updated to include additional indexed elements, and are left as exercises for the student.
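A numerical spot-check of (A.11) (not part of the notes) with a pair of 2 × 2 matrices that do not commute:

```python
# Illustrative check (not part of the notes): det(AB) = det(A) det(B), and
# hence det(AB) = det(BA), even though AB != BA.
def det2(M):
    return M[0][0] * M[1][1] - M[1][0] * M[0][1]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]   # arbitrary test matrices
B = [[0, 1], [1, 1]]
AB, BA = matmul2(A, B), matmul2(B, A)
assert AB != BA                                   # the product does not commute
assert det2(AB) == det2(A) * det2(B) == det2(BA)  # but the determinants agree
```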

Remark 44 An immediate consequence of (A.11) is that given an invertible matrix A,

    det(A) det(A^{−1}) = det(AA^{−1}) = det(1) = 1

such that

    det(A^{−1}) = det(A)^{−1}.


This identity is particularly useful when considering transformations between coordinate systems
and we shall use it as a general tool to study invariance of certain quantities under coordinate
transformations.

Now consider the scalar triple product of tangent vectors to the coordinate curves in a given coordinate embedding, parametrised by the coordinate set (u^1, u^2, u^3),

    e⃗_i · (e⃗_j × e⃗_k) = det( e⃗_i  e⃗_j  e⃗_k ) = det [ e⃗_i⊤ ]
                                                     [ e⃗_j⊤ ]
                                                     [ e⃗_k⊤ ]

Then using the determinant identities in (A.8), (A.10) and (A.11), it follows that

    (e⃗_i · (e⃗_j × e⃗_k))² = ε_{ijk} ε_{ijk} det( e⃗_1  e⃗_2  e⃗_3 ) det( e⃗_1  e⃗_2  e⃗_3 )    (no sum on i, j, k)
                        = (ε_{ijk})² det( e⃗_1  e⃗_2  e⃗_3 ) det( e⃗_1  e⃗_2  e⃗_3 )
                        = det [ e⃗_1⊤ ] det( e⃗_1  e⃗_2  e⃗_3 )
                              [ e⃗_2⊤ ]
                              [ e⃗_3⊤ ]
                        = det [ e⃗_1·e⃗_1  e⃗_1·e⃗_2  e⃗_1·e⃗_3 ]
                              [ e⃗_2·e⃗_1  e⃗_2·e⃗_2  e⃗_2·e⃗_3 ]
                              [ e⃗_3·e⃗_1  e⃗_3·e⃗_2  e⃗_3·e⃗_3 ]
                        = det(g).

From which we conclude

    (e⃗_i · (e⃗_j × e⃗_k))² = det(g)

and

    e⃗_i · (e⃗_j × e⃗_k) = √det(g).

We can extend this to n dimensions by noting that the Jacobian of a coordinate transformation defines the multidimensional volume element and

    |det(J)| = √det(g).    (A.12)

The relationship between the Jacobian determinant and the determinant of the metric defines the basis for all area and volume computations.
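As an illustration (not part of the notes), identity (A.12) can be verified numerically for the familiar spherical-coordinate embedding x = r sin(θ) cos(φ), y = r sin(θ) sin(φ), z = r cos(θ), chosen here purely as an example:

```python
# Illustrative check (not part of the notes): |det(J)| = sqrt(det(g)) for the
# spherical-coordinate embedding, where the columns of J are the tangent
# vectors e_r, e_theta, e_phi and g_ij = e_i . e_j is the metric.
import math

def tangent_vectors(r, t, p):
    """Analytic tangent vectors of the spherical embedding at (r, t, p)."""
    e_r = [math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t)]
    e_t = [r * math.cos(t) * math.cos(p), r * math.cos(t) * math.sin(p), -r * math.sin(t)]
    e_p = [-r * math.sin(t) * math.sin(p), r * math.sin(t) * math.cos(p), 0.0]
    return e_r, e_t, e_p

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

r, t, p = 2.0, 0.7, 1.2                           # an arbitrary sample point
basis = tangent_vectors(r, t, p)
J = [list(row) for row in zip(*basis)]            # columns are tangent vectors
g = [[dot(a, b) for b in basis] for a in basis]   # metric g_ij = e_i . e_j
assert math.isclose(abs(det3(J)), math.sqrt(det3(g)))
assert math.isclose(abs(det3(J)), r * r * math.sin(t))  # the familiar r^2 sin(theta)
```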

A.5 Exercises
Exercise A.1 Rewrite the proof of Taylor’s Theorem, presented in these notes, including all missing
steps.

Exercise A.2 Show that

    e^x = ∑_{n=0}^{∞} xⁿ / n!.

Exercise A.3 Show that

    1/(1 − x) = 1 + x + x² + x³ + . . .

whenever |x| < 1.

Exercise A.4 Use the Taylor series expansion to show that

e i x = cos (x ) + i sin (x ) .

Exercise A.5 Use the Taylor series expansion to show that

    lim_{x→0} (cos(x) − 1)/x = 0.
Exercise A.6 Use the Taylor series expansion to show that

    lim_{x→0} sin(x)/x = 1.
Exercise A.7 Show that

    e^{α d/dx} f(x) = f(x + α).

Exercise A.8 Given a right-handed coordinate system with an orthonormal basis {ê_1, ê_2, ê_3}, and the vectors

    a⃗ = a^1 ê_1 + a^2 ê_2 + a^3 ê_3    and    b⃗ = b^1 ê_1 + b^2 ê_2 + b^3 ê_3

where a^1, a^2, a^3, b^1, b^2, b^3 ∈ R, use the distributivity of multiplication over addition to show

    a⃗ · b⃗ = a^1 b^1 + a^2 b^2 + a^3 b^3

and

    a⃗ × b⃗ = (a^2 b^3 − a^3 b^2) ê_1 − (a^1 b^3 − a^3 b^1) ê_2 + (a^1 b^2 − a^2 b^1) ê_3

Exercise A.9 Show that the transformation matrix

    A = [ 0  1 ]
        [ 1  0 ]

corresponds to a reflection, first by considering the effect of the transformation on a test vector and then by using the rules of determinants to describe the operation of A and A². Include as much detail as possible.

Exercise A.10 Verify, by direct calculation, the matrix inverse transformation calculations in Example A.1 and Example A.2. Include as much detail as possible.

Exercise A.11 Verify, by direct calculation, that (A.6) gives the value of the determinant in the case of the 2-by-2 and 3-by-3 matrices. Tabulate the permutations σ and their corresponding signs. Include as much detail as possible.

Exercise A.12 Compute

1. εp q εp q .

2. εp q r εp q r .

Generalise these results to εi 1 i 2 ...i n εi 1 i 2 ...i n .

Exercise A.13 Construct an expression for the determinant of an n × n matrix A using the appropriate collection of ε symbols. How can you check your construction? Verify your results for n = 1, 2, 3, 4.

Exercise A.14 Show that


det (AB ) = det (A) det (B )

for n × n matrices A and B .

Exercise A.15 Show that

    |det(J)| = √det(g)

in n dimensions.

Bibliography

[1] D. Halliday, R. Resnick, and J. Walker. Fundamentals of Physics. Wiley, 2003.

[2] J.C. Baez. Lectures on Classical Mechanics, Course Notes. Louisiana State University, 2005.

[3] A.I. Borisenko and I.E. Tarapov. Vector and Tensor Analysis with Applications. Dover, 1968.

[4] Cain and Herod. Multivariable Calculus. Dover, 1997.

[5] P.S. Goldstein. Classical Mechanics. Addison Wesley, 1997.

[6] D. Tong. Classical Dynamics, Lecture Notes. Cambridge University, Cambridge, 2005.

[7] J.B. Marion. Classical Dynamics of Particles and Systems. Academic Press, 2013.

[8] D.A. Wells. Schaum’s Outline of Theory and Problems of Lagrangian Mechanics. McGraw Hill,
1967.

[9] M.R. Spiegel. Schaum’s Outline of Theory and Problems of Theoretical Mechanics: With an
Introduction to Lagrange’s Equations and Hamiltonian Theory. McGraw Hill, 1967.
