Lecture Notes Part2
Contents
1 Equivalence principle
1.1 Statement of equivalence principle
1.2 Bending of light
1.3 Gravitational redshift
1.4 Spacetime pseudo-Riemannian manifold
2 Manifolds and tensors
2.1 Smooth manifolds
2.2 Scalars
2.3 Contravariant and covariant vectors
2.4 Tensors of general type
3 Metric tensor and covariant derivative
3.1 Metric tensor
3.2 Covariant derivative
3.3 Parallel-transport
3.4 Geodesic equation
4 Curvature
4.1 Riemann curvature tensor
4.2 Geometrical interpretation of the Riemann curvature tensor
4.3 Ricci and Einstein curvature tensor
Lecture notes PH4508
1 EQUIVALENCE PRINCIPLE
Newton's universal gravitation is not compatible with special relativity. Being an action at a distance, the
Newtonian gravitational force acting on an object responds instantaneously to a change in, say, the
position of its source (another mass) that may be very far (e.g., many light years) away. According to
special relativity, however, nothing can travel faster than the speed of light. So, Newton's universal
gravitation needs modification if special relativity is correct. Einstein formulated the relativistic theory of
gravity—the general theory of relativity or general relativity about ten years after he formulated special
relativity.
General relativity is now the accepted classical theory of gravity. Gravity is so special a force that it admits
a geometrical formulation due to the equivalence principle. It will be clear that with this principle, we are
guided to an almost unique formulation of general relativity in terms of spacetime geometry. There are a
few different versions and forms of the equivalence principle that may not be entirely equivalent. Here,
we discuss the two most commonly used versions.
1.1 STATEMENT OF EQUIVALENCE PRINCIPLE
The idea of the equivalence principle is an old one. It first appeared in the days of Galileo, when he
discovered that the acceleration of an object under the gravitation of Earth is independent of any
properties of the object. Galileo's discovery can be understood as follows in Newtonian mechanics. Recall
that, in Newtonian mechanics, the motion of an object is governed by Newton's second law
𝐹⃗ = 𝑚𝑖 𝑎⃗. (1)
This equation also defines the mass 𝑚𝑖 of the object. This mass measures the resistance of the object to
changes in its motion and is known as the inertial mass.
The inertial mass 𝑚𝑖 has a universal character in the sense that (1) is applicable to any forces regardless
of their nature. Newton also formulated the first of the four fundamental forces of nature, namely the
universal gravitation. In terms of the gravitational potential Φ, Newton's law of universal gravitation says
that an object will experience a gravitational force of the form
𝐹⃗ = −𝑚𝑔 ∇Φ, (2)
where −∇Φ is the gravitational field at the point where the object is located. The quantity 𝑚𝑔 is known
as the gravitational mass of the object. A stone weighs more than a piece of Styrofoam of the same size
because the stone has more gravitational mass. On the other hand, the stone is much harder to stop than
the piece of Styrofoam because the stone has more inertial mass.
The inertial mass 𝑚𝑖 and the gravitational mass 𝑚𝑔 are different physical quantities that measure
different properties of an object. The statement that they are always equal
𝑚𝑖 = 𝑚𝑔 , (3)
is known as the Weak Equivalence Principle. Equating (1) and (2) and cancelling the masses, the
equation of motion (EOM) of the object in a gravitational field described by Φ then becomes
𝑎⃗ = −𝛻𝛷. (4)
The acceleration of the object is independent of any properties of the object and depends only on the
gravitational field itself. This is exactly what Galileo tried to show us by dropping two cannonballs from
the Leaning Tower of Pisa.
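The cancellation of the masses in going from (1) and (2) to (4) can be illustrated with a few lines of Python. This is only a sketch: the function name, the masses and the value of the potential gradient are assumptions chosen for illustration, and the last case is a purely hypothetical violation of (3).

```python
def acceleration(m_inertial, m_grav, grad_phi):
    """Acceleration along one axis from F = m_i * a and F = -m_g * grad(Phi)."""
    force = -m_grav * grad_phi
    return force / m_inertial

GRAD_PHI = 9.81  # assumed value of dPhi/dz near Earth's surface, in m/s^2

# With m_i = m_g the mass cancels, so very different objects fall alike.
a_stone = acceleration(2.0, 2.0, GRAD_PHI)
a_foam = acceleration(0.01, 0.01, GRAD_PHI)

# A hypothetical violation (m_g != m_i) would make the acceleration body-dependent.
a_violating = acceleration(2.0, 2.2, GRAD_PHI)

print(a_stone, a_foam, a_violating)
```

The stone and the Styrofoam get the same acceleration, while the hypothetical body with 𝑚𝑔 ≠ 𝑚𝑖 does not.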
Gravity is the unique fundamental force that determines the motion of all objects in the same way. If an
object is in an electric field instead, its EOM will depend not only on the electric field but also on the
charge-mass ratio of the object. If the object is neutral, there will be no electric force acting on the object.
The object will be stationary or move at constant velocity in an inertial reference frame—this is how we
define inertial reference frames when an electric field is present. However, when gravity is present, the
motion of any object will be influenced by it, and we cannot find an inertial reference frame as we do when only
an electric field is present. We encounter serious problems in defining (global) inertial reference frames in
a theory of gravity that obeys the weak equivalence principle.
The weak equivalence principle, however, allows us to define a special type of observers that are freely
falling according to (4) in the presence of gravity but no other forces. These observers are known as locally
inertial observers (frames).1 In a small enough region of spacetime, to these observers, all freely falling
objects will have no acceleration: Objects subject to no forces other than gravity either remain stationary
or move with constant velocity. So, to these observers, objects move as if there is no gravitational field
present—locally inertial reference frames indeed play the role similar to that of inertial reference frames
in Newtonian mechanics (if one forgets about gravity). As we can see, the existence of locally inertial
observers builds on the equivalence principle or equivalently Eq. (4). With global inertial reference frames
not well defined, the locally inertial reference frames are the more natural reference frames we resort to
and can be thought of as truly inertial when gravity is present.
Note that at any point in spacetime, we can define not just one locally inertial reference frame, but
actually a family of them, all of which are moving with the same acceleration but different velocities.
These locally inertial reference frames are connected to each other by Lorentz transformations. Note that
in a generic gravitational field which is not uniform, the locally inertial reference frame is defined locally—
it cannot be extended to a region of spacetime where the gravitational field has a different value. If a
locally inertial reference frame can be extended throughout the entire spacetime, we would have a global
inertial reference frame.
1 Such an observer can build a local reference frame around himself/herself in which special relativity holds. This reference
frame will be called a locally inertial reference frame. We will not distinguish between locally inertial observers and locally
inertial reference frames.
We should emphasise, however, that the locally inertial reference frames defined here are not the
inertial reference frames of Newtonian mechanics when gravity is present. A house stationary with respect
to Earth is, to a very good approximation, an inertial reference frame in Newtonian mechanics, but in the
present treatment, it is obviously not a locally inertial reference frame. A freely falling apple, on the other
hand, is not an inertial reference frame in Newtonian mechanics; it is, however, a locally inertial reference
frame in general relativity, defined in this way precisely to eliminate gravity!
Figure 1: (a) Motion of a ping-pong ball in the gravitational field g of Earth observed
by a stationary observer. (b) The ping-pong ball is now stationary in outer space,
but the observer is towed by a rocket with an acceleration g in the upward
direction. The motion of the ping-pong ball appears identical to the observers in
these two scenarios.
There is another form of the weak equivalence principle that is often used: In small enough regions of
spacetime, the motion of particles in a gravitational field 𝑔⃗ is the same as that in a uniformly accelerating
reference frame with acceleration equal to −𝑔⃗. This statement can be understood as follows. Suppose
that a physicist is doing experiments to measure Earth's gravitational field by studying the motion of test
particles. The physicist is restricted to a small region, e.g., in a lift, so that the gravitational field is almost
uniform. He/she observes for a while that test particles, e.g., ping-pong balls, are in free fall, see Fig. 1(a).
If the lift is sealed with no windows, can the physicist conclude that he/she is in a gravitational field? The
answer is no, because he/she cannot distinguish the observed motion of the ping-pong ball from that in
the scenario as shown in Fig. 1 (b). In the latter scenario, the ping-pong ball is stationary in outer space in
the absence of any forces and the physicist (and his/her lift) is towed upwards by a rocket with an
acceleration equal to 𝑔. The physicist will also find that the ping-pong ball is moving downwards with
acceleration 𝑔. The physicist cannot distinguish the two scenarios above by studying the motion of any
objects, whatever their nature. This is the exact content of the weak equivalence principle.
Einstein took the equivalence principle one step further. His idea was simply that there is no way
whatsoever for the physicist to distinguish a gravitational field (Fig. 1(a)) from uniform acceleration (Fig.
1(b)). Physics in the two scenarios, not just the motion of test particles, appears the same to the physicist.
This is called the Einstein Equivalence Principle and can be stated more precisely as follows: In small enough
regions of spacetime, it is impossible to detect the existence of a gravitational field by means of any local
experiments (which, for example, may involve electromagnetism).
The Einstein Equivalence Principle can also be stated in another form: in small enough regions of spacetime,
laws of physics reduce to those of special relativity. More explicitly, in small enough regions of spacetime,
we can always set up a locally inertial reference frame (freely falling reference frame) and all laws of
physics in special relativity are valid in this frame. Here, laws of physics in special relativity refer to any
laws of physics that are consistent with special relativity. Recall that the underlying spacetime structure
of special relativity is Minkowski spacetime. This form of the Einstein Equivalence Principle then says that in
small enough regions, spacetime looks like Minkowski spacetime!
Therefore, local gravity, according to the Einstein Equivalence Principle, has a relative existence. In a small
region near the surface of Earth, a stationary observer observes gravity; but this can be attributed to the
fact that he/she is in uniform acceleration relative to a locally inertial reference frame. The locally inertial
observer finds that the laws of physics are those of special relativity, so to him/her, there is no gravity.
1.2 BENDING OF LIGHT
Figure 2: How does gravity affect the motion of light? The black arrow represents
a light beam traveling in the horizontal direction.
Consider a light beam traveling in the horizontal direction near the surface of Earth, see Fig. 2. We assume
that the gravitational field of Earth is uniform. How will the gravitational field affect the motion of the
light beam? At this point, we do not know. What we know now are special relativity and the equivalence
principle. The idea to deal with gravity is to eliminate it by choosing a locally inertial reference frame,
study physics in this reference frame using special relativity, and then relate physics in this frame to that
in Earth's reference frame (laboratory reference frame).
So, we study the motion of the light beam in a locally inertial reference frame (a freely falling lift) first,
which is of course a horizontal straight line, see left graph in Fig. 3. In the coordinates (𝑡′, 𝑥′, 𝑦′) of the
freely falling lift,2 the motion of the beam is given by the equation
𝑥′ = 𝑐𝑡′, 𝑦′ = 0. (5)
The coordinate transformations between the freely falling lift and Earth, with the 𝑦-axis pointing downwards,
are given by
\[ \begin{cases} x = x', \\ y = y' + \tfrac{1}{2} g t^2, \\ t = t'. \end{cases} \tag{6} \]
Combining (5) and (6), the motion of the light beam in Earth's reference frame can then be found as
follows
\[ x = ct, \qquad y = \tfrac{1}{2} g t^2. \tag{7} \]
The trajectory of the beam, like that of any massive object, is a parabola in Earth's gravitational field, see
Fig. 3. So, we conclude that light is bent by a gravitational field.
Figure 3: Left: In the reference frame of the freely falling lift, light will travel in a
horizontal straight line, since no gravity is observed. Right: In the Earth reference
frame, light will be bent downwards by gravity.
Two points should be noted about our treatment above. Firstly, we ignore any relativistic corrections to
the coordinate transformations (6). For this to be a good approximation, we need to require
𝑔𝑡 ≪ 𝑐, so that the velocity of the freely falling lift at any time is small compared to the speed of light. Secondly,
the gravitational field is assumed to be uniform over the range of the motion of the light beam. For this to
hold, we demand that 𝑐𝑡 ≪ 𝑅𝐸, where 𝑅𝐸 is the radius of Earth.
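To get a feeling for the size of the effect, Eq. (7) and the two validity conditions can be evaluated numerically. The sketch below is illustrative only: the function names, the 1 km path length and the reading of "≪" as "smaller by a factor of a thousand" are assumptions, and the values of 𝑔 and 𝑅𝐸 are rounded reference values.

```python
C = 299_792_458.0   # speed of light in m/s (exact by definition)
G_EARTH = 9.81      # assumed surface gravity in m/s^2
R_EARTH = 6.371e6   # approximate mean radius of Earth in m

def light_drop(x, g=G_EARTH, c=C):
    """Vertical drop y = g x^2 / (2 c^2) after a horizontal distance x, from Eq. (7)."""
    t = x / c
    return 0.5 * g * t**2

def approximations_hold(x, g=G_EARTH, c=C, radius=R_EARTH, tol=1e-3):
    """Check g t << c and c t << R_E, reading 'much less' as 'smaller by a factor tol'."""
    t = x / c
    return g * t < tol * c and c * t < tol * radius

x_lab = 1000.0  # hypothetical 1 km horizontal path
drop = light_drop(x_lab)
print(f"drop over {x_lab:.0f} m: {drop:.3e} m")
print("approximations valid:", approximations_hold(x_lab))
```

Over one kilometre the beam drops by only a few times 10⁻¹¹ m, which is why the bending of light is not noticeable in everyday life.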
2 We suppress the 𝑧′ coordinate for simplicity.
1.3 GRAVITATIONAL REDSHIFT
Consider light signals emitted and received in a gravitational field. Suppose that we have two observers
Alice and Bob who stay at the nose and tail respectively of a stationary rocket with height h in the presence
of Earth's gravitational field. Alice sends light pulses at constant time intervals Δτ𝐴 downwards, while Bob
is asked to receive these light pulses at the tail of the rocket, and he finds that these signals arrive at
constant time intervals Δτ𝐵 . We show in Fig. 4 four events of sending and receiving light pulses by Alice
and Bob. In (1), Alice sends a light pulse at a certain time instant 𝑡1𝐴 . This signal is received by Bob at time
instant 𝑡1𝐵 as shown in (2). Alice sends a second light pulse at time 𝑡2𝐴 = 𝑡1𝐴 + Δτ𝐴 in (3), and this signal
is received by Bob at time 𝑡2𝐵 = 𝑡1𝐵 + Δτ𝐵 in (4). The question is again how gravity affects light signals.
More precisely, we ask how the time intervals Δτ𝐴 and Δτ𝐵 are related.3
3 If this is your first encounter with the gravitational redshift phenomenon, it is tempting to assume that 𝑡1𝐵 − 𝑡1𝐴 =
𝑡2𝐵 − 𝑡2𝐴 and to deduce that Δτ𝐴 = Δτ𝐵. If you did this in your mind just now, you were wrong. You probably made the
following mistakes: (1) you used the idea from Newtonian mechanics that light is not affected by gravity, and (2) you
assumed that the clocks carried by Alice and Bob, once synchronised, remain synchronised.
The idea to deal with gravity is the same as in our previous discussion of the bending of light: We first study
the behaviour of light pulses in a locally inertial reference frame, find the relations between the
coordinates used by the locally inertial reference frame and the Earth reference frame, and then deduce
the behaviour of light pulses in the Earth reference frame from that in the locally inertial reference frame.
Here, we choose the locally inertial reference frame to be the one that starts to fall freely under
gravity at time 𝑡1𝐴. In this reference frame, the rocket is not stationary but accelerates upwards with an
acceleration equal to 𝑔. The four events of emitting and receiving light signals by Alice and Bob are now
observed as shown in Fig. 5. We work out the time instants of these four events in this reference frame.
Figure 5: In the freely falling reference frame, the rocket is accelerating upwards.
The same four events are observed. Notice that there is no gravitational field
anywhere.
Examine the process from time 𝑡1𝐴 to time 𝑡1𝐵: the first light pulse travels downwards with speed 𝑐 while
the rocket accelerates upwards from rest. The tail of the rocket meets the light signal at time 𝑡1𝐵, so we
have
\[ c\,(t_{1B} - t_{1A}) + \tfrac{1}{2}\, g\,(t_{1B} - t_{1A})^2 = h, \tag{8} \]
whose positive solution is
\[ t_{1B} - t_{1A} = \frac{-c + \sqrt{c^2 + 2gh}}{g}. \tag{9} \]
The other solution to (8) is negative and has been discarded. Given 𝑡1𝐴, we can calculate 𝑡1𝐵.
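At time 𝑡2𝐴 = 𝑡1𝐴 + Δτ𝐴 when Alice sends the second signal, the velocity of the rocket is 𝑔Δτ𝐴. Examine the
process from time 𝑡2𝐴 to time 𝑡2𝐵: the second light pulse travels downwards while the rocket travels
upwards, and they meet at time 𝑡2𝐵. The sum of the distance traveled by the light pulse and that by the
rocket is ℎ:
\[ c\,(t_{2B} - t_{2A}) + g\Delta\tau_A\,(t_{2B} - t_{2A}) + \tfrac{1}{2}\, g\,(t_{2B} - t_{2A})^2 = h. \tag{10} \]
This equation can be rewritten as
\[ (c + g\Delta\tau_A)(t_{2B} - t_{2A}) + \tfrac{1}{2}\, g\,(t_{2B} - t_{2A})^2 = h, \tag{11} \]
whose solution is given by
\[ t_{2B} - t_{2A} = \frac{-(c + g\Delta\tau_A) + \sqrt{(c + g\Delta\tau_A)^2 + 2gh}}{g}. \tag{12} \]
To find Δτ𝐵 = 𝑡2𝐵 − 𝑡1𝐵, we take the difference of Eq. (12) and Eq. (9) to get the following equation
\[ \Delta\tau_B = \Delta\tau_A + (t_{2B} - t_{2A}) - (t_{1B} - t_{1A}) = \frac{\sqrt{(c + g\Delta\tau_A)^2 + 2gh} - \sqrt{c^2 + 2gh}}{g}. \tag{13} \]
For 𝑔Δτ𝐴 ≪ 𝑐 and 𝑔ℎ ≪ 𝑐², expanding to first order in these small quantities gives
\[ \Delta\tau_B \approx \Delta\tau_A \left(1 - \frac{gh}{c^2}\right). \tag{14} \]
These conditions make sure that, in the entire process from (1) to (4), the velocity of the rocket is non-relativistic
in the locally inertial reference frame.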
Now switch to Earth's reference frame. Gravity now comes into the scene. Since the freely falling locally
inertial reference frame has a non-relativistic velocity, the time dilation effect can be ignored. So, in Eq.
(14), Δτ𝐴 and Δτ𝐵 can be interpreted as the time intervals as measured in the Earth frame or the rocket
frame: Δτ𝐴 is the time interval measured by Alice of two successive emitting events of light pulses and
Δτ𝐵 is the time interval as measured by Bob of two successive receiving events. Eq. (14) then says that
Bob finds that he receives light signals at time intervals shorter than the time intervals at which the light
signals were emitted: light signals traveling towards lower gravitational potential are blueshifted! Eq. (14)
can be converted to a relation between the frequencies of the light beam sent by Alice and received by
Bob as follows
\[ \gamma_B \approx \gamma_A \left(1 + \frac{gh}{c^2}\right). \tag{15} \]
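The frequency relation (15) can be checked numerically by solving the two quadratic equations (8) and (11) directly in the freely falling frame. The following is a minimal sketch, not part of the derivation: the function names are made up for illustration, and the values of 𝑐, 𝑔, ℎ and Δτ𝐴 are deliberately exaggerated, unit-free numbers chosen so that the effect is visible in double precision.

```python
import math

def travel_time(closing_speed, g, h):
    """Positive root t of closing_speed*t + g*t**2/2 = h, as in Eqs. (8) and (11)."""
    # Written as 2h / (u + sqrt(u^2 + 2gh)) to avoid subtracting nearly equal numbers.
    return 2.0 * h / (closing_speed + math.sqrt(closing_speed**2 + 2.0 * g * h))

def received_interval(dtau_a, g, h, c):
    """Interval between Bob's two receptions, computed in the freely falling frame."""
    t1 = travel_time(c, g, h)               # first pulse: rocket starts from rest
    t2 = travel_time(c + g * dtau_a, g, h)  # second pulse: rocket already moves at g*dtau_a
    return dtau_a + t2 - t1

# Exaggerated, purely illustrative values in units with c = 1.
c, g, h, dtau_a = 1.0, 1e-3, 1.0, 1.0
dtau_b = received_interval(dtau_a, g, h, c)

exact_ratio = dtau_a / dtau_b       # frequency ratio found by Bob
approx_ratio = 1.0 + g * h / c**2   # right-hand side of Eq. (15)
print(exact_ratio, approx_ratio)
```

With these numbers the two ratios agree up to terms of second order in the small quantities 𝑔ℎ/𝑐² and 𝑔Δτ𝐴/𝑐, and the received interval is shorter than the emitted one.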
The gravitational blueshift phenomenon has one direct consequence: Since Δτ𝐵 < Δτ𝐴, a clock carried by
Alice ticks faster than a clock carried by Bob. So, if you want to age more slowly than others, you should live
in a unit as low as possible. The best you can do is of course to build your house at the centre of Earth.4
If light pulses are emitted by Bob and received by Alice, they will be found redshifted with lower frequency.
The phenomenon of gravitational redshift/blueshift, predicted by the equivalence principle, was
confirmed experimentally by the Pound-Rebka experiment in the nineteen sixties.
1.4 SPACETIME PSEUDO-RIEMANNIAN MANIFOLD
Recall that the Einstein Equivalence Principle implies that, in a small region of spacetime, we can find a locally
inertial observer to whom physics in the region is that of special relativity. The locally inertial observer can
in no way detect the presence of gravity in the small region by doing experiments of any sort; to him/her,
spacetime in the small region looks like Minkowski spacetime. That another observer sees gravity in the
small region can be attributed to the fact that he/she is in uniform acceleration relative to the locally
inertial observer.
Can a locally inertial reference frame be extended throughout the entire spacetime? If the answer were yes,
we would have found a global inertial reference frame in which special relativity holds, and gravity would have
been completely eliminated everywhere. Physics would be much easier, and life would be extremely boring in
this case. The course would have ended here. Gravitational fields in realistic situations, however, are not uniform.
Non-uniformities in gravitational fields are referred to as tidal forces.5 The tidal forces around Earth are
illustrated in Fig. 6. In each small enough region of spacetime, we can find a locally inertial observer who
sees no gravity. But these observers, due to the tidal forces, are moving towards each other when they
are freely falling towards the centre of Earth. Two neighbouring locally inertial reference frames cannot
4 Another way to stay young is to do space travel and return. Study the so-called twin paradox before you embark
on the journey.
5 The terminology may be confusing: Tidal forces are not the forces themselves, as you can see from the definition.
They are the force differences.
be meshed into a bigger one, due to tidal forces. In other words, tidal forces are the obstruction to
extending a locally inertial reference frame to a global inertial reference frame, and they can never be
eliminated by changing reference frames. In this sense, tidal forces are the fundamental manifestation of
gravity in general relativity.
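The statement that neighbouring freely falling observers move towards each other can be made quantitative in the Newtonian picture. The sketch below compares the exact difference of the Newtonian field at two nearby points with the leading-order estimate −𝐺𝑀Δ𝑥/𝑟³. The function names and the 10 m separation are assumptions made for illustration, and the constants are rounded reference values.

```python
import math

GM_EARTH = 3.986004418e14  # Earth's gravitational parameter G*M in m^3/s^2
R_EARTH = 6.371e6          # approximate mean radius of Earth in m

def newtonian_field(x, y, gm=GM_EARTH):
    """Newtonian gravitational field (gx, gy) at (x, y), with Earth's centre at the origin."""
    r = math.hypot(x, y)
    return (-gm * x / r**3, -gm * y / r**3)

def horizontal_tidal_acceleration(separation, r=R_EARTH, gm=GM_EARTH):
    """Relative horizontal acceleration of two freely falling particles at height r,
    placed symmetrically about the vertical axis and separated by `separation`."""
    gx_left, _ = newtonian_field(-separation / 2, r, gm)
    gx_right, _ = newtonian_field(+separation / 2, r, gm)
    return gx_right - gx_left

dx = 10.0  # hypothetical 10 m horizontal separation
exact = horizontal_tidal_acceleration(dx)
leading_order = -GM_EARTH * dx / R_EARTH**3  # small-separation estimate
print(exact, leading_order)
```

The relative acceleration is negative (the two particles approach each other) and tiny, of order 10⁻⁵ m/s² for a 10 m separation, but it cannot be removed by any choice of reference frame.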
Figure 6: Tidal forces around Earth. In general relativity, tidal forces are the
fundamental manifestation of gravity, and they are the obstruction to
constructing a global inertial reference frame. On the other hand, local gravity, or
gravity in a small enough region, has a relative existence as explained in the main
text.
One may be tempted to construct an inertial reference frame by making a rigid body and letting it fall
freely near the surface of Earth, as illustrated by the box in Fig. 6. Such an attempt will fail because, if the
box is rigid and its centre B is in free fall, corner A and corner C will fall parallel to B rather than
towards the centre of Earth. It is impossible for all parts of the box to be in free-fall motion while
the box remains rigid.
Let us summarise. The equivalence principle allows us to conclude that spacetime is locally Minkowski. In
small enough regions of spacetime, gravity can be attributed to observers in uniform acceleration relative
to locally inertial observers. Over finite regions, gravity has a true manifestation as tidal forces, the
presence of which makes it impossible to construct a global inertial reference frame.
We are thus led to describe spacetime as a kind of mathematical structure that looks locally like $R^{3,1}$
(Minkowski spacetime) but over extended regions may be curved and deviate from $R^{3,1}$. The kind of object
that encompasses this notion is that of a manifold. Moreover, Minkowski spacetime $R^{3,1}$ carries a
pseudo-Riemannian structure described by the metric $\eta_{\mu\nu}$, so geometrical notions such as straight lines, distance
and angle can be defined. In small regions of spacetime, we can always erect a locally inertial reference
frame and use it to define straight lines, distance, and angle. We are finally led to describe spacetime as
a pseudo-Riemannian manifold—a manifold equipped with an indefinite metric.6 In this treatment, gravity
is no longer taken as an external force, but rather encoded in the structure of spacetime itself—the precise
meaning of this statement will become clear after the full glory of pseudo-Riemannian manifolds is
revealed.
Once again, our understanding of space, time and spacetime is revolutionised by Einstein, now in a theory
that incorporates gravity. In the standard quantum field theory treatment of the other known
fundamental forces of nature, namely the electromagnetic force, weak force, and strong force, spacetime
offers the arena for these forces to play the game, but it is itself inert and fixed: the spacetime is that of
Minkowski. This is to be contrasted with the spacetime when gravity is taken into account: spacetime in
general relativity has a non-trivial geometry and is dynamical!
2 MANIFOLDS AND TENSORS
2.1 SMOOTH MANIFOLDS
Figure 7: A manifold is the union of possibly many patches (open regions) $U_i$. Each
region is parametrised by a coordinate system, say $x^\mu$.
A manifold $M$ is a space consisting of patches or open regions $U_i$ that look locally like the $d$-dimensional
Euclidean space7 $R^d$ for some integer $d$ and are smoothly sewn together. The integer $d$ is called the dimension
of the manifold. The index $i$ labels different regions. In each region, we have a coordinate system to
parametrise points inside the region. For example, region $U_1$ has coordinates $x^\mu$ and region $U_2$ has
coordinates $x^{\mu'}$. Two or more regions may overlap. Overlapping regions are parametrised by multiple
coordinate systems that are related by coordinate transformations. The manifold is smooth if all the
coordinate transformations are infinitely differentiable. You can think of a manifold as a space with
coordinates that locally looks like Euclidean space but globally can warp and bend, and picture it in your
mind as something like that depicted in Fig. 7.
6 A manifold equipped with a positive definite metric is the subject of Riemannian geometry.
7 Or Minkowski spacetime. The metric has not yet been introduced at this point, and it is not necessary for the definition of
a manifold.
The Euclidean space $R^d$ is a trivial example of a manifold, where a single coordinate system covers the
whole space. A circle, defined as the set of points whose coordinates satisfy $x^2 + y^2 = 1$ in the two-dimensional
Euclidean plane, is a one-dimensional manifold. A sphere, defined as the set of points whose coordinates satisfy
$x^2 + y^2 + z^2 = 1$ in three-dimensional Euclidean space, is a two-dimensional manifold. Surfaces in
the intuitive sense (e.g., an eggshell, a membrane, etc.) are two-dimensional manifolds. They are
manifolds embedded in three-dimensional Euclidean space. The definition of manifolds we offered above
is an intrinsic one: there need not be any ambient space in which a manifold “lives”. In fact,
many manifolds cannot be embedded in three-dimensional Euclidean space.
In general relativity, our spacetime is modeled as a four-dimensional manifold. We can use coordinates,
say $x^\mu$, to label points in spacetime. One coordinate system may not be enough to cover all points in the
entire spacetime, so in some cases we may have to use multiple coordinate systems. The Greek
superscript $\mu$ of spacetime coordinates $x^\mu$ is usually assumed to take values from 0 to 3 rather than from
1 to 4: $x^0$ often denotes the time coordinate, while $x^1$, $x^2$ and $x^3$ denote spatial coordinates. In some
theoretical models, the spacetime dimension is an integer variable $d$ which may be different from four. The
superscript $\mu$ in this case takes values from 0 to $d - 1$. When there is no time coordinate and all the $d$
coordinates are spatial coordinates, $\mu$ is often assumed to take values from 1 to $d$ instead. The tensor
calculus that we develop in this part is valid whether there exists a time coordinate or not.
A coordinate system on a patch $U_i$ is a map $\psi_i$ from $U_i$ to $R^d$ that sends a point $p$ to the $d$ numbers
$x^\mu(p)$; the pair $(U_i, \psi_i)$ is called a chart.
We also call the $d$ functions $x^\mu(p)$ the coordinates of the point $p$. A point in a manifold exists independently
of its coordinates. We often use the sloppy notation $x^\mu$ to denote a point whose coordinates are $x^\mu$. For
example, you know what point it is if I say the point $(2, \tfrac{3}{2}, 5)$ in Cartesian coordinates. The function $\psi_i$ sort
of identifies $U_i$ with its image $\psi_i(U_i) \subseteq R^d$. We can do calculus in $U_i$ by doing calculus in its image under $\psi_i$
in $R^d$. The integer $d$ is called the dimension of the manifold and it is equal to the number of coordinates needed to
uniquely parametrise a point in the manifold.
The manifold $M$ is the union of its patches $U_i$: $M = \bigcup_i U_i$. The collection of all the charts $(U_i, \psi_i)$ of a manifold
is called an atlas. Two charts may overlap: $U_i \cap U_j \neq \emptyset$. When this happens, we demand the patching map or
transition map $\psi_{ij} \equiv \psi_i \circ \psi_j^{-1}$ from $\psi_j(U_i \cap U_j)$ to $\psi_i(U_i \cap U_j)$ to be infinitely differentiable.
Notice that $\psi_{ij}$ is defined on subsets of $R^d$, so its differentiability is defined in the usual sense of calculus.
A manifold can have many atlases.
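As a concrete illustration of charts and transition maps, the unit circle can be covered by two stereographic charts, one omitting the north pole (0, 1) and one omitting the south pole (0, −1). This choice of atlas and all function names below are assumptions made for the example and do not come from the notes; on the overlap, the transition map works out to be 1/𝑣, which is infinitely differentiable for 𝑣 ≠ 0.

```python
import math

def psi_north(x, y):
    """Chart on U_N = circle minus the north pole (0, 1): stereographic projection."""
    return x / (1.0 - y)

def psi_south(x, y):
    """Chart on U_S = circle minus the south pole (0, -1)."""
    return x / (1.0 + y)

def psi_south_inverse(v):
    """Point of the circle whose south-chart coordinate is v."""
    return (2.0 * v / (1.0 + v * v), (1.0 - v * v) / (1.0 + v * v))

def transition_north_south(v):
    """Transition map psi_N o psi_S^{-1}, defined on the overlap (v != 0)."""
    return psi_north(*psi_south_inverse(v))

for v in (-3.0, -0.5, 0.25, 2.0):
    x, y = psi_south_inverse(v)
    assert math.isclose(x * x + y * y, 1.0)                  # the point lies on the circle
    assert math.isclose(psi_south(x, y), v)                  # the inverse really inverts psi_S
    assert math.isclose(transition_north_south(v), 1.0 / v)  # closed form of the transition map
```

Neither chart alone covers the whole circle, but together they do, and the two coordinate descriptions are related smoothly wherever both apply.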
The $n$-sphere and $n$-torus are manifolds. Lie groups, e.g., $SO(3)$, $SU(2)$ and the Lorentz group, are also
manifolds but with additional structure. The closed interval $[0,1]$ on the real line is not itself a manifold. It
is, however, a manifold with boundary. Its boundary is disconnected and consists of the two points 0 and 1. Manifolds
with boundary are also very important in mathematics and physics, but we shall not discuss them in detail.
A manifold looks like $R^d$ locally, but globally it may look very different from $R^d$. One example is the 2-
torus. No matter how hard you deform the 2-torus, you can never turn it into the Euclidean 2-plane in a
continuous manner. This is because the 2-torus is topologically distinct from the Euclidean 2-plane.
One can introduce more than one coordinate system to cover a neighborhood of a point $p$ in a manifold
$M$. Suppose that two such coordinate systems are $x^\mu$ and $x^{\mu'}$ and they cover a common neighborhood $U$.
Each point in $U$ has some coordinates $x^\mu$ in the first coordinate system and $x^{\mu'}$ in the second. There is a
natural bijection between the coordinates $x^\mu$ and $x^{\mu'}$: $x^\mu$ is associated to $x^{\mu'}$ if they are the coordinates
of the same point in the two coordinate systems. The two functions that are defined by the bijection,
$x^{\mu'} = x^{\mu'}(x^\mu)$ and $x^\mu = x^\mu(x^{\mu'})$, are called coordinate transformations. You are very familiar with
coordinate transformations already. The Euclidean plane can be parametrised by polar coordinates $(r, \phi)$
as well as Cartesian coordinates $(x, y)$ with coordinate transformations
\[ \begin{cases} x = r\cos\phi, \\ y = r\sin\phi, \end{cases} \qquad \text{and} \qquad \begin{cases} r = \sqrt{x^2 + y^2}, \\ \phi = \arctan\dfrac{y}{x}. \end{cases} \tag{17} \]
These coordinate transformations are determined by demanding that $(r, \phi)$ and $(x, y)$ describe the same
point in the Euclidean plane.
For smooth manifolds, the coordinate transformations $x^\mu = x^\mu(x^{\mu'})$ and their inverse $x^{\mu'} = x^{\mu'}(x^\mu)$ are
infinitely differentiable functions. It can be shown that the Jacobian matrix $\left(\partial x^\mu / \partial x^{\mu'}\right)$ of the coordinate
transformations $x^\mu = x^\mu(x^{\mu'})$, defined as
\[ \left(\frac{\partial x^\mu}{\partial x^{\mu'}}\right) = \begin{pmatrix} \dfrac{\partial x^1}{\partial x^{1'}} & \cdots & \dfrac{\partial x^1}{\partial x^{d'}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x^d}{\partial x^{1'}} & \cdots & \dfrac{\partial x^d}{\partial x^{d'}} \end{pmatrix}, \tag{18} \]
has a non-vanishing determinant. Similarly, the Jacobian matrix $\left(\partial x^{\mu'} / \partial x^\mu\right)$ of the inverse transformations
$x^{\mu'} = x^{\mu'}(x^\mu)$ also has a non-vanishing determinant. One can check that the coordinate transformations
(17) have non-vanishing Jacobians as long as $r > 0$. At $r = 0$, the Jacobians become either zero or
divergent, indicating that something peculiar is happening. In fact, polar coordinates go bad at $r = 0$ and
the origin of the Euclidean plane is not covered by this coordinate system. Other coordinates should be used
at this point.
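The claim about the Jacobians of the transformations (17) is easy to verify by hand or numerically. The sketch below writes out the two 2×2 Jacobian matrices explicitly and checks that their determinants are 𝑟 and 1/𝑟 respectively; the function names and the sample point are arbitrary choices made for illustration.

```python
import math

def jacobian_cartesian_wrt_polar(r, phi):
    """Partial derivatives d(x, y)/d(r, phi) for x = r cos(phi), y = r sin(phi)."""
    return [[math.cos(phi), -r * math.sin(phi)],
            [math.sin(phi),  r * math.cos(phi)]]

def jacobian_polar_wrt_cartesian(x, y):
    """Partial derivatives d(r, phi)/d(x, y) for r = sqrt(x^2 + y^2), phi = arctan(y/x)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

r, phi = 2.0, 0.7  # arbitrary sample point with r > 0
x, y = r * math.cos(phi), r * math.sin(phi)
det_forward = det2(jacobian_cartesian_wrt_polar(r, phi))  # analytically equal to r
det_inverse = det2(jacobian_polar_wrt_cartesian(x, y))    # analytically equal to 1/r
print(det_forward, det_inverse)
```

The first determinant vanishes and the second diverges as 𝑟 → 0, which is the numerical face of the breakdown of polar coordinates at the origin.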
In a spacetime manifold, we can describe an event or a spacetime point using different coordinate systems.
Under coordinate transformations, the coordinates of the point change, but the point itself does not—
the existence of the point is independent of the coordinate systems we use to describe it. This is the idea
of general covariance and is promoted as an important principle in general relativity. In special relativity,
we have a preferred family of globally inertial observers that we can use to describe laws of physics. In
general relativity, the equivalence principle allows us to construct a family of locally inertial coordinate
systems that can be used to describe laws of physics in small enough regions of spacetime. But Einstein's
ingenious idea is that any coordinate system is as good as any other for describing the laws of physics. We should
free ourselves from just using globally or even locally inertial reference frames. All coordinate systems are
treated on an equal footing in general relativity, so laws of physics can be expressed in arbitrary
coordinate systems, and their forms are preserved under coordinate transformations.
The general covariance principle stated above is rather vague; it is not clear what it means to say that
the forms of physics laws are preserved under coordinate transformations. As we have seen in Minkowski
spacetime, there is a class of quantities called tensors that have specified transformation rules under
Lorentz transformations, and laws of physics in special relativity can be written as tensor equations.
Tensors can be generalised to arbitrary manifolds, and they have specified transformation rules under
arbitrary coordinate transformations. The general covariance principle can then be formulated more
precisely as follows: laws of physics should be written as tensor equations.8 Due to the transformation
properties of tensors, a tensor equation will be true in all coordinate systems if it is true in one particular
coordinate system. Put in slightly different words, the coordinate-independent way to write laws of
physics is to write them as tensor equations.
2.2 SCALARS
We now define the simplest type of tensors in a manifold. We assign to a point 𝑝 in the manifold a value
φ(𝑝). This assignment is of course coordinate independent, since we have not talked about coordinates
of any sort. Such an assignment φ(𝑝) is called a scalar at point 𝑝. If a scalar is defined at each point in the
manifold, we get a scalar field.
To do calculations in the manifold, we need to set up a coordinate system, say $x^\mu$. A point $p$ is labeled by its coordinates $x^\mu(p)$ in the coordinate system. The above-mentioned assignment of a value $\varphi(p)$ to point $p$ becomes a function $\varphi(x^\mu)$ of the coordinates $x^\mu$ of $p$. We can perform coordinate transformations $x^{\mu'} = x^{\mu'}(x^\mu)$, so point $p$ now has coordinates $x^{\mu'}$ and the scalar assignment becomes a new function $\varphi(x^{\mu'})$ of the new coordinates $x^{\mu'}$ of $p$. The value we assign at point $p$, whether labeled by $x^{\mu'}$ in the new coordinate system or by $x^\mu$ in the old coordinate system, is the same:
$$\varphi(x^{\mu'}) = \varphi(x^\mu) \quad \text{where} \quad x^{\mu'} = x^{\mu'}(x^\mu). \tag{19}$$
We often say that scalars do not transform at all under coordinate transformations: Points do not change,
the values we assign to them do not change, and only the coordinates of the points change.
It is very important to notice that the functional forms of a scalar in different coordinate systems will in general be different. Suppose that in coordinate system $x^\mu$, the scalar is given by the function $\varphi(x^\mu) = f(x^\mu)$. In the new coordinate system $x^{\mu'}$, the scalar is given by the function $\varphi(x^{\mu'}) = f(x^\mu) = f(x^\mu(x^{\mu'}))$: we do nothing but write the old coordinates in terms of the new coordinates. An example of a scalar is temperature. Suppose that a temperature field is defined in two-dimensional Euclidean space in Cartesian coordinates by the function $T(x, y) = x + 2y$. Then the temperature expressed in polar coordinates will be $T(r, \phi) = r\cos\phi + 2r\sin\phi$.
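The temperature example can be checked numerically. The following is a minimal sketch (the function names are illustrative and not part of the notes): it evaluates the two functional forms of the same scalar at one point, labeled once by Cartesian and once by polar coordinates, and the two values should agree.

```python
import math

def temperature_cartesian(x, y):
    """Functional form of the scalar in Cartesian coordinates: T = x + 2y."""
    return x + 2.0 * y

def temperature_polar(r, phi):
    """Functional form of the same scalar in polar coordinates."""
    return r * math.cos(phi) + 2.0 * r * math.sin(phi)

def polar_to_cartesian(r, phi):
    """Old coordinates written in terms of the new ones."""
    return r * math.cos(phi), r * math.sin(phi)

# One and the same point p, labeled in two coordinate systems.
r, phi = 2.0, 0.75
x, y = polar_to_cartesian(r, phi)

# The functional forms differ, but the value assigned to p does not.
print(temperature_cartesian(x, y), temperature_polar(r, phi))
```

The two printed numbers coincide up to rounding, which is the content of Eq. (19).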
8 Spinors are a type of object in physics used to describe particles with half-integer spin. They are not tensors, however. They do transform in a specified way under (the double cover of) Lorentz transformations.
At a point in spacetime, vectors, or contravariant vectors to be more precise, can be pictured as arrows starting from the point. A prototype of contravariant vectors is the displacement vector between two neighbouring spacetime points. In a coordinate system $x^\mu$, the coordinate difference between two nearby spacetime points A and B is $\Delta x^\mu = x_B^\mu - x_A^\mu$. Now we perform coordinate transformations
$$x^{\mu'} = x^{\mu'}(x^\mu), \quad \text{or inversely,} \quad x^\mu = x^\mu(x^{\mu'}). \tag{20}$$
The coordinate difference between the same two points in the new coordinate system $x^{\mu'}$ is given by
$$\Delta x^{\mu'} = x_B^{\mu'} - x_A^{\mu'} = x^{\mu'}(x_B^\mu) - x^{\mu'}(x_A^\mu) \approx \frac{\partial x^{\mu'}}{\partial x^\mu}\,(x_B^\mu - x_A^\mu). \tag{21}$$
When the two points A and B become infinitesimally near, the above equation becomes exact
$$\Delta x^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\mu}\,\Delta x^\mu. \tag{22}$$
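A contravariant vector at a point is a set of components $V^\mu$ that transform in the same way as the infinitesimal displacement (22) under the coordinate transformations (20):
$$V^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\mu}\,V^\mu. \tag{23}$$
Since the point itself does not change although its coordinates do, this transformation rule should really read
$$V^{\mu'}(x^{\nu'}) = \frac{\partial x^{\mu'}}{\partial x^\mu}\,V^\mu(x^\nu) \quad \text{where} \quad x^{\nu'} = x^{\nu'}(x^\nu). \tag{24}$$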
If a vector is defined at every point in the spacetime manifold, we have a vector field.
Exercise 1. Show that when the spacetime becomes Minkowski spacetime and coordinate transformations
are restricted to Lorentz transformations, the above definition of vectors reduces to that in special relativity.
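Let us work out one example. Consider a vector field $V^\mu = \begin{pmatrix} V^x \\ V^y \end{pmatrix}$ in Cartesian coordinates in two-dimensional Euclidean space. What are the components of this vector field in polar coordinates $(r, \phi)$? According to (23), the components in polar coordinates are
$$\begin{pmatrix} V^r \\ V^\phi \end{pmatrix} = \begin{pmatrix} \dfrac{\partial r}{\partial x}V^x + \dfrac{\partial r}{\partial y}V^y \\[2mm] \dfrac{\partial \phi}{\partial x}V^x + \dfrac{\partial \phi}{\partial y}V^y \end{pmatrix} = \begin{pmatrix} \cos\phi\,V^x + \sin\phi\,V^y \\[1mm] -\dfrac{\sin\phi}{r}V^x + \dfrac{\cos\phi}{r}V^y \end{pmatrix}. \tag{25}$$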
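A uniform vector field along the x-direction with components $V^\mu = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ in Cartesian coordinates will have components
$$\begin{pmatrix} V^r \\ V^\phi \end{pmatrix} = \begin{pmatrix} \cos\phi \\ -\dfrac{\sin\phi}{r} \end{pmatrix}, \tag{26}$$
in polar coordinates.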
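The Cartesian-to-polar example can be reproduced numerically. The sketch below (the function name is illustrative) applies the Jacobian $\partial(r,\phi)/\partial(x,y)$ to the Cartesian components at a chosen point, and compares the result for the uniform field along the x-direction with $(\cos\phi, -\sin\phi/r)$.

```python
import math

def cartesian_to_polar_components(v_x, v_y, x, y):
    """Transform contravariant components (V^x, V^y) at the point (x, y)
    into polar components (V^r, V^phi) using the Jacobian d(r, phi)/d(x, y)."""
    r = math.hypot(x, y)
    phi = math.atan2(y, x)
    # Partial derivatives of the new coordinates with respect to the old ones.
    dr_dx, dr_dy = math.cos(phi), math.sin(phi)
    dphi_dx, dphi_dy = -math.sin(phi) / r, math.cos(phi) / r
    v_r = dr_dx * v_x + dr_dy * v_y
    v_phi = dphi_dx * v_x + dphi_dy * v_y
    return v_r, v_phi

# Uniform field along x, evaluated at the point (x, y) = (1, 1).
x, y = 1.0, 1.0
r, phi = math.hypot(x, y), math.atan2(y, x)
print(cartesian_to_polar_components(1.0, 0.0, x, y))
print((math.cos(phi), -math.sin(phi) / r))  # closed form for comparison
```

Although the field is uniform in Cartesian coordinates, its polar components vary from point to point, because the Jacobian depends on position.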
Contravariant vectors are not new to us. The tangent vector of a curve is a contravariant vector. Suppose that we have a curve parametrised by a parameter $\lambda$. In coordinates $x^\mu$, the equation of the curve is given by $x^\mu(\lambda)$. The tangent of the curve has components $dx^\mu(\lambda)/d\lambda$. In new coordinates $x^{\mu'}$ defined by the coordinate transformation $x^{\mu'} = x^{\mu'}(x^\mu)$, the same curve has an equation $x^{\mu'}(\lambda) = x^{\mu'}(x^\mu(\lambda))$, whose tangent now has components
$$\frac{d x^{\mu'}(x^\mu(\lambda))}{d\lambda} = \frac{\partial x^{\mu'}}{\partial x^\mu}\,\frac{d x^\mu(\lambda)}{d\lambda}. \tag{27}$$
We thus see that the components of the tangent transform according to (23) under coordinate transformations. This shows that the tangent of a curve is a contravariant vector.
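A covariant vector, or covector, at a point is a set of components $\omega_\mu$ that transform as
$$\omega_{\mu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\omega_\mu, \tag{28}$$
under the coordinate transformations (20). Let us emphasise that point $p$ does not change although its coordinates change under coordinate transformations, so Eq. (28) should really read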
$$\omega_{\mu'}(x^{\nu'}) = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\omega_\mu(x^\nu) \quad \text{where} \quad x^{\nu'} = x^{\nu'}(x^\nu). \tag{29}$$
Covectors are also called dual vectors or one-forms. If a one-form is defined at every point in the manifold,
we have a one-form field.
From this point onwards, when no confusion arises, we will refer to contravariant vectors simply as vectors
and covariant vectors as one-forms.
We have actually seen one-forms before: the gradient of a scalar field is a one-form. Consider a scalar field given by the function $f(\vec{x})$ in the coordinate system $x^\mu$. Here, we use the notation $\vec{x}$ to represent the collection $\{x^\mu,\ \mu = 1 \ldots d \text{ or } \mu = 0 \ldots d-1\}$ to emphasise that the function $f$ depends on $d$ argument variables. The gradient of $f(\vec{x})$ then has $d$ components given by $\partial_\mu f(\vec{x})$. If we perform coordinate transformations $x^{\mu'} = x^{\mu'}(x^\mu)$, the scalar function becomes $f(\vec{x}(\vec{x}'))$, where $\vec{x}' = \{x^{\mu'},\ \mu' = 1' \ldots d' \text{ or } \mu' = 0' \ldots (d-1)'\}$ to emphasise that there are $d$ independent argument variables. In the new coordinates $x^{\mu'}$, the gradient then has components given by
$$\partial_{\mu'} f(\vec{x}(\vec{x}')) = \frac{\partial f(\vec{x})}{\partial x^\mu}\,\frac{\partial x^\mu}{\partial x^{\mu'}}. \tag{30}$$
[If you are not comfortable with the arguments being written as $\vec{x}(\vec{x}')$, you can write out all the arguments explicitly.] Here we have used the chain rule of partial derivatives. The components of the gradient given by $\partial_\mu f(\vec{x})$ in coordinates $x^\mu$ and $\partial_{\mu'} f(\vec{x}(\vec{x}'))$ in coordinates $x^{\mu'}$ are related by
$$\partial_{\mu'} f(\vec{x}(\vec{x}')) = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\partial_\mu f(\vec{x}). \tag{31}$$
The gradient of a scalar field is thus a one-form.
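As a numerical illustration of the one-form transformation rule for gradients, the sketch below compares two ways of obtaining the polar components of a gradient: transforming the Cartesian components with $\partial x^\mu/\partial x^{\mu'}$, and differentiating the scalar directly with respect to $(r, \phi)$ by central finite differences. The scalar $f = x^2 y$, the step size and the function names are arbitrary choices made for this example only.

```python
import math

def f_cartesian(x, y):
    """An arbitrary scalar field chosen for illustration: f = x^2 y."""
    return x * x * y

def gradient_cartesian(x, y):
    """Components (d f/dx, d f/dy) of the gradient in Cartesian coordinates."""
    return 2.0 * x * y, x * x

def gradient_polar_by_transformation(r, phi):
    """Apply the one-form rule: d_{mu'} f = (dx^mu / dx^{mu'}) d_mu f."""
    x, y = r * math.cos(phi), r * math.sin(phi)
    f_x, f_y = gradient_cartesian(x, y)
    df_dr = math.cos(phi) * f_x + math.sin(phi) * f_y
    df_dphi = -r * math.sin(phi) * f_x + r * math.cos(phi) * f_y
    return df_dr, df_dphi

def gradient_polar_direct(r, phi, h=1e-6):
    """Differentiate f(x(r, phi), y(r, phi)) numerically in polar coordinates."""
    def f_polar(r_, phi_):
        return f_cartesian(r_ * math.cos(phi_), r_ * math.sin(phi_))
    df_dr = (f_polar(r + h, phi) - f_polar(r - h, phi)) / (2.0 * h)
    df_dphi = (f_polar(r, phi + h) - f_polar(r, phi - h)) / (2.0 * h)
    return df_dr, df_dphi

print(gradient_polar_by_transformation(1.5, 0.4))
print(gradient_polar_direct(1.5, 0.4))
```

The two results agree to within the finite-difference error. Note that the Jacobian used here is the inverse of the one that transforms contravariant components.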
At a given point in spacetime, a one-form ω defines the following linear map on vectors
ω(𝑉) ≡ ωμ 𝑉 μ , (32)
to the real numbers. All linear maps from a vector space to the real numbers form a vector space called
the dual vector space with the same dimension as the original vector space. All contravariant vectors at
point 𝑝 form the tangent space 𝑇𝑝 of the spacetime manifold at point 𝑝. A picture of the tangent space is
shown in Fig. 9. The dual vector space of the tangent space 𝑇𝑝 is called the cotangent space at point 𝑝,
denoted as 𝑇𝑝∗. Hence,
𝑉 μ ∈ 𝑇𝑝 , ωμ ∈ 𝑇𝑝∗ . (33)
We begin by making the following claim without proof: all directional derivative operators along curves
through a point 𝑝 in spacetime form a vector space that can be identified with the tangent space 𝑇𝑝 . Given
any coordinate system 𝑥 μ covering a neighbourhood of 𝑝, the partial derivative operators {∂μ } form a basis
of the vector space of directional derivatives. We then identify a (contravariant) vector 𝑉 μ with the
differential operator
𝑉 = 𝑉 μ ∂μ . (34)
As a differential operator, the vector V acts on a function f in the obvious way 𝑉(𝑓) = 𝑉 μ ∂μ 𝑓 and produces
another function. This definition makes the geometrical nature of vectors manifest: A vector 𝑉 is a
differential operator, so its action on functions is independent of the coordinate system one uses. The
transformation law of the components of a vector $V$ can also be easily reproduced. In a new coordinate system $x^{\mu'}$, the vector $V$ becomes
$$V^\mu \partial_\mu = V^\mu\,\frac{\partial x^{\mu'}}{\partial x^\mu}\,\partial_{\mu'}, \tag{35}$$
according to the chain rule of partial differentiation. The components of $V$ in the coordinate system $x^{\mu'}$ are then
$$V^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\mu}\,V^\mu, \tag{36}$$
the same as (23).
In the above definition, the basis contravariant vectors are identified with {∂μ }. Recall that one-forms are
linear functionals of contravariant vectors. A set of basis one-forms can be identified with the differential of
coordinates {𝑑𝑥 μ } . The basis one-forms 𝑑𝑥 μ satisfy the defining property
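$$dx^\mu(\partial_\nu) = \delta^\mu_\nu. \tag{37}$$
A one-form $\omega$ with components $\omega_\mu$ can then be expanded in this basis as
$$\omega = \omega_\mu\,dx^\mu. \tag{38}$$
Under coordinate transformations $x^\mu \to x^{\mu'}$, the one-form $\omega$ becomes
$$\omega_\mu\,dx^\mu = \omega_\mu\,\frac{\partial x^\mu}{\partial x^{\mu'}}\,dx^{\mu'}, \tag{39}$$
where the chain rule of differentiation has been used. The components of the one-form $\omega$ in the new coordinate system can be found to be
$$\omega_{\mu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\omega_\mu, \tag{40}$$
which are the same as (28).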
It is clear from the transformation rules (41) that the order and positions of indices of a tensor matter in
an essential way. For example, if the index ν1 is placed as a superscript, then (41) is no longer consistent
with Einstein summation convention. Before we define a metric tensor for the spacetime manifold, indices
of a tensor cannot be raised or lowered. Two indices if swapped will point to different components of a
tensor, e.g., 𝑇 12 ≠ 𝑇 21 for a general tensor of type (2,0).
Tensors defined here also generalise the tensors in Minkowski spacetime that we defined in part I. When coordinate transformations are restricted to Lorentz transformations, the transformation rules (41) of tensor components reduce to those we used in part I for tensors in Minkowski spacetime.
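A tensor $T$ of type $(k, l)$ has components $T^{\mu_1\ldots\mu_k}{}_{\nu_1\ldots\nu_l}$ in a coordinate system $x^\mu$. It naturally defines a multilinear map from a collection of one-forms and vectors to the real numbers
$$T:\ T_p^* \times \ldots \times T_p^* \times T_p \times \ldots \times T_p \to \mathbb{R}$$
as follows. Given any $k$ one-forms $\omega^{(1)}, \ldots, \omega^{(k)}$ and $l$ vectors $V^{(1)}, \ldots, V^{(l)}$, the multilinear map produces the number
$$T(\omega^{(1)}, \ldots, \omega^{(k)}, V^{(1)}, \ldots, V^{(l)}) = T^{\mu_1\ldots\mu_k}{}_{\nu_1\ldots\nu_l}\,\omega^{(1)}_{\mu_1} \ldots \omega^{(k)}_{\mu_k}\,V^{(1)\nu_1} \ldots V^{(l)\nu_l}. \tag{43}$$
Many textbooks use this multilinear map to define the tensor $T$ itself.
Recall that in a coordinate system $x^\mu$, $\{\partial_\mu\}$ form a set of basis vectors and $\{dx^\mu\}$ form a set of basis one-forms. In the same coordinate system, a tensor $T$ of type $(k, l)$ can be expanded as follows
$$T = T^{\mu_1\ldots\mu_k}{}_{\nu_1\ldots\nu_l}\ \partial_{\mu_1} \otimes \ldots \otimes \partial_{\mu_k} \otimes dx^{\nu_1} \otimes \ldots \otimes dx^{\nu_l}, \tag{44}$$
where Einstein summation convention has been assumed. The expressions $\partial_{\mu_1} \otimes \ldots \otimes \partial_{\mu_k} \otimes dx^{\nu_1} \otimes \ldots \otimes dx^{\nu_l}$ for all possible values of indices form a set of basis tensors of type $(k, l)$.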
In general relativity, many tensors are constructed from either the metric tensor or the Riemann curvature
tensor. Here are a few basic algebraic operations that can be used to produce new tensors from known
tensors.
1. Addition and subtraction: Tensors of the same type can be added or subtracted in the obvious
fashion:
(𝑆 ± 𝑇)μ1 …μ𝑘 ν1 …ν𝑙 = 𝑆 μ1 …μ𝑘 ν1 …ν𝑙 ± 𝑇 μ1 …μ𝑘 ν1 …ν𝑙 . (45)
2. Scalar multiplication: A tensor 𝑇 μ1 …μ𝑘 ν1 …ν𝑙 of type (𝑘, 𝑙) multiplied by a scalar f produces a tensor
of the same type:
(𝑓𝑇)μ1 …μ𝑘 ν1 …ν𝑙 = 𝑓𝑇 μ1 …μ𝑘 ν1 …ν𝑙 . (46)
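3. Tensor product: The tensor product of a tensor $S^{\mu_1\ldots\mu_k}{}_{\nu_1\ldots\nu_l}$ of type $(k, l)$ and a tensor $T^{\rho_1\ldots\rho_m}{}_{\sigma_1\ldots\sigma_n}$ of type $(m, n)$ is a tensor $R^{\mu_1\ldots\mu_k}{}_{\nu_1\ldots\nu_l}{}^{\rho_1\ldots\rho_m}{}_{\sigma_1\ldots\sigma_n}$ of type $(k + m, l + n)$:
$$R^{\mu_1\ldots\mu_k}{}_{\nu_1\ldots\nu_l}{}^{\rho_1\ldots\rho_m}{}_{\sigma_1\ldots\sigma_n} = S^{\mu_1\ldots\mu_k}{}_{\nu_1\ldots\nu_l}\,T^{\rho_1\ldots\rho_m}{}_{\sigma_1\ldots\sigma_n}. \tag{47}$$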
4. Tensor contraction: A new tensor is produced from a known tensor by setting one of its upper
indices to one of its lower indices and applying the Einstein summation convention. As an example,
the Ricci tensor is the contraction of the Riemann tensor as follows
𝑅μν = 𝑅 λ μλν . (48)
Notice that in doing tensor contractions, the positions of indices that are contracted matter.
Exercise 3. Show that the tensor product of two contravariant vectors 𝐴μ and 𝐵μ is a tensor of type (2,0).
Show that the contraction of a contravariant vector 𝐴μ with a covector 𝐵μ is a scalar.
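The tensor product and contraction operations listed above can be made concrete with components stored as nested lists. This is only a component-level sketch at a single point in one coordinate system (the function names and sample numbers are illustrative); it does not replace the transformation-law arguments asked for in Exercise 3.

```python
def tensor_product(a, b):
    """Components of the product of two rank-1 objects: R[mu][nu] = a[mu] * b[nu]."""
    return [[a_mu * b_nu for b_nu in b] for a_mu in a]

def contract(mixed):
    """Contract the upper index with the lower index of a type (1,1) tensor,
    i.e. sum the components with equal index values."""
    return sum(mixed[mu][mu] for mu in range(len(mixed)))

vector = [1.0, 2.0, 3.0]      # components A^mu
covector = [4.0, -1.0, 0.5]   # components B_nu

mixed = tensor_product(vector, covector)   # components A^mu B_nu, type (1,1)
scalar = contract(mixed)                   # A^mu B_mu

print(mixed)
print(scalar)  # 1*4 + 2*(-1) + 3*0.5 = 3.5
```

The product has two free indices and the contraction has none, matching the types $(k + m, l + n)$ and $(k - 1, l - 1)$ expected from the rules above.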
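Given a tensor, one can symmetrise or antisymmetrise its two lower or upper indices to get new tensors. If $T_{\mu\nu}$ is a tensor of type (0,2), the symmetrised tensor and antisymmetrised tensor are defined as
$$T_{(\mu\nu)} = \frac{1}{2}\,(T_{\mu\nu} + T_{\nu\mu}), \tag{49}$$
$$T_{[\mu\nu]} = \frac{1}{2}\,(T_{\mu\nu} - T_{\nu\mu}). \tag{50}$$
More generally, for a tensor $T_{\mu_1\ldots\mu_l}$ of type $(0, l)$ we define
$$T_{(\mu_1\ldots\mu_l)} = \frac{1}{l!}\sum_\sigma T_{\mu_{\sigma(1)}\ldots\mu_{\sigma(l)}}, \tag{51}$$
$$T_{[\mu_1\ldots\mu_l]} = \frac{1}{l!}\sum_\sigma \mathrm{sign}(\sigma)\,T_{\mu_{\sigma(1)}\ldots\mu_{\sigma(l)}}, \tag{52}$$
where the sum is taken over all possible permutations $\sigma$ of $1, 2, \ldots, l$ and $\mathrm{sign}(\sigma)$ is $+1$ for even permutations and $-1$ for odd permutations. Similar definitions apply for any group of bracketed covariant or contravariant indices. So, we can define tensors with indices partially symmetrised or antisymmetrised:
$$T^{(\mu\nu)\lambda}{}_{[\alpha\beta]} = \frac{1}{4}\,\big(T^{\mu\nu\lambda}{}_{\alpha\beta} + T^{\nu\mu\lambda}{}_{\alpha\beta} - T^{\mu\nu\lambda}{}_{\beta\alpha} - T^{\nu\mu\lambda}{}_{\beta\alpha}\big). \tag{53}$$
A tensor of type $(0, l)$ is symmetric or antisymmetric if it is equal to its symmetrised or antisymmetrised tensor, respectively:
$$\text{Symmetric tensor:}\quad T_{(\mu_1\ldots\mu_l)} = T_{\mu_1\ldots\mu_l}, \tag{54}$$
$$\text{Antisymmetric tensor:}\quad T_{[\mu_1\ldots\mu_l]} = T_{\mu_1\ldots\mu_l}. \tag{55}$$
The metric tensor $g_{\mu\nu}$ is symmetric, while the electromagnetic field tensor $F_{\mu\nu}$ is antisymmetric.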
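The permutation sums in the general symmetrisation and antisymmetrisation formulas can be sketched directly in code. The example below stores the components of a type $(0, l)$ tensor in a dictionary keyed by index tuples; this storage scheme, the function names and the sample numbers are assumptions made for illustration only.

```python
from itertools import permutations, product
from math import factorial

def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one (by counting inversions)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def symmetrise(components, rank, dim, antisymmetric=False):
    """Average the components over all permutations of the index positions,
    weighting each term by sign(sigma) when antisymmetric=True."""
    result = {}
    for indices in product(range(dim), repeat=rank):
        total = 0.0
        for perm in permutations(range(rank)):
            permuted = tuple(indices[p] for p in perm)
            sign = permutation_sign(perm) if antisymmetric else 1
            total += sign * components[permuted]
        result[indices] = total / factorial(rank)
    return result

# A generic type (0,2) tensor in two dimensions.
example = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 3.0}
sym = symmetrise(example, rank=2, dim=2)
antisym = symmetrise(example, rank=2, dim=2, antisymmetric=True)

print(sym[(0, 1)], sym[(1, 0)])          # 3.0 3.0
print(antisym[(0, 1)], antisym[(1, 0)])  # -1.0 1.0
```

For rank 2 this reduces to the two-index definitions, and the symmetrised and antisymmetrised parts add up to the original components.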
The metric tensor of a spacetime is symmetric, $g_{\mu\nu} = g_{\nu\mu}$, and non-degenerate, $\det(g_{\mu\nu}) \neq 0$. One can define the inverse metric tensor $g^{\mu\nu}$ via the equation
$$g^{\mu\nu} g_{\nu\sigma} = \delta^\mu_\sigma, \tag{56}$$
where $\delta^\mu_\sigma$ is the Kronecker delta. The inverse metric is a tensor of type (2,0) and is also symmetric. A convenient and compact way to represent the metric is to define the line element of the spacetime as follows
$$ds^2 \equiv g_{\mu\nu}\,dx^\mu dx^\nu, \tag{57}$$
where 𝑑𝑥 μ is an infinitesimal displacement vector. The line element in this form is invariant under
arbitrary coordinate transformations. This can be seen as follows. Firstly, it is a tensor because it is
constructed from the contraction of tensors. Secondly, it carries no index since all indices are contracted
in (57). The line element is thus a scalar, invariant under coordinate transformations. We will use the
terms “metric” and “line element” interchangeably in this course.
Now consider a $d$-dimensional manifold endowed with its metric tensor $g_{\mu\nu}$. Under coordinate transformations $x^{\mu'} = x^{\mu'}(x^\mu)$, the metric tensor $g_{\mu\nu}$ transforms as
$$g_{\mu'\nu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial x^\nu}{\partial x^{\nu'}}\,g_{\mu\nu}. \tag{58}$$
If the metric and the Jacobian are represented as matrices $g \equiv (g_{\mu\nu})$ and $J \equiv \left(\dfrac{\partial x^\mu}{\partial x^{\mu'}}\right)$ respectively, the transformation (58) can be written as a matrix equation
$$g' = J^T g J. \tag{59}$$
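The matrix form of the metric transformation is easy to try out on a familiar case. The sketch below (helper names are illustrative) starts from the Euclidean plane in Cartesian coordinates, $g = \mathrm{diag}(1, 1)$, builds the Jacobian $J = (\partial x^\mu/\partial x^{\mu'})$ for the change to polar coordinates, and evaluates $J^T g J$; the expected outcome is $\mathrm{diag}(1, r^2)$.

```python
import math

def transpose(matrix):
    return [list(row) for row in zip(*matrix)]

def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def polar_metric_from_cartesian(r, phi):
    """Evaluate g' = J^T g J for the change from (x, y) to (r, phi)."""
    g_cartesian = [[1.0, 0.0], [0.0, 1.0]]
    # Rows are labeled by the old coordinates (x, y), columns by the new ones (r, phi).
    jacobian = [
        [math.cos(phi), -r * math.sin(phi)],
        [math.sin(phi), r * math.cos(phi)],
    ]
    return matmul(matmul(transpose(jacobian), g_cartesian), jacobian)

print(polar_metric_from_cartesian(2.0, 0.3))  # close to [[1, 0], [0, 4]]
```

The off-diagonal entries vanish only up to rounding, so a tolerance is needed when comparing with the exact result.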
It is a mathematical result that one can always find a matrix $J$ that brings a symmetric and non-degenerate matrix to a diagonal form with entries equal to $\pm 1$. This means that, at any given point in the manifold, we can always find a coordinate system $x^{\hat\mu}$ such that the metric tensor has the following form
$$g_{\hat\mu\hat\nu} = \mathrm{diag}(-1, \ldots, -1, +1, \ldots, +1), \tag{60}$$
for a certain number of $-1$ and a certain number of $+1$. The manifold is then said to have a signature
(−1, … , −1, +1, … , +1). A minus (plus) sign in the signature indicates a time (spatial) dimension. If all the
entries of the signature are positive, all dimensions are spatial dimensions, and the manifold is said to be
Euclidean or Riemannian. If one and only one of the entries is −1, the manifold has one time dimension
and is said to be Lorentzian or pseudo-Riemannian. Minkowski spacetime is pseudo-Riemannian with a
signature (−1,1,1,1). Einstein equivalence principle says that our spacetime locally looks like Minkowski
spacetime. It follows that our spacetime is pseudo-Riemannian, with a signature (−1, +1, +1, +1).
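You have already encountered (pseudo-)Riemannian manifolds before. The Euclidean plane is a two-dimensional Riemannian manifold with a metric
$$(g_{\mu\nu}) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \tag{61}$$
or, written as a line element,
$$ds^2 = dx^2 + dy^2, \tag{62}$$
in Cartesian coordinates, and with a line element
$$ds^2 = dr^2 + r^2\,d\phi^2, \tag{63}$$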
in polar coordinates. The metrics (61) and (63) describe the same geometry, the Euclidean plane, in different coordinate systems. A unit sphere is a two-dimensional Riemannian manifold with a metric
$$ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2.$$
Minkowski spacetime is a four-dimensional pseudo-Riemannian manifold with a metric given by the familiar form
$$ds^2 = -d(ct)^2 + dx^2 + dy^2 + dz^2, \tag{66}$$
in Cartesian coordinates, and by
$$ds^2 = -d(ct)^2 + dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2, \tag{67}$$
in spherical coordinates.
Exercise 4. Using the invariance of the line element, derive the metric (67) of Minkowski spacetime in spherical coordinates from (66).
It is worth emphasising that the metric components $g_{\mu\nu}$ of a spacetime are in general not constants but may vary from point to point. Even the metric (67) of Minkowski spacetime is not constant when written in spherical coordinates.
What does a metric do? How do geometrical notions such as straight lines, length and angle arise from the metric of a spacetime? What is curvature? How is it related to the metric? The full glory of the metric will be unveiled in the remainder of this part.
Length of a vector
The metric defines the “inner product” of two vectors $U^\mu$ and $V^\mu$ in a spacetime as follows
$$U \cdot V = g_{\mu\nu}\,U^\mu V^\nu. \tag{68}$$
It is easy to show that the inner product of two vectors is a scalar whose value is independent of coordinate systems. The “length” or the norm $\|V^\mu\|$ of a vector can thus be defined through
$$\|V^\mu\|^2 = g_{\mu\nu}\,V^\mu V^\nu. \tag{69}$$
If the square of its norm is negative, positive or zero, the vector $V^\mu$ is said to be time-like, space-like or null respectively.
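The classification of vectors by the sign of their squared norm can be sketched as follows. The Minkowski metric with signature $(-1, +1, +1, +1)$ is used as the sample metric; the function names and the tolerance used to decide when a squared norm counts as zero are choices made for this illustration.

```python
MINKOWSKI = [
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

def inner_product(metric, u, v):
    """Inner product g_{mu nu} U^mu V^nu with the sums written out explicitly."""
    size = len(u)
    return sum(metric[m][n] * u[m] * v[n] for m in range(size) for n in range(size))

def classify(metric, v, tolerance=1e-12):
    """Label a vector by the sign of its squared norm g_{mu nu} V^mu V^nu."""
    norm_squared = inner_product(metric, v, v)
    if norm_squared < -tolerance:
        return "time-like"
    if norm_squared > tolerance:
        return "space-like"
    return "null"

print(classify(MINKOWSKI, [1.0, 0.0, 0.0, 0.0]))  # time-like
print(classify(MINKOWSKI, [0.0, 1.0, 0.0, 0.0]))  # space-like
print(classify(MINKOWSKI, [1.0, 1.0, 0.0, 0.0]))  # null
```

Because the metric is not positive definite, a non-zero vector such as the third one can have zero norm.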
Now consider the displacement vector $V^\mu = dx^\mu \equiv x_B^\mu - x_A^\mu$ between two (sufficiently) nearby spacetime points $x_A^\mu$ and $x_B^\mu$. Events A and B are time-like separated if $V^\mu$ is time-like. In this case, we can find an observer that moves with constant velocity and passes through the two spacetime points along its
world line. This observer will find that the two events are at the same location with a time difference 𝑑τ
known as the proper time. The spacetime invariant interval 𝑑𝑠 2 between the two events to this observer
is
𝑑𝑠 2 = −(𝑐𝑑τ)2 . (70)
The spacetime invariant interval is of course equal to
$$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu, \tag{71}$$
so the proper time between the two events is
$$d\tau = \frac{1}{c}\sqrt{-g_{\mu\nu}\,dx^\mu dx^\nu}. \tag{72}$$
If $V^\mu$ is space-like, $x_A^\mu$ and $x_B^\mu$ are space-like separated and we can find an observer that moves with constant velocity and to which the two events occur simultaneously. This observer measures the proper distance between the two events, which is equal to
$$dl = \sqrt{g_{\mu\nu}\,dx^\mu dx^\nu}. \tag{73}$$
If $V^\mu$ is null, $x_A^\mu$ and $x_B^\mu$ are light-like separated and they can be connected by light signals.
Volume element
The metric tensor not only defines the notion of the length of a vector in a spacetime, but also the notion of the volume of a region in the spacetime. In a coordinate system $x^\mu$, the volume of a region $D$ in the spacetime (or spacetime volume, to be distinguished from spatial volume) is the integral of the volume element
$$dV = \sqrt{|\det g|}\;dx^0 \ldots dx^{d-1}, \tag{74}$$
$$V = \int_D \sqrt{|\det g|}\;dx^0 \ldots dx^{d-1}, \tag{75}$$
where $\det g$ is the determinant of the metric tensor viewed as a matrix. We shall not seek a derivation of these formulas.
The volume element and the volume integral have the following important properties. Firstly, when the coordinates $x^\mu$ are orthonormal at a point in the sense that the metric becomes diagonal, $g = \mathrm{diag}(-1, +1, \ldots, +1)$, we recover the expected result $dV = dx^0 \ldots dx^{d-1}$. This is the small spacetime volume that a locally inertial observer will measure using his rods and clocks. Secondly, as we now show, the volume integral (75) is a coordinate-invariant expression. In multi-variable calculus, you have learned that, under a change of variables $x^{\mu'} = x^{\mu'}(x^\mu)$, the integrand of an integral should be multiplied by the Jacobian, i.e.,
$$\int_D \sqrt{|\det g|}\;dx^0 \ldots dx^{d-1} = \int_{D'} |\det J|\,\sqrt{|\det g|}\;dx^{0'} \ldots dx^{(d-1)'} \quad \text{where} \quad J = \left(\frac{\partial x^\mu}{\partial x^{\mu'}}\right). \tag{76}$$
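Recall the transformation property Eq. (59) of the metric. Taking its determinant yields
$$\det g' = (\det J)^2 \det g, \tag{77}$$
so that
$$|\det J| = \sqrt{\frac{\det g'}{\det g}}. \tag{78}$$
Substituting (78) into (76), we obtain
$$\int_D \sqrt{|\det g|}\;dx^0 \ldots dx^{d-1} = \int_{D'} \sqrt{\frac{\det g'}{\det g}}\,\sqrt{|\det g|}\;dx^{0'} \ldots dx^{(d-1)'} = \int_{D'} \sqrt{|\det g'|}\;dx^{0'} \ldots dx^{(d-1)'}. \tag{79}$$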
This equation says that we can use (75) to calculate spacetime volume in any coordinate system and get
the same result. The above-mentioned two properties ensure that we get the correct generalisation of
the notion of volume in an arbitrary spacetime.
If the manifold we consider is Riemannian and the range of coordinate index is $\mu = 1, 2, \ldots, d$, the volume of a region in the manifold is the obvious integration
$$V = \int_D \sqrt{|\det g|}\;dx^1 \ldots dx^d. \tag{80}$$
As an example, we can use this formula to calculate the area (“volume” in two dimensions) of a unit sphere described by the metric $ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2$ as follows
$$V = \iint \sqrt{1 \times \sin^2\theta}\;d\theta\,d\phi = \int_0^\pi d\theta\,\sin\theta \int_0^{2\pi} d\phi = 4\pi. \tag{81}$$
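The area of the unit sphere can also be obtained by integrating $\sqrt{|\det g|}$ numerically. The following sketch uses a simple midpoint rule in $\theta$ (the integrand does not depend on $\phi$, so that integral contributes a factor $2\pi$); the number of sample points and the function name are arbitrary choices.

```python
import math

def sphere_area(n_theta=2000):
    """Midpoint-rule estimate of the integral of sqrt|det g| over the unit sphere,
    with metric ds^2 = d theta^2 + sin^2(theta) d phi^2."""
    d_theta = math.pi / n_theta
    theta_integral = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        det_g = 1.0 * math.sin(theta) ** 2   # determinant of diag(1, sin^2 theta)
        theta_integral += math.sqrt(abs(det_g)) * d_theta
    # The integrand is independent of phi, which ranges over [0, 2 pi].
    return theta_integral * 2.0 * math.pi

print(sphere_area(), 4.0 * math.pi)
```

The numerical value approaches $4\pi$ as the number of sample points grows.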
Exercise 5. Work out the volume element of three-dimensional Euclidean space in spherical coordinates.
Given any vector $V^\mu$, we can contract it with the metric tensor. The resulting quantity $g_{\mu\nu}V^\nu$ is a tensor of type (0,1), i.e., a one-form. We use the same symbol $V$ to denote this one-form
$$V_\mu = g_{\mu\nu}V^\nu. \tag{82}$$
Conversely, given any one-form $\omega_\mu$, we can contract it with the inverse metric to get a vector as follows
$$\omega^\mu = g^{\mu\nu}\omega_\nu. \tag{83}$$
The above operations set up a bijection between vectors and one-forms in a manifold with metric. When there is no metric defined, vectors and one-forms are completely independent objects.
The raising and lowering of indices can be generalised to tensors of any type: we use the metric tensor to lower an upper index and the inverse metric tensor to raise a lower index as in the following example
Notice that raising and lowering does not change the position of an index relative to other indices.
Exercise 6. Show that a vector with its index first lowered and then raised is equal to itself. Also show that
$$A^\mu B_\mu = A_\mu B^\mu. \tag{85}$$
When a metric is present, one can raise (lower) a lower (upper) index, and contract it with another upper (lower) index. For example, given a symmetric tensor $R_{\mu\nu}$, we can “contract its two indices” by first raising its first index and then contracting it with the second index to get a quantity $R$ with no index (scalar)
$$R^\mu{}_\nu = g^{\mu\rho}R_{\rho\nu}, \qquad R = R^\mu{}_\mu = g^{\mu\rho}R_{\rho\mu}. \tag{86}$$
Contractions of a tensor in this way are really contractions of the tensor with the metric tensor. When no
metric is defined, such contractions of course do not make sense.
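Raising and lowering can be tried out on a small example. The sketch below uses the metric of the Euclidean plane in polar coordinates at $r = 2$, namely $\mathrm{diag}(1, 4)$ with inverse $\mathrm{diag}(1, 1/4)$; the sample vectors and function names are arbitrary. It is a numerical spot check of the statements in Exercise 6, not a proof.

```python
def lower_index(metric, vector):
    """V_mu = g_{mu nu} V^nu."""
    size = len(vector)
    return [sum(metric[m][n] * vector[n] for n in range(size)) for m in range(size)]

def raise_index(inverse_metric, one_form):
    """omega^mu = g^{mu nu} omega_nu."""
    size = len(one_form)
    return [sum(inverse_metric[m][n] * one_form[n] for n in range(size)) for m in range(size)]

def contract(upper, lower):
    """Sum of products of an upper-index and a lower-index set of components."""
    return sum(u * l for u, l in zip(upper, lower))

# Plane polar metric at r = 2 and its inverse.
metric = [[1.0, 0.0], [0.0, 4.0]]
inverse_metric = [[1.0, 0.0], [0.0, 0.25]]

a_upper = [1.0, 2.0]
b_upper = [3.0, -1.0]
a_lower = lower_index(metric, a_upper)   # [1.0, 8.0]
b_lower = lower_index(metric, b_upper)   # [3.0, -4.0]

print(raise_index(inverse_metric, a_lower))                      # back to [1.0, 2.0]
print(contract(a_upper, b_lower), contract(a_lower, b_upper))    # -5.0 -5.0
```

Both contractions equal $g_{\mu\nu}A^\mu B^\nu$, which is why the position of the contracted pair can be swapped once a metric is available.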
The equivalence principle implies that in a small region of a spacetime, we can always set up a locally
inertial coordinate system such that the spacetime looks like Minkowski spacetime in the region. We say
that spacetimes are locally flat. Now we use pseudo-Riemannian manifolds, namely manifolds with a metric with Lorentzian signature, to model spacetimes. The natural question to ask is whether there exist coordinate systems in a pseudo-Riemannian manifold such that the metric looks locally flat.
The answer to this question is yes. For any given point $p$ in a manifold, we can always find coordinates $x^{\hat\mu}$ such that the metric satisfies
$$g_{\hat\mu\hat\nu}(p) = \eta_{\hat\mu\hat\nu} \quad \text{and} \quad \partial_{\hat\rho}\,g_{\hat\mu\hat\nu}(p) = 0. \tag{87}$$
In these coordinates, the metric is exactly the Minkowski metric at point $p$ and its first derivatives vanish there. The coordinates $x^{\hat\mu}$ can be identified as the aforementioned locally inertial coordinates at point $p$. Such an identification is the mathematical representation of locally inertial reference frames.
So, at each point in spacetime, we can find locally inertial coordinates such that the metric has the form
(87). However, at different points, these coordinates are different. For a general spacetime, these
coordinates cannot be meshed to form a global coordinate system due to the curvature of spacetime. Or
in physical terms, in the presence of gravity, there does not exist a global inertial coordinate system in
spacetime so that the metric becomes the Minkowski metric everywhere. In other words, the non-
meshing of locally inertial coordinates to a global inertial coordinate system is a manifestation of the
presence of gravity (or more precisely, tidal forces).
Now we proceed to show that locally inertial coordinates exist at any given point $p$ in a pseudo-Riemannian manifold. Let $x^\mu$ be an arbitrary coordinate system. We seek coordinate transformations as follows
$$x^\mu = x^\mu(x^{\hat\mu}), \tag{88}$$
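such that the metric components in the new coordinates,
$$g_{\hat\mu\hat\nu} = \frac{\partial x^\mu}{\partial x^{\hat\mu}}\,\frac{\partial x^\nu}{\partial x^{\hat\nu}}\,g_{\mu\nu}, \tag{89}$$
have the desired form (87) at a given point $p$. Let us assume that, without loss of generality, the point $p$ has new coordinates $\hat{x}_0$. We Taylor expand the metric components (89) around point $p$ in the new coordinate system
$$\begin{aligned}
g_{\hat\mu\hat\nu}(\hat{x}) &= g_{\hat\mu\hat\nu}(\hat{x}_0) + g_{\hat\mu\hat\nu,\hat\rho}(\hat{x}_0)\,(x^{\hat\rho} - x_0^{\hat\rho}) + O\big((x^{\hat\rho} - x_0^{\hat\rho})^2\big) \\
&= \frac{\partial x^\mu}{\partial x^{\hat\mu}}\bigg|_{\hat{x}_0} \frac{\partial x^\nu}{\partial x^{\hat\nu}}\bigg|_{\hat{x}_0} g_{\mu\nu}\Big|_{\hat{x}_0} + \left[\frac{\partial}{\partial x^{\hat\rho}}\left(\frac{\partial x^\mu}{\partial x^{\hat\mu}}\,\frac{\partial x^\nu}{\partial x^{\hat\nu}}\,g_{\mu\nu}\right)\right]\bigg|_{\hat{x}_0} (x^{\hat\rho} - x_0^{\hat\rho}) + O\big((x^{\hat\rho} - x_0^{\hat\rho})^2\big) \\
&= \frac{\partial x^\mu}{\partial x^{\hat\mu}}\bigg|_{\hat{x}_0} \frac{\partial x^\nu}{\partial x^{\hat\nu}}\bigg|_{\hat{x}_0} g_{\mu\nu}\Big|_{\hat{x}_0} + \left(\frac{\partial^2 x^\mu}{\partial x^{\hat\rho}\,\partial x^{\hat\mu}}\,\frac{\partial x^\nu}{\partial x^{\hat\nu}}\,g_{\mu\nu} + \frac{\partial x^\mu}{\partial x^{\hat\mu}}\,\frac{\partial^2 x^\nu}{\partial x^{\hat\rho}\,\partial x^{\hat\nu}}\,g_{\mu\nu} + \frac{\partial x^\mu}{\partial x^{\hat\mu}}\,\frac{\partial x^\nu}{\partial x^{\hat\nu}}\,\frac{\partial g_{\mu\nu}}{\partial x^{\hat\rho}}\right)\bigg|_{\hat{x}_0} (x^{\hat\rho} - x_0^{\hat\rho}) \\
&\quad + O\big((x^{\hat\rho} - x_0^{\hat\rho})^2\big).
\end{aligned} \tag{90}$$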
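At point $p$, the metric $g_{\hat\mu\hat\nu}$ can be brought to the Minkowski metric if the following equation holds
$$\frac{\partial x^\mu}{\partial x^{\hat\mu}}\bigg|_{\hat{x}_0} \frac{\partial x^\nu}{\partial x^{\hat\nu}}\bigg|_{\hat{x}_0} g_{\mu\nu}\Big|_{\hat{x}_0} = \eta_{\hat\mu\hat\nu}. \tag{91}$$
This is a set of 10 equations indexed by the pair $\hat\mu\hat\nu$ (swapping $\hat\mu$ and $\hat\nu$ in (91) does not give an independent equation) for the 16 unknowns $\dfrac{\partial x^\mu}{\partial x^{\hat\mu}}\Big|_{\hat{x}_0}$. For any given $g_{\mu\nu}|_{\hat{x}_0}$, we can always find appropriate values of $\dfrac{\partial x^\mu}{\partial x^{\hat\mu}}\Big|_{\hat{x}_0}$ so that (91) holds. In fact, 6 out of the 16 unknowns $\dfrac{\partial x^\mu}{\partial x^{\hat\mu}}\Big|_{\hat{x}_0}$ will be left unspecified. These 6 free variables correspond to the degrees of freedom in the Lorentz transformations that preserve the form of the Minkowski metric $\eta_{\hat\mu\hat\nu}$.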
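With $g_{\hat\mu\hat\nu}(\hat{x}_0)$ set to $\eta_{\hat\mu\hat\nu}$ by a particular choice of $\dfrac{\partial x^\mu}{\partial x^{\hat\mu}}\Big|_{\hat{x}_0}$, let us now ask whether we can set the first derivatives of $g_{\hat\mu\hat\nu}$ to zero. In view of (90), the question becomes whether we can satisfy the following condition
$$\left(\frac{\partial x^\nu}{\partial x^{\hat\nu}}\,g_{\mu\nu}\right)\bigg|_{\hat{x}_0} \frac{\partial^2 x^\mu}{\partial x^{\hat\rho}\,\partial x^{\hat\mu}}\bigg|_{\hat{x}_0} + \left(\frac{\partial x^\mu}{\partial x^{\hat\mu}}\,g_{\mu\nu}\right)\bigg|_{\hat{x}_0} \frac{\partial^2 x^\nu}{\partial x^{\hat\rho}\,\partial x^{\hat\nu}}\bigg|_{\hat{x}_0} + \left(\frac{\partial x^\mu}{\partial x^{\hat\mu}}\,\frac{\partial x^\nu}{\partial x^{\hat\nu}}\,\frac{\partial g_{\mu\nu}}{\partial x^{\hat\rho}}\right)\bigg|_{\hat{x}_0} = 0. \tag{92}$$
The unknowns are $\dfrac{\partial^2 x^\mu}{\partial x^{\hat\rho}\,\partial x^{\hat\mu}}\Big|_{\hat{x}_0}$. These expressions are symmetric with respect to the indices $\hat\rho$ and $\hat\mu$, so there are in total $d \times \frac{d(d+1)}{2}$ unknowns. The expression (92) carries three indices $\hat\mu$, $\hat\nu$ and $\hat\rho$ and is symmetric with respect to $\hat\mu$ and $\hat\nu$, so it contains $d \times \frac{d(d+1)}{2}$ independent equations. We thus have exactly the right number of variables to adjust to set $g_{\hat\mu\hat\nu,\hat\rho}(\hat{x}_0)$ to zero at the given point.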
Can we carry on this process to make second derivatives 𝑔μ̂ν̂,ρ̂σ̂ (𝑥̂0 ) of the metric vanish at a given point?
The answer is no. It can be shown that we do not have enough variables to adjust to solve the conditions
𝑔μ̂ν̂,ρ̂σ̂ (𝑥̂0 ) = 0. The second derivatives 𝑔μ̂ν̂,ρ̂σ̂ (𝑥̂0 ) encode curvature information of the spacetime and
cannot be gauged away by coordinate transformations. Interested readers can refer to Schutz or Carroll.
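The counting behind these statements can be tabulated. The sketch below follows the standard argument found in textbooks such as Schutz or Carroll, which the notes do not spell out at second order: the unknowns at that order are the third derivatives $\partial^3 x^\mu/\partial x^{\hat\rho}\partial x^{\hat\sigma}\partial x^{\hat\mu}$, symmetric in their three lower indices, and the conditions are $g_{\hat\mu\hat\nu,\hat\rho\hat\sigma} = 0$, symmetric in each index pair. The function names are illustrative.

```python
from math import comb

def count_zeroth_order(d):
    """Unknowns dx^mu/dx^muhat versus conditions g_{muhat nuhat}(p) = eta_{muhat nuhat}."""
    unknowns = d * d
    equations = d * (d + 1) // 2
    return unknowns, equations

def count_first_order(d):
    """Unknown second derivatives of x^mu (symmetric in two lower indices)
    versus conditions on the first derivatives of the metric."""
    unknowns = d * (d * (d + 1) // 2)
    equations = d * (d * (d + 1) // 2)
    return unknowns, equations

def count_second_order(d):
    """Unknown third derivatives of x^mu (symmetric in three lower indices)
    versus conditions on the second derivatives of the metric."""
    unknowns = d * comb(d + 2, 3)
    equations = (d * (d + 1) // 2) ** 2
    return unknowns, equations

for label, counter in [("g = eta", count_zeroth_order),
                       ("dg = 0", count_first_order),
                       ("ddg = 0", count_second_order)]:
    unknowns, equations = counter(4)
    print(label, "unknowns:", unknowns, "equations:", equations)
```

For $d = 4$ this gives 16 versus 10, 40 versus 40, and 80 versus 100. The shortfall of 20 at second order equals $d^2(d^2-1)/12$, the number of independent components of the Riemann curvature tensor in four dimensions.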
Recall that in Minkowski spacetime, only inertial coordinate systems are used, and coordinate
transformations are restricted to Lorentz transformations. In this case, we can take the partial derivative
of a tensor to get a new tensor as shown in part I of the notes. In a general manifold under arbitrary
coordinate transformations, the partial derivative of a tensor is no longer a tensor, as you will show in the
following exercise.
Tensors at different points in a general manifold are not directly comparable: a vector at point 𝑝 lives in
𝑇𝑝 , while a vector at point 𝑞 lives in 𝑇𝑞 . Because of this, the partial derivative of a tensor is not a tensor.
We need to define how to compare tensors at different points. This is achieved by introducing the notion
of covariant derivative. There are infinitely many covariant derivatives we can define in an arbitrary
spacetime manifold, but there is a unique one that is induced from the metric tensor. The notion of
curvature can be defined in terms of the unique covariant derivative. So, in the context of general relativity,
it is legitimate to think of covariant derivative and curvature as characterising the metric, without
introducing additional structures into the spacetime manifold.
A covariant derivative offers a way to compare tensors at different points of the spacetime manifold and a
covariant way to differentiate tensors to get tensors. Recall that in Minkowski spacetime, the partial
derivative of a tensor of type (𝑘, 𝑙) is a tensor of type (𝑘, 𝑙 + 1). The covariant derivative ∇ that we are
going to define generalises partial derivatives to arbitrary manifolds to produce tensors. More specifically,
a covariant derivative ∇ is a map from tensor fields 𝑇 μ1 …μ𝑘 ν1 …ν𝑙 of type (𝑘, 𝑙) to tensor fields
∇μ 𝑇 μ1 …μ𝑘 ν1 …ν𝑙 of type (𝑘, 𝑙 + 1) satisfying the following properties:
1. Linearity: for any two tensors 𝑇 μ1 …μ𝑘 ν1 …ν𝑙 and 𝑆 μ1 …μ𝑘 ν1 …ν𝑙 of the same type and two numbers
𝑎 and 𝑏,
∇μ (𝑎𝑇 μ1 …μ𝑘 ν1 …ν𝑙 + 𝑏𝑆 μ1 …μ𝑘 ν1 …ν𝑙 ) = 𝑎∇μ 𝑇 μ1 …μ𝑘 ν1 …ν𝑙 + 𝑏∇μ 𝑆 μ1 …μ𝑘 ν1 …ν𝑙 . (93)
2. Leibniz rule: for any two tensors 𝑇 μ1 …μ𝑘 ν1 …ν𝑙 and 𝑆 ρ1 …ρ𝑚 σ1 …σ𝑛 ,
∇μ (𝑇 μ1 …μ𝑘 ν1 …ν𝑙 𝑆 ρ1 …ρ𝑚 σ1 …σ𝑛 ) = ∇μ (𝑇 μ1 …μ𝑘 ν1 …ν𝑙 )𝑆 ρ1 …ρ𝑚 σ1 …σ𝑛 + 𝑇 μ1 …μ𝑘 ν1 …ν𝑙 ∇μ (𝑆 ρ1 …ρ𝑚 σ1 …σ𝑛 ). (94)
Let's first work out how the covariant derivative acts on a vector. Since the covariant derivative is a generalisation of the partial derivative, we expect that it can be written as the partial derivative plus some linear corrections. So, we write the covariant derivative of a vector in the following form
$$\nabla_\mu V^\nu = \partial_\mu V^\nu + \Gamma^\nu{}_{\mu\lambda}V^\lambda, \tag{95}$$
where $\Gamma^\nu{}_{\mu\lambda}$ is a quantity carrying three indices. In another coordinate system $x^{\mu'}$, the covariant derivative of $V$ takes the same form
$$\nabla_{\mu'} V^{\nu'} = \partial_{\mu'} V^{\nu'} + \Gamma^{\nu'}{}_{\mu'\lambda'}V^{\lambda'}. \tag{96}$$
Since the covariant derivative $\nabla_\mu V^\nu$ is a tensor of type (1,1), its components should transform according to
$$\nabla_{\mu'} V^{\nu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial x^{\nu'}}{\partial x^\nu}\,\nabla_\mu V^\nu, \tag{97}$$
under coordinate transformations $x^{\mu'} = x^{\mu'}(x^\mu)$. This determines the transformation property of the quantity $\Gamma^\nu{}_{\mu\lambda}$ as follows
$$\Gamma^{\nu'}{}_{\mu'\lambda'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial x^\lambda}{\partial x^{\lambda'}}\,\frac{\partial x^{\nu'}}{\partial x^\nu}\,\Gamma^\nu{}_{\mu\lambda} - \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial x^\lambda}{\partial x^{\lambda'}}\,\frac{\partial^2 x^{\nu'}}{\partial x^\mu\,\partial x^\lambda}. \tag{98}$$
The quantity $\Gamma^\nu{}_{\mu\lambda}$ does not transform under coordinate transformations as a tensor does, so it is not a tensor. The first term in (98) transforms like a tensor, but the second does not since it contains the second derivative of the coordinate transformation. In view of (98), it is clear that the components of $\Gamma^\nu{}_{\mu\lambda}$ are known in any coordinate system if they are known in one particular coordinate system. Assigning values to $\Gamma^\nu{}_{\mu\lambda}$ in one coordinate system then defines one covariant derivative. There are then obviously infinitely many covariant derivatives one can define in a manifold if the linearity property and the Leibniz rule are the only conditions we impose.
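The quantity $\Gamma^\nu{}_{\mu\lambda}$ is all that we need to define the covariant derivative of tensors of any type if we impose two further properties:
3. Reduction to the partial derivative on scalars: $\nabla_\mu \varphi = \partial_\mu \varphi$ for any scalar field $\varphi$.
4. Commutation with contractions: taking the covariant derivative of a tensor and then contracting a pair of its indices gives the same result as contracting first and then taking the covariant derivative.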
Now we show how these two conditions determine the covariant derivative of one-forms. Consider the
covariant derivative of the scalar formed by 𝑉 ν ων of a vector 𝑉 ν and a one-form ων . Using Property 3,
we have
∇μ (𝑉 ν ων ) = ∂μ (𝑉 ν ων ). (99)
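Using the Leibniz rule, the left-hand side can be expanded as
$$\nabla_\mu (V^\nu \omega_\nu) = (\nabla_\mu V^\nu)\,\omega_\nu + V^\nu \nabla_\mu \omega_\nu. \tag{100}$$
Combining (99) and (100) gives
$$\begin{aligned}
V^\nu \nabla_\mu \omega_\nu &= \partial_\mu (V^\nu \omega_\nu) - (\nabla_\mu V^\nu)\,\omega_\nu \\
&= \omega_\nu\,\partial_\mu V^\nu + V^\nu \partial_\mu \omega_\nu - (\partial_\mu V^\nu + \Gamma^\nu{}_{\mu\lambda}V^\lambda)\,\omega_\nu \\
&= V^\nu \partial_\mu \omega_\nu - \Gamma^\nu{}_{\mu\lambda}V^\lambda \omega_\nu = V^\nu \partial_\mu \omega_\nu - \Gamma^\lambda{}_{\mu\nu}V^\nu \omega_\lambda.
\end{aligned} \tag{101}$$
This equation is true for all $V^\nu$, so the covariant derivative of the one-form $\omega_\nu$ must be given by
$$\nabla_\mu \omega_\nu = \partial_\mu \omega_\nu - \Gamma^\lambda{}_{\mu\nu}\,\omega_\lambda. \tag{102}$$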
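The covariant derivative of tensors of arbitrary type can be worked out. The expression comes as no surprise: for each upper index we introduce a term with a single $+\Gamma$ and for each lower index a term with a single $-\Gamma$:
$$\begin{aligned}
\nabla_\mu T^{\mu_1\mu_2\ldots\mu_k}{}_{\nu_1\nu_2\ldots\nu_l} &= \partial_\mu T^{\mu_1\mu_2\ldots\mu_k}{}_{\nu_1\nu_2\ldots\nu_l} \\
&\quad + \Gamma^{\mu_1}{}_{\mu\lambda}\,T^{\lambda\mu_2\ldots\mu_k}{}_{\nu_1\nu_2\ldots\nu_l} + \Gamma^{\mu_2}{}_{\mu\lambda}\,T^{\mu_1\lambda\ldots\mu_k}{}_{\nu_1\nu_2\ldots\nu_l} + \cdots \\
&\quad - \Gamma^{\lambda}{}_{\mu\nu_1}\,T^{\mu_1\mu_2\ldots\mu_k}{}_{\lambda\nu_2\ldots\nu_l} - \Gamma^{\lambda}{}_{\mu\nu_2}\,T^{\mu_1\mu_2\ldots\mu_k}{}_{\nu_1\lambda\ldots\nu_l} - \cdots
\end{aligned} \tag{103}$$
In many textbooks, semicolons are used for covariant derivatives, whereas commas are used for partial derivatives:
$$\begin{aligned}
\partial_\mu T^{\mu_1\mu_2\ldots\mu_k}{}_{\nu_1\nu_2\ldots\nu_l} &\equiv T^{\mu_1\mu_2\ldots\mu_k}{}_{\nu_1\nu_2\ldots\nu_l,\mu}, \\
\nabla_\mu T^{\mu_1\mu_2\ldots\mu_k}{}_{\nu_1\nu_2\ldots\nu_l} &\equiv T^{\mu_1\mu_2\ldots\mu_k}{}_{\nu_1\nu_2\ldots\nu_l;\mu}.
\end{aligned} \tag{104}$$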
Christoffel symbols
The covariant derivative we have defined so far has nothing to do with the metric. A unique covariant
derivative is derivable from the metric if we impose two more properties:
5. Torsion-free: Γ λ μν = Γ λ νμ.
6. Metric compatibility: ∇ρ 𝑔μν = 0.
It will be clear later that the metric compatibility is imposed to make sure that the length of a vector is
preserved under parallel-transport.
Now, we derive the unique covariant derivative that satisfies the above-mentioned properties from the
metric. Property 6, when spelled out explicitly, reads
To use Property 5, the torsion-free or symmetric property of the symbols Γ λ μν, we swap the lower indices
of Γ in the equation above
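The standard textbook route from Properties 5 and 6 to the Christoffel symbols can be sketched as follows. This is a sketch under the conventions of (103), not a reproduction of the numbered equations of these notes; the final expression has the same index structure as the one used in (141).

```latex
\begin{align*}
% Metric compatibility written out with (103), plus its two cyclic permutations
\nabla_\rho g_{\mu\nu} &= \partial_\rho g_{\mu\nu}
  - \Gamma^{\lambda}{}_{\rho\mu}\, g_{\lambda\nu} - \Gamma^{\lambda}{}_{\rho\nu}\, g_{\mu\lambda} = 0, \\
\nabla_\mu g_{\nu\rho} &= \partial_\mu g_{\nu\rho}
  - \Gamma^{\lambda}{}_{\mu\nu}\, g_{\lambda\rho} - \Gamma^{\lambda}{}_{\mu\rho}\, g_{\nu\lambda} = 0, \\
\nabla_\nu g_{\rho\mu} &= \partial_\nu g_{\rho\mu}
  - \Gamma^{\lambda}{}_{\nu\rho}\, g_{\lambda\mu} - \Gamma^{\lambda}{}_{\nu\mu}\, g_{\rho\lambda} = 0. \\
% Second + third - first, using the symmetry of Gamma in its lower indices
0 &= \partial_\mu g_{\nu\rho} + \partial_\nu g_{\rho\mu} - \partial_\rho g_{\mu\nu}
  - 2\,\Gamma^{\lambda}{}_{\mu\nu}\, g_{\lambda\rho}, \\
% Contract with the inverse metric
\Gamma^{\sigma}{}_{\mu\nu} &= \tfrac{1}{2}\, g^{\sigma\rho}
  \left( \partial_\mu g_{\nu\rho} + \partial_\nu g_{\rho\mu} - \partial_\rho g_{\mu\nu} \right).
\end{align*}
```

The torsion-free property is what makes the four unwanted $\Gamma$ terms cancel in pairs in the fourth line.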
In general relativity, the torsion-free⁹ and metric-compatible covariant derivative is used. So, unless otherwise specified, the symbols $\Gamma^\mu{}_{\alpha\beta}$ are determined by the metric and given by (109). Recall that in a locally inertial coordinate system $x^{\hat\mu}$ constructed around a point $p$, the metric is approximately flat in the sense that (87) holds. At the point $p$, the Christoffel symbols all vanish, and covariant derivatives coincide with partial derivatives. This in turn implies that, at $p$,
\[ \nabla_{\hat\rho}\, g_{\hat\mu\hat\nu} = \partial_{\hat\rho}\, g_{\hat\mu\hat\nu} = 0. \tag{110} \]
Since this is a tensor equation and $p$ is a generic point, the covariant derivative of the metric tensor is zero
everywhere. This is another motivation behind the metric compatibility property of the covariant
derivative.
Divergence
Divergence is a very important operator in vector calculus (recall Maxwell's equations). In Minkowski
spacetime, using Cartesian coordinates, the divergence of a vector field 𝑉 μ can be written as ∂μ 𝑉 μ. It
produces a scalar field from a vector field and can be understood to represent the vector field's source.
Now, in a general manifold that may not be flat, what is the generalisation of divergence? It is not
particularly difficult to guess the answer: we replace partial derivative by covariant derivative and define
the divergence of a vector field 𝑉 μ as
∇μ 𝑉 μ = ∂μ 𝑉 μ + Γ μ μλ 𝑉 λ . (111)
The result is obviously a scalar. The divergence of a vector field can be simplified to the following expression
\[ \nabla_\mu V^\mu = \frac{1}{\sqrt{|\det g|}}\, \partial_\mu \left( \sqrt{|\det g|}\, V^\mu \right). \tag{112} \]
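A short sketch of why (112) follows from (111), assuming the Christoffel symbols take the standard metric form and using Jacobi's formula for the derivative of a determinant:

```latex
\begin{align*}
\Gamma^{\mu}{}_{\mu\lambda}
  &= \tfrac{1}{2}\, g^{\mu\rho} \left( \partial_\mu g_{\lambda\rho}
     + \partial_\lambda g_{\rho\mu} - \partial_\rho g_{\mu\lambda} \right)
   = \tfrac{1}{2}\, g^{\mu\rho}\, \partial_\lambda g_{\mu\rho}, \\
% Jacobi's formula: \partial_\lambda \det g = \det g \; g^{\mu\rho} \partial_\lambda g_{\mu\rho}
\Gamma^{\mu}{}_{\mu\lambda}
  &= \frac{1}{2 \det g}\, \partial_\lambda \det g
   = \frac{1}{\sqrt{|\det g|}}\, \partial_\lambda \sqrt{|\det g|}, \\
\nabla_\mu V^\mu
  &= \partial_\mu V^\mu + \frac{1}{\sqrt{|\det g|}} \left( \partial_\lambda \sqrt{|\det g|} \right) V^\lambda
   = \frac{1}{\sqrt{|\det g|}}\, \partial_\mu \left( \sqrt{|\det g|}\, V^\mu \right).
\end{align*}
```

In the first line the first and third terms cancel because $g^{\mu\rho}$ is symmetric.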
Exercise 8. Use the formula (112) to express the divergence of a vector field 𝑉 μ in spherical coordinates
in three-dimensional Euclidean space. Compare it to the divergence formula in spherical coordinates you
learned from vector calculus.
We can talk about the divergence of tensors of arbitrary type. We simply take the covariant derivative of a tensor and contract the index of the covariant derivative with one of the upper indices of the tensor: $\nabla_\mu T^{\cdots\mu\cdots}{}_{\cdots}$. Sometimes one also takes the divergence of a tensor with only lower indices. For example, the divergence of a tensor $T_{\mu\nu}$ of type (0,2) can be defined as follows
⁹ Gravity theories with torsion exist, an example being the Einstein-Cartan theory.
Divergence in this form is very important to express the conservation property of energy and momentum
and the associated stress-energy tensor.
3.3 PARALLEL-TRANSPORT
In Minkowski spacetime in Cartesian coordinates, we implicitly parallel-transport vectors whenever we do vector subtraction: to subtract a vector $W^\mu$ at a point $x_B^\mu$ from a vector $V^\mu$ at another point $x_A^\mu$, we simply take the difference of the components, $V^\mu - W^\mu$. What we have actually done in this process is first to shift (parallel-transport) the vector $V^\mu$ at $x_A^\mu$ to a vector with the same components $V^\mu$ at $x_B^\mu$, and then to subtract $W^\mu$ from the shifted vector, the two now being at the same point. Geometrically, this means that arrows representing vectors in flat spacetime can be shifted from point to point with their direction kept fixed.
How can we generalise this idea of parallel-transport of vectors in a spacetime manifold with an arbitrary
metric that may in general be curved? The first thing we notice is that parallel-transport of vectors in a
general space(time) is path-dependent. As an example of this, consider parallel-transporting a vector on
a round two-sphere as shown in Fig. 10. We start from the vector $V_i$ at point A. Transporting the vector $V_i$ along AC ends with the vector $V_{f1}$,¹⁰ while transporting $V_i$ along ABC ends with the vector $V_{f2}$. The final vectors $V_{f1}$ and $V_{f2}$ are both at point C, but they are not equal.
So, to talk properly about parallel-transporting vectors in a curved spacetime, we need to specify a path $l$ along which the transport is to be done. The path $l$ is a curve in the spacetime, and we assume that it is parametrised as $x^\mu(\lambda)$ by a parameter $\lambda$. How, then, do we make sure that a vector is parallel-transported along the curve if spacetime is curved? In the example shown in Fig. 10, parallel-transport can be done with the help of the three-dimensional Euclidean space in which the two-sphere is embedded. A general spacetime manifold, however, cannot be embedded in three-dimensional Euclidean space, nor in Minkowski spacetime. We have to use intrinsic properties of the spacetime to define parallel-transport of vectors (and tensors in general). This is achieved by using the covariant derivative we introduced in the previous section. The directional covariant derivative of a tensor along the curve is defined as
\[ \left( \frac{D}{d\lambda} T \right)^{\mu_1 \mu_2 \ldots \mu_k}{}_{\nu_1 \nu_2 \ldots \nu_l} = \frac{dx^\mu}{d\lambda}\, \nabla_\mu T^{\mu_1 \mu_2 \ldots \mu_k}{}_{\nu_1 \nu_2 \ldots \nu_l}. \tag{115} \]
In other words, the directional covariant derivative of a tensor along a curve is the contraction of the covariant derivative of the tensor with the tangent vector of the curve.
¹⁰ This footnote explains how parallel-transport of $V_i$ along the curve AC is done on the round two-sphere embedded in three-dimensional Euclidean space. First note that $V_i$ lies in the plane tangent to the sphere at the starting point A. Now move along the curve by a small displacement from A to a nearby point $A_1$, and parallel-transport $V_i$ (in the usual way, as in flat space) as a vector in three-dimensional Euclidean space to a vector $V_1'$ at $A_1$. Since the sphere is curved, $V_1'$ is no longer in the plane tangent to the sphere at $A_1$. ($V_i$ and $V_1'$ are parallel, but the tangent plane at A and that at $A_1$ are not.) In other words, $V_1'$ is no longer a vector on the sphere. We can project $V_1'$ down to the tangent plane at $A_1$. The projected vector $V_1$ is now a vector in the tangent plane at $A_1$ and is defined as the parallel-transport of $V_i$ to $A_1$ along $AA_1$ on the sphere. Now move from $A_1$ along the curve by a small displacement to reach the next nearby point $A_2$ and use the same procedure (parallel-transport in the usual sense followed by projection) to parallel-transport $V_1$ to a vector $V_2$ at $A_2$. This process can then be repeated to parallel-transport the vector $V_i$ at A to $V_{f1}$ at the end point C along the curve AC. Note that this way of doing parallel-transport relies on the existence of the three-dimensional Euclidean space in which the round sphere “lives”.
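A tensor $T^{\mu_1 \mu_2 \ldots \mu_k}{}_{\nu_1 \nu_2 \ldots \nu_l}$ of type $(k, l)$ is parallel-transported along the curve if its directional covariant derivative is zero,
\[ \frac{dx^\mu}{d\lambda}\, \nabla_\mu T^{\mu_1 \mu_2 \ldots \mu_k}{}_{\nu_1 \nu_2 \ldots \nu_l} = 0, \tag{116} \]
along the curve. A vector $V^\mu(\lambda)$ is parallel-transported along the curve if
\[ \frac{d}{d\lambda} V^\mu + \Gamma^\mu{}_{\rho\sigma}\, \frac{dx^\rho}{d\lambda}\, V^\sigma = 0. \tag{117} \]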
Given the vector 𝑉 μ at a single point along the curve, there is a unique solution 𝑉 μ (λ) satisfying the above
equation. This unique solution 𝑉 μ (λ) is the parallel-transported vector of 𝑉 along the curve.
In Minkowski spacetime, the Christoffel symbols vanish in Cartesian coordinates. The parallel-transport equation (117) becomes $\frac{d}{d\lambda} V^\mu = 0$, whose solution is a constant. Hence, the parallel-transport defined here using the directional covariant derivative reduces to the usual notion of parallel-transport in Minkowski spacetime. In a general curved spacetime, in the neighbourhood of any point, we can set up a locally inertial reference frame in which (87) holds, and in this reference frame the parallel-transport equation (117) also becomes $\frac{d}{d\lambda} V^\mu = 0$ at that point. This means that our definition of parallel-transport reduces to the usual notion of parallel-transport in locally inertial reference frames. Parallel-transport of vectors at a
point along an infinitesimal curve can be performed in the usual way by fixing the components of the
vectors and shifting their starting point in the locally inertial reference frame. For a finite curve, we can
divide it into many small segments and repeat the above-mentioned procedure to perform parallel-
transport. It is clear that the parallel-transport defined here does not assume and involve any higher
dimensional space in which spacetime “lives”, so it is truly an intrinsic approach.
The parallel-transport of vectors defined above has a very nice property: it preserves the inner product of
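two parallel-transported vectors. If two vectors $V^\mu$ and $W^\mu$ are parallel-transported along a curve, their inner product is preserved:
\[ \frac{dx^\mu}{d\lambda}\, \nabla_\mu \left( g_{\rho\sigma} V^\rho W^\sigma \right) = \frac{dx^\mu}{d\lambda} \left[ (\nabla_\mu g_{\rho\sigma}) V^\rho W^\sigma + g_{\rho\sigma} (\nabla_\mu V^\rho) W^\sigma + g_{\rho\sigma} V^\rho (\nabla_\mu W^\sigma) \right] = 0. \tag{118} \]
Here the first term vanishes by metric compatibility, and the last two vanish by the parallel-transport condition (116).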
The inner product of 𝑉 μ and 𝑊 μ is thus a constant along the curve. Taking 𝑊 μ = 𝑉 μ , this result says that
a parallel-transported vector has the same length along the curve, a natural requirement that we would
love to have for any sensible definition of parallel-transport.
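The parallel-transport equation (117) and the length-preservation property can be checked numerically. The following is a minimal sketch, assuming the unit two-sphere with metric $ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2$ and a closed curve at fixed colatitude $\theta_0$ parametrised by $\phi = \lambda$; the function names are hypothetical and the integrator is a plain fourth-order Runge-Kutta scheme.

```python
import math

def transport_rhs(theta0, v):
    """dV/dlambda from Eq. (117) on the unit sphere, along theta = theta0, phi = lambda."""
    v_theta, v_phi = v
    cot = math.cos(theta0) / math.sin(theta0)
    # Non-zero Christoffel symbols of ds^2 = dtheta^2 + sin^2(theta) dphi^2:
    # Gamma^theta_{phi phi} = -sin(theta) cos(theta), Gamma^phi_{theta phi} = cot(theta).
    return (math.sin(theta0) * math.cos(theta0) * v_phi, -cot * v_theta)

def transport_around_latitude(theta0, v0, steps=4000):
    """Integrate Eq. (117) once around the circle theta = theta0 with classical RK4."""
    h = 2.0 * math.pi / steps
    v = v0
    for _ in range(steps):
        k1 = transport_rhs(theta0, v)
        k2 = transport_rhs(theta0, (v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = transport_rhs(theta0, (v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = transport_rhs(theta0, (v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
             v[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
    return v

def norm_squared(theta0, v):
    """g_{mu nu} V^mu V^nu on the unit sphere at colatitude theta0."""
    return v[0] ** 2 + math.sin(theta0) ** 2 * v[1] ** 2

theta0 = math.pi / 3
v_final = transport_around_latitude(theta0, (1.0, 0.0))
print(v_final, norm_squared(theta0, v_final))
```

In this setup the vector keeps its length but comes back rotated by the angle $2\pi\cos\theta_0$, so for $\theta_0 = \pi/3$ it returns pointing in the opposite direction, even though it was parallel-transported at every step.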
3.4 GEODESIC EQUATION
Suppose that a path in spacetime joining two points A and B is given by the curve $x^\mu(\sigma)$ parametrised by $\sigma$. The parametrisation is chosen in such a way that the two endpoints have parameter $\sigma = 0$ and $\sigma = 1$ respectively:
\[ x^\mu(0) = x_A^\mu, \qquad x^\mu(1) = x_B^\mu. \tag{119} \]
We assume that the tangent vector to the curve is time-like everywhere, so one can imagine that the
curve is the worldline traced out by an observer in spacetime. The proper time measured by the observer
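between two nearby points along the curve with parameter difference $d\sigma$ is (refer to Eq. (72) and set $c = 1$)
\[ d\tau = \sqrt{ -g_{\mu\nu} \left( \frac{dx^\mu(\sigma)}{d\sigma}\, d\sigma \right) \left( \frac{dx^\nu(\sigma)}{d\sigma}\, d\sigma \right) } = \sqrt{ -g_{\mu\nu}\, \frac{dx^\mu}{d\sigma} \frac{dx^\nu}{d\sigma} }\; d\sigma. \tag{120} \]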
The total proper time measured by the observer when he/she moves from point A to point B along the path $x^\mu(\sigma)$ is thus
\[ \tau = \int_A^B d\tau = \int_0^1 \sqrt{ -g_{\mu\nu}\, \frac{dx^\mu}{d\sigma} \frac{dx^\nu}{d\sigma} }\; d\sigma. \tag{121} \]
Notice that between the two points A and B, there are infinitely many paths whose tangent vectors are everywhere time-like. Eq. (121) defines a functional of the path, called the proper time functional or length functional (by a slight abuse of terminology). A “straight line” connecting A to B is defined as a curve that extremises this proper time or length functional. The problem of finding a straight line connecting A to B now translates to the problem of finding functions $x^\mu(\sigma)$ that extremise (121) subject to the boundary conditions (119). This is a variational problem of the kind you have encountered in Lagrangian mechanics, with (121) playing the role of the action. Identify the integrand in the action (121) as the Lagrangian
\[ L = \sqrt{ -g_{\mu\nu}\, \frac{dx^\mu}{d\sigma} \frac{dx^\nu}{d\sigma} }. \tag{122} \]
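A straight line connecting points A and B then satisfies the Euler-Lagrange equations
\[ \frac{d}{d\sigma} \frac{\partial L}{\partial \dot{x}^\rho} = \frac{\partial L}{\partial x^\rho}, \tag{123} \]
where $\dot{x}^\rho = dx^\rho / d\sigma$. After substituting (122) in, these equations become
\[ \frac{d}{d\sigma} \left[ \frac{1}{2L} \left( -2 g_{\rho\nu} \frac{dx^\nu}{d\sigma} \right) \right] = \frac{1}{2L} \left( -\frac{\partial g_{\alpha\beta}}{\partial x^\rho} \frac{dx^\alpha}{d\sigma} \frac{dx^\beta}{d\sigma} \right). \tag{124} \]
By (120), $d\tau = L\, d\sigma$, so $\frac{d}{d\sigma} = L \frac{d}{d\tau}$. Substituting this into (124) and dividing by $-L/2$, we obtain
\[ \frac{d}{d\tau} \left( 2 g_{\rho\nu} \frac{dx^\nu}{d\tau} \right) = \frac{\partial g_{\alpha\beta}}{\partial x^\rho} \frac{dx^\alpha}{d\tau} \frac{dx^\beta}{d\tau}, \tag{126} \]
or equivalently
\[ g_{\rho\nu} \frac{d^2 x^\nu}{d\tau^2} + \frac{\partial g_{\rho\nu}}{\partial x^\alpha} \frac{dx^\alpha}{d\tau} \frac{dx^\nu}{d\tau} - \frac{1}{2} \left( \frac{\partial g_{\alpha\beta}}{\partial x^\rho} \frac{dx^\alpha}{d\tau} \frac{dx^\beta}{d\tau} \right) = 0. \tag{127} \]
Contracting with the inverse metric $g^{\mu\rho}$ gives
\[ \frac{d^2 x^\mu}{d\tau^2} + g^{\mu\rho} \frac{\partial g_{\rho\beta}}{\partial x^\alpha} \frac{dx^\alpha}{d\tau} \frac{dx^\beta}{d\tau} - \frac{1}{2} g^{\mu\rho} \left( \frac{\partial g_{\alpha\beta}}{\partial x^\rho} \frac{dx^\alpha}{d\tau} \frac{dx^\beta}{d\tau} \right) = 0, \tag{128} \]
and, after symmetrising the second term in $\alpha$ and $\beta$,
\[ \frac{d^2 x^\mu}{d\tau^2} + \frac{1}{2} g^{\mu\rho} \left( \frac{\partial g_{\rho\beta}}{\partial x^\alpha} + \frac{\partial g_{\rho\alpha}}{\partial x^\beta} - \frac{\partial g_{\alpha\beta}}{\partial x^\rho} \right) \frac{dx^\alpha}{d\tau} \frac{dx^\beta}{d\tau} = 0. \tag{129} \]
This is the geodesic equation for time-like geodesics parametrised by proper time.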
The above method also applies to curves whose tangent vector is everywhere space-like. The length functional $l$ can be defined for space-like curves as
\[ l = \int_0^1 \sqrt{ g_{\mu\nu}\, \frac{dx^\mu}{d\sigma} \frac{dx^\nu}{d\sigma} }\; d\sigma. \tag{130} \]
Space-like geodesics satisfy (129) with $\tau$ replaced by $l$. Null geodesics also satisfy (129), with a parameter, say $\lambda$, that has no obvious intuitive meaning. The geodesic equation (129) can be written in the following form
\[ \frac{d^2 x^\mu}{d\lambda^2} + \Gamma^\mu{}_{\alpha\beta}\, \frac{dx^\alpha}{d\lambda} \frac{dx^\beta}{d\lambda} = 0, \tag{131} \]
with Γ μ αβ identified as the Christoffel symbols (109). The parameter λ is the proper time τ for time-like
geodesics and proper distance 𝑙 for space-like geodesics. More generally, we can use the so-called affine
parameter that is of the form λ = 𝑎τ + 𝑏 or λ = 𝑎𝑙 + 𝑏 to parametrise geodesics without changing the
geodesic equation.
Exercise 9. Show that by extremising the functional
\[ S = \int_0^1 g_{\mu\nu}\, \frac{dx^\mu}{d\sigma} \frac{dx^\nu}{d\sigma}\; d\sigma \tag{132} \]
along a curve between two points A with $\sigma = 0$ and B with $\sigma = 1$, the same geodesic equation (131) is obtained. The functional (132) has a couple of nice properties: (1) it is defined for any smooth path between the two points A and B, which may be time-like, space-like or light-like; (2) the geodesic equation it leads to is automatically affinely parametrised.
The above exercise shows that to find the geodesic equation in a spacetime with metric $g_{\mu\nu}$, we can alternatively identify the Lagrangian as
\[ L = g_{\mu\nu}\, \frac{dx^\mu}{d\sigma} \frac{dx^\nu}{d\sigma}, \tag{133} \]
and apply the Euler-Lagrange equations. From a practical point of view, this is a very effective method for finding the geodesic equation of a given spacetime.
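As an illustration of this method, here is a sketch for the unit two-sphere, assuming the metric $ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2$ (this example is not taken from the notes). Dots denote derivatives with respect to $\sigma$.

```latex
\begin{align*}
L &= \dot\theta^2 + \sin^2\theta\, \dot\phi^2, \\
% Euler-Lagrange equation for theta
\frac{d}{d\sigma}\left( 2\dot\theta \right) &= 2 \sin\theta \cos\theta\, \dot\phi^2
  &&\Rightarrow\quad \ddot\theta - \sin\theta\cos\theta\, \dot\phi^2 = 0, \\
% Euler-Lagrange equation for phi (phi is cyclic)
\frac{d}{d\sigma}\left( 2 \sin^2\theta\, \dot\phi \right) &= 0
  &&\Rightarrow\quad \ddot\phi + 2\cot\theta\, \dot\theta\, \dot\phi = 0.
\end{align*}
```

Comparing with the form (131), one can read off the non-vanishing Christoffel symbols $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi{}_{\theta\phi} = \Gamma^\phi{}_{\phi\theta} = \cot\theta$; the factor 2 in the second equation comes from the two equal symmetric terms.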
Free particles in special relativity feel no force, so Newton's first law says that they must travel along straight lines in Cartesian coordinates. In general relativity, gravity is no longer treated as a force; it is now a feature of spacetime itself and is represented by the metric. When no force other than gravity is present in general relativity, we would expect particles to still travel along straight lines, but now in a curved
spacetime. We thus get a basic assumption of general relativity: free particles follow geodesics. This
statement can be viewed as a generalised version of Newton's first law in general relativity.
Properties of geodesics
Parallel-transport offers a natural alternative way to define geodesics: a geodesic is a curve that parallel-transports its own tangent vector. Intuitively, if you follow a geodesic, at any point, you move in a direction
that is parallel to (or the parallel-transport of) the direction in which you have been moving just before
the point—you keep going in the direction you have been going in. In general relativity, freely falling
observers experience no forces and they follow geodesics—straight lines in curved spacetime. Their
velocity is always parallel-transported along the direction they are traveling.
Let's work out the equation a geodesic must satisfy using the above definition. Consider a parametrised curve $x^\mu(\lambda)$. The tangent vector of this curve is $dx^\mu/d\lambda$. The curve is a geodesic if the directional covariant derivative of its tangent vector is zero (the tangent vector is parallel-transported along the curve):
\[ \frac{D}{d\lambda} \frac{dx^\mu}{d\lambda} = 0. \tag{134} \]
Written out explicitly, the geodesic equation (131) is reproduced.
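The step from (134) to (131) can be written out in one line by applying the parallel-transport equation (117) to the tangent vector $V^\mu = dx^\mu/d\lambda$:

```latex
\begin{equation*}
\frac{D}{d\lambda}\frac{dx^\mu}{d\lambda}
  = \frac{dx^\rho}{d\lambda}\,\nabla_\rho\!\left(\frac{dx^\mu}{d\lambda}\right)
  = \frac{d^2 x^\mu}{d\lambda^2}
  + \Gamma^{\mu}{}_{\rho\sigma}\,\frac{dx^\rho}{d\lambda}\frac{dx^\sigma}{d\lambda} = 0 .
\end{equation*}
```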
If a geodesic is time-like (space-like or null) at a point, it will be time-like (space-like or null) at every point
along itself. This is because the tangent vector is parallel-transported along the geodesic and its length is
preserved. So, if the square of the length of the tangent vector is negative (positive or zero) at a point, it
will be negative (positive or zero) at every point. Physically this also makes sense: a freely falling observer
will follow a time-like geodesic, which remains time-like, since he/she can never exceed the speed of light.
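The constancy of the squared length of the tangent vector along a geodesic can be illustrated numerically. The sketch below assumes the unit two-sphere (a space with positive-definite metric, so every geodesic is space-like) and integrates the geodesic equation (131) with a fourth-order Runge-Kutta scheme; the function names and initial data are hypothetical.

```python
import math

def geodesic_rhs(state):
    """First-order form of Eq. (131) on the unit sphere; state = (theta, phi, dtheta, dphi)."""
    theta, phi, dtheta, dphi = state
    cot = math.cos(theta) / math.sin(theta)
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2  # -Gamma^theta_{phi phi} dphi^2
    ddphi = -2.0 * cot * dtheta * dphi                       # -2 Gamma^phi_{theta phi} dtheta dphi
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(k, factor):
        return tuple(s + factor * h * ki for s, ki in zip(state, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, 0.5))
    k3 = geodesic_rhs(shifted(k2, 0.5))
    k4 = geodesic_rhs(shifted(k3, 1.0))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def tangent_norm_squared(state):
    """g_{mu nu} (dx^mu/dlambda)(dx^nu/dlambda) on the unit sphere."""
    theta, _, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2

def integrate_geodesic(state, total=5.0, steps=5000):
    """Follow the geodesic for an affine-parameter interval `total`."""
    h = total / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

start = (math.pi / 2, 0.0, 0.3, 1.0)  # on the equator, tilted initial direction
end = integrate_geodesic(start)
print(tangent_norm_squared(start), tangent_norm_squared(end))
```

Both printed values should agree to within the integration error, reflecting the fact that the sign of the squared length, and hence the time-like, space-like or null character of a geodesic, cannot change along it.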
The geodesic equation (131) is a second order differential equation. Given a point and a vector at this
point in spacetime, there is a unique curve or solution to the geodesic equation that passes through the
point with the given vector as the tangent. This is consistent with our expectation that the motion of a freely falling object in a spacetime is determined unambiguously by its initial position and velocity. However, notice that there may exist more than one geodesic connecting two given points in spacetime. As an extreme example, there are infinitely many geodesics connecting the north and south poles of a round sphere.
4 CURVATURE
In the absence of gravity, spacetime is Minkowski spacetime and it is flat. In the presence of gravity or
more precisely tidal forces, spacetime is no longer flat but curved, and the notion of curvature is used to
measure how much the geometry of spacetime deviates from being flat. There are several different
notions to quantify the curvature of spacetime, the most important of which is the Riemann curvature
tensor.
4.1 RIEMANN CURVATURE TENSOR
Exercise 10. Show that if covariant derivatives of a spacetime ∇μ and ∇ν commute when acting on vectors,
i.e., [∇μ ∇ν − ∇ν ∇μ ]𝑉 ρ = 0 in one particular coordinate system, they commute in every coordinate system
when acting on vectors.
Covariant derivatives in Minkowski spacetime commute in Cartesian coordinates, so they commute in any coordinate system. This property defines, in a coordinate-independent way, the flatness of Minkowski
spacetime.
In a general spacetime, covariant derivatives do not commute with each other and the non-commutativity
can be used to describe the curvature of the spacetime.
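Exercise 11. Show that, however, when acting on scalars, covariant derivatives do commute in any spacetime: $\nabla_\mu \nabla_\nu \phi = \nabla_\nu \nabla_\mu \phi$.
We will shortly show that $[\nabla_\mu, \nabla_\nu]$ when acting on vectors is a linear machine. So $[\nabla_\mu, \nabla_\nu] V^\rho$ is a linear combination of the components $V^\sigma$ and can thus be written as
\[ [\nabla_\mu, \nabla_\nu] V^\rho = R^\rho{}_{\sigma\mu\nu} V^\sigma \tag{135} \]
at any point in spacetime. The quantity $R^\rho{}_{\sigma\mu\nu}$ carries four indices and is called the Riemann curvature tensor. Working out Eq. (135) explicitly, one finds that the Riemann curvature tensor can be expressed in terms of the Christoffel symbols as follows
\[ R^\rho{}_{\sigma\mu\nu} = \partial_\mu \Gamma^\rho{}_{\nu\sigma} - \partial_\nu \Gamma^\rho{}_{\mu\sigma} + \Gamma^\rho{}_{\mu\lambda} \Gamma^\lambda{}_{\nu\sigma} - \Gamma^\rho{}_{\nu\lambda} \Gamma^\lambda{}_{\mu\sigma}. \tag{136} \]
Proof: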
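\[
\begin{aligned}
\nabla_\mu V^\rho ={}& \partial_\mu V^\rho + \Gamma^\rho{}_{\mu\sigma} V^\sigma, \\
\nabla_\mu \nabla_\nu V^\rho ={}& \partial_\mu \nabla_\nu V^\rho + \Gamma^\rho{}_{\mu\sigma} \nabla_\nu V^\sigma - \Gamma^\sigma{}_{\mu\nu} \nabla_\sigma V^\rho \\
={}& \partial_\mu (\partial_\nu V^\rho + \Gamma^\rho{}_{\nu\sigma} V^\sigma) + \Gamma^\rho{}_{\mu\sigma} (\partial_\nu V^\sigma + \Gamma^\sigma{}_{\nu\lambda} V^\lambda) - \Gamma^\sigma{}_{\mu\nu} (\partial_\sigma V^\rho + \Gamma^\rho{}_{\sigma\lambda} V^\lambda) \\
={}& (\partial_\mu \partial_\nu V^\rho + \Gamma^\rho{}_{\nu\sigma,\mu} V^\sigma + \Gamma^\rho{}_{\nu\sigma} \partial_\mu V^\sigma) + (\Gamma^\rho{}_{\mu\sigma} \partial_\nu V^\sigma + \Gamma^\rho{}_{\mu\sigma} \Gamma^\sigma{}_{\nu\lambda} V^\lambda) \\
& - (\Gamma^\sigma{}_{\mu\nu} \partial_\sigma V^\rho + \Gamma^\sigma{}_{\mu\nu} \Gamma^\rho{}_{\sigma\lambda} V^\lambda) \\
={}& \partial_\mu \partial_\nu V^\rho + (\Gamma^\rho{}_{\nu\sigma} \partial_\mu V^\sigma + \Gamma^\rho{}_{\mu\sigma} \partial_\nu V^\sigma - \Gamma^\sigma{}_{\mu\nu} \partial_\sigma V^\rho) \\
& + \Gamma^\rho{}_{\nu\sigma,\mu} V^\sigma + \Gamma^\rho{}_{\mu\sigma} \Gamma^\sigma{}_{\nu\lambda} V^\lambda - \Gamma^\sigma{}_{\mu\nu} \Gamma^\rho{}_{\sigma\lambda} V^\lambda .
\end{aligned}
\tag{137}
\]
Exchanging $\mu$ and $\nu$ gives
\[
\begin{aligned}
\nabla_\nu \nabla_\mu V^\rho ={}& \partial_\nu \partial_\mu V^\rho + (\Gamma^\rho{}_{\mu\sigma} \partial_\nu V^\sigma + \Gamma^\rho{}_{\nu\sigma} \partial_\mu V^\sigma - \Gamma^\sigma{}_{\nu\mu} \partial_\sigma V^\rho) \\
& + \Gamma^\rho{}_{\mu\sigma,\nu} V^\sigma + \Gamma^\rho{}_{\nu\sigma} \Gamma^\sigma{}_{\mu\lambda} V^\lambda - \Gamma^\sigma{}_{\nu\mu} \Gamma^\rho{}_{\sigma\lambda} V^\lambda .
\end{aligned}
\tag{138}
\]
Subtracting (138) from (137) and using the symmetry $\Gamma^\sigma{}_{\mu\nu} = \Gamma^\sigma{}_{\nu\mu}$, we obtain
\[ [\nabla_\mu, \nabla_\nu] V^\rho = \left( \Gamma^\rho{}_{\nu\sigma,\mu} - \Gamma^\rho{}_{\mu\sigma,\nu} + \Gamma^\rho{}_{\mu\lambda} \Gamma^\lambda{}_{\nu\sigma} - \Gamma^\rho{}_{\nu\lambda} \Gamma^\lambda{}_{\mu\sigma} \right) V^\sigma. \tag{139} \]
Comparing this equation with Eq. (135), we can identify the Riemann curvature tensor as declared in Eq. (136).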
We see from the above derivation that terms involving partial derivatives of the vector field $V^\rho$ all drop out, leaving only terms that are linear in the components $V^\sigma$. So, at any given point, $[\nabla_\mu, \nabla_\nu] V^\rho$ depends only on the value of the vector field at this point and not on its values at neighbouring points. This is of course what we want: the expression $[\nabla_\mu, \nabla_\nu]$ is a measure of the intrinsic structure of spacetime rather than of the vector field it is acting on. In mathematical language, $[\nabla_\mu, \nabla_\nu]$ is a tensor.
The notion of curvature and Riemann curvature can be defined when a covariant derivative is defined in
spacetime. We see in Eq. (136) that components of the Riemann curvature tensor are functions of
Christoffel symbols and their first derivatives. In general relativity, the covariant derivative we use is the
unique metric compatible one derived from the metric, so the Γ's in Eq. (136) are the Christoffel symbols.
Components of the Riemann curvature tensor are thus (rather complicated) functions of the metric
together with its first and second derivatives.
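The expression of the Riemann tensor in terms of Christoffel symbols and their first derivatives can be evaluated numerically. The sketch below assumes the standard formula $R^\rho{}_{\sigma\mu\nu} = \partial_\mu \Gamma^\rho{}_{\nu\sigma} - \partial_\nu \Gamma^\rho{}_{\mu\sigma} + \Gamma^\rho{}_{\mu\lambda}\Gamma^\lambda{}_{\nu\sigma} - \Gamma^\rho{}_{\nu\lambda}\Gamma^\lambda{}_{\mu\sigma}$, which matches the index placement used in Eq. (159), and applies it to the unit two-sphere with coordinates $(\theta, \phi)$. Derivatives are approximated by central differences, and all function names are hypothetical.

```python
import math

def gamma(theta, phi):
    """Christoffel symbols G[rho][mu][nu] of the unit sphere; index 0 = theta, 1 = phi."""
    g = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    g[0][1][1] = -math.sin(theta) * math.cos(theta)            # Gamma^theta_{phi phi}
    g[1][0][1] = g[1][1][0] = math.cos(theta) / math.sin(theta)  # Gamma^phi_{theta phi}
    return g

def d_gamma(x, mu, h=1e-5):
    """Central-difference derivative of all Christoffel symbols along coordinate mu."""
    xp, xm = list(x), list(x)
    xp[mu] += h
    xm[mu] -= h
    gp, gm = gamma(*xp), gamma(*xm)
    return [[[(gp[r][a][b] - gm[r][a][b]) / (2.0 * h) for b in range(2)]
             for a in range(2)] for r in range(2)]

def riemann(x, rho, sigma, mu, nu):
    """Component R^rho_{sigma mu nu} at the point x = (theta, phi)."""
    G = gamma(*x)
    value = d_gamma(x, mu)[rho][nu][sigma] - d_gamma(x, nu)[rho][mu][sigma]
    for lam in range(2):
        value += G[rho][mu][lam] * G[lam][nu][sigma] - G[rho][nu][lam] * G[lam][mu][sigma]
    return value

point = (1.0, 0.3)
print(riemann(point, 0, 1, 0, 1), math.sin(point[0]) ** 2)
```

For the unit sphere the component $R^\theta{}_{\phi\theta\phi}$ should come out close to $\sin^2\theta$, so the two printed numbers should agree up to the finite-difference error.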
The Riemann curvature tensor has a few important symmetry properties that allow us to significantly
reduce its number of independent components. These symmetries can be studied most easily using the
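Riemann curvature tensor with four lower indices, obtained by lowering the first index with the metric:
\[ R_{\rho\sigma\mu\nu} = g_{\rho\lambda} R^\lambda{}_{\sigma\mu\nu}. \tag{140} \]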
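The components of the Riemann curvature tensor are simpler when expressed using locally inertial coordinates $x^{\hat\mu}$. Recall that, in locally inertial coordinates, we have $\partial_{\hat\rho}\, g_{\hat\mu\hat\nu} = 0$ and $\Gamma^{\hat\lambda}{}_{\hat\mu\hat\nu} = 0$ at a given point $p$. So, in these coordinates, first derivatives of the Christoffel symbols become
\[ \Gamma^{\hat\lambda}{}_{\hat\mu\hat\nu,\hat\sigma} = \frac{1}{2} \partial_{\hat\sigma} \left[ g^{\hat\lambda\hat\rho} \left( \partial_{\hat\mu} g_{\hat\nu\hat\rho} + \partial_{\hat\nu} g_{\hat\rho\hat\mu} - \partial_{\hat\rho} g_{\hat\mu\hat\nu} \right) \right] = \frac{1}{2} g^{\hat\lambda\hat\rho} \left( \partial_{\hat\sigma} \partial_{\hat\mu} g_{\hat\nu\hat\rho} + \partial_{\hat\sigma} \partial_{\hat\nu} g_{\hat\rho\hat\mu} - \partial_{\hat\sigma} \partial_{\hat\rho} g_{\hat\mu\hat\nu} \right), \tag{141} \]
and the Riemann curvature tensor (136) becomes
\[ R^{\hat\rho}{}_{\hat\sigma\hat\mu\hat\nu} = \partial_{\hat\mu} \Gamma^{\hat\rho}{}_{\hat\nu\hat\sigma} - \partial_{\hat\nu} \Gamma^{\hat\rho}{}_{\hat\mu\hat\sigma}, \tag{142} \]
or, with all indices lowered,
\[ R_{\hat\rho\hat\sigma\hat\mu\hat\nu} = \frac{1}{2} \left( \partial_{\hat\mu} \partial_{\hat\sigma} g_{\hat\rho\hat\nu} - \partial_{\hat\mu} \partial_{\hat\rho} g_{\hat\nu\hat\sigma} - \partial_{\hat\nu} \partial_{\hat\sigma} g_{\hat\rho\hat\mu} + \partial_{\hat\nu} \partial_{\hat\rho} g_{\hat\mu\hat\sigma} \right). \tag{143} \]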
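From the above expressions, it is not difficult to observe the following algebraic symmetries of the Riemann tensor:
1. Antisymmetry in the first two indices:
\[ R_{\rho\sigma\mu\nu} = -R_{\sigma\rho\mu\nu}. \tag{144} \]
2. Antisymmetry in the last two indices:
\[ R_{\rho\sigma\mu\nu} = -R_{\rho\sigma\nu\mu}. \tag{145} \]
3. Vanishing of completely antisymmetric part with respect to its last three indices:
\[ R_{\rho[\sigma\mu\nu]} = 0. \tag{146} \]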
These symmetries were derived from (143) in a particular coordinate system, but since they are tensor
equations, they hold in arbitrary coordinate systems. These three symmetries form a complete set of
algebraic symmetries of the Riemann tensor.
Let's work out how many independent components the Riemann curvature tensor has. The first two
symmetries imply that any component of the Riemann tensor can be written as
𝑅ρσμν = 𝑅[ρσ][μν] . (147)
Thus, independent components are specified by the pair $\{\rho\sigma\}$ with $\rho \neq \sigma$ and the pair $\{\mu\nu\}$ with $\mu \neq \nu$. This reduces the number of independent components of the Riemann tensor to $\binom{d}{2}^2$. Since Property 3 is completely antisymmetric with respect to $\sigma\mu\nu$, these indices must be distinct, and permuting them does not give new constraints. Using Property 2, Property 3 can also be written as
\[ R_{\rho\sigma\mu\nu} + R_{\rho\mu\nu\sigma} + R_{\rho\nu\sigma\mu} = 0. \tag{148} \]
Thus, independent equations from (146) can be labelled by the index $\rho$ and a combination of three distinct elements chosen from $d$ integers (1 to $d$ or 0 to $d - 1$). The total number of independent constraints resulting from Property 3 is $d\binom{d}{3}$. The Riemann curvature tensor then has a number of independent components given by
\[ \binom{d}{2}^2 - d \binom{d}{3} = \frac{d^2 (d^2 - 1)}{12}. \tag{149} \]
In four dimensions, this number becomes 20. In dimensions 𝑑 = 3,2,1, this number becomes 6, 1 and 0
respectively.
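The counting in (149) is easy to check for small dimensions. The following short sketch evaluates both sides of (149) with exact integer arithmetic; the function name is hypothetical.

```python
from math import comb

def riemann_independent_components(d):
    """Left-hand side of Eq. (149): antisymmetric pairs squared minus cyclic constraints."""
    return comb(d, 2) ** 2 - d * comb(d, 3)

for d in range(1, 5):
    closed_form = d ** 2 * (d ** 2 - 1) // 12  # right-hand side of Eq. (149)
    print(d, riemann_independent_components(d), closed_form)
```

Note that `comb(d, 3)` is zero for `d < 3`, which reflects the fact that Property 3 gives no constraints in fewer than three dimensions.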
A couple of important and frequently used symmetries that can be derived from the above symmetries
or directly from (143) are:
Exercise 12. Using the symmetry properties 1-3 of the Riemann curvature tensor, prove the symmetry
properties 4-5.
The Riemann curvature tensor has another symmetry called the Bianchi identity involving its first
covariant derivatives
∇[𝜆 𝑅ρσ] μν = 0 ⇔ ∇λ 𝑅ρσμν + ∇ρ 𝑅σλμν + ∇σ 𝑅λρμν = 0. (152)
This symmetry will be important when we construct the Einstein tensor later. The proof of this identity is
straightforward.
Proof:
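Write out all terms on the left-hand side of Eq. (152) explicitly in locally inertial coordinates $x^{\hat\mu}$ constructed at any given point $p$:
\[
\begin{aligned}
\nabla_{\hat\lambda} R_{\hat\rho\hat\sigma\hat\mu\hat\nu} + \nabla_{\hat\rho} R_{\hat\sigma\hat\lambda\hat\mu\hat\nu} + \nabla_{\hat\sigma} R_{\hat\lambda\hat\rho\hat\mu\hat\nu}
={}& \frac{1}{2} \left( \partial_{\hat\lambda} \partial_{\hat\mu} \partial_{\hat\sigma} g_{\hat\rho\hat\nu} - \partial_{\hat\lambda} \partial_{\hat\mu} \partial_{\hat\rho} g_{\hat\nu\hat\sigma} - \partial_{\hat\lambda} \partial_{\hat\nu} \partial_{\hat\sigma} g_{\hat\rho\hat\mu} + \partial_{\hat\lambda} \partial_{\hat\nu} \partial_{\hat\rho} g_{\hat\mu\hat\sigma} \right) \\
&+ \frac{1}{2} \left( \partial_{\hat\rho} \partial_{\hat\mu} \partial_{\hat\lambda} g_{\hat\sigma\hat\nu} - \partial_{\hat\rho} \partial_{\hat\mu} \partial_{\hat\sigma} g_{\hat\nu\hat\lambda} - \partial_{\hat\rho} \partial_{\hat\nu} \partial_{\hat\lambda} g_{\hat\sigma\hat\mu} + \partial_{\hat\rho} \partial_{\hat\nu} \partial_{\hat\sigma} g_{\hat\mu\hat\lambda} \right) \\
&+ \frac{1}{2} \left( \partial_{\hat\sigma} \partial_{\hat\mu} \partial_{\hat\rho} g_{\hat\lambda\hat\nu} - \partial_{\hat\sigma} \partial_{\hat\mu} \partial_{\hat\lambda} g_{\hat\nu\hat\rho} - \partial_{\hat\sigma} \partial_{\hat\nu} \partial_{\hat\rho} g_{\hat\lambda\hat\mu} + \partial_{\hat\sigma} \partial_{\hat\nu} \partial_{\hat\lambda} g_{\hat\mu\hat\rho} \right) \\
={}& 0.
\end{aligned}
\tag{153}
\]
The twelve terms cancel in pairs because partial derivatives commute and the metric is symmetric. Since (152) is a tensor equation, it then holds in any coordinate system.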
The Riemann curvature tensor packages all curvature information of a spacetime. A spacetime is flat if
and only if its Riemann tensor vanishes. By saying that spacetime is flat, we mean that, at any given point
in spacetime, we can find an open region containing this point and construct a coordinate system such
that the metric in this region is the same as the Minkowski metric in Cartesian coordinates. This, in
particular, means that in this entire open region (not just at the given point), the Christoffel symbols vanish.
Notice, however, this does not mean that spacetime has a trivial topology. For example, a cylindrical
surface is flat but is not equivalent to the two-dimensional Euclidean space.
4.2 GEOMETRICAL INTERPRETATION OF THE RIEMANN CURVATURE TENSOR
The non-commutativity of covariant derivatives, and thus the Riemann curvature tensor, has a more
intuitive geometrical interpretation in terms of parallel-transport of vectors in spacetime: A vector, when
parallel-transported along a closed loop to the original point, will not be equal to the original vector if
covariant derivatives do not commute (or equivalently spacetime is curved).
Figure 12: Parallel-transport of a vector $V_i^\mu$ along a closed loop PQRSP in spacetime formed by varying two coordinates $x^1$ and $x^2$ while keeping the remaining coordinates fixed.
We now define a closed loop in a region of spacetime with coordinates $x^\mu$, as shown in Fig. 12. Choose any two coordinates, say $x^1$ and $x^2$, among the $d$ coordinates and allow them to vary. The remaining coordinates are kept fixed. Let us pick a point P in spacetime, and assume its coordinates are $(x^1 = a, x^2 = b, \ldots)$. We fix $x^2$ and vary $x^1$ from $x^1 = a$ to $x^1 = a + \epsilon$ for a small quantity $\epsilon$, and a curve PQ will be traced out ending at the point Q with coordinates $(x^1 = a + \epsilon, x^2 = b, \ldots)$. Now starting from point Q, we fix $x^1$ and vary $x^2$ from $x^2 = b$ to $x^2 = b + \delta$ for a small quantity $\delta$ to trace out a curve QR ending at the point R with coordinates $(x^1 = a + \epsilon, x^2 = b + \delta, \ldots)$. We next go in the opposite directions. Starting from R, we fix $x^2$ and vary $x^1$ from $x^1 = a + \epsilon$ to $x^1 = a$ to trace out the curve RS ending at S with coordinates $(x^1 = a, x^2 = b + \delta, \ldots)$. To go back to the original point P, we then fix $x^1$ and vary $x^2$ from $x^2 = b + \delta$ to $x^2 = b$. A closed loop PQRSP is then formed.
Given a vector $V_i^\mu$ at point P, it can be parallel-transported along the closed loop PQRSP back to a vector $V_f^\mu$ at the same point P. We now show that the difference between the two vectors is proportional to the Riemann curvature tensor. Since the vector is parallel-transported, along PQ it satisfies the equation
\[ 0 = \frac{D}{dx^1} V^\mu = \frac{dx^\alpha}{dx^1} \nabla_\alpha V^\mu = \partial_1 V^\mu + \Gamma^\mu{}_{1\lambda} V^\lambda. \tag{154} \]
Along the curve PQ, only the coordinate $x^1$ changes, so the above equation is an ordinary differential equation:
\[ \frac{dV^\mu}{dx^1} = -\Gamma^\mu{}_{1\lambda} V^\lambda. \tag{155} \]
To leading order, this equation can be solved to give the parallel-transported vector at point Q:
\[ V^\mu(Q) \approx V_i^\mu - \int_a^{a+\epsilon} \left. (\Gamma^\mu{}_{1\lambda} V^\lambda) \right|_{x^2 = b} dx^1. \tag{156} \]
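Working along similar lines, one can find the parallel-transported vectors at points R, S and then P respectively as
\[
\begin{aligned}
V^\mu(R) &\approx V^\mu(Q) - \int_b^{b+\delta} \left. (\Gamma^\mu{}_{2\lambda} V^\lambda) \right|_{x^1 = a+\epsilon} dx^2, \\
V^\mu(S) &\approx V^\mu(R) - \int_{a+\epsilon}^{a} \left. (\Gamma^\mu{}_{1\lambda} V^\lambda) \right|_{x^2 = b+\delta} dx^1, \\
V^\mu(P) &\approx V^\mu(S) - \int_{b+\delta}^{b} \left. (\Gamma^\mu{}_{2\lambda} V^\lambda) \right|_{x^1 = a} dx^2.
\end{aligned}
\tag{157}
\]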
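Adding up the above four equations, we can find the parallel-transported vector back at P, denoted as $V_f^\mu$:
\[
\begin{aligned}
V_f^\mu \approx{}& V_i^\mu - \int_a^{a+\epsilon} \left. (\Gamma^\mu{}_{1\lambda} V^\lambda) \right|_{x^2 = b} dx^1 - \int_b^{b+\delta} \left. (\Gamma^\mu{}_{2\lambda} V^\lambda) \right|_{x^1 = a+\epsilon} dx^2 \\
& - \int_{a+\epsilon}^{a} \left. (\Gamma^\mu{}_{1\lambda} V^\lambda) \right|_{x^2 = b+\delta} dx^1 - \int_{b+\delta}^{b} \left. (\Gamma^\mu{}_{2\lambda} V^\lambda) \right|_{x^1 = a} dx^2 \\
={}& V_i^\mu + \int_a^{a+\epsilon} \left[ \left. (\Gamma^\mu{}_{1\lambda} V^\lambda) \right|_{x^2 = b+\delta} - \left. (\Gamma^\mu{}_{1\lambda} V^\lambda) \right|_{x^2 = b} \right] dx^1 \\
& - \int_b^{b+\delta} \left[ \left. (\Gamma^\mu{}_{2\lambda} V^\lambda) \right|_{x^1 = a+\epsilon} - \left. (\Gamma^\mu{}_{2\lambda} V^\lambda) \right|_{x^1 = a} \right] dx^2 \\
\approx{}& V_i^\mu + \epsilon\delta\, (\Gamma^\mu{}_{1\lambda} V^\lambda)_{,2} - \epsilon\delta\, (\Gamma^\mu{}_{2\lambda} V^\lambda)_{,1} \\
={}& V_i^\mu + \epsilon\delta \left( \Gamma^\mu{}_{1\lambda,2} V^\lambda + \Gamma^\mu{}_{1\lambda} V^\lambda{}_{,2} - \Gamma^\mu{}_{2\lambda,1} V^\lambda - \Gamma^\mu{}_{2\lambda} V^\lambda{}_{,1} \right) \\
={}& V_i^\mu + \epsilon\delta \left[ \Gamma^\mu{}_{1\lambda,2} V^\lambda + \Gamma^\mu{}_{1\lambda} (-\Gamma^\lambda{}_{2\sigma} V^\sigma) - \Gamma^\mu{}_{2\lambda,1} V^\lambda - \Gamma^\mu{}_{2\lambda} (-\Gamma^\lambda{}_{1\sigma} V^\sigma) \right] \\
={}& V_i^\mu + \epsilon\delta\, V^\lambda \left( \Gamma^\mu{}_{1\lambda,2} - \Gamma^\mu{}_{2\lambda,1} + \Gamma^\mu{}_{2\sigma} \Gamma^\sigma{}_{1\lambda} - \Gamma^\mu{}_{1\sigma} \Gamma^\sigma{}_{2\lambda} \right).
\end{aligned}
\tag{158}
\]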
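Here, for the second last equality, we have used Eq. (155) and its analogue along $x^2$. We thus reach the conclusion that
\[ V_f^\mu - V_i^\mu \approx \epsilon\delta\, V^\lambda \left( \Gamma^\mu{}_{1\lambda,2} - \Gamma^\mu{}_{2\lambda,1} + \Gamma^\mu{}_{2\sigma} \Gamma^\sigma{}_{1\lambda} - \Gamma^\mu{}_{1\sigma} \Gamma^\sigma{}_{2\lambda} \right) = \epsilon\delta\, V^\lambda R^\mu{}_{\lambda 21}. \tag{159} \]
If the two coordinates that we allowed to vary were chosen as $x^\alpha$ and $x^\beta$, the change $V_f^\mu - V_i^\mu$ along the corresponding closed loop would be
\[ V_f^\mu - V_i^\mu \approx \epsilon\delta\, V^\lambda R^\mu{}_{\lambda\beta\alpha}. \tag{160} \]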
The change is not zero and is proportional to the “area” of the loop ϵδ, the initial vector and the Riemann
curvature tensor.
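The loop result (159) can be tested numerically. The sketch below assumes the unit two-sphere with $x^1 = \theta$ and $x^2 = \phi$, for which the relevant components are $R^\theta{}_{\phi\phi\theta} = -\sin^2\theta$ and $R^\phi{}_{\theta\phi\theta} = 1$ (computed from the standard Christoffel formula, not quoted from the notes). It parallel-transports a vector around a small coordinate rectangle with a Runge-Kutta integrator and compares the change with the prediction of (159); all function names are hypothetical.

```python
import math

def rhs(direction, theta, v):
    """dV/dx from Eq. (155) and its x^2 analogue; direction 0 = theta, 1 = phi."""
    v_theta, v_phi = v
    cot = math.cos(theta) / math.sin(theta)
    if direction == 0:  # moving along theta: only Gamma^phi_{theta phi} contributes
        return (0.0, -cot * v_phi)
    # moving along phi: Gamma^theta_{phi phi} and Gamma^phi_{phi theta} contribute
    return (math.sin(theta) * math.cos(theta) * v_phi, -cot * v_theta)

def transport_leg(theta, v, direction, length, steps=200):
    """Parallel-transport v along one coordinate line of signed length `length` (RK4)."""
    h = length / steps
    for _ in range(steps):
        th_mid = theta + 0.5 * h if direction == 0 else theta
        th_end = theta + h if direction == 0 else theta
        k1 = rhs(direction, theta, v)
        k2 = rhs(direction, th_mid, (v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = rhs(direction, th_mid, (v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = rhs(direction, th_end, (v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
             v[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
        theta = th_end
    return theta, v

def loop_change(theta0, v0, eps, delta):
    """V_f - V_i after transport around the loop PQRSP starting at colatitude theta0."""
    theta, v = transport_leg(theta0, v0, 0, eps)    # P -> Q
    theta, v = transport_leg(theta, v, 1, delta)    # Q -> R
    theta, v = transport_leg(theta, v, 0, -eps)     # R -> S
    theta, v = transport_leg(theta, v, 1, -delta)   # S -> P
    return (v[0] - v0[0], v[1] - v0[1])

def predicted_change(theta0, v0, eps, delta):
    """Right-hand side of Eq. (159) for the unit sphere."""
    return (-eps * delta * math.sin(theta0) ** 2 * v0[1], eps * delta * v0[0])

print(loop_change(1.0, (0.7, 0.4), 1e-3, 1e-3))
print(predicted_change(1.0, (0.7, 0.4), 1e-3, 1e-3))
```

The two printed pairs should agree up to corrections of third order in the loop size, which is the accuracy of the leading-order argument above.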
In this section, we show yet another geometrical interpretation of the Riemann curvature tensor via the
geodesic deviation equation. We all know that two straight parallel lines will remain parallel forever in
Euclidean space, while two parallel lines on a two-dimensional sphere will inevitably cross each other. This
behaviour and its generalisation can be quantified using the Riemann curvature tensor as follows.
Suppose that we have a family of free particles labelled by a variable s moving in spacetime. The variable
𝑠 takes real values in a continuous manner. Each particle traces out a geodesic γ(𝑡) parametrised by an
affine parameter 𝑡. All the particles then trace out a family of geodesics γ(𝑠, 𝑡) = 𝑥 μ (𝑠, 𝑡) indexed by the
variable 𝑠, depicted in Fig. 13. At time 𝑡, the coordinates of the s-th particle are 𝑥 μ (𝑠, 𝑡). We can construct
two vector fields from the family of geodesics $\gamma(s, t)$ as follows
\[ T^\mu = \frac{\partial x^\mu(s, t)}{\partial t}, \qquad S^\mu = \frac{\partial x^\mu(s, t)}{\partial s}. \tag{161} \]
The vector $T^\mu$ can be understood as the velocity of the $s$-th particle, while the so-called separation vector $S^\mu$ represents the separation between the $s$-th particle and a neighbouring particle at fixed time $t$. Now we
show that these two vector fields, by construction, satisfy the following equation
𝑆 ρ ∇ρ 𝑇 μ = 𝑇 ρ ∇ρ 𝑆 μ . (162)
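Proof:
\[
\begin{aligned}
S^\rho \nabla_\rho T^\mu - T^\rho \nabla_\rho S^\mu &= S^\rho (\partial_\rho T^\mu + \Gamma^\mu{}_{\rho\lambda} T^\lambda) - T^\rho (\partial_\rho S^\mu + \Gamma^\mu{}_{\rho\lambda} S^\lambda) \\
&= S^\rho \partial_\rho T^\mu - T^\rho \partial_\rho S^\mu \\
&= \frac{\partial x^\rho}{\partial s} \partial_\rho T^\mu - \frac{\partial x^\rho}{\partial t} \partial_\rho S^\mu = \partial_s T^\mu - \partial_t S^\mu \\
&= \partial_s \partial_t x^\mu - \partial_t \partial_s x^\mu = 0.
\end{aligned}
\tag{163}
\]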
We have a separation vector $S^\mu$ defined at every point along the $s$-th particle's trajectory parametrised by the affine parameter $t$. We can then define the separation velocity vector $V^\mu$ as the directional covariant derivative of the separation vector along this trajectory
\[ V^\mu = \frac{D}{dt} S^\mu = T^\rho \nabla_\rho S^\mu, \tag{164} \]
and the separation acceleration vector as the second directional covariant derivative
\[ A^\mu = \frac{D}{dt} V^\mu = T^\rho \nabla_\rho (T^\lambda \nabla_\lambda S^\mu). \tag{165} \]
The separation acceleration vector can be simplified as
\[ A^\mu = R^\mu{}_{\nu\rho\sigma}\, T^\nu T^\rho S^\sigma. \]
This equation is known as the geodesic deviation equation, expressing the idea that the relative
acceleration of nearby geodesics is proportional to the Riemann curvature tensor. Tidal forces that cause
objects to deviate from each other in a nonuniform gravitational field manifest themselves as the
curvature of spacetime in general relativity.
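The simplification of the separation acceleration can be sketched as follows, using only (162), the definition of the Riemann tensor through the commutator, $[\nabla_\rho, \nabla_\sigma] T^\mu = R^\mu{}_{\nu\rho\sigma} T^\nu$, and the geodesic equation $T^\rho \nabla_\rho T^\mu = 0$. This is the standard textbook argument, written in the conventions of these notes.

```latex
\begin{align*}
A^\mu &= T^\rho \nabla_\rho \left( T^\sigma \nabla_\sigma S^\mu \right)
       = T^\rho \nabla_\rho \left( S^\sigma \nabla_\sigma T^\mu \right)
       && \text{by (162)} \\
      &= \left( T^\rho \nabla_\rho S^\sigma \right) \nabla_\sigma T^\mu
       + T^\rho S^\sigma \nabla_\rho \nabla_\sigma T^\mu \\
      &= \left( S^\rho \nabla_\rho T^\sigma \right) \nabla_\sigma T^\mu
       + T^\rho S^\sigma \left( \nabla_\sigma \nabla_\rho T^\mu
       + R^{\mu}{}_{\nu\rho\sigma} T^\nu \right)
       && \text{by (162) and the commutator} \\
      &= \left( S^\rho \nabla_\rho T^\sigma \right) \nabla_\sigma T^\mu
       + S^\sigma \nabla_\sigma \left( T^\rho \nabla_\rho T^\mu \right)
       - \left( S^\sigma \nabla_\sigma T^\rho \right) \nabla_\rho T^\mu
       + R^{\mu}{}_{\nu\rho\sigma} T^\nu T^\rho S^\sigma \\
      &= R^{\mu}{}_{\nu\rho\sigma}\, T^\nu T^\rho S^\sigma .
\end{align*}
```

In the fourth line the first and third terms cancel after relabelling dummy indices, and the second term vanishes because each curve in the family is a geodesic.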
4.3 RICCI AND EINSTEIN CURVATURE TENSOR
Recall the few ways that one can construct new tensors from known tensors. Given the Riemann curvature
tensor, we can take its contraction to extract part of its information (recall that taking contraction of a
tensor is like taking the trace of a matrix). The Riemann curvature tensor has four indices, so there are a
few ways to contract two indices. But it turns out that there is only one independent contraction; all other
contractions are either zero or related to this one.
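Exercise 13. Find the relation between the two contractions $R^\lambda{}_{\mu\lambda\nu}$ and $R_{\mu\lambda\nu}{}^{\lambda}$ of the Riemann curvature tensor.
The independent contraction defines the Ricci tensor, $R_{\mu\nu} = R^\lambda{}_{\mu\lambda\nu}$, and contracting once more with the metric gives the Ricci scalar
\[ R = g^{\mu\nu} R_{\mu\nu} = R^\mu{}_\mu. \tag{169} \]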
The Ricci tensor and Ricci scalar are sufficiently sophisticated machines constructed using the metric that
an intuitive interpretation, though possible with the help of defining more sophisticated machines, is not
straightforward.
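Before we can introduce the Einstein curvature tensor, let us recall the Bianchi identity (152) of the Riemann curvature tensor and take contractions twice:
\[
\begin{aligned}
0 &= g^{\lambda\mu} g^{\sigma\nu} \left( \nabla_\lambda R_{\rho\sigma\mu\nu} + \nabla_\rho R_{\sigma\lambda\mu\nu} + \nabla_\sigma R_{\lambda\rho\mu\nu} \right) && \text{(apply symmetry properties of Riemann)} \\
&= g^{\lambda\mu} g^{\sigma\nu} \left( \nabla_\lambda R_{\sigma\rho\nu\mu} + \nabla_\rho R_{\mu\nu\sigma\lambda} + \nabla_\sigma R_{\mu\nu\lambda\rho} \right) && \text{(move } g \text{ to the right of } \nabla\text{)} \\
&= g^{\lambda\mu} \nabla_\lambda R^\nu{}_{\rho\nu\mu} + g^{\sigma\nu} \nabla_\rho R^\lambda{}_{\nu\sigma\lambda} + g^{\sigma\nu} \nabla_\sigma R^\lambda{}_{\nu\lambda\rho} \\
&= g^{\lambda\mu} \nabla_\lambda R_{\rho\mu} - g^{\sigma\nu} \nabla_\rho R_{\nu\sigma} + g^{\sigma\nu} \nabla_\sigma R_{\nu\rho} \\
&= \nabla^\mu R_{\rho\mu} - \nabla_\rho R + \nabla^\nu R_{\nu\rho} \\
&= 2 \nabla^\mu R_{\rho\mu} - \nabla_\rho R.
\end{aligned}
\tag{170}
\]
We find the divergence of the Ricci tensor in terms of the gradient of the Ricci scalar
\[ \nabla^\mu R_{\rho\mu} = \frac{1}{2} \nabla_\rho R. \tag{171} \]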
For reasons that will be clear later in part III, we are motivated to find a symmetric tensor that is a combination of the Ricci tensor and Ricci scalar and that is divergence-free. Such a tensor is called the Einstein tensor, and after some trial can be found to be
\[ G_{\mu\nu} = R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu}. \tag{172} \]
As declared, using (171), we can show that the Einstein tensor is divergence-free
\[ \nabla^\mu G_{\rho\mu} = 0. \tag{173} \]
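The check that the Einstein tensor is divergence-free takes one line, using the contracted Bianchi identity (171) and metric compatibility ($\nabla_\mu g_{\rho\sigma} = 0$) to move the metric through the covariant derivative:

```latex
\begin{equation*}
\nabla^\mu G_{\rho\mu}
  = \nabla^\mu R_{\rho\mu} - \tfrac{1}{2}\, g_{\rho\mu} \nabla^\mu R
  = \tfrac{1}{2} \nabla_\rho R - \tfrac{1}{2} \nabla_\rho R
  = 0 .
\end{equation*}
```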
It is worth mentioning that the Einstein tensor is, up to an overall constant factor, the unique divergence-free tensor of type (0,2) that one can construct using the metric tensor along with its first and second derivatives (with the cosmological constant set to zero for the moment). As with the Ricci tensor, an intuitive interpretation of the Einstein tensor is not straightforward.
This completes our discussion on the geometry of spacetime. The geometrical notions developed here
form an essential part of this course and will be used to formulate the general theory of relativity in the
next part.
Summary: Starting from the equivalence principle, we were led to model spacetime as a pseudo-Riemannian manifold. Physical quantities in a spacetime manifold are described by tensors, and physical laws can be expressed as tensor equations. Gravity, via the equivalence principle, endows spacetime with a metric tensor. The metric tensor is the most fundamental object in Einstein's formulation of general relativity: all geometrical notions, such as covariant derivatives, parallel-transport of vectors and tensors, the geodesic equation, the geodesic deviation equation and the various curvature tensors, are derived from it. In particular, the Riemann curvature tensor measures how far a metric is from being flat. In the next part, we will see that the matter distribution determines how spacetime is curved through the averaged Riemann curvature called the Einstein tensor.