SIAM Review Book Review
The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey
Hinton for their work on artificial intelligence and machine learning. The award
has been somewhat controversial in the physics community and prompted some
heated debates, since the only apparent use of physics is the Boltzmann distri-
bution in the sampling function of the Boltzmann machine [1]. If we leave aside
this debate for the time being, it is undeniable that AI and machine learning
have had a transformative effect on various areas of science and technology. One
example is high energy physics, where data from particle colliders is often anal-
ysed using machine learning algorithms [2]. Besides this, AI was used to make
remarkable progress on the problem of protein folding in biochemistry, resulting
in half of the 2024 Nobel Prize in Chemistry being awarded to Hassabis and
Jumper at Google DeepMind [3]. One could also argue that there is precedent
for awarding the prize to important work which applies physics in a key way
without necessarily being classed as 'physics' (an example might be Jack Kilby,
who was awarded the prize in 2000 for the invention of the integrated circuit).
Given the current interest and excitement surrounding the topic, a book which
takes the reader through a rigorous mathematical approach to some well-known
machine learning algorithms whilst requiring only basic linear algebra is desirable,
and that is exactly what this book provides. It is especially timely, since
there is some confusion amongst general mathematical readers about the phrase
'machine learning', which is often used to describe a group of data analysis
techniques applied to complicated real-world data sets. Two typical
problems are reducing the dimensionality of a data set and making predictions
about unseen data (known as regression). The regression problem
is key in high energy physics because of the huge amounts of data involved.
These basic principles are discussed in Chapter 1 and terminology is clarified.
Machine learning is distinguished from artificial intelligence, which considers
all aspects of machine intelligence. Mathematical definitions are provided for
machine learning jargon and the distinction is made between supervised and un-
supervised learning. Roughly speaking, supervised learning algorithms predict
outputs for specific inputs, given both inputs and outputs as samples. Unsupervised
learning algorithms are only given input samples. ML algorithms are
essentially more flexible and powerful than conventional algorithms because,
rather than producing a fixed solution for a fixed setting, they infer the
solution from the data fed into them.
Chapter 2 introduces two of the simplest and most intuitive supervised learn-
ing algorithms: linear least squares and k-nearest neighbours [4]. The treat-
ment is streamlined by considering affine linear functions as the model class
over which the learning problem is solved. The chapter also introduces crite-
ria which are used to quantify the performance of regression and classification
algorithms. Some programming tasks help the reader to gain familiarity with
numpy, a Python library which is widely used in the sciences. Chapter 3 brings
in support vector machines as a more complicated ML method and provides a
derivation of the basic algorithm. The level of mathematical rigour provided is
similar to that which would typically be provided in a linear algebra textbook.
The optimisation problem of finding separating hyperplanes for classification is
discussed, and the Python library scikit-learn is also introduced.
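Chapter 2's linear least squares fit can be sketched in a few lines of numpy. The data below is a made-up toy example rather than one of the book's exercises; the affine model and the least-squares solve via np.linalg.lstsq are the standard approach.

```python
import numpy as np

# Toy supervised-learning data (hypothetical, not from the book):
# noisy samples of y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Affine model y ~ a*x + b: augment the inputs with a column of ones
# and solve the least-squares problem min_w ||A w - y||^2.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"slope={a:.2f}, intercept={b:.2f}")  # close to slope 2, intercept 1
```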
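The separating-hyperplane discussion of Chapter 3 maps directly onto scikit-learn's SVC class with a linear kernel. The two point clouds below are invented for illustration; the library call itself is standard.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes in the plane (hypothetical data).
X = np.array([[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear-kernel support vector machine finds the maximum-margin
# hyperplane w.x + b = 0 separating the two classes.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.coef_, clf.intercept_)           # hyperplane parameters w, b
print(clf.predict([[0.5, 0.5], [5, 5]]))   # classifies new points
```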
Chapter 4 considers dimensionality reduction, the process of determining a
low-dimensional representation of a large high-dimensional data set which pre-
serves relevant information. It also provides the first example of an unsupervised
learning algorithm for dimensionality reduction, known as principal component
analysis. The algorithm is used in a case study of a data set with images of
pedestrians. Chapter 5 brings in nonlinear dimensionality reduction methods
which do not assume that the data sits on an affine linear subspace and which
generalise to curved manifolds [5]. A case study from biology (single-cell
analysis) is provided.
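Principal component analysis as described in Chapter 4 amounts to centring the data and taking the leading singular vectors of the centred matrix. A minimal numpy sketch on synthetic data (invented here; the book's case study uses pedestrian images instead) is:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data concentrated near a one-dimensional subspace of R^3
# (hypothetical example).
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.01 * rng.normal(size=(200, 3))

# PCA: centre the data; the right singular vectors of the centred
# matrix give the principal directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)   # fraction of variance per component
Z = Xc @ Vt[:1].T                 # one-dimensional representation
print(explained)                  # first component dominates
```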
Chapter 6 studies the model class of artificial neural networks and deep neu-
ral networks. These have been very successful in analysing large data sets, espe-
cially in signal processing and image recognition [6]. The Python libraries keras
and tensorflow are introduced and used to construct neural networks and analyse
some example data sets. Chapter 7 focuses on neural networks which are used
specifically for dimensionality reduction (known as variational autoencoders).
Concepts from statistics are introduced as needed. Chapter 8 provides a brief
review of the connection between neural networks and differential equations,
both ODEs and PDEs. Neural tangent kernels, neural ODEs, and generative
diffusion models are discussed [7]. As the name suggests, diffusion probabilistic
models are broadly inspired by reasoning from non-equilibrium thermodynamics
[8].
Chapter 9 introduces reinforcement learning algorithms. These are used in
a wide range of applications, including quantum field theory [9]. Along with
supervised and unsupervised learning, reinforcement learning is the third major
type of learning in artificial intelligence and has influences from cognitive psy-
chology. Basic optimal control theory is discussed. Chapter 10 concludes with
a set of short introductions to machine learning concepts and algorithms that
are not covered in the book, including recurrent neural networks, transformer
networks, and ensemble methods.
Most of the chapters are accompanied by programming exercises which illustrate
the main points discussed. One drawback of the book is that it assumes
basic familiarity with Python, which is likely not universal amongst all mathe-
maticians and scientists (if you do not know what numpy is, you may not have
the prerequisite knowledge). However, a reader with mathematical training who
is willing to put in some time scrolling through GitHub repositories could get
up to speed quite quickly, and step-by-step tutorials are provided for the
important concepts used in numpy. The book is recommended for mathematically
minded readers who wish to see the methods behind the recent AI/machine learning
hype without compromising on rigour.
[1] D.H. Ackley, G.E. Hinton and T.J. Sejnowski, A Learning Algorithm for
Boltzmann Machines, Cog. Sci., 9 (1985), pp. 147-169.
[2] A.S. Cornell et al., Boosted decision trees in the era of new physics: a smuon
analysis case study, J. High Energ. Phys., 15 (2022).
[3] J. Jumper et al., Highly accurate protein structure prediction with AlphaFold,
Nature, 596 (2021), pp. 583-589.
[4] K. Murphy, Probabilistic Machine Learning: An Introduction, MIT Press,
Cambridge, MA, 2022.
[5] J. Lee and M. Verleysen, Nonlinear Dimensionality Reduction, Springer Science
& Business Media, New York, NY, 2007.
[6] C. Aggarwal, Neural Networks and Deep Learning, Springer, Switzerland,
2018.
[7] A. Jacot et al., Neural tangent kernel: Convergence and generalization in
neural networks. In S. Bengio et al. (eds.), Advances in Neural Information
Processing Systems, vol. 31, pp. 8580-8589, Curran Associates Inc., Red Hook,
NY, 2018.
[8] J. Ho et al., Denoising diffusion probabilistic models. In H. Larochelle et
al. (eds.), Advances in Neural Information Processing Systems, vol. 33, pp.
6840-6851, Curran Associates Inc., Red Hook, NY, 2020.
[9] G. Kántor, C. Papageorgakis and V. Niarchos, Solving Conformal Field The-
ories with Artificial Intelligence, Phys. Rev. Lett. 128 (2022), 041601.
Hollis Williams
Azza M. Algatheem - Department of Mathematics, Faculty of Science, University
of Bisha, Bisha 61922, Saudi Arabia