Sms Essay 2
Artificial intelligence
Chong How, NUS High School
July 2020
1 Abstract
Mathematics plays an important role in modern science and technology. From
reaction rates and thermodynamics to electric circuits and computer networks,
mathematics is used almost everywhere. In particular, mathematics drives the
progress of science and technology and makes advancements possible, while the
new problems that arise as new scientific domains open up often spur
the development of new mathematical tools to analyse them. We will look into
how Multivariable Calculus is related to Artificial Intelligence, in particular
to Neural Networks and Regression.
2 Introduction
“Mathematics is the queen of the sciences” - Carl Friedrich Gauss
3 Multivariable Calculus
3.1 Coordinates and Functions
In secondary school, the 2-D Cartesian coordinate system (or xy coordinate
system) is introduced. An equation $y = f(x)$ expresses $y$ as a function of $x$.
For example, $y = x^2 - 1$ is a quadratic function whose graph cuts the x-axis
at $x = 1$ and $x = -1$, and the y-axis at $y = -1$. In Euclidean 3-D space,
there is a third axis, the z-axis, which is orthogonal to the xy-plane and whose
positive direction is determined by the right-hand rule.
An equation $z = f(x, y)$ expresses $z$ as a function of $x$ and $y$: the value
of $z$ depends on both. For example, $z = f(x, y) = x^2 + y^2$ is an elliptic
paraboloid passing through $(0, 0, 0)$.
3.2 Differentiation
Suppose $y = f(x)$. The derivative $\frac{dy}{dx}$ is defined to be

$$\frac{dy}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$

For example, $\frac{d}{dx}(2x^3) = 6x^2$.
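The limit definition can be checked numerically. The following is a minimal sketch, not part of the original essay: the helper name `difference_quotient` and the sample point `x = 1.5` are illustrative assumptions. It evaluates the quotient for shrinking values of h and compares it with the value 6x^2 given by the example.

```python
def difference_quotient(f, x, h):
    """Return (f(x + h) - f(x)) / h, the quotient whose limit defines the derivative."""
    return (f(x + h) - f(x)) / h


def f(x):
    """The example function 2x^3 from the text."""
    return 2 * x**3


x = 1.5                # arbitrary sample point (assumption)
exact = 6 * x**2       # derivative 6x^2 evaluated at x

for h in (1e-1, 1e-3, 1e-5):
    approx = difference_quotient(f, x, h)
    print(f"h = {h:g}: quotient = {approx:.6f}, 6x^2 = {exact:.6f}")
```

As h shrinks, the printed quotient should move closer to 6x^2, which is what the limit expresses.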
In 3-D coordinates, suppose $z = f(x, y)$. The partial derivative $f_x(x, y)$ is
defined to be

$$f_x(x, y) = \frac{\partial z}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h},$$

and the partial derivative $f_y(x, y)$ is defined to be

$$f_y(x, y) = \frac{\partial z}{\partial y} = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}.$$
What does this mean? Construct the plane $y = a$, which cuts through the surface
$z = f(x, y)$. On this plane, $f_x(x, y)$ measures the rate of change of $f$ in
the direction of the positive x-axis. $f_y(x, y)$ is interpreted in the same way,
using a plane $x = b$.
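As an illustration, the two partial derivatives of the paraboloid z = x^2 + y^2 from Section 3.1 can be approximated by holding one variable fixed and forming the difference quotient in the other. This is a sketch under assumptions: the helper names `partial_x` and `partial_y`, the step `h`, and the sample point (1, 2) are not from the essay.

```python
def partial_x(f, x, y, h=1e-6):
    """Approximate f_x: vary x only, keeping y fixed (we stay on the plane y = const)."""
    return (f(x + h, y) - f(x, y)) / h


def partial_y(f, x, y, h=1e-6):
    """Approximate f_y: vary y only, keeping x fixed."""
    return (f(x, y + h) - f(x, y)) / h


def paraboloid(x, y):
    """The surface z = x^2 + y^2 from Section 3.1."""
    return x**2 + y**2


# Differentiating by hand gives f_x = 2x and f_y = 2y.
fx = partial_x(paraboloid, 1.0, 2.0)   # should be close to 2 * 1 = 2
fy = partial_y(paraboloid, 1.0, 2.0)   # should be close to 2 * 2 = 4
print(fx, fy)
```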
4 Artificial Intelligence
Multivariable calculus has applications in AI development. These applications
include:
Gradient Descent
Artificial Neural Networks
Maximisation in the expectation-maximization algorithm
Optimization problems
Finding the maximal margin in support vector machines, etc.
Example: Suppose the values $(\bar{y}_1, \bar{y}_2, \bar{y}_3) = (4.4, 4.8, 4.9)$ are predicted from a
set of $X$ values $(x_1, x_2, x_3)$ through the relationship $Y = mX + c$, while the
actual $Y$ values are $(y_1, y_2, y_3) = (5.4, 5.8, 5.9)$. Then the mean squared error is

$$\frac{1}{3} \sum_{i=1}^{3} (y_i - \bar{y}_i)^2 = 1.$$
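The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch; the function name `mean_squared_error` is an illustrative choice and not part of the essay.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((y - y_bar) ** 2 for y, y_bar in zip(actual, predicted))
    return total / len(actual)


actual = [5.4, 5.8, 5.9]      # y_1, y_2, y_3 from the example
predicted = [4.4, 4.8, 4.9]   # predicted values from the example

mse = mean_squared_error(actual, predicted)
print(mse)  # equal to 1 up to floating-point rounding
```

Each prediction is off by exactly 1, so every squared difference is 1 and so is their average.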
Now suppose we have $n$ data points. Define a cost function

$$E = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y}_i)^2 \tag{1}$$
We wish to minimise $E$, i.e. we need to find values of $m$ and $c$ for which
$E$ is as small as possible.
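To do so, we use $\bar{y}_i = m x_i + c$ and evaluate

$$\frac{\partial E}{\partial m} = \frac{-2}{n} \sum_{i=1}^{n} x_i (y_i - \bar{y}_i) \tag{3}$$

and

$$\frac{\partial E}{\partial c} = \frac{-2}{n} \sum_{i=1}^{n} (y_i - \bar{y}_i) \tag{4}$$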
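Equations (3) and (4) translate directly into code. In the sketch below, the x values are hypothetical (the essay does not give them), the y values are the actual values from the earlier example, and the function names are illustrative. As a sanity check, the formulas are compared against central differences of E.

```python
def cost(xs, ys, m, c):
    """Mean squared error E for the line y = m*x + c."""
    n = len(xs)
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / n


def gradient(xs, ys, m, c):
    """Return (dE/dm, dE/dc) using equations (3) and (4)."""
    n = len(xs)
    residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
    dE_dm = -2.0 / n * sum(x * r for x, r in zip(xs, residuals))
    dE_dc = -2.0 / n * sum(residuals)
    return dE_dm, dE_dc


xs = [1.0, 2.0, 3.0]   # hypothetical x values (assumption)
ys = [5.4, 5.8, 5.9]   # actual y values from the example

dE_dm, dE_dc = gradient(xs, ys, m=0.0, c=0.0)

# Central differences of E, as an independent check on the formulas.
h = 1e-5
numeric_dm = (cost(xs, ys, h, 0.0) - cost(xs, ys, -h, 0.0)) / (2 * h)
numeric_dc = (cost(xs, ys, 0.0, h) - cost(xs, ys, 0.0, -h)) / (2 * h)

print(dE_dm, numeric_dm)
print(dE_dc, numeric_dc)
```

The two numbers on each printed line should agree closely, which supports the signs and factors in (3) and (4).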
Can't we just solve $\frac{\partial E}{\partial m} = 0$ and $\frac{\partial E}{\partial c} = 0$? This method does work if the
function is not complicated. It is not feasible for complicated functions, and
unfortunately most error functions are complicated. Typically we also have
a large data set, which makes solving $\frac{\partial E}{\partial m} = 0$ and $\frac{\partial E}{\partial c} = 0$ extremely difficult.
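For the straight-line model itself, the function is simple enough that the direct approach works: setting (3) and (4) to zero gives two linear equations in m and c. The sketch below solves them in closed form. The function name and the data (points lying exactly on y = 2x + 1) are assumptions chosen for illustration.

```python
def solve_directly(xs, ys):
    """Solve dE/dm = 0 and dE/dc = 0 for the line y = m*x + c."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denominator = n * sum_xx - sum_x * sum_x
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is not determined")

    m = (n * sum_xy - sum_x * sum_y) / denominator
    c = (sum_y - m * sum_x) / n
    return m, c


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, c = solve_directly(xs, ys)
print(m, c)
```

For models whose cost is not a simple quadratic in its parameters, no such formula is generally available, which is the situation the paragraph above describes.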
The learning rate, $L$, of a model controls how much we modify the model at every step.
If the learning rate is too large, the updates may overshoot the minimum and fail
to converge, while if the learning rate is too small, training may take a long time.
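The effect of the step size can be seen on a toy problem. The sketch below is an assumption-laden illustration, not the essay's model: it minimises the one-variable cost g(w) = w^2, whose derivative is 2w, starting from w = 1, with three hypothetical learning rates.

```python
def descend(learning_rate, steps=50, start=1.0):
    """Run gradient descent on g(w) = w**2 and return the final w."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * 2 * w   # derivative of w**2 is 2*w
    return w


for rate in (0.0001, 0.1, 1.1):
    print(f"L = {rate:g}: w after 50 steps = {descend(rate):.6g}")
```

With the smallest rate, w has barely moved from its starting value after 50 steps. With the middle rate it is close to the minimum at 0. With the largest rate each step overshoots by more than it corrects, so the magnitude of w grows instead of shrinking.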
To find a local minimum, we start by choosing a learning rate, say $L = 0.0001$, which
controls how much $m$ and $c$ change with each iteration. Let $m_1 = c_1 = 0$.
We plug the values of our data points and the current values of $m$ and $c$ into
$\frac{\partial E}{\partial m}$ and $\frac{\partial E}{\partial c}$.
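Then we update our values of $m$ and $c$ using the recurrence relations

$$m_{n+1} = m_n - L \times \frac{\partial E}{\partial m}$$

$$c_{n+1} = c_n - L \times \frac{\partial E}{\partial c}$$

where the subscript $n$ here counts iterations and the partial derivatives are
evaluated at the current values $(m_n, c_n)$.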
As more iterations are run, we obtain $m = \lim_{n \to \infty} m_n$ and $c = \lim_{n \to \infty} c_n$.
Hopefully, we have then reached the desired linear relation $Y = mX + c$ that best
fits the predicted values to the actual values. Why only hopefully? With this
method, we may have reached either a global minimum or merely a local minimum of
the cost function, and we would not know which.
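Putting the pieces together, the whole procedure can be sketched as a short loop. The data below are hypothetical points lying exactly on y = 2x + 1, and the learning rate 0.05 is an assumption chosen so that this small demo converges within a few thousand iterations. With L = 0.0001 the same loop would need far more iterations.

```python
def gradient_descent(xs, ys, learning_rate, iterations):
    """Fit y = m*x + c by repeating the update rule, starting from m = c = 0."""
    n = len(xs)
    m, c = 0.0, 0.0
    for _ in range(iterations):
        residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
        dE_dm = -2.0 / n * sum(x * r for x, r in zip(xs, residuals))
        dE_dc = -2.0 / n * sum(residuals)
        # Update both parameters from the same gradient evaluation.
        m, c = m - learning_rate * dE_dm, c - learning_rate * dE_dc
    return m, c


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, c = gradient_descent(xs, ys, learning_rate=0.05, iterations=5000)
print(f"m = {m:.4f}, c = {c:.4f}")
```

On this data the iterates should approach m = 2 and c = 1, the line the points were generated from.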
Often, the dependent variable that we wish to predict depends on
more than one variable. The idea is the same as in Simple Linear Regression. For
Gradient Descent with multiple variables, we treat each variable separately:
we hold all of the other variables constant and find the partial
derivative of the cost function with respect to that variable.
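The same loop extends to several input variables by keeping one weight per variable and computing one partial derivative per weight. The sketch below assumes a model of the form y = w_1 x_1 + ... + w_k x_k + c. The function name, the data (generated from y = 3 x_1 - 2 x_2 + 1), and the learning rate are all illustrative assumptions.

```python
def fit_multiple(rows, ys, learning_rate, iterations):
    """Gradient descent for y = w[0]*x[0] + ... + w[k-1]*x[k-1] + c."""
    n = len(rows)
    k = len(rows[0])
    weights = [0.0] * k
    c = 0.0
    for _ in range(iterations):
        residuals = [
            y - (sum(w * x for w, x in zip(weights, row)) + c)
            for row, y in zip(rows, ys)
        ]
        # One partial derivative per weight: only column j of the data enters dE/dw_j.
        grad_w = [
            -2.0 / n * sum(row[j] * r for row, r in zip(rows, residuals))
            for j in range(k)
        ]
        grad_c = -2.0 / n * sum(residuals)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        c = c - learning_rate * grad_c
    return weights, c


# Hypothetical data generated from y = 3*x1 - 2*x2 + 1.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ys = [1.0, 4.0, -1.0, 2.0, 5.0]

weights, c = fit_multiple(rows, ys, learning_rate=0.1, iterations=5000)
print(weights, c)
```

On this data the weights should approach 3 and -2 and the constant should approach 1.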
Imagine we have many neurons (or nodes). These neurons are organised in
layers. Each neuron holds a number and a bias value. The neurons in one layer are
connected with neurons in other layers, forming a neural network.
A neural network accepts one or more inputs, processes them and gives one
or more outputs.
A neuron in one layer interacts with a neuron in the next layer through a weighted
connection (the weight is a real number). A neuron in the next
layer receives the values of the previous-layer neurons, each multiplied by its
connection weight. The sum of all those products, plus that neuron's bias value,
is then passed into an activation function, which returns the value assigned
to that neuron. In this way, information is passed through the entire network. The key is
to decide on appropriate weights and bias values, which can be determined via
machine learning. How?
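Before turning to how the weights are chosen, the computation performed by a single layer, as described above, can be sketched in a few lines. The essay does not name an activation function, so the sigmoid used here is an assumption. The weights, biases, and inputs are likewise made-up values for illustration.

```python
import math


def sigmoid(z):
    """A common activation function (an assumption; the essay does not specify one)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the previous layer's values, plus bias, passed through the activation."""
    z = sum(w * a for w, a in zip(weights, inputs)) + bias
    return sigmoid(z)


def layer_output(inputs, weight_rows, biases):
    """Values of every neuron in the next layer; one row of weights per neuron."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical layer: two inputs feeding two neurons.
inputs = [1.0, 0.0]
weight_rows = [[0.5, -0.5], [0.0, 0.0]]
biases = [-0.5, 2.0]

outputs = layer_output(inputs, weight_rows, biases)
print(outputs)
```

The first neuron's weighted sum plus bias is 0, so its value is sigmoid(0) = 0.5. The second neuron ignores its inputs (both weights are 0) and outputs sigmoid(2). Feeding `outputs` into another call to `layer_output` would model the next layer.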
$$C = C(W)$$
The method that we have just discussed is an early form of modern Machine
Learning. It lays the foundation for more advanced concepts in Machine Learning,
which we will not discuss here.
5 Concluding Remarks
It is truly remarkable how far technology has progressed over the last four
decades. Artificial Intelligence in particular holds great potential in
areas such as research and new development. According to a new report from the
World Economic Forum (WEF), the growth of artificial intelligence could create
58 million net new jobs by the year 2022. Artificial Intelligence (AI) will define
the next phase of the world's landscape, transforming our economy and society.
The prevalence of AI is likely to rise over the next few decades, and it will play
a dominant role in our lives, bringing immense benefits as well as uncertainties.