
Multivariable Calculus and its Applications in
Artificial Intelligence

Chong How, NUS High School
July 2020

1 Abstract
Mathematics plays an important role in modern science and technology. From reaction kinetics and thermodynamics to electric circuits and computer networks, mathematics is found almost everywhere. In particular, mathematics drives the progress of science and technology, making advancements possible; in turn, the new problems that arise as new scientific domains open up often spur the development of new mathematical tools to analyse them. We will look at how Multivariable Calculus relates to Artificial Intelligence, in particular to Neural Networks and Regression.

2 Introduction
“Mathematics is the queen of the sciences” - Carl Friedrich Gauss

Artificial Intelligence (AI) is closely linked to Mathematics. Why? Some keywords associated with AI are “machine learning”, “algorithms”, “neural networks” and “graph theory”. These domains are heavy in mathematics. Some of the related mathematical tools are Linear Algebra, Multivariable Calculus and Probability. We shall discuss Multivariable Calculus.

3 Multivariable Calculus
3.1 Coordinates and Functions
In secondary school, the 2-D Cartesian coordinate system (or xy coordinate system) is introduced. A function y = f(x) is a function in x. For example, the function y = x² − 1 is a quadratic function that cuts the x-axis at x = 1 and x = −1, and the y-axis at y = −1. In Euclidean 3-D space, there is another axis, the z-axis, which is orthogonal to the x- and y-axes, with its positive direction determined by the right-hand rule.

A function z = f(x, y) is a function in x and y: the value of z depends on both x and y. For example, the function z = f(x, y) = x² + y² is an elliptic paraboloid passing through (0, 0, 0).

3.2 Differentiation
Suppose y = f(x). The derivative dy/dx is defined to be

\[ \frac{dy}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}. \]

For example, d/dx (2x³) = 6x².

In 3-D coordinates, suppose z = f(x, y). Then the partial derivative fx(x, y) is defined to be

\[ f_x(x, y) = \lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h} = \frac{\partial z}{\partial x}, \]

and the partial derivative fy(x, y) is defined to be

\[ f_y(x, y) = \lim_{h \to 0} \frac{f(x, y+h) - f(x, y)}{h} = \frac{\partial z}{\partial y}. \]

What does this mean? Construct a plane y = a that cuts through the surface z = f(x, y). Then fx measures the rate of change of f on that plane in the direction of the positive x-axis. fy is interpreted the same way, with a plane x = a.

Example 1: Let f(x, y) = x² + y². Then fx(x, y) = 2x and fy(x, y) = 2y.

Example 2: Let f(x, y) = x²y². Then fx(x, y) = 2xy² and fy(x, y) = 2x²y.
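These examples can be sanity-checked numerically: a central finite difference approximates each partial derivative by nudging one variable while holding the other fixed. A short Python sketch (the helper names are ours, not from the text):

```python
def partial_x(f, x, y, h=1e-6):
    """Central-difference estimate of f_x(x, y): vary x, hold y fixed."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    """Central-difference estimate of f_y(x, y): vary y, hold x fixed."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

f = lambda x, y: x ** 2 * y ** 2   # Example 2

# Exact values: f_x(3, 2) = 2 * 3 * 2**2 = 24 and f_y(3, 2) = 2 * 2 * 3**2 = 36
print(partial_x(f, 3, 2))  # ≈ 24
print(partial_y(f, 3, 2))  # ≈ 36
```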

3.3 Gradient Vector and Directional Derivatives
Let f be a function of x and y. Then the gradient of f is the vector function

\[ \nabla f(x, y) = \langle f_x(x, y), f_y(x, y) \rangle, \]

and the directional derivative of f at (x₀, y₀) in the direction of a unit vector u = ⟨a, b⟩ is

\[ D_u f(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + ha, y_0 + hb) - f(x_0, y_0)}{h}, \]

if this limit exists.

Example: Let f(x, y) = x + y. Then ∇f(x, y) = ⟨1, 1⟩.

A useful property of ∇f(x₀, y₀) is that it points in the direction of the greatest rate of change of f at the point (x₀, y₀), and that greatest rate of change is |∇f(x₀, y₀)|. This leads us to ask: What if Du f(x₀, y₀) = 0 at some point (x₀, y₀)? What happens if we travel in the direction of ∇f(x₀, y₀)?
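The steepest-ascent property can be illustrated numerically: estimate Du f at a point for many unit directions u and confirm that none beats the gradient direction. A Python sketch (the function names and the sample point are our illustrative choices):

```python
import math

def f(x, y):
    return x ** 2 + y ** 2

def directional_derivative(x0, y0, a, b, h=1e-6):
    """Forward-difference estimate of D_u f at (x0, y0) for unit vector u = (a, b)."""
    return (f(x0 + h * a, y0 + h * b) - f(x0, y0)) / h

# At (1, 2) the gradient is (2x, 2y) = (2, 4); normalise it to a unit vector.
gx, gy = 2.0, 4.0
norm = math.hypot(gx, gy)
best = directional_derivative(1, 2, gx / norm, gy / norm)
print(best)  # ≈ |∇f(1, 2)| = sqrt(20) ≈ 4.472

# No other unit direction gives a larger rate of change:
for deg in range(0, 360, 15):
    a, b = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    assert directional_derivative(1, 2, a, b) <= best + 1e-6
```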

4 Artificial Intelligence
Multivariable calculus has applications in AI development. These applications include:

Gradient Descent
Artificial Neural Networks
The expectation-maximization algorithm
Optimization problems
Finding the maximal margin in support vector machines, etc.

We will discuss the first two.

4.1 Linear Regression and Multiple Linear Regression
Regression is a method in Statistics to model the relationship between a dependent variable and one or more independent variables. In Machine Learning, a regression algorithm is a method to train a model. Simple Linear Regression predicts the dependent variable based on one independent variable. The aim of applying Linear Regression is to find a function Y = mX + c that fits the predicted values to the actual values as closely as possible. The difference between predicted and actual values is quantified using a cost (sometimes called loss) function, which could be based on the mean squared error.

Example: Suppose Y values (ȳ₁, ȳ₂, ȳ₃) = (4.4, 4.8, 4.9) are predicted from a set of X values (x₁, x₂, x₃) through the relationship Y = mX + c, while the actual Y values are (y₁, y₂, y₃) = (5.4, 5.8, 5.9). Then the mean squared error is

\[ \frac{1}{3} \sum_{i=1}^{3} (y_i - \bar{y}_i)^2 = 1. \]
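The computation above can be sketched in a few lines of Python (the function name is ours):

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)

actual = [5.4, 5.8, 5.9]
predicted = [4.4, 4.8, 4.9]
print(mean_squared_error(actual, predicted))  # ≈ 1.0, since every residual is 1
```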

What about Multiple Linear Regression? Multiple Linear Regression predicts the dependent variable based on two or more independent variables. The relationship is often given by Y = m₁X₁ + m₂X₂ + ... + mₙXₙ + c.

4.2 Gradient Descent
Gradient descent is an iterative optimization algorithm for finding the minimum value of a function. In this context, the function is the cost function.

Imagine a man going down a mountain in thick haze contributed by neighbouring countries. He cannot see the way down, so he repeatedly moves to a nearby point that is lower. When the slope of the mountain is steep, he takes a huge step; when the slope is gentle, he takes smaller steps. Each point is determined by the previous one, and he stops this process when he can descend no further, at which point he has hopefully reached the bottom.

Now, suppose we have n data points. Define a cost function

\[ E = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y}_i)^2. \tag{1} \]

Since ȳᵢ = mxᵢ + c, we rewrite (1) as

\[ E = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - (m x_i + c)\bigr)^2. \tag{2} \]

We wish to minimise E, i.e. we need to find suitable values of m and c such that E is minimised. To do so, we evaluate

\[ \frac{\partial E}{\partial m} = \frac{-2}{n} \sum_{i=1}^{n} x_i (y_i - \bar{y}_i) \tag{3} \]

and

\[ \frac{\partial E}{\partial c} = \frac{-2}{n} \sum_{i=1}^{n} (y_i - \bar{y}_i). \tag{4} \]

Can’t we just solve ∂E/∂m = 0 and ∂E/∂c = 0? This method does work if the function is not complicated. It is not feasible for complicated functions, and unfortunately most error functions are complicated; typically we also have a large data set, which makes solving ∂E/∂m = 0 and ∂E/∂c = 0 extremely difficult.

The learning rate, L, of the model controls how much we modify the model at each iteration. If the learning rate is too large, the updates may overshoot the minimum and fail to converge, while if the learning rate is too small, training may take a very long time.

To find the local minimum, we start by letting the learning rate L = 0.0001, which controls how much m and c change with each iteration. Let m₁ = c₁ = 0. We plug the values of our data points and the current m, c values into ∂E/∂m and ∂E/∂c. Then, we update our m and c values using the recurrence relations:

\[ m_{n+1} = m_n - L \times \frac{\partial E}{\partial m} \]

\[ c_{n+1} = c_n - L \times \frac{\partial E}{\partial c} \]

As more iterations are run, we eventually obtain m = lim_{n→∞} mₙ and c = lim_{n→∞} cₙ. Hopefully, we have reached the desired linear relation Y = mX + c that fits the predicted values to the actual values optimally. Why only “hopefully”? Because with this method we may have reached either a global minimum or merely a local minimum of the cost function, and we cannot tell which.
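The whole iteration can be sketched in Python; the data set, learning rate and iteration count below are illustrative choices, not taken from the essay:

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]   # roughly Y = 2X + 1, with some noise

n = len(xs)
m, c = 0.0, 0.0   # initial guesses m1 = c1 = 0
L = 0.01          # learning rate

for _ in range(10000):
    preds = [m * x + c for x in xs]
    # Partial derivatives, as in equations (3) and (4)
    dE_dm = (-2 / n) * sum(x * (y - p) for x, y, p in zip(xs, ys, preds))
    dE_dc = (-2 / n) * sum(y - p for y, p in zip(ys, preds))
    # Recurrence relations: step against the gradient
    m -= L * dE_dm
    c -= L * dE_dc

print(m, c)  # converges to the least-squares fit, m ≈ 1.94 and c ≈ 1.15
```

For this cost function, E is a convex paraboloid in (m, c), so gradient descent reaches the unique global minimum, which matches the ordinary least-squares solution.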

Often, the dependent variable that we wish to predict has a relationship with more than one independent variable. The idea is the same as in Simple Linear Regression. For gradient descent in multiple variables, we treat each variable separately: hold all of the other variables constant and find the partial derivative of the function with respect to that variable.

Gradient Descent is also used in Neural Networks.

4.3 Artificial Neural Networks and Machine Learning
As the name suggests, Artificial Neural Networks are inspired by the brain. They are sometimes also called models.

Imagine we have many neurons (or nodes), organised in layers. Each neuron holds a number and a bias value. The neurons in each layer are connected to neurons in other layers, forming a neural network.

A neural network accepts one or more inputs, processes them and gives one or more outputs.

A neuron in one layer interacts with neurons in the next layer through weighted connections (each weight being a real-valued number) between pairs of neurons. A neuron in the next layer receives the values of the previous layer’s neurons, each multiplied by its connection weight. The sum of all those products, plus that neuron’s bias value, is then put into an activation function, which returns the value assigned to that neuron. In this way, information is passed through the entire network. The key is to decide on appropriate weights and bias values, which can be determined via machine learning. How?
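The forward pass just described can be sketched for a single neuron. We assume a sigmoid activation function, a common choice that the essay does not specify:

```python
import math

def sigmoid(z):
    """A common activation function, squashing any real number into (0, 1)."""
    return 1 / (1 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the previous layer's values plus this neuron's bias,
    passed through the activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Three neurons in the previous layer feeding one neuron in the next layer:
print(neuron_output([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.2))  # ≈ 0.574
```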

Suppose we have an artificial neural network. We define a cost function

\[ C = C(W) \]

where W = (w₁, w₂, w₃, ..., wₙ) is a vector storing all of the weight and bias values: for each i, wᵢ is the value of one weight or bias in the network. C is our “error” and should be as low as possible. Clearly, we want to minimise the cost function, and this can be done by computing the partial derivative of C with respect to each of the weights and biases in the network, which initially hold arbitrary values that are subject to change. We then modify the weight and bias values accordingly using the recurrence relation

\[ W_{n+1} = W_n - L \, \nabla C(W_n), \]

where L is again the learning rate. By iterating this process, it is likely that we can reach a point C_min = lim_{n→∞} C(Wₙ) at which C is minimised, and the artificial neural network model is trained to give us our desired result.

This method is an old variant of modern Machine Learning. It builds the foundation for more advanced concepts in Machine Learning, which we will not be discussing.
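The weight-update rule can be sketched with a numerically estimated gradient. The toy cost function, learning rate and helper names below are illustrative stand-ins (a real network computes gradients analytically via backpropagation), and a learning-rate factor is included as in the earlier gradient-descent section:

```python
def grad(C, W, h=1e-6):
    """Forward-difference estimate of the gradient of C at the weight vector W."""
    g = []
    base = C(W)
    for i in range(len(W)):
        Wp = W[:]
        Wp[i] += h          # nudge one weight, hold the others fixed
        g.append((C(Wp) - base) / h)
    return g

# Toy cost whose minimum sits at W = (1, -2); a stand-in for a network's cost.
C = lambda W: (W[0] - 1) ** 2 + (W[1] + 2) ** 2

W = [0.0, 0.0]   # arbitrary initial weights
L = 0.1          # learning rate

for _ in range(200):
    # Recurrence relation: W ← W − L ∇C(W)
    W = [w - L * g for w, g in zip(W, grad(C, W))]

print(W)  # approaches the minimiser (1, -2)
```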

5 Concluding Remarks
It is truly remarkable how far technology has progressed over the last four decades. Artificial Intelligence especially holds great potential in areas such as research and new development. According to a report from the World Economic Forum (WEF), the growth of artificial intelligence could create 58 million net new jobs by the year 2022. Artificial Intelligence will define the next phase of the world’s landscape, transforming our economy and society. The prevalence of AI is likely to rise over the next few decades, and it will come to dominate our lives, bringing immense benefits but also uncertainties for the future.

Everything about AI is fascinating, yet without Mathematics, the study and development of AI would be impossible. This is yet another instance of Mathematics being inseparable from our modern world. I would like to end with the famous quote by Galileo Galilei: “In order to understand the universe you must know the language in which it is written and that language is mathematics.”
