FAI 4 Mathematical Concepts II
2
Overview of This Course
11, 12, 13. Computer Vision I, II, III
14. Natural Language Processing
Deep Learning Applications
Gradient Descent Algorithm (1/4)
• Compute f’(x)
• If f’(x) > 0: Decrease x a bit
• If f’(x) < 0: Increase x a bit
• Repeat
4
Gradient Descent Algorithm (2/4)
• This suggests a procedure for finding a minimum:
  • Start at any x (e.g. x = 0)
  • Compute f’(x)
  • If f’(x) > 0: Decrease x a bit
  • If f’(x) < 0: Increase x a bit
  • Repeat
• In practice, the update is: x := x − lr · f’(x)
  • lr is the learning rate. It should be a positive value
  • If lr is too large: no convergence
  • If lr is too small: very slow convergence
5
Gradient Descent Algorithm (3/4)
Example with lr = 0.2:
  f(x) = x² − x
  f’(x) = 2x − 1
  argmin_x f(x) = 0.5

Loop: initialize x, compute f’(x), update x := x − lr · f’(x), and repeat.

  x = 0        f’(x) = −1
  x = 0.2      f’(x) = −0.6
  x = 0.392
  …
  x = 0.493    f’(x) = −0.014    STOP?
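For reference, a minimal Python sketch of this loop for f(x) = x² − x; the stopping threshold below is an arbitrary assumption, not from the slides.

# Minimal 1-D gradient descent sketch (illustrative)
def f_prime(x):
    return 2 * x - 1              # derivative of f(x) = x**2 - x

x = 0.0                           # initialize x
lr = 0.2                          # learning rate
while abs(f_prime(x)) > 0.02:     # stop when the derivative is close to 0 (arbitrary threshold)
    x = x - lr * f_prime(x)       # update: x := x - lr * f'(x)
print(x)                          # close to 0.5, the argmin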
6
Gradient Descent Algorithm (4/4)
• Gradient descent works well even when we have functions of millions of variables
• This is why it is so useful for Machine Learning and Neural Networks
• Other methods are not practical in such settings
• Convergence will depend on the choice of a good learning rate (see the sketch below)
• In experiments, a good deal of time is often spent finding an optimal learning rate
• Too large a learning rate: no convergence (i.e. the system learns nothing)
• Too small a learning rate: slow convergence (i.e. the system takes a long time to learn)
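A small illustrative sketch of these regimes on f(x) = x² − x; the learning-rate values and the step count are assumptions.

# Effect of the learning rate on gradient descent for f(x) = x**2 - x (illustrative)
def step(x, lr):
    return x - lr * (2 * x - 1)   # one gradient descent update

for lr in (1.5, 0.001, 0.2):
    x = 0.0
    for _ in range(20):
        x = step(x, lr)
    print(lr, x)                  # lr=1.5 diverges, lr=0.001 barely moves, lr=0.2 ends near 0.5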
7
Minimizing a Function of
Several Variables
8
Functions of Several Variables
• A function of several variables is just that: a function which has several variables

  f: ℝ³ → ℝ
  f(x, y, z) = (x − y)² + z² − z

  f(0, 0, 0) = 0
  f(1, 2, 3) = 7
  f(−1, 2, 2) = 11
  f(0, 1, 1) = ?
  f(2, 2, 0) = ?

• Like before, we want to find its minimum:
  argmin_{x,y,z} f(x, y, z) = (0, 0, 0.5)
9
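A tiny illustrative check of the example values above in Python:

# Evaluate the example function f(x, y, z) = (x - y)**2 + z**2 - z (illustrative)
def f(x, y, z):
    return (x - y) ** 2 + z ** 2 - z

print(f(0, 0, 0))    # 0
print(f(1, 2, 3))    # 7
print(f(-1, 2, 2))   # 11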
Parameterized Functions (1/5)
• By fixing one of the variables, we can obtain a function with one less variable
Function of 3 variables: f(x, y, z) = (x − y)² + z² − z
Fixing z: z = 2
f(x, y, 2) = (x − y)² + 4 − 2
10
Parameterized Functions (2/5)
• By fixing one of the variables, we can obtain a function with one less variable
Function of 3 variables: f(x, y, z) = (x − y)² + z² − z
Fixing y: y = 2
f(x, 2, z) = (x − 2)² + z² − z
11
Parameterized Functions (3/5)
• By fixing one of the variables, we can obtain a function with one less variable
Function of 3 variables: f(x, y, z) = (x − y)² + z² − z
Fixing y and z: y = 2, z = 3
f(x, 2, 3) = (x − 2)² + 9 − 3
12
Parameterized Functions (4/5)
• Therefore, in this case, variables y and z can be used to describe a "family" of functions. We say they parameterize the family.
13
Parameterized Functions (5/5)
• In such a case, we will say that f is a function parameterized by y
and z.
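A minimal Python sketch of this idea, fixing z = 2 in the running example (the helper name is illustrative):

# Fixing z turns f(x, y, z) into a function of x and y only (illustrative)
def f(x, y, z):
    return (x - y) ** 2 + z ** 2 - z

def f_z_fixed(x, y):
    return f(x, y, 2)        # z fixed to 2: (x - y)**2 + 4 - 2

print(f_z_fixed(1, 1))       # 2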
15
Partial Derivatives (2/4)
• What is the equivalent of our “high school” derivatives when we have several
variables?
• Partial derivatives are computed by choosing one variable and fixing the others
Compute the derivative after fixing the other variables:
  f′_{y,z}(x) = 2(x − y)      (y and z fixed, derivative with respect to x)
  f′_{x,z}(y) = 2(y − x)      (x and z fixed, derivative with respect to y)
  f′_{x,y}(z) = 2z − 1        (x and y fixed, derivative with respect to z)
17
Partial Derivatives (4/4)
f(x, y, z) = (x − y)² + z² − z

Fix the other variables, then compute the derivative:
  f′_{y,z}(x) = 2(x − y)      f′_{x,z}(y) = 2(y − x)      f′_{x,y}(z) = 2z − 1

In practice, we use this notation for partial derivatives:
  ∂f/∂x = 2(x − y)      ∂f/∂y = 2(y − x)      ∂f/∂z = 2z − 1
18
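A small sketch that checks one of these partial derivatives numerically, by fixing y and z and nudging x (the step size h is an assumption):

# Finite-difference check of df/dx for f(x, y, z) = (x - y)**2 + z**2 - z (illustrative)
def f(x, y, z):
    return (x - y) ** 2 + z ** 2 - z

def df_dx(x, y, z, h=1e-6):
    # y and z stay fixed; only x is nudged
    return (f(x + h, y, z) - f(x - h, y, z)) / (2 * h)

print(df_dx(1.0, 2.0, 3.0))  # close to the analytic value 2 * (1 - 2) = -2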
Compute the Partial Derivatives
f(x, y, z) = (x − y)² + z² − z

  ∂f/∂x =            ∂f/∂y =            ∂f/∂z =
19
Vectors (1/3)
• What are vectors?
• You probably have used vectors in physics classes to represent forces and velocities
20
Vectors (2/3)
• For now, we only need to know the following about vectors:
21
Vectors (3/3)
• We will usually denote a vector by a letter with an arrow on it: x⃗
• We denote the i-th component of x⃗ by xᵢ
• If x⃗ = [1, 2.2, −1, 4], then x₀ = 1, x₁ = 2.2, x₂ = −1, x₃ = 4
22
Vectors and Numpy
• In Python, Numpy arrays are a
convenient way to represent
vectors
  x⃗ = [1, 5, −2, 0.5]
• x = np.array([1, 5, -2, 0.5])
• x[0] == x₀ == 1
• x[1] == x₁ == 5
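A runnable version of the example above (assuming numpy is installed):

# Vectors as numpy arrays (illustrative)
import numpy as np

x = np.array([1, 5, -2, 0.5])
print(x[0])   # 1.0, the component x0
print(x[1])   # 5.0, the component x1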
23
Vectors and Multivariate Functions
• For now, we have represented the variables of a multivariate function with the letters x, y, z, as in: f(x, y, z) = (x − y)² + z² − z
• In practice, we can have any number of variables. So it is more convenient to use x₀ (instead of x), x₁ (instead of y), x₂ (instead of z), x₃ … xₙ (if we need more than 3 variables):
  f(x₀, x₁, x₂) = (x₀ − x₁)² + x₂² − x₂
• We can also use a vectorial notation to represent all of the variables as one vector variable:
  x⃗ = [x₀, x₁, x₂]        f(x⃗) = (x₀ − x₁)² + x₂² − x₂
• So, keep in mind that the 3 following expressions actually refer to the same function:
  f(x, y, z) = (x − y)² + z² − z
  f(x₀, x₁, x₂) = (x₀ − x₁)² + x₂² − x₂
  f(x⃗) = (x₀ − x₁)² + x₂² − x₂
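A minimal sketch of the same function written with a single vector argument (illustrative):

# The running example with a vector argument x = [x0, x1, x2] (illustrative)
import numpy as np

def f_vec(x):
    return (x[0] - x[1]) ** 2 + x[2] ** 2 - x[2]

print(f_vec(np.array([1, 2, 3])))   # 7, the same value as f(1, 2, 3)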
24
Gradient (1/2)
f(x, y, z) = (x − y)² + z² − z
• For example:
  grad f(x, y, z) = [2(x − y), 2(y − x), 2z − 1]
25
Gradient (2/2)
grad f(x, y, z) = [∂f/∂x, ∂f/∂y, ∂f/∂z]
• In this case, the function has 3 variables. Therefore the gradient is a
vector of size 3
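A small sketch of this gradient as a numpy vector (illustrative):

# Gradient of f(x, y, z) = (x - y)**2 + z**2 - z as a vector of size 3 (illustrative)
import numpy as np

def grad_f(x, y, z):
    return np.array([2 * (x - y), 2 * (y - x), 2 * z - 1])

print(grad_f(0, 1, 0))   # [-2  2 -1]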
27
Contour Plot
Gradient Descent
• Because the gradient points in the direction of steepest increase (i.e. away from the minimum), we can use the same idea as in the case of one variable
  One variable: x := x − lr · f’(x)        Several variables: x⃗ := x⃗ − lr · grad f(x⃗)
28
Gradient Descent Algorithm
x⃗ = [x₀, x₁, … xₙ]

• Initialize x⃗
• Compute grad f(x⃗) = [∂f/∂x₀, … , ∂f/∂xₙ]
• If |grad f(x⃗)| < err: stop (x⃗ should be close to the minimum)
• Otherwise, update x⃗ := x⃗ − lr · grad f(x⃗) and go back to computing the gradient

Example: f(x⃗) = (x₀ − x₁)² + x₂² − x₂, grad f(x⃗) = [2(x₀ − x₁), 2(x₁ − x₀), 2x₂ − 1], lr = 0.2

  x⃗ = [0, 1, 0]              grad f(x⃗) = [−2, 2, −1]
  x⃗ = [0.4, 0.6, 0.2]        grad f(x⃗) = [−0.4, 0.4, −0.6]
  x⃗ = [0.41, 0.43, 0.51]     grad f(x⃗) = [−0.04, 0.04, 0.01]
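A minimal Python sketch of this loop for the running example; the stopping threshold err and the starting point are assumptions.

# Multivariate gradient descent for f(x) = (x0 - x1)**2 + x2**2 - x2 (illustrative)
import numpy as np

def grad_f(x):
    return np.array([2 * (x[0] - x[1]), 2 * (x[1] - x[0]), 2 * x[2] - 1])

x = np.array([0.0, 1.0, 0.0])    # initialize x
lr = 0.2                         # learning rate
err = 0.01                       # stop when the gradient norm is small (assumed threshold)
while np.linalg.norm(grad_f(x)) >= err:
    x = x - lr * grad_f(x)       # update: x := x - lr * grad f(x)
print(x)                         # close to [0.5, 0.5, 0.5]: a minimum has x0 == x1 and x2 == 0.5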
29
What is the Equivalent of the Second
Derivative for Multivariate Functions?
  ∂²f/∂x²      ∂²f/∂x∂y      ∂²f/∂x∂z
31
Gradient Descent with Momentum
32
Stochastic Gradient Descent (1/2)
33
Stochastic Gradient Descent (2/2)
• What happens if the gradient is noisy?
  • That is, we can only compute a value that is equal to the true gradient "on average"
• But you have to decrease your learning rate over time to stabilize, e.g. lr = lr₀ / (t + 1) (see the sketch below)
• Convergence will be slower
• Very interesting, because a noisy gradient can be millions of times faster to compute than a "true" gradient
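A rough illustrative sketch of this idea on the 1-D example, with an assumed Gaussian noise model:

# Gradient descent with a noisy gradient and a decaying learning rate (illustrative)
import random

def noisy_grad(x):
    # equal to the true gradient 2*x - 1 only "on average"
    return (2 * x - 1) + random.gauss(0, 0.3)

x = 0.0
lr0 = 0.2
for t in range(10000):
    lr = lr0 / (t + 1)           # decrease the learning rate over time
    x = x - lr * noisy_grad(x)
print(x)                         # roughly 0.5; the exact value varies slightly from run to run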
34
Optimization Libraries
• You can also minimize a function by using a specialized library (see the example below)
• Which is one reason why Gradient Descent and its variants are still the main tool for large-scale Machine Learning (in particular, Deep Learning)
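As one concrete possibility (the slides do not name a specific library), scipy's general-purpose optimizer applied to the running example:

# Minimizing the running example with scipy instead of hand-written gradient descent (illustrative)
from scipy.optimize import minimize

def f(v):
    return (v[0] - v[1]) ** 2 + v[2] ** 2 - v[2]

result = minimize(f, x0=[0.0, 1.0, 0.0])
print(result.x)   # a point with x0 == x1 and x2 == 0.5 (up to numerical precision)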
35
Google Colab Notebook
36
Report
37
Exercise 1
• Compute the partial derivatives of:
f(x, y, z) = xyz − z² − y²
38
Exercise 2
x⃗ = [1.5, −2.0, 5]
y⃗ = [2, 2, 10, 10]
z⃗ = [3, −3, 0]
• Dimensions of x⃗, y⃗, z⃗?
39