TensorFlow Deep Learning with Keras
September 2020
Regarding the title of this course
Deep neural network

The number of all possible models for a network with a single hidden layer is

$$\frac{a^{\#\text{parameters}}}{\#\text{hidden units}!}$$

A more formal result for the capacity of a deep network (per parameter):

$$\left(\frac{w}{f}\right)^{(d-1)f} \frac{w^{f-2}}{d}$$
Deep neural network

[Figure: network diagram with input, output, target, and error signal]
Training a neural network

▶ gradient-based methods

[Figure: network diagram with input, output, target, and error signal]
The way to tell if overfitting happened is to measure the error on unseen data.

▶ Test error

[Figure: log(error) during training]
Batch normalization
Data augmentation
Cross-validation method
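A minimal Keras sketch tying the pieces above together: the held-out error from validation_split is the error on unseen data that reveals overfitting, and a BatchNormalization layer is included for illustration. The dataset, layer sizes, and training settings below are placeholder assumptions.

import numpy as np
import tensorflow as tf

# Placeholder data: 1000 samples, 20 features, 10 classes.
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# validation_split holds out 20% of the data, so the error on unseen
# data (val_loss) can be compared with the training error (loss).
history = model.fit(x_train, y_train, epochs=10, validation_split=0.2)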
Deep learning
Official definition: Deep learning is the study of machine learning
models composed of multiple layers of functions that progressively
extract higher level features from the raw input.
[Photo: Intermittent Lake Cerknica, Notranjski regijski park (Notranjska Regional Park)]
import tensorflow as tf

# D and C are float32 matrices defined previously (not shown in this excerpt).
cos_D = tf.cos(D)
# <tf.Tensor: id=30, shape=(2, 2), dtype=float32,
#  numpy=array([[ 0.96945935, -0.36729133],
#               [-0.89988   ,  0.9873345 ]], dtype=float32)>

max_D = tf.reduce_max(D)
# <tf.Tensor: id=26, shape=(), dtype=float32, numpy=-94.0>

C.numpy()
# array([[19., 22.], [43., 50.]], dtype=float32)

svd_D = tf.linalg.svd(D)
# (<tf.Tensor: id=27, shape=(2,), dtype=float32,
#   numpy=array([520.9103, 2.9102921], dtype=float32)>,
#
#  <tf.Tensor: id=28, shape=(2, 2), dtype=float32,
#   numpy=array([[-0.30792360,  0.95141107],
#                [-0.95141107, -0.30792360]], dtype=float32)>,
#
#  <tf.Tensor: id=29, shape=(2, 2), dtype=float32,
#   numpy=array([[0.59984480,  0.80011636],
#                [0.80011636, -0.59984480]], dtype=float32)>)
What can TensorFlow do?
1. It can perform numerical operations on data (in a parallel way –
multi-core, GPU).
2. It can calculate derivatives using automatic differentiation.
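A minimal sketch of both points; the tensors and the function below are illustrative choices rather than examples from the course material.

import tensorflow as tf

# 1. Numerical operations on tensors (multi-core CPU or GPU).
A = tf.constant([[1.0, 2.0], [3.0, 4.0]])
B = tf.matmul(A, A)      # matrix product
s = tf.reduce_sum(B)     # sum of all entries

# 2. Derivatives via automatic differentiation.
x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = x ** 3 + tf.sin(x)
dy_dx = tape.gradient(y, x)   # equals 3*x**2 + cos(x) at x = 2.0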
[Figure: computational graph of the FLASSO objective, built from operations such as multiply, transpose, invert, add, subtract, trace, logdet, sum, and absolute value applied to the quantities δ, S, Ψ, Β0, Σ, Λ, λ, and Θ]
Why do we need derivatives?

"... task that is hard in theory, but sometimes easy in practice. Despite the NP-hardness of training general neural loss functions [2], simple gradient methods often find global minimizers (parameter configurations with zero or near-zero training loss), even when data and labels are randomized before training [42]. However, this good behavior is not universal; the trainability of neural nets is highly dependent on network architecture design choices, the choice of optimizer, variable initialization, and a variety of other considerations. Unfortunately, the effect of each of these choices on the structure of the underlying loss surface is unclear. Because of the prohibitive cost of loss function evaluations (which requires looping over all the data points in the training set), studies in this field have remained predominantly theoretical."

Knowing in which direction "down" is can help us when solving optimization problems.
▶ So, various competitions show that evolutionary algorithms outperform gradient-based optimization algorithms.
▶ However, all those competitions use functions of "low" dimension (≤ 100), and gradient-based optimization excels in high dimensions.

How many local minima are there with respect to dimension?

$$\begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}$$

If the eigenvalues of the Hessian matrix are randomly distributed, then the probability that a stationary point is a local minimum is $2^{-n}$.

▶ Saddle points are exponentially more common than local minima.
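A tiny simulation of the claim above, under the stated assumption that the n eigenvalues at a stationary point are independent and equally likely to be positive or negative; the sampling distribution is an illustrative choice.

import numpy as np

rng = np.random.default_rng(0)

def fraction_local_minima(n, trials=100000):
    # Independent random eigenvalues; all positive means a local minimum.
    eigs = rng.standard_normal((trials, n))
    return np.mean(np.all(eigs > 0, axis=1))

for n in (1, 2, 5, 10):
    print(n, fraction_local_minima(n), 2.0 ** (-n))
# The simulated fractions match 2**(-n), so in high dimensions almost
# every stationary point has at least one negative eigenvalue (a saddle).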
A transformation used in such black-box benchmark functions, whose rounding produces plateaus:

• $\hat{z} = \Lambda^{10} R(x - x^{\text{opt}})$
• $\tilde{z}_i = \begin{cases} \lfloor 0.5 + \hat{z}_i \rfloor & \text{if } |\hat{z}_i| > 0.5, \\ \lfloor 0.5 + 10\,\hat{z}_i \rfloor / 10 & \text{otherwise,} \end{cases}$ for $i = 1, \ldots, D$, denotes the rounding procedure in order to produce the plateaus.
• $z = Q\tilde{z}$

A partial derivative can be approximated numerically with central differences:

$$\frac{\partial}{\partial x_1} f(x_1, x_2, \ldots) \approx \frac{f(x_1 + h, x_2, \ldots) - f(x_1 - h, x_2, \ldots)}{2h}$$
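A minimal central-difference sketch of the approximation above; the test function and the step size h are arbitrary assumptions.

import numpy as np

def numerical_gradient(f, x, h=1e-5):
    # Approximate each partial derivative with (f(x + h e_i) - f(x - h e_i)) / 2h.
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# Example: f(x) = sum(x**2) has gradient 2x.
print(numerical_gradient(lambda x: np.sum(x ** 2), [1.0, -2.0, 3.0]))
# -> approximately [ 2. -4.  6.]

Note that this costs two evaluations of f per coordinate, which is one reason automatic differentiation is preferred in high dimensions.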
$$\frac{\partial}{\partial x} \log\bigl(1 + \exp(ax + b)\bigr) = \frac{a \exp(ax + b)}{1 + \exp(ax + b)}$$
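A quick check of this derivative with TensorFlow's automatic differentiation; the values of a, b, and x are arbitrary.

import tensorflow as tf

a, b = tf.constant(2.0), tf.constant(-1.0)
x = tf.Variable(0.5)

with tf.GradientTape() as tape:
    y = tf.math.log(1.0 + tf.exp(a * x + b))   # log(1 + exp(ax + b))

autodiff = tape.gradient(y, x)                  # derivative computed by TensorFlow
analytic = a * tf.exp(a * x + b) / (1.0 + tf.exp(a * x + b))
# autodiff and analytic agree.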
Example: differentiate $y = (x_1 + x_2)\exp(x_2)$ by the chain rule on its computational graph, with intermediate values $a = x_1 + x_2$ and $b = \exp(x_2)$, so that $y = a \cdot b$.

[Figure: computational graph, where x1 and x2 feed the "+" node producing a, x2 feeds the "exp" node producing b, and a and b feed the "∗" node producing y]

Derivatives of the output with respect to the intermediate values:

$$\frac{\partial y}{\partial a} = b, \qquad \frac{\partial y}{\partial b} = a$$

Propagating backwards through the graph:

$$\frac{\partial y}{\partial x_1} = \frac{\partial y}{\partial a} \overbrace{\frac{\partial a}{\partial x_1}}^{1} = b$$

$$\frac{\partial y}{\partial x_2} = \frac{\partial y}{\partial a} \underbrace{\frac{\partial a}{\partial x_2}}_{1} + \frac{\partial y}{\partial b} \underbrace{\frac{\partial b}{\partial x_2}}_{\exp(x_2) = b} = b + ab$$
The same thing in TensorFlow

import tensorflow as tf

x1 = tf.Variable(3.1)
x2 = tf.Variable(-1.4)

with tf.GradientTape() as tape:
    f = (x1 + x2) * tf.exp(x2)   # y = (x1 + x2) exp(x2), recorded on the tape

df = tape.gradient(f, [x1, x2])
# [<tf.Tensor: id=22, shape=(), dtype=float32, numpy=0.24659698>,
#  <tf.Tensor: id=25, shape=(), dtype=float32, numpy=0.66581184>]
Optimizers

Vanilla update

x += - learning_rate * dx

Momentum update

v = mu * v - learning_rate * dx   # integrate velocity
x += v                            # integrate position

Adam (simplified, without bias correction)

m = beta1 * m + (1 - beta1) * dx
v = beta2 * v + (1 - beta2) * (dx ** 2)
x += - learning_rate * m / (np.sqrt(v) + eps)
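The same update rules are available as optimizer objects in tf.keras; below is a minimal sketch on an arbitrary quadratic objective (the loss, learning rate, and starting point are illustrative assumptions).

import tensorflow as tf

x = tf.Variable([5.0, -3.0])
opt = tf.keras.optimizers.Adam(learning_rate=0.1)   # or SGD(momentum=0.9), ...

for step in range(100):
    with tf.GradientTape() as tape:
        loss = tf.reduce_sum(x ** 2)       # simple quadratic objective
    grads = tape.gradient(loss, [x])
    opt.apply_gradients(zip(grads, [x]))   # applies the Adam update to x
# x ends up close to the minimizer [0, 0].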
[Figure: sparse rating matrix with users as rows and movies as columns]

user id   movie id   rating
   4160      14501        5
    182      14502        2
   6649      14502        3
  17240      14502        1
    115      14503        4
    ...        ...      ...
Matrix factorization example

[Figure: the users × movies rating matrix R approximated by the product of a users × genres preference matrix G and a genres × movies matrix H]

R ≈ G · H
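A minimal TensorFlow sketch of fitting R ≈ G · H to observed (user id, movie id, rating) triples such as those in the table above; the matrix sizes, the number of genres, and the training settings are illustrative assumptions, not the course's reference implementation.

import tensorflow as tf

n_users, n_movies, n_genres = 20000, 15000, 10     # illustrative sizes

# Observed ratings as (user id, movie id, rating) triples.
user_ids = tf.constant([4160, 182, 6649, 17240, 115])
movie_ids = tf.constant([14501, 14502, 14502, 14502, 14503])
ratings = tf.constant([5.0, 2.0, 3.0, 1.0, 4.0])

G = tf.Variable(tf.random.normal((n_users, n_genres), stddev=0.1))    # user-genre preferences
H = tf.Variable(tf.random.normal((n_genres, n_movies), stddev=0.1))   # genre-movie weights

opt = tf.keras.optimizers.Adam(learning_rate=0.05)

for step in range(500):
    with tf.GradientTape() as tape:
        # Predicted rating for each observed pair: dot product of the
        # user's row of G with the movie's column of H.
        pred = tf.reduce_sum(tf.gather(G, user_ids) *
                             tf.transpose(tf.gather(H, movie_ids, axis=1)), axis=1)
        loss = tf.reduce_mean((ratings - pred) ** 2)
    grads = tape.gradient(loss, [G, H])
    opt.apply_gradients(zip(grads, [G, H]))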
[Figure: another application sketch: from a geometric model, solve the generalized eigenvalue problem K u = λ M u to obtain resonance spectra]