Deep Learning Week 201

Artificial Intelligence (AI)

* Similar to electricity (100 years ago), AI is transforming multiple industries.


* We are living in an AI-powered society.
Machine Learning Algorithms
* Supervised learning: learns from many examples of input-output mappings and uses the learned mapping to categorize or predict outputs for new inputs (a minimal sketch follows this list).
* Unsupervised learning: finds structure from patterns and statistics in the data, without any given input-output mappings.
* Transfer learning: transfers mappings learned on one task with many examples to another task with less data.
* Reinforcement learning: learns from continual feedback such as "good computer / bad computer" (rewards and penalties).
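
A minimal sketch of the supervised-learning idea, assuming plain NumPy and made-up numbers (neither comes from these notes): fit a mapping from labeled input-output examples, then apply it to a new input.

    # Minimal supervised-learning sketch (assumption: NumPy, toy data).
    import numpy as np

    # Labeled training examples: each row of X is an input, y holds the outputs.
    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([2.1, 3.9, 6.2, 8.1])           # roughly y = 2x

    # "Learning" here is just least-squares fitting of a line y = w*x + b.
    A = np.hstack([X, np.ones_like(X)])          # add a bias column
    w, b = np.linalg.lstsq(A, y, rcond=None)[0]

    # Use the learned mapping to predict the output for a new, unseen input.
    print(w * 5.0 + b)                           # close to 10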
Iterative process of developing machine learning models: Idea -> Code -> Experiment -> Idea (a loop).
Better algorithms improve the time needed to run an experiment or to train a model, and thus enable us to iterate faster and produce better models. E.g., the ReLU activation function vs. the sigmoid function (see the sketch below).
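
A hedged sketch of why ReLU can speed up training compared with sigmoid, assuming plain NumPy (not part of the notes): the sigmoid gradient shrinks toward zero for large |z| (saturation), while ReLU keeps a gradient of 1 for positive inputs.

    # ReLU vs. sigmoid sketch (assumption: NumPy; illustrative values only).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):
        return np.maximum(0.0, z)

    z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
    s = sigmoid(z)

    print("sigmoid(z):       ", s)
    print("sigmoid gradient: ", s * (1 - s))           # near 0 at z = +/-5 (saturation)
    print("relu(z):          ", relu(z))
    print("relu gradient:    ", (z > 0).astype(float)) # stays 1 for z > 0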
Neural Networks
* Universal function approximators -> given training examples, they learn to approximate any continuous function mapping inputs X to outputs Y.
* Learn hierarchical representations -> break complex relationships into simpler components.
* Parameters (weights and biases) grant them the flexibility to fit a wide range of functions.
* Nonlinear activation functions -> model complex, nonlinear relationships between inputs and outputs.
* Optimization algorithms (e.g., gradient descent) -> adjust the parameters to minimize the difference between predicted and actual outputs over many training examples (see the gradient-descent sketch after the CNN/RNN comparison below).
* Limited computational power during the 80s hindered the development and training of deep neural networks.

Recurrent Neural Networks (RNN)
* Used when the input and/or output is a sequence (e.g., a sequence of words).
* Why are RNNs used for machine translation? Translation involves sequential input (the source-language sentence) and sequential output (the target-language translation). It can be trained as a supervised learning problem using labeled data, where the input is a sentence in the source language (e.g., English) and the corresponding output is the translated sentence in the target language (e.g., French).

CNNs vs. RNNs
* Convolutional Neural Networks (CNN) -> designed to process grid-like data, such as images; they capture spatial dependencies and local patterns in the input data -> computer vision tasks.
* Recurrent Neural Networks (RNN) -> designed to process sequential data, such as time series or natural language; they handle temporal dependencies and sequences -> NLP tasks.
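
A minimal sketch of gradient descent adjusting parameters to minimize the gap between predicted and actual outputs, assuming a single linear unit, squared error, and plain NumPy (all illustrative choices, not prescribed by the notes).

    # Gradient-descent sketch (assumption: NumPy, one linear unit, squared error).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(1, 100))   # 100 training examples
    Y = 3.0 * X + 0.5                            # the true mapping to learn

    w, b, lr = 0.0, 0.0, 0.1
    for _ in range(500):
        Y_hat = w * X + b                        # forward pass (predictions)
        dw = np.mean(2 * (Y_hat - Y) * X)        # gradient of the mean squared error
        db = np.mean(2 * (Y_hat - Y))
        w -= lr * dw                             # update parameters against the gradient
        b -= lr * db

    print(w, b)                                  # approaches 3.0 and 0.5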
Data Types
* Structured -> well-defined schema; can be organized into a tabular format, like a spreadsheet or a database table. E.g., a demographic dataset with statistics on different cities' population, GDP per capita, and economic growth (see the sketch below).
* Unstructured -> does not have a predefined schema or format. E.g., text documents, images, audio files, and videos.
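
A small sketch of structured data, assuming pandas and made-up numbers (both the library and the figures are illustrative, not from the notes): each column has a fixed type, which is the "well-defined schema".

    # Structured-data sketch (assumption: pandas; the numbers are invented).
    import pandas as pd

    cities = pd.DataFrame({
        "city":            ["A", "B", "C"],
        "population":      [1_200_000, 850_000, 2_300_000],
        "gdp_per_capita":  [42_000, 38_500, 51_200],
        "growth_rate_pct": [2.1, 1.7, 3.0],
    })

    print(cities.dtypes)   # each column has a fixed, known type: the schema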
Lesson 2 – Standard Notations for Deep Learning
Some Deep Learning mathematical notations.
Processing the training set
* Usually the entire training set is processed without an explicit for loop (vectorized operations).
* This is different from the typical approach of using a for loop to step through the m training examples one by one (see the sketch below).
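
A minimal sketch of processing the whole training set at once versus an explicit for loop, assuming NumPy and random data (both are illustrative assumptions).

    # Vectorization sketch (assumption: NumPy, random data).
    import numpy as np

    n_x, m = 3, 1000
    X = np.random.randn(n_x, m)         # one column per training example
    W = np.random.randn(1, n_x)
    b = 0.0

    # Explicit for loop: one training example at a time.
    Z_loop = np.zeros((1, m))
    for i in range(m):
        Z_loop[:, i] = W @ X[:, i] + b

    # Vectorized: the entire training set in a single matrix product.
    Z_vec = W @ X + b

    print(np.allclose(Z_loop, Z_vec))   # True: same result, no explicit loop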
Notations -> superscript (i) -> the i-th training example | superscript [l] -> the l-th layer.
Number of examples -> m | Input size -> n_x | Output size -> n_y | Hidden units of the l-th layer -> n_h^[l] | Number of layers -> L.
Objects used in Neural Networks
* Input matrix -> X ∈ ℝ^(n_x × m) -> contains the feature values for each training example; m -> examples, n_x -> features.
* Training example -> x^(i) ∈ ℝ^(n_x) -> represented as a column vector.
* Label matrix -> Y ∈ ℝ^(n_y × m) -> contains the desired outputs for each training example; m -> examples, n_y -> classes.
* Output label -> y^(i) ∈ ℝ^(n_y) -> the label (class or value) for the i-th training example.
* Weight matrix -> W^[l] ∈ ℝ^(number of units in the next layer × number of units in the previous layer) -> [l] indicates the layer.
* Bias vector -> b^[l] ∈ ℝ^(number of units in the next layer) -> allows the network to shift the activation function output.
* Predicted output vector -> ŷ ∈ ℝ^(n_y) -> can also be denoted a^[L], where L is the number of layers.
These shapes are illustrated in the sketch below.
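
A short sketch tying the notation to array shapes, assuming NumPy and arbitrary layer sizes (the sizes are made up for illustration).

    # Shape sketch (assumption: NumPy; sizes chosen arbitrarily).
    import numpy as np

    m, n_x, n_h, n_y = 5, 4, 3, 2       # examples, input, hidden, output sizes

    X  = np.random.randn(n_x, m)        # each column x^(i) is one training example
    Y  = np.random.randn(n_y, m)        # each column y^(i) is one label
    W1 = np.random.randn(n_h, n_x)      # W^[1]: (units in layer 1) x (units in layer 0)
    b1 = np.zeros((n_h, 1))             # b^[1]
    W2 = np.random.randn(n_y, n_h)      # W^[2]
    b2 = np.zeros((n_y, 1))             # b^[2]

    print(X.shape, Y.shape, W1.shape, b1.shape, W2.shape, b2.shape)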
Examples of equations of common forward propagation
* a = g^[l](W_x x^(i) + b_1) = g^[l](z_1) -> activation a of the l-th layer, where g^[l] is the l-th layer's activation function. It is computed by applying g^[l] to the linear combination of the weights W_x and the activations x from the previous layer, plus the bias term b_1.
* ŷ^(i) = softmax(W_h h + b_2) -> activation ŷ^(i) of the output layer. It is computed by applying the softmax activation function to the linear combination of the weights W_h, the hidden features h, and the bias term b_2.
* General activation formula: a_j^[l] = g^[l]( Σ_k w_jk^[l] a_k^[l-1] + b_j^[l] ) = g^[l](z_j^[l]) -> activation a_j^[l] of the j-th neuron in the l-th layer. It is computed by applying the activation function g^[l] to the weighted sum of the activations a_k^[l-1] from the neurons in the previous layer l-1, multiplied by the weights w_jk^[l], plus the bias term b_j^[l].
These equations are illustrated in the sketch below.
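
A minimal forward-propagation sketch for the equations above, assuming NumPy, a tanh hidden layer, and a softmax output (the specific activations and sizes are illustrative choices, not prescribed by the notes).

    # Forward-propagation sketch: a^[l] = g^[l](W^[l] a^[l-1] + b^[l]).
    # Assumptions: NumPy, tanh hidden layer, softmax output, arbitrary sizes.
    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=0, keepdims=True))   # numerically stabilized
        return e / e.sum(axis=0, keepdims=True)

    np.random.seed(0)
    m, n_x, n_h, n_y = 5, 4, 3, 2
    X  = np.random.randn(n_x, m)
    W1, b1 = np.random.randn(n_h, n_x), np.zeros((n_h, 1))
    W2, b2 = np.random.randn(n_y, n_h), np.zeros((n_y, 1))

    Z1 = W1 @ X + b1            # z^[1] = W^[1] x + b^[1]
    A1 = np.tanh(Z1)            # a^[1] = g^[1](z^[1]); this is the hidden layer h
    Z2 = W2 @ A1 + b2           # z^[2] = W^[2] h + b^[2]
    Y_hat = softmax(Z2)         # y_hat = softmax(W_h h + b_2)

    print(Y_hat.shape)          # (n_y, m); each column sums to 1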

Examples of cost functions
* Cost function: J(x, W, b, y) = J(ŷ, y) -> measures the difference between the predicted output ŷ (which is a function of the input x, weights W, and biases b) and the true output y, providing a quantitative measure of how well the network is performing.
* Cross-entropy cost function: J_CE(ŷ, y) = - Σ_{i=1}^{m} y^(i) log ŷ^(i) -> measures the dissimilarity between the predicted probabilities ŷ^(i) and the true labels y^(i) across all m training examples; the goal is to minimize this cost during training to improve the network's performance.
* Mean Absolute Error (MAE) cost function: J_1(ŷ, y) = Σ_{i=1}^{m} | y^(i) - ŷ^(i) | -> sums the absolute differences between the predicted values ŷ^(i) and the actual values y^(i) across all m examples (dividing by m gives the average). MAE penalizes all errors linearly, making it less sensitive to outliers than the Mean Squared Error (MSE).
Both cost functions are evaluated in the sketch below.
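
A tiny sketch evaluating both cost functions on toy values, assuming NumPy (the numbers are invented for illustration).

    # Cost-function sketch (assumption: NumPy; toy labels and predictions).
    import numpy as np

    y     = np.array([1.0, 0.0, 1.0, 1.0])   # true labels y^(i)
    y_hat = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities y_hat^(i)

    # Cross-entropy summed over the m examples, as written in the notes
    # (a full binary cross-entropy would also add the (1 - y) log(1 - y_hat) term).
    J_CE = -np.sum(y * np.log(y_hat))

    # Mean Absolute Error in the summed form used in the notes.
    J_1 = np.sum(np.abs(y - y_hat))

    print(J_CE, J_1)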
