0% found this document useful (0 votes)

41 views49 pages

Deepnet Lourentzou

This document provides an introduction and overview of deep learning. It begins with an overview of machine learning basics, then defines deep learning as a machine learning technique that learns multiple levels of representation from large amounts of data. The main components of deep learning models are described, including activation functions, optimizers, cost functions, regularization methods, and classification vs regression tasks. Common deep neural network architectures like convolutional and recurrent networks are also mentioned. The document concludes by noting several applications of deep learning like machine translation and question answering.

Uploaded by

Hooper Chheang

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

41 views49 pages

Deepnet Lourentzou

Uploaded by

Hooper Chheang

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 49

Introductio

n to
Deep Learn
ing
Ismini Loure
ntzou
11-30-2017
Outline
 Machine Learning basics
 Introduction to Deep Learning
 what is Deep Learning
 why is it useful
 Main components/hyper-parameters:
 activation functions
 optimizers, cost functions and training

x
 regularization methods
 tuning Backpropagation
GANs & Adversarial training
 classification vs. regression tasks
Bayesian Deep Learning
 DNN basic architectures: Generative models
 convolutional Unsupervised / Pretraining
 recurrent
 attention mechanism
 Application example: Relation Extraction

Most material from

Machine Learning Basics
Machine learning is a field of computer science that gives computers the ability to
learn without being explicitly programmed

Machine Learning
Labeled Data
algorithm

Training
Prediction

Learned model
Labeled Data Prediction

Methods that can learn from and make predictions on data

Types of Learning
Supervised: Learning with a labeled training set
Example: email classification with already labeled emails

Unsupervised: Discover patterns in unlabeled data

Example: cluster similar documents based on text

Reinforcement learning: learn to act based on feedback/reward

Example: learn to play Go, reward: win or lose

class A

Classification Regression Clustering

Anomaly Detection
Sequence labeling
http://mbjoseph.github.io/2013/11/27/measure.html
…
ML vs. Deep Learning
Most machine learning methods work well because of human-designed
representations and input features
ML becomes just optimizing weights to best make a final prediction
What is Deep Learning (DL) ?
A machine learning subfield of learning representations of data. Exceptional effective
at learning patterns.
Deep learning algorithms attempt to learn (multiple levels of) representation by using a
hierarchy of multiple layers
If you provide the system tons of information, it begins to understand it and respond
in useful ways.

https://www.xenonstack.com/blog/static/public/uploads/media/machine-learning-vs-deep-learning.png
Why is DL useful?
o Manually designed features are often over-specified, incomplete and take a long time
to design and validate
o Learned Features are easy to adapt, fast to learn
o Deep learning provides a very flexible, (almost?) universal, learnable framework for
representing world, visual and linguistic information.
o Can learn both unsupervised and supervised
o Effective end-to-end joint system learning
o Utilize large amounts of training data

In ~2010 DL started outperforming other

ML techniques
first in speech and vision, then NLP
State of the art in …

Several big improvements in recent years in NLP

 Machine Translation
 Sentiment Analysis Leverage different levels of representation
 Dialogue Agents o words & characters
 Question Answering o syntax & semantics
 Text Classification …
Neural Network Intro
Weights

Activation functions

How do we train?

4 + 2 = 6 neurons (not counting inputs)

[3 x 4] + [4 x 2] = 20 weights
4 + 2 = 6 biases
26 learnable parameters
Training
Sample labeled Forward it Back-
data through the Update the
network, get
propagate the network weights
(batch) errors
predictions

Optimize (min. or max.) objective/cost function

Generate error signal that measures difference between
predictions and target values

Use error signal to change the weights and get more

accurate predictions
Subtracting a fraction of the gradient moves you towards
the (local) minimum of the cost function
https://medium.com/@ramrajchandradevan/the-evolution-of-gradient-descend-optimization-algorithm-4106a6702d39
Gradient Descent
objective/cost function

Update each element of θ

Matrix notation for all parameters

learning rate

Recursively apply chain rule though each node

One forward pass
Text (input) representation
TFIDF
Word embeddings
….

0.2 -0.5 0.1 0.1 1.0 0.95 very positive

2.0 1.5 1.3 3.0 3.89 positive
0.2 0.025
0.5 0.0 0.25 0.15 negative
-0.3 2.0 0.0 0.37 very negative
0.3 0.0
Activation functions
Non-linearities needed to learn complex (non-linear) representations of data, otherwise
the NN would be just a linear function

http://cs231n.github.io/assets/nn1/layer_sizes.jpeg

More layers and neurons can approximate more complex functions

Full list:
Activation: Sigmoid
Takes a real-valued number and
“squashes” it into range between 0 and
1.

http://adilmoujahid.com/images/activation.png

+ Nice interpretation as the firing rate of a neuron

• 0 = not firing at all
• 1 = fully firing

- Sigmoid neurons saturate and kill gradients, thus NN will barely learn

� gradient at these regions almost zero

• when the neuron’s activation are 0 or 1 (saturate)

� almost no signal will flow to its weights

� if initial weights are too large then most neurons would saturate
Activation: Tanh
Takes a real-valued number and
“squashes” it into range between -1 and
1.

http://adilmoujahid.com/images/activation.png

- Like sigmoid, tanh neurons saturate

- Unlike sigmoid, output is zero-centered
- Tanh is a scaled sigmoid:
Activation: ReLU
Takes a real-valued number and
thresholds it at zero

http://adilmoujahid.com/images/activation.png

Most Deep Networks use ReLU nowadays

� Trains much faster

• accelerates the convergence of SGD

� Less expensive operations

• due to linear, non-saturating form

• compared to sigmoid/tanh (exponentials etc.)

� More expressive
• implemented by simply thresholding a matrix at zero

� Prevents the gradient vanishing problem

Overfitting

http://wiki.bethanycrane.com/overfitting-of-data

Learned hypothesis may fit the

training data very well, even
outliers (noise) but fail to generalize
to new examples (test data)

https://www.neuraldesigner.com/images/learning/selection_error.svg
Regularization
Dropout
• Randomly drop units (along with their
connections) during training
• Each unit retained with fixed probability p,
independent of other units
• Hyper-parameter p to be chosen (tuned)
Srivastava, Nitish, et al. Journal of machine learning research (2014)

L2 = weight decay
• Regularization term that penalizes big weights, added to the
objective
• Weight decay value determines how dominant regularization is during
gradient computation
• Big weight decay coefficient  big penalty for big weights

Early-stopping
• Use validation error to decide when to stop training
• Stop when monitored quantity has not improved after n subsequent epochs
• n is called patience
Tuning hyper-parameters
g(x) ≈ g(x) + h(y)

g(x) shown in green

h(y) is shown in yellow
Bergstra, James, and Yoshua Bengio. "" Journal
of Machine Learning Research, Feb (2012)

“Grid and random search of 9 trials for optimizing function g(x) ≈ g(x) + h(y)
With grid search, nine trials only test g(x) in three distinct places.
With random search, all nine trials explore distinct values of g. ”

Both try configurations randomly and blindly

Next trial is independent to all the trials done before

Make smarter choice for the next trial, minimize the number of trials
1. Collect the performance at several configurations
2. Make inference and decide what configuration to try next
Loss functions and output
Classification Regression

Training Rn x {class_1, ..., class_n} Rn x R m

examples (one-hot encoding)

Output Soft-max Linear (Identity)

Layer [map R to a probability distribution]
n
or Sigmoid

f(x)=x

Cost (loss) Cross-entropy Mean Squared Error

function

Mean Absolute Error

Convolutional Neural
Networks (CNNs)
Main CNN idea for text:
Compute vectors for n-grams and group them afterwards

Example: “this takes too long” compute vectors for:

This takes, takes too, too long, this takes too, takes too long, this takes too long

Convolutional
Input matrix 3x3 filter
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
Convolutional Neural
Networks (CNNs)
Main CNN idea for text:
Compute vectors for n-grams and group them afterwards

max pool
2x2 filters
and stride 2

https://shafeentejani.github.io/assets/images/pooling.gif
CNN for text classification

Severyn, Aliaksei, and Alessandro Moschitti. "UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment
Classification." SemEval@ NAACL-HLT. 2015.
CNN with multiple filters

Kim, Y. “Convolutional Neural Networks for Sentence Classification”, EMNLP (2014)

sliding over 3, 4 or 5 words at a time

Recurrent Neural Networks
(RNNs)
Main RNN idea for text:
Condition on all previous words
Use same set of weights at all time steps

https://pbs.twimg.com/media/C2j-8j5UsAACgEK.jpg

� Stack them up, Lego fun!

https://discuss.pytorch.org/uploads/default/original/1X/6415da0424dd66f2f5b134709b92baa59e604c55.jpg
Bidirectional RNNs
Main idea: incorporate both left and right context
output may not only depend on the previous elements in the sequence, but also future
elements.

past and future around a single token

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-
part-1-introduction-to-rnns/

two RNNs stacked on top of each other

output is computed based on the hidden state of both RNNs
Sequence 2 Sequence or
Encoder Decoder model

Cho, Kyunghyun, et al. "Learning phrase

representations using RNN encoder-decoder for
statistical machine translation." EMNLP 2014
Gated Recurrent Units
(GRUs)
Main idea:
keep around memory to capture long dependencies
Allow error messages to flow at different strengths depending on the inputs
Standard RNN computes hidden layer at next time step
directly

Compute an update gate based on current input word

vector and hidden state
http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-
4-implementing-a-grulstm-rnn-with-python-and-theano/

Controls how much of past state should matter now

If z close to 1, then we can copy information in that unit through many steps!
Gated Recurrent Units
(GRUs)
Main idea:
keep around memory to capture long dependencies
Allow error messages to flow at different strengths depending on the inputs
Standard RNN computes hidden layer at next time step
directly

Compute an update gate based on current input word

vector and hidden state
http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-
4-implementing-a-grulstm-rnn-with-python-and-theano/

Compute a reset gate similarly but with different weights

If reset close to 0, ignore previous hidden
state (allows model to drop information that is
irrelevant in the future)

Units with short-term dependencies often have reset gates very active
Units with long-term dependencies have active update gates z
Gated Recurrent Units
(GRUs)
Main idea:
keep around memory to capture long dependencies
Allow error messages to flow at different strengths depending on the inputs
Standard RNN computes hidden layer at next time step
directly

Compute an update gate based on current input word

vector and hidden state
http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-
4-implementing-a-grulstm-rnn-with-python-and-theano/

Compute a reset gate similarly but with different weights

are a more complex form, but basically

New memory same intuition
GRUs are often more preferred than
Final memory LSTMs
combines current & previous time steps
Attention Mechanism
Pool of source states

Bahdanau D. et al. "Neural machine translation by jointly learning to align and translate." ICLR (2015)

Main idea: retrieve as needed

Attention - Scoring

Compare target and source hidden states

Attention - Normalization

Convert into alignment weights

Attention - Context

Build context vector: weighted average

Attention - Context

Compute next hidden state

Application Example:
IMDB Movie reviews
sentiment classification
https://uofi.box.com/v/cs510DL

Binary Classification
Dataset of 25,000 movies reviews from IMDB, labeled by
sentiment (positive/negative)
Application Example:
Relation Extraction from text

http://www.mathcs.emory.edu/~dsavenk/slides/relation_extraction/img/distant.png
Useful for:
• knowledge base completion
• social media analysis
• question answering
• …
Task: binary (or multi-class)
classification
sentence S = w1 w2 .. e1 .. wj .. e2 .. wn e1 and e2 entities

“The new iPhone 7 Plus includes an improved camera to take amazing pictures”

Component-Whole(e1 , e2 ) ?
YES / NO

It is also possible to include more than two entities as well:

“At codons 12, the occurrence of point mutations from G to T were observed”
 point mutation(codon, 12, G, T)
Features / Input
representation
1) context-wise split of Embeddings Embeddings
the sentence Embeddings Left
Middle Right

The new iPhone 7 Plus includes an improved camera that takes amazing pictures

2) word sequences Word indices Position indices e1 Position indices e2

concatenated with [5, 7, 12, 6, 90 …] [-1, 0, 1, 2, 3 …] [-4, -3, -2 -1, 0]
positional features
Word Embeddings Positional emb. e1 Positional emb. e2

3) concatenating
embeddings of two entities
with average of word Embeddings e1 Embeddings e2 context embeddings
embeddings for rest of the
words

The new iPhone 7 Plus includes an improved camera that takes amazing pictures
Models: MLP
Component-Whole(e1 , e2 ) ?
Sigmoid YES / NO

Dense Layer n
…

Dense Layer 1

Embeddings e1 context embeddings Embeddings e2

The new iPhone 7 Plus includes an improved camera that takes …

Simple fully-connected multi-layer perceptron

Models: CNN
Component-Whole(e1 , e2 ) ?
Sigmoid
YES / NO

Max Pooling Max Pooling Max Pooling

Convolutional Layer Convolutional Layer Convolutional Layer

Embeddings Left Embeddings Middle Embeddings Right

The new iPhone 7 Plus includes an improved camera that takes …

OR
Word indices Position indices e1 Position indices e2
[5, 7, 12, 6, 90 …] [-1, 0, 1, 2, 3 …] [-4, -3, -2 -1, 0]

Word Embeddings Positional emb. e1 Positional emb. e2

Zeng, D.et al. “Relation classication via convolutional deep neural network”.COLING
(2014)
Models: CNN (2)
Component-Whole(e1 , e2 ) ?
Sigmoid
YES / NO

Max Max Max

Pooling Pooling Pooling

Convoluti Convoluti Convoluti

on on on CNN with multiple CNN with multiple
filter=2 filter=3 filter=k filter sizes filter sizes

Embeddings Left Embeddings Middle Embeddings Right

The new iPhone 7 Plus includes an improved camera that takes …

OR
Word indices Position indices e1 Position indices e2
[5, 7, 12, 6, 90 …] [-1, 0, 1, 2, 3 …] [-4, -3, -2 -1, 0]

Word Embeddings Positional emb. e1 Positional emb. e2

Nguyen, T.H., Grishman, R. “Relation extraction: Perspective from convolutional neural networks.” VS@ HLT-NAACL. (2015)
Models: Bi-GRU
Component-Whole(e1 , e2 ) ?
Sigmoid
YES / NO

Attention or
Max Pooling

Bi-GRU

Embeddings Left Embeddings Middle Embeddings Right

The new iPhone 7 Plus includes an improved camera that takes …

OR
Word indices Position indices e1 Position indices e2
[5, 7, 12, 6, 90 …] [-1, 0, 1, 2, 3 …] [-4, -3, -2 -1, 0]

Word Embeddings Positional emb. e1 Positional emb. e2

Zhang, D., Wang, D. “Relation classication via recurrent neural network.” -arXiv preprint arXiv:1508.01006 (2015)
Zhou, P. et al. “Attention-based bidirectional LSTM networks for relation classication. ACL (2016)
Distant Supervision
Circumvent the annotation problem – create large dataset
Exploit large knowledge bases to automatically label entities and their relations in text

Assumption:
when two entities co-occur in a sentence, a certain relation is expressed

knowledge base
Relation Entity 1 Entity 2 text
place of birth Michael Gary Barack Obama moved from Gary ….
Jackson Michael Jackson met … in Hawaii
place of birth Barack Hawaii
Obama
place of birth
… … …

For many ambiguous relations, mere co-occurrence does not guarantee the existence of the
relation  Distant supervision produces false positives
Attention over Instances
s representation of the sentence set

ai weight given by sentence-level attention

xi sentence vector representation

xi sentence for an entity pair (e1,e2)

n sentences for relation r(e1,e2)

Lin et al. “Neural Relation Extraction with Selective Attention over Instances” ACL (2016)
Sentence-level ATT results
NYT10 Dataset
Align Freebase relations with New
York Times corpus (NYT)
53 possible relationships
+NA (no relation between entities)

Data sentences entity pairs

Training 522,611 281,270

Test 172,448 96,678

Lin et al. “Neural Relation Extraction with Selective Attention over Instances” ACL (2016)
References
 Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of machine
learning research (2014)
 Bergstra, James, and Yoshua Bengio. "Random search for hyper-parameter optimization." Journal of Machine
Learning Research, Feb (2012)
 Kim, Y. “Convolutional Neural Networks for Sentence Classification”, EMNLP (2014)
 Severyn, Aliaksei, and Alessandro Moschitti. "UNITN: Training Deep Convolutional Neural Network for
Twitter Sentiment Classification." SemEval@ NAACL-HLT (2015)
 Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine
translation." EMNLP (2014)
 Ilya Sutskever et al. “Sequence to sequence learning with neural networks.” NIPS (2014)
 Bahdanau et al. "Neural machine translation by jointly learning to align and translate." ICLR (2015)
 Gal, Y., Islam, R., Ghahramani, Z. “Deep Bayesian Active Learning with Image Data.” ICML (2017)
 Nair, V., Hinton, G.E. “Rectified linear units improve restricted boltzmann machines.” ICML (2010)
 Ronan Collobert, et al. “Natural language processing (almost) from scratch.” JMLR (2011)

 Kumar, Shantanu. "A Survey of Deep Learning Methods for Relation Extraction." arXiv preprint arXiv:1705.03645 (2017)
 Lin et al. “Neural Relation Extraction with Selective Attention over Instances” ACL (2016) [code]
 Zeng, D.et al. “Relation classification via convolutional deep neural network”. COLING (2014)
 Nguyen, T.H., Grishman, R. “Relation extraction: Perspective from CNNs.” VS@ HLT-NAACL. (2015)
 Zhang, D., Wang, D. “Relation classification via recurrent NN.” -arXiv preprint arXiv:1508.01006 (2015)
 Zhou, P. et al. “Attention-based bidirectional LSTM networks for relation classification . ACL (2016)
 Mike Mintz et al. “Distant supervision for relation extraction without labeled data.” ACL- IJCNLP (2009)
References & Resources
 http://web.stanford.edu/class/cs224n
 https://www.coursera.org/specializations/deep-learning
 https://chrisalbon.com/#Deep-Learning
 http://www.asimovinstitute.org/neural-network-zoo
 http://cs231n.github.io/optimization-2
 https
://medium.com/@ramrajchandradevan/the-evolution-of-gradient-descend-optimization-algorithm-4106a6702d39
 https://arimo.com/data-science/2016/bayesian-optimization-hyperparameter-tuning
 http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow
 http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp
 https://
medium.com/technologymadeeasy/the-best-explanation-of-convolutional-neural-networks-on-the-internet-fbb8b1ad
5df8

 http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
 http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python
-and-theano/
 http://colah.github.io/posts/2015-08-Understanding-LSTMs
 https://github.com/hyperopt/hyperopt
 https://github.com/tensorflow/nmt
https://giphy.com/gifs/thanks-thank-you-thnx-3o6ozuHcxTtVWJJn32/download

Deep Learning Tutorial
No ratings yet
Deep Learning Tutorial
133 pages
Deep Learning For Computer Vision
No ratings yet
Deep Learning For Computer Vision
125 pages
DEU CSC5045 Intelligent System Applications Using Fuzzy - 7+Deep+Learning
No ratings yet
DEU CSC5045 Intelligent System Applications Using Fuzzy - 7+Deep+Learning
108 pages
Neural Network and Deep Learning 1736802600
No ratings yet
Neural Network and Deep Learning 1736802600
54 pages
2.game AI 1
No ratings yet
2.game AI 1
268 pages
DL Concepts 1 Overview
No ratings yet
DL Concepts 1 Overview
80 pages
Unit II
No ratings yet
Unit II
56 pages
Python Machine Learning By Example: Unlock machine learning best practices with real-world use cases
From Everand
Python Machine Learning By Example: Unlock machine learning best practices with real-world use cases
Yuxi (Hayden) Liu
No ratings yet
Deep Learning
No ratings yet
Deep Learning
90 pages
Simplifying Neural Networks and Deep Learning Basics!
No ratings yet
Simplifying Neural Networks and Deep Learning Basics!
27 pages
Deep Learning
No ratings yet
Deep Learning
95 pages
Lect 12 - Deep Feed Forward NN - Review
No ratings yet
Lect 12 - Deep Feed Forward NN - Review
93 pages
DNN Merged Sugata
No ratings yet
DNN Merged Sugata
243 pages
Deep Learning Midsem Merged Previous Batch
No ratings yet
Deep Learning Midsem Merged Previous Batch
423 pages
Lec 2
No ratings yet
Lec 2
42 pages
Deep Learning in Neural Networks An Overview
No ratings yet
Deep Learning in Neural Networks An Overview
89 pages
Convolutional Neural Network (CNN)
No ratings yet
Convolutional Neural Network (CNN)
68 pages
7 Deep Learning
No ratings yet
7 Deep Learning
75 pages
DL Unit 1
No ratings yet
DL Unit 1
200 pages
Deep Learning Day 27
No ratings yet
Deep Learning Day 27
43 pages
Midterm Study Guide Csci566
No ratings yet
Midterm Study Guide Csci566
20 pages
Deep Learning For NLP
No ratings yet
Deep Learning For NLP
78 pages
Deep Learning
No ratings yet
Deep Learning
49 pages
Practical Research 2: Quarter 1 - Module 1 Nature of Inquiry and Research
100% (5)
Practical Research 2: Quarter 1 - Module 1 Nature of Inquiry and Research
13 pages
Deep Learing
No ratings yet
Deep Learing
37 pages
Unit 1
No ratings yet
Unit 1
70 pages
Deep Learning Report For Students
No ratings yet
Deep Learning Report For Students
32 pages
Introduction To Deep Learning: Nandita Bhaskhar
No ratings yet
Introduction To Deep Learning: Nandita Bhaskhar
56 pages
MLT Unit 4 and 5 Part 2
No ratings yet
MLT Unit 4 and 5 Part 2
34 pages
ML06 Neural-Network 2024-2025
No ratings yet
ML06 Neural-Network 2024-2025
78 pages
3 - DeepLearning - and - CNN v3
No ratings yet
3 - DeepLearning - and - CNN v3
50 pages
Sony KDL-32EX720 AZ2F
No ratings yet
Sony KDL-32EX720 AZ2F
160 pages
Deep Learning Tutorial: Reference: Hung-Yi Lee
100% (1)
Deep Learning Tutorial: Reference: Hung-Yi Lee
179 pages
DL Intro
No ratings yet
DL Intro
64 pages
Static and Dynamic Testing
100% (1)
Static and Dynamic Testing
51 pages
BMM 2018 - Deep Learning Tutorial
No ratings yet
BMM 2018 - Deep Learning Tutorial
47 pages
Deep Learning 2 July 2014
No ratings yet
Deep Learning 2 July 2014
75 pages
CP4252 ML Unit - V
No ratings yet
CP4252 ML Unit - V
17 pages
AI Chapter 4
No ratings yet
AI Chapter 4
63 pages
Eng PPT Tech
No ratings yet
Eng PPT Tech
18 pages
Deep Learning With Tensorflow
100% (1)
Deep Learning With Tensorflow
70 pages
Artificial Intelligence 2024 Book 2 of 2: AI, #2
From Everand
Artificial Intelligence 2024 Book 2 of 2: AI, #2
Yang Yen Thaw
No ratings yet
UNIT - 5 Lecture 2
No ratings yet
UNIT - 5 Lecture 2
26 pages
23 DeepLearning PDF
No ratings yet
23 DeepLearning PDF
74 pages
Foundations of Machine Learning: Module 6: Neural Network
No ratings yet
Foundations of Machine Learning: Module 6: Neural Network
22 pages
Data Lake and Data Warehouse
100% (2)
Data Lake and Data Warehouse
24 pages
SI+410+V1.11 User+Manual
No ratings yet
SI+410+V1.11 User+Manual
58 pages
Artificial Intelligence - Chapter 7
No ratings yet
Artificial Intelligence - Chapter 7
18 pages
Deep Learning and Applications: Pham The Bao Ptbao@sgu - Edu.vn
No ratings yet
Deep Learning and Applications: Pham The Bao Ptbao@sgu - Edu.vn
43 pages
Deep Learning
100% (3)
Deep Learning
32 pages
The Deep Learning Revolution: Introductory Overview Lecture
No ratings yet
The Deep Learning Revolution: Introductory Overview Lecture
35 pages
Survey Questionnaire Group 1
100% (1)
Survey Questionnaire Group 1
7 pages
Artificial Intelligence Interview Questions
From Everand
Artificial Intelligence Interview Questions
Tech Interviews
5/5 (2)
Artificial Intelligence Algorithms
From Everand
Artificial Intelligence Algorithms
akosnemeth
No ratings yet
Deep Learning Fundamentals
No ratings yet
Deep Learning Fundamentals
19 pages
Machine Learning
No ratings yet
Machine Learning
11 pages
Deep Learning in Neural Networks: An Overview
No ratings yet
Deep Learning in Neural Networks: An Overview
31 pages
Introduction To Deep Learning: TA: Drew Hudson May 8, 2020
No ratings yet
Introduction To Deep Learning: TA: Drew Hudson May 8, 2020
33 pages
Lecture Home and Building Automation
No ratings yet
Lecture Home and Building Automation
47 pages
Notes DL-1
No ratings yet
Notes DL-1
10 pages
An Introduction To Neural Networks: Instituto Tecgraf PUC-Rio Nome: Fernanda Duarte Orientador: Marcelo Gattass
No ratings yet
An Introduction To Neural Networks: Instituto Tecgraf PUC-Rio Nome: Fernanda Duarte Orientador: Marcelo Gattass
45 pages
SHM
No ratings yet
SHM
24 pages
Deep Learning
No ratings yet
Deep Learning
10 pages
Lab 99
No ratings yet
Lab 99
18 pages
Deep Learning (DL) - Comprehensive Summary
No ratings yet
Deep Learning (DL) - Comprehensive Summary
9 pages
Deep Learning
100% (2)
Deep Learning
49 pages
Forein03 PDF
No ratings yet
Forein03 PDF
20 pages
Above. Ahead. Always.: All Terrain Cranes. Crawler Cranes
No ratings yet
Above. Ahead. Always.: All Terrain Cranes. Crawler Cranes
12 pages
Deep Learnig
No ratings yet
Deep Learnig
16 pages
Deep Learning Concise Notes
No ratings yet
Deep Learning Concise Notes
4 pages
GPI Trimec DP Manual
No ratings yet
GPI Trimec DP Manual
8 pages
Explanation - 8
No ratings yet
Explanation - 8
3 pages
Deep Neural Network
No ratings yet
Deep Neural Network
12 pages
Sinamics Startdrive v13 Sp1
No ratings yet
Sinamics Startdrive v13 Sp1
1 page
Deloitte
No ratings yet
Deloitte
12 pages
System Admin Resume
100% (1)
System Admin Resume
7 pages
Aplikasi Sap 2000 Untuk Rancang Bangun Konstruksi Baja Ringan Bentang Panjang
No ratings yet
Aplikasi Sap 2000 Untuk Rancang Bangun Konstruksi Baja Ringan Bentang Panjang
11 pages
AEIF 2024 Proposal Forms
No ratings yet
AEIF 2024 Proposal Forms
10 pages
Pse
No ratings yet
Pse
7 pages
ML 03 Clustering
No ratings yet
ML 03 Clustering
63 pages
DL 02 Basics
No ratings yet
DL 02 Basics
94 pages
1 s2.0 S0264127522001903 Main
No ratings yet
1 s2.0 S0264127522001903 Main
7 pages
ALFRD Instruction Manualv5.2
No ratings yet
ALFRD Instruction Manualv5.2
12 pages
AC-DC LCD TV Power Architecture and LED Backlight
No ratings yet
AC-DC LCD TV Power Architecture and LED Backlight
53 pages
FE Chapter 1
No ratings yet
FE Chapter 1
9 pages
FE Chapter 1
No ratings yet
FE Chapter 1
9 pages
Fire Resistance Loaded Fire Tests FINAL04012024
No ratings yet
Fire Resistance Loaded Fire Tests FINAL04012024
3 pages
Zero-Backlash Rack-Pinion-Gearbox Systems - Servotak
No ratings yet
Zero-Backlash Rack-Pinion-Gearbox Systems - Servotak
2 pages
Orthogonal Time-Frequency Space Modulation A Promising Next-Generation Waveform
No ratings yet
Orthogonal Time-Frequency Space Modulation A Promising Next-Generation Waveform
9 pages
Credit GrantingA Comparative Analysis of
No ratings yet
Credit GrantingA Comparative Analysis of
18 pages
Forensic Camera Classification - Verification of Sensor Pattern Noise Approach
No ratings yet
Forensic Camera Classification - Verification of Sensor Pattern Noise Approach
16 pages
Meeting Report: Previous Works On-Going Works
No ratings yet
Meeting Report: Previous Works On-Going Works
2 pages
Fe Quiz: Name: - Date: - Class: - Score: - /5
No ratings yet
Fe Quiz: Name: - Date: - Class: - Score: - /5
1 page
Meeting Report
No ratings yet
Meeting Report
1 page
Me Yrk
No ratings yet
Me Yrk
3 pages
The Crew The Deimos Adventure 3
No ratings yet
The Crew The Deimos Adventure 3
1 page
Mangueira PVC - S470 General Purpose
No ratings yet
Mangueira PVC - S470 General Purpose
1 page
Macro - Salty? (1) .MDR
No ratings yet
Macro - Salty? (1) .MDR
3 pages
Computer Science
No ratings yet
Computer Science
3 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Deepnet Lourentzou

Uploaded by

Deepnet Lourentzou

Uploaded by

Introductio

Most material from

Methods that can learn from and make predictions on data

Unsupervised: Discover patterns in unlabeled data

Reinforcement learning: learn to act based on feedback/reward

Classification Regression Clustering

In ~2010 DL started outperforming other

Several big improvements in recent years in NLP

4 + 2 = 6 neurons (not counting inputs)

Optimize (min. or max.) objective/cost function

Use error signal to change the weights and get more

Update each element of θ

Matrix notation for all parameters

Recursively apply chain rule though each node

0.2 -0.5 0.1 0.1 1.0 0.95 very positive

More layers and neurons can approximate more complex functions

+ Nice interpretation as the firing rate of a neuron

� gradient at these regions almost zero

� almost no signal will flow to its weights

- Like sigmoid, tanh neurons saturate

Most Deep Networks use ReLU nowadays

� Trains much faster

� Less expensive operations

• compared to sigmoid/tanh (exponentials etc.)

� Prevents the gradient vanishing problem

Learned hypothesis may fit the

g(x) shown in green

Both try configurations randomly and blindly

Training Rn x {class_1, ..., class_n} Rn x R m

Output Soft-max Linear (Identity)

Cost (loss) Cross-entropy Mean Squared Error

Mean Absolute Error

Example: “this takes too long” compute vectors for:

Kim, Y. “Convolutional Neural Networks for Sentence Classification”, EMNLP (2014)

sliding over 3, 4 or 5 words at a time

� Stack them up, Lego fun!

past and future around a single token

two RNNs stacked on top of each other

Cho, Kyunghyun, et al. "Learning phrase

Compute an update gate based on current input word

Controls how much of past state should matter now

Compute an update gate based on current input word

Compute a reset gate similarly but with different weights

Compute an update gate based on current input word

Compute a reset gate similarly but with different weights

are a more complex form, but basically

Main idea: retrieve as needed

Compare target and source hidden states

Convert into alignment weights

Build context vector: weighted average

Compute next hidden state

It is also possible to include more than two entities as well:

2) word sequences Word indices Position indices e1 Position indices e2

Embeddings e1 context embeddings Embeddings e2

The new iPhone 7 Plus includes an improved camera that takes …

Simple fully-connected multi-layer perceptron

Max Pooling Max Pooling Max Pooling

Convolutional Layer Convolutional Layer Convolutional Layer

Embeddings Left Embeddings Middle Embeddings Right

The new iPhone 7 Plus includes an improved camera that takes …

Word Embeddings Positional emb. e1 Positional emb. e2

Max Max Max

Convoluti Convoluti Convoluti

Embeddings Left Embeddings Middle Embeddings Right

The new iPhone 7 Plus includes an improved camera that takes …

Word Embeddings Positional emb. e1 Positional emb. e2

Embeddings Left Embeddings Middle Embeddings Right

The new iPhone 7 Plus includes an improved camera that takes …

Word Embeddings Positional emb. e1 Positional emb. e2

ai weight given by sentence-level attention

xi sentence vector representation

xi sentence for an entity pair (e1,e2)

Data sentences entity pairs

Training 522,611 281,270

Test 172,448 96,678

You might also like