TensorFlow Chatbot
CS 20SI:
TensorFlow for Deep Learning Research
Lecture 13
3/1/2017
1
Announcements
Assignment 3 out tonight, due March 17
3
Agenda
Seq2seq
Implementation keys
Chatbot craze
4
Sequence to Sequence
● The current model class of choice for most dialogue and
machine translation systems
● Introduced by Cho et al. in 2014 for Statistical Machine
Translation (the predecessor of NMT)
● The paper “Learning Phrase Representations using RNN
Encoder-Decoder for Statistical Machine Translation” has
been cited about 900 times, roughly one citing paper per day.
● Originally called the “RNN Encoder-Decoder”
5
Sequence to Sequence
Consists of two recurrent neural networks (RNNs):
● Encoder maps a variable-length source sequence (input) to a
fixed-length vector
● Decoder maps the vector representation back to a variable-length
target sequence (output)
● Two RNNs are trained jointly to maximize the conditional probability
of the target sequence given a source sequence
6
Vanilla Encoder and Decoder
Graph from “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation” (Cho et al.)
7
Encoder and Decoder in TensorFlow
● Each box in the picture represents a cell of the RNN, most commonly
a GRU cell or an LSTM cell.
● Encoder and decoder often have different weights, but sometimes
they can share weights.
9
Graph by Indico.io blog
Bucketing
● Avoid too much padding that leads to extraneous computation
● Group sequences of similar lengths into the same buckets
● Create a separate subgraph for each bucket
● In theory (with TF v1.0), you can use:
tf.contrib.training.bucket_by_sequence_length(max_length, examples,
    batch_size, bucket_boundaries, capacity=2 * batch_size, dynamic_pad=True)
● In practice, use the bucketing algorithm from TensorFlow’s translate model, since we’re on v0.12 (a sketch follows below)
12
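As a concrete illustration, here is a minimal sketch of translate-model-style bucketing in plain Python. The bucket list, PAD_ID, and the helper name put_in_bucket are illustrative rather than the assignment’s actual code; the real translate model additionally reverses the encoder inputs and prepends a GO token to the decoder inputs, which this sketch leaves out.

# Sketch of the bucket-assignment step, in the style of TensorFlow's translate model.
# `buckets` lists (max_source_len, max_target_len) pairs; PAD_ID is an assumed padding id.
PAD_ID = 0
buckets = [(8, 10), (12, 14), (16, 19)]

def put_in_bucket(source_ids, target_ids):
    """Return (bucket_id, padded_source, padded_target), or None if the pair is too long."""
    for bucket_id, (src_size, tgt_size) in enumerate(buckets):
        if len(source_ids) <= src_size and len(target_ids) <= tgt_size:
            padded_src = source_ids + [PAD_ID] * (src_size - len(source_ids))
            padded_tgt = target_ids + [PAD_ID] * (tgt_size - len(target_ids))
            return bucket_id, padded_src, padded_tgt
    return None  # longer than every bucket: drop or truncate the pair

# A 5-token question and a 7-token answer land in bucket 0 and get padded to (8, 10).
print(put_in_bucket([4, 8, 15, 16, 23], [42, 7, 7, 7, 7, 7, 2]))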
Sampled Softmax
● Avoid the growing complexity of computing the normalization
constant
● Approximate the negative term of the gradient by importance sampling with a small number of samples.
● At each step, update only the vectors associated with the
correct word w and with the sampled words in V’
● Once training is over, use the full target vocabulary to compute
the output probability of each target word
On Using Very Large Target Vocabulary for Neural Machine Translation (Jean et al., 2015)
13
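A hedged sketch of how the sampled loss is typically wired up in TensorFlow with tf.nn.sampled_softmax_loss. The sizes and names (vocab_size, num_samples, proj_w, proj_b) are illustrative, and both the argument order of sampled_softmax_loss and the signature expected of softmax_loss_function changed between TF versions, so check the API of the version you use.

import tensorflow as tf

vocab_size = 24000    # illustrative decoder vocabulary size
hidden_size = 512     # illustrative cell size
num_samples = 512     # number of sampled classes per step, much smaller than vocab_size

# Projection from cell outputs to vocabulary logits; also reused as output_projection.
w = tf.get_variable("proj_w", [hidden_size, vocab_size])
b = tf.get_variable("proj_b", [vocab_size])
output_projection = (w, b)

def sampled_loss(labels, inputs):
    # `inputs` are the pre-projection cell outputs; sampled_softmax_loss applies the
    # projection internally and normalizes over only num_samples sampled classes.
    labels = tf.reshape(labels, [-1, 1])
    return tf.nn.sampled_softmax_loss(
        weights=tf.transpose(w),   # expects shape [vocab_size, hidden_size]
        biases=b,
        labels=labels,
        inputs=inputs,
        num_sampled=num_samples,
        num_classes=vocab_size)

# Pass sampled_loss as softmax_loss_function when building the seq2seq loss (slide 20).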
Sampled Softmax
● Generally an underestimate of the full softmax loss.
● At inference time, compute the full softmax over the entire target vocabulary to get the output probability of each word.
15
Seq2seq in TensorFlow
outputs, states = basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell)
16
Seq2seq in TensorFlow
outputs, states = embedding_rnn_seq2seq(encoder_inputs,
decoder_inputs,
cell,
num_encoder_symbols,
num_decoder_symbols,
embedding_size,
output_projection=None,
feed_previous=False)
To embed the inputs and outputs, you need to specify the number of input and output tokens (num_encoder_symbols, num_decoder_symbols).
feed_previous: if True, the decoder is fed its own previous prediction as the next input (instead of the ground-truth token), even when the model makes mistakes.
output_projection: a (weight, bias) pair that projects cell outputs to vocabulary logits; needed when using sampled softmax (see the sketch below).
18
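A sketch (not the assignment’s exact code) of how these two arguments are typically used together: build the training graph with feed_previous=False, build a second variable-sharing graph with feed_previous=True for chatting, and project the outputs yourself when output_projection is set. Module paths assume TF v0.12 as used in the lecture (tf.nn.seq2seq, tf.nn.rnn_cell); in v1.x they moved to tf.contrib.legacy_seq2seq and tf.contrib.rnn.

import tensorflow as tf

# Illustrative sizes, not the assignment's hyperparameters.
enc_vocab, dec_vocab, embed_size, hidden_size = 20000, 20000, 256, 256
bucket_src, bucket_tgt = 8, 10

cell = tf.nn.rnn_cell.GRUCell(hidden_size)
encoder_inputs = [tf.placeholder(tf.int32, [None], name="enc%d" % i) for i in range(bucket_src)]
decoder_inputs = [tf.placeholder(tf.int32, [None], name="dec%d" % i) for i in range(bucket_tgt)]

w = tf.get_variable("proj_w", [hidden_size, dec_vocab])   # projection used with sampled softmax
b = tf.get_variable("proj_b", [dec_vocab])
output_projection = (w, b)

with tf.variable_scope("seq2seq") as scope:
    # Training graph: the decoder sees the ground-truth previous token.
    train_outputs, _ = tf.nn.seq2seq.embedding_rnn_seq2seq(
        encoder_inputs, decoder_inputs, cell,
        num_encoder_symbols=enc_vocab, num_decoder_symbols=dec_vocab,
        embedding_size=embed_size, output_projection=output_projection,
        feed_previous=False)
    scope.reuse_variables()
    # Chatting graph: the decoder feeds its own previous prediction back in.
    chat_outputs, _ = tf.nn.seq2seq.embedding_rnn_seq2seq(
        encoder_inputs, decoder_inputs, cell,
        num_encoder_symbols=enc_vocab, num_decoder_symbols=dec_vocab,
        embedding_size=embed_size, output_projection=output_projection,
        feed_previous=True)

# With output_projection set, outputs are raw cell outputs rather than logits:
# project them before picking the next word at chat time.
chat_logits = [tf.matmul(o, w) + b for o in chat_outputs]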
Seq2seq in TensorFlow
outputs, states = embedding_attention_seq2seq(encoder_inputs,
decoder_inputs,
cell,
num_encoder_symbols,
num_decoder_symbols,
num_heads=1,
output_projection=None,
feed_previous=False,
initial_state_attention=False)
19
Wrapper for seq2seq with buckets
outputs, losses = model_with_buckets(encoder_inputs,
decoder_inputs,
targets,
weights,
buckets,
seq2seq,
softmax_loss_function=None,
per_example_loss=False)
20
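A sketch of how the pieces fit together, again with illustrative sizes and TF v0.12 module paths: model_with_buckets takes a seq2seq constructor (wrapped in a small function) plus the sampled-softmax loss from slide 13, and returns one list of outputs and one loss per bucket.

import tensorflow as tf

buckets = [(8, 10), (12, 14), (16, 19)]
vocab_size, hidden_size = 20000, 256
cell = tf.nn.rnn_cell.GRUCell(hidden_size)

# One placeholder per time step, sized for the largest bucket.
max_src, max_tgt = buckets[-1]
encoder_inputs = [tf.placeholder(tf.int32, [None], name="enc%d" % i) for i in range(max_src)]
decoder_inputs = [tf.placeholder(tf.int32, [None], name="dec%d" % i) for i in range(max_tgt + 1)]
targets = decoder_inputs[1:]   # targets are the decoder inputs shifted by one
target_weights = [tf.placeholder(tf.float32, [None], name="w%d" % i) for i in range(max_tgt)]

def seq2seq_f(enc, dec):
    # Any of the seq2seq constructors fits here; the attention version (slide 19) is shown.
    return tf.nn.seq2seq.embedding_attention_seq2seq(
        enc, dec, cell,
        num_encoder_symbols=vocab_size, num_decoder_symbols=vocab_size,
        embedding_size=hidden_size, feed_previous=False)

outputs, losses = tf.nn.seq2seq.model_with_buckets(
    encoder_inputs, decoder_inputs, targets, target_weights,
    buckets, seq2seq_f,
    softmax_loss_function=None)   # plug in the sampled-softmax loss function here

# outputs[i] and losses[i] belong to bucket i: at each training step, pick a bucket,
# build a batch padded to that bucket's size, and run that bucket's loss/optimizer.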
Our TensorFlow chatbot
21
Cornell Movie-Dialogs Corpus
● 220,579 conversational exchanges between 10,292 pairs of movie characters
● 9,035 characters from 617 movies
● 304,713 total utterances
● Very well-formatted (almost perfect)
22
Input Length Distribution
23
Bucketing
9 buckets
[(6, 8), (8, 10), (10, 12), (13, 15), (16, 19), (19, 22), (23, 26), (29, 32), (39, 44)] # bucket boundaries
[19530, 17449, 17585, 23444, 22884, 16435, 17085, 18291, 18931] # number of samples in each bucket
5 buckets
[(8, 10), (12, 14), (16, 19), (23, 26), (39, 43)] # bucket boundaries
[37049, 33519, 30223, 33513, 37371] # number of samples in each bucket
3 buckets - recommended
[(8, 10), (12, 14), (16, 19)] # bucket boundaries
[37899, 34480, 31045] # number of samples in each bucket
24
Vocabulary tradeoff
● Get every token that appears at least a certain number of times (e.g., twice); a minimal counting sketch follows below
● Alternative approach: use a fixed-size vocabulary
Smaller vocabulary:
● Has smaller loss/perplexity, but loss/perplexity isn’t everything
● Gives <unk> answers to questions that require personal information
● Makes the bot’s answers less varied and responsive
● Doesn’t train much faster than a big vocabulary when using sampled softmax
25
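A minimal sketch of the frequency-threshold approach in plain Python; the special-token list and the helper name build_vocab are assumptions rather than the assignment’s actual code.

from collections import Counter

# Hypothetical special tokens; the assignment defines its own set.
SPECIAL_TOKENS = ['<pad>', '<unk>', '<s>', '</s>']

def build_vocab(tokenized_lines, min_count=2):
    """Keep every token seen at least `min_count` times; everything else becomes <unk>."""
    counts = Counter(token for line in tokenized_lines for token in line)
    vocab = SPECIAL_TOKENS + sorted(t for t, c in counts.items() if c >= min_count)
    return {token: idx for idx, token in enumerate(vocab)}

token2id = build_vocab([['hi', 'there'], ['hi', '.'], ['bye', '.']])
# 'hi' and '.' survive (seen twice); 'there' and 'bye' will be encoded as <unk>.
print(token2id)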
Model
● Seq2seq
● Attentional decoder
● Reverse encoder inputs
● Bucketing
● Sampled softmax
● Based on Google’s vanilla translate model,
originally used to translate from English to French
26
The sassiest bot
(lines starting with “>” are the human; the other lines are the bot)
> hi
you have you go .
> where?
in mind .
> you're weird?
you ' re right . . .
> what do you want?
who is you me .
> i don't understand
you know you look .
> i look what?
you nothing .
> you're funny
you ' re not .
27
The sassiest bot
> hi
hi .
> how're you?
just go .
> go where?
<unk> .
> omg
<unk> !
> you're doing that again!
i ' m not .
> stop telling me to go somewhere!
go !
> it's almost like you want to piss me off.
i ' m not .
> you're stupid
i ' m not .
28
It sort of works
> you are a monster!
i am . . .
> i'm sorry
you don ' t need to boy .
> i'm a girl!
you done ?
==========
> Did you change your hair?
no .
> no?
no i won t .
> who?
paul owen .
> where?
in a harem .
29
Sort of doesn’t
> what?
don ' t bother .
> you're being sassy
i need .
> what do you need?
i need ya help ya .
> okay
what do you want ?
30
Sort of doesn’t
> you're just talking gibberish
and what are you talking about ?
> about life
and what about it ?
> life is beautiful
is good .
> where do you live?
up .
> as in heaven?
in his apartment .
> who's he?
who ' s here ?
> oh well
and you are in trouble .
> i'm not!
please !
> whatever
i don ' t know what to say .
31
See sample conversations on GitHub
(in assignments)
32
Sanity check?
33
Problems?
● The bot is very dramatic (thanks to Hollywood screenwriters)
● Topics of conversations aren’t realistic
● Responses are deterministic: the same encoder input always gets the same response
● Inconsistent personality
● Uses only the immediately preceding utterance as the encoder input
● Doesn’t keep track of information about users
35
Train on multiple datasets
● Twitter chat log (courtesy of Marsan Ma)
● More movie subtitles (less clean)
● All publicly available Reddit comments (1TB of data!)
● Your own conversations (chat logs, text messages, emails)
36
Example of Twitter chat log
37
Chatbot with personalities
● At the decoding phase, inject consistent information about the bot
For example: name, age, hometown, current location, job
● Use the decoder inputs from one person only
For example: your own Sheldon Cooper bot!
38
Train on the incoming inputs
● Save the conversation with users and train on those conversations
● Create a feedback loop so users can correct the bot’s responses
39
Remember what users say
● The bot can extract information the user gives it
> hi
hi . what ' s your name ?
> my name is chip
nice to meet you .
> what's my name?
let ' s talk about something else .
40
Use characters instead of tokens
● Character-level language modeling seems to work quite well
● Smaller vocabulary -- no unknown tokens!
● But the sequences will be much longer (approximately 4x longer)
41
Improve input pipeline
● Right now, 50% of the running time is spent on generating batches! (One possible fix is sketched below.)
42
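Not the assignment’s solution, but one common fix sketched under assumptions: build batches in a background thread and keep a small queue full, so batch construction overlaps with session.run. Here get_batch and run_step stand in for your own batch generator and training step.

import queue       # Python 3; on Python 2 the module is called Queue
import threading

def batch_producer(get_batch, out_queue, num_batches):
    """Fill out_queue with batches while the main thread runs training steps."""
    for _ in range(num_batches):
        out_queue.put(get_batch())
    out_queue.put(None)   # sentinel: no more batches

def train(get_batch, run_step, num_batches, prefetch=8):
    batches = queue.Queue(maxsize=prefetch)
    producer = threading.Thread(target=batch_producer, args=(get_batch, batches, num_batches))
    producer.daemon = True
    producer.start()
    while True:
        batch = batches.get()
        if batch is None:
            break
        run_step(batch)   # e.g. sess.run(train_op, feed_dict=...)
    producer.join()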
See assignment 3 handout
43
Next class
More discussion on chatbots
Feedback: huyenn@stanford.edu
Thanks!
44