
A TensorFlow Chatbot

CS 20SI:
TensorFlow for Deep Learning Research
Lecture 13
3/1/2017

1
2
Announcements
Assignment 3 out tonight, due March 17

No class this Friday: Pete Warden’s talk on TensorFlow for mobile

Guest lecture next Friday by Danijar Hafner on Reinforcement Learning

3
Agenda
Seq2seq

Implementation keys

Chatbot craze

4
Sequence to Sequence
● The current model class of choice for most dialogue and machine translation systems
● Introduced by Cho et al. in 2014 for statistical machine translation (SMT), the predecessor of neural machine translation (NMT)
● The paper “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation” has been cited about 900 times, roughly one new citation per day
● Originally called the “RNN Encoder–Decoder”

5
Sequence to Sequence
Consists of two recurrent neural networks (RNNs):
● Encoder maps a variable-length source sequence (input) to a fixed-length vector
● Decoder maps the vector representation back to a variable-length target sequence (output)
● The two RNNs are trained jointly to maximize the conditional probability of the target sequence given the source sequence

6
Vanilla Encoder and Decoder

Graph from “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation” (Cho et al.)
7
Encoder and Decoder in TensorFlow
● Each box in the picture represents a cell of the RNN, most commonly a GRU cell or an LSTM cell.
● Encoder and decoder often have different weights, but sometimes they can share weights.

Graph by Dev Nag
8


With Attention
● In the vanilla model, each input has to be encoded into a fixed-size state vector, as that is the only thing passed to the decoder.
● The attention mechanism gives the decoder direct access to the input (the encoder’s hidden states at every source position); a minimal sketch follows after this slide.

9
Graph by Indico.io blog
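
The model in this lecture relies on TensorFlow’s built-in attention decoder, but as a rough illustration of the idea, here is a minimal numpy sketch (dot-product scoring is an assumption; the library’s attention uses a learned scoring function):

import numpy as np

# the decoder state scores every encoder state; the context vector is their
# softmax-weighted average, giving the decoder direct access to the source
def attention_context(decoder_state, encoder_states):
    # decoder_state: shape (hidden,); encoder_states: shape (src_len, hidden)
    scores = encoder_states @ decoder_state       # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over source positions
    return weights @ encoder_states               # context vector, shape (hidden,)

# toy usage: 4 source positions, hidden size 3
context = attention_context(np.ones(3), np.random.randn(4, 3))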
Bucketing
● Avoid too much padding that leads to extraneous computation
● Group sequences of similar lengths into the same buckets
● Create a separate subgraph for each bucket
● In theory, for v1.0 you can use:
tf.contrib.training.bucket_by_sequence_length(max_length, examples, batch_size, bucket_boundaries, capacity=2 * batch_size, dynamic_pad=True)
● In practice, use the bucketing algorithm from TensorFlow’s translate model (because we’re using v0.12); a sketch of that idea follows below
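
Here is a rough plain-Python sketch of that bucketing idea (the helper names, PAD id, and the choice to drop over-long pairs are assumptions, not the actual translate-model code):

import random

BUCKETS = [(8, 10), (12, 14), (16, 19)]  # (max encoder length, max decoder length)
PAD_ID = 0

def bucketize(pairs):
    # put each (question ids, answer ids) pair into the smallest bucket it fits,
    # padding both sides to that bucket's fixed lengths
    data_buckets = [[] for _ in BUCKETS]
    for enc_ids, dec_ids in pairs:
        for bucket_id, (enc_max, dec_max) in enumerate(BUCKETS):
            if len(enc_ids) <= enc_max and len(dec_ids) <= dec_max:
                # encoder inputs are reversed, as in the model described later
                enc = list(reversed(enc_ids)) + [PAD_ID] * (enc_max - len(enc_ids))
                dec = dec_ids + [PAD_ID] * (dec_max - len(dec_ids))
                data_buckets[bucket_id].append((enc, dec))
                break  # pairs longer than the largest bucket are dropped
    return data_buckets

def get_batch(data_buckets, bucket_id, batch_size=64):
    # sample within a single bucket so every example shares the same padded lengths
    bucket = data_buckets[bucket_id]
    return random.sample(bucket, min(batch_size, len(bucket)))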

12
Sampled Softmax
● Avoid the growing complexity of computing the normalization constant
● Approximate the negative term of the gradient by importance sampling with a small number of samples
● At each step, update only the vectors associated with the correct word w and with the sampled words in V’
● Once training is over, use the full target vocabulary to compute the output probability of each target word

On Using Very Large Target Vocabulary for Neural Machine Translation (Jean et al., 2015)

13
Sampled Softmax

if config.NUM_SAMPLES > 0 and config.NUM_SAMPLES < config.DEC_VOCAB:
    # projection from the hidden size down to the decoder vocabulary
    weight = tf.get_variable('proj_w', [config.HIDDEN_SIZE, config.DEC_VOCAB])
    bias = tf.get_variable('proj_b', [config.DEC_VOCAB])
    self.output_projection = (weight, bias)

    def sampled_loss(inputs, labels):
        labels = tf.reshape(labels, [-1, 1])
        # sampled_softmax_loss expects weights of shape [num_classes, dim],
        # hence the transpose of the projection matrix
        return tf.nn.sampled_softmax_loss(tf.transpose(weight), bias, inputs, labels,
                                          config.NUM_SAMPLES, config.DEC_VOCAB)

    self.softmax_loss_function = sampled_loss

14
Sampled Softmax
● Generally an underestimate of the full softmax loss.
● At inference time, compute the full softmax using:

tf.nn.softmax(tf.matmul(inputs, tf.transpose(weight)) + bias)

15
Seq2seq in TensorFlow
outputs, states = basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell)

encoder_inputs: a list of tensors representing inputs to the encoder
decoder_inputs: a list of tensors representing inputs to the decoder
cell: a single- or multi-layer RNN cell

outputs: a list of decoder_size tensors, each of dimension 1 x DECODE_VOCAB, corresponding to the probability distribution at each time step
states: a list of decoder_size tensors, each corresponding to the internal state of the decoder at every time step
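
A minimal usage sketch, assuming the TensorFlow 0.12 tf.nn.seq2seq module used in this class (batch size, input size, and numbers of steps are made-up values), just to make the list-of-tensors convention concrete:

import tensorflow as tf

BATCH, INPUT_SIZE, HIDDEN = 32, 64, 128
ENC_STEPS, DEC_STEPS = 5, 7

# one [batch_size x input_size] tensor per time step
encoder_inputs = [tf.placeholder(tf.float32, [BATCH, INPUT_SIZE]) for _ in range(ENC_STEPS)]
decoder_inputs = [tf.placeholder(tf.float32, [BATCH, INPUT_SIZE]) for _ in range(DEC_STEPS)]
cell = tf.nn.rnn_cell.GRUCell(HIDDEN)  # or a MultiRNNCell for multiple layers

outputs, state = tf.nn.seq2seq.basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell)
# outputs: one tensor per decoder time step; state: the decoder's internal state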

16
Seq2seq in TensorFlow
outputs, states = embedding_rnn_seq2seq(encoder_inputs,
decoder_inputs,
cell,
num_encoder_symbols,
num_decoder_symbols,
embedding_size,
output_projection=None,
feed_previous=False)

To embed your inputs and outputs, you need to specify the number of input and output tokens
feed_previous: if True, feed the previously predicted word as the next decoder input (instead of the ground-truth token), even if the model makes mistakes
output_projection: tuple of projection weight and bias, needed when using sampled softmax

18
Seq2seq in TensorFlow
outputs, states = embedding_attention_seq2seq(encoder_inputs,
decoder_inputs,
cell,
num_encoder_symbols,
num_decoder_symbols,
num_heads=1,
output_projection=None,
feed_previous=False,
initial_state_attention=False)

Embedding sequence-to-sequence model with attention.

19
Wrapper for seq2seq with buckets
outputs, losses = model_with_buckets(encoder_inputs,
decoder_inputs,
targets,
weights,
buckets,
seq2seq,
softmax_loss_function=None,
per_example_loss=False)

seq2seq: one of the seq2seq functions defined above
softmax_loss_function: standard full softmax (the default, None) or the sampled softmax defined earlier; a wiring sketch follows below
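
A hedged end-to-end sketch of how a translate-style model wires these pieces together, assuming the TF 0.12-era tf.nn.seq2seq API; the vocabulary sizes, hidden size, and variable names are assumptions, and the output projection / sampled softmax are omitted for brevity:

import tensorflow as tf

BUCKETS = [(8, 10), (12, 14), (16, 19)]
ENC_VOCAB, DEC_VOCAB, HIDDEN_SIZE = 24000, 24000, 256

# one int32 placeholder per time step, up to the sizes of the largest bucket
encoder_inputs = [tf.placeholder(tf.int32, [None], name='encoder%d' % i) for i in range(BUCKETS[-1][0])]
decoder_inputs = [tf.placeholder(tf.int32, [None], name='decoder%d' % i) for i in range(BUCKETS[-1][1] + 1)]
masks = [tf.placeholder(tf.float32, [None], name='mask%d' % i) for i in range(BUCKETS[-1][1] + 1)]
targets = decoder_inputs[1:]  # targets are the decoder inputs shifted by one

cell = tf.nn.rnn_cell.GRUCell(HIDDEN_SIZE)

def seq2seq_f(enc, dec, do_decode):
    return tf.nn.seq2seq.embedding_attention_seq2seq(
        enc, dec, cell, ENC_VOCAB, DEC_VOCAB, HIDDEN_SIZE, feed_previous=do_decode)

outputs, losses = tf.nn.seq2seq.model_with_buckets(
    encoder_inputs, decoder_inputs[:-1], targets, masks[:-1], BUCKETS,
    lambda x, y: seq2seq_f(x, y, False))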

20
Our TensorFlow chatbot

21
Cornell Movie-Dialogs Corpus
● 220,579 conversational exchanges between 10,292 pairs of movie characters
● 9,035 characters from 617 movies
● 304,713 total utterances
● Very well-formatted (almost perfect)

Comes with a very interesting paper, “Chameleons in Imagined Conversations.”

22
Input Length Distribution

23
Bucketing
9 buckets
[(6, 8), (8, 10), (10, 12), (13, 15), (16, 19), (19, 22), (23, 26), (29, 32), (39, 44)]
[19530, 17449, 17585, 23444, 22884, 16435, 17085, 18291, 18931]

5 buckets
[(8, 10), (12, 14), (16, 19), (23, 26), (39, 43)] # bucket boundaries
[37049, 33519, 30223, 33513, 37371] # number of samples in each bucket

3 buckets - recommended
[(8, 10), (12, 14), (16, 19)] # bucket boundaries
[37899, 34480, 31045] # number of samples in each bucket

24
Vocabulary tradeoff
● Get all tokens that appear at least a certain number of times (twice in our case); see the sketch below
● Alternative approach: use a fixed-size vocabulary

Smaller vocabulary:
● Has smaller loss/perplexity, but loss/perplexity isn’t everything
● Gives <unk> answers to questions that require personal information
● Doesn’t give the bot’s answers much response
● Doesn’t train much faster than a big vocabulary when using sampled softmax
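
A small plain-Python sketch of the frequency-threshold approach (the helper names and reserved symbols are assumptions, not the assignment’s exact code):

from collections import Counter

def build_vocab(tokenized_lines, threshold=2):
    # keep every token that appears at least `threshold` times
    counts = Counter(token for line in tokenized_lines for token in line)
    vocab = ['<pad>', '<unk>', '<s>', '</s>']  # reserved symbols get fixed ids
    vocab += sorted(tok for tok, c in counts.items() if c >= threshold)
    return vocab, {tok: i for i, tok in enumerate(vocab)}

def encode(line, token2id):
    unk = token2id['<unk>']
    return [token2id.get(tok, unk) for tok in line]  # unseen tokens map to <unk>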

25
Model
● Seq2seq
● Attentional decoder
● Reverse encoder inputs
● Bucketing
● Sampled softmax
● Based on Google’s vanilla translate model, originally used to translate from English to French

26
The sassiest bot
(“>” lines are the human; the replies are the bot)
> hi
you have you go .
> where?
in mind .
> you're weird?
you ' re right . . .
> what do you want?
who is you me .
> i don't understand
you know you look .
> i look what?
you nothing .
> you're funny
you ' re not .

27
The sassiest bot
> hi
hi .
> how're you?
just go .
> go where?
<unk> .
> omg
<unk> !
> you're doing that again!
i ' m not .
> stop telling me to go somewhere!
go !
> it's almost like you want to piss me off.
i ' m not .
> you're stupid
i ' m not .
28
It sort of works
> you are a monster!
i am . . .
> i'm sorry
you don ' t need to boy .
> i'm a girl!
you done ?
==========
> Did you change your hair?
no .
> no?
no i won t .
> who?
paul owen .
> where?
in a harem .
29
Sort of doesn’t
> what?
don ' t bother .
> you're being sassy
i need .
> what do you need?
i need ya help ya .
> okay
what do you want ?

30
Sort of doesn’t
> you're just talking gibberish
and what are you talking about ?
> about life
and what about it ?
> life is beautiful
is good .
> where do you live?
up .
> as in heaven?
in his apartment .
> who's he?
who ' s here ?
> oh well
and you are in trouble .
> i'm not!
please !
> whatever
i don ' t know what to say .
31
See sample conversations on GitHub
(in assignments)

32
Sanity check?

How do we know that we implemented our model correctly?

33
Sanity check?

Run the model on a small dataset (~2,000 pairs) and run for a lot of epochs to see if it converges (learns all the responses by heart)

34
Problems?
● The bot is very dramatic (thanks to Hollywood screenwriters)
● Topics of conversations aren’t realistic
● Responses are deterministic: the same encoder input always gets the same response
● Inconsistent personality
● Uses only the last utterance as the input for the encoder, ignoring earlier conversation history
● Doesn’t keep track of information about users

35
Train on multiple datasets
● Twitter chat log (courtesy of Marsan Ma)
● More movie subtitles (less clean)
● All publicly available Reddit comments (1TB of data!)
● Your own conversations (chat logs, text messages, emails)

36
Example of Twitter chat log

37
Chatbot with personalities
● At the decoder phase, inject consistent information about the bot
For example: name, age, hometown, current location, job
● Use the decoder inputs from one person only
For example: your own Sheldon Cooper bot!

38
Train on the incoming inputs
● Save the conversation with users and train on those conversations
● Create a feedback loop so users can correct the bot’s responses

39
Remember what users say
● The bot can extract information that the user gives it

> hi
hi . what ' s your name ?
> my name is chip
nice to meet you .
> what's my name?
let ' s talk about something else .

40
Use characters instead of tokens
● Character-level language modeling seems to work quite well
● Smaller vocabulary -- no unknown tokens!
● But the sequences will be much longer (approximately 4x longer); see the tiny illustration below
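
For a feel of the length blow-up, a tiny plain-Python comparison (the example sentence is made up):

line = "where do you live ?"
tokens = line.split()   # 5 symbols drawn from a word vocabulary
chars = list(line)      # 19 symbols drawn from a ~100-symbol character vocabulary
print(len(tokens), len(chars))   # 5 19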

41
Improve input pipeline
● Right now, 50% of running time is spent on generating batches!

42
See assignment 3 handout

43
Next class
More discussion on chatbots

Feedback: huyenn@stanford.edu

Thanks!

44
