Creating A Rule-Based Chatbot
This is the 12th article in my series of articles on Python for NLP. In the previous article
(/python-for-nlp-working-with-the-gensim-library-part-2/), I briefly explained the
different functionalities of Python's Gensim library (https://pypi.org/project/gensim/).
So far in this series, we have covered almost all of the most commonly used NLP libraries,
such as NLTK, SpaCy, Gensim, StanfordCoreNLP, Pattern, and TextBlob.
In this article, we are not going to explore any NLP library. Rather, we will develop a
very simple rule-based chatbot capable of answering user queries regarding the sport
of tennis. But before we begin the actual coding, let's briefly discuss what chatbots
are and how they are used.
What is a Chatbot?
A chatbot is a conversational agent capable of answering user queries in the form of
text, speech, or via a graphical user interface. In simple words, a chatbot is a software
application that can chat with a user on any topic. Chatbots can be broadly categorized
into two types: Task-Oriented Chatbots and General Purpose Chatbots.
Task-oriented chatbots are designed to perform specific tasks. For instance, a
task-oriented chatbot can answer queries related to train reservations or pizza delivery;
it can also work as a personal medical therapist or personal assistant.
On the other hand, general purpose chatbots can have open-ended discussions with
the users.
There is also a third type of chatbot, called a hybrid chatbot, that can engage in both
task-oriented and open-ended discussions with the users.
1 of 14 11/06/2020, 12:44
Python for NLP: Creating a Rule-Based Chatbot https://stackabuse.com/python-for-nlp-creating-a-rule-based-chatbot/
Learning-Based Chatbots
Learning-based chatbots use machine learning techniques and a dataset to learn how to
generate a response to user queries. They can be further divided into two categories:
retrieval-based chatbots and generative chatbots.
Retrieval-based chatbots learn to select a suitable response to a user query from a set
of existing responses. Generative chatbots, on the other hand, learn to generate a
response on the fly.
Rule-Based Chatbots
One of the advantages of rule-based chatbots is that their answers are predictable: every
response comes from a predefined rule or from the underlying corpus. However, on the
downside, they do not scale well. To add more responses, you have to define new rules.
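To make the scaling problem concrete, here is a minimal sketch of a keyword-based rule table. The names RULES, FALLBACK, and answer, as well as the two sample rules, are made up for this illustration and are not part of the chatbot built later in the article.

```python
# Hypothetical rule table: each rule maps a trigger keyword to a canned reply.
RULES = {
    "serve": "A serve starts every point in tennis.",
    "deuce": "Deuce means the score is tied at 40-40.",
}

FALLBACK = "I am sorry, I could not understand you"


def answer(query, rules=RULES):
    """Return the reply of the first rule whose keyword appears in the query."""
    words = query.lower().split()
    for keyword, reply in rules.items():
        if keyword in words:
            return reply
    return FALLBACK


print(answer("what is a deuce"))
print(answer("tell me about rackets"))
```

Every new topic the bot should handle needs another entry in the table, which is why this style of chatbot becomes hard to maintain as it grows.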
In the following section, I will explain how to create a rule-based chatbot that will reply
to simple user queries regarding the sport of tennis.
When a user enters a query, it is converted into vectorized form. All the sentences in
the corpus are also converted into their corresponding vectorized forms. Next, the
sentence with the highest cosine similarity
(https://en.wikipedia.org/wiki/Cosine_similarity) to the user input vector is selected
as the response to the user input.
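As a rough illustration of the idea, the sketch below computes cosine similarity by hand, using plain word counts as the vectors. This is a simplification assumed for the example (the function name cosine_similarity_counts and the two-sentence corpus are invented); the chatbot itself relies on library functions for vectorization.

```python
import math
from collections import Counter


def cosine_similarity_counts(text_a, text_b):
    """Cosine similarity between two texts using raw word counts as vectors."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(vec_a[word] * vec_b[word] for word in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


corpus = [
    "tennis is played with a racket and a ball",
    "wimbledon is played on grass courts",
]
query = "which tournament is played on grass"
scores = [cosine_similarity_counts(query, sentence) for sentence in corpus]
best_sentence = corpus[scores.index(max(scores))]
print(best_sentence)
```

The second sentence shares four words with the query and the first only two, so the second one is selected as the response.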
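import nltk  # tokenization and lemmatization
import random  # random choice of a greeting response
import string  # list of punctuation characters
import bs4 as bs
import urllib.request
import re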
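# Download the Wikipedia article on tennis
raw_html = urllib.request.urlopen('https://en.wikipedia.org/wiki/Tennis')
raw_html = raw_html.read()
# Parse the HTML and join the text of all paragraphs into one string
article_html = bs.BeautifulSoup(raw_html, 'html.parser')
article_paragraphs = article_html.find_all('p')
article_text = ''
for para in article_paragraphs:
    article_text += para.text
article_text = article_text.lower()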
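Text scraped from Wikipedia typically still contains bracketed citation markers such as [1] and irregular whitespace. A cleanup pass along the following lines could be applied to article_text before tokenizing. The function name clean_article_text and the sample string are made up for this illustration.

```python
import re


def clean_article_text(text):
    """Drop bracketed citation numbers and collapse runs of whitespace."""
    text = re.sub(r'\[[0-9]*\]', ' ', text)  # e.g. "[12]" becomes a space
    text = re.sub(r'\s+', ' ', text)         # newlines, tabs, repeated spaces
    return text.strip()


sample = "tennis is a racket sport.[1]\n\nit can be played   individually.[23]"
cleaned = clean_article_text(sample)
print(cleaned)
```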
We need to divide our text into sentences and words, since the user input will be
compared with each sentence using cosine similarity. Execute the following script:
article_sentences = nltk.sent_tokenize(article_text)
article_words = nltk.word_tokenize(article_text)
Finally, we need to create helper functions that remove the punctuation from the
user input text and lemmatize it. Lemmatization refers to reducing a word to its root
form. For instance, lemmatizing the word "ate" returns "eat", the word "throwing"
becomes "throw", and the word "worse" is reduced to "bad".
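wnlemmatizer = nltk.stem.WordNetLemmatizer()

def perform_lemmatization(tokens):
    return [wnlemmatizer.lemmatize(token) for token in tokens]

# Translation table that maps every punctuation character to None (requires "import string")
punctuation_removal = dict((ord(punctuation), None) for punctuation in string.punctuation)

def get_processed_text(document):
    return perform_lemmatization(nltk.word_tokenize(document.lower().translate(punctuation_removal)))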
In the script above, we first instantiate the WordNetLemmatizer from the NLTK
(https://www.nltk.org/) library. Next, we define a function perform_lemmatization,
which takes a list of words as input and returns the corresponding list of lemmatized
words. The punctuation_removal table is used to strip the punctuation from the passed
text. Finally, the get_processed_text function takes a sentence as input, converts it to
lower case, removes the punctuation, tokenizes it, and lemmatizes the resulting tokens.
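The punctuation step can be tried on its own with nothing but the standard library. In the sketch below, the names punctuation_table and normalize are invented for the example, and a plain whitespace split stands in for the NLTK tokenizer (an assumption made to keep the example dependency-free); lemmatization is left out.

```python
import string

# Translation table: each punctuation code point maps to None, i.e. "delete it".
punctuation_table = {ord(char): None for char in string.punctuation}


def normalize(text):
    """Lowercase the text, strip punctuation, and split on whitespace."""
    return text.lower().translate(punctuation_table).split()


tokens = normalize("Who won Wimbledon, and when?")
print(tokens)
```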
Responding to Greetings
Since we are developing a rule-based chatbot, we need to handle different types of
user inputs in a different manner. For instance, for greetings we will define a dedicated
function. To handle greetings, we will create two lists: greeting_inputs and
greeting_responses. When a user enters a greeting, we search for it in the
greeting_inputs list. If the greeting is found, we randomly choose a response
from the greeting_responses list.
greeting_inputs = ("hey", "good morning", "good evening", "morning", "evening", "hi", "whatsup")
greeting_responses = ["hey", "hey hows you?", "*nods*", "hello, how you doing", "hello", "Welcome, I am good and you"]

def generate_greeting_response(greeting):
    # Return a random response if any word of the input is a known greeting;
    # otherwise the function falls through and returns None
    for token in greeting.split():
        if token.lower() in greeting_inputs:
            return random.choice(greeting_responses)
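Note that generate_greeting_response compares the input word by word, so a two-word entry such as "good morning" can never match on its own; the input "good morning" is recognized only because "morning" is also in the list. If phrase matching is wanted, one possible variant (a sketch, with the hypothetical name match_greeting and shortened lists) searches for each greeting as a whole phrase:

```python
import random
import re

# Shortened, hypothetical lists for the demonstration.
greeting_inputs = ("hey", "good morning", "good evening", "hi")
greeting_responses = ["hey", "hello", "*nods*"]


def match_greeting(text):
    """Return a random greeting if the text contains a greeting word or phrase."""
    lowered = text.lower()
    for phrase in greeting_inputs:
        # \b keeps "hi" from matching inside words such as "which".
        if re.search(r'\b' + re.escape(phrase) + r'\b', lowered):
            return random.choice(greeting_responses)
    return None


print(match_greeting("Good morning TennisRobo"))
print(match_greeting("which player won"))
```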
Now we have everything set up that we need to generate a response to user queries
related to tennis. We will create a method that takes in the user input, computes its
cosine similarity with every sentence in the corpus, and returns the most similar sentence.
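from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def generate_response(user_input):
    tennisrobo_response = ''
    article_sentences.append(user_input)

    # Vectorize the corpus plus the user input (now the last sentence);
    # using get_processed_text as the tokenizer is assumed from the helper defined earlier
    word_vectorizer = TfidfVectorizer(tokenizer=get_processed_text)
    all_word_vectors = word_vectorizer.fit_transform(article_sentences)
    # Similarity of the user input to every sentence, including itself
    similar_vector_values = cosine_similarity(all_word_vectors[-1], all_word_vectors)
    similar_sentence_number = similar_vector_values.argsort()[0][-2]

    matched_vector = similar_vector_values.flatten()
    matched_vector.sort()
    vector_matched = matched_vector[-2]

    if vector_matched == 0:
        tennisrobo_response = tennisrobo_response + "I am sorry, I could not understand you"
        return tennisrobo_response
    else:
        tennisrobo_response = tennisrobo_response + article_sentences[similar_sentence_number]
        return tennisrobo_response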
You can see that the generate_response() method accepts one parameter, which is the
user input. Next, we define an empty string tennisrobo_response. We then append
the user input to the list of already existing sentences. The function then proceeds as follows:
We initialize the TfidfVectorizer and then convert all the sentences in the corpus,
along with the input sentence, into their corresponding vectorized form.
We use the cosine_similarity function to find the cosine similarity between the last
item in all_word_vectors (which is actually the word vector for the user input,
since it was appended at the end) and the word vectors for all the sentences in the
corpus.
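To give a feel for what TF-IDF weighting does, here is a small hand-rolled sketch. It is a simplified illustration and not scikit-learn's exact implementation (which, among other details, also normalizes each vector); the function name tfidf_vectors and the sample sentences are invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one {word: weight} dict per sentence using TF-IDF weighting."""
    tokenized = [sentence.lower().split() for sentence in sentences]
    n_docs = len(tokenized)
    # Document frequency: in how many sentences does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF: words found in every sentence get the lowest weight.
            word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return vectors


sentences = ["federer won wimbledon", "nadal won the french open", "federer"]
vectors = tfidf_vectors(sentences)
print(vectors[0])
```

In the first sentence, "wimbledon" occurs in only one sentence of the corpus and therefore receives a higher weight than "federer" or "won", which each occur in two.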
similar_sentence_number = similar_vector_values.argsort()[0][-2]
We sort the cosine similarities with argsort(), which returns the indexes of the values
in ascending order. The second-to-last index therefore points to the corpus sentence with
the highest cosine similarity to the user input. The last item is the user input itself,
so we do not select it.
Finally, we flatten the retrieved cosine similarities and check whether the similarity of
the best match is equal to zero. If it is 0, our query did not have an answer. In that
case, we simply return a message saying that we do not understand the user query.
Otherwise, if the cosine similarity is not equal to zero, we found a sentence in our
corpus that is similar to the input. In that case, we use the index of the matched
sentence to look up the response in the article_sentences list, which contains the
collection of all sentences.
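The selection logic can be sketched without any libraries. The article's code takes the second-to-last entry after sorting; the variant below instead skips the query's own index explicitly, which is an alternative formulation and not the code used above. The function name pick_response and the similarity values are hypothetical.

```python
FALLBACK = "I am sorry, I could not understand you"


def pick_response(similarities, sentences):
    """Return the corpus sentence most similar to the query, or a fallback.

    Assumes the query was appended as the last entry of `sentences`, and that
    `similarities[i]` is the query's cosine similarity to `sentences[i]`.
    """
    # Skip the final index: it is the query compared with itself.
    candidates = range(len(sentences) - 1)
    best = max(candidates, key=lambda i: similarities[i], default=None)
    if best is None or similarities[best] == 0:
        return FALLBACK
    return sentences[best]


sentences = ["tennis uses a racket", "wimbledon is played on grass", "grass courts"]
scores = [0.0, 0.5, 1.0]  # hypothetical similarities; the last is query vs. itself
print(pick_response(scores, sentences))
```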
As the final step, we need a way to chat with the chatbot we just designed. To do so,
we will write a loop that keeps executing until the user types "bye".
Look at the following script; the code is explained after it:
continue_dialogue = True
print("Hello, I am your friend TennisRobo. You can ask me any question regarding tennis:")
while continue_dialogue:
    human_text = input()
    human_text = human_text.lower()
    if human_text != 'bye':
        if human_text == 'thanks' or human_text == 'thank you very much' or human_text == 'thank you':
            continue_dialogue = False
            print("TennisRobo: Most welcome")
        else:
            if generate_greeting_response(human_text) is not None:
                print("TennisRobo: " + generate_greeting_response(human_text))
            else:
                print("TennisRobo: ", end="")
                print(generate_response(human_text))
                article_sentences.remove(human_text)
    else:
        continue_dialogue = False
        print("TennisRobo: Good bye and take care of yourself...")
In the script above, we first set the flag continue_dialogue to true. After that, we print
a welcome message to the user asking for any input. Next, we start a while loop
that keeps executing as long as the continue_dialogue flag is true. Inside the loop, the
user input is received and converted to lower case. The user input is stored in the
human_text variable. If the user enters the word "bye", continue_dialogue is set to
false and a goodbye message is printed to the user.
On the other hand, if the input text is not equal to "bye", we check whether the input
is "thanks", "thank you", or "thank you very much". If so, the reply "Most welcome" is
printed and the dialogue ends. Otherwise, if generate_greeting_response does not return
None, the input is a greeting and a greeting response is printed. If it returns None, the
generate_response method is called, which fetches the response based on the
cosine similarity as explained in the last section.
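Because the loop reads from input(), it is awkward to test. One way to exercise the same rules is to feed scripted lines through a function, as in the sketch below. The name run_dialogue and the respond parameter are assumptions made for this example; respond stands in for the greeting and similarity logic.

```python
def run_dialogue(user_lines, respond):
    """Feed scripted user lines through the chat rules and collect the replies."""
    replies = []
    for line in user_lines:
        text = line.lower()
        if text == 'bye':
            replies.append("Good bye and take care of yourself...")
            break
        if text in ('thanks', 'thank you', 'thank you very much'):
            replies.append("Most welcome")
            break
        replies.append(respond(text))
    return replies


# str.upper is only a placeholder responder for the demonstration.
transcript = run_dialogue(["Roger Federer", "Thanks", "ignored"], respond=str.upper)
print(transcript)
```

The third line is never processed, because "Thanks" ends the dialogue just as it does in the interactive loop.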
Once the response is generated, the user input is removed from the collection of
sentences since we do not want the user input to be part of the corpus. The process
continues until the user types "bye". You can see why this type of chatbot is called a
rule-based chatbot. There are plenty of rules to follow, and adding more capabilities
means adding more rules.
Hello, I am your friend TennisRobo. You can ask me any question regarding tennis:
roger federer
TennisRobo: however it must be noted that both rod laver and ken rosewall also won major pro slam tournaments on all three surfaces (grass, clay, wood) rosewall in 1963 and laver in 1967. more recently, roger federer is considered by many observers to have the most "complete" game in modern tennis.