NLP

Syllabus

1
CONTENTS
• Deep Learning
• Need
• Advantages
• Limitations
• Applications
• Types

2
3
4
Supervised Learning
• Being able to adapt to new inputs and make predictions is the crucial generalisation part
of machine learning.
• In training, we want to maximize generalization, so that the supervised model captures the
real ‘general’ underlying relationship.
• If the model is over-trained, we cause over-fitting to the examples used and the model
would be unable to adapt to new, previously unseen inputs.
• A side effect to be aware of in supervised learning is that the supervision we provide
introduces bias to the learning.
• The model can only be imitating exactly what it was shown, so it is very important to
show reliable, unbiased examples.
• Also, supervised learning usually requires a lot of data before it is learned.
• Obtaining enough reliably labeled data is often the hardest and most expensive part of
using supervised learning.
5
Supervised Learning
• The output from a supervised Machine Learning model could be a category from a finite
set, e.g. [low, medium, high], for the number of visitors to the beach.
• This is called a classification problem.

• The output from a supervised Machine Learning model could be a numeric value within a
range, e.g. [500-2000], for the number of visitors to the beach.
• This is called a regression problem.

• Supervised learning is of two types: Classification and Regression.

6
7
Supervised Learning-Classification
• Classification is used to group the similar data points into different sections in order to
classify them.
• Machine Learning is used to find the rules that explain how to separate the different data
points.
• They all focus on using data and answers to discover rules that linearly separate data
points.
• Linear separability is a key concept in machine learning.
• Classification approaches try to find the best way to separate data points with a line.
• The lines drawn between classes are known as the decision boundaries.
• The entire area that is chosen to define a class is known as the decision surface.
• If a data point falls within the boundaries of a decision surface, it is assigned the
corresponding class.
8
Supervised Learning-Classification
• Binary Classification
• Multi-Class Classification
• Multi-Label Classification
• Imbalanced Classification

[Figure: linearly separable data points labelled CLASS 1 and CLASS 2]

9
Supervised Learning-Regression
• The difference between classification and regression is that regression outputs a
number rather than a class.

• Therefore, regression is useful when predicting number-based problems like stock
market prices, the temperature for a given day, or the probability of an event.

10
Unsupervised Learning
• In unsupervised learning, only input data is provided in the examples.
• There are no labelled example outputs to aim for.
• But it may be surprising to know that it is still possible to find many interesting and
complex patterns hidden within data without any labels.
• An example of unsupervised learning in real life would be sorting different colour coins
into separate piles. Nobody taught you how to separate them, but by just looking at their
features such as colour, you can see which colour coins are associated and cluster them
into their correct groups.

• Unsupervised learning can be harder than supervised learning, as the removal of
supervision means the problem has become less defined. The algorithm has a less focused
idea of what patterns to look for.

11
12
Unsupervised Learning
• Unsupervised machine learning finds all kinds of unknown patterns in data.
• Unsupervised methods help you to find features which can be useful for categorization.
• It can take place in real time, so all the input data is analyzed and labelled in the
presence of learners.
• It is easier to get unlabeled data from a computer than labeled data, which needs manual
intervention.

• Unsupervised Learning is of two types: Clustering and Association.

13
14
Unsupervised Learning-Clustering
• Unsupervised learning is mostly used for clustering.
• Clustering is the act of creating groups with differing characteristics.
• Clustering attempts to find various subgroups within a dataset.
• As this is unsupervised learning, we are not restricted to any set of labels and are free to
choose how many clusters to create.
• This is both a blessing and a curse.
• Picking a model that has the correct number of clusters (complexity) has to be conducted
via an empirical model selection process.

15
Unsupervised Learning-Association
• In Association Learning you want to uncover the rules that describe your data.
• For example, if a person watches video A they will likely watch video B.

• Association rules are perfect for examples such as this where you want to find related
items.

• Common example is Market Basket Analysis:


• Market Basket Analysis is one of the key techniques used by large retailers to uncover associations
between items. It works by looking for combinations of items that occur together frequently in
transactions. To put it another way, it allows retailers to identify relationships between the items that
people buy.
• Association Rules are widely used to analyze retail basket or transaction data, and are intended to
identify strong rules discovered in transaction data using measures of interestingness.
16
Semi-supervised Learning
• Semi-supervised learning is a mix between supervised and unsupervised approaches.
• The learning process isn’t closely supervised with example outputs for every single input,
but we also don’t let the algorithm do its own thing and provide no form of feedback.
• Semi-supervised learning takes the middle road.
• By being able to mix together a small amount of labelled data with a much larger
unlabeled dataset it reduces the burden of having enough labelled data.
• Therefore, it opens up many more problems to be solved with machine learning.
• Example:
• Internet Content Classification: Labeling each webpage is an impractical and
infeasible process, so Semi-Supervised learning algorithms are used. Even the Google
search algorithm uses a variant of Semi-Supervised learning to rank the relevance of a
webpage for a given query.
17
Reinforcement Learning
• In this approach, occasional positive and negative feedback is used to reinforce
behaviours.
• Think of it like training a dog, good behaviours are rewarded with a treat and become
more common. Bad behaviours are punished and become less common.
• This reward-motivated behaviour is key in reinforcement learning.
• It is less common and much more complex, but it has generated incredible results.
• It doesn’t use labels as such, and instead uses rewards to learn.
• Reinforcement learning is also employed in games, finance, and other domains where
decision-making in dynamic environments is essential. It allows machines to learn from
experience and adapt their strategies to achieve better outcomes over time.

18
19
Reinforcement Learning
• This is very similar to how we as humans also learn.
• Throughout our lives, we receive positive and negative signals and constantly learn from
them.
• The chemicals in our brain are one of many ways we get these signals.
• When something good happens, the neurons in our brains provide a hit of positive
neurotransmitters such as dopamine which makes us feel good and we become more
likely to repeat that specific action.
• We don’t need constant supervision to learn like in supervised learning.
• By only giving the occasional reinforcement signals, we still learn very effectively.

20
Example: Teaching a Robot to Navigate
• Environment: Imagine a small robot placed in a grid-like environment. Each cell in the
grid represents a location, and the robot can move up, down, left, or right.
• Agent: The robot is the reinforcement learning agent. Its objective is to reach a specific
goal cell in the grid.
• Actions: The robot can take actions to move in any of the four directions: up, down, left,
or right.
• Rewards: The robot receives positive rewards for moving closer to the goal and negative
rewards for moving away from the goal or hitting obstacles.
• Learning Process:
• Initially, the robot doesn't know the optimal path to reach the goal.
• The agent explores the environment by taking random actions. As it moves, it
receives rewards or penalties based on its location and proximity to the goal.
• Over time, the agent learns a policy - a strategy for choosing actions in each state that
maximizes the expected cumulative reward.

21
Reinforcement Learning
• One of the most exciting parts of Reinforcement Learning is that it is a first step away
from training on static datasets, towards being able to use dynamic, noisy, data-rich
environments.
• This brings Machine Learning closer to a learning style used by humans. The world is
simply our noisy, complex data-rich environment.
• Games are very popular in Reinforcement Learning research. They provide ideal
data-rich environments.
• The scores in games are ideal reward signals to train reward-motivated behaviours.
Additionally, time can be sped up in a simulated game environment to reduce overall
training time.
• A Reinforcement Learning algorithm just aims to maximize its rewards by playing the
game over and over again. If you can frame a problem with a frequent ‘score’ as a
reward, it is likely to be suited to Reinforcement Learning.
22
23
What is AI?

“Copy or Mimic Human Brain”

How does the human brain work?

In deep learning, we have an artificial neuron that carries the information to communicate.
24
ML failed to work well on problems like:
•Natural Language Processing
•Image Detection
•Image Recognition
•Chatbots
•Automatic car-driving systems

25
Natural Language Processing
• USED FOR:
• Language translation applications such as Google Translate
• Using NLP to check the grammatical accuracy of texts (MS Word)
• Interactive Voice Response (IVR) applications used in call centres to respond to certain users’ requests
• Personal assistant applications such as OK Google, Siri, Cortana, and Alexa

26
Perceptron(Artificial Neuron)

27
Perceptron(Artificial Neuron)

28
Perceptron(Artificial Neuron)

• Perceptron is a linear model used for classification.


• It receives all the inputs.
• Sum those inputs.
• Apply the activation function on it and give us output.

29
Neural Network
• Basically, there are 3 different layers in a neural network:
• Input Layer (all the inputs are fed into the model through this layer)
• Hidden Layers (there can be more than one hidden layer; they are used for
processing the inputs received from the input layer)
• Output Layer (the data after processing is made available at the output layer)

30
ANN vs BNN

Biological Neural Network (BNN)      Artificial Neural Network (ANN)
Soma                                 Node
Dendrites                            Input
Synapse                              Weights or Interconnections
Axon                                 Output
31
ANN vs BNN
Criteria          BNN                                        ANN
Processing        Massively parallel, slow but               Massively parallel, fast but inferior
                  superior to ANN                            to BNN
Size              10^11 neurons and 10^15                    10^2 to 10^4 nodes, mainly depending on the
                  interconnections                           type of application and the network designer
Learning          Can tolerate ambiguity                     Very precise, structured and formatted data
                                                             is required to tolerate ambiguity
Fault tolerance   Performance degrades with even             Capable of robust performance, hence has
                  partial damage                             the potential to be fault tolerant
Storage capacity  Stores the information in the synapse      Stores the information in continuous
                                                             memory locations
32
ANN-Terminology
• Neuron
• Input layer
• Hidden layer
• Output layer
• Bias
• Weight
• Activation Function

33
Working of ANN
• Artificial Neural Networks can be best viewed as weighted directed graphs, where the
nodes are formed by the artificial neurons and the connection between the neuron outputs
and neuron inputs can be represented by the directed edges with weights.
• The Artificial Neural Network receives the input signal from the external world in the
form of a pattern or image, represented as a vector.
• These inputs are then mathematically designated by the notation x(n) for each of the n
inputs.
• Each input is then multiplied by its corresponding weight (these weights are the
details used by the artificial neural network to solve a certain problem).
• In general terms, these weights typically represent the strength of the interconnection
amongst neurons inside the artificial neural network.
• All the weighted inputs are summed up inside the computing unit (yet another artificial
neuron).

34
Working of ANN
• If the weighted sum equates to zero, a bias is added to make the output non-zero, or to
scale up the system’s response.
• The bias has its own weight, and its input is always equal to 1.
• Here the sum of weighted inputs can be in the range of 0 to positive infinity.
• To keep the response within the limits of the desired value, a certain threshold value is
benchmarked.
• The sum of weighted inputs is then passed through the activation function.
• The activation function, in general, is the set of transfer functions used to obtain the
desired output.
• There are various flavors of the activation function, but mainly either linear or non-linear
sets of functions.

35
Working of ANN

36
Neuron
• ANNs are composed of artificial neurons that retain the biological concept of neurons,
which receive input, combine the input with their internal state (activation) and an
optional threshold using an activation function, and produce output using an output
function.
• The initial inputs are external data, such as images and documents.
• The ultimate outputs accomplish the task, such as recognizing an object in an image.
• The important characteristic of the activation function is that it provides a smooth,
differentiable transition as input values change, i.e. a small change in input produces a
small change in output.
• Processing elements, the neural network equivalent of neurons, are generally simple
devices that receive several input signals and, based on those inputs, either generate
a single output signal (fire) or do not.
• The output signal of an individual processing element is sent to many other processing
elements (and possibly back to itself) as input signals via the interconnections between
processing elements.
Weights and Bias
• The network consists of connections, each connection providing the output of one neuron
as an input to another neuron.
• Each connection is assigned a weight that represents its relative importance.
• A given neuron can have multiple input and output connections.
• The weight shows the effectiveness of a particular input. The greater the weight of an
input, the more impact it has on the network.
• On the other hand Bias is like the intercept added in a linear equation.
• It is an additional parameter in the Neural Network which is used to adjust the output
along with the weighted sum of the inputs to the neuron.
• Therefore Bias is a constant that helps the model in a way that it can fit best for the given
data.

38
Input Layer
• The Input layer communicates with the external environment that presents a pattern to the
neural network.
• Its job is to deal with all the inputs only.
• This input gets transferred to the hidden layers.
• The input layer should represent the condition for which we are training the neural
network.
• Every input neuron should represent some independent variable that has an influence
over the output of the neural network

39
Hidden Layer
• The hidden layer is the collection of neurons which has activation function applied on it
and it is an intermediate layer found between the input layer and the output layer.
• Its job is to process the inputs obtained by its previous layer.
• So it is the layer that is responsible for extracting the required features from the
input data.
• Much research has been done on choosing the number of neurons in the hidden layer, but
none of it has produced an exact rule.
• If we have data which can be separated linearly, then there is no need to use a hidden
layer, as the activation function can be applied to the input layer to solve the problem.
• But for problems which involve complex decisions, we can use 3 to 5 hidden layers based
on the degree of complexity of the problem or the degree of accuracy required.
• That certainly does not mean that if we keep on increasing the number of layers, the
neural network will give higher accuracy!
Example: Hidden Layer
• In recurrent neural networks (RNNs) or transformers used for NLP tasks, hidden layers
are crucial for capturing sequential dependencies in language. Each layer in these
networks processes information from previous layers, allowing the model to understand
the context and relationships between words in a sentence.
• Suppose we have the following sentence: "The cat sat on the mat." And our task is to predict the
next word in the sequence.

1. Without Hidden Layers:


• If we were to use a simple feedforward neural network without hidden layers,
the model would treat each word independently. It would not consider the order
or sequential dependencies between words. Each word would be represented by
its own set of weights, and the model would not capture the context of the
sentence.

41
Example: Hidden Layer
2. With Hidden Layers (RNN):
• Now, let's introduce a recurrent neural network (RNN) with hidden layers. In an RNN, each word in the
sequence is processed one at a time, and the hidden state is updated at each step. The hidden state serves as a
memory that retains information about the previous words in the sequence.
• For example:
• Input: "The"
• Update hidden state based on "The"
• Input: "cat"
• Update hidden state based on "The cat"
• Input: "sat"
• Update hidden state based on "The cat sat"
• Input: "on"
• Update hidden state based on "The cat sat on"
• Input: "the"
• Update hidden state based on "The cat sat on the"
• Input: "mat"
• Update hidden state based on "The cat sat on the mat"

• The hidden state now contains information about the entire context of the sentence. When predicting the next
word, the model considers not only the current input word ("mat") but also the context captured in the hidden
state.
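To make the word-by-word update concrete, here is a minimal sketch (not the slide's exact model) of a single RNN step in Python with NumPy; the weight matrices and word embeddings are random placeholders assumed only for illustration.

import numpy as np

# Minimal RNN sketch: h_t = tanh(W_h @ h_{t-1} + W_x @ x_t), one step per word.
np.random.seed(0)
hidden_size, embed_size = 4, 3
W_h = 0.1 * np.random.randn(hidden_size, hidden_size)   # hidden-to-hidden weights
W_x = 0.1 * np.random.randn(hidden_size, embed_size)    # input-to-hidden weights

sentence = ["The", "cat", "sat", "on", "the", "mat"]
# Hypothetical embeddings: one random vector per distinct word.
embeddings = {w: np.random.randn(embed_size) for w in set(sentence)}

h = np.zeros(hidden_size)                                # initial hidden state
for word in sentence:
    h = np.tanh(W_h @ h + W_x @ embeddings[word])        # state now summarizes the prefix
    print(word, h.round(3))
# After the loop, h encodes the whole context "The cat sat on the mat".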
Example: Hidden Layer
Other architectures like transformers also utilize hidden layers, but they process the
entire sequence in parallel, allowing them to capture long-range dependencies more
efficiently than traditional RNNs. The idea is similar, where each layer processes
information from the previous layers, building a contextualized representation of the
input sequence.

43
Output Layer
• The output layer of the neural network collects and transmits the information in the way
it has been designed to give.
• The pattern presented by the output layer can be directly traced back to the input layer.
• The number of neurons in the output layer should be directly related to the type of work that
the neural network is performing.
• To determine the number of neurons in the output layer, first consider the intended use of
the neural network.

44
Activation Function
• The main purpose of the activation function is to convert the input signal of a node in the
ANN into an output signal.
• A neural network without an activation function is just a linear regression model. Hence,
to learn complex and non-linear curves, we need activation functions.
• It is used to determine the output of the neural network, e.g. yes or no. It maps the resulting
values into a range such as 0 to 1 or -1 to 1 (depending upon the function).
• The Activation Functions can be of following types-
• Step
• Linear
• Sigmoid
• Tanh
• ReLu
45
Properties of Activation Function
• Non-Linear To generate non-linear input mappings, we need a non-linear activation
function.
• Saturate A saturating activation function squeezes the input and puts the output in a
limited range; hence, no single weight can have a significant impact on the final output.
• Continuous and Smooth For gradient based optimizations smoother functions have
generally shown better results. Since input takes a continuous range output also should
take a continuous range for a node.
• Differentiable As we see while deriving back-propagation derivative of f should be
defined.
• Monotonic If the activation function is not monotonically increasing, a neuron’s weight
might cause it to have less influence, and vice versa; which is precisely the opposite of what
we want.
• Linear for small values If it is non-linear for small values, we need to take care of the
constraints during weight initialization of the neural network, since we can face the vanishing
gradient or exploding gradient problem.
Binary Step Activation Function
• This activation function is very basic, and it comes to mind whenever we try to bound the
output.
• It is basically a threshold-based classifier: we decide some threshold value to decide
whether the neuron should be activated or deactivated.
• f(x) = 1 if x > 0, else 0

• In this case, we set the threshold value to 0. It is very simple and useful for binary
classification problems.
Linear Activation Function
• It is a simple straight line activation function where our function is directly proportional
to the weighted sum of neurons or input.
• Linear activation functions are better in giving a wide range of activations and a line of
a positive slope may increase the firing rate as the input rate increases.

• Mathematically,
Y = mZ

48
Sigmoid Activation Function
• The main reason why we use sigmoid function is because it exists between (0 to 1).
• Therefore, it is especially used for models where we have to predict the probability as
an output.
• Since probability of anything exists only between the range of 0 and 1, sigmoid is the
right choice.
• The function is differentiable. That means we can find the slope of the sigmoid curve at
any point.
• The function is monotonic, but the function’s derivative is not.

• Note e and its properties:

As x goes to minus infinity, y goes to 0 (the neuron tends not to fire).
As x goes to infinity, y goes to 1 (it tends to fire).
At x = 0, y = 1/2.
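A minimal Python sketch of the sigmoid function, illustrating the limits noted above (this is just the standard formula, not code from the slides):

import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)); the output is always in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

for x in [-10, -1, 0, 1, 10]:
    print(x, round(float(sigmoid(x)), 4))
# -10 -> ~0.0 (tends not to fire), 0 -> 0.5, 10 -> ~1.0 (tends to fire)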
50
Tanh Activation Function
• tanh is also like logistic sigmoid but better.
• The range of the tanh function is from (-1 to 1). tanh is also sigmoidal (s - shaped).
• The advantage is that the negative inputs will be mapped strongly negative and the zero
inputs will be mapped near zero in the tanh graph.
• The function is differentiable.
• The function is monotonic
• Its derivative is not monotonic.

51
Tanh vs Sigmoid

52
ReLu Activation Function
• ReLU stands for rectified linear unit, and is a type of activation
function. Mathematically, it is defined as y = max(0, x).
• ReLU is linear (identity) for all positive values, and zero for all
negative values. This means that:
• It’s cheap to compute as there is no complicated math. The
model can therefore take less time to train or run.
• It converges faster. Linearity means that the slope doesn’t
plateau, or “saturate,” when x gets large. It doesn’t have the
vanishing gradient problem suffered by other activation
functions like sigmoid or tanh.
• It’s sparsely activated. Since ReLU is zero for all negative
inputs, it’s likely for any given unit to not activate at all. This is
often desirable (see below).
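A short sketch of ReLU, y = max(0, x), applied element-wise with NumPy (illustrative only):

import numpy as np

def relu(x):
    # Identity for positive inputs, zero for negative inputs
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))   # [0.  0.  0.  0.5 2. ]
# The slope is 1 for x > 0, so the gradient does not saturate for large positive inputs.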

53
Softmax Activation Function

54
Perceptron
• It is used for two class classification.

55
How to train a Perceptron?

56
Steps to train a Perceptron?

57
What are weight and why it is used?

58
Loss Function or Error


59
What is Loss Function or Error

60
Weight updating formula:
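The slide's figure is not reproduced here; as an assumption, the sketch below uses the classic perceptron learning rule, w = w + lr*(y - y_hat)*x and b = b + lr*(y - y_hat), on a toy OR-gate dataset.

import numpy as np

def step(z):
    return 1 if z > 0 else 0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # toy inputs
y = np.array([0, 1, 1, 1])                       # OR targets
w, b, lr = np.zeros(2), 0.0, 0.1                 # weights, bias, learning rate

for _ in range(10):                              # a few passes over the data
    for xi, yi in zip(X, y):
        error = yi - step(w @ xi + b)            # y - y_hat
        w += lr * error * xi                     # weight update
        b += lr * error                          # bias update

print(w, b)   # learned weights and bias that separate the two classes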

61
62
Bias:

It is used to shift the activation function

63
64
NLP Context: Weights and Bias
• In the context of Natural Language Processing (NLP) and neural networks, "weights"
refer to the parameters that the model learns during the training process.
• These weights are associated with the connections between neurons (or nodes) in the
neural network. The weights determine the strength of the connections and play a crucial
role in shaping the model's ability to make predictions based on input data.
• Example: Sentiment Analysis
• Suppose we have a simple neural network for sentiment analysis with the following architecture:
• Input Layer: Each input neuron represents the presence or absence of a specific word in the input text.
• Hidden Layer: Neurons in the hidden layer process the input features and learn to capture patterns and
relationships between words.
• Output Layer: The output layer produces the final prediction, indicating the sentiment of the input text.
• During the training process, the neural network adjusts its weights based on the provided
labeled data (input text with corresponding sentiment labels). The goal is to minimize the
difference between the predicted sentiment and the actual sentiment.

65
NLP Context: Weights and Bias
• For each connection between neurons, there is a weight associated with that
connection. These weights determine the contribution of each input feature
to the final prediction. Higher weights indicate a stronger influence, while
lower weights indicate a weaker influence.
• Example:
• Input Layer: Suppose we have three input neurons corresponding to three words:
"good," "bad," and "movie."
• Hidden Layer: The hidden layer neurons learn to assign weights to the input neurons
based on their importance in determining sentiment. For example:
• Weight for "good" input: 0.8
• Weight for "bad" input: -0.5
• Weight for "movie" input: 0.3
• Output Layer: The output layer combines the weighted inputs to produce a sentiment
prediction. Let's say the weighted sum is 0.6. If the model is using a sigmoid
activation function, it might output a value above 0.5 (indicating a positive
sentiment).
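As an illustration, using the example weights above and a simple 0/1 presence encoding (the encoding is an assumption for this sketch), the weighted sum and sigmoid output could be computed like this:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

weights = {"good": 0.8, "bad": -0.5, "movie": 0.3}   # example weights from above

def predict(words_present):
    z = sum(weights[w] for w in words_present if w in weights)  # weighted sum
    return sigmoid(z)

print(round(predict(["good", "movie"]), 3))   # z = 1.1  -> ~0.750, leans positive
print(round(predict(["bad", "movie"]), 3))    # z = -0.2 -> ~0.450, leans negative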
NLP Context: Weights and Bias
• During training, these weights are adjusted using optimization algorithms
like gradient descent to minimize the difference between the predicted
sentiment and the true sentiment labels in the training data.

• Weights in NLP neural networks represent the learned parameters that


determine the influence of input features on the model's predictions, and
they are crucial for capturing the complex relationships within the language
data.

67
NLP Context: Bias
• In a neural network, each neuron has associated weights and a bias term. The
bias is a learnable parameter that is added to the weighted sum of the inputs
before passing through an activation function. The bias term allows the
neural network to model functions that do not necessarily pass through the
origin.
• For example, in a simple linear equation y=wx+b, where w is the weight and
b is the bias, the bias term allows the line to have an offset or a y-intercept.
In the context of NLP, this bias term is present in each neuron and
contributes to the overall flexibility and expressiveness of the model.

68
NLP Context: Bias
• In NLP tasks, the bias term helps the neural network capture patterns in the
data that may not be purely based on the input features. It allows the model
to learn the inherent tendencies or predispositions that exist in the data.
• For instance, in sentiment analysis, the bias term allows the model to learn
whether certain words or combinations of words tend to contribute to
positive or negative sentiment, regardless of the specific values of the input
features.
• Example: Consider a sentiment analysis task where the input is a sentence,
and the goal is to predict whether the sentiment is positive or negative. Each
word in the sentence is represented as a feature. The neural network, with its
bias terms, can learn that certain words (features) might generally contribute
positively or negatively to the sentiment, regardless of the specific
combination of words in a given instance.
Sentiment Analysis with Bias Term
• Suppose we have the following training data:
• Positive Example: "I love this product! It's amazing!"
• Negative Example: "The quality of this item is terrible."
• We want to build a neural network for sentiment analysis. For simplicity,
let's use a basic model with one neuron (unit) for sentiment prediction, and
this neuron has a weight (w) and a bias (b). The model might look like this:
• Prediction=sigmoid(Weighted Sum of Inputs+Bias)
• Now, let's assign some arbitrary values to the weights and bias:
• Weight (w) = 0.5
• Bias (b) = −0.2
• For the sake of simplicity, let's assume we are using the sigmoid activation
function, and the threshold for classifying sentiments is set at 0.5 (above 0.5
is positive, below 0.5 is negative).
Sentiment Analysis with Bias Term
• Positive Example: "I love this product! It's amazing!"
• The model processes the input words ("I," "love," "this," "product," "amazing"):
• Weighted Sum of Inputs=0.5×(sum of word embeddings)
• Bias=−0.2
• Prediction=sigmoid(Weighted Sum of Inputs+Bias)
• Let's say the model calculates a prediction of 0.8, which is above the threshold, so it correctly
predicts a positive sentiment.
• Negative Example: "The quality of this item is terrible."
• The model processes the input words ("The," "quality," "of," "this," "item," "terrible"):
• Weighted Sum of Inputs=0.5×(sum of word embeddings)
• Bias=−0.2
• Prediction=sigmoid(Weighted Sum of Inputs+Bias)
• Let's say the model calculates a prediction of 0.2, which is below the threshold, so it correctly
predicts a negative sentiment.

71
Sentiment Analysis with Bias Term
• Bias Interpretation:
The bias term (-0.2) allows the model to shift the decision boundary. Because this bias is negative,
the weighted sum of inputs must be clearly positive before the prediction crosses the positive
threshold; a positive bias would do the opposite, pushing the prediction towards positivity even when
the weighted sum of inputs is only weakly positive.

• This bias term allows the model to capture overall sentiments and tendencies present in the training
data, contributing to its ability to generalize to new examples. However, it's essential to be aware of
biases that may exist in the training data and consider them in the development and evaluation of
sentiment analysis models.
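A tiny numerical sketch of the slide's w = 0.5, b = -0.2 model; the "sum of word embeddings" values are assumed numbers chosen only to reproduce predictions near 0.8 and 0.2.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 0.5, -0.2                 # weight and bias from the slide

positive_sum = 3.2               # assumed embedding sum for the positive example
negative_sum = -2.4              # assumed embedding sum for the negative example

print(round(sigmoid(w * positive_sum + b), 2))   # ~0.80 -> positive
print(round(sigmoid(w * negative_sum + b), 2))   # ~0.20 -> negative

# With b = 0 the decision boundary sits at an embedding sum of 0; with b = -0.2
# the sum must exceed 0.4 before the prediction crosses the 0.5 threshold.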

72
Example: Sentiment Analysis
• Suppose you are working on sentiment analysis, a task where the goal is to determine the
sentiment expressed in a sentence (positive, negative, or neutral). Here's a sequence of
sentences with corresponding sentiments:
• Sentence: "I love this movie!"
• Sentiment: Positive

• Sentence: "The weather is terrible today.“


• Sentiment: Negative

• Sentence: "Neutral statement with no strong sentiment.“


• Sentiment: Neutral
• Now, let's represent these sentences as sequential data for processing by a
neural network. We'll use a simple tokenization approach where each word
is a discrete element in the sequence:
Example: Sentiment Analysis
• Tokenized Sequence for "I love this movie!":
• ["I", "love", "this", "movie", "!"]
• Tokenized Sequence for "The weather is terrible today.":
• ["The", "weather", "is", "terrible", "today", "."]
• Tokenized Sequence for "Neutral statement with no strong sentiment.":
• ["Neutral", "statement", "with", "no", "strong", "sentiment", "."]
• A simple recurrent neural network (RNN) for sentiment analysis. The RNN
processes each word in the sequence one at a time, updating its hidden state
at each step. The final hidden state contains information about the entire
sequence.

74
Example: Sentiment Analysis
• RNN Architecture:
• Input Layer: Each word is represented by an embedding.
• Hidden Layer: Captures sequential dependencies.
• Output Layer: Produces sentiment prediction.
• During training, the weights in the network are adjusted based on the
sentiment labels, and the model learns to capture the relationships between
words in the context of sentiment.
• During the processing of the sentence "I love this movie!" in the RNN, the
network learns to give higher weights to the words "love" and "movie" while
considering the overall context of the sentence. This allows the model to
make a positive sentiment prediction.
• Sequential data, in this case, text data, is represented as a sequence of
elements, and neural networks like RNNs are designed to capture the
dependencies and patterns within such sequences for tasks like sentiment
analysis.
NLP Use cases
• Autocompletion when typing a sentence in Gmail
• Spam filters based on keywords
• Language translation
• Customer service chatbots
• Voice assistants
• Google Search, which uses the BERT language model
• Automatic news generation and detecting whether a news item is a rumor or not

76
NLP
• NLP is a field of computer science and AI that gives machines the ability to understand human
language better and to assist in language-related tasks.

77
Regex for NLP

78
Goal of Bayesian Learning

79
ArgMax Computation

80
ArgMax Computation
• In natural language processing (NLP), the argmax function is commonly used to find
the argument (input) that maximizes a given function. Specifically, it is often used to
find the index or value of the maximum element in a set of values. In the context of NLP,
this can be applied to various tasks, such as text classification, language modeling, and
sequence labeling.
• Let's break down the concept with an example related to text classification. Suppose you
have a set of text documents, and you want to classify each document into one of several
categories (e.g., sports, politics, entertainment). To achieve this, you might use a
machine learning model that assigns a probability distribution over the possible
categories for each document.
• Let's say you have a document, and after applying your model, you obtain the following
probability distribution over categories:
• Probabilities:[0.1,0.4,0.2,0.3]

81
ArgMax Computation
• Let's say you have a document, and after applying your model, you obtain the following
probability distribution over categories:
• Probabilities:[0.1,0.4,0.2,0.3]
• In this example, the array represents the probabilities assigned to each category. To
determine the predicted category, you use the argmax function to find the index of the
maximum probability:
• Argmax:argmax([0.1,0.4,0.2,0.3])=1
• In this case, the argmax function returns 1, indicating that the second category (index 1)
has the highest probability. Therefore, you predict that the document belongs to the
category associated with index 1 (e.g., politics).
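A minimal NumPy sketch of this computation; the category names are hypothetical labels added only for illustration.

import numpy as np

probabilities = np.array([0.1, 0.4, 0.2, 0.3])                     # model output over categories
categories = ["sports", "politics", "entertainment", "business"]   # hypothetical labels

predicted_index = int(np.argmax(probabilities))                    # index of the largest value
print(predicted_index)                                             # 1
print(categories[predicted_index])                                 # politics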

82
Stages
Phonetics: Phonetics is the study of the physical sounds of human speech. It involves the
analysis and classification of speech sounds, known as phonemes, based on their
articulation and acoustic properties.
Morphology: Morphology is the study of the structure and formation of words. It deals
with morphemes, which are the smallest units of meaning in a language. Morphology
explores how morphemes combine to create words and how words can be modified.
Lexical Analysis: Lexical analysis, also known as lexical processing or lexical semantics,
is a stage in language processing where the input text is broken down into individual words
or tokens. This process is crucial for understanding the basic units of language.
Syntactic Analysis: Syntactic analysis focuses on the grammatical structure of sentences. It
involves analyzing the arrangement of words to determine how they relate to each other
and form grammatically correct sentences.

83
Stages

Semantic Analysis: Semantic analysis is concerned with the meaning of words and how
they combine to convey meaning in sentences. This stage aims to understand the intended
meaning of the text.
Pragmatics: Pragmatics involves the study of language in context, considering the social
and cultural aspects of communication. It deals with how context influences the
interpretation of language, including implied meaning, tone, and the speaker's intentions.
Discourse: Discourse refers to the larger units of connected text or speech beyond the
sentence level. It involves the study of how sentences are organized and connected to form
coherent and meaningful communication. Discourse analysis examines the structure,
coherence, and flow of information in a conversation or text.

84
Stages Challenges
Phonetics:
Variability in Pronunciation: Different speakers may pronounce words or phonemes differently,
introducing variability that needs to be accounted for in speech recognition systems.
Background Noise: Environmental noise can affect the accuracy of speech recognition systems, making
it challenging to accurately transcribe spoken words.
Example: In speech recognition, the word "read" can be pronounced differently based on context (past
tense or present tense). Distinguishing between these pronunciations accurately poses a phonetic
challenge.
Morphology:
Morphological Ambiguity: Words may have multiple meanings or interpretations depending on their
context, leading to challenges in determining the correct morphological analysis.
Word Formation in New Contexts: Morphology may face difficulties when dealing with newly coined
words or words used in creative ways.
Example: The word "running" can be analyzed as a verb in its present participle form or as a noun
referring to a physical activity. Identifying the correct morphological analysis requires considering the
context of usage.
85
Stages Challenges
Lexical Analysis:
Out-of-Vocabulary Words: Handling words that are not present in the system's vocabulary can be
challenging, especially in the case of rapidly changing languages or domain-specific terminology.
Word Sense Disambiguation: Resolving the correct meaning of a word with multiple meanings in a
specific context can be a challenge.
Example: In a medical text, the term "virus" may refer to a biological entity, while in a computer science
context, it could refer to malicious software. Lexical analysis needs to account for domain-specific
meanings.
Syntactic Analysis:
Parsing Ambiguity: Sentences may have multiple valid syntactic interpretations, and resolving ambiguity
in parsing can be complex.
Handling Ellipsis and Anaphora: Understanding references and omitted words in sentences poses
challenges for syntactic analysis.
Example: The sentence "I saw the man with the telescope" can be syntactically
ambiguous. It's unclear whether the speaker used a telescope to see the man or if the
man had the telescope. Resolving this ambiguity is a syntactic challenge.
Stages Challenges
Semantic Analysis:
Ambiguity in Meaning: Similar to syntactic ambiguity, words or phrases may have multiple meanings,
and determining the correct meaning in a given context can be challenging.
Lack of World Knowledge: Understanding context often requires knowledge beyond language, and
incorporating real-world knowledge into semantic analysis is an ongoing challenge.
Example: In the sentence "He's really cool under pressure," the word "cool" has a positive connotation,
but in another context, it might mean a low temperature. Semantic analysis must capture the intended
meaning based on context.

Pragmatics:
Cultural and Contextual Variation: Pragmatics involves understanding context and cultural nuances,
which can vary widely. Adapting to different cultural contexts poses challenges for pragmatic analysis.
Recognizing Intention and Politeness: Identifying the speaker's intention, implied meaning, and levels
of politeness in communication can be challenging.
Example: The statement "Could you pass the salt?" might be a polite request for salt at the dinner table.
Recognizing the politeness and understanding the pragmatic implications of the request is crucial for
accurate communication.
Stages Challenges
Discourse:
Coherence and Cohesion: Ensuring that sentences in a discourse are logically connected and form a
coherent whole presents challenges.
Dealing with Ambiguous References: Resolving references to entities across sentences and maintaining
coherence can be challenging, especially in longer texts.
Example: In a news article, understanding how different paragraphs are connected and identifying the
main theme can be challenging. Ensuring coherence and cohesion in longer texts requires sophisticated
discourse analysis.

88
Stages Challenges
Discourse:
Consider a segment from a political speech where a candidate is addressing a crowd:
"Ladies and gentlemen, thank you for being here today. We are at a crucial juncture in our nation's history. Our
economy is facing challenges, and our education system requires reform. But I believe in the resilience of our
people. Together, we can build a stronger future. Now, let's talk about our plan for economic recovery."
In this example:
Opening (Sentence 1): The speaker begins by expressing gratitude and acknowledging the audience. This sets the
tone and establishes a connection with the listeners.
Introduction of the Topic (Sentence 2): The speaker introduces the main topic—facing challenges in the nation's
history. This sets the context for the upcoming discussion.
Transition and Positive Assertion (Sentence 3): The speaker acknowledges challenges but emphasizes optimism and
confidence in the ability to overcome them. This serves as a transitional statement, guiding the audience toward a
positive mindset.
Call to Action (Sentence 4): The speaker transitions from the general context to a specific plan of action, expressing
a belief in collaboration for a better future.
Transition to Specifics (Sentence 5): The speaker concludes the introductory segment by signaling a shift in focus to
a detailed discussion about the plan for economic recovery.
89
Term Weighting
• Term weighting is a crucial concept in natural language processing (NLP) and information retrieval. It
involves assigning numerical weights to words or terms in a document based on their importance or
relevance.
• Term weighting is especially important in tasks such as text classification, information retrieval, and
document ranking. Some common weighting schemes are listed below:
• Binary Weighting:
• Term Frequency (TF)
• Inverse Document Frequency (IDF)
• Term Frequency-Inverse Document Frequency (TF-IDF)
• Logarithmic Term Frequency
• Normalization
• These techniques are used to represent and weigh terms in documents, creating a numerical representation
that can be used in various NLP applications such as document retrieval, text classification, and clustering.
The choice of term weighting method depends on the specific requirements of the task and the
characteristics of the dataset

90
Tokenization
• Tokenization is the process of breaking down a text into individual units, called tokens.
• These tokens can be words, phrases, sentences, or other meaningful elements, depending on the level
of granularity needed for analysis.
• Tokenization is a fundamental step in natural language processing (NLP) and text processing tasks.
• NLTK (Natural Language Toolkit) provides a simple and effective way to perform tokenization in
Python.
• Library used and example (reordered so the imports come first; sample_text can be any input string):

import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize, sent_tokenize

sample_text = "NLP is fascinating. It helps machines understand human language."

# Tokenize into words
tokens_words = word_tokenize(sample_text)

# Tokenize into sentences
tokens_sentences = sent_tokenize(sample_text)

91
Stemming

92
Lemmatization

93
Stopwords

94
Bag of words
• In natural language processing (NLP), the Bag of Words (BoW) model is a commonly used technique to
represent text data. It's a simple and effective way to convert variable-length sequences (sentences or
documents) into fixed-length feature vectors.
• Tokenization:
• The first step is to break down the text into individual words or terms, a process known as tokenization.
• Punctuation, capitalization, and other linguistic aspects are often ignored during this phase.
• Vocabulary Creation:
• Create a vocabulary containing all unique words present in the corpus (collection of documents or sentences).
• Each unique word is assigned a unique index or identifier.
• Vectorization:
• Represent each document or sentence as a vector, where each element corresponds to the frequency (or presence/absence) of a
word in the vocabulary.
• The resulting vectors are of fixed length, and the order of words is usually ignored.
• Example:
• Consider two sentences: "I love machine learning" and "I love deep learning."
• The vocabulary might be ["I", "love", "machine", "deep", "learning"].

95
Bag of words

• The vectors for the two sentences would be [1, 1, 1, 0, 1] and [1, 1, 0, 1, 1], respectively.
• Sparse Representation:
• Since most documents only contain a small subset of the words in the vocabulary, the BoW representation is often sparse.
• The vectors are sparse because most elements are zero, indicating the absence of the corresponding word.
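A short scikit-learn sketch of the same idea (assuming scikit-learn is available); the token pattern is widened here so single-character words like "I" are kept, matching the slide's vocabulary.

from sklearn.feature_extraction.text import CountVectorizer

corpus = ["I love machine learning", "I love deep learning"]

# Keep single-character tokens and original casing so "I" stays in the vocabulary.
vectorizer = CountVectorizer(lowercase=False, token_pattern=r"(?u)\b\w+\b")
X = vectorizer.fit_transform(corpus)          # sparse document-term matrix

print(vectorizer.get_feature_names_out())     # the learned vocabulary
print(X.toarray())                            # one count vector per sentence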

96
Bag of words: Limitation
• The Bag of Words model has its limitations. It does not consider the
order of words, which means it loses information about the structure
of sentences. Additionally, it treats each word independently,
disregarding the semantic relationships between words. Despite
these limitations, BoW is widely used as a feature representation in
various NLP applications, such as text classification, sentiment
analysis, and document clustering.

97
Remove Punctuation
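A minimal sketch of removing punctuation with Python's standard library (the sample sentence is only an illustration):

import string

sample_text = "I love this movie!!! It's amazing, isn't it?"

# Delete every character listed in string.punctuation.
cleaned = sample_text.translate(str.maketrans("", "", string.punctuation))
print(cleaned)   # I love this movie Its amazing isnt it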

99
Syntactic Collocation
• In natural language processing (NLP), syntactic collocation refers to the tendency of certain words or types
of words to occur together in a sentence based on their syntactic structures. Syntactic collocations involve
the combination of words in a way that is grammatically acceptable and typical for a particular language.
• Examples:
• Example: "Make" and "Decision"
• Consider the phrase "make a decision." In this collocation, the verb "make" is syntactically paired with the noun
"decision" to form a common and grammatically correct phrase. Here's how it works:
• Correct Usage: "I need to make a decision.“
• In this example, "make" is a verb that requires an object to complete its meaning. The noun "decision" serves as the object, and together they
form a syntactically collocated phrase. Changing the noun to something else that doesn't commonly collocate with "make" might result in a
sentence that sounds awkward or grammatically incorrect:
• Incorrect Usage: "I need to make a table."
• In this case, "make a table" is not a common syntactic collocation, and the sentence sounds unnatural because the verb "make" typically
collocates with abstract or decision-related nouns.

• Syntactic collocations are important in NLP because understanding these patterns can help improve the
accuracy of various language processing tasks, such as parsing, sentiment analysis, and machine
translation. NLP models that take into account syntactic collocations can better capture the natural
structures of language, making them more effective in understanding and generating human-like text.
100
Term Weighting
• Term weighting is a crucial concept in natural language processing (NLP) and information retrieval. It
involves assigning numerical weights to words or terms in a document based on their importance or
relevance.
• Term weighting is especially important in tasks such as text classification, information retrieval, and
document ranking. Some common weighting schemes are listed below:
• Binary Weighting:
• Term Frequency (TF)
• Inverse Document Frequency (IDF)
• Term Frequency-Inverse Document Frequency (TF-IDF)
• Logarithmic Term Frequency
• Normalization
• These techniques represent and weigh terms in documents, creating a numerical representation that can be
used in various NLP applications such as document retrieval, text classification, and clustering. The choice
of term weighting method depends on the specific requirements of the task and the characteristics of the
dataset

101
Term Weighting
• Term Frequency (TF):
• Definition: Term frequency represents the number of times a term occurs in a document. It is a basic measure of the importance
of a term within a document.
• Calculation: TF(i, j) = (number of times term j appears in document i) / (total number of terms in document i)

• Inverse Document Frequency (IDF):
• Definition: Inverse document frequency measures how unique or rare a term is across a collection of documents. It helps in
highlighting the importance of terms that are not common across all documents.
• Calculation: IDF(j) = log(total number of documents / number of documents containing term j)

• Term Frequency-Inverse Document Frequency (TF-IDF):
• Definition: TF-IDF is a combination of term frequency and inverse document frequency. It provides a weight for each term in a
document, emphasizing terms that are frequent within the document but not across the entire document collection.
• Calculation: TF-IDF(i, j) = TF(i, j) × IDF(j)

102
Term Frequency-Inverse Document
Frequency (TF-IDF)
• TF = (number of times the word is repeated in the sentence) / (number of words in the sentence)
• IDF = log(number of sentences / number of sentences containing the word)
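A small Python sketch of these two formulas on toy sentences (the sentences are assumptions for illustration):

import math

sentences = ["good boy", "good girl", "boy girl good"]

def tf(word, sentence):
    words = sentence.split()
    return words.count(word) / len(words)          # repeated words / total words

def idf(word, sentences):
    containing = sum(1 for s in sentences if word in s.split())
    return math.log(len(sentences) / containing)   # log(total / containing)

for word in ["good", "boy", "girl"]:
    print(word, round(tf(word, sentences[0]), 3), round(idf(word, sentences), 3))
# "good" appears in every sentence, so IDF = log(3/3) = 0 and its TF-IDF weight
# vanishes; rarer words such as "boy" and "girl" get higher weights.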

103
Word2Vec
• Word2Vec is a popular natural language
processing (NLP) technique used to represent
words as continuous vector spaces, capturing
semantic relationships between words.
• Developed by a team at Google, Word2Vec is an
unsupervised learning algorithm that learns
distributed representations of words based on
their context in a given corpus.
• This representation is often referred to as word
embeddings.
• The main idea behind Word2Vec is to represent
words with similar meanings or contexts as
vectors in a multi-dimensional space, making
them close to each other.

104
Continuous Bag of Words (CBOW)
• Objective: Predict the target word (center word) based on the
surrounding context words.
• Example: Given the context words "The cat sat on the," predict the
target word "mat."
• Training: The model adjusts the word vectors to maximize the
likelihood of predicting the target word from the context words.

125
Skip-Gram
• Objective: Predict the context words based on the target word (center
word).
• Example: Given the target word "mat," predict the context words
"The cat sat on the."
• Training: The model adjusts the word vectors to maximize the
likelihood of predicting the context words from the target word.
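A small sketch using the gensim library (assuming it is installed); setting sg=0 trains CBOW and sg=1 trains Skip-Gram, and the tiny corpus is only for illustration.

from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["i", "love", "machine", "learning"],
]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)       # CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)   # Skip-Gram

print(cbow.wv["cat"][:5])                  # first few dimensions of the "cat" vector
print(skipgram.wv.most_similar("cat"))     # words closest to "cat" in vector space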

126
POS Tagging:
• POS tagging is a natural language processing (NLP) task that involves
assigning a part-of-speech label (such as noun, verb, adjective, etc.)
to each word in a sentence. This task helps in understanding the
grammatical structure and meaning of a sentence.

127
POS Types:
• Rule-Based POS Tagging:
• Approach: Rule-based POS tagging relies on a set of handcrafted rules to assign POS
tags based on the grammatical and syntactic characteristics of words in a sentence.
• Pros: Transparent, easy to understand, and customizable.
• Cons: May not perform well with ambiguous or irregular cases.

128
POS Types: Rule Based

130
POS Types: Rule-Based
• In this example, we define a simple set of rules to assign POS tags to
words based on their lexical properties. The rules are as follows:

• Determiners: "the," "a," "an" are assigned the tag "DET."


• Nouns: "cat" and "dog" are assigned the tag "NOUN."
• Verbs: "is," "am," "are," "was," "were" are assigned the tag "VERB."
• Prepositions: "on," "in," "under," "over" are assigned the tag "PREP."
• Punctuation: ".", ",", "?," "!" are assigned the tag "PUNCT."
• Other words: Any other words are assigned the tag "UNKNOWN."

131
POS Types:
• Stochastic POS Tagging:

• Approach: Stochastic or probabilistic methods use statistical models


to determine the most likely POS tags for words based on the
probabilities of observing certain tags given the context or corpus.
• Examples: Hidden Markov Models (HMMs), Maximum Entropy
Markov Models (MEMMs), and Conditional Random Fields (CRFs).
• Pros: Can handle ambiguity and context-dependent situations better
than rule-based approaches.
• Cons: Requires annotated training data for model training.

132
POS Types:
• Transformation-Based POS Tagging:

• Approach: Transformation-based tagging involves learning a series of transformational
rules from annotated training data to predict POS tags.
• Example: Brill's transformation-based learning algorithm.
• Pros: Adaptable and capable of learning complex patterns from data.
• Cons: Requires labeled training data

133
POS Types:
• Transformation-Based POS Tagging, also known as Brill's POS tagging,
is a method that involves learning a series of transformational rules
from annotated training data to predict POS tags. The algorithm
iteratively applies rules to improve the accuracy of the POS tagging.

134
POS Advantages:
• Text Simplification: Breaking complex sentences down into their
constituent parts makes the material easier to understand and easier
to simplify.
• Information Retrieval: Information retrieval systems are enhanced by
part-of-speech (POS) tagging, which allows for more precise indexing
and search based on grammatical categories.
• Named Entity Recognition: POS tagging helps to identify entities such
as names, locations, and organizations inside text and is a
precondition for named entity identification.
• Syntactic Parsing: It facilitates syntactic parsing, which helps with
phrase structure analysis and word link identification.
135
POS Disadvantages:
• Ambiguity: The inherent ambiguity of language makes POS tagging difficult
since words can signify different things depending on the context, which
can result in misunderstandings.
• Idiomatic Expressions: Slang, colloquialisms, and idiomatic phrases can be
problematic for POS tagging systems since they don’t always follow formal
grammar standards.
• Out-of-Vocabulary Words: Out-of-vocabulary words (words not included in
the training corpus) can be difficult to handle since the model might have
trouble assigning the correct POS tags.
• Domain Dependence: For best results, POS tagging models trained on a
single domain should have a lot of domain-specific training data because
they might not generalize well to other domains.
136
Word Sense Disambiguation (WSD)
• Word Sense Disambiguation (WSD) is the process of determining the correct
sense of a word in context, especially when a word has multiple meanings.
• WordNet is a widely used lexical database of the English language that provides a
structured representation of word meanings and their relationships. It plays a
significant role in Word Sense Disambiguation.

137
Homonymy | Hyponymy & Hypernymy |
Meronymy | Synonymy | Antonymy | Polysemy
• Homonymy: Homonyms are words that have the same spelling or pronunciation but different
meanings. For example, "bat" can refer to a flying mammal or a piece of sports equipment used in
baseball.
• Hyponymy & Hypernymy: Hyponymy refers to a hierarchical relationship between words where one
word (hyponym) is a subtype or specific instance of another word (hypernym). For example, "apple" is
a hyponym of "fruit" (hypernym), and "rose" is a hyponym of "flower" (hypernym).
• Meronymy: Meronymy describes a part-whole relationship between words, where one word
represents a part of another word. For example, "wheel" is a meronym of "car" because it is a
component part of a car.
• Synonymy: Synonyms are words that have the same or similar meanings. For example, "happy" and
"joyful" are synonyms because they convey similar sentiments.
• Antonymy: Antonyms are words that have opposite meanings. For example, "hot" and "cold" are
antonyms because they represent opposite temperature extremes.
• Polysemy: Polysemy refers to the phenomenon where a single word has multiple related meanings. For
example, "bank" can refer to a financial institution or the side of a river.

139
Algorithms:

140
Knowledge-Based Approach:

141
Lesk Algorithm
• The Lesk algorithm is a traditional and straightforward approach to Word Sense
Disambiguation (WSD) that selects the sense whose dictionary definition (gloss)
overlaps most with the words in the target word's context. The algorithm was
introduced by Michael E. Lesk in 1986. Here's how the Lesk algorithm works:
• Gather Context: Given a target word to disambiguate and its surrounding context
(usually a sentence or a window of words), tokenize the context into individual
words.
• Retrieve Definitions: Retrieve the set of dictionary definitions for the target word
from a lexical resource such as WordNet. Each definition corresponds to a
possible sense of the word.
• Calculate Overlaps: For each definition of the target word, compute the overlap
(intersection) between the words in the definition and the words in the context.
• Select Best Sense: Choose the sense with the highest overlap as the most
appropriate sense for the target word in the given context.
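
NLTK ships a simplified implementation of this procedure (nltk.wsd.lesk); a minimal usage sketch, assuming the 'wordnet' corpus has been downloaded:

from nltk.wsd import lesk

context = "I went to the bank to deposit my money".split()
sense = lesk(context, "bank")   # returns the Synset with the largest gloss overlap
if sense is not None:
    print(sense.name(), "-", sense.definition())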

142
Lesk Algorithm

143
Walker Algorithm
• The Walker algorithm is another approach to Word Sense Disambiguation (WSD), designed to handle
ambiguous words in a given context. It is named after its originator, Walker. In the form described here, it is a
heuristic-based method that leverages word embeddings and semantic similarity to disambiguate the senses of
ambiguous words. Here's an overview of the algorithm:
• Preprocessing: Tokenize the input sentence into individual words. Obtain word embeddings for each word in the sentence. Word embeddings
are dense vector representations of words in a continuous vector space, where words with similar meanings have similar vector
representations.
• Identify Ambiguous Words: Determine which words in the sentence are ambiguous and require disambiguation. These are typically words that
have multiple senses according to a lexical resource like WordNet.
• Calculate Semantic Similarity: For each ambiguous word, compute the semantic similarity between its different senses and the surrounding
context words. This can be done using cosine similarity or other distance metrics in the word embedding space.
• Select Best Sense: Choose the sense with the highest semantic similarity to the surrounding context as the most appropriate sense for the
ambiguous word.
• Disambiguation: Replace the ambiguous word with its disambiguated sense in the sentence.
• The Walker algorithm relies on the assumption that the correct sense of an ambiguous word is the one that is most
semantically similar to the context in which it appears. By comparing the word embeddings of different senses with
the context, the algorithm attempts to identify the most suitable sense.
for each ambiguous word in the sentence:
    obtain word embeddings for each sense of the word
    compute semantic similarity between each sense and the surrounding context
    select the sense with the highest similarity
    replace the ambiguous word with the selected sense
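
A runnable, simplified sketch of this loop follows; as an assumption, TF-IDF vectors over WordNet glosses stand in for the pretrained dense word embeddings described above (requires nltk with the 'wordnet' corpus and scikit-learn):

from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def disambiguate(word, context_sentence):
    """Pick the sense whose gloss is most similar to the context sentence."""
    senses = wn.synsets(word)
    if not senses:
        return None
    glosses = [s.definition() for s in senses]
    vectors = TfidfVectorizer().fit_transform([context_sentence] + glosses)
    # Compare the context vector (row 0) with each gloss vector (rows 1..n).
    similarities = cosine_similarity(vectors[0], vectors[1:])[0]
    return senses[similarities.argmax()]

sense = disambiguate("bank", "I sat on the bank of the river and watched the water")
print(sense.name(), "-", sense.definition())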

144
Walker Algorithm

145
Word Sense Disambiguation (WSD)
• WordNet Structure:
• WordNet organizes words into synsets (sets of synonyms) and defines
relationships between these synsets. Each synset corresponds to a specific
meaning or sense of a word.
• Sense Inventory:
• WordNet serves as a sense inventory, providing a catalog of word senses. Each
word may have multiple senses, and WordNet enumerates and defines these
senses.
• Lesk Algorithm:
• One common approach to WSD, especially when using WordNet, is the Lesk
algorithm. The Lesk algorithm compares the context of a target word with the
definitions and examples in WordNet to determine the correct sense.
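
Listing the synsets of an ambiguous word shows both the synset structure and the sense inventory described above; a small sketch assuming the 'wordnet' corpus is available:

from nltk.corpus import wordnet as wn

# Each synset is one catalogued sense of "bank", with its own gloss.
for synset in wn.synsets("bank"):
    print(synset.name(), "-", synset.definition())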

146
Word Sense Disambiguation (WSD)
• Word Embeddings:
• In addition to traditional methods like Lesk, modern approaches to WSD may
leverage word embeddings and neural network models trained on large
datasets to capture contextual information and semantic relationships
between words.

148
WordNet
• In Natural Language Processing (NLP), WordNet is a valuable lexical database that
is widely used for various tasks, including word sense disambiguation, semantic
similarity calculation, information retrieval, and text classification. Let's explore
some common ways in which WordNet is utilized in NLP:
• Word Sense Disambiguation (WSD): WordNet provides multiple senses for many words,
capturing different meanings or usages. In WSD, WordNet is employed to determine the
correct sense of an ambiguous word in a given context. By comparing the context with the
definitions and examples in WordNet, NLP systems can infer the most appropriate sense of
the word.
• Semantic Similarity: WordNet organizes words into synsets (sets of synonymous words) and
provides relationships such as hypernyms (more general concepts) and hyponyms (more
specific concepts). NLP applications use these relationships to compute semantic similarity
between words or phrases. For instance, two words are considered semantically similar if
they share a common hypernym in WordNet.
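
A short sketch of both ideas with NLTK's WordNet interface, using illustrative synsets (assumes the 'wordnet' corpus is downloaded):

from nltk.corpus import wordnet as wn

dog, cat, car = wn.synset("dog.n.01"), wn.synset("cat.n.01"), wn.synset("car.n.01")

# Path similarity: closer positions in the hypernym hierarchy give higher scores.
print(dog.path_similarity(cat))   # relatively high
print(dog.path_similarity(car))   # relatively low

# Hypernyms and hyponyms expose the "more general" / "more specific" relations.
print(dog.hypernyms())
print(dog.hyponyms()[:3])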

149
WordNet
• In Natural Language Processing (NLP), WordNet is a valuable lexical database that
is widely used for various tasks, including word sense disambiguation, semantic
similarity calculation, information retrieval, and text classification. Let's explore
some common ways in which WordNet is utilized in NLP:
• Information Retrieval: In information retrieval tasks, WordNet is utilized to expand or refine
search queries by including synonyms and related terms. By incorporating synonyms from
WordNet, search engines can retrieve documents that contain conceptually similar content,
improving the relevance of search results.
• Text Classification: WordNet-based features can be used in text classification tasks to
enhance the representation of text data. For example, features derived from WordNet, such
as hypernyms or hyponyms of words in a document, can provide additional semantic
information that helps classifiers distinguish between different categories or topics.
• Lexical Resource for NLP Systems: WordNet serves as a valuable resource for various NLP
applications by providing a structured representation of word meanings and relationships.
NLP systems often incorporate WordNet as a knowledge base to enrich their understanding
of language and improve performance in tasks such as parsing, named entity recognition, and
machine translation.
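
As one concrete illustration of the query-expansion use above, a hedged sketch of synonym-based expansion with WordNet; the query term and the underscore handling are illustrative assumptions:

from nltk.corpus import wordnet as wn

def expand_query(term):
    """Collect lemma names from every synset of the term as candidate expansions."""
    expansions = set()
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            expansions.add(lemma.name().replace("_", " "))
    return expansions

print(expand_query("car"))   # e.g. {'car', 'auto', 'automobile', 'railcar', ...}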
150
151
152
THANK YOU
