21cse356t NLP Unit 3
NATURAL LANGUAGE PROCESSING
Instructor:
Ms. S. Rama,
Assistant Professor
Department of Information Technology,
SRM Institute of Science and Technology,
Unit III - Semantic and Discourse Analysis
• Representing Meaning, Lexical Semantics
• Word Senses, Relation between Senses
• Word Sense Disambiguation
• Word Embeddings
• Word2Vec
• CBOW, Skip-gram and GloVe
• Discourse Segmentation,
• Text Coherence, Discourse Structure
• Reference Resolution
• Pronominal Anaphora Resolution, Coreference Resolution
Semantic Analyzer
• What is the meaning of a word?
– dog = animal or sausage?
– lie = to be in a horizontal position or a false statement made with
deliberate intent
• What are the relations of different words in terms of meaning?
– Specific relations between senses
• Animal is more general than dog.
– Semantic fields
• Money is related to bank.
• A semantic field is "a set of words grouped, referring to a specific subject … not
necessarily synonymous, but are all used to talk about the same general phenomenon".
What is the need?
• Verifiability - The system’s ability to compare the state of affairs described
by a representation to the state of affairs in some world, as modelled in a
knowledge base
• Unambiguous representations - A concept closely related to ambiguity is
vagueness
• Canonical Form - The notion that inputs that mean the same thing should
have the same meaning representation is known as the doctrine of canonical
form.
• Inference and variables – Inference is the system’s ability to draw valid
conclusions based on the meaning representation of inputs
• Expressiveness - A meaning representation scheme must be expressive
enough to handle an extremely wide range of subject matter.
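As a brief illustration of canonical form and inference (the sentences and predicate names below are invented for this sketch, not taken from the slides), two different inputs can map to one logical representation:

```latex
% "I have a car" and "A car belongs to me" share one canonical meaning representation:
\[
\exists x \,\big(\mathit{Car}(x) \wedge \mathit{Have}(\mathit{Speaker}, x)\big)
\]
% Inference: combined with a rule such as
% \forall x\,(\mathit{Car}(x) \rightarrow \mathit{Vehicle}(x)),
% the system can validly conclude
% \exists x\,(\mathit{Vehicle}(x) \wedge \mathit{Have}(\mathit{Speaker}, x)).
```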
Representing Meaning, Lexical Semantics
Semantic Analysis
• Parts of Semantic Analysis: Semantic analysis of natural
language can be classified into two broad parts:
1. Lexical Semantic Analysis: Lexical Semantic
Analysis involves understanding the meaning of each
word of the text individually. It basically refers to fetching
the dictionary meaning that a word in the text is deputed
to carry.
2. Compositional Semantic Analysis: It deals with how the
meanings of individual words combine to form the meaning of
larger units such as phrases and sentences.
Lexical semantic relations include (among others):
► Antonymy ► Entailment ► Homonymy ► Co-hyponymy
I. Synonymy
► Synonymy is the semantic equivalence between lexical items. The (pairs of) words that
have this kind of semantic relationship are called synonyms.
► to begin = to start
Synonymy
► Pairs of words that are synonymous are believed to share all (or almost all) their semantic
features or properties. However, no two words have exactly the same meaning in all the
contexts in which they can occur.
► For example, the verbs employ and use are synonymous in the expression “We
used/employed effective strategies to solve the problem”; however, only use can be
used in the following sentence:
► We used a jimmy bar to open the door.
► If we used employ, the sentence would sound awkward:
► We employed a jimmy bar to open the door.
► In short, we can say that there are no absolute synonyms, i.e., pairs of words that have
the same meaning in all the situational and syntactic contexts in which they can appear.
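These lexical relations can also be explored programmatically. Below is a minimal sketch (not from the slides) that looks up synonym sets with NLTK's WordNet interface; it assumes the WordNet data has been downloaded via nltk.download("wordnet").

```python
# Minimal WordNet lookup for the synonymy example "to begin = to start".
from nltk.corpus import wordnet as wn

begin_senses = wn.synsets("begin", pos=wn.VERB)
start_senses = wn.synsets("start", pos=wn.VERB)

# Synonyms are lemmas grouped into the same synset; "begin" and "start"
# should share at least one synset, i.e. they are synonymous in some senses
# (but not in all contexts, as the slide notes).
shared = set(begin_senses) & set(start_senses)
print(shared)

# Lemma names listed for the first sense of "begin".
print(begin_senses[0].lemma_names())
```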
II. Antonymy
► Antonymy is the semantic relationship that exists between two (or more) words
that have opposite meanings.
► Antonymous pairs of words usually belong to the same grammatical category (both
elements are nouns / adjectives / verbs).
► The semantic feature that they do not share is present in one member of the pair
and absent in the other
Types of Antonymy
► 1. Complementary or contradictory antonyms.
► They are pairs of words in which one member has a certain semantic property that the
other member does not have. Therefore, in the context(s) in which one member is true,
the other member cannot be true.
► It is said that these pairs of antonyms exhibit an either/or kind of contrast in which
there is no middle ground.
Types of Antonymy
► 2. Relational antonyms.
► They are pairs of words in which the presence of a certain semantic property in one
member implies the presence of another semantic property in the other member.
► In other words, the existence of one of the terms implies the existence of the other
term.
► 3. Gradable antonyms.
► They are pairs of words that are contrasted with respect to their degree of possession of a
certain semantic property.
► Each term represents or stands for an end-point (or extreme) on a scale (e.g., of temperature,
size, height, beauty, etc.); between those end-points there are other intermediate points (i.e.,
there is some middle ground).
► Morphologically related
► One of the members of a pair of antonyms is derived from the other member by the addition of
a negative word or an affix
► By adding negative prefixes such as un-, im-, in-, il-, ir-, non-, mis-, dis-, a-.
Homonymy
► Homonymy is the relationship between words that share the same form (spelling or
pronunciation) but have unrelated meanings. The pairs of words that exhibit this kind of
relationship are called homonyms.
► In isolated spoken sentences, homophonic homonyms can also give rise to lexical
ambiguity.
► John went to the [bæŋk] (the financial institution or the ground by the river?)
► The different meanings of a word are not interchangeable; in fact, they are context-specific.
V. Metonymy (Figurative language)
► Metonymy is the semantic relationship that exists between two words (or a word and an expression) in
which one of the words is metaphorically used in place of the other word (or expression) in particular
contexts to convey the same meaning
► cops = policemen
► Pen = literature
Hyponymy and Co-hyponymy
► If the meaning of a superordinate term is included in the meaning of several other more specific
words, the specific terms that are hyponyms of the same superordinate term are
called co-hyponyms.
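As with synonymy, the relations above (antonymy, hyponymy, co-hyponymy) can be inspected in WordNet. A small sketch, again assuming nltk.download("wordnet") and with sense indices chosen purely for illustration:

```python
from nltk.corpus import wordnet as wn

# Antonymy: WordNet stores antonyms on lemmas, not on whole synsets.
hot = wn.synset("hot.a.01").lemmas()[0]
print(hot.antonyms())                      # typically the lemma for "cold"

# Hyponymy / hypernymy: "dog" has more general terms (hypernyms) above it.
dog = wn.synset("dog.n.01")
print(dog.hypernyms())

# Co-hyponyms: other hyponyms sharing the same superordinate term as "dog".
hypernym = dog.hypernyms()[0]
print([h.name() for h in hypernym.hyponyms()])
```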
Semantic Relationships at Phrase
or Sentence Level - Paraphrase
► Paraphrase can be of two types:
► 1. Lexical paraphrase
► 2. Structural paraphrase
► 1. Lexical paraphrase
► It is the use of a semantically equivalent term in place of another in a given
context. This is also known as synonymy.
► 2. Structural paraphrase
► It is the use of a phrase or sentence in place of another phrase or sentence
semantically equivalent to it, although they have different syntactic structure.
Word Sense Disambiguation (WSD) - Inputs
• A Dictionary
• The very first input for evaluation of WSD is a dictionary, which
is used to specify the senses to be disambiguated.
• Test Corpus
• Another input required by WSD is a sense-annotated test
corpus that has the target or correct senses. The test corpora
can be of two types
• Lexical sample − This kind of corpora is used in the system,
where it is required to disambiguate a small sample of words.
• All-words − This kind of corpora is used in the system, where
it is expected to disambiguate all the words in a piece of
running text.
Challenges in Word Sense Disambiguation
1.Polysemy: Words often have multiple meanings, making it challenging to
determine the intended sense in context.
2. Context Sensitivity: The meaning of a word can vary depending on the
surrounding words, syntactic structure, and discourse context.
3. Ambiguity: Words may have senses that are closely related or contextually
dependent, requiring fine-grained distinctions.
4. Data Sparsity: Annotated data for training WSD models is often limited,
especially for low-frequency senses or in languages with fewer resources.
5. Domain Specificity: Word meanings can vary across different domains or
specialized fields, requiring domain-specific knowledge for
accurate disambiguation.
Approaches and Methods to Word Sense Disambiguation (WSD)
3. Retrieve all possible senses (definitions) of the target word from the dictionary.
4. Retrieve all possible senses (definitions) of the surrounding context words from the
dictionary.
Output: the selected sense of the target word. The algorithm selects Sense 1 (financial
institution) because it has the highest overlap with the context.
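This overlap-based selection is essentially the simplified Lesk algorithm. A minimal sketch using WordNet glosses as the dictionary, assuming nltk.download("wordnet"); the helper name simple_lesk is made up for this example:

```python
from nltk.corpus import wordnet as wn

def simple_lesk(context_words, target_word):
    """Pick the WordNet sense of target_word whose gloss overlaps most with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(target_word):                 # all candidate senses of the target word
        gloss = set(sense.definition().lower().split())   # words in the dictionary definition
        overlap = len(gloss & context)                    # shared words with the context
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sentence = "I went to the bank to deposit my money".split()
sense = simple_lesk(sentence, "bank")
print(sense, "->", sense.definition())   # expected: a financial-institution sense of "bank"
```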
Word Embeddings
• Word embeddings are dense numerical representations of words in a
multi-dimensional space.
Word Embeddings
• Approaches: One-hot encoding, TF-IDF, Word2Vec (Skip-gram).
Example of One-Hot Encoding vs. Word2Vec
Let's consider the word "apple" in a small vocabulary of five words: ["apple",
"banana", "fruit", "car", "dog"].
1.One-Hot Encoding (Sparse):
•"apple" → [1, 0, 0, 0, 0]
•"banana" → [0, 1, 0, 0, 0]
•"fruit" → [0, 0, 1, 0, 0]
2. Word2Vec embedding (dense, 3-dimensional example):
•"apple" → [0.65, 0.45, 0.10]
•"banana" → [0.62, 0.47, 0.08]
•"fruit" → [0.61, 0.46, 0.09]
In the case of embeddings, the vectors are dense and capture semantic
information. The embeddings of "apple" and "banana" are close in vector space,
reflecting their similarity as fruits, whereas "car" and "dog" would be farther apart
from both.
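A small numerical sketch of this contrast, reusing the slide's toy vectors (the "car" vector below is an additional made-up value for illustration):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# One-hot vectors: every pair of distinct words has similarity 0 (no notion of relatedness).
one_hot = {"apple":  np.array([1, 0, 0, 0, 0]),
           "banana": np.array([0, 1, 0, 0, 0])}
print(cosine(one_hot["apple"], one_hot["banana"]))    # 0.0

# Dense embeddings: semantically related words end up close in vector space.
dense = {"apple":  np.array([0.65, 0.45, 0.10]),
         "banana": np.array([0.62, 0.47, 0.08]),
         "car":    np.array([0.10, 0.80, 0.55])}      # hypothetical vector for "car"
print(cosine(dense["apple"], dense["banana"]))        # close to 1.0
print(cosine(dense["apple"], dense["car"]))           # noticeably lower
```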
ALGORITHMS TO TRANSFORM THE TEXT INTO EMBEDDINGS
CBOW (Continuous Bag of Words)
• Example: Given the sentence "The cat is sitting on the mat," the
model tries to predict the word "mat" from the words "The cat is
sitting on the_____________."
• Efficient for large datasets and computationally less expensive.
Architecture
The CBOW model has three key layers:
1.Input Layer: This layer takes the surrounding context
words, represented as one-hot vectors, as input.
2.Hidden Layer: The hidden layer learns the word
embeddings by transforming the one-hot input vectors
into lower-dimensional vectors (the embeddings).
3.Output Layer: This layer predicts the target word (the
word being predicted) by applying SoftMax, which
gives a probability distribution over all possible words in
the vocabulary.
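A minimal sketch of this three-layer architecture in PyTorch (the layer sizes, toy vocabulary and variable names are illustrative, not the original Word2Vec implementation):

```python
import torch
import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)  # hidden layer: one-hot index -> dense vector
        self.output = nn.Linear(embed_dim, vocab_size)          # output layer over the vocabulary

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the surrounding words
        avg = self.embeddings(context_ids).mean(dim=1)           # average the context embeddings
        return self.output(avg)                                  # logits; softmax is applied in the loss

# Toy usage: predict "mat" from the surrounding words of "The cat is sitting on the mat".
vocab = {"the": 0, "cat": 1, "is": 2, "sitting": 3, "on": 4, "mat": 5}
model = CBOW(vocab_size=len(vocab), embed_dim=8)
context = torch.tensor([[vocab["sitting"], vocab["on"], vocab["the"], vocab["cat"]]])
target = torch.tensor([vocab["mat"]])
loss = nn.CrossEntropyLoss()(model(context), target)             # softmax + negative log-likelihood
loss.backward()
```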
CBOW vs. Skip-gram
• Objective: CBOW predicts the center word from the surrounding context words;
Skip-gram predicts the context words from a center word.
• When to use: CBOW is better for large datasets and frequent words;
Skip-gram is better for small datasets and rare words.
• Input and output: CBOW takes multiple context words as input and outputs one center word;
Skip-gram takes one center word as input and outputs multiple context words.
• Handling of context: CBOW averages the context words before predicting;
Skip-gram treats each context word independently.
GloVe: Global Vectors
• Models like Word2Vec (Skip-gram/CBOW) are prediction-based: they predict
context words.
• But prediction-based models only capture local context.
• GloVe proposes a count-based model:
• Focus on the global co-occurrence statistics of words.
• Idea: The frequency with which words co-occur carries meaningful information.
Core Idea of GloVe
• Premise: The ratio of co-occurrence probabilities is important, not just raw
counts.
• If two words frequently co-occur with the same other words, they must be
semantically related.
• Example:
• Probability(word | ice) vs. Probability(word | steam)
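Written out, the ratio behaves as follows (a qualitative sketch of the well-known ice/steam illustration; no exact corpus counts are asserted here):

```latex
% P(k \mid w): probability that word k appears in the context of word w.
\[
\frac{P(k \mid \text{ice})}{P(k \mid \text{steam})}
\;\begin{cases}
\gg 1 & \text{if } k \text{ relates to ice only (e.g., } k = \text{solid})\\
\ll 1 & \text{if } k \text{ relates to steam only (e.g., } k = \text{gas})\\
\approx 1 & \text{if } k \text{ relates to both or neither (e.g., } k = \text{water or fashion})
\end{cases}
\]
```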
How
• Builds on matrix factorization
- store most of the important information in a fixed, small number of
dimensions: a dense vector
- Create a low-dimensional matrix for the embedding while minimizing
reconstruction loss (error in going from low to high dimension)
• Fast training, scalable to huge corpora
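Concretely, the low-dimensional factorization is trained with GloVe's weighted least-squares objective over the word-word co-occurrence matrix X:

```latex
% w_i, \tilde{w}_j: word and context vectors; b_i, \tilde{b}_j: biases;
% f(X_{ij}): weighting function that down-weights rare and very frequent pairs.
\[
J \;=\; \sum_{i,j=1}^{V} f(X_{ij})\,\big(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\big)^{2}
\]
```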
Coherence Relations between discourse segments:
• Result
• Explanation
• Parallel
• Elaboration
• Occasion
Types of Text Coherence
In Natural Language Processing (NLP), several types of coherence contribute to the logical
flow and clarity of a text:
• Lexical Coherence: This type refers to the repetition or semantic relatedness of words
across a text. By maintaining consistent vocabulary or using synonyms, a text achieves a
unified and connected feel.
Example: The doctor examined the patient carefully. The physician then prescribed
medication.
• Syntactic Coherence: Syntactic coherence ensures that sentences are constructed with
consistent and grammatically correct structures. Proper syntax allows sentences to link
together naturally, supporting smooth reading and understanding.
Example: She arrived late. She apologized to the group. (Clear syntactic connection.)
• Semantic Coherence: Semantic coherence is established through meaningful relationships
between sentences. When the ideas expressed in consecutive sentences logically build on
each other, the text maintains semantic clarity and avoids abrupt shifts in meaning.
Example: The sun was setting. The sky turned a brilliant orange. (Logical meaning
progression.)
Types of Text Coherence
• Contrast Coherence: Contrast coherence occurs when differences or opposing ideas are
clearly articulated. Using discourse markers such as "however," "although," or "on the
other hand," a text can signal shifts in perspective without disrupting overall coherence.
Example: He worked hard all year. However, he did not achieve his goals.
• Discourse Coherence: At a broader level, discourse coherence concerns the logical
organization of the entire text. It ensures that ideas are introduced, developed, and
concluded in a structured manner, often analyzed through discourse frameworks like
Rhetorical Structure Theory.
Example: In a research paper: Introduction → Literature Review → Methodology →
Results → Discussion → Conclusion.
• Referential Coherence: Referential coherence involves the consistent and clear use of
references to entities within a text. By properly managing pronouns and noun phrases,
writers avoid ambiguity and ensure that readers can easily track subjects and objects
throughout the discourse.
Example: Emily found a lost dog. She took it home and cared for it.
Rhetorical Structure Theory
• RST is a theory of how parts of a text are connected logically.
• It models the structure of discourse (text longer than a sentence).
• Developed by William Mann and Sandra Thompson in 1988.
• The idea is: Texts aren't just a bunch of sentences stuck together —
they have a deep structure where some sentences support, explain,
contrast with, or elaborate on others.
How RST Works
•A text is broken into parts called Elementary Discourse Units (EDUs) — usually simple clauses
or sentences.
•These EDUs are linked by rhetorical relations (like cause, elaboration, contrast, background,
etc.).
•The structure is typically shown as a tree where:
•Nucleus: The main idea (important).
•Satellite: Additional info that supports or explains the nucleus (less important).
Example:
[He missed the train] ← Nucleus
because
[he woke up late] ← Satellite

Relation and meaning:
• Elaboration - Adding more detail.
• Cause - One event causes another.
• Contrast - Showing differences.
• Background - Giving context or background info.
• Condition - Showing the conditions under which something happens.
Main: The city was flooded. (Nucleus)
└── because: Heavy rains lasted for two days. (Satellite)
└── background: The rainy season had just started. (Satellite)
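One way to hold such a structure in code is a small nucleus/satellite record; the class and field names below are illustrative, not from any particular RST toolkit, and the nested background satellite is attached at the top level here for simplicity:

```python
from dataclasses import dataclass, field

@dataclass
class EDU:
    """Elementary Discourse Unit: a clause or simple sentence."""
    text: str

@dataclass
class RSTNode:
    nucleus: EDU                                     # the main idea
    satellites: list = field(default_factory=list)   # (relation, supporting EDU) pairs

tree = RSTNode(
    nucleus=EDU("The city was flooded."),
    satellites=[
        ("cause", EDU("Heavy rains lasted for two days.")),
        ("background", EDU("The rainy season had just started.")),
    ],
)
for relation, sat in tree.satellites:
    print(f"{relation}: {sat.text} -> supports: {tree.nucleus.text}")
```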
Reference Resolution
• Reference Resolution is the process of identifying what a word or phrase
(such as a pronoun or noun phrase) refers to in a given text. It is a crucial task
in Natural Language Processing (NLP) and is essential for understanding
meaning in discourse.
Why is Reference Resolution Important?
• Improves Text Coherence – Ensures clarity in who or what is being referred
to.
• Enhances Machine Translation – Helps translate pronouns and noun
references correctly.
• Aids Chatbots & Virtual Assistants – Allows systems to track conversations
accurately.
• Supports Sentiment Analysis – Determines the correct entity being reviewed
or criticized.
Types of Reference Resolution
Coreference Resolution
Identifies when two or more expressions in a text refer to the same entity.
Example: "John loves football. He plays for his college team." "He" refers to "John" (Coreference).
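For illustration only, a toy rule-based sketch of pronominal anaphora resolution: link each pronoun to the most recently mentioned capitalized entity. Real coreference systems also check gender, number and syntactic constraints; everything below is a simplification invented for this example:

```python
import re

PRONOUNS = {"he", "she", "it", "him", "her", "his", "its", "they", "them", "their"}

def resolve_pronouns(sentences):
    mentions = []   # candidate antecedents seen so far (most recent last)
    links = {}
    for sent in sentences:
        for token in re.findall(r"[A-Za-z]+", sent):
            if token.lower() in PRONOUNS:
                if mentions:
                    links[token] = mentions[-1]      # link pronoun to the most recent candidate
            elif token[0].isupper():
                mentions.append(token)               # treat capitalized words as candidate entities
    return links

print(resolve_pronouns(["John loves football.", "He plays for his college team."]))
# expected: {'He': 'John', 'his': 'John'}
```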