
21CSE356T – NATURAL LANGUAGE PROCESSING

Instructor:
Ms. S. Rama,
Assistant Professor
Department of Information Technology,
SRM Institute of Science and Technology,
Unit III - (Semantic and Discourse Analysis)
• Representing Meaning, Lexical Semantics
• Word Senses, Relation between Senses
• Word Sense Disambiguation
• Word Embeddings
• Word2Vec
• CBOW, Skip-gram and GloVe
• Discourse Segmentation,
• Text Coherence, Discourse Structure
• Reference Resolution
• "Pronominal Anaphora Resolution, Coreference Resolution"
Semantics Analyzer
• What is the meaning of a word?
– dog = animal or sausage?
– lie = to be in a horizontal position or a false statement made with
deliberate intent
What are the relations of different words in terms of meaning?
– Specific relations between senses
• Animal is more general than dog
– Semantic fields
• Money is related to bank: "a set of words grouped, referring to a specific subject ... not necessarily synonymous, but are all used to talk about the same general phenomenon"
What is the need?
• Verifiability - The system’s ability to compare the state of affairs described by a representation to the state of affairs in some world as modelled in a knowledge base
• Unambiguous representations - A concept closely related to ambiguity is
vagueness
• Canonical Form - The notion that inputs that mean the same thing should
have the same meaning representation is known as the doctrine of canonical
form.
• Inference and variables – Inference System’s ability to draw valid
conclusions based on meaning representation of inputs
• Expressiveness - A meaning representation scheme must be expressive
enough to handle an extremely wide range of subject matter.
Representing Meaning, Lexical Semantics
Semantics Analysis
• Parts of Semantic Analysis: Semantic analysis of natural language can be classified into two broad parts:
1. Lexical Semantic Analysis: Lexical semantic analysis involves understanding the meaning of each word of the text individually. It basically refers to fetching the dictionary meaning that a word in the text is deputed to carry.

2. Compositional Semantic Analysis: Although knowing the meaning of each word of the text is essential, it is not sufficient to completely understand the meaning of the text.
Representing Meaning, Lexical Semantics
The notion that the meaning of linguistic utterances can be captured in formal
structures, which we will call meaning representations.
Word Senses, Relation between Senses
Word Senses
• Different meaning that a single word can have depending on the context
• What does ‘bank’ mean?
– A financial institution
E.g., “Reserve bank of India has raised interest rates.”

– A particular branch of a financial institution


E.g., “The bank on Main Street closes at 5pm.”
– The sloping side of any hollow in the ground, especially when bordering a river

E.g., “In 1927, the bank of the Mississippi flooded.”


– A ‘repository’
- E.g., “I donate blood to a blood bank.”
Types of Word Senses:
1) Monosemy: A word has a single, clear meaning in all contexts.
For example, "oxygen" usually refers to the chemical element.

2) Polysemy: A word has multiple related meanings.
For instance, "bank" can refer to a financial institution or the side of a river.

3) Homonymy: A word has multiple unrelated meanings.
For example, "bat" can refer to a flying mammal or a piece of sports equipment.
Relation between Senses
►Semantic relationships are the associations that exist

►between the meanings of words

►between the meanings of phrases

►between the meanings of sentences

Semantic Relationships at Word Level

►Synonymy
►Antonymy
►Homonymy
►Polysemy
►Metonymy (figurative)
►Hyponymy and hypernymy
►Entailment
►Co-hyponymy
►Taxonomy and ontology relationships
►Idiomatic expressions
I.Synonymy
► Synonymy is the semantic relationship that exists between two (or more) words that have the same
(or nearly the same) meaning and belong to the same part of speech, but are spelled differently.

► In other words, we can say that synonymy is the semantic equivalence between lexical items. The
(pairs of) words that have this kind of semantic relationship are called synonyms.

► big = large
► hide = conceal
► small = little
► couch = sofa
► to begin = to start
► kind = courteous
► beginning = start
► to cease = to stop
► fast = quickly = rapidly
Synonymy
► Pairs of words that are synonymous are believed to share all (or almost all) their semantic
features or properties. However, no two words have exactly the same meaning in all the
contexts in which they can occur.
► For example, the verbs employ and use are synonymous in the expression "We used/employed effective strategies to solve the problem"; however, only use can be used in the following sentence:
► We used a jimmy bar to open the door.
► If we used employ, the sentence would sound awkward:
► We employed a jimmy bar to open the door.
► In short, we can say that there are no absolute synonyms, i.e., pairs of words that have
the same meaning in all the situational and syntactic contexts in which they can appear.
II.Antonymy
► Antonymy is the semantic relationship that exists between two (or more) words
that have opposite meanings.
► Antonymous pairs of words usually belong to the same grammatical category (both
elements are nouns / adjectives / verbs).
► The semantic feature that they do not share is present in one member of the pair
and absent in the other
Types of Antonymy
► 1. Complementary or contradictory antonyms.

► They are pairs of words in which one member has a certain semantic property that the
other member does not have. Therefore, in the context(s) in which one member is true,
the other member cannot be true.

► E.g., male/female, married/unmarried, complete/incomplete, alive/dead, present/absent, awake/asleep.

► It is said that these pairs of antonyms exhibit an either/or kind of contrast in which
there is no middle ground.
Types of Antonymy
► 2. Relational antonyms.

► They are pairs of words in which the presence of a certain semantic property in one
member implies the presence of another semantic property in the other member.

► In other words, the existence of one of the terms implies the existence of the other
term.

► For example, over/under, buy/sell, doctor/patient, teacher/pupil, stop/go,


employer/employee, taller/shorter, cheaper/more expensive.
Types of Antonymy
► 3. Gradable or scalar antonyms.

► They are pairs of words that are contrasted with respect to their degree of possession of a
certain semantic property.

► Each term represents or stands for an end-point (or extreme) on a scale (e.g., of temperature,
size, height, beauty, etc.); between those end-points there are other intermediate points (i.e.,
there is some middle ground).

► E.g., hot/cold, big/small, tall/short, good/bad, strong/weak, beautiful/ugly, happy/sad,


fast/slow.
Other Dimension - Antonyms can be
► Morphologically unrelated
► One of the elements of the pair does not derive from the other

► e.g., good/bad, high/low

► Morphologically related
► One of the members of a pair of antonyms is derived from the other member by the addition of
a negative word or an affix

► e.g., good/not good, friendly/unfriendly, likely/unlikely.


Morphologically related
antonyms
► By using the word not

► e.g., alive/not alive, happy/not happy, beautiful/not beautiful.

► By adding negative prefixes such as un-, im-, in- il-, ir-, non-, mis-, dis-, a-.

► happy/unhappy, do/undo, lock/unlock, entity/nonentity,


conformist/nonconformist, tolerant/intolerant, decent/indecent,
please/displease, like /dislike, behave/misbehave, hear/mishear, moral/amoral,
political/apolitical, legal/illegal, logical/illogical, probable/improbable,
relevant/irrelevant.

► By adding negative suffixes such as –less.

► E.g., careful/careless, joyful/ joyless.


III. Homonymy
► Homonymy is the relationship that exists between two (or more) words which belong to
the same grammatical category, have the same spelling, may or may not have the same
pronunciation, but have different meanings and origins.

► to lie (= to rest) and to lie (= not to tell the truth);

► to bear (= to give birth to) and to bear (= to tolerate);

► bank (= the ground near a river) and bank (= financial institution);


Homonymy
► lead (= the first place or position) and lead (= heavy metal);

► bass (= musical instrument) and bass (= edible fish).

► The pairs of words that exhibit this kind of relationship are called homonyms.

► In isolated spoken sentences, homophonic homonyms can also give rise to lexical
ambiguity.

► John went to the [bæŋk] (the financial institution or the ground by the river?)

► Mary can’t [bɛər] (have or tolerate?) children.


IV. Polysemy
► Polysemy is the semantic relationship that exists between a word and its multiple conceptually and
historically related meanings

► E.g., foot = 1. part of body; 2. lower part of something

► plain = 1. clear; 2. unadorned; 3. obvious.

► nice = 1. pleasant; 2. kind; 3. friendly; etc.

► The different meanings of a word are not interchangeable; in fact, they are context-specific.
V. Metonymy (Figurative language)
► Metonymy is the semantic relationship that exists between two words (or a word and an expression) in
which one of the words is metaphorically used in place of the other word (or expression) in particular
contexts to convey the same meaning

► cops = policemen

► Moscow = Russian Government

► crown = monarchy / king or royalty

► New blood = new people

► Pen = literature

► Hollywood = Film industry


VI. Hyponymy, hypernymy and co-hyponyms
► Hyponymy is the semantic relationship that exists between two (or more) words in such a way
that the meaning of one word includes (or contains) the meaning of other words(s).

► If the meaning of a superordinate term is included in the meaning of several other more specific words, the specific terms are hyponyms of that superordinate term and are called co-hyponyms of one another.

► Dog is a hyponym, while animal is its hypernym.


Semantic Relationships at Phrase
or Sentence Level - Paraphrase

► Paraphrase is the expression of the meaning of a word, phrase or


sentence using other words, phrases or sentences which have
(almost) the same meaning. Paraphrase involves a relation of
semantic equivalence between syntactically different phrases or
sentences

► John wrote a letter to Mary. = John wrote Mary a letter.

► A dog bit John. = John was bitten by a dog.
Semantic Relationships at Phrase
or Sentence Level - Paraphrase

► Like synonymy, paraphrase is never perfect; there are always


differences in emphasis or focus.

There are two kinds of paraphrase:

► 1. Lexical paraphrase

► 2. Structural paraphrase
Semantic Relationships at Phrase
or Sentence Level - Paraphrase
► 1. Lexical paraphrase
► It is the use of a semantically equivalent term in place of another in a given
context. This is also known as synonymy.

► John is happy. = John is cheerful.

► to rejuvenate = to make someone or something appear or feel younger.

► 2. Structural paraphrase
► It is the use of a phrase or sentence in place of another phrase or sentence
semantically equivalent to it, although they have different syntactic structure.

► John showed the pictures to me. John showed me the pictures.


Examples in Sentences
Let’s explore how these relationships work in sentences:
1. Synonymy in Context:
Sentence 1: "The movie was amazing." Sentence 2: "The movie was incredible."
Explanation: "Amazing" and "incredible" are synonyms.
2. Antonymy in Context:
Sentence 1: "The water is hot." Sentence 2: "The water is cold."
Explanation: "Hot" and "cold" are gradable antonyms.
3. Hyponymy in Context:
Sentence: "A Labrador is a type of dog."
Explanation: "Labrador" is a hyponym of "dog."
4. Meronymy in Context:
Sentence: "The car's wheels are new."
Explanation: "Wheels" are parts of a "car."
5. Polysemy in Context:
Sentence 1: "He runs a mile every day." Sentence 2: "She runs a software company."
Explanation: The word "run" has multiple related senses.
6. Homonymy in Context:
Sentence 1: "The bat flew into the cave." Sentence 2: "He hit the ball with a bat."
Explanation: "Bat" has unrelated meanings.
Word Sense Disambiguation
Word sense disambiguation
• What does this word mean?
– This plant needs to be watered each day.
living plant
– This plant manufactures 1000 widgets each day.
factory
• Word sense disambiguation (WSD) – Identify the sense of content words
(noun, verb, adjective) in context (assuming a fixed inventory of word senses)
E.g. 2:
I can hear bass sound.
He likes to eat grilled bass.
Evaluation of WSD

• A Dictionary
• The very first input for evaluation of WSD is a dictionary, which is used to specify the senses to be disambiguated.
• Test Corpus
• Another input required by WSD is a sense-annotated test corpus that has the target or correct senses. The test corpora can be of two types:
• Lexical sample − This kind of corpora is used in the system,
where it is required to disambiguate a small sample of words.
• All-words − This kind of corpora is used in the system, where
it is expected to disambiguate all the words in a piece of
running text.
Challenges in Word Sense
Disambiguation
1.Polysemy: Words often have multiple meanings, making it challenging to
determine the intended sense in context.
2. Context Sensitivity: The meaning of a word can vary depending on the
surrounding words, syntactic structure, and discourse context.
3. Ambiguity: Words may have senses that are closely related or contextually dependent, requiring fine-grained distinctions.
4. Data Sparsity: Annotated data for training WSD models is often limited, especially for low-frequency senses or in languages with fewer resources.
5. Domain Specificity: Word meanings can vary across different domains or specialized fields, requiring domain-specific knowledge for accurate disambiguation.
Approaches and Methods to Word Sense Disambiguation (WSD)

• Dictionary-based or Knowledge-based Methods


• Supervised Methods
• Semi-supervised Methods
• Unsupervised Methods
Lesk algorithm
How the Lesk Algorithm Works
Sentence (Context): "I deposited money in the bank."
Surrounding words in the context: "deposited," "money," "in," "the."

Step 1: The inputs are:
1. A word to be disambiguated (target word).
2. The context in which the word occurs (e.g., a sentence or a short window of text).
3. A dictionary or lexicon that contains definitions for the senses of the word (e.g., WordNet).

Step 2: Retrieve all possible senses (definitions) of the target word from the dictionary, and all possible senses (definitions) of the surrounding words in the context.
Senses of "bank" from a lexicon like WordNet:
1. Bank (Sense 1): A financial institution where people deposit or borrow money.
Gloss: "A financial establishment that accepts deposits and offers loans."
2. Bank (Sense 2): The side of a river or stream.
Gloss: "The slope of land adjoining a river."

Step 3: Measure the overlap between the definitions of the target word’s senses and the definitions of the surrounding context words, and select the sense with the highest overlap as the correct sense.
Compare the glosses of the senses with the words in the context:
• Sense 1 (Financial Institution): Gloss: "A financial establishment that accepts deposits and offers loans." Overlap: words like "deposited" and "money" match with this gloss. Score: 2.
• Sense 2 (Riverbank): Gloss: "The slope of land adjoining a river." Overlap: no matches with the context words. Score: 0.

Step 4: Output the selected sense of the target word.
The algorithm selects Sense 1 (financial institution) because it has the highest overlap with the context.
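As a concrete illustration, here is a minimal Python sketch of this gloss-overlap idea. The two-sense inventory, the stop-word list, and the crude suffix stripping are illustrative assumptions, not WordNet or a real stemmer.

```python
# A minimal sketch of the simplified Lesk algorithm from the slide.
# The glosses, stop words, and suffix stripping are assumptions for illustration.
import re

SENSES = {
    "bank#1 (financial institution)": "a financial institution where people deposit or borrow money",
    "bank#2 (riverbank)":             "the slope of land adjoining a river",
}
STOPWORDS = {"a", "an", "the", "in", "of", "to", "and", "or", "that", "i"}

def normalize(word):
    """Very crude suffix stripping so 'deposited' matches 'deposit'."""
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def tokens(text):
    words = re.findall(r"[a-z]+", text.lower())
    return {normalize(w) for w in words if w not in STOPWORDS}

def simplified_lesk(context_sentence, senses):
    """Pick the sense whose gloss shares the most words with the context."""
    context = tokens(context_sentence)
    scores = {sense: len(context & tokens(gloss)) for sense, gloss in senses.items()}
    return max(scores, key=scores.get), scores

best, scores = simplified_lesk("I deposited money in the bank.", SENSES)
print(scores)   # bank#1 overlaps on 'deposit' and 'money'; bank#2 overlaps on nothing
print(best)     # -> bank#1 (financial institution)
```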
Word Embeddings
[Figure: words represented as points in a multi-dimensional space.]
Word Embeddings

• Word embeddings are vector representations of words in a multi-


dimensional space. These vectors capture the semantic meanings of
words, such that words with similar meanings or contexts are
represented by similar vectors.
• Why Use Word Embeddings?
• High Dimensionality: Unlike one-hot encoding, which uses high-
dimensional sparse vectors, word embeddings provide low-
dimensional dense vectors.
• Semantic Information: Embeddings capture rich semantic
relationships, such as similarity and analogy, between words.
Word Embedding Types
Word embedding approaches fall into two broad families:
• Count- or frequency-based: one-hot encoding, Bag of Words, TF-IDF
• Deep-learning trained models: Word2Vec (CBOW, Skip-gram)
Example of One-Hot Encoding vs. Word2Vec
Let's consider the word "apple" in a small vocabulary of five words: ["apple",
"banana", "fruit", "car", "dog"].
1.One-Hot Encoding (Sparse):
•"apple" → [1, 0, 0, 0, 0]
•"banana" → [0, 1, 0, 0, 0]
•"fruit" → [0, 0, 1, 0, 0]
2.Word2Vec (Dense, 3-dimensional example):
•"apple" → [0.65, 0.45, 0.10]
•"banana" → [0.62, 0.47, 0.08]
•"fruit" → [0.61, 0.46, 0.09]
In the case of embeddings, the vectors are dense and capture semantic
information. The embeddings of "apple" and "banana" are close in vector space,
reflecting their similarity as fruits, whereas "car" and "dog" would be farther apart
from both
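To see why the dense vectors are more useful, a short numpy sketch can compare cosine similarities. The embedding values are the illustrative numbers above (plus an assumed vector for "car"), not vectors learned by a real model.

```python
# Contrast sparse one-hot vectors with dense embedding vectors via cosine similarity.
import numpy as np

one_hot = {
    "apple":  np.array([1, 0, 0, 0, 0]),
    "banana": np.array([0, 1, 0, 0, 0]),
}
dense = {
    "apple":  np.array([0.65, 0.45, 0.10]),
    "banana": np.array([0.62, 0.47, 0.08]),
    "car":    np.array([0.10, 0.90, 0.75]),   # assumed values for an unrelated word
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(one_hot["apple"], one_hot["banana"]))  # 0.0 -- one-hot vectors carry no similarity
print(cosine(dense["apple"], dense["banana"]))      # ~0.999 -- similar words, similar vectors
print(cosine(dense["apple"], dense["car"]))         # noticeably lower
```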
ALGORITHMS TO TRANSFORM THE TEXT INTO EMBEDDINGS

• Term Frequency-Inverse Document Frequency (TF-IDF)


• Word2Vec
• CBOW (Continuous Bag of words)
• Skip-Gram
• Doc2Vec
• Distributed Memory Model of Paragraph Vectors (PV-DM)
• Distributed Bag of Words version of Paragraph Vector (PV-DBOW)
• GloVe (Global Vectors for Word Representation)
• FastText
• Contextual Embeddings (ELMo, BERT, GPT)
Word2Vec
• In Word2Vec, a neural network is used to obtain embedding representations of the words in a corpus (set of documents). Word2Vec captures the contextual meaning of words very well.
• Word2Vec is a word embedding technique developed by Tomas Mikolov and
his team at Google in 2013.
• It represents words as dense numerical vectors in a continuous vector space,
capturing semantic and syntactic relationships between words.
• Word2Vec is a shallow, two-layer neural network designed to learn distributed
word representations by predicting word context.
• Unlike traditional methods like TF-IDF or Bag of Words, which produce
sparse and high-dimensional representations, Word2Vec generates dense,
low-dimensional, continuous vector representations of words.
Examples of Word2Vec Meaningful Word Relationships
• Synonym relationships: "happy" ≈ "joyful", "cheerful", "content"
• Antonym relationships (weaker in Word2Vec): "hot" vs. "cold"
• Gender analogies: "King" - "Man" + "Woman" ≈ "Queen"; "Prince" - "Boy" + "Girl" ≈ "Princess"
• Country-capital relationships: "Paris" is to "France" as "Rome" is to "Italy"
• Object-action relationships: baker → bread; butcher → meat
• Profession-workplace relationships: doctor → hospital; teacher → school
• Transportation relationships: "car" is to "road" as "train" is to "railway"
• Brand-product relationships: "Apple" is to "iPhone" as "Samsung" is to "Galaxy"
• Odd-one-out (semantic relationship): "apple", "banana", "grape", "car" → "car"
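These relationships are typically probed with gensim's KeyedVectors API, roughly as sketched below. The pretrained model name assumes the gensim-data downloader is available, and the exact outputs depend on the vectors actually loaded.

```python
# A sketch of probing analogy and odd-one-out relationships with gensim.
import gensim.downloader as api

kv = api.load("glove-wiki-gigaword-50")          # small pretrained vectors (assumed available)

# Gender analogy: king - man + woman ~ queen
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# Country-capital analogy: paris - france + italy ~ rome
print(kv.most_similar(positive=["paris", "italy"], negative=["france"], topn=1))

# Odd-one-out (semantic relationship)
print(kv.doesnt_match(["apple", "banana", "grape", "car"]))   # expected: 'car'
```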
• How Word2Vec Works
• Word2Vec uses neural networks to learn word embeddings, where
words with similar meanings have similar vector representations.
There are two main models:
• Continuous Bag of Words (CBOW)
• Skip-Gram Model
Continuous Bag of Words
(CBOW)
• Predicts a target word given its surrounding context words.

• Works by training a simple neural network to maximize the probability


of predicting the correct word based on its neighbors.

• Example: Given the sentence "The cat is sitting on the mat," the
model tries to predict the word "mat" from the words "The cat is
sitting on the_____________."
• Efficient for large datasets and computationally less expensive.
Architecture
The CBOW model has three key layers:
1.Input Layer: This layer takes the surrounding context
words, represented as one-hot vectors, as input.
2.Hidden Layer: The hidden layer learns the word
embeddings by transforming the one-hot input vectors
into lower-dimensional vectors (the embeddings).
3.Output Layer: This layer predicts the target word (the
word being predicted) by applying SoftMax, which
gives a probability distribution over all possible words in
the vocabulary.
CBOW

• The CBOW model uses the context words around the target word in order to predict it. Consider the example "She is a great dancer."
• The CBOW model converts this phrase into pairs of context words and target words.
• The word pairings would appear like this: ([she, a], is), ([is, great], a), ([a, dancer], great), with a window size of 2 (two context words in total, one on each side of the target).
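A small sketch of how such (context, target) pairs can be generated is shown below; window=1 here means one word on each side of the target, which matches the slide's pairs (two context words in total).

```python
# Generate (context, target) pairs for CBOW-style training from a tokenized sentence.
def cbow_pairs(tokens, window=1):
    """For each position, pair the surrounding words with the centre word."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs

sentence = "she is a great dancer".split()
for context, target in cbow_pairs(sentence, window=1):
    print(context, "->", target)
# (['is'], 'she'), (['she', 'a'], 'is'), (['is', 'great'], 'a'),
# (['a', 'dancer'], 'great'), (['great'], 'dancer')
```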
• CBOW (Continuous Bag
of Words): The CBOW
model predicts the current
word given context words
within a specific window.

• The input layer contains


the context words and the
output layer contains the
current word.

• The hidden layer contains


the dimensions we want
to represent the current
word present at the
output layer.
Illustration
BoW vs CBoW
• The Bag-of-Words model and the Continuous Bag-of-Words model
are both techniques used in natural language processing to
represent text in a computer-readable format, but they differ in
how they capture context.

• The BoW model represents text as a collection of words and their


frequency in a given document or corpus. It does not consider
the order or context in which the words appear, and therefore, it
may not capture the full meaning of the text. The BoW model is
simple and easy to implement, but it has limitations in capturing
the meaning of language.

• In contrast, the CBOW model is a neural network-based approach that captures the context of words. It learns to predict the target word based on the words that appear before and after it in a given context window. By considering the surrounding words, the CBOW model can better capture the meaning of a word in a given context.
Skip-Gram Model
• Predicts surrounding context words given a target word.
• The model takes a word and tries to predict words appearing nearby
in the sentence.
• Example: Given the word "cat," it tries to predict words like "pet,"
"animal," "furry“ or “sat”
• Works better for learning rare word representations compared to
CBOW.
Skip gram model
• The context words are predicted in the skip-gram model given a
target (center) word. Consider the following sentence: "Word2Vec
uses a deep learning model in the backend."

• Given the center word 'learning' and a context window size of 2,


the model tries to predict ['deep,' 'model'], and so on.
• We feed the skip-gram model pairs of (X, Y), where X is our input
and Y is our label, because the model has to predict many words
from a single provided word. This is accomplished by creating
positive and negative input samples.
• These samples alert the model to contextually relevant terms,
causing it to construct similar embeddings for words with similar
meanings.
Skipgram
The model's operation is described in the steps below:
• Individual embedding layers are passed both the target and context
word pairs, yielding dense word embeddings for each of these two
words.
• The dot product of these two embeddings is computed using a 'merge
layer,' and the dot product value is obtained.
• The value of the dot product is then transmitted to a dense sigmoid
layer, which outputs 0 or 1.
• The output is compared to the actual value or label, and the loss is
calculated, then backpropagation is used to update the embedding layer
at each epoch.
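A compact PyTorch sketch of this setup (two embedding tables, a dot product, a sigmoid, and a binary label) is given below. The vocabulary size, embedding dimension, and toy batch are assumptions for illustration, not the original Word2Vec implementation.

```python
# Skip-gram with negative sampling: embed target and context words, take the
# dot product, squash with a sigmoid, and train against 0/1 labels.
import torch
import torch.nn as nn

class SkipGramNS(nn.Module):
    def __init__(self, vocab_size, dim=100):
        super().__init__()
        self.target_emb = nn.Embedding(vocab_size, dim)   # centre-word vectors
        self.context_emb = nn.Embedding(vocab_size, dim)  # context-word vectors

    def forward(self, target_ids, context_ids):
        t = self.target_emb(target_ids)                   # (batch, dim)
        c = self.context_emb(context_ids)                 # (batch, dim)
        dot = (t * c).sum(dim=1)                          # "merge layer": dot product
        return torch.sigmoid(dot)                         # probability the pair is real

model = SkipGramNS(vocab_size=10_000, dim=100)
loss_fn = nn.BCELoss()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: label 1 for true (target, context) pairs, 0 for negative samples
targets = torch.tensor([12, 12, 12])
contexts = torch.tensor([45, 99, 7])
labels = torch.tensor([1.0, 1.0, 0.0])

optim.zero_grad()
loss = loss_fn(model(targets, contexts), labels)
loss.backward()            # backpropagation updates both embedding tables
optim.step()
```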
CBOW (Continuous Bag of Words) vs. Skip-gram

• Objective: CBOW predicts the center word from surrounding context words; Skip-gram predicts context words from a center word.
• Training focus: CBOW focuses on collective (context → center) prediction; Skip-gram focuses on individual (center → context) predictions.
• When to use: CBOW is better for large datasets and frequent words; Skip-gram is better for small datasets and rare words.
• Example: CBOW: ["queen", "prince", "royal", "crown"] → predict "king"; Skip-gram: "king" → predict ["queen", "prince", "royal", "crown"].
• Input and output: CBOW takes multiple context words as input and outputs one center word; Skip-gram takes one center word as input and outputs multiple context words.
• Handling of context: CBOW averages context words before predicting; Skip-gram treats each context word independently.
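In practice the two architectures are usually selected with a single flag; the sketch below uses gensim's Word2Vec, where sg=0 gives CBOW and sg=1 gives Skip-gram. The toy corpus is an assumption, and parameter names follow gensim 4.x (vector_size rather than size).

```python
# Train tiny CBOW and Skip-gram models on a toy corpus with gensim.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "wears", "a", "crown"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "prince", "is", "a", "royal"],
]

cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)      # CBOW
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)  # Skip-gram

print(cbow.wv.most_similar("king", topn=2))
print(skipgram.wv.most_similar("king", topn=2))
```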
GloVe: Global Vectors
• Models like Word2Vec (Skip-gram/CBOW) are prediction-based: they predict
context words.
• But prediction-based models only capture local context.
• GloVe proposes a count-based model:
• Focus on the global co-occurrence statistics of words.
• Idea: The frequency with which words co-occur carries meaningful information.
Core Idea of GloVe
• Premise: The ratio of co-occurrence probabilities is important, not just raw
counts.
• If two words frequently co-occur with the same other words, they must be
semantically related.
• Example:
• Probability(word | ice) vs. Probability(word | steam)
How
• Builds on matrix factorization
- store most of the important information in a fixed, small number of
dimensions: a dense vector
- Create a low-dimensional matrix for the embedding while minimizing
reconstruction loss (error in going from low to high dimension)
• Fast training, scalable to huge corpora
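For reference, the weighted least-squares objective minimized by GloVe (Pennington et al., 2014) can be written as:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})\left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```

where X_ij is the number of times word j occurs in the context of word i, w_i and w̃_j are the word and context vectors, b_i and b̃_j are biases, and f is a weighting function that caps the influence of very frequent co-occurrences.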
GloVe: Global Vectors
[Figure: distances between pairs of word vectors are used for similarity and nearest-neighbour queries.]
• Discourse Segmentation,
• Text Coherence, Discourse Structure
DISCOURSE
• Language does not normally consist of isolated, unrelated sentences, but instead of collocated, related groups of sentences. We refer to such a group of sentences as a discourse.
• Monologues are characterized by a speaker (a term which will be used to include writers, as it is here) and a hearer (which, analogously, includes readers).

Discourse in NLP (Natural Language Processing)
It’s about going beyond individual words or single sentences, and looking at the bigger picture:
• How ideas are connected
• How topics change
• How pronouns link back to nouns
• How reasons, causes, contrasts are expressed

In computational linguistics and NLP, studying discourse means:
• Segmenting texts into meaningful units (discourse segmentation).
• Finding relations between parts (cause-effect, contrast, elaboration).
• Resolving references (who/what is "he", "it", "they"?).
• Modeling conversations and long texts.
DISCOURSE SEGMENTATION
Discourse segmentation is the task of breaking down a long text or
conversation into smaller meaningful units, called Elementary
Discourse Units (EDUs).
An EDU could be:
• A sentence
• A clause
• A phrase
• A dialogue turn
• These units serve as the building blocks for analyzing the structure and
meaning of a discourse.
Types of Discourse Segmentation
• 1. Sentence-Level Segmentation – Splitting text into sentences (e.g., in speech-to-text systems).
• 2. Topic Segmentation – Identifying where one topic ends and
another begins.
• 3. Dialogue Segmentation – Separating speaker turns in a
conversation
What Features Help in Discourse Segmentation?
•Cue Words: "because", "although", "however" often signal segment boundaries.
•Punctuation: Periods, commas, semicolons hint at boundaries.
•Syntactic Structure: Subordinate clauses may start new segments.
•Lexical Patterns: Repeated words or changes in topic vocabulary.
•Prosody (in speech): Pauses, intonation changes (for spoken language)

A discourse marker is a word or phrase that functions to signal discourse structure.


Text:
"Although it was raining, they decided to go for a walk because they enjoyed the
fresh air."
Discourse Segmentation output:
1.[Although it was raining]
2.[they decided to go for a walk]
3.[because they enjoyed the fresh air]
Each bracketed unit is an EDU.
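A very small rule-based sketch of this idea, splitting on sentence punctuation, commas, and the cue words listed above, is shown below. Real segmenters use syntactic parsers or neural models; the cue-word list here is a minimal illustrative assumption.

```python
# Crude rule-based discourse segmentation using cue words and punctuation.
import re

CUE_WORDS = r"(?:because|although|however|while|since|but)"

def segment(text):
    """Split at sentence ends, commas/semicolons, and before discourse cue words."""
    edus = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        marked = re.sub(rf"\s+(?={CUE_WORDS}\b)", " | ", sentence, flags=re.I)  # boundary before cue words
        marked = re.sub(r"\s*[,;]\s*", " | ", marked)                           # boundary at commas/semicolons
        edus.extend(p.strip(" .") for p in marked.split("|") if p.strip(" ."))
    return edus

text = ("Although it was raining, they decided to go for a walk "
        "because they enjoyed the fresh air.")
for i, edu in enumerate(segment(text), 1):
    print(i, edu)
# 1 Although it was raining
# 2 they decided to go for a walk
# 3 because they enjoyed the fresh air
```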
Eg2
• Example of Discourse Segmentation Consider this passage: "Climate
change is a serious global issue. Scientists warn about rising
temperatures. Governments are working on policies. On another
note, recent tech advancements in AI are revolutionizing industries."
• A discourse segmentation system would separate the two topics:
1. Segment 1 (Climate Change Topic): "Climate change is a serious
global issue. Scientists warn about rising temperatures.
Governments are working on policies."
2. Segment 2 (AI Technology Topic): "On another note, recent tech
advancements in AI are revolutionizing industries."
DISCOURSE SEGMENTATION
• Separating a document into a linear sequence of subtopics: Information
retrieval, for example, for automatically segmenting a TV news broadcast or a
long news story into a sequence of stories so as to find a relevant story, or for
text summarization algorithms which need to make sure that different
segments of the document are summarized correctly, or for information
extraction algorithms which tend to extract information from inside a single
discourse segment.
• Unsupervised Discourse Segmentation: Cohesion is the use of certain linguistic devices to link or tie together textual units. Lexical cohesion is cohesion indicated by relations between words in the two units, such as the use of an identical word, a synonym, or a hypernym.
• Supervised Discourse Segmentation: For the task of paragraph segmentation,
it is trivial to find labeled training data from the web (marked with <p>) or
other sources.
Methods for Discourse
Segmentation
• Rule-Based Approaches – Using linguistic rules (e.g., discourse
markers like "however," "on another note").
• Supervised Machine Learning – Training models on labeled datasets.
• Unsupervised Techniques – Using clustering methods based on word
similarity.
• Neural Networks & Deep Learning – Using models like BERT or GPT
for segmenting complex texts.
Text Coherence
• Text coherence refers to the logical connection and smooth flow of ideas in
a text, making it meaningful and easy to understand. A coherent text
ensures that sentences and paragraphs are well-organized, related, and
contribute to the overall message
• Why is Text Coherence Important?
• 1. Enhances Readability – Makes text easier to follow.
• 2. Improves Comprehension – Helps readers understand the relationships
between ideas.
• 3. Essential for NLP Applications – Used in text summarization, machine
translation, and question-answering systems.
• 4. Key in Discourse Analysis – Ensures that discourse structure is logically
connected
Hobbs has proposed such kinds of relations as follows −
• We are taking two terms S0 and S1 to represent the meaning of the two related
sentences −
i) Result - It infers that the state asserted by term S0 could cause the state asserted
by S1. For example, two statements show the relationship result: Ram was caught in
the fire. His skin burned.
ii) Explanation - It infers that the state asserted by S1 could cause the state asserted by S0. For example, two statements show the relationship explanation: Ram fought with Shyam's friend. He was drunk.
iii) Parallel - It infers p(a1, a2, ...) from the assertion of S0 and p(b1, b2, ...) from the assertion of S1, where ai and bi are similar for all i. For example, two statements are parallel: Ram wanted a car. Shyam wanted money.
iv) Elaboration- It infers the same proposition P from both the assertions − S0 and S1 For
example, two statements show the relation elaboration: Ram was from Chandigarh.
Shyam was from Kerala.
v) Occasion - It happens when a change of state can be inferred from the assertion
of S0, final state of which can be inferred from S1 and vice-versa. For example, the two
statements show the relation occasion: Ram picked up the book. He gave it to Shyam.
Building Hierarchical Discourse Structure
[Figure: a hierarchical discourse tree built from the coherence relations Result, Explanation, Parallel, Elaboration, and Occasion.]
Types of Text Coherence
In Natural Language Processing (NLP), several types of coherence contribute to the logical
flow and clarity of a text:
• Lexical Coherence: This type refers to the repetition or semantic relatedness of words
across a text. By maintaining consistent vocabulary or using synonyms, a text achieves a
unified and connected feel.
Example: The doctor examined the patient carefully. The physician then prescribed
medication.
• Syntactic Coherence: Syntactic coherence ensures that sentences are constructed with
consistent and grammatically correct structures. Proper syntax allows sentences to link
together naturally, supporting smooth reading and understanding.
Example: She arrived late. She apologized to the group. (Clear syntactic connection.)
• Semantic Coherence: Semantic coherence is established through meaningful relationships
between sentences. When the ideas expressed in consecutive sentences logically build on
each other, the text maintains semantic clarity and avoids abrupt shifts in meaning.
Example: The sun was setting. The sky turned a brilliant orange. (Logical meaning
progression.)
Types of Text Coherence
• Contrast Coherence: Contrast coherence occurs when differences or opposing ideas are
clearly articulated. Using discourse markers such as "however," "although," or "on the
other hand," a text can signal shifts in perspective without disrupting overall coherence.
Example: He worked hard all year. However, he did not achieve his goals.
• Discourse Coherence: At a broader level, discourse coherence concerns the logical
organization of the entire text. It ensures that ideas are introduced, developed, and
concluded in a structured manner, often analyzed through discourse frameworks like
Rhetorical Structure Theory.
Example: In a research paper: Introduction → Literature Review → Methodology →
Results → Discussion → Conclusion.
• Referential Coherence: Referential coherence involves the consistent and clear use of
references to entities within a text. By properly managing pronouns and noun phrases,
writers avoid ambiguity and ensure that readers can easily track subjects and objects
throughout the discourse.
Example: Emily found a lost dog. She took it home and cared for it.
Rhetorical Structure Theory
• RST is a theory of how parts of a text are connected logically.
• It models the structure of discourse (text longer than a sentence).
• Developed by William Mann and Sandra Thompson in 1988.
• The idea is: Texts aren't just a bunch of sentences stuck together —
they have a deep structure where some sentences support, explain,
contrast with, or elaborate on others.
How RST Works
•A text is broken into parts called Elementary Discourse Units (EDUs) — usually simple clauses
or sentences.
•These EDUs are linked by rhetorical relations (like cause, elaboration, contrast, background,
etc.).
•The structure is typically shown as a tree where:
•Nucleus: The main idea (important).
•Satellite: Additional info that supports or explains the nucleus (less important).
Example:
[He missed the train] ← Nucleus
because
[he woke up late] ← Satellite

Common rhetorical relations and their meanings:
• Elaboration: adding more detail.
• Cause: one event causes another.
• Contrast: showing differences.
• Background: giving context or background info.
• Condition: showing conditions under which something happens.
Main: The city was flooded. (Nucleus)
└── because: Heavy rains lasted for two days. (Satellite)
└── background: The rainy season had just started. (Satellite)
Reference Resolution
• Reference Resolution is the process of identifying what a word or phrase
(such as a pronoun or noun phrase) refers to in a given text. It is a crucial task
in Natural Language Processing (NLP) and is essential for understanding
meaning in discourse.
Why is Reference Resolution Important?
• Improves Text Coherence – Ensures clarity in who or what is being referred
to.
• Enhances Machine Translation – Helps translate pronouns and noun
references correctly.
• Aids Chatbots & Virtual Assistants – Allows systems to track conversations
accurately.
• Supports Sentiment Analysis – Determines the correct entity being reviewed
or criticized.
Types of Reference Resolution
Coreference Resolution
Identifies when two or more expressions in a text refer to the same entity.
Example: "John loves football. He plays for his college team." "He" refers to "John" (Coreference).

Pronominal Anaphora Resolution


Resolves pronouns to their corresponding nouns.
Example: "Sarah went to the store. She bought some milk." "She" refers to "Sarah".
Cataphora resolution is a specific type of coreference resolution in Natural Language Processing (NLP). It deals with cases where a pronoun or a referring expression appears before the actual entity it refers to.
Example: "Before she entered the room, Alice took a deep breath." Both "she" and "Alice" refer to the same person.

Named Entity Resolution
Identifies when different names refer to the same person, place, or entity. Example: "Barack Obama was the 44th U.S. President. Obama was known for his speeches." "Barack Obama" and "Obama" refer to the same entity.
Bridging Resolution
Identifies implicit relationships between entities. Example: "I bought a book. The cover is beautiful." "The
cover" refers to "the book" without directly mentioning it.
Pronominal anaphora resolution
• Pronominal anaphora resolution is a specific type of coreference
resolution where the goal is to resolve anaphoric pronouns—i.e.,
pronouns that refer back to previously mentioned entities
(antecedents) in a discourse.

•Anaphora: A reference to an earlier expression (the


antecedent).
•Pronominal anaphora: When the anaphor is a pronoun.
Hobbs' Algorithm
Given a pronoun in a sentence:
1.Find the NP node immediately dominating the pronoun.
2.Go up the tree to the first NP or S node encountered. Call this node X.
3.Traverse the tree below X from left to right in a breadth-first manner,
skipping subtrees that dominate the pronoun.
4.At each NP encountered, check for gender/number agreement. The
first match is selected as the antecedent.
5.If no antecedent is found in the current sentence, repeat the
procedure for previous sentences, going left-to-right.
1. Hobbs' Algorithm: Worked Example
Sentence 1: "Priya went to the store."
Sentence 2: "She bought milk." Parse: (S (NP She) (VP bought (NP milk)))

Step 1: Find the NP node dominating the pronoun.
"She" is in the NP: (NP She)
Step 2: Go up to the first NP or S node that dominates this NP.
Up to S (the entire sentence): X = S
Step 3: Traverse all branches under X, left to right, skipping any subtree dominating the pronoun.
Skip (NP She); traverse VP: bought → NP → milk. No noun phrases before "She" in this sentence match.
Step 4: Move to the previous sentence.
Sentence 1: "Priya went to the store."
Step 5: Left-to-right, breadth-first search through the previous sentence.
NP → Priya; VP → went; PP → to → NP → the store.
Check gender/number match: "Priya" is singular, female; it matches "She", so "Priya" is selected as the antecedent.
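The candidate search in this example can be sketched over the same parses with nltk's Tree class, as below. The gender/number check is reduced to a hand-made feature table, and the traversal is a simplification of Hobbs' full left-of-path, breadth-first search.

```python
# A simplified Hobbs-style antecedent search over bracketed parses using nltk.
from nltk import Tree

prev = Tree.fromstring("(S (NP Priya) (VP went (PP to (NP the store))))")
curr = Tree.fromstring("(S (NP She) (VP bought (NP milk)))")

# toy agreement table: (gender, number) per head word -- an illustrative assumption
FEATURES = {"she": ("female", "sg"), "priya": ("female", "sg"),
            "store": ("neuter", "sg"), "milk": ("neuter", "sg")}

def np_candidates(tree, skip_word=None):
    """Collect NP subtrees left to right, skipping the NP that contains the pronoun."""
    for sub in tree.subtrees(lambda t: t.label() == "NP"):
        words = [w.lower() for w in sub.leaves()]
        if skip_word and skip_word in words:
            continue
        yield words[-1]                      # naive head: last word of the NP

def resolve(pronoun, current_tree, previous_trees):
    target = FEATURES[pronoun.lower()]
    # search the current sentence first, then earlier sentences
    for tree in [current_tree] + previous_trees:
        for head in np_candidates(tree, skip_word=pronoun.lower()):
            if FEATURES.get(head) == target:      # gender/number agreement check
                return head
    return None

print(resolve("She", curr, [prev]))   # -> priya
```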
2. Centering Theory
Centering Theory (developed by Grosz, Joshi, and Weinstein in the 1980s–1990s) assumes that:
• Each sentence (or utterance) in a discourse has "centers" — entities that the sentence is
about.
• These centers help determine what pronouns refer to, based on discourse coherence and
attention.
Key Terms
1. Forward-looking centers (Cf): the ranked list of entities (NPs) in the current sentence, ordered by salience (subject > object > others).
2. Backward-looking center (Cb): the most salient entity carried over from the previous sentence.
3. Preferred center (Cp): the highest-ranked entity in Cf, usually the subject.
Coreference resolution
• Coreference happens when multiple expressions (like names, noun
phrases, or pronouns) refer to the same entity in a text.
• Anaphora: refers backward to something already mentioned (e.g., She → Priya).
• Coreference: any two expressions referring to the same entity (includes anaphora, cataphora, etc.).
Types of Coreference
• Pronominal coreference
He, she, it, they → a noun phrase
E.g., Aruna went to the park. She enjoyed the fresh air. "She" → "Aruna"
• Proper noun coreference
E.g., Barack Obama was elected in 2008. He served 2 terms. "Barack Obama" → "the president"
• Nominal coreference
"The man" → "the tall guy"
• Definite description coreference
A dog was barking. The animal seems hungry. "The animal" → "dog"
• Demonstrative coreference
He loves soccer. This is his passion. "This" → "soccer"
• Indefinite coreference
Someone left their phone on the table. The person might come back for it. "The person" → "someone"
• Bridging coreference
John bought a car. The engine is powerful. "Engine" is inferred to be related to "car"
• Appositive coreference
Priya, the teacher, → one person
• Split antecedents (harder)
John met Mary. They went to lunch.
Algorithm for CR
1. Preprocessing
• Tokenize the text, POS-tag and syntactically parse (constituency or dependency parse), Identify all
mentions (NPs, pronouns, named entities)
2. Mention Pair Generation
For each pronoun or noun mention m_i, generate possible antecedents m_j
3. Apply Constraints (Filtering stage)
Eliminate unlikely antecedents using: Gender agreement , Number agreement, Person agreement
(1st/2nd/3rd), Syntactic constraints (e.g., Binding Theory)
4. Apply Preferences (Scoring stage)
Rank remaining candidates using heuristics: Grammatical role (subject > object), Recency (closer is
better), Repeated mention (frequency), Semantic compatibility (e.g., animacy, occupation, etc.), Discourse
salience (Centering Theory)
5. Select Antecedent
Choose the highest-ranked candidate from the list
6. Form Clusters
Group all mentions that corefer into a coreference chain or cluster
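A minimal sketch of steps 3-5 of this pipeline for pronominal mentions is given below. The mention records, feature values, and scoring weights are illustrative assumptions, not a trained coreference model.

```python
# Constraint-and-preference antecedent selection for a pronoun mention.
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    gender: str     # 'm', 'f', 'n'
    number: str     # 'sg', 'pl'
    role: str       # 'subj', 'obj', 'other'
    position: int   # token index in the document

def compatible(pronoun, antecedent):
    """Step 3: hard constraints - gender and number agreement."""
    return (pronoun.gender == antecedent.gender and
            pronoun.number == antecedent.number)

def score(pronoun, antecedent):
    """Step 4: soft preferences - grammatical role and recency."""
    s = 2.0 if antecedent.role == "subj" else (1.0 if antecedent.role == "obj" else 0.0)
    s -= 0.1 * (pronoun.position - antecedent.position)   # closer is better
    return s

def resolve(pronoun, mentions):
    """Step 5: pick the highest-scoring compatible antecedent to the left."""
    candidates = [m for m in mentions
                  if m.position < pronoun.position and compatible(pronoun, m)]
    return max(candidates, key=lambda m: score(pronoun, m), default=None)

mentions = [
    Mention("John", "m", "sg", "subj", 0),
    Mention("Mary", "f", "sg", "obj", 2),
    Mention("He",   "m", "sg", "subj", 5),
]
print(resolve(mentions[2], mentions[:2]).text)   # -> John
```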
