Lec3-posner intro

Part of Speech Tagging and Named Entity Recognition

Parts of Speech
From the earliest linguistic traditions (Yaska and Panini, 5th c. BCE; Aristotle, 4th c. BCE) comes the idea that words can be classified into grammatical categories
• part of speech, word classes, POS, POS tags
Eight parts of speech are attributed to Dionysius Thrax of Alexandria (c. 1st c. BCE):
• noun, verb, pronoun, preposition, adverb, conjunction, participle, article
• These categories are still relevant for NLP today.
Two classes of words: Open vs. Closed

Closed class words
• Relatively fixed membership
• Usually function words: short, frequent words with grammatical function
• determiners: a, an, the
• pronouns: she, he, I
• prepositions: on, under, over, near, by, …
Open class words
• Usually content words: nouns, verbs, adjectives, adverbs
• Plus interjections: oh, ouch, uh-huh, yes, hello
• New nouns and verbs enter constantly: iPhone, to fax
Open class ("content") words
• Nouns: Proper (Janet, Italy), Common (cat, cats, mango)
• Verbs: Main (eat, went)
• Adjectives: old, green, tasty
• Adverbs: slowly, yesterday
• Interjections: Ow, hello
• Numbers: 122,312, one
• … more
Closed class ("function") words
• Determiners: the, some
• Auxiliary verbs: can, had
• Prepositions: to, with
• Conjunctions: and, or
• Particles: off, up
• Pronouns: they, its
• … more

Part-of-Speech Tagging
Assigning a part-of-speech to each word in a text.
Words often have more than one POS.
book:
• VERB: (Book that flight)
• NOUN: (Hand me that book).
Part-of-Speech Tagging
Map from sequence x1,…,xn of words to y1,…,yn of POS tags
"Universal Dependencies" Tagset Nivre et al. 2016
Sample "tagged" English sentences:
There/PRO were/VERB 70/NUM children/NOUN there/ADV ./PUNC
Preliminary/ADJ findings/NOUN were/AUX reported/VERB in/ADP today/NOUN ’s/PART New/PROPN England/PROPN Journal/PROPN of/ADP Medicine/PROPN
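The sequence-to-sequence shape of the task can be sketched in a few lines. The lookup table and the NOUN default below are hypothetical, chosen only to illustrate the input/output mapping; real taggers use context rather than per-word lookup.

```python
# Toy sketch of POS tagging as sequence labeling: map each word x_i
# to a tag y_i. The lookup table is a made-up illustration, not a
# real model (coarse UD-style tags).
TAG_LOOKUP = {
    "there": "PRO", "were": "VERB", "70": "NUM",
    "children": "NOUN", ".": "PUNC",
}

def tag(words):
    # Per-word lookup; unknown words default to NOUN for simplicity.
    return [TAG_LOOKUP.get(w.lower(), "NOUN") for w in words]

print(tag(["There", "were", "70", "children", "."]))
# → ['PRO', 'VERB', 'NUM', 'NOUN', 'PUNC']
```

Note that this toy version would tag both occurrences of "there" identically; distinguishing them (PRO vs. ADV) is exactly what context-sensitive tagging is for.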
Why Part of Speech Tagging?

◦ Can be useful for other NLP tasks


◦ Parsing: POS tagging can improve syntactic parsing
◦ MT: reordering of adjectives and nouns (say from Spanish to English)
◦ Sentiment or affective tasks: may want to distinguish adjectives or other POS
◦ Text-to-speech (how do we pronounce “lead” or "object"?)
◦ Or linguistic or language-analytic computational tasks
◦ Need to control for POS when studying linguistic change like creation of new
words, or meaning shift
◦ Or control for POS in measuring meaning similarity or difference
How difficult is POS tagging in English?
Roughly 15% of word types are ambiguous
• Hence 85% of word types are unambiguous
• Janet is always PROPN, hesitantly is always ADV
But those 15% tend to be very common.
So ~60% of word tokens are ambiguous
E.g., back
earnings growth took a back/ADJ seat
a small building in the back/NOUN
a clear majority of senators back/VERB the bill
enable the country to buy back/PART debt
I was twenty-one back/ADV then
POS tagging performance in English
How many tags are correct? (Tag accuracy)
◦ About 97%
◦ Hasn't changed in the last 10+ years
◦ HMMs, CRFs, BERT perform similarly.
◦ Human accuracy about the same
But baseline is 92%!
◦ Baseline is performance of stupidest possible method
◦ "Most frequent class baseline" is an important baseline for many tasks
◦ Tag every word with its most frequent tag
◦ (and tag unknown words as nouns)
◦ Partly easy because
◦ Many words are unambiguous
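The most-frequent-class baseline described above can be sketched directly; the tiny training corpus here is invented for illustration.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Learn each word's most frequent tag from a tagged corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_baseline(words, most_frequent):
    # Tag every word with its most frequent tag;
    # unknown words default to NOUN, as the slide suggests.
    return [most_frequent.get(w, "NOUN") for w in words]

# Hypothetical mini-corpus: 'back' appears twice as NOUN, once as VERB.
corpus = [[("the", "DET"), ("back", "NOUN")],
          [("they", "PRON"), ("back", "VERB")],
          [("the", "DET"), ("back", "NOUN")]]
model = train_baseline(corpus)
print(tag_baseline(["the", "back", "flight"], model))
# → ['DET', 'NOUN', 'NOUN']
```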
Sources of information for POS tagging
Janet will back the bill
AUX/NOUN/VERB? NOUN/VERB?
Prior probabilities of word/tag
• "will" is usually an AUX
Identity of neighboring words
• "the" means the next word is probably not a verb
Morphology and word shape:
◦ Prefixes: unable: un- → ADJ
◦ Suffixes: importantly: -ly → ADV
◦ Capitalization: Janet: CAP → PROPN
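These morphology and word-shape cues can be extracted as simple binary features, the kind a feature-based tagger (e.g., a CRF) consumes. The feature names below are hypothetical.

```python
def word_features(word):
    """Toy morphology/word-shape features of the kind feature-based
    taggers (HMMs with feature engineering, CRFs) rely on."""
    return {
        "prefix_un": word.lower().startswith("un"),   # un-  -> often ADJ
        "suffix_ly": word.lower().endswith("ly"),     # -ly  -> often ADV
        "capitalized": word[:1].isupper(),            # Cap  -> often PROPN
    }

print(word_features("importantly"))
# → {'prefix_un': False, 'suffix_ly': True, 'capitalized': False}
```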
Standard algorithms for POS tagging
Supervised Machine Learning Algorithms:
• Hidden Markov Models
• Conditional Random Fields (CRF)/ Maximum Entropy Markov
Models (MEMM)
• Neural sequence models (RNNs or Transformers)
• Large Language Models (like BERT), finetuned
All require a hand-labeled training set, and all achieve about equal performance (97% on English)
All make use of information sources we discussed
• Via human created features: HMMs and CRFs
• Via representation learning: Neural LMs
Named Entity Recognition
(NER)
Named Entities
◦ A named entity, in its core usage, is anything that can be
referred to with a proper name. The four most common
tags:
◦ PER (Person): “Marie Curie”
◦ LOC (Location): “New York City”
◦ ORG (Organization): “Stanford University”
◦ GPE (Geo-Political Entity): “India”, “Colorado”
◦ Often multi-word phrases
◦ But the term is also extended to things that aren't entities:
◦ dates, times, prices
Named Entity tagging
The task of named entity recognition (NER):
• find spans of text that constitute proper names
• tag the type of the entity.
NER output
Why NER?
Sentiment analysis: consumer’s sentiment toward a
particular company or person?
Question Answering: answer questions about an
entity?
Information Extraction: Extracting facts about
entities from text.
Why NER is hard
1) Segmentation
• In POS tagging, no segmentation problem since each
word gets one tag.
• In NER we have to find and segment the entities!
2) Type ambiguity
BIO Tagging
How can we turn this structured problem into a
sequence problem like POS tagging, with one label per
word?

[PER Jane Villanueva] of [ORG United], a unit of [ORG United Airlines Holding], said the fare applies to the [LOC Chicago] route.
BIO Tagging
[PER Jane Villanueva] of [ORG United] , a unit of [ORG United Airlines Holding] ,
said the fare applies to the [LOC Chicago ] route.

Now we have one tag per token!!!


BIO Tagging
B: token that begins a span
I: tokens inside a span
O: tokens outside of any span

# of tags (where n is the # of entity types):
• 1 O tag
• n B tags
• n I tags
→ a total of 2n + 1
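The conversion from labeled spans to one BIO tag per token can be sketched as follows. The span format (start index, exclusive end index, entity type) is an assumption chosen for illustration.

```python
def bio_tags(tokens, spans):
    """Convert (start, end, type) entity spans into one BIO tag per
    token. `end` is exclusive; tokens outside every span get O."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining tokens inside it
    return tags

tokens = ["Jane", "Villanueva", "of", "United", ",", "said"]
spans = [(0, 2, "PER"), (3, 4, "ORG")]
print(bio_tags(tokens, spans))
# → ['B-PER', 'I-PER', 'O', 'B-ORG', 'O', 'O']
```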
BIO Tagging variants: IO and BIOES
[PER Jane Villanueva] of [ORG United] , a unit of [ORG United Airlines Holding] ,
said the fare applies to the [LOC Chicago ] route.
Standard algorithms for NER
Supervised Machine Learning given a human-
labeled training set of text annotated with tags
• Hidden Markov Models
• Conditional Random Fields (CRF)/ Maximum
Entropy Markov Models (MEMM)
• Neural sequence models (RNNs or Transformers)
• Large Language Models (like BERT), finetuned
Part of Speech Tagging
Techniques
Part-of-Speech Tagging
How hard is the tagging problem?
That:
• as a determiner (followed by a noun): Give me that hammer.
• as a demonstrative pronoun (without a following noun): Who gave you that?
• as a conjunction (connecting two clauses): I didn’t know that she was married.
• as a relative pronoun (forming the subject, object, or complement of a relative clause): It’s a song that my mother taught me.
• as an adverb (before an adjective or adverb): Three years? I can’t wait that long.
[Figure: the number of word types in the Brown corpus by degree of ambiguity.]
• Many of the 40% ambiguous tokens are easy to disambiguate, because
– the various tags associated with a word are not equally likely
– e.g., ‘a’ can be a determiner or a letter (perhaps as part of an acronym)
• But the determiner sense is much more likely
Part-of-Speech Tagging
Many tagging algorithms fall into two classes:
◦ Rule-based taggers
◦ Involve a large database of hand-written disambiguation rules
specifying, for example, that an ambiguous word is a noun rather
than a verb if it follows a determiner.
◦ Stochastic taggers
◦ Resolve tagging ambiguities by using a training corpus to count the
probability of a given word having a given tag in a given context.
The Brill tagger, also called the transformation-based
tagger, shares features of both tagging architectures.
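The core statistic a stochastic tagger reads off its training corpus, P(tag | word), can be estimated by relative frequency. The tiny corpus below is invented; it mirrors the ‘a’ (determiner vs. letter) example.

```python
from collections import Counter

def tag_given_word(tagged_tokens):
    """Estimate P(tag | word) by relative frequency over a corpus of
    (word, tag) pairs: count(word, tag) / count(word)."""
    word_counts = Counter(w for w, _ in tagged_tokens)
    pair_counts = Counter(tagged_tokens)
    return {(w, t): c / word_counts[w] for (w, t), c in pair_counts.items()}

# Hypothetical corpus: 'a' is usually a determiner, rarely a letter.
corpus = [("a", "DET")] * 9 + [("a", "SYM")]
probs = tag_given_word(corpus)
print(probs[("a", "DET")], probs[("a", "SYM")])
# → 0.9 0.1
```

A real stochastic tagger also conditions on context (e.g., the previous tag, as in an HMM), but the word-level counts above are where it starts.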
Rule-Based Part-of-Speech Tagging
The earliest algorithms for automatically assigning POS were
based on a two-stage architecture
◦ First, use a dictionary to assign each word a list of potential POS.
◦ Second, use large lists of hand-written disambiguation rules to
winnow down this list to a single POS for each word
The ENGTWOL tagger (1995) is based on the same two-stage
architecture, with a much more sophisticated lexicon and
disambiguation rules than before.
◦ Lexicon:
◦ 56000 entries
◦ A word with multiple POS is counted as separate entries
Rule-Based Part-of-Speech Tagging
In the first stage of the tagger,
◦ each word is run through the two-level lexicon transducer and
◦ the entries for all possible POS are returned.
A set of about 1,100 constraints is then applied to the input sentence to rule out incorrect
POS tags.
Rule-Based Part-of-Speech Tagging

A simplified version of the constraint:


ADVERBIAL-THAT RULE
Given input: “that”
if
(+1 A/ADV/QUANT); /* if next word is adj, adverb, or quantifier */
(+2 SENT-LIM); /* and following which is a sentence boundary, */
(NOT -1 SVOC/A); /* and the previous word is not a verb like */
/* ‘consider’ which allows adj as object complements */
then eliminate non-ADV tags
else eliminate ADV tags

• It isn’t that odd.


• I considered that odd.
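The simplified constraint above can be sketched in Python. The token representation, tag names, and the small list of ‘consider’-type verbs are assumptions for illustration, not ENGTWOL’s actual machinery.

```python
def adverbial_that(tokens, i):
    """Toy rendering of the simplified ADVERBIAL-THAT constraint.
    Each token is a dict {'word': ..., 'tag': ...}; tag names and the
    SVOC/A verb list are hypothetical stand-ins."""
    ADJ_ADV_QUANT = {"ADJ", "ADV", "QUANT"}
    # Verbs like 'consider' that allow adjectives as object complements
    # (toy list, including the inflected form used in the example).
    SVOC_A_VERBS = {"consider", "considered", "deem", "find"}
    if tokens[i]["word"].lower() != "that":
        return None
    next_ok = i + 1 < len(tokens) and tokens[i + 1]["tag"] in ADJ_ADV_QUANT
    boundary = i + 2 >= len(tokens) or tokens[i + 2]["tag"] == "SENT-LIM"
    prev_not_svoc = i == 0 or tokens[i - 1]["word"].lower() not in SVOC_A_VERBS
    if next_ok and boundary and prev_not_svoc:
        return "eliminate non-ADV tags"    # "It isn't that odd."
    return "eliminate ADV tags"            # "I considered that odd."

sent = [{"word": "It", "tag": "PRON"}, {"word": "is", "tag": "VERB"},
        {"word": "n't", "tag": "ADV"}, {"word": "that", "tag": "ADV"},
        {"word": "odd", "tag": "ADJ"}, {"word": ".", "tag": "SENT-LIM"}]
print(adverbial_that(sent, 3))
# → eliminate non-ADV tags
```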
