Morphology
• Introduction
• Morphology
• Morphological Analysis (MA)
• Using FS techniques in MA
• Automatic learning of the morphology of a language
NLP Morphology 1
Morphology 2
• Morphology
• Structure of a word as a composition of morphemes
• Related to word formation rules
• Functions
• Inflection
• Derivation
• Composition
• Result of morphological analysis
• Morphosyntactic categorization (POS)
• e.g. Parole tagset (VMIP1S0), more than 150 categories for Spanish
• e.g. Penn Treebank tagset (VBD), about 30 categories for English
• Morphological features
• Number, case, gender, lexical functions
Morphology 3
• Morphological analysis
• Decompose a word into a concatenation of morphemes
• Usually some of the morphemes carry the meaning
• One (root or stem) in inflection and derivation
• More than one in composition
• The others (affixes) provide morphological features
• Problems
• Phonological alterations in morpheme concatenation
• Morphotactics
• Which morphemes can be concatenated with which others
Morphology 4
• Problems
• Affixes
• Suffixes, prefixes, infixes, interfixes
• Inflectional affixes ≠ derivational affixes
• Derivation sometimes implies a semantic change that is not always predictable
• Meaning extensions
• Lexical rules
• A derivational suffix can be followed by an inflectional one
• love => lover => lovers
• Inflection does not change POS, sometimes derivation does
• Inflection affects other words in the sentence
• agreement
Morphology 5
• Morphotactics
• Word formation rules
• Valid combinations between morphemes
• Simple concatenation
• Complex models: root/pattern
• Language-dependent regularity
• Phonological alterations (Morphophonology)
• Changes when concatenating morphemes
• Source: Phonology, morphology, orthography
• variable in number and complexity
• e.g. vowel harmony
Morphology 6
Morphemes
• 1 morpheme:
• evitar (verb: to avoid)
• 2 morphemes:
• evitable = evitar + able (adj: can be avoided)
• 3 morphemes:
• inevitable = in + evitar + able (adj: cannot be avoided)
• 4 morphemes:
• inevitabilidad = in + evitar + able + idad (noun: the quality of being unavoidable)
Morphology 7
Inflectional Morphology
• number
• house houses
• cheval chevaux
• casa casas
• verbal form
• walk walks walked walking
• amo amas aman ...
• gender
• niño niña
Morphology 8
Derivational Morphology
• Form
• Without change: barcelonés
• Prefix: inevitable
• Suffix: importantísimo
• Source
• verb => adjective tardar => tardío
• verb => noun sufrir => sufrimiento
• noun => noun actor => actorazo
• noun => adjective atleta => atlético
• adjective => adjective rojo => rojizo
• adjective => adverb alegre => alegremente
Morphological Analysis 1
Types of morphological analyzers
Formaries
• Dictionaries of word forms
+ efficiency
+ suited to languages with few variants (e.g. English)
+ extensibility
+ can be built and maintained from a morphological generator
– unsuited to languages with rich inflectional variation
– poor coverage of derivation and composition
• Finite-state (FS) techniques
• FSA: 1-level analyzers
• FST: analyzers with more than 1 level
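The dictionary-of-forms approach can be sketched as follows: a generator expands each lemma into its forms, and analysis becomes a plain lookup. This is only a minimal illustration; the lemma list, the function names and the "add s" plural rule are assumptions made for the example, not part of the slides.

```python
from collections import defaultdict

# Hypothetical toy lemma list (regular English nouns only).
REGULAR_NOUNS = ["cat", "dog", "house"]


def generate_forms(lemma):
    """Toy generator: yield (form, analysis) pairs for a regular noun."""
    yield lemma, f"{lemma} + N + sg"
    yield lemma + "s", f"{lemma} + N + pl"


def build_formary(lemmas):
    """Expand every lemma into all its forms and index analyses by form."""
    formary = defaultdict(list)
    for lemma in lemmas:
        for form, analysis in generate_forms(lemma):
            formary[form].append(analysis)
    return dict(formary)


FORMARY = build_formary(REGULAR_NOUNS)


def analyze(form):
    """Analysis is one dictionary lookup; unknown forms give an empty list."""
    return FORMARY.get(form, [])


print(analyze("cats"))
print(analyze("oxen"))
```

The lookup is fast, but every form has to be listed in advance, which is why the approach scales badly to highly inflected languages.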
Morphological Analysis 2
Morphological Analysis 3
• Morphological rules
• Define the relations between characters (surface) and morphemes, and map strings of characters to the morphemic structure of the word.
• Spelling rules
• Operate at the level of the letters forming the word. Can be used to define the valid phonological alterations.
• Ritchie, Pulman, Black, Russell, 1987
Morphological Analysis 4
• input:
• form
• output
• lemma + morphological features
Input      Output
cat        cat + N + sg
cats       cat + N + pl
cities     city + N + pl
merging    merge + V + pres_part
caught     (catch + V + past) or (catch + V + past_part)
Morphological Analysis 5
Morphotactics
[Figure: finite-state automaton with states 0, 1, 2 and arcs labeled irreg_sg_noun and irreg_pl_noun]
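A morphotactic automaton of this kind can be written as a transition table over morpheme classes. The sketch below is an assumption-laden reading of the figure: only the irreg_sg_noun and irreg_pl_noun arcs are named on the slide, while the reg_noun and plural arcs, the choice of final states and the function name are added here for illustration.

```python
# Transitions over morpheme classes (not letters).
# Assumption: the reg_noun and plural arcs and the final states are not
# legible on the slide; they are filled in for this example.
TRANSITIONS = {
    (0, "reg_noun"): 1,
    (0, "irreg_sg_noun"): 2,
    (0, "irreg_pl_noun"): 2,
    (1, "plural"): 2,
}
FINAL_STATES = {1, 2}


def accepts(morpheme_classes):
    """Return True if the sequence of morpheme classes is well formed."""
    state = 0
    for cls in morpheme_classes:
        state = TRANSITIONS.get((state, cls))
        if state is None:  # no arc for this class: invalid combination
            return False
    return state in FINAL_STATES


print(accepts(["reg_noun", "plural"]))
print(accepts(["irreg_pl_noun", "plural"]))
```

Under these assumptions a regular noun may take the plural morpheme, while an irregular plural may not take it again.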
Morphological Analysis 6
Letter Transducers
[Figure: letter-by-letter automaton expanded from the word list fog, cat, dog, donkey, mouse, mice; one arc is labeled ε]
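Expanding a word list into a letter-level automaton can be sketched with a trie, where each node is a state and each letter an arc. The word list below and the helper names are assumptions for the example; a real system would also minimize the automaton, which this sketch does not do.

```python
# Hypothetical word list for the example.
WORDS = ["fox", "fog", "cat", "dog", "donkey", "mouse", "mice"]


def build_letter_automaton(words):
    """Build a trie: each node is a dict mapping a letter to a child node."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[None] = True  # None marks a final state (end of a word)
    return root


def recognizes(automaton, word):
    """Follow one arc per letter and check that we end in a final state."""
    node = automaton
    for letter in word:
        if letter not in node:
            return False
        node = node[letter]
    return None in node


AUTOMATON = build_letter_automaton(WORDS)
print(recognizes(AUTOMATON, "donkey"))
print(recognizes(AUTOMATON, "don"))
```

Words with a common prefix (fox/fog, dog/donkey) share their initial states, which is the main saving over a flat list.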
Morphological Analysis 7
Morphological Analysis 8
Using FST
• As a recognizer
• Given a pair of input strings (one lexical, the other surface), determines whether one is a transduction of the other
• As a generator
• Generates pairs of strings
• As a translator
• Given a surface string, generates its lexical translation
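The three uses can be illustrated on a very small transducer. The machine below (covering only "cat"), the arc encoding and the function names are assumptions for this sketch. Because the toy machine is acyclic, all three modes are derived from a full enumeration of its pairs; a practical implementation would instead search arc by arc.

```python
# Arcs: (source_state, lexical_symbol, surface_symbol, target_state).
# The empty string plays the role of epsilon.
# Assumption: toy transducer for the single noun "cat".
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "+N", "", 4),
    (4, "+sg", "", 5),
    (4, "+pl", "s", 5),
]
START, FINALS = 0, {5}


def generate(state=START, lexical="", surface=""):
    """Generator mode: enumerate every (lexical, surface) pair."""
    if state in FINALS:
        yield lexical, surface
    for src, lex, surf, dst in ARCS:
        if src == state:
            yield from generate(dst, lexical + lex, surface + surf)


def recognize(lexical, surface):
    """Recognizer mode: is this pair related by the transducer?"""
    return (lexical, surface) in set(generate())


def translate(surface):
    """Translator mode: map a surface string to its lexical strings."""
    return [lex for lex, surf in generate() if surf == surface]


print(list(generate()))
print(recognize("cat+N+pl", "cats"))
print(translate("cats"))
```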
Morphological Analysis 9
reg_noun   irreg_pl_noun      irreg_sg_noun   plural
fox        sheep              sheep           s
cat        m o:i u:ε s:c e    mouse
dog        g o:e o:e s e      goose
[Figure: transducer for noun inflection with states 0, 1, 2, 4, 5 and arcs labeled reg_noun, irreg_sg_noun, +N:ε, +sg:ε, +pl:s]
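The pair notation used in entries such as g o:e o:e s e can be unpacked mechanically: x:y reads lexical x as surface y, a bare symbol x abbreviates x:x, and ε is the empty string. The helper below is a small sketch of that reading; its name and the whitespace-separated input format are assumptions.

```python
EPSILON = "ε"


def expand_pairs(entry):
    """Split an entry like 'g o:e o:e s e' into (lexical, surface) strings."""
    lexical, surface = [], []
    for symbol in entry.split():
        lex, sep, surf = symbol.partition(":")
        if not sep:  # bare symbol: same on both levels
            surf = lex
        lexical.append("" if lex == EPSILON else lex)
        surface.append("" if surf == EPSILON else surf)
    return "".join(lexical), "".join(surface)


print(expand_pairs("g o:e o:e s e"))
print(expand_pairs("c a t +N:ε +pl:s"))
```

The first call pairs lexical goose with surface geese; the second shows how feature symbols such as +N disappear on the surface side.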
Morphological Analysis 10
Morphological Analysis 11
[Figure: letter-level transducer combining the word list (fog, cat, dog, donkey, mouse, mice) with the arcs +N:ε, +sg:ε, +pl:^s, +pl:ε, o:i, u:ε]
Morphological Analysis 12
Spelling rules
Morphological Analysis 13
Spelling rules: e-insertion
⇒ decomposition
/⇐
Morphological Analysis 14
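epenthesis
+:e <=> {< {s:s c:c} h:h> s:s x:x z:z} --- s:s
• Rule operators (the context follows the operator)
• <=> both of the following
• => context restriction
• <= surface coercion
• Character sets
• C: {...}
• V: {a,e,i,o,u,y}
• C2: {...}
• =: whatever
• example: box + s => box e s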
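The effect of the epenthesis rule can be approximated with a regular-expression rewrite: the boundary + surfaces as e when it follows sh, ch, s, x or z and precedes s, and is dropped otherwise. This is a simplified one-directional sketch, not a two-level rule compiler; the function name and the use of a plain string with + as boundary are assumptions.

```python
import re

# Left context from the rule: s, x, z, sh or ch. Right context: s.
# Alternation in a capture group is used because Python's re module
# only supports fixed-width look-behind.
EPENTHESIS = re.compile(r"(s|x|z|sh|ch)\+(s)")


def to_surface(lexical):
    """Realize '+' as 'e' in the epenthesis context, else delete it."""
    with_e = EPENTHESIS.sub(r"\1e\2", lexical)
    return with_e.replace("+", "")


print(to_surface("box+s"))
print(to_surface("cat+s"))
```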
Morphological Analysis 15
e-deletion
agre e + ed
agre ed
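The agree + ed example can be reproduced with a one-line rewrite. The slide does not state the rule itself, so the condition used here (a stem-final e is dropped when the suffix also begins with e) is an assumption that covers only the case shown; the function name is likewise hypothetical.

```python
import re


def apply_e_deletion(lexical):
    """Drop a stem-final 'e' before an e-initial suffix, then remove '+'.

    Assumption: this condition is inferred from the single example
    'agree + ed'; the full rule is not given in the slides.
    """
    without_e = re.sub(r"e\+(?=e)", "+", lexical)
    return without_e.replace("+", "")


print(apply_e_deletion("agree+ed"))
print(apply_e_deletion("agree+ing"))
```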
Morphological Analysis 16
a-deletion
redu c e + a t ion
redu c t ion
... left context focus right context ...
Morphological Analysis 17
Lexicon-FST
intermediate level f o x ^ s
surface level f o x e s
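The two mappings can be chained as a cascade: a lexicon step turns a lexical string into the intermediate level (with ^ as morpheme boundary), and a spelling step turns that into the surface form. The sketch below uses ordinary functions rather than real transducers; the lexical string format (fox+N+pl), the toy lexicon and all function names are assumptions.

```python
import re

# Hypothetical toy lexicon of regular nouns.
REG_NOUNS = {"fox", "cat", "dog"}


def lexical_to_intermediate(lexical):
    """'fox+N+pl' -> 'fox^s'; '+N' and '+sg' map to nothing."""
    lemma, pos, number = lexical.split("+")
    if lemma not in REG_NOUNS or pos != "N" or number not in ("sg", "pl"):
        raise ValueError(f"not in the toy lexicon: {lexical}")
    return lemma + ("^s" if number == "pl" else "")


def intermediate_to_surface(intermediate):
    """Apply e-insertion at the boundary, then erase the boundary."""
    with_e = re.sub(r"(s|x|z|sh|ch)\^s$", r"\1es", intermediate)
    return with_e.replace("^", "")


def generate(lexical):
    """Cascade: lexical -> intermediate -> surface."""
    return intermediate_to_surface(lexical_to_intermediate(lexical))


print(lexical_to_intermediate("fox+N+pl"))
print(generate("fox+N+pl"))
```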
Morphological Analysis 18
[Figure: combining transducers by intersection and composition]
Automatic morphology learning 1
• Problem
• Paradigm: stem + affixes
• Obtaining the stems
• Classification of stems into models
• Learning part of the morphology (e.g. derivational)
• Two approaches
• No previous morphological knowledge is available
• Goldsmith, 2001
• Brent, 1999
• Snover, Brent, 2001, 2002
• Morphological knowledge can be used
• Oliver et al., 2002
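The stem-plus-affixes view of the problem can be illustrated with a naive procedure: try every split of every word, collect the suffixes seen with each candidate stem, and group stems that share the same suffix set. This is a simplification written for these notes and not the method of any of the cited works; the function name, the minimum stem length and the word list are assumptions.

```python
from collections import defaultdict


def learn_signatures(words, min_stem=3):
    """Group candidate stems by the set of suffixes they occur with."""
    suffixes_of = defaultdict(set)
    for word in set(words):
        # Every split point gives one (stem, suffix) candidate.
        for cut in range(min_stem, len(word) + 1):
            suffixes_of[word[:cut]].add(word[cut:])
    signatures = defaultdict(set)
    for stem, suffixes in suffixes_of.items():
        if len(suffixes) >= 2:  # keep stems seen with several suffixes
            signatures[frozenset(suffixes)].add(stem)
    return dict(signatures)


WORDS = ["walk", "walks", "walked", "talk", "talks", "talked"]
SIGNATURES = learn_signatures(WORDS)
for suffixes, stems in SIGNATURES.items():
    print(sorted(suffixes), sorted(stems))
```

On this word list the procedure finds the intended model (stems walk and talk with the empty suffix, s and ed) but also a competing one (stems wal and tal with k, ks and ked). Choosing between such analyses needs an additional criterion, which this sketch does not attempt.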
Automatic morphology learning 2
Automatic morphology learning 3
Automatic morphology learning 4