Morphological Analysis

Uploaded by

rajputakashchand4

Morphology

• Morphology is the study of words

• It studies the internal structure of words

• How words change their form to generate new words

• The different roles words play in a sentence, strictly following linguistic rules.

Morphemes

• Words are built from smaller meaningful grammatical units called morphemes.
• dogs
• 2 morphemes, ‘dog’ and ‘s’
• ‘s’ is a plural marker on nouns
Morpheme

• The smallest unit that carries meaning

• While analyzing a word, we split it into parts.
• Each part is called a morpheme.
• Eg.:
• uneducated –>
• un- + educate + -ed

• It has three morphemes.

Free Morpheme
◦ Can appear as a word by itself; often can combine with other morphemes too.

◦ house (house-s), walk (walk-ed), of, the, or

◦ Independent/can stand by themselves as single words.

◦ When combined with bound morphemes, free morphemes are called stems (roots).

Lexical Morpheme

◦ Content words (words whose meaning we can visualize)
◦ Nouns – glass
◦ Adjectives – black
◦ Verbs – dance
◦ Adverbs – beautifully
◦ An ‘open’ class of morphemes, because we can easily add new words to the language
◦ Eg. girl, play, google, e-mail, blog
Grammatical Morpheme

◦ Also called functional morphemes

◦ Consist of function (grammatical) words

◦ Conjunctions – and/or
◦ Prepositions – on/under/at
◦ Articles – a/an/the
◦ Pronouns – she/he

◦ A limited set of words

◦ This is a ‘closed’ class of morphemes

Bound Morpheme
◦ Cannot appear as a word by itself.
◦ -s (dog-s), -ly (quick-ly), -ed (walk-ed)
◦ Cannot stand alone
◦ Not complete in themselves
◦ Depend on a free morpheme
◦ Affixes are bound morphemes
◦ Typically attached to a free morpheme
◦ They can be prefixes, infixes, or suffixes (re-, un-, dis-, pre-, -ness, -less, -ly).
Prefix
◦ Attached before the lexical morpheme (root)
◦ Be-come
◦ Un-happy
Infix
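• Inserted inside a root; rare in English (e.g., abso-bloody-lutely)
• unladylike is often cited here, but it contains no infix: its 3 morphemes are a prefix, a root, and a suffix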
• un- ‘not’
• lady ‘well-behaved woman’
• -like ‘having the characteristic of’
Suffix

◦ Attached after the root

◦ Teach-er
◦ Use-less
Suffix
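◦ English has 8 inflectional suffixes
◦ Noun
1. Plural -s: Book->Books
2. Possessive -’s: Ram’s
◦ Adjective
1. Comparative -er: Tall->Taller
2. Superlative -est: Tall->Tallest
◦ Verb
1. Third person singular -s: Play->Plays
2. Past tense -ed: Play->Played
3. Progressive -ing: Play->Playing
4. Past participle -en: Break->Broken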
Derivational Morpheme
◦ Makes a new word from a stem, often of a different grammatical category
◦ Class changing
◦ Care (Verb) -> Careful (Adjective)
◦ Care (Verb) -> Carelessly (Adverb)

◦ Class maintaining
◦ Child (Noun) -> Childhood (Noun)
Inflectional Morpheme

◦ When attached to a root, it does not change the word's grammatical class

◦ There are eight inflectional morphemes in English.

◦ They are all suffixes.

Regular Inflectional Morpheme

◦ Refers to inflections that follow a standard pattern

◦ Eg.

◦ Dance+ed->Danced
◦ Walk+ed->Walked
◦ Boy+s->Boys
Irregular Inflectional Morpheme

◦ Refers to inflections that do not follow a standard pattern

◦ Eg.
◦ Wife+s -> Wives
◦ Child+s -> Children
◦ Mouse+s -> Mice
Suppletion Inflectional Morpheme

◦ The root changes completely

◦ A phonemically unrelated form is used instead

◦ Eg.
◦ Go+ed->Went
◦ Good+er->Better
◦ am+ed->was
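The three kinds of inflection above can be sketched as an exception-first lookup: suppletive and irregular forms come from small tables, and everything else follows the regular rule. The tables SUPPLETIVE and IRREGULAR_PLURALS and the function inflect are hypothetical illustrations built from the examples on these slides, not a complete morphology of English.

```python
# Hypothetical toy tables built from the slide examples; a real system needs a full lexicon.
SUPPLETIVE = {("go", "ed"): "went", ("good", "er"): "better", ("am", "ed"): "was"}
IRREGULAR_PLURALS = {"wife": "wives", "child": "children", "mouse": "mice"}


def inflect(root, suffix):
    """Attach an inflectional suffix to a root, checking the exception tables first."""
    # Suppletion: an unrelated form replaces root + suffix entirely
    if (root, suffix) in SUPPLETIVE:
        return SUPPLETIVE[(root, suffix)]
    # Irregular: the plural does not follow the standard pattern
    if suffix == "s" and root in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[root]
    # Regular: dance + ed -> danced (avoid a doubled e)
    if suffix == "ed" and root.endswith("e"):
        return root + "d"
    return root + suffix


for root, suffix in [("dance", "ed"), ("boy", "s"), ("mouse", "s"), ("go", "ed")]:
    print(f"{root}+{suffix} -> {inflect(root, suffix)}")
```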
Identify Inflectional or Derivational Morphology

• I jumped into the puddle this morning
• This is unbelievable
• I have John’s umbrella
• Emma goes to school
• She is working carelessly
Identify Inflectional or Derivational Morphology

• I jumped into the puddle this morning • Inflectional Morphology (jump + -ed)
• This is unbelievable • Derivational Morphology (un- + believe + -able)
• I have John’s umbrella • Inflectional Morphology (John + -’s)
• Emma goes to school • Inflectional Morphology (go + -es)
• She is working carelessly • Inflectional Morphology (work + -ing) and Derivational Morphology (care + -less + -ly)
Regular Expressions
Some Questions on Regular Expressions
What is a regular expression?

What does the pattern ab+c search for?

• ac,abc,abbc, and so on
• ab,abc,abcc and so on
• abc,abbc,abbbc and so on
• None of the above

How many matches does the pattern a{1,3} give on the string aabbaaaa?
• 3
• 2
• 1
• 4
What is the output of the code below?
print(re.match('On', 'one'))

• 'e'
• Match Object
• ‘n’
• None

• What does the sequence \D match?

Decimal digits
Non-decimal digits
Division
None of the above

• Which of the following patterns matches any one of 1, 2, 3, 4?

(a) [1-4]
(b) (1-3)
(c) [1234]
(d) Both a and c
• What is the output of the code below?
re.sub('a', 'u', 'aeiou!')
• 'ueiou!'
• 'eiou!'
• 'eio!'
• None of the above
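The quiz answers can be checked with Python's standard re module. This is a minimal sketch; the sample strings (such as 'a1b2') and the function name quiz_answers are made up for illustration.

```python
import re


def quiz_answers():
    """Evaluate each quiz question with Python's re module."""
    return {
        # ab+c: 'a', one or more 'b', then 'c' (so 'ac' does not match)
        "ab+c": [s for s in ["ac", "abc", "abbc", "abbbc"] if re.fullmatch(r"ab+c", s)],
        # a{1,3}: greedy, non-overlapping runs of 1 to 3 a's
        "a{1,3}": re.findall(r"a{1,3}", "aabbaaaa"),
        # re.match is case-sensitive and anchored at the start of the string
        "match": re.match("On", "one"),
        # \D matches any character that is not a decimal digit
        r"\D": re.findall(r"\D", "a1b2"),
        # [1-4] and [1234] accept exactly the same characters
        "same_class": all(
            bool(re.fullmatch("[1-4]", d)) == bool(re.fullmatch("[1234]", d))
            for d in "0123456789"
        ),
        # re.sub replaces every 'a' with 'u'
        "sub": re.sub("a", "u", "aeiou!"),
    }


for question, answer in quiz_answers().items():
    print(question, "->", answer)
```

Running it shows that a{1,3} yields three matches ('aa', 'aaa', 'a'), re.match('On', 'one') returns None because matching is case-sensitive, and re.sub gives 'ueiou!'.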
What is a Regular Expression?
A formal language for specifying text search strings

A regular expression is an algebraic notation for characterizing a set of strings.

A regular expression search function searches through the corpus and returns all texts that match the pattern, or only the first match.

The corpus can be a single document or a collection.

Regular expressions
• The simplest kind of regular expression is a sequence of simple characters

• A formal language for specifying text strings

• How can we search for any of these?

• woodchuck
• woodchucks
• Woodchuck
• Woodchucks

• Regular expressions are case sensitive

• So the pattern woodchuck does not match Woodchuck; all the above are different regular expressions

Regular Expression
Regular Expressions: Disjunctions
• Letters inside square brackets []

Pattern        Matches
[wW]oodchuck   Woodchuck, woodchuck
[1234567890]   Any digit

• Ranges [A-Z]

Pattern   Matches                Example
[A-Z]     An upper case letter   Drenched Blossoms
[a-z]     A lower case letter    my beans were impatient
[0-9]     A single digit         Chapter 1: Down the Rabbit Hole
Regular Expressions: Negation in Disjunction

• Negations [^Ss]
• Caret means negation only when first in []

Pattern   Matches                    Example
[^A-Z]    Not an upper case letter   Oyfn pripetchik
[^Ss]     Neither ‘S’ nor ‘s’        I have no exquisite reason”
[^e^]     Neither e nor ^            Look here
a^b       The pattern a caret b      Look up a^b now
Regular Expressions: More Disjunction
• Woodchuck is another name for groundhog!
• The pipe | for disjunction

Pattern                     Matches
groundhog|woodchuck         woodchuck
yours|mine                  yours
a|b|c                       = [abc]
[gG]roundhog|[Ww]oodchuck   Woodchuck
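The disjunction, range, negation, and pipe patterns above can be tried with Python's re.findall. The sample sentence is invented. One point to flag: Python treats an unescaped ^ outside [] as a start-of-string anchor, so the literal pattern a^b has to be written a\^b there.

```python
import re

text = "Woodchuck, woodchuck and groundhog. Look up a^b now: 3 names."

# [wW]: one character, either 'w' or 'W'
both_cases = re.findall(r"[wW]oodchuck", text)
# [0-9]: a range, any single digit
digits = re.findall(r"[0-9]", text)
# [^e^]: a leading caret means negation; the second caret is a literal
not_e_or_caret = re.findall(r"[^e^]", "e^x")
# Outside [], Python's ^ is an anchor, so escape it to match a literal caret
literal_caret = re.findall(r"a\^b", text)
# |: disjunction between whole alternatives
animals = re.findall(r"[gG]roundhog|[Ww]oodchuck", text)

print(both_cases, digits, not_e_or_caret, literal_caret, animals)
```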
Regular Expressions: ? * + .
Pattern   Matches                      Example
colou?r   Optional previous char       color colour
oo*h!     0 or more of previous char   oh! ooh! oooh! ooooh!
o+h!      1 or more of previous char   oh! ooh! oooh! ooooh!
baa+                                   baa baaa baaaa baaaaa
beg.n     Any single character         begin begun beg3n
• * and + are called Kleene * and Kleene +, after Stephen C Kleene
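A small Python check of the ?, *, +, and . operators, using re.fullmatch so that each pattern must cover the whole test word. The helper name matches and the extra non-matching words (such as "colouur" and "begn") are illustrative assumptions.

```python
import re


def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]


optional = matches(r"colou?r", ["color", "colour", "colouur"])
star = matches(r"oo*h!", ["h!", "oh!", "ooh!", "oooh!"])
plus = matches(r"o+h!", ["h!", "oh!", "ooh!"])
baa = matches(r"baa+", ["ba", "baa", "baaaa"])
dot = matches(r"beg.n", ["begin", "begun", "beg3n", "begn"])

print(optional, star, plus, baa, dot)
```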
Regular Expressions: Anchors ^ $
Pattern      Matches
^[A-Z]       Palo Alto
^[^A-Za-z]   1 “Hello”
\.$          The end.
.$           The end? The end!

• Word boundary \b: for the strings 993 and 99, the pattern 99 matches both; to match only the standalone 99, use \b99\b
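The anchors and the \b word boundary behave the same way in Python's re module; a minimal sketch using the examples above:

```python
import re

# ^ anchors at the start of the string, $ at the end
starts_upper = re.search(r"^[A-Z]", "Palo Alto") is not None
starts_non_letter = re.search(r"^[^A-Za-z]", '1 "Hello"') is not None
ends_with_period = re.search(r"\.$", "The end.") is not None
last_char = re.search(r".$", "The end?").group()

# \b is a word boundary: the 99 inside 993 has no boundary after it
plain = [s for s in ["993", "99"] if re.search(r"99", s)]
bounded = [s for s in ["993", "99"] if re.search(r"\b99\b", s)]
print(plain, bounded)  # ['993', '99'] ['99']
```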
Example
• Find me all instances of the word “the” in a text.
the
Misses capitalized examples
[tT]he
Incorrectly returns other or theology
[^a-zA-Z][tT]he[^a-zA-Z]
Errors
• The process we just went through was based on fixing two kinds of errors:

1. Matching strings that we should not have matched (there, then, other)
False positives (Type I errors)

2. Not matching things that we should have matched (The)
False negatives (Type II errors)
Errors cont.
• In NLP we are always dealing with these kinds of errors.
• Reducing the error rate for an application often involves two antagonistic efforts:
• Increasing accuracy or precision (minimizing false positives)
• Increasing coverage or recall (minimizing false negatives).
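To make precision and recall concrete, the sketch below scores the three “the” patterns from the earlier example against an invented sentence. The reference answer (GOLD) is assumed to be every whole word “the” in any case; each pattern captures its candidate in group 1 so that match positions can be compared.

```python
import re
from fractions import Fraction

TEXT = "The other day the theologian said: then there was the end."
# Assumed reference answer: start offsets of every whole word "the", any case
GOLD = {m.start() for m in re.finditer(r"[A-Za-z]+", TEXT) if m.group().lower() == "the"}


def precision_recall(pattern):
    """Score a pattern whose group 1 captures the candidate 'the'."""
    found = {m.start(1) for m in re.finditer(pattern, TEXT)}
    true_pos = len(found & GOLD)
    precision = Fraction(true_pos, len(found)) if found else Fraction(0)
    recall = Fraction(true_pos, len(GOLD))
    return precision, recall


for pattern in [r"(the)", r"([tT]he)", r"[^a-zA-Z]([tT]he)[^a-zA-Z]"]:
    p, r = precision_recall(pattern)
    print(f"{pattern:30} precision={p} recall={r}")
```

Here (the) has precision 1/3 and recall 2/3; [tT]he raises recall to 1 while precision drops to 3/7; [^a-zA-Z][tT]he[^a-zA-Z] reaches precision 1 but misses the sentence-initial The (recall 2/3), because it needs a non-letter before the word.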
Summary
• Regular expressions play a surprisingly large role
• Sophisticated sequences of regular expressions are often the first model for any text processing task
• For hard tasks, we use machine learning classifiers
• But regular expressions are still used for pre-processing, or as features in the classifiers
• Can be very useful in capturing generalizations
More Regular Expressions: Substitutions and ELIZA
Substitutions
• Substitution in UNIX commands (e.g., sed) and in Python (re.sub):

• s/regexp1/pattern/
• e.g.:
• s/colour/color/
Capture Groups
• Say we want to put angles around all numbers:
the 35 boxes -> the <35> boxes
• Use parens () to "capture" a pattern into a numbered register (1, 2, 3…)
• Use \1 to refer to the contents of the register
s/([0-9]+)/<\1>/
Capture groups: multiple registers
• /the (.*)er they (.*), the \1er we \2/
• Matches
• the faster they ran, the faster we ran
• But not
• the faster they ran, the faster we ate
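In Python, re.sub plays the role of s/// and uses the same \1, \2 register references; a minimal sketch of both examples above:

```python
import re

# s/([0-9]+)/<\1>/ : group 1 holds the digits, \1 writes them back
angled = re.sub(r"([0-9]+)", r"<\1>", "the 35 boxes")

# \1 and \2 must repeat exactly what groups 1 and 2 captured
PATTERN = r"the (.*)er they (.*), the \1er we \2"
same = re.fullmatch(PATTERN, "the faster they ran, the faster we ran")
different = re.fullmatch(PATTERN, "the faster they ran, the faster we ate")

print(angled)         # the <35> boxes
print(same.groups())  # ('fast', 'ran')
print(different)      # None
```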
But suppose we don't want to capture?
Parentheses have a double function: grouping terms, and capturing
Non-capturing groups: add a ?: after the paren:
• /(?:some|a few) (people|cats) like some \1/
• matches
• some cats like some cats
• but not
• some cats like some some
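The same non-capturing example in Python, with a capturing variant added for contrast (the variable names are illustrative):

```python
import re

# (?:some|a few) groups without capturing, so \1 refers to (people|cats)
NON_CAPTURING = r"(?:some|a few) (people|cats) like some \1"
hit = re.fullmatch(NON_CAPTURING, "some cats like some cats")
miss = re.fullmatch(NON_CAPTURING, "some cats like some some")

# With an ordinary capturing group, \1 would refer to "some" instead
CAPTURING = r"(some|a few) (people|cats) like some \1"
surprise = re.fullmatch(CAPTURING, "some cats like some some")

print(hit.groups(), miss, surprise.groups())
# ('cats',) None ('some', 'cats')
```

With an ordinary group, \1 refers to some, so the unwanted sentence matches; (?: ) keeps register 1 for (people|cats).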
Lookahead assertions
• (?= pattern) is true if pattern matches, but is zero-width; it doesn't advance the character pointer
• (?! pattern) is true if a pattern does not match
• How to match, at the beginning of a line, any single word that doesn’t start with “Volcano”:
• /^(?!Volcano)[A-Za-z]+/
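A sketch of the Volcano example with Python's re module. re.MULTILINE is needed so that ^ matches at the start of every line; the sample text is invented. The second pattern shows that a positive lookahead checks “ stand” without consuming it.

```python
import re

text = "Volcanoes erupt\nMountains stand\nVolcano ash\nLava flows"

# ^ with re.MULTILINE matches at each line start; (?!Volcano) rejects lines starting with Volcano
not_volcano = re.findall(r"^(?!Volcano)[A-Za-z]+", text, flags=re.MULTILINE)

# (?= stand) checks the following text without consuming it
before_stand = re.findall(r"[A-Za-z]+(?= stand)", text)

print(not_volcano, before_stand)  # ['Mountains', 'Lava'] ['Mountains']
```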
Simple Application: ELIZA
• Early NLP system that imitated a Rogerian psychotherapist
• Joseph Weizenbaum, 1966.

• Uses pattern matching to match inputs such as:

• “I need X”
and transforms them into responses such as:
• “What would it mean to you if you got X?”
Simple Application: ELIZA
Men are all alike.
IN WHAT WAY
They're always bugging us about something or other.
CAN YOU THINK OF A SPECIFIC EXAMPLE
Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
He says I'm depressed much of the time.
I AM SORRY TO HEAR YOU ARE DEPRESSED
How ELIZA works
• s/.* I’M (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/
• s/.* I AM (depressed|sad) .*/WHY DO YOU THINK YOU ARE \1/
• s/.* all .*/IN WHAT WAY?/
• s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE?/
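A minimal ELIZA-style responder in Python, adapted from the four substitutions above. Assumptions: the rules are tried in order, word boundaries \b replace the literal surrounding spaces, the apostrophe in I’M may be straight or curly, and the fallback reply PLEASE GO ON is hypothetical (it is not one of the rules above).

```python
import re

# Ordered (pattern, response) rules adapted from the substitutions above
RULES = [
    (r".*\bI['’]?M (depressed|sad)\b.*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".*\bI AM (depressed|sad)\b.*", r"WHY DO YOU THINK YOU ARE \1"),
    (r".*\ball\b.*", "IN WHAT WAY?"),
    (r".*\balways\b.*", "CAN YOU THINK OF A SPECIFIC EXAMPLE?"),
]
DEFAULT = "PLEASE GO ON"  # hypothetical fallback reply


def eliza(utterance):
    """Return the response of the first rule whose pattern matches the whole input."""
    for pattern, response in RULES:
        match = re.fullmatch(pattern, utterance, flags=re.IGNORECASE)
        if match:
            # expand() fills \1 with the captured word
            return match.expand(response).upper()
    return DEFAULT


for line in ["Men are all alike.", "He says I'm depressed much of the time."]:
    print(line, "->", eliza(line))
```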
Write the regular expression for the following

• Write the regular expression that matches a string that has an a followed by zero or more b's

• Write the regular expression for the given language L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, any combination of a and b.

• Write the regular expression for the language L = {a, aba, aab, aba, aaa, abab, .....}

• Write the regular expression for the language L = {a, aa, aaa, ....}

• Write the regular expression for the language L = {ε, 0, 1, 00, 11, 10, 100, .....}

• Write the regular expression for the language accepting all strings that start with 1 and end with 0, over ∑ = {0, 1}.
Write the regular expression for the following

• A string that has an a followed by zero or more b's: R = ab*
• L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, any combination of a and b: R = (a + b)*
• L = {a, aba, aab, aba, aaa, abab, .....}: R = (a + ab)*
• L = {a, aa, aaa, ....}: R = aa* (one or more a's, also written a+)
• L = {ε, 0, 1, 00, 11, 10, 100, .....}: R = 1*0*
• All strings over ∑ = {0, 1} starting with 1 and ending with 0: R = 1(0 + 1)*0
(In these answers, + between two expressions means union, i.e., "or".)
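The formal answers can be tested with Python's re.fullmatch once the notation is translated: union (a + b) becomes [ab] or (?:a|ab), while concatenation and * carry over unchanged. The translation table ANSWERS and the helper accepts are illustrative.

```python
import re

# Formal notation -> Python syntax. Union (+) becomes [..] or (?:..|..);
# concatenation and * are written the same way.
ANSWERS = {
    "ab*": r"ab*",
    "(a + b)*": r"[ab]*",
    "(a + ab)*": r"(?:a|ab)*",
    "aa*": r"aa*",
    "1*0*": r"1*0*",
    "1(0 + 1)*0": r"1[01]*0",
}


def accepts(formal, string):
    """True if the whole string belongs to the language of the formal expression."""
    return re.fullmatch(ANSWERS[formal], string) is not None


print(accepts("1(0 + 1)*0", "1010"), accepts("1(0 + 1)*0", "0110"))  # True False
```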
Finite State Automata

• Finite automata (FA) are the simplest machines for recognizing patterns.

• A finite automaton, or finite state machine, is an abstract machine defined by five elements (a 5-tuple).

• It has a set of states and rules for moving from one state to another depending on the input symbol.

• Basically, it is an abstract model of a digital computer.

• For every input string, a finite automaton either accepts or rejects it.

• If the automaton is in a final state after the whole input string has been processed, the string is accepted.
[Figure: features of a finite automaton]
• A finite automaton has the following features:

• Input
• Output
• States of automata
• State relation
• Output relation
A finite automaton consists of the following:
Q : Finite set of states.
Σ : set of Input Symbols.
q : Initial state.
F : set of Final States.
δ : Transition Function.
Deterministic Finite Automata (DFA):

A DFA is a 5-tuple {Q, Σ, q, F, δ}:

Q : set of all states.
Σ : set of input symbols (symbols which the machine takes as input).
q : initial state (starting state of the machine).
F : set of final states.
δ : transition function, defined as δ : Q × Σ → Q.

• In a DFA, for a particular input character, the machine goes to one state only.

• A transition function is defined on every state for every input symbol.

For example, the DFA below with Σ = {0, 1} accepts all strings ending with 0.
[Figure: state transition diagram]
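A DFA can be simulated with a transition table. The sketch below assumes two states of our own naming, q0 (start, or last symbol was 1) and q1 (last symbol was 0, final), since the original diagram is not reproduced.

```python
def run_dfa(delta, start, finals, string):
    """Return True if the DFA ends in a final state after reading the string."""
    state = start
    for symbol in string:
        if (state, symbol) not in delta:
            return False  # symbol outside the alphabet: reject
        state = delta[(state, symbol)]
    return state in finals


# Assumed states: q0 = start or last symbol 1, q1 = last symbol 0 (final)
ENDS_WITH_0 = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}

for s in ["10", "0110", "01", ""]:
    print(repr(s), run_dfa(ENDS_WITH_0, "q0", {"q1"}, s))
```

Because δ is defined for every (state, symbol) pair over {0, 1}, the machine always moves to exactly one next state, as the DFA definition requires.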
Draw a deterministic finite automaton that accepts strings over {0, 1} ending in 00 or 11, e.g., 01010100 but not 000111010.
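One possible answer, sketched as a transition table in Python. The state names are our own: S is the start state, A0/A1 remember that the last symbol was 0/1, and F0/F1 (final) remember that the last two symbols were 00/11.

```python
# Hypothetical state names; F0 and F1 are the final states
DELTA = {
    ("S", "0"): "A0", ("S", "1"): "A1",
    ("A0", "0"): "F0", ("A0", "1"): "A1",
    ("A1", "0"): "A0", ("A1", "1"): "F1",
    ("F0", "0"): "F0", ("F0", "1"): "A1",
    ("F1", "0"): "A0", ("F1", "1"): "F1",
}
FINALS = {"F0", "F1"}


def accepts(string):
    """Run the DFA; raises KeyError for symbols outside {0, 1}."""
    state = "S"
    for symbol in string:
        state = DELTA[(state, symbol)]
    return state in FINALS


print(accepts("01010100"), accepts("000111010"))  # True False
```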
• A finite-state model can capture the generalization here:
• Eg.
• I eat Sushi
• Ram likes Mango
• Noun+ Verb Noun+
Language is recursive
§ the ball
§ the big ball
§ the big, red ball
§ the big, red, heavy ball

§ the ball
§ the ball in the garden
§ the ball in the garden behind the house
§ the ball in the garden behind the house next to school
Morphological Parsing
Morphological parsing is finding the lexical form of a word from its surface form.
A word = Prefix + Stem + Suffix (prefix and suffix optional)

Surface Form   Lexical Form
cats           cat +N +PL
cat            cat +N +SG
goose          goose +N +SG
geese          goose +N +PL
catch          catch +V
caught         catch +V +PAST
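A minimal sketch of lookup-based morphological parsing in Python: irregular forms are listed directly, and regular nouns get +N +PL when a final -s is stripped. The lexicon entries and the function parse are hypothetical; real parsers use FSTs over a full lexicon.

```python
# Hypothetical toy lexicon built from the table above
IRREGULAR = {
    "geese": "goose +N +PL",
    "goose": "goose +N +SG",
    "caught": "catch +V +PAST",
}
NOUN_STEMS = {"cat", "dog", "boy"}
VERB_STEMS = {"catch"}


def parse(surface):
    """Map a surface form to a lexical form, or None if it is not covered."""
    if surface in IRREGULAR:
        return IRREGULAR[surface]
    if surface in NOUN_STEMS:
        return f"{surface} +N +SG"
    if surface.endswith("s") and surface[:-1] in NOUN_STEMS:
        return f"{surface[:-1]} +N +PL"  # regular plural: strip -s
    if surface in VERB_STEMS:
        return f"{surface} +V"
    return None


for word in ["cats", "geese", "caught", "boys"]:
    print(word, "->", parse(word))
```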
Parts of a Morphological Processor
• Lexicon: the list of stems with basic information about
categories (noun, verb, adjective, …)
sub-categories (regular noun, irregular noun, …)

• Morphotactics: explains ordering, i.e., which classes of morphemes can follow other classes of morphemes inside a word.

• Orthographic Rules (Spelling Rules): these spelling rules are used to model changes that occur in a word (normally when two morphemes combine).
Lexicon
•A lexicon is a repository for words (stems).

•They are grouped according to their main categories.

–noun, verb, adjective, adverb, …

•They may be also divided into sub-categories.

–regular-nouns, irregular-singular nouns, irregular-plural nouns

• The simplest way to create a morphological parser is to put all possible words (together with their inflections) into a lexicon.
Morphotactics
An automaton for morphotactics can only accept or reject a word; it cannot tell us, say, what the lemma of the word boys is.
Finite state transducers
• A finite state transducer is essentially a finite state automaton that works on two (or more) tapes. The most common way to think about transducers is as a kind of “translating machine”.

• They read from one of the tapes and write onto the other. For instance, a transducer can translate a's into b's:
[Figure: transducer that translates a into b]
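A deterministic FST can be sketched as a table mapping (state, input symbol) to (next state, output symbol). This is a simplification: real morphological FSTs are often nondeterministic and allow ε moves. The table A_TO_B is our own encoding of the one-state transducer that translates a's into b's.

```python
def transduce(fst, start, finals, tape):
    """Read the input tape symbol by symbol, writing one output symbol per transition."""
    state, output = start, []
    for symbol in tape:
        if (state, symbol) not in fst:
            return None  # no transition for this input: rejected
        state, out = fst[(state, symbol)]
        output.append(out)
    return "".join(output) if state in finals else None


# One state q0 (also final) with a single arc labelled a:b
A_TO_B = {("q0", "a"): ("q0", "b")}

print(transduce(A_TO_B, "q0", {"q0"}, "aaa"))  # bbb
print(transduce(A_TO_B, "q0", {"q0"}, "aba"))  # None
```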
Finite state transducers
• A finite state transducer (FST) is a finite state machine whose transitions are conditioned on a pair of symbols
• The machine moves between states based on the input symbol, while outputting the corresponding output symbol
• An FST encodes a relation, a mapping from one set to another. The relation defined by an FST is called a regular (or rational) relation
Two-Level Morphology
Two-level morphology represents the correspondence between the lexical and surface levels.
• We use a finite-state transducer to find the mapping between these two levels.
• An FST is a two-tape automaton:
– Reads from one tape and writes to the other one.
• For morphological processing, one tape holds the lexical representation and the second holds the surface form of a word.
Morphological segmentation (or Stemming)
• Taking a surface input and breaking it down into its morphemes
• cat – cat +N +SG
• cats – cat +N +PL
• plays – play +V +3SG
• played – play +V +PastParticiple
• foxes – fox +N +PL
Orthographic Rules
• For each spelling rule we have an FST, and these FSTs run in parallel.
• We represent these rules using two-level morphology rules:

a => b / c __ d

rewrite a as b when it occurs between c and d.

English Spelling Rules:

– E insertion: e is added after s, z, x, ch, sh before the suffix -s (watch/watches)
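The E-insertion rule can be sketched as a regular-expression rewrite over an intermediate form. The ^ morpheme-boundary marker and the function name apply_e_insertion are assumptions for this illustration, not part of the slides.

```python
import re


def apply_e_insertion(lexical):
    """Map an intermediate form such as 'fox^s' to its surface spelling.

    '^' is an assumed morpheme-boundary marker; the rule inserts e
    between a stem ending in s, z, x, ch or sh and the suffix -s.
    """
    surface = re.sub(r"(s|z|x|ch|sh)\^s$", r"\1es", lexical)
    return surface.replace("^", "")  # drop any boundary marker left over


for form in ["watch^s", "fox^s", "cat^s"]:
    print(form, "->", apply_e_insertion(form))
```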
Stemming
• Stemming is a suffix-stripping operation
• The process of reducing a word to its base form (root form/stem form)
• This is achieved by cutting off the beginning or end of the word

• Eg:
• Playing- Play
• Boys- Boy

• A popular algorithm is the Porter Stemmer

Porter Stemmer

– Five sets of rules, applied in order

– Within each set, if more than one of the rules can apply, only the one with the longest matching suffix (S1) is followed

Advantage: easy to understand, easy to implement.

Convention

• Consonant (c): a letter other than A, E, I, O or U, and other than Y preceded by a consonant.

• So in TOY the consonants are T and Y

• Vowel (v): any other letter.

• Any word in English can be written in the form:
• $[C](VC)^{m}[V]$
• [ ] denotes optional presence of its content
• C - a sequence of one or more consonants
• V - a sequence of one or more vowels
• m is called the measure of any word or word part
• m=0 TR, EE, TREE, Y, BY.
• m=1 TROUBLE, OATS, TREES, IVY.
• m=2 TROUBLES, PRIVATE, OATEN, ORRERY.
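The measure m can be computed by labelling each letter c or v, collapsing runs of equal labels, and counting VC pairs. A minimal Python sketch of the convention above; the function names cv_pattern and measure are our own.

```python
from itertools import groupby


def cv_pattern(word):
    """Label each letter c (consonant) or v (vowel) using the convention above."""
    labels = []
    for i, ch in enumerate(word.lower()):
        if ch in "aeiou":
            labels.append("v")
        elif ch == "y" and i > 0 and labels[i - 1] == "c":
            labels.append("v")  # y preceded by a consonant counts as a vowel
        else:
            labels.append("c")
    return "".join(labels)


def measure(word):
    """m in [C](VC)^m[V]: the number of VC pairs after collapsing runs."""
    collapsed = "".join(label for label, _ in groupby(cv_pattern(word)))
    return collapsed.count("vc")


for w in ["tree", "trouble", "troubles", "orrery"]:
    print(w, measure(w))
```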
◦RULES
◦ The rules for removing a suffix are given in the form

(condition) S1 -> S2

If a word ends with the suffix S1, and the stem before S1 satisfies the given condition, S1 is replaced by S2.
◦ e.g. (m > 1) EMENT -> ε

Here S1 is 'EMENT' and S2 is null.

◦ This maps REPLACEMENT to REPLAC,
◦ since REPLAC is a word part for which m = 2.
Condition   Meaning
*S          the stem ends with S (and similarly for the other letters).
*v*         the stem contains a vowel.
*d          the stem ends with a double consonant (e.g. -TT, -SS).
*o          the stem ends cvc, where the second c is not W, X or Y (e.g. -WIL, -HOP).

Conditions can be combined with and, or, not, e.g. (m>1 or *S)

PORTER STEMMER

Step 1a :
•sses -> ss (Example : caresses -> caress)
•ies -> i (Example : ponies -> poni ; ties -> ti)
•ss -> ss (Example : caress -> caress)
•s -> ε (Example : cats -> cat)
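Step 1a can be sketched in Python; listing the suffixes longest first implements the “longest matching suffix” rule. This covers Step 1a only, not the full Porter stemmer, and the names STEP_1A and porter_step1a are our own.

```python
# Step 1a rules as (suffix S1, replacement S2), longest suffix first
STEP_1A = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]


def porter_step1a(word):
    """Apply the single Step 1a rule with the longest matching suffix."""
    for suffix, replacement in STEP_1A:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "ties", "caress", "cats"]:
    print(w, "->", porter_step1a(w))
```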
Step 1b :
•(m>0) eed -> ee (Example : agreed -> agree; feed -> feed)
•(*v*) ed -> ε (Example : plastered -> plaster; bled -> bled)
•(*v*) ing -> ε (Example : motoring -> motor; sing -> sing)
If the second or third of the rules in Step 1b is successful, the following cleaning step is done:
•at -> ate (Example : conflat(ed) -> conflate)
•bl -> ble (Example : troubl(ed) -> trouble)
•iz -> ize (Example : siz(ed) -> size)
•(*d and not (*l or *s or *z)) -> single letter
(Example : hopp(ing) -> hop; tann(ed) -> tan; fall(ing) -> fall; hiss(ing) -> hiss; fizz(ed) -> fizz)
•(m=1 and *o) -> e
(Example : fil(ing) -> file; fail(ing) -> fail)
Step 1c : Y Elimination
(*v*) y -> i (Example : happy -> happi; sky -> sky)
Step 1 deals with plurals and past participles. The subsequent
steps are much more straightforward.

Step 2 :Derivational Morphology -1

•(m>0) ational -> ate (Example : relational -> relate)
•(m>0) ization -> ize (generalization -> generalize)
•(m>0) biliti -> ble (sensibiliti -> sensible)
Step 3 :

•(m>0) icate -> ic (Example : triplicate -> triplic)
•(m>0) ful -> ε (Example : hopeful -> hope)
•(m>0) ness -> ε (Example : goodness -> good)
Step 4 :Derivational Morphology-II

•(m>1) ance -> ε (Example : allowance -> allow)
•(m>1) ment -> ε (Example : adjustment -> adjust)
•(m>1) ent -> ε (Example : dependent -> depend)
•(m>1) ive -> ε (Example : effective -> effect)
The suffixes are now removed. All that remains is a little tidying up.

Step 5a :
•(m>1) e -> ε (Example : probate -> probat; rate -> rate)
•(m=1 and not *o) e -> ε (Example : cease -> ceas)
Step 5b
(m > 1 and *d and *L) -> single letter

(Example : controll -> control ; roll -> roll)

Disadvantage

• This indiscriminate cutting succeeds on some occasions and fails on others

• Eg.
• Studies- Studi
• Giving- Giv
• Intelligence-Intelligen
• The intermediate representation produced may not have any meaning
Stemming
• Cutting off suffixes to extract the stem
• Eg.
• Studies- Studi
• Giving- Giv
• Caring- Car
• The intermediate representation produced may not have any meaning

Lemmatization
• Cutting off suffixes to extract the lemma
• Eg.
• Studies- Study
• Giving- Give
• Caring- Care
• The intermediate representation produced is a meaningful word
Lemmatization
• Similar to stemming, but the result is a meaningful word (the lemma)
• Both stemming and lemmatization are used to normalize a given word by removing affixes
Stemming | Lemmatization
Root word is called the stem | Root word is called the lemma
Does not need part-of-speech (POS) tagging | Needs POS tagging
Does not require knowledge of the context | Requires the context of the word in the sentence
Requires less computational power | Requires more computational power
Not based on a dictionary such as WordNet | Based on a dictionary such as WordNet
Eg. Caring - Car | Eg. Caring - Care
Applications: sentiment analysis, document clustering, machine translation | Applications: sentiment analysis, document clustering, machine translation
N-Gram Language Model
• Predicting the Nth word from the previous N-1 words
• Predicting the 3rd word from the previous 2 words – the model is called a trigram
• Predicting the 2nd word from the previous 1 word – the model is called a bigram

1. He is going to _____ : predicting the 5th word from the last four words is a 5-gram language model
2. going to _____ : predicting the 3rd word from the last two words is a trigram
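A small Python sketch of how an N-gram model splits text: ngrams lists all n-word windows, and context returns the N-1 previous words the model conditions on. Both helper names are illustrative.

```python
def ngrams(tokens, n):
    """All contiguous windows of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def context(tokens, n):
    """The last N-1 tokens, which an N-gram model uses to predict the next word."""
    return tuple(tokens[-(n - 1):]) if n > 1 else ()


sentence = "He is going to school".split()
print(ngrams(sentence, 3))
# [('He', 'is', 'going'), ('is', 'going', 'to'), ('going', 'to', 'school')]

history = "He is going to".split()
print(context(history, 5))  # ('He', 'is', 'going', 'to')
print(context(history, 3))  # ('going', 'to')
```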
Application of N-Gram
1. Optical Character Recognition
• Image -> Text
• Missing or unclear words can be predicted
2. Grammar correction
• If a word is misspelled, a correction can be suggested based on context
• Even if every word is spelled correctly, a wrong word can be detected
• E.g. "Deer sir" can be corrected to "Dear sir", since sir is normally preceded by Dear, not Deer
3. Speech to Text
• Words with the same pronunciation, e.g. "Eye am fine" vs "I am fine"
4. Translation
• Choosing among multiple synonyms of a word
• E.g., in "He is biggest minister of Pakistan", the model prefers "Prime" over "biggest"
5. Suggestions while typing in mobile text prediction mode
• He is going to school: Meaningful

• He school is to going: Doesn’t make sense

For N-gram models

$P(w_1, w_2, \ldots, w_n)$

By the Chain Rule we can decompose a joint probability as follows:

$P(w_1, w_2, \ldots, w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 \ldots w_{n-2} w_{n-1})$

Joint probability of a sentence:

P(He is going to school) = P(He) P(is|He) P(going|He is) P(to|He is going) P(school|He is going to)

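Looking back so much (conditioning on the entire history) is impractical, so an N-gram model approximates each term using only the previous N-1 words, e.g., in a bigram model P(school|He is going to) ≈ P(school|to).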
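A minimal bigram (N = 2) sketch with maximum-likelihood estimates, P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}). The three-sentence corpus and the <s> and </s> sentence-boundary tokens are assumptions for illustration; Fraction keeps the probabilities exact.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy corpus
CORPUS = [
    "he is going to school",
    "he is going to market",
    "she is going to school",
]


def pad(sentence):
    """Lower-case, split on spaces, and add sentence-boundary tokens."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram(sentences):
    """Count contexts w_{i-1} and bigrams (w_{i-1}, w_i)."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = pad(sentence)
        context_counts.update(tokens[:-1])  # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts


def sentence_prob(sentence, context_counts, bigram_counts):
    """P(sentence) under the bigram approximation of the chain rule."""
    prob = Fraction(1)
    tokens = pad(sentence)
    for prev, word in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return Fraction(0)  # unseen context: no MLE estimate, treated as 0 here
        prob *= Fraction(bigram_counts[(prev, word)], context_counts[prev])
    return prob


contexts, bigrams = train_bigram(CORPUS)
print(sentence_prob("He is going to school", contexts, bigrams))  # 4/9
print(sentence_prob("He school is to going", contexts, bigrams))  # 0
```

On this toy corpus, P(he is going to school) = 2/3 · 1 · 1 · 1 · 2/3 · 1 = 4/9, while the scrambled “he school is to going” gets probability 0 because the bigram “he school” never occurs. Real models add smoothing so that unseen bigrams do not zero out a sentence.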
