NLP 2 Chapter 2
Analysis
CHAPTER 2
1. Morphology Analysis
Morphology is the study of the structure and formation of words in a language.
Definition: Morphology analyzes how words are formed by combining morphemes
(smallest meaningful units).
Types of Morphemes:
Free morphemes: Stand alone as words (e.g., "book").
Bound morphemes: Cannot stand alone (e.g., "-ed" in "played").
Importance in NLP: Morphology aids in text analysis tasks like tokenization, parsing, and
information retrieval.
Inflectional Morphology: Modifies a word to express tense, mood, number, etc. (e.g.,
"run" → "running").
Derivational Morphology: Creates new words with new meanings by adding prefixes or
suffixes (e.g., "happy" → "unhappy").
Morphological Parsing: Decomposes words into their root forms and affixes.
Challenges: Handling irregular forms (e.g., "go" → "went") and language-specific variations.
Cross-Linguistic Morphology: Languages like Finnish and Turkish are morphologically rich, making
analysis more complex.
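As a rough illustration of morphological parsing, the sketch below strips at most one known prefix and one known suffix and falls back to a lookup table for irregular forms. The affix lists, the `IRREGULAR` table, and the `parse` function are hypothetical names chosen for this example; a real parser would need a lexicon and spelling rules.

```python
# Toy morphological parser: strips one known prefix and one known suffix.
# The affix lists are illustrative assumptions, not a complete inventory.
PREFIXES = ("un", "re", "pre")
SUFFIXES = ("ness", "ing", "ed", "s")

# Irregular forms cannot be recovered by stripping, so they are looked up.
IRREGULAR = {"went": ("go", "PAST")}


def parse(word):
    """Return a dict with the prefix, root, and suffix found in `word`."""
    word = word.lower()
    if word in IRREGULAR:
        root, feature = IRREGULAR[word]
        return {"prefix": "", "root": root, "suffix": "", "feature": feature}

    # Require at least three characters to remain after stripping an affix.
    prefix = next(
        (p for p in PREFIXES if word.startswith(p) and len(word) > len(p) + 2), ""
    )
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s) + 2), ""
    )
    root = rest[: len(rest) - len(suffix)] if suffix else rest
    return {"prefix": prefix, "root": root, "suffix": suffix, "feature": ""}


print(parse("played"))    # root "play", suffix "ed"
print(parse("unlocked"))  # prefix "un", root "lock", suffix "ed"
print(parse("went"))      # root "go", found through the irregular table
```

Pure affix stripping over-segments words that merely look affixed (for example, it would treat the "re" in "reading" as a prefix), which is one reason practical parsers rely on a lexicon.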
2. Inflectional Morphology & Derivational Morphology
Inflectional Morphology
Definition: Focuses on modifying a word’s form to indicate grammatical categories (e.g.,
"cat" → "cats").
Grammatical Categories: Includes tense, number, case, gender, aspect, and mood.
Limited Effect: Does not change the core meaning or word class (e.g., "walk" →
"walked").
Examples in English: Plural forms ("dogs"), past tense ("played"), comparative
("smarter").
Regular vs. Irregular Inflection: Regular patterns follow rules (e.g., add "-ed"), while
irregular forms do not (e.g., "sing" → "sang").
Applications: Used in POS tagging, grammar correction, and dependency parsing.
Cross-Language Examples: German marks number and case with suffixes (e.g., "Hund" →
"Hunde" for the plural).
Role in Sentence Construction: Essential for subject-verb agreement in languages
with rich inflection.
Simpler in English: English has fewer inflectional markers compared to languages like
Russian or Arabic.
NLP Impact: Recognizing inflected variants as forms of the same word helps syntactic analysis cope with morphological variation.
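The contrast between regular and irregular inflection can be sketched in a few lines: regular forms come from suffix rules, while irregular forms have to be stored. The functions `past_tense` and `plural` and the `IRREGULAR_PAST` table are hypothetical and cover only a handful of English spelling rules.

```python
# Irregular past-tense forms must be stored; they follow no suffix rule.
IRREGULAR_PAST = {"sing": "sang", "go": "went", "run": "ran"}

VOWELS = "aeiou"


def past_tense(verb):
    """Return the past tense of an English verb (simplified rules)."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"              # "bake" -> "baked"
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"       # "study" -> "studied"
    return verb + "ed"                 # "walk" -> "walked"


def plural(noun):
    """Return the regular plural of an English noun (simplified rules)."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"             # "box" -> "boxes"
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"       # "city" -> "cities"
    return noun + "s"                  # "cat" -> "cats"


print(past_tense("walk"), past_tense("play"), past_tense("sing"))
print(plural("cat"), plural("dog"), plural("box"))
```

In both functions the word class stays the same (verb in, verb out; noun in, noun out), which is the defining property of inflection described above.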
Derivational Morphology
Definition: Involves creating new words by adding prefixes, suffixes, or other affixes.
Examples: Adding "-ness" to "happy" to form "happiness" or "un-" to "do" to form "undo."
Word Class Changes: Often changes the grammatical category (e.g., noun → adjective, as in "danger" → "dangerous").
Prefix and Suffix: Prefixes like "pre-" and "re-" often indicate time or repetition, while suffixes
like "-ation" form nouns that denote actions.
Recursive Derivation: Derivational affixes can stack, each applying to the output of the previous
step (e.g., "modern" → "modernize" → "modernization").
Meaning Changes: Derivation often creates significant semantic shifts (e.g., "clear" →
"clarify").
Language-Specific Rules: Different languages have unique derivational affixes.
Applications: Used in text generation, translation, and language learning tools.
Complexity: Derivation interacts with other word-formation processes such as compounding and
blending (e.g., "breakfast" + "lunch" → "brunch").
NLP Challenges: Requires understanding context to ensure correct derivation.
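To make the word-class changes and recursive derivation concrete, the sketch below models each affix as a rule that accepts one word class and produces another. The `RULES` table, the class labels, and the `derive` function are hypothetical; the two spelling adjustments are just enough to cover the examples from this section.

```python
# Each rule maps an affix to (kind, required input class, resulting class).
# The table is a small illustrative assumption, not a full inventory.
RULES = {
    "un-": ("prefix", "ADJ", "ADJ"),
    "-ness": ("suffix", "ADJ", "NOUN"),
    "-ize": ("suffix", "ADJ", "VERB"),
    "-ation": ("suffix", "VERB", "NOUN"),
}


def derive(word, word_class, affix):
    """Apply one derivational affix; return (new_word, new_word_class)."""
    kind, needs, gives = RULES[affix]
    if word_class != needs:
        raise ValueError(f"{affix} attaches to {needs}, not {word_class}")
    morph = affix.strip("-")
    if kind == "prefix":
        return morph + word, gives
    # Minimal spelling adjustments for the examples used here.
    if word.endswith("y") and morph == "ness":
        word = word[:-1] + "i"         # "happy" -> "happi" + "ness"
    elif word.endswith("e") and morph[0] in "aeiou":
        word = word[:-1]               # "modernize" -> "moderniz" + "ation"
    return word + morph, gives


print(derive("happy", "ADJ", "-ness"))   # ('happiness', 'NOUN')
print(derive("happy", "ADJ", "un-"))     # ('unhappy', 'ADJ')

# Recursive derivation: the output of one rule feeds the next.
word, cls = derive("modern", "ADJ", "-ize")
print(derive(word, cls, "-ation"))       # ('modernization', 'NOUN')
```

Tracking the word class alongside the string is what lets the second rule reject invalid inputs, such as attaching "-ation" directly to an adjective.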
3. Stemming and Lemmatization
Definition: Techniques to reduce words to their base forms for text processing.
Stemming:
Removes affixes to produce the stem.
Often results in non-standard words (e.g., "studying" → "studi").
Algorithms: Porter Stemmer, Snowball Stemmer.
Lemmatization:
Reduces words to their dictionary form (lemma).
Considers context and part of speech (e.g., "better" → "good").
Applications: Search engines, information retrieval, and text normalization.
Comparison: Lemmatization is more accurate than stemming but also more computationally
intensive.
Example:
Stemming: "flies" → "fli"
Lemmatization: "flies" → "fly"
Role in NLP Pipelines: Often used during preprocessing for feature extraction.
Challenges: Handling ambiguous words whose lemma depends on part of speech or sense (e.g., "bank" as a verb vs. noun).
Multilingual Support: Tools like spaCy provide language-specific lemmatizers.
Future Trends: Integration with deep learning for better contextual understanding.
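The difference between the two techniques can be sketched with the standard library alone. The stemmer below is a simplified rule set loosely modeled on the first steps of the Porter algorithm, not the full Porter or Snowball stemmer, and the lemmatizer is a plain dictionary lookup keyed on word and part of speech. `toy_stem`, `toy_lemmatize`, and the `LEMMAS` table are hypothetical names for this example.

```python
VOWELS = "aeiou"


def has_vowel(s):
    """True if the string contains at least one vowel letter."""
    return any(ch in VOWELS for ch in s)


def toy_stem(word):
    """Strip common suffixes by rule; the result need not be a real word."""
    word = word.lower()
    # Plural-like endings.
    if word.endswith("sses"):
        word = word[:-2]               # "classes" -> "class"
    elif word.endswith("ies"):
        word = word[:-2]               # "flies" -> "fli"
    elif word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]               # "cats" -> "cat"
    # Verbal endings, removed only if the remaining stem has a vowel.
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and has_vowel(word[: -len(suffix)]):
            word = word[: -len(suffix)]
            break
    # Final "y" becomes "i" when the stem before it has a vowel.
    if word.endswith("y") and has_vowel(word[:-1]):
        word = word[:-1] + "i"         # "study" -> "studi"
    return word


# A lemmatizer needs a dictionary; the key includes the part of speech.
LEMMAS = {
    ("flies", "NOUN"): "fly",
    ("flies", "VERB"): "fly",
    ("better", "ADJ"): "good",
    ("studying", "VERB"): "study",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
}


def toy_lemmatize(word, pos):
    """Look up the lemma for (word, pos); fall back to the word itself."""
    return LEMMAS.get((word.lower(), pos), word.lower())


print(toy_stem("flies"), toy_lemmatize("flies", "NOUN"))        # fli fly
print(toy_stem("studying"), toy_lemmatize("studying", "VERB"))  # studi study
print(toy_lemmatize("saw", "VERB"), toy_lemmatize("saw", "NOUN"))  # see saw
```

The stemmer never consults a dictionary, which is why it is fast but produces forms like "fli"; the lemmatizer returns real words but is only as good as its lexicon and the part-of-speech information it is given.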
4. Regular Expressions
A regular expression (regex) is a sequence of characters that
defines a search pattern. Here’s how to write regular expressions:
Start by understanding the special characters used in regex, such as “.”, “*”, “+”, “?”, and more.
Choose a programming language or tool that supports regex, such as Python, Perl, or grep.
Write your pattern using the special characters and literal characters.
Use the appropriate function or method to search for the pattern in a string.
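The steps above can be sketched in Python with the standard `re` module. The sample strings and variable names are made up for illustration; the first part shows what each special character does, and the second combines literals and special characters into a pattern that finds inflected forms of "play".

```python
import re

# "." matches any single character (except a newline).
dot = bool(re.fullmatch(r"a.c", "abc"))        # True
# "*" means zero or more of the preceding item.
star = bool(re.fullmatch(r"ab*c", "ac"))       # True
# "+" means one or more of the preceding item.
plus = bool(re.fullmatch(r"ab+c", "ac"))       # False
# "?" makes the preceding item optional.
optional = re.findall(r"colou?r", "color and colour")

# Literal "play" followed by an optional inflectional suffix.
# \b marks a word boundary, so "player" and "replayed" are not matched.
text = "She played, he plays, they are playing; the player replayed it."
pattern = re.compile(r"\bplay(?:s|ed|ing)?\b")
matches = pattern.findall(text)

print(dot, star, plus)   # True True False
print(optional)          # ['color', 'colour']
print(matches)           # ['played', 'plays', 'playing']
```

`re.fullmatch` requires the whole string to match, while `findall` scans the string and returns every non-overlapping match, which is usually what is wanted when searching running text.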