
Natural Language

Processing (NLP)
Abebe Zerihun Jabessa
Outline
• Introduction
• History of NLP
• Approaches to NLP
• Linguistic related issues in NLP
• NLP related technologies and disciplines
• Foundations of NLP
• Classical/Statistical Machine Learning
• Deep Learning/Generative Learning

Introduction
• A subfield of Artificial Intelligence (AI) that focuses on the interaction between computers and human languages.
• It enables machines to understand, interpret, and generate human language in a way that is both meaningful and useful.
• It involves combining computational techniques with linguistic principles to process and analyze large amounts of natural language data.
[Figure: NLP shown alongside Machine Learning, Robotics, and Computer Vision as subfields of Artificial Intelligence.]


History of NLP

• 1940s–1950s: Foundational Insights
• 1957–1970: Symbolic vs. Stochastic Camps. The symbolic approach focused on formal language theory, AI, and early NLP systems; in contrast, stochastic methods used probabilistic models, including Bayesian approaches for text recognition and authorship attribution.
• 1970–1983: Paradigms Multiply. Hidden Markov Models (HMMs) revolutionized speech recognition.
• 1983–1993: Empiricism and Finite-State Models
• 1994–1999: Probabilistic Models Dominate
• 2000–2008: Rise of Machine Learning
Approaches to NLP
• Rule-Based Systems: relied on a set of handcrafted rules defined by linguists and domain experts.
• Techniques used:
• Regular expressions for pattern matching
• Context-free grammars for parsing
• Lexicons and morphological analyzers for word-level analysis.
• These systems are deterministic but lack scalability and flexibility.
• Example: ELIZA
• User: "I feel sad today."
ELIZA: "Why do you feel sad today?"
• User: "I think my work is stressful."
ELIZA: "Why do you say your work is stressful?"
• User: "You are not listening to me."
ELIZA: "What makes you think I am not listening to you?"
• User: "I am just frustrated."
Does Eliza Understand Language?
• ELIZA does not truly understand language the way humans or modern AI models do.
• Eliza is a rule-based program that uses simple pattern
matching and substitution techniques to simulate
understanding.
• Pattern matching
• It uses these patterns to generate pre-programmed responses
based on simple rules.
• Substitution techniques
• Eliza rephrases the user's statements by substituting words or
phrases.
• No context understanding
• Eliza does not maintain memory of the conversation or truly grasp the context of what is said.
Statistical/Probabilistic Methods
• Statistical inference is used to understand and predict
language patterns
• Techniques Employed:
• n-gram models: Probabilistic models predicting the next word
based on a fixed number of preceding words.
• Hidden Markov Models (HMMs): Used for sequence labeling
tasks like Part-of-Speech (POS) tagging.
• Latent Semantic Analysis (LSA): For understanding
relationships between terms in a large text corpus.
N-gram
• An N-gram is a contiguous sequence of N items (words, characters, or tokens) extracted from a given text.
• How do N-grams work?
• N-grams are created by sliding a window of size N over a sequence of words or characters in a text.
• Unigram: N=1 (single word or character)
• Bigram: N=2 (two consecutive words or characters)
• Trigram: N=3 (three consecutive words or characters)
• N-gram: N>3 (more consecutive items)
• Purpose
• To model language by understanding the relationship and
probabilities of sequence of items.
Cont…
• Example: “I love Natural Language Processing”
• Unigram
• [I],[love],[natural],[language],[processing]
• Bigram
• [I love],[love natural], [natural language], [language processing]
• …
• N=5: [I love natural language processing]
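A small Python sketch that builds the unigrams and bigrams shown above by sliding a window of size N over the token list:

def ngrams(tokens, n):
    """Slide a window of size n over the token list."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

tokens = "I love natural language processing".split()
print(ngrams(tokens, 1))  # unigrams: [['I'], ['love'], ...]
print(ngrams(tokens, 2))  # bigrams: [['I', 'love'], ['love', 'natural'], ...]
print(ngrams(tokens, 5))  # a single 5-gram covering the whole sentence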
Cont…
• Probabilistic model: the probability of a sequence P(w1, w2, …, wn) can be decomposed using the chain rule:
• P(w1, w2, …, wn) = P(w1) · P(w2 | w1) · P(w3 | w1, w2) ⋯ P(wn | w1, w2, …, wn−1)
• This means the probability of the entire sequence is the product of:
• the probability of the first word, P(w1), and
• the conditional probability of each subsequent word given all previous words, P(wi | w1, w2, …, wi−1).
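As a concrete illustration, a bigram model approximates each conditional term with P(wi | wi−1) estimated from counts; a minimal sketch over a toy corpus (the corpus is made up for illustration):

from collections import Counter

corpus = ["i love natural language processing",
          "i love language models"]

# Count unigrams and bigrams over the toy corpus.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_prob(prev, word):
    # Maximum-likelihood estimate: P(word | prev) = count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

# P(love | i) * P(language | love) for the partial sequence "i love language"
print(bigram_prob("i", "love") * bigram_prob("love", "language"))  # 1.0 * 0.5 = 0.5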
Machine Learning
• Involves training models on datasets to recognize patterns and
perform tasks like classification, clustering, and prediction.
• Techniques employed:
• Support Vector Machines (SVMs): Common for classification tasks.
• Decision Trees and Random Forests: For feature-based classification.
• Logistic Regression: For binary and multi-class text classification.
• Naïve Bayes: For text classification tasks like spam detection (a sketch follows this list).
• Recurrent Neural Networks (RNNs): For sequential tasks like language
modeling.
• Convolutional Neural Networks (CNNs): For text classification tasks by
capturing local patterns.
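A minimal sketch of one of these techniques, Naïve Bayes for spam detection, using scikit-learn (assumes scikit-learn is installed; the tiny dataset is invented for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled data: 1 = spam, 0 = not spam.
texts = ["win a free prize now", "cheap meds online",
         "meeting at noon tomorrow", "project report attached"]
labels = [1, 1, 0, 0]

# Bag-of-words features + Naïve Bayes classifier in one pipeline.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free prize online"]))        # likely [1] (spam)
print(model.predict(["see the attached report"]))  # likely [0] (not spam)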
Generative (Transformers) Method
• Introduced in “Attention Is All You Need” (Vaswani et al., 2017). Key components:
• Attention Mechanism
• Positional Encoding
• Feed-Forward Networks
• Encoder-Decoder Architecture
Linguistic Related Issues in NLP
• Ambiguity in Language: a single word or sentence can have multiple
interpretations depending on the context.
• Example:- The sentence “I went to the bank” could mean:
• A financial institution where money is deposited or withdrawn.
• The side of a river or stream.
• Ambiguity can be lexical, syntactic, semantic, or pragmatic.
• Solution: Word embedding (Contextual representation)
• Context-dependence:-Machines struggle to understand and retain
long-term context over extended conversations.
• Example: In a conversation: User: "Where is John?"
• System: "He’s at the library."
• User: "What is he reading?" A machine might not connect “he” in the
second question to “John” from the first.
• Solution: LSTMs (which can partially capture long-term dependencies)
• Polysemy and Homonymy
• Polysemy: A single word has multiple related meanings.
• Example: head
• Meaning 1: The uppermost part of the body
• She hurt her head.
• Meaning 2: The leader or chief of something
• He is the head of the company
• Homonymy: A single word has completely unrelated meanings (e.g.,
bat as in the animal or a sports tool).
• Word: "Gaarii" (Afaan Oromo)
• Meaning 1: Good or pleasant (adjective describing quality)
• Example: Eyyaasuun nama gaarii dha. (Eyasu is a good person)
• Meaning 2: A type of cart or carriage (noun, referring to a transport
vehicle)
• Example: Gaariin Dhufaa jira. (The cart is coming.)
• Solution: Contextual embeddings (transformer-based models)
• Morphological Variations: words can take on many forms depending on grammar.
• Example: The base word run can become:
• Running, ran, runner, runs.
• Solution: Lemmatization (better → good, running → run); a minimal sketch follows this list.
• Lemmatization can help with morphological variation in tasks such as:
• Information retrieval and search engines
• Text classification
• Sentiment analysis
• Machine translation
• Text summarization
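A small sketch of lemmatization with NLTK's WordNet lemmatizer (assumes nltk and its WordNet data are installed; POS hints are passed explicitly because the lemmatizer defaults to nouns):

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # one-time download of the WordNet data

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))  # run
print(lemmatizer.lemmatize("better", pos="a"))   # good
print(lemmatizer.lemmatize("ran", pos="v"))      # run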
• Idioms and Figurative Language: phrases that don’t mean what the individual words suggest.
• Example: “Kick the bucket” means “to die,” not literally kicking a bucket.
• Solution: Data annotation (phrase-level data labelling)
• Regional and Dialectal Variations
• Language usage varies significantly by region and dialect
• Example: "football" in Europe vs. "soccer" in the USA
• Solution: Preprocessing text for standardization
How can we normalize dialectal variation?
• Text normalization
• Dialect: "gonna" → Standard: "going to“
• Dialect: "ain't" → Standard: "is not“
• Spelling normalization
• Dialect: "colour" (British) → Standard: "color" (American).
• Synonym Matching:
• Scottish English: "wee" → Standard English: "small.“
• Dialect-Specific Fine-Tuning
• Fine-tune large pretrained models (e.g., BERT, GPT) on dialect-
specific corpora.
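A minimal sketch of dictionary-based text normalization for the mappings above (the lookup table is illustrative and far from complete):

import re

# Illustrative normalization table: dialect/informal form -> standard form.
NORMALIZE = {
    "gonna": "going to",
    "ain't": "is not",
    "colour": "color",
    "wee": "small",
}

def normalize(text):
    # Replace each known token, keeping everything else unchanged.
    tokens = re.findall(r"\w+'?\w*|[^\w\s]", text.lower())
    return " ".join(NORMALIZE.get(tok, tok) for tok in tokens)

print(normalize("The colour is gonna change"))  # the color is going to change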
Foundation of NLP

• Linguistics: Understanding the structure and rules of language, including syntax (sentence structure), semantics (meaning), phonetics (sounds), and pragmatics (contextual use).
• Mathematics: Leveraging concepts from probability,
statistics, linear algebra, and calculus for model
development.
• Computer Science: Applying algorithms, data
structures, and computational efficiency principles to
process language data.
Tasks of NLP: Preprocessing Tasks
• Tokenization: slices text into smaller pieces (like slicing a loaf of bread), typically individual words or phrases (a sketch follows this list).
• Example: "the computer is on the table" → ["the", "computer", "is", "on", "the", "table"]
• Truecasing: standardizing letter case, e.g., restoring "hello" to "Hello".
• Stemming/Lemmatization: reducing words to their root forms. Example: running, runs, ran → run
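A quick sketch of regex-based tokenization for the example above:

import re

sentence = "The computer is on the table"

# Simple regex tokenizer: words and punctuation become separate tokens.
tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
print(tokens)  # ['the', 'computer', 'is', 'on', 'the', 'table']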
Text Understanding Tasks
• Part-of-Speech Tagging (POS Tagging):- POS tagging assigns grammatical
categories (e.g., noun, verb, adjective) to words in a sentence.
• Sentence: "The bank is near the river."
• POS Tagging
• "The" → Determiner (DT)
• "bank" → Noun (NN)
• "is" → Verb (VBZ)
• "near" → Preposition (IN)
• "the" → Determiner (DT)
• "river" → Noun (NN)
• Named Entity Recognition (NER): Identifying and
classifying entities (e.g., people, locations) in text.
• Example: "Dr. Smith from Boston arrived yesterday."
• NER output:
• "Dr. Smith" → Person
• "Boston" → Location
• "yesterday" → Date
• Semantic Analysis: grasping the meaning behind
words and sentences. It’s like understanding a joke—
there’s more to it than just the words; you need context
and nuance.
Example: Figuring out that “bank” means a financial
institution in "I deposited money at the bank."
Language Generation Tasks
• Overview of tasks
• Text Generation: produces coherent, fluent, and contextually relevant text from scratch or given a partial input. Input: a seed prompt, context, or none. Output: new text that is coherent and contextually appropriate.
• Machine Translation (MT): converts text from one language to another while preserving meaning. Input: text in the source language. Output: text in the target language.
• Summarization: condenses input text into a shorter version while preserving key information. Input: a long piece of text. Output: a concise summary of the text.
Language Interaction and Reasoning
Tasks
• Chatbots and Virtual Assistants:
• A computer program designed to simulate conversation with users.
• Example: Alexa or Siri as a helpful butler.
• They answer questions, set reminders, or play songs by
interpreting your commands and responding appropriately.
• Question Answering:
• Imagine asking a librarian a question and getting a direct
answer instead of a pile of books.
• NLP-powered systems try to do just that.
• Information Retrieval and Extraction
• Information Retrieval (IR): locating relevant documents or data from a large repository based on a query.
• Information Extraction (IE): goes deeper into the content of the retrieved data, extracting structured information (like names, dates, or relationships) from unstructured text.
• Sentiment Analysis
• This task is about figuring out the emotions behind text.
• Imagine reading a restaurant review and deciding if the reviewer was happy, angry, or neutral.
• NLP tools do this automatically for tasks like understanding customer feedback.
Speech and Text Conversion
• Speech-to-Text (STT)
• converts spoken language into written text
• Audio input -> Audio Processing -> Phoneme detection-
>Language Model -> text output.
• Text-to-Speech (TTS):
• converts written text into spoken language
• Text input -> Text processing -> Phoneme Generation -> Voice Synthesis -> Audio output.
• Imagine a narrator reading out a book aloud in a human-like
voice
Discipline
• Linguistics: provides theoretical insights into language structure and usage.
• Computer Science: provides the computational tools and algorithms to implement NLP systems.
• Algorithms and Data Structures: search, graphs (relationships between words), and trees (syntax parsing).
• Mathematics and statistics:
• Probability and Statistical Models:
• Hidden Markov Models (HMMs) for sequence prediction.
• Bayesian Networks for decision-making.
• Linear Algebra
• Word embeddings (e.g., Word2Vec, GloVe) represented as vectors in high-
dimensional spaces.
• Optimization
• Gradient descent
• Cognitive Science: Cognitive science examines how humans
process and use language, informing NLP systems about cognitive
behaviors.
• Psychology: Psychology provides insights into human emotion,
perception, and interaction with language.
• Emotional Understanding: How humans express and detect
emotions.
• Interpretation of Intent: Understanding underlying motivations
in text.
• Human Computer Interaction (HCI): HCI focuses on creating
user-friendly interfaces that enhance interaction between humans and
machines.
• Speech and audio processing: Speech and audio processing
focus on the interaction between spoken language and machines.
• Signal Processing: Extracting meaningful patterns from audio signals.
Supervised ML Algorithms
• Naïve Bayes: Probabilistic model based on Bayes'
theorem, assuming feature independence.

• Applications:
• Sentiment analysis (e.g., classifying reviews as
positive/negative)
• Spam filtering.
• Logistic Regression: Predicts probabilities for
binary/multi-class text classification.
• Text classification (e.g., topic labeling).
• Sentiment analysis.
• Support Vector Machines: Finds a hyperplane that best separates classes; works well with high-dimensional text data (a sketch follows this list).
• Named Entity Recognition (NER).
• Document classification.
• Random Forests: Ensemble of decision trees, reducing
overfitting and improving accuracy.
• Text classification.
• Sentiment analysis.
• k-Nearest Neighbors (k-NN): Classifies text by finding the
majority label among the k most similar instances in the
dataset.
• Document classification.
• Similarity-based search.
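A minimal sketch of an SVM for document classification over TF-IDF features (assumes scikit-learn; the two-topic dataset is invented for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy two-topic corpus: sports vs. politics.
texts = ["the team won the football match", "a great goal in the final game",
         "parliament passed the new bill", "the election results were announced"]
labels = ["sports", "sports", "politics", "politics"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["the final match result"]))  # likely ['sports']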
Unsupervised ML Algorithms
• k-Means Clustering: groups text into clusters based on similarity in vector space (a sketch follows this list).
• Document clustering (e.g., grouping news articles by topics).
• Hierarchical Clustering: Builds a hierarchy of clusters,
useful for visualizing text groupings (e.g., dendrograms).
• Document clustering.
• Grouping similar sentences.
• Word2Vec (Skip-Gram): learns vector representations of words by predicting their context, or vice versa.
• Learning word embeddings.
• Semantic analysis.
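A small sketch of k-means document clustering over TF-IDF vectors (assumes scikit-learn; the documents are illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["stock markets fell sharply today",
        "investors worry about inflation",
        "the team scored in the last minute",
        "the coach praised the striker"]

X = TfidfVectorizer().fit_transform(docs)  # documents as vectors
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # e.g. [0 0 1 1]: finance docs vs. sports docs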
Reinforcement Learning (RL)
• Q-Learning: RL algorithm where an agent learns optimal actions by interacting with an environment (the update rule is sketched after this list).
• Dialogue systems (e.g., learning optimal responses).
• Policy Gradient Methods:- Trains a model to maximize
rewards for generating high-quality responses or text
sequences.
• Text generation (e.g., optimizing fluency and coherence).
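The core of Q-learning is a single update rule; a minimal sketch with made-up dialogue states, actions, and reward just to show the update:

from collections import defaultdict

alpha, gamma = 0.1, 0.9   # learning rate and discount factor
Q = defaultdict(float)    # Q[(state, action)] -> estimated value
actions = ["greet", "clarify", "answer"]

def update(state, action, reward, next_state):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Hypothetical dialogue turn: answering a question earned a positive reward.
update(state="user_asked_question", action="answer", reward=1.0, next_state="user_satisfied")
print(Q[("user_asked_question", "answer")])  # 0.1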
Deep Learning / Generative
Learning
• Deep Learning (DL) has revolutionized Natural Language Processing (NLP) by enabling machines to understand, analyze, and generate human language.
• Deep learning models can grasp the nuances of language, like idioms, context, and tone, by learning patterns directly from the data.
• Benefits:
• Automatic feature extraction
• Scalability- DL models improve as they are fed more data.
• Understanding context-They capture word meanings based on context (e.g.,
"bank" in a financial or river setting)
Deep Learning Architectures
• Recurrent Neural Network (RNN):- Capture temporal
dependencies in sequences
• Struggle with long sequences due to "vanishing gradients”.
• Use case: text generation
• Vanishing gradients?
• a phenomenon that occurs during the training of deep neural
networks, where the gradients of the loss function with respect to the weights
become very small (approach zero).
• This makes it difficult for the model to update its weights effectively, slowing
down or even halting learning in the earlier layers of the network.
• Long Short-Term Memory Networks(LSTM):-A special
type of RNN with memory gates
• Useful for tasks like language modeling or text generation
• Gated Recurrent Units (GRUs): a simplified version of LSTMs with fewer gates, making them computationally more efficient, though they can still be affected by vanishing gradients.
• Use case: Predicting stock prices based on historical
data.
Does LSTM Solve the Vanishing Gradient Problem?
• LSTMs Address Vanishing Gradients
• Designed to handle long-term dependencies in sequential
data.
• Use memory cells and gates (input, forget, output) to regulate
gradient flow.
• Gradient Flow Mechanism:
• Cell state allows gradients to flow backward without
significant diminishment.
• Not Completely Immune:
• LSTM can still face exploding gradients.
Encoder-Decoder Architecture
• A general framework used for sequence-to-sequence (seq2seq) tasks.
• widely used in Recurrent Neural Networks (RNNs),
LSTMs, and GRUs for tasks like machine translation
• Encoder: processes the input sequence and encodes it
into a fixed-length vector.
• capture the important features of the input sequence.
• Decoder: takes encoded vector (or vectors) and generates
the output sequence.
• Encoder: RNN/LSTM/GRU
• Decoder: RNN/LSTM/GRU
• Transformers: the backbone of modern NLP. Unlike RNNs, they process all words simultaneously.
• Key concept: Self-attention (sketched after this list).
• Each word looks at all other words to decide which ones are
most important.
• Transformer is based on an encoder-decoder structure.
• No recurrent layers: parallel processing
• Positional Encoding
• Use case: Machine translation, text summarization,
question answering and more.
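A minimal NumPy sketch of scaled dot-product self-attention, the key operation described above (toy dimensions; real transformers add learned query/key/value projections, multiple heads, and positional encodings):

import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of vectors X (seq_len x d)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # each word scores every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ X                              # weighted mix of all word vectors

X = np.random.rand(5, 8)        # 5 "words", each an 8-dimensional vector
print(self_attention(X).shape)  # (5, 8): one context-aware vector per word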
Transformer based Pre-trained
models
• BERT (Bidirectional Encoder Representations from Transformers): uses masked language modelling.
• Text classification
• Question answering
• Text summarization
• Translation.
• GPT (Generative Pre-trained Transformer)
• Text Generation
• Text completion
• T5 (Text-to-Text Transfer Transformer)
• Translation
What is Masked LM?
• a type of language model used in natural language processing
(NLP) tasks, particularly for pre-training deep learning models like
BERT.
• In a Masked LM, some percentage of the input tokens (words) are
randomly replaced with a special token (usually denoted as
[MASK]).
• The model is trained to predict the original token that was masked.
• Example: Input sentence
• "The quick brown fox jumps over the lazy dog."
• A portion of this might be masked, such as:
• "The quick brown [MASK] jumps over the lazy dog.“
• The model's task is to predict that the masked token is "fox" based
on the surrounding context.
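A sketch of masked-token prediction with a pretrained BERT via the Hugging Face transformers library (assumes transformers is installed and the model can be downloaded; exact scores vary):

from transformers import pipeline

# fill-mask pipeline predicts the token hidden behind [MASK].
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("The quick brown [MASK] jumps over the lazy dog."):
    print(pred["token_str"], round(pred["score"], 3))
# "fox" is expected to rank among the top predictions.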
Activation Functions
• Activation functions introduce non-linearity, enabling the network to
learn complex patterns.
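A few common activation functions as a quick NumPy sketch:

import numpy as np

def relu(x):    return np.maximum(0, x)      # zeroes out negative values
def sigmoid(x): return 1 / (1 + np.exp(-x))  # squashes values into (0, 1)
def tanh(x):    return np.tanh(x)            # squashes values into (-1, 1)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), tanh(x))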
Normalization
• Normalization makes training more stable and efficient
by adjusting inputs or intermediate values to have
specific properties.
• Batch Normalization (BatchNorm):
• Normalizes the input to each layer using the mean and
variance of the mini-batch.
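A minimal NumPy sketch of the batch-norm computation (per-feature mean and variance over the mini-batch; the learnable scale and shift parameters are omitted for brevity):

import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature using the mini-batch mean and variance."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
print(batch_norm(batch))  # each column now has roughly zero mean and unit variance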
Optimization
• Optimization is the process of tweaking the network's
parameters (weights, biases) to minimize the loss
function.
• The loss function measures how far the model's predictions are from the true values.
• Optimizers
• Gradient descent: SGD, mini-batch SGD, SGD with momentum (a one-parameter sketch follows this list).
• RMSProp
• Adam
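The basic gradient-descent update, sketched for a single parameter of a squared-error loss:

# Minimize loss(w) = (w - 3)**2 with plain gradient descent.
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2 * (w - 3)   # derivative of the loss with respect to w
    w -= lr * grad       # parameter update: step against the gradient
print(round(w, 4))       # close to 3.0, the minimizer of the loss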
Hyper-parameters
• Hyper-parameters are the settings you configure before
training the model.
• They are typically tuned via grid search or random search (see the sketch after this list).
• Examples:
• Batch size
• Learning rate
• Dropout rate
• Number of layers
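A tiny sketch of grid search over two hyper-parameters; the evaluation function here is a made-up stand-in for training and validating a real model:

from itertools import product

learning_rates = [0.1, 0.01, 0.001]
batch_sizes = [16, 32, 64]

def validate(lr, batch_size):
    # Stand-in for "train the model and return validation accuracy".
    return 1 - abs(lr - 0.01) - abs(batch_size - 32) / 1000

best = max(product(learning_rates, batch_sizes), key=lambda cfg: validate(*cfg))
print(best)  # (0.01, 32) under this made-up scoring function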
Super Parameters
• "Super-parameters" is an informal term sometimes used to refer to parameters that control the search or optimization of hyper-parameters.
• Super-parameters might govern how hyper-parameters are chosen or explored in a given model search.
References
1. Jurafsky, Daniel, and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
2. Manning, Christopher D., and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
3. [Online: Jan 11, 2023]: Natural Language Processing (NLP) [A Complete Guide]
