
UNIT-2

Word Window Classification

• Word Window Classification is a technique used in natural language processing (NLP) to classify words based on the context provided by surrounding words, known as a "window".
• This approach is particularly useful in tasks like Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and other sequence labelling tasks.

How Word Window Classification Works?
• Word Window: A word window is a fixed-size context around a target word. For instance, if the window size is 3, the window will include the target word, one word to its left, and one word to its right.
• Feature Extraction: The features for the target word are extracted from this window. These features can include the words themselves, their embeddings, POS tags, or any other relevant linguistic features.
• Model Training: A machine learning model (e.g., logistic regression, SVM, or a neural network) is trained on these features to classify the target word.
• Sliding Window: The window slides over the text, classifying each word based on its surrounding context.

Example
• Consider the sentence: "The quick brown fox jumps over the lazy dog."
• If the target word is "fox" and the window size is 3, the window will look like this:
• Previous word: "brown"
• Target word: "fox"
• Next word: "jumps"
• Features for "fox" could include the embeddings of "brown", "fox", and "jumps".

Example to illustrate how word window classification works:
• Task: Part-of-Speech Tagging
Example Sentence:
• The cat sat on the mat.
Goal:
• Assign each word in the sentence its correct part of speech (POS) tag.
Word Window:
• We'll use a word window of size 3 (1 word to the left, the target word, and 1 word to the right).
• For the sentence "The cat sat on the mat":
• Target Word: "cat"
– Word Window: [The, cat, sat]
– POS Tags: [DT (determiner), NN (Noun), VB (Verb)]
– Classification: NN (Noun)
• Target Word: "sat"
– Word Window: [cat, sat, on]
– POS Tags: [NN, VB, IN]
– Classification: VB (Verb)
• Target Word: "on"
– Word Window: [sat, on, the]
– POS Tags: [VB, IN, DT]
– Classification: IN (Preposition)

Applications
• Named Entity Recognition (NER): Classifying words into categories like person, location, organization, etc.
• Part-of-Speech (POS) Tagging: Assigning parts of speech to each word in a sentence.
• Chunking: Dividing a text into syntactically correlated parts like noun or verb phrases.

Benefits
• Context-Aware: Takes into account the surrounding context, leading to better classification performance.
• Simplicity: Relatively simple to implement and understand.
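To make the pipeline above concrete, the sketch below extracts fixed-size windows and trains a logistic-regression tagger on them. It is a minimal illustration, not a full POS/NER system: the toy training sentence, the window size of 3, and the feature names (w-1, w0, w+1) are assumptions chosen for the example, and scikit-learn is assumed to be available.

```python
# Minimal word-window POS tagger sketch (assumes scikit-learn is installed).
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def windows(tokens, pad="<PAD>"):
    """Yield size-3 windows (previous word, target word, next word) as feature dicts."""
    padded = [pad] + tokens + [pad]
    for i in range(1, len(padded) - 1):
        yield {"w-1": padded[i - 1], "w0": padded[i], "w+1": padded[i + 1]}

# Toy training data: one POS-tagged sentence (for illustration only).
tokens = ["The", "cat", "sat", "on", "the", "mat"]
tags   = ["DT",  "NN",  "VB",  "IN", "DT",  "NN"]

vec = DictVectorizer()                             # one-hot encodes the window features
X = vec.fit_transform(list(windows(tokens)))       # one feature row per target word
clf = LogisticRegression(max_iter=1000).fit(X, tags)

# Slide the same window over a new sentence and classify each word from its context.
test = ["A", "dog", "sat", "on", "a", "rug"]
print(list(zip(test, clf.predict(vec.transform(list(windows(test)))))))
```

With more training sentences and richer features (embeddings, POS tags of neighbours), the same sliding-window setup scales to real sequence-labelling tasks.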
Neural Networks for text
Neural networks are like smart algorithms that learn patterns from data. When it comes to text, here's a simple explanation:
Basic Concept:
• Neurons (Nodes): Think of neurons as tiny decision-makers. Each neuron takes some input (like a word or a number) and decides if it should pass that input along to the next layer of neurons based on how "important" it thinks the input is.
Layers:
– Input Layer: This is where the text data first enters the network. If we're working with text, each word or character can be turned into numbers (using something called embeddings or one-hot encoding) and fed into the input layer.
– Hidden Layers: These are layers between the input and output. They do the heavy lifting by learning complex patterns in the data. Each layer passes its output to the next one.
– Output Layer: This gives the final prediction, like identifying whether a text is positive or negative in sentiment analysis.
Learning:
• The network learns by adjusting the importance (weights) of the connections between neurons. It does this over many cycles (called epochs) using examples of text and the correct answers.
• It tries to minimize errors using a method called backpropagation, where it checks how far off its prediction was and adjusts the weights to do better next time.
Activation Functions:
• These are like filters that decide if a neuron should activate (send a signal forward). They add non-linearity, which helps the network learn complex patterns.
Applying to Text
• Text as Input: Text is turned into numbers (embeddings) that the network can understand.
• Pattern Recognition: The neural network learns patterns like which words usually appear together, sentence structures, or even the sentiment behind phrases.
• Prediction: After learning from lots of examples, it can predict things like the sentiment of a sentence, classify topics, or generate new text.
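As a rough sketch of these ideas (input layer, one hidden layer, output layer, non-linear activation), the following NumPy snippet runs a forward pass of a tiny sentiment-style classifier over a bag-of-words vector. The vocabulary, the randomly initialised weights, and the 4-unit hidden layer are assumptions for illustration; a real model would learn the weights with backpropagation over many epochs.

```python
import numpy as np

# Tiny assumed vocabulary; a sentence becomes a bag-of-words count vector (the input layer).
vocab = ["good", "bad", "movie", "great", "boring"]
def encode(sentence):
    words = sentence.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(4)   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights

def forward(x):
    h = np.maximum(0, x @ W1 + b1)              # hidden layer with ReLU activation (non-linearity)
    return 1 / (1 + np.exp(-(h @ W2 + b2)))     # output layer: probability of "positive"

print(forward(encode("great movie")))           # untrained weights, so the output is essentially random
```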

Embeddings
• Embeddings are a way to represent words or phrases as numerical vectors, making them understandable for machine learning models, especially neural networks.
Why Use Embeddings?
• Computers Understand Numbers: Text data needs to be converted into numbers because computers work with numbers, not words.
• Capture Meaning: Simple methods like assigning a unique number to each word don't capture the meaning or relationships between words. Embeddings solve this by encoding semantic relationships between words.
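A small sketch of what "encoding semantic relationships" means in practice: words map to vectors, and related words end up with similar vectors, which can be measured with cosine similarity. The 3-dimensional vectors below are made up for illustration; real embeddings (e.g., word2vec or GloVe) are learned from large corpora and have hundreds of dimensions.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings (real ones are learned and much larger).
emb = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means similar direction, near 0.0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["cat"], emb["dog"]))  # high: semantically related words
print(cosine(emb["cat"], emb["car"]))  # low: unrelated words
```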
N-gram Language Models
• N-gram language models are a type of statistical model used in natural language processing (NLP) to predict the probability of a sequence of words in a sentence. They are called "N-gram" models because they consider sequences of "N" words at a time.

Key Concepts:
• N-gram:
– An N-gram is a contiguous sequence of "N" items (usually words) from a given text.
– For example:
• Unigram (1-gram): A single word (e.g., "The")
• Bigram (2-gram): A sequence of two words (e.g., "The cat")
• Trigram (3-gram): A sequence of three words (e.g., "The cat sits")
• And so on...
• Language Model:
– A language model assigns probabilities to sequences of words.
– For an N-gram model, the probability of a word depends on the previous N-1 words.
– For example, in a trigram model, the probability of a word depends on the two preceding words.

How It Works?
• The model is trained on a large corpus of text, counting how often different N-grams occur.
• It uses these counts to estimate the probability of a word following a given sequence of N-1 words.
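A minimal sketch of this count-and-estimate procedure for a bigram model (N = 2), using plain Python. The two training sentences are made up for illustration; real models are trained on much larger corpora and usually add smoothing for unseen N-grams.

```python
from collections import Counter

# Toy training corpus (assumed for illustration).
corpus = ["the cat sat on the mat", "the dog sat on the rug"]

bigram_counts = Counter()
unigram_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split()          # <s> marks the start of a sentence
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[(prev, word)] += 1
        unigram_counts[prev] += 1

def p(word, prev):
    """Estimate P(word | prev) = count(prev, word) / count(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(p("sat", "cat"))   # 1.0  -> "cat" is always followed by "sat" in this tiny corpus
print(p("cat", "the"))   # 0.25 -> "the" is followed by "cat" in 1 of its 4 occurrences
```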
Applications
• Text Prediction: N-gram models can predict the next word or sequence of words in a sentence (e.g., in autocomplete).
• Speech Recognition: They help in determining the most probable words spoken in a sequence.
• Spelling Correction: They can suggest corrections by considering the most likely word sequences.
Limitations:
• Limited Context: Higher-order N-gram models (e.g., trigram, 4-gram) capture more context but require much more data and computational power.
• Data Sparsity: Rare N-grams might not appear often enough in the training data, leading to poor probability estimates for some word sequences.
• Overfitting: High-order N-gram models might fit the training data too closely and not generalize well to new text.

Perplexity
• Perplexity is a measurement used in natural language processing to evaluate the quality of a language model.
• It essentially tells us how well a probability model predicts a sample of text.
• Lower perplexity indicates a better model because it suggests the model is better at predicting the text.
What is Perplexity?
• Understanding Perplexity:
– Perplexity is the exponentiation of the average negative log-likelihood of a test set, which can be interpreted as the average branching factor of a language model.
– In simpler terms, it tells us how "surprised" the model is by the text. If a model is well-trained and predicts the text well, it will have low perplexity (low surprise). If the model is poorly trained, it will have high perplexity (high surprise).
Perplexity and Language Models:
• N-gram Models: Perplexity is often used to evaluate N-gram models. A trigram model, for example, will have lower perplexity than a bigram model if it better captures the text's patterns.
• Neural Language Models: Modern neural language models (e.g., RNNs, Transformers) often achieve much lower perplexity than traditional N-gram models, indicating they are better at predicting sequences of words.
Interpreting Perplexity:
• A lower perplexity score indicates a better model. For example, if one model has a perplexity of 50 and another has 100, the first model is considered to be better at predicting the text.
• However, perplexity is relative; it should be compared within the same dataset and task.

Example
• Suppose a language model predicts the following sequence: "The cat sat on the mat."
• If the model predicts each word with high probability, the perplexity will be low, suggesting the model understands the text well.
• If the model predicts each word with low probability, the perplexity will be high, suggesting the model is less effective.
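Since perplexity is the exponentiation of the average negative log-likelihood, it can be computed directly from the per-word probabilities a model assigns. The probabilities below are made-up values for the example sentence, just to show the arithmetic; in practice they would come from an actual language model.

```python
import math

# Hypothetical per-word probabilities assigned by a model to "The cat sat on the mat."
word_probs = [0.2, 0.1, 0.3, 0.25, 0.2, 0.15]

# Perplexity = exp( -(1/N) * sum(log p(w_i)) )
avg_neg_log_likelihood = -sum(math.log(p) for p in word_probs) / len(word_probs)
print(math.exp(avg_neg_log_likelihood))   # ~5.3: on average the model is "choosing" among ~5 words

# If the model were more confident (higher probabilities), perplexity would drop:
confident = [0.8, 0.7, 0.9, 0.85, 0.8, 0.75]
print(math.exp(-sum(math.log(p) for p in confident) / len(confident)))  # ~1.25
```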
Hidden Markov Models
• A Hidden Markov Model (HMM) is a statistical model used to represent systems that are governed by a Markov process with hidden states.
• HMMs are widely used in areas such as speech recognition, natural language processing, and bioinformatics.
States:
• Hidden States: The actual states of the system are not directly observable. Instead, they are inferred based on observable outputs.
• Observable States: These are the outputs or observations that can be directly seen or measured.
• Markov Property: The Markov property assumes that the probability of transitioning to the next state depends only on the current state, not on the sequence of previous states. This is known as the first-order Markov property.
• Transition Probabilities: These define the likelihood of moving from one hidden state to another. They are usually represented in a matrix called the transition matrix.
• Emission Probabilities: These define the likelihood of observing a particular output given a specific hidden state. These are represented in an emission matrix.
• Initial State Probabilities: These are the probabilities of the system starting in each possible hidden state.
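To make these components concrete, here is a small sketch that writes down an HMM (initial, transition, and emission probabilities) and uses the standard forward algorithm to score an observation sequence. The weather-style states, the observations, and all the probability values are assumptions invented for the example.

```python
import numpy as np

# Hidden states and observable outputs (hypothetical example).
states = ["Rainy", "Sunny"]
observations = ["walk", "shop", "clean"]

initial = np.array([0.6, 0.4])               # initial state probabilities
transition = np.array([[0.7, 0.3],           # P(next hidden state | current hidden state)
                       [0.4, 0.6]])
emission = np.array([[0.1, 0.4, 0.5],        # P(observation | hidden state)
                     [0.6, 0.3, 0.1]])

def forward(obs_indices):
    """Forward algorithm: probability of the observation sequence under the HMM."""
    alpha = initial * emission[:, obs_indices[0]]
    for o in obs_indices[1:]:
        alpha = (alpha @ transition) * emission[:, o]
    return alpha.sum()

# P(observing "walk", then "shop", then "clean")
seq = [observations.index(o) for o in ["walk", "shop", "clean"]]
print(forward(seq))
```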
Recurrent Neural Network

RNN (Recurrent Neural Network): Recurrent Neural Networks (RNNs) are a class of neural networks specifically designed to handle sequential data, making them particularly well-suited for tasks in Natural Language Processing (NLP). Unlike traditional feedforward neural networks, RNNs have loops that allow information to be passed from one step of the network to the next, enabling the network to maintain a memory of previous inputs.

Key Features of RNNs:
• Sequential Data Handling: RNNs are designed to process sequences of data, such as sentences or time series, where the order of the elements is important.
• Hidden State: RNNs maintain a hidden state that is updated at each time step, capturing information about previous elements in the sequence. This hidden state helps the network retain context and understand dependencies between words in a sentence.
• Shared Weights: The same set of weights is applied at each time step, allowing RNNs to generalize across different positions in the input sequence.

RNNs are widely used in various NLP tasks, including:
• Language Modelling: Predicting the next word in a sequence based on the previous words.
• Text Generation: Generating coherent text by predicting sequences of words, character by character or word by word.
• Machine Translation: Translating text from one language to another by processing the input sequence and generating the corresponding sequence in the target language.
• Speech Recognition: Converting spoken language into text by processing the sequence of audio features.
• Sentiment Analysis: Analysing sequences of text to determine the sentiment (positive, negative, neutral) expressed in the text.
Architecture of an RNN:
• Input Layer: Accepts sequential input data (e.g., words in a sentence, stock prices).
• Hidden Layer: Contains a recurrent connection that allows the network to remember past states.
• Output Layer: Generates the output at each time step or after processing the entire sequence.
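The sketch below shows the recurrence behind this architecture with plain NumPy: at every time step the same weight matrices (shared weights) combine the current input with the previous hidden state to produce a new hidden state and an output. The dimensions, random weights, and random input sequence are all assumptions for illustration; real RNNs learn these weights by backpropagation through time.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 8, 3

# Shared weights, reused at every time step.
W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))   # input  -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the recurrent loop)
W_hy = rng.normal(scale=0.1, size=(hidden_size, output_size))  # hidden -> output

def rnn_forward(sequence):
    h = np.zeros(hidden_size)                     # initial hidden state (the "memory")
    outputs = []
    for x in sequence:                            # one step per element of the sequence
        h = np.tanh(x @ W_xh + h @ W_hh)          # update hidden state from current input + previous state
        outputs.append(h @ W_hy)                  # output at this time step
    return outputs, h

sequence = rng.normal(size=(5, input_size))       # e.g., 5 word vectors
outputs, final_state = rnn_forward(sequence)
print(len(outputs), final_state.shape)            # 5 outputs, hidden state of size 8
```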
Challenges with RNNs:
• Vanishing Gradient Problem: Gradients diminish over long sequences, making it difficult for the network to learn long-term dependencies.
• Exploding Gradient Problem: Gradients can grow excessively large during backpropagation, causing instability.
• Limited Memory: Difficulty in handling very long sequences due to reliance on hidden states.

Vanishing Gradients and Exploding Gradients

Vanishing Gradients and Exploding Gradients are two common problems encountered during the training of deep neural networks, particularly in Recurrent Neural Networks (RNNs) and other deep architectures. These issues arise during the backpropagation process, which is used to update the model's weights by calculating gradients.

Vanishing Gradients
• The Vanishing Gradient problem occurs when the gradients of the loss function with respect to the model's parameters become very small as they are propagated backward through the network. This leads to very small updates to the model's weights, effectively stalling learning, particularly in the early layers of the network. This problem is especially prevalent in deep networks or in RNNs when trying to capture long-term dependencies.
• In RNNs: When processing long sequences, the contributions from earlier inputs diminish exponentially, making it difficult for the network to learn relationships between distant inputs in the sequence.
• Impact: The model struggles to learn and represent long-range dependencies in the data, leading to poor performance on tasks that require understanding of context over long sequences (e.g., in NLP tasks like language modelling or translation).

Exploding Gradients
• The Exploding Gradient problem occurs when the gradients become excessively large during backpropagation. This can cause the model's weights to grow exponentially, leading to numerical instability. The model may diverge during training, making it impossible to learn anything meaningful.
• In RNNs: This typically happens when there are large weight values or when trying to model very complex sequences, where the error gradients multiply and grow rapidly as they propagate backward through time.
• Impact: The training process becomes unstable, with the model's loss function often resulting in "NaN" (Not a Number) values, and the model's performance deteriorates.
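The notes above do not cover remedies, but one common mitigation for exploding gradients is worth sketching: gradient clipping, which rescales the gradient vector whenever its norm exceeds a threshold before the weight update. The NumPy version below is a simplified illustration with an assumed threshold of 5.0; deep-learning frameworks provide equivalents (for example, PyTorch's torch.nn.utils.clip_grad_norm_).

```python
import numpy as np

def clip_by_global_norm(gradients, max_norm=5.0):
    """Rescale a list of gradient arrays so their overall norm never exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in gradients))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        gradients = [g * scale for g in gradients]
    return gradients

# An artificially "exploded" gradient gets scaled back down before the weight update.
grads = [np.array([300.0, -400.0])]           # norm = 500
clipped = clip_by_global_norm(grads)
print(np.linalg.norm(clipped[0]))             # 5.0
```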
