
Natural Language Processing

Natural Language Processing (NLP) refers to systems that can understand human language. NLP is concerned with the interactions between computers and human languages, specifically extracting meaningful information from natural language inputs and producing natural language outputs. The goal of NLP is to process text data like documents to perform tasks such as translation, grammar checking, topic classification, and determining document similarities.

Uploaded by

KISHAN MALAVIYA
Copyright © All Rights Reserved

Natural Language Processing (NLP)
What is NLP?

• Artificial Intelligence (AI): the broad discipline of creating intelligent machines (machines that
work like humans).
• Machine Learning (ML): the science of getting computers to act without being explicitly
programmed. It refers to systems that can learn from previous experience (e.g. medical diagnosis,
image recognition, speech recognition, online fraud detection).
• Natural Language Processing (NLP): refers to systems that can understand human language.
What is NLP?
• NLP is Natural Language Processing.
• Natural Languages are those spoken by people.
• Natural Language Processing (NLP) is a field of computer science and artificial intelligence
concerned with the interactions between computers and human (natural) languages.
• Specifically, the process of a computer extracting meaningful information from natural language
input and/or producing natural language output.
• Its goal is to process text data (unstructured data) to perform tasks like translation, grammar
checking, topic classification, document similarity, etc.
• Example: Google Assistant, Siri, Alexa
• NLP encompasses anything a computer needs to understand natural language (typed or spoken)
and to generate natural language.
• A language is a system consisting of a set of symbols and a set of rules (a grammar).
• The symbols are combined to convey new information.
• The rules govern the manipulation of symbols.
Natural Language Processing (NLP)
• NLP encompasses anything a computer needs to understand natural language
(typed or spoken) and to generate natural language.
Natural Language Understanding (NLU):
The NLU task is understanding and reasoning when the input is natural language.
Here we ignore the issues of natural language generation.
Natural Language Generation (NLG):
NLG is a subfield of natural language processing (NLP).
NLG is also referred to as text generation.
Advantages of NLP
• NLP helps users to ask questions about any subject and get a direct
response within seconds.
• NLP offers exact answers to the question, meaning it does not return
unnecessary and unwanted information.
• NLP helps computers to communicate with humans in their languages.
• Many companies use NLP to improve the efficiency of
documentation processes.
Kinds of Natural Language
• The input or output of a natural language processing system can be:
• written text, or
• speech.
• We will mainly focus on written text (not necessarily speech).
• To be able to process written text, we need:
• Lexical
• Syntactic
• Semantic information
• Discourse details
• Real world knowledge
Formal Language
• Before defining formal language, we need to define symbols, alphabets, strings and words.
• Symbol: Symbol is a character, an abstract entity that has no meaning by itself. e.g.,
Letters, digits and special characters
• Alphabet: An alphabet is a finite set of symbols.
An alphabet is often denoted by Σ (sigma)
e.g., B = {0,1} says B is an alphabet of two symbols, 0 and 1.
C = {a,b,c} says C is an alphabet of three symbols a, b and c.
• String or word: A string (or word) is a finite sequence of symbols from an alphabet.
e.g. 01110 and 111 are strings from the alphabet B above.
aaabccc and b are strings from the alphabet C above.
• Language: A language is a set of strings from an alphabet.
• Formal Language (or simply language) is a set L of strings over some finite alphabet Σ .
Formal language is described using formal grammars.
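As an illustration of these definitions, the Python sketch below uses the alphabet B = {0,1} from above. The language L (binary strings with an even number of 1s) and the helper names are hypothetical choices for this example; because L is infinite, the code only lists its members up to length 3.

```python
from itertools import product

# Alphabet B = {0, 1} from the example above.
B = ["0", "1"]


def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len (a finite slice of Sigma*)."""
    result = []
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            result.append("".join(symbols))
    return result


def in_language(s):
    """Membership test for the hypothetical language L: strings over B with an even number of 1s."""
    return all(ch in B for ch in s) and s.count("1") % 2 == 0


members = [s for s in strings_up_to(B, 3) if in_language(s)]
print(members)  # ['', '0', '00', '11', '000', '011', '101', '110']
print(in_language("01110"), in_language("111"))  # False False
```

The empty string is included because a string of length 0 contains zero 1s, which is an even number.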
Information Extraction
• Extracting meaning from an email:
We have decided to meet tomorrow at 10:00am in the lab.
To do : meeting
Time : 10:00am
Venue : Lab
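A minimal rule-based sketch of this extraction, assuming the message follows the pattern in the example. The regular expressions and the `extract_meeting` name are illustrative only; practical information-extraction systems use far more robust methods, such as trained named-entity recognizers.

```python
import re


def extract_meeting(text):
    """Pull a to-do / time / venue record out of a short meeting message (illustrative rules)."""
    record = {}
    if re.search(r"\bmeet(?:ing)?\b", text, re.IGNORECASE):
        record["To do"] = "meeting"
    time_match = re.search(r"\b\d{1,2}:\d{2}\s?(?:am|pm)\b", text, re.IGNORECASE)
    if time_match:
        record["Time"] = time_match.group(0)
    venue_match = re.search(r"\bin the (\w+)", text, re.IGNORECASE)
    if venue_match:
        record["Venue"] = venue_match.group(1).capitalize()
    return record


email = "We have decided to meet tomorrow at 10:00am in the lab."
print(extract_meeting(email))  # {'To do': 'meeting', 'Time': '10:00am', 'Venue': 'Lab'}
```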
Applications of NLP
• Language Translation :
Translate a sentence from one natural language to another.
• Text Summarization :
Extract keywords from a large piece of text.
Creating an abstract of an entire article.
• Context Analysis :
Social networking sites can ‘fairly’ understand the topic of discussion
“4 of your friends posted about Indian Institute of Technology, Kanpur”.
• Sentiment Analysis :
Help companies analyze a large number of reviews on a product.
Help customers process the reviews provided on a product.
• Information Retrieval :
Selecting from a set of documents the ones that are relevant to a query
• Text Categorization :
Sorting text into fixed topic categories
• Extracting data from text :
Converting unstructured text into structured data
Applications of NLP
• Database Access
• Spoken language control systems
• Spelling and grammar checkers and correction
• Question Answering
• Spam Detection
• Speech Recognition (Voice to Text)
• Information Extraction
• Chatbots
Tasks in NLP / How to build an NLP pipeline
• Tokenization / Segmentation
• Disambiguation
• Stemming
• Part of Speech (POS) tagging
• Sentiment Analysis
• Text Summarization
Segmentation
• Segmenting text into words
• E.g. “The meeting has been scheduled for this Saturday.”
• “He has agreed to co-operate with me.”
• “Indian Airlines introduces another flight on the New Delhi-Mumbai
route.”
• “We are leaving for the U.S.A. on 26th May.”
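The sketch below shows why splitting on whitespace alone is not enough for these sentences. It uses a hypothetical regular expression that keeps abbreviations such as "U.S.A." and hyphenated words such as "co-operate" together; it is an illustration, not a complete tokenizer.

```python
import re

# Alternatives are tried left to right at each position:
#   1. abbreviations written with periods, e.g. "U.S.A."
#   2. words or numbers, optionally joined by hyphens, e.g. "co-operate", "26th"
#   3. any other single non-space character (punctuation)
TOKEN_PATTERN = re.compile(r"(?:[A-Za-z]\.){2,}|\w+(?:-\w+)*|[^\w\s]")


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)


for s in [
    "He has agreed to co-operate with me.",
    "Indian Airlines introduces another flight on the New Delhi-Mumbai route.",
    "We are leaving for the U.S.A. on 26th May.",
]:
    print(tokenize(s))
```

Note that "Delhi-Mumbai" comes out as a single token even though the hyphen there means "to", and an abbreviation at the very end of a sentence would absorb the final period; handling such cases needs more context than one regular expression.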
Word Segmentation
• Breaking a string of characters into a sequence of words.
• Even in English, characters other than whitespace can be used to
separate words [e.g. , ; . - : ()]
• Examples from English URLs:
• jumptheshark.com => jump the shark .com
• myspace.com/pluckerswingbar => myspace.com pluckers wing bar
• or => myspace.com plucker swing bar
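Dictionary-based word segmentation can be sketched as a search over all ways of splitting a string into known words. The small vocabulary below is a hypothetical stand-in for a real word list, and the URL string is written as "pluckerswingbar", the form that yields the two readings above. A practical segmenter would also rank the candidates, for example with word frequencies.

```python
from functools import lru_cache

# Hypothetical vocabulary; a real segmenter would use a large word list.
VOCAB = {"jump", "the", "shark", "plucker", "pluckers", "wing", "swing", "bar"}


def segmentations(text):
    """Return every way of splitting `text` into a sequence of vocabulary words."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # All segmentations of text[start:], each as a tuple of words.
        if start == len(text):
            return [()]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in VOCAB:
                for rest in split_from(end):
                    results.append((word,) + rest)
        return results

    return [" ".join(words) for words in split_from(0)]


print(segmentations("jumptheshark"))     # ['jump the shark']
print(segmentations("pluckerswingbar"))  # ['plucker swing bar', 'pluckers wing bar']
```

Memoizing `split_from` means each suffix of the string is segmented only once, even when several prefixes lead to it.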
Ambiguity
• Natural language is highly ambiguous and must be disambiguated.
• I saw the Grand Canyon flying to LA.
• Time flies like an arrow.
• Time runners like a coach.
Word Sense Disambiguation
• “Same word different meanings.”
• E.g. I can hear bass sound. VS He likes to eat grilled bass.
• I can hear bass/frequency sound. VS He likes to eat grilled bass/fish.
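One classic approach to word sense disambiguation is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the surrounding context. The sketch below applies it to the "bass" examples; the glosses and stop-word list are hypothetical stand-ins for a real dictionary such as WordNet (NLTK provides a WordNet-based version as `nltk.wsd.lesk`).

```python
# Hypothetical sense inventory: one short gloss per sense of "bass".
SENSES = {
    "bass": {
        "bass/frequency": "low frequency sound or tone in music",
        "bass/fish": "fish that people catch grill and eat",
    }
}
# Hypothetical stop-word list so that only content words are compared.
STOP_WORDS = {"i", "can", "he", "likes", "to", "a", "the", "or", "in", "that", "and", "people"}


def content_words(text):
    """Lower-cased words with trailing punctuation and stop words removed."""
    return {w.strip(".,").lower() for w in text.split()} - STOP_WORDS


def simplified_lesk(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bass", "I can hear bass sound."))        # bass/frequency
print(simplified_lesk("bass", "He likes to eat grilled bass."))  # bass/fish
```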
Word Sense Disambiguation (Cont…)
Stemming
• Stemming is the process for reducing inflected (or sometimes derived)
words to their stem, base or root form.
• E.g. car, cars -> car
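• ran, run, running -> run (the irregular form “ran” really needs lemmatization; a pure suffix-stripping stemmer leaves it unchanged)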
• stemmer, stemming, stemmed -> stem
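Stemming can be approximated with a few suffix-stripping rules. The sketch below is a deliberately simplified, hypothetical rule set (not the Porter stemmer, which NLTK provides as `nltk.stem.PorterStemmer`); it also shows why an irregular form such as "ran" is left untouched.

```python
VOWELS = set("aeiou")
SUFFIXES = ("ing", "ed", "er", "s")  # tried in this order


def toy_stem(word, min_stem=3):
    """A toy suffix-stripping stemmer (hypothetical rules, not the Porter algorithm)."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            if suffix == "s" and word.endswith("ss"):
                break  # keep words such as "class" intact
            stem = word[: -len(suffix)]
            # Undo a doubled final consonant: "runn" -> "run", "stemm" -> "stem".
            if stem[-1] == stem[-2] and stem[-1] not in VOWELS | {"l", "s", "z"}:
                stem = stem[:-1]
            return stem
    return word


for group in (["car", "cars"], ["ran", "run", "running"], ["stemmer", "stemming", "stemmed"]):
    print(group, "->", [toy_stem(w) for w in group])
```

The minimum stem length stops short words such as "bed" or "her" from being cut down to meaningless fragments.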
POS tagging
• Part of speech (POS) recognition
• E.g. “Today is a beautiful day.”
• Today – Noun
• is – Verb
• a – Article
• beautiful – Adjective
• day - Noun
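A simple baseline for POS tagging is the unigram (most-frequent-tag) tagger: learn from a hand-tagged corpus which tag each word carries most often, and fall back to a default tag for unknown words. The three-sentence training corpus below is hypothetical and uses the tag names from the example above.

```python
from collections import Counter, defaultdict

# Tiny hypothetical hand-tagged training corpus.
TRAINING = [
    [("Today", "Noun"), ("is", "Verb"), ("sunny", "Adjective")],
    [("It", "Pronoun"), ("is", "Verb"), ("a", "Article"), ("beautiful", "Adjective"), ("garden", "Noun")],
    [("What", "Pronoun"), ("a", "Article"), ("day", "Noun")],
]


def train_unigram_tagger(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(sentence, model, default="Noun"):
    """Tag each word; words not seen in training get the default tag."""
    words = sentence.rstrip(".").split()
    return [(w, model.get(w.lower(), default)) for w in words]


model = train_unigram_tagger(TRAINING)
print(tag("Today is a beautiful day.", model))
```

Because it ignores context, this baseline always gives a word the same tag, which is exactly where real taggers (using neighboring words) improve on it.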
Sentiment Analysis
• Reviews about a restaurant:
• E.g. “Service was very disappointing.”
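A minimal lexicon-based way to score such a review: sum the polarity of opinion words, scaled by intensifiers ("very") and flipped by negations. The word lists and weights below are hypothetical; practical systems use large curated lexicons or trained classifiers.

```python
# Hypothetical polarity lexicon, intensifiers and negations.
POLARITY = {"good": 1, "great": 2, "delicious": 2, "slow": -1, "bad": -1, "disappointing": -2}
INTENSIFIERS = {"very": 2.0, "extremely": 3.0}
NEGATIONS = {"not", "never", "no"}


def sentiment(review):
    """Return (label, score) for a review using a simple lexicon-based rule."""
    score = 0.0
    multiplier = 1.0
    for raw in review.lower().split():
        word = raw.strip(".,!?")
        if word in INTENSIFIERS:
            multiplier *= INTENSIFIERS[word]
        elif word in NEGATIONS:
            multiplier *= -1
        elif word in POLARITY:
            score += multiplier * POLARITY[word]
            multiplier = 1.0  # modifiers only affect the next opinion word
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score


print(sentiment("Service was very disappointing."))  # ('negative', -4.0)
```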
Text Summarization
• Given a piece of text, automatically produce a summary satisfying the
required constraints.
• Examples of constraints:
• Summary should have all the information of the document.
• Summary should have only correct information of the document.
• Summary should have information only from the document.
• And so on, depending on the user’s needs!
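A classic frequency-based extractive approach is sketched below: score each sentence by the average document frequency of its content words and return the top-scoring sentences in their original order. Because it only copies sentences, it satisfies the "information only from the document" constraint, but not necessarily "all the information". The stop-word list and the scoring rule are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; stop words are ignored when scoring.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "and", "in", "on", "it", "for"}


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, num_sentences=1):
    """Extractive summary: the highest-scoring sentences, kept in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored just for their length.
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


document = (
    "NLP lets computers process human language. "
    "NLP systems translate language. "
    "The weather was pleasant yesterday."
)
print(summarize(document))  # NLP systems translate language.
```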
Text Summarization (Cont…)
Problems faced in NLP
• Incomplete description
• Same word, different meanings
• New words, expressions and meanings are generated quite freely.
• There are many ways of saying the same thing.
Steps in NLP / Phases of NLP
• Morphological and Lexical Analysis
• Syntactic Analysis
• Semantic Analysis
• Discourse Integration
• Pragmatic Analysis
Morphological and Lexical Analysis
• The lexicon of language is its vocabulary that includes its words and
expressions.
• Morphology is the analysis, identification and description of the structure
of words.
• The words are generally accepted as being the smallest units of syntax.
• The syntax refers to the rules and principles that govern the sentence
structure of any individual language.
• Lexical analysis: Lexical analysis involves dividing a text into
paragraphs, sentences and words.
Morphological and Lexical Analysis
• A morpheme is the smallest unit of meaning.
• Morphology is the study of the structure of words.
• It is concerned with the derivation of new words from existing ones.
• Individual words are analyzed into their components, and non-word
tokens such as punctuation are separated from the words.
• The interpretation of affixes (prefixes and suffixes) may depend on the
syntactic category of the complete word.
Morphological and Lexical Analysis:
Example
• Suppose we have an English interface to an operating system and the
following sentence is typed:
• I want to print Bill’s .init file.
• Morphological analysis must do the following things:
- Pull apart the word “Bill’s” into the proper noun “Bill” and the possessive
suffix “’s”.
- Recognize the sequence “.init” as a file extension that is functioning as an
adjective in the sentence.
- Assign syntactic categories to words: e.g. “prints” is either a plural noun (many prints) or a third-person singular verb
(he prints).
• E.g. the construction of friendly from the root ‘friend’ and the suffix ‘ly’.
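The analyses above can be sketched with a few hand-written rules. The root list, suffix list and labels below are hypothetical, and the rules cover only the patterns in this example: a possessive "’s", a dotted file extension, and a root plus a derivational suffix such as "friend" + "ly".

```python
import re

# Hypothetical mini-lexicon of roots and derivational suffixes.
ROOTS = {"friend", "kind", "teach"}
DERIVATIONAL_SUFFIXES = ("ly", "ness", "er")


def analyze_token(token):
    """Return a list of (morpheme, label) pairs for a single token."""
    # Possessive: "Bill's" or "Bill’s" -> "Bill" + "'s"
    m = re.fullmatch(r"(\w+)['\u2019]s", token)
    if m:
        base = m.group(1)
        label = "proper noun" if base[0].isupper() else "noun"
        return [(base, label), ("'s", "possessive suffix")]
    # Dotted file extension such as ".init", used like an adjective before "file"
    if re.fullmatch(r"\.\w+", token):
        return [(token, "file extension (adjective)")]
    # Root + derivational suffix: "friendly" -> "friend" + "ly"
    for suffix in DERIVATIONAL_SUFFIXES:
        if token.endswith(suffix) and token[: -len(suffix)] in ROOTS:
            return [(token[: -len(suffix)], "root"), (suffix, "derivational suffix")]
    return [(token, "word")]


def analyze(sentence):
    """Analyze each whitespace-separated token of a sentence."""
    return [analyze_token(tok) for tok in sentence.rstrip(".").split()]


for analysis in analyze("I want to print Bill\u2019s .init file."):
    print(analysis)
print(analyze_token("friendly"))  # [('friend', 'root'), ('ly', 'derivational suffix')]
```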
Morphological and Lexical Analysis
Types of Morphemes
Types of Morphemes
Syntactic Analysis
• Syntax concerns the proper ordering of words and its effect on meaning.
• This involves analysis of the words in a sentence and the grammatical structure of the
sentence.
• Linear sequences of words are transformed into structures that show how the words relate
to each other.
• This knowledge relates to how words are put together or structured to form grammatically
correct sentences in the language.
• Some word sequences may be rejected if they violate the rules of the language for how
words may be combined.
• E.g. “the girl the go to the school”. This would definitely be rejected by the English
syntactic analyzer.
• Example: “Boy the go the to store”
• It checks that the sentence is correct according to the grammar and returns a parse tree
representing the structure of the sentence.
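To make grammar checking concrete, here is a sketch of a small phrase structure grammar with a top-down (recursive-descent, backtracking) parser that returns a parse tree, or `None` when the sentence is rejected. The grammar and lexicon are hypothetical and tiny; they do not check agreement, so "the boy go to the store" would also be accepted.

```python
# Hypothetical phrase structure rules (non-terminal -> list of alternatives).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "PP"], ["V", "NP"], ["V"]],
    "PP": [["P", "NP"]],
}
# Hypothetical lexicon mapping each word to its part of speech.
LEXICON = {
    "the": "Det",
    "boy": "N", "girl": "N", "store": "N", "school": "N",
    "go": "V", "goes": "V",
    "to": "P",
}


def expand(symbol, tokens, i):
    """Yield (tree, next_index) for every way `symbol` can cover tokens starting at i."""
    if symbol not in GRAMMAR:  # part-of-speech symbol: must match the next word
        if i < len(tokens) and LEXICON.get(tokens[i]) == symbol:
            yield (symbol, tokens[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each rule in turn (top-down, with backtracking)
        for children, j in expand_sequence(rhs, tokens, i):
            tree = (symbol, *children)
            yield tree, j


def expand_sequence(symbols, tokens, i):
    """Yield (list_of_trees, next_index) for a sequence of grammar symbols."""
    if not symbols:
        yield [], i
        return
    for first, j in expand(symbols[0], tokens, i):
        for rest, k in expand_sequence(symbols[1:], tokens, j):
            yield [first] + rest, k


def parse(sentence):
    """Return the first parse tree that covers the whole sentence, or None."""
    tokens = sentence.lower().rstrip(".").split()
    for tree, j in expand("S", tokens, 0):
        if j == len(tokens):
            return tree
    return None


print(parse("the boy goes to the store"))
print(parse("Boy the go the to store."))  # None
```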
Syntactic Analysis
• Here the analysis is of words in a sentence to know the grammatical
structure of the sentence.
• The words are transformed into structures that show how the words
relate to each other.
• Some word sequences may be rejected if they violate the rules of the
language for how words may be combined.
• Example: An English syntactic analyzer would reject a sentence such as:
“Boy the go the to store.”
Phrase Structure Rules
Phrase Structure Rules
Syntactic Analysis: Example
Modeling a sentence using Phrase Structure
Modeling a sentence using Phrase Structure
Syntactic Analysis: Example
Syntactic Analysis - Grammar
Syntactic Analysis - Parsing
Syntactic Analysis (Cont…)
A Parse tree for a sentence
Syntactic Analysis Example
Parsing Example
Top Down vs. Bottom Up
To parse a sentence, we must find a way in which the sentence could have been generated from the
start symbol.
• Top-down parsing: Begin with the start symbol and apply the
grammar rules forward until the symbols at the terminals of the tree
correspond to the components of the sentence being parsed.
• Bottom-up parsing: Begin with the sentence to be parsed and apply
the grammar rules backward until a single tree whose terminals are the
words of the sentence and whose top node is the start symbol has been
produced.
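A well-known bottom-up method is the CYK algorithm, which fills a chart of word spans from single words up to the whole sentence. It requires a grammar in Chomsky normal form (every rule has either two non-terminals or a single word on the right), so the hypothetical grammar below omits unit rules such as NP -> N. This is one bottom-up technique, not necessarily the exact procedure the slides have in mind.

```python
from itertools import product

# Hypothetical grammar in Chomsky normal form: (left, right) -> parent non-terminals.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "PP"): {"VP"},
    ("V", "NP"): {"VP"},
    ("P", "NP"): {"PP"},
}
LEXICON = {
    "the": {"Det"},
    "boy": {"N"}, "girl": {"N"}, "store": {"N"}, "school": {"N"},
    "go": {"V"}, "goes": {"V"},
    "to": {"P"},
}


def cyk_recognize(sentence, start="S"):
    """Bottom-up CYK: True if `start` can derive the whole sentence."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds every non-terminal that derives words[i..j] (inclusive).
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        chart[i][i] = set(LEXICON.get(word, set()))
    for length in range(2, n + 1):      # build longer spans from shorter ones
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):        # split point between the two halves
                for left, right in product(chart[i][k], chart[k + 1][j]):
                    chart[i][j] |= BINARY_RULES.get((left, right), set())
    return start in chart[0][n - 1]


print(cyk_recognize("the boy goes to the store"))  # True
print(cyk_recognize("Boy the go the to store."))   # False
```

Each chart cell is computed once from smaller spans, so the work grows with the cube of the sentence length rather than with the number of possible trees.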
Top Down Parsing
Bottom Up Parsing
Syntax
Semantic Analysis
• It derives an absolute (dictionary definition) meaning from context; it
determines the possible meanings of a sentence in a context.
• The structures created by the syntactic analyzer are assigned
meanings.
• Thus, a mapping is made between the syntactic structures and objects
in the task domain. The structures for which no such mapping is
possible are rejected.
• Example: the sentence “Colorless green ideas…” would be rejected as
semantically anomalous because “colorless” and “green” contradict each other.
Semantic Analysis
• Semantics concerns the meaning of words, phrases, and sentences.
• Semantic analysis must do two important things:
• It must map individual words into appropriate objects in the knowledge base or
database.
• It must create the correct structures to correspond to the way the meanings of
the individual words combine with each other.
• E.g. “colorless blue idea”. This would be rejected by the analyzer, as
“colorless” and “blue” do not make sense together.
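One way to implement this kind of rejection is with semantic features: each word contributes attribute values, and a phrase is rejected when two words demand conflicting values or when a feature does not apply to the kind of object described. The feature lexicon and the color restriction below are hypothetical and cover only the example phrases.

```python
# Hypothetical feature lexicon: the attribute values each word contributes.
FEATURES = {
    "colorless": {"color": "none"},
    "green": {"color": "green"},
    "blue": {"color": "blue"},
    "red": {"color": "red"},
    "idea": {"concrete": False},
    "ideas": {"concrete": False},
    "apple": {"concrete": True},
}


def interpret(phrase):
    """Combine word features; return None if the phrase is semantically anomalous."""
    meaning = {}
    for word in phrase.lower().split():
        for attr, value in FEATURES.get(word, {}).items():
            if attr in meaning and meaning[attr] != value:
                return None  # two words demand conflicting values
            meaning[attr] = value
    # Assumed selectional restriction: only concrete objects can have a color.
    if "color" in meaning and meaning.get("concrete") is False:
        return None
    return meaning


print(interpret("red apple"))            # {'color': 'red', 'concrete': True}
print(interpret("colorless blue idea"))  # None
print(interpret("green ideas"))          # None
```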
Semantic Analysis (Cont…)
• Semantic analysis is concerned with the meaning of the language.
• The first step in any semantic processing system is to look up the individual words
in a dictionary (or lexicon) and extract their meanings.
• Unfortunately, many words have several meanings; for example, the word
‘diamond’ might have the following set of meanings:
• (1) a geometrical shape with four equal sides.
• (2) an extremely hard and valuable gemstone.
• The process of determining the correct meaning of an individual word is called word
sense disambiguation or lexical disambiguation.
E.g. “plant” = industrial plant
“plant” = living organism
Semantics
Discourse Integration
• Sense of the context.
• The meaning of any individual sentence may depend on the
sentences that precede it, and it may also influence the meaning of the
sentences that follow it.
• At this stage we work out what kinds of things the sentence is about.
• E.g. The word “it” in the sentence “She wanted it” depends upon the
prior discourse context.
Pragmatic Analysis
• It derives knowledge from external commonsense information. It
concerns the purposeful use of language in situations,
particularly those aspects of language which require world knowledge.
The idea is that what was said is reinterpreted to determine what was
actually meant.
• Example: the sentence
• “Do you know what time it is?”
• should be interpreted as a request to be told the time, not as a yes/no question.
Pragmatic Analysis
• Pragmatics concerns the overall communicative and social context and
its effect on interpretation.
• The final step in understanding is to decide what to do as a result.
• The main focus is on reinterpreting what was said to determine what it actually
means.
• E.g. “Close the window?” should be interpreted as a request
rather than an order.
• This is high-level knowledge which relates to the use of sentences in
different contexts and how the context affects the meaning of the
sentences.
Pragmatic Analysis (Cont…)
• Where, by whom, to whom, why and when it was said
• Intentions: inform, request, promise, criticize,…
• Handling Pronouns
- “Mary eats apples. She likes them.”
- She=“Mary”, them=“apples”
• Handling ambiguity
- Pragmatic ambiguity: “you are late”: what is the speaker’s intention:
informing or criticizing?
The final step toward effective understanding is to decide what to do as a
result.
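Pronoun handling can be sketched with a simple heuristic: link each pronoun to the most recent preceding noun that agrees with it in gender and number. The feature lexicon below is hypothetical, and this recency-plus-agreement rule is a simplification; real coreference resolution also uses syntax and world knowledge.

```python
import re

# Hypothetical (gender, number) features; None means "any gender".
NOUNS = {"mary": ("female", "sg"), "john": ("male", "sg"),
         "apples": ("neuter", "pl"), "book": ("neuter", "sg")}
PRONOUNS = {"she": ("female", "sg"), "he": ("male", "sg"), "it": ("neuter", "sg"),
            "they": (None, "pl"), "them": (None, "pl")}


def resolve_pronouns(text):
    """Link each pronoun to the most recent earlier noun that agrees in gender and number."""
    seen = []   # nouns in order of appearance: (surface form, gender, number)
    links = {}
    for token in re.findall(r"[A-Za-z]+", text):
        key = token.lower()
        if key in NOUNS:
            seen.append((token, *NOUNS[key]))
        elif key in PRONOUNS:
            gender, number = PRONOUNS[key]
            for noun, noun_gender, noun_number in reversed(seen):
                if noun_number == number and gender in (None, noun_gender):
                    links[token] = noun
                    break
    return links


print(resolve_pronouns("Mary eats apples. She likes them."))  # {'She': 'Mary', 'them': 'apples'}
```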
Pragmatics
“Classical” NLP pipeline
Grammatical Structure of Utterances
Grammatical Structure of Utterances
Grammatical Structure of Utterances
Classification of Phrases
Syntactic Processing
NLP in other domain
• Bio-medical
• Forensic Science
• Advertisement
• Education
• Politics
• Business development
• Marketing
• And wherever we use language!
Advantages of NLP
• NLP helps users to ask questions about any subject and get a direct
response within seconds.
• NLP offers exact answers to the question, meaning it does not return
unnecessary and unwanted information.
• NLP helps computers to communicate with humans in their languages.
• It is very time efficient.
• Many companies use NLP to improve the efficiency and accuracy of
documentation processes and to identify information in large databases.
Disadvantages of NLP
• NLP may not capture context.
• NLP can be unpredictable.
• NLP may require more keystrokes.
• NLP systems adapt poorly to new domains and have limited functionality,
which is why an NLP system is usually built for a single, specific task.
Future of NLP
• Semantic web/ search
• Sentiment analysis/ opinion mining
• Machine translation
• Advanced speech processing applications
• Social network analysis
• Collective intelligence
Future of NLP
• Human-level natural language processing is an AI-complete
problem.
• It is equivalent to solving the central artificial intelligence problem and making
computers as intelligent as people.
• The aim is to make computers that can solve problems and think like humans, perform
activities that humans cannot, and do so more efficiently than humans.
• NLP’s future is closely linked to the growth of artificial intelligence.
• As natural language understanding or readability improves, computers and
machines or devices will be able to learn from the information online and apply
what they learned in the real world.
• Combined with natural language generation, computers will become more and
more capable of receiving and providing useful information.
Machine Learning — Text Processing
• Text processing is one of the most common tasks in many ML applications.
Step 1: Data Preprocessing
• Tokenization: convert sentences to words
• Removing unnecessary punctuation and tags
• Removing stop words: frequent words such as “the”, “is”, etc. that carry little specific meaning
• Stemming: words are reduced to a root by removing inflection, usually by dropping a suffix.
• Lemmatization: another approach to removing inflection, which determines the part of speech and
uses a detailed dictionary of the language.
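The preprocessing steps above can be sketched as one function. The stop-word list and the lemma dictionary are small hypothetical stand-ins (real lemmatizers such as NLTK's `WordNetLemmatizer` consult a full lexicon and a POS tag), and tags are removed with a simple regular expression rather than an HTML parser.

```python
import re

# Hypothetical stop-word list and lemma dictionary (real ones are much larger).
STOP_WORDS = {"the", "is", "a", "an", "are", "was", "were", "of", "to", "and"}
LEMMAS = {"ran": "run", "running": "run", "mice": "mouse", "better": "good"}


def preprocess(text, use_lemmas=True):
    """Tag removal, tokenization, stop-word removal and dictionary lemmatization."""
    text = re.sub(r"<[^>]+>", " ", text)          # strip HTML-like tags
    tokens = re.findall(r"[a-z]+", text.lower())   # tokenize; drops punctuation and digits
    tokens = [t for t in tokens if t not in STOP_WORDS]
    if use_lemmas:
        tokens = [LEMMAS.get(t, t) for t in tokens]  # unknown words are kept as-is
    return tokens


print(preprocess("<p>The mice were running to the barn!</p>"))  # ['mouse', 'run', 'barn']
```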
Step 2: Feature Extraction
• In text processing, words of the text represent discrete, categorical features.
• Bag of Words (BOW): We make a list of the unique words in the text corpus, called the vocabulary.
Each sentence or document can then be represented as a vector in which each vocabulary word is 1 if
present and 0 if absent. Another representation counts the number of
times each word appears in a document. A widely used weighting scheme is Term
Frequency-Inverse Document Frequency (TF-IDF).
• Term Frequency (TF) = (Number of times term t appears in a document)/(Number of terms in the
document)
• Inverse Document Frequency (IDF) = log(N/n), where N is the number of documents and n is
the number of documents in which term t appears. The IDF of a rare word is high, whereas the IDF
of a frequent word is low. This has the effect of highlighting words that are
distinctive.
• The TF-IDF value of a term is TF * IDF.
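The formulas above can be computed directly; the sketch below builds a vocabulary, bag-of-words vectors and TF-IDF weights for three hypothetical, already tokenized documents. It assumes the natural logarithm for IDF, since the base is not specified above (changing the base only rescales the values). Library implementations may differ; for example, scikit-learn's `TfidfVectorizer` applies a smoothed IDF by default.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of the unique words across all (tokenized) documents."""
    return sorted({word for doc in docs for word in doc})


def bag_of_words(doc, vocab, binary=False):
    """Count vector (or 1/0 presence vector if binary=True) over the vocabulary."""
    counts = Counter(doc)
    return [(1 if counts[w] else 0) if binary else counts[w] for w in vocab]


def tf_idf(docs):
    """One {word: weight} dict per document, with TF = count/len(doc) and IDF = ln(N/n)."""
    vocab = build_vocabulary(docs)
    N = len(docs)
    doc_sets = [set(doc) for doc in docs]
    n = {w: sum(1 for s in doc_sets if w in s) for w in vocab}  # documents containing w
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({w: (counts[w] / len(doc)) * math.log(N / n[w]) for w in vocab})
    return weights


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
vocab = build_vocabulary(docs)
print(vocab)                         # ['cat', 'dog', 'ran', 'sat', 'the']
print(bag_of_words(docs[0], vocab))  # [1, 0, 0, 1, 1]
for weights in tf_idf(docs):
    print({w: round(v, 3) for w, v in weights.items() if v > 0})
```

Here "the" appears in every document, so its IDF, and therefore its TF-IDF, is 0, while "dog", which appears in only one document, gets the highest weight.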
Step 2: Feature Extraction
Thank you
