Natural Language Processing (NLP)
What is NLP?
• Artificial Intelligence (AI): the broad discipline of creating intelligent machines (machines that
work like humans).
• Machine Learning (ML): the science of getting computers to act without being explicitly
programmed. It refers to systems that can learn from previous experience (e.g. medical diagnosis,
image recognition, speech recognition, online fraud detection).
• Natural Language Processing (NLP): refers to systems that can understand human language.
What is NLP?
• NLP is Natural Language Processing.
• Natural Languages are those spoken by people.
• Natural Language Processing (NLP) is a field of computer science and artificial intelligence
concerned with the interactions between computers and human (natural) languages.
• Specifically, it is the process of a computer extracting meaningful information from natural
language input and/or producing natural language output.
• Its goal is to process text data (unstructured data) to perform tasks like translation, grammar
checking, topic classification, document similarity etc.
• Examples: Google Assistant, Siri, Alexa
• NLP encompasses anything a computer needs to understand natural language (typed or spoken)
and also to generate natural language.
• A language is a system: a set of symbols and a set of rules (or grammar).
• The symbols are combined to convey new information.
• The rules govern the manipulation of symbols.
Natural Language Processing (NLP)
• NLP encompasses anything a computer needs to understand natural language
(typed or spoken) and also to generate natural language.
Natural Language Understanding (NLU):
The NLU task is understanding and reasoning when the input is natural language.
Here we ignore the issues of natural language generation.
Natural Language Generation (NLG):
NLG is a subfield of natural language processing (NLP).
NLG is also referred to as text generation.
Advantages of NLP
• NLP lets users ask questions about any subject and get a direct
response within seconds.
• NLP aims to give exact answers to a question, without
unnecessary or unwanted information.
• NLP helps computers communicate with humans in their own languages.
• Many companies use NLP to improve the efficiency of
documentation processes.
Kinds of Natural Language
• The input or output of a natural language processing system can be
• written text or
• speech
• We will mainly focus on written text (not necessarily speech).
• To be able to process written text, we need:
• Lexical information
• Syntactic information
• Semantic information
• Discourse information
• Real-world knowledge
Formal Language
• Before defining a formal language, we need to define symbols, alphabets, strings and words.
• Symbol: A symbol is a character, an abstract entity that has no meaning by itself,
e.g., letters, digits and special characters.
• Alphabet: An alphabet is a finite set of symbols, often denoted by Σ (sigma).
e.g., B = {0,1} says B is an alphabet of two symbols, 0 and 1.
C = {a,b,c} says C is an alphabet of three symbols, a, b and c.
• String or word: A string (or word) is a finite sequence of symbols from an alphabet.
e.g., 01110 and 111 are strings over the alphabet B above.
aaabccc and b are strings over the alphabet C above.
• Language: A language is a set of strings over an alphabet.
• Formal Language (or simply language): a set L of strings over some finite alphabet Σ.
A formal language is described using a formal grammar.
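The definitions above can be tried out in a few lines of Python. This is a minimal sketch: the alphabets B and C come from the slide, while the helper names and the "even number of 1s" language are hypothetical, chosen only for illustration.

```python
from itertools import product

# Alphabets from the definitions above: finite sets of symbols.
B = {"0", "1"}
C = {"a", "b", "c"}

def is_string_over(s, alphabet):
    """Return True if every symbol of s belongs to the alphabet."""
    return all(symbol in alphabet for symbol in s)

def strings_up_to(alphabet, max_len):
    """Enumerate all strings over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(sorted(alphabet), repeat=n):
            yield "".join(symbols)

def in_even_ones(s):
    """Hypothetical formal language over B: strings with an even number of 1s."""
    return is_string_over(s, B) and s.count("1") % 2 == 0

print(is_string_over("01110", B))    # True
print(is_string_over("aaabccc", C))  # True
print(is_string_over("012", B))      # False: "2" is not a symbol of B
print([s for s in strings_up_to(B, 2) if in_even_ones(s)])  # ['', '0', '00', '11']
```

A language here is simply a membership test over strings. A formal grammar would describe the same set by generation rules instead.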
Information Extraction
• Extraction of meaning from an email:
We have decided to meet tomorrow at 10:00am in the lab.
To do : meeting
Time : 10:00am
Venue : Lab
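The email example above can be approximated with a few hand-written patterns. This is only a rough sketch using Python's standard `re` module. The function name and the three rules are assumptions made for illustration, and real information extraction systems need far more robust rules or a trained model.

```python
import re

def extract_meeting(text):
    """Pull a to-do label, a time and a venue out of a short message (toy rules)."""
    info = {}
    # Rule 1 (assumed): the word "meet" or "meeting" signals a meeting.
    if re.search(r"\bmeet(?:ing)?\b", text, re.IGNORECASE):
        info["To do"] = "meeting"
    # Rule 2 (assumed): times look like 10:00am or 3:30 pm.
    time_match = re.search(r"\b\d{1,2}:\d{2}\s?(?:am|pm)\b", text, re.IGNORECASE)
    if time_match:
        info["Time"] = time_match.group(0)
    # Rule 3 (assumed): the venue is the word following "in the".
    venue_match = re.search(r"\bin the (\w+)", text, re.IGNORECASE)
    if venue_match:
        info["Venue"] = venue_match.group(1).capitalize()
    return info

email = "We have decided to meet tomorrow at 10:00am in the lab."
print(extract_meeting(email))
# {'To do': 'meeting', 'Time': '10:00am', 'Venue': 'Lab'}
```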
Applications of NLP
• Language Translation:
Translating a sentence from one natural language to another.
• Text Summarization:
Extracting keywords from a large piece of text.
Creating an abstract of an entire article.
• Context Analysis:
Social networking sites can ‘fairly’ understand the topic of discussion, e.g.
“4 of your friends posted about Indian Institute of Technology, Kanpur”.
• Sentiment Analysis:
Helping companies analyze a large number of reviews of a product.
Helping customers process the reviews provided for a product.
• Information Retrieval:
Selecting from a set of documents the ones that are relevant to a query.
• Text Categorization:
Sorting text into fixed topic categories.
• Extracting data from text:
Converting unstructured text into structured data.
Applications of NLP
• Database Access
• Spoken language control systems
• Spelling and grammar checkers and correction
• Question Answering
• Spam Detection
• Speech Recognition (Voice to Text)
• Information Extraction
• Chatbots
Tasks in NLP / How to build an NLP pipeline
Tokenization / Segmentation
Disambiguation
Stemming
Sentiment Analysis
Text Summarization
Segmentation
• Segmenting text into words
• E.g. “The meeting has been scheduled for this Saturday.”
• “He has agreed to co-operate with me.”
• “Indian Airlines introduces another flight on the New Delhi-Mumbai
route.”
• “We are leaving for the U.S.A. on 26th May.”
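The example sentences above show why splitting on spaces alone is not enough: "co-operate", "Delhi-Mumbai", "U.S.A." and the final full stop each need a decision. The sketch below is one possible regular-expression tokenizer. The pattern is an assumption for illustration, not a standard.

```python
import re

# Order matters: try abbreviations like "U.S.A." first, then words
# (optionally hyphenated), then any single punctuation mark.
TOKEN_PATTERN = re.compile(r"(?:[A-Za-z]\.){2,}|\w+(?:-\w+)*|[^\w\s]")

def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)

print(tokenize("He has agreed to co-operate with me."))
# ['He', 'has', 'agreed', 'to', 'co-operate', 'with', 'me', '.']
print(tokenize("We are leaving for the U.S.A. on 26th May."))
# ['We', 'are', 'leaving', 'for', 'the', 'U.S.A.', 'on', '26th', 'May', '.']
```

With this pattern "New Delhi-Mumbai" yields the tokens "New" and "Delhi-Mumbai". Whether the hyphen should join or separate words depends on the application, which is exactly the difficulty the examples illustrate.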
Word Segmentation
• Breaking a string of characters into a sequence of words.
• Even in English, characters other than whitespace can be used to
separate words [e.g. , ; . - : ( )]
• Examples from English URLs:
• jumptheshark.com => jump the shark .com
• myspace.com/pluckerswingbar => myspace .com pluckers wing bar
• => myspace .com plucker swing bar
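Segmenting a string that has no separators can be sketched as a dictionary search that returns every valid split. The vocabulary below is a hypothetical toy word list, and the input `pluckerswingbar` is chosen so that both readings listed above ("pluckers wing bar" and "plucker swing bar") come out as valid segmentations.

```python
def segmentations(text, vocabulary):
    """Return every way to split text into a sequence of vocabulary words."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocabulary:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(text[end:], vocabulary):
                results.append([prefix] + rest)
    return results

# Hypothetical toy vocabulary.
VOCAB = {"jump", "the", "shark", "plucker", "pluckers", "wing", "swing", "bar"}

print(segmentations("jumptheshark", VOCAB))
# [['jump', 'the', 'shark']]
print(segmentations("pluckerswingbar", VOCAB))
# [['plucker', 'swing', 'bar'], ['pluckers', 'wing', 'bar']]
```

The second call returns two answers. Choosing between them needs extra knowledge such as word frequencies or context.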
Ambiguity
• Natural language is highly ambiguous and must be disambiguated.
• I saw the Grand Canyon flying to LA.
• Time flies like an arrow.
• Time runners like a coach.
Word Sense Disambiguation
• “Same word, different meanings.”
• E.g. “I can hear bass sound.” vs. “He likes to eat grilled bass.”
• “I can hear bass/frequency sound.” vs. “He likes to eat grilled bass/fish.”
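One simple way to choose between the two senses of "bass" is to compare the words around it with a short description of each sense and pick the sense with the larger overlap. The sketch below follows that idea. The sense labels reuse the ones above, but the descriptive word lists are invented for illustration and are not taken from any real dictionary.

```python
# Hypothetical mini sense inventory: each sense of "bass" has a short gloss.
SENSES = {
    "bass/frequency": "low frequency sound tone music hear",
    "bass/fish": "fish eat grilled food river catch",
}

def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = set(sentence.lower().replace(".", "").split())

    def overlap(sense):
        return len(context & set(senses[sense].split()))

    return max(senses, key=overlap)

print(disambiguate("I can hear bass sound."))         # bass/frequency
print(disambiguate("He likes to eat grilled bass."))  # bass/fish
```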
Stemming
• Stemming is the process of reducing inflected (or sometimes derived)
words to their stem, base or root form.
• E.g. car, cars -> car
• ran, run, running -> run
• stemmer, stemming, stemmed -> stem
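A toy suffix-stripping stemmer is enough to reproduce most of the examples above. This is a deliberately naive sketch, not the Porter algorithm or any library's stemmer. The suffix list and the doubled-consonant rule are assumptions chosen to fit these few words.

```python
SUFFIXES = ("ing", "ed", "er", "s")  # toy suffix list, checked in this order

def stem(word):
    """Strip one known suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip if at least three letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # "stemm" -> "stem", "runn" -> "run"
            if word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word

for w in ["car", "cars", "run", "running", "stemmer", "stemming", "stemmed", "ran"]:
    print(w, "->", stem(w))
```

Note that "ran" is left unchanged: an irregular form cannot be reduced to "run" by stripping suffixes and needs a dictionary lookup instead. The doubled-consonant rule would also damage words such as "calling", which is why practical stemmers use more careful conditions.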
POS tagging
• Part of speech (POS) recognition
• E.g. “Today is a beautiful day.”
• Today – Noun
• is – Verb
• a – Article
• beautiful – Adjective
• day – Noun
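The tagging above can be mimicked with a plain dictionary lookup. This is only a sketch: the lexicon holds just the five words of the example, and real taggers must also use context, because many words can take more than one part of speech.

```python
# Toy lexicon containing only the words of the example sentence.
LEXICON = {
    "today": "Noun",
    "is": "Verb",
    "a": "Article",
    "beautiful": "Adjective",
    "day": "Noun",
}

def tag(sentence):
    """Return (word, tag) pairs; words missing from the lexicon get 'Unknown'."""
    words = sentence.rstrip(".").split()
    return [(w, LEXICON.get(w.lower(), "Unknown")) for w in words]

print(tag("Today is a beautiful day."))
# [('Today', 'Noun'), ('is', 'Verb'), ('a', 'Article'), ('beautiful', 'Adjective'), ('day', 'Noun')]
```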
Sentiment Analysis
• Reviews about a restaurant :-
• E.g. “Service was very disappointing.”
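A very simple way to score a review like the one above is to count words from positive and negative word lists. The lists below are hypothetical and tiny. This sketch ignores negation ("not good") and intensity ("very"), which practical systems have to handle.

```python
import re

# Hypothetical sentiment word lists.
POSITIVE = {"good", "great", "friendly", "delicious"}
NEGATIVE = {"disappointing", "bad", "slow", "rude"}

def sentiment(review):
    """Label a review by counting positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Service was very disappointing."))  # negative
```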
Text Summarization
• Given a piece of text, automatically produce a summary satisfying
required constraints.
• Examples of constraints:
• Summary should have all the information of the document.
• Summary should have only correct information of the document.
• Summary should have information only from the document.
• And so on, depending on the user’s needs!
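One way to meet the constraint that a summary contains information only from the document is extractive summarization: select whole sentences from the text rather than writing new ones. The sketch below scores each sentence by the average frequency of its words. This scoring rule is an assumption chosen for simplicity, not the only or best option.

```python
import re
from collections import Counter

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Average word frequency, so long sentences are not favoured.
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

document = ("NLP systems process text. NLP systems also generate text. "
            "The weather was nice.")
print(summarize(document, max_sentences=1))
```

Because every output sentence is copied from the input, the summary cannot introduce outside information. It gives no guarantee of covering all the information in the document.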
Problems faced in NLP
• Incomplete description
• Same word, different meanings
• New words, expressions and meanings are generated quite freely.
• There are many ways of saying the same thing.
Steps in NLP / Phases of NLP
Morphological and Lexical Analysis
Syntactic Analysis
Semantic Analysis
Discourse Integration
Pragmatic Analysis
Morphological and Lexical Analysis
• The lexicon of a language is its vocabulary, which includes its words and
expressions.
• Morphology is the analysis, identification and description of the structure
of words.
• Words are generally accepted as being the smallest units of syntax.
• Syntax refers to the rules and principles that govern the sentence
structure of any individual language.
• Lexical analysis: dividing a text into paragraphs, sentences and words.
Morphological and Lexical Analysis
• A morpheme is the smallest unit of meaning.
• Morphology is the structure of words.
• It is concerned with derivation of new words from existing ones.
• Individual words are analyzed into their components, and non-word
tokens such as punctuation are separated from the words.
• The interpretation of affixes (prefixes and suffixes) may depend on the
syntactic category of the complete word.
Morphological and Lexical Analysis: Example
• Suppose we have an English interface to an operating system and the
following sentence is typed:
• I want to print Bill’s .init file.
• Morphological analysis must do the following things:
- Pull apart the word “Bill’s” into the proper noun “Bill” and the possessive
suffix “’s”.
- Recognize the sequence “.init” as a file extension that is functioning as an
adjective in the sentence.
• The interpretation of a suffix can depend on the syntactic category: “prints” is
either a plural noun (many prints) or a third-person singular verb (he prints).
• E.g. the construction of “friendly” from the root “friend” and the suffix “ly”.
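The two tasks in the example (splitting off the possessive suffix and recognizing ".init" as a unit) can be sketched with one regular expression. The pattern and the labels are assumptions for illustration, and they handle only the phenomena that occur in this sentence.

```python
import re

# Alternatives, in order: a file extension such as ".init", a plain word,
# a possessive suffix ('s with a straight or curly apostrophe), or a single
# punctuation mark.
TOKEN = re.compile(r"\.\w+|\w+|['’]s\b|[^\w\s]")

def analyze(sentence):
    """Split a sentence into tokens and label each one (toy labels)."""
    result = []
    for tok in TOKEN.findall(sentence):
        if tok in ("'s", "’s"):
            label = "possessive suffix"
        elif tok.startswith(".") and len(tok) > 1:
            label = "file extension"
        elif len(tok) == 1 and not tok.isalnum():
            label = "punctuation"
        else:
            label = "word"
        result.append((tok, label))
    return result

for token, label in analyze("I want to print Bill's .init file."):
    print(token, "->", label)
```

Here "Bill" and "'s" come out as separate tokens, ".init" stays in one piece, and the final full stop is separated from "file".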
Morphological and Lexical Analysis: Types of Morphemes
Syntactic Analysis
• Syntax concerns the proper ordering of words and its effect on meaning.
• This involves analysis of the words in a sentence and the grammatical structure of the
sentence.
• Linear sequences of words are transformed into structures that show how the words relate
to each other.
• This knowledge relates to how words are put together or structured to form grammatically
correct sentences in the language.
• Some word sequences may be rejected if they violate the rules of the language for how
words may be combined.
• E.g. “the girl the go to the school” would be rejected by an English
syntactic analyzer.
• Example: “Boy the go the to store”
• The analyzer checks that the sentence is correct according to the grammar and returns a parse tree
representing the structure of the sentence.
Syntactic Analysis
• Here the words in a sentence are analyzed to determine the grammatical
structure of the sentence.
• The words are transformed into structures that show how the words
relate to each other.
• Some word sequences may be rejected if they violate the rules of the
language for how words may be combined.
• Example: An English syntactic analyzer would reject a sentence such as
“Boy the go the to store.”
Phrase Structure Rules
Modeling a sentence using Phrase Structure
Syntactic Analysis - Grammar
Syntactic Analysis - Parsing
A Parse tree for a sentence
Parsing Example
Top Down vs. Bottom Up
To parse a sentence, we need to find a way in which the sentence could have been generated from the
start symbol.
• Top-down parsing: Begin with the start symbol and apply the
grammar rules forward until the symbols at the terminals of the tree
correspond to the components of the sentence being parsed.
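Top-down parsing as described above can be sketched as a small recursive-descent parser with backtracking. The grammar and lexicon below are hypothetical and chosen only for illustration. The parser starts from S, expands rules forward, and succeeds only if the terminals it reaches match every word of the sentence.

```python
# Hypothetical phrase structure rules (no left recursion, so the search terminates).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "PP"], ["V", "NP"], ["V"]],
    "PP": [["P", "NP"]],
}
# Hypothetical lexicon mapping word categories to words.
LEXICON = {
    "Det": {"the"},
    "N": {"boy", "girl", "store", "school"},
    "V": {"goes", "go"},
    "P": {"to"},
}

def expand(symbol, words, pos):
    """Yield (tree, next_pos) for each way symbol can derive words[pos:next_pos]."""
    if symbol in LEXICON:  # word category: must match the next word
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield (symbol, words[pos]), pos + 1
        return
    for rule in GRAMMAR[symbol]:  # try each rule for this symbol in turn
        yield from expand_sequence(rule, words, pos, (symbol,))

def expand_sequence(symbols, words, pos, tree):
    """Expand the right-hand side of a rule from left to right."""
    if not symbols:
        yield tree, pos
        return
    for subtree, next_pos in expand(symbols[0], words, pos):
        yield from expand_sequence(symbols[1:], words, next_pos, tree + (subtree,))

def parse(sentence):
    """Return a parse tree as nested tuples, or None if the sentence is rejected."""
    words = sentence.lower().rstrip(".").split()
    for tree, end in expand("S", words, 0):
        if end == len(words):  # all words consumed
            return tree
    return None

print(parse("The boy goes to the store."))
print(parse("Boy the go the to store."))  # None: rejected by the grammar
```

The first call returns a nested tuple rooted at "S", which plays the role of the parse tree. The second returns None because no sequence of rule applications from S produces that word order.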