Intro. To NLP
1. Intro to NLP
What is NLP?
Natural Language Processing (NLP) is a field in
Artificial Intelligence (AI) devoted to creating
computers that use natural language as input
and/or output.
Components of NLP
• Natural Language Understanding
– Mapping the given input in the natural language into a useful
representation.
– Different levels of analysis are required:
morphological analysis,
syntactic analysis,
semantic analysis,
discourse analysis, …
• Natural Language Generation
– Producing output in the natural language from some internal
representation.
– Different levels of synthesis are required:
deep planning (what to say),
syntactic generation
• NL Understanding is much harder than NL Generation.
Still, both of them are hard.
Why NLP?
Test for Intelligence – Turing Test
Alan Turing (1950) proposed a
test of a machine's capability to
perform human-like
conversation.
Early Conversational Programs
• ELIZA (by Joseph Weizenbaum), 1966
– A psychotherapist
– No real understanding; simple pattern-matching
to respond to user input -> canned responses
>> Hello.
How do you do. Please state your problem.
>> Men are all alike.
In what way ?
>> They're alwas bugging us about something or other.
I'm not sure I understand you fully.
>> They're always bugging us about something or other.
Can you think of a specific example ?
>> Well, my boyfriend made me come here.
Your boyfriend made you come here ?
>> He says I'm depressed much of the time.
I am sorry to hear that you are depressed.
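The pattern-matching idea behind the exchange above can be sketched in a few lines of Python. This is only an illustration, not Weizenbaum's original script: the rule table `RULES`, the `FALLBACK` string, and the function `respond` are hypothetical names chosen here, and the patterns are written just to cover the sample dialogue.

```python
import re

# Hypothetical rule table: (pattern, canned response template).
# Rules are tried in order; the first match wins.
RULES = [
    (re.compile(r"\bmy (\w+) made me (.+)", re.I), "Your {0} made you {1} ?"),
    (re.compile(r"\bi'?m (depressed|sad|unhappy)\b", re.I),
     "I am sorry to hear that you are {0}."),
    (re.compile(r"\balways\b", re.I), "Can you think of a specific example ?"),
    (re.compile(r"\ball alike\b", re.I), "In what way ?"),
    (re.compile(r"^\s*hello\b", re.I), "How do you do. Please state your problem."),
]

# Used when no pattern matches (for example, when a keyword is misspelled).
FALLBACK = "I'm not sure I understand you fully."


def respond(utterance: str) -> str:
    """Return a canned response by matching surface patterns only."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo captured fragments of the user's own words back.
            return template.format(*match.groups())
    return FALLBACK


if __name__ == "__main__":
    for line in ["Hello.", "Men are all alike.",
                 "Well, my boyfriend made me come here."]:
        print(">>", line)
        print(respond(line))
```

Note that the misspelled "alwas" in the dialogue falls through to the fallback response, because no keyword matches. The program has no model of meaning, only of surface strings.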
Modern NLP:
NLP in the Days of Big Data
Three trends:
1. An enormous amount of information is now
available in machine readable form as natural
language text (newspapers, web pages, medical
records, financial filings, product reviews, discussion
forums, etc.)
2. Conversational agents are becoming an important
form of human-computer communication
3. Much of human-human interaction is now mediated by
computers via social media
NLP Applications
• Three prominent application areas:
– Text analytics/mining (from “unstructured data”)
• Sentiment analysis
• Topic identification
• Digital Humanities (“new ways of doing scholarship that
involve collaborative, transdisciplinary, and computationally
engaged research, teaching, and publishing.”)
– Conversational agents
• Siri, Cortana, Amazon Alexa, Google Assistant
• Chatbots
– Machine translation
Text Analytics
• Data-mining of weblogs, microblogs, discussion forums,
user reviews, and other forms of user-generated media.
Text Analytics (cont.)
• Typically this involves the extraction of limited kinds of
semantic and pragmatic information from texts
– Entity mentions
– Concept identification
– Sentiment
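As a rough illustration of extracting "limited kinds" of information, the sketch below finds entity mentions by gazetteer lookup and assigns sentiment by counting lexicon words. The word lists, the gazetteer, and the function names are all assumptions made for this example. Real text analytics systems use much larger resources or trained models.

```python
import re
from collections import Counter

# Tiny hypothetical sentiment lexicons, for illustration only.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "poor", "broken"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text: str) -> str:
    """Label text by (positive word count) minus (negative word count)."""
    counts = Counter(tokenize(text))
    score = sum(counts[w] for w in POSITIVE) - sum(counts[w] for w in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def find_mentions(text: str, gazetteer: tuple[str, ...]) -> list[str]:
    """Return the gazetteer names that occur in the text as whole words."""
    return [name for name in gazetteer
            if re.search(r"\b" + re.escape(name) + r"\b", text, re.I)]


if __name__ == "__main__":
    review = "I love Siri, but Cortana gave me terrible directions."
    print(find_mentions(review, ("Siri", "Cortana", "Google Assistant")))
    print(sentiment(review))
```

Word counting of this kind ignores negation, sarcasm, and context, which is one reason sentiment analysis is treated as a pragmatic as well as a semantic problem.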
Demo
• Concept Extraction
– http://aylien.com/concept-extraction/
Conversational Agents
• Combine
– Speech recognition/synthesis
– Question answering
• From the web and from structured information sources (Freebase,
DBpedia, YAGO, etc.)
– Simple agent-like abilities
• Create/edit calendar entries
• Reminders
• Directions
• Invoking/interacting with other apps
Question Answering
• Traditional information retrieval returns
documents/resources that contain what users need to
satisfy their information needs.
• Question answering, on the other hand, directly provides
an answer to an information need posed as a question.
IBM Watson
https://www.youtube.com/watch?v=WFR3lOm_xhE
Machine Translation (MT)
• The automatic translation of texts between languages is one of the
oldest non-numerical applications in Computer Science.
• In the past 15 years or so, MT has gone from a niche academic
curiosity to a robust commercial industry.