1-Introduction To NLP
Spring Semester
Natural Language
Processing
Dr. Wafaa Samy
Introduction to Natural Language
Processing (NLP)
Lecture (1)
Contents
• Text Books
• Assessment Criteria
• Basic Definitions
• Linguistics Levels (Disciplines)
• Applications of Natural Language Processing
• Categories of Knowledge of Language
Text Books
• Daniel Jurafsky and James H. Martin, Speech and Language
Processing, Prentice-Hall, Second Edition, 2009.
Assessment Criteria
• Course Activities 30
o Quizzes 20
o Project 7
o Sections Attendance 3
o Lectures Bonus
• Mid-Term Exam 20
• Final Exam 50
Languages
• Languages can be classified as natural languages or
artificial languages.
Computational Linguistics (CL)
• Linguistics is the study and the description of human
languages.
• Computational Linguistics (CL) is the part of the science of
human language that uses computers to aid observation of
or experiment with language.
o The use of computers in the study of languages.
o The goal of CL is to design mathematical models of language
structures that enable a computer to automate language
processing.
• Computational linguistics can be considered as:
o The formalization of linguistic theories and models or their
implementation in a machine.
o i.e. to develop new linguistic theories with the aid of a
computer.
Linguistics Levels (Disciplines)
• Linguistics has been divided into disciplines or
levels, which go from sounds to meaning.
Linguistics Levels (Disciplines) (Cont.)
• Phonetics: Concerns the production and perception of acoustic
sounds that form the speech signal. In each language, sounds can
be classified into a finite set of phonemes.
• Morphology: The second level concerns the words. Morphology is
the study of the structure and the forms of a word.
o The word set of a language is called a lexicon. Usually a lexicon consists of
root words.
o Words can appear under several forms, for instance, singular and plural
noun forms (e.g. book and books) or different verb forms (e.g. walk and walking).
• Syntax: The study of the order of words in a sentence and their
relationships. Syntax defines word categories and functions.
o Subject, verb, object is a sequence of functions that corresponds to a
common order in many European languages including English and French.
o Parsing determines the structure of a sentence.
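The relation between a lexicon of root words and the several forms a word can take (the morphology level above) can be illustrated with a minimal sketch. The lexicon, the suffix list, and the function name `root_of` are illustrative assumptions, not part of the lecture; real morphological analyzers handle many more rules and exceptions.

```python
# Toy suffix stripping: maps an inflected form to a candidate root word.
# LEXICON and SUFFIXES are tiny, hypothetical examples.
LEXICON = {"book", "walk"}
SUFFIXES = ("ing", "s")


def root_of(word):
    """Return the lexicon root for `word`, or None if no rule applies."""
    word = word.lower()
    if word in LEXICON:
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if candidate in LEXICON:
                return candidate
    return None


for form in ("book", "books", "walk", "walking"):
    print(form, "->", root_of(form))
```

Under these assumptions, "books" and "walking" map back to the roots "book" and "walk", while a word outside the toy lexicon yields None.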
Linguistics Levels (Disciplines) (Cont.)
• Semantics: Considers the meaning of words and sentences.
• Pragmatics: The meaning of words and sentences in specific situations
(the study of how the context of a sentence contributes to its meaning).
o Pragmatics is semantics restricted to a specific context and relies on facts that
are external to the sentence.
o These facts contribute to the inference of a sentence’s meaning or prove its
truth or falsity.
• Discourse: The production of language consists of a stream of sentences
that are linked together to form a discourse.
o This discourse is usually aimed at other people who can answer, ideally
through a dialogue.
o A dialogue is a set of linguistic interactions that enables the exchange of
information and sometimes eliminates misunderstandings or ambiguities.
What is Natural Language Processing
(NLP)?
• Natural language processing (NLP) is a field of computer
science and artificial intelligence concerned with the
interactions between computers and human languages.
o In particular, how to program computers to process and analyze
large amounts of natural language data.
o The goal is a computer capable of "understanding" the contents
of documents, including the contextual nuances of the language
within them.
Applications of Natural Language
Processing
• Text-Based Applications.
• Dialogue-Based Applications.
Text-Based Applications
• Text-based applications involve the “processing of written
text”, such as books, newspapers, reports, manuals, e-mail
messages, etc.
• Examples of text-based applications:
o Finding appropriate documents on certain topics from a
database of texts.
e.g. finding relevant books in a library.
o Machine Translation (MT) from one language to another.
e.g. Translating documents from one language to another.
o Extracting information from messages or articles on
certain topics.
o Summarizing texts for certain purposes.
o Web-based question answering.
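The first application above, finding documents on a given topic, can be sketched with a very simple word-overlap ranking. This is only an illustration under stated assumptions: the `LIBRARY` contents, the function names, and the scoring rule (count of shared words) are hypothetical, and practical retrieval systems use far richer models.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return {w.strip(".,;:!?\"'").lower() for w in text.split()} - {""}


def rank_documents(query, documents):
    """Return titles of matching documents, best word overlap first."""
    query_words = tokenize(query)
    scored = []
    for title, text in documents.items():
        overlap = len(query_words & tokenize(text))
        if overlap > 0:
            scored.append((overlap, title))
    # Sort by descending overlap, then alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


# Hypothetical mini "library": title -> short description.
LIBRARY = {
    "Speech and Language Processing": "natural language processing speech recognition parsing",
    "Cooking Basics": "recipes for soups and bread",
    "Intro to Databases": "query processing and storage",
}

print(rank_documents("natural language processing", LIBRARY))
```

Documents that share no word with the query are dropped, which mirrors the idea of returning only "appropriate" documents.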
Dialogue-Based Applications
• Dialogue-based applications involve human-machine
communication.
• Examples of dialogue-based applications:
o Question-answering systems, where natural language is
used to query a database.
o Automated customer service over the telephone.
e.g. to perform banking transactions or order items from a catalogue.
Example (1) (Cont.)
• Consider each example below as a candidate for the
initial sentence of the book concerning natural
language processing:
o Green frogs have large noses.
Syntax?
Semantics?
Pragmatics?
Example (1) (Cont.)
• Consider each example below as a candidate for the
initial sentence of the book concerning natural
language processing:
o الضفادع الخضراء لها أنوف كبيرة. (Arabic for "Green frogs have large noses.")
Example (1) (Cont.)
• Consider each example below as a candidate for the initial
sentence of the book concerning natural language
processing:
o Green ideas have large noses.
Syntax?
Semantics?
Pragmatics?
Example (1) (Cont.)
• Consider each example below as a candidate for the initial
sentence of the book concerning natural language
processing:
o الأفكار الخضراء لها أنوف كبيرة. (Arabic for "Green ideas have large noses.")
Syntax?
Semantics?
Pragmatics?
Example (1) (Cont.)
• Consider each example below as a candidate for the
initial sentence of the book concerning natural
language processing:
o Large have green ideas nose.
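The syntax question in Example (1) can be explored with a toy grammar check. The sketch below assumes a hypothetical word-category lexicon (`CATEGORY`) and a single sentence pattern, noun phrase + verb + noun phrase, where a noun phrase is any number of adjectives followed by a noun. It checks word order only; it says nothing about semantics or pragmatics and ignores agreement.

```python
import re

# Hypothetical lexicon: word -> category (A = adjective, N = noun, V = verb).
CATEGORY = {
    "green": "A", "large": "A",
    "frogs": "N", "ideas": "N", "noses": "N", "nose": "N",
    "have": "V",
}

# Noun phrase = zero or more adjectives then a noun; sentence = NP V NP.
SENTENCE_PATTERN = re.compile(r"A*NVA*N")


def is_syntactically_valid(sentence):
    """Check word order against the toy pattern; unknown words fail."""
    words = sentence.lower().rstrip(".").split()
    try:
        tags = "".join(CATEGORY[w] for w in words)
    except KeyError:
        return False
    return SENTENCE_PATTERN.fullmatch(tags) is not None


for s in ("Green frogs have large noses.",
          "Green ideas have large noses.",
          "Large have green ideas nose."):
    print(s, "->", is_syntactically_valid(s))
```

Under this toy grammar the first two sentences are accepted and the third is rejected, which shows that a sentence can satisfy the word-order rules even when its meaning is odd.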
Example (3)
• Suppose the person uttering the following
sentences is responding to a complaint that the car
is too cold:
a. The heater are on.
b. The tires are brand new.
• Classify these sentences along each of the following
dimensions:
i. Syntactically correct or not.
ii. Semantically correct or not.
iii. Pragmatically correct or not.
Example (3) (Cont.)
• Suppose the person uttering the following
sentences is responding to a complaint that the car
is too cold:
a. The heater are on. (The correct syntax is “The heater is on.”)
i. Syntactically incorrect.
ii. Semantically correct.
iii. Pragmatically correct.