Natural Language Processing: Dr. Abdulfetah A.A

This document provides an outline for a course on Natural Language Processing (NLP). The course will cover basic NLP techniques and modeling, syntactic and semantic parsing, and applications like information extraction and machine translation. It will involve 6 units over 3 main topics. Assessment will include assignments, a midterm, and a final exam. The document also discusses why NLP is an important and challenging field due to the complexity of human language.

Uploaded by

Jemal Yaya

Natural Language Processing

Dr. Abdulfetah A.A

1
Course outline (tentative)
• Basics of Natural Language Processing and language modeling techniques
• Syntactic and semantic parsing
• NLP applications: Information Extraction & Machine Translation
Units
• Units 1 and 2: Basics and modeling techniques
• Units 3 and 4: Syntactic and semantic parsing
• Units 5 and 6: Information Extraction & Machine Translation
2
References

• Jurafsky, D., and Martin, J.H. (2008). Speech and Language Processing, 2nd edn. Prentice Hall.
– New draft: https://web.stanford.edu/~jurafsky/slp3/
– References to the online draft are given where possible, as J&M3
• Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python. O'Reilly.
• Web Resources
3
Assessment types

• The course assessment will include at least the following components:
– Assignments: 40%
– Midterm exam: 20%
– Final exam: 40%

4
Natural Language Processing
• Topics for today
– General introduction to NLP
• Why study NLP?

5
Natural language and NLP
• “Natural” language
– Amharic, English, Chinese, German, Japanese, etc.
• Ultimate goal
– To build computer systems that use natural language as well as humans do
• Immediate goal
– To build computer systems that can process text and speech more intelligently

6
Why NLP
• Why do we need computers to understand (or generate) human language?
– There are huge amounts of text data on the Internet.
– We need applications that process (understand, retrieve, translate, summarize, …) these large amounts of text.
– People expect interactive agents to communicate in natural language, e.g., dialogue systems.
7
Dialogue systems

• Require both understanding and generation
– Dave: Open the pod bay doors, HAL.
– HAL: I'm sorry Dave, I'm afraid I can't do that.
– Dave: What's the problem?
– HAL: I think you know what the problem is just as well as I do.

8
Knowledge of Language (steps in NLU)
• Task: given a sentence, produce a representation that a computer can use, for example for question answering

9
Why NLP
NLP applications:
• Classification: assign documents to categories (e.g., spam filtering).
• Information Retrieval: find documents relevant to a given query.
• Information Extraction: extract useful information from documents, e.g., details from resumes, or the names of people and the events they participate in.
• Machine Translation: translate text from one human language into another.
• Question Answering: find answers to natural language questions in a text collection or database.
• Summarization: produce a readable summary, e.g., of today's news about oil.
• Sentiment Analysis: identify people's opinions on a subject.
• Speech Processing: e.g., booking a hotel over the phone.
• Spelling checkers, grammar checkers, auto-completion, and more.

10
Why NLP is complex/hard
• Natural language is extremely rich in form and structure, constantly changing, and ambiguous.
• It is not obvious how to represent meaning, or which surface structures map to which meaning structures.
• One input can mean many different things. Ambiguity can arise at different levels:
– Lexical (word-level) ambiguity -- a word has different meanings
– Syntactic ambiguity -- a sentence can be parsed in different ways
– Interpreting partial information -- e.g., how to interpret pronouns
– Contextual information -- the context of a sentence may affect its meaning
• Many inputs can mean the same thing.
• The interaction among components of the input is not clear.

11
Linguistic Levels of Ambiguity/Analysis
• Phonology: sounds / letters / pronunciation
– two, too
• Morphology: the structure of words
– child – children, book – books
• Syntax: grammar, how sequences of words are structured
– I saw the man with the telescope
• Semantics: the meaning of strings
– table as data structure, table as furniture
• Dealing with all of these levels of ambiguity makes NLP difficult

12
Knowledge of Language
• Phonetics and Phonology
– Phonetics: computer processing of the physical sounds of language, performed using signal-processing methods. It can be divided into speech generation and speech analysis.
– Phonology: linguistic processing of the sounds of spoken language. It works at a higher level than phonetics and is mainly concerned with the elementary sound units of a language, called phonemes.

13
Knowledge of Language
• Phonetics and phonology: speech sounds,
their production, and the rule systems that
govern their use
• Morphology: words and their composition
from more basic units
– Cat, cats (inflectional morphology)
– Child, children
– Friend, friendly (derivational morphology)
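The difference between the two kinds of morphology can be sketched in a few lines of Python. This is only an illustration: the tiny irregular-plural table and the function names `pluralize` and `derive_adjective` are assumptions made for this example, not part of the course material, and a real analyzer would use a full lexicon or a finite-state transducer.

```python
# Hypothetical toy lexicon of irregular plurals (illustration only).
IRREGULAR_PLURALS = {"child": "children", "man": "men", "foot": "feet"}


def pluralize(noun: str) -> str:
    """Inflectional morphology: same word class, different grammatical form."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def derive_adjective(noun: str) -> str:
    """Derivational morphology: the suffix -ly turns a noun into an adjective."""
    return noun + "ly"


print(pluralize("cat"), pluralize("child"), derive_adjective("friend"))
```

The irregular table is consulted before the regular rules, which is why child maps to children rather than childs.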

14
Knowledge of Language
• Syntax: is concerned with sentence structure, i.e., the rules for arranging words within a sentence.
– One of the main tasks is parsing: producing a parse tree for a given input sentence.
– Grammar: a set of rules for deriving syntactic structure
• Semantics: is concerned with interpreting the literal meaning of language up to the sentence level.
– Lexical semantics: the semantics of words
– Building semantic representations of larger structures
– Methodology: neural networks, FOPC (first-order predicate calculus), unification

15
Knowledge of Language
• Pragmatics: is concerned with the intended, practical meaning of language.
– Example: “Could you print this document?” (a request to print, not a question about ability)
• Discourse: is concerned with language structure beyond the sentence level, such as inter-sentence relations, references, and document structure.
– Examples: turn taking, speech acts
– Sue took the trip to New York. She had a great time there.
• Sue/she;
• New York/there;
• took/had (time)

16
Ambiguity
• Some interpretations of “I made her duck”:
1. I cooked duck for her.
2. I cooked duck belonging to her.
3. I created a toy duck which she owns.
4. I caused her to quickly lower her head or body.
5. I used magic and turned her into a duck.
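• duck – morphologically and syntactically ambiguous: noun or verb.
• her – syntactically ambiguous: dative or possessive.
• make – semantically ambiguous: cook or create.
• make – syntactically ambiguous: transitive (one object, as in 2), ditransitive (two objects, as in 5), or taking a direct object and a verb (as in 4).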

17
Resolve Ambiguities
• We will introduce models and algorithms to resolve ambiguities at different levels.
• Part-of-speech tagging -- deciding whether duck is a verb or a noun.
• Word-sense disambiguation -- deciding whether make means create or cook.
• Lexical disambiguation -- part-of-speech tagging and word-sense disambiguation are two important kinds of lexical disambiguation.
• Syntactic disambiguation -- deciding whether her and duck belong to one constituent or to two is an example of syntactic ambiguity, and can be addressed by probabilistic parsing.
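As a rough sketch of how part-of-speech tagging can decide between noun and verb for duck, the following Python runs the Viterbi algorithm over a tiny hidden Markov model. Everything numeric here is an assumption: the tag set, the `START`, `TRANS`, and `EMIT` tables, and their probabilities are invented by hand for illustration, whereas a real tagger would estimate them from a tagged corpus.

```python
TAGS = ["PRON", "VERB", "DET", "NOUN"]

# Hand-set, hypothetical probabilities (not estimated from any corpus).
START = {"PRON": 0.6, "DET": 0.2, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "PRON": {"VERB": 0.7, "NOUN": 0.1, "DET": 0.1, "PRON": 0.1},
    "VERB": {"DET": 0.4, "PRON": 0.3, "NOUN": 0.2, "VERB": 0.1},
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"VERB": 0.5, "NOUN": 0.2, "PRON": 0.2, "DET": 0.1},
}
EMIT = {
    "PRON": {"I": 0.5, "her": 0.5},
    "VERB": {"made": 0.6, "duck": 0.4},
    "DET": {"her": 1.0},
    "NOUN": {"duck": 1.0},
}


def viterbi(words):
    """Return the most probable tag sequence for a non-empty list of words."""
    # best[t][tag] = (probability of the best path ending in tag, previous tag)
    best = [{tag: (START.get(tag, 0.0) * EMIT[tag].get(words[0], 0.0), None)
             for tag in TAGS}]
    for word in words[1:]:
        column = {}
        for tag in TAGS:
            prob, prev = max(
                (best[-1][p][0] * TRANS[p].get(tag, 0.0), p) for p in TAGS
            )
            column[tag] = (prob * EMIT[tag].get(word, 0.0), prev)
        best.append(column)
    # Start from the best final tag and follow the back-pointers.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi("I made her duck".split()))
print(viterbi("I duck".split()))
```

Under these made-up numbers, duck is tagged as a noun after her and as a verb directly after I, which shows how context drives the decision.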

18
Resolve Ambiguities (cont.)
I made her duck
– Parse 1 (two objects): [S [NP I] [VP [V made] [NP her] [NP duck]]]
– Parse 2 (one object): [S [NP I] [VP [V made] [NP [DET her] [N duck]]]]
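The two analyses of this sentence can also be found mechanically. The sketch below counts parses with a CKY-style chart over a toy grammar. The grammar, the lexicon, and the helper symbol `X` (used to binarize VP → V NP NP) are assumptions written for this example only, covering just the two trees shown here.

```python
from collections import defaultdict

# Hypothetical toy grammar in binary form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),    # made [her duck]
    ("VP", "X", "NP"),    # [made her] [duck], with X standing for V NP
    ("X", "V", "NP"),
    ("NP", "DET", "N"),
]
LEXICON = {
    "I": {"NP"},
    "made": {"V"},
    "her": {"NP", "DET"},
    "duck": {"N", "NP"},
}


def count_parses(words, start="S"):
    """Count the distinct parse trees for words under the toy grammar."""
    n = len(words)
    # chart[i, j, label] = number of trees with root label spanning words[i:j]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for label in LEXICON[word]:
            chart[i, i + 1, label] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, start]


print(count_parses("I made her duck".split()))
```

Each chart cell is filled once and reused by every larger span that contains it, so the ambiguity is counted without enumerating the trees one by one.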

19
Models to Represent Linguistic Knowledge

• We will use certain formalisms (models) to represent the required linguistic knowledge.
• State machines -- FSAs, FSTs, HMMs, ATNs, RTNs
• Formal rule systems -- context-free grammars, unification grammars, probabilistic CFGs
• Models of uncertainty -- Bayesian probability theory
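To give a feel for the simplest of these state machines, here is a minimal sketch of a deterministic finite-state automaton (FSA) in Python. The example language, strings of the form baa…a! with at least two a's, is chosen for illustration and is not taken from these slides; the state numbering and the `accepts` function are likewise assumptions.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of additional a's
    (3, "!"): 4,
}
ACCEPTING = {4}


def accepts(string: str) -> bool:
    """Return True if the FSA ends in an accepting state after reading string."""
    state = 0
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:      # no transition defined: reject
            return False
    return state in ACCEPTING


print(accepts("baaa!"), accepts("ba!"))
```

The machine keeps only its current state, which is what makes FSAs cheap to run and also what limits the languages they can describe.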

20
Algorithms to Manipulate Linguistic Knowledge

• Algorithms are used to manipulate the representations and produce the desired behavior
– Choosing among possibilities and combining pieces
• Many of the algorithms that we’ll study will turn out to be transducers: algorithms that take one kind of structure as input and output another.
• In particular:
– State-space search
• To manage the problem of making choices during processing when we lack the information needed to make the right choice
– Dynamic programming
• To avoid having to redo work during the course of a state-space search
– Machine learning (classifiers, EM, etc.)

21
State Space Search

• States represent pairings of partially processed inputs with partially constructed answers
– E.g. sentence + partial parse tree
• The goal is to arrive at the right/best structure after having processed all the input.
– E.g. the best parse tree spanning the sentence
• As with most interesting AI problems, the spaces are too large and the criteria for “bestness” are difficult to encode, so we rely on heuristics and probabilities.
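A minimal sketch of state-space search for parsing, assuming a toy grammar invented for this example: each state pairs an input position (how much of the sentence has been processed) with the grammar symbols still to be matched (standing in for the partially built tree). The agenda is used as a stack, so the search is depth-first. The names `GRAMMAR`, `LEXICON`, and `search` are hypothetical.

```python
# Hypothetical toy grammar (no left recursion, so depth-first search terminates).
GRAMMAR = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("V", "NP", "NP")],
    "NP": [("PRON",), ("DET", "N"), ("N",)],
}
LEXICON = {"I": {"PRON"}, "made": {"V"}, "her": {"PRON", "DET"}, "duck": {"N"}}


def search(words):
    """Return (number of goal states reached, number of states explored)."""
    agenda = [(0, ("S",))]          # state = (input position, pending symbols)
    goals = explored = 0
    while agenda:
        position, pending = agenda.pop()
        explored += 1
        if not pending:
            # Goal test: nothing left to match and all input consumed.
            if position == len(words):
                goals += 1
            continue
        first, rest = pending[0], pending[1:]
        if first in GRAMMAR:
            # A choice point: try every way of expanding the symbol.
            for expansion in GRAMMAR[first]:
                agenda.append((position, expansion + rest))
        elif position < len(words) and first in LEXICON.get(words[position], ()):
            agenda.append((position + 1, rest))
    return goals, explored


print(search("I made her duck".split()))
```

Every expansion is a choice made without knowing whether it will succeed, and the dead ends explored along the way are the cost that heuristics and probabilities aim to reduce.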

22
Dynamic Programming

• Don’t do the same work over and over.
• Avoid this by building and reusing solutions to sub-problems that must be invariant across all parts of the space.
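A standard illustration of this idea is minimum edit distance between two strings, sketched below in Python. The choice of example and the unit cost for insertion, deletion, and substitution are assumptions made here, not something these slides specify. Each table cell holds the solution to one sub-problem and is computed exactly once.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions (cost 1 each)."""
    rows, cols = len(source) + 1, len(target) + 1
    # table[i][j] = distance between source[:i] and target[:j]
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i                     # delete all of source[:i]
    for j in range(cols):
        table[0][j] = j                     # insert all of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,                  # deletion
                table[i][j - 1] + 1,                  # insertion
                table[i - 1][j - 1] + substitution,   # substitution or match
            )
    return table[-1][-1]


print(edit_distance("intention", "execution"))
```

A naive recursive version would recompute the same prefix pairs many times; the table reduces the work to one step per pair of prefixes.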

23
Reading
• SLP2, Chapter 1:
– 1.4 Language, Thought, and Understanding
– 1.5 The State of the Art
– 1.6 History
Next lecture: Morphological analysis with FSTs
Unit 2: Language models

24