My M-7

Uploaded by

Ananya Sinha
Copyright
© All Rights Reserved
Module 7: Communicating, Perceiving and Acting
Communication - Fundamentals of Language - Probabilistic Language Processing - Information Retrieval - Information Extraction - Perception - Image Formation - Object Recognition

Contents (Part-1)
• Language Models
  • N-gram character and word models
• Information Retrieval
  • BM25 word-based relevance; improving recall using case folding, stemming, synonyms, metadata, links
  • PageRank – link-based scoring
  • HITS algorithm – word- and link-based scoring
• Question Answering
  • ASKMSR process – question classification, query formation and retrieval of n-grams, filtering n-grams based on expected answer type, filtering n-grams that were present in the question, combining high-scoring n-grams for longer answers

Communication - NLP

• Language Models
• Information Retrieval

Language Model
• A language is a set of strings.
• Any legal sentence of a language should adhere to its syntactic rules (grammar) and express a meaning as per its semantic rules (meanings).
• Natural languages are ambiguous: a single word can have multiple meanings, and a single sentence can have multiple meanings.
  • Example: "The man saw the girl with the telescope." "He saw her duck."
• Meaning of a sentence/word – probability distribution over possible meanings.

Applications of Language Models

• A few NLP applications:
  • Information Retrieval
  • Question Answering

Probabilistic Models

• N-gram character models
• N-gram word models

N-gram Character Models
• Probability distribution over N-letter sequences – N-gram model
• An N-gram model is a Markov chain of order N-1
• Example: 3-gram (trigram) model
  • A trigram model over 100 characters requires 100 x 100 x 100 entries (the model stores P(c_i | c_{i-2}, c_{i-1}) for all combinations of 100 characters)
• The probabilities are calculated from a corpus
• Application of character models – language identification
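
The character trigram model and its use for language identification can be sketched as follows. This is a minimal illustration, not a production classifier: the two tiny training strings, the function names, and the use of add-one smoothing with an assumed alphabet size of 100 are all assumptions made for the example.

```python
import math
from collections import Counter

def train_char_trigrams(text):
    """Count character trigrams and their two-character contexts."""
    padded = "  " + text.lower()  # two spaces so the first letters have a context
    trigrams = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    contexts = Counter(padded[i:i + 2] for i in range(len(padded) - 2))
    return trigrams, contexts

def log_prob(text, model, alphabet_size=100):
    """Log-probability of text under the trigram model (add-one smoothing)."""
    trigrams, contexts = model
    padded = "  " + text.lower()
    total = 0.0
    for i in range(len(padded) - 2):
        tri = padded[i:i + 3]
        # Smoothed estimate of P(c_i | c_{i-2}, c_{i-1})
        p = (trigrams[tri] + 1) / (contexts[tri[:2]] + alphabet_size)
        total += math.log(p)
    return total

def identify_language(text, models):
    """Pick the language whose model gives the text the highest probability."""
    return max(models, key=lambda lang: log_prob(text, models[lang]))

# Toy corpora (assumed); real models are trained on large corpora.
models = {
    "english": train_char_trigrams("the cat sat on the mat and the dog ate the food"),
    "german": train_char_trigrams("der hund und die katze sind in dem haus mit der maus"),
}
print(identify_language("the dog", models))
print(identify_language("der hund", models))
```

Smoothing matters here because an unseen trigram would otherwise give the whole text probability zero.
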

N-gram Word Models
• Analyse sequences of words rather than characters
• The set of words forms the vocabulary
• When a corpus is processed to generate an N-gram model, out-of-vocabulary words have to be handled by adding special tags to the vocabulary
  • Example: an <UNKNOWN> tag can account for out-of-vocabulary words in the corpus, an <EMAIL> tag for out-of-vocabulary character sequences, and a <NUM> tag for out-of-vocabulary numeric sequences

Example – Bigram Word Model
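
A bigram word model with out-of-vocabulary tags might look like the sketch below. The vocabulary, the toy corpus, and the helper names are assumptions for illustration; the probabilities are plain maximum-likelihood estimates with no smoothing.

```python
from collections import Counter

def tokenize(text, vocabulary):
    """Lowercase, split on whitespace, and map out-of-vocabulary tokens to tags."""
    tokens = []
    for word in text.lower().split():
        if word in vocabulary:
            tokens.append(word)
        elif word.isdigit():
            tokens.append("<NUM>")      # out-of-vocabulary numeric sequence
        else:
            tokens.append("<UNKNOWN>")  # any other out-of-vocabulary word
    return tokens

def train_bigrams(tokens):
    """Count bigrams and the words that have a successor."""
    unigrams = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

vocabulary = {"the", "cat", "sat", "on", "mat"}  # assumed toy vocabulary
tokens = tokenize("the cat sat on the mat the dog sat on 42 mats", vocabulary)
unigrams, bigrams = train_bigrams(tokens)
print(tokens)
print(bigram_prob("sat", "on", unigrams, bigrams))
print(bigram_prob("the", "cat", unigrams, bigrams))
```

Because "dog" and "mats" are both mapped to <UNKNOWN>, the model can still assign a probability to sentences containing words it has never seen.
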

Information Retrieval – Characterizations
• A corpus of documents.
• Queries posed in a query language.
• Result set – documents relevant to the query.
• Presentation of the result set.

Finding Relevance of Documents

Tables and Indices for BM25 (IR)
• Term frequency – table of words x documents containing the frequency of each word in each document.
• Document frequency – for each word, the count of documents containing the word.
• For improved retrieval performance:
  • The BM25 function is computed over the vocabulary for the documents in the corpus – index/hit list.
  • Given a query, get the scores of the documents from the hit list after intersecting with the query words.
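
The term-frequency and document-frequency tables feed directly into the BM25 score. The sketch below uses the common Okapi BM25 formulation; the slides do not state the formula, so the parameter values k = 2.0 and b = 0.75, the whitespace tokenizer, and the toy documents are assumptions.

```python
import math
from collections import Counter

def build_index(docs):
    """Return per-document term frequencies and corpus-wide document frequencies."""
    tf = [Counter(doc.lower().split()) for doc in docs]
    df = Counter()
    for counts in tf:
        df.update(counts.keys())  # each document counts once per word
    return tf, df

def bm25_scores(query, docs, k=2.0, b=0.75):
    """Score every document against the query with BM25."""
    tf, df = build_index(docs)
    n_docs = len(docs)
    lengths = [sum(counts.values()) for counts in tf]
    avg_len = sum(lengths) / n_docs
    scores = []
    for counts, length in zip(tf, lengths):
        score = 0.0
        for term in query.lower().split():
            if counts[term] == 0:
                continue  # only documents containing the term receive credit
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = counts[term] + k * (1 - b + b * length / avg_len)
            score += idf * counts[term] * (k + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the couch is red",
    "a sofa and a chair",
    "red chair by the door",
    "blue sky today",
]
print(bm25_scores("couch", docs))
```

With this IDF form, a word that occurs in more than half of the documents gets a negative weight, which is one reason practical systems often clamp or shift the IDF term.
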

IR System Evaluation Measures – Precision and Recall
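
As a small worked example of the two measures: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The document identifiers below are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set: 4 documents returned, 3 documents actually relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(precision, recall)
```

Here two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall 2/3).
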

Improving the BM
• Case folding – converting cases to a common case (COUCHES -> couches)
• Stemming – reducing different forms of a word to its base form (couches -> couch)
• Synonyms – using words with the same meaning (couch, sofa)
• Improves recall but affects precision
• Metadata
  • Using data outside of the document
  • Example: keywords, publication data of a research article
• Link – links between documents are a crucial source of information
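
The first three techniques can be sketched as a query-normalization step. The suffix-stripping rule below is a deliberately crude stand-in for a real stemmer, and the synonym table is a hypothetical one-entry example.

```python
# Hypothetical synonym table; a real system would use a thesaurus or learned synonyms.
SYNONYMS = {"couch": {"sofa"}, "sofa": {"couch"}}

def case_fold(word):
    """Convert a word to a common (lower) case."""
    return word.lower()

def naive_stem(word):
    """Strip a plural suffix; a crude stand-in for a real stemming algorithm."""
    for suffix in ("es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def expand_query(query):
    """Return the set of normalized query terms plus their synonyms."""
    terms = set()
    for word in query.split():
        stem = naive_stem(case_fold(word))
        terms.add(stem)
        terms |= SYNONYMS.get(stem, set())
    return terms

print(expand_query("COUCHES"))
```

The same crude rule turns "houses" into "hous", which shows how aggressive normalization can match unrelated documents: recall goes up while precision can go down.
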

Approach for Relevance in Web
• The IBM website should be ranked higher for the query term "IBM" than a document that merely contains "IBM" with the highest frequency
• Solution: link analysis
• Link analysis algorithms
  • PageRank (query-independent ranking algorithm for the pages)
  • HITS algorithm (query-dependent ranking algorithm)

PageRank Algorithm – Web IR – Algorithm 1
• One of the original ideas that set Google's search apart from other search engines.
• A page with more inlinks from high-quality websites gets a higher rank (inlinks – links to the page).
• High-quality pages – pages with more inlinks.

Example – Pages and their PageRanks
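
PageRank can be computed iteratively, as in the sketch below. The four-page link graph is a made-up example, and the damping factor 0.85 and the way dangling pages are handled are common choices, not values taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.

    Every link target is assumed to also appear as a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # Each outlink passes on an equal share of this page's rank.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical web graph: C receives links from A, B and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C ends up ranked highest because it has the most inlinks, and A comes second because its single inlink comes from the high-ranking C.
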

HITS Algorithm – Web IR – Algorithm 2
• Hyperlink-Induced Topic Search (HITS), also known as Hubs and Authorities, is a link analysis algorithm
• HITS vs. PageRank
  • HITS is query dependent
  • Starts with the intersection of the hit list with the query words (the relevant set) and further adds pages based on links
• Hub score, authority score – scores of the pages, calculated iteratively, similar to the PageRank score
  • Authority score – degree to which other pages in the relevant set point to it
  • Hub score – degree to which it points to other authoritative pages in the relevant set
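
The iterative hub and authority update can be sketched as below. The sketch assumes the relevant set has already been built from the query, so `links` (a hypothetical four-page graph) contains only pages in that set; the iteration count and the normalization by Euclidean length are assumptions.

```python
import math

def hits(links, iterations=20):
    """links maps each page in the relevant set to the pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for page, targets in links.items():
            for t in targets:
                auth[t] += hub[page]
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical relevant set: h1 and h2 are link pages, a1 and a2 are content pages.
links = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
hub, auth = hits(links)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

Unlike PageRank, these scores must be recomputed for every query, because the relevant set changes with the query words.
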

Question Answering
• Yet another IR application
• Input is a query/question; the response is short – a sentence or a phrase
• Question answering requires natural language understanding, such as pronoun reference and other semantic facts
  • Example: "John Wilkes Booth altered history with a bullet. He will forever be known as the man who ended Abraham Lincoln's life."
• ASKMSR – QA engine that works on the basis of an N-gram model

Question Answering (Example) – ASKMSR approach – question classification, query formation, n-gram retrieval
• Questions are classified into one of 15 categories and rewritten as queries to a search engine
  • Example: "Who killed Abraham Lincoln?" is rewritten as [* killed Abraham Lincoln] and [Abraham Lincoln was killed by *], with an exact phrase match given more weight than a word match such as the query [Abraham OR Lincoln OR killed]
  • Highly ranked n-grams retrieved: John Wilkes Booth, Abraham Lincoln, assassination of, Ford's Theatre
• Retrieved n-grams are filtered by the expected answer type, based on the question type
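
The n-gram retrieval and filtering steps might be sketched as follows. This is a loose illustration of the idea, not the ASKMSR implementation: the snippets and their weights are invented, the stopword list is an assumption, the expected-answer-type filter is omitted (it would need a named-entity recognizer), and preferring longer n-grams on ties is only a crude stand-in for combining high-scoring n-grams into longer answers.

```python
from collections import Counter

STOPWORDS = frozenset({"the", "was", "by", "a", "of", "in"})  # assumed list

def ngrams(tokens, max_n=3):
    """Yield all 1- to max_n-grams of a token list as tuples."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def mine_answers(question, snippets):
    """Score n-grams from weighted snippets, dropping those that repeat the question."""
    q_words = {w.strip("?").lower() for w in question.split()}
    scores = Counter()
    for snippet, weight in snippets:
        for gram in ngrams(snippet.lower().split()):
            if any(w in q_words for w in gram):
                continue  # filter n-grams that were present in the question
            if all(w in STOPWORDS for w in gram):
                continue  # n-grams made only of stopwords carry no answer
            scores[" ".join(gram)] += weight
    # Highest score first; on ties prefer the longer n-gram.
    return sorted(scores.items(),
                  key=lambda kv: (kv[1], len(kv[0].split())), reverse=True)

# Hypothetical snippets; exact-phrase matches get a higher weight (5) than word matches (1).
snippets = [
    ("john wilkes booth killed abraham lincoln", 5),
    ("abraham lincoln was killed by john wilkes booth", 5),
    ("lincoln killed at ford's theatre", 1),
]
print(mine_answers("Who killed Abraham Lincoln?", snippets)[0])
```
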

Contents (Part-2)
• Information Extraction (IE) Overview.
• Domain-Specific vs. General Domain IE.
• Approaches to Information Extraction.
• Finite State Automata (FSA) in IE.
• Probabilistic Models – Hidden Markov Models (HMMs).
• Conditional Random Fields (CRFs).
• Ontology Extraction from Large Corpora.
• Automated Template Construction.
• Machine Reading – TEXTRUNNER.
• Attribute-Based Extraction.
• Template-Based Extraction.
• FASTUS System Stages.
• Performance of FSA-Based Approaches.
• HMM vs. FSA in Information Extraction.

Contents (Part-3)
• Perception.
• Challenges in vision based perception.
• Approaches for addressing the challenges:
  • Feature Extraction.
  • Recognition.
  • Reconstruction.
• Image Formation & Processing.
• Object Recognition.

Perception
• Perception provides agents with information about the world they inhabit by interpreting the response of sensors.
• A sensor measures some aspect of the environment in a form that can be used by an agent program.
• Sensors cover the modalities of human vision, hearing and touch, in addition to those not available to humans – radio, infrared, GPS, wireless signals.
• Sensing – active / passive
  • Active – send out signals and measure the reflection

Sensor Model
• When the world is only partially observable through its sensors, a model-based agent has a sensor model
• Sensor model –
  • Probability distribution P(E | S) over the evidence that its sensors provide, given a state of the world. Bayes' rule can then be used to update the estimation of the state.
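
The Bayes' rule update can be written out for a discrete state as P(S | E) = P(E | S) P(S) / P(E). The sketch below applies it to a hypothetical door sensor; the states, the evidence value, and all probabilities are invented for the example.

```python
def bayes_update(prior, sensor_model, evidence):
    """Return P(S | E=evidence) from the prior P(S) and the sensor model P(E | S)."""
    unnormalized = {s: sensor_model[s][evidence] * p for s, p in prior.items()}
    total = sum(unnormalized.values())  # this is P(E=evidence)
    return {s: v / total for s, v in unnormalized.items()}

# Hypothetical example: is a door open or closed?
prior = {"open": 0.5, "closed": 0.5}
sensor_model = {
    "open": {"sees_open": 0.8, "sees_closed": 0.2},    # P(E | S=open)
    "closed": {"sees_open": 0.1, "sees_closed": 0.9},  # P(E | S=closed)
}
posterior = bayes_update(prior, sensor_model, "sees_open")
print(posterior)
```

After one "sees_open" reading the belief that the door is open rises from 0.5 to 8/9, and the posterior can serve as the prior for the next reading.
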

Vision
• The sensor model of vision has two components
  • Object model
  • Rendering model
• Object model
  • Precise/vague description of objects in the real world
• Rendering model
  • Describes the physical, geometric, and statistical processes that produce the stimulus from the world.
  • Accurate but ambiguous (nature of scene and illumination conditions)

Challenge in vision based perception
• Visual observations are extraordinarily rich, both in the detail they can reveal and in the sheer amount of data they produce.
• Which aspects of the rich visual stimulus should be considered to help the agent make good action choices, and which aspects should be ignored?
• Vision, and all perception, serves to further the agent's goals, not as an end in itself.

Three Approaches for the Challenge
• Feature extraction approach
  • Emphasizes simple computations applied directly to the sensor observations.
• Recognition approach
  • An agent draws distinctions among the objects it encounters based on visual and other information. Recognition could mean labeling each image with a yes or no as to whether it contains food that we should forage, or contains grandma's face.
• Reconstruction approach
  • An agent builds a geometric model of the world from an image or a set of images.

Image Formation & Processing, Object Recognition
• Image formation models
  • Pinhole camera, lens, scaled orthographic projection model – geometric models for coordinates
  • Models for brightness and color
• Image processing operations
  • Edge detection, texture, optical flow, segmentation
• Object recognition
  • Appearance (feature-based approaches)
  • Structural information (deformable template matching based approach)
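
The two geometric image formation models can be compared with a short sketch. It assumes a camera at the origin looking along the Z axis with focal length f, and it places the image on a virtual plane in front of the pinhole, so the sign flip of the physically inverted image is omitted; the function names and sample points are illustrative.

```python
def perspective_projection(point, f=1.0):
    """Pinhole camera: image coordinates shrink in proportion to the depth Z."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)

def scaled_orthographic_projection(point, z0, f=1.0):
    """Approximation for shallow scenes: one scale factor f / z0 for all points."""
    X, Y, _ = point
    s = f / z0
    return (s * X, s * Y)

near = (2.0, 4.0, 4.0)
far = (2.0, 4.0, 8.0)  # same X and Y, twice as far away
print(perspective_projection(near, f=2.0))
print(perspective_projection(far, f=2.0))
print(scaled_orthographic_projection(far, z0=4.0, f=2.0))
```

Under perspective projection the farther point appears half as far from the image center, while scaled orthographic projection ignores the depth difference; the approximation is reasonable only when depth varies little relative to the distance z0.
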
