My M-7
Communicating, Perceiving and Acting
Communication - Fundamentals of Language - Probabilistic Language Processing - Information Retrieval - Information Extraction - Perception - Image Formation - Object Recognition
Contents (Part-1)
• Language Models
  – N-gram character and word models
• Information Retrieval
  – BM25 word based relevance, improving recall using Case Folding, Stemming, Synonyms, Metadata, links
  – Page Rank – Link based scoring
  – HITS Algorithm – Word and link based scoring
• Question Answering
  – ASKMSR process – Question Classification, Query formation and retrieval of n-grams, Filtering n-grams based on Expected Answer Type, Filtering n-grams that were present in the question, Combining high score n-grams for longer answers
Communication - NLP
• Language Models
• Information Retrieval
Language Model
• A language is a set of strings.
• Any legal sentence of a language should adhere to its syntactic rules (grammar) and express a meaning as per its semantic rules (meanings).
• Natural languages are ambiguous: a single word can have multiple meanings, and a single sentence can have multiple meanings.
• Example: "The man saw the girl with the telescope." "He saw her duck."
• The meaning of a sentence or word is therefore treated as a probability distribution over possible meanings.
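A minimal sketch of an n-gram character model, the kind of language model listed in the contents. It estimates the probability of a character given the previous one (a bigram model). The function names, the toy corpus, and the add-one smoothing are assumptions made for illustration; the slides do not specify them.

```python
from collections import Counter, defaultdict

def train_char_bigrams(corpus):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, cur in zip(corpus, corpus[1:]):
        counts[prev][cur] += 1
    return counts

def bigram_prob(counts, prev, cur, vocab_size):
    """P(cur | prev) with add-one smoothing (an assumed smoothing choice)."""
    total = sum(counts[prev].values())
    return (counts[prev][cur] + 1) / (total + vocab_size)

corpus = "the man saw the girl"  # toy corpus, hypothetical
vocab = set(corpus)
counts = train_char_bigrams(corpus)
p_h_given_t = bigram_prob(counts, "t", "h", len(vocab))
print(f"P(h | t) = {p_h_given_t:.2f}")
```

A word model works the same way with `corpus.split()` in place of the raw string, so that the counted units are words instead of characters.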
Applications of Language Models
• Page Rank is one of the original ideas that set Google’s search apart from other search engines.
• A page with more inlinks from high quality websites gets a higher rank (inlinks - links to the page).
• High quality pages – pages with more inlinks.
Example – Pages and their Page ranks
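A minimal sketch of link based scoring in the spirit of Page Rank, using the common iterative formulation with a damping factor. The damping value 0.85, the iteration count, the treatment of pages without outlinks, and the four-page graph are assumptions for illustration and are not taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed handling: a page with no outlinks spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: A, B and D all link to C; C links back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this graph C has the most inlinks and ends up with the highest rank, while A ranks above B and D because its single inlink comes from the highly ranked page C.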
HITS Algorithm – Web IR – Algorithm 2
• Hyperlink Induced Topic Search (HITS) Algorithm, aka Hubs and Authorities, is a link analysis algorithm.
• HITS vs Page Rank
  – HITS is query dependent.
  – It starts with the intersection of the hit lists of the query words (the relevant set) and further adds pages based on links.
• Hub score, Authority score – scores of the pages, calculated iteratively, similar to the Page Rank score.
• Authority score – the degree to which other pages in the relevant set point to it.
• Hub score – the degree to which it points to other authoritative pages in the relevant set.
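A minimal sketch of the iterative hub and authority update described above. The relevant set is assumed to be given already as a small link graph; the page names, the iteration count, and the Euclidean normalization are assumptions for illustration.

```python
import math

def normalize(scores):
    """Scale scores to unit Euclidean length so they do not grow without bound."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}

def hits(links, iterations=20):
    """links maps each page in the relevant set to the pages it points to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing to the page.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        auth = normalize(auth)
        # Hub score: sum of the authority scores of the pages the page points to.
        hub = normalize({p: sum(auth[t] for t in links.get(p, [])) for p in pages})
    return hub, auth

# Hypothetical relevant set: H1 and H2 act as hubs, A1 and A2 as authorities.
hub, auth = hits({"H1": ["A1", "A2"], "H2": ["A1"]})
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

Unlike the Page Rank sketch, the graph passed in here would be rebuilt for every query, which is what makes HITS query dependent.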
Question Answering
• Yet another IR application.
• Input is a query / question; the response is short – a sentence or a phrase.
• Question Answering requires Natural Language Understanding, such as resolving pronoun references and using other semantic facts.
• Example: "John Wilkes Booth altered history with a bullet. He will forever be known as the man who ended Abraham Lincoln’s life." To answer "Who killed Abraham Lincoln?" from this text, "He" must be resolved to John Wilkes Booth.
Question Answering (Example) – ASKMSR approach: question classification, query formation, n-grams retrieval
• ASKMSR – a QA engine that works on the basis of an N-gram model.
• Questions are classified into one of 15 categories and are rewritten as queries to a search engine.
• Example: "Who killed Abraham Lincoln?" is rewritten as [* killed Abraham Lincoln] and [Abraham Lincoln was killed by *], with an exact phrase match having more weight than a word match such as the query [Abraham OR Lincoln OR killed].
• Highly ranked n-grams retrieved – John Wilkes Booth, Abraham Lincoln, assassination of, Ford’s Theatre.
• The n-grams retrieved are filtered by the expected answer type, based on the question type.
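A minimal sketch of the steps above for a "Who" question: query rewriting, n-gram scoring, and filtering. It is not the ASKMSR implementation. The function names, the weights 5 and 1, the capitalization test used as a stand-in for the expected answer type, the tie-break toward longer n-grams, and the three snippets (standing in for search engine results) are all assumptions for illustration.

```python
from collections import Counter

def rewrite_who_question(question):
    """Turn 'Who <verb> <object>?' into (query, weight) pairs.

    Hypothetical weights: exact-phrase patterns count more than the OR query.
    """
    words = question.rstrip("?").split()
    verb, obj_words = words[1], words[2:]
    obj = " ".join(obj_words)
    return [
        (f"* {verb} {obj}", 5),
        (f"{obj} was {verb} by *", 5),
        (" OR ".join(obj_words + [verb]), 1),
    ]

def mine_ngrams(snippets, max_n=3):
    """Score each 1..max_n word n-gram by the weights of the snippets containing it."""
    scores = Counter()
    for text, weight in snippets:
        tokens = text.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores

def filter_candidates(scores, question):
    """Drop n-grams that repeat question words or do not look like a name."""
    question_words = {w.lower() for w in question.rstrip("?").split()}
    kept = {}
    for gram, score in scores.items():
        words = gram.split()
        if any(w.lower() in question_words for w in words):
            continue  # the n-gram was already present in the question
        if not all(w[0].isupper() for w in words):
            continue  # crude stand-in for the expected answer type "person"
        kept[gram] = score
    return kept

def best_answer(candidates):
    """Pick the highest score; ties go to the longer n-gram (assumed rule)."""
    return max(candidates, key=lambda g: (candidates[g], len(g.split())))

question = "Who killed Abraham Lincoln?"
queries = rewrite_who_question(question)
# Hypothetical snippets, each paired with the weight of the query that matched it.
snippets = [
    ("John Wilkes Booth killed Abraham Lincoln", 5),
    ("Abraham Lincoln was killed by John Wilkes Booth", 5),
    ("Lincoln was killed at Ford's Theatre", 1),
]
answer = best_answer(filter_candidates(mine_ngrams(snippets), question))
print(queries)
print(answer)
```

Preferring the longer n-gram among equally scored candidates is a simplified substitute for combining high score n-grams into longer answers.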
Contents (Part-2)
• Information Extraction (IE) Overview.
• Domain-Specific vs. General Domain IE.
• Approaches to Information Extraction.
• Finite State Automata (FSA) in IE.
• Probabilistic Models – Hidden Markov Models (HMMs).
• Conditional Random Fields (CRFs).
• Ontology Extraction from Large Corpora.
Object recognition
• Appearance (feature based approaches)
• Structural Information (deformable template matching based approach)