(PR 2024) Lec14 Unsupervised Learning II
Email: dkhattab@eelu.edu.eg
Given the following three documents d1, d2, and d3, calculate the TF-IDF feature
vector for each document:
• d1 - Music is a universal language
• d2 - Music is a miracle
• d3 - Music is a universal feature of the human experience
Then, based on the cosine-similarity metric, show which two documents are the
most similar.
Solution:
• TF = raw count of the word in the document
• IDF = log10(#documents / #documents containing the word)
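Written out (a restatement of the definitions above, plus the cosine similarity the question asks for but the slides never define; the log is base 10, matching the table values):

```latex
% tf-idf weight of word w in document d, over N documents,
% with df(w) = number of documents containing w:
\mathrm{tfidf}(w,d) = \mathrm{tf}(w,d)\cdot\log_{10}\frac{N}{\mathrm{df}(w)}
% cosine similarity between tf-idf vectors x and y:
\cos(\mathbf{x},\mathbf{y}) = \frac{\mathbf{x}\cdot\mathbf{y}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert}
```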
Word        TF d1  TF d2  TF d3  IDF    TF-IDF d1  TF-IDF d2  TF-IDF d3
music       1      1      1      0      0          0          0
is          1      1      1      0      0          0          0
a           1      1      1      0      0          0          0
universal   1      0      1      0.176  0.176      0          0.176
language    1      0      0      0.477  0.477      0          0
miracle     0      1      0      0.477  0          0.477      0
feature     0      0      1      0.477  0          0          0.477
of          0      0      1      0.477  0          0          0.477
the         0      0      1      0.477  0          0          0.477
human       0      0      1      0.477  0          0          0.477
experience  0      0      1      0.477  0          0          0.477
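The slides stop before the cosine-similarity step, so here is a minimal sketch in Python (not from the lecture; all names are illustrative) that rebuilds the table and compares the three pairs:

```python
import math

docs = {
    "d1": "music is a universal language".split(),
    "d2": "music is a miracle".split(),
    "d3": "music is a universal feature of the human experience".split(),
}

N = len(docs)
vocab = sorted({w for words in docs.values() for w in words})

def idf(word):
    # log10(#documents / #documents containing the word)
    df = sum(word in words for words in docs.values())
    return math.log10(N / df)

def tfidf(words):
    # TF = raw word count, weighted by IDF; one entry per vocabulary word
    return [words.count(w) * idf(w) for w in vocab]

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

vecs = {name: tfidf(words) for name, words in docs.items()}
for a, b in [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]:
    print(a, b, round(cosine(vecs[a], vecs[b]), 3))
```

Only d1 and d3 share a non-zero weighted term ("universal"), giving cos(d1, d3) ≈ 0.056 while the other two pairs score 0, so d1 and d3 are the most similar.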
1-NN search for retrieval
• Space of all articles, organized by similarity of text:
– Compute distances to all docs
– Retrieve the “nearest neighbor”
[Figure sequence: a query article placed in the space of all articles, distances computed to every document, and the nearest neighbor highlighted]
Complexity of brute-force search
• Given a query point, scan through every point (a sketch follows below):
– O(N) distance computations per 1-NN query!
– O(N log k) per k-NN query!
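A minimal sketch of the brute-force scan, assuming Euclidean distance (the function name is illustrative):

```python
import math

def nearest_neighbor(query, points):
    """Brute-force 1-NN: one distance computation per point, O(N) total."""
    best, best_dist = None, math.inf
    for p in points:
        d = math.dist(query, p)  # Euclidean distance
        if d < best_dist:
            best, best_dist = p, d
    return best, best_dist
```

The O(N log k) bound for k-NN comes from the same scan while maintaining a heap of the k closest points seen so far, at O(log k) per heap update.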
KD-trees
• Structured organization of documents
– Recursively partitions points into axis-aligned
boxes
• Enables more efficient pruning of the search space
• Works “well” in low-to-medium dimensions
KD-tree construction
• Start with a list of d-dimensional points
• Split the points into 2 groups
• Recurse on each group separately
• Continue splitting points at each set
– Creates a binary tree structure
• Each leaf node contains a list of points
KD-tree construction choices
• Use heuristics to make splitting decisions (see the sketch below):
– Which dimension do we split along?
The widest one (the one with the highest variance)
– When do we stop?
When fewer than m points are left, or the box hits a minimum width
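A minimal construction sketch under those heuristics (build_kdtree and the dict-based node layout are illustrative, not from the lecture; the split value is the median along the widest dimension):

```python
def build_kdtree(points, m=3):
    """Recursively partition points into axis-aligned boxes.

    Splits along the widest dimension at the median point and stops once
    a box holds at most m points (the minimum-width test is omitted here).
    """
    if len(points) <= m:
        return {"leaf": True, "points": points}
    dims = len(points[0])
    # Widest dimension = largest spread of coordinate values.
    spreads = [max(p[d] for p in points) - min(p[d] for p in points)
               for d in range(dims)]
    axis = max(range(dims), key=lambda d: spreads[d])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "leaf": False,
        "axis": axis,                # splitting dimension
        "value": points[mid][axis],  # splitting threshold
        "left": build_kdtree(points[:mid], m),
        "right": build_kdtree(points[mid:], m),
    }
```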
Nearest neighbor with KD-trees
• Traverse the tree looking for the nearest neighbor to the query point:
1. Start by exploring the leaf node containing the query point
2. Compute the distance to each other point at the leaf node
3. Backtrack and try the other branch at each node visited
• Use the distance bound and the bounding box of each node to prune parts of the tree that cannot include the nearest neighbor (a sketch follows below)
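A minimal search sketch over the build_kdtree nodes above (again with illustrative names; it uses the query's distance to the splitting plane as the pruning bound, a simpler stand-in for the full bounding-box test):

```python
import math

def kdtree_nn(node, query, best=None, best_dist=math.inf):
    """1-NN search: descend to the query's leaf first, then backtrack."""
    if node["leaf"]:
        for p in node["points"]:
            d = math.dist(query, p)
            if d < best_dist:
                best, best_dist = p, d
        return best, best_dist
    axis, value = node["axis"], node["value"]
    near, far = ((node["left"], node["right"]) if query[axis] < value
                 else (node["right"], node["left"]))
    best, best_dist = kdtree_nn(near, query, best, best_dist)
    # Prune: only cross the splitting plane if the far side could
    # still contain a point closer than the current best.
    if abs(query[axis] - value) < best_dist:
        best, best_dist = kdtree_nn(far, query, best, best_dist)
    return best, best_dist

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(kdtree_nn(tree, (9, 2)))  # -> ((8, 1), 1.414...): the far branch is pruned
```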
Complexity
For (nearly) balanced binary trees...
• Construction
– Size: 2N−1 nodes if 1 data point at each leaf → O(N)
– Depth: O(log N)
– Construction time: O(N log N)
• 1-NN query
– Traverse down the tree to the starting leaf: O(log N)
– Maximum backtrack and traverse: O(N) in the worst case
– Complexity ranges from O(log N) to O(N) (i.e., brute force)
k-NN with KD-trees
• Exactly the same algorithm, but maintain the distance
to the furthest of the current k nearest neighbors (a sketch follows below)
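A sketch of that one change, assuming the same leaf layout as above (knn_leaf_scan is an illustrative name): the distance to the furthest of the current k neighbors replaces best_dist as the pruning bound.

```python
import heapq, math

def knn_leaf_scan(points, query, heap, k):
    """Scan a leaf, keeping a max-heap of the k nearest points seen so far.

    heap holds (-distance, point) pairs, so -heap[0][0] is the distance to
    the furthest of the current k neighbors: the k-NN pruning bound.
    """
    for p in points:
        d = math.dist(query, p)
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, p))
```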
Credit: “Machine Learning Specialization” (2015) by Emily Fox
& Carlos Guestrin, University of Washington.