Workbook of Pattern Recognition
PECCS702B
Workbook
Semester - 7
Prof. Bavrabi Ghosh
What is a feature?
The input variables that we give to our machine learning models are called features. Each
column in our dataset constitutes a feature.
To train an optimal model, we need to make sure that we use only the essential features. If
we have too many features, the model can capture unimportant patterns and learn from
noise. The method of choosing the important features of our data is called Feature
Selection.
Machine learning models follow a simple rule: whatever goes in, comes out. If we put
garbage into our model, we can expect the output to be garbage too. In this case, garbage
refers to noise in our data.
To train a model, we collect enormous quantities of data to help the machine learn better.
Usually, a good portion of the data collected is noise, while some of the columns of our
dataset might not contribute significantly to the performance of our model. Further, having
a lot of irrelevant data can slow down the training process and make the resulting model
slower to use. The model may also learn from this irrelevant data and become inaccurate.
Consider, for example, a dataset of used cars in which each row records the car's model, the
year of manufacture, the miles it has travelled, and the name of its previous owner, along
with a label saying whether the car is old enough to be crushed. The model, the year of
manufacture, and the mileage are pretty important for deciding if the car should be crushed.
However, the name of the previous owner does not decide whether the car should be crushed;
worse, it can confuse the algorithm into finding spurious patterns between names and the
other features. Hence we can drop that column.
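As an illustration, here is a minimal sketch of dropping the irrelevant column before training. The pandas DataFrame, its column names, and its values are all made up for this example:

```python
import pandas as pd

# Hypothetical used-car dataset; the rows and column names are illustrative only.
cars = pd.DataFrame({
    "model": ["Alto", "Swift", "Nano"],
    "year": [2004, 2015, 2010],
    "miles": [180000, 40000, 95000],
    "previous_owner": ["A. Rao", "B. Sen", "C. Das"],
    "crush": [1, 0, 0],          # target label: should the car be crushed?
})

# The previous owner's name carries no information about the label,
# so we drop it (and the label itself) from the input features.
X = cars.drop(columns=["previous_owner", "crush"])
y = cars["crush"]
print(X.columns.tolist())        # ['model', 'year', 'miles']
```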
Elaborate on the concept of feature selection.
Feature Selection is the method of reducing the number of input variables to your model by
using only relevant data and getting rid of the noise in the data.
It is the process of automatically choosing relevant features for your machine learning
model based on the type of problem you are trying to solve. We do this by including or
excluding important features without changing them. It helps in cutting down the noise in
our data and reducing the size of our input data.
Supervised Models: Supervised feature selection refers to methods that use the output label
class for feature selection. They use the target variable to identify the features that can
increase the efficiency of the model.
Unsupervised Models: Unsupervised feature selection refers to methods that do not need the
output label class for feature selection. We use them for unlabelled data.
What is the filter method of feature selection?
Filter Method: In this method, features are dropped based on their relation to the output,
that is, on how strongly they correlate with the output. We use correlation to check whether
the features are positively or negatively correlated to the output labels and drop features
accordingly. E.g.: Information Gain, Chi-Square Test, Fisher's Score, etc.
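A minimal sketch of the filter approach using the chi-square test, assuming scikit-learn and its built-in Iris dataset; the dataset and the choice of keeping k=2 features are illustrative only:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Filter method: score each feature against the labels with the
# chi-square test and keep only the top-k scoring features.
X, y = load_iris(return_X_y=True)           # 4 non-negative features
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_)                     # chi-square score per feature
print(X_selected.shape)                     # (150, 2)
```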
What is the wrapper method of feature selection?
Wrapper Method: We split our data into subsets and train a model using them. Based on the
output of the model, we add and remove features and train the model again. It forms the
subsets using a greedy approach and evaluates the accuracy of the possible combinations of
features. E.g.: Forward Selection, Backward Elimination, etc.
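A minimal sketch of greedy forward selection, assuming scikit-learn 0.24 or newer (which provides SequentialFeatureSelector); the estimator, dataset, and number of features to keep are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

# Wrapper method: repeatedly retrain the model on candidate feature
# subsets and greedily add the feature that improves CV accuracy most.
X, y = load_iris(return_X_y=True)
estimator = KNeighborsClassifier(n_neighbors=5)
sfs = SequentialFeatureSelector(estimator, n_features_to_select=2,
                                direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support())       # boolean mask of the selected features
```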
What is the intrinsic method of feature selection?
This method combines the qualities of both the Filter and Wrapper methods to create the
best subset. It handles the iterative training of the model while keeping the computational
cost to a minimum. E.g.: Lasso and Ridge Regression.
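A minimal sketch of an embedded (intrinsic) method using Lasso, assuming scikit-learn; the diabetes dataset and the penalty strength alpha are illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Embedded / intrinsic method: the L1 penalty of Lasso drives the
# coefficients of unhelpful features to exactly zero during training,
# so feature selection happens as a by-product of fitting the model.
X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=1.0).fit(X, y)
kept = np.flatnonzero(lasso.coef_)     # indices of features with non-zero weight
print(kept)
```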
The overall process is relatively simple: the choice of feature selection model depends on
the types of the input and output variables.
[Figure: the types of the input variables and the output variable determine which feature selection model to use.]
What is the K-Nearest Neighbour (KNN) algorithm?
The KNN algorithm simply stores the dataset during the training phase; when it gets new
data, it classifies that data into the category that is most similar to the new data.
Example: Suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification we can use the KNN
algorithm, since it works on a similarity measure. Our KNN model will find the features of
the new image that are most similar to the cat and dog images and, based on the most similar
features, will put it in either the cat or the dog category.
Suppose there are two categories, Category A and Category B, and we have a new data point
x1; we want to know which of these categories the point belongs to. To solve this type of
problem, we need the K-NN algorithm. With the help of K-NN, we can easily identify the
category or class of a particular data point.
How does the KNN algorithm work?
The working of K-NN can be explained by the following steps. Suppose we have a new data
point and we need to put it in the required category:
o Firstly, we will choose the number of neighbours; here we choose k = 5.
o Next, we will calculate the Euclidean distance between the new data point and the
existing data points. The Euclidean distance is the distance between two points, which we
have already studied in geometry. For two points (x1, y1) and (x2, y2) it can be calculated
as d = √((x2 − x1)² + (y2 − y1)²).
o By calculating the Euclidean distance we get the nearest neighbours: three nearest
neighbours in category A and two nearest neighbours in category B.
o As we can see, 3 of the 5 nearest neighbours are from category A, hence this new data
point must belong to category A. (A small code sketch of this procedure follows the list.)
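A minimal sketch of these steps in plain NumPy; the toy points and their labels are invented for illustration and arranged so that 3 of the 5 nearest neighbours fall in category A:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify one point by majority vote among its k nearest neighbours."""
    # Euclidean distance from the new point to every training point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels.
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Toy data: category 'A' clustered near (1, 1), category 'B' near (5, 5).
X_train = np.array([[1, 1], [1, 2], [2, 1], [5, 5], [5, 6], [6, 5]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_predict(X_train, y_train, np.array([2, 2]), k=5))   # -> A
```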
How do we select the value of K in the KNN algorithm?
There is no particular way to determine the best value for K, so we need to try several
values and pick the best among them. The most commonly used value for K is 5.
A very low value of K, such as K = 1 or K = 2, can be noisy and make the model sensitive to
outliers.
Larger values of K reduce the effect of noise, but they can blur the boundary between
categories and make the computation slower.
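A minimal sketch of picking K by cross-validation, assuming scikit-learn and its Iris dataset; the candidate values of K are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Try several candidate values of K and keep the one with the best
# cross-validated accuracy, rather than guessing a single value.
X, y = load_iris(return_X_y=True)
scores = {}
for k in [1, 3, 5, 7, 9, 11]:
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(scores, best_k)
```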
Advantages of the KNN algorithm:
o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.
Disadvantages of the KNN algorithm:
o We always need to determine the value of K, which may be complex at times.
o The computation cost is high because the distance to every training sample must be
calculated.
What are some variants of the KNN algorithm?
Locally Adaptive KNN - In the standard KNN algorithm a single global value of the input
parameter k is used. This variant instead uses different values of k for different portions
of the input space. Each time a query is classified, the value of k is determined by
applying cross-validation in the query's local neighbourhood.
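A minimal sketch of the idea, assuming NumPy and scikit-learn; the local neighbourhood size m, the candidate values of k, and the leave-one-out evaluation are illustrative choices rather than the exact procedure of the original proposal:

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris

def locally_adaptive_knn(X_train, y_train, x_new, k_candidates=(1, 3, 5, 7), m=25):
    """Pick k for this query by leave-one-out CV inside its local neighbourhood."""
    d = np.linalg.norm(X_train - x_new, axis=1)
    local = np.argsort(d)[:m]                     # indices of the m nearest points
    best_k, best_acc = k_candidates[0], -1.0
    for k in k_candidates:
        correct = 0
        for i in local:                           # leave-one-out over the local set
            others = local[local != i]
            di = np.linalg.norm(X_train[others] - X_train[i], axis=1)
            votes = Counter(y_train[others[np.argsort(di)[:k]]])
            correct += votes.most_common(1)[0][0] == y_train[i]
        if correct / len(local) > best_acc:
            best_acc, best_k = correct / len(local), k
    # Classify the query with the locally chosen k, using the full training set.
    votes = Counter(y_train[np.argsort(d)[:best_k]])
    return votes.most_common(1)[0][0], best_k

X, y = load_iris(return_X_y=True)
print(locally_adaptive_knn(X, y, X[0]))           # (predicted class, chosen k)
```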
Weight-Adjusted KNN - In the standard KNN algorithm all the attributes have equal
importance: every attribute contributes equally to the classification of new tuples. But not
all the attributes in a data set are equally important. A weight-adjusted KNN algorithm
first learns a weight for each attribute, and each attribute then influences the
classification only in proportion to the weight it has been assigned.
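A minimal sketch of this idea, using mutual information as one plausible way to learn attribute weights (the weighting scheme of the original proposal may differ); assumes NumPy and scikit-learn:

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

# Weight each attribute by its mutual information with the class, then
# use those weights inside the distance so that informative attributes
# dominate the neighbour search.
X, y = load_iris(return_X_y=True)
weights = mutual_info_classif(X, y, random_state=0)

def weighted_knn_predict(x_new, k=5):
    d = np.sqrt((weights * (X - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return Counter(y[nearest]).most_common(1)[0][0]

print(weighted_knn_predict(X[0]))   # should recover the class of sample 0
```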
Improved KNN for Text Categorization - The value of the input parameter k strongly
influences the performance of the KNN algorithm, so it is crucial to choose it
appropriately. In general the classes are not evenly distributed in the data set, so using a
fixed value of k for all classes would bias the result towards the class with the larger
number of tuples. One can therefore use different values of k for different classes
according to their class distribution: a larger number of neighbours is used when
classifying a new tuple into a class that has a large number of tuples.
Adaptive KNN - Rather than using a fixed value of k, this variant uses a non-fixed number of
nearest neighbours. A large value of the parameter k also increases the computational cost
and time for large data sets. To address this, three heuristics are applied that allow the
algorithm to terminate early: when a fixed condition is fulfilled, the algorithm breaks out,
saving computational time.
KNN with Shared Nearest Neighbours - This is another variant of the KNN algorithm, which
uses shared nearest neighbours to classify documents. To find the neighbours of a novel
tuple, it uses the BM25 similarity measure. A threshold is set so that only that many
nearest neighbours can vote for the classification of an unknown tuple.
KNN with K-Means - One of the shortcomings of the KNN algorithm is its high computational
complexity. To alleviate this drawback, the KNN algorithm has been combined with the K-Means
clustering algorithm. First, clusters are formed for the different categories in the
training data set; the centres of these newly formed clusters then act as the new training
samples. To classify an unknown tuple, its distance to these new training tuples is
computed, and it is assigned to the class of the tuple to which it is closest. The benefit
of this variant of KNN is that there is no need to pass the input parameter k, as we have to
do in standard KNN.
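A minimal sketch of this variant, assuming scikit-learn; the dataset and the number of clusters per class (3) are illustrative choices, not part of the original proposal:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Replace each class's training tuples with a few K-Means centres and
# classify a new tuple by the nearest centre, so no k is needed.
X, y = load_iris(return_X_y=True)
centres, centre_labels = [], []
for cls in np.unique(y):
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X[y == cls])
    centres.append(km.cluster_centers_)
    centre_labels.extend([cls] * 3)
centres = np.vstack(centres)
centre_labels = np.array(centre_labels)

def classify(x_new):
    d = np.linalg.norm(centres - x_new, axis=1)
    return centre_labels[np.argmin(d)]

print(classify(X[0]))   # expected: class 0
```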
KNN with the Mahalanobis Metric - The performance of the KNN algorithm largely depends on
the distance metric used to find the distance between any two tuples. The Mahalanobis
distance metric can be used instead of the Euclidean one. It transforms the whole input
space using a linear transformation; in this transformed space the Euclidean distance
between any two data points equals their Mahalanobis distance in the original space.
Euclidean distance is the distance between any two points, whereas Mahalanobis distance is
the distance between a point and a distribution; if the point is the mean of the
distribution, the Mahalanobis distance is zero. The main benefit of the Mahalanobis distance
metric over the Euclidean distance metric is that it also accounts for the correlation
between the data attributes.
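A minimal sketch of the Mahalanobis distance itself, assuming NumPy and SciPy; the synthetic data is invented only to show that correlations enter through the inverse covariance matrix and that the distance from the mean of the distribution is zero:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Draw correlated 2-D data so the covariance matrix is not diagonal.
rng = np.random.default_rng(0)
data = rng.multivariate_normal(mean=[0, 0], cov=[[2, 1], [1, 2]], size=500)

mean = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))   # inverse covariance

point = np.array([1.0, 2.0])
print(mahalanobis(point, mean, cov_inv))   # distance from the point to the distribution
print(mahalanobis(mean, mean, cov_inv))    # 0: the mean of the distribution itself
```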