Class10-Introduction To ML
[Figure: Data Preprocessing (Database, Data Cleaning and Cleansing, Feature Representation)]
Data Science
• Multi-disciplinary field that uses scientific methods,
processes, algorithms and systems to extract
knowledge and insight from structured and
unstructured data
• Central concept is gaining insight from data
Data Preprocessing and Descriptive Data Analytics
• Data preprocessing involves:
– Data cleaning, Data integration, Data transformation,
Data reduction
• Descriptive data analytics serves as a foundation for
data preprocessing
• It helps us to study the general characteristics of data
and identify the presence of noise or outliers
• Data characteristics:
– Central tendency of data
• Centre of the data
• Measuring mean, median and mode
– Dispersion of data
• The degree to which numerical data tend to spread
• Measuring range, quartiles, interquartile range (IQR), the
five-number summary and standard deviation
• Descriptive analytics are the backbone of reporting
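The central-tendency and dispersion measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch that uses the years-of-experience column from the Data3 illustration later in this class. Two choices here are assumptions of ours and are not stated in the slides. The first is the quartile convention: median of the lower and upper halves. Other conventions exist, such as the methods of `statistics.quantiles`, and they can give slightly different values. The second is the 1.5 × IQR rule used to flag outliers.

```python
import statistics

def median_of(values):
    """Median of a list (average of the two middle values when the count is even)."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def describe(values):
    """Central tendency and dispersion of a list of numbers."""
    s = sorted(values)
    n = len(s)
    # Assumed quartile convention: medians of the lower and upper halves
    q1 = median_of(s[: n // 2])
    q3 = median_of(s[(n + 1) // 2 :])
    iqr = q3 - q1
    return {
        "mean": statistics.mean(values),
        "median": median_of(values),
        "mode": statistics.mode(values),
        "range": s[-1] - s[0],
        "five_number_summary": (s[0], q1, median_of(values), q3, s[-1]),
        "iqr": iqr,
        "std": statistics.pstdev(values),  # population standard deviation
        # Assumed outlier rule: values more than 1.5 * IQR beyond the quartiles
        "outliers": [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr],
    }

# Years of experience (x1) from the Data3 illustration
experience = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
for name, value in describe(experience).items():
    print(f"{name}: {value}")
```

For this column the sketch reports the following:
• mean 9.1
• median 8.5
• mode 3
• five-number summary (1, 3, 8.5, 13, 21)
• IQR 10
• no outliers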
Data Science
• Multi-disciplinary field that uses scientific methods,
processes, algorithms and systems to extract knowledge
and insight from structured and unstructured data
• Central concept is gaining insight from data
• Machine learning uses data to extract knowledge –
predictive analytics
Predictive Data Analytics
• It is used to identify trends, correlations and causation by learning patterns from data
• Study and construction of algorithms that can learn from data and make predictions on data
• It involves tasks such as:
– Classification: categorical label prediction
• E.g.: predicting the presence or absence of a disease, or predicting the category of a disease from its symptoms
– Regression: numeric prediction
• E.g.: predicting the extent of a landslide, or predicting the amount of rainfall
– Clustering: grouping of similar patterns
• E.g.: grouping similar items to be sold, or grouping people from the same region
• Learning from data
Machine Learning: Learning from Data
• 1, 2, 3, 4, 5, ?, …, 24, 25, 26, 27, ?
• 1, 3, 5, 7, 9, ?, …, 25, 27, 29, 31, ?
• 2, 3, 5, 7, 11, ?, …, 29, 31, 37, 41, ?
• 1, 4, 9, 16, 25, ?, …, 121, 144, 169, ?
• 1, 2, 4, 8, 16, 32, ?,…, 1024, 2048, 4096, ?
• 1, 1, 2, 3, 5, 8, ?, …, 55, 89, 144, 233, ?
• 1, 1, 2, 4, 7, 13, ?, 44, 81, 149, 274, 504, ?
• 3, 5, 12, 24, 41, ?, …, 201, 248, 300, 357, ?
• 1, 6, 19, 42, 59, ?, …, 95, 117, 156, 191, ?
• 1, 2, 3, 4, 5, 6, …, 24, 25, 26, 27, 28
• 1, 3, 5, 7, 9, 11, …, 25, 27, 29, 31, 33
• 2, 3, 5, 7, 11, 13, …, 29, 31, 37, 41, 43
• 1, 4, 9, 16, 25, 36, …, 121, 144, 169, 196
• 1, 2, 4, 8, 16, 32, 64,…, 1024, 2048, 4096, 8192
• 1, 1, 2, 3, 5, 8, 13, …, 55, 89, 144, 233, 377
• 1, 1, 2, 4, 7, 13, 24, 44, 81, 149, 274, 504, 927
• 3, 5, 12, 24, 41, 63, …, 201, 248, 300, 357, 419
(successive differences: 2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62)
• 1, 6, 19, 42, 59, ?, …, 95, 117, 156, 191, ?
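One way to make "learning from data" concrete with these puzzles is to pick a model and let it predict the next term. The sketch below assumes that each sequence is a polynomial in the position. It builds a table of repeated differences and extrapolates it by one step. The assumption is ours, not the slide's. It is chosen to show that a model can fit every given example and still predict badly when its built-in assumption does not match the true rule.

```python
def next_by_differences(seq):
    """Predict the next term, assuming the sequence is generated by a polynomial
    in the position (so some level of repeated differences is constant)."""
    rows = [list(seq)]
    # Keep taking differences until a row is constant or has a single element
    while len(rows[-1]) > 1 and len(set(rows[-1])) > 1:
        last = rows[-1]
        rows.append([b - a for a, b in zip(last, last[1:])])
    # Extending every row by one step adds up the last entries of all rows
    return sum(row[-1] for row in rows)

puzzles = {
    "naturals": [1, 2, 3, 4, 5],
    "odd numbers": [1, 3, 5, 7, 9],
    "squares": [1, 4, 9, 16, 25],
    "differences 2, 7, 12, ...": [3, 5, 12, 24, 41],
    "powers of two": [1, 2, 4, 8, 16, 32],
    "primes": [2, 3, 5, 7, 11],
    "Fibonacci": [1, 1, 2, 3, 5, 8],
}
for name, seq in puzzles.items():
    print(f"{name}: {seq} -> {next_by_differences(seq)}")
```

The model reproduces the answers 6, 11, 36 and 63. It fails on three sequences:
• powers of two: predicts 63 instead of 64
• primes: predicts 22 instead of 13
• Fibonacci: predicts 8 instead of 13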
[Figure: Intraclass variability: images of Tiger, Giraffe, Horse and Bear]
Scene Image Classification
[Figure: Interclass similarity among scene categories: Tall building, Inside city, Street, Highway, Coast, Open country, Mountain, Forest]
Scene Image Clustering
[Figure: Clusters of scene images: Residential Interiors, Mountains, Military Vehicles, Sacred Places]
Machine Learning for Pattern Recognition
• Learning: Acquiring new knowledge or modifying the existing
knowledge
• Knowledge: Familiarity with information present in data
• Learning by machines for pattern analysis: Acquisition of
knowledge from data to discover patterns in data
• Data-driven techniques for learning by machines: Learning from
examples (Training of models)
• Generalization ability of learning machines: Performance of trained
models on new (test) data
• Target of learning techniques: Good generalization ability
• Learning techniques: Estimation of parameters of models
• Learning machines and Learning techniques for pattern analysis:
– Statistical Models (Maximum likelihood)
– Artificial Neural Networks (Error correction learning)
– Kernel Methods (Learning optimal linear relationships)
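This minimal sketch makes "estimation of parameters" and "generalization ability" concrete. It fits a straight line, salary ≈ w · experience + b, to the first seven rows of the Data3 illustration (the training data). It then measures the mean absolute error on the remaining three rows (the test data). Ordinary least squares is our choice of technique here. Under an assumption of Gaussian noise it coincides with the maximum-likelihood estimate. The 7/3 train/test split is an arbitrary, hypothetical choice.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of slope w and intercept b for y ≈ w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

def mean_absolute_error(w, b, xs, ys):
    """Average absolute difference between predictions w*x + b and targets y."""
    return sum(abs((w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)

# Data3: years of experience (x1) and salary in Rs 1000 (x2)
experience = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

# Hypothetical split: first 7 rows for training, last 3 for testing
train_x, train_y = experience[:7], salary[:7]
test_x, test_y = experience[7:], salary[7:]

w, b = fit_line(train_x, train_y)  # learning = estimating the parameters w and b
print(f"learned model: salary ~ {w:.2f} * experience + {b:.2f}")
print(f"training MAE: {mean_absolute_error(w, b, train_x, train_y):.2f}")
print(f"test MAE:     {mean_absolute_error(w, b, test_x, test_y):.2f}")
```

The training error shows how well the parameters fit the examples they were estimated from. The test error on unseen rows is the quantity that reflects generalization ability.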
Illustration – Data1: Representing a Person
• A person is represented using two attributes:
– Height in cm (x1)
– Weight in kg (x2)
[Figure: scatter plot of Weight in kg (x2) against Height in cm (x1)]
$\mathbf{x} = [x_1 \; x_2]^T$
Illustration – Data2: Iris (Flower) Data [1]
$\mathbf{x} = [x_1 \; x_2 \; x_3 \; x_4]^T$
[1] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, part II, pp. 179–188, 1936.
Illustration – Data3: Years of Experience and Salary
Years of experience (x1)   Salary in Rs 1000 (x2)
3                          30
8                          57
9                          64
13                         72
3                          36
6                          43
11                         59
21                         90
1                          20
16                         83
[Figure: scatter plot of Salary (x2) against Years of experience (x1)]
$\mathbf{x} = [x_1 \; x_2]^T$
Illustration – Data4: Environmental Data
$\mathbf{x} = [x_1 \; x_2 \; x_3 \; x_4]^T$
Supervised and Unsupervised Learning
Supervised Learning
Labeled Data – Illustration: Data1 – Representing a Person
• A person is represented using two attributes:
– Height in cm (x1)
– Weight in kg (x2)
• Class (y):
– Child (0)
– Adult (1)
[Figure: scatter plot of Weight in kg (x2) against Height in cm (x1)]
$\mathbf{x} = [x_1 \; x_2]^T$
Labeled Data – Illustration: Data2 – Iris (Flower) Data
• Class (y):
– Iris Setosa (1)
– Iris Versicolour (2)
– Iris Virginica (3)
$\mathbf{x} = [x_1 \; x_2 \; x_3 \; x_4]^T$
Labeled Data – Illustration: Data3 – Years of Experience and Salary
• Class – Raise in Salary (y):
– Yes (1)
– No (0)
Years of experience (x1)   Salary in Rs 1000 (x2)   Raise (y)
3                          30                       1
8                          57                       0
9                          64                       1
13                         72                       1
3                          36                       1
6                          43                       0
11                         59                       1
21                         90                       1
1                          20                       0
16                         83                       0
$\mathbf{x} = [x_1 \; x_2]^T$
Labeled Data – Illustration: Data3 – Years of Experience and Salary
Illustration – Data4: Environmental Data
Supervised Learning
• In supervised learning, each example (data sample) is
a pair consisting of an input example (typically a
vector) and a desired output value (also called
the target)
• Task of learning a function that maps an input to an
output based on example input-output pairs
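Here is a minimal sketch of supervised learning on the labeled Data3 example. The input $\mathbf{x} = [x_1 \; x_2]^T$ is (years of experience, salary), and the target y is the raise label. The learner is our own choice and is not prescribed in the slides: a k-nearest-neighbour rule with k = 3 and Euclidean distance. The query person [10, 60] is hypothetical. The features are left unscaled, so salary differences dominate the distance; in practice they would usually be normalized first.

```python
import math
from collections import Counter

# Labeled Data3: ([years of experience, salary in Rs 1000], raise label)
train = [
    ([3, 30], 1), ([8, 57], 0), ([9, 64], 1), ([13, 72], 1), ([3, 36], 1),
    ([6, 43], 0), ([11, 59], 1), ([21, 90], 1), ([1, 20], 0), ([16, 83], 0),
]

def knn_predict(train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

query = [10, 60]  # hypothetical new person
print(knn_predict(train, query))
```

For this query the three nearest neighbours are [11, 59], [8, 57] and [9, 64], with labels 1, 0 and 1. The sketch therefore predicts 1 (raise).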
Supervised Learning
Unsupervised Learning
• Learning without supervision
• In the context of machine learning, the data used for learning (training data) is unlabeled
• Given this unlabeled data, the machine tries to identify patterns in it and respond accordingly
• Example:
– A person is represented using two attributes: Height and Weight
– No label is given
[Figure: unlabeled scatter plot of persons, Weight in kg (x2) against Height]
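This minimal sketch shows a machine finding structure without labels. It applies k-means clustering (our choice of algorithm) to the Data3 attributes with the raise labels dropped. Two settings are assumptions made to keep the result deterministic:
• k = 2
• the first two points are used as the initial centroids

```python
import math

# Data3 attributes [years of experience, salary in Rs 1000], no labels
points = [[3, 30], [8, 57], [9, 64], [13, 72], [3, 36],
          [6, 43], [11, 59], [21, 90], [1, 20], [16, 83]]

def kmeans(points, k, max_iters=100):
    """Plain k-means: alternate nearest-centroid assignment and centroid updates."""
    centroids = [list(p) for p in points[:k]]  # assumed initialization
    assignment = None
    for _ in range(max_iters):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return assignment, centroids

labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

On this data the sketch converges after two assignment passes. It separates the four low-experience, low-salary persons from the other six. The grouping does not coincide with the raise labels, which k-means never sees.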
Unsupervised Learning
• Unsupervised learning can be grouped into two types:
– Clustering
– Association
• Clustering:
– Partitioning the data into cohesive groups such that the
data samples in a group are similar
– Example:
• Grouping the persons based on their height and weight
• Given the customer and their purchase data:
– Grouping the customers based on the similar products
purchased
• Association:
– A rule-based machine learning method for discovering interesting relations between variables in a data set
– Example:
• Given the customer and their purchase data:
– Finding the products purchased together
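This minimal sketch illustrates "finding the products purchased together" with two counts:
• support: how often each pair of products appears in the same transaction
• confidence: how often buying one product implies buying the other

The transactions and the minimum support of 2 are hypothetical. Full association-rule algorithms such as Apriori extend the same counting to larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase data: one set of products per customer transaction
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "tea"},
    {"bread", "butter"},
]
min_support = 2  # assumed threshold: a pair must appear in at least 2 transactions

item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_support}

for (a, b), support in sorted(frequent_pairs.items()):
    # Confidence of the rule a -> b: share of transactions with a that also contain b
    confidence = support / item_counts[a]
    print(f"{{{a}, {b}}}: support={support}, confidence({a} -> {b})={confidence:.2f}")
```

With these baskets, {bread, butter} and {bread, milk} are the frequent pairs. The rule butter → bread has confidence 2/2 = 1.0, because every hypothetical basket with butter also contains bread.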