
Class10-Introduction To ML

The document provides an overview of machine learning and data science, emphasizing data modeling, preprocessing, and analytics. It distinguishes between supervised and unsupervised learning, detailing their methodologies, applications, and examples. Key concepts include classification, regression, clustering, and association, along with the importance of data characteristics in analysis.

Uploaded by

Paladin

Introduction to Machine Learning: Data Modeling


Data Science
• Multi-disciplinary field that uses scientific methods,
processes, algorithms and systems to extract
knowledge and insight from structured and
unstructured data
• Central concept is gaining insight from data

[Figure: data science pipeline with the stages Data Collection, Data Preprocessing (Database; Data Cleaning and Cleansing; Feature Representation), Data Modeling (Machine Learning) and Inference]
Data Preprocessing and Descriptive Data Analytics
• Data preprocessing involves:
– Data cleaning, data integration, data transformation and data reduction
• Descriptive data analytics serves as a foundation for data preprocessing
• It helps us study the general characteristics of the data and identify the presence of noise or outliers
• Data characteristics:
– Central tendency of data
• Centre of the data
• Measured by the mean, median and mode
– Dispersion of data
• The degree to which numerical data tend to spread
• Measured by the range, quartiles, interquartile range (IQR), the five-number summary and the standard deviation
• Descriptive analytics is the backbone of reporting
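The measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch that uses the years-of-experience and salary values from the Data3 illustration in these slides. The 1.5 × IQR outlier rule at the end is a common rule of thumb and an assumption here; the slides do not prescribe it.

```python
import statistics

# Values taken from the Data3 illustration (salary in Rs 1000).
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]
years = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]

# Central tendency
mean_salary = statistics.mean(salary)
median_salary = statistics.median(salary)
mode_years = statistics.mode(years)  # most frequent value

# Dispersion
value_range = max(salary) - min(salary)
# Quartiles; statistics.quantiles uses the "exclusive" method by default.
q1, q2, q3 = statistics.quantiles(salary, n=4)
iqr = q3 - q1
five_number_summary = (min(salary), q1, q2, q3, max(salary))
std_dev = statistics.stdev(salary)  # sample standard deviation

# Assumed rule of thumb: values beyond 1.5 * IQR from the quartiles are outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in salary if v < lower or v > upper]

print("mean:", mean_salary, "median:", median_salary, "mode (years):", mode_years)
print("range:", value_range, "IQR:", iqr, "std dev:", round(std_dev, 2))
print("five-number summary:", five_number_summary)
print("outliers:", outliers)
```

Different quartile conventions give slightly different values for Q1 and Q3 on small samples, so the IQR reported here depends on the method chosen.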
Data Science
• Machine learning uses data to extract knowledge –
predictive analytics

Predictive Data Analytics
• Used to identify trends, correlations and causation by learning patterns from data
• Study and construction of algorithms that can learn from data and make predictions on data
• It involves tasks such as:
– Classification: Categorical label prediction
• E.g.: predicting the presence or absence of a disease, or predicting the category of a disease according to its symptoms
– Regression: Numeric prediction
• E.g.: predicting the extent of a landslide, or predicting the amount of rainfall
– Clustering: Grouping of similar patterns
• E.g.: grouping similar items to be sold, or grouping people from the same region
• Learning from data
Machine Learning: Learning from Data
• 1, 2, 3, 4, 5, ?, …, 24, 25, 26, 27, ?
• 1, 3, 5, 7, 9, ?, …, 25, 27, 29, 31, ?
• 2, 3, 5, 7, 11, ?, …, 29, 31, 37, 41, ?
• 1, 4, 9, 16, 25, ?, …, 121, 144, 169, ?
• 1, 2, 4, 8, 16, 32, ?,…, 1024, 2048, 4096, ?
• 1, 1, 2, 3, 5, 8, ?, …, 55, 89, 144, 233, ?
• 1, 1, 2, 4, 7, 13, ?, 44, 81, 149, 274, 504, ?
• 3, 5, 12, 24, 41, ?, …., 201, 248, 300, 357, ?
• 1, 6, 19, 42, 59, ?, …, 95, 117, 156, 191, ?

• 1, 2, 3, 4, 5, 6, …, 24, 25, 26, 27, 28
• 1, 3, 5, 7, 9, 11, …, 25, 27, 29, 31, 33
• 2, 3, 5, 7, 11, 13, …, 29, 31, 37, 41, 43
• 1, 4, 9, 16, 25, 36, …, 121, 144, 169, 196
• 1, 2, 4, 8, 16, 32, 64,…, 1024, 2048, 4096, 8192
• 1, 1, 2, 3, 5, 8, 13, …, 55, 89, 144, 233, 377
• 1, 1, 2, 4, 7, 13, 24, 44, 81, 149, 274, 504, 927
• 3, 5, 12, 24, 41, 63, …, 201, 248, 300, 357, 419
(successive differences: 2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62)
• 1, 6, 19, 42, 59, ?, …, 95, 117, 156, 191, ?

• Pattern: Any regularity or structure in data or source of data
• Pattern Analysis: Automatic discovery of patterns in data
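As a small illustration of automatic pattern discovery, the sequence 3, 5, 12, 24, 41, … above can be extended by looking at successive differences. The sketch below is one possible approach, not a method from the slides: it differences the sequence repeatedly until the differences are constant, then adds the last entry of each level to get the next term. It only works for sequences generated by a polynomial rule, so it would not recover the powers of two or the Fibonacci-like sequences.

```python
def differences(seq):
    """Return the successive differences of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]


def predict_next(seq):
    """Predict the next term by repeated differencing.

    Assumes some level of differences is constant (polynomial pattern).
    """
    levels = [list(seq)]
    # Keep differencing until the current level is constant or has one entry.
    while len(levels[-1]) > 1 and len(set(levels[-1])) > 1:
        levels.append(differences(levels[-1]))
    # The next term is the sum of the last entries of all levels.
    return sum(level[-1] for level in levels)


print(differences([3, 5, 12, 24, 41]))   # first differences of the sequence
print(predict_next([3, 5, 12, 24, 41]))  # next term by differencing
print(predict_next([1, 4, 9, 16, 25]))   # also works for the squares
```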
Image Classification
[Figure: example images for the classes Tiger, Giraffe, Horse and Bear, illustrating intraclass variability]
Scene Image Classification
[Figure: example scene images for the classes Tall building, Inside city, Street, Highway, Coast, Open country, Mountain and Forest, illustrating interclass similarity]
Scene Image Clustering
[Figure: clusters of scene images: Residential Interiors, Mountains, Military Vehicles, Sacred Places, Sunsets & Sunrises]
Machine Learning for Pattern Recognition
• Learning: Acquiring new knowledge or modifying the existing knowledge
• Knowledge: Familiarity with information present in data
• Learning by machines for pattern analysis: Acquisition of knowledge from data to discover patterns in data
• Data-driven techniques for learning by machines: Learning from examples (Training of models)
• Generalization ability of learning machines: Performance of trained models on new (test) data
• Target of learning techniques: Good generalization ability
• Learning techniques: Estimation of parameters of models
• Learning machines and learning techniques for pattern analysis:
– Statistical Models (Maximum likelihood)
– Artificial Neural Networks (Error correction learning)
– Kernel Methods (Learning optimal linear relationships)
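To make "estimation of parameters of models" concrete, here is a minimal sketch of maximum likelihood estimation for one statistical model. The choice of a univariate Gaussian is an assumption for illustration; the slides do not specify a model. The data are the salary values from the Data3 illustration. For a Gaussian, the maximum likelihood estimates are the sample mean and the variance computed with a 1/N factor.

```python
import math


def gaussian_mle(samples):
    """Maximum likelihood estimates (mean, variance) of a univariate Gaussian."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n  # 1/N, not 1/(N-1)
    return mu, var


def gaussian_log_likelihood(samples, mu, var):
    """Log-likelihood of the samples under a Gaussian with the given parameters."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for x in samples
    )


# Salary values (in Rs 1000) from the Data3 illustration.
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]
mu_hat, var_hat = gaussian_mle(salary)

print("estimated mean:", mu_hat)
print("estimated variance:", var_hat)
# Moving the mean away from the estimate lowers the log-likelihood.
print(gaussian_log_likelihood(salary, mu_hat, var_hat))
print(gaussian_log_likelihood(salary, mu_hat + 5, var_hat))
```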
Illustration – Data1: Representing a Person
• A person is represented using two attributes:
– Height
– Weight
[Figure: plot of Weight in kg (x2) against Height in cm (x1)]

x = [x1 x2]^T
Illustration – Data2: Iris (Flower) Data [1]

x = [x1 x2 x3 x4]^T

[1] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7, Part II, pp. 179-188, 1936.
Illustration – Data3: Years of Experience and Salary

Years of experience (x1)   Salary in Rs 1000 (x2)
3                          30
8                          57
9                          64
13                         72
3                          36
6                          43
11                         59
21                         90
1                          20
16                         83

[Figure: plot of Salary (x2) against Years of experience (x1)]

x = [x1 x2]^T
Illustration – Data4: Environmental Data

x = [x1 x2 x3 x4]^T
Supervised and Unsupervised Learning

Supervised Learning
• Learning under supervision
– Student learning from a teacher
– Child learning to recognize objects/animals
• In the context of machine learning, the data used for learning (training data) is labeled
• Labeled data: Data for which the target value is already known
Labeled Data – Illustration: Data1 - Representing a Person
• A person is represented using two attributes:
– Height
– Weight
• Class (y):
– Child (0)
– Adult (1)
[Figure: plot of Weight in kg (x2) against Height in cm (x1)]

x = [x1 x2]^T
Labeled Data – Illustration: Data2 - Iris (Flower) Data
• Class (y):
– Iris Setosa (1)
– Iris Versicolour (2)
– Iris Virginica (3)

x = [x1 x2 x3 x4]^T
Labeled Data – Illustration: Data3 - Years of Experience and Salary
• Class – Raise in Salary (y):
– Yes (1)
– No (0)

Years of experience (x1)   Salary in Rs 1000 (x2)   Raise (y)
3                          30                       1
8                          57                       0
9                          64                       1
13                         72                       1
3                          36                       1
6                          43                       0
11                         59                       1
21                         90                       1
1                          20                       0
16                         83                       0

x = [x1 x2]^T
Labeled Data – Illustration: Data3 - Years of Experience and Salary
• Input variable: Years of experience
• Output variable: Salary

Years of experience (x)   Salary in Rs 1000 (y)
3                         30
8                         57
9                         64
13                        72
3                         36
6                         43
11                        59
21                        90
1                         20
16                        83
Illustration – Data4: Environmental Data
• Predicting Rain (target attribute) based on Temperature, Humidity and Pressure
• Input variables: Temperature, Humidity and Pressure
• Output variable: Rain
Supervised Learning
• In supervised learning, each example (data sample) is a pair consisting of an input (typically a vector) and a desired output value (also called the target)
• The task is to learn a function that maps an input to an output based on example input-output pairs
• A supervised learning algorithm
– analyzes the training data and
– produces an inferred function, which can be used to predict the output for new examples
• One scenario is for the algorithm to determine the class labels of unseen instances
[Figure: a classifier takes Height (x1) and Weight (x2) as input and outputs the class Adult/Child. Adult: Class C1 (1); Child: Class C2 (0)]
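The adult/child classifier can be sketched as follows. The slides do not name a specific classification algorithm, so this example assumes a nearest-centroid rule: training computes the mean feature vector of each class, and prediction assigns a new sample to the class with the closest mean. The height and weight values are made up for illustration; only the label coding (Child = 0, Adult = 1) comes from the slides.

```python
import math


def train_nearest_centroid(X, y):
    """Return a dict mapping each class label to its mean feature vector."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical labeled training data: [height in cm, weight in kg].
X = [[110, 20], [120, 25], [130, 30], [125, 28],   # children
     [165, 60], [170, 70], [180, 80], [160, 55]]   # adults
y = [0, 0, 0, 0, 1, 1, 1, 1]                        # Child = 0, Adult = 1

centroids = train_nearest_centroid(X, y)

# Unseen instances: the inferred function predicts their class labels.
print(predict(centroids, [175, 72]))
print(predict(centroids, [115, 22]))
```

Height and weight are on different scales, so a practical version would usually normalize the features before computing distances.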
Supervised Learning
• Supervised learning is grouped into
– Classification
– Regression
• Classification:
– Output variable is categorical
– Categorical label prediction
– Example:
• Predicting whether a person is an adult or a child (2-class)
• Predicting a raise in salary based on years of experience and salary (2-class)
• Identifying an email as spam or not (2-class)
• Predicting the presence or absence of a disease (2-class)
• Categorizing a disease according to its symptoms (Multi-class)
• Categorizing the Iris flowers (Multi-class)
Supervised Learning
• Supervised learning is grouped into
– Classification
– Regression
• Regression:
– Output variable is a real or continuous value
– Numeric prediction
– Example:
• Predicting the salary based on the experience
• Predicting the amount of rainfall based on atmospheric temperature, humidity, pressure, amount of sunlight etc.
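The salary example can be sketched with a straight-line fit. The slides do not say which regression model is used, so this assumes simple linear regression fitted by ordinary least squares, using the years-of-experience and salary values from the Data3 illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data3: input x is years of experience, output y is salary in Rs 1000.
years = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

slope, intercept = fit_line(years, salary)

# Numeric prediction for a new input: 10 years of experience.
predicted = intercept + slope * 10
print(f"salary = {intercept:.2f} + {slope:.2f} * years")
print(f"predicted salary for 10 years: {predicted:.1f} (Rs 1000)")
```

On this data the fitted slope is roughly 3.5, that is, each additional year of experience corresponds to about Rs 3,500 more salary under the linear assumption.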
Unsupervised Learning
• Learning without supervision
• In the context of machine learning, the data used for learning (training data) is unlabeled
• Given unlabeled data, the machine tries to identify the patterns in it and respond accordingly
• Example:
– A person is represented using two attributes: Height and Weight
– No label is given
– The machine tries to learn the patterns in the given set and groups the samples based on similarity
[Figure: plot of Weight in kg (x2) against Height in cm (x1)]
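Grouping unlabeled height and weight samples by similarity can be sketched with k-means, one common clustering algorithm. The slides do not name a specific algorithm, so its use here is an assumption, as are the data values and the choice of k = 2. To keep the example deterministic, the first k points serve as the initial centroids; practical implementations usually pick them at random.

```python
import math


def k_means(points, k, iterations=10):
    """Basic k-means: returns (cluster index per point, final centroids)."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialization
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Hypothetical unlabeled data: (height in cm, weight in kg). No labels are used.
points = [(110, 20), (165, 60), (120, 25), (170, 70),
          (130, 30), (180, 80), (125, 28), (160, 55)]

assignments, centroids = k_means(points, k=2)
print(assignments)
print(centroids)
```

The cluster indices are arbitrary names for the groups; unlike class labels in supervised learning, they carry no meaning such as "adult" or "child" until someone interprets them.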
Summary
• Machine learning: Learning from data
• Supervised machine learning
– Data used for learning (training data) is labeled
– Each example (data sample) is a pair consisting of an input (typically a vector) and a desired output value (also called the target)
– The task is to learn a function that maps an input to an output based on example input-output pairs
– Classification and Regression
• Unsupervised machine learning
– Data used for learning (training data) is unlabeled
– Given unlabeled data, the machine tries to identify patterns based on similarity
– Clustering and Association
Unsupervised Learning
• Unsupervised learning is grouped into
– Clustering
– Association
• Clustering:
– Partitioning the data into cohesive groups such that the data samples in a group are similar
– Example:
• Grouping persons based on their height and weight
• Given customers and their purchase data:
– Grouping the customers based on similar products purchased
• Association:
– A rule-based machine learning method for discovering interesting relations between variables in a data set
– Example:
• Given customers and their purchase data:
– Finding the products purchased together
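Finding products purchased together can be sketched by counting how often pairs of items occur in the same basket. The transactions below are made up for illustration. Support and confidence are the standard measures used in association rule mining; they are not defined in the slides, so treat them as an added assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase data: one set of products per customer transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def pair_support(transactions):
    """Fraction of transactions that contain each pair of products."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)


support = pair_support(transactions)
for pair, value in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, value)

# Rule "bread -> milk": how often milk is bought when bread is bought.
print(confidence(transactions, "bread", "milk"))
```

Counting all pairs this way grows quickly with the number of products; algorithms such as Apriori prune the search by discarding infrequent items first.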
Text Books

1. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Third Edition, Morgan Kaufmann Publishers, 2011.

2. S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, 2009.

3. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
