Unit - 2 Machine Learning
● Tom Mitchell provides a more modern definition: "A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E."
● Example: playing checkers.
○ E = the experience of playing many games of checkers
○ T = the task of playing checkers.
○ P = the probability that the program will win the next game.
Machine Learning Five Core Steps
Types of Machine Learning
[Figure: Machine Learning branches into Classification, Regression, Clustering and Association Analysis]
● Supervised learning:
○ Also called predictive learning. A machine predicts the class of unknown objects based
on prior class-related information of similar objects.
● Unsupervised learning:
○ Also called descriptive learning. A machine finds patterns in unknown objects by grouping
similar objects.
● Reinforcement learning:
○ A machine learns to act on its own to achieve the given goals.
Supervised learning
- The machine learns from past information.
- This information describes the task the machine has to execute.
- In the context of the definition of machine learning, this past information is the
experience E.
Supervised learning - example
● Say a machine is getting images of different objects as input and the task is to
segregate the images by either shape or colour.
● How can a machine know what a round shape or a triangular shape is?
● How can a machine distinguish images of objects based on whether they are blue
or green in colour?
● A machine needs the basic information to be provided to it.
● The basic input is given in the form of training data.
● Training data holds past data on different aspects, or features, of a
number of images, along with a tag stating whether each image is round or
rectangular, or blue or green in colour.
● The tag is called a ‘label’, and we say the training data is labelled in the case of
supervised learning.
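As a minimal sketch of learning from labelled training data, the Python snippet below labels a new image by copying the label of its most similar training example (a 1-nearest-neighbour rule, chosen here only for illustration). The features `corner_count` and `roundness`, and all the values, are hypothetical.

```python
import math

# Hypothetical labelled training data: (corner_count, roundness) -> shape label.
# roundness is an invented score in [0, 1]; 1.0 would mean a perfect circle.
training_data = [
    ((0, 0.98), "round"),
    ((0, 0.95), "round"),
    ((3, 0.40), "triangular"),
    ((3, 0.45), "triangular"),
    ((4, 0.70), "rectangular"),
    ((4, 0.65), "rectangular"),
]

def predict_label(features, examples):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        examples, key=lambda example: math.dist(example[0], features)
    )
    return nearest_label

new_image = (3, 0.42)  # an unlabelled image described by the same features
print(predict_label(new_image, training_data))
```

Without the labels in `training_data`, the function would have nothing to copy; that is the sense in which the labelled data acts as the experience.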
Examples of supervised learning
● Predicting the results of a game
● Predicting whether the tumor is malignant or benign
● Predicting prices in domains such as real estate, stocks, etc.
● Classifying text, for example labelling each email in a set as spam or not spam.
Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it
has been hacked/compromised. Should you treat these as classification or as regression problems?
Regression
● Regression analysis is a statistical method for modelling the relationship between
a dependent (target) variable and one or more independent (predictor)
variables.
● More specifically, regression analysis helps us understand how the value of
the dependent variable changes with one independent variable
when the other independent variables are held fixed.
● It predicts continuous/real values such as temperature, age, salary, price, etc.
Regression
● Example: Suppose a marketing
company A runs advertisements
every year and records the resulting sales. The company has a list of
its advertising spend over the
last 5 years and the corresponding sales.
● Now the company wants to spend $200 on advertising
in the year 2019 and wants a
prediction of the sales for that year.
Some real life examples
Example 2:
Given data about the size of houses on the real estate market, try to predict their price. Price as
a function of size is a continuous output, so this is a regression problem.
Example 3:
Given a picture of a person, predict their age. Age is a continuous output, so this is also a regression problem.
Regression
● Regression helps find the correlation between variables and enables us to predict a
continuous output variable from one or more predictor variables.
● It is mainly used for prediction, forecasting, time series modelling, and
determining cause-and-effect relationships between variables.
● In regression, we fit a line or curve to the given data points; the machine learning
model then uses this fit to make predictions about the data. In simple words,
● "Regression finds a line or curve on the target-predictor graph such that the
vertical distances between the data points and the regression line are as small as possible."
● The fitted line generally does not pass through every data point.
● The distance between the data points and the line tells whether the model has captured a
strong relationship or not.
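The idea of vertical distance can be sketched in a few lines of Python. The example below compares two candidate lines by their mean squared error, one common way to summarise those distances; the data points and both candidate lines are hypothetical.

```python
# Hypothetical data points: (predictor x, target y).
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]

def mean_squared_error(slope, intercept, data):
    """Average squared vertical distance between the data and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in data) / len(data)

good_fit = mean_squared_error(2.0, 0.0, points)  # candidate line y = 2x
poor_fit = mean_squared_error(0.5, 3.0, points)  # candidate line y = 0.5x + 3
print(good_fit, poor_fit)
```

The line with the smaller error stays closer to the data points, so it is the better description of the relationship.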
Regression
Other reasons for using Regression analysis:
● Regression estimates the relationship between the target and the independent
variable.
● It is used to find the trends in data.
● It helps to predict real/continuous values.
● By performing regression, we can estimate which factors are the most and the least
important, and how each factor affects the target.
Regression- Contd
● The relationship between variables in a linear
regression model can be illustrated by predicting the salary
of an employee on the basis of years of
experience.
● Mathematical equation for linear regression:
Y = aX + b
Here,
Y = dependent variable (target variable),
X = independent variable (predictor variable),
a = slope and b = intercept (the linear coefficients)
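A minimal sketch of how a and b might be estimated, assuming ordinary least squares with a single predictor. The experience and salary figures are hypothetical and chosen to be perfectly linear so the result is easy to check by hand.

```python
# Hypothetical data: years of experience (X) and salary in thousands (Y).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (a, b) in Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b

a, b = fit_line(experience, salary)
predicted = a * 6 + b  # predicted salary for 6 years of experience
print(a, b, predicted)
```

For this data the fit gives a = 5 and b = 25, so 6 years of experience maps to a predicted salary of 55.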
Regression
● Types of Regression
○ Linear Regression
○ Logistic Regression (used for classification, despite its name)
○ Polynomial Regression
○ Support Vector Regression
○ Decision Tree Regression
○ Random Forest Regression
○ Ridge Regression
○ Lasso Regression
Unsupervised learning
● There is no labelled training data to learn from and no prediction to be made.
● The objective is to take a dataset as input and try to find natural groupings or
patterns within the data
● It is often termed a descriptive model, and the process of unsupervised
learning is referred to as pattern discovery or knowledge discovery.
● Clustering is the main type of unsupervised learning.
○ It intends to group or organize similar objects together.
○ Objects belonging to the same cluster are quite similar to each other while objects belonging to
different clusters are quite dissimilar.
○ The objective of clustering is to discover the intrinsic grouping of unlabelled data and form clusters.
○ Different measures of similarity can be applied for clustering.
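One way to picture clustering is the k-means procedure, sketched below in plain Python. It is only one of several clustering methods; the data, the choice of Euclidean distance as the similarity measure, and the simplistic initialisation are all assumptions made for illustration.

```python
import math

def k_means(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # simplistic initialisation: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (Euclidean distance).
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical unlabelled 2-D data with two visible groups.
data = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.9, 8.1)]
centroids, clusters = k_means(data, k=2)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from how close the points are to each other.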
Question?
Of the following examples, which would you address using an unsupervised learning algorithm? (Check all
that apply.)
Reinforcement learning
● Example: We have seen babies learning to walk without any prior knowledge of how to
do it.
○ First they notice how others do it.
○ They understand that legs have to be used, one at a time, to take a step
○ While walking, they sometimes fall after hitting an obstacle, whereas at other times they are able to walk
smoothly.
○ Babies might get a reward, such as parents clapping their hands or a chocolate.
○ There are no claps when the baby falls.
○ Slowly, babies learn from their mistakes and become able to walk with ease.
● In the same way, machines often learn to do tasks automatically.
● The machine is given a task with hurdles.
● It tries to improve its performance at the task.
● When a subtask is completed successfully, a reward is given.
● When a subtask is not completed successfully, no reward is given.
● This continues until the task is completed successfully.
● This process of learning is called reinforcement learning.
● Applications
○ Self driving cars
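The reward-driven loop described above can be sketched with tabular Q-learning, one common reinforcement learning method (swapped in here for illustration; the slides do not name a specific algorithm). The corridor environment, the reward of 1 at the goal, and the hyperparameters are all hypothetical.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N_STATES = 5        # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = [-1, +1]  # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # illustrative hyperparameters

# q[state][action_index] estimates the long-term reward of taking an action.
q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # Explore occasionally (or when the estimates are tied), else act greedily.
        if random.random() < EPSILON or q[state][0] == q[state][1]:
            action = random.randrange(2)
        else:
            action = 0 if q[state][0] > q[state][1] else 1
        # Walls at both ends: the agent cannot leave the corridor.
        next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0  # reward only at the goal
        # Q-learning update: move the estimate toward reward + discounted future value.
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state

policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(N_STATES - 1)]
print(policy)
```

After training, the learned policy is to step right from every position, which is the shortest route to the reward. Nobody told the agent this; it was inferred from rewards alone.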
Comparison - supervised, unsupervised and reinforcement learning
Question?
Some of the problems below are best addressed using a supervised learning algorithm, and the others
with an unsupervised learning algorithm. Which of the following would you apply supervised learning to?
(Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to
learn from.
● Examine a large collection of emails that are known to be spam, to discover whether there are
sub-types of spam mail. := This can be addressed using a clustering (unsupervised learning) algorithm, to
cluster spam mail into sub-types.
● Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the
next 10 years. := This can be addressed as a supervised learning, classification, problem, where we
can learn from a labeled dataset comprising different people's genetic data, and labels telling us if
they had developed diabetes.
● Given 50 articles written by male authors, and 50 articles written by female authors, learn to predict
the gender of a new manuscript's author (when the identity of this author is unknown). := This
can be addressed as a supervised learning, classification, problem, where we learn from the labeled
data to predict gender.
Tools in Machine learning
1. Python
a. A widely used open-source programming language
b. NumPy - arrays and mathematical functions
c. Matplotlib - plotting of numerical data
d. SciPy - mathematical and scientific tools
e. scikit-learn - classification, regression and clustering algorithms
2. R
a. Used for Statistical computing and data analysis
b. Open source
3. MATLAB
a. Short for matrix laboratory
b. Licensed commercial software
c. Used for numerical computing
4. SAS- Statistical Analysis System
a. Licensed commercial software
b. Strong support for machine learning functionalities