Unit - 2 Machine Learning

Uploaded by

Mihir Maisuria
Copyright
© © All Rights Reserved

Unit-I

Introduction to Machine Learning


Overview of Human Learning
● Learning is typically described as the process of gaining information through observation.
● Why do we need to learn?
○ In daily life, we carry out many activities.
○ A task may be as simple as walking down the street or doing homework.
○ Or it may be as complex as deciding the trajectory angle of a rocket being launched into space.
● As we keep learning, our efficiency in doing tasks keeps improving.
● With more knowledge, we can do homework with fewer mistakes.
● With more learning, tasks can be performed more easily.
Types of Human Learning
● Learning under expert guidance
○ For example, a child taught by his parents.
○ He calls his hand a ‘hand’ because that is the information he gets from his parents.
○ The sky is blue to him because that is what his parents have taught him.
○ The next phase of life begins when the child goes to school. He starts with basic familiarization with letters and digits,
○ then moves on to words, sentences, paragraphs, etc.
○ Later phases of life bring higher studies, professional life, etc.
○ In every phase of a human being's life there is an element of guided learning.
● Guided learning is therefore the process of gaining information from a person who has sufficient knowledge due to past experience.
Types of Human Learning (contd.)
● Learning guided by knowledge gained from experts
○ The knowledge was imparted by a teacher or mentor at some earlier point in time, in some other form or context.
○ Example: a baby can group together all objects of the same color even if his parents have not specifically taught him to do so.
○ There is no direct learning.
○ Information shared in the past, in some other context, is used to make decisions.
● Learning by self
○ In many situations, humans are left to learn on their own.
○ A classic example is a baby learning to walk around obstacles.
○ He bumps into obstacles and falls down many times until he learns that whenever there is an obstacle, he needs to get past it.
○ Not everything is taught by others.
○ Many things can be learnt only from mistakes made in the past.
When to use Machine learning
● The task involves a repeated decision or evaluation that you want to automate, and you need consistent results.
● It is difficult or impossible to explicitly describe the solution or the criteria behind a decision.
● You have labeled data, or existing examples where you can describe the situation and map it to the correct result.
Applications of Machine learning
Three major domains where machine learning is applied:
● Banking and finance
○ Identifying fraudulent transactions
○ Retaining customers so that they do not leave the bank
○ Identifying customers who are likely to leave
● Insurance
○ Risk prediction during new customer onboarding
○ Claims management: deciding whether a claim is fraudulent
● Healthcare
○ Predicting health conditions
○ Alerting a person to take preventive action
○ Machine learning with computer vision also plays an important role in disease diagnosis from medical imaging
AI, ML and DL
1. Artificial Intelligence (AI): AI aims to enable machines to carry out tasks that normally require human intelligence, with little or no human intervention. It is a broad area of computer science.
2. Machine Learning (ML): ML is a subset of AI that uses statistical learning algorithms to build smart systems. ML systems can automatically learn and improve without being explicitly programmed.
3. Deep Learning (DL): DL is a subset of ML, and therefore of AI. It is a technique inspired by the way the human brain filters information.
Evolution of Machine Learning from 1950
What is Machine Learning?
● Before defining it, we should be able to answer some more fundamental questions:
○ Do machines really learn?
○ If so, how do they learn?
○ Which problems do we consider well-posed learning problems? What are the important features required to define a learning problem well?
Definition of Machine Learning
● Arthur Samuel described it as "the field of study that gives computers the ability to learn without being explicitly programmed." This is an older, informal definition.
● Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
● Example: playing checkers.
○ E = the experience of playing many games of checkers.
○ T = the task of playing checkers.
○ P = the probability that the program will win the next game.
Machine Learning Five Core Steps
Types of Machine Learning
● Supervised learning
○ Classification
○ Regression
● Unsupervised learning
○ Clustering
○ Association analysis
● Reinforcement learning
Types of Machine Learning
● Supervised learning:
○ Also called predictive learning. A machine predicts the class of unknown objects based on prior class-related information about similar objects.
● Unsupervised learning:
○ Also called descriptive learning. A machine finds patterns in unknown objects by grouping similar objects.
● Reinforcement learning:
○ A machine learns to act on its own to achieve given goals.
Supervised learning
- The machine learns from past information.
- This past information describes the task the machine has to execute.
- In the context of the definition of machine learning, this past information is the experience E.
Supervised learning - example
● Say a machine receives images of different objects as input, and the task is to segregate the images by either shape or colour.
● How can a machine know what a round shape or a triangular shape is?
● How can a machine distinguish images of objects based on whether they are blue or green?
● The machine needs this basic information to be provided to it.
● The basic input is given in the form of training data.
● Training data holds past data on different aspects, or features, of a number of images, along with a tag stating whether each image is round or rectangular, or blue or green.
● The tag is called a ‘label’, and we say that the training data is labelled in the case of supervised learning.
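One way to picture labelled training data is as a list of records, each holding a few features and the known label. The sketch below is only an illustration: the feature names (`corners`, `blue_share`) and all values are hypothetical and do not come from the slides.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledImage:
    """One training example: measured features plus the known tag (label)."""
    corners: int        # hypothetical feature: number of corners detected
    blue_share: float   # hypothetical feature: fraction of blue pixels (0..1)
    shape_label: str    # the label supplied with the training data


# Hypothetical, hand-made training set; every row carries its label.
training_data = [
    LabelledImage(corners=0, blue_share=0.9, shape_label="round"),
    LabelledImage(corners=0, blue_share=0.1, shape_label="round"),
    LabelledImage(corners=4, blue_share=0.8, shape_label="rectangular"),
    LabelledImage(corners=4, blue_share=0.2, shape_label="rectangular"),
    LabelledImage(corners=3, blue_share=0.5, shape_label="triangular"),
]

# Split the records into features (inputs) and labels (known answers).
features = [(img.corners, img.blue_share) for img in training_data]
labels = [img.shape_label for img in training_data]

# Count how many examples carry each label.
label_counts = Counter(labels)
print(label_counts)
```

A supervised algorithm would receive `features` and `labels` together; an unlabelled dataset would contain only `features`.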
Examples of supervised learning
● Predicting the result of a game
● Predicting whether a tumor is malignant or benign
● Predicting prices in domains such as real estate and stocks
● Classifying texts, such as classifying a set of emails as spam or not spam
● When we are trying to predict a categorical or nominal variable, the problem is known as a classification problem.
● When we are trying to predict a real-valued variable, the problem falls under the category of regression.
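The rule of thumb above can be sketched as a small helper that inspects the target values. This is a deliberate simplification: the function name `problem_type` is hypothetical, and it assumes that numeric targets are real-valued quantities (categories stored as integer codes would be misjudged).

```python
from numbers import Real


def problem_type(target_values):
    """Return 'regression' for real-valued targets, else 'classification'.

    Simplifying assumption: every numeric target is a continuous quantity.
    Booleans are treated as categories, not numbers.
    """
    numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in target_values
    )
    return "regression" if numeric else "classification"


# Hypothetical targets for two supervised problems.
house_prices = [250000.0, 310500.0, 198750.0]   # real-valued target
email_tags = ["spam", "not spam", "spam"]       # categorical target

print(problem_type(house_prices))
print(problem_type(email_tags))
```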
Question?
You’re running a company, and you want to develop learning algorithms to address each of two problems.

Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.

Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.

Should you treat these as classification or as regression problems?

1. Treat both as classification problems.
2. Treat problem 1 as a classification problem, problem 2 as a regression problem.
3. Treat problem 1 as a regression problem, problem 2 as a classification problem.
4. Treat both as regression problems.
Classification
● Let’s discuss how to segregate images of objects based on shape.
● If an image shows a round object, it is put in one category; if it shows a rectangular object, it is put in another category.
● The category in which the machine should put an image of unknown category, also called test data, depends on the information it gets from past data, which we have called training data.
● Since the training data has a label or category defined for every image, the machine has to map a new image (test data) to the set of images it is most similar to and assign the same label or category to it.
● The whole problem revolves around assigning a label, category or class to test data based on the label, category or class information imparted by the training data.
● Since the objective is to assign a class label, this type of problem is called a classification problem.
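The idea of mapping a test image to the most similar training images can be sketched with a k-nearest-neighbour vote. This is a minimal illustration, not the only way to classify: the features (number of corners, width/height ratio), the data, and the function name `knn_predict` are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical features per image: (number of corners, width / height ratio)
train_points = [(0, 1.0), (0, 1.1), (0, 0.9), (4, 1.5), (4, 2.0), (4, 1.8)]
train_labels = ["round", "round", "round",
                "rectangular", "rectangular", "rectangular"]

test_image = (4, 1.6)  # unlabelled test data
print(knn_predict(train_points, train_labels, test_image))  # rectangular
```

The three closest training images to `test_image` are all labelled rectangular, so the test image receives that label.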
Machine learning algorithms for classification
● Some machine learning algorithms for classification:
○ Naive Bayes
○ Decision tree
○ k-Nearest Neighbour algorithm
○ Random Forest
○ Support Vector Machine
● In summary, classification is a type of supervised learning in which a categorical target feature is predicted for test data based on information imparted by training data.
● Some typical classification problems include:
○ Image classification
○ Prediction of disease
○ Win/loss prediction of games
○ Prediction of natural calamities
○ Recognition of handwriting
Regression
● Regression analysis is a statistical method for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
● More specifically, regression analysis helps us understand how the value of the dependent variable changes with one independent variable when the other independent variables are held fixed.
● It predicts continuous/real values such as temperature, age, salary, price, etc.
Regression
● Example: Suppose a marketing company A runs various advertisements every year and earns sales from them. The accompanying list shows the company's advertising spend over the last 5 years and the corresponding sales.
● Now the company plans to spend $200 on advertising in 2019 and wants a prediction of the sales for that year.
Some real life examples
Example 2:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

Example 3:

Given a picture of a person, predict their age on the basis of that picture. Age is a continuous output, so this is also a regression problem.
Regression
● Regression helps find the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables.
● It is mainly used for prediction, forecasting, time series modeling, and exploring cause-and-effect relationships between variables. A fitted regression on its own shows association, not proof of causation.
● In regression, we fit a line or curve to the given data points; using this fit, the machine learning model can make predictions about the data.
● In simple words, regression finds a line or curve on the target-predictor graph such that the total vertical distance between the data points and the regression line is minimal. The line generally does not pass through every data point.
● The distance between the data points and the line tells us whether the model has captured a strong relationship or not.
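The "vertical distance" between data points and a line can be measured in code. The sketch below uses the mean of the squared vertical distances, which is the usual least-squares choice but is an assumption here, since the slides do not name a specific measure. The data and the two candidate lines are hypothetical.

```python
def mean_squared_error(xs, ys, slope, intercept):
    """Average squared vertical distance between the points and a line."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)


# Hypothetical data lying close to the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Two candidate lines: one close to the data, one far from it.
good_fit = mean_squared_error(xs, ys, slope=2.0, intercept=0.0)
poor_fit = mean_squared_error(xs, ys, slope=0.5, intercept=1.0)

print(good_fit, poor_fit)
```

The line with the smaller error sits closer to the data points; regression searches for the coefficients that make this error as small as possible.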
Regression
Other reasons for using regression analysis:

● Regression estimates the relationship between the target and the independent variables.
● It is used to find trends in data.
● It helps to predict real/continuous values.
● By performing regression, we can estimate which factors are the most important, which are the least important, and how each factor affects the target.
Regression (contd.)
● The relationship between variables in a linear regression model can be illustrated by predicting the salary of an employee on the basis of years of experience.
● Mathematical equation for linear regression:
Y = aX + b

Here,
Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients (a is the slope and b is the intercept).
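The coefficients a and b in Y = aX + b can be estimated from data with the standard closed-form least-squares formulas. The sketch below is illustrative only: the experience and salary figures are hypothetical (chosen to lie exactly on a line), and `fit_line` is a name introduced here.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (slope) and b (intercept) in Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

a, b = fit_line(years, salary)
predicted = a * 6 + b  # predicted salary for 6 years of experience

print(a, b, predicted)  # 5.0 30.0 60.0
```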
Regression
● Types of Regression
○ Linear Regression
○ Logistic Regression (despite its name, typically used for classification)
○ Polynomial Regression
○ Support Vector Regression
○ Decision Tree Regression
○ Random Forest Regression
○ Ridge Regression
○ Lasso Regression
Unsupervised learning
● There is no labelled training data to learn from and no prediction to be made.
● The objective is to take a dataset as input and try to find natural groupings or patterns within the data.
● It is often termed a descriptive model, and the process of unsupervised learning is referred to as pattern discovery or knowledge discovery.
● Clustering is the main type of unsupervised learning.
○ It aims to group or organize similar objects together.
○ Objects belonging to the same cluster are quite similar to each other, while objects belonging to different clusters are quite dissimilar.
○ The objective of clustering is to discover the intrinsic grouping of unlabelled data and form clusters.
○ Different measures of similarity can be applied for clustering.
Question?
Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)

1. Given email labeled as spam/not spam, learn a spam filter.
2. Given a set of news articles found on the web, group them into sets of articles about the same stories.
3. Given a database of customer data, automatically discover market segments and group customers into different market segments.
4. Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.
Answer: options 2 and 3. Neither comes with labels, and both ask for natural groupings to be found in the data, so they are unsupervised (clustering) problems. Options 1 and 4 come with labelled data and are supervised classification problems.
● A common similarity measure is distance.
● Two data items are considered part of the same cluster if the distance between them is small.
● If the distance between two data items is large, the items generally do not belong to the same cluster.
● This is known as distance-based clustering.
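Distance-based clustering can be sketched with k-means, a common algorithm that the slides do not name and that is used here only as one possible illustration. The points and the starting centroids are hypothetical, and the starting centroids are fixed by hand so that the result is reproducible.

```python
import math


def k_means(points, centroids, iterations=10):
    """Group points around the nearest centroid, then re-centre; repeat."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each point joins the cluster of its nearest centroid.
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical unlabelled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

clusters, centroids = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

Points that are close together end up in the same cluster, while points far apart are separated, without any labels being supplied.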
Association analysis
● Another variant of unsupervised learning.
● Associations between data items are identified.
● Example: market basket analysis.
○ From past transaction data in a grocery store, it may be observed that most customers who bought item A also bought item B, item C, or both.
○ This means there is a strong association between the event ‘purchase of item A’ and the events ‘purchase of item B’ and ‘purchase of item C’.
○ Identifying these sorts of associations is the goal of association analysis.
● Applications:
○ Market basket analysis
○ Recommender systems
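The strength of an association such as "customers who buy A also buy B" is commonly quantified with support and confidence. These two measures are standard in market basket analysis but are not defined in the slides, so treat the sketch below as an assumption-laden illustration; the transactions are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the share that also
    contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical past transactions from a grocery store.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

bread_support = support(transactions, {"bread"})
rule_confidence = confidence(transactions, {"bread"}, {"butter"})
print(bread_support, rule_confidence)
```

Here bread appears in 4 of 5 transactions, and 3 of those 4 also contain butter, so the rule "bread implies butter" holds in roughly three quarters of the relevant baskets.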
Reinforcement learning
● Example: We have seen babies learn to walk without any prior knowledge of how to do it.
○ First they notice how others do it.
○ They understand that legs have to be used, one at a time, to take a step.
○ While walking, they sometimes hit an obstacle and fall down, whereas at other times they are able to walk smoothly.
○ Babies might get a reward, such as clapping by their parents or chocolates.
○ Obviously, there are no claps when the baby falls.
○ Slowly, the babies learn from their mistakes and are able to walk with much more ease.
● In the same way, machines often learn to do tasks automatically.
● The machine is given a task with hurdles.
● It tries to improve its performance on the task.
● When a subtask is completed successfully, a reward is given.
● When a subtask is not performed successfully, no reward is given.
● This continues until the task is completed successfully.
● This process of learning is called reinforcement learning.
● Applications
○ Self-driving cars
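The reward-driven loop described above can be sketched with tabular Q-learning, a standard reinforcement learning technique that the slides do not name. Everything in the sketch is an assumption made for illustration: a five-cell corridor, a reward of 1 for reaching the last cell and 0 otherwise, and systematic sweeps over every state and action in place of random exploration so that the result is reproducible.

```python
GOAL = 4                  # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)        # step left or step right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)


def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), GOAL)
    reached_goal = next_state == GOAL
    reward = 1.0 if reached_goal else 0.0
    return next_state, reward, reached_goal


def train(sweeps=50):
    """Apply the Q-learning update to every (state, action) pair repeatedly."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL + 1)}
    for _ in range(sweeps):
        for state in range(GOAL):
            for action in ACTIONS:
                next_state, reward, done = step(state, action)
                # Value of the best follow-up action (zero once the task ends).
                future = 0.0 if done else GAMMA * max(q[next_state].values())
                q[state][action] += ALPHA * (reward + future - q[state][action])
    return q


q_table = train()
# The learned policy picks the action with the highest estimated value.
policy = {s: max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)}
print(policy)
```

After training, the policy steps right in every cell, because only that direction leads to the rewarded goal.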
Comparison - supervised, unsupervised and reinforcement learning
Question?
Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
● Examine a large collection of emails that are known to be spam email, to discover if there are sub-types of spam mail. := This can be addressed using a clustering (unsupervised learning) algorithm, to cluster spam mail into sub-types.
● Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the next 10 years. := This can be addressed as a supervised learning (classification) problem, where we can learn from a labeled dataset comprising different people's genetic data, and labels telling us if they had developed diabetes.
● Given 50 articles written by male authors, and 50 articles written by female authors, learn to predict the gender of a new manuscript's author (when the identity of this author is unknown). := This can be addressed as a supervised learning (classification) problem, where we learn from the labeled data to predict gender.
Tools in Machine learning
1. Python
a. A popular open-source programming language, widely used for machine learning
b. NumPy - arrays and mathematical functions
c. Matplotlib - plotting of numerical data
d. SciPy - mathematical and scientific tools
e. scikit-learn - classification, regression and clustering algorithms
2. R
a. Used for statistical computing and data analysis
b. Open source
3. MATLAB
a. Stands for matrix laboratory
b. Licensed commercial software
c. Used for numerical computing
4. SAS - Statistical Analysis System
a. Licensed commercial software
b. Strong support for machine learning functionalities
