
R20-MACHINE LEARNING

Unit-1:
Unit I: Introduction- Artificial Intelligence, Machine Learning, Deep learning, Types of Machine Learning
Systems, Main Challenges of Machine Learning. Statistical Learning: Introduction, Supervised and
Unsupervised Learning, Training and Test Loss, Tradeoffs in Statistical Learning, Estimating Risk Statistics,
Sampling distribution of an estimator, Empirical Risk Minimization.

The evolution of machine learning from 1950 onward is depicted in Figure 1.1.
Introduction- Artificial Intelligence, Machine Learning, Deep learning

Artificial Intelligence is the concept of creating smart intelligent machines.


Machine Learning is a subset of artificial intelligence that helps you build AI-driven applications.
Deep Learning is a subset of machine learning that uses vast volumes of data and complex algorithms to
train a model.
What is Artificial Intelligence?
Artificial intelligence, commonly referred to as AI, is the process of imparting data, information, and human intelligence to machines. The main goal of artificial intelligence is to develop self-reliant machines that can think and act like humans. These machines can mimic human behavior and perform tasks by learning and problem-solving. Most AI systems simulate natural intelligence to solve complex problems.
Consider an example of an AI-driven product: Amazon Echo, the voice-controlled smart speaker.

Types of Artificial Intelligence


Reactive Machines - These are systems that only react. These systems don’t form memories, and they don’t
use any past experiences for making new decisions.

Limited Memory - These systems reference the past, and information is added over a period of time. The
referenced information is short-lived.

Theory of Mind - This covers systems that are able to understand human emotions and how they affect
decision making. They are trained to adjust their behavior accordingly.

Self-awareness - These systems are designed and created to be aware of themselves. They understand their
own internal states, predict other people’s feelings, and act appropriately.
Applications of Artificial Intelligence:

• Machine translation, such as Google Translate
• Self-driving vehicles, such as Google's Waymo
• AI robots, such as Sophia and Aibo
• Speech recognition applications, like Apple's Siri or "OK Google"

WHAT IS MACHINE LEARNING?


Machine learning is a discipline of computer science that uses computer algorithms and analytics to build
predictive models that can solve business problems.

Tom M. Mitchell, Professor in the Machine Learning Department, School of Computer Science, Carnegie Mellon University, has defined machine learning as:
‘A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience
E.’
How Does Machine Learning Work?
Machine learning accesses vast amounts of data (both structured and unstructured) and learns from it to predict future outcomes. It learns from the data by using multiple algorithms and techniques. Below is a diagram that shows how a machine learns from data.

TYPES OF MACHINE LEARNING
Machine learning can be classified into three broad categories:
1. Supervised learning – Also called predictive learning. A machine predicts the class of unknown objects
based on prior class-related information of similar objects.
2. Unsupervised learning – Also called descriptive learning. A machine finds patterns in unknown objects by
grouping similar objects together.
3. Reinforcement learning – A machine learns to act on its own to achieve the given goals.

FIG. Types of machine learning

Supervised learning:

FIG. Supervised learning


Some examples of supervised learning are:
• Predicting the results of a game
• Predicting whether a tumour is malignant or benign
• Predicting the price of domains like real estate, stocks, etc.
• Classifying texts, such as classifying a set of emails as spam or non-spam

Supervised learning comprises two areas: classification and regression.


Classification: Classification is a type of supervised learning where a categorical target feature is predicted for test data, based on the information imparted by the training data. The target categorical feature is known as the class.

FIG. Classification
Some typical classification problems include:
• Image classification
• Prediction of disease
• Win–loss prediction of games
• Prediction of natural calamities like earthquakes, floods, etc.
• Recognition of handwriting
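To make the idea concrete, here is a minimal classification sketch in Python using scikit-learn (the library, dataset, and model are illustrative assumptions, not part of the original material), echoing the malignant/benign tumour example above:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Labeled data: each row describes a tumour, y says malignant or benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)  # a simple, widely used classifier
clf.fit(X_train, y_train)                # learn from the labeled examples
y_pred = clf.predict(X_test)             # predict classes for unseen tumours
print("Accuracy:", accuracy_score(y_test, y_pred))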
Regression: In regression, the objective is to predict numerical features like real estate or stock prices, temperature, marks in an examination, sales revenue, etc. The underlying predictor variable and the target variable are continuous in nature.
In the case of simple linear regression, there is only one predictor variable, whereas in the case of multiple linear regression, multiple predictor variables can be included in the model.
A typical simple linear regression model can be represented in the form

y = a + bx,

where 'x' is the predictor variable and 'y' is the target variable.
Typical applications of regression can be seen in:
• Demand forecasting in retail
• Sales prediction for managers
• Price prediction in real estate
• Weather forecasting
• Skill demand forecasting in the job market
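As an illustration of the model y = a + bx, here is a minimal scikit-learn sketch (the synthetic data and library choice are assumptions for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data that roughly follows y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))
y = 2 * x[:, 0] + 1 + rng.normal(0, 0.5, size=50)

model = LinearRegression().fit(x, y)   # estimates intercept a and slope b
print("a (intercept):", model.intercept_)
print("b (slope):", model.coef_[0])
print("prediction at x = 4:", model.predict([[4.0]])[0])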

Below is an example of a supervised learning method. The algorithm is trained using labeled data of dogs
and cats. The trained model predicts whether the new image is that of a cat or a dog.

Unsupervised learning: In unsupervised learning, there is no labeled training data to learn from and no prediction to be made. The objective is to take a dataset as input and try to find natural groupings or patterns within the data elements or records. Unsupervised learning is therefore often termed a descriptive model, and the process of unsupervised learning is referred to as pattern discovery or knowledge discovery.

FIG. Unsupervised learning


Unsupervised learning comprises two areas: clustering and association analysis.

Some examples of unsupervised learning are:
• Market basket analysis
• Recommender systems
• Customer segmentation

Below is an example of an unsupervised learning method that trains a model using unlabeled data. In this case, the data consists of different vehicles, and the purpose of the model is to group vehicles of the same kind together.

Reinforcement learning:
The goal of reinforcement learning is to train an agent to complete a task within an uncertain environment. The agent receives observations and a reward from the environment and sends actions to the environment. The reward measures how successful an action is with respect to completing the task goal.

FIG. Reinforcement learning


Below is an example that shows how a machine is trained to identify shapes.

Examples of reinforcement learning algorithms include Q-learning and Deep Q-Networks (DQN).
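As a concrete illustration, here is a minimal tabular Q-learning sketch for a toy chain environment (the environment, rewards, and hyperparameters are assumptions for illustration):

import numpy as np

# Toy chain environment: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 yields reward 1 and ends the episode.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.3   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != 4:
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: move Q[s, a] toward reward + discounted future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)  # the learned values should favor action 1 (right) in every state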

Aspects of developing a learning system (design of a learning system):

Machine Learning Processes
Machine Learning involves seven steps:

Machine Learning Applications

• Sales forecasting for different products
• Fraud analysis in banking
• Product recommendations
• Stock price prediction
What is Deep Learning?
Deep learning is a subset of machine learning that deals with algorithms inspired by the structure and
function of the human brain. Deep learning algorithms can work with an enormous amount of both
structured and unstructured data. Deep learning’s core concept lies in artificial neural networks, which
enable machines to make decisions.

The major difference between deep learning and machine learning is the way data is presented to the machine: machine learning algorithms usually require structured data, whereas deep learning networks work on multiple layers of artificial neural networks.
This is what a simple neural network looks like:

The network has an input layer that accepts inputs from the data. The hidden layer is used to find any hidden
features from the data. The output layer then provides the expected output.

Here is an example of a neural network that uses large sets of unlabeled data of eye retinas. The network
model is trained on this data to find out whether or not a person has diabetic retinopathy.

How Does Deep Learning Work?


1. Each neuron calculates the weighted sum of its inputs.
2. The calculated weighted sum is passed as input to the activation function.
3. The activation function takes the weighted sum, adds a bias, and decides whether the neuron should be fired or not.
4. The output layer gives the predicted output.
5. The model output is compared with the actual output. The network then uses the backpropagation method to adjust its weights and improve its performance; the cost function measures the error rate to be reduced.
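A minimal sketch of steps 1–3 for a single neuron, assuming a sigmoid activation and NumPy (the weights, bias, and inputs are hypothetical):

import numpy as np

def sigmoid(z):
    # Squashes the weighted sum into (0, 1); large z means the neuron "fires".
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs to the neuron
w = np.array([0.4, 0.1, -0.6])   # one weight per input
b = 0.2                          # bias term

z = np.dot(w, x) + b             # steps 1-3: weighted sum plus bias
print("weighted sum:", z, "activation:", sigmoid(z))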

In the following example, deep learning and neural networks are used to identify the number on a license plate. This technique is used in many countries to identify rule violators and speeding vehicles.

Types of Deep Neural Networks


Convolutional Neural Network (CNN) - CNN is a class of deep neural networks most commonly used for
image analysis.

Recurrent Neural Network (RNN) - RNNs use sequential information to build a model; they often work better for tasks that require remembering past data.
Generative Adversarial Network (GAN) - GANs are algorithmic architectures that use two neural networks to create new, synthetic instances of data that can pass for real data. A GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers.
Deep Belief Network (DBN) - A DBN is a generative graphical model composed of multiple layers of latent variables called hidden units. There are connections between the layers, but not between the units within each layer.

Deep Learning Applications

• Cancer tumor detection
• Caption bots for captioning an image
• Music generation
• Image coloring
• Object detection

Main Challenges of Machine Learning:


In short, since your main task is to select a learning algorithm and train it on some data, the two things that
can go wrong are “bad algorithm” and “bad data.”

ISSUES IN MACHINE LEARNING:


• What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
• How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
• When and how can prior knowledge held by the learner guide the process of generalizing from examples? Can prior knowledge be helpful even when it is only approximately correct?
• What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
• What is the best way to reduce the learning task to one or more function approximation problems? Put another way, what specific functions should the system attempt to learn? Can this process itself be automated?
• How can the learner automatically alter its representation to improve its ability to represent and learn the target function?

1. Inadequate training data: problems arise from noisy data, incorrect data, and difficulty generalizing output data.
Data quality can be affected by factors such as the following:
Noisy data – responsible for inaccurate predictions, which affects decisions as well as accuracy in classification tasks.
Incorrect data – likewise responsible for faulty programming and results obtained from machine learning models; incorrect data may also degrade the accuracy of the results.
Generalizing of output data – sometimes generalizing output data becomes complex, which results in comparatively poor future actions.
2. Poor quality of data: if your training data is full of errors, outliers, and noise (e.g., due to poor quality
measurements), it will make it harder for the system to detect the underlying patterns, so your system
is less likely to perform well.
For example:
• If some instances are clearly outliers, it may help to simply discard them or try to fix the errors
manually.
• If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you
must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing
values (e.g., with the median age), or train one model with the feature and one model without it, and so on.
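For instance, here is a minimal pandas sketch of two of the options above, dropping incomplete instances and filling missing ages with the median (the column names and values are hypothetical):

import pandas as pd

# Hypothetical customer data in which some ages were not specified.
df = pd.DataFrame({
    "age": [34, None, 52, 41, None, 29],
    "spend": [120, 80, 300, 150, 95, 60],
})

df_dropped = df.dropna()                             # ignore the incomplete instances
df_filled = df.fillna({"age": df["age"].median()})   # or fill in with the median age
print(df_filled)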
3. Non-representative training data: it is crucial that your training data be representative of the new cases you want to generalize to. This is true whether you use instance-based learning or model-based learning.
If the sample is too small, you will have sampling noise (i.e., non-representative data as a result of chance); even very large samples can be non-representative if the sampling method is flawed. This is called sampling bias.

4. Over fitting and Under fitting:


Overfitting is one of the most common issues faced by Machine Learning engineers and data
scientists. Whenever a machine learning model is trained with a huge amount of data, it starts
capturing noise and inaccurate data into the training data set. It negatively affects the performance of
the model. Let's understand with a simple example where we have a few training data sets such as
1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. Then there is a considerable
probability of identification of an apple as papaya because we have a massive amount of biased data
in the training data set; hence prediction got negatively affected. The main reason behind overfitting
is using non-linear methods used in machine learning algorithms as they build non-realistic data
models. We can overcome overfitting by using linear and parametric algorithms in the machine
learning models.
Methods to reduce overfitting:
o Increase training data in a dataset.
o Reduce model complexity by simplifying the model by selecting one with fewer parameters
o Ridge Regularization and Lasso Regularization
o Early stopping during the training phase
o Reduce the noise
o Reduce the number of attributes in training data.
o Constraining the model.
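To illustrate the regularization and model-constraining items above, here is a minimal scikit-learn sketch comparing an unregularized polynomial fit with a ridge-regularized one (the synthetic data and alpha value are assumptions):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))[:, None]
y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(0, 0.2, 30)

# A degree-12 polynomial with no regularization is free to chase the noise.
overfit = make_pipeline(PolynomialFeatures(12), LinearRegression()).fit(x, y)

# Ridge regularization penalizes large weights, constraining the model.
ridge = make_pipeline(PolynomialFeatures(12), Ridge(alpha=1e-3)).fit(x, y)

print("unregularized train R^2:", overfit.score(x, y))
print("ridge-regularized train R^2:", ridge.score(x, y))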
Underfitting occurs when our model is too simple to capture the underlying structure of the data, just like an undersized pant. This generally happens when we have limited data in the data set and try to build a linear model with non-linear data. In such scenarios the model lacks the necessary complexity: its rules become too simple for the data set, and it starts making wrong predictions as well.
Methods to reduce underfitting:
o Increase model complexity
o Remove noise from the data
o Train on more, and better, features
o Reduce the constraints
o Increase the number of epochs to get better results

5. Getting bad recommendations


6. Testing and validating:
Split the data into two sets: the training set and the test set. As these names imply, you train your model using the training set, and you test it using the test set. The error rate on new cases is called the generalization error (or out-of-sample error), and by evaluating your model on the test set you get an estimate of this error. This value tells you how well your model will perform on instances it has never seen before.
If the training error is low (i.e., your model makes few mistakes on the training set) but the generalization error is high, it means that your model is overfitting the training data.
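A minimal sketch of this train/test procedure with scikit-learn (the dataset and model are illustrative assumptions); the gap between the two accuracies estimates the degree of overfitting:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("training accuracy:", model.score(X_train, y_train))  # often near 1.0
print("test accuracy:", model.score(X_test, y_test))        # estimates generalization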
7. Data bias (errors)
8. Slow implementations and results
9. Irrelevant features: performance suffers if the training data contains too many irrelevant features. This issue can be addressed with feature engineering (i.e., feature selection and feature extraction).

Statistical Learning:
While structuring and visualizing data are important aspects of data science, the main challenge lies in the mathematical analysis of the data. When the goal is to interpret the model and quantify the uncertainty in the data, this analysis is usually referred to as statistical learning.
There are two major goals for modeling data:
1) to accurately predict some future quantity of interest, given some observed data, and
2) to discover unusual or interesting patterns in the data
To achieve these goals, one must rely on knowledge from three important pillars of the mathematical
sciences.
Function approximation: Building a mathematical model for data usually means understanding how one
data variable depends on another data variable. The most natural way to represent the relationship between
variables is via a mathematical function or map. Thus, data scientists have to understand how best to
approximate and represent functions using the least amount of computer processing and memory.
Optimization: Given a class of mathematical models, we wish to find the best possible model in that class.
This requires some kind of efficient search or optimization procedure. The optimization step can be viewed
as a process of fitting or calibrating a function to observed data. This step usually requires knowledge of
optimization algorithms and efficient computer coding or programming.
Probability and Statistics: In general, the data used to fit the model is viewed as a realization of a random
process or numerical vector, whose probability law determines the accuracy with which we can predict
future observations. Thus, in order to quantify the uncertainty inherent in making predictions about the
future, and the sources of error in the model, data scientists need a firm grasp of probability theory and
statistical inference.

Supervised and Unsupervised Learning:
Feature and response: given an input or feature vector x, one of the main goals of machine learning is to predict an output or response variable y.

For example:
• x could be a digitized signature and y a binary variable that indicates whether the signature is genuine or forged.
• x could represent the weight and smoking habits of an expecting mother and y the birth weight of the baby.

Prediction function: a function g that takes an input x and produces a guess g(x) for y (denoted by ŷ, for example).
Regression: the response variable y can take any real value. When y can only lie in a finite set, say y ∈ {0, . . . , c − 1}, then predicting y is conceptually the same as classifying the input x into one of c categories, and prediction becomes a classification problem.
Loss function: we measure the accuracy of a prediction ŷ with respect to a given response y by some loss function Loss(y, ŷ). In a regression setting, the usual choice is the squared-error loss (y − ŷ)².
In the case of classification, the zero–one (also written 0–1) loss function Loss(y, ŷ) = 1{y ≠ ŷ} is often used, which incurs a loss of 1 whenever the predicted class ŷ is not equal to the class y.
We will encounter various other useful loss functions, such as the cross-entropy and hinge loss functions.

Error is often used as a measure of distance between a "true" object y and some approximation ŷ thereof. If y is real-valued, the absolute error |y − ŷ| and the squared error (y − ŷ)² are both well-established error concepts, as are the norm ‖y − ŷ‖ and squared norm ‖y − ŷ‖² for vectors. The squared error (y − ŷ)² is just one example.
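A minimal Python sketch of the two loss functions just defined (an illustration, not part of the original material):

import numpy as np

def squared_error_loss(y, y_hat):
    # The usual regression choice: (y - y_hat)^2.
    return (y - y_hat) ** 2

def zero_one_loss(y, y_hat):
    # Classification: loss 1 whenever the predicted class differs from y.
    return float(y != y_hat)

print(squared_error_loss(3.0, 2.5))  # 0.25
print(zero_one_loss(1, 0))           # 1.0
print(zero_one_loss(1, 1))           # 0.0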

Risk: assume that each pair (x, y) is the outcome of a random pair (X, Y) that has some joint probability density f(x, y). We then assess the predictive performance via the expected loss, usually called the risk, for g:

ℓ(g) = E[Loss(Y, g(X))].

For example, in the classification case with the zero–one loss function, the risk is equal to the probability of incorrect classification: ℓ(g) = P[Y ≠ g(X)]. Here the prediction function g is called a classifier.

Given the distribution of (X, Y) and any loss function, we can in principle find the best possible prediction function, g* := argmin_g ℓ(g), that yields the smallest risk. For the zero–one loss we have g*(x) = argmax_y f(y | x), where f(y | x) = P[Y = y | X = x] is the conditional probability of Y = y given X = x.

Learner: the function g_T is a learner that learns the unknown functional relationship g*: x → y from the training data T. We use the learner g_T to predict the output for a new input X, for which the correct output Y is not known.
Supervised learning: one tries to learn the functional relationship between the feature vector x and the response y in the presence of a teacher who provides n examples. It is common to speak of "explaining" or predicting y on the basis of x, where x is a vector of explanatory variables. An example of supervised learning is email spam detection.

Unsupervised learning: makes no distinction between response and explanatory variables, and the objective is simply to learn the structure of the unknown distribution of the data. In other words, we need to learn f(x). In this case the guess g(x) is an approximation of f(x), and the risk is of the form ℓ(g) = E[Loss(f(X), g(X))].

Training and Test Loss:

Given an arbitrary prediction function g, it is typically not possible to compute its risk ℓ(g), since the joint distribution of (X, Y) is unknown. However, using the training sample T = {(x₁, y₁), . . . , (xₙ, yₙ)}, we can approximate ℓ(g) via the empirical (sample-average) risk

ℓ_T(g) = (1/n) Σᵢ₌₁ⁿ Loss(yᵢ, g(xᵢ)),

which we call the training loss. The training loss is thus an unbiased estimator of the risk (the expected loss) for a fixed prediction function g, based on the training data.

To approximate the optimal prediction function g* (the minimizer of the risk), we first select a suitable collection of approximating functions G and then take our learner to be the function in G that minimizes the training loss; that is,

g_T^G = argmin_{g ∈ G} ℓ_T(g).

The prediction accuracy on new pairs of data is measured by the generalization risk of the learner. For a fixed training set τ it is defined as

ℓ(g_τ^G) = E[Loss(Y, g_τ^G(X))].

Figure: The generalization risk for a fixed training set is the weighted-average loss over all possible pairs (x, y).

Figure: The expected generalization risk is the weighted-average loss over all possible pairs (x, y) and over
all training sets.

For any outcome τ of the training data, we can estimate the generalization risk without bias by taking the sample average over a test sample T′ = {(x′₁, y′₁), . . . , (x′ₙ′, y′ₙ′)}:

ℓ_T′(g_τ^G) = (1/n′) Σᵢ₌₁ⁿ′ Loss(y′ᵢ, g_τ^G(x′ᵢ)),

which is called the test loss. The test sample is completely separate from T, but is drawn in the same way as T.
Tradeoffs in Statistical Learning:
To understand the relation between model complexity, computational simplicity, and estimation accuracy, it is useful to decompose the generalization risk into several parts, so that the tradeoffs between these parts can be studied.

We will consider two such decompositions: the approximation–estimation tradeoff and the bias–variance
tradeoff.
We can decompose the generalization risk into the following three components:

ℓ(g_τ^G) = ℓ(g*) + [ℓ(g^G) − ℓ(g*)] + [ℓ(g_τ^G) − ℓ(g^G)],

where g^G denotes the best prediction function within the class G. The decomposition can now be interpreted as follows: ℓ(g*) is the irreducible risk, the smallest risk attainable by any prediction function; ℓ(g^G) − ℓ(g*) is the approximation error, the price paid for restricting attention to the class G; and ℓ(g_τ^G) − ℓ(g^G) is the statistical (estimation) error, the price paid for having only a finite training set rather than the full distribution.

Thus, when using a squared-error loss, the generalization risk for a linear class decomposes in the same way into an irreducible error, an approximation error, and a statistical error. Note that in this decomposition the statistical error is the only term that depends on the training set.

The errors in a machine learning model can be broken down into two parts:
1. Reducible error
2. Irreducible error

Irreducible errors are errors that cannot be reduced no matter which machine learning model you use.

Reducible errors, on the other hand, are further broken down into the square of the bias and the variance. It is this bias–variance makeup that causes a machine learning model to either overfit or underfit the given data.
What exactly is Bias?
Bias is the inability of a machine learning model to capture the true relationship between the data variables.
It is caused by the erroneous assumptions that are inherent to the learning algorithm. For example, in linear
regression, the relationship between the X and the Y variable is assumed to be linear, when in reality the
relationship may not be perfectly linear.
In general:
High bias indicates more assumptions in the learning algorithm about the relationships between the variables.
Low bias indicates fewer assumptions in the learning algorithm.

What is the Variance Error?

Variance reflects the model overfitting to a particular dataset. If the model learns to fit very closely to the points of a particular dataset, then when it is used to predict on another dataset it may not predict as accurately as it did on the first.
Variance is the difference in the fits between different datasets.

Generally, nonlinear machine learning algorithms like decision trees have a high variance. It is even higher
if the branches are not pruned during training.

Low-variance ML algorithms: Linear Regression, Logistic Regression, Linear Discriminant Analysis.

High-variance ML algorithms: Decision Trees, k-NN, and Support Vector Machines.


Bias–Variance Tradeoff

Let's summarize:
• If a model uses a simple machine learning algorithm, as in the case of a linear model, it will have high bias and low variance (underfitting the data).
• If a model uses a complex machine learning algorithm, it will have high variance and low bias (overfitting the data).
• You need to find a good balance between the bias and the variance of the model. This tradeoff in complexity is what is referred to as the bias–variance tradeoff. An optimally balanced model neither overfits nor underfits the data, as the sketch following this list demonstrates.
• This tradeoff applies to all forms of supervised learning: classification, regression, and structured output learning.
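Here is a minimal sketch of the tradeoff (the synthetic data and polynomial models are assumptions): degree 1 underfits (high bias), degree 15 tends to overfit (high variance), and a moderate degree balances the two.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (80, 1))
y = np.sin(3 * x[:, 0]) + rng.normal(0, 0.2, 80)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

for degree in (1, 4, 15):  # simple -> balanced -> overly complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(x_tr, y_tr)
    print(f"degree {degree:2d}: "
          f"train MSE = {mean_squared_error(y_tr, model.predict(x_tr)):.3f}, "
          f"test MSE = {mean_squared_error(y_te, model.predict(x_te)):.3f}")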
How to fix bias and variance problems?
Fixing high bias:
• Add more input features to help the model fit the data better.
• Add polynomial features to increase the complexity of the model.
• Decrease the regularization term to rebalance bias and variance.
Fixing high variance:
• Reduce the number of input features; use only the features with the highest feature importance to reduce overfitting.
• Get more training data; a high-variance model will not work well on an independent dataset if you have very little data.

Estimating Risk Statistics:


Different methods of estimating risk measures:
1. In-sample risk  2. Cross-validation

1. In-sample risk: due to the phenomenon of overfitting, the training loss of the learner is not a good estimate of the generalization risk of the learner.

2. Cross-Validation:

The idea is to make multiple identical copies of the data set, and to partition each copy into different training and test sets, as illustrated in the figure below. Here, there are four copies of the data set (consisting of response and explanatory variables). Each copy is divided into a test set (colored blue) and a training set (colored pink).
For each of these sets, we estimate the model parameters using only training data and then predict the
responses for the test set. The average loss between the predicted and observed responses is then a measure
for the predictive power of the model.

Figure: An illustration of four-fold cross-validation, representing four copies of the same data set. The data
in each copy is partitioned into a training set (pink) and a test set (blue). The darker columns represent the
response variable and the lighter ones the explanatory variables.
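A minimal four-fold cross-validation sketch with scikit-learn (the dataset and model are illustrative assumptions):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# cv=4 partitions the data into four folds; each fold serves once as the
# test set while the remaining three folds form the training set.
scores = cross_val_score(LinearRegression(), X, y, cv=4,
                         scoring="neg_mean_squared_error")
print("average test loss across the four folds:", -scores.mean())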
Sampling distribution of an estimator:
In statistics, the sampling distribution of an estimator is the probability distribution of the given statistic, computed on the basis of a random sample. It provides a generalized way to carry out statistical inference. An estimator is the mathematical rule used to calculate a sample statistic; an estimate is the result of the estimation.

The sampling distribution of an estimator depends on the sample size, so the effect of changing the sample size has to be determined. An estimate has a single numerical value, and hence such estimates are called point estimates. There are various estimators, such as the sample mean, sample standard deviation, proportion, variance, range, etc.

Sampling distribution of the mean: the probability distribution of the sample mean, over all samples of a given size drawn from the population. For all sample sizes, it is likely to be normal if the population distribution is normal. The population mean is equal to the mean of the sampling distribution of the mean. The sampling distribution of the mean has standard deviation

σ_x̄ = σ / √n,

where σ_x̄ is the standard deviation of the sampling distribution of the mean (the standard error), σ is the population standard deviation, and n is the sample size.
As the size of the sample increases, the spread of the sampling distribution of the mean decreases; the mean of the distribution, however, remains the same and is not affected by the sample size.

The sampling distribution of the standard deviation describes how the sample standard deviation varies from sample to sample; its spread is the standard error of the standard deviation, which for a normal population is approximately

σ_s ≈ σ / √(2n).

Here σ_s is the standard error of the sampling distribution of the standard deviation. This distribution is positively skewed for small n but becomes approximately normal for sample sizes greater than 30.
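A minimal simulation sketch of the σ/√n result above (the population parameters are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
sigma, n, reps = 2.0, 50, 10_000

# Draw many samples of size n and record each sample's mean.
means = np.array([rng.normal(10.0, sigma, n).mean() for _ in range(reps)])

print("observed std of sample means:", means.std())
print("theoretical sigma / sqrt(n): ", sigma / np.sqrt(n))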
Empirical Risk Minimization:
Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a family
of learning algorithms and is used to give theoretical bounds on their performance. The core idea is that we
cannot know exactly how well an algorithm will work in practice (the true "risk") because we don't know the
true distribution of data that the algorithm will work on, but we can instead measure its performance on a
known set of training data (the "empirical" risk).

In general, the risk R(h) cannot be computed because the distribution P(x, y) is unknown to the learning algorithm (this situation is referred to as agnostic learning). However, we can compute an approximation, called the empirical risk, by averaging the loss function over the training set; more formally, by computing the expectation with respect to the empirical measure:

R_emp(h) = (1/n) Σᵢ₌₁ⁿ L(h(xᵢ), yᵢ).

The empirical risk minimization principle states that the learning algorithm should choose a hypothesis ĥ which minimizes the empirical risk:

ĥ = argmin_{h ∈ H} R_emp(h).

Thus the learning algorithm defined by the ERM principle consists in solving the above optimization problem.
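As an illustration, here is a minimal ERM sketch over a small hypothesis class of threshold classifiers (the data, noise level, and class H are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = (x > 6.3).astype(int)            # labels from an unknown true threshold
y ^= rng.random(200) < 0.05          # flip 5% of labels as noise

def empirical_risk(t):
    # Average zero-one loss of the threshold classifier h_t(x) = 1{x > t}.
    return np.mean((x > t).astype(int) != y)

# Hypothesis class H: threshold classifiers h_t for t on a grid.
thresholds = np.linspace(0, 10, 101)
best_t = min(thresholds, key=empirical_risk)
print("ERM threshold:", best_t, "empirical risk:", empirical_risk(best_t))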

