
UNIT 1 - INTRODUCTION TO MACHINE LEARNING

Introduction to Machine Learning – Types of Machine Learning Techniques – Supervised and Unsupervised Learning – The Brain and the Neuron – Design a Learning System – Perspectives and Issues in Machine Learning – Concept Learning Task – Concept Learning as Search – Finding a Maximally Specific Hypothesis – Version Spaces and the Candidate Elimination Algorithm – Linear Discriminants – Perceptron – Linear Separability – Linear Regression.

1. Introduction to Machine Learning


 Machine learning enables a machine to automatically learn from data and improve its performance using past experience.
 From experience, it is able to make predictions.
 It uses a set of algorithms that work on large amounts of data.
 Data is fed to these algorithms to train them, and on the basis of this training, they build a model.

Definition
“Machine learning enables a machine to automatically learn
from data, improve performance from experiences, and predict
things without being explicitly programmed”.

A machine has the ability to learn if it can improve its performance by gaining more data.
Machine Learning vs. Traditional Programming
 In traditional programming, a programmer codes all the rules in consultation with a domain expert in the industry for which the software is being developed. This becomes impractical as the number of rules grows or when the rules are hard to state explicitly.

 Machine learning overcomes this issue. The machine learns how the input and output data are correlated and writes a rule itself. The programmers do not need to write new rules each time there is new data; the algorithms adapt in response to new data and experience to improve efficacy over time.

How does Machine Learning work?

The core objectives of machine learning are learning and inference.

 First, the machine learns from the data.

 One crucial task of the data scientist is to choose carefully which data to provide to the machine.

 The list of attributes used to solve a problem is called a feature vector.

 The machine uses algorithms to simplify the reality and transform this discovery into a model.

 The learning stage is used to describe the data and summarize it into a model.

 New data are transformed into a feature vector, passed through the model, and a prediction is produced.

 There is no need to update the rules or retrain the model for every new case.

 The previously trained model can be used to make inferences on new data, as sketched below.
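A minimal sketch of this learning-and-inference workflow, assuming scikit-learn is installed and using its bundled Iris dataset purely as stand-in training data:

```python
# Learning stage: fit a model on feature vectors; inference stage: predict on new data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                       # feature vectors and labels
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)  # learning stage
predictions = model.predict(X_new)                      # inference on new data
print(predictions[:5])
```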

Features of Machine Learning

 Machine learning uses data to detect various patterns in a given dataset.


 It can learn from past data and improve automatically.
 It is a data-driven technology.
 Machine learning is similar to data mining, as it also deals with huge amounts of data.

Need for Machine Learning

 Rapid increase in the production of data

 Solving complex problems that are difficult for a human

 Decision making in various sectors, including finance

 Finding hidden patterns and extracting useful information from data.


Applications of Machine Learning

 Machine learning is a buzzword in today's technology, and it is growing very rapidly day by day.
 We use machine learning in our daily lives, often without knowing it, in services such as Google Maps, Google Assistant, Alexa, etc.

Trending real-world applications of machine learning include image recognition, speech recognition, traffic prediction, product recommendation, self-driving cars, email spam filtering, online fraud detection, and medical diagnosis.


2. Types of Machine Learning Techniques
 Based on the methods and the way of learning, machine learning is mainly divided into four types:

1. Supervised Machine Learning

2. Unsupervised Machine Learning

3. Semi-Supervised Machine Learning

4. Reinforcement Learning

Dataset

A dataset, in the context of machine learning, is a collection of data points used for training, testing, and evaluating a machine learning model.

Types:
1. Labelled dataset – Supervised Learning
2. Unlabelled dataset – Unsupervised Learning
3. No fixed dataset (the agent learns from interaction) – Reinforcement Learning
3. Supervised Machine Learning
 Supervised machine learning is based on supervision.
 It means that in the supervised learning technique, we train the machines using a "labelled" dataset.
 Based on this training, the machine predicts the output.
 Here the training data provided to the machine works as the supervisor that teaches the machine to predict the output correctly.
 It applies the same concept as a student learning under the supervision of a teacher.
Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are
given below:
Classification
Regression
a) Classification
Classification algorithms are used to solve classification problems, in which the output variable is categorical, such as "Yes" or "No", Male or Female, Red or Blue, etc.
The classification algorithms predict the categories present in the dataset.

Some real-world examples of classification problems are spam detection, email filtering, etc.; a brief code sketch follows the list of algorithms below.

Some popular classification algorithms are given below:

 Random Forest Algorithm


 Decision Tree Algorithm
 Logistic Regression Algorithm
 Support Vector Machine Algorithm
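A brief, illustrative classification sketch (not from the notes), assuming scikit-learn; the tiny spam/not-spam dataset below is invented purely for demonstration:

```python
# Classification: the output variable is categorical ("spam" or "not spam").
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting at 10 am tomorrow",
          "claim your free reward", "project report attached"]
labels = ["spam", "not spam", "spam", "not spam"]

clf = make_pipeline(CountVectorizer(), LogisticRegression())  # bag-of-words + classifier
clf.fit(emails, labels)
print(clf.predict(["free prize waiting for you"]))            # should classify as spam
```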

b) Regression
Regression algorithms are used to solve regression problems, in which the output variable is continuous and there is a relationship between the input and output variables. They are used to predict continuous output values, such as market trends, weather forecasts, etc.

Some popular Regression algorithms are given below:

 Simple Linear Regression Algorithm


 Multivariate Regression Algorithm
 Decision Tree Algorithm
 Lasso Regression
Example – Real Time Application - Supervised learning
Applications of Supervised Learning
Some common applications of Supervised Learning are given below:
 Image segmentation- Supervised Learning algorithms are used in image
segmentation. In this process, image classification is performed on different
image data with pre-defined labels.
 Medical Diagnosis - Supervised algorithms are also used in the medical field
for diagnosis purposes. The machine can identify a disease for the new patients.
 Fraud Detection - Supervised Learning classification algorithms are used for
identifying fraud transactions, fraud customers, etc. It is done by using historic
data to identify the patterns that can lead to possible fraud.
 Spam detection - In spam detection & filtering, classification algorithms are
used. These algorithms classify an email as spam or not spam. The spam emails
are sent to the spam folder.
 Speech Recognition - Supervised learning algorithms are also used in speech
recognition. The algorithm is trained with voice data, and various
identifications can be done using the same, such as voice-activated passwords,
voice commands, etc.

Advantages and Disadvantages of Supervised Learning


Advantages:
 Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
 These algorithms are helpful in predicting the output on the basis of prior
experience.

Disadvantages:
 These algorithms are not able to solve complex tasks.
 It may predict the wrong output if the test data is different from the training
data.
 It requires lots of computational time to train the algorithm.
4. Unsupervised Machine Learning

 In unsupervised machine learning, the machine is trained using an unlabelled dataset and predicts the output without any supervision.

 The main aim of an unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences.
 Machines are instructed to find the hidden patterns from the input dataset.
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types, which are given
below:
Clustering
Association

1) Clustering
Clustering groups the objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with objects of other groups. A brief clustering sketch follows the list of algorithms below.

Some of the popular clustering algorithms are given below:

 K-Means Clustering algorithm


 Mean-shift algorithm
 DBSCAN Algorithm
 Principal Component Analysis
 Independent Component Analysis
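A compact clustering sketch, assuming scikit-learn; the 2-D points are synthetic data generated only for illustration:

```python
# K-Means groups unlabeled points into k clusters by similarity (distance).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)   # unlabeled data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(kmeans.labels_[:10])       # cluster assignment of the first 10 points
print(kmeans.cluster_centers_)   # coordinates of the three cluster centres
```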

2) Association
Association rule learning is an unsupervised learning technique, which finds
interesting relations among variables within a large dataset.
The main aim of this learning algorithm is to find the dependency of one data item on
another data item and map those variables accordingly so that it can generate
maximum profit. This algorithm is mainly applied in Market Basket analysis, Web
usage mining, continuous production, etc.

Some popular algorithms of Association rule learning are


 Apriori Algorithm
 Eclat
 FP-growth algorithm
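To make the rule-mining idea concrete, here is a hand-rolled sketch of support and confidence on a made-up market-basket dataset; libraries implementing Apriori or FP-growth automate this search over all itemsets:

```python
# Association rule basics: support and confidence of the rule {bread} -> {butter}.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    # fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"bread"}, {"butter"}
rule_support = support(antecedent | consequent)     # 3/5 = 0.6
confidence = rule_support / support(antecedent)     # 0.6 / 0.8 = 0.75
print("support =", rule_support, "confidence =", confidence)
```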
Example – Real Time Application Unsupervised learning
Applications of Unsupervised Learning

 Network Analysis: Unsupervised learning is used for identifying plagiarism and copyright issues through document network analysis of text data for scholarly articles.

 Recommendation Systems: Recommendation systems widely use


unsupervised learning techniques for building recommendation applications for
different web applications and e-commerce websites.

 Anomaly Detection: Anomaly detection is a popular application of


unsupervised learning, which can identify unusual data points within the
dataset. It is used to discover fraudulent transactions.

 Singular Value Decomposition: Singular Value Decomposition or SVD is


used to extract particular information from the database. For example,
extracting information of each user located at a particular location.

Advantages and Disadvantages of Unsupervised Learning Algorithm


Advantages:
 These algorithms can be used for more complicated tasks than supervised algorithms, because they work on unlabelled datasets.

Disadvantages:
 The output of an unsupervised algorithm can be less accurate as the dataset is
not labeled.
 Working with Unsupervised learning is more difficult as it works with the
unlabelled dataset that does not map with the output.
Semi-Supervised Learning

 Semi-Supervised learning is a type of Machine Learning algorithm that lies


between Supervised and Unsupervised machine learning.
 It represents the intermediate ground between supervised learning (with labelled training data) and unsupervised learning (with no labelled training data) and uses a combination of labelled and unlabelled datasets during the training period, as illustrated in the sketch below.
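A small semi-supervised sketch, assuming scikit-learn: most Iris labels are hidden (marked -1, scikit-learn's unlabeled marker) and a label-spreading estimator infers them from the few labelled points:

```python
# Semi-supervised learning: a few labelled points plus many unlabelled ones.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.9] = -1        # hide about 90% of the labels

model = LabelSpreading().fit(X, y_partial)
print("agreement with true labels:", (model.transduction_ == y).mean())
```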

Assumptions of Semi-Supervised Learning Algorithms

 Continuity Assumption: The algorithm assumes that the points which are closer
to each other are more likely to have the same output label.
 Cluster Assumption: The data can be divided into discrete clusters and points in
the same cluster are more likely to share an output label.
 Manifold Assumption: The data lie approximately on a manifold of much lower
dimension than the input space. This assumption allows the use of distances and
densities which are defined on a manifold.
Applications of Semi-Supervised Learning

1. Speech Analysis: Since labeling of audio files is a very intensive task, Semi-
Supervised learning is a very natural approach to solve this problem.
2. Internet Content Classification: Labeling each webpage is an impractical and
unfeasible process and thus uses Semi-Supervised learning algorithms. Even the
Google search algorithm uses a variant of Semi-Supervised learning to rank the
relevance of a webpage for a given query.
3. Protein Sequence Classification: Since DNA strands are typically very large in size, Semi-Supervised learning has become prominent in this field.
Reinforcement learning

 Reinforcement learning works on a feedback-based process.


 An AI agent (a software component) automatically explores its surroundings by hit and trial (trial and error), taking actions, learning from experience, and improving its performance.
 The agent gets rewarded for each good action and gets punished for each bad action.
 The goal of the reinforcement learning agent is to maximize the rewards.
 In reinforcement learning, there is no labelled data like supervised learning, and
agents learn from their experiences only.
 A reinforcement learning problem can be formalized using Markov Decision
Process(MDP). In MDP, the agent constantly interacts with the environment
and performs actions; at each action, the environment responds and generates a
new state.
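A toy Q-learning sketch of this feedback loop, assuming NumPy; the "environment" is an invented five-state corridor in which reaching the rightmost state earns a reward of +1:

```python
# Tabular Q-learning: the agent improves its action values from reward feedback.
import numpy as np

n_states, n_actions = 5, 2                 # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))        # action-value table
alpha, gamma = 0.5, 0.9                    # learning rate and discount factor
rng = np.random.RandomState(0)

for episode in range(300):
    state = 0
    while state < n_states - 1:            # episode ends at the rightmost state
        action = rng.randint(n_actions)    # explore with a random behaviour policy
        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: move Q towards reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q[:-1].argmax(axis=1))   # greedy policy for non-terminal states, expected [1 1 1 1]
```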
Application of Reinforcement Learning

 Video Games: RL algorithms are very popular in gaming applications and are used to attain super-human performance. Some well-known systems that use RL algorithms are AlphaGo and AlphaGo Zero.
 Robotics: RL is widely being used in Robotics applications. Robots are used in
the industrial and manufacturing area, and these robots are made more powerful
with reinforcement learning.
 Text Mining: Text mining, one of the great applications of NLP, is now being implemented with the help of reinforcement learning by the Salesforce company.
4. The Brain and the Neuron

Artificial Neural Network

 The term "Artificial Neural Network" is derived from Biological neural


networks that develop the structure of a human brain.
 Similar to the human brain, which has neurons interconnected to one another, artificial neural networks also have neurons that are interconnected to one another in various layers of the network.
 These neurons are known as nodes.
 An Artificial Neural Network is a model in the field of Artificial Intelligence that attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner.

Neuron in Biology

 Neurons in deep learning were inspired by neurons in the human brain.


 A biological neuron receives input from other sources, combines the inputs in some way, performs a nonlinear operation on the result, and passes the result on as its output.

Structure of Human Brain

1. Soma or Cell Body-Where the cell nucleus is located.


2. Dendrites-Where the nerve is connected to the cell body.
3. Axon-Which carries the impulses of the neuron.
 Dendrites are tree-like networks of nerve fibres connected to the cell body.
 An axon is a long connection extending from the cell body that carries signals away from the neuron.
 The end of the axon splits into fine strands.
 Each such strand terminates in a small bulb-like organ called a synapse.
 It is through the synapse that the neuron introduces its signals to other nearby neurons.
 The cell body will perform some operations that can be a summation,
multiplication, etc.
 After the operations are performed on the set of input, then they are transferred
to the next neuron via axon, which is the transmitter of the signal for the
neuron.
 The dendrites of one neuron are connected to the axon of another neuron.
 These connections are called synapses, which is a concept that has been
generalized to the field of deep learning.
Neuron in Deep Learning

 An ANN possesses a large number of highly interconnected processing elements called nodes, units, or neurons.
 Each neuron is connected to the others by connection links.
 Each connection link is associated with a weight, which contains information about the input signal.
 This information is used by the neural net to solve a particular problem.
 A neuron receives one or more input signals.
 These input signals can come either from the raw dataset or from neurons in a previous layer of the neural net.
 The neuron performs some calculations on them.
 It then sends an output signal to neurons deeper in the neural net through a synapse.
Structure of Neural Network

 Each synapse has an associated weight, which impacts the preceding


neuron's importance in the overall neural network.
 Once a neuron receives its inputs from the neurons in the preceding layer
of the model, it adds up each signal multiplied by its corresponding weight
and passes them on to an activation function.

 The activation function calculates the output value for the neuron. This
output value is then passed on to the next layer of the neural network
through another synapse.
 This serves as a broad overview of deep learning neurons.
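A tiny numeric sketch of a single artificial neuron, assuming NumPy; the input values, weights, and bias below are arbitrary illustration numbers:

```python
# One neuron: weighted sum of inputs plus bias, passed through an activation function.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # activation function

inputs = np.array([0.5, 0.3, 0.2])            # signals from the preceding layer
weights = np.array([0.4, 0.7, -0.2])          # synaptic weights
bias = 0.1

net_input = np.dot(inputs, weights) + bias    # weighted sum (net input)
output = sigmoid(net_input)                   # value passed on to the next layer
print(net_input, output)
```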
Illustration of an ANN

Human Brain Vs Neural Network


Biological Neuron Artificial Neuron

Cell Neuron
Dendrites Weights or interconnections
Soma Net input
Axon Output

5. Design a Learning System

 Designing a machine learning system involves several key steps to create a


model that can learn patterns from data and make predictions or decisions.
Here's a general framework for designing a machine learning system:
1. Define Problem and Objectives
2. Data Collection and Preparation
3. Feature Engineering
4. Model Selection
5. Model Training
6. Evaluation Metrics
7. Hyperparameter Tuning
8. Validation and Testing
9. Deployment
10. Monitoring and Maintenance
11. Ethical Considerations
12. Documentation
13. User Interface
14. Security
15. Compliance and Regulations
Define Problem and Objectives:
 Clearly define the problem you want to solve and the objectives of
the machine learning system.
 Identify whether it's a classification, regression, clustering, or other
types of problem.
Data Collection and Preparation:
 Gather relevant data for training, validation, and testing.
 Clean and preprocess the data to handle missing values, outliers, and
ensure it is in a format suitable for machine learning algorithms.

The data preparation process can be complicated by issues such as:

 Missing or incomplete records. It is difficult to get every data point for every record in a dataset. Missing data sometimes appears as empty cells or as a placeholder character, such as a question mark.

 Improperly formatted data. Data sometimes needs to be extracted into a


different format or location. A good way to address this is to consult domain
experts or join data from other sources.
 The need for techniques such as feature engineering. Even if all of the relevant
data is available, the data preparation process may require techniques such
as feature engineering to generate additional content that will result in more
accurate, relevant models.
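A brief data-preparation sketch, assuming pandas; the small table of ages, incomes, and cities is made up only to show missing-value handling and categorical encoding:

```python
# Handle missing values and encode a categorical column before training.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25, np.nan, 40, 31],                    # one missing value
    "income": [50000, 60000, np.nan, 45000],
    "city":   ["Delhi", "Mumbai", "Delhi", "Chennai"],
})

df["age"] = df["age"].fillna(df["age"].median())        # impute missing ages
df["income"] = df["income"].fillna(df["income"].mean()) # impute missing incomes
df = pd.get_dummies(df, columns=["city"])               # one-hot encode the city column
print(df)
```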
Feature Engineering:
 Select and transform features that are relevant to the problem.
 This may involve scaling, encoding categorical variables, creating
new features, or selecting a subset of features based on their
importance.
Model Selection:
 Choose an appropriate machine learning algorithm based on the
nature of the problem.
 Consider factors such as the size of the dataset, the type of data, and
the desired outcome.
Model Training:
 Split the dataset into training, validation, and test sets.
 Train the selected model using the training set and validate its
performance using the validation set.
 Adjust hyperparameters as needed.
Evaluation Metrics:
 Define metrics to evaluate the performance of the model.
 Common metrics include accuracy, precision, recall, F1 score, mean
squared error, or area under the ROC curve, depending on the
problem type.
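A short sketch of computing common classification metrics, assuming scikit-learn; the true and predicted labels are invented for illustration:

```python
# Evaluate predictions with accuracy, precision, recall, and F1 score.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
```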
Hyperparameter Tuning:
 Fine-tune the model's hyperparameters to optimize its performance.
 This may involve using techniques like grid search, random search,
or more advanced optimization methods.
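A minimal grid-search sketch, assuming scikit-learn and its bundled Iris data; the parameter grid is only an example, not a recommendation:

```python
# Cross-validated grid search over a small hyperparameter grid for an SVM.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

search = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold cross-validation per setting
search.fit(X, y)
print(search.best_params_, search.best_score_)
```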
Validation and Testing:
 Assess the model's performance on the validation set to ensure it
generalizes well to unseen data.
 Use the test set to get a final evaluation of the model's performance.
Deployment:
 Once satisfied with the model's performance, deploy it to a
production environment.
 This may involve creating APIs, integrating with existing systems,
or deploying in cloud services.
Monitoring and Maintenance:
 Implement monitoring tools to keep track of the model's
performance in the production environment.
 Set up alerts for potential issues, and periodically retrain the model
with new data to maintain accuracy.

Ethical Considerations:
 Consider ethical implications, biases, and fairness in the data and
model. Ensure that the machine learning system does not
inadvertently perpetuate or exacerbate existing biases.
Documentation:
 Document the entire machine learning pipeline, including data
sources, preprocessing steps, model architecture, and deployment
details.
 This documentation is crucial for future reference, collaboration,
and troubleshooting.
User Interface (if applicable):
 If the machine learning system is user-facing, design a user interface
that facilitates interaction and provides meaningful insights.
 Ensure that users understand the system's capabilities and
limitations.
Security:
 Implement security measures to protect the machine learning system
from potential attacks or unauthorized access, especially if it
involves sensitive data.
Compliance and Regulations:
 Ensure that the machine learning system complies with relevant
regulations and standards, especially if it involves sensitive data or is
deployed in regulated industries.

Remember that machine learning system design is an iterative process, and


continuous monitoring and improvement are essential for maintaining effectiveness
over time. Regularly update models with new data, reassess performance, and adapt
to changing requirements or circumstances.

6. Perspectives and Issues in Machine Learning


 Here are some common issues in machine learning that professionals face while building ML skills and creating an application from scratch.
1. Inadequate Training Data

The major issue that comes while using machine learning algorithms is the lack of
quality as well as quantity of data.

Data quality can be affected by some factors as follows:

o Noisy Data- It is responsible for an inaccurate prediction that affects the


decision as well as accuracy in classification tasks.
o Incorrect data- Incorrect or wrongly recorded values may also affect the accuracy of the results.
o Generalizing of output data- Sometimes, it is also found that generalizing
output data becomes complex, which results in comparatively poor future
actions.

2. Poor quality of data

 Data plays a significant role in machine learning, and it must be of good quality
as well. Noisy data, incomplete data, inaccurate data, and unclean data lead to
less accuracy in classification and low-quality results.

3. Non-representative training data

 The training data must cover all cases that have already occurred as well as those that are occurring.

 Further, if we are using non-representative training data in the model, it results


in less accurate predictions.

 If there is too little training data, there will be sampling noise in the model; such data is called a non-representative training set.
 A model trained on it will be biased towards one class or group and will not be accurate in its predictions. To overcome this, the training data should be representative of all cases.

4. Overfitting and Underfitting


Overfitting:

 Overfitting occurs when the model or the algorithm fits the training data too well.

 Whenever a machine learning model is trained on data containing noise and inaccuracies with an overly flexible algorithm, it starts capturing that noise from the training data set, so it performs well on the training data but poorly on unseen data.
Underfitting:

 Underfitting is just the opposite of overfitting.

 Whenever a machine learning model is trained with too little data or is too simple, it fails to capture the underlying pattern, which results in incomplete and inaccurate predictions and destroys the accuracy of the machine learning model.
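An illustrative over/underfitting sketch, assuming scikit-learn and NumPy and a synthetic noisy quadratic dataset: a degree-1 model typically underfits (low scores on both splits) while a degree-15 model typically overfits (high training score, much lower test score):

```python
# Compare train vs. test R^2 for polynomial models of increasing flexibility.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=60)       # noisy quadratic target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree, round(model.score(X_tr, y_tr), 2), round(model.score(X_te, y_te), 2))
```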

5. Monitoring and maintenance

 Once a model is deployed, its behaviour can change as the data changes, so regular monitoring and maintenance become compulsory.

 Different results for different actions require changes to the data, and hence editing of code as well as resources for monitoring them also becomes necessary.

6. Getting bad recommendations

 A machine learning model operates under a specific context; when that context shifts, the model can produce bad recommendations because of concept drift.

 It generally occurs when new data is introduced or interpretation of data


changes.

 However, we can overcome this by regularly updating and monitoring data


according to the expectations.
7. Lack of skilled resources

 Developing and managing machine learning systems requires manpower with in-depth knowledge of mathematics, science, and technology, and such skilled resources are still in short supply.

8. Customer Segmentation

 Customer segmentation is also an important issue while developing a machine


learning algorithm.

 The challenge is to identify which customers actually act on the recommendations shown by the model and which do not even check them.

 Hence, an algorithm is necessary to recognize customer behaviour and trigger a relevant recommendation for the user based on past experience.

9. Process Complexity of Machine Learning

 The machine learning process is very complex, which is also another major
issue faced by machine learning engineers and data scientists.

 It involves a large number of hit-and-trial experiments; hence the probability of error is higher than expected.

10. Data Bias

 These errors exist when certain elements of the dataset are heavily weighted or given more importance than others.

 Biased data leads to inaccurate results, skewed outcomes, and other analytical
errors.

11. Lack of Explainability

 Many machine learning models behave as black boxes, so a lack of explainability is also found in machine learning algorithms, which reduces the credibility of the algorithms.
12. Slow implementations and results

 Machine learning models are highly efficient in producing accurate results, but training them is often time-consuming.

 Slow programming, excessive requirements, and overloaded data take more time than expected to provide accurate results.

13. Irrelevant features

 A machine learning model is said to be good if its training data has a good set of features with few or no irrelevant ones.
 Hence, we should use only relevant features in our training sample.

7. Concept Learning Task

 Inducing general functions from specific training examples is a main issue of


machine learning.
 Concept Learning: Acquiring the definition of a general category from given
sample positive and negative training examples of the category.
Definition for Concept Learning:
 Inferring a boolean-valued function from training examples of its input and
output.
 Example: learning the boolean-valued concept EnjoySport (whether a person enjoys a sport on a given day) from attributes describing the day, with each training day labelled Yes or No.
Hypothesis Space

 Concept learning can be seen as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.
 The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be efficiently organized by taking advantage of this naturally occurring structure over the hypothesis space.
Inductive Learning Hypothesis

 Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
8. Concept Learning as Search

 Concept learning searches the hypothesis space H implicitly defined by the hypothesis representation, looking for hypotheses consistent with the training examples.
 The search is organized using the more-general-than-or-equal-to relation: a hypothesis h1 is more general than or equal to h2 if every instance that satisfies h2 also satisfies h1.
 This partial ordering over H is what algorithms such as FIND-S and Candidate Elimination exploit to search the space efficiently without enumerating every hypothesis.
9. Finding a Maximally Specific Hypothesis

 FIND-S Algorithm starts from the most specific hypothesis and generalize it by
considering only positive examples.

 FIND-S algorithm ignores negative examples.


 The FIND-S algorithm finds the most specific hypothesis within H that is consistent with the positive training examples.
Algorithm (FIND-S)
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x: for each attribute constraint a_i in h, if the constraint is satisfied by x, do nothing; otherwise replace a_i in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.
Example: see the sketch below.
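A minimal FIND-S sketch, assuming conjunctive hypotheses over discrete attributes ('0' for the most specific constraint, '?' for "any value"); the classic EnjoySport training data is used purely as example input:

```python
# FIND-S: start from the most specific hypothesis and generalize on positives only.
def find_s(examples):
    n = len(examples[0][0])
    h = ['0'] * n                          # most specific hypothesis
    for x, label in examples:
        if label != 'Yes':                 # FIND-S ignores negative examples
            continue
        for i in range(n):
            if h[i] == '0':
                h[i] = x[i]                # adopt the attribute value
            elif h[i] != x[i]:
                h[i] = '?'                 # generalize to "any value"
    return h

training_data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
print(find_s(training_data))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```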
10. Version Spaces and the Candidate Elimination Algorithm
 A version space is the subset of hypotheses in H that are consistent with all of the observed training examples.
 The Candidate Elimination algorithm represents the version space compactly by maintaining two boundary sets: S, the set of maximally specific consistent hypotheses, and G, the set of maximally general consistent hypotheses.
 Positive examples generalize S and prune G; negative examples specialize G and prune S.
Example: see the sketch below.
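A simplified Candidate Elimination sketch (assuming conjunctive hypotheses over discrete attributes, with S kept as a single maximally specific hypothesis) on the same toy EnjoySport data:

```python
# Candidate Elimination: maintain specific boundary S and general boundary G.
def covers(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = ['0'] * n                       # maximally specific boundary
    G = [['?'] * n]                     # maximally general boundary
    for x, label in examples:
        if label == 'Yes':
            for i in range(n):          # generalize S just enough to cover x
                if S[i] == '0':
                    S[i] = x[i]
                elif S[i] != x[i]:
                    S[i] = '?'
            G = [g for g in G if covers(g, x)]   # drop general hypotheses rejecting x
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)              # already rejects the negative example
                    continue
                for i in range(n):               # minimally specialize g to exclude x
                    if g[i] == '?' and S[i] not in ('?', '0') and S[i] != x[i]:
                        spec = list(g)
                        spec[i] = S[i]
                        new_G.append(spec)
            G = new_G
    return S, G

training_data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
S, G = candidate_elimination(training_data)
print("S:", S)   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
print("G:", G)   # [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
```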
11. Perceptron
 Perceptron is a building block of an Artificial Neural Network.
 Perceptron is a machine learning algorithm for the supervised learning of various binary classification tasks.
 Perceptron is also understood as an artificial neuron, or neural network unit, that helps to detect certain input data computations in business intelligence.
 It is a supervised learning algorithm for binary classifiers.
 It has four main parameters: input values, weights and bias, net sum, and an activation function.
Basic Components of Perceptron
Frank Rosenblatt invented the perceptron model as a binary classifier. It contains three main components, as follows:

o Input Nodes or Input Layer:

This is the primary component of Perceptron which accepts the initial data into the
system for further processing. Each input node contains a real numerical value.
o Weight and Bias:

Weight parameter represents the strength of the connection between units. Weight is
directly proportional to the strength of the associated input neuron in deciding the
output. Further, Bias can be considered as the line of intercept in a linear equation.

o Activation Function:

These are the final and important components that help to determine whether the
neuron will fire or not.

Types of Activation functions:

o Sign function
o Step function
o Sigmoid function

The data scientist uses the activation function to take a subjective decision based on
various problem statements and forms the desired outputs.

The activation function used (e.g., Sign, Step, or Sigmoid) may differ between perceptron models, and is chosen in part by checking whether the learning process is slow or suffers from vanishing or exploding gradients.

How does Perceptron work?


In Machine Learning, Perceptron is considered as a single-layer neural network that
consists of four main parameters named input values (Input nodes), weights and Bias,
net sum, and an activation function.

The perceptron model begins with the multiplication of all input values and their
weights, then adds these values together to create the weighted sum.

Then this weighted sum is applied to the activation function 'f' to obtain the desired
output.

This activation function is also known as the step function and is represented by 'f'. A minimal working sketch is given below.
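A minimal perceptron sketch with a step activation, assuming NumPy; it learns the logical AND function as a toy linearly separable problem:

```python
# Perceptron: weighted sum + bias -> step activation; weights updated from errors.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # input values
y = np.array([0, 0, 0, 1])                        # AND labels

w = np.zeros(2)        # weights
b = 0.0                # bias
lr = 0.1               # learning rate

for epoch in range(10):
    for xi, target in zip(X, y):
        net_sum = np.dot(w, xi) + b               # weighted sum plus bias
        output = 1 if net_sum >= 0 else 0         # step activation function
        error = target - output
        w += lr * error * xi                      # perceptron learning rule
        b += lr * error

print([1 if np.dot(w, xi) + b >= 0 else 0 for xi in X])   # expected: [0, 0, 0, 1]
```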

14.Linear Regression
o Regression is a supervised learning technique.
o Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables.
o Linear regression is a method which is used for predictive analysis.
o It is one of the very simple and easy algorithms which works on regression and
shows the relationship between the continuous variables.
o Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence called linear
regression.
o If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one input variable,
then such linear regression is called multiple linear regression.
o The relationship between variables in the linear regression model can be explained with an example: predicting the salary of an employee on the basis of years of experience.

Terminologies Related to the Regression Analysis:

o Dependent Variable: The main factor in Regression analysis which we want


to predict or understand is called the dependent variable. It is also called target
variable.
o Independent Variable: The factors which affect the dependent variables or
which are used to predict the values of the dependent variables are called
independent variable, also called as a predictor.
o Outliers: Outlier is an observation which contains either very low value or very
high value in comparison to other observed values. An outlier may hamper the
result, so it should be avoided.
o Multicollinearity: If the independent variables are highly correlated with each other, then this condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variable.
o Underfitting and Overfitting: If our algorithm works well with the training
dataset but not well with test dataset, then such problem is called Overfitting.
And if our algorithm does not perform well even with training dataset, then
such problem is called underfitting.

Below is the mathematical equation for Simple Linear Regression:

y = a0 + a1*x + ε

where y is the dependent (target) variable, x is the independent (predictor) variable, a0 is the intercept, a1 is the linear regression coefficient (slope), and ε is the random error.
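A small illustrative simple linear regression sketch (salary vs. years of experience), assuming scikit-learn; the numbers are made up for demonstration:

```python
# Fit y = a0 + a1*x and use the learned line to predict a new value.
import numpy as np
from sklearn.linear_model import LinearRegression

years = np.array([[1], [2], [3], [4], [5], [6]])                  # independent variable x
salary = np.array([30000, 35000, 41000, 46000, 52000, 57000])     # dependent variable y

model = LinearRegression().fit(years, salary)
print("a1 (slope):    ", model.coef_[0])
print("a0 (intercept):", model.intercept_)
print("prediction for 7 years:", model.predict([[7]])[0])
```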
