UNIT 3 - Data Science - III BSC CS
The diagram below illustrates the different ML algorithms along with their categories:
DATA SCIENCE III BSC CS
Supervised learning is the type of machine learning in which machines are trained using
well-labelled training data, and on the basis of that data, machines predict the output. Labelled
data means that the input data is already tagged with the correct output.
In supervised learning, the training data provided to the machine works as the supervisor that
teaches the machine to predict the output correctly. It applies the same concept as a student
learning under the supervision of a teacher.
Supervised learning is the process of providing input data as well as the correct output data to
the machine learning model. The aim of a supervised learning algorithm is to find a mapping
function that maps the input variable (x) to the output variable (y).
In the real world, supervised learning can be used for risk assessment, image classification,
fraud detection, spam filtering, etc.
How supervised learning works can be understood from the example and diagram below:
Suppose we have a dataset of different types of shapes, which includes squares, rectangles,
triangles, and polygons. The first step is to train the model on each shape.
If the given shape has four sides, and all the sides are equal, then it will be labelled as a square.
If the given shape has three sides, then it will be labelled as a triangle.
If the given shape has six equal sides then it will be labelled as hexagon.
Now, after training, we test our model using the test set, and the task of the model is to identify
the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it classifies
the shape on the basis of its number of sides and predicts the output.
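The shape example can be sketched in Python as a tiny classifier that memorises the label seen for each combination of features. This is a minimal sketch, not a standard algorithm: the feature encoding (number of sides, whether all sides are equal), the rectangle example, and the names train and predict are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical labelled training data: (number_of_sides, all_sides_equal) -> label.
# The rectangle example is an assumption; the text gives no explicit rule for it.
TRAINING_DATA = [
    ((4, True), "square"),
    ((4, False), "rectangle"),
    ((3, True), "triangle"),
    ((3, False), "triangle"),
    ((6, True), "hexagon"),
]

def train(examples):
    """Learn the most frequent label for each feature combination seen in training."""
    votes = defaultdict(Counter)
    for features, label in examples:
        votes[features][label] += 1
    return {features: counts.most_common(1)[0][0] for features, counts in votes.items()}

def predict(model, features):
    """Return the learned label, or 'unknown' for a combination never seen in training."""
    return model.get(features, "unknown")

model = train(TRAINING_DATA)
print(predict(model, (4, True)))   # a new four-sided shape with equal sides
print(predict(model, (5, True)))   # a pentagon never appeared in the training data
```

Because this model only knows the labelled examples it was given, a shape it was never trained on (such as a pentagon) gets no meaningful prediction.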
Steps involved in supervised learning:
Split the labelled dataset into a training dataset, a test dataset, and a validation dataset.
Determine the input features of the training dataset, which should carry enough information for
the model to predict the output accurately.
Determine the suitable algorithm for the model, such as support vector machine, decision tree,
etc.
Execute the algorithm on the training dataset. Sometimes a validation set, which is a subset of
the training data, is needed to tune the control parameters.
Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs,
the model is accurate.
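The split, train, and evaluate steps can be sketched in plain Python, assuming a hypothetical one-dimensional dataset with two classes. A nearest-centroid classifier stands in for the algorithm here (the text names support vector machines and decision trees; nearest-centroid is swapped in only because it fits in a few lines), and the 60/20/20 split ratio is also an assumption.

```python
import random

def make_dataset(n, seed=0):
    """Hypothetical labelled data: class 0 lies in [0, 4], class 1 lies in [6, 10]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.uniform(0, 4) if label == 0 else rng.uniform(6, 10)
        data.append((x, label))
    return data

def split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle, then split into training, validation, and test sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def fit_centroids(examples):
    """Nearest-centroid 'training': store the mean x value of each class."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def accuracy(centroids, examples):
    """Fraction of examples whose predicted class matches the true label."""
    correct = sum(predict(centroids, x) == y for x, y in examples)
    return correct / len(examples)

train_set, val_set, test_set = split(make_dataset(100))
centroids = fit_centroids(train_set)
print("validation accuracy:", accuracy(centroids, val_set))
print("test accuracy:", accuracy(centroids, test_set))
```

The test set is only used at the end, so its accuracy estimates how the model behaves on data it did not see during training.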
1. Regression
Regression algorithms are used when there is a relationship between the input variable and
the output variable and the output is a continuous value. They are used for predicting continuous
variables, for example in weather forecasting and market-trend analysis. Below are some popular
regression algorithms which come under supervised learning:
➢ Linear Regression
➢ Regression Trees
➢ Non-Linear Regression
➢ Bayesian Linear Regression
➢ Polynomial Regression
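Linear regression, the first algorithm in this list, can be fitted with ordinary least squares in closed form. The sketch below handles a single input variable; the data (hours studied versus exam score) and the function name fit_linear_regression are hypothetical.

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares for one input variable; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied -> exam score (roughly linear)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
slope, intercept = fit_linear_regression(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print("predicted score for 6 hours:", slope * 6 + intercept)
```

For this data the fitted line is score = 4.10 * hours + 47.70, so the model predicts a score of about 72.3 for 6 hours of study.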
2. Classification
Classification algorithms are used when the output variable is categorical, which means the
output belongs to one of a fixed set of classes, such as Yes-No, Male-Female, or True-False.
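A typical application is spam filtering. Below are some popular classification algorithms which
come under supervised learning:
➢ Random Forest
➢ Decision Trees
➢ Logistic Regression
➢ Support Vector Machines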
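Logistic regression from this list can be sketched as batch gradient descent on the log-loss, here with a single feature. The spam-filtering data (count of suspicious words per email), the learning rate, and the number of epochs are hypothetical; real spam filters use many features and a tested library implementation.

```python
import math

def sigmoid(z):
    """Squash a real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient descent on the average log-loss for one input feature."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # gradient of log-loss w.r.t. w*x + b
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_class(w, b, x, threshold=0.5):
    """Return 1 (spam) if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical data: number of suspicious words in an email -> spam (1) or not spam (0)
suspicious_words = [0, 1, 1, 2, 5, 6, 7, 8]
is_spam = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic_regression(suspicious_words, is_spam)
print(predict_class(w, b, 0), predict_class(w, b, 9))
```

The output is a probability passed through a threshold, which is what turns a regression-style model into a two-class classifier.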
Advantages of Supervised Learning
With the help of supervised learning, the model can predict the output on the basis of
prior experience.
In supervised learning, we have an exact idea about the classes of objects.
Supervised learning models help us solve various real-world problems such as fraud
detection and spam filtering.
Disadvantages of Supervised Learning
Supervised learning models are not suitable for handling very complex tasks.
Supervised learning may fail to predict the correct output if the test data differs substantially
from the training data.
In the previous topic, we learned about supervised machine learning, in which models are
trained using labeled data under the supervision of the training data. But in many cases we do
not have labeled data and need to find hidden patterns in the given dataset. To handle such
cases in machine learning, we need unsupervised learning techniques.
Unsupervised learning is a type of machine learning in which models are trained using an
unlabeled dataset and are allowed to act on that data without any supervision.
Example: Suppose the unsupervised learning algorithm is given an input dataset containing
images of different types of cats and dogs. The algorithm has never been trained on labelled
examples from this dataset, which means it has no prior idea about the features of the dataset.
The task of the unsupervised learning algorithm is to identify the image features on its own. It
performs this task by clustering the image dataset into groups according to the similarities
between images.
Below are some main reasons which describe the importance of Unsupervised Learning:
Unsupervised learning is helpful for finding useful insights from the data.
Unsupervised learning is similar to how a human learns to think from their own experiences,
which makes it closer to real AI.
Unsupervised learning works on unlabeled and uncategorized data, which makes it especially
important.
In the real world, we do not always have input data with the corresponding output, so to solve
such cases, we need unsupervised learning.
Here, we take unlabeled input data, which means it is not categorized and the corresponding
outputs are not given. This unlabeled input data is fed to the machine learning model in order to
train it. First, the model interprets the raw data to find hidden patterns in it, and then a suitable
algorithm such as k-means clustering or hierarchical clustering is applied.
Once the suitable algorithm is applied, it divides the data objects into groups according to the
similarities and differences between the objects.
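The k-means clustering mentioned above can be sketched as Lloyd's algorithm: alternately assign each point to its nearest centroid and move each centroid to the mean of its assigned points, until nothing changes. The two-dimensional points, the choice k=2, and the fixed random seed are assumptions for illustration.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])   # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break                                    # converged: assignments are stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical unlabelled data: two well-separated groups of points
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = k_means(points, k=2)
print(labels)
```

No labels are given to the algorithm; the first three points end up sharing one cluster label and the last three share the other, purely because of their similarity (distance) to each other. Which group is called cluster 0 depends on the random starting centroids.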
The unsupervised learning algorithm can be further categorized into two types of problems:
Clustering: Clustering is a method of grouping objects into clusters such that objects with the
most similarities remain in the same group and have few or no similarities with the objects of
other groups. Cluster analysis finds the commonalities between the data objects and categorizes
them according to the presence or absence of those commonalities.
Association: An association rule is an unsupervised learning method used for finding
relationships between variables in a large database. It determines the sets of items that occur
together in the dataset. Association rules make marketing strategies more effective; for
example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A
typical example of association rule mining is Market Basket Analysis.
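Two measures commonly used to judge an association rule such as bread -> butter are support (the fraction of baskets containing all the items) and confidence (how often the rule holds when its left-hand side appears). The sketch below computes both for hypothetical shopping baskets; it is not the full Apriori algorithm, which additionally generates and prunes candidate itemsets efficiently.

```python
# Hypothetical shopping baskets (one set of items per transaction)
transactions = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A and C together) / support(A)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

rules = [({"bread"}, {"butter"}), ({"butter"}, {"bread"}), ({"bread"}, {"jam"})]
for antecedent, consequent in rules:
    s = support(antecedent | consequent, transactions)
    c = confidence(antecedent, consequent, transactions)
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: support={s:.2f}, confidence={c:.2f}")
```

For these baskets, bread -> butter has support 0.60 and confidence 0.75, while butter -> bread has confidence 1.00, because every basket that contains butter also contains bread.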
Below are some popular unsupervised learning algorithms:
➢ K-means clustering
➢ KNN (k-nearest neighbors)
➢ Hierarchical clustering
➢ Anomaly detection
➢ Neural Networks
➢ Principal Component Analysis
➢ Apriori algorithm
Advantages of Unsupervised Learning
Unsupervised learning can be used for more complex tasks than supervised learning because it
does not need labeled input data.
Disadvantages of Unsupervised Learning
Unsupervised learning is intrinsically more difficult than supervised learning because there is no
corresponding output to learn from.
The results of an unsupervised learning algorithm may be less accurate, as the input data is not
labeled and the algorithm does not know the exact output in advance.
Consider an image classifier trained on labelled images of dogs and cats: the machine learns to
distinguish a dog from a cat using these labelled images. When we input new dog or cat images
that it has never seen before, it uses what it has learned to predict whether each image shows a
dog or a cat. This is how supervised learning works, and this particular task is image classification.
There are two main categories of supervised learning that are mentioned below:
• Classification
• Regression
Classification
Classification deals with predicting categorical target variables, which represent discrete
classes or labels. Examples include classifying emails as spam or not spam and predicting
whether a patient has a high risk of heart disease. Classification algorithms learn to map the
input features to one of the predefined classes.
Here are some classification algorithms:
• Logistic Regression
• Support Vector Machine
• Random Forest
• Decision Tree
• K-Nearest Neighbors (KNN)
• Naive Bayes
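K-Nearest Neighbors from this list can be sketched in a few lines: a new point takes the majority label among its k closest labelled points. The cat/dog measurements, the use of Euclidean distance, and k=3 are hypothetical choices; in practice features on different scales are usually normalised first.

```python
import math
from collections import Counter

def knn_predict(training_data, query, k=3):
    """Classify query by majority vote among its k nearest labelled neighbours."""
    neighbours = sorted(training_data, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled data: (height_cm, weight_kg) -> animal
training_data = [
    ((25, 4.0), "cat"), ((23, 3.5), "cat"), ((28, 5.0), "cat"),
    ((55, 25.0), "dog"), ((60, 30.0), "dog"), ((50, 20.0), "dog"),
]
print(knn_predict(training_data, (26, 4.2)))
print(knn_predict(training_data, (58, 27.0)))
```

KNN does no real training step: it keeps the labelled data and does all of its work at prediction time, which is why prediction becomes slower as the dataset grows.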
Regression
Regression, on the other hand, deals with predicting continuous target variables, which
represent numerical values. Examples include predicting the price of a house based on its size,
location, and amenities, and forecasting the sales of a product. Regression algorithms learn to
map the input features to a continuous numerical value.
Here are some regression algorithms:
• Linear Regression
• Polynomial Regression
• Ridge Regression
• Lasso Regression
• Decision Tree
• Random Forest
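Ridge regression differs from linear regression by adding a penalty alpha * w^2 on the slope w, which shrinks the slope towards zero. With one feature and an unpenalised intercept, centering the data gives the closed form used below. The data and alpha values are hypothetical.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Ridge regression with one feature; the intercept is not penalised.

    After centering, the slope is sum(xc * yc) / (sum(xc ** 2) + alpha).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    xc = [x - mean_x for x in xs]
    yc = [y - mean_y for y in ys]
    slope = sum(a * b for a, b in zip(xc, yc)) / (sum(a * a for a in xc) + alpha)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
for alpha in (0.0, 10.0, 30.0):
    print(alpha, fit_ridge_1d(xs, ys, alpha))
```

With alpha = 0 this reduces to ordinary least squares (slope 2.0, intercept 1.0); alpha = 10 and alpha = 30 shrink the slope to 1.0 and 0.5. Lasso regression uses an absolute-value penalty instead, which can shrink coefficients all the way to zero.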
Advantages of Supervised Machine Learning
• Supervised Learning models can have high accuracy as they are trained on labelled data.
• The process of decision-making in supervised learning models is often interpretable.
• Pre-trained models can often be reused, which saves the time and resources needed to
develop new models from scratch.
Disadvantages of Supervised Machine Learning
• It may struggle with unseen or unexpected patterns that are not present in the training data.
• It can be time-consuming and costly because it relies on labeled data, which must be
collected and annotated.
Applications of Unsupervised Learning
• Dimensionality reduction: Reduce the dimensionality of data while preserving its essential
information.
• Recommendation systems: Suggest products, movies, or content to users based on their
historical behavior or preferences.
• Topic modeling: Discover latent topics within a collection of documents.
• Density estimation: Estimate the probability density function of data.
• Image and video compression: Reduce the amount of storage required for multimedia
content.
• Data preprocessing: Help with data preprocessing tasks such as data cleaning, imputation
of missing values, and data scaling.
• Market basket analysis: Discover associations between products.
• Genomic data analysis: Identify patterns or group genes with similar expression profiles.
• Image segmentation: Segment images into meaningful regions.
• Community detection in social networks: Identify communities or groups of individuals
with similar interests or connections.
• Customer behavior analysis: Uncover patterns and insights for better marketing and
product recommendations.
• Content recommendation: Classify and tag content to make it easier to recommend
similar items to users.
• Exploratory data analysis (EDA): Explore data and gain insights before defining specific
tasks.
3. Semi-Supervised Learning
Positive reinforcement
• Rewards the agent for taking a desired action.
• Encourages the agent to repeat the behavior.
• Examples: Giving a treat to a dog for sitting, providing a point in a game for a correct
answer.
Negative reinforcement
• Removes an undesirable stimulus to encourage a desired behavior.
• Encourages the agent to repeat the behavior that removes the undesirable stimulus.
• Examples: Turning off a loud buzzer when a lever is pressed, avoiding a penalty by
completing a task.
Machine learning modeling is the process of creating and training an algorithm (or
model) to make predictions or decisions based on patterns and relationships within a data set.
The process of creating an ML model varies depending on the application, but it typically
includes the following steps:
1. Data collection: The first step is to gather relevant data to train and evaluate the model.
Strategic data selection is vital to the success of the model.
2. Data preprocessing: The data must be cleaned and prepared, including removing duplicates,
resolving missing values, normalizing or scaling features, and splitting the data into training and
testing sets.
3. Feature engineering: Feature engineering uses domain knowledge to transform data into
features (variables) that ML algorithms can understand, in order to improve the model's
performance.
4. Model selection: There are many different types of ML models. Choosing a model depends on
the type of problem to be solved (classification, regression, clustering, etc.), the available data,
and various other factors.
5. Model training: The model is trained on the collected and prepared data by feeding it input
features and the corresponding target variables.
6. Model evaluation: It’s important to assess the model's performance using evaluation metrics
appropriate to the problem. This step checks whether the model generalizes well to new data.
Machine learning modeling is an iterative process, so if the model isn’t performing as expected,
adjustments are needed.
7. Model optimization: Optimization involves improving the model's performance by tuning its
hyperparameters, the configuration settings chosen before training.
8. Deployment: Once a model is built, tested, and optimized, it’s ready to be deployed to make
predictions or take action on new data.
9. Operations: Models in production need to be governed and monitored to ensure results and
predictions can be trusted.
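Step 2 (data preprocessing) can be sketched in plain Python. The rows, the use of None for a missing value, mean imputation, and min-max scaling are assumptions chosen for illustration; other imputation and scaling strategies are equally common.

```python
def preprocess(rows):
    """Hypothetical preprocessing for rows of numeric features, with None marking a missing value.

    1. Remove exact duplicate rows (keeping the first occurrence).
    2. Impute missing values with the column mean (assumes each column has at least one value).
    3. Min-max scale each column to the range [0, 1].
    """
    unique_rows = list(dict.fromkeys(rows))   # order-preserving de-duplication
    cleaned_columns = []
    for col in zip(*unique_rows):
        present = [v for v in col if v is not None]
        mean = sum(present) / len(present)
        filled = [mean if v is None else v for v in col]
        lo, hi = min(filled), max(filled)
        span = (hi - lo) or 1.0               # avoid dividing by zero for a constant column
        cleaned_columns.append([(v - lo) / span for v in filled])
    return [tuple(row) for row in zip(*cleaned_columns)]

raw = [(10, 200), (20, None), (10, 200), (30, 400)]
print(preprocess(raw))
```

Here the duplicate row is dropped, the missing value becomes the column mean (300), and both columns are scaled to [0, 1]. In a real pipeline, split the data first and compute the imputation means and scaling ranges on the training set only, then apply them to the test set, so that test data does not leak into training.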
Different kinds of machine learning modeling techniques are suited to address different types of
problems. Selecting the best type of model is the key to using machine learning effectively and
efficiently.
Supervised learning
In this technique, labeled data sets train the model to produce a set of desired outputs. Since
the input data is already tagged with the correct output, the training data acts as a
supervisor, providing the model with the instruction required to correctly predict the output.
Real-world applications of supervised learning algorithms include spam filtering, image
recognition, and fraud detection.
Unsupervised learning
As the name suggests, in unsupervised learning the model is not supervised by a labeled training
data set. Instead, this ML modeling technique trains the model on unlabeled data without any
specific desired output. These models are designed to discover patterns, structures, or
relationships within the data, such as grouping objects together with common characteristics.
Ecommerce recommendation engines and customer segmentation are two common applications
of unsupervised learning models.
Semi-supervised learning
A hybrid approach combines elements of supervised and unsupervised learning. With semi-
supervised learning, the model is trained on a data set that contains a small amount of labeled
data and a large amount of unlabeled data. The labeled examples provide a level of supervision,
while the unlabeled examples help train the model to discover hidden patterns or improve the
model's performance. Semi-supervised learning plays an important role in web content
classification and is used by internet search engines to label and rank search results.
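Self-training is one simple semi-supervised technique: fit a model on the few labelled examples, use it to pseudo-label the unlabelled examples, then refit on both. The sketch below applies it to a hypothetical one-dimensional dataset with a nearest-centroid model; it is only an illustration of the idea, not a description of how search engines implement semi-supervised learning.

```python
def nearest_centroid_fit(examples):
    """Store the mean x value of each class label."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def self_train(labelled, unlabelled, rounds=3):
    """Self-training: pseudo-label the unlabelled points with the current model, then refit."""
    centroids = nearest_centroid_fit(labelled)
    for _ in range(rounds):
        pseudo_labelled = [(x, predict(centroids, x)) for x in unlabelled]
        centroids = nearest_centroid_fit(labelled + pseudo_labelled)
    return centroids

labelled = [(1.0, "low"), (9.0, "high")]                 # small labelled set
unlabelled = [0.5, 1.5, 2.0, 2.5, 7.5, 8.0, 8.5, 10.0]    # larger unlabelled set
centroids = self_train(labelled, unlabelled)
print(centroids)
print(predict(centroids, 4.0), predict(centroids, 6.0))
```

The two labelled points supply the class names, while the unlabelled points move each centroid towards the centre of its group (to 1.5 and 8.6 here). Self-training can also reinforce its own mistakes if early pseudo-labels are wrong, so the labelled data should be representative.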
Reinforcement learning
In reinforcement learning, the algorithm trains itself through a series of trial-and-error
experiments. This modeling technique does not rely on a fixed, labeled training data set. Instead,
the algorithm learns by interacting with its environment, receiving feedback from the environment
based on its actions. Common examples of reinforcement learning include autonomous driving
systems and the segmentation of medical images such as CT scans.
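A minimal trial-and-error learner is the epsilon-greedy multi-armed bandit, a simplified form of reinforcement learning with no environment state. The agent mostly repeats the action with the best average reward so far and occasionally explores a random action. The reward function, epsilon, and number of steps are hypothetical; positive rewards act as positive reinforcement and negative rewards act as penalties.

```python
import random

def train_bandit(reward_fn, n_actions, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy action-value learning: learn by trial and error from rewards."""
    rng = random.Random(seed)
    values = [0.0] * n_actions   # running estimate of the reward of each action
    counts = [0] * n_actions     # how many times each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                          # explore
        else:
            action = max(range(n_actions), key=lambda a: values[a])   # exploit
        reward = reward_fn(action)                                     # feedback from environment
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # incremental average
    return values

def reward_fn(action):
    """Hypothetical environment: action 2 is rewarded, every other action is penalised."""
    return 1.0 if action == 2 else -1.0

values = train_bandit(reward_fn, n_actions=3)
best_action = max(range(3), key=lambda a: values[a])
print(values, best_action)
```

No labelled examples are provided: the agent discovers that action 2 is best only through the rewards it receives after acting. Full reinforcement learning algorithms such as Q-learning extend this idea to environments with states.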
Deep learning
Deep learning is a type of machine learning that uses neural networks with multiple layers,
loosely inspired by the way the human brain processes information. Deep learning uses these
neural networks to ingest vast amounts of data from multiple data sources and to learn useful
features with little human intervention. Many artificial intelligence (AI) applications and services are driven by
deep learning technology, including voice-enabled television remotes, facial recognition
programs, and virtual assistants.
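The value of stacking layers can be illustrated with a tiny two-layer network that computes XOR, a function that no single linear layer can represent. The weights below are hand-set rather than learned, an assumption made to keep the example short; real deep learning trains very large numbers of weights from data.

```python
def relu(z):
    """Rectified linear activation: keep positive values, replace negatives with 0."""
    return max(0.0, z)

# Hand-set (not trained) weights for a 2-input, 2-hidden-unit, 1-output network
HIDDEN_WEIGHTS = [(1.0, 1.0), (1.0, 1.0)]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [1.0, -2.0]
OUTPUT_BIAS = 0.0

def forward(x1, x2):
    """Forward pass: inputs -> hidden ReLU layer -> linear output."""
    hidden = [relu(w1 * x1 + w2 * x2 + b)
              for (w1, w2), b in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)]
    return sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, forward(x1, x2))
```

The first hidden unit counts how many inputs are on, the second fires only when both are on, and the output subtracts twice the second from the first, giving 1 for exactly one active input and 0 otherwise.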