Unit-4 Ai
Uncertainty in artificial intelligence (AI) refers to the inherent limitations in predictions, decisions, or
classifications due to incomplete, ambiguous, or noisy data, as well as model limitations. AI systems, especially
those employing machine learning, often encounter uncertainty when dealing with real-world data that may
be imperfect or incomplete. Managing uncertainty is crucial to ensure robust, reliable, and accurate
performance in AI applications.
There are several ways to model uncertainty in AI. Bayesian approaches quantify uncertainty by treating
model parameters as random variables, offering credible intervals or probability distributions for
predictions. Fuzzy logic addresses uncertainty by allowing partial truth values between 0 and 1, making it
useful for systems where binary decisions (true/false) are inadequate. Probabilistic graphical models like
Hidden Markov Models or Bayesian Networks handle uncertainty by modelling relationships between
variables and their likelihoods.
Additionally, deep learning models can estimate uncertainty through techniques like Monte Carlo dropout, in which
dropout, normally a regularization method, is kept active at prediction time so that the spread across repeated
predictions serves as an uncertainty estimate. Uncertainty measures play
a critical role in applications like autonomous systems, healthcare, and decision-making processes, where
incorrect or overconfident predictions can have significant consequences.
Uncertainty in artificial intelligence (AI) refers to the lack of complete information or the presence of
variability in data and models. Understanding and modeling uncertainty is crucial for making informed
decisions and improving the robustness of AI systems. There are several types of uncertainty in AI, including:
1. Aleatoric Uncertainty: This type of uncertainty arises from the inherent randomness or variability
in data. It is often referred to as “data uncertainty.” For example, in a classification task, aleatoric
uncertainty may arise from variations in sensor measurements or noisy labels.
2. Epistemic Uncertainty: Epistemic uncertainty is related to the lack of knowledge or information
about a model. It represents uncertainty that can potentially be reduced with more data or better
modeling techniques. It is also known as “model uncertainty” and arises from model limitations, such
as simplifications or assumptions.
3. Parameter Uncertainty: This type of uncertainty is specific to probabilistic models, such as Bayesian
neural networks. It reflects uncertainty about the values of model parameters and is characterized by
probability distributions over those parameters.
4. Uncertainty in Decision-Making: Uncertainty in AI systems can affect the decision-making process.
For instance, in reinforcement learning, agents often need to make decisions in environments with
uncertain outcomes, leading to decision-making uncertainty.
5. Uncertainty in Natural Language Understanding: In natural language processing (NLP),
understanding and generating human language can be inherently uncertain due to language
ambiguity, polysemy (multiple meanings), and context-dependent interpretations.
6. Uncertainty in Probabilistic Inference: Bayesian methods and probabilistic graphical models are
commonly used in AI to model uncertainty. Uncertainty can arise from the process of probabilistic
inference itself, affecting the reliability of model predictions.
7. Uncertainty in Reinforcement Learning: In reinforcement learning, uncertainty may arise from the
stochasticity of the environment or the exploration-exploitation trade-off. Agents must make
decisions under uncertainty about the outcomes of their actions.
8. Uncertainty in Autonomous Systems: Autonomous systems, such as self-driving cars or drones,
must navigate uncertain and dynamic environments. This uncertainty can pertain to the movement
of other objects, sensor measurements, and control actions.
9. Uncertainty in Safety-Critical Systems: In applications where safety is paramount, such as
healthcare or autonomous vehicles, managing uncertainty is critical. Failure to account for uncertainty
can lead to dangerous consequences.
10. Uncertainty in Transfer Learning: When transferring a pre-trained AI model to a new domain or
task, uncertainty can arise due to domain shift or differences in data distributions. Understanding this
uncertainty is vital for adapting the model effectively.
11. Uncertainty in Human-AI Interaction: When AI systems interact with humans, there can be
uncertainty in understanding and responding to human input, as well as uncertainty in predicting
human behavior and preferences.
Addressing and quantifying these various types of uncertainty is an ongoing research area in AI, and
techniques such as probabilistic modeling, Bayesian inference, and Monte Carlo methods are commonly used
to manage and mitigate uncertainty in AI systems.
Techniques for Addressing Uncertainty in AI
We’ve just discussed the different types of uncertainty in AI. Now, let’s switch gears and learn techniques for
addressing uncertainty in AI. It’s like going from understanding the problem to finding solutions for it.
Real-world applications are probabilistic in nature, and a Bayesian network is used to represent the
relationships between multiple events. It can also be used in various tasks including prediction, anomaly
detection, diagnostics, automated insight, reasoning, time series prediction, and decision making under
uncertainty.
A Bayesian network can be used for building models from data and expert opinion, and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision problems under uncertain
knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to a random variable, which can be continuous or discrete.
o Arcs, or directed arrows, represent causal relationships or conditional probabilities between random
variables. These directed links connect pairs of nodes in the graph.
A link means that one node directly influences the other; if there is no directed link between two
nodes, neither directly influences the other.
o In the above diagram, A, B, C, and D are random variables represented by the nodes of
the network graph.
o If we are considering node B, which is connected with node A by a directed arrow, then
node A is called the parent of Node B.
o Node C is independent of node A.
Note: The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic
graph or DAG.
The Bayesian network has mainly two components:
o Causal Component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which
determines the effect of the parents on that node.
Bayesian network is based on Joint probability distribution and conditional probability. So let's first
understand the joint probability distribution:
Joint probability distribution:
If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of values of
x1, x2, x3, ..., xn are known as the joint probability distribution.
Using the chain rule, P[x1, x2, x3, ..., xn] can be written in the following way:
= P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn].
In a Bayesian network, each variable Xi depends only on its parents once their values are known, so for each
variable we can write:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Explanation of Bayesian network:
Let's understand the Bayesian network through an example by creating a directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to
a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia,
who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry
when he hears the alarm, but sometimes he confuses the phone ringing with the alarm and calls then too. On
the other hand, Sophia likes to listen to loud music, so sometimes she fails to hear the alarm. Here we would
like to compute the probability of the burglar alarm sounding.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has
occurred, and David and Sophia have both called Harry.
Solution:
o The Bayesian network for the above problem is given below. The network structure shows that
Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the
alarm going off, while David's and Sophia's calls depend only on the alarm.
o The network encodes the assumptions that the neighbors do not directly perceive the burglary, do
not notice the minor earthquake, and do not confer before calling.
o The conditional distribution for each node is given as a conditional probability table, or CPT.
o Each row in the CPT must sum to 1 because the entries in a row represent an exhaustive set
of cases for the variable.
o In a CPT, a boolean variable with k boolean parents contains 2^k probabilities. Hence, if there are two
parents, the CPT will contain 4 probability values.
List of all events occurring in this network:
o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)
We can write the events of the problem statement as the probability P[D, S, A, B, E] and expand it
using the joint probability distribution:
P[D, S, A, B, E] = P[D | S, A, B, E]. P[S, A, B, E]
= P[D | S, A, B, E]. P[S | A, B, E]. P[A, B, E]
= P[D | A]. P[S | A, B, E]. P[A, B, E]
= P[D | A]. P[S | A]. P[A | B, E]. P[B, E]
= P[D | A]. P[S | A]. P[A | B, E]. P[B | E]. P[E]
= P[D | A]. P[S | A]. P[A | B, E]. P[B]. P[E], since Burglary and Earthquake have no parents in the
network and are therefore independent of each other.
Let's take the observed probabilities for the Burglary and Earthquake components:
P(B = True) = 0.002, which is the probability of a burglary.
P(B = False) = 0.998, which is the probability of no burglary.
P(E = True) = 0.001, which is the probability of a minor earthquake.
P(E = False) = 0.999, which is the probability that no earthquake occurred.
The conditional probabilities for the other nodes are provided in conditional probability tables (CPTs).
Conditional probability table for Alarm A:
The conditional probability of Alarm A depends on Burglary and Earthquake.
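As a rough illustration of how the factorization P[D | A]. P[S | A]. P[A | B, E]. P[B]. P[E] is evaluated, the sketch below computes the joint probability asked for in the problem. The priors for B and E are the ones given above; the CPT entries for A, D, and S are placeholder values assumed purely for illustration and are not taken from these notes. The helper names (prob, joint) are likewise hypothetical.

```python
from itertools import product

# Priors from the text.
P_B = {True: 0.002, False: 0.998}
P_E = {True: 0.001, False: 0.999}

# ASSUMED placeholder CPTs (illustrative only, not from the notes).
# Each entry is the probability that the child variable is True.
P_A_GIVEN_BE = {
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.30,
    (False, False): 0.001,
}
P_D_GIVEN_A = {True: 0.90, False: 0.05}
P_S_GIVEN_A = {True: 0.70, False: 0.01}


def prob(p_true, value):
    """Probability of a boolean value given P(variable = True)."""
    return p_true if value else 1.0 - p_true


def joint(d, s, a, b, e):
    """P[D, S, A, B, E] = P[D|A] P[S|A] P[A|B,E] P[B] P[E]."""
    return (
        prob(P_D_GIVEN_A[a], d)
        * prob(P_S_GIVEN_A[a], s)
        * prob(P_A_GIVEN_BE[(b, e)], a)
        * P_B[b]
        * P_E[e]
    )


# Alarm sounded, no burglary, no earthquake, both neighbors called.
query = joint(d=True, s=True, a=True, b=False, e=False)
print(f"P(D, S, A, not B, not E) = {query:.8f}")

# Sanity check: the joint distribution sums to 1 over all 32 assignments.
total = sum(joint(*values) for values in product([True, False], repeat=5))
print(f"Sum over all assignments = {total:.6f}")
```

With different CPT entries the numeric answer changes, but the structure of the computation (one factor per node, conditioned only on its parents) stays the same.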
Uncertainty is a pervasive aspect of AI systems, as they often deal with incomplete or conflicting information.
Dempster–Shafer Theory, named after its inventors Arthur P. Dempster and Glenn Shafer, offers a
mathematical framework to represent and reason with uncertain information. By utilizing belief functions,
Dempster–Shafer Theory in Artificial Intelligence systems enables them to handle imprecise and conflicting
evidence, making it a powerful tool in decision-making processes.
Introduction
In recent times, the scientific and engineering community has come to realize the significance of incorporating
multiple forms of uncertainty. This expanded perspective on uncertainty has been made feasible by notable
advancements in computational power within the field of artificial intelligence. As computational systems
become more adept at handling intricate analyses, the limitations of relying solely on traditional probability
theory to encompass the entirety of uncertainty have become apparent.
Traditional probability theory falls short in its ability to effectively address consonant, consistent, or arbitrary
evidence without the need for additional assumptions about probability distributions within a given set.
Moreover, it fails to express the extent of conflict that may arise between different sets of evidence. To
overcome these limitations, Dempster-Shafer theory has emerged as a viable framework, blending the concept
of probability with the conventional understanding of sets. Dempster-Shafer theory provides the means to
handle diverse types of evidence, and it incorporates various methods to account for conflicts when combining
multiple sources of information in the context of artificial intelligence.
What Is Dempster – Shafer Theory (DST)?
Dempster-Shafer Theory (DST) is a theory of evidence that has its roots in the work of Dempster and Shafer.
While traditional probability theory is limited to assigning probabilities to mutually exclusive single events,
DST extends this to sets of events in a finite discrete space. This generalization allows DST to handle evidence
associated with multiple possible events, enabling it to represent uncertainty in a more meaningful way. DST
also provides a more flexible and precise approach to handling uncertain information without relying on
additional assumptions about events within an evidential set.
Where sufficient evidence is present to assign probabilities to single events, the Dempster-Shafer model can
collapse to the traditional probabilistic formulation. Additionally, one of the most significant features of DST
is its ability to handle different levels of precision regarding information without requiring further
assumptions. This characteristic enables the direct representation of uncertainty in system responses, where
an imprecise input can be characterized by a set or interval, and the resulting output is also a set or interval.
Incorporating Dempster-Shafer theory in artificial intelligence allows for a more comprehensive
treatment of uncertainty. AI systems can use it to draw on multiple types of evidence and to manage
conflicts between them, which supports decision-making in the face of uncertainty and enhances the
robustness of AI systems.
The Uncertainty in this Model
At its core, DST represents uncertainty using a mathematical object called a belief function. This belief
function assigns degrees of belief to various hypotheses or propositions, allowing for a nuanced representation
of uncertainty. Three crucial points illustrate the nature of uncertainty within this theory:
1. Conflict: In DST, uncertainty arises from conflicting evidence or incomplete information. The theory
captures these conflicts and provides mechanisms to manage and quantify them, enabling AI systems
to reason effectively.
2. Combination Rule: DST employs a combination rule known as Dempster's rule of combination to
merge evidence from different sources. This rule handles conflicts between sources and determines
the overall belief in different hypotheses based on the available evidence.
3. Mass Function: The mass function, denoted as m(K), quantifies the belief assigned to a set of
hypotheses, denoted as K. It provides a measure of uncertainty by allocating belief mass to sets of
hypotheses rather than only to single hypotheses, reflecting the degree of support each set has from
the available evidence.
Example
Consider a scenario in artificial intelligence (AI) where an AI system is tasked with solving a murder mystery
using Dempster–Shafer Theory. The setting is a room with four individuals: A, B, C, and D. Suddenly, the
lights go out, and upon their return, B is discovered dead, having been stabbed in the back with a knife. No
one entered or exited the room, and it is known that B did not commit suicide. The objective is to identify the
murderer.
To address this challenge using Dempster–Shafer Theory, we can explore various possibilities:
1. Possibility 1: The murderer could be either A, C, or D.
2. Possibility 2: The murderer could be a combination of two individuals, such as A and C, C and D, or
A and D.
3. Possibility 3: All three individuals, A, C, and D, might be involved in the crime.
4. Possibility 4: None of the individuals present in the room is the murderer.
To find the murderer using Dempster–Shafer Theory, we can examine the evidence and assign measures of
plausibility to each possibility. We create a set of possible conclusions P with individual
elements {p1, p2, ..., pn}, where at least one element p must be true. These elements must be
mutually exclusive.
By constructing the power set, which contains all possible subsets, we can analyze the evidence. For instance,
if P = {a, b, c}, the power set would
be {∅, {a}, {b}, {c}, {a,b}, {b,c}, {a,c}, {a,b,c}},
comprising 2^3 = 8 elements.
Mass function m(K)
In Dempster–Shafer Theory, the mass function m(K) represents evidence for a hypothesis or subset K. It
denotes that evidence for {K or B} cannot be further divided into more specific beliefs for K and B.
Belief in K
The belief in K, denoted as Bel(K), is calculated by summing the masses of all subsets of K.
For example, if K = {a, d, c}, Bel(K) would be calculated
as m(a) + m(d) + m(c) + m(a,d) + m(a,c) + m(d,c) + m(a,d,c).
Plausibility in K
Plausibility in K, denoted as Pl(K), is determined by summing the masses of all sets that intersect K.
It represents the cumulative evidence that does not contradict K, so Bel(K) is never greater than Pl(K).
For K = {a, d, c}, which covers every suspect, each non-empty subset intersects K, so Pl(K) is computed
as m(a) + m(d) + m(c) + m(a,d) + m(d,c) + m(a,c) + m(a,d,c), the same sum as Bel(K).
By leveraging Dempster–Shafer Theory in AI, we can analyze the evidence, assign masses to subsets of
possible conclusions, and calculate beliefs and plausibilities to infer the most likely murderer in this murder
mystery scenario.
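To make belief, plausibility, and Dempster's rule of combination concrete, here is a minimal sketch for the three suspects a, c, and d. The two mass functions stand for two hypothetical witnesses; their numbers are invented for illustration and do not come from the example above. The function names (belief, plausibility, combine) are also assumptions of this sketch.

```python
from itertools import product

FRAME = frozenset("acd")  # the three suspects


def belief(m, k):
    """Bel(K): total mass of the non-empty subsets of K."""
    return sum(v for s, v in m.items() if s and s <= k)


def plausibility(m, k):
    """Pl(K): total mass of the sets that intersect K."""
    return sum(v for s, v in m.items() if s & k)


def combine(m1, m2):
    """Dempster's rule: multiply masses, intersect sets, renormalize."""
    combined = {}
    conflict = 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        common = s1 & s2
        if common:
            combined[common] = combined.get(common, 0.0) + v1 * v2
        else:
            conflict += v1 * v2  # mass that falls on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


# Hypothetical evidence (illustrative numbers only).
m1 = {frozenset("a"): 0.6, FRAME: 0.4}
m2 = {frozenset("ac"): 0.5, frozenset("d"): 0.3, FRAME: 0.2}

m12 = combine(m1, m2)
for suspect in sorted(FRAME):
    k = frozenset(suspect)
    print(suspect, round(belief(m12, k), 3), round(plausibility(m12, k), 3))
```

The gap between Bel and Pl for a suspect reflects how much of the evidence is still uncommitted; mass left on the whole frame represents ignorance rather than support for any one suspect.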
Characteristics of Dempster Shafer Theory
Dempster Shafer Theory in artificial intelligence (AI) exhibits several notable characteristics:
1. Handling Ignorance: Dempster Shafer Theory encompasses a unique aspect related to ignorance,
where the aggregation of probabilities for all events sums up to 1. This peculiar trait allows the theory
to effectively address situations involving incomplete or missing information.
2. Reduction of Ignorance: In this theory, ignorance is gradually diminished through the accumulation
of additional evidence. By incorporating more and more evidence, Dempster Shafer Theory enables
AI systems to make more informed and precise decisions, thereby reducing uncertainties.
3. Combination Rule: The theory employs a combination rule to effectively merge and integrate various
types of possibilities. This rule allows for the synthesis of different pieces of evidence, enabling AI
systems to arrive at comprehensive and robust conclusions by considering the diverse perspectives
presented.
By leveraging these distinct characteristics, Dempster Shafer Theory proves to be a valuable tool in the field
of artificial intelligence, empowering systems to handle ignorance, reduce uncertainties, and combine multiple
types of evidence for more accurate decision-making.
Advantages and Disadvantages
Dempster–Shafer Theory in artificial intelligence (AI) offers numerous benefits:
1. Firstly, it presents a systematic and well-founded framework for effectively managing uncertain
information and making informed decisions in the face of uncertainty.
2. Secondly, the application of Dempster–Shafer Theory allows for the integration and fusion of diverse
sources of evidence, enhancing the robustness of decision-making processes in AI systems.
3. Moreover, this theory caters to the handling of incomplete or conflicting information, which is a
common occurrence in real-world scenarios encountered in artificial intelligence.
Nevertheless, it is crucial to acknowledge certain limitations associated with the use of
Dempster–Shafer Theory in artificial intelligence:
1. One drawback is that the computational complexity of DST increases significantly when confronted
with a substantial number of events or sources of evidence, resulting in potential performance
challenges.
2. Furthermore, the process of combining evidence using Dempster–Shafer Theory necessitates careful
modeling and calibration to ensure accurate and reliable outcomes.
3. Additionally, the interpretation of belief and plausibility values in DST may possess subjectivity,
introducing the possibility of biases influencing decision-making processes in artificial intelligence.
MACHINE LEARNING
Machine learning (ML) is a subdomain of artificial intelligence (AI) that focuses on developing systems
that learn, or improve their performance, based on the data they ingest. Artificial intelligence is a broad term
that refers to systems or machines that mimic human intelligence. Machine learning and AI are frequently
discussed together, and the terms are occasionally used interchangeably, although they do not signify the same
thing. A crucial distinction is that, while all machine learning is AI, not all AI is machine learning.
What is Machine Learning?
Machine Learning is the field of study that gives computers the capability to learn without being explicitly
programmed. As is evident from the name, it gives the computer the quality that makes it more similar to
humans: the ability to learn. Machine learning is actively being used today, perhaps in many more places
than one would expect.
Supervised Learning
Key Points:
• Supervised learning involves training a machine from labeled data.
• Labeled data consists of examples with the correct answer or classification.
• The machine learns the relationship between inputs (fruit images) and outputs (fruit labels).
• The trained machine can then make predictions on new, unlabeled data.
Example:
Let’s say you have a basket of fruit that you want the machine to identify. The machine would first analyze
the image of a fruit to extract features such as its shape, color, and texture. Then, it would compare these
features to the features of the fruits it has already learned about. If the new image’s features are most
similar to those of an apple, the machine would predict that the fruit is an apple.
For instance, suppose you are given a basket filled with different kinds of fruits. Now the first step is to train
the machine with all the different fruits one by one like this:
• If the shape of the object is rounded and has a depression at the top, is red in color, then it will be
labeled as –Apple.
• If the shape of the object is a long curving cylinder having Green-Yellow color, then it will be labeled
as –Banana.
Now suppose that, after training, the machine is given a new fruit from the basket, say a banana, and
asked to identify it.
Since the machine has already learned from the previous data, it now has to apply that knowledge. It will
first classify the fruit by its shape and color, confirm the fruit name as BANANA, and put it in the
Banana category. Thus the machine learns from training data (the basket containing fruits) and then
applies the knowledge to test data (the new fruit).
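The "compare the new fruit's features with the fruits already learned" step can be sketched as a one-nearest-neighbour classifier. The two numeric features (roundness and redness, each on a 0 to 1 scale) and the training values are hypothetical stand-ins for the shape and color described above; a real system would extract such features from images.

```python
import math

# Hypothetical labeled training data: (roundness, redness) -> label.
TRAINING = [
    ((0.90, 0.80), "Apple"),
    ((0.95, 0.90), "Apple"),
    ((0.20, 0.10), "Banana"),
    ((0.15, 0.05), "Banana"),
]


def predict(features):
    """Return the label of the most similar training example."""
    _, label = min(TRAINING, key=lambda item: math.dist(item[0], features))
    return label


# A new, unlabeled fruit: long (not round) and not red.
print(predict((0.10, 0.20)))
```

Nearest-neighbour matching is only one of many supervised methods; it is used here because it mirrors the "most similar features" wording of the example.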
Types of Supervised Learning
Supervised learning is classified into two categories of algorithms:
• Regression: A regression problem is when the output variable is a real value, such as “dollars” or
“weight”.
• Classification: A classification problem is when the output variable is a category, such as “red” or
“blue”, or “disease” or “no disease”.
Supervised learning deals with or learns with “labeled” data. This implies that some data is already tagged
with the correct answer.
1- Regression
Regression is a type of supervised learning that is used to predict continuous values, such as house prices
or stock prices. Regression algorithms learn a function that maps from the input features to
the output value.
Some common regression algorithms include:
• Linear Regression
• Polynomial Regression
• Support Vector Machine Regression
• Decision Tree Regression
• Random Forest Regression
2- Classification
Classification is a type of supervised learning that is used to predict categorical values, such as whether a
customer will churn or not, whether an email is spam or not, or whether a medical image shows a tumor or
not. Classification algorithms learn a function that maps from the input features to a probability distribution
over the output classes.
Some common classification algorithms include:
• Logistic Regression
• Support Vector Machines
• Decision Trees
• Random Forests
• Naive Bayes
Evaluating Supervised Learning Models
Evaluating supervised learning models is an important step in ensuring that the model is accurate and
generalizable. There are a number of different metrics that can be used to evaluate supervised learning models,
but some of the most common ones include:
For Regression
• Mean Squared Error (MSE): MSE measures the average squared difference between the predicted
values and the actual values. Lower MSE values indicate better model performance.
• Root Mean Squared Error (RMSE): RMSE is the square root of MSE, representing the standard
deviation of the prediction errors. Similar to MSE, lower RMSE values indicate better model
performance.
• Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted
values and the actual values. It is less sensitive to outliers compared to MSE or RMSE.
• R-squared (Coefficient of Determination): R-squared measures the proportion of the variance in
the target variable that is explained by the model. Higher R-squared values indicate better model fit.
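The four regression metrics above can be computed directly from their definitions. The sketch below uses plain Python and a small made-up set of actual and predicted values; the function name regression_metrics is an assumption of this example.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up actual and predicted values.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 10.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because MSE squares each error, the single error of 1.0 contributes four times as much as each error of 0.5, which is why MSE and RMSE react more strongly to outliers than MAE.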
For Classification
• Accuracy: Accuracy is the percentage of predictions that the model makes correctly. It is calculated
by dividing the number of correct predictions by the total number of predictions.
• Precision: Precision is the percentage of positive predictions that the model makes that are actually
correct. It is calculated by dividing the number of true positives by the total number of positive
predictions.
• Recall: Recall is the percentage of all positive examples that the model correctly identifies. It is
calculated by dividing the number of true positives by the total number of positive examples.
• F1 score: The F1 score combines precision and recall into a single number. It is calculated by taking the
harmonic mean of precision and recall: F1 = 2 × precision × recall / (precision + recall).
• Confusion matrix: A confusion matrix is a table that shows the number of predictions for each
class, along with the actual class labels. It can be used to visualize the performance of the model and
identify areas where the model is struggling.
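For a binary problem, all of the classification metrics above follow from the four counts in the confusion matrix. The sketch below derives them in plain Python from made-up labels; the function name and the choice of 1 as the positive class are assumptions of this example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall and F1 from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are actual classes, columns are predicted classes.
        "confusion": [[tn, fp], [fn, tp]],
    }


# Made-up actual and predicted labels.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
result = classification_metrics(y_true, y_pred)
print(result)
```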
Applications of Supervised learning
Supervised learning can be used to solve a wide variety of problems, including:
• Spam filtering: Supervised learning algorithms can be trained to identify and classify spam emails
based on their content, helping users avoid unwanted messages.
• Image classification: Supervised learning can automatically classify images into different categories,
such as animals, objects, or scenes, facilitating tasks like image search, content moderation, and image-
based product recommendations.
• Medical diagnosis: Supervised learning can assist in medical diagnosis by analyzing patient data,
such as medical images, test results, and patient history, to identify patterns that suggest specific
diseases or conditions.
• Fraud detection: Supervised learning models can analyze financial transactions and identify patterns
that indicate fraudulent activity, helping financial institutions prevent fraud and protect their
customers.
• Natural language processing (NLP): Supervised learning plays a crucial role in NLP tasks,
including sentiment analysis, machine translation, and text summarization, enabling machines to
understand and process human language effectively.
Advantages of Supervised learning
• Supervised learning uses previously collected data (experience) to produce outputs for new inputs.
• It helps to optimize performance criteria with the help of experience.
• Supervised machine learning helps to solve various types of real-world computation problems.
• It performs classification and regression tasks.
• It allows estimating or mapping the result to a new sample.
• We have complete control over choosing the number of classes we want in the training data.
Disadvantages of Supervised learning
• Classifying big data can be challenging.
• Training for supervised learning needs a lot of computation time.
• Supervised learning cannot handle all complex tasks in Machine Learning.
• It requires a labelled data set.
• It requires a training process.
Unsupervised Learning
Unsupervised learning is a type of machine learning that learns from unlabeled data. This means that the data
does not have any pre-existing labels or categories. The goal of unsupervised learning is to discover patterns
and relationships in the data without any explicit guidance.
Unsupervised learning is the training of a machine using information that is neither classified nor labeled,
allowing the algorithm to act on that information without guidance. Here the task of the machine is to group
unsorted information according to similarities, patterns, and differences without any prior training on the data.
Unlike supervised learning, no teacher is provided, which means no labeled examples are given to the machine.
Therefore the machine has to find the hidden structure in unlabeled data by itself.
You can use unsupervised learning to examine animal data that has been gathered and distinguish between
several groups according to the traits and actions of the animals. These groupings might correspond to various
animal species, allowing you to categorize the creatures without depending on labels that already exist.
Key Points
• Unsupervised learning allows the model to discover patterns and relationships in unlabeled data.
• Clustering algorithms group similar data points together based on their inherent characteristics.
• Feature extraction captures essential information from the data, enabling the model to make
meaningful distinctions.
• Label association assigns categories to the clusters based on the extracted patterns and characteristics.
Example
Imagine you have a machine learning model that is given a large dataset of unlabeled images containing both
dogs and cats. The model has never been told which images show a dog or a cat, and it has no pre-existing
labels or categories for these animals. Your task is to use unsupervised learning to separate the dogs from
the cats in new, unseen images.
For instance, suppose it is given a collection of pictures of dogs and cats that it has never seen.
The machine has no idea about the features of dogs and cats, so it cannot label the pictures as ‘dogs’ and ‘cats’.
But it can categorize them according to their similarities, patterns, and differences, i.e., it can split
the collection into two parts. The first may contain all the pictures having dogs in them and the second may
contain all the pictures having cats in them. Nothing was learned beforehand, which means there were no
training data or examples.
Unsupervised learning allows the model to work on its own to discover patterns and information that were
previously undetected. It mainly deals with unlabelled data.
Types of Unsupervised Learning
Unsupervised learning is classified into two categories of algorithms:
• Clustering: A clustering problem is where you want to discover the inherent groupings in the data,
such as grouping customers by purchasing behavior.
• Association: An association rule learning problem is where you want to discover rules that describe
large portions of your data, such as people that buy X also tend to buy Y.
Clustering
Clustering is a type of unsupervised learning that is used to group similar data points together. Centroid-based
algorithms such as K-means work by iteratively assigning each data point to its nearest cluster center and then
moving each center to the mean of the points assigned to it.
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Clustering Types:-
1. Hierarchical clustering
2. K-means clustering
3. Principal Component Analysis
4. Singular Value Decomposition
5. Independent Component Analysis
6. Gaussian Mixture Models (GMMs)
7. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
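The assign-then-update loop behind K-means clustering can be sketched in a few lines. This is a minimal illustration, not a production implementation: the function name kmeans, the toy points, and the choice of two hand-picked starting centroids are all assumptions made for the example.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Plain k-means (Lloyd's algorithm) on a list of coordinate tuples."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster in place
        if new_centroids == centroids:  # nothing moved, so the loop has converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The result depends on the starting centroids, which is why practical implementations usually try several initializations.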
Association rule learning
Association rule learning is a type of unsupervised learning that is used to identify patterns in
data. Association rule learning algorithms work by finding relationships between different items in a dataset.
Some common association rule learning algorithms include:
• Apriori Algorithm
• Eclat Algorithm
• FP-Growth Algorithm
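The two quantities these algorithms rely on, support (how often items appear together) and confidence (how often Y appears given X), can be illustrated with a brute-force sketch over item pairs. This is not Apriori, Eclat, or FP-Growth themselves, which add pruning and efficient data structures; the function name pair_rules, the thresholds, and the shopping baskets are assumptions for the example.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (X, Y, support, confidence) for every rule X -> Y over item pairs."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support({x})
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, confidence))
    return rules

# Hypothetical shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "butter", "bread"],
]
rules = pair_rules(transactions)
for x, y, sup, conf in rules:
    print(f"{x} -> {y}: support={sup:.2f}, confidence={conf:.2f}")
```

Here "butter -> bread" has confidence 1.0 because every basket containing butter also contains bread, while "butter -> milk" is dropped because its confidence falls below the threshold.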
Evaluating Unsupervised Learning Models
Evaluating unsupervised learning models is an important step in ensuring that the model is effective and
useful. However, it can be more challenging than evaluating supervised learning models, as there is usually no
ground truth data to compare the model’s output to.
There are a number of different metrics that can be used to evaluate unsupervised learning models, but some
of the most common ones include:
• Silhouette score: The silhouette score measures how well each data point is clustered with its own
cluster members and separated from other clusters. It ranges from -1 to 1, with higher scores
indicating better clustering.
• Calinski-Harabasz score: The Calinski-Harabasz score measures the ratio between the variance
between clusters and the variance within clusters. It ranges from 0 to infinity, with higher scores
indicating better clustering.
• Adjusted Rand index: The adjusted Rand index measures the similarity between two clusterings, for
example a model’s clusters and a reference labelling, corrected for chance. It is 1 for identical
clusterings, close to 0 for random labellings, and can be negative; higher scores indicate more
similar clusterings.
• Davies-Bouldin index: The Davies-Bouldin index measures the average similarity between each
cluster and the cluster most similar to it. It ranges from 0 to infinity, with lower scores indicating
better clustering.
• F1 score: The F1 score is the harmonic mean of precision and recall, which are two metrics that are
commonly used in supervised learning to evaluate classification models. The F1 score can also be
used to evaluate unsupervised models, such as clustering models, but only when reference labels are
available to compare the clusters against.
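As a rough illustration of how the silhouette score behaves, the sketch below computes it directly from its definition: for each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to any other cluster, and the point scores (b - a) / max(a, b). The function name, the toy points, and the two labelings are assumptions for the example; it expects at least two clusters and no exact duplicate points.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points (needs at least two clusters)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [
            math.dist(p, q)
            for j, (q, other) in enumerate(zip(points, labels))
            if other == lab and j != i
        ]
        if not same:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        a = sum(same) / len(same)  # cohesion: mean distance inside own cluster
        b = min(                   # separation: nearest other cluster on average
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels)
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two tight pairs of points that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # clusters match the pairs
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters cut across the pairs
print(good, bad)
```

The labeling that matches the natural groups scores close to 1, while the labeling that splits them scores below 0.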
Application of Unsupervised learning
Unsupervised learning can be used to solve a wide variety of problems, including:
• Anomaly detection: Unsupervised learning can identify unusual patterns or deviations from normal
behavior in data, enabling the detection of fraud, intrusion, or system failures.
• Scientific discovery: Unsupervised learning can uncover hidden relationships and patterns in scientific
data, leading to new hypotheses and insights in various scientific fields.
• Recommendation systems: Unsupervised learning can identify patterns and similarities in user
behavior and preferences to recommend products, movies, or music that align with their interests.
• Customer segmentation: Unsupervised learning can identify groups of customers with similar
characteristics, allowing businesses to target marketing campaigns and improve customer service
more effectively.
• Image analysis: Unsupervised learning can group images based on their content, facilitating tasks
such as image classification, object detection, and image retrieval.
Advantages of Unsupervised learning
• It does not require training data to be labeled.
• Dimensionality reduction can be easily accomplished using unsupervised learning.
• Capable of finding previously unknown patterns in data.
• Unsupervised learning can help you gain insights from unlabeled data that you might not have been
able to get otherwise.
• Unsupervised learning is good at finding patterns and relationships in data without being told what
to look for. This can help you learn new things about your data.
Disadvantages of Unsupervised learning
• Difficult to measure accuracy or effectiveness due to lack of predefined answers during training.
• The results are often less accurate than those of supervised methods.
• The user needs to spend time interpreting the resulting groups and labelling the classes that follow
from that grouping.
• Unsupervised learning can be sensitive to data quality, including missing values, outliers, and noisy
data.
• Without labelled data, it can be difficult to evaluate the performance of unsupervised learning models,
making it challenging to assess their effectiveness.
Reinforcement Learning
Reinforcement Learning (RL) is a branch of machine learning focused on making decisions to maximize
cumulative rewards in a given situation. Unlike supervised learning, which relies on a training dataset with
predefined answers, RL involves learning through experience. In RL, an agent learns to achieve a goal in an
uncertain, potentially complex environment by performing actions and receiving feedback through rewards
or penalties.
Key Concepts of Reinforcement Learning
• Agent: The learner or decision-maker.
• Environment: Everything the agent interacts with.
• State: A specific situation in which the agent finds itself.
• Action: One of the possible moves the agent can make.
• Reward: Feedback from the environment based on the action taken.
How Reinforcement Learning Works
RL operates on the principle of learning optimal behavior through trial and error. The agent takes actions
within the environment, receives rewards or penalties, and adjusts its behavior to maximize the cumulative
reward. This learning process is characterized by the following elements:
• Policy: A strategy used by the agent to determine the next action based on the current state.
• Reward Function: A function that provides a scalar feedback signal based on the state and action.
• Value Function: A function that estimates the expected cumulative reward from a given state.
• Model of the Environment: A representation of the environment that helps in planning by predicting
future states and rewards.
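The "expected cumulative reward" that the value function estimates is usually a discounted sum, in which later rewards count for less than earlier ones. The sketch below computes that sum for a single sequence of rewards. The discount factor gamma is an assumption introduced here for illustration; it does not appear in the text above, and the reward values are made up.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r0 + gamma*r1 + gamma^2*r2 + ... for one sequence of rewards."""
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (return from next step).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical reward sequence observed along one trajectory.
example = discounted_return([1, 0, 2], gamma=0.5)  # 1 + 0.5*0 + 0.25*2
print(example)
```

A value function averages returns like this over the trajectories the agent can expect to follow from a given state.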
Example: Navigating a Maze
The problem is as follows: we have an agent and a reward, with many hurdles in between. The agent is
supposed to find the best possible path to reach the reward. The following example makes the problem easier
to understand.
The figure for this example shows a robot, a diamond, and fire. The goal of the robot is to get the reward,
which is the diamond, and to avoid the hurdles, which are fire. The robot learns by trying the possible paths
and then choosing the path that gives it the reward with the fewest hurdles. Each right step gives the robot a
reward and each wrong step subtracts from the robot’s reward. The total reward is calculated when it
reaches the final reward, the diamond.
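A maze of this kind can be sketched with tabular Q-learning, one common RL algorithm; the text above does not name a specific algorithm, so this choice is an assumption. The 3x3 grid layout, the reward values (+10 for the diamond, -10 for fire, -1 per ordinary step), and the learning parameters are also illustrative assumptions.

```python
import random

ROWS, COLS = 3, 3
START, DIAMOND, FIRE = (0, 0), (2, 2), (1, 1)  # hypothetical layout
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Move one cell; bumping into a wall leaves the robot where it is."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), ROWS - 1)
    c = min(max(state[1] + dc, 0), COLS - 1)
    nxt = (r, c)
    if nxt == DIAMOND:
        return nxt, 10.0, True   # reached the reward: episode ends
    if nxt == FIRE:
        return nxt, -10.0, True  # hit a hurdle: episode ends
    return nxt, -1.0, False      # ordinary step costs a little

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(r, c): {a: 0.0 for a in ACTIONS} for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = START
        for _ in range(50):  # cap the length of one episode
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = max(q[state], key=q[state].get)
            nxt, reward, done = step(state, action)
            # Q-learning update toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[nxt].values())
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from the start until the episode ends."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(q[state], key=q[state].get)
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path

q_table = train()
path = greedy_path(q_table)
print(path)
```

After training, following the greedy policy leads the robot around the fire to the diamond along one of the shortest routes.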
Types of Reinforcement:
1. Positive: Positive reinforcement occurs when an event that follows a particular behavior
increases the strength and the frequency of that behavior. In other words, it has a positive effect on
behavior.
Advantages of positive reinforcement:
• Maximizes performance
• Sustains change for a long period of time
Disadvantage of positive reinforcement:
• Too much reinforcement can lead to an overload of states, which can diminish the results
2. Negative: Negative reinforcement is defined as the strengthening of a behavior because a negative
condition is stopped or avoided.
Advantages of negative reinforcement:
• Increases behavior
• Provides defiance to a minimum standard of performance
Disadvantage of negative reinforcement:
• It only provides enough to meet the minimum behavior
Elements of Reinforcement Learning
i) Policy: Defines the agent’s behavior at a given time.
ii) Reward Function: Defines the goal of the RL problem by providing feedback.
iii) Value Function: Estimates long-term rewards from a state.
iv) Model of the Environment: Helps in predicting future states and rewards for planning.
The vector w represents the normal vector to the hyperplane, i.e. the direction perpendicular to the
hyperplane. The parameter b in the hyperplane equation w^T x + b = 0 represents the offset of the hyperplane
from the origin along the normal vector w.
The distance between a data point x_i and the decision boundary can be calculated as:
d_i = (w^T x_i + b) / ||w||
where ||w|| is the Euclidean norm of the weight vector w.
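As a small worked illustration, the signed distance of a point x from the hyperplane w^T x + b = 0 is (w^T x + b) / ||w||. The sketch below evaluates it in plain Python; the function name and the numeric values of w, b, and the points are assumptions chosen so the arithmetic is easy to check by hand.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))  # Euclidean norm ||w||
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5.
w, b = (3.0, 4.0), -5.0
d_positive = signed_distance(w, b, (3.0, 4.0))  # (9 + 16 - 5) / 5
d_negative = signed_distance(w, b, (0.0, 0.0))  # (0 + 0 - 5) / 5
print(d_positive, d_negative)
```

The sign tells which side of the hyperplane the point lies on, and the absolute value is its geometric distance from the boundary.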
Optimization:
• For Hard margin linear SVM classifier:
•