AI Notes

The document discusses probabilistic reasoning and Bayesian networks. Probabilistic reasoning uses probability theory to represent uncertain knowledge and handle uncertainty. A Bayesian network is a probabilistic graphical model that represents conditional dependencies between random variables using a directed acyclic graph. Bayesian networks allow calculating conditional probabilities and updating beliefs based on new evidence. They are useful for modeling real-world problems involving uncertainty.

SHRI VAISHNAV VIDYAPEETH VISHWAVIDYALAYA

SHRI VAISHNAV INSTITUTE OF INFORMATION TECHNOLOGY


DEPARTMENT OF COMPUTER SCIENCE ENGINEERING

ARTIFICIAL INTELLIGENCE

BTCS511
CS H & I
III Year, V Semester
Probabilistic reasoning
⬡ Probabilistic reasoning is a way of knowledge representation in which
we apply the concept of probability to indicate uncertainty in
knowledge. In probabilistic reasoning, we combine probability
theory with logic to handle uncertainty.
⬡ We use probability in probabilistic reasoning because it provides a
way to handle the uncertainty that results from someone's
laziness or ignorance.

2
⬡ In the real world there are many scenarios where the
certainty of something is not confirmed, such as "It will rain
today," "someone's behavior in some situation," or "a
match between two teams or two players." These are
probable sentences for which we can assume an outcome
but cannot be sure of it, so here we use probabilistic
reasoning.

3
Need for probabilistic reasoning in AI:
⬡ When there are unpredictable outcomes.
⬡ When the specifications or possibilities of predicates become too
large to handle.
⬡ When an unknown error occurs during an experiment.
⬡ In probabilistic reasoning, there are two ways to solve problems
with uncertain knowledge:
⬡ Bayes' rule
⬡ Bayesian statistics
4
Probability:

⬡ Probability can be defined as the chance that an uncertain event
will occur. It is the numerical measure of the likelihood that an
event will occur. The value of a probability always lies
between 0 and 1.
⬡ 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
⬡ P(A) = 0 indicates that event A will not occur.
⬡ P(A) = 1 indicates that event A is certain to occur.
5
⬡ We can find the probability of an uncertain event not happening by
using the formula below:

⬡ P(¬A) = 1 − P(A), where P(¬A) is the probability of the event not happening.

6
Conditional probability:
⬡ Conditional probability is the probability of an event occurring given that another
event has already happened.
⬡ Suppose we want to calculate the probability of event A when event B has already
occurred, "the probability of A under the condition B". It can be written as:

⬡ P(A|B) = P(A⋀B) / P(B)

⬡ Where P(A⋀B) = joint probability of A and B

⬡ P(B) = marginal probability of B.
⬡ If the probability of A is given and we need to find the probability of B, then it
will be given as:

⬡ P(B|A) = P(A⋀B) / P(A)
7
This can be explained using the Venn diagram below: once B has occurred, the
sample space is reduced to set B, and we can calculate the probability of event A given
that event B has occurred by dividing P(A⋀B) by P(B).

8
Example:
⬡ 40% of the children like apples, 30% of the children like oranges, and 20% of the
children like both apples and oranges. Apply conditional probability to find the
probability that a child likes oranges given that the child likes apples.
⬡ Solution:
⬡ P(Orange | Apple) = P(Apple ⋀ Orange) / P(Apple) = 0.20 / 0.40 = 0.5
9
Bayes' theorem
⬡ Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian
reasoning; it determines the probability of an event with uncertain
knowledge.
⬡ In probability theory, it relates the conditional and marginal
probabilities of two random events.
⬡ It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
⬡ Bayes' theorem allows updating the probability prediction of an event by
observing new information about the real world.

10
11
⬡ The Rule
⬡ The equation below is Bayes' rule:

⬡ P(A|B) = P(B|A) · P(A) / P(B)

⬡ The rule has a very simple derivation that follows directly from the relationship
between joint and conditional probabilities. First, note that P(A,B) = P(A|B)P(B) =
P(B,A) = P(B|A)P(A). Next, set the two terms involving conditional
probabilities equal to each other, so P(A|B)P(B) = P(B|A)P(A), and finally divide
both sides by P(B) to arrive at Bayes' rule.
12
⬡ In this formula, A is the event we want the probability of, and B is the new evidence that
is related to A in some way. As a running example, let A be "the person has cancer"
and B be "the person is a smoker".

⬡ P(A|B) is called the posterior; this is what we are trying to estimate. In the above
example, this would be the “probability of having cancer given that the person is a
smoker”.

⬡ P(B|A) is called the likelihood; this is the probability of observing the new evidence,
given our initial hypothesis. In the above example, this would be the “probability of
being a smoker given that the person has cancer”.

⬡ P(A) is called the prior; this is the probability of our hypothesis without any additional
prior information. In the above example, this would be the “probability of having
cancer”.
13
⬡ P(B) is called the marginal likelihood; this is the total
probability of observing the evidence. In the above example,
this would be the “probability of being a smoker”. In many
applications of Bayes Rule, this is ignored, as it mainly serves as
normalization.

14
15
⬡ Imagine 100 people at a party. You tally how many wear pink or not, and how
many are men or not, and get these numbers:

         Pink   Not pink   Total
Man        5       35        40
Not man   20       40        60
Total     25       75       100

⬡ Bayes' theorem is based on just those 4 numbers!

⬡ Let us do some totals:
⬡ 40 of the 100 are men, and 25 of the 100 wear pink.

⬡ And calculate some probabilities:
⬡ the probability of being a man is P(Man) = 40/100 = 0.4
⬡ the probability of wearing pink is P(Pink) = 25/100 = 0.25
⬡ the probability that a man wears pink is P(Pink|Man) = 5/40 = 0.125
⬡ the probability that a person wearing pink is a man, P(Man|Pink) = ?
⬡ P(Man) = 0.4,
⬡ P(Pink) = 0.25 and
⬡ P(Pink|Man) = 0.125
⬡ Can you discover P(Man|Pink)? By Bayes' rule,
P(Man|Pink) = P(Pink|Man) · P(Man) / P(Pink) = 0.125 × 0.4 / 0.25 = 0.2.
18
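The party question can be answered mechanically with Bayes' rule. Here is a small Python sketch using the numbers from the slide above (the variable names are mine, for illustration):

```python
# Bayes' rule: P(Man | Pink) = P(Pink | Man) * P(Man) / P(Pink)
p_man = 0.4
p_pink = 0.25
p_pink_given_man = 0.125

p_man_given_pink = p_pink_given_man * p_man / p_pink
print(p_man_given_pink)  # 0.2
```

So one in five of the pink-wearers is a man, even though only one in eight men wears pink; the rule rescales the likelihood by the base rates.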
Bayesian belief network
⬡ A Bayesian belief network is a key computing technology for dealing with
probabilistic events and for solving problems that involve uncertainty. We can
define a Bayesian network as:
⬡ "A Bayesian network is a probabilistic graphical model which represents a set
of variables and their conditional dependencies using a directed acyclic
graph."
⬡ It is also called a Bayes network, belief network, decision network, or Bayesian
model.
⬡ Bayesian networks are probabilistic because they are built from
a probability distribution and also use probability theory for prediction and
anomaly detection.
⬡ Real world applications are probabilistic in nature, and to represent
the relationship between multiple events, we need a Bayesian
network. It can also be used in various tasks including prediction,
anomaly detection, diagnostics, automated insight, reasoning, time
series prediction, and decision making under uncertainty.
⬡ A Bayesian network can be used for building models from data and
experts' opinions, and it consists of two parts:
⬡ Directed Acyclic Graph
⬡ Table of conditional probabilities.

20
⬡ The generalized form of a Bayesian network that represents and solves decision
problems under uncertain knowledge is known as an influence diagram.
⬡ A Bayesian network graph is made up of nodes and arcs (directed links),
where:

21
⬡ Each node corresponds to a random variable, which can
be continuous or discrete.
⬡ Arcs or directed arrows represent the causal relationships or conditional
probabilities between random variables. These directed links or arrows
connect pairs of nodes in the graph.
These links represent that one node directly influences the other node; if
there is no directed link, the nodes are independent of each
other.
∙ In the above diagram, A, B, C, and D are random variables represented
by the nodes of the network graph.
∙ If we consider node B, which is connected to node A by a
directed arrow, then node A is called the parent of node B.
∙ Node C is independent of node A.
Explanation of Bayesian network:
⬡ Let's understand the Bayesian network through an example by creating a directed
acyclic graph:
⬡ Example: Harry installed a new burglar alarm at his home to detect burglary. The
alarm responds reliably to a burglary but also responds to minor
earthquakes. Harry has two neighbors, David and Sophia, who have taken
responsibility for informing Harry at work when they hear the alarm. David always
calls Harry when he hears the alarm, but sometimes he gets confused by the
phone ringing and calls then too. On the other hand, Sophia likes to listen
to loud music, so sometimes she fails to hear the alarm. Here we would like to
compute the probability of the burglary alarm.

23
⬡ Problem:
⬡ Calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and both David (P1) and Sophia (P2) have called
Harry.
⬡ Solution:
⬡ The Bayesian network for the above problem is given below. The network
structure shows that burglary and earthquake are the parent nodes of the alarm
and directly affect the probability of the alarm going off, whereas David's and Sophia's
calls depend on the alarm probability.
⬡ The network represents the assumptions that the neighbors do not directly perceive the
burglary, do not notice the minor earthquake, and do not confer
before calling.
24
⬡ The conditional distributions for each node are given as a conditional probability
table, or CPT.
⬡ Each row in the CPT must sum to 1, because the entries in a row
represent an exhaustive set of cases for the variable.
⬡ In a CPT, a Boolean variable with k Boolean parents has a table with 2^k rows of
probabilities. Hence, if there are two parents, the CPT will contain 4 rows of
probability values.
⬡ List of all events occurring in this network:
⬡ Burglary (B)
⬡ Earthquake(E)
⬡ Alarm(A)
⬡ David Calls(D)
⬡ Sophia calls (S)
Prior probabilities for Burglary (B) and Earthquake (E):

B P(B)      E P(E)
T 0.001     T 0.002
F 0.999     F 0.998

Conditional probability of alarm A given Burglary and Earthquake:

B E P(A=T) P(A=F)

T T 0.95 0.05

T F 0.94 0.06

F T 0.29 0.71

F F 0.001 0.999

Conditional probability for P1 (David) calls:

A P(P1=T) P(P1=F)

T 0.90 0.10

F 0.05 0.95

Conditional probability for P2 (Sophia) calls:

A P(P2=T) P(P2=F)

T 0.70 0.30

F 0.01 0.99
⬡ Let's take the observed probabilities for the Burglary and
Earthquake components:
⬡ P(B= True) = 0.001, which is the probability of a burglary.
⬡ P(B= False) = 0.999, which is the probability of no burglary.
⬡ P(E= True) = 0.002, which is the probability of a minor earthquake.
⬡ P(E= False) = 0.998, which is the probability that an earthquake
has not occurred.
⬡ We can provide the conditional probabilities in the tables
below:

27
⬡ Conditional probability table for Alarm A:
⬡ The conditional probability of Alarm A depends on Burglary and
Earthquake:

B E P(A= True) P(A= False)

True True 0.95 0.05

True False 0.94 0.06

False True 0.29 0.71

False False 0.001 0.999


28
⬡ Conditional probability table for David Calls:
The conditional probability that David will call depends on the probability
of the Alarm.
A P(D= True) P(D= False)

True 0.90 0.10

False 0.05 0.95


⬡ Conditional probability table for Sophia Calls:
⬡ The conditional probability that Sophia calls depends on its parent
node "Alarm."
A P(S= True) P(S= False)

True 0.70 0.30

False 0.01 0.99


⬡ From the formula for the joint distribution, we can write the problem
statement as a product of probabilities:
⬡ P(P1, P2, A, ¬B, ¬E)
⬡ = P(P1|A) · P(P2|A) · P(A|¬B ⋀ ¬E) · P(¬B) · P(¬E)
⬡ = 0.90 × 0.70 × 0.001 × 0.999 × 0.998
⬡ ≈ 0.00063.
⬡ Hence, a Bayesian network can answer any query about the
domain by using the joint distribution.

30
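The product above is easy to check in code. A minimal sketch, with the values copied from the CPTs (variable names are mine):

```python
# Joint probability P(P1, P2, A, not B, not E) for the burglar-alarm network.
p_not_b = 0.999       # P(no burglary)
p_not_e = 0.998       # P(no earthquake)
p_a = 0.001           # P(alarm | no burglary, no earthquake)
p_p1_given_a = 0.90   # P(David calls | alarm)
p_p2_given_a = 0.70   # P(Sophia calls | alarm)

# Chain-rule factorization of the joint over the network's structure.
joint = p_p1_given_a * p_p2_given_a * p_a * p_not_b * p_not_e
print(round(joint, 5))  # 0.00063
```

The same five-factor pattern answers any full joint query: one factor per node, each conditioned only on its parents.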
Machine Learning
⬡ Machine learning is a subset of artificial
intelligence that is mainly concerned with the
development of algorithms which allow a
computer to learn from data and past
experiences on its own. The term machine
learning was first introduced by Arthur
Samuel in 1959. We can define it in a
summarized way as: machine learning enables a
machine to automatically learn from data,
improve performance from experience, and
predict things without being explicitly
programmed.
31
⬡ With the help of sample historical data, known
as training data, machine learning algorithms build
a mathematical model that helps in making predictions or
decisions without being explicitly programmed. Machine learning
brings computer science and statistics together for creating
predictive models. Machine learning constructs or uses
algorithms that learn from historical data. The more
information we provide, the better the performance.

32
How does Machine Learning work
⬡ A machine learning system learns from historical data, builds prediction
models, and, whenever it receives new data, predicts the output for it. The
accuracy of the predicted output depends on the amount of data, as a large
amount of data helps build a better model that predicts the output more
accurately.

33
Features of Machine Learning:
⬡ Machine learning uses data to detect various patterns in a given dataset.

⬡ It can learn from past data and improve automatically.

⬡ It is a data-driven technology.

⬡ Machine learning is similar to data mining, as it also deals with huge
amounts of data.

34
Applications of Machine learning
Image recognition is one of the most
common applications of machine learning.
It is used to identify objects, persons,
places, digital images, etc. The popular use
case of image recognition and face
detection is, Automatic friend tagging
suggestion:

Facebook provides a feature of automatic
friend tagging suggestions. Whenever we
upload a photo with our Facebook friends,
we automatically get a tagging
suggestion with names, and the technology
behind this is machine learning's face
detection and recognition algorithms.

35
⬡ Speech Recognition: While using Google, we get an option of "Search by voice," it
comes under speech recognition, and it's a popular application of machine learning.
⬡ Speech recognition is a process of converting voice instructions into text, and it is
also known as "Speech to text", or "Computer speech recognition." At present,
machine learning algorithms are widely used by various applications of speech
recognition. Google assistant, Siri, Cortana, and Alexa are using speech recognition
technology to follow the voice instructions.
⬡ Traffic prediction: If we want to visit a new place, we take the help of Google Maps,
which shows us the correct path with the shortest route and predicts the traffic
conditions.
⬡ It predicts traffic conditions such as whether traffic is clear, slow-moving, or
heavily congested.

36
⬡ Self-driving cars: One of the most exciting applications of machine
learning is self-driving cars. Machine learning plays a significant
role in self-driving cars. Tesla, the most popular car manufacturer
in this space, is working on self-driving cars, using machine
learning methods to train its car models to detect people and
objects while driving.
⬡ Email spam and malware filtering: Whenever we receive a new
email, it is filtered automatically as important, normal, or spam.
We receive important mail in our inbox with the
important symbol and spam emails in our spam box, and the
technology behind this is machine learning.
37
Classification of Machine Learning
At a broad level, machine learning can be classified into three types:

•Supervised learning

•Unsupervised learning

•Reinforcement learning

38
Supervised Learning
⬡ Supervised learning, as the name indicates, involves the presence of a
supervisor acting as a teacher.
⬡ Basically, supervised learning is learning in which we teach or
train the machine using data that is well labeled, meaning
some data is already tagged with the correct answer.
⬡ After that, the machine is provided with a new set of
examples (data) so that the supervised learning algorithm analyses
the training data (the set of training examples) and produces a
correct outcome from the labeled data.
39
Example

40
⬡ For instance, suppose you are given a basket filled with different
kinds of fruit.
⬡ The first step is to train the machine on all the different fruits,
one by one, like this:
⬡ If the shape of the object is rounded with a depression at the top and its color
is red, then it will be labeled as Apple.
⬡ If the shape of the object is a long curving cylinder and its color is green-
yellow, then it will be labeled as Banana.
⬡ Now suppose that, after training on the data, you are given a new,
separate fruit from the basket, say a banana, and asked to identify it.
41
⬡ Since the machine has already learned from previous
data, it now has to use that knowledge wisely. It will first classify the
fruit by its shape and color, confirm the fruit name
as BANANA, and put it in the banana category.

⬡ Thus the machine learns from the training data (the basket
containing fruit) and then applies that knowledge to the test data (the new
fruit).

42
⬡ Supervised learning is classified into two categories of algorithms:
⬡ Classification: A classification problem is when the output variable
is a category, such as “red” or “blue”, or “disease” and “no
disease”.
⬡ Regression: A regression problem is when the output variable is a
real value, such as “dollars” or “weight”.
⬡ Supervised learning deals with, or learns from, “labeled” data,
which implies that some data is already tagged with the correct
answer.

43
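To make the classification idea concrete, here is a tiny 1-nearest-neighbour classifier in plain Python. The fruit features (roundness and yellowness on a 0-1 scale) and their values are hypothetical choices of mine for illustration, not part of the notes:

```python
# Minimal 1-nearest-neighbour classifier for the fruit example.
# Each training example: ((roundness, yellowness), label) -- labeled data.
train = [
    ((0.9, 0.1), "Apple"),    # round, red
    ((0.9, 0.2), "Apple"),
    ((0.2, 0.9), "Banana"),   # long, yellow
    ((0.1, 0.8), "Banana"),
]

def classify(x):
    """Return the label of the training example closest to x."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(train, key=lambda ex: dist2(ex[0], x))[1]

print(classify((0.15, 0.85)))  # a long yellow fruit -> Banana
```

The labeled training set plays the role of the fruit basket; a new fruit is assigned the label of its most similar training example.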
Advantages
⬡ Supervised learning allows you to collect data and produce
output from previous experience.

⬡ It helps to optimize performance criteria with the help of
experience.

⬡ Supervised machine learning helps to solve various types of real-
world computation problems.
44
Disadvantages
⬡ The decision boundary might be overtrained if your training set
lacks examples that you want to have in a class.
⬡ You need to select many good examples from each class while
training the classifier.
⬡ Classifying big data can be a real challenge.
⬡ Training for supervised learning needs a lot of computation time.
45
Unsupervised Learning
⬡ Unsupervised learning is the training of a machine using information that is
neither classified nor labeled, allowing the algorithm to act on that
information without guidance. Here the task of the machine is to group unsorted
information according to similarities, patterns, and differences without any
prior training on the data.
⬡ Unlike supervised learning, no teacher is provided, which means no training is
given to the machine. The machine must therefore find the hidden
structure in unlabeled data by itself.
For instance, suppose the machine is given an image containing both dogs and cats
that it has never seen before.
46
⬡ Thus the machine has no idea about the features of dogs and cats, so it cannot
categorize the animals as dogs and cats. But it can categorize them according to their
similarities, patterns, and differences; that is, it can easily divide the
picture into two parts.
⬡ The first part may contain all the pictures having dogs in them, and the second part
may contain all the pictures having cats. The machine learned nothing beforehand;
there was no training data or examples.


47
⬡ It allows the model to work on its own to discover patterns and information
that were previously undetected. It mainly deals with unlabeled data.
⬡ Unsupervised learning is classified into two categories of algorithms:
⬡ Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behavior.
⬡ Association: An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as people who
buy X also tending to buy Y.

48
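A clustering problem can be sketched in a few lines. Below is a toy k-means (k = 2) on one-dimensional purchase amounts; the data and the two-cluster setup are hypothetical illustrations of mine, not from the notes. Note that no labels are involved, which is the essence of unsupervised learning:

```python
# Tiny k-means sketch (k = 2): group 1-D points by similarity alone.
def kmeans_1d(points, iters=10):
    centers = [min(points), max(points)]     # crude initialisation
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            # Assign each point to its nearest center (index 0 or 1).
            groups[abs(p - centers[0]) > abs(p - centers[1])].append(p)
        # Move each center to the mean of its group (keep it if empty).
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return groups

spend = [5, 6, 7, 40, 42, 45]   # e.g. customers' purchase amounts
low, high = kmeans_1d(spend)
print(low, high)  # -> [5, 6, 7] [40, 42, 45]
```

The algorithm discovers the "small spender" and "big spender" groupings without ever being told they exist.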
Reinforcement learning
⬡ Reinforcement learning is an area of machine learning. It is about taking
suitable actions to maximize reward in a particular situation. It is employed by
various software and machines to find the best possible behavior or path to
take in a specific situation.
⬡ Reinforcement learning differs from supervised learning in that in
supervised learning the training data comes with the answer key, so the model is
trained with the correct answer itself, whereas in reinforcement learning there
is no answer; the reinforcement agent decides what to do to perform the
given task.
⬡ In the absence of a training dataset, it is bound to learn from its experience.

49
⬡ Reinforcement learning (RL) can be viewed as an approach which
falls between supervised and unsupervised learning. It is not
strictly supervised as it does not rely only on a set of labelled
training data but is not unsupervised learning because we have a
reward which we want our agent to maximise. The agent needs to
find the “right” actions to take in different situations to achieve its
overall goal.

50
Example:
The problem is as follows: suppose there is an AI agent within a maze
environment, and its goal is to find the diamond. The agent interacts with the environment
by performing actions, and based on those actions, the state of the agent
changes; it also receives a reward or penalty as feedback.

⬡ To understand the working process of the RL,


we need to consider two main things:
⬡ Environment: It can be anything such as a
room, maze, football ground, etc.
⬡ Agent: An intelligent agent such as AI robot.
⬡ Let's take an example of a maze environment
that the agent needs to explore. Consider the
image:

51
⬡ The agent continues doing these three
things (take an action, change
state or remain in the same state, and
get feedback), and by doing so
it learns and explores the
environment.
⬡ The agent learns which actions
lead to positive feedback (rewards)
and which lead to negative
feedback (penalties). As a positive
reward, the agent gets a positive
point, and as a penalty, it gets a
negative point.
52
⬡ In the above image, the agent is at the very first block of the
maze. The maze consists of an S6 block, which is a wall,
S8, a fire pit, and S4, a diamond block.
⬡ The agent cannot cross the S6 block, as it is a solid wall. If the
agent reaches the S4 block, it gets a +1 reward; if it reaches
the fire pit, it gets a -1 reward. It can take four actions:
move up, move down, move left, and move right.
⬡ The agent can take any path to reach the final point, but it
needs to do so in as few steps as possible. Suppose the agent
considers the path S9-S5-S1-S2-S3; then it will get the +1
reward.
⬡ The agent will try to remember the preceding steps it has
taken to reach the final step. To memorize the steps, it assigns a
value of 1 to each preceding step. Consider the step below:

53
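The action-state-feedback loop above can be sketched with tabular Q-learning. The environment here is a hypothetical 5-state corridor, simpler than the notes' maze, and the hyperparameters are my own illustrative choices:

```python
import random

# Tabular Q-learning on a toy 5-state corridor: the agent starts at
# state 0 and earns +1 for reaching the goal state 4.
random.seed(0)
N, GOAL = 5, 4
ACTIONS = (-1, +1)                    # move left / move right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2     # learning rate, discount, exploration

for _ in range(300):                  # training episodes
    s = 0
    while s != GOAL:
        if random.random() < eps:     # explore: random action
            a = random.choice(ACTIONS)
        else:                         # exploit: best action so far
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N - 1)        # walls at both ends
        r = 1.0 if s2 == GOAL else 0.0        # feedback from environment
        # Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

greedy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(greedy)  # greedy action per non-goal state (+1 = move right)
```

After training, the greedy policy moves right in every state, which is exactly the shortest path to the reward.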
Main points in Reinforcement learning

⬡ Input: The input should be an initial state from which the model will start.
⬡ Output: There are many possible outputs, as there is a variety of solutions to a
particular problem.
⬡ Training: The training is based upon the input; the model will return a state,
and the user will decide whether to reward or punish the model based on its output.
⬡ The model continues to learn.
⬡ The best solution is decided based on the maximum reward.

54
Terms used in Reinforcement Learning
⬡ Agent: An entity that can perceive/explore the environment and act upon it.
⬡ Environment: The situation in which an agent is present or by which it is
surrounded. In RL, we assume a stochastic environment, which means it is random in
nature.
⬡ Action: Actions are the moves taken by an agent within the environment.
⬡ State: A situation returned by the environment after each action
taken by the agent.
⬡ Reward: Feedback returned to the agent from the environment to evaluate
the agent's action.

55
Types of Reinforcement:
⬡ There are two types of Reinforcement:
⬡ Positive – Positive reinforcement means adding something to increase the
tendency that the expected behavior will occur again. It impacts the
behavior of the agent positively and increases the strength of that behavior.
⬡ This type of reinforcement can sustain changes for a long time, but too much
positive reinforcement may lead to an overload of states that can reduce the
consequences.
⬡ Example:
⬡ A little boy receives Rs 50 for every A grade on his report card.
⬡ A father gives his daughter candy for cleaning up toys.
56
⬡ Advantages of positive reinforcement learning:
∙ Maximizes performance
∙ Sustains change for a long period of time

⬡ Disadvantages of positive reinforcement learning:

∙ Too much reinforcement can lead to an overload of states, which can
diminish the results

57
⬡ Negative – Negative reinforcement is the opposite of positive
reinforcement, as it increases the tendency that the specific behavior will occur
again by avoiding a negative condition.
⬡ It can be more effective than positive reinforcement depending on the
situation and behavior, but it provides reinforcement only to meet the minimum
behavior.
⬡ Advantages of negative reinforcement learning:
⬡ Increases behavior
⬡ Provides defiance to a minimum standard of performance
⬡ Disadvantages of negative reinforcement learning:
⬡ It only provides enough to meet the minimum behavior
58
Applications of Reinforcement Learning

⬡ Robotics for industrial automation.


⬡ Business strategy planning
⬡ Machine learning and data processing
⬡ It helps you to create training systems that provide custom
instruction and materials according to the requirement of
students.
⬡ Aircraft control and robot motion control
59
Markov Decision Process
⬡ Markov decision processes (MDPs) model decision making in discrete,
stochastic, sequential environments.

⬡ A stochastic model describes a process in which the state depends on
previous states in a non-deterministic way.

⬡ A stochastic process has the Markov property if the conditional
probability distribution of future states of the process depends only on
the present state, not on the sequence of states that preceded it.

60
⬡ In probability theory and related fields, a Markov process is a stochastic
process with the Markov property, often described as "memorylessness".
⬡ A Markov chain is a Markov process in either discrete or continuous
time with a countable state space.

[Figure: a three-state Markov chain (states 1, 2, 3) with transition
probabilities 0.1 and 0.9 shown on its arcs.]
61
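The memorylessness property is easy to see in code: sampling the next state needs nothing but the current state and a row of the transition matrix. The two-state weather chain below is a hypothetical example of mine, not the chain from the slide's figure:

```python
import random

# A discrete-time Markov chain: the next state depends only on the
# current state (the Markov property), via a row of transition matrix P.
P = {
    "Sunny": {"Sunny": 0.9, "Rainy": 0.1},
    "Rainy": {"Sunny": 0.5, "Rainy": 0.5},
}

def step(state):
    """Sample the next state from the current state's transition row."""
    r, cum = random.random(), 0.0
    for nxt, p in P[state].items():
        cum += p
        if r < cum:
            return nxt
    return nxt  # guard against floating-point shortfall

random.seed(1)
chain = ["Sunny"]
for _ in range(10):
    chain.append(step(chain[-1]))  # only chain[-1] matters, never the past
print(chain)
```

Each row of `P` must sum to 1, mirroring the CPT row constraint from the Bayesian-network slides.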
For A: 0.7 × 0.6 = 0.42

For E: 0.4 × 0.7 = 0.28

0.28 + 0.3 = 0.58
62
