AI Notes
ARTIFICIAL INTELLIGENCE
BTCS511
CS H & I
III Year, V Semester
Probabilistic reasoning
⬡ Probabilistic reasoning is a way of knowledge representation in which we apply the concept of probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to handle uncertainty.
⬡ We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that results from laziness (it is too much work to list every condition or exception) and ignorance (we lack complete knowledge of the domain).
⬡ In the real world there are many scenarios where the certainty of something is not confirmed, such as "It will rain today," "the behavior of someone in some situation," or "a match between two teams or two players." These are probable sentences for which we can assume that something will happen but cannot be sure about it, so here we use probabilistic reasoning.
Need for probabilistic reasoning in AI:
⬡ When there are unpredictable outcomes.
⬡ When the specifications or possibilities of predicates become too large to handle.
⬡ When an unknown error occurs during an experiment.
⬡ In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
⬡ Bayes' rule
⬡ Bayesian statistics
Probability:
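⬡ A minimal sketch of the standard definition this slide presumably covers (assumed, since the slide body is not present in these notes), written in LaTeX:
\[ P(A) = \frac{\text{number of favourable outcomes}}{\text{total number of outcomes}}, \qquad 0 \le P(A) \le 1, \qquad P(\neg A) = 1 - P(A) \]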
Conditional probability:
⬡ Conditional probability is the probability of an event occurring given that another event has already happened.
⬡ Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition B"; it can be written as shown below.
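⬡ The formula itself is not in the extracted notes; consistent with the Venn-diagram explanation that follows, it is:
\[ P(A \mid B) = \frac{P(A \wedge B)}{P(B)}, \qquad P(B) > 0 \]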
It can be explained using the Venn diagram below: once B has occurred, the sample space is reduced to the set B, and we can calculate event A given that event B has already occurred by dividing P(A⋀B) by P(B).
Example:
⬡ 40% of the children like apples, 30% of the children like oranges, and 20% of the children like both apples and oranges. Apply conditional probability.
⬡ Solution:
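⬡ A worked sketch, assuming the question asks for the probability that a child likes oranges given that they like apples:
\[ P(\text{Orange} \mid \text{Apple}) = \frac{P(\text{Apple} \wedge \text{Orange})}{P(\text{Apple})} = \frac{0.20}{0.40} = 0.5 \]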
Bayes' theorem
⬡ Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian
reasoning, which determines the probability of an event with uncertain
knowledge.
⬡ In probability theory, it relates the conditional probability and marginal
probabilities of two random events.
⬡ It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
⬡ Bayes' theorem allows updating the probability prediction of an event by observing new information from the real world.
⬡ The Rule
⬡ The rule has a very simple derivation that follows directly from the relationship between joint and conditional probabilities. First, note that P(A,B) = P(A|B)P(B) = P(B,A) = P(B|A)P(A). Next, set the two terms involving conditional probabilities equal to each other, so P(A|B)P(B) = P(B|A)P(A), and finally divide both sides by P(B) to arrive at Bayes' rule, written out below.
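⬡ The equation, in the notation used above:
\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]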
⬡ In this formula, A is the event whose probability we want, and B is the new evidence that is related to A in some way. (For concreteness, take A to be "the person has cancer" and B to be "the person is a smoker.")
⬡ P(A|B) is called the posterior; this is what we are trying to estimate. In this example, it is the "probability of having cancer given that the person is a smoker."
⬡ P(B|A) is called the likelihood; this is the probability of observing the new evidence given our initial hypothesis. In this example, it is the "probability of being a smoker given that the person has cancer."
⬡ P(A) is called the prior; this is the probability of our hypothesis without any additional information. In this example, it is the "probability of having cancer."
⬡ P(B) is called the marginal likelihood; this is the total probability of observing the evidence. In this example, it is the "probability of being a smoker." In many applications of Bayes' rule this term is ignored, as it mainly serves as a normalization constant.
⬡ Imagine 100 people at a party. You tally how many wear pink or not, and whether each is a man or not, and get these numbers (reconstructed here from the probabilities that follow):

              Pink    Not pink    Total
Man             5        35         40
Not man        20        40         60
Total          25        75        100
⬡ And calculate some probabilities:
⬡ the probability of being a man is P(Man) = 40/100 = 0.4
⬡ the probability of wearing pink is P(Pink) = 25/100 = 0.25
⬡ the probability that a man wears pink is P(Pink|Man) = 5/40 = 0.125
⬡ the probability that a person wearing pink is a man, P(Man|Pink) = ?
⬡ P(Man) = 0.4,
⬡ P(Pink) = 0.25 and
⬡ P(Pink|Man) = 0.125
⬡ Can you discover P(Man|Pink)?
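⬡ A sketch of the answer, applying Bayes' rule with the numbers above:
\[ P(\text{Man} \mid \text{Pink}) = \frac{P(\text{Pink} \mid \text{Man})\, P(\text{Man})}{P(\text{Pink})} = \frac{0.125 \times 0.4}{0.25} = 0.2 \]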
Bayesian belief network
⬡ A Bayesian belief network is a key technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:
⬡ "A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."
⬡ It is also called a Bayes network, belief network, decision network, or Bayesian model.
⬡ Bayesian networks are probabilistic because they are built from a probability distribution, and they also use probability theory for prediction and anomaly detection.
⬡ Real-world applications are probabilistic in nature, and to represent the relationships between multiple events we need a Bayesian network. It can be used in various tasks including prediction, anomaly detection, diagnostics, automated insight, reasoning, time-series prediction, and decision making under uncertainty.
⬡ A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts:
⬡ a directed acyclic graph, and
⬡ a table of conditional probabilities.
⬡ The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
⬡ A Bayesian network graph is made up of nodes and arcs (directed links), where:
⬡ Each node corresponds to a random variable, and a variable can be continuous or discrete.
⬡ Arcs, or directed arrows, represent the causal relationships or conditional probabilities between random variables. These directed links or arrows connect pairs of nodes in the graph.
These links represent that one node directly influences the other node; if there is no directed link between two nodes, they are independent of each other.
∙ In the example diagram, A, B, C, and D are random variables represented by the nodes of the network graph.
∙ If we consider node B, which is connected to node A by a directed arrow, then node A is called the parent of node B.
∙ Node C is independent of node A.
Explanation of Bayesian network:
⬡ Let's understand the Bayesian network through an example by creating a directed acyclic graph.
⬡ Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls then too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses the alarm. Here we would like to compute the probability of the burglar alarm.
⬡ Problem:
⬡ Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David (P1) and Sophia (P2) have called Harry.
⬡ Solution:
⬡ The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend on the alarm probability.
⬡ The network represents our assumptions that the neighbors do not directly perceive the burglary, do not notice the minor earthquake, and do not confer before calling.
⬡ The conditional distribution for each node is given as a conditional probability table, or CPT.
⬡ Each row in a CPT must sum to 1, because the entries in the row represent an exhaustive set of cases for the variable.
⬡ In a CPT, a boolean variable with k boolean parents contains 2^k probabilities. Hence, if there are two parents, the CPT will contain 4 probability values.
⬡ List of all events occurring in this network:
⬡ Burglary (B)
⬡ Earthquake (E)
⬡ Alarm (A)
⬡ David calls (D)
⬡ Sophia calls (S)
Priors for Burglary (B) and Earthquake (E):
B = T: 0.001    E = T: 0.002
B = F: 0.999    E = F: 0.998

Conditional probability table for Alarm A given B and E (rows recoverable from the slide):
B  E  P(A=T)  P(A=F)
T  T  0.95    0.05
T  F  0.94    0.06

Conditional probability for P2 (Sophia) calls given Alarm (A = T row recoverable from the slide):
A  P(S=T)  P(S=F)
T  0.70    0.30
⬡ Let's take the observed probabilities for the Burglary and Earthquake components:
⬡ P(B = True) = 0.001, which is the probability of a burglary.
⬡ P(B = False) = 0.999, which is the probability of no burglary.
⬡ P(E = True) = 0.002, which is the probability of a minor earthquake.
⬡ P(E = False) = 0.998, which is the probability that an earthquake has not occurred.
⬡ We can provide the conditional probabilities as per the tables shown above.
⬡ Conditional probability table for Alarm A:
⬡ The conditional probability of Alarm A depends on Burglary and Earthquake, as shown in the CPT above.
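⬡ A minimal Python sketch of how the query from the Problem slide, P(D ⋀ S ⋀ A ⋀ ¬B ⋀ ¬E), factorizes over the network. The values for P(A | ¬B, ¬E), David's CPT, and the A = False row of Sophia's CPT are not in these notes, so the numbers marked "assumed" below are illustrative placeholders, not the lecture's values.

# Joint probability of one full assignment in the burglary network:
# P(B, E, A, D, S) = P(B) * P(E) * P(A | B, E) * P(D | A) * P(S | A)

p_b = {True: 0.001, False: 0.999}          # prior for Burglary (from the slides)
p_e = {True: 0.002, False: 0.998}          # prior for Earthquake (from the slides)

# P(A = True | B, E); the (T,T) and (T,F) rows are from the slides,
# the (F,T) and (F,F) rows are assumed for illustration only.
p_a_given_be = {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.31, (False, False): 0.001}   # last two assumed

p_d_given_a = {True: 0.91, False: 0.05}    # assumed David CPT (not in the notes)
p_s_given_a = {True: 0.70, False: 0.02}    # A=True row from the slide; A=False assumed

def joint(b, e, a, d, s):
    """P(B=b, E=e, A=a, D=d, S=s) via the chain-rule factorization of the DAG."""
    pa = p_a_given_be[(b, e)] if a else 1 - p_a_given_be[(b, e)]
    pd = p_d_given_a[a] if d else 1 - p_d_given_a[a]
    ps = p_s_given_a[a] if s else 1 - p_s_given_a[a]
    return p_b[b] * p_e[e] * pa * pd * ps

# The query: alarm sounded, no burglary, no earthquake, both David and Sophia called.
print(joint(b=False, e=False, a=True, d=True, s=True))

With these placeholder numbers the printed value is roughly 0.00064; the exact answer depends on the CPT entries actually used in the lecture.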
Machine Learning
⬡ Machine learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms which allow a computer to learn from data and past experiences on its own. The term machine learning was first introduced by Arthur Samuel in 1959. We can define it in a summarized way as: machine learning enables a machine to automatically learn from data, improve performance from experience, and predict things without being explicitly programmed.
⬡ With the help of sample historical data, known as training data, machine learning algorithms build a mathematical model that helps in making predictions or decisions without being explicitly programmed. Machine learning brings computer science and statistics together for creating predictive models. Machine learning constructs or uses algorithms that learn from historical data. The more information we provide, the higher the performance will be.
How does Machine Learning work?
⬡ A machine learning system learns from historical data, builds prediction models, and, whenever it receives new data, predicts the output for it. The accuracy of the predicted output depends upon the amount of data, since a larger amount of data helps build a better model that predicts the output more accurately.
Features of Machine Learning:
⬡ Machine learning uses data to detect various patterns in a given dataset.
⬡ It is a data-driven technology.
⬡ Machine learning is similar to data mining, as both deal with huge amounts of data.
Applications of Machine learning
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, etc., in digital images. A popular use case of image recognition and face detection is automatic friend-tagging suggestions.
⬡ Speech recognition: When using Google, we get an option to "Search by voice"; this comes under speech recognition, and it is a popular application of machine learning.
⬡ Speech recognition is the process of converting voice instructions into text, and it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in various speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.
⬡ Traffic prediction: If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.
⬡ It predicts traffic conditions such as whether traffic is clear, slow-moving, or heavily congested.
⬡ Self-driving cars: One of the most exciting applications of machine learning is self-driving cars. Machine learning plays a significant role in self-driving cars. Tesla, the most popular car manufacturer in this space, is working on self-driving cars and uses machine learning methods to train its models to detect people and objects while driving.
⬡ Email spam and malware filtering: Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We always receive important mail in our inbox, marked with the important symbol, and spam emails in our spam box, and the technology behind this is machine learning.
Classification of Machine Learning
At a broad level, machine learning can be classified into three types:
•Supervised learning
•Unsupervised learning
•Reinforcement learning
Supervised Learning
⬡ Supervised learning, as the name indicates, involves the presence of a supervisor acting as a teacher.
⬡ Basically, supervised learning is learning in which we teach or train the machine using data that is well labeled, which means some data is already tagged with the correct answer.
⬡ After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome from the labeled data.
Example
⬡ For instance, suppose you are given a basket filled with different kinds of fruits.
⬡ The first step is to train the machine with all the different fruits one by one, like this:
⬡ If the shape of the object is rounded with a depression at the top and its color is red, then it will be labeled as Apple.
⬡ If the shape of the object is a long curving cylinder and its color is green-yellow, then it will be labeled as Banana.
⬡ Now suppose that, after training on the data, you are given a new separate fruit, say a banana, from the basket and asked to identify it.
⬡ Since the machine has already learned from the previous data, it now has to use that knowledge. It will first classify the fruit by its shape and color, confirm the fruit name as BANANA, and put it in the Banana category, as sketched in the example below.
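⬡ A minimal Python sketch of this idea, assuming scikit-learn is available; the numeric encoding of shape and color is made up purely for illustration:

from sklearn.tree import DecisionTreeClassifier

# Training data: each fruit described by two made-up numeric features
# [roundness (0 = long, 1 = round), color code (0 = green-yellow, 1 = red)]
X_train = [[1, 1], [1, 1], [0, 0], [0, 0]]
y_train = ["Apple", "Apple", "Banana", "Banana"]   # the labels (the "correct answers")

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                        # learn from the labeled examples

# A new, unseen fruit: long curving cylinder, green-yellow color
print(model.predict([[0, 0]]))                     # -> ['Banana']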
⬡ Supervised learning is classified into two categories of algorithms:
⬡ Classification: a classification problem is when the output variable is a category, such as "red" or "blue", or "disease" and "no disease".
⬡ Regression: a regression problem is when the output variable is a real value, such as "dollars" or "weight".
⬡ Supervised learning deals with, or learns from, "labeled" data, which implies that some data is already tagged with the correct answer.
Advantages
⬡ Supervised learning allows collecting data and producing output based on previous experience.
Unsupervised Learning
⬡ Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training on the data.
⬡ Unlike supervised learning, no teacher is provided, which means no training will be given to the machine. Therefore the machine is restricted to finding the hidden structure in unlabeled data by itself.
For instance, suppose the machine is given an image containing both dogs and cats that it has never seen before.
⬡ The machine has no idea about the features of dogs and cats, so it cannot categorize the image as "dogs" and "cats" by name. But it can categorize the pictures according to their similarities, patterns, and differences; that is, it can easily split the pictures into two parts.
⬡ The first part may contain all the pictures having dogs in them, and the second part may contain all the pictures having cats in them. Here nothing was learned beforehand, meaning there was no training data or examples.
⬡ It allows the model to work on its own to discover patterns and information that were previously undetected. It mainly deals with unlabeled data.
⬡ Unsupervised learning is classified into two categories of algorithms:
⬡ Clustering: a clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior (see the sketch after this list).
⬡ Association: an association rule learning problem is where you want to discover rules that describe large portions of your data, such as people who buy X also tend to buy Y.
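⬡ A minimal clustering sketch in Python, assuming scikit-learn is available; the two-feature customer data below is invented purely for illustration:

from sklearn.cluster import KMeans

# Made-up customers described by [number of purchases, average spend]
customers = [[2, 15], [3, 20], [25, 200], [30, 220], [4, 18], [28, 210]]

# No labels are given; k-means groups the rows purely by similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)

print(labels)   # e.g. [0 0 1 1 0 1]: a low-spend group and a high-spend group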
Reinforcement learning
⬡ Reinforcement learning is an area of machine learning. It is about taking suitable actions to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation.
⬡ Reinforcement learning differs from supervised learning in that, in supervised learning, the training data has the answer key with it, so the model is trained with the correct answer itself, whereas in reinforcement learning there is no answer; the reinforcement agent decides what to do to perform the given task.
⬡ In the absence of a training dataset, it is bound to learn from its own experience.
⬡ Reinforcement learning (RL) can be viewed as an approach that falls between supervised and unsupervised learning. It is not strictly supervised, because it does not rely only on a set of labelled training data, but it is not unsupervised learning either, because we have a reward which we want our agent to maximise. The agent needs to find the "right" actions to take in different situations to achieve its overall goal.
Example:
The problem is as follows: suppose there is an AI agent present within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing actions; based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.
⬡ The agent continues doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing these actions it learns and explores the environment.
⬡ The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalties. As a positive reward the agent gets a positive point, and as a penalty it gets a negative point.
⬡ In the maze figure, the agent is at the very first block of the maze. The maze consists of an S6 block, which is a wall, S8, a fire pit, and S4, a diamond block.
⬡ The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, it gets a +1 reward; if it reaches the fire pit, it gets a -1 reward point. It can take four actions: move up, move down, move left, and move right.
⬡ The agent can take any path to reach the final point, but it needs to do so in as few steps as possible. Suppose the agent follows the path S9-S5-S1-S2-S3, so it will get the +1 reward point.
⬡ The agent will try to remember the preceding steps it has taken to reach the final step. To memorize the steps, it assigns a value of 1 to each previous step, as in the sketch below.
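⬡ A minimal Python sketch of remembering a successful path by assigning a value to each step. The path and the +1 reward are from the example above; the discounted backup (gamma < 1) is a common refinement used here as an illustrative assumption, not necessarily the lecture's exact method:

# One successful episode in the maze: the path from the slide, which ends
# with the agent collecting the +1 reward.
path = ["S9", "S5", "S1", "S2", "S3"]
final_reward = 1

# The text describes assigning a value of 1 to each previous step; discounting
# by gamma makes steps closer to the reward slightly more valuable.
gamma = 0.9
V = {}
value = final_reward
for state in reversed(path):   # walk backwards from the goal
    V[state] = value
    value *= gamma             # earlier steps get slightly smaller values

print(V)   # roughly {'S3': 1, 'S2': 0.9, 'S1': 0.81, 'S5': 0.73, 'S9': 0.66}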
Main points in Reinforcement learning
⬡ Input: the input should be an initial state from which the model will start.
⬡ Output: there are many possible outputs, as there are a variety of solutions to a particular problem.
⬡ Training: the training is based upon the input; the model will return a state, and the user will decide whether to reward or punish the model based on its output.
⬡ The model continues to learn.
⬡ The best solution is decided based on the maximum reward.
Terms used in Reinforcement Learning
⬡ Agent: an entity that can perceive/explore the environment and act upon it.
⬡ Environment: the situation in which the agent is present, or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.
⬡ Action: actions are the moves taken by the agent within the environment.
⬡ State: the situation returned by the environment after each action taken by the agent.
⬡ Reward: feedback returned to the agent by the environment to evaluate the agent's action.
Types of Reinforcement:
⬡ There are two types of Reinforcement:
⬡ Positive: positive reinforcement means adding something to increase the tendency that the expected behavior will occur again. It impacts the behavior of the agent positively and increases the strength of that behavior.
⬡ This type of reinforcement can sustain changes for a long time, but too much positive reinforcement may lead to an overload of states, which can diminish the results.
⬡ Example:
⬡ A little boy receives Rs 50 for every A grade on his report card.
⬡ A father gives his daughter candy for cleaning up her toys.
⬡ Advantages of positive reinforcement:
∙ Maximizes performance
∙ Sustains change for a long period of time
⬡ Negative: negative reinforcement is the opposite of positive reinforcement, as it increases the tendency that the specific behavior will occur again by avoiding a negative condition.
⬡ It can be more effective than positive reinforcement, depending on the situation and behavior, but it provides reinforcement only to meet the minimum behavior.
⬡ Advantages of negative reinforcement:
⬡ Increases behavior
⬡ Helps enforce a minimum standard of performance
⬡ Disadvantages of negative reinforcement:
⬡ It only provides enough to meet the minimum behavior
Applications of Reinforcement Learning
⬡ In probability theory and related fields, a Markov process is a random process that satisfies the Markov property, often described as "memorylessness": the future depends only on the present state, not on the sequence of states that preceded it.
⬡ A Markov chain is a Markov process, in either discrete or continuous time, with a countable state space.
[Figure: a Markov chain over states 1, 2, and 3, with transition probabilities 0.1 and 0.9 shown on the arrows.]
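⬡ A minimal Python sketch of a discrete-time Markov chain over three states. Only the values 0.1 and 0.9 are recoverable from the slide's diagram, so the full transition matrix below is an illustrative assumption:

import random

# Assumed transition probabilities P(next state | current state) for states 1, 2, 3.
# Each row sums to 1; 0.9 and 0.1 echo the numbers visible on the slide's diagram.
P = {
    1: {1: 0.0, 2: 0.9, 3: 0.1},
    2: {1: 0.1, 2: 0.0, 3: 0.9},
    3: {1: 0.9, 2: 0.1, 3: 0.0},
}

def step(state):
    """Sample the next state using only the current state (the Markov property)."""
    r = random.random()
    cumulative = 0.0
    for next_state, prob in P[state].items():
        cumulative += prob
        if r < cumulative:
            return next_state
    return state   # numerical safety fallback

# Simulate a short chain starting from state 1.
state = 1
trajectory = [state]
for _ in range(10):
    state = step(state)
    trajectory.append(state)
print(trajectory)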
Worked example from the slide: for A, 0.7 × 0.6 = 0.42; for E, 0.4 × 0.7 = 0.28, and 0.28 + 0.3 = 0.58.