Unit-5 ML Notes

Reinforcement learning is a type of machine learning where an agent learns from interaction with an environment using rewards and penalties. The agent takes actions in an environment and receives feedback in the form of rewards or penalties, allowing it to learn over time which actions yield the most reward. The agent aims to learn an optimal policy or strategy through trial-and-error interactions to maximize its total reward. Key concepts include the agent, environment, actions, states, rewards, and the policy or strategy for action selection.


UNIT-5

What is Reinforcement Learning?


o Reinforcement Learning is a feedback-based Machine learning technique in which an
agent learns to behave in an environment by performing the actions and seeing the results
of actions. For each good action, the agent gets positive feedback, and for each bad action,
the agent gets negative feedback or penalty.
o In Reinforcement Learning, the agent learns automatically from feedback without any
labeled data, unlike supervised learning.
o Since there is no labeled data, the agent is bound to learn from its experience only.
o RL solves a specific type of problem where decision making is sequential, and the goal is
long-term, such as game-playing, robotics, etc.
o The agent interacts with the environment and explores it by itself. The primary goal of an
agent in reinforcement learning is to improve the performance by getting the maximum
positive rewards.
o The agent learns through trial and error, and based on that experience, it learns to
perform the task in a better way. Hence, we can say that "Reinforcement learning is a
type of machine learning method where an intelligent agent (computer program)
interacts with the environment and learns to act within it." How a robotic dog learns
the movement of its limbs is an example of reinforcement learning.
o It is a core part of Artificial Intelligence, and many AI agents work on the concept of
reinforcement learning. Here we do not need to pre-program the agent, as it learns from its
own experience without any human intervention.
o Example: Suppose there is an AI agent present within a maze environment, and its goal
is to find the diamond. The agent interacts with the environment by performing some
actions; based on those actions, the state of the agent changes, and it also receives
a reward or penalty as feedback.
o The agent continues doing these three things (take an action, change state or remain in the
same state, and get feedback), and by repeating these actions, it learns and explores the
environment.
o The agent learns which actions lead to positive feedback or rewards and which actions
lead to negative feedback or penalties. As a positive reward, the agent gets a positive point,
and as a penalty, it gets a negative point.
Terms used in Reinforcement Learning
o Agent: An entity that can perceive/explore the environment and act upon it.
o Environment: The situation in which the agent is present or by which it is surrounded. In RL, we
assume a stochastic environment, which means it is random in nature.
o Action: Actions are the moves taken by the agent within the environment.
o State: The situation returned by the environment after each action taken by the agent.
o Reward: Feedback returned to the agent from the environment to evaluate the
agent's action.
o Policy: The strategy applied by the agent to decide the next action based on the
current state.
o Value: The expected long-term return with the discount factor, as opposed to the
short-term reward.
o Q-value: Mostly similar to the value, but it takes one additional parameter, the
current action (a).
Key Features of Reinforcement Learning
o In RL, the agent is not instructed about the environment and what actions need to be
taken.
o It is based on the hit and trial process.
o The agent takes the next action and changes states according to the feedback of the
previous action.
o The agent may get a delayed reward.
o The environment is stochastic, and the agent needs to explore it in order to collect the
maximum positive reward.

Approaches to implement Reinforcement Learning


There are mainly three ways to implement reinforcement learning in ML, which are:
1. Value-based:
The value-based approach tries to find the optimal value function, which gives the
maximum value achievable at a state under any policy. The agent therefore estimates the long-term
return at any state s under policy π.
2. Policy-based:
The policy-based approach searches directly for the optimal policy that maximizes future rewards,
without using the value function. In this approach, the agent tries to apply a policy
such that the action performed at each step helps to maximize the future reward.
The policy-based approach has mainly two types of policy (see the short sketch after this list):
o Deterministic: The same action is always produced by the policy (π) at a given state.
o Stochastic: The policy defines a probability distribution over actions, and the produced action is sampled from it.
3. Model-based: In the model-based approach, a virtual model of the environment is created,
and the agent explores that environment to learn it. There is no single
solution or algorithm for this approach, because the model representation differs for
each environment.
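As a minimal illustration of the two policy types just mentioned, the sketch below contrasts a deterministic policy, which maps each state to exactly one action, with a stochastic policy, which samples an action from a probability distribution. The state and action names here are invented purely for this example.

import random

ACTIONS = ["up", "down", "left", "right"]

# Deterministic policy: every state is mapped to exactly one action.
deterministic_policy = {"s1": "right", "s2": "up", "s3": "right"}

def act_deterministic(state):
    return deterministic_policy[state]

# Stochastic policy: each state defines a probability distribution over actions.
stochastic_policy = {
    "s1": {"up": 0.1, "down": 0.1, "left": 0.1, "right": 0.7},
    "s2": {"up": 0.6, "down": 0.1, "left": 0.2, "right": 0.1},
}

def act_stochastic(state):
    dist = stochastic_policy[state]
    return random.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]

print(act_deterministic("s1"))   # always "right"
print(act_stochastic("s1"))      # "right" about 70% of the time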

How does Reinforcement Learning Work?


To understand the working process of the RL, we need to consider two main things:
o Environment: It can be anything such as a room, maze, football ground, etc.
o Agent: An intelligent agent such as an AI robot.
Let's take an example of a maze environment that the agent needs to explore. Consider the
below image:

In the above image, the agent is at the very first block of the maze. The maze consists of
an S6 block, which is a wall, an S8 block, which is a fire pit, and an S4 block, which is the diamond.
The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, it
gets a +1 reward; if it reaches the fire pit, it gets a -1 reward. It can take four actions:
move up, move down, move left, and move right.
The agent can take any path to reach the final point, but it needs to do so in as few
steps as possible. Suppose the agent follows the path S9-S5-S1-S2-S3; it will then get the +1 reward point.
The agent will try to remember the preceding steps it has taken to reach the final block. To
memorize the steps, it assigns the value 1 to each previous step. Consider the below step:

Now, the agent has successfully stored the previous steps by assigning the value 1 to each previously
visited block. But what will the agent do if it starts moving from a block that has the value 1
on both sides? Consider the below diagram:

It will be a difficult situation for the agent to decide whether it should go up or down, as each block has
the same value. So, the above approach is not suitable for the agent to reach the destination.
Hence, to solve this problem, we use the Bellman equation, which is the main concept
behind reinforcement learning.

The Bellman Equation


The Bellman equation was introduced by the mathematician Richard Ernest Bellman in
1953, and hence it is called the Bellman equation. It is associated with dynamic
programming and is used to calculate the value of a decision problem at a certain point by
including the values of the states that follow it.
It is a way of calculating the value functions of dynamic programming problems, and it
leads to modern reinforcement learning.
The key elements used in the Bellman equation are:
o The action performed by the agent is referred to as "a".
o The state reached by performing the action is "s'", and the current state is "s".
o The reward/feedback obtained for each action is "R".
o The discount factor is gamma, "γ".
The Bellman equation can be written as:
V(s) = max [R(s,a) + γV(s')]
Where,
V(s) = the value of the current state s.
R(s,a) = the reward obtained at state s by performing action a.
γ = the discount factor.
V(s') = the value of the next state.
In the above equation, we take the maximum over the possible actions because the agent always tries to
find the optimal solution.
So now, using the Bellman equation, we will find the value of each state of the given environment.
We will start from the block which is next to the target block.
For 1st block:
V(s3) = max [R(s,a) + γV(s')], here V(s') = 0 because there is no further state to move to.
V(s3) = max[R(s,a)] => V(s3) = max[1] => V(s3) = 1.
For 2nd block:
V(s2) = max [R(s,a) + γV(s')], here γ = 0.9 (say), V(s') = 1, and R(s,a) = 0, because there is no
reward at this state.
V(s2) = max[0.9(1)] => V(s2) = max[0.9] => V(s2) = 0.9
For 3rd block:
V(s1) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.9, and R(s,a) = 0, because there is no
reward at this state either.
V(s1) = max[0.9(0.9)] => V(s1) = max[0.81] => V(s1) = 0.81
For 4th block:
V(s5) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.81, and R(s,a) = 0, because there is
no reward at this state either.
V(s5) = max[0.9(0.81)] => V(s5) = max[0.729] => V(s5) ≈ 0.73
For 5th block:
V(s9) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.73, and R(s,a) = 0, because there is
no reward at this state either.
V(s9) = max[0.9(0.73)] => V(s9) = max[0.657] => V(s9) ≈ 0.66
Consider the below image:

Now, we will move further to the 6th block, and here the agent may change its route because it
always tries to find the optimal path. So now, let's consider the block next to the fire pit.

Now, the agent has three options to move: if it moves to the blue box, it will feel a bump;
if it moves to the fire pit, it will get the -1 reward. But here we are taking only positive
rewards, so the agent will move upwards only. The values of the remaining blocks are
calculated using the same formula. Consider the below image:
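As a rough illustration of how these backed-up values could be computed programmatically, the short sketch below repeatedly applies the Bellman update to a hand-coded chain of states matching the walk-through above. The state names, transitions, and rewards are assumptions made only for this example.

# Minimal value-iteration sketch for the maze walk-through above.
GAMMA = 0.9

# transitions[state] = list of (next_state, reward) pairs reachable in one step
transitions = {
    "s9": [("s5", 0.0)],
    "s5": [("s1", 0.0)],
    "s1": [("s2", 0.0)],
    "s2": [("s3", 0.0)],
    "s3": [("goal", 1.0)],   # entering the diamond block yields +1
    "goal": [],              # terminal state
}

def value_iteration(transitions, gamma, sweeps=50):
    V = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s, moves in transitions.items():
            if moves:  # Bellman update: V(s) = max [R(s,a) + gamma * V(s')]
                V[s] = max(r + gamma * V[s_next] for s_next, r in moves)
    return V

V = value_iteration(transitions, GAMMA)
for s in ["s3", "s2", "s1", "s5", "s9"]:
    print(s, round(V[s], 2))   # 1.0, 0.9, 0.81, 0.73, 0.66, matching the text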

Types of Reinforcement learning


There are mainly two types of reinforcement learning, which are:
o Positive Reinforcement
o Negative Reinforcement
Positive Reinforcement:
Positive reinforcement means adding something to increase the tendency that the
expected behavior will occur again. It impacts the behavior of the agent positively and
increases the strength of that behavior.
This type of reinforcement can sustain changes for a long time, but too much positive
reinforcement may lead to an overload of states, which can diminish the results.
Negative Reinforcement:
Negative reinforcement is the opposite of positive reinforcement, as it increases
the tendency that a specific behavior will occur again by removing or avoiding a negative condition.
It can be more effective than positive reinforcement, depending on the situation and behavior,
but it provides only enough reinforcement to meet the minimum required behavior.
How to represent the agent state?
We can represent the agent state using a Markov state, which contains all the required
information from the history. A state St is a Markov state if it satisfies the following condition:
P[St+1 | St] = P[St+1 | S1, ..., St]
The Markov state follows the Markov property, which says that the future is independent of
the past given the present. Here RL is assumed to work in fully observable
environments, where the agent can observe the environment and act in the new state. The
complete process is known as the Markov Decision Process, which is explained below:

Markov Decision Process


A Markov Decision Process, or MDP, is used to formalize reinforcement learning
problems. If the environment is completely observable, then its dynamics can be modeled as
a Markov process. In an MDP, the agent constantly interacts with the environment and performs
actions; at each action, the environment responds and generates a new state.
MDPs are used to describe the environment for RL, and almost all RL problems can be
formalized using an MDP.
An MDP is a tuple of four elements (S, A, Pa, Ra):
o A finite set of states S
o A finite set of actions A
o Ra(s, s'): the reward received after transitioning from state s to state s' due to action a
o Pa(s, s'): the probability of transitioning from state s to state s' due to action a
An MDP uses the Markov property, and to better understand the MDP, we need to learn about it.
Markov Property:
It says that "if the agent is present in the current state s1, performs an action a1 and moves
to the state s2, then the state transition from s1 to s2 depends only on the current state;
future actions and states do not depend on past actions, rewards, or states."
In other words, as per the Markov property, the current state transition does not depend on any
past action or state. Hence, an MDP is an RL problem that satisfies the Markov property. For example,
in a game of Chess, the players only need to focus on the current board position and do not need to
remember past actions or states.
Finite MDP:
A finite MDP is one with finite states, finite rewards, and finite actions. In RL, we
consider only finite MDPs.
Markov Process:
A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that
satisfies the Markov property. A Markov process is also known as a Markov chain, which is a tuple (S,
P) of a state set S and a transition function P. These two components (S and P) define the
dynamics of the system.
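To make the (S, A, Pa, Ra) tuple concrete, here is a minimal sketch that encodes a tiny two-state MDP as plain Python dictionaries; the states, actions, probabilities, and rewards are invented purely for illustration.

# A tiny, hand-made MDP encoded as (S, A, P, R). All numbers are illustrative.
S = ["s1", "s2"]                 # finite set of states
A = ["stay", "move"]             # finite set of actions

# P[(s, a)] = list of (next_state, probability) pairs -> the transition model Pa
P = {
    ("s1", "stay"): [("s1", 1.0)],
    ("s1", "move"): [("s2", 0.8), ("s1", 0.2)],
    ("s2", "stay"): [("s2", 1.0)],
    ("s2", "move"): [("s1", 0.8), ("s2", 0.2)],
}

# R[(s, a, s')] = reward for that transition -> the reward model Ra
R = {
    ("s1", "move", "s2"): 1.0,   # reaching s2 is rewarded
}

def expected_reward(s, a):
    """Expected immediate reward of taking action a in state s."""
    return sum(p * R.get((s, a, s_next), 0.0) for s_next, p in P[(s, a)])

print(expected_reward("s1", "move"))   # 0.8 * 1.0 + 0.2 * 0.0 = 0.8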
Reinforcement Learning Algorithms
Reinforcement learning algorithms are mainly used in AI applications and gaming applications.
The main used algorithms are:
o Q-Learning:
o Q-learning is an off-policy RL algorithm used for temporal
difference learning. Temporal difference learning methods are a way of
comparing temporally successive predictions.
o It learns the value function Q(s, a), which tells how good it is to take action "a"
at a particular state "s".
o The below flowchart explains the working of Q-learning:
o State Action Reward State Action (SARSA):
o SARSA stands for State Action Reward State Action, and it is an on-policy temporal
difference learning method. An on-policy control method selects the action for each state
while learning, using a specific policy.
o The goal of SARSA is to calculate Qπ(s, a) for the currently selected policy π and
all (s, a) pairs.
o The main difference between the Q-learning and SARSA algorithms is that, unlike Q-
learning, SARSA does not use the maximum Q-value of the next state when updating the Q-
value in the table.
o In SARSA, the new action and reward are selected using the same policy that
determined the original action.
o SARSA is named after the quintuple Q(s, a, r, s', a'), where:
s: original state
a: original action
r: reward observed while following the states
s' and a': new state-action pair.
o Deep Q Neural Network (DQN):
o As the name suggests, DQN is Q-learning using neural networks.
o For an environment with a large state space, it is a challenging and complex task to define and
update a Q-table.
o To solve this issue, we can use the DQN algorithm, where, instead of a Q-table, a
neural network approximates the Q-values for each state-action pair.
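The sketch below is one possible minimal illustration of this idea in PyTorch. The network sizes, hyperparameters, and the randomly generated batch of transitions are assumptions made for the example; a full DQN would also use a replay buffer and a separate target network.

import torch
import torch.nn as nn

# A small network that maps a state vector to one Q-value per action.
state_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, n_actions),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# A fake batch of transitions (s, a, r, s', done) just to show the update step.
s = torch.randn(8, state_dim)
a = torch.randint(0, n_actions, (8, 1))
r = torch.randn(8)
s2 = torch.randn(8, state_dim)
done = torch.zeros(8)

# TD target: r + gamma * max_a' Q(s', a') for non-terminal transitions.
with torch.no_grad():
    target = r + gamma * (1 - done) * q_net(s2).max(dim=1).values

# Q(s, a) for the actions actually taken, then a squared TD-error loss.
q_sa = q_net(s).gather(1, a).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
print("one DQN-style update done, loss =", loss.item())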
Now, we will expand the Q-learning.
Q-Learning Explanation:
o Q-learning is a popular model-free reinforcement learning algorithm based on the
Bellman equation.
o The main objective of Q-learning is to learn a policy that informs the agent
what action should be taken, and under what circumstances, in order to maximize the
reward.
o It is an off-policy RL algorithm that attempts to find the best action to take in the current state.
o The goal of the agent in Q-learning is to maximize the value of Q.
o The value of Q-learning can be derived from the Bellman equation. Consider the
Bellman equation given below:

In the equation, we have various components, including the reward, the discount factor (γ), the transition
probability, and the end state s'. But no Q-value appears yet, so first consider the below image:

In the above image, we can see there is an agent which has three value options, V(s1), V(s2), and
V(s3). As this is an MDP, the agent only cares about the current state and the future state. The agent
can go in any direction (Up, Left, or Right), so it needs to decide where to go for the optimal
path. The agent moves on a probabilistic basis and changes state. But if we want
some exact moves, we need to make some changes in terms of the Q-value. Consider
the below image:

Q represents the quality of the actions at each state. So instead of using a value at each state,
we will use a pair of state and action, i.e., Q(s, a). The Q-value specifies which action is more
lucrative than the others, and according to the best Q-value, the agent takes its next move. The
Bellman equation can be used for deriving the Q-value.
To perform any action, the agent will get a reward R(s, a), and it will end up in a certain next
state s', so (in the same simplified form as the Bellman equation above) the Q-value equation will be:
Q(s, a) = R(s, a) + γV(s')

Hence, we can say that V(s) = max [Q(s, a)]

The above formula is used to estimate the Q-values in Q-learning.


What is 'Q' in Q-learning?
The Q stands for quality in Q-learning, which means it specifies the quality of an action taken
by the agent.
Q-table:
A Q-table or matrix is created while performing Q-learning. The table is indexed by state-
action pairs [s, a], and the values are initialized to zero. After each action, the table is updated,
and the Q-values are stored within it.
The RL agent uses this Q-table as a reference table to select the best action based on the Q-
values.
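Putting the Q-table and the update rule together, here is a minimal tabular Q-learning sketch on an invented one-dimensional corridor environment. The environment, learning rate, and episode count are assumptions for illustration; the loop implements the standard update Q(s,a) ← Q(s,a) + α[r + γ max Q(s',·) − Q(s,a)].

import random

# Toy corridor: states 0..4, action 0 = left, action 1 = right, goal at state 4.
N_STATES, ACTIONS = 5, [0, 1]
alpha, gamma, epsilon, episodes = 0.1, 0.9, 0.1, 500

# Q-table initialized to zero, indexed as Q[state][action].
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    """Invented dynamics: move left/right; reaching state 4 gives reward +1."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

for _ in range(episodes):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection from the Q-table (ties broken randomly)
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            best = max(Q[s])
            a = random.choice([x for x in ACTIONS if Q[s][x] == best])
        s2, r, done = step(s, a)
        # Q-learning (off-policy) update uses the max over next actions;
        # SARSA would instead use the Q-value of the action actually chosen next.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([round(max(row), 2) for row in Q])   # values grow towards the goal state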

Difference between Reinforcement Learning and Supervised Learning


Reinforcement learning and supervised learning are both parts of machine learning,
but the two types of learning are very different from each other. RL agents interact with the
environment, explore it, take actions, and get rewarded, whereas supervised learning algorithms
learn from a labeled dataset and, on the basis of that training, predict the output.
The difference table between RL and Supervised learning is given below:

Reinforcement Learning | Supervised Learning
RL works by interacting with the environment. | Supervised learning works on an existing dataset.
The RL algorithm works the way the human brain works when making decisions. | Supervised learning works the way a human learns things under the supervision of a guide.
No labeled dataset is present. | A labeled dataset is present.
No previous training is provided to the learning agent. | Training is provided to the algorithm so that it can predict the output.
RL helps to take decisions sequentially. | In supervised learning, a decision is made when the input is given.

Genetic Algorithm in Machine Learning

A genetic algorithm is an adaptive heuristic search algorithm inspired by "Darwin's theory
of evolution in Nature." It is used to solve optimization problems in machine learning. It is
one of the important algorithms, as it helps solve complex problems that would take a long time
to solve.

Genetic Algorithms are being widely used in different real-world applications, for
example, Designing electronic circuits, code-breaking, image processing, and artificial
creativity.

In this topic, we will explain Genetic algorithm in detail, including basic terminologies used in
Genetic algorithm, how it works, advantages and limitations of genetic algorithm, etc.

What is a Genetic Algorithm?

Before understanding the Genetic algorithm, let's first understand basic terminologies to better
understand this algorithm:

• Population: The population is the subset of all possible or probable solutions that can
solve the given problem.
• Chromosomes: A chromosome is one of the solutions in the population for the given
problem, and a collection of genes makes up a chromosome.
• Gene: A gene is an element of the chromosome; a chromosome is divided into genes.
• Allele: An allele is the value given to a gene within a particular chromosome.
• Fitness Function: The fitness function is used to determine an individual's fitness
level in the population, i.e., the ability of an individual to compete with other
individuals. In every iteration, individuals are evaluated based on their fitness function.
• Genetic Operators: In a genetic algorithm, the best individuals mate to produce
offspring better than the parents. Genetic operators play a role in changing the genetic
composition of the next generation.
• Selection
After calculating the fitness of every individual in the population, a selection process is used to
determine which of the individuals in the population will get to reproduce and produce the
offspring that will form the next generation.

Types of selection methods available:
o Roulette wheel selection
o Tournament selection
o Rank-based selection

So, now we can define a genetic algorithm as a heuristic search algorithm used to solve optimization
problems. It is a subset of evolutionary algorithms used in computing. A genetic
algorithm uses the concepts of genetics and natural selection to solve optimization problems.

How Does a Genetic Algorithm Work?

The genetic algorithm works on an evolutionary generational cycle to generate high-quality
solutions. It uses different operations that either enhance or replace the
population in order to produce an improved solution.

It basically involves five phases to solve complex optimization problems, which are given
below:

o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination

1. Initialization

The process of a genetic algorithm starts by generating a set of individuals, which is called the
population. Here each individual is a candidate solution for the given problem. An individual contains,
or is characterized by, a set of parameters called genes. Genes are combined into a string to
form a chromosome, which represents a solution to the problem. One of the most popular
techniques for initialization is the use of random binary strings.
2. Fitness Assignment

The fitness function is used to determine how fit an individual is, i.e., the ability of an
individual to compete with other individuals. In every iteration, individuals are evaluated based
on their fitness function. The fitness function provides a fitness score to each individual. This
score further determines the probability of being selected for reproduction: the higher the fitness
score, the greater the chances of being selected for reproduction.

3. Selection

The selection phase involves selecting individuals for the reproduction of offspring. All
the selected individuals are then arranged in pairs of two for reproduction. These
individuals then transfer their genes to the next generation.

There are three types of Selection methods available, which are:

o Roulette wheel selection


o Tournament selection
o Rank-based selection

4. Reproduction

After the selection process, the creation of a child occurs in the reproduction step. In this step,
the genetic algorithm uses two variation operators that are applied to the parent population. The
two operators involved in the reproduction phase are given below:

Crossover: Crossover plays the most significant role in the reproduction phase of the genetic
algorithm. In this process, a crossover point is selected at random within the genes. Then the
crossover operator swaps the genetic information of two parents from the current generation to
produce a new individual representing the offspring.

The genes of the parents are exchanged among themselves until the crossover point is reached. The
newly generated offspring are added to the population. This process is also known as recombination.
Types of crossover styles available:
o One-point crossover
o Two-point crossover
o Uniform crossover

Mutation
The mutation operator inserts random genes into the offspring (new child) to maintain
diversity in the population. It can be done by flipping some bits in the chromosome.
Mutation helps in solving the issue of premature convergence and enhances diversification.
The below image shows the mutation process:
Types of mutation styles available:

o Flip bit mutation


o Gaussian mutation
o Exchange/Swap mutation

5. Termination

After the reproduction phase, a stopping criterion is applied as the basis for termination. The
algorithm terminates once the threshold fitness solution is reached, and it reports the best
solution in the population as the final solution.

General Workflow of a Simple Genetic Algorithm
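As a concrete version of this general workflow, the short sketch below walks through one possible minimal implementation of the five phases above, maximizing the number of 1-bits in a binary string; the chromosome length, population size, and rates are arbitrary choices made for this illustration.

import random

CHROM_LEN, POP_SIZE, GENERATIONS = 20, 30, 40
CROSSOVER_RATE, MUTATION_RATE = 0.9, 0.02

def fitness(chrom):
    """Toy fitness: count of 1-bits (the 'OneMax' problem)."""
    return sum(chrom)

def tournament_select(pop, k=3):
    """Tournament selection: best of k randomly chosen individuals."""
    return max(random.sample(pop, k), key=fitness)

def crossover(p1, p2):
    """One-point crossover of two parent chromosomes."""
    if random.random() > CROSSOVER_RATE:
        return p1[:], p2[:]
    point = random.randint(1, CHROM_LEN - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chrom):
    """Flip-bit mutation applied gene by gene."""
    return [1 - g if random.random() < MUTATION_RATE else g for g in chrom]

# 1. Initialization: random binary strings
population = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    # 2-3. Fitness assignment and selection, 4. Reproduction (crossover + mutation)
    next_gen = []
    while len(next_gen) < POP_SIZE:
        c1, c2 = crossover(tournament_select(population), tournament_select(population))
        next_gen += [mutate(c1), mutate(c2)]
    population = next_gen[:POP_SIZE]

# 5. Termination: after a fixed number of generations, report the best individual
best = max(population, key=fitness)
print("best fitness:", fitness(best), "chromosome:", best)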


Advantages of Genetic Algorithm

o Genetic algorithms have strong parallel capabilities.


o It helps in optimizing various problems such as discrete functions, multi-objective
problems, and continuous functions.
o It provides a solution for a problem that improves over time.
o A genetic algorithm does not need derivative information.

Limitations of Genetic Algorithms

o Genetic algorithms are not efficient algorithms for solving simple problems.
o It does not guarantee the quality of the final solution to a problem.
o Repetitive calculation of fitness values may generate some computational challenges.

Difference between Genetic Algorithms and Traditional Algorithms

o A search space is the set of all possible solutions to the problem. A traditional
algorithm maintains only one set of solutions, whereas a genetic algorithm can use
several sets of solutions in the search space.
o Traditional algorithms need more information in order to perform a search, whereas
genetic algorithms need only an objective function to calculate the fitness of an
individual.
o Traditional algorithms cannot work in parallel, whereas genetic algorithms can work
in parallel (calculating the fitness of the individuals is independent).
o One big difference is that rather than operating directly on candidate
solutions, genetic algorithms operate on their representations (or encodings),
frequently referred to as chromosomes.
o In other words, unlike a traditional algorithm, a genetic algorithm does not operate
directly on candidate solutions.
o Traditional algorithms can only generate one result in the end, whereas genetic
algorithms can generate multiple optimal results from different generations.
o A traditional algorithm is not guaranteed to generate an optimal result; genetic
algorithms also do not guarantee a globally optimal result, but there is a good
chance of getting a near-optimal result for a problem because they use genetic operators such
as crossover and mutation.
o Traditional algorithms are deterministic in nature, whereas genetic algorithms are
probabilistic and stochastic in nature.
Some other Topics

Principal Component Analysis


Principal Component Analysis is an unsupervised learning algorithm that is used for
dimensionality reduction in machine learning. It is a statistical process that converts the
observations of correlated features into a set of linearly uncorrelated features with the help of an
orthogonal transformation. These new transformed features are called the Principal
Components. It is one of the popular tools used for exploratory data analysis and
predictive modeling. It is a technique for drawing out the strong patterns in a dataset by reducing
the dimensionality while keeping as much of the variance as possible.

PCA generally tries to find a lower-dimensional surface onto which to project the high-dimensional data.

PCA works by considering the variance of each attribute, because an attribute with high variance
tends to separate the data points well, and this guides how the dimensionality is reduced. Some real-world
applications of PCA are image processing, movie recommendation systems, and optimizing
power allocation in various communication channels. It is a feature extraction technique, so
it keeps the important variables and drops the least important ones.

The PCA algorithm is based on mathematical concepts such as:

• Variance and Covariance

• Eigenvalues and Eigenvectors

Some common terms used in PCA algorithm:

o Dimensionality: The number of features or variables present in the given dataset. More
simply, it is the number of columns present in the dataset.
o Correlation: It signifies how strongly two variables are related to each other, i.e.,
if one changes, the other variable also changes. The correlation value ranges from -1
to +1. Here, -1 occurs if the variables are inversely proportional to each other, and +1 indicates
that the variables are directly proportional to each other.
o Orthogonal: It means that the variables are not correlated to each other, and hence the
correlation between each pair of variables is zero.
o Eigenvectors: If there is a square matrix M and a non-zero vector v, then v is an
eigenvector of M if Mv is a scalar multiple of v.
o Covariance Matrix: A matrix containing the covariances between pairs of variables is
called a covariance matrix.
Principal Components in PCA

As described above, the transformed new features, or the output of PCA, are the Principal
Components. The number of these PCs is either equal to or less than the number of original features
present in the dataset. Some properties of these principal components are given below:

• Each principal component must be a linear combination of the original features.
• The components are orthogonal, i.e., the correlation between any pair of components is zero.
• The importance of each component decreases when going from 1 to n; the 1st PC
has the most importance, and the nth PC has the least importance.

Steps for PCA algorithm

1. Getting the dataset: Firstly, we need to take the input dataset and divide it into two
subparts X and Y, where X is the training set and Y is the validation set.
2. Representing data as a structure: Now we will represent our dataset as a structure,
namely a two-dimensional matrix of the independent variables X. Here each
row corresponds to a data item, and each column corresponds to a feature. The number of
columns is the dimensionality of the dataset.
3. Standardizing the data: In this step, we will standardize our dataset. In a
particular column, the features with high variance are more important compared to the features
with lower variance. If the importance of features should be independent of the variance of the feature,
then we divide each data item in a column by the standard deviation of the column. Here
we name the resulting matrix Z.
4. Calculating the covariance of Z: To calculate the covariance of Z, we take the
matrix Z and transpose it. After transposing, we multiply it by Z. The output matrix
is the covariance matrix of Z.
5. Calculating the eigenvalues and eigenvectors: Now we need to calculate the
eigenvalues and eigenvectors of the resulting covariance matrix. The eigenvectors of the
covariance matrix are the directions of the axes with the highest information (variance), and
the corresponding eigenvalues give the amount of variance along each of these directions.
6. Sorting the eigenvectors: In this step, we take all the eigenvalues and sort
them in decreasing order, i.e., from largest to smallest, and simultaneously sort the
eigenvectors accordingly into a matrix P. The resulting sorted matrix is named P*.
7. Calculating the new features or principal components: Here we calculate the
new features. To do this, we multiply the Z matrix by P*. In the resulting matrix Z*,
each observation is a linear combination of the original features. The columns of the Z* matrix
are independent of each other.
8. Removing less important features from the new dataset: The new feature set has been
obtained, so we decide here what to keep and what to remove. That is, we only keep
the relevant or important features in the new dataset, and the unimportant features are removed.
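The steps above can be sketched in a few lines of NumPy. This is a minimal illustration on random data (the array shapes and the choice of keeping two components are arbitrary for the example), not a replacement for a library implementation such as scikit-learn's PCA.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # steps 1-2: 100 samples, 5 features

# Step 3: standardize each column (zero mean, unit standard deviation)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 4: covariance matrix of the standardized data
cov = np.cov(Z, rowvar=False)

# Step 5: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: suitable for symmetric matrices

# Step 6: sort eigenvectors by decreasing eigenvalue
order = np.argsort(eigvals)[::-1]
P_star = eigvecs[:, order]

# Steps 7-8: project onto the top-k principal components and keep only those
k = 2
Z_star = Z @ P_star[:, :k]
print(Z_star.shape)                     # (100, 2): dataset reduced to 2 features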

Applications of Principal Component Analysis

• PCA is mainly used as a dimensionality reduction technique in various AI
applications such as computer vision, image compression, etc.
• It can also be used for finding hidden patterns when the data has high dimensionality. Some fields
where PCA is used are finance, data mining, psychology, etc.

Introduction to Ant colony optimization (ACO)


Optimization problems are very important in both scientific and industrial fields. Some
real-life examples of these optimization problems are timetable scheduling, nursing time
distribution scheduling, train scheduling, capacity planning, traveling salesman problems,
vehicle routing problems, the group-shop scheduling problem, portfolio optimization, etc. Many
optimization algorithms have been developed for this reason. Ant colony optimization is one of them.
Ant colony optimization is a probabilistic technique for finding optimal paths. In computer
science and research, the ant colony optimization algorithm is used for solving different
computational problems.
Ant colony optimization (ACO) was first introduced by Marco Dorigo in the 1990s in his Ph.D.
thesis. The algorithm is based on the foraging behavior of ants seeking a path
between their colony and a source of food. Initially, it was used to solve the well-known traveling
salesman problem. Later, it was applied to various hard optimization problems.
Ants are social insects. They live in colonies. The behavior of the ants is controlled by the goal
of searching for food. While searching, ants roam around their colony. An ant repeatedly
hops from one place to another to find food. While moving, it deposits an organic compound
called pheromone on the ground. Ants communicate with each other via pheromone trails. When
an ant finds some food, it carries as much as it can. While returning, it deposits
pheromone on the path based on the quantity and quality of the food. Ants can smell pheromone,
so other ants can smell it and follow that path. The higher the pheromone level, the higher the
probability of choosing that path; and the more ants follow a path, the more pheromone
accumulates on that path.
Let's see an example of this. Consider that there are two paths to reach the food from the colony.
At first, there is no pheromone on the ground, so the probability of choosing either path is
equal, i.e., 50%. Suppose two ants choose the two different paths to reach the food, as the
probability of choosing each path is fifty-fifty.
The distances of the two paths are different. The ant following the shorter path reaches the food
earlier than the other.

After finding food, each ant carries some food with it and returns to the colony. While tracking
the returning path, it deposits pheromone on the ground. The ant following the shorter path
reaches the colony earlier.

When a third ant wants to go out searching for food, it follows the path with the shorter
distance, based on the pheromone level on the ground. As the shorter path has more pheromone
than the longer one, the third ant follows the path with more pheromone.

By the time the ant following the longer path has returned to the colony, more ants have already
followed the path with the higher pheromone level. Then, when another ant tries to reach the
destination (food) from the colony, if it finds that both paths have the same pheromone level,
it randomly chooses one; suppose it chooses the upper one (in the picture located below).
Repeating this process again and again, after some time the shorter path has a higher pheromone
level than the others and a higher probability of being followed, and eventually all ants will
follow the shorter path.

For solving different problems with ACO, three different versions of the Ant
System have been proposed:
Ant Density & Ant Quantity: Pheromone is updated on each movement of an ant from one
location to another.
Ant Cycle: Pheromone is updated after all ants have completed their tours.
Let's see the pseudocode for applying the ant colony optimization algorithm. An artificial ant is
made for finding the optimal solution. In the first step of solving a problem, each ant generates
a solution. In the second step, the paths found by different ants are compared. And in the third step,
the path values, i.e., the pheromone, are updated.
procedure ACO_MetaHeuristic is
    while not_termination do
        generateSolutions()
        daemonActions()
        pheromoneUpdate()
    repeat
end procedure
There are many optimization problems where you can use ACO for finding the optimal solution.
Some of them are:
1. Capacitated vehicle routing problem
2. Stochastic vehicle routing problem (SVRP)
3. Vehicle routing problem with pick-up and delivery (VRPPD)
4. Group-shop scheduling problem (GSP)
5. Nursing time distribution scheduling problem
6. Permutation flow shop problem (PFSP)
7. Frequency assignment problem
8. Redundancy allocation problem
9. Traveling salesman problem(TSP)
Let's see the mathematical terms of ACO (typically for a TSP problem).
Pheromone update
τ(x,y) ← (1 − ρ) · τ(x,y) + Σk Δτk(x,y)
The left side of the equation indicates the amount of pheromone on the given edge (x, y);
ρ is the rate of pheromone evaporation;
and the last term on the right side indicates the amount of pheromone deposited, where
Δτk(x,y) = Q / Lk if ant k used edge (x, y) in its tour, and 0 otherwise.
Here L is the cost (length) of an ant's tour and Q is a constant.
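As a minimal sketch of the pheromone update just described (the city distances, the tours, and the parameter values are invented for this example), the function below first evaporates pheromone on every edge and then deposits Q/L along each ant's tour:

import itertools

RHO, Q = 0.5, 1.0   # evaporation rate and deposit constant (illustrative values)

def tour_length(tour, dist):
    """Total length of a closed tour over the distance matrix dist."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def update_pheromone(pheromone, tours, dist):
    """tau(x,y) <- (1 - rho) * tau(x,y) + sum over ants k of Q / L_k on used edges."""
    for edge in pheromone:                       # evaporation on every edge
        pheromone[edge] *= (1 - RHO)
    for tour in tours:                           # deposit along each ant's tour
        deposit = Q / tour_length(tour, dist)
        for a, b in zip(tour, tour[1:] + tour[:1]):
            pheromone[(a, b)] += deposit
            pheromone[(b, a)] += deposit         # symmetric TSP: both directions
    return pheromone

# Tiny 4-city example with made-up distances.
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
pheromone = {(i, j): 1.0 for i, j in itertools.permutations(range(4), 2)}
tours = [[0, 1, 3, 2], [0, 2, 3, 1]]             # tours found by two ants
update_pheromone(pheromone, tours, dist)
print(round(pheromone[(1, 3)], 3))               # edge (1, 3) was used by both ants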

Particle Swarm Optimization (PSO) Algorithm


Overview:

1. PSO is a stochastic optimization technique based on the movement and intelligence of
swarms.
2. In PSO, the concept of social interaction is used for solving a problem.
3. It uses a number of particles (agents) that constitute a swarm moving around in the
search space, looking for the best solution.
4. Each particle in the swarm keeps track of the positional coordinates in the solution space
that are associated with the best solution it has achieved so far. This is known
as pbest, or personal best.
5. Another best value, known as gbest or global best, is tracked by the PSO. This is the best
value obtained so far by any particle in the neighborhood of that particle.

Introduction to Particle Swarm Optimization

I am sure each one of us has at some point heard from our well-wishers, "Keep good
company; it helps you cultivate good qualities." When we speak about 'good company,' we
are talking about the sharing of good qualities among group members to achieve a better
common goal. That is the reason we always say 'work as a team.' The Particle Swarm Optimization
(PSO) algorithm is based on that idea. In 1995, Kennedy and Eberhart wrote a research paper based
on the social behavior of animal groups, in which they stated that sharing information among
the group increases the survival advantage. For example, a bird searching for food randomly can
optimize its search if it works with the flock. The advantage of working together is the mutual sharing
of the best information, which can help the flock discover the best place to hunt.
Group optimization and Ensemble Learning

Many of you have heard about the 'No Free Lunch' (NFL) theorem in machine learning. It says that no
single model works best for all possible situations. We can also say that all optimization
algorithms perform equally well when averaged across all potential problems. That last
statement is not self-explanatory with the example of a flock of birds. Why do
we need optimization in machine learning or deep learning? To train a model, we must define
a loss function to measure the difference between the model's predictions and the true targets.
Our objective is to minimize or optimize this loss function so that it is as close to 0 as possible.
Maybe you have heard of a term called 'Ensemble Learning.' If you have not, then let me explain.
'Ensemble' is a French word meaning 'assembly.' It refers to learning in a group or crowd: it is as if
you are trying to train a model with the help of multiple algorithms. So, what kind of benefit
do we get here? A single base learner is a weak learner, but when we combine all
these weak learners, they become a strong learner, because the combined model's
predictive power, accuracy, and precision are higher, and the error rate is lower. We call this type
of combined model 'meta-learning' in machine learning. It refers to learning algorithms that
can learn from other learning algorithms. It decreases variance, decreases bias, and improves
prediction. When you achieve that, that is your ultimate 'Nirvana' moment as a data
analyst.

The problem of optimization

Now let’s come back to our PSO model. The concept of swarm intelligence inspired the POS.
Here we are speaking about finding the optimal solution in a high-dimensional solution space.
It talks about Maximizing earns or minimizing losses. So, we are looking to maximize or
minimize a function to find the optimum solution. A function can have multiple local maximum
and minimum. But, there can be only one global maximum as well as a minimum. If your
function is very complex, then finding the global maximum can be a very daunting task. PSO
tries to capture the global maximum or minimum. Even though it cannot capture the exact
global maximum/minimum, it goes very close to it. It is the reason we called PSO a heuristic
model.

Let me give you an example of why finding the global maximum/minimum is problematic.
Check the below function:

y = f(x) = sin(x) + sin(x²) + sin(x)cos(x)
We can see that we have one global maximum and one global minimum. If we consider the
function on an interval of X-axis values from -4 to 6, we will have a maximum that is
not our global maximum; it is a local maximum. So we can say that finding the global
maximum may depend on the interval. It is as if we are trying to observe only a portion of a
continuous function. Also, note that when describing a dynamic system or entity, you
cannot have a static function, whereas the function I have defined here is fixed. Data analytics is
data-hungry: to train a model or to find a suitable mathematical function, you must have
enormous amounts of data, and it is impossible to have all the data. This means it is challenging to get
the exact global minimum or maximum. For me, that is a limitation of mathematics. Fortunately, we
have statistics, which advocates sampling; from a sample we can estimate a value such as the global
maximum or minimum of the original function. But again, you won't get the exact global
maximum or minimum; you will get values that are close to the actual global
maximum or minimum.

Also, when we describe a mathematical function based on a real-life scenario, we must
describe it with multiple variables, i.e., in a higher-dimensional vector space. The growth of bacteria
in a jar may depend upon temperature, humidity, the container, the solvent, etc. For this type
of function, it is even more challenging to get the exact global maximum and minimum. Check the
below function, and imagine how much more difficult it becomes to find the global
maximum and minimum as we add more variables.

z = f(x, y) = sin(x²) + sin(y²) + sin(x)sin(y)


The mathematical formulation of an Optimization Problem :

In an optimization problem, we have a variable represented by a vector X = [x1, x2, x3, ..., xn] that
minimizes or maximizes a cost function, depending on the proposed optimization formulation of
the function f(X). X is known as the position vector; it represents a variable model. It is an n-
dimensional vector, where n is the number of variables determined in the problem. We
can think of it as the latitude and the longitude in the problem of a flock of
birds choosing a point to land. The function f(X) is called the fitness function or objective function.
The job of f(X) is to assess how good or bad a position X is; that is, how good a certain landing point a bird
thinks it is after finding a suitable place. In this case, the evaluation is performed through
several survival criteria.

An Intuition of Particle Swarm Optimization

The movement is towards a promising area in order to reach the global optimum.

• Each particle adjusts its traveling velocity dynamically, according to its own flying
experience and that of its colleagues in the group.
• Each particle tries to keep track of:

1. Its best result so far, known as personal best or pbest.

2. The best value of any particle in the swarm, known as global best or gbest.

• Each particle modifies its position according to:

1. its current position
2. its current velocity
3. the distance between its current position and pbest
4. the distance between its current position and gbest.

Particle Swarm Optimization Algorithm

Let’s us assume a few parameters first. You will find some new parameters, which I will
describe later.

f: Objective function, Vi: Velocity of the particle or agent, A: Population of agents, W: Inertia
weight, C1: cognitive constant, U1, U2: random numbers, C2: social constant, Xi: Position of
the particle or agent, Pb: Personal Best, gb: global Best

The actual algorithm goes as below:

1. Create a 'population' of agents (particles) uniformly distributed over X.
2. Evaluate each particle's position according to the objective function (say the below function).
z = f(x, y) = sin(x²) + sin(y²) + sin(x)sin(y)
3. If a particle's present position is better than its previous best position, update it.
4. Find the best particle (according to the particles' previous best positions).
5. Update the particles' velocities.
6. Move the particles to their new positions.
7. Go to step 2 until the stopping criteria are satisfied.


Analysis of the Particle Swarm Optimization Algorithm

If W=1, the particle’s motion is entirely influenced by the previous motion, so the particle may
keep going in the same direction. On the other hand, if 0≤W<1, such influence is reduced,
which means that a particle instead goes to other regions in the search domain.
Pb1t And its current position Pit. It has been noticed that the idea behind this term is that as the
particle gets more distant from the Pb1t (Personal Best) position, the difference (Pb1t-Pit ) Must
increase; hence, this term increases, attracting the particle to its best own position. The
parameter C1 existing as a product is a positive constant, and it is an individual-cognition
parameter. It weighs the importance of the particle’s own previous experiences.
The other hyper-parameter which composes the product of the second term is U1t. It is a random
value parameter with [0,1] range. This random parameter plays an essential role in avoiding
premature convergences, increasing the most likely global optima.
The difference (gbt-Pit ) Works as an attraction for the particles towards the best point until it’s
found at t iteration. Likewise, C2 is also a social learning parameter, and it weighs the
importance of the global learning of the swarm. And U2t plays precisely the same role as U1t.
In the case of C1=C2=0, all particles continue flying at their current speed until they hit the
search space’s boundary.
In cases C1>0 and C2=0, all particles are independent.
In cases C1>0 and C2=0, all particles are attracted to a single point in the entire swarm.
In case C1=C2≠0, all particles are attracted towards the average of pbest and gbest.
Neighbourhood Topologies
A neighborhood must be defined for each particle. This neighborhood determines the extent of
social interaction within the swarm and influences a particular particle's movement. Less
interaction occurs when the neighborhoods in the swarm are small. For small neighborhoods,
the convergence will be slower, but it may improve the quality of the solutions. The convergence
will be faster for larger neighborhoods, but the risk is that convergence sometimes
occurs prematurely.

For the star topology, each particle is connected to every other particle. It leads to faster convergence
than other topologies and makes it easy to find gbest, but it can be biased towards the pbest.

For the wheel topology, only one focal particle connects to the others, and all information is
communicated through this particle. This focal particle compares the best performance of all
particles in the swarm and adjusts its position towards the best performing particle. Then the
new position of the focal particle is communicated to all the particles.

For the ring topology, when one particle finds the best result, it passes it to its immediate
neighbors, and these two immediate neighbors pass it to their immediate neighbors, until it
reaches the last particle. Here the best result found spreads very slowly.
Types of Particle Swarm Optimization

Contour plot

It is a graphical technique for representing a 3-dimensional surface in a 2-dimensional plot using
the variable z in the form of slices known as contours. I hope the below example gives you the
intuition.

Let's draw a graph of the circle z = x² + y² at fixed heights z = 1, 2, 3, etc.
To give you the intuition, let's plot the function below as a contour plot.

z = x² + y²; its actual plot and the contour plot will look like below:

Here we can see the function in the region of f(x, y). We can create ten particles at random
locations in this region, together with random velocities sampled from a normal
distribution with mean 0 and standard deviation 0.1, as follows:
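Below is one possible NumPy sketch of this setup. The [0, 5] x [0, 5] search region, the inertia weight W = 0.8, C1 = C2 = 0.1, and the iteration count are assumptions chosen for illustration, so the numbers it prints will differ from the outcome quoted next.

import numpy as np

def f(x, y):
    """Objective from the text: z = sin(x^2) + sin(y^2) + sin(x)*sin(y)."""
    return np.sin(x**2) + np.sin(y**2) + np.sin(x) * np.sin(y)

rng = np.random.default_rng(3)
n_particles = 10
W, C1, C2 = 0.8, 0.1, 0.1              # inertia, cognitive and social constants

# Ten particles at random locations in an assumed [0, 5] x [0, 5] region,
# with velocities sampled from a normal distribution (mean 0, std 0.1).
X = rng.uniform(0, 5, (2, n_particles))
V = rng.normal(0, 0.1, (2, n_particles))

pbest = X.copy()                       # personal best positions
pbest_val = f(X[0], X[1])              # personal best values
gbest = pbest[:, pbest_val.argmin()]   # global best position (minimization)

for _ in range(200):
    U1 = rng.random((2, n_particles))
    U2 = rng.random((2, n_particles))
    # Velocity and position updates in the standard PSO form
    V = W * V + C1 * U1 * (pbest - X) + C2 * U2 * (gbest.reshape(2, 1) - X)
    X = X + V
    vals = f(X[0], X[1])
    improved = vals < pbest_val        # update personal bests where improved
    pbest[:, improved] = X[:, improved]
    pbest_val[improved] = vals[improved]
    gbest = pbest[:, pbest_val.argmin()]

print("PSO best solution found at", gbest, "with value", f(gbest[0], gbest[1]))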
A typical outcome looks like this:

PSO found best solution at f([0.01415657 0.65909248])=0.4346033028251361

Global optimal at f ([0.0, 0.0])=0.0

Difference between PSO and Genetic Algorithm

Genetic Algorithms (GAs) and PSO are both used to optimize cost functions, they are both iterative,
and they both have a random element. They can be used on similar kinds of problems. The
difference between PSO and Genetic Algorithms (GAs) is that GAs do not traverse the
search space like birds flocking, covering the spaces in between. The operation of GAs is more
like a Monte Carlo approach, where the candidate solutions are randomized and the best solutions are
picked to compete with a new set of randomized solutions. Also, PSO algorithms require
normalization of the input vectors to reach faster "convergence" (as heuristic algorithms, both
don't truly converge). GAs can work with features that are continuous or discrete.

Also, in PSO there is no creation or deletion of individuals. Individuals merely move on a
landscape where their fitness is measured over time. This is like a flock of birds or other
creatures that communicate.

Advantages and disadvantages of Particle Swarm Optimization


Advantages:
1. Insensitive to scaling of design variables.
2. Easily parallelized for concurrent processing.
3. Derivative free.
4. Very few algorithm parameters.
5. A very efficient global search algorithm.
Disadvantages:
1. PSO’s optimum local searchability is weak
Conclusion
The most exciting part of PSO is that there is a stable topology in which particles are able to
communicate with each other and increase the learning rate to achieve the global optimum. The
metaheuristic nature of this optimization algorithm gives us lots of opportunities, as it optimizes
a problem by iteratively trying to improve a candidate solution. Its applicability will increase
further with ongoing research work in Ensemble Learning.
