REINFORCEMENT LEARNING
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make
decisions by interacting with an environment in order to maximize some notion of cumulative
reward. Unlike supervised learning, where a model learns from labeled data, reinforcement
learning involves learning from trial and error, with the agent taking actions in an environment
and receiving feedback (rewards or punishments) based on those actions.
RL Process:
➔ Initialization: The agent starts in an initial state (often chosen randomly).
➔ Action Selection: Based on the current state, the agent selects an action using its
policy.
➔ State Transition: The action taken causes a change in the environment, which
transitions the agent to a new state.
➔ Reward Feedback: After each action, the agent receives a reward or penalty from the
environment.
➔ Learning and Update: The agent updates its knowledge (policy, value function, etc.)
based on the reward received, to improve future decisions.
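A minimal sketch of this loop, using tabular Q-learning in Python. The environment object here is a placeholder assumption: it is assumed to expose reset() returning the initial state and step(action) returning (next_state, reward, done), over small discrete state and action spaces.

import random

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q[s][a] holds the current estimate of the value of action a in state s.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = env.reset()                                # Initialization
        done = False
        while not done:
            # Action Selection: epsilon-greedy over the current Q estimates.
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])
            # State Transition and Reward Feedback from the environment.
            next_state, reward, done = env.step(action)
            # Learning and Update: move Q(s, a) toward the bootstrapped target.
            best_next = max(Q[next_state])
            Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
            state = next_state
    return Q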
Types of Reinforcement Learning:
Model-Free vs. Model-Based RL:
● Model-Free: The agent doesn't build a model of the environment; it directly learns from
experiences (e.g., Q-learning, SARSA).
● Model-Based: The agent tries to learn a model of the environment (transition dynamics
and reward function) and uses this model to plan future actions (e.g., Dyna-Q).
On-Policy vs. Off-Policy RL:
● On-Policy: The agent learns the value of the policy that it is currently following (e.g.,
SARSA).
● Off-Policy: The agent learns from actions that were generated by a different policy than
the one it is currently learning (e.g., Q-learning).
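The on-policy/off-policy distinction is easiest to see in the update rules themselves. Below is a sketch of the two textbook updates, assuming Q is a table indexed by state and then action; the only difference between them is the bootstrap target.

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy (Q-learning): bootstrap from the best next action,
    # regardless of which action the behavior policy will actually take.
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy (SARSA): bootstrap from a_next, the action actually chosen
    # by the policy currently being followed.
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])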
Value-Based vs. Policy-Based vs. Actor-Critic:
● Value-Based: The agent learns a value function (e.g., Q-learning, Deep Q-Networks or DQN) to make decisions.
● Policy-Based: The agent learns a policy directly, without using a value function (e.g., the REINFORCE algorithm).
● Actor-Critic: Combines both value-based and policy-based methods, where the actor chooses actions and the critic evaluates them (e.g., A3C, PPO).
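To make the policy-based idea concrete, here is a sketch of one REINFORCE update for a tabular softmax policy. This is a simplified, illustrative setting (deep RL implementations parameterize the policy with a neural network instead); episode is assumed to be a list of (state, action, reward) tuples from one completed rollout.

import math

def softmax(prefs):
    # Numerically stable softmax over a list of action preferences.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(theta, episode, lr=0.01, gamma=0.99):
    # theta[s][a] are policy parameters; pi(a|s) = softmax(theta[s]).
    G = 0.0
    # Walk the episode backwards, accumulating the discounted return G_t.
    for s, a, r in reversed(episode):
        G = r + gamma * G
        probs = softmax(theta[s])
        for a2 in range(len(theta[s])):
            # Gradient of log pi(a|s) w.r.t. theta[s][a2] is
            # (1 if a2 == a else 0) - pi(a2|s).
            grad = (1.0 if a2 == a else 0.0) - probs[a2]
            theta[s][a2] += lr * G * grad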
Axioms of probability
● The axioms of probability are the foundational rules that form the basis for probability
theory.
● They were first formalized by Andrey Kolmogorov in the 1930s, and they provide a
rigorous mathematical framework for understanding random events and their likelihood.
● The axioms define how probabilities behave in a consistent and logically sound way.
● These axioms, along with derived properties, are the building blocks for all further
concepts and theorems in probability theory, including conditional probability,
independence, expectation, and the various probability distributions used in statistics
and machine learning.
● There are three basic axioms of probability, illustrated in the sketch below:
1. Non-negativity: P(A) ≥ 0 for every event A.
2. Normalization: P(S) = 1, where S is the entire sample space.
3. Additivity: if A and B are mutually exclusive (disjoint) events, then P(A ∪ B) = P(A) + P(B).
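These axioms can be checked directly for a simple discrete distribution. A minimal sketch for a fair six-sided die (the event sets A and B are arbitrary illustrative choices):

# Fair six-sided die: sample space S = {1, ..., 6}, each outcome has probability 1/6.
P = {outcome: 1 / 6 for outcome in range(1, 7)}

# 1. Non-negativity: P(A) >= 0 for every event.
assert all(p >= 0 for p in P.values())

# 2. Normalization: P(S) = 1 for the whole sample space.
assert abs(sum(P.values()) - 1.0) < 1e-12

# 3. Additivity: for disjoint (mutually exclusive) events, P(A U B) = P(A) + P(B).
A, B = {1, 2}, {5, 6}
p_union = sum(P[x] for x in A | B)
assert abs(p_union - (sum(P[x] for x in A) + sum(P[x] for x in B))) < 1e-12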
Random Variables
A random variable is a numerical representation of the outcome of a random experiment. It's a
function that maps the sample space (all possible outcomes) to a set of real numbers.
Key Concepts:
Probability Distribution: Describes the likelihood of different outcomes.
● For discrete random variables, it's a PMF.
● For continuous random variables, it's a PDF.
Cumulative Distribution Function (CDF): Gives the probability that a random variable is less
than or equal to a certain value.
[Figure: CDF graph showing cumulative probability as a function of the value of the random variable]
Expected Value (E[X]): The average value of a random variable over many trials.
Variance (Var[X]): Measures the spread or dispersion of a random variable.
Standard Deviation (SD[X]): The square root of the variance.
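All of these quantities follow directly from the PMF. A minimal sketch for the roll of a fair die (an illustrative choice of distribution):

# PMF of a fair die roll: P(X = x) = 1/6 for x in {1, ..., 6}.
pmf = {x: 1 / 6 for x in range(1, 7)}

# Expected value: E[X] = sum over x of x * P(X = x).
expected = sum(x * p for x, p in pmf.items())

# Variance: Var[X] = E[(X - E[X])^2].
variance = sum((x - expected) ** 2 * p for x, p in pmf.items())

# Standard deviation: SD[X] = sqrt(Var[X]).
std_dev = variance ** 0.5

# CDF: F(x) = P(X <= x).
def cdf(x):
    return sum(p for v, p in pmf.items() if v <= x)

print(expected, variance, std_dev, cdf(3))   # 3.5, ~2.917, ~1.708, 0.5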
Applications:
➔ Statistics: Analyzing multivariate data and modeling complex relationships between
variables.
➔ Machine Learning: Training models on multivariate data, such as image recognition or
natural language processing.
➔ Finance: Modeling the joint distribution of stock prices or interest rates.
➔ Physics: Describing the joint distribution of particle positions and momenta.
Understanding joint and multiple random variables is crucial for many fields that involve uncertainty and variability. By analyzing these distributions, we can gain insights into the relationships between different variables and make informed decisions.
Joint Probability
Key Concepts:
Joint Probability Notation:
● P(A ∩ B) represents the probability of both A and B occurring.
Visual Representation:
● Venn Diagrams: Often used to visualize the intersection of events.
● Joint Probability Tables: Tabular representation of probabilities for different
combinations of events.
Calculating Joint Probability:
➔ Direct Calculation:
If you have the raw data, you can estimate the joint probability directly as the proportion of outcomes in which both events occur: P(A ∩ B) ≈ (number of outcomes where both A and B occur) / (total number of outcomes).
➔ Using Conditional Probability:
If you know the conditional probability of one event given the other, you can use the
formula: P(A ∩ B) = P(A|B) * P(B) = P(B|A) * P(A).
➔ Independence and Joint Probability:
Independent Events: If two events are independent, the occurrence of one does not
affect the probability of the other. In this case: P(A ∩ B) = P(A) * P(B).
➔ Real-world Example:
Suppose you're flipping two coins. Let A be the event of getting a head on the first coin, and B be the event of getting a head on the second coin. If the coins are fair and independent, the joint probability of getting heads on both coins is: P(A ∩ B) = P(A) × P(B) = 0.5 × 0.5 = 0.25.
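This result is easy to verify by simulation; a quick sketch estimating P(A ∩ B) empirically and comparing it with P(A) × P(B) = 0.25:

import random

trials = 100_000
# Count trials in which both (simulated) fair coins come up heads.
both_heads = sum(
    1 for _ in range(trials)
    if random.random() < 0.5 and random.random() < 0.5
)
print(both_heads / trials)   # close to 0.25 for independent fair coins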
Correlation
➔ Correlation measures the strength and direction of the linear relationship between two random variables: it quantifies how one variable tends to change as the other changes.
➔ It is most commonly quantified by the Pearson correlation coefficient, which ranges from -1 (perfect negative linear correlation) to +1 (perfect positive linear correlation). A coefficient of 0 indicates no linear relationship.
Key Properties of Correlation:
● Positive Correlation: If X and Y have a positive correlation, when X increases, Y also
tends to increase, and vice versa. The correlation coefficient is positive.
● Negative Correlation: If X and Y have a negative correlation, when X increases, Y
tends to decrease, and vice versa. The correlation coefficient is negative.
● No Correlation: If X and Y are uncorrelated, the correlation coefficient is 0: changes in one variable do not predict any linear change in the other (although a non-linear relationship may still exist).
Spearman's Rank Correlation:
If the relationship between the variables is not linear, but monotonic (i.e., one variable increases
as the other does, but not necessarily at a constant rate), you might use Spearman's rank
correlation coefficient. This is based on the ranks of the data, not the actual values, and it can
be used to measure non-linear but monotonic relationships.
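A short sketch contrasting the two coefficients, assuming numpy and scipy are available. The relationship y = x**3 is an illustrative choice: it is monotonic but not linear, so Spearman's coefficient is exactly 1 while Pearson's falls below 1.

import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 11)
y = x ** 3                         # monotonic but non-linear in x

pearson_r, _ = pearsonr(x, y)      # less than 1: relationship is not linear
spearman_r, _ = spearmanr(x, y)    # exactly 1: relationship is monotonic
print(pearson_r, spearman_r)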
Independence
Independence describes a situation where two random variables have no influence on each
other. If two random variables are independent, the value of one variable provides no
information about the value of the other.
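Formally, discrete random variables X and Y are independent if P(X = x, Y = y) = P(X = x) × P(Y = y) for every pair (x, y). A minimal sketch checking this condition against a joint probability table (the table here is a hypothetical example: two fair, independent coin flips, with 0 = tails and 1 = heads):

# Joint PMF as a table keyed by (x, y).
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

# Marginal distributions obtained by summing out the other variable.
px = {x: sum(p for (x2, _), p in joint.items() if x2 == x) for x in (0, 1)}
py = {y: sum(p for (_, y2), p in joint.items() if y2 == y) for y in (0, 1)}

# Independence holds iff the joint factorizes into the marginals everywhere.
independent = all(
    abs(joint[(x, y)] - px[x] * py[y]) < 1e-12
    for (x, y) in joint
)
print(independent)   # True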