Ky - Unit 3
ChatGPT said:
Sure! Here's a detailed explanation of the concepts and methods mentioned in the exam questions:
3. Acceptance-Rejection Method
The acceptance-rejection method is used to generate random variables from a distribution that is
difficult to sample from directly. It involves sampling from an easier distribution and rejecting some
of the samples based on a criterion.
Steps:
1. Choose a proposal distribution $g(x)$, which is easy to sample from.
2. Find a constant $M$ such that for all $x$, $f(x) \leq M \cdot g(x)$,
where $f(x)$ is the target distribution.
3. Sample $x^*$ from $g(x)$ and generate a uniform random number $U$ from [0, 1].
4. If $U \leq \frac{f(x^*)}{M \cdot g(x^*)}$, accept $x^*$; otherwise, reject it.
5. Repeat steps 3 and 4 until a valid sample is accepted.
Example: Suppose you want to sample from a distribution $f(x)$ and choose a proposal
distribution $g(x)$ that is easy to sample from, like a uniform or normal distribution.
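The steps above can be sketched in Python. This is a minimal illustration, not a general-purpose sampler: the target $f(x) = 6x(1-x)$ on [0, 1] (a Beta(2, 2) density), the Uniform(0, 1) proposal, and the bound $M = 1.5$ (the maximum of $f$) are all assumptions chosen for the example, and the names `target_pdf` and `sample_accept_reject` are hypothetical.

```python
import random


def target_pdf(x):
    """Assumed target density f(x) = 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)


def sample_accept_reject(rng, m=1.5):
    """Draw one sample from target_pdf using a Uniform(0, 1) proposal (g(x) = 1)."""
    proposal_pdf = 1.0
    while True:
        x_star = rng.random()  # candidate from the proposal g
        u = rng.random()       # uniform number used for the accept test
        if u <= target_pdf(x_star) / (m * proposal_pdf):
            return x_star      # accepted; otherwise loop and try again


rng = random.Random(0)
samples = [sample_accept_reject(rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

With these choices about one candidate in $M = 1.5$ is accepted on average, and the sample mean should sit close to 0.5, the mean of the assumed target.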
7. Importance Sampling
Importance Sampling is a variance reduction technique where you sample from a distribution that
is more concentrated in the areas that contribute most to the integral or expectation, and weight the
samples accordingly.
The formula is:
$$E[f(X)] = \int f(x)\, p(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)\, p(x_i)}{q(x_i)}$$
Where:
• $p(x)$ is the target distribution,
• $q(x)$ is the proposal distribution, and
• $x_i$ are samples from $q(x)$.
8. Stratified Sampling
Stratified Sampling involves dividing the sample space into strata (subintervals or subsets) and
sampling from each stratum. This reduces variance by ensuring that all parts of the sample space are
represented proportionally.
Steps:
1. Divide the sample space into $N$ strata.
2. Sample from each stratum proportionally.
3. Combine the results to estimate the desired quantity.
By ensuring each stratum is sampled, stratified sampling can achieve lower variance than simple
random sampling.
9. Markov Chain
A Markov chain is a sequence of random variables where the future state depends only on the
current state, not on the history of previous states. Its key properties are:
• Memorylessness: The probability of transitioning to the next state depends only on the
current state.
• Stationary Distribution: Over time, the chain may converge to a stable distribution.
• Transition Matrix: The probabilities of moving from one state to another are often
represented in a matrix.
You said:
In more detail, and explain by taking examples.
ChatGPT said:
Sure! Let’s go deeper into each concept with detailed explanations and examples.
Steps:
1. Generate a uniform random variable $U$ on [0, 1].
2. Find the inverse of the CDF: Solve for $X$ in the equation $F(X) = U$.
3. Return the value of $X$.
3. Acceptance-Rejection Method
The acceptance-rejection method is a technique used when it’s difficult to sample from a
distribution directly, but easier to sample from a proposal distribution. It generates random variables
from the target distribution by rejecting certain samples based on a probability criterion.
Steps:
1. Choose a proposal distribution $g(x)$ and a constant $M$ such that
$f(x) \leq M \cdot g(x)$ for all $x$, where $f(x)$ is the target distribution.
2. Sample $x^*$ from $g(x)$.
3. Generate a uniform random variable $U$ on [0, 1].
4. Accept $x^*$ if $U \leq \frac{f(x^*)}{M \cdot g(x^*)}$; otherwise, reject it.
5. Repeat the above steps until a sample is accepted.
Steps:
1. Generate two independent uniform random variables $U_1, U_2 \sim \text{Uniform}(0, 1)$.
2. Compute the two standard normal variables using the formulas:
$$Z_0 = \sqrt{-2 \ln U_1} \cos(2\pi U_2)$$
$$Z_1 = \sqrt{-2 \ln U_1} \sin(2\pi U_2)$$
These two variables $Z_0$ and $Z_1$ are independent and follow a standard normal
distribution.
Example:
Let’s generate two standard normal variables using the Box-Muller method.
1. Suppose $U_1 = 0.7$ and $U_2 = 0.3$.
2. Compute $Z_0$ and $Z_1$, using $\sqrt{-2 \ln 0.7} \approx 0.8446$:
$$Z_0 = \sqrt{-2 \ln 0.7} \cos(2\pi \cdot 0.3) \approx 0.8446 \times (-0.3090) \approx -0.2610$$
$$Z_1 = \sqrt{-2 \ln 0.7} \sin(2\pi \cdot 0.3) \approx 0.8446 \times 0.9511 \approx 0.8033$$
Now, $Z_0$ and $Z_1$ are independent standard normal random variables.
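The Box-Muller formulas can be checked with a short Python sketch. The function names `box_muller` and `standard_normal_pair` are hypothetical, and shifting the first uniform to (0, 1] is an added safeguard so that the logarithm is always defined.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniform numbers to two independent standard normal values."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def standard_normal_pair(rng):
    """Draw the two uniforms and apply the transform."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    return box_muller(u1, u2)


z0, z1 = box_muller(0.7, 0.3)  # same inputs as the worked example
rng = random.Random(0)
draws = [z for _ in range(20000) for z in standard_normal_pair(rng)]
```

Over many draws the sample mean should be near 0 and the sample variance near 1.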
Example:
Suppose we want to estimate the expectation of a function $f(X)$ where $X$ is a random
variable. If we generate a sample $X_1$ from a uniform distribution on [0, 1], we also generate an
"antithetic" sample $1 - X_1$. The average of the two function values,
$\frac{f(X_1) + f(1 - X_1)}{2}$, will have less variance than the average of two independent
evaluations when $f$ is monotone, because $f(X_1)$ and $f(1 - X_1)$ are then negatively correlated.
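A small Python sketch of antithetic sampling follows. The integrand $f(u) = e^u$ with $U \sim U(0, 1)$ is an assumption chosen because it is monotone and its exact expectation, $e - 1$, is known; `plain_estimate` and `antithetic_estimate` are hypothetical names.

```python
import math
import random


def f(u):
    """Assumed monotone integrand; E[f(U)] = e - 1 for U ~ U(0, 1)."""
    return math.exp(u)


def plain_estimate(rng, n_pairs):
    """Ordinary Monte Carlo using 2 * n_pairs independent uniforms."""
    n = 2 * n_pairs
    return sum(f(rng.random()) for _ in range(n)) / n


def antithetic_estimate(rng, n_pairs):
    """Average f over antithetic pairs (u, 1 - u); same number of f evaluations."""
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        total += 0.5 * (f(u) + f(1.0 - u))
    return total / n_pairs


plain = plain_estimate(random.Random(1), 5000)
antithetic = antithetic_estimate(random.Random(1), 5000)
```

Both estimators use the same number of function evaluations. For this integrand the antithetic version is typically much closer to $e - 1 \approx 1.7183$.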
Control Variates:
In this method, we use an auxiliary random variable with a known expectation to reduce variance. If
$Y$ is a random variable correlated with the quantity of interest and its expectation $E[Y]$ is
known, we use it to adjust the estimator.
Example:
Suppose we are estimating $E[X]$, and we know the expectation of $Y$. We adjust the
estimate as:
$$E[X] \approx \frac{1}{N} \sum_{i=1}^{N} \left[ X_i - \alpha (Y_i - E[Y]) \right]$$
Where $\alpha$ is a constant chosen to minimize the variance of the estimator; the minimizing
value is $\alpha = \text{Cov}(X, Y) / \text{Var}(Y)$.
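The adjustment can be sketched in Python as follows. The setup is an assumption made for illustration: the quantity of interest is $X = e^U$ with $U \sim U(0, 1)$, the control variate is $Y = U$ with known $E[Y] = 0.5$, and $\alpha$ is estimated from the same sample as $\text{Cov}(X, Y) / \text{Var}(Y)$. The function name `control_variate_estimate` is hypothetical.

```python
import math
import random


def control_variate_estimate(rng, n):
    """Estimate E[exp(U)] using Y = U (known mean 0.5) as a control variate."""
    xs, ys = [], []
    for _ in range(n):
        u = rng.random()
        xs.append(math.exp(u))  # quantity of interest X_i
        ys.append(u)            # control variate Y_i
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    alpha = cov_xy / var_y      # sample estimate of the variance-minimizing alpha
    expected_y = 0.5            # known expectation of the control variate
    adjusted = sum(x - alpha * (y - expected_y) for x, y in zip(xs, ys)) / n
    return adjusted, alpha


estimate, alpha = control_variate_estimate(random.Random(0), 10000)
```

The adjusted estimate should land close to $e - 1 \approx 1.7183$. Estimating $\alpha$ from the same sample introduces a small bias that shrinks as $N$ grows.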
7. Importance Sampling
Importance Sampling is a method used to estimate the expected value of a function by sampling
from a distribution that is more concentrated in the areas of interest.
Formula:
$$E[f(X)] = \int f(x)\, p(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)\, p(x_i)}{q(x_i)}$$
Where:
• $p(x)$ is the target distribution.
• $q(x)$ is the proposal distribution, from which the samples $x_i$ are drawn.
Example:
To estimate $E[X^2]$ where $X \sim \text{Exponential}(1)$, we can sample from the proposal
distribution $q(x) = 0.5 \exp(-0.5x)$ and weight each sample by $p(x)/q(x)$, as in the formula.
The exact value is $E[X^2] = 2$.
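This example can be written out as a short Python sketch. It assumes the target density $p(x) = e^{-x}$ and the proposal $q(x) = 0.5 e^{-0.5x}$ from the example; the function name is hypothetical. `random.expovariate` takes the rate parameter, so a rate of 0.5 draws from $q$.

```python
import math
import random


def importance_sampling_second_moment(rng, n):
    """Estimate E[X^2] for X ~ Exponential(1) using samples from Exponential(rate 0.5)."""
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(0.5)                # sample from the proposal q
        p = math.exp(-x)                        # target density p(x)
        q = 0.5 * math.exp(-0.5 * x)            # proposal density q(x)
        total += (x * x) * (p / q)              # f(x) times the importance weight
    return total / n


estimate = importance_sampling_second_moment(random.Random(0), 100000)
```

The estimate should be close to 2, the exact second moment of an Exponential(1) variable.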
8. Stratified Sampling
Stratified Sampling divides the population into subgroups or "strata" and then samples from each
stratum. This approach helps ensure that every part of the distribution is well-represented, reducing
variance.
Example:
Suppose we want to estimate the mean of a function $f(x)$ over $[0, 1]$, and we divide
the interval into 3 equal strata. We then sample from each subinterval to obtain more accurate
estimates.
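A minimal Python sketch of this example is shown below. The integrand $f(x) = x^2$ (exact mean 1/3) and the equal number of samples per stratum are assumptions; `stratified_mean` is a hypothetical name.

```python
import random


def stratified_mean(f, rng, n_strata=3, n_per_stratum=1000):
    """Estimate the mean of f over [0, 1] using equal-width strata."""
    width = 1.0 / n_strata
    estimate = 0.0
    for k in range(n_strata):
        lower = k * width
        # Sample uniformly inside stratum k and average f there.
        stratum_mean = sum(
            f(lower + width * rng.random()) for _ in range(n_per_stratum)
        ) / n_per_stratum
        estimate += width * stratum_mean  # weight by the stratum's probability
    return estimate


estimate = stratified_mean(lambda x: x * x, random.Random(0))
```

Each stratum contributes its own average weighted by its width, so no subinterval can be missed by chance.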
9. Markov Chains
A Markov Chain is a sequence of random variables where the future state depends only on the
current state, not on the history.
Example:
Suppose we have a weather model where the weather tomorrow depends only on today’s weather:
• If it’s sunny today, there’s a 70% chance of sunny tomorrow, 30% chance of rain.
• If it’s rainy today, there’s an 80% chance of rain tomorrow, 20% chance of sunny.
This is a Markov Chain.
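The weather model can be simulated with a few lines of Python. The state labels and the names `TRANSITIONS`, `next_state` and `simulate` are hypothetical; the probabilities are the ones given above.

```python
import random

# Row = today's weather, inner dict = probabilities for tomorrow's weather.
TRANSITIONS = {
    "sunny": {"sunny": 0.7, "rainy": 0.3},
    "rainy": {"sunny": 0.2, "rainy": 0.8},
}


def next_state(state, rng):
    """Tomorrow's weather depends only on today's state."""
    return "sunny" if rng.random() < TRANSITIONS[state]["sunny"] else "rainy"


def simulate(start, n_steps, rng):
    state = start
    path = [state]
    for _ in range(n_steps):
        state = next_state(state, rng)
        path.append(state)
    return path


path = simulate("sunny", 50000, random.Random(0))
sunny_fraction = path.count("sunny") / len(path)
```

Solving $\pi = \pi P$ for this chain gives a long-run probability of 0.4 for sunny and 0.6 for rainy, so `sunny_fraction` should be close to 0.4 for a long simulation.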
Example:
Suppose you want to sample from a target distribution $p(x)$. Start with an initial value,
propose a new value, and accept or reject it based on the acceptance ratio.
Example:
Suppose we have a bivariate normal distribution. We sample each variable conditioned on the other
variable.
9. Markov Chains
A Markov Chain is a sequence of random variables where the future state depends only on the
current state and not on the sequence of events that preceded it. This property is called the Markov
property or memoryless property.
Key Characteristics:
• State space: The set of all possible states the system can be in.
• Transition probabilities: The probabilities of moving from one state to another.
• Stationary distribution: If the Markov Chain reaches a steady state, the distribution of
states stops changing.
A Markov Chain can be represented by a transition matrix $P$, where the entry $P(i,j)$ is the
probability of transitioning from state $i$ to state $j$.
Key Idea:
• Start with an initial guess or state $x_0$.
• Propose a new state $x'$ from a proposal distribution $q(x' \mid x)$, where $q(x' \mid x)$ is
a distribution that proposes new states based on the current state $x$.
• Calculate the acceptance ratio:
$$A(x, x') = \min\left(1, \frac{p(x')\, q(x \mid x')}{p(x)\, q(x' \mid x)}\right)$$
This ratio compares the target distribution at $x'$ with the target distribution at $x$,
adjusting for the proposal distribution.
Acceptance Rule:
• Accept the new state $x'$ with probability $A(x, x')$. If accepted, $x_{n+1} = x'$.
• Reject the new state with probability $1 - A(x, x')$. If rejected, keep the current
state: $x_{n+1} = x_n$.
Example: Sampling from a Normal Distribution
Suppose we want to sample from a normal distribution with mean $\mu = 0$ and variance
$\sigma^2 = 1$. Let’s say we start with $x_0 = 2$.
1. Propose a new value $x'$ from a normal proposal distribution, say $x' \sim N(x_n, 1)$,
meaning $x'$ is sampled around the current value $x_n$.
2. Calculate the acceptance ratio:
$$A(x, x') = \min\left(1, \frac{p(x')}{p(x)}\right)$$
Here, $p(x)$ and $p(x')$ are the probability densities of the target
distribution (a normal distribution in this case). The proposal terms cancel because this
proposal is symmetric: $q(x' \mid x) = q(x \mid x')$.
3. Accept $x'$ with probability $A(x, x')$; otherwise, keep the current value $x_n$. Repeat this
process iteratively.
Over time, the chain of values generated will converge to the target distribution.
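The example can be sketched in Python as a random-walk Metropolis sampler. It assumes the standard normal target, the $N(x_n, 1)$ proposal and the start $x_0 = 2$ from the example; working with the log density and discarding the first 1000 values as burn-in are choices made here, and the names are hypothetical.

```python
import math
import random


def log_target(x):
    """Log density of N(0, 1), up to an additive constant."""
    return -0.5 * x * x


def metropolis(n_steps, x0=2.0, proposal_sd=1.0, seed=0):
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_steps):
        x_new = rng.gauss(x, proposal_sd)            # symmetric proposal around x
        log_ratio = log_target(x_new) - log_target(x)
        # Accept with probability min(1, p(x_new) / p(x)).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = x_new
        chain.append(x)                              # a rejected step repeats x
    return chain


chain = metropolis(50000)
kept = chain[1000:]                                  # drop an assumed burn-in period
chain_mean = sum(kept) / len(kept)
chain_var = sum((v - chain_mean) ** 2 for v in kept) / len(kept)
```

After burn-in, the sample mean and variance of the chain should be near 0 and 1, the moments of the target.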
Key Steps:
1. Start with initial values for all variables $X_1^{(0)}, X_2^{(0)}, \dots, X_d^{(0)}$.
2. Iteratively update each variable by sampling from its conditional distribution, which is the
distribution of one variable conditioned on the others.
$$X_1^{(t+1)} \sim p(X_1 \mid X_2^{(t)}, \dots, X_d^{(t)})$$
$$X_2^{(t+1)} \sim p(X_2 \mid X_1^{(t+1)}, X_3^{(t)}, \dots, X_d^{(t)})$$
Repeat for each variable in the system.
3. Repeat this process for a number of iterations, where each iteration updates the values of all
the variables.
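These steps can be sketched in Python for a two-variable case. The target is an assumption chosen for illustration: a bivariate normal with zero means, unit variances and correlation $\rho = 0.8$, for which each conditional is $N(\rho \cdot \text{other}, 1 - \rho^2)$. The function name is hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(n_iter, rho=0.8, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # standard deviation of each conditional
    x1, x2 = 0.0, 0.0                     # initial values
    samples = []
    for _ in range(n_iter):
        x1 = rng.gauss(rho * x2, cond_sd)  # draw X1 given the current X2
        x2 = rng.gauss(rho * x1, cond_sd)  # draw X2 given the new X1
        samples.append((x1, x2))
    return samples


samples = gibbs_bivariate_normal(50000)
kept = samples[1000:]                      # drop an assumed burn-in period
mean_product = sum(a * b for a, b in kept) / len(kept)
```

Because both variables have mean 0 and variance 1, the average product `mean_product` estimates the correlation and should be close to 0.8.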
2. Autocorrelation:
Autocorrelation measures the correlation between samples at different steps. For the chain to be
effectively independent, the autocorrelation should decay to zero as the lag increases. A high
autocorrelation suggests that the chain has not mixed well, and more iterations are needed.
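A sample autocorrelation can be computed with a short Python function. The chain used here is a stand-in, not MCMC output: an AR(1) sequence $x_{t+1} = 0.9 x_t + \varepsilon_t$, chosen because its lag-$k$ autocorrelation is known to be $0.9^k$. The name `autocorrelation` is hypothetical.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    variance = sum((x - mean) ** 2 for x in chain) / n
    covariance = sum(
        (chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag)
    ) / n
    return covariance / variance


rng = random.Random(0)
chain = [0.0]
for _ in range(20000):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))  # assumed AR(1) stand-in

acf = {lag: autocorrelation(chain, lag) for lag in (0, 1, 10, 50)}
```

The values should start at 1 for lag 0, sit near 0.9 at lag 1, and decay toward zero by lag 50, which is the pattern described above for a chain that mixes.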
3. Trace Plots:
Plotting the trace of the MCMC chain over time can help visualize convergence. A well-mixed
chain should show no obvious trends, and the values should wander around the target distribution.
A non-mixing chain might show a pattern or be stuck in a local mode of the distribution.
Example:
If we’re using an MCMC method to sample from a distribution and plotting the trace of the chain,
we would expect the plot to show fluctuations around a constant value. If the plot shows a steady
upward or downward trend, it indicates that the chain has not yet converged.
In summary:
• Markov Chains are the foundation of MCMC and are widely used to model stochastic
processes.
• Metropolis-Hastings is a general-purpose MCMC algorithm for sampling from complex
distributions.
• Gibbs Sampling simplifies sampling from multivariate distributions by sampling each
variable conditioned on the others.
• To check for convergence, we can use diagnostics like the Gelman-Rubin test,
autocorrelation, trace plots, and effective sample size to ensure that the samples represent
the target distribution.
Let me know if you need further clarification or examples!
Here's a detailed explanation of the topics you mentioned, focusing on their importance for your
AKTU 3rd-year semester exam. This explanation includes essential concepts, detailed
descriptions, and examples.
1. Pseudo-Random Numbers
Random Number Generation
• Random numbers are numbers generated in such a way that their values are unpredictable
and follow a uniform distribution.
• Pseudo-random numbers are generated using deterministic algorithms but are designed to
appear random.
Inverse-Transform Method
• Used to generate random variables from any probability distribution.
• Steps:
1. Generate a random number $U \sim U(0, 1)$.
2. Solve $F_X(X) = U$, where $F_X$ is the CDF of the desired distribution.
• Example: For an exponential distribution with rate parameter $\lambda$:
$$F_X(X) = 1 - e^{-\lambda X}, \quad \text{so} \quad X = -\frac{\ln(1-U)}{\lambda}.$$
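The exponential example translates directly into Python. This is a minimal sketch; the function name `sample_exponential` and the rate $\lambda = 2$ used for the check are assumptions.

```python
import math
import random


def sample_exponential(rate, rng):
    """Inverse-transform sample: X = -ln(1 - U) / rate with U ~ U(0, 1)."""
    u = rng.random()                 # u lies in [0, 1), so 1 - u lies in (0, 1]
    return -math.log(1.0 - u) / rate


rng = random.Random(0)
samples = [sample_exponential(2.0, rng) for _ in range(50000)]
sample_mean = sum(samples) / len(samples)
```

For rate 2 the sample mean should be close to $1/\lambda = 0.5$.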
Acceptance-Rejection Method
• Used when direct sampling is difficult.
• Steps:
1. Choose a proposal distribution $g(x)$ such that $f(x) \leq M g(x)$, where
$M > 1$.
2. Generate $X$ from $g(x)$ and $U \sim U(0, 1)$.
3. Accept $X$ if $U \leq f(X)/(M g(X))$, else reject.
Transformations
• To generate random variables of complex distributions, transformations of simple
distributions are used.
• Example: Generate normal random variables using Box-Muller transformation:
$$Z_1 = \sqrt{-2 \ln U_1} \cos(2\pi U_2), \quad Z_2 = \sqrt{-2 \ln U_1} \sin(2\pi U_2),$$
where $U_1, U_2 \sim U(0, 1)$.
Importance Sampling
• Focus sampling efforts on regions where the function contributes most to the integral.
• Weights: Scale samples by the ratio of the target distribution to the proposal distribution.
Stratified Sampling
• Divide the domain into non-overlapping strata and sample separately from each.
• Improves accuracy by ensuring uniform coverage of the domain.
3. Markov Chain Monte Carlo (MCMC)
Markov Chains
• A sequence of random variables $X_1, X_2, \dots$ where the future state depends
only on the current state.
• Transition Probability Matrix: $P(i, j) = P(X_{n+1} = j \mid X_n = i)$.
Metropolis-Hastings Algorithm
• Used to sample from complex probability distributions.
• Steps:
1. Start with an initial state $X_0$.
2. Propose a new state $Y$ from a proposal distribution $q(X \to Y)$.
3. Accept $Y$ with probability:
$$\alpha = \min\left(1, \frac{\pi(Y)\, q(Y \to X)}{\pi(X)\, q(X \to Y)}\right).$$
Gibbs Sampling
• A special case of MCMC used when the joint distribution can be decomposed into
conditional distributions.
• Iteratively sample from conditional distributions:
$$X_1 \sim P(X_1 \mid X_2, \dots, X_n), \quad X_2 \sim P(X_2 \mid X_1, X_3, \dots, X_n), \quad \dots$$
Convergence
• MCMC chains converge to the target distribution as the number of iterations increases.
• Diagnostics:
• Autocorrelation: Measures dependence between samples.
• Gelman-Rubin Statistic: Checks for convergence by comparing within-chain and
between-chain variance.
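The Gelman-Rubin statistic can be computed from several chains with a few lines of Python. This sketch uses the classic form $\hat{R} = \sqrt{\left(\frac{n-1}{n} W + \frac{B}{n}\right) / W}$, where $W$ is the average within-chain variance and $B$ is $n$ times the variance of the chain means. The function name `gelman_rubin` and the synthetic chains (independent normal draws rather than real MCMC output) are assumptions.

```python
import math
import random


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and average within-chain variance W.
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    pooled = (n - 1) / n * w + b / n
    return math.sqrt(pooled / w)


rng = random.Random(0)
agreeing = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
disagreeing = [
    [rng.gauss(center, 1.0) for _ in range(2000)] for center in (0.0, 0.0, 5.0, 5.0)
]
r_hat_good = gelman_rubin(agreeing)
r_hat_bad = gelman_rubin(disagreeing)
```

Chains that explore the same distribution give a value near 1, while chains stuck around different centers give a value well above 1.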
This detailed explanation covers all major concepts along with their potential applications and
derivations to help you prepare for your AKTU semester exam. If you'd like this content as a PDF
or need additional clarifications, let me know!
KHUSHI YADUVANSHI
Here’s a detailed explanation of the important questions mentioned earlier. These answers are
crafted in a way to help you write well in your exams.
Steps:
1. Generate a random number $U$ from a uniform distribution on $[0,1]$.
2. Use the relationship: $F_X(x) = P(X \leq x) = U$, where $F_X$ is
the CDF of the desired distribution.
3. Solve for $x$ in terms of $U$; the solution $X = F_X^{-1}(U)$ is the generated random variable.
Application in Exams:
Explain this method step-by-step and derive the formula for $X$. If needed, substitute numerical
values for $U$ and $\lambda$ to show an example.
Steps:
1. Choose a proposal distribution $g(x)$ such that $f(x) \leq M g(x)$ for all $x$,
where $M > 1$ is a constant.
2. Generate a candidate $X$ from $g(x)$ and a uniform random number $U \sim U(0,1)$.
3. Accept $X$ if $U \leq \frac{f(X)}{M g(X)}$. Otherwise, reject $X$ and repeat
the process.
Example: Sampling from a Normal Distribution
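• Let $f(x)$ be a normal density restricted to a bounded interval and $g(x)$ be a uniform
distribution on the same interval (a uniform proposal cannot dominate a normal density over the
whole real line).
• Suppose $M = 2$ satisfies $f(x) \leq 2 g(x)$ on that interval, and a candidate $X$ is generated
from $g(x)$.
• Use the acceptance criterion $U \leq f(X)/(2g(X))$.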
Advantages:
• Simple to implement when $f(x)$ is known.
• Effective for distributions with complex shapes.
Steps:
1. Suppose we want to estimate: $$I = \int_a^b f(x)\, dx.$$
2. Generate $N$ random points $x_1, x_2, \dots, x_N$ uniformly in $[a, b]$.
3. Compute the integral as: $$I \approx \frac{b - a}{N} \sum_{i=1}^N f(x_i).$$
Example:
Estimate $\int_0^1 x^2\, dx$:
• Generate $N = 3$ random points: $x_1 = 0.2, x_2 = 0.5, x_3 = 0.8$.
• Compute the integral:
$$I \approx \frac{1}{3} \left[ f(0.2) + f(0.5) + f(0.8) \right] = \frac{1}{3}(0.04 + 0.25 + 0.64) = 0.31.$$
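The same calculation can be sketched in Python, first with the three points from the example and then with many random points. The function names `estimate_from_points` and `mc_integrate` are hypothetical.

```python
import random


def estimate_from_points(f, a, b, points):
    """Monte Carlo estimate (b - a) / N * sum of f(x_i) for given points."""
    return (b - a) * sum(f(x) for x in points) / len(points)


def mc_integrate(f, a, b, n, rng):
    """Same estimator with n points drawn uniformly from [a, b]."""
    points = [a + (b - a) * rng.random() for _ in range(n)]
    return estimate_from_points(f, a, b, points)


def square(x):
    return x * x


three_point = estimate_from_points(square, 0.0, 1.0, [0.2, 0.5, 0.8])
many_points = mc_integrate(square, 0.0, 1.0, 100000, random.Random(0))
```

The three-point estimate reproduces the hand calculation, and the estimate with many points should be close to the exact value of 1/3.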
Applications:
Monte Carlo Integration is widely used in high-dimensional integrals or complex functions.
Key Techniques:
1. Antithetic Variables:
• Generate pairs of negatively correlated random variables.
• Example: If $U \sim U(0,1)$, then use $1-U$ as its antithetic variable.
• Reduces variance because the pair balances extreme values.
2. Control Variates:
• Use a variable $Y$ with a known expectation to adjust estimates.
• Formula:
$$\hat{I} = \frac{1}{N} \sum_{i=1}^N \left[ f(X_i) - c(Y_i - \mathbb{E}[Y]) \right].$$
3. Importance Sampling:
• Focus samples on the regions of the domain where $f(x)$ contributes most.
• Weight samples by $\frac{f(x)}{g(x)}$, where $g(x)$ is the sampling
distribution.
4. Stratified Sampling:
• Divide the domain into strata and sample independently from each stratum.
Exam Tip:
Define each technique clearly and use examples to demonstrate.
Steps:
1. Start at an initial state $X_0$.
2. Propose a new state $Y$ from $q(X \to Y)$.
3. Compute the acceptance probability:
$$\alpha = \min\left(1, \frac{\pi(Y)\, q(Y \to X)}{\pi(X)\, q(X \to Y)}\right).$$
4. Accept $Y$ with probability $\alpha$; otherwise, retain $X_n$.
Example:
• Suppose $\pi(x)$ is a normal distribution and $q(x \to y)$ is a uniform distribution.
• Propose $Y = X + \epsilon$, where $\epsilon \sim U(-1,1)$.
Steps:
1. Initialize the values of all variables.
2. Iteratively sample each variable from its conditional distribution, keeping others fixed.
Example:
For a bivariate distribution $P(X, Y)$:
• Sample $X \sim P(X \mid Y)$.
• Sample $Y \sim P(Y \mid X)$.
Applications:
Used in Bayesian inference for complex models.
Example:
For $\mu = [1, 2]$ and $\Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}$:
1. Perform Cholesky decomposition of $\Sigma$, giving a lower-triangular $L$ with $\Sigma = L L^T$.
2. Use $Z = [Z_1, Z_2]$, where $Z_1, Z_2 \sim N(0, 1)$ are independent, to compute $X = \mu + L Z$.
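The two steps can be sketched in Python for this 2x2 case. The sketch assumes the usual construction $X = \mu + L Z$ with $\Sigma = L L^T$ and computes the Cholesky factor by hand rather than with a linear algebra library; the names `cholesky_2x2` and `sample_bivariate_normal` are hypothetical.

```python
import math
import random


def cholesky_2x2(sigma):
    """Lower-triangular L with L * L^T = sigma, for a 2x2 covariance matrix."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def sample_bivariate_normal(mu, chol, rng):
    """Compute X = mu + L Z with independent standard normal Z1, Z2."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = mu[0] + chol[0][0] * z1
    x2 = mu[1] + chol[1][0] * z1 + chol[1][1] * z2
    return x1, x2


mu = [1.0, 2.0]
sigma = [[1.0, 0.8], [0.8, 1.0]]
chol = cholesky_2x2(sigma)   # approximately [[1, 0], [0.8, 0.6]]
rng = random.Random(0)
samples = [sample_bivariate_normal(mu, chol, rng) for _ in range(50000)]
```

For this $\Sigma$ the factor works out to $L = \begin{bmatrix} 1 & 0 \\ 0.8 & 0.6 \end{bmatrix}$, so $X_1 = 1 + Z_1$ and $X_2 = 2 + 0.8 Z_1 + 0.6 Z_2$.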
Benefits:
• Reduces variance.
• Ensures better representation of the entire domain.
Example:
If estimating the mean of a population divided into males and females, stratify by gender and
sample within each group.