Advanced Probability & Statistics (23CST-286)
Compiled by : Subhayu
Contents:
Unit 1
Unit 2
Unit 3
MST 1 and 2 Solutions
Sample Questions
Unit 1: Random Variable and Distribution Function
Random Variable
A random variable is a function that maps each outcome in a sample space to a real number.
It provides a numerical description of the outcome of a statistical experiment.
The cumulative distribution function (CDF) of X is F_X(x) = P(X ≤ x).
Properties of CDF:
Non-decreasing
lim_{x→−∞} F_X(x) = 0
lim_{x→∞} F_X(x) = 1
Right-continuous
Marginal Distributions
For discrete random variables, the marginal PMFs are obtained by summing the joint PMF:
For X: P_X(x) = Σ_y P(X = x, Y = y)
For Y: P_Y(y) = Σ_x P(X = x, Y = y)
For continuous random variables with joint density f(x, y), the marginal PDFs are obtained by integrating out the other variable.
Marginal PDF of X:
f_X(x) = ∫_{−∞}^{∞} f(x, y) dy
Marginal PDF of Y:
f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx
Properties: each marginal is itself a valid density, i.e., f_X(x) ≥ 0 and ∫_{−∞}^{∞} f_X(x) dx = 1.
Example (Continuous):
If f(x, y) = 2 for 0 < x < y < 1, and 0 elsewhere, then f_X(x) = ∫_x^1 2 dy = 2(1 − x) for 0 < x < 1.
This gives the individual distribution of each variable regardless of the other.
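As a quick illustration of marginalization (a sketch, not from the original notes; the joint table below is made up), the marginal PMFs can be computed by summing a joint probability table along each axis:

```python
import numpy as np

# Hypothetical joint PMF of X (rows) and Y (columns); entries sum to 1.
joint = np.array([[0.10, 0.20],
                  [0.25, 0.15],
                  [0.05, 0.25]])

p_x = joint.sum(axis=1)   # P_X(x) = sum over y of P(X = x, Y = y)
p_y = joint.sum(axis=0)   # P_Y(y) = sum over x of P(X = x, Y = y)

print("P_X:", p_x)        # [0.3 0.4 0.3]
print("P_Y:", p_y)        # [0.4 0.6]
```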
Conditional Distributions
The conditional density of X given Y = y, and of Y given X = x:
f_{X|Y}(x|y) = f(x, y) / f_Y(y), if f_Y(y) > 0
f_{Y|X}(y|x) = f(x, y) / f_X(x), if f_X(x) > 0
Independence of Random Variables
For discrete random variables:
P(X = x, Y = y) = P(X = x) · P(Y = y) for all x, y
For continuous random variables:
f(x, y) = f_X(x) · f_Y(y) for all x, y
Independence implies that knowing the value of one variable gives no information about the other.
Bivariate Distribution
A bivariate distribution describes the joint distribution of two random variables X and Y. It can be expressed through the joint PMF (discrete case), the joint PDF (continuous case), or the joint CDF F(x, y) = P(X ≤ x, Y ≤ y).
The bivariate distribution includes all information about the behavior and relationship
between two variables.
Correlation
Correlation measures the strength and direction of a linear relationship between two
random variables.
r_XY = Cov(X, Y) / (σ_X σ_Y)
Properties:
−1 ≤ r_XY ≤ 1
Example:
If X is the number of study hours and Y is exam score, a positive correlation is expected
(higher study hours lead to higher scores).
Correlation only captures linear dependency. Nonlinear relationships may exist even when
correlation is zero.
1. Function of a Random Variable
If Y = g(X), where g is strictly monotonic and differentiable, then
f_Y(y) = f_X(x) · |dx/dy|, where x = g^{−1}(y)
Example:
Let X ∼ U(0, 1) and Y = X². Then x = √y and dx/dy = 1/(2√y), so f_Y(y) = 1/(2√y) for 0 < y < 1.
For a transformation of two variables (U, V) = g(X, Y) with a differentiable inverse,
f_{U,V}(u, v) = f_{X,Y}(x, y) · |J|
where
J = ∂(x, y)/∂(u, v)
is the Jacobian of the inverse transformation.
2. Distribution of the Difference, Product, Quotient of Two Random Variables
Difference: For Z = X − Y with X and Y independent, the density is:
f_Z(z) = ∫_{−∞}^{∞} f_X(z + y) f_Y(y) dy
Product: For Z = XY, the PDF is more complex and is often derived using transformation techniques or convolution in the log-domain.
Quotient: For Z = X/Y with X and Y independent, the density is:
f_Z(z) = ∫_{−∞}^{∞} |y| f_X(zy) f_Y(y) dy
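As a sanity check of the quotient formula (a sketch assuming SciPy; the standard-normal choice is an illustration, not from the notes): for independent X, Y ∼ N(0, 1), the integral should reproduce the standard Cauchy density 1/(π(1 + z²)):

```python
import numpy as np
from scipy import integrate

def phi(u):
    """Standard normal pdf."""
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def f_Z(z):
    """Density of Z = X/Y via f_Z(z) = integral of |y| f_X(zy) f_Y(y) dy."""
    val, _ = integrate.quad(lambda y: abs(y) * phi(z * y) * phi(y),
                            -np.inf, np.inf)
    return val

z = 1.5
print(f_Z(z), 1 / (np.pi * (1 + z**2)))   # both ≈ 0.0979
```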
3. Expectation
Definition:
The expected value or mean of a random variable X gives its long-run average value.
For discrete X:
E[X] = Σ_x x · P(X = x)
For continuous X:
E[X] = ∫_{−∞}^{∞} x f_X(x) dx
Properties:
E[aX + b] = a E[X] + b (linearity)
E[X + Y] = E[X] + E[Y]
4. Moments
r-th moment about the origin:
μ′_r = E[X^r]
r-th central moment (about the mean):
μ_r = E[(X − μ)^r]
Special Cases:
μ_2 = σ² (variance)
μ_3 relates to skewness
μ_4 relates to kurtosis
Joint expectation:
E[XY] = Σ_x Σ_y x y · P(X = x, Y = y)
Covariance:
Cov(X, Y) = E[XY] − E[X] E[Y]
Correlation coefficient:
ρ_XY = Cov(X, Y) / (σ_X σ_Y)
5. Law of Large Numbers
Definition:
The law states that as the number of trials increases, the sample mean approaches the population mean.
Weak Law:
P(|(1/n) Σ_{i=1}^n X_i − μ| ≥ ε) → 0 as n → ∞
Strong Law:
The sample average converges to the population mean almost surely (with probability 1).
Implication:
Justifies using sample averages to estimate expected values in practical applications.
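A minimal simulation of the law (an illustrative sketch; the fair-die setup is assumed, not from the notes): the running mean of die rolls drifts toward the population mean 3.5:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)              # fair six-sided die
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for n in (10, 100, 10_000, 100_000):
    print(n, running_mean[n - 1])                     # approaches 3.5
```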
Unit 2
1. Conditional Expectation
Discrete Case:
Let X and Y be discrete random variables. The conditional expectation of X given Y = y is:
E[X|Y = y] = Σ_x x · P(X = x | Y = y)
Continuous Case:
If X and Y are continuous random variables with joint density function f_{X,Y}(x, y), then:
E[X|Y = y] = ∫_{−∞}^{∞} x · f_{X|Y}(x|y) dx
where f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y)
2. Conditional Variance
Definition:
The conditional variance of X given Y = y is the variance of X when Y = y is known:
Var(X|Y = y) = E[X²|Y = y] − (E[X|Y = y])²
Law of Total Expectation (Tower Property):
E[X] = E[E[X|Y]]
3. Moment Generating Function
Definition:
The moment generating function of a random variable X is defined as:
M_X(t) = E[e^{tX}]
For discrete:
M_X(t) = Σ_x e^{tx} P(X = x)
For continuous:
M_X(t) = ∫_{−∞}^{∞} e^{tx} f_X(x) dx
Properties:
M_X^{(n)}(0) = E[X^n], i.e., the n-th derivative of the MGF at t = 0 gives the n-th moment.
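For instance (a sketch assuming SymPy; the exponential distribution with its standard MGF λ/(λ − t) is used as the example), differentiating the MGF at t = 0 recovers the moments:

```python
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
M = lam / (lam - t)                         # MGF of Exponential(lam), t < lam

mean = sp.diff(M, t, 1).subs(t, 0)          # M'(0)  = E[X]   -> 1/lam
second = sp.diff(M, t, 2).subs(t, 0)        # M''(0) = E[X^2] -> 2/lam^2
print(mean, sp.simplify(second - mean**2))  # mean 1/lam, variance 1/lam^2
```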
4. Chebyshev’s Inequality
Statement:
For any random variable X with finite mean μ and variance σ², and for any k > 0:
P(|X − μ| ≥ kσ) ≤ 1/k²
This inequality gives an upper bound on the probability that the value of a random variable
deviates from its mean by more than k standard deviations.
Use:
Chebyshev’s inequality is used to prove the Weak Law of Large Numbers and to give non-
parametric bounds on dispersion.
5. Weak Law of Large Numbers (WLLN)
Statement:
Let X_1, X_2, ..., X_n be i.i.d. random variables with finite mean μ. Then the sample mean X̄_n = (1/n) Σ_{i=1}^n X_i satisfies:
lim_{n→∞} P(|X̄_n − μ| ≥ ε) = 0 for any ε > 0
This means the sample average converges in probability to the expected value.
Proof sketch (via Chebyshev's inequality):
P(|X̄_n − μ| ≥ ε) ≤ σ²/(nε²) → 0 as n → ∞
6. Central Limit Theorem (CLT)
Statement:
Let X_1, X_2, ..., X_n be i.i.d. random variables with mean μ and variance σ². Then, as n → ∞, the standardized sum
Z_n = (Σ_{i=1}^n X_i − nμ) / (σ√n) → N(0, 1)
In other words, the distribution of Zn tends to the standard normal distribution.
Implication:
No matter the original distribution of Xi , the sample sum or mean becomes approximately
normal for large n. This is the basis of many statistical inference techniques.
Example:
If a biased coin (probability of heads = 0.6) is tossed n times, the distribution of the number of heads (a binomial variable) will resemble a normal curve as n becomes large.
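The coin example can be checked numerically (a sketch; the specific n, trial count, and seed are arbitrary): standardized binomial counts behave like draws from N(0, 1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, trials = 1000, 0.6, 50_000
heads = rng.binomial(n, p, size=trials)           # heads in n biased tosses

z = (heads - n * p) / np.sqrt(n * p * (1 - p))    # standardize each count
print(z.mean(), z.std())                          # ≈ 0 and ≈ 1
# A histogram of z closely follows the standard normal bell curve.
```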
Unit 3
1. Probability vs Likelihood
Probability is used when the parameters are known, and we compute the chance of observing a particular data point or set of data.
Example: If we know a coin is fair (P (H) = 0.5), the probability of getting 2 heads in 2
tosses is 0.5 × 0.5 = 0.25.
Likelihood is used when data is observed, and we want to estimate the unknown
parameters that best explain the data.
Example: If we toss a coin twice and get two heads, the likelihood of different values of p
(probability of heads) can be evaluated by L(p) = p2 . The value of p that maximizes this
likelihood is taken as the estimate.
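This maximization can be seen directly by evaluating L(p) = p² on a grid (a minimal sketch of the two-heads example above):

```python
import numpy as np

p = np.linspace(0, 1, 1001)
L = p**2                          # likelihood of p after observing HH

print(p[np.argmax(L)])            # 1.0: L(p) = p^2 is maximized at p = 1
```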
2. Parameter Space
The parameter space is the set of all possible values that a parameter can take.
For example, if p is the probability of success in a Bernoulli trial, then the parameter
space is 0 ≤ p ≤ 1.
For a normal distribution N(μ, σ²), the parameter space is μ ∈ ℝ, σ² > 0.
3. Characteristics of Estimators
Unbiasedness: E[θ̂] = θ, i.e., the estimator is correct on average.
Consistency: θ̂ → θ in probability as sample size n → ∞.
Efficiency: Among all unbiased estimators, the one with the smallest variance is called
efficient.
Sufficiency: An estimator is sufficient if it uses all the information in the sample about
the parameter.
Minimum Variance Unbiased Estimator (MVUE): An unbiased estimator that has the
smallest variance among all unbiased estimators.
4. Method of Maximum Likelihood
Given a sample X_1, X_2, ..., X_n and probability density function f(x; θ), the likelihood is:
L(θ) = ∏_{i=1}^n f(X_i; θ)
Take the log-likelihood log L(θ) = Σ_{i=1}^n log f(X_i; θ) and set its derivative to zero:
(d/dθ) log L(θ) = 0
Solve for θ to get the MLE.
Example: For a normal sample X_1, ..., X_n with density f(x; μ, σ²) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)), maximizing the log-likelihood gives:
μ̂_MLE = (1/n) Σ_{i=1}^n X_i
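A numerical check (a sketch assuming SciPy; the simulated data and starting values are arbitrary): minimizing the negative log-likelihood of a normal sample recovers the sample mean as μ̂:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=500)       # sample with true mu = 5

def neg_log_lik(params):
    mu, log_sigma = params                         # log-sigma keeps sigma > 0
    sigma = np.exp(log_sigma)
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2)
                  + (x - mu)**2 / (2 * sigma**2))

res = minimize(neg_log_lik, x0=[0.0, 0.0])
print(res.x[0], x.mean())                          # numerical MLE ≈ sample mean
```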
5. Method of Moments
Based on the idea that population moments can be estimated using sample moments.
If μ′_1, μ′_2, ..., μ′_k are the first k population moments expressed as functions of the parameters, equate each to the corresponding sample moment:
μ′_j = (1/n) Σ_{i=1}^n X_i^j, for j = 1, 2, ..., k
and solve for the parameters.
Example: For an exponential distribution f(x; λ) = λe^{−λx}, the first moment is μ = 1/λ. Equating it with the sample mean X̄:
X̄ = 1/λ̂  ⇒  λ̂ = 1/X̄
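Numerically (a sketch; the true rate λ = 2 is an arbitrary choice for the simulation), the method-of-moments estimator 1/X̄ recovers λ:

```python
import numpy as np

rng = np.random.default_rng(3)
true_lam = 2.0
x = rng.exponential(scale=1 / true_lam, size=10_000)   # Exp(lambda) sample

lam_hat = 1 / x.mean()            # method of moments: lambda-hat = 1 / X-bar
print(lam_hat)                    # close to 2.0
```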
6. Method of Least Squares
Commonly used in regression analysis, where the aim is to minimize the sum of squared deviations between observed and predicted values:
S(β₀, β₁) = Σ_{i=1}^n (Y_i − β₀ − β₁ X_i)²
Minimizing S with respect to β₀ and β₁ gives:
β̂₁ = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)²,  β̂₀ = Ȳ − β̂₁ X̄
This method yields the best linear unbiased estimates (BLUE) under the standard assumptions.
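The closed-form estimates can be computed directly (a sketch with made-up data points):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)             # intercept and slope of the fitted line
```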
Sampling
Types of Sampling
1. Simple Random Sampling:
Every unit in the population has an equal chance of being selected.
Example: Drawing names at random from a complete list of students.
2. Stratified Sampling:
The population is divided into subgroups (strata) based on a particular characteristic (like
age, gender), and samples are randomly drawn from each subgroup.
Example: Surveying both male and female participants separately.
3. Systematic Sampling:
Every kth unit from a list is selected after a random starting point.
Example: Selecting every 10th student from a roll list.
4. Cluster Sampling:
The population is divided into clusters (groups), some clusters are randomly selected,
and then all or some elements within those clusters are studied.
Example: Selecting a few classrooms and interviewing all students within those classes.
5. Convenience Sampling:
Sample is taken from a group that is easy to access. This is non-probabilistic and may
involve biases.
Example: Interviewing people in a nearby park.
6. Quota Sampling:
The population is segmented, and a quota is set for each segment. Within each quota,
sampling is done conveniently or randomly.
Example: Choosing 10 people from each age group for a survey.
Algorithms Using Regression
1. Gradient Descent
The idea is to update the parameters (e.g., weights θ) in the opposite direction of the gradient of the cost function with respect to the parameters.
Cost function (squared error):
J(θ) = (1/2m) Σ_{i=1}^m (h_θ(x^{(i)}) − y^{(i)})², where h_θ(x) = θ₀ + θ₁x
Update Rule:
θ_j := θ_j − α ∂J(θ)/∂θ_j
For simple linear regression this gives:
θ₀ := θ₀ − α (1/m) Σ_{i=1}^m (h_θ(x^{(i)}) − y^{(i)})
θ₁ := θ₁ − α (1/m) Σ_{i=1}^m (h_θ(x^{(i)}) − y^{(i)}) x^{(i)}
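A minimal sketch of these updates (the toy data, learning rate, and iteration count are assumptions for illustration):

```python
import numpy as np

# Toy data lying roughly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

theta0, theta1, alpha, m = 0.0, 0.0, 0.05, len(x)
for _ in range(2000):
    h = theta0 + theta1 * x                        # predictions h_theta(x)
    grad0 = (1 / m) * np.sum(h - y)                # dJ/d(theta0)
    grad1 = (1 / m) * np.sum((h - y) * x)          # dJ/d(theta1)
    theta0 -= alpha * grad0                        # simultaneous update
    theta1 -= alpha * grad1

print(theta0, theta1)                              # close to 1 and 2
```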
2. Locally Weighted Regression (LWR)
Locally Weighted Regression is a non-parametric algorithm that fits multiple linear models locally to subsets of the data.
Instead of using a single global model for the entire dataset, LWR fits a model at a target query point x, weighting each training example by
w^{(i)} = exp(−(x^{(i)} − x)² / (2τ²))
where τ is the bandwidth parameter controlling how quickly the weights fall off with distance.
A separate θ is computed for each query point using weighted linear regression.
LWR is computationally expensive at test time but offers flexibility and good local fitting.
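A compact sketch of LWR (the sine-curve data and τ value are illustrative assumptions): a weighted least-squares fit is solved afresh for each query point:

```python
import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    """Predict y at x_query from a locally weighted linear fit."""
    w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))   # Gaussian weights
    A = np.column_stack([np.ones_like(X), X])          # design matrix [1, x]
    W = np.diag(w)
    # Weighted normal equations: theta = (A^T W A)^{-1} A^T W y
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return theta[0] + theta[1] * x_query

X = np.linspace(0, 6, 40)
y = np.sin(X) + 0.1 * np.random.default_rng(4).normal(size=X.size)
print(lwr_predict(3.0, X, y))     # near sin(3.0) ≈ 0.141
```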
3. Logistic Regression
Logistic Regression is used for binary classification problems, where the output is 0 or 1.
h_θ(x) = 1 / (1 + e^{−θᵀx})
This is a sigmoid function that maps any real-valued number into the (0, 1) interval.
Cost Function:
J(θ) = −(1/m) Σ_{i=1}^m [y^{(i)} log(h_θ(x^{(i)})) + (1 − y^{(i)}) log(1 − h_θ(x^{(i)}))]
The cost function is convex, allowing gradient descent to converge to the global minimum, with update rule:
θ_j := θ_j − α (1/m) Σ_{i=1}^m (h_θ(x^{(i)}) − y^{(i)}) x_j^{(i)}
Logistic regression is interpretable and works well when the data is linearly separable.
For multi-class classification, softmax regression (multinomial logistic regression) is
used.
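A minimal training loop for logistic regression (a sketch; the six-point dataset and hyperparameters are made up for illustration):

```python
import numpy as np

# Tiny 1-D binary dataset: label 1 for larger x values.
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])      # prepend intercept feature

theta = np.zeros(2)
alpha, m = 0.1, len(y)
for _ in range(5000):
    h = 1 / (1 + np.exp(-X @ theta))           # sigmoid hypothesis
    theta -= alpha * (1 / m) * (X.T @ (h - y)) # gradient of the log-loss

print(np.round(1 / (1 + np.exp(-X @ theta)), 2))  # probs near 0,0,0,1,1,1
```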
MST 1 Solutions
1. Define a random variable and its types
A random variable (RV) is a function that assigns a real number to each outcome in a sample
space of a random experiment.
Types:
Discrete Random Variable: Takes a finite or countably infinite set of values.
Example: The number of heads in three coin tosses.
Continuous Random Variable: Takes any value in a given interval (an uncountably infinite set).
Example: Time taken to run a race.
2. Compute the value of k for the following probability distribution of a random variable
X, P(X = 1) = 0.5, P(X = 2) = 0.3 and P(X = 3) = k
We know that the sum of all probabilities in a probability distribution must equal 1.
P (X = 1) + P (X = 2) + P (X = 3) = 1
0.5 + 0.3 + k = 1
k = 1 − 0.8 = 0.2
Answer: k = 0.2
3. Define the Karl Pearson coefficient of correlation
It is defined as:
r = Cov(X, Y) / (σ_X σ_Y)
Where:
Cov(X, Y) is the covariance of X and Y.
σ_X and σ_Y are the standard deviations of X and Y.
Range: −1 ≤ r ≤ 1
r = 1: Perfect positive linear correlation
r = −1: Perfect negative linear correlation
r = 0: No linear correlation
4. Define the cumulative distribution function (CDF) and state its properties
A Cumulative Distribution Function (CDF) of a random variable X gives the probability that
X will take a value less than or equal to x:
F (x) = P (X ≤ x)
Properties of CDF:
Non-decreasing
lim_{x→−∞} F(x) = 0
lim_{x→∞} F(x) = 1
F (1) = P (X ≤ 1) = 0.2
5. When are two random variables said to be independent?
Two random variables X and Y are independent if the occurrence of one does not affect the probability distribution of the other.
Mathematical condition:
For all values of x and y ,
P (X = x and Y = y) = P (X = x) ⋅ P (Y = y)
Example:
Let P(X = 1) = 0.5, P(Y = 1) = 0.4, and P(X = 1, Y = 1) = 0.2.
Then X and Y satisfy the independence condition for this pair, since P(X = 1) · P(Y = 1) = 0.5 × 0.4 = 0.2 = P(X = 1, Y = 1).
If this condition fails for even one pair, X and Y are not independent.
6. Calculate the Pearson correlation coefficient for the following data
X (Father's height in inches): 65, 66, 67, 67, 68, 69, 70, 72
Y (Son's height in inches): 61, 68, 65, 68, 72, 72, 64, 71
Step 1: Compute the required sums (n = 8)
X    Y    X²     Y²     XY
65   61   4225   3721   3965
66   68   4356   4624   4488
67   65   4489   4225   4355
67   68   4489   4624   4556
68   72   4624   5184   4896
69   72   4761   5184   4968
70   64   4900   4096   4480
72   71   5184   5041   5112
ΣX = 544, ΣY = 541, ΣX² = 37028, ΣY² = 36699, ΣXY = 36820
Step 2: Plug into the formula
r = [n ΣXY − (ΣX)(ΣY)] / √([n ΣX² − (ΣX)²][n ΣY² − (ΣY)²])
= [8(36820) − (544)(541)] / √([8(37028) − 544²][8(36699) − 541²])
= (294560 − 294304) / √([296224 − 295936][293592 − 292681])
= 256 / √(288 × 911) = 256 / √262368 ≈ 256 / 512.22 ≈ 0.50
Answer: r ≈ 0.5, indicating a moderate positive linear correlation between fathers' and sons' heights.
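The hand computation can be verified in one line (a sketch using NumPy's built-in correlation):

```python
import numpy as np

X = np.array([65, 66, 67, 67, 68, 69, 70, 72])
Y = np.array([61, 68, 65, 68, 72, 72, 64, 71])

print(round(np.corrcoef(X, Y)[0, 1], 4))   # ≈ 0.4998, matching r ≈ 0.5
```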
7. The length of time (in minutes) that a certain lady speaks on the telephone is found to be a random phenomenon with a probability function specified by the pdf f(x) = A e^{−x/5} for x ≥ 0, and 0 otherwise.
(a) Find the value of A that makes f(x) a valid pdf:
∫_0^∞ A e^{−x/5} dx = A[−5e^{−x/5}]_0^∞ = A(0 + 5) = 1 ⇒ A = 1/5 = 0.2
(b) Calculate the probability that the number of minutes she will talk over phone is more
than 10 minutes
P(X > 10) = ∫_{10}^∞ 0.2 e^{−x/5} dx = 0.2 · [−5e^{−x/5}]_{10}^∞ = 0.2 · (0 + 5e^{−2}) = e^{−2} ≈ 0.1353
MST 2 Solutions
1. Define the probability density function (pdf) of a random variable Y when Y = g(X),
where g(X) is a continuous function.
Solution:
If Y = g(X), where g is continuous, differentiable, and strictly monotonic with inverse g^{−1}(y), then the probability density function (pdf) of Y is given by:
f_Y(y) = f_X(g^{−1}(y)) · |(d/dy) g^{−1}(y)|
Here, f_X(x) is the pdf of the original variable X, and the formula is valid wherever g^{−1}(y) is differentiable.
2. State the formula for the expectation of the product of two independent random
variables.
Solution:
If X and Y are two independent random variables, then:
E[XY] = E[X] · E[Y]
3. Describe the process of finding the expected value of a discrete random variable X
given that X takes values 0, 1, and 2 with probabilities 0.3, 0.4, and 0.3 respectively.
Solution:
The expected value E[X] of a discrete random variable is calculated as:
E[X] = Σ x_i · P(x_i)
Here:
E[X] = 0(0.3) + 1(0.4) + 2(0.3) = 0 + 0.4 + 0.6 = 1.0
4. Define the moment generating function (mgf) of a random variable.
Solution:
The moment generating function (MGF) of a random variable X is defined as:
M_X(t) = E[e^{tX}]
It is used to generate the moments (like mean, variance) of the distribution by differentiating
the MGF with respect to t and evaluating at t = 0. That is:
E[X^n] = M_X^{(n)}(0)
5. Compute the mgf of a continuous random variable with the probability density
function f_X(x) = 2x for 0 ≤ x ≤ 1.
Solution:
The moment generating function is given by:
M_X(t) = ∫_0^1 e^{tx} · 2x dx
Using integration by parts with u = 2x ⇒ du = 2 dx and dv = e^{tx} dx ⇒ v = e^{tx}/t:
M_X(t) = [2x e^{tx}/t]_0^1 − ∫_0^1 (2e^{tx}/t) dx
= 2e^t/t − (2/t) ∫_0^1 e^{tx} dx
= 2e^t/t − (2/t²)(e^t − 1)
So:
M_X(t) = 2e^t/t − 2(e^t − 1)/t², for t ≠ 0
This is the MGF of X for the given pdf.
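The integration by parts can be double-checked symbolically (a sketch assuming SymPy):

```python
import sympy as sp

x, t = sp.symbols('x t', positive=True)
M = sp.integrate(2 * x * sp.exp(t * x), (x, 0, 1))    # E[e^{tX}] for f(x) = 2x
closed_form = 2 * sp.exp(t) / t - 2 * (sp.exp(t) - 1) / t**2
print(sp.simplify(M - closed_form))                   # 0 -> the results agree
```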
6. A random variable X has the distribution:
X: −3, 6, 9
P(X = x): 1/6, 1/2, 1/3
Calculate E(X), E(X²) and, using the laws of expectation, evaluate E((2X + 1)²) for the given probability distribution of X.
Solution:
E(X) = Σ x · P(X = x) = (−3)(1/6) + 6(1/2) + 9(1/3) = −0.5 + 3 + 3 = 5.5
E(X²) = Σ x² · P(X = x) = 9(1/6) + 36(1/2) + 81(1/3) = 1.5 + 18 + 27 = 46.5
(2X + 1)² = 4X² + 4X + 1, so by linearity of expectation:
E((2X + 1)²) = 4E(X²) + 4E(X) + 1 = 4(46.5) + 4(5.5) + 1 = 186 + 22 + 1 = 209
Final Answers:
E(X) = 5.5
E(X²) = 46.5
E((2X + 1)²) = 209
7. Compute the Moment Generating Function (MGF) about origin of a Normal
Distribution.
Let X ∼ N(μ, σ²). We need to find the moment generating function (MGF), defined by:
M_X(t) = E[e^{tX}]
Solution:
The pdf of the normal distribution is:
f(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
So M_X(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx. Completing the square in the exponent and using the fact that a normal density integrates to 1 gives:
M_X(t) = exp(μt + (1/2)σ²t²)
Final Answer:
M_X(t) = exp(μt + (1/2)σ²t²)
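A quick Monte Carlo check of the result (a sketch; the μ, σ, and t values are arbitrary):

```python
import numpy as np

mu, sigma, t = 1.0, 2.0, 0.3
rng = np.random.default_rng(5)
x = rng.normal(mu, sigma, size=1_000_000)

empirical = np.mean(np.exp(t * x))                 # estimate of E[e^{tX}]
theoretical = np.exp(mu * t + 0.5 * sigma**2 * t**2)
print(empirical, theoretical)                      # both ≈ 1.616
```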
Sample Questions
5. Write the general formula for the Maximum Likelihood Estimator (MLE).
8. A sample contains values: 4, 7, 9. Estimate the population mean using the method of
moments.
9. In simple random sampling, what is the probability that the first item selected is the
largest in the population of size n?
11. Write one key difference between gradient descent and locally weighted regression.
12. What does the term “independent identically distributed” (i.i.d) mean?
2. A sample of 5 values: 2, 3, 5, 7, 9. Estimate the population variance using the method of
moments.
3. Find the MLE of p for a binomial distribution based on the observation: 3 successes in 5
trials.
4. Use gradient descent (one iteration) to update weights for minimizing the function
J(θ) = (θx − y)2 , where x = 2, y = 5, θ = 1, and learning rate α = 0.1.
5. Describe stratified and systematic sampling with appropriate examples.
1. A random sample of size 10 from a normal distribution yielded the following values:
6, 8, 9, 10, 7, 9, 11, 12, 8, 10.
(a) Estimate the population mean and variance using the method of moments.
(b) Derive the maximum likelihood estimators for the same parameters.
2. Perform two iterations of gradient descent for minimizing the cost function J(θ) = (1/m) Σ_{i=1}^m (θx_i − y_i)², using data: (x₁ = 1, y₁ = 2), (x₂ = 2, y₂ = 3), initial θ = 0, learning rate α = 0.1.
3. Given the logistic regression hypothesis h_θ(x) = 1/(1 + e^{−θᵀx}), derive the cost function and explain how gradient descent is used to minimize it.
4. A random variable X follows an exponential distribution with pdf f(x; λ) = λe^{−λx}, x ≥ 0. Derive the MLE for λ based on a sample of size n.
5. Explain the following sampling methods, including relevant examples and illustrations:
(c) Cluster sampling
(d) Systematic sampling