
Advanced Probability & Statistics

(23CST-281)

ALL UNITS - NOTES & QUESTIONS

Compiled by: Subhayu

Contents:

Unit 1
Unit 2
Unit 3
MST 1 and 2 solutions
Sample Questions
Advanced Probability & Statistics (23CST-286)
Unit 1: Random Variable and Distribution Function

Random Variable
A random variable is a function that maps each outcome in a sample space to a real number.
It provides a numerical description of the outcome of a statistical experiment.

Types of Random Variables:

1. Discrete Random Variable: Takes on a countable number of distinct values.


Example: Number of heads in 3 coin tosses. Possible values: 0, 1, 2, 3.

2. Continuous Random Variable: Takes on infinitely many values in a continuous range.


Example: The height of students in a class.

Distribution Function (Cumulative Distribution Function – CDF)


For a random variable $X$, the distribution function is defined as:

$$F_X(x) = P(X \le x)$$

Properties of CDF:

$0 \le F_X(x) \le 1$ for all $x$

Non-decreasing

$\lim_{x \to -\infty} F_X(x) = 0$

$\lim_{x \to \infty} F_X(x) = 1$

Right continuous

For discrete variables, the CDF is a step function.


For continuous variables, the CDF is a smooth function.

2D Joint Probability Mass Function (Joint PMF)


For discrete random variables X and Y , the joint PMF is:
$$p(x, y) = P(X = x, Y = y)$$
It defines the probability that X takes the value x and Y takes the value y simultaneously.

Example Table:

|     | Y=0 | Y=1 |
|-----|-----|-----|
| X=0 | 0.2 | 0.1 |
| X=1 | 0.3 | 0.4 |

Here, p(0, 0) = 0.2, p(1, 1) = 0.4, etc.


Marginal Probability Function
The marginal PMF is obtained by summing the joint PMF over the values of the other
variable.

For $X$:

$$P_X(x) = \sum_y P(X = x, Y = y)$$

For $Y$:

$$P_Y(y) = \sum_x P(X = x, Y = y)$$

Using the example:

$P_X(0) = 0.2 + 0.1 = 0.3$

$P_X(1) = 0.3 + 0.4 = 0.7$

$P_Y(0) = 0.2 + 0.3 = 0.5$

$P_Y(1) = 0.1 + 0.4 = 0.5$

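As a quick illustrative sketch (not part of the original notes), these marginals can be reproduced programmatically from the joint PMF table above; the dictionary layout and variable names below are assumptions made for the example.

```python
# Marginal PMFs from the 2x2 joint PMF table above (illustrative sketch).
joint = {
    (0, 0): 0.2, (0, 1): 0.1,   # P(X=0, Y=0), P(X=0, Y=1)
    (1, 0): 0.3, (1, 1): 0.4,   # P(X=1, Y=0), P(X=1, Y=1)
}

# P_X(x) = sum over y of P(X=x, Y=y); P_Y(y) = sum over x.
p_x = {x: sum(p for (xx, yy), p in joint.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (xx, yy), p in joint.items() if yy == y) for y in (0, 1)}

print(p_x)  # approximately {0: 0.3, 1: 0.7}
print(p_y)  # approximately {0: 0.5, 1: 0.5}
```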

Conditional Probability Function


The conditional probability of $X$ given $Y = y$:

$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P_Y(y)}$$

provided $P_Y(y) > 0$.


Joint and Marginal Probability Distribution Function
For discrete variables:

Joint Distribution Function:


$$F(x, y) = P(X \le x, Y \le y)$$
Marginal Distribution Functions are derived from the joint function as described above.

Joint Density Function (for Continuous Variables)


If X and Y are continuous random variables, their joint probability density function (joint
PDF) is denoted as f (x, y), and:
$$P((X, Y) \in A) = \iint_A f(x, y)\, dx\, dy$$

The marginal PDFs are:

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$$

$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx$$

Properties:

$f(x, y) \ge 0$ for all $x, y$

$$\iint_{\mathbb{R}^2} f(x, y)\, dx\, dy = 1$$

Example (Continuous):
If $f(x, y) = 1$ for $0 < x < 1$, $0 < y < 1$, and $0$ elsewhere,
then $f_X(x) = \int_0^1 1\, dy = 1$ for $0 < x < 1$.

This example is the uniform distribution over the unit square.

Marginal Density Function


Given a joint probability density function f (x, y) of two continuous random variables X
and Y , the marginal density function of one variable is obtained by integrating out the other.

Marginal PDF of $X$:

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$$

Marginal PDF of $Y$:

$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx$$

This gives the individual distribution of each variable regardless of the other.
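
A small numerical sketch of this "integrate out the other variable" idea (not from the original notes, and assuming SciPy is available): take the hypothetical joint density $f(x, y) = x + y$ on the unit square, whose marginal is $f_X(x) = x + \tfrac{1}{2}$, and recover it by numerical integration.

```python
# Numerically integrate out y to get the marginal f_X(x) (illustrative sketch).
# Assumed joint pdf: f(x, y) = x + y on the unit square (a valid density).
from scipy.integrate import quad

def joint_pdf(x, y):
    return x + y if (0 <= x <= 1 and 0 <= y <= 1) else 0.0

def marginal_x(x):
    # f_X(x) = integral of f(x, y) dy over the support of y
    value, _ = quad(lambda y: joint_pdf(x, y), 0.0, 1.0)
    return value

for x in (0.25, 0.5, 0.75):
    print(x, marginal_x(x), x + 0.5)  # numeric result matches x + 1/2
```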

Conditional Distribution Function and Conditional Probability Density Function


The conditional density function describes the distribution of one variable given that the
other variable has a fixed value.

Conditional PDF of $X$ given $Y = y$:

$$f_{X\mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}, \quad \text{if } f_Y(y) > 0$$

Conditional PDF of $Y$ given $X = x$:

$$f_{Y\mid X}(y \mid x) = \frac{f(x, y)}{f_X(x)}, \quad \text{if } f_X(x) > 0$$

Conditional Distribution Function (CDF) of $X$ given $Y = y$:

$$F_{X\mid Y}(x \mid y) = P(X \le x \mid Y = y) = \int_{-\infty}^{x} f_{X\mid Y}(t \mid y)\, dt$$

Independent Random Variables


Two random variables X and Y are independent if their joint density is the product of their
marginals:

$$f(x, y) = f_X(x) \cdot f_Y(y)$$

For discrete random variables:

$$P(X = x, Y = y) = P(X = x) \cdot P(Y = y)$$

Independence implies that knowing the value of one variable gives no information about the
other.

Bivariate Distribution
A bivariate distribution describes the joint distribution of two random variables X and Y . It
can be expressed through:

Joint PMF (for discrete variables): p(x, y) = P (X = x, Y = y)


Joint PDF (for continuous variables): f (x, y)

The bivariate distribution includes all information about the behavior and relationship
between two variables.

Example (Discrete Case):


Let X = number of heads in 2 tosses, Y = number of tails.
The joint PMF will include probabilities for combinations like (1,1), (2,0), (0,2), etc.

Correlation
Correlation measures the strength and direction of a linear relationship between two
random variables.

Let X and Y be two random variables.


Let $\mu_X = E(X)$, $\mu_Y = E(Y)$, $\sigma_X = SD(X)$, $\sigma_Y = SD(Y)$.

Then, the covariance between $X$ and $Y$ is:

$$\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X)E(Y)$$

Karl Pearson Coefficient of Correlation ($r$):

$$r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

Properties:

$-1 \le r_{XY} \le 1$

$r = 1$: perfect positive correlation

$r = -1$: perfect negative correlation

$r = 0$: no linear correlation (but variables may still be dependent)

Example:
If X is the number of study hours and Y is exam score, a positive correlation is expected
(higher study hours lead to higher scores).

Correlation only captures linear dependency. Nonlinear relationships may exist even when
correlation is zero.
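
A short sketch (assuming NumPy; the study-hours and exam-score numbers are made up purely for illustration) of computing the covariance and Pearson's $r$ from paired data:

```python
# Sample covariance and Pearson correlation from paired data (illustrative values).
import numpy as np

hours = np.array([2, 4, 5, 7, 9], dtype=float)        # hypothetical study hours
score = np.array([50, 55, 60, 72, 80], dtype=float)   # hypothetical exam scores

cov = np.mean((hours - hours.mean()) * (score - score.mean()))  # population covariance
r = cov / (hours.std() * score.std())                           # Pearson's r

print(cov, r)
print(np.corrcoef(hours, score)[0, 1])  # same r via NumPy's built-in
```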

Unit 2: Expectation, Moments and Law of Large Numbers

1. Transformation of One- and Two-Dimensional Random Variables

Transformation refers to deriving the distribution of a function of one or more random


variables.

For One Random Variable (Univariate Case):


Let $X$ be a random variable with a known probability distribution and let $Y = g(X)$ be a
transformation.
If $g$ is monotonic and differentiable, then the probability density function (PDF) of $Y$ is given by:

$$f_Y(y) = f_X(x) \left|\frac{dx}{dy}\right|, \quad \text{where } x = g^{-1}(y)$$

Example:
Let $X \sim U(0, 1)$, and let $Y = \sqrt{X}$.

Then $X = Y^2 \Rightarrow f_Y(y) = f_X(y^2) \cdot |2y| = 1 \cdot 2y = 2y, \quad 0 \le y \le 1$

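A minimal simulation sketch of this result (assuming NumPy; sample size and bin count are arbitrary choices): draw $X \sim U(0,1)$, set $Y = \sqrt{X}$, and compare the empirical density of $Y$ with the derived density $f_Y(y) = 2y$.

```python
# Monte Carlo check of f_Y(y) = 2y for Y = sqrt(X), X ~ U(0, 1).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200_000)
y = np.sqrt(x)

# Empirical density on a few bins vs the theoretical density 2y at bin centres.
hist, edges = np.histogram(y, bins=10, range=(0.0, 1.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
for c, h in zip(centres, hist):
    print(f"y={c:.2f}  empirical={h:.3f}  theoretical={2 * c:.3f}")
```
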
For Two Random Variables (Bivariate Case):


Let $X$ and $Y$ be two continuous random variables and $U = g(X, Y)$, $V = h(X, Y)$.
The joint PDF of $U$ and $V$ is found using the Jacobian determinant:

$$f_{U,V}(u, v) = f_{X,Y}(x, y) \cdot |J|$$

where

$$J = \frac{\partial(x, y)}{\partial(u, v)}$$

2. Distribution of the Difference, Product, Quotient of Two Random Variables

Difference: If $Z = X - Y$, and $X$ and $Y$ are independent,

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(x - z)\, dx$$

Product: For $Z = XY$, the PDF is more complex and is often found using transformation
techniques or convolution in the log-domain.

Quotient: If $Z = \frac{X}{Y}$, the density function is:

$$f_Z(z) = \int_{-\infty}^{\infty} |y|\, f_X(zy) f_Y(y)\, dy$$

3. Mathematical Expectation of a Random Variable

Definition:
The expected value or mean of a random variable X gives the average value in the long run.

For discrete $X$:

$$E[X] = \sum_x x\, P(X = x)$$

For continuous $X$:

$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx$$

Properties:

Linearity: E[aX + b] = aE[X] + b


Expectation of a function: E[g(X)] = ∑x g(x)P (X = x) or ∫ g(x)f (x)dx

4. Moments

Moments describe the shape characteristics of a probability distribution.

$r$-th moment about origin:

$$\mu'_r = E[X^r]$$

$r$-th central moment:

$$\mu_r = E[(X - \mu)^r]$$

Special Cases:

$\mu'_1 = E[X]$ (Mean)

$\mu_2 = \text{Variance} = E[(X - \mu)^2]$

$\mu_3$ relates to skewness

$\mu_4$ relates to kurtosis

5. Moments of Bivariate Probability Distribution

Given two random variables X and Y :

Joint expectation:

$$E[XY] = \sum_x \sum_y xy \cdot P(X = x, Y = y) \ \text{(discrete)} \quad \text{or} \quad \iint xy\, f_{X,Y}(x, y)\, dx\, dy$$

Covariance:

$$\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y]$$

Correlation coefficient ($\rho$):

$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

If ρ = 0, the variables are uncorrelated.

6. Law of Large Numbers (LLN)

Definition:
The law states that as the number of trials increases, the sample mean approaches the
population mean.

Weak Law (Chebyshev's form):

For i.i.d. random variables $X_1, X_2, \ldots, X_n$ with mean $\mu$,

$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| \ge \epsilon\right) \to 0 \ \text{as } n \to \infty$$

Strong Law:
The sample average converges to the population mean almost surely (with probability
1).

Implication:
Justifies using sample averages to estimate expected values in practical applications.
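
A brief simulation sketch of this idea (assuming NumPy; the exponential distribution and sample sizes are arbitrary choices): the running sample mean of i.i.d. draws settles near the population mean as $n$ grows.

```python
# Law of Large Numbers: running sample mean of i.i.d. Exponential draws.
import numpy as np

rng = np.random.default_rng(1)
samples = rng.exponential(scale=0.5, size=100_000)   # population mean = 0.5

running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, running_mean[n - 1])   # approaches 0.5 as n increases
```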

Unit 2: Expectation, Moments and Law of Large Numbers (Contd)

1. Conditional Expectation

Discrete Case:
Let X and Y be discrete random variables. The conditional expectation of X given Y = y is:

$$E[X \mid Y = y] = \sum_x x \cdot P(X = x \mid Y = y)$$

The function E[X∣Y ] is a random variable that depends on Y .

Continuous Case:
If $X$ and $Y$ are continuous random variables with joint density function $f_{X,Y}(x, y)$ and
marginal $f_Y(y)$, then:

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x \cdot f_{X\mid Y}(x \mid y)\, dx$$

where $f_{X\mid Y}(x \mid y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$
2. Conditional Variance

Definition:
The conditional variance of X given Y = y is the variance of X when Y = y is known.

$$\operatorname{Var}(X \mid Y = y) = E[(X - E[X \mid Y = y])^2 \mid Y = y]$$

For the continuous case:

$$\operatorname{Var}(X \mid Y = y) = \int_{-\infty}^{\infty} (x - E[X \mid Y = y])^2 f_{X\mid Y}(x \mid y)\, dx$$

Law of Total Expectation:

E[X] = E[E[X∣Y ]]

Law of Total Variance:

Var(X) = E[Var(X∣Y )] + Var(E[X∣Y ])
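
A small discrete sketch (not in the original notes) that verifies the law of total expectation numerically, reusing the joint PMF table from Unit 1:

```python
# Verify E[X] = E[E[X|Y]] on a discrete joint PMF.
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.4}

p_y = {y: sum(p for (xx, yy), p in joint.items() if yy == y) for y in (0, 1)}

# E[X] directly from the joint PMF.
e_x = sum(x * p for (x, y), p in joint.items())

# E[X | Y = y] = sum_x x * P(X=x | Y=y), then average over Y.
def cond_mean_x(y):
    return sum(x * p / p_y[y] for (x, yy), p in joint.items() if yy == y)

e_of_cond = sum(p_y[y] * cond_mean_x(y) for y in (0, 1))
print(e_x, e_of_cond)  # both equal 0.7
```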

3. Moment Generating Functions (MGF)

Definition:
The moment generating function of a random variable X is defined as:

$$M_X(t) = E[e^{tX}]$$

For discrete:

$$M_X(t) = \sum_x e^{tx} P(X = x)$$

For continuous:

$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx$$

Properties:
$M_X^{(n)}(0) = E[X^n]$, i.e., the $n$th derivative of the MGF at $t = 0$ gives the $n$th moment.

If $X$ and $Y$ are independent, then $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$

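A short numerical sketch of the derivative property (not from the original notes): evaluate $M_X(t) = \sum e^{tx} P(X = x)$ for a small discrete distribution (the one used in the MST 2 solutions later in these notes) and approximate its derivatives at $t = 0$ by finite differences.

```python
# Recover E[X] and E[X^2] from the MGF of a small discrete distribution.
import math

pmf = {-3: 1/6, 6: 1/2, 9: 1/3}

def mgf(t):
    return sum(math.exp(t * x) * p for x, p in pmf.items())

h = 1e-4
first = (mgf(h) - mgf(-h)) / (2 * h)               # central difference, approx M'(0) = E[X]
second = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2  # approx M''(0) = E[X^2]

print(first, second)                               # close to 5.5 and 46.5
print(sum(x * p for x, p in pmf.items()),          # exact moments for comparison
      sum(x ** 2 * p for x, p in pmf.items()))
```
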
4. Chebyshev’s Inequality

Statement:
For any random variable $X$ with finite mean $\mu$ and variance $\sigma^2$, and for any $k > 0$:

$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$

This inequality gives an upper bound on the probability that the value of a random variable
deviates from its mean by more than k standard deviations.

Use:
Chebyshev’s inequality is used to prove the Weak Law of Large Numbers and to give non-
parametric bounds on dispersion.
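
A quick simulation sketch (assuming NumPy; the exponential distribution and the values of $k$ are arbitrary choices for illustration) comparing the actual tail probability with the Chebyshev bound $1/k^2$:

```python
# Compare P(|X - mu| >= k*sigma) with the Chebyshev bound 1/k^2.
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)   # mean = 1, standard deviation = 1
mu, sigma = 1.0, 1.0

for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, empirical, 1 / k**2)   # the empirical tail stays below the bound
```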

5. Weak Law of Large Numbers (WLLN)

Statement:
Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with finite mean $\mu$. Then the sample mean
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ satisfies:

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \epsilon) = 0 \quad \text{for any } \epsilon > 0$$

This means the sample average converges in probability to the expected value.

Proof uses Chebyshev's Inequality (assuming finite variance $\sigma^2$):

$$P(|\bar{X}_n - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2} \to 0 \ \text{as } n \to \infty$$

6. Central Limit Theorem (CLT)

Statement:
Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with mean $\mu$ and variance $\sigma^2$. Then, as $n \to \infty$, the standardized sum

$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \to N(0, 1)$$

In other words, the distribution of $Z_n$ tends to the standard normal distribution.

Implication:
No matter the original distribution of $X_i$, the sample sum or mean becomes approximately
normal for large $n$. This is the basis of many statistical inference techniques.

Example:
If tossing a biased coin n times (say probability of head = 0.6), the distribution of number of
heads (a binomial variable) will resemble a normal curve as n becomes large.
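
A compact simulation sketch of the coin example (assuming NumPy; the number of repetitions is an arbitrary choice): standardize the head counts from many runs of $n = 1000$ tosses with $p = 0.6$ and compare two tail probabilities with the standard normal values.

```python
# CLT check: standardized Binomial(n, p) counts look standard normal for large n.
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 1000, 0.6, 50_000

heads = rng.binomial(n, p, size=reps)
z = (heads - n * p) / np.sqrt(n * p * (1 - p))   # standardized sums

# Compare with the standard normal values Phi(-1) ~ 0.159 and Phi(-2) ~ 0.023.
print(np.mean(z <= -1.0), np.mean(z <= -2.0))
```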

Unit 3: Methods of Estimation

1. Difference Between Likelihood and Probability

Probability is used when the parameters are known, and we compute the likelihood of
observing a particular data point or set of data.
Example: If we know a coin is fair (P (H) = 0.5), the probability of getting 2 heads in 2
tosses is 0.5 × 0.5 = 0.25.
Likelihood is used when data is observed, and we want to estimate the unknown
parameters that best explain the data.
Example: If we toss a coin twice and get two heads, the likelihood of different values of p
(probability of heads) can be evaluated by L(p) = p2 . The value of p that maximizes this
likelihood is taken as the estimate.

2. Parameter Space

The parameter space is the set of all possible values that a parameter can take.

For example, if p is the probability of success in a Bernoulli trial, then the parameter
space is 0 ≤ p ≤ 1.
For a normal distribution N (μ, σ 2 ), the parameter space is μ ∈ R, σ 2 > 0.

3. Characteristics of Estimators

Let $\hat{\theta}$ be an estimator of the parameter $\theta$:

Unbiasedness: $E[\hat{\theta}] = \theta$. The estimator is correct on average.

Consistency: $\hat{\theta} \to \theta$ in probability as sample size $n \to \infty$.
Efficiency: Among all unbiased estimators, the one with the smallest variance is called
efficient.

Sufficiency: An estimator is sufficient if it uses all the information in the sample about
the parameter.

Minimum Variance Unbiased Estimator (MVUE): An unbiased estimator that has the
smallest variance among all unbiased estimators.

4. Method of Maximum Likelihood Estimation (MLE)

A general method of estimating parameters by maximizing the likelihood function.

Given a sample $X_1, X_2, \ldots, X_n$ and probability density function $f(x; \theta)$, the likelihood is:

$$L(\theta) = \prod_{i=1}^{n} f(X_i; \theta)$$

Often the log-likelihood is used for simplification:

$$\log L(\theta) = \sum_{i=1}^{n} \log f(X_i; \theta)$$

Set the derivative of the log-likelihood to zero:

$$\frac{d}{d\theta} \log L(\theta) = 0$$

Solve for $\theta$ to get the MLE.

Example: For a normal distribution $N(\mu, \sigma^2)$ with known variance:

$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(X_i - \mu)^2}{2\sigma^2}\right)$$

Taking the log and maximizing leads to:

$$\hat{\mu}_{MLE} = \frac{1}{n} \sum X_i$$

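An illustrative sketch of this result (not from the original notes; the simulated data, known $\sigma$, and grid search are assumptions made for the example): maximize the normal log-likelihood numerically over a grid of $\mu$ values and confirm the maximizer matches the sample mean.

```python
# Numerical MLE for mu in N(mu, sigma^2) with known sigma: should equal the sample mean.
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=10.0, scale=2.0, size=500)
sigma = 2.0

def log_likelihood(mu):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (data - mu) ** 2 / (2 * sigma**2))

grid = np.linspace(8.0, 12.0, 4001)
mu_hat = grid[np.argmax([log_likelihood(m) for m in grid])]

print(mu_hat, data.mean())   # the two agree up to the grid spacing (0.001)
```
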
5. Method of Minimum Variance

Aims to find an unbiased estimator $\hat{\theta}$ such that:

$\operatorname{Var}(\hat{\theta})$ is minimized among all unbiased estimators.


Rao-Blackwell Theorem is often used to derive MVUE using a sufficient statistic.

6. Method of Moments (MoM)

Based on the idea that population moments can be estimated using sample moments.

If $\mu'_1, \mu'_2, \ldots, \mu'_k$ are the first $k$ population moments expressed as functions of the
parameters $\theta_1, \theta_2, \ldots, \theta_k$, then equate:

$$\mu'_j = \frac{1}{n} \sum_{i=1}^{n} X_i^j \quad \text{for } j = 1, 2, \ldots, k$$

Solve the equations to get estimates of parameters.

Example: For an exponential distribution $f(x; \lambda) = \lambda e^{-\lambda x}$, the first moment is $\mu = 1/\lambda$.
Using the sample mean $\bar{X}$, we equate:

$$\bar{X} = \frac{1}{\hat{\lambda}} \Rightarrow \hat{\lambda} = \frac{1}{\bar{X}}$$

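A short sketch of this estimator on simulated data (assuming NumPy; the true rate and sample size are arbitrary illustrative choices):

```python
# Method of moments for the exponential rate: lambda_hat = 1 / sample_mean.
import numpy as np

rng = np.random.default_rng(5)
true_rate = 2.5
data = rng.exponential(scale=1 / true_rate, size=10_000)

lambda_hat = 1.0 / data.mean()
print(lambda_hat)   # close to 2.5 for a large sample
```
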
7. Method of Least Squares

Commonly used in regression analysis where the aim is to minimize the sum of squared
deviations between observed and predicted values.

For the linear model $Y = \beta_0 + \beta_1 X + \epsilon$, define the residual sum of squares:

$$S(\beta_0, \beta_1) = \sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2$$

Take partial derivatives of $S$ w.r.t. $\beta_0, \beta_1$, set them to zero, and solve:

$$\hat{\beta}_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
This method ensures the best linear unbiased estimates (BLUE) under standard assumptions.
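
A minimal sketch of these closed-form estimates (assuming NumPy; the data values are taken from sample question 4 in the 10-marks list later in these notes):

```python
# Closed-form least squares estimates for Y = b0 + b1*X.
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2, 4, 5, 4, 5], dtype=float)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

print(b0, b1)            # 2.2 and 0.6 for this data
print(b0 + b1 * 6)       # predicted Y at X = 6: 5.8
```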

Unit 3: Methods of Estimation (Continued)

Sampling

Sampling is the process of selecting a subset of individuals or observations from a larger


population to estimate characteristics of the whole population. Since it is often impractical or
expensive to collect data from every member of a population, sampling provides a practical
way to draw inferences about the population.

Types of Sampling

1. Random Sampling (Simple Random Sampling):


Every unit of the population has an equal chance of being selected. Selection is entirely
by chance.
Example: Picking names from a hat.

2. Stratified Sampling:
The population is divided into subgroups (strata) based on a particular characteristic (like
age, gender), and samples are randomly drawn from each subgroup.
Example: Surveying both male and female participants separately.

3. Systematic Sampling:
Every kth unit from a list is selected after a random starting point.
Example: Selecting every 10th student from a roll list.

4. Cluster Sampling:
The population is divided into clusters (groups), some clusters are randomly selected,
and then all or some elements within those clusters are studied.
Example: Selecting a few classrooms and interviewing all students within those classes.

5. Convenience Sampling:
Sample is taken from a group that is easy to access. This is non-probabilistic and may
involve biases.
Example: Interviewing people in a nearby park.

6. Quota Sampling:
The population is segmented, and a quota is set for each segment. Within each quota,
sampling is done conveniently or randomly.
Example: Choosing 10 people from each age group for a survey.

Algorithms Using Regression

1. Gradient Descent Algorithm

Gradient Descent is an optimization algorithm used to minimize the cost function in


regression and other learning algorithms.

The idea is to update parameters (e.g., weights θ ) in the opposite direction of the
gradient of the cost function with respect to the parameters.

Cost Function (for linear regression):

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$$

where $h_\theta(x) = \theta_0 + \theta_1 x$

Update Rule:

$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$

where $\alpha$ is the learning rate.

Example: For linear regression with one feature:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})$$

$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\, x^{(i)}$$

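A compact sketch of these update rules (assuming NumPy; the synthetic data, learning rate, and iteration count are arbitrary choices for illustration):

```python
# Batch gradient descent for one-feature linear regression h(x) = theta0 + theta1 * x.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])   # roughly y = 1 + 2x with noise
m = len(x)

theta0, theta1, alpha = 0.0, 0.0, 0.05
for _ in range(2000):
    pred = theta0 + theta1 * x
    theta0 -= alpha * (1 / m) * np.sum(pred - y)
    theta1 -= alpha * (1 / m) * np.sum((pred - y) * x)

print(theta0, theta1)   # close to the least-squares fit (intercept near 1, slope near 2)
```
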
2. Locally Weighted Regression (LWR)

Locally Weighted Regression is a non-parametric algorithm that fits multiple linear models
locally to subsets of the data.

Instead of using a single global model for the entire dataset, LWR fits a model at a target
query point x using nearby training data.

A weight is assigned to each training example based on its distance from x.


A common weighting function is:

$$w^{(i)} = \exp\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right)$$

where $\tau$ controls the decay rate (bandwidth parameter).

The cost function becomes:

$$J(\theta) = \sum_{i=1}^{m} w^{(i)} (h_\theta(x^{(i)}) - y^{(i)})^2$$

A separate θ is computed for each query point using weighted linear regression.

LWR is computationally expensive at test time but offers flexibility and good local fitting.
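
A minimal sketch of a prediction at one query point (assuming NumPy; the sine-curve data, bandwidth, and the use of weighted normal equations to minimize the weighted cost are illustrative choices, not the only way to implement LWR):

```python
# Locally weighted regression: fit a weighted linear model at one query point x_q.
import numpy as np

def lwr_predict(x_train, y_train, x_q, tau=0.3):
    X = np.column_stack([np.ones_like(x_train), x_train])    # design matrix [1, x]
    w = np.exp(-(x_train - x_q) ** 2 / (2 * tau ** 2))        # Gaussian weights
    W = np.diag(w)
    # Weighted normal equations: theta = (X^T W X)^{-1} X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_train)
    return theta[0] + theta[1] * x_q

x_train = np.linspace(0, 6, 30)
y_train = np.sin(x_train)                     # a nonlinear target, fitted locally
print(lwr_predict(x_train, y_train, 1.5), np.sin(1.5))  # local fit tracks the true curve
```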

3. Logistic Regression

Logistic Regression is used for binary classification problems, where the output is 0 or 1.

The hypothesis is:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

This is a sigmoid function that maps any real-valued number into the (0,1) interval.

Cost Function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$

The cost function is convex, allowing gradient descent to converge to the global
minimum.

Gradient Descent Update:

$$\theta := \theta - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})\, x^{(i)}$$

Logistic regression is interpretable and works well when the data is linearly separable.
For multi-class classification, softmax regression (multinomial logistic regression) is
used.
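
A small training sketch of this update rule (assuming NumPy; the toy 1-D data, labels, learning rate, and iteration count are made up for illustration):

```python
# Logistic regression trained by gradient descent on a toy 1-D dataset.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: larger x values are more likely to be labelled 1.
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])       # add an intercept column
theta = np.zeros(2)
alpha, m = 0.1, len(y)

for _ in range(5000):
    h = sigmoid(X @ theta)
    theta -= alpha * (1 / m) * (X.T @ (h - y))  # gradient of the log-loss

print(theta)                        # decision boundary near x = 2.5 for this data
print(sigmoid(X @ theta).round(2))  # predicted probabilities for the training points
```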

MST 1 Solutions

1. Define a random variable and its types

A random variable (RV) is a function that assigns a real number to each outcome in a sample
space of a random experiment.

Types of Random Variables:

Discrete Random Variable: Takes countable values.


Example: Number of heads in 3 coin tosses (0,1,2,3).

Continuous Random Variable: Takes any value in a given interval (infinite uncountable
set).
Example: Time taken to run a race.

2. Compute the value of k for the following probability distribution of a random variable
X, P(X = 1) = 0.5, P(X = 2) = 0.3 and P(X = 3) = k

We know that the sum of all probabilities in a probability distribution must equal 1.

P (X = 1) + P (X = 2) + P (X = 3) = 1

0.5 + 0.3 + k = 1

k = 1 − 0.8 = 0.2

Answer: k = 0.2

3. Define the term correlation coefficient

The correlation coefficient (specifically, Pearson’s correlation coefficient, denoted by r )


measures the strength and direction of the linear relationship between two variables.

It is defined as:

$$r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

Where:

Cov(X, Y) is the covariance of X and Y.

$\sigma_X, \sigma_Y$ are the standard deviations of X and Y.

Range: $-1 \le r \le 1$
$r = 1$: Perfect positive linear correlation
$r = -1$: Perfect negative linear correlation
$r = 0$: No linear correlation

4. Explain a cumulative distribution function (CDF) and provide an example

A Cumulative Distribution Function (CDF) of a random variable X gives the probability that
X will take a value less than or equal to x:

F (x) = P (X ≤ x)

Properties of CDF:

Non-decreasing

$\lim_{x \to -\infty} F(x) = 0$

$\lim_{x \to \infty} F(x) = 1$

Example (for discrete RV):


Let P (X = 1) = 0.2, P (X = 2) = 0.3, P (X = 3) = 0.5
Then:

F (1) = P (X ≤ 1) = 0.2

F (2) = P (X ≤ 2) = 0.2 + 0.3 = 0.5

F (3) = P (X ≤ 3) = 0.2 + 0.3 + 0.5 = 1

5. Explain the concept of independence for two random variables X and Y

Two random variables X and Y are independent if the occurrence of one does not affect
the probability distribution of the other.

Mathematical condition:
For all values of x and y ,

P (X = x and Y = y) = P (X = x) ⋅ P (Y = y)

Example:
Let P (X = 1) = 0.5, P (Y = 1) = 0.4, and P (X = 1, Y = 1) = 0.2
Then X and Y are independent if

P (X = 1, Y = 1) = 0.5 ⋅ 0.4 = 0.2 ⇒ Yes, they are independent

If this condition fails for even one pair, X and Y are not independent.

6. Calculate the Pearson correlation coefficient for the following data
X (Father's height in inches): 65, 66, 67, 67, 68, 69, 70, 72
Y (Son's height in inches): 61, 68, 65, 68, 72, 72, 64, 71

We use the Pearson correlation coefficient formula:

$$r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}$$

Step 1: Compute required values

| X | Y | X² | Y² | XY |
|---|---|----|----|----|
| 65 | 61 | 4225 | 3721 | 3965 |
| 66 | 68 | 4356 | 4624 | 4488 |
| 67 | 65 | 4489 | 4225 | 4355 |
| 67 | 68 | 4489 | 4624 | 4556 |
| 68 | 72 | 4624 | 5184 | 4896 |
| 69 | 72 | 4761 | 5184 | 4968 |
| 70 | 64 | 4900 | 4096 | 4480 |
| 72 | 71 | 5184 | 5041 | 5112 |
| ΣX = 544 | ΣY = 541 | ΣX² = 37028 | ΣY² = 36699 | ΣXY = 36820 |

Step 2: Plug into formula

$$r = \frac{8(36820) - (544)(541)}{\sqrt{[8(37028) - 544^2][8(36699) - 541^2]}}$$

$$= \frac{294560 - 294304}{\sqrt{[296224 - 295936][293592 - 292681]}}$$

$$= \frac{256}{\sqrt{288 \cdot 911}} = \frac{256}{\sqrt{262368}} \approx \frac{256}{512.22} \approx 0.500$$

$r \approx 0.500$: a moderate positive correlation.

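The arithmetic above can be cross-checked in a couple of lines of NumPy (a verification sketch, not part of the original solution):

```python
# Cross-check of the father/son height correlation.
import numpy as np

X = np.array([65, 66, 67, 67, 68, 69, 70, 72], dtype=float)
Y = np.array([61, 68, 65, 68, 72, 72, 64, 71], dtype=float)

print(np.corrcoef(X, Y)[0, 1])   # approximately 0.4998, i.e. about 0.50
```
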
7. The length of time (in minutes) that a certain lady speaks on the telephone is found to be a
random phenomenon with a probability distribution specified by the pdf

$$f(x) = Ae^{-x/5}, \quad x \ge 0; \qquad f(x) = 0 \text{ otherwise}$$

(a) Determine the value of A that makes f(x) a valid pdf

$$\int_0^{\infty} A e^{-x/5}\, dx = 1$$
$$A \int_0^{\infty} e^{-x/5}\, dx = 1$$
$$A \cdot \left[-5e^{-x/5}\right]_0^{\infty} = 1$$
$$A \cdot (0 + 5) = 1 \Rightarrow A = \frac{1}{5} = 0.2$$

A = 0.2

(b) Calculate the probability that she will talk on the phone for more than 10 minutes

$$P(X > 10) = \int_{10}^{\infty} 0.2 e^{-x/5}\, dx = 0.2 \cdot \left[-5e^{-x/5}\right]_{10}^{\infty} = 0.2 \cdot [0 + 5e^{-2}] = e^{-2} \approx 0.1353$$

P(X > 10) ≈ 0.1353

MST 2 solutions

1. Define the probability density function (pdf) of a random variable Y when Y = g(X),
where g(X) is a continuous function.
Solution:
If $Y = g(X)$ and $g$ is a continuous and differentiable function with a strictly monotonic
inverse $g^{-1}(y)$, then the probability density function (pdf) of Y is given by:

$$f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right|$$

Here, $f_X(x)$ is the pdf of the original variable $X$, and this formula is valid where $g^{-1}(y)$
exists and is differentiable.

2. State the formula for the expectation of the product of two independent random
variables.
Solution:
If X and Y are two independent random variables, then:

E[XY ] = E[X] ⋅ E[Y ]

3. Describe the process of finding the expected value of a discrete random variable X
given that X takes values 0, 1, and 2 with probabilities 0.3, 0.4, and 0.3 respectively.
Solution:
The expected value E[X] of a discrete random variable is calculated as:

$$E[X] = \sum_i x_i \cdot P(x_i)$$

Here:

E[X] = 0 ⋅ 0.3 + 1 ⋅ 0.4 + 2 ⋅ 0.3 = 0 + 0.4 + 0.6 = 1.0

Expected value E[X] = 1.0

4. Define the moment generating function (mgf) of a random variable.
Solution:
The moment generating function (MGF) of a random variable X is defined as:

MX (t) = E[etX ]

It is used to generate the moments (like mean, variance) of the distribution by differentiating
the MGF with respect to t and evaluating at t = 0. That is:
$$E[X^n] = M_X^{(n)}(0)$$

if the MGF exists in an open interval around t = 0.

5. Compute the MGF of a continuous random variable with the probability density
function $f_X(x) = 2x$ for $0 \le x \le 1$.
Solution:
The moment generating function is given by:

$$M_X(t) = \int_0^1 e^{tx} \cdot 2x\, dx$$

To solve this, use integration by parts. Let:

$u = 2x \Rightarrow du = 2\,dx$
$dv = e^{tx}\,dx \Rightarrow v = \dfrac{e^{tx}}{t}$

Then,

$$M_X(t) = \left[\frac{2x \cdot e^{tx}}{t}\right]_0^1 - \int_0^1 \frac{2e^{tx}}{t}\, dx = \frac{2e^t}{t} - \frac{2}{t} \int_0^1 e^{tx}\, dx = \frac{2e^t}{t} - \frac{2}{t^2}(e^t - 1)$$

$$M_X(t) = \frac{2e^t}{t} - \frac{2(e^t - 1)}{t^2}$$

This is the MGF of X for the given pdf (valid for $t \ne 0$; at $t = 0$, $M_X(0) = 1$).

6. Let X be a random variable with the following probability distribution:

$X: -3, 6, 9$

$P(X = x): \dfrac{1}{6}, \dfrac{1}{2}, \dfrac{1}{3}$

Calculate $E(X)$, $E(X^2)$ and, using the laws of expectation, evaluate $E((2X + 1)^2)$ for
the given probability distribution of X.

Solution:

Step 1: Compute E(X)

$$E(X) = \sum x \cdot P(X = x) = (-3) \cdot \frac{1}{6} + 6 \cdot \frac{1}{2} + 9 \cdot \frac{1}{3} = -0.5 + 3 + 3 = 5.5$$

Step 2: Compute E(X²)

$$E(X^2) = \sum x^2 \cdot P(X = x) = (-3)^2 \cdot \frac{1}{6} + 6^2 \cdot \frac{1}{2} + 9^2 \cdot \frac{1}{3} = 1.5 + 18 + 27 = 46.5$$

Step 3: Evaluate E((2X + 1)²)

$$(2X + 1)^2 = 4X^2 + 4X + 1$$

$$E((2X + 1)^2) = 4E(X^2) + 4E(X) + 1 = 4(46.5) + 4(5.5) + 1 = 186 + 22 + 1 = 209$$

Final Answers:

$E(X) = 5.5$
$E(X^2) = 46.5$
$E((2X + 1)^2) = 209$

7. Compute the Moment Generating Function (MGF) about origin of a Normal
Distribution.

Let $X \sim N(\mu, \sigma^2)$. We need to find the moment generating function (MGF), defined by:

$$M_X(t) = E[e^{tX}]$$

Solution:

Step 1: Start with the definition

Let X be a continuous random variable with PDF:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

The MGF is given by:

$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx = \int_{-\infty}^{\infty} e^{tx} \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx$$

Step 2: Combine exponentials

$$M_X(t) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(tx - \frac{(x - \mu)^2}{2\sigma^2}\right) dx$$

We use the known result after completing the square.

Step 3: Final simplified expression

The moment generating function of $X \sim N(\mu, \sigma^2)$ is:

$$M_X(t) = \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right)$$

This formula is valid for all real $t \in \mathbb{R}$.

Final Answer:

$$M_X(t) = \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right)$$

Sample Questions for Unit 3

2-Marks Questions (12)

1. Define sampling and list two common types.

2. State the difference between likelihood and probability.

3. Define parameter space with an example.

4. List two properties of an unbiased estimator.

5. Write the general formula for the Maximum Likelihood Estimator (MLE).

6. Define the method of moments.

7. What is the method of least squares?

8. A sample contains values: 4, 7, 9. Estimate the population mean using the method of
moments.

9. In simple random sampling, what is the probability that the first item selected is the
largest in the population of size n?

10. Define logistic regression.

11. Write one key difference between gradient descent and locally weighted regression.

12. What does the term “independent identically distributed” (i.i.d) mean?

5-Marks Questions (6)

1. Derive the method of moments estimators for the parameters of an exponential


distribution.

2. A sample of 5 values: 2, 3, 5, 7, 9. Estimate the population variance using the method of
moments.

3. Find the MLE of p for a binomial distribution based on the observation: 3 successes in 5
trials.

4. Use gradient descent (one iteration) to update weights for minimizing the function
J(θ) = (θx − y)2 , where x = 2, y = 5, θ = 1, and learning rate α = 0.1.
5. Describe stratified and systematic sampling with appropriate examples.

6. Consider a regression model Y = a + bX . Given data:


X = [1, 2, 3], Y = [2, 4, 5],
find the best-fit line using the method of least squares.

10-Marks Questions (6)

1. A random sample of size 10 from a normal distribution yielded the following values:
6, 8, 9, 10, 7, 9, 11, 12, 8, 10.
(a) Estimate the population mean and variance using the method of moments.
(b) Derive the maximum likelihood estimators for the same parameters.

2. Perform two iterations of gradient descent for minimizing the cost function
$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} (\theta x_i - y_i)^2$
using data: $(x_1 = 1, y_1 = 2)$, $(x_2 = 2, y_2 = 3)$, initial $\theta = 0$, learning rate $\alpha = 0.1$.

3. For the logistic regression hypothesis function
$h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$,
derive the cost function and explain how gradient descent is used to minimize it.

4. A sample of data is: X = [1, 2, 3, 4, 5], Y = [2, 4, 5, 4, 5].


Fit a regression line using the least squares method and compute the predicted value of
Y when X = 6.
5. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with pdf
$f(x; \lambda) = \lambda e^{-\lambda x}$, $x \ge 0$.
Derive the MLE for $\lambda$ based on a sample of size $n$.

6. Explain in detail the various types of sampling methods:


(a) Simple random sampling
(b) Stratified sampling

(c) Cluster sampling
(d) Systematic sampling
Include relevant examples and illustrations.

