Section 4
1. Independent Variables
The idea of independence between two events or two variables is a very important one
in Statistics, Econometrics and Finance. We first consider the following definition:
two events 𝐴 and 𝐵 are independent if and only if 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵).
So now we see “two events 𝐴 and 𝐵 are independent” does not mean 𝐴 and 𝐵
cannot happen at the same time. Consider the example: 𝐴 is the event {𝑀𝑜𝑛𝑑𝑎𝑦} and
𝐵 the event of {𝑟𝑎𝑖𝑛𝑦 𝑑𝑎𝑦} . We can agree that the two events are independent,
because there is no scientific reason at all to suggest that Monday tends to have some
specific weather conditions. But they certainly can happen at the same time, that is,
we can have a rainy Monday.
In fact, when two events cannot happen at the same time, or we say the two events
have no intersection, or their intersection is a null set {∅}, such events are called
mutually exclusive. In other words, if 𝐴 occurs then 𝐵 cannot occur, and vice versa.
In this case, 𝐴 and 𝐵 are actually dependent events. This is because the idea of
independence means that “knowing information about 𝐴 has no value at all (is useless,
not helpful) to evaluate the probability of occurrence of 𝐵.” For mutually exclusive
events, knowing that 𝐴 occurred tells us for sure that 𝐵 did not occur. In terms of
conditional probability, independence means:
𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵) = 𝑃(𝐴)
And therefore, 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴|𝐵)𝑃(𝐵) = 𝑃(𝐴)𝑃(𝐵).
For example, we toss a coin twice, and the results of the first and second tosses are, of
course, independent. Therefore, the probability of seeing two heads {𝐻𝐻} is equal to
(1/2)(1/2) = 1/4.
And this result suggests the joint density is the product of two marginal densities:
𝑓(𝑥, 𝑦) = 𝑓𝑋(𝑥)𝑓𝑌(𝑦). Two useful consequences are: (a) if 𝑋 and 𝑌 are independent,
then 𝐸[𝑋𝑌] = 𝐸[𝑋]𝐸[𝑌]; (b) two variables 𝑔(𝑋) and ℎ(𝑌) are also independent for all
functions 𝑔 and ℎ.
1 See the review in Chapter 2 of Taylor (2005).
𝐸[𝑋𝑌] = ∬ 𝑥𝑦 𝑓(𝑥, 𝑦) d𝑦 d𝑥 = ∬ 𝑥𝑦 𝑓𝑋(𝑥)𝑓𝑌(𝑦) d𝑦 d𝑥 = (∫ 𝑥𝑓𝑋(𝑥) d𝑥)(∫ 𝑦𝑓𝑌(𝑦) d𝑦) = 𝐸[𝑋]𝐸[𝑌]
where all integrals run over (−∞, ∞).
However, when Cov(𝑋, 𝑌) = 𝜌𝑋,𝑌 = 0, this does NOT mean 𝑋 and 𝑌 are independent.
For example, if 𝑋 ~ 𝑁(0, 1) and 𝑌 = 𝑋², then Cov(𝑋, 𝑌) = 𝐸[𝑋³] = 0, yet 𝑌 is
completely determined by 𝑋. The only instance in which Cov(𝑋, 𝑌) = 𝜌𝑋,𝑌 = 0 does
imply that 𝑋 and 𝑌 are independent is when (𝑋, 𝑌) has a bivariate normal distribution.
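We can check this numerically. Below is a minimal simulation sketch, assuming NumPy is installed (the seed and sample size are only for reproducibility): with 𝑋 ~ 𝑁(0, 1) and 𝑌 = 𝑋², the sample correlation is near zero even though 𝑌 is a function of 𝑋.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(1_000_000)  # X ~ N(0, 1)
y = x ** 2                          # Y is completely determined by X

# Correlation is ~0 (since E[X^3] = 0), yet X and Y are clearly dependent.
print(np.corrcoef(x, y)[0, 1])
```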
2. The Bernoulli and Binomial Distributions
Let 𝑋 be a random variable which can only take two values 0 and 1. Suppose the
probability of 𝑋 = 1 is 𝑝; then the pmf of 𝑋 is 𝑝𝑋(𝑥) = 𝑝^𝑥 (1 − 𝑝)^(1−𝑥),
for 𝑥 = 0, 1. This random variable 𝑋 is like the outcome from tossing a coin once,
and we can define “head” = 1 and “tail” = 0. So 𝑝 is the probability of seeing “head”,
and (1 − 𝑝) the probability of seeing “tail”. We write 𝑋 ~ 𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖(𝑝), and its
mean and variance are:
𝐸[𝑋] = 1 ∙ 𝑝 + 0 ∙ (1 − 𝑝) = 𝑝,
Var(𝑋) = 𝐸[𝑋²] − (𝐸[𝑋])² = 𝑝 − 𝑝² = 𝑝(1 − 𝑝).
When we repeat the Bernoulli trial several times, say we toss a coin 𝑛 times, the
number of successes 𝑋 can be 0, 1, 2, …, 𝑛. Note that the trials are independent of
each other, just like the coin tosses used to define the Bernoulli distribution. In this setup,
we say 𝑋 has a Binomial distribution and is written as 𝑋 ~ 𝐵𝑖𝑛(𝑛, 𝑝). The pmf of
𝑋 ~ 𝐵𝑖𝑛(𝑛, 𝑝) is:
𝑝𝑋(𝑥) = 𝐶𝑥𝑛 𝑝^𝑥 (1 − 𝑝)^(𝑛−𝑥), for 𝑥 = 0, 1, …, 𝑛, where 𝐶𝑥𝑛 = 𝑛!/(𝑥!(𝑛 − 𝑥)!).
We have the factor 𝐶𝑥𝑛 because we do not care about the order of the 𝑥 successes
out of the 𝑛 trials. We can think of this like a baseball season: a team plays 𝑛
games and wins 𝑥 of them, and the season record only counts how many wins, not the
order in which they happen. The mean and variance are:
𝐸[𝑋] = 𝑛𝑝, Var(𝑋) = 𝑛𝑝(1 − 𝑝) (7)
We can see that indeed when 𝑛 = 1, the results are the same as 𝑋 ~ 𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖(𝑝).
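As a quick numerical check, here is a minimal sketch assuming NumPy and SciPy are installed (the values of 𝑛, 𝑝 and the simulation size are only illustrative):

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
rng = np.random.default_rng(0)
x = rng.binomial(n, p, size=200_000)   # 200,000 draws of X ~ Bin(n, p)

print(x.mean(), n * p)                 # sample mean vs. E[X] = np
print(x.var(), n * p * (1 - p))        # sample variance vs. Var(X) = np(1 - p)
print(binom.pmf(3, n, p))              # P(X = 3) = C(10,3) (0.3)^3 (0.7)^7
```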
The Binomial distribution or, very often, the binomial expansion, has many
important applications. For example, we use a binomial tree model for stock prices to
price options: the number of up-moves over 𝑛 periods follows a Binomial
distribution. Last but not least, the Binomial distribution is related to both the normal
distribution (through the Central Limit Theorem below) and the Poisson distribution
(see Appendix III).
➢ Exercise
(1) Let 𝑋 be the number of heads (successes) out of 𝑛 = 7 tosses of a coin. Write down
the pmf of 𝑋.
(2) Suppose 𝑋 ~ 𝐵𝑖𝑛(𝑛, 1/3); what is the smallest value of 𝑛 such that the
3. The Poisson Distribution
The Poisson distribution, like the Bernoulli and Binomial distributions, is a discrete-
valued distribution. It can be derived from the following infinite series:
1 + 𝜆 + 𝜆²/2! + 𝜆³/3! + ⋯ = ∑_{𝑥=0}^{∞} 𝜆^𝑥/𝑥!
From Calculus, we know that the above sum converges to the limit2:
1 + 𝜆 + 𝜆²/2! + 𝜆³/3! + ⋯ = ∑_{𝑥=0}^{∞} 𝜆^𝑥/𝑥! = 𝑒^𝜆 (8)
For more about the exponential function 𝑓(𝜆) = 𝑒^𝜆, 𝜆 ∈ ℝ, please see Appendix II.
So what is a Poisson random variable? First, 𝑋 is discrete, taking values
𝑥 = 0, 1, 2, 3, …
Secondly, 𝑋 is the number of occurrences of some event in a unit time interval3. For
example, the event can be an earthquake and we are interested in the number of
earthquakes in one year. As another example, the event can be “a customer walking into
a bank (or McDonald’s or Family Store)” and we would like to know how many
customers show up at a bank during one hour. The number can be 8, or 13, or any
other value in 0, 1, 2, 3, …
2 Indeed, (8) is the Taylor series expansion of the function 𝑓(𝑥) = 𝑒^𝑥.
3 It can be defined over space, or a unit area, too.
A very important property of the Poisson distribution is that, over any two
non-overlapping time intervals, the numbers of occurrences are independent.
For example, over two hours, the number of events in the first hour may be 4, and the
number of events in the second hour may be 7 – the observation of 4 in the first hour
cannot help evaluate the possible outcome in the second hour. On the contrary, if we
see 4 customers walking into a bank during 09:00-10:00 and this information can be
used to predict the number of customers during the next hour 10:00-11:00, then in this
case the number of customers does not follow a Poisson distribution.
So when 𝑋 has a Poisson distribution, we can talk about the average rate of
occurrence, or the expected number of events in a fixed time interval like one hour or
one day. This rate of occurrence is constant and is the parameter of Poisson distribution,
which we call 𝜆. For example, suppose 𝜆 = 7.5 for the bank-customer example and
this means on average, there are 7.5 customers walking into a bank per hour. Sometimes
we see more customers, maybe 10, and sometimes we have fewer customers, maybe 6.
So what exactly is the pmf of the random variable 𝑋 when 𝑋 has a Poisson
distribution? Dividing both sides of (8) by 𝑒^𝜆, we obtain:
∑_{𝑥=0}^{∞} (𝜆^𝑥 𝑒^{−𝜆}/𝑥!) = 1 (9)
In (9), we have a sum of infinitely many terms, over all possible values of 𝑋 – recall
𝑥 = 0, 1, 2, 3, … Each term is nonnegative and the terms sum to one, so each term can
serve as the probability 𝑃(𝑋 = 𝑥). This gives the pmf of 𝑋 ~ 𝑃𝑜𝑖(𝜆):
𝑝𝑋(𝑥) = 𝜆^𝑥 𝑒^{−𝜆}/𝑥!, for 𝑥 = 0, 1, 2, … (10)
The expected value and the variance of 𝑋 ~ 𝑃𝑜𝑖(𝜆) are given as:
𝐸[𝑋] = 𝜆
Var(𝑋) = 𝜆
It turns out that the expected value = the variance = 𝜆. We skip the proof of this result
(although it is actually quite interesting). The Poisson distribution has some important
properties. The waiting time between two events has an exponential distribution (see
Section 5). The Poisson distribution can also be derived as the limit of the Binomial
distribution (see Appendix III).
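Here is a short numerical sketch (assuming SciPy is available; 𝜆 = 7.5 is from the bank-customer example above) checking that the pmf in (10) sums to one and that the mean and variance both equal 𝜆:

```python
import numpy as np
from scipy.stats import poisson

lam = 7.5                            # e.g. 7.5 customers per hour
x = np.arange(60)                    # truncate the infinite sum; the tail is negligible
pmf = poisson.pmf(x, lam)            # λ^x e^{-λ} / x!, as in (10)

print(pmf.sum())                     # ≈ 1, as in (9)
print((x * pmf).sum())               # E[X] ≈ λ
print(((x - lam) ** 2 * pmf).sum())  # Var(X) ≈ λ
```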
➢ Exercise
Suppose 𝑋 has the pmf:
𝑝(𝑥) = 2^𝑥 𝑒^{−2}/𝑥!, for 𝑥 = 0, 1, 2, …
What distribution does 𝑋 have?
Before we move on, we can have a look at this map of statistical distributions:
https://www.math.wm.edu/~leemis/chart/UDR/UDR.html
You may agree that the most important distributions include the standard uniform, the
exponential and the normal distributions, which we discuss next.

4. The Uniform Distribution
If 𝑋 has a uniform distribution over interval (𝑎, 𝑏), its pdf is given as:
𝑓(𝑥) = 1/(𝑏 − 𝑎), 𝑎 < 𝑥 < 𝑏 (11)
When 𝑎 = 0 and 𝑏 = 1, we have the standard uniform distribution, 𝑋 ~ 𝑈(0, 1). In
this case, 𝐸[𝑋] = 1/2 and Var(𝑋) = 1/12. Please verify this result:
𝐸[𝑋] = ∫_0^1 𝑥 d𝑥 = 1/2, 𝐸[𝑋²] = ∫_0^1 𝑥² d𝑥 = 1/3, and so
Var(𝑋) = 1/3 − (1/2)² = 1/12.
Thus, we can also know that for 𝑋 ~ 𝑈(𝑎, 𝑏):
𝐸[𝑋] = (𝑎 + 𝑏)/2
Var(𝑋) = (𝑏 − 𝑎)²/12 (12)
The cdf of 𝑋 ~ 𝑈(0, 1) is:
𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥) = 𝑥 = (𝑥 − 0)/(1 − 0), 0 < 𝑥 < 1
and, more generally, for 𝑋 ~ 𝑈(𝑎, 𝑏):
𝐹(𝑥) = (𝑥 − 𝑎)/(𝑏 − 𝑎), 𝑎 < 𝑥 < 𝑏 (13)
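A minimal simulation sketch of these results, assuming NumPy (the endpoints 𝑎 = 2, 𝑏 = 5 and the evaluation point 3.5 are arbitrary choices for illustration):

```python
import numpy as np

a, b = 2.0, 5.0
rng = np.random.default_rng(1)
x = rng.uniform(a, b, size=500_000)            # X ~ U(a, b)

print(x.mean(), (a + b) / 2)                   # E[X] = (a + b)/2
print(x.var(), (b - a) ** 2 / 12)              # Var(X) = (b - a)^2 / 12
print((x <= 3.5).mean(), (3.5 - a) / (b - a))  # empirical cdf vs. (13)
```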
5. Exponential Distribution
A variable 𝑋 has an exponential distribution when its pdf is:
𝑓(𝑥) = 𝜆𝑒^{−𝜆𝑥}, 𝑥 ≥ 0 (14)
We denote this as 𝑋 ~ 𝐸𝑥𝑝(𝜆). The parameter 𝜆 > 0 is the same parameter 𝜆 in the
Poisson distribution. The mean and variance are:
𝐸[𝑋] = 1/𝜆
Var(𝑋) = 1/𝜆² (15)
Thus, when 𝑋 ~ 𝐸𝑥𝑝(𝜆), its mean value and standard deviation are the same.
How is the exponential distribution related to the Poisson
distribution? As we mentioned earlier, the time between two events has an exponential
distribution when the number of events in a fixed time interval (like one day or one hour)
has a Poisson distribution. In the Poisson
distribution, 𝜆 is the average number of events in a time interval. For example, say
𝜆 = 0.2 per day and this means on average we see 0.2 occurrences of some
event (like car accidents) in one day. Equivalently, there is, on average, one event every 5 days,
which is given by (1/𝜆). Thus, (1/𝜆) is the average length of time between two
events. Sometimes we have 10.5 days between two car accidents, and sometimes it may
be much shorter, but the average gap is (1/𝜆) = 5 days.
An important conclusion is that, if the events are given by a Poisson distribution,
the time between two events will have an exponential distribution, and vice versa. One
simple thing we can do is to collect the times 𝑋 between events, and check if:
𝐸[𝑋] = 𝑆.𝐷.(𝑋), or equivalently,
𝑆.𝐷.(𝑋)/𝐸[𝑋] = 1 (16)
If this ratio is clearly larger than one or smaller than one, then the events are not given
by a Poisson distribution.
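To see check (16) in action, here is a minimal sketch assuming NumPy, using the car-accident example with 𝜆 = 0.2 per day (NumPy parameterizes the exponential by its scale, which is 1/𝜆):

```python
import numpy as np

lam = 0.2                                            # 0.2 events per day on average
rng = np.random.default_rng(2)
gaps = rng.exponential(scale=1 / lam, size=100_000)  # times between events, Exp(λ)

print(gaps.mean())               # ≈ 1/λ = 5 days between events
print(gaps.std() / gaps.mean())  # ≈ 1, as in (16)
```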
There is a lot more that can be said about the exponential distribution. For example, it
has a memoryless property: 𝑃(𝑋 > 𝑠 + 𝑡 | 𝑋 > 𝑠) = 𝑃(𝑋 > 𝑡) (again with an
interesting proof, which we skip). An exponential distribution is a special case of the
Gamma distribution, and also a special case of the Weibull distribution. Both Gamma
and Weibull are very useful in applications such as reliability and survival analysis.
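The memoryless property mentioned above can also be illustrated numerically; a minimal sketch assuming NumPy, with the mean 5 and the thresholds 𝑠 = 2, 𝑡 = 3 chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=5.0, size=1_000_000)  # Exp with mean 5
s, t = 2.0, 3.0

lhs = (x > s + t).mean() / (x > s).mean()  # P(X > s + t | X > s)
rhs = (x > t).mean()                       # P(X > t)
print(lhs, rhs)                            # approximately equal: memoryless
```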
➢ Exercise
(1) From the pdf 𝑓(𝑥) = 𝜆𝑒^{−𝜆𝑥}, 𝑥 ≥ 0, show that the cdf 𝐹(𝑥) for 𝑋 ~ 𝐸𝑥𝑝(𝜆) is
given by:
𝐹(𝑥) = 1 − 𝑒^{−𝜆𝑥}
(2) Let 𝑋 be the lifetime of a piece of equipment (measured in years) and assume it has cdf:
𝐹𝑋(𝑥) = 1 − 𝑒^{−𝑥} for 𝑥 ≥ 0, and 𝐹𝑋(𝑥) = 0 otherwise.
What is the pdf 𝑓𝑋(𝑥) of 𝑋, and what is the probability that this equipment lasts
beyond a given number of years?
6. Normal Distribution
A variable 𝑋 has normal distribution, denoted as 𝑋 ~ 𝑁(𝜇, 𝜎 2 ), when its pdf is:
𝑓(𝑥) = (1/√(2𝜋𝜎²)) exp(−(1/2)((𝑥 − 𝜇)/𝜎)²) (17)
for 𝑥 ∈ ℝ.
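As a quick sanity check of (17), a sketch assuming SciPy is installed (the values 𝜇 = 1, 𝜎 = 2, 𝑥 = 0.5 are arbitrary):

```python
import math
from scipy.stats import norm

mu, sigma, x = 1.0, 2.0, 0.5
# Evaluate the pdf in (17) by hand and compare with SciPy's implementation.
by_hand = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)
print(by_hand, norm.pdf(x, loc=mu, scale=sigma))  # the two values agree
```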
Errr… there is so much to say about the normal distribution. We will talk about the
Central Limit Theorem (CLT), the Jarque-Bera test4 for normality, and the bivariate
normal distribution.
We can only talk about the main idea (also its simplest form, see Coles 2001) of the CLT.
Suppose 𝑋1, …, 𝑋𝑛 are independent draws from the same distribution, with mean 𝜇
and variance 𝜎², and define the sample mean:
𝑋̄𝑛 = (𝑋1 + ⋯ + 𝑋𝑛)/𝑛 (18)
Then, for large 𝑛:
𝑋̄𝑛 →𝑑 𝑁(𝜇, 𝜎²/𝑛) (19)
Or equivalently,
√𝑛 (𝑋̄𝑛 − 𝜇)/𝜎 →𝑑 𝑁(0, 1) (20)
4 Jarque, C. M. and A. K. Bera (1987) A test for normality of observations and regression residuals,
International Statistical Review 55, 163-172.
The notation “→𝑑” means convergence in distribution. In practice, we say the sample
mean 𝑋̄𝑛 from a random sample will, approximately, have a normal distribution with
mean 𝜇 and variance (𝜎²/𝑛), when the sample size 𝑛 is large. Importantly, this
holds no matter what distribution the individual 𝑋𝑖 come from.
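A minimal simulation sketch of the CLT, assuming NumPy and SciPy (the population Exp(1), the sample size 𝑛 = 200 and the number of repetitions are arbitrary choices):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(4)
n, reps = 200, 50_000
# Population: Exp(1), which is very skewed, with mean μ = 1 and variance σ² = 1.
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

print(means.mean())     # ≈ μ = 1
print(means.var() * n)  # ≈ σ² = 1, i.e. Var of the sample mean ≈ σ²/n
print(skew(means))      # ≈ 0: the sampling distribution is close to normal
```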
From the CLT, we can use the sample mean 𝑋̄𝑛 to infer the population mean 𝜇.
For example, a (1 − 𝛼) confidence interval for 𝜇 is:
(𝑋̄𝑛 − 𝑧_{𝛼/2} 𝜎/√𝑛, 𝑋̄𝑛 + 𝑧_{𝛼/2} 𝜎/√𝑛)
When 𝛼 = 5%, we have 𝑧_{𝛼/2} = 1.96. Thus a 95% confidence interval covers the
true 𝜇 about 95% of the time in repeated samples.
And this is because we have the approximate normal distribution from the CLT. Of course,
when the population variance 𝜎² is unknown, we can replace 𝜎/√𝑛 by the sample
counterpart 𝑠/√𝑛, where 𝑠 is the sample standard deviation.
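A short sketch of this interval, assuming NumPy (the sample here is simulated from Exp(1), so the true mean is 𝜇 = 1, and 𝑛 = 400 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=400)  # a sample whose true mean is μ = 1

xbar, s, n = x.mean(), x.std(ddof=1), len(x)
z = 1.96                                  # z_{α/2} for α = 5%
half = z * s / np.sqrt(n)
print(xbar - half, xbar + half)           # covers μ = 1 in ~95% of repeated samples
```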
There are many tests to see if a sample comes from a normal distribution. The
Jarque-Bera test is very popular and its test statistic is given as:
JB = (𝑛/6) (𝑠𝑘𝑒𝑤² + (𝑘𝑢𝑟𝑡 − 3)²/4) ~ 𝜒²_{𝑑𝑓=2} (21)
Under the null hypothesis of normal distribution, the test statistic JB should be close to zero:
𝐻0: 𝑠𝑘𝑒𝑤 = 0, 𝑘𝑢𝑟𝑡 = 3 (the data is normal)
𝐻1: the data is not normal
Therefore, if the JB statistic value is very large, we have strong evidence that the data
does not come from a normal distribution. Note that the test statistic JB ≥ 0, since it is
a sum of squared terms.
The JB statistic is (approximately) chi-square distributed with two degrees of freedom.
This is because in the formula, we have to first calculate two statistics, the sample
skewness and the sample kurtosis, to obtain the JB value. At the 5% significance level, the
𝜒²_{𝑑𝑓=2} critical value is 5.99. Thus, if the JB value is larger than 5.99, we can reject the
null hypothesis and conclude that the data or sample is not from a normal distribution.
Note that the sample size 𝑛 is included in the JB test (21), so with a very large 𝑛,
even small deviations of the sample skewness and kurtosis from 0 and 3 can lead to a
large JB value and a rejection of normality.
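A sketch computing (21) by hand and comparing with SciPy's built-in test (assuming SciPy; note we pass fisher=False so that kurtosis is measured on the scale where the normal value is 3, matching (21)):

```python
import numpy as np
from scipy.stats import skew, kurtosis, jarque_bera

rng = np.random.default_rng(6)
x = rng.standard_normal(5_000)   # data that IS normal

n = len(x)
jb = n / 6 * (skew(x) ** 2 + (kurtosis(x, fisher=False) - 3) ** 2 / 4)  # formula (21)
print(jb)                        # small value; the 5% critical value is 5.99
print(jarque_bera(x).statistic)  # SciPy's JB value, essentially the same
```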
Finally, we want to emphasize that there are several important distributions which
are closely related to normal distribution. The first is Student’s 𝑡 distribution, which is
also symmetric (with skewness = 0), but has fat tails (with kurtosis > 3). The 𝑡
distribution has only one parameter 𝜈, and it converges to the normal distribution when
𝜈 → ∞. When ln 𝑋 has a normal distribution, we say 𝑋 has a lognormal distribution.
Both the 𝑡 and the lognormal distribution have many
important applications: the 𝑡 distribution is used in many hypothesis tests and also for
describing intraday financial returns, while the lognormal distribution is the standard
model for stock prices (as in the Black-Scholes model).
References
Chamberlain, A., Deriving the Poisson distribution from the binomial distribution,
https://medium.com/@andrew.chamberlain/deriving-the-poisson-distribution-from-
the-binomial-distribution-840cc1668239
Coles, S. (2001) An Introduction to Statistical Modeling of Extreme Values, Springer
Series in Statistics.
Taylor, S. J. (2005) Asset Price Dynamics, Volatility, and Prediction, Princeton
University Press.
Appendix I. Proof of (7)
When 𝑋 ~ 𝐵𝑖𝑛(𝑛, 𝑝), by definition it is the outcome of 𝑛 Bernoulli trials, which are
independent from each other with the probability of success 𝑝. Therefore, we can think
of 𝑋 as the sum 𝑋 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛, where each 𝑋𝑖 ~ 𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖(𝑝). Then:
𝐸[𝑋] = 𝐸[𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛] = 𝐸[𝑋1] + ⋯ + 𝐸[𝑋𝑛] = 𝑛𝑝
Var(𝑋) = Var(𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛) = Var(𝑋1) + ⋯ + Var(𝑋𝑛) = 𝑛𝑝(1 − 𝑝)
where the variance of the sum is the sum of the variances because the trials are
independent.
Appendix II. Must-knows of the Exponential Function 𝒇(𝒙) = 𝒆^𝒙
Perhaps we can say 𝑓(𝑥) = 𝑒^𝑥, 𝑥 ∈ ℝ, and 𝑔(𝑥) = ln 𝑥, 𝑥 > 0, are the two most
important functions in Statistics and Finance. In Finance, we define the (log) return as:
𝑟𝑡 = ln(𝑃𝑡/𝑃𝑡−1)
where 𝑃𝑡 is, for example, the price of a stock on day 𝑡. We do not use the percentage
return:
(𝑃𝑡 − 𝑃𝑡−1)/𝑃𝑡−1
The reason for this is that we use compound interest in continuous time, or
continuous compounding. To see the idea, suppose we deposit 1$ into
our bank account with an annual interest rate of 𝑟 = 4%. Let 𝑚 be the number of
interest payments per year. When 𝑚 = 1, after one year we have:
1 ∙ (1 + 4%)¹ = 1.04$
Suppose your bank offers you 𝑚 = 2, that is, it can pay you interest twice per year.
1 ∙ (1 + 4%/2)² = 1.0404$
Here we can see that, in the second half-year, there is compound interest, that is, interest
on interest, and thus the final result 1.0404 is larger than 1.04.
Your bank then offers you interest payments every month, and in this case:
1 ∙ (1 + 4%/12)¹² ≅ 1.0407$
This result is expected, because when interest is paid more frequently, there is more
interest on interest, so the final amount gets larger. A good question is: can we keep
increasing the payment frequency 𝑚?
What would happen if your bank pays interest every minute, or every second, or we
let 𝑚 → ∞, the number of interest payments per year, go to infinity? In this case:
lim_{𝑚→∞} (1 + 4%/𝑚)^𝑚 = 𝑒^{4%} ≅ 1.0408$
In other words, we obtain a limit which is given by the exponential function. In fact,
the calculation of compound interest is exactly how the exponential function was first
discovered, by Jacob Bernoulli5. In general:
lim_{𝑚→∞} (1 + 𝑟/𝑚)^𝑚 = 𝑒^𝑟
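We can watch this limit appear numerically; a minimal sketch in Python's standard library (the payment frequencies in the list are arbitrary):

```python
import math

r = 0.04
for m in (1, 2, 12, 365, 1_000_000):  # yearly, semi-annual, monthly, daily, ...
    print(m, (1 + r / m) ** m)        # increases toward the limit
print(math.exp(r))                    # e^r ≈ 1.0408, the continuous-compounding limit
```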
Thus, suppose a stock has price 𝑆 = 100$ today and, say, an annual log return 𝜇 = 12%.
5 He is one of several important Bernoullis!
After one year, the price of this stock becomes:
𝑆 ∙ 𝑒^𝜇 = 100 ∙ 𝑒^{0.12} ≅ 112.75$
Conversely, suppose we see the two prices 100 and 112.75; then how do we calculate
the return? We take:
𝜇 = ln(112.75/100) ≅ 0.12
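The same arithmetic in two lines of Python, as a quick check:

```python
import math

S, mu = 100.0, 0.12
price = S * math.exp(mu)    # 100 · e^0.12 ≈ 112.75
print(price)
print(math.log(price / S))  # recovers the log return μ = 0.12 exactly
```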
Appendix III. Poisson as the limit of Binomial
When 𝑛 → ∞ and 𝑝 → 0 such that 𝑛𝑝 = 𝜆 stays fixed, the 𝐵𝑖𝑛(𝑛, 𝑝) distribution
converges to 𝑃𝑜𝑖(𝜆). For the derivation, see:
https://medium.com/@andrew.chamberlain/deriving-the-poisson-distribution-from-the-binomial-distribution-840cc1668239
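A minimal sketch of this convergence, assuming SciPy (the values 𝜆 = 3 and 𝑘 = 2 are arbitrary):

```python
from scipy.stats import binom, poisson

lam, k = 3.0, 2
for n in (10, 100, 10_000):
    p = lam / n                  # keep np = λ fixed as n grows
    print(n, binom.pmf(k, n, p)) # Bin(n, λ/n) pmf at k approaches the Poisson pmf
print(poisson.pmf(k, lam))       # Poi(λ) pmf at k: the limit
```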