Probability Distributions

Probability distribution for biostatistics for undergraduate.

Definitions

1) A random variable: a characteristic whose observed values arise as a result of chance factors.

2) A probability distribution: a description of all possible values of a random variable and their associated probabilities. It gives the probability of each outcome of an experiment.

Types
A. Discrete probability distributions
• Define probabilities associated with discrete variables. A discrete random variable takes any of a specified finite or countable list of values. Discrete probability distributions include:
• Binomial probability distribution
• Poisson probability distribution
• Negative binomial probability distribution
• Multinomial probability distribution
• Hypergeometric probability distribution
B. Continuous probability distributions
• Define probabilities associated with continuous variables. Continuous probability distributions include:
• Normal distribution
• Student's t-distribution
• Chi-square distribution
• F-distribution
Normal distribution
The Normal (Gaussian) distribution describes a special class of distributions that are symmetric and can be described by two parameters:

(i) The mean of the distribution (μ) determines the center of the distribution

(ii) The standard deviation of the distribution (σ) determines the spread of the distribution

Changing the values of μ and σ alters the positions and shapes of the distributions
See example in excel

Probability density function
The probability density function (pdf), typically denoted f(x), gives the probability of obtaining a value within some interval as the area under the curve (AUC) over that interval:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^{2}}(x-\mu)^{2}\right), \qquad -\infty < x < \infty$$

The total area under the function f(x) is 1, i.e. $\int_{-\infty}^{\infty} f(x)\,dx = 1.0$, and $f(x) \ge 0$

The pdf defines the normal distribution
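As a rough check of the two properties of the normal pdf, the sketch below evaluates the density and approximates the area under it with the trapezoidal rule. The function names, the integration limits of ±10 and the step count are illustrative choices, not part of the original slides.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_pdf(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Trapezoidal approximation of the area under the pdf on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

# Assumption: +/-10 standard deviations stands in for (-infinity, infinity).
total_area = area_under_pdf(-10.0, 10.0)
print(total_area)  # very close to 1
```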
Standard Normal Distribution Table
Table rows show the whole number and tenths place of the z-score; table columns show the hundredths place. The cumulative probability (usually from minus infinity to the z-score) appears in the cell of the table.

e.g. P(Z<0.58) = 0.719 (row 0.5, column 0.08)
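Where a printed table is not at hand, the same cumulative probability can be computed directly. This is a minimal sketch assuming Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the helper name `table_lookup` is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def table_lookup(row, column):
    """Mimic a z-table: row = whole number and tenths, column = hundredths."""
    z = row + column
    return std_normal.cdf(z)  # cumulative probability P(Z < z)

p = table_lookup(0.5, 0.08)   # z = 0.58
print(round(p, 3))
```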
Standard normal distribution (Z)

Special distribution where μ = 0 and σ = 1:

Z ~ N(0,1)

The normal random variable of a standard normal distribution is called a standard score or a Z-score

Every normal random variable X can be transformed into a Z-score using the formula below:

$$Z = \frac{X - \mu}{\sigma}$$

Probability tables are available to obtain cumulative probabilities of z-scores, i.e. P(Z ≤ z)
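The transformation can be sketched in a few lines. The numbers used here (a variable with mean 100 and standard deviation 15) are hypothetical and chosen only to show that the probability obtained through the Z-score matches the one computed on the original scale.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x from N(mu, sigma^2) to the N(0, 1) scale."""
    return (x - mu) / sigma

# Hypothetical example: X ~ N(100, 15^2), observed value x = 115.
z = z_score(115, 100, 15)
p_via_z = NormalDist().cdf(z)            # P(Z <= z) on the standard scale
p_direct = NormalDist(100, 15).cdf(115)  # P(X <= 115) on the original scale
print(z, p_via_z, p_direct)
```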
68-95-99.7 rule

68% of the AUC lies within one standard deviation of the mean, i.e. (μ ± σ), 95% within two standard deviations (μ ± 2σ) and 99.7% within three standard deviations (μ ± 3σ)
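The three percentages in the rule can be reproduced from the standard normal cdf. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mu = 0 and sigma = 1

# Area between -k and +k standard deviations for k = 1, 2, 3.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```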
Suppose Z ~ N(0,1). What is P(Z < 0)? Use the standard Normal distribution table.

By symmetry: P(Z < 0) = 0.5
If Z ~ N(0,1), what is P(Z > 0.92)?

P(Z > 0.92) = 1 − P(Z < 0.92). Since we can obtain P(Z < 0.92) from the tables, we can obtain P(Z > 0.92):

P(Z > 0.92) = 1 − 0.8212 = 0.1788


What is P(−0.64 < Z < 0.43)?

P(−0.64 < Z < 0.43) = P(Z < 0.43) − P(Z < −0.64)
= 0.6664 − 0.2611 = 0.4053
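The three table exercises above (a lower tail, an upper tail and an interval) can be checked numerically. This sketch assumes Python 3.8+ and uses `statistics.NormalDist` in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()

p_below_zero = Z.cdf(0)                 # lower tail, 0.5 by symmetry
p_above = 1 - Z.cdf(0.92)               # upper tail via the complement
p_between = Z.cdf(0.43) - Z.cdf(-0.64)  # interval as a difference of cdfs

print(p_below_zero, round(p_above, 4), round(p_between, 4))
```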
Real example: (Rosner 5.20)

Serum cholesterol is approximately normally distributed with mean 219 mg/dL and standard deviation 50 mg/dL. If the clinically desirable range is < 200 mg/dL, what proportion of the population falls in this range?

Step 1. Convert 200 mg/dL to a standard normal (Z) score

Remember, $Z = \frac{x - \mu}{\sigma} = \frac{200 - 219}{50} = -0.38$

Step 2. Determine P(Z < −0.38) from the table

P(Z < −0.38) = 0.352

What proportion of the population falls outside the desired range?
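The two steps of the cholesterol example can be sketched as follows, again assuming Python 3.8+ and using the cdf instead of the printed table. The proportion outside the range is simply the complement.

```python
from statistics import NormalDist

mu, sigma = 219, 50   # mean and standard deviation from the example
cutoff = 200          # upper limit of the desirable range

z = (cutoff - mu) / sigma        # Step 1: convert to a Z-score
p_inside = NormalDist().cdf(z)   # Step 2: P(Z < z)
p_outside = 1 - p_inside         # complement: outside the desirable range

print(round(z, 2), round(p_inside, 3), round(p_outside, 3))
```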
Other applications

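Hypothesis testing: The z-test for the mean of a Normal population (large samples)
Situation
• A sample of size n is selected from a normal population with mean μ (unknown) and standard deviation σ. We want to test either
1. H0: μ = μ0 versus HA: μ ≠ μ0
or
2. H0: μ = μ0 versus HA: μ > μ0
or
3. H0: μ = μ0 versus HA: μ < μ0
The Test Statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

if n is large, where s is the sample standard deviation.

For the two-sided alternative (1), reject the null hypothesis if the absolute value of z is larger than the critical value. For the one-sided alternatives (2) and (3), reject if z lies beyond the critical value in the direction of the alternative.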
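A minimal sketch of the two-sided version of this test is given below. The function name, the 5% significance level and the sample are all assumptions made for illustration; the data are invented and do not come from the slides or from Rosner.

```python
import math
from statistics import NormalDist, mean, stdev

def z_test_mean(sample, mu0, alpha=0.05):
    """Two-sided large-sample z-test of H0: mu = mu0, using s in place of sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z, z_crit, abs(z) > z_crit

# Hypothetical data: 49 readings cycling through 200, 210, ..., 260.
sample = [200 + 10 * (i % 7) for i in range(49)]
z, z_crit, reject = z_test_mean(sample, mu0=219)
print(round(z, 2), round(z_crit, 2), reject)
```

For a one-sided alternative, the comparison would use `inv_cdf(1 - alpha)` and the sign of z rather than its absolute value.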
Normal approximation of a Binomial distribution
• If X is a random variable with distribution Bin(n, p), then for sufficiently large n, the random variable Z = (X − np)/√(np(1 − p)) has approximately a standard normal distribution
• Provided n is large enough, N(μ, σ²) is a good approximation for Bin(n, p), where μ = np and σ² = np(1 − p)
• The normal distribution is a good approximation for the binomial distribution when np ≥ 5 and n(1 − p) ≥ 5
• The z-test can therefore be used in hypothesis testing on proportions
Example 2:

1) What is the probability that fewer than 6 out of 1000 children get infected, given that the probability of transmission by 6 weeks is 2 percent?

Note: np = 20 ≫ 5
Mean = np = 1000 × 0.02 = 20
Variance = np(1 − p) = (1000 × 0.02) × (1 − 0.02) = 19.6
Standard deviation = √19.6 ≈ 4.4
Using the binomial distribution, X ~ Bin(1000, 0.02), P(X < 6) = 0.000064

Using the normal distribution: Z = (6 − 20)/4.4 = −3.2

P(Z < −3.2) = 0.00069. Both probabilities are very small, but the normal value is about ten times the exact one: this far into the tail the normal approximation is only rough.
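The comparison in this example can be reproduced with the standard library alone. This is a sketch assuming Python 3.8+ (for `math.comb` and `statistics.NormalDist`); it computes the exact binomial sum and the normal tail both with the unrounded Z and with Z rounded to one decimal, as in a table lookup.

```python
import math
from statistics import NormalDist

n, p = 1000, 0.02
mu = n * p                            # 20
sigma = math.sqrt(n * p * (1 - p))    # sqrt(19.6), about 4.43

# Exact binomial: P(X < 6) = P(X = 0) + ... + P(X = 5)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(6))

z = (6 - mu) / sigma
approx = NormalDist().cdf(z)                  # unrounded Z
approx_table = NormalDist().cdf(round(z, 1))  # Z rounded to -3.2, as in a table

print(exact, approx, approx_table)
```

Running it shows the exact sum near 0.000064 while both normal tail values are several times larger, which is a reminder that the approximation is least reliable far from the mean.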
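Application of normal approximation to binomial
• A success-failure experiment has been repeated n times
• The probability of success p is unknown. We want to test either
1. H0: p = p0 versus HA: p ≠ p0
or
2. H0: p = p0 versus HA: p > p0
or
3. H0: p = p0 versus HA: p < p0
The Test Statistic

$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

where p̂ is the observed proportion of successes.

For the two-sided alternative (1), reject the null hypothesis if the absolute value of z is larger than the critical value. For the one-sided alternatives (2) and (3), reject if z lies beyond the critical value in the direction of the alternative.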
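The test statistic for a proportion can be sketched in the same way. The function name, the 5% level and the counts (30 infections among 1000 children tested against p0 = 0.02) are hypothetical values chosen for illustration only.

```python
import math
from statistics import NormalDist

def z_test_proportion(successes, n, p0, alpha=0.05):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit

# Hypothetical data: 30 successes in 1000 trials, null value p0 = 0.02.
z, z_crit, reject = z_test_proportion(30, 1000, 0.02)
print(round(z, 2), round(z_crit, 2), reject)
```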
Binomial probability distribution
Suppose you flip a coin two times. In this experiment there are four possible outcomes: HH, HT, TH, and TT.

Let the random variable X represent the number of heads that result from this experiment. The random variable X can only take on the values 0, 1, or 2, so it is a discrete random variable.

The probability distribution of the random variable X is shown in the table below:

x         0     1     2
P(X=x)    1/4   1/2   1/4

Generally, for a binomial random variable X with success probability p and sample size n we write: X ~ Bin(n, p)
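The coin-flip table can be rebuilt by listing the four equally likely outcomes and counting heads in each. A short sketch, with variable names chosen for illustration and a fair coin assumed:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All sequences of two flips: HH, HT, TH, TT (assumed equally likely).
outcomes = list(product("HT", repeat=2))

# Count how many sequences give each number of heads.
counts = Counter(seq.count("H") for seq in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, prob in distribution.items():
    print(x, prob)
```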
Bernoulli Trial

A Bernoulli trial is an experiment with only 2 possible outcomes, which we denote by 0 or 1, e.g. tossing a coin

Can you think of other examples?

Assumptions:

1) Two possible outcomes: success (1) or failure (0)

2) The probability of success, p, is the same in each trial

3) The outcome of one trial has no influence on later outcomes (independent trials)
Binomial Random Variable

A binomial random variable is the total number of successes in n Bernoulli trials.

Example: number of HIV-infected children born to an HIV-infected mother (determined at 6 weeks) in a family of 3 (assume all children were born when the mother was HIV-infected)

What do we need to know?

1) How many ways are there to get k successes (k = 0, 1, ..., 3) in n = 3 trials?

2) What is the probability of any given outcome with exactly k successes (i.e. the probability distribution of k)?
Illustration of possible outcomes

Child number
1   2   3   Outcome (k)
+   +   +   3 infected
+   +   -   2 infected
+   -   +   2 infected
-   +   +   2 infected
+   -   -   1 infected
-   +   -   1 infected
-   -   +   1 infected
-   -   -   0 infected
Combinations

Combinations are the number of different arrangements of k successes taken from a total of n independent trials if order does not matter
Illustration:
Child number
1   2   3   Outcome (k)   Number of ways
+   +   +   3 infected    1 way
+   +   -   2 infected    3 ways
+   -   +   2 infected
-   +   +   2 infected
+   -   -   1 infected    3 ways
-   +   -   1 infected
-   -   +   1 infected
-   -   -   0 infected    1 way
What are the probabilities of the different outcomes?

Any particular sequence with k successes (k = 0, 1, 2 or 3) and 3 − k failures has probability $p^{k}(1-p)^{3-k}$
Note: p is the probability of success for each independent trial

The number of different ways (combinations) such sequences of successes and failures can occur is given by the formula below:

$$C^{n}_{k} = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \quad \text{i.e.} \quad C^{3}_{k} = \binom{3}{k} = \frac{3!}{k!\,(3-k)!}$$

"n factorial" = n! = n × (n−1) × ... × 1

Confirm combinations from previous slide
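One way to confirm the counts is to enumerate all eight infection patterns for three children and compare them with the formula. This sketch assumes Python 3.8+ for `math.comb`; the `+`/`-` coding follows the illustration table.

```python
from itertools import product
from math import comb

n = 3

# Count, by brute force, how many +/- sequences contain exactly k infections.
enumerated = {k: 0 for k in range(n + 1)}
for seq in product("+-", repeat=n):
    enumerated[seq.count("+")] += 1

# The same counts from n! / (k! (n - k)!).
by_formula = {k: comb(n, k) for k in range(n + 1)}

print(enumerated)
print(by_formula)
```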
Binomial Probabilities

Probabilities of outcomes will therefore be given by

$$P(X = k) = \binom{n}{k} p^{k}(1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^{k}(1-p)^{n-k}$$

This formula is called the probability mass function of a binomial probability distribution
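The probability mass function translates almost directly into code. A minimal sketch, assuming Python 3.8+ for `math.comb`; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two fair coin flips: probability of exactly one head.
one_head = binomial_pmf(1, 2, 0.5)
print(one_head)
```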
Example:
1) What is the probability that none (0) out of three children get infected, given that the probability of transmission by 6 weeks is 2 percent?

$$\begin{aligned}
P(X = 0) &= \binom{3}{0} \times 0.02^{0} \times (1 - 0.02)^{3-0} \\
&= \frac{3!}{0!\,(3-0)!} \times 1 \times 0.98^{3} \\
&= \frac{3 \times 2 \times 1}{1 \times (3 \times 2 \times 1)} \times 0.98^{3} \\
&= 0.94
\end{aligned}$$
What is the binomial probability distribution of the number of infected children out of 3?

k                        0          1          2          3
P(X=k)                   0.941192   0.057624   0.001176   0.000008
Cumulative probability   0.941192   0.998816   0.999992   1

Note that the sum of the probabilities for all the possible values of the binomial random variable is equal to 1, i.e.

$$\sum_{k=0}^{n} P(X = k) = 1$$
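The probabilities for the family of three can be recomputed and summed as a check. A short sketch, assuming Python 3.8+ and the values n = 3 and p = 0.02 from the example:

```python
from math import comb

n, p = 3, 0.02

# P(X = k) for k = 0, 1, 2, 3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
total = sum(pmf)

for k, prob in enumerate(pmf):
    print(k, round(prob, 6))
print("sum:", total)
```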
Cumulative-distribution function (cdf)

The probability of observing less than or equal to a given number of successes for a given number of independent trials,

i.e. cdf = P(X ≤ x), often denoted F(x)

e.g. What is the cumulative probability of 2 successes?

$$P(X \le 2) = \sum_{k=0}^{2} P(X = k) = P(X=0) + P(X=1) + P(X=2) = 0.999992$$
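The cumulative probability is just a running sum of the probability mass function. The sketch below, assuming Python 3.8+ and an illustrative function name, evaluates it for the family-of-three example with p = 0.02.

```python
from math import comb

def binomial_cdf(x, n, p):
    """F(x) = P(X <= x) for X ~ Bin(n, p), summing the pmf from 0 to x."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

f2 = binomial_cdf(2, 3, 0.02)
print(round(f2, 6))
```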
Mean and Variance of a Bernoulli random variable

Given P(X = 1) = p, then P(X = 0) = 1 − p

$$\text{Mean: } \mu = E[X] = \sum_{j=0}^{1} p_j x_j = (1-p) \times 0 + p \times 1 = p$$

$$\text{Variance: } \sigma^{2} = V[X] = \sum_{j=0}^{1} p_j (x_j - \mu)^{2} = (1-p)(0-p)^{2} + p(1-p)^{2} = p(1-p)\,[p + (1-p)] = p(1-p)$$
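The two sums can be evaluated directly from their definitions. In this sketch the function name is illustrative and p = 0.02 is borrowed from the transmission examples in this deck.

```python
def bernoulli_mean_variance(p):
    """Mean and variance of a Bernoulli(p) variable from the defining sums."""
    values = (0, 1)
    probs = (1 - p, p)
    mean = sum(pj * xj for pj, xj in zip(probs, values))
    variance = sum(pj * (xj - mean) ** 2 for pj, xj in zip(probs, values))
    return mean, variance

mean, variance = bernoulli_mean_variance(0.02)
print(mean, variance)  # should agree with p and p * (1 - p)
```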
Mean and Variance of a binomial random variable
MEAN

It can be shown that the mean of a binomial random variable is

$$\mu = E(X) = np$$

VARIANCE

The variance of a binomial random variable is given by

$$\sigma^{2} = V[X] = np(1-p)$$
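One way to see where these results come from is sketched below. It assumes, consistent with the definition of a binomial random variable as a count of successes, that X is the sum of n independent Bernoulli trials X_1, ..., X_n, each with success probability p; the subscripted X_i are notation introduced here.

```latex
\begin{aligned}
X &= X_1 + X_2 + \cdots + X_n, \qquad E[X_i] = p, \quad V[X_i] = p(1-p) \\
E[X] &= \sum_{i=1}^{n} E[X_i] = np \\
V[X] &= \sum_{i=1}^{n} V[X_i] = np(1-p) \quad \text{(independence lets the variances add)}
\end{aligned}
```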
Poisson distribution

Models counts of events within a set unit of time, area, population, etc., e.g.:

1) Number of deaths for every 100,000 people

2) Number of births per hour during a given day

3) In cohort studies, the number of new cases of disease per a given number of person-years of observation
Poisson Probability distribution
Discrete probability distribution for the counts of events that occur randomly in a given interval of time (or space)

If X = number of events in a given interval, and the mean number of events per interval is λ, the probability of observing x events in a given interval is:

$$P(X = x) = e^{-\lambda}\,\frac{\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

Note that e is the base of the natural logarithm, e ≈ 2.71828

If X has a Poisson distribution, then we write X ~ Po(λ), where λ is the parameter of the distribution
Note: A Poisson random variable can take on any non-negative integer value (there is no upper limit), while a binomial random variable always has a finite upper limit n.
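The Poisson formula is short enough to code directly. In this sketch the function name `poisson_pmf` and the rate of 2.0 used in the check are illustrative assumptions; the sum over the first 60 counts stands in for the infinite sum.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam): exp(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

# With an illustrative rate of 2.0, the probabilities add up to (almost exactly) 1.
total = sum(poisson_pmf(x, 2.0) for x in range(60))
print(poisson_pmf(0, 2.0), total)
```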
Assumptions
1. The probability of two events occurring in the same narrow interval is negligible (rare events).
2. The probability of observing a single event over a small interval is approximately proportional to the size of that interval.
3. The probability of an event within a certain interval does not change over different intervals (stationarity).
4. The probability of an event in one interval is independent of the probability of an event in any other non-overlapping interval (independence).
Example:

Stillbirths in hospital Y occur randomly at an average rate of 2.1 per month. What is the probability of observing 3 stillbirths in a given month?

Let X = number of stillbirths per month, then X ~ Po(2.1)

Then, $P(X = 3) = e^{-2.1}\,\frac{2.1^{3}}{3!} = 0.19$

What is the probability of observing more than 2 events per month?
$P(X > 2) = P(X = 3) + P(X = 4) + \cdots$
Oops! That is an infinite number of probabilities. How do we do this?
Poisson distribution

$$\begin{aligned}
P(X > 2) &= P(X = 3) + P(X = 4) + \cdots \\
&= 1 - P(X \le 2) \\
&= 1 - [P(X = 0) + P(X = 1) + P(X = 2)] \\
&= 1 - (0.122 + 0.257 + 0.270) \\
&\approx 0.350
\end{aligned}$$
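The complement trick for the stillbirth example can be sketched as follows. The rate 2.1 per month comes from the example; the function and variable names are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 2.1  # average stillbirths per month

p_exactly_3 = poisson_pmf(3, lam)
# P(X > 2) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]: a finite sum replaces the infinite one.
p_more_than_2 = 1 - sum(poisson_pmf(k, lam) for k in range(3))

print(round(p_exactly_3, 3), round(p_more_than_2, 3))
```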
Changing intervals
What would be the probability of observing 5 stillbirths in 2 months?

Note: the interval for the rate changed from 1 month to 2 months. What is λ?

If X ~ Po(λ) on a 1-unit interval, then X ~ Po(kλ) for k-unit intervals

Thus, X ~ Po(2 × 2.1) = Po(4.2)

So $P(X = 5) = e^{-4.2}\,\frac{4.2^{5}}{5!} = 0.1633$
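Rescaling the rate before applying the formula can be sketched like this. The monthly rate of 2.1 and the two-month window come from the example; the helper name `scaled_poisson_pmf` is made up for illustration.

```python
import math

def scaled_poisson_pmf(x, rate_per_unit, k_units):
    """P(X = x) over k_units intervals when events occur at rate_per_unit per interval."""
    lam = k_units * rate_per_unit  # Po(lambda) on one unit becomes Po(k * lambda) on k units
    return math.exp(-lam) * lam**x / math.factorial(x)

p5 = scaled_poisson_pmf(5, rate_per_unit=2.1, k_units=2)
print(round(p5, 4))
```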
Shape of the Poisson distribution
(i) Unimodal

(ii) Positive skew (that decreases as λ increases)

(iii) Centered roughly on λ

(iv) The variance (spread) increases as λ increases

Try to vary λ for the previous example and observe the effect on the shape

Source: http://en.wikipedia.org/wiki
Mean and variance of a Poisson distribution

If X ~ Po(λ) then

μ = λ

σ² = λ

σ = √λ
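A short derivation of these two results, sketched from the Poisson probability formula and the series $e^{\lambda} = \sum_{j \ge 0} \lambda^{j}/j!$; the intermediate quantity E[X(X−1)] is a device introduced here to simplify the algebra.

```latex
\begin{aligned}
E[X] &= \sum_{x=0}^{\infty} x\, e^{-\lambda}\frac{\lambda^{x}}{x!}
      = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}
      = \lambda e^{-\lambda} e^{\lambda} = \lambda \\
E[X(X-1)] &= \sum_{x=2}^{\infty} x(x-1)\, e^{-\lambda}\frac{\lambda^{x}}{x!}
      = \lambda^{2} e^{-\lambda} \sum_{x=2}^{\infty} \frac{\lambda^{x-2}}{(x-2)!}
      = \lambda^{2} \\
V[X] &= E[X(X-1)] + E[X] - (E[X])^{2} = \lambda^{2} + \lambda - \lambda^{2} = \lambda
\end{aligned}
```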
Poisson Approximation to the Binomial Distribution
The binomial distribution with large n and small p can be accurately approximated by the Poisson distribution with parameter λ = np

Consider a binomial distribution with large n and small p. The mean of this distribution is μ = np and the variance is σ² = npq, where q = 1 − p

Note that for small p, q is approximately 1. Thus:

npq ≈ np

Thus for large n and small p, the mean and variance of a binomial distribution are almost equal, as they are for a Poisson distribution, and the Poisson distribution approximates the binomial distribution
Example 2:

1) What is the probability that exactly 6 out of 1000 children get infected, given that the probability of transmission by 6 weeks is 2 percent?

Mean = np = 1000 × 0.02 = 20
Using the binomial distribution, X ~ Bin(1000, 0.02), P(X = 6) = 0.00017

Using Poisson with λ = 20, P(X = 6) = 0.00018 ≈ 0.00017
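The two probabilities in this example can be computed side by side. A minimal sketch, assuming Python 3.8+ for `math.comb`; n, p and the count of 6 come from the example, the variable names are illustrative.

```python
import math

n, p, k = 1000, 0.02, 6
lam = n * p  # Poisson parameter, lambda = np = 20

binomial = math.comb(n, k) * p**k * (1 - p)**(n - k)
poisson = math.exp(-lam) * lam**k / math.factorial(k)

print(round(binomial, 5), round(poisson, 5))
```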
Useful references

Rosner B. Fundamentals of Biostatistics, 7th ed. 2010.
