24 Intro To Bayesian Inference
#EPIB621-24
Model Trend
$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \beta_2 T_{ij} + \epsilon_{ij}$$
Model Trend Extension
Random intercept: $Y_{ij} = \beta_0 + u_{0i} + \beta_1 X_{ij} + \beta_2 T_{ij} + \epsilon_{ij}$
Random intercept and slope: $Y_{ij} = \beta_0 + u_{0i} + \beta_1 X_{ij} + (\beta_2 + u_{1i}) T_{ij} + \epsilon_{ij}$
Modeling change from baseline
Sometimes we would like to include the baseline level of the outcome as a predictor, to adjust for the potentially different scale of the outcome.
Course Map (diagram): Correlated Outcome, Count Outcome (Poisson Regression), Two-way Table, Model Selection.
Population, sample, parameter, statistics
Maximum likelihood
Which one is random?
Which is/are the parameter(s)?
A parameter is fixed.
24 Introduction to Bayesian Inference
Qihuang Zhang
Bayesian statistical methods ...
▪ We are still interested in estimating the parameters.
▪ Rely on the mathematics of probability to combine:
⏵data to be analyzed
⏵information from sources extraneous to the data
▪ Make scientific conclusions with quantified certainty via probability statements, such as "there is a 95% probability that the parameter lies in a given interval."
Bayesian inference ...
▪ Takes a model-based approach toward data analysis
▪ Model: the mechanism by which data similar to those collected/observed
could arise
→ Statistical distributions are used to model the data
▪ Example: suppose that the infection risk, $\pi$, of a disease is of interest in a given population; the collected data are the infection statuses of $n$ individuals randomly sampled from the population
▪ Remember the appropriate model for these data? (See the note below.)
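For reference (a standard modeling choice, not spelled out on the slide): with $n$ independent individuals each infected with probability $\pi$, a natural data model for $y$, the number of infected individuals in the sample, is
$$y \sim \text{Binomial}(n, \pi)$$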
Bayesian vs frequentist philosophy
▪ Before, under the frequentist framework, we said that $\pi$ is unknown but fixed;
▪ Under the Bayesian framework, however, we can "model" our uncertainty about $\pi$ using probability distributions;
i.e., we can define models not only for the data but also for the unknown quantities of interest that are parameters in our data models.
▪ The probability models for the unknowns before we observe and analyze the
data are called the prior distribution.
▪ The probability models for the unknowns after we observe and analyze the
data are called the posterior distribution.
Essential ingredients of Bayesian analysis
▪ Data (𝑌, 𝑋)
▪ Unobservables or unknowns of interest (parameters 𝜃)
▪ Probability mechanism or model that generates the data (data model, likelihood)
▪ (**New**) State of pre-data or outside-data knowledge about the unknowns
(prior distribution)
▪ (**New**) The mathematical formulation (or learning mechanism) by which
data adds to this knowledge (updating the prior knowledge with the information
in data) → Bayes Theorem
▪ (**New**) The resulting post-data knowledge (posterior distribution)
Veteran Blood Pressure example
• Observables, i.e., data:
⏵BP: veteran’s blood pressure (𝑌)
⏵trt assignment: educational program to control BP (two programs) (𝑋)
• Unobservables or unknowns of interest (parameters)
⏵Population mean blood pressure of veterans under either of these
programs
$E(BP) = \beta_0 + \beta_1 \, trt$
⏵𝛽0 and 𝛽1
• Probability mechanism or model that generates the data (data model)
$y_{trt=0} \sim N(\beta_0, \sigma^2)$ and $y_{trt=1} \sim N(\beta_0 + \beta_1, \sigma^2)$
Veteran Blood Pressure example
▪ State of pre-data or outside-data knowledge about the unknowns (prior
distribution)
⏵Even before we collect data we know quite a bit about 𝛽0 and 𝛽1 since
they represent the mean blood pressure of people!
⏵In the absence of any other information, the common knowledge about
the range of BP can be used to specify prior distributions;
⏵e.g., DBP in older adults is typically between 70 and 90 mmHg, and we don't expect the type of program to make a dramatic difference; therefore, a priori, we can assume
$\beta_0 \sim N(80, 10^2)$, $\beta_1 \sim N(0, 5^2)$
If we are wrong, with enough (rich) data we can correct this prior assumption! (A quick prior check is sketched below.)
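A small Python sketch, added here as an illustration (not part of the slides), that draws from these assumed priors to check that they imply plausible mean blood-pressure values under each program:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 10_000

# Priors stated on the slide: beta0 ~ N(80, 10^2), beta1 ~ N(0, 5^2)
beta0 = rng.normal(80.0, 10.0, size=n_draws)
beta1 = rng.normal(0.0, 5.0, size=n_draws)

# Implied prior distribution of the mean BP under each program (trt = 0 or 1)
for name, draws in [("trt = 0", beta0), ("trt = 1", beta0 + beta1)]:
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"{name}: central 95% prior interval for E(BP) is ({lo:.1f}, {hi:.1f})")
```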
Conditional probability
▪ The probability of an event can depend on knowledge of whether or
not another related event has occurred.
▪ Example: roll two dice; what is P(sum is 6)?
▪ What is the probability that the sum of the two is 6, given that the first die shows a number less than 4? (Worked out below.)
▪ The conditional probability of event A given event B is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
where ∩ represents “and”.
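A worked version of the dice example, added here as a check; all 36 ordered outcomes of the two dice are equally likely:
$$P(\text{sum is } 6) = \frac{5}{36}, \qquad P(\text{1st die} < 4) = \frac{18}{36}, \qquad P(\text{sum is } 6 \cap \text{1st die} < 4) = \frac{3}{36}$$
$$P(\text{sum is } 6 \mid \text{1st die} < 4) = \frac{3/36}{18/36} = \frac{1}{6}$$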
The Bayes Theorem
▪ What about the reverse conditional probability, $P(\text{1st die less than 4} \mid \text{sum is 6})$? (Worked out below.)
▪ The Bayes Theorem enables proper reversal of conditioning:
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$
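Continuing the dice example (an added illustration), the theorem reverses the conditioning without re-enumerating outcomes:
$$P(\text{1st die} < 4 \mid \text{sum is } 6) = \frac{P(\text{sum is } 6 \mid \text{1st die} < 4)\,P(\text{1st die} < 4)}{P(\text{sum is } 6)} = \frac{(1/6)(1/2)}{5/36} = \frac{3}{5}$$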
The Bayes Theorem
▪ Example: suppose that a new rapid antigen test for COVID-19 is
developed with a reported sensitivity of 90% and specificity of 98%.
⏵Sensitivity: $P(T = + \mid D = +) = 0.90$
⏵Specificity: $P(T = - \mid D = -) = 0.98$
The Bayes Theorem
• What you want to know is given by the reverse conditional probabilities:
⏵$P(D = + \mid T = +)$
⏵$P(D = - \mid T = -)$
• Note: $P(D = +)$ is the prevalence of the disease in the population, some estimate of which can be obtained - say 10%.
• The Bayes theorem gives these probabilities as:
$$P(D = + \mid T = +) = \frac{P(T = + \mid D = +)\,P(D = +)}{P(T = +)}$$
where
$$P(T = +) = P(T+ \mid D+)\,P(D+) + P(T+ \mid D-)\,P(D-) = \text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})$$
So we have (evaluated numerically in the sketch below):
$$P(D = + \mid T = +) = \frac{0.90 \times 0.1}{0.90 \times 0.1 + 0.02 \times 0.9}$$
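A minimal Python sketch, added for illustration, that evaluates these predictive values; the function name ppv_npv and the 10% prevalence are assumptions for the example:

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via the Bayes theorem."""
    # P(T+) by the law of total probability
    p_test_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / p_test_pos              # P(D+ | T+)
    npv = specificity * (1 - prevalence) / (1 - p_test_pos)  # P(D- | T-)
    return ppv, npv

ppv, npv = ppv_npv(sensitivity=0.90, specificity=0.98, prevalence=0.10)
print(round(ppv, 3), round(npv, 3))  # approximately 0.833 and 0.989
```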
The Bayes Theorem for inference
▪ Let $y$ be the observed data and $\theta$ an unknown parameter of the data-generating model. Having observed the data $y$, we can make inferences about $\theta$ via
$$P(\theta \mid y) = \frac{P(y \mid \theta)\,P(\theta)}{P(y)}$$
The Bayes Theorem for inference
$\theta$: unknown parameter and $y$: the data
$$P(\theta \mid y) = \frac{P(y \mid \theta)\,P(\theta)}{P(y)}$$
Posterior $P(\theta \mid y)$: all our knowledge about the parameters, i.e., prior + data.
Marginal likelihood $p(y) = \int p(y \mid \theta)\,p(\theta)\,d\theta$: a useless but annoying (normalizing) term....
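Since $p(y)$ does not involve $\theta$, the posterior is usually written up to proportionality (a standard restatement, added here for clarity):
$$P(\theta \mid y) \propto P(y \mid \theta)\,P(\theta), \qquad \text{i.e., posterior} \propto \text{likelihood} \times \text{prior}$$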
Basics of Bayesian inference
How to choose the prior 𝑃(𝜃)?
→ Some combinations of prior and likelihood are particularly convenient to work with.
Basics of Bayesian inference
Conjugate priors?
→ We can choose the prior according to the likelihood (the data-generating model); a conjugate prior is one for which the resulting posterior belongs to the same distributional family as the prior. (Common conjugate pairs are listed below.)
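Some standard conjugate pairs, added here for reference (textbook results, not from the original slides):
⏵Binomial likelihood with a Beta prior → Beta posterior
⏵Poisson likelihood with a Gamma prior → Gamma posterior
⏵Normal likelihood (known variance) with a Normal prior on the mean → Normal posterior (the example on the next slide)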
Conjugate priors - examples
Normal-Normal: for data $y_1, \ldots, y_n \sim N(\theta, \sigma^2)$ with $\sigma^2$ known, a normal prior
$$\theta \sim N(\mu, \tau^2)$$
(known hyper-parameters $\mu$ and $\tau$) results in a normal posterior
$$\theta \mid y \sim \text{Normal}\left(\frac{\frac{\sigma^2}{n}\,\mu + \tau^2\,\bar{y}}{\frac{\sigma^2}{n} + \tau^2},\; \left(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\right)^{-1}\right)$$
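A minimal Python sketch of this update, added as an illustration and assuming the data model $y_i \sim N(\theta, \sigma^2)$ with known $\sigma^2$; the function name and test numbers are arbitrary:

```python
import numpy as np

def normal_normal_posterior(y, sigma2, mu_prior, tau2):
    """Posterior of theta for y_i ~ N(theta, sigma2) given the prior theta ~ N(mu_prior, tau2)."""
    n, ybar = len(y), float(np.mean(y))
    post_var = 1.0 / (n / sigma2 + 1.0 / tau2)                    # (n/sigma^2 + 1/tau^2)^(-1)
    post_mean = post_var * (n * ybar / sigma2 + mu_prior / tau2)  # precision-weighted average
    return post_mean, post_var

# Tiny check with simulated data (sigma^2 = 4, fairly diffuse prior)
rng = np.random.default_rng(0)
y = rng.normal(loc=82.0, scale=2.0, size=25)
print(normal_normal_posterior(y, sigma2=4.0, mu_prior=80.0, tau2=100.0))
```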
The Bayes Theorem for inference
$$P(\theta \mid y) = \frac{P(y \mid \theta)\,P(\theta)}{P(y)}$$
Example: suppose that the goal is to estimate the mean DBP, $\theta$, of a target population of veterans based on a sample of $n$ DBP measurements. We assume the following data-generating model:
$$y_i \sim N(\theta, 4)$$
The Bayes Theorem for inference
With a normal prior on $\theta$, the Normal-Normal conjugate result applies: since $\sigma^2 = 4$ is known, the posterior for $\theta$ is again normal, with posterior mean a precision-weighted average of the prior mean and the sample mean $\bar{y}$.
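As an added illustration: if, as in the earlier prior example, we take $\theta \sim N(80, 10^2)$, the Normal-Normal formula with $\sigma^2 = 4$ gives
$$\theta \mid y \sim \text{Normal}\left(\frac{\frac{4}{n}\cdot 80 + 100\,\bar{y}}{\frac{4}{n} + 100},\; \left(\frac{n}{4} + \frac{1}{100}\right)^{-1}\right),$$
so even for moderate $n$ the posterior mean is dominated by $\bar{y}$, because the prior variance $10^2$ is much larger than $\sigma^2/n = 4/n$.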