SISCER 2023, Day 1: Clinical Trials
Pamela Shaw & Michael Proschan
1 / 51
Introductions
Pamela Shaw
I Kaiser Permanente Washington Health Research Institute
I Biostatistics Division
I pamela.a.shaw@kp.org
Michael Proschan
I National Institute of Allergy and Infectious Diseases (NIAID)
I Biostatistics Research Branch
I proscham@niaid.nih.gov
2 / 51
Course Outline
Day 1
1. 8:30-8:40 Introductions
2. 8:40-9:30 Choice of primary outcome and analysis
9:30-9:45 Break
3. 9:45-10:30 Randomization
10:30-10:45 Break
4. 10:45-12:00 Sample size/ Power
Day 2
5. 8:30-10:15 Interim monitoring
10:15-10:45 Break
6. 10:45-12:00 Futility
3 / 51
Course Outline (2)
Day 3
1. 8:30-9:40 Handling missing data
9:40-9:55 Break
2. 9:55-10:45 Multiple Comparisons
10:45-11:00 Break
3. 11:00-11:55 Adaptive design
4. 11:55-12:00 Wrap Up
4 / 51
Course overview
Overall aim
That you will gain a set of simple tools and principles that go a long
way towards robust clinical trial design and analysis.
5 / 51
Lecture 1: Choice of primary outcome and
analysis
6 / 51
Key Features of Randomized Controlled Trial (RCT)
7 / 51
A few definitions....
8 / 51
Types of randomized studies
9 / 51
Phases of clinical trials
https://www.fda.gov/patients/drug-development-process/step-3-clinical-research
10 / 51
The RCT Gold Standard
11 / 51
Intent-to-treat analyses
An intent-to-treat (ITT) analysis is one where randomized individuals
are analyzed in the group they were randomized to, regardless of
what happens during the trial. Analyze as you randomize!
13 / 51
Handling missing data in an RCT
Little et al. (2012)
14 / 51
Preventing missing data
15 / 51
Three-prong Approach to Minimizing Impact of
Missing Data
Design
I Avoid endpoints that are more likely to be missing
I Choose the smallest time frame for primary analysis that still
yields clinically relevant information on treatment effects
I Consider a run-in period to ensure commitment
I Particularly important for long/complicated studies
Conduct
I Make extensive efforts to retain subjects
I Continue follow-up for outcomes even if subject stops treatment
Analysis
I Choose analyses that require minimally problematic assumptions
16 / 51
Considerations for the primary outcome
17 / 51
The Measurement Principle
18 / 51
So why choose only 1 endpoint?
19 / 51
Maintaining type I error without loss of power
20 / 51
Reliability
21 / 51
Efficacy and Safety of Metronidazole for Pulmonary
Multidrug-Resistant Tuberculosis (MDR-TB)
Study NCT00425113
Background
I MDR-TB is a difficult-to-treat disease; individuals have been
observed to fail first-line therapies (isoniazid and rifampicin)
I Standard MDR-TB treatment is 18-24 months of 2nd-line
antibiotics
I In vitro data showed that metronidazole is active against
Mycobacterium tuberculosis (MTB) maintained under anaerobic
conditions
I Pre-clinical studies (non-human primates, rabbits) also showed
metronidazole may have unique activity against an anaerobic
sub-population of bacilli in human disease
22 / 51
Design of the Metronidazole for MDR-TB Trial
23 / 51
Problematic Primary outcome
24 / 51
Transparency on ClinicalTrials.gov
Study NCT00425113
Five years after the trial opened, and after the study had closed early,
the primary outcome was changed:
Changes in TB Lesion Sizes Using High Resolution Computed Tomography (HRCT). [
Time Frame: 6 months. ] Lesions were defined as nodules (<2 mm, 2-<4 mm, and
4−10 mm), consolidations, collapse, cavities, fibrosis, bronchial thickening, tree-in-bud
opacities, and ground glass opacities. Each CT was divided into six zones (upper,
middle, and lower zones of the right and left lungs) and independently scored for the
above lesions by three separate radiologists blinded to treatment arm. A fourth
radiologist adjudicated any scores that were widely discrepant among the initial three
radiologists. The HRCT score was determined by visually estimating the extent of the
above lesions in each lung zone as follows: 0=0% involvement; 1= 1-25% involvement;
2=26-50% involvement; 3=51-75% involvement; and 4=76-100% involvement. A
composite score for each lesion was calculated by adding the score for each specific
abnormality in the 6 lung zones and dividing by 6, with the change in composite score
measured at 2 and 6 months compared to baseline. Composite sums of all 10
composite scores are reported.
25 / 51
RCT Example: Effect of Ranitidine on Hyper-IgE
Recurrent Infection (Job’s) Syndrome
NCT00527878
Background
I Hyper-IgE syndrome (HIES) is an immunological disorder
caused by a genetic mutation (STAT3) characterized by recurrent
infections of the ears, sinuses, lungs and skin, and abnormal
levels of the antibody immunoglobulin E (IgE).
I Patients with hyper-IgE syndrome also tend to have skeletal
abnormalities: characteristic face, retained teeth, and recurrent
fractures from minimal trauma
I An early phase RCT was launched in 2007 at NIAID to study
whether ranitidine would reduce infections
I At the time the trial was done, there were only about 76 known cases in the US.
26 / 51
Considerations for an endpoint for this diverse disease
One possibility: A patient-reported score of severity of symptoms.
Problem: Patients with more severe disease are less bothered by mild
to moderate symptoms, while high-functioning patients are bothered
by relatively minor symptoms
Alternative: A numeric score was considered that would capture the
number of new infections
I The number of infections that required new antibiotics was
reported on a quarterly basis, to balance burden and accuracy
(require recall over shorter period)
I Total number in a year is prone to missingness
I Rate of infections per month is a more flexible endpoint
I Disease had many other chronic morbidities (e.g. recurrent
fracture), but ranitidine only expected to affect infections
Final: The primary endpoint chosen was the rate of infections (i.e., avg #
per month during the first year). The primary endpoint required at least 2 of
the 4 quarters to be observed, to give a robust estimate of the yearly rate.
27 / 51
HIES Ranitidine Trial Study: Double trouble
28 / 51
Clinical relevance
29 / 51
Surrogate Outcome
I Various definitions exist for a surrogate endpoint. Ellenberg and
Hamilton (1989) lay out a general definition: A “Surrogate
endpoint captures an intermediate endpoint on the disease
pathway, which is informative of the true outcome”
I Generally, the point of a surrogate endpoint is to have an
expected reduction in sample size or trial duration, such as when
a rare or distal endpoint is replaced by a more frequent or
proximate endpoint
I In 1989, Prentice laid out conditions for a surrogate outcome
(known as the Prentice criterion), as well as a working definition,
that assumes a treatment Z effect on the true endpoint Y is
completely captured by the surrogate endpoint X
I E(Y |Z , X ) = E(Y |X )
I The Prentice criterion has come under criticism as impractical, and
much further discussion has ensued regarding how to define and
validate a surrogate
30 / 51
Examples of surrogate endpoints
31 / 51
CAST Example: Caution is needed when working with
surrogate endpoints
32 / 51
Cardia Arrhythmia Suppression Trial (CAST)
CAST Investigators, 1989
35 / 51
Many examples of misleading surrogates
36 / 51
Composite outcomes are another way to improve
practicality
A composite outcome combines multiple clinical endpoints. The
general idea behind composite endpoints is to increase power
through an increased event rate
Examples
I Time-to-first of disease progression or death (Progression-free
survival)
I Relapse-free survival
I Major adverse cardiovascular events (MACE)
I Time to first serious AIDS or serious non-AIDS event in the
Strategic Timing of AntiRetroviral Treatment (START) trial
I Time to first of cardiac arrest or arrhythmic death (CAST)
Note: Some composite endpoints are surrogate endpoints
37 / 51
Considerations for composite endpoints
Neaton et al. (2005)
38 / 51
Example: SOLVD Trial
NEJM 1991, 325: 293-302.
Background
I SOLVD was an RCT examining a novel treatment for prevention of
mortality/hospitalization in patients with congestive heart failure
(CHF) and weak left ventricular ejection fraction (EF)
I In 1986-89, 2569 patients randomized to enalapril or placebo
I Enalapril found beneficial for mortality (p = 0.0036) and time to
first hospitalization/death (p < 0.0001)
Analysis
I Seek to evaluate treatment effect on subset of 662 diabetic
subjects
I Considered alternative to time to first that considers overall
severity
39 / 51
SOLVD: Results

Endpoint          Enalapril (N=319)    Placebo (N=343)    Cox PH HR (Score Test P-value)
                  Yes      No          Yes      No
Death             137      182         145      198       0.99 (0.91)
Hospitalization    94      225         148      195       0.60 (< 0.0001)
TTF               174      145         229      114       0.71 (0.0007)
40 / 51
Alternative to Time-to-First: Prioritized severity score
(Shaw and Fay, 2016; Shaw, 2018)
I General idea: rank individuals according to clinical severity
I Depending on setting, clinical severity could consider two or
more outcomes or event times
I Shaw and Fay (2016) proposed a ranking that considered the
surrogate and the "true" event of interest
I Rank the time to event of interest (death) if it is observed
I Rank time to surrogate event (MI hospitalization) for the survivors
I Surrogate time does not affect clinical severity when event of
interest is observed
I Perform two sample test on clinical severity which incorporates
bivariate survival information
I Resulting test is average of two log-rank tests (aids interpretation)
I Prioritization endpoints have grown in popularity in recent years.
Examples: win ratio (Pocock et al., 2012), Desirability of outcome
ranking (DOOR) (Evans et al., 2015). See review Shaw (2018).
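The general idea behind prioritized pairwise comparisons such as the win ratio can be sketched in a few lines. This is an illustration with hypothetical data, in the spirit of Pocock et al. (2012); it is not the Shaw-Fay bivariate rank test itself, and the function name is ours:

```python
# Sketch of a prioritized pairwise comparison (win-ratio style).
# Each subject is (event_time, surrogate_time); larger times are better.
# Data below are hypothetical.

def pairwise_wins(treat, control):
    """Return (wins, losses) for the treatment arm over all
    treatment/control pairs, deciding on the prioritized outcome
    first and breaking ties with the surrogate."""
    wins = losses = 0
    for t in treat:
        for c in control:
            if t[0] != c[0]:           # decide on the prioritized outcome
                wins += t[0] > c[0]
                losses += t[0] < c[0]
            elif t[1] != c[1]:         # tie: fall back to the surrogate
                wins += t[1] > c[1]
                losses += t[1] < c[1]
    return wins, losses

treat = [(5.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
control = [(3.0, 1.0), (2.0, 2.0), (4.0, 0.5)]
wins, losses = pairwise_wins(treat, control)
print(wins, losses, wins / losses)    # win ratio = wins / losses
```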
41 / 51
Choice of primary analysis
42 / 51
Common test statistics for a parallel two-arm trial
43 / 51
Interpretability: Wilcoxon
I There are different ways to interpret a test, and some may be
more relevant than others.
I Example: Wilcoxon rank sum and Mann-Whitney tests are
equivalent.
I The Wilcoxon form assumes one distribution is shifted relative to the
other, and estimates the size of the shift.
I The Mann-Whitney form compares (treatment, control) pairs and
estimates the following probability for the outcomes of randomly
picked treatment and control patients (> means better): P(Y_T > Y_C).
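The Mann-Whitney probability can be estimated directly as the proportion of (treatment, control) pairs in which the treatment outcome is better, with ties counted as 1/2. A small sketch with hypothetical data:

```python
# Estimate the Mann-Whitney parameter P(Y_T > Y_C) from two samples
# (ties counted as 1/2). Data are hypothetical.

def mann_whitney_prob(treat, control):
    total = 0.0
    for t in treat:
        for c in control:
            total += 1.0 if t > c else (0.5 if t == c else 0.0)
    return total / (len(treat) * len(control))

treat = [3.1, 4.2, 5.0, 2.8]
control = [2.5, 3.0, 2.8, 1.9]
print(mann_whitney_prob(treat, control))   # estimate of P(Y_T > Y_C)
```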
48 / 51
References I
Action to Control Cardiovascular Risk in Diabetes Study Group (2008). Effects of intensive glucose
lowering in type 2 diabetes. New England Journal of Medicine 358, 2545–2559.
Buyse, M. and Molenberghs, G. (1998). Criteria for the validation of surrogate endpoints in
randomized experiments. Biometrics pages 1014–1029.
Buyse, M., Molenberghs, G., Burzykowski, T., Renard, D., and Geys, H. (2000). The validation of
surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 1, 49–67.
Ellenberg, S. S. and Hamilton, J. M. (1989). Surrogate endpoints in clinical trials: cancer. Statistics
in Medicine 8, 405–413.
Evans, S. R., Rubin, D., Follmann, D., Pennello, G., Huskins, W. C., Powers, J. H., Schoenfeld, D.,
Chuang-Stein, C., Cosgrove, S. E., Fowler Jr, V. G., et al. (2015). Desirability of outcome ranking
(DOOR) and response adjusted for duration of antibiotic risk (RADAR). Clinical Infectious Diseases
61, 800–806.
Frison, L. and Pocock, S. J. (1992). Repeated measures in clinical trials: analysis using mean
summary statistics and its implications for design. Statistics in Medicine 11, 1685–1704.
Greene, H. L., Roden, D. M., Katz, R. J., Woosley, R. L., Salerno, D. M., and Henthorn, R. W.
(1992). The Cardiac Arrhythmia Suppression Trial: first CAST ... then CAST-II. Journal of the
American College of Cardiology 19, 894–898.
Hallstrom, A., Ornato, J., Weisfeldt, M., Travers, A., Christenson, J., McBurnie, M., Zalenski, R.,
Becker, L., and Proschan, M. (2004). Public-access defibrillation and survival after
out-of-hospital cardiac arrest. New England Journal of Medicine 351, 637–646.
49 / 51
References II
Little, R., D’Agostino, R., Cohen, M., Dickersin, K., Emerson, S., Farrar, J., Frangakis, C., Hogan, J.,
Molenberghs, G., Murphy, S., and Neaton, J. (2012). The prevention and treatment of missing
data in clinical trials. NEJM 367, 1355–60.
Molenberghs, G., Buyse, M., Geys, H., Renard, D., Burzykowski, T., and Alonso, A. (2002).
Statistical challenges in the evaluation of surrogate endpoints in randomized trials. Controlled
Clinical Trials 23, 607–625.
Moore, T. (1995). Deadly medicine: Why tens of thousands of heart patients died in America's
worst drug disaster. Simon and Schuster.
Neaton, J. D., Gray, G., Zuckerman, B. D., and Konstam, M. A. (2005). Key issues in end point
selection for heart failure trials: composite end points. Journal of Cardiac Failure 11, 567–575.
Pocock, S. J., Ariti, C. A., Collier, T. J., and Wang, D. (2012). The win ratio: a new approach to the
analysis of composite endpoints in clinical trials based on clinical priorities. European Heart
Journal 33, 176–182.
Ruskin, J. N. (1989). The cardiac arrhythmia suppression trial (CAST).
Shaw, P. A. (2018). Use of composite outcomes to assess risk–benefit in clinical trials. Clinical
Trials 15, 352–358.
Shaw, P. A. and Fay, M. P. (2016). A rank test for bivariate time-to-event outcomes when one event
is a surrogate. Statistics in Medicine 35, 3413–3423.
Svensson, S., Menkes, D., and Lexchin, J. (2013). Surrogate outcomes in clinical trials: A
cautionary tail. JAMA Internal Medicine 173, 611–612.
50 / 51
References III
Wang, Y. and Tian, L. (2017). The equivalence between Mann-Whitney Wilcoxon test and score test
based on the proportional odds model for ordinal responses. In 2017 4th International
Conference on Industrial Economics System and Industrial Security Engineering (IEIS), pages
1–5. IEEE.
51 / 51
Lecture 2: Randomization
1 / 52
Outline
I Basic principles
I Randomization Methods
- simple, permuted block, stratified
I Cluster vs individual designs
I Platform trials
I Adaptive randomization
I Threats to integrity of randomization
2 / 52
Basic definitions (1)
3 / 52
Basic Definitions (2)
4 / 52
Random Examples
Flip a coin
- “Heads” and “Tails” have equal chance for a fair coin
Rolling a die
- The numbers 1 through 6 have equal chance of coming up
Draw one ball out of an urn filled with 10 red balls and 10 blue balls
- The chances of drawing a red or a blue ball are equal
5 / 52
Random Examples
6 / 52
Examples of Randomized Designs
7 / 52
Motivation Behind Randomization
8 / 52
Ethics of Randomization
9 / 52
Masking/Blinding: Key Components of Randomization
10 / 52
Ways to Randomize
I Standard ways:
- Computer programs (R, Stata, SAS, REDCap, ...)
- Random number tables
- Online tools (e.g., randomization.com)
I NOT legitimate
- Odd vs even birth dates
- Last digit of the medical record number
- Alternate as patients enroll
I Theoretically legitimate, but not so in practice
- Flipping a coin
- Rolling dice
- Drawing balls (m&ms) out of an urn (bag)
11 / 52
Summary of Important Features of Randomization
I Random Allocation
- Known chance of receiving each treatment
- Cannot predict the treatment to be given
- Scheme is reproducible
I Minimizes the risk of selection bias
I In double-blinded trials, no response/evaluation bias
I Similar treatment groups
- Patient characteristics will tend to be balanced across study arms
- Chance baseline imbalances between groups may still occur
12 / 52
Types of Randomization
I Simple
I Blocked Randomization
I Stratified Randomization
I Cluster Randomization
I Baseline Covariate Adaptive Allocation
I Response Adaptive Allocation (using interim data)
13 / 52
Simple Randomization
14 / 52
Chance of Imbalance Decreases with Sample Size
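The slide's point can be checked with the binomial distribution: under simple randomization with a fair coin, each arm's size is Binomial(n, 1/2), so the chance of a split worse than, say, 60/40 shrinks quickly with n. A small sketch (the function name is ours):

```python
# Under simple randomization, chance that either arm gets more than a
# given fraction of the n patients.
from math import comb

def prob_worse_than(n, frac):
    cutoff = round(frac * n)       # round() guards against float error
    tail = sum(comb(n, k) for k in range(cutoff + 1, n + 1)) / 2**n
    return 2 * tail                # the two "one arm too big" events are disjoint

for n in (20, 50, 100, 200):
    print(n, round(prob_worse_than(n, 0.60), 4))
```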
15 / 52
Block Randomization
16 / 52
Block Randomization (2)
17 / 52
Issues for Block Randomization
I If blocking is not masked, the sequence can get predictable
Example: block size 4
ABAB BAB? Must be A.
AA?? Must be B B.
I If block too small, unblinding one subject can reveal rest of block
- i.e. if block size is 2, knowing one reveals a second
- Solution: use random block sizes, don’t use block size of 2
I Predictability can lead to selection bias
I Simple solution to selection bias
- Do not reveal blocking mechanism
- Use random block sizes
I A proper analysis would incorporate the blocking used in
randomization, such as a test stratified on the randomization
blocks (Matts and Lachin, 1988)
I This is rarely done
I This is why some have advocated simple randomization for larger
trials: it allows a simpler analysis (Lachin et al., 1988)
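How predictable are fixed blocks? A quick count over all permuted blocks of size 4 (two A's, two B's) gives the average number of positions per block that are completely determined by the earlier assignments in that block (a sketch; the function name is ours):

```python
# Count the positions in a permuted block where the next assignment is
# fully forced by the assignments already seen in that block.
import itertools

def forced_positions(block):
    forced = 0
    for i in range(len(block)):
        remaining = list(block)
        for ch in block[:i]:           # remove assignments already given out
            remaining.remove(ch)
        if len(set(remaining)) == 1:   # only one letter left: predictable
            forced += 1
    return forced

blocks = set(itertools.permutations("AABB"))   # the 6 distinct blocks of size 4
avg = sum(forced_positions(b) for b in blocks) / len(blocks)
print(avg)   # average number of perfectly predictable slots per block
```

With blocks of size 4, on average more than one assignment per block is perfectly predictable once the block boundaries are known, which is why masking the blocking mechanism and using random block sizes matter.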
18 / 52
Sample Code in R
> library(blockrand)
> set.seed(31415)
> list<-blockrand(24,num.levels=2,
levels=c("T","C"),id.prefix="CCP2-",block.sizes=2:4)
> list
id block.id block.size treatment
1 CCP2-01 1 6 T
2 CCP2-02 1 6 T
3 CCP2-03 1 6 C
...
28 CCP2-28 5 8 T
29 CCP2-29 5 8 T
30 CCP2-30 5 8 C
> table(list$treatment)
C T
15 15
19 / 52
Blocked Randomization Example: Flu Vaccine Dose
Escalation Study
20 / 52
Stratified Randomization
I A priori certain factors known to be important predictors of
outcome (e.g. age, gender, diabetes)
I AABB BABA BABA BAAB: a balanced trial of 16, but what if the women
are patients 1, 2, 6, 8, and 16? (Then 4 of the 5 women get A.)
I Stratified randomization: Randomize within strata so different
levels of the factor are balanced between treatment groups
I Stratified blocked randomization is a useful way to achieve
balance
- For each subgroup or strata, perform a separate block
randomization
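The stratified blocked scheme is easy to sketch: run an independent permuted-block sequence within each stratum. Strata and sizes below are hypothetical; this is a sketch, not production randomization code:

```python
# Stratified blocked randomization: a separate permuted-block list per stratum.
import random

def permuted_block(block_size=4, arms=("A", "B")):
    block = list(arms) * (block_size // len(arms))
    random.shuffle(block)
    return block

def stratified_lists(strata, n_per_stratum, block_size=4):
    lists = {}
    for s in strata:
        seq = []
        while len(seq) < n_per_stratum:
            seq.extend(permuted_block(block_size))
        lists[s] = seq[:n_per_stratum]
    return lists

random.seed(1)
lists = stratified_lists(["male", "female"], 12)
for s, seq in lists.items():
    print(s, "".join(seq), "A:", seq.count("A"), "B:", seq.count("B"))
```

Because each stratum's list is built from complete blocks, both factor levels end up balanced between arms within every stratum.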
23 / 52
Design consideration: Who/What to Randomize
I Person
- Most common unit of randomization in RCTs
I Provider
- Doctor
- Nursing station
I Locality
- School
- Community
I The sample size is predominantly determined by the number of
randomized units
- This is due to correlation of repeated samples within a
person/doctor/community
24 / 52
Cluster Randomization
25 / 52
Randomization in Platform Trials
26 / 52
Platform Example
27 / 52
Considerations for Platform Trials
Gold et al. (2022); Berry et al. (2015)
28 / 52
The Danger of Non-concurrent Controls
Dodd et al. (2021)
29 / 52
What is Adaptive Randomization?
30 / 52
Baseline Adaptive Schemes (1)
31 / 52
Baseline Adaptive Schemes (2)
I Dynamic allocation algorithms based on maintaining balance
across multiple important prognostic variables
- Develop an index of imbalance across multiple baseline
covariates
- Minimization: next treatment assignment minimizes current
imbalance
- Other dynamic allocation schemes give the treatment which
minimizes the imbalance a higher probability of assignment
- Benefits: can maintain balance across several prognostic
variables, without worrying about lots of incomplete blocks.
Maintains balance better than stratified permuted block,
particularly in small trials and/or many covariates
- Cons: statistical analysis is less straightforward; easy to get wrong;
hard to document. Classic problem: what happens if an error is found
in the allocation or in a participant's data?
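The minimization idea in the bullets above can be sketched in a few lines. This is a deterministic toy version with hypothetical covariates; real schemes assign the imbalance-minimizing arm with probability less than 1:

```python
# Minimal sketch of minimization: assign each new patient to the arm that
# minimizes total imbalance, summed over that patient's covariate levels.
counts = {}    # (factor, level, arm) -> number already assigned

def imbalance_if(patient, arm, arms=("A", "B")):
    total = 0
    for factor, level in patient.items():
        n = {a: counts.get((factor, level, a), 0) for a in arms}
        n[arm] += 1                       # pretend we assign this arm
        total += max(n.values()) - min(n.values())
    return total

def assign(patient, arms=("A", "B")):
    arm = min(arms, key=lambda a: imbalance_if(patient, a))  # ties -> first arm
    for factor, level in patient.items():
        counts[(factor, level, arm)] = counts.get((factor, level, arm), 0) + 1
    return arm

patients = [{"sex": "F", "diabetes": "yes"},
            {"sex": "F", "diabetes": "no"},
            {"sex": "M", "diabetes": "yes"}]
assignments = [assign(p) for p in patients]
print(assignments)
```

Note how the second and third patients are steered toward B to offset the first patient's covariate levels; this is exactly the behavior that makes the scheme partially predictable and the analysis less straightforward.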
32 / 52
Eye-Opening Experience for Minimization
33 / 52
Eye-Opening Experience for Minimization
34 / 52
ANCOVA: p = 0.035; rerandomization test: p = 0.06
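The rerandomization p-value refers to a randomization-based test: recompute the test statistic over many re-draws of the allocation and see where the observed statistic falls. A generic sketch with simulated data; for a trial randomized by minimization the re-draws should follow the minimization algorithm itself, while plain reshuffling is shown here for simplicity:

```python
# Sketch of a rerandomization test: compare the observed difference in
# means to its distribution over re-draws of the treatment allocation.
import random

random.seed(2023)
outcomes = [random.gauss(0, 1) for _ in range(40)]   # simulated outcomes
assignment = [1] * 20 + [0] * 20
random.shuffle(assignment)                           # allocation actually used

def diff_in_means(y, a):
    t = [yi for yi, ai in zip(y, a) if ai == 1]
    c = [yi for yi, ai in zip(y, a) if ai == 0]
    return sum(t) / len(t) - sum(c) / len(c)

obs = diff_in_means(outcomes, assignment)
null = []
for _ in range(5000):
    random.shuffle(assignment)           # re-draw the randomization
    null.append(diff_in_means(outcomes, assignment))
p_value = sum(abs(d) >= abs(obs) for d in null) / len(null)
print(round(p_value, 3))
```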
35 / 52
Eye-Opening Experience for Minimization
36 / 52
Eye-Opening Experience for Minimization
I For more details on LOTS trial, see Van der Ploeg et al (2010)
NEJM 362, 1396-1406
I For more details about statistical problems minimization caused
see Proschan et al. (2011), and for how to fix them, see
Kuznetsova and Tymofyeyev (2012)
I For more details about mathematics of randomization see:
Rosenberger, W. F., and Lachin, J. M. (2015). Randomization in
clinical trials: theory and practice. John Wiley & Sons.
37 / 52
Response Adaptive Schemes
38 / 52
Goals of Response Adaptive Schemes can vary
39 / 52
ECMO Trial: A Cautionary Tale for Response
Adaptive Allocation
40 / 52
Challenges of Response Adaptive Schemes:
Analytical properties are hard to decipher
I Some argue these trials are more ethical, because they aim to
maximize number of people on the better treatment
- There have been claims of statistical efficiency, but these have
now been shown to be false
I Adaptive allocation designs are difficult to implement without
mistakes or problems with blinding
I Inference for response-adaptive randomization is very
complicated because both the treatment assignment and
responses are correlated (Rosenberger and Lachin, 2015)
I Analytical properties are not well-established, especially of new
designs
I Advice: These methods are controversial and prone to
problems, avoid unless you are an expert and willing to repeat
your trial
41 / 52
Lessons from ECMO: If You Must Use RAR
Proschan and Evans (2020); Chandereng and Chappell (2020)
42 / 52
What Randomization scheme is best?
43 / 52
Maintaining Randomization Integrity
44 / 52
Flavors of ITT
I ITT analysis
- Analyze according to the study regimen assigned
- Requires models to weight observed outcomes or impute missing ones;
requires sensitivity analyses
- The only analysis that preserves randomization
I Modified ITT (MITT) analysis
- ITT, but only include people who take the first dosage
- In well-implemented trials few people drop out before first dose
- Potentially minor departure from ITT if blinded
45 / 52
Analysis Choices (2)
46 / 52
Threats to Randomization Integrity
47 / 52
“Analyze as you randomize”
48 / 52
Summary
49 / 52
Conclusion
50 / 52
References I
Berry, S. M., Connor, J. T., and Lewis, R. J. (2015). The platform trial: an efficient strategy for
evaluating multiple treatments. Jama 313, 1619–1620.
Chandereng, T. and Chappell, R. (2020). How to do response-adaptive randomization (rar) if you
really must. Clinical Infectious Diseases 73, 560.
Dodd, L. E., Freidlin, B., and Korn, E. L. (2021). Platform trials—beware the noncomparable control
group. New England Journal of Medicine 384, 1572–1573.
Gold, S. M., Bofill Roig, M., Miranda, J. J., Pariante, C., Posch, M., and Otte, C. (2022). Platform
trials and the future of evaluating therapeutic behavioural interventions. Nature Reviews
Psychology 1, 7–8.
Kahan, B. C. and Morris, T. P. (2012). Improper analysis of trials randomised using stratified blocks
or minimisation. Statistics in medicine 31, 328–340.
Kuznetsova, O. M. and Tymofyeyev, Y. (2012). Preserving the allocation ratio at every allocation
with biased coin randomization and minimization in studies with unequal allocation. Statistics in
Medicine 31, 701–723.
Lachin, J. M., Matts, J. P., and Wei, L. (1988). Randomization in clinical trials: conclusions and
recommendations. Controlled clinical trials 9, 365–374.
Markaryan, T. and Rosenberger, W. F. (2010). Exact properties of efron’s biased coin
randomization procedure. The Annals of Statistics 38, 1546–1567.
Matts, J. P. and Lachin, J. M. (1988). Properties of permuted-block randomization in clinical trials.
Controlled clinical trials 9, 327–344.
51 / 52
References II
Pocock, S. J., Assmann, S. E., Enos, L. E., and Kasten, L. E. (2002). Subgroup analysis, covariate
adjustment and baseline comparisons in clinical trial reporting: current practiceand problems.
Statistics in medicine 21, 2917–2930.
Proschan, M., Brittain, E., and Kammerman, L. (2011). Minimize the use of minimization with
unequal allocation. Biometrics 67, 1135–1141.
Proschan, M. and Evans, S. (2020). Resist the temptation of response-adaptive randomization.
Clinical Infectious Diseases 71, 3002–3004.
Rosenberger, W. F. and Lachin, J. M. (2015). Randomization in clinical trials: theory and practice.
John Wiley & Sons.
Tsiatis, A. A., Davidian, M., Zhang, M., and Lu, X. (2008). Covariate adjustment for two-sample
treatment comparisons in randomized clinical trials: a principled yet flexible approach. Statistics
in medicine 27, 4658–4677.
52 / 52
Lecture 3: Sample Size/Power
1 / 45
Introduction to Power/Sample Size
Introduction to EZ Principle
2 / 45
Introduction to Power/Sample Size
3 / 45
Introduction to Power/Sample Size
I Examples:
I T-statistic: δ̂ = Ȳ_T − Ȳ_C; se(δ̂) = √(2σ²/n).
4 / 45
Outline
Introduction to Power/Sample Size
Introduction to EZ Principle
Where Does The Key Formula Come From?
General EZ Principle and Applications
t-test
Test of Proportions
Survival
Noninferiority
Lack of Reproducibility
Sample Size: Practical Aspects
Treatment Effect
Nuisance Parameters
Sample Size: Estimation
Sample Size: Safety
Introduction to EZ Principle
5 / 45
Introduction to EZ Principle
I Makes checking sample size calculations quick and easy.
I Primary outcome: change in log viral load from baseline. Use t-test.
I Want 80% power for difference δ = 0.5 and you expect σ = 1.25.
I Investigator says you need 50/arm. Is that correct?
I Expected z-score is
E(Z) = (µ_C − µ_T)/√(2σ²/n) = 0.5/√(2(1.25)²/50) = 2.
6 / 45
Introduction to EZ Principle
I Check:
E(Z) = δ/√(2σ²/n) = 0.5/√(2(1.25)²/100) = 2.828.
I Close to 2.80. Sample size is accurate.
7 / 45
Introduction to EZ Principle
E(Z) = (p_C − p_T)/√(2p(1−p)/n) = (0.20 − 0.12)/√(2(0.16)(1−0.16)/1000) = 4.880.
8 / 45
Introduction to EZ Principle
I Trial is overpowered.
I Check:
E(Z) = (p_C − p_T)/√(2p(1−p)/n) = (0.20 − 0.12)/√(2(0.16)(1−0.16)/400) = 3.086.
I Still slightly overpowered, but not much because E(Z ) is not too
far from 3.00.
9 / 45
Introduction to EZ Principle
I Before looking at more examples, let’s look at the basis for the
EZ principle.
10 / 45
Where Does The Key Formula Come from?
Figure: The standard normal null density for the z-statistic. For a 1-tailed test at
α = 0.025, we reject H0 if Z > 1.96.
11 / 45
Where Does The Key Formula Come from?
Figure: The alternative N(θ, 1) density for Z. For power 0.90, we want the blue
shaded area to be 0.90.
12 / 45
Where Does The Key Formula Come from?
Figure: The blue shaded area in Figure 2 equals the blue shaded area to the right of
1.96 − θ under the standard normal curve. For power 0.90, 1.96 − θ = −1.28, so
θ = 1.96 + 1.28 = 3.24.
13 / 45
General EZ Principle and Applications
14 / 45
General EZ Principle and Applications
Figure: The area to the right of z_β is β, so the area to the left of z_β is 1 − β = power.
15 / 45
General EZ Principle and Applications
I Example: return to hepatitis C (HCV) trial.
I Primary outcome: Change in log viral load from baseline. T-test.
I Want sample size for 80% power for 2-sided test at α = 0.05.
I δ = 0.5 and σ = 1.25.
Z = δ̂/se(δ̂) = (Ȳ_C − Ȳ_T)/√(2σ²/n);  E(Z) = δ/√(2σ²/n) = 0.5/√(2(1.25)²/n).
Setting 0.5/√(2(1.25)²/n) = 1.96 + 0.84 = 2.80 gives
n = 2(1.25)²(2.80)²/(0.5)² = 98.
Need 98/arm.
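The calculation above, packaged as a function (the name is ours). It uses exact normal quantiles rather than the rounded 1.96 + 0.84, so it returns 99 for the HCV example rather than the slide's 98:

```python
# n per arm for a two-sample comparison of means via the EZ principle:
# n = 2 * sigma^2 * (z_{alpha/2} + z_beta)^2 / delta^2.
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    n = 2 * sigma**2 * z_total**2 / delta**2
    return ceil(round(n, 8))   # round() guards against float edge cases

print(n_per_arm(0.5, 1.25))    # HCV example: exact quantiles give 99, not 98
```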
16 / 45
General EZ Principle and Applications
Z = δ̂/se(δ̂) = (Ȳ_C − Ȳ_T)/√(2σ²/n)
E(Z) = δ/√(2(1.25)²/75) = z_{α/2} + z_β (EZ Principle)
δ/√(2(1.25)²/75) = 1.96 + 0.84 = 2.80
δ = 2.80 √(2(1.25)²/75) = 0.57.
Detectable effect is 0.57 logs.
17 / 45
General EZ Principle and Applications
I What is power for detecting 0.5 log if you only recruit 75/arm?
Z = δ̂/se(δ̂) = (Ȳ_C − Ȳ_T)/√(2σ²/n);  E(Z) = δ/√(2σ²/n)
E(Z) = 0.5/√(2(1.25)²/75) = z_{α/2} + z_β (EZ Principle)
2.449 = 1.96 + z_β
0.489 = z_β
Power = Φ(0.489) ≈ 0.69.
Z = (p̂_C − p̂_T)/√(2p(1−p)/n).
E(Z) = (p_C − p_T)/√(2p(1−p)/n) = (0.20 − 0.12)/√(2(0.16)(1−0.16)/n).
Setting (0.20 − 0.12)/√(2(0.16)(1−0.16)/n) = z_{α/2} + z_β = 3 (EZ Principle) gives
n ≈ 2(0.16)(0.84)(3)²/(0.08)² = 378.   (2)
Need 378/arm.
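The same calculation as a function (the name is ours), using the slide's total z of 3 (i.e., 1.96 + 1.04) and the pooled p = (p_C + p_T)/2:

```python
# n per arm for comparing two proportions via the EZ principle:
# n = 2 * pbar * (1 - pbar) * z_total^2 / (p_C - p_T)^2.
from math import ceil

def n_per_arm_props(pc, pt, z_total=3.0):
    pbar = (pc + pt) / 2
    n = 2 * pbar * (1 - pbar) * z_total**2 / (pc - pt)**2
    return ceil(round(n, 8))   # round() guards against float edge cases

print(n_per_arm_props(0.20, 0.12))   # slide's example: 378 per arm
```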
19 / 45
General EZ Principle and Applications
Z = (p̂_C − p̂_T)/√(2p(1−p)/n).
E(Z) = (p_C − p_T)/√(2p(1−p)/n) = (0.20 − 0.12)/√(2(0.16)(1−0.16)/300) = 2.673
20 / 45
General EZ Principle and Applications
I Schoenfeld (1981) derives sample size for survival tests.
            Dead     Alive      Total
Control     O_i        ·        n_Ci
Treatment    ·         ·        n_Ti
Total        1      n_i − 1     n_i
n_Ci, n_Ti = numbers at risk in control and treatment just prior to the ith death.
22 / 45
General EZ Principle and Applications
I FUN FACT: Each δˆi = (Oi − Ei )/Vi estimates the log hazard ratio
and has variance 1/Vi .
The inverse-variance weighted average of these estimates is
δ̂ = ∑_{i=1}^d (O_i − E_i) / ∑_{i=1}^d V_i.   (4)
23 / 45
General EZ Principle and Applications
Logrank z-statistic is
Z = δ̂/√(var(δ̂)) = ∑_{i=1}^d (O_i − E_i) / √(∑_{i=1}^d V_i) = δ̂ √(∑_{i=1}^d V_i),   (5)
and since each V_i ≈ 1/4,
Z ≈ δ̂ √(d/4),  E(Z) ≈ δ √(d/4),   (6)
where d = number of deaths and δ̂, δ are the estimated and true log hazard ratios.
24 / 45
General EZ Principle and Applications
E(Z) = δ √(d/4) = (z_{α/2} + z_β) (EZ Principle)
d = 4(z_{α/2} + z_β)² / δ²,   (7)
where δ = log hazard ratio (parameterized so that large hazard ratios
show that treatment works).
25 / 45
General EZ Principle and Applications
E(Z) = δ √(d/4) = (z_{α/2} + z_β) = (1.96 + 1.04) = 3 (EZ Principle)
d = 4(3)²/δ² = 4(3)²/{ln(1.333)}² ≈ 436 events.   (8)
26 / 45
General EZ Principle and Applications
E(Z) = ln(1.333) √(350/4) = 2.689
27 / 45
General EZ Principle and Applications
E(Z) = ln(λ) √(350/4) = z_{α/2} + z_β = 1.96 + 1.04 = 3 (EZ Principle)
λ = exp(3/√(350/4)) = 1.378.   (10)
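Both survival calculations, required events from the log hazard ratio (equations 7-8) and detectable hazard ratio from a fixed number of events (equation 10), in code; function names are ours:

```python
# d = 4 * z_total^2 / (log hr)^2 events are needed; conversely the
# detectable hazard ratio with d events is exp(z_total / sqrt(d/4)).
from math import ceil, exp, log, sqrt

def events_needed(hr, z_total=3.0):
    return ceil(4 * z_total**2 / log(hr)**2)

def detectable_hr(d, z_total=3.0):
    return exp(z_total / sqrt(d / 4))

print(events_needed(1.333))            # slide: about 436 events
print(round(detectable_hr(350), 3))    # slide: 1.378
```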
28 / 45
General EZ Principle and Applications
29 / 45
General EZ Principle and Applications
30 / 45
General EZ Principle and Applications
If p_S = p_N,
E(Z) ≈ 0.10/√(2p(1−p)/n) = (z_{α/2} + z_β) (EZ Principle)
0.10/√(2p(1−p)/n) = 1.645 + 1.282 = 2.927.
31 / 45
General EZ Principle and Applications
E(Z ) = 1.96 + zβ
1.96 = 1.96 + zβ
0 = zβ
Φ(0) = Φ(zβ ) = 1 − β = power
0.5 = power (12)
32 / 45
Sample Size: Practical Aspects
33 / 45
Sample Size: Practical Aspects: Treatment Effect
I If treatment has few side effects (e.g., a diet), even a small effect
is worthwhile.
34 / 45
Sample Size: Practical Aspects: Treatment Effect
35 / 45
Sample Size: Practical Aspects: Nuisance
Parameters
36 / 45
Sample Size: Practical Aspects: Nuisance
Parameters
I Useful formula for variance of change from baseline (BL) to end
of study (EOS):
var(Y_EOS − Y_BL) = σ²_BL + σ²_EOS − 2ρ σ_BL σ_EOS = 2σ²(1 − ρ) when σ_BL = σ_EOS = σ.
I NOTE: If ρ < 0.5, then you should use Y_EOS, NOT Y_EOS − Y_BL.
Even better, use the baseline value as a covariate (also called analysis
of covariance, ANCOVA).
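The 2σ²(1 − ρ) formula is easy to check by simulating correlated baseline/end-of-study pairs (σ = 1, ρ = 0.3 here, both hypothetical). The check also illustrates the NOTE: with ρ < 0.5, the change score's variance exceeds var(Y_EOS) = σ² = 1:

```python
# Simulate correlated (baseline, end-of-study) pairs and check that
# var(Y_EOS - Y_BL) is close to 2 * sigma^2 * (1 - rho).
import random
from math import sqrt

random.seed(7)
rho, sigma, n = 0.3, 1.0, 200_000
changes = []
for _ in range(n):
    y_bl = random.gauss(0, sigma)
    y_eos = rho * y_bl + sqrt(1 - rho**2) * random.gauss(0, sigma)  # corr = rho
    changes.append(y_eos - y_bl)
m = sum(changes) / n
var = sum((c - m)**2 for c in changes) / (n - 1)
print(round(var, 3), "theory:", 2 * sigma**2 * (1 - rho))
```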
38 / 45
Sample Size: Practical Aspects: Nuisance
Parameters
I Options:
I Increase treatment effect. PI: “A larger effect is unrealistic.”
I Use a different primary endpoint. E.g., add stroke to composite of
coronary heart disease/death. Statistician: “That will work as long
as treatment has a similar effect on added component. Otherwise,
you could decrease power.”
39 / 45
Sample Size: Estimation
I Set
1.96 √(2p(1 − p)/n) = 0.15,
i.e., require the 95% confidence interval for the difference in proportions
to have half-width 0.15.
40 / 45
Sample Size: Estimation
n = 2(1.96)² p(1 − p)/(0.15)² = 341.4756 p(1 − p).   (13)
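Equation (13) in code; the constant 341.4756 = 2(1.96)²/(0.15)² corresponds to estimating a difference in proportions to within ±0.15, with the worst case at p = 0.5 (the function name is ours):

```python
# Sample size so that the 95% CI half-width 1.96*sqrt(2*p*(1-p)/n) for a
# difference in proportions is at most the given margin.
from math import ceil

def n_for_margin(p, margin=0.15, z=1.96):
    return ceil(2 * z**2 * p * (1 - p) / margin**2)

for p in (0.5, 0.3, 0.1):
    print(p, n_for_margin(p))
```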
41 / 45
Sample Size: Safety
42 / 45
Summary
E(Z) = z_{α/2} + z_β.
43 / 45
Summary
44 / 45
References I
45 / 45