Chapter 1: Measurement: Summary Points and Objectives

This document outlines the key objectives and concepts covered in each chapter of the textbook "Basic Biostatistics". The chapters cover: [1] measurement scales and data tables; [2] experimental and observational study designs; [3] frequency distributions and summary statistics like the mean, median, and standard deviation; [4] probability concepts and binomial and normal distributions; [5] statistical inference including hypothesis testing and confidence intervals; and [6] quantitative methods for comparing a mean to a hypothesized value or between paired groups. The objectives emphasize understanding different study designs, probability distributions, how to calculate and interpret common statistics, and performing hypothesis tests and constructing confidence intervals.

Uploaded by

Francis Karanja

Basic Biostatistics by B. Burt Gerstman
Summary Points and Objectives

Chapter 1: Measurement
ƒ Biostatistics is more than a compilation of computational techniques!
ƒ Identify the main types of measurement scales: quantitative, ordinal, and categorical.
ƒ Understand the layout of a data table (observations, variables, values)
ƒ Appreciate the essential nature of data quality (GIGO principle).

Chapter 2: Types of Studies


ƒ Understand the difference between experimental and non-experimental (“observational”) designs

ƒ Understand the procedure for a simple random sample


ƒ Understand the procedure for randomizing a treatment
ƒ Define “confounding” and “lurking variable”
ƒ List preconditions for confounding

Chapter 3: Frequency Distributions


ƒ Create and interpret stemplots
ƒ Describe distributional shape, location, and spread; check for outliers
ƒ Create frequency tables containing frequency, relative frequency, cumulative frequency using
uniform or non-uniform class intervals

Chapter 4: Summary Statistics


ƒ Appreciate that great care must be taken in interpreting and reporting statistics!
ƒ Sample mean: $\bar{x} = \frac{1}{n}\sum x_i$
ƒ Median: Form an ordered array. The median is the value with a depth of $\frac{n+1}{2}$; when n is even,
average the two middle values.

C:\data\biostat-text\SummaryPoints.doc printed 6/3/2008 9:52:00 AM Page 1 of 8


ƒ Quartiles (Tukey’s hinges): Divide the ordered array at the median; when n is odd, the median
belongs to both the low group and the high group. Q1 is median of the low group. Q3 is the
median of the high group.
ƒ Five-point summary: minimum, Q1, median, Q3, maximum
ƒ IQR = Q3 − Q1
ƒ Boxplot: plot median and quartiles (box); determine upper and lower fences: FL = Q1 − 1.5·IQR,
FU = Q3 + 1.5·IQR; plot outside values; draw whiskers from hinges to inside values
ƒ Understand the strengths and limitations of the mean, median, and mode
ƒ Sample variance: $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$
ƒ Sample standard deviation: $s = \sqrt{s^2}$; direct formula $s = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$
ƒ Select descriptive statistics suitable for distributional shape
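
A minimal Python sketch of the Chapter 4 calculations, using only the standard library. The `ages` data and all function names are hypothetical, invented for illustration; the quartiles follow the Tukey's hinges rule described above (when n is odd, the median joins both halves).

```python
import math

def median_by_depth(sorted_values):
    """Median as the value at depth (n + 1) / 2 in an ordered array."""
    n = len(sorted_values)
    depth = (n + 1) / 2
    lower = sorted_values[math.floor(depth) - 1]
    upper = sorted_values[math.ceil(depth) - 1]
    return (lower + upper) / 2  # lower == upper when n is odd

def five_point_summary(values):
    """Minimum, Q1, median, Q3, maximum using Tukey's hinges."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # when n is odd, the median belongs to both groups
    low, high = data[:half], data[n - half:]
    return (data[0], median_by_depth(low), median_by_depth(data),
            median_by_depth(high), data[-1])

def fences(q1, q3):
    """Lower and upper boxplot fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]  # hypothetical data
minimum, q1, med, q3, maximum = five_point_summary(ages)
lower_fence, upper_fence = fences(q1, q3)
outside = [x for x in ages if x < lower_fence or x > upper_fence]
print(minimum, q1, med, q3, maximum, outside, round(sample_sd(ages), 2))
```

Values listed in `outside` would be plotted as separate points on the boxplot; the whiskers run from the hinges to the most extreme values still inside the fences.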

Chapter 5: Probability Concepts


ƒ Understand and use in practice these basic rules for probabilities:
(1) 0 ≤ Pr(A) ≤ 1
(2) Pr(S) = 1
(3) Pr(Ā) = 1 − Pr(A)
(4) Pr(A or B) = Pr(A) + Pr(B) for disjoint events
ƒ Use probability mass functions (pmfs) to find probabilities for discrete random variables
ƒ Use probability density functions (pdfs) to find probabilities for continuous random variables
ƒ Optional: Understand the more advanced rules for probabilities: (5) Independence rule (6) General
rule of addition (7) Conditional probability definition (8) General rule of multiplication (9) Total
probability rule (10) Bayes’ theorem

Chapter 6: Binomial Distributions


ƒ Identify a binomial random variable and its parameters: X~b(n,p)
ƒ Calculate and interpret binomial probabilities: $\Pr(X = x) = {}_nC_x \, p^x q^{n-x}$ where ${}_nC_x = \frac{n!}{x!(n-x)!}$
ƒ Calculate and interpret expected values (mean) and standard deviation for binomial random
variables: $\mu = np$ and $\sigma = \sqrt{npq}$ where $q = 1 - p$.
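
The binomial formulas above can be sketched in a few lines of Python. The parameter values (n = 4, p = 0.25) and the function names are assumptions made up for this illustration.

```python
import math

def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ b(n, p): nCx * p^x * q^(n - x)."""
    q = 1 - p
    return math.comb(n, x) * p**x * q**(n - x)

def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(npq)."""
    return n * p, math.sqrt(n * p * (1 - p))

n, p = 4, 0.25  # hypothetical parameters
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mu, sigma = binomial_mean_sd(n, p)
print(probs, mu, sigma)
```

Because the pmf covers every possible outcome 0, 1, ..., n, the entries of `probs` should sum to 1, which is a quick check on any hand calculation.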

Chapter 7: Normal Distributions


ƒ Characterize and sketch Normal distributions with parameters μ and σ: X ~ N(μ, σ)
ƒ Use the 68–95–99.7 rule to determine approximate probabilities for Normal random variables
ƒ Characterize and sketch the Standard Normal random variable Z ~ N(0,1); understand Table B
ƒ Finding Normal probabilities: (1) State (2) Standardize $z = \frac{x - \mu}{\sigma}$ (3) Sketch (4) Table B
ƒ Finding percentile values on a Normal distribution: (1) State (2) Sketch (3) Table B (4)
Unstandardize: $x = \mu + z_p \sigma$
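
As a rough sketch, Python's `statistics.NormalDist` can stand in for Table B. The example values (μ = 100, σ = 15) and the helper names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal, Z ~ N(0, 1); plays the role of Table B

def normal_prob_below(x, mu, sigma):
    """Pr(X <= x): standardize, then look up the cumulative probability."""
    z = (x - mu) / sigma
    return Z.cdf(z)

def normal_percentile(p, mu, sigma):
    """Value x with Pr(X <= x) = p: look up z_p, then unstandardize."""
    z_p = Z.inv_cdf(p)
    return mu + z_p * sigma

mu, sigma = 100, 15  # hypothetical parameters
within_one_sd = Z.cdf(1) - Z.cdf(-1)  # the "68" part of the 68-95-99.7 rule
print(normal_prob_below(130, mu, sigma))
print(normal_percentile(0.975, mu, sigma))
print(within_one_sd)
```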

Chapter 8: Introduction to Statistical Inference


ƒ Define statistical inference; list the two primary forms of statistical inference
ƒ Distinguish parameters from statistics!
ƒ Understand the method of simulating a sampling distribution of a mean
ƒ Characterize the sampling distribution of $\bar{x}$ from a Normal population: $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$
ƒ Understand the standard error of $\bar{x}$ in relation to the square root law: $SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

ƒ Appreciate that the central limit theorem assures $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$, at least approximately, when the sample size is
moderate to large
ƒ Know that the law of large numbers assures that $\bar{x}$ approaches μ as the sample gets large
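
One way to simulate a sampling distribution of a mean is sketched below. The population values (μ = 100, σ = 15), the sample size, the number of repetitions, and the seed are all assumptions chosen for illustration, not values from the text.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` simple random samples of size n from N(mu, sigma)
    and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

means = simulate_sample_means(mu=100, sigma=15, n=25, reps=5000)
theoretical_se = 15 / math.sqrt(25)  # square root law: sigma / sqrt(n)
print(statistics.fmean(means), statistics.stdev(means), theoretical_se)
```

The standard deviation of the simulated means should land close to the theoretical standard error, and their average close to μ, in line with the square root law and the law of large numbers.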

Chapter 9: Basics of Hypothesis Testing


ƒ Appreciate that hypothesis testing looks for evidence against the claim of H0 and understand the
meaning of each step of the procedure:
Step A. H0 and Ha
Step B. Test statistic
Step C. P-value
Step D. Optional: Significance level
ƒ See how hypothesis testing relates to the sampling distribution of $\bar{x}$
ƒ Conduct one-sample tests of means when σ is known:
Conditions: SRS, Normal population or moderate to large sample size.
(A.) H0: μ = μ0 (B.) $z = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}}$ where $SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (C.) P-value and interpretation
ƒ Define: type I error; type II error; beta; power
ƒ Determine the power and sample size requirements of a test (these objectives are covered /
reviewed under the Chapter 11 objectives)
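
Steps B and C of the one-sample z test can be sketched as follows. The input numbers are hypothetical, and the two-sided P-value is an assumption about the alternative hypothesis (Ha: μ ≠ μ0).

```python
import math
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n):
    """Return (z, two-sided P-value) for H0: mu = mu0 when sigma is known."""
    se = sigma / math.sqrt(n)            # standard error of the mean
    z = (xbar - mu0) / se                # Step B: test statistic
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # Step C: P-value
    return z, p_two_sided

# Hypothetical example: xbar = 173, mu0 = 170, sigma = 40, n = 64
z, p_value = one_sample_z_test(173, 170, 40, 64)
print(z, p_value)
```

Here the standard error is 40/√64 = 5, so z = 0.6; the large P-value means the data provide little evidence against H0.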

Chapter 10: Basics of Confidence Intervals


ƒ Appreciate how a confidence interval seeks to locate a parameter with a given margin of error
ƒ See how confidence interval estimation relates to the sampling distribution of $\bar{x}$
ƒ Calculate and interpret confidence intervals for μ at various levels of confidence when σ is known:
Conditions: SRS, Normal population or moderate to large sample size.
Formula: $\bar{x} \pm z_{1-\alpha/2} \cdot SE_{\bar{x}}$ where $SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
ƒ Determine sample size requirements for estimating μ with given level of confidence and margin of
error (see Chap 11 for formula)
ƒ Understand the relationship between confidence interval location and hypothesis testing
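
A short sketch of the z confidence interval for μ when σ is known. The sample values are hypothetical, and `z_confidence_interval` is a name made up for this example.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """(1 - alpha)100% confidence interval for mu when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}
    margin = z * sigma / math.sqrt(n)         # margin of error
    return xbar - margin, xbar + margin

# Hypothetical example: xbar = 173, sigma = 40, n = 64
lcl, ucl = z_confidence_interval(173, 40, 64)
print(lcl, ucl)
```

In this example the 95% interval includes 170, which matches the link between interval location and testing: a two-sided test of H0: μ = 170 at α = 0.05 would not reject.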

PART II: QUANTITATIVE RESPONSE VARIABLE


Chapter 11: Inference about a Mean
ƒ Quantitative response variable, no explanatory variable per se (single sample or paired samples)
ƒ Understand when to use t procedures
ƒ Sketch t distributions; use Table C to look up t values and associated probabilities
ƒ Conduct one-sample and paired-sample t tests (conditions: SRS, population Normal or large sample):
(A.) H0: μ = μ0 (B.) $t_{\text{stat}} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}}$ where $SE_{\bar{x}} = \frac{s}{\sqrt{n}}$ with n – 1 df (C.) P-value and interpretation
ƒ Calculate and interpret one-sample and paired-sample confidence intervals for μ:
Formula: $\bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot SE_{\bar{x}}$

ƒ Recognize paired samples and adapt the one-sample t procedures to paired samples
ƒ Evaluate the Normality assumption in small, medium, and large samples
ƒ Conduct sample size and power analyses:
o to limit margin of error m when estimating μ, use $n = \left( z_{1-\alpha/2} \frac{\sigma}{m} \right)^2$


o to detect a difference of Δ with stated power and α, use $n = \frac{\sigma^2 \left( z_{1-\beta} + z_{1-\alpha/2} \right)^2}{\Delta^2}$
o to determine the power of a test to detect Δ, $1 - \beta = \Phi\left( -z_{1-\alpha/2} + \frac{|\Delta| \sqrt{n}}{\sigma} \right)$
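
The three sample size and power formulas for a single mean can be sketched as below. The inputs (σ = 15, m = 5, Δ = 5) are hypothetical, and rounding n up to the next whole number is a common convention assumed here rather than something stated in these notes.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard Normal; Z.cdf is Phi, Z.inv_cdf gives z quantiles

def n_for_margin(sigma, m, alpha=0.05):
    """Sample size to limit the margin of error to m when estimating mu."""
    z = Z.inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / m) ** 2)  # rounding up is an assumption

def n_for_power(sigma, delta, power=0.80, alpha=0.05):
    """Sample size to detect a difference delta with the stated power."""
    z_beta = Z.inv_cdf(power)             # z_{1 - beta}
    z_alpha = Z.inv_cdf(1 - alpha / 2)    # z_{1 - alpha/2}
    return math.ceil(sigma**2 * (z_beta + z_alpha) ** 2 / delta**2)

def power_of_test(sigma, delta, n, alpha=0.05):
    """Power 1 - beta = Phi(-z_{1 - alpha/2} + |delta| sqrt(n) / sigma)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(-z_alpha + abs(delta) * math.sqrt(n) / sigma)

print(n_for_margin(sigma=15, m=5))
print(n_for_power(sigma=15, delta=5))
print(power_of_test(sigma=15, delta=5, n=71))
```

Feeding the n returned by `n_for_power` back into `power_of_test` should give a power at or just above the requested level, which is a useful consistency check between the two formulas.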

Chapter 12: Comparing Independent Means


ƒ Quantitative response variable, binary explanatory variable (two independent samples)
ƒ Compare group means, standard deviations, sample sizes
ƒ Compare group distributions graphically (e.g., side-by-side boxplots, side-by-side stemplots)
ƒ Conduct independent t test (conditions: independent samples and Normality or large samples):
(A.) H0: μ1 = μ2 (B.) $t_{\text{stat}} = \frac{\bar{x}_1 - \bar{x}_2}{SE_{\bar{x}_1 - \bar{x}_2}}$ where $SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ with $df_{\text{conservative}}$ = smaller of (n1 – 1)
or (n2 – 1) [use $df_{\text{Welch}}$ when working with a computer] (C.) P-value and interpretation
ƒ Calculate and interpret (1 − α)100% confidence interval for μ1 − μ2:
Formula: $(\bar{x}_1 - \bar{x}_2) \pm (t_{df,\,1-\alpha/2})(SE_{\bar{x}_1 - \bar{x}_2})$

ƒ Optional: Be aware of and understand the historical relevance of equal variance (“pooled”) t procedures,
where $SE = \sqrt{s^2_{\text{pooled}} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$, $s^2_{\text{pooled}} = \frac{df_1 \cdot s_1^2 + df_2 \cdot s_2^2}{df_1 + df_2}$, and $df = (n_1 - 1) + (n_2 - 1)$
ƒ Power and sample size
ƒ To estimate μ1 − μ2 with margin of error m, use $n = \frac{2\sigma^2 z_{1-\alpha/2}^2}{m^2}$ in each group
ƒ To test H0: μ1 = μ2 to detect Δ at given (1–β) and α: use $n = \frac{2\sigma^2 \left( z_{1-\beta} + z_{1-\alpha/2} \right)^2}{\Delta^2}$ in each group
ƒ If it is not possible to study groups of equal size, then determine n by the above formulas, fix the
size of n1, and have $n_2 = \frac{n n_1}{2 n_1 - n}$.
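
A sketch of the unequal-variance calculations from group summaries. The summary numbers are hypothetical. The notes name $df_{\text{Welch}}$ without giving its formula; the expression used below is the standard Welch-Satterthwaite approximation, supplied here as an assumption about what a computer would report.

```python
import math

def independent_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t_stat, SE, df_conservative, df_welch) for two independent samples."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)                  # SE of the mean difference
    t_stat = (mean1 - mean2) / se
    df_conservative = min(n1 - 1, n2 - 1)    # smaller of (n1 - 1) or (n2 - 1)
    # Welch-Satterthwaite df (standard formula, not spelled out in the notes)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, se, df_conservative, df_welch

def unequal_group_size(n, n1):
    """Size of group 2 when group 1 is fixed at n1 and equal groups would need n each."""
    return n * n1 / (2 * n1 - n)

# Hypothetical summaries: group 1 (mean 10, s 3, n 9), group 2 (mean 7, s 4, n 16)
t_stat, se, df_cons, df_welch = independent_t(10, 3, 9, 7, 4, 16)
print(t_stat, se, df_cons, df_welch)
print(unequal_group_size(n=30, n1=20))
```

The unequal-size rule keeps 1/n1 + 1/n2 equal to 2/n, so the standard error of the difference stays the same as in the equal-size design; with n = 30 and n1 = 20, 1/20 + 1/60 = 2/30.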

Chapter 13: ANOVA


ƒ Quantitative response variable, categorical explanatory variable (k independent samples)
ƒ Always start with descriptive and exploratory comparisons!
ƒ ANOVA test (conditions: independent samples, normality, equal variance)
(A.) H0: μ1 = μ2 = … = μk versus Ha: at least two of the population means differ
(B.) Fstat with dfB and dfW from ANOVA table
(C.) P-value and interpretation

Variance | Sum of Squares | df | Mean Square
Between groups | $SS_B = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$ | $df_B = k - 1$ | $MS_B = \frac{SS_B}{df_B}$
Within groups | $SS_W = \sum_{i=1}^{k} (n_i - 1) s_i^2$ | $df_W = N - k$ | $MS_W = \frac{SS_W}{df_W}$
Total | $SS_T = SS_B + SS_W$ | $df = df_B + df_W$ |

$F_{\text{stat}} = \frac{MS_B}{MS_W}$ with $df_B$ and $df_W$

ƒ Use post-hoc procedures such as the least significant difference (LSD) method to delineate significant
differences: (A.) H0: μi = μj for groups i and j (B.) $t_{\text{stat}} = \frac{\bar{x}_i - \bar{x}_j}{SE_{\bar{x}_i - \bar{x}_j}}$ where
$SE_{\bar{x}_i - \bar{x}_j} = \sqrt{MS_W \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$ and df = N – k (C.) P-value and interpretation
ƒ Recognize the problem of multiple comparisons and use the Bonferroni method to keep the
family-wise error rate in check (when appropriate): $P_{\text{Bonf}} = P_{\text{LSD}} \times c$ where c represents the number
of post hoc comparisons made.
ƒ Assess the equal variance assumption graphically, by comparing group standard deviations, and
with Levene’s test of H0: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$.
ƒ Use robust non-parametric ANOVA (i.e., the Kruskal-Wallis test) when necessary.
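
The ANOVA table and the post hoc calculations can be sketched from group summaries alone. The three groups below are hypothetical, and capping the Bonferroni-adjusted P-value at 1 is an assumption added here because a probability cannot exceed 1.

```python
import math

def anova_from_summaries(groups):
    """Build the one-way ANOVA table from a list of (n_i, mean_i, sd_i) tuples."""
    k = len(groups)
    N = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / N
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)  # between groups
    ss_w = sum((n - 1) * s**2 for n, _, s in groups)             # within groups
    df_b, df_w = k - 1, N - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"SSB": ss_b, "SSW": ss_w, "dfB": df_b, "dfW": df_w,
            "MSB": ms_b, "MSW": ms_w, "F": ms_b / ms_w}

def lsd_t(mean_i, n_i, mean_j, n_j, ms_w):
    """LSD t statistic for comparing groups i and j (df = N - k)."""
    se = math.sqrt(ms_w * (1 / n_i + 1 / n_j))
    return (mean_i - mean_j) / se

def bonferroni(p_lsd, c):
    """Bonferroni-adjusted P-value for c post hoc comparisons, capped at 1."""
    return min(1.0, p_lsd * c)

groups = [(5, 10, 2), (5, 12, 2), (5, 17, 2)]  # hypothetical (n, mean, sd)
table = anova_from_summaries(groups)
print(table)
print(lsd_t(17, 5, 10, 5, table["MSW"]))
```

The P-values for the F and t statistics still have to come from tables or software, since the standard library has no F or t distribution.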

Chapter 14: Correlation and Regression

ƒ Quantitative explanatory variable; quantitative response variable


ƒ Linear relations only!
ƒ Start with a scatterplot. Describe form, direction, and strength. Also check for outliers.
ƒ Correlation does not necessarily indicate causation; beware of lurking variables.
ƒ Correlation coefficient r is always between −1 and 1; it quantifies the direction (positive/negative) and
strength of an association. As rules of thumb, |r| < 0.3 suggests a weak association and |r| > 0.7 suggests
a strong one (take these with a grain of salt: there are no firm cutoffs, and they are best used merely as a screening tool).
Formula: $r = \frac{1}{n-1} \sum z_X z_Y$
[Use calculator or software tool to check calculations.]


ƒ Inferences about population correlation coefficient ρ:
To test H0: ρ = 0, use $t_{\text{stat}} = \frac{r}{SE_r}$ where $SE_r = \sqrt{\frac{1 - r^2}{n - 2}}$ and df = n – 2
Confidence interval for ρ: $LCL = \frac{r - \varpi}{1 - r\varpi}$ and $UCL = \frac{r + \varpi}{1 + r\varpi}$ where $\varpi = \sqrt{\frac{t^2_{df,\,1-\alpha/2}}{t^2_{df,\,1-\alpha/2} + df}}$
ƒ Least squares regression model: $\hat{y} = a + bx$ where $b = r \frac{s_Y}{s_X}$ and $a = \bar{y} - b\bar{x}$.
ƒ Slope estimate b is the key statistic in all this, representing the predicted change in Y per unit X.
ƒ Inference about population slope β:
Standard error of the regression: $s_{Y|x} = \sqrt{\frac{1}{n-2} \sum \text{residuals}^2}$ with df = n – 2
(1 − α)100% confidence interval for β: $b \pm (t_{n-2,\,1-\alpha/2})(SE_b)$ where $SE_b = \frac{s_{Y|x}}{\sqrt{n-1} \cdot s_X}$
To test H0: β = 0, use $t_{\text{stat}} = \frac{b}{SE_b}$
Optional: An ANOVA procedure can be used to test H0: β = 0 using an Fstat (pp. 321–324)
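
The correlation and simple regression formulas of this chapter can be checked with a small sketch. The five data points and the function name are hypothetical.

```python
import math
import statistics

def correlation_and_regression(x, y):
    """Compute r, the least squares line, and the standard errors used for inference."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    # r is the average product of z-scores, using the n - 1 divisor
    r = sum(((xi - xbar) / s_x) * ((yi - ybar) / s_y)
            for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x                 # slope
    a = ybar - b * xbar               # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_y_x = math.sqrt(sum(e**2 for e in residuals) / (n - 2))
    se_b = s_y_x / (math.sqrt(n - 1) * s_x)
    se_r = math.sqrt((1 - r**2) / (n - 2))
    return {"r": r, "a": a, "b": b, "s_y_x": s_y_x,
            "t_slope": b / se_b, "t_corr": r / se_r, "df": n - 2}

x = [1, 2, 3, 4, 5]   # hypothetical explanatory values
y = [2, 4, 5, 4, 5]   # hypothetical response values
fit = correlation_and_regression(x, y)
print(fit)
```

The t statistic for H0: β = 0 and the t statistic for H0: ρ = 0 come out identical, which reflects that the two tests are equivalent in simple linear regression.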

Chapter 15: Multiple Regression


ƒ Multiple regression is an extension of simple regression; students should master simple regression
before moving on to multiple regression.
ƒ The quantitative response variable Y depends on multiple explanatory variables X1, X2, …, Xk via this
model: $\hat{y} = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$.

ƒ Categorical explanatory variables can be entered into the model if coded with indicator “dummy”
variables.
ƒ The computer uses a least squares criterion to fit a regression surface by minimizing ∑residuals2.
ƒ The key statistics are the slope estimates (the $b_i$), representing predicted changes in Y per unit Xi, adjusting
for the other explanatory variables in the model.
ƒ Interpret confidence intervals for each βi
ƒ Interpret t tests for each H0: βi = 0.
ƒ Residuals are examined to assess linearity, independence, normality, equal variance.
ƒ Optional analysis of variance derives:

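 | Sum of Squares | df | Mean Square
Regression | $\sum (\hat{y}_i - \bar{y})^2$ | k | $\frac{SS_{\text{regression}}}{df_{\text{regression}}}$
Residual (“error”) | $\sum (y_i - \hat{y}_i)^2$ | n − k − 1 | $\frac{SS_{\text{residual}}}{df_{\text{residual}}}$
Total | $\sum (y_i - \bar{y})^2$ | n − 1 |

$F_{\text{stat}} = \frac{MS_{\text{regression}}}{MS_{\text{residual}}}$ with k and n − k − 1 dfs
Model fit (of secondary concern) is quantified with $R^2 = \frac{\text{Sum of Squares Regression}}{\text{Sum of Squares Total}}$.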

PART III: CATEGORICAL RESPONSE VARIABLE


Chapter 16: Inference about a Proportion
ƒ Single sample; binary outcome.
ƒ Sample proportion $\hat{p}$ is viewed in the context of a binomial numerator (x) and constant
denominator (n); inferences are directed toward the binomial parameter p
ƒ $\hat{p}$ represents incidence or prevalence, depending on how the data are accrued
ƒ Hypothesis test (large samples)
(A.) H0: p = p0 (B.) $z_{\text{stat}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$ (C.) P-value and interpretation
Optional continuity correction: $z_{\text{stat},c} = \frac{|\hat{p} - p_0| - \frac{1}{2n}}{\sqrt{p_0 q_0 / n}}$
ƒ Hypothesis test (small samples, e.g., fewer than 5 successes)
(A.) H0: p = p0 (B.) Observed number of successes (C.) P-value from “exact” binomial calculations
(computer assisted) and interpretation
ƒ The power of the hypothesis test depends on assumed values for p0, p1, n, and α (p. 368)
ƒ (1 – α)100% confidence intervals for p by “plus-four” method (similar to Wilson’s):
$\tilde{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\tilde{p}\tilde{q}}{\tilde{n}}}$ where $\tilde{p} = \frac{x+2}{n+4}$, $\tilde{q} = 1 - \tilde{p}$, and $\tilde{n} = n + 4$
ƒ With n < 10, use exact binomial procedure (computer) for confidence interval.
ƒ To limit the margin of error (m) when estimating p, use $n = \frac{z_{1-\alpha/2}^2 \, p^* q^*}{m^2}$.
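
A sketch of the large-sample procedures for a single proportion. The counts (30 successes in 100) and p0 = 0.2 are hypothetical. Using p* = 0.5 as the default planning value and rounding n up are assumptions made for this example.

```python
import math
from statistics import NormalDist

def proportion_z_test(x, n, p0):
    """Large-sample z test of H0: p = p0; returns (z_stat, two-sided P-value)."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

def plus_four_interval(x, n, confidence=0.95):
    """Plus-four confidence interval: add 2 successes and 2 failures."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_tilde = n + 4
    p_tilde = (x + 2) / n_tilde
    margin = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - margin, p_tilde + margin

def n_for_proportion_margin(m, p_star=0.5, confidence=0.95):
    """Sample size to limit the margin of error to m (p_star is a planning guess)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p_star * (1 - p_star) / m**2)

z_stat, p_value = proportion_z_test(x=30, n=100, p0=0.2)  # hypothetical counts
lcl, ucl = plus_four_interval(x=30, n=100)
print(z_stat, p_value)
print(lcl, ucl)
print(n_for_proportion_margin(m=0.05))
```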


Chapter 17: Comparing Two Proportions


ƒ Binary response variable, binary explanatory variable (two independent groups)

 | Successes | Failures | Total
Group 1 | a1 | b1 | n1
Group 2 | a2 | b2 | n2
Total | m1 | m2 | N

ƒ $\hat{p}_1 = \frac{a_1}{n_1}$ and $\hat{p}_2 = \frac{a_2}{n_2}$. Sample proportions $\hat{p}_1$ and $\hat{p}_2$ reflect underlying parameters p1 and p2.
ƒ Hypothesis test, large samples:
(A.) H0: p1 = p2
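(B.) $z_{\text{stat}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}\bar{q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$, where $\bar{p} = \frac{a_1 + a_2}{n_1 + n_2}$ is the pooled sample proportion and $\bar{q} = 1 - \bar{p}$ (or chi-square, next chapter)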
(C.) P-value and interpretation
ƒ Hypothesis test, small samples, use Fisher’s test (computer assisted)
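ƒ Risk difference = $\hat{p}_1 - \hat{p}_2$; “excess risk in absolute terms associated with exposure”
(1 – α)100% confidence interval for p1 − p2 by plus-four method:
$(\tilde{p}_1 - \tilde{p}_2) \pm z_{1-\alpha/2} \cdot SE_{\tilde{p}_1 - \tilde{p}_2}$ where $\tilde{p}_i = \frac{a_i + 1}{n_i + 2}$ and $SE_{\tilde{p}_1 - \tilde{p}_2} = \sqrt{\frac{\tilde{p}_1 \tilde{q}_1}{\tilde{n}_1} + \frac{\tilde{p}_2 \tilde{q}_2}{\tilde{n}_2}}$, with $\tilde{q}_i = 1 - \tilde{p}_i$ and $\tilde{n}_i = n_i + 2$
ƒ Relative risk $\widehat{RR} = \frac{\hat{p}_1}{\hat{p}_2}$; “excess risk in relative terms associated with exposure”
(1 − α)100% CI for RR = $e^{\ln \widehat{RR} \pm z_{1-\alpha/2} \cdot SE_{\ln \widehat{RR}}}$ where $SE_{\ln \widehat{RR}} = \sqrt{\frac{1}{a_1} - \frac{1}{n_1} + \frac{1}{a_2} - \frac{1}{n_2}}$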
ƒ Systematic sources of error due to selection bias, information bias, and confounding!
ƒ The power of testing H0: p1 = p2 depends on p1, p2, n1 and n2, and α. Use software to calculate sample
size and power; encourage students to think about underlying “inputs”.
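
The risk difference, relative risk, and the confidence interval for the RR can be sketched as follows. The counts (30 of 100 exposed, 15 of 100 unexposed) are hypothetical.

```python
import math
from statistics import NormalDist

def risk_measures(a1, n1, a2, n2, confidence=0.95):
    """Return (risk difference, relative risk, (LCL, UCL) for the RR)."""
    p1, p2 = a1 / n1, a2 / n2
    rr = p1 / p2
    se_ln_rr = math.sqrt(1 / a1 - 1 / n1 + 1 / a2 - 1 / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # The interval is built on the log scale, then exponentiated
    lcl = math.exp(math.log(rr) - z * se_ln_rr)
    ucl = math.exp(math.log(rr) + z * se_ln_rr)
    return p1 - p2, rr, (lcl, ucl)

rd, rr, (rr_lcl, rr_ucl) = risk_measures(a1=30, n1=100, a2=15, n2=100)
print(rd, rr, rr_lcl, rr_ucl)
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of the two limits rather than their midpoint.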

Chapter 18: Cross-Tabulated Counts


ƒ Understand that data can come from naturalistic, cohort, or case-control samples.
ƒ Cross-tabulate counts from categorical response variable (C columns) and categorical explanatory
variable (R rows). Example of R-by-2 table:

 | Successes | Failures | Total
Group 1 | a1 | b1 | n1
Group 2 | a2 | b2 | n2
⋮ | ⋮ | ⋮ | ⋮
Group R | aR | bR | nR
Total | m1 | m2 | N

ƒ In naturalistic and cohort samples, report incidence (or prevalence) in each group: $\hat{p}_i = \frac{a_i}{n_i}$.
ƒ Characteristics of chi-square probability distributions (e.g., start at 0, asymmetrical, become
increasingly symmetrical as the df increases)
ƒ Hypothesis test for association (large samples)
(A.) H0: no association in population (homogeneity of proportions)

(B.) $X^2_{\text{stat}} = \sum_{\text{all}} \left[ \frac{(O_i - E_i)^2}{E_i} \right]$ where $E_i = \frac{\text{row total} \times \text{column total}}{\text{table total}}$
with df = (R – 1)(C – 1)

(C.) P-value from chi-square table or program and interpretation


ƒ Hypothesis test (small samples): use Fisher’s procedure when more than 20% of expected frequencies
are less than 5 or any expected frequency is less than 1.
ƒ In naturalistic and cohort samples, use risk difference or risk ratio as measure of association.
ƒ Hypothesis test for trend (ordinal explanatory or response variable)
(A.) H0: “no trend in population” (B.) Use program to calculate Mantel trend statistic (C.) P-value and
interpretation
ƒ Case-control sample: population cases and random sample of population non-cases → do not calculate
incidence or prevalence. Calculate the odds ratio as an estimate of the population rate ratio (equivalent to the
risk ratio when the outcome is rare).
$\widehat{OR} = \frac{a_1 / b_1}{a_2 / b_2} = \frac{a_1 b_2}{a_2 b_1}$
(1 – α)100% CI for the OR = $e^{\ln \widehat{OR} \pm z_{1-\alpha/2} \cdot SE_{\ln \widehat{OR}}}$ where $SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{a_1} + \frac{1}{b_1} + \frac{1}{a_2} + \frac{1}{b_2}}$
ƒ Matched-pairs:

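 | Case E+ | Case E−
Control E+ | a | b
Control E− | c | d

$\widehat{OR} = \frac{c}{b}$; (1 – α)100% confidence interval for the OR = $e^{\ln \widehat{OR} \pm z_{1-\alpha/2} \cdot SE_{\ln \widehat{OR}}}$ where $SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{c} + \frac{1}{b}}$
Hypothesis test: (A.) H0: OR = 1 (B.) $z_{\text{stat}} = \frac{c - b}{\sqrt{c + b}}$, equivalently $z^2_{\text{stat}} = \frac{(c - b)^2}{c + b}$ (C.) P-value and interpretation; use exact
binomial procedure when there are 5 or fewer discordant pairs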
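
The cross-tabulation formulas of this chapter can be sketched in Python. The 2-by-2 counts and the discordant-pair counts are hypothetical, and the function names are invented for this example.

```python
import math
from statistics import NormalDist

def chi_square_stat(table):
    """Chi-square statistic and df for an R-by-C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def odds_ratio_ci(a1, b1, a2, b2, confidence=0.95):
    """Odds ratio from independent samples with its confidence interval."""
    or_hat = (a1 / b1) / (a2 / b2)
    se = math.sqrt(1 / a1 + 1 / b1 + 1 / a2 + 1 / b2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return or_hat, math.exp(math.log(or_hat) - z * se), math.exp(math.log(or_hat) + z * se)

def matched_pairs_z(b, c):
    """z statistic for matched pairs, using only the discordant counts b and c."""
    return (c - b) / math.sqrt(c + b)

counts = [[30, 70], [15, 85]]  # hypothetical rows: [successes, failures]
x2, df = chi_square_stat(counts)
or_hat, or_lcl, or_ucl = odds_ratio_ci(30, 70, 15, 85)
print(x2, df)
print(or_hat, or_lcl, or_ucl)
print(matched_pairs_z(b=5, c=15))
```

For a 2-by-2 table the chi-square statistic equals the square of the two-proportion z statistic from Chapter 17, so the two tests give the same P-value.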

Chapter 19: Stratified 2-by-2 Tables


ƒ Methods to mitigate confounding: randomization, restriction, matching, regression, stratification
ƒ Simpson’s paradox is an extreme form of confounding in which the direction of association is reversed
by the confounding factor
ƒ Strata-specific RRs are denoted with subscripts: RR1, RR2, …, RRK
ƒ See if strata-specific RRs provide the same “picture” as the crude RR. If not, this is evidence of
confounding or interaction.
ƒ Heterogeneous strata-specific RRs suggest statistical interaction.
ƒ Chi-square test for interaction. Example considers RRs from two strata:
(A.) H0: RR1 = RR2 (no interaction) (B.) Chi-square interaction statistics (various forms) (C.) P-value
and interpretation
ƒ There are no statistical tests for confounding.
ƒ If there is confounding and no interaction, Mantel-Haenszel procedures are applied to summarize the
RRs and test the association (pp. 468 – 472).
