AP Stat Review

This document provides a comprehensive overview of key concepts and strategies for AP Statistics, including test-taking tips, data description, distributions, correlations, studies, and probability. It emphasizes the importance of understanding categorical and quantitative data, sampling methods, and the implications of correlation versus causation. Additionally, it outlines various statistical techniques and formulas essential for analyzing data and conducting experiments.


Note: If you are planning to use this document AFTER Spring 2023, please
make a copy of it before then; I can’t guarantee that it won’t be deleted after
that point.

___ = important for AP, worth memorizing/understanding well

General Test-Taking Tips


●​ MC: Always check for similar answers
●​ FRQ: Memorize AP-formatted responses
●​ Given Formulas (AP) (aka what you don’t have to memorize, but what
you should probably know how to use)
●​ AP Stat Videos
●​ Calculator Guide (anytime calculator use is mentioned/implied in the
following notes, check this document for further details)

Chapter 1: Describing Data

1.1: Categorical Data


●​ categorical: qualitative
●​ quantitative: numerical
●​ bar graphs, NOT histograms
●​ watch the scale!
●​ frequency = count
●​ relative frequency = %

1.2: Displaying Quantitative Data


●​ describing distribution: SOCS (Shape, Outliers, Center, Spread)
●​ mean is affected by skew, median is not
●​ histograms: if comparing distributions w/ diff. sample sizes, use %

1.3: Describing Quantitative Data


● IQR = Q3 − Q1
○​ middle 50% of observations
●​ finding Q1,Q3:
○​ odd: exclude median (Q2)
○​ even: split median
●​ five-number summary (1-Var Stats): min, Q1, Q2, Q3, max
○​ turn into box plot (use TRACE to find outliers)
●​ standard deviation: typical distance from mean
○ sₓ = √( Σ(xᵢ − x̄)² / (n − 1) )
○ variance: sₓ²
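The 1-Var Stats workflow above can be sanity-checked off-calculator. A minimal Python sketch with hypothetical data, assuming numpy is available (note: numpy's percentile interpolates, so its quartiles can differ slightly from the exclude/split-median convention described above):

```python
import numpy as np

data = np.array([4, 7, 8, 8, 10, 12, 13, 15, 22])  # hypothetical sample

q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                                  # IQR = Q3 - Q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)      # 1.5*IQR outlier rule

sx = data.std(ddof=1)                          # sample SD: divide by n - 1
print(data.min(), q1, q2, q3, data.max())      # five-number summary
print(iqr, fences, sx, sx ** 2)                # IQR, fences, s_x, variance
```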

Chapter 2: Distributions

2.1: Describing Locations in a Distribution


● pth percentile: the value with p% of observations below it
● z-score: distance from the mean in standard deviations
○ z = (x − x̄)/sₓ (on formula sheet, but get familiar with this)

● cumulative relative frequency graphs: each point adds the next class's relative frequency to the running total, starting from 0% and ending at 100%


●​ linear transformations
○​ translate → changes center/location, NOT shape/spread
○​ scale → changes everything except shape

2.2: Density Curves and Normal Distributions


●​ density curve: nonnegative y, area of 1
○​ mean and standard deviation are µ, σ
●​ Normal curve: symmetrical curve following the Empirical Rule
(68-95-99.7 Rule), distinguished by µ, σ
● standard Normal distribution: µ = 0, σ = 1
○ map N(µ, σ) to the standard Normal distribution: z = (x − µ)/σ is distributed N(0, 1)
●​ How to assess normality: graphing, Empirical Rule, Normal Probability
Plot (linear → normal, turned right = skewed right)
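The normalcdf/invNorm calculator calls implied above have direct analogs in scipy.stats.norm. A minimal sketch with hypothetical µ and σ, assuming Python with scipy:

```python
from scipy import stats

mu, sigma = 100, 15                 # hypothetical N(mu, sigma)
x = 120

z = (x - mu) / sigma                          # z = (x - mu) / sigma
print(stats.norm.cdf(z))                      # P(Z <= z) on N(0, 1)
print(stats.norm.cdf(x, mu, sigma))           # same area, like normalcdf
print(stats.norm.ppf(0.90, mu, sigma))        # like invNorm(0.90, mu, sigma)
```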

Chapter 3: Correlations

3.1: Scatterplots and Correlation


●​ {(explanatory & response):correlation} as {(independent & dependent):
causation}
●​ correlation does NOT imply causation
●​ describing scatterplot (also interpreting r): direction, form, strength,
outliers
○​ e.g. The relationship between (exp) and (res) has a strong,
positive, linear association. (There are no obvious outliers.)
●​ correlation coefficient: r’s closeness to ± 1 indicates strength, sign
indicates direction
○​ r=0 means no LINEAR relationship
○​ r has no units
○​ r is affected by outliers
●​ linear regression is restricted to domain: cannot extrapolate (predict
extreme points outside of data’s domain)
● ŷ = a + bx (define variables)
○​ The slope b=__ tells us that the (res) is predicted to inc/dec by __
for each additional (unit of exp).
○​ The y-intercept a=__ is the predicted (res) of an (individual) that
has 0 (units of exp).
● residual: y − ŷ (Actual Minus Predicted, where actual = scatterplot points and predicted = regression line; the sum of all residuals is 0)
○​ positive residual: actual>predicted
○​ negative residual: actual<predicted
○​ Least-Squares Regression line: minimize sum of squared residuals

○​ residual plot (x: exp, y: resid): checks if LSR line was a good model
of the data
■​ no pattern → LSR line was a good model of data
○ standard deviation of the residuals (RSE): s=__ tells us that when we try to predict (res) from (exp), we will typically be off by __ (units of res). (same as in 12.1)
● coefficient of determination: r², the correlation squared
○ About (r² · 100)% of the variation in (res) is accounted for by the linear model relating (res) to (exp).

3.2: Least-Squares Regression


○​ Computer output: (see 12.1 for full)
■​ under “coefficient”:
●​ of constant: y-intercept
●​ of (exp): slope
■​ RMS error (RMSE) is standard deviation of residuals (not
technically the same as s in 12.1, aka RSE, but they have
essentially the same interpretation for our purposes)
■ given r², r = ±√(r²) (sign matches the slope b)
○ b = r·(s_y/sₓ)
■ a change of 1 standard deviation in x → a predicted change of r standard deviations in y
●​ LSR line always passes through (𝑥‾, 𝑦‾)
○​ Given all data points, can just find LSR equation (and r) using
LinReg(a+bx) (RegEQ: Y1 if you want to store it as function)
●​ correlation and LSR are NOT resistant to outliers
● ŷ regresses toward the mean (ŷ is closer to ȳ, in standard deviations, than x is to x̄)
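LinReg(a+bx) on the calculator can be mirrored with scipy.stats.linregress, which also makes the b = r·(s_y/sₓ) and sum-of-residuals facts above easy to verify. A minimal sketch with hypothetical data:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6])               # hypothetical explanatory values
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.3])  # hypothetical responses

res = stats.linregress(x, y)                   # least-squares line y-hat = a + bx
print(res.intercept, res.slope)                # a and b
print(res.rvalue, res.rvalue ** 2)             # r and r^2

# b = r * (s_y / s_x), as in the notes
print(np.isclose(res.slope, res.rvalue * y.std(ddof=1) / x.std(ddof=1)))

residuals = y - (res.intercept + res.slope * x)   # actual minus predicted
print(np.isclose(residuals.sum(), 0))             # residuals sum to ~0
```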

Chapter 4: Studies

4.1: Sampling and Surveys


●​ census (data from entire population) is often infeasible → survey sample
instead
●​ random sampling gets rid of personal choice (voluntary response) and
sampling bias (convenience)
○​ biased = consistently either over/underestimates actual value
○​ methods: hat, RNG (just like psych)
○​ SRS of size n: every group of n individuals in population has equal
chance to be selected as sample (also every individual has equal
chance of being selected)
■​ when using table: skip duplicates, since experimental units
have distinct assigned numbers
○​ stratified random sample: given that certain groups have known
differences, classify population into strata (group of similar
individuals), then combine SRSs
■​ if given locations each with vastly different numbers of
people, use stratified random sample, not cluster (since
clusters are treated as individuals)
○​ cluster sample: classify population into clusters (by location), take
SRS of the clusters (cluster = individual)
●​ what random sampling doesn’t get rid of: undercoverage, nonresponse
(post-choosing individuals), response bias (systematically incorrect
responses), wording bias

4.2: Experiments
●​ observational study: no manipulation
●​ experiment: cause/effect
○ controls (does not eliminate, rather attempts to average out differences) confounding: comparison (most important; even a voluntary sample compared with a control group can show a difference), random assignment (to create roughly equivalent groups), control, replication
○ completely randomized design: random assignment
○ double-blind gets rid of experimenter/researcher biases and placebo effect (just like psych)
○​ statistically significant: observed effect is too unlikely to have
occurred by chance (see chapter 9)
●​ {block:experiment} as {strata:observational study} (splitting up based
on preexisting differences)
●​ randomized block design: split into blocks, then repeat experiment in
each block, compare results between blocks
○​ matched pairs: type of randomized block design (pairs=blocks)
■​ 1 unit gets 1st treatment, other gets 2nd (subjectively
matched to be similar, then randomize who gets which
treatment)
■​ pair can also be single individual (randomize order of
treatment)
■​ distribution of differences of each pair (one population, one
list)

4.3: Using Studies Wisely


●​ random sampling allows inference about population (often experiments
don’t have this; results only apply to “subjects like these”)
●​ random assignment allows inference about cause and effect
●​ criteria for causation w/o experiment: replicable association,
chronology, plausibility, etc.

Chapter 5: Probability

5.1: Randomness, Probability, and Simulation


●​ Law of Large Numbers: more repetitions of chance process →
proportion of particular outcome approaches single value (long run)
●​ myths: short-run regularity, law of averages
●​ FRQ answer:
○​ State question
○​ Plan what device and what to record
○​ Do repetitions
○​ Conclude answer based on results

5.2: Probability Rules


●​ probability model: describe chance process with sample space and
probability of each outcome
○​ each probability must be between 0 and 1
○​ sample space’s probability is 1
○ P(A) = (# outcomes corresponding to event A) / (total # outcomes in S)
○ complement rule: P(A) = 1 − P(not A)
●​ general addition rule: 𝑃(𝐴 𝑜𝑟 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 𝑎𝑛𝑑 𝐵)
○​ 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)
■​ or: union of probabilities
■​ and: intersection between two events
○ equivalently, P(A or B) = 1 − P(neither A nor B)

5.3 Conditional Probability and Independence


● conditional probability: probability of an event given the occurrence of another event
● P(A given B) = P(A|B) = P(A∩B)/P(B)
○​ general multiplication rule: 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐵) · 𝑃(𝐴|𝐵)
■​ if A and B are independent, 𝑃(𝐴|𝐵) = 𝑃(𝐴) (use this to
check independence)

● tree diagram: multiply branch probabilities to get end probabilities, which sum to 1
●​ If trials are INDEPENDENT: do NOT skip duplicates when simulating
probability via table: numbers are not distinct
● mutually exclusive events CANNOT be independent, since one event's occurrence guarantees the other's absence (it makes the other event's probability 0)
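The conditional probability formula and independence check above can be worked through numerically. A minimal sketch with hypothetical joint probabilities (any four-cell table summing to 1 would do):

```python
# joint probabilities for two events (hypothetical four-cell table)
p_A_and_B = 0.12
p_A_and_notB = 0.28
p_notA_and_B = 0.18
p_notA_and_notB = 0.42          # the four cells sum to 1

p_A = p_A_and_B + p_A_and_notB          # 0.40
p_B = p_A_and_B + p_notA_and_B          # 0.30

p_A_given_B = p_A_and_B / p_B           # P(A|B) = P(A and B) / P(B)
print(p_A_given_B)                      # 0.40 = P(A) -> A and B independent
```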

Chapter 6: Random Variables

6.1: Discrete and Continuous Random Variables


● discrete random variable X (let xᵢ be the values and pᵢ the probabilities)
○ µ_X = expected value = Σxᵢpᵢ
○ σ_X² = Σ(xᵢ − µ_X)²pᵢ, so σ_X = √( Σ(xᵢ − µ_X)²pᵢ )
●​ continuous random variable: no individual outcomes, only intervals
(areas under density curves)
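The expected value and standard deviation formulas above are just weighted sums; a minimal Python sketch with a hypothetical distribution, assuming numpy:

```python
import numpy as np

x = np.array([0, 1, 2, 3])              # hypothetical values of X
p = np.array([0.1, 0.2, 0.4, 0.3])      # their probabilities (sum to 1)

mu = (x * p).sum()                      # mu_X = sum of x_i * p_i
var = ((x - mu) ** 2 * p).sum()         # sigma_X^2 = sum of (x_i - mu_X)^2 * p_i
print(mu, np.sqrt(var))                 # expected value and sigma_X
```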

6.2: Transforming and Combining Random Variables


●​ see 2.1 for linear transformations
○​ standard deviation is never negative
●​ combining random variables: T=X+Y (if D, just flip sign)
○ µ_T = µ_X + µ_Y (likewise, µ_D = µ_X − µ_Y)
○​ range of T/D: sum of ranges
○​ IF INDEPENDENT: variance of T/D is sum of variances
■ apply scaling coefficients to standard deviations before squaring: Var(aX) = a²·Var(X); not the same as a repeated independent trials, where variances (not squared coefficients) add: Var(X₁ + … + Xₐ) = a·Var(X)
■​ NEVER add standard deviations
○​ If X & Y are Normal, T/D also Normal

● observation: for means/proportions, calculating a probability via the sampling distribution yields the same results as calculating the probability via summing variables

6.3: Binomial and Geometric Random Variables


●​ observation: for proportions, calculating a probability via a sampling
distribution yields same results as binomcdf
●​ Binomial random variable X: counts of SUCCESSES
○​ Binary: “success” or “failure”
○​ Independent trials
○​ Number of trials n is fixed
○​ Success has constant probability p in all trials
○​ binomial distribution has parameters n & p
○ P(X = k) = (n choose k)·pᵏ(1 − p)ⁿ⁻ᵏ → binompdf
■​ binomial coefficient accounts for arrangements
○​ 𝑃(𝑋 ≤ 𝑘) → binomcdf (area of interest like normalcdf)
■​ If binomcdf doesn’t get an MC answer, try normalcdf (Large
Counts)
○ µ_X = np
○ σ_X = √( np(1 − p) )
○​ SRS lacks replacement: not independent trials
■ 10% Condition: the binomial distribution can approximately model counts of successes when n ≤ (1/10)N (population is much larger than sample)
○​ Large Counts Condition: When 𝑛𝑝 ≥ 10, 𝑛(1 − 𝑝) ≥ 10 (at least
10 expected successes and failures each; recall 6.1), the
distribution of X is approximately Normal
●​ Geometric random variable Y: counts of TRIALS taken to get ONE
success
○​ BINS without N
○ geometric distribution has parameter p

■ histogram: exponential decay
○ P(Y = k) = (1 − p)ᵏ⁻¹p → geometpdf
■ only one arrangement
○ P(Y ≤ k) → geometcdf (area of interest like normalcdf)
○ µ_Y = 1/p
○ you don't need to find the standard deviation
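The binompdf/binomcdf and geometpdf/geometcdf commands above have scipy.stats analogs (scipy's geom is supported on k = 1, 2, …, matching the AP definition). A minimal sketch with hypothetical n and p:

```python
from scipy import stats

n, p = 10, 0.3                           # hypothetical parameters
print(stats.binom.pmf(4, n, p))          # P(X = 4), like binompdf(10, 0.3, 4)
print(stats.binom.cdf(4, n, p))          # P(X <= 4), like binomcdf(10, 0.3, 4)
print(n * p, (n * p * (1 - p)) ** 0.5)   # mu_X = np, sigma_X = sqrt(np(1 - p))

print(stats.geom.pmf(3, p))              # P(Y = 3) = (1 - p)^2 * p, like geometpdf
print(stats.geom.cdf(3, p))              # P(Y <= 3), like geometcdf
print(1 / p)                             # mu_Y = 1/p
```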

Chapter 7: Sampling Distributions

7.1: Overview
● Parameter (µ, p) describes the Population; Statistic (x̄, p̂) describes the Sample
○ the statistic (x̄, p̂) estimates the corresponding parameter (µ, p)
○​ unbiased estimator: statistic whose mean is equal to value of
parameter being estimated (does NOT consistently either
over/underestimate value of parameter)
●​ take every possible sample of size n & graph some statistic → sampling
distribution: frequency distribution of ALL possible values of a statistic
(we can only approximate)

7.2: Sample Proportions


● µ_p̂ = p
● σ_p̂ = √( p(1 − p)/n ) IF AND ONLY IF 10% Condition (review 6.3)
○ often can assume 10% Condition: "Assuming the population size is at least __…"
○ regardless of population size, the same sample size gives the same variability (assuming the distributions have equal p)
●​ Sampling distribution of 𝑝 can be approximated as Normal IF AND ONLY
IF Large Counts Condition (review 6.3)

7.3: Sample Means


● µ_x̄ = µ
● σ_x̄ = σ/√n IF AND ONLY IF 10% Condition (see 6.3 and 7.2)
● ways to check whether the sampling distribution of x̄ can be treated as Normal:
1. Population distribution is Normal → sampling distribution of x̄ is Normal
2. Central Limit Theorem: even if the population distribution isn't Normal, the sampling distribution of x̄ will be APPROXIMATELY Normal if n ≥ 30 (arbitrary "large enough")
●​ also see 8.3 (rough guess using sample data; for inference only)
●​ cannot calculate probabilities if sampling distribution isn’t Normal
● if a claim asserts some probability for some interval of the parameter, but the sampling distribution gives that interval a very small probability, then it is unlikely that the observed result is due to chance alone, so the claim is likely false (precursor to significance tests)
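The two facts above (µ_x̄ = µ, σ_x̄ = σ/√n) and the Central Limit Theorem can be seen directly by simulation. A minimal sketch with a hypothetical skewed population, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
# clearly non-Normal (right-skewed) hypothetical population
population = rng.exponential(scale=2.0, size=100_000)

n = 30                                        # the usual "large enough" cutoff
xbars = [rng.choice(population, n).mean() for _ in range(5_000)]

print(population.mean(), np.mean(xbars))             # mu_xbar ~ mu
print(population.std() / np.sqrt(n), np.std(xbars))  # sigma_xbar ~ sigma / sqrt(n)
```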

Chapter 8: Confidence Intervals (lays foundation for the next chapters)

8.1: Overview
●​ confidence interval: estimate population parameter from sample
statistic
● C% confidence interval: statistic (unbiased point estimator) ± margin of error, where margin of error = critical value · standard deviation of the statistic (if that standard deviation involves unknown parameters, use the standard error as an estimate)
○​ critical value depends on confidence level
○ the confidence level does NOT tell the chance that a particular interval captures the parameter; the parameter is already fixed, so once you take a sample, the interval either contains or does not contain the parameter

■ Instead (interpretation): "The interval was constructed using a method that produces intervals that capture the true (parameter) in C% of all possible samples of size (n)."

8.2: Estimating a Population Proportion


FRQ Response: (bullet points indicate notes, not things you need to write;
memorize everything below)
p̂ = [calculate]
[State] We want to estimate p, the proportion of (population) that…[define]
[Plan] If conditions are met, we will construct a C% one-sample z-interval for
p.
Random: random sample of ___ (individuals)
10%: We assume there are at least 10n (individuals) in the population.
Normal (Large Counts): np̂ = ... ≥ 10, n(1 − p̂) = ... ≥ 10
●​ at least 10 expected successes and failures (recall 6.1)
[Do: either manual or technology]
1. p̂ ± z*·√( p̂(1 − p̂)/n ) → (__, ___)
● √( p̂(1 − p̂)/n ) is the standard error, since we don't know p
○ interpreting standard error: "If we take many samples of size n from the population, the (sample statistic) will typically differ from the (parameter) by (standard error)"
● z* = |invNorm(area: (1 − C/100)/2, µ: 0, σ: 1)| (state using technology and write out the input)
OR
2. Using technology, 1-PropZInt(...) → (__, ___) (a Python analog is sketched at the end of this section)
[Conclude] We are C% confident that the interval from ___ to ___ captures the
true (reiterate definition of p).

❖ Choosing sample size to bound the margin of error: to guess p̂, use a past study if given, otherwise use p̂ = 0.5 (maximizes the margin of error)
➢ solve z*·√( p̂(1 − p̂)/n ) ≤ max ME and return ⌈n⌉ (nearest greater integer)
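As promised above, a minimal Python analog of the 1-PropZInt computation (hypothetical counts; assuming scipy):

```python
from scipy import stats

x, n, C = 210, 500, 95                 # hypothetical successes, size, level
p_hat = x / n

print(n * p_hat >= 10, n * (1 - p_hat) >= 10)      # Large Counts check

z_star = abs(stats.norm.ppf((1 - C / 100) / 2))    # like |invNorm(...)|
se = (p_hat * (1 - p_hat) / n) ** 0.5              # standard error of p-hat
me = z_star * se                                   # margin of error
print(p_hat - me, p_hat + me)                      # like 1-PropZInt
```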

8.3: Estimating a Population Mean


●​ for means, we almost always use t and t* instead of z and z* (except
choosing sample size)
○​ t distribution with degrees of freedom df=n-1
○​ as df increases, t distribution narrows toward Normality
●​ same outline as 8.2, except:
○​ procedure: C% one-sample t-interval for µ
○​ Normal (Large Sample):
1.​ The population distribution is Normal (thus sampling
distribution is approx. Normal)
2.​ 𝑛 ≥ 30 (Central Limit Theorem)
3.​ (if population distribution shape unknown and n is less than
30) sketch graphs of sample data; can proceed if no strong
skewness nor outliers
a.​ boxplot for skew/outliers
b.​ Normal probability plot
● Do:
1. x̄ ± t*·(sₓ/√n), where t* = |invT(area: (1 − C/100)/2, df: n − 1)|
2.​ technology alternative: TInterval
● Choosing sample size to bound the margin of error: use a given past study to guess σ
○ solve z*·(σ/√n) ≤ max ME and return ⌈n⌉ (nearest greater integer)
■ note that z* is used instead of t*: since you don't know the sample size yet, use the critical value corresponding to the maximum possible sample size; t* with df = ∞ is just z*
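A minimal Python analog of the TInterval computation above (hypothetical data; assuming numpy/scipy):

```python
import numpy as np
from scipy import stats

data = np.array([12.1, 14.3, 11.8, 13.5, 12.9, 15.2, 13.1, 12.4])  # hypothetical
n, C = len(data), 95

xbar, sx = data.mean(), data.std(ddof=1)
t_star = abs(stats.t.ppf((1 - C / 100) / 2, df=n - 1))   # like |invT(...)|
me = t_star * sx / np.sqrt(n)                            # margin of error
print(xbar - me, xbar + me)                              # like TInterval
```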

Chapter 9: Significance Tests (lays foundation for the next chapters)

9.1: Overview
●​ significance test: use sample statistic to decide between competing
hypotheses about population parameter
○​ null hypothesis (H0): original claim (no-change claim)
■​ parameter = value
○​ alternative hypothesis (Ha): claim to try to prove (aka find
convincing evidence for)
■​ parameter (<, >, ≠) value
●​ ≠ is a two-sided test
● P-value: assuming H0 were true, the probability that the statistic would take a value at least as extreme as the observed one (in the direction(s) of Ha)
○​ think proof by contradiction: getting a very unlikely (in the
direction(s) of Ha) statistic according to H0 suggests that Ha is true
○​ P-value < α (significance level): statistically significant at level α
■​ reject H0, convincing evidence of Ha (otherwise fail to reject
H0, no convincing evidence of Ha)
■​ if α not given, set α = 0. 05
■​ draw appropriate distribution alongside calculation
○​ interpretation: “Assuming (H0), there would be a (P-value)
probability of getting a (statistic definition) as (extreme or more
extreme; specify direction) than (sample statistic value).”
●​ Type I error: rejected H0 when H0 was actually true
○​ significance level α = 𝑃(𝑇𝑦𝑝𝑒 𝐼) (= 100% - Confidence%)
●​ Type II error: failed to reject H0 when Ha was actually true (H0 was
actually false)

9.2: Proportion
FRQ Response: (memorize entire thing below)
[State] We want to test the following hypotheses at the 5% significance level.

H0: p = (p0)
Ha: p ≠ (p0) (operator depends on problem)
where p is the proportion…[define]
[Plan] If conditions are met, we will conduct a one-sample z-test for p at the
α = 0. 05 level.
Conditions: see 8.2
[Do: either manual or technology]
1. p̂ = ..., test statistic z = (p̂ − p₀)/√( p₀(1 − p₀)/n ), then using technology, P-value = normalcdf(...) TIMES 2 FOR 2-SIDED (draw the Normal distribution with the area of interest and the z value; a Python analog is sketched at the end of this section)
2.​ Using technology, 1-PropZTest(...) → report z and P-value along with
same sketch
[Conclude] Since the P-value = __ < α = 0. 05, we reject H0 and have
convincing evidence that (restate Ha). (alternatively, if P-value > α, fail to
reject H0 and do not have convincing evidence of Ha)

❖ note that a confidence interval is more descriptive than a 2-sided significance test, since you also get an interval of plausible values (reject values outside of the confidence interval)
➢​significance level % = 100% - confidence level %
❖​effect size: distance between actual proportion and null proportion
❖​power: probability of correctly rejecting H0 (1-P(Type II))
➢​increasing sample size, significance level, and effect size increase
power
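As promised above, a minimal Python analog of the 1-PropZTest computation (hypothetical counts and null value; assuming scipy):

```python
from scipy import stats

x, n, p0 = 58, 200, 0.25                 # hypothetical data; H0: p = 0.25
p_hat = x / n

# note: the null value p0, not p-hat, goes in the standard deviation
z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5
p_value = 2 * stats.norm.sf(abs(z))      # TIMES 2 for a 2-sided test

print(z, p_value)                        # like 1-PropZTest; compare to alpha
```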

9.3: Mean
FRQ Response:
●​ same outline as 9.2, except:
○​ procedure: one-sample t-test for µ at the α = 0. 05 level
○​ Conditions: same as 8.3
○ Do (2 ways):
1. x̄ = ..., sₓ = ..., df = n − 1, test statistic t = (x̄ − µ₀)/(sₓ/√n), then using technology, P-value = tcdf(...) TIMES 2 FOR 2-SIDED (draw the t distribution with the area of interest and the t value; a Python analog is sketched at the end of this section)
2.​ Using technology, T-Test(...) → report df, t, and P-value along
with same sketch
●​ when using paired data from matched pairs design of experiment:
inferring about µ𝑑 = µ1 − µ2: true mean difference between (subjects)
who receive (treatment 1) and those who receive (treatment 2) in
(response) for subjects like these [be consistent with order]
○​ procedure: paired t-test for µ𝑑 at the α = 0. 05 level
○​ Random: random assignment to treatments (list treatments)
○​ (10% is not relevant within experiment)
○​ everything else same as for regular means
●​ don’t chase significance via multiple tests: with more tests, the
probability of at least one Type I error increases
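As promised above, minimal Python analogs of T-Test and the paired t-test (hypothetical data; assuming numpy/scipy):

```python
import numpy as np
from scipy import stats

data = np.array([9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.7, 10.6])  # hypothetical
t, p_value = stats.ttest_1samp(data, 10)     # H0: mu = 10; 2-sided, like T-Test
print(t, p_value, len(data) - 1)             # t, P-value, df = n - 1

# matched pairs: test the differences of each pair against 0
before = np.array([12.0, 11.5, 13.2, 12.8])  # hypothetical paired data
after = np.array([11.1, 11.7, 12.0, 12.1])
print(stats.ttest_rel(before, after))        # paired t-test for mu_d
```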

Chapter 10: Comparing Two Populations

10.1: Proportions
● sampling distribution of p̂₁ − p̂₂ has mean p₁ − p₂ and standard deviation √( sum of the variances of each sampling distribution ) (see 6.2 and 7.2)
○​ make sure you are consistent with the order
●​ for any inference, conditions apply to both populations (see 8.2)
● confidence interval:
○ procedure: two-sample z-interval for p₁ − p₂
○ Do (2 ways):
1. (p̂₁ − p̂₂) ± z*·√( p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ ) (see 8.1)
2. Using technology, 2-PropZInt(...)

● significance test:
○ H0: p₁ − p₂ = 0 (aka p₁ = p₂)
○ procedure: two-sample z-test for p₁ − p₂
○ Do (2 ways):
1. p̂_c = (total successes)/(total observations) (pooled proportion), test statistic z = ((p̂₁ − p̂₂) − 0)/√( p̂_c(1 − p̂_c)/n₁ + p̂_c(1 − p̂_c)/n₂ ), then using technology, P-value = normalcdf(...) TIMES 2 FOR 2-SIDED (draw the Normal distribution with the area of interest and the z value)
● (p₁ − p₂)₀ is almost always 0
2. Using technology, 2-PropZTest (report z and P-value along with the sketch)
●​ If data from experiment, 10% is not relevant (see 8.3)
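A minimal Python analog of the pooled 2-PropZTest computation above (hypothetical counts; assuming scipy):

```python
from scipy import stats

x1, n1, x2, n2 = 44, 100, 56, 120        # hypothetical successes / sizes
p1_hat, p2_hat = x1 / n1, x2 / n2

p_c = (x1 + x2) / (n1 + n2)              # pooled proportion (test only)
se = (p_c * (1 - p_c) * (1 / n1 + 1 / n2)) ** 0.5
z = ((p1_hat - p2_hat) - 0) / se
p_value = 2 * stats.norm.sf(abs(z))      # TIMES 2 for a 2-sided test
print(z, p_value)                        # like 2-PropZTest
```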

10.2: Means
● sampling distribution of x̄₁ − x̄₂ has mean µ₁ − µ₂ and standard deviation √( sum of the variances of each sampling distribution ) (see 6.2 and 7.3)
○ make sure you are consistent with the order
●​ for any inference, conditions apply to both populations (see 8.3)
● confidence interval:
○ procedure: two-sample t-interval for µ₁ − µ₂
○ Do (2 ways):
1. (x̄₁ − x̄₂) ± t*·√( sₓ₁²/n₁ + sₓ₂²/n₂ ) (see 8.1)
2. Using technology, 2-SampTInt(...)

● significance test:
○ H0: µ₁ − µ₂ = 0 (aka µ₁ = µ₂)
○ procedure: two-sample t-test for µ₁ − µ₂
○ Do (2 ways):
1. test statistic t = ((x̄₁ − x̄₂) − 0)/√( sₓ₁²/n₁ + sₓ₂²/n₂ ), then using technology, P-value = tcdf(...) TIMES 2 FOR 2-SIDED (draw the t distribution with the area of interest and the t value)
● (µ₁ − µ₂)₀ is almost always 0
● df is the smaller of (n₁ − 1) and (n₂ − 1) (conservative guess)
2. Using technology, 2-SampTTest (report df, t, and P-value along with the sketch)
●​ NEVER use “Pooled” for means
●​ If data from experiment, 10% is not relevant (see 8.3)
○​ can use two-sample t-procedures in randomized experiment, but
NOT in matched pairs design (paired t-test)
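A minimal Python analog of the 2-SampTTest computation above (hypothetical data; assuming numpy/scipy):

```python
import numpy as np
from scipy import stats

g1 = np.array([23.1, 25.4, 22.8, 24.9, 23.6, 26.0])   # hypothetical group 1
g2 = np.array([21.0, 22.3, 20.8, 23.1, 21.7, 22.5])   # hypothetical group 2

# equal_var=False gives the unpooled statistic from the notes ("NEVER pool");
# scipy's df comes from the same formula 2-SampTTest uses, not the
# conservative min(n1 - 1, n2 - 1) guess
t, p_value = stats.ttest_ind(g1, g2, equal_var=False)
print(t, p_value)
```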

Chapter 11: Chi-Square Tests

11.1: Chi-Square Test for Goodness of Fit


●​ ONE categorical variable, ONE population (one-way table): want to
check likelihood of a claimed distribution of categories → chi-square test
for goodness of fit
●​ H0: (claimed distribution) is correct. (could list out proportions instead,
but then would have to define)
●​ Ha: (claimed distribution) is incorrect. (at least two of the pi's are
incorrect)
●​ (third condition; other two are same as usual) Large Counts: The
smallest expected count is ___ ≥ 5, thus all expected counts are at least
5. (calculate expected counts like regular Large Counts in 8.2)
● if H0 is true, χ² should be close to 0
● Do (2 ways):
○ df = # categories − 1 (larger df = flatter χ² distribution)
1. chi-square statistic χ² = Σ (observed − expected)²/expected, then P-value = χ²cdf
a. draw the χ² distribution with all three statistics
2. Using technology, χ²GOF-Test (report df, χ², and P-value, and draw the χ² distribution with all three statistics)
● with either method, it is suggested to write out the formula for χ² with a few components and … (partial credit in case of error)
●​ follow-up analysis (if asked): CNTRB list → “The largest contributors to
the statistic are (category 1) (component value) and (category 2)
(component value). There are fewer/more (category 1) and fewer/more
(category 2) than expected.” (must look back at lists to see if
fewer/more than expected since all components are positive)
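A minimal Python analog of the χ²GOF-Test computation above (hypothetical counts and claimed distribution; assuming numpy/scipy):

```python
import numpy as np
from scipy import stats

observed = np.array([43, 52, 38, 67])            # hypothetical one-way counts
claimed = np.array([0.25, 0.25, 0.25, 0.25])     # H0: claimed distribution
expected = observed.sum() * claimed

print(expected.min() >= 5)                       # Large Counts check
chi2, p_value = stats.chisquare(observed, expected)  # df = # categories - 1
print(chi2, p_value)                             # like X2 GOF-Test
```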

11.2: Inference for Two-Way Tables


●​ If necessary (no output given), input data into matrix (see calculator
guide)
●​ compare distribution of ONE categorical variable for MULTIPLE
populations/treatments: chi-square test for homogeneity
○​ H0: There is no difference in the distribution of (categorical var)
for (populations/treatments).
○​ Ha: There is a difference in the distribution of (categorical var) for
(populations/treatments) (aka at least two pi’s are different)
○ Conditions: like 11.1, but expected count = (row total)(column total)/(table total) (use χ²-Test to generate the matrix of expected counts)
○ Do (2 ways)
■ df = (# rows − 1)·(# columns − 1)
1. same manual method as in 11.1
2. Using technology, χ²-Test (report df, χ², and P-value, and draw the χ² distribution with all three statistics)
○ with either method, it is suggested to write out the formula for χ² with a few components and … (partial credit in case of error)
● follow-up analysis (if asked): same as 11.1 but name cells (by their two labels); usually given computer output (otherwise, the manual method is χ²-Test → manually plug the expected counts from the matrix into a list)
●​ convincing evidence of association between TWO categorical variables
within ONE population: chi-square test for independence
○​ H0: There is no association between (categorical var 1) and
(categorical var 2).
○​ Ha: There is an association between (categorical var 1) and
(categorical var 2).
○​ everything else is the same as homogeneity (except for conclusion
wording, of course)
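A minimal Python analog of the χ²-Test on a matrix, covering both homogeneity and independence (hypothetical two-way counts; assuming numpy/scipy):

```python
import numpy as np
from scipy import stats

# hypothetical two-way table: rows = populations, columns = categories
table = np.array([[30, 45, 25],
                  [40, 35, 25]])

chi2, p_value, df, expected = stats.chi2_contingency(table, correction=False)
print(df)              # (# rows - 1) * (# columns - 1) = 2
print(expected)        # (row total)(column total) / (table total), per cell
print(chi2, p_value)   # like X2-Test
```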

Chapter 12: Inference for Linear Regression

12.1: Intervals and Tests for Linear Regression


● population regression line: y = α + βx
● sample regression line: ŷ = a + bx
○ mean of the sampling distribution of b: µ_b = β
○ standard deviation of the sampling distribution of b (10%): σ_b = σ/(σₓ√n)

●​ inference
○​ Conditions: LINER
■​ Linear: linear association (see 3.1), no leftover pattern in
residual plot
■​ Independent: observations are independent (one
individual’s response does not affect another’s), 10%
Condition

■ Normal: graphs (box plot, Normal probability plot, histogram) of the residuals (see 8.3) should not show any strong skewness/outliers
■​ Equal SD (standard deviation): Scatter of points around
residual=0 line is roughly the same for all (x) values
■ Random: random sample or randomized experiment
○ SE_b = s/(sₓ√(n − 1)) (this standard error formula is on the formula sheet, but it is unlikely you'll ever need to use it anyway)
○ df = n − 2 (weird)
○ confidence interval (C% t-interval for the slope): b ± t*·SE_b
■​ State: “We want to estimate the slope β of the true
population regression line relating (y; response var) to (x;
explanatory var).”
■​ Do: use computer output (see below)
●​ otherwise use LinRegTInt (create RESID list to check
conditions/graph)
○ significance test (t-test for the slope at the α = ... significance level)
■ test statistic t = (b − β₀)/SE_b (where H0: β = β₀)

■ prompted by "Do the data provide convincing evidence of any/positive/negative relationship?" (the null hypothesis value is then 0)
■​ Do: use computer output (see below), report df, t, P-value
●​ otherwise use LinRegTest

○ COMPUTER OUTPUT FORMAT: (GET USED TO IT!!!)

y (response) vs x (explanatory)

Predictor        Coef.   SE Coef.            T                  P
Constant         a       (ignore the rest of this row)
x (explanatory)  b       SE_b (standard      sample test        2-sided P-value
                         error of b)         statistic t        (divide by 2 if 1-sided)

S = s (standard deviation of residuals)   R-Sq = r²   (nothing else is important)

● given r², r = ±√(r²) (sign matches the slope b)
● interpretations:
○ SE_b: "If we take many identical samples of size n, the sample slope will typically differ from the true slope by (SE_b)." (b vs β)
○ s: "The actual (response) will typically differ from the value predicted by the regression line by (s)." (y vs ŷ; same as in 3.1)
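The slope inference above can be mirrored with scipy.stats.linregress, whose stderr attribute is exactly SE_b. A minimal sketch with hypothetical data:

```python
import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 8, 10, 12])                  # hypothetical data
y = np.array([3.9, 6.8, 8.4, 10.9, 12.1, 15.2, 17.8])

res = stats.linregress(x, y)
df = len(x) - 2                              # df = n - 2 for slope inference

t = (res.slope - 0) / res.stderr             # test statistic for H0: beta = 0
p_value = 2 * stats.t.sf(abs(t), df)         # matches res.pvalue (2-sided)

t_star = abs(stats.t.ppf(0.025, df))         # 95% t-interval for the slope
print(res.slope - t_star * res.stderr, res.slope + t_star * res.stderr)
print(t, p_value, res.pvalue)
```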

12.2: Linearization via Transformations


● linearize a curved relationship to then predict/test using LSR (remember to include the transformation in the equation)
● power model: original relationship is y = C + a·xᵖ
○ sometimes use intuition (educated guess)
■ e.g. area and radius of a circle: Â = # + #r² (you don't actually have to solve for the coefficient/constant; just apply the transformation to the list and then do LinReg)
○ can instead apply the inverse function to the other variable (same effect)
■ e.g. √Â = # + #r in the above example
○ foolproof (no guessing): apply the same base logarithm to both variables (usually log (base 10) or ln (base e))
■ the transformed equation has the form log(ŷ) = # + # log(x)
● exponential model: original relationship is y = C + a·bˣ
○ apply the logarithm to y only: log(ŷ) = # + #x
● What if both methods (log both; log y only) result in a linear graph?
○ use whichever residual plot has more random scatter
■ if the residual plots are roughly the same, check for a smaller s or a bigger r²
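A minimal Python sketch of the comparison above, fitting both transformations to hypothetical power-model data and checking which comes out straighter (assuming numpy/scipy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8])            # hypothetical curved data
y = 3.0 * x ** 1.8 * np.exp(rng.normal(0, 0.05, 8))

power = stats.linregress(np.log10(x), np.log10(y))  # log both: power model
expo = stats.linregress(x, np.log10(y))             # log y only: exponential

print(power.rvalue ** 2, expo.rvalue ** 2)   # pick the straighter fit
# here the power fit wins: log(y-hat) = a + b*log(x), so y-hat = 10^a * x^b
print(power.intercept, power.slope)          # slope ~ 1.8, the power p
```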
