AP Stats Semester 1 Finals Prep


Unit 1: Exploring One-Variable Data

AP Course and Exam Description NOTES

1.1 Introducing Statistics: What Can We Learn from Data?


EU: Given that variation may be random or not, conclusions are uncertain.
- Numbers may convey meaningful information when placed in context.

1.2 The Language of Variation: Variables

- Variable: a characteristic that changes from one individual to another.
- Categorical variable: takes on values that are category names or group labels.
- Quantitative variable: takes on numerical values for a measured or counted quantity.
    - Discrete variable: takes on a countable number of values. The number of values may be finite or countably infinite, as with the counting numbers.
    - Continuous variable: takes on infinitely many values (values cannot be counted).
        - No matter how small the interval between two values of a continuous variable, it is always possible to determine another value between them.

● 2 types of quantitative variables
    ○ Discrete: countable values (usually counts, so typically no decimals)
    ○ Continuous: measured values that can fall anywhere in an interval (infinitely many possible values)
● Association: knowing the value of one variable helps us predict the other
    ■ “If a student prefers math, then they are more likely to prefer tech.”

1.3 Representing a Categorical Variable with Tables

- Frequency table: gives the number of cases falling into each category.
- Relative frequency table: gives the proportion of cases falling into each category.
- Percentages, relative frequencies, and rates all provide information about proportions.

● Frequency: how many
● Relative frequency: percentage or proportion
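
As a rough illustration of the two kinds of table, here is a minimal Python sketch. The `responses` list is made-up data, and the function names `frequency_table` and `relative_frequency_table` are hypothetical helpers, not part of the course material.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration only).
responses = ["math", "tech", "art", "math", "tech", "math", "art", "math"]

def frequency_table(values):
    """Return {category: count} for a list of categorical values."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Return {category: proportion}; the proportions sum to 1."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

freq = frequency_table(responses)
rel_freq = relative_frequency_table(responses)

# Print category, frequency, and relative frequency side by side.
for category in freq:
    print(f"{category:<6} {freq[category]:>3} {rel_freq[category]:>6.3f}")
```

The relative frequencies are just the frequencies divided by the total number of cases, which is why they can be compared across data sets of different sizes.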

1.4 Representing a Categorical Variable with Graphs

- Bar charts (or bar graphs) are used to display frequencies (counts) or relative frequencies (proportions) for categorical data.
- The height or length of each bar in a bar graph corresponds to either the number or proportion of observations falling within each category.
- Frequency tables, bar graphs, or other representations can be used to compare two or more data sets in terms of the same categorical variable.

● Misleading graphs
    ○ The vertical axis must start at 0.
        ● If not, differences are exaggerated

1.5 Representing a Quantitative Variable with Graphs

[Figures: histogram, stem and leaf plot, dot plot]

- Histogram: the height of each bar shows the number or proportion of observations that fall within the interval corresponding to that bar. Altering the interval widths can change the appearance of the histogram.
    - Outliers depend on the interval setting.
- Stem and leaf plot: each data value is split into a “stem” (the first digit or digits) and a “leaf” (usually the last digit).
    - Make a key (ex. 3|4 = 3.4)
- Dot plot: represents each observation by a dot, with the position on the horizontal axis corresponding to the data value of that observation, with nearly identical values stacked on top of each other.
- Cumulative graph: represents the number or proportion of a data set less than or equal to a given number.

● Histogram → shows general shape
● Dot plot & stem plot → show every value
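
To see how altering the interval width changes what a histogram shows, here is a small Python sketch. The `scores` list is made-up data and `histogram_counts` is a hypothetical helper; intervals are assumed to include their left endpoint and exclude their right endpoint.

```python
def histogram_counts(values, start, width):
    """Count values in intervals [start + k*width, start + (k+1)*width)."""
    counts = {}
    for v in values:
        k = (v - start) // width          # index of the interval containing v
        left = start + k * width          # left endpoint of that interval
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical test scores (made-up data for illustration only).
scores = [52, 55, 61, 63, 64, 68, 71, 72, 75, 77, 78, 83, 85, 91, 98]

# Same data, two different interval widths.
for width in (10, 25):
    print(f"interval width = {width}")
    for left, count in histogram_counts(scores, 50, width).items():
        print(f"  [{left}, {left + width}): {'*' * count}")
```

With a width of 10 the data show a single peak in the 70s; with a width of 25 the same data look nearly flat, so the choice of interval is part of the description.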

1.6 Describing the Distribution of a Quantitative Variable

- Descriptions of the distribution of quantitative data: shape, center, and variability (spread) (+ outliers, gaps, clusters, or multiple peaks)
- Outliers for one-variable data: data points that are unusually small or large relative to the rest of the data.
- Skewed
    - Skewed to the right (positive): if the right tail is longer than the left
    - Skewed to the left (negative): if the left tail is longer than the right
- Symmetric: if the left half is the mirror image of the right half
- Peaks
    - Unimodal: graphs with one main peak
    - Bimodal: graphs with two prominent peaks
    - Uniform: each bar height is almost the same (no prominent peaks)
- A gap is a region of a distribution between two data values where there are no observed data.
- Clusters are concentrations of data usually separated by gaps.
- Descriptive statistics does not attribute properties of a data set to a larger population, but may provide the basis for conjectures for subsequent testing.

Describing a distribution (SOCV + Context)
● Shape
    ○ left/right-skewed, symmetric, unimodal, bimodal, uniform
● Outliers
    ○ If skewed: 1.5 IQR method
        ■ Low: < Q1 − 1.5(IQR)
        ■ High: > Q3 + 1.5(IQR)
    ○ If symmetric: SD method
        ■ 2 or more SD above/below the mean
● Center
    ○ If skewed: median
    ○ If symmetric: mean
● Variability
    ○ Range: max − min
    ○ Standard deviation: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ (for sample data, the formula in 1.7 divides by $n - 1$ instead of $n$)
        ■ “The [context] typically varies by [SD] from the mean of [x̄].”
    ○ Interquartile range (IQR): Q3 − Q1
● Context
    ● Use “ly” words
        ○ Approximately, comparatively
1.7 Summary Statistics for a Quantitative Variable

EU: Graphical representations and statistics allow us to identify and represent key features of data.
- A statistic is a numerical summary of sample data.
- A parameter is a numerical summary of a population.
- Mean: the sum of all the data values divided by the number of values.
    - Sample: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
- Median: the middle value when data are ordered.
    - Even number of data points → any value between the two middle values (usually, the average of the two middle values)
- Q1: the median of the ordered data set from the min to the median
- Q3: the median of the ordered data set from the median to the max
- Q1 and Q3 form the boundaries for the middle 50% of values in an ordered data set.
- The pth percentile is interpreted as the value that has p% of the data less than or equal to it.
- Variability
    - Range: difference between the maximum and minimum data values
    - Interquartile range (IQR): the difference between the third and first quartiles: Q3 − Q1
    - Standard deviation: $s_x = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$
    - Sample variance: $s_x^2$ (square of the standard deviation)
- Changing units of measurement affects the values of the calculated statistics.
- There are many methods for determining outliers. Two methods frequently used in this course are:
    - An outlier is a value greater than 1.5 × IQR above the third quartile or more than 1.5 × IQR below the first quartile.
    - An outlier is a value located 2 or more standard deviations above, or below, the mean.
- The mean, standard deviation, and range are considered nonresistant (or non-robust) because they are influenced by outliers. The median and IQR are considered resistant (or robust), because outliers do not greatly (if at all) affect their value.

● Percentile
    ○ “[Percentage]% of students are at or below [value].”
● Cumulative relative frequency
    ○ Graph reaches 100% at the end
    ○ “[Percentage]% of the [context] had the same or lower [value].”
    ○ Reading quartiles from the graph
        ■ Q1 = 25th percentile
        ■ Median = 50th percentile
        ■ Q3 = 75th percentile
    ○ Steep slope → many values
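
The statistics and the two outlier rules above can be sketched in Python as follows. The `data` list is made up, and `quartiles`, `outliers_iqr`, and `outliers_sd` are hypothetical helper names. The sketch assumes the common classroom convention that, when n is odd, the median itself is left out of both halves when finding Q1 and Q3; calculators and software may use slightly different quartile rules.

```python
from statistics import mean, median, stdev

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when n is odd, the middle value is excluded from both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])

def outliers_iqr(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def outliers_sd(values):
    """Values located 2 or more sample standard deviations from the mean."""
    x_bar, s = mean(values), stdev(values)   # stdev divides by n - 1
    return [v for v in values if abs(v - x_bar) >= 2 * s]

# Hypothetical data set (made up for illustration only).
data = [2, 4, 5, 7, 8, 9, 10, 12, 30]

q1, q3 = quartiles(data)
print("mean:", round(mean(data), 3), "median:", median(data))
print("Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
print("range:", max(data) - min(data), "s_x:", round(stdev(data), 3))
print("outliers (1.5 x IQR):", outliers_iqr(data))
print("outliers (2 SD):", outliers_sd(data))
```

In this data set the value 30 pulls the mean above the median, while the median and IQR barely notice it, which is the resistant versus nonresistant distinction in practice.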

1.8 Graphical Representations of Summary Statistics

- Five-number summary: minimum, Q1, median, Q3, maximum
- Boxplot: a graphical representation of the five-number summary
    - box represents the middle 50% of the data
    - a line at the median
    - ends of the box corresponding to the quartiles
    - Lines (“whiskers”) extend from the quartiles to the most extreme point that is not an outlier
    - outliers are indicated by their own symbol beyond this
- If a distribution is relatively symmetric, mean ≈ median
- If a distribution is skewed right, median < mean
- If a distribution is skewed left, mean < median

● Boxplot
    ○ Need title & number line below the box plot
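
A quick numerical check of the mean versus median relationships, using three small made-up data sets (the lists below are hypothetical and chosen only to have the stated shapes):

```python
from statistics import mean, median

# Hypothetical data sets (made up for illustration only).
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 2, 2, 3, 3, 4, 20]     # long tail toward large values
left_skewed = [1, 17, 18, 18, 19, 19, 20]  # long tail toward small values

for name, values in [("symmetric", symmetric),
                     ("right-skewed", right_skewed),
                     ("left-skewed", left_skewed)]:
    print(f"{name:<13} mean = {mean(values):>5}  median = {median(values):>5}")
```

The mean is pulled toward the long tail in each skewed set, while the median stays near the bulk of the data.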

1.9 Comparing Distributions of a Quantitative Variable


- Any graphical representation (e.g., histograms, side-by-side boxplots) can be used to compare independent samples on center, variability, clusters, gaps, and outliers.
- Any of the numerical summaries (mean, standard deviation, relative frequency, etc.) can be used to compare two or more independent samples.

1.10 The Normal Distribution

EU: The normal distribution can be used to represent some population distributions.

- A normal curve (approximately normal distribution) is mound-shaped and symmetric.
- Its parameters are the population mean (µ) and population standard deviation (σ).
- Empirical Rule
    - 68% of the observations are within 1 standard deviation of the mean
    - 95% of observations are within 2 standard deviations of the mean
    - 99.7% of observations are within 3 standard deviations of the mean
- z-score: measures how many standard deviations a data value is from the mean
    - $z = \frac{x_i - \mu}{\sigma}$
- Percentiles and z-scores may be used to compare relative positions of points within a data set or between data sets.

● z-score: (value − mean) / SD
    ○ “[Context] is [z-score] standard deviations above/below the mean.”
● Linear Transformation of Data

    Operation       Shape   Center   Variability
    Add (+a)        same    +a       same
    Subtract (−a)   same    −a       same
    Multiply (×a)   same    ×a       ×a
    Divide (÷a)     same    ÷a       ÷a
    Standardize     same    0        1

How to find a proportion
1. Find the z-score value (2 decimal places)
2. Draw a normal distribution
    a. N(µ, σ)
3. Use a table / normal CDF
4. Find the proportion (4 decimal places)

How to find a boundary value
1. Find the z-score value (2 decimal places)
2. Draw a normal distribution
    a. N(µ, σ)
3. Use inverse normal CDF
4. Find the boundary
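
The two procedures above can be sketched with Python's standard library, where `NormalDist.cdf` plays the role of the normal CDF and `NormalDist.inv_cdf` the role of the inverse normal CDF. The values µ = 70 and σ = 5 are hypothetical, chosen only for illustration, and `z_score` is a helper name introduced here.

```python
from statistics import NormalDist

# Hypothetical population: N(mu = 70, sigma = 5). Made-up parameters.
mu, sigma = 70.0, 5.0
dist = NormalDist(mu, sigma)

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Finding a proportion: what proportion of values are at or below 80?
z = z_score(80, mu, sigma)
prop_below = dist.cdf(80)
print(f"z = {z:.2f}, proportion at or below 80 = {prop_below:.4f}")

# Finding a boundary value: which value marks the 90th percentile?
boundary = dist.inv_cdf(0.90)
print(f"90th percentile boundary = {boundary:.2f}")

# Empirical rule check: proportion within 1, 2, and 3 SDs of the mean.
within = {k: dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
          for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
```

The last loop shows where the 68%, 95%, and 99.7% figures come from: they are rounded areas under the normal curve.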
