Statistics Training For Math Tutors
This training course is designed to equip math tutors with essential skills and insights
for effectively assisting students in statistics. By participating, you will enhance your
understanding and prepare for the upcoming semester, ensuring you can provide
valuable support to your students in their statistical studies.
Welcome
What is Statistics?
Types of Data
Descriptive Statistics
Confidence Intervals
Hypothesis Testing
Linear Regression
The TI 84 Calculator
The End
Lesson 1 of 11
Welcome
Heather Waugh
Lesson 2 of 11
What is Statistics?
Introduction to Statistics
Statistics is the science of collecting, analyzing, and interpreting data to uncover
patterns and insights. It plays a crucial role in decision-making across various fields,
from education to business and healthcare. By understanding statistics, we can make
informed predictions and solve real-world problems effectively.
At its core, statistics helps us make sense of complex information by organizing it into
meaningful formats. This process involves summarizing data, identifying trends, and
drawing conclusions based on evidence. Whether working with large datasets or simple
surveys, statistics provides the tools needed to approach challenges with clarity and
precision.
Lesson 3 of 11
Types of Data
Qualitative vs Quantitative
As a math tutor, you'll often help students identify and work with different types of data.
Knowing whether data is qualitative or quantitative is the first step to understanding
what statistical tools to use. Let’s break it down together.
Sort each example into the correct category:
Quantitative: temperature of coffee in Fahrenheit, height in centimeters, revenue in dollars
Qualitative: zip code, type of car, letter grades, eye color, ice cream flavor
Lesson 4 of 11
Understanding Histograms
Histograms are ideal for visualizing the distribution of numerical data. They group data into
intervals, or bins, and display the frequency of data points within each bin using touching bars.
This type of graph is particularly useful for identifying patterns such as skewness, peaks, or gaps
in the data. By analyzing a histogram, you can gain insights into the spread and central tendency
of your dataset.
Exploring Boxplots
Boxplots, also known as box-and-whisker plots, are used to summarize the spread, center, and
outliers of a dataset. They display the median, quartiles, and potential outliers in a compact visual
format.
This graph type is especially helpful for comparing distributions across multiple groups. It allows
you to quickly identify variability and detect any anomalies in the data.
When to Use Bar Graphs
Bar graphs are best suited for comparing categories of qualitative data. Unlike histograms, the
bars in a bar graph do not touch, emphasizing the distinction between categories.
These graphs are effective for showcasing differences or trends across groups, making them a
popular choice for presenting survey results or categorical comparisons. They are simple to
interpret and widely used in various fields.
Pie Charts Explained
Pie charts are used to represent parts of a whole, with each slice corresponding to a category's
percentage. They are most effective when there are a limited number of categories to display.
This type of graph provides a clear visual representation of proportions, making it easy to
compare the relative sizes of different segments. However, they are less effective for precise
comparisons between categories.
Stem and Leaf Plot
Stem and leaf plots are used to organize numerical data while retaining the original data values.
They separate each data point into a stem (representing the leading digits) and a leaf
(representing the trailing digit).
This type of plot is particularly useful for small datasets, as it provides a quick way to visualize the
shape and distribution of the data. It also allows for easy identification of clusters, gaps, and
outliers.
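The stem-and-leaf idea described above can be sketched in a few lines of Python. This is only an illustration: the function name stem_and_leaf and the quiz scores are hypothetical, and the sketch assumes non-negative whole numbers, with the last digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Split each whole number into a stem (leading digits) and a leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 83 -> stem 8, leaf 3
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical quiz scores, used only for illustration
scores = [67, 72, 75, 78, 81, 83, 83, 89, 94]
plot = stem_and_leaf(scores)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Each printed row keeps the original values readable (the row for stem 8 contains the leaves 1, 3, 3, and 9), which is what sets this plot apart from a histogram.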
Which type of graph is best suited for visualizing the distribution of numerical
data?
Boxplot
Pie Chart
Histogram
Bar Graph
Lesson 5 of 11
Descriptive Statistics
Statistical Symbols
Understanding statistical symbols is essential for students as they form the foundation
of effective data analysis and interpretation. These symbols represent key concepts that
enable students to communicate complex ideas succinctly and accurately. By mastering
these symbols, students can enhance their ability to summarize data, draw meaningful
conclusions, and apply statistical methods in real-world scenarios. This knowledge not
only boosts their confidence in handling data but also prepares them for advanced
studies and professional applications in various fields.
Sample Proportion: p̂
Population Proportion: p
Introduction to Statistical Averages
Understanding how to calculate the mean, median, and mode is essential for analyzing
data effectively. These measures of central tendency provide insights into the distribution
and characteristics of a data set.
Step 2
Begin by collecting all the data points relevant to your analysis. Ensure that the data is
accurate and complete, as errors or omissions can affect the results.
Organize the data in a logical order, such as ascending or descending, to make subsequent
calculations easier. This step lays the foundation for accurate statistical analysis.
Step 3
The mean, or average, is calculated by summing all the data points and dividing by the
total number of points. For example, if your data set is 2, 4, 6, 8, and 10, the sum is 30, and
the mean is 30 divided by 5, which equals 6.
Use the mean to understand the central value of your data set. This measure is particularly
useful when the data is roughly symmetric and has no extreme outliers.
Step 4
The median is the middle value in an ordered data set. Arrange the data in ascending order
and identify the central number. If the data set has an odd number of points, the median is
the middle value. For an even number of points, calculate the median by averaging the two
middle values.
For example, in the data set 3, 5, 7, 9, and 11, the median is 7. If the data set is 3, 5, 7, 9, 11,
and 13, the median is (7 + 9) / 2, which equals 8.
Step 5
The mode is the value that appears most frequently in a data set. A data set can have one
mode, more than one mode, or no mode at all if all values occur with the same frequency.
For example, in the data set 2, 3, 3, 5, and 7, the mode is 3 because it appears twice.
Identifying the mode helps highlight the most common value in your data.
Conclusion on Statistical Averages
By mastering the calculation of mean, median, and mode, you can effectively summarize
and interpret data. These fundamental statistical tools are invaluable for making informed
decisions based on data analysis.
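For tutors who want to check hand calculations, the worked examples from the steps above can be reproduced with Python's standard statistics module. This is a minimal sketch; the variable names are illustrative and not part of the course.

```python
import statistics

# Data sets taken from the worked examples in the steps above
mean_data = [2, 4, 6, 8, 10]
odd_data = [3, 5, 7, 9, 11]
even_data = [3, 5, 7, 9, 11, 13]
mode_data = [2, 3, 3, 5, 7]

mean_value = statistics.mean(mean_data)      # 30 / 5 = 6
median_odd = statistics.median(odd_data)     # middle value: 7
median_even = statistics.median(even_data)   # (7 + 9) / 2 = 8.0
modes = statistics.multimode(mode_data)      # list of most frequent values: [3]

print(mean_value, median_odd, median_even, modes)
```

multimode returns a list, so it also handles data sets with more than one mode.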
Given the dataset [3, 7, 7, 10, 15], what are the mean, median, and mode?
Match each statistical symbol with its correct definition. This activity will help you
Lesson 6 of 11
Example
Formula: z = (x - μ) / σ
Example: If μ = 100, σ = 15, and x = 130, what is the z-score?
z = (130 - 100) / 15 = 2
Normal Distribution
Normal distributions frequently appear in statistics. They have
several useful characteristics: the distribution is bell-shaped and symmetric, the
mean and median are equal, and about 68% of the data lies within one
standard deviation of the mean.
In this guide, we will explore how to use the normalcdf function to calculate areas under
the normal distribution curve. This includes finding the area to the left of a z-score, to the
right of a z-score, and between two z-scores. Understanding these calculations is
essential for interpreting probabilities in statistics.
Step 2
To find the area to the left of z = -0.04, use the normalcdf function with the lower bound
set to a very large negative number (e.g., -1E99) and the upper bound as -0.04. This will
calculate the cumulative probability up to the given z-score.
Step 3
To find the area to the right of z = 1.37, use the normalcdf function with the lower bound as
1.37 and the upper bound set to a very large positive number (e.g., 1E99). This will
calculate the probability above the given z-score.
For instance, inputting normalcdf(1.37, 1E99, 0, 1) will provide the area to the right of z =
1.37. This value represents the proportion of the distribution that exceeds this z-score.
Step 4
To find the area between z = 0.8 and z = 1.03, use the normalcdf function with the lower
bound as 0.8 and the upper bound as 1.03. This will calculate the probability
between these two z-scores.
For example, inputting normalcdf(0.8, 1.03, 0, 1) will yield the area between these values.
This area represents the proportion of the distribution that lies within this range.
Step 5
Once you have calculated the areas, interpret the results in the context of the problem. The
areas represent probabilities or proportions of the data under the normal curve
corresponding to the specified z-scores.
For example, the area to the left of z = -0.04 might indicate the likelihood of a value being
less than a certain threshold, while the area to the right of z = 1.37 could represent the
probability of exceeding a specific value.
Conclusion
By mastering the normalcdf function, you can efficiently calculate areas under the normal
distribution curve for various z-scores. This skill is crucial for solving probability
problems and interpreting statistical data.
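The three area calculations above can also be checked away from the calculator. The sketch below mimics the calculator's normalcdf(lower, upper, mean, standard deviation) using NormalDist from Python's standard library. The Python function named normalcdf is a hypothetical helper written for this illustration, not a built-in.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper (hypothetical helper)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Same bounds as the calculator examples; 1e99 stands in for infinity
area_left = normalcdf(-1e99, -0.04)     # left of z = -0.04, about 0.484
area_right = normalcdf(1.37, 1e99)      # right of z = 1.37, about 0.085
area_between = normalcdf(0.8, 1.03)     # between z = 0.8 and z = 1.03, about 0.060

print(round(area_left, 4), round(area_right, 4), round(area_between, 4))
```

In each case the area is the cumulative area up to the upper bound minus the cumulative area up to the lower bound.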
Learn to Calculate Z-Values
This section provides practical examples of using the invNorm function to find z-values
for given areas under the normal curve and to calculate the top 3% of IQ scores based on a
specified mean and standard deviation. These examples will enhance your understanding
of probability and normal distribution concepts.
Step 2
To find the z-value corresponding to an area of 0.0096 to its left, use the invNorm
function. This function requires the area to the left of the z-value as input. Enter 0.0096
into the invNorm function, and it will return the z-value that corresponds to this area.
To determine the z-value for an area of 0.0236 to its right, first calculate the area to the
left. Since the total area under the curve is 1, subtract 0.0236 from 1 to get 0.9764. Use this
value as the input for the invNorm function.
To find the IQ score that marks the top 3% of the population, use the invNorm
function with an area of 0.97 to the left (since 1 - 0.03 = 0.97). The mean IQ score is 100, and the
standard deviation is 15. Entering invNorm(0.97, 100, 15) returns the IQ score directly, about
128.2; entering invNorm(0.97) alone returns the corresponding z-value, about 1.88.
The invNorm function is a powerful tool for solving problems involving the normal
distribution. It allows you to find z-values and corresponding data points for specified
areas under the curve, making it essential for statistical analysis.
By mastering the use of invNorm, you can confidently tackle problems like determining
probabilities, critical values, and data points in various real-world contexts, such as test
scores, measurements, and more.
Mastering invNorm Applications
By following these examples, you have learned how to use the invNorm function to find z-
values for specific areas and calculate IQ scores for the top 3% of the population. These
skills are fundamental for understanding and applying concepts of normal distribution
and probability.
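The three invNorm examples can be verified with the inverse cumulative distribution function in Python's standard library. This is a sketch under the same assumptions as the text (area measured to the left); inv_norm is a hypothetical helper name chosen to echo the calculator function.

```python
from statistics import NormalDist


def inv_norm(area_left, mean=0.0, sd=1.0):
    """Value with the given area to its left (hypothetical helper mirroring invNorm)."""
    return NormalDist(mean, sd).inv_cdf(area_left)


z_left = inv_norm(0.0096)                         # area 0.0096 to the left, about -2.34
z_right = inv_norm(1 - 0.0236)                    # area 0.0236 to the right, about 1.98
iq_cutoff = inv_norm(1 - 0.03, mean=100, sd=15)   # top 3% of IQ scores, about 128.2

print(round(z_left, 2), round(z_right, 2), round(iq_cutoff, 1))
```

Passing the mean and standard deviation returns a data value (an IQ score) rather than a z-value, which is the distinction students most often miss.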
What is the z-value corresponding to an area of 0.97 to the left in a standard
normal distribution?
1.88
2.00
1.96
1.88
Lesson 7 of 11
Confidence Intervals
Confidence Intervals
In this lesson, you’ll learn what a confidence interval is, why it matters, and how to
interpret and explain it in tutoring sessions.
Confidence intervals provide a range of values that are likely to contain the true population
parameter. They are calculated based on sample data and offer a way to express the uncertainty
inherent in statistical estimates.
For population proportions, confidence intervals help us understand the range within which the
true proportion is expected to fall, given a certain level of confidence, such as 95% or 99%.
The formula for calculating a confidence interval for a population proportion is:
p̂ ± Z × √(p̂(1 - p̂)/n). Here, p̂ represents the sample proportion, Z is the Z-score corresponding
to the desired confidence level, and n is the sample size.
This formula ensures that we account for both the variability in the sample and the level of
confidence we wish to achieve, making it a reliable tool for statistical analysis.
Suppose a survey finds that 60% of respondents favor a new policy, with a sample size of 100.
For a 95% confidence level, Z = 1.96, so the margin of error is 1.96 × √(0.6 × 0.4 / 100) ≈ 0.096.
The confidence interval is 0.6 ± 0.096, or (0.504, 0.696). This means we are 95% confident that
the true proportion of people who favor the policy lies between 50.4% and 69.6%.
Interpreting the Results
The confidence interval provides a range of plausible values for the population proportion. In our
example, the interval (0.504, 0.696) suggests that the true proportion of people favoring the
policy is likely between 50.4% and 69.6%.
It is important to note that the confidence level (e.g., 95%) does not guarantee the true proportion
is within this range, but rather reflects the reliability of the method used to calculate the interval.
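The survey example can be reproduced with a short Python sketch of the formula p̂ ± Z × √(p̂(1 - p̂)/n). The function name proportion_ci is hypothetical; the critical value comes from the standard library's NormalDist rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Confidence interval for a population proportion (hypothetical helper)."""
    # Critical value: area of (1 - confidence) / 2 is left in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Survey example from the text: 60% in favor, sample size 100
low, high = proportion_ci(0.60, 100)
print(round(low, 3), round(high, 3))
```

Changing confidence to 0.99 widens the interval, which is a useful way to show students the trade-off between confidence and precision.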
Suppose we have a sample of 25 students, and their test scores have a sample mean of 78 and a
sample standard deviation of 10. We want to calculate a 95% confidence interval for the
population mean. Since the population standard deviation is unknown, we will use the t-
distribution.
First, determine the degrees of freedom, which is the sample size minus one (25 - 1 = 24).
Next, find the t-value corresponding to a 95% confidence level and 24 degrees of freedom. Because
invT takes the area to the left, use invT(1 - (1 - 0.95)/2, 24) = invT(0.975, 24) = 2.064.
Then, calculate the standard error of the mean by dividing the sample standard deviation by the
square root of the sample size (10 / √25 = 2).
Finally, multiply the t-value by the standard error to find the margin of error (2.064 × 2 = 4.128).
The 95% confidence interval is 78 ± 4.128, or (73.872, 82.128).
It is important to note that the width of the interval depends on the sample size and variability. In
this example, a larger sample size or lower variability would result in a narrower interval,
providing a more precise estimate. Understanding how these factors influence the interval helps
in making informed decisions based on the analysis.
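The arithmetic in the test-score example can be sketched as follows. Python's standard library has no t-distribution, so this sketch assumes the critical value 2.064 has already been found with invT, as in the text; t_interval is a hypothetical helper name.

```python
from math import sqrt


def t_interval(sample_mean, sample_sd, n, t_critical):
    """Confidence interval for a mean, given a t critical value (hypothetical helper)."""
    standard_error = sample_sd / sqrt(n)   # 10 / sqrt(25) = 2
    margin = t_critical * standard_error   # 2.064 * 2 = 4.128
    return sample_mean - margin, sample_mean + margin


# Values from the example: n = 25, mean 78, standard deviation 10, t = 2.064
low, high = t_interval(78, 10, 25, t_critical=2.064)
print(round(low, 3), round(high, 3))
```

Rerunning the sketch with a larger n (and its matching critical value) shows the interval narrowing, as the paragraph above describes.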
“It’s like fishing with a net. You don’t know the exact value (the fish), but
your net (the interval) is wide enough to probably catch it.”
Lesson 8 of 11
Hypothesis Testing
The first step in hypothesis testing is to clearly define the null hypothesis (Ho). This is a
statement that assumes no effect or no difference in the population being studied. It serves as the
baseline or default position that the test will challenge.
The alternative hypothesis (Ha) is a mathematical statement that expresses what the researcher is
trying to show: that the parameter is greater than, less than, or not equal to the value stated in
the null hypothesis.
Next, you need to select a significance level, often denoted as alpha (α). This represents the
probability of rejecting the null hypothesis when it is actually true. Common choices for alpha are
0.05 or 0.01, depending on the level of confidence required.
Choosing a lower alpha value means you are being more stringent, reducing the likelihood of a
false positive but increasing the chance of a false negative.
Calculate the Test Statistic and Find the p-value
Once the null hypothesis and significance level are set, the next step is to conduct the test using
appropriate statistical methods. This involves collecting data and calculating a test statistic, such
as a t-score or z-score, depending on the type of test being performed.
The test statistic is then compared to a critical value or used to calculate a p-value, which helps
determine whether the observed results are statistically significant.
Make a Decision
Based on the results of the test, you will either reject or fail to reject the null hypothesis. If the p-
value is less than the chosen significance level, you reject the null hypothesis, indicating that the
results are statistically significant.
If the p-value is greater than the significance level, you fail to reject the null hypothesis, meaning
there is not enough evidence to support a significant effect or difference.
Step-by-Step Example: One-Proportion z-Test
In this example, we will walk through the process of conducting a hypothesis test for a
population proportion. Each step will provide detailed guidance to ensure clarity and
understanding of the methodology.
Step 2
The first step in hypothesis testing is to clearly define the null hypothesis (Ho) and the
alternative hypothesis (Ha). The null hypothesis represents the assumption that there is
no effect or no difference, while the alternative hypothesis represents the claim we are
testing.
For example, if we are testing whether the proportion of students who pass a math test is
70%, the null hypothesis would be Ho: p = 0.70, and the alternative hypothesis could be
Ha: p ≠ 0.70.
Step 3
For a population proportion, the test statistic is typically calculated using the z-test
formula: z = (p̂ - p) / √[p(1-p)/n], where p̂ is the sample proportion, p is the hypothesized
proportion, and n is the sample size.
For our example, suppose that 72% of a sample of 100 students passed the math test, so
p̂ = 0.72, p = 0.70, and n = 100.
z = (0.72 - 0.70) / √[0.70(1 - 0.70)/100]
z = 0.436
Step 4
The next step is to calculate the p-value. Since Ha uses ≠, it is a two-tailed test, so the
p-value is twice the area to the right of the test statistic: 2 × normalcdf(0.4364, 1E99, 0, 1) ≈ 0.6625.
Make a conclusion
For our example, the p-value is 0.6625. With a significance level of 0.05, we fail
to reject the null hypothesis (0.6625 > 0.05) and conclude that there is insufficient evidence that
the proportion of students passing the math test is not 70%.
By following these steps, you can effectively conduct a hypothesis test for a population
proportion. This process ensures a structured approach to making data-driven decisions
based on statistical evidence.
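The one-proportion z-test steps can be sketched in Python as a check on calculator work. The function name one_prop_z_test is hypothetical, the two-tailed p-value assumes Ha uses ≠ as in the example, and the 0.05 significance level is the one assumed in the text.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(p_hat, p0, n):
    """Two-tailed z-test for a population proportion (hypothetical helper)."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Two-tailed: double the area beyond |z| in one tail
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Example from the text: 72% of 100 students passed, hypothesized proportion 70%
z, p_value = one_prop_z_test(0.72, 0.70, 100)
alpha = 0.05
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(round(z, 3), round(p_value, 4), decision)
```

For a one-tailed alternative, the doubling step would be dropped and only the tail in the direction of Ha would be used.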
Step-by-Step Example: t-Test
In this section, we will explore the process of forming and testing a hypothesis for the
mean when the population standard deviation is unknown.
Step 2
Ho: μ = 50
Ha: μ < 50
Step 3
Test Statistic
The test statistic is calculated using the sample mean, sample standard deviation, and
sample size. The formula for the t-statistic is t = (x̄ - μ) / (s / √n), where x̄ is the sample
mean, μ is the hypothesized mean, s is the sample standard deviation, and n is the sample
size.
Step 4
P-value
df = n - 1
df = 30 - 1 = 29
Since the p-value is 0.0184 and we assume a significance level of 0.05, we reject Ho (0.0184 <
0.05).
There is significant evidence that the chocolate bars weigh less than 50 grams.
Reviewing Hypothesis Testing Steps
By following these steps, you can effectively test a hypothesis for the mean when the
population standard deviation is unknown. This process ensures a structured approach to
statistical analysis, enabling you to draw accurate and reliable conclusions.
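The t-statistic formula t = (x̄ - μ) / (s / √n) can be sketched as below. The course does not list the sample mean or standard deviation for the chocolate bar example, so the sample values here (mean 49.2 g, standard deviation 2.0 g) are hypothetical and chosen only to illustrate the calculation. The sample size of 30 and the hypothesized mean of 50 come from the text.

```python
from math import sqrt


def t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """t = (x-bar - mu) / (s / sqrt(n))"""
    return (sample_mean - hypothesized_mean) / (sample_sd / sqrt(n))


# Hypothetical sample summary (NOT from the course): mean 49.2 g, sd 2.0 g, n = 30
n = 30
t = t_statistic(49.2, 50, 2.0, n)
df = n - 1  # degrees of freedom, 29

print(round(t, 3), df)
```

On the TI-84, the left-tailed p-value for Ha: μ < 50 would then come from tcdf(-1E99, t, df).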
Lesson 9 of 11
Linear Regression
Understanding Scatterplots
Scatterplots are a visual representation of the relationship between two variables. Each point on
the graph represents a pair of values, one for each variable, plotted along the x-axis and y-axis.
This type of graph is particularly useful for identifying patterns, trends, or potential correlations
between variables.
By examining the overall distribution of points, you can determine whether there is a positive,
negative, or no relationship between the variables. Scatterplots are a foundational tool in
statistics and are often the first step in analyzing relationships between data sets.
What is the Correlation Coefficient?
The correlation coefficient, often represented as r, quantifies the strength and direction of the
relationship between two variables. It ranges from -1 to 1, where values closer to 1 indicate a
strong positive correlation, and values closer to -1 indicate a strong negative correlation. A value
near 0 suggests little to no linear relationship.
This metric is essential for understanding how closely two variables are related and whether
changes in one variable are associated with changes in the other. The correlation coefficient is a
key concept in statistical analysis and helps in making informed predictions based on data.
The coefficient of determination, denoted as r², measures the proportion of variance in the
dependent variable that can be explained by the independent variable. It is derived by squaring the
correlation coefficient (r²) and ranges from 0 to 1. A higher r² value indicates a stronger
explanatory power of the independent variable.
This statistic is particularly useful in regression analysis, as it provides insight into how well the
model fits the data. Understanding the coefficient of determination helps in evaluating the
reliability of predictions and the overall effectiveness of the analysis.
Analyzing the relationship between variables involves examining both the scatterplot and the
statistical measures such as the correlation coefficient and coefficient of determination. Together,
these tools provide a comprehensive understanding of how two variables interact.
By combining visual and numerical analysis, you can identify trends, assess the strength of
relationships, and make data-driven decisions. This approach is fundamental in statistical studies
and ensures a thorough evaluation of the data.
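To make r and r² concrete, here is a minimal Python sketch that computes both from paired data. The study-hours data set is hypothetical and used only for illustration, and correlation is a hand-written helper rather than a library call.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied and test scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 55, 59, 60]

r = correlation(hours, scores)
r_squared = r ** 2  # proportion of variance in scores explained by hours

print(round(r, 4), round(r_squared, 4))
```

Here r is close to 1, indicating a strong positive linear relationship, and r² says that roughly 93% of the variation in scores is explained by hours studied.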
Least squares regression is a statistical method used to determine the best-fitting line through a
set of data points. This method minimizes the sum of the squared differences between the
observed values and the values predicted by the line. By doing so, it ensures that the line
represents the overall trend of the data as accurately as possible.
In practice, least squares regression is widely used in various fields, including economics, biology,
and engineering, to analyze relationships between variables. Understanding this method is crucial
for making informed predictions and decisions based on data.
The slope in a regression line represents the rate of change between the dependent and
independent variables. It indicates how much the dependent variable is expected to change for a
one-unit increase in the independent variable. A positive slope suggests a direct relationship,
while a negative slope indicates an inverse relationship.
For example, in a study analyzing the relationship between study hours and test scores, a slope of
2 would mean that for every additional hour of study, the test score is expected to increase by 2
points. Interpreting the slope correctly is essential for understanding the dynamics of the
variables involved.
The y-intercept is the point where the regression line crosses the y-axis. It represents the value of
the dependent variable when the independent variable is zero. In other words, it provides a
baseline value for the dependent variable in the absence of any influence from the independent
variable.
For instance, if a regression model predicts monthly sales based on advertising spend, the y-
intercept would indicate the expected sales when no money is spent on advertising. While the y-
intercept may not always have practical significance, it is a key component of the regression
equation.
Regression models are powerful tools for making predictions based on existing data. By plugging
values of the independent variable into the regression equation, you can estimate the
corresponding values of the dependent variable. This process allows you to forecast outcomes and
make data-driven decisions.
For example, if a regression model relates temperature to ice cream sales, you can predict future
sales based on upcoming weather forecasts. Accurate predictions depend on the quality of the data
and the appropriateness of the regression model used.
1 Equation of Linear Regression: The equation is typically
represented as y = b₀ + b₁x , where each symbol has a specific
meaning.
3 Symbol for Intercept Term: The intercept, b₀, is the value of the
dependent variable when the independent variable equals zero.
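The equation y = b₀ + b₁x can be fitted by least squares with a few lines of Python. This is a sketch on hypothetical study-hours data (not from the course), with least_squares as an illustrative helper name.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, slope


# Hypothetical data: hours studied and test scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 55, 59, 60]

b0, b1 = least_squares(hours, scores)
predicted = b0 + b1 * 3.5  # predicted score for 3.5 hours of study

print(round(b0, 2), round(b1, 2), round(predicted, 2))
```

With this data the slope is 2, matching the interpretation in the text: each additional hour of study is associated with an expected increase of 2 points. The prediction uses an x-value inside the observed range, since predictions far outside the data are less trustworthy.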
Lesson 10 of 11
The TI 84 Calculator
Let's start with entering data in a list and finding descriptive statistics (mean, median,
standard deviation, five-number summary)
Step 2
Press STAT > CALC > 1-Var Stats, then choose L1 to get the descriptive stats
Remember: mean (x̄), standard deviation (Sx), min, Q1, median, Q3, max
Summary
The ability to use the Distribution Menu is key for many probability functions like
normalcdf, invNorm, tcdf, and invT. These functions are used for finding areas,
probabilities, and critical values.
Step 2
normalcdf (lower, upper, mean, standard deviation): Find area or probability under
normal curve
invNorm (area, mean, standard deviation): Finds the value with the given area to its left (a
z-score when the mean is 0 and the standard deviation is 1)
Summary
Let us watch a video to practice using the normalcdf and invNorm functions
Z and t critical values for confidence intervals
When calculating a confidence interval, finding the appropriate critical value is needed.
The z-critical value is for estimating the population proportion, and the t-critical value is
for estimating the population mean when the population standard deviation is unknown.
Step 2
Press 2nd > VARS (DISTR) to access probability functions like invNorm and invT
Step 3
Press the STAT button to open the statistics menu and then choose 1: Edit to enter values
into two lists (e.g., L1, L2).
Use STAT > TESTS > LinRegTTest
Lesson 11 of 11
The End
The upcoming section includes downloadable resources for your tutoring sessions.
TI-84_Quick_Guide.pdf
85.5 KB
Feel free to reach out to your math center coordinator with any questions.
👋 Now go out there and help someone love stats just a little more than they did
yesterday!
Share Your Feedback
Your feedback is invaluable in helping us improve this course. Please take a moment to
share your thoughts on the content, structure, and overall learning experience. Your
insights will guide us in creating even more effective and engaging materials in the
future.
We encourage you to be honest and specific in your responses. Let us know what worked
well and what could be improved. Together, we can ensure that this training continues
to meet your needs and expectations.