Notes 2 - Scatterplots and Correlation

This document discusses scatterplots and correlation. It provides examples of scatterplots with different forms (linear, exponential, logarithmic), directions (positive and negative slope), and strengths (weak, moderate, strong). It also discusses interpreting scatterplots by looking for patterns, outliers, and the overall relationship between two variables. Finally, it defines correlation as a measure of the linear relationship between two variables on a scale of -1 to 1, and provides facts about interpreting the strength and direction of correlation.

Uploaded by

kjogu giyvg

Unit 2 – Exploring Two Variable Data

Notes 2 – Scatterplots and Correlation

We have worked with data for bar graphs, box plots, dot plots, and histograms. The type of data
represented by these graphs is _________________________ data.

In our Vitruvian Man activity, we explored ____________________________ data. When we compare two
variables, we are exploring the relationship between them.

Most statistical studies involve more than one variable. On the AP Statistics exam, you will often be asked to
compare two data sets using side-by-side boxplots, histograms, etc. However, there are times when we
want to examine relationships among several variables for the same group of individuals (in the Vitruvian Man
activity, each person had TWO measurements: height and arm span).

Graphing Two Variables


When you examine the relationship between two variables you need to start with a scatterplot.

A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The
values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical
axis. Each individual in the data appears as a point in the plot fixed by the values of both variables for that
individual.

Here is a previous class’s data from this activity:

[Figure: scatterplot titled "Arm Span vs. Height"; horizontal axis: Arm Span (135 to 195); vertical axis: Height (135 to 195)]

First, we have to identify and name the correct variables we used in this activity.

A _______________________ variable measures the outcome of a study or an observation.

An ______________________ variable helps explain or influences change in a response variable.

**You will often find explanatory variables called independent variables, and response variables called dependent
variables. The idea behind this language is that the response variable depends on the explanatory variable. Because the
words independent and dependent have other, unrelated meanings in statistics, we won’t use them here.

X variable = ______________________________ = ______________________________

Y variable = ______________________________ = ______________________________


Interpreting a Scatterplot

● In any graph of data, look for the overall pattern and for striking deviations from that pattern.
● You can describe the overall pattern of a scatterplot by the direction, form, and strength of the
relationship.
● An important kind of deviation is an outlier, an individual value that falls outside the overall pattern of
the relationship.

Things to look for in a scatterplot:
● Form: Overall pattern (linear, exponential, etc.) or deviations from the pattern (outliers)
● Direction: Positive or negative slope
● Strength: How close do the points lie to a simple form (such as a line)?

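Form: [Figure: example scatterplots labeled Linear, Exponential, and Logarithmic]

Direction: [Figure: example scatterplots labeled Positive and Negative]

Strength: [Figure: example scatterplots labeled Weak, Moderate, and Strong]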

Example
Suppose we hypothesize that the number of doctor visits a person makes can be explained by the number of
cigarettes they smoke. We want to see whether there is a relationship between the number of cigarettes a person
smokes per week and the number of times per year that person visits a doctor. We ask 10 random people and get
the following information:

# of Cigarettes Per Week            0    3   21   15   30    5   40   60    0    0
Number of doctor visits per year    1    2    4    3    5    1    5    6    2    0

Create a scatterplot and describe the relationship between the variables.

Creating a Scatterplot on the Calculator


Use the example on the previous page to create a scatterplot on the calculator.
1) Load the x-values into list 1 and the y-values into list 2.
2) Using StatPlot - highlight the mini-scatterplot: XList: L1 and YList: L2
3) Press “graph”, then use “Zoom 9” (ZoomStat) to fit the scatterplot on the screen
4) Enjoy
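
For readers without a graphing calculator, the same steps can be sketched in Python. This is only an illustration and not part of the original notes: the function name `text_scatter` and the grid size are assumptions, and the scaling step fits the grid to the data in roughly the way "Zoom 9" fits the window. It assumes the x-values are not all equal and the y-values are not all equal.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a coarse text scatterplot of paired data as a list of strings."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Start from an empty grid; row 0 is the top of the plot.
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index so the data fill the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    # Add a simple vertical axis on the left and a horizontal axis underneath.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]

# Data from the example: list 1 (x) and list 2 (y).
cigarettes = [0, 3, 21, 15, 30, 5, 40, 60, 0, 0]
visits = [1, 2, 4, 3, 5, 1, 5, 6, 2, 0]

plot_lines = text_scatter(cigarettes, visits)
print("\n".join(plot_lines))
```

Each `*` marks one person (points that land in the same grid cell overlap). The points drift upward from left to right, which suggests a positive association.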

Unusual Features
[Figure: example scatterplots labeled Outliers and Influential Points, Gaps, and Clusters]

Correlation
In order to strengthen the analysis when comparing two variables, we can attach a number, called the
correlation coefficient (r), to describe the linear relationship between two variables. This number helps remove
any subjectivity in reading a linear scatter plot.

The correlation measures the strength and direction of the linear relationship between two quantitative
variables.

While we will never have to find correlation by hand, the formula is provided to us on the AP Statistics formula
sheet. There are a few facts about the correlation that the formula can help us remember.

____ : correlation
____ : each x-value
____ : each y-value
____ : mean of the x values
____ : mean of the y values
____ : standard deviation of x values
____ : standard deviation of y values

Essentially, the correlation coefficient, r, is the average of the products of the standardized scores (dividing by n − 1 rather than n).
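
As a rough illustration of that description, the sketch below computes r for the cigarette and doctor-visit example by standardizing each value and averaging the products. The function and variable names are assumptions made for this example; on the exam you would use a calculator instead.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Standardize each value, multiply the paired z-scores, then divide the sum by n - 1.
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

cigarettes = [0, 3, 21, 15, 30, 5, 40, 60, 0, 0]
visits = [1, 2, 4, 3, 5, 1, 5, 6, 2, 0]

r = correlation(cigarettes, visits)
print(round(r, 3))
```

For these ten people r comes out to roughly 0.93, which matches the strong, positive, fairly linear pattern in the scatterplot.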

Fact #1 : Correlation is a number that is between -1 and 1.

r = -1 : A perfect, negative linear relationship
r = 0 : No linear relationship
r = 1 : A perfect, positive linear relationship

Fact #2 : Positive correlations between 0 and 1 have varying strengths, with the strongest positive correlations being closer to 1.

r = 0.2 : A weak, positive linear relationship
r = 0.5 : A moderate, positive linear relationship
r = 0.9 : A strong, positive linear relationship

*Note: There is no magic “cutoff number” for describing weak/moderate/strong. Use your statistical intuition!

Fact #3 : Negative correlations between -1 and 0 have varying strengths, with the strongest negative correlations being closer to -1.

r = -0.9 : A strong, negative linear relationship
r = -0.65 : A moderate, negative linear relationship
r = -0.3 : A weak, negative linear relationship

Fact #4 : Correlation describes only linear relationships between two variables.

MANY students are tempted to say that this scatterplot has a correlation of -1
because it is a perfect negative quadratic relationship.

NO NO NO!!!!

This would have a correlation of ______ because it is not a linear relationship. Do
not make this mistake. Correlation only describes linear relationships.
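
The scatterplot from the original notes is not reproduced here, so the sketch below uses two made-up perfect quadratic data sets (both are assumptions for illustration only). It suggests that a perfectly curved pattern does not produce a correlation of -1.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r computed from standardized scores."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data: a full, symmetric downward-opening parabola y = -x^2.
xs_full = [-3, -2, -1, 0, 1, 2, 3]
ys_full = [-(x ** 2) for x in xs_full]

# Hypothetical data: only the decreasing half of the same parabola.
xs_half = [0, 1, 2, 3, 4, 5, 6]
ys_half = [-(x ** 2) for x in xs_half]

r_full = correlation(xs_full, ys_full)
r_half = correlation(xs_half, ys_half)
print(round(r_full, 3), round(r_half, 3))
```

The symmetric parabola gives a correlation of essentially 0 (up to floating-point rounding), and even the steadily decreasing half gives a value near -0.96 rather than exactly -1, because the points do not fall on a straight line.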

Fact #5 : Correlation does not have units and changing units on either axis will not affect correlation.

To see this, take a look at the formula again:

$$ r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Since we are standardizing all the x and y values, it does not matter what the units are! We take the product of
their standardized scores. Speaking of that….
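
A quick way to check this fact is to rescale the doctor-visit example and recompute r. The sketch below is an illustration only: the helper `correlation` and the choice of new units (cigarettes per day, visits per decade) are assumptions made for this example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r computed from standardized scores."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)

cigarettes_per_week = [0, 3, 21, 15, 30, 5, 40, 60, 0, 0]
visits_per_year = [1, 2, 4, 3, 5, 1, 5, 6, 2, 0]

# Change the units on both axes.
cigarettes_per_day = [x / 7 for x in cigarettes_per_week]
visits_per_decade = [y * 10 for y in visits_per_year]

r_original = correlation(cigarettes_per_week, visits_per_year)
r_rescaled = correlation(cigarettes_per_day, visits_per_decade)

# The two values agree up to floating-point rounding.
print(abs(r_original - r_rescaled) < 1e-9)
```

Rescaling changes the means and standard deviations by the same factor, so every standardized score, and therefore r, stays the same.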

Fact #6 : Switching the explanatory and response variables on the axes will not change the correlation.

Again, looking at the formula, this is because the order of the multiplication does not matter. Correlation makes
no distinction between explanatory and response variables. It makes no difference which variable you call x
and which you call y when calculating the correlation.
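
The same kind of check works for swapping the variables. This sketch (again with an assumed helper named `correlation`) computes r for the doctor-visit example both ways.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r computed from standardized scores."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)

cigarettes = [0, 3, 21, 15, 30, 5, 40, 60, 0, 0]
visits = [1, 2, 4, 3, 5, 1, 5, 6, 2, 0]

r_xy = correlation(cigarettes, visits)  # cigarettes as x, visits as y
r_yx = correlation(visits, cigarettes)  # roles swapped

# The two values agree up to floating-point rounding.
print(abs(r_xy - r_yx) < 1e-9)
```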

Fact #7 : Correlation is very strongly affected by outliers.

Use correlation with caution when outliers appear in your scatter plot. Don’t rely on correlation alone to
judge the strength of the linear relationship between two variables: graph a scatter plot first!
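
To get a feel for how much a single outlier can matter, the sketch below adds one made-up person to the doctor-visit example: a non-smoker with 12 doctor visits in a year. That extra point is hypothetical and is not part of the original data; the helper name `correlation` is also an assumption.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r computed from standardized scores."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)

cigarettes = [0, 3, 21, 15, 30, 5, 40, 60, 0, 0]
visits = [1, 2, 4, 3, 5, 1, 5, 6, 2, 0]

# Hypothetical outlier: 0 cigarettes per week but 12 doctor visits per year.
cigarettes_with_outlier = cigarettes + [0]
visits_with_outlier = visits + [12]

r_without = correlation(cigarettes, visits)
r_with = correlation(cigarettes_with_outlier, visits_with_outlier)
print(round(r_without, 2), round(r_with, 2))
```

With this one extra point the correlation drops from roughly 0.93 to roughly 0.30, even though the other ten people still follow the same pattern.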
