Unit 2 notes
The document provides an overview of methods for analyzing two-variable data, including techniques for comparing categorical and quantitative variables. It covers the use of two-way tables, segmented bar graphs, scatterplots, correlation coefficients, and linear regression models. Key concepts such as conditional distributions, relative risk, residuals, and the effects of outliers and leverage points are also discussed.
Compare 2 categorical variables: (TPS-ch. 2 topics)
o Be able to create segmented bar graphs.
o Calculate conditional distributions (recognize explanatory/response variables).
o Know what makes two variables independent.
o Be able to calculate relative risk and interpret it in context.

Compare 2 quantitative variables: (TPS-ch. 3)(WS3-Topics 26-28)
o Be able to draw/read scatterplots (explanatory variable on the x-axis!).
o Be able to interpret the correlation coefficient (r) and know the properties of r.
o Interpret the slope and y-intercept in context.
o Recognize the definition of r².
o Be able to write the equation of a least squares regression line using the formulas, the calculator, or computer printouts.
o Be able to find predicted/fitted values using the LSRL.
o Find residuals.
o Identify outliers, high-leverage points, and influential observations, and their effects.
Two Categorical Variables:
Calculate statistics for two categorical variables:
o The marginal relative frequencies are the row and column totals in a two-way table divided by the total for the entire table.
o A conditional relative frequency is a relative frequency for a specific part of the table: a cell frequency divided by the total for that row or column.
o A segmented bar graph can be used to compare two categorical variables. Put the explanatory variable on the x-axis, always use relative frequencies (percents) on the y-axis, and make each bar go up to 100%.
o The relative risk compares two conditional percents: the larger % divided by the smaller %. It is interpreted as how many times more likely the top group is than the bottom group to display a certain characteristic.
o Two variables are considered independent if knowing one does not affect the likelihood of the other. In a segmented bar graph this leads to the bars having the same, or very close, percents. (A short computational sketch of these calculations follows this list.)
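As a minimal sketch of these calculations, the following Python snippet computes marginal relative frequencies, conditional distributions, and the relative risk for a hypothetical two-way table (the group names, responses, and counts are invented for illustration):

    # Hypothetical two-way table: explanatory variable = group, response = answer.
    table = {
        "Group A": {"Yes": 30, "No": 20},
        "Group B": {"Yes": 15, "No": 35},
    }

    grand_total = sum(sum(row.values()) for row in table.values())

    # Marginal relative frequency of each row: row total / grand total.
    for group, row in table.items():
        print(group, "marginal:", sum(row.values()) / grand_total)

    # Conditional distribution within each group: cell count / row total.
    conditional = {
        group: {resp: count / sum(row.values()) for resp, count in row.items()}
        for group, row in table.items()
    }
    print(conditional)

    # Relative risk of answering "Yes": larger conditional percent / smaller.
    p_a = conditional["Group A"]["Yes"]
    p_b = conditional["Group B"]["Yes"]
    print("relative risk:", max(p_a, p_b) / min(p_a, p_b))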
Two Quantitative Variables:

A bivariate quantitative data set consists of observations of two different quantitative variables made on individuals in a sample or population. A scatterplot shows two numeric values for each observation, one corresponding to the value on the x-axis and one to the value on the y-axis. An explanatory variable is a variable whose values are used to explain or predict corresponding values of the response variable.

A description of a scatterplot includes strength, direction, form, and unusual features.
o The direction of the association shown in a scatterplot, if any, can be described as positive or negative. A positive association means that as values of one variable increase, values of the other variable tend to increase; a negative association means that as values of one variable increase, values of the other variable tend to decrease.
o The strength of the association is how closely the individual points follow a specific pattern (such as linear), and can be seen in a scatterplot. Strength can be described as strong, moderate, or weak.
o The form of the association shown in a scatterplot, if any, can be described as linear or non-linear.
o Unusual features of a scatterplot include clusters of points or points with relatively large discrepancies between the value of the response variable and a predicted value for the response variable.

Correlation:
o The correlation, r, gives the direction and quantifies the strength of the linear association between two quantitative variables.
o The correlation, r, is unit-free and is always between -1 and 1, inclusive. A value of 0 indicates there is no linear association; a value of 1 or -1 indicates a perfect linear association.
o Correlation does not make a distinction between explanatory and response variables. Switching x and y does not change the correlation.
o r uses the standardized values of the observations (z-scores), so r does not change when we change the units of measurement of x, y, or both. (Changing from inches to cm does not affect the correlation.) Correlation has no units of measurement; it is just a number.
o Correlation measures the strength of only a linear relationship between two variables. Correlation does not describe curved relationships, no matter how strong they are.
o Correlation is not resistant. It is strongly affected by outliers.
o A correlation close to 1 or -1 does not necessarily mean that a linear model is appropriate. Remember to also look at the scatterplot and/or the residual plot to see whether the data are actually linear.
o A perceived or real relationship between two variables does not mean that changes in one variable cause changes in the other. Correlation does not necessarily imply causation.

Linear Regression Models:
o A simple linear regression model is an equation that uses an explanatory variable, x, to predict the response variable, y.
o The predicted response value, ŷ, is calculated as ŷ = a + bx, where a is the y-intercept, b is the slope of the regression line, and x is the value of the explanatory variable.
o Extrapolation is predicting a response value using a value of the explanatory variable that is beyond the interval of x-values used to determine the regression line. The further we extrapolate, the less reliable the prediction.

Residuals:
o A residual is the difference between the actual value and the predicted value (A - P): residual = y - ŷ.
o The sum and the mean of the residuals are always 0.
o A residual plot is a plot of residuals vs. explanatory variable values or predicted response values.
o Residual plots can be used to investigate the appropriateness of a selected model. Apparent randomness in a residual plot for a linear model is evidence of a linear form to the association between the variables; a pattern indicates that the attempted model is not a good fit. (A short sketch computing r, the LSRL, and residuals follows.)
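As a minimal sketch of these ideas, the following Python snippet computes r from z-scores, fits the LSRL using the slope and intercept formulas given in the list below, and checks that the residuals sum to 0 (the data values are invented for illustration):

    from statistics import mean, stdev

    # Made-up bivariate data; x is the explanatory variable, y the response.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 8.1, 9.8]
    n = len(x)

    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)

    # r is based on the standardized values (z-scores) of the observations.
    r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
            for xi, yi in zip(x, y)) / (n - 1)

    # LSRL coefficients: b = r * (s_y / s_x) and a = y_bar - b * x_bar,
    # so the line passes through (x_bar, y_bar).
    b = r * (s_y / s_x)
    a = y_bar - b * x_bar

    # Predicted (fitted) values and residuals; the residuals sum to 0
    # (up to floating-point rounding).
    y_hat = [a + b * xi for xi in x]
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]
    print("r =", round(r, 4), " LSRL: y-hat =", round(a, 3), "+", round(b, 3), "x")
    print("sum of residuals:", round(sum(residuals), 12))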
Least Squares Regression Line (LSRL): ŷ = a + bx
o The LSRL minimizes the sum of the squares of the residuals and always passes through the point (x̄, ȳ).
o x and y are the variables of the LSRL, while a and b are the coefficients of the line.
o b is the slope of the LSRL and can be calculated by b = r·(sy / sx), where sy and sx are the standard deviations of y and x.
o The slope can be interpreted as the predicted change in y for every one-unit increase in x.
o The y-intercept can be calculated from ȳ = a + b·x̄, or a = ȳ - b·x̄.
o The y-intercept is the predicted value of the response variable (y) when the explanatory variable (x) is 0.
o Sometimes the y-intercept of the line does not have a logical interpretation in context.
o r² is the square of the correlation, r. It is also called the coefficient of determination. r² is the proportion of variation in the response variable (y) that is explained by the regression line (LSRL) with the explanatory variable (x).

Departures from Linearity (unusual features)
o An outlier in regression is a point that does not follow the general trend shown in the rest of the data and has a large residual when the LSRL is calculated (it sticks out from the rest of the data along the y-axis, far off the line).
o A high-leverage point in regression has a substantially larger or smaller x-value than the other observations have (it sticks out from the rest of the data along the x-axis).
o An influential point in regression is any point that, if removed, changes the relationship substantially; examples include a much different slope, y-intercept, and/or correlation. Outliers and high-leverage points are often influential.

Transformations of data
o Transformations of variables, such as taking the natural log of each value of y or squaring each value of x, can be used to create transformed data sets, which may be more linear in form than the untransformed data.
o Increased randomness in the residual plot after transformation of the data and/or an r² value closer to 1 offers evidence that the LSRL for the transformed data is a more appropriate model for predicting responses from the explanatory variable than the line for the untransformed data. (A short sketch of a log transformation follows this list.)
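As a minimal sketch of this idea, the following Python snippet compares r² for made-up, roughly exponential data before and after taking the natural log of y; the fit to the transformed data has an r² much closer to 1:

    from math import log
    from statistics import mean, stdev

    def corr(xs, ys):
        # Sample correlation computed from standardized values (z-scores).
        xb, yb, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
        return sum(((xi - xb) / sx) * ((yi - yb) / sy)
                   for xi, yi in zip(xs, ys)) / (len(xs) - 1)

    x = [1, 2, 3, 4, 5, 6]
    y = [2.7, 7.5, 19.8, 54.0, 150.4, 405.2]      # invented, roughly e**x

    r2_raw = corr(x, y) ** 2                      # linear fit of y on x
    r2_log = corr(x, [log(yi) for yi in y]) ** 2  # linear fit of ln(y) on x
    print("r^2 for y vs. x:    ", round(r2_raw, 4))
    print("r^2 for ln(y) vs. x:", round(r2_log, 4))  # much closer to 1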