
INFERENTIAL ANALYSES II

(RELATIONSHIPS)

Dr. Abdul Rahman Mahmoud Fata Nahhas


KOP – IIUM
Final Year Research Project
SEM 1
2023-24
CORRELATION ANALYSIS
Introduction
 Correlation measures the strength and direction of a relationship that exists between two variables
 Partial correlation: three or more variables are included, and the correlation between two variables is explored while the effect of the others is removed
 E.g., the correlation between blood pressure and amount of salt intake after adjustment for the effect of a third variable, such as amount of fluid intake
Introduction
Example (positive correlation)
 Typically, in the summer, as the temperature increases people are thirstier, consuming more water
Introduction
For seven random summer days, a person recorded the temperature and his water consumption during a three-hour period spent outside:

Temperature (C)   Water Consumption (Liters)
25                1
29                1.3
35                1.7
37                1.9
39                2
41                2.3
44                3.1
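For a small dataset like the one above, the Pearson coefficient can be computed directly from its definition. A minimal sketch in Python (the language choice and function name are mine, not the slides'):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient, r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sum of co-deviations and squared deviations from each mean
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# The seven recorded summer days from the table above
temperature = [25, 29, 35, 37, 39, 41, 44]
water = [1, 1.3, 1.7, 1.9, 2, 2.3, 3.1]
r = pearson_r(temperature, water)  # strong positive correlation, r ≈ 0.95
```

As expected for the temperature/water example, r comes out large and positive.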
Introduction

[Scatterplot: Water Consumption (L) against Temperature (C) for the seven recorded days]
Introduction
 Correlation treats all variables equally
 Correlation does not take into consideration whether a variable has been classified as a dependent or independent variable
Introduction
 For instance, you might want to find out whether basketball performance is correlated with a person's height
 Thus, you would plot a graph of performance against height and calculate the correlation coefficient r
 If, say, r = 0.72, we can conclude that as height increases, so does basketball performance
Types of correlation
Two main types of Correlation Analysis:

 Pearson product-moment correlation (Parametric) REQUIRES:
  - Normally distributed data
  - A linear relationship between the two variables in question
  - No heteroscedasticity

 Spearman's Rank-Order Correlation (Non-Parametric) DOES NOT REQUIRE the Pearson correlation assumptions
Pearson product-moment correlation
 A parametric measure of the strength and direction of a linear relationship that exists between two continuous variables
 Denoted by the symbol r
 Attempts to draw a line of best fit through the data of the two variables
 The Pearson correlation coefficient, r, indicates how far all the data points are from this line of best fit (i.e., how well the data points fit this line)
Spearman Rank-order Correlation
 A nonparametric measure of the strength and direction of a relationship that exists between two variables measured on at least an ordinal scale
 Denoted by the symbol rs (or the Greek letter ρ, pronounced rho)
 Used for either ordinal variables or for continuous data that have failed the assumptions necessary for conducting the Pearson product-moment correlation
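One way to see how the two coefficients relate: Spearman's rs is simply Pearson's r computed on the ranks of the data rather than the raw values. A minimal sketch (it assumes no tied values; ties would need averaged ranks):

```python
import math

def pearson_r(xs, ys):
    """Pearson r, needed here as the building block for Spearman's rs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank each value (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rs(xs, ys):
    """Spearman's rank-order correlation: Pearson r on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# A monotone but non-linear relationship: rs is exactly 1
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
rs = spearman_rs(xs, ys)  # → 1.0
```

Because Spearman only looks at rank order, a perfectly monotone curve scores rs = 1 even though Pearson's r on the raw values would be below 1.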
Detecting a linear relationship
 How can you detect a linear relationship between tested variables?
 Simply by plotting the variables on a graph (a scatterplot, for example), visually inspecting the graph's shape, and observing the data points and their location relative to the line of best fit
Detecting a linear relationship

[Scatterplot: linear relationship]

[Scatterplot: linear relationship (second example)]

[Scatterplot: non-linear relationship]

[Scatterplot: curvilinear relationship]
Correlation Coefficient
With the help of the correlation coefficient, we can determine:
1. The DIRECTION of the relationship → positive or negative
2. The STRENGTH of the relationship between the variables
Direction of Correlation

[Scatterplot: positive correlation, Water Consumption (L) against Temperature (C)]

[Scatterplot: negative correlation, Stress Score against Work Performance Score]
Strength of Correlation

Strength of     Coefficient
Correlation     Positive         Negative
Small           0.1 to 0.29      -0.1 to -0.29
Medium          0.3 to 0.49      -0.3 to -0.49
Large           0.5 to 1         -0.5 to -1

Strength of Correlation
 If r (or rs) equals zero, then there is NO RELATIONSHIP between the two variables
 r = 1 → perfect positive linear relationship
 r = -1 → perfect negative linear relationship

Strength of Correlation
Achieving a value of +1 or -1 means that all your data points fall exactly on the line of best fit; there are no data points that show any variation away from this line

[Scatterplots: r = -1 (perfect negative) and r = +1 (perfect positive), all points on the line]
REGRESSION ANALYSIS
Definition
 A predictive statistical method that investigates the strength of the relationship between TWO SETS of variables
 It studies the dependence of one or more variables (dependent variables) on one or more other variables (independent or predictor variables)
Regression Main Purposes
Regression is PRIMARILY used to:
1. Estimate (describe) the relationship that exists between the dependent variable(s) and the explanatory variable(s)
2. Determine the strength of impact of each of the predictor variables on the dependent variable(s), controlling for the effects of all other predictor variables
3. Predict the value of the dependent variable(s) for a given value of the predictor variable(s)
Regression Equation
 Can be obtained from all types of regression analysis
 Once known, the regression equation is used to predict values of the dependent variables, given the values of the independent (predictor) variables
 E.g., if we knew a person's weight, we could then predict their blood pressure using the regression equation
Regression Equation
E.g., using the simple linear regression model, an equation obtained can be of the following form:

Y = β0 + β1 * X + e

 Typically, Y is referred to as the dependent variable, and
 X as the independent variable
 β0 is the intercept of the estimated line, i.e., the value of Y when X = 0
 β1 is the gradient of the estimated line [slope of the line], i.e., the amount by which Y changes with a one-unit change in X
 e is the error term or disturbance in the relationship; it represents factors other than X that affect Y
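For simple linear regression, β1 and β0 have closed-form least-squares estimates: β1 is the ratio of the x-y co-deviation sum to the x squared-deviation sum, and β0 follows from the means. A minimal sketch:

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates: b1 = Sxy / Sxx, b0 = mean(y) - b1 * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx          # slope: change in Y per one-unit change in X
    b0 = my - b1 * mx       # intercept: value of Y when X = 0
    return b0, b1

# Data lying exactly on Y = 1 + 2*X recovers b0 = 1, b1 = 2
b0, b1 = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
```

With real (noisy) data the points will not sit exactly on the line, and the fitted b0 and b1 describe the best-fit line through the cloud.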
Types of Regression
Regression analysis is generally classified into two types:

 Simple: regression involves only two variables, one of which is the dependent variable and the other the explanatory (independent) variable. The associated model will be a simple regression model

 Multiple: regression involves more than two variables; MAINLY, one of them is the dependent variable and the others are explanatory (independent) variables. The associated model will be a multiple regression model
Types of Regression

Number of             Type of dependent variable
Predictor Variables   Continuous        Categorical
1                     Simple Linear     Simple Logistic
>1                    Multiple Linear   Multiple Logistic

Linear Regression
 Linear Regression establishes a relationship between a dependent (continuous) variable (Y) and one or more independent (predictor) variables (X) using a best-fit straight line (also known as the regression line)
Linear Regression
E.g., predicting patients' measured blood glucose level (in mg/dl) based on the dose of insulin infusion (in IU) … SIMPLE LINEAR REGRESSION

 Presume a sample of 20 DM patients for whom insulin infusion was administered
 We can plot the values on a graph, with insulin dose on the X axis and blood glucose on the Y axis
 If there were a perfect linear relationship between insulin dose and blood glucose, then all 20 points on the graph would fit on a straight line (but this is never the case [unless your data are rigged])
Linear Regression
E.g., predicting patients' measured blood glucose level (in mg/dl) based on the dose of insulin infusion (in IU) … SIMPLE LINEAR REGRESSION

 If there is a (non-perfect) linear relationship between insulin dose and blood glucose (presumably a negative one), then we would get a cluster of points on the graph which slopes downward
 In other words, as insulin dose is increased, blood glucose level declines
Linear Regression

[Scatterplot with fitted line: Glucose level against Insulin dose]
Y = β0 + β1 * X + e
BG = -7.15 + 0.095 * Insulin dose
Linear Regression
 MULTIPLE LINEAR REGRESSION is the same idea as simple linear regression, except that we have several independent variables predicting the dependent variable
 To continue with the previous example, assume that we now want to predict a patient's BG from insulin dose and gender as well. In other words, we want to see whether gender also has an impact on the measured BG
 In this case, the independent variables (predictors) are Insulin dose and Gender, while the dependent variable is BG
Linear Regression
 Multiple regression tells us the predictive value of the overall model, i.e., of all predictor variables together
 In our example, then, the regression would tell us how well Insulin dose and Gender together predict a patient's BG
Linear Regression
DETERMINES THE STRENGTH OF IMPACT OF EACH PREDICTOR VARIABLE ON THE DEPENDENT VARIABLE(S), CONTROLLING FOR THE EFFECTS OF ALL OTHER EXPLANATORY VARIABLES

 Multiple regression ALSO tells us how well each predictor variable predicts the dependent variable, controlling for each of the other predictor variables
 In our example, then, the regression would tell us how well Insulin dose predicts a patient's BG while controlling for Gender, as well as how well Gender predicts a patient's BG while controlling for Insulin dose
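With exactly two predictors, the least-squares coefficients can still be written in closed form from the centered sums of squares and cross-products; each slope then estimates the effect of its predictor while holding the other fixed. A sketch under that two-predictor assumption (general multiple regression would use matrix methods instead):

```python
def fit_two_predictor_linear(x1s, x2s, ys):
    """Closed-form least squares for Y = b0 + b1*X1 + b2*X2."""
    n = len(ys)
    m1, m2, my = sum(x1s) / n, sum(x2s) / n, sum(ys) / n
    s11 = sum((a - m1) ** 2 for a in x1s)
    s22 = sum((b - m2) ** 2 for b in x2s)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1s, x2s))
    s1y = sum((a - m1) * (y - my) for a, y in zip(x1s, ys))
    s2y = sum((b - m2) * (y - my) for b, y in zip(x2s, ys))
    det = s11 * s22 - s12 ** 2  # zero under perfect multicollinearity
    b1 = (s22 * s1y - s12 * s2y) / det  # effect of X1, controlling for X2
    b2 = (s11 * s2y - s12 * s1y) / det  # effect of X2, controlling for X1
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Toy data generated exactly as Y = 1 + 2*X1 + 3*X2 (illustrative values)
x1 = [0, 1, 0, 1, 2]
x2 = [0, 0, 1, 1, 1]
y = [1, 3, 4, 6, 8]
b0, b1, b2 = fit_two_predictor_linear(x1, x2, y)
```

Note how each slope formula subtracts the s12 cross-term; that subtraction is what "controlling for the other predictor" amounts to algebraically.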
Linear Regression
Assumptions
1. Number of cases: when doing regression, the cases-to-Independent-Variables (IVs) ratio should ideally be 20:1, that is, 20 cases for every IV in the model. The ratio should be no lower than 5:1 (i.e., 5 cases for every IV in the model)
2. Normality: the scores for each variable should be normally distributed
3. Linearity: there must be a linear relationship between the independent and dependent variables
Linear Regression
Assumptions
4. Absence of Multicollinearity: multicollinearity exists when the independent variables are highly correlated (r = .9 and above)
5. Absence of Singularity: singularity occurs when one independent variable is actually a combination of other independent variables (e.g., when both the subscale scores and the total score of a scale are included)
6. Outliers: linear regression is very sensitive to outliers (very high or very low values on a particular item). Outliers can severely affect the regression line and, eventually, the forecasted values
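Assumption 4 can be screened by computing pairwise correlations among the predictors and flagging any pair at or above the slide's r = .9 cut-off. A sketch (the predictor names and values are hypothetical; `dose_mg` is deliberately a rescaling of `dose_iu`, so that pair is flagged):

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson r between two predictor columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def multicollinear_pairs(predictors, threshold=0.9):
    """Return name pairs of predictors whose |r| meets the threshold."""
    flagged = []
    for (name1, v1), (name2, v2) in combinations(predictors.items(), 2):
        if abs(pearson_r(v1, v2)) >= threshold:
            flagged.append((name1, name2))
    return flagged

# Hypothetical predictors: dose_mg is a unit conversion of dose_iu (r = 1)
predictors = {
    "dose_iu": [10, 20, 30, 40],
    "dose_mg": [0.35, 0.70, 1.05, 1.40],
    "age": [25, 61, 33, 48],
}
pairs = multicollinear_pairs(predictors)  # flags only (dose_iu, dose_mg)
```

A flagged pair suggests dropping or combining one of the two predictors before fitting the model.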
Logistic Regression
 Used to find the probability of an event of Success and an event of Failure
 Used when the dependent variable is binary (0/1, True/False, Yes/No) in nature
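The model estimates P(success) through the logistic (sigmoid) function applied to a linear predictor. A minimal one-predictor sketch fitted by gradient ascent on the log-likelihood (the toy data, learning rate, and epoch count are illustrative assumptions, not from the slides):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_simple_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1*x) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Illustrative binary outcome that becomes more likely as x grows
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_simple_logistic(xs, ys)
p_low = sigmoid(b0 + b1 * 1)   # predicted probability at small x
p_high = sigmoid(b0 + b1 * 6)  # predicted probability at large x
```

Unlike linear regression, the output is a probability between 0 and 1, which is why this model suits a Yes/No dependent variable.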
Logistic Regression
E.g., predicting whether a group of people have depression (depression Yes/No) based, for instance, on place of residence (Urban/Rural) … SIMPLE LOGISTIC REGRESSION

 Presume a sample of 50 persons whose depression was assessed by a psychologist
 Each person's place of residence was reported
Logistic Regression
E.g., predicting whether a group of people have depression (depression Yes/No) based, for instance, on place of residence (Urban/Rural) … SIMPLE LOGISTIC REGRESSION

 On a graph, we can plot the result of the depression assessment (Y/N) on the Y axis and the reported place of residence (U/R) on the X axis
 From the graph, we can infer whether depression is more likely to be present among urban or rural persons
Logistic Regression
Assumptions
 Number of cases: when doing regression, the cases-to-Independent-Variables (IVs) ratio should ideally be 20:1, that is, 20 cases for every IV in the model. The ratio should be no lower than 5:1 (i.e., 5 cases for every IV in the model)
 Normality: logistic regression does not require the data to be normally distributed (it is a non-parametric test)
 Linearity: logistic regression does not require a linear relationship between the dependent and independent variables
Logistic Regression
Assumptions
 Absence of Multicollinearity: multicollinearity exists when the independent variables are highly correlated (r = .9 and above)
 Absence of Singularity: singularity occurs when one independent variable is actually a combination of other independent variables (e.g., when both the subscale scores and the total score of a scale are included)
 Outliers: logistic regression is sensitive to outliers (very high or very low values on a particular item). Outliers can severely affect the regression model and, eventually, the forecasted values
THANK
YOU!
