
MKTG 3201: Marketing Research

Hypothesis Testing
Overview

 Descriptive Vs. Inferential Statistics

 Hypothesis Testing Process

 Inferential Analysis Tests


Descriptive Vs. Inferential Statistics

 Descriptive Statistics
 Involves measures of central tendency and dispersion, one-way tables
 Helps summarize the general nature of the study variables
 Inferential Statistics
 Data analysis aimed at testing specific hypotheses
 Helps infer conclusions to the study population (with a high level of confidence)
 In sampling we can't be 100% confident because of sampling error. (When can we be 100%? Only with a census.)
 80% vs. 99% confidence: depends on the risk involved (airplane vs. chair), tolerance, and the maximum allowable error (typically 5%)
 In statistics we run tests to judge how confident we are that a decision is correct
Parameter Vs. Statistic

 Parameter
 A number that summarizes/describes data for an entire population
 The actual, or true, population mean value or population proportion or standard deviation for any variable (income, product ownership, etc.)
 Ex: the population mean: µ
 Can I ever know it? Only through a census.

 Statistic
 An estimate of a parameter computed from sample data
 Ex: the sample mean: x̄
Sampling Errors

 The difference between a statistic value generated through a sampling procedure and the parameter value, which can be determined only through a census study
 We can't be 100% confident unless we run a census; but then we wouldn't need sampling and statistics at all
 We try to get a sample that is as representative as possible, so that the difference between population & sample is minimal and the inference is as accurate as possible (increase the confidence & minimize the risk)
 Minimizing Sampling Errors
 Increase the sample size
 Make the sample as representative of the population as possible
 Use a statistically efficient sampling plan: probability sampling, or good judgement (quotas on the important variables)
Sampling Distribution

 The distribution of the sample statistic values obtained from every conceivable sample of a certain size chosen from a population using a specified sampling procedure, along with the relative frequency of occurrence of those statistic values
Example: Samples’ Means

 Mean Income (in thousands) of 45 samples

Mean   #     Mean   #
 75    1     300    4
100    1     325    4
125    2     350    3
150    2     375    3
175    3     400    2
200    3     425    2
225    4     450    1
250    4     475    1
275    5
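As a rough illustration (not part of the original slides), a sampling distribution like the one above can be simulated in Python; the population parameters and sample sizes below are made-up assumptions:

```python
import numpy as np

# A sketch: simulate a sampling distribution by drawing many samples
# from a hypothetical population of household incomes ($K).
rng = np.random.default_rng(42)
population = rng.normal(loc=275, scale=90, size=100_000)

# 45 samples of size 50, keeping each sample's mean
sample_means = [rng.choice(population, size=50).mean() for _ in range(45)]

print(f"mean of the 45 sample means: {np.mean(sample_means):.1f}")
print(f"spread (std.) of the sample means: {np.std(sample_means):.1f}")
```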
Samples’ Mean Distribution

[Histogram: relative frequency (1/45 to 6/45) of the 45 sample mean values, from $75K to $475K; the distribution peaks at $275K]


Sampling Distribution

 Not the distribution of your sample data, but the distribution of the means of all possible samples for 1 variable

[Figure: frequency of occurrence of sample mean values ($100.0 to $500.0) forming a normal probability distribution, centered on the population mean value]

Normal Distribution

 Not the distribution of your sample data, but the means of all possible samples for 1 variable (probability distribution)
 We will just be using it to judge the confidence

[Figure: normal curve with 2.5% areas in each tail, about 2 SD on either side of the mean; critical values x̄critical 1 and x̄critical 2 flank µ]
Null & Alternative Hypotheses

 Hypotheses always refer to the study population

 H0 & Ha should be both Mutually Exclusive & Collectively Exhaustive
 H0 : Null Hypothesis:
 What we would like to reject
 WE CAN NEVER ACCEPT IT!
 Ha : Alternative Hypothesis:
 What we want to prove (with a high level of confidence: can never be 100%)
 Else: we fail to prove it with confidence (not enough evidence to prove it)
 We can never be 100% confident in proving Ha ; Ha never contains the equality
Significance Level

 The researcher decides the maximum acceptable error (α). Traditionally α=0.05 (95% confidence) (a management judgement)
 Any statistical test generates a p-value: the probability of incorrectly rejecting the null hypothesis (H0) and supporting the alternative hypothesis (Ha) – Type I Error
 If p-value < α, we can reject H0 and accept Ha. It means we are confident enough to conclude & infer to the study population.
 If p-value > α, we can neither reject H0 nor accept Ha. It means we are not confident enough to conclude & infer to the study population (no conclusion)
 BEWARE! WE CANNOT ACCEPT H0 & REJECT Ha

 (Males & Females income example)

One-Tailed Vs. Two-Tailed Tests

 One-Tailed Hypothesis Test
 Values of the test statistic leading to rejecting H0 fall in one tail of the sampling distribution curve.
 The test in this case is called “directional”.
 Based on prior information or strong qualitative evidence
 Example: H0: µ ≤ 3 & Ha: µ > 3
 Two-Tailed Hypothesis Test
 Values of the test statistic leading to rejecting H0 fall in both tails of the sampling distribution curve.
 The test in this case is called “non-directional”.
 Example: H0: µ = 3 & Ha: µ ≠ 3
 Which is better in mean calculations? i.e. which gives a higher chance to prove Ha?
Two-Tailed Hypothesis Test

[Figure: normal curve centered on µ=3 with two rejection areas (α/2 = 0.025 each) beyond x̄critical 1 and x̄critical 2, about 2 SD from the mean]
One-Tailed Hypothesis Test

[Figure: normal curve centered on µ=3 with a single rejection area (α = 0.05) beyond x̄critical, 1.645 SD above the mean]
Hypothesis Testing Process

1. Set up the hypotheses
2. Decide on the significance level (e.g. α=0.05)
3. Decide on the test statistic to use (z, t, …)
4. Run the test on a statistical software (e.g. SPSS)
5. Interpret the results:
 Compare the p-value from the software output to α (the p-value represents the actual Type I error, while α is the maximum acceptable Type I error).
 If the p-value < α, then you reject H0 & accept Ha
 If the p-value > α, then you fail to reject H0 & fail to accept Ha

 Males & Females income example


Example

 A product manager of a line of apparel wonders whether to introduce the product line into a new market area.

 A random sample of 400 households in the new market area indicated that the average income per household is $30,000 with a standard deviation of $8,000.

 It is believed that the product line will be successful if the average income per household is >$29,000.

 Should the new product line be introduced?

Hypotheses

 H0: µ ≤ 29,000
 The average income per household is ≤ $29,000
 This is what you would like to reject
 Ha: µ > 29,000
 The average income per household is > $29,000
 This is what you would like to prove
 Maximum acceptable error (α) = 5% (0.05)
 One of two conclusions could be drawn:
 Reject H0 & accept Ha
 Fail to reject H0 & fail to accept Ha
 NEVER ACCEPT H0 & REJECT Ha
One-Tailed Hypothesis Test

[Figure: normal curve centered on µ=29,000 with the rejection area (α=0.05) beyond x̄critical, 1.645 SD above the mean]
Z-Test to Test the Hypotheses

z = (x̄ - µ)/sx̄   where   sx̄ = S/√n

z = test statistic (to be compared to zcritical): the number of standard errors the sample mean lies from the hypothesized mean
x̄ = sample mean
µ = population mean (hypothesized value)
sx̄ = standard error of the sample mean (SD of the sampling distribution)
S = standard deviation of the sample
n = sample size
 If z > zcritical then we can reject H0
 Compare z to the critical value (1.96 (≈2) if 2-tailed, or 1.645 if 1-tailed); if it is bigger, then we reached the hashed (rejection) area, the p-value is less than 5%, and we are confident enough (more than 95%) that Ha is correct: we have proved Ha with enough confidence
 We hope that (x̄ - µ) is as large as possible, S as small as possible, and n large
Z-Test to Test the Hypotheses

z = (x̄ - µ)/sx̄   where   sx̄ = S/√n

x̄ = $30,000 ; n = 400 ; S = $8,000

zcritical (for α=0.05, 1-tailed) = 1.645 (the value depends on whether the test is 1- or 2-tailed)
sx̄ = $8,000/√400 = $400
z = ($30,000 - $29,000)/$400 = 2.5 > zcritical
Reject H0 & Accept Ha

 We can be confident enough to introduce the new product line based on the mean household income information available
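A minimal sketch of the same z-test in Python (not part of the original slides); the numbers come from the example above, and scipy's normal distribution supplies the p-value:

```python
from math import sqrt
from scipy.stats import norm

x_bar, mu0, s, n = 30_000, 29_000, 8_000, 400   # values from the slide example
se = s / sqrt(n)                                # standard error = 400
z = (x_bar - mu0) / se                          # z = 2.5
p_value = norm.sf(z)                            # one-tailed p ≈ 0.0062

alpha = 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0 & accept Ha: introduce the product line")
else:
    print("Fail to reject H0: no conclusion")
```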
Do We Need All These Calculations?

NO! (but maybe in the exam!)

All you need to do is:
 Set up the hypotheses
 Decide on the significance level (e.g. α=0.05)
 Decide on the test statistic to use (z, t, …)
 Run the test on a statistical software (e.g. SPSS)
 Interpret the results:
 Compare the p-value from the software output to α (the p-value represents the actual Type I error, while α is the maximum acceptable Type I error).
 If the p-value < α, then you can reject H0 & accept Ha
 If the p-value > α, then you fail to reject H0

NEVER EVER: ACCEPT the NULL and REJECT the ALT.
Inferential Analysis Tests

1. One Sample T-Test
2. Independent Samples T-Test
3. Analysis of Variance (ANOVA)
4. Paired Samples T-Test
5. Chi Square Contingency Test (Cross-Tabulation)
6. Spearman Correlation
7. Pearson Correlation
8. Simple Linear Regression
9. Multiple Regression

 When to select each (based on your research questions), the hypothesis for each, how to run it, and how to interpret the data
1. One Sample T-Test

 To determine whether an unknown population mean is different from a specific value.
 1 metric variable (interval or ratio); previous example: H0: µ ≤ 29 & Ha: µ > 29
 Compares the mean of your sample to a number
 You need to have a “magical” benchmark number
 The hypotheses will always compare µ to a number. Where to put the equality? Always in H0.
 Can be 1- or 2-tailed
 Use a z-test (normal distribution) only if n>30
 The most appropriate test is the t-test, which utilizes a t-distribution (instead of a normal distribution). (SPSS uses t)
 t = (x̄-µ)/sx̄ but for 1-tail: tcritical ≈ 1.71 instead of 1.645, because it follows a t-distribution (a minimal difference; the exact value depends on the degrees of freedom)
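As a hedged illustration (not from the slides), scipy can run the one-sample t-test directly; the data and test value below are made up:

```python
from scipy.stats import ttest_1samp

# hypothetical sample and test value (the "magical number")
bottles_per_week = [4, 12, 7, 15, 2, 9, 11, 6, 18, 3, 8, 10]
mu0 = 6

# alternative='greater' makes this a one-tailed test of Ha: µ > 6
t_stat, p_value = ttest_1samp(bottles_per_week, popmean=mu0, alternative='greater')
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.3f}")
```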
One-Tailed Hypothesis Test

 What if the test statistic falls on the other side of the graph? (Always check the direction before starting a 1-tailed test)
 What if it falls in the non-rejection area?

[Figure: normal curve centered on µ with the rejection area (α=0.05) beyond x̄critical, 1.71 SD above the mean]
One-Sample T-Test: SPSS

 Use SPSS File

Analyze
Compare Means
One-Sample T-Test
Choose the variable
Select the test value (µ)
One-Sample T-Test: Example 1
 Start by checking the direction!
t = (x̄ - µ)/sx̄   where   sx̄ = S/√n
x̄ - µ = the “Mean Difference” in the output
sx̄ = the “Std. Error Mean” in the output
It is 1-tailed: tcritical = 1.71
One-Sample T-Test: SPSS

One-Sample Statistics
                  N    Mean     Std. Deviation   Std. Error Mean
Bottles per week  17   9.1176   6.63214          1.60853

t = (x̄ - µ)/sx̄ ; sx̄ = S/√n = 1.60853
2. Independent Samples T-Test

 Compares the means of two independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different.
 2 variables: One Metric Variable (whose mean will be calculated) and One Non-Metric Variable with 2 Groups Only (used to divide the groups)

 Remember the Treatment Effects in Experimental Designs? In other words, how do we get O2-O1?

EG: R X O1
2. Independent Samples T-Test

 2-sample T-Test
 Test of Two Means
 2 groups (can be 2 samples or one sample divided into 2); we analyze the difference between the 2 groups on a metric variable
 Null hypothesis: The means for the two populations are equal. Alternative hypothesis: The means for the two populations are not equal.

[Diagram: two samples (City 1 and City 2) measured at the same time]
2. Independent Samples T-Test

 Analyze the difference between males and females in terms of the scores: comparing 2 groups on a metric variable
 Instead of getting one mean and comparing it to a magical number, we divide into 2 groups, get the mean of each, and compare them to each other.
 The income and discrimination example

[Diagram: Mid-term Exam Scores for Males vs. Mid-term Exam Scores for Females]
2. Independent Samples T-Test

Two-Tailed:
H0: µ1 = µ2 or µ1 - µ2 = 0
Ha: µ1 ≠ µ2 or µ1 - µ2 ≠ 0
One-Tailed:
H0: µ1 ≤ µ2 or µ1 - µ2 ≤ 0
Ha: µ1 > µ2 or µ1 - µ2 > 0
 It compares the averages of two samples and tells you whether they are statistically different from each other.
 Compares the difference in population means to zero.
 Use a z-test (normal distribution) only if n>30
 The most appropriate test is the Independent Samples T-Test, which utilizes a t-distribution.
2. Independent Samples T-Test

 Tests the difference between two means µ1 - µ2
 Independent means (averages)
 Each respondent has been measured only once
 Dependent variable is metric
 We calculate a t-score
 We compare it to a critical t-score; if it exceeds it, then µ1 is not equal to µ2 and hence there is a statistically significant difference
2. Independent Samples T-Test

Formula: the difference between the means divided by the pooled standard error of the mean

t = (x̄1 - x̄2) / sx̄1-x̄2

 The t-score is a ratio of the difference between the two groups to the difference within the groups. The larger the t-score, the more difference there is between the groups. The smaller the t-score, the more similarity there is between the groups.
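A hedged Python sketch of the same idea with hypothetical data; scipy provides both Levene's test and the independent-samples t-test:

```python
from scipy.stats import ttest_ind, levene

males   = [12, 7, 9, 14, 6, 11, 8, 10]   # hypothetical scores, group 1
females = [9, 13, 8, 12, 10, 15, 7, 11]  # hypothetical scores, group 2

# Levene's test first: decides whether to assume equal variances,
# mirroring the "Sig." column in SPSS's Independent Samples Test output
_, levene_p = levene(males, females)
t_stat, p_value = ttest_ind(males, females, equal_var=(levene_p >= 0.05))
print(f"t = {t_stat:.3f}, two-tailed p = {p_value:.3f}")
```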
Independent Samples T-Test: SPSS

Analyze
Compare Means
Independent-Samples T-Test
Choose the test variable: bottles per week
Choose the grouping variable: Gender
Define the two groups: 1 & 2

Independent Samples Test
(Levene's Test for Equality of Variances: if Sig. < 0.05, go down and read “Equal variances not assumed”; if higher, go up and read “Equal variances assumed”)

                              Levene's Test         t-test for Equality of Means
                              F      Sig.    t     df      Sig. (2-tailed)  Mean Diff.  Std. Error Diff.  95% CI Lower  95% CI Upper
Equal variances assumed       .542   .473    .085  15      .934             .2857       3.37474           -6.90737      7.47879
Equal variances not assumed                  .082  11.777  .936             .2857       3.46763           -7.28547      7.85690
3. Analysis of Variance (ANOVA)

 Compares more than 2 means
 H0: µ1 = µ2 = µ3 = … = µn
 Ha: NOT (µ1 = µ2 = µ3 = … = µn)
 Scheffe Post Hoc Test (to find which pairs of means differ)

SPSS Exercise 1
 Age & Health
SPSS Exercise 2
 Age & Cig Consumption before
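A hedged sketch with made-up data; scipy provides the one-way ANOVA F-test (the Scheffe post hoc test is not included in scipy, so only the overall test is shown):

```python
from scipy.stats import f_oneway

# hypothetical consumption for three age groups
young  = [10, 12, 9, 14, 11]
middle = [8, 7, 9, 6, 10]
older  = [5, 6, 4, 7, 5]

f_stat, p_value = f_oneway(young, middle, older)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# p < 0.05 → reject H0: at least one group mean differs
```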
4. Paired Samples T-Test

 The need to check for significant differences between two mean values when the samples are not independent
 Tests the mean difference (µ1 - µ2): µd
 Dependent (paired) means (averages)
 Each respondent has been measured twice
 Using a metric scale
 We calculate a t-score
 We compare it to a critical t-score; if it exceeds it, then (µ1 - µ2) is not equal to zero, and hence there is a statistically significant difference.
4. Paired Samples T-Test

[Diagram: the same sample tested at two points in time — Sample 1 Before (average score before) vs. Sample 1 After (average score after)]
4. Paired Samples T-Test: Example 1

[Diagram: the same patients measured before a treatment and after the treatment — average score before vs. average score after]
4. Paired Samples T-Test: Example 2

[Diagram: the same consumers’ perceptions of both brands — Coca Cola’s evaluation vs. Pepsico’s evaluation]
4. Paired Samples T-Test

H0: µd = 0 (or ≤ or ≥)
Ha: µd ≠ 0 (or > or <)
 Null hypothesis: The mean of the paired differences equals zero in the population. Alternative hypothesis: The mean of the paired differences does not equal zero in the population.
 It involves two measures taken on the same sample.
 Compares the mean of the differences in the population to zero.
 Use a z-test (normal distribution) only if n>30
 The most appropriate test is the Paired Samples T-Test, which utilizes a t-distribution.
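A hedged Python sketch with hypothetical before/after data; scipy's paired t-test preserves the pairing of each respondent's two measurements:

```python
from scipy.stats import ttest_rel

# hypothetical bottles per week for the same respondents, before and after
before = [9, 12, 6, 15, 8, 11, 7, 10]
after  = [10, 12, 7, 16, 8, 12, 7, 11]

t_stat, p_value = ttest_rel(before, after)
print(f"t = {t_stat:.3f}, two-tailed p = {p_value:.3f}")
```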
Paired Samples T-Test: Hypotheses
Paired Samples T-Test: SPSS

Analyze
Compare Means
Paired-Samples T-Test
Choose the two variables

Paired Samples Correlations
                                     N    Correlation  Sig.
Pair 1  Bottles per week &
        Bottles per week after       17   .992         .000

Paired Samples Test (the Mean is the x̄ of the differences)
                                 Paired Differences
                                 Mean    Std. Deviation  Std. Error Mean  95% CI Lower  95% CI Upper  t       df   Sig. (2-tailed)
Pair 1  Bottles per week -
        Bottles per week after   -.3529  .86177          .20901           -.7960        .0901         -1.689  16   .111
SPSS Example: Cig Consumption
5. Cross-Tabulations: Chi-square Test

 A technique used for determining whether there is a statistically significant relationship between two categorical (nominal or ordinal) variables
 While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously.
 Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values
5. Cross-Tabulations: Chi-square Test

Example:
 A marketing manager of a telecommunications company is
reviewing the results of a study of potential users of a cell
phone.
 A random sample of 200 respondents has been drawn.
 A cross-tabulation of data on whether target consumers would
buy the phone (Yes or No) and whether the cell phone had
access to the Internet (Yes or No)
 Question: Can the marketing manager infer that an
association exists between Internet access and buying the
cell phone?
Two-Way Tabulation: Example 1

Internet     Would Buy a Cellular Phone
Access       Yes          No           Total
Yes          80 (80%)     20 (20%)     100
No           20 (20%)     80 (80%)     100
Total        100 (100%)   100 (100%)   200

 If there were no association at all, what would the numbers be? And if the association were perfect (100%)?
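A hedged sketch of the same cross-tab in Python; scipy's chi-square test of independence runs on the 2×2 table above:

```python
from scipy.stats import chi2_contingency

# rows: Internet access (Yes, No); columns: Would buy (Yes, No)
table = [[80, 20],
         [20, 80]]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p_value:.2g}")
# under independence, the expected count is 50 in every cell here,
# so the large chi-square signals a strong association
```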
Chi Square Test Hypotheses

 H0: There is no association between Internet access and buying the cell phone (the two variables are independent of each other).

 Ha: There is some association between Internet access and buying the cell phone (the two variables are not independent of each other).
Example 2

 An electronics company would like to test the relationship between owning a PDA and income levels.

 H0: There is no association between owning a PDA and income levels (the two variables are independent of each other).

 Ha: There is some association between owning a PDA and income levels (the two variables are not independent of each other).
Cross Tabulations: SPSS

Analyze
Descriptive Statistics
Crosstabs
Choose the two variables (row & column)

How to Create Crosstabs in SPSS (Analyze > Descriptive Statistics > Crosstabs)
Step 1: Choose the type of analysis
Step 2: Select the Independent Variable in rows and the Dependent Variable in columns
Step 3: Click on “Cells” to compute percentages

SPSS Exercise
 Column: Gender, Row: Income; % row
6. Spearman Correlation Coefficient

 Measures the association between two ordinal variables.
 H0: ρs = 0 & Ha: ρs ≠ 0
 ρ (rho) is the coefficient of correlation; if it is zero, the slope is zero (a horizontal line)
 Can be one-tailed if you know the direction

 Example: You would like to know whether there is a correlation between the number of bottles consumed and school class (two ordinal variables)
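A hedged Python sketch with hypothetical ordinal data; scipy's Spearman correlation returns both rho and its p-value:

```python
from scipy.stats import spearmanr

# hypothetical ordinal data: consumption category and school class (ranks)
bottles_category = [1, 2, 2, 3, 3, 4, 4, 5]
school_class     = [1, 1, 2, 2, 3, 3, 4, 4]

rho, p_value = spearmanr(bottles_category, school_class)
print(f"rho = {rho:.3f}, two-tailed p = {p_value:.3f}")
# halve the p-value for a one-tailed test in the expected direction
```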
Spearman Correlation: SPSS

Analyze
Correlate
Bivariate
Choose “Spearman”

 A test that fails 2-tailed may still be significant 1-tailed
SPSS
 Which variables? Age & income

Analyze
Correlate
Bivariate
Choose “Spearman”
 0.712 is the correlation coefficient; compare it to zero, and note the + sign
 It ranges from 0 to 1 or 0 to -1 (0 to 100%); here 71%
 Very high correlation, so very high significance
 If negative: the interpretation depends on whether the test is 2-tailed or 1-tailed (in the expected direction or the opposite)
Inferential Analysis Tests

1. One Sample T-Test
2. Independent Samples T-Test
3. Analysis of Variance (ANOVA)
4. Paired Samples T-Test
5. Chi Square Contingency Test (Cross-Tabulation)
6. Spearman Correlation
7. Pearson Correlation
8. Simple Linear Regression
9. Multiple Regression

 When to select each, how to run it, and how to interpret the data
7. Pearson Correlation Coefficient

 This technique is appropriate when the degree of association between two metric (interval or ratio) variables is to be examined
 H0: ρp = 0 & Ha: ρp ≠ 0
 Can also be one-tailed
 Interpretation is the same for us, but the equations are completely different, so you must choose the correct option in SPSS

 Example: Is there a significant relationship between customers' age (measured in actual years) and their perceptions of our company's image (measured on a scale of 1 to 7)?
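A hedged Python sketch with hypothetical metric data; scipy's Pearson correlation mirrors the SPSS output:

```python
from scipy.stats import pearsonr

# hypothetical metric data: age (years) and image perception (1-7 scale)
age        = [22, 35, 41, 28, 55, 47, 33, 60]
perception = [3, 4, 5, 3, 6, 5, 4, 7]

r, p_value = pearsonr(age, perception)
print(f"r = {r:.3f}, two-tailed p = {p_value:.3f}")
```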
Example: Sales and Advertising Data

[Scatter plot: Dollar Sales of Bright (Thousands, 0-30) vs. Advertising Expenditures for Bright ($400-$2,000)]

 What is the relationship between dollar sales and advertising expenditure? + or - or none? (The more, the more)
Sales & Number of Competing Brands

[Scatter plot: Dollar Sales of Bright (Thousands, 0-30) vs. Number of Competing Detergents (0-16)]

 What is the relationship between dollar sales and the number of competing detergents? (The more, the less)
Pearson Correlation Coefficients

 Correlation between sales and advertising is +0.93.
 Correlation between sales and number of competing brands is -0.91.

 Best fit: if all the points fell exactly on the line, the correlation would be 1 or -1 (100%); the more scattered the points, the weaker the correlation
Types of Correlations

[Diagram: three scatter plots of Y vs. X — positive correlation, negative correlation, no correlation]
Pearson Correlation: SPSS
Step 1: Press Analyze > Correlate > Bivariate
Step 2: Select the metric variables & choose “Pearson”
Step 3: Analyze the output: read the correlation coefficient and its significance (divide the p-value by 2 if 1-tailed)

SPSS Exercise: Cig Consumption before and after
Correlation versus Causality
Example: Sales and Advertising Data

[Scatter plot repeated: Dollar Sales of Bright (Thousands, 0-30) vs. Advertising Expenditures for Bright ($400-$2,000)]

 What is the relationship between dollar sales and advertising expenditure? + or - or none? (The more, the more)
 Correlation alone does not establish causality
8. Simple Linear Regression

 A Correlation tells you if there is an association between X and Y, but it doesn't describe the relationship or allow you to predict one variable from the other.

 A Regression generates a mathematical relationship (the regression equation) between one variable designated as the dependent variable (Y) & another designated as the independent variable (X).
 Should be based on strong qualitative grounds; you need strong justification in your project
 2 metric variables, but you need to specify which is which:
 Independent Variable (X) is an explanatory or predictor variable, which is often presumed to be a cause of the other
 Dependent Variable (Y) is the “Criterion Variable” that is influenced by the independent Variable
Regression

[Diagram: X (Independent Variable) → Y (Dependent Variable), with extraneous variables (z) also affecting Y]
Deriving a Regression Equation

Y = a + bX

 Y is the measure of the dependent variable
 X is the measure of the independent variable
 a is the Intercept: the predicted value of Y when X = 0
 b is the Slope: represents the change in the predicted value of Y per one-unit change in X. We hypothesize on this: we want to prove there is a slope. What if the line is horizontal? No relation.
 b (sample) vs. β (population):
H0: β = 0
Ha: β ≠ 0 (X affects Y)
(can be > or < if you know whether X affects Y positively or negatively)
Best-fitting Straight Line

 Aim is to fit a straight line, ŷ = a + bx, to the data that gives the best prediction of y for any value of x
 This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals (minimize the summation of the squared errors)

ŷ = a + bx   (a = intercept, b = slope)

[Diagram: scatter of true values yi around the fitted line; ŷ = predicted value, ε = residual error]
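A hedged Python sketch fitting the best-fitting straight line with scipy; the advertising/sales numbers below are made up to resemble the earlier example:

```python
from scipy.stats import linregress

# hypothetical advertising ($) and sales (thousands) data
x = [400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]
y = [5, 8, 10, 13, 15, 19, 21, 24, 27]

result = linregress(x, y)
print(f"ŷ = {result.intercept:.3f} + {result.slope:.5f} x")
print(f"p-value for H0: β = 0 → {result.pvalue:.2g}")
print(f"R² = {result.rvalue**2:.3f}")  # coefficient of determination
```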
Simple Regression: SPSS
Step 1: Press Analyze > Regression > Linear
Step 2: Select the Dependent & Independent Variables, then press OK
Step 3: Analyze the output:
 If p < 0.05 then X affects Y with a high level of confidence
 R²: % of variance in the dependent variable explained by the independent variable
 Unstandardized coefficients: used to develop the regression equation
 Standardized Beta coefficient: used for comparative purposes
 Sig.: significance of the Beta coefficient
Coefficient of Determination (R²)

 The hypothesis test doesn't answer how far or how dispersed the data are around the regression line. It just says that we are confident enough that this straight line has a slope and is not a horizontal line (we also got the equation and can predict Y from X).
 A global measure of how much better predictions of the dependent variable made with the aid of the regression equation are than those made without it (i.e. how good the straight line is, how minimal the errors are).
 Ranges from 0 to 1 (0 to 100%). If the fit is perfect, R² = 1; if the data are very scattered, it decreases toward zero. It does not affect the significance, but it shows that the predictability is low.
 The closer R² is to 1.0, the better the predictions of the regression equation. It measures the level of predictability of the model.
 R² = Variance Explained by the Regression Equation / Total Variance
Step 3: Analyze Output
 If p < 0.05 then X affects Y with a high level of confidence
 R² = % of variance in the dependent variable explained by the independent variable. Here the level of predictability is 23%: X explains 23% of the variability in Y. X affects Y but does not fully predict it; there are other (extraneous) variables.
 The ANOVA table shows whether the model is significant or not

Y = 0.371 + 0.806 X   (the Beta coefficient is b in the sample)

 Sig.: significance of the Beta coefficient
Our Objectives in Regression:

1) Test the effect of X on Y: to test the hypothesis of whether there is a relationship (X affects Y, a causal relationship) or not, through p (sig), and develop the equation to predict Y
2) Use X to predict Y: through the regression equation
3) Test the level of predictability (assess how good the fit of the straight line is, i.e. the accuracy of the model in predicting Y): through the coefficient of determination R²
 Types of questions:
 p (significant or not)
 Regression equation
 + or -
 Predict Y for a specific value of X
Applications of Regression Equations

 The regression coefficient, or slope, can indicate how sensitive the dependent variable is to changes in the independent variable

 The regression equation is a forecasting tool for predicting the value of the dependent variable for a given value of the independent variable

Analyze
Regression
Linear
Choose dependent & independent variables
Using Regression in Prediction

Sales = 644.5 - 42.581 (Price)

If Price = 1, Sales = ??

 Predictability: 86% (the R² is very good)
 Demand curve (best-fitting straight line)
 Regression benefits:
 Prove that X affects Y with a high level of confidence & get the equation to use for prediction
 Judge the predictability
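A small sketch answering the prompt above in code, simply plugging Price = 1 into the slide's fitted equation:

```python
def predict_sales(price: float) -> float:
    """Slide's fitted demand equation: Sales = 644.5 - 42.581 * Price."""
    return 644.5 - 42.581 * price

print(predict_sales(1))  # 601.919
```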
9. Multiple Regression

 Generates a mathematical relationship (the regression equation) between one variable designated as the dependent variable (Y) & two or more designated independent variables (Xs) (some of the independent variables are control variables).
 The effect of any independent variable on the dependent variable is measured after controlling for the other independent variables in the model
The Multiple Regression Equation

Y = a + b1X1 + b2X2 + b3X3 + … + bkXk
 a is the predicted value of Y when all Xs = 0
 bk represents the change in the predicted value of Y per one-unit change in Xk
 The number of hypotheses equals the number of independent variables (one p-value and conclusion each; the number of bs):
H0: β1 = 0 & Ha: β1 ≠ 0
H0: β2 = 0 & Ha: β2 ≠ 0
H0: β3 = 0 & Ha: β3 ≠ 0
H0: βk = 0 & Ha: βk ≠ 0
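A hedged Python sketch of a multiple regression with two hypothetical predictors; statsmodels' OLS reports one p-value per coefficient, as in the SPSS Coefficients table:

```python
import statsmodels.api as sm

# hypothetical data: Y with two independent variables X1, X2
X1 = [1, 2, 3, 4, 5, 6, 7, 8]
X2 = [2, 1, 4, 3, 6, 5, 8, 7]
Y  = [3, 4, 7, 9, 11, 12, 14, 16]

X = sm.add_constant(list(zip(X1, X2)))  # adds the intercept term a
model = sm.OLS(Y, X).fit()
print(model.params)    # a, b1, b2
print(model.pvalues)   # one hypothesis test per coefficient
print(model.rsquared)  # R²
```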
Multiple Regression: SPSS

Analyze
Regression
Linear
Choose dependent & independent variables

Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .992a  .985      .983               .89859
a. Predictors: (Constant), Bottles per week, Coke taste

ANOVAb
Model         Sum of Squares  df  Mean Square  F        Sig.
1 Regression  740.931         2   370.465      458.798  .000a
  Residual    11.305          14  .807
  Total       752.235         16
a. Predictors: (Constant), Bottles per week, Coke taste
b. Dependent Variable: Bottles per week after

Coefficientsa
                    Unstandardized Coefficients   Standardized Coefficients
Model               B          Std. Error         Beta      t       Sig.
1 (Constant)        .289       .616                         .469    .646
  Coke taste        -3.27E-02  .092               -.012     -.356   .727
  Bottles per week  1.027      .034               .993      30.204  .000
a. Dependent Variable: Bottles per week after
Y = a + b1X1 + b2X2
 2 hypotheses
 2 p-values (ignore the constant's)
 We failed to prove with confidence that X1 (coke taste) significantly affects Y (bottles per week after) (p=0.727), after controlling for X2
 We proved that X2 (bottles per week) significantly affects the dependent variable (bottles per week after), after controlling for X1, coke taste (else deduct marks!)
 Regression:
 Far more accurate
 Allows us to analyze the effect of each independent variable on the dependent variable after controlling for all other independent variables included in the model, hence the unique effect of each
 Increases the predictability: the more variables, the higher the R² and the more accurate the predictability (as long as they affect Y)
Summary

Types of Variables                                          Statistical Technique
One Metric (Continuous) Variable                            One-Sample T-Test (or Z-Test when n>30)
One Metric Variable and One Non-Metric Variable             Independent-Samples T-Test
with 2 Groups Only                                          (or Z-Test when n>30)
One Metric Variable and One Non-Metric Variable             Analysis of Variance (ANOVA) + Scheffe
with >2 Groups                                              Post Hoc Test
Two Metric Variables (Compare Means)                        Paired-Samples T-Test
Two Non-Metric Variables                                    Chi-Square Test or Spearman Correlation
Two Metric Variables                                        Pearson Correlation or Simple Linear Regression
>2 Metric Variables (One Dependent & Others Independent)    Multiple Regression
