Hoda MR S23 Hypothesis Testing Updated
Hypothesis Testing
Overview
Descriptive Statistics
Involves measures of central tendency and dispersion, one-way tables
Helps summarize the general nature of the study variables
Inferential Statistics
Data analysis aimed at testing specific hypotheses
Helps draw conclusions about the study population (with a high level of
confidence)
In sampling we can't be 100% confident because of sampling error. (When can we be 100% confident? Only with a full census.)
The confidence level chosen (80% vs. 99%) depends on the risk involved (an airplane part vs. a chair), our tolerance for error, and the maximum allowable error (commonly 5%)
In statistics we do tests to judge how confident we are that a decision is
correct
Parameter Vs. Statistic
Parameter
A number that summarizes/describes data for an entire population.
The actual, or true, population mean value or population proportion
or standard deviation for any variable (income, product ownership,
etc.)
Ex: the mean: µ
Can we ever know it exactly? (Usually not; we estimate it from a sample.)
Statistic
An estimate of a parameter from sample data
Ex: the sample mean: x̄
Sampling Errors
Frequency distribution of the possible sample means (45 samples in total):
Mean  #      Mean  #
 75   1      300   4
100   1      325   4
125   2      350   3
150   2      375   3
175   3      400   2
200   3      425   2
225   4      450   1
250   4      475   1
275   5
Samples’ Mean Distribution
[Histogram: relative frequency (out of 45) on the y-axis vs. sample mean (75–475) on the x-axis]
This is not the distribution of your sample data, but the distribution of the means of all possible samples for one variable
[Figure: the sampling distribution of the mean approximates a normal probability distribution; y-axis: frequency of occurrence; x-axis: sample mean, 100.0–500.0]
Again, not the distribution of your sample data, but the distribution of the means of all possible samples for one variable (a probability distribution)
We will use it only to judge our confidence in a decision
[Two-tailed diagram: µ at the center, with x̄critical 1 and x̄critical 2 each 2 SD away and 2.5% of the area in each tail]
Null & Alternative Hypotheses
[Two-tailed test diagram: µ = 3 at the center; rejection area 1 and rejection area 2 (α/2 = 0.025 each) lie beyond x̄critical 1 and x̄critical 2, each 2 SD from µ]
One-Tailed Hypothesis Test
[One-tailed diagram: x̄critical lies 1.645 SD from µ = 3, with 5% of the area in one tail]
Hypothesis Testing Process
[Diagram: x̄critical lies 1.645 SD from µ = 29,000]
Z-Test to Test the Hypotheses
z = (x̄ − µ) / sx,  where sx = S / √n
z = test statistic (to be compared to Zcritical)
The number of standard errors the sample mean lies from the hypothesized population mean (in our test)
x̄ = sample mean
µ = population mean
sx = standard error of the sample mean (the SD of the sampling distribution)
S = standard deviation of the sample
n = sample size
If z > Zcritical, then we can reject H0.
Compare z to the critical value (2-tailed: ≈2, i.e., 1.96; 1-tailed: 1.645). If z is bigger, it falls in the hatched rejection area: p is less than 5%, so we are confident enough (more than 95%) that Ha is correct, i.e., Ha is supported with enough confidence.
We want (x̄ − µ) to be as large as possible, S as small as possible, and n as large as possible.
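To make the arithmetic concrete, here is a minimal Python sketch of this z-test using scipy; all the input numbers are hypothetical (µ = 29,000 echoes the earlier diagram), chosen only to illustrate the computation:

    from scipy import stats

    # Hypothetical inputs: sample mean, hypothesized population mean,
    # sample standard deviation, sample size
    x_bar, mu, S, n = 31_000, 29_000, 8_000, 64

    se = S / n ** 0.5        # standard error of the sample mean: S / sqrt(n)
    z = (x_bar - mu) / se    # test statistic (2.0 here)

    z_critical = stats.norm.ppf(0.95)  # 1-tailed, alpha = 0.05: ~1.645
    p_one_tailed = stats.norm.sf(z)    # tail area beyond z

    print(f"z = {z:.2f}, z_critical = {z_critical:.3f}, p = {p_one_tailed:.4f}")
    if z > z_critical:
        print("Reject H0: more than 95% confident that Ha is correct")
    else:
        print("Fail to reject H0")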
Z-Test to Test the Hypotheses
z = (x̄ − µ) / sx,  sx = S / √n
What if z falls on the other side of the graph? (Always check the direction before starting a 1-tailed test.)
What if it falls in the non-rejection area? (Then we fail to reject H0.)
[One-tailed t-test diagram: x̄critical lies 1.71 SD from µ]
One-Sample T-Test: SPSS
Analyze
Compare Means
One-Sample T-Test
Choose the variable
Select the test value (µ)
One-Sample T-Test: Example 1
Start by checking the direction!
z = (x̄ − µ) / sx,  sx = S / √n
x̄ − µ = the mean difference
sx = the Std. Error Mean
It is 1-tailed: tcritical = 1.71
One-Sample T-Test: SPSS
One-Sample Statistics:
Bottles per week: N = 17, Mean = 9.1176, Std. Deviation = 6.63214, Std. Error Mean = 1.60853
(z = (x̄ − µ) / sx, with sx = S / √n = 1.60853)
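The t statistic can be reproduced in Python from the summary statistics above; the test value µ = 7 below is a hypothetical assumption, since the extract does not show the test value that was used:

    from scipy import stats

    # From the One-Sample Statistics output
    n, mean, sd = 17, 9.1176, 6.63214
    mu = 7  # hypothetical test value (not shown in the output above)

    se = sd / n ** 0.5       # ~1.60853, matches the Std. Error Mean column
    t = (mean - mu) / se     # test statistic
    df = n - 1

    p_one_tailed = stats.t.sf(t, df)  # compare t to the slide's tcritical = 1.71
    print(f"t = {t:.3f}, df = {df}, 1-tailed p = {p_one_tailed:.4f}")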
2. Independent Samples T-Test
EG: R X O1 (experimental group: randomized, treated, then observed)
2. Independent Samples T-Test
Also called the 2-sample t-test, or test of two means.
Two groups (can be two samples, or one sample divided into two), analyzing the difference between the two groups on a metric variable.
Null hypothesis: the means for the two populations are equal. Alternative hypothesis: the means for the two populations are not equal.
[Two samples (City 1 and City 2) measured at the same time]
2. Independent Samples T-Test
Example: mid-term exam scores for males vs. mid-term exam scores for females.
2. Independent Samples T-Test
Two-Tailed:
H0: µ1 = µ2, or µ1 − µ2 = 0
Ha: µ1 ≠ µ2, or µ1 − µ2 ≠ 0
One-Tailed:
H0: µ1 ≤ µ2, or µ1 − µ2 ≤ 0
Ha: µ1 > µ2, or µ1 − µ2 > 0
It compares the averages of the two samples and tells you whether they are
statistically different from each other.
Compare the difference in population means to zero.
Use a z-test (normal distribution) only if n>30
The most appropriate test is the Independent Sample T-Test, which
utilizes a t-distribution.
2. Independent Samples T-Test
Formula
The difference between the means divided by the pooled standard error of the difference between the means:
t = (x̄1 − x̄2) / s(x̄1 − x̄2)
The t score is a ratio of the difference between the two groups to the difference within the groups. The larger the t score, the more difference there is between the groups; the smaller the t score, the more similar the groups are.
Independent Samples T-Test: SPSS
Analyze
Compare Means
Independent-Sample T-Test
Choose the test variable: bottles per week
Choose the grouping variable: Gender
Define the two groups: 1 & 2
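As a minimal sketch, the same test can be run in Python with scipy's ttest_ind; the bottles-per-week values for the two gender groups are hypothetical, for illustration only:

    import numpy as np
    from scipy import stats

    # Hypothetical bottles-per-week data for the two groups
    males   = np.array([12, 9, 15, 7, 11, 14, 8, 10])
    females = np.array([6, 9, 5, 8, 7, 10, 4, 6])

    # Pooled-variance t-test (SPSS's "equal variances assumed" row)
    t, p_two_tailed = stats.ttest_ind(males, females, equal_var=True)
    print(f"t = {t:.3f}, 2-tailed p = {p_two_tailed:.4f}")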
H0: µ1 = µ2 = µ3 = … = µn
Paired Samples T-Test
The same sample is tested at two points in time: Sample 1 before vs. Sample 1 after.
Example: the same patients, measured before and after a treatment.
Analyze
Compare Means
Paired Sample T-Test
Choose the two variables:
(In the paired output, Mean is the x̄ of the differences.)
Paired Samples Correlations, Pair 1 (Bottles per week & Bottles per week after): N = 17, Correlation = .992, Sig. = .000
Paired Samples Test (Paired Differences), Pair 1 (Bottles per week − Bottles per week after):
Mean = −.3529, Std. Deviation = .86177, Std. Error Mean = .20901
95% Confidence Interval of the Difference: (−.7960, .0901)
t = −1.689, df = 16, Sig. (2-tailed) = .111
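A minimal Python sketch of the same paired test with scipy's ttest_rel; the before/after values below are hypothetical stand-ins, since the raw data behind this output are not shown:

    import numpy as np
    from scipy import stats

    # Hypothetical paired measurements for the same 17 respondents
    before = np.array([3, 7, 2, 12, 9, 5, 8, 14, 6, 10, 4, 11, 13, 7, 9, 15, 20])
    after  = np.array([3, 8, 2, 13, 9, 6, 8, 14, 7, 10, 5, 11, 14, 7, 10, 15, 20])

    t, p_two_tailed = stats.ttest_rel(before, after)
    print(f"t = {t:.3f}, df = {len(before) - 1}, 2-tailed p = {p_two_tailed:.4f}")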
SPSS Example: Cigarette consumption
5. Cross-Tabulations: Chi-square Test
Example:
A marketing manager of a telecommunications company is
reviewing the results of a study of potential users of a cell
phone.
A random sample of 200 respondents has been drawn.
A cross-tabulation was prepared of whether target consumers would buy the phone (Yes or No) against whether the cell phone had access to the Internet (Yes or No).
Question: Can the marketing manager infer that an
association exists between Internet access and buying the
cell phone?
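As a sketch, the same chi-square test of association can be run in Python with scipy; the cell counts below are hypothetical, since the slide gives only N = 200:

    import numpy as np
    from scipy import stats

    # Hypothetical 2x2 cross-tabulation of the 200 respondents:
    #                    Would buy: Yes   Would buy: No
    observed = np.array([[60, 40],    # Internet access: Yes
                         [30, 70]])   # Internet access: No

    chi2, p, dof, expected = stats.chi2_contingency(observed)
    print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.4f}")
    # If p < 0.05, an association exists between Internet access and buying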
Two-Way Tabulation: Example 1
Analyze
Descriptive Statistics
Crosstabs
Choose the two variables (row & column)
Cross-Tabulation in SPSS
Analyze>Descriptive Statistics>Crosstabs
Step 1: Choose the type of analysis.
How to Create Crosstabs in SPSS
Place the variables in Row and Column.
Click on "Cells" to compute percentages.
SPSS exercise: Column: Gender, Row: Income; compute row %.
6. Spearman Correlation Coefficient
Example:
You would like to know whether there is a correlation between the number of bottles consumed and school class (two ordinal variables).
Spearman Correlation: SPSS
Analyze
Correlate
Bivariate
Choose “Spearman”
(If 2-tailed it fails; if 1-tailed it is significant, since the 1-tailed p is half the 2-tailed p.)
SPSS
Which variables? Age & income
Analyze
Correlate
Bivariate
Choose “Spearman”
The correlation coefficient is 0.712: compare it to zero, and note the positive sign.
It runs from 0 to 1, or from 0 to −1 (i.e., 0 to 100%); here it is 71%.
Very high, so very highly significant.
If the sign is negative, interpretation depends on whether the test is 2-tailed or 1-tailed (with the correlation in the expected direction, or opposite to it).
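A minimal Python sketch of the Spearman test on hypothetical ordinal data (bottle consumption and school class):

    import numpy as np
    from scipy import stats

    # Hypothetical ordinal data
    bottles      = np.array([2, 5, 3, 8, 6, 9, 4, 7, 10, 5])
    school_class = np.array([1, 2, 1, 3, 2, 4, 2, 3, 4, 3])

    rho, p_two_tailed = stats.spearmanr(bottles, school_class)
    print(f"Spearman rho = {rho:.3f}, 2-tailed p = {p_two_tailed:.4f}")
    # For a 1-tailed test in the expected direction, divide p by 2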
Inferential Analysis Tests
Example:
Is there a significant relationship between customers' age (measured in actual years) and their perceptions of our company's image (measured on a scale of 1 to 7)?
Example: Sales and Advertising Data
[Two scatter plots of the sales and advertising data]
The best fit: if all points fell exactly on the line, the correlation would be 1 or −1 (100%); the more scattered the points, the weaker the correlation.
Types of Correlations
[Panels of Y-vs-X scatter plots illustrating the different types of correlation]
Select "Pearson".
Step 3: Analyze Output
Read the correlation coefficient; divide the reported Sig. by 2 if the test is 1-tailed.
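A minimal Python sketch of the Pearson correlation; the values are hypothetical (the advertising values echo the axis of the scatter plot above):

    import numpy as np
    from scipy import stats

    # Hypothetical metric data
    advertising = np.array([400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000])
    sales       = np.array([6, 8, 9, 13, 15, 17, 18, 24, 26])

    r, p_two_tailed = stats.pearsonr(advertising, sales)
    print(f"Pearson r = {r:.3f}, 2-tailed p = {p_two_tailed:.4f}")
    # Divide the reported p by 2 for a 1-tailed test in the expected direction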
SPSS exercise: cigarette consumption before and after.
Correlation versus Causality
Example: Sales and Advertising Data
[Scatter plot of the sales vs. advertising data]
[Diagram: X (independent variable) → Y (dependent variable), with Z: extraneous variables also acting on the relationship]
Deriving a Regression Equation
Y = a + bX
ŷ = the predicted value
y_i = the true (observed) value
ε = y − ŷ, the residual error
Step 1: Press Analyze-> Regression-> Linear
Step 2: Select Dependent & Independent Variables,
Then Press OK
Step 3: Analyze Output
R Square: the % of variance in the dependent variable explained by the independent variable.
Unstandardized coefficients: used to develop the regression equation.
Standardized Beta coefficient: used for comparative purposes.
Sig.: the significance of the Beta coefficient.
Coefficient of Determination (R2)
The hypothesis test didn't answer how far, or how dispersed, the data are from the regression line. It only says that we are confident enough that this straight line has a slope, i.e., it is not a horizontal line. (We also got the equation and can predict Y from X.)
A global measure of how much better predictions of the dependent variable made with the aid of the regression equation are than those made without it (i.e., how good the straight line is, how small the errors are).
R² runs from 0 to 1 (0 to 100%). If the fit is perfect it is 1; if the data are very scattered it decreases toward zero. A low R² does not affect the significance, but it shows that the predictability is low.
The closer R² is to 1.0, the better the predictions of the regression equation. It measures the level of predictability of the model.
R2 = Variance Explained by Regression Equation / Total Variance
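As a sketch, the regression line, the slope's p-value, and R² can all be obtained in one call with scipy's linregress; the data are the same hypothetical sales/advertising numbers used earlier:

    import numpy as np
    from scipy import stats

    # Hypothetical sales (Y) and advertising (X) data
    X = np.array([400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000])
    Y = np.array([6, 8, 9, 13, 15, 17, 18, 24, 26])

    result = stats.linregress(X, Y)
    print(f"Y = {result.intercept:.3f} + {result.slope:.4f} X")
    print(f"R-squared = {result.rvalue ** 2:.3f}")  # variance explained / total variance
    print(f"p-value for the slope = {result.pvalue:.4f}")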
Step 3: Analyze Output
If p < 0.05, then X affects Y with a high level of confidence.
R Square: the % of variance in the dependent variable explained by the independent variable. Here the level of predictability is 23%: X explains 23% of the variability in Y. X affects Y but does not fully predict it; there are other (extraneous) variables.
The ANOVA table shows whether the model as a whole is significant or not.
If Price = 1, Sales = ??
Y = a + b1 X1 + b2 X2 + b3 X3 + …bk Xk
a = the predicted value of Y when all Xs are 0.
bk = the change in the predicted value of Y per one-unit change in Xk.
The number of hypotheses equals the number of independent variables, i.e., the number of b's (one p-value and one conclusion each):
H0: β1 = 0 & Ha: β1 ≠ 0
H0: β2 = 0 & Ha: β2 ≠ 0
H0: β3 = 0 & Ha: β3 ≠ 0
H0: βk = 0 & Ha: βk ≠ 0
Multiple Regression: SPSS
Analyze
Regression
Linear
Choose dependent & independent variables
Model Summary (Model 1): R = .992, R Square = .985, Adjusted R Square = .983, Std. Error of the Estimate = .89859
a. Predictors: (Constant), Bottles per week, Coke taste

ANOVA (Model 1): Regression: Sum of Squares = 740.931, df = 2, Mean Square = 370.465, F = 458.798, Sig. = .000; Residual: Sum of Squares = 11.305, df = 14, Mean Square = .807; Total: Sum of Squares = 752.235, df = 16
a. Predictors: (Constant), Bottles per week, Coke taste
b. Dependent Variable: Bottles per week after
Coefficients (Dependent Variable: Bottles per week after):
(Constant): B = .289, Std. Error = .616, t = .469, Sig. = .646
Coke taste: B = −.0327, Std. Error = .092, Beta = −.012, t = −.356, Sig. = .727
Bottles per week: B = 1.027, Std. Error = .034, Beta = .993, t = 30.204, Sig. = .000
Y = a + b1X1 + b2X2
Two hypotheses, so two p-values (leave out the constant's p).
We failed to prove with confidence that X1 (Coke taste) significantly affects Y (bottles per week after), p = 0.727, after controlling for X2.
We proved that X2 (bottles per week) significantly affects the dependent variable (bottles per week after) after controlling for X1, Coke taste (always say "after controlling for…", else marks are deducted).
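A sketch of the same multiple regression in Python, assuming the statsmodels package is available; the data arrays are hypothetical stand-ins for Coke taste (X1), bottles per week (X2), and bottles per week after (Y):

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data (the real SPSS data set is not reproduced here)
    coke_taste     = np.array([5, 3, 4, 2, 5, 1, 4, 3, 2, 5, 4, 3])    # X1
    bottles_before = np.array([9, 4, 7, 2, 10, 1, 8, 5, 3, 12, 6, 4])  # X2
    bottles_after  = np.array([9, 4, 7, 3, 10, 1, 8, 5, 3, 12, 7, 4])  # Y

    # Y = a + b1*X1 + b2*X2: add_constant supplies the intercept a
    X = sm.add_constant(np.column_stack([coke_taste, bottles_before]))
    model = sm.OLS(bottles_after, X).fit()

    # The summary shows R Square, the ANOVA F test, and one p-value per b
    print(model.summary())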
Regression:
Far more accurate.
Allows us to analyze the effect of each independent variable on the dependent variable after controlling for all other independent variables included in the model, hence the unique effect of each.
Increases predictability: the more variables in the model, the higher the R² and the more accurate the prediction (as long as each added variable actually affects Y).
Summary