Hypothesis
HYPOTHESIS TESTING
LEARNING OUTCOMES
According to Claude Bernard, the research method will not give new and
productive ideas to those who do not have them; it will only help guide
the ideas of those who have them and develop those ideas so as to draw
the best possible results.
• A company makes pipes with a diameter of 4 mm. Employees claim that, since its last
service, the machine no longer makes pipes of 4 mm diameter. To test this
claim, a sample of 100 pipes was taken and tested at the 99% confidence level.
H0 : µ = 4 mm.
Ha : µ ≠ 4 mm.
• Doctors believe that the average teen sleeps no longer than 10 hours per
day. A researcher believes that teens on average sleep longer.
H0 : µ ≤ 10 hours.
Ha : µ > 10 hours.
EXERCISE
• An ice cream vendor has an average sale of Rs 500 per day. Due to
the establishment of a school in the locality, he expects ice cream
sales to increase. State the null and alternative hypotheses.
The null hypothesis, generally denoted by H0 (H sub-zero), is the hypothesis that is tested
for possible rejection under the assumption that it is true. Theoretically, a null hypothesis is set
as no difference or status quo and is considered true unless it is proved wrong by the
collected sample data.
Symbolically, a null hypothesis is represented as: H0 : µ = µ0, where µ0 is the hypothesized value of the population mean.
The alternative hypothesis, generally denoted by H1 (H sub-one), is the logical opposite of the null
hypothesis. In other words, when the null hypothesis is found to be true, the alternative hypothesis must be
false, and when the null hypothesis is found to be false, the alternative hypothesis must be true.
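Symbolically, the alternative hypothesis is represented as: H1 : µ ≠ µ0 (two-tailed), or H1 : µ > µ0 or H1 : µ < µ0 (one-tailed), where µ0 is the hypothesized value of the population mean.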
STEP 2: DETERMINE THE APPROPRIATE STATISTICAL TEST
Type, number, and the level of data may provide a platform for deciding the
statistical test.
Apart from these, the statistics used in the study (mean, proportion, variance,
etc.) must also be considered when a researcher decides on appropriate
statistical test, which can be applied for hypothesis testing in order to obtain the
best results.
STEP 3: SET THE LEVEL OF SIGNIFICANCE
The area under the normal curve is divided into two mutually exclusive
regions: the acceptance region (where the null hypothesis is accepted) and the
rejection region, or critical region (where the null hypothesis is rejected).
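The boundary between the acceptance and rejection regions can be sketched numerically. The snippet below is a minimal illustration using Python's standard library; the helper name `critical_z` is hypothetical, and the 5% level used for the one-tailed case is an assumption, since the teen-sleep example does not state a level.

```python
from statistics import NormalDist


def critical_z(alpha, two_tailed=True):
    """Return the positive critical z value for significance level alpha."""
    # A two-tailed test splits alpha equally between both tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


# Two-tailed test at the 99% confidence level (alpha = 0.01), as in the pipe example.
z_two = critical_z(0.01, two_tailed=True)

# One-tailed test at alpha = 0.05 (assumed level, for illustration only).
z_one = critical_z(0.05, two_tailed=False)

print(round(z_two, 3), round(z_one, 3))
```

For the two-tailed case the null hypothesis is rejected when the test statistic falls beyond either critical value; for the one-tailed case only one tail forms the rejection region.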
STEP 5: COLLECT THE SAMPLE DATA
In this stage of sampling, data are collected and the appropriate sample statistics
are computed.
The first four steps should be completed before collecting the data for the study.
It is not advisable to collect the data first and then decide on the stages of
hypothesis testing.
STEP 6: ANALYSE THE DATA
In this step, the researcher has to compute the test statistic. This involves
selection of an appropriate probability distribution for a particular test.
Some of the commonly used testing procedures are z, t, F, and χ².
STEP 7: ARRIVE AT A STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
• Given the following information relating to two places, A and B, test whether there is
any significant difference between their mean wages:
                   A       B
Mean wages         47      49
SD                 28      40
No. of workers     1000    1500
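One way to work this problem is a large-sample z test for the difference between two means. The sketch below assumes a two-tailed test at the 5% level (critical value 1.96), which the exercise itself does not specify; the function name `two_sample_z` is hypothetical.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for the difference between two large-sample means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Place A: mean 47, SD 28, n 1000. Place B: mean 49, SD 40, n 1500.
z = two_sample_z(47, 28, 1000, 49, 40, 1500)

critical = 1.96  # assumed: two-tailed test at the 5% level
reject_null = abs(z) > critical

print(round(z, 2), reject_null)
```

Under these assumptions the statistic stays inside the acceptance region, so the difference in mean wages would not be judged significant.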
Pair    X    Y    D = X - Y    D²
1       3    6    -3           9
2       4    7    -3           9
3       2    5    -3           9
4       5    7    -2           4
5       3    2     1           1
6       4    6    -2           4
7       6    5     1           1
8       5    8    -3           9
9       4    6    -2           4
Total   n = 9      ΣD = -16    ΣD² = 50
• Mean(D) = -16/9 = -1.78
• Standard deviation of D = 1.64
• t value = -3.25
• Null hypothesis is rejected.
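The figures above follow from the column totals alone. A minimal sketch of that arithmetic, assuming the sample standard deviation uses the n - 1 divisor (the helper name `paired_t_from_totals` is hypothetical):

```python
import math


def paired_t_from_totals(n, sum_d, sum_d2):
    """Paired t statistic from n, the sum of D and the sum of D squared."""
    mean_d = sum_d / n
    # Sample standard deviation of the differences (n - 1 divisor).
    sd_d = math.sqrt((sum_d2 - n * mean_d ** 2) / (n - 1))
    t = mean_d / (sd_d / math.sqrt(n))
    return mean_d, sd_d, t


mean_d, sd_d, t = paired_t_from_totals(n=9, sum_d=-16, sum_d2=50)
print(round(mean_d, 2), round(sd_d, 2), round(t, 2))
```

The statistic has n - 1 = 8 degrees of freedom and is compared with the tabulated t value at the chosen level of significance.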
• A shopkeeper has switched from a manual typewriter to a computer to
do his job. The number of mistakes he makes per page with each method is given
below. Is the computer helpful in reducing mistakes? Test at the 5% level of
significance. (Tabulated t value = 2.57)
Page    Mistakes before using computer    Mistakes after using computer
1       58                                53
2       29                                28
3       30                                31
4       55                                48
5       56                                50
6       45                                42
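A possible working of this exercise from the raw data, as a sketch rather than an official solution. It takes D = before - after and compares the result with the value 2.57 quoted in the exercise, which matches the two-tailed 5% table value for 5 degrees of freedom.

```python
import math
from statistics import mean, stdev

before = [58, 29, 30, 55, 56, 45]
after = [53, 28, 31, 48, 50, 42]

# Positive differences mean fewer mistakes after switching to the computer.
diffs = [b - a for b, a in zip(before, after)]

# Paired t statistic: mean difference divided by its standard error.
t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

t_table = 2.57  # tabulated value quoted in the exercise
computer_helps = t > t_table

print(round(t, 2), computer_helps)
```

Since the calculated t exceeds the tabulated value, this working rejects the null hypothesis of no change in the number of mistakes.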
QUICK QUIZ
Q1: The form of the alternative hypothesis can be:
A) one-tailed
B) two-tailed
C) neither one nor two-tailed
D) one or two-tailed
Q2: Taking a level of significance of 5% is the same as saying:
A) results in only one direction can lead to rejection of the null hypothesis
B) negative sample means lead to rejection of the null hypothesis
C) results in either of two directions can lead to rejection of the null hypothesis
D) no results lead to the rejection of the null hypothesis
LEARNING OUTCOME
Chi Applications
• Test of Independence
F Test
One Way ANOVA
CHI SQUARE TEST
• The chi-square test was developed by Karl Pearson in 1900.
• It is a non-parametric test.
• This test deals with categorical data.
• Categorical data is defined as the counting of frequencies from one or more variables.
• A categorical variable (sometimes called a nominal variable) is one that has two or more
categories, but there is no intrinsic ordering to the categories. For example, gender is a
categorical variable having two categories (male and female), and there is no intrinsic
ordering to the categories.
• E.g.: A company has a total of 40,000 officers and selected a random sample of 650
officers across four departments to assess the representativeness across departments in a
seminar. Out of the 650 randomly selected officers, 150 are from the production
department, 200 from the marketing department, 160 from the finance
department, and the remaining 140 from the human resource department.
• The research variable “representativeness from the departments” does not require any rating
scale.
• Here the research question concerns the frequency count from each department, which can be
analysed using the chi-square technique.
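As an illustration of how such frequency counts might be analysed, the sketch below runs a chi-square goodness-of-fit calculation on the department counts. It assumes, purely for illustration, that equal representation (650 / 4 officers per department) is the expected pattern and that the test is run at the 5% level; neither is stated in the example.

```python
observed = {
    "Production": 150,
    "Marketing": 200,
    "Finance": 160,
    "Human Resource": 140,
}

total = sum(observed.values())

# Assumption: every department is expected to be equally represented.
expected = total / len(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

degrees_of_freedom = len(observed) - 1
critical = 7.815  # chi-square table value for 3 degrees of freedom at 5%

print(round(chi_square, 2), chi_square > critical)
```

Under these assumptions the calculated value exceeds the table value, which would indicate that the departments are not equally represented in the sample.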
DEFINING CHI-SQUARE TEST STATISTIC
Chi-square test provides a platform that can be used to ascertain whether theoretical
probability distributions coincide with empirical sample distributions.
E.g. 1: An edible oil company may be interested in knowing whether the purchase of oil is
independent of the customer’s age or whether it is dependent on the customer’s age.
E.g. 2: The HRD manager of a company may be interested in ascertaining whether the rate of employee
turnover is independent of employee qualification.
Example 13.2
Low IQ 65 335
• χ² calculated = 269.24
• χ² tabulated = 3.841 (1 degree of freedom, 5% level)
• Reject the null hypothesis.
F TEST
• It is particularly useful when multiple samples are involved and the data have been
measured on an interval or ratio scale.
• It can be used to test the equality of variances of two normal populations, i.e., to find
whether two samples can be regarded as drawn from normal populations having the same
variance.
• It can also be used to analyse the variance of more than two independent samples.
• Two random samples have been drawn from two normal populations.
Sample 1    75  68  65  70  84  66  55
Sample 2    42  44  56  52  46
Test, using the variance ratio at the 5% significance level, whether the two populations
have the same variance.
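• F calculated = 2.37
• F tabulated = 6.16 (6 and 4 degrees of freedom)
• Accept the null hypothesis: the two populations may be regarded as having the same variance.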
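The variance ratio for this example can be reproduced with a short sketch. It uses sample variances with the n - 1 divisor and places the larger variance in the numerator; the table value 6.16 is the one quoted above.

```python
from statistics import variance

sample1 = [75, 68, 65, 70, 84, 66, 55]
sample2 = [42, 44, 56, 52, 46]

var1 = variance(sample1)  # sample variance, n - 1 divisor
var2 = variance(sample2)

# The larger variance goes in the numerator so that F >= 1.
if var1 >= var2:
    f_ratio = var1 / var2
    dof = (len(sample1) - 1, len(sample2) - 1)
else:
    f_ratio = var2 / var1
    dof = (len(sample2) - 1, len(sample1) - 1)

f_table = 6.16  # tabulated F for (6, 4) degrees of freedom at 5%
print(round(f_ratio, 2), dof, f_ratio > f_table)
```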
CORRELATION
• Correlation is concerned with identifying the association between two or more variables.
• Correlation identifies the degree of relationship between two variables and regression is
used to study the nature of relationship and develop a cause and effect relationship.
• Types of Correlation
Marks in Accountancy    48  35  17  23  47
Marks in Statistics     45  20  40  25  45
• r = 0.429
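Karl Pearson's coefficient for the marks above can be checked with a small sketch based on deviations from the means. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_dxdy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_dx2 = sum((a - mean_x) ** 2 for a in x)
    sum_dy2 = sum((b - mean_y) ** 2 for b in y)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


accountancy = [48, 35, 17, 23, 47]
statistics_marks = [45, 20, 40, 25, 45]

r = pearson_r(accountancy, statistics_marks)
print(round(r, 4))
```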
PRACTICE
• Calculate Karl Pearson’s coefficient of correlation from the following data:
X:  8  11  15  10  12  16
Y:  6   9  11   7   9  12
SPEARMAN COEFFICIENT OF CORRELATION
• When ranks are given
EXERCISE
• Two ladies were asked to rank 7 different types of lipsticks. The ranks given by them are as
follows. Calculate Spearman rank correlation.
LIPSTICKS   A  B  C  D  E  F  G
NEELU       2  1  4  3  5  7  6
NEENA       1  3  2  4  5  6  7
• R=0.786
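When ranks are given and there are no ties, the coefficient follows from the squared rank differences. A minimal sketch of that formula, 1 - 6Σd² / (n(n² - 1)), with a hypothetical helper name:

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


neelu = [2, 1, 4, 3, 5, 7, 6]
neena = [1, 3, 2, 4, 5, 6, 7]

rho = spearman_from_ranks(neelu, neena)
print(round(rho, 3))
```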
WHEN RANKS ARE NOT GIVEN
• Calculate the Spearman coefficient of correlation between the marks assigned to students by two judges
in a competitive test.
Student    1   2   3   4   5   6   7   8   9   10
Judge 1    52  53  42  60  45  41  37  38  25  27
Judge 2    65  68  43  38  77  48  35  30  25  50
• R=0.539
WHEN RANKS ARE EQUAL
EXERCISE
• Obtain the rank correlation coefficient between the variables X and Y from the
following pairs of observed values.
• X : 50 55 65 50 55 60 50 65 70 75
• Y: 110 110 115 125 140 115 130 120 115 160
• R= 0.155
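When values repeat, tied observations share the average of the ranks they occupy, and a correction of (m³ - m)/12 is added to Σd² for each group of m tied values. The sketch below follows that approach; the helper names are hypothetical, and ranking from the largest value downward is an assumption about the convention used.

```python
from collections import Counter


def average_ranks(values):
    """Rank from largest to smallest; tied values share the mean of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m equal values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values())


def spearman_with_ties(x, y):
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


x = [50, 55, 65, 50, 55, 60, 50, 65, 70, 75]
y = [110, 110, 115, 125, 140, 115, 130, 120, 115, 160]

rho = spearman_with_ties(x, y)
print(round(rho, 3))
```

With no repeated values the correction term is zero, so the same function also covers the case where ranks have to be derived from raw scores.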
REGRESSION
• Regression analysis is the process of developing a statistical model, which is used to predict
the value of a dependent variable from at least one independent variable.
• In a simple regression analysis, there are two types of variables:
– Dependent variable: The variable whose value is influenced or is to be predicted. It is also called the
regressed or explained variable.
– Independent variable: The variable which influences the value or is used for prediction. It is also
called the regressor, predictor, or explanatory variable.
Simple linear regression analysis is focused on developing a regression model by which the
value of the dependent variable can be predicted with the help of the independent
variable.
• From the following data
REGRESSION
• A random sample of eight drivers insured with a company and having
similar auto insurance policies was selected. The following table lists their
driving experience (in years) and monthly auto insurance premiums.
Predict the monthly auto insurance premium for a driver with 10 years
of driving experience.
• b = -1.5476
• a = 76.6605
• Regression equation:
• Y = 76.6605 - 1.5476X
• When X = 10,
• Y = 76.6605 - 1.5476 × 10 = 61.18
CORRELATION VS REGRESSION
X 55 60 65 70 80
Y 52 54 56 58 62
• ΣX = 330
• ΣY = 282
• ΣX² = 22150
• ΣXY = 18760
• a = 30
• b = 0.4
Regression equation of Y on X:
Y = 30 + 0.4X
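The regression line of Y on X can be recovered from the same sums using the least-squares normal equations. A minimal sketch, with a hypothetical helper name `fit_line`:

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


x = [55, 60, 65, 70, 80]
y = [52, 54, 56, 58, 62]

a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))
```

Once a and b are known, a prediction for any X is simply a + b * X.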
ONE WAY ANOVA
• ANOVA stands for Analysis of Variance.
• This technique was developed by Sir Ronald Aylmer Fisher when he worked at the Rothamsted
Agricultural Experimental Station from 1919 to 1933.
• It is a technique of testing hypotheses about the significant difference in several population
means.
• The statistic that is computed and tested for statistical significance in this technique is an F
ratio.
• Sawyer (2009) stated that ANOVA is a useful statistical tool for drawing inferential conclusions
about how one or more independent variables influence a parametric dependent variable.
• In analysis of variance, the total variation in the sample data can be on account of two
components, namely, variance between the samples and variances within the samples.
• Variance between samples is attributed to the difference among the sample means. This
variation is due to some assignable causes.
• Variance within the samples is the difference due to chance or experiment errors.
• It is called one-way design because there is only one independent variable, although
any number of groups or levels representing that independent variable can be
subsumed.
• This technique is used to determine whether there exist any statistically significant
differences between the samples of two or more independent groups.
• Using ANOVA, we test for differences among the means of the population by
comparing the amount of variation between samples to amount of variation within
each of these samples.
• OBJECTIVE: The objective of ANOVA is not to test the significance of the
difference between sample variances but to test for the significance of
difference among sample means.
• Fifteen students undergoing training are randomly assigned to three different types of
instruction modules. At the end of the training period their test scores are as follows:
A    86  79  81  70  84
B    90  76  88  82  89
C    82  68  73  71  81
Use analysis of variance to test that there is no significant difference in the mean
scores of the three instruction modules, using the 5% significance level.
• NULL HYPOTHESIS: There is no difference in the mean scores of the three instruction modules.
• STEP 1: Sample means
• Mean of A = 80
• Mean of B = 85
• Mean of C = 75
• STEP 2: Mean of sample means = 80
• STEP 3: SS between = 250
• STEP 4: SS within = 448
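The steps above, and the F ratio they lead to, can be sketched as follows. The function name `one_way_anova` is hypothetical, and the critical value 3.89 for (2, 12) degrees of freedom at the 5% level is taken from a standard F table rather than from the slides.

```python
def one_way_anova(groups):
    """Return sample means, SS between, SS within and the F ratio."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation between samples: group size times squared deviation of each mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation within samples: squared deviations from each group's own mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    f_ratio = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return means, ss_between, ss_within, f_ratio


module_a = [86, 79, 81, 70, 84]
module_b = [90, 76, 88, 82, 89]
module_c = [82, 68, 73, 71, 81]

means, ss_between, ss_within, f_ratio = one_way_anova([module_a, module_b, module_c])

f_table = 3.89  # assumed table value: F(2, 12) at the 5% level
print(means, ss_between, ss_within, round(f_ratio, 2), f_ratio > f_table)
```

With these figures the calculated F is below the table value, so the null hypothesis of equal mean scores would not be rejected.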
City A    16   8  12  12
City B    14  10  10   6
City C     4  10   8  10
• F tabulated = 4.26
• F calculated = 1.64
• Null hypothesis accepted
Example 12.1
Vishal Foods Ltd is a leading manufacturer of biscuits. The company has launched a
new brand in the four metros; Delhi, Mumbai, Kolkata, and Chennai. After one
month, the company realizes that there is a difference in the retail price per pack of
biscuits across cities. Before the launch, the company had promised its employees
and newly-appointed retailers that the biscuits would be sold at a uniform price in the
country. The difference in price can tarnish the image of the company. In order to
make a quick inference, the company collected data about the price from six
randomly selected stores across the four cities. Based on the sample information, the
price per pack of the biscuits (in rupees) is given in Table 12.5:
Use one-way ANOVA to analyse the significant difference in the prices. Take 95% as the confidence level.
TABLE 12.7: ANOVA TABLE FOR EXAMPLE 12.1
TABLE 12.11: ANOVA SUMMARY TABLE FOR EXAMPLE 12.2