STA404 Exam Booklet - 20.03.2023
INSTRUCTIONS TO CANDIDATES
2. Answer ALL questions in the foolscap paper. Start each answer on a new page.
4. Candidates are required to convert their completed answer into one PDF file before
submission (<FULLNAME_UiTM ID_GROUP>.pdf).
5. Candidates are given 30 minutes to email their completed answer to the respective
lecturers.
6. Candidates are required to attach the following details on every page of the answer script:
i) Full Name
ii) Student Number
iii) Group
iv) HP Number
QUESTION 1
A random sample of employees was selected from three different types of stores at the
mall and their ages were recorded. Assume that the assumptions for the parametric test
are met. The data were analyzed and the result is shown below.
b) Given that the between-group variance is 727.925 and the total sum of squares is 2823,
compute the value of the test statistic for the above data.
(3 marks)
c) At the 0.05 significance level, test the claim that there is a difference in mean ages for
three types of stores at the mall.
(4 marks)
QUESTION 2
and the
number of accidents he or she had over a 3-year period. The data collected for 10 drivers
are shown below.
Age                16  24  16  18  23  27  32  24  28  21
No. of accidents    3   2   5   2   0   1   1   1   0   3
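The sub-questions for this data set are not shown here. As a hedged illustration only, the sketch below assumes the task involves the linear correlation between age and number of accidents, and applies the SSxy, SSxx and SSyy formulas from Appendix 1 (5) to the ten drivers. The variable names are illustrative, not part of the paper.

```python
import math

# Data for the 10 drivers, copied from the table in this question.
ages = [16, 24, 16, 18, 23, 27, 32, 24, 28, 21]
accidents = [3, 2, 5, 2, 0, 1, 1, 1, 0, 3]

n = len(ages)
sum_x = sum(ages)
sum_y = sum(accidents)

# Sums of squares and cross-products as defined in Appendix 1 (5).
ss_xy = sum(x * y for x, y in zip(ages, accidents)) - sum_x * sum_y / n
ss_xx = sum(x * x for x in ages) - sum_x ** 2 / n
ss_yy = sum(y * y for y in accidents) - sum_y ** 2 / n

# Linear correlation coefficient: r = SSxy / sqrt(SSxx * SSyy).
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"SSxy = {ss_xy:.1f}, SSxx = {ss_xx:.1f}, SSyy = {ss_yy:.1f}")
print(f"r = {r:.4f}")
```

A negative r here would indicate that older drivers in this sample tended to have fewer accidents.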
QUESTION 3
A manager at a popular telco company is conducting a survey on service failure at the
company's service counters. The main objective of the survey is to find out the
factors that cause the failure. He randomly selected five service counters from the ten available
service counters all over Malaysia. A questionnaire was distributed to all the customers at the
five selected service counters. The information collected from the customers includes
their age, gender, occupation, income, rating of service (0 to 100) and service quality (poor,
moderate and good).
c) Identify ONE ordinal variable and ONE ratio variable obtained from the study.
(2 marks)
d) The following statements were produced from the study. Identify whether each
statement is a descriptive or an inferential statistic.
i) 45% of the sample customers work in the government sector.
ii) Based on the sample, it can be concluded that there is an association between
gender and service quality.
iii) We are 90% confident that the average rating of service for the customers falls
between 60 and 90.
(3 marks)
QUESTION 4
The depression scores (the higher the score, the more stressed the patient) of 25
patients were recorded before and after they had undergone a therapy. The scores were
analyzed to see the effectiveness of the therapy. The outputs are indicated below.
b) State the 95% confidence interval for the mean depression scores.
(1 mark)
c) Based on the confidence interval in b), can it be concluded that the therapy is effective?
(2 marks)
QUESTION 5
A study was conducted to compare the average sugar content of two brands of
energy drinks. Ten energy drinks of brand P and eight energy drinks of brand Q were
sampled and the sugar content of each was recorded. The following is the
SPSS output for the analysis of sugar content.
Group Statistics
               Brand of Energy Drink    N    Mean    Std. Deviation    Std. Error Mean
Sugar Content  Energy Drink P          10   10.80             4.185              1.323
               Energy Drink Q           8   11.00             1.069               .378
a) Determine whether the variances of the amount of sugar for the two brands are equal.
Use
(3 marks)
b) Assume the sugar content is normally distributed. If it is claimed that energy drink P has
a lower amount of sugar than energy drink Q, conduct a hypothesis test to test this
claim. Use α = 0.05.
(4 marks)
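As a rough sketch of the arithmetic behind part b), the block below computes the two-sample t statistic and its degrees of freedom from the Group Statistics table. It assumes the variances are treated as unequal, which is what part a) is meant to decide, so the formulas used are the unequal-variance ones from Appendix 1 (3). If part a) concludes the variances are equal, the pooled formula applies instead. Variable names are illustrative.

```python
import math

# Summary statistics copied from the Group Statistics table.
n_p, mean_p, sd_p = 10, 10.80, 4.185   # Energy Drink P
n_q, mean_q, sd_q = 8, 11.00, 1.069    # Energy Drink Q

# ASSUMPTION: unequal variances (the outcome of part a) is not shown here).
var_term_p = sd_p ** 2 / n_p   # s1^2 / n1
var_term_q = sd_q ** 2 / n_q   # s2^2 / n2

# t = (x1bar - x2bar) / sqrt(s1^2/n1 + s2^2/n2), with mu1 - mu2 = 0 under H0.
t_stat = (mean_p - mean_q) / math.sqrt(var_term_p + var_term_q)

# Degrees of freedom from the unequal-variance formula in the appendix.
df = (var_term_p + var_term_q) ** 2 / (
    var_term_p ** 2 / (n_p - 1) + var_term_q ** 2 / (n_q - 1)
)

print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

Because the claim is that P has less sugar than Q, the test is left-tailed: H0 is rejected only if t falls below the negative critical value for the computed df.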
QUESTION 6
Laptop purchases have been increasing because of the Covid-19 pandemic. Hence,
a group of researchers intends to describe this scenario using descriptive statistics. The
following SPSS output summarises the number of laptop purchases (in a month) for
fifteen states in Malaysia. Assume that the number of
laptop purchases (in a month) is normally distributed.
b) State the median value for this study and interpret it.
(2 marks)
(1 mark)
QUESTION 7
A dairy products factory wants to know the milk flavour preferred by the buyers. The
researchers randomly selected several supermarket visitors and conducted an experiment.
Buyers were given three cups of milk with different flavours to drink. After that, they were
asked to choose the one flavour that they preferred the most. The data collected are shown in
the following table.
c) At 10% significance level, what can you conclude about the relationship between the
two variables?
(4 marks)
d) If the variables are measured in ratio, can the Chi-Square Test of Independence be
used?
(1 mark)
INSTRUCTIONS TO CANDIDATES
2. Answer ALL questions in the foolscap paper. Start each answer on a new page.
4. Candidates are required to convert their completed answer into one PDF file before
submission (<FULLNAME STUDENTNO GROUP>.pdf).
5. Candidates are given 30 minutes to email their finalized and completed answer to the
respective lecturers.
6. Candidates are required to attach the following details on every page of the answer script:
i) Full Name
ii) Student Number
iii) Group
iv) HP Number
QUESTION 1
A researcher is interested in studying E-wallet usage among customers of Pasaraya Intan
Belian. The researcher intends to obtain information from 50 respondents by interviewing
every 5th customer of Pasaraya Intan Belian on a particular day. The respondents are asked
about their marital status, age, frequency of using E-wallet (never, seldom, often, very
often), preferred E-wallet provider (Boast, Grabpay, Touch n Go, BigPay) and last
E-wallet transaction amount (RM).
b) Identify THREE (3) variables from the study and state their scales of measurement.
(3 marks)
d) Give ONE (1) advantage of the data collection method used by the researcher.
(1 mark)
QUESTION 2
The director of a government agency heard that their financial department is receiving an
average of 6 complaints from the customers in a week. To solve the problem, he assigned
his secretary to collect some data to see if he needs to replace the supervisor of that
department. The director will replace the supervisor if the actual mean number of complaints
towards the financial department is greater than 6 per week. The secretary gathered data
over the next 12 weeks and discovered that the mean number of weekly complaints towards
the financial department is 7 with a variance of 3.25.
c) At the 5% significance level, test whether the director is going to replace the department
supervisor. Show the relevant steps.
(5 marks)
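The test statistic for part c) can be sketched as follows, using the one-sample t formula from Appendix 1 (3) with the figures given in the question (n = 12, sample mean 7, sample variance 3.25, hypothesised mean 6). The critical value 1.796 is an assumption taken from a standard t table (df = 11, one-tailed, α = 0.05); it is not stated in the paper.

```python
import math

# Figures given in the question.
n = 12             # number of weeks observed
sample_mean = 7.0  # mean weekly complaints
sample_var = 3.25  # sample variance
mu_0 = 6.0         # hypothesised mean under H0

# H0: mu = 6 versus H1: mu > 6 (right-tailed test).
# t = (xbar - mu_0) / (s / sqrt(n)), df = n - 1.
std_error = math.sqrt(sample_var) / math.sqrt(n)
t_stat = (sample_mean - mu_0) / std_error

# ASSUMPTION: critical value read from a t table (df = 11, alpha = 0.05, one tail).
t_crit = 1.796
reject_h0 = t_stat > t_crit

print(f"t = {t_stat:.4f}, critical value = {t_crit}, reject H0: {reject_h0}")
```

Rejecting H0 here corresponds to concluding that the mean number of weekly complaints exceeds 6, which is the director's condition for replacing the supervisor.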
QUESTION 3
          N     Σx     Σx²     Mode     Q3
score    18   1116   73366    67.00   74.25
b) Compute the coefficient of skewness. Hence, comment on the shape of the distribution.
(3 marks)
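A minimal sketch of part b), assuming the two unlabelled figures in the summary table (1116 and 73366) are Σx and Σx² respectively; the column headers did not survive, so this reading is an assumption. The mean and standard deviation follow Appendix 1 (1), and the skewness uses the mode-based form of Pearson's measure since the mode is the statistic supplied.

```python
import math

# Summary figures from the table; sum_x and sum_x2 are ASSUMED column meanings.
n = 18
sum_x = 1116
sum_x2 = 73366
mode = 67.00

# Mean and standard deviation from Appendix 1 (1).
mean = sum_x / n
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

# Pearson's measure of skewness (mode form): (mean - mode) / standard deviation.
skewness = (mean - mode) / std_dev

print(f"mean = {mean:.2f}, s = {std_dev:.4f}, skewness = {skewness:.4f}")
```

A negative coefficient indicates a distribution skewed to the left, and a positive one a distribution skewed to the right.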
c) Explain the meaning of the value for third quartile (Q3) for this study.
(1 mark)
QUESTION 4
d) Based on the confidence interval obtained in c), is there any evidence to support that
the average of final examination marks for students who took online class is different
from face-to-face class? Give a reason to support your answer.
(2 marks)
QUESTION 5
A grocery chain wants to know if the three types of advertisements affect the mean sales
differently. They used each type of advertisement at four different randomly selected stores
for a month and measured the sales (RM '000) for each store at the end of the month. The
results are as follows.
Descriptives
         Advertisement    Statistic
Sales    Type 1           Mean              11.5000
                          Std. Deviation     3.41565
                          Sum               46.00
         Type 2           Mean              10.0000
                          Std. Deviation     3.26599
                          Sum               40.00
         Type 3           Mean               7.5000
                          Std. Deviation     2.51661
                          Sum               30.00

ANOVA
Sales
                  Sum of Squares    df    Mean Square    F    Sig.
Between Groups    A                  2    16.333         D    .235
Within Groups     86.000             9    C
Total             B                 11
a) Using the sum of squares between groups formula, calculate the value of A.
(3 marks)
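The calculation for A can be sketched as below, using the group sums from the Descriptives table and the usual between-groups formula SSB = Σ(Ti²/ni) − (Σx)²/n with the notation of Appendix 1 (4). The remaining blanks (B, C, D) are derived the same way from the ANOVA table; the variable names are illustrative only.

```python
# Group sums (T_i) from the Descriptives table; four stores per advertisement type.
totals = [46.00, 40.00, 30.00]
sizes = [4, 4, 4]
ssw = 86.000  # within-groups sum of squares, given in the ANOVA table

k = len(totals)            # number of groups
n = sum(sizes)             # total number of observations
grand_total = sum(totals)  # sum of all values

# A: sum of squares between groups.
ssb = sum(t ** 2 / m for t, m in zip(totals, sizes)) - grand_total ** 2 / n

# B, C, D follow from the ANOVA table layout.
sst = ssb + ssw        # B: total sum of squares
msb = ssb / (k - 1)    # should agree with the tabulated 16.333
msw = ssw / (n - k)    # C: mean square within groups
f_stat = msb / msw     # D: F statistic

print(f"A = {ssb:.3f}, B = {sst:.3f}, C = {msw:.3f}, D = {f_stat:.3f}")
```

The computed mean square between groups can be checked against the 16.333 already printed in the table.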
c) State the null and alternative hypothesis for the above study.
(1 mark)
d) Using the p-value method, is there any evidence to support that the types of
advertisements affect the mean sales? Test at α = 0.01.
(3 marks)
QUESTION 6
The lecturers of Mathematical Science Department from University M intended to study the
association between the stress levels and the hours of online lessons in a week among
accounting students. A questionnaire which aimed to assess the stress levels was
administered to the respondents of the study. Their responses on the stress levels
were categorised into low, medium, and high levels. The students were also asked to state
the number of hours of their online lessons each week, according to the following categories:
less than 16 hours, 16 to 18 hours, 19 to 21 hours and more than 21 hours. The data were
collected and the results are as follows.
Chi-Square Tests
                                Value    df
Pearson Chi-Square              4.032    U
Likelihood Ratio                3.963    6
Linear-by-Linear Association     .148    1
N of Valid Cases                  489
a) Give a reason for conducting the Chi-square Test of Independence for the above study.
(1 mark)
d) State the null and alternative hypotheses for the above study.
(1 mark)
e) At the 10% significance level, is there sufficient evidence to conclude that the stress
level is associated with the hours of online lessons in a week among the accounting
students?
(4 marks)
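A hedged sketch of the decision in part e): the degrees of freedom follow from the three stress levels and four categories of online-lesson hours described in the question, and the test statistic is the Pearson Chi-Square value from the output. The critical value 10.645 is an assumption taken from a standard chi-square table (df = 6, α = 0.10), not from the paper.

```python
# Categories described in the question.
stress_levels = ["low", "medium", "high"]
hour_bands = ["< 16", "16 to 18", "19 to 21", "> 21"]

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1).
df = (len(stress_levels) - 1) * (len(hour_bands) - 1)

# Pearson Chi-Square value from the SPSS output.
chi_sq = 4.032

# ASSUMPTION: critical value from a chi-square table (df = 6, alpha = 0.10).
chi_crit = 10.645
reject_h0 = chi_sq > chi_crit

print(f"df = {df}, chi-square = {chi_sq}, critical = {chi_crit}, reject H0: {reject_h0}")
```

The df obtained this way matches the 6 shown for the Likelihood Ratio row of the same output.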
QUESTION 7
A study was conducted to investigate the influence of the fathers’ height on the sons’
height. The heights (cm) of a random sample of fathers and sons were recorded and
analysed by using IBM SPSS. The following results were obtained from
the bivariate analysis.
Model Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .446    .199        .065                 6.071

Coefficients
                                 Unstandardized Coefficients    Standardized Coefficients
Model                            B          Std. Error          Beta                         t        Sig.
1    (Constant)                  96.281     60.053                                           1.603    .160
     Heights of fathers (cm)       .432       .354              .446                         1.220    .268
b) State the correlation coefficient value. Hence, interpret the relationship between the
variables.
(2 marks)
d) Based on the equation in c), comment on the slope value in the context of the above
study.
(1 mark)
e) Predict the height of a son if the height of his father is 192 cm.
(2 marks)
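The prediction in part e) can be sketched as below, assuming the fitted line takes its intercept and slope from the B column of the Coefficients table (96.281 and 0.432). The function name is illustrative.

```python
# Coefficients read from the "B" column of the Coefficients table.
intercept = 96.281   # (Constant)
slope = 0.432        # Heights of fathers (cm)


def predict_son_height(father_height_cm: float) -> float:
    """Return the predicted son's height (cm) from the fitted line y = a + bx."""
    return intercept + slope * father_height_cm


predicted = predict_son_height(192)
print(f"Predicted height of son: {predicted:.3f} cm")
```

The slope can be read the same way: each additional 1 cm in the father's height changes the predicted son's height by 0.432 cm.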
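APPENDIX 1 (1)
SAMPLE MEASUREMENTS
Mean: $\bar{x} = \dfrac{\sum x}{n}$
Standard deviation: $s = \sqrt{\dfrac{1}{n-1}\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]}$ or $s = \sqrt{\dfrac{1}{n-1}\sum (x - \bar{x})^2}$
Coefficient of Variation: $CV = \dfrac{s}{\bar{x}} \times 100\%$
Coefficient of Skewness (Pearson's Measure of Skewness) $= \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$ OR $\dfrac{\text{mean} - \text{mode}}{\text{standard deviation}}$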
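APPENDIX 1 (2)
CONFIDENCE INTERVAL
Difference in means of two normal distributions, $\mu_1 - \mu_2$, $\sigma_1^2 \neq \sigma_2^2$ and unknown:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ ; $df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
Mean difference of two normal distributions for paired samples, $\mu_d$:
$\bar{d} \pm t_{\alpha/2}\dfrac{s_d}{\sqrt{n}}$ ; df = n – 1 where n is no. of pairs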
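APPENDIX 1 (3)
HYPOTHESIS TESTING
$H_0: \mu = \mu_0$, $\sigma^2$ unknown, small samples:
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ ; df = n – 1
$H_0: \mu_1 - \mu_2 = 0$, $\sigma_1^2 = \sigma_2^2$ and unknown:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ ; df = n1 + n2 – 2, where $s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
$H_0: \mu_1 - \mu_2 = 0$, $\sigma_1^2 \neq \sigma_2^2$ and unknown:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ ; $df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
$H_0: \mu_d = 0$:
$t = \dfrac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$ ; df = n – 1, where n is no. of pairs
Hypothesis for categorical data:
$\chi^2 = \sum \dfrac{(o_{ij} - e_{ij})^2}{e_{ij}}$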
APPENDIX 1 (4)
Let:
k = the number of different samples (or treatments)
$n_i$ = the size of sample i
$T_i$ = the sum of the values in sample i
n = the number of values in all samples = $n_1 + n_2 + n_3 + \dots$
$\sum x$ = the sum of the values in all samples = $T_1 + T_2 + T_3 + \dots$
$\sum x^2$ = the sum of the squares of values in all samples
Total sum of squares: $SST = \sum x^2 - \dfrac{(\sum x)^2}{n}$
Sum of squares between groups: $SSB = \left(\dfrac{T_1^2}{n_1} + \dfrac{T_2^2}{n_2} + \dfrac{T_3^2}{n_3} + \dots\right) - \dfrac{(\sum x)^2}{n}$
Variance between groups: $MSB = \dfrac{SSB}{k - 1}$
Variance within groups: $MSW = \dfrac{SSW}{n - k}$
Test statistic for a one-way ANOVA test: $F = \dfrac{MSB}{MSW}$
APPENDIX 1 (5)
SS xy xy
( x)( y)
n
SS xx x
x)
2
( 2
and SS yy y
y)
2
( 2
n n
Y = a + bx
n
SS xy
Linear correlation coefficient: r
SS xxSS yy
3. a) Population: All customers from the 10 service counters all over Malaysia
b) Cluster sampling
c) Ordinal variable: Service quality, Ratio variable: Age / Income / Rating of service
d) (i) Descriptive (ii) Inferential (iii) Inferential
4. a) X = 1.2, Y = 0.424
b) (0.324 < μd < 2.076)
c) The therapy is effective.
6. a) 𝑥̅ = 478.6667, s = 141.5489
b) Median = 492. In half of the 15 states the number of laptop purchases is less than 492, and
in the other half it is more than 492.
c)
7. a) M = 6.9841
b) 0.5314
c) Do not reject 𝐻0
d) No
Feb 2022
2. a) –
b) There is a very strong positive linear relationship between age and mileage of the cars.
c) r² = 0.951. 95.1% of the variation in mileage of the cars is explained by the age of the car and the
other 4.9% is explained by other factors.
d) 𝑦̂ =3.927 + 13.997x; y = mileage, x = age
e) For every 1-year increase in the age of the car, the mileage of the car will increase by 13,997 km.
f) 𝑦̂ =64.114 (‘000km)
4. a) T = 0.1783
b) (3.0965, 3.7955)
c) Yes
6. a) 𝑥̅ = 87.5, s = 23.1862
b) The number of cars sold in May is the most consistent.
7. a) 𝐻0 : 𝜇1 = 𝜇2 = 𝜇3 = 𝜇4
𝐻1 : at least two means are different
b) Q = 136.5, R = 51.5
c) Reject 𝐻0