GE - FEL DP 101 - R Programming Activity
Group 5:
1. Prices, in pesos, of a single face mask were gathered from 107 sellers around the
country.
a. For the given data, compute the following descriptive measures: mean, median, mode,
variance, standard deviation, range, first quartile, third quartile, and IQR.
b. Construct the following graphs for the data: histogram, frequency polygon, stem-and-leaf
plot, and boxplot.
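A minimal R sketch of parts (a) and (b) is given here. The 107 surveyed prices are not reproduced in this write-up, so `prices` is a short placeholder vector invented only to make the code runnable, and `get_mode` is a hypothetical helper, since base R has no built-in function for the statistical mode. Note that `var` and `sd` use the sample (n - 1) formulas.

```r
# Placeholder data: NOT the actual 107 surveyed prices.
prices <- c(20, 25, 25, 30, 35, 25, 40, 30, 20, 25)

# Hypothetical helper: returns the most frequent value(s) in x.
get_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# (a) Descriptive measures
descriptives <- list(
  mean     = mean(prices),
  median   = median(prices),
  mode     = get_mode(prices),
  variance = var(prices),              # sample variance (n - 1)
  sd       = sd(prices),               # sample standard deviation
  range    = diff(range(prices)),      # max - min
  q1       = unname(quantile(prices, 0.25)),
  q3       = unname(quantile(prices, 0.75)),
  iqr      = IQR(prices)
)
print(descriptives)

# (b) Graphs
h <- hist(prices, main = "Histogram of prices", xlab = "Price (pesos)")
# Simple frequency polygon: class midpoints against class counts
plot(h$mids, h$counts, type = "b",
     main = "Frequency polygon of prices",
     xlab = "Price (pesos)", ylab = "Frequency")
stem(prices)                           # stem-and-leaf plot (printed as text)
boxplot(prices, horizontal = TRUE, xlab = "Price (pesos)")
```

Replacing the placeholder vector with the actual survey data is the only change needed to produce the requested measures and graphs.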
Problem Set 2
Perform a hypothesis test for the population mean. Assume that preliminary data analyses
indicate that it is reasonable to apply the z-test. Use the critical-value approach.
1) The maximum acceptable level of a certain toxic chemical in vegetables has been set at 0.4
parts per million (ppm). A consumer health group measured the level of the chemical in a
random sample of tomatoes obtained from one producer. The levels, in ppm, are shown below.
At the 5% significance level, does the data provide sufficient evidence to conclude that the mean
level of the chemical in tomatoes from this producer is greater than the recommended level of
0.4 ppm? Assume that the population standard deviation of levels of the chemical in these
tomatoes is 0.21 ppm.
H0 : μ = 0.4 ppm
HA : μ > 0.4 ppm
3. Test statistic:
4. Decision:
We do not reject the null hypothesis because the test statistic, 0.95, is less than the critical value, 1.65.
5. Conclusion:
At the 5% significance level, the data do not provide sufficient evidence to conclude that the
mean level of the chemical in tomatoes from this producer is greater than 0.4 ppm.
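The critical-value approach for this right-tailed z-test can be sketched in R as follows. The tomato measurements are not reproduced in this write-up, so the sketch only defines the pieces; `z_statistic` and `reject_h0` are hypothetical helper names, while `mu0`, `sigma`, and `alpha` come from the problem statement.

```r
mu0   <- 0.4    # hypothesized mean (ppm), from the problem
sigma <- 0.21   # population standard deviation (ppm), from the problem
alpha <- 0.05   # significance level

# z statistic for a numeric vector x of sample measurements
z_statistic <- function(x, mu0, sigma) {
  (mean(x) - mu0) / (sigma / sqrt(length(x)))
}

# Right-tailed critical value: the point with area alpha to its right
z_crit <- qnorm(1 - alpha)
print(z_crit)

# Reject H0 only when the statistic exceeds the critical value
reject_h0 <- function(z, z_crit) z > z_crit
```

With the sample stored in a vector `x`, the decision would be `reject_h0(z_statistic(x, mu0, sigma), z_crit)`.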
Preliminary data analyses indicate that it is reasonable to use a t-test to carry out the
specified hypothesis test. Perform the t-test.
2) In tests of a computer component, it is found that the mean time between failures is 520 hours.
A modification is made which is supposed to increase the time between failures. Tests on a
random sample of 10 modified components resulted in the following times (in hours) between
failures.
At the 0.05 significance level, test the claim that for the modified components, the mean time
between failures is greater than 520 hours.
H0 : μ = 520 hours
HA : μ > 520 hours
4. Decision:
We reject the null hypothesis since our test statistic exceeds the critical value (t = 2.612 > 1.83)
and the one-sided p-value, 0.014, is less than our significance level of 0.05.
5. Conclusion:
As a result, at a 5% level of significance, we may conclude that the mean duration between
failures is larger than 520 hours.
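The reported critical value and p-value can be checked in R from the sample size and the reported t statistic alone. This is only a sketch: the ten failure times are not reproduced here, so `t_stat` is taken from the decision step rather than recomputed.

```r
n      <- 10      # sample size, from the problem
alpha  <- 0.05    # significance level
t_stat <- 2.612   # test statistic as reported in the decision step

df      <- n - 1
t_crit  <- qt(1 - alpha, df)                    # right-tailed critical value
p_value <- pt(t_stat, df, lower.tail = FALSE)   # one-sided p-value

print(c(t_crit = t_crit, p_value = p_value))
print(t_stat > t_crit && p_value < alpha)       # TRUE means reject H0
```

With the raw times in a vector `x`, `t.test(x, mu = 520, alternative = "greater")` would carry out the whole test in one call.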
Because the alternative hypothesis is μ1 < μ2, the test is one-tailed (left-tailed), so the
one-tail critical value is negative: t = -1.721.
The t-statistic is -0.881.
4. Decision:
The t-statistic is greater than the critical value (-0.881 > -1.721).
The p-value is greater than the significance level (0.194 > 0.05).
Fail to reject the null hypothesis.
5. Conclusion:
At a 5% level of significance, there is insufficient evidence that the mean salary of
female employees is less than the mean salary of male employees.
1. Comparison of medians: The medians of the two boxplots are very similar.
2. Comparison of dispersion: The male salary boxplot is slightly more dispersed, with a
higher IQR, and its data are more scattered, as shown by the longer whisker.
5. Conclusion: Because the medians differ by so little, it is hard to detect a difference.
Because the data in the male salary boxplot are more dispersed and scattered,
we propose that the survey be conducted with a larger sample size. Overall, the data are
insufficient to infer that the two mean salaries are different.
4) A researcher was interested in comparing the heights of women in two different countries.
Independent simple random samples of 9 women from country A and 9 women from country B
yielded the following heights (in inches).
At the 10% significance level, does the data provide sufficient evidence to conclude that the
mean height of women in country A is greater than the mean height of women in country B?
Assume that the variances are NOT equal. Present boxplots to support the conclusion.
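Analysis of the given: the samples are INDEPENDENT random samples, both sample sizes are LESS THAN 30,
the significance level is 10%, and the population variances are UNKNOWN and UNEQUAL. For two
independent means under these conditions, we may apply the t-test with unequal variances (Welch's t-test).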
4. Decision:
5. Conclusion:
At the 10% significance level, there is sufficient evidence to conclude that the mean height of
women in country A is greater than the mean height of women in country B.
1. Comparison of medians: Country B has a lower median, which lies outside the box of
Country A's boxplot.
2. Comparison of dispersion: The ranges of the two boxplots, and hence the dispersion of their
data, are nearly identical. Their IQRs are only slightly different.
5. Conclusion: It is clear from the boxplots that there is a difference. Country B's median is
outside the box of Country A's boxplot, indicating a significant difference. The average height in
Country B falls only within the first quartile of Country A.
5) The table below shows the weights, in pounds, of seven subjects before and after following a
particular diet for two months.
At the 1% significance level, does the data provide sufficient evidence to conclude that the diet is
effective in reducing weight? Present boxplots to support the conclusion.
Analysis of the given: The data are dependent since this is a before-and-after experiment. For each
set of data, the sample size is less than 30. The variances are unknown and the significance level
is 1%. For two paired means, we may apply the paired t-test.
H0 : μbefore = μafter (There is no difference between the before and after weights)
HA : μbefore > μafter (The diet is effective in reducing weight)
2-3. Critical value and level of significance:
Because the alternative hypothesis is μbefore > μafter, the test is one-tailed (right-tailed),
so the one-tail critical value is positive: t = 3.143.
The t-statistic is 1.954.
The p-value is 0.049.
4. Decision:
Our test statistic is less than the critical value (1.954 < 3.143).
The p-value is greater than the significance level (0.049 > 0.01).
Do not reject the null hypothesis.
5. Conclusion:
At the 1% significance level, there is insufficient evidence to conclude that the diet is
effective in reducing weight.
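A paired t-test is a one-sample t-test on the differences, which the following R sketch makes explicit. The before and after weights are not reproduced in this write-up, so `paired_t` is a hypothetical helper that would accept them, and the check at the end uses only the sample size and the reported t statistic.

```r
# Hypothetical helper: t statistic for paired samples (before - after)
paired_t <- function(before, after) {
  d <- before - after
  mean(d) / (sd(d) / sqrt(length(d)))
}

n      <- 7       # number of subjects, from the problem
alpha  <- 0.01    # significance level
t_stat <- 1.954   # test statistic as reported above

df      <- n - 1
t_crit  <- qt(1 - alpha, df)                    # right-tailed critical value
p_value <- pt(t_stat, df, lower.tail = FALSE)   # one-sided p-value

print(c(t_crit = t_crit, p_value = p_value))
print(t_stat > t_crit)                          # FALSE means do not reject H0
```

With the weights in vectors `before` and `after`, `t.test(before, after, paired = TRUE, alternative = "greater")` would give the same statistic as `paired_t(before, after)`.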
1. Comparison of medians: The difference between the median of the before weight and the
median of the after weight is very small (166 versus 168). Each median lies within the box of
the other boxplot.
2. Comparison of dispersion: The before weight range is greater than the after weight range.
However, because the IQR of the before boxplot is 33 and the IQR of the after boxplot is 35, the
box sizes are almost identical.
4. Skewness: The before weight is roughly symmetrical, with a slightly longer right whisker, but
the after weight is left-skewed, with a short right whisker.
5. Conclusion: Apart from the small difference in skewness, it is difficult to judge whether the diet
is helpful because the medians and means differ by only a small amount.
6) In a clinical study of an allergy drug, 109 of the 201 subjects reported experiencing significant
relief from their symptoms. At the 0.01 significance level, do the data provide sufficient evidence
to conclude that a majority of all those using the drug experience relief?
α = 0.01
Test statistic: z = (109/201 − 0.5) / √(0.5 × 0.5 / 201) ≈ 1.20
Fail to reject the null hypothesis. At the 0.01 (1%) significance level, the sample proportion of
about 54% does not provide sufficient evidence to support the claim that a majority experience relief.
Conclusion:
At the 1% significance level, there is insufficient evidence to conclude that a majority of all
those who use the drug experience relief.
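The one-proportion z-test for this problem can be worked directly from the counts in the problem statement. The sketch below assumes the usual large-sample statistic without a continuity correction, with "a majority" read as H0: p = 0.5 against HA: p > 0.5; the variable names are illustrative.

```r
x     <- 109     # subjects reporting relief, from the problem
n     <- 201     # subjects in the study, from the problem
p0    <- 0.5     # "majority" boundary under H0
alpha <- 0.01    # significance level

p_hat <- x / n
# Standard error uses p0 because the test is carried out under H0
z       <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit  <- qnorm(1 - alpha)                # right-tailed critical value
p_value <- pnorm(z, lower.tail = FALSE)    # one-sided p-value

print(c(p_hat = p_hat, z = z, z_crit = z_crit, p_value = p_value))
print(z > z_crit)                          # FALSE means fail to reject H0
```

The decision is the same under either approach: the statistic stays below the critical value and the p-value stays above α.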
7) A marketing survey involves product recognition in New York and California. Of 558 New
Yorkers surveyed, 193 recognized the product while 196 out of 614 Californians recognized the
product. At the 5% significance level, do the data provide sufficient evidence to conclude that the
recognition rate in New York differs from the recognition rate in California?
5. Decision:
● The chi-square statistic falls between the table values for right-tail areas 0.9 and 0.1, which means that the p-value lies between 0.1 and 0.9
● The p-value is 0.3331
● Because the test statistic does not fall in the critical region and the p-value is greater than
the significance level, we fail to reject the null hypothesis
6. Conclusion: Thus, at the 5% significance level, there is insufficient evidence to conclude that the recognition rate
in New York differs from the recognition rate in California.
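The two-proportion comparison can be reproduced in R from the counts in the problem statement. The sketch computes the pooled z statistic by hand and then checks it against `prop.test` with `correct = FALSE`, on the assumption that no continuity correction was applied; the chi-square statistic with one degree of freedom is the square of z.

```r
x <- c(NY = 193, CA = 196)   # respondents who recognized the product
n <- c(NY = 558, CA = 614)   # respondents surveyed
alpha <- 0.05

p_hat  <- x / n
p_pool <- sum(x) / sum(n)    # pooled proportion under H0: p_NY = p_CA
se     <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))

z       <- unname((p_hat["NY"] - p_hat["CA"]) / se)
chi_sq  <- z^2                                   # chi-square statistic, df = 1
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE) # two-sided p-value

# Same test via the built-in function, without continuity correction
res <- prop.test(x = x, n = n, correct = FALSE)

print(c(z = z, chi_sq = chi_sq, p_value = p_value))
print(res$p.value)
print(p_value < alpha)       # FALSE means fail to reject H0
```

The hand computation and `prop.test` agree, and the p-value matches the 0.3331 reported in the decision step to three decimal places.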
8) A consumer magazine wants to compare the lifetimes of ballpoint pens of three different
types. The magazine takes a random sample of pens of each type in the following table.
Does the data indicate that there is a difference in a mean lifetime for the three brands of
ballpoint pens? Use α = 0.01. Which pairs are significant? Present boxplots to support each
conclusion.
Based on the pairwise significance values, no pair of means differs significantly from each other.
9) The data below arise from a large study of risk-taking. Students were randomly assigned to
three different treatments labeled AA, C, and NC. Students were administered two parallel forms
of a test called ‘low’ and ‘high’. Perform a two-way ANOVA.
H0: There is no significant difference in student results among the treatment types.
HA: There is a significant difference in student results for at least one treatment type.
H0: There is no significant difference in student results between the two parallel forms of the test.
HA: There is a significant difference in student results between the two parallel forms of the test.
H0: There is no interaction effect between the treatment groups and the two parallel forms of the test.
HA: There is an interaction effect between the treatment groups and the two parallel forms of the test.
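Each pair of hypotheses above corresponds to one row of the two-way ANOVA table: the treatment main effect, the form main effect, and their interaction. The R sketch below shows the model call only; the study data are not reproduced in this write-up, so the scores in `risk` are invented placeholders, and the column names `treatment`, `form`, and `score` are assumptions.

```r
# Placeholder data: these scores are invented to make the sketch runnable
# and are NOT the study data.
risk <- data.frame(
  treatment = factor(rep(c("AA", "C", "NC"), each = 4)),
  form      = factor(rep(c("low", "high"), times = 6)),
  score     = c(8, 12, 9, 13, 6, 9, 7, 11, 5, 10, 6, 9)
)

# treatment * form expands to both main effects plus their interaction
fit <- aov(score ~ treatment * form, data = risk)
anova_table <- summary(fit)[[1]]
print(anova_table)

# One p-value per hypothesis pair (the residual row has none)
print(anova_table[["Pr(>F)"]][1:3])
```

Each of the three p-values is compared with the chosen significance level to decide whether the corresponding null hypothesis is rejected.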