
GE - FEL DP 101 - R Programming Activity

The document contains information about an R programming activity for a group of students. It includes two problem sets - the first involves descriptive statistics calculations and data visualizations for price data, while the second involves performing hypothesis tests to analyze various data sets. The hypothesis tests cover topics such as comparing means to a threshold, comparing modified and original values, comparing groups, and analyzing dependent before/after data. Boxplots are suggested to support conclusions.

Uploaded by

Lois Ira Lim

GE-FEL DP101

R Programming Activity

Group 5 GE-FEL DP 101(M,W 3:00pm-4:30 pm)

Group 5:

Bojo, Aaron Henri

Lim, Lois Ira

Madarang, Keith Amirr

Micarsos, Demi Maureen

Sinining, Adrian Rey

Villena, Anakin Julian


Problem Set 1

1. Prices in pesos are gathered for a piece of face mask from 107 sellers studied around the
country.

a. For the given data, compute the following descriptive measures: mean, median, mode,
variance, standard deviation, range, first quartile, third quartile, and IQR.
b. Construct the following graphs for the data: histogram, frequency polygon, stem-and-leaf
plot, and boxplot.
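
The measures in part (a) can be sketched with Python's standard `statistics` module. This is a minimal illustration, not the group's own R solution. The function name `describe` and the five sample prices are assumptions made for the example, since the 107 observed prices are not reproduced here. The `method="inclusive"` quartile rule is chosen with the intent of mirroring the default of R's `quantile()`; other quartile conventions give slightly different values.

```python
import statistics


def describe(prices):
    """Return the descriptive measures requested in part (a) as a dict."""
    # Quartiles by linear interpolation over the sample itself ("inclusive").
    q1, _median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    return {
        "mean": statistics.mean(prices),
        "median": statistics.median(prices),
        "mode": statistics.multimode(prices),    # list, in case of ties
        "variance": statistics.variance(prices),  # sample variance (n - 1)
        "sd": statistics.stdev(prices),           # sample standard deviation
        "range": max(prices) - min(prices),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Hypothetical prices in pesos, NOT the 107 observations from the activity.
sample_prices = [10, 12, 12, 15, 20]
summary = describe(sample_prices)
```

Replacing `sample_prices` with the real price list would yield the values asked for in part (a).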
Problem Set 2

Perform a hypothesis test for the population mean. Assume that preliminary data analyses
indicate that it is reasonable to apply the z-test. Use the critical-value approach.

1) The maximum acceptable level of a certain toxic chemical in vegetables has been set at 0.4
parts per million (ppm). A consumer health group measured the level of the chemical in a
random sample of tomatoes obtained from one producer. The levels, in ppm, are shown below.

0.31 0.47 0.19 0.72 0.56

0.91 0.29 0.83 0.49 0.28

0.31 0.46 0.25 0.34 0.17

0.58 0.19 0.26 0.47 0.81

At the 5% significance level, does the data provide sufficient evidence to conclude that the mean
level of the chemical in tomatoes from this producer is greater than the recommended level of
0.4 ppm? Assume that the population standard deviation of levels of the chemical in these
tomatoes is 0.21 ppm.

1. State the Null and Alternative hypothesis

H0 : μ = 0.4 ppm
HA : μ > 0.4 ppm

2. Level of significance is 5% or 0.05

Test type: one-tailed (right-tailed) test at α = 0.05

Critical value: According to the Z table, z = 1.65 has about 95.05 percent of the area to its left.
Subtracting from 100 percent leaves a right-tail area (significance level) of about 5%.

3. Test statistic: z = (x̄ − μ0) / (σ/√n) = (0.4445 − 0.4) / (0.21/√20) ≈ 0.95

4. Decision:

We do not reject the null hypothesis because the test statistic 0.95 is less than the critical value 1.65.

5. Conclusion:

As a result, at a 5% significance level, there is insufficient evidence to conclude that the mean
level of the chemical in tomatoes from this producer is greater than 0.4 ppm.
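
The z-test for this problem can be reproduced as a quick check. The sketch below uses only Python's standard library rather than R, and the variable names are chosen for illustration. It takes the twenty measurements, μ0 = 0.4, σ = 0.21, and α = 0.05 from the problem statement.

```python
from math import sqrt
from statistics import NormalDist, mean

# Chemical levels in ppm, copied from the problem statement.
levels = [0.31, 0.47, 0.19, 0.72, 0.56,
          0.91, 0.29, 0.83, 0.49, 0.28,
          0.31, 0.46, 0.25, 0.34, 0.17,
          0.58, 0.19, 0.26, 0.47, 0.81]
mu0, sigma, alpha = 0.4, 0.21, 0.05

# One-sample z statistic with known population standard deviation.
z_stat = (mean(levels) - mu0) / (sigma / sqrt(len(levels)))

# Right-tailed critical value: the point with area 1 - alpha to its left.
z_crit = NormalDist().inv_cdf(1 - alpha)

# Reject H0 only if the statistic falls in the right-tail rejection region.
reject_h0 = z_stat > z_crit
```

The exact critical value is about 1.645, which the Z table rounds to 1.65. The statistic stays below it, so `reject_h0` is `False`.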

Preliminary data analyses indicate that it is reasonable to use a t-test to carry out the
specified hypothesis test. Perform the t-test.

2) In tests of a computer component, it is found that the mean time between failures is 520 hours.
A modification is made which is supposed to increase the time between failures. Tests on a
random sample of 10 modified components resulted in the following times (in hours) between
failures.

518 548 561 523 536

499 538 557 528 563

At the 0.05 significance level, test the claim that for the modified components, the mean time
between failures is greater than 520 hours.

1. State the Null and Alternative hypothesis

H0 : μ = 520 hours
HA : μ > 520 hours

2. Level of significance is 5% or 0.05

Test type: one-tailed (right-tailed) test at α = 0.05

Critical value: with n − 1 = 9 degrees of freedom, t = 1.83

3. Test statistic: t = (x̄ − μ0) / (s/√n) = (537.1 − 520) / (20.70/√10) ≈ 2.612

4. Decision:

We reject the null hypothesis since the test statistic t = 2.612 is greater than the critical value
1.83 and the one-sided p-value of 0.014 is less than the significance level of 0.05.

5. Conclusion:

As a result, at a 5% level of significance, we may conclude that the mean time between
failures for the modified components is greater than 520 hours.
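
The t statistic for this problem can be checked from the ten failure times. This is a minimal sketch in standard-library Python, not the group's R output. The standard library has no t distribution, so the critical value 1.83 is taken from the solution text rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Times between failures (hours) for the 10 modified components.
times = [518, 548, 561, 523, 536, 499, 538, 557, 528, 563]
mu0 = 520
t_crit = 1.83  # table value quoted in the solution (df = 9, one-tailed, 0.05)

n = len(times)
# One-sample t statistic using the sample standard deviation.
t_stat = (mean(times) - mu0) / (stdev(times) / sqrt(n))

# Right-tailed test: reject H0 when the statistic exceeds the critical value.
reject_h0 = t_stat > t_crit
```

The sample mean is 537.1 hours and the statistic comes out near 2.612, in line with the decision to reject the null hypothesis.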

Perform the required hypothesis test.


3) A researcher was interested in comparing the salaries of female and male employees at a
particular company. Independent simple random samples of 8 female employees and 15 male
employees yielded the following weekly salaries (in dollars).
At the 5% significance level, does the data provide sufficient evidence to conclude that at this
particular company the mean salary of female employees is less than the mean salary of male
employees? Assume that the variances are equal. Present boxplots to support the conclusion.

1. Declare the hypothesis:

H0 : μ1 = μ2 (There is no significant difference in mean salary between men and women.)
HA : μ1 < μ2 (The mean salary of female employees is lower than that of male
employees.)
Here μ1 is the mean salary of female employees and μ2 is the mean salary of male
employees.

2-3. Level of significance, critical value, and test statistic

The level of significance is 5% or 0.05.
Because HA is μ1 < μ2, a one-tailed test was used. Because the rejection region is on the
left side of the curve, the one-tailed critical value with 8 + 15 − 2 = 21 degrees of freedom
is t = -1.721.
The t-statistic is -0.881.

4. Decision:

The t-statistic is greater than the critical value (-0.881 > -1.721).
The p-value is greater than the significance level (0.194 > 0.05).
Fail to reject the null hypothesis.

5. Conclusion:

At a 5% significance level, there is insufficient evidence that at this company the mean salary
of female employees is less than the mean salary of male employees.
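
The equal-variance (pooled) t statistic used in this problem can be sketched as follows. The helper name `pooled_t` and the salary figures are hypothetical, because the salary table from the activity is not reproduced in the text. The example shows only the formula, not the reported result of -0.881.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Return (t statistic, degrees of freedom) for an equal-variance t-test."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t = (mean(sample1) - mean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical weekly salaries, NOT the data from the activity.
female = [500, 520, 540]
male = [510, 530, 550, 570]
t_stat, df = pooled_t(female, male)
```

With the real samples of 8 and 15 employees the degrees of freedom would be 21, which is the row of the t table that gives the critical value -1.721.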

1. Comparison of medians: The medians of the two boxplots are very similar.

2. Comparison of dispersion: The male salary boxplot is slightly more dispersed, as shown by
its higher IQR, and the male salary data are more spread out, as shown by the longer whisker.

3. Potential outliers: There are no outliers.

4. Skewness: Both boxplots are slightly skewed to the left.

5. Conclusion: Because the medians differ by so little, it is hard to detect a difference.
Because the male salary data are more dispersed, we propose that the survey be conducted
with a larger sample size. Overall, the data are insufficient to infer that the two mean
salaries are different.

4) A researcher was interested in comparing the heights of women in two different countries.
Independent simple random samples of 9 women from country A and 9 women from country B
yielded the following heights (in inches).
At the 10% significance level, does the data provide sufficient evidence to conclude that the
mean height of women in country A is greater than the mean height of women in country B?
Assume that the variances are NOT equal. Present boxplots to support the conclusion.

Analyzing the given: INDEPENDENT random samples, sample sizes are LESS THAN 30, 10%
significance level, UNEQUAL variance, UNKNOWN variance.

1. Declare the Hypothesis

H0 : μA = μB (There is no significant difference between the mean heights of women from
countries A and B.)
HA : μA > μB (The mean height of women in country A is greater than that of women in
country B.)

2-3. Get the test statistic, critical value, and p-value

Because HA is μA > μB, the test is one-tailed. Because the rejection region is on the right
side of the curve, the one-tailed critical value is t = 1.341.
The t-statistic is 1.8895.
The p-value is 0.039.

4. Decision:

The test statistic is greater than the critical value (1.8895 > 1.341).
The p-value is less than the significance level (0.039 < 0.10).
The null hypothesis is rejected.

5. Conclusion:

At a 10% significance level, there is sufficient evidence to conclude that the mean height of
women in country A is greater than the mean height of women in country B.
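
When the variances are not assumed equal, the usual choice is Welch's t-test, which is sketched below. The helper name `welch_t` and the heights are hypothetical, because the height table is not reproduced in the text. The degrees of freedom follow the Welch-Satterthwaite approximation and are generally not a whole number.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate df) for an unequal-variance t-test."""
    # Squared standard error contributed by each sample.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical heights in inches, NOT the data from the activity.
country_a = [64, 66, 68]
country_b = [60, 63, 66, 61]
t_stat, df = welch_t(country_a, country_b)
```

For a right-tailed test the statistic is compared with the upper critical value at the approximate degrees of freedom, as in the decision step above.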
1. Comparison of medians: Country B has the lower median, and it lies outside the box of
country A's boxplot.

2. Comparison of dispersion: The ranges of the two boxplots, and hence the dispersion of their
data, are nearly identical. Their IQRs are only slightly different.

3. Potential outliers: There are no outliers.

4. Skewness: Both boxplots are skewed to the right.

5. Conclusion: It is clear from the boxplots that there is a difference. Country B's median is
outside the box of country A's boxplot, which suggests a noticeable difference. The median
height of country B falls only within the first quartile of country A.

5) The table below shows the weights, in pounds, of seven subjects before and after following a
particular diet for two months.

At the 1% significance level, does the data provide sufficient evidence to conclude that the diet is
effective in reducing weight? Present boxplots to support the conclusion.

Analysis of the given: The data are dependent since this is a before-and-after experiment. For
each set of data, the sample size is less than 30. The variances are unknown and the significance
level is 1%. For two paired means, we may apply the paired t-test.

1. Declare the hypothesis:

H0 : μbefore = μafter (There is no difference between the before and after weights)
HA : μbefore > μafter (The diet is effective in reducing weight)
2-3. Get the critical value, test statistic, and p-value

Because HA is μbefore > μafter, the test is one-tailed. Because the rejection region is on the
right side of the curve, the one-tailed critical value with 7 − 1 = 6 degrees of freedom is
t = 3.143.
The t-statistic is 1.954.
The p-value is 0.049.

4. Decision:

Our test statistic is less than the critical value (1.954 < 3.143).
The p-value is greater than the significance level (0.049 > 0.01).
Do not reject the null hypothesis.

5. Conclusion:

At a 1% significance level, there is insufficient evidence to conclude that the diet is effective in
reducing weight.
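
The paired t-test reduces to a one-sample t-test on the before-minus-after differences, as the sketch below shows. The helper name `paired_t` and the weights are hypothetical, because the weight table is not reproduced in the text.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    # Work on the per-subject differences, before minus after.
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical weights in pounds, NOT the data from the activity.
before = [180, 170, 160]
after = [175, 168, 161]
t_stat, df = paired_t(before, after)
```

A positive statistic points toward weight loss. With seven subjects the degrees of freedom are 6, matching the critical value 3.143 used above.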
1. Comparison of medians: The difference between the median of the before weight and the
median of the after weight is quite small (166 versus 168). Each median lies within the box of the
other boxplot.

2. Comparison of dispersion: The before weight range is greater than the after weight range.
However, because the IQR of the before boxplot is 33 and the IQR of the after boxplot is 35, the
box sizes are almost identical.

3. Potential outliers: There are no outliers.

4. Skewness: The before weight is roughly symmetrical, with a longer whisker to the right, but the
after weight is left-skewed, with a short whisker on the right.

5. Conclusion: Apart from the small difference in skewness, it is difficult to judge whether the diet
is helpful because the medians and means differ by only a small amount.

6) In a clinical study of an allergy drug, 109 of the 201 subjects reported experiencing significant
relief from their symptoms. At the 0.01 significance level, do the data provide sufficient evidence
to conclude that a majority of all those using the drug experience relief?

H0: p = 0.5   HA: p > 0.5

α = 0.01
Test statistic: z = (p̂ − p0) / √(p0(1 − p0)/n) = (109/201 − 0.5) / √(0.5 × 0.5/201) ≈ 1.20

Critical value: z = 2.33

Fail to reject the null hypothesis. At the 0.01 (1%) significance level, there is insufficient
evidence to support the claim that a majority of those using the drug experience relief, even
though about 54% of the sample reported relief.

Declare the hypothesis: H0: p = 0.50   HA: p > 0.50
Level of significance = 0.01
Critical value = 2.33 (positive one-tailed z-test)
Test statistics: chi-squared, p-value, and z

Conclusion:

At a 1% significance level, there is insufficient evidence to conclude that a majority of those
who use the drug experience relief.
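
The one-proportion z-test can be checked directly from the counts 109 and 201. This is a minimal sketch using the plain large-sample formula without a continuity correction. Software that applies a correction, such as R's `prop.test` with its default settings, reports a slightly smaller statistic. Variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 109, 201   # subjects reporting relief, total subjects
p0, alpha = 0.5, 0.01     # hypothesized proportion, significance level

p_hat = successes / n
# Standard error is computed under H0, so it uses p0 rather than p_hat.
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Right-tailed critical value and p-value from the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z_stat)

reject_h0 = z_stat > z_crit
```

Under this formula the statistic is roughly 1.20, below the critical value of about 2.33, so the null hypothesis is not rejected.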

7) A marketing survey involves product recognition in New York and California. Of 558 New
Yorkers surveyed, 193 recognized the product while 196 out of 614 Californians recognized the
product. At the 5% significance level, do the data provide sufficient evidence to conclude that the
recognition rate in New York differs from the recognition rate in California?

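1. Declare the hypothesis:
H0: p1 = p2
HA: p1 ≠ p2
2. Level of significance = 0.05
3. Critical value = ±1.96 (two-tailed z-test)
4. Test statistic: z = (p̂1 − p̂2) / √(p̂(1 − p̂)(1/n1 + 1/n2)) ≈ 0.97, where p̂1 = 193/558,
p̂2 = 196/614, and the pooled proportion is p̂ = 389/1172 (equivalently, chi-square ≈ 0.94 with
1 degree of freedom)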

5. Decision:

● The chi-square statistic lies between the table values for tail probabilities 0.9 and 0.1, which
means that the p-value is also between 0.1 and 0.9
● The p-value is 0.3331
● Because the test statistic does not fall in the critical region and the p-value is greater than
the significance level, we fail to reject the null hypothesis

6. Conclusion: Thus, at a 5% significance level, there is insufficient evidence to conclude that the
recognition rate in New York differs from the recognition rate in California.
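
The two-proportion z-test can be reproduced from the survey counts. The sketch below uses the pooled-proportion formula without a continuity correction, and the variable names are chosen for illustration. The two-sided p-value comes from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 193, 558   # New Yorkers who recognized the product, sample size
x2, n2 = 196, 614   # Californians who recognized the product, sample size
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
# Under H0 both groups share one proportion, estimated by pooling the samples.
pooled = (x1 + x2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z_stat = (p1 - p2) / se

# Two-tailed critical value and p-value.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

reject_h0 = abs(z_stat) > z_crit
```

The resulting p-value is about 0.333, in line with the 0.3331 reported in the decision step, and `reject_h0` is `False`.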

8) A consumer magazine wants to compare the lifetimes of ballpoint pens of three different
types. The magazine takes a random sample of pens of each type in the following table.
Does the data indicate that there is a difference in the mean lifetime for the three brands of
ballpoint pens? Use α = 0.01. Which pairs are significant? Present boxplots to support each
conclusion.

1. Declare the hypothesis:

● H0: μ1 = μ2 = μ3
● H1: At least one brand's mean lifetime is different from another's
2. Level of significance = 0.01
3. Test statistic
4. Decision: Since the p-value is greater than the significance level of 0.01, we do not reject the null
hypothesis.
5. Conclusion: Thus, at a 1% significance level, there is insufficient evidence of a difference in the mean
lifetime of the three different brands of ballpoint pens.

From the pairwise significance values, no pair of means is significantly different from each other.
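
The F statistic behind a one-way ANOVA can be sketched as follows. The helper name `one_way_f` and the lifetimes are hypothetical, because the pen lifetime table is not reproduced in the text. The standard library has no F distribution, so the example stops at the statistic and its degrees of freedom. The p-value would come from an F table or statistical software.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical lifetimes for three pen brands, NOT the data from the activity.
brands = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_f(brands)
```

A large F relative to the critical value at (df between, df within) would indicate that at least one brand mean differs. Pairwise follow-up comparisons are needed to say which.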

9) The data below arise from a large study of risk-taking. Students were randomly assigned to
three different treatments labeled AA, C, and NC. Students were administered two parallel forms
of a test called ‘low’ and ‘high’. Perform a two-way ANOVA.

H0: There is no significant difference in student results among the treatment types.
HA: There is a significant difference in student results for at least one treatment type.

H0: There is no significant difference in student results between the two parallel forms of the test.
HA: There is a significant difference in student results between the two parallel forms of the test.

H0: There is no interaction effect between the treatment groups and the two parallel forms of the test.
HA: There is an interaction effect between the treatment groups and the two parallel forms of the test.

Significance level: 0.05 or 5%
