Lab 3 Manual
STATS 13 – Kim
Winter 2025
## [1] TRUE
5 < 3
## [1] FALSE
You can compare a vector against a single number with a relational operator; the comparison is then performed on
every element of the vector. The typical operators are demonstrated as follows.
3 > c(1, 3, 5) # Greater than
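As a minimal sketch, the other relational operators can be applied to the same vector in the same way. The name x is used here only for illustration, and the result of each comparison is written as a comment.

```r
x <- c(1, 3, 5)  # hypothetical name for the vector used above

3 > x    # greater than:             TRUE FALSE FALSE
3 >= x   # greater than or equal to: TRUE  TRUE FALSE
3 < x    # less than:                FALSE FALSE  TRUE
3 <= x   # less than or equal to:    FALSE  TRUE  TRUE
3 == x   # equal to:                 FALSE  TRUE FALSE
3 != x   # not equal to:             TRUE FALSE  TRUE
```

Each comparison returns a logical vector with one TRUE or FALSE per element of x.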
## [1] 3
mean(logical_vec)
## [1] 0.75
## [1] 193
##      flavor weight diameter taste
## 5  Milk Tea  35.78     7.35     5
## 7  Milk Tea  35.90     7.20     5
## 9  Milk Tea  35.85     6.16     5
## 12 Milk Tea  35.76     6.75     5
## 13 Milk Tea  36.33     7.18     5
## 22 Milk Tea  36.80     7.33     5
Now suppose we wanted a dataset with either a large weight or a taste rating of 5. We can use the | symbol
to require at least one of these conditions.
large_or_tasty_index <- chocopie$weight >= 35 | chocopie$taste == 5
chocopie_large_or_tasty <- chocopie[large_or_tasty_index, ]
head(chocopie_large_or_tasty)
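To see how | differs from &, here is a minimal sketch on a small made-up data frame. The object toy and its values are hypothetical and are not taken from chocopie.csv.

```r
# Hypothetical data, for illustration only
toy <- data.frame(
  weight = c(34.2, 35.5, 36.1, 33.9),
  taste  = c(5, 5, 3, 2)
)

# & keeps rows where BOTH conditions hold
both_index <- toy$weight >= 35 & toy$taste == 5
toy[both_index, ]

# | keeps rows where AT LEAST ONE condition holds
either_index <- toy$weight >= 35 | toy$taste == 5
toy[either_index, ]
```

Every row kept by & is also kept by |, so the | subset is never the smaller of the two.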
With Your TA
Question 1: (1 point) Read the chocopie.csv file into an object called chocopie. Print the
first 6 rows and verify that they match the lab manual.
On Your Own
Question 2: (1 point) Save a subset of the chocopie dataset that contains only data with taste
ratings of 4 or more. Save this into an object called chocopie_subset_1 and print the first 6 rows.
Question 3: (1 point) Save a subset of the chocopie dataset that contains only data with taste
ratings of exactly 3 and weights of 35 or less. Save this into an object called chocopie_subset_2 and
print the first 6 rows.
Question 4: (1 point) Save a subset of the chocopie dataset that contains only data with taste
ratings of 2 or taste ratings of 4. Save this into an object called chocopie_subset_3 and print the
first 6 rows.
## [1] 4
The first argument we put into sample() is the sample space that we would like to draw from. In this case,
we put in the vector 1:6 since it is the sample space of a die. The size argument indicates how many
samples we would like to take, and the replace argument indicates whether we would like to sample with or without
replacement. We will always set this to TRUE; otherwise, an outcome is removed from the sample space
after it is drawn. By default, the sample() function chooses all outcomes with equal probability. As another
example, consider drawing 5 samples from the numbers 1 through 10, all with equal probability.
set.seed(544)
sample(1:10, size = 5, replace = TRUE)
## [1] 4 10 2 8 3
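As a rough illustration of what the replace argument changes, the sketch below draws six values from 1:6 both ways. The seed 123 and the names no_repl and with_repl are arbitrary choices made only for this example.

```r
set.seed(123)  # arbitrary seed, used only so this sketch is reproducible

# Without replacement: each outcome can appear at most once,
# so six draws from 1:6 are just a shuffle of 1:6
no_repl <- sample(1:6, size = 6, replace = FALSE)
sort(no_repl)

# With replacement: every draw sees the full sample space,
# so repeated values are possible (like rolling a die six times)
with_repl <- sample(1:6, size = 6, replace = TRUE)
with_repl
```

Asking for more draws than there are outcomes, such as size = 7 from 1:6, is an error when replace = FALSE.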
Use what you learned about the sample() function to answer the following questions.
On Your Own
Question 5: (1 point) Set the seed to 1464 and draw 10 samples from the numbers 1 through 4, each
having equal probability.
Question 6: (1 point) Set the seed to 8535 and draw 7 samples from the numbers 10 through 30,
each having equal probability.
## [1] 2
## [1] 5
## [1] 10
## [1] 17
## [1] 26
Here, we calculated x^2 + 1 for the numbers 1 through 5. Thus we have effectively executed 5 lines of code
with a single line of code inside a for loop.
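A minimal sketch of a loop that prints these five values, assuming a loop variable named x:

```r
# Loop over the numbers 1 through 5
for (x in 1:5) {
  # Compute x^2 + 1 and print it on each pass
  print(x^2 + 1)
}
```

The body of the loop is written once, and R runs it once for each value in 1:5.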
On Your Own
Question 7: (1 point) Use a for loop to print your full name three times.
Question 8: (1 point) Use a for loop to print the calculation of (x − 2)^3 for the numbers 5 through 10.
Non-Parametric Bootstrap
We now turn our attention to the Non-Parametric Bootstrap, a method of approximating a distribution
based on the data. It does this by empirically approximating the original distribution with a discrete one
that samples each of the n data points with probability 1/n (equal probability). Combining the sample() function
and a for loop makes it easy to simulate these distributions.
(1) Setup:
Our hypotheses would be as follows.
H0 : E[X] = 3.7
H1 : E[X] ≠ 3.7
Let us assume α = 0.05. Notice that we use E[X] since we are talking about a population mean, not a
parameter.
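Since later questions ask for hypotheses in LaTeX, here is one way the hypotheses above might be typeset. The align* environment assumes the amsmath package is available; other layouts work equally well.

```latex
\begin{align*}
H_0 &: E[X] = 3.7 \\
H_1 &: E[X] \neq 3.7
\end{align*}
```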
## [1] 3.7
We can see that the mean of F_emp indeed matches the null hypothesis. Now, we will sample from F_emp to
construct a sampling distribution of the mean, X̄_emp.
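As a reminder of where F_emp comes from, re-centering is assumed here to mean shifting every observation by the same amount so that the sample mean lands exactly on the value claimed by H0, leaving the spread of the data unchanged. A minimal sketch on a made-up vector (the names x and null_mean and the data values are hypothetical):

```r
# Hypothetical ratings, for illustration only
x <- c(3, 4, 5, 4, 3, 5, 4)
null_mean <- 3.7  # value of E[X] under H0

# Subtract the sample mean, then add the null mean
F_emp <- x - mean(x) + null_mean

mean(F_emp)  # equals null_mean, up to floating-point rounding
sd(F_emp)    # same spread as the original x
```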
M <- 1000 # Number of bootstrap samples
n <- length(chocopie$taste) # Sample size
}
Note that the length() function simply returns the number of elements in a vector, which is an easy way to get
the sample size. Recall that square brackets allow you to access a given element of a vector, hence X_bar_emp[i]
yields the ith element of X_bar_emp. Notice that we can also use them to assign a value to the ith element, as
you see here.
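Putting sample(), length(), and indexed assignment together, a bootstrap loop of this kind can be sketched as follows. The data vector, the seed 2025, the name boot_sample, and the use of numeric(M) to pre-allocate storage are assumptions made for this illustration and are not the manual's own code.

```r
set.seed(2025)  # arbitrary seed for this sketch

# Hypothetical re-centered data standing in for F_emp (its mean is 3.7)
F_emp <- c(3, 4, 5, 4, 3, 5, 4) - 4 + 3.7

M <- 1000            # number of bootstrap samples
n <- length(F_emp)   # sample size

X_bar_emp <- numeric(M)  # storage for the M bootstrap means

for (i in 1:M) {
  # Draw n values from F_emp, each with probability 1/n
  boot_sample <- sample(F_emp, size = n, replace = TRUE)
  # Store the mean of this bootstrap sample in position i
  X_bar_emp[i] <- mean(boot_sample)
}

length(X_bar_emp)
```

Creating X_bar_emp before the loop gives X_bar_emp[i] a slot to fill on every pass.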
Let’s view our sampling distribution with the histogram() function. Recall that you will need to load the
mosaic package to use it, and note that mosaic is loaded automatically in your templates.
library(mosaic)
histogram(X_bar_emp)
[Figure: histogram of X_bar_emp; x-axis: X_bar_emp, y-axis: Density]
# Count number of samples in these regions
upper_sum <- sum(X_bar_emp >= upper_ex)
lower_sum <- sum(X_bar_emp <= lower_ex)
# Calculate proportion
pvalue <- (upper_sum + lower_sum)/M
pvalue
## [1] 0.009
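The cutoffs upper_ex and lower_ex mark where bootstrap means become at least as extreme as the observed mean. A minimal sketch of one common way to set them for a two-tailed test, assuming they are placed symmetrically about the null mean of 3.7 at the distance of the observed mean of 3.824 (the names null_mean, x_obs, and distance are hypothetical):

```r
null_mean <- 3.7   # value under H0
x_obs <- 3.824     # observed sample mean

# How far the observed mean is from the null value
distance <- abs(x_obs - null_mean)

# Equally extreme points on either side of the null mean
upper_ex <- null_mean + distance
lower_ex <- null_mean - distance

c(lower_ex, upper_ex)
```

Counting bootstrap means beyond both cutoffs is what makes the resulting p-value two-tailed.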
Thus we have a p-value of 0.009. Critical regions are much simpler. Since this is a two-tailed test, we want to
figure out what values of the sampling distribution correspond to the bottom 0.025 area under the curve and
the top 0.025 area under the curve. In other words, we want the 2.5 and 97.5 percentiles. We can do this
with the quantile() function.
quantile(X_bar_emp, probs = c(0.025, 0.975))
## 2.5% 97.5%
## 3.606 3.798
Therefore the critical regions are (−∞, 3.606] and [3.798, ∞).
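In LaTeX, these two critical regions might be written as a union of intervals, for example:

```latex
\[
(-\infty,\ 3.606] \cup [3.798,\ \infty)
\]
```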
(4) Decision
Since our p-value of 0.009 is less than or equal to α = 0.05, we reject H0. Equivalently, we can see that
our observed mean x_obs = 3.824 lies inside the critical regions (−∞, 3.606] and [3.798, ∞), thus we reject H0.
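Both decision rules can be checked directly in R. This sketch hard-codes the values reported above; the variable names alpha, crit, reject_by_pvalue, and reject_by_region are hypothetical.

```r
alpha  <- 0.05
pvalue <- 0.009
crit   <- c(3.606, 3.798)  # 2.5% and 97.5% quantiles reported above
x_obs  <- 3.824            # observed sample mean

# Rule 1: reject H0 when the p-value is at most alpha
reject_by_pvalue <- pvalue <= alpha

# Rule 2: reject H0 when the observed mean falls in either critical region
reject_by_region <- x_obs <= crit[1] | x_obs >= crit[2]

c(reject_by_pvalue, reject_by_region)  # both TRUE, so we reject H0
```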
(5) Conclusion
We conclude that Professor Kim’s taste ratings of chocopies differ from those of other taste raters, on average
(at significance level α = 0.05).
Now, using this example as a reference, conduct the following hypothesis test using the non-parametric bootstrap.
With Your TA
Consider the weight variable in the chocopie dataset. Suppose the chocopies have an advertised
weight of 35 grams. Professor Kim wonders if his chocopies are the same weight as advertised or
not. Consider the research question: Are Professor Kim’s chocopie weights (X) different than the
advertised weight, on average? Assume α = 0.05.
Question 9: (1 point) State the null and alternative hypotheses. Write them in LaTeX code.
Question 10: (1 point) Use the re-centering technique to construct F_emp such that it obeys H0.
Print the mean to verify H0 is true.
Question 11: (1 point) Using a loop and the sample() function like the example above, create
X̄_emp. Set M = 1000, and use 7947 as a seed. Print a histogram of this sampling distribution.
Question 14: (1 point) Calculate and print the critical values and state the critical regions. Write
your regions using LaTeX code (your TA will teach you).
On Your Own
Question 15: (1 point) What is the decision? You may use any scale to make your decision.
On Your Own
Now, you will conduct a hypothesis test completely on your own, using the re-centering technique you learned.
For Questions 17 to 24, your R and LaTeX code should mirror that of Questions 9 to 16. However, it is up to
you to modify the correct parts of the code to conduct the new hypothesis test correctly.
On Your Own
Consider the diameter variable in the chocopie dataset. Suppose the chocopies have an advertised
diameter of 6.5 cm. Professor Kim wonders if his chocopies are the same diameter as advertised or
not. Consider the research question: Are Professor Kim’s chocopie diameters (X) different than the
advertised diameter, on average? Assume α = 0.01.
Question 17: (1 point) State the null and alternative hypotheses. Write them in LaTeX code.
Question 18: (1 point) Use the re-centering technique to construct F_emp such that it obeys H0.
Print the mean to verify H0 is true.
Question 19: (1 point) Using a loop and the sample() function like the example above, create
X̄_emp. Set M = 1000, and use 5820 as a seed. Print a histogram of this sampling distribution.
Question 22: (1 point) Calculate and print the critical values and state the critical regions. Write
your regions using LaTeX code.
Question 23: (1 point) What is the decision? You may use any scale to make your decision.