NHUMUOI Ws4 Spring25 Summary Statistics
MATH105-Intro to Stats Worksheet #4 Spring 25
50% of Facebook users have 100 or more friends (the median is 100), while the average number of friends (the mean) is 190. Because the mean (190) is well above the median (100), a few users with very large friend counts are pulling the mean upward, which suggests that the distribution of the number of Facebook friends is right-skewed.
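A small hypothetical example shows how a long right tail pulls the mean above the median. The five friend counts below are made up for illustration (they are not real Facebook data) and were chosen so that the mean is 190 and the median is 100:

```python
from statistics import mean, median

# Hypothetical friend counts (made up for illustration, not survey data).
# The single large value (600) forms a long right tail.
friends = [50, 80, 100, 120, 600]

avg = mean(friends)    # pulled upward by the 600
mid = median(friends)  # middle value of the sorted list

print(f"mean = {avg}, median = {mid}")
if avg > mid:
    print("mean > median: consistent with a right-skewed distribution")
```

A mean well above the median is a common hint of right skew, but it is a rule of thumb; a histogram or box plot is the more reliable check.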
2. [Shape of distribution]
(a) The housing prices are right-skewed: 25% of houses cost below $350,000, 50% below $450,000, and 75% below $1,000,000, with some houses costing more than $6,000,000. The gap from the median to Q3 ($550,000) is much larger than the gap from Q1 to the median ($100,000), and the even larger gap between Q3 = $1,000,000 and the extreme values above $6,000,000 indicates a long right tail. Therefore, the median is the best measure of central tendency, and the IQR is the best measure of variability.
(b) 25% of houses cost below $300,000, 50% below $600,000, and 75% below $900,000, with very few houses exceeding $1,200,000. The gaps Q2 − Q1 and Q3 − Q2 are both $300,000, so the middle of the distribution is nearly symmetric, with at most a slight right skew caused by the few expensive homes. Because the shape is close to symmetric, the mean and standard deviation would also be reasonable here; the median still effectively represents the typical house price, and the IQR appropriately captures the variability while limiting the influence of rare expensive homes.
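The quartile gaps used in (a) and (b) can be checked with simple subtraction. This is a minimal sketch that uses only the quartiles quoted above; `quartile_gaps` is a hypothetical helper, and the extreme values (above $6,000,000 in (a), about $1,200,000 in (b)) are not part of the calculation.

```python
def quartile_gaps(q1, q2, q3):
    """Return the lower gap (Q2 - Q1), the upper gap (Q3 - Q2), and the IQR."""
    return q2 - q1, q3 - q2, q3 - q1

# Quartiles quoted in parts (a) and (b), in dollars.
parts = {
    "(a)": (350_000, 450_000, 1_000_000),
    "(b)": (300_000, 600_000, 900_000),
}

for label, (q1, q2, q3) in parts.items():
    lower, upper, iqr = quartile_gaps(q1, q2, q3)
    # An upper gap much larger than the lower gap suggests right skew;
    # equal gaps suggest a roughly symmetric middle 50%.
    print(f"{label}: Q2-Q1 = {lower:,}, Q3-Q2 = {upper:,}, IQR = {iqr:,}")
```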
(c) This distribution is right-skewed: Q1 and the median (Q2) are near zero because most people drink little or no alcohol, while a few drink heavily, creating a long right tail. The median best represents typical consumption, and the IQR measures variability well because it uses only the middle 50% of the data and is not affected by the heavy drinkers in the tail.
(d) Most employees earn similar salaries, but a few high-level executives earn disproportionately more. This creates a right-skewed distribution with a large gap between the upper quartile (Q3) and the highest salaries; the skew is pronounced only when executive pay is far above typical pay. The median is the most suitable measure of central tendency, and the IQR best represents the spread of typical salaries because it ignores the extreme outliers.
3. [Variance and standard deviation (std)] The time between an electric light
stimulus and a bar press to avoid a shock was noted for each of the five conditioned rats.
Use the definition (formula) to compute the sample variance and the standard deviation
(std). Shock avoidance times (in seconds) are:
a. 5, 4, 3, 1, 3
b. 3, 3.5, 3.5, 2.8, 3.2
Compute and then compare the mean and std in (a) and (b).
(a)
$$\bar{x} = \frac{5+4+3+1+3}{5} = 3.2$$
$$s^2 = \frac{(5-3.2)^2+(4-3.2)^2+(3-3.2)^2+(1-3.2)^2+(3-3.2)^2}{5-1} = \frac{8.8}{4} = 2.2$$
$$s = \sqrt{2.2} \approx 1.48$$
(b)
$$\bar{x} = \frac{3+3.5+3.5+2.8+3.2}{5} = 3.2$$
$$s^2 = \frac{(3-3.2)^2+(3.5-3.2)^2+(3.5-3.2)^2+(2.8-3.2)^2+(3.2-3.2)^2}{5-1} = \frac{0.38}{4} = 0.095$$
$$s = \sqrt{0.095} \approx 0.31$$
Both datasets have the same mean of 3.2 seconds. Dataset (a) has a larger variance (2.2) and standard deviation (1.48 seconds), so its times are more spread out around the mean. Dataset (b) has a much smaller variance (0.095) and standard deviation (0.31 seconds), so its times are more consistent.
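The hand calculations above can be reproduced with a short sketch that applies the definition directly (sum of squared deviations divided by n − 1) and cross-checks the result against Python's `statistics.variance`, which also uses the n − 1 divisor. The helper name `sample_variance` is hypothetical.

```python
import math
import statistics

def sample_variance(data):
    """Sample variance from the definition: sum of squared deviations / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    return ss / (n - 1)

times_a = [5, 4, 3, 1, 3]
times_b = [3, 3.5, 3.5, 2.8, 3.2]

for label, times in (("(a)", times_a), ("(b)", times_b)):
    var = sample_variance(times)
    std = math.sqrt(var)
    # Cross-check against the standard library's sample variance (n - 1 divisor).
    assert math.isclose(var, statistics.variance(times))
    print(f"{label}: mean = {sum(times) / len(times):.2f}, s^2 = {var:.3f}, s = {std:.2f}")
```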
4. [Quantiles] The following data give noise levels (in decibels) measured at
different times directly outside of Grand Central Station in Manhattan.
82, 89, 94, 110, 74, 122, 112, 95, 100, 78, 65, 60, 90, 83, 87, 75, 114, 85
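One common way to get the quartiles of these 18 readings is the "median of each half" rule used in many introductory courses. The sketch below assumes that convention, and the helper name `quartiles_median_of_halves` is hypothetical. Software defaults differ: for example, Python's `statistics.quantiles` interpolates (its default is `method='exclusive'`), so its Q1 and Q3 can differ slightly.

```python
from statistics import median

def quartiles_median_of_halves(data):
    """Q1, median, Q3 using the 'median of each half' rule.

    With an odd count the overall median is excluded from both halves;
    with an even count the data split evenly into two halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values below the median position
    upper = xs[half + (n % 2):]  # values above the median position
    return median(lower), median(xs), median(upper)

noise = [82, 89, 94, 110, 74, 122, 112, 95, 100, 78, 65, 60, 90, 83, 87, 75, 114, 85]

q1, q2, q3 = quartiles_median_of_halves(noise)
iqr = q3 - q1
print(f"min = {min(noise)}, Q1 = {q1}, median = {q2}, Q3 = {q3}, max = {max(noise)}")
print(f"IQR = {iqr}")
```

For these readings the helper gives Q1 = 78 dB, median = 88 dB, and Q3 = 100 dB, so the IQR is 22 dB.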
5. [Robust statistics]
(a) The median would best represent the typical income of the 42 patrons at this coffee shop. Before the two extremely high incomes ($225,000 and $250,000) were added, the mean was $65,090 and the median was $65,240; the two values were very close, consistent with a roughly symmetric distribution. After these high incomes were added, the mean jumped to $73,300, while the median increased only slightly, to $65,350. The large change in the mean shows its sensitivity to extreme values, whereas the median stayed stable. The median is therefore more robust than the mean when outliers are present, making it a better measure of typical income in this situation.
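The jump in the mean can be checked from the summary statistics alone. This sketch assumes there were 40 patrons (42 - 2) before the two high incomes were added, which follows from the description above. The median cannot be updated this way, because it depends on the individual incomes rather than on their total.

```python
# Assumption: 40 patrons (42 - 2) before the two high incomes were added,
# with mean income $65,090 as stated in part (a).
n_before = 40
mean_before = 65_090
new_incomes = [225_000, 250_000]

total_before = n_before * mean_before  # sum of the original 40 incomes
n_after = n_before + len(new_incomes)
mean_after = (total_before + sum(new_incomes)) / n_after

print(f"mean after adding the two high incomes: ${mean_after:,.0f}")
```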
(b) The interquartile range (IQR) would best represent the variability in the incomes of the 42 patrons. Before the two high incomes were added, the standard deviation was $2,122, but it increased drastically to $37,321 afterward. This sharp increase shows that the standard deviation is highly sensitive to outliers, making it an unreliable measure of variability here. In contrast, the IQR depends only on the middle 50% of the data and is largely unaffected by extreme values. Therefore, the IQR is a more robust and reliable measure of variability than the standard deviation when outliers are present.
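The new standard deviation can also be recomputed from the summary statistics, which shows exactly where the increase comes from. This is a sketch under two assumptions: there were 40 original patrons, and $2,122 is the sample standard deviation (n − 1 divisor). It uses the identity that re-centering n values from their own mean to a new mean adds n times the squared difference of the two means to the sum of squared deviations.

```python
import math

# Summary statistics stated in part (b), before the two high incomes.
# Assumptions: 40 original patrons; $2,122 is the sample SD (n - 1 divisor).
n0, mean0, sd0 = 40, 65_090, 2_122
extra = [225_000, 250_000]

# Sum of squared deviations of the original 40 incomes about their own mean.
ss0 = (n0 - 1) * sd0 ** 2

n1 = n0 + len(extra)
mean1 = (n0 * mean0 + sum(extra)) / n1

# Moving the reference point from mean0 to mean1 adds n0 * (mean0 - mean1)^2;
# each new value contributes its own squared deviation from mean1.
ss1 = ss0 + n0 * (mean0 - mean1) ** 2 + sum((x - mean1) ** 2 for x in extra)
sd1 = math.sqrt(ss1 / (n1 - 1))

print(f"SD before: ${sd0:,}, SD after: ${sd1:,.0f}")
```

Most of the new sum of squares comes from the two added incomes, each more than $150,000 from the new mean. The IQR cannot be recomputed from these summaries, since it depends on the individual incomes near the quartiles.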