
NHUMUOI Ws4 Spring25 Summary Statistics

The document is a worksheet for an introductory statistics course, covering topics such as distribution shapes, variance, standard deviation, quantiles, and robust statistics. It includes examples and calculations related to Facebook friends, housing prices, noise levels, and income variability, highlighting the importance of using median and IQR in the presence of skewed data and outliers. The worksheet provides practical exercises for students to apply statistical concepts and methods.

MATH105-Intro to Stats Worksheet #4 Spring 25

Name: La Thi Nhu Muoi    Date: 16/01/2025

1. [Distribution shape: mean vs median]

Half of Facebook users have 100 or more friends, so the median is 100, while the mean
number of friends is 190. Because the mean (190) is well above the median (100), a
relatively small number of users with very many friends is pulling the average up, which
implies that the distribution of the number of Facebook friends is right-skewed.
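
A minimal sketch of this effect, using a small hypothetical list of friend counts (the numbers below are invented for illustration and are not from the worksheet). A few large values raise the mean well above the median:

```python
from statistics import mean, median

# Hypothetical friend counts (illustrative only): most users have a
# modest number of friends, and a few have very many.
friends = [20, 50, 80, 100, 100, 120, 150, 400, 690]

typical = median(friends)   # middle value of the sorted list
average = mean(friends)     # pulled upward by 400 and 690

print("median:", typical)
print("mean:", average)
```

Here the two largest counts account for more than half of the total, so the mean ends up far to the right of the median, which is the pattern described above.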

2. [Shape of distribution]

(a) The housing prices are right-skewed: 25% of houses cost below $350,000, 50% below
$450,000, and 75% below $1,000,000, with a significant number of houses costing more
than $6,000,000. The gap between Q2 and Q3 ($550,000) is much larger than the gap
between Q1 and Q2 ($100,000), and the further gap between Q3 = $1,000,000 and the
extreme values above $6,000,000 indicates a long right tail. Therefore, the median is the
best measure of central tendency, and the IQR is the best measure of variability.
(b) 25% of houses cost below $300,000, 50% below $600,000, and 75% below $900,000,
with very few houses exceeding $1,200,000. The gaps between consecutive quartiles are
equal ($300,000 each), so the distribution is nearly symmetric, with only a slight right
skew from the few houses above $1,200,000. The median effectively represents the typical
house price, and the IQR appropriately captures the variability, reducing the influence
of rare expensive homes.
(c) This distribution is right-skewed: the lower quartiles Q1 and Q2 are at or near zero
because most people do not drink, but some drink excessively, which produces a long
right tail. The median best represents typical consumption, and the IQR measures
variability by focusing on the middle 50% of the data, so it is not affected by the
heaviest drinkers.
(d) In this case, most employees earn similar salaries, but a few high-level executives
earn disproportionately more. This can create a right-skewed distribution with a large
gap between the upper quartile (Q3) and the extreme salaries, although the skew is
pronounced only when the executive salaries are far above the rest. The median is the
most suitable measure of central tendency, and the IQR best represents the spread of
typical salaries because it ignores the extreme outliers.
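
The comparison of quartile gaps in parts (a) and (b) can be put on a single scale with the quartile (Bowley) skewness coefficient. The worksheet does not use this measure; it is introduced here only as a sketch of the same reasoning. It is positive for a right skew and zero when the two gaps are equal:

```python
def quartile_skewness(q1, q2, q3):
    """Bowley skewness: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

# Quartiles quoted in parts (a) and (b) above.
skew_a = quartile_skewness(350_000, 450_000, 1_000_000)
skew_b = quartile_skewness(300_000, 600_000, 900_000)

print(f"(a) quartile skewness: {skew_a:.2f}")
print(f"(b) quartile skewness: {skew_b:.2f}")
```

The coefficient depends only on the quartiles, so it does not see the tail beyond Q3, such as the houses above $6,000,000 in (a) or above $1,200,000 in (b).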

3. [Variance and standard deviation (std)] The time between an electric light
stimulus and a bar press to avoid a shock was noted for each of the five conditioned rats.
Use the definition (formula) to compute the sample variance and the standard deviation
(std). Shock avoidance times (in seconds) are:

a. 5, 4, 3, 1, 3

b. 3, 3.5, 3.5, 2.8, 3.2

Compute and then compare the mean and std in (a) and (b).

(a) $\bar{x} = \frac{5+4+3+1+3}{5} = 3.2$

$s^2 = \frac{(5-3.2)^2+(4-3.2)^2+(3-3.2)^2+(1-3.2)^2+(3-3.2)^2}{5-1} = 2.2$

=> $s = \sqrt{2.2} \approx 1.48$


(b) $\bar{x} = \frac{3+3.5+3.5+2.8+3.2}{5} = 3.2$

$s^2 = \frac{(3-3.2)^2+(3.5-3.2)^2+(3.5-3.2)^2+(2.8-3.2)^2+(3.2-3.2)^2}{5-1} = 0.095$

=> $s = \sqrt{0.095} \approx 0.31$
Both datasets have the same mean of 3.2 seconds. Dataset (a) has a higher variance (2.2)
and standard deviation (1.48), so its values are more spread out. Dataset (b) has a lower
variance (0.095) and standard deviation (0.31), meaning that the times in (b) are more
consistent.
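
As a cross-check on the hand calculation, the same quantities can be computed with Python's standard `statistics` module. Its `variance` and `stdev` functions divide by n − 1, matching the sample formulas used above. The variable names are illustrative:

```python
from statistics import mean, stdev, variance

# Shock avoidance times (in seconds) from parts (a) and (b).
times_a = [5, 4, 3, 1, 3]
times_b = [3, 3.5, 3.5, 2.8, 3.2]

for label, times in (("a", times_a), ("b", times_b)):
    # variance() and stdev() are the sample versions (denominator n - 1).
    print(
        f"({label}) mean = {mean(times):.1f}, "
        f"s^2 = {variance(times):.3f}, s = {stdev(times):.2f}"
    )
```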

4. [Quantiles] The following data give noise levels (in decibels) measured at
different times directly outside of Grand Central Station in Manhattan.
82, 89, 94, 110, 74, 122, 112, 95, 100, 78, 65, 60, 90, 83, 87, 75, 114, 85

a) Determine the quartiles and IQR.


Arranged data:
60, 65, 74, 75, 78, 82, 83, 85, 87, 89, 90, 94, 95, 100, 110, 112, 114, 122
Because there is an even number of data values (18), the median is the mean of the ninth
and tenth values.
(87 + 89)/2 = 88 => Q2 = 88
To find the first quartile, we look at the median of the bottom half of the original data
set. This half has nine values, so its median is found in the fifth position:
60, 65, 74, 75, 78, 82, 83, 85, 87
=> The first quartile Q1 = 78.
To find the third quartile, we look at the median of the top half of the original data set,
again in the fifth position:
89, 90, 94, 95, 100, 110, 112, 114, 122
=> The third quartile Q3 = 100.
IQR = Q3 − Q1 = 100 − 78 = 22
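
The median-of-halves procedure above can be sketched in Python as follows. The function name `quartiles_by_halves` is illustrative. Library routines such as NumPy's percentile functions use interpolation rules by default and may return slightly different quartiles for the same data:

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    For an odd count, the overall median is left out of both halves
    (one common convention; the noise data has an even count).
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]     # bottom half of the sorted data
    upper = data[-half:]    # top half of the sorted data
    return median(lower), median(data), median(upper)

noise = [82, 89, 94, 110, 74, 122, 112, 95, 100,
         78, 65, 60, 90, 83, 87, 75, 114, 85]

q1, q2, q3 = quartiles_by_halves(noise)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```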

b) Draw a boxplot of the noise levels.
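
The values a boxplot of these data would display can be worked out from the quartiles in part (a). The sketch below assumes the common 1.5 × IQR rule for whiskers and outliers. The worksheet does not state which whisker rule is intended, so treat that choice as an assumption:

```python
noise = [82, 89, 94, 110, 74, 122, 112, 95, 100,
         78, 65, 60, 90, 83, 87, 75, 114, 85]

# Quartiles taken from part (a).
q1, q2, q3 = 78, 88, 100
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in noise if x < lower_fence or x > upper_fence]

# Whiskers extend to the most extreme values inside the fences.
whisker_low = min(x for x in noise if x >= lower_fence)
whisker_high = max(x for x in noise if x <= upper_fence)

print("box:", q1, q2, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

Under this rule no reading falls outside the fences, so the whiskers run from the minimum (60) to the maximum (122) and the box spans 78 to 100 with the median line at 88.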


c) Find the top 20% noise levels


The cutoff for the top 20% is the 80th percentile (P = 80).
Total count of values: N = 18
Percentile = (n/N) × 100, where n is the position in the sorted data.
Solving this formula for n:
n = (P × N)/100
= (80 × 18)/100
= 14.4
The 80th percentile value is therefore higher than 100 (the 14th value) but lower than
110 (the 15th value).
=> The top 20% noise levels are the values above this point: 110, 112, 114, 122.

d) Find the bottom 10% noise levels.


The cutoff for the bottom 10% is the 10th percentile (P = 10).
Total count of values: N = 18
Percentile = (n/N) × 100, where n is the position in the sorted data.
Solving this formula for n:
n = (P × N)/100
= (10 × 18)/100
= 1.8
The 10th percentile value is therefore between the 1st value (60) and the 2nd value (65)
in the sorted data.
=> The only noise level in the bottom 10% is 60.
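
The position formula n = (P × N)/100 used in parts (c) and (d) can be sketched in Python. The helper name `percentile_position` is illustrative. The code follows the worksheet's approach of selecting the values on one side of the fractional position and does not interpolate a percentile value:

```python
import math

def percentile_position(p, n_values):
    """Position n = P * N / 100 in the sorted data (1-based, may be fractional)."""
    return p * n_values / 100

noise_sorted = sorted([82, 89, 94, 110, 74, 122, 112, 95, 100,
                       78, 65, 60, 90, 83, 87, 75, 114, 85])
N = len(noise_sorted)

pos80 = percentile_position(80, N)
pos10 = percentile_position(10, N)

# Values at positions strictly above the 80th-percentile position.
top_20 = noise_sorted[math.floor(pos80):]
# Values at positions at or below the 10th-percentile position.
bottom_10 = noise_sorted[:math.floor(pos10)]

print(pos80, top_20)
print(pos10, bottom_10)
```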

5. [Robust statistics]


(a) The median would best represent the typical income of the 42 patrons at this coffee shop.
Before the two extremely high incomes ($225,000 and $250,000) were added, the mean was
$65,090 and the median was $65,240. The two values were very close, which is consistent with a
roughly symmetric distribution. After these high incomes were added, the mean jumped to
$73,300, while the median increased only slightly to $65,350. The large change in the mean
demonstrates its sensitivity to extreme values, whereas the median remained stable. The median
is therefore more robust than the mean when outliers are present, making it the better measure
of typical income in this situation.
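
The jump in the mean can be checked with a short calculation. This sketch assumes the original group had 40 patrons (the 42 patrons minus the two added incomes), which the worksheet implies but does not state explicitly:

```python
# Assumption: 40 patrons before the two high incomes were added.
n_before = 40
mean_before = 65_090
added_incomes = [225_000, 250_000]

# Total income before, plus the two new incomes, spread over 42 patrons.
total_after = n_before * mean_before + sum(added_incomes)
mean_after = total_after / (n_before + len(added_incomes))

print(f"mean after adding the two incomes: ${mean_after:,.0f}")
```

Two values out of 42 shift the mean by more than $8,000, while the median moves by only $110.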

(b) The interquartile range (IQR) would best represent the variability in the incomes of the 42
patrons. Before adding the two high incomes, the standard deviation was $2,122, but it
drastically increased to $37,321 after the outliers were introduced. This sharp increase shows that
the standard deviation is highly sensitive to outliers, making it an unreliable measure of
variability in this case. In contrast, the IQR focuses on the middle 50% of the data and remains
largely unaffected by extreme values. Therefore, the IQR is a more robust and reliable measure
of variability compared to the standard deviation in the presence of outliers.
