Solution STA101 Assignment 1&2 Summer24
(ii)
Nominal      Ordinal    Interval              Ratio
Zip code     Grade      IQ                    Height
Gender       Rating     SAT score             Time
Eye color    Ranking    Temperature (F, C)    Weight
2. A sample of 100 students was taken, and these students were asked about the amount of
money they possess. The following table gives the frequency distribution of their responses.
[7]
Amount of Money (Tk.)    Number of Students    Amount of Money (Tk.)    Number of Students
0 - 99                   18                    500 - 599                √49

$\sum_{i=1}^{10} f_i = 100$
c & d)
Amount                  f_i    Relative Frequency    Percentage    Cumulative Frequency    Cumulative Relative Frequency    Cumulative Percentage
0 - 99 (Modal Class)    18     0.18                  18            18                      0.18                             18
900 - 999               5      0.05                  5             100                     1.00                             100
Total                   $\sum_{i=1}^{10} f_i = 100$
e)
i. For a minimum of Tk. 500, we consider the classes from 500 - 599 through 900 - 999.
f)
3. The following data set represents the record high temperatures in degrees Fahrenheit (℉)
for each of the 50 US states: [5.5]
a) Construct a suitable frequency distribution table using the intervals 85 – 95, 95 – 105 and
so on. [2]
b) Construct a stem and leaf plot and mention interesting features such as the maximum and
minimum values, range, modal value and median value. [3.5]
125-135 I 1 0.02 2
Total 50 1 100
Stem Leaf
8 5558999
9 000123336666678899
10 0122234567889
11 00113469
12 0235
Minimum value = 85
Range = 125-85 = 40
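The features read off the stem and leaf plot can be cross-checked with a short script. This is only a sketch: it assumes each stem is the tens part and each leaf digit is the units part of a temperature (so stem 8, leaf 5 means 85), and the name `stem_leaf` is introduced here purely for illustration.

```python
from statistics import median, multimode

# Stem-and-leaf rows as printed above; assumption: stem = tens, leaf = units.
stem_leaf = {
    8: "5558999",
    9: "000123336666678899",
    10: "0122234567889",
    11: "00113469",
    12: "0235",
}

# Decode every leaf back into a temperature value.
temps = sorted(
    10 * stem + int(leaf)
    for stem, leaves in stem_leaf.items()
    for leaf in leaves
)

summary = {
    "n": len(temps),
    "minimum": min(temps),
    "maximum": max(temps),
    "range": max(temps) - min(temps),
    "median": median(temps),      # mean of the 25th and 26th ordered values
    "modes": multimode(temps),    # most frequent value(s)
}
print(summary)
```

With 50 decoded values the median is the average of the 25th and 26th ordered observations, which is why it can fall between two recorded temperatures.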
4. The number of Tesla, Inc. employees who will be selected for various salary bands in 2023 is
demonstrated in the following table: [5.0]
[Here, Y is the last digit of your student ID (e.g., 20100012, 21123415, etc.). Suppose your ID
is 20100012; then the 2nd last row (70k-80k) will be √121+2 = 13]. Estimate the following:
a) Find the Range of wages ($) and complete the frequency distribution table. [0.5+0.5]
b) Find:
i. Mean [1.0]
ii. Median [1.5]
iii. Mode [1.5]
Class                     f_i    x_i (midpoint)    Cumulative frequency    f_i x_i
50 - 60 (Modal Class)     31     55                41                      1705
60 - 70 (Median Class)    19     65                60                      1235
70 - 80                   13     75                73                      975
80 - 90                   15     85                88                      1275
Total                     $\sum_{i=1}^{5} f_i = 88$                        $\sum_{i=1}^{5} f_i x_i = 5640$
b)
i) Mean:
$\bar{x} = \frac{\sum_{i=1}^{5} f_i x_i}{\sum_{i=1}^{5} f_i} = \frac{5640}{88} = 64.09$
ii) Median:
$\frac{n}{2} = \frac{88}{2} = 44$
𝐿𝑚 = 60
𝐹 = 41
𝑓𝑚 = 19
𝑐 = 10
STA101 (Introduction to Statistics) _Assignment 1&2_Summer 24
$\text{Median} = L_m + \frac{\frac{n}{2} - F}{f_m} \times c = 60 + \frac{44 - 41}{19} \times 10 = 61.5789$
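The grouped mean and median above can be reproduced numerically. The sketch below uses only the totals and median-class values quoted in the working (n = 88, Σ f_i x_i = 5640, L_m = 60, F = 41, f_m = 19, c = 10); the function name `grouped_median` and its parameter names are hypothetical.

```python
def grouped_median(lower, n, cum_before, freq, width):
    """Median of grouped data: L_m + ((n/2 - F) / f_m) * c."""
    return lower + (n / 2 - cum_before) / freq * width

n = 88           # total frequency, sum of f_i
sum_fx = 5640    # sum of f_i * x_i

# Grouped mean: total of f_i * x_i divided by total frequency.
grouped_mean = sum_fx / n

# Median class is 60 - 70: lower limit 60, cumulative frequency before it 41,
# class frequency 19, class width 10.
med = grouped_median(lower=60, n=n, cum_before=41, freq=19, width=10)

print(round(grouped_mean, 2), round(med, 4))
```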
3 64 62.10526 56.25
5 63.82979 63.15789 56
88 77 62 65 57 47 31 69
52 54 68 63 42 45 49 58
Raw value    Rank    Ordered value    Quantile position
92           1       31
87           2       42
72           3       45
65           4       47
86           5       49
69           8       57               P30 => 7.2th => 8th = 57
88           9       58
77           10      62
62           11      63
47           14      68
31           15      69
69           16      69
52           17      72               P67 => 16.08th => 17th = 72
63           20      81               D8 => 19.2th => 20th = 81; P80 => 19.2th => 20th = 81
42           21      86
45           22      87
49           23      88
58           24      92
Stem Leaf
3 1
4 2,5,7,9
5 2,4,7,8
6 2,3,5,5,8,9,9
7 2,7,7
8 1,6,7,8
9 2
Here n = 24 and for $Q_1$: $\frac{1 \times 24}{4} = 6$ (an integer value).
So $Q_1 = \frac{1}{2}[\text{value of 6th observation} + \text{value of 7th observation}] = \frac{52 + 54}{2} = \mathbf{53}$
Here n = 24 and for $Q_2$: $\frac{2 \times 24}{4} = 12$ (an integer value).
So $Q_2 = \frac{1}{2}[\text{value of 12th observation} + \text{value of 13th observation}] = \frac{65 + 65}{2} = \mathbf{65}$
Here n = 24 and for $Q_3$: $\frac{3 \times 24}{4} = 18$ (an integer value).
So $Q_3 = \frac{1}{2}[\text{value of 18th observation} + \text{value of 19th observation}] = \frac{77 + 77}{2} = \mathbf{77}$
Here n = 24 and for $D_5$: $\frac{5 \times 24}{10} = 12$ (an integer value).
So $D_5 = \frac{1}{2}[\text{value of 12th observation} + \text{value of 13th observation}] = \frac{65 + 65}{2} = \mathbf{65}$
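Here n = 24 and for $D_8$: $\frac{8 \times 24}{10} = 19.2$ (not an integer).
So $D_8 = [\text{value of 20th observation}] = \mathbf{81}$
Here n = 24 and for $P_{30}$: $\frac{30 \times 24}{100} = 7.2$ (not an integer).
So $P_{30} = [\text{value of 8th observation}] = \mathbf{57}$
Here n = 24 and for $P_{80}$: $\frac{80 \times 24}{100} = 19.2$ (not an integer).
So $P_{80} = [\text{value of 20th observation}] = \mathbf{81}$
Here n = 24 and for $P_{67}$: $\frac{67 \times 24}{100} = 16.08$ (not an integer).
So $P_{67} = [\text{value of 17th observation}] = \mathbf{72}$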
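The positional rule used in these calculations (average two neighbouring observations when k·n/m is an integer, otherwise round the position up) can be sketched as a small function. The function name `quantile_by_position` is hypothetical, and the ordered list is an assumption: ranks 6, 7, 12, 13, 18 and 19 do not appear in the ranked table, so they are filled in from the quartile working (52, 54, 65, 65, 77, 77).

```python
# Ordered observations (n = 24). Ranks 6-7, 12-13 and 18-19 are assumed from
# the quartile calculations, since those rows are absent from the ranked table.
ordered = [31, 42, 45, 47, 49, 52, 54, 57, 58, 62, 63, 65,
           65, 68, 69, 69, 72, 77, 77, 81, 86, 87, 88, 92]

def quantile_by_position(data, k, parts):
    """k-th quantile of sorted data split into `parts` parts (0 < k < parts).

    If k*n/parts is an integer p, average the p-th and (p+1)-th observations;
    otherwise take the observation at the next whole position.
    """
    n = len(data)
    if (k * n) % parts == 0:
        p = k * n // parts
        return (data[p - 1] + data[p]) / 2
    p = k * n // parts + 1          # round the fractional position up
    return data[p - 1]

q1 = quantile_by_position(ordered, 1, 4)
q2 = quantile_by_position(ordered, 2, 4)
q3 = quantile_by_position(ordered, 3, 4)
d5 = quantile_by_position(ordered, 5, 10)
d8 = quantile_by_position(ordered, 8, 10)
p30 = quantile_by_position(ordered, 30, 100)
p67 = quantile_by_position(ordered, 67, 100)
p80 = quantile_by_position(ordered, 80, 100)
print(q1, q2, q3, d5, d8, p30, p67, p80)
```

Integer arithmetic is used for the position test so that values such as 19.2 or 16.08 are never compared as floating-point numbers. Library routines such as `statistics.quantiles` use different interpolation conventions and may not match this hand rule.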
e) 𝑰𝑸𝑹 = 𝑸𝟑 − 𝑸𝟏
= 77 − 53
= 24
Outliers are individual data points that fall outside the whiskers, which
extend to the smallest and largest values lying within 1.5 times the interquartile
range (IQR) of the quartiles.
So, the lower fence is Q1 − 1.5 × IQR = 53 − 36 = 17 and the upper fence is Q3 + 1.5 × IQR = 77 + 36 = 113.
Since no data points fall outside this range, there are no outliers.
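The outlier check can be sketched in a few lines, assuming the usual 1.5 × IQR fences around Q1 and Q3. The ordered list is an assumption reconstructed from the ranked table and the quartile working, and the variable names are illustrative only.

```python
# Ordered observations (n = 24); ranks missing from the ranked table are
# assumed from the quartile working (52, 54, 65, 65, 77, 77).
ordered = [31, 42, 45, 47, 49, 52, 54, 57, 58, 62, 63, 65,
           65, 68, 69, 69, 72, 77, 77, 81, 86, 87, 88, 92]

q1, q3 = 53, 77                  # quartiles from the working above
iqr = q3 - q1                    # interquartile range

lower_fence = q1 - 1.5 * iqr     # values below this are flagged
upper_fence = q3 + 1.5 * iqr     # values above this are flagged

outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
print(iqr, lower_fence, upper_fence, outliers)
```

Every observation lies between 31 and 92, well inside the fences, so the list of outliers is empty.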
g) Coefficient of skewness:
$\text{Bowley's coefficient of skewness} = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{(77 - 65) - (65 - 53)}{77 - 53} = 0.0$
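As a quick check, Bowley's coefficient can be evaluated directly from the three quartiles. The function name `bowley_skewness` is a hypothetical helper, not part of any library.

```python
def bowley_skewness(q1, q2, q3):
    """Bowley's quartile skewness: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

skew = bowley_skewness(53, 65, 77)
print(skew)
```

A value of zero means the median sits exactly midway between Q1 and Q3, so the middle half of the data is symmetric.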
Sample 2:
99 102 110 33 56 112 130 111 124 155
201 209 103 66 84 75 107 202 59
a) What are the sample sizes of samples 1 and 2 individually? [0.5]
b) Compute the sample mean, variance, and standard deviation for sample 1. [1+2.5+0.5]
c) Compute the sample mean, variance, and standard deviation for sample 2. [1+2.5+0.5]
d) Compute the coefficient of variation for sample 1. [0.5]
e) Compute the coefficient of variation for sample 2. [0.5]
f) Which measure should one consider to compare the performance/consistency among the
sample data, and why? [2]
g) For which sample of commercial oils is the relative variability of oxidation-induction time
higher? [1]
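b) For sample 1:
Sample mean $= \frac{87 + 103 + 130 + \dots + 119 + 129}{19} = \frac{2563}{19} = 134.895$
Variance $= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{22765.7895}{19 - 1} = 1264.766$
Standard deviation $= \sqrt{1264.766} = 35.564$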
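c) For sample 2:
Sample mean $= \frac{99 + 102 + 110 + \dots + 202 + 59}{19} = \frac{2138}{19} = 112.526$
Variance $= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{44576.7368}{19 - 1} = 2476.485$
Standard deviation $= \sqrt{2476.485} = 49.764$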
d) Coefficient of Variation, $CV_1 = \frac{SD}{\bar{x}} \times 100 = \frac{35.564}{134.895} \times 100 = 26.364\%$
e) Coefficient of Variation, $CV_2 = \frac{SD}{\bar{x}} \times 100 = \frac{49.764}{112.526} \times 100 = 44.225\%$
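The sample 2 figures can be checked against the listed data with Python's standard `statistics` module, which uses the n − 1 divisor for `variance` and `stdev`. Sample 1 is not listed in full here, so the sketch takes its mean and standard deviation from the working above rather than recomputing them.

```python
from statistics import mean, stdev, variance

# Sample 2 as listed in the question (19 observations).
sample2 = [99, 102, 110, 33, 56, 112, 130, 111, 124, 155,
           201, 209, 103, 66, 84, 75, 107, 202, 59]

n2 = len(sample2)
mean2 = mean(sample2)
var2 = variance(sample2)      # sample variance, divisor n - 1
sd2 = stdev(sample2)          # square root of the sample variance
cv2 = sd2 / mean2 * 100       # coefficient of variation in percent

# Sample 1: summary values taken from the working, not recomputed from data.
cv1 = 35.564 / 134.895 * 100

print(n2, round(mean2, 3), round(var2, 3), round(sd2, 3))
print(round(cv1, 3), round(cv2, 3))
```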
f) Depending on the situation, one should consider either the coefficient of variation (CV)
or the standard deviation to compare the performance/consistency of the products of the
two companies.
The coefficient of variation is the ratio of the standard deviation to the
mean. It is useful for comparing the degree of variation between one data
series and another, even when their means differ drastically.
g) As CV1 < CV2, Sample 2 of commercial oils has relatively higher variation in oxidation-
induction time compared to Sample 1.