Data Analytics Ass Group-4 Updated
PRESENTED BY : GROUP 4
LEVEL 400
SCHOOL : NAHPI
DEPARTMENT : COME
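Mean
The mean of a data set is the sum of its values divided by the number of values, n:
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \]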
For example, consider the data set {2, 4, 4, 4, 5, 5, 7, 9} . The mean would be calculated as follows:
\[ \bar{x} = \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5 \]
So, the mean of this data set is 5. The mean provides a measure of central tendency and is useful for
summarizing a set of values with a single representative value.
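As a rough illustration of the formula above, the calculation can be sketched in Python. The function name `mean` and the variable names are chosen here for illustration only.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # the example data set from the text
result = mean(data)
print(result)  # 40 / 8 = 5.0
```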
Median
The median is another measure of central tendency in a data set. To find the median, you first need to
arrange the values in the data set in ascending order. Then, if the number of values (n) is odd, the
median is the middle value. If the number of values is even, the median is the average of the two middle
values.
For example, consider the data set {4, 9, 4, 5, 2, 5, 4, 7}. After arranging the values in ascending order,
you get {2, 4, 4, 4, 5, 5, 7, 9}. Since there are 8 values (an even number), the median is the average of the
two middle values, which are 4 and 5:
\[ \text{Median} = \frac{4+5}{2} = \frac{9}{2} = 4.5 \]
So, in this case, the median of the data set is 4.5. The median is less sensitive to extreme values than the
mean and can be a useful measure of central tendency, especially when dealing with skewed
distributions or data sets with outliers.
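The sort-then-pick-the-middle procedure described above might be sketched in Python as follows. The function name `median` is an illustrative choice; the odd and even cases are handled separately, as in the text.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)  # arrange in ascending order first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values


data = [4, 9, 4, 5, 2, 5, 4, 7]  # the unsorted example data set from the text
result = median(data)
print(result)  # (4 + 5) / 2 = 4.5
```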
Mode
The mode of a data set is the value (or values) that appear most frequently. In other words, it is the
value that occurs with the highest frequency in a data set.
A data set may have one mode, more than one mode, or no mode at all.
For example, consider the data set {2, 4, 4, 4, 5, 5, 7, 9}. In this case, the mode is 4 because it appears
more frequently than any other value.
In a situation where there are multiple values with the same highest frequency, the data set is said to be
multimodal, and it has more than one mode. If no value is repeated, the data set is considered to have no
mode.
It's worth noting that, unlike the mean and median, the mode need not lie near the center of the data.
It simply identifies the most frequently occurring value or values, which may fall anywhere in the data set.
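The three cases described above (one mode, several modes, no mode) can be illustrated with a short Python sketch. The function name `modes` is hypothetical, and returning an empty list when no value repeats follows the convention stated in the text rather than any universal rule.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, in ascending order.

    Follows the text's convention: if no value is repeated, there is no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated, so the data set has no mode
    return sorted(v for v, c in counts.items() if c == highest)


one_mode = modes([2, 4, 4, 4, 5, 5, 7, 9])  # 4 appears three times
two_modes = modes([1, 1, 2, 2, 3])          # 1 and 2 each appear twice
no_mode = modes([1, 2, 3])                  # nothing repeats
print(one_mode, two_modes, no_mode)
```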
Ungrouped Data
Data set from page 551, exercise 3.
Mean
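\[ \bar{x} = \frac{\sum x}{n} = \frac{\text{sum of the given data}}{\text{number of given data}} \]
\[ \bar{x} = \frac{28 + 20 + 32 + 44 + 28 + 30 + 30 + 26 + 28 + 34}{10} = \frac{300}{10} \]
\[ \bar{x} = 30 \text{ m} \]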
Median
Arranging the data in ascending order:
20 m, 26 m, 28 m, 28 m, 28 m, 30 m, 30 m, 32 m, 34 m, 44 m
Since the data set has an even number of values (10), the median is the mean of the two middle values, 28 m and 30 m.
\[ \text{Median} = \frac{28 + 30}{2} = 29 \text{ m} \]
Mode
The value with the highest number of occurrences is 28 m, which appears three times.
Hence, Mode = 28 m
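The three results for the ungrouped data can be cross-checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative, and the values are the ten measurements (in metres) listed above.

```python
import statistics

# The ten measurements from the exercise, in metres
data_m = [28, 20, 32, 44, 28, 30, 30, 26, 28, 34]

mean_m = statistics.mean(data_m)        # 300 / 10
median_m = statistics.median(data_m)    # mean of the two middle values, 28 and 30
modes_m = statistics.multimode(data_m)  # all values sharing the highest count

print(mean_m, median_m, modes_m)
```

`statistics.multimode` is used rather than `statistics.mode` so that a tie between values would be reported rather than hidden.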
Grouped Data
Data from exercise 4
Data Organization
Σf = 100, Σ(f·x) = 17070
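Mean
With x the class midpoint and f the class frequency:
\[ \text{Mean} = \frac{\sum (f \cdot x)}{\sum f} = \frac{17070}{100} = 170.7 \text{ cm} \]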
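The frequency table for this exercise is not reproduced here, so the sketch below relies on a reconstructed table as an assumption: five classes of width 8 with midpoints 153.5 to 185.5 and frequencies 5, 18, 42, 27 and 8. These values are inferred from the figures quoted in the median and mode calculations (lower boundary 165.5, class width 8, frequencies 18, 42 and 27, cumulative frequency 23, Σf = 100) and are not copied from the exercise itself. Under that assumption, the grouped mean might be computed as:

```python
# Hypothetical reconstruction of the frequency table (an assumption, not
# copied from the exercise): class midpoints x in cm and frequencies f.
midpoints = [153.5, 161.5, 169.5, 177.5, 185.5]
frequencies = [5, 18, 42, 27, 8]


def grouped_mean(xs, fs):
    """Mean of grouped data: sum of f*x divided by sum of f."""
    if len(xs) != len(fs):
        raise ValueError("each class needs exactly one midpoint and one frequency")
    total_f = sum(fs)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(xs, fs)) / total_f


total_f = sum(frequencies)
total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
mean_cm = grouped_mean(midpoints, frequencies)
print(total_f, total_fx, mean_cm)
```

The reconstructed table reproduces the totals Σf = 100 and Σ(f·x) = 17070 stated above, which is some evidence that it matches the original, but it remains an inference.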
Median
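The median class is the class containing the (Σf/2)th value. Here the cumulative frequency rises from 23 to 65 across the class starting at 165.5, so it contains the 50th value; in this data set the median class is therefore also the modal class.
Let Lcbm = lower class boundary of the median class = 165.5
Cfbmc = cumulative frequency of the classes before the median class = 23
W = class width = 8
fm = frequency of the median class = 42
\[ \text{median} = L_{cbm} + \left( \frac{\frac{\sum f}{2} - C_{fbmc}}{f_m} \right) W \]
\[ \text{median} = 165.5 + \left( \frac{\frac{100}{2} - 23}{42} \right) 8 \]
median = 170.6 cm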
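The interpolation used for the grouped median can be written as a small Python function. The function and parameter names are illustrative; the numbers passed in are the ones quoted in the calculation above.

```python
def grouped_median(lower_boundary, total_freq, cum_freq_before, class_freq, width):
    """Interpolated median for grouped data.

    lower_boundary  : lower class boundary of the median class
    total_freq      : sum of all frequencies
    cum_freq_before : cumulative frequency of the classes before the median class
    class_freq      : frequency of the median class
    width           : class width
    """
    if class_freq <= 0:
        raise ValueError("the median class must have a positive frequency")
    return lower_boundary + ((total_freq / 2 - cum_freq_before) / class_freq) * width


median_cm = grouped_median(165.5, 100, 23, 42, 8)
print(round(median_cm, 1))  # 165.5 + (27 / 42) * 8, about 170.6
```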
Mode
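modal class = class with the highest frequency
Let ∆1 = difference between the frequency of the modal class and that of the class immediately before it (the class above it in the table) = 42 − 18
∆2 = difference between the frequency of the modal class and that of the class immediately after it (the class below it in the table) = 42 − 27
\[ \text{mode} = L_{cbm} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) W \]
\[ \text{mode} = 165.5 + \left( \frac{42 - 18}{(42 - 18) + (42 - 27)} \right) 8 \]
mode = 170.4 cm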
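The grouped mode can be checked with a short Python sketch. It uses the standard interpolation form with ∆1 + ∆2 in the denominator, which is the form that reproduces the 170.4 cm result. The function and parameter names are illustrative.

```python
def grouped_mode(lower_boundary, modal_freq, prev_freq, next_freq, width):
    """Interpolated mode for grouped data.

    delta1 : modal frequency minus the frequency of the preceding class
    delta2 : modal frequency minus the frequency of the following class
    """
    delta1 = modal_freq - prev_freq
    delta2 = modal_freq - next_freq
    if delta1 + delta2 == 0:
        raise ValueError("neighbouring classes match the modal frequency; mode is not unique")
    return lower_boundary + (delta1 / (delta1 + delta2)) * width


mode_cm = grouped_mode(165.5, 42, 18, 27, 8)
print(round(mode_cm, 1))  # 165.5 + (24 / 39) * 8, about 170.4
```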
Mean:
Significance: The mean is the average of all the values in a data set.
Use when: The mean is most appropriate when the data is approximately symmetrically
distributed and does not have extreme outliers. It is sensitive to extreme values and may not be
the best measure if the data set is skewed.
Median:
Significance: The median is the middle value in a data set when it is ordered. It is largely unaffected
by extreme values (outliers) and is a measure of the central position.
Use when: The median is useful when the data set is skewed, has outliers, or is not normally
distributed. It provides a better representation of central tendency in such cases.
Mode:
Significance: The mode is the value(s) that occur most frequently in a data set.
Use when: The mode is suitable for categorical data or discrete data sets. It is also useful when
identifying the most common response or category is important. In some cases, a data set may
have no mode, or it may be multimodal (having more than one mode).
In summary:
Use the mean when the data is approximately normally distributed and there are no significant outliers.
Use the median when the data is skewed or contains outliers, providing a robust measure of central
tendency.
Use the mode when identifying the most frequently occurring category or value is essential, especially
for categorical or discrete data.
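To make the guidance on outliers concrete, the sketch below adds one extreme value to the example data set {2, 4, 4, 4, 5, 5, 7, 9} and compares how the mean and the median respond. The outlier value 100 is hypothetical and chosen only for illustration.

```python
import statistics

base = [2, 4, 4, 4, 5, 5, 7, 9]  # example data set used earlier
with_outlier = base + [100]      # hypothetical extreme value, for illustration

mean_before = statistics.mean(base)              # 40 / 8
mean_after = statistics.mean(with_outlier)       # 140 / 9
median_before = statistics.median(base)          # (4 + 5) / 2
median_after = statistics.median(with_outlier)   # middle of 9 sorted values

print(mean_before, round(mean_after, 2))
print(median_before, median_after)
```

The mean roughly triples, from 5 to about 15.56, while the median moves only from 4.5 to 5, which is why the median is preferred for skewed data or data with outliers.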