Module 3: Data Analysis and Interpretation
Module Overview
This module discusses the principles, concepts, and approaches behind the measures of
central tendency and variation. It also introduces the principles behind the normal curve
and the methods for computing and interpreting the skewness and kurtosis of a curve.
Motivation Question
How do the characteristics and significance of central tendency, variation, the normal curve,
skewness, and kurtosis provide educational and philosophical insights into their application in
assessing learning outcomes?
Module Pretest
Instructions: Multiple Choice. Select the statement or phrase that best answers each item.
Write only the letter that corresponds to the chosen option.
1. The median of a, y, x, c, f, g, d is
a. y b. f
c. x d. y
2. The mean of ten numbers is 58. If one of the numbers is 40, what is the mean of the
other nine?
a. 40 b. 60
c. 50 d. 80
3. Which measure illustrates the distance between the highest and the lowest values?
a. range b. variance
c. standard deviation d. mean deviation
4. Which measure of central tendency is most appropriate if your data is the marital
status?
a. Mean b. Median
c. Mode d. Percentile
5. What is the median of the discrete data set 4, 6, 8, 10, 10, 15, 22, 25?
a. 9 b. 10.5
c. 10 d. 11
a. median b. P45
c. 3rd quarter d. mean
7. Which curve suggests a homogeneous cluster of students’ scores showing a very close
competition?
a. Platykurtic b. Leptokurtic
c. Mesokurtic d. Normal curve
8. When the items in a test are easy for most of the students, the curve is skewed
a. negatively b. to the right
c. positively d. no skewness
a. negative b. zero
c. positive d. undefined
Lesson Summary
Introduced in this lesson are the concepts and statistical principles employed in central
tendency and variation measures. It aims to scale up the knowledge and skills of education
students in the computation and the application of these measures to interpret and evaluate
learning outcomes.
Learning Outcomes
After completing the lesson, the students can:
1. Discuss the characteristics, uses, and limitations of each measure of central tendency.
2. Compute each measure of central tendency and interpret the results.
3. Determine the range, variance, and standard deviation of sets of scores.
4. Interpret the descriptive numerical measures of variation of the given set of data.
Motivation Question
How can we use our knowledge and skills in central location measures and variation in
interpreting and evaluating learning outcomes?
Discussion
The Measures of Central Tendency
The mean, median, and mode tend to lie centrally within a set of data. Thus, they are
called the measures of central tendency. The frequency distribution displays a normal curve or
symmetrical curve when the mean, median and mode are equal. However, when the mean,
median, and mode are not the same or equal, we have an asymmetrical shape that is either
skewed to the right or left. The mode corresponds to the maximum point or points on the curve.
The median corresponds to the vertical line, which divides the histogram into parts having equal
areas. In general, the mean surpasses the median in positively skewed distribution, whereas the
median exceeds the mean in a negatively skewed distribution.
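The comparison of the mean and the median described above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb in the paragraph, not a formal test of skewness. The function name `skew_direction` and the three score sets are hypothetical and were invented for this example.

```python
import statistics

def skew_direction(scores):
    """Suggest the direction of skew by comparing the mean and the median."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean > median:
        return "positively skewed"   # mean surpasses the median
    if mean < median:
        return "negatively skewed"   # median exceeds the mean
    return "no skew indicated"       # mean equals the median

# Hypothetical score sets, invented only for illustration.
few_high_scores = [1, 2, 2, 3, 10]   # one very high score pulls the mean up
few_low_scores = [1, 8, 9, 9, 10]    # one very low score pulls the mean down
balanced_scores = [1, 2, 3, 4, 5]    # mean and median coincide

for scores in (few_high_scores, few_low_scores, balanced_scores):
    print(scores, "->", skew_direction(scores))
```

Equal values of the mean and the median are consistent with a symmetrical curve, but on their own they do not prove that the curve is normal.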
Variables are characteristics of a certain object and can be nominal, ordinal, ratio, or
interval. Moreover, the true data measure can either be qualitative or quantitative. Further,
quantitative data can either be discrete or continuous. A contingency table taken from
Buenaflor (2012) is presented as a guide to determine the applicability in using average
measures.
Variable      Measure
              Nominal          Interval                  Ordinal
Discrete      Mode             Mean, Median, and Mode    Mean, Median, and Mode
Continuous    (not possible)   Mean, Median, and Mode    Mean, Median, and Mode
The table shows that the mode is the only measure applicable to data that are both nominal
and discrete. The mean, median, and mode all apply to interval data and to ordinal data,
whether discrete or continuous. However, the table indicates that no average measure is
possible for data that are both nominal and continuous. Taking this combination on its own in
assessing learning is without meaning and, thus, illogical.
The Mean
The mean is the most stable of all the measures of central tendency. This measure
is the center of all the observations with respect to their values or magnitude. In using this
measure, the teacher or the assessor must examine the kind of variable and the measure at
hand. The mean is appropriately applicable to interval measures, whether discrete or
continuous, but not to nominal variables. The way the mean is computed depends on how the
data are organized, and each way of organizing the data corresponds to a definite formula for
getting the mean. We will focus our study on the measures of central tendency using
ungrouped data.
The ungrouped data may use either the arithmetic mean or the weighted mean. The
formula for the arithmetic mean is $\bar{X} = \sum X / N$. On the other hand, the formula for the
weighted mean is $\bar{X} = \sum fx / \sum f$.
Example 3-1: Suppose your students’ scores in the first examination are the following: 99, 78,
85, 77, 86, 84, 80, 81, 82, 84, 79, 66, 88, 75. Find the mean.
Solution: $\bar{X} = \sum X / N$
= (99 + 78 + 85 + 77 + 86 + 84 + 80 + 81 + 82 + 84 + 79 + 66 + 88 + 75)/14
$\bar{X}$ = 1,144/14 = 81.71, or 82 when expressed as a discrete score.
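As a quick check of Example 3-1, the arithmetic mean can be sketched in Python as the sum of the scores divided by the number of scores. The variable names are only illustrative.

```python
# Scores from Example 3-1.
scores = [99, 78, 85, 77, 86, 84, 80, 81, 82, 84, 79, 66, 88, 75]

# Arithmetic mean: sum of the scores divided by the number of scores (N).
total = sum(scores)
n = len(scores)
mean = total / n

print(total, n)          # the sum and the count used in the formula
print(round(mean, 2))    # the mean to two decimal places
print(round(mean))       # the mean reported as a whole (discrete) score
```

The last line rounds the result to a whole number because the scores are treated as discrete data.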
Example 3-2: Find the mean of the number of eggs sold by 14 students, which are: 20, 25, 18, 20,
20, 15, 10, 14, 25, 18, 18, 19, 19, 20.
Solution:
X      f      fx
25     2      50
20     4      80
19     2      38
18     3      54
15     1      15
14     1      14
10     1      10
       ∑f = 14    ∑fx = 261
$\bar{X} = \sum fx / \sum f = 261/14 = 18.64$, or 19 when rounded to a whole number.
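The frequency table in Example 3-2 can be rebuilt with a short Python sketch. It uses `collections.Counter` from the standard library to tally the frequency (f) of each value (X), then applies the formula ∑fx/∑f. The variable names are only illustrative.

```python
from collections import Counter

# Number of eggs sold by the 14 students in Example 3-2.
eggs = [20, 25, 18, 20, 20, 15, 10, 14, 25, 18, 18, 19, 19, 20]

# Tally how many times each value occurs (the f column).
freq = Counter(eggs)

sum_f = sum(freq.values())                     # total frequency
sum_fx = sum(x * f for x, f in freq.items())   # sum of value times frequency
mean = sum_fx / sum_f

# Print the table from the highest value to the lowest, as in the example.
for x in sorted(freq, reverse=True):
    print(x, freq[x], x * freq[x])
print(sum_f, sum_fx, round(mean, 2))
```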
Example 3-3: Suppose in the first semester Mr. Aguja’s grades on his subjects with
corresponding credit units are as follows:
Solution:
$$\bar{X} = \frac{\sum XW}{\sum W} = \frac{2.0(4) + 2.7(3) + 1.6(3) + 2.0(4) + 2.5(5)}{4 + 3 + 3 + 4 + 5} = \frac{8 + 8.1 + 4.8 + 8 + 12.5}{19} = \frac{41.4}{19} = 2.18 \text{ or } 2.2$$
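The weighted mean in Example 3-3 can be sketched as a small Python function. The function name `weighted_mean` is hypothetical. The grades and credit units are the ones that appear in the solution.

```python
def weighted_mean(values, weights):
    """Return sum(X * W) / sum(W) for paired values and weights."""
    total_weighted = sum(v * w for v, w in zip(values, weights))
    return total_weighted / sum(weights)

# Grades and credit units taken from the solution of Example 3-3.
grades = [2.0, 2.7, 1.6, 2.0, 2.5]
units = [4, 3, 3, 4, 5]

result = weighted_mean(grades, units)
print(round(result, 2))   # rounded to two decimal places
print(round(result, 1))   # rounded to one decimal place
```

Rounding is done only at the end because floating-point products such as 2.7 × 3 are not stored exactly.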
The decision whether to round off the answer to a whole number depends on the kind of
variable being measured. Example 3-2 is on the number of eggs, which is a discrete
variable. Hence, the correct answer is 19 instead of 18.64 because there is no such thing as
0.64 of an egg. If the measure stood for a continuous quantity, the answer would be 18.64.
In Example 3-1, the students’ scores are expressed as discrete data. Hence, the mean
should be discrete, which is 82 instead of 81.71. In Example 3-3, the student's grades
are expressed continuously; hence, the correct answer is 2.2, a continuous value.
The mean has the following properties:
1. Existence of the mean. It means that you can always compute the mean of any set
of numerical data.
2. Uniqueness of the mean. It signifies that there is one and only one mean for a set
of numerical data.
3. The means of several sets of data can be combined to form a single mean for all the
data.
4. In getting the mean, every value in the set of data is considered.
5. The mean is the most preferred measure of central tendency because it describes the
balance point of any distribution and uses all values in the data set.
6. The algebraic sum of the deviations of a set of numbers from the arithmetic mean is
zero.
For example, the mean of the numbers 6, 10, 14, 18 is 12. The deviations are as follows:
6 – 12 = -6; 10 – 12 = -2; 14 – 12 = 2; 18 – 12 = 6. Therefore, the sum of the deviations is
-6 – 2 + 2 + 6 = 0.
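The property that the deviations from the mean add up to zero can be checked with a short Python sketch that uses the same four numbers. The variable names are only illustrative.

```python
# The four numbers used in the example above.
values = [6, 10, 14, 18]

mean = sum(values) / len(values)

# Deviation of each value from the mean.
deviations = [x - mean for x in values]

print(mean)
print(deviations)
print(sum(deviations))   # the algebraic sum of the deviations
```

With other data sets, the sum may differ from zero by a tiny amount because of floating-point rounding.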
The Median
The median ($\tilde{X}$) of a data set is the middle score obtained after arranging the
data from the lowest value to the highest value (ascending order), assuming that the number of
cases or observations is odd. If the number of cases is even, the median is the average
of the two middlemost scores. For interval data, the median does not require the measure or
weight of each element. What it requires is the ordinal and normal sequencing of the elements
from highest to lowest. For nominal variables, it requires a sequential pattern, for example,
the days in a week, the months, or the letters in the alphabet.
The median is considered the appropriate measure for nominal data that follow a sequential
pattern. It treats the elements as having equal values, and it is equidistant from the highest
and lowest extremes after their proper ordinal or normal sequential arrangement. It is the
central point of a line of all measures in an ordinal arrangement. This value corresponds to a
central point between the upper 50% and the lower 50% of all measures. This central point is
calculated as central point (cp) = (n+1)/2.
To find the median of ungrouped data, we first arrange the values from highest to lowest or
vice versa. Then we pick the middle value when the number of values (N) is odd. However, if N
is even, we add the two middle scores and divide the sum by two.
Example 3-4: Suppose the scores of 15 students in a ten-item test are as follows:
6 9 7 10 5 7 4 8 6 3 2 6 9 1 6
Solution:
First, arrange the scores from the highest value to the lowest value; hence, we have:
10 9 9 8 7 7 6 6 6 6 5 4 3 2 1
Since there are 15 values (an odd number), the middle value falls at the (n+1)/2 = (15+1)/2 =
8th position. Therefore, the second 6 is the median.
Example 3-5: Suppose that, instead of 15 students, 16 took the test and the 16th student got a
score of one (1). Hence, we have:
10 9 9 8 7 7 6 6 6 6 5 4 3 2 1 1
Since there are 16 values (an even number), the median is the number falling at the
(n+1)/2 location, which is (16+1)/2 = 8.5. Thus, the median is between the second and the third
6s, as it is the number falling at the 8.5 location in the series. The median, being discrete, is 6.
However, if you consider the value as continuous, the median is 6.5 (the lower limit of 6, which
is 5.5, plus 1).
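The positional rule used in Examples 3-4 and 3-5 can be sketched in Python. The function `median_by_position` is a hypothetical helper that covers only the discrete case: it picks the middle value when n is odd and averages the two middle values when n is even. It does not implement the continuous adjustment with real limits described in the text.

```python
import statistics

def median_by_position(data):
    """Median of ungrouped data located with the (n + 1) / 2 rule."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based central point
        return ordered[position - 1]
    # Even n: the central point falls between two positions.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

# Scores from Example 3-4 (15 students) and Example 3-5 (16 students).
scores_15 = [6, 9, 7, 10, 5, 7, 4, 8, 6, 3, 2, 6, 9, 1, 6]
scores_16 = scores_15 + [1]

print(median_by_position(scores_15))
print(median_by_position(scores_16))
```

The standard library function `statistics.median` follows the same rule and can be used to cross-check the results.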
Notice that a problem may arise in items 2 and 3 as to how the median is computed.
First, determine whether the data are discrete or continuous. For example, in item number 2, if
the measure is discrete, then the median is located at the (n+1)/2 or (7+1)/2 = 4th number in the
series; hence, it is 9. But if the data are continuous, the median is 9.5, which is the lower limit
of 9 (8.5) plus 1.
Similarly, in problem number 3, if the measure is discrete, the median is found at the
(n+1)/2 = (6+1)/2 = 3.5 location. That is, the median is found between the first two of the
number 7. Thus, being discrete, the median is 7.
If the value is continuous, the median is computed from the real limits of the three 7s, at the
point that stands 1/3 of the way through them. The lower and upper limits of these three
similar numbers are 6.5 and 7.5, respectively. Using interpolation, the 1/3 or 0.33 is added to
the lower limit of 6.5, which gives a median of 6.83. Another way of solving is to start from the
upper limit of 7.5, where the median is between the 2nd and the 3rd value of 7, counted from
the upper measures. Using interpolation, the 2/3 or 0.67 is subtracted from the upper limit of
7.5, which again gives 6.83; thus, the median is 6.83.
The Mode
The mode is the measure of central tendency that does not need any calculation. You
simply pick the value in the set of data that appears most frequently. If a set of data has
one mode, we call the set unimodal. If there are two modes, we describe the set as bimodal.
When there are three modes, we label it as trimodal. In general, if there is more than one mode,
the set is called multimodal.
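Picking the mode and labeling the set can be sketched in Python with `statistics.multimode`, which is available in Python 3.8 and later. The function `describe_modes` is a hypothetical helper. The first data set is the set of scores in Example 3-4, and the second is the set of Class A scores from Example 3-10.

```python
import statistics

def describe_modes(data):
    """Return the mode or modes of the data and a label for the set."""
    modes = statistics.multimode(data)   # all most-frequent values
    labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return modes, labels.get(len(modes), "multimodal")

# Scores from Example 3-4: the score 6 appears four times.
test_scores = [6, 9, 7, 10, 5, 7, 4, 8, 6, 3, 2, 6, 9, 1, 6]

# Class A scores from Example 3-10: 25, 28, and 35 each appear twice.
class_a = [25, 28, 35, 33, 44, 35, 25, 26, 29, 28, 45]

print(describe_modes(test_scores))
print(describe_modes(class_a))
```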
Measures of Variation
Sets of data vary to a certain extent. Even when two sets have the same mean, the
spread of their scores may still differ.
Example 3-10: The scores of two classes in a test in a Prof. Ed. subject.
Class A   25 28 35 33 44 35 25 26 29 28 45
Class B   28 29 35 55 40 28 20 26 23 29 40
The scores of both classes in the Prof. Ed. subject have the same mean of 32.09.
However, if you look closely, the scores in Class B are more dispersed than the scores in
Class A.
1. The Range
The range (R) is the simplest and easiest measure of dispersion. It is computed as the
difference between the highest and the lowest values of the observations. The bigger the
value of the range, the wider the gap between the extreme values, which indicates more varied
numbers. A small value of the range implies a more uniform set of data. However, the range
tells nothing about the values between the highest and the lowest observations; hence, it is
considered the least satisfactory measure of dispersion.
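The range can be sketched in Python for the two classes in Example 3-10. The function name `score_range` is hypothetical. The sketch also confirms that the two classes share the same mean while differing in spread.

```python
def score_range(scores):
    """Range: the highest value minus the lowest value."""
    return max(scores) - min(scores)

# Scores of the two classes in Example 3-10.
class_a = [25, 28, 35, 33, 44, 35, 25, 26, 29, 28, 45]
class_b = [28, 29, 35, 55, 40, 28, 20, 26, 23, 29, 40]

mean_a = sum(class_a) / len(class_a)
mean_b = sum(class_b) / len(class_b)

print(round(mean_a, 2), round(mean_b, 2))          # the two means
print(score_range(class_a), score_range(class_b))  # the two ranges
```

The larger range of Class B agrees with the observation that its scores are more dispersed.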
To find the range of ungrouped data using the data in Example 3-10 on the scores in Prof.
Ed. of the students in Class A, we have:
R = highest score – lowest score = 45 – 25 = 20
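2. The Variance
The variance ($\sigma^2$) is computed using the formula:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$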
However, when the number of values is not too large the following formula is usually
preferred:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
If the observations or scores are quite far from the mean, the variance is large.
Thus, one can say that there is more variability in the data set. If all observations or scores are
the same, the variance is zero, which means that there is no variability at all in the data set.
On the other hand, if the scores are not all equal but are very close to the mean, the variance
has a small value, indicating less spread or variability in the data set.
Example 3-11: The scores of the students in Algebra are 33, 23, 40, 44, 15, and 25. Compute for
the variance.
Solution:
X      $X - \bar{X}$      $(X - \bar{X})^2$
33     3       9
23     -7      49
40     10      100
44     14      196
15     -15     225
25     -5      25
$\bar{X} = 30$, n = 6, $\sum (X - \bar{X})^2 = 604$
Hence,
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{604}{6 - 1} = 120.8$$
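The steps of Example 3-11 can be sketched in Python: get the mean, square each deviation, add the squares, and divide by n − 1. The variable names are only illustrative. The sketch also shows the result of dividing by n so that the two formulas can be compared.

```python
import statistics

# Algebra scores from Example 3-11.
scores = [33, 23, 40, 44, 15, 25]

n = len(scores)
mean = sum(scores) / n

# Squared deviation of each score from the mean.
squared_deviations = [(x - mean) ** 2 for x in scores]
sum_sq = sum(squared_deviations)

variance_n_minus_1 = sum_sq / (n - 1)   # denominator n - 1, as in the example
variance_n = sum_sq / n                 # denominator n, for comparison

print(mean, sum_sq)
print(round(variance_n_minus_1, 2), round(variance_n, 2))
```

In the standard library, `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by n, so either can be used to cross-check the hand computation.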
Example 3-12: Susan and Lita obtained the following scores in the various quizzes in Statistics.
Compute the variances of their scores.
Susan 50 45 60 50 75
Lita 60 55 56 49 60
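Susan:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(50-56)^2 + (45-56)^2 + (60-56)^2 + (50-56)^2 + (75-56)^2}{5 - 1} = \frac{36 + 121 + 16 + 36 + 361}{4} = \frac{570}{4} = 142.5$$
Lita:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(60-56)^2 + (55-56)^2 + (56-56)^2 + (49-56)^2 + (60-56)^2}{5 - 1} = \frac{16 + 1 + 0 + 49 + 16}{4} = 20.5$$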
3. The Standard Deviation
The commonly used measure of variation is the standard deviation (sd). The standard
deviation value tells how closely the data set values are clustered around the mean at a uniform
distance. In general, a lower value of the standard deviation for a set of data indicates that the
range of the spread of the observations around the mean is relatively small. On the other hand, a
large value of the standard deviation indicates that the data set's values are scattered over a
relatively wider range around the mean. Moreover, unlike the range, the standard deviation
involves all observations in the distribution. Hence, it is considered the most accurate measure
of dispersion.
The standard deviation denoted by sd is sometimes called the root mean square
because it is obtained by taking the positive square root of the variance calculated for
population data. When the number of observations is small, the standard deviation is obtained
using the formula:
$$sd = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
where $\bar{x}$ = mean of the data
$x_i$ = individual observation
n = total number of observations
However, in actual practice, when the sample size is less than 50, the denominator
used is n-1 instead of n.
Example 3-13: Let us compute the standard deviation of the scores obtained by six students in
Algebra (Please refer to Example 3-11).
Solution:
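X      $X - \bar{X}$      $(X - \bar{X})^2$
33     3       9
23     -7      49
40     10      100
44     14      196
15     -15     225
25     -5      25
$\bar{X} = 30$, $\sum (X - \bar{X})^2 = 604$
$$sd = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{604}{6 - 1}} = \sqrt{120.8} = 10.99$$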
Example 3-14: For our example, let us consider the scores obtained by Susan and Lita in the
various quizzes in Statistics (please refer to Example 3-12). Let us also
determine who of the two is more consistent in her performance.
Susan 50 45 60 50 75
Lita 60 55 56 49 60
Solution:
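a) Susan:
$$\bar{x} = \frac{50 + 45 + 60 + 50 + 75}{5} = 56$$
$$sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{36 + 121 + 16 + 36 + 361}{4}} = \sqrt{142.5} = 11.94$$
b) Lita:
$$\bar{x} = \frac{60 + 55 + 56 + 49 + 60}{5} = 56$$
$$sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{16 + 1 + 0 + 49 + 16}{4}} = \sqrt{20.5} = 4.53$$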
Both Susan and Lita have the same mean scores of 56. To find out who is more
consistent in her performance, we computed the standard deviation. A standard deviation of
11.94 units denotes that most of the scores are found within 11.94 units from each side of the
mean. Similarly, a standard deviation of 4.53 means most of the scores are located 4.53 units
from each side of the mean. Since Lita’s scores have a lower standard deviation compared to
Susan’s scores, it means that Lita’s scores are closer to the mean. Therefore, it can be
concluded that Lita’s performance is more consistent than Susan’s performance.
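The comparison of Susan and Lita can be sketched in Python. The function `sample_sd` is a hypothetical helper that takes the square root of the variance computed with the n − 1 denominator, as in the worked solutions. It is also applied to the Algebra scores of Example 3-13.

```python
import math
import statistics

def sample_sd(scores):
    """Standard deviation using the n - 1 denominator."""
    n = len(scores)
    mean = sum(scores) / n
    sum_sq = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(sum_sq / (n - 1))

# Quiz scores from Example 3-14 and Algebra scores from Example 3-13.
susan = [50, 45, 60, 50, 75]
lita = [60, 55, 56, 49, 60]
algebra = [33, 23, 40, 44, 15, 25]

sd_susan = sample_sd(susan)
sd_lita = sample_sd(lita)

print(round(sd_susan, 2), round(sd_lita, 2), round(sample_sd(algebra), 2))

# The student with the smaller standard deviation is the more consistent one.
more_consistent = "Lita" if sd_lita < sd_susan else "Susan"
print(more_consistent)
```

The standard library function `statistics.stdev` uses the same n − 1 denominator and gives the same values.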
Learning Tasks/Activities
Activity 1. Be sure to allocate time to read and comprehend the contents of the lesson. Once
you have completed your readings, write reflection notes that summarize the
significant learnings you gained, together with your insights, reflections, and
views. The reflection notes should be submitted at the end of the lesson.
Reflection Notes
I learned that . . .
Activity 2: A student’s final grades in ComSci 12, Mathematics, Statistics, Prof. Ed. 11, English,
Chemistry, Physics, Biology, and Earth Science are 90, 83, 85, 88, 85, 81,
83, 80, and 85, respectively. The credit units for these courses are 4, 3, 3, 3, 3, 4, 5, 4,
and 3, respectively.
a. Compute the weighted mean grade of the student.
b. If they have the same credit units of 3, what is the student’s mean?
c. What is the modal score? Justify your answer.
d. What is the median score?
e. If you draw a curve of the distribution of the scores, is the curve negatively
skewed or positively skewed or symmetrical? If so, what does it mean?
f. What is the range of the data?
g. Compute for the standard deviation of the scores. Interpret the results.
Assessment
Direction: Answer/Do as directed.
A. Multiple Choice. Select the best answer. Write the letter only that corresponds to the
answer that you have chosen.
1. What is the appropriate measure of central tendency to use when you refer to the
majority frequency of occupants categorized as male or female?
a. mean b. median c. mode
2. Which frequency distribution results to a curve that is skewed to the right?
a. $\bar{X} = \tilde{X}$ b. $\bar{X} > \tilde{X}$ c. $\bar{X} = \tilde{X}$
3. Which measure is applicable to letters in the alphabet measured as nominal and
discrete?
a. mean b. median c. mode
4. The scores in a test obtained a mean of 34 and a standard deviation of 6.2. This denotes that
a. Most of the scores are found within 6.2 units from each side of the score of 34.
b. Most of the scores are found within 6.2 units below the score of 34.
c. Most of the scores are found within 6.2 units above the score of 34.
5. John and Peter obtained the same mean scores for their midterm performance in
the different subjects. If John’s scores have a standard deviation of 4.2 and Peter’s
scores have a standard deviation of 7.2, who is more consistent in his performance?
a. John b. Peter c. Both
6. P50 of the scores 4, 7, 8, 9, 12 is in what central location?
a. mean b. median c. mode
7. Tina obtained a percentile rank of 92% in the LET. What does the percentile rank
of 92% mean?
a. 92% of all the examinees have scores below Tina’s score.
b. 92% of all the examinees have scores above Tina’s score.
c. 8% of all the examinees have scores below Tina’s scores.
8. Variability of the scores that is “big enough” indicates that
a. Some of the students are fast learners and some are slow learners.
b. Most of the students are slow learners.
c. Most of the students are fast learners.
9. Which is a median of the nominal data from c to m?
a. g b. h c. k
10. In a yes or no response to an issue asked, yes occurred more frequently than no;
hence, we say that yes is the ____________ score
a. Mean b. Median c. Modal
B. Modified True or False. Write T if the statement is true and change the underlined word
if the statement is false.
1. A student with a percentile rank of 75% demonstrates that he stands at a point below
25% and above 75% of the 100% score distribution.
2. If the mean is greater than the median, the distribution is negatively skewed.
3. The median corresponds to the 4th decile.
4. The deviations of the numbers 21, 2, -4, 5, and -15 to 7 is -22.
5. If the arithmetic mean is 12 and the number of observations is 20, then the sum of all
values is 240.
6. A value of the range shows the number of values between the highest and lowest
scores.
7. A small value of the variance indicates scores that are very near to the mean.
8. The median of the continuous scores 4, 8, 10, 14, 20, 21, 22 is 14.5.
9. In a symmetrical curve, the measures of central tendencies are equal.
10. There is no median in the nominal data of a, c, f and g.
C. Briefly but substantially answer the following problems in your own words.
1. What do you understand about the terms “median” and “mode”?
2. How many deciles and quartiles are there in a median? Justify your answer.
3. What do you understand about the term “variation”?