Central Tendency
The term central tendency refers to a middle value of a data distribution. Three such values are in
common use: the mean, the mode and the median, and each serves a distinct purpose.
The Mean, which is the arithmetic average of the data set, is the balance point of the data: the
deviations of the values above it add up to the same total as the deviations of the values below it.
The mean is used to get an idea of the average level of the parameter being studied. It gives no idea,
however, of the data variation or dispersion and for that reason can sometimes be misleading. A sample
mean is often studied to get an idea of the entire population, although, as we will study later, this is
subject to sampling errors, which also need to be evaluated when extrapolating the sample mean to the
larger population. For example, suppose we conduct a study on one street to find the average height of
the persons living there and obtain 165 cm. If we then stated that the global average height of human
beings is 165 cm, would the statement be correct? Obviously not. Why? Because the sample taken, one
street in one town, is not representative of the global population. If we took similar samples in a
Caucasian community or in a Hispanic community, the averages might be completely different. Drawing
inferences from the mean therefore needs caution, and the user should be well conversant with the data
set and what it represents.
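To make the point about dispersion concrete, here is a minimal Python sketch. The two height lists are hypothetical values chosen only for illustration, not survey data; they share the same mean but are spread very differently, which the mean alone cannot reveal.

```python
import statistics

# Hypothetical heights in cm for two streets (illustrative values only).
street_a = [164, 165, 166]
street_b = [150, 165, 180]

mean_a = statistics.mean(street_a)
mean_b = statistics.mean(street_b)

# Population standard deviation, used here only as a simple measure of spread.
spread_a = statistics.pstdev(street_a)
spread_b = statistics.pstdev(street_b)

print("means:  ", mean_a, mean_b)
print("spreads:", round(spread_a, 2), round(spread_b, 2))
```

Both streets report the same average height, yet the second one has a far wider spread, so a mean should be read together with a measure of dispersion.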
The Mode is the measure of central tendency that represents the category or class interval with the
highest occurrence in a data set distribution. The mode may not always be in the middle, and we often
see distributions (histograms) where the mode is shifted to the left or right of the total range. In
extreme cases, the mode may sit in the leftmost or rightmost category of the range. We are interested in
the mode so that we can quickly focus our attention on the group or category that shows the highest
occurrence in relation to the others.
The Median is the third measure of central tendency. It divides the entire data set into two halves by
occurrence (not by aggregated values). Thus, if we have a data set of 110 parts arranged in order, the
median falls at about the 55th part (strictly, midway between the 55th and 56th), or, if the data were
divided into class intervals, say of 10 each, the median class is whichever class contains the 55th
part. The median and its extensions (quartiles, percentiles) divide the population into segments. The
median class need not be in the center of the range. In a classroom of 40 children, the 50th percentile
(or median) may lie in the 70-80% marks range, whereas the entire range may start from 40% and go up to
100%.
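The following Python sketch illustrates the idea that the median splits the data by count and need not sit at the center of the range. The marks are hypothetical values invented for this illustration.

```python
import statistics

# Hypothetical marks (%) for ten children; illustrative values only.
marks = [40, 55, 72, 74, 75, 76, 78, 79, 85, 100]

median_mark = statistics.median(marks)       # average of the two middle values
mid_range = (min(marks) + max(marks)) / 2    # center of the 40-100 range

# Count how many marks fall on each side of the median.
below = sum(1 for m in marks if m < median_mark)
above = sum(1 for m in marks if m > median_mark)

print("median:", median_mark, "center of range:", mid_range)
print("below:", below, "above:", above)
```

Here the median lies in the 70-80 band even though the center of the range is 70, and the same number of children fall on either side of it.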
Let us now see how we calculate all three central tendencies from a data set:
MEAN VALUE: $\bar{x}$
The mean value or mean is the simple arithmetic average of the total of the sample values.
Ungrouped data
Direct calculation
Example : Five samples are taken from a lot of pins. The length of each sample is measured (units =
mm), with the results shown in Table-1a.
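Table 1              Table 1a
No.    x             No.    x (mm)
1      x1            1      52.14
2      x2            2      52.03
.      .             3      52.10
.      .             4      52.25
n      xn            5      52.16
Total  Σx            Total  260.68

$$\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{260.68}{5} = 52.136 \text{ (mm)}$$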
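The direct calculation can be checked with a few lines of Python. This is only a sketch: the variable names are my own, and the five lengths are the pin measurements used throughout this section.

```python
import statistics

# Pin lengths in mm from the worked example.
lengths_mm = [52.14, 52.03, 52.10, 52.25, 52.16]

n = len(lengths_mm)
total = sum(lengths_mm)
mean_mm = total / n                          # sum of values divided by count

# Cross-check against the standard library.
library_mean = statistics.mean(lengths_mm)

print("n =", n, "total =", round(total, 2), "mean =", round(mean_mm, 3))
```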
Calculation using Data Transformation
In this case, an amount a (52) is subtracted from each measured value to make a smaller number, which
is multiplied by b (100) to eliminate decimal places. The formulas are modified accordingly.
STEP 1: Transform each of the measured values, u = (x - a) × b, and arrange them as shown in Table 2.

Table 2                  Table 2a
No.    x     u           No.    x (mm)    u
1      x1    u1          1      52.14     14
2      x2    u2          2      52.03     3
.      .     .           3      52.10     10
.      .     .           4      52.25     25
n      xn    un          5      52.16     16
Total  -     Σu          Total  -         68
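STEP 2:
$$\bar{u} = \frac{\sum u}{n} = \frac{68}{5} = 13.6$$
STEP 3:
$$\bar{x} = \bar{u} \times \frac{1}{b} + a, \qquad a = 52,\ b = 100$$
$$\bar{x} = 13.6 \times \frac{1}{100} + 52 = 52.136 \text{ (mm)}$$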
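The same transformation can be sketched in Python. The names `a`, `b` and `u` follow the text; the use of `round()` is my own addition, needed because floating-point subtraction leaves tiny residues (for example 14.000000000000057 instead of 14).

```python
# Pin lengths in mm from the worked example.
lengths_mm = [52.14, 52.03, 52.10, 52.25, 52.16]

a = 52    # amount subtracted from each value
b = 100   # multiplier that removes the decimal places

# Coded values u = (x - a) * b, rounded to clean integers.
u = [round((x - a) * b) for x in lengths_mm]

u_bar = sum(u) / len(u)     # mean of the coded values
x_bar = u_bar / b + a       # convert back to the original units

print("u =", u, "sum =", sum(u))
print("u_bar =", u_bar, "x_bar =", round(x_bar, 3))
```

The result agrees with the direct calculation, which is the point of the method: the coded values are easier to handle by hand, and the back-transformation recovers the original mean.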
MODE VALUE : X
The mode is the most frequent score in a data set. On a histogram or bar chart it corresponds to the
highest bar. You can, therefore, sometimes consider the mode as being the most popular option. An
example of a mode is given below. Normally, the mode is used for categorical data where we wish to know
which is the most common category.
Example: A data set has the following values:
N=16: 33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 45
The values are plotted against their frequency on a histogram. As the histogram shows, the value 39
appears 4 times, which is the maximum in comparison to the other values.
The mode = 39
[Figure: histogram of the data set, with Score (33 to 45) on the horizontal axis and Frequency (0 to 4) on the vertical axis]
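The frequency count behind this example can be reproduced with a short Python sketch. It uses `collections.Counter` from the standard library; the variable names are my own.

```python
from collections import Counter
import statistics

# The 16 scores from the example.
scores = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]

# Tally how often each score occurs, then take the most frequent one.
frequency = Counter(scores)
mode_value, mode_count = frequency.most_common(1)[0]

# Cross-check with the standard library.
library_mode = statistics.mode(scores)

print("N =", len(scores), "mode =", mode_value, "frequency =", mode_count)
```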
We can have data sets where two values or class intervals share the highest frequency, as shown in the
figure below. The most probable cause is that the data are mixed from two sources, each having a similar
distribution but in a different range.
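A two-mode situation can be sketched in Python with `statistics.multimode` (available from Python 3.8). The two source lists are hypothetical values made up for this illustration; each has the same shape but sits in a different range.

```python
import statistics

# Hypothetical measurements from two sources (illustrative values only).
source_a = [10, 11, 11, 11, 12]
source_b = [20, 21, 21, 21, 22]

# Mixing the two sources produces a distribution with two peaks.
mixed = source_a + source_b
modes = statistics.multimode(mixed)

print("modes of the mixed data:", modes)
```

When a data set reports two modes like this, it is worth checking whether it combines two sources before drawing conclusions from it.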
In data distributions that are rectangular (random), as in the figure below, there may be no mode.
Slight variations in frequency in such distributions should not be mistaken for a mode.
MEDIAN VALUE: $\tilde{x}$
The median value is found by arranging the data values in order and determining the middle value. It is
sometimes used in place of the mean.
For an Odd Number of Values
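Example: Determine the value of the median $\tilde{x}$ from the data given in Table 1a.
STEP 1: Arrange the values in order: 52.03, 52.10, 52.14, 52.16, 52.25.
STEP 2: The value at the center is taken as the median value: $\tilde{x} = 52.14$ (mm)
For an Even Number of Values
STEP 1: Arrange the values in order.
STEP 2: The two central values are determined, and their average is taken as the median.
$$\tilde{x} = \frac{52.14 + 52.16}{2} = 52.15 \text{ (mm)}$$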
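Both cases can be sketched in Python with `statistics.median`. The odd case uses the five pin lengths from this section. For the even case, the value 52.20 is a hypothetical sixth measurement that I have added only so that the two central values are 52.14 and 52.16; it does not come from the original data.

```python
import statistics

# Odd number of values: the five pin lengths (mm).
odd_values = [52.14, 52.03, 52.10, 52.25, 52.16]

# Even number of values: 52.20 is a hypothetical extra measurement.
even_values = odd_values + [52.20]

# statistics.median sorts the data internally, then takes the middle value
# (odd count) or the average of the two middle values (even count).
median_odd = statistics.median(odd_values)
median_even = statistics.median(even_values)

print("sorted odd: ", sorted(odd_values), "median:", median_odd)
print("sorted even:", sorted(even_values), "median:", round(median_even, 2))
```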
Data dispersion
STEP 1: Determine the maximum value $x_{max}$ and the minimum value $x_{min}$ among the measured values.
VARIANCE
The variance and standard deviation are two measures of variability that indicate how much the scores
are spread out around the mean.
The population variance is the true or actual variance of the population of scores:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$
where $\sigma^2$ is the variance, $\mu$ is the mean, and n is the population size.
The sample variance is the average of the squared deviations of scores around the sample mean, that is,
$\sum (x - \bar{x})^2 / n$.
If the variance in a sample is used to estimate the variance in a population, then the previous
formula underestimates the variance and the following formula should be used:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
where $s^2$ is the estimate of the variance and $\bar{x}$ is the sample mean. Note that $\bar{x}$ is the
mean of a sample taken from a population with a mean of $\mu$. Since, in practice, the variance is
usually computed in a sample, this formula is most often used.
There are alternate formulas that can be easier to use if you are doing manual calculations. You should
note that these formulas are subject to rounding error if your values are very large and/or you have an
extremely large number of observations.
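$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n} \quad \text{and} \quad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$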
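The rounding-error warning can be demonstrated with a small Python sketch. The function names and the large readings are my own hypothetical choices: four values near one billion whose deviations from the mean are -6, -3, 3 and 6, so the exact sample variance is 90 / 3 = 30.

```python
def variance_shortcut(values):
    """Sample variance from the sum of x and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def variance_two_pass(values):
    """Sample variance from squared deviations around the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Small values: both formulas agree (pin lengths in mm).
small = [52.14, 52.03, 52.10, 52.25, 52.16]

# Hypothetical large readings: the exact sample variance is 30.
large = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]

two_pass = variance_two_pass(large)
shortcut = variance_shortcut(large)

print("small:", variance_shortcut(small), variance_two_pass(small))
print("large, two-pass:", two_pass)
print("large, shortcut:", shortcut)
```

With the large readings, the squares are so big that double-precision arithmetic can no longer resolve the small difference between the two sums, so the shortcut formula returns a badly wrong value while the deviation-based formula stays exact.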
STEP 1: The measured value x and the square of the measured value x² are shown in Table 3.

Table 3                    Table 3a
No.    x     x²            No.    x (mm)    x²
1      x1    x1²           1      52.14     2,718.5796
2      x2    x2²           2      52.03     2,707.1209
.      .     .             3      52.10     2,714.4100
.      .     .             4      52.25     2,730.0625
n      xn    xn²           5      52.16     2,720.6656
Total  Σx    Σx²           Total  260.68    13,590.8386

STEP 2: Calculate the sum of squares (S)
$$S = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad S = 13{,}590.8386 - \frac{(260.68)^2}{5} = 0.02612$$
STEP 3: Calculate the variance (V)
$$v = \frac{S}{n - 1}, \qquad v = \frac{0.02612}{5 - 1} = 0.00653$$
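The worked figures can be verified with a short Python sketch. The variable names are my own; `statistics.variance` is used only as an independent cross-check of the sample variance.

```python
import statistics

# Pin lengths in mm from the worked example.
lengths_mm = [52.14, 52.03, 52.10, 52.25, 52.16]
n = len(lengths_mm)

sum_x = sum(lengths_mm)                       # total of x
sum_x2 = sum(x * x for x in lengths_mm)       # total of x squared

S = sum_x2 - sum_x ** 2 / n                   # sum of squares
v = S / (n - 1)                               # sample variance

library_v = statistics.variance(lengths_mm)   # independent check

print("sum x =", round(sum_x, 2), "sum x^2 =", round(sum_x2, 4))
print("S =", round(S, 5), "v =", round(v, 5))
```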
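STEP 1: Transform each of the measured values and arrange them as shown in Table 4.

Table 4
No.    x     u = (x - a) × b    u²
1      x1    u1                 u1²
2      x2    u2                 u2²
.      .     .                  .
n      xn    un                 un²

Table 4a
No.    x (mm)    u = (x - 52) × 100    u²
1      52.14     14                    196
2      52.03     3                     9
3      52.10     10                    100
4      52.25     25                    625
5      52.16     16                    256
Total  -         68                    1,186

STEP 2: Calculate the sum of squares (S)
$$S = \frac{1}{b^2}\left[\sum u^2 - \frac{(\sum u)^2}{n}\right] = \frac{1}{(100)^2}\left[1{,}186 - \frac{(68)^2}{5}\right] = 0.02612$$
STEP 3: Calculate the variance (V)
$$v = \frac{S}{n - 1} = \frac{0.02612}{5 - 1} = 0.00653 \text{ (mm}^2\text{)}$$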
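A Python sketch of the coded calculation follows. The names `a`, `b` and `u` follow the text; `round()` is my own addition to strip floating-point residue from the coded values so that they become exact integers.

```python
# Pin lengths in mm from the worked example.
lengths_mm = [52.14, 52.03, 52.10, 52.25, 52.16]
a, b = 52, 100
n = len(lengths_mm)

# Coded values and their squares.
u = [round((x - a) * b) for x in lengths_mm]
u_squared = [value ** 2 for value in u]

sum_u = sum(u)
sum_u2 = sum(u_squared)

# Sum of squares in original units: divide by b squared to undo the scaling.
S = (sum_u2 - sum_u ** 2 / n) / b ** 2
v = S / (n - 1)

print("sum u =", sum_u, "sum u^2 =", sum_u2)
print("S =", round(S, 5), "v =", round(v, 5))
```

Subtracting the constant a does not change the spread, so only the multiplier b has to be undone, and because the variance is in squared units the correction factor is 1/b².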
STANDARD DEVIATION:
The standard deviation indicates the “average deviation” from the mean, the consistency in the scores,
and how far scores are spread out around the mean.
The sample standard deviation is the square root of the sample variance. It is denoted by s.
The population standard deviation is the true or actual standard deviation of the population of scores. It is
denoted by σ.
The standard deviation is an especially useful measure of variability when the distribution is normal or
approximately normal because the proportion of the distribution within a given number of standard
deviations from the mean can be calculated.
In a normal distribution, approximately 68% of the distribution lies within plus or minus one standard
deviation of the mean, approximately 95% lies within plus or minus two standard deviations of the mean,
and 99.73% lies within plus or minus three standard deviations of the mean.
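These proportions can be reproduced from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is my own.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within +/- {k} sd: {share_within(k) * 100:.2f}%")
```

Because every normal distribution is a shifted and scaled copy of the standard one, the same proportions hold for any mean and standard deviation.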
Example: The lengths of eighty component samples were measured (unit: mm) and the frequency of the
various measurements was recorded on the frequency table (Table 5).
STEP 1: Make a frequency table with f, u, uf, and u²f columns on its right side.
Table 5: Frequency Table for Calculation of $\bar{x}$ and s
Totals: Σf = 80, Σuf = 6, Σu²f = 242
STEP 2: Filling in the u column
u = 0 marks the center of the distribution, which corresponds to the class with the highest frequency f.
Classes above this central class have ascending u values of 1, 2, 3, etc.; classes below it have
descending u values of -1, -2, -3, etc. In this case, set u = 0 at class no. 5, which appears to be near
the center of the distribution. On the basis of u = 0 for class 5, assign u = 1, 2, 3, etc., to classes
6, 7, 8, etc., and u = -1, -2, -3, etc., to classes 4, 3, 2, etc.
STEP 3: Filling in the uf column
Calculate the product of u and f and write it in the uf column. Determine the total Σuf. For example:
No. 1: uf = 2 × (-4) = -8; No. 2: uf = 4 × (-3) = -12; and so on, to the result Σuf = 6
STEP 4: Filling in the u²f column
Calculate the product of u and uf and write it in the u²f column. Determine the total Σu²f. For example:
No. 1: u²f = (-4) × (-8) = 32; No. 2: u²f = (-3) × (-12) = 36; and so on, to the result Σu²f = 242
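STEP 5: Calculating the average value $\bar{x}$
$$\bar{x} = 29.95 + 0.2 \times \frac{6}{80} = 29.95 + 0.015 = 29.965 \text{ (mm)}$$
STEP 6: Calculating the standard deviation s
$$s = h \sqrt{\frac{\sum u^2 f - \frac{(\sum uf)^2}{n}}{n - 1}} = 0.2 \times \sqrt{\frac{242 - \frac{6^2}{80}}{80 - 1}} = 0.2 \times \sqrt{3.0576} = 0.2 \times 1.749 = 0.350 \text{ (mm)}$$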
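The final two calculations can be sketched in Python from the column totals alone. The function name and the parameter names `x0` (the value 29.95 assigned to the class with u = 0) and `h` (the multiplier 0.2) are my own labels for the numbers that appear in the worked formulas; treat them as assumptions rather than the handout's notation.

```python
import math


def grouped_mean_and_sd(x0, h, n, sum_uf, sum_u2f):
    """Mean and sample standard deviation from coded frequency-table totals.

    x0      : value assigned to the class with u = 0 (assumed label)
    h       : multiplier that converts u back to measurement units (assumed label)
    n       : total frequency
    sum_uf  : total of the uf column
    sum_u2f : total of the u^2 f column
    """
    mean = x0 + h * sum_uf / n
    sd = h * math.sqrt((sum_u2f - sum_uf ** 2 / n) / (n - 1))
    return mean, sd


# Totals from the worked example.
mean_mm, sd_mm = grouped_mean_and_sd(x0=29.95, h=0.2, n=80, sum_uf=6, sum_u2f=242)

print("mean =", round(mean_mm, 3), "mm")
print("s =", round(sd_mm, 3), "mm")
```

Working from the totals keeps the sketch independent of the individual class frequencies, which are only needed to build the uf and u²f columns in the first place.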