Lecture Notes 2 Data Organization and Presentation

This document discusses various methods for organizing and presenting quantitative data, including graphical and numerical summaries. It provides examples of different data visualization techniques like histograms, bar charts, pie charts, and stem-and-leaf diagrams. Guidelines are given for constructing frequency tables and distributions from raw data, as well as rules for designing clear and informative tables and graphs.

Uploaded by

Enock T Muchinako

NATIONAL DIPLOMA IN QUANTITY SURVEYING-575/15/TN/0

SUBJECT TITLE: STATISTICS

SUBJECT CODE: 575/15/SO6

DATA ORGANIZATION AND PRESENTATION

When data are first collected they are raw data, i.e. ungrouped, muddled, cumbersome and
uninteresting, so they need to be summarised.
Appropriate ways to summarise these data:
- Graphical Summary
- Numerical Summary

Graphical Summary
This takes the form of tables, tree diagrams, stem-and-leaf diagrams, bar charts, pie charts,
pictographs, line graphs, histograms, frequency distribution curves, ogives, etc.

Why do we summarise data?

-The first step in any analysis is to describe and summarize the data
-to reduce data
-to conserve storage space
-to order data
-to see the salient features of the data
-to become familiar with the data
-to look for unusually high or low values (outliers)
- to check the assumptions required for statistical tests
-to decide the best way to categorize the data if this is necessary
-In addition to tables and graphs, summary values are a convenient way to summarize
large amounts of information.

We shall describe and give examples of qualitative data (unordered and ordered) and
quantitative data (discrete and continuous); how these types of data can be represented
figuratively; the two important features of a quantitative dataset (location and
variability); the measures of location (mean, median and mode); the measures of
variability (range, interquartile range, standard deviation and variance)

How to design a table after collecting data?

a) Lay out the rows and columns
b) The content of each cell is determined by its row and column
c) Annotation (footnotes should be used to qualify or clarify the table)
d) It should be simple
e) The source of the data must be stated
f) Units of measurement must be clearly stated

General rules for presentation of graphs
a) Short and informative title (clear and comprehensive title)
b) The correct impression must be given
c) Units of measurement must be shown
Example 1
1. Cross table
Table 1: Mean number of students per class in the Civil Engineering department, by course level

Course                        NC    ND    HND
Quantity Surveying            24    22    10
Water Resources Engineering   25    18     9
Civil Engineering             30    25    10

2. Stem-and-Leaf Diagrams
A stem-and-leaf diagram has the advantage of retaining the data in its original form while
providing a visual representation. Illustrated below is the age distribution of some adults
aspiring to be presidential candidates. In this case the stem, the tens digit of the candidate's
age, is given on the left, and the leaf, the units digit of the candidate's age, is given on the
right.
Example 2
The ages collected for 43 presidential candidates are as follows:
42, 43, 46, 46, 47, 48, 49, 49, 50, 51, 51, 51, 51, 51, 52, 52, 54, 54, 54, 54, 54, 55, 55, 55, 55,
56, 56, 56, 57, 57, 57, 57, 58, 60, 61, 61, 61, 62, 64, 64, 65, 68, 69
Stem|Leaf
4|23667899
5|0111112244444555566677778
6|0111244589
Or
Reformatting the above with more rows (called "splitting the stem" in some books) shows
even more clearly its roughly bell-shaped, approximately normal nature. Notice how the
stem-and-leaf diagram is also somewhat like a histogram, but turned on its side.
Stem|Leaf
4|23
4|667899
5|0111112244444
5|555566677778
6|0111244
6|589

Please note that the separation line should be continuous. The following rules should be
observed when constructing stem-and-leaf diagrams.
1. The leaves on the right should be in increasing (or decreasing) order, left to right.
2. No commas should appear on the right.
3. No horizontal lines should appear.
4. If the stem/leaf break occurs at a decimal point, put the decimal point to the left with
the stem.
5. If the leaf is double or triple digit, etc., leave a [half] space between each entry.
6. There should be at least five but no more than twenty rows.
7. If a range is used for the stem, an asterisk (*) may be used to separate the
corresponding leaves.
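
The construction in Example 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stem_and_leaf` is an assumption of this example, and it handles two-digit whole numbers only (tens digit as stem, units digit as leaf).

```python
from collections import defaultdict

# Ages of the 43 presidential candidates from Example 2.
ages = [42, 43, 46, 46, 47, 48, 49, 49, 50, 51, 51, 51, 51, 51, 52, 52,
        54, 54, 54, 54, 54, 55, 55, 55, 55, 56, 56, 56, 57, 57, 57, 57,
        58, 60, 61, 61, 61, 62, 64, 64, 65, 68, 69]

def stem_and_leaf(values):
    """Return rows such as '4|23667899' for two-digit whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting keeps the leaves in increasing order
        stem, leaf = divmod(v, 10)    # tens digit is the stem, units digit the leaf
        leaves[stem].append(str(leaf))
    return [f"{stem}|{''.join(leaves[stem])}" for stem in sorted(leaves)]

rows = stem_and_leaf(ages)
for row in rows:
    print(row)
```

The printed rows should match the three-row diagram in Example 2; splitting the stems would need an extra rule for leaves 0-4 and 5-9.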
Example 3
The number of rooms in each of 40 houses in a particular street is given by the
following set of data:

5 6 4 3 3 6 6 4 5 4 7 8 3 5 4 4 4 8 8 3 5 5 6 5 7
4 6 5 4 3 3 4 5 5 4 7 6 10 9 8
- For the information to be manageable, we divide it into groups and form a
frequency table.
- Recording each observation with a stroke against its group is called a tally.
- Normally, if we have little data, we array (re-arrange) it in order of size.
3. Frequency Tables or Distributions

A frequency table lists in one column the data categories or classes and
in another column the corresponding frequencies.

Score limits (class limits) are the largest or smallest numbers which can actually belong to each class.
Class interval (class width) is the difference between two exact limits (class boundaries) (or
corresponding score/class limits).
Guidelines for constructing frequency tables.
1. The classes must be "mutually exclusive"—no element can belong to more than one class.
2. Even if the frequency is zero, include each and every class.
3. Make all classes the same width. (However, open ended classes may be inevitable.)
4. Target between 5 and 20 classes, depending on the range and number of data points.
5. Keep the limits as simple and as convenient as possible (multiple of width?).
6. If practical, make the width odd so that the interval midpoint is a whole number.

3. Frequency distribution Table for the number of rooms in each of 40 houses

Number of rooms   Tally          Frequency (fi)
3                 IIIII I        6
4                 IIIII IIIII    10
5                 IIIII IIII     9
6                 IIIII I        6
7                 III            3
8                 IIII           4
9                 I              1
10                I              1
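
As a cross-check on the tally, the same frequency table can be produced from the raw data of Example 3. This is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# Number of rooms in each of the 40 houses (Example 3).
rooms = [5, 6, 4, 3, 3, 6, 6, 4, 5, 4, 7, 8, 3, 5, 4, 4, 4, 8, 8, 3, 5, 5, 6, 5, 7,
         4, 6, 5, 4, 3, 3, 4, 5, 5, 4, 7, 6, 10, 9, 8]

frequency = Counter(rooms)            # maps each value to how often it occurs

for value in sorted(frequency):
    tally = "I" * frequency[value]    # one stroke per observation
    print(f"{value:>2}  {tally:<10}  {frequency[value]}")
```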

4. Bar Chart
Data represented as a series of bars, height of bar proportional to frequency
Bar graph for the number of rooms in each of 40 houses

[Figure: bar chart titled "number of rooms"; bars for 3 to 10 rooms, frequency axis from 0 to 12]
5. Line graph for the number of rooms in each of 40 houses

[Figure: line graph titled "rooms"; points for 3 to 10 rooms, frequency axis from 0 to 12]

6. Pie chart
- Data represented as a circle divided into segments, area of segment proportional to
frequency.
-a pie chart is a circle divided by radial lines into sections so that the area of each
section is proportional to the size of figure represented

Pie chart for number of rooms in each of 40 houses

[Figure: pie chart titled "houses"; one segment for each of 3, 4, 5, 6, 7, 8, 9 and 10 rooms]

7. Histogram
-A bar chart for a continuous distribution is referred to as a histogram
-It is similar to a bar chart, but for a continuous rather than a categorical variable
-The area of each bar is proportional to the probability of an observation falling in that bar
-The vertical axis can be:
- Frequency (heights add up to n)
- Percentage (heights add up to 100%)
- Density (areas add up to 1)
Example 4
From the frequency table below, which shows the number of days technologists
spend to complete a certain project, construct a histogram.

Number of days   Tally mark                      Frequency
0-4              II                              2
5-9              IIIII IIIII IIIII               15
10-14            IIIII IIIII IIIII IIIII I       21
15-19            IIIII IIIII IIIII III           18
20-24            IIIII IIIII IIII                14
25-29            IIIII IIIII III                 13
30-34            IIIII IIII                      9
35-39            IIIII                           5
40-44            II                              2
45-49            I                               1

- When class intervals are equal, a histogram can be constructed straight away from
the given data(drawn manually)
8. Frequency curve
Procedure
-Mark the midpoint of the top of each bar on a histogram
-Join the points with straight lines, then smooth them to form a curve

9. Ogive
-A graph drawn from a cumulative frequency distribution [ALWAYS USE GRAPH
PAPER]
Procedure
- Compute the cumulative frequencies of the distribution
- Prepare a graph with the upper class boundaries on the horizontal axis and the cumulative
frequency on the vertical axis
- The starting point should be zero
- Plot each cumulative frequency at the upper class boundary of its class
Example 5
Using the data for the example of number of rooms in each of 40 houses, construct a
cumulative frequency graph (ogive)(less than ogive)

Cumulative frequency table for the number of rooms in each of 40 houses

Number of rooms   Cumulative frequency
≤ 3               6
≤ 4               16
≤ 5               25
≤ 6               31
≤ 7               34
≤ 8               38
≤ 9               39
≤ 10              40

Draw a Cumulative frequency curve
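
The cumulative frequencies plotted on a "less than" ogive are running totals of the frequency column. A minimal sketch using the rooms data and the standard-library `itertools.accumulate` follows; the variable names are illustrative.

```python
from itertools import accumulate

room_counts = [3, 4, 5, 6, 7, 8, 9, 10]
frequencies = [6, 10, 9, 6, 3, 4, 1, 1]

# Running total: each entry is the number of houses with at most that many rooms.
cumulative = list(accumulate(frequencies))

for rooms, cf in zip(room_counts, cumulative):
    print(f"<= {rooms:>2}: {cf}")
```

The last cumulative frequency should always equal the total number of observations (here 40), which is a quick check on the arithmetic.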

Exercise
1. The data below show the age distribution of a small village.

$$\text{Frequency density} = \frac{\text{frequency}}{\text{class width}}$$

Age (yrs)   Frequency   Frequency density
0-14        18          1.2
15-19       21          4.2
20-24       38          7.6
25-34       41          4.1
35-44       38          3.8
45-59       15          1
60+         20          2

Draw a histogram to represent this information stating any assumptions you make
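
The frequency densities in the table can be reproduced as below. This sketch makes two assumptions that are not stated in the exercise: ages are whole completed years, so the class 0-14 has width 15, and the open class 60+ is closed at 69 (width 10), which is the width implied by the tabulated density of 2.

```python
# (lower limit, upper limit, frequency); the upper limit 69 for "60+" is an assumption.
classes = [(0, 14, 18), (15, 19, 21), (20, 24, 38), (25, 34, 41),
           (35, 44, 38), (45, 59, 15), (60, 69, 20)]

densities = []
for lower, upper, freq in classes:
    width = upper - lower + 1               # e.g. 0-14 covers 15 whole years
    densities.append(round(freq / width, 2))

print(densities)
```

With unequal class widths the bar heights of the histogram are these densities, not the raw frequencies, so that bar areas stay proportional to frequency.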
2. The table below shows the distribution of skills offered by a construction company.

Skill         % available
Survey        12
Billing       20
Building      26
Plumbing      32
Civil works   10

Represent this information in a pie chart.

NB: When given raw data you have to make a choice of classes:
- The number of classes should be below ten if possible
- Wherever practical, class intervals should be equal
- Class intervals of 5 to 10 are more convenient
- Classes should be chosen in such a way that occurrences within the classes tend to
balance around the midpoints of the classes
Numerical Statistics
-these are means, mode, median, standard deviation, interquartile range, percentiles, quartiles,
variance

1. Measures of Central Tendency

A measure of central tendency is a single value that describes the way in which a group of
data clusters around a central value. In other words, it is a way to describe the center of
a data set. There are three measures of central tendency: the mean, the median, and the mode.

It is a single value that attempts to describe a set of data by identifying the central position
within that set of data. As such, measures of central tendency are sometimes called measures
of central location. They are also classed as summary statistics. The mean (often called the
average) is most likely the measure of central tendency that you are most familiar with, but
there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under different
conditions, some measures of central tendency become more appropriate to use than others.
In the following sections, we will look at the mean, mode and median, and learn how to
calculate them and under what conditions they are most appropriate to be used.

Mean (Arithmetic)

The mean (or average) is the most popular and well known measure of central tendency. It
can be used with both discrete and continuous data, although its use is most often with
continuous data. The mean is equal to the sum of all the values in the data set divided by the
number of values in the data set. So, if we have n values in a data set and they have values x1,
x2, ..., xn, the sample mean, usually denoted by x̄ (pronounced "x bar"), is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek capital letter
Σ, pronounced "sigma", which means "sum of...":

$$\bar{x} = \frac{\sum x}{n}$$

The above formula refers to the sample mean. This is because, in statistics, samples and
populations have very different meanings and these differences are very important, even if, in
the case of the mean, they are calculated in the same way. To acknowledge that we are
calculating the population mean and not the sample mean, we use the Greek lower case letter
"mu", denoted as µ:

$$\mu = \frac{\sum x}{n}$$

The mean is essentially a model of your data set. It is a single value chosen to represent
all the values. You will notice, however, that the mean is not often one of the actual values
that you have observed in your data set (and it is not necessarily the most common value;
that is the mode).

However, one of its important properties is that it minimises error in the prediction of any
one value in your data set. That is, it is the value for which the sum of the squared deviations
from all the values in the data set is smallest.

An important property of the mean is that it includes every value in your data set as part of
the calculation. In addition, the mean is the only measure of central tendency where the sum
of the deviations of each value from the mean is always zero.

One main disadvantage of mean: it is particularly susceptible to the influence of outliers.

These are values that are unusual compared to the rest of the data set by being especially
small or large in numerical value.

For example, consider the wages of staff at a factory below (mean for ungrouped data):

Staff        1    2    3    4    5    6    7    8    9    10
Salary ($)   15   18   16   14   15   15   12   17   90   95

The mean salary for these ten staff is $30.7. However, inspecting the raw data suggests that
this mean value might not be the best way to accurately reflect the typical salary of a worker,
as most workers have salaries in the $12 to $18 range. The mean is being skewed by the two
large salaries. Therefore, in this situation, we would like to have a better measure of central
tendency. As we will find out later, taking the median would be a better measure of central
tendency in this situation.
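
The effect of the two large salaries can be checked numerically. The following is a small sketch using Python's standard `statistics` module on the ten salaries above; the variable names are illustrative.

```python
import statistics

salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = statistics.mean(salaries)      # sum of the values divided by n
median_salary = statistics.median(salaries)  # middle of the ordered values

print(f"mean = {mean_salary}, median = {median_salary}")
```

The mean comes out at 30.7, as stated above, while the median of 15.5 sits inside the $12 to $18 range where most of the salaries lie.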

Exercise

Calculate the mean for the data below

60, 72, 61, 66, 63, 66, 59, 64, 71, 68.

Example

Mean for grouped data

$$\bar{x} = \frac{\sum fx}{\sum f}$$

The heights of boys in a class are measured to the nearest cm and the results are tabulated as
follows:

Height (cm)   Frequency (f)   Midpoint (x)   fx
145-154.9     3               150            450
155-164.9     9               160            1440
165-174.9     21              170            3570
175-184.9     13              180            2340
185-194.9     4               190            760
              ∑f = 50                        ∑fx = 8560

$$\bar{x} = \frac{8560}{50} = 171.2$$
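
The grouped-mean calculation can be checked with a short script. This sketch simply repeats the arithmetic of the table, using the class midpoints as given in the notes.

```python
midpoints = [150, 160, 170, 180, 190]   # x, class midpoints from the table
frequencies = [3, 9, 21, 13, 4]         # f

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
grouped_mean = sum_fx / sum_f

print(sum_f, sum_fx, grouped_mean)
```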

The data below show the age distribution of a small village. Find the mean for the data.

Age (yrs)   Frequency   Midpoint (x)   fx
0-14        18          7
15-19       21          17
20-24       38          22
25-34       41          29.5
35-44       38          39.5
45-59       15          52
60-69       20          64.5

Median
The median is the central value when all observations are sorted in order.

-If there is an odd number of observations, then it is simply the middle value; if there is an
even number of observations then it is the average of the middle two.

-The median does not have the beneficial mathematical properties of the mean.

-However, it is not generally influenced by extreme values (outliers), and as a result it is
particularly useful in situations where there are unusually low or high values that would
render the mean unrepresentative of the data.

-The median is the middle score for a set of data that has been arranged in order of
magnitude.

-The median is less affected by outliers and skewed data. In order to calculate the median,
suppose we have the data below:

Example (ungrouped data)

65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark - in this case, 56, the sixth value in the ordered list. It is the middle
mark because there are 5 scores before it and 5 scores after it. This works fine when you have
an odd number of scores, but what happens when you have an even number of scores? What
if you had only 10 scores? Well, you simply have to take the middle two scores and average
the result. So, if we look at the example below:

65 55 89 56 35 14 56 55 87 45

We again rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

Only now we have to take the 5th and 6th score in our data set and average them to get a
median of 55.5.
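
Both cases (odd and even number of scores) can be sketched as one small function. The function name `median` is illustrative; Python's standard `statistics.median` applies the same rule.

```python
marks_odd = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]   # 11 scores
marks_even = marks_odd[:-1]                                 # the same data without 92

def median(values):
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle two

print(median(marks_odd), median(marks_even))
```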
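Example (grouped data)

$$\text{Median} = l_m + \frac{c_m\left(\tfrac{1}{2}n - f_{m-1}\right)}{f_m}$$

Where:

$l_m$ = lower class boundary of the median class

$c_m$ = width of the median class

$n$ = total number of observations ($\sum f$)

$f_{m-1}$ = cumulative frequency up to and including the class immediately below the median class

$f_m$ = frequency of the median class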

Calculate the median for the grouped data on heights of boys in class.
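
One way to set up this calculation is sketched below. It interpolates within the median class using the cumulative frequency of all classes below it. The class boundaries 145, 155, ..., 195 are an assumption of this sketch, chosen to agree with the midpoints 150, 160, ... used in the heights table; the function name `grouped_median` is also illustrative.

```python
# (lower boundary, upper boundary, frequency); boundaries are assumed, see above.
classes = [(145, 155, 3), (155, 165, 9), (165, 175, 21), (175, 185, 13), (185, 195, 4)]

def grouped_median(classes):
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    cumulative_below = 0                      # cumulative frequency below the current class
    for lower, upper, f in classes:
        if cumulative_below + f >= n / 2:     # this class contains the (n/2)th value
            width = upper - lower
            return lower + width * (n / 2 - cumulative_below) / f
        cumulative_below += f

median_height = grouped_median(classes)
print(round(median_height, 2))
```

A different choice of class boundaries (for example 144.95 to 154.95) would shift the result slightly, so the boundaries used should always be stated.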

Mode
The mode is simply the most commonly occurring value in the data. It is not generally used
because it is often not representative of the data, particularly when the dataset is small.

The mode is the most frequent score in our data set. It corresponds to the highest
bar in a bar chart or histogram. You can, therefore, sometimes consider the mode as being the
most popular option.

Examples of the mode are presented below. What is the modal value in each data set?

i) 1, 2, 3, 4, 100: the mode does not exist

ii) 12, 16, 8, 11, 12, 8, 2, 8, 1, 14: the mode is 8

iii) 12, 16, 8, 11, 12, 8, 2, 8, 1, 14, 12: the modes are 8 and 12 (a bimodal set)
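
The three cases can be checked with a short counting routine. This is a sketch under one stated convention taken from these notes: when every value occurs only once, the mode is treated as not existing (an empty list). The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                  # nothing repeats: no mode, by the notes' convention
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 3, 4, 100]))
print(modes([12, 16, 8, 11, 12, 8, 2, 8, 1, 14]))
print(modes([12, 16, 8, 11, 12, 8, 2, 8, 1, 14, 12]))
```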

Normally, the mode is used for categorical data where we wish to know which is the most
common category, as illustrated below on forms of transport used by students to come to
college:
We can see above that the most common form of transport, in this particular data set, is the
bus. However, one of the problems with the mode is that it is not unique, so it leaves us with
problems when we have two or more values that share the highest frequency, such as below:
We are now stuck as to which mode best describes the central tendency of the data. This is
particularly problematic when we have continuous data because we are more likely not to
have any one value that is more frequent than the other. For example, consider measuring 30
peoples' weight (to the nearest 0.1 kg). How likely is it that we will find two or more people
with exactly the same weight (e.g., 67.4 kg)? The answer, is probably very unlikely - many
people might be close, but with such a small sample (30 people) and a large range of possible
weights, you are unlikely to find two people with exactly the same weight; that is, to the
nearest 0.1 kg. This is why the mode is very rarely used with continuous data.

Another problem with the mode is that it will not provide us with a very good measure of
central tendency when the most common mark is far away from the rest of the data in the data
set, as depicted in the diagram below:
In the above diagram the mode has a value of 2. We can clearly see, however, that the mode
is not representative of the data, which is mostly concentrated around the 20 to 30 value
range. To use the mode to describe the central tendency of this data set would be misleading.

Summary of when to use the mean, median and mode

Please use the following summary table to know what the best measure of central tendency is
with respect to the different types of variable.

Type of Variable               Best measure of central tendency

Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median

Advantages and Disadvantages of Measures of Central Tendency
NOT TO BE EXAMINED

Geometric Mean
It is defined as the arithmetic mean of the values taken on a log scale, transformed back to
the original scale (the antilog). It is also expressed as the nth root of the product of the
n observations.

GM is an appropriate measure when values change exponentially and in the case of a skewed
distribution that can be made symmetrical by a log transformation. GM is more commonly
used in microbiological and serological research. One important disadvantage of GM is that it
cannot be used if any of the values are zero or negative.

Harmonic mean
It is the reciprocal of the arithmetic mean of the reciprocals of the observations.

Equivalently, the reciprocal of HM is the mean of the reciprocals of the individual observations.

HM is appropriate in situations where the reciprocals of values are more useful. HM is used
when we want to determine the average sample size of a number of groups, each of which has
a different sample size.
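
The two definitions of the geometric mean and the definition of the harmonic mean can be compared on a small data set. The three values below are hypothetical, chosen only so that the arithmetic is easy to follow; the variable names are illustrative.

```python
import math
import statistics

values = [2, 8, 32]   # hypothetical positive observations (GM needs all values > 0)
n = len(values)

# Geometric mean, definition 1: antilog of the arithmetic mean of the logs.
gm_from_logs = math.exp(sum(math.log(v) for v in values) / n)

# Geometric mean, definition 2: nth root of the product of the n observations.
gm_from_root = math.prod(values) ** (1 / n)

# Harmonic mean: reciprocal of the arithmetic mean of the reciprocals.
hm = n / sum(1 / v for v in values)

print(gm_from_logs, gm_from_root, hm)
print(statistics.geometric_mean(values), statistics.harmonic_mean(values))
```

Both geometric-mean routes should agree up to floating-point rounding, and the harmonic mean is smaller than the geometric mean for these values.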

Skewness: Measure of Asymmetry

The terms skewed and askew are widely used to refer to something that is out of
order or distorted on one side. Similarly, when referring to the shape of frequency
distributions or probability distributions, the term skewness refers to the asymmetry of that
distribution. A distribution with an asymmetric tail extending out to the right is referred to as
“positively skewed” or “skewed to the right”, while a distribution with an asymmetric tail
extending out to the left is referred to as “negatively skewed” or “skewed to the left”. The
range of skewness is from minus infinity (−∞) to positive infinity (+∞). In simple words,
skewness is a measure of asymmetry, that is, of the lack of
symmetry.
Karl Pearson (1857-1936) first suggested measuring skewness by standardizing the difference
between the mean and the mode, such that

$$\text{skewness} = \frac{\mu - \text{mode}}{\text{standard deviation}}$$

Since population modes are not well estimated from sample modes, it was
suggested that one can estimate the difference between the mean and the mode as being three
times the difference between the mean and the median. Therefore, the estimate of skewness
will be:

$$\text{skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$
It is important for researchers from the behavioral and business sciences to measure skewness
when it appears in their data. Great amount of skewness may motivate the researcher to
investigate the existence of outliers. When making decisions about which measure of location
to report and which inferential statistic to employ, one should take into consideration the
estimated skewness of the population. Normal distributions have zero skewness.
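
As an illustration, the median-based estimate of skewness can be applied to the factory salaries used earlier in these notes. This is a sketch only: it uses the divide-by-n standard deviation (`statistics.pstdev`), which is an assumption about which form of the standard deviation is intended.

```python
import statistics

salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
sd = statistics.pstdev(salaries)          # divides by n; an assumption of this sketch

skewness = 3 * (mean - median) / sd       # 3(mean - median) / standard deviation

print(round(skewness, 2))
```

The result is positive, which agrees with the long right-hand tail produced by the two large salaries.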

Shape of the Distribution: Symmetry and Skewness
Skewness is the degree of asymmetry or departure from symmetry of the distribution of a real
valued random variable

It is important to get a sense of the symmetry or skewness of the data to see whether
the distribution is fairly normal or balanced, or whether it is skewed to the left or right. The
skewness (depending on whether it is to the left or right) will give us some idea
of whether there are a few extremely large values or a few extremely small values in
our data.

That will also help us decide whether to use just the mean as a summary measure
or whether it might be better to report the median as well. We will learn how to identify symmetry
and skewness simply by looking at the general shape of the distribution and from
numerical summary measures such as the mean and median.

Below are histograms of particular data. Recall from the earlier sections that
histograms are great for showing the shape of a distribution.

SYMMETRIC DATA (MEAN = MEDIAN)

In a symmetric distribution, the value of the mean is equal to the median.

SKEWED TO THE LEFT (MEAN < MEDIAN) (-VE SKEW)

In a distribution which is skewed to the left, the value of the mean is less than the
median. Note the skewness is in the direction of the long tail (which is on the left side
in this case, thus it is skewed to the left). The small values tend to pull the mean to
the left so it is a little lower than the median.

SKEWED TO THE RIGHT (MEAN > MEDIAN) (+VE SKEW)

In a distribution which is skewed to the right, the value of the mean is greater than the
median. Again, the skewness is in the direction of the long tail (which is on the right
side in this case, thus it is skewed to the right). The large values tend to pull the mean
to the right so it is a little larger than the median.

2.Measures of variability
The measures of central tendency are not adequate to describe data. Two data sets can have
the same mean but they can be entirely different. Thus to describe data, one needs to know
the extent of variability. This is given by the measures of dispersion. Range, interquartile
range, and standard deviation are the three commonly used measures of dispersion.

Range

Range is the difference between the largest and smallest observation in the dataset. The
disadvantage of this measure is that it is based on only two of the observations and may not
be representative of the whole dataset, particularly if there are outliers. In addition, it gives no
information regarding how the data are distributed between the two extremes.

Range = (Largest measurement) - (smallest measurement)

It depends on only two measurements

The prime advantage of this measure of dispersion is that it is easy to calculate. On the other
hand, it has lot of disadvantages. It is very sensitive to outliers and does not use all the
observations in a data set. It is more informative to provide the minimum and the maximum
values rather than providing the range.

Interquartile range

Interquartile range is defined as the difference between the 75th and 25th percentiles (also called
the third and first quartiles), i.e. Q3 - Q1. Hence the interquartile range describes the middle 50%
of observations. If the interquartile range is large it means that the middle 50% of
observations are spaced wide apart.

The important advantage of the interquartile range is that it can be used as a measure of
variability even if the extreme values are not recorded exactly. It is also not affected by
extreme values. The main disadvantage of using the interquartile range as a measure of
dispersion is that it is not amenable to mathematical manipulation (it does not lend itself to
further algebraic treatment).

Like the median, the interquartile range is not influenced by unusually high or low values and
may be particularly useful when data are not symmetrically distributed. Ranges based on
alternative subdivisions of the data can also be calculated; for example, if the data are split
into deciles, 80% of the data will lie between the bottom and top deciles and so on.

Less sensitive to extreme values

Need fairly large numbers of observations

Quartile deviation (semi-interquartile range)

$$QD = \frac{Q_3 - Q_1}{2}$$

1st quartile (Q1) or 25th percentile

2nd quartile (Q2) or 50th percentile

3rd quartile (Q3) or 75th percentile
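
The quartiles, interquartile range and quartile deviation can be sketched with Python's standard `statistics.quantiles`, here applied to the eleven marks from the median example. Several conventions exist for locating quartiles in a small data set; this sketch uses the function's default "exclusive" method, which is an assumption, and other textbooks or software may give slightly different values.

```python
import statistics

marks = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(marks, n=4)

iqr = q3 - q1        # interquartile range, Q3 - Q1
qd = iqr / 2         # quartile deviation (semi-interquartile range)

print(q1, q2, q3, iqr, qd)
```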

The mean deviation

-for ungrouped data

$$MD = \frac{\sum |x - \bar{x}|}{n}$$

-for grouped data

$$MD = \frac{\sum f|x - \bar{x}|}{\sum f}$$
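
Both forms of the mean deviation can be sketched as follows, using data that appear elsewhere in these notes: the ten values 60, 72, ..., 68 for the ungrouped case and the heights table (midpoints and frequencies) for the grouped case. Variable names are illustrative.

```python
# Ungrouped data
data = [60, 72, 61, 66, 63, 66, 59, 64, 71, 68]
mean = sum(data) / len(data)
md_ungrouped = sum(abs(x - mean) for x in data) / len(data)

# Grouped data: class midpoints x with frequencies f (heights of boys)
midpoints = [150, 160, 170, 180, 190]
frequencies = [3, 9, 21, 13, 4]
sum_f = sum(frequencies)
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / sum_f
md_grouped = sum(f * abs(x - grouped_mean)
                 for f, x in zip(frequencies, midpoints)) / sum_f

print(md_ungrouped, round(md_grouped, 3))
```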

Standard deviation

The standard deviation is a measure of the degree to which individual observations in a
dataset deviate from the mean value. Broadly, it is the average deviation from the mean
across all observations. It is calculated by squaring the difference of each individual
observation from the mean (squared to remove any negative differences), adding them
together, dividing by the total number of observations, and taking the square root of the
result.

The standard deviation summarizes a great deal of information in one number and, like the
mean, has useful mathematical properties.
-it uses information from every observation

-Not robust to outliers

Algebraically the standard deviation for a set of n values {x1, x2, ..., xn} is written as follows:

$$SD = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}$$ for ungrouped data,

where $\bar{x}$ is the mean described above.

Example

Calculate the standard deviation for the data below

60, 72, 61, 66, 63, 66, 59, 64, 71, 68.
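
The steps described above (square each deviation from the mean, add, divide by n, take the square root) can be followed one at a time in code. This is a sketch of the divide-by-n formula given in these notes; `statistics.pstdev` from the standard library computes the same quantity.

```python
import math
import statistics

data = [60, 72, 61, 66, 63, 66, 59, 64, 71, 68]
n = len(data)

mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]
variance = sum(squared_deviations) / n     # divide by n, as in the formula above
sd = math.sqrt(variance)

print(mean, variance, round(sd, 3))
```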

-for grouped data

$$SD = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2}$$

Example

The heights of boys in class are measured to the nearest cm and the results are tabulated as
follows, calculate the standard deviation for the data

Height (cm)   Frequency (f)   Midpoint (x)   fx           x²f

145-154.9     3               150            450          67500
155-164.9     9               160            1440         230400
165-174.9     21              170            3570         606900
175-184.9     13              180            2340         421200
185-194.9     4               190            760          144400
              ∑f = 50                        ∑fx = 8560   ∑x²f = 1470400
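
The column totals can then be substituted into the grouped-data formula. The sketch below repeats that substitution in code, using the midpoints and frequencies from the table; variable names are illustrative.

```python
import math

midpoints = [150, 160, 170, 180, 190]   # x
frequencies = [3, 9, 21, 13, 4]         # f

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))

mean = sum_fx / sum_f
variance = sum_fx2 / sum_f - mean ** 2   # (sum of f x^2 / sum of f) minus mean squared
sd = math.sqrt(variance)

print(sum_fx2, round(variance, 2), round(sd, 2))
```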
Variance

Another measure of variability that may be encountered is the variance. This is simply the
square of the standard deviation:

Variance = S²

-for ungrouped data

$$\text{var} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}$$

-for grouped data

$$\text{var} = \frac{\sum fx^2}{\sum f} - \bar{x}^2$$

Variance is easy to use mathematically

The variance is not generally used in data description but is central to analysis of variance .

Normal distribution
- Symmetrical “bell-shaped” distribution
- Easiest to use mathematically
- Many variables are normally distributed
- Can be described by two numbers:
- Mean (measure of location)
- Standard deviation (measure of variation)

4 pages
ASA Notes
No ratings yet
ASA Notes
28 pages
Demography
No ratings yet
Demography
46 pages
Abnormal Audit Fee and Audit Quality
No ratings yet
Abnormal Audit Fee and Audit Quality
23 pages
An Introduction To Distribution-Free Statistical Methods: Douglas G. Bonett University of California, Santa Cruz
No ratings yet
An Introduction To Distribution-Free Statistical Methods: Douglas G. Bonett University of California, Santa Cruz
48 pages
Using Random Forests v4.0
No ratings yet
Using Random Forests v4.0
33 pages
5.1 Measures of Central Tendency - Docx Note
No ratings yet
5.1 Measures of Central Tendency - Docx Note
5 pages
Measures of Central Tendency or Measure of Location: Definition
No ratings yet
Measures of Central Tendency or Measure of Location: Definition
12 pages
Uji T
No ratings yet
Uji T
13 pages
Descriptive Statistics Task 50 Completed
No ratings yet
Descriptive Statistics Task 50 Completed
8 pages
W4xnx0o1 Recursive Median Filter
No ratings yet
W4xnx0o1 Recursive Median Filter
6 pages
Exercise Set 1
No ratings yet
Exercise Set 1
5 pages
Course Critique 5327
No ratings yet
Course Critique 5327
4 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Lecture Notes 2 Data Organization and Presentation

Uploaded by

Lecture Notes 2 Data Organization and Presentation

Uploaded by

NATIONAL DIPLOMA IN QUANTITY SURVEYING-575/15/TN/0

SUBJECT TITLE: STATISTICS

DATA ORGANIZATION AND PRESENTATION

Why do we summarise data?

How to design a table after collecting data?

3. Frequency distribution Table for the number of rooms in each of 40 houses

Pie chart for number of rooms in each of 40 houses
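The sector angles of such a pie chart follow from each class's share of the total frequency. A minimal sketch, assuming hypothetical frequencies (the actual counts for the 40 houses are not reproduced here):

```python
# Hypothetical frequency table: number of rooms -> number of houses.
# These counts are illustrative only; they sum to 40.
room_counts = {2: 3, 3: 8, 4: 12, 5: 10, 6: 5, 7: 2}


def pie_angles(freq_table):
    """Return the sector angle in degrees for each category."""
    total = sum(freq_table.values())
    return {category: f * 360 / total for category, f in freq_table.items()}


angles = pie_angles(room_counts)
for rooms, angle in angles.items():
    print(f"{rooms} rooms: {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.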

Cumulative frequency table for the number of rooms in each of 40 houses

Draw a Cumulative frequency curve
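The points plotted on a cumulative frequency curve are running totals of the frequencies. A minimal sketch, again assuming hypothetical frequencies for the 40 houses:

```python
from itertools import accumulate

# Hypothetical data: number of rooms and how many houses have that many rooms.
rooms = [2, 3, 4, 5, 6, 7]
frequencies = [3, 8, 12, 10, 5, 2]  # illustrative counts, total 40

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Each (rooms, cumulative frequency) pair is one point on the curve.
ogive_points = list(zip(rooms, cumulative))
for r, cf in ogive_points:
    print(f"{r} rooms or fewer: {cf} houses")
```

The last cumulative frequency must equal the total number of observations, here 40.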

1. Measures of Central Tendency

One main disadvantage of the mean: it is particularly susceptible to the influence of outliers.
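A small illustration of this sensitivity, using made-up values in which a single observation is replaced by an extreme one:

```python
from statistics import mean, median

values = [1, 2, 3, 4, 5]          # hypothetical data, no outlier
with_outlier = [1, 2, 3, 4, 100]  # same data with one extreme value

mean_plain = mean(values)
median_plain = median(values)
mean_outlier = mean(with_outlier)
median_outlier = median(with_outlier)

print("without outlier:", mean_plain, median_plain)
print("with outlier:   ", mean_outlier, median_outlier)
```

The single extreme value moves the mean from 3 to 22, while the median stays at 3.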

Calculate the mean for the data below

Mean for grouped data

Height (cm)   Frequency (f)   Midpoint (x)   fx
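The columns of this table lead to the grouped mean as the sum of fx divided by the sum of f. A minimal sketch, assuming hypothetical height classes and frequencies (the table's actual values are not reproduced here):

```python
# Hypothetical grouped data: ((lower, upper) class limits in cm, frequency).
height_classes = [
    ((150, 155), 4),
    ((155, 160), 7),
    ((160, 165), 15),
    ((165, 170), 10),
    ((170, 175), 4),
]


def grouped_mean(classes):
    """Mean of grouped data: sum(f * midpoint) / sum(f)."""
    total_f = sum(f for _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for (lower, upper), f in classes)
    return total_fx / total_f


print(grouped_mean(height_classes))
```

Because every observation is treated as if it sat at its class midpoint, the result is an estimate of the mean of the raw data.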

-However, it is not generally influenced by extreme values (outliers), and as a result it is

Example (ungrouped data)

We again rearrange that data into order of magnitude (smallest first):

$l_m$ = lower boundary of the median class

$c_m$ = width of the median class

$f_{m-1}$ = frequency of the class immediately below the median class

$f_m$ = frequency of the median class
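These symbols fit the usual interpolation formula for the median of grouped data. The sketch below assumes the common textbook form, median = l_m + ((n/2 - F) / f_m) * c_m, where F is taken to be the cumulative frequency of all classes below the median class; that reading of the "below" term, and the data, are assumptions.

```python
# Hypothetical grouped data: ((lower, upper) class boundaries, frequency).
height_classes = [
    ((150, 155), 4),
    ((155, 160), 7),
    ((160, 165), 15),
    ((165, 170), 10),
    ((170, 175), 4),
]


def grouped_median(classes):
    """Interpolated median: l_m + (n/2 - F) * c_m / f_m."""
    n = sum(f for _, f in classes)
    cumulative = 0  # F: cumulative frequency below the current class
    for (lower, upper), f in classes:
        if f > 0 and cumulative + f >= n / 2:
            # This is the median class: interpolate inside it.
            return lower + (n / 2 - cumulative) * (upper - lower) / f
        cumulative += f
    raise ValueError("classes must contain at least one observation")


print(grouped_median(height_classes))
```

Here n = 40, so the median class is 160 to 165 (cumulative frequency 11 below it, 15 within it), giving 160 + 9 * 5 / 15 = 163.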

i) 1, 2, 3, 4, 100: the mode does not exist (no value is repeated)
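A minimal sketch of this convention, where a data set in which every value occurs once is treated as having no mode. The helper name `modes` is illustrative; note that Python's built-in `statistics.mode` follows a different convention and returns the first value in that case.

```python
from collections import Counter


def modes(values):
    """Return the most frequent values, or [] if no value is repeated."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 3, 4, 100]))  # no repeated value
print(modes([1, 2, 2, 3]))       # one mode
print(modes([1, 1, 2, 2, 3]))    # two modes (bimodal)
```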

Summary of when to use the mean, median and mode

Type of Variable Best measure of central tendency

GM is an appropriate measure when values change exponentially and in case of skewed

Alternatively, the reciprocal of HM is the mean of reciprocals of individual observations.
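That statement translates directly into a calculation: take the mean of the reciprocals, then invert it. A minimal sketch with made-up values, checked against Python's `statistics.harmonic_mean`:

```python
from statistics import harmonic_mean


def hm(values):
    """Harmonic mean as the reciprocal of the mean of reciprocals."""
    mean_of_reciprocals = sum(1 / x for x in values) / len(values)
    return 1 / mean_of_reciprocals


speeds = [40, 60]  # hypothetical speeds over two equal distances
print(hm(speeds))
print(harmonic_mean(speeds))
```

For 40 and 60 the harmonic mean is 48, lower than the arithmetic mean of 50. All values must be positive for the calculation to be meaningful.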

Skewness: Measure of Asymmetry

Shape of the Distribution: Symmetry and

SYMMETRIC DATA (MEAN = MEDIAN)

In a symmetric distribution, the value of the mean is equal to the median.

Range = (largest measurement) - (smallest measurement)

It depends on only two measurements
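A short sketch showing why this matters: two made-up data sets with very different spreads in the middle can have the same range, because only the extremes enter the calculation.

```python
def data_range(values):
    """Range = largest measurement - smallest measurement."""
    return max(values) - min(values)


spread_out = [1, 2, 3, 4, 100]   # hypothetical data
clustered = [1, 50, 50, 50, 100]  # same extremes, different middle

print(data_range(spread_out))
print(data_range(clustered))
```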

The important advantage of interquartile range is that it can be used as a measure of

Less sensitive to extreme values

Need fairly large numbers of observations

Quartile deviation (semi-interquartile range)

1st quartile (Q1) or 25th percentile

2nd quartile (Q2) or 50th percentile

3rd quartile (Q3) or 75th percentile
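A minimal sketch of computing the three quartiles, the interquartile range and the quartile deviation for made-up data. Several conventions exist for locating quartiles; this example assumes the "inclusive" method of Python's `statistics.quantiles`, which may differ slightly from a hand method taught in class.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical ordered observations

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                 # interquartile range
quartile_deviation = iqr / 2  # half of the interquartile range

print(q1, q2, q3, iqr, quartile_deviation)
```

Q2 is the median, so it coincides with the middle value of the ordered data.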

The mean deviation

-for ungrouped data

-for grouped data
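Both cases can be sketched as the average absolute distance from the mean, with each class midpoint weighted by its frequency in the grouped case. The function names and data below are illustrative assumptions.

```python
def mean_deviation(values):
    """Ungrouped data: sum(|x - mean|) / n."""
    n = len(values)
    m = sum(values) / n
    return sum(abs(x - m) for x in values) / n


def grouped_mean_deviation(midpoints, frequencies):
    """Grouped data: sum(f * |x - mean|) / sum(f), with x the class midpoint."""
    n = sum(frequencies)
    m = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    return sum(f * abs(x - m) for x, f in zip(midpoints, frequencies)) / n


print(mean_deviation([2, 4, 6, 8, 10]))              # hypothetical raw data
print(grouped_mean_deviation([1, 3, 5], [2, 4, 2]))  # hypothetical classes
```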

The standard deviation is a measure of the degree to which individual observations in a

-Not robust to outliers

and is the mean described above.

Calculate the standard deviation for the data below

-for grouped data

Height (cm)   Frequency (f)   Midpoint (x)   fx   x²f
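The fx and x²f columns suggest the computational form of the grouped standard deviation, the square root of (sum of x²f / sum of f) minus the squared mean. The sketch below assumes that population form (dividing by the sum of f rather than by one less) and uses hypothetical classes and frequencies.

```python
from math import sqrt

# Hypothetical grouped data: ((lower, upper) class limits in cm, frequency).
height_classes = [
    ((150, 155), 4),
    ((155, 160), 7),
    ((160, 165), 15),
    ((165, 170), 10),
    ((170, 175), 4),
]


def grouped_variance(classes):
    """Population variance of grouped data from the f, fx and x^2 f columns."""
    sum_f = sum_fx = sum_x2f = 0.0
    for (lower, upper), f in classes:
        x = (lower + upper) / 2  # class midpoint
        sum_f += f
        sum_fx += f * x
        sum_x2f += f * x * x
    mean = sum_fx / sum_f
    return sum_x2f / sum_f - mean ** 2


variance = grouped_variance(height_classes)
std_dev = sqrt(variance)
print(variance, std_dev)
```

If a sample standard deviation is wanted instead, the divisor changes and the result is slightly larger.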

-for ungrouped data

-for grouped data

Variance is easy to use mathematically
