AEB801 20222023-Lecture 03-1
Data Presentation
11/11/2024 1
DATA PRESENTATION
Data can be presented in various forms of tables and graphs.
1. Frequency Distribution Table
It is a table showing the distribution of the total number of observations among the various categories.
Advantage
• Data are presented in a more manageable and comprehensible form.
Examples of frequency distributions of discrete variables
(A) Colour choice of 60 students in a class.
(B) Scores of 20 students in a class, grouped into class intervals.
Colour Number
White 12
Pink 20
Blue 12
Green 4
Yellow 12
Total 60
Continuous variables are more likely to be presented in class intervals.
Example of a frequency distribution table for a continuous variable
Continuous variable data on the weight (kg) of 30 adult males aged 25-35 years are given below.
55, 78, 61, 61, 76, 70, 72, 58, 53, 67, 56, 68, 64, 78, 77, 76, 65, 53, 48, 57, 61, 68, 74, 58, 53, 48,
69, 71, 69, 57
Note: The class interval “45-50” includes individual values from exactly 45 up to 49.99, and “50-55” includes individual values from exactly 50 up to 54.99.
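The grouping rule in the note can be sketched in Python. This is an illustration and not the lecture's own procedure. The sketch assumes classes 5 kg wide starting at 45 kg, to match the “45-50” and “50-55” intervals in the note. The helper name `frequency_table` is hypothetical.

```python
# Weights (kg) of 30 adult males aged 25-35 years, copied from the list above
weights = [55, 78, 61, 61, 76, 70, 72, 58, 53, 67, 56, 68, 64, 78, 77,
           76, 65, 53, 48, 57, 61, 68, 74, 58, 53, 48, 69, 71, 69, 57]


def frequency_table(values, start, width, n_classes):
    """Hypothetical helper: group values into [start, start + width), ...

    A value equal to an upper endpoint (e.g. 50) goes into the NEXT class,
    so "45-50" holds 45 up to 49.99, as in the note.
    Values outside the covered range are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)  # position of the class holding v
        if 0 <= index < n_classes:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


table = frequency_table(weights, start=45, width=5, n_classes=7)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
print("Total:", sum(count for _, _, count in table))
```

For these weights, the sketch gives these frequencies:

- 45-50: 2
- 50-55: 3
- 55-60: 6
- 60-65: 4
- 65-70: 6
- 70-75: 4
- 75-80: 5

The total is 30, which matches the number of observations.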
Rules for data sets that contain a large number of observations
4. The endpoints of a class interval are the lowest and highest values
that a variable in that class can take.
Cumulative frequency table showing Relative frequency
Cumulative frequency distribution table
A cumulative frequency distribution table is a more detailed table. It is almost the same as a
frequency distribution table, but it has added columns that give the:
• Cumulative Frequency
• Cumulative Percentage of the results (cumulative frequency ÷ total number of results × 100).
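The Python sketch below applies these columns to the 30 weights listed earlier. It is an illustration, not the lecture's own table. The sketch assumes 5 kg classes from 45 kg to 80 kg, with the lower endpoint included and the upper endpoint excluded. Each row gives the frequency, the relative frequency, the running (cumulative) frequency and the cumulative percentage.

```python
# Weights (kg) of the 30 adult males listed earlier
weights = [55, 78, 61, 61, 76, 70, 72, 58, 53, 67, 56, 68, 64, 78, 77,
           76, 65, 53, 48, 57, 61, 68, 74, 58, 53, 48, 69, 71, 69, 57]
START, WIDTH, N_CLASSES = 45, 5, 7  # assumed classes 45-50, 50-55, ..., 75-80

# Frequency of each class (lower endpoint inclusive, upper endpoint exclusive)
counts = [0] * N_CLASSES
for w in weights:
    counts[(w - START) // WIDTH] += 1

total = len(weights)
rows = []
cumulative = 0
for i, f in enumerate(counts):
    cumulative += f  # running total of frequencies up to this class
    lower = START + i * WIDTH
    rows.append((f"{lower}-{lower + WIDTH}",
                 f,                                    # frequency
                 round(100 * f / total, 2),            # relative frequency (%)
                 cumulative,                           # cumulative frequency
                 round(100 * cumulative / total, 2)))  # cumulative percentage

print("Class  Freq  Rel.%  Cum.freq  Cum.%")
for row in rows:
    print("{:<6} {:>4} {:>6} {:>9} {:>6}".format(*row))
```

The cumulative frequency of the last class always equals the total (30), so its cumulative percentage is always 100%.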
Graphical presentation
A graph is a pictorial representation of the relationship between variables.
• Graphs help to simplify, clarify and beautify data that would otherwise be clumsy
and confusing to understand.
• Graphs may be designed for nominal, ordinal, interval and ratio data.
• There are several types of graphs used in data presentation; however, the type used
depends on the nature of the data involved and the purpose for which the graph is intended.
Types of Graphs
Line
Bar
Histogram
Pie chart
How Do I Choose Which Type of Graph to Use?
Line graph (simple or multiple)
• Line graphs are used to track changes over short and long periods of time.
• When smaller changes exist, line graphs are better to use than bar graphs.
• Line graphs can also be used to compare changes over the same period of time.
Examples of line graphs
[Figure: example line graph of pH (y-axis) against month, March to August (x-axis).]
Bar chart (simple or multiple)
It is basically used to represent nominal or ordinal data.
Bar charts are a commonly used and clear way of presenting categorical data or any ungrouped discrete
frequency observations.
Bar charts provide a simple method of quickly spotting simple patterns of popularity within a discrete
data set.
• Bar graphs are used to compare things between different groups or to track changes over time.
• However, when trying to measure change over time, bar graphs are best when the changes are larger.
• By convention, the variable being measured goes on the horizontal axis (x-axis) and the frequency goes on the
vertical axis (y-axis).
Bar Charts
Histogram
It is the graph most commonly used in representing continuous data of an interval
or ratio scale.
• It is basically a bar graph in which the bars are connected to reflect the continuity
of relevant data.
A histogram differs from a bar chart in two critical respects:
• The horizontal axis (x-axis) is a continuous scale. As a result, there are no
gaps between the bars (unless there are no observations within a class
interval).
• The area of each rectangle is proportional to the frequency.
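A small Python sketch can make the second point concrete. It is an illustration, not the lecture's own method. If each bar's height is taken as the frequency density (frequency ÷ class width), then each bar's area equals its frequency, even when class widths differ. The frequencies below come from counting the 30 weights listed earlier into 5 kg classes. The helper name `histogram_bars` is hypothetical.

```python
# Class frequencies obtained by counting the 30 weights (kg) listed earlier
# into 5 kg classes; each tuple is (lower endpoint, upper endpoint, frequency).
classes = [(45, 50, 2), (50, 55, 3), (55, 60, 6), (60, 65, 4),
           (65, 70, 6), (70, 75, 4), (75, 80, 5)]


def histogram_bars(classes):
    """Hypothetical helper: return (lower, upper, height), height = frequency / width.

    Each bar's area (height * width) then equals its class frequency, and
    because neighbouring classes share an endpoint, the bars touch (no gaps).
    """
    return [(lo, hi, f / (hi - lo)) for lo, hi, f in classes]


for lo, hi, height in histogram_bars(classes):
    print(f"{lo}-{hi} kg: height {height:.1f}, area {height * (hi - lo):.0f}")
```

With equal widths, as here, the heights are simply the frequencies divided by 5. The density form matters when some classes are wider than others.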
Histograms
Use of Histograms as a tool in data analysis.
• It is easy to spot the modal or most popular class in the data, i.e. the one with the highest frequency.
• Histograms allow us to make early judgements as to whether all our data come from the
same population.
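Finding the modal class can be sketched in a few lines of Python. This is an illustration under the assumption of equal class widths, where the modal class is simply the tallest bar. The frequencies are counts of the 30 weights listed earlier in 5 kg classes. The helper name `modal_classes` is hypothetical.

```python
# (lower, upper, frequency) for the 30 weights (kg) listed earlier, 5 kg classes
classes = [(45, 50, 2), (50, 55, 3), (55, 60, 6), (60, 65, 4),
           (65, 70, 6), (70, 75, 4), (75, 80, 5)]


def modal_classes(classes):
    """Hypothetical helper: return every class whose frequency is the highest.

    With equal class widths these are the tallest bars of the histogram.
    A list is returned because ties (more than one peak) are possible.
    """
    top = max(f for _, _, f in classes)
    return [(lo, hi) for lo, hi, f in classes if f == top]


print(modal_classes(classes))  # [(55, 60), (65, 70)]
```

For these weights, two classes tie at a frequency of 6, so the histogram has two peaks. A pattern like this is the kind of early signal that might prompt a check on whether the sample mixes more than one population.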
Pie chart
It is most appropriate for nominal and ordinal data. Pie charts are simple diagrams for
They are best used when there are only a handful of categories to display.
A pie chart consists of a circle divided into segments, one segment for each category.
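Each segment's angle is the category's share of the total multiplied by 360°. The minimal Python sketch below applies this to the colour-choice table of 60 students from the start of the lecture. The helper name `pie_segments` is hypothetical.

```python
# Colour choice of 60 students, from the frequency table at the start of the lecture
colour_counts = {"White": 12, "Pink": 20, "Blue": 12, "Green": 4, "Yellow": 12}


def pie_segments(counts):
    """Hypothetical helper: map each category to (angle in degrees, percentage)."""
    total = sum(counts.values())
    return {name: (360 * n / total, 100 * n / total) for name, n in counts.items()}


for name, (angle, pct) in pie_segments(colour_counts).items():
    print(f"{name}: {angle:.0f} degrees ({pct:.1f}%)")
```

The sketch gives these segments:

- White: 72° (20.0%)
- Pink: 120° (33.3%)
- Blue: 72° (20.0%)
- Green: 24° (6.7%)
- Yellow: 72° (20.0%)

The angles add up to a full 360°.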
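[Figure: example pie chart of the percentage composition of taxonomic groups: Chordata 3%, Sipuncula 2%, Nemertea 2%, Echinodermata 4%, Mollusca: Bivalvia 12%, Crustacea 27%, Polychaeta 19%, with Echiurida and Mollusca: Gastropoda (4% and 27%).]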
Measures of Central Tendency or Measure of Location
They are a group of statistical techniques that measure the typical (central) value of a
distribution of data:
• Mean
• Mode
• Median
Arithmetic mean
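It is the most commonly used and most powerful measure of central tendency, but it also has the most
assumptions:
it is applied only to ratio and interval scale data.
In addition, the data should be normally distributed or,
at least, not highly skewed.
The population mean (µ, the Greek letter mu) is calculated as:
$\mu = \frac{\sum x}{N}$
where N is the number of measurements in the population. The sample mean is calculated in the same way from the n measurements in the sample: $\bar{X} = \frac{\sum x}{n}$.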
Properties of means
The mean is very sensitive to extreme scores/values. Therefore, when a distribution contains
extremely high or low scores, the mean should not be used to express the average.
For example:
S/N   Distribution A   Distribution B
1     30               30
2     40               40
3     25               25
4     44               44
5     36               36
6     98               47
∑     273              222
X̄     45.50            37
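The effect in the table can be checked with a short Python sketch. The two distributions differ only in item 6 (98 against 47). That one extreme value raises the mean by 8.5, while the median (the middle value, computed here with Python's standard `statistics.median`) does not move at all.

```python
from statistics import median

# Distributions A and B from the table above; they differ only in item 6 (98 vs 47)
a = [30, 40, 25, 44, 36, 98]
b = [30, 40, 25, 44, 36, 47]

mean_a = sum(a) / len(a)  # 273 / 6
mean_b = sum(b) / len(b)  # 222 / 6
print(mean_a, mean_b)          # 45.5 37.0
print(median(a), median(b))    # 38.0 38.0
```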
Median
It is the middle measurement in a set of data arranged in an array (decreasing or increasing order of
magnitude).
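The definition above does not say what to do when the number of measurements is even. The minimal Python sketch below assumes the common convention of averaging the two middle values, which is also what Python's `statistics.median` does. The helper name `median_of` is hypothetical.

```python
def median_of(values):
    """Hypothetical helper: middle value of the sorted data.

    For an even number of values, the mean of the two middle values is
    returned (an assumed convention, see the lead-in).
    """
    ordered = sorted(values)  # arrange in increasing order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]))  # 1.8 (odd n: 4th value)
print(median_of([30, 40, 25, 44, 36, 98]))             # 38.0 (even n: (36 + 40) / 2)
```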
Disadvantage
1. The median expresses less information than the mean, for it does not take into account the actual values of each measurement.
Advantage
1. Extremely high or low values do not affect the median as much as they affect the mean. Thus, when dealing
with skewed populations, it is preferable to use the median rather than the mean to express central tendency.
2. It is not necessary to have data for all members of the sample to calculate the median. For example, if the values
of the first few data in the ordered array are missing, the median can still be determined but the mean cannot.
3. The median can be used for interval and ratio data, and also for ordinal data, for which the use of the mean is not considered
appropriate.
Mode
of great concentration, for some frequency distributions may have more than one
Example: 3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 4.0, 4.0, 4.0, 4.0, 4.0, 4.1,
Mode = 4.0
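The mode can be found by counting how often each value occurs. The example list on the slide appears to end mid-list, so the Python sketch below uses only the values that are printed. The sketch returns a list because a distribution may have more than one mode; the `[1, 1, 2, 2, 3]` case is a made-up bimodal illustration. The helper name `modes` is hypothetical.

```python
from collections import Counter

# Only the values printed in the example above (the slide's list appears cut off)
values = [3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9,
          4.0, 4.0, 4.0, 4.0, 4.0, 4.1]


def modes(data):
    """Hypothetical helper: return every value occurring with the highest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes(values))           # [4.0]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (made-up bimodal example)
```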
Advantage
1. In a symmetrical unimodal population, the mode is an unbiased and consistent estimate of the mean and median.
3. The mode may be used for data on nominal, ordinal, interval and ratio scales.
Disadvantage
2. The mode is not often used in biological research, though it can be of interest to report the number of modes when a distribution has more than one.
Measures of dispersion and variability
This is an indication of the clustering of measurements around the centre of the distribution OR an
indication of how variable the measurements are.
• Range
• Mean Deviation
• Variance
• Standard Deviation
Range
This is the difference between the highest and lowest measurements in a group of data.
Disadvantage
1. The range is a relatively crude measure of dispersion, since it does not take into account other
measurements except the highest and the lowest.
2. Because it is unlikely that a sample will contain both the highest and lowest values in the population, the sample
range usually underestimates the population range. It is therefore a biased and inefficient estimator.
In spite of these shortcomings, the range is still useful in some circumstances as an estimate of the population range. It
should, however, be given along with another measure of dispersion.
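As a minimal Python sketch, the range applied to the 30 weights listed earlier is 78 − 48 = 30 kg. Every value other than the maximum and the minimum is ignored, which is exactly the shortcoming described in Disadvantage 1. The helper name `data_range` is hypothetical.

```python
# Weights (kg) of the 30 adult males listed earlier
weights = [55, 78, 61, 61, 76, 70, 72, 58, 53, 67, 56, 68, 64, 78, 77,
           76, 65, 53, 48, 57, 61, 68, 74, 58, 53, 48, 69, 71, 69, 57]


def data_range(values):
    """Hypothetical helper: highest minus lowest measurement; all other values are ignored."""
    return max(values) - min(values)


print(max(weights), min(weights), data_range(weights))  # 78 48 30
```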
Mean Deviation
It is an indication of how clustered or dispersed from the mean the measurements are.
Example: Find the mean deviation of the set of numbers in grams: 1.2, 1.4, 1.6, 1.8, 2.0, 2.2 and 2.4.
S/N   x      x – X̄   |x – X̄|
1     1.2    -0.6     0.6
2     1.4    -0.4     0.4
3     1.6    -0.2     0.2
4     1.8     0.0     0.0
5     2.0     0.2     0.2
6     2.2     0.4     0.4
7     2.4     0.6     0.6
∑     12.6    0       2.4
The sum of all deviations from the mean, i.e. $\sum (x - \bar{X})$, will always equal zero.
So the sum of the absolute values of the deviations from the mean, $\sum |x - \bar{X}|$, results in a quantity that
is an expression of dispersion about the mean.
• Dividing this quantity by n gives a measure known as the mean deviation, or mean
absolute deviation, of the sample:
$\text{Mean deviation} = \frac{\sum |x - \bar{X}|}{n} = \frac{2.4}{7} = 0.343\ \text{g}$, since n = 7
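The calculation in the table can be reproduced with a minimal Python sketch. The raw deviations sum to zero, apart from floating-point rounding, and that is why their absolute values are used instead.

```python
data = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]  # grams
n = len(data)
x_bar = sum(data) / n                     # 12.6 / 7 = 1.8 (up to float rounding)
deviations = [x - x_bar for x in data]    # x - mean; these sum to (almost exactly) zero
mean_deviation = sum(abs(d) for d in deviations) / n  # sum of |x - mean|, divided by n

print(round(mean_deviation, 3))           # 0.343 (grams)
```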
Variance
To eliminate the negative signs of the deviations from the means, the deviations are squared.
The sum of the squares of the deviations from the mean is called the sum of squares (SS):
$\text{Sample } SS = \sum (x - \bar{X})^2$
The mean sum of squares is called the variance (or mean square, the latter being short for mean
squared deviation).
For a population it is denoted by σ² (sigma squared, using the lower-case Greek letter sigma):
$\text{Population variance } \sigma^2 = \frac{\sum (x - \mu)^2}{N}$
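$\text{Sample variance } S^2 = \frac{\sum (x - \bar{X})^2}{n - 1}$
It is necessary to divide the sample sum of squares (SS) by n – 1, called the degrees of freedom, rather than by n.
This compensates for the small size of the sample compared with the entire population, so that S² is an unbiased estimate of the population variance.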
Example: Find the variance of the set of numbers in grams: 1.2, 1.4, 1.6, 1.8, 2.0, 2.2 and 2.4.
Note: Calculated Mean = 1.8
S/N   x      x – X̄   (x – X̄)²
1     1.2    -0.6     0.36
2     1.4    -0.4     0.16
3     1.6    -0.2     0.04
4     1.8     0.0     0.00
5     2.0     0.2     0.04
6     2.2     0.4     0.16
7     2.4     0.6     0.36
∑     12.6    0       1.12
Therefore, $S^2 = \frac{\sum (x - \bar{X})^2}{n - 1} = \frac{1.12}{6} = 0.18667\ \text{g}^2$
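The worked example can be reproduced with a short Python sketch. The sketch computes SS and divides it by the degrees of freedom, n – 1. It then compares the result with Python's standard `statistics.variance`, which also divides by n – 1, and `statistics.pvariance`, which divides by n and so treats the data as a whole population.

```python
import statistics

data = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]  # grams
n = len(data)
x_bar = sum(data) / n                        # sample mean, about 1.8 g
ss = sum((x - x_bar) ** 2 for x in data)     # sum of squares, SS
sample_var = ss / (n - 1)                    # divide by degrees of freedom

print(round(ss, 2), round(sample_var, 5))    # 1.12 0.18667
# statistics.variance divides by n - 1; statistics.pvariance divides by n
print(round(statistics.variance(data), 5), round(statistics.pvariance(data), 5))  # 0.18667 0.16
```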
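Using the machine formula
To make the calculation of variance (S²) easier when handling large samples, an alternative
method, known as the working formula or machine formula, is applied:
$S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Its numerator, $\sum x^2 - \frac{(\sum x)^2}{n}$, is equivalent to $\sum (x - \bar{X})^2$.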
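Using the machine formula, find the variance of the set of numbers in grams (1.2, 1.4, 1.6, 1.8, 2.0, 2.2 and 2.4):
∑x² = 23.8; ∑x = 12.6; n = 7
$S^2 = \frac{23.8 - \frac{(12.6)^2}{7}}{7 - 1} = \frac{23.8 - 22.68}{6} = 0.18667\ \text{g}^2$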
Variance has squared units. If measurements are in grams their variance will be in
grams squared.
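The machine formula needs only ∑x, ∑x² and n, so no deviations have to be computed. The Python sketch below computes the variance both ways for the example set. This is an illustration, not a prescribed routine.

```python
data = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]  # grams
n = len(data)
sum_x = sum(data)                    # sum of x
sum_x2 = sum(x * x for x in data)    # sum of x squared

# Machine (working) formula: no deviations from the mean are needed
machine_var = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definitional formula, for comparison
x_bar = sum_x / n
definitional_var = sum((x - x_bar) ** 2 for x in data) / (n - 1)

print(round(sum_x2, 2), round(sum_x, 2), n)               # 23.8 12.6 7
print(round(machine_var, 5), round(definitional_var, 5))  # 0.18667 0.18667
```

One caveat comes from general floating-point behaviour rather than from the lecture. When the values are large and close together, subtracting (∑x)²/n from ∑x² can lose precision on a computer, so statistical software often prefers the deviation-based form.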
Standard Deviation
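It is the positive square root of the variance; therefore it has the same units as the original
measurements:
$S = \sqrt{S^2}$
OR, using the machine formula,
$S = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$
Example: Using the machine formula, find the standard deviation of the set of numbers in grams (1.2, 1.4, 1.6, 1.8, 2.0, 2.2 and 2.4):
$S = \sqrt{0.18667} = 0.432\ \text{g}$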
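As a minimal Python sketch, the standard deviation is the positive square root of the machine-formula variance. The result is in grams, the same units as the data.

```python
import math

data = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]  # grams
n = len(data)

# Machine formula for the sample variance, then its positive square root
variance = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)
sd = math.sqrt(variance)
print(round(sd, 3))  # 0.432 (grams, same units as the data)
```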
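Standard Error (or Standard Deviation of the Mean)
It indicates how close sample means are likely to be to the population mean. It is
expressed as
$S_{\bar{X}} = \frac{S}{\sqrt{n}}$
Example 6: Using the machine formula, find the standard error of the set of numbers
in grams (1.2, 1.4, 1.6, 1.8, 2.0, 2.2 and 2.4):
$S_{\bar{X}} = \frac{0.432}{\sqrt{7}} = 0.163\ \text{g}$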
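The standard error is conventionally computed as the sample standard deviation divided by √n. The minimal Python sketch below does this for the example set. Because of the √n in the denominator, the standard error shrinks as the sample grows, even if the spread of the data stays the same.

```python
import math

data = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]  # grams
n = len(data)
x_bar = sum(data) / n
sd = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))  # sample SD
se = sd / math.sqrt(n)  # standard error of the mean
print(round(se, 3))  # 0.163
```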
Coefficient of Variation
It is expressed as
$CV = \frac{SD}{\bar{X}}$ OR $CV = \frac{SD}{\bar{X}} \times 100\%$
For the example data, SD = 0.432 and X̄ = 1.8, so
$CV = \frac{0.432}{1.8} = 0.24$ or 24%
Since SD and X̄ have identical units, CV has no unit at all, a fact which emphasizes that it is a relative measure
divorced from the actual magnitude or units of measurement of the data.
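As a minimal Python sketch using the standard `statistics` module, the CV divides the sample standard deviation by the mean. Because the grams cancel, the result is a pure number.

```python
import statistics

data = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]  # grams
x_bar = statistics.mean(data)   # 1.8 g
sd = statistics.stdev(data)     # sample SD, about 0.432 g
cv = sd / x_bar                 # grams / grams, so no unit
print(f"CV = {cv:.2f} or {cv * 100:.0f}%")  # CV = 0.24 or 24%
```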
To describe the population that one has sampled, the following sample statistics must be reported as a
summary of the data collected:
2. The range might also be reported, along with other measures of variability, e.g. the standard deviation (SD).
3. If the intention is to provide a statement about the precision of estimation of the population mean, the use
of the standard error (SX̄) is appropriate.
5. Clearly state in the caption the measure of variability used, i.e. SD or SX̄. There is, however, no widely
accepted convention (the alternatives are: ±SD, ±SX̄, ±95%, ±99%).
6. The units of measurement must be clear, e.g. cm/s, mg/l, ppm.