
AEB801 20222023-Lecture 03-1

The document covers data presentation techniques, including frequency distribution tables for both categorical and continuous variables, and various graphical representations such as line graphs, bar charts, histograms, and pie charts. It also discusses measures of central tendency (mean, median, mode) and measures of dispersion (range, mean deviation, variance, standard deviation). The content emphasizes the importance of choosing appropriate methods for data analysis and presentation based on the nature of the data.

Uploaded by

Nimi Elisha
Copyright
© © All Rights Reserved

LECTURE 3

Data Presentation

Measures of Central Tendency or Measure of Location

Measures of dispersion and variability

11/11/2024 1
DATA PRESENTATION
Data can be presented in various forms of tables and graphs.
1. Frequency Distribution Table
It is a table showing how the total number of observations is distributed among the various categories. It gives:

• a list of the observed values

• how many times (the frequency) each value was observed

• It can be used for both categorical and numeric variables.

• Continuous variables should only be presented using class intervals.

Advantage
• Data are presented in a more manageable and comprehensible form.

Examples of frequency distributions of discrete variables
(A). Colour choice of 60 students in a class.
(B). Scores of 20 students in a class, grouped into class intervals.
Colour Number
White 12
Pink 20
Blue 12
Green 4
Yellow 12
Total 60

Continuous variables are more likely to be presented in class intervals.
Example of a frequency distribution table for a continuous variable
The weights (kg) of 30 adult males aged 25-35 years are given below.
55, 78, 61, 61, 76, 70, 72, 58, 53, 67, 56, 68, 64, 78, 77, 76, 65, 53, 48, 57, 61, 68, 74, 58, 53, 48, 69, 71, 69, 57

Class interval   Frequency
45-50            2
50-55            3
55-60            6
60-65            4
65-70            6
70-75            4
75-80            5
Total            30

Note: The class interval “45-50” includes values from 45 up to, but not including, 50 (i.e. 45 to 49.99), and “50-55” includes values from 50 up to, but not including, 55 (i.e. 50 to 54.99).
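The grouping above can be reproduced with a short script. This is only a sketch: the function name frequency_table and its parameters are illustrative choices, not part of the lecture. It treats each class as including its lower limit and excluding its upper limit, as described in the note.

```python
# Weights (kg) of the 30 adult males listed in the example.
weights = [55, 78, 61, 61, 76, 70, 72, 58, 53, 67, 56, 68, 64, 78, 77,
           76, 65, 53, 48, 57, 61, 68, 74, 58, 53, 48, 69, 71, 69, 57]


def frequency_table(values, start, stop, width):
    """Count values in classes that include the lower limit but not the upper."""
    table = []
    for lower in range(start, stop, width):
        upper = lower + width
        count = sum(1 for v in values if lower <= v < upper)
        table.append((f"{lower}-{upper}", count))
    return table


# Class limits 45, 50, ..., 80 with a width of 5, as in the lecture table.
table = frequency_table(weights, start=45, stop=80, width=5)

for label, count in table:
    print(f"{label:<8}{count:>3}")
print(f"{'Total':<8}{sum(count for _, count in table):>3}")
```

Because the classes are mutually exclusive, every weight is counted exactly once and the frequencies add up to the number of observations.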

Rules for data sets that contain a large number of observations

1. Find the lowest and highest values of the variable.

2. Decide on the width of the class intervals.

3. Make sure that the class intervals are mutually exclusive, so that each observation falls into exactly one class.

4. The endpoints of a class interval are the lowest and highest values that the variable can take within that class.

Cumulative frequency table showing relative frequency

Class interval   Frequency   Cum. Freq.   Rel. Freq.   % Freq.   Cum. Rel. Freq.   Cum. %
45-50            2           2            0.07         7         0.07              7
50-55            3           5            0.10         10        0.17              17
55-60            6           11           0.20         20        0.37              37
60-65            4           15           0.13         13        0.50              50
65-70            6           21           0.20         20        0.70              70
70-75            4           25           0.13         13        0.83              83
75-80            5           30           0.17         17        1.00              100
Total            30                       1.00         100

Cumulative frequency distribution table

A cumulative frequency distribution table is a more detailed table. It is almost the same as a frequency distribution table, but it has added columns that give the

• Cumulative Frequency (the running total of the frequencies)

• Relative Frequency = (frequency ÷ number of observations)

• Percentage Frequency (each relative frequency value multiplied by 100)

• Cumulative Relative Frequency

• Cumulative Percentage of the results (cumulative frequency ÷ total number of results, multiplied by 100).
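The added columns can be computed directly from the class frequencies. The sketch below uses the frequencies from the weight example; the variable names are illustrative and the printed layout is only one possible arrangement.

```python
from itertools import accumulate

# Class intervals and frequencies from the weight example (30 adult males).
classes = ["45-50", "50-55", "55-60", "60-65", "65-70", "70-75", "75-80"]
freqs = [2, 3, 6, 4, 6, 4, 5]
n = sum(freqs)  # number of observations

cum_freqs = list(accumulate(freqs))         # running total of frequencies
rel_freqs = [f / n for f in freqs]          # frequency / number of observations
cum_rel_freqs = [c / n for c in cum_freqs]  # cumulative frequency / n

print("Class   Freq  Cum   Rel    %  CumRel  Cum%")
for label, f, c, r, cr in zip(classes, freqs, cum_freqs, rel_freqs, cum_rel_freqs):
    # Percentage columns are the relative columns multiplied by 100.
    print(f"{label:<7}{f:>5}{c:>5}{r:>6.2f}{r * 100:>5.0f}{cr:>8.2f}{cr * 100:>6.0f}")
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency always equals 1.00 (100%), which is a quick check on the arithmetic.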

Graphical presentation
A graph is a pictorial representation of the relationship between variables.
• Graphs help to simplify, clarify and beautify data that would otherwise be clumsy and confusing to understand.
• Graphs may be designed for nominal, ordinal, interval and ratio data.

• There are several types of graphs used in data presentation; the type used depends on the nature of the data involved and the purpose for which the graph is intended.

Types of Graphs
• Line
• Bar
• Histogram
• Pie chart

How Do I Choose Which Type of Graph to Use?

Line graph (simple or multiple)

• Line graphs are used to track changes over short and long periods of time.

• When smaller changes exist, line graphs are better to use than bar graphs.

• Line graphs can also be used to compare changes over the same period of time for more than one group.

Examples of line graphs

[Figure: line graph of pH (y-axis, 0 to 7.5) against month (x-axis, March to August)]

Bar chart (simple or multiple)
It is basically used to represent nominal or ordinal data.

• Bar charts are commonly used and are a clear way of presenting categorical data or any ungrouped discrete frequency observations.

• Bar charts provide a simple method of quickly spotting simple patterns of popularity within a discrete data set.

• Bar charts cannot be used to present continuous data.

• Bar graphs are used to compare things between different groups or to track changes over time.

• However, when trying to measure change over time, bar graphs are best when the changes are larger.

• By convention, the variable being measured goes on the horizontal (x-axis) and the frequency goes on the vertical (y-axis).

Bar Charts

Histogram
It is the graph most commonly used to represent continuous data on an interval or ratio scale.

• It is basically a bar graph in which the bars are joined to reflect the continuity of the data.
A histogram differs from a bar chart in two critical aspects:
• The horizontal (x-axis) is a continuous scale. As a result, there are no gaps between the bars (unless there are no observations within a class interval).
• The area of each rectangle is proportional to the frequency.

Histograms

Use of histograms as a tool in data analysis

• They provide a clear visual representation of the data.

• It is easy to spot the modal or most popular class in the data, i.e. the one with the highest peak.

• It is also easy to spot simple patterns in the data.

• They allow us to make early judgements as to whether all our data come from the same population.

Pie chart

It is most appropriate for nominal and ordinal data. Pie charts are simple diagrams for displaying categorical or grouped data.

• Pie charts are best used when comparing parts of a whole.

• They do not show changes over time.

• They are used to show the proportions of a whole.

• They are best used when there are only a handful of categories to display.

• A pie chart consists of a circle divided into segments, one segment for each category.

[Figure: pie chart with segments Mollusca: Gastropoda 27%, Crustacea 27%, Polychaeta 19%, Mollusca: Bivalvia 12%, Echinodermata 4%, Echiurida 4%, Chordata 3%, Sipuncula 2%, Nemertea 2%]

Relative percentage of the major groups of benthic macrofauna

Measures of Central Tendency or Measure of Location

They are a group of statistical techniques that measure the typical trait of a distribution of data.

The three most frequently employed measures of Central Tendency are

• Mean,

• Mode

• Median

Arithmetic mean
It is the most commonly used and the most powerful measure of central tendency, and it carries the most assumptions:
• it is applied only to ratio and interval scale data;
• in addition, the data should be normally distributed or, at least, not highly skewed.
The population mean (µ = mu, a Greek letter) is calculated as:

µ = ∑x ÷ N

where ∑x is the sum of all measurements in the population and N is the number of measurements in the population.

Properties of means
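The mean is very sensitive to extreme scores/values; therefore, when there are extremely high or low scores in a distribution, the mean should not be used as the average. For example, the two distributions below differ only in the sixth value, yet the single extreme score (98) raises the mean of Distribution A from 37 to 45.50.
S/N   Distribution A   Distribution B
1     30               30
2     40               40
3     25               25
4     44               44
5     36               36
6     98               47
      ∑ = 273          ∑ = 222
      X̄ = 45.50        X̄ = 37.00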
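The effect of the extreme score can be checked with a few lines of code. This is a minimal sketch using the two distributions from the table; the variable names are illustrative.

```python
from statistics import mean

# The two distributions from the table; they differ only in the sixth value.
distribution_a = [30, 40, 25, 44, 36, 98]
distribution_b = [30, 40, 25, 44, 36, 47]

mean_a = mean(distribution_a)  # 273 / 6
mean_b = mean(distribution_b)  # 222 / 6

print(f"Mean of A = {mean_a:.2f}")
print(f"Mean of B = {mean_b:.2f}")
print(f"Shift caused by the extreme score = {mean_a - mean_b:.2f}")
```

One extreme observation out of six is enough to move the mean by 8.5 units, which is why the mean is a poor summary of Distribution A.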
Median

It is the middle measurement in a set of data arranged in an array (decreasing or increasing order of magnitude).

It divides the data set into two equal halves.

Disadvantage

1. The median expresses less information than the mean, for it does not take into account the actual value of each measurement but only considers the rank of each measurement.

Advantage

1. Extremely high or low values do not affect the median as much as they affect the mean; thus, when dealing with skewed populations, it is preferable to use the median rather than the mean to express central tendency.

2. Data are not needed for all members of the sample to calculate the median. For example, if some of the first few values in the ordered array are omitted, the median can still be determined (provided their rank positions are known), but the mean cannot.

3. The median can be used for ordinal data, for which the mean is not considered appropriate, as well as for interval and ratio data.

4. The median has the same unit as each individual measurement.
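The first advantage can be illustrated with Distributions A and B from the example on the properties of means. This is only a sketch; it relies on the standard-library functions mean and median, and the variable names are illustrative.

```python
from statistics import mean, median

# Distributions A and B from the properties-of-means example.
distribution_a = [30, 40, 25, 44, 36, 98]  # contains the extreme score 98
distribution_b = [30, 40, 25, 44, 36, 47]

# With an even number of values, the median is the average of the two
# middle values of the ordered array: (36 + 40) / 2 in both cases.
median_a = median(distribution_a)
median_b = median(distribution_b)

print(f"A: mean = {mean(distribution_a):.2f}, median = {median_a:.2f}")
print(f"B: mean = {mean(distribution_b):.2f}, median = {median_b:.2f}")
```

The extreme score changes the mean but leaves the median untouched, because only the rank of the largest value matters, not its size.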

Mode

It is the most frequently occurring measurement in a set of data. It may also be defined as a measurement of relatively great concentration, since some frequency distributions have more than one such point of concentration.

Example: 3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 4.0, 4.0, 4.0, 4.0, 4.0, 4.1,

4.1, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.5.

Mode = 4.0
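The mode of the example can be found by counting how often each value occurs. The sketch below uses collections.Counter and statistics.multimode (available from Python 3.8); multimode returns every value tied for the highest frequency, which is useful when a distribution has more than one mode.

```python
from collections import Counter
from statistics import multimode

# The 25 measurements from the example.
values = [3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9,
          4.0, 4.0, 4.0, 4.0, 4.0, 4.1, 4.1, 4.1, 4.2, 4.2, 4.3,
          4.3, 4.4, 4.5]

counts = Counter(values)    # frequency of each distinct value
modes = multimode(values)   # list of all most frequent values

for value, count in sorted(counts.items()):
    print(f"{value:.1f}: {count}")
print("Mode(s):", modes)
```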

Advantage

1. In a symmetrical unimodal population, the mode is an unbiased and consistent estimate of the mean and median, but it is relatively inefficient and should not be so used.

2. As a measure of central tendency, the mode is less affected by skewness.

3. The mode may be used for data on nominal, ordinal, interval and ratio scales.

Disadvantage

1. It can be seriously affected by the sampling procedure.

2. The mode is not often used in biological research, though it can be of interest to report the number of modes detected in a population if there is more than one.

Measures of dispersion and variability
This is an indication of the clustering of measurements around the centre of the distribution OR an indication of how variable the measurements are.

The common measures of dispersion used are:

• Range

• Mean Deviation

• Variance

• Standard Deviation

• Standard Error, etc.

Range
This is the difference between the highest and lowest measurements in a group of data.

It is applicable to data on ordinal, interval and ratio scales only.

Disadvantage

1. The range is a relatively crude measure of dispersion, since it takes into account only the highest and the lowest measurements.

2. A sample is unlikely to contain both the highest and lowest values in the population, so the sample range usually underestimates the population range; it is therefore a biased and inefficient estimator.

In spite of these shortcomings, it is still useful in some circumstances as an estimate of the population range. It should, however, be given along with another measure of dispersion.

Mean Deviation

It is an indication of how clustered or dispersed from the mean the measurements are.

Example: Find the mean deviation of the set of numbers in grams: 1.2, 1.4, 1.6, 1.8, 2.0, 2.2 and 2.4.

Step 1: Find the mean

Mean (X̄) = ∑x ÷ n = 12.6 ÷ 7 = 1.8
Step 2: Subtract the mean from each of the x values (see table)

S/N   x      x – X̄   |x – X̄|
1     1.2    -0.6     0.6
2     1.4    -0.4     0.4
3     1.6    -0.2     0.2
4     1.8     0.0     0.0
5     2.0     0.2     0.2
6     2.2     0.4     0.4
7     2.4     0.6     0.6
∑     12.6    0.0     2.4

The sum of all deviations from the mean, i.e. ∑(x – X̄), will always equal zero.

Summing the absolute values of the deviations from the mean, ∑|x – X̄|, therefore gives a quantity that expresses dispersion about the mean.

• Dividing this quantity by n gives a measure known as the mean deviation, or mean absolute deviation, of the sample.

Step 3: Mean Deviation (MD) = ∑|x – X̄| ÷ n

where ∑|x – X̄| = 2.4 and n = 7

Mean Deviation (MD) = 2.4 ÷ 7 = 0.3429 g
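The three steps can be followed in code. This is a minimal sketch for the seven measurements of the example; the variable names are illustrative.

```python
# The seven measurements (grams) from the example.
x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
n = len(x)

# Step 1: find the mean.
x_bar = sum(x) / n

# Step 2: subtract the mean from each value.
deviations = [xi - x_bar for xi in x]

# Step 3: average the absolute deviations.
mean_deviation = sum(abs(d) for d in deviations) / n

print(f"Mean = {x_bar:.1f} g")
print(f"Sum of deviations = {sum(deviations):.4f}")  # zero, up to rounding
print(f"Mean deviation = {mean_deviation:.4f} g")
```

The signed deviations cancel out, so only their absolute values carry information about dispersion.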

Variance
To eliminate the negative signs of the deviations from the mean, the deviations are squared.

The sum of the squares of the deviations from the mean is called the sum of squares, abbreviated SS. It is expressed as:

Sample SS = ∑(x – X̄)²

The mean sum of squares is called the variance (or mean square, the latter being short for mean squared deviation).

For a population it is denoted by σ² (sigma squared, using the lowercase Greek letter).
Population variance (σ²) = ∑(x – µ)² ÷ N

Sample variance (S²) = ∑(x – X̄)² ÷ (n – 1)

The sample sum of squares (SS) is divided by n – 1, called the degrees of freedom (DF), because

• it yields an unbiased estimate of the population variance,

• i.e. it compensates for the small size of the sample compared to the entire population from which the sample is taken.

Example: Find the variance of the set of numbers in grams: 1.2, 1.4, 1.6, 1.8, 2.0, 2.2 and 2.4.
Note: Calculated mean = 1.8

S/N   x      x – X̄   (x – X̄)²
1     1.2    -0.6     0.36
2     1.4    -0.4     0.16
3     1.6    -0.2     0.04
4     1.8     0.0     0.00
5     2.0     0.2     0.04
6     2.2     0.4     0.16
7     2.4     0.6     0.36
∑     12.6    0.0     1.12

Therefore, S² = ∑(x – X̄)² ÷ (n – 1)

= 1.12 ÷ 6 = 0.18667 g²

• S² becomes increasingly large as the amount of variability or dispersion increases.

• Since S² is a mean sum of squares, it can never be a negative quantity.

• Variance expresses the same type of information as the mean deviation, but it is distinctly superior to the mean deviation in hypothesis testing; hence the mean deviation is seldom used in biostatistical analysis.
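The sample variance of the example can be computed from its definition and compared with the standard-library function statistics.variance, which also divides by n – 1. This is a sketch with illustrative variable names.

```python
from statistics import variance

# The seven measurements (grams) from the example.
x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
n = len(x)
x_bar = sum(x) / n

# Sum of squares: squared deviations from the mean, added together.
ss = sum((xi - x_bar) ** 2 for xi in x)

# Sample variance: sum of squares divided by the degrees of freedom (n - 1).
s2 = ss / (n - 1)

print(f"SS = {ss:.2f}")
print(f"S^2 = {s2:.5f} g^2")
print(f"statistics.variance gives {variance(x):.5f} g^2")
```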

Using the machine formula
To make calculation of the variance (S²) easier when handling large samples, an alternative method known as the working formula or machine formula is applied:

Sample SS = ∑x² – (∑x)² ÷ n

This is equivalent to ∑(x – X̄)², so that S² = [∑x² – (∑x)² ÷ n] ÷ (n – 1).

This formula has the advantages of

1. Fewer computational steps; this decreases the chance of error.

2. Avoiding the rounding errors made in calculating each x – X̄, a situation which leads to decreased accuracy in computation.

Using the machine formula, find the variance of the set of numbers in grams:

1.2, 1.4, 1.6, 1.8, 2.0, 2.2 and 2.4.

∑x² = 23.8; ∑x = 12.6; n = 7

S² = [23.8 – (12.6)² ÷ 7] ÷ (7 – 1) = (23.8 – 22.68) ÷ 6 = 1.12 ÷ 6 = 0.18667 g²

Variance has squared units. If measurements are in grams, their variance will be in grams squared.
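The machine formula needs only two running totals, ∑x and ∑x², so it can be sketched as below and checked against the definitional sum of squares. The variable names are illustrative.

```python
# The seven measurements (grams) from the example.
x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
n = len(x)

sum_x = sum(x)                     # sum of the values
sum_x2 = sum(xi ** 2 for xi in x)  # sum of the squared values

# Machine formula: SS = sum(x^2) - (sum(x))^2 / n
ss_machine = sum_x2 - sum_x ** 2 / n
s2_machine = ss_machine / (n - 1)

# Definitional sum of squares, for comparison.
x_bar = sum_x / n
ss_definition = sum((xi - x_bar) ** 2 for xi in x)

print(f"sum(x) = {sum_x:.1f}, sum(x^2) = {sum_x2:.1f}")
print(f"SS (machine) = {ss_machine:.2f}, SS (definition) = {ss_definition:.2f}")
print(f"S^2 = {s2_machine:.5f} g^2")
```

Both routes give the same sum of squares for this example; the machine formula simply avoids computing each deviation separately.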

Standard Deviation

It is the positive square root of the variance; therefore it has the same unit as the original measurements.

SD = √[∑(x – X̄)² ÷ (n – 1)]

OR

SD = √{[∑x² – (∑x)² ÷ n] ÷ (n – 1)}

Example: Using the machine formula, find the standard deviation of the set of numbers in grams:

1.2, 1.4, 1.6, 1.8, 2.0, 2.2 and 2.4.

SD = √(0.18667 g²) = 0.4320 g

Standard deviation is frequently abbreviated as SD, and it is always a positive quantity.
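The standard deviation of the example can be checked in code. The sketch below takes the square root of the machine-formula variance and compares it with the standard-library function statistics.stdev, which also uses n – 1; the variable names are illustrative.

```python
import math
from statistics import stdev

# The seven measurements (grams) from the example.
x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
n = len(x)

# Sample variance from the machine formula.
s2 = (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)

# Standard deviation: positive square root of the variance.
sd = math.sqrt(s2)

print(f"S^2 = {s2:.5f} g^2")
print(f"SD = {sd:.4f} g")
print(f"statistics.stdev gives {stdev(x):.4f} g")
```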

Standard Error or (Standard deviation of the mean)
It indicates how close sample means are expected to be to the population mean. It is expressed as

SE = SD ÷ √n = √(S² ÷ n)

The unit is the same as the unit of the individual measurements.

Example 6: Using the machine formula, find the standard error of the set of numbers in grams:

1.2, 1.4, 1.6, 1.8, 2.0, 2.2 and 2.4.

S² = 0.18667, n = 7; therefore SE = √(0.18667 ÷ 7) = 0.1633 g
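The standard error can be computed either from the variance or from the standard deviation; the two forms are equivalent. This is a sketch with illustrative variable names, using statistics.variance for the sample variance.

```python
import math
from statistics import variance

# The seven measurements (grams) from the example.
x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
n = len(x)

s2 = variance(x)     # sample variance (divides by n - 1)
sd = math.sqrt(s2)   # standard deviation

se_from_variance = math.sqrt(s2 / n)  # square root of (variance / n)
se_from_sd = sd / math.sqrt(n)        # SD / square root of n

print(f"SE = {se_from_variance:.4f} g")
```

Note that the whole quotient S² ÷ n goes under the square root; dividing √S² by n instead would give a different (and wrong) result.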

Coefficient of Variation

CV expresses sample variability relative to the mean of the sample.

It is called a measure of relative variability or relative dispersion

It is expressed as
CV = SD ÷ X̄ OR CV = (SD ÷ X̄) × 100%

Example: SD = 0.432, X̄ = 1.8
CV = 0.432 ÷ 1.8 = 0.24 or 24%

CV is generally a small quantity, so it is often expressed in percentages.

Since SD and X̄ have identical units, CV has no unit at all, a fact which emphasizes that it is a relative measure divorced from the actual magnitude or units of measurement of the data.

CV may be calculated only for ratio scale data.
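The coefficient of variation of the example can be verified with the standard-library functions mean and stdev. This is only a sketch; the variable names are illustrative.

```python
from statistics import mean, stdev

# The seven measurements (grams) from the example.
x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]

x_bar = mean(x)  # sample mean
sd = stdev(x)    # sample standard deviation (uses n - 1)

cv = sd / x_bar        # as a proportion
cv_percent = cv * 100  # as a percentage

print(f"CV = {cv:.2f} or {cv_percent:.0f}%")
```

Because SD and the mean carry the same unit, the unit cancels in the division and the result is a pure number.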


Reporting Variability about the Mean

To describe the population that one has sampled, the following sample statistics must be reported as a summary of the data collected.

1. Report the sample mean (X̄) and the Standard Deviation (SD).

2. The range might also be reported, along with other measures of variability, e.g. Standard Deviation (SD).

3. If the intention is to provide a statement about the precision of estimation of the population mean, the use of the Standard Error (SX̄) is appropriate.

4. It is important to state the sample size (n), so that whichever of SD or SX̄ is given, it can be converted to the other.

5. Clearly state the measure of variability used in the caption, i.e. SD or SX̄. There is, however, no widely accepted convention (the alternatives are: ±SD, ±SX̄, ±95%, ±99%).

6. The units of measurement must be clear, i.e. cm/sec, mg/l, ppm, etc.

7. If this information is to be presented in a table, the table must be self-explanatory.

