Introduction To Biostatistics
BIOSTATISTICS.
COURSE CODE: ML 0112
Covers
Types of variables
Descriptive methods for qualitative data
Descriptive methods for quantitative data
VARIABLES
• What is a variable?
• An observation or characteristic that can take different
values for different objects.
Examples of Variables
• Height (cm): 158, 169, 170, 200
• Weight (kg): 10.2, 50, 69, 34
• Outcome of disease: recovery, chronic illness, death
• Marital status: single, married, widowed, separated
Types of variables
• Qualitative variables: do not take numerical values.
• e.g. sex, type of health facility, marital status, etc.
• Quantitative variables: take numerical values
• e.g. age, no. of sexual partners, Hb (haemoglobin), parity, etc.
• Quantitative variables are either:
• Discrete: take only fixed (whole-number) values, e.g. age in completed years, counts
• Continuous: take any value within meaningful extremes, e.g.
height, weight
Relationships between variables
• Two variables that show some connection with one
another are called associated (dependent)
• Association can be further described as positive or
negative
• If two variables are not associated, they are said to be
independent
Descriptive methods for
qualitative data
• Tables
– frequency tables (one-way table)
– cross-tabulations (two-way table)
• Diagrams
– pie chart
– bar chart
– map
TABLES
• Set of data arranged in rows and columns.
• Should be as simple as possible
• Should be self-explanatory (stand alone)
One-way table
• Distribution of types of latrines
TWO-WAY TABLE
• Two variables presented simultaneously
Any method 26
Pill 6
Injectables 8
Male condom 2
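As a rough illustration of how one-way and two-way tables are built by counting, the Python sketch below tallies a small set of records. The records, the variable names (`latrine`, `sex`) and the category labels are hypothetical and made up only for illustration; they are not the data behind the tables in these slides.

```python
from collections import Counter

# Hypothetical records (latrine type, sex of household head); illustration only.
records = [
    ("pit", "male"), ("pit", "female"), ("flush", "male"),
    ("none", "female"), ("pit", "male"), ("flush", "female"),
]

# One-way table: frequency of each category of a single variable.
one_way = Counter(latrine for latrine, _ in records)

# Two-way table (cross-tabulation): frequency of each pair of categories.
two_way = Counter(records)

print("One-way table")
for latrine, count in sorted(one_way.items()):
    print(f"{latrine:<6} {count}")

print("Two-way table")
for (latrine, sex), count in sorted(two_way.items()):
    print(f"{latrine:<6} {sex:<7} {count}")
```

Both tables account for every record, so the counts in each should add up to the same total.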
MAPS
• used to show the location of events or attributes
• useful for showing the geographic distribution of an event
Descriptive methods for
quantitative data
♦ Tables
– Frequency distribution
♦ Diagrams
– Histograms
– Scatter diagrams
– Line diagrams
– Frequency polygon
– Cumulative frequency curve
Frequency Distribution
No. of partners   Frequency   Cumulative frequency
1                 2           2
2                 3           5
3                 5           10   (= 2+3+5)
4                 9           19
5                 1           20
Total             20
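The cumulative frequency column is a running total of the frequency column. A minimal Python sketch of that calculation, using the frequencies from the table above (the variable names are illustrative):

```python
from itertools import accumulate

# Frequencies from the table above: number of partners and how often each occurs.
partners = [1, 2, 3, 4, 5]
frequency = [2, 3, 5, 9, 1]

# Each cumulative frequency is the sum of all frequencies up to that row.
cumulative = list(accumulate(frequency))

for value, freq, cum in zip(partners, frequency, cumulative):
    print(f"{value:>2} {freq:>3} {cum:>3}")
print("Total", sum(frequency))
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.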
Frequency Distribution
• Example: mobile phone cost
• Sample of mobile phone costs for 50 UDOM residents
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Frequency Distributions
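One way to summarise the 50 costs is a grouped frequency distribution. The sketch below assumes classes of width 10 starting at 50 (50-59, 60-69, and so on); the slides do not state the class limits, so this choice is an assumption made for illustration.

```python
from collections import Counter

# The 50 mobile phone costs listed in the example.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

WIDTH = 10  # assumed class width; not given in the source

# Map each cost to the lower limit of its class, e.g. 57 -> 50, 104 -> 100.
class_counts = Counter((cost // WIDTH) * WIDTH for cost in costs)

# Print class, frequency and cumulative frequency.
cumulative = 0
for lower in sorted(class_counts):
    cumulative += class_counts[lower]
    upper = lower + WIDTH - 1
    print(f"{lower:>3}-{upper:<3} {class_counts[lower]:>3} {cumulative:>3}")
```

A different class width would give a different table from the same data, so the width should be chosen so that the table is neither too coarse nor too detailed.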
SUMMARIZING DATA
Central Tendency (measures of center)
• Mean
• Median
• Mode
Variation (measures of spread)
• Range
• Interquartile Range
• Variance
• Standard Deviation
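The four measures of spread can be computed with Python's standard `statistics` module. This sketch uses the nine values from the mean example in these slides (2, 3, 9, 5, 4, 0, 6, 3, 4). Two assumptions are made because the slides do not specify them: the variance and standard deviation are the sample versions (divisor n - 1), and the quartiles follow the default convention of `statistics.quantiles`; other quartile conventions can give slightly different values.

```python
import statistics

data = [2, 3, 9, 5, 4, 0, 6, 3, 4]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Interquartile range: upper quartile minus lower quartile.
# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample variance (divisor n - 1) and its square root, the standard deviation.
variance = statistics.variance(data)
sd = statistics.stdev(data)

print("Range:", data_range)
print("Interquartile range:", iqr)
print("Variance:", variance)
print("Standard deviation:", round(sd, 2))
```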
Introduction: distributional
shapes
• Recall: the mid-points of the bars in a histogram
for a continuous variable can be joined smoothly
to produce a curve.
• In the process, a distributional shape of such a
variable can be visualized.
• In general the distribution can be UNIMODAL
(i.e. exhibiting one peak) or, less commonly,
BIMODAL
• In UNIMODAL distributions, the curves can be symmetric
(“normal”) or asymmetric
• Asymmetric distributions can be skewed positively
(skewed to the right: with a long tail on the right) or
negatively skewed (skewed to the left: with a long tail on
the left)
Symmetric (normal) distribution
Positively skewed distribution
Negatively skewed distribution
Bimodal distribution
Measures of central tendency
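♦ Mean: $\bar{x} = \frac{\sum x_i}{n}$
♦ Example. Consider the following data:
2, 3, 9, 5, 4, 0, 6, 3, 4
$\sum x_i = 36$; $n = 9$
$\bar{x} = \frac{\sum x_i}{n} = 36/9 = 4.0$
♦ It is useful for symmetrical distributions
♦ For skewed distributions, it can mislead
♦ For example, the mean of 5, 5, 5, 7, 10, 20, 102 is 22, which is larger than all but one of the values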
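The two mean calculations above can be checked with a short Python sketch. The function and variable names are illustrative; the last line drops the extreme value 102 to show how strongly a single observation can pull the mean.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

example_data = [2, 3, 9, 5, 4, 0, 6, 3, 4]
skewed_data = [5, 5, 5, 7, 10, 20, 102]

print(mean(example_data))      # 36 / 9
print(mean(skewed_data))       # 154 / 7
print(mean(skewed_data[:-1]))  # 52 / 6, the mean without the value 102
```

Without the value 102 the mean of the skewed data falls from 22 to about 8.7, which is why the mean can mislead for skewed distributions.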
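♦ Median is the central value of the ordered data
– e.g. for the data in the mean example, rearranging the values in ascending order, we obtain: 0, 2, 3, 3, 4, 4, 5, 6, 9
– middle value = $\frac{n+1}{2}$th value = $\frac{9+1}{2}$th value = 5th value = 4
♦ Consider the following: 0, 2, 3, 3, 4, 5, 5, 6, 6, 9
– Median = $\frac{10+1}{2}$th value = 5.5th value
– i.e. the average of the 5th and 6th values = (4+5)/2 = 4.5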
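The (n+1)/2 rule can be sketched in Python as follows. The function name is illustrative; it sorts the data first and then handles odd and even n separately, as in the two examples above.

```python
def median(values):
    """Median by the (n+1)/2 rule: the middle value of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th value (counting from 1) sits at index n // 2.
        return ordered[middle]
    # Even n: (n+1)/2 falls halfway between two positions,
    # so take the average of the two neighbouring values.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([2, 3, 9, 5, 4, 0, 6, 3, 4]))     # odd n = 9
print(median([0, 2, 3, 3, 4, 5, 5, 6, 6, 9]))  # even n = 10
```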
♦ Mode is the most commonly occurring value in a
set of values
♦ That is, mode is value with highest frequency
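A minimal Python sketch of finding the mode by counting frequencies, applied to the two data sets used in the mean example. The function name `modes` is illustrative; it returns a list because a data set can have more than one value with the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([5, 5, 5, 7, 10, 20, 102]))    # 5 occurs three times
print(modes([2, 3, 9, 5, 4, 0, 6, 3, 4]))  # 3 and 4 each occur twice
```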
What measure to use?