
Statistical Method Lecture Note

The document provides an introduction to biostatistics, defining it as the application of statistical principles to biological and health-related data. It discusses the role of statistics in healthcare, the importance of vital statistics, and the divisions of statistics into descriptive and inferential categories. Additionally, it covers variables, scales of measurement, basic statistical terms, data presentation methods, and measures of central tendency including mean, median, and mode.

INTRODUCTION TO BIOSTATISTICS

Introduction
Statistics is a very broad subject, with applications in a vast number of different fields. In
general, statistics is the methodology for collecting, analyzing, interpreting and drawing
conclusions from information. Everything that deals even remotely with the collection,
processing, interpretation and presentation of data belongs to the domain of statistics.
Definition: Statistics is the scientific method for collecting, organizing, presenting and analyzing
data for the purpose of making reasonable decisions and drawing valid conclusions on the
basis of such analysis.
Statistics consists of the methods for collecting and analyzing data (Agresti & Finlay, 1997).
Statistics is the science of gaining information from numerical and categorical data.
Biostatistics can be defined as the application of statistical principles or concepts to biological
science, public health or health-related data.
Role of Statistics in Science and Health care delivery
1. It provides information for understanding, monitoring and improving the use of resources to
improve the lives of people.
2. It helps in identifying who is at risk of certain diseases, finding ways to control diseases and
deciding which diseases should be studied.
3. Statistics are important to health care companies in measuring performance success or
failure.
4. Descriptive statistics summarize the utility, efficacy and costs of medical goods and
services.
Vital Statistics
These are statistics concerning the important events in human life, such as birth rate, death rate,
marriages, divorce, migration, fetal death and other important health details.
Importance of Vital Statistics
1. Vital statistics is important for analysis of health trends, programs, planning and policy
development and implementation.
2. It gives information like leading causes of death, low birth weight babies, and mother's access
to prenatal care.

Divisions of statistics
1. Descriptive Statistics (Deductive Statistics).
2. Inferential Statistics (Inductive Statistics).
Descriptive Statistics: Consists of methods for organizing and summarizing information (Weiss,
1999).
Descriptive statistics includes the construction of graphs, charts and tables, and the calculation
of various descriptive measures such as averages, measures of variation and percentiles. In fact,
most of this course deals with descriptive statistics.
Inferential Statistics: Consists of methods for drawing conclusions about a population, and
measuring the reliability of those conclusions, based on information obtained from a sample of
the population (Weiss, 1999).
Inferential statistics includes methods such as point estimation, interval estimation, hypothesis
testing, determining relationships between variables and making predictions, all of which are
based on probability theory.
Variable
A variable is a quantity that may vary from object to object, or
a characteristic or attribute that can assume different values.
Types of Variables
Qualitative (Categorical) Variables
A variable can be described as qualitative when it yields categorical responses. Some examples
of qualitative (or categorical) variables and their values are:
1. Color of a person's hair (black, gray, red, brown, etc.)
2. Gender of child (male, female)
3. State of residence of a Nigerian citizen (Kano, Kaduna, Katsina, etc.)
4. Cause of death of newborn (congenital malformation, asphyxia, etc.)
Quantitative Variables
A variable can be described as quantitative when it yields numerical responses or values.
Quantitative variables may be further described as either continuous or discrete.
Some examples of quantitative variables (with scale of measurement; values) are the following:
1. Height (1/2-inch units; 0.0, 0.5, 1.0, 1.5, . . . , 99.0, 99.5, 100.0)
2. Number of particles emitted by a radioactive source (counts per minute; 0, 1, 2, 3, . . . )
3. Total body calcium of a patient with osteoporosis (nearest gram; 0, 1, 2, . . . , 9999, 10,000)
4. Survival time of a patient diagnosed with lung cancer (nearest day; 0, 1, 2, . . . , 19,999,
20,000)
5. Apgar score of infant 60 seconds after birth (counts; 0, 1, 2, . . ., 8, 9, 10)
6. Number of children in a family (counts; 0, 1, 2, 3, . . . )
Scales of measurement
Measurement scales are instruments for measuring variables. There are four types of scale on
which a variable may be measured:
1. Nominal scale - merely assigns identities to categories. Observations take values that
cannot be organized in a logical sequence, e.g. sex, religion.
2. Ordinal scale - ranks ideas or objects in an order of priority or preference. Observations take
values that can be logically ordered or ranked, but the intervals between ranks are not equal,
e.g. attitude (strongly agree, disagree, no response).
3. Ratio scale - has equal intervals and a true zero, and each value is identified with a number,
e.g. speed, length.
4. Interval scale - similar to the ratio scale but lacks a true zero. The intervals are equal but the
zero is fixed arbitrarily, e.g. temperature.
Basic Statistical Terms
Statistic: A measurable (numerical) characteristic of a sample.
Inference: Making predictions and generalizations about phenomena represented by the data.
Variable: Any characteristic that varies from one individual member of the population to
another, e.g. height, weight, number of siblings, sex, marital status and religion.
Parameter: A measurable characteristic of the population, or the true value we hope to obtain.
Population: The totality of objects of interest, or the collection of all individuals or items under
consideration in a statistical study (Weiss, 1999).
Sample: A selection of cases from the population, i.e. that part of the population from which
information is collected (Weiss, 1999). The sample size is the number of cases in the sample.
Census: A sample that contains the entire population, or the process of obtaining
information about the whole population.
Element: A unit on which the variable is measured, i.e. each object in a set of data.
Frequency: The number of times a particular data point occurs in the set of data.
Frequency Distribution: A table that lists each data point and its frequency.
Relative Frequency: The frequency of a data point expressed as a percentage of the total
number of data points.
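The three definitions above can be illustrated with a minimal Python sketch. It assumes the small raw data set that appears in Example 3.1 of these notes; the function name `frequency_table` is a hypothetical helper, not part of any package mentioned here.

```python
from collections import Counter

# Raw data from Example 3.1 of these notes.
data = [2, 8, 3, 4, 2, 6, 2, 4]


def frequency_table(values):
    """Return (data point, frequency, relative frequency in %) rows, sorted by data point."""
    counts = Counter(values)
    total = len(values)
    return [(v, counts[v], 100 * counts[v] / total) for v in sorted(counts)]


table = frequency_table(data)
for value, freq, rel in table:
    print(f"{value:>5} {freq:>9} {rel:>8.1f}%")
```

The relative frequencies always add up to 100%, which is a quick check that no data point was missed.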
SPSS: Statistical Package for the Social Sciences
Data
Raw, unprocessed information.
Sources of Statistical Data
1. Primary source / primary data
2. Secondary source/ secondary data
Data presentation
Statistical data can be presented in three key ways, namely tabular, graphical and
diagrammatic presentation of data.
Tabular presentation of data
1. Raw data are collected data that have not been organized numerically. An array is an
arrangement of raw numerical data in ascending or descending order of magnitude.
2. When summarizing large masses of data, it is often useful to distribute the data into classes, or
categories, and to determine the number of individuals belonging to each class, called the class
frequency.
3. A tabular arrangement of data by classes together with corresponding class frequencies is
called a frequency distribution, or frequency table.

4. Rules for forming a frequency distribution

(i) Find the range, i.e. highest value - lowest value.
(ii) Find the number of classes required (k) by Sturges' rule:
$k = 3.322 \log_{10}(N) + 1$
E.g. if we have 50 observations, that is N = 50, then
$k = 3.322 \log_{10}(50) + 1 = 5.644 + 1 = 6.644 \approx 7$, i.e. 7 classes.
(iii) Calculate the width or size of the class as
$W = \frac{\text{Range}}{k}$
(iv) Calculate the upper limit of the first class, U1, from its lower limit L1 and the class width C
(the width W from step (iii)):
U1 = L1 + C - 1 for whole numbers
U1 = L1 + C - 0.1 for data with 1 decimal
U1 = L1 + C - 0.01 for data with 2 decimals
U1 = L1 + C - 0.001 for data with 3 decimals, etc.
(v) Form the frequency table.
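The rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the logarithm is taken to base 10 (which reproduces the value 6.644 quoted for N = 50), and k is rounded to the nearest whole number as in the worked figure.

```python
import math


def sturges_classes(n):
    """Unrounded number of classes from Sturges' rule, k = 3.322 * log10(n) + 1."""
    return 3.322 * math.log10(n) + 1


def class_width(data_range, k):
    """Class width W = range / k (unrounded)."""
    return data_range / k


def first_upper_limit(lower, width, decimals=0):
    """U1 = L1 + C - 10**(-decimals): subtract 1, 0.1, 0.01, ... by data precision."""
    return round(lower + width - 10 ** (-decimals), decimals)


k = sturges_classes(50)                            # the N = 50 case from the rule
print(round(k, 3), "->", round(k), "classes")
print(first_upper_limit(119, 10))                  # whole-number data
print(first_upper_limit(4.51, 0.82, decimals=2))   # data recorded to 2 decimals
```

The last two calls use a lower limit and width taken from the worked examples in these notes (the weights data and the melting-time data).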
Example
The following data are the weights of 40 male students of the School of Nursing, Kano, recorded
to the nearest kg. Construct a grouped frequency distribution table:
138 146 168 146 161
164 158 126 173 145
150 140 138 142 135
132 147 176 147 142
144 136 163 135 150
125 148 119 153 156
149 152 154 140 145
157 144 165 135 128
Steps
(i) Range = 176 - 119 = 57
(ii) $k = 3.322 \log_{10}(40) + 1 = 5.322 + 1 = 6.322 \approx 6$, i.e. 6 classes
(iii) $W = \frac{\text{Range}}{k} = \frac{57}{6} = 9.5 \approx 10$

So the frequency table is formed with a minimum of 6 classes and a class width of 10.
Since the smallest reading is 119 kg, the lower limit of the first class should be 119 or less. For
convenience, this limit is chosen as 119 kg. We then have the following 6 class intervals:
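Table 1.0
The upper limit of each class follows from U = L + C - 1 (for example 119 + 10 - 1 = 128), so
that the classes do not overlap.
Class interval (weights, kg)   Tally marks        Frequency
119-128                        ||||               4
129-138                        ||||| ||           7
139-148                        ||||| ||||| |||    13
149-158                        ||||| ||||         9
159-168                        |||||              5
169-178                        ||                 2
Total                                             40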
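As a check on the tallying, the class frequencies can be counted with a short Python sketch. It assumes non-overlapping whole-number classes built with U = L + C - 1 (119-128, 129-138, and so on), starting at 119 with a width of 10; `grouped_frequencies` is a hypothetical helper name.

```python
# Weights (kg) of the 40 students, copied from the example above.
weights = [
    138, 146, 168, 146, 161,
    164, 158, 126, 173, 145,
    150, 140, 138, 142, 135,
    132, 147, 176, 147, 142,
    144, 136, 163, 135, 150,
    125, 148, 119, 153, 156,
    149, 152, 154, 140, 145,
    157, 144, 165, 135, 128,
]


def grouped_frequencies(values, first_lower, width, n_classes):
    """Count values in consecutive whole-number classes [lower, lower + width - 1]."""
    rows = []
    for i in range(n_classes):
        lower = first_lower + i * width
        upper = lower + width - 1
        freq = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, freq))
    return rows


table = grouped_frequencies(weights, first_lower=119, width=10, n_classes=6)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

The frequencies should add up to the number of observations (40); if they do not, a value has fallen outside every class or into two classes.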

Measures of central tendency


Measures of central tendency, otherwise known as measures of location, are simply
averages. The most commonly used are the arithmetic mean, mode, median, geometric mean and
harmonic mean, but we will restrict ourselves to the mean, mode and median of ungrouped and
grouped data.
Descriptive measures that indicate where the center or the most typical value of a variable lies
in a collected set of measurements are called measures of center. Measures of center are often
referred to as averages.
The median and the mean apply only to quantitative data, whereas the mode can be used with
either quantitative or qualitative data.
The Arithmetic Mean
The most commonly used measure of center for a quantitative variable is the (arithmetic) sample
mean. When people speak of taking an average, it is the arithmetic mean that they are most often
referring to.
The sample mean of the variable is the sum of the observed values in a data set divided by the
number of observations.
To present the ideas and associated calculations effectively, it is convenient to represent
variables and their observed values by symbols, so that the discussion is not anchored to a
specific set of numbers. Let x denote the variable in question; the symbol $x_i$ then denotes the
ith observation of that variable in the data set.
If the sample size is n, the sample mean of the variable x is denoted by $\bar{x}$ and is given by
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
To further simplify the writing of a sum, the Greek letter Σ (sigma) is used as a shorthand. The
sum $x_1 + x_2 + x_3 + \cdots + x_n$ is denoted as
$$\sum_{i=1}^{n} x_i \quad \text{or} \quad \sum x$$
so that the sample mean is expressed operationally as
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{or} \quad \bar{x} = \frac{\sum x}{n}$$
Note that the above formula is for ungrouped data.
For grouped data the formula is
$$\bar{x} = \frac{\sum_{i} f_i x_i}{n} \quad \text{or} \quad \bar{x} = \frac{\sum fx}{n}$$
where $f_i$ is the frequency of the value $x_i$ and $n = \sum f_i$ is the total number of observations.
Example 1.1
Obtain the arithmetic mean of the set of numbers 3, 8, 4, 6 and 7.
$$\text{AM} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{3 + 8 + 4 + 6 + 7}{5} = \frac{28}{5} = 5.6$$
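The same calculation can be sketched in Python; `arithmetic_mean` is a hypothetical helper name used only for illustration.

```python
def arithmetic_mean(values):
    """Sum of the observed values divided by the number of observations."""
    return sum(values) / len(values)


mean_1_1 = arithmetic_mean([3, 8, 4, 6, 7])
print(mean_1_1)  # 28 / 5
```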
Example 1.2
Marks scored by 50 students in a course are presented below:
Marks (x)   Frequency (f)   fx
0           4               0
1           6               6
2           4               8
3           3               9
4           15              60
5           10              50
6           5               30
7           3               21
Total       50              184

$$\text{AM} = \bar{x} = \frac{\sum fx}{n} = \frac{184}{50} = 3.68$$
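A minimal Python sketch of the frequency-table mean follows, using the marks and frequencies from Example 1.2. The function name `mean_from_table` is hypothetical; n is taken to be the total of the frequencies.

```python
marks = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [4, 6, 4, 3, 15, 10, 5, 3]


def mean_from_table(values, frequencies):
    """Mean of frequency data: sum of f*x divided by n, where n is the sum of f."""
    n = sum(frequencies)
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    return total_fx / n


mean_1_2 = mean_from_table(marks, freqs)
print(sum(freqs), sum(f * x for x, f in zip(marks, freqs)), mean_1_2)
```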

Median
To obtain the median of the variable, we arrange the observed values in a data set in ascending
order and then determine the middle value in the ordered list.
1. If the number of observations is odd, then the sample median is the observed value exactly in
the middle of the ordered list.
2. If the number of observations is even, then the sample median is the number halfway between
the two middle observed values in the ordered list.
Example 2.1
Using the data in Example 1.1, given as 3, 8, 4, 6, 7, the median can be obtained as follows.
Arranged in ascending order: 3, 4, 6, 7, 8
Hence the median is 6.
For an even number of observations, e.g. 10, 3, 12, 8, 15, 17, 6, 13, the arranged data are
3, 6, 8, 10, 12, 13, 15, 17. Here n = 8, so the median is the average of the two middle values:
$\frac{10 + 12}{2} = 11$
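The two rules for odd and even numbers of observations can be sketched as a small Python function. The name `median` here is a local, hypothetical helper written out to show the steps.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 8, 4, 6, 7]))                  # odd number of observations
print(median([10, 3, 12, 8, 15, 17, 6, 13]))    # even number of observations
```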
Example 2.2
Using the data in Example 1.2 above, first find the cumulative frequency (CF):
Marks (x)   Frequency (f)   fx    CF
0           4               0     4
1           6               6     10
2           4               8     14
3           3               9     17
4           15              60    32
5           10              50    42
6           5               30    47
7           3               21    50
Total       50              184

The median position is $\frac{n+1}{2} = \frac{50+1}{2} = 25.5$, i.e. between the 25th and 26th
ordered observations.
The first cumulative frequency to reach this position is 32, and the corresponding mark is 4, so
the median is 4.
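The cumulative-frequency lookup can be sketched in Python as below. The helper names are hypothetical, and the sketch assumes the values in the table are listed in ascending order; for an even n it averages the observations at positions n/2 and n/2 + 1.

```python
from itertools import accumulate

marks = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [4, 6, 4, 3, 15, 10, 5, 3]


def value_at_position(values, frequencies, position):
    """Value of the observation at a 1-based position in the ordered data."""
    for value, cum in zip(values, accumulate(frequencies)):
        if position <= cum:
            return value
    raise ValueError("position exceeds the number of observations")


def median_from_table(values, frequencies):
    """Median of frequency data, located through the cumulative frequencies."""
    n = sum(frequencies)
    if n % 2 == 1:
        return value_at_position(values, frequencies, (n + 1) // 2)
    low = value_at_position(values, frequencies, n // 2)
    high = value_at_position(values, frequencies, n // 2 + 1)
    return (low + high) / 2


print(list(accumulate(freqs)))          # cumulative frequencies
print(median_from_table(marks, freqs))
```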
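Alternatively, for grouped data, the following formula can be used:
$$\text{Median} = L_m + \left[\frac{\frac{n}{2} - \sum f_1}{f_m}\right] C$$
where $L_m$ is the lower class boundary of the median class, $\sum f_1$ is the cumulative frequency
of the classes before the median class, $f_m$ is the frequency of the median class, C is the class
width and n is the total number of observations.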

Example 2.3
Class Frequency(f) Cum-f
1-10 1 1
11-20 5 6
21-30 10 16
31-40 19 35
41-50 42 77
51-60 10 87
61-70 6 93
71-80 4 97
81-90 2 99
91-100 1 100

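The median position is $\left(\frac{n+1}{2}\right)^{\text{th}} = 50.5^{\text{th}}$, which falls in the class 41-50.
$L_m = 40.5$, $\sum f_1 = 35$, $f_m = 42$, $C = 10$
$$\text{Median} = L_m + \left[\frac{\frac{n}{2} - \sum f_1}{f_m}\right] C$$
$$\text{Median} = 40.5 + \left[\frac{\frac{100}{2} - 35}{42}\right] \times 10 = 40.5 + 3.57 = 44.07$$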
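The grouped-median formula can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes whole-number classes, so the lower class boundary is taken as the lower limit minus half of the gap between adjacent classes (0.5 here).

```python
# (lower class limit, frequency) pairs from Example 2.3, in ascending order.
classes = [(1, 1), (11, 5), (21, 10), (31, 19), (41, 42),
           (51, 10), (61, 6), (71, 4), (81, 2), (91, 1)]


def grouped_median(class_rows, width, gap=1):
    """Median = Lm + ((n/2 - cumulative f before median class) / fm) * C."""
    n = sum(freq for _, freq in class_rows)
    cum_before = 0
    for lower_limit, freq in class_rows:
        if cum_before + freq >= n / 2:       # first class whose cumulative f reaches n/2
            lower_boundary = lower_limit - gap / 2
            return lower_boundary + ((n / 2 - cum_before) / freq) * width
        cum_before += freq
    raise ValueError("no classes supplied")


median_2_3 = grouped_median(classes, width=10)
print(round(median_2_3, 2))
```

With these inputs the median class is 41-50, with 35 observations before it, and the result is 40.5 + (15 / 42) × 10.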

The Mode
The mode is simply the item with the highest frequency. A distribution can have more
than one mode: unimodal - one mode, bimodal - two modes, tri-modal - three modes, and
multimodal - more than three modes.
Example 3.1
Mode from raw data - the mode can be obtained from raw data by simply picking the item that
occurs most frequently.
Given 2, 8, 3, 4, 2, 6, 2, 4.
Mode = 2, since it occurs most frequently (three times).

Mode from a frequency table:
X    F
1    4
2    6
3    5
4    5

The highest frequency is 6 and the corresponding value of x is 2. Hence, the mode is 2.
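Both ways of finding the mode can be sketched in Python. The helper names are hypothetical; each returns a list so that bimodal or multimodal data yield every mode rather than just one.

```python
from collections import Counter


def modes(values):
    """All values that occur with the highest frequency in raw data."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def modes_from_table(values, frequencies):
    """All values whose frequency equals the highest frequency in a table."""
    top = max(frequencies)
    return [v for v, f in zip(values, frequencies) if f == top]


print(modes([2, 8, 3, 4, 2, 6, 2, 4]))
print(modes_from_table([1, 2, 3, 4], [4, 6, 5, 5]))
```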


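Mode from a grouped frequency table
Example 3.2
The times taken (in seconds) by 100 different chemical substances to melt when subjected to a
particular temperature condition are given below:
Time (in seconds)   F
4.51 - 5.32         15
5.33 - 6.14         7
6.15 - 6.96         35
6.97 - 7.78         28
7.79 - 8.60         10
8.61 - 9.42         5
Total               100

$$\text{Mode} = L_{\text{mode}} + \left[\frac{f_1}{f_1 + f_2}\right] C$$
Here the modal class is 6.15 - 6.96 (highest frequency, 35), so $L_{\text{mode}} = 6.145$ is its lower
class boundary, $f_1 = 28$ is the frequency of the class after the modal class, $f_2 = 7$ is the
frequency of the class before it, and $C = 0.82$ is the class width.
$$\text{Mode} = 6.145 + \left[\frac{28}{28 + 7}\right] \times 0.82 = 6.145 + 0.656 = 6.801$$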
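The grouped-mode formula used in this example can be sketched in Python. The function and parameter names are hypothetical; following the worked figures, f1 is read as the frequency of the class after the modal class and f2 as the frequency of the class before it.

```python
def grouped_mode(lower_boundary, freq_after, freq_before, width):
    """Mode = L_mode + (f1 / (f1 + f2)) * C, with f1 after and f2 before the modal class."""
    return lower_boundary + (freq_after / (freq_after + freq_before)) * width


mode_3_2 = grouped_mode(lower_boundary=6.145, freq_after=28, freq_before=7, width=0.82)
print(round(mode_3_2, 3))
```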

Measures of Variability
In addition to locating the center of the observed values of the variable in the data, another
important aspect of a descriptive study of the variable is numerically measuring the extent of
variation around the center. Two data sets of the same variable may exhibit similar positions of
center but may be remarkably different with respect to variability.
Just as there are several different measures of center, there are also several different measures of
variation. In this section, we will examine two of the most frequently used measures of
variation: the range and the standard deviation. Measures of variation are used mostly for
quantitative variables.
Range
It is simply the difference between the largest and the smallest values in a distribution.
Range = Max −Min.

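Variance and Standard Deviation

For raw data
$$\text{Sample Variance } (S^2) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
For frequency data
$$\text{Sample Variance } (S^2) = \frac{\sum_{i} f_i (x_i - \bar{x})^2}{n - 1}$$
where $f_i$ is the frequency of the value $x_i$ and $n = \sum f_i$.

Standard Deviation
The standard deviation is the square root of the variance.

For raw data
$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

For frequency data
$$S = \sqrt{\frac{\sum_{i} f_i (x_i - \bar{x})^2}{n - 1}}$$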

Note that the examples above can be used to find the sample variance and standard deviation
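As an illustration, the range, sample variance and standard deviation can be computed for the data of Examples 1.1 and 1.2 with the Python sketch below. The function names are hypothetical, and the sketch assumes the divisor n - 1 for both raw and frequency data, with n equal to the total frequency.

```python
import math


def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


def sample_variance(values, frequencies=None):
    """Sum of f * (x - mean)^2 divided by n - 1; raw data use f = 1 for every value."""
    if frequencies is None:
        frequencies = [1] * len(values)
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    squared = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies))
    return squared / (n - 1)


def sample_sd(values, frequencies=None):
    """Standard deviation = square root of the sample variance."""
    return math.sqrt(sample_variance(values, frequencies))


raw = [3, 8, 4, 6, 7]                       # Example 1.1
marks = [0, 1, 2, 3, 4, 5, 6, 7]            # Example 1.2
freqs = [4, 6, 4, 3, 15, 10, 5, 3]

print(data_range(raw), sample_variance(raw), sample_sd(raw))
print(sample_variance(marks, freqs), sample_sd(marks, freqs))
```

For the raw data the squared deviations from the mean 5.6 add up to 17.2, so the variance is 17.2 / 4 = 4.3; for the frequency table they add up to 188.88, which is then divided by 49.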

EXERCISE
Question 1
Consider the aflatoxin data given below;

30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37
Find the AM, mode and median.
Question 2
Consider the following 10 observations of systolic blood pressure in mmHg
118, 120, 122, 160, 130, 150, 122, 119, 120, 122
Find; mean, median, mode, range and standard deviation

Question 3
Given the blood sample below
O,O,A,B,A,A,B,AB,O,A,AB,O,O,A,O,B,A,B,B,A
Use the observation above to create a frequency table

Question 4
The data below represents scores obtained by first students of midwifery Kano in statistics
course
17 47 52 92 8
28 23 53 90 9
17 63 17 23 17
10 66 19 47 20
8 66 20 17 25
90 82 10 45 40
i. What are the real limits of the class intervals and the class width?
ii. Construct frequency table
iii. Calculate the mean and median of the distribution
iv. Determine the mode
v. Compute the variance and standard deviation
