Statistical Method Lecture Note
Introduction
Statistics is a very broad subject, with applications in a vast number of different fields. In
general, one can say that statistics is the methodology for collecting, analyzing, interpreting and
drawing conclusions from information. Everything that deals even remotely with the collection,
processing, interpretation and presentation of data belongs to the domain of statistics.
Definition:- Statistics is the scientific method for collecting, organizing, presenting and analyzing
data for the purpose of making reasonable decisions and drawing valid conclusions on the
basis of such analysis.
Statistics consists of the methods for collecting and analyzing data (Agresti & Finlay, 1997).
Statistics is the science of gaining information from numerical and categorical data.
Biostatistics can be defined as the application of statistical principles or concepts to biological
science, public health or health-related data.
Role of Statistics in Science and Health care delivery
1. It provides information for understanding, monitoring and improving the use of resources to
improve the lives of people.
2. It helps in identifying who is at risk of certain diseases, finding ways to control diseases and
deciding which diseases should be studied.
3. Statistics are important to health care companies in measuring the success or failure of their
performance.
4. Descriptive statistics summarize the utility, efficacy and costs of medical goods and
services.
Vital Statistics
These are statistics concerning the important events in human life, such as births, deaths,
marriages, divorces, migration, fetal deaths and other important health details.
Importance of Vital Statistics
1. Vital statistics are important for the analysis of health trends and for the planning, development
and implementation of programs and policies.
2. They give information such as the leading causes of death, the number of low birth weight babies
and mothers' access to prenatal care.
Divisions of statistics
1. Descriptive Statistics (Deductive Statistics).
2. Inferential Statistics (Inductive Statistics).
Descriptive Statistics: Consists of methods for organizing and summarizing information (Weiss,
1999).
Descriptive statistics includes the construction of graphs, charts and tables, and the calculation
of various descriptive measures such as averages, measures of variation and percentiles. In fact,
most of this course deals with descriptive statistics.
Inferential Statistics: Consists of methods for drawing conclusions about a population, and
measuring the reliability of those conclusions, based on information obtained from a sample of
the population (Weiss, 1999).
Inferential statistics includes methods such as point estimation, interval estimation, hypothesis
testing, determining relations between variables and making predictions, all of which are based on
probability theory.
Variable
A variable is a quantity that may vary from object to object, or
a characteristic or attribute that can assume different values.
Types of Variables
Qualitative (Categorical) Variables
A variable can be described as qualitative when it yields categorical responses. Some examples
of qualitative (or categorical) variables and their values are:
1. Color of a person's hair (black, gray, red, brown, etc.)
2. Gender of a child (male, female)
3. State of residence of a Nigerian citizen (Kano, Kaduna, Katsina, etc.)
4. Cause of death of a newborn (congenital malformation, asphyxia, etc.)
Quantitative Variables
A variable can be described as quantitative when it yields numerical responses or values.
Quantitative variables may be further described as either continuous or discrete.
Some examples of quantitative variables (with scale of measurement; values) are the following:
1. Height (1/2 inch units; 0.0, 0.5, 1.0, 1.5, . . . , 99.0, 99.5, 100.0)
2. Number of particles emitted by a radioactive source (counts per minute; 0, 1, 2, 3, . . . )
3. Total body calcium of a patient with osteoporosis (nearest gram; 0, 1, 2, . . . , 9999, 10,000)
4. Survival time of a patient diagnosed with lung cancer (nearest day; 0, 1, 2, . . . , 19,999,
20,000)
5. Apgar score of infant 60 seconds after birth (counts; 0, 1, 2, . . ., 8, 9, 10)
6. Number of children in a family (counts; 0, 1, 2, 3, . . . )
Scales of measurement
Measurement scales are instruments for measuring variables. There are four types of scales on
which a variable may be measured:
1. Nominal scale - merely assigns identities to categories. Observations take values that
cannot be organized in a logical sequence, e.g. sex, religion.
2. Ordinal scale - ranks ideas or objects in an order of priority or preference. The intervals between
ranks are not equal. Observations take values that can be logically ordered or ranked, e.g. attitude
(strongly agree, disagree, no response).
3. Ratio scale - has equal intervals and a true zero, and each value is identified with a number, e.g. speed, length.
4. Interval scale - similar to the ratio scale but lacks a true zero. The intervals are equal but the zero is
fixed arbitrarily, e.g. temperature.
Basic Statistical Terms
Statistic:- A measurable (numerical) characteristic of a sample.
Inference:- Making predictions and generalizations about phenomena represented by the data.
Variable:- Any characteristic that varies from one individual member of the population to
another, e.g. height, weight, number of siblings, sex, marital status and religion.
Parameter:- A measurable characteristic of the population, or the true value we hope to obtain.
Population:- The totality of objects of interest, or the collection of all individuals or items under
consideration in a statistical study (Weiss, 1999).
Sample:- A selection of cases from the population, that is, the portion of the population selected
for enquiry and from which information is collected (Weiss, 1999). The sample size is the number of
cases in the sample.
Census:- A sample that contains the entire population, or the process of obtaining
information about the entire population.
Element:- A unit in the variable, or each object in a set of variables.
Frequency:- The number of times a particular data point occurs in the set of data.
Frequency Distribution:- A table that lists each data point and its frequency.
Relative Frequency:- The frequency of a data point expressed as a percentage of the total
number of data points.
SPSS:- Statistical Package for the Social Sciences
Data
Raw, unprocessed information.
Sources of Statistical Data
1. Primary source / primary data
2. Secondary source/ secondary data
Data presentation
Statistical data can be presented in three key ways, namely tabular, graphical and
diagrammatic presentation.
Tabular presentation of data
1. Raw data are collected data that have not been organized numerically. An array is an
arrangement of raw numerical data in ascending or descending order of magnitude.
2. When summarizing large masses of data, it is often useful to distribute the data into classes, or
categories, and to determine the number of individuals belonging to each class, called the class
frequency.
3. A tabular arrangement of data by classes together with corresponding class frequencies is
called a frequency distribution, or frequency table.
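The ideas of an array, a frequency and a relative frequency can be sketched in a few lines of Python. This is only an illustration: the data set `raw_data` is hypothetical (it does not come from these notes) and the variable names are chosen for readability.

```python
from collections import Counter

# Hypothetical raw data (not from the lecture note):
# number of siblings reported by 10 students.
raw_data = [2, 0, 3, 2, 1, 2, 0, 1, 3, 2]

# An array: the raw data arranged in ascending order of magnitude.
array = sorted(raw_data)

# Frequency: how many times each distinct value occurs.
frequency = Counter(array)

# Relative frequency: each frequency as a percentage of the total.
total = len(array)
relative_frequency = {value: 100 * count / total
                      for value, count in frequency.items()}

# Print a simple ungrouped frequency table.
print("Value\tFrequency\tRelative frequency (%)")
for value in sorted(frequency):
    print(f"{value}\t{frequency[value]}\t\t{relative_frequency[value]:.1f}")
```

Sorting first is not required for counting, but it mirrors the order in which the terms are introduced: raw data, then array, then frequencies.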
$$U_1 = \begin{cases}
L_1 + C - 1 & \text{for whole numbers} \\
L_1 + C - 0.1 & \text{for data with 1 decimal place} \\
L_1 + C - 0.01 & \text{for data with 2 decimal places} \\
L_1 + C - 0.001 & \text{for data with 3 decimal places, etc.}
\end{cases}$$
(iv) Form frequency table
Example
The following data relate to the weights of 40 male students in the School of Nursing, Kano. The data
were recorded to the nearest kg. Construct a grouped frequency distribution table:
138 146 168 146 161
164 158 126 173 145
150 140 138 142 135
132 147 176 147 142
144 136 163 135 150
125 148 119 153 156
149 152 154 140 145
157 144 165 135 128
Steps
(i) Range $= 176 - 119 = 57$
(ii) $k = 3.322 \log_{10}(40) + 1 = 5.322 + 1 = 6.322 \approx 6$, so 6 classes
(iii) $W = \dfrac{\text{Range}}{k} = \dfrac{57}{6} = 9.5 \approx 10$
So the frequency table is formed with a minimum of 6 classes and a class width of 10.
Since the smallest reading is 119 kg, the lower limit of the first class should be 119 or less. For
convenience, this limit can be chosen as 119 kg. We then have the following 6 class intervals:
Table 1.0
Class interval (weights)   Tally marks        Frequency
119-128                    ||||               4
129-138                    ||||| ||           7
139-148                    ||||| ||||| |||    13
149-158                    ||||| ||||         9
159-168                    |||||              5
169-178                    ||                 2
Total                                         40
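The steps of this example can be sketched in Python as a check on the arithmetic. The sketch assumes non-overlapping class limits built with the whole-number rule U = L + C - 1, rounds k to the nearest whole number and rounds the class width up; the formula for k is the one commonly known as Sturges' rule. All variable names are illustrative.

```python
import math

# Weights of the 40 students, as listed in the example.
weights = [
    138, 146, 168, 146, 161,
    164, 158, 126, 173, 145,
    150, 140, 138, 142, 135,
    132, 147, 176, 147, 142,
    144, 136, 163, 135, 150,
    125, 148, 119, 153, 156,
    149, 152, 154, 140, 145,
    157, 144, 165, 135, 128,
]

n = len(weights)
data_range = max(weights) - min(weights)   # step (i)
k = round(1 + 3.322 * math.log10(n))       # step (ii): number of classes
width = math.ceil(data_range / k)          # step (iii): class width, rounded up

# Build the class limits, starting at the smallest reading.
classes = []
lower = min(weights)
for _ in range(k):
    upper = lower + width - 1              # whole-number rule: U = L + C - 1
    classes.append((lower, upper))
    lower = upper + 1

# Step (iv): count the observations that fall in each class.
frequencies = [sum(lo <= w <= hi for w in weights) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}\t{f}")
```

Because the limits do not overlap, every weight falls in exactly one class, so the frequencies add up to 40.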
$$\frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
To further simplify the writing of a sum, the Greek letter Σ (sigma) is used
as a shorthand. The sum $x_1 + x_2 + x_3 + \cdots + x_n$ is denoted as
$$\sum_{i=1}^{n} x_i \quad \text{or} \quad \sum x$$
Note that the sample mean of the variable is the sum of the observed values $x_1, x_2, x_3, \ldots, x_n$ in a data
set divided by the number of observations $n$.
The sample mean is denoted by $\bar{x}$ and expressed operationally as
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{or} \quad \bar{x} = \frac{\sum x}{n}$$
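Note that the above formula is for ungrouped data.
For grouped data the formula is given as
$$\bar{x} = \frac{\sum f x_i}{n} \quad \text{or} \quad \bar{x} = \frac{\sum fx}{n}$$
where $f$ is the frequency of the value $x_i$ and $n = \sum f$ is the total number of observations.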
Example 1.1
Obtain the arithmetic mean for the set of numbers 3, 8, 4, 6 and 7.
$$AM = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{3 + 8 + 4 + 6 + 7}{5} = 5.6$$
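As a quick check, the same calculation can be written in Python. This is a minimal sketch using the five numbers of Example 1.1; the names `values` and `mean` are illustrative, and the standard-library function `statistics.mean` is used only as a cross-check.

```python
import statistics

# The five observations from Example 1.1.
values = [3, 8, 4, 6, 7]

# Arithmetic mean: sum of the observations divided by their number.
mean = sum(values) / len(values)

# Cross-check against the standard library.
library_mean = statistics.mean(values)

print(mean, library_mean)
```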
Example 1.2
Marks scored by 50 students in a course are presented below:
Marks (x)   Frequency (f)   fx
0           4               0
1           6               6
2           4               8
3           3               9
4           15              60
5           10              50
6           5               30
7           3               21
Total       50              184
$$AM = \bar{x} = \frac{\sum f x_i}{n} = \frac{184}{50} = 3.68$$
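The mean of a frequency table can be sketched in Python in the same way: multiply each mark by its frequency, add the products and divide by the total frequency. The lists below copy the table of Example 1.2; the variable names are illustrative.

```python
# Marks and their frequencies, copied from the table in Example 1.2.
marks = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [4, 6, 4, 3, 15, 10, 5, 3]

# The fx column: each mark multiplied by its frequency.
fx = [f * x for x, f in zip(marks, freqs)]

n = sum(freqs)            # total number of students
total_fx = sum(fx)        # sum of the fx column
mean = total_fx / n       # mean for frequency data

print(n, total_fx, mean)
```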
Median
To obtain the median of the variable, we arrange observed values in a data set in ascending order
and then determine the middle value in the ordered list.
1. If the number of observations is odd, then the sample median is the observed value exactly in
the middle of the ordered list.
2. If the number of observations is even, then the sample median is the number halfway between
the two middle observed values in the ordered list.
Example 2.1
Using the data in Example 1.1, given as 3, 8, 4, 6, 7,
the median can be obtained as follows. The arranged data is
3, 4, 6, 7, 8.
Hence the median is 6.
For an even number of observations, e.g. 10, 3, 12, 8, 15, 17, 6, 13, the arranged data is
3, 6, 8, 10, 12, 13, 15, 17. With n = 8, the median is the average of the two middle values:
$$\frac{10 + 12}{2} = 11$$
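The two rules for the median (odd and even number of observations) can be sketched as a small Python function. The function name `median` is illustrative; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Return the sample median using the odd/even rule."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the value exactly in the middle of the ordered list.
        return ordered[middle]
    # Even n: halfway between the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_case = median([3, 8, 4, 6, 7])
even_case = median([10, 3, 12, 8, 15, 17, 6, 13])
print(odd_case, even_case)
```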
Example 2.2
Using the data in Example 1.2 above, given as
Marks (x)   Frequency (f)   fx    CF
0           4               0     4
1           6               6     10
2           4               8     14
3           3               9     17
4           15              60    32
5           10              50    42
6           5               30    47
7           3               21    50
Total       50              184
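For grouped data, the median is given by
$$\text{Median} = L_m + \left[\frac{\frac{n}{2} - \sum f_1}{f_m}\right] C$$
where $L_m$ is the lower boundary of the median class, $\sum f_1$ is the cumulative frequency of the
classes before the median class, $f_m$ is the frequency of the median class and $C$ is the class width.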
Example 2.3
Class     Frequency (f)   Cum-f
1-10      1               1
11-20     5               6
21-30     10              16
31-40     19              35
41-50     42              77
51-60     10              87
61-70     6               93
71-80     4               97
81-90     2               99
91-100    1               100
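The median position is the $\left(\frac{n+1}{2}\right)^{\text{th}} = 50.5^{\text{th}}$ observation, which falls in the class 41-50.
$L_m = 40.5$, $\sum f_1 = 35$, $f_m = 42$, $C = 10$
$$\text{Median} = L_m + \left[\frac{\frac{n}{2} - \sum f_1}{f_m}\right] C$$
$$\text{Median} = 40.5 + \left[\frac{\frac{100}{2} - 35}{42}\right] \times 10 = 44.07$$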
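The grouped median calculation can be sketched in Python. The sketch assumes whole-number class limits, so each lower class boundary is taken as the lower limit minus 0.5, and it assumes all classes have the same width. The function name `grouped_median` and its arguments are illustrative.

```python
def grouped_median(classes, freqs):
    """Median of grouped data: L_m + ((n/2 - sum_f1) / f_m) * C."""
    n = sum(freqs)
    width = classes[1][0] - classes[0][0]      # C, assumed equal for all classes
    cumulative = 0                             # running total of frequencies
    for (lower, _upper), f in zip(classes, freqs):
        if cumulative + f >= n / 2:
            # This is the median class; cumulative is sum_f1.
            boundary = lower - 0.5             # L_m for whole-number limits
            return boundary + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency table")


# Classes and frequencies from Example 2.3.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50),
           (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
freqs = [1, 5, 10, 19, 42, 10, 6, 4, 2, 1]

median_value = grouped_median(classes, freqs)  # 40.5 + (50 - 35) / 42 * 10
print(round(median_value, 2))
```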
The Mode
The mode is simply the item with the highest frequency. A distribution can have more
than one mode: unimodal (one mode), bimodal (two modes), tri-modal (three modes) and
multimodal (more than three modes).
Example 3.1
Mode from raw data: the mode can be obtained from raw data by simply picking the item that
occurs most frequently.
Given 2, 8, 3, 4, 2, 6, 2, 4.
Mode = 2, since it occurs most frequently.
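Picking the most frequent item can be sketched in Python with a counter. The function name `modes` is illustrative; it returns a list so that bimodal or multimodal data are also handled. The second data set below is hypothetical and is included only to show a bimodal case.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


raw_modes = modes([2, 8, 3, 4, 2, 6, 2, 4])   # data from Example 3.1
bimodal = modes([1, 1, 2, 2, 3])              # hypothetical bimodal data
print(raw_modes, bimodal)
```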
X   F
1   4
2   6
3   5
4   5
Here the mode is 2, since it has the highest frequency (6).
The times taken (in seconds) by 100 different chemical substances to melt when subjected to a
particular temporary condition are given below:
Time (in seconds)   F
4.51 - 5.32         15
5.33 - 6.14         7
6.15 - 6.96         35
6.97 - 7.78         28
7.79 - 8.60         10
8.61 - 9.42         5
Total               100
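$$\text{Mode} = L_{mode} + \left[\frac{f_1}{f_1 + f_2}\right] C$$
$$\text{Mode} = 6.145 + \left[\frac{28}{28 + 7}\right] \times 0.82 = 6.801$$
Here the modal class is 6.15 - 6.96, $L_{mode} = 6.145$ is its lower boundary, $f_1 = 28$ and $f_2 = 7$ are
the frequencies of the classes immediately after and before the modal class, and $C = 0.82$ is the class width.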
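The same formula can be sketched in Python using the melting-time table. The sketch assumes the data are recorded to two decimal places (so the lower class boundary is the lower limit minus 0.005) and that the modal class is neither the first nor the last class. Variable names are illustrative.

```python
# Lower class limits and frequencies from the melting-time table.
lower_limits = [4.51, 5.33, 6.15, 6.97, 7.79, 8.61]
freqs = [15, 7, 35, 28, 10, 5]

# Modal class: the class with the highest frequency.
i = freqs.index(max(freqs))

width = lower_limits[1] - lower_limits[0]   # C = 0.82
boundary = lower_limits[i] - 0.005          # L_mode for 2-decimal data
f1 = freqs[i + 1]                           # frequency of the class after the modal class
f2 = freqs[i - 1]                           # frequency of the class before the modal class

mode = boundary + f1 / (f1 + f2) * width
print(round(mode, 3))
```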
Measures of Variability
In addition to locating the center of the observed values of the variable in the data, another
important aspect of a descriptive study of the variable is numerically measuring the extent of
variation around the center. Two data sets of the same variable may exhibit similar positions of
center but may be remarkably different with respect to variability.
Just as there are several different measures of center, there are also several different measures of
variation. In this section, we will examine two of the most frequently used measures of
variation: the range and the standard deviation. Measures of variation are mostly used only for
quantitative variables.
Range
It is simply the difference between the largest and the smallest values in a distribution.
Range = Max - Min.
Variance and Standard Deviation
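$$\text{Sample Variance } (S^2) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
For frequency data
$$\text{Sample Variance } (S^2) = \frac{\sum_{i=1}^{n} f (x_i - \bar{x})^2}{n - 1}$$
Standard Deviation
$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
For frequency data
$$S = \sqrt{\frac{\sum_{i=1}^{n} f (x_i - \bar{x})^2}{n - 1}}$$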
Note that the data in the examples above can be used to find the sample variance and standard deviation.
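As an illustration, the range, sample variance and standard deviation can be sketched in Python using the data of Examples 1.1 and 1.2. The sketch assumes the divisor n - 1 for both raw and frequency data, with n taken as the total frequency; the function name `sample_variance` is illustrative.

```python
import math


def sample_variance(values, freqs=None):
    """Sample variance with divisor n - 1; freqs is optional."""
    if freqs is None:
        freqs = [1] * len(values)          # raw data: every value counted once
    n = sum(freqs)                         # total number of observations
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    squared = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return squared / (n - 1)


# Raw data from Example 1.1.
raw = [3, 8, 4, 6, 7]
data_range = max(raw) - min(raw)           # Range = Max - Min
var_raw = sample_variance(raw)
sd_raw = math.sqrt(var_raw)

# Frequency data from Example 1.2.
marks = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [4, 6, 4, 3, 15, 10, 5, 3]
var_freq = sample_variance(marks, freqs)
sd_freq = math.sqrt(var_freq)

print(data_range, var_raw, sd_raw, var_freq, sd_freq)
```

The standard deviation is the square root of the variance, so the same divisor must be used in both formulas.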
EXERCISE
Question 1
Consider the aflatoxin data given below:
30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37
Find the arithmetic mean, the mode and the median.
Question 2
Consider the following 10 observations of systolic blood pressure in mmHg:
118, 120, 122, 160, 130, 150, 122, 119, 120, 122
Find the mean, median, mode, range and standard deviation.
Question 3
Given the blood group data below:
O, O, A, B, A, A, B, AB, O, A, AB, O, O, A, O, B, A, B, B, A
Use the observations above to create a frequency table.
Question 4
The data below represent scores obtained by first students of midwifery, Kano, in a statistics
course:
17 47 52 92 8
28 23 53 90 9
17 63 17 23 17
10 66 19 47 20
8 66 20 17 25
90 82 10 45 40
i. What are the real limits of the class intervals and the class width?
ii. Construct a frequency table.
iii. Calculate the mean and median of the distribution.
iv. Determine the mode.
v. Compute the variance and standard deviation.