Sta 111 1ST Lecture Note
Course Outline:
Introductory statistical methods for biological data
o Types of Data (quantitative and qualitative)
o Scales of Measurement
o Population vs Sample
o Accuracy vs Precision
Describing Data
o Frequency Tables (grouped and ungrouped)
o Graphical Summary (bar chart, pie chart, histogram, box plot, scatter plot, stem and
leaf plot etc)
o Numerical Summary (measures of location and measures of dispersion)
Introduction to Probability
Intermediate statistical methods
o Comparing Groups (analysis of variance)
o Quantal Bioassay
Analyzing Associations (linear and logistic regression)
Methods for Categorical Data
o Contingency Tables and Odds Ratio
o Sensitivity and Specificity
1.0 INTRODUCTION TO STATISTICAL METHODS FOR BIOLOGICAL
DATA
In today's world of high-throughput experiments, we deal with laboratory equipment
constantly churning out mountains of data. But without an understanding of statistics and
knowledge of the techniques required to analyze, summarize and interpret these data, we
are very limited in what we can learn from our observations, which will in turn inhibit
our ability to move forward in our research. Even with experiments that generate very
little data, there is a need to simulate phenomena by modeling the behavior of systems
and their parameters, which again often needs to be done statistically. It is therefore
imperative that we understand some basic concepts of statistics in our field. A
knowledge of statistics helps in the following ways:
i. It enables you to read and understand the various statistical studies performed in
your field. To have this understanding, you must be knowledgeable about the
vocabulary, symbols, concepts, and statistical procedures used in these studies.
ii. It allows you to conduct research in your field, since statistical procedures are
basic to research. To accomplish this, you must be able to design experiments;
collect, organize, analyze, and summarize data; and possibly make reliable
predictions or forecasts for future use. You must also be able to communicate
the results of the study in your own words.
iii. It helps you become a better consumer and citizen. For example, you can make
intelligent decisions about what products to purchase based on consumer studies,
about government spending based on utilization studies, and so on.
Statistical data are the basic raw materials for statistical investigation. In statistics,
information is essentially referred to as data. In everything we do, we seek information
to guide our activities. In fact, the activities we embark upon today provide information
that guides us in executing similar activities in the future. Information may be gathered
formally or informally. Formal gathering of information involves documented
information, in which everything that has been observed in the past or is being observed
currently is expected to be kept in its original (or raw) form. Informal gathering of
information involves information about past experiences that were not immediately
captured. It may not always provide the level of detail that complete retrieval under the
formal method does.
1.1 DATA
Data are the values (measurements or observations) that a variable such as age, weight,
height, exam score, or shoe size can assume. Biological data, in particular, are
data or measurements collected from biological sources, which are often stored or
exchanged in digital form, commonly in files or databases.
Examples of biological data include:
Sequences: DNA, RNA, Protein
Structures of biological Molecules
Gene expression profiles
Biochemical pathways
Chromosomal mapping
Phylogenetic data
Single Nucleotide Polymorphisms (SNPs)
Etc.
The challenge thus lies in using statistical methods to analyze such biological data and
to draw meaningful inferences for immediate and future use.
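As a small illustration of summarizing biological data, the sketch below counts how often each base occurs in a DNA sequence and converts the counts to proportions. The sequence and the function name `base_frequencies` are made up for illustration; real sequences would normally come from a file or database.

```python
from collections import Counter

def base_frequencies(sequence):
    """Return {base: (count, proportion)} for a DNA sequence string."""
    seq = sequence.upper()
    counts = Counter(seq)   # tally each distinct character
    total = len(seq)
    return {base: (counts[base], counts[base] / total) for base in sorted(counts)}

# Hypothetical sequence, for illustration only.
dna = "ATGCGATACGCTTGA"
table = base_frequencies(dna)
for base, (count, prop) in table.items():
    print(f"{base}: count={count}, proportion={prop:.2f}")
```

The result is a simple ungrouped frequency table for a qualitative variable (the base at each position).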
DATA COLLECTION
There are two main sources of data in statistics, namely:
- Primary source
- Secondary source
Primary source of data
Data from a primary source are datasets obtained directly from the object concerned.
Primary sources provide data compiled from a count of the whole population, or from
a sample of the population where the population is too large for individual
count. Primary data can be collected by:
i. Direct personal observations (e.g. Laboratory experiments)
ii. Personal interview
iii. Mailed questionnaire
iv. Questionnaires administered by enumerators
v. Direct interview by people
Advantages of Primary source of data
i. It supplies exact information
ii. It gives more reliable data than the secondary source
iii. It gives more detailed data than the secondary source
Disadvantages of Primary source of data
i. It is very expensive
ii. It takes time
iii. It may involve large non-responses.
Secondary source of data
Secondary sources provide data that are readily available or have been used previously,
drawn from administrative sources such as journals, newspapers, databases, and official
compilations.
Advantages of secondary source of data
i. It gives quicker information than the primary source
ii. It is more timely than the primary source
iii. It is not as expensive as the primary source
Disadvantages of secondary source of data
i. It gives less information than the primary source
ii. It may be wider or narrower than the objectives of the research
iii. It may not give information as detailed as the primary source
Quantitative Data:
Quantitative data are observations that are measured on a numerical scale. The
most common type of data is quantitative data, since many descriptive variables in
nature are measured on numerical scales. Examples of quantitative data are
the number of leaves per plant, the yield of cowpea, the heights (or weights) of students
in a class, and the number of lecturers in the Faculty of Science, University of Ilorin,
Nigeria. The measurements in these examples are all numerical. Quantitative
variables can be divided into two types: continuous and discrete.
Qualitative Data:
All data that are not quantitative are qualitative. Qualitative data are data whose
values cannot be put in any numerical order. That is, they are observations that are
categorical rather than numerical and are not capable of being measured.
Examples are the political affiliations of a group of people and the gender of a
person. Qualitative variables are also classified as discrete.
- Continuous Variable: These are variables that can assume an infinite number of
values between any two specific values. They are obtained by measuring. They
often include fractions and decimals. Examples include: weight of seeds, age of
plants, amount of water, etc.
- Discrete Variable: Discrete variables are variables whose values can be
counted, or whose values change by steps or jumps. Examples include the number of
days it rained in Malete in the month of September 2016, the number of plants per
plot, etc. Decimals and fractions are not allowed for this type of variable.
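The distinction between qualitative, discrete, and continuous variables can be sketched in code. The function `classify_variable` below is a hypothetical rule of thumb that looks only at how the values are stored (text, whole numbers, or decimals); in practice the type of a variable depends on how it was measured, not on its storage format. The sample observations are invented for illustration.

```python
def classify_variable(values):
    """Rule-of-thumb classification based only on the Python types of the values."""
    if not values:
        return "mixed or unknown"
    if all(isinstance(v, str) for v in values):
        return "qualitative"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative, discrete"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative, continuous"
    return "mixed or unknown"

# Hypothetical observations, for illustration only.
data = {
    "gender": ["male", "female", "female"],
    "plants_per_plot": [12, 15, 9],
    "seed_weight_g": [0.42, 0.39, 0.51],
}
types = {name: classify_variable(vals) for name, vals in data.items()}
for name, kind in types.items():
    print(f"{name}: {kind}")
```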
ORDINAL: The ordinal level of measurement classifies data into categories that
can be ranked; however, precise differences between the ranks do not exist. For
instance, when people are classified according to their size of shoes (small,
medium, or large), a large variation exists among the individuals in each class.
Other examples include: class of degrees, position in a competition, HIV test result
etc.
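A minimal sketch of how ordinal data behave, using the shoe-size categories from the text: the categories can be sorted and a middle (median) category found, but the rank numbers carry order only, so arithmetic such as a mean is not meaningful. The name `SIZE_RANK` and the observations are assumptions made for illustration.

```python
# Rank order of the categories; the numbers encode order only, not distance.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}

# Hypothetical observations, for illustration only.
shoe_sizes = ["medium", "large", "small", "medium", "small"]

# Sorting and taking the maximum rely only on the ranking.
ordered = sorted(shoe_sizes, key=SIZE_RANK.get)
largest = max(shoe_sizes, key=SIZE_RANK.get)

# The median category is meaningful for ordinal data; a mean is not.
median_size = ordered[len(ordered) // 2]

print(ordered)
print(largest, median_size)
```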
Exercise