Fundamentals Biostatistics: Lecture 1: Introduction
Lecture 1: Introduction
Definitions, Data, Variables, Population, Sample and Sampling Strategies
Definitions and Scope of Biostatistics, Data, and Variables
Statistics
▪ Statistics is the science whereby inferences are made about specific random
phenomena on the basis of relatively limited sample data.
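▪ The field of statistics has two main areas: mathematical statistics and applied statistics:
o Mathematical statistics concerns the development of new methods of statistical inference
and requires detailed knowledge of abstract mathematics for its implementation.
o Applied statistics involves applying the methods of mathematical statistics to specific
subject areas, such as economics, psychology, and public health.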
Biostatistics
Data
Definition
Data are observations that are collected, measured, or recorded during a
research study, experiment, survey, or any other data collection process. Data
can take various forms and may be categorized based on their nature and
characteristics.
Types of Data:
o Quantitative Data: Numerical data that represent quantities and can be
measured. Examples include height, weight, and temperature.
o Qualitative Data: Categorical or non-numerical data that describes qualities
or characteristics. Examples include blood type, gender, or opinion.
Variable
Definition
Examples:
Heart rate, height, weight, age, gender, education level, blood type,
disease stage, smoking status, etc.
Types of Variables
Quantitative
For example: Height, weight, time, ...
Qualitative
For example: Smoking status, disease status (present/absent), blood type,
disease stage (stage I, II, III, ...), severity of disease (mild,
moderate, severe)
Types of Quantitative Variables
Discrete Continuous
Types of qualitative variables
Nominal Ordinal
Measurement scales (1 of 3)
1. Nominal Scale
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale
Measurement scales (2 of 3)
1. Nominal Scale:
◦ Description: The simplest level of measurement that categorizes data into distinct
categories or groups without any inherent order or ranking.
◦ Examples: Categories like gender, color, or marital status.
◦ Properties: Categories are labels only; any numeric codes assigned to them carry no
quantitative meaning, and there is no meaningful order among categories.
2. Ordinal Scale:
◦ Description: Orders and ranks data into distinct categories, but the intervals
between categories are not consistent or meaningful.
◦ Examples: Educational levels (e.g., high school, bachelor's, master's), survey
responses (e.g., strongly agree, agree, neutral, disagree, strongly disagree).
◦ Properties: Relative order is meaningful, but the differences between ranks are not
consistent.
Note: Both nominal and ordinal scales are used to describe the measurement level of
qualitative variables.
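Illustration: a minimal Python sketch of the two qualitative scales, using made-up observations. The variable names (`blood_types`, `severity`, `SEVERITY_ORDER`) and the data are assumptions for illustration only. It shows that a nominal variable supports counting and the mode, while an ordinal variable also supports sorting and the median because its categories have an order.

```python
from collections import Counter
from statistics import median_low

# Nominal variable (hypothetical data): categories have no order,
# so counting and the mode are the meaningful summaries.
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]
blood_type_counts = Counter(blood_types)
most_common_type = blood_type_counts.most_common(1)[0][0]

# Ordinal variable (hypothetical data): categories have an agreed order,
# stored here as ranks 0 < 1 < 2. The ranks mark position only.
SEVERITY_ORDER = ["mild", "moderate", "severe"]
rank = {level: position for position, level in enumerate(SEVERITY_ORDER)}
severity = ["severe", "mild", "moderate", "mild", "severe", "moderate", "mild"]

# Sorting and the median rely only on order, so both are valid here.
sorted_severity = sorted(severity, key=lambda level: rank[level])
median_severity = SEVERITY_ORDER[median_low([rank[level] for level in severity])]

print(most_common_type)   # most frequent blood type
print(sorted_severity)    # observations ordered from mild to severe
print(median_severity)    # middle category
```

Averaging the ranks is deliberately avoided: the gap between "mild" and "moderate" need not equal the gap between "moderate" and "severe".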
Measurement scales (3 of 3)
3. Interval Scale:
◦ Description: Orders and ranks data with consistent intervals between consecutive
points, but there is no true zero point.
◦ Examples: Temperature measured in Celsius or Fahrenheit, IQ scores.
◦ Properties: Has a meaningful order, consistent intervals, but the absence of a true
zero point means that ratios are not meaningful.
4. Ratio Scale:
◦ Description: Orders and ranks data with consistent intervals between consecutive
points and has a true zero point, allowing for meaningful ratios.
◦ Examples: Height, weight, income, age.
◦ Properties: Has a meaningful order, consistent intervals, and a true zero point,
making ratios meaningful.
Note: Both interval and ratio scales are used to describe the measurement level of
quantitative variables.
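Illustration: a short Python sketch of why ratios are meaningful on a ratio scale but not on an interval scale. The temperatures, weights, and names (`celsius_to_kelvin`, `KG_TO_LB`) are assumed values chosen for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15

morning_c, afternoon_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference_c = afternoon_c - morning_c

# The ratio computed in Celsius depends on the arbitrary zero point ...
ratio_celsius = afternoon_c / morning_c
# ... so it disagrees with the ratio computed on the Kelvin scale.
ratio_kelvin = celsius_to_kelvin(afternoon_c) / celsius_to_kelvin(morning_c)

# Weight is on a ratio scale: the ratio survives a change of units.
KG_TO_LB = 2.20462  # approximate conversion factor
weight_a_kg, weight_b_kg = 80.0, 40.0
ratio_kg = weight_a_kg / weight_b_kg
ratio_lb = (weight_a_kg * KG_TO_LB) / (weight_b_kg * KG_TO_LB)

print(difference_c, ratio_celsius, round(ratio_kelvin, 3))
print(ratio_kg, round(ratio_lb, 6))
```

Saying that 20 °C is "twice as warm" as 10 °C is therefore not justified, whereas saying that 80 kg is twice 40 kg holds in any unit.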
Population, Sample and Sampling Strategies
Population and Sample
Sampling vs census (1 of 3)
Definition:
◦ Sampling: Involves selecting a subset of individuals from a larger population
to represent and make inferences about that population.
◦ Census: Encompasses the complete enumeration or collection of data from
every individual in a population.
Inclusion:
◦ Sampling: Only a portion (sample) of the population is included in the study.
◦ Census: Every individual in the entire population is included.
Scope:
◦ Sampling: Provides information about the population through the study of a
smaller, carefully chosen subset.
◦ Census: Aims to collect data from the entire population, leaving no one out.
Sampling vs census (2 of 3)
Time and Resource Requirements:
◦ Sampling: Typically requires less time and fewer resources than a census,
especially for large populations.
◦ Census: Can be resource-intensive and time-consuming, especially for large
populations.
Feasibility:
◦ Sampling: More feasible for large populations where a complete
enumeration is impractical.
◦ Census: More feasible for small populations or when resources allow for
complete data collection.
Sampling vs census (3 of 3)
Variability and Error:
◦ Sampling: Introduces variability due to the inherent randomness of the
sampling process; may involve sampling error.
◦ Census: Has no sampling error because every individual is included; non-sampling
errors may still arise from non-response or data collection issues.
Statistical Inference:
◦ Sampling: Statistical methods are used to infer characteristics of the
population from the sample.
◦ Census: Directly describes the entire population without the need for
statistical inference.
Representativeness:
◦ Sampling: Requires careful design to ensure that the sample is
representative of the population.
◦ Census: Presumed to be representative as it includes every individual.
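Illustration: a minimal Python simulation of sampling error, assuming a made-up population of 10,000 simulated blood pressure values. The population parameters, the sample size of 100, and the names (`population`, `sample_mean`) are assumptions for illustration. The census mean is a single fixed number, while each random sample gives a slightly different estimate of it.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: simulated systolic blood pressure (mmHg).
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Census: every individual is measured, so there is no sampling error.
census_mean = mean(population)

def sample_mean(n):
    """Mean of one random sample of size n, drawn without replacement."""
    return mean(rng.sample(population, n))

# Sampling: repeated samples of 100 give different estimates.
sample_means = [sample_mean(100) for _ in range(5)]
sampling_errors = [m - census_mean for m in sample_means]

print(round(census_mean, 2))
print([round(m, 2) for m in sample_means])
print([round(e, 2) for e in sampling_errors])
```

The spread of `sampling_errors` shrinks as the sample size grows, which is why the sample size is a key design choice.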
Random sample and representativeness
Probabilistic/Random Sampling Strategies
Simple Random Sampling (SRS):
Stratified Random Sampling:
◦ Description: Divides the population into subgroups or strata based
on certain characteristics, and then random samples are taken from
each stratum.
◦ Procedure: Draw a separate random sample within each stratum, which
ensures representation from different subgroups and reduces
sampling bias.
◦ Advantages: Improved precision and representation compared to
simple random sampling.
◦ Limitations: Requires accurate information about the population's
characteristics for effective stratification.
◦ Example: Divide a population of students into strata based on grade
levels (e.g., freshman, sophomore, junior, senior) and then
randomly select a proportionate number of students from each
stratum.
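Illustration: the student example above can be sketched in Python as follows. This assumes proportional allocation and made-up stratum sizes (400, 300, 200, 100). The function name `stratified_sample` and its parameters are hypothetical, not part of any library.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, total_n, seed=None):
    """Proportionate stratified random sample (a sketch).

    units      : list of sampling units
    stratum_of : function mapping a unit to its stratum label
    total_n    : desired overall sample size
    Rounding can make the final size differ slightly from total_n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Allocate in proportion to the stratum's share of the population.
        n_h = round(total_n * len(members) / len(units))
        # Simple random sample, without replacement, inside the stratum.
        sample.extend(rng.sample(members, min(n_h, len(members))))
    return sample

# Hypothetical population of 1,000 students, stored as (id, grade level).
grade_sizes = {"freshman": 400, "sophomore": 300, "junior": 200, "senior": 100}
students = [(f"{grade}-{i}", grade)
            for grade, size in grade_sizes.items() for i in range(size)]

chosen = stratified_sample(students, stratum_of=lambda s: s[1], total_n=50, seed=1)
counts = Counter(grade for _, grade in chosen)
print(counts)
```

With these sizes each stratum contributes 5% of its members, so the sample mirrors the grade-level composition of the population.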
Systematic Sampling:
◦ Description: Selects every nth individual from a list after a random start.
◦ Example: Select every 10th patient from a list of patients visiting a clinic
after randomly choosing a starting point between 1 and 10.
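Illustration: a minimal Python sketch of the clinic example, assuming a made-up list of 100 patients. The function name `systematic_sample` and the patient identifiers are hypothetical.

```python
import random

def systematic_sample(units, k, seed=None):
    """Select every k-th unit after a random start between 1 and k."""
    rng = random.Random(seed)
    start = rng.randint(1, k)      # 1-based starting position, inclusive
    return units[start - 1::k]     # convert to 0-based index, then step by k

# Hypothetical clinic list of 100 patients in order of arrival.
patients = [f"patient_{i}" for i in range(1, 101)]

selected = systematic_sample(patients, k=10, seed=7)
positions = [int(name.split("_")[1]) for name in selected]
print(positions)
```

Only the starting point is random; once it is chosen, the rest of the sample is fixed, so a periodic pattern in the list with the same period as the interval could bias the result.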
Cluster Sampling
Multi-Stage Sampling: