Module 5 - Inferential Statistics and Their Application
Outline
1. Introduction to Inferential Statistics - Paula
2. Fundamentals of Inferential Statistics - Sushma Yonzon
3. Sampling and sampling methods - Sushma (for Ronald)
4. Estimation and confidence Intervals - Faith
5. Statistical test (t-test/score) - Sushma
1
1. Introduction to Inferential Statistics
2
Introduction to Inferential Statistics
4
Applications in various fields
Applications in various fields, particularly biostatistics:
● Clinical trials
● Epidemiology
● Public health studies
● Survival analysis
● Sociology
● Market Research
5
2. Fundamentals of Inferential Statistics
Basic Concepts and Probability Theory
6
Basic Concepts:
SAMPLE & POPULATION
7
Basic Concepts:
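PARAMETERS & STATISTICS
Parameter: a numerical characteristic of a population (e.g., the population mean μ).
Statistic: a numerical characteristic of a sample (e.g., the sample mean x̄).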
8
Probability Theory: Fundamental principles of probability
Probability: a measure of how likely it is for something to happen.
Consider a very large number of identical trials of a certain process, for example flipping a
coin, rolling a die, or picking a ball from a box (with replacement). If the probability of a
particular event occurring (for example, getting a Heads, rolling a 5, or picking a blue
ball) is p, then the event will occur in a fraction p of the trials, on average.
Example:
The probability of getting a Heads on a coin flip is 1/2 (or equivalently 50%). This is true
because the probabilities of getting a Heads or a Tails are equal, which means that these
two outcomes must each occur half of the time, on average. (Fig: Tossing a Coin)
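The "fraction p of the trials, on average" idea can be checked with a small simulation. This is only an illustrative sketch: the function name `heads_fraction`, the fixed seed, and the trial counts are assumptions chosen for the demonstration.

```python
import random

def heads_fraction(num_flips, seed=0):
    """Simulate fair coin flips and return the fraction that land Heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# The observed fraction drifts toward 1/2 as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n))
```

With only a few flips the fraction can be far from 0.5; with many flips it settles close to it.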
9
Ref: chap2p.pdf (harvard.edu)
Basic rules of probability: Addition
Addition Rule: the probability of the occurrence of at least one of two events.
OR: The “Union” probability, P(A or B)
Let A = {rolling a 2 on the left die} and B = {rolling a 5 on the right die}.
What is the probability that at least one of A and B occurs?
2 cases:
(1) A and B are non-exclusive events, or (2) A and B are exclusive events.
10
Addition Rule Example
Scenario: Rolling a single die. A = {rolling a 2}, B = {rolling an even number}.
Calculation:
● P(A) = 1/6
● P(B) = 3/6 (2, 4, 6)
● P(A and B) = 1/6 (since 2 is the only outcome that is both a 2 and an even number)
● P(A or B) = P(A) + P(B) - P(A and B)
= 1/6 + 3/6 - 1/6
= 3/6 = 1/2
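The calculation can be cross-checked by listing the outcomes directly. The sketch below assumes, as the probabilities 1/6 and 3/6 suggest, that A is "rolling a 2" and B is "rolling an even number" on one fair die; the helper `prob` is a name introduced here for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                 # faces of one fair die
A = {2}                                     # assumed event: rolling a 2
B = {x for x in outcomes if x % 2 == 0}     # assumed event: rolling an even number

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# Addition rule versus counting the union directly.
p_union_rule = prob(A) + prob(B) - prob(A & B)
p_union_direct = prob(A | B)
print(p_union_rule, p_union_direct)
```

Both routes give the same value, which shows why P(A and B) is subtracted: the outcome 2 would otherwise be counted twice.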
11
Basic rules of probability: Multiplication
Multiplication Rule: “The probability of the occurrence of both of two independent events”.
AND: The “intersection” probability, P(A and B)
Let A = {rolling a 2 on the left die} and B = {rolling a 5 on the right die}.
What is the probability that A and B both occur?
2 cases:
(1) A and B are dependent events, or (2) A and B are independent events.
1. “And” rule for dependent events:
If events A and B are dependent, then the probability that they both occur equals:
P(A and B) = P(A) * P(B|A)
2. “And” rule for independent events:
If events A and B are independent, then P(B|A) = P(B), so:
P(A and B) = P(A) * P(B)
12
Ref: chap2p.pdf (harvard.edu)
Multiplication Rule Example (Independent Events)
Scenario: Flipping a coin and rolling a die.
Calculation:
● P(A) = 1/2
● P(B) = 1/6
● P(A and B) = P(A) * P(B)
= 1/2 * 1/6
= 1/12
13
Multiplication Rule Example (Dependent Events)
Scenario: Drawing two cards from a deck without replacement. A = {the first card is an ace}, B = {the second card is a king}.
Calculation:
● P(A) = 4/52
● P(B|A) = 4/51 (after drawing an ace, 51 cards remain, 4 of which are kings)
● P(A and B) = P(A) * P(B|A)
= 4/52 * 4/51
= 16/2652
= 4/663
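As a cross-check of the dependent-events result, every ordered pair of distinct cards can be enumerated and counted. This is a sketch under the assumption that A is "first card is an ace" and B is "second card is a king"; the deck encoding (rank and suit labels) is made up for the illustration.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "K", "Q", "J"] + [str(n) for n in range(2, 11)]   # 13 ranks
deck = [(rank, suit) for rank in ranks for suit in "SHDC"]      # 52 cards

# All ordered two-card draws without replacement: 52 * 51 = 2652.
draws = list(permutations(deck, 2))
favourable = [d for d in draws if d[0][0] == "A" and d[1][0] == "K"]

p_enumerated = Fraction(len(favourable), len(draws))
p_rule = Fraction(4, 52) * Fraction(4, 51)   # P(A) * P(B|A)
print(len(draws), len(favourable), p_enumerated, p_rule)
```

The count of favourable draws is 16 out of 2652, matching the multiplication-rule calculation.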
14
Conditional probability: Bayes’ Theorem
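Conditional Probability: the probability that B occurs, given that A occurs, written P(B|A), or vice versa
3 forms of Bayes’ Theorem:
1. Simple form: P(A|B) = P(B|A) * P(A) / P(B)
2. Explicit form: P(A|B) = P(B|A) * P(A) / [P(B|A) * P(A) + P(B|~A) * P(~A)]
where ~A = not A
3. General form: P(A_k|B) = P(B|A_k) * P(A_k) / [sum over i of P(B|A_i) * P(A_i)]
where the events A_i are mutually exclusive and together cover all possibilities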
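A numeric illustration may help with the explicit form, P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|~A) P(~A)]. The screening-test numbers below (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical values invented for this sketch, not data from the slides.

```python
from fractions import Fraction

# Hypothetical inputs: A = "has the disease", B = "test is positive".
p_a = Fraction(1, 100)               # P(A): assumed prevalence
p_b_given_a = Fraction(95, 100)      # P(B|A): assumed sensitivity
p_b_given_not_a = Fraction(5, 100)   # P(B|~A): assumed false-positive rate

# Explicit form of Bayes' Theorem.
numerator = p_b_given_a * p_a
denominator = numerator + p_b_given_not_a * (1 - p_a)
p_a_given_b = numerator / denominator

print(p_a_given_b, float(p_a_given_b))
```

Under these assumed numbers, a positive result still leaves the probability of disease well below one half, because the disease is rare.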
15
Types of Probability Distribution
Normal Distribution: Symmetrical, bell-shaped distribution
16
Importance of Inference in Biostatistics
● Making Generalizations from a Sample to a Population
● Estimating Population Parameters
● Testing Hypotheses
17
3. Sampling and Sampling Methods
18
Sampling Approach
NON-PROBABILITY SAMPLING
● Selecting subjects based on the discretion and judgement of the researcher (Non-Random Sampling); not everyone has an equal chance of being selected
● Ex: convenience sampling
● Population parameters are unknown or hard to obtain
● Commonly used in Qualitative Research
● Getting depth of data is more important than generalizability
PROBABILITY SAMPLING
● Selecting subjects on a statistically random basis (Random Sampling); every subject has an equal chance of being selected
● Predetermined process
● Ex: lottery
● Commonly used in Quantitative Research (numbers)
● Aims to produce generalizable findings
19
Probability-Based Random Sampling Methods
Simple random sampling: Every individual of the population has an equal chance of
being selected, e.g., picking names from a hat.
Systematic random sampling: A starting point is selected randomly, and then every
nth individual of the population is chosen, e.g., selecting every 5th person after the starting
point.
Cluster sampling: The population is divided into clusters, usually based on geographical
areas or naturally occurring groups, and whole clusters are then selected at random.
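The three methods can be sketched on a toy population. Everything here is assumed for illustration: the population of 100 numbered individuals, the sample sizes, the sampling interval of 10, and clusters formed from consecutive blocks of 20.

```python
import random

rng = random.Random(42)              # fixed seed for a repeatable example
population = list(range(1, 101))     # hypothetical individuals numbered 1..100

# Simple random sampling: every individual has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic random sampling: random start, then every k-th individual.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Cluster sampling: split into clusters, pick whole clusters at random.
clusters = [population[i:i + 20] for i in range(0, len(population), 20)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [person for cluster in chosen_clusters for person in cluster]

print(simple)
print(systematic)
print(len(cluster_sample))
```

Note that cluster sampling here takes everyone in the chosen clusters, whereas the other two methods pick individuals.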
20
Sampling Distribution
● Is a probability distribution of a statistic that is obtained through repeated
sampling of a specific population
● Shows how the statistic would vary in value when you repeatedly draw
samples from the same population and calculate the statistic for each
sample.
Importance:
● Shows how likely each possible value of a statistic is, so the probability of observing a given sample result can be assessed
● Since populations are typically large, it is usually practical to study only a randomly selected subset of the
population. The sampling distribution describes how much a statistic varies from one such sample to the next,
so this variability can be quantified (it cannot be eliminated).
● It makes the data easier to manage and builds a foundation for statistical
inference, that is, for drawing conclusions about the whole population.
21
Central Limit Theorem
● States that the random sampling distribution of means tends toward a normal distribution as the sample size increases, irrespective of
the shape of the population distribution from which the samples were drawn.
● The mean of the random sampling distribution of means (symbolized by μx̄, showing that it is the mean of the population of all the
sample means) is equal to the mean of the original population; in other words, μx̄ is equal to μ.
Key characteristics:
These characteristics largely revolve around samples, sample sizes, and the population of data.
● Sampling is successive. This means some sample units are common with sample units selected on previous occasions.
● Sampling is random. All samples must be selected at random so that each has the same chance of being selected.
● Samples should be independent. The selections or results from one sample should have no bearing on future samples or other sample
results.
● Large sample size. As the sample size increases, the sampling distribution approaches the normal distribution.
https://www.investopedia.com/terms/c/central_limit_theorem.asp
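A short simulation can illustrate the theorem. The sketch below draws repeated samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here only as an example) and looks at the distribution of the sample means; the sample size, number of samples, and seed are assumptions.

```python
import math
import random
import statistics

rng = random.Random(1)      # fixed seed for repeatability
sample_size = 50            # n, size of each sample
num_samples = 2000          # number of repeated samples

# Each entry is the mean of one random sample from the skewed population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu = 1
sd_of_means = statistics.stdev(sample_means)     # should be close to sigma / sqrt(n)
expected_se = 1.0 / math.sqrt(sample_size)

print(mean_of_means, sd_of_means, expected_se)
```

The mean of the sample means lands near the population mean, and their spread is close to σ/√n, even though the population itself is far from normal.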
22
Standard Error and Its Role
● It is a measure of the extent to which the sample means deviate from the true
population mean.
● It is inversely proportional to the square root of the sample size (SE = σ / √n); the larger the sample
size, the smaller the standard error. The smaller the standard error, the more closely the sample mean
is expected to represent the population mean.
23
4. Estimation and Confidence Intervals
24
Point Estimation
Point estimation involves using sample data to calculate a single
value (point estimate) that serves as the best guess for an unknown
population parameter.
Properties:
● Unbiasedness: An estimator is unbiased if its expected value is
equal to the true parameter value.
● Efficiency: Among unbiased estimators, the one with the smallest
variance is considered the most efficient.
● Consistency: An estimator is consistent if, as the sample size
increases, it converges to the true parameter value.
25
Common Point Estimators: Mean
Formula: x̄ = (x1 + x2 + ... + xn) / n, the sum of the sample values divided by the sample size n
26
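Common Point Estimators: Proportion
Formula: p̂ = x / n, where x is the number of individuals in the sample with the characteristic of interest and n is the sample size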
Example: Estimating the proportion of people who prefer a certain brand of cereal in a
sample of 200 individuals.
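Both point estimators are simple to compute. In the sketch below the heart-rate values and the count of 120 people preferring the brand are hypothetical numbers made up for illustration; only the sample size of 200 comes from the example above.

```python
import statistics

# Sample mean: sum of the observations divided by how many there are.
heart_rates = [72, 68, 75, 80, 66, 74, 71, 78]   # hypothetical sample values
sample_mean = statistics.fmean(heart_rates)

# Sample proportion: count with the characteristic divided by sample size.
n = 200                   # sample size from the cereal example
prefer_brand = 120        # hypothetical count who prefer the brand
p_hat = prefer_brand / n

print(sample_mean, p_hat)
```

Each result is a single number that serves as the best guess for the corresponding population parameter.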
27
Constructing Confidence Intervals
The sample mean (X̄) lies within ±1.96 standard errors of the population mean (μ) 95% (.95) of the
time; conversely, μ lies within ±1.96 standard errors of X̄ 95% of the time.
These limits of ±1.96 standard errors are called the confidence limits.
Confidence limits are equal to the sample mean plus or minus the z score obtained from the table
(for the appropriate level of confidence) multiplied by the standard error:
confidence limits = X̄ ± z × SE
Therefore, 95% confidence limits are approximately equal to the sample mean plus or minus two
standard errors.
Confidence Interval: the range between the lower and upper confidence limits
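The rule "sample mean plus or minus z times the standard error" can be sketched as a small function. The function name `z_confidence_limits` and the example inputs (mean 74, standard deviation 8, n = 64) are assumptions for illustration; the z score is taken from Python's `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

def z_confidence_limits(sample_mean, sd, n, level=0.95):
    """Return (lower, upper) confidence limits using a z score."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    se = sd / math.sqrt(n)                      # standard error of the mean
    return sample_mean - z * se, sample_mean + z * se

# Hypothetical example: mean 74, SD 8, n = 64, so SE = 1.
lower, upper = z_confidence_limits(74.0, 8.0, 64)
print(round(lower, 2), round(upper, 2))
```

With a standard error of 1, the 95% limits sit about 1.96 units either side of the sample mean.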
28
Constructing confidence intervals
Confidence Interval (CI): the range between the lower and upper confidence limits
29
Constructing confidence intervals
Components of the formula:
Standard error (SE): inversely related to the square root of the sample size, so that the larger n
becomes, the more closely the sample means will represent the true population mean.
z-score: calculated as the number of standard errors by which a sample
mean lies above or below the population mean.
30
Constructing confidence intervals
CONFIDENCE INTERVALS
Finding the confidence limits involves inferential statistics, because a sample statistic (X̄) is being used to estimate a population
parameter (μ).
Example:
If a researcher wishes to find the true mean resting heart rate of a large population, it would be impractical to take the pulse of
every person in the population. Instead, he or she would draw a random sample from the population and take the pulse of the
persons in the sample.
As long as the sample is truly random, the researcher can be 95% confident that the true population mean lies within ±1.96
standard errors of the sample mean.
31
Interpretation of Confidence Interval
32
Factors Affecting the Width of Confidence Intervals
Key Factors:
● Sample Size: Larger sample sizes result in narrower
confidence intervals.
● Confidence Level: Higher confidence levels (e.g., 99%)
result in wider confidence intervals.
● Population Variability: Greater variability in the population
leads to wider confidence intervals.
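These three effects can be seen by computing the interval width, 2 × z × SD / √n, for different inputs. The function `ci_width` and all numeric inputs are assumptions chosen for the illustration.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, level):
    """Width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sd / math.sqrt(n)

baseline = ci_width(sd=10.0, n=100, level=0.95)
larger_sample = ci_width(sd=10.0, n=400, level=0.95)     # narrower
higher_level = ci_width(sd=10.0, n=100, level=0.99)      # wider
more_variable = ci_width(sd=20.0, n=100, level=0.95)     # wider

print(baseline, larger_sample, higher_level, more_variable)
```

Quadrupling the sample size halves the width, while doubling the variability doubles it.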
33
5. Statistical test (t-test)
34
t-score (t-value)
The t-score (or t-value) is a type of standard score used
in hypothesis testing and in the calculation of
confidence intervals.
35
t-score (t-value)
Components of formula: sample mean, population mean, standard deviation, sample size
(see confidence interval slide 30)
- Sample size (n) is not stated directly in t-score tables; instead, the tables express
sample size in terms of degrees of freedom (df):
df = n - 1
36
t-table
37
Ref: High Yield Biostatistics, Epidemiology…
Calculation of t-score
Example:
random sample of people (n) = 15
mean heart rate (X̄) = 74 beats/min
standard deviation of this sample (SD) = 8.2
The estimated standard error can be calculated as follows:
SE = SD / √n = 8.2 / √15 ≈ 2.1
38
Ref: High Yield Biostatistics, Epidemiology…
Calculation of t-score
Example (continued):
For a sample consisting of 15 people, the t tables will give the appropriate value of t (corresponding
to the middle 95% of the distribution) for df = 14 (i.e., n - 1).
The sample mean therefore allows us to estimate that the true mean resting heart rate of this population
is 74 beats/min, and we can be 95% confident that it lies between 69.5 and 78.5.
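The interval quoted above can be reproduced step by step. The sketch uses the values from the example (n = 15, mean 74, SD 8.2); the critical value 2.145 is the standard two-tailed 95% entry for df = 14 in a t table and is typed in here by hand as an assumption, since the slide's own table is not reproduced in the text.

```python
import math

n = 15
sample_mean = 74.0     # beats/min
sd = 8.2               # sample standard deviation
df = n - 1             # degrees of freedom = 14
t_critical = 2.145     # assumed: two-tailed 95% value for df = 14 from a t table

se = sd / math.sqrt(n)          # estimated standard error
margin = t_critical * se        # half-width of the interval
lower, upper = sample_mean - margin, sample_mean + margin

print(round(se, 2), round(lower, 1), round(upper, 1))
```

The limits round to 69.5 and 78.5 beats/min, in agreement with the interval stated in the example.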
39
Ref: High Yield Biostatistics, Epidemiology…
t-score (t-value)
Importance of t-score in Inferential Statistics
- Essential for testing hypotheses about population means.
- Useful when the sample size is small (< 30).
- Helps account for variability in sample data.
40
5. Statistical test (t-test)
Situations Where t-Score is Used
41
Questions and Discussion on Probability
42
Questions and Discussion
● Question 1: Can you explain the difference between mutually exclusive and not mutually exclusive events with an example?
● Follow-up: Why do we subtract P(A and B) in the addition rule for not mutually exclusive events?
● Question 2: How would the probability change if we roll two dice and are asked to find the probability of rolling a 3 on at least one of them?
● Follow-up: What is the probability of rolling a 3 on both dice?
● Question 3: Why is the multiplication rule different for independent and dependent events?
● Follow-up: Can you provide a real-life example where two events are dependent?
● Question 4: How can understanding probability rules help in making decisions based on data?
● Follow-up: Can you think of a scenario in biostatistics where these rules are applied?
● Question 5: If two events A and B are independent, what is the probability of both not occurring?
● Follow-up: How would this probability be calculated if A and B were dependent?
● Question 6: How do you determine whether events are independent or dependent when given a problem?
● Follow-up: Can you provide an example to illustrate your explanation?
● Question 7: How does the Central Limit Theorem relate to the rules of probability, particularly in inferential statistics?
● Follow-up: Why is the Central Limit Theorem important in biostatistics?
● Question 8: If a fair die is rolled twice, what is the probability of rolling an even number first and then a number greater than 4?
● Follow-up: What if the die is biased? How would that change the calculation?
● Question 9: In the context of biostatistics, how can the rules of probability assist in interpreting the results of a clinical trial?
● Follow-up: Can you give an example of a clinical trial scenario and apply these probability rules?
● Question 10: Can you think of any limitations or challenges when applying the rules of probability to real-world data?
● Follow-up: How can these challenges be addressed?
43