Inferential Statistics

Siti Nor Ismalina Isa


BSc. (Hons) (Forensic Sc) (USM)
MSc. (Medical Statistics) (USM)
PhD (Biomedical Sc) (UKM)
What is the difference between
descriptive statistics and inferential
statistics?
STATISTICS

• Descriptive statistics
  • Measure of central tendency: Mean, Median, Mode
  • Measure of dispersion: Variance, SD, IQR, etc.
• Inferential statistics
  • Estimation: Point estimate, Confidence interval
  • Hypothesis testing: Ho/Ha, P-value calculation
Inferential Statistics
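• Statistical methods used to generalise the results from a sample
(statistic) to the population (parameter)

[Diagram: a sample is drawn from the population; the statistical conclusion from the sample is inferred back to the population]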
Notations of
parameter & statistic

                      Sample (statistic)    Population (parameter)
Sample size           n                     N
Mean                  x̄                     µ
Standard deviation    s                     σ
IMPORTANT CONCEPTS
IN INFERENTIAL STATISTICS
Important concepts
• Binomial distribution

• Poisson distribution

• Normal distribution

• Sampling distribution

• Central Limit Theorem

• Standard normal distribution


Some distributions frequently used in statistical
analysis for discrete random variables

1. Binomial distribution
2. Poisson distribution
Binomial distribution
• One of the most widely used probability distributions
• Follows a Bernoulli process
• Each trial results in one of two possible, mutually
exclusive outcomes. One is denoted success and
the other failure
• The probability of a success, denoted by p, remains
constant from trial to trial, and the probability of
a failure, 1-p, is denoted by q
• The trials are independent
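
As a minimal sketch of the conditions above: with n independent trials and a constant success probability p, the probability of exactly k successes is C(n, k) p^k q^(n-k). The function name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative assumptions, not part of the lecture.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli trials."""
    q = 1 - p  # probability of failure on a single trial
    return comb(n, k) * p**k * q ** (n - k)


# Illustrative values only: 10 trials, success probability 0.3
n, p = 10, 0.3
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(X = 3) = {binomial_pmf(3, n, p):.4f}")
print(f"Sum over all k = {sum(probs):.4f}")  # probabilities of all outcomes add up to 1
```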
Poisson distribution
• Usually used when the events under observation
happen rarely
• E.g.
• Suicide cases

Poisson probability: P(X = x) = e^(-λ) λ^x / x!, where λ is the mean number of events in the interval and x = 0, 1, 2, ...
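
The standard Poisson formula, P(X = x) = e^(-λ) λ^x / x!, can be evaluated directly. This is a small sketch; the function name `poisson_pmf` and the mean of 2 events per year are hypothetical and chosen only for illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of observing exactly x events when the mean count is lam."""
    return exp(-lam) * lam**x / factorial(x)


# Hypothetical rare event with a mean of 2 cases per year
lam = 2.0
for x in range(5):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.4f}")

# Probability of at least one case in a year
p_at_least_one = 1 - poisson_pmf(0, lam)
print(f"P(X >= 1) = {p_at_least_one:.4f}")
```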
Normal distribution
Important characteristics:
• Symmetrical about its mean, µ
• Mirror image
• Mean=median=mode
• Total area under the curve above the x axis =
1
• About 68%, 95% and 99.7% of the area under the
curve lies within 1 SD, 2 SD and 3 SD of the mean
(in both directions)
• Changing µ shifts the graph along the x axis
• Changing σ changes the peakedness/flatness
of the graph
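
The areas within 1, 2 and 3 SD of the mean can be checked with Python's standard library. This is a sketch only; `area_within` is a hypothetical helper name, and the values µ = 100, σ = 15 in the last line are arbitrary.

```python
from statistics import NormalDist


def area_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"Within {k} SD: {area_within(k):.4f}")

# The area does not depend on the particular mean and SD
print(f"Within 2 SD (mu=100, sigma=15): {area_within(2, 100, 15):.4f}")
```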
Sampling distribution
• Suppose that we draw all possible samples of
size n from a given population and compute
a statistic (e.g., a mean, proportion, SD) for each
sample. The probability distribution of that statistic
is called its sampling distribution.

• A sample statistic is often unequal to the value of
the corresponding population parameter because of
sampling error.

• Inferential statistics allow us to estimate how close
to the population value the calculated statistic is
likely to be.
Sampling from normally
distributed population
• The distribution of x̄ will be normal

• The mean of the sampling distribution of x̄
will be equal to the mean of the population, µ

• The variance of the sampling distribution will be equal
to the variance of the population divided by the sample size, σ²/n
Sampling from non-normally
distributed population
Central Limit Theorem

• The sampling distribution of x̄ computed from samples of
size n from the population will have mean µ and
variance σ²/n, and will be approximately normally
distributed when the sample size is large.

Note: A sample size of about 30 is usually considered satisfactory


The Central Limit Theorem
• By this theorem the distribution of the means of large
samples (size > 30) taken from a population has, in theory,
three characteristics:
• (1) The shape is approximately normal
• (2) The mean is the same as the mean of the original
population.
• (3) The standard deviation (the standard error, σ/√n) is
smaller than the standard deviation of the original population
and decreases as the sample size increases.

If the sample size is large, parametric statistics (e.g. t-test or ANOVA) can
generally be used, because the averages used in the tests are reasonably close to a
normal distribution.
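
The Central Limit Theorem can be illustrated by simulation. The sketch below assumes a skewed (exponential) population with mean 1 and SD 1, samples of size 30 and 5000 repeated samples; all of these choices are illustrative assumptions, not part of the lecture.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)  # fixed seed so the sketch is reproducible

POP_MEAN = 1.0    # an exponential population with rate 1 has mean 1
POP_SD = 1.0      # and standard deviation 1
n = 30            # size of each sample
n_samples = 5000  # number of repeated samples

# Draw many samples from the non-normal population and keep each sample mean
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

print(f"Mean of sample means: {mean(sample_means):.3f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {stdev(sample_means):.3f} (sigma/sqrt(n) = {POP_SD / sqrt(n):.3f})")
```

The mean of the sample means should be close to µ and their SD close to σ/√n, even though the population itself is not normal.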
Standard normal distribution
• Normal distribution with mean 0 and SD 1
• Random variable, z = (x - µ)/σ
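
A short sketch of the standardisation z = (x - µ)/σ follows. The blood pressure values (mean 120, SD 10, observed 135) are hypothetical and `z_score` is an illustrative helper name.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Convert a value x from a normal(mu, sigma) variable to the standard normal scale."""
    return (x - mu) / sigma


# Hypothetical example: systolic blood pressure with mean 120 and SD 10
z = z_score(135, 120, 10)
p_below = NormalDist().cdf(z)  # area to the left of z under the standard normal curve

print(f"z = {z}")
print(f"P(X < 135) = {p_below:.4f}")
```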
Inferential Statistics
Two components of inferential statistics:

• Estimation (confidence interval)
• Estimating a mean (numerical)
• Estimating a proportion (categorical)

• Hypothesis testing
• Comparing two means
• Comparing two proportions
• Association/relationship between one variable and another
variable
Confidence Interval (C.I)
• One of the main objectives of statistics is the
estimation of population parameters based on
the information contained in the sample.

• Ways of estimation
• Point estimate
• Confidence interval estimate
Point estimate
• The point estimate for the population mean, μ, is the
sample mean, x̄, which will vary from sample to
sample.
• The point estimate fails to indicate how
close the estimate is to the population parameter.
• The CI estimate can remedy this flaw.
CI
• An interval computed from the sample that, with a given
confidence (e.g. 95%), contains the true value of a parameter
such as a mean, proportion or rate
• Confidence limits: the upper and lower
boundaries of the CI
• Confidence level: one minus the significance level (α)
• E.g.: 0.95 if α is 0.05. It is usually expressed as a
percentage
CI
• The width of the CI depends on
• The amount of random variability inherent in
the data collection process (SE)
• Sample size
• An arbitrarily selected alpha level that specifies
the degree of compatibility between the limits
of the interval and the data
• The width of the CI is inversely related to the precision
of the estimate: the narrower the interval, the more precise the estimate
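
A minimal sketch of a confidence interval for a mean follows, assuming a large sample so that a normal (z) critical value is reasonable; for small samples a t critical value would normally be used instead. The helper `mean_ci` and the cholesterol data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(data, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    n = len(data)
    x_bar = mean(data)               # point estimate
    se = stdev(data) / sqrt(n)       # standard error of the mean
    alpha = 1 - confidence           # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return x_bar - z * se, x_bar + z * se


# Hypothetical cholesterol values (mmol/L), n = 36
data = [4.8 + 0.1 * (i % 9) for i in range(36)]

low95, high95 = mean_ci(data, 0.95)
low99, high99 = mean_ci(data, 0.99)
print(f"95% CI: ({low95:.3f}, {high95:.3f})")
print(f"99% CI: ({low99:.3f}, {high99:.3f})")  # higher confidence gives a wider interval
```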
Hypothesis testing
• The researcher produces a hypothesis based on
observation or experience (research hypothesis)

• The hypothesis is tested using statistical analysis

• The researcher makes a decision aided by the statistical
results

The above process is called hypothesis testing.


Hypothesis testing
Important terminology
• Type of hypothesis
• Null hypothesis
• Alternative hypothesis
• Level of significance
• P value and statistical significance
• Critical zone
• One sided vs two sided tests
Type of hypothesis
Ho: Null hypothesis
(no difference/ no association)

Ha: Alternative hypothesis
(there is a difference/ an association),
also called the research hypothesis
Statistical Hypothesis

Definition
A null hypothesis is a claim (or statement) about
a population parameter that is assumed to be
true until it is declared false.

Prem Mann, Introductory Statistics, 7/E
Copyright © 2010 John Wiley & Sons. All rights reserved
Statistical Hypothesis

Definition
An alternative hypothesis is a claim about a
population parameter that will be true if the null
hypothesis is false.

Prem Mann, Introductory Statistics, 7/E
Types of error
• Types of error in decision making

                                  In reality (population)
Test decision (sample)          | True Ho (no association)          | False Ho (association)
Not reject Ho (no association)  | True negative                     | False negative: Type II error (β)
Reject Ho (association)         | False positive: Type I error (α)  | True positive (power)

Type I error: wrongly rejecting a true Ho
Type II error: failing to reject a false Ho
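
The meaning of the Type I error rate can be illustrated by simulation: when Ho is true and α = 0.05, about 5% of tests wrongly reject Ho. The sketch assumes a two-sided one-sample z test with known σ = 1, samples of size 25 and 4000 simulated studies; all of these settings are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the sketch is reproducible

ALPHA = 0.05
MU0, SIGMA, N = 0.0, 1.0, 25                   # Ho: mu = MU0, known sigma, sample size
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-sided critical value


def rejects_h0(sample) -> bool:
    """Two-sided one-sample z test with known sigma."""
    z = (mean(sample) - MU0) / (SIGMA / sqrt(len(sample)))
    return abs(z) > Z_CRIT


# Every simulated sample comes from a population in which Ho is TRUE,
# so each rejection counted here is a Type I error (false positive).
n_sim = 4000
rejections = sum(
    rejects_h0([random.gauss(MU0, SIGMA) for _ in range(N)])
    for _ in range(n_sim)
)
type_i_rate = rejections / n_sim
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```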
Level of significance
Level of significance (α) = probability of a Type I error

• Probability of rejecting a true null hypothesis

• i.e. concluding on the alternative hypothesis when Ho is true

• i.e. the probability of error in deciding that there is an
association when in actual fact there is no
association
P value

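• The probability, if Ho is true, of obtaining a result at least
as extreme as the one observed

• It is compared with α, the level of error (risk of a Type I error)
the researcher is willing to take in making a decision

• Normally an error of 5% or less is allowed
• α=5% (0.05)
• α=1% (0.01)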
P value & level of significance

• If alpha is set at 5%, Ho is rejected when the P value is less than 0.05.
• We accept a 5% probability of error when we conclude that there is
an association (i.e. conclude on the alternative hypothesis) although Ho is true.

• If alpha is set at 1%, Ho is rejected when the P value is less than 0.01.
• We accept only a 1% probability of error in concluding that there is
an association (i.e. concluding on Ha).
• The researcher chooses a smaller alpha when a high error in
decision making cannot be tolerated.
Significance level &
rejecting decision
Example:
If alpha is set at 5%, the P value is compared with 0.05.

• P < 0.05: Reject Ho. There is an association between
smoking and lung cancer.
• P ≥ 0.05: Do not reject Ho. There is no evidence of an association between
smoking and lung cancer.
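
The decision rule can be sketched in a few lines. The example assumes a test statistic that follows the standard normal distribution (a z test); `two_sided_p_value` and `decide` are hypothetical helper names, and the z values are made up for illustration.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P value for a two-sided z test: area in both tails beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the P value with the chosen level of significance."""
    return "Reject Ho" if p_value < alpha else "Do not reject Ho"


for z in (2.5, 1.0):  # made-up test statistics
    p = two_sided_p_value(z)
    print(f"z = {z}: P = {p:.4f} -> {decide(p)}")
```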
β error & Power
• Power = 1 - β, i.e. 1 - probability of a Type II error

• Example:
• Our study has a beta error of 20%
• It means that there is a 20% chance that we will get no
significant association even though there is an
association in reality.
• In other words, there is an 80% chance that we will get a
significant association if there is an association in
reality.
• It means our study has a power of 80%.
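
As a sketch of how power relates to β, the power of a one-sided one-sample z test with known σ can be computed as 1 - Φ(z_crit - δ√n/σ), where δ is the true difference from the Ho value. The function name and the numbers (δ = 0.5, σ = 1, n = 25) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided one-sample z test when the true mean is delta above the Ho value."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # critical value under Ho
    shift = delta * sqrt(n) / sigma          # how far the true mean moves the test statistic
    return 1 - std_normal.cdf(z_crit - shift)


power = power_one_sided_z(delta=0.5, sigma=1.0, n=25)
beta = 1 - power  # probability of a Type II error
print(f"Power = {power:.3f}, beta = {beta:.3f}")
```

With these assumed values the power is close to 80%, so β is close to 20%.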
Two procedures to make tests of hypothesis

1. The p-value approach


2. The critical-value approach
Steps in Hypothesis Testing

Step 1: State the hypothesis
Step 2: Set the significance level (commonly α=0.05)
Step 3: Check assumptions of appropriate
parametric test
Step 4: Perform statistical test
Step 5: Make interpretation (based on P value & CI)
Step 6: Draw conclusion
You will use the p-value
approach when using SPSS
Steps to Perform a Test of Hypothesis with the
Critical-Value Approach

1. State the hypothesis.


2. Determine the rejection and non-rejection regions.
3. Check assumptions.
4. Calculate the value of the test statistic.
5. Make interpretation.
6. Draw conclusion.
Critical-value Approach
• The decision is
• to reject the null hypothesis if the statistic
being tested falls at or beyond the critical zone on
the theoretical distribution, or
• not to reject the null hypothesis if it falls outside the
critical zone
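Critical regions in sampling
distribution

[Figure: sampling distribution with shaded critical regions: 0.05 in one tail for a one-sided test; 0.025 in each tail for a two-sided test]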


One sided or two sided test
One-sided (one-tailed) test

• Interested in whether a variable is significantly higher or lower
than another (‘directional hypothesis’)

• Example:
• Is a new drug superior to the standard drug?
• Does the air pollution level exceed safe limits?

• E.g. hypothesis: The mean cholesterol level is higher in male
than in female teachers
• Ho: µmale - µfemale = 0
• HA: µmale > µfemale
One sided or two sided test
Two-sided (two-tailed) test

• Looking for differences without specifying whether the
difference is numerically higher or lower

• Example:
• Is there a difference between the BMI of men and women?
• Does the mean age of volunteers differ from that of the general
population?

• E.g. hypothesis: The mean cholesterol level is different
between male and female teachers
• Ho: µmale - µfemale = 0
• HA: µmale - µfemale ≠ 0
One sided test (α=0.05)
• Critical Z value: 1.645
• Nonrejection region: Z < 1.645
• Rejection region: Z ≥ 1.645

Two sided test (α/2=0.025 in each tail)
• Critical Z values: -1.96 and 1.96
• Nonrejection region: -1.96 < Z < 1.96
• Rejection regions: Z ≤ -1.96 or Z ≥ 1.96
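
The critical Z values 1.645 and 1.96 can be reproduced from the standard normal distribution. This is a sketch; `in_rejection_region` is a hypothetical helper, and the one-sided case assumes an upper-tail alternative.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05

z_one_sided = std_normal.inv_cdf(1 - alpha)      # all of alpha in the upper tail
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail

print(f"One-sided critical value: {z_one_sided:.3f}")
print(f"Two-sided critical values: -{z_two_sided:.2f} and {z_two_sided:.2f}")


def in_rejection_region(z_stat: float, alpha: float = 0.05, two_sided: bool = True) -> bool:
    """True if the test statistic falls at or beyond the critical value."""
    if two_sided:
        return abs(z_stat) >= std_normal.inv_cdf(1 - alpha / 2)
    return z_stat >= std_normal.inv_cdf(1 - alpha)  # upper-tail test


# The same statistic can be significant one-sided but not two-sided
print(in_rejection_region(1.8, two_sided=False), in_rejection_region(1.8, two_sided=True))
```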
t table
Type of univariable analysis

Dependent variable      | Independent variable | Number of groups in independent variable | Parametric test | Non-parametric test
Numerical (one)         | -                    | -                                        | One Sample t    | -
Numerical (one)         | Categorical          | 2 gps. (independent)                     | Independent t   | Mann Whitney
Numerical (one)         | Categorical          | 2 gps. (dependent)                       | Paired t        | Signed Rank test
Numerical (one)         | Categorical          | > 2 gps. (independent)                   | One way ANOVA   | Kruskal-Wallis
Categorical (2 gps.)    | Categorical          | 2 gps. (independent)                     | -               | Chi-square test, Fisher’s Exact
Categorical (2 gps.)    | Categorical          | 2 gps. (dependent)                       | -               | McNemar
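
The table can also be read as a simple lookup. The sketch below encodes only the rows listed in the table; the function name `choose_test` and its argument names are hypothetical, and `None` marks a cell the table leaves empty.

```python
def choose_test(dependent: str, groups: int, design: str = "independent"):
    """Return (parametric test, non-parametric test) for the rows of the table.

    dependent: "numerical" or "categorical" (type of the dependent variable)
    groups:    number of groups in the independent variable (1 = single sample)
    design:    "independent" or "dependent" (paired) groups
    """
    if dependent == "numerical":
        if groups == 1:
            return ("One Sample t", None)
        if groups == 2 and design == "independent":
            return ("Independent t", "Mann Whitney")
        if groups == 2 and design == "dependent":
            return ("Paired t", "Signed Rank test")
        if groups > 2 and design == "independent":
            return ("One way ANOVA", "Kruskal-Wallis")
    if dependent == "categorical" and groups == 2:
        if design == "independent":
            return (None, "Chi-square test / Fisher's Exact")
        if design == "dependent":
            return (None, "McNemar")
    raise ValueError("Combination not covered by the table")


print(choose_test("numerical", 2, "independent"))
print(choose_test("categorical", 2, "dependent"))
```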
Hypothesis testing
Steps

Step 1: State the hypothesis
Step 2: Set the significance level (α=0.05 or 0.01)
Step 3: Check assumptions
Step 4: Test statistics (p-value or critical-value
approach)
Step 5: Interpretation
Step 6: Conclusion
• Wayne W Daniel (2005). Biostatistics A Foundation for
Analysis in the Health Sciences. 8th Edition. John Wiley &
Sons.
• Prem Mann (2010). Introductory Statistics. 7th Edition.
John Wiley & Sons.
• Lecture Notes Assoc. Prof. Dr. Aniza Abd. Aziz & Dr. Khairil
Anuar Md Isa.
