Inferential Statistics

Uploaded by

Manjot Singh
Copyright: © All Rights Reserved
APEX INSTITUTE OF TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Statistics for Data Science(23CSH-233)


Faculty: Prof. (Dr.) Madan Lal Saini(E13485)

Inferential Statistics

DISCOVER . LEARN . EMPOWER

Statistics for Data Science : Course Objectives

COURSE OBJECTIVES
The course aims:
1. To equip students with the skills to summarize and interpret data using descriptive
statistics and visualization techniques.
2. To develop a foundational understanding of probability and its applications in data
science.
3. To enable students to perform hypothesis testing and construct confidence intervals
for statistical inference.
4. To teach students how to build and assess linear and logistic regression models for
predictive analysis.
5. To provide hands-on experience with statistical software for data manipulation,
analysis, and visualization.

COURSE OUTCOMES
On completion of this course, the students shall be able to:-

CO1: Summarize and describe the main features of a dataset using measures such as mean, median, mode, variance, and standard deviation, as well as graphical representations like histograms, box plots, and scatter plots.

CO2: Understand probability theory, including concepts such as random variables, probability distributions, and the law of large numbers, enabling them to model and reason about uncertainty in data.

CO3: Perform statistical inference, including hypothesis testing, confidence interval estimation, and p-value computation, to draw valid conclusions from sample data about larger populations.

CO4: Apply linear and logistic regression techniques to identify relationships between variables, make predictions, and evaluate model performance.

CO5: Utilize statistical software tools to perform data analysis, including data cleaning, transformation, visualization, and implementing various statistical methods.

Unit-3 Syllabus

Unit-3: Inferential Statistics

Inferential Statistics & Hypothesis Testing: Statistical Inference Terminology, Hypothesis Testing, Parametric Tests, Non-parametric Tests

Industry Application: Hypothesis Testing using Excel, Industry Practices & Applications of Statistics

SUGGESTED READINGS

TEXT BOOKS:
• T1. Hastie, Trevor, et al., The elements of statistical learning. Vol. 2. No. 1. New York:
Publisher: Springer, Edition: Second Edition (2009), ISBN: 978-0387848570
• T2. Montgomery, Douglas C., and George C. Runger. Applied statistics and probability for
engineers. John Wiley & Sons, 2010.
• T3. Probability and Statistics The Science of Uncertainty Second Ed., Michael J. Evans and
Jeffrey S. Rosenthal.

REFERENCE BOOKS:
• R1. Practical Statistics for Data Scientists: 50 Essential Concepts, Authors: Peter Bruce, et al,
Publisher: O'Reilly Media, Edition: Second Edition (2020), ISBN: 978-1492072942
• R2. An Introduction to Statistical Learning: with Applications in R, Authors: Gareth James, et
al, Publisher: Springer, Edition: Second Edition (2021), ISBN: 978-1071614174
• R3. Think Stats: Exploratory Data Analysis in Python, Author: Allen B. Downey, Publisher:
O'Reilly Media, Publication Year: 2014 (2nd Edition), ISBN: 978-1491907337

What is a Statistic?

[Figure: several samples drawn from one population]

Parameter: a value that describes a population

Statistic: a value that describes a sample

PSYCH → always using samples!
Descriptive & Inferential Statistics

Descriptive Statistics (describing data)
• Organize
• Summarize
• Simplify
• Presentation of data

Inferential Statistics (making predictions)
• Generalize from samples to populations
• Hypothesis testing
• Relationships among variables


Descriptive Statistics: 3 Types

1. Frequency Distributions: # of Ss (subjects) that fall in a particular category

2. Graphical Representations: Graphs & Tables

3. Summary Stats: describe data in just one number


1. Frequency Distributions

# of Ss that fall in a particular category

How many males and how many females are in our class?

             Males          Females        Total
Frequency    ?              ?              ?
(%)          ?/tot x 100    ?/tot x 100
             -----%         -----%

Scale of measurement? Nominal
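The count-and-percentage table above can be sketched in a few lines of Python. This is only an illustration: the roster values and the helper name `frequency_table` are made up for the example and are not data from the class.

```python
from collections import Counter

def frequency_table(observations):
    """Return {category: (count, percent)} for nominal data."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical class roster (assumed values, for illustration only)
roster = ["male", "female", "female", "male", "female"]
table = frequency_table(roster)

for category, (count, percent) in table.items():
    print(f"{category:<8}{count:>3}{percent:>8.1f}%")
```

Because the categories are nominal, only counts and percentages are meaningful here; a mean of the categories would not be.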
1. Frequency Distributions

# of Ss that fall in a particular category

Categorize on the basis of more than one variable at the same time

CROSS-TABULATION

(the column labels did not survive in this copy; only "Total" is legible)

                              Total
Democrats      24      1      25
Republican     19      6      25
Total          43      7      50
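As a rough sketch, the row and column totals of a cross-tabulation can be recomputed from the four cell counts. The cell counts below are the ones on the slide; the column names `col_1` and `col_2` are placeholders, since the slide text does not name the second variable's categories.

```python
# Cell counts from the slide. Column labels are placeholders (assumption):
# the second variable's category names are not given in the text.
counts = {
    "Democrats": {"col_1": 24, "col_2": 1},
    "Republican": {"col_1": 19, "col_2": 6},
}

def margins(table):
    """Return (row totals, column totals, grand total) of a 2-way table."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return row_totals, col_totals, sum(row_totals.values())

row_totals, col_totals, grand_total = margins(counts)
print(row_totals, col_totals, grand_total)
```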
1. Frequency Distributions

How many brothers & sisters do you have?

# of bros & sis    Frequency
7                  ?
6                  ?
5                  ?
4                  ?
3                  ?
2                  ?
1                  ?
0                  ?
2. Graphical Representations

Graphs & Tables

• Bar graph (ratio data - quantitative)
• Histogram of the categorical variables
• Polygon - Line Graph

2. Graphical Representations

Graphs & Tables

How many brothers & sisters do you have?

Let's plot class data: HISTOGRAM

# of bros & sis    Frequency
7                  ?
6                  ?
5                  ?
4                  ?
3                  ?
2                  ?
1                  ?
0                  ?
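One way to fill in and plot such a table is sketched below as a text histogram. The `responses` list is hypothetical (the class answers are not recorded in the slides), and `text_histogram` is a name invented for this example.

```python
from collections import Counter

# Hypothetical answers to "How many brothers & sisters do you have?"
responses = [0, 1, 1, 2, 2, 2, 3, 1, 0, 4, 2, 1]
freq = Counter(responses)

def text_histogram(freq, lo=0, hi=7):
    """One line per value from hi down to lo, one '#' per observation."""
    return [f"{value} | {'#' * freq.get(value, 0)}" for value in range(hi, lo - 1, -1)]

lines = text_histogram(freq)
print("\n".join(lines))
```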
[Figure: a jagged distribution next to a smooth one. Altman, D. G. et al. BMJ 1995;310:298]

Central Limit Theorem: the larger the sample size, the more closely the distribution of sample means will approximate the normal distribution
or
The means of samples taken at random from any distribution will tend to form a normal curve
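A small simulation can make the theorem concrete. The sketch below assumes an exponential population (mean 1, SD 1), chosen only because it is clearly not normal; the function name `sample_means` and all parameter values are illustrative.

```python
import random
import statistics

def sample_means(n, draws=2000, seed=0):
    """Means of `draws` random samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(draws)
    ]

small = sample_means(n=5)
large = sample_means(n=30)

# The means cluster around the population mean (1.0), and their spread
# shrinks roughly like 1 / sqrt(n) as the sample size grows.
print(statistics.fmean(large), statistics.stdev(small), statistics.stdev(large))
```

Plotting `large` as a histogram should give a roughly bell-shaped curve even though the underlying population is strongly skewed.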
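Normal Distribution:
half the scores above the mean…half below (symmetrical)

[Figure: normal curve. 68% of scores fall in the central band, 13.5% in each of the next bands out, so 95% fall in the middle and 2.5% in each tail. Two Tail, non-directional: 5% region of rejection of null hypothesis]

Examples: IQ, body temperature, shoe sizes, diameters of trees, weight, height, etc.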
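The percentages on the normal curve can be checked numerically. This sketch uses the standard normal distribution from Python's `statistics` module; the variable names are only for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_1sd = z.cdf(1) - z.cdf(-1)    # about 68%
band_1_to_2 = z.cdf(2) - z.cdf(1)    # about 13.5% on each side
within_2sd = z.cdf(2) - z.cdf(-2)    # about 95%
tail_beyond_1_96 = 1 - z.cdf(1.96)   # about 2.5% in each tail

print(round(within_1sd, 4), round(band_1_to_2, 4),
      round(within_2sd, 4), round(tail_beyond_1_96, 4))
```

The 2.5% tails correspond to a cutoff of about 1.96 SD rather than exactly 2 SD, which is why the rounded figures on the slide add up to 95% + 5%.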
Summary Statistics
describe data in just 2 numbers

Measures of variability
• typical average variation
Measures of central tendency
• typical average score
Measures of Central Tendency
• Quantitative data:
  • Mode – the most frequently occurring observation
  • Median – the middle value in the data (50/50 split)
  • Mean – arithmetic average
• Qualitative data:
  • Mode – always appropriate
  • Mean – never appropriate
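The three measures can be computed with Python's standard `statistics` module. The scores and colours below are made-up values for illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]             # hypothetical quantitative data
colours = ["red", "blue", "red", "green"]  # hypothetical qualitative data

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value (average of the two middle scores here)
mode = statistics.mode(scores)      # most frequent observation

# For qualitative data only the mode is meaningful.
colour_mode = statistics.mode(colours)

print(mean, median, mode, colour_mode)
```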
Mean
• The most common and most useful average
• Mean = (sum of all observations) / (number of all observations)
• Observations can be added in any order.

Notation
• Sample vs population
• Sample mean = X̄
• Population mean = μ
• Summation sign = Σ
• Sample size = n
• Population size = N
Special Property of the Mean
Balance Point

• The sum of all observations expressed as positive and negative deviations from the mean always equals zero!
• The mean is the single point of equilibrium (balance) in a data set
• The mean is affected by all values in the data set
• If you change a single value, the mean changes.

SEE FOR YOURSELF! Let's do the math
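A quick check of the balance-point property, using made-up scores (any data set would do):

```python
import math

scores = [2, 4, 6, 8, 10]                 # hypothetical data, mean = 6
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]   # [-4, -2, 0, 2, 4]

# Positive and negative deviations cancel out.
print(sum(deviations))

# Changing a single value changes the mean.
changed = [2, 4, 6, 8, 20]
changed_mean = sum(changed) / len(changed)
print(mean, changed_mean)
```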


Summary Statistics
describe data in just 2 numbers

Measures of central tendency
• typical average score

Measures of variability
• typical average variation
1. Range: distance from the lowest to the highest (uses 2 data points)
2. Variance: (uses all data points)
3. Standard Deviation
4. Standard Error of the Mean
Measures of Variability

2. Variance (uses all data points): the average squared distance of each score from the mean (squared deviation from the mean)

Notation for variance: s²

3. Standard Deviation = SD = √(s²)

4. Standard Error of the Mean = SEM = SD / √n
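The sketch below computes these quantities for a small made-up data set. One assumption to flag: the slide describes variance as an "average", which matches dividing by n (`pvariance`), whereas the usual sample variance s² divides by n - 1 (`variance`); both are shown, and the SD and SEM here use the n - 1 version.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

value_range = max(scores) - min(scores)             # uses only 2 data points
population_variance = statistics.pvariance(scores)  # divides by n
sample_variance = statistics.variance(scores)       # divides by n - 1 (s squared)
sd = math.sqrt(sample_variance)                     # SD = square root of the variance
sem = sd / math.sqrt(len(scores))                   # SEM = SD / sqrt(n)

print(value_range, population_variance, sample_variance, sd, sem)
```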


Inferential Statistics

[Figure: several samples drawn from one population]

Draw inferences about the larger group
Sampling Error: variability among samples (vs the population) due to chance

Are the differences true differences, or are they just due to sampling error?

Probability…
"Error" is misleading: it is not a mistake
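Sampling error can be seen by drawing several samples from the same population and comparing each sample mean with the population mean. Everything in this sketch is assumed for illustration: a simulated population with mean 100 and SD 15, five samples of 25, and a fixed random seed.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 10,000 scores (mean 100, SD 15)
population = [rng.gauss(100, 15) for _ in range(10_000)]
pop_mean = statistics.fmean(population)

# Five random samples of 25; nobody made a "mistake", yet every mean differs.
sample_means = [statistics.fmean(rng.sample(population, 25)) for _ in range(5)]
errors = [m - pop_mean for m in sample_means]

for m, e in zip(sample_means, errors):
    print(f"sample mean {m:7.2f}   sampling error {e:+6.2f}")
```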
Probability
• Numerical indication of how likely it is that a given event will occur (General Definition)
  "hmm…what's the probability it will rain?"
• Statistical probability: the odds that what we observed in the sample did not occur because of error (random and/or systematic)
  "hmm…what's the probability that my results are not just due to chance?"
• In other words, the probability associated with a statistic is the level of confidence we have that the sample group that we measured actually represents the total population
Chain of Reasoning for Inferential Statistics

[Figure: diagram linking Population, Selection, Sample, Measure, data, Probability and Inference]

Are our inferences valid? The best we can do is to calculate the probability about inferences
Inferential Statistics: uses sample data to evaluate the credibility of a hypothesis about a population

NULL Hypothesis:

NULL (nullus - Latin): "not any" → no differences between means

H0: μ1 = μ2

Always testing the null hypothesis ("H-Naught")


Inferential statistics: uses sample data to evaluate the credibility of a hypothesis about a population

Hypothesis: Scientific or alternative hypothesis

Predicts that there are differences between the groups

H1: μ1 ≠ μ2
Hypothesis
A statement about what findings are expected

null hypothesis
"the two groups will not differ"

alternative hypothesis
"group A will do better than group B" (directional)
"group A and B will not perform the same" (non-directional)
Inferential Statistics

When making comparisons between 2 sample means there are 2 possibilities

Null hypothesis is true → Not reject the Null hypothesis

Null hypothesis is false → Reject the Null hypothesis
Possible Outcomes in Hypothesis Testing (Decision)

            Null is True            Null is False
Accept      Correct Decision        Error (Type II Error)
Reject      Error (Type I Error)    Correct Decision

Type I Error: Rejecting a True Hypothesis

Type II Error: Accepting a False Hypothesis
Hypothesis Testing - Decision
We cannot know for certain whether a decision is right or wrong, but we can know the probability of being right or wrong

Can specify and control the probability of making a TYPE I or TYPE II Error

Try to keep it small…

ALPHA
the probability of making a type I error → depends on the criterion you use to accept or reject the null hypothesis = significance level (the smaller you make alpha, the less likely you are to commit a type I error). At 0.05 there are 5 chances in 100 that the difference observed was really due to sampling error: when the null is true, a type I error will occur 5% of the time.
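The "5% of the time" claim can be checked by simulation. The sketch below assumes a setting the slides do not specify: a two-tailed z-test on samples of 20 drawn from a population where the null really is true (mean 0, known SD 1). The function name and parameter values are illustrative.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=20, trials=4000, seed=42):
    """Share of true-null experiments that a two-tailed z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # null is true: mean = 0
        z = fmean(sample) / (1 / n ** 0.5)            # known SD = 1 (assumption)
        if abs(z) > z_crit:
            rejections += 1                           # a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # should land near alpha
```

Lowering `alpha` lowers this rate, which is the sense in which the significance level controls the Type I error.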
Possible Outcomes in Hypothesis Testing

            Null is True            Null is False
Accept      Correct Decision        Error (Type II Error)
Reject      Error (Type I Error)    Correct Decision

Alpha (α): the prob. of type one error. We reject the null, but the difference observed is really just sampling error.


When we do statistical analysis… if the p value is greater than alpha (the significance level, 0.05)

WE ACCEPT (FAIL TO REJECT) THE NULL HYPOTHESIS

if the p value is equal to or less than 0.05 we

REJECT THE NULL (there is a difference between means)
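The decision rule can be written as a tiny function. The name `decide` is made up for this sketch; the p value itself would come from whichever test is being run.

```python
def decide(p_value, alpha=0.05):
    """Compare a test's p value with the chosen significance level."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.03))   # at or below alpha
print(decide(0.20))   # above alpha
```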


Two Tail (non-directional)

5% region of rejection of null hypothesis, split as 2.5% in each tail

One Tail (directional)

5% region of rejection of null hypothesis, all 5% in one tail
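For a test statistic that follows the standard normal distribution (an assumption; a t-test would use the t distribution instead), the cutoffs for the two rejection regions can be looked up like this:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal

# Two tail: 2.5% in each tail, so the cutoff leaves 97.5% below it.
two_tail_cutoff = z.inv_cdf(1 - alpha / 2)

# One tail: all 5% in one tail, so the cutoff leaves 95% below it.
one_tail_cutoff = z.inv_cdf(1 - alpha)

print(round(two_tail_cutoff, 3), round(one_tail_cutoff, 3))
```

The one-tail cutoff is smaller, so a directional test rejects more easily, but only for differences in the predicted direction.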
BETA
Probability of making a type II error → occurs when we fail to reject the Null when we should have

Possible Outcomes in Hypothesis Testing

            Null is True            Null is False
Accept      Correct Decision        Error (Type II Error)
Reject      Error (Type I Error)    Correct Decision

Beta (β): the difference observed is real, but we failed to reject the Null
POWER: ability to reduce type II error
(1 - Beta) – Power Analysis

The power to find an effect if an effect is present. Ways to increase power:

1. Increase our n

2. Decrease variability

3. More precise measurements

Effect Size: measure of the size of the difference between means attributed to the treatment
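To see how n and variability drive power, the sketch below computes the power of a two-tailed one-sample z-test with known SD. That test is an assumption chosen because its power has a simple closed form; the slides do not tie power to a particular test. Cohen's d is included as one common effect-size measure, again as an assumption, since the slides do not name one.

```python
import math
import statistics
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed z-test for a true mean shift `effect`."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sd
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)

def cohens_d(a, b):
    """Difference between two group means in pooled-SD units."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)

# Hypothetical numbers: larger n or smaller SD gives more power.
print(z_test_power(effect=0.5, sd=1.0, n=20))
print(z_test_power(effect=0.5, sd=1.0, n=50))
print(z_test_power(effect=0.5, sd=2.0, n=20))
```

With no true effect (`effect=0`) the function returns alpha, since every rejection is then a Type I error.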
Inferential statistics

Significance testing:

Practical vs statistical significance


Inferential statistics
Used for Testing for Mean Differences

T-test: when experiments include only 2 groups
a. Independent
b. Correlated
   i. Within-subjects
   ii. Matched

Based on the t statistic; critical values depend on df & alpha level
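A minimal sketch of the two t statistics, written with the standard library only. Two assumptions to flag: the independent version pools the variances (it assumes the groups have equal variance), and the data are invented. The resulting t would then be compared with the critical value for its df and alpha level from a t table.

```python
import math
import statistics

def independent_t(a, b):
    """Pooled-variance t statistic and df for two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, na + nb - 2

def paired_t(a, b):
    """t statistic and df for correlated (within-subjects or matched) scores."""
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / se, len(diffs) - 1

group_a = [1, 2, 3, 4, 5]    # hypothetical scores
group_b = [2, 4, 6, 8, 10]   # hypothetical scores

t_ind, df_ind = independent_t(group_a, group_b)
t_pair, df_pair = paired_t(group_a, group_b)
print(t_ind, df_ind, t_pair, df_pair)
```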
Inferential statistics
Used for Testing for Mean Differences

Analysis of Variance (ANOVA): used when comparing more than 2 groups

1. Between Subjects
2. Within Subjects – repeated measures

Based on the F statistic; critical values depend on df & alpha level

Only one IV = one-way ANOVA

More than one IV = factorial ANOVA (IVs = factors)
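For the simplest case, a one-way between-subjects design, the F statistic is the between-groups mean square divided by the within-groups mean square. The sketch below uses invented scores and a made-up function name; the resulting F would be compared with the critical value for its two df values and the alpha level.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_scores)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups (one IV with three levels)
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value, df_b, df_w = one_way_anova_f(groups)
print(f_value, df_b, df_w)
```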
Inferential statistics

Meta-Analysis:

Allows for statistical averaging of results from independent studies of the same phenomenon
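One common way to average results across studies is an inverse-variance (fixed-effect) weighted mean, in which more precise studies count for more. The slides do not say which averaging method is meant, so this is only an assumed illustration, and the effect sizes and variances are invented.

```python
import math

def fixed_effect_average(effects, variances):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se

# Hypothetical effect sizes and variances from two independent studies
pooled_effect, pooled_se = fixed_effect_average([0.2, 0.5], [0.01, 0.04])
print(pooled_effect, pooled_se)
```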
References
Books:
• Hastie, Trevor, et al., The elements of statistical learning. Vol. 2. No. 1. New York: Publisher: Springer, Edition:
Second Edition (2009), ISBN: 978-0387848570
• Practical Statistics for Data Scientists: 50 Essential Concepts, Authors: Peter Bruce, et al, Publisher: O'Reilly
Media, Edition: Second Edition (2020), ISBN: 978-1492072942

Research Papers:
• Garg, Ram and Goyal, Ruchi, Inferential Statistics As a Measure of Judging the Short-Term Solvency An Empirical Study of Three Steel
Companies in India (February 5, 2019). International Journal of Advanced Studies of Scientific Research, Vol. 4, No. 1, 2019, Available at
SSRN: https://ssrn.com/abstract=3329388.
• Alacaci, C. (2004). Inferential Statistics: Understanding Expert Knowledge and its Implications for Statistics Education. Journal of Statistics
Education, 12(2). https://doi.org/10.1080/10691898.2004.11910737
Websites:
• https://www.simplilearn.com/inferential-statistics-article/
• https://builtin.com/data-science/inferential-statistics#:~:text=Inferential%20statistics%20is%20the%20practice,
sample%20data%20sample%20or%20population./

Videos:
• https://www.youtube.com/watch?v=cjTgyRUaD1s&list=PLbRMhDVUMngeD_vOeveVE-3b7wu_AZph9
• https://www.youtube.com/watch?v=ZmCBF5JXOPM&list=PLFW6lRTa1g80s2MWqXNg2o0haq1k14v2I
THANK YOU

For queries
Email: madan.e13485@cumail.in
