Inferential Statistics
Statistics for Data Science: Course Objectives
COURSE OBJECTIVES
The course aims to:
1. Equip students with the skills to summarize and interpret data using descriptive statistics and visualization techniques.
2. Develop a foundational understanding of probability and its applications in data science.
3. Enable students to perform hypothesis testing and construct confidence intervals for statistical inference.
4. Teach students how to build and assess linear and logistic regression models for predictive analysis.
5. Provide hands-on experience with statistical software for data manipulation, analysis, and visualization.
COURSE OUTCOMES
On completion of this course, the students shall be able to:
CO1. Summarize and describe the main features of a dataset using measures such as mean, median, mode, variance, and standard deviation, as well as graphical representations like histograms, box plots, and scatter plots.
CO2. Understand probability theory, including concepts such as random variables, probability distributions, and the law of large numbers, enabling them to model and reason about uncertainty in data.
CO3. Perform statistical inference, including hypothesis testing, confidence interval estimation, and p-value computation, to draw valid conclusions from sample data about larger populations.
CO5. Utilize statistical software tools to perform data analysis, including data cleaning, transformation, visualization, and implementing various statistical methods.
Unit-3 Syllabus
SUGGESTED READINGS
TEXT BOOKS:
• T1. The Elements of Statistical Learning, Authors: Trevor Hastie, et al, Publisher: Springer (New York), Edition: Second Edition (2009), ISBN: 978-0387848570
• T2. Montgomery, Douglas C., and George C. Runger. Applied statistics and probability for
engineers. John Wiley & Sons, 2010.
• T3. Probability and Statistics: The Science of Uncertainty, Authors: Michael J. Evans and Jeffrey S. Rosenthal, Edition: Second Edition
REFERENCE BOOKS:
• R1. Practical Statistics for Data Scientists: 50 Essential Concepts, Authors: Peter Bruce, et al,
Publisher: O'Reilly Media, Edition: Second Edition (2020), ISBN: 978-1492072942
• R2. An Introduction to Statistical Learning: with Applications in R, Authors: Gareth James, et
al, Publisher: Springer, Edition: Second Edition (2021), ISBN: 978-1071614174
• R3. Think Stats: Exploratory Data Analysis in Python, Author: Allen B. Downey, Publisher:
O'Reilly Media, Publication Year: 2014 (2nd Edition), ISBN: 978-1491907337
What is a Statistic?
[Figure: several samples drawn from one population]
2. Graphical Representations
Frequency: # of Ss (subjects) that fall in a particular category
Percentage (%): (frequency / total) x 100
Scale of measurement? Nominal
1. Frequency Distributions
Frequency: # of Ss (subjects) that fall in a particular category

                            Total
Democrats      24      1      25
Republican     19      6      25
Total          43      7      50
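The conversion from frequencies to percentages, (frequency / total) x 100, can be sketched in Python using the counts from the table. The slide does not name the two columns, so `col_a` and `col_b` are placeholder labels, and the choice to compute percentages within each row is an assumption.

```python
# Counts taken from the frequency table; the two column labels are not
# given in the slide, so "col_a" and "col_b" are placeholder names.
counts = {
    "Democrats": {"col_a": 24, "col_b": 1},
    "Republican": {"col_a": 19, "col_b": 6},
}

def row_percentages(table):
    """Convert each row's frequencies to percentages of that row's total."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {label: freq * 100 / total for label, freq in row.items()}
    return result

percentages = row_percentages(counts)
grand_total = sum(sum(row.values()) for row in counts.values())

for group, row in percentages.items():
    print(group, row)
print("Grand total:", grand_total)
```

Each row of percentages sums to 100, which is a quick sanity check on any frequency table.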
1. Frequency Distributions
How many brothers & sisters do you have?
[Figure: frequency distribution of the answers, with a smooth curve]
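Central Limit Theorem: the larger the sample size, the more closely the distribution of sample means will approximate the normal distribution.
[Figure: normal curve with 95% of the area in the middle and 2.5% in each tail; 13.5% bands on either side]
Approximately normal variables: IQ, body temperature, shoe sizes, diameters of trees, weight, height, etc.
Non-directional test: the 5% region of rejection of the null hypothesis is split between the two tails (2.5% each).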
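The Central Limit Theorem can be checked with a small simulation. The sketch below is not from the slides: it assumes a skewed exponential population with mean 1 (an arbitrary choice) and watches the skewness of the sample means shrink toward 0, the value for a normal distribution, as the sample size grows.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_means(n, trials=2000):
    """Means of `trials` samples of size n from a skewed (exponential) population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]

def skewness(values):
    """Skewness of the values: 0 for a symmetric shape such as the normal curve."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mu) / sd) ** 3 for v in values)

for n in (2, 10, 50):
    means = sample_means(n)
    print(f"n={n:>2}  mean={statistics.fmean(means):.3f}  "
          f"sd={statistics.stdev(means):.3f}  skew={skewness(means):.3f}")
```

The spread of the sample means should also shrink roughly in proportion to 1 / sqrt(n), which is the standard error of the mean.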
Summary Statistics
describe data in just 2 numbers
Measures of variability
• typical average variation
Measures of central tendency
• typical average score
Measures of Central Tendency
• Quantitative data:
• Mode – the most frequently occurring observation
• Median – the middle value in the data (50% of observations below, 50% above)
• Mean – arithmetic average
• Qualitative data:
• Mode – always appropriate
• Mean – never appropriate
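The three measures can be computed with Python's standard `statistics` module. The data below are made up for illustration: hypothetical answers to the sibling question, plus a small list of party labels to show that only the mode applies to qualitative data.

```python
import statistics

# Hypothetical answers to "How many brothers & sisters do you have?"
siblings = [0, 1, 1, 2, 2, 2, 3, 4, 6]

print("mode:", statistics.mode(siblings))      # most frequent observation
print("median:", statistics.median(siblings))  # middle value of the sorted data
print("mean:", statistics.mean(siblings))      # arithmetic average

# Qualitative (nominal) data: only the mode is meaningful.
parties = ["Democrat", "Republican", "Democrat", "Democrat"]
print("mode:", statistics.mode(parties))
```

Note how the single large value (6) pulls the mean above the median, while the mode and median are unaffected.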
Mean
• The most common and most useful average
• Mean = sum of all observations / number of all observations
• Observations can be added in any order.
Notation
• Sample vs population
• Sample mean = X̄
• Population mean = μ
• Summation sign = Σ
• Sample size = n
• Population size = N
Special Property of the Mean
Balance Point
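A common reading of the balance-point property is that deviations above and below the mean cancel out, so they sum to zero. A minimal sketch, using made-up scores:

```python
import statistics

# Hypothetical scores used only for illustration.
scores = [2, 4, 4, 5, 10]
mean = statistics.fmean(scores)

# Deviations below the mean are negative, those above are positive.
deviations = [x - mean for x in scores]

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```

No other value has this property: measuring deviations from any point other than the mean leaves a non-zero sum.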
Measures of central tendency
• typical average score
Measures of variability
• typical average variation
1. Range: distance from the lowest to the highest value (uses only 2 data points)
2. Variance (uses all data points)
3. Standard Deviation
4. Standard Error of the Mean
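The four measures can be sketched with Python's standard library. The scores are hypothetical, and the sketch assumes the sample (n - 1) forms of variance and standard deviation, since the slides do not say which form is intended.

```python
import math
import statistics

# Hypothetical sample of scores used only for illustration.
scores = [2, 4, 4, 5, 10]
n = len(scores)

data_range = max(scores) - min(scores)   # uses only the two extreme points
variance = statistics.variance(scores)   # sample variance, divides by n - 1
std_dev = statistics.stdev(scores)       # square root of the sample variance
sem = std_dev / math.sqrt(n)             # standard error of the mean

print(f"range={data_range}, variance={variance}, "
      f"sd={std_dev:.3f}, sem={sem:.3f}")
```

Because the standard error divides by sqrt(n), it shrinks as the sample grows even when the standard deviation stays the same.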
Descriptive & Inferential Statistics
Descriptive Statistics | Inferential Statistics
[Figure: samples are selected from a population and measured; inference from the sample data back to the population relies on probability]
Null hypothesis H0: μ1 = μ2
Alternative hypothesis H1: μ1 ≠ μ2
Hypothesis
A statement about what findings are expected
null hypothesis
"the two groups will not differ"
alternative hypothesis
"group A will do better than group B" (directional)
"group A and B will not perform the same" (non-directional)
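One way to test the null hypothesis "the two groups will not differ" against the non-directional alternative is a permutation test. This is a substitute technique chosen to keep the sketch free of third-party libraries, not a method named in the slides; the scores and the function name `permutation_test` are hypothetical.

```python
import random
import statistics

def permutation_test(group_a, group_b, trials=10_000, seed=0):
    """Two-sided permutation test for H0: the two groups do not differ in mean.

    Returns the observed mean difference and an approximate p-value: the
    share of random relabelings whose absolute mean difference is at least
    as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / trials

# Hypothetical test scores for two groups (made up for illustration).
group_a = [78, 85, 90, 88, 92, 81, 87, 95]
group_b = [70, 75, 80, 72, 78, 74, 69, 77]

diff, p_value = permutation_test(group_a, group_b)
print(f"observed difference = {diff:.2f}, p = {p_value:.4f}")
print("reject H0 at the 5% level" if p_value < 0.05 else "fail to reject H0")
```

For the directional alternative ("group A will do better than group B"), the comparison would count only relabelings with `diff >= observed` instead of using absolute values.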
Inferential Statistics
Possible Outcomes in Hypothesis Testing

                                      Reject the Null      Fail to reject the Null
Difference observed is really         Type I Error         Correct Decision
just sampling error
Difference observed is real           Correct Decision     Type II Error

[Figure: normal curve with a 2.5% rejection region in each tail, 5% in total]
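A Type I error means rejecting a null hypothesis that is actually true. The sketch below estimates how often that happens at the 5% level by drawing both groups from the same population. The two-sided z-test with a known standard deviation of 1, the group size, and the number of trials are all simplifying assumptions made for this illustration.

```python
import math
import random
import statistics

def type_i_error_rate(n=20, trials=4000, alpha=0.05, seed=0):
    """Share of simulated experiments that reject a true null hypothesis.

    Both groups come from the same normal population (mean 0, sd 1),
    so every rejection is a Type I error.
    """
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for alpha = 0.05.
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(2 / n)  # standard error of the difference in means
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        z = (statistics.fmean(a) - statistics.fmean(b)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

print(f"estimated Type I error rate: {type_i_error_rate():.3f}")
```

The estimate should land close to 0.05, matching the 2.5% rejection region in each tail.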
1. Increase our n
2. Decrease variability
Significance testing:
1. Between Subjects
2. Within Subjects – repeated measures
Meta-Analysis:
Research Papers:
• Garg, Ram and Goyal, Ruchi, Inferential Statistics As a Measure of Judging the Short-Term Solvency An Empirical Study of Three Steel
Companies in India (February 5, 2019). International Journal of Advanced Studies of Scientific Research, Vol. 4, No. 1, 2019, Available at
SSRN: https://ssrn.com/abstract=3329388.
• Alacaci, C. (2004). Inferential Statistics: Understanding Expert Knowledge and its Implications for Statistics Education. Journal of Statistics
Education, 12(2). https://doi.org/10.1080/10691898.2004.11910737
Websites:
• https://www.simplilearn.com/inferential-statistics-article/
• https://builtin.com/data-science/inferential-statistics#:~:text=Inferential%20statistics%20is%20the%20practice,sample%20data%20sample%20or%20population./
Videos:
• https://www.youtube.com/watch?v=cjTgyRUaD1s&list=PLbRMhDVUMngeD_vOeveVE-3b7wu_AZph9
• https://www.youtube.com/watch?v=ZmCBF5JXOPM&list=PLFW6lRTa1g80s2MWqXNg2o0haq1k14v2I
THANK YOU
For queries
Email: madan.e13485@cumail.in