Inferential Statistics For Data Science
Inferential Statistics
o Sampling Distributions & Estimation
o Hypothesis Testing (One and Two Group Means)
o Hypothesis Testing (Categorical Data)
o Hypothesis Testing (More Than Two Group Means)
o Quantitative Data (Correlation & Regression)
o Significance in Data Science
Inferential Statistics
Inferential statistics allows you to make inferences about the population from the sample data.
Sampling Distributions
Sample means become more and more normally distributed around the true mean (the population
parameter) as we increase our sample size. The variability of the sample means decreases as
sample size increases.
The Central Limit Theorem is used to help us understand the following facts regardless of
whether the population distribution is normal or not:
1. the mean of the sample means is the same as the population mean.
2. the standard deviation of the sample means equals the standard error, σ/√n, which shrinks as the sample size grows.
3. the distribution of sample means becomes increasingly normal as the sample size increases.
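The three facts above are easy to see by simulation. The sketch below draws repeated samples from a deliberately non-normal (exponential) population; the population, sample sizes, and seed are all fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
# Start from a clearly non-normal population: exponential with mean 2
population = rng.exponential(scale=2.0, size=100_000)

def sample_means(pop, sample_size, n_samples=2000):
    """Draw repeated samples and return the mean of each one."""
    idx = rng.integers(0, len(pop), size=(n_samples, sample_size))
    return pop[idx].mean(axis=1)

means_small = sample_means(population, sample_size=5)
means_large = sample_means(population, sample_size=100)

# 1. The mean of the sample means tracks the population mean
print(population.mean(), means_large.mean())
# 2. Their spread shrinks toward sigma/sqrt(n), the standard error
print(population.std() / np.sqrt(100), means_large.std())
```

Plotting a histogram of `means_large` would show a near-normal bell shape even though the underlying population is heavily skewed.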
Confidence Intervals
The confidence level indicates how often, out of 100 repeated samples, the interval constructed around the sample mean would contain the true population mean.
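As a minimal sketch, a 95% confidence interval for a mean can be computed with `scipy.stats.t.interval` when the population standard deviation is unknown; the sample below is fabricated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=170, scale=10, size=40)  # hypothetical measurements

mean = sample.mean()
sem = stats.sem(sample)                 # standard error of the mean
# 95% t-interval: appropriate when the population sigma is unknown
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"95% CI: ({low:.1f}, {high:.1f})")
```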
Hypothesis Testing
Hypothesis testing is a kind of statistical inference that involves asking a question, collecting
data, and then examining what the data tells us about how to proceed. The hypothesis to be tested
is called the null hypothesis and is given the symbol H₀. We test the null hypothesis against an
alternative hypothesis, which is given the symbol Hₐ.
When a hypothesis is tested, we must decide on how much of a difference between means is
necessary in order to reject the null hypothesis. Statisticians first choose a level of significance,
or alpha (α) level, for their hypothesis test.
Critical values are the values that indicate the edge of the critical region. Critical regions
describe the entire area of values that indicate you reject the null hypothesis.
These are the four basic steps we follow for (one & two group means) hypothesis testing:
1. state the null and alternative hypotheses.
2. choose a significance (α) level.
3. compute the test statistic from the sample data.
4. compare the statistic to the critical value (or the p-value to α) and decide whether to reject the null hypothesis.
Hypothesis Test on One Sample Mean When the Population Parameters are Known
We find the z-statistic of our sample mean in the sampling distribution and determine whether that
z-score falls within the critical (rejection) region. This test is only appropriate when you know
the true mean and standard deviation of the population.
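A minimal sketch of the z-test, assuming a hypothetical scale whose population mean and standard deviation are known (all numbers fabricated):

```python
import numpy as np
from scipy import stats

# Hypothetical setup: population mean and sd are known
mu, sigma = 100, 15
sample = np.array([108, 112, 96, 105, 110, 99, 104, 113, 107, 101])

z = (sample.mean() - mu) / (sigma / np.sqrt(len(sample)))
p = 2 * stats.norm.sf(abs(z))           # two-tailed p-value
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)  # edge of the rejection region

# Reject H0 only if |z| exceeds the critical value
print(z, p, abs(z) > z_crit)
```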
The Student’s t-distribution is similar to the normal distribution, but it is more spread out, with
thicker tails. The differences between the t-distribution and the normal distribution are more
pronounced when there are fewer data points, and therefore fewer degrees of freedom.
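When the population standard deviation is unknown, the one-sample t-test replaces the z-test. A sketch with fabricated measurements:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements tested against a claimed mean of 5.0
sample = np.array([5.1, 4.9, 5.6, 5.2, 4.7, 5.3, 5.0, 5.4])
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(t_stat, p_value)   # uses df = n - 1 = 7
```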
Estimation as a follow-up to a Hypothesis Test
When a hypothesis is rejected, it is often useful to turn to estimation to try to capture the true
value of the population mean.
Two-Sample T Tests
When we have independent samples, we assume that the scores of one sample do not affect the
other. These are compared with an unpaired t-test.
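A minimal unpaired t-test sketch with two fabricated independent groups; Welch's variant (`equal_var=False`) is the safer default when the group variances may differ:

```python
import numpy as np
from scipy import stats

group_a = np.array([23, 25, 28, 30, 22, 26, 27])
group_b = np.array([31, 29, 33, 35, 30, 32, 34])

# Welch's t-test: does not assume equal variances
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_stat, p_value)
```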
In two dependent samples of data, each score in one sample is paired with a specific score in the
other sample. These are compared with a paired t-test.
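A paired t-test sketch on fabricated before/after measurements; it is equivalent to a one-sample t-test on the paired differences:

```python
import numpy as np
from scipy import stats

before = np.array([140, 152, 138, 145, 150, 148])
after = np.array([135, 147, 136, 140, 146, 144])

# Each 'before' score is paired with the corresponding 'after' score
t_stat, p_value = stats.ttest_rel(before, after)
print(t_stat, p_value)
```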
The chi-square test is used for categorical data. It can be used to estimate how closely the
distribution of a categorical variable matches an expected distribution (the goodness-of-fit test),
or to estimate whether two categorical variables are independent of one another (the test of
independence).
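Both chi-square variants are available in scipy; the die rolls and contingency counts below are fabricated for illustration:

```python
import numpy as np
from scipy import stats

# Goodness-of-fit: does a die look fair over 120 rolls?
observed = np.array([18, 22, 16, 25, 19, 20])
chi2_gof, p_gof = stats.chisquare(observed)   # expected defaults to uniform
print(chi2_gof, p_gof)

# Test of independence on a 2x2 contingency table (hypothetical counts)
table = np.array([[30, 10],
                  [20, 40]])
chi2_ind, p_ind, dof, expected = stats.chi2_contingency(table)
print(chi2_ind, p_ind, dof)
```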
Analysis of Variance (ANOVA) allows us to test the hypothesis that the means of several
populations are equal. We could conduct a series of pairwise t-tests instead of ANOVA, but that
would be tedious as the number of groups grows and would inflate the chance of a Type I error.
ANOVA formulas: F = MS_between / MS_within, where MS_between = SS_between / (k − 1) and
MS_within = SS_within / (N − k) for k groups and N total observations.
If the F-value from the ANOVA test is greater than the F-critical value, we reject our null
hypothesis.
One-Way ANOVA
The one-way ANOVA method is the procedure for testing the null hypothesis that the population
means across the levels of a single independent variable (factor) are equal.
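A one-way ANOVA sketch on three fabricated groups, including the F-critical comparison described above:

```python
import numpy as np
from scipy import stats

g1 = np.array([85, 86, 88, 75, 78, 94])
g2 = np.array([91, 92, 93, 85, 87, 84])
g3 = np.array([79, 78, 88, 94, 92, 85])

f_stat, p_value = stats.f_oneway(g1, g2, g3)

# Compare against the critical value at alpha = 0.05
df_between = 2                                   # k - 1 groups
df_within = len(g1) + len(g2) + len(g3) - 3      # N - k observations
f_crit = stats.f.ppf(0.95, df_between, df_within)
print(f_stat, p_value, f_stat > f_crit)
```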
Two-Way ANOVA
The two-way ANOVA method is the procedure for testing null hypotheses about the population
means across the levels of two independent variables. With this method, we can study not only
the main effect of each independent variable but also the interaction between these variables.
We could also run two separate one-way ANOVAs, but a two-way ANOVA gives us efficiency,
control, and the interaction effect.
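scipy has no built-in two-way ANOVA, so here is a minimal numpy sketch for a balanced design (equal cell sizes), with fabricated data; the sums of squares for each factor, their interaction, and the error term are computed directly from the cell means:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Balanced 2x3 design with 10 replicates per cell (hypothetical data)
a, b, n = 2, 3, 10
y = rng.normal(50, 5, size=(a, b, n))
y[1] += 3  # factor A shifts the mean in this fabricated data

grand = y.mean()
mean_a = y.mean(axis=(1, 2))        # level means of factor A
mean_b = y.mean(axis=(0, 2))        # level means of factor B
mean_cell = y.mean(axis=2)          # cell means

ss_a = n * b * np.sum((mean_a - grand) ** 2)
ss_b = n * a * np.sum((mean_b - grand) ** 2)
ss_ab = n * np.sum((mean_cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
ss_err = np.sum((y - mean_cell[:, :, None]) ** 2)

df_a, df_b = a - 1, b - 1
df_ab, df_err = df_a * df_b, a * b * (n - 1)
ms_err = ss_err / df_err

f_a = (ss_a / df_a) / ms_err        # main effect of A
f_b = (ss_b / df_b) / ms_err        # main effect of B
f_ab = (ss_ab / df_ab) / ms_err     # interaction
p_a = stats.f.sf(f_a, df_a, df_err)
print(f_a, f_b, f_ab, p_a)
```

In practice a formula interface such as `statsmodels`' `anova_lm` would typically be used instead of hand-computed sums of squares.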
Correlation
Correlation measures the strength and direction of the linear relationship between two
quantitative variables.
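A quick sketch of the Pearson correlation coefficient on fabricated data with a built-in linear relationship:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)  # fabricated linear relationship

r, p_value = stats.pearsonr(x, y)
print(r, p_value)   # r near +1 indicates a strong positive linear relationship
```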
Regression
Regression analysis is a set of statistical processes for estimating the relationships among
variables.
Simple Regression
This method uses a single independent variable to predict a dependent variable by fitting the line
that best describes their relationship.
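A simple-regression sketch with `scipy.stats.linregress` on fabricated points that roughly follow y = 2x:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1, 14.2, 15.9])

result = stats.linregress(x, y)
print(result.slope, result.intercept, result.rvalue ** 2)

# Predict the dependent variable for a new x value
y_pred = result.intercept + result.slope * 9
print(y_pred)
```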
Multiple Regression
This method uses more than one independent variable to predict a dependent variable by fitting
the best relationship. It works best when multicollinearity, a phenomenon in which two or more
predictor variables are highly correlated, is absent.
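A multiple-regression sketch via ordinary least squares with numpy only; the two predictors below are generated independently, so multicollinearity is absent by construction (all coefficients fabricated):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)              # independent predictors, no collinearity
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.3, size=n)

# Design matrix with an intercept column; solve by least squares
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # estimates of (intercept, slope for x1, slope for x2)
```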
Nonlinear Regression
In this method, observational data are modeled by a function that is a nonlinear combination
of the model parameters and depends on one or more independent variables.
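A nonlinear-regression sketch with `scipy.optimize.curve_fit`, fitting a hypothetical exponential model to fabricated noisy data:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(b * x)         # nonlinear in the parameter b

rng = np.random.default_rng(5)
x = np.linspace(0, 2, 50)
y = model(x, 2.5, 1.3) + rng.normal(scale=0.2, size=x.size)

# Initial guess p0 matters for nonlinear fits
params, cov = curve_fit(model, x, y, p0=(1.0, 1.0))
print(params)   # estimates of (a, b)
```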
Significance in Data Science