Hypothesis Testing PDF

The document discusses different statistical tests that can be used for analyzing categorical and continuous variables from sample data including the chi-square test, t-tests, ANOVA tests, and correlation. Examples of applying each test using Python are shown.

Uploaded by

mdkashif1299


Chi-Square Test

The chi-square test of independence applies when you have two categorical variables measured on a single population. It determines whether there is a statistically significant association between the two variables.

import scipy.stats as stats

import seaborn as sns

import pandas as pd
import numpy as np
dataset=sns.load_dataset('tips')

dataset.head()

total_bill tip sex smoker day time size

0 16.99 1.01 Female No Sun Dinner 2

1 10.34 1.66 Male No Sun Dinner 3

2 21.01 3.50 Male No Sun Dinner 3

3 23.68 3.31 Male No Sun Dinner 2

4 24.59 3.61 Female No Sun Dinner 4

dataset_table=pd.crosstab(dataset['sex'],dataset['smoker'])
print(dataset_table)

smoker Yes No
sex
Male 60 97
Female 33 54

dataset_table.values

array([[60, 97],
[33, 54]], dtype=int64)

#Observed Values
Observed_Values = dataset_table.values
print("Observed Values :-\n",Observed_Values)

Observed Values :-
[[60 97]
[33 54]]

val=stats.chi2_contingency(dataset_table)

val

(0.008763290531773594,
0.925417020494423,
1,
array([[59.84016393, 97.15983607],
[33.15983607, 53.84016393]]))

Expected_Values=val[3]

no_of_rows=len(dataset_table.iloc[0:2,0])
no_of_columns=len(dataset_table.iloc[0,0:2])
ddof=(no_of_rows-1)*(no_of_columns-1)
print("Degree of Freedom:-",ddof)
alpha = 0.05

Degree of Freedom:- 1

from scipy.stats import chi2

chi_square=sum([(o-e)**2./e for o,e in zip(Observed_Values,Expected_Values)])
chi_square_statistic=chi_square[0]+chi_square[1]

print("chi-square statistic:-",chi_square_statistic)

chi-square statistic:- 0.001934818536627623

critical_value=chi2.ppf(q=1-alpha,df=ddof)
print('critical_value:',critical_value)

critical_value: 3.841458820694124

#p-value
p_value=1-chi2.cdf(x=chi_square_statistic,df=ddof)
print('p-value:',p_value)
print('Significance level: ',alpha)
print('Degree of Freedom: ',ddof)

p-value: 0.964915107315732
Significance level: 0.05
Degree of Freedom: 1

if chi_square_statistic>=critical_value:
    print("Reject H0,There is a relationship between 2 categorical variables")
else:
    print("Retain H0,There is no relationship between 2 categorical variables")

if p_value<=alpha:
    print("Reject H0,There is a relationship between 2 categorical variables")
else:
    print("Retain H0,There is no relationship between 2 categorical variables")

Retain H0,There is no relationship between 2 categorical variables

T Test
A t-test is an inferential statistical test used to determine whether there is a significant difference between the means of two groups, which may be related in certain features.

The t-test comes in several variants. This document covers three: 1. one-sample t-test 2. two-sample (independent) t-test 3. paired t-test.

One-sample T-test with Python

The one-sample t-test tells us whether the mean of a sample differs significantly from a known or hypothesized population mean.

ages=[10,20,35,50,28,40,55,18,16,55,30,25,43,18,30,28,14,24,16,17,32,35,26,27,65,18,43,23,21,20,19,70]

len(ages)

from statsmodels.stats.weightstats import ztest

import numpy as np
ages_mean=np.mean(ages)
print(ages_mean)

30.34375

## Let's take a sample

sample_size=8
age_sample=np.random.choice(ages,sample_size)

age_sample

array([35, 16, 43, 30, 24, 43, 30, 27])

np.mean(age_sample)

31.0

from scipy.stats import ttest_1samp

ttest,p_value=ttest_1samp(age_sample,30)

print(p_value)

0.7681189381229006

if p_value < 0.05: # alpha value is 0.05 or 5%
    print(" we are rejecting null hypothesis")
else:
    print("we fail to reject the null hypothesis")

we fail to reject the null hypothesis
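
To make the calculation behind ttest_1samp visible, here is a minimal sketch that recomputes the statistic as t = (sample mean - population mean) / (s / sqrt(n)) for the sample printed above and compares it with SciPy's result. The names t_manual and p_manual are illustrative, not from the original notebook.

```python
import numpy as np
from scipy import stats

# Sample printed above; 30 is the hypothesized population mean used in the text
age_sample = np.array([35, 16, 43, 30, 24, 43, 30, 27])
pop_mean = 30

n = len(age_sample)
sample_mean = age_sample.mean()
sample_sd = age_sample.std(ddof=1)        # sample standard deviation
standard_error = sample_sd / np.sqrt(n)

t_manual = (sample_mean - pop_mean) / standard_error
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(age_sample, pop_mean)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```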

Some More Examples

Consider the age of students in a college and in Class A

import numpy as np
import pandas as pd
import scipy.stats as stats
import math
np.random.seed(6)
school_ages=stats.poisson.rvs(loc=18,mu=35,size=1500)
classA_ages=stats.poisson.rvs(loc=18,mu=30,size=25)

np.mean(school_ages)

53.303333333333335

classA_ages.mean()

48.2

_,p_value=stats.ttest_1samp(a=classA_ages,popmean=school_ages.mean())

p_value

3.26936314797003e-05

school_ages.mean()

53.303333333333335

if p_value < 0.05: # alpha value is 0.05 or 5%
    print(" we are rejecting null hypothesis")
else:
    print("we fail to reject the null hypothesis")

we are rejecting null hypothesis

Two-sample T-test With Python

The independent-samples t-test (also called the two-sample t-test) compares the means of two independent groups to determine whether there is statistical evidence that the associated population means differ. It is a parametric test.

np.random.seed(12)
ClassB_ages=stats.poisson.rvs(loc=18,mu=33,size=60)
ClassB_ages.mean()

50.63333333333333

_,p_value=stats.ttest_ind(a=classA_ages,b=ClassB_ages,equal_var=False)

p_value

0.06021969607248894

if p_value < 0.05: # alpha value is 0.05 or 5%
    print(" we are rejecting null hypothesis")
else:
    print("we fail to reject the null hypothesis")

we fail to reject the null hypothesis
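
Passing equal_var=False to ttest_ind selects Welch's t-test, which does not assume equal variances in the two groups. The sketch below recomputes Welch's statistic and the Welch-Satterthwaite degrees of freedom by hand. The two arrays are hypothetical ages chosen for illustration, not the data generated above.

```python
import numpy as np
from scipy import stats

# Hypothetical ages for two independent groups (not the data generated above)
group_a = np.array([48, 52, 45, 50, 47, 55, 43, 49])
group_b = np.array([51, 54, 49, 58, 53, 50, 56, 52, 57, 55])

mean_a, mean_b = group_a.mean(), group_b.mean()
# Squared standard error of each group mean
se2_a = group_a.var(ddof=1) / len(group_a)
se2_b = group_b.var(ddof=1) / len(group_b)

t_manual = (mean_a - mean_b) / np.sqrt(se2_a + se2_b)
# Welch-Satterthwaite approximation of the degrees of freedom
df_welch = (se2_a + se2_b) ** 2 / (
    se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
)
p_manual = 2 * stats.t.sf(abs(t_manual), df=df_welch)

t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_manual, p_manual, df_welch)
print(t_scipy, p_scipy)
```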

Paired T-test With Python

A paired t-test applies when you have two measurements on the same subjects (for example, before and after a treatment) and want to check whether the mean difference between the pairs is zero.

weight1=[25,30,28,35,28,34,26,29,30,26,28,32,31,30,45]
weight2=weight1+stats.norm.rvs(scale=5,loc=-1.25,size=15)

print(weight1)
print(weight2)

[25, 30, 28, 35, 28, 34, 26, 29, 30, 26, 28, 32, 31, 30, 45]
[30.57926457 34.91022437 29.00444617 30.54295091 19.86201983 37.57873174
18.3299827 21.3771395 36.36420881 32.05941216 26.93827982 29.519014
26.42851213 30.50667769 41.32984284]

weight_df=pd.DataFrame({"weight_10":np.array(weight1),
"weight_20":np.array(weight2),
"weight_change":np.array(weight2)-np.array(weight1)})

weight_df

weight_10 weight_20 weight_change

0 25 30.579265 5.579265

1 30 34.910224 4.910224

2 28 29.004446 1.004446

3 35 30.542951 -4.457049

4 28 19.862020 -8.137980

5 34 37.578732 3.578732

6 26 18.329983 -7.670017

7 29 21.377139 -7.622861

8 30 36.364209 6.364209

9 26 32.059412 6.059412

10 28 26.938280 -1.061720

11 32 29.519014 -2.480986

12 31 26.428512 -4.571488

13 30 30.506678 0.506678

14 45 41.329843 -3.670157

_,p_value=stats.ttest_rel(a=weight1,b=weight2)

print(p_value)
0.5732936534411279

if p_value < 0.05: # alpha value is 0.05 or 5%
    print(" we are rejecting null hypothesis")
else:
    print("we fail to reject the null hypothesis")

we fail to reject the null hypothesis
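
A paired t-test is equivalent to a one-sample t-test on the per-pair differences against a mean of zero. The sketch below illustrates this with the weight_change column from the table above (rounded to 6 decimals, so the p-value matches the notebook's only approximately). The names p_paired and p_diff are illustrative.

```python
import numpy as np
from scipy import stats

weight1 = np.array([25, 30, 28, 35, 28, 34, 26, 29, 30, 26, 28, 32, 31, 30, 45])
# weight_change column copied from the table above (rounded to 6 decimals)
weight_change = np.array([5.579265, 4.910224, 1.004446, -4.457049, -8.137980,
                          3.578732, -7.670017, -7.622861, 6.364209, 6.059412,
                          -1.061720, -2.480986, -4.571488, 0.506678, -3.670157])
weight2 = weight1 + weight_change

# Paired test on the two measurement series
_, p_paired = stats.ttest_rel(weight1, weight2)
# Equivalent formulation: is the mean of the differences zero?
_, p_diff = stats.ttest_1samp(weight1 - weight2, 0)
print(p_paired, p_diff)
```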

Correlation
import seaborn as sns
df=sns.load_dataset('iris')

df.shape

(150, 5)

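# numeric_only=True skips the non-numeric 'species' column (needed in pandas 2.0 and later)
df.corr(numeric_only=True)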

sepal_length sepal_width petal_length petal_width

sepal_length 1.000000 -0.117570 0.871754 0.817941

sepal_width -0.117570 1.000000 -0.428440 -0.366126

petal_length 0.871754 -0.428440 1.000000 0.962865

petal_width 0.817941 -0.366126 0.962865 1.000000
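
The correlation matrix reports coefficients only. To test whether a single correlation differs significantly from zero, one option is scipy.stats.pearsonr, which returns the coefficient together with a p-value. The sketch below uses hypothetical paired measurements, not the iris data, so the numbers it prints are not comparable to the table above.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (not taken from the iris dataset)
x = np.array([1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9, 5.6])
y = np.array([0.2, 0.2, 0.3, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1, 1.8])

# Pearson correlation coefficient and the p-value for H0: no linear correlation
r, p_value = stats.pearsonr(x, y)

# The same coefficient from NumPy's correlation matrix, as a cross-check
r_numpy = np.corrcoef(x, y)[0, 1]

print("r =", r, "p-value =", p_value)
```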

sns.pairplot(df)

<seaborn.axisgrid.PairGrid at 0x29595f8ea60>

ANOVA Test (F-Test)
The t-test works well when dealing with two groups, but sometimes we want to compare more than two groups at the same time.

For example, to test whether petal width differs across a categorical variable such as species, we have to compare the means of every level (group) of that variable.

One-Way F-test (ANOVA)

It tells us whether two or more groups are similar, based on their means and the F-score.

Example: there are 3 categories of iris flowers, each with petal width measurements, and we need to check whether all 3 groups are similar.

import seaborn as sns

df1=sns.load_dataset('iris')

df1.head()

sepal_length sepal_width petal_length petal_width species

0 5.1 3.5 1.4 0.2 setosa

1 4.9 3.0 1.4 0.2 setosa

2 4.7 3.2 1.3 0.2 setosa

3 4.6 3.1 1.5 0.2 setosa

4 5.0 3.6 1.4 0.2 setosa

df_anova = df1[['petal_width','species']]

grps = pd.unique(df_anova.species.values)

grps

array(['setosa', 'versicolor', 'virginica'], dtype=object)

d_data = {grp:df_anova['petal_width'][df_anova.species == grp] for grp in grps}

d_data

{'setosa': 0 0.2
1 0.2
2 0.2
3 0.2
4 0.2
5 0.4
6 0.3
7 0.2
8 0.2
9 0.1
10 0.2
11 0.2
12 0.1
13 0.1
14 0.2
15 0.4
16 0.4
17 0.3
18 0.3
19 0.3
20 0.2
21 0.4
22 0.2
23 0.5
24 0.2
25 0.2
26 0.4
27 0.2
28 0.2
29 0.2
30 0.2
31 0.4
32 0.1
33 0.2
34 0.2
35 0.2
36 0.2
37 0.1
38 0.2
39 0.2
40 0.3
41 0.3
42 0.2
43 0.6
44 0.4
45 0.3
46 0.2
47 0.2
48 0.2
49 0.2
Name: petal_width, dtype: float64,
'versicolor': 50 1.4
51 1.5
52 1.5
53 1.3
54 1.5
55 1.3
56 1.6
57 1.0
58 1.3
59 1.4
60 1.0
61 1.5
62 1.0
63 1.4
64 1.3
65 1.4
66 1.5
67 1.0
68 1.5
69 1.1
70 1.8
71 1.3
72 1.5
73 1.2
74 1.3
75 1.4
76 1.4
77 1.7
78 1.5
79 1.0
80 1.1
81 1.0
82 1.2
83 1.6
84 1.5
85 1.6
86 1.5
87 1.3
88 1.3
89 1.3
90 1.2
91 1.4
92 1.2
93 1.0
94 1.3
95 1.2
96 1.3
97 1.3
98 1.1
99 1.3
Name: petal_width, dtype: float64,
'virginica': 100 2.5
101 1.9
102 2.1
103 1.8
104 2.2
105 2.1
106 1.7
107 1.8
108 1.8
109 2.5
110 2.0
111 1.9
112 2.1
113 2.0
114 2.4
115 2.3
116 1.8
117 2.2
118 2.3
119 1.5
120 2.3
121 2.0
122 2.0
123 1.8
124 2.1
125 1.8
126 1.8
127 1.8
128 2.1
129 1.6
130 1.9
131 2.0
132 2.2
133 1.5
134 1.4
135 2.3
136 2.4
137 1.8
138 1.8
139 2.1
140 2.4
141 2.3
142 1.9
143 2.3
144 2.5
145 2.3
146 1.9
147 2.0
148 2.3
149 1.8
Name: petal_width, dtype: float64}

F, p = stats.f_oneway(d_data['setosa'], d_data['versicolor'], d_data['virginica'])

print(p)

4.169445839443116e-85

if p<0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

reject null hypothesis
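
The F-score returned by f_oneway is the ratio of between-group variance to within-group variance. As a rough sketch of that calculation, the code below computes both sums of squares by hand for three small hypothetical groups (not the iris data) and checks the result against SciPy.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three groups (not the iris data)
groups = [
    np.array([0.2, 0.3, 0.2, 0.4, 0.1]),
    np.array([1.3, 1.5, 1.0, 1.4, 1.2]),
    np.array([2.1, 1.8, 2.5, 2.0, 2.3]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)             # number of groups
n_total = len(all_values)   # total number of observations

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, p_manual)
print(f_scipy, p_scipy)
```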

Z Test

# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

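# Generate a random array of 50 numbers with mean 110, loosely modeled on IQ scores.
# Note: sd_iq below is 15/sqrt(50) (about 2.12), the standard error of a 50-sample mean
# when the sd is 15, so the generated data have an sd of about 2.12 rather than 15.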
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha =0.05
null_mean =100
data = sd_iq*randn(50)+mean_iq

# print mean and sd

print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

mean=109.61 stdv=2.22

# Now we perform the test. We pass the data, the mean under the null hypothesis in the
# value parameter, and alternative='larger' to test whether the true mean is larger.

ztest_Score, p_value= ztest(data,value = null_mean, alternative='larger')

# The function returns the z-score and the corresponding p-value. We compare the
# p-value with alpha: if it is greater than alpha we fail to reject the null hypothesis,
# otherwise we reject it.

if(p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

Reject Null Hypothesis
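
For reference, a one-sample z-test with a "larger" alternative can be sketched directly with NumPy and SciPy: z = (sample mean - null mean) / standard error, and the p-value is the upper tail of the standard normal distribution. This sketch makes its own assumptions: it draws data with standard deviation 15 from a seeded generator and estimates the standard error from the sample standard deviation with ddof=1. It is not a reproduction of the run above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
data = rng.normal(loc=110, scale=15, size=50)
null_mean = 100
alpha = 0.05

# z = (sample mean - hypothesized mean) / standard error of the mean
standard_error = data.std(ddof=1) / np.sqrt(len(data))
z_score = (data.mean() - null_mean) / standard_error

# One-sided p-value for the alternative "mean is larger than null_mean"
p_value = norm.sf(z_score)

print("z =", z_score, "p-value =", p_value)
print("Reject" if p_value < alpha else "Fail to reject")
```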

### EDA Assignment

from sklearn.datasets import fetch_openml
housing = fetch_openml(name="house_prices", as_frame=True)
df=housing['data']

df.head()

Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities ... ScreenPorch PoolArea PoolQC Fence

0 1.0 60.0 RL 65.0 8450.0 Pave None Reg Lvl AllPub ... 0.0 0.0 None None

1 2.0 20.0 RL 80.0 9600.0 Pave None Reg Lvl AllPub ... 0.0 0.0 None None

2 3.0 60.0 RL 68.0 11250.0 Pave None IR1 Lvl AllPub ... 0.0 0.0 None None

3 4.0 70.0 RL 60.0 9550.0 Pave None IR1 Lvl AllPub ... 0.0 0.0 None None

4 5.0 60.0 RL 84.0 14260.0 Pave None IR1 Lvl AllPub ... 0.0 0.0 None None

5 rows × 80 columns
import seaborn as sns
sns.load_dataset('titanic')

survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone

0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False

1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False

2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True

3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False

4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

886 0 2 male 27.0 0 0 13.0000 S Second man True NaN Southampton no True

887 1 1 female 19.0 0 0 30.0000 S First woman False B Southampton yes True

888 0 3 female NaN 1 2 23.4500 S Third woman False NaN Southampton no False

889 1 1 male 26.0 0 0 30.0000 C First man True C Cherbourg yes True

890 0 3 male 32.0 0 0 7.7500 Q Third man True NaN Queenstown no True

891 rows × 15 columns

#Assignments: EDA Of Algerian Dataset

#https://archive.ics.uci.edu/ml/datasets/Algerian+Forest+Fires+Dataset++

#Assignments: Housing Dataset(California)
