Chi-square

The document outlines a data analysis process using the 'tips' dataset from Seaborn, focusing on feature extraction and one-hot encoding of categorical variables. It includes steps for preparing the data, performing a Chi-square test to assess the relationship between features and the target variable 'tip', and provides an example of how to conduct a Chi-square test with hypothetical data. The analysis results show the Chi-square values and p-values for various features, indicating their significance in relation to the target variable.


# import libraries
import seaborn as sns
import pandas as pd

# create the dataframe from the seaborn 'tips' dataset
df = sns.load_dataset('tips')

df.head(5)

total_bill tip sex smoker day time size


0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

df.dtypes

total_bill float64
tip float64
sex category
smoker category
day category
time category
size int64
dtype: object

df.isnull().sum()

total_bill 0
tip 0
sex 0
smoker 0
day 0
time 0
size 0
dtype: int64

df.shape

(244, 7)

## Extract categorical columns from the dataframe

# Here we extract the columns with 'category' dtype, as they are the categorical columns
categorical_columns = df.select_dtypes(include=['category']).columns.tolist()

categorical_columns

['sex', 'smoker', 'day', 'time']


# creating features (all columns except the target 'tip')
X = df.drop('tip', axis=1)
X

total_bill sex smoker day time size


0 16.99 Female No Sun Dinner 2
1 10.34 Male No Sun Dinner 3
2 21.01 Male No Sun Dinner 3
3 23.68 Male No Sun Dinner 2
4 24.59 Female No Sun Dinner 4
.. ... ... ... ... ... ...
239 29.03 Male No Sat Dinner 3
240 27.18 Female Yes Sat Dinner 2
241 22.67 Male Yes Sat Dinner 2
242 17.82 Male No Sat Dinner 2
243 18.78 Female No Thur Dinner 2

[244 rows x 6 columns]

# creating target variable
y = df['tip']
y

0 1.01
1 1.66
2 3.50
3 3.31
4 3.61
...
239 5.92
240 2.00
241 2.00
242 1.75
243 3.00
Name: tip, Length: 244, dtype: float64

# one-hot encoding using OneHotEncoder from scikit-learn
from sklearn.preprocessing import OneHotEncoder

# Initialize OneHotEncoder (dense output, integer dtype)
encoder = OneHotEncoder(sparse=False, dtype=int)

encoder

OneHotEncoder(dtype=<class 'int'>, sparse=False)

# Apply one-hot encoding to the categorical columns (e.g. df[['sex', 'day']] or df[categorical_columns])
one_hot_encoded = encoder.fit_transform(df[categorical_columns])

C:\Users\lucky\anaconda3\Lib\site-packages\sklearn\preprocessing\_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
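The FutureWarning above only concerns the parameter name: from scikit-learn 1.2 onward the dense-output flag is called sparse_output. A minimal sketch of the equivalent initialization on a newer scikit-learn (assuming version >= 1.2) would be:

# On scikit-learn >= 1.2 the `sparse` flag is called `sparse_output`;
# this produces the same dense integer array as the call above.
encoder = OneHotEncoder(sparse_output=False, dtype=int)
one_hot_encoded = encoder.fit_transform(df[categorical_columns])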

# Create a DataFrame with the one-hot encoded columns
# We use get_feature_names_out() to get the column names for the encoded data
encoded_df = pd.DataFrame(one_hot_encoded,
                          columns=encoder.get_feature_names_out(categorical_columns))

encoded_df

     sex_Female  sex_Male  smoker_No  smoker_Yes  day_Fri  day_Sat  day_Sun  day_Thur  time_Dinner  time_Lunch
0             1         0          1           0        0        0        1         0            1           0
1             0         1          1           0        0        0        1         0            1           0
2             0         1          1           0        0        0        1         0            1           0
3             0         1          1           0        0        0        1         0            1           0
4             1         0          1           0        0        0        1         0            1           0
..          ...       ...        ...         ...      ...      ...      ...       ...          ...         ...
239           0         1          1           0        0        1        0         0            1           0
240           1         0          0           1        0        1        0         0            1           0
241           0         1          0           1        0        1        0         0            1           0
242           0         1          1           0        0        1        0         0            1           0
243           1         0          1           0        0        0        0         1            1           0

[244 rows x 10 columns]

# Concatenate the one-hot encoded dataframe with the original dataframe
df_encoded = pd.concat([df, encoded_df], axis=1)

df_encoded

     total_bill   tip     sex smoker   day    time  size  sex_Female  sex_Male  smoker_No  smoker_Yes  day_Fri  day_Sat  day_Sun  day_Thur  time_Dinner  time_Lunch
0         16.99  1.01  Female     No   Sun  Dinner     2           1         0          1           0        0        0        1         0            1           0
1         10.34  1.66    Male     No   Sun  Dinner     3           0         1          1           0        0        0        1         0            1           0
2         21.01  3.50    Male     No   Sun  Dinner     3           0         1          1           0        0        0        1         0            1           0
3         23.68  3.31    Male     No   Sun  Dinner     2           0         1          1           0        0        0        1         0            1           0
4         24.59  3.61  Female     No   Sun  Dinner     4           1         0          1           0        0        0        1         0            1           0
..          ...   ...     ...    ...   ...     ...   ...         ...       ...        ...         ...      ...      ...      ...       ...          ...         ...
239       29.03  5.92    Male     No   Sat  Dinner     3           0         1          1           0        0        1        0         0            1           0
240       27.18  2.00  Female    Yes   Sat  Dinner     2           1         0          0           1        0        1        0         0            1           0
241       22.67  2.00    Male    Yes   Sat  Dinner     2           0         1          0           1        0        1        0         0            1           0
242       17.82  1.75    Male     No   Sat  Dinner     2           0         1          1           0        0        1        0         0            1           0
243       18.78  3.00  Female     No  Thur  Dinner     2           1         0          1           0        0        0        0         1            1           0

[244 rows x 17 columns]

# Drop the original categorical columns


df_encoded = df_encoded.drop(categorical_columns, axis=1)

df_encoded

     total_bill   tip  size  sex_Female  sex_Male  smoker_No  smoker_Yes  day_Fri  day_Sat  day_Sun  day_Thur  time_Dinner  time_Lunch
0         16.99  1.01     2           1         0          1           0        0        0        1         0            1           0
1         10.34  1.66     3           0         1          1           0        0        0        1         0            1           0
2         21.01  3.50     3           0         1          1           0        0        0        1         0            1           0
3         23.68  3.31     2           0         1          1           0        0        0        1         0            1           0
4         24.59  3.61     4           1         0          1           0        0        0        1         0            1           0
..          ...   ...   ...         ...       ...        ...         ...      ...      ...      ...       ...          ...         ...
239       29.03  5.92     3           0         1          1           0        0        1        0         0            1           0
240       27.18  2.00     2           1         0          0           1        0        1        0         0            1           0
241       22.67  2.00     2           0         1          0           1        0        1        0         0            1           0
242       17.82  1.75     2           0         1          1           0        0        1        0         0            1           0
243       18.78  3.00     2           1         0          1           0        0        0        0         1            1           0

[244 rows x 13 columns]
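As a side note, the same encode-and-drop result can be produced in one step with pandas.get_dummies. This is an alternative sketch, not part of the original notebook:

# Alternative sketch: pandas.get_dummies one-hot encodes the listed columns
# and drops the originals in a single call, giving the same 13-column frame.
df_encoded_alt = pd.get_dummies(df, columns=categorical_columns, dtype=int)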

# Features: everything except the target
X = df_encoded.drop('tip', axis=1)

# Target: 'tip' cast to int, since sklearn's chi2 expects a discrete (class-like) target
y = df_encoded['tip'].astype('int')

from sklearn.feature_selection import chi2

chi2_value, p_value = chi2(X, y)

chi2_value

array([451.45173551,  23.98377395,   1.99590599,   1.1060116 ,
         2.41410127,   3.9196698 ,   4.64580879,   5.68446664,
         7.50191554,   8.85006698,   3.79699815,   9.82752462])

p_value

array([1.80666902e-92, 2.30619124e-03, 9.81137095e-01, 9.97485792e-01,
       9.65616197e-01, 8.64297389e-01, 7.94674362e-01, 6.82528082e-01,
       4.83569431e-01, 3.55101700e-01, 8.74958733e-01, 2.77340784e-01])

import numpy as np

p_value = np.around(p_value, 6)

temp = pd.DataFrame({'features': X.columns,
                     'chi_square': chi2_value,
                     'p_value': p_value})
temp

features chi_square p_value


0 total_bill 451.451736 0.000000
1 size 23.983774 0.002306
2 sex_Female 1.995906 0.981137
3 sex_Male 1.106012 0.997486
4 smoker_No 2.414101 0.965616
5 smoker_Yes 3.919670 0.864297
6 day_Fri 4.645809 0.794674
7 day_Sat 5.684467 0.682528
8 day_Sun 7.501916 0.483569
9 day_Thur 8.850067 0.355102
10 time_Dinner 3.796998 0.874959
11 time_Lunch 9.827525 0.277341

temp.sort_values('chi_square',ascending=False)

features chi_square p_value


0 total_bill 451.451736 0.000000
1 size 23.983774 0.002306
11 time_Lunch 9.827525 0.277341
9 day_Thur 8.850067 0.355102
8 day_Sun 7.501916 0.483569
7 day_Sat 5.684467 0.682528
6 day_Fri 4.645809 0.794674
5 smoker_Yes 3.919670 0.864297
10 time_Dinner 3.796998 0.874959
4 smoker_No 2.414101 0.965616
2 sex_Female 1.995906 0.981137
3 sex_Male 1.106012 0.997486
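To turn these scores into an actual feature subset, scikit-learn's SelectKBest can wrap the same chi2 scorer. The following is a minimal sketch; the choice of k=5 is illustrative and not part of the original notebook:

from sklearn.feature_selection import SelectKBest, chi2

# Keep the k features with the highest chi-square scores (k=5 is an arbitrary illustration)
selector = SelectKBest(score_func=chi2, k=5)
X_selected = selector.fit_transform(X, y)

# Names of the retained columns
selected_features = X.columns[selector.get_support()]
print(selected_features.tolist())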

Let's walk through an example of a Chi-square test with a small, hypothetical dataset. Imagine we are analyzing the
relationship between gender and preference for a new product. We want to determine whether
gender (male/female) affects the likelihood of liking or disliking the product. This gives two
categorical variables: Gender and Product Preference.

Data Example:
Gender Likes Product Dislikes Product Total
Male 30 10 40
Female 20 30 50
Total 50 40 90

Hypotheses:
• Null Hypothesis (H₀): Gender and product preference are independent (no association).
• Alternative Hypothesis (H₁): Gender and product preference are dependent (there is an
association).
Step-by-Step Chi-square Test:
1. Observed Frequencies:
The table above shows the observed frequencies (real counts from our data).

2. Expected Frequencies:
We calculate the expected frequency for each cell assuming that the two variables are
independent, using the formula:

\[ E_{ij} = \frac{\text{Row Total}_i \times \text{Column Total}_j}{\text{Grand Total}} \]

For example, the expected frequency of males who like the product is:

\[ E_{\text{Male, Likes}} = \frac{40 \times 50}{90} = 22.22 \]

Similarly, we can calculate the expected frequency for each cell:

Gender   Likes Product (Expected)   Dislikes Product (Expected)
Male     22.22                      17.78
Female   27.78                      22.22
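As a quick sketch (not part of the original text), the expected counts can be computed with NumPy from the observed table via an outer product of the row and column totals:

import numpy as np

# Observed 2x2 table: rows = (Male, Female), columns = (Likes, Dislikes)
observed = np.array([[30, 10],
                     [20, 30]])

row_totals = observed.sum(axis=1)    # [40, 50]
col_totals = observed.sum(axis=0)    # [50, 40]
grand_total = observed.sum()         # 90

# E_ij = (row total_i * column total_j) / grand total
expected = np.outer(row_totals, col_totals) / grand_total
print(expected.round(2))             # [[22.22 17.78] [27.78 22.22]]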

3. Chi-square Statistic:
We compute the Chi-square statistic using the formula:

\[ \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

where \(O_{ij}\) is the observed frequency and \(E_{ij}\) is the expected frequency.

For each cell:

• For males who like the product: \(\frac{(30 - 22.22)^2}{22.22} = 2.72\)
• For males who dislike the product: \(\frac{(10 - 17.78)^2}{17.78} = 3.40\)
• For females who like the product: \(\frac{(20 - 27.78)^2}{27.78} = 2.18\)
• For females who dislike the product: \(\frac{(30 - 22.22)^2}{22.22} = 2.72\)

The total Chi-square statistic is:

\[ \chi^2 = 2.72 + 3.40 + 2.18 + 2.72 = 11.02 \]

4. Degrees of Freedom:
The degrees of freedom for a Chi-square test of independence are calculated as:

\[ \text{Degrees of Freedom} = (r - 1) \times (c - 1) \]

where:

• \(r\) is the number of rows (2: male and female).
• \(c\) is the number of columns (2: like and dislike).

\[ \text{Degrees of Freedom} = (2 - 1) \times (2 - 1) = 1 \]

5. p-value:
Using a Chi-square distribution table or a calculator, we find the p-value for \(\chi^2 = 11.02\) with 1 degree of freedom.

The p-value turns out to be p ≈ 0.0009.
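For reference, the same p-value can be obtained programmatically. A minimal sketch using SciPy's chi-square survival function is shown below (SciPy is an assumption here; it is not used elsewhere in this document):

from scipy.stats import chi2

# P(X >= 11.02) for a chi-square distribution with 1 degree of freedom
p = chi2.sf(11.02, df=1)
print(round(p, 4))  # ~0.0009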

6. Conclusion:
Since p < 0.05, we reject the null hypothesis and conclude that gender and product preference
are dependent — there is a significant relationship between gender and liking/disliking the
product.

This is how you would apply a Chi-square test in practice!
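The whole worked example can also be reproduced with scipy.stats.chi2_contingency. The sketch below is for verification only, with Yates' continuity correction disabled so the statistic matches the hand calculation above:

import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = (Male, Female), columns = (Likes, Dislikes)
observed = np.array([[30, 10],
                     [20, 30]])

# correction=False disables Yates' continuity correction so the statistic
# matches the manual computation (chi2 ≈ 11.02, p ≈ 0.0009, dof = 1)
chi2_stat, p, dof, expected = chi2_contingency(observed, correction=False)
print(chi2_stat, p, dof)
print(expected)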
