Assignment on ANOVA

Name: Ansari Mohammed Shanouf Valijan

Class: B.E. Computer Engineering, Semester - VII
UID: 2021300004
Batch: Monday (30-09-2024)

Problem Statement:
Considering the stroke dataset, perform a one-way ANOVA test to determine whether the
smoking status of a person plays a significant role in the person's body mass index (BMI).
Further, include gender as an additional factor and perform a two-way ANOVA test to
determine the significance of the above-mentioned factors (individual and combined) on
the body mass index of a person.

Implementation:
The steps below show how the task was carried out and how each decision was reached.

Importing the dataset as a pandas dataframe

import pandas as pd
df = pd.read_csv('/content/healthcare-dataset-stroke-data.csv')
df

Getting rid of those rows in the dataset where smoking status of the patient is unknown
df_clean = df[(df['smoking_status'] != 'Unknown')]
df_clean

Dropping the rows from the dataset where BMI value is missing
df_clean = df_clean.dropna(subset=['bmi'])
df_clean

Dropping all the columns from the dataset that are unrelated to the analysis task
df = df_clean
df = df.drop(['id', 'age', 'hypertension', 'heart_disease', 'ever_married',
              'work_type', 'Residence_type', 'avg_glucose_level', 'stroke'], axis=1)

The processed data above was used for all subsequent ANOVA testing.

Implementation of one-way ANOVA-

Importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

Grouping the dataset based on categories present in smoking status (never smoked,
formerly smoked, smokes)
grouped = df.groupby('smoking_status')['bmi'].apply(list)
grouped

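Calculating the between-group and within-group sums of squares of bmi (based on categories
of smoking status). Further, calculating the F-statistic and p-value
f_statistic, p_value = stats.f_oneway(*grouped)
overall_mean = df['bmi'].mean()
# Between-group SS: group size times squared distance of the group mean from the overall mean
ss_between = sum(len(group) * (np.mean(group) - overall_mean) ** 2 for group in grouped)
# Within-group SS: squared distance of each observation from its own group mean
ss_within = sum(sum((x - np.mean(group)) ** 2 for x in group) for group in grouped)
print(f"SS Between: {ss_between}")
print(f"SS Within: {ss_within}")
print(f"F-statistic: {f_statistic}, p-value: {p_value}")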

SS Between: 397.3861360057199
SS Within: 181918.8244565218
F-statistic: 3.738625586470506, p-value: 0.023883960142755647
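
As a quick consistency check, the reported F-statistic can be reproduced from the two sums of squares above. This is only a minimal sketch: the group count k = 3 follows from the three smoking categories, while n = 3426 is an assumption inferred from the reported values, since the number of rows is not printed in this assignment.

```python
# Sums of squares exactly as reported in the output above.
ss_between = 397.3861360057199
ss_within = 181918.8244565218

k = 3      # smoking categories: never smoked, formerly smoked, smokes
n = 3426   # ASSUMPTION: row count inferred from the reported values, not printed in the source

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)

# The F-statistic is the ratio of the two mean squares.
f_stat = ms_between / ms_within
print(f"MS Between: {ms_between}")
print(f"MS Within: {ms_within}")
print(f"F-statistic: {f_stat}")
```

If the assumed row count is right, the printed F-statistic should agree with the value returned by stats.f_oneway.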

Plotting the calculated F-statistic against the critical F-value (obtained from stats.f.ppf,
with alpha assumed as 0.05)
alpha = 0.05
# Degrees of freedom: (number of groups - 1) and (number of observations - number of groups)
critical_value = stats.f.ppf(1 - alpha, len(grouped) - 1, df.shape[0] - len(grouped))

plt.figure(figsize=(8, 6))
plt.axvline(f_statistic, color='red', label='Calculated F-statistic')
plt.axvline(critical_value, color='green', label='Critical Value (alpha=0.05)')
plt.title('One-Way ANOVA')
plt.xlabel('F-value')
plt.ylabel('Density')
plt.legend()
plt.grid()
plt.show()
Making a decision based on the above plot
if f_statistic > critical_value:
    print("Reject the null hypothesis: smoking status has a significant effect on BMI.")
else:
    print("Fail to reject the null hypothesis: smoking status does not have a significant effect on BMI.")

Decision as obtained
Reject the null hypothesis: smoking status has a significant effect on BMI.
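
Comparing the F-statistic with the critical value is equivalent to comparing the p-value with alpha. The sketch below illustrates this with the reported F-statistic. The numerator degrees of freedom (2) follow from the three smoking categories; the denominator degrees of freedom (3423) are an assumption inferred from the reported values.

```python
from scipy import stats

f_stat = 3.738625586470506   # one-way F-statistic reported above
dfn = 2                      # number of smoking categories minus 1
dfd = 3423                   # ASSUMPTION: inferred residual df, not printed in the source
alpha = 0.05

# Critical-value route: reject when F exceeds the (1 - alpha) quantile of F(dfn, dfd).
critical_value = stats.f.ppf(1 - alpha, dfn, dfd)
reject_by_critical = bool(f_stat > critical_value)

# p-value route: reject when the upper-tail probability of F is below alpha.
p_value = stats.f.sf(f_stat, dfn, dfd)
reject_by_p = bool(p_value < alpha)

print(f"Critical value: {critical_value}, reject: {reject_by_critical}")
print(f"p-value: {p_value}, reject: {reject_by_p}")
```

Both routes always give the same decision, because stats.f.sf and stats.f.ppf are computed from the same F distribution.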

Thus, we may infer that the mean BMI differs significantly between at least one pair of
smoking-status categories. Through one-way ANOVA, we may therefore conclude that the
smoking status of a person is significantly associated with the person's body mass index at
the 5% significance level.

Implementation of two-way ANOVA-

Importing the required libraries
import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

Calculating overall mean, group means and column means

overall_mean = df['bmi'].mean()
group_means = df.groupby(['smoking_status', 'gender'])['bmi'].mean()
means_smoking = df.groupby('smoking_status')['bmi'].mean()
means_gender = df.groupby('gender')['bmi'].mean()

Overall Mean --> 30.290046701692937

Group Means -->
smoking_status   gender
formerly smoked  Female    30.615721
                 Male      30.928571
                 Other     22.400000
never smoked     Female    29.862677
                 Male      30.204777
smokes           Female    30.750353
                 Male      30.261859
Name: bmi, dtype: float64

Means Smoking -->
smoking_status
formerly smoked    30.747192
never smoked       29.982559
smokes             30.543555
Name: bmi, dtype: float64

Means Gender -->
gender
Female    30.208869
Male      30.422405
Other     22.400000
Name: bmi, dtype: float64

Counting the number of records, number of categories in smoking status and that in gender
n = len(df)
n_smoking = len(df['smoking_status'].unique())
n_gender = len(df['gender'].unique())
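
Alongside these counts, it can help to check how many observations fall in each smoking-status/gender cell, because the group means above show the 'Other' gender only under 'formerly smoked'. The sketch below uses a small hypothetical frame (the rows are made up for illustration and are not the stroke data) to show one way of counting cells with pd.crosstab.

```python
import pandas as pd

# Hypothetical toy rows, NOT the stroke dataset.
toy = pd.DataFrame({
    "smoking_status": ["smokes", "smokes", "never smoked",
                       "never smoked", "formerly smoked", "formerly smoked"],
    "gender": ["Female", "Male", "Female", "Male", "Female", "Other"],
})

# Rows are smoking categories, columns are genders, values are observation counts.
cell_counts = pd.crosstab(toy["smoking_status"], toy["gender"])
print(cell_counts)

# Cells with zero observations mean the design is incomplete as well as unbalanced.
empty_cells = int((cell_counts == 0).sum().sum())
print(f"Empty cells: {empty_cells}")
```

Empty or very small cells are worth knowing about before interpreting an interaction term, since the textbook two-way decomposition assumes every cell is populated.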

Calculating the various SS terms

sst = sum((df['bmi'] - overall_mean) ** 2)
ssr = sum(df.groupby('smoking_status').size() * (means_smoking - overall_mean) ** 2)
ssg = sum(df.groupby('gender').size() * (means_gender - overall_mean) ** 2)

# Interaction SS: per-cell deviation left over after removing both main effects
ss_interaction = 0
for (smoke, gen), group in df.groupby(['smoking_status', 'gender']):
    ss_interaction += len(group) * (group['bmi'].mean() - means_smoking[smoke]
                                    - means_gender[gen] + overall_mean) ** 2
# Within SS obtained by subtraction from the total
ss_within = sst - (ssr + ssg + ss_interaction)
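
The same decomposition can be followed by hand on a small example. The sketch below uses hypothetical balanced toy data (two levels per factor, two observations per cell; all names and numbers are made up) and computes the within sum of squares directly rather than by subtraction. For balanced data the four components add up to the total exactly; with unequal cell sizes that identity does not hold in general. The residual degrees of freedom here are taken as n minus the number of cells, which is a common textbook convention and an assumption of this sketch.

```python
from statistics import mean

# Hypothetical balanced toy data: keys are (level of factor A, level of factor B).
cells = {
    ("a1", "b1"): [1.0, 3.0],
    ("a1", "b2"): [4.0, 6.0],
    ("a2", "b1"): [5.0, 7.0],
    ("a2", "b2"): [6.0, 8.0],
}

values = [x for obs in cells.values() for x in obs]
n = len(values)
grand = mean(values)


def level_stats(index):
    """Return {level: (count, mean)} for factor A (index 0) or factor B (index 1)."""
    pooled = {}
    for key, obs in cells.items():
        pooled.setdefault(key[index], []).extend(obs)
    return {level: (len(xs), mean(xs)) for level, xs in pooled.items()}


stats_a = level_stats(0)
stats_b = level_stats(1)

# Main-effect, interaction, within-cell and total sums of squares.
ss_total = sum((x - grand) ** 2 for x in values)
ss_a = sum(cnt * (m - grand) ** 2 for cnt, m in stats_a.values())
ss_b = sum(cnt * (m - grand) ** 2 for cnt, m in stats_b.values())
ss_ab = sum(
    len(obs) * (mean(obs) - stats_a[a][1] - stats_b[b][1] + grand) ** 2
    for (a, b), obs in cells.items()
)
ss_within = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)

# Degrees of freedom; residual df = n - number of cells (assumption of this sketch).
df_a = len(stats_a) - 1
df_b = len(stats_b) - 1
df_ab = df_a * df_b
df_within = n - len(cells)

ms_within = ss_within / df_within
f_a = (ss_a / df_a) / ms_within
f_b = (ss_b / df_b) / ms_within
f_ab = (ss_ab / df_ab) / ms_within

print(f"SS total: {ss_total}, sum of components: {ss_a + ss_b + ss_ab + ss_within}")
print(f"F(A): {f_a}, F(B): {f_b}, F(A x B): {f_ab}")
```

Computing the within term directly makes it possible to check whether the components really sum to the total, which is a useful sanity test before obtaining it by subtraction.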

Calculating degrees of freedom and corresponding f-statistic

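df_r = n_smoking - 1
df_g = n_gender - 1
df_interaction = (n_smoking - 1) * (n_gender - 1)
# Residual df as used for the results reported below; when an interaction term is
# fitted, the conventional residual df is n minus the number of smoking/gender cells.
df_w = n - (n_smoking + n_gender - 1)

f_smoking = (ssr / df_r) / (ss_within / df_w)
f_gender = (ssg / df_g) / (ss_within / df_w)
f_interaction = (ss_interaction / df_interaction) / (ss_within / df_w)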

SSt (Total): 182316.21059252787

SSr (Smoking Status): 397.3861360057199
SSg (Gender): 99.45680590873165
SSc (Interaction): 98.05329791586556
SS Within: 181721.31435269755
F-statistic for Smoking Status: 3.740502252358344
F-statistic for Gender: 0.9361635266224352
F-statistic for Interaction: 0.46147631796115257

Getting the critical values from predefined functions

alpha = 0.05
critical_smoking = stats.f.ppf(1 - alpha, df_r, df_w)
critical_gender = stats.f.ppf(1 - alpha, df_g, df_w)
critical_interaction = stats.f.ppf(1 - alpha, df_interaction, df_w)
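
To see what these critical values look like next to the reported F-statistics, the sketch below feeds the printed numbers into the same stats.f.ppf call. The numerator degrees of freedom (2, 2 and 4) follow from the formulas above with three smoking categories and three gender categories; the residual degrees of freedom (3421) are an assumption inferred from the reported sums of squares and F-statistics.

```python
from scipy import stats

alpha = 0.05
df_w = 3421  # ASSUMPTION: inferred residual df, not printed in the source

# factor -> (reported F-statistic, numerator degrees of freedom)
reported = {
    "Smoking Status": (3.740502252358344, 2),
    "Gender": (0.9361635266224352, 2),
    "Interaction": (0.46147631796115257, 4),
}

decisions = {}
for factor, (f_stat, dfn) in reported.items():
    # Critical value is the (1 - alpha) quantile of the F(dfn, df_w) distribution.
    crit = stats.f.ppf(1 - alpha, dfn, df_w)
    decisions[factor] = bool(f_stat > crit)
    print(f"{factor}: F = {f_stat:.4f}, critical = {crit:.4f}, reject = {decisions[factor]}")
```

An F-statistic below 1 means the factor's mean square is smaller than the residual mean square, so it cannot exceed a 5% critical value.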

Plotting the critical and calculated values

plt.figure(figsize=(10, 6))
plt.axvline(f_smoking, color='red', linestyle='--', label='F-statistic for Smoking Status')
plt.axvline(critical_smoking, color='green', linestyle='--', label='Critical Value (Smoking Status)')
plt.axvline(f_gender, color='blue', linestyle='--', label='F-statistic for Gender')
plt.axvline(critical_gender, color='orange', linestyle='--', label='Critical Value (Gender)')
plt.axvline(f_interaction, color='purple', linestyle='--', label='F-statistic for Interaction')
plt.axvline(critical_interaction, color='brown', linestyle='--', label='Critical Value (Interaction)')
plt.title('Two-Way ANOVA F-statistics and Critical Values')
plt.xlabel('F-value')
plt.ylabel('Density')
plt.legend()
plt.grid()
plt.show()

Making respective decisions based on the above comparison

for f_stat, crit_val, factor in zip(
    [f_smoking, f_gender, f_interaction],
    [critical_smoking, critical_gender, critical_interaction],
    ['Smoking Status', 'Gender', 'Interaction']
):
    if f_stat > crit_val:
        print(f"Reject the null hypothesis for {factor}: Significant effect detected.")
    else:
        print(f"Fail to reject the null hypothesis for {factor}: No significant effect detected.")

Reject the null hypothesis for Smoking Status: Significant effect detected.
Fail to reject the null hypothesis for Gender: No significant effect detected.
Fail to reject the null hypothesis for Interaction: No significant effect detected.

Thus, consistent with the one-way result, smoking status has a significant effect on the BMI
of a person. Gender on its own does not have a significant effect on the BMI. The interaction
F-statistic (0.46) is below 1 and therefore below its critical value at alpha = 0.05, so gender
and smoking status do not show a significant combined effect on the BMI of an individual.
Conclusion:
By implementing one-way and two-way ANOVA, I developed a better intuition for how these
hypothesis tests work. Using an example healthcare dataset, I calculated the test statistics
associated with each test and compared them with critical values at the chosen significance
level. By comparing these values, I decided whether or not to reject the null hypothesis
(no difference between group means, i.e. no significant effect). In brief, this assignment
helped me understand how ANOVA, as a hypothesis testing method, can be used to draw
conclusions about a dataset as part of its analysis.
