
ADS Exp-1

The document outlines a data analysis process using a diabetes healthcare dataset, including importing libraries, loading data, and performing exploratory data analysis. It details data cleaning steps, basic statistics calculations, and the application of a Poisson distribution and ANOVA test to analyze glucose levels across BMI categories. The results indicate significant differences in glucose levels among different BMI groups.

Uploaded by pritiyadavce2021

Copyright: © All Rights Reserved

2/4/25, 9:00 PM ADS.ipynb - Colab

Rohit Goud

Import Libraries

import pandas as pd
import numpy as np
import seaborn as sns # For visualization
import matplotlib.pyplot as plt # For plotting graphs
from scipy import stats # For statistical analysis
from scipy.stats import poisson, chi2_contingency, f_oneway

Load and Read the Dataset

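# Upload the file manually (runs only in Google Colab)
import io
from google.colab import files

uploaded = files.upload()

# Get the uploaded file name dynamically
file_name = list(uploaded.keys())[0]

# Read the dataset directly from the uploaded bytes instead of a hard-coded path
df = pd.read_csv(io.BytesIO(uploaded[file_name]))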

health care diabetes.csv(text/csv) - 23873 bytes, last modified: 2/4/2025 - 100% done
Saving health care diabetes.csv to health care diabetes (2).csv

Exploratory Data Analysis

print("Dataset Info:")
df.info()

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

print("\nFirst 5 rows:")
print(df.head())

First 5 rows:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6
1            1       85             66             29        0  26.6
2            8      183             64              0        0  23.3
3            1       89             66             23       94  28.1
4            0      137             40             35      168  43.1

   DiabetesPedigreeFunction  Age  Outcome
0                     0.627   50        1
1                     0.351   31        0
2                     0.672   32        1
3                     0.167   21        0
4                     2.288   33        1

Data Cleaning

# Data Cleaning - Check for duplicate records


duplicates = df.duplicated().sum()
print(f"\nNumber of duplicate records: {duplicates}")

Number of duplicate records: 0
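df.info() reports 768 non-null values in every column. However, several clinical columns use 0 where a measurement was not recorded; the df.head() rows above already show zero SkinThickness and Insulin values. The following minimal sketch counts those zeros per column. The helper name count_zeros is hypothetical, and the inline frame copies the five rows printed by df.head() above rather than the full dataset.

```python
import pandas as pd


def count_zeros(frame: pd.DataFrame, columns) -> pd.Series:
    """Return how many entries in each listed column are exactly 0."""
    return (frame[list(columns)] == 0).sum()


# The five rows printed by df.head() above (a subset of the columns)
head_rows = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

zero_counts = count_zeros(head_rows, head_rows.columns)
print(zero_counts)
```

On the full dataset, the same helper could be called as count_zeros(df, df.columns.drop('Outcome')) to see which columns the cleaning step below actually affects.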

https://colab.research.google.com/drive/1aL4Hcol5NWRAh7AJ9uJv6plSMcoCg8Q6#scrollTo=st01zBF7kzvh&printMode=true 1/3

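Replace zero values with the mean of each column (excluding the Outcome column)

# Every column except the binary target 'Outcome'
numeric_cols = df.columns[df.columns != 'Outcome']

for col in numeric_cols:
    # Substitute each 0 with the column mean (this mean still includes the zeros)
    df[col] = df[col].replace(0, df[col].mean())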
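The loop above also overwrites legitimate zeros in Pregnancies, because zero pregnancies is a valid count. In addition, each substituted mean is pulled down by the very zeros it replaces. The following minimal sketch shows a stricter variant. It assumes that only Glucose, BloodPressure, SkinThickness, Insulin and BMI use 0 as a missing-value placeholder; this is a common reading of the dataset, not something the notebook states. The helper name and column list are hypothetical, and the sample rows are the df.head() rows above.

```python
import numpy as np
import pandas as pd

# Assumption: columns where a value of 0 means "not recorded"
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def impute_zeros_with_mean(frame: pd.DataFrame, columns=ZERO_AS_MISSING) -> pd.DataFrame:
    """Return a copy where zeros in `columns` become the mean of that column's non-zero values."""
    out = frame.copy()
    for col in columns:
        as_float = out[col].astype(float).replace(0, np.nan)  # zeros become missing
        out[col] = as_float.fillna(as_float.mean())           # mean() skips NaN by default
    return out


sample = pd.DataFrame({
    "Pregnancies": [6, 1, 8, 1, 0],
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

cleaned = impute_zeros_with_mean(sample)
print(cleaned[["Pregnancies", "SkinThickness", "Insulin"]])
```

Working on a copy leaves the raw frame available for comparison. Pregnancies is left untouched because it is not in the assumed column list.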

Basic Statistics

print("\nDescriptive Statistics:")
print(df.describe())

Descriptive Statistics:
       Pregnancies     Glucose  BloodPressure  SkinThickness     Insulin  \
count   768.000000  768.000000     768.000000     768.000000  768.000000
mean      3.845052  120.894531      69.105469      20.536458   79.799479
std       3.369578   31.972618      19.355807      15.952218  115.244002
min       0.000000    0.000000       0.000000       0.000000    0.000000
25%       1.000000   99.000000      62.000000       0.000000    0.000000
50%       3.000000  117.000000      72.000000      23.000000   30.500000
75%       6.000000  140.250000      80.000000      32.000000  127.250000
max      17.000000  199.000000     122.000000      99.000000  846.000000

              BMI  DiabetesPedigreeFunction         Age     Outcome
count  768.000000                768.000000  768.000000  768.000000
mean    31.992578                  0.471876   33.240885    0.348958
std      7.884160                  0.331329   11.760232    0.476951
min      0.000000                  0.078000   21.000000    0.000000
25%     27.300000                  0.243750   24.000000    0.000000
50%     32.000000                  0.372500   29.000000    0.000000
75%     36.600000                  0.626250   41.000000    1.000000
max     67.100000                  2.420000   81.000000    1.000000

Poisson Distribution

# Use the mean Glucose level as the Poisson rate λ (Glucose is used only as an example)
lambda_val = df['Glucose'].mean()

poisson_dist = poisson(lambda_val)
x_vals = np.arange(0, lambda_val * 2)      # k = 0, 1, 2, ... up to about 2λ
poisson_probs = poisson_dist.pmf(x_vals)   # P(X = k) for each k

plt.figure(figsize=(6, 4))
plt.bar(x_vals, poisson_probs, color='blue', alpha=0.7)
plt.title(f"Poisson Distribution (λ = {lambda_val:.2f})")
plt.xlabel("Number of Events")
plt.ylabel("Probability")
plt.show()
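The plot relies on the Poisson probability mass function P(X = k) = λ^k e^(−λ) / k!, which scipy evaluates for each integer k. The following minimal sketch computes the same values in log space, because λ^k alone overflows a float for λ ≈ 121 and k in the hundreds. It then checks the result against scipy.stats.poisson. The helper name is hypothetical, and 120.894531 is the Glucose mean from the descriptive statistics table. A Poisson distribution also has variance equal to λ, so comparing df['Glucose'].var() with df['Glucose'].mean() is a quick check of how well the model suits this column. The table reports a standard deviation of about 31.97, which is a variance of roughly 1022, far above λ.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson


def poisson_pmf_logspace(k, lam):
    """P(X = k) = exp(k*ln(lam) - lam - ln(k!)), using ln(k!) = gammaln(k + 1)."""
    k = np.asarray(k, dtype=float)
    return np.exp(k * np.log(lam) - lam - gammaln(k + 1))


lambda_val = 120.894531                  # Glucose mean from the descriptive statistics table
x_vals = np.arange(0, lambda_val * 2)    # 0.0, 1.0, ..., 241.0, as in the notebook cell

manual = poisson_pmf_logspace(x_vals, lambda_val)
reference = poisson.pmf(x_vals, lambda_val)

print(np.allclose(manual, reference))    # both computations agree
print(manual.sum())                      # almost all probability mass lies in 0..241
```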

ANOVA Test Code
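The ANOVA cell below only runs if a BMI_Category column with the labels Low, Medium and High already exists, and the notebook does not show how that column was built. The following minimal sketch builds one with pd.cut. The cut points 25 and 30 are a hypothetical choice that mirrors the common overweight/obese BMI thresholds. The groups, and therefore the F-statistic, may differ from the ones behind the result reported in this notebook. A BMI of 0 left over from the raw data would land in Low, so the sketch is meant to run after cleaning. The helper and constant names are hypothetical.

```python
import pandas as pd

# Hypothetical cut points: BMI < 25 -> Low, 25 <= BMI < 30 -> Medium, BMI >= 30 -> High
BMI_BINS = [-float("inf"), 25, 30, float("inf")]
BMI_LABELS = ["Low", "Medium", "High"]


def add_bmi_category(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with a categorical BMI_Category column."""
    out = frame.copy()
    # right=False makes each bin include its left edge, so [25, 30) is "Medium"
    out["BMI_Category"] = pd.cut(out["BMI"], bins=BMI_BINS, labels=BMI_LABELS, right=False)
    return out


# BMI values from the df.head() rows shown earlier
sample = pd.DataFrame({"BMI": [33.6, 26.6, 23.3, 28.1, 43.1]})
print(add_bmi_category(sample))
```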

from scipy import stats

# Ensure 'BMI_Category' exists before running ANOVA
if 'BMI_Category' in df.columns:
    # ANOVA Test: Difference in 'Glucose' levels among BMI categories
    anova_result = stats.f_oneway(df[df['BMI_Category'] == 'Low']['Glucose'],
                                  df[df['BMI_Category'] == 'Medium']['Glucose'],
                                  df[df['BMI_Category'] == 'High']['Glucose'])

    # Display results in the desired format
    print("\nANOVA Test for Difference in Glucose Levels among BMI Categories:")
    print(f"F-Statistic: {anova_result.statistic:.4f}, P-Value: {anova_result.pvalue:.4f}")

    # Conclusion based on p-value (significance level 0.05)
    if anova_result.pvalue < 0.05:
        print("Conclusion: Significant differences exist between BMI groups in Glucose levels.")
    else:
        print("Conclusion: No significant differences found between BMI groups in Glucose levels.")
else:
    print("Error: 'BMI_Category' column not found. Please ensure it is created before running ANOVA.")

ANOVA Test for Difference in Glucose Levels among BMI Categories:


F-Statistic: 16.0193, P-Value: 0.0000
Conclusion: Significant differences exist between BMI groups in Glucose levels.
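f_oneway compares the spread between group means with the spread inside the groups. The statistic is F = [Σ n_i (x̄_i − x̄)² / (k − 1)] / [Σ_i Σ_j (x_ij − x̄_i)² / (N − k)], where k is the number of groups and N is the total number of observations. The p-value is the upper tail of an F(k − 1, N − k) distribution. The following minimal sketch computes this by hand and checks it against scipy.stats.f_oneway. The helper name is hypothetical, and the three small groups of glucose values are made-up numbers for illustration, not data from the notebook.

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA from the between/within sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # total observations N
    grand_mean = np.concatenate(groups).mean()

    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    p_value = stats.f.sf(f_stat, k - 1, n_total - k)  # upper-tail probability
    return f_stat, p_value


# Made-up glucose values for three BMI groups (illustration only)
low = [95, 102, 88, 110]
medium = [115, 120, 108, 125]
high = [140, 132, 150, 128]

print(one_way_anova(low, medium, high))
print(stats.f_oneway(low, medium, high))
```

A large F means the group means differ by much more than the within-group noise would suggest. This is why a high F-statistic comes with a p-value close to 0.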
