Python code for predicting diabetes using ML
Out[95]: Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
0 6 148 72 35 0 33.6 0.627 50 1
1 1 85 66 29 0 26.6 0.351 31 0
2 8 183 64 0 0 23.3 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2.288 33 1
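The cell that produced Out[95] (presumably `d_check.head()` on the loaded Pima Indians Diabetes data) is missing from this export. A minimal reconstruction sketch, inlining the five rows shown above rather than reading the original CSV (whose path is not shown):

```python
import io
import pandas as pd

# Reconstruction sketch: the loading cell is missing from this export, so the
# first five rows from Out[95] are inlined here purely for illustration.
csv = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
    "8,183,64,0,0,23.3,0.672,32,1\n"
    "1,89,66,23,94,28.1,0.167,21,0\n"
    "0,137,40,35,168,43.1,2.288,33,1\n"
)
d_check = pd.read_csv(csv)
print(d_check.head())  # matches the rows shown in Out[95]
```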
In [96]: #Checking the total rows and columns
In [97]: d_check.shape
Out[97]: (768, 9)
In [99]: d_check.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
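The `info()` output above still shows an `Outcome` column, while every later cell refers to `diabetes`, so a renaming cell was evidently dropped from this export. A plausible sketch of that step (an assumption, not the author's exact code):

```python
import pandas as pd

# Hypothetical reconstruction of the missing rename step: later cells use
# d_check.diabetes, so 'Outcome' was presumably renamed at some point.
d_check = pd.DataFrame({'Glucose': [148, 85], 'Outcome': [1, 0]})
d_check = d_check.rename(columns={'Outcome': 'diabetes'})
print(list(d_check.columns))  # ['Glucose', 'diabetes']
```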
In [100]: d_check.columns
Correlation Matrix:
Pregnancies Glucose BloodPressure SkinThickness \
Pregnancies 1.000000 0.129459 0.141282 -0.081672
Glucose 0.129459 1.000000 0.152590 0.057328
BloodPressure 0.141282 0.152590 1.000000 0.207371
SkinThickness -0.081672 0.057328 0.207371 1.000000
Insulin -0.073535 0.331357 0.088933 0.436783
BMI 0.017683 0.221071 0.281805 0.392573
DiabetesPedigreeFunction -0.033523 0.137337 0.041265 0.183928
Age 0.544341 0.263514 0.239528 -0.113970
diabetes 0.221898 0.466581 0.065068 0.074752
Age diabetes
Pregnancies 0.544341 0.221898
Glucose 0.263514 0.466581
BloodPressure 0.239528 0.065068
SkinThickness -0.113970 0.074752
Insulin -0.042163 0.130548
BMI 0.036242 0.292695
DiabetesPedigreeFunction 0.033561 0.173844
Age 1.000000 0.238356
diabetes 0.238356 1.000000
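The cell that printed the correlation matrix above is missing from this export; a matrix like it is typically produced with `DataFrame.corr()`. A small self-contained sketch (the toy values here are the head rows from Out[95], so the numbers will differ from the full-data matrix above):

```python
import pandas as pd

# Sketch of producing a correlation matrix with pandas; the notebook's
# actual cell would be something like: print(d_check.corr())
df = pd.DataFrame({'Glucose': [148, 85, 183, 89, 137],
                   'Age': [50, 31, 32, 21, 33],
                   'diabetes': [1, 0, 1, 0, 1]})
print("Correlation Matrix:")
print(df.corr())
```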
In [103]: # Check the count of occurrences of each unique value in the 'diabetes' column
In [104]: d_check.diabetes.value_counts()
Out[104]: diabetes
0 500
1 268
Name: count, dtype: int64
In [106]: x_features=list(d_check.columns)
x_features.remove('diabetes')
x_features
Out[106]: ['Pregnancies',
'Glucose',
'BloodPressure',
'SkinThickness',
'Insulin',
'BMI',
'DiabetesPedigreeFunction',
'Age']
In [107]: # Define the explanatory (X) and outcome (Y) variables; add a constant to X to estimate the intercept (B0)
In [108]: Y=d_check.diabetes
X = sm.add_constant(d_check[x_features])
In [109]: # Initialize the logistic regression model with outcome (Y) and explanatory (X) variables
# Fit the logistic regression model to the data
# Display a detailed summary of the logistic regression results
In [110]: logit=sm.Logit(Y,X)
logit_model=logit.fit()
logit_model.summary2()
def get_significant_vars(lm):
    # Step 1: Put the fitted model's p-values into a DataFrame
    var_p_vals_df = pd.DataFrame(lm.pvalues)
    # Step 2: Copy the variable names from the index into a column
    var_p_vals_df['vars'] = var_p_vals_df.index
    # Step 3: Rename the columns to 'pvals' (for p-values) and 'vars' (for variable names)
    var_p_vals_df.columns = ['pvals', 'vars']
    # Step 4: Find the variables where p-value <= 0.05 and return their names as a list
    return list(var_p_vals_df[var_p_vals_df.pvals <= 0.05]['vars'])
In [113]: significant_vars=get_significant_vars(logit_model)
significant_vars
Out[113]: ['const',
'Pregnancies',
'Glucose',
'BloodPressure',
'BMI',
'DiabetesPedigreeFunction']
In [114]: # Fit a logistic regression model using only the significant variables, adding a constant to the explanatory variable (X)
In [115]: final_logit=sm.Logit(Y,sm.add_constant(X[significant_vars])).fit()
In [117]: # Collect actual values vs predicted probabilities for the significant variables from the final model
Y_pred=pd.DataFrame({'actual':Y,
                     'predicted_prob':final_logit.predict(
                         sm.add_constant(X[significant_vars]))})
In [118]: # Sample 10 random predictions, ensuring the same random sample every time by setting random_state
Y_pred.sample(10,random_state=7)
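The cells below use a binary `Y_pred['predicted']` column, but the cell that creates it from `predicted_prob` is missing from this export. A common choice is a 0.5 probability cutoff; the sketch below assumes that cutoff (it is not confirmed by the source) and uses stand-in probabilities:

```python
import pandas as pd

# Hypothetical reconstruction of the missing step that derives binary
# predictions from probabilities, assuming a 0.5 cutoff.
Y_pred = pd.DataFrame({'actual': [1, 0, 1, 0],
                       'predicted_prob': [0.81, 0.12, 0.34, 0.66]})
Y_pred['predicted'] = (Y_pred.predicted_prob > 0.5).astype(int)
print(Y_pred)
```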
In [139]: # Call the draw_cm function to visualize the confusion matrix using the 'actual' and 'predicted' columns of the Y_pred DataFrame
draw_cm(Y_pred['actual'],Y_pred['predicted'])
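The `draw_cm` helper is not defined anywhere in this export. A minimal sketch of such a helper, assuming it plots the confusion matrix as an annotated heatmap (matplotlib is used here; the original may have used seaborn instead):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics

def draw_cm(actual, predicted):
    # Hypothetical reconstruction: plot the confusion matrix of the
    # actual vs predicted labels as an annotated heatmap.
    cm = metrics.confusion_matrix(actual, predicted)
    fig, ax = plt.subplots()
    ax.imshow(cm, cmap='Blues')
    for (i, j), count in np.ndenumerate(cm):
        ax.text(j, i, str(count), ha='center', va='center')
    ax.set_xlabel('Predicted')
    ax.set_ylabel('Actual')
    plt.show()

draw_cm([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
```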
"""
Interpretation
True Positive (Top-Left): 436 instances were correctly predicted as "negative."
False Positive (Top-Right): 64 instances were incorrectly predicted as "positive" when they were actually "negative
False Negative (Bottom-Left): 113 instances were incorrectly predicted as "Negative" when they were actually "posit
True Negative (Bottom-Right): 155 instances were correctly predicted as "positive"
"""
In [122]: # Print the classification report using actual and predicted labels from the Y_pred DataFrame
print(metrics.classification_report(Y_pred.actual, Y_pred.predicted))
# Display plot
plt.show()
Out[127]: 0.84
Youden's index:
localhost:8964/lab/tree/Desktop/MAJORS/ML/CIE 3/diabetes check CIE (3).ipynb 16/18
4/22/25, 4:30 PM diabetes check CIE (3)
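The `tpr`, `fpr`, and `thresholds` arrays used in the next cell come from a ROC computation whose cell is missing from this export. A minimal sketch, assuming sklearn's `metrics.roc_curve` with stand-in labels and probabilities (the real notebook would pass `Y_pred.actual` and `Y_pred.predicted_prob`):

```python
import numpy as np
from sklearn import metrics

# Stand-in labels and probabilities, constructed so positives tend to
# score higher; these replace the notebook's Y_pred columns.
rng = np.random.default_rng(7)
actual = rng.integers(0, 2, size=200)
predicted_prob = np.clip(0.4 * actual + 0.6 * rng.random(200), 0.0, 1.0)

# roc_curve returns false-positive rates, true-positive rates, and the
# decision thresholds at which they were evaluated.
fpr, tpr, thresholds = metrics.roc_curve(actual, predicted_prob)
print(round(metrics.roc_auc_score(actual, predicted_prob), 2))
```

With these arrays in hand, the cell below ranks thresholds by `tpr - fpr` (Youden's index) to pick the cutoff that best separates the classes.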
In [128]: tpr_fpr=pd.DataFrame({"tpr":tpr,"fpr":fpr,"thresholds":thresholds})
tpr_fpr["diff"]=tpr_fpr.tpr - tpr_fpr.fpr
tpr_fpr.sort_values("diff",ascending=False)[0:5]
In [ ]: