
Name: Yandrapu Manoj Naidu, Roll No: 20MDT1017

This document analyzes a diabetes dataset with 768 rows and 9 columns, including pregnancies, glucose level, blood pressure, age, and patient outcome. The dataset is loaded and explored by checking the data types and computing the correlations between columns. Zero values are replaced with column means, and a neural network is then trained to predict the patient outcome from the other columns.

8/31/2021 Untitled6.ipynb - Colaboratory

Name : YANDRAPU MANOJ NAIDU

Roll no: 20MDT1017

from google.colab import files 
uploaded = files.upload()


Diabetes.csv(application/vnd.ms-excel) - 23873 bytes, last modified: 8/31/2021 - 100% done
Saving Diabetes.csv to Diabetes (1).csv

import pandas as pd
import io
df = pd.read_csv(io.BytesIO(uploaded['Diabetes.csv']))
print(df)

     Pregnancies  Glucose  ...  Age  Outcome
0              6      148  ...   50        1
1              1       85  ...   31        0
2              8      183  ...   32        1
3              1       89  ...   21        0
4              0      137  ...   33        1
..           ...      ...  ...  ...      ...
763           10      101  ...   63        0
764            2      122  ...   27        0
765            5      121  ...   30        0
766            1      126  ...   47        1
767            1       93  ...   23        0

[768 rows x 9 columns]
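Outside Colab, the file can be read directly with pd.read_csv('Diabetes.csv'). Inside Colab, files.upload() returns a dict that maps each uploaded filename to its raw bytes, so the cell above wraps those bytes in io.BytesIO before parsing. Below is a minimal sketch of that pattern. It uses a hypothetical two-row, in-memory stand-in for the uploaded file: the first two rows shown above, with made-up DiabetesPedigreeFunction values.

```python
import io
import pandas as pd

# Hypothetical stand-in for the dict returned by files.upload():
# filename -> raw bytes of the uploaded file.
# The DiabetesPedigreeFunction values (0.5, 0.3) are made up for illustration.
uploaded = {
    "Diabetes.csv": (
        b"Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
        b"DiabetesPedigreeFunction,Age,Outcome\n"
        b"6,148,72,35,0,33.6,0.5,50,1\n"
        b"1,85,66,29,0,26.6,0.3,31,0\n"
    )
}

# io.BytesIO wraps the bytes in a file-like object that pd.read_csv can parse.
df = pd.read_csv(io.BytesIO(uploaded["Diabetes.csv"]))
print(df.shape)  # (2, 9)
```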

df.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigre
0            6      148             72             35        0  33.6
1            1       85             66             29        0  26.6
2            8      183             64              0        0  23.3
3            1       89             66             23       94  28.1
4            0      137             40             35      168  43.1

There are a total of 9 columns in the Diabetes dataset: 8 input features and the Outcome label.

df.dtypes

Pregnancies                   int64
Glucose                       int64
BloodPressure                 int64
SkinThickness                 int64
Insulin                       int64
BMI                         float64
DiabetesPedigreeFunction    float64
Age                           int64
Outcome                       int64
dtype: object

Data type of each column.

https://colab.research.google.com/drive/1IAYPbv5kKKV04u8-wWsy6TvzPIXQplNv#scrollTo=FctYEwzZ-0zM&printMode=true 1/7

correlation=df.corr()
correlation.style.background_gradient(cmap='coolwarm')

Pregnancies Glucose BloodPressure SkinThickness Insulin BM


Pregnancies 1.000000 0.129459 0.141282 -0.081672 -0.073535 0.0176
Glucose 0.129459 1.000000 0.152590 0.057328 0.331357 0.2210
BloodPressure 0.141282 0.152590 1.000000 0.207371 0.088933 0.2818
SkinThickness -0.081672 0.057328 0.207371 1.000000 0.436783 0.3925
Insulin -0.073535 0.331357 0.088933 0.436783 1.000000 0.1978
BMI 0.017683 0.221071 0.281805 0.392573 0.197859 1.0000
DiabetesPedigreeFunction -0.033523 0.137337 0.041265 0.183928 0.185071 0.1406
Age 0.544341 0.263514 0.239528 -0.113970 -0.042163 0.0362
Outcome 0.221898 0.466581 0.065068 0.074752 0.130548 0.2926

Correlation matrix for the Diabetes dataset.

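# Styler.set_precision was deprecated in pandas 1.3 and removed in 2.0; format(precision=...) replaces it
correlation.style.background_gradient(cmap='coolwarm').format(precision=2)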

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI Diab


Pregnancies 1.00 0.13 0.14 -0.08 -0.07 0.02 -0.03
Glucose 0.13 1.00 0.15 0.06 0.33 0.22 0.14
BloodPressure 0.14 0.15 1.00 0.21 0.09 0.28 0.04
SkinThickness -0.08 0.06 0.21 1.00 0.44 0.39 0.18
Insulin -0.07 0.33 0.09 0.44 1.00 0.20 0.19
BMI 0.02 0.22 0.28 0.39 0.20 1.00 0.14
DiabetesPedigreeFunction -0.03 0.14 0.04 0.18 0.19 0.14 1.00
Age 0.54 0.26 0.24 -0.11 -0.04 0.04 0.03
Outcome 0.22 0.47 0.07 0.07 0.13 0.29 0.17

Correlation values rounded to two decimal places.
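In the rounded matrix, Glucose has the strongest correlation with Outcome (0.47), followed by BMI (0.29). The sketch below ranks the columns by their correlation with the label. For illustration it runs on the Glucose, BloodPressure, Age and Outcome values of the first five rows shown earlier; on the full df, the same target_corr expression applies unchanged.

```python
import pandas as pd

# First five rows of four columns, taken from the outputs above (illustration only).
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "Age": [50, 31, 32, 21, 33],
    "Outcome": [1, 0, 1, 0, 1],
})

# Pearson correlation of every column with Outcome, dropping Outcome's
# trivial self-correlation of 1.0, sorted from strongest positive to most negative.
target_corr = df.corr()["Outcome"].drop("Outcome").sort_values(ascending=False)
print(target_corr)
```

A five-row sample exaggerates the correlations, so only the ranking pattern is meaningful here.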

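import matplotlib.pyplot as plt
plt.matshow(df.corr())
plt.xticks(range(len(df.columns)), df.columns, rotation=90)  # label the axes with column names
plt.yticks(range(len(df.columns)), df.columns)
plt.colorbar()
plt.show()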


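import matplotlib.pyplot as plt

age = df['Age']
out = df['Outcome']

# one bar per patient at x = age with height 0 or 1, so bars at the same age overlap
plt.bar(age, out)
plt.xlabel('Age')
plt.ylabel('Outcome')
plt.show()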

Bar chart of Age vs. Outcome.
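plt.bar draws one bar per patient with a height of 0 or 1. Bars at the same age overlap, so the chart mostly shows which ages have at least one positive case. Grouping by age first gives the fraction of patients with Outcome = 1 at each age, which is easier to read. Below is a sketch with hypothetical Age/Outcome values. With the notebook's df, you would drop the DataFrame construction and the Agg backend line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical Age/Outcome values for illustration.
df = pd.DataFrame({
    "Age": [21, 21, 22, 22, 22, 30],
    "Outcome": [0, 1, 0, 0, 1, 1],
})

# The mean of a 0/1 column is the fraction of patients with Outcome == 1 at that age.
rate_by_age = df.groupby("Age")["Outcome"].mean()

fig, ax = plt.subplots()
ax.bar(rate_by_age.index, rate_by_age.values)
ax.set_xlabel("Age")
ax.set_ylabel("Fraction with Outcome = 1")
ax.set_ylim(0, 1)
```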

df.boxplot(by ='Outcome', column =['Insulin'], grid = False)


/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationW
return array(a, dtype, copy=False, order=order)

<matplotlib.axes._subplots.AxesSubplot at 0x7fdd06376050>

for i in df.columns:
  print(i, ":", df[i][df[i] == 0].count())

Pregnancies : 111
Glucose : 5
BloodPressure : 35
SkinThickness : 227
Insulin : 374
BMI : 11
DiabetesPedigreeFunction : 0
Age : 0
Outcome : 500

Number of zeros in each column.
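The same counts can be obtained without an explicit loop. df == 0 produces a Boolean DataFrame, and sum() counts the True values in each column. Here is a sketch on a small hypothetical frame:

```python
import pandas as pd

# Hypothetical values for illustration.
df = pd.DataFrame({
    "Pregnancies": [6, 1, 0],
    "Glucose": [148, 0, 183],
    "Insulin": [0, 0, 94],
})

# df == 0 is a Boolean DataFrame; summing it counts the zeros per column.
zero_counts = (df == 0).sum()
print(zero_counts)
```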

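for col in df.columns:
  val = df[col].mean()               # column mean, computed with the zeros still included
  df[col] = df[col].replace(0, val)  # every 0 in the column becomes that mean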

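Zeros replaced with column means. The loop runs over every column, so zeros in Pregnancies and Outcome, where 0 is a valid value, are replaced as well (row 4 now shows 3.845052 pregnancies).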
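A more conservative variant treats 0 as missing only in columns where a zero reading is implausible. It fills those zeros with the mean of the remaining non-zero values, so the zeros no longer pull the mean down. The choice of the five columns is an assumption based on the zero counts above, and impute_zeros is a hypothetical helper name. The toy frame is illustrative only.

```python
import pandas as pd

# Assumption: 0 marks a missing reading only in these columns;
# Pregnancies and Outcome can legitimately be 0.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zeros(frame, columns):
    """Hypothetical helper: replace 0 in `columns` with the mean of the non-zero values."""
    out = frame.copy()
    for col in columns:
        as_missing = out[col].mask(out[col] == 0)        # 0 -> NaN
        out[col] = as_missing.fillna(as_missing.mean())  # mean() skips NaN
    return out


# Toy frame (illustrative values only).
df = pd.DataFrame({
    "Pregnancies": [6, 1, 0],
    "Glucose": [148, 0, 100],
    "BloodPressure": [72, 66, 0],
    "SkinThickness": [35, 0, 0],
    "Insulin": [0, 94, 168],
    "BMI": [33.6, 26.6, 0.0],
    "Outcome": [1, 0, 1],
})

clean = impute_zeros(df, ZERO_AS_MISSING)
print(clean)
```

The median is a common alternative to the mean for skewed columns such as Insulin. The mean is kept here to match the notebook.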

df.head(10)

   Pregnancies  Glucose  BloodPressure  SkinThickness     Insulin        BMI  Diabete
0     6.000000    148.0      72.000000      35.000000   79.799479  33.600000
1     1.000000     85.0      66.000000      29.000000   79.799479  26.600000
2     8.000000    183.0      64.000000      20.536458   79.799479  23.300000
3     1.000000     89.0      66.000000      23.000000   94.000000  28.100000
4     3.845052    137.0      40.000000      35.000000  168.000000  43.100000
5     5.000000    116.0      74.000000      20.536458   79.799479  25.600000
6     3.000000     78.0      50.000000      32.000000   88.000000  31.000000
7    10.000000    115.0      69.105469      20.536458   79.799479  35.300000
8     2.000000    197.0      70.000000      45.000000  543.000000  30.500000
9     8.000000    125.0      96.000000      20.536458   79.799479  31.992578

df.boxplot(by ='Outcome', column =['Insulin'], grid = False)


/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationW
return array(a, dtype, copy=False, order=order)

<matplotlib.axes._subplots.AxesSubplot at 0x7fdd085f80d0>

# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]

type(X)

numpy.ndarray

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# ensure all data are floating point values
X = X.astype('float32')
# encode the Outcome values as integer class labels 0 and 1
y = LabelEncoder().fit_transform(y)

# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(514, 8) (254, 8) (514,) (254,)
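The zero-replacement loop also changed Outcome. At this point its values are 0.348958 (the column mean: 268 positives out of 768 rows) and 1.0, not 0 and 1. LabelEncoder sorts the distinct values and maps them to 0 and 1, so the labels still come out correct. Here is a sketch of that mapping:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

outcome_mean = 268 / 768  # mean of Outcome before its zeros were replaced
y_raw = np.array([1.0, outcome_mean, 1.0, outcome_mean])

enc = LabelEncoder()
y = enc.fit_transform(y_raw)  # sorted distinct values -> 0, 1
print(enc.classes_, y)
```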

# determine the number of input features
n_features = X_train.shape[1]

# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1, activation='sigmoid'))

# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)


<keras.callbacks.History at 0x7fdd06210110>
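The network is trained on raw feature values whose ranges differ widely. For example, Insulin reaches 543 in the first ten rows, while Pregnancies is at most 10. Neural networks usually train more reliably on standardized inputs. A stratified split with a fixed random_state also keeps the class balance similar in both splits and makes the result reproducible. Below is a sketch under these assumptions, using random placeholder features with the same shape as X. The split sizes match the (514, 8) and (254, 8) shapes printed earlier.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data with the notebook's shapes: 768 rows, 8 features,
# and the same class balance as Outcome (268 positives, 500 negatives).
rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=50.0, size=(768, 8)).astype("float32")
y = np.array([1] * 268 + [0] * 500)

# stratify=y keeps the positive rate similar in both splits;
# random_state makes the split reproducible (42 is an arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits,
# so no statistics from the test rows leak into training.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_test_s.shape)
```

The scaled arrays would then take the place of X_train and X_test in model.fit and model.evaluate. Any new row passed to model.predict would need the same scaler.transform.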

# evaluate the model
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print('Test Accuracy: %.3f' % acc)

Test Accuracy: 0.661
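An accuracy of 0.661 is best read against the majority-class baseline. 500 of the 768 rows have Outcome 0, so always predicting "no diabetes" scores about 0.651 on the full dataset. The split was not stratified, so the baseline on this particular test split may differ slightly. Here is a sketch of the baseline computation:

```python
import numpy as np


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / counts.sum()


# Outcome class counts from the zero-count output: 500 zeros out of 768 rows.
y = np.array([0] * 500 + [1] * 268)
print(round(majority_baseline(y), 3))  # 0.651
```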

# make a prediction
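import numpy as np
# feature order: Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age
# (these values are not realistic for the diabetes features, e.g. Glucose = 0 and a negative Age)
row = np.array([[1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708]])
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0][0])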

Predicted: 0.204

import numpy as np
row1=np.array([[1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708]])
row1.shape

(1, 8)

yhat = model.predict(row1)
print('Predicted: %.3f' % yhat[0][0])

Predicted: 0.204
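model.predict returns sigmoid probabilities with shape (n_rows, 1). Turning them into class labels needs a decision threshold. 0.5 is the conventional choice, but it is an assumption and not something the notebook sets. Here is a sketch using the 0.204 printed above and a second, made-up probability:

```python
import numpy as np

# Stand-in for model.predict output: shape (n_rows, 1).
# 0.204 is the value printed above; 0.73 is a made-up second row.
yhat = np.array([[0.204], [0.73]])

threshold = 0.5  # assumed decision threshold
labels = (yhat >= threshold).astype(int).ravel()
print(labels)  # [0 1]
```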

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_3 (Dense)              (None, 10)                90
_________________________________________________________________
dense_4 (Dense)              (None, 8)                 88
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 9
=================================================================
Total params: 187
Trainable params: 187
Non-trainable params: 0
_________________________________________________________________
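A Dense layer has one weight per input-unit pair plus one bias per unit. Its parameter count is therefore inputs * units + units. For the three layers above (8 inputs, then 10, 8 and 1 units), this reproduces the 90, 88 and 9 shown in the summary. A quick check:

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


# (inputs, units) for the three Dense layers in the model above
layers = [(8, 10), (10, 8), (8, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(counts, sum(counts))  # [90, 88, 9] 187
```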

df.boxplot(by ='Outcome', column =['Insulin'], grid = False)


/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationW
return array(a, dtype, copy=False, order=order)

<matplotlib.axes._subplots.AxesSubplot at 0x7fdd086bc450>
