Water Quality 1673157384
import warnings
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
warnings.filterwarnings('ignore')
plt.style.use('fivethirtyeight')
%matplotlib inline
Data Collection
In [2]: #loading water dataset in pandas
data = pd.read_csv('water_potability.csv')
EDA
http://localhost:8888/notebooks/Documents/Machine-Learning-Projects/Water%20Quality%20Classification/water-quality.ipynb Page 1 of 30
water-quality - Jupyter Notebook 26/12/22, 10:02 PM
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3276 entries, 0 to 3275
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ph 2785 non-null float64
1 Hardness 3276 non-null float64
2 Solids 3276 non-null float64
3 Chloramines 3276 non-null float64
4 Sulfate 2495 non-null float64
5 Conductivity 3276 non-null float64
6 Organic_carbon 3276 non-null float64
7 Trihalomethanes 3114 non-null float64
8 Turbidity 3276 non-null float64
9 Potability 3276 non-null int64
dtypes: float64(9), int64(1)
memory usage: 256.1 KB
Null Values
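The info() summary above implies 491 missing values in ph, 781 in Sulfate and 162 in Trihalomethanes (3276 rows minus the non-null counts). A minimal sketch of how such counts could be computed directly, using a small hypothetical frame (`toy`) in place of the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the water dataset (values are made up)
toy = pd.DataFrame({
    "ph": [7.0, np.nan, 6.5, np.nan],
    "Sulfate": [330.0, 310.0, np.nan, 350.0],
    "Hardness": [200.0, 180.0, 210.0, 190.0],
})

# Number of missing values per column
null_counts = toy.isnull().sum()

# Share of missing values per column, in percent
null_share = toy.isnull().mean() * 100

print(null_counts)
print(null_share.round(1))
```

The same two calls on `data` would give the per-column totals for the full dataset.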
Out[6]: <AxesSubplot:>
PH
Out[10]: 0
In [13]:
Out[20]: Hardness 0
Solids 0
Chloramines 0
Conductivity 0
Organic_carbon 0
Turbidity 0
Potability 0
ph_random 0
Sulfate_random 0
Trihalomethanes_random 0
dtype: int64
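The column names ph_random, Sulfate_random and Trihalomethanes_random suggest that missing values were filled by random-sample imputation, although the cells that did this are not shown here. The following is only a sketch of that technique under that assumption; the helper `impute_random_sample`, the seed and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd

def impute_random_sample(df, column, seed=0):
    """Return df[column] with NaNs replaced by values drawn from its observed entries."""
    observed = df[column].dropna()
    missing_mask = df[column].isnull()
    filled = df[column].copy()
    # Draw one observed value (with replacement) for every missing entry
    draws = observed.sample(n=int(missing_mask.sum()), replace=True, random_state=seed)
    filled.loc[missing_mask] = draws.to_numpy()
    return filled

# Hypothetical toy data, not the real measurements
toy = pd.DataFrame({"ph": [7.0, np.nan, 6.5, np.nan, 8.1]})
toy["ph_random"] = impute_random_sample(toy, "ph")
print(toy["ph_random"].isnull().sum())
```

Drawing from the observed values keeps the filled column's distribution close to the original one, which mean or median imputation does not.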
Hardness
}, template = 'plotly_dark')
fig.update_layout(title='Hardness wrt Potability')
fig.show()
In [27]:
#plot histogram
px.histogram(data_frame = data, x = 'Hardness', nbins = 10, color =
template = 'plotly_dark')
Solids
In [28]: #check Solids describe
data['Solids'].describe()
},
color_continuous_scale=px.colors.sequential.tempo,
template = 'plotly_dark')
fig.update_layout(title='Solids wrt Potability')
fig.show()
Chloramines
In [33]: data['Chloramines'].describe()
}, template = 'plotly_dark')
fig.update_layout(title='Chloramines wrt Potability')
fig.show()
Conductivity
In [37]: data["Conductivity"].describe()
},
color=data['Potability']
,template = 'plotly_dark')
fig.update_layout(title='Conductivity wrt Potability')
fig.show()
In [40]:
group_labels = ['distplot'] # name of the dataset
Organic_carbon
In [41]: data['Organic_carbon'].describe()
In [42]:
group_labels = ['Organic_carbon'] # name of the dataset
In [43]: # split the rows into Organic_carbon bands; lower bounds are inclusive so boundary values are not dropped
dt_5=data[data['Organic_carbon']<5]
dt_5_10=data[(data['Organic_carbon']>=5)&(data['Organic_carbon']<10)]
dt_10_15=data[(data['Organic_carbon']>=10)&(data['Organic_carbon']<15)]
dt_15_20=data[(data['Organic_carbon']>=15)&(data['Organic_carbon']<20)]
dt_20_25=data[(data['Organic_carbon']>=20)&(data['Organic_carbon']<25)]
dt_25=data[(data['Organic_carbon']>=25)]
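The same banding of Organic_carbon can be sketched more compactly with pd.cut. The bin edges follow the cell above; the half-open intervals (`right=False`), the label strings, the `oc_bin` column and the toy values are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical toy values, not taken from the dataset
toy = pd.DataFrame({"Organic_carbon": [3.2, 5.0, 7.8, 12.1, 18.4, 22.0, 27.5]})

# Edges match the manual slices; each bin is [lower, upper)
bins = [-float("inf"), 5, 10, 15, 20, 25, float("inf")]
labels = ["<5", "5-10", "10-15", "15-20", "20-25", ">=25"]

toy["oc_bin"] = pd.cut(toy["Organic_carbon"], bins=bins, labels=labels, right=False)

# Rows per band, in bin order
bin_counts = toy["oc_bin"].value_counts(sort=False)
print(bin_counts)
```

A single categorical column like this can be passed to groupby or to a plot's color argument instead of keeping six separate frames.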
Turbidity
In [45]: data['Turbidity'].describe()
In [46]:
group_labels = ['Turbidity'] # name of the dataset
In [47]: data['turbid_class']=data['Turbidity'].astype(int)
In [48]: data['turbid_class'].unique()
In [51]: data=data.drop(['turbid_class'],axis=1)
ph_random
In [52]: data['ph_random'].describe()
In [53]:
group_labels = ['ph_random'] # name of the dataset
Sulfate_random
In [56]: data['Sulfate_random'].describe()
Trihalomethanes_random
In [59]: data['Trihalomethanes_random'].describe()
In [60]:
group_labels = ['Trihalomethanes_random'] # name of the dataset
}, template = 'plotly_dark')
fig.update_layout(title='Trihalomethane wrt Potability')
fig.show()
Potability
In [63]: data['Potability'].describe()
Data Preprocessing
In [66]: from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
In [67]: X=data.drop(['Potability'],axis=1)
y=data['Potability']
Since the features are measured on very different scales, we standardize them with StandardScaler.
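The scaling and splitting cells are not shown, so the following is only a sketch of one common way to do it. The synthetic features, `test_size=0.15`, the random seeds and the stratified split are all assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y (made-up values on two very different scales)
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({
    "Hardness": rng.normal(196, 33, 200),
    "Solids": rng.normal(22000, 8700, 200),
})
y_toy = pd.Series(rng.integers(0, 2, 200), name="Potability")

# Split first, so the scaler never sees the test rows
x_train, x_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.15, random_state=42, stratify=y_toy
)

# Fit on the training part only, then apply the same transform to both parts
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

print(x_train_scaled.mean(axis=0).round(3), x_train_scaled.std(axis=0).round(3))
```

Fitting the scaler on the training split alone avoids leaking test-set statistics into the model.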
Modeling
Logistic Regression
Out[70]: 0.6219512195121951
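The cell that produced this score is not shown. A minimal sketch of fitting and scoring a logistic regression, assuming a scikit-learn pipeline and synthetic data from `make_classification` (none of which comes from the notebook), might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary problem with 9 features, standing in for the water data
X_toy, y_toy = make_classification(n_samples=400, n_features=9, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.15, random_state=0, stratify=y_toy
)

# Scaling and the classifier are chained so both are fit on the training data only
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(x_train, y_train)

toy_accuracy = accuracy_score(y_test, logreg.predict(x_test))
print(toy_accuracy)
```

The accuracy printed here describes the synthetic data only and is not comparable to the score reported above.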
K Nearest Neighbours
#Predict Output
predicted = knn.predict(x_test)  # 0: not potable, 1: potable
SVM
y_pred = svmc.predict(x_test)
print(accuracy_score(y_test,y_pred))
0.6808943089430894
Decision Tree
y_pred = tre.predict(x_test)
print(accuracy_score(y_test,y_pred))
0.5487804878048781
Random Forest
y_pred = model_rf.predict(x_test)
print(accuracy_score(y_test,y_pred))
0.6788617886178862
XG Boost
SVM tuned
In [88]: y_pred=gridsearch.predict(x_test)
from sklearn.metrics import confusion_matrix
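The definition of `gridsearch` is not shown, so the following is only a sketch of how a tuned SVM could be built with GridSearchCV and then summarized with a confusion matrix. The parameter grid, `cv=3`, the scoring metric and the synthetic data are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic binary problem standing in for the scaled water data
X_toy, y_toy = make_classification(n_samples=300, n_features=9, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.15, random_state=1, stratify=y_toy
)

# Hypothetical search space; the grid used in the notebook is unknown
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1], "kernel": ["rbf"]}
gridsearch = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy")
gridsearch.fit(x_train, y_train)

# predict() uses the best estimator, refit on the whole training split
y_pred = gridsearch.predict(x_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(gridsearch.best_params_)
print(cm)
```

For a potability classifier the off-diagonal cells matter more than overall accuracy, since labelling unsafe water as potable is the costlier mistake.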