
Inbuilt Kmeans

The document is a Jupyter notebook that processes the Iris dataset using Python libraries such as pandas, numpy, and sklearn. It includes data loading, preprocessing (including one-hot encoding and scaling), and KMeans clustering to identify clusters within the data. The notebook also visualizes the sum of squared errors (SSE) for different cluster counts to help determine the optimal number of clusters.

Uploaded by sai kolupoti

9/23/24, 4:39 PM 21BCE2920.ipynb - Colab

import math
import os
import gc
import random

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import pprint

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

input_data = pd.read_csv("Iris.csv")
input_data.head()

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
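If Iris.csv is not at hand, a frame with the same layout can probably be rebuilt from the copy of the dataset bundled with scikit-learn. This is only a sketch: the name fallback_data is hypothetical, the column names are assumed to mirror the CSV header shown above, and the bundled data may differ from the CSV in a few values.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Column names chosen to mirror the Iris.csv header shown above (assumption).
columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
fallback_data = pd.DataFrame(iris.data, columns=columns)

# Rebuild the Id and Species columns in the same style as the CSV.
fallback_data.insert(0, "Id", range(1, len(fallback_data) + 1))
fallback_data["Species"] = ["Iris-" + iris.target_names[t] for t in iris.target]

print(fallback_data.head())
```

Assigning fallback_data to input_data would let the remaining cells run unchanged.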


input_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB

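# One-hot encode the categorical Species column, then drop the row identifier.
# Note: the three Species_* columns stay in the feature matrix that is clustered below.
input_data = pd.get_dummies(input_data)
input_data = input_data.drop(['Id'], axis=1)
input_data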

     SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species_Iris-setosa  Species_Iris-versicolor  Species_Iris-virginica
0              5.1           3.5            1.4           0.2                 True                    False                   False
1              4.9           3.0            1.4           0.2                 True                    False                   False
2              4.7           3.2            1.3           0.2                 True                    False                   False
3              4.6           3.1            1.5           0.2                 True                    False                   False
4              5.0           3.6            1.4           0.2                 True                    False                   False
...            ...           ...            ...           ...                  ...                      ...                     ...
145            6.7           3.0            5.2           2.3                False                    False                    True
146            6.3           2.5            5.0           1.9                False                    False                    True
147            6.5           3.0            5.2           2.0                False                    False                    True
148            6.2           3.4            5.4           2.3                False                    False                    True
149            5.9           3.0            5.1           1.8                False                    False                    True
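The encoding step above can be illustrated on a three-row frame. This is a minimal sketch with hypothetical names (toy, encoded, dummy_cols, measurements); it passes columns=["Species"] explicitly, whereas the notebook relies on the default column selection. The last two lines show one way to set the indicator columns apart from the measurements, should that be wanted.

```python
import pandas as pd

# Hypothetical three-row frame with one numeric and one categorical column.
toy = pd.DataFrame({
    "PetalLengthCm": [1.4, 4.7, 6.0],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# Each distinct Species value becomes its own indicator column.
encoded = pd.get_dummies(toy, columns=["Species"])
print(list(encoded.columns))

# Columns produced by the encoding, identified by their shared prefix.
dummy_cols = [c for c in encoded.columns if c.startswith("Species_")]
measurements = encoded.drop(columns=dummy_cols)
```

Every row has exactly one indicator set, so the three Species_* columns together carry the same information as the original label.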


scaled_data = StandardScaler().fit_transform(input_data)
scaled_data[:10]

https://colab.research.google.com/drive/1FoEQ0l5WVUciLo7jL2A2eWqAuGB1pwE_#scrollTo=SPDHHI7Miz6h&printMode=true 1/3

array([[-0.90068117,  1.03205722, -1.3412724 , -1.31297673,  1.41421356,
        -0.70710678, -0.70710678],
       [-1.14301691, -0.1249576 , -1.3412724 , -1.31297673,  1.41421356,
        -0.70710678, -0.70710678],
       [-1.38535265,  0.33784833, -1.39813811, -1.31297673,  1.41421356,
        -0.70710678, -0.70710678],
       [-1.50652052,  0.10644536, -1.2844067 , -1.31297673,  1.41421356,
        -0.70710678, -0.70710678],
       [-1.02184904,  1.26346019, -1.3412724 , -1.31297673,  1.41421356,
        -0.70710678, -0.70710678],
       [-0.53717756,  1.95766909, -1.17067529, -1.05003079,  1.41421356,
        -0.70710678, -0.70710678],
       [-1.50652052,  0.80065426, -1.3412724 , -1.18150376,  1.41421356,
        -0.70710678, -0.70710678],
       [-1.02184904,  0.80065426, -1.2844067 , -1.31297673,  1.41421356,
        -0.70710678, -0.70710678],
       [-1.74885626, -0.35636057, -1.3412724 , -1.31297673,  1.41421356,
        -0.70710678, -0.70710678],
       [-1.14301691,  0.10644536, -1.2844067 , -1.4444497 ,  1.41421356,
        -0.70710678, -0.70710678]])
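The last three columns of the scaled output take only two values, roughly 1.4142 and -0.7071. A short check, sketched here on a hypothetical indicator column with 50 ones and 100 zeros (the same split as each Species dummy), suggests why: StandardScaler subtracts the column mean (1/3) and divides by the population standard deviation (sqrt(2)/3), which maps 1 to sqrt(2) and 0 to -1/sqrt(2).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical indicator column: 50 ones followed by 100 zeros,
# mirroring the 50/100 split of each Species dummy.
indicator = np.array([1.0] * 50 + [0.0] * 100).reshape(-1, 1)

# Manual z-score with the population standard deviation (ddof=0),
# which is the convention StandardScaler follows.
manual = (indicator - indicator.mean(axis=0)) / indicator.std(axis=0)

scaled = StandardScaler().fit_transform(indicator)

print(np.allclose(manual, scaled))
print(manual[0, 0], manual[-1, 0])
```

The same formula, applied column by column, accounts for the four measurement columns as well.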

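# Settings shared by every KMeans run in the elbow search.
kmeans_kwargs = {
    "init": "random",
    "n_init": 10,
    "random_state": 1,
}

# Fit KMeans for k = 1..10 and record the sum of squared errors (inertia) for each k.
sse = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, **kmeans_kwargs)
    kmeans.fit(scaled_data)
    sse.append(kmeans.inertia_)

# Plot SSE against k; the bend ("elbow") in the curve suggests a cluster count.
plt.plot(range(1, 11), sse)
plt.xticks(range(1, 11))
plt.xlabel("Number of Clusters")
plt.ylabel("SSE")
plt.show()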
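The SSE collected in the loop above is the inertia_ attribute of the fitted model: the sum of squared Euclidean distances from each sample to its assigned centroid. The sketch below recomputes it by hand on hypothetical toy data (two synthetic 2-D blobs, not the Iris features) to show that the two quantities agree.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated 2-D blobs.
rng = np.random.default_rng(0)
toy = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

km = KMeans(n_clusters=2, init="random", n_init=10, random_state=1).fit(toy)

# Look up the centroid assigned to each sample, then sum the squared distances.
assigned_centroids = km.cluster_centers_[km.labels_]
manual_sse = float(((toy - assigned_centroids) ** 2).sum())

print(manual_sse, km.inertia_)
```

Because adding clusters can only reduce this sum for a well-fitted model, the curve is read for the point where further decreases become small rather than for its minimum.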

kmeans = KMeans(init="random", n_clusters=4, n_init=10, random_state=1)
kmeans.fit(scaled_data)
kmeans.labels_

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 1, 3, 1, 1, 3, 1, 3, 1,
1, 3, 1, 3, 3, 1, 1, 1, 1, 3, 1, 3, 1, 3, 1, 1, 3, 3, 3, 1, 1, 1,
3, 3, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 3], dtype=int32)
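In the label array above, rows 0 to 49 fall in cluster 2, rows 50 to 99 in cluster 0, and rows 100 to 149 are split between clusters 1 and 3. A tally of cluster sizes and a cross-tabulation against species make that kind of pattern easier to read. The sketch below uses short hypothetical stand-ins (labels, species) in place of kmeans.labels_ and the original Species column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for kmeans.labels_ and the original Species column.
labels = np.array([2, 2, 0, 0, 1, 3, 1])
species = pd.Series(
    ["Iris-setosa", "Iris-setosa",
     "Iris-versicolor", "Iris-versicolor",
     "Iris-virginica", "Iris-virginica", "Iris-virginica"],
    name="Species",
)

# Number of samples assigned to each cluster id.
cluster_sizes = np.bincount(labels)

# Rows are species, columns are cluster ids, cells are sample counts.
table = pd.crosstab(species, pd.Series(labels, name="cluster"))

print(cluster_sizes)
print(table)
```

With the notebook's data, the Species column would need to be saved before the pd.get_dummies call replaces it.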
