Unit3

This document contains Python code demonstrating several clustering techniques, including K-Means, hierarchical clustering, and bisecting K-Means, with a visualization for each method. It also shows PCA applied for dimensionality reduction on student test scores, transforming the data into two principal components and visualizing the result. The code uses NumPy, SciPy, Matplotlib, and scikit-learn for data manipulation and visualization.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Generate synthetic data: 100 points around (1, 1) and 100 points around (5, 5)
np.random.seed(42)
data1 = np.random.randn(100, 2) + np.array([1, 1])
data2 = np.random.randn(100, 2) + np.array([5, 5])
data = np.vstack((data1, data2))

# Apply K-Means clustering with K=2
kmeans = KMeans(n_clusters=2)
kmeans.fit(data)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Plot the data points colored by cluster label, with the centroids marked
plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='viridis', marker='o')
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='x',
            s=100, label='Centroids')
plt.title('K-Means Clustering Example')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
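The example fixes K=2 because the data was generated around two centres. When the right K is unknown, a common heuristic is the elbow method: run K-Means for several values of K and plot the inertia (within-cluster sum of squared distances). A minimal sketch, reusing the `data` array from the block above:

# Elbow method: inspect inertia for a range of K values
inertias = []
k_values = range(1, 8)
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(data)
    inertias.append(km.inertia_)

plt.plot(list(k_values), inertias, 'o-')
plt.title('Elbow Method for Choosing K')
plt.xlabel('Number of clusters K')
plt.ylabel('Inertia (within-cluster SSE)')
plt.show()

The "elbow", the point where adding clusters stops reducing inertia sharply, should appear at K=2 for this data.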
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Generate synthetic data
np.random.seed(42)
data = np.random.randn(20, 2)

# Perform hierarchical/agglomerative clustering
linked = linkage(data, method='ward')

# Plot the dendrogram
plt.figure(figsize=(10, 7))
dendrogram(linked,
           orientation='top',
           distance_sort='descending',
           show_leaf_counts=True)
plt.title('Dendrogram for Agglomerative Clustering')
plt.xlabel('Data Points')
plt.ylabel('Euclidean Distance')
plt.show()
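The dendrogram shows the full merge hierarchy but no cluster assignments. To obtain flat labels, the tree can be cut with SciPy's `fcluster`; a short sketch, reusing `linked` and `data` from above and cutting at 3 clusters (an arbitrary choice for illustration):

from scipy.cluster.hierarchy import fcluster

# Cut the hierarchy into a fixed number of flat clusters
flat_labels = fcluster(linked, t=3, criterion='maxclust')

plt.scatter(data[:, 0], data[:, 1], c=flat_labels, cmap='viridis')
plt.title('Flat Clusters Cut from the Dendrogram')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()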
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Generate synthetic data
np.random.seed(42)
data = np.random.randn(100, 2)

def bisecting_kmeans(data, k):
    clusters = [data]
    while len(clusters) < k:
        # Choose the largest cluster to split; pop by index, because
        # list.remove() compares elements with == and cannot reliably
        # compare NumPy arrays
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        largest_cluster = clusters.pop(idx)

        # Perform K-Means with k=2 on the selected cluster
        kmeans = KMeans(n_clusters=2, random_state=42).fit(largest_cluster)
        labels = kmeans.labels_

        # Split the cluster into two sub-clusters
        clusters.append(largest_cluster[labels == 0])
        clusters.append(largest_cluster[labels == 1])

    return clusters

# Perform bisecting K-Means to get 3 clusters
clusters = bisecting_kmeans(data, 3)

# Plot the clusters
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k']
plt.figure(figsize=(8, 6))
for i, cluster in enumerate(clusters):
    plt.scatter(cluster[:, 0], cluster[:, 1],
                c=colors[i % len(colors)], label=f'Cluster {i+1}')
plt.title('Divisive Clustering using Bisecting K-Means')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
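Recent versions of scikit-learn (1.1 and later) ship a built-in `BisectingKMeans` implementing the same divisive idea; by default it picks the cluster to split by largest inertia rather than by size, so its result can differ from the hand-rolled function above. A minimal sketch, assuming scikit-learn >= 1.1:

from sklearn.cluster import BisectingKMeans

# Built-in bisecting K-Means on the same data
bkm = BisectingKMeans(n_clusters=3, random_state=42).fit(data)
plt.scatter(data[:, 0], data[:, 1], c=bkm.labels_, cmap='viridis')
plt.scatter(bkm.cluster_centers_[:, 0], bkm.cluster_centers_[:, 1],
            c='red', marker='x', s=100, label='Centroids')
plt.title('Bisecting K-Means (scikit-learn built-in)')
plt.legend()
plt.show()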
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Sample student test scores (10 subjects for 6 students)
data = np.array([
    [85, 78, 92, 88, 76, 95, 89, 91, 85, 87],  # Student 1
    [70, 75, 80, 78, 85, 90, 79, 77, 88, 86],  # Student 2
    [60, 65, 55, 50, 70, 72, 68, 60, 58, 63],  # Student 3
    [90, 85, 88, 92, 93, 97, 95, 91, 89, 90],  # Student 4
    [50, 55, 48, 45, 52, 58, 54, 50, 53, 55],  # Student 5
    [80, 82, 85, 87, 88, 92, 89, 85, 83, 81]   # Student 6
])

# Convert to DataFrame
df = pd.DataFrame(data, columns=[f'Subject {i+1}' for i in range(10)])
print("Original Student Scores:")
print(df)

# Step 1: Standardizing the data (zero mean, unit variance per subject)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

# Step 2: Applying PCA (reduce from 10 subjects to 2 principal components)
pca = PCA(n_components=2)
principal_components = pca.fit_transform(scaled_data)

# Step 3: Creating a new DataFrame with the PCA results
pca_df = pd.DataFrame(principal_components,
                      columns=['Overall Academic Strength', 'Sports Ability'])
print("\nTransformed Data after PCA:")
print(pca_df)

# Step 4: Visualizing the transformed data
plt.figure(figsize=(8, 6))
plt.scatter(pca_df['Overall Academic Strength'], pca_df['Sports Ability'],
            color='b', s=100)
for i, txt in enumerate(["Student 1", "Student 2", "Student 3",
                         "Student 4", "Student 5", "Student 6"]):
    plt.annotate(txt,
                 (pca_df['Overall Academic Strength'][i], pca_df['Sports Ability'][i]),
                 fontsize=12)
plt.axhline(0, color='gray', linestyle='--')
plt.axvline(0, color='gray', linestyle='--')
plt.xlabel('Overall Academic Strength')
plt.ylabel('Sports Ability')
plt.title('Student Grouping using PCA')
plt.show()

Original Student Scores:
   Subject 1  Subject 2  Subject 3  Subject 4  Subject 5  Subject 6  \
0         85         78         92         88         76         95
1         70         75         80         78         85         90
2         60         65         55         50         70         72
3         90         85         88         92         93         97
4         50         55         48         45         52         58
5         80         82         85         87         88         92

   Subject 7  Subject 8  Subject 9  Subject 10
0         89         91         85          87
1         79         77         88          86
2         68         60         58          63
3         95         91         89          90
4         54         50         53          55
5         89         85         83          81

Transformed Data after PCA:
   Overall Academic Strength  Sports Ability
0                   2.191978       -0.920909
1                   0.978242        0.349057
2                  -3.026049        0.459299
3                   3.222156        0.228173
4                  -5.305649       -0.352800
5                   1.939321        0.237180
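The column names 'Overall Academic Strength' and 'Sports Ability' are human interpretations attached to the components; PCA itself only guarantees that each component captures the largest remaining variance. How much variance the two components actually retain can be checked from the fitted `pca` object above:

# Fraction of total variance captured by each principal component
print(pca.explained_variance_ratio_)
print("Total variance retained:", pca.explained_variance_ratio_.sum())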
