Unit3

This document contains Python code demonstrating several clustering techniques, including K-Means, hierarchical clustering, and bisecting K-Means, with a visualization for each method. It also shows PCA applied for dimensionality reduction on student test scores, transforming the data into two principal components and visualizing the result. The code uses NumPy, SciPy, Matplotlib, and scikit-learn for data manipulation and visualization.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Generate synthetic data: 100 points around (1, 1) and 100 points around (5, 5)
np.random.seed(42)
data1 = np.random.randn(100, 2) + np.array([1, 1])
data2 = np.random.randn(100, 2) + np.array([5, 5])
data = np.vstack((data1, data2))

# Apply K-Means clustering with K=2
kmeans = KMeans(n_clusters=2)
kmeans.fit(data)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Plot the data points colored by cluster label, with the centroids marked
plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='viridis', marker='o')
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='x',
            s=100, label='Centroids')
plt.title('K-Means Clustering Example')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
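The example fixes K=2 because the data was generated around two centres. When the right K is unknown, a common heuristic is the elbow method: run K-Means for several values of K and plot the inertia (within-cluster sum of squared distances). A minimal sketch, reusing the `data` array from the block above:

# Elbow method: inspect inertia for a range of K values
inertias = []
k_values = range(1, 8)
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(data)
    inertias.append(km.inertia_)

plt.plot(list(k_values), inertias, 'o-')
plt.title('Elbow Method for Choosing K')
plt.xlabel('Number of clusters K')
plt.ylabel('Inertia (within-cluster SSE)')
plt.show()

The "elbow", the point where adding clusters stops reducing inertia sharply, should appear at K=2 for this data.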
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Generate synthetic data
np.random.seed(42)
data = np.random.randn(20, 2)

# Perform hierarchical/agglomerative clustering
linked = linkage(data, method='ward')

# Plot the dendrogram
plt.figure(figsize=(10, 7))
dendrogram(linked,
           orientation='top',
           distance_sort='descending',
           show_leaf_counts=True)
plt.title('Dendrogram for Agglomerative Clustering')
plt.xlabel('Data Points')
plt.ylabel('Euclidean Distance')
plt.show()
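The dendrogram shows the full merge hierarchy but no cluster assignments. To obtain flat labels, the tree can be cut with SciPy's `fcluster`; a short sketch, reusing `linked` and `data` from above and cutting at 3 clusters (an arbitrary choice for illustration):

from scipy.cluster.hierarchy import fcluster

# Cut the hierarchy into a fixed number of flat clusters
flat_labels = fcluster(linked, t=3, criterion='maxclust')

plt.scatter(data[:, 0], data[:, 1], c=flat_labels, cmap='viridis')
plt.title('Flat Clusters Cut from the Dendrogram')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()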
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Generate synthetic data
np.random.seed(42)
data = np.random.randn(100, 2)

def bisecting_kmeans(data, k):
    clusters = [data]
    while len(clusters) < k:
        # Choose the largest cluster to split; pop by index, because
        # list.remove() compares elements with == and cannot reliably
        # compare NumPy arrays
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        largest_cluster = clusters.pop(idx)

        # Perform K-Means with k=2 on the selected cluster
        kmeans = KMeans(n_clusters=2, random_state=42).fit(largest_cluster)
        labels = kmeans.labels_

        # Split the cluster into two sub-clusters
        clusters.append(largest_cluster[labels == 0])
        clusters.append(largest_cluster[labels == 1])

    return clusters

# Perform bisecting K-Means to get 3 clusters
clusters = bisecting_kmeans(data, 3)

# Plot the clusters
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k']
plt.figure(figsize=(8, 6))
for i, cluster in enumerate(clusters):
    plt.scatter(cluster[:, 0], cluster[:, 1],
                c=colors[i % len(colors)], label=f'Cluster {i+1}')
plt.title('Divisive Clustering using Bisecting K-Means')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
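Recent versions of scikit-learn (1.1 and later) ship a built-in `BisectingKMeans` implementing the same divisive idea; by default it picks the cluster to split by largest inertia rather than by size, so its result can differ from the hand-rolled function above. A minimal sketch, assuming scikit-learn >= 1.1:

from sklearn.cluster import BisectingKMeans

# Built-in bisecting K-Means on the same data
bkm = BisectingKMeans(n_clusters=3, random_state=42).fit(data)
plt.scatter(data[:, 0], data[:, 1], c=bkm.labels_, cmap='viridis')
plt.scatter(bkm.cluster_centers_[:, 0], bkm.cluster_centers_[:, 1],
            c='red', marker='x', s=100, label='Centroids')
plt.title('Bisecting K-Means (scikit-learn built-in)')
plt.legend()
plt.show()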
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Sample student test scores (10 subjects for 6 students)
data = np.array([
    [85, 78, 92, 88, 76, 95, 89, 91, 85, 87],  # Student 1
    [70, 75, 80, 78, 85, 90, 79, 77, 88, 86],  # Student 2
    [60, 65, 55, 50, 70, 72, 68, 60, 58, 63],  # Student 3
    [90, 85, 88, 92, 93, 97, 95, 91, 89, 90],  # Student 4
    [50, 55, 48, 45, 52, 58, 54, 50, 53, 55],  # Student 5
    [80, 82, 85, 87, 88, 92, 89, 85, 83, 81]   # Student 6
])

# Convert to DataFrame
df = pd.DataFrame(data, columns=[f'Subject {i+1}' for i in range(10)])
print("Original Student Scores:")
print(df)

# Step 1: Standardizing the data (zero mean, unit variance per subject)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

# Step 2: Applying PCA (reduce from 10 subjects to 2 principal components)
pca = PCA(n_components=2)
principal_components = pca.fit_transform(scaled_data)

# Step 3: Creating a new DataFrame with the PCA results
pca_df = pd.DataFrame(principal_components,
                      columns=['Overall Academic Strength', 'Sports Ability'])
print("\nTransformed Data after PCA:")
print(pca_df)

# Step 4: Visualizing the transformed data
plt.figure(figsize=(8, 6))
plt.scatter(pca_df['Overall Academic Strength'], pca_df['Sports Ability'],
            color='b', s=100)
for i, txt in enumerate(["Student 1", "Student 2", "Student 3",
                         "Student 4", "Student 5", "Student 6"]):
    plt.annotate(txt,
                 (pca_df['Overall Academic Strength'][i], pca_df['Sports Ability'][i]),
                 fontsize=12)
plt.axhline(0, color='gray', linestyle='--')
plt.axvline(0, color='gray', linestyle='--')
plt.xlabel('Overall Academic Strength')
plt.ylabel('Sports Ability')
plt.title('Student Grouping using PCA')
plt.show()

Original Student Scores:
   Subject 1  Subject 2  Subject 3  Subject 4  Subject 5  Subject 6  \
0         85         78         92         88         76         95
1         70         75         80         78         85         90
2         60         65         55         50         70         72
3         90         85         88         92         93         97
4         50         55         48         45         52         58
5         80         82         85         87         88         92

   Subject 7  Subject 8  Subject 9  Subject 10
0         89         91         85          87
1         79         77         88          86
2         68         60         58          63
3         95         91         89          90
4         54         50         53          55
5         89         85         83          81

Transformed Data after PCA:
   Overall Academic Strength  Sports Ability
0                   2.191978       -0.920909
1                   0.978242        0.349057
2                  -3.026049        0.459299
3                   3.222156        0.228173
4                  -5.305649       -0.352800
5                   1.939321        0.237180
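The column names 'Overall Academic Strength' and 'Sports Ability' are human interpretations attached to the components; PCA itself only guarantees that each component captures the largest remaining variance. How much variance the two components actually retain can be checked from the fitted `pca` object above:

# Fraction of total variance captured by each principal component
print(pca.explained_variance_ratio_)
print("Total variance retained:", pca.explained_variance_ratio_.sum())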
