Axe Submission

Gf

Uploaded by

Sagar veerala
Copyright
© All Rights Reserved

Python Developer Initial Submission

Question:

Create a Python script or workflow that automates the analysis of customer data
(e.g., purchase history, browsing behavior) to identify trends and segment customers
for targeted marketing campaigns. What data processing and visualization tools would
you use? Please include a code snippet or pseudocode.

Libraries:

1. Pandas for data manipulation
2. NumPy for numerical computations
3. Matplotlib and Seaborn for visualization
4. Scikit-learn for clustering and segmentation
5. SciPy for statistical analysis

Tools:

1. Jupyter Notebook or Python script for data analysis
2. Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, and SciPy for data processing
and visualization
3. CSV file for data storage

Visualization:

1. Scatter plots for cluster visualization
2. Box plots for demographic data analysis
3. Heatmaps for correlation analysis
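
The heatmap item can be illustrated with a minimal sketch. The column names and values below are made up for illustration, and the figure is saved to a hypothetical file name rather than shown interactively.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up numeric features for illustration
df = pd.DataFrame({
    "purchase_count": [2, 0, 5, 1, 7, 3],
    "browsing_count": [1, 3, 4, 8, 6, 2],
    "age": [25, 41, 33, 52, 29, 38],
})

# Pairwise Pearson correlations between the numeric columns
corr = df.corr()

fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
ax.set_title("Feature Correlations")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")  # hypothetical output file name
```

Fixing vmin and vmax at -1 and 1 keeps the color scale comparable across datasets.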

Script:

import json

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the customer data
df = pd.read_csv('customer_data.csv')

# Encode each comma-separated list by its number of commas (missing values count as 0)
df['purchase_history'] = df['purchase_history'].str.count(',').fillna(0)
df['browsing_behavior'] = df['browsing_behavior'].str.count(',').fillna(0)

# Standardize both features to zero mean and unit variance
scaler = StandardScaler()
df[['purchase_history', 'browsing_behavior']] = scaler.fit_transform(
    df[['purchase_history', 'browsing_behavior']])

# Cluster the customers; random_state makes the labels reproducible
kmeans = KMeans(n_clusters=5, random_state=42)
df['cluster'] = kmeans.fit_predict(df[['purchase_history', 'browsing_behavior']])

sns.scatterplot(x='purchase_history', y='browsing_behavior', hue='cluster', data=df)
plt.title('Customer Clusters')
plt.show()

# Assumption: demographic_data stores one JSON object per row, e.g. {"age": 34}
demographic_df = pd.DataFrame(
    df['demographic_data'].apply(json.loads).tolist(), index=df.index)
sns.boxplot(x='age', data=demographic_df)
plt.title('Age Distribution')
plt.show()

# Collect summary statistics of the demographics in each cluster
segments = []
for cluster in sorted(df['cluster'].unique()):
    segment_df = demographic_df[df['cluster'] == cluster]
    segments.append({'cluster': cluster, 'demographics': segment_df.describe()})

for segment in segments:
    print(f"Cluster {segment['cluster']}:")
    print(segment['demographics'])

Explanation:

Step 1: Load Data

df = pd.read_csv('customer_data.csv')
- Loads customer data from a CSV file named customer_data.csv into a Pandas
DataFrame (df).
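
A minimal sketch of this step, together with the comma-count encoding the script applies right after loading. It reads an in-memory CSV instead of customer_data.csv; the sample rows and the customer_id column are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for customer_data.csv
SAMPLE_CSV = '''customer_id,purchase_history,browsing_behavior
1,"shoes,socks,hat","home,sale"
2,"book","home,books,cart,checkout"
3,,"home"
'''

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

for col in ["purchase_history", "browsing_behavior"]:
    # str.count(',') counts separators, so "a,b,c" gives 2 and a single item gives 0;
    # missing values come back as NaN and are set to 0 here
    df[col] = df[col].str.count(",").fillna(0).astype(int)

print(df)
```

Note that the comma count is one less than the number of items, so a customer with one purchase and a customer with none both end up at 0. Adding 1 to non-missing values would be one way to tell them apart.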

Step 2: Scale Data

scaler = StandardScaler()
df[['purchase_history', 'browsing_behavior']] = scaler.fit_transform(
    df[['purchase_history', 'browsing_behavior']])
- Creates a StandardScaler object (scaler) to standardize the data.
- Selects the purchase_history and browsing_behavior columns from the DataFrame
(df), which the script has already converted from comma-separated text to counts.
- Applies the scaler to these columns using fit_transform(), which:
  - Subtracts each column's mean from its values.
  - Divides the result by the column's standard deviation.
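
As a quick check of what fit_transform() computes, the sketch below compares it with the same arithmetic done by hand. The counts are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up counts for illustration
df = pd.DataFrame({
    "purchase_history": [2.0, 0.0, 5.0, 1.0],
    "browsing_behavior": [1.0, 3.0, 0.0, 8.0],
})
cols = ["purchase_history", "browsing_behavior"]

scaled = StandardScaler().fit_transform(df[cols])

# Same computation by hand; StandardScaler uses the population std (ddof=0),
# whereas pandas' std() defaults to the sample std (ddof=1)
manual = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)

print(np.allclose(scaled, manual.to_numpy()))
```

After scaling, each column has mean 0 and standard deviation 1, so neither feature dominates the distance calculation in the clustering step.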

Step 3: Cluster Customers

kmeans = KMeans(n_clusters=5)
df['cluster'] = kmeans.fit_predict(df[['purchase_history', 'browsing_behavior']])
- Creates a KMeans clustering object (kmeans) with 5 clusters (n_clusters=5).
- Selects the scaled purchase_history and browsing_behavior columns.
- Applies KMeans clustering using fit_predict(), which:
  - Assigns each customer to the cluster whose centroid is nearest in the scaled
    feature space.
  - Returns the cluster labels (0-4), which are stored in a new column (cluster) in the
    DataFrame.
- Because the initial centroids are chosen randomly, the label numbering can differ
between runs unless random_state is set.
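
The value n_clusters=5 is fixed here without a stated justification. One common sanity check, which is not part of the original submission, is to compare silhouette scores across several values of k. The sketch below does this on synthetic data standing in for the two scaled features; the data and the range of k are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the two scaled features: three well-separated groups
centers = np.array([[-3.0, -3.0], [0.0, 3.0], [3.0, -3.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette score lies in [-1, 1]; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On real customer data the scores are rarely this clear-cut, so the result is better read as a hint than as a rule.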

Step 4: Visualize Clusters

sns.scatterplot(x='purchase_history', y='browsing_behavior', hue='cluster', data=df)
plt.show()
- Uses Seaborn's scatterplot() function to visualize the clusters.
- Plots the scaled purchase_history values on the x-axis and browsing_behavior values
on the y-axis.
- Colors each point according to its cluster label (hue='cluster').

Step 5: Print Cluster Demographics

for cluster in df['cluster'].unique():
    print(f"Cluster {cluster}:")
    print(df[df['cluster'] == cluster]['demographic_data'].describe())

- Loops through each unique cluster label.
- Filters the DataFrame to include only customers in the current cluster.
- Prints summary statistics of the demographic_data column for that cluster using
describe().
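
describe() is most informative on numeric columns. The sketch below assumes demographic_data has already been expanded into a numeric age column, and uses groupby() to produce the same per-cluster summary in one call. The sample values are made up for illustration.

```python
import pandas as pd

# Made-up data: cluster labels plus an already-parsed numeric age column
df = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 1, 2],
    "age": [25, 35, 40, 50, 60, 30],
})

# One row of summary statistics (count, mean, std, quartiles) per cluster
summary = df.groupby("cluster")["age"].describe()
print(summary)
```

The result is a single table, which is easier to compare across clusters than separately printed blocks.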

The solution:

1. Identifies patterns in customer behavior (purchase history and browsing behavior).
2. Groups customers into 5 clusters based on these patterns.
3. Visualizes the clusters to understand the customer segments.
4. Provides demographic insights for each cluster.
