Axe Submission
Question:
Create a Python script or workflow that automates the analysis of customer data
(e.g., purchase history, browsing behavior) to identify trends and segment customers
for targeted marketing campaigns. What data processing and visualization tools would
you use? Please include a code snippet or pseudocode.
Libraries: pandas, scikit-learn, Matplotlib, seaborn
Tools:
Visualization: Matplotlib and seaborn (box plot of the age distribution)
Script:
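import json

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the raw customer data.
df = pd.read_csv('customer_data.csv')

# Reduce each comma-separated list to a count of its separators,
# a rough proxy for how many purchases or page views a customer has.
# Missing values count as 0 so that KMeans does not fail on NaN.
df['purchase_history'] = df['purchase_history'].str.count(',').fillna(0)
df['browsing_behavior'] = df['browsing_behavior'].str.count(',').fillna(0)

# Standardize both features so neither dominates the distance calculation.
scaler = StandardScaler()
df[['purchase_history', 'browsing_behavior']] = scaler.fit_transform(
    df[['purchase_history', 'browsing_behavior']])

# Group customers into 5 clusters on the scaled features.
kmeans = KMeans(n_clusters=5)
df['cluster'] = kmeans.fit_predict(df[['purchase_history', 'browsing_behavior']])

# Expand demographic_data into one column per field.
# Assumption: each cell is a JSON object string that includes an 'age' key.
demographic_df = df['demographic_data'].apply(json.loads).apply(pd.Series)
sns.boxplot(x='age', data=demographic_df)
plt.title('Age Distribution')
plt.show()

# Summarize the demographic_data column for each cluster.
segments = []
for cluster in df['cluster'].unique():
    segment_df = df[df['cluster'] == cluster]
    segments.append({'cluster': cluster,
                     'demographics': segment_df['demographic_data'].describe()})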
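The script plots only the age distribution. The clusters themselves can also be inspected visually; the following is a minimal sketch using Matplotlib alone, with a synthetic DataFrame (the demo values and the plot_segments helper are invented for illustration, not part of the original script).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the clustered DataFrame (values are made up).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'purchase_history': rng.normal(size=60),
    'browsing_behavior': rng.normal(size=60),
    'cluster': rng.integers(0, 5, size=60),
})

def plot_segments(data):
    """Scatter the two scaled features, one color per cluster label."""
    fig, ax = plt.subplots(figsize=(6, 4))
    points = ax.scatter(data['purchase_history'], data['browsing_behavior'],
                        c=data['cluster'], cmap='tab10', s=25)
    ax.set_xlabel('purchase_history (scaled)')
    ax.set_ylabel('browsing_behavior (scaled)')
    ax.set_title('Customer Segments')
    ax.legend(*points.legend_elements(), title='cluster')
    return fig, ax

fig, ax = plot_segments(demo)

# Cluster sizes are a quick sanity check alongside the plot.
sizes = demo['cluster'].value_counts().sort_index()
print(sizes)
```

Calling plt.show() after plot_segments(df) would display the figure for the real data.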
Explanation:
df = pd.read_csv('customer_data.csv')
- Loads customer data from a CSV file named customer_data.csv into a Pandas
DataFrame (df).
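The layout of customer_data.csv is not given, so the following is only a sketch under an assumed schema: one row per customer, comma-separated lists in purchase_history and browsing_behavior, and a JSON object in demographic_data. The sample rows, the customer_id column, and the load_customers helper are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample standing in for customer_data.csv (schema is an assumption).
SAMPLE_CSV = '''customer_id,purchase_history,browsing_behavior,demographic_data
1,"shoes,socks,hat","home,shoes,cart,checkout","{""age"": 34}"
2,"book","home,books","{""age"": 51}"
3,,"home","{""age"": 27}"
'''

def load_customers(source):
    """Read the CSV and check that the columns the script relies on exist."""
    df = pd.read_csv(source)
    required = {'purchase_history', 'browsing_behavior', 'demographic_data'}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    return df

sample_df = load_customers(io.StringIO(SAMPLE_CSV))
print(sample_df.shape)
# Customers with no purchases come back as NaN and need handling before clustering.
print(sample_df['purchase_history'].isna().sum())
```

Checking the columns up front gives a clear error message instead of a KeyError later in the pipeline.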
scaler = StandardScaler()
df[['purchase_history', 'browsing_behavior']] = scaler.fit_transform(
    df[['purchase_history', 'browsing_behavior']])
- Creates a StandardScaler object (scaler) to standardize the data.
- Selects the purchase_history and browsing_behavior columns from the DataFrame (df).
- Applies the scaler to these columns using fit_transform(), which:
  - Subtracts each column's mean from its values.
  - Divides the result by the column's standard deviation, so each column ends up with mean 0 and standard deviation 1.
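As a quick check of the two steps above, the sketch below compares StandardScaler against the same arithmetic done by hand on a small made-up matrix. The toy values are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: rows are customers, columns are two count features.
X = np.array([[2.0, 3.0], [0.0, 1.0], [4.0, 8.0], [6.0, 0.0]])

scaled = StandardScaler().fit_transform(X)

# Same computation by hand; StandardScaler uses the population
# standard deviation (ddof=0), which is also NumPy's default.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```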
kmeans = KMeans(n_clusters=5)
df['cluster'] = kmeans.fit_predict(df[['purchase_history', 'browsing_behavior']])
- Creates a KMeans clustering object (kmeans) with 5 clusters (n_clusters=5).
- Selects the scaled purchase_history and browsing_behavior columns.
- Applies KMeans clustering using fit_predict(), which:
  - Assigns each customer to a cluster based on their scaled values.
  - Returns the cluster labels (0-4), which are stored in a new column (cluster) in the DataFrame.
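The script fixes the number of clusters at 5 without saying how that value was chosen. One common way to compare candidate values is the silhouette score; the sketch below is an illustration on synthetic data from make_blobs, not the author's method. Fixing random_state and n_init is an added assumption to make the run repeatable, and inverse_transform is used to read the cluster centers back in the original units.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two count features (not real customer data).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Score a range of candidate cluster counts; a higher silhouette is better.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)

# Refit with the chosen k and express the cluster centers in original units.
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)
centers_original = scaler.inverse_transform(kmeans.cluster_centers_)
print(best_k, centers_original.shape)
```

On real customer data the scores are often close together, so the final choice usually also weighs how many segments a marketing team can act on.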