Customer Churn Prediction Capstone Projectdocx
Customer Churn Prediction Capstone Projectdocx
Machine Learning
Himanshu Tripathi
Reg.No – 22CBBBA014
BBA in Business Analytics
CMR University
Executive Summary
Customer churn represents a significant issue for businesses, especially
in highly competitive industries like telecommunications, SaaS, and
retail. Predicting customer churn allows companies to proactively
implement retention strategies that help maintain revenue, reduce
acquisition costs, and build long-term customer relationships.
Introduction
Customer churn refers to the loss of customers or subscribers, which
directly impacts the revenue and growth of a business. In today’s data-
driven world, companies are increasingly relying on predictive analytics
to anticipate and prevent churn.
Data Overview
Dataset Source:
- Dataset Name: Telco Customer Churn
- Source: Kaggle
- Link: https://www.kaggle.com/datasets/blastchar/telco-customer-
churn
Structure:
- Number of Rows: 7,043
- Number of Columns: 21
- Target Variable: Churn (Yes/No)
Sample Features:
- gender: Male, Female
- SeniorCitizen: 0 (No), 1 (Yes)
- Partner: Yes/No
- Dependents: Yes/No
- tenure: Number of months the customer has stayed
- PhoneService: Yes/No
- InternetService: DSL, Fiber optic, No
- Contract: Month-to-month, One year, Two year
- PaymentMethod: Electronic check, Mailed check, etc.
- MonthlyCharges and TotalCharges: Financial details
Churn by Tenure:
- Customers with low tenure (0-12 months) showed a high churn
tendency.
Model Building
Model Chosen:
- Logistic Regression: Chosen for its simplicity and interpretability.
Train-Test Split:
- 80% training, 20% testing
Model Training:
The logistic regression model was trained using scikit-learn with
max_iter set to 1000 for better convergence.
Model Prediction:
After training, predictions were made on the test dataset to evaluate
performance.
Model Evaluation
Confusion Matrix:
- Shows True Positives, True Negatives, False Positives, and False
Negatives.
Classification Report:
- Accuracy: ~80%
- Precision: Indicates correctness of positive predictions
- Recall: Indicates coverage of actual positives
- F1-Score: Balance between precision and recall
Business Recommendations:
1. Incentivize Long-Term Contracts: Offer discounts for switching to
yearly plans.
2. Targeted Retention Campaigns: Focus on customers with low tenure
and high charges.
3. Improve Customer Support: Ensure tech support is prompt and
helpful.
4. Bundles & Loyalty Programs: Offer bundle discounts on internet +
phone service.
Conclusion
Customer churn is a critical metric for business success. Through this
project, a predictive machine learning model was successfully built to
identify customers likely to churn.
# Encode Churn
df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})
Train-Test Split
from sklearn.model_selection import train_test_split
X = df.drop('Churn', axis=1)
y = df['Churn']
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Evaluation Metrics