
Customer Churn Prediction Capstone Project

The document discusses a capstone project focused on predicting customer churn using machine learning, specifically utilizing the Telco Customer Churn dataset. A Logistic Regression model was developed to identify customers likely to churn based on various features, achieving an accuracy of approximately 80%. The project concludes with actionable business recommendations to improve customer retention and suggests future work involving advanced modeling techniques.

Customer Churn Prediction Using Machine Learning
Himanshu Tripathi
Reg.No – 22CBBBA014
BBA in Business Analytics
CMR University
Executive Summary
Customer churn represents a significant issue for businesses, especially
in highly competitive industries like telecommunications, SaaS, and
retail. Predicting customer churn allows companies to proactively
implement retention strategies that help maintain revenue, reduce
acquisition costs, and build long-term customer relationships.

In this capstone project, a machine learning-based approach is
implemented to predict whether a customer is likely to churn based on
historical behavioral and demographic data. The analysis uses the Telco
Customer Churn dataset, which includes a variety of features such as
service usage, contract type, tenure, payment methods, and more.

The methodology involves data cleaning, exploratory data analysis
(EDA), preprocessing, and modeling using Logistic Regression. The
model's performance is evaluated using accuracy, precision, recall, F1-
score, and confusion matrix. Insights derived from this analysis are then
translated into actionable business strategies.

Introduction
Customer churn refers to the loss of customers or subscribers, which
directly impacts the revenue and growth of a business. In today’s data-
driven world, companies are increasingly relying on predictive analytics
to anticipate and prevent churn.

The objective of this project is to create a predictive model using
machine learning to identify customers who are at risk of leaving. This
will help businesses act in advance by targeting those customers with
specific offers, improving services, or addressing potential concerns.

Predicting churn is not only cost-effective but also enhances customer
satisfaction by showing proactive engagement from the company.

Problem Statement
The goal is to predict customer churn using historical data provided by
a telecommunications company. The company wants to:

- Identify customers who are likely to stop using their services.
- Understand the key drivers of churn.
- Develop actionable strategies for customer retention.

The dataset consists of over 7,000 customer records and 21 features,
including demographic information, account details, and service usage
metrics.

Data Overview
Dataset Source:
- Dataset Name: Telco Customer Churn
- Source: Kaggle
- Link: https://www.kaggle.com/datasets/blastchar/telco-customer-churn

Structure:
- Number of Rows: 7,043
- Number of Columns: 21
- Target Variable: Churn (Yes/No)

Sample Features:
- gender: Male, Female
- SeniorCitizen: 0 (No), 1 (Yes)
- Partner: Yes/No
- Dependents: Yes/No
- tenure: Number of months the customer has stayed
- PhoneService: Yes/No
- InternetService: DSL, Fiber optic, No
- Contract: Month-to-month, One year, Two year
- PaymentMethod: Electronic check, Mailed check, etc.
- MonthlyCharges and TotalCharges: Financial details

Data Cleaning & Preprocessing
Data Type Corrections:
- TotalCharges was read as an object (string) column because some
entries are blank; it was converted to numeric, which turns those
blanks into nulls.

Handling Missing Values:
- Rows with null TotalCharges were removed after conversion.

Encoding Categorical Variables:
- Target variable Churn was encoded: Yes = 1, No = 0
- Other categorical columns were one-hot encoded using
get_dummies().

Final Dataset Shape:
- After preprocessing, the dataset had no null values and all features
were numeric.
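
The cleaning steps above can be illustrated on a tiny stand-in table. This is a minimal sketch, not the project's actual run: the four rows and the variable names (toy, clean, encoded) are invented for illustration, and only three columns are kept so the effect of each step is easy to see.

```python
import pandas as pd

# Toy stand-in for the Telco data; the values are invented for illustration.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "One year", "Month-to-month"],
    "TotalCharges": ["29.85", " ", "1889.5", "108.15"],
    "Churn": ["No", "No", "No", "Yes"],
})

# Strings that cannot be parsed (here a single blank) become NaN instead of raising.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")
n_missing = int(toy["TotalCharges"].isna().sum())

# Drop the rows that failed conversion, then encode the target as 0/1.
clean = toy.dropna(subset=["TotalCharges"]).copy()
clean["Churn"] = clean["Churn"].map({"Yes": 1, "No": 0})

# One-hot encode the remaining categorical column.
encoded = pd.get_dummies(clean, columns=["Contract"])

print("rows with blank TotalCharges:", n_missing)
print(encoded.columns.tolist())
```

Restricting dropna() to the TotalCharges column keeps the intent explicit: only rows that failed the numeric conversion are discarded.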

Exploratory Data Analysis (EDA)
Churn Distribution:
- Around 26.5% of customers churned.
- Class imbalance is moderate and manageable.

Churn by Contract Type:
- Customers with Month-to-month contracts had the highest churn rate.

Churn by Tenure:
- Customers with low tenure (0-12 months) showed a high churn
tendency.

Churn by Monthly Charges:
- Customers with higher monthly charges were more likely to churn.

Correlation Matrix:
- A heatmap was generated to understand feature correlations.
- Tenure was negatively correlated with churn.
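
The EDA summaries above (overall churn rate, churn rate per group, and the tenure correlation) can be computed with a few pandas calls. The sketch below uses eight invented rows rather than the real dataset, and the tenure band edges are an assumption chosen only to mirror the 0-12 month band mentioned above.

```python
import pandas as pd

# Toy stand-in with an already-encoded Churn column (1 = churned); values are invented.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 2,
    "tenure": [1, 3, 40, 5, 14, 30, 60, 72],
    "Churn": [1, 1, 0, 1, 1, 0, 0, 0],
})

# The mean of a 0/1 column is the share of churned customers.
churn_rate = toy["Churn"].mean()
rate_by_contract = toy.groupby("Contract")["Churn"].mean()

# Bucket tenure into bands (edges are an assumption) and compare churn rates per band.
tenure_band = pd.cut(
    toy["tenure"],
    bins=[0, 12, 24, 48, 72],
    labels=["0-12", "13-24", "25-48", "49-72"],
    include_lowest=True,
)
rate_by_tenure = toy.groupby(tenure_band, observed=True)["Churn"].mean()

# Pearson correlation between tenure and the 0/1 churn flag.
tenure_churn_corr = toy["tenure"].corr(toy["Churn"])

print(rate_by_contract)
print(rate_by_tenure)
print(round(tenure_churn_corr, 2))
```

On the real data the same groupby pattern applies to any categorical column, such as PaymentMethod or InternetService.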

Model Building
Model Chosen:
- Logistic Regression: Chosen for its simplicity and interpretability.

Train-Test Split:
- 80% training, 20% testing

Model Training:
The logistic regression model was trained using scikit-learn with
max_iter set to 1000 for better convergence.

Model Prediction:
After training, predictions were made on the test dataset to evaluate
performance.
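
As a variation on the training setup described above, the sketch below adds two things the report does not state were used, so both are assumptions: a stratified split (so the churn share is the same in the train and test sets) and feature standardization inside a pipeline, which generally helps the solver converge in fewer iterations. It runs on synthetic data from make_classification, not on the Telco dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with roughly the class balance reported above (about 26.5% positives).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.735, 0.265], random_state=0
)

# 80/20 split; stratify=y keeps the positive share equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The scaler is fitted on the training data only, then applied to the test data.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(round(test_accuracy, 3))
```

Wrapping the scaler and the classifier in one pipeline avoids leaking test-set statistics into the scaling step.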

Model Evaluation
Confusion Matrix:
- Shows True Positives, True Negatives, False Positives, and False
Negatives.

Classification Report:
- Accuracy: ~80%
- Precision: Indicates correctness of positive predictions
- Recall: Indicates coverage of actual positives
- F1-Score: Balance between precision and recall
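
To make the metric definitions above concrete, the following worked example derives each one from confusion-matrix counts. The four counts are hypothetical, chosen only so that accuracy comes out at 80%; they are not the project's results. The baseline figure uses the 26.5% churn rate reported in the EDA section.

```python
# Hypothetical confusion-matrix counts (not the project's actual results).
tn, fp, fn, tp = 900, 100, 180, 220
total = tn + fp + fn + tp

accuracy = (tp + tn) / total          # share of all predictions that are correct
precision = tp / (tp + fp)            # share of predicted churners who really churned
recall = tp / (tp + fn)               # share of actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of always predicting "No", given that about 26.5% of customers churn.
majority_baseline = 1 - 0.265

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} baseline={majority_baseline:.3f}")
```

With these counts, accuracy is 80% while recall is only 55%, which shows why accuracy should be read next to the always-"No" baseline of about 73.5% and alongside the per-class metrics.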

The Logistic Regression model provided a good baseline. For future
improvement, models like Random Forest, XGBoost, or ensemble
methods can be tested.
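
One possible way to run such a comparison is cross-validation with a shared metric. The sketch below is an assumption about how the future work might look, not something done in this project: it compares Logistic Regression with a Random Forest on synthetic data, using 5-fold cross-validated F1 for the positive class. The estimator settings are illustrative defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with a churn-like class balance.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.735, 0.265], random_state=0
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean F1 across 5 folds for each candidate model.
scores = {
    name: cross_val_score(estimator, X, y, cv=5, scoring="f1").mean()
    for name, estimator in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
```

Scoring on F1 rather than accuracy keeps the comparison focused on the churn class, which is the minority here.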
Insights & Business Recommendations
Key Insights:
- Customers on month-to-month contracts are more likely to churn.
- High monthly charges and lower tenure also indicate higher churn
risk.
- Customers without internet service or tech support showed reduced
engagement.
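
One way to connect insights like these to the fitted model is to inspect the logistic regression coefficients. The sketch below is illustrative only: the data is simulated with known effects (lower tenure, higher charges, and month-to-month contracts raise churn probability), the effect sizes are invented, and features are standardized so that coefficient magnitudes are comparable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Simulated customers; column names mirror the dataset, values are random.
X = pd.DataFrame({
    "tenure": rng.integers(1, 73, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n),
    "Contract_Month-to-month": rng.integers(0, 2, size=n),
})

# Invented effect sizes used to simulate churn outcomes.
logit = (-1.0 - 0.05 * X["tenure"] + 0.02 * X["MonthlyCharges"]
         + 1.5 * X["Contract_Month-to-month"])
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize so each coefficient is "per one standard deviation" of its feature.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# Positive coefficients push toward churn, negative ones away from it.
coefs = pd.Series(model.coef_[0], index=X.columns).sort_values()
print(coefs)
```

Coefficients describe association within the model, not causation, so they are best used to prioritize which drivers to investigate further.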

Business Recommendations:
1. Incentivize Long-Term Contracts: Offer discounts for switching to
yearly plans.
2. Targeted Retention Campaigns: Focus on customers with low tenure
and high charges.
3. Improve Customer Support: Ensure tech support is prompt and
helpful.
4. Bundles & Loyalty Programs: Offer bundle discounts on internet +
phone service.

Conclusion
Customer churn is a critical metric for business success. Through this
project, a predictive machine learning model was successfully built to
identify customers likely to churn.

With proper data preparation, feature engineering, and model training,
the logistic regression model achieved an acceptable level of
performance.

Future work will include testing with more advanced algorithms,
integrating additional customer feedback data, and deploying the model
into a production environment for real-time monitoring.

References
1. Telco Customer Churn Dataset -
https://www.kaggle.com/datasets/blastchar/telco-customer-churn
2. Scikit-learn Documentation - https://scikit-learn.org/
3. Python Data Analysis Library (Pandas) - https://pandas.pydata.org/
4. Seaborn Documentation - https://seaborn.pydata.org/
5. Matplotlib Documentation - https://matplotlib.org/
Modeling Code Snippets
Data Preprocessing
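import pandas as pd

# df is assumed to hold the Telco CSV, already loaded with pd.read_csv()

# Drop the identifier so it is not one-hot encoded into thousands of columns
df = df.drop(columns=['customerID'])

# Convert TotalCharges to numeric; blank strings become NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df = df.dropna(subset=['TotalCharges'])

# Encode Churn
df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})

# One-hot encode the remaining categorical variables
df = pd.get_dummies(df)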

Train-Test Split
from sklearn.model_selection import train_test_split

X = df.drop('Churn', axis=1)
y = df['Churn']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Logistic Regression Model
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

Evaluation Metrics
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
Visualizations
These charts illustrate the distribution and relationship of key features with churn.

[Figure: Customer Churn Distribution]
[Figure: Churn Rate by Contract Type]
[Figure: Monthly Charges vs Churn]
