Mini-Project - Churn Analysis
➢ PROBLEM STATEMENT:
Businesses must compete fiercely to win over new consumers. Since it directly
affects a company’s revenue, customer retention is a key topic for analysis, and early detection of
customer churn enables businesses to take proactive measures to keep customers. As a result, firms
practice a variety of approaches to identify at-risk clients early on through client retention
initiatives. Customer churn is sudden and problematic from a business perspective. A UK-based and
registered non-store online retail company mainly sells unique all-occasion gifts, and many of the
company’s customers are wholesalers. As such, a customer can decide to terminate their relationship at any
time. This makes it more difficult to intuitively understand when a customer might churn.
➢ GOAL
The desired outcome is to analyze consumers’ behaviors and predict what they might do in the
future.
➢ CUSTOMER CHURN
To the untrained eye, customer behavior is difficult to predict. After all, they are humans with
erratic whims and desires. However, to a machine that can compute thousands of things a second,
trends and patterns are increasingly obvious. Businesses aim to engage with customers in a way
that keeps them returning to the store repeatedly, generating revenue each time. However, it can be
challenging to determine which customers are likely to return and which have lost
interest in the goods or services being provided.
Customer Churn: A customer is considered to be churning if they are actively returning to the store,
whereas a churned customer is one who is no longer coming back for more.
Customer Churn Risk is the probability that a customer will disengage with the business.
Understanding how your customers behave is imperative to making the most of their
patronage. Today, we can leverage the volumes of data available to us to predict how likely
a customer is to continue engaging with your business. This kind of prediction can be valuable
to the business in several ways.
➢ STEP 1 : DATA CLEANING
The shape of the data before and after removing rows with a missing Customer_ID is (541909, 8) and
(406829, 8) respectively. There were also 311640 missing records in the Date column, and these were
removed. The only features required are Date, Customer_ID, transaction value and transaction
time. Basic information about these columns was inspected, and the Quantity column was cleaned.
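A minimal sketch of this cleaning step with pandas is shown below. The file name, encoding and column names (InvoiceDate, CustomerID, Quantity, UnitPrice, StockCode) follow the public UCI Online Retail dataset and are assumptions here; the report's own column names may differ.

```python
import pandas as pd

# Load the raw transactions (file name and encoding are assumptions).
df = pd.read_csv("OnlineRetail.csv", encoding="ISO-8859-1",
                 parse_dates=["InvoiceDate"])
print("Shape before cleaning:", df.shape)

# Drop rows with a missing customer identifier.
df = df.dropna(subset=["CustomerID"])
print("Shape after dropping missing CustomerID:", df.shape)

# Clean the Quantity column: keep only positive quantities and prices,
# which removes returns and obviously bad records.
df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]

# Keep only the features needed downstream: date, customer and transaction value.
df["TransactionValue"] = df["Quantity"] * df["UnitPrice"]
transactions = df[["InvoiceDate", "CustomerID", "TransactionValue"]].copy()
```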
VISUALIZATION
The number of unique customers and products in the dataset is 2185 and 2881 respectively. A date
range for the analysis is then decided.
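A quick profiling sketch on the cleaned frame from above (column names remain assumptions):

```python
# Count unique customers and products, and inspect the date range.
print("Unique customers:", df["CustomerID"].nunique())
print("Unique products:", df["StockCode"].nunique())
print("Date range:", df["InvoiceDate"].min(), "to", df["InvoiceDate"].max())
```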
➢ STEP 2 : FEATURE ENGINEERING
RFM is a method of quantifying customers in a meaningful way and can serve as a good
baseline when it comes to performing any analytics on customer-specific transactional
data.
Recency, Frequency and Monetary value capture when the customer made their most
recent transaction, how often they have returned for business, and what the average sale
was for each customer. We can add to this any other available features (like
GrossMargin, Age or CostToRetain), other predicted features (like Lifetime Value), or the
output of Sentiment Analysis.
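A rough sketch of how these three features might be computed with pandas, reusing the hypothetical transactions frame and column names from the cleaning sketch above:

```python
import pandas as pd

def rfm_features(transactions: pd.DataFrame, reference_date: pd.Timestamp) -> pd.DataFrame:
    """Recency, Frequency and Monetary value per customer, computed from
    transactions that occurred strictly before `reference_date`."""
    observed = transactions[transactions["InvoiceDate"] < reference_date]
    return observed.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (reference_date - d.max()).days),
        Frequency=("InvoiceDate", "count"),
        Monetary=("TransactionValue", "mean"),
    )
```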
The way it works is that we can split the training data into an observed period and a
future period. If we want to predict how much a customer will spend in a year, we would
set the length of the future period as one year, and the rest would come under observed.
This allows us to fit a model to classify which customers engaged with the business in the
future period using features computed in the observed period. Here we introduce the
concept of the cut-off: this is simply where the observed period ends, and it defines the
date before which we calculate our features.
● Age: Time since first transaction. For this feature we simply find the number of
days since each customer's first transaction. Again, we need the cut-off to calculate the
time between the cut-off and the first transaction.
Ideally, this captures information about customer retention within a certain time
period; a sketch of how it fits together with the labels appears below.
For the labels we would just set 1 for those who bought something in the future period,
and 0 for everyone who didn’t.
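A sketch of how the cut-off, the Age feature and the labels could fit together, building on the hypothetical rfm_features helper above (the one-year future window is only an example, matching the spend-in-a-year scenario described earlier):

```python
def build_dataset(transactions, cutoff, future_days=365):
    """Features from the observed period (before `cutoff`), labels from the
    future period (`cutoff` up to `cutoff + future_days`). A sketch, not the
    report's exact implementation."""
    observed = transactions[transactions["InvoiceDate"] < cutoff]
    future_end = cutoff + pd.Timedelta(days=future_days)
    future = transactions[(transactions["InvoiceDate"] >= cutoff) &
                          (transactions["InvoiceDate"] < future_end)]

    features = rfm_features(transactions, cutoff)
    # Age: days between the cut-off and each customer's first transaction.
    first_purchase = observed.groupby("CustomerID")["InvoiceDate"].min()
    features["Age"] = (cutoff - first_purchase).dt.days

    # Label: 1 if the customer bought something in the future period, 0 otherwise.
    features["Label"] = features.index.isin(future["CustomerID"]).astype(int)
    return features
```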
Recursive RFM
Let us apply what we know of RFM so far and loop it through the dataset. Suppose the
data begins at the start of a year. We select a frequency (for example, one month) and
iterate through the dataset, computing our features from the observed period (o) and
generating our labels from the future period (f). The idea is to recursively compute these
features so that the model can learn how customers’ behavior changes over time.
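One way to sketch this recursive loop, using the hypothetical build_dataset helper above with a one-month step and a one-month future window (both illustrative choices, not the report's confirmed settings):

```python
# Slide a monthly cut-off across the date range and stack the resulting
# feature/label tables into a single training set. The first cut-off is
# placed a few months in, so every slice has some observed history.
start = transactions["InvoiceDate"].min() + pd.DateOffset(months=3)
end = transactions["InvoiceDate"].max() - pd.DateOffset(months=1)
cutoffs = pd.date_range(start, end, freq="MS")  # month-start cut-offs

dataset = pd.concat(
    [build_dataset(transactions, cutoff, future_days=30) for cutoff in cutoffs]
).reset_index()
```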
Now that we have generated our dataset, all that remains is to shuffle it and perform a
train/test split. We’ll use 80% for training and 20% for testing.
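A sketch of the split with scikit-learn (the column names and random_state are assumptions carried over from the sketches above):

```python
from sklearn.model_selection import train_test_split

# Shuffle and split the generated dataset: 80% training, 20% testing.
X = dataset.drop(columns=["CustomerID", "Label"])
y = dataset["Label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)
```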
Class Imbalance
In a classification task, sometimes the classes we want to predict are imbalanced in the
data set. For example, if there are 10 observations and two classes; 2 of them may be in
Class_0 and the other 8 are in Class_1. This could introduce bias into the model as it sees
significantly more of one class than the other. We define the minority class as the one
with fewer observations, and the majority class as the one with more observations. Our
dataset shows a similar imbalance, which we can address by oversampling the minority
class with SMOTE, as sketched below.
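Since SMOTE is the oversampling technique referred to later in this report, a minimal sketch with the imbalanced-learn library might look like this (applied to the hypothetical training split above):

```python
from imblearn.over_sampling import SMOTE

# Oversample the minority class in the training data only, so the test set
# keeps the original class distribution.
smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
print("Before resampling:", y_train.value_counts().to_dict())
print("After resampling:", y_train_res.value_counts().to_dict())
```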
➢ STEP 3 : MODEL
For this example we will try a Random Forest Classifier, as it is plug-and-play in its
implementation and very easy to try straight away.
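A sketch of fitting the classifier on both the imbalanced and the SMOTE-resampled training data, so that their classification reports can be compared on the same held-out test set (the hyperparameters are scikit-learn defaults, not the report's tuned values):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# One forest trained on the imbalanced data, one on the SMOTE-resampled data.
rf_imbalanced = RandomForestClassifier(n_estimators=100, random_state=42)
rf_imbalanced.fit(X_train, y_train)

rf_resampled = RandomForestClassifier(n_estimators=100, random_state=42)
rf_resampled.fit(X_train_res, y_train_res)

print(classification_report(y_test, rf_imbalanced.predict(X_test)))
print(classification_report(y_test, rf_resampled.predict(X_test)))
```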
OUTPUT :
➢ RESULTS
It is interesting that both datasets produced very similar results; in fact, the oversampled
data performed worse than the imbalanced data. Here we can look at the classification
report to see how precise the predictions actually were.
Classification report for predictions on imbalanced data
Now that our model has been trained, we can use the predict_proba() function to get the
probabilities associated with each prediction. Here is a plot of the predicted probability
distribution. Remember, the probability predicted by the model is how likely a customer
is to engage with the business, and we are looking for the probability that they won’t, so
we can simply subtract each probability from 1.
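A sketch of turning the model's engagement probabilities into churn risk and plotting their distribution (matplotlib and the variable names from the earlier sketches are assumptions):

```python
import matplotlib.pyplot as plt

# predict_proba gives the probability of engaging (class 1);
# churn risk is its complement.
p_engage = rf_imbalanced.predict_proba(X_test)[:, 1]
churn_risk = 1 - p_engage

plt.hist(churn_risk, bins=50)
plt.xlabel("Churn risk")
plt.ylabel("Number of customers")
plt.title("Distribution of predicted churn risk")
plt.show()
```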
Histogram of probability distribution of churn risk among customers
As expected, most customers are on either end of the spectrum. However, the most
meaningful and actionable insights are found between them. Customers lying below 0.5
are at a low risk of disengaging, so this plot indicates that most customers are
healthy. On the other hand, those with a churn risk of over 0.5 are more likely to
disengage, and paying attention to their preferences is imperative to retaining them.
➢ Conclusion
Feature engineering techniques like Recursive RFM allow for rich features to describe
customers. As seen here, these features can be useful to analyze their behaviors and
predict what they might do in the future. We also covered how to handle class imbalance
if necessary using SMOTE. Churn Risk is just one of these predictable metrics. Others
include Customer Lifetime Value and Customer Segmentation. What is special about
Churn Risk is that it can be taken a step further to identify the probability that
customers will do something more specific, like buy a specific category of product, or
their likelihood of engaging on each day of the week. The potential of customer analytics is
far-reaching and ever-insightful, especially for businesses.