Lead Score Case Study Presentation

The document describes a case study to build a logistic regression model to assign lead scores between 0-100 to leads for an education company. The goal is to identify the most promising leads, with a target conversion rate of 80%. The methodology involved data cleaning, exploratory analysis, feature selection using RFE, and building a logistic regression model. The model was evaluated on train and test sets and achieved a sensitivity of 81% on test data, meeting the target. Key indicators of leads likely and unlikely to convert were identified. A lead score threshold of 34 is recommended to identify hot leads.

Uploaded by

ashwin choudhary

LEAD SCORING CASE STUDY

Submitted by : Ashwin Chaudhari, Yash Dudure
Lead Score Case Study for X Education
Problem Statement :
An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on its website and browse for courses.
The company markets its courses on several websites and search engines like Google. Once these people land on the website, they might browse the courses, fill in a form for a course, or watch some videos. When they fill in a form providing their email address or phone number, they are classified as a lead. The company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X Education is around 30%.

Business Goal :
X Education wants to develop a model to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires a model that assigns a lead score to each lead, such that customers with a higher lead score have a higher chance of conversion and customers with a lower lead score have a lower chance of conversion. The CEO, in particular, has given a ballpark target lead conversion rate of around 80%.

Goals of the Case Study :

1. Build a logistic regression model to assign a lead score between 0 and 100 to each of the leads, which the company can use to target potential leads. A higher score means that the lead is hot, i.e. most likely to convert, whereas a lower score means that the lead is cold and will most likely not get converted.
2. The company has presented some further problems that the model should be able to adjust to if the company's requirements change in the future, so these need to be handled as well. These problems are provided in a separate doc file, to be filled in based on the logistic regression model from the first step. The answers should also be included in the final PPT along with the recommendations.
Problem solving methodology

STEP 1: Data Sourcing, Cleaning and Preparation
• Read the data from source
• Convert data into a clean format suitable for analysis
• Remove duplicate data
• Outlier treatment
• Exploratory data analysis
• Feature standardization

STEP 2: Feature Scaling and Splitting into Train and Test Sets
• Feature scaling of numeric data
• Splitting data into train and test sets

STEP 3: Model Building
• Feature selection using RFE
• Determine the optimal model using logistic regression
• Calculate metrics such as accuracy, sensitivity, specificity, precision and recall, and evaluate the model

STEP 4: Result
• Determine the lead score and check whether the final predictions amount to the target 80% conversion rate
• Evaluate the final predictions on the test set using the cut-off threshold obtained from the sensitivity and specificity metrics
Data Cleaning and Preparation
Strategy for Data Cleaning :

a. Check the percentage of null values in each column and drop the columns that have more than 45% missing values.

b. Some of the variables are created by the sales team once they contact the potential lead. These features are not available before the lead is contacted, so they cannot be used for model building. We will drop these columns too.

c. Some of the columns have only one category. These columns add no value to the model and can be deleted.

d. Some of the columns have "Select" as one of their values. These should be treated as null values, so the data values in these columns need to be updated.
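The cleaning steps a to d above can be sketched with pandas roughly as follows. This is a minimal illustration, not the authors' notebook: the function name `clean_leads`, the toy table and the choice of `Tags` as a sales-team column are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_leads(df: pd.DataFrame, null_threshold: float = 0.45,
                sales_columns: tuple = ()) -> pd.DataFrame:
    """Apply cleaning steps a-d to a raw leads table (hypothetical helper)."""
    out = df.copy()
    # Step d: "Select" means no option was chosen, so treat it as missing.
    out = out.replace("Select", np.nan)
    # Step b: drop columns that the sales team fills in after contacting the lead.
    out = out.drop(columns=[c for c in sales_columns if c in out.columns])
    # Step a: drop columns whose share of missing values exceeds the threshold.
    null_share = out.isna().mean()
    out = out.loc[:, null_share <= null_threshold]
    # Step c: drop columns that contain only one category.
    single_valued = [c for c in out.columns if out[c].nunique(dropna=True) <= 1]
    return out.drop(columns=single_valued)


# Toy data for illustration only; not the case-study data set.
raw = pd.DataFrame({
    "Lead Source": ["Google", "Olark Chat", "Google", "Direct Traffic"],
    "Specialization": ["Select", "Select", "Finance", "Select"],  # 75% missing after step d
    "Country": ["India", "India", "India", "India"],              # single category
    "Tags": ["Ringing", "Busy", "Ringing", "Lost"],               # assumed sales-team column
    "TotalVisits": [2, 5, 0, 3],
})
cleaned = clean_leads(raw, sales_columns=("Tags",))
print(list(cleaned.columns))
```

Replacing "Select" first matters here, because it lets the 45% rule count those cells as missing.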
Univariate Analysis - Categorical
Univariate Analysis - Numerical

Insight:

TotalVisits and Page Views Per Visit have some outliers, which need to be treated.
Bivariate Analysis - Numerical

Insight:

The data is not normally distributed.
Outliers Treatment

Insight:

Although the outliers in TotalVisits and Page Views Per Visit are valid values, they can distort the fitted model, cause misclassification and lead to wrong inferences. Logistic regression is heavily influenced by outliers, so we cap TotalVisits and Page Views Per Visit at their 95th percentile for the following reasons:
• The data set is fairly large.
• The 95th and 99th percentiles of these columns are very close, so capping at either one would have much the same impact.

Box Plot before handling outliers
Box Plot after handling outliers
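Capping at the 95th percentile could look like the sketch below. The helper name `cap_at_percentile` and the sample values are assumptions for illustration; the slides do not show the code that was actually used.

```python
import pandas as pd


def cap_at_percentile(series: pd.Series, upper: float = 0.95) -> pd.Series:
    """Cap values above the given upper quantile; lower values are left untouched."""
    limit = series.quantile(upper)
    return series.clip(upper=limit)


# Made-up visit counts with one extreme value; not the case-study data.
visits = pd.Series([0, 1, 2, 2, 3, 3, 4, 5, 6, 250], name="TotalVisits")
capped = cap_at_percentile(visits)
print(capped.max(), "<", visits.max())
```

Capping keeps every row in the data set, unlike dropping the outlying records.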
Correlation of variables with Target variable

Top 5 Positive correlated variables

Top 5 Negative correlated variables


Model Building

RFE for Feature Reduction

• So far, we have inspected, cleansed, pruned and visualized the data.
• We also standardized the continuous variables, one-hot encoded the categorical variables and divided the dataset into training and test sets.
• However, there is still a large number of variables, not all of which may be significant, and some of which may be highly multicollinear.
• RFE was used to select the top 20 relevant variables. Further variables were then removed manually depending on their VIF and p-values (variables with VIF < 5 and p-value < 0.05 were kept).
• The resulting dataset thus consists of features that are significant for the regression modelling.
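A rough sketch of the RFE and VIF steps follows, assuming scikit-learn is used for the model and RFE. The data here is synthetic, the `vif` helper is written by hand with NumPy for the example, and the loop automates what the slide describes as manual removal. The p-value check is not shown, since it needs a library that reports coefficient p-values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared leads data (not the real data set).
X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           n_redundant=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42, stratify=y)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# RFE repeatedly drops the weakest feature until 20 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=20)
rfe.fit(X_train_s, y_train)
selected = np.flatnonzero(rfe.support_)


def vif(features: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    n, k = features.shape
    result = np.empty(k)
    for j in range(k):
        target = features[:, j]
        others = np.column_stack([np.ones(n), np.delete(features, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r_squared = 1.0 - residual.var() / target.var()
        # Guard against division by zero for perfectly collinear columns.
        result[j] = 1.0 / max(1.0 - r_squared, 1e-12)
    return result


# Drop the feature with the highest VIF, one at a time, until all are below 5.
keep = list(selected)
while True:
    vifs = vif(X_train_s[:, keep])
    worst = int(np.argmax(vifs))
    if vifs[worst] < 5:
        break
    keep.pop(worst)

print(len(selected), "features after RFE,", len(keep), "after the VIF filter")
```

Removing one feature per pass matters because dropping a single collinear column usually lowers the VIF of the columns it was correlated with.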
Model Evaluation : Train Dataset
Figures: Confusion Matrix, ROC Curve, Optimal Cut-off, Model Performance

• The area under the ROC curve is 0.88, which indicates that the model discriminates well between converted and non-converted leads.
• From the optimal cut-off graph, 0.335 appears to be the ideal cut-off point.
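The slide does not state how the cut-off was read from the graph. One common approach, assumed here, is to compute accuracy, sensitivity and specificity over a grid of probability cut-offs and pick the point where sensitivity and specificity are closest. The function name and the synthetic probabilities below are illustrative only.

```python
import numpy as np


def metrics_at_cutoff(y_true: np.ndarray, prob: np.ndarray, cutoff: float) -> dict:
    """Accuracy, sensitivity and specificity when prob >= cutoff is predicted as converted."""
    pred = prob >= cutoff
    actual = y_true.astype(bool)
    tp = np.sum(pred & actual)
    tn = np.sum(~pred & ~actual)
    fp = np.sum(pred & ~actual)
    fn = np.sum(~pred & actual)
    return {
        "cutoff": float(cutoff),
        "accuracy": float((tp + tn) / len(actual)),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
    }


# Synthetic labels and predicted probabilities; not the case-study data.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=2000)
prob = np.clip(0.35 + 0.3 * (y_true - 0.5) + rng.normal(0, 0.2, size=2000), 0, 1)

table = [metrics_at_cutoff(y_true, prob, c) for c in np.arange(0.05, 1.0, 0.05)]
# Assumed selection rule: the cut-off where sensitivity and specificity are closest.
best = min(table, key=lambda row: abs(row["sensitivity"] - row["specificity"]))
print(best)
```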
Precision - Recall Trade off
Figures: Confusion Matrix, Precision-Recall Curve, Model Performance

Based on the precision-recall trade-off curve, the cut-off point seems to be 0.404. We will use this threshold value for the test data evaluation.
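A precision-recall trade-off cut-off is often taken at the point where the two curves cross. The sketch below assumes that reading. The data is synthetic, with roughly 30% positives to mirror the typical conversion rate quoted earlier, and precision is set to 1.0 by convention when no lead is predicted as converted.

```python
import numpy as np


def precision_recall(y_true: np.ndarray, prob: np.ndarray, cutoff: float) -> tuple:
    """Precision and recall when prob >= cutoff is predicted as converted."""
    pred = prob >= cutoff
    actual = y_true.astype(bool)
    tp = np.sum(pred & actual)
    predicted_pos = np.sum(pred)
    # Convention: precision is 1.0 when nothing is predicted positive.
    precision = tp / predicted_pos if predicted_pos else 1.0
    recall = tp / np.sum(actual)
    return float(precision), float(recall)


# Synthetic labels and probabilities; not the case-study data.
rng = np.random.default_rng(1)
y_true = (rng.random(2000) < 0.3).astype(int)
prob = np.clip(0.25 + 0.4 * y_true + rng.normal(0, 0.2, size=2000), 0, 1)

cutoffs = np.linspace(0.01, 0.99, 99)
gaps = [abs(p - r) for p, r in (precision_recall(y_true, prob, c) for c in cutoffs)]
# Assumed selection rule: the cut-off where precision and recall are closest.
crossover = float(cutoffs[int(np.argmin(gaps))])
print(round(crossover, 2), precision_recall(y_true, prob, crossover))
```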
Model Evaluation : Test Dataset
Figures: Confusion Matrix, ROC Curve, Model Performance

The area under the ROC curve is 0.88, which indicates that the model discriminates well between converted and non-converted leads.

The sensitivity on the test data is 81.83%, versus 80.29% on the train data, and the accuracy is 80.7%. This shows that the model also performs well on the test data set and is not over-trained.
Inferences and Recommendations

Major indicators that a lead will get converted to a hot lead:

1. Lead Origin_Lead Add Form : A lead sourced from the Lead Add Form is more likely to get converted.
2. Occupation_Working Professional : Working professionals are more likely to get converted.
3. Lead_Source_Welingak website : A lead sourced from the Welingak website is more likely to get converted.
4. Last Activity_SMS Sent : A lead to whom an SMS has previously been sent is more likely to get converted.
5. Lead Source_Olark Chat : A lead sourced from Olark Chat is more likely to get converted.

Major indicators that a lead will NOT get converted to a hot lead:
1. Last_Activity_Olark chat conversation : Customers who had an Olark chat conversation are less likely to get converted into hot leads.
2. Lead Origin_Landing Page Submission : Customers whose lead origin is Landing Page Submission are less likely to get converted into hot leads.
3. Do Not Email : Customers who choose Do Not Email are less likely to get converted into hot leads.

Recommendations:
The company should use a lead score threshold of 34 to identify "Hot Leads". At this threshold, the sensitivity of the model is around 81%, which is as good as the CEO's target of 80%.
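The slides do not spell out how the 0 to 100 lead score is derived. A natural reading, assumed here, is that the score is the predicted conversion probability multiplied by 100, so that a threshold of 34 corresponds to the 0.335 cut-off rounded up. The function names and sample probabilities below are illustrative.

```python
import numpy as np


def lead_scores(prob: np.ndarray) -> np.ndarray:
    """Map conversion probabilities in [0, 1] to integer lead scores in [0, 100]."""
    return np.rint(np.asarray(prob) * 100).astype(int)


def hot_leads(prob: np.ndarray, threshold: int = 34) -> np.ndarray:
    """Boolean mask of leads whose score meets the recommended threshold."""
    return lead_scores(prob) >= threshold


# Made-up predicted probabilities for five leads.
prob = np.array([0.05, 0.33, 0.34, 0.62, 0.81])
print(lead_scores(prob))  # [ 5 33 34 62 81]
print(hot_leads(prob))    # [False False  True  True  True]
```

Keeping the threshold as a parameter lets the sales team raise or lower it if the company's requirements change, without refitting the model.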
