Lead Score Case Study Presentation
Business Goal:
X Education wants to develop a model to select the most promising leads, i.e. the leads that are most likely to convert into paying customers.
The company requires you to build a model that assigns a lead score to each lead, such that customers with a higher
lead score have a higher conversion chance and customers with a lower lead score have a lower conversion chance. The CEO, in particular,
has given a ballpark figure of around 80% for the target lead conversion rate.
1. Build a logistic regression model to assign a lead score between 0 and 100 to each lead, which the company can use to target
potential leads. A higher score means that the lead is hot, i.e. most likely to convert, whereas a lower score means that the lead
is cold and will most likely not convert.
2. The company has presented some further problems that your model should be able to adjust to if the company's requirements
change in the future, so you will need to handle these as well. These problems are provided in a separate doc file. Please fill it in based on the
logistic regression model you built in the first step. Also, make sure you include this in your final PPT, where you'll make recommendations.
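As a minimal sketch of how a 0 to 100 lead score could be derived from a logistic regression model: the predicted conversion probability is multiplied by 100 and rounded. The coefficients, intercept, and feature values below are hypothetical and are not taken from the case study.

```python
import math


def predicted_probability(features, coefficients, intercept):
    """Logistic regression probability: sigmoid of the linear combination."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def lead_score(probability):
    """Map a conversion probability in [0, 1] to an integer score in [0, 100]."""
    return round(probability * 100)


# Hypothetical model parameters and one hypothetical lead, for illustration only.
coefficients = [1.2, -0.8]
intercept = -0.5
lead = [1.0, 0.5]

probability = predicted_probability(lead, coefficients, intercept)
score = lead_score(probability)
```

A lead with a higher predicted probability receives a higher score, so sorting leads by score gives the order in which the sales team might contact them.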
Problem-solving methodology
Step 1: Data Sourcing, Cleaning and Preparation
• Read the data from source
• Convert data into a clean format suitable for analysis
• Remove duplicate data
• Outlier treatment
• Exploratory data analysis
• Feature standardization
Step 2: Feature Scaling and Splitting into Train and Test Sets
• Feature scaling of numeric data
• Splitting data into train and test sets
Step 3: Model Building
• Feature selection using RFE
• Determine the optimal model using logistic regression
• Calculate various metrics like accuracy, sensitivity, specificity, precision and recall, and evaluate the model
Step 4: Result
• Determine the lead score and check whether the final predictions amount to the 80% target conversion rate
• Evaluate the final prediction on the test set using the cut-off threshold from the sensitivity and specificity metrics
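The evaluation metrics named in the methodology can all be derived from the four cells of a confusion matrix. The sketch below shows one way to compute them in plain Python. The label lists are hypothetical, and the function names are illustrative rather than taken from the case study.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels, where 1 means converted."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def safe_divide(numerator, denominator):
    """Return 0.0 when the denominator is zero (a convention chosen here)."""
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": safe_divide(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_divide(tp, tp + fn),  # same quantity as recall
        "specificity": safe_divide(tn, tn + fp),
        "precision": safe_divide(tp, tp + fp),
    }


# Hypothetical labels, for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = evaluation_metrics(actual, predicted)
```

Sensitivity and recall are the same quantity, so only one of them is stored in the result.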
Data Cleaning and Preparation
Strategy for Data Cleaning:
Model Performance
The area under the ROC curve is 0.88, which indicates that the model is good.
From the above graph, 0.335 seems to be the ideal cut-off point.
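The slides do not state exactly how the cut-off was read from the graph. One common approach, assumed here, is to scan candidate cut-offs and pick the one where sensitivity and specificity are closest to each other. The sketch below uses hypothetical labels and probabilities, so it does not reproduce the 0.335 value.

```python
def sensitivity_specificity(actual, probabilities, cutoff):
    """Classify each lead as converted when its probability >= cutoff."""
    predicted = [1 if p >= cutoff else 0 for p in probabilities]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, y in pairs if a == 1 and y == 1)
    tn = sum(1 for a, y in pairs if a == 0 and y == 0)
    fp = sum(1 for a, y in pairs if a == 0 and y == 1)
    fn = sum(1 for a, y in pairs if a == 1 and y == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def balanced_cutoff(actual, probabilities, candidates):
    """Return the candidate cut-off with the smallest |sensitivity - specificity|."""
    def gap(cutoff):
        sensitivity, specificity = sensitivity_specificity(actual, probabilities, cutoff)
        return abs(sensitivity - specificity)
    return min(candidates, key=gap)


# Hypothetical data, for illustration only.
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.45, 0.2, 0.6, 0.4, 0.3, 0.15, 0.1]
candidates = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95

cutoff = balanced_cutoff(actual, probabilities, candidates)
sensitivity, specificity = sensitivity_specificity(actual, probabilities, cutoff)
```

Lowering the cut-off raises sensitivity at the cost of specificity, which is why the balance point is a natural default when neither error type is clearly more expensive.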
Precision-Recall Trade-off
Figures: Confusion Matrix, Precision-Recall Curve
Based on the precision-recall trade-off curve, the cut-off point seems to be 0.404. We will use this threshold value for test data evaluation.
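A precision-recall cut-off of this kind is often taken at the point where the two curves cross. The sketch below assumes that approach and uses hypothetical data, so it does not reproduce the 0.404 value. It also checks precision against the 80% figure, on the assumption (not stated in the slides) that the target conversion rate refers to the share of targeted leads that actually convert.

```python
def precision_recall(actual, probabilities, cutoff):
    """Return (precision, recall); either is None when it is undefined."""
    predicted = [1 if p >= cutoff else 0 for p in probabilities]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, y in pairs if a == 1 and y == 1)
    fp = sum(1 for a, y in pairs if a == 0 and y == 1)
    fn = sum(1 for a, y in pairs if a == 1 and y == 0)
    precision = tp / (tp + fp) if (tp + fp) else None  # no predicted positives
    recall = tp / (tp + fn) if (tp + fn) else None  # no actual positives
    return precision, recall


def crossover_cutoff(actual, probabilities, candidates):
    """Return the candidate cut-off where precision and recall are closest."""
    best_cutoff, best_gap = None, None
    for cutoff in candidates:
        precision, recall = precision_recall(actual, probabilities, cutoff)
        if precision is None or recall is None:
            continue  # skip cut-offs where a metric is undefined
        gap = abs(precision - recall)
        if best_gap is None or gap < best_gap:
            best_cutoff, best_gap = cutoff, gap
    return best_cutoff


# Hypothetical data, for illustration only.
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.45, 0.2, 0.6, 0.4, 0.3, 0.15, 0.1]
candidates = [i / 100 for i in range(5, 100, 5)]

cutoff = crossover_cutoff(actual, probabilities, candidates)
precision, recall = precision_recall(actual, probabilities, cutoff)
meets_target = precision >= 0.80  # assumed reading of the 80% target
```

Once chosen on the training data, the same cut-off would be applied unchanged to the test set predictions.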
Model Performance
Model Evaluation: Test Dataset
Figures: Confusion Matrix, ROC Curve