Big Data Medicare Fraud Detection - Finance - Project
Detection
WHY ?
01 Database Selection
● Payment Data 2017
● Excluded (LEIE) dataset
02 Data Pre-processing
● Data cleaning
● Feature engineering
● Class weight balancing
03 Data Modelling
● Logistic Regression
● Gaussian Naïve Bayes
● Random Forest Classifier
● Extra Tree Classifier
● Gradient Boosting Classifier
04 End Result
● Conclusion
● Future scope
Problem
Build an innovative machine learning model that predicts fraud in the Medicare industry using
Insights
● Tableau
● Power BI
Data Modelling
Train-Test-Split
Models Implemented:
• Logistic Regression
• Gaussian Naïve Bayes
• Random Forest Classifier
• Extra Tree Classifier
• Gradient Boosting Classifier
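The split-and-fit step for the models listed above could look roughly like the following scikit-learn sketch. It is an illustration under stated assumptions, not the project's actual code: the data is synthetic (`make_classification` stands in for the provider feature table), the 75/25 split and all hyperparameters are placeholders, and "Extra Tree Classifier" is assumed to mean the ensemble `ExtraTreesClassifier`. Class weight balancing is applied through `class_weight="balanced"` where the estimator supports it and through balanced sample weights otherwise.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic stand-in for the provider-level feature table (assumption):
# roughly 5% of rows carry the positive ("fraud") label.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# Stratify so the rare positive class keeps the same share in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "Gaussian Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(class_weight="balanced", random_state=0),
    "Extra Trees": ExtraTreesClassifier(class_weight="balanced", random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# GaussianNB and GradientBoostingClassifier take no class_weight argument,
# so the balancing is passed to them as per-row sample weights instead.
balanced_weights = compute_sample_weight("balanced", y_train)

for name, model in models.items():
    if name in ("Gaussian Naive Bayes", "Gradient Boosting"):
        model.fit(X_train, y_train, sample_weight=balanced_weights)
    else:
        model.fit(X_train, y_train)
```

Stratifying the split matters for fraud data because positives are rare; an unstratified split can leave the test set with too few fraud cases to evaluate on.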
Model Evaluation
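The models are compared by AUC, the area under the ROC curve. As a hedged illustration of what that number measures, the sketch below computes AUC directly from its pairwise definition: the probability that a randomly chosen fraud case receives a higher score than a randomly chosen non-fraud case. The function name `roc_auc` and the toy labels and scores are hypothetical; in practice `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and scores, not project data).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)  # 3 of the 4 positive/negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which a reported AUC should be read.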
Conclusion
● With the over-65 population in the USA growing, Medicare fraud detection is essential.
● All types of fraud patterns have been covered.
● Most fraud cases were committed in the Bay Area.
● Of the 5 models evaluated, the best-performing model is Random Forest, with an AUC of 72%.
Future Scope
References
● Part D Prescriber Data CY 2017. (n.d.). Retrieved June 23, 2020, from https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/PartD2017
● LEIE Downloadable Databases: Office of Inspector General: U.S. Department of Health and Human Services. (2020, June 10). Retrieved June 23, 2020, from https://oig.hhs.gov/exclusions/exclusions_list.asp
● Dataset Downloads. (n.d.). Retrieved June 23, 2020, from https://www.cms.gov/OpenPayments/Explore-the-Data/Dataset-Downloads
Thank you