Big Data Medicare Fraud Detection - Finance - Project

Big data can help detect Medicare fraud through machine learning models. US healthcare spending has grown to $3 trillion annually, Medicare accounts for up to $800 billion of it, and fraud is estimated to cost up to 10% of expenditures. The document describes building a model that predicts fraud using physician prescription data, payments received by physicians, and a list of providers excluded for fraud. Key steps include selecting the CMS prescriber, payments, and exclusion (LEIE) datasets; cleaning the data; engineering features such as joins on provider identifiers and mapping of drug-based fraud cases; and balancing classes with class weights before training several classifiers. The best model, a random forest, achieved an AUC of 0.72. Future work includes cross-validation, hyperparameter tuning, and a real-time fraud detection pipeline.

Uploaded by santhosh appu

Big Data Medicare Fraud Detection
Atharva Kousadikar
Why?
● US healthcare spending has increased by 6.7%, making it $3 trillion.
● Medicare accounts for up to $800 bn.
● Fraud impact is estimated at up to 10%.
Workflow
01 Database Selection
● CMS Prescriber Data 2017
● Payment Data 2017
● Excluded (LEIE) dataset
● Data Visualization/Exploratory Data Analysis
02 Data Pre-processing
● Data cleaning
● Feature Engineering
● Class weights balancing
03 Data Modelling
● Logistic Regression
● Gaussian Naïve Bayes
● Random Forest Classifier
● Extra Trees Classifier
● Gradient Boosting Classifier
04 End Result
● Conclusion
● Future scope
Problem Statement
Build an innovative machine learning model that predicts fraud in the Medicare industry using anomaly analysis and geo-demographic metrics.
Fraud Patterns
1. Fraud by service providers (doctors, hospitals, pharmacies)
2. Fraud by insurance subscribers (patients or patients’ employers)
3. Fraud by insurance carriers
4. Conspiracy fraud (involving all parties)

Government Efforts
The government has launched programs such as the Medicare Fraud Strike Force to help combat fraud, but continued efforts are needed to better mitigate its effects.
Tools Used:
1. Tableau
2. Power BI
3. Spark using Azure HDInsight

Insights
(Charts: Population by state; NPI per state; Exclusion count; Number of frauds by state)
Dataset Selection
01 CMS – Prescriber Data 2017
● 25M+ rows and 21 columns
● All information related to prescriptions, drugs, payments, and charges by National Provider Identifier (NPI)
● All information on the physician (NPI, name, city, practice, etc.)
02 Payments Received by Physicians 2017
● 11M+ rows and 75 columns
● Pharmaceutical companies are required to report all payments they make to US physicians
● The sum of general payments
● Name of the drug associated with the payments
03 List of Excluded Individuals and Entities (LEIE) database 2017
● List of individuals and entities that are excluded from participating in federally funded healthcare programs (e.g. Medicare) due to previous healthcare fraud
● Mapped fraud labels
Data Pre-Processing
Data cleaning
● Impute missing data
● Remove duplicates
● Remove outliers
● Factorize the categorical data
● Remove data based on general information
● Data sampling: the dataset is highly imbalanced for fraud detection (about 99% non-fraudulent cases and less than 1% fraudulent cases)
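The cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: the field names (`npi`, `specialty`, `total_claim_count`) are hypothetical, median imputation is an assumed choice (the slide does not say which imputation was used), and outlier removal is omitted.

```python
from statistics import median

def clean_rows(rows, numeric_field, categorical_field):
    """Drop exact duplicates, impute missing numeric values with the median,
    and factorize a categorical column into integer codes.
    Field names passed in are hypothetical, not CMS column names."""
    # Remove exact duplicate records, keeping the first occurrence (copies, so input is untouched).
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    # Impute missing numeric values with the median of the observed values (assumed strategy).
    observed = [r[numeric_field] for r in unique if r[numeric_field] is not None]
    fill = median(observed)
    for r in unique:
        if r[numeric_field] is None:
            r[numeric_field] = fill
    # Factorize: map each distinct category to an integer code in first-seen order.
    codes = {}
    for r in unique:
        r[categorical_field] = codes.setdefault(r[categorical_field], len(codes))
    return unique, codes
```

For example, two identical Cardiology rows collapse into one, and a missing claim count between observed values of 10 and 30 is filled with 20.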
Feature Engineering
● Joining datasets based on NPI, state, city, and first and last name
● Drug-based fraudulent cases: merging drug-level fraud cases with prescriber data to create more features
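A minimal sketch of the joins, assuming records are dicts with hypothetical field names (`npi`, `state`, `city`, `first_name`, `last_name`, `amount`). The exclusion list is matched on the composite key listed above to produce a fraud label, and payments are summed per NPI. Normalizing case and whitespace before matching is an assumption, since names and cities are often written inconsistently across sources. The drug-based merge is not shown.

```python
KEY_FIELDS = ("npi", "state", "city", "first_name", "last_name")  # hypothetical field names

def normalize(value):
    """Case- and whitespace-insensitive form of one join-key component."""
    return str(value).strip().upper()

def join_key(row):
    # Composite key: NPI, state, city, first and last name.
    return tuple(normalize(row[f]) for f in KEY_FIELDS)

def label_prescribers(prescribers, excluded, payments):
    """Attach an is_fraud label from the exclusion list and total payments per NPI."""
    excluded_keys = {join_key(r) for r in excluded}
    # A physician can receive many payments, so aggregate before joining.
    paid = {}
    for p in payments:
        paid[p["npi"]] = paid.get(p["npi"], 0.0) + p["amount"]
    out = []
    for r in prescribers:
        row = dict(r)
        row["is_fraud"] = int(join_key(r) in excluded_keys)
        row["total_payments"] = paid.get(r["npi"], 0.0)
        out.append(row)
    return out
```

Aggregating payments first keeps the join one-to-one, so a prescriber with many payment records is not duplicated in the training data.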
Transforming Data and Class Balancing
● Transform skewed data to approximately conform to normality using a log transformation
● Assign class weights according to the balancing ratio to compensate for the skewed class distribution
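The log transformation and class weighting can be illustrated as follows. The weight formula n_samples / (n_classes × count_c) is the one scikit-learn applies for `class_weight="balanced"`; whether the project used exactly this ratio is an assumption. With 99% non-fraud and 1% fraud, the rare class ends up with 99 times the weight of the common one.

```python
import math
from collections import Counter

def log_transform(values):
    """Apply log(1 + x) to compress long right tails.
    Requires x > -1, so it suits non-negative counts and dollar amounts."""
    return [math.log1p(v) for v in values]

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), so every class
    contributes the same total weight during training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

Using `log1p` rather than `log` keeps zero-valued counts and payments finite.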
Data Modelling
● Train-test split
● Scaling data using a standard scaler (StandardScaler)
Models implemented:
• Logistic Regression
• Gaussian Naïve Bayes
• Random Forest Classifier
• ExtraTrees Classifier
• Gradient Boosting Classifier
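A sketch of the split and scaling steps, written in plain Python so the mechanics are visible. Stratifying the split by class is an assumption (the slide only says train-test split), but with under 1% fraud a plain random split can leave very few fraud cases in the test set. The scaler is fitted on the training data only and then applied to both sets; it uses the population standard deviation, as scikit-learn's `StandardScaler` does.

```python
import random
import statistics

def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class in the same
    proportion so the rare fraud class appears in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def fit_standard_scaler(column):
    """Learn mean and standard deviation on training data only,
    so no test-set statistics leak into the model."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column) or 1.0  # guard against constant columns
    return lambda xs: [(x - mu) / sigma for x in xs]
```

In scikit-learn the equivalent calls would be `train_test_split(..., stratify=y)` and `StandardScaler().fit(X_train)`.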

Model Evaluation
Conclusion

● With the growing population aged over 65 in the USA, Medicare fraud detection is essential.
● All types of fraud patterns have been covered.
● Most fraud cases were committed in the Bay Area.
● Of the 5 models trained, the best-performing model is Random Forest, with an AUC of 0.72 (72%).
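The AUC reported above can be read as a probability: the chance that the model scores a randomly chosen fraud case higher than a randomly chosen non-fraud case, with 0.5 meaning no better than chance. The sketch below computes it directly from that definition, counting ties as half. It is quadratic in the number of examples, so it only illustrates the metric; on millions of rows a sort-based implementation such as scikit-learn's `roc_auc_score` is the practical choice.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative),
    with ties counted as 1/2. Labels are 1 (fraud) or 0 (non-fraud)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75.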
Future Scope

• Use cross-validation instead of a single train-test split when sampling the data.
• Tune hyperparameters to improve the overall performance of the algorithm.
• Build a real-time fraud detection pipeline using MLflow and Kafka.
• Retrain the model without stopping the prediction service, since users will keep interacting with it.
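For the first two future-scope items, a sketch of stratified k-fold cross-validation and an exhaustive grid search. Both functions are hypothetical helpers written for illustration; `score_fn` stands in for whatever the caller uses to evaluate a parameter setting, for example the mean validation AUC across the folds. Stratification keeps roughly the same fraud ratio in every fold, which matters with under 1% positives.

```python
import random
from itertools import product

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs in which each fold keeps roughly
    the overall class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal this class's indices round-robin across the folds, continuing
        # where the previous class stopped so fold sizes stay balanced.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val

def grid_search(grid, score_fn):
    """Evaluate every combination in `grid` (name -> list of values) and
    return the best parameter dict together with its score."""
    names = list(grid)
    best, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score
```

scikit-learn provides the same functionality as `StratifiedKFold` and `GridSearchCV`.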
Kafka and ZooKeeper servers initialized using Docker

Random Forest model hosted using MLflow
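The requirement to retrain without stopping the prediction service can be illustrated with an in-process model holder that swaps in a retrained model atomically. This is a hypothetical sketch that uses neither Kafka nor MLflow: the "model" is any callable that returns a score, and in the planned pipeline a Kafka consumer loop would presumably call `predict` while a new model version loaded from MLflow is passed to `swap`.

```python
import threading

class HotSwappableModel:
    """Serve predictions while a retrained model is swapped in atomically.
    `model` is any callable taking a feature list and returning a score."""

    def __init__(self, model, version=1):
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def predict(self, features):
        # Read the model and its version together under the lock, then score
        # outside it, so a slow prediction never blocks a swap.
        with self._lock:
            model, version = self._model, self._version
        return model(features), version

    def swap(self, new_model):
        # Retraining happens elsewhere; only the reference update is locked.
        with self._lock:
            self._model = new_model
            self._version += 1
            return self._version
```

Returning the version with each score lets downstream consumers record which model produced a given fraud prediction.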

References

● Part D Prescriber Data CY 2017. (n.d.). Retrieved June 23, 2020, from https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/PartD2017
● LEIE Downloadable Databases. Office of Inspector General, U.S. Department of Health and Human Services. (2020, June 10). Retrieved June 23, 2020, from https://oig.hhs.gov/exclusions/exclusions_list.asp
● Dataset Downloads. (n.d.). Retrieved June 23, 2020, from https://www.cms.gov/OpenPayments/Explore-the-Data/Dataset-Downloads
Thank you
