Medical Insurance Cost Prediction
PRESENTED BY
B182006 (M.Praveen)
B182096 (A.Prathyusha)
Dept of ECE.
Table of Contents
Abstract
Work Flow
Proposed system
Attribute information
Implementation
Conclusion
Abstract
Health insurance companies have a tough task in determining premiums for their
customers. While health care law in the United States does set some rules for
companies to follow when determining premiums, it is really up to the companies
to decide which factors they weight most heavily.
Using Linear Regression (a machine learning technique), we try to determine the most
significant factors (independent variables) used by an insurance company.
Work Flow
PROPOSED SYSTEM
The working of the system starts with the collection of the data and selecting the
important attributes.
Then the required data is pre-processed into the required format.
The data is then divided into two parts: training data and testing data.
The algorithms are applied and the model is trained using the training data.
The accuracy of the system is obtained by testing it on the testing data.
This system is implemented using the following modules:
1. Collection of data set
2. Selection of attributes
3. Data pre-processing
4. Balancing of data
5. Insurance cost prediction
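The step of dividing the data into training and testing parts can be sketched as below. This is a minimal illustration using only the Python standard library; the function name train_test_split, the 80/20 split, and the fixed seed are assumptions for the example, not details taken from the project.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten insurance records
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), len(test))
```

Shuffling before splitting avoids a biased test set when the records are stored in some sorted order.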
DATA SET INFORMATION
ATTRIBUTE INFORMATION:
Age
Sex
BMI
Number of Children
Smoker
Region
Charges
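Since Sex, Smoker and Region are categorical, they have to be turned into numbers before a regression model can use them. The sketch below shows one possible encoding. The dictionary keys, the category spellings ("male", "yes") and the four region names are assumptions made for illustration, and the sample record contains made-up values.

```python
# Assumed region categories; the real data set may use different names.
REGIONS = ["northeast", "northwest", "southeast", "southwest"]


def encode_record(record):
    """Turn one record into (numeric feature list, target charges)."""
    features = [
        float(record["age"]),
        1.0 if record["sex"] == "male" else 0.0,    # binary flag
        float(record["bmi"]),
        float(record["children"]),
        1.0 if record["smoker"] == "yes" else 0.0,  # binary flag
    ]
    # One-hot encode the region: exactly one of the four slots is 1.0.
    features += [1.0 if record["region"] == r else 0.0 for r in REGIONS]
    return features, float(record["charges"])


# Illustrative record with made-up values (not taken from the data set).
sample = {"age": 40, "sex": "male", "bmi": 25.0, "children": 2,
          "smoker": "no", "region": "northwest", "charges": 5000.0}
x, y = encode_record(sample)
print(x, y)
```

Charges is kept separate because it is the value to be predicted; the other six attributes are the inputs.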
Implementation
Preparing and pre-processing the data, as this is required for good accuracy
Choosing the model that gives the best accuracy with minimal error and reduced
overfitting
Decision Tree
Random Forest
Linear Regression Equation:
Y = A + B*X
where,
X : input variable (Training data)
B : coefficient of X
A : Intercept
Y : Predicted value
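For a single input variable, A and B in the equation above can be estimated by ordinary least squares. The sketch below is a minimal single-variable illustration; the function name fit_line and the toy numbers are assumptions, and the project itself uses several input attributes rather than one.

```python
def fit_line(xs, ys):
    """Least-squares estimates of A (intercept) and B (slope) in Y = A + B*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # B: coefficient of X
    a = mean_y - b * mean_x  # A: intercept
    return a, b


# Toy data generated from Y = 1 + 2*X, so the fit should recover A=1, B=2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(a, b)
```

A larger magnitude of B means the predicted value changes more per unit change in X, which is how the coefficients point to the more significant factors once the inputs are on comparable scales.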
Decision Tree
Random Forest creates decision trees on randomly selected data samples, gets a
prediction from each tree, and combines them into a final prediction: by voting
for classification, or by averaging for regression tasks such as this one.
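The idea of training trees on random samples and combining their predictions can be sketched with a toy model. This is a simplification and an assumption of this example, not the project's implementation: each "tree" is a single-split stump on one feature, whereas a real Random Forest grows deeper trees and also samples features at random. All names and numbers are illustrative.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree: (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # candidate split points
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is the same: predict the overall mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Regression: average the predictions of all trees."""
    preds = [predict_stump(stump, x) for stump in forest]
    return sum(preds) / len(preds)


# Made-up data: charges jump from 100 to 300 for ages above 35.
ages = [20, 25, 30, 35, 40, 45, 50, 55]
charges = [100.0, 100.0, 100.0, 100.0, 300.0, 300.0, 300.0, 300.0]
forest = fit_forest(ages, charges)
low = predict_forest(forest, 20)
high = predict_forest(forest, 55)
print(low, high)
```

Averaging many trees trained on different samples smooths out the errors of any single tree, which is why the ensemble tends to overfit less than one decision tree.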
Performance Evaluation
In this project, three regression models are evaluated on individual medical
insurance data. The data was used to develop the three regression models, and
the predicted premiums from these models were compared with the actual premiums
to compare their accuracy. The Random Forest regression model was found to be
the best performing model.
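Comparing predicted premiums with actual premiums needs a numeric score. The slides do not say which metric was used, so the sketch below assumes two common choices for regression, root mean squared error (RMSE) and the R-squared score, applied to made-up numbers.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(actual, predicted):
    """R-squared: share of the variance in actual values that is explained."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up premiums, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
error = rmse(actual, predicted)
score = r2_score(actual, predicted)
print(error, score)
```

A lower RMSE and an R-squared closer to 1 both indicate a better fit, so the same two numbers can be computed for each of the three models on the testing data and compared side by side.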
This can help not only individuals but also insurance companies to work in
tandem toward better, more health-centric insurance amounts.
THANK YOU