Mini - Project - Report Health Insurance Price Prediction

This document presents a project report on predicting health insurance prices using machine learning algorithms. It aims to develop a system to predict medical prices to help patients choose cost-effective healthcare providers and curb health spending. The project will implement Random Forest Regression on a health insurance cost dataset to predict prices and compare results to other models like Gradient Boosted Trees and Linear Regression. If successful, it could help patients select insurance plans with appropriate costs and help policymakers understand pricing trends.

Uploaded by

Bhavesh Sonje

HEALTH INSURANCE PRICE PREDICTION

Submitted in partial fulfillment of the requirements

Of the degree of

Bachelor of Engineering

Information Technology

By-

NAMAH RAKESH KOHLI – 220A3067

NISHIKA HARESH PATEL – 220A3069

SUFIYAN FAROOQUE DALVI – 220A3072

Under the Guidance of:

Dr. K. LAKSHMI SUDHA

Department of Information Technology

SIES Graduate School of Technology

2022-2023

CERTIFICATE
This is to certify that the Mini project entitled HEALTH INSURANCE PRICE
PREDICTION is a bonafide work of the following students, submitted to the University of
Mumbai in partial fulfillment of the requirement for the award of the degree of Bachelor of
Engineering in Information Technology.

NAMAH RAKESH KOHLI – 220A3067

NISHIKA HARESH PATEL – 220A3069

SUFIYAN FAROOQUE DALVI – 220A3072

PROJECT REPORT APPROVAL

This Mini project report entitled HEALTH INSURANCE PRICE PREDICTION by the
following students is approved for the degree of Bachelor of Engineering in Information
Technology.

NAMAH RAKESH KOHLI – 220A3067

NISHIKA HARESH PATEL – 220A3069

SUFIYAN FAROOQUE DALVI – 220A3072

Name of External Examiner: --------------------------------

Signature:--------------------------------

Name of Internal Examiner: --------------------------------

Signature:--------------------------------

Date: 26/04/2022

Place: SIES GST, NERUL

DECLARATION

We declare that this written submission represents our ideas in our own words and where
others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source
in our submission. We understand that any violation of the above will be cause for disciplinary
action by the Institute and can also evoke penal action from the sources which have thus not
been properly cited or from whom proper permission has not been taken when needed.

NAMAH RAKESH KOHLI – 220A3067

NISHIKA HARESH PATEL – 220A3069

SUFIYAN FAROOQUE DALVI – 220A3072

Date: 26/04/2022

ACKNOWLEDGEMENT

We wish to express our deep sense of gratitude and thanks to our Internal Guide, Dr. K.
Lakshmi Sudha, for her guidance, help and useful suggestions, which helped us complete
our project work in time. We also thank the Head of Department, Dr. K. Lakshmi Sudha, for her
support in completing the project. It gives us immense pleasure to thank Dr. Atul Kemker,
Principal, for extending his support to carry out this project.

We would also like to thank the entire faculty of the Information Technology Department for
their valuable ideas and timely assistance in this project. Last but not least, we would like
to thank the non-teaching staff members of our college for their support in facilitating the
timely completion of this project.

NAMAH RAKESH KOHLI – 220A3067

NISHIKA HARESH PATEL – 220A3069

SUFIYAN FAROOQUE DALVI – 220A3072

ABSTRACT

Predicting medical insurance costs using ML approaches is still a problem in the healthcare
industry that requires investigation and improvement. Using a series of machine learning
algorithms, this project provides a computational intelligence approach for predicting health
insurance costs. The proposed research approach uses Linear Regression, Support Vector
Regression, Random Forest Regressor and Gradient Boosting Regressor.

A health insurance cost dataset is acquired from the Kaggle repository for this purpose,
and machine learning methods are used to show how different regression models can
forecast insurance costs and to compare the models’ accuracy. In this work, we develop
a medical price prediction system using machine learning algorithms, which will aid in
steering patients to cost-effective providers and thereby curb health spending.
Policymakers can also use the tool to better understand which providers are relatively
expensive and take punitive action if necessary. The medical price is predicted
by implementing the Random Forest Regression algorithm.
Additionally, we plan to run experiments on the same data with other machine
learning models, such as Gradient Boosted Trees and Linear Regression, and compare the results.
The findings from these experiments will also be included.

CONTENTS

Chapter No. Topic

Chapter 1 INTRODUCTION
1.1 Motivation
1.2 Problem Statement
1.3 Objectives

Chapter 2 REVIEW OF LITERATURE

Chapter 3 PRESENT INVESTIGATION

3.1 Survey of Existing System
3.2 Limitations of Existing System

Chapter 4 PROPOSED SYSTEM AND ITS IMPLEMENTATION

4.1 Introduction
4.2 Implementation
4.3 System architecture
4.4 Algorithm used
4.5 Hardware and software specifications

Chapter 5 CODE

Chapter 6 RESULT AND DISCUSSIONS

Chapter 7 FUTURE WORK

Chapter 8 CONCLUSION
Chapter 9 REFERENCES

CHAPTER 1

INTRODUCTION

Forecasting people’s health insurance costs is now a valuable tool for improving healthcare
accountability. The healthcare sector produces a very large amount of data related to patients,
diseases, and diagnoses, but because this data has not been analysed properly, it does not
deliver the value it holds, including for estimating patients’ health insurance costs.

The goal of this project is to allow a person to get an idea of the amount of insurance
required according to their own health status. They can then approach any health
insurance company and evaluate its schemes and benefits while keeping in mind the amount
predicted by our project. This can help a person focus more on the health aspect of
insurance rather than on the parts that bring no benefit.

A health insurance policy is a policy that covers or minimises the expenses of losses caused
by a variety of hazards. A variety of factors influence the cost of insurance or healthcare. For
a variety of stakeholders and health departments, accurately predicting individual healthcare
expenses using prediction models is critical. Accurate cost estimates can help health insurers
and, increasingly, healthcare delivery organisations to plan for the future and prioritise the
allocation of limited care management resources. Furthermore, knowing their probable future
expenses ahead of time can help patients choose insurance plans with appropriate
deductibles and premiums. These elements play a role in the development of
insurance policies.

In the insurance sector, ML can help enhance the efficiency of policy wording. In healthcare,
ML algorithms are particularly good at predicting high-cost, high-need patient expenditures.
ML can be categorized into three types: supervised machine learning (i.e., a task-driven
approach), used for classification and regression, where all data is labeled;
unsupervised machine learning (i.e., a data-driven approach), used for clustering, where all
data is unlabeled; and reinforcement learning (i.e., learning from mistakes), used for
decision making.

Health insurance is a necessity nowadays, and almost every individual is linked with a
government or private health insurance company. The factors determining the amount of
insurance vary from company to company. Also, people in rural areas are often unaware
that the Government of India provides free health insurance to those below the poverty line.
The process is very complex, and some rural people either buy private health insurance or do
not invest in health insurance at all. Apart from this, people can easily be misled about
the amount of insurance they need and may unnecessarily buy expensive health insurance.

1.1 Motivation

Health care coverage makes it possible for people to afford medical treatment in the face of
health-related complications. It is extremely important to be medically insured in case of an
emergency, accident, or disease onset. Insurance companies assess an individual’s lifestyle,
medical history, and other physical attributes to determine their premium price for medical
coverage.

1.2 Problem Statement

• In India only 35% of citizens have health insurance, and the more problematic issue
is that out of this 35% only 10% have health insurance of the right amount.

• People keep delaying the purchase of health insurance, thinking that it is a
waste of money.

• People are also confused about the right amount of health insurance.

• Our proposed system will bridge this gap by providing people with information on why
health insurance is important.

• We will also provide a prediction of the right amount of health
insurance based on the information they provide about themselves.

1.3 Objectives

• Investigating the applicability of the machine learning-based computational
intelligence approach for predicting healthcare insurance cost in the healthcare
industry sector.

• Comparing the performance results of the most popular machine learning algorithms
for forecasting the costs of healthcare insurance by using a public dataset.

• Providing a guide for developers to choose the appropriate machine learning method
when developing an effective healthcare insurance cost prediction system.

CHAPTER 2

REVIEW OF LITERATURE

Paper [1]: Predicting Health Care Costs Using Evidence Regression

Authors: Belisario Panay, Nelson Baloian, José A. Pino, Sergio Peñafiel, Horacio
Sanson and Nicolas Bersano.

Statistical methods (e.g., linear regression) suffer from the spike at zero and the skewed
distribution with a heavy right-hand tail of health care costs in small to medium sample sizes.
Advanced methods have been proposed to address this problem, for example, Generalized
Linear Models (GLM), where a mean function (between the linear predictor and the mean) and
a variance function (between the mean and variance on the original scale) are specified and
the parameters are estimated given these structural assumptions. Another example is the
two-part and hurdle model, where a Logit or Probit model is first used to estimate
the probability of the cost being zero, and then, if it is not, a statistical model such as
Log-linear or GLM is applied.
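
As a rough illustration of the two-part idea, the sketch below fits a logistic model for the probability of a non-zero cost and a log-linear model on the positive costs only, then multiplies the two. The data is synthetic, the function name `predict_two_part` is hypothetical, and the exponential back-transform ignores retransformation bias, so this is a sketch under stated assumptions and not the method of the reviewed paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data (assumption): one feature, many zero costs, right-skewed positive costs.
n = 500
x = rng.uniform(0, 1, size=(n, 1))
has_cost = rng.uniform(size=n) < (0.2 + 0.6 * x[:, 0])
cost = np.where(has_cost, np.exp(1.0 + 2.0 * x[:, 0] + rng.normal(0, 0.5, n)), 0.0)

# Part 1: probability that the cost is non-zero.
clf = LogisticRegression().fit(x, cost > 0)

# Part 2: log-linear model fitted only on the positive costs.
positive = cost > 0
reg = LinearRegression().fit(x[positive], np.log(cost[positive]))


def predict_two_part(x_new):
    """Expected cost = P(cost > 0) * predicted cost given that it is positive."""
    p_nonzero = clf.predict_proba(x_new)[:, 1]
    # exp() of the log-scale prediction ignores retransformation bias (sketch only).
    return p_nonzero * np.exp(reg.predict(x_new))


expected_cost = predict_two_part(np.array([[0.1], [0.9]]))
print(expected_cost)
```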

The most complex statistical method used to solve this problem is the Markov Chain model;
an approach based on a finite Markov chain has been suggested for estimating resource use
over different phases of health care. Supervised learning methods have been widely used to
predict health care costs; the data used for these methods vary. While a few works use only
demographic and clinical information (e.g., diagnosis groups, number of admissions and number
of laboratory tests), the majority have incorporated cost inputs (e.g., previous total costs,
previous medication costs) as well, obtaining better performance. Gradient Boosting excels as
the method with the best performance for this problem. It is an ensemble-learning algorithm
in which the final model is an ensemble of weak regression tree models built in a forward
stage-wise fashion. The most essential attribute of the algorithm is that it combines the
models by allowing optimization of an arbitrary loss function.

Paper [2]: Feature Selection for Health Care Costs Prediction Using Weighted
Evidential Regression

Authors: Belisario Panay, Nelson Baloian, José A. Pino, Sergio Peñafiel.

Although many authors have highlighted the importance of predicting people’s health costs to
improve healthcare budget management, most of them do not address the frequent need to
know the reasons behind this prediction, i.e., knowing the factors that influence this
prediction. This knowledge allows avoiding arbitrariness or people’s discrimination.
However, black-box methods (that is, those that do not allow this analysis,
e.g., methods based on deep learning techniques) are often more accurate than those that allow
an interpretation of the results. For this reason, in this work, we intend to develop a method
that can achieve performance similar to that obtained with black-box methods for the problem of
predicting health costs, while still allowing the interpretation of the results. This
interpretable regression method is based on the Dempster-Shafer theory using Evidential
Regression (EVREG) and a discount function based on the contribution of each dimension.
The method “learns” the optimal weights for each feature using a gradient descent technique.
The method also uses the k-nearest neighbor algorithm to accelerate calculations. Using
this approach and the transparency of the Evidential Regression model, it is possible to
select the most relevant features for predicting a patient’s health care costs. We can obtain
a reason for a prediction with a k-NN approach.

Paper [3]: A Computational Intelligence Approach for Predicting Medical Insurance
Cost.
Authors: Ch Anwar Ul Hassan, Jawaid Iqbal, Saddam Hussain

In the domains of computational and applied mathematics, soft computing, fuzzy logic, and
machine learning (ML) are well-known research areas. ML is one of the computational
intelligence aspects that may address diverse difficulties in a wide range of applications and
systems when it comes to exploitation of historical data. Predicting medical insurance costs
using ML approaches is still a problem in the healthcare industry that requires investigation
and improvement. Using a series of machine learning algorithms, this study provides a
computational intelligence approach for predicting healthcare insurance costs. The proposed
research approach uses Linear Regression, Support Vector Regression, Ridge Regressor,
Stochastic Gradient Boosting, XGBoost, Decision Tree, Random Forest Regressor, Multiple
Linear Regression, and k-Nearest Neighbors. A medical insurance cost dataset is acquired
from the Kaggle repository for this purpose, and machine learning methods are used to
show how different regression models can forecast insurance costs and to compare the
models’ accuracy. The results show that the Stochastic Gradient Boosting (SGB) model
outperforms the others, with a cross-validation value of 0.858 and an RMSE value of 0.340,
and gives 86% accuracy.
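
The cross-validation value and RMSE quoted above can be computed along the following lines. This is a minimal sketch on synthetic data, assuming scikit-learn's `GradientBoostingRegressor` as a stand-in for the paper's Stochastic Gradient Boosting model; the numbers it prints are not comparable with the paper's results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic regression data (assumption): three features, mildly non-linear target.
X = rng.uniform(0, 1, size=(200, 3))
y = 5 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(0, 0.1, 200)

model = GradientBoostingRegressor(random_state=0)

# 5-fold cross-validated R^2: one score per held-out fold.
r2_scores = cross_val_score(model, X, y, cv=5, scoring="r2")

# scikit-learn reports errors as negative values so that "higher is better";
# flip the sign and take the square root to get RMSE per fold.
neg_mse = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
rmse_scores = np.sqrt(-neg_mse)

print("mean R^2:", r2_scores.mean())
print("mean RMSE:", rmse_scores.mean())
```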

CHAPTER 3

REPORT ON THE PRESENT INVESTIGATION

3.1 Survey of the Existing System

There is no existing system that predicts how much insurance a person requires. People who
want to buy insurance have to calculate manually or, more accurately, guess how much
insurance they might need. This is the major drawback of the existing approach: people buy
either a very high-value or a very low-value policy, and both cases are harmful and may
place a costly burden and stress on the insurance buyer. The alternative to a manual estimate
is to pay companies to estimate how much insurance one might need. This is not ideal either,
because the company or employee recommending the amount works for the insurance company,
which often leads to people buying high-value insurance.

3.2 Limitations of the Existing System

Sub-limit is an extra limit to a health insurance policy coverage placed on certain medical
expenses. These medical expenses are covered as a part of the original policy coverage
limit, that is the sum insured. Your insurance company can limit its liability under specific
covers by including sub-limits.

In simpler terms, a sub-limit is a monetary cap that your insurance service provider places
on your medical insurance claim for certain covers. They are usually expressed as a fixed
value for a particular illness/disease/treatment but can also be included as a percentage of
the total sum insured.

Sub-limits can be placed on hospital room rent, doctor’s consultation fee, ambulance
charges, and a few pre-planned medical procedures, such as cataract removal, knee
ligament reconstruction, etc.

CHAPTER 4

PROPOSED SYSTEM AND ITS IMPLEMENTATION

4.1 Introduction

We have applied machine learning techniques to medical insurance data. The medical
insurance cost dataset is obtained from the Kaggle repository, and we perform data
preprocessing on it. While exploring the dataset for regression, categorical values are
converted to numerical values. After preprocessing, we select the features by performing
feature engineering. Then, the dataset is split into two parts, the train and test datasets:
80% of the data is used for training, while the remaining 20% is used for testing
(test_size=0.2 in Chapter 5). The training dataset is used to create a model that
predicts medical insurance costs for the year, while the test dataset is used to evaluate the
regression models.
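
The conversion of categorical values can be sketched as follows. The rows are hand-made illustrative values that only borrow the column names of insurance.csv. One-hot encoding of region is shown as a possible alternative (an assumption, not the encoding used in Chapter 5, which maps regions to the integers 1 to 4), since regions have no natural order.

```python
import pandas as pd

# Hand-made illustrative rows using the column names of insurance.csv (not real data).
sample = pd.DataFrame({
    "age": [19, 33, 45, 52],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 22.7, 30.1, 25.3],
    "children": [0, 1, 2, 3],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "southeast", "northwest", "northeast"],
    "charges": [1000.0, 2000.0, 3000.0, 4000.0],
})

# Binary columns map cleanly to 0/1.
sample["sex"] = sample["sex"].map({"female": 0, "male": 1})
sample["smoker"] = sample["smoker"].map({"no": 0, "yes": 1})

# Region has no natural order, so one indicator column per region avoids implying one.
encoded = pd.get_dummies(sample, columns=["region"], dtype=int)

# Features are every column except the target, charges.
X = encoded.drop(columns=["charges"])
y = encoded["charges"]
print(X.columns.tolist())
```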

4.2 Implementation

Step 1: Data collection: The medical insurance cost dataset (insurance.csv) is collected
from the Kaggle repository. It is structured data with the columns age, sex, bmi, children,
smoker, region and charges.

Step 2: Data Preprocessing: In this phase, the data containing the relevant information is
prepared for analysis. Pre-processing and cleaning are among the most
important tasks that must be done before a dataset can be used for machine learning.
Real-world data is noisy, incomplete and inconsistent, so it must be cleaned.

Step 3: Extraction of Feature Set/Training Data: The feature set is prepared
from the cleaned data. The categorical columns (sex, smoker and region) are mapped to
numerical values, the charges column is taken as the target, and the remaining columns
form the feature set. The data is then divided into a training set and a test set. The
feature set and training set obtained in this way are used for the
implementation of the machine learning algorithms.

Step 4: Implementation of Machine Learning Algorithm on Feature Set/Training Data

• Classification:

To determine a label or category: it is either one thing or another. We train the
model using a set of labelled data. As an example, we want to predict whether a person’s
mole is cancerous or not, so we create a model using a data set of mole scans from
1000 patients that a doctor has already examined to determine whether they show
cancer or not. We also feed the model other data such as a patient’s
age, gender, ethnicity, and place of residence. The resulting model enables
us to present a new mole scan and decide whether it depicts cancer or not.

• Regression:

A Regression model is created when we want to find out a number – for example how
many days before a patient discharged from hospital with a chronic condition such as
diabetes will return.

Step 5: Testing of Data: Testing is done by applying the trained models, which are fitted
using supervised learning algorithms, to the test dataset. The predictions are compared with
the actual charges using the R² score and the mean absolute error.
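
The two metrics used for testing in Chapter 5, `r2_score` and `mean_absolute_error`, can be written out by hand to show what they measure. The sketch below uses plain Python and made-up numbers chosen for easy arithmetic; it mirrors the standard definitions and is not a replacement for the scikit-learn functions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]

mae = mean_absolute_error(actual, predicted)  # (10 + 10 + 20 + 20) / 4 = 15.0
r2 = r2_score(actual, predicted)              # 1 - 1000 / 50000 = 0.98
print(mae, r2)
```

A lower mean absolute error and an R² closer to 1 both indicate a better fit, which is how the four models are compared.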

4.3 System Architecture

4.4 Algorithms used

Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to predict new
output values. Machine learning algorithms are mathematical model mapping methods used to
learn or uncover underlying patterns embedded in the data. Machine learning comprises a
group of computational algorithms that can perform pattern recognition, classification, and
prediction on data by learning from existing data (training set).

We have tested and used the following machine learning algorithms:

• Logistic Regression
• Support Vector Regression
• Random Forest Regression
• Gradient Boosting Regression

Logistic Regression

Logistic regression is a process of modeling the probability of a discrete outcome given an
input variable. The most common logistic regression models a binary outcome: something
that can take two values such as true/false, yes/no, and so on. Multinomial logistic regression
can model scenarios where there are more than two possible discrete outcomes. Logistic
regression is a useful analysis method for classification problems, where you are trying to
determine whether a new sample fits best into a category. As some aspects of cyber security
are classification problems, such as attack detection, logistic regression is a useful analytic
technique there. Logistic regression is a simple and efficient method for binary and linear
classification problems. It is a classification model that is very easy to implement and
achieves very good performance with linearly separable classes. It is an extensively
employed algorithm for classification in industry. Insurance charges are a continuous
quantity, so the code in Chapter 5 fits scikit-learn's LinearRegression rather than a
logistic model.
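
The probability modelling described above can be sketched with the logistic (sigmoid) function applied to a linear score. The weight and bias values here are arbitrary assumptions chosen for illustration, and the helper names `predict_proba` and `predict_label` are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weight, bias):
    """Probability of the positive class for a single input variable x."""
    return sigmoid(weight * x + bias)


def predict_label(x, weight, bias, threshold=0.5):
    """Binary outcome: 1 if the probability reaches the threshold, else 0."""
    return int(predict_proba(x, weight, bias) >= threshold)


# Arbitrary illustrative parameters (assumption, not fitted to any data).
weight, bias = 1.5, -1.0
for x in (0.0, 2.0):
    print(x, round(predict_proba(x, weight, bias), 3), predict_label(x, weight, bias))
```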

Support Vector Machine

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, it is primarily
used for Classification problems in Machine Learning. The regression variant, Support
Vector Regression (SVR), is the one used in Chapter 5.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put a new data point in the
correct category in the future. This best decision boundary is called a hyperplane. SVM
chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases
are called support vectors, and hence the algorithm is termed Support Vector Machine.

SVM can be of two types:

➢ Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset
can be classified into two classes by using a single straight line, then such data is
termed linearly separable data, and the classifier used is called a Linear SVM
classifier.

➢ Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which
means that if a dataset cannot be classified by using a straight line, then such data is
termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
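
For regression, the same kernel idea is available through scikit-learn's `SVR`. The sketch below fits a non-linear (RBF kernel) SVR on synthetic data. Wrapping it in a pipeline with `StandardScaler` and the `C` and `epsilon` values are assumptions made for this example, chosen because SVR is sensitive to feature scale; Chapter 5 uses `SVR()` with default settings on unscaled features.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic non-linear data (assumption): a noisy sine wave.
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

# Scale the feature first, then fit an RBF-kernel support vector regressor.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.1))
svr.fit(X, y)

# score() on a regressor returns the R^2 of its predictions.
train_r2 = svr.score(X, y)
print("training R^2:", train_r2)
```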

Random Forest Regression

Random Forest Regression is a supervised learning algorithm that uses the ensemble
learning method for regression. Ensemble learning is a technique that combines
predictions from multiple machine learning models to make a more accurate prediction
than a single model.

A Random Forest Regression model is powerful and accurate. It usually performs well on
many problems, including those with features that have non-linear relationships. Its
disadvantages, however, include the following: the model is hard to interpret, overfitting
may occur, and we must choose the number of trees to include in the model.
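
The way an ensemble combines predictions can be made concrete: for scikit-learn's `RandomForestRegressor`, the forest's prediction is the average of its individual trees' predictions. The sketch below checks this on synthetic data; the data and the choice of 25 trees are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data (assumption): two features with a non-linear relationship to the target.
X = rng.uniform(0, 1, size=(150, 2))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.05, 150)

# n_estimators is the number of trees that must be chosen.
forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

X_new = np.array([[0.2, 0.4], [0.8, 0.1]])

# Ask every tree separately, then average the answers by hand.
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

forest_prediction = forest.predict(X_new)
print(manual_average, forest_prediction)
```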

Gradient Boosting

It is a popular boosting algorithm. In gradient boosting, each predictor corrects its
predecessor’s error. In contrast to AdaBoost, the weights of the training instances are not
tweaked; instead, each predictor is trained using the residual errors of its predecessor as labels.

There is a technique called Gradient Boosted Trees whose base learner is CART
(Classification and Regression Trees).
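
Training each predictor on its predecessor's residuals can be sketched by hand with small regression trees. This is a simplified illustration for squared-error loss on synthetic data; the learning rate, tree depth and number of rounds are assumptions, and the loop is not the implementation inside `GradientBoostingRegressor`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data (assumption): a noisy sine curve.
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(6 * X[:, 0]) + rng.normal(0, 0.1, 200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant prediction
trees = []
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residuals = y - prediction  # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)  # small CART base learner
    prediction += learning_rate * tree.predict(X)  # correct the predecessor's error
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))

print("training MSE before:", mse_history[0], "after:", mse_history[-1])
```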

4.5 Hardware and Software Specifications.

The Health Insurance Price Prediction is developed to operate within the following
environment: -

Hardware requirements

• Processor: Intel Core Duo 1.0 GHz or more

• RAM: 2 GB or More
• Hard disk: 80GB or more
• Monitor: 15” CRT, or LCD monitor
• Keyboard: Normal or Multimedia
• Mouse: Compatible mouse

Software requirements

• Jupyter notebook (anaconda3)

• Tkinter

CHAPTER 5

CODE
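# Jupyter/IPython magics: print the working directory, then move to the folder holding insurance.csv
%pwd

%cd desktop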

import pandas as pd

data=pd.read_csv("insurance.csv")

data.head()

data.describe(include='all')

data['sex'].unique()

data['sex']=data['sex'].map({'female':0,'male':1})

data['smoker']=data['smoker'].map({'yes':1,'no':0})

data['region']=data['region'].map({'southwest':1,'southeast':2,

'northwest':3,'northeast':4})

X = data.drop(['charges'],axis=1)

y = data['charges']

from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)

from sklearn.linear_model import LinearRegression

from sklearn.svm import SVR

from sklearn.ensemble import RandomForestRegressor

from sklearn.ensemble import GradientBoostingRegressor

lr = LinearRegression()

lr.fit(X_train,y_train)

svm = SVR()

svm.fit(X_train,y_train)

rf = RandomForestRegressor()

rf.fit(X_train,y_train)

gr = GradientBoostingRegressor()

gr.fit(X_train,y_train)

y_pred1 = lr.predict(X_test)

y_pred2 = svm.predict(X_test)

y_pred3 = rf.predict(X_test)

y_pred4 = gr.predict(X_test)

df1 = pd.DataFrame({'Actual':y_test,'Lr':y_pred1,

'svm':y_pred2,'rf':y_pred3,'gr':y_pred4})

df1

import matplotlib.pyplot as plt

plt.subplot(221)

plt.plot(df1['Actual'].iloc[0:11],label='Actual')

plt.plot(df1['Lr'].iloc[0:11],label="Lr")

plt.legend()

plt.subplot(222)

plt.plot(df1['Actual'].iloc[0:11],label='Actual')

plt.plot(df1['svm'].iloc[0:11],label="svr")

plt.legend()

plt.subplot(223)

plt.plot(df1['Actual'].iloc[0:11],label='Actual')

plt.plot(df1['rf'].iloc[0:11],label="rf")

plt.legend()

plt.subplot(224)

plt.plot(df1['Actual'].iloc[0:11],label='Actual')

plt.plot(df1['gr'].iloc[0:11],label="gr")

plt.tight_layout()

plt.legend()

from sklearn import metrics

score1 = metrics.r2_score(y_test,y_pred1)

score2 = metrics.r2_score(y_test,y_pred2)

score3 = metrics.r2_score(y_test,y_pred3)

score4 = metrics.r2_score(y_test,y_pred4)

print(score1,score2,score3,score4)

s1 = metrics.mean_absolute_error(y_test,y_pred1)

s2 = metrics.mean_absolute_error(y_test,y_pred2)

s3 = metrics.mean_absolute_error(y_test,y_pred3)

s4 = metrics.mean_absolute_error(y_test,y_pred4)

print(s1,s2,s3,s4)

data = {'age' : 21,

'sex' : 1,

'bmi' : 19,

'children' : 0,

'smoker' : 0,

'region' : 1}

df = pd.DataFrame(data,index=[0])

new_pred = gr.predict(df)

print("Medical Insurance cost for New Customer is : ",new_pred[0])

gr = GradientBoostingRegressor()

gr.fit(X,y)

new_pred = gr.predict(df)

print("Medical Insurance cost for New Customer is : ",new_pred[0])

import joblib

joblib.dump(gr,'model_joblib_gr')

model = joblib.load('model_joblib_gr')

model.predict(df)

# GUI front end: collect the six inputs and show the predicted cost.
import tkinter as tk

from tkinter import Label, Entry, Button

import joblib

#------------

master = tk.Tk()

master.title("Insurance Cost Prediction")

Label(master, text="Insurance Cost Predict", bg="black", fg="white").grid(row=0, columnspan=2)

Label(master, text = "Enter Your Age").grid (row=1)

Label(master, text = "Male Or Female [1/0]").grid(row=2)

Label(master, text = "Enter Your BMI Value").grid (row=3)

Label(master, text = "Enter Number of Children").grid(row=4)

Label(master, text = "Smoker Yes/No [1/0]").grid(row=5)

Label(master, text = "Region [1-4]").grid(row=6)

#------------

e1 = Entry(master)

e2 = Entry(master)

e3 = Entry(master)

e4 = Entry(master)

e5 = Entry(master)

e6 = tk.Entry(master)

e1.grid(row=1,column=1)

e2.grid(row=2,column=1)

e3.grid(row=3,column=1)

e4.grid(row=4,column=1)

e5.grid(row=5,column=1)

e6.grid(row=6,column=1)

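def show_entry():
    # Read the six fields in the order the model was trained on:
    # age, sex, bmi, children, smoker, region.
    p1 = float(e1.get())

    p2 = float(e2.get())

    p3 = float(e3.get())

    p4 = float(e4.get())

    p5 = float(e5.get())

    p6 = float(e6.get())

    #------------

    model = joblib.load('model_joblib_gr')

    result = model.predict([[p1,p2,p3,p4,p5,p6]])

    # Rows 8 and 9 are used because the Predict button, gridded without a row,
    # lands on the first empty row (row 7).
    Label(master, text="Insurance Cost").grid(row=8)

    # predict() returns an array; show its single value.
    Label(master, text=result[0]).grid(row=9)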

#------------

Button(master, text="Predict",command=show_entry).grid()

master.mainloop()
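
The GUI converts each entry with float() directly, so a blank or out-of-range entry would either raise an error or produce a meaningless prediction. A possible guard is sketched below. The helper name `parse_inputs` is hypothetical and not part of the project code; the allowed values for sex, smoker and region follow the encodings used when the data was mapped earlier in this chapter.

```python
def parse_inputs(raw_values):
    """Convert six raw strings to floats and check them against the training encodings."""
    names = ["age", "sex", "bmi", "children", "smoker", "region"]
    if len(raw_values) != len(names):
        raise ValueError("expected six values: " + ", ".join(names))

    values = {}
    for name, raw in zip(names, raw_values):
        try:
            values[name] = float(raw)
        except ValueError:
            raise ValueError(f"{name} must be a number, got {raw!r}") from None

    if values["age"] <= 0 or values["bmi"] <= 0:
        raise ValueError("age and bmi must be positive")
    if values["children"] < 0:
        raise ValueError("children cannot be negative")
    if values["sex"] not in (0, 1) or values["smoker"] not in (0, 1):
        raise ValueError("sex and smoker must be 0 or 1")
    if values["region"] not in (1, 2, 3, 4):
        raise ValueError("region must be between 1 and 4")

    # Same order as the columns the model was trained on.
    return [values[name] for name in names]


features = parse_inputs(["21", "1", "19", "0", "0", "1"])
print(features)
```

Inside `show_entry`, the strings returned by `e1.get()` to `e6.get()` could be passed through such a helper before calling `model.predict`.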

CHAPTER 6
RESULT AND DISCUSSION

CHAPTER 7

FUTURE WORK

Even though results are similar or better than previous works, we believe our results are still
improvable. One of the approaches to improve performance is classifying patients in cost
buckets as recommended by various studies; this strategy results in better performance but
escapes the goal of this work, so we can apply this classification process in future work to
obtain a patient’s risk class as first step to try to improve the performance of WEVREG.
Another approach could be the use of a nearness to death feature as it has been found that
costs rise sharply with it. It is impossible to know a patient’s near death status with our
current data. We could include census data to create the new feature, obtaining nearness to
death by age group taking into account the change in life expectancy depending on patients’
age. We can also try to solve the prediction of health care cost using deep learning methods,
but this purpose may become feasible with the availability of a larger dataset. Finally, we
also plan to apply this model to other regression problems in the health care domain, such as
predicting the hospital length of stay and predicting the days of readmission based on each
patient’s diagnosis and history, which are two classic prediction problems in this domain.

CHAPTER 8

CONCLUSION

In this report, a model for HEALTH INSURANCE PRICE PREDICTION is proposed. The
model which we have developed has been successfully implemented by applying the
knowledge gained in classrooms, by referring to books, with the help of faculty and
colleagues, and by applying our own knowledge of the subject.
The project report entitled "HEALTH INSURANCE PRICE PREDICTION" has come to
its conclusion. The new system has been developed with care to avoid errors
and to be efficient and less time consuming. The system is robust, and provision has been
made for future development. User-friendly coding practices have also been
adopted.

The report describes the background and context of the project and its relation to work
already done in the area. It states the aims and objectives of the project and describes its
purpose, scope and applicability. We define the problem on which we are working in the
project. We describe the requirement specifications of the system and the actions that can be
performed on it. We understand the problem domain and produce a model of the system,
which describes the operations that can be performed on the system. We include features and
operations in detail, including screen layouts. We designed the user interface and considered
security issues related to the system. Finally, the system is implemented and tested according
to test cases.

CHAPTER 9

REFERENCES

[1] WHO . Public Spending on Health: A Closer Look at Global Trends. World Health
Organization; Geneva, Switzerland: 2018. Technical Report.

[2] Prostate Cancer Probability Prediction By Machine Learning Technique. JoviÄ‡, SrÄ‘an;
MiljkoviÄ‡, Milica; IvanoviÄ‡, Miljan; Å aranoviÄ‡, Milena; ArsiÄ‡, Milena

[3] Recent Advances in Predictive (Machine) Learning Friedman, J

[4] Automatically explaining machine learning prediction results: a demonstration on type 2

diabetes risk prediction.

[5] Garber A.M., Skinner J. Is American health care uniquely inefficient? J. Econ.
Perspect. 2008;22:27–50. doi: 10.1257/jep.22.4.27.

[6] Yoo I., Alafaireet P., Marinov M., Pena-Hernandez K., Gopidi R., Chang J.F., Hua L.
Data mining in healthcare and biomedicine: A survey of the literature. J. Med.
Syst. 2012;36:2431–2448. doi: 10.1007/s10916-011-9710-5.

[7] Bilger M., Manning W.G. Measuring overfitting in nonlinear models: A new method and
an application to health expenditures. Health Econ. 2015;24:75–85. doi: 10.1002/hec.3003.

[8] Diehr, P.; Yanez, D.; Ash, A.; Hornbrook, M.; Lin, D. Methods for analyzing health care
utilization and costs. Ann. Rev. Public Health 1999.

[9] Petit-Renaud, S.; Denoeux, T. Nonparametric regression analysis of uncertain and
imprecise data using belief functions. Int. J. Approx. Reason. 2004.

[10] Zhou, J.; Troyanskaya, O.G. Predicting effects of noncoding variants with deep
learning–based sequence model. Nat. Methods 2015.

[11] Ribeiro, M.T.; Singh, S.; Guestrin, C. Why should I trust you?: Explaining the
predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA,
13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 1135–1144.

[12] Morid M.A., Kawamoto K., Ault T., Dorius J., Abdelrahman S. AMIA Annual
Symposium Proceedings. Volume 2017. American Medical Informatics Association;
Bethesda, MD, USA: 2017. Supervised Learning Methods for Predicting Healthcare Costs:
Systematic Literature Review and Empirical Evaluation; p. 1312.

[13] Elith J., Leathwick J.R., Hastie T. A working guide to boosted regression
trees. J. Anim. Ecol. 2008;77:802–813. doi: 10.1111/j.1365-2656.2008.01390.x.
