
Mini - Project - Report Health Insurance Price Prediction

This document presents a project report on predicting health insurance prices using machine learning algorithms. It aims to develop a system to predict medical prices to help patients choose cost-effective healthcare providers and curb health spending. The project will implement Random Forest Regression on a health insurance cost dataset to predict prices and compare results to other models like Gradient Boosted Trees and Linear Regression. If successful, it could help patients select insurance plans with appropriate costs and help policymakers understand pricing trends.

Uploaded by

Bhavesh Sonje

HEALTH INSURANCE PRICE PREDICTION

Submitted in partial fulfillment of the requirements

Of the degree of

Bachelor of Engineering

In

Information Technology

By-

NAMAH RAKESH KOHLI – 220A3067

NISHIKA HARESH PATEL – 220A3069

SUFIYAN FAROOQUE DALVI – 220A3072

Under the Guidance of:

Dr. K. LAKSHMI SUDHA

Department of Information Technology

SIES Graduate School of Technology

2022-2023

CERTIFICATE
This is to certify that the Mini project entitled HEALTH INSURANCE PRICE
PREDICTION is a bona fide work of the following students, submitted to the University of
Mumbai in partial fulfillment of the requirement for the award of the degree of Bachelor of
Engineering in Information Technology.

NAMAH RAKESH KOHLI – 220A3067

NISHIKA HARESH PATEL – 220A3069

SUFIYAN FAROOQUE DALVI – 220A3072

PROJECT REPORT APPROVAL

This Mini project report entitled HEALTH INSURANCE PRICE PREDICTION by the
following students is approved for the degree of Bachelor of Engineering in Information
Technology.

NAMAH RAKESH KOHLI – 220A3067

NISHIKA HARESH PATEL – 220A3069

SUFIYAN FAROOQUE DALVI – 220A3072

Name of External Examiner: --------------------------------

Signature:--------------------------------

Name of Internal Examiner: --------------------------------

Signature:--------------------------------

Date: 26/04/2022

Place: SIES GST, NERUL

DECLARATION

We declare that this written submission represents our ideas in our own words and where
others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source
in our submission. We understand that any violation of the above will be cause for disciplinary
action by the Institute and can also evoke penal action from the sources which have thus not
been properly cited or from whom proper permission has not been taken when needed.

NAMAH RAKESH KOHLI – 220A3067

NISHIKA HARESH PATEL – 220A3069

SUFIYAN FAROOQUE DALVI – 220A3072

Date: 26/04/2022

ACKNOWLEDGEMENT

We wish to express our deep sense of gratitude and thanks to our Internal Guide, Dr. K.
Lakshmi Sudha, for her guidance, help and useful suggestions, which helped us complete
our project work on time. We also thank the Head of Department, Dr. K. Lakshmi Sudha, for her
support in completing the project. It gives us immense pleasure to thank Dr. Atul Kemker,
Principal, for extending his support to carry out this project.

We would also like to thank the entire faculty of the Information Technology Department for
their valuable ideas and timely assistance in this project. Last but not least, we would like
to thank the non-teaching staff members of our college for their support in facilitating the timely
completion of this project.

NAMAH RAKESH KOHLI – 220A3067

NISHIKA HARESH PATEL – 220A3069

SUFIYAN FAROOQUE DALVI – 220A3072

ABSTRACT

In the domains of computational and applied mathematics, soft computing, fuzzy logic, and
machine learning (ML) are well-known research areas. ML is one of the computational
intelligence aspects that may address diverse difficulties in a wide range of applications and
systems when it comes to exploitation of historical data.

Predicting medical insurance costs using ML approaches is still a problem in the healthcare
industry that requires investigation and improvement. Using a series of machine learning
algorithms, this project provides a computational intelligence approach for predicting health
insurance costs. The proposed approach uses Linear Regression, Support Vector
Regression, Random Forest Regressor, and Gradient Boosting Regressor.

A health insurance cost dataset is acquired from the Kaggle repository for this purpose,
and machine learning methods are used to show how different regression models can
forecast insurance costs and to compare the models’ accuracy. In this work, we develop
a medical price prediction system using machine learning algorithms, which will aid in
steering patients to cost-effective providers and thereby curb health spending.
Policymakers can also use the tool to better understand which providers are relatively
expensive and take punitive action if necessary. The medical price is predicted
using the Random Forest Regression algorithm.
Additionally, we plan to run experiments on the same data with other machine
learning models, such as Gradient Boosted Trees and Linear Regression, and compare the results.
The findings from these experiments will also be included.

CONTENTS

Chapter 1 INTRODUCTION
1.1 Motivation
1.2 Problem Statement
1.3 Objectives

Chapter 2 REVIEW OF LITERATURE

Chapter 3 PRESENT INVESTIGATION
3.1 Survey of Existing System
3.2 Limitations of Existing System

Chapter 4 PROPOSED SYSTEM AND ITS IMPLEMENTATION
4.1 Introduction
4.2 Implementation
4.3 System architecture
4.4 Algorithm used
4.5 Hardware and software specifications

Chapter 5 CODE

Chapter 6 RESULT AND DISCUSSIONS

Chapter 7 FUTURE WORK

Chapter 8 CONCLUSION

Chapter 9 REFERENCES

CHAPTER 1

INTRODUCTION

Forecasting people’s health insurance costs is now a valuable tool for improving healthcare
accountability. The healthcare sector produces a very large amount of data related to patients,
diseases, and diagnoses, but because this data has not been analysed properly, it does not
deliver the value it holds, including for estimating patient health insurance costs.

The goal of this project is to allow a person to get an idea of the amount of cover
required according to their own health status. They can then approach any health
insurance company and evaluate its schemes and benefits with the predicted amount from
our project in mind. This can help a person focus more on the health aspect of insurance
rather than on its less useful aspects.

A health insurance policy is a policy that covers or minimises the expenses of losses caused
by a variety of hazards. A variety of factors influence the cost of insurance or healthcare. For
a variety of stakeholders and health departments, accurately predicting individual healthcare
expenses using prediction models is critical. Accurate cost estimates can help health insurers
and, increasingly, healthcare delivery organisations to plan for the future and prioritise the
allocation of limited care management resources. Furthermore, knowing ahead of time what
their probable future expenses are can help patients choose insurance plans with
appropriate deductibles and premiums. These elements play a role in the development of
insurance policies.

In the insurance sector, ML can help enhance the efficiency of policy wording. In healthcare,
ML algorithms are particularly good at predicting high-cost, high-need patient expenditures.
ML can be categorized into three types: supervised machine learning
(a task-driven approach), used for classification and regression, where all data is labeled;
unsupervised machine learning (a data-driven approach), used for clustering, where all data
is unlabeled; and reinforcement learning (learning from mistakes), used for decision making.

Health insurance is a necessity nowadays, and almost every individual is linked with a
government or private health insurance company. The factors that determine the amount of
insurance vary from company to company. Also, people in rural areas are often unaware that
the government of India provides free health insurance to those below the poverty line. The
process is very complex, and some rural people either buy private health insurance or do
not invest money in health insurance at all. Apart from this, people can easily be misled about
the amount of insurance they need and may unnecessarily buy expensive health insurance.

1.1 Motivation

Health care coverage makes it possible for people to afford medical treatment in the face of
health-related complications. It is extremely important to be medically insured in case of an
emergency, accident, or disease onset. Insurance companies assess an individual’s lifestyle,
medical history, and other physical attributes to determine their premium price for medical
coverage.

1.2 Problem Statement

• In India only 35% of citizens have health insurance, and the more problematic issue
is that out of these 35% only 10% have health insurance of the right amount.

• People keep delaying the purchase of health insurance, thinking that it is a
waste of money.

• People are also confused regarding the right amount for health insurance.

• Our proposed system will bridge this gap by providing people with information on why
health insurance is important.

• We will also provide a prediction of the right amount of health
insurance based on the information they provide about themselves.

1.3 Objectives

• Investigating the applicability of the machine learning-based computational
intelligence approach for predicting healthcare insurance cost in the healthcare
industry sector.

• Comparing the performance results of the most popular machine learning algorithms
for forecasting the costs of healthcare insurance by using a public dataset.

• Providing a guide for developers to choose the appropriate machine learning method
when developing an effective healthcare insurance cost prediction system.

CHAPTER 2

REVIEW OF LITERATURE

Paper [1]: Predicting Health Care Costs Using Evidence Regression

Author: Belisario Panay, Nelson Baloian, José A. Pino, Sergio Peñafiel, Horacio
Sanson and Nicolas Bersano.

Statistical methods (e.g., linear regression) suffer from the spike at zero and the skewed
distribution with a heavy right-hand tail of health care costs in small to medium sample sizes.
Advanced methods have been proposed to address this problem, for example, Generalized
Linear Models (GLM), where a mean function (between the linear predictor and the mean) and
a variance function (between the mean and variance on the original scale) are specified and
the parameters are estimated given these structural assumptions. Another example is the
two-part or hurdle model, where a Logit or Probit model is first used to estimate
the probability of the cost being zero, and then, if it is not zero, a statistical model such as
Log-linear or GLM is applied.
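
The two-part model described above can be illustrated with a minimal sketch. It is not the cited paper's implementation: the data is synthetic, the single `age` feature and the helper name `expected_cost` are assumptions made for illustration, and scikit-learn's LogisticRegression and LinearRegression stand in for the Logit and Log-linear parts.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(5)
n = 500
age = rng.uniform(18, 80, size=n)
X = age.reshape(-1, 1)

# Synthetic costs (assumption): many exact zeros, otherwise a right-skewed amount.
has_cost = rng.random(n) < 1.0 / (1.0 + np.exp(-(age - 45.0) / 10.0))
cost = np.where(has_cost, np.exp(6.0 + 0.02 * age + rng.normal(0, 0.5, size=n)), 0.0)

# Part 1: probability that the cost is non-zero.
part1 = LogisticRegression(max_iter=1000).fit(X, has_cost.astype(int))

# Part 2: log-linear model fitted only on the positive costs.
positive = cost > 0
part2 = LinearRegression().fit(X[positive], np.log(cost[positive]))


def expected_cost(x):
    """Combine both parts: P(cost > 0) times the predicted positive cost."""
    p_nonzero = part1.predict_proba(x)[:, 1]
    # exp of the predicted log cost ignores the retransformation correction;
    # this is kept deliberately simple.
    return p_nonzero * np.exp(part2.predict(x))


estimates = expected_cost(np.array([[30.0], [70.0]]))
print(estimates)
```

Splitting the problem this way lets the first model absorb the spike at zero, so the second model only has to describe the skewed positive costs.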

The most complex statistical method used to solve this problem is the Markov Chain model;
an approach based on a finite Markov chain has been suggested for estimating resource use over different
phases of health care. Supervised learning methods have been widely used to predict health
care costs; the data used for these methods vary. While a few works use only demographic
and clinical information (e.g., diagnosis groups, number of admissions and number of
laboratory tests), the majority have incorporated cost inputs (e.g., previous total costs,
previous medication costs) as well, obtaining better performance. Gradient boosting excels as the method with
the best performance for this problem. It is an ensemble-learning algorithm in which the
final model is an ensemble of weak regression tree models built in a forward stage-wise
fashion. The most essential attribute of the algorithm is that it combines the models by
allowing optimization of an arbitrary loss function.

Paper [2]: Feature Selection for Health Care Costs Prediction Using Weighted
Evidential Regression

Author: Belisario Panay, Nelson Baloian, José A. Pino, Sergio Peñafiel.

Although many authors have highlighted the importance of predicting people’s health costs to
improve healthcare budget management, most of them do not address the frequent need to
know the reasons behind this prediction, i.e., the factors that influence it.
This knowledge helps avoid arbitrariness or discrimination against people.
However, black box methods (that is, those that do not allow this analysis,
e.g., methods based on deep learning techniques) are often more accurate than those that allow an
interpretation of the results. For this reason, this work intends to develop a method that
can achieve performance similar to that obtained with black box methods for the problem of
predicting health costs while still allowing the interpretation of the results. This
interpretable regression method is based on the Dempster-Shafer theory using Evidential
Regression (EVREG) and a discount function based on the contribution of each dimension.
The method “learns” the optimal weights for each feature using a gradient descent technique.
The method also uses the k-nearest neighbors algorithm to accelerate calculations. It is
possible to select the most relevant features for predicting a patient’s health care costs using
this approach and the transparency of the Evidential Regression model. A reason for a
prediction can be obtained with a k-NN approach.

Paper [3]: A Computational Intelligence Approach for Predicting Medical Insurance
Cost.
Author: Ch Anwar Ul Hassan, Jawaid Iqbal, Saddam Hussain

In the domains of computational and applied mathematics, soft computing, fuzzy logic, and
machine learning (ML) are well-known research areas. ML is one of the computational
intelligence aspects that may address diverse difficulties in a wide range of applications and
systems when it comes to exploitation of historical data. Predicting medical insurance costs
using ML approaches is still a problem in the healthcare industry that requires investigation
and improvement. Using a series of machine learning algorithms, this study provides a
computational intelligence approach for predicting healthcare insurance costs. The proposed
research approach uses Linear Regression, Support Vector Regression, Ridge Regressor,
Stochastic Gradient Boosting, XGBoost, Decision Tree, Random Forest Regressor, Multiple
Linear Regression, and k-Nearest Neighbors. A medical insurance cost dataset is acquired
from the Kaggle repository for this purpose, and machine learning methods are used to
show how different regression models can forecast insurance costs and to compare the
models’ accuracy. The results show that the Stochastic Gradient Boosting (SGB) model
outperforms the others with a cross-validation value of 0.858 and an RMSE value of 0.340, and
gives 86% accuracy.

CHAPTER 3

REPORT ON THE PRESENT INVESTIGATION

3.1 Survey of the Existing System


There is no existing system which predicts how much insurance one requires. People who
want to buy insurance have to calculate manually or, more accurately, guess
how much insurance they might need. This is the major drawback of the existing system:
people buy either very high-valuation insurance or too little insurance, and both cases are
harmful and may place a costly burden and stress on the insurance buyer. Alternatively, people pay
a large amount to companies to be told how much insurance they might need. This is
not ideal, because the company or employee recommending the amount is working
for the insurance company, which often leads to people buying high-valuation insurance.

3.2 Limitations of the Existing System:

Sub-limit is an extra limit to a health insurance policy coverage placed on certain medical
expenses. These medical expenses are covered as a part of the original policy coverage
limit, that is the sum insured. Your insurance company can limit its liability under specific
covers by including sub-limits.

In simpler terms, a sub-limit is a monetary cap that your insurance service provider places
on your medical insurance claim for certain covers. They are usually expressed as a fixed
value for a particular illness/disease/treatment but can also be included as a percentage of
the total sum insured.

Sub-limits can be placed on hospital room rent, doctor’s consultation fee, ambulance
charges, and a few pre-planned medical procedures, such as cataract removal, knee
ligament reconstruction, etc.

CHAPTER 4

PROPOSED SYSTEM AND ITS IMPLEMENTATION

4.1 Introduction

We have applied machine learning techniques to medical insurance data. The medical
insurance cost dataset is obtained from the Kaggle repository, and we performed the data
preprocessing. After preprocessing, we select the features by performing feature engineering.
Then, the dataset is split into two parts, train and test datasets. Most of the data is used
for training, while the rest is used for testing. The training dataset is used to create a model that
predicts medical insurance costs for the year, while the test dataset is used to evaluate the
regression models. Before regression, the dataset is explored and categorical values are converted
to numerical values.
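
The preprocessing described above can be sketched as follows. The rows are hypothetical toy values that only mimic the column names of the Kaggle insurance dataset, and one-hot encoding of `region` with `pd.get_dummies` is shown as one possible alternative to integer codes, not as the method used in this project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy rows (assumption): same column names as insurance.csv.
toy = pd.DataFrame({
    "age": [19, 33, 45, 52, 27, 61, 38, 24],
    "sex": ["female", "male", "male", "female", "male", "female", "female", "male"],
    "bmi": [27.9, 22.7, 30.1, 35.2, 24.0, 29.5, 31.8, 26.3],
    "children": [0, 1, 2, 3, 0, 1, 2, 0],
    "smoker": ["yes", "no", "no", "yes", "no", "no", "yes", "no"],
    "region": ["southwest", "southeast", "northwest", "northeast",
               "southwest", "southeast", "northwest", "northeast"],
    "charges": [16884.9, 4449.5, 8240.6, 42303.7, 2800.1, 13500.0, 38000.4, 2500.2],
})

# Binary categories map naturally to 0/1.
toy["sex"] = toy["sex"].map({"female": 0, "male": 1})
toy["smoker"] = toy["smoker"].map({"no": 0, "yes": 1})

# Region has no natural order, so each region becomes its own 0/1 column.
encoded = pd.get_dummies(toy, columns=["region"], dtype=int)

# Separate the features from the target and hold out a quarter for testing.
X = encoded.drop(columns=["charges"])
y = encoded["charges"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```

Integer codes such as 1 to 4 imply an ordering between regions, which tree-based models tolerate but linear models may misread; one-hot columns avoid that assumption at the cost of more features.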

4.2 Implementation

Step 1: Data collection: This will involve collection of student feedback in the form of
structured data like grades, enrollment data, and progression rates, as well as unstructured data
like student opinions expressed through surveys, web blogs, Twitter, Facebook etc.

Step 2: Data Preprocessing: In this phase, the data is prepared for analysis so that it
contains only relevant information. Pre-processing and cleaning of data is one of the most
important tasks that must be done before a dataset can be used for machine learning.
Real-world data is noisy, incomplete and inconsistent, so it needs to be cleaned.

Step 3: Extraction of Feature Set/Training Data: The feature set or training data can be prepared
from the cleaned data by using any of the available techniques like bag of words,
N-gram, POS, TOS tagging etc. The training data can also be prepared by labelling the samples
and then dividing them into two classes, such as a positive class and a negative class. The feature
sets and training set obtained by using any of the above methods will be used for the
implementation of machine learning algorithms.

Step 4: Implementation of Machine Learning Algorithm on Feature Set/Training Data

• Classification:

To determine a label or category – it is either one thing or another. We train the
model using a set of labelled data. As an example, we want to predict if a person’s
mole is cancerous or not, so we create a model using a data set of mole scans from
1000 patients that a doctor has already examined to determine whether they show
cancer or not. We also feed the model other data such as a patient’s
age, gender, ethnicity, and place of residence. We then create a model which will enable
us to present a new mole scan and decide whether it depicts cancer or not.

• Regression:

A Regression model is created when we want to find out a number – for example how
many days before a patient discharged from hospital with a chronic condition such as
diabetes will return.

Step 5: Testing of Data: The trained model, built using a supervised learning algorithm,
is tested on the data. The total responses for every question are evaluated to
determine the polarity of the feedback received in the context of the given data.

4.3 System Architecture

4.4 Algorithms used

Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to predict new
output values. Machine learning algorithms are mathematical model mapping methods used to
learn or uncover underlying patterns embedded in the data. Machine learning comprises a
group of computational algorithms that can perform pattern recognition, classification, and
prediction on data by learning from existing data (training set).

We have tested and used the following machine learning algorithms:

• Linear Regression
• Support Vector Regression
• Random Forest Regression
• Gradient Boosting Regression

Logistic Regression

Logistic regression is a process of modeling the probability of a discrete outcome given an
input variable. The most common logistic regression models a binary outcome: something
that can take two values such as true/false, yes/no, and so on. Multinomial logistic regression
can model scenarios where there are more than two possible discrete outcomes. Logistic
regression is a useful analysis method for classification problems, where you are trying to
determine which category a new sample fits best. It is a simple and efficient method for
binary and linear classification problems. It is a classification model that is very easy to
implement and achieves very good performance with linearly separable classes. It is an extensively
employed algorithm for classification in industry. Because logistic regression predicts
discrete classes rather than a continuous cost, the code in Chapter 5 fits scikit-learn's
LinearRegression to the insurance charges.
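
The difference between predicting a class and predicting an amount can be illustrated with a small sketch. The data below is synthetic and the single "age" feature, the made-up cost formula and the `high_cost` label are assumptions for illustration only; they are not taken from the insurance dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
x = rng.uniform(18, 65, size=200).reshape(-1, 1)   # synthetic "age" feature

# Continuous target: a made-up linear cost with noise.
cost = 250.0 * x.ravel() + rng.normal(0, 500, size=200)
# Binary target: whether the made-up cost is above its median.
high_cost = (cost > np.median(cost)).astype(int)

reg = LinearRegression().fit(x, cost)                          # predicts an amount
clf = LogisticRegression(max_iter=1000).fit(x, high_cost)      # predicts a class

sample = np.array([[40.0]])
predicted_cost = reg.predict(sample)[0]
probability_high = clf.predict_proba(sample)[0, 1]
print("predicted cost:", predicted_cost)
print("P(high cost):", probability_high)
```

The regression model returns a value on the same scale as the target, whereas the logistic model returns a probability between 0 and 1 for the positive class.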

Support vector machine

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is
used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put a new data point in the
correct category in the future. This best decision boundary is called a hyperplane. SVM
chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases
are called support vectors, and hence the algorithm is termed Support Vector Machine.

SVM can be of two types:

➢ Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset
can be classified into two classes by using a single straight line, then such data is
termed linearly separable data, and the classifier used is called a Linear SVM
classifier.

➢ Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which
means if a dataset cannot be classified by using a straight line, then such data is
termed non-linear data and the classifier used is called a Non-linear SVM classifier.
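
For regression, scikit-learn provides SVR, the regression counterpart of the classifier described above, and it is the class imported in Chapter 5. SVR is often sensitive to the scale of both the features and the target, so the following sketch compares a plain SVR with a scaled one. The data is synthetic, and wrapping the model in StandardScaler and TransformedTargetRegressor is an assumption about good practice, not something taken from this project's code.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 2))
# Synthetic target in the thousands, similar in magnitude to insurance charges.
y = 1200.0 * X[:, 0] + 300.0 * X[:, 1] ** 2 + rng.normal(0, 50, size=300)

# SVR with default settings on the raw features and raw target.
plain = SVR().fit(X, y)

# Same SVR, but features and target are standardised before fitting;
# predictions are converted back to the original scale automatically.
scaled = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR()),
    transformer=StandardScaler(),
).fit(X, y)

plain_r2 = plain.score(X, y)
scaled_r2 = scaled.score(X, y)
print("R^2 without scaling:", plain_r2)
print("R^2 with scaling:", scaled_r2)
```

With the default `C` and `epsilon`, an unscaled target of this magnitude leaves the plain model close to a constant prediction, which is one possible reason for a low SVR score on raw insurance charges.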

Random Forest Regression

Random Forest Regression is a supervised learning algorithm that uses the ensemble
learning method for regression. Ensemble learning is a technique that combines
predictions from multiple machine learning models to make a more accurate prediction
than a single model.

A Random Forest Regression model is powerful and accurate. It usually performs well on
many problems, including those whose features have non-linear relationships. Its disadvantages
include the following: the model is hard to interpret, overfitting may occur, and we must
choose the number of trees to include in the model.
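
The idea of combining many trees can be checked directly with scikit-learn. The sketch below uses synthetic data (an assumption) and shows that the forest's prediction is the mean of the predictions of its individual trees, which are available through the `estimators_` attribute.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(200, 3))
y = 10.0 * X[:, 0] + 5.0 * X[:, 1] ** 2 + rng.normal(0, 0.5, size=200)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Each fitted tree is exposed in estimators_; average their predictions by hand.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X[:5])

print(manual_average)
print(forest_prediction)

# Impurity-based importances give a rough view of which features the trees used.
importances = forest.feature_importances_
print(importances)
```

The `feature_importances_` values sum to one and offer some insight into the model, although they do not explain individual predictions.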

Gradient Boosting

Gradient boosting is a popular boosting algorithm. In gradient boosting, each predictor corrects its
predecessor’s error. In contrast to AdaBoost, the weights of the training instances are not
tweaked; instead, each predictor is trained using the residual errors of its predecessor as labels.

Gradient Boosted Trees is a variant of this technique whose base learner is CART
(Classification and Regression Trees).
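
The residual-fitting idea can be written out by hand for squared error loss. This is a minimal sketch on synthetic data, not scikit-learn's GradientBoostingRegressor; the learning rate, tree depth and number of stages are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(200, 2))
y = 10.0 * X[:, 0] + 5.0 * np.sin(6.0 * X[:, 1]) + rng.normal(0, 0.3, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())          # stage 0: constant prediction
errors = [np.mean((y - prediction) ** 2)]       # training MSE after each stage

for _ in range(50):
    residual = y - prediction                   # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    prediction += learning_rate * tree.predict(X)   # add a shrunken correction
    errors.append(np.mean((y - prediction) ** 2))

print("training MSE at start:", errors[0])
print("training MSE after 50 stages:", errors[-1])
```

Each small tree is trained on the residuals of the current ensemble, so the training error cannot increase from one stage to the next; the learning rate controls how much of each correction is applied.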

4.5 Hardware and Software Specifications.

The Health Insurance Price Prediction is developed to operate within the following
environment: -

Hardware requirements

• Processor: Intel Core Duo 1.0 GHz or more
• RAM: 2 GB or more
• Hard disk: 80 GB or more
• Monitor: 15” CRT or LCD monitor
• Keyboard: Normal or Multimedia
• Mouse: Compatible mouse

Software requirements

• Jupyter notebook (anaconda3)
• Tkinter

CHAPTER 5

CODE
# Jupyter line magics: show the working directory, then move to the folder holding insurance.csv
%pwd

%cd desktop

import pandas as pd

data=pd.read_csv("insurance.csv")

data.head()

data.describe(include='all')

data['sex'].unique()

data['sex']=data['sex'].map({'female':0,'male':1})

data['smoker']=data['smoker'].map({'yes':1,'no':0})

data['region']=data['region'].map({'southwest':1,'southeast':2,

'northwest':3,'northeast':4})

X = data.drop(['charges'],axis=1)

y = data['charges']

from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)

from sklearn.linear_model import LinearRegression

from sklearn.svm import SVR

from sklearn.ensemble import RandomForestRegressor

from sklearn.ensemble import GradientBoostingRegressor

lr = LinearRegression()

lr.fit(X_train,y_train)

svm = SVR()

svm.fit(X_train,y_train)

rf = RandomForestRegressor()

rf.fit(X_train,y_train)

gr = GradientBoostingRegressor()

gr.fit(X_train,y_train)

y_pred1 = lr.predict(X_test)

y_pred2 = svm.predict(X_test)

y_pred3 = rf.predict(X_test)

y_pred4 = gr.predict(X_test)

df1 = pd.DataFrame({'Actual':y_test,'Lr':y_pred1,

'svm':y_pred2,'rf':y_pred3,'gr':y_pred4})

df1

import matplotlib.pyplot as plt

plt.subplot(221)

plt.plot(df1['Actual'].iloc[0:11],label='Actual')

plt.plot(df1['Lr'].iloc[0:11],label="Lr")

plt.legend()

plt.subplot(222)

plt.plot(df1['Actual'].iloc[0:11],label='Actual')

plt.plot(df1['svm'].iloc[0:11],label="svr")

plt.legend()

plt.subplot(223)

plt.plot(df1['Actual'].iloc[0:11],label='Actual')

plt.plot(df1['rf'].iloc[0:11],label="rf")

plt.legend()

plt.subplot(224)

plt.plot(df1['Actual'].iloc[0:11],label='Actual')

plt.plot(df1['gr'].iloc[0:11],label="gr")

plt.tight_layout()

plt.legend()

from sklearn import metrics

score1 = metrics.r2_score(y_test,y_pred1)

score2 = metrics.r2_score(y_test,y_pred2)

score3 = metrics.r2_score(y_test,y_pred3)

score4 = metrics.r2_score(y_test,y_pred4)

print(score1,score2,score3,score4)

s1 = metrics.mean_absolute_error(y_test,y_pred1)

s2 = metrics.mean_absolute_error(y_test,y_pred2)

s3 = metrics.mean_absolute_error(y_test,y_pred3)

s4 = metrics.mean_absolute_error(y_test,y_pred4)

print(s1,s2,s3,s4)

data = {'age' : 21,

'sex' : 1,

'bmi' : 19,

'children' : 0,

'smoker' : 0,

'region' : 1}

df = pd.DataFrame(data,index=[0])

df

new_pred = gr.predict(df)

print("Medical Insurance cost for New Customer is : ",new_pred[0])

gr = GradientBoostingRegressor()

gr.fit(X,y)

new_pred = gr.predict(df)

print("Medical Insurance cost for New Customer is : ",new_pred[0])

import joblib

joblib.dump(gr,'model_joblib_gr')

model = joblib.load('model_joblib_gr')

model.predict(df)

import tkinter as tk

from tkinter import Button, Entry, Label

import joblib

#------------

master =tk.Tk()

master.title("Insurance Cost Prediction")

Label(master,text = "Insurance Cost Predict", bg = "black",fg="white").grid(row=0,columnspan=2)

Label(master, text = "Enter Your Age").grid (row=1)

Label(master, text = "Male Or Female [1/0]").grid(row=2)

Label(master, text = "Enter Your BMI Value").grid (row=3)

Label(master, text = "Enter Number of Children").grid(row=4)

Label(master, text = "Smoker Yes/No [1/0]").grid(row=5)

Label(master, text = "Region [1-4]").grid(row=6)

#------------

e1 = Entry(master)

e2 = Entry(master)

e3 = Entry(master)

e4 = Entry(master)

e5 = Entry(master)

e6 = tk.Entry(master)

e1.grid(row=1,column=1)

e2.grid(row=2,column=1)

e3.grid(row=3,column=1)

e4.grid(row=4,column=1)

e5.grid(row=5,column=1)

e6.grid(row=6,column=1)

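def show_entry():
    # Read the six inputs in the same column order used for training:
    # age, sex, bmi, children, smoker, region.
    p1 = float(e1.get())
    p2 = float(e2.get())
    p3 = float(e3.get())
    p4 = float(e4.get())
    p5 = float(e5.get())
    p6 = float(e6.get())
    #------------
    # Load the saved Gradient Boosting model and predict the cost.
    model = joblib.load('model_joblib_gr')
    result = model.predict([[p1,p2,p3,p4,p5,p6]])
    Label(master, text ="Insurance Cost").grid(row=7)
    Label(master, text = result).grid(row=8)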

#------------

Button(master, text="Predict",command=show_entry).grid()

master.mainloop()
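
The comparison above relies on a single train/test split. As a hedged sketch, a k-fold cross-validated RMSE could complement it. The data here is a synthetic stand-in (an assumption) whose columns only play the roles of age, bmi and smoker, so the printed numbers say nothing about the real insurance dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
# Synthetic stand-in features (assumption): age, bmi and a 0/1 smoker flag.
age = rng.uniform(18, 65, size=300)
bmi = rng.uniform(18, 40, size=300)
smoker = rng.integers(0, 2, size=300)
X = np.column_stack([age, bmi, smoker])
y = 250.0 * age + 300.0 * bmi + 20000.0 * smoker + rng.normal(0, 2000, size=300)

models = {
    "lr": LinearRegression(),
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
    "gr": GradientBoostingRegressor(random_state=0),
}

rmse = {}
for name, model in models.items():
    # scikit-learn reports the negated RMSE so that higher is always better.
    scores = cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    rmse[name] = -scores.mean()
    print(name, round(rmse[name], 1))
```

Averaging over five folds makes the ranking of the models less dependent on which rows happen to land in the test set.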

CHAPTER 6
RESULT AND DISCUSSION

CHAPTER 7

FUTURE WORKS

Even though results are similar or better than previous works, we believe our results are still
improvable. One of the approaches to improve performance is classifying patients in cost
buckets as recommended by various studies; this strategy results in better performance but
escapes the goal of this work, so we can apply this classification process in future work to
obtain a patient’s risk class as first step to try to improve the performance of WEVREG.
Another approach could be the use of a nearness to death feature as it has been found that
costs rise sharply with it. It is impossible to know a patient’s near death status with our
current data. We could include census data to create the new feature, obtaining nearness to
death by age group taking into account the change in life expectancy depending on patients’
age. We can also try to solve the prediction of health care cost using deep learning methods,
but this purpose may become feasible with the availability of a larger dataset. Finally, we
also plan to apply this model to other regression problems in the health care domain, such as
predicting the hospital length of stay and predicting the days of readmission based on each
patient’s diagnosis and history, which are two classic prediction problems in this domain.

CHAPTER 8

CONCLUSION

In this report, a model for HEALTH INSURANCE PRICE PREDICTION is proposed. The
model which we have developed has been successfully implemented by applying the
knowledge gained in classrooms, by referring to books, with the help of faculty and colleagues,
and by applying our own knowledge of the subject.
The project report entitled "HEALTH INSURANCE PRICE PREDICTION" has come to
its conclusion. The new system has been developed with great care so that it is free of errors
and at the same time efficient and less time consuming. The system is robust, and provision is
made for future developments in the system. Several user-friendly coding practices have also
been adopted.

We described the background and context of the project and its relation to work already
done in the area, and stated the aims and objectives of the project along with its
purpose, scope, and applicability. We defined the problem on which we are working in the
project. We described the requirement specifications of the system and the actions that can be
performed on it. We studied the problem domain and produced a model of the system,
which describes the operations that can be performed on the system. We included features and
operations in detail, including screen layouts. We designed the user interface and considered security issues
related to the system. Finally, the system was implemented and tested according to test cases.

CHAPTER 9

REFERENCES

[1] WHO . Public Spending on Health: A Closer Look at Global Trends. World Health
Organization; Geneva, Switzerland: 2018. Technical Report.

[2] Prostate Cancer Probability Prediction By Machine Learning Technique. Jović, Srđan;
Miljković, Milica; Ivanović, Miljan; Šaranović, Milena; Arsić, Milena

[3] Recent Advances in Predictive (Machine) Learning Friedman, J

[4] Automatically explaining machine learning prediction results: a demonstration on type 2
diabetes risk prediction.

[5] Garber A.M., Skinner J. Is American health care uniquely inefficient? J. Econ.
Perspect. 2008;22:27–50. doi: 10.1257/jep.22.4.27.

[6] Yoo I., Alafaireet P., Marinov M., Pena-Hernandez K., Gopidi R., Chang J.F., Hua L.
Data mining in healthcare and biomedicine: A survey of the literature. J. Med.
Syst. 2012;36:2431–2448.doi: 10.1007/s10916-011-9710-5.

[7] Bilger M., Manning W.G. Measuring overfitting in nonlinear models: A new method and
an application to health expenditures. Health Econ. 2015;24:75–85. doi: 10.1002/hec.3003.

[8] Diehr, P.; Yanez, D.; Ash, A.; Hornbrook, M.; Lin, D. Methods for analyzing health care
utilization and costs. Ann. Rev. Public Health 1999

[9] Petit-Renaud, S.; Denoeux, T. Nonparametric regression analysis of uncertain and
imprecise data using belief functions. Int. J. Approx. Reason. 2004

[10] Zhou, J.; Troyanskaya, O.G. Predicting effects of noncoding variants with deep
learning–based sequence model. Nat. Methods 2015

[11] Ribeiro, M.T.; Singh, S.; Guestrin, C. Why should i trust you?: Explaining the
predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data mining, San Francisco, CA, USA,
13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 1135–1144.

[12] Morid M.A., Kawamoto K., Ault T., Dorius J., Abdelrahman S. AMIA Annual
Symposium Proceedings. Volume 2017. American Medical Informatics Association;
Bethesda, MD, USA: 2017. Supervised Learning Methods for Predicting Healthcare Costs:
Systematic Literature Review and Empirical Evaluation; p. 1312.

[13] Elith J., Leathwick J.R., Hastie T. A working guide to boosted regression
trees. J. Anim. Ecol. 2008;77:802–813. doi: 10.1111/j.1365-2656.2008.01390.x.
