DSML Project Report - Group05

This document provides a report on analyzing factors that affect life expectancy using data from 193 countries between 2000 and 2015. It discusses the inspiration and business understanding behind the project, describes the dataset and variables, and outlines the data processing steps including handling missing data and encoding categories. Regression models like Ridge, Lasso, and Elastic Net were applied and compared to identify significant factors influencing life expectancy. Key factors found to have effects include development status, HIV/AIDS prevalence, alcohol consumption, GDP, adult mortality, and BMI. Visualizations are also included to aid in conclusions.

Uploaded by

deepak raj


DSML Project Report

Life Expectancy
Data Analysis

Submitted By: Group - 05

About the Dataset
Inspiration
Data Understanding
Processing Data
Ridge Regression
Lasso Regression
Elnet Regression
Data Visualization
Conclusion

Introduction/Business Understanding

Many past studies have examined the factors affecting life expectancy using demographic variables, income composition and mortality rates. However, the effects of immunization and the human development index were not considered in that work. In addition, some of the past research applied multiple linear regression to a data set covering only one year for all countries. This motivates addressing both gaps by formulating a regression model, based on a mixed effects model and multiple linear regression, that uses data from 2000 to 2015 for all countries. Important immunizations such as Hepatitis B, Polio and Diphtheria will also be considered. In a nutshell, this study focuses on immunization factors, mortality factors, economic factors, social factors and other health-related factors. Since the observations in this dataset are organized by country, it will be easier for a country to determine which predicting factor is contributing to a lower life expectancy. This will help suggest which area a country should prioritize in order to efficiently improve the life expectancy of its population.

About the Dataset

The project relies on the accuracy of its data. The Global Health Observatory (GHO) data repository under the World Health Organization (WHO) keeps track of health status and many other related factors for all countries. The datasets are made available to the public for the purpose of health data analysis. The dataset on life expectancy and health factors for 193 countries was collected from the WHO data repository website, and the corresponding economic data was collected from the United Nations website. Among all categories of health-related factors, only the most representative critical factors were chosen. It has been observed that in the past 15 years there has been huge development in the health sector, resulting in improved human mortality rates, especially in developing nations, in comparison to the past 30 years. Therefore, in this project we have considered data from 2000 to 2015 for 193 countries for further analysis.

The individual data files were merged into a single dataset. Initial visual inspection of the data showed some missing values that needed to be identified. As the datasets were from WHO, we found no evident errors.

Inspiration
The analysis aims to answer the following key questions:

• Do the predicting factors chosen initially really affect life expectancy? Which predicting variables actually affect it?

• Should a country with a lower life expectancy (<65) increase its healthcare expenditure to improve its average lifespan?

• How do infant and adult mortality rates affect life expectancy?

• Does life expectancy have a positive or negative correlation with eating habits, lifestyle, exercise, smoking, drinking alcohol, etc.?

• What is the impact of schooling on the lifespan of humans?

• Does life expectancy have a positive or negative relationship with drinking alcohol?

• Do densely populated countries tend to have lower life expectancy?

• What is the impact of immunization coverage on life expectancy?

For all the analysis in this report, we use the CRISP-DM methodology. CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It is an industry-proven method that guides the data mining process. The model consists of six phases that systematically describe the data mining process and its implementation: business understanding, data understanding, data preparation, modelling, evaluation, and deployment.

Data Understanding
Variables

To prepare the data for modelling, the dataset was loaded into a Python workspace in Jupyter, and the features of the dataset were inspected using the ‘info’ command in Python Pandas. The data consisted of 19 columns, and each column had 1649 entries. All columns were of a numerical data type, either int or float; only the Customer ID column was of object data type.
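The inspection step described above can be sketched as follows. This is a minimal illustration and not the report's actual notebook: the small DataFrame and its column names are hypothetical stand-ins for the real dataset.

```python
import pandas as pd

# Hypothetical stand-in rows; the real dataset has 19 columns and 1649 rows
# and would normally be read from a file.
df = pd.DataFrame(
    {
        "Year": [2015, 2014, 2015, 2014],
        "Status": ["Developing", "Developing", "Developed", "Developed"],
        "Life expectancy": [65.0, 59.9, 81.1, 80.7],
        "Adult Mortality": [263.0, 271.0, 59.0, 61.0],
    }
)

# info() prints the row count plus each column's non-null count and dtype.
df.info()

# Columns that are not int or float need encoding before regression.
numeric_columns = df.select_dtypes(include="number").columns
non_numeric = [col for col in df.columns if col not in numeric_columns]
print(non_numeric)
```

Listing the non-numeric columns this way gives a quick check of which variables must be encoded before a regression model can use them.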

Processing Data
Since the dataset used for modelling contained no missing values, no missing-value handling was needed. The only variable which needed to be encoded was Status.
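A missing-value check and an encoding of Status might look like the sketch below. The report does not state which encoding was used; dummy (one-hot) encoding with the first level dropped is assumed here because it yields a single "Status Developing" indicator, matching the term in the final equation. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample; only the Status column comes from the report.
df = pd.DataFrame(
    {
        "Status": ["Developing", "Developed", "Developing", "Developed"],
        "Alcohol": [0.01, 9.5, 2.3, 11.2],
    }
)

# Count missing values per column; all zeros means nothing to impute.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Assumed encoding: one 0/1 indicator column, with "Developed" as the
# dropped baseline level (levels are sorted alphabetically).
encoded = pd.get_dummies(df, columns=["Status"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
```

Dropping the baseline level avoids a redundant column that would be perfectly collinear with the intercept in a linear regression.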

We trained the model on 70% of the dataset and tested it on the remaining 30%,
splitting with random_state = 42.
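A 70/30 split with a fixed seed can be sketched as below. The use of scikit-learn's `train_test_split` is an assumption (the report only names the `random_state` value), and the feature matrix is random placeholder data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the predictors and life expectancy.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# test_size=0.30 holds out 30% of the rows; random_state=42 makes the
# shuffle reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)
print(X_train.shape, X_test.shape)
```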

After running the regression, the output is as follows:

The significant variables for this model are:

Columns to be removed because of large VIF (variance inflation factor) values:
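How such a VIF screen can be computed is sketched here with plain NumPy. For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. The function name `vif` and the synthetic data are illustrative, and the report does not state the cutoff it used; 5 or 10 are common rules of thumb.

```python
import numpy as np


def vif(X):
    """Return the variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    factors = []
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r_squared))
    return factors


# Synthetic predictors: c is nearly a copy of a, b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
values = vif(np.column_stack([a, b, c]))
print([round(v, 1) for v in values])
```

In this toy data the two nearly identical columns receive very large VIF values while the independent one stays close to 1, which is the pattern that would flag a column for removal.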

Significant variables after running the regression again:

Ridge Regression

Lasso Regression

Elnet Regression
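The three regularized models named in these headings could be fitted and compared along the lines of the sketch below. It assumes scikit-learn's `Ridge`, `Lasso` and `ElasticNet` estimators; the penalty strengths and the synthetic data are placeholders, not the report's settings or results.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression problem standing in for the life expectancy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = X @ true_coef + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Placeholder hyperparameters; in practice they would be tuned.
models = {
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Elnet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on held-out data

for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.4f}")
```

Comparing the held-out R^2 of each model is one simple way to decide which of the three to keep.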

Data Visualization

Conclusion
Ridge and Elnet regression performed better than Lasso regression. Therefore,
we can retain those two models and reject Lasso regression. Based on the regression outputs, the
variables affecting Life Expectancy are Status Developing, HIV/AIDS, Alcohol, GDP, Adult
Mortality and BMI.

Life Expectancy = 70.2734 - 2.1297*Status Developing - 0.4423*HIV/AIDS - 0.0249*Adult Mortality + 0.2610*Alcohol + 0.1208*BMI + 0.0001*GDP
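As a worked example, the fitted equation above can be evaluated for a single country. The coefficients are copied from the equation; the input values are made up for illustration, and treating Status Developing as a 0/1 indicator is an assumption about how the variable was encoded.

```python
# Coefficients copied from the fitted equation in the Conclusion.
INTERCEPT = 70.2734
COEFFICIENTS = {
    "status_developing": -2.1297,  # assumed: 1 if developing, 0 if developed
    "hiv_aids": -0.4423,
    "adult_mortality": -0.0249,
    "alcohol": 0.2610,
    "bmi": 0.1208,
    "gdp": 0.0001,
}


def predict_life_expectancy(features):
    """Apply the linear equation to a dict of predictor values."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * features[name] for name in COEFFICIENTS
    )


# Hypothetical inputs, not taken from the dataset.
example_country = {
    "status_developing": 1,
    "hiv_aids": 0.1,
    "adult_mortality": 150,
    "alcohol": 4.0,
    "bmi": 25.0,
    "gdp": 5000,
}

prediction = predict_life_expectancy(example_country)
print(round(prediction, 2))
```

With these made-up inputs the equation gives roughly 68.9 years; setting the status indicator to 0 instead raises the prediction by 2.1297 years, which is how the Status Developing coefficient should be read.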
