World Happiness Report
1 Happiness Report
1.1 Introduction
This notebook is motivated by the wish to understand how countries differ with respect to a few
factors, and to learn from the nations that have managed to foster happiness among their citizens.
What can we infer about happiness when economy, generosity, or freedom is taken into account?
Can linear regression be used to predict the Happiness Score?
The data covers the World Happiness Report for the years 2015 to 2019 and is based on the World
Happiness Report dataset on Kaggle (originally found here). The .csv files and the “UCSD” module
created for this notebook can be found here.
[67]: import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
import ucsd #module created in order to shorten the Jupyter Notebook
from math import sqrt
%matplotlib inline
[69]: #Creating columns for those dataframes that do not have Year, Region and/or Colormap columns
data2015['Year'] = 2015
data2016['Year'] = 2016
data2017['Year'] = 2017
data2018['Year'] = 2018
data2019['Year'] = 2019
data2017['Region'] = 0
data2018['Region'] = 0
data2019['Region'] = 0
remove_column(data2015,commonParameters)
remove_column(data2016,commonParameters)
remove_column(data2017,commonParameters)
remove_column(data2018,commonParameters)
remove_column(data2019,commonParameters)
Example:
target_dataframe.loc[target_dataframe['Country'] == 'Nepal', 'Region'] = 'Southern Asia'
assigns ‘Southern Asia’ to the Region field where the Country == ‘Nepal’ condition is met
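As a minimal sketch of the same pattern applied to several countries at once, the snippet below uses a toy dataframe and a small lookup dictionary. Both `region_by_country` and the toy data are assumptions made for illustration; the actual mapping used by `ucsd.setRegionToDataFrame` is not shown in this notebook.

```python
import pandas as pd

# Toy frame standing in for data2017 (illustrative rows, not the real dataset)
target_dataframe = pd.DataFrame({
    'Country': ['Nepal', 'Norway', 'Brazil'],
    'Region': [0, 0, 0],  # placeholder value, as set in the cell above
})

# Hypothetical lookup table; the real one lives in the ucsd module
region_by_country = {
    'Nepal': 'Southern Asia',
    'Norway': 'Western Europe',
}

# Cast to object first so that strings can replace the integer placeholder
target_dataframe['Region'] = target_dataframe['Region'].astype(object)

# Boolean mask selects the matching rows, 'Region' selects the column to set
for country, region in region_by_country.items():
    target_dataframe.loc[target_dataframe['Country'] == country, 'Region'] = region

print(target_dataframe)
```

Countries missing from the lookup (Brazil in this toy example) keep the placeholder 0, which makes unmapped rows easy to spot.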
[72]: ucsd.setRegionToDataFrame(data2017)
ucsd.setRegionToDataFrame(data2018)
ucsd.setRegionToDataFrame(data2019)
1.4.1 Defining plot function
The scatterplot function takes several parameters so that the plot is displayed properly.
Sometimes the dots are too small to be seen, so the factors slope and exponent scale the dot size,
which lets a chosen variable act as a sort of third dimension.
For example, suppose we want to see how Economy (GDP per Capita) behaves on the
Generosity x Happiness Score plot. For the year 2015, Generosity does not appear to directly
influence the Happiness Score and, although it is an outlier here, the most generous country
(Myanmar) has a low Happiness Score. On the other hand, countries with lower GDP per capita
clearly occupy lower positions on the plot. The bigger the dot, the higher the GDP per capita.
The dot size (matplotlib's s argument, which sets the marker area) is multiplied by a factor of 40
for the sake of visualization.
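To make the role of slope and exponent concrete, here is a small sketch of the size computation on its own, without plotting. The helper name `marker_sizes` and the input values are assumptions for illustration only; they are not part of the notebook or the dataset.

```python
import numpy as np

def marker_sizes(values, slope=1, exponent=1):
    """Return marker sizes computed as slope * values ** exponent."""
    return slope * np.asarray(values, dtype=float) ** exponent

gdp = [0.25, 0.5, 1.0]  # illustrative values, not taken from the dataset

linear = marker_sizes(gdp, slope=40)                # scales every dot by 40
squared = marker_sizes(gdp, slope=40, exponent=2)   # exaggerates differences

print(linear)   # [10. 20. 40.]
print(squared)  # [ 2.5 10.  40. ]
```

With exponent=1 the sizes stay proportional to the variable; a larger exponent widens the gap between small and large values, which can help when the dots look too similar.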
[73]: def scatterplot(dataframe, x, y, sizeVariable, slope=1, exponent=1, xmax=1.8, ymax=8):
    # Draw one scatter series per region so that each region gets its own color
    title = str(dataframe['Year'].iloc[0])
    for region in set(dataframe['Region']):
        subset = dataframe.loc[dataframe['Region'] == region]
        if sizeVariable != 'None':
            # Dot size follows the chosen variable: slope * value ** exponent
            sizes = slope * subset[sizeVariable] ** exponent
            plt.scatter(subset[x], subset[y], s=sizes, label=region, alpha=0.7)
        else:
            # No size variable: use the default dot size
            plt.scatter(subset[x], subset[y], label=region, alpha=0.7)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.title(title)
    plt.legend(loc=(1.05, 0))
    plt.axis([0, xmax, 0, ymax])
    plt.show()
[74]: #Plot 1
linearfactor = 40
exponentialfactor = 1
xmax = 1
scatterplot(data2015, 'Generosity', 'Happiness Score', 'Economy (GDP per Capita)', linearfactor, exponentialfactor, xmax)
Economy (GDP per Capita) and Freedom (Plots 2 and 3) appear to have more influence on the final
Happiness Score than Generosity does.
For the subsequent years, the corresponding plots behave similarly.
[75]: #Plot 2
linearfactor = 1
exponentialfactor = 1
xmax = 1.8
scatterplot(data2015, 'Economy (GDP per Capita)', 'Happiness Score', 'None', linearfactor, exponentialfactor, xmax)
[76]: #Plot 3
linearfactor = 1
exponentialfactor = 1
xmax = 0.7
scatterplot(data2015, 'Freedom', 'Happiness Score', 'None', linearfactor, exponentialfactor, xmax)
1.5 Multi Linear Regression for Happiness Score Prediction
[77]: #importing the Machine Learning libraries
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
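The imports above suggest a split, fit, predict, evaluate workflow. The sketch below illustrates that workflow on synthetic data only; the feature matrix, the coefficients, the 80/20 split, and the random seed are all assumptions made for illustration and do not reproduce the notebook's actual cells or results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic stand-ins for three predictors (e.g. economy, freedom, generosity)
X = rng.uniform(0, 1, size=(200, 3))
# Synthetic target: a known linear combination of the predictors plus noise
y = 3.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.1, size=200)

# Hold out 20% of the rows for testing (assumed split, not the notebook's)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one coefficient per predictor plus an intercept
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on rows the model has not seen
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("MAE:", mae)
```

On real data, X would hold the selected predictor columns of the combined dataframe and y the Happiness Score column.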
[85]: y_test.describe()
1.5.3 Root Mean Squared Error
[86]: 0.567465700011897
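The root mean squared error is the square root of the mean of the squared prediction errors, so it is expressed in the same units as the Happiness Score. The sketch below shows the computation on made-up numbers, both with scikit-learn and by hand; the arrays `y_true` and `y_hat` are illustrative assumptions, not the notebook's test data.

```python
from math import sqrt

import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up scores and predictions, for illustration only
y_true = np.array([5.0, 6.0, 7.0, 4.0])
y_hat = np.array([5.5, 5.5, 7.0, 5.0])

# RMSE via scikit-learn's mean squared error
rmse = sqrt(mean_squared_error(y_true, y_hat))

# The same quantity computed by hand
rmse_manual = sqrt(np.mean((y_true - y_hat) ** 2))

print(rmse, rmse_manual)
```

Comparing the RMSE with the spread of the test targets (for instance the standard deviation reported by y_test.describe()) gives a rough sense of how large the error is relative to the variation in the scores.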