Econometrics Project Group 06 Tut 03
ECONOMETRICS PROJECT
FACTORS AFFECTING EARNINGS OF HANU GRADUATED
STUDENTS
TABLE OF CONTENTS
Abstract
I. INTRODUCTION
II. LITERATURE REVIEW
III. METHODOLOGY
1. Data Collection
2. Model and Research Methodology
3. Expectations
IV. ANALYSIS OF DATA AND RESULTS
1. Testing the functional form of the regression model
1.1 The Lin-Lin model
1.2 The Log-Lin model
1.3 Conclusion
2. Descriptive statistics
3. Testing individual partial coefficients
4. Testing the overall significance of all coefficients
V. CHECKING ERRORS
1. Multicollinearity
1.1 The nature of multicollinearity
1.2 Detection of multicollinearity
1.3 Remedial measures
2. Heteroskedasticity
2.1 The nature of heteroskedasticity
2.2 Detection of heteroskedasticity
2.3 Remedial measures
3. Autocorrelation
3.1 The nature of autocorrelation
3.2 Detection of autocorrelation
3.3 Remedial measures
VII. CONCLUSION, LIMITATION AND RECOMMENDATIONS
1. Conclusion
2. Limitation
3. Recommendations
References
TABLE OF FIGURES
Figure 1 The Lin-Lin model
Figure 2 The Log-Lin Model
Figure 3 Descriptive statistics
Figure 4 Coefficient covariance matrix
Figure 5 Correlation
Figure 6 The lin-lin model for F test
Figure 7 Variance Inflation Factors
Figure 8 The lin-lin model for White Heteroskedasticity Test
Figure 9 The log-lin model for White Heteroskedasticity Test
Figure 10 The lin-lin model for Durbin-Watson test
Figure 11 The lin-lin model for Breusch-Godfrey Test
Figure 12 The new log-lin model for Durbin-Watson test
Abstract
In recent years, Vietnam's rapid integration into the global economy has been marked by
significant strides in national campaigns and participation in global economic organizations.
Within this context, human capital has emerged as a critical driver for Vietnam's long-term
development, with university students playing a pivotal role in the domestic labor market (Jades,
2015). Hanoi University (HANU), recognized as one of Vietnam's premier institutions with a
global outlook, takes pride in producing well-trained graduates who contribute high-quality
human resources to the national economy. In addition to specialized knowledge, HANU students
acquire a myriad of practical skills, including financial management, which is paramount given
the complexities of balancing expenditures. Recognizing the importance of equipping graduates
with the skills necessary for financial success, we have chosen to delve into the project "Factors
Affecting Earnings of HANU Graduated Students" aiming to shed light on the determinants of
their post-graduation income levels.
I. INTRODUCTION
Over the past few years, Vietnam's rapid integration into the global economy has been marked by
significant strides in national campaigns and participation in global economic organizations.
Within this context, human capital has emerged as a critical driver for Vietnam's long-term
development, with university students playing a pivotal role in the domestic labor market (Jades,
2015). Hanoi University (HANU), recognized as one of Vietnam's premier institutions with a
global outlook, takes pride in producing well-trained graduates who contribute high-quality human
resources to the national economy. In addition to specialized knowledge, HANU students acquire
a myriad of practical skills, including financial management, which is paramount given the
complexities of balancing expenditures. Recognizing the importance of equipping graduates with
the skills necessary for financial success, we have chosen to delve into the project "Factors
Affecting Earnings of HANU Graduated Students," aiming to shed light on the determinants of
their post-graduation income levels.
This study employs an online survey distributed to students at Hanoi University, with 100
observations, aiming to provide a comprehensive understanding of the major factors influencing
the earnings of HANU students. The objective is to facilitate the development of a strategic
expenditure plan based on the insights and statistics generated from the project. The central
problem addressed is: What are the factors affecting the earnings of HANU graduated students?
This inquiry arises from genuine concern regarding various factors such as age, personality traits,
and knowledge. We have narrowed our focus to six variables: language proficiency, income,
gender, experience, education, and age. Among these, age, experience, and income are quantitative
variables, while the rest are qualitative. The five explanatory variables are anticipated to
significantly influence the income of HANU graduates. To address this, we seek to answer several questions:
● How do these factors affect income?
● Is there any correlation between these factors?
● Are there any errors in the model? If so, what is the remedy?
Education, income, and experience are critical elements of the market economy and, as predicted,
the results show that there is also a close relationship between these factors.
II. LITERATURE REVIEW
The topic of "Factors Affecting Earnings of Graduated Students" has been extensively analyzed in
various research articles. Beginning with George Psacharopoulos and Harry Anthony Patrinos
(2004) in their research paper titled "Returns to Investment in Education: A Further Update," a
consistent positive correlation between higher levels of education and increased income among
graduate students has been demonstrated. Additionally, this study underscores the significance of
education beyond merely imparting specialized knowledge, highlighting its role in fostering
critical thinking skills and adaptability, which are highly sought after in the labor market.
Furthermore, numerous empirical analyses have explored various facets of this topic, each
focusing on different factors for examination. As a result, our group aims to conduct an
independent research study to corroborate findings from the empirical research we have reviewed.
A seminal work by Chiswick and Miller in 1995, titled "Language Skills and Earnings: Evidence
from Childhood Immigrants," sheds considerable light on the profound impact of language
proficiency on the income trajectories of immigrants from their formative years. Their research
underscores the pivotal role that language abilities, particularly proficiency in English, play in
shaping individuals' employment prospects and career advancement opportunities. Through
meticulous analysis, Chiswick and Miller elucidate how a strong command of language, especially
in English as a lingua franca, significantly enhances immigrants' capacity to navigate the job
market effectively. Moreover, their findings suggest that language proficiency acts as a catalyst
for upward mobility, facilitating access to higher-paying jobs and fostering greater financial
stability over time.
The influence of experience, encompassing both academic and professional realms, is crucial in
determining the income paths of graduate students. Research conducted by John P. Conley and Ali
Sina Önder in 2011, titled "The Effect of Education and Experience on Self-Employment Success,"
highlights the significance of practical experience in transforming academic knowledge into
tangible skills highly sought after by employers. Engaging in internships, research endeavors, and
part-time employment during graduate studies not only fosters skill enhancement but also creates
networking prospects that may result in more lucrative job opportunities upon graduation.
In summary, our study on "Factors Affecting Earnings of HANU Graduated Students" selects six
key variables—language proficiency, income, gender, experience, education, and age—based on
theoretical foundations and empirical findings. These variables are crucial in shaping the earning
trajectories of HANU graduates and will be thoroughly investigated to provide insights into their
financial outcomes.
III. METHODOLOGY
1. Data Collection
This research aims to investigate the correlation between income and various factors influencing
the earnings of graduates from HANU. To gather data for our study, we conducted an online survey
among HANU graduated students. The survey included multiple-choice questions and simplified
short-answer formats for ease of response. With 100 randomly selected participants, our study
provides insights into the pivotal factors affecting the income levels of HANU graduates. The
survey questionnaire comprises six key inquiries regarding personal income determinants.
● What is your gender?
● What is your age?
● How many years of experience do you have in your current job?
● What is your highest level of education?
● What is your English level? (or other languages which help you the most in your job)
● What is your monthly income?
Drawing on the empirical research outlined in Section II, we focus on the impact of key factors,
namely gender, age, level of education, years of professional experience, and language
proficiency, on individuals' income levels.
To represent income, the selected dependent variable will be INC, which will be regressed as the
logarithm of INC in the model. The decision to use the logarithm of INC was made due to its more
normal distribution compared to just INC, making it better suited for data analysis.
The variables controlled for in this analysis are:
● GEN (GEN = 1 if gender = “male” or =0 if gender = “female”)
● AGE
● EDU (level of education)
● EXPE (years of experience in the current job)
● LANG ( English level or other languages which help you the most in your job)
● INC (monthly income)
The regression model is:
LOG(INC) = β1 + β2AGE + β3EDU + β4EXPE + β5LANG + β6GEN + u
where β1 is the intercept or constant of the model, and β2, β3, β4, β5, and β6 are the coefficients
that quantify the impact of AGE, EDU, EXPE, LANG, and GEN, respectively. EDU is coded PhD = 3,
MD = 2, and Bachelor = 1; LANG is coded B1 = 1, B2 = 2, C1 = 3, and C2 = 4; GEN = 1 if gender is
"male" and 0 if gender is "female"; and u is the stochastic error term.
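The coding scheme above can be sketched in Python as follows. This is only an illustration: the field names in the raw answer and the helper `encode_response` are hypothetical, and the example respondent is invented; only the numeric codes come from the text.

```python
# Numeric codes taken from the variable definitions in the text.
EDU_CODES = {"Bachelor": 1, "MD": 2, "PhD": 3}
LANG_CODES = {"B1": 1, "B2": 2, "C1": 3, "C2": 4}


def encode_response(response):
    """Convert one raw survey answer (hypothetical field names) to model variables."""
    return {
        "GEN": 1 if response["gender"] == "male" else 0,
        "AGE": float(response["age"]),
        "EDU": EDU_CODES[response["education"]],
        "EXPE": float(response["experience_years"]),
        "LANG": LANG_CODES[response["language_level"]],
        "INC": float(response["monthly_income"]),
    }


# Hypothetical respondent, for illustration only (not taken from the survey).
example = {
    "gender": "female",
    "age": 24,
    "education": "Bachelor",
    "experience_years": 2,
    "language_level": "C1",
    "monthly_income": 12.0,
}
row = encode_response(example)
print(row)
```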
Our project paper will be conducted in four stages:
● Step 1: We begin by examining the provided data to determine the dependent variable and
the explanatory variables. This process enables us to construct the regression model
outlined above.
● Step 2: Following the analysis using Eviews software, the most suitable model was
identified and subsequently employed for further testing. Testing for functional form was
conducted, examining four models: lin-lin, lin-log, log-lin, and log-log. Upon evaluation,
it was determined that the lin-lin model was the most appropriate form, as the logarithm of
the gender variable did not exhibit synchronization with the smallest calculated CV across
these models.
● Step 3: Various tests are applied, including the t-test to assess individual partial
coefficients, the F-test to evaluate the overall significance of all coefficients, and additional
tests for adding or dropping variables. These tests aim to ascertain the extent to which each
independent variable influences the income of HANU graduates.
● Step 4: Econometric models commonly encounter challenges such as autocorrelation,
multicollinearity, and heteroscedasticity. This step involves an examination of these issues,
including methods for detection, testing, and mitigation. The Ordinary Least Squares
(OLS) method relies on specific assumptions to generate unbiased, consistent, and efficient
estimates with lower variance compared to other methods. We will address these issues in
sequential order, beginning with multicollinearity, followed by heteroskedasticity, and
concluding with autocorrelation.
3. Expectations
The "a priori" expectation refers to the anticipated relationship that each exogenous variable is
expected to exhibit with the endogenous variable.
β2 is positive: Students' income will rise as they get older since they have more experience in their
field of study.
β3 is positive: The longer individuals invest in their education and the higher their attained degree,
the greater their prospects of earning a higher income.
β4 is positive: When students accumulate more experience in their chosen field of work, they may
have the opportunity to pursue careers as professors within that field, leading to an increase in
income.
β5 is positive: When students possess advanced proficiency in English or other languages that are
highly beneficial in their field of work, they have greater opportunities for career development,
ultimately resulting in an increase in income.
IV. ANALYSIS OF DATA AND RESULTS
1. Testing the functional form of the regression model
To determine the most suitable form of the model, we use R^2 as the criterion because R^2
represents the "goodness of fit" of the model, that is, how well the model fits the data. The
higher the R^2, the more suitable the form of the regression model for analyzing our theory. After
the data were collected, we obtained the results displayed below.
Figure 2 : The Log-Lin Model
1.3 Conclusion
Eviews tables indicate that the Lin-lin model has a lower R^2 than the Log-lin model (0.663965 <
0.703465). As mentioned above, we choose the model with the higher R^2 and conclude that the
Log-lin model is the most suitable model to fit the data and to interpret the relationship between
the dependent and independent variables. From now on, we use the estimated regression model:
LOG(INC) = -1.67873 + 0.164055*AGE + 0.085026*EDU - 0.049839*EXPE + 0.123902*LANG
+ 0.010674*GEN + u
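As an illustration of how the estimated equation is used, the sketch below evaluates it for one hypothetical graduate. The coefficients are those reported above; the profile (age 25, Bachelor, 2 years of experience, level B2, male) and the function name `predict_log_inc` are assumptions made for the example. Exponentiating the fitted value gives only a rough point prediction in the units of INC used in the survey, because it ignores the error term.

```python
import math

# Coefficients of the estimated log-lin equation reported in the text.
COEF = {
    "const": -1.67873,
    "AGE": 0.164055,
    "EDU": 0.085026,
    "EXPE": -0.049839,
    "LANG": 0.123902,
    "GEN": 0.010674,
}


def predict_log_inc(age, edu, expe, lang, gen):
    """Fitted value of LOG(INC) with the error term u set to zero."""
    return (
        COEF["const"]
        + COEF["AGE"] * age
        + COEF["EDU"] * edu
        + COEF["EXPE"] * expe
        + COEF["LANG"] * lang
        + COEF["GEN"] * gen
    )


# Hypothetical graduate: 25 years old, Bachelor (1), 2 years of experience,
# language level B2 (2), male (1).
log_inc = predict_log_inc(age=25, edu=1, expe=2, lang=2, gen=1)
inc = math.exp(log_inc)  # back-transform from LOG(INC) to INC
print(round(log_inc, 4), round(inc, 2))
```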
2. Descriptive statistics
All of the data for this research were gathered from the survey and cover a sample of 100
people. The variables are INC, AGE, EDU, EXPE, LANG, and GEN. The table below gives an
overview of descriptive statistics for all variables in the model.
Figure 3: Descriptive Statistics
Figure 5: Correlation
The following graphs show the relationship between Income and other factors, which are age,
education (level of education), experience (years of experience in the current job), language
(English level or other languages which help you the most in your job) and gender.
3. Testing individual partial coefficients
To test hypotheses about individual partial regression coefficients, we employ the t-test. Holding
the other variables fixed, these tests determine whether each independent variable is significant.
● Hypothesis:
H0: βi = 0 (i = 2, 3, 4, 5, 6)
Ha: βi ≠ 0
● Test statistic: t = (estimate of βi) / (standard error of the estimate)
At the 5% level of significance, the coefficients on age and language level differ
significantly from 0. In contrast, the coefficients on level of education, years of work
experience, and gender do not.
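The decision rule of the t-test can be sketched as follows. The standard error used here is hypothetical, since the Eviews output table is not reproduced in this text, and the critical value 1.986 is an approximate two-sided 5% value for 94 degrees of freedom (n - k = 100 - 6) read from a t table; only the AGE coefficient comes from the estimated model.

```python
def t_statistic(coef, std_err):
    """t statistic for H0: beta_i = 0."""
    return coef / std_err


def is_significant(coef, std_err, t_critical):
    """Two-sided test: reject H0 when |t| exceeds the critical value."""
    return abs(t_statistic(coef, std_err)) > t_critical


# Approximate two-sided 5% critical value for 94 degrees of freedom (t table).
T_CRIT = 1.986

# AGE coefficient from the estimated model; the standard error is HYPOTHETICAL.
age_coef = 0.164055
age_se_hypothetical = 0.05

t_age = t_statistic(age_coef, age_se_hypothetical)
print(round(t_age, 4), is_significant(age_coef, age_se_hypothetical, T_CRIT))
```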
LOG(INC) = -1.67873 + 0.164055*AGE + 0.085026*EDU - 0.049839*EXPE +
0.123902*LANG + 0.010674*GEN + u
Holding the other variables constant, if age increases by 1 year, the expected monthly income of
HANU graduates increases by about 16.41% on average. If the level of education increases by 1
level, the expected monthly income increases by about 8.50% on average. If the level of language
increases by 1 level, the expected monthly income increases by about 12.39% on average. Male
graduates (GEN = 1) are expected to earn about 1.07% more than female graduates on average. In
contrast, if the years of work experience increase by 1 year, the expected monthly income decreases
by about 4.98% on average. This could arise when, even with extensive professional experience, the
level of education and/or the level of language is insufficient to keep up with salary increases. In
short, all of the explanatory variables except years of work experience have a positive estimated
effect on monthly income: income increases with age, level of education, and level of language, and
is higher for male graduates.
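The percentages above use the usual log-lin approximation, 100 × β. As a hedged check, the sketch below compares that approximation with the exact percentage change implied by a log-lin model, 100 × (exp(β) - 1). The function names are chosen for illustration; the coefficients are those of the estimated model.

```python
import math

# Slope coefficients of the estimated log-lin model reported in the text.
COEF = {
    "AGE": 0.164055,
    "EDU": 0.085026,
    "EXPE": -0.049839,
    "LANG": 0.123902,
    "GEN": 0.010674,
}


def approx_pct(beta):
    """Approximate percentage change in INC for a one-unit change: 100 * beta."""
    return 100.0 * beta


def exact_pct(beta):
    """Exact percentage change in INC for a one-unit change: 100 * (exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


effects = {name: (approx_pct(b), exact_pct(b)) for name, b in COEF.items()}
for name, (approx, exact) in effects.items():
    print(f"{name}: approx {approx:.2f}%, exact {exact:.2f}%")
```

The two measures are close for small coefficients such as GEN and drift apart as the coefficient grows, which is why the gap is largest for AGE.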
4. Testing the overall significance of all coefficients
We use the F-test to test overall significance of the multiple regression and to check the effect of
all independent variables.
In hypothesis testing, we use a significance level of 5% with n = 100 observations and k = 6
estimated parameters.
We have the following model:
INC = β1 + β2AGE + β3EDU + β4EXPE + β5LANG + β6GEN + u
● Hypothesis testing
H0: β2 = β3 = β4 = β5 = β6 = 0 (no variable has an effect)
Ha: β2 ≠ 0, or β3 ≠ 0, or β4 ≠ 0, or β5 ≠ 0, or β6 ≠ 0 (at least one variable
has an effect)
● F-statistic:
$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = 37.14657$
● Critical value: $F^c_{0.05,5,94} \approx 2.31$
● Decision rule:
Since F > F^c => reject H0
● Conclusion: The coefficients are jointly statistically significantly different from zero. In
other words, there is at least one variable that affects income.
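The F-statistic can be recomputed directly from R^2. The sketch below assumes k = 6 estimated parameters (the intercept plus five slopes) and n = 100 observations, and uses the R^2 of the lin-lin model reported in Section IV.1; the function name is chosen for illustration. With these inputs it reproduces the reported F value up to rounding.

```python
def f_statistic(r_squared, n, k):
    """Overall F statistic; k counts all estimated parameters, intercept included."""
    return (r_squared / (k - 1)) / ((1.0 - r_squared) / (n - k))


R2_LIN_LIN = 0.663965  # R^2 of the lin-lin model reported in the text

f_value = f_statistic(R2_LIN_LIN, n=100, k=6)
print(round(f_value, 4))
```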
Figure 6: The Lin-lin model for F test
V. CHECKING ERRORS
1. Multicollinearity
1.1 The nature of multicollinearity
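Multicollinearity occurs when the regression model's independent variables are linearly related,
either exactly or approximately. In the condition below, the constants λ1, …, λk are not all zero.
The relationship is exact (perfect multicollinearity) when vi = 0 and approximate when vi is a
stochastic error term:
λ1X1 + λ2X2 + … + λkXk + vi = 0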
1.2 Detection of multicollinearity
Multicollinearity occurs when one or more regressors are exact or approximate linear combinations
of the other regressors. Calculating the variance inflation factor (VIF) for each independent
variable is one method for identifying multicollinearity. If the VIF value is greater than 10, we
conclude that multicollinearity exists. We used Eviews to check for this problem; the result is
shown below:
Figure 7 : Variance Inflation Factors
As we can see from the table above, the centered VIFs of two variables (age and experience) are
higher than 10. Therefore, these two variables are said to be highly collinear.
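The VIF of a regressor equals 1 / (1 - R_j^2), where R_j^2 is the R^2 of the auxiliary regression of that regressor on the others, so the threshold of 10 corresponds to R_j^2 = 0.9. The sketch below illustrates this for the simplified case of only two regressors, where R_j^2 is the squared correlation between them. The age and experience values are hypothetical, not the survey data, and this is not the Eviews computation for the full five-regressor model.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_from_r2(aux_r_squared):
    """VIF implied by the R^2 of the auxiliary regression of one regressor on the rest."""
    return 1.0 / (1.0 - aux_r_squared)


# HYPOTHETICAL data in which experience rises almost one-for-one with age.
age = [22, 23, 24, 25, 26, 27, 28, 30]
expe = [0, 1, 2, 3, 4, 5, 6, 7]

r = pearson_r(age, expe)
vif = vif_from_r2(r ** 2)  # two-regressor case: auxiliary R^2 equals r^2
print(round(r, 4), round(vif, 2), vif > 10)
```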
1.3 Remedial measures
Because our objective is only prediction, we take no action on the multicollinearity problem.
2. Heteroskedasticity
2.1 The nature of heteroskedasticity
Heteroskedasticity occurs when the variance of the error term in a model does not remain
constant as the explanatory variables vary:
$\mathrm{Var}(u_i) = E(u_i^2) = \sigma_i^2$ is not constant (for i = 1, 2, ..., n)
2.2 Detection of heteroskedasticity
We test for this issue with a formal method, since the analysis findings may be erroneous if
heteroskedasticity is present, that is, if the error variance is not equal across observations.
We run the White test in Eviews without cross-terms.
Figure 8: The lin-lin model for White Heteroskedasticity Test
● Hypothesis: H0: No heteroskedasticity exists in this model (homoskedasticity).
H1: Heteroskedasticity exists in this model.
● The test statistic: W = n*R² = 27.57 (result from Figure 8)
● Critical value: $\chi^2_{0.05,5} = 11.07$ (at 5% level of significance)
● Decision rule: Because W = 27.57 > $\chi^2_{0.05,5} = 11.07$
=> reject H0
● Conclusion: There is enough evidence to conclude that heteroskedasticity exists in
the regression model at the 5% level of significance.
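The arithmetic of the White test above can be sketched as follows. The auxiliary R^2 of 0.2757 is not reported separately in the text; it is the value implied by W = n*R² = 27.57 with n = 100. The critical value 11.07 is the one stated above, and the function names are chosen for illustration.

```python
def white_statistic(n, aux_r_squared):
    """White test statistic: sample size times the R^2 of the auxiliary regression."""
    return n * aux_r_squared


def reject_homoskedasticity(stat, chi2_critical):
    """Reject H0 (constant error variance) when the statistic exceeds the critical value."""
    return stat > chi2_critical


CHI2_CRIT_5PCT_5DF = 11.07  # chi-square critical value stated in the text

# Auxiliary R^2 implied by W = 27.57 and n = 100 (an inferred value).
w = white_statistic(100, 0.2757)
print(round(w, 2), reject_homoskedasticity(w, CHI2_CRIT_5PCT_5DF))
```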
2.3 Remedial measures
We use the log-linear model to reduce the effects of heteroskedasticity
Figure 10: The lin-lin model for Durbin-Watson test
● Hypothesis: H0: No autocorrelation exists in the model
H1: Autocorrelation exists in the model
● Test statistic: DW = 1.285030
● Critical value: dL =1.571 ; dU = 1.780 (at 5% level of significance, k' = 5; n
= 100)
● Decision rule:
- If DW < dL => reject Ho
- If DW >dU => not reject Ho
- If dL <DW <dU => this test is inconclusive.
● Conclusion:
Because DW = 1.285030 < dL = 1.571, we have enough evidence to reject H0, and
conclude that autocorrelation exists in this model.
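The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below computes it for a short list of hypothetical residuals and then applies the decision rule above to the reported value DW = 1.285030. The residuals and function names are assumptions for illustration; the bounds dL and dU are those stated in the text, and the rule covers positive autocorrelation only, as above.

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(dw, d_lower, d_upper):
    """Decision rule for positive autocorrelation, as listed in the text."""
    if dw < d_lower:
        return "reject H0"
    if dw > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Reported statistic and bounds (5% level, k' = 5, n = 100).
decision = dw_decision(1.285030, d_lower=1.571, d_upper=1.780)

# HYPOTHETICAL residuals, only to show how the statistic is computed.
toy_residuals = [0.5, 0.4, 0.6, 0.3, -0.2, -0.4, -0.3, 0.1]
toy_dw = durbin_watson(toy_residuals)
print(decision, round(toy_dw, 4))
```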
● Step 2: Test statistic:
BG = (n - p) × R^2, with level of significance α = 0.05
● Step 3: Decision rule
Reject H0 if BG > $\chi^2_{0.05,\,df}$
In Eviews, we generate a new series NINC = LOG(INC(-1)).
Therefore, the new regression model can be displayed as the following:
LOG(INC) = β1 + β2AGE + β3EDU + β4EXPE + β5LANG + β6GEN + β7NINC+u
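The Eviews expression LOG(INC(-1)) takes the logarithm of the previous observation of INC. A minimal Python sketch of the same construction is given below, assuming the observations are stored in a list in the order of the data set; the income values and the function name `lagged_log` are hypothetical. The first observation has no predecessor, so one observation is lost when the new model is estimated.

```python
import math


def lagged_log(series):
    """Return LOG(x(-1)): the log of the previous observation; the first entry has no lag."""
    return [None] + [math.log(value) for value in series[:-1]]


# HYPOTHETICAL income values, in the row order of the data set.
inc = [8.0, 10.0, 12.5, 15.0]
ninc = lagged_log(inc)
print(ninc)
```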
● Conclusion: Because dL = 1.550 < DW = 1.558 < dU = 1.830, the test is inconclusive: we can
neither reject nor accept the null hypothesis. Hence, the new regression model no longer shows
significant evidence of autocorrelation, although its absence cannot be confirmed.
VII. CONCLUSION, LIMITATION AND RECOMMENDATIONS
1. Conclusion
This research investigated several methodologies and models for estimating the factors that
affect HANU graduates' salaries and applied them to a high-quality data source. We have stressed
the importance of model specification, specifically the distinction between single-treatment and
multiple-treatment models, as well as the need to account for heterogeneous returns, that is,
returns that differ between individuals from different backgrounds. We checked three fundamental
estimation problems: multicollinearity, heteroskedasticity, and autocorrelation, each of which
concerns a different assumption of the model. Determining the "parameter of interest" is critical
when dealing with diverse outcomes. Under the homogeneous effects model, all such parameters are
similar. However, under the heterogeneous effects model, they might differ significantly.
Depending on the policy question, one is more important than another. We estimated the pay
returns to several characteristics of HANU graduates. This dataset is well suited to analyzing
how various factors influence wages. Extensive ability assessments at various ages, reliable
measurement of background and job types, and a focus on income factors are well suited to
approaches based on observable selection assumptions. These approaches include descriptive
statistics, variable descriptions, and quantitative analysis.
2. Limitation
Despite the above benefits and utility, the regression analysis approach has the following important
drawbacks. It assumes that the cause-and-effect relationships between the variables remain
unchanged. This assumption may hold only in some cases, so predicting a variable's values using the
regression equation may yield erroneous and misleading results. In addition, the functional
relationship between variables may change as more data are taken into account, so conclusions
drawn from restricted data may not generalize. Finally, the survey data were collected under the
authors' own assumptions, so they contain some inaccuracies, although these are not major.
3. Recommendations
In general, when analyzing and processing a set of randomly collected data, we recommend using
Eviews software to process the data accurately and save time. Tests for multicollinearity,
heteroskedasticity, and autocorrelation, together with other hypothesis tests, help to check
whether expectations about the data are realistic.
REFERENCES
Autocorrelation. (n.d.). Statistics Solutions. Retrieved May 3, 2023, from
https://www.statisticssolutions.com/dissertation-resources/autocorrelation/
CFI Team. (2022, December 5). Heteroskedasticity - Overview, Causes and Real-World
Example. Corporate Finance Institute. Retrieved May 3, 2023, from
https://corporatefinanceinstitute.com/resources/data-science/heteroskedasticity/
The effect of education and experience on self-employment success. Retrieved March 1994 from
https://www.sciencedirect.com/science/article/abs/pii/088390269490006X