
CH - 3 - Simple and Multiple Linear Regressions in Stata

Application to Cross-Sectional Econometrics in Stata

Uploaded by

mengistu
Copyright © All Rights Reserved
Mengistu Yismaw (MSc.), Department of Economics, Debre Markos University (Burie Campus)
Email: menyis.2012@gmail.com

Chapter outline
> Simple linear regression
  Regression with only qualitative (dummy) regressors: ANOVA
  o Specification
  o Estimation
  o Interpretation
> Multiple linear regression
  Regression with qualitative and quantitative regressors: ANCOVA
  o Specification
  o Estimation
  o Interpretation
  o Test of CLRM assumptions
  o Violations of some of the CLRM assumptions
  o Interaction effect
> Qualitative response regression models: dummy as dependent variable (binary choice model)
  Linear Probability Model (LPM)
  o Specification
  o Estimation
  o Interpretation

CHAPTER THREE: CROSS SECTIONAL ECONOMETRICS

Methodology of econometric analysis
What steps do econometricians follow when analyzing an economic problem? Broadly speaking, classical econometric methodology proceeds along the following lines (steps):
1. Statement of theory or hypothesis
2. Specification of the mathematical model of the theory
3. Specification of the statistical, or econometric, model
4. Obtaining the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes

2.1. Simple Linear Regression
Simple linear regression = a single regressor (independent variable).
Suppose a regression with only a qualitative (dummy) regressor = ANOVA regression.

Step 1: Develop a statement of theory or hypothesis
Suppose we want to know whether there is a productivity difference between male- and female-headed households.
> i.e., Gender is our independent variable.
Gender is a dummy (binary, or nominal scale) variable.
Nominal scale variable: a type of variable that gives qualitative information only.
Gender = 1 if male
         0 if female
Note: the above coding ('0' or '1') is used for identification purposes only.
> Hence the values of a nominal scale variable cannot be added, subtracted, divided or ordered for comparison.
> This type of variable is sometimes called a dummy variable.

Step 2: Specification of the mathematical model of the theory
Yield = β0 + β1Gender

Step 3: Specification of the statistical, or econometric, model
Adding a random disturbance term ui, the simple linear regression model is:
Yield = β0 + β1Gender + ui

Step 4: Obtaining the data
The next step is going to the field and collecting the data.
> Then enter the data into the appropriate software and format.
o Ways of entering the data into Stata:
i. Enter the data directly into Stata
ii. Enter the data into Excel and import it into Stata
iii. Enter the data into SPSS and save it in Stata format, or convert it with the Stat/Transfer software
> The next step is estimating the model.

Step 5: Estimation of the parameters of the econometric model
Ordinary Least Squares (OLS) estimation using Stata:
Statistics ⇒ Linear models and related ⇒ Linear regression ⇒ select the dependent and independent variables ⇒ click Submit ⇒ click OK
Syntax: reg depvar indepvar
Example: reg Yield Gender_n1

Some statistical manipulations of the output:
> Degrees of freedom of the F-test: (k − 1, n − k), where n = sample size and k = number of parameters (here 2: β0 and β1); so F(2 − 1, 30 − 2) = F(1, 28)
> Adjusted R² = 0.4610
> Estimated residuals: ûi = Yield − Ŷield
> Estimated fitted values: Ŷield = β̂0 + β̂1Gender
> se(β̂1) = 1.088 and t = β̂1 / se(β̂1) = 5.5278 / 1.088 ≈ 5.08
> CI for β1 = β̂1 ± t(α/2) × se(β̂1), where t(α/2) is the critical t-value at (30 − 2) degrees of freedom and α/2 = 0.05/2, i.e., 2.048
> CI for β1 = 5.5278 ± 2.048(1.088) = (3.2986, 7.7568), as reported by Stata, which uses the unrounded standard error
o To estimate the RSS (residual sum of squares), follow these steps: …
o To estimate the ESS (explained, or model, sum of squares), follow these steps: …

Interpretation of coefficients
What does the estimate 5.527 show?
It is the coefficient for male. It shows that the average productivity of male-headed households is higher than that of female-headed households by 5.527 (significant at 1%); remember the t-test result in Ch-2.
What about the estimate 3.5?
It is the average productivity of the omitted category (female-headed households).
Why do we omit one category (female)? To avoid falling into the dummy variable trap.
Average productivity of male-headed households = 3.5 + 5.527 × 1 = 9.027 (remember the t-test result in Ch-2), or use prediction.
Average productivity of female-headed households = 3.5 + 5.527 × 0 = 3.5, or use prediction.
The productivity difference between male- and female-headed households = 9.027 − 3.5 = 5.527.

Exercise
o Is there a significant difference in average productivity between households with and without access to credit?
o What is the average productivity of households with access to credit?
o How much of the productivity variation is explained by access to credit?

2.2. Multiple Linear Regression
Multiple linear regression = many regressors (independent variables).
o Suppose dummy and continuous variables are used as independent variables = ANCOVA regression.
Suppose you are going to analyze various determinants of maize productivity. Based on your literature review, you think that maize productivity can be affected by:
✓ Age of the household head
✓ Land fragmentation
✓ Fertilizer applied per hectare
✓ Household land size
✓ Gender of the household head
Then the multiple linear regression model will be:
Yield = β0 + β1age + β2fragment + β3fertlizer + β4land + β5Gender + ui
Note: the estimation technique is the same as for the simple linear regression model.
Syntax: reg depvar indepvars
Example: reg Yield age fragment fertlizer land Gender_n1
Based on the p-values, only two of the five explanatory variables (fragment and Gender) are significant.
Note: only the significant variables will be analyzed.
Estimated equation of the regression model: Yield = 5.5617 + 0.055age … 0.676fragment − 0.004fertlizer …
Then let us analyze the coefficients of the significant variables. However, before interpreting the result, it is important to judge the validity of the model using some diagnostic tests.
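The note above says that estimation works the same way for simple and multiple regression: `reg` computes the OLS estimates b = (X'X)⁻¹X'y. The sketch below exposes that arithmetic in plain Python (an illustration only; in practice Stata's `reg` does all of this). The helper names `ols` and `conf_int` and all data values are hypothetical, invented for illustration and not taken from the chapter's dataset. The sketch checks the dummy-variable interpretation from Section 2.1 (the intercept equals the mean of the omitted female group, the slope equals the male-minus-female difference in means) and shows the dummy variable trap as a singular X'X. Where Stata would drop one of the collinear variables, this sketch simply raises an error. It ends by reproducing the t-ratio and confidence-interval arithmetic from Step 5.

```python
from statistics import mean


def ols(X, y):
    """Hypothetical helper: OLS coefficients b solving (X'X) b = X'y.

    X is a list of rows; include a column of 1s for the intercept.
    Uses Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # pick the row with the largest absolute value in this column as pivot
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def conf_int(b, se, t_crit):
    """Hypothetical helper: confidence interval b ± t_crit * se."""
    half_width = t_crit * se
    return (b - half_width, b + half_width)


# Hypothetical data: 1 = male-headed, 0 = female-headed household
gender = [1, 1, 1, 1, 0, 0, 0, 0]
yield_obs = [9, 10, 8, 9, 3, 4, 3, 4]

# Simple regression Yield = b0 + b1*Gender (column of 1s = intercept)
b0, b1 = ols([[1, g] for g in gender], yield_obs)
female_mean = mean(y for y, g in zip(yield_obs, gender) if g == 0)
male_mean = mean(y for y, g in zip(yield_obs, gender) if g == 1)
print("intercept:", b0, "female mean:", female_mean)
print("slope:", b1, "male - female mean:", male_mean - female_mean)

# Dummy variable trap: intercept + male dummy + female dummy are perfectly collinear
try:
    ols([[1, g, 1 - g] for g in gender], yield_obs)
except ValueError as err:
    print("dummy variable trap:", err)

# Multiple regression on a noise-free hypothetical relation y = 2 + 1.5*x1 - 0.5*x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [0, 1, 0, 1, 1, 0]
y = [2 + 1.5 * a - 0.5 * b for a, b in zip(x1, x2)]
coefs = ols([[1, a, b] for a, b in zip(x1, x2)], y)
print("multiple regression coefficients:", [round(c, 6) for c in coefs])

# t-ratio and 95% CI using the rounded figures reported in Step 5
lo, hi = conf_int(5.5278, 1.088, 2.048)
print("CI:", round(lo, 4), round(hi, 4), "t:", round(5.5278 / 1.088, 2))
```

With the rounded inputs, the interval comes out at about (3.2996, 7.7560) and t ≈ 5.08. The small gap from the interval Stata reports reflects rounding of the standard error and critical value.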
In particular, inferences based on OLS results are valid only if the classical linear regression model (CLRM) assumptions hold.
Now let us test some of the CLRM assumptions (diagnostic tests):

i. Multicollinearity test
o The term multicollinearity means the existence of a perfect or near-perfect linear relationship among some or all of the explanatory variables of the regression model.
o The existence of multicollinearity can be examined (detected) using various techniques, such as auxiliary regressions, pair-wise correlations among regressors, and the variance inflation factor (VIF) and/or tolerance (1/VIF).
o VIF is the most commonly used; it measures how much the variance of an estimator is inflated by the presence of multicollinearity.
Note: multicollinearity is a matter of degree, not of kind. The distinction is not between its presence and absence but between its degrees (high or perfect).
Informal test: high R² but few significant t-ratios.
Formal tests:
> Auxiliary regressions
> Pair-wise correlations among regressors. Decision: best if less than 0.50
> Variance inflation factor and tolerance (Stata command after reg: estat vif). Decision: as a rule of thumb, if VIF > 10 or 1/VIF < 10% (close to zero), there is multicollinearity.
> Since our result shows that the VIF of every variable is less than 10 and 1/VIF of every variable is greater than 10%, multicollinearity is not a problem in our model.
Note:
> Multicollinearity is not a problem for nonlinear relationships between variables.
> Multicollinearity is essentially a sample (regression) phenomenon, not a population one.

Remedial measures if there is a multicollinearity problem:
o Drop one or more of the perfectly collinear variables
o Take a sample over a wider area (increase the sample size)
o Take new data
o Transform the variables (e.g., take squares or natural logarithms)
o Combine cross-sectional and time-series data
o Do nothing: according to Blanchard, "multicollinearity is God's will"; multicollinearity is essentially a data deficiency problem, not a problem with OLS or statistical technique in general.

ii. Test of homoscedasticity
o It is a test of the variance of the error (disturbance) term.
o If the error term does not have a constant variance, we say there is a heteroscedasticity problem.
o The nature of the variance of the error term can be judged with the Breusch-Pagan test.
Stata command: hettest
Decision: if the p-value is sufficiently small, i.e., below the chosen significance level (e.g., 10%), we reject the null hypothesis (H0) of homoscedasticity (constant variance) and accept the alternative hypothesis (H1).
Since our result shows that the p-value is less than 10%, we reject H0. Hence the error variance is not constant: there is a heteroscedasticity problem in our model.

Remedial measures for a heteroscedasticity problem:
o Check for outliers (in the dependent variable)
o Use heteroscedasticity-robust (White) standard errors
Example: reg Yield age fragment fertlizer land Gender_n1, robust
Note: hettest is not appropriate after a regression with robust standard errors.

iii. Model specification test
Model specification tests basically deal with:
> the exclusion of relevant explanatory variables
> the inclusion of irrelevant variables
> functional form errors
o Use the Ramsey RESET test
Syntax: ovtest
Decision: if the p-value is sufficiently small, i.e., below the chosen significance level (e.g., 10%), we reject the null hypothesis (H0) that the model has no omitted variables and accept the alternative hypothesis (H1) that the model is misspecified.
> Our result implies that there is no model specification problem (H0 is not rejected).

iv. Normality of the disturbance term
> There are various ways of testing the normality of ui.
For example:
✓ a histogram of the residuals with a normal curve
✓ a normal probability plot, and others

Test of normality of the disturbance term using Stata
> First, generate the residuals (ui):
Syntax: predict ui, residual
> Second, test the normality of the residuals (ui):
a. Draw a histogram of ui with a normal curve
Syntax: histogram ui, normal
b. Normal probability and quantile plots
Syntax: pnorm ui  or  qnorm ui
[Figures: pnorm and qnorm plots of ui]
> Both graphs show that the disturbance term (ui) is approximately normal.

Violations of some of the CLRM assumptions
The presence of multicollinearity
o We said that multicollinearity means the existence of a perfect or exact linear relationship among some or all of the explanatory variables of the regression model.
> Suppose, for instance, that one regressor is an exact linear function of another (e.g., fertilizer being twice age).
> Here we create a hypothetical variable called age3, which is a linear function of age.
Syntax: gen age3=50+age
Note: we deliberately set the 10th observation of age3 to 95 instead of 75 (e.g., replace age3 = 95 in 10); otherwise Stata would drop one of the perfectly collinear variables from the regression.
Then, after re-running the regression with the new data, we get the following VIF result: …

Categorical variables as regressors
Suppose: educational level (EducLevel)
Syntax: reg depvar i.catvar
Example: reg Yield fertlizer Gender_n1 i.EducLevel_n1
Note: when you put i. in front of a categorical variable, Stata automatically omits one category (by default the lowest), which becomes your benchmark (base) category.
If you do not put i. in front of a categorical variable, Stata treats it as a continuous variable, and your estimates will be wrong.

Answer the following questions based on the regression result of this example:
A. What does 4.612 show?
B. What is the average productivity difference between male- and female-headed households?
C. What is the difference in average productivity between households with illiterate heads and households whose heads completed secondary education?
D. What is the difference in average productivity between households whose heads completed secondary education and those whose heads completed post-secondary education?

Discussion question
o What is the average productivity of households managed by male heads who completed secondary education?
To find the average productivity of households managed by male heads who completed secondary education:
1st: generate an interaction variable of Gender and educational level (one way, e.g.: egen IntGenEduc = group(Gender_n1 EducLevel_n1), label)
2nd: run the regression using the newly generated variable:
reg Yield fertlizer i.IntGenEduc
The average productivity of households managed by male heads who completed secondary education is 3.82.

The linear probability model (LPM)
Suppose you intend to investigate the effect of gender and land size on access to credit.
Model: Credit_Dummy_n1 = β0 + β1land + β2Gender + ui
Since the dependent variable takes only the values 0 or 1, the fitted value can be interpreted as the probability that it equals 1 (access to credit), given the explanatory variables.
Although the LPM has known limitations (predicted probabilities can fall outside the 0 to 1 range, and the error term is heteroscedastic), we can use OLS to estimate it (e.g., reg Credit_Dummy_n1 land Gender_n1).

Interpretation of coefficients
o Interpret the intercept
o Interpret the coefficient of land
o Interpret the coefficient of Gender
Answer
A. The probability of access to credit for female-managed HHs with no land is 0.168, or about 17%.
Note: if the intercept is negative, it is interpreted as zero (because a probability cannot be negative).
B. The coefficient of land shows that for a one-hectare increase in a HH's land size, the probability of access to credit decreases on average by 0.00069 (about 0.07 percentage points), but it is not statistically significant.
However, we can estimate the actual probability of access to credit for a particular HH land size. Example: suppose a male-managed HH with a land size of 5 hectares:
E(Y | land = 5, Gender = 1) = 0.168 − 0.00067 × 5 + 0.499 × 1 ≈ 0.664, or use prediction.
C. The coefficient of Gender shows that, on average, the probability of access to credit for male-managed HHs is greater than for female-managed HHs by about 0.50 (50 percentage points).

Importing Stata results into Microsoft Word
1. Using asdoc (a user-written command; install it once with ssc install asdoc)
Syntax: add asdoc before Stata commands (except graph commands)
Examples:
asdoc sum
asdoc reg Yield age fragment fertlizer land Gender_n1 EducLevel_n1 Credit_Dummy_n1
> Stata automatically saves your result in the Microsoft Word file "Myfile.doc" in your current working directory.
> Click on "Myfile.doc" in the Stata Results window to open the document.

2. Using outreg2 (a user-written command; install it once with ssc install outreg2)
It is used for regression results.
Syntax: run the regression and the outreg2 command together, one after the other.
Example:
reg Yield age fragment fertlizer land Gender_n1 EducLevel_n1 Credit_Dummy_n1
outreg2 using Table1.doc, replace
> Stata automatically saves your result in the Microsoft Word file "Table1.doc" in your current working directory.
> Click on "Table1.doc" in the Stata Results window to open the document.
Note: outreg2 is usually used for publication purposes. For your senior essay, please use asdoc.
