TNDY - TA Session 2

The document provides a tutorial on statistical analysis using multiple regression in Stata, focusing on the Current Population Survey dataset. It explains the concepts of bivariate and multiple linear regression, including how to interpret regression output and coefficients, particularly in relation to income, age, gender, and race. The tutorial also includes practical commands for running regression models in Stata and visualizing income differences by gender.

##TNDY TA Session 2: Statistical analysis using multiple regression

##Stata (version 14.2 for Macs)

##Last update: Javier M. Rodriguez, June 12, 2021
##Main dataset: “TNDY_TASession2_CPS.dta”
##These data are a 10% random sample from the Current Population Survey (CPS), one of the
key intercensal surveys in the U.S., sponsored by the U.S. Census Bureau and the
Bureau of Labor Statistics. The CPS is the survey officially used in the U.S. to monitor key
labor force statistics (e.g., the unemployment rate).

Linear regression

I. Bivariate Linear Model

• In many situations, we want to quantify the relationship between two variables (X and
Y), where we have good reasons to think that X affects Y.
• Sometimes we may also want to estimate the predicted value of Y for a given level
of X. By “predicted value of Y” we mean what can be known about Y using X (as an
explanation of Y).
   o Both goals can be addressed with regression analysis. In our case, we will
   use linear regression. This means that we will assume that the association
   between X and Y, or at least some part of it, can be summarized by a line.
   o For example, the predicted income (Y) for people of a given age (X) can be
   estimated as follows:

reg inctot age

Interpretation of the regression output table

Keep in mind that the regression output table above is for a linear regression, meaning that it
describes a straight line. And the equation (or mathematical representation) of this “predicted”
line is as follows:

Y = α + βX
or, in this case:
income = α + β(age)

or, according to the regression output table:

income = _cons + Coef × age

• _cons: The intercept is the expected mean of Y (income) when X (age) = 0.

• Coef: The coefficient β measures by how much the dependent variable (income, “Y”)
changes when the independent variable (age, “X”) increases by 1 unit (1 year of life).
   o Q: How would you interpret the coefficient of age in this regression? That is, by
   how much does income change, on average, when age increases by one unit?

• t = Coef/SE.
   o Sign: The sign tells you whether the estimated coefficient lies to the left (negative)
   or to the right (positive) of zero. Another way to think about it is that it tells you
   whether the variables X and Y are negatively (or, inversely: “more of X, less of Y”) or
   positively (“more of X, more of Y”) associated.
   o Value: At the conventional 0.05 level of significance, we need the absolute
   value of t to be greater than 1.96 (a rule of thumb for large samples) to reject the
   null hypothesis that income and age are not related (i.e., that β = 0).
      ▪ Interpretation for t = 1.96: It means that β (the coef) is 1.96 standard
      errors (SE) away from “0”. In statistical terms, this is a good indication that
      our estimate of β (the quantification of the association between X and Y)
      is large enough to be distinguishable from “0”: if the true β were “0”, an
      estimate at least 1.96 SEs away from “0” would occur only about 5% of the
      time, which is why we reject the null hypothesis at the 0.05 level.
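
The pieces of the output table discussed above can also be pulled out by hand, which may help connect the formulas to the numbers. The lines below are only a sketch: they assume the dataset file sits in Stata's current working directory, and age 40 is an arbitrary value chosen for illustration.

```stata
* Assumption: TNDY_TASession2_CPS.dta is in the current working directory
use "TNDY_TASession2_CPS.dta", clear
regress inctot age

* Coef and SE for age, as stored by Stata after the regression
scalar b_age  = _b[age]
scalar se_age = _se[age]

* t = Coef/SE, computed by hand
scalar t_age = b_age/se_age
display "Coef = " b_age "   SE = " se_age "   t = " t_age

* Two-sided p-value, using the residual degrees of freedom of the model
display "p = " (2*ttail(e(df_r), abs(t_age)))

* Predicted income at age 40: _cons + Coef*40
display "Predicted income at age 40 = " (_b[_cons] + _b[age]*40)
```

The values printed for t and p should match the t and P>|t| columns in the row for age of the regression output table.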

II. Multiple Linear Regression Model

• You use a multiple or multivariate regression model when you need to “hold constant”
variables that may be associated with the relationship of interest (the X-and-Y relationship).
   o This means that you incorporate more [independent] variables into the model.
      ▪ In general terms, what is a “model”?
• A multiple regression model explicitly “holds fixed” other variables. By this we mean
that the regression “statistically controls” for other variables. Let’s
elaborate on the idea of “statistical control”. To do this, let’s imagine that our new
linear model is:

Y = α + β₁X + β₂Z
or,

income = α + β₁(age) + β₂(female)

What our new linear model is telling us is that the income of individuals is a function
of (i.e., depends on) their age and their gender.
   o By “statistical control” we mean that, in this case, the association between
   income (Y) and age (X) is estimated independently of the gender (Z) of the individual,
   and that the association between income (Y) and gender (Z) is estimated independently
   of the age (X) of the individual.
   o By “independent” we mean that the estimated difference in income between males
   and females (β₂) compares males and females of the same age, so it does not reflect
   age differences between the two groups. At the same time, the estimated difference
   in income as people age (β₁) compares people of the same gender, so it does not
   reflect income differences between males and females.
   o That’s what a multiple regression does: It separates (isolates) the
   associations between Y and each of the independent variables, and
   quantifies their independent associations in the form of a coefficient
   (the βs).

This is how you run the multiple regression model income = α + β₁(age) + β₂(female)
in Stata:

reg inctot age female
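
One way to see what “holding constant” does in practice is to compare the coefficient of female with and without age in the model. This is a sketch only; it assumes the dataset file is in the current working directory, and it prints just the two coefficients rather than the full output tables.

```stata
* Assumption: TNDY_TASession2_CPS.dta is in the current working directory
use "TNDY_TASession2_CPS.dta", clear

* Bivariate model: income difference by gender, ignoring age
quietly regress inctot female
display "Gender gap, no controls:       " _b[female]

* Multiple regression: income difference by gender among people of the same age
quietly regress inctot age female
display "Gender gap, holding age fixed: " _b[female]
```

If the two numbers differ, part of the raw gap between males and females was related to age differences between the two groups.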

• Interpretation of the coefficient

• By how much does the dependent variable (income) change if the independent variable
age increases by one unit, holding all other independent variables fixed/constant (in this
case, gender (female))?
   o Answer: For every additional year of age, income increases by $72, on average,
   holding gender constant.
• Q: How do we interpret the coefficient of the dummy variable female?
   o Answer:
      ▪ Because the variable is coded 1=female, 0=male, a 1-unit change is
      the same as going from males to females.
      ▪ Accordingly, the coef. for female is the average difference in income between
      females and males of the same age.
      ▪ Because the coef. for being a female is negative (β₂ = −2202), we
      can say that females show an income that is $2202 lower than that of
      males, on average.
         • In other words: The difference in income between males and
         females is $2202, on average, holding age constant.
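
The interpretation above can be checked with predicted values. The sketch below asks Stata for the predicted income of a male and a female of the same age; age 40 is an arbitrary choice for illustration, and the dataset file is assumed to be in the current working directory.

```stata
* Assumption: TNDY_TASession2_CPS.dta is in the current working directory
use "TNDY_TASession2_CPS.dta", clear
regress inctot age female

* Predicted income for males (female=0) and females (female=1), both at age 40
margins, at(female=(0 1) age=40)

* The coefficient of female with its SE, t, and confidence interval;
* it equals the difference between the two predictions above
lincom female
```

Because this model has no interaction term, the gap between the two predictions is the same at every age, so picking another age changes the levels but not the difference.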

Let’s run a quick visualization (not from the model but from the observed data) on the income
differences by gender as individuals age:

twoway (lfit inctot age if female==1) (lfit inctot age if female==0)
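
By default both fitted lines are labeled “Fitted values” in the legend, which makes them hard to tell apart. A possible variant of the same command with explicit labels is sketched below; the label and axis texts are only suggestions, and the /// line continuation works in a do-file, not in the Command window.

```stata
* Same plot as above, with legend and axis labels added (run from a do-file)
twoway (lfit inctot age if female==1) (lfit inctot age if female==0), ///
    legend(order(1 "Female" 2 "Male")) ///
    xtitle("Age") ytitle("Income")
```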

*** If you have time, go over the following material:

Including a categorical variable in the multiple regression model

Imagine that you also wonder whether the differences in income as people age are related to
differences in age across races/ethnicities and, within each race/ethnicity, between
males and females. What to do? Yup: run a multiple regression model that now includes “race”.

reg inctot age female i.race

• Here, the variable race is coded: 1=white, 2=black, 3=Asian, 4=other.

• Also, remember that Stata automatically separated the categories for us (because we used
i.race in the command syntax). You could also have generated the 4 dummies for race and
included them in the model one-by-one; for example: reg inctot age female race1 race2 race3…, etc.
(When entering dummies by hand, one category must be left out to serve as the base;
otherwise Stata drops one of them because of collinearity.)
   o In the regression output, the first category, race=1 (white), is treated as the base
   category (or category of reference) in the regression model. This means that all
   coefficients related to race are relative to the base category.
   o Q: Can you interpret the coefficient of race2?
      ▪ Answer:
         • Because the dummy race2 is coded 1=black, 0=others, and the base
         category is white (1=white, 0=others), a 1-unit change is
         the same as going from white to black individuals.
         • Accordingly, the coef. for race2 is the average difference in income between
         black and white people in the sample, holding age and gender constant.
         • Because the coef. for being a black person is negative
         (β_race2 = −520), we can say that black individuals show an
         income that is $520 lower than that of white individuals, on
         average.
   o Q: Can you interpret the coefficient of race3?
      ▪ Answer: Asian individuals show an income that is $478 lower than that of
      white individuals, on average, holding age and gender constant.