
ECON 330 Fall 2023

Take Home Assignment 2


Instructions:
• You must work in the groups that you have already formed for the project. However,
each group member must submit the assignment individually.
• One member's assignment from each group will be picked at random for grading, and
the same grade will be assigned to the entire group for that assignment.
• Please note that sharing or discussing assignments with anyone outside your own group is
not allowed and is grounds for disciplinary action.
• The assignment is due on Sunday, 19 November 2023, at 11:55 pm. Late submissions will
result in a 15% deduction.

Part A (28 Marks)


The three datasets (geography1.csv, geography2.dta and infectious_disease.dta) were created as
part of The Center for International Development’s ongoing research into the role of geography in
economic development. The original datasets have been altered for this assignment.
The infectious disease dataset was revised on August 31, 2001. In the infectious disease dataset, there
are two sets of variables for every country: area affected by malaria in 1946, 1966, 1982 and 1994;
and area affected by malaria weighted by 1995 population in 1946, 1966, 1982 and 1994.
The two geography datasets are part of a larger dataset which was revised on August 31, 2001.
Variables in the geography datasets include: latitude and longitude of each country's centroid, mean
elevation, mean distance to nearest ice-free coastline, mean distance to nearest ice-free coastline
or sea-navigable river, distance from a country's centroid to nearest coastline, distance from a
country's centroid to nearest coastline or sea-navigable river, total population, percent of
population within 100km of the coastline, percent of the population within 100km of the nearest
coastline or sea-navigable river, percent of land area within 100km of the coastline, percent of land
area within 100km of the nearest coastline or sea-navigable river, percent of population in the
geographic tropics, percent of land area in the geographic tropics, and the typical population density
an average person experiences, by country.
1) Transfer the data in geography1.csv to Stata. (1 Mark)

2) Append this data with the remaining geographical data available in geography2.dta.
(1 Mark)

3) Generate a chart using your appended data that tells us something insightful about the
data. You can choose any variables and any type of chart, e.g. bar, pie, scatterplot. Briefly
state what insights your graph reveals about the data. (5 Marks)

4) There might be unwanted spaces before or after country names in the variable “country”.
These spaces could adversely affect the merge to be completed in the next question. Please
remove these spaces. What other issues could adversely affect merging results?
(3 Marks)

5) Merge this dataset with the dataset infectious_disease.dta (HINT: Merging can be
completed using a variable that is common to both datasets). How many observations are
there in total? Drop all observations that do not match. (5 Marks)
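
The effect of stray spaces on a key-based merge can be illustrated outside Stata. The following is a minimal Python sketch, assuming hypothetical country names (they are not taken from the assignment datasets) and a hypothetical helper `match_report`. It counts how many keys match before and after trimming.

```python
# Hypothetical country keys; the real values live in the assignment datasets.
geography = [" Pakistan", "India ", "Nepal", "Chile"]
disease = ["Pakistan", "India", "Nepal", "Peru"]


def match_report(master, using):
    """Count keys found in both lists, only in master, and only in using."""
    m, u = set(master), set(using)
    return {
        "matched": len(m & u),
        "master_only": len(m - u),
        "using_only": len(u - m),
    }


# With untrimmed keys, " Pakistan" and "Pakistan" are different strings.
before = match_report(geography, disease)

# strip() removes leading and trailing whitespace only; inner spaces stay.
cleaned = [name.strip() for name in geography]
after = match_report(cleaned, disease)

print("before trimming:", before)
print("after trimming: ", after)
```

A report of this kind, run before dropping unmatched observations, also helps to spot keys that still fail to match for reasons other than spacing.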

6) Incorrectly measured values in pop95 and pdenpavg are recorded as negative values in the
dataset. Please save these negative values to a separate dataset and drop them from the
original dataset. (2 Marks)

7) Rename variable elev as elevation. (1 Mark)

8) Variables Mal46p, Mal66p, Mal82p and Mal94p describe the percentage of the 1995
population in malaria-affected areas in four different years (1946, 1966, 1982 and 1994).
Generate a variable that categorizes observations in Mal46p (percentage of population)
by low, moderate and high, based on the following groupings: (4 Marks)
a. “Low” if less than or equal to 33 percent of the 1995 population resided in a
malaria-affected area in 1946.
b. “Moderate” if more than 33 percent and less than or equal to 66 percent of the
1995 population resided in a malaria-affected area in 1946.
c. “High” if more than 66 percent of the 1995 population resided in a malaria-
affected area in 1946.
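
The boundary cases in these groupings (exactly 33 and exactly 66) are easy to get wrong. Below is a minimal Python sketch of the cut-off logic, assuming the percentage is stored on a 0 to 100 scale and that missing values are represented as `None`. The function name `malaria_category` and the sample values are hypothetical.

```python
from typing import Optional


def malaria_category(pct: Optional[float]) -> Optional[str]:
    """Map a percentage (assumed 0-100 scale) to Low / Moderate / High."""
    if pct is None:
        return None  # keep missing values missing instead of labelling them
    if pct <= 33:
        return "Low"
    if pct <= 66:
        return "Moderate"
    return "High"


# Hypothetical values chosen to sit on and around the two cut-offs.
sample = [0.0, 33.0, 33.5, 66.0, 66.1, 100.0, None]
labels = [malaria_category(v) for v in sample]
print(labels)
```

The explicit check for missing values matters in Stata as well: numeric missing values compare as larger than any number, so a condition such as `Mal46p > 66` is also true for missing observations unless they are excluded.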

9) The variable “time” captures the point in time when the observation was recorded. Change
the format of this variable so that a current observation of the form “Mon Aug 10 11:06:25
UTC 2015” is converted to “Aug 10 11:06:25 2015”. (2 Marks)
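
One way to think about this conversion is as parsing the string into a date-time value and then writing it back in a different layout. The sketch below does this in Python, assuming every value follows exactly the pattern quoted in the question; the function name `reformat_time` is hypothetical.

```python
from datetime import datetime

# Layout of the stored value, with "UTC" treated as literal text.
IN_FORMAT = "%a %b %d %H:%M:%S UTC %Y"
# Target layout: weekday and time zone dropped.
OUT_FORMAT = "%b %d %H:%M:%S %Y"


def reformat_time(raw: str) -> str:
    """Parse the original string and write it back in the target layout."""
    parsed = datetime.strptime(raw.strip(), IN_FORMAT)
    return parsed.strftime(OUT_FORMAT)


converted = reformat_time("Mon Aug 10 11:06:25 UTC 2015")
print(converted)
```

Parsing first, rather than cutting characters out of the string, has the advantage that malformed values raise an error instead of passing through silently. Note that `%d` writes single-digit days with a leading zero.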

10) Sort the data in descending order of the variable that captures information on population in 1995.
(1 Mark)

11) What was the lowest magnitude of malaria area (percentage) for every country across the
four years: 1946, 1966, 1982 and 1994? Construct a variable containing this value for
each country. (3 Marks)
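
The quantity asked for here is a row-wise minimum: one value per country, taken across the four year-specific columns. A minimal Python sketch follows, assuming hypothetical country labels and values and treating `None` as missing; `row_min` is a hypothetical helper.

```python
from typing import Optional, Sequence


def row_min(values: Sequence[Optional[float]]) -> Optional[float]:
    """Smallest non-missing value in a row, or None if all are missing."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


# Hypothetical rows: malaria area (percent) in 1946, 1966, 1982 and 1994.
rows = {
    "A": (80.0, 60.0, 55.0, 58.0),
    "B": (10.0, None, 0.0, 0.0),
    "C": (None, None, None, None),
}

min_malaria = {country: row_min(values) for country, values in rows.items()}
print(min_malaria)
```

The sketch ignores missing years and returns a missing result only when all four years are missing. Whichever tool is used, it is worth checking how it treats missing values in a row-wise minimum.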


Part B (30 Marks)


You are provided with an extract from a paper about a Randomized Controlled Trial (RCT)
designed to investigate the returns to education. This paper was published in a top economics
journal ten years ago. The extract has been slightly modified for this assignment.
Answer the following questions:
1. In your own words, state what Table I tells us. (5 Marks)
2. What is the ‘treatment’ whose impact we are trying to measure and what are the outcomes
of interest? In your own words, briefly describe the theory of change linking the
treatment with the outcome. (5 Marks)
3. Table V on the last page of the paper’s extract reports regression estimates from this
randomized experiment (the RCT). Interpret the estimates reported in columns
1-4 of Table V, making sure to comment on their magnitude and significance. Can we
interpret these estimates as the average effect (i.e., causal effect) of our treatment? What
do we learn from this experiment? (10 Marks)
4. Write a short 150-200 word abstract for this paper based on the extract provided to you.
(5 Marks)
5. Write a brief one-paragraph conclusion for this paper. (5 Marks)


Part C
1. To test the effectiveness of a job training program on the subsequent wages of workers,
we specify the model:
$\log(wage) = \beta_0 + \beta_1 train + \beta_2 educ + \beta_3 exper + u$,
where train is a binary variable equal to unity if a worker participated in the program.
Think of the error term $u$ as containing unobserved worker ability. If less able workers
have a greater chance of being selected for the program, and you use an OLS analysis,
what can you say about the likely bias in the OLS estimator of $\beta_1$? (Hint: Refer back to
Chapter 3.) (2 Marks)
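
The selection mechanism described in this question can be explored with a small simulation. The sketch below is a simplified illustration under stated assumptions, not the model above: it drops educ and exper, assumes a true training effect of 0.10 and a positive return to ability of 0.50, and makes selection depend negatively on ability. All numbers and function names are hypothetical.

```python
import random

random.seed(330)

TRUE_BETA1 = 0.10      # assumed true effect of training on log(wage)
ABILITY_EFFECT = 0.50  # assumed positive return to ability, hidden in u


def simulate(n: int = 5000):
    """Draw (train, log_wage) pairs where less able workers train more often."""
    data = []
    for _ in range(n):
        ability = random.gauss(0, 1)
        # Selection rule: low ability (plus noise) leads to participation.
        train = 1 if ability + random.gauss(0, 1) < 0 else 0
        u = ABILITY_EFFECT * ability + random.gauss(0, 0.2)
        log_wage = 1.5 + TRUE_BETA1 * train + u
        data.append((train, log_wage))
    return data


def ols_slope_on_dummy(data):
    """With one binary regressor, the OLS slope is the difference in means."""
    treated = [y for d, y in data if d == 1]
    control = [y for d, y in data if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


estimate = ols_slope_on_dummy(simulate())
print("OLS estimate:", round(estimate, 3), "assumed true effect:", TRUE_BETA1)
```

Comparing the printed estimate with the assumed true effect shows the direction in which the omitted ability term pushes the OLS coefficient in this setup.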

2. For a child $i$ living in a particular school district, let $voucher_i$ be a dummy variable equal
to one if a child is selected to participate in a school voucher program, and let $score_i$ be
that child’s score on a subsequent standardized exam. Suppose that the participation
variable, $voucher_i$, is completely randomized in the sense that it is independent of both
observed and unobserved factors that can affect the test score.
(i) If you run a simple regression of $score_i$ on $voucher_i$ using a random sample of size
$n$, does the OLS estimator provide an unbiased estimator of the effect of the
voucher program? (2 Marks)
(ii) Suppose you can collect additional background information, such as family
income, family structure (e.g., whether the child lives with both parents), and
parents’ education levels. Do you need to control for these factors to obtain an
unbiased estimator of the effects of the voucher program? Explain. (2 Marks)
(iii) Why should you include the family background variables in the regression? Is
there a situation in which you would not include the background variables?
(2 Marks)
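
Randomized assignment can also be examined by simulation. The following sketch assumes a true voucher effect of 5 points and a single background factor (labelled income) that affects scores but plays no role in assignment. It repeats the simple regression over many samples and averages the estimates. All names and numbers are hypothetical.

```python
import random

random.seed(2023)

TRUE_EFFECT = 5.0  # assumed true effect of the voucher on the test score


def one_estimate(n: int = 200) -> float:
    """Simple-regression slope (difference in means) from one random sample."""
    treated, control = [], []
    for _ in range(n):
        income = random.gauss(0, 1)       # background factor affecting score
        voucher = random.random() < 0.5   # assigned independently of income
        score = 50 + TRUE_EFFECT * voucher + 10 * income + random.gauss(0, 10)
        (treated if voucher else control).append(score)
    return sum(treated) / len(treated) - sum(control) / len(control)


# Repeat the experiment many times and average the estimates.
estimates = [one_estimate() for _ in range(500)]
mean_estimate = sum(estimates) / len(estimates)
spread = max(estimates) - min(estimates)

print("mean of estimates:", round(mean_estimate, 2))
print("range of estimates:", round(spread, 2))
```

The mean of the estimates speaks to bias, while the range across samples speaks to precision. The two are separate properties, which is relevant to parts (ii) and (iii).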


3. Using the dataset HPRICE2.dta, we are going to explore the implications of using
different functional forms in regression analysis. Let’s start with a simple bivariate
relation between air quality (nox) and housing price (price).
a. Report the units in which each variable is measured. Estimate and interpret the
simple regression of price on nox. (4 Marks)
b. Looking at the density of each variable and its logarithm, the scatterplot (see Figure
1 below), as well as any guidance from Chapter 6, what functional form would
you like to use when specifying this regression model? Write down and estimate
your preferred regression equation. Interpret the OLS estimates. (4 Marks)

c. We estimated four regressions involving the logarithm of either or both variables.
The regression estimates are reported below. In each model,
interpret the slope of price with respect to nox. Are these slope estimates very
different from each other? (6 Marks)
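
As a reminder, the standard textbook reading of the slope under each combination of levels and logarithms can be written as follows. This is a generic summary, with $\beta_1$ standing for the slope of whichever model is being read (each model has its own $\beta_1$); it does not use the assignment's estimates.

```latex
\begin{align*}
\text{level-level:} \quad & price = \beta_0 + \beta_1\, nox + u,
  & \Delta price &= \beta_1\, \Delta nox \\
\text{level-log:} \quad & price = \beta_0 + \beta_1 \log(nox) + u,
  & \Delta price &\approx (\beta_1 / 100)\, \%\Delta nox \\
\text{log-level:} \quad & \log(price) = \beta_0 + \beta_1\, nox + u,
  & \%\Delta price &\approx (100\, \beta_1)\, \Delta nox \\
\text{log-log:} \quad & \log(price) = \beta_0 + \beta_1 \log(nox) + u,
  & \%\Delta price &\approx \beta_1\, \%\Delta nox
\end{align*}
```

Because the four slopes are expressed in different units, they need to be converted to a common basis before they can be compared.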


d. Produce a graph in Stata that overlays the sample regression functions from our
log-log model and lin-lin model on the scatterplot of price against nox. (Hint: use
the “graph twoway” and “graph function” commands and learn the correct use of
‘range’ option in the latter.) (5 Marks)
e. Estimate second-order (quadratic) and third-order (cubic) polynomials in
nox. Is the cubic term, $nox^3$, statistically significant? Overlay the two estimated
regression functions in a chart like the one in (d). (3 Marks)
f. Do we know the true functional form of the price-nox relation in the population?
(1 Mark)
g. If the answer to (f) is ‘No’, are we justified in using a polynomial function in x to
approximate the unknown (potentially non-linear) y-x relation? [Hint: In calculus,
we study functions of single variables y = f(x). There are results in calculus (e.g.,
Taylor's theorem or the Taylor series expansion) which allow us to approximate
arbitrarily complex non-linear functions, f(x), with a sum of polynomials in x, g(x),
by finding and computing the derivatives of f(x) (e.g., we saw the example of a
linear approximation to the natural-log function in class).] (1 Mark)
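
The idea in the hint can be checked numerically. The sketch below uses the Taylor polynomial of log(1 + x) around x = 0 and compares first-, second- and third-order approximations at x = 0.1. The evaluation point and the function name `log1p_taylor` are arbitrary choices made for illustration.

```python
import math


def log1p_taylor(x: float, order: int) -> float:
    """Taylor polynomial of log(1 + x) around x = 0, up to the given order."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, order + 1))


x = 0.1  # a point close to the expansion point
exact = math.log(1 + x)

# Absolute approximation error for polynomials of order 1, 2 and 3.
errors = {order: abs(exact - log1p_taylor(x, order)) for order in (1, 2, 3)}

for order, err in errors.items():
    print(f"order {order}: error {err:.6f}")
```

The first-order case is the linear approximation log(1 + x) ≈ x. Repeating the exercise at a point further from zero shows how the quality of the approximation depends on the distance from the expansion point.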
h. Estimate a tenth-order polynomial in nox and produce a chart showing the estimated
function along with the quadratic function and scatter plot (as done in (d) above).
Describe the second- and tenth-order polynomial function estimates. How and why
are these two regressions different when estimated in the same sample?
(6 Marks)
i. Is there a sense in which you may have “overfit” your sample data by adding too
many non-linear terms in the tenth-order polynomial? (1 Mark)
j. Estimate a regression of price on nox, rooms, dist, crime and stratio. Which of the
explanatory variables has the biggest effect on housing price? Confirm your
answer by computing the standardized beta coefficients. (Hint: use the option
‘beta’ in the regress command) (3 Marks)
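
A standardized beta coefficient rescales a slope by the ratio of the regressor's standard deviation to the outcome's standard deviation. The sketch below illustrates the rescaling in the one-regressor case only, using hypothetical data that are not taken from HPRICE2.dta. In that special case the standardized slope equals the sample correlation, which provides a simple check.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data, not taken from HPRICE2.dta.
x = [4.0, 5.0, 5.5, 6.0, 7.5, 8.0]
y = [30.0, 27.0, 25.0, 24.0, 18.0, 17.0]


def ols_slope(xs, ys):
    """Slope of the simple regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def correlation(xs, ys):
    """Sample correlation coefficient between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


slope = ols_slope(x, y)
# Standardized coefficient: standard deviations of y per standard deviation of x.
beta_std = slope * stdev(x) / stdev(y)

print("raw slope:", round(slope, 3))
print("standardized slope:", round(beta_std, 3))
print("correlation:", round(correlation(x, y), 3))
```

In a multiple regression the same rescaling applies coefficient by coefficient, but the standardized values no longer equal simple correlations.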
