Using Stata Chapter 3

Chapter 3: Using Econometrics (UE) Chapter 3

While Chapter 3 of Using Econometrics covers quite a bit of ground, the main new
Stata skill it motivates is working with dummy variables.

3.1 Dummy Variables and Conditional Statements

Dummy variables (more formally called “dichotomous variables”) are as useful as
they are simple. Recall that a dummy variable is one that can take on only two
values: zero or one.

There is nothing particularly special about dummy variables in Stata. We give them
special treatment here because we quite often have to “construct” them. In Stata, we
can do so with “conditional” statements. Before I confuse things too much, let’s
consider an example.

Recall the Woody’s restaurant example from Using Econometrics section 3.2. The
initial model to estimate gross sales (Y) took the following form:

$$Y_i = \beta_0 + \beta_N N_i + \beta_P P_i + \beta_I I_i + \varepsilon_i$$

where:

Y = Sales: gross sales (in dollars) of the ith Woody’s location
N = Competition: the number of direct market competitors within a two-
mile radius of the ith Woody’s location
P = Population: the number of people living in a three-mile radius of the
ith Woody’s location
I = Income: the average household income of the population
measured in variable P

While having the actual number of direct market competitors in a two-mile radius is
ideal, suppose your boss asks you to specify and estimate the following model:

$$Y_i = \beta_0 + \beta_N D_i + \beta_P P_i + \beta_I I_i + \varepsilon_i$$

where:
D = Competition: Dummy variable = 1 if there are four or more
competitors within a two-mile radius of the Woody’s
location and 0 otherwise.

D is a dummy variable that is “turned on” if a location has more than three
competitors in a two-mile radius and “turned off” if a location has fewer than four.
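A short worked comparison shows what “turning on” D does in this model. Assuming, as usual, that the error term has an expected value of zero given the regressors, fix P and I and compare a location with D = 1 to one with D = 0:

```latex
E[Y_i \mid D_i = 0, P_i, I_i] = \beta_0 + \beta_P P_i + \beta_I I_i
E[Y_i \mid D_i = 1, P_i, I_i] = (\beta_0 + \beta_N) + \beta_P P_i + \beta_I I_i
E[Y_i \mid D_i = 1, P_i, I_i] - E[Y_i \mid D_i = 0, P_i, I_i] = \beta_N
```

In words, the dummy shifts the intercept: β_N is the expected difference in sales between a location with four or more competitors and an otherwise similar location (same P and I) with three or fewer.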



Why would anyone do this? There could be a number of reasons. It could be
expensive to get an exact number of competitors in a two-mile radius when there
are lots of competitors but relatively easy to count the first four.

Whatever the reason, let’s go with it as it makes for a nice example.

First, we replicate the results from Using Econometrics, Chapter 3, Table 3.2 (the
data set WOODY3.dta is available at
http://www.pearsonhighered.com/studenmund). The command in Stata is:

regress Y N P I

But now we want to estimate the model our boss asked for (always a good idea!)—
the one replacing N with D.

Unfortunately, we don’t have D in the data set!

But, we know how to create a variable in Stata. Recall the “generate” command
introduced in Using Stata Chapter 1. To create D, however, we need to go a bit
beyond “generate.” This is where conditional statements come in handy.

First, let’s initialize the variable D in Stata. By “initialize” I mean create the variable
in Stata without giving it any meaningful value yet. Instead, we will give it a
value of “missing,” which Stata writes as a period. That is, we set the variable equal to “.”:

generate D = .

Stata will report that you have generated 33 missing values. But that’s OK. Let’s keep
going.

We want D to have a value of zero or one. Let’s start with zero. We want D to be zero
if N takes a value of 3, 2, 1, or 0. An easy mathematical way to do this is to tell Stata
to make D zero if N is less than or equal to 3. The command in Stata:

replace D = 0 if N<=3

We have two new pieces of syntax here. The first is “replace,” a command that tells Stata, quite
simply, to replace the value of variable D. The second is “if,” a qualifier that tells Stata when to actually do the
replacement. (Note: the “if” qualifier will come in very handy as you work with
data!) In our case, we want Stata to replace “.” with 0 if N is less than or equal to 3.

We should also mention the “<=” part of the command. That is read as “less than or
equal to.” The math character for “less than or equal to” would look like this: ≤. Stata
does not have that character, so we have to write “<=”. The same goes for “greater
than or equal to” (i.e. we use “>=” rather than “≥”).
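The same approach extends to Stata’s other relational and logical operators. The sketch below uses a tiny made-up data set (the values of N are hypothetical, not the WOODY3 data) so it runs on its own. Note in particular that testing for equality takes a double equals sign, “==”; a single “=” is reserved for assignment in commands such as generate and replace, and Stata does not accept it inside an if condition.

```stata
* Hypothetical toy data: five values of N (not from WOODY3.dta)
clear
input N
0
3
4
6
9
end

* Equality uses ==, not a single =
count if N == 3
* Not equal: != (Stata also accepts ~=)
count if N != 3
* Strict inequalities
count if N < 4
count if N > 6
* | means "or" and & means "and"
count if N < 4 | N > 6
count if N >= 4 & N <= 6
```

Each count command reports how many observations satisfy its condition in the Results window.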



Taking the same approach, we would want to replace D with 1 if N is greater than or
equal to 4. The command:

replace D = 1 if N>=4
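One caution about these commands: Stata treats a numeric missing value (“.”) as larger than any number, so a condition such as N>=4 is true for an observation whose N is missing. If any location had a missing N, the code above would quietly code it as D = 1. The sketch below demonstrates this on hypothetical toy data; D_safe is a hypothetical name for a version that adds !missing(N) to the condition.

```stata
* Hypothetical toy data: the third observation has a missing N
clear
input N
3
5
.
end

* The approach from the text
generate D = .
replace D = 0 if N<=3
replace D = 1 if N>=4

* Safer version: only assign 1 when N is actually observed
generate D_safe = .
replace D_safe = 0 if N<=3
replace D_safe = 1 if N>=4 & !missing(N)

* Row 3 shows D = 1 but D_safe still missing
list N D D_safe
```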

We can now estimate the new model as requested. Use the command:

regress Y D P I

The results can be found in Figure 3.1.

FIGURE 3.1

You can think about how this compares to the original specification and which is
“better.” The big idea for now is how to create and use a dummy variable.
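Because a relational expression such as (N>=4) evaluates to 1 when it is true and 0 when it is false, the generate/replace sequence can usually be collapsed into a single line. The sketch below checks that the two approaches agree on hypothetical toy data; D_alt is a hypothetical name chosen so it does not overwrite D, and the if !missing(N) qualifier leaves D_alt missing, rather than 0, for any observation with a missing N.

```stata
* Hypothetical toy data (not WOODY3.dta)
clear
input N
1
3
4
8
end

* Three-step version from the text
generate D = .
replace D = 0 if N<=3
replace D = 1 if N>=4

* One-line version: (N>=4) is 1 when true and 0 when false
generate D_alt = (N>=4) if !missing(N)

* Stops with an error if the two versions ever disagree
assert D_alt == D
```

The longer version makes each condition explicit, which is helpful while you are learning; the one-line version is less to type and harder to get out of sync.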

But we shouldn’t stop here. In the current example, we had two groups or
categories. What if we wanted three? Say one category for Woody’s locations with
three or fewer competitors, another for locations with four to six, and another
for locations with seven or more competitors. If we have three groups, we’d want to create two
dummy variables. Let’s create D4to6 and D7orMore,

where:

D4to6: dummy variable = 1 if there are four, five or six
competitors within a two-mile radius of the Woody’s
location and 0 otherwise.



D7orMore: dummy variable = 1 if there are seven or more
competitors within a two-mile radius of the Woody’s
location and 0 otherwise.

Note that the omitted condition is three or fewer competitors in a two-mile radius.
This is when D4to6 and D7orMore are both equal to zero.
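Why two dummies rather than three? Every location falls into exactly one of the three groups, so a dummy for each group would sum to 1 for every observation. That makes the three dummies perfectly collinear with the regression constant (often called the “dummy variable trap”), and Stata’s regress would have to drop one of them. The sketch below checks the sum on hypothetical toy data. D3orFewer is a hypothetical name used only for this check, and each (condition) in parentheses evaluates to 1 when true and 0 when false.

```stata
* Hypothetical toy data (not WOODY3.dta)
clear
input N
2
3
5
6
7
9
end

* One 0/1 dummy per group
generate D3orFewer = (N<=3) if !missing(N)
generate D4to6     = (N>=4 & N<=6) if !missing(N)
generate D7orMore  = (N>=7) if !missing(N)

* Every location is in exactly one group, so the dummies always sum to 1
assert D3orFewer + D4to6 + D7orMore == 1
```

Once D4to6 and D7orMore exist in the WOODY3 data, the three-category model would be estimated with regress Y D4to6 D7orMore P I, and each dummy’s coefficient is read relative to the omitted group of three or fewer competitors.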

Creating these in Stata is not difficult but does require a few lines of code and will
call upon more conditional statements. Let’s start with D4to6. First, let’s initialize:

generate D4to6 = .

Then we need to assign when D4to6 is 0 and when it is 1. This is not hard but can be
tricky. First, make D4to6 = 0 if there are three or fewer competitors.

replace D4to6 = 0 if N<=3

Next, assign D4to6 to be 0 if there are seven or more competitors.

replace D4to6 = 0 if N>=7

So far, so good. Now, we need to assign when D4to6 should be one.

replace D4to6 = 1 if N>=4 & N<=6

This last statement is a bit more complicated and we have used the symbol “&”. In
this statement we are saying “replace D4to6 to be equal to 1 if N is greater than or
equal to four AND if N is less than or equal to 6.”
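Stata’s built-in inrange() function expresses the same “between” condition: inrange(N, 4, 6) is 1 when 4 ≤ N ≤ 6 and 0 otherwise. The sketch below checks that it agrees with the & version on hypothetical toy data with no missing values; D4to6_alt is a hypothetical name, and the if !missing(N) qualifier is added as a precaution.

```stata
* Hypothetical toy data (not WOODY3.dta), no missing values
clear
input N
3
4
6
7
end

* The approach from the text
generate D4to6 = .
replace D4to6 = 0 if N<=3
replace D4to6 = 0 if N>=7
replace D4to6 = 1 if N>=4 & N<=6

* Same condition written with inrange()
generate D4to6_alt = inrange(N, 4, 6) if !missing(N)

* Stops with an error if the two versions disagree
assert D4to6 == D4to6_alt
```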

This scratches the surface of how useful conditional statements can be in Stata. We’ll
call upon these conditional statements as we move along, as you can use them with
any type of variable—not just dummies.

It is ALWAYS good practice, especially when you are relatively new at programming,
to take a few minutes and visually check to see if Stata is doing what you want it to
do. In this case, we want to make sure that D4to6 is 1 when it is supposed to be and
0 when it is supposed to be.

There are a number of ways to check. A low-brow (but very effective!) approach is
to open up the data editor and inspect the data manually (recall how to open your Stata data
in the data editor: you click on its icon, or type browse or edit in the Command window).

Figure 3.2 shows the first 10 rows in the dataset.

FIGURE 3.2



Checking a few observations reassures us that we are getting what we want. For
example, the Woody’s location in the first row has three competitors in a two-mile
radius (you know this because N = 3). With three competitors, the D4to6 variable
should be 0. Nicely, it is.

For the second observation, N = 5. That means D4to6 should “turn on.” And, in fact, it
does.
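Eyeballing rows works well for a few observations, but a few commands can check every observation at once. The sketch below builds D4to6 on hypothetical toy values of N (not the WOODY3 data) so it runs on its own; with WOODY3 loaded and D4to6 already created, you would skip the toy data and run only the checking commands at the end.

```stata
* Hypothetical toy data standing in for WOODY3.dta
clear
input N
3
5
0
7
4
9
end

generate D4to6 = .
replace D4to6 = 0 if N<=3
replace D4to6 = 0 if N>=7
replace D4to6 = 1 if N>=4 & N<=6

* Print observations to the Results window
list N D4to6
* Cross-tabulate: each value of N should appear under only one value of D4to6
tabulate N D4to6, missing
* Each assert stops with an error if any observation is coded incorrectly
assert D4to6 == 1 if N>=4 & N<=6
assert D4to6 == 0 if N<=3 | N>=7
assert !missing(D4to6)
```

The assert commands are silent when the data pass, which makes them handy to leave in a do-file as a permanent check.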

We can now move on to creating the D7orMore variable. The approach would be the
same as before. First, initialize the variable and give it a value of “missing.” Second,
replace the value of D7orMore with 0 or 1 depending on the condition.

Can you come up with the code? You should try. After you do, check your answers in
the footnote below.1

Dummy variables are useful. Conditional statements in Stata are useful. And we will
see more as we move along.
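For groupings like these, Stata can also build the whole set of dummies in two steps: recode sorts N into categories, and tabulate with the generate() option creates one 0/1 variable per category. A sketch on hypothetical toy data; Ncat and the stub Dcat are hypothetical names, and the cut points follow the three groups used in this chapter.

```stata
* Hypothetical toy data (not WOODY3.dta)
clear
input N
1
3
4
6
7
10
end

* Group N: 0 to 3 -> 0, 4 to 6 -> 1, 7 and above -> 2
recode N (0/3 = 0) (4/6 = 1) (7/max = 2), generate(Ncat)

* One dummy per category: Dcat1 (0-3), Dcat2 (4-6), Dcat3 (7 or more)
tabulate Ncat, generate(Dcat)

list N Ncat Dcat1 Dcat2 Dcat3
```

Here Dcat2 plays the role of D4to6 and Dcat3 the role of D7orMore. Factor-variable notation (for example, regress Y i.Ncat P I) goes one step further and creates the indicators on the fly, using the lowest category as the omitted base by default.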

1
gen D7orMore = .
replace D7orMore = 0 if N<=6
replace D7orMore = 1 if N>=7

