ECON 310 Stata Assignment

Uploaded by

sxiang23

Assignment questions

1)
a)

There are 19 variables and 72 observations.

b)

The median household income data appear right-skewed but otherwise approximately normal.
c)

Mean: 55833.25
Standard deviation: 9428.312
95th percentile: 82627

d)

The 90% confidence interval for the mean of median household income runs from 53981.42 to 57685.07, which means
that we are 90% confident this range contains the true population mean. The standard error is 1111.137, calculated
as the sample standard deviation divided by the square root of the sample size:

SE = s / √n = 9428.312 / √72 ≈ 1111.137

The standard error measures the variability of a sample statistic, in this case the sample mean. It is smaller than
the sample standard deviation because it describes the variability of the sample mean rather than of individual
data points.
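
As a quick check on the arithmetic, the standard error and the 90% interval can be recomputed from the summary statistics reported in (c). This is a minimal sketch that assumes n = 72 from (a) and uses the cii means form available in Stata 14 and later (the same generation of syntax as the ci means call in the appendix). It does not need the dataset in memory.

```stata
* Standard error of the mean: s / sqrt(n)
display 9428.312 / sqrt(72)
* Critical t value for a 90% two-sided interval with n - 1 = 71 degrees of freedom
display invttail(71, 0.05)
* 90% confidence interval computed from the summary statistics alone
cii means 72 55833.25 9428.312, level(90)
```

The interval is the sample mean plus or minus the critical value times the standard error, so the three lines should agree with each other and with the output of ci means.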
e)

At a significance level (α) of 0.10, the test statistic is 5.2498, which is very extreme and indicates strong
evidence against the null hypothesis that the mean median household income equals 50000, in favor of the
alternative hypothesis that the mean differs from 50000. The p-value is the probability of obtaining results at
least as extreme as those in the sample, assuming that the null is true. Here the p-value is reported as 0.0000,
which is below the chosen significance level of 0.10, so we reject the null. The 90% confidence interval confirms
this finding, since its lower bound of 53981.42 and upper bound of 57685.07 do not contain 50000.

At a significance level of 0.05, we again have strong evidence against the null hypothesis that the mean equals
50000. The test statistic is still 5.2498, and the p-value of 0.0000 is below the chosen significance level of 0.05.
The 95% confidence interval has a lower bound of 53617.71 and an upper bound of 58048.79, which does not include
50000 either. Therefore, we reject the null.
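
The test statistic and p-value can be reproduced from the reported mean, standard deviation, and sample size. This is a hedged sketch rather than the original workflow: it uses Stata's display calculator and the immediate command ttesti, and assumes n = 72 with no missing values for this variable.

```stata
* t statistic: (sample mean - hypothesized mean) / standard error
display (55833.25 - 50000) / (9428.312 / sqrt(72))
* Two-sided p-value with 71 degrees of freedom
display 2 * ttail(71, 5.2498)
* Same one-sample test run from summary statistics
ttesti 72 55833.25 9428.312 50000, level(90)
```

Changing level(90) to level(95) changes only the confidence interval that is printed; the t statistic and p-value stay the same, which is why both conclusions above rest on the same 5.2498.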
f)

The correlation coefficient is 0.3030, which indicates a weak-to-moderate positive relationship between median
household income and the high school graduation rate. In other words, there is a weak tendency for the two
variables to move together.
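
To judge whether a correlation of this size is distinguishable from zero, one option is pwcorr, which can print a p-value and the number of observations next to each coefficient. This is a sketch that assumes the assignment dataset is already loaded, with the variable names used in the appendix.

```stata
* Pairwise correlation with its significance level and the observation count
pwcorr medianhouseholdincome hsgraduationrate, sig obs
```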

g)

In this graph, HS graduation rate and median household income tend to move together, but the correlation is not
obvious because the data points are scattered widely around the plot. The plot still shows a weak overall upward
trend.
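
A weak trend is often easier to see with a fitted line drawn over the points. The sketch below overlays a linear fit on the same scatter plot. It assumes the dataset is loaded and keeps the variable order from the appendix (y variable first, then x).

```stata
* Scatter plot with an overlaid least-squares line
twoway (scatter medianhouseholdincome hsgraduationrate) ///
       (lfit medianhouseholdincome hsgraduationrate)
```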

h)

In the regression equation y = β0 + β1 x + ε, β0 is the constant term in the output, which is -10911.9, and
β1 = 727.6879 is the coefficient on the HS graduation rate. Both coefficients have standard errors reported in the
table. For β1, the null hypothesis is that the population parameter equals 0 (no linear relationship), and the
alternative hypothesis is that it does not equal 0 (there is a linear relationship). The t test is done at the 5%
significance level; the associated 95% confidence interval is built so that intervals constructed this way contain
the true population parameter about 95% of the time. The t statistic for β1 is 2.66 with a p-value of 0.01 (< 0.05),
so there is statistically significant evidence against the null and in favor of a relationship between HS
graduation rate and median household income. The t statistic for β0 is -0.43 with a p-value of 0.665 (> 0.05), so
we do not have enough evidence to reject the null, and β0 is not statistically distinguishable from 0. The
R-squared value is 0.0918, which means the model fits the data poorly: R-squared is the proportion of the variance
in the dependent variable (median household income) explained by the independent variable (HS graduation rate), so
only about 9% of the variance is explained.
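
Two parts of this interpretation can be checked numerically. In a regression with a single explanatory variable, R-squared equals the squared correlation coefficient, so 0.3030 squared should reproduce the reported 0.0918. The sketch below also pulls the stored results after the regression. It assumes the dataset is loaded and that all 72 observations are non-missing for both variables, which gives 70 residual degrees of freedom.

```stata
* R-squared in a one-variable regression is the squared correlation
display 0.3030^2
* Critical value for a two-sided test at the 5% level with n - 2 = 70 degrees of freedom
display invttail(70, 0.025)
* Refit the model and read the stored results directly
regress medianhouseholdincome hsgraduationrate
display e(r2)
display _b[hsgraduationrate] / _se[hsgraduationrate]
```

The last line recomputes the t statistic for the slope as the coefficient divided by its standard error, which should match the 2.66 shown in the regression table.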

2)
a)

There is a moderate correlation between most pairs of variables. The strongest is between percent unemployed and
percent excessive drinking (-0.6986), and the weakest is between percent unemployed and percent uninsured (0.3722).

b)

I see that there is a positive correlation between percent uninsured and percent unemployed.

c)
After restricting the data to observations where percent uninsured < 15, the correlation coefficient increased. A
likely reason is that the removed data points deviated from the general trend among the observations where percent
uninsured < 15, so dropping them strengthens the correlation.
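
The comparison described here can be made explicit by running the correlation on both samples and counting how many observations the restriction drops. This is a sketch under the assumption that the dataset is loaded. The missing-value guard is there because Stata treats missing values as larger than any number.

```stata
* Correlation on the full sample
correlate percentunemployed percentuninsured
* Correlation on the restricted sample used in the appendix
correlate percentunemployed percentuninsured if percentuninsured < 15
* Number of non-missing observations excluded by the restriction
count if percentuninsured >= 15 & !missing(percentuninsured)
```

If only a handful of observations are excluded, the change in the coefficient is driven by a few influential points, which supports the explanation above.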

d)

From this regression, we can conclude that there is a positive association between percent unemployed and percent
adult smokers, holding the other regressors fixed. The coefficient is 0.08945 with a p-value of 0.006, which is
below 0.05, so we conclude that this coefficient is statistically different from 0, meaning that there is a
relationship between percent unemployed and percent adult smokers.

Additionally, there is a negative association between percent unemployed and percent excessive drinking. The
coefficient is -0.247 with a p-value of 0.000, which is also very small, so by the same reasoning we conclude that
there is a relationship between these two variables.

On the other hand, we cannot conclude that there is a relationship between percent unemployed and percent uninsured.
Although the output reports a coefficient of -0.002529, its p-value is 0.946, which is far too high for us to
reject the null that the coefficient equals 0.

The regression also reports a constant term of 8.1794, which is the predicted value of percent unemployed when all
three explanatory variables are 0. Its p-value is 0.000, so the constant term is statistically significant: the
model predicts a positive unemployment rate even when all 3 variables are 0.
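
How much weight the constant term can bear depends on whether values near 0 actually occur in the data for the three explanatory variables. The sketch below checks their observed ranges and then tests the uninsured coefficient explicitly. It assumes the dataset is loaded and uses the variable names from the appendix.

```stata
* Observed ranges of the explanatory variables (is 0 anywhere near the data?)
summarize percentadultsmokers percentexcessivedrinking percentuninsured
* Refit the model, then test H0: coefficient on percentuninsured = 0
regress percentunemployed percentadultsmokers percentexcessivedrinking percentuninsured
test percentuninsured
```

If the minimum of each variable is well above 0, the constant is an extrapolation outside the observed data and is best read as an anchor for the fitted plane rather than a realistic unemployment rate.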
APPENDIX I (STATA DO-FILE CODE)
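* Steven Xiang
* Question 1: median household income
describe
histogram medianhouseholdincome
summarize medianhouseholdincome, detail
* 90% confidence interval for the mean
ci means medianhouseholdincome, level(90)
* Two-sided t tests of H0: mean = 50000 (Stata's documented syntax uses ==)
ttest medianhouseholdincome == 50000, level(90)
ttest medianhouseholdincome == 50000, level(95)
* Relationship with the high school graduation rate
correlate medianhouseholdincome hsgraduationrate
scatter medianhouseholdincome hsgraduationrate
regress medianhouseholdincome hsgraduationrate
* Question 2: unemployment and health-related variables
correlate percentadultsmokers percentexcessivedrinking percentuninsured percentunemployed
* Scatter plot and correlation restricted to percentuninsured < 15
scatter percentunemployed percentuninsured if percentuninsured < 15
correlate percentunemployed percentuninsured if percentuninsured < 15
* Multiple regression of percent unemployed on the other three variables
regress percentunemployed percentadultsmokers percentexcessivedrinking percentuninsured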
