
Stata Assignment - Bryson Shelist

This document summarizes the results of statistical analyses conducted in Stata on a dataset with 72 observations and 19 variables. Key findings include: median household income most commonly falls between $50,000 and $60,000; the 90% confidence interval for the mean of median household income is $53,981 to $57,685; the mean income is $55,833 with a standard deviation of $9,428; and high school graduation rate has a weak positive correlation of 0.3030 with median income, with each 1 percentage point rise in graduation rate associated with about $730 more income. Correlations between health variables such as smoking, drinking, and insurance coverage were also calculated.

Uploaded by

api-532083379

Stata Assignment

Step 3: Assignment Questions 1

A. There are 72 observations and 19 variables. The output of the describe command in Stata is
shown below. The variables measured include flu vaccinations, high school graduation,
and alcohol-impaired driving.

B. Histogram of median household income. The most common income range is between
$50,000 and $60,000.
C. Mean 55833.25, Standard deviation 9428.312, 95th percentile 76551
D. The 90% confidence interval for the mean of median household income is (53,981.43, 57,685.07).
a. These results show that we can be 90% confident that the population mean of
median household income lies within this interval.
b. The standard error is 1111.137
i. The standard error of the mean measures how much sample means vary
around the population mean
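
The interval in part D can be rebuilt from the summary statistics in part C. The lines below are a minimal sketch, assuming only the reported values (n = 72, mean 55833.25, standard deviation 9428.312); the scalar names are illustrative and not part of the original do-file.

```stata
* Rebuild the 90% interval for the mean from the reported summary statistics.
scalar n_obs   = 72
scalar xbar    = 55833.25
scalar s_dev   = 9428.312

* Standard error of the mean: s / sqrt(n)
scalar se_mean = s_dev / sqrt(n_obs)

* Upper 5% point of the t distribution with n - 1 degrees of freedom
scalar t_crit  = invttail(n_obs - 1, 0.05)

display "standard error = " se_mean
display "lower bound    = " xbar - t_crit * se_mean
display "upper bound    = " xbar + t_crit * se_mean

* Immediate form of -ci means-, using the same three numbers
cii means 72 55833.25 9428.312, level(90)
```

Both routes should agree with the reported standard error of 1111.137 and the bounds 53,981.43 and 57,685.07, up to rounding.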

E. Use the ttest command to test whether the population mean of Median
Household Income is equal to $50,000 at the 10% significance level, then repeat the test
for a population mean of $55,000 at the 5% significance level.
a. At the 10% level, we reject the hypothesis that the mean equals $50,000. The sample
mean of 55,833.25 is more than five standard errors above $50,000, so the data
point to a mean greater than $50,000, not less.
b. At the 5% level, we fail to reject the hypothesis that the mean equals $55,000. The
two-sided p-value is 0.4558, and the one-sided p-values are 0.7721 (alternative:
mean less than $55,000) and 0.2279 (alternative: mean greater than $55,000). All
three exceed 0.05. These are p-values, not probabilities that the mean takes a
particular value.
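
Both tests in part E can be checked without the dataset by using Stata's immediate t test. This is a sketch that assumes the summary statistics reported in part C; it is not taken from the original do-file.

```stata
* Immediate one-sample t tests: ttesti #obs #mean #sd #hypothesized_value

* H0: mean = 50000, 10% size of test
ttesti 72 55833.25 9428.312 50000, level(90)
display "two-sided p-value = " r(p)

* H0: mean = 55000, 5% size of test
ttesti 72 55833.25 9428.312 55000, level(95)
display "t = " r(t) "   two-sided p-value = " r(p)
```

In each case the decision comes from comparing the two-sided p-value with the size of the test: reject when the p-value is smaller, and fail to reject otherwise.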

F. The correlation coefficient between median household income and high school
graduation rate, shown in the Stata screenshot below, is 0.3030.
a. This is a weak but positive correlation
b. The coefficient is 0.3030

G. Use the scatter command to plot high school graduation rate against
median household income. The densest cluster of points lies between 90% and 95%
graduation rate and between $50,000 and $60,000 median household income.
a. The correlation coefficient is 0.3030, which is considered a weak correlation
H. The screenshot provided shows how the independent variable (high school graduation rate)
is related to the dependent variable (median household income). Median household income
rises by about 730 dollars for each 1 percentage point increase in the high school
graduation rate. This is statistically significant because the p-value of 0.010 is less
than the significance level of 0.050. For this test, the significance level is 5%, so
we use the 95% confidence interval for the t statistics
a. T statistics: 2.66 (slope) and -0.43 (constant)
b. P values
i. The p-value of 0.010 for the slope allows us to reject the null hypothesis
that the slope is zero
c. The r-squared statistic is 0.0918
i. The r-squared value tells us that about 9.2% of the variance in the
dependent variable is explained by the independent variable.
d. The adjusted r-squared of 0.0789 corrects for the number of regressors and gives a
more conservative measure of fit
e. In conclusion, high school graduation rate has a statistically significant positive
association with median household income, although it explains only a small share of
the variation
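
With a single regressor, the regression statistics in part H follow directly from the correlation in part F. The sketch below assumes only the reported r = 0.3030 and n = 72; the scalar names are illustrative.

```stata
* Link the simple-regression output to the correlation coefficient.
scalar r_xy  = 0.3030
scalar n_obs = 72
scalar k_reg = 1

* R-squared equals the squared correlation when there is one regressor
scalar r2     = r_xy^2

* Adjusted R-squared penalizes for the number of regressors
scalar r2_adj = 1 - (1 - r2) * (n_obs - 1) / (n_obs - k_reg - 1)

* t statistic of the slope and its two-sided p-value (n - 2 degrees of freedom)
scalar t_slope = r_xy * sqrt(n_obs - 2) / sqrt(1 - r2)
scalar p_slope = 2 * ttail(n_obs - 2, abs(t_slope))

display "R-squared          = " %6.4f r2
display "Adjusted R-squared = " %6.4f r2_adj
display "t (slope)          = " %6.2f t_slope
display "p (slope)          = " %6.3f p_slope
```

The results should match the reported 0.0918, 0.0789, 2.66, and 0.010, with small differences in the last digit because r is rounded to four decimals.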

Step 4: Assignment Question 2


Investigate the relationship between smoking (Percent Adult Smokers), drinking (Percent
Excessive Drinking), uninsured (Percent Uninsured), and unemployment (Percent Unemployed).
A. The correlations between the four variables are shown in the screenshots.
a. There is a very weak correlation here between adult smokers and excessive
drinking
b. There is a somewhat stronger, but still weak, positive correlation between adult smokers and uninsured
c. There is a weak to moderate correlation between adult smokers and people
unemployed.
d. There is a weak negative correlation between excessive drinking and uninsured
e. There is a strong negative correlation between excessive drinking and
unemployed
f. There is a weak positive correlation between uninsured and unemployed
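
Labels such as "weak" and "strong" are easier to defend when each correlation comes with a significance level. As a sketch, assuming the assignment dataset is in memory with the variable names used in the do-file, pwcorr reports both:

```stata
* Pairwise correlations with p-values and the number of observations per pair.
* star(.05) marks correlations that are significant at the 5% level.
pwcorr percentadultsmokers percentexcessivedrinking ///
       percentuninsured percentunemployed, sig obs star(.05)
```

Unlike correlate, which drops any observation with a missing value in any listed variable, pwcorr uses all observations available for each pair, so the two commands can differ slightly when data are missing.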

B. Scatterplot between the unemployed and the uninsured


a. Yes, the graph shows something a little unusual
i. Nearly all of the data is concentrated below 15 percent uninsured, with a
single point above it
ii. The rest of the data looks reasonable, so it makes more sense
to redraw the plot with the sample restricted to exclude that point
b. The second part of the screenshot restricts the plot to values below 15 percent
uninsured, which excludes the single outlier. This makes the data easier to
read because the remaining points spread across the full plot area
C. If we remove all observations that have percent uninsured above 15, the correlation
between uninsured and unemployment rises from 0.3722 to 0.5434
a. Dropping the outlier reduces the number of observations used and makes the
correlation more strongly positive
D. Estimate the impact of smoking, drinking, and being uninsured on unemployment. Use
the reg command with unemployment as the dependent variable and the other
variables as the independent variables

a. The correlation coefficient measures how strong the relationship is between
two variables. Each regression coefficient instead measures the change in unemployment
associated with a one-unit change in that variable, holding the other variables constant
b. The coefficient on percent uninsured is negative, which indicates a negative partial
association, not the absence of one
c. The coefficient on percent excessive drinking is also negative and close to zero
d. The coefficient on adult smokers is positive
e. An increase of 1 percentage point in adult smokers is associated with
a 0.0894579 percentage point increase in persons unemployed
i. An increase of 1 percentage point in excessive drinking is associated with
a 0.0025289 percentage point decrease in unemployment
ii. For adult smokers, the coefficient is positive, which points to more
unemployment
iii. For excessive drinking, the coefficient is negative, which points to less
unemployment
iv. For percent uninsured, the coefficient is again negative, which points to
less unemployment
v. Whether each effect is statistically significant depends on comparing its
p-value with the significance level (or its t-statistic with the critical value),
not on comparing the t-statistic with the p-value
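
The significance check for the multiple regression in part D can be made explicit. The following is a sketch, assuming the assignment dataset is in memory with the variable names used in the do-file; the scalar names are illustrative.

```stata
regress percentunemployed percentuninsured percentexcessivedrinking percentadultsmokers

* t statistic for one coefficient: estimate divided by its standard error
scalar t_smoke = _b[percentadultsmokers] / _se[percentadultsmokers]

* Two-sided p-value, using the residual degrees of freedom from the regression
scalar p_smoke = 2 * ttail(e(df_r), abs(t_smoke))

display "t (adult smokers) = " t_smoke
display "p (adult smokers) = " p_smoke

* Joint test that all three slope coefficients are zero
test percentuninsured percentexcessivedrinking percentadultsmokers
```

A coefficient is significant at the 5% level when its p-value is below 0.05. The same two lines can be repeated for the other regressors by swapping in the variable name.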
Do-File:
*(Bryson Shelist)
describe
*(72 observations and 19 variables)
histogram medianhouseholdincome
*(results: (bin=8, start=36936, width=6020.25))
summarize medianhouseholdincome
*(mean: 55833.25, Standard Deviation: 9428.312)
summarize medianhouseholdincome, detail
*(the 95th percentile result was: 76551, largest: 82627,)
ci means medianhouseholdincome, level(90)
*(standard error is 1111.137)
ttest medianhouseholdincome == 50000, level(90)
ttest medianhouseholdincome == 55000, level(95)
correlate hsgraduationrate medianhouseholdincome
*(0.3030)
scatter hsgraduationrate medianhouseholdincome
regress medianhouseholdincome hsgraduationrate
cor percentadultsmokers percentexcessivedrinking percentuninsured percentunemployed
scatter percentunemployed percentuninsured
scatter percentunemployed percentuninsured if percentuninsured <15
cor percentunemployed percentuninsured if percentuninsured <15
*(correlation: 0.5434)
reg percentunemployed percentuninsured percentexcessivedrinking percentadultsmokers
