Stats Notes

Statistics lecture notes

Uploaded by

ozgur.dincer03
Copyright
© © All Rights Reserved

 Corollary: if we are sure that nothing in the error term is correlated with the x-var, we can interpret its coefficient.


 Exploratory data analysis:
o library(GGally) : ggpairs
o ggpairs(data_set)
o for selecting variables in ggpairs (load library(dplyr) for %>% and select):
 data_set %>% select(var1, var2, var3, …) %>% ggpairs()
o library(mosaic)
o inspect(data_set)
 Editing data:
o library(dplyr) : select
o Newdf <- select(data_set, var1, var2, var3, …)
o Adding a new column:
 mutate(data_set, new_var = var * 2) # or another function; assign the result to keep the new column
o Filtering data:
 filter(data_set, condition) # keeps the rows where the condition is TRUE
 Graphing to see the relation:
o library(ggplot2)
o Scatterplot: ggplot(data_set, aes(x = x_var, y = y_var)) + geom_point()
o Classic linear regression:
 model <- lm(y_var ~ x_var, data = data_set)
 summary(model)
 report the Multiple R-squared from the summary output.
 To compare models:
o library(stargazer)
o stargazer(model1, model2, type = "text", report = "vc*sp")
 Omitted Variable Bias:
o only occurs if the omitted variable x2 is related to y and x2 is related to x1
o The sign of the bias is the sign of the product of two relationships: the correlation between x1 and x2, and the effect of x2 on y
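The sign rule above can be checked with a small simulation. This is only a sketch in base R: the data are simulated and every number in it (sample size, true coefficients) is an assumption chosen for illustration.

```r
# Simulated data (illustrative): x2 is related to both x1 and y
set.seed(42)
n  <- 5000
x2 <- rnorm(n)                          # the variable we will omit
x1 <- 0.6 * x2 + rnorm(n)               # x1 and x2 are positively related
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)    # x2 has a positive effect on y

full  <- lm(y ~ x1 + x2)                # correct model
short <- lm(y ~ x1)                     # x2 omitted

coef(full)["x1"]                        # close to the true value 2
coef(short)["x1"]                       # biased upward: (+) * (+) gives a positive bias

# Bias = (effect of x2 on y) * (slope from regressing x2 on x1)
delta <- coef(lm(x2 ~ x1))["x1"]
bias  <- unname(coef(full)["x2"] * delta)
bias                                    # equals the gap between the two x1 coefficients
```

Flipping the sign of either relationship (for example `x1 <- -0.6 * x2 + rnorm(n)`) should flip the sign of the bias.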
 Interpreting categorical variables:
o all coefficients that R reports for a categorical variable are interpreted relative to the reference category, the level that is not shown in the output.
 Analyzing models with heteroskedasticity-robust standard errors:
o make your model (model1)
o use summary to get R2: summary(model1)
o Use the coefficient test to get robust standard errors (the coefficients themselves do not change):
 library(sandwich)
 library(lmtest)
 coeftest(model1, vcov = vcovHC, type = "HC1")
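To see what the HC1 correction does, the robust covariance matrix can be computed by hand. The sketch below uses only base R and simulated data (all values are made up for illustration); it applies the standard HC1 sandwich formula and is not taken from the notes.

```r
# Simulated data with heteroskedastic errors (illustrative only)
set.seed(1)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)   # error spread grows with x

model1 <- lm(y ~ x)

# HC1 sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n / (n - k)
X <- model.matrix(model1)
e <- residuals(model1)
k <- ncol(X)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)                   # X' diag(e^2) X
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread

classic_se <- sqrt(diag(vcov(model1)))
robust_se  <- sqrt(diag(vcov_hc1))
round(cbind(estimate = coef(model1), classic_se, robust_se), 4)
```

Only the standard errors differ between the two columns; the estimates are the same ones that lm reports.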
 Interpreting interaction terms class 6:
o if one of the terms is a dummy variable then use scenarios or eyeball it.
o to see the marginal effect of a variable at each value of the dummy, use:
 library(margins)
 margins(model1, variables = "var1", at = list(dummy = c(0, 1)))
 Quadratic Models:
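o use when the ggpairs plots suggest a curved (quadratic) relation between the variables
o A significant quadratic term, usually together with the linear term, means that you need a quadratic function
o model1 <- lm(y_var ~ x_var + I(x_var^2), data = data_set)
o Interpreting the coefficient of the quadratic term:
 take the partial derivative with respect to x-var: with b1 the linear and b2 the quadratic coefficient, the marginal effect is b1 + 2*b2*x-var, so the effect of x-var shrinks to zero and changes sign at x-var = -b1/(2*b2).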
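The derivative step can be sketched in base R. The data below are simulated, and the true curve (peak at x = 5) is an assumption picked so the result is easy to check.

```r
# Simulated data with a known turning point at x = 5 (illustrative only)
set.seed(7)
x_var <- runif(300, 0, 10)
y_var <- 5 + 4 * x_var - 0.4 * x_var^2 + rnorm(300)

model1 <- lm(y_var ~ x_var + I(x_var^2))
b <- coef(model1)

# Marginal effect of x_var: d(y)/d(x) = b1 + 2 * b2 * x
marginal_effect <- function(x) unname(b["x_var"] + 2 * b["I(x_var^2)"] * x)
marginal_effect(c(2, 5, 8))     # positive, near zero, negative

# x value at which the marginal effect is zero
turning_point <- unname(-b["x_var"] / (2 * b["I(x_var^2)"]))
turning_point
```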
 Logarithmic models:
o use to interpret effects as percentage changes rather than unit changes; logs also make large outliers less problematic.
o Effective when a variable is strongly right-skewed.
o use log models only if the variables are strictly greater than zero (the log of zero or of a negative number is undefined).
o Log-log models:
 both changes are in percentages
 interpretation: a 1% change in x-var is associated with a coefficient % change in y-var.
o Log-linear models:
 interpretation: each one-unit increase in x-var is associated with an approximately coefficient*100 % change in y-var
o Linear-log models:
 interpretation: a 1% change in x-var is associated with a coefficient*0.01 unit change in y-var
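The three interpretations can be compared on one data set. This is a base R sketch with simulated data; the true elasticity of 0.8 is an assumption made for the example.

```r
# Simulated right-skewed, strictly positive data (illustrative only)
set.seed(3)
n <- 1000
x_var <- rlnorm(n, meanlog = 2, sdlog = 0.8)
y_var <- exp(1 + 0.8 * log(x_var) + rnorm(n, sd = 0.2))

# Log-log: the coefficient is the % change in y_var per 1% change in x_var
log_log <- lm(log(y_var) ~ log(x_var))
elasticity <- unname(coef(log_log)["log(x_var)"])
elasticity

# Linear-log: coefficient * 0.01 is the unit change in y_var per 1% change in x_var
lin_log <- lm(y_var ~ log(x_var))
unname(coef(lin_log)["log(x_var)"]) * 0.01

# Log-linear: coefficient * 100 is the approximate % change in y_var per unit of x_var
log_lin <- lm(log(y_var) ~ x_var)
b <- unname(coef(log_lin)["x_var"])
c(approx_pct = 100 * b, exact_pct = 100 * (exp(b) - 1))
```

The coefficient*100 reading of the log-linear model is an approximation; the exact percentage change is 100 * (exp(coefficient) - 1), and the two are close when the coefficient is small.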
 Logistic Regression:
o use when the y-variable is binary (0/1)
o model1 <- glm(y_var ~ x_var, family = binomial(link = "logit"), data = data_set)
o summary(model1)
o interpretation:
 exp(coef(model1)) gives the odds ratios
 for odds ratio < 1: a one-unit increase in x-var lowers the odds of y-var by (1 - exp(coef(model1))) * 100 %, on average, compared with keeping x-var the same, cet. par.
 for odds ratio > 1: a one-unit increase in x-var raises the odds of y-var by (exp(coef(model1)) - 1) * 100 %, i.e. the odds are exp(coef(model1)) times what they would have been if x-var did not increase, cet. par.
 make predictions:
 data_set <- data_set %>% mutate(predictions = predict(model1, newdata = data_set, type = "response"))
 Create scenarios:
 library(tidyr)
 scenarios <- expand_grid(x_var1 = seq(val1, val2, val3), x_var2 = seq(val1, val2))
 scenarios <- scenarios %>% mutate(prediction = predict(model1, newdata = scenarios, type = "response"))
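The logistic workflow above can be run end to end as follows. This is a sketch on simulated data: the variable names and true coefficients are assumptions, and base R's expand.grid stands in for tidyr's expand_grid so that no extra package is needed.

```r
# Simulated binary outcome (illustrative only)
set.seed(11)
n <- 2000
data_set <- data.frame(
  x_var1 = rnorm(n, mean = 50, sd = 10),
  x_var2 = rbinom(n, 1, 0.5)
)
p <- plogis(-5 + 0.08 * data_set$x_var1 + 0.7 * data_set$x_var2)
data_set$y_var <- rbinom(n, 1, p)

model1 <- glm(y_var ~ x_var1 + x_var2,
              family = binomial(link = "logit"), data = data_set)

# Odds ratios and the implied % change in the odds per one-unit increase
odds_ratios <- exp(coef(model1))
pct_change_in_odds <- 100 * (odds_ratios - 1)
round(cbind(odds_ratios, pct_change_in_odds), 3)

# Scenarios: predicted probabilities for every combination of the chosen values
scenarios <- expand.grid(x_var1 = seq(30, 70, 10), x_var2 = c(0, 1))
scenarios$prediction <- predict(model1, newdata = scenarios, type = "response")
scenarios
```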

 Fixed Effects Models:


o use with panel data, when you want to hold constant the characteristics of each entity that do not change over time.
o EDA:
 ggplot(data_set, aes(x = x-var, y = y-var, color = categorical-var )) +
geom_line()
o Some variables do not have an effect instantly (e.g. number of police); get lagged data for them:
 library(dplyr)
 data_set <- data_set %>% group_by(categorical_var) %>% mutate(lag_var = dplyr::lag(var, order_by = ordering_var)) %>% ungroup() # ordering_var is e.g. time
o Pooled model: regression model with every data point in it, no grouping or
filtering.
 function : lm
o fixed effects models are used to remove differences between units that stay constant over the time of the study, such as differences in average income between states
o creating the model with one fixed effect (entity):
 library(plm)
 model1 <- plm(y_var ~ x_var, data = data_set, index = "categorical_var", model = "within")
o creating the model with two fixed effects (entity and time):
 library(plm)
 model1 <- plm(y_var ~ x_var, data = data_set, index = c("entity_var", "time_var"), model = "within", effect = "twoways")
o Checking to see time variations and individual variations:
 pvar(data_set, index = c("entity_var", "time_var"))
o interpreting fixed effects models:
 coeftest(model1, vcov = vcovHC, type = "HC1")
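What the "within" model does can be illustrated without plm. The sketch below, on a simulated panel (all names and numbers are assumptions), compares a pooled lm, an lm with one dummy per entity, and an lm on entity-demeaned data; the last two give the same slope, which is the idea behind one-way fixed effects.

```r
# Simulated panel: 20 entities observed for 8 years (illustrative only)
set.seed(5)
n_entities <- 20
n_years <- 8
panel <- expand.grid(entity = factor(1:n_entities),
                     year = 2001:(2000 + n_years))
entity_effect <- rnorm(n_entities, sd = 3)[as.integer(panel$entity)]
panel$x_var <- 0.5 * entity_effect + rnorm(nrow(panel))   # x related to the entity effect
panel$y_var <- 1.5 * panel$x_var + entity_effect + rnorm(nrow(panel))

pooled  <- lm(y_var ~ x_var, data = panel)                # ignores entities
dummies <- lm(y_var ~ x_var + entity, data = panel)       # one dummy per entity

# Within transformation: subtract each entity's own mean
panel$x_dm <- panel$x_var - ave(panel$x_var, panel$entity)
panel$y_dm <- panel$y_var - ave(panel$y_var, panel$entity)
demeaned <- lm(y_dm ~ x_dm, data = panel)

slopes <- c(pooled   = unname(coef(pooled)["x_var"]),
            dummies  = unname(coef(dummies)["x_var"]),
            demeaned = unname(coef(demeaned)["x_dm"]))
slopes   # pooled is biased; the other two match and sit near the true 1.5
```

The standard errors from the demeaned lm are not degrees-of-freedom corrected, so a dedicated panel routine is still the better choice for inference.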
 Dif-in-Dif:
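o Key: there has to be a treatment group and a control group, each observed before and after the treatment; the groups do not have to be selected at random
o parallel trends assumption: without the treatment, both groups would have followed the same trend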
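o model1 <- lm(y_var ~ x_var * treatment_var + control, data = data_set)
o interpreting dif-in-dif models:
 coeftest(model1, vcov = vcovHC, type = "HC1")
 with x_var as the after-period indicator, the coefficient on the x_var:treatment_var interaction is the dif-in-dif estimate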
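The interaction coefficient can be checked against the "difference of differences" in group means. This is a base R sketch on simulated data; the names post and treated and the true effect of 3 are assumptions made for the example.

```r
# Simulated two-group, two-period data (illustrative only)
set.seed(9)
n <- 400
data_set <- data.frame(
  treated = rep(c(0, 1), each = n / 2),    # group indicator
  post    = rep(c(0, 1), times = n / 2)    # after-period indicator
)
data_set$y_var <- 10 + 2 * data_set$treated + 1 * data_set$post +
  3 * data_set$treated * data_set$post + rnorm(n)

model1 <- lm(y_var ~ post * treated, data = data_set)
did_coef <- unname(coef(model1)["post:treated"])

# Same number from the four group means
means <- with(data_set, tapply(y_var, list(treated, post), mean))
did_means <- (means["1", "1"] - means["1", "0"]) -
  (means["0", "1"] - means["0", "0"])

c(did_coef = did_coef, did_means = did_means)
```

Without control variables the two numbers are identical; adding controls to the lm changes the estimate but keeps the same interpretation.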
 Regression Discontinuity:
o Key: there is a threshold that determines if you are in the treatment or not.
o the threshold variable is called an assignment variable
o EDA:
 make a scatterplot
 color the treatment group
 add lines of best fit
 add vertical line at cutoff:
 data_set %>% ggplot(aes(x = x_var, y = y_var, color = assignment_condition)) + geom_point() + geom_smooth(method = "lm", se = FALSE) + geom_vline(xintercept = assignment_value) # assignment_condition is e.g. age < 21
 if there is a shift, aka a discontinuity, at the cutoff, it is an indication that this model is a good match
o making the model:
 without centering the var:
 model1 <- lm(y_var ~ assignment_var + control, data = data_set)
 with centering the var:
 model1 <- lm(y_var ~ I(x_var - assignment_value) * assignment_var + control, data = data_set)
o interpreting models:
 coeftest(model1, vcov = vcovHC, type = "HC1")
 in both cases you interpret the coefficient on the key variable (assignment_var), not the interaction term
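A centered regression discontinuity model can be sketched in base R as follows. The data are simulated, with a cutoff at age 21 and a true jump of 2; both are assumptions chosen for illustration.

```r
# Simulated data with a jump of 2 at the cutoff (illustrative only)
set.seed(21)
n <- 600
cutoff_value <- 21
age <- runif(n, 18, 24)
treated <- as.numeric(age >= cutoff_value)     # treatment decided by the threshold
y_var <- 5 + 0.3 * (age - cutoff_value) + 2 * treated + rnorm(n, sd = 0.5)
data_set <- data.frame(age, treated, y_var)

# Centering puts the cutoff at zero, so the coefficient on treated
# is the size of the jump exactly at the cutoff
model1 <- lm(y_var ~ I(age - cutoff_value) * treated, data = data_set)
jump <- unname(coef(model1)["treated"])
jump
```

The interaction term only lets the slope differ on the two sides of the cutoff, which is why it is not the coefficient to interpret.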
 Instrumental variable 2SLS:
o an instrumental variable needs to be correlated with the x-var and uncorrelated with the error term, so it affects the y-var only through the x-var
o use it when there is a lottery or any other random assignment to a treatment group but no guarantee that the treatment group actually got the treatment (non-compliance); the random assignment is then the instrument for the treatment actually received, and it is not used directly to find the effect of the treatment on the y-var.
o creating the model:
 library(ivreg)
 model1 <- ivreg::ivreg(y_var ~ x_var + control1 + control2 | instrumental_var + control1 + control2, data = data_set)
o interpreting results:
 summary(model1, vcov. = vcovHC)
 Good instrumental var:
 one instrumental var: weak instruments test p-value close to 0 (the instrument is not weak)
 more than one instrumental var: check the Sargan test; a small Sargan p-value means:
o at least one of the instruments is not exogenous
 R^2 doesn't matter because it is unreliable in 2SLS models
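The two stages of 2SLS can be written out by hand to show what ivreg does in a single call. This base R sketch uses simulated data (the names and the true effect of 2 are assumptions) and is meant for intuition only, since the standard errors of a manual second stage are not valid.

```r
# Simulated data with an unobserved confounder (illustrative only)
set.seed(13)
n <- 5000
instrument <- rbinom(n, 1, 0.5)                 # e.g. a lottery draw
confounder <- rnorm(n)                          # not observed by the analyst
x_var <- 1 * instrument + confounder + rnorm(n)
y_var <- 2 * x_var + 3 * confounder + rnorm(n)  # true effect of x_var is 2

ols <- lm(y_var ~ x_var)                        # biased by the confounder

# Stage 1: predict x_var from the instrument; the F statistic shows relevance
stage1 <- lm(x_var ~ instrument)
first_stage_F <- unname(summary(stage1)$fstatistic["value"])

# Stage 2: regress y_var on the predicted x_var
x_hat <- fitted(stage1)
stage2 <- lm(y_var ~ x_hat)

estimates <- c(ols  = unname(coef(ols)["x_var"]),
               tsls = unname(coef(stage2)["x_hat"]))
estimates

# With one instrument, 2SLS equals the ratio of covariances
cov(y_var, instrument) / cov(x_var, instrument)
```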
 Graphing & Visualization
o Graphing for logistic scenarios:
 scenarios %>% ggplot(aes(x = x_var, y = prediction, color = as.factor(categorical_var))) + geom_point() + geom_line() + facet_wrap(~x_var, ncol = 5)
