Chapters 2 and 3
Introduction
Chapter two:
Causal models aim to establish causal relationships between variables. These models are designed to understand the effect of independent variables on dependent variables while controlling for other factors. Causal models help researchers understand how changes in one variable lead to changes in another, allowing for predictions and interventions. Non-causal models, by contrast, focus on describing and analyzing the associations between variables, regardless of whether these associations imply causation. They are used for exploratory analysis, prediction, and description of data patterns.
Micro-econometrics is a specialized field focused on analyzing individual-level data, typically in the form of cross-sectional data or longitudinal panel data in which units are observed over time. Its main objectives are defining the statistical properties of response variables and establishing causal relationships between variables relevant to micro-economic behavior.
Two approaches are used to investigate micro-econometric relationships:
1. Analysis of Causal or Structural Relationships: This approach explores how interdependent micro-economic variables causally relate to each other. By integrating economic theory into statistical models, researchers aim to identify and quantify these relationships, providing insights into underlying economic mechanisms.
2. Reduced Form Studies: These studies aim to uncover associations and correlations among variables without specifying detailed inter-dependencies. They are empirical in nature, focusing on identifying patterns and relationships observed within the data.
Micro-econometrics plays a crucial role in understanding individual economic decision-making and informing evidence-based policy analysis. By applying advanced statistical techniques to data, researchers contribute valuable insights to economic theory and practice, advancing our understanding of micro-economic dynamics.
Through its rigorous analysis of data, micro-econometrics enhances our ability to address
complex economic challenges and develop informed strategies for economic policy and decision-
making.
The structural model can be written compactly as
𝑔(𝑦𝑖, 𝑧𝑖, 𝑢𝑖 ∣ 𝜃) = 0
Solving the structural model for the endogenous variables 𝑌 yields the reduced form, whose parameters (𝜋) are functions of the structural parameters (𝜃). If the conditional distribution 𝑓(𝑦𝑖 ∣ 𝑧𝑖, 𝜋) is known, we can express 𝑌 as:
𝑦𝑖 = 𝑔(𝑧𝑖 ∣ 𝜋) + 𝑢𝑖 = 𝐸[𝑦𝑖 ∣ 𝑧𝑖] + 𝑢𝑖
The structural model specifies which variables are causes (exogenous) and which are effects (endogenous). Typically, response variables 𝑌 are endogenous, and explanatory variables 𝑍 are exogenous. Micro-econometric models organize variables, specify causal relationships, and analyze inter-dependencies among economic variables to explain and predict economic outcomes.
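As a minimal added illustration of how reduced-form parameters are functions of structural parameters (the coefficients 𝛼, 𝛽, 𝛾 below are hypothetical), consider the two-equation structural system
𝑦1𝑖 = 𝛼𝑦2𝑖 + 𝛽𝑧𝑖 + 𝑢1𝑖,  𝑦2𝑖 = 𝛾𝑦1𝑖 + 𝑢2𝑖
Substituting the second equation into the first and solving for 𝑦1𝑖 gives the reduced form
𝑦1𝑖 = 𝜋𝑧𝑖 + 𝑣𝑖, where 𝜋 = 𝛽/(1 − 𝛼𝛾) and 𝑣𝑖 = (𝑢1𝑖 + 𝛼𝑢2𝑖)/(1 − 𝛼𝛾)
so the reduced-form parameter 𝜋 is a function of the structural parameters 𝜃 = (𝛼, 𝛽, 𝛾).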
2.2: Exogeneity
The joint distribution of the variables 𝑊 can be factorized into conditional and marginal components, highlighting the exogeneity of 𝑍 with respect to certain parameters. By reparameterizing the model into new parameters 𝜙 and partitioning them into subsets 𝜙1 and 𝜙2, conditions for the exogeneity of 𝑍 with respect to 𝜙1 are established. The parameter of interest, 𝜆, depends on 𝜙, and specific conditions are needed for 𝑍 to be considered exogenous: 𝜆 should depend only on 𝜙1, and 𝜙1 and 𝜙2 must be “variation free”, with no cross-restrictions between them. Concepts that emerge from the joint distribution factorization include:
• Weak Exogeneity: 𝑍 is weakly exogenous for 𝜆 if the conditions above are met, enabling inference based on conditional distributions alone.
• Granger Non-causality: This concept denotes conditional independence, where subsets of 𝑍 do not predict 𝑌 after conditioning on others, implying a lack of causal relationship.
• Strong Exogeneity: A subset of 𝑍 is strongly exogenous if it meets the weak exogeneity criteria and does not Granger-cause 𝑌.
Exogeneity, a strong assumption, is often justified in empirical research by external determinants, such as government interventions, that fix the values of the exogenous variables. These concepts underscore the intricacies of statistical modeling.
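To make this concrete, the factorization behind these definitions can be sketched as follows (an added illustration; notation follows the section, with 𝑤𝑖 = (𝑦𝑖, 𝑧𝑖)): weak exogeneity of 𝑍 for 𝜆 amounts to
𝑓(𝑦𝑖, 𝑧𝑖 ∣ 𝜙) = 𝑓(𝑦𝑖 ∣ 𝑧𝑖, 𝜙1) ⋅ 𝑓(𝑧𝑖 ∣ 𝜙2)
with 𝜆 a function of 𝜙1 alone and (𝜙1, 𝜙2) variation free. Inference about 𝜆 can then be based on the conditional density 𝑓(𝑦𝑖 ∣ 𝑧𝑖, 𝜙1) alone, ignoring the marginal density of 𝑍.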
Structural equation modeling (SEM) is a statistical framework used to analyze systems of equations in which multiple dependent variables are interrelated and influenced by a set of explanatory variables. The general form of a linear SEM can be expressed as:
𝑌 = 𝐵𝑋 + Γ𝑌 + 𝑈
Since each equation of the SEM expresses a linear relationship between one endogenous variable and the set of explanatory variables, the general equation for the 𝑖-th endogenous variable 𝑌𝑖 can be written as:
𝑌𝑖 = 𝛽𝑖0 + 𝛽𝑖1𝑋1 + 𝛽𝑖2𝑋2 + ⋯ + 𝛽𝑖𝑘𝑋𝑘 + ∑_{𝑗=1, 𝑗≠𝑖}^{𝑛} 𝛾𝑖𝑗𝑌𝑗 + 𝑢𝑖
where 𝛽𝑖0, 𝛽𝑖1, …, 𝛽𝑖𝑘 are coefficients representing the impact of the exogenous variables 𝑋1, 𝑋2, …, 𝑋𝑘 on 𝑌𝑖; 𝛾𝑖𝑗 are coefficients representing the impact of the other endogenous variables 𝑌𝑗 on 𝑌𝑖; and 𝑢𝑖 is the error term associated with the 𝑖-th equation.
SEM involves distinguishing between structural and reduced form disturbances. The reduced
form disturbances (V) are linear combinations of the structural disturbances (U), allowing for
the estimation of conditional means and variances based on exogenous changes in variables.
The main goal of SEM is to consistently estimate parameters (𝐵, Γ, Σ), requiring proper model
identification. Identification ensures that there is a unique set of parameters consistent with
observed data, achieved by imposing restrictions that rule out linear transformations and
equivalent structures.
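As an illustrative sketch of this relationship (added; the coefficient matrices below are hypothetical), with the general form 𝑌 = 𝐵𝑋 + Γ𝑌 + 𝑈, solving for 𝑌 gives 𝑌 = (𝐼 − Γ)⁻¹𝐵𝑋 + (𝐼 − Γ)⁻¹𝑈, so the reduced-form coefficients are Π = (𝐼 − Γ)⁻¹𝐵 and the reduced-form disturbances are 𝑉 = (𝐼 − Γ)⁻¹𝑈:

```python
import numpy as np

# Hypothetical 2-equation SEM: Y = B X + Gamma Y + U
# Gamma: effects of the endogenous variables on each other (zero diagonal)
Gamma = np.array([[0.0, 0.5],
                  [0.3, 0.0]])
# B: effects of two exogenous variables on the two endogenous variables
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])

I = np.eye(2)
# Reduced-form coefficients: Pi = (I - Gamma)^{-1} B
Pi = np.linalg.solve(I - Gamma, B)

# Reduced-form disturbances are linear combinations of the structural
# disturbances: V = (I - Gamma)^{-1} U, hence Cov(V) = A Sigma_U A'
Sigma_U = np.diag([1.0, 1.0])      # structural disturbance covariance (assumed)
A = np.linalg.inv(I - Gamma)
Sigma_V = A @ Sigma_U @ A.T

print("Pi =\n", Pi)
print("Cov(V) =\n", Sigma_V)
```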
Observational equivalence is crucial in defining identification. Two structural models are ob-
servationally equivalent if they imply identical joint probability distributions of variables. A
structure (𝜃0) is identified if there is no other observationally equivalent structure in the pa-
rameter space (Θ).
In SEM, identification often involves imposing restrictions such as exclusion restrictions, which specify that certain variables have no impact on certain endogenous variables. The order condition, a necessary condition for identification, states that the number of exogenous variables excluded from an equation must be at least equal to the number of endogenous variables included on its right-hand side. The rank condition is a necessary and sufficient condition ensuring a unique solution for the structural parameters given the restrictions.
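A minimal sketch of the order-condition count (added; the numbers below are hypothetical):

```python
# Order condition for one equation of a linear SEM (necessary, not sufficient):
# the number of exogenous variables excluded from the equation must be at
# least the number of endogenous variables included on its right-hand side.
def order_condition_holds(n_excluded_exog: int, n_included_endog: int) -> bool:
    return n_excluded_exog >= n_included_endog

# Hypothetical equation: 2 excluded exogenous variables and 1 included
# endogenous regressor, so the necessary count is satisfied.
print(order_condition_holds(n_excluded_exog=2, n_included_endog=1))  # True
```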
Econometric models play a crucial role in understanding how public policies and decision vari-
ables impact various outcomes. However, using observational data in such models can pose
challenges in accurately identifying causal parameters due to potential biases and confounding
factors. To address these issues, researchers often rely on data from controlled experiments,
though these can be costly to implement on a large scale. As an alternative, natural exper-
iments and quasi-experimental settings offer valuable opportunities to study causal relation-
ships in a more practical and accessible manner.
A notable development in micro-econometrics is the approach of program evaluation or treat-
ment evaluation, which provides a statistical framework known as the Rubin causal model
(RCM). This framework allows researchers to estimate causal parameters by comparing out-
comes between treated and non-treated individuals. The core idea is to measure the impact
of a treatment (or cause) on outcomes as the average difference between outcomes for treated
and non-treated groups. In this context, the fundamental components of program evaluation
include the triple (𝑦1𝑖 , 𝑦0𝑖 , 𝐷𝑖 ), where 𝐷𝑖 indicates whether an individual receives treatment (1)
or not (0). The impact of the treatment is quantified as 𝑦1𝑖 − 𝑦0𝑖 , representing the difference
in outcomes between the treated and non-treated states.
Furthermore, the average causal effect of treatment 𝐷 = 1 is calculated as the difference between the expected outcomes for treated and non-treated individuals, 𝐸[𝑦 ∣ 𝐷 = 1] − 𝐸[𝑦 ∣ 𝐷 = 0]. This measure provides valuable insights into the average impact of a treatment on outcomes across a population. Therefore, econometric models, particularly through program evaluation frameworks like the Rubin causal model, offer a structured approach to estimating causal parameters and understanding the effects of public policies and interventions on outcomes. This methodology is essential for evidence-based decision-making in economics and policy analysis.
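A minimal sketch of this difference-in-means estimator (added; the data are simulated, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated program-evaluation data: D indicates treatment, y the outcome.
n = 1_000
D = rng.integers(0, 2, size=n)           # random assignment: D_i in {0, 1}
y0 = rng.normal(10.0, 2.0, size=n)       # potential outcome without treatment
y1 = y0 + 3.0                            # potential outcome with treatment (true effect = 3)
y = np.where(D == 1, y1, y0)             # only one potential outcome is observed per person

# Average causal effect: E[y | D = 1] - E[y | D = 0]
ate_hat = y[D == 1].mean() - y[D == 0].mean()
print(f"Estimated average treatment effect: {ate_hat:.2f}")  # close to 3
```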
In causal modeling within SEM (Structural Equation Modeling) and POM (Potential Out-
comes Modeling) frameworks, various approaches are employed to model causal relationships
and address challenges related to parameter identification. Here’s a summary of these strate-
gies:
Exogenization:
• Challenge: Omission of factors correlated with included variables can introduce confounding bias.
• Solution: Introduces control variables into the model to approximate the influence of omitted variables and improve parameter identification.
Creating Synthetic Samples:
• Problem: Lack of suitable benchmarks for estimating causal parameters in POM.
• Approach: Generates synthetic samples with comparison groups to serve as proxies for causal inference.
Instrumental Variables:
• Problem: Explanatory variables that are correlated with the disturbances make direct estimation of causal parameters inconsistent.
• Approach: Uses instruments, variables correlated with the endogenous regressors but uncorrelated with the disturbances, to identify the causal parameters (a sketch follows this list).
Re-weighting Samples:
• Problem: Treated and comparison groups (or the sample and the population) may differ in composition.
• Approach: Re-weights observations so that the comparison group matches the distribution of the treated group, or the sample matches the population.
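A minimal instrumental-variables sketch (added; the data are simulated, and two-stage least squares with a single instrument reduces to a simple covariance ratio):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Simulated data with an endogenous regressor:
z = rng.normal(size=n)                  # instrument: shifts x, no direct effect on y
e = rng.normal(size=n)                  # unobserved confounder shared by x and y
x = z + e + rng.normal(size=n)          # endogenous regressor (correlated with e)
y = 2.0 * x + e + rng.normal(size=n)    # true causal effect of x on y is 2

# The OLS slope is biased upward because x is correlated with the disturbance e:
ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# With a single instrument, two-stage least squares reduces to the IV ratio
# cov(z, y) / cov(z, x):
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"OLS: {ols:.2f} (biased), IV: {iv:.2f} (close to 2)")
```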
Chapter three:
The use of micro-econometrics with observational data, such as survey or census data, is a common practice distinct from the use of experimental data. Observational data have specific limitations related to their collection methods, sample frame, design, and scope. Survey data are typically gathered by randomly sampling a population of subjects. In this context, let 𝑆𝑡 denote a sample from the population's probability distribution 𝐹(𝑤𝑡 ∣ 𝜃𝑡), whose functional form is assumed known. One strong but useful assumption in this setting is the concept of a stationary population, where 𝜃𝑡 = 𝜃 for all time 𝑡. This assumption implies that the characteristics of the population remain unchanged over time, with the moments of these characteristics being time invariant. However, some population characteristics may change over time. To address this dynamic nature, researchers often consider that the parameters governing each population are drawn from a larger “superpopulation” that has constant characteristics. This approach acknowledges potential changes in population characteristics over time while maintaining a constant framework for analysis.
In the context of sample design, there are approaches beyond simple random sampling, such as stratified multistage cluster sampling, a complex survey method. This method involves partitioning the population into strata, primary sampling units, and secondary sampling units, and ultimately selecting the final units for interview. Unlike simple random sampling, where every individual has an equal probability of selection, this method assigns different selection probabilities to individuals, making the raw sample unrepresentative. To correct for this, surveys often provide sampling weights to compensate for these differences in selection probabilities. Departures from random sampling can lead to biased sampling, where the probability distribution of the data differs from that of the population. Examples of departures include:
• Exogenous Sampling: Segments the available sample based solely on the exogenous variables 𝑥, independently of 𝑦 and 𝜃.
• Response-based Sampling: Sampling probabilities depend on individuals’ responses; this design is sometimes preferred for cost efficiency.
• Length-biased Sampling: Bias resulting from sampling one population to infer characteristics of another (e.g., sampling the stock of ongoing unemployment spells over-represents long spells relative to the population of all spells).
• Censored Sampling: Occurs when the response variable is observed only for certain individuals (e.g., those treated), limiting conclusions about treatment effects due to missing data for untreated cases.
These deviations from randomness highlight the challenges in achieving representative samples for population inference in survey data analysis.
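A minimal sketch of how sampling weights correct for unequal selection probabilities (added; the strata and selection probabilities are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population with two equal-sized strata of different means.
pop_a = rng.normal(10.0, 1.0, size=50_000)          # stratum A
pop_b = rng.normal(20.0, 1.0, size=50_000)          # stratum B
true_mean = np.concatenate([pop_a, pop_b]).mean()   # about 15

# Complex design: stratum A is over-sampled (p=0.02) relative to B (p=0.005).
p_a, p_b = 0.02, 0.005
sample_a = pop_a[rng.random(pop_a.size) < p_a]
sample_b = pop_b[rng.random(pop_b.size) < p_b]
sample = np.concatenate([sample_a, sample_b])

# The unweighted mean is biased toward the over-sampled stratum A.
unweighted = sample.mean()

# Sampling weights are inverse selection probabilities: w_i = 1 / p_i.
w = np.concatenate([np.full(sample_a.size, 1 / p_a),
                    np.full(sample_b.size, 1 / p_b)])
weighted = np.average(sample, weights=w)

print(f"true {true_mean:.2f}, unweighted {unweighted:.2f}, weighted {weighted:.2f}")
```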
The quality of survey data relies not only on the sample design but also on the responses
received. Surveys are typically voluntary, leading to potential issues with representativeness
due to nonresponse. If nonparticipation is random and independent of the response variable,
the sample remains unbiased. However, if nonparticipation is related to the response variable,
bias can occur.
Nonresponse may involve not answering certain questions rather than total nonparticipation.
Even when individuals respond to all questions, their answers could be inaccurate or false,
leading to measurement errors. Missing data, whether due to nonresponse or attrition in
panel data, reduces sample size and can introduce biases similar to selection bias.
In panel data settings, where individuals are observed repeatedly over time, missing data can
occur due to sample attrition—individuals who initially respond but later stop participating.
This attrition can affect the representativeness and quality of panel data analysis, influencing
the accuracy of conclusions drawn from the data. Therefore, addressing issues related to
nonresponse, missing data, and measurement errors is crucial for maintaining the validity and
reliability of survey-based research.
Cross-section data involve observing a sample 𝑆𝑡 at a specific time 𝑡, which can be used to draw inferences about the parameters 𝜃𝑡, assuming a stationary population. If the population is indeed stationary, inferences made using cross-section data 𝑆𝑡 at one period 𝑡 may also be valid for other periods, suggesting some stability in population characteristics over time. Repeated cross-section data consist of independent samples 𝑆𝑡 taken from the same population distribution 𝐹(𝑤𝑡 ∣ 𝜃𝑡) over multiple periods. However, there is no effort to track or retain the same units across samples, resulting in a loss of information regarding dynamic behavioral dependencies over time. Both cross-section and repeated cross-section data are therefore unsuitable for modeling intertemporal dependencies in outcomes. Panel or longitudinal data involve selecting a sample 𝑆 and collecting observations for the same individuals over a sequence of time periods 𝑡. Key issues with panel data include ensuring the representativeness of the panel, addressing inference challenges if the population is not stationary (i.e., if characteristics change over time), and managing sample attrition, where individuals may drop out of the study over time, impacting the completeness and continuity of the data.
Experimental data differ from observational data in that they are obtained within a controlled and monitored environment. This controlled setting allows researchers to manipulate a causal variable of interest while holding other covariates fixed, thus isolating the effect of the variable under study. In contrast, observational data are collected in uncontrolled environments where confounding factors can complicate the identification of causal relationships.
In social sciences, data analogous to experimental data often come from social experiments.
Experimental methodology in this context involves comparing outcomes between a group re-
ceiving a treatment (referred to as “experimentals”) and a control group not receiving the
treatment (“controls”). The process of assigning individuals to these groups is known as “ran-
dom assignment,” where individuals are randomly assigned to either the treatment or control
group to ensure comparability and reduce selection bias.
Key concepts in experimental methodology include:
• Experimentals: The group receiving the treatment whose outcomes are of primary inter-
est.
• Controls: The group that does not receive the treatment, used for comparison to assess
the impact of the treatment.
• Random Assignment: The process of randomly assigning individuals to either the treatment or control group to ensure unbiased comparison and reduce the influence of confounding variables.
By employing experimental designs and random assignment, researchers can more confidently establish causal relationships between variables of interest, mitigating the impact of potential confounders that might otherwise bias observational data analyses.
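A small simulation sketch contrasting random assignment with self-selected (confounded) treatment (added; all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical confounder (e.g., ability) that raises the outcome directly.
ability = rng.normal(size=n)
true_effect = 1.0

# Self-selection: higher-ability individuals are more likely to take the treatment.
d_selected = (ability + rng.normal(size=n)) > 0
# Random assignment: treatment is independent of ability.
d_random = rng.integers(0, 2, size=n).astype(bool)

def outcome(d):
    # Outcome depends on treatment and on the confounder.
    return true_effect * d + 2.0 * ability + rng.normal(size=n)

y_sel, y_rnd = outcome(d_selected), outcome(d_random)

# Naive comparison of treated vs. controls under each assignment mechanism:
naive = y_sel[d_selected].mean() - y_sel[~d_selected].mean()      # biased by ability
experimental = y_rnd[d_random].mean() - y_rnd[~d_random].mean()   # close to 1.0

print(f"self-selected comparison: {naive:.2f}, randomized comparison: {experimental:.2f}")
```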