Entropy balancing as an estimation command

Ben Jann
Department of Social Sciences, University of Bern

http://ideas.repec.org/p/bss/wpaper/39.html
http://econpapers.repec.org/paper/bsswpaper/39.htm
1 Introduction
The goal of entropy balancing, a procedure made popular by Hainmueller (2012), is to
find a vector of weights that balances the data between two subsamples with respect
to specific moments (e.g. the means and variances of a given set of covariates). For
example, in order to estimate an "average treatment effect on the treated" (ATET) from
observational data we might want to reweight a "control group" such that the means of
observed pre-treatment variables match the means of these variables in the "treatment
group". Entropy balancing thus provides an alternative to other reweighting techniques
commonly used in the treatment effects literature, such as inverse probability weighting
(IPW) or matching (see, e.g., Imbens and Wooldridge 2009 for an overview), some of
which are implemented in Stata's teffects command ([TE] teffects). An advantage
of entropy balancing over classic IPW or matching is that it leads to perfect balance (if
perfect balance is possible given the degree to which the common support assumption
is violated); classic IPW and matching typically balance the data only approximately
(unless the balancing problem is very simple). Perfect balance means that modeling the
outcome (e.g. using regression adjustment) after the data have been balanced will lead
to no refinements in the treatment effect estimate, implying that entropy balancing has
the "doubly robust" property (also see Zhao and Percival 2017).
Entropy balancing can also be useful for other types of applications. For example,
we may employ entropy balancing to construct weights for population surveys, say, by
1. See command ebalance by Hainmueller and Xu (2011, 2013). Note that entropy balancing can also
be performed by command psweight by Kranker (2019), a command that implements "covariate-balancing
propensity score" (CBPS) estimation as proposed by Imai and Ratkovic (2014). Entropy
balancing is formally equivalent to just-identified CBPS, leading to the same coefficients and the
same balancing weights.
that $S$ and $R$ do not need to be disjoint nor exhaustive (for example, the two samples
may overlap). Each observation has a base weight $w_i$ (e.g. a sampling weight based
on the survey design) and a $k \times 1$ vector $x_i$ of data. Furthermore, $W = \sum_{i=1}^{N} w_i$ is
the sum of weights across the joint sample; $W_S = \sum_{i=1}^{N} S_i w_i = \sum_{i \in S} w_i$ and $W_R = \sum_{i=1}^{N} R_i w_i = \sum_{i \in R} w_i$ are the sums of weights in the primary sample and the reference
sample, respectively.

Given the target sum of weights $\hat\tau = W_R = \sum_{i \in R} w_i$ (i.e. the size of the reference
sample) and the $k \times 1$ vector of target moments $\hat\mu = \frac{1}{W_R} \sum_{i \in R} w_i x_i$ (i.e. the means of
the data in the reference sample), entropy balancing looks for an estimate of $(\beta', \alpha)'$
such that

$$\frac{1}{\hat\tau} \sum_{i \in S} \hat\omega_i x_i = \hat\mu \quad\text{and}\quad \sum_{i \in S} \hat\omega_i = \hat\tau \quad\text{with}\quad \hat\omega_i = w_i \exp(x_i'\hat\beta + \hat\alpha) \tag{1}$$
Note that $\alpha$ is just a normalizing constant ensuring that the sum of balancing weights
is equal to $\hat\tau$. We could also set the target sum to some other (strictly positive) value,
say, 1 or $W_S$. This would only affect $\alpha$, but not $\beta$.
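To make (1) concrete, the following Mata sketch evaluates the balancing weights for given coefficients and checks both conditions; all object names (X, w, b, a, mu, tau) are mine and not part of ebalfit or mm_ebalance():

    mata:
        // hypothetical inputs: X = data in S (n x k), w = base weights (n x 1),
        // b = coefficients (k x 1), a = normalizing constant,
        // mu = target means (k x 1), tau = target sum of weights
        omega = w :* exp(X*b :+ a)        // omega_i = w_i exp(x_i'b + a)
        (colsum(omega :* X) / tau)'       // should reproduce mu if balanced
        sum(omega)                        // should reproduce tau
    end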
Let $\theta = (\mu', \tau, \beta', \alpha)'$ be the complete vector of estimates involved in the entropy
balancing problem. Rearranging the above formulas for the different elements in $\theta$ we
can express the model as a system of moment equations given as

$$\frac{1}{W} \sum_{i=1}^{N} w_i h_i(\theta) = 0 \quad\text{with}\quad h_i(\theta) = \begin{bmatrix} h_i^{\mu}(\theta) \\ h_i^{\tau}(\theta) \\ h_i^{\beta}(\theta) \\ h_i^{\alpha}(\theta) \end{bmatrix} = \begin{bmatrix} R_i(x_i - \mu) \\ W R_i - \tau \\ S_i \exp(x_i'\beta + \alpha)(x_i - \mu) \\ S_i\left(\exp(x_i'\beta + \alpha) - \frac{\tau}{W_S}\right) \end{bmatrix} \tag{2}$$
Following the approach outlined in Jann (2020b), the influence function for $\hat\theta$ can thus
be obtained as

$$\text{IF}_i^{\hat\theta} = -G^{-1} h_i(\hat\theta) \quad\text{where}\quad G = \frac{1}{W} \sum_{i=1}^{N} w_i \left.\frac{\partial h_i(\theta)}{\partial \theta'}\right|_{\theta = \hat\theta} \tag{3}$$
Furthermore, applying rules for the inversion of a block matrix we can write

$$\begin{bmatrix} G^{\beta\beta} & G^{\beta\alpha} \\ G^{\alpha\beta} & G^{\alpha\alpha} \end{bmatrix}^{-1} = \begin{bmatrix} A & -A\,G^{\beta\alpha}/G^{\alpha\alpha} \\ -G^{\alpha\beta}A/G^{\alpha\alpha} & d \end{bmatrix} \quad\text{with}\quad \begin{aligned} A &= \left(G^{\beta\beta} - G^{\beta\alpha}G^{\alpha\beta}/G^{\alpha\alpha}\right)^{-1} \\ d &= \left(G^{\alpha\alpha} - G^{\alpha\beta}(G^{\beta\beta})^{-1}G^{\beta\alpha}\right)^{-1} \end{aligned}$$
If balance is achieved, then $G^{\beta\mu} = -\frac{\hat\tau}{W} I_k$, $G^{\beta\alpha} = 0$, and $G^{\alpha\alpha} = \frac{\hat\tau}{W}$, such that the influence
functions simplify to

$$\text{IF}_i^{\hat\beta} = -(G^{\beta\beta})^{-1}\left(h_i^{\beta}(\hat\theta) - \frac{\hat\tau}{W_R}\, h_i^{\mu}(\hat\theta)\right) \tag{8}$$

$$\text{IF}_i^{\hat\alpha} = -\frac{W}{\hat\tau}\left(h_i^{\alpha}(\hat\theta) - \frac{1}{W}\, h_i^{\tau}(\hat\theta) + G^{\alpha\beta}\,\text{IF}_i^{\hat\beta}\right) \tag{9}$$
In the current setup, note that $\hat\tau/W_R = 1$, but we may wish to normalize the weights
using some other value for $\hat\tau$, in which case $\hat\tau/W_R$ would no longer be equal to 1.
For example, we may set $\hat\tau$ to the sum of base weights in the primary sample, that
is, $\hat\tau = W_S = \sum_{i \in S} w_i$. In this case, use $h_i^{\tau}(\theta) = W S_i - \tau$ in (7) or (9) instead of
$h_i^{\tau}(\theta) = W R_i - \tau$. Alternatively, we may want to set $\tau$ to some fixed value, such as
$\tau = 1$. In this case, $h_i^{\tau}(\theta) = 0$. Yet, an advantage of using $\hat\tau = W_R$ is that, in this case,
$\hat p_i = \exp(x_i'\hat\beta + \hat\alpha)/(1 + \exp(x_i'\hat\beta + \hat\alpha))$ can be interpreted as a propensity score, that is,
as an estimate of the conditional probability of belonging to $R$ rather than $S$ given $x_i$.
In general, it seems justifiable to assume $\tau$ as fixed even when it is set to sample
quantities such as $W_R$ or $W_S$. First, the moment condition for $\tau$ will only affect the
influence function of $\hat\alpha$, which is typically only of minor interest (for example, the influence
function of $\hat\alpha$ is typically not needed when correcting the standard errors of statistics
computed from the reweighted data). Second, also for the influence function of $\hat\alpha$ the
bias introduced by assuming $\tau$ as fixed will typically be small. This is why command
ebalfit discussed below will treat $\tau$ as fixed when computing influence functions and
standard errors.
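As an illustration of (8), the simplified influence functions of the coefficients could be computed in Mata along the following lines (a sketch that assumes balance has been achieved; all object names are mine):

    mata:
        // assumed inputs: X (N x k), w (base weights), S and R (0/1
        // indicators), and estimates b and a
        W     = sum(w)
        WR    = sum(R :* w)                       // W_R
        mu    = (colsum(R :* w :* X) :/ WR)'      // target means
        tau   = WR                                // default target sum of weights
        omega = S :* w :* exp(X*b :+ a)           // balancing weights (0 outside S)
        h_mu  = R :* (X :- mu')                   // rows contain h_i^mu'
        h_b   = (omega :/ w) :* (X :- mu')        // rows contain h_i^beta'
        Gbb   = cross(X :- mu', omega :* (X :- mu')) / W  // G^(beta,beta) at balance
        IF_b  = -(h_b - tau/WR * h_mu) * invsym(Gbb)      // rows = IF_i^beta', eq. (8)
    end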
with

$$G = \begin{bmatrix} G^{\mu\mu} & 0 \\ G^{\beta\mu} & G^{\beta\beta} \end{bmatrix} = \frac{1}{W}\sum_{i=1}^{N} w_i \begin{bmatrix} -R_i I_k & 0 \\ -S_i \hat\Omega^{-1}\exp(x_i'\hat\beta)\, I_k & h_i^{\beta}(\hat\theta)\Bigl(x_i' - \sum_{j \in S} w_j \hat\Omega^{-1}\exp(x_j'\hat\beta)\, x_j'\Bigr) \end{bmatrix} \tag{16}$$
2.4 Estimation
We could use gmm ([R] gmm) to estimate the entropy balancing coefficients based on
the moment equations provided in section 2.1. However, given that $\alpha$ is simply a
normalization constant, it may be more convenient to first run an optimization algorithm
to fit $\hat\beta$ and then determine $\hat\alpha$ as

$$\hat\alpha = \ln(\tau) - \ln\left(\sum_{i \in S} w_i \exp(x_i'\hat\beta)\right) \tag{19}$$

as discussed in section 2.3. This ensures that the sum of balancing weights will always
match the target sum of weights. Furthermore, in the two-sample case, the complexity
of the estimation can be reduced by computing the target means $\mu$ and the target sum
of weights $\tau$ upfront instead of including them in a joint optimization problem.
To obtain an estimate for $\beta$, we can run a standard Newton-Raphson algorithm that
minimizes

$$L_\omega = \ln\left(\sum_{i \in S} w_i \exp((x_i - \mu)'\beta)\right) = \ln\left(\sum_{i \in S}\tilde\omega_i\right) \quad\text{where}\quad \tilde\omega_i = w_i \exp((x_i - \mu)'\beta) \tag{20}$$

with respect to $\beta$ (also see Hainmueller 2012). The vector of first derivatives of $L_\omega$ (the
gradient vector) and the matrix of second derivatives (the Hessian), which are required
by the Newton-Raphson procedure, are given as

$$g = \frac{1}{\sum_{i \in S}\tilde\omega_i}\sum_{i \in S}\tilde\omega_i (x_i - \mu) \quad\text{and}\quad H = \frac{1}{\sum_{i \in S}\tilde\omega_i}\sum_{i \in S}\tilde\omega_i (x_i - \mu)(x_i - \mu)' \tag{21}$$
To avoid numerical overflow when evaluating the exponential function, we can rescale
the weights as

$$\tilde\omega_i = w_i \exp((x_i - \mu)'\beta - c) \quad\text{where}\quad c = \max((x_i - \mu)'\beta) \tag{22}$$

and redefine $L_\omega$ as

$$L_\omega = \ln\left(\sum_{i \in S}\tilde\omega_i\right) + c \tag{23}$$
Furthermore, instead of using $L_\omega$, one may also determine convergence based on a loss
criterion that is directly defined in terms of achieved balance, while still employing the
gradient vector and Hessian given in (21) for updating $\beta$. For example, we could use
the maximum absolute difference between the reweighted means and the target values.
Note that the gradient vector $g$
is equal to the difference between the means of the reweighted data and the target values
$\mu$, given the current values of $\beta$. That is, $g$ quantifies for each variable how well
the balancing has been achieved up to that point in the algorithm.
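The following Mata function sketches such an algorithm, using the gradient and Hessian from (21), the overflow protection from (22), and the maximum absolute difference between the reweighted means and the targets as convergence criterion. It is my own illustration, not the source of mm_ebalance(), and it omits refinements such as standardization and step control:

    mata:
        real colvector eb_newton(real matrix Xc, real colvector ws,
            real scalar tol, real scalar maxiter)
        {
            // Xc: data in S centered at the target means (x_i - mu)
            // ws: base weights in S
            real colvector b, wt, g
            real matrix    H
            real scalar    iter, c

            b = J(cols(Xc), 1, 0)                 // starting values
            for (iter=1; iter<=maxiter; iter++) {
                c  = max(Xc*b)                    // overflow protection, eq. (22)
                wt = ws :* exp(Xc*b :- c)         // rescaled omega-tilde
                wt = wt / sum(wt)
                g  = cross(Xc, wt)                // gradient, eq. (21)
                if (max(abs(g)) < tol) return(b)  // balancing loss small enough
                H  = cross(Xc, wt, Xc)            // Hessian, eq. (21)
                b  = b - invsym(H)*g              // Newton-Raphson update
            }
            _error("convergence not achieved")
        }
    end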
Practical experience indicates that using one of these balancing loss criteria instead
of $L_\omega$ makes the algorithm more robust in situations where perfect balance is not possible.
However, as the optimization criterion is no longer fully consistent with the used
gradient and Hessian, the algorithm profits from some standardization of the data (so
that the different variables have similar scales). For example, we may obtain the standard
deviations

$$\sigma_S = \sqrt{\frac{1}{W_S}\sum_{i \in S} w_i (x_i - \bar x_S)^2} \quad\text{with}\quad \bar x_S = \frac{1}{W_S}\sum_{i \in S} w_i x_i \tag{28}$$

from the primary sample and then use $x_i/\sigma_S$ and $\mu/\sigma_S$ instead of $x_i$ and $\mu$ in equations
(20) to (27). Before computing $\hat\alpha$ in (19), back-transform the resulting estimate for $\beta$
by dividing it by $\sigma_S$.
Furthermore, as usual, collinear terms have to be excluded from estimation. These
terms, however, are relevant for the evaluation of the final quality of the achieved balancing
(collinear terms may remain unbalanced). My suggestion thus is to use $x_i^{nc}$, a variant
of $x_i$ without elements that are collinear in $S$, for estimation of $\beta$ (with elements
corresponding to collinear terms set to 0) and then evaluate the final fit based on the
complete data by applying one of the above loss functions to

$$\hat g = \frac{1}{\sum_{i \in S}\hat\omega_i}\sum_{i \in S}\hat\omega_i x_i - \mu \quad\text{with}\quad \hat\omega_i = w_i \exp(x_i'\hat\beta + \hat\alpha) \tag{29}$$
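Continuing the sketch notation from above, the final check in (29) might be coded as follows, with Xs and ws the uncentered data and base weights in S, and with the collinear elements of b set to 0:

    mata:
        omega = ws :* exp(Xs*b :+ a)              // weights based on the full x_i
        ghat  = cross(Xs, omega)/sum(omega) - mu  // ghat from eq. (29)
        max(abs(ghat))                            // absdif-type balancing loss
    end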
Complex survey design such as clustering or stratification can be taken into account
by appropriately modifying the aggregation. In practice, variance estimates can be
obtained by applying command [R] total to the influence functions $\text{IF}_i$, possibly including
the [SVY] svy prefix.
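For example, if the relevant influence functions have been stored in variables IF1, IF2, etc., commands along the following lines would yield standard errors that respect clustering or a full survey design (the variable names, including the design variables psu, stratum, and w, are mine):

    . total IF*, vce(cluster psu)
    . svyset psu [pweight=w], strata(stratum)
    . svy: total IF*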
Technical note
That is, for observations within the reweighted sample, $\hat\omega_i$ is equal to the balancing
weight; for all other observations, $\hat\omega_i$ is equal to the base weight. Most estimators can
be expressed as a system of moment equations

$$\frac{1}{\sum_{i=1}^{N}\hat\omega_i}\sum_{i=1}^{N}\hat\omega_i\, h_i^{\theta}(\theta) = 0 \tag{35}$$
such that $\hat\omega_i$ does not appear in $h_i^{\theta}(\theta)$. For such estimators, the necessary correction to
take account of the uncertainty imposed by the estimation of the balancing weights has
a very simple form. Re-expressing the system as

$$\frac{1}{W}\sum_{i=1}^{N} w_i \left(\frac{\hat\omega_i}{c}\right) h_i^{\theta}(\theta) = 0 \quad\text{with}\quad c = \frac{1}{W}\sum_{i=1}^{N}\hat\omega_i \tag{36}$$

where $\widetilde{\text{IF}}_i^{\hat\theta}$ is the influence function of $\hat\theta$ assuming the weights $\hat\omega_i$ as fixed. Since $G^{\widetilde{\text{IF}}\alpha} = 0$
by definition, the corrected influence function simplifies to

$$\text{IF}_i^{\hat\theta} = \frac{\hat\omega_i}{c}\,\widetilde{\text{IF}}_i^{\hat\theta} - G^{\widetilde{\text{IF}}\beta}\,\text{IF}_i^{\hat\beta} \tag{39}$$
To summarize, we can first compute the influence function for $\hat\theta$ in the usual way, as if
the balancing weights were fixed, and then adjust the influence function using equation (39).
Naturally, we need a way to obtain the (unadjusted) influence function of our estimator
in the first place, but in many cases this is not very difficult (for example, see Jann 2020b
for practical instruction on how to obtain influence functions for maximum-likelihood
models given the results returned by Stata).2
3 Stata implementation
Command ebalfit, available from the SSC Archive, implements the methods described
above. To install the command on your system, type
. ssc install ebalfit
The heavy lifting is done by Mata function mm_ebalance(), which is provided as part of
the moremata library (Jann 2005), also available from the SSC Archive. To be able to
run ebalfit, the latest update of moremata is required. To install moremata, type
. ssc install moremata, replace
2. In the above derivation I assumed $c$, which depends on the relative size of the reweighted group
(i.e. the sum of balancing weights) with respect to the size (sum of base weights) of the rest of
the data, to be fixed. This is valid as long as the statistic conditions on $S_i$ such that the sum of
balancing weights does not matter or if $\tau = W_S$ such that $c$ is always equal to 1. In other cases the
true correction would be more complicated, but the bias introduced by assuming $c$ as fixed should
be negligible in most situations.
The exposition below focuses on Stata command ebalfit and does not provide details
on Mata function mm_ebalance(). Users interested in applying mm_ebalance() directly
can type help mata mm_ebalance() after installation to view its documentation.
3.1 Syntax
Syntax 1: adjust a subsample to values from another subsample (two-sample balancing)
    ebalfit varlist [if] [in] [weight], by(varname) [options]
Replay results
    ebalfit [, reporting_options]
3.2 Options
Main
population([popsize:] numlist) specifies population targets to balance the sample to,
where popsize is the size of the population and numlist provides the population
averages of the variables. numlist must contain one value for each variable. If
popsize is omitted, it will be set to the sum of weights in the sample.
tau(spec) specifies a custom target sum of weights for the balancing weights within the
reweighted sample. spec may either be a real number (# > 0) or one of Wref (sum
of base weights in the reference sample), W (sum of base weights in the reweighted
sample), Nref (number of rows in the reference sample), or N (number of rows in the
reweighted sample). The default is Wref.
scales(spec) determines the scales to be used for standardization during estimation
(unless nostd is specified) and for the computation of standardized differences in the
balancing table. spec may either be a numlist containing custom values (one for
each term in the model; the values must be positive) or, alternatively, main (use
standard deviations from the main sample), reference (use standard deviations
from the reference sample), average (use standard deviations averaged between the
two samples), waverage (use standard deviations averaged between the two samples,
weighted by sample size), or pooled (use standard deviations from the pooled sample).
reference, average, waverage, and pooled are only allowed in syntax 1. Standard
deviations are computed using population formulas (division by N rather than N − 1).
Scales equal to 0 will be reset to 1. The default is main.
targets(options) specifies the types of moments to be balanced. options are:
to balance the means of hours and tenure, the covariance between hours and
tenure, the proportions of the levels of south, as well as the averages of tenure
within levels of south (see [U] 11.4.3 Factor variables for details on notation).
That is, you can use custom interactions as an alternative to option targets() if you
want to have more control over the exact configuration of moments to be balanced.
[no]adjust(numlist) selects the terms to be balanced. Use this option if you want
to construct weights such that only a subset of terms is adjusted, while keeping the
others fixed. numlist provides the indices of the relevant terms. For example, in a
model with three variables, to adjust the means of the first two variables and keep the
mean of the third variable fixed, type adjust(1 2) or, equivalently, noadjust(3).
Keeping terms fixed leads to different results than excluding the terms from the
model.
Reporting
level(#) specifies the confidence level, as a percentage, for confidence intervals. The
default is level(95) or as set by set level (see [R] level).
noheader suppresses the display of the header.
nowtable suppresses the display of the summary table of balancing weights.
notable suppresses the display of the coefficient table.
display_options are standard reporting options to be applied to the coefficient table,
such as eform, cformat(), or coeflegend; see [R] eform_option and the Reporting
options in [R] Estimation options.
baltab displays a balancing table in addition to the table of coefficients. The balancing
table contains for each term the target value, the unbalanced value, the standardized
di↵erence between the target value and the unbalanced value, the balanced value,
and the standardized di↵erence between the target value and the balanced value.
VCE/SE
vce(vcetype) determines how standard errors are computed. vcetype may be:

    robust
    cluster clustvar
    none

vce(robust), the default, computes standard errors based on influence functions.
Likewise, vce(cluster clustvar) computes standard errors based on influence functions,
allowing for intragroup correlation, where clustvar specifies to which group each
observation belongs. vce(none) omits the computation of standard errors.
cluster(clustvar) can be used as a synonym for vce(cluster clustvar).
nose omits the computation of standard errors. Use this option to save computer time.
nose is a synonym for vce(none).
Generate
Optimization
btolerance(#) sets the balancing tolerance. Balance is achieved if the balancing loss
is smaller than the balancing tolerance. The default is btolerance(1e-6).
ltype(ltype) sets the type of loss function to be used to evaluate balancing. ltype can
be reldif (maximum relative difference), absdif (maximum absolute difference),
or norm (norm of differences). The default is reldif.
etype(etype) selects the evaluator to be used to fit the coefficients. etype can be
bl (evaluator based on the balancing loss), wl (evaluator based on the distribution of
weights, i.e. criterion $L_\omega$ from equation (20)), mm (method of moments evaluator),
or mma (method of moments evaluator including the intercept). The default is bl.
Irrespective of the choice of evaluator, the balancing loss will be used to evaluate the
final fit.
iterate(#) specifies the maximum number of iterations. An error will be returned if
convergence is not reached within the specified maximum number of iterations. The
default is as set by set maxiter ([R] set iter).
ptolerance(#) specifies the convergence tolerance for the coefficient vector. Conver-
gence is reached if ptolerance() or vtolerance() is satisfied. See [M–5] optimize()
for details. The default is ptolerance(1e-6).
vtolerance(#) specifies the convergence tolerance for the balancing loss. Convergence
is reached if ptolerance() or vtolerance() is satisfied. See [M–5] optimize()
for details. The default is vtolerance(1e-7) in case of etype(bl) and
vtolerance(1e-10) otherwise.
difficult uses a different stepping algorithm in nonconcave regions. See the singular
H methods in [M–5] optimize() and the description of the difficult option in
[R] Maximize.
nostd omits standardization of the data during estimation. Specifying nostd is not
recommended.
nolog suppresses the display of progress information.
relax causes ebalfit to proceed even if convergence or balance is not achieved. ebalfit
uses formulas assuming balance when computing influence functions and standard
errors. The stored influence functions and reported standard errors will be invalid
if balance has not been achieved.
nowarn suppresses any “convergence not achieved” or “balance not achieved” messages.
This is only relevant if option relax has been specified.
4 Examples
4.1 Balancing two samples
Consider the data from LaLonde (1986), provided by Dehejia and Wahba (1999) at
http://users.nber.org/~rdehejia/nswdata.html. The following code combines a subset of
the treatment group from the NSW training program with one of the PSID comparison
groups.
The focus of the LaLonde data lies on the comparison of re78 (real earnings in 1978,
after the program intervention) between the (experimental) treatment group and the
(non-experimental) control group. The comparison is not straightforward as there are
substantial differences between the two groups in terms of pre-treatment characteristics.
Members of the treatment group are younger, more often black, less often married, more
often without a college degree, and have lower pre-treatment earnings than members of
the control group:
(output omitted)
Various techniques such as matching or inverse probability weighting (IPW) have been
proposed in the literature to address the problem of making the groups comparable
such that the average effect of program participation (the ATET) can be estimated
consistently. Inverse probability weights, for example, could be obtained as follows:
. logit treat age-re75 [pw=w0], nolog

Logistic regression                             Number of obs = 438
                                                Wald chi2(8)  = 93.08
                                                Prob > chi2   = 0.0000
Log pseudolikelihood = -159.20379               Pseudo R2     = 0.4663

(coefficient table omitted)
(output omitted)
This worked quite well and much of the group di↵erences disappeared, but there are still
some non-negligible discrepancies, especially with respect to pre-treatment earnings. We
can now try to improve the reweighting using entropy balancing:
. ebalfit age-re75 [pw=w0], by(treat)
Iteration 0: balancing loss = .88095577
Iteration 1: balancing loss = .20574871
Iteration 2: balancing loss = .11227971
Iteration 3: balancing loss = .01088361
Iteration 4: balancing loss = .00056568
Iteration 5: balancing loss = 1.833e-06
Iteration 6: balancing loss = 1.884e-11
Iteration 7: balancing loss = 9.108e-17
(coefficient table omitted)
Option by() identifies the groups to be compared; the specified variable must be di-
chotomous (e.g. 0 and 1). By default, ebalfit takes the group with the lower value as
the group to be reweighted and takes the other group as the reference group. Specify
option swap to switch the groups.
The coefficients displayed by ebalfit are similar to the coefficients of the logit
model above. In fact, the coefficients do have a similar interpretation: a positive effect
means that people with high values on the respective variable tend to be overrepresented
in the reference group (and vice versa).
The output contains some more information that is relevant. For example, the
“balancing loss” is a measure of how well ebalfit managed to balance the data. In the
current situation, perfect balancing could be achieved as the balancing loss is essentially
zero.3 Furthermore, some information on the distribution of the weights is provided.
cv is the coefficient of variation of the weights, defined as
$$cv = \frac{\sqrt{\frac{1}{N_S}\sum_{i \in S}(\hat\omega_i - \bar\omega_S)^2}}{\bar\omega_S} \quad\text{with}\quad \bar\omega_S = \frac{1}{N_S}\sum_{i \in S}\hat\omega_i$$
where summation is across the reweighted group ($N_S$ is the number of observations
in the reweighted group); deff is the "design effect" of the weights based on Kish's
3. ebalfit returns error if perfect balance cannot be achieved, unless option relax is specified. The
critical value for "perfect balance" can be set using option btolerance(). By default, the critical
value is set to $10^{-6}$, that is, a solution is considered as balanced if the balancing loss, the maximum
relative difference between the reweighted means and the target values, is smaller than 0.000001.
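These two quantities can also be computed by hand from the stored weights. Here is a minimal Mata sketch, assuming the balancing weights have been saved in variable wbal (as done with predict below) and using Kish's formula deff = N Σω²/(Σω)², which is also what the small DEFF program shown further below computes:

    mata:
        w = st_data(., "wbal")
        w = select(w, st_data(., "treat") :== 0)  // reweighted group only
        sqrt(mean((w :- mean(w)):^2)) / mean(w)   // cv (population variance)
        rows(w) * sum(w:^2) / sum(w)^2            // Kish's design effect
    end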
Both statistics indicate that there is large variation in the weights. Apparently, the two
groups are very different and balancing them is an ambitious exercise.

As mentioned, however, despite the difficulty of the problem, the output by ebalfit
tells us that perfect balance has been achieved. We can confirm that this is true by
replaying results with option baltab to display the balancing table that is provided
by ebalfit (but is suppressed in the output by default):
Options noheader, nowtable, and notable have been specified so that the default
output is not displayed again. As is evident, the reweighted means (column "Balanced
value") perfectly match the target values (column "Target value"). The standardized
difference between the target value and the balanced value is essentially zero for all
variables.
If we still do not trust this result, we can use predict to generate the balancing
weights and then construct a balancing table manually:
. predict wbal
. table () (treat) [pw=wbal], stat(mean age-re75) nototal
(output omitted)
A comparison of the weights from IPW and the weights from entropy balancing reveals
that the latter contain more variation:4
. dstat (cv0) ipw wbal if treat==0
cv0 Number of obs = 253
. program DEFF
  1.         syntax varname [if]
  2.         tempvar x2
  3.         quietly generate `x2' = `varlist'^2
  4.         summarize `x2' `if', meanonly
  5.         local NX2 = r(sum) * r(N)
  6.         summarize `varlist' `if', meanonly
  7.         display as res `NX2'/r(sum)^2
  8. end
. DEFF ipw if treat==0
10.419239
. DEFF wbal if treat==0
12.301634
Apparently, the better balance came at the cost of more variation in the weights. Large
variation in weights generally reduces statistical efficiency so that weights with lower
variation may be preferable. As illustrated below, however, this is not necessarily true
for treatment effect analyses because the degree to which the weights balance the data
also plays a role for the efficiency of the estimate. Yet, for some applications, for example
when using entropy balancing to construct sampling weights, we might want to apply
some trimming to the resulting weights to reduce the design effect without sacrificing
too much precision in balance.5
                    Mean   Std. err.   [95% conf. interval]
c.re78@treat
           0    9104.129    758.2113    7613.935   10594.32
4. I use command dstat (Jann 2020a), available from the SSC Archive, because it allows computing
the cv in the same way as ebalfit does. The cv could also be computed using [R] tabstat, which
applies a slightly different definition (division by N − 1 rather than N in the variance).
5. Also see Kranker et al. (2020) who propose penalized CBPS to address this issue (on CBPS see
footnote 1).
However, as seen above, the two groups are very different in terms of pre-treatment
characteristics. Using IPW or entropy balancing to remove these discrepancies, the
treatment effect estimate becomes positive:

                    Mean   Std. err.   [95% conf. interval]
c.re78@treat
           0    5088.788    943.9743    3233.493   6944.082
           1    6004.657    567.8919    4888.518   7120.796
. drop ipw
. mean re78 [pw=wbal], over(treat)

Mean estimation                     Number of obs = 438

                    Mean   Std. err.   [95% conf. interval]
c.re78@treat
           0    4174.016    999.5839    2209.426   6138.605
           1    6004.657    567.8919    4888.518   7120.796
. drop wbal
The two effect estimates are not statistically significant, but note that we did not yet
correct the standard errors for the fact that the balancing weights are estimated. To do
so for the estimate based on entropy balancing, we can use the formulas provided in
section 2.6. As inputs we need the influence functions of the entropy balancing coefficients
as well as the influence functions of the mean estimates assuming the balancing weights
as fixed. The former we can obtain by applying command predict after ebalfit; the
latter we can compute as $\text{IF}_i^{\hat\mu} = \frac{W}{W_S} S_i(x_i - \hat\mu)$, where $x_i$ is the variable of interest, $S_i$ is
an indicator for the analyzed subsample, $W_S$ is the sum of weights in the subsample,
and $W$ is the overall sum of weights. In the computations below I omit the leading
$W$ because this is how ebalfit defines influence functions and because it implies that
factor $c$ in the correction formulas will be equal to 1 and can be omitted. To obtain
standard errors from influence functions that are scaled in this way, command [R] total
can be used (rather than command [R] mean).
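In analogy to the IPW code shown in the appendix, the required variables could be created along the following lines (a sketch: I assume that predict after ebalfit provides an option, here called ifs, for storing the influence functions of the coefficients; see help ebalfit for the exact syntax):

    . ebalfit age-re75 [pw=w0], by(treat)
    . predict wbal                        // balancing weights
    . predict double IFeb*, ifs           // coefficient IFs (option name assumed)
    . summarize re78 if treat==0 [aw=wbal], meanonly
    . generate IFy0 = (treat==0) * (re78 - r(mean)) / r(sum_w)
    . summarize re78 if treat==1 [aw=wbal], meanonly
    . generate IFy1 = (treat==1) * (re78 - r(mean)) / r(sum_w)
    . total IFy0 IFy1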
Note how total applied to the influence functions of the two mean estimates reproduces
the standard errors reported by mean above. We can now correct the influence functions
using the formulas from section 2.6. We only need to correct IFy0, the influence function
of the mean estimate in the control group, because in the treatment group we did not
apply any reweighting.
. mata:
mata (type end to exit)
: // data
: grp = st_data(., "treat")
: X = st_data(., "age-re75")
: IFy0 = st_data(., "IFy0")
: IFeb = st_data(., "IFeb*")
: wbal = st_data(., "wbal")
: w0 = st_data(., "w0")
: // compute (negative of) G
: G = colsum(select(wbal :* IFy0 :* X, grp:==0))'
: // adjust IF
: st_store(., st_addvar("double", "IFy0c"), wbal :/ w0 :* IFy0 + IFeb * G)
: end
To compute the corrected standard error of the reweighted mean difference, take the
total of the difference between the (corrected) influence functions of the two means:
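A sketch of this step, using the variable names from above:

    . generate double IFdiff = IFy1 - IFy0c    // corrected IF of the difference
    . total IFy1 IFy0c IFdiff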
. drop IF*
We see how taking account of the estimated nature of the balancing weights reduces
the standard error of the mean estimate in the control group and also brings down the
standard error of the treatment effect estimate, such that the treatment effect estimate
is now statistically significant (t = 1830.6/906.2 = 2.02, p = 0.043).6
As mentioned above, entropy balancing is doubly robust, so applying a regression
adjustment model to the reweighted data does not change the estimate of the treatment
effect (as long as the same covariates are used in the regression adjustment). I illustrate
this here by running [TE] teffects ra including the entropy balancing weights:
                                       Robust
        re78   Coefficient  std. err.      z   P>|z|   [95% conf. interval]

ATET
       treat
    (1 vs 0)     1830.641    905.139    2.02   0.043    56.60155   3604.681

POmean
       treat
           0     4174.016   749.7919    5.57   0.000     2704.45   5643.581
The estimate is still the same and also the standard error is identical even though
regression adjustment assumed the balancing weights as fixed (the small difference is

6. The appendix illustrates how a similar correction can be implemented for IPW.
(output omitted)
. predict wbal2
. summarize age education black if treat==0 [iw=wbal2]
(output omitted)
. corr age education black if treat==0 [aw=wbal2]
(output omitted)

                  age  educat~n     black
        age    1.0000
  education   -0.0080    1.0000
      black    0.0535   -0.0368    1.0000

. corr age education black if treat==1 [aw=wbal2]
(sum of wgt is 185)
(obs=185)

                  age  educat~n     black
        age    1.0000
  education   -0.0080    1.0000
      black    0.0535   -0.0368    1.0000
We see that means, standard deviations (and variances), as well as correlations (and
covariances) have been perfectly balanced. Alternatively, option targets() can be used
to generate the necessary terms automatically. ebalfit will then expand the variable
list accordingly, taking account of the types of the variables (e.g., no terms for the variances
of categorical variables will be included, as balancing the mean of a 0/1 variable also
balances its variance):
. ebalfit age education 1.black, by(treat) targets(variance covariance) ///
> nolog vsquish
Entropy balancing Number of obs = 438
Wald chi2(8) = 80.49
Prob > chi2 = 0.0000
Evaluator = bl
Main = 0.treat (253 obs) Loss type = reldif
Reference = 1.treat (185 obs) Balancing loss = 2.532e-15
balancing weights
minimum average maximum total CV DEFF
.00023148 .7312253 13.072914 185 1.7253741 3.9769159
(coefficient table omitted)
Manually typing out the higher-order and interaction terms is only needed if one wants
to balance a subset of the higher moments and covariances (e.g. the covariance between
age and education, but not between black and the other variables).
(output omitted: coefficient table and balancing table)
The balancing table illustrates that the reweighted sample data perfectly reproduce
the population values. We did not specify a population size, so ebalfit normalized the
sum of balancing weights to the sample size. Assume that the size of the population is
1.36 million. We could normalize the weights to this target sum as follows:
. ebalfit age education black hispanic, population(1.36e6: 30 10 .4 .1) nolog
Entropy balancing Number of obs = 438
Wald chi2(4) = 92.37
Prob > chi2 = 0.0000
26 Entropy balancing as an estimation command
Evaluator = bl
Loss type = reldif
Population size = 1,360,000 Balancing loss = 1.547e-14
balancing weights
minimum average maximum total CV DEFF
587.87243 3105.0228 14976.219 1360000 .68869277 1.4742977
(coefficient table omitted)
The target values do not necessarily need to be true values from a population. We
can also use entropy balancing to construct as-if scenarios by setting the targets to
theoretically interesting values, as long as the targets are not so far away from the
center of the data that no balancing solution exists.
Such partial reweighting can be useful, for example, to study the "contributions"
of individual covariates to an overall group difference in an outcome variable.8 In
the following example we see that the group difference in 1978 earnings is reduced
substantially if the racial distribution is adjusted while keeping age and education fixed.
Additionally adjusting age and education only leads to a minor further decrease in the
difference.
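A sketch of how such an analysis might be set up, consistent with footnote 7 (the three-term model and the weight variable names are my own illustration):

    . ebalfit age education black, by(treat) adjust(3)  // adjust race only
    . predict wrace
    . mean re78 [pw=wrace], over(treat)
    . ebalfit age education black, by(treat)            // additionally adjust age and education
    . predict wall
    . mean re78 [pw=wall], over(treat)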
7. The same result could also be obtained by typing noadjust(1 2) instead of adjust(3).
8. See Fortin et al. (2011) for an overview of counterfactual decomposition methods; see Jann (2008)
for an implementation of the popular Oaxaca-Blinder decomposition in Stata.
(output omitted)
5 Conclusions
Entropy balancing is a powerful alternative to other reweighting techniques such as
inverse probability weighting based on logistic regression. In this paper I defined the
model, derived influence functions for the parameters of the model, and illustrated how
consistent standard errors can be obtained for statistics based on the reweighted data.
I further presented software that implements the discussed methods.
The software provides a convenient tool for estimating balancing weights and gen-
erating influence functions, but adjusting statistical inference for reweighted statistics
still requires a good understanding of the problem at hand and some programming skills
on the side of the user. Based on the results presented in this paper, entropy balancing
could be integrated into other estimation commands as a preprocessing device, such that
reweighted estimates with consistent statistical inference would be readily available for
a variety of applications without the need to write lengthy code. Some steps in this
direction have already been taken. Command dstat, available from the SSC Archive,
offers a balance() option that applies reweighting to a large collection of summary
statistics (Jann 2020a). In fact, the treatment effect estimate in section 4.2 can be
replicated by dstat in a single line of code:
Balancing:
method = eb
reference = 1.treat
controls = e(balance)
Similar support for reweighting is provided in reldist, a command for relative distribution
analysis (Jann 2020c). Furthermore, the matching and reweighting command
kmatch (Jann 2017) supports entropy balancing, albeit without specific adjustment of
statistical inference. Since regression adjustment after entropy balancing yields consistent
standard errors, however, command kmatch can still be used to obtain valid results.
Here is a replication of the treatment effect from section 4.2 using kmatch. The trick is
to include the full vector of covariates also in the outcome equation.
. kmatch eb treat age-re75 (re78=age-re75) [pw=w0], nomtable att
(fitting balancing weights ... done)
Entropy balancing Number of obs = 438
Balance tolerance = .00001
Treatment : treat = 1
Targets : 1
Covariates : age education black hispanic married nodegree re74 re75
RA equations: re78 = age education black hispanic married nodegree re74 re75 ...
Treatment-effects estimation
. drop score
. drop IFipw9 // drop the IF for the constant; it is not needed
. summarize re78 if treat==0 [aw=ipw], meanonly
. generate IFy0 = (treat==0) * (re78 - r(mean)) / r(sum_w)
. summarize re78 if treat==1 [aw=ipw], meanonly
. generate IFy1 = (treat==1) * (re78 - r(mean)) / r(sum_w)
. mata:
mata (type end to exit)
: // data
: grp = st_data(., "treat")
: X = st_data(., "age-re75")
: IFy0 = st_data(., "IFy0")
: IFipw = st_data(., "IFipw*")
: ipw = st_data(., "ipw")
: w0 = st_data(., "w0")
: // compute (negative of) G
: G = colsum(select(ipw :* IFy0 :* X, grp:==0))'
: // adjust IF
: st_store(., st_addvar("double", "IFy0c"), ipw :/ w0 :* IFy0 + IFipw * G)
: end
. drop IF*
The effect of the correction is less pronounced than for entropy balancing. This is related
to the finding that conditioning on the estimated propensity score is more efficient
than conditioning on the true propensity score, because random imbalance is partially
removed. For entropy balancing this efficiency gain is stronger than for IPW because
entropy balancing completely removes random imbalance.
7 References
Dehejia, R. H., and S. Wahba. 1999. Causal Effects in Non-Experimental Studies:
Reevaluating the Evaluation of Training Programs. Journal of the American Statistical
Association 94(448): 1053–1062.
Hainmueller, J., and Y. Xu. 2011. ebalance: Stata module to perform Entropy reweight-
ing to create balanced samples. Statistical Software Components S457326. Available
from https://ideas.repec.org/c/boc/bocode/s457326.html.
———. 2013. ebalance: A Stata Package for Entropy Balancing. Journal of Statistical
Software 54(7): 1–18.
Imai, K., and M. Ratkovic. 2014. Covariate balancing propensity score. Journal of the
Royal Statistical Society: Series B (Statistical Methodology) 76(1): 243–263.
Jann, B. 2008. The Blinder-Oaxaca decomposition for linear regression models. The
Stata Journal 8(4): 453–479.
———. 2020c. Relative distribution analysis in Stata. University of Bern Social Sciences
Working Papers 37. Available from https://ideas.repec.org/p/bss/wpaper/37.html.
Kish, L. 1965. Survey Sampling. New York: Wiley.
Kranker, K. 2019. psweight: IPW- and CBPS-type propensity score reweighting,
with various extensions. Statistical Software Components S458657. Available from
https://ideas.repec.org/c/boc/bocode/s458657.html.
Kranker, K., L. Blue, and L. Vollmer Forrow. 2020. Improving Effect Estimates by Limiting
the Variability in Inverse Propensity Score Weights. The American Statistician.