Paper For Referee Report
May 3, 2020
Abstract
Using real estate transaction data, we provide the first empirical test of the com-
parative cheap talk model in Chakraborty and Harbaugh (2010), which predicts that
a real estate agent can credibly reveal information about a house by making comparative
statements that make the house more appealing along some dimensions but less so along
others. Consistent with the prediction, we find that comparative statements are associated
with a 0.8% price premium, all else equal. The premium is larger for houses with more
potential buyers, but switches sign and becomes a discount when there are too few
buyers.
Page 2 of 39
1 Introduction
Institutions and individuals rely heavily on the advice of experts when making decisions, because
experts hold specialized information. Experts, however, often have state-independent (and
therefore biased) preferences, so they may want the decision maker to make the same
decision regardless of the state of nature. For instance, a financial adviser has a strong
incentive to recommend the stock or option that pays him or her the most, whether or not those
investments are really best for the client. A biased salesperson always wants a customer to
buy a product regardless of its quality.
The question of how the expert can credibly communicate important information for bet-
ter decision making has been intensively studied in the literature (Crawford and Sobel, 1982;
Ottaviani and Sorensen, 2006; Inderst and Ottaviani, 2009). In particular, Chakraborty and
Harbaugh (2010) prove that when information is multidimensional, a very biased expert who
has state-independent preferences can still influence decision making by making comparative
statements that help the expert on certain dimensions but hurt him on other dimensions.
In this paper, we focus on the cheap talk communication between a real estate agent who
wants to sell a house and the potential buyers, and we will test Chakraborty and Harbaugh’s
(2010) comparative cheap talk model in real estate transactions.
Empirically testing the comparative cheap talk model, or any model of information
communication, is very challenging given the difficulty of observing and measuring the
communication between market participants. Nevertheless, real estate transactions provide a very
attractive setting to test the impact of comparative cheap talk by experts, for two main
reasons. First, the real estate market is characterized by heterogeneous products,
unsophisticated buyers and sellers, and a significant role for agents. A typical real estate consumer engages
in a limited number of transactions during his or her lifetime. Due to imperfect information
on the market value of properties and the location of potential buyers, sellers often seek the
services of a real estate agent for assistance in the home selling process: 88% of buyers and
92% of sellers use a real estate agent when buying or selling their home (National Realtor
Association, 2016). Since real estate agents play such an important role in real estate trans-
actions, the communication strategies used by them can potentially affect the final sales
of competing houses and then re-estimate the price premium of comparative statements in
these two subsamples. As expected, the price impact is positive for houses with few
competing houses, but becomes negative for houses with many competing houses.
The third dimension is the atypicality index of the house. Following Haurin
(1988), we calculate the atypicality index for each house and split the full sample into two
mutually exclusive and exhaustive subsamples according to this index. A house with a high
atypicality index is more likely to have unusual characteristics, and therefore may have fewer
potential buyers. Indeed, we find that the price premium of comparative statements is larger
for houses with a higher atypicality index.
This paper is related to two strands of literature. First, it is built on the theoretical
literature of communication of non-verifiable information, i.e., cheap talk. Crawford and
Sobel (1982) first find that information can be partially communicated in the equilibrium
of a cheap talk model where the information space is one dimensional and the agent has
state-dependent preferences. Chakraborty and Harbaugh (2007, 2010) further show that
informative communication can be achieved in multidimensional models even if the agent’s
preferences are state independent. More recently, Malenko and Tsoy (2019) prove that
information can be fully conveyed in a dynamic setting, such as an ascending-price (English)
auction. Bouvard, Chaigneau and Motta (2015) build a stylized model to study the optimal
level of information disclosure by regulators of the financial system. This paper is the first,
to the best of our knowledge, to provide empirical evidence on cheap talk theories.
Second, the paper contributes to the empirical literature on information asymmetry in
the real estate market. For instance, Garmaise and Moskowitz (2004) focus on the information
asymmetry between sellers and buyers in the commercial real estate market. By using
property tax assessment quality as a measure of information asymmetry, they find strong
evidence of information asymmetry. Rutherford, Springer and Yavas (2005) and Levitt and
Syverson (2008) study the information asymmetry between real estate agents and their seller
clients. By using the MLS data, they find agents sell their clients’ houses cheaper and faster
than their own houses, and interpret this finding as evidence of the informational disadvan-
tage of the sellers relative to agents. More recently, Agarwal, He, Sing and Song (2019)
and Allen, Rutherford, Rutherford and Yavas (2019) focus on the information asymmetry
on the buy-side, between real estate agents and buyers, in the housing market. They find
that real estate agents enjoy a 2.45%-4% discount when buying houses for themselves versus
for their clients. Kurlat and Stroebel (2015) find that neighborhood characteristics provide
a significant source of information asymmetry in housing markets. In this paper, we instead
look at the information asymmetry between the listing agent and the buyer.
The next section of the paper presents the predictions of the theoretical model. Section
3 provides an overview of the data. Section 4 discusses the estimation
of the models. Section 5 reports the results of the robustness checks. Section
6 provides empirical results on how the impact of comparative statements varies with the
number of buyers. Section 7 summarizes the results and offers some concluding remarks.
2 The Model
The model builds heavily on Section II.C of Chakraborty and Harbaugh (2010). A real
estate agent lists a house for sale on the Multiple Listing Service (MLS) so that the listing
is viewable by all potential buyers. The agent privately knows multiple characteristics of
the house that affect its value but are hard to measure and quantify, such as
the quality of the craftsmanship and materials used to build the house and how well the
house is maintained. Without loss of generality, we focus on two characteristics denoted by
θ = (θ1 , θ2 ).
The agent can communicate information about θ to all potential buyers through costless
public remarks, m ∈ M , in the MLS listing. None of the potential buyers knows the true
value of θ, but all of them share the same prior belief about θ, which (without
loss of generality) can be assumed to be uniformly distributed on [0, 1] × [0, 1]. Indeed, even
distribution functions of θ1 and θ2 , respectively. And we can treat (η1 , η2 ) as the new state
variable.
A communication strategy of the agent specifies a message m ∈ M as a function of the house’s
characteristics θ.¹ Each buyer estimates the expected value of θ, given his or her prior belief, the
agent’s communication strategy, and the agent’s remarks. The updated estimate of each
buyer can be denoted by e = E[θ|m].
Given the updated estimate, each potential buyer’s valuation of the house is vi (e) =
αi e1 + (1 − αi )e2 , where αi ∈ [0, 1] measures how much Buyer i cares about θ1 relative to θ2 .
We assume that each αi is independent and identically distributed uniformly on [0, 1], and
is independent of θ. Buyer i privately knows his/her own αi .
The listing agent is paid a commission that is a fixed percentage of the final sale price.
The agent’s utility can be written as:
u = r · P, (1)
where r is the commission rate and P is the final sale price. We assume the agent sells
the house in a first-price auction. Since each buyer’s valuation of the house is private and
independently distributed (given e1 and e2 ), according to the Revenue Equivalence Theorem,
the first-price and second-price auctions yield the same expected sale price (Menezes and
Monteiro, 2008). In addition, in the second-price auction, the Bayesian Nash equilibrium
bidding strategy of each buyer is to bid his/her valuation. Therefore, the expected sale price
of the house is:
E[P ] = E[v2:n ], (2)
where v2:n is the second highest valuation among the n potential buyers.
¹ We focus on pure strategies of the agent.
To derive the formula of v2:n , we need to distinguish between two cases: (i) e1 ≥ e2 and
(ii) e1 < e2 . In particular, we have:
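$$v_{2:n} = \begin{cases} \alpha_{2:n} \cdot e_1 + (1 - \alpha_{2:n}) \cdot e_2, & \text{if } e_1 \geq e_2 \\ \alpha_{n-1:n} \cdot e_1 + (1 - \alpha_{n-1:n}) \cdot e_2, & \text{if } e_1 < e_2 \end{cases} \qquad (3)$$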
where αj:n is the jth highest value of α among all the n buyers. That is, α2:n is the second
highest value of α, and αn−1:n is the second lowest, among all the n buyers.
Substituting (3) into (2) and then (2) into (1), we have:
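$$E[u(e)] = \begin{cases} r \cdot \big[ E[\alpha_{2:n}] \cdot e_1 + (1 - E[\alpha_{2:n}]) \cdot e_2 \big], & \text{if } e_1 \geq e_2 \\ r \cdot \big[ E[\alpha_{n-1:n}] \cdot e_1 + (1 - E[\alpha_{n-1:n}]) \cdot e_2 \big], & \text{if } e_1 < e_2 \end{cases} \qquad (4)$$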
Since α follows a uniform distribution on [0, 1], E[αj:n ] = (n − j + 1)/(n + 1). Then (4)
can be rewritten as:
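$$E[u(e)] = \begin{cases} \frac{r}{n+1} \cdot \big[ (n - 1) \cdot e_1 + 2 \cdot e_2 \big], & \text{if } e_1 \geq e_2 \\ \frac{r}{n+1} \cdot \big[ 2 \cdot e_1 + (n - 1) \cdot e_2 \big], & \text{if } e_1 < e_2 \end{cases} \qquad (5)$$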
Note that E[u(e)] is strictly increasing in both e1 and e2 . That is, the expected sale price
increases with buyers’ estimates of both characteristics of the house. Therefore, the listing
agent has an incentive to overstate both components of θ through exaggeration and puffery; such
claims, however, are not credible.
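The closed form in (5) can be checked numerically. The Python sketch below is illustrative only: the function names, the commission rate and the number of simulation draws are assumptions introduced here, not part of the model. It compares formula (5) with a Monte Carlo simulation in which each buyer's α is drawn uniformly on [0, 1] and the second-highest valuation sets the price.

```python
import random

def expected_utility(e1, e2, n, r):
    """Closed form (5): agent's expected utility given estimates (e1, e2), n >= 2 buyers."""
    hi, lo = max(e1, e2), min(e1, e2)
    # In (5), weight (n - 1) falls on the larger estimate and weight 2 on the smaller one.
    return r / (n + 1) * ((n - 1) * hi + 2 * lo)

def simulated_utility(e1, e2, n, r, draws=200_000, seed=0):
    """Monte Carlo counterpart: average of r times the second-highest valuation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # Each buyer values the house at alpha * e1 + (1 - alpha) * e2, alpha ~ U[0, 1].
        values = sorted(a * e1 + (1 - a) * e2
                        for a in (rng.random() for _ in range(n)))
        total += values[-2]  # second-highest valuation, as in equation (2)
    return r * total / draws

if __name__ == "__main__":
    # Hypothetical inputs: e = (0.8, 0.3), five buyers, 6% commission.
    print(expected_utility(0.8, 0.3, 5, 0.06))
    print(simulated_utility(0.8, 0.3, 5, 0.06))
```

The two numbers should agree up to simulation noise, which shrinks as `draws` grows.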
A pure-strategy perfect Bayesian equilibrium of this cheap talk game is fully specified by
the listing agent’s communication strategy. There is always a “babbling” equilibrium where
there is no communication. That is, the listing agent praises both characteristics of
the house and the buyers simply ignore what the agent says.
There might be, however, other equilibria—what we call responsive equilibria—in which
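$$E[u(e)] = \begin{cases} \frac{r}{3} \cdot \big[ e_1 + 2 e_2 \big], & \text{if } e_1 \geq e_2 \\ \frac{r}{3} \cdot \big[ 2 e_1 + e_2 \big], & \text{if } e_1 < e_2 \end{cases} \qquad (6)$$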
In this case, E[u(e)] is a concave function of e, and its indifference curves are “bowed
inward” as shown in Figure 1a. Suppose the space is partitioned by a line h— from (0, 0) to
(1, 1)— into two subspaces R+ and R−, and the listing agent indicates which region θ falls
into. The estimates e+ = E[θ|R+] and e− = E[θ|R−] fall on the same indifference curve,
implying that the listing agent has no incentive to misreport which region θ falls in.
Note that in the babbling equilibrium, where messages convey no information, buyers’
estimate is at the center point c = (1/2, 1/2) in Figure 1a. Formula (6) is an increasing and
concave function of e, implying that c is on an indifference curve that has a higher expected
sale price than the indifference curve where e+ and e− are located. That is, the equilibrium
estimates E[θ|m] lead to lower expected agent utility than E[θ]. In other words, when there
are only two buyers, there is a responsive equilibrium that has a lower expected sale price
than the babbling equilibrium.
Next, consider the three-buyer case, where n = 3. Then we have:
$$E[u(e)] = \frac{r}{2} (e_1 + e_2). \qquad (7)$$
That is, the listing agent’s expected utility is increasing and linear in e. In this case, the
indifference curve of the agent’s utility is shown in Figure 1b. Again, we can draw a line
h— from (0, 0) to (1, 1)— that partitions the space into two subspaces: R+ and R−. If
the listing agent indicates which region θ falls into, then the estimate e+ = E[θ|R+] and
e− = E[θ|R−] will fall on the same indifference curve, implying that the listing agent has no
incentive to misreport which region θ falls in. In addition, c, e+ and e− are all on the
same indifference curve, implying that the responsive equilibrium has the same expected utility
(sale price) as the babbling equilibrium when there are three buyers.
When there are more than 3 buyers, i.e., n > 3, the indifference curves of (5) are as
shown in Figure 1c: since (5) is increasing and convex in e, the indifference curves are
“bowed outward.” Again, we can draw a line h— from (0, 0) to (1, 1)— that partitions the
space into two subspaces: R+ and R−. If the listing agent indicates which region θ falls
into, then the estimate e+ = E[θ|R+] and e− = E[θ|R−] will fall on the same indifference
curve, implying that the listing agent has no incentive to misreport which region θ falls in.
In addition, both e+ and e− are on an indifference curve that has a higher expected utility
than the indifference curve where c is located. This implies that the responsive equilibrium
has a higher expected utility (sale price) than the babbling equilibrium when there are more
than three buyers.
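The three cases can be verified with exact arithmetic. The sketch below assumes the uniform prior on the unit square, so that the diagonal h splits the square into two triangles whose centroids are (2/3, 1/3) and (1/3, 2/3); labelling the first as e+ is an arbitrary choice made here for illustration, as are the function names. It evaluates (5) at e+, e− and c for several values of n.

```python
from fractions import Fraction

def expected_utility(e1, e2, n, r=Fraction(1)):
    """Agent's expected utility from equation (5), for n >= 2 buyers."""
    hi, lo = max(e1, e2), min(e1, e2)
    return r / (n + 1) * ((n - 1) * hi + 2 * lo)

# Centroids of the two triangles cut by the diagonal h, under the uniform prior.
E_PLUS = (Fraction(2, 3), Fraction(1, 3))   # assumed label: region with theta1 >= theta2
E_MINUS = (Fraction(1, 3), Fraction(2, 3))
CENTER = (Fraction(1, 2), Fraction(1, 2))   # babbling estimate c

def premium(n):
    """Responsive-equilibrium utility minus babbling utility, with r normalized to 1."""
    responsive = expected_utility(*E_PLUS, n)
    # Both messages must lie on the same indifference curve for the agent to be truthful.
    assert responsive == expected_utility(*E_MINUS, n)
    return responsive - expected_utility(*CENTER, n)

for n in range(2, 7):
    print(n, premium(n))
```

Under these assumptions the difference works out to (n − 3)/(6(n + 1)) times r: negative for n = 2, zero for n = 3, and positive and increasing for n > 3, which matches the three cases discussed above.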
Therefore, the expected sale price increases with the number of buyers. We summarize the
above results in the following two propositions:
Proposition 2. The expected sale price in the responsive equilibrium increases with the number
of buyers. In particular, the responsive equilibrium is associated with a price premium
(relative to the babbling equilibrium) when there are many buyers, but a price discount when
there are too few (fewer than 4) buyers.
In what follows, we will test the above two propositions using the MLS data.
3 Data
We use Multiple Listing Service (MLS) data from Indiana, which cover all residential
real estate transactions involving real estate agents in Johnson County, Indiana, from June
1, 2000 to May 31, 2010. Johnson County is one of the largest counties in Indiana and is
essentially a suburb of Indianapolis in the adjacent county of Marion.
The MLS data employed are unique in several respects. First, the data contain detailed
information about each transaction, including sale price, property characteristics, contract
terms, calendar information (listing and closing dates), and geographic location (school
district). Physical characteristics include, but are not limited to, various measures of the scale
of the property (number of bathrooms, size of garage, fireplace, pool, and square footage),
the age of the property, and siding (vinyl, stone, brick, etc.). Contract terms include the duration of
the listing contract, the buyer agent commission rate, and whether the listing agent has the
exclusive right to sell, which gives the agent the right to receive the commission no matter
who brings the buyer. Calendar information is used to generate the property marketing
span (i.e., days-on-market), which is calculated as the number of days from the listing date to
the sale date.
Second, the MLS data contain the listing agent’s public remarks for each listing. By
searching for the positive and negative words in the public remarks, we can identify the
usage of comparative statements. We follow the literature (Goodwin, Waller and Weeks,
2014, 2018; Haag, Rutherford and Thomson, 2000) in defining positive and negative
words, which are shown in Table 1. We say the comparative communication strategy is used
if at least one positive word and at least one negative word appear in the public remarks.
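The classification rule can be sketched in a few lines of Python. The word lists below are hypothetical placeholders chosen for illustration (the actual lists are those in Table 1), and the function name and tokenization rule are likewise assumptions.

```python
import re

# Hypothetical word lists for illustration only; the paper's lists are in its Table 1.
POSITIVE_WORDS = {"beautiful", "spacious", "updated"}
NEGATIVE_WORDS = {"needs", "dated", "as-is"}

def uses_comparative_statement(remark, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return True if the remark has at least one positive and one negative word."""
    # Lowercase the remark and split it into words, keeping hyphenated words whole.
    tokens = set(re.findall(r"[a-z]+(?:-[a-z]+)*", remark.lower()))
    return bool(tokens & positive) and bool(tokens & negative)

if __name__ == "__main__":
    print(uses_comparative_statement("Spacious kitchen, but the roof needs work."))
    print(uses_comparative_statement("Beautiful and spacious home!"))
```

A remark with only positive words, or only negative words, is classified as non-comparative under this rule.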
The original data have 18,895 observations (transactions). To minimize the impact of
different property types and owner types, we focus on single-family houses
owned by non-agent individuals, omitting condominiums and agent-owned,
bank-owned and government-owned houses. We also drop foreclosure sales and short sales to
minimize the impact of motivated sellers. Finally, we discard observations with missing
values or clearly erroneous data (zero bedrooms, zero bathrooms, less than 300 square feet,
etc.). The final data have 14,285 transactions, among which 6,152 (43%) are associated with
comparative statements. This result suggests that the responsive equilibrium where agents
use comparative statements is very common in the real estate market.
Table 1 provides summary statistics for transactions where comparative statements are
used and for the other transactions, respectively. While measures of property size (e.g.,
number of bedrooms, square footage of the living space, and lot size) show no systematic
differences across the two types of transaction, characteristics unrelated to size do vary
systematically. For instance, houses sold with comparative statements are more likely to have
a fireplace, pool and basement, but are less likely to be newly constructed. These systematic
differences in the observables highlight the importance of controls in the analysis
that follows.
ments
We estimate the impact of comparative statements on the house sale price with the following
hedonic model:
$$\log(\text{Sale Price}) = \alpha + \beta \cdot \text{Cheaptalk} + \gamma \cdot X + \epsilon. \qquad (9)$$
The dependent variable is the logged sale price, log(Sale Price). X is a vector of property
and transactional information. The key independent variable is the indicator variable—
12
Page 13 of 39
Cheaptalk—which equals 1 if the listing agent used comparative statements in the public
remark of the listing, and 0 otherwise.
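As a rough illustration of how a specification like (9) is estimated, the sketch below runs ordinary least squares on synthetic data using only the Python standard library. Everything in it is an assumption made for illustration: the function names, the single control variable, and all numerical values, including the "true" coefficient of 0.008 used to generate the data. It is not the estimation code behind Table 3.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(k):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def hedonic_ols(log_price, cheaptalk, controls):
    """OLS of log price on a constant, the Cheaptalk dummy, and control columns.

    `controls` holds one row of control values per transaction.
    Returns [alpha, beta, gamma_1, ..., gamma_k].
    """
    rows = [[1.0, float(c)] + list(x) for c, x in zip(cheaptalk, controls)]
    k = len(rows[0])
    # Normal equations: (X'X) coef = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_price)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic data: every number below is made up for illustration.
rng = random.Random(0)
n = 2000
cheaptalk = [rng.randint(0, 1) for _ in range(n)]
sqft = [[rng.gauss(0.0, 1.0)] for _ in range(n)]      # one standardized control
log_price = [11.5 + 0.008 * c + 0.30 * x[0] + rng.gauss(0.0, 0.05)
             for c, x in zip(cheaptalk, sqft)]

alpha_hat, beta_hat, gamma_hat = hedonic_ols(log_price, cheaptalk, sqft)
print(f"estimated beta = {beta_hat:.4f}")
```

Because the dependent variable is in logs, a small coefficient on the indicator is approximately a percentage effect: a β of 0.008 corresponds to a price difference of roughly 0.8%.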
We estimate the specification in Equation (9) and report the estimated coefficients, the
robust standard errors, and the significance levels in Table 3. Each column in Table 3
represents a separate regression, with the specification gradually saturated from left to right
as the set of control variables expands.
In Column (1), we regress the logged sale price on the Cheaptalk indicator variable
and house characteristics. The house characteristic information includes the number of
bedrooms, number of bathrooms, size of garage, etc. After controlling for these variables,
we find that the price impact of comparative statements is 1.73% and statistically significant.
Also noteworthy, house characteristic information can explain almost 80% of the variation in
the sale price, suggesting the presence of important house characteristic differences affecting
sale prices.
In Column (2), we control for geographic and calendar information of the transaction.
In particular, we control for fixed effects for the school district and for the month and year of
sale. The price impact of comparative statements falls slightly to 1.37% and remains
significant at the 1% level.
In Column (3), we control for measures of agent’s effort, such as the number of images
of the house put online, whether the agent runs any open house, and whether the agent
provides a virtual tour online. The price impact of comparative statements increases slightly
to 1.43% and remains significant at the 1% level.
Much of the agent’s effort is actually unobservable and affected by the contract terms.
For instance, agents with the exclusive right to sell will often have a stronger incentive to exert
effort than other agents do. In Column (4), we control for the listing contract terms,
including the duration of the listing contract, whether the agent has the exclusive right to sell,
and the buyer agent commission rate. After controlling for these contract terms, the price
impact of comparative statements falls to 1.24% and remains significant at the 1% level.
In this section, we check the robustness of our main results by studying the following potential
issues: (i) endogeneity of comparative cheap talk, (ii) endogeneity of days-on-market, (iii)
sample selection bias, and (iv) model misspecification. The results are reported in Table 4.
Comparative cheap talk may be an endogenous decision affected by many factors. If there
are unobservable factors that affect both the usage of comparative statements and the house
sale price, then our previous estimates may be biased. Instrumental variables are usually
used to address endogeneity. However, it is often hard to find a valid instrumental
variable that affects the usage of comparative statements but not the final sale price.
In this subsection, we instead take two alternative approaches to study the impact of
this potential endogeneity. Results from both approaches suggest that our main results are
robust to the endogeneity of usage of comparative statements.
In the first approach, we build and estimate an endogenous switching regression model.
Suppose that the sale price is determined by two different equations for two possible regimes
(i.e., comparative cheap talk and babbling cheap talk), and selection into one regime is
endogenously determined. The model comprises three equations as follows:
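$$\text{Cheaptalk}_i = \begin{cases} 0, & \text{if } \text{Cheaptalk}_i^* = \mu + \lambda' Z_i + \xi_i \leq 0, \\ 1, & \text{if } \text{Cheaptalk}_i^* = \mu + \lambda' Z_i + \xi_i > 0, \end{cases} \qquad (10)$$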
Equation (10) is the selection equation, where Cheaptalki is the indicator variable of com-
parative cheap talk and Cheaptalki∗ is the corresponding latent variable. Zi is a vector of
observables that affect the usage of comparative cheap talk, and is assumed to be the same
as Xi , except that the calendar information in Zi is related to the time when the house is
To estimate the endogenous switching regression model, we need the following distribu-
tional assumptions:
$$\mathrm{Corr}(\epsilon_0, \xi) \neq 0, \qquad (16)$$
$$\mathrm{Corr}(\epsilon_1, \xi) \neq 0, \qquad (17)$$
where Corr stands for correlation coefficient. Equations (16) and (17) suggest that the error
terms of the outcome equations are correlated with the error term in the selection equation.
That is, the selection into comparative cheap talk is endogenous.
We implement the maximum likelihood estimation of the endogenous switching regression
model, (10)-(17). Then for each observation, we estimate the fitted values of the log sale
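$$\widehat{\log(\text{sale price}_{0i})} = \hat{\alpha}_0 + \hat{\gamma}_0' X_i + E[\epsilon_{0i} \mid \text{Cheaptalk}_i = 0], \qquad (18)$$
$$\widehat{\log(\text{sale price}_{1i})} = \hat{\alpha}_1 + \hat{\gamma}_1' X_i + E[\epsilon_{1i} \mid \text{Cheaptalk}_i = 1]. \qquad (19)$$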
Finally, we calculate the average difference in the fitted values of the log sale price across
all observations:
$$\tau = \frac{1}{N} \sum_{i=1}^{N} \left[ \widehat{\log(\text{sale price}_{1i})} - \widehat{\log(\text{sale price}_{0i})} \right]. \qquad (20)$$
The estimate of τ is reported in Column (1) of Table 4, and can be interpreted
as a treatment effect of comparative cheap talk on the sale price. The result from the
endogenous switching regression model is consistent with our main results: comparative
cheap talk is associated with a 0.50% price premium, holding everything else constant, and
the premium is significant at the 1% level.
Adoption of comparative cheap talk may be based on unobservables. While we have con-
trolled for a detailed set of factors in the estimations, it is possible that a small amount of
selection on unobservables could explain much of the estimated effect of comparative cheap
talk. We now explore this possibility by using the relationship between comparative cheap
talk and the observables to make inferences about the relationship between selection on the
observables and selection on the unobservables.
We take the approach provided by Altonji et al. (2005 & 2008). This technique estimates
the relative amount of selection of unobservables required to explain the estimated compar-
ative cheap talk effect if the true effect is zero (i.e., the null hypothesis). This technique
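$$\frac{E(\epsilon \mid \text{Cheaptalk} = 1) - E(\epsilon \mid \text{Cheaptalk} = 0)}{\mathrm{Var}(\epsilon)} = \frac{E(\gamma \cdot X \mid \text{Cheaptalk} = 1) - E(\gamma \cdot X \mid \text{Cheaptalk} = 0)}{\mathrm{Var}(\gamma \cdot X)}. \qquad (21)$$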
The left-hand side of (21) represents selection on unobservables and the right-hand side
represents selection on observables. This condition assumes that the use of comparative
cheap talk relies on unobservables to the same extent as on observables. Note that all terms in
(21) can be estimated from the data, except for E(ε|Cheaptalk = 1) − E(ε|Cheaptalk = 0).
Let ε̃ be the residuals of a regression of Cheaptalk on X so that Cheaptalk = µ ∗ X + ε̃.
Then substituting the last equation into (9), one gets:
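That is, the bias in the estimated comparative cheap talk effect due to selection on
unobservables is:
$$\mathrm{Bias}(\beta) = \frac{\mathrm{Var}(\text{Cheaptalk})}{\mathrm{Var}(\tilde{\epsilon})} \cdot \left[ E(\epsilon \mid \text{Cheaptalk} = 1) - E(\epsilon \mid \text{Cheaptalk} = 0) \right]. \qquad (23)$$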
The fraction in Equation (23) can be estimated directly from the data, and the term in
square brackets can be calculated from (21). As shown in Table 5, following (23), we
estimate the bias, Bias(β), to be 0.0052. Recall that the estimated comparative cheap talk
effect is 0.0080 (Column 6, Table 3). This suggests that the selection on unobservables needs
to be more than 65% of the selection on observables,2 which is very unlikely given that we
have a detailed list of observables. Therefore, we reject the null hypothesis that the effect of
comparative cheap talk on sale price is zero.
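The arithmetic behind (21) and (23) can be sketched as follows. This is a minimal illustration, not the code behind Table 5: the function name, its arguments and the toy inputs are all assumptions. It takes the fitted index γ·X, the Cheaptalk indicator, the variance of the outcome residual ε, and the variance of the residual ε̃ from regressing Cheaptalk on X, and returns the implied bias.

```python
from statistics import mean, pvariance

def implied_bias(index, cheaptalk, var_eps, var_ct_resid):
    """Bias in beta implied by condition (21), computed via equation (23).

    index        : fitted gamma * X for each transaction (from the outcome regression)
    cheaptalk    : 0/1 indicator for each transaction
    var_eps      : variance of the outcome residual epsilon
    var_ct_resid : variance of the residual from regressing Cheaptalk on X
    """
    treated = [g for g, c in zip(index, cheaptalk) if c == 1]
    control = [g for g, c in zip(index, cheaptalk) if c == 0]
    # Right-hand side of (21): standardized selection on observables.
    selection_obs = (mean(treated) - mean(control)) / pvariance(index)
    # Condition (21) then pins down the mean gap in the unobservable epsilon.
    eps_gap = var_eps * selection_obs
    # Equation (23): scale the gap by Var(Cheaptalk) / Var(residual of Cheaptalk on X).
    return pvariance(cheaptalk) / var_ct_resid * eps_gap

if __name__ == "__main__":
    # Toy inputs, chosen only so the arithmetic is easy to follow.
    print(implied_bias([0.0, 2.0, 2.0, 4.0], [0, 0, 1, 1], 0.5, 0.125))
```

The implied bias is then compared with the estimated coefficient on Cheaptalk to judge how strong selection on unobservables would have to be to explain the whole effect.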
The sale price and days-on-market (DOM) of a house may be jointly determined. If
so, our estimates are biased by the endogeneity of the DOM. To address this potential
endogeneity problem, in Column 2 of Table 4 we re-estimate our full specification (as in
Column 6 of Table 3) via Three-Stage Least Squares (3SLS) regressions, with the first stage being
a regression of the log of listing price specified as follows:
where log(listing price) is the natural logarithm of listing price. Z is a vector of observables
that affect the listing price, and is the same as in (10). In particular, the calendar information
in Z is related to the time when the house is listed, but not the time when the house is sold,
as we believe that the listing price depends on the listing time rather than the sale time.
The residual, $\hat{\epsilon}_2 = \log(\text{listing price}) - \widehat{\log(\text{listing price})}$, from this first-stage regression
measures how far the house is listed above its expected listing price, which we think affects
the days on market, but not necessarily the sale price. We then control for $\hat{\epsilon}_2$ in the second-stage
regression of days-on-market specified as follows:
Column (2) of Table 4 reports the results of this 3SLS estimation. The impact of comparative
cheap talk is estimated to be 0.79%, and is significant at the 1% level.
The outcome variable, log(sale price), is only observable for houses that were successfully sold,
which may not be representative of the entire housing market. If the subsample of sold
houses is not a random sample of the entire housing market, our previous estimators are
likely to suffer from sample selection bias.
To correct this potential bias, we employ the Bivariate Sample Selection Model, which
uses the full sample including both sold and unsold houses. The Bivariate Sample Selection
Model comprises a selection equation and an outcome equation. The selection equation is as
follows:
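$$\text{Sold} = \begin{cases} 1, & \text{if } \text{Sold}^* = \lambda' Z + \xi > 0, \\ 0, & \text{if } \text{Sold}^* = \lambda' Z + \xi \leq 0, \end{cases} \qquad (26)$$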
where Sold is the indicator variable of a house being sold. Sold∗ is the corresponding latent
variable. Z is a list of covariates that affect the probability of a house being sold, which we
assume to be the same as in (10).³ ξ is the error term that follows a normal distribution.
The outcome equation is as follows. In particular, the outcome variable log(sale price)
is observable only when Sold = 1.
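$$\log(\text{sale price}) = \begin{cases} \alpha + \beta \cdot \text{Cheaptalk} + \gamma' X + \epsilon, & \text{if Sold} = 1, \\ \text{.}\,, & \text{if Sold} = 0. \end{cases} \qquad (27)$$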
We estimate the Bivariate Sample Selection Model, (26) and (27), by Heckman’s two-step
³ Identification of the Heckman two-step estimator can be achieved without an exclusion restriction. In
particular, exactly the same regressors can appear in the first-step probit regression and the second-step OLS
regression, as long as the first-step probit model discriminates well between sold and unsold homes.
Indeed, in our first-step probit regression, the predicted probabilities of houses
being sold cover a considerable range, from 1.51% to 99.99%.
estimation, sometimes also called the Heckman estimator. Heckman’s two-step estimation
relies on the assumption that
$$\epsilon = \upsilon \cdot \xi + \omega, \qquad (28)$$
where the random variable ω is independent of ξ. Heckman (1979) proves that, under
assumptions (28) and (29), the OLS estimation of the following model using the subsample
of sold houses is consistent and robust to sample selection bias.
The only difference between (30) and (9) is the additional covariate, IMR(λ̂′Z), which
stands for the inverse Mills ratio and is calculated by the following formula:
$$\mathrm{IMR}(\hat{\lambda}' Z) = \frac{\phi(\hat{\lambda}' Z)}{\Phi(\hat{\lambda}' Z)}, \qquad (31)$$
where φ and Φ are the standard normal density and distribution functions, respectively. In
addition, we have υ = Cov(ξ, ε). That is, the coefficient of the inverse Mills ratio is the
covariance of the error terms in the selection and the outcome equations.
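Formula (31) is straightforward to compute once the probit index λ̂′Z is available. The sketch below uses only the Python standard library; the function name and the example index values are assumptions for illustration.

```python
import math

def inverse_mills_ratio(index):
    """IMR from equation (31): phi(index) / Phi(index), where index = lambda_hat' Z."""
    pdf = math.exp(-0.5 * index * index) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))             # standard normal CDF
    return pdf / cdf

# Hypothetical probit index values, from a low to a high predicted sale probability.
for index in (-2.0, 0.0, 2.0):
    print(index, round(inverse_mills_ratio(index), 4))
```

The ratio is decreasing in the index, so the correction term is largest for houses whose predicted probability of being sold is low.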
Therefore, our estimation of (30) in fact takes two steps. The first step is a probit
regression of (26) using the full sample of both sold and unsold homes. The second step is an
OLS regression of (30), using only the subsample of sold homes and controlling for the IMR
calculated from the first step.
The result from Heckman’s two-step estimation is reported in Column (3) of Table
4. Consistent with our main results, comparative cheap talk is associated with a 0.77% price
premium.
All the models we have used so far assume linear impacts of covariates on the dependent
variable. If this assumption is invalid, our previous estimators may be biased due to func-
tional misspecification. To deal with this potential issue, we apply the matching method,
more specifically the propensity score matching (PSM) method.
The matching estimation is obtained by simply comparing outcomes among transactions
that received the treatment (i.e., the treatment group) versus those that did not (i.e., the
comparison group). Using terminology from the matching literature, we define the outcome
as the natural logarithm of the sale price; the treatment group is defined as transactions
where the listing agents used comparative cheap talk; the comparison group is defined as all
the other transactions.
One advantage of matching estimation (compared to regression) is that the key identifying
assumption is weaker: the effect of covariates on the outcome need not be linear, as the
matching method estimates the effect by matching homes with the same covariates instead
of a linear model for the effect of covariates. Matching, however, cannot correct for
bias from unobservable variables. Like regression, matching is based on the assumption that
the source of selection bias is the set of observed covariates. Matching estimators would be
biased if adoption of comparative cheap talk were based on unobservable variables.
Finding matches that are similar with respect to all relevant covariates can be difficult if
the number of covariates is large and the sample is relatively small. Nevertheless, Rosenbaum
and Rubin (1983) prove that matching on the one-dimensional propensity score (which is the
estimated probability of comparative cheap talk) suffices to adjust for the differences in the
observed covariates. Matching on the propensity score is called propensity score matching,
which is the technique we will use for the following estimation. The key estimator is
the Average Treatment effect on the Treated (ATT), which has a similar interpretation to
the coefficient in the OLS: both measure the difference in the sale price between transactions
with comparative cheap talk and the other transactions, everything else being equal.
Matching algorithms differ in how the matched comparison transactions are selected.
In this paper, we focus on kernel matching.4 As in Smith and Todd (2005),
we implement the trimming method to determine the region of common support: we drop the
10 percent of the treatment observations (i.e., transactions with comparative cheap talk) at
which the propensity score density of the comparison observations (other transactions) is
the lowest. The ATTs estimated by kernel matching are reported in Column 4 of Table
4. Consistent with our previous estimates, the PSM results show that comparative cheap
talk is associated with a 0.81% price premium. Although the estimated ATT is statistically
insignificant, it has the same sign and a similar size as the coefficient from the OLS regression
(Column 6 of Table 3).
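To illustrate the idea behind kernel matching, the following is a minimal sketch, not the implementation used for Table 4. It assumes that propensity scores have already been estimated, and it uses an Epanechnikov kernel with a bandwidth of 0.06; both choices, as well as the function name kernel_matching_att, are assumptions made for illustration. The density-based trimming step described above is not included.

```python
import numpy as np

def kernel_matching_att(y, d, pscore, bandwidth=0.06):
    """ATT by kernel matching on the propensity score.

    y: outcomes (e.g., log sale price); d: 1 = treated, 0 = comparison;
    pscore: estimated propensity scores. The Epanechnikov kernel and the
    bandwidth of 0.06 are illustrative assumptions, not the paper's settings.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    pscore = np.asarray(pscore, dtype=float)

    y_t, p_t = y[d == 1], pscore[d == 1]
    y_c, p_c = y[d == 0], pscore[d == 0]

    # Scaled distance between every treated unit (rows) and comparison unit (columns).
    u = (p_t[:, None] - p_c[None, :]) / bandwidth
    # Epanechnikov kernel: positive weight only for |u| < 1.
    w = np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)

    # Treated units with no comparison unit inside the bandwidth are dropped.
    has_match = w.sum(axis=1) > 0
    w = w[has_match]
    # Kernel-weighted average outcome of the comparison units.
    counterfactual = (w @ y_c) / w.sum(axis=1)

    return float(np.mean(y_t[has_match] - counterfactual))
```

Each treated transaction is compared with a weighted average of all comparison transactions whose propensity scores lie within the bandwidth, with closer scores receiving larger weights.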
As noted by Heckman, Ichimura and Todd (1997) and Dehejia and Wahba (1999), the
PSM estimator is only defined in the region of common support. Matching incomparable
observations could cause evaluation biases. Hence, an important further step is to check the
common support of the propensity scores for comparative cheap talk transactions and that
for other transactions.
The most straightforward way to verify common support is a visual analysis of the density
distributions of the propensity scores for both comparative cheap talk transactions and for
the other transactions. Figure 2 displays the propensity score distribution for transactions
with comparative cheap talk (above) and for the other transactions (below). The two
distributions overlap substantially, which suggests that the common support condition is
satisfied in our PSM estimation.
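A visual check can be complemented by a simple numerical one. The sketch below uses a min-max comparison, which is a simpler criterion than the density-based trimming applied in our estimation and is shown only as an illustration. The function name common_support_share is hypothetical.

```python
import numpy as np

def common_support_share(pscore, d):
    """Return the min-max overlap region and the share of treated units inside it."""
    pscore = np.asarray(pscore, dtype=float)
    d = np.asarray(d, dtype=int)
    p_t, p_c = pscore[d == 1], pscore[d == 0]

    # The overlap region runs from the larger of the two minima
    # to the smaller of the two maxima.
    lower = max(p_t.min(), p_c.min())
    upper = min(p_t.max(), p_c.max())

    inside = (p_t >= lower) & (p_t <= upper)
    return lower, upper, float(inside.mean())
```

A share close to one indicates that almost all treated transactions have comparison transactions with propensity scores in the same range.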
4 For the technical details of each matching algorithm, see Imbens (2004), Smith and Todd (2005), and
Caliendo and Kopeinig (2008).
Statements
The basic result that we have identified and tested in the previous two sections is consistent
with the prediction of the comparative cheap talk model, but also with other competing
hypotheses. For instance, homes sold with comparative statements may have unobservable
characteristics that make them sell for high prices. However, this is unlikely to be a valid
explanation, as negative comments are likely to represent undesirable characteristics, which
would make homes sell for low (rather than high) prices.
Besides our basic result, we find additional evidence that is consistent with the comparative
cheap talk explanation but cannot be explained by unobservables. In particular, we find
empirical evidence in support of our Proposition 2, that is, the price premium of comparative
statements is higher when there are more buyers. Moreover, the premium switches sign
and becomes a discount when there are few potential buyers.
We find this evidence along three dimensions: (i) the boom period (2001-
2006) vs. the bust period (2008-2010); (ii) listings with few competing houses vs. those with
many competing houses; and (iii) typical houses vs. atypical houses. All the results are reported in
Table 6 and are consistent with the theoretical prediction of the model, providing strong
empirical support for the theoretical model of comparative cheap talk.
The first dimension is related to the time of the sale. We classify the full sample into the
boom period (2000-2006) and the bust period (2008-2010). Compared to the bust period, the
boom period tends to have a seller’s market, where each listing has more potential buyers.
We reestimate our full specification (as in Column 6 of Table 3) using the two subsamples,
respectively. The results are reported in Columns (1) and (2) of Table 6. Indeed, we find
that the price impact of comparative statements is positive in the boom period, but negative
in the bust period (1.27% vs -2.53%).
The second dimension that we test is about the number of competing houses. For each
house, competing houses are defined as the other houses that are located in the same school
district as the subject house and are actively listed at the time when the subject house
is sold. A house facing more competition from similar houses tends to have fewer potential
buyers. We therefore classify our data into two exclusive and exhaustive subsamples according
to the number of competing houses, and then reestimate the price premium of comparative
statements in these two subsamples. The results are reported in Columns (3) and (4) of
Table 6. As expected, the price impact is positive for houses with fewer competing houses but
negative for houses with many competing houses (2.13% vs -0.51%), though the negative impact
is statistically insignificant.
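The count of competing houses can be sketched as follows. The record layout (the keys 'id', 'district', 'list_date', and 'sale_date') is hypothetical, and treating a listing as active from its list date up to, but not including, its own sale date is an assumption about boundary cases. Listings that are withdrawn or expire without a sale are ignored in this sketch.

```python
from datetime import date

def count_competing_houses(listings):
    """For each listing, count other listings in the same school district
    that were active (listed but not yet sold) on the subject's sale date.

    Each listing is a dict with hypothetical keys: 'id', 'district',
    'list_date', and 'sale_date'.
    """
    counts = {}
    for subject in listings:
        n = 0
        for other in listings:
            if other["id"] == subject["id"]:
                continue  # a house does not compete with itself
            same_district = other["district"] == subject["district"]
            # Assumption: active means listed on or before the subject's sale
            # date and sold strictly after it.
            active = other["list_date"] <= subject["sale_date"] < other["sale_date"]
            if same_district and active:
                n += 1
        counts[subject["id"]] = n
    return counts
```

The pairwise loop is quadratic in the number of listings, which is adequate for an illustration. A large sample would call for grouping by district first.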
The third dimension is along the atypicality index of the house. Following Haurin (1988),
we calculate the atypicality index of each house as follows, assuming that the house is located
in zip code j:
$$I = \sum_{i=1}^{m} \hat{\gamma}_i \, \lvert X_i - \bar{X}_{i,j} \rvert, \tag{32}$$
where m is the number of observable characteristics of the house, |Xi − X̄i,j | is the deviation
of the house’s observed characteristics i from the average level of all the houses in zip code
j, and γ̂i is the estimated coefficient of characteristic Xi in the following model:
$$p = \sum_{i=1}^{m} \gamma_i X_i + \varepsilon, \tag{33}$$
where p is the sale price. Therefore, γ̂i can be interpreted as the implied marginal price of
characteristic i. In sum, the atypicality index of a house is the weighted sum of the absolute
deviations of the house’s characteristics from their average levels in the zip code where the
house is located. The weight of each characteristic is the implied marginal price of that characteristic.
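The calculation of the atypicality index can be sketched as follows. The sketch assumes that the implied marginal prices have already been estimated from the hedonic regression and are passed in as gamma_hat. The function name and the array layout are assumptions made for illustration.

```python
import numpy as np

def atypicality_index(X, zip_codes, gamma_hat):
    """Atypicality index: sum over characteristics of gamma_hat_i times the
    absolute deviation of X_i from its zip-code average.

    X: (n_houses, m) array of characteristics; zip_codes: length-n labels;
    gamma_hat: length-m implied marginal prices from the hedonic regression.
    """
    X = np.asarray(X, dtype=float)
    zip_codes = np.asarray(zip_codes)
    gamma_hat = np.asarray(gamma_hat, dtype=float)

    index = np.empty(X.shape[0])
    for z in np.unique(zip_codes):
        in_zip = zip_codes == z
        zip_mean = X[in_zip].mean(axis=0)        # average characteristics in zip code z
        abs_dev = np.abs(X[in_zip] - zip_mean)   # absolute deviation of each characteristic
        index[in_zip] = abs_dev @ gamma_hat      # weight deviations by marginal prices
    return index
```

For example, with two houses of 1,000 and 2,000 square feet and 2 and 4 bedrooms in the same zip code, and hypothetical marginal prices of 100 per square foot and 5,000 per bedroom, each house deviates from the zip-code average by 500 square feet and 1 bedroom, so each has an index of 100 × 500 + 5,000 × 1 = 55,000.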
The atypicality index is an aggregate measure of how much the house differs from
a typical house in the zip code. A house with a high atypicality index is more likely to have
unusual characteristics, and therefore tends to have fewer potential buyers. We thereby
classify our full data into two exclusive and exhaustive subsamples according to the atypicality
index. We then reestimate the price premium of comparative cheap talk in these two
subsamples. The results are reported in Columns (5) and (6) of Table 6. Indeed, we find that
the price premium of comparative cheap talk is larger for houses with a low atypicality index
than for houses with a high atypicality index (1.06% vs 0.50%).
7 Conclusion
In this paper, we apply the comparative cheap talk model in Chakraborty and Harbaugh
(2010) to real estate and provide empirical evidence that comparative cheap
talk exists in equilibrium and that the impact of comparative cheap talk increases when there
are more potential buyers. This is the first study, to the best of our knowledge, to use
empirical data to test the theoretical comparative cheap talk model. The results highlight the
value of the comparative cheap talk model in a market where assets are heterogeneous, with
various attributes that are hard to quantify, and where transactions rely heavily on
intermediation.
The comparative cheap talk model predicts that when describing characteristics of a listed
house, a listing agent with state-independent preferences can credibly reveal information to
potential buyers by making comparative statements that make the house more appealing in
some dimensions but less appealing in other dimensions. Moreover, the listing agent strictly
benefits from comparative statements if his/her preferences over buyers’ estimates are quasiconvex.
In an auction setting, this quasiconvex preference condition is satisfied if there are enough
buyers. In other words, comparative statements increase the expected house sale price when
there are many (i.e., more than 3) potential buyers, but decrease the price when there are
too few (fewer than 4) buyers. The reason is that comparative statements induce a better
match of the house with the buyer who values it most, but also weaken competition among
potential buyers. When there are many potential buyers, the positive impact from better
matching outweighs the negative impact from weakened competition. When there are few
and exhaustive subsamples according to the atypicality index. A house with a high atypicality
index is likely to have more unusual characteristics, and therefore tends to have fewer potential
buyers. Indeed, we find that the price impact of comparative statements is larger for houses
with a lower atypicality index.
References
Agarwal, Sumit, Jia He, Tien Foo Sing and Changcheng Song. 2019. “Do real estate
agents have information advantages in housing markets?” Journal of Financial Economics,
forthcoming.
Allen, M., Rutherford, J., Rutherford, R., Yavas, A., 2016. “Conflicts of Interest in Resi-
dential Real Estate Transactions: new Evidence,” Florida Gulf Coast University, University
of South Florida, and University of Wisconsin, Madison Unpublished working paper.
Altonji, J., Elder, T., and Taber, C. (2005). “Selection on Observed and Unobserved Vari-
ables: Assessing the Effectiveness of Catholic Schools,” Journal of Political Economy, 113(1),
151-184.
Altonji, J., Elder, T., and Taber, C. (2008). “Using Selection on Observed Variables to
Assess Bias from Unobservables when Evaluating Swan-Ganz Catheterization,” American
Economic Review, 98(2), 345-350.
Bouvard, Matthieu, Pierre Chaigneau and Adolfo De Motta. 2015. “Transparency in the
Financial System: Rollover Risk and Crises,” Journal of Finance, 70(4),1805-1837.
Caliendo, M., and Kopeinig, S. (2008). “Some Practical Guidance for the Implementation
of Propensity Score Matching,” Journal of Economic Surveys, 22(1), 31-72.
Chakraborty, A. and Rick Harbaugh. 2007. “Comparative Cheap Talk,” Journal of Eco-
nomic Theory, 132(1): 70-94.
Chakraborty, A. and Harbaugh, R. (2010). “Persuasion by Cheap Talk,” American Economic
Review, 100(5): 2361-2382.
Crawford, V. P. and Sobel, J. 1982. “Strategic information transmission,” Econometrica,
50(6), 1431-1451.
Dehejia, R.H., and Wahba, S. (1999). “Causal Effects in Nonexperimental Studies: Reevalu-
ating the Evaluation of Training Programs,” Journal of the American Statistical Association,
94(448), 1053-1062.
Garmaise, M. J., and T. J. Moskowitz. 2004. “Confronting Information Asymmetries:
161.
Imbens, G.W. (2004). “Nonparametric Estimation of Average Treatment Effects Under
Exogeneity: A Review,” Review of Economics and Statistics, 86(1), 4-29.
Inderst, Roman and Marco Ottaviani. 2009. “Misselling through Agents,” American Eco-
nomic Review, 99(3): 883-908.
Kurlat, Pablo, and Johannes Stroebel. 2015. “Testing for Information Asymmetries in Real
Estate Markets,” Review of Financial Studies, 28(8): 2429-2461.
Levitt, S. and C. Syverson. 2008. “Market Distortions When Agents are Better Informed:
The Value of Information in Real Estate Transactions,” Review of Economics and Statistics,
90(4): 599-611.
Malenko, Andrey, and Anton Tsoy. 2019. “Selling to Advised Buyers,” American Economic
Review, 109 (4): 1323-48.
Menezes, F.M. and P.K. Monteiro. 2005. “An Introduction to Auction Theory,” Oxford
University Press, Oxford, UK.
Ottaviani, Marco and Peter Norman Sørensen. 2006. “Reputational Cheap Talk,” RAND
Journal of Economics, 37(1): 155-175.
Rutherford, R.C., Springer, T.M. and Yavas, A. (2005). “Conflicts between principals and
agents: evidence from residential brokerage,” Journal of Financial Economics, 76, 627-665.
Smith, J. and Todd, P. (2005). “Does Matching Overcome LaLonde’s Critique of Nonexperimental
Estimators?” Journal of Econometrics, 125(1-2), 305-353.
Note: This table lists the positive and the negative words in the agent’s remarks of each listing in
the Multiple Listing Service (MLS), as well as the number of listings in which each word appears. We
use the same dictionary as Haag, Rutherford and Thomson (2000), and Goodwin, Waller
and Weeks (2014 & 2018). The word “Cute” is omitted because it is inconsistently classified as
a positive or a negative word in the literature.
Note: * significant at the 10% level, ** significant at the 5% level, *** significant at the 1% level.
Table 3. Estimated Impact of Persuasive Cheap Talk on Sale Price– Continued from previous page
Note: * significant at the 10% level, ** significant at the 5% level, *** significant at the 1% level. Each
column in Table 3 represents a separate regression, with the specification gradually saturated from left to
right as the set of control variables expands.
Note: * significant at the 10% level, ** significant at the 5% level, *** significant at the 1% level.
The propensity score matching uses the kernel matching algorithm.
Note: Each column is from a separate regression using different subsamples. * significant at the 10% level, ** significant at the 5%
level, *** significant at the 1% level.
Figure 2: The propensity score distributions of comparative cheap talk (treated) and babbling
cheap talk (untreated). Horizontal axis: propensity score, from 0 to 1.