DDMA03 PricingAnalytics Model
- R code format
Inseong Song, SNU Business School
Matrix Representation of Regression Model
$$y = X\beta + \varepsilon, \quad \text{where } \varepsilon \sim N(0, \sigma^2 I)$$
i=1,2,3,…,n (observations)
j=1,2,3,…,k (variables)
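Example data matrix (n = 4 observations, k = 3 variables):
$$X = \begin{pmatrix} 10 & 5 & 5 \\ 5 & 3 & 3 \\ 15 & 6 & 5 \\ 10 & 6 & 3 \end{pmatrix}$$
– Mean vector:
$$\bar{x} = \frac{1}{n} X'\mathbf{1}$$
– Sums of squares and cross-products about the mean:
$$S = X'X - \frac{1}{n} X'\mathbf{1}\mathbf{1}'X = X_d'X_d$$
where $X_d$ is the deviation (mean-centered) data matrix.
– Covariance matrix:
$$C = S/(n-1) = X_d'X_d/(n-1)$$
– Correlation matrix:
$$R = X_s'X_s/(n-1) = D^{-1/2} C D^{-1/2}$$
where $X_s$ is the standardized data matrix and $D$ is the diagonal matrix of the variances (the diagonal of $C$).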
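The matrix formulas above can be checked numerically on the 4 × 3 example. This is a minimal sketch assuming numpy. The rows of `X` are taken as observations, and the variable names are illustrative.

```python
import numpy as np

# Example data matrix: n = 4 observations (rows), k = 3 variables (columns)
X = np.array([[10, 5, 5],
              [5, 3, 3],
              [15, 6, 5],
              [10, 6, 3]], dtype=float)
n = X.shape[0]
ones = np.ones(n)

xbar = X.T @ ones / n                                # mean vector (1/n) X'1
S = X.T @ X - np.outer(X.T @ ones, ones @ X) / n     # X'X - (1/n) X'1 1'X
Xd = X - xbar                                        # deviation (centered) matrix
C = S / (n - 1)                                      # covariance matrix

D_inv_sqrt = np.diag(1 / np.sqrt(np.diag(C)))        # D^{-1/2}
R = D_inv_sqrt @ C @ D_inv_sqrt                      # correlation matrix

Xs = Xd / np.sqrt(np.diag(C))                        # standardized data matrix
R_std = Xs.T @ Xs / (n - 1)                          # same R via X_s'X_s/(n-1)

print(xbar)
print(np.round(C, 4))
print(np.round(R, 4))
```

The column means come out as (10, 5, 4). The correlation between the first two columns is 15/√300 ≈ 0.866.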
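• Matrix notation
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
$$y = X\beta + \varepsilon, \quad \text{where } \varepsilon \sim N(0, \sigma^2 I)$$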
• Estimation: OLS (Ordinary Least Squares) finds the coefficient vector that minimizes the sum of squared residuals
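$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i'\beta)^2 = (X'X)^{-1}X'y$$
$$\text{Var}(\hat{\beta}) = \text{Var}\left((X'X)^{-1}X'y\right) = (X'X)^{-1}X'\,\text{Var}(y)\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$$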
• Standard errors of the OLS estimator: square roots of the diagonal elements of the estimated variance matrix $\hat{\sigma}^2 (X'X)^{-1}$, where $\hat{\sigma}^2 = \text{SSE}/(n - k - 1)$ estimates $\sigma^2$
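The estimator and its standard errors can be written out as a short sketch. It assumes numpy and simulated, hypothetical data, and `X` includes a column of ones for the intercept. `np.linalg.inv` is used only to mirror the formula. In practice `np.linalg.lstsq` or `np.linalg.solve` is numerically preferable.

```python
import numpy as np

def ols(X, y):
    """OLS fit; X must already contain a column of ones for the intercept."""
    n, p = X.shape                      # p = k + 1 parameters
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y               # (X'X)^{-1} X'y
    resid = y - X @ b
    sse = resid @ resid
    s2 = sse / (n - p)                  # sigma^2 estimate: SSE / (n - k - 1)
    var_b = s2 * XtX_inv                # estimated variance matrix
    se = np.sqrt(np.diag(var_b))        # standard errors
    return b, se, s2

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
beta_true = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients
y = X @ beta_true + rng.normal(scale=1.0, size=n)

b, se, s2 = ols(X, y)
print("b  =", np.round(b, 3))
print("se =", np.round(se, 3))
```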
• Intercept ($\beta_0$): the expected value of y when all x's are equal to zero
• Coefficients (impact of $x_j$ on $y$: $\beta_j$): other things being equal, when $x_j$ changes by 1 unit, $y$ is expected to change by $\beta_j$
– So be careful about the measurement units of the variables: the value of $\beta_j$ changes as you change the measurement unit of $y$ or $x_j$
– Marginal effect: $\beta_j$ captures $\partial y / \partial x_j$
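The point about measurement units can be seen directly. The sketch below assumes numpy and uses hypothetical simulated data. Measuring the same prices in cents instead of dollars leaves the fit unchanged but divides the slope by 100.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
price_dollars = rng.uniform(1, 5, size=n)             # hypothetical price in dollars
y = 10 - 1.5 * price_dollars + rng.normal(size=n)     # hypothetical outcome

def slope(x, y):
    """Slope from an OLS fit of y on an intercept and x."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

b_dollars = slope(price_dollars, y)
b_cents = slope(100 * price_dollars, y)   # same prices, measured in cents
print(b_dollars, b_cents)                 # b_cents equals b_dollars / 100
```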
• Point prediction: $E[y_0] = x_0'\beta$, so the prediction is $\hat{y}_0 = x_0'\hat{\beta}$, and $E[\hat{y}_0] = x_0'\beta = E[y_0]$
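– Two sources of error in the prediction: (1) use of the estimated $\hat{\beta}$ instead of the true $\beta$, and (2) $\varepsilon_0$, the natural variation in $y$ unexplained by $X$
$$\text{var}(y_0 - \hat{y}_0) = \text{var}(x_0'\hat{\beta}) + \text{var}(\varepsilon_0) = \sigma^2 x_0'(X'X)^{-1}x_0 + \sigma^2$$
$$\widehat{\text{var}}(y_0 - \hat{y}_0) = \hat{\sigma}^2\left(1 + x_0'(X'X)^{-1}x_0\right), \quad \text{and thus} \quad \text{se}(\hat{y}_0) = \sqrt{\widehat{\text{var}}(y_0 - \hat{y}_0)}$$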
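The prediction formulas can be sketched with numpy on hypothetical simulated data. The new point `x0` (with a leading 1 for the intercept) is an illustrative choice. The forecast standard error combines both error sources, so it always exceeds $\hat{\sigma}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 3.0 + 0.8 * x + rng.normal(scale=2.0, size=n)   # hypothetical data

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
s2 = resid @ resid / (n - X.shape[1])              # sigma^2 estimate

x0 = np.array([1.0, 5.0])                          # new observation (intercept term first)
y0_hat = x0 @ b                                    # point prediction x0'b
var_pred = s2 * (1 + x0 @ XtX_inv @ x0)            # s^2 (1 + x0'(X'X)^{-1} x0)
se_pred = np.sqrt(var_pred)                        # standard error of the forecast
print(y0_hat, se_pred)
```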
• Independent variables
– D1, D2: product color (D1 = 1 for yellow, D2 = 1 for green; blue is the baseline)
– M: product quality level
• N = 100
Correlation matrix (lower triangle)
                Log(Sales)  Log(Price)  Yellow Dummy  Green Dummy
Log(Sales)        1
Log(Price)       -0.2964      1
Yellow Dummy      0.0551      0.1884      1
Green Dummy       0.1348     -0.1700     -0.6205        1
Call:
lm(formula = logQ ~ logPr + dummy1 + dummy2 + quality)
Residuals:
Min 1Q Median 3Q Max
-6.147 -1.893 0.177 1.608 6.169
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0364 0.9020 4.475 2.12e-05 ***
logPr -2.6832 0.7265 -3.693 0.00037 ***
dummy1 1.7332 0.7212 2.403 0.01820 *
dummy2 1.4027 0.6704 2.092 0.03906 *
quality 1.3682 0.7339 1.864 0.06537 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
• Interpretation of coefficients
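Here is one standard way to read the Model 1 estimates, offered as a sketch and not as the slide's own worked answer. Because the model is log-log, the coefficient on logPr is a price elasticity. A dummy coefficient d in a log-sales model implies a sales ratio of exp(d) relative to the blue baseline, holding the other regressors fixed. This reading ignores the retransformation issue for the mean of Q. The numbers are copied from the output above, and the variable names are illustrative.

```python
import math

# Estimates copied from the Model 1 output (logQ ~ logPr + dummy1 + dummy2 + quality)
b_logpr = -2.6832   # price elasticity of sales
b_yellow = 1.7332   # dummy1: yellow vs. blue baseline
b_green = 1.4027    # dummy2: green vs. blue baseline

approx_pct_change_q = b_logpr * 1.0    # approx. % change in Q for a 1% price increase
exact_ratio = 1.01 ** b_logpr          # Q_new / Q_old for a 1% price increase
yellow_vs_blue = math.exp(b_yellow)    # sales ratio, yellow vs. blue, other things equal
green_vs_blue = math.exp(b_green)      # sales ratio, green vs. blue, other things equal
print(approx_pct_change_q, exact_ratio, yellow_vs_blue, green_vs_blue)
```

So a 1% price increase lowers sales by roughly 2.6 to 2.7%. Yellow and green products sell roughly 5.7 and 4.1 times as much as blue ones, other things equal.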
• Independent variables are highly correlated with one another (e.g., the correlation between Log(Price) and the quality index is 0.5042).
• Consequence: it is difficult to separate the “Price” effect from the “Quality” effect, which leads to high estimated standard errors for one or more slope coefficients.
Typical evidence: the model as a whole is quite significant, while individual slope parameters are not (here, quality is only marginally significant, p = 0.065).
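The standard variance formula is $\text{var}(b_j) = \sigma^2 / \left(SST_j (1 - R_j^2)\right)$, where $R_j^2$ comes from regressing $x_j$ on the other regressors. It shows how correlated regressors inflate standard errors. The simulation below is a minimal sketch assuming numpy and hypothetical data. It reuses the same random draws for every correlation level, so only the correlation between x1 and x2 changes.

```python
import numpy as np

def slope_se(rho, n=500, seed=0):
    """Standard error of the coefficient on x1 when corr(x1, x2) is about rho."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)   # hypothetical true model
    X = np.column_stack([np.ones(n), x1, x2])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - X.shape[1])
    return np.sqrt(s2 * XtX_inv[1, 1])

for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, round(slope_se(rho), 4))
```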
b) drop a variable.
Call:
lm(formula = logQ ~ logPr + dummy1 + dummy2)
Residuals:
Min 1Q Median 3Q Max
-5.8695 -2.0233 0.0904 1.6674 6.9931
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.8175 0.8091 5.954 4.27e-08 ***
logPr -1.9868 0.6311 -3.148 0.00219 **
dummy1 1.5947 0.7266 2.195 0.03059 *
dummy2 1.3935 0.6789 2.053 0.04284 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
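Model 1: $\log Q_i = \beta_0 + \beta_1 \log P_i + \beta_2 D_{i1} + \beta_3 D_{i2} + \beta_4 M_i + \varepsilon_i$ ($\hat{\beta}_1 = -2.6832$)
Model 2: $\log Q_i = \beta_0 + \beta_1 \log P_i + \beta_2 D_{i1} + \beta_3 D_{i2} + \varepsilon_i$ ($\hat{\beta}_1 = -1.9868$)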
• The effect of log(Price) is understated (in absolute value) when the Quality variable is omitted.
If X2 belongs in the true model but is omitted, and X1 and X2 are correlated, b1 will be biased, with the direction of the bias depending on the sign of the correlation between X1 and X2 and the sign of β2.
Direction of bias in b1
                                          Sign of β2
Sign of correlation between X1 and X2     Positive     Negative
  Positive                                Upward       Downward
  Negative                                Downward     Upward
• If $X_1'X_2 = 0$ (that is, X1 and X2 are uncorrelated), then $E(b_1) = \beta_1$. That is, omitting an uncorrelated variable would not result in biased estimates.
• If not, the bias is $(X_1'X_1)^{-1}X_1'X_2\beta_2$, and $\text{sign(bias)} = \text{sign}(X_1'X_2) \cdot \text{sign}(\beta_2)$. So omitting a correlated variable results in biased estimates, which is why dropping a variable is not a good remedy for multicollinearity.
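For OLS estimates, the omitted-variable formula holds exactly in any sample. The short-regression coefficients equal the long-regression ones plus $(X_1'X_1)^{-1}X_1'X_2 b_2$. The sketch below assumes numpy and hypothetical simulated data, with a positive correlation between X1 and X2 and a positive β2. According to the table, this should bias b1 upward.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)                   # X1, X2 positively correlated
y = 2.0 - 1.0 * x1 + 1.5 * x2 + rng.normal(size=n)   # hypothetical true model, beta2 > 0

X1 = np.column_stack([np.ones(n), x1])               # included regressors (with intercept)
X_long = np.column_stack([X1, x2])

b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]   # [b0, b1, b2]
b_short = np.linalg.lstsq(X1, y, rcond=None)[0]      # omits x2

# Omitted-variable formula: b_short = b_long[:2] + (X1'X1)^{-1} X1'X2 * b2
delta = np.linalg.solve(X1.T @ X1, X1.T @ x2)
bias_term = delta * b_long[2]
print("short b1:", b_short[1], " long b1:", b_long[1], " bias:", bias_term[1])
```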
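For the model $Y = \alpha + \beta X + \varepsilon$:
$$\text{Cov}(Y, X) = \text{Cov}(\alpha + \beta X + \varepsilon, X) = \text{Cov}(\beta X + \varepsilon, X)$$
$$= \beta\,\text{Cov}(X, X) + \text{Cov}(\varepsilon, X) = \beta\,\text{Var}(X) + \text{Cov}(\varepsilon, X)$$
Then
$$\hat{\beta}_{OLS} = \text{Cov}(Y, X)/\text{Var}(X) = \beta + \text{Cov}(\varepsilon, X)/\text{Var}(X)$$
so OLS is biased whenever X is correlated with the error term.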
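The same decomposition holds exactly for sample covariances. The sketch below assumes numpy and hypothetical simulated data. A common shock moves both X and the error term, so the OLS slope equals β plus the sample analogue of Cov(ε, X)/Var(X).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
alpha, beta = 5.0, -2.0                    # hypothetical true parameters
common = rng.normal(size=n)                # shock that moves both X and the error
x = rng.normal(size=n) + 0.8 * common
eps = rng.normal(size=n) + 0.8 * common    # Cov(eps, X) > 0: X is endogenous
y = alpha + beta * x + eps

b_ols = np.cov(y, x)[0, 1] / np.var(x, ddof=1)   # OLS slope = Cov(Y, X) / Var(X)
bias = np.cov(eps, x)[0, 1] / np.var(x, ddof=1)  # Cov(eps, X) / Var(X)
print(b_ols, beta + bias)                        # identical up to rounding
```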
• Suppose we have data from two stores, store 1 and store 2. Store 1 is a large store and store 2 is a small store. Suppose the price sensitivity is identical for the two stores, i.e., $Y_1 = \alpha_1 + \beta X_1 + \varepsilon_1$ and $Y_2 = \alpha_2 + \beta X_2 + \varepsilon_2$
• They differ only in the intercept (the baseline market size). We also assume that price is uncorrelated with the error term. Now suppose the researcher pools the data into one data set and uses the following model: $Y = \alpha + \beta X + u$
• Note that the true model should be $Y = \alpha + \beta X + \gamma D + \varepsilon$, where D is the dummy variable for store 1 (D = 1 if store = 1, D = 0 if store = 2). So $u = \gamma D + \varepsilon$, with $\alpha = \alpha_2$ and $\gamma = \alpha_1 - \alpha_2$. That is, the researcher is ignoring the dummy variable, which is correlated with price (price is naturally higher in the large store; see the illustration). This means that price is correlated with $u$ in the misspecified model, so the estimated $\beta$ would be biased.
• Marketers will price so that MR=MC in each store. (Marketers want to maximize
profit.) This would result in the following pattern in the data.
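[Figure: curves labeled Q1 and Q2 with marginal revenue curves MR1 and MR2; a fluctuating MC (the source of exogenous price variation); and the estimated regression line through the pooled data.]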
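The two-store example can be simulated with numpy, treating Y as quantity and X as price. All parameter values are hypothetical, and prices are simply drawn around a higher mean in the large store instead of being solved from MR = MC. The pooled regression without the store dummy picks up the between-store pattern and can even give a positive price slope. Adding D recovers β.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 200                                    # observations per store (hypothetical)
beta = -2.0                                # common price sensitivity
alpha1, alpha2 = 100.0, 50.0               # large store has the larger baseline
p1 = 12.0 + rng.normal(size=m)             # price higher in the large store;
p2 = 8.0 + rng.normal(size=m)              # the noise stands in for fluctuating MC
q1 = alpha1 + beta * p1 + rng.normal(size=m)
q2 = alpha2 + beta * p2 + rng.normal(size=m)

P = np.concatenate([p1, p2])
Q = np.concatenate([q1, q2])
D = np.concatenate([np.ones(m), np.zeros(m)])        # D = 1 for store 1

X_pooled = np.column_stack([np.ones(2 * m), P])      # Q = a + bP + u (misspecified)
X_dummy = np.column_stack([np.ones(2 * m), P, D])    # Q = a + bP + gD + e

b_pooled = np.linalg.lstsq(X_pooled, Q, rcond=None)[0][1]
b_dummy = np.linalg.lstsq(X_dummy, Q, rcond=None)[0][1]
print(b_pooled, b_dummy)
```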
- Probably the best solution would be to identify all the omitted variables correlated with X and include them in the model. But such a solution may not be available in all cases.
- Second-best solution: use econometric methods such as instrumental variables (IV) and two-stage least squares (2SLS)
• Use an instrumental variable (Z): correlated with X but not with the error term, i.e., $\text{cov}(X, Z) \neq 0$ and $\text{cov}(Z, \varepsilon) = 0$
• In the context of the demand function, Z should not be a demand shifter, but Z should be correlated with price.
• Marketers take such terms (demand shocks in the error term) into account when setting the price, so price may be correlated with the error term.
- Manufacturing cost: correlated with price, but uncorrelated with the demand shock
- Lagged price: correlated with today's price (probably through cost factors), but uncorrelated with today's demand shock
- Price in other markets: correlated with the price in the focal market, but uncorrelated with the demand shock in the focal market
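2SLS with manufacturing cost as the instrument can be sketched in numpy on hypothetical simulated data. The firm raises price when it sees a positive demand shock, so OLS is biased toward zero. Stage 1 regresses price on the instrument. Stage 2 regresses quantity on the fitted price. With one instrument, the result equals the simple IV ratio Cov(Z, Q)/Cov(Z, P). Standard errors from a manual second stage are not valid 2SLS standard errors, so only the point estimates are shown.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
beta = -2.0                                  # hypothetical true price coefficient
cost = rng.normal(size=n)                    # instrument Z: shifts price, not demand
shock = rng.normal(size=n)                   # demand shock (the error term)
price = 10.0 + 1.0 * cost + 0.8 * shock + rng.normal(scale=0.5, size=n)  # firm reacts to shock
q = 50.0 + beta * price + shock

X = np.column_stack([np.ones(n), price])
Z = np.column_stack([np.ones(n), cost])

b_ols = np.linalg.lstsq(X, q, rcond=None)[0]           # biased: price correlated with shock

# Stage 1: regress price on the instrument and keep the fitted values
price_hat = Z @ np.linalg.lstsq(Z, price, rcond=None)[0]
# Stage 2: regress quantity on the fitted price
X_hat = np.column_stack([np.ones(n), price_hat])
b_2sls = np.linalg.lstsq(X_hat, q, rcond=None)[0]

# With one instrument, 2SLS equals the IV ratio Cov(Z, Q) / Cov(Z, P)
b_iv = np.cov(cost, q)[0, 1] / np.cov(cost, price)[0, 1]
print(b_ols[1], b_2sls[1], b_iv)
```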