
DDMA03 PricingAnalytics Model

Marketing analytics lecture note


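Demand Function Estimation Using Regression Analysis

• Demand function: $Q = f(X, \varepsilon \mid \theta)$, where $X$ = {Price, other demand shifters} and $\theta$ denotes the parameters

- Model the functional form of the demand function so that it reflects market demand

- Transform the modeled demand function into a linear function (e.g., by taking logs)

• Linear regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \varepsilon$

- Estimate the regression coefficients from data: $\hat\beta_0, \hat\beta_1, \hat\beta_2, \cdots$

- R code: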

regout = lm(y ~ x1+x2+x3)
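A minimal sketch of the log-transformation step, written in Python/NumPy rather than R so that the least-squares step is explicit. It assumes a hypothetical constant-elasticity demand $Q = A P^{\beta} e^{\varepsilon}$, which becomes linear after taking logs, $\log Q = \log A + \beta \log P + \varepsilon$; the values A = 50, β = −2 and the noise level are illustrative, not taken from the lecture.

```python
import numpy as np

def fit_log_log_demand(price, quantity):
    """OLS of log(quantity) on log(price); returns (intercept, price coefficient)."""
    X = np.column_stack([np.ones_like(price), np.log(price)])  # intercept + log price
    y = np.log(quantity)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Hypothetical data: Q = A * P**beta * exp(eps) with A = 50 and beta = -2.
rng = np.random.default_rng(0)
price = rng.uniform(1.0, 10.0, size=5_000)
quantity = 50.0 * price ** -2.0 * np.exp(rng.normal(0.0, 0.3, size=price.size))

log_a, beta_hat = fit_log_log_demand(price, quantity)
print(np.exp(log_a), beta_hat)  # close to 50 and -2
```

In a log-log model like this one, the price coefficient is read directly as the price elasticity of demand.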

Inseong Song, SNU Business School


Regression Analysis
• Explain the variation in the dependent variable as a function of the variation in
the independent variables.
• Basic Model
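$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i = \sum_{j=0}^{k} \beta_j x_{ij} + \varepsilon_i = x_i'\beta + \varepsilon_i$ (with $x_{i0} = 1$)
$x_i' = [1\ \ x_{i1}\ \ x_{i2}\ \dots\ x_{ik}]$ : regressors
$\beta = [\beta_0, \beta_1, \dots, \beta_k]'$ : parameters
• Regression is on conditional expectation
$E[y_i \mid x_i] = x_i'\beta$
• Why error term? ($y_i = E[y_i \mid x_i] + \varepsilon_i$)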
– Effect of misspecification (that is, the effect of omitted variables)
– Random behaviors
– Measurement errors



• Assumptions
1. The expectation of the error is zero
2. The variance of the error is the same for all observations
(homoscedasticity)
3. Errors of different observations are not correlated with each other (No
serial correlation, or no autocorrelation)
4. The error term is not correlated with any of the independent variables.
(critical assumption)
• Implication
– Assumption 1 is always satisfied as long as the model has an intercept (or we use demeaned data), because a nonzero error mean is absorbed into the intercept.
– Assumptions 2 and 3 are less critical. If they are violated, OLS is still unbiased but inefficient (that is, it has larger standard errors), and the usual standard error formula is no longer valid.
– Assumption 4 is critical. If it is violated, the OLS estimates are biased.
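To illustrate why assumption 4 matters more than assumption 2, the Monte Carlo sketch below (Python/NumPy, hypothetical data-generating process) averages the OLS slope over many samples in two cases: heteroscedastic errors (assumption 2 violated) and errors that share a component with x (assumption 4 violated). Under these assumptions the first average should stay near the true slope of 2, while the second drifts to about 2 + 0.8 = 2.8.

```python
import numpy as np

def slope_estimates(n_reps=2_000, n=200, seed=1):
    """Average OLS slope when (a) errors are heteroscedastic and (b) errors are correlated with x."""
    rng = np.random.default_rng(seed)
    true_beta = 2.0
    hetero, endog = [], []
    for _ in range(n_reps):
        x = rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        # (a) Assumption 2 violated: error spread grows with |x|, but E[e | x] = 0 still holds.
        e_het = rng.normal(size=n) * (0.5 + np.abs(x))
        # (b) Assumption 4 violated: the error contains 0.8 * x, so cov(x, e) != 0.
        e_end = 0.8 * x + rng.normal(size=n)
        for e, results in ((e_het, hetero), (e_end, endog)):
            y = 1.0 + true_beta * x + e
            results.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return np.mean(hetero), np.mean(endog)

avg_het, avg_endog = slope_estimates()
print(avg_het, avg_endog)  # about 2.0 (unbiased) vs about 2.8 (biased)
```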



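• How to estimate? Find the values of the β's that minimize the sum of squared errors
$e_i = y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik}$
• Sum of squared errors (SSE)
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik})^2$

• Minimize $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik})^2$ w.r.t. $\beta = [\beta_0, \beta_1, \dots, \beta_k]'$
• SSE is a convex quadratic function of β: a minimum exists
• How to find the minimum? Solve the first order condition (FOC) $\partial SSE / \partial \beta = 0$

[Figure: SSE plotted against β, with the minimum at the point where $\partial SSE / \partial \beta = 0$]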
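For a single regressor the FOC can be solved in closed form. The sketch below (hypothetical five-point data set) computes that solution, checks that both partial derivatives of SSE vanish there, and confirms that nudging either coefficient raises SSE, as expected for a convex quadratic.

```python
import numpy as np

def sse(b0, b1, x, y):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    e = y - b0 - b1 * x
    return float(e @ e)

def foc_solution(x, y):
    """Solve dSSE/db0 = 0 and dSSE/db1 = 0 for simple regression."""
    xd, yd = x - x.mean(), y - y.mean()
    b1 = (xd @ yd) / (xd @ xd)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # hypothetical observations
b0, b1 = foc_solution(x, y)                 # b0 = 0.05, b1 = 1.99

# Gradient of SSE at the FOC solution: both partial derivatives should be ~0.
e = y - b0 - b1 * x
grad = np.array([-2 * e.sum(), -2 * (x @ e)])

# Any small move away from (b0, b1) raises SSE.
base = sse(b0, b1, x, y)
worse = [sse(b0 + d0, b1 + d1, x, y) for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]]
print(b0, b1, grad, base, worse)
```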
Matrix Representation of Regression Model

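$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, \dots, n$

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 x_{11} + \cdots + \beta_k x_{1k} \\ \vdots \\ \beta_0 + \beta_1 x_{n1} + \cdots + \beta_k x_{nk} \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

$$\underbrace{\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}}_{y} = \underbrace{\begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} \beta_0 \\ \vdots \\ \beta_k \end{bmatrix}}_{\beta} + \underbrace{\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_{\varepsilon}$$

$y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 I)$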



Data Matrix

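Let X be an n by k matrix ($X_{n \times k}$) that contains the observations

$X_{n \times k} = [x_1, x_2, \cdots, x_k] = [x_{ij}]$

i = 1, 2, 3, …, n (observations)

j = 1, 2, 3, …, k (variables)

ex. 3 variables (k = 3) and 4 observations (n = 4)

$$X = \begin{bmatrix} 10 & 5 & 5 \\ 5 & 3 & 3 \\ 15 & 6 & 5 \\ 10 & 6 & 3 \end{bmatrix}$$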



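• Centroid (that is, the mean of each variable):

$\bar{x} = \frac{1}{n} X' \mathbf{1}_n$

where $\mathbf{1}_n = [1, 1, \dots, 1]'$ is called the sum vector

• Mean-corrected (or demeaned) data matrix:

$X_d = X - \mathbf{1}_n \bar{x}'$

• Standardized data matrix:

$X_s = X_d D^{-1/2}$

where $D = \mathrm{diag}(s_j^2)$, with the jth element given by $s_j^2 = \frac{1}{n-1} x_{dj}' x_{dj}$ (the variance of variable j)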
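Applied to the 4 × 3 example matrix from the Data Matrix slide, the definitions above can be computed directly. This NumPy sketch only illustrates the formulas; `ones` plays the role of the sum vector $\mathbf{1}_n$.

```python
import numpy as np

# Example data matrix: n = 4 observations, k = 3 variables.
X = np.array([[10, 5, 5],
              [5, 3, 3],
              [15, 6, 5],
              [10, 6, 3]], dtype=float)
n = X.shape[0]
ones = np.ones(n)                               # sum vector 1_n

centroid = X.T @ ones / n                       # x_bar = (1/n) X' 1_n  -> (10, 5, 4)
X_d = X - np.outer(ones, centroid)              # X_d = X - 1_n x_bar'
s2 = (X_d * X_d).sum(axis=0) / (n - 1)          # s_j^2 = x_dj' x_dj / (n - 1) -> (50/3, 2, 4/3)
X_s = X_d @ np.diag(1.0 / np.sqrt(s2))          # X_s = X_d D^{-1/2}
print(centroid, s2)
print(X_s)
```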



• Cross Product Matrices:

– Raw SSCP (sum of squares and cross products) matrix:

$X'X$

– Mean-corrected SSCP matrix:

$S = X'X - \frac{1}{n}(X'\mathbf{1}_n)(\mathbf{1}_n'X) = X_d' X_d$

– Covariance matrix:
$\Sigma = S/(n-1) = X_d' X_d/(n-1)$

– Correlation matrix:
$R = X_s' X_s/(n-1) = D^{-1/2}\, \Sigma\, D^{-1/2}$

• Covariance of Linear Composites

For $z = Xa$, $\sigma_z^2 = z_d' z_d/(n-1) = a' \Sigma a$
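Continuing with the same 4 × 3 example matrix, the sketch below builds each cross-product matrix from its definition and checks the identities numerically. The weight vector `a` for the linear composite is a hypothetical choice.

```python
import numpy as np

X = np.array([[10, 5, 5], [5, 3, 3], [15, 6, 5], [10, 6, 3]], dtype=float)
n = X.shape[0]
ones = np.ones(n)

raw_sscp = X.T @ X                                        # raw SSCP: X'X
S = X.T @ X - np.outer(X.T @ ones, ones @ X) / n          # X'X - (1/n)(X'1)(1'X)
X_d = X - X.mean(axis=0)
assert np.allclose(S, X_d.T @ X_d)                        # equals X_d' X_d

Sigma = S / (n - 1)                                       # covariance matrix
d = 1.0 / np.sqrt(np.diag(Sigma))
R = Sigma * np.outer(d, d)                                # D^{-1/2} Sigma D^{-1/2}

# Linear composite z = X a with a hypothetical weight vector a.
a = np.array([1.0, -1.0, 0.5])
z = X @ a
var_z = (z - z.mean()) @ (z - z.mean()) / (n - 1)
assert np.isclose(var_z, a @ Sigma @ a)                   # var(z) = a' Sigma a
print(S, R, var_z, sep="\n")
```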



Back to Regression Model

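• Matrix notation
$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
$y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 I)$
• Estimation: OLS (Ordinary Least Squares) finds the b that minimizes the sum of squared residuals
$\hat\beta = \arg\min_b \sum_i (y_i - x_i' b)^2$

– Sum of squared residuals (SSE: sum of squared errors)

$e'e = (y - Xb)'(y - Xb)$
– The FOC (first order condition)
$\frac{\partial\, e'e}{\partial b} = 2X'Xb - 2X'y = 0$
$\hat\beta = (X'X)^{-1}X'y$
– Another interpretation: the residuals are uncorrelated with X
$0 = X'e = X'(y - Xb) \Rightarrow \hat\beta = (X'X)^{-1}X'y$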
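A minimal NumPy sketch of the OLS formula on simulated (hypothetical) data. It solves the normal equations $X'Xb = X'y$ with `np.linalg.solve` rather than forming $(X'X)^{-1}$ explicitly, which is the usual numerically safer choice, and then checks the second interpretation, $X'e = 0$.

```python
import numpy as np

def ols(X, y):
    """OLS via the normal equations X'X b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(2)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])        # first column of ones = intercept
beta_true = np.array([1.0, 0.5, -2.0])           # hypothetical parameters
y = X @ beta_true + rng.normal(scale=0.5, size=n)

b = ols(X, y)
e = y - X @ b
print(b)                          # close to [1.0, 0.5, -2.0]
print(np.abs(X.T @ e).max())      # ~0: residuals are orthogonal to every column of X
```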



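• The Variance of OLS estimator

Since $\hat\beta = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon$,

$\mathrm{Var}(\hat\beta) = \mathrm{Var}((X'X)^{-1}X'\varepsilon) = (X'X)^{-1}X'\, \mathrm{Var}(\varepsilon)\, X(X'X)^{-1}$

$= (X'X)^{-1}X'\, \sigma^2 I\, X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$

(recall $\mathrm{Var}(\varepsilon) = \sigma^2 I$)

• Estimated Variance of OLS estimator

$\widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma^2 (X'X)^{-1}$, where $\hat\sigma^2 = e'e/(n-k-1)$

• Standard Errors of OLS estimator: square roots of the diagonal elements of the estimated variance matrix above

• Precision of the coefficient estimates and the variation in the independent variables

– $X'X$: the variation in X

– The more variation there is in the independent variables, the larger $X'X$ and the smaller $(X'X)^{-1}$, so the coefficient estimates become more precise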
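The sketch below computes standard errors as $\sqrt{\mathrm{diag}(\hat\sigma^2 (X'X)^{-1})}$ and illustrates the last point with hypothetical data: the same draws of x are rescaled to a narrow and a wide range. Because the residuals are identical in both fits, the slope standard error shrinks exactly in proportion to the spread (by a factor of 10 here).

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and standard errors sqrt(diag(sigma_hat^2 (X'X)^{-1}))."""
    n, p = X.shape                                  # p = k + 1 columns including the intercept
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    sigma2_hat = e @ e / (n - p)                    # sigma_hat^2 = e'e / (n - k - 1)
    return b, np.sqrt(np.diag(sigma2_hat * XtX_inv))

rng = np.random.default_rng(3)
n = 200
u = rng.uniform(-1, 1, size=n)                      # one shared draw, rescaled below
eps = rng.normal(size=n)
slope_se = {}
for spread in (0.5, 5.0):                           # narrow vs. wide variation in x
    x = spread * u
    X = np.column_stack([np.ones(n), x])
    b, se = ols_with_se(X, 1.0 + 2.0 * x + eps)
    slope_se[spread] = se[1]
print(slope_se)   # the wider spread gives a 10x smaller slope SE
```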



Interpretation of regression coefficients
$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i$

• Intercept ($\beta_0$): the expected value of y when all x's are equal to zero

• Coefficients (impact of $x_1$ on y: $\beta_1$): other things being equal, when $x_1$ changes by 1 unit, y is expected to change by $\beta_1$
– So be careful about the measurement units of the variables: the value of $\beta_1$ changes as you change the measurement unit of y or $x_1$
– Marginal effect: $\beta_1$ captures $\partial y / \partial x_1$

– It is conditional on "other things being equal". What are the "other things"? They are the other variables in the regression model (that is, $\{x_2, \cdots, x_k\}$)

– So if you want to hold something constant when investigating the effect of price, include that variable in the regression model. (If you do not include it, you cannot be sure that you are controlling for it.)
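A quick check of the measurement-unit warning on hypothetical data: measuring price in cents instead of dollars divides its coefficient by 100, while the coefficient on the other regressor (and the intercept) stays the same.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
price_dollars = rng.uniform(1, 5, size=n)
quality = rng.normal(size=n)
sales = 100 - 8 * price_dollars + 3 * quality + rng.normal(size=n)   # hypothetical model

def fit(price):
    """OLS of sales on an intercept, price, and quality."""
    X = np.column_stack([np.ones(n), price, quality])
    return np.linalg.lstsq(X, sales, rcond=None)[0]

b_dollars = fit(price_dollars)          # price in dollars
b_cents = fit(price_dollars * 100)      # same prices measured in cents
print(b_dollars[1], b_cents[1] * 100)   # equal: the cents coefficient is 1/100 of the dollar one
print(b_dollars[2], b_cents[2])         # quality coefficient unchanged
```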



Dummy variable

• Takes only a binary value {0, 1}

• Ex. Customer status (1 = current customer, 0 = not), Home ownership (1 = owns a home, 0 = not)

• Interpretation: incremental effect (the impact of a particular level compared to the baseline level). Ex. for $y_i = \beta_0 + \beta_1 D_i$: "the expected value of y is larger by $\beta_1$ for D = 1 cases than for D = 0 cases"
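With a single dummy as the only regressor, OLS reproduces the group means exactly: $\hat\beta_0$ is the mean of y for the baseline group (D = 0) and $\hat\beta_1$ is the difference between the D = 1 and D = 0 means. The sketch below checks this on hypothetical customer data.

```python
import numpy as np

rng = np.random.default_rng(5)
D = rng.integers(0, 2, size=400)                     # 1 = current customer, 0 = not (hypothetical)
y = 20 + 5 * D + rng.normal(scale=2, size=D.size)    # hypothetical outcome, true beta_1 = 5

X = np.column_stack([np.ones(D.size), D])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

print(b0, y[D == 0].mean())                          # intercept = baseline (D = 0) mean
print(b1, y[D == 1].mean() - y[D == 0].mean())       # dummy coefficient = difference in means
```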



• Set of dummy variables to represent a qualitative variable (such as color)

– How do we investigate the impact of a qualitative variable? Represent the qualitative value as a combination of dummies

– When the variable has k qualitative levels, we need k − 1 dummy variables; the level with all dummies equal to 0 is the baseline. (Including all k dummies together with the intercept would make the regressors perfectly collinear.)

– Ex. Color = {red, blue, green, yellow}, so we need 3 dummies (yellow is the baseline):

• D1 = 1 if red and D1 = 0 otherwise
• D2 = 1 if blue and D2 = 0 otherwise
• D3 = 1 if green and D3 = 0 otherwise

        red   blue   green   yellow
D1       1     0      0       0
D2       0     1      0       0
D3       0     0      1       0
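A small sketch of k − 1 dummy coding for the color example, with yellow as the baseline as in the table above. `make_dummies` is a hypothetical helper, not a library function. The rank check shows why the k-th dummy is dropped: with an intercept, the four color dummies sum to the intercept column, so the design matrix loses full column rank.

```python
import numpy as np

def make_dummies(values, levels, baseline):
    """Hypothetical helper: one 0/1 column per non-baseline level (k - 1 columns)."""
    kept = [lvl for lvl in levels if lvl != baseline]
    D = np.array([[1 if v == lvl else 0 for lvl in kept] for v in values])
    return D, kept

colors = ["red", "blue", "green", "yellow"]
D, names = make_dummies(colors, levels=colors, baseline="yellow")
print(names)   # ['red', 'blue', 'green'] -> D1, D2, D3
print(D.T)     # rows D1, D2, D3 match the table above

# Dummy-variable trap on 12 hypothetical observations (each color three times).
values = colors * 3
D_obs, _ = make_dummies(values, levels=colors, baseline="yellow")
X_ok = np.column_stack([np.ones(len(values)), D_obs])                         # intercept + k - 1 dummies
X_trap = np.column_stack([X_ok, [1 if v == "yellow" else 0 for v in values]])  # add the k-th dummy
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])       # 4 4: full column rank
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])   # 4 5: perfectly collinear
```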



Prediction
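• What is the expected value of y for a new case with $X = x_0$? Note that $y_0 = x_0'\beta + \varepsilon_0$

• Point prediction: $E[y_0] = x_0'\beta$, so the prediction is $\hat y_0 = x_0'\hat\beta$, and $E[\hat y_0] = x_0'\beta = E[y_0]$

• Interval prediction with confidence level $1 - \alpha$: $\hat y_0 \pm t_{\alpha/2} \times SE(\hat y_0)$, where $t_{\alpha/2}$ is the critical value of the t distribution with $n - k - 1$ degrees of freedom and $SE(\hat y_0)$ is the standard error of forecast. (Because it covers the new outcome $y_0$ itself, this is strictly a prediction interval.)

• How to compute $SE(\hat y_0)$?

– Two sources of error in the prediction: (1) use of the estimated $\hat\beta$ instead of the true β and (2) $\varepsilon_0$: natural variation in y unexplained by X

– $\hat y_0 - y_0 = (\hat y_0 - E[\hat y_0]) + (E[\hat y_0] - y_0)$

– $E[(\hat y_0 - y_0)^2] = E[(\hat y_0 - E[\hat y_0])^2] + E[(E[\hat y_0] - y_0)^2]$ (the cross term vanishes because $\varepsilon_0$ is independent of $\hat\beta$)

– $\mathrm{var}(\hat y_0 - y_0) = \mathrm{var}(x_0'\hat\beta) + \mathrm{var}(\varepsilon_0) = \sigma^2 x_0'(X'X)^{-1}x_0 + \sigma^2$

– $\widehat{\mathrm{var}}(\hat y_0 - y_0) = \hat\sigma^2 [1 + x_0'(X'X)^{-1}x_0]$, and thus $SE(\hat y_0) = \sqrt{\widehat{\mathrm{var}}(\hat y_0 - y_0)}$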
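A sketch of the point prediction and interval on hypothetical data. `predict_with_interval` is an illustrative helper; to stay within the Python standard library plus NumPy it uses a normal critical value (about 1.96 for 95%) as a large-sample stand-in for the t critical value, and accepts an explicit `crit` if the exact t value is available.

```python
import numpy as np
from statistics import NormalDist

def predict_with_interval(X, y, x0, level=0.95, crit=None):
    """Return y0_hat, SE(y0_hat) = sqrt(sigma_hat^2 (1 + x0'(X'X)^{-1} x0)), and the interval."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    sigma2_hat = e @ e / (n - p)
    y0_hat = x0 @ b                                             # point prediction x0' b
    se = np.sqrt(sigma2_hat * (1.0 + x0 @ XtX_inv @ x0))        # both sources of error
    if crit is None:
        crit = NormalDist().inv_cdf(0.5 + level / 2)            # normal stand-in for t
    return y0_hat, se, (y0_hat - crit * se, y0_hat + crit * se)

rng = np.random.default_rng(6)
n = 100
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 3.0 + 1.5 * x + rng.normal(scale=2.0, size=n)   # hypothetical data, sigma = 2
x0 = np.array([1.0, 12.0])                           # new case, outside the observed x range
y0_hat, se, (lo, hi) = predict_with_interval(X, y, x0)
print(y0_hat, se, lo, hi)
```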



A Simulated Example

$\log Q_i = \beta_0 + \beta_1 \log P_i + \beta_2 D_{i1} + \beta_3 D_{i2} + \beta_4 M_i + \varepsilon_i$

• Independent variables

– D1, D2: product color (D1 = 1 for yellow, D2 = 1 for green; blue is the baseline)
– M: product quality level

• N = 100

• True values: $\beta = (3, -3, 1.5, 0.7, 3)'$, $\sigma = 3$
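The slide does not say how the regressors were generated, so the sketch below only mimics the setup: the distributions of log price, color, and quality are hypothetical choices (log price and quality are given a correlation of 0.5, similar to the correlation matrix that follows), and the error standard deviation is read from the slide as σ = 3. It will not reproduce the numbers in the regression output; with N = 100 and σ = 3 the estimates are noisy, and only a much larger sample lands close to the true β.

```python
import numpy as np

def simulate_example(n=100, beta=(3.0, -3.0, 1.5, 0.7, 3.0), sigma=3.0, seed=7):
    """Simulate log Q = b0 + b1 log P + b2 D1 + b3 D2 + b4 M + e.

    Regressor distributions are hypothetical: M ~ N(0, 1), log P = 0.5 M + noise
    (so corr(log P, M) = 0.5), and color drawn uniformly from blue/yellow/green.
    """
    rng = np.random.default_rng(seed)
    quality = rng.normal(size=n)                                        # M
    log_price = 0.5 * quality + rng.normal(scale=np.sqrt(0.75), size=n)
    color = rng.choice(["blue", "yellow", "green"], size=n)             # blue = baseline
    d1 = (color == "yellow").astype(float)                              # D1
    d2 = (color == "green").astype(float)                               # D2
    X = np.column_stack([np.ones(n), log_price, d1, d2, quality])
    log_q = X @ np.asarray(beta) + rng.normal(scale=sigma, size=n)
    return X, log_q

# N = 100 as on the slide: estimates are noisy.
X, log_q = simulate_example()
b_small = np.linalg.lstsq(X, log_q, rcond=None)[0]

# A very large hypothetical sample: estimates settle near the true beta.
X_big, log_q_big = simulate_example(n=200_000, seed=8)
b_big = np.linalg.lstsq(X_big, log_q_big, rcond=None)[0]
print(np.round(b_small, 2), np.round(b_big, 2))
```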

Correlation Matrix

                Log(Sales)  Log(Price)  Yellow Dummy  Green Dummy  Quality
Log(Sales)        1
Log(Price)       -0.2964      1
Yellow Dummy      0.0551      0.1884      1
Green Dummy       0.1348     -0.1700     -0.6205        1
Quality          -0.0133      0.5042     -0.0100       -0.0265       1



Regression Output

Call:
lm(formula = logQ ~ logPr + dummy1 + dummy2 + quality)

Residuals:
Min 1Q Median 3Q Max
-6.147 -1.893 0.177 1.608 6.169

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0364 0.9020 4.475 2.12e-05 ***
logPr -2.6832 0.7265 -3.693 0.00037 ***
dummy1 1.7332 0.7212 2.403 0.01820 *
dummy2 1.4027 0.6704 2.092 0.03906 *
quality 1.3682 0.7339 1.864 0.06537 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.609 on 95 degrees of freedom
Multiple R-squared: 0.1688, Adjusted R-squared: 0.1338
F-statistic: 4.824 on 4 and 95 DF, p-value: 0.001382



Reading Regression Output

• Interpretation of coefficients

• Meaning of the coefficients of dummies

• F-test and p-value: overall significance of the model as a whole

- Null hypothesis: all slope coefficients are zero ($\beta_1 = \cdots = \beta_k = 0$)

• t-test and p-value: significance of individual coefficients

- Null hypothesis: $\beta_j = 0$

• R-squared and adjusted R-squared

• Standard deviation of the residuals (residual standard error)
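Several numbers in the first regression output above are tied together by simple arithmetic: each t value is Estimate / Std. Error, adjusted $R^2 = 1 - (1 - R^2)(n-1)/(n-k-1)$, and the overall $F = [R^2/k] / [(1 - R^2)/(n-k-1)]$ with (k, n − k − 1) degrees of freedom. The plain-Python sketch below recomputes them from the reported values (n = 100, k = 4); small differences come from rounding in the printed output.

```python
# Reported numbers from the first regression output (n = 100 observations, k = 4 slopes).
n, k = 100, 4
estimates = {"(Intercept)": (4.0364, 0.9020), "logPr": (-2.6832, 0.7265),
             "dummy1": (1.7332, 0.7212), "dummy2": (1.4027, 0.6704),
             "quality": (1.3682, 0.7339)}
r2 = 0.1688

# t value for H0: beta_j = 0 is Estimate / Std. Error.
t_values = {name: est / se for name, (est, se) in estimates.items()}

# Adjusted R^2 penalizes the number of regressors.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F statistic for H0: beta_1 = ... = beta_k = 0.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

print({name: round(t, 3) for name, t in t_values.items()})
print(round(adj_r2, 4), round(f_stat, 3))
```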



Multicollinearity

• Independent variables are highly correlated with one another (ex. the correlation between Log(Price) and the Quality index is 0.5042).
• Consequence: it is difficult to separate the "price" effect from the "quality" effect, which leads to large estimated standard errors for one or more slope coefficients.

Typical evidence: the model as a whole is quite significant, while none of the individual slope parameters is!

• What to do about multicollinearity?

1. If the objective is forecasting, ignore the collinearity.

2. If the objective is to understand a particular coefficient, you may

a) add more (and better!) data or

b) drop a variable.
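The standard-error inflation can be seen in a small simulation with a hypothetical design in which x1 and x2 both have unit variance and correlation ρ. Holding the random draws fixed, the standard error of $\hat\beta_1$ grows roughly like $1/\sqrt{1 - \rho^2}$ as ρ approaches 1.

```python
import numpy as np

def slope_se(rho, n=200, seed=9):
    """Standard error of beta_1 when corr(x1, x2) = rho (hypothetical design, sigma = 1)."""
    rng = np.random.default_rng(seed)            # same seed -> same underlying draws for every rho
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    y = 1.0 + x1 + x2 + rng.normal(size=n)
    XtX_inv = np.linalg.inv(X.T @ X)
    e = y - X @ (XtX_inv @ X.T @ y)
    sigma2_hat = e @ e / (n - X.shape[1])
    return np.sqrt(sigma2_hat * XtX_inv[1, 1])

for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, round(slope_se(rho), 3))
```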



What happens if we drop the Quality variable?

Call:
lm(formula = logQ ~ logPr + dummy1 + dummy2)

Residuals:
Min 1Q Median 3Q Max
-5.8695 -2.0233 0.0904 1.6674 6.9931

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.8175 0.8091 5.954 4.27e-08 ***
logPr -1.9868 0.6311 -3.148 0.00219 **
dummy1 1.5947 0.7266 2.195 0.03059 *
dummy2 1.3935 0.6789 2.053 0.04284 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.643 on 96 degrees of freedom
Multiple R-squared: 0.1384, Adjusted R-squared: 0.1115
F-statistic: 5.141 on 3 and 96 DF, p-value: 0.002436



Specification Error:
Omitting a Relevant Predictor

Model 1: $\log Q_i = \beta_0 + \beta_1 \log P_i + \beta_2 D_{i1} + \beta_3 D_{i2} + \beta_4 M_i + \varepsilon_i$ ($\hat\beta_1 = -2.6832$)
Model 2: $\log Q_i = \beta_0 + \beta_1 \log P_i + \beta_2 D_{i1} + \beta_3 D_{i2} + \varepsilon_i$ ($\hat\beta_1 = -1.9868$)
⇒ Omitting the Quality variable understates the magnitude of the log(Price) effect.

If X1 and X2 are correlated and X2 is omitted, b1 will be biased, with the direction of the bias depending on the sign of the correlation between X1 and X2 and the sign of b2 (the true effect of X2).

Direction of bias in b1

                                          Sign of b2
Sign of correlation between X1 and X2     Positive        Negative
Positive                                  Upward bias     Downward bias
Negative                                  Downward bias   Upward bias



Omitted Variable Bias (OVB)

• "True" regression model: $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$

• "Misspecified" model: $y = X_1\gamma_1 + u$
• What happens if we use the misspecified model? Would $E[\hat\gamma_1] = \beta_1$?
• Note that
$\hat\gamma_1 = (X_1'X_1)^{-1}X_1'y = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon)$
$= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon$
• We know that $E[X_1'\varepsilon] = 0$. So,
$E[\hat\gamma_1] = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2$

• If $X_1'X_2 = 0$ (that is, X1 and X2 are uncorrelated), then $E[\hat\gamma_1] = \beta_1$. That is, omitting an uncorrelated variable does not result in biased estimates.
• If not, the bias is $(X_1'X_1)^{-1}X_1'X_2\beta_2$, and (for a single included regressor) $\mathrm{sign(bias)} = \mathrm{sign}(X_1'X_2) \times \mathrm{sign}(\beta_2)$. So omitting a correlated variable results in biased estimates, which is why dropping a variable is not a good remedy for multicollinearity.
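A numerical sketch of the bias formula on hypothetical data with positively correlated X1 and X2 and a positive β2, so the sign table on the previous slide predicts an upward bias. In any sample, the short-regression slope equals the long-regression slope plus δ̂ times the long coefficient on X2, where δ̂ is the slope from regressing the omitted X2 on the included X1; this is the sample counterpart of $(X_1'X_1)^{-1}X_1'X_2\beta_2$. Both regressions include an intercept here.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100_000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)              # X1, X2 positively correlated (hypothetical)
beta1, beta2 = -3.0, 2.0                        # true effects; beta2 > 0
y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

ones = np.ones(n)
Z_short = np.column_stack([ones, x1])
long_b = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)[0]
short_b = np.linalg.lstsq(Z_short, y, rcond=None)[0]     # X2 omitted
delta = np.linalg.lstsq(Z_short, x2, rcond=None)[0]      # regress omitted X2 on included X1

# Exact in-sample identity: short slope = long slope + delta_1 * (long coefficient on X2)
print(short_b[1], long_b[1] + delta[1] * long_b[2])
# Population bias = cov(x1, x2) / var(x1) * beta2 = 0.6 / 1.36 * 2, about 0.88 (upward)
print(short_b[1] - beta1)
```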



Some Math
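Consider the regression model $y = \alpha + \beta x + \varepsilon$
It can be shown that $E[\hat\beta_{OLS}] = \mathrm{cov}(y, x)/\mathrm{var}(x)$

$\mathrm{cov}(y, x) = \mathrm{cov}(\alpha + \beta x + \varepsilon, x) = \mathrm{cov}(\beta x + \varepsilon, x)$
$= \beta\, \mathrm{cov}(x, x) + \mathrm{cov}(\varepsilon, x) = \beta\, \mathrm{var}(x) + \mathrm{cov}(\varepsilon, x)$
Then
$E[\hat\beta_{OLS}] = \mathrm{cov}(y, x)/\mathrm{var}(x) = \beta + \mathrm{cov}(\varepsilon, x)/\mathrm{var}(x)$

Therefore, if $\mathrm{cov}(\varepsilon, x) = 0$ (that is, x is uncorrelated with the error term), then $E[\hat\beta_{OLS}] = \beta$ (called unbiased).

Note that $\mathrm{var}(x) > 0$ as long as x actually varies. Therefore,

• If $\mathrm{cov}(\varepsilon, x) < 0$, $E[\hat\beta_{OLS}] < \beta$ (downward bias)
• If $\mathrm{cov}(\varepsilon, x) > 0$, $E[\hat\beta_{OLS}] > \beta$ (upward bias)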
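A numerical check of the derivation with hypothetical data: the OLS slope equals the sample cov(y, x)/var(x), which in turn equals β + cov(ε, x)/var(x). The error here is built with cov(ε, x) = −0.4 and var(x) = 1, so the slope should land near 1.5 − 0.4 = 1.1 (downward bias).

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000
alpha, beta = 2.0, 1.5
x = rng.normal(size=n)                             # var(x) = 1
eps = -0.4 * x + rng.normal(size=n)                # cov(eps, x) = -0.4: x is not exogenous
y = alpha + beta * x + eps

slope_ols = np.polyfit(x, y, 1)[0]                 # least-squares slope (with intercept)
slope_cov = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
print(slope_ols, slope_cov)                        # the same number
print(beta + np.cov(eps, x)[0, 1] / np.var(x, ddof=1))   # beta + cov(eps, x)/var(x), also the same
```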



Illustration of Omitted Variable Bias with Data Pooling

• Suppose we have data from two stores, store 1 and store 2. Store 1 is a large store and store 2 is a small store. Suppose the price sensitivity is identical for the two stores, i.e., $Q_1 = \alpha_1 + \beta P_1 + \varepsilon_1$ and $Q_2 = \alpha_2 + \beta P_2 + \varepsilon_2$
• They differ only in the intercept (the baseline market size). We also assume that price is uncorrelated with the error term. Now suppose the researcher pools the data into one data set and uses the following model: $Q = \alpha + \beta P + u$

• Note that the true model should be $Q = \alpha + \beta P + \delta D + \varepsilon$, where D is the dummy variable for store 1 (D = 1 if store = 1, D = 0 if store = 2). So $u = \delta D + \varepsilon$, with $\alpha = \alpha_2$ and $\delta = \alpha_1 - \alpha_2$. That is, the researcher is ignoring the dummy variable, which is correlated with price (price is naturally higher in the large store; see the illustration). This means that price is correlated with u in the misspecified model, so the estimated β will be biased.



Illustration of Omitted Variable Bias with Data Pooling

• Marketers set prices so that MR = MC in each store (marketers want to maximize profit). This results in the following pattern in the data.

[Figure: demand curves Q1 and Q2, marginal revenue curves MR1 and MR2, a fluctuating MC (the source of exogenous price variation), and the estimated regression line]
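A simulation sketch of the two-store story under hypothetical numbers: linear demand $Q = \alpha_s + \beta P$ with β = −2 in both stores, α1 = 100, α2 = 40, and a constant marginal cost c that fluctuates from period to period. Setting MR = MC for linear demand gives $P = c/2 - \alpha_s/(2\beta)$, so the large store always charges more. The pooled regression then mixes the between-store pattern (high price, high quantity) with the within-store demand slope, while adding the store dummy D recovers β.

```python
import numpy as np

rng = np.random.default_rng(12)
beta = -2.0                                  # identical price sensitivity in both stores
alpha = {1: 100.0, 2: 40.0}                  # store 1 large, store 2 small (hypothetical)
n_per_store = 500

prices, quantities, dummies = [], [], []
for store, a in alpha.items():
    cost = rng.uniform(10, 20, size=n_per_store)        # fluctuating MC: exogenous price variation
    price = cost / 2 - a / (2 * beta)                   # MR = MC for demand Q = a + beta * P
    prices.append(price)
    quantities.append(a + beta * price + rng.normal(size=n_per_store))
    dummies.append(np.full(n_per_store, 1.0 if store == 1 else 0.0))
P, Q, D = (np.concatenate(v) for v in (prices, quantities, dummies))

ones = np.ones(P.size)
b_pooled = np.linalg.lstsq(np.column_stack([ones, P]), Q, rcond=None)[0]
b_dummy = np.linalg.lstsq(np.column_stack([ones, P, D]), Q, rcond=None)[0]
print(b_pooled[1])   # far from -2 (positive in this setup)
print(b_dummy[1])    # close to -2; b_dummy[2] estimates delta = alpha_1 - alpha_2 = 60
```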



Nonspherical Error Terms
• What happens if X is correlated with the error terms? OLS will be biased.

• Consider the matrix representation of the regression model, $y = X\beta + \varepsilon$

• The OLS estimator is given by $\hat\beta = (X'X)^{-1}X'y$. Then,

$E[\hat\beta] = E[(X'X)^{-1}X'y] = E[(X'X)^{-1}X'(X\beta + \varepsilon)] = \beta + (X'X)^{-1}E[X'\varepsilon]$

• So, if $E[X'\varepsilon] \neq 0$, the OLS estimate will be biased.

• In many real-life cases we suffer from this issue; endogeneity and simultaneity are its sources. This happens because X was not randomized by the researcher. In fact, X was set by decision makers.
• What can we do to obtain an unbiased estimate of β in such cases?

- Probably the best solution is to identify all the omitted variables correlated with X and include them in the model. But such a solution may not be available in all cases.
- Second-best solution: use econometric methods such as IV and 2SLS



Instrumental Variables

• Use an instrumental variable (Z): correlated with X, but not with the error terms, i.e., $\mathrm{cov}(X, Z) \neq 0$ and $\mathrm{cov}(Z, \varepsilon) = 0$

• In the context of a demand function, Z should not be a demand shifter, but Z should be correlated with price.

• Start from the assumption $E[Z'\varepsilon] = 0$ and impose its sample analog $Z'(y - X\beta) = 0$. This yields $Z'y = Z'X\beta$. $\mathrm{cov}(X, Z) \neq 0$ is needed for $Z'X$ to be invertible (i.e., $|Z'X| \neq 0$).

• From the above we obtain the IV estimator: $\hat\beta_{IV} = (Z'X)^{-1}Z'y$

$E[\hat\beta_{IV}] = E[(Z'X)^{-1}Z'y] = E[(Z'X)^{-1}Z'(X\beta + \varepsilon)] = \beta + (Z'X)^{-1}E[Z'\varepsilon]$

• Since $E[Z'\varepsilon] = 0$, $E[\hat\beta_{IV}] = \beta$. (Strictly, because $Z'X$ is random, this argument establishes that the IV estimator is consistent; IV is generally biased in small samples.)

• A good IV should be strongly correlated with X (checked in the so-called "first stage").

• Then, what should the instrumental variables be? It depends on the context.
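A sketch of the IV estimator on a hypothetical demand system: the marketer raises price when the (unobserved) demand shock is high, and manufacturing cost shifts price without entering demand. `iv_estimator` is an illustrative helper implementing $(Z'X)^{-1}Z'y$ with the constant included in both X and Z. Under these assumptions OLS is biased by cov(P, ε)/var(P) = 2/7.33, about 0.27, while IV should land close to β = −2.

```python
import numpy as np

def iv_estimator(Z, X, y):
    """Just-identified IV estimator (Z'X)^{-1} Z'y; Z and X must have the same number of columns."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

rng = np.random.default_rng(13)
n = 50_000
alpha, beta = 50.0, -2.0                          # hypothetical demand parameters
cost = rng.uniform(5, 15, size=n)                 # instrument: moves price, not demand
demand_shock = rng.normal(scale=2.0, size=n)      # seen by the marketer, not by the researcher
price = 5 + 0.8 * cost + 0.5 * demand_shock + rng.normal(size=n)   # price reacts to the shock
qty = alpha + beta * price + demand_shock

ones = np.ones(n)
X = np.column_stack([ones, price])
Z = np.column_stack([ones, cost])                 # the constant serves as its own instrument

b_ols = np.linalg.lstsq(X, qty, rcond=None)[0]
b_iv = iv_estimator(Z, X, qty)
first_stage_corr = np.corrcoef(cost, price)[0, 1]  # instrument relevance ("first stage")
print(b_ols[1], b_iv[1], first_stage_corr)
```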



Example: Demand function $Q = \alpha + \beta P + \varepsilon$ (we are interested in the price coefficient)

• The demand shock (error term) is unobserved by researchers but often observed by marketers. Ex) advertising, availability of coupons, changes in consumer taste, etc.

• Marketers take such shocks into account when setting the price, so price may be correlated with the error term.

• Thus, OLS will be biased.

• Some candidates for instruments

- Manufacturing cost: correlated with price, but uncorrelated with the demand shock

- Lagged price: correlated with today's price (probably through the cost factor) but uncorrelated with today's demand shock (provided demand shocks are not serially correlated)

- Price in another market: correlated with the price in the focal market, but uncorrelated with the demand shock in the focal market



2SLS

• What if I have many instruments?

• If we have more instrumental variables than original independent variables, $Z'X$ is no longer a square matrix, so it cannot be inverted. In this case we use 2SLS.

• Two-Stage Least Squares

- Stage 1: Regress X on Z to find $\hat X = Z(Z'Z)^{-1}Z'X$

- Stage 2: Regress y on $\hat X$ (use $\hat X$ as the new data)

• If the number of instruments equals the number of independent variables, IV estimation is equivalent to 2SLS.
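A sketch of the two stages in NumPy on hypothetical data with two excluded instruments (cost and a hypothetical price in another market), so Z has one more column than X and $Z'X$ is not square. It also checks the last point: with exactly as many instruments as regressors, 2SLS and $(Z'X)^{-1}Z'y$ give the same estimate. Only point estimates are computed; standard errors from the second-stage OLS are not valid 2SLS standard errors without correction.

```python
import numpy as np

def two_sls(Z, X, y):
    """2SLS point estimates. Stage 1: X_hat = Z (Z'Z)^{-1} Z'X. Stage 2: OLS of y on X_hat."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)        # stage 1 fitted values
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]       # stage 2

def iv_estimator(Z, X, y):
    """Just-identified IV: (Z'X)^{-1} Z'y."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

rng = np.random.default_rng(14)
n = 50_000
cost = rng.uniform(5, 15, size=n)
other_market_price = 0.6 * cost + rng.normal(size=n)     # hypothetical extra instrument
shock = rng.normal(scale=2.0, size=n)                    # demand shock, seen by the marketer
price = 5 + 0.8 * cost + 0.5 * shock + rng.normal(size=n)
qty = 50.0 - 2.0 * price + shock

ones = np.ones(n)
X = np.column_stack([ones, price])
Z_just = np.column_stack([ones, cost])                       # as many instruments as regressors
Z_over = np.column_stack([ones, cost, other_market_price])   # one extra: Z'X is 3 x 2

b_2sls_over = two_sls(Z_over, X, qty)
print(b_2sls_over[1])                                        # close to -2
print(two_sls(Z_just, X, qty)[1], iv_estimator(Z_just, X, qty)[1])   # identical
```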

