Handout 2
Structural Econometric Modeling in Industrial Organization Handout 2 Professor Matthijs Wildenbeest
17 May 2011
1
Reading
Peter C. Reiss and Frank A. Wolak. Structural Econometric Modeling: Rationales and Examples from Industrial Organization. Handbook of Econometrics 6A, Chapter 64, Section 7, 2007.
Aviv Nevo. A Practitioner's Guide to Estimation of Random-Coefficients Logit Models of Demand. Journal of Economics and Management Strategy 9, 513-48, 2000.
Steven Berry, James Levinsohn, and Ariel Pakes. Automobile Prices in Market Equilibrium. Econometrica 63, 841-90, 1995.
2
Discrete choice models
Estimating a system of demand equations is not very practical. Each equation specifies demand as a function of the own price, the prices of other products, and other variables. If the number of products is large, a large number of parameters has to be estimated: with 100 products we need to estimate at least 10,000 parameters. Discrete choice models solve the dimensionality problem by focusing on characteristics of products.
3
Discrete choice models
Individuals have the choice between two or more discrete alternatives. In our setting: which product to buy. Differences in consumer tastes cause firms to produce differentiated products and consumers to make different choices among the available products. Can be modeled using logit and probit models. Daniel McFadden won the 2000 Nobel Prize for his pioneering work on discrete choice models.
4
Discrete choice models
In a binary choice setting (0/1) one could use a linear probability model, i.e., $Y = x'\beta + \varepsilon$. However, since $E[Y|x] = \mathrm{Prob}(Y = 1|x) = x'\beta$, probabilities can be outside the 0-1 interval. Thus we need the requirement
$$\lim_{x'\beta \to +\infty} \mathrm{Prob}(Y = 1|x) = 1 \quad \text{and} \quad \lim_{x'\beta \to -\infty} \mathrm{Prob}(Y = 1|x) = 0.$$
In principle, any continuous probability distribution can be used.
5
Discrete choice models
Two have become popular: the normal and the logistic distribution. The probit model is based on the normal distribution, i.e.,
$$\mathrm{Prob}(Y = 1|x) = \int_{-\infty}^{x'\beta} \phi(t)\,dt = \Phi(x'\beta).$$
The logit model is based on the logistic distribution, i.e.,
$$\mathrm{Prob}(Y = 1|x) = \frac{e^{x'\beta}}{1 + e^{x'\beta}}.$$
However, we are interested in a setting with more than two alternatives. In such a setting the probit model is not so practical: leads to multivariate normal integrals.
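The three binary-choice specifications can be compared numerically. The following is a minimal sketch; the function names and the index values are illustrative assumptions, not part of the handout. It shows that the linear probability model leaves the 0-1 interval while probit and logit do not.

```python
import math


def linear_probability(index):
    """Linear probability model: the fitted 'probability' is the index x'beta itself."""
    return index


def probit_probability(index):
    """Probit: standard normal CDF evaluated at the index x'beta."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


def logit_probability(index):
    """Logit: logistic CDF evaluated at the index x'beta."""
    return 1.0 / (1.0 + math.exp(-index))


# Illustrative values of the index x'beta (assumed, not from the handout).
for index in (-3.0, 0.0, 3.0):
    print(index,
          linear_probability(index),
          round(probit_probability(index), 4),
          round(logit_probability(index), 4))
```

Both the probit and the logit probabilities approach 0 and 1 in the tails, which is the limit requirement stated above.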
6
Nevo (2000)
The logit model does (under certain circumstances) give closed-form solutions for the probabilities. Nevo (J Econ Manage Strat, 2000) in his practitioner's guide focuses on the random-coefficients logit model of demand. Explains the random-coefficients logit methodology for differentiated-products markets using market-level data. Popular method because it allows for endogeneity in prices and because it gives realistic demand elasticities. Method is developed in Berry (RAND J Econ, 1994) and Berry, Levinsohn, and Pakes (Econometrica, 1995) (BLP).
7
Probabilistic choice models
Suppose an individual can choose among $J$ alternatives and that the utility of consuming alternative $j$ for individual $i$ is given by
$$u_{ij} = x_j\beta - \alpha p_j + \varepsilon_{ij} = \delta_j + \varepsilon_{ij},$$
where $j = 1, \ldots, J$. Consumers are assumed to purchase one unit of the good that gives the highest utility. Consumers only differ in $\varepsilon$, so the set of people choosing product $j$ is
$$A_j(x, p) = \{(\varepsilon_{i0}, \varepsilon_{i1}, \ldots, \varepsilon_{iJ}) \mid u_{ij} \geq u_{ik} \ \forall\, k = 0, 1, \ldots, J\},$$
where $k = 0$ denotes the outside good (no purchase).
8
Conditional logit model
McFadden (1974) showed that if $\varepsilon$ is i.i.d. and distributed according to a Type I extreme value distribution, i.e.,
$$P^*(\varepsilon) = e^{-e^{-\varepsilon}},$$
then the market share of product $j$ is
$$s_j = \mathrm{Prob}(Y = j) = \int_{A_j} dP^*(\varepsilon) = \frac{\exp(x_j\beta - \alpha p_j)}{1 + \sum_{k=1}^{J} \exp(x_k\beta - \alpha p_k)}.$$
This is called the conditional logit model.
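The closed-form share expression is easy to evaluate directly. A minimal sketch, assuming a single scalar characteristic per product and made-up parameter values (`logit_shares` and all numbers are illustrative, not from the handout):

```python
import math


def logit_shares(x, p, beta, alpha):
    """Conditional logit shares with an outside good whose mean utility is zero.

    Returns (list of inside-good shares, outside-good share).
    """
    deltas = [xj * beta - alpha * pj for xj, pj in zip(x, p)]
    denom = 1.0 + sum(math.exp(d) for d in deltas)
    inside = [math.exp(d) / denom for d in deltas]
    return inside, 1.0 / denom


# Assumed example: three products with one characteristic each.
shares, outside_share = logit_shares(x=[1.0, 2.0, 1.5], p=[1.0, 2.5, 1.2],
                                     beta=1.0, alpha=0.8)
print(shares, outside_share)
```

The inside shares and the outside share sum to one, and products with a higher mean utility $x_j\beta - \alpha p_j$ get a larger share.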
9
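IIA property
Conditional logit model has the Independence of Irrelevant Alternatives (IIA) property:
$$\frac{\mathrm{Prob}(Y = j)}{\mathrm{Prob}(Y = k)} = \frac{\exp(x_j\beta - \alpha p_j)}{\exp(x_k\beta - \alpha p_k)}.$$
This means relative probabilities for two alternatives depend only on attributes of those two alternatives.
Example
Suppose choice of two modes of transportation: car and red bus. Initially equal probability of $\frac{1}{2}$. If blue bus introduced, according to IIA probability of each mode should fall to $\frac{1}{3}$. Not very realistic: the two buses are near-perfect substitutes, so one would expect the car to keep a probability of about $\frac{1}{2}$ and the two buses to split the rest.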
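The red bus/blue bus example can be reproduced in a few lines. This sketch assumes equal mean utilities (normalized to zero) for all modes and no outside option; the function name is illustrative.

```python
import math


def choice_probabilities(mean_utilities):
    """Logit choice probabilities over the listed alternatives (no outside option)."""
    denom = sum(math.exp(u) for u in mean_utilities.values())
    return {name: math.exp(u) / denom for name, u in mean_utilities.items()}


before = choice_probabilities({"car": 0.0, "red bus": 0.0})
after = choice_probabilities({"car": 0.0, "red bus": 0.0, "blue bus": 0.0})
print(before)
print(after)
```

The ratio of the car and red bus probabilities is the same before and after the blue bus is added, which is exactly the IIA restriction.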
10
Other problems
Several other problems with conditional logit model. The price elasticities of the market shares are
$$\eta_{jk} = \frac{\partial s_j}{\partial p_k}\frac{p_k}{s_j} = \begin{cases} -\alpha p_j(1 - s_j) & \text{if } j = k, \\ \alpha p_k s_k & \text{otherwise.} \end{cases}$$
If the market shares are small, the own-price elasticities are proportional to own price: implies higher markup for lower-priced brands. This means that the marginal cost as a percentage of price should be lower for cheaper brands. This might not be true for all products.
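The elasticity formulas can be checked against finite differences of the share function. A minimal sketch with assumed parameter values (all names and numbers are illustrative):

```python
import math


def logit_shares(utility_excl_price, prices, alpha):
    """Logit shares for mean utilities x_j*beta - alpha*p_j with an outside good."""
    expu = [math.exp(u - alpha * p) for u, p in zip(utility_excl_price, prices)]
    denom = 1.0 + sum(expu)
    return [e / denom for e in expu]


def logit_elasticity(j, k, prices, shares, alpha):
    """Analytic elasticity of share j with respect to price k."""
    if j == k:
        return -alpha * prices[j] * (1.0 - shares[j])
    return alpha * prices[k] * shares[k]


def numerical_elasticity(j, k, utility_excl_price, prices, alpha, step=1e-6):
    """Forward-difference approximation of the same elasticity."""
    base = logit_shares(utility_excl_price, prices, alpha)
    bumped_prices = list(prices)
    bumped_prices[k] += step
    bumped = logit_shares(utility_excl_price, bumped_prices, alpha)
    return (bumped[j] - base[j]) / step * prices[k] / base[j]


xb = [1.0, 2.0, 1.5]        # assumed values of x_j * beta
prices = [1.0, 2.5, 1.2]    # assumed prices
alpha = 0.8
shares = logit_shares(xb, prices, alpha)
for j in range(3):
    print([round(logit_elasticity(j, k, prices, shares, alpha), 4) for k in range(3)])
```

Each column of the printed matrix has identical off-diagonal entries: the cross-price elasticity with respect to $p_k$ is the same for every other product, regardless of how similar the products are.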
11
Other problems
Another problem with the conditional logit model is related to the cross-price elasticities: consumers are restricted to substitute towards other brands in proportion to market shares. Intuitively, one would expect that consumers will substitute to similar products if the price of one product goes up. However, in the logit model substitution patterns do not depend on characteristics. Problem comes from the i.i.d. structure of $\varepsilon$. Consumers have different rankings, but only due to the i.i.d. shock. If some product becomes unattractive due to a price increase, the proportion of consumers who rank some other brand as their second choice is equal to the market share of that other brand.
12
Nested logit model
Solution: make shocks to utility correlated across products. One way would be to generate correlation through $\varepsilon_{ij}$. Example is nested logit model, in which products are grouped and $\varepsilon_{ij}$ is decomposed into an i.i.d. shock plus a group-specific component. As a result correlation between products within the same group is higher. As in the conditional logit model, there is still a closed-form solution for the market shares. However, the division into groups might be a problem (which product belongs to which group?) and there is still an i.i.d. assumption within group, so within-group substitution patterns are driven by market shares.
13
Random-coefficients logit model
Another way to create correlation between choices is to make the coefficients $\alpha$ and $\beta$ random. Let the indirect utility function now be
$$u_{ij} = \alpha_i(y_i - p_j) + x_j\beta_i + \xi_j + \varepsilon_{ij},$$
where $y_i$ is income and $\xi_j$ is an unobserved (by the econometrician) product characteristic. Note that the unobserved characteristic is identical for all consumers, but that the coefficient of price is different across consumers. Consistent with theoretical vertical product differentiation literature.
14
Random-coefficients logit model
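Individual taste parameters $\alpha_i$ and $\beta_i$ depend on observed demographics $D_i$ and unobserved additional characteristics $v_i$, i.e.,
$$\begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \Pi D_i + \Sigma v_i, \quad v_i \sim P_v^*(v), \quad D_i \sim \hat{P}_D^*(D),$$
where $P_v^*(\cdot)$ is a parametric distribution (e.g., multivariate normal) and $\hat{P}_D^*(\cdot)$ is the distribution of observed demographics.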
As a result random coefficients depend not only on arbitrary distributional assumptions, but also on information we have about demographics.
15
Random-coefficients logit model
We normalize the mean indirect utility from the outside good to zero. Utility can be rewritten as
$$u_{ij} = \alpha_i y_i + \delta_j(x_j, p_j, \xi_j; \theta_1) + \mu_{ij}(x_j, p_j, v_i, D_i; \theta_2) + \varepsilon_{ij},$$
where
$$\delta_j = x_j\beta - \alpha p_j + \xi_j, \qquad \mu_{ij} = [-p_j, x_j](\Pi D_i + \Sigma v_i),$$
and $\theta_1 = (\alpha, \beta)$ contains the linear parameters and $\theta_2 = (\Pi, \Sigma)$ contains the nonlinear parameters. The mean utility common to all consumers is $\delta_j$, while $\mu_{ij} + \varepsilon_{ij}$ is the deviation from the mean utility.
16
Random-coefficients logit model
If we assume $v_i$ and $D_i$ are independent, market shares are now given by
$$s_j = \int_{A_j} dP^*(D, v, \varepsilon) = \int_{A_j} dP_\varepsilon^*(\varepsilon)\, dP_v^*(v)\, d\hat{P}_D^*(D).$$
If we maintain the i.i.d. extreme-value distribution assumption on $\varepsilon_{ij}$, correlation between choices is obtained through $\mu_{ij}$. The composite random shock $\mu_{ij} + \varepsilon_{ij}$ does depend on the product and consumer characteristics. As a result, if the price of some product goes up, consumers are more likely to switch to products with similar characteristics.
17
Random-coefficients logit model
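The price elasticities of the market shares become
$$\eta_{jk} = \frac{\partial s_j}{\partial p_k}\frac{p_k}{s_j} = \begin{cases} -\dfrac{p_j}{s_j}\displaystyle\int \alpha_i s_{ij}(1 - s_{ij})\, d\hat{P}_D^*(D)\, dP_v^*(v) & \text{if } j = k, \\[2ex] \dfrac{p_k}{s_j}\displaystyle\int \alpha_i s_{ij} s_{ik}\, d\hat{P}_D^*(D)\, dP_v^*(v) & \text{otherwise,} \end{cases}$$
where
$$s_{ij} = \frac{\exp(\delta_j + \mu_{ij})}{1 + \sum_{m=1}^{J}\exp(\delta_m + \mu_{im})}$$
is the probability of individual $i$ purchasing product $j$. Each individual will have a different price sensitivity.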
18
Random-coefficients logit model
Unfortunately, advantages come at a cost:
• No longer closed-form solution for the market shares $s_j$.
• Computation of the integral is difficult.
Additional problem is the endogeneity of prices.
19
Data
Only market-level data is required to estimate the model:
• market shares
• prices
• brand characteristics
In addition, information on the distribution of demographics $\hat{P}_D^*$ is useful, but the model can also be estimated using only the parametric distribution $P_v^*$. Using data from several markets is recommended, since this will help identifying the parameters.
20
Data
Need to define the market share of the outside good. Usually computed from the total size of the market minus the sales of the inside goods, so a definition of total market size is needed. Examples:
• Nevo (Econometrica, 2001) takes the total market as one serving of cereal per capita per day.
• Bresnahan et al. (RAND J Econ, 1997) assume it is the total number of office-based employees.
• BLP assume it is the total number of households.
21
Identification
Random-coefficients logit model is identified separately from the conditional logit model because the models give different predictions in terms of substitution patterns. If products A and B are similar in characteristics, but B and C have very similar market shares, the conditional logit model would predict that the market shares of both B and C should increase by the same amount when the price of A goes up. The random-coefficients logit model would predict that the market share of product B would increase more. Observing the actual changes in the market shares of B and C allows us to distinguish between the models. Degree of change identifies the random coefficients.
22
Estimation
Most straightforward approach to estimate the model would be to solve
$$\min_\theta \|s(x, p, \delta(x, p, \xi; \theta_1); \theta_2) - S\|,$$
where $S$ are the observed market shares. Usually not done because parameters enter in a nonlinear way. Berry (1994) proposes to transform the minimization problem in such a way that the parameters enter linearly.
23
Endogeneity problem
The main contribution of Berry (1994) is that it allows us to deal with the endogeneity problem. The unobserved individual attributes $(D_i, v_i, \varepsilon_i)$ are integrated over, so the econometric error term will be the unobserved product characteristics $\xi_j$. This error term is likely to be correlated with prices. Standard nonlinear simultaneous-equations model cannot be used, since the error term does not enter additively in the minimization problem. Berry (1994) adapts the model in such a way that the standard (non-)linear simultaneous-equations model can be used.
24
GMM
The model is estimated by Generalized Method of Moments (GMM). Let $Z = [z_1, \ldots, z_M]$ be a set of instruments such that
$$E[z_m\,\omega(\theta^*)] = 0, \quad m = 1, \ldots, M,$$
where $\omega$ is the error term as a function of the parameters and $\theta^*$ are the true parameter values. If we have as many instruments as parameters we can simply set the sample analog of the moments, $Z'\omega(\hat\theta)$, equal to zero and solve for $\hat\theta$. If the model is overidentified, there will not be an exact solution and we need to make $Z'\omega(\hat\theta)$ as small as possible.
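In the just-identified case the sample moment condition can be solved by hand. The sketch below assumes a scalar model $y_j = \theta x_j + \omega_j$ with one instrument; the data are constructed (not taken from any source) so that the instrument is exactly orthogonal to the error while the regressor is not.

```python
def iv_estimate(y, x, z):
    """Solve the sample moment sum_j z_j * (y_j - theta * x_j) = 0 for theta."""
    return sum(zj * yj for zj, yj in zip(z, y)) / sum(zj * xj for zj, xj in zip(z, x))


def ols_estimate(y, x):
    """Least squares through the origin: the regressor serves as its own instrument."""
    return iv_estimate(y, x, x)


theta_true = 1.5
z = [-2.0, -1.0, 0.0, 1.0, 2.0]       # instrument
error = [1.0, -2.0, 2.0, -2.0, 1.0]   # constructed so that sum(z * error) == 0
x = [zj + ej for zj, ej in zip(z, error)]               # regressor correlated with the error
y = [theta_true * xj + ej for xj, ej in zip(x, error)]

theta_iv = iv_estimate(y, x, z)
theta_ols = ols_estimate(y, x)
print(theta_iv, theta_ols)
```

The instrument-based estimate recovers the assumed parameter, while least squares is pushed away from it by the correlation between the regressor and the error, which is the role prices and $\xi_j$ play in the demand model.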
25
GMM
In principle we could try to minimize the sum of squares of the moment equations, i.e., $\omega(\theta)'ZZ'\omega(\theta)$. However, as shown by Hansen (Econometrica, 1982), using the inverse of the variance-covariance matrix of the moment conditions, $E[Z'\omega\omega'Z]$, as weights for the moment conditions turns out to be optimal. Therefore, our GMM estimate is going to be
$$\hat\theta = \arg\min_\theta\ \omega(\theta)'Z\Phi^{-1}Z'\omega(\theta),$$
where $\Phi$ is a consistent estimate of the variance-covariance matrix $E[Z'\omega\omega'Z]$.
26
Steps to calculate the error term
To find $\hat\theta$ we need to derive $\omega(\theta)$. This is done in three steps:
1. for a given value of $\delta$ and $\theta_2$, compute the market shares
2. for a given value of $\theta_2$, find $\delta$ that equates the calculated market shares to the observed market shares
3. for a given $\theta$, compute the error term
27
Step 0: sample individuals
To approximate the integral in the market share equation, we first sample $ns$ individuals. Each sampled individual consists of a $K$-dimensional vector of shocks that determine the individual's taste parameters, $v_i = (v_i^1, \ldots, v_i^K)$, and demographics, $D_i = (D_{i1}, \ldots, D_{id})$. Remember that $v$ is drawn from a normal distribution. Demographics $D$ can be drawn from the Current Population Survey (CPS) or by sampling real individuals.
28
Step 1: compute the market shares
Market shares are
$$s_j(x, p, \delta; \theta_2) = \int_{A_j} dP_\varepsilon^*(\varepsilon)\, dP_v^*(v)\, d\hat{P}_D^*(D).$$
Integral can be computed by simulation:
$$s_j(\cdot) = \frac{1}{ns}\sum_{i=1}^{ns} s_{ij} = \frac{1}{ns}\sum_{i=1}^{ns} \frac{\exp\left[\delta_j + \sum_{k=1}^{K} x_j^k(\sigma_k v_i^k + \pi_{k1}D_{i1} + \cdots + \pi_{kd}D_{id})\right]}{1 + \sum_{m=1}^{J}\exp\left[\delta_m + \sum_{k=1}^{K} x_m^k(\sigma_k v_i^k + \pi_{k1}D_{i1} + \cdots + \pi_{kd}D_{id})\right]},$$
where $(v_i^1, \ldots, v_i^K)$ and $(D_{i1}, \ldots, D_{id})$, $i = 1, \ldots, ns$, are draws from $P_v^*(v)$ and $\hat{P}_D^*(D)$, respectively.
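The simulation estimator can be sketched as follows. To keep the example short it assumes there are no demographics ($\Pi = 0$), so only the $\sigma_k v_i^k$ terms enter; the function name, the characteristics matrix and all parameter values are illustrative assumptions.

```python
import math
import random


def simulated_shares(delta, x, sigma, draws):
    """Average the individual logit probabilities s_ij over simulated consumers.

    delta : list of J mean utilities
    x     : list of J lists, each with K characteristics (price can be one of them)
    sigma : list of K standard deviations of the random coefficients
    draws : list of ns lists, each with K standard normal draws v_i
    Demographics are omitted here (Pi = 0), which is a simplifying assumption.
    """
    n_products = len(delta)
    totals = [0.0] * n_products
    for v in draws:
        mu = [sum(x[j][k] * sigma[k] * v[k] for k in range(len(sigma)))
              for j in range(n_products)]
        expu = [math.exp(delta[j] + mu[j]) for j in range(n_products)]
        denom = 1.0 + sum(expu)
        for j in range(n_products):
            totals[j] += expu[j] / denom
    return [t / len(draws) for t in totals]


random.seed(0)
K, ns = 2, 500
draws = [[random.gauss(0.0, 1.0) for _ in range(K)] for _ in range(ns)]
x = [[1.0, 0.5], [1.2, 1.5], [0.8, 1.0]]   # assumed characteristics
delta = [0.5, 0.0, -0.3]                   # assumed mean utilities
shares = simulated_shares(delta, x, sigma=[0.5, 1.0], draws=draws)
print(shares)
```

Setting every $\sigma_k$ to zero removes the heterogeneity, and the function then returns the conditional logit shares.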
29
Step 2: invert the system of equations
In the second step we want to find $\delta$ that solves $s(\delta; \theta_2) = S$. BLP prove that the following contraction mapping can be used to solve for $\delta$:
$$\delta^{h+1} = \delta^h + \ln S - \ln s(p, x, \delta^h, P_{ns}; \theta_2), \quad h = 0, \ldots, H,$$
where $H$ is the smallest integer such that $\|\delta^H - \delta^{H-1}\|$ is smaller than some tolerance level.
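A minimal sketch of the contraction mapping, under the same simplifying assumption of no demographics. The "observed" shares are generated from assumed mean utilities so that the inversion can be checked; the starting value (the logit inversion), the tolerance and all names are illustrative choices, not prescribed by the handout.

```python
import math
import random


def simulated_shares(delta, x, sigma, draws):
    """Simulated random-coefficients logit shares (no demographics, by assumption)."""
    n_products = len(delta)
    totals = [0.0] * n_products
    for v in draws:
        expu = [math.exp(delta[j] + sum(x[j][k] * sigma[k] * v[k]
                                        for k in range(len(sigma))))
                for j in range(n_products)]
        denom = 1.0 + sum(expu)
        for j in range(n_products):
            totals[j] += expu[j] / denom
    return [t / len(draws) for t in totals]


def invert_shares(observed, x, sigma, draws, tol=1e-12, max_iter=5000):
    """Iterate delta <- delta + ln S - ln s(delta) until the update is below tol."""
    outside = 1.0 - sum(observed)
    delta = [math.log(s) - math.log(outside) for s in observed]  # logit starting value
    for _ in range(max_iter):
        predicted = simulated_shares(delta, x, sigma, draws)
        new_delta = [d + math.log(S) - math.log(s)
                     for d, S, s in zip(delta, observed, predicted)]
        if max(abs(a - b) for a, b in zip(new_delta, delta)) < tol:
            return new_delta
        delta = new_delta
    raise RuntimeError("contraction mapping did not converge")


random.seed(1)
draws = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]
x = [[1.0, 0.5], [1.2, 1.5], [0.8, 1.0]]   # assumed characteristics
sigma = [0.5, 1.0]                         # assumed nonlinear parameters
delta_true = [-1.0, -1.5, -2.0]            # assumed mean utilities
observed = simulated_shares(delta_true, x, sigma, draws)
delta_hat = invert_shares(observed, x, sigma, draws)
print(delta_hat)
```

The same draws must be reused at every iteration (and at every trial value of $\theta_2$), otherwise simulation noise would move the fixed point around.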
30
Step 3: compute the error term
Once we have $\delta$ we can calculate the error term as
$$\omega_j = \delta_j(S; \theta_2) - X_1\theta_1 = \delta_j(S; \theta_2) - (x_j\beta - \alpha p_j) \equiv \xi_j,$$
where $X_1$ is a vector containing all characteristics (including price). Shows why we make the distinction between $\theta_1$ and $\theta_2$: $\theta_1$ enters linearly, while $\theta_2$ enters nonlinearly.
31
Conditional logit model
Note that the steps are straightforward when we have the conditional logit model, since we have a closed-form solution for $s_j$:
$$s_j = \frac{\exp(x_j\beta - \alpha p_j + \xi_j)}{1 + \sum_{k=1}^{J}\exp(x_k\beta - \alpha p_k + \xi_k)} = \frac{\exp(\delta_j)}{1 + \sum_{k=1}^{J}\exp(\delta_k)}.$$
Step 2 can then be done analytically. Since
$$s_0 = 1 - \sum_{j=1}^{J} s_j = \frac{1}{1 + \sum_{k=1}^{J}\exp(\delta_k)},$$
we can rewrite $s_j = \exp(\delta_j)\cdot s_0$. Using $s_j = S_j$, $\delta_j = \ln S_j - \ln S_0$. Step 3 is then $\omega_j = \ln S_j - \ln S_0 - (x_j\beta - \alpha p_j)$.
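The analytic inversion for the conditional logit can be verified with a round trip from shares to mean utilities and back. The observed shares below are assumed values for illustration.

```python
import math


def logit_delta(inside_shares):
    """Analytic inversion: delta_j = ln S_j - ln S_0."""
    outside = 1.0 - sum(inside_shares)
    return [math.log(s) - math.log(outside) for s in inside_shares]


def logit_shares(delta):
    """Conditional logit shares implied by the mean utilities delta."""
    denom = 1.0 + sum(math.exp(d) for d in delta)
    return [math.exp(d) / denom for d in delta]


observed = [0.2, 0.1, 0.3]   # assumed inside-good shares, outside share is 0.4
delta = logit_delta(observed)
print(delta)
print(logit_shares(delta))
```

No contraction mapping is needed in this case, which is why the logit specification is a convenient starting point and a natural initial value for $\delta$ in the random-coefficients model.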
32
Minimize the objective function
Once we have $\omega(\theta)$, we need to find the value of $\theta$ that minimizes the objective function $\omega(\theta)'Z\Phi^{-1}Z'\omega(\theta)$. To compute $\Phi$ we need to have a consistent estimate of $E[Z'\omega\omega'Z]$. Problem, because initially we do not have this consistent estimate. Solution is to start with $\Phi = Z'Z$, obtain an estimate $\hat\theta$, use this estimate to calculate a new weight matrix $\Phi$ from $\omega(\hat\theta)$, and then minimize $\omega(\theta)'Z\Phi^{-1}Z'\omega(\theta)$ again.
33
Minimize the objective function
To speed up the estimation we can use the first-order conditions with respect to $\theta_1$ to express $\theta_1$ as a function of $\theta_2$, i.e.,
$$\hat\theta_1 = (X_1'Z\Phi^{-1}Z'X_1)^{-1}X_1'Z\Phi^{-1}Z'\delta(\hat\theta_2).$$
As a result we can limit the nonlinear search to $\theta_2$. Procedure in a nutshell:
1. begin with some initial $\theta_2$
2. calculate $\omega$
3. calculate weight matrix and objective function
4. the optimization algorithm comes up with a new value for $\theta_2$
5. start a new iteration
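The closed-form expression for $\hat\theta_1$ used in this procedure can be checked with a short derivation. This is a sketch that holds $\theta_2$ and $\Phi$ fixed, writes $\delta$ for $\delta(\theta_2)$, assumes $\Phi$ is symmetric, and takes $X_1$ to be the matrix that stacks the product characteristics (including price) row by row:

```latex
\begin{aligned}
\omega(\theta) &= \delta - X_1\theta_1, \\
Q(\theta_1) &= (\delta - X_1\theta_1)'\, Z\Phi^{-1}Z'\, (\delta - X_1\theta_1), \\
\frac{\partial Q}{\partial \theta_1} &= -2\, X_1' Z\Phi^{-1}Z' (\delta - X_1\theta_1) = 0, \\
\hat\theta_1 &= (X_1' Z\Phi^{-1}Z' X_1)^{-1}\, X_1' Z\Phi^{-1}Z'\, \delta .
\end{aligned}
```

With $\Phi = Z'Z$ this is the two-stage least squares formula with $\delta$ as the dependent variable, so the linear parameters never have to be searched over numerically.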
34
Instruments To estimate the model we need a set of exogenous instrumental variables. Instruments need to be correlated with the endogenous variable, but not with the error term. The endogeneity problem can be treated by assuming the location of the brands in characteristics space is exogenous. Characteristics of other products will be correlated with price since the markup depends on distance from nearest neighbor, but uncorrelated with the error term by assumption. BLP use observed product characteristics, the sum of the product characteristics across own-firm products, and the sum of the characteristics across rival firm products.
35
Berry, Levinsohn, and Pakes (1995)
Classic paper in Industrial Organization. Using only market-level data in a discrete choice framework, cost and demand parameters can be estimated. Random-coefficients logit to get realistic substitution patterns. Rewrite the model in such a way that GMM can be used to control for correlation between unobserved heterogeneity and prices. Techniques are applied to the US automobile market.
36
Model
We have already discussed the demand side of the BLP model using Nevo (2000). Although not needed to estimate demand elasticities, BLP also model the supply side of the market. Pricing equation can be derived by taking the first-order conditions with respect to prices. Assume that a Nash equilibrium to the pricing game exists.
37
Demand side
Utility is given by
$$u_{ij} = \alpha\log(y_i - p_j) + x_j\bar\beta + \xi_j + \sum_k \sigma_k x_{jk} v_{ik} + \varepsilon_{ij},$$
where $v_i = (y_i, v_{i1}, v_{i2}, \ldots, v_{iK})$ is a vector of random variables with multivariate normal distribution. BLP scale $v_{ik}$ such that $E[v_{ik}^2] = 1$, so that the mean and variance of the marginal utilities of characteristic $k$ are $\bar\beta_k$ and $\sigma_k^2$.
38
Cost functions
Multi-product setting: there are $F$ firms, and firm $f$ produces a subset $F_f$ of the $J$ products. For convenience assume the marginal cost of good $j$ is given by
$$\ln(mc_j) = w_j\gamma + \omega_j,$$
where $w_j$ is a vector of observed cost components, $\omega_j$ is an unobserved component, and $\gamma$ is a vector of parameters to be estimated. Product characteristics might be part of $w_j$, just as $\omega_j$ might be correlated with $\xi_j$.
39
Cost functions
Given the demand system, profits of firm $f$ are given by
$$\Pi_f = \sum_{j \in F_f}(p_j - mc_j)\,M s_j(p, x, \xi; \theta),$$
where $M$ is the size of the market. Taking first-order conditions gives
$$s_j(p, x, \xi; \theta) + \sum_{r \in F_f}(p_r - mc_r)\frac{\partial s_r(p, x, \xi; \theta)}{\partial p_j} = 0.$$
40
Cost functions
To derive the price-cost markups $p_j - mc_j$ define the following $J$ by $J$ matrix
$$\Delta_{jr} = \begin{cases} -\dfrac{\partial s_r}{\partial p_j}, & \text{if } r \text{ and } j \text{ are produced by the same firm;} \\ 0, & \text{otherwise.} \end{cases}$$
The first-order conditions in vector notation become
$$s(p, x, \xi; \theta) - \Delta(p, x, \xi; \theta)[p - mc] = 0,$$
which can be rewritten as
$$mc = p - \Delta(p, x, \xi; \theta)^{-1}s(p, x, \xi; \theta) = p - b(p, x, \xi; \theta),$$
where $b(p, x, \xi; \theta) = \Delta(p, x, \xi; \theta)^{-1}s(p, x, \xi; \theta)$.
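The markup vector $b = \Delta^{-1}s$ can be computed for a small example. The sketch below assumes conditional logit demand (so the share derivatives have a closed form) rather than the full random-coefficients model, a hypothetical ownership structure, and made-up shares and prices; the small Gaussian-elimination helper only keeps the example free of external libraries.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - tail) / m[r][r]
    return solution


def logit_markups(shares, alpha, owner):
    """Markups b = Delta^{-1} s, with Delta_jr = -ds_r/dp_j for same-firm products."""
    n = len(shares)
    delta_matrix = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for r in range(n):
            if owner[j] != owner[r]:
                continue  # products of different firms: entry stays zero
            if j == r:
                ds_dp = -alpha * shares[r] * (1.0 - shares[r])
            else:
                ds_dp = alpha * shares[r] * shares[j]
            delta_matrix[j][r] = -ds_dp
    return solve_linear_system(delta_matrix, shares)


shares = [0.2, 0.1, 0.25]    # assumed market shares
owner = ["A", "A", "B"]      # hypothetical ownership: firm A sells products 1 and 2
prices = [1.5, 1.4, 1.6]     # assumed prices
markups = logit_markups(shares, alpha=2.0, owner=owner)
marginal_costs = [p - b for p, b in zip(prices, markups)]
print(markups, marginal_costs)
```

Under logit demand the two products of firm A get the same markup, $1/(\alpha(1 - s_1 - s_2))$, which is larger than what either product would earn if sold by a separate single-product firm, because the firm internalizes substitution between its own products.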
41
Cost functions
The cost function becomes
$$\ln(mc) = \ln(p - b(p, x, \xi; \theta)) = w\gamma + \omega.$$
As with the demand functions, use moment condition $E[\omega_j|z] = 0$ to estimate the cost function. The vector of markups $b(p, x, \xi; \theta)$ depends only on the parameters of the demand system and $p$. The vector of prices $p$ depends on $\omega$, so there will be correlation between $\xi$ and $\omega$.
42
Instruments
We now need a set of instruments for $\omega$ as well. Instruments should be mean independent of $\xi$ and $\omega$, i.e.,
$$E[\xi_j|z] = E[\omega_j|z] = E[\xi_j] = E[\omega_j] = 0.$$
In contrast to prices, the determination of product characteristics and cost shifters is not modeled, so they are treated as exogenous. BLP use own product characteristics and cost shifters and those of competitors as instruments. Intuition is that in an oligopoly model Nash markups depend on relative substitutability of products. Important to make a distinction between own and rival products.
43
Estimation
Objective is to minimize $\|G_J(\theta; s, P_{ns})\|$, where
$$G_J(\theta; s, P_{ns}) = \frac{1}{J}\sum_{j=1}^{J} H_j(z)\,T(z_j)\begin{pmatrix} \xi_j(\theta, s, P_{ns}) \\ \omega_j(\theta, s, P_{ns}) \end{pmatrix},$$
$H_j(z)$ is a matrix containing functions of $z$, and $T(z)'T(z) = \Omega(z)^{-1}$.
44
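Computation
Given $\delta$, market shares $s$ can be calculated by simulation, i.e.,
$$s_j(p, x, \xi, \theta, P_{ns}) = \frac{1}{ns}\sum_{i=1}^{ns} f_j(v_i, \delta, p, x, \theta).$$
A contraction mapping can be used to find $\delta$ that solves $s(\delta; \theta) = S$, i.e.,
$$\delta^{h+1} = \delta^h + \ln s^n - \ln s(p, x, \delta^h, P_{ns}; \theta), \quad h = 0, 1, \ldots, H,$$
where $s^n$ denotes the observed market shares $S$. Finally, $\xi_j$ is obtained by $\xi_j = \delta_j(\theta, s, P) - x_j\beta$.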
45
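Computation
Extra step: calculation of the cost-side unobservable $\omega_j$. Solving
$$\ln(p_j - b_j(p, x, \xi; \theta)) = w_j\gamma + \omega_j$$
for $\omega_j$ gives
$$\omega_j = \ln(p_j - b_j(p, x, \xi; \theta)) - w_j\gamma,$$
where $b_j(p, x, \xi; \theta)$ is the $j$-th element of $\Delta^{-1}s$. To calculate $\Delta_{jr}$ we take the derivative of $s_j = \int f_j(v, \delta, p, x, \theta)\,P_0(dv)$ with respect to $p_j$ and $p_q$, i.e.,
$$\frac{\partial s_j}{\partial p_j} = \int f_j(v, \delta, x, p, \theta)\,(1 - f_j(v, \delta, x, p, \theta))\,\frac{\partial\mu_{ij}}{\partial p_j}\,P_0(dv),$$
$$\frac{\partial s_j}{\partial p_q} = \int -f_j(v, \delta, x, p, \theta)\,f_q(v, \delta, x, p, \theta)\,\frac{\partial\mu_{iq}}{\partial p_q}\,P_0(dv).$$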
46
Data
Focus is on the US car market. Price variable is the list retail price. Sales variable is US sales. Data includes information on all models marketed between 1971 and 1990. Model/year is treated as one observation: total sample size is 2217.
47
Data
Because of computational constraints the set of characteristics that can be included is limited. Included characteristics:
• ratio of horsepower to weight (power)
• miles per dollar (fuel efficiency)
• dummy for air-conditioning (luxury)
• size (size/safety)
48
Results: Logit and IV Logit
Remember that in case of logit demand $\delta_j = x_j\beta - \alpha p_j + \xi_j = \ln(s_j) - \ln(s_0)$. To estimate this specification we can simply run an OLS regression of the difference between the log market shares of the products and the outside good on product characteristics, price, and a constant. Since we expect the error term to be correlated with prices, we can also estimate this specification using an instrumental variable technique and the instruments discussed.
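The OLS version of the logit regression can be illustrated on constructed data. The sketch assumes a specification with only a constant and price, and an unobserved characteristic $\xi_j$ that is built to be mean zero and uncorrelated with price, so OLS recovers the assumed parameters exactly; with real data $\xi_j$ and price are expected to be correlated and the instrumental variable version is needed.

```python
import math


def logit_shares(delta):
    """Conditional logit shares implied by the mean utilities delta."""
    denom = 1.0 + sum(math.exp(d) for d in delta)
    return [math.exp(d) / denom for d in delta]


def ols_with_constant(y, x):
    """OLS of y on a constant and a single regressor x; returns (constant, slope)."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


alpha_true, constant_true = 0.5, 1.0      # assumed parameters
prices = [1.0, 2.0, 3.0, 4.0]             # assumed prices
xi = [0.1, -0.1, -0.1, 0.1]               # mean zero, uncorrelated with price by construction
delta = [constant_true - alpha_true * p + e for p, e in zip(prices, xi)]
shares = logit_shares(delta)
outside_share = 1.0 - sum(shares)

# Dependent variable: ln(s_j) - ln(s_0)
dependent = [math.log(s) - math.log(outside_share) for s in shares]
constant_hat, slope_hat = ols_with_constant(dependent, prices)
alpha_hat = -slope_hat
print(constant_hat, alpha_hat)
```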
49
Results: Full model
Marginal utility for each characteristic is now different across consumers, so we estimate a mean and a variance for each attribute. Two specifications for the cost function: constant returns to scale and non-constant returns to scale by including $\ln(q)$, i.e.,
$$\ln(c_j) = w_j\gamma_w + \gamma_q\ln(q_j) + \omega_j.$$
51