Three Essays on Decision Theory


Three Essays on Decision Theory by Youichiro Higashi

Submitted in Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy

Supervised by Professor Larry Epstein

Department of Economics
The College of Arts & Sciences

University of Rochester Rochester, New York 2008


Dedication To my family


Curriculum Vitae

The author was born in Okayama, Okayama, Japan, on January 15, 1976. After finishing his primary education there, he attended Keio University and graduated with a Bachelor of Arts degree in Economics in 1998. He began graduate studies at Keio University and received a Master of Arts degree in Economics in 2000. He came to the University of Rochester in the fall of 2002. He received the Yasuo Uekawa Memorial Fellowship from 2002 to 2003 and a Graduate Fellowship from 2003 to 2007. Focusing his research on Decision Theory, he received a Master of Arts degree from the University of Rochester in 2004.


Acknowledgments

Most importantly, I would like to thank my advisor, Professor Larry Epstein. I learned a lot from him about how to formulate research ideas and how to express them precisely. He patiently encouraged me to find research problems, and generously shared his time and energy to discuss ideas. My deepest thanks go to him for all his help and advice. I am also grateful to Árpád Ábrahám, Sophie Bade, Paulo Barelli, Larry Blume, Kazuya Hyogo, Atsushi Kaji, Barton Lipman, Tomoyuki Nakajima, Yutaka Nakamura, Jawwad Noor, John Quiggin, Norio Takeoka, Jean-Marc Tallon, and Katsutoshi Wakai for helpful conversations and comments on some chapters of the thesis. Chapter 1 was presented at the 2006 Spring Meeting of the Japanese Economic Association (Fukushima University), RUD 2006 (Paris VI), Keio University, Kyoto University, and Hitotsubashi University. Chapter 2 was presented at the 2005 Annual Meeting of the Japanese Economic Association (Kyoto Sangyo University), the Canadian Economic Theory Conference (University of Toronto), RUD 2006 (Paris VI), Keio University, Kobe University, Kyoto University, Osaka University, and Shiga University. I thank my coauthors, Kazuya Hyogo and Norio Takeoka; I always benefited from conversations and discussions with them. Financial support from the Department of Economics, especially the Yasuo Uekawa Memorial Fellowship and a Graduate Fellowship, is greatly appreciated. Finally, I would like to thank my family, my parents Hideo and Hiroko, and my brother for their support, understanding, and love.


Abstract

The standard Savage approach models uncertainty using a primitive state space. This approach is problematic because it presumes that the modeler can observe what kind of uncertainty an agent perceives in her mind: the state space should be derived rather than assumed as a primitive. Kreps (1979, 1992), Nehring (1999), and Dekel, Lipman and Rustichini (2001) (henceforth DLR) show how a subjective state space may be derived from preference defined on a suitable domain, and therefore from, in principle, observable behavior. I extend DLR in two directions. Chapter 1 provides a lexicographic expected utility model with endogenous states. It therefore generalizes both Blume, Brandenburger, and Dekel (1991), who present a lexicographic model with exogenous states, and DLR, who present an expected utility model with endogenous states. I interpret the representation as modeling an agent who has several “hypotheses” about her state space, and who views some as “infinitely less relevant” than others. In Chapter 2, I formulate an infinite horizon extension of DLR. An axiomatic foundation is provided for the random discounting model, in which an agent acts as if she believes that her discount factors change randomly over time. I demonstrate that there exists behavior which can be interpreted as reflecting uncertainty about future discount factors alone, and this subjective uncertainty is uniquely pinned down by behavior. Another critique of Savage’s subjective expected utility theory is due to Ellsberg (1961): in the Ellsberg Paradox, behavior interpreted as aversion to ambiguity is inconsistent with Savage’s theory. Chapter 3 examines the value of

information when preference conforms to the multiple priors utility model (Gilboa and Schmeidler (1989)) and the signal is ambiguous (Epstein and Schneider (2007)). In a Bayesian model, it is well known that the value of information may fail to be concave as a function of the amount of information, which can lead to nonexistence of equilibria and other modeling difficulties. I show that introducing ambiguity into a signal does not necessarily cause nonconcavity of the value of information, and I identify what type of information quality choice leads to a nonconcave value of information.


Contents

Dedication
Curriculum Vitae
Acknowledgments
Abstract
Contents

Foreword

1 Lexicographic Expected Utility with a Subjective State Space
  1.1 Introduction
  1.2 The DLR Model
  1.3 Motivating Example
  1.4 Lexicographic Representation
  1.5 Proofs
    1.5.1 Proof of Theorem 1.4
    1.5.2 Proof of Corollary 1.5
    1.5.3 Proof of Theorem 1.6
    1.5.4 Proof of Corollary 1.7

2 Subjective Random Discounting and Intertemporal Choice
  2.1 Introduction
    2.1.1 Objective and Outline
    2.1.2 Motivating Example
    2.1.3 Related Literature
  2.2 Model
    2.2.1 Domain
    2.2.2 Random Discounting Representations
  2.3 Foundations
    2.3.1 Axioms
    2.3.2 Representation Results
    2.3.3 Special Case: Deterministic Discounting
    2.3.4 Proof Sketch for Sufficiency of Theorem 2.1
  2.4 Greater Demand for Flexibility and Greater Uncertainty
  2.5 Applications of the Random Discounting Model
    2.5.1 Consumption-Saving Decision under Random Discounting
    2.5.2 Optimal Stopping Problem
  2.6 Conclusion
  2.7 Proofs
    2.7.1 Hausdorff Metric
    2.7.2 Perfect Commitment Menus
    2.7.3 Preliminaries
    2.7.4 Proof of Theorem 2.1
    2.7.5 Proof of Theorem 2.2
    2.7.6 Proof of Corollary 2.3
    2.7.7 Proof of Proposition 2.1
    2.7.8 Proof of Theorem 2.4
    2.7.9 Proof of Theorem 2.5
    2.7.10 Proof of Proposition 2.2
    2.7.11 Proof of Proposition 2.3
    2.7.12 Proof of Proposition 2.4
    2.7.13 Proof of Proposition 2.5

3 A Note on the Value of Information with an Ambiguous Signal
  3.1 Introduction
  3.2 The Model
    3.2.1 Model 1
    3.2.2 Model 2
    3.2.3 Model 3
    3.2.4 Model 4
  3.3 Proofs
    3.3.1 The Detailed Algebra for Model 1
    3.3.2 Proof of Proposition 3.1
    3.3.3 Proof of Proposition 3.2
    3.3.4 Proof of Proposition 3.3
    3.3.5 Proof of Proposition 3.4

Bibliography


Foreword

Chapter 1 is based on the paper entitled “Lexicographic Expected Utility with a Subjective State Space,” which is a collaboration with Kazuya Hyogo. My own contribution to this work consists mainly of Section 1.2 (The DLR Model), Section 1.3 (Motivating Example), and Section 1.4 (Lexicographic Representation) except Theorem 1.4. Chapter 2 is based on the paper entitled “Subjective Random Discounting and Intertemporal Choice,” which is a collaboration with Kazuya Hyogo and Norio Takeoka. My own contribution to this work consists mainly of Section 2.1 (Introduction, Motivating Example, Related Literature), Section 2.2 (Model) except Proposition 2.6, Section 2.3 (Foundations) except Proposition 2.1, and Section 2.5.2 (Applications, Optimal Stopping Problem).


Chapter 1

Lexicographic Expected Utility with a Subjective State Space

1.1 Introduction

In the theory of subjective probability, Savage derives a unique probability over objective states from preference. In the Anscombe–Aumann framework, Blume, Brandenburger, and Dekel [7] (henceforth BBD) develop a non-Archimedean subjective probability model. Their representation has a lexicographic hierarchy of subjective probabilities over objective states. A common restrictive feature of these subjective probability models is the exogenous state space. Kreps [36], [37] shows how subjective uncertainty can be revealed by the ranking of menus of alternatives. Building on this, Dekel, Lipman, and Rustichini [12] (henceforth DLR) endogenize the state space in an Archimedean framework. DLR take preference over menus of lotteries as a primitive and derive a unique subjective state space, corresponding to possible future preferences over lotteries. In this chapter, we consider a non-Archimedean model with subjective states. Our model is related to BBD in the same way that DLR is related to Savage.

As in DLR, this chapter considers preference over menus of lotteries. By weakening their Continuity axiom, we provide a lexicographic representation (S, U, {µ_k}_{k=1}^{K}): a tuple consisting of a nonempty finite state space S, a state dependent utility function U : ∆(B) × S → R, and a hierarchy {µ_k}_{k=1}^{K} of (signed) measures such that, for all menus x and y,

x ≽ y ⇔ ( Σ_{s∈S_k} µ_k(s) max_{β∈x} U(β, s) )_{k=1}^{K} ≥_L ( Σ_{s∈S_k} µ_k(s) max_{β∈y} U(β, s) )_{k=1}^{K};

here ∆(B) is the set of lotteries over a finite set of prizes B, and ≥_L compares each level of the hierarchy lexicographically. In the special case of our model where flexibility is valued, all the µ_k's are positive measures.

The interpretation of the representation above is as follows. The agent anticipates that, after a state in S is realized, she will choose the best lottery out of the menu. The difference from DLR is how she perceives subjective contingencies ex ante. That is, in her mind, the agent has multiple hypotheses about subjective states. The measure µ_1 indicates her primary hypothesis about subjective states. She has a secondary hypothesis µ_2: if two menus are indifferent according to her primary hypothesis, she uses the secondary hypothesis to compare them. She has a tertiary hypothesis, represented by µ_3, and so on. Since µ_k matters for the ranking of two menus only if those menus are indifferent according to µ_1, · · · , µ_{k−1}, we interpret the hypothesis µ_k as being “infinitely less relevant” than µ_1, · · · , µ_{k−1}.
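The lexicographic comparison underlying this representation can be sketched in code. The following is an illustrative sketch only, not part of the formal model: `lex_value` computes a menu's utility vector level by level under an assumed hierarchy of weighting measures, and `lex_geq` implements the order ≥_L; the two-state numerical example is hypothetical.

```python
def lex_value(menu, hierarchy):
    """Utility vector of a menu: level k gives sum_s mu_k(s) * max_{beta in menu} U(beta, s)."""
    return tuple(sum(mu[s] * max(U(beta, s) for beta in menu) for s in mu)
                 for mu, U in hierarchy)

def lex_geq(a, b):
    """a >=_L b: whenever b[k] > a[k], there is an earlier level j with a[j] > b[j]."""
    for ak, bk in zip(a, b):
        if ak > bk:
            return True
        if ak < bk:
            return False
    return True  # all levels tie

# Hypothetical hierarchy: mu_1 weights state 0 only; mu_2 weights state 1 only.
hierarchy = [({0: 1.0}, lambda beta, s: beta[s]),
             ({1: 1.0}, lambda beta, s: beta[s])]
x = [(0.5, 0.5)]
y = [(0.5, 0.0)]
print(lex_value(x, hierarchy), lex_value(y, hierarchy))  # (0.5, 0.5) (0.5, 0.0)
print(lex_geq(lex_value(x, hierarchy), lex_value(y, hierarchy)))  # True
```

Here the two menus tie at the primary level, so the ranking is decided by the secondary, “infinitely less relevant” level.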

For uniqueness of the representation, the relevant part is the ex post preference ≽*_s over ∆(B) determined by each s ∈ S. Under suitable conditions, we show uniqueness of the hierarchy of (incomplete) subjective state spaces { ≽*_s | s ∈ ∪_{j=1}^{k} supp(µ_j) }_{k=1}^{K}. The organization of this chapter is as follows. Section 1.2 introduces the DLR model. In Section 1.3, we provide a motivating example showing that Continuity is not always compelling. Section 1.4 states the main results. All proofs are collected in Section 1.5.

1.2 The DLR Model

DLR include the following primitives:

• B: a finite set of prizes; let |B| = B.
• ∆(B): the set of probability measures over B; it is compact metric under the weak convergence topology; a generic element is denoted β and referred to as a lottery.
• X: the set of closed nonempty subsets of ∆(B), endowed with the Hausdorff topology; a generic element is denoted x and called a menu.¹
• A preference ≽ defined on X.

The interpretation is as follows. At time 0 (ex ante), the agent chooses a menu according to ≽. At time 1 (ex post), a subjective state is realized, and then she chooses a lottery out of the previously chosen menu. Note that the ex post stage is not a primitive of the formal model. However, since the agent is forward looking, her ex ante choice of menus reflects her subjective perception of states. Therefore the preference ≽ over menus reveals a subjective state space.

¹ DLR do not restrict menus to be closed. If any subset is allowed to be a menu, then the definition of a critical set must be modified slightly; under that modification, all results remain the same.

The following are the main axioms in DLR.

Axiom 1.1 (Order). ≽ is complete and transitive.

We define the mixture of two menus, for a number λ ∈ [0, 1], by

λx + (1 − λ)x′ = { λβ + (1 − λ)β′ | β ∈ x, β′ ∈ x′ }.

The following is a version of the Independence axiom adapted to a model with preference over menus.

Axiom 1.2 (Independence). For all x, y, z ∈ X and λ ∈ (0, 1), x ≽ y ⇔ λx + (1 − λ)z ≽ λy + (1 − λ)z.

Axiom 1.3 (Nontriviality). There exist x and x′ such that x ≻ x′.

Axiom 1.4 (Continuity). For every menu x, the sets {x′ ∈ X | x′ ≽ x} and {x′ ∈ X | x ≽ x′} are closed.

The next axiom is introduced by Dekel, Lipman, and Rustichini [13] (henceforth DLR2) to ensure, together with the other axioms, the finiteness of the state space. Let conv(x) denote the convex hull of x.

Definition 1.1. A set x′ ⊂ conv(x) is critical for x if for all menus y with x′ ⊂ conv(y) ⊂ conv(x), we have y ∼ x.
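For finite menus, the mixture λx + (1 − λ)x′ defined above is a pointwise mixture over all pairs of lotteries. A minimal sketch, with hypothetical menus over three prizes:

```python
from itertools import product

def mix_menus(lam, x, xp):
    """lam*x + (1-lam)*x' = {lam*beta + (1-lam)*beta' : beta in x, beta' in x'}."""
    return {tuple(lam * bi + (1 - lam) * bpi for bi, bpi in zip(beta, betap))
            for beta, betap in product(x, xp)}

x  = {(1.0, 0.0, 0.0)}                   # commit to prize 1
xp = {(0.0, 1.0, 0.0), (0.0, 0.0, 1.0)}  # keep the choice between prizes 2 and 3
print(sorted(mix_menus(0.5, x, xp)))
# [(0.5, 0.0, 0.5), (0.5, 0.5, 0.0)]
```

Mixing a singleton with a two-element menu yields a two-element menu, one mixture per pair.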

Axiom 1.5 (Finiteness). Every menu has a finite critical subset.

The intuition is that when the agent faces a menu and contemplates future contingencies, she cares about only finitely many possibilities. Note that the set of states she cares about could depend on the menu, so this axiom does not by itself imply finiteness of the subjective state space.

Now we explain a finite state space version of DLR's model. Let S be a state space. A function U : ∆(B) × S → R is a state dependent utility function if U(β, s) has an expected utility form, that is, for β ∈ ∆(B),

U(β, s) = Σ_{b∈B} β(b) U(b, s).

Consider the functional form W : X → R defined by

W(x) = Σ_{s∈S} µ(s) max_{β∈x} U(β, s),     (1.1)

where µ is a measure on S. Note that S is just an index set, though we call it the state space. Given the pair (S, U), define the ex post preference ≽*_s over ∆(B) by

β ≽*_s β′ ⇔ U(β, s) ≥ U(β′, s),

and let P(S, U) = { ≽*_s | s ∈ S }. Following DLR, we refer to the set of ex post preferences P(S, U) as the subjective state space. In general, there are many functional forms (1.1) which represent the same preference on X. In order to obtain the uniqueness property, DLR concentrate

on “relevant” subjective states: given a representation of the form (1.1), a state s is relevant if there exist menus x and y such that x ≁ y and such that, for every s′ ≠ s, max_{β∈x} U(β, s′) = max_{β∈y} U(β, s′).

Definition 1.2. A finite additive representation (S, U, µ) is a tuple consisting of a nonempty finite state space S, a state dependent utility function U : ∆(B) × S → R, and a measure µ such that (i) ≽ is represented by the functional form W : X → R, (ii) every state s ∈ S is relevant, and (iii) if s ≠ s′, then ≽*_s ≠ ≽*_{s′}.

DLR and DLR2 prove:

Theorem 1.1. ≽ satisfies Order, Independence, Nontriviality, Continuity, and Finiteness if and only if it has a finite additive representation.

Corollary 1.2. Suppose ≽ has a finite additive representation. Then all finite additive representations of ≽ have the same subjective state space.

Axiom 1.6 (Monotonicity). If x ⊂ x′, then x′ ≽ x.

Monotonicity states that the agent values the flexibility of having more options. The consequence of Monotonicity is the following.

Corollary 1.3. ≽ satisfies Monotonicity and the axioms in Theorem 1.1 if and only if it has a finite additive representation with a positive measure µ.
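As an illustration only (not part of DLR's formal development), the functional form (1.1) can be evaluated directly for a finite menu; the two-state utilities and weights below are hypothetical.

```python
def W(menu, states, U, mu):
    """Finite additive value: W(x) = sum_s mu(s) * max_{beta in x} U(beta, s)."""
    return sum(mu[s] * max(U(beta, s) for beta in menu) for s in states)

states = [0, 1]
U = lambda beta, s: beta[s]      # expected utility: prize b pays 1 in state b
mu = {0: 0.6, 1: 0.4}            # a positive measure, as under Monotonicity
committed = [(1.0, 0.0)]
flexible  = [(1.0, 0.0), (0.0, 1.0)]
print(W(committed, states, U, mu))  # 0.6
print(W(flexible, states, U, mu))   # 1.0: the larger menu is weakly preferred
```

With a positive measure, enlarging a menu can only raise each state's maximum, so W is monotone in set inclusion, matching Corollary 1.3.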

1.3 Motivating Example

In this section, we argue that the axiom Continuity is not always compelling.

The intuition against Continuity is as follows. Suppose that a menu x is strictly preferred to a menu x′. Consider an agent who perceives some subjective contingencies and who has, in her mind, several hypotheses about these contingencies. Think of a hypothesis as a (signed) measure over contingencies that is used to weight the valuation of outcomes across states.² She may view one hypothesis as “infinitely less relevant” than another; think of this as being captured by a hierarchy of hypotheses. Then there is a critical level k* such that x and x′ are indifferent according to each hypothesis at level k less than k*, but x is strictly better than x′ according to the hypothesis at level k*. Now consider a “small” variation of x, denoted x_ǫ. Then she should rank x_ǫ strictly better than x′ using only the contingencies derived by the hypothesis at level k*. However, the critical level for comparing x′ and x_ǫ may differ from k*; x′ could be better than x_ǫ according to the hypothesis at the new critical level. Therefore a small deviation might change the ranking between the menus. The following example illustrates this intuition.

² As explained later, a hypothesis in the formal model does not correspond to beliefs about states, and thus we refer instead to “weights”.

Example: Consider an agent who used to like peanut butter very much, but who now has a peanut allergy. Moreover, when she chooses an orange, she will pick the one which is more likely to be sweet. There are three alternatives: the first one is an orange o_ǫ which turns out to be sweet with probability 0.9 + ǫ and sour with probability 0.1 − ǫ; the second one is an orange o which turns out to be sweet with probability 0.9 and sour with probability 0.1;

the last one is bread with peanut butter, denoted p. Then she may have the following ranking: for every ǫ ∈ (0, 0.1],

{o_ǫ} ≻ {o, p} ≻ {o}.     (1.2)

The intuition is that she has two hypotheses about her allergy: the first is that the allergy continues, and the second is that the allergy disappears. However, she thinks that it is infinitely less relevant to take into account the possibility that her allergy disappears; that is, she ranks the two hypotheses hierarchically in her mind. First, consider the first and second menus. Since the flexibility provided by bread with peanut butter is irrelevant under the primary hypothesis, the ranking of the first and second menus follows the taste of the orange; hence the agent prefers the first menu to the second. Next, consider the second and third menus. At first, the agent uses the primary hypothesis to rank them. Since the two menus are indifferent under the primary hypothesis, the ranking under the secondary hypothesis becomes relevant; thus she wants to retain the opportunity to have peanut butter, and she prefers the second menu to the third. Ranking (1.2) violates Continuity.
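Ranking (1.2) can be checked numerically. In the sketch below, each hypothesis is collapsed to a single contingency with a hypothetical utility scale (the sweetness probability for oranges; 0 or 1 for peanut butter, depending on the allergy); Python's tuple comparison is lexicographic, which mirrors the strict part of ≥_L here.

```python
def value(menu, hypotheses):
    """Score a menu level by level: the best alternative under each hypothesis."""
    return tuple(max(u[alt] for alt in menu) for u in hypotheses)

eps = 0.05
u_allergic = {"o_eps": 0.9 + eps, "o": 0.9, "p": 0.0}  # primary: allergy continues
u_cured    = {"o_eps": 0.9 + eps, "o": 0.9, "p": 1.0}  # secondary: allergy disappears
hyps = [u_allergic, u_cured]

v1 = value({"o_eps"}, hyps)   # ~(0.95, 0.95): decided at the primary level
v2 = value({"o", "p"}, hyps)  # (0.9, 1.0): tie at level 1 is broken at level 2
v3 = value({"o"}, hyps)       # (0.9, 0.9)
print(v1 > v2 > v3)           # True: reproduces {o_eps} > {o, p} > {o}
```

As ǫ shrinks, {o_ǫ} converges to {o}, yet {o_ǫ} stays strictly above {o, p} while {o} falls strictly below it, which is exactly the failure of Continuity.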

1.4 Lexicographic Representation

In the previous section, the difficulty with Continuity arises from the strict preference relation. We therefore impose “continuity” only on indifference sets.

Axiom 1.7 (Indifference Continuity). For every menu x, the indifference set {x′ ∈ X | x′ ∼ x} is closed.

There is no corresponding axiom in BBD. The reason is that BBD assume that the state space is exogenous and finite; in our model, the state space is derived endogenously from preference.

Since we weaken Continuity, a numerical representation is not always possible. We consider a lexicographic representation that compares vectors of utilities assigned to menus by ≥_L.³ More formally, let S and U : ∆(B) × S → R be a state space and a state dependent utility function. Consider the vector-valued function V : X → R^K defined by

V(x) = ( Σ_{s∈S} µ_k(s) max_{β∈x} U(β, s) )_{k=1}^{K},     (1.3)

where {µ_k}_{k=1}^{K} is a hierarchy of measures. This vector-valued function is the counterpart of the DLR functional form (1.1). We also need a counterpart of “relevance”: given a representation of the form (1.3), a state s is relevant if there exist menus x and y such that x ≁ y and such that, for every s′ ≠ s, max_{β∈x} U(β, s′) = max_{β∈y} U(β, s′).

Definition 1.3. A lexicographic representation (S, U, {µ_k}_{k=1}^{K}) is a tuple consisting of a nonempty finite state space S, a state dependent utility function U : ∆(B) × S → R, and a hierarchy {µ_k}_{k=1}^{K} of measures such that

(i) x ≽ y ⇔ ( Σ_{s∈S} µ_k(s) max_{β∈x} U(β, s) )_{k=1}^{K} ≥_L ( Σ_{s∈S} µ_k(s) max_{β∈y} U(β, s) )_{k=1}^{K},
(ii) every state s ∈ S is relevant, and
(iii) if s ≠ s′, then ≽*_s ≠ ≽*_{s′}.

³ For a, b ∈ R^K, a ≥_L b if and only if whenever b_k > a_k, there is a j < k such that a_j > b_j.

The integer K is referred to as the length (of the hierarchy). Now we state our main result:

Theorem 1.4. ≽ satisfies Order, Independence, Nontriviality, Indifference Continuity, and Finiteness if and only if it has a lexicographic representation.

For interpretation, note that the ex post behavior is as in DLR: a state s in S will be realized at the beginning of time 1, and the agent will then choose the best alternative out of the previously chosen menu according to the ex post utility function U(·, s). Moreover, she anticipates this ex post behavior at time 0. The difference from DLR is how she perceives subjective contingencies ex ante. The agent has a hierarchy of measures in her mind. Each level of the hierarchy represents a hypothesis about how she should allow for future contingencies ex ante. The measure µ_1 indicates her primary hypothesis. She has a secondary hypothesis, represented by µ_2: if menus are indifferent according to her primary hypothesis, she compares them according to her secondary hypothesis. She has a tertiary hypothesis, represented by µ_3. And so on. Since µ_k enters into the ranking of two menus x and y only if x and y are indifferent according to µ_1, · · · , µ_{k−1}, the measure µ_k is relevant but may be thought of as being “infinitely less relevant” than µ_1, · · · , µ_{k−1}.

To illustrate further the meaning of “µ_k is infinitely less relevant than µ_{k−1}”, consider the special case where there is no overlap among the supports of the µ_k's. Suppose that s_{k−1} and s_k belong to supp(µ_{k−1}) and supp(µ_k) respectively. Consider two menus x and y such that the agent expects the same ex post utilities at all states except s_{k−1} and s_k. Then the ex post ranking between x and y at s_{k−1} determines the ex ante ranking regardless of the ex post ranking

at s_k. This leads us to say “s_k is infinitely less relevant than s_{k−1}”. In the Archimedean case, as in DLR, every state is either relevant or not. Our model admits a richer comparison between subjective states: there may be a state which is relevant but infinitely less relevant than another state.

Uniqueness of the representation does not hold in general. First, the µ_k's are not uniquely determined by preference, just as in DLR. Secondly, there may be redundancies in the hierarchy [7, p. 66]. To express the uniqueness properties of our representation, define, for each k = 1, . . . , K,

P_k(S, U, {µ_k}_{k=1}^{K}) = { ≽*_s | s ∈ ∪_{j=1}^{k} supp(µ_j) } ⊂ P(S, U).

Following DLR, we can think of {P_k(S, U, {µ_k}_{k=1}^{K})}_{k=1}^{K} as a hierarchy of (incomplete) subjective state spaces. Note that there is a lexicographic representation with minimal length K, denoted K*. To avoid redundancies, we concentrate on lexicographic representations of the minimal length K*.

Corollary 1.5. Suppose that ≽ admits a lexicographic representation. Let (S, U, {µ_k}_{k=1}^{K*}) and (S′, U′, {µ′_k}_{k=1}^{K*}) be lexicographic representations of ≽ with the minimal length K*. Then, for k = 1, · · · , K*,

P_k(S, U, {µ_k}_{k=1}^{K*}) = P_k(S′, U′, {µ′_k}_{k=1}^{K*}).

The next axiom marks the difference between our model and DLR's finite additive representation.

Axiom 1.8 (Upper Semicontinuity). For every menu x, the upper contour set {x′ ∈ X | x′ ≽ x} is closed.

Theorem 1.6. ≽ has a lexicographic representation and satisfies Upper Semicontinuity if and only if it has a finite additive representation (as in DLR).

Since the axiom Indifference Continuity is uncommon, and perhaps original to this paper, we verify next that it is critical in Theorem 1.4, that is, that it is not implied by the other axioms. Let u, v : ∆(B) → R be continuous linear nonconstant functions. Consider an order ≽_nsc over menus represented by the following functional form U^nsc : X → R proposed in Gul and Pesendorfer [26].⁴

Temptation without self-control: U^nsc(x) = max_{β∈x} u(β) subject to v(β) ≥ v(β′) for every β′ ∈ x.

It is easy to check that ≽_nsc satisfies Order, Independence, Nontriviality, and Finiteness. However, in general, it violates Indifference Continuity. Let B = {b_1, b_2, b_3}, u(β) = β_2, and v(β) = β_1, where β_i = β(b_i). Let β̄, β*, β^n be lotteries such that

β̄ = (β̄_1, β̄_2, β̄_3), β* = (β̄_1, 1 − β̄_1, 0), β^n = (β̄_1 − ǫ^n, 1 − β̄_1 + ǫ^n, 0) for n ≥ 1,

where β̄_1, β̄_2, β̄_3 > 0 and β̄_1 > ǫ > 0. The sequence of menus {β̄, β^n}_{n=1}^{∞} converges to the menu {β̄, β*} in the Hausdorff topology, and {β̄, β^n} ∼ {β̄} for every n. However, we have {β̄, β*} ≻ {β̄}, contradicting Indifference Continuity.⁵

⁴ The acronym nsc means no self-control.
⁵ While ≽_nsc may violate Indifference Continuity, it satisfies Upper Semicontinuity (see [26]).
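The discontinuity in this example can be verified numerically. A minimal sketch with a hypothetical β̄ = (0.5, 0.2, 0.3) and ǫ = 0.1: the chosen lottery must maximize v (the first coordinate), and the menu's value is the best u-value (the second coordinate) among the v-maximizers.

```python
def U_nsc(menu):
    """Temptation without self-control: max u among the v-maximizers,
    where v(beta) = beta[0] and u(beta) = beta[1]."""
    vmax = max(beta[0] for beta in menu)
    return max(beta[1] for beta in menu if beta[0] == vmax)

bbar  = (0.5, 0.2, 0.3)
bstar = (0.5, 0.5, 0.0)        # ties bbar on v, beats it on u
print(U_nsc({bbar, bstar}))    # 0.5: the limit menu {bbar, bstar} beats {bbar}
for n in range(1, 4):
    bn = (0.5 - 0.1 ** n, 0.5 + 0.1 ** n, 0.0)
    print(U_nsc({bbar, bn}))   # 0.2 each time: {bbar, bn} ~ {bbar}
```

Along the sequence, β^n loses the v-contest to β̄, so adding it changes nothing; in the limit the v-tie lets the tempting alternative win, so the indifference set is not closed.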

Finally, if we add Monotonicity to the axioms in Theorem 1.4, then all measures are positive:

Corollary 1.7. ≽ satisfies Monotonicity and the axioms in Theorem 1.4 if and only if it has a lexicographic representation in which all measures are positive.

1.5 Proofs

1.5.1 Proof of Theorem 1.4

The necessity of the axioms is easily verified. We show only the sufficiency part.

To begin with, note that Finiteness implies that x ∼ conv(x) for every menu x. In fact, by Finiteness, every menu x has a critical subset x′. Moreover, x′ ⊂ conv(conv(x)) = conv(x). It follows from the definition of a critical set that x ∼ conv(x). Thus, we can restrict attention to the set of closed, convex, nonempty subsets of ∆(B), denoted X*.

Let S^B = { s ∈ R^B | Σ_i s_i = 0 and Σ_i s_i² = 1 }. For x ∈ X*, define a function σ_x : S^B → R by

σ_x(s) = max_{β∈x} β · s.

The following is Lemma 5 in DLR2 [13]:

Lemma 1.1. There is a finite subset S* ⊂ S^B such that for every x, y ∈ X*, [∀s ∈ S*, σ_x(s) = σ_y(s)] ⇒ x ∼ y.
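The support function σ_x is straightforward to compute for a finite menu; the direction below is a hypothetical element of S^B (coordinates summing to 0 with unit Euclidean length), chosen only for illustration.

```python
import math

def support(menu, s):
    """sigma_x(s) = max_{beta in x} beta . s."""
    return max(sum(bi * si for bi, si in zip(beta, s)) for beta in menu)

s = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
assert abs(sum(s)) < 1e-12 and abs(sum(si * si for si in s) - 1.0) < 1e-12  # s in S^B

menu = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]  # the vertices of Delta(B)
print(support(menu, s))  # 1/sqrt(2), achieved at the first vertex
```

Evaluating σ_x on the finitely many directions in S* yields the embedding of X* into R^m used in the rest of the proof.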

Proof. In DLR2's argument, Continuity is required only to prove their Lemma 3; in fact, Indifference Continuity is enough to show the result. See [13, pp. 22–27].

Note that S* is nonempty by Nontriviality. Let m (≥ 1) be the cardinality of S*. An implication of the above lemma is that we can embed X* into an m-dimensional vector space. We denote

M = { (σ_x(s))_{s∈S*} | x ∈ X* } ⊂ R^m.

It is a closed convex subset of R^m and contains the 0 vector. Moreover, it is a mixture space in the sense of Hausner [28]. In the following, we identify σ_x with the corresponding element of M. The preference ≽ on X* induces ≽* on the mixture space M: σ_x ≽* σ_y if and only if x ≽ y. Then, since ≽ satisfies Order and Independence, ≽* also satisfies Order and Independence. The following lemma directly follows from the result of Hausner [28].

Lemma 1.2. (i) ≽* satisfies Order and Independence if and only if there are K (≤ m) affine functions V_k : M → R such that

σ_x ≽* σ_y ⇔ (V_k(σ_x))_{k=1}^{K} ≥_L (V_k(σ_y))_{k=1}^{K}.

(ii) Moreover, there is a minimal such K, denoted K*, less than or equal to m. {V′_k}_{k=1}^{K*} satisfy the above representation in place of {V_k}_{k=1}^{K*} if and only if there are real numbers a_k > 0, b_{kj}, and c_k such that, for every σ ∈ M,

V′_k(σ) = a_k V_k(σ) + Σ_{j=1}^{k−1} b_{kj} V_j(σ) + c_k.

Now, follow the argument in DLR2 (see [13, p. 27]). Then V_k extends to a continuous linear function on R^m. Moreover, there exists a vector (µ_k(s))_{s∈S*} ∈ R^m such that

V_k(M) = Σ_{s∈S*} µ_k(s) M_s for every M ∈ M.

Define a state dependent utility function U : ∆(B) × S* → R by U(β, s) = β · s. By construction, every s ∈ S* is relevant and ≽*_s ≠ ≽*_{s′} if s ≠ s′. Thus we have the desired representation (S*, U, {µ_k}_{k=1}^{K}).

1.5.2 Proof of Corollary 1.5

Definition 1.4. A lexicographic representation (S̄, Ū, {µ̄_k}_{k=1}^{K*}) is canonical if it satisfies S̄ ⊂ S^B and Ū(β, s̄) = β · s̄ for every β ∈ ∆(B).

Lemma 1.3. For every (S, U, {µ_k}_{k=1}^{K*}), there exists a canonical representation (S̄, Ū, {µ̄_k}_{k=1}^{K*}) such that (i) (S, U, {µ_k}_{k=1}^{K*}) and (S̄, Ū, {µ̄_k}_{k=1}^{K*}) represent the same preference, and (ii) for every k = 1, · · · , K*,

P_k(S, U, {µ_k}_{k=1}^{K*}) = P_k(S̄, Ū, {µ̄_k}_{k=1}^{K*}).

Proof. Let

c_s = (1/B) Σ_{b∈B} U(b, s) and c_k = Σ_{s∈S} µ_k(s) c_s.

Note that U(·, s) cannot be a constant function for any s ∈ S since every s ∈ S is relevant. Thus Σ_{b∈B} (U(b, s) − c_s)² has to be strictly positive. Define a function φ : S → R^B by

φ(s) = ( (U(b, s) − c_s) / √( Σ_{b′∈B} (U(b′, s) − c_s)² ) )_{b∈B}.

It is straightforward that φ is one-to-one and that φ(s) belongs to S^B for every s ∈ S. Let S̄ = φ(S) ⊂ S^B and Ū(β, s̄) = β · s̄ for every s̄ ∈ S̄. Define a measure µ̄_k over S̄ by

µ̄_k(φ(s)) = µ_k(s) √( Σ_{b∈B} (U(b, s) − c_s)² ).

By definition, ≽*_s = ≽*_{φ(s)} and φ(s) ∈ supp(µ̄_k) ⇔ s ∈ supp(µ_k). Hence, for every k = 1, · · · , K*, P_k(S, U, {µ_k}_{k=1}^{K*}) = P_k(S̄, Ū, {µ̄_k}_{k=1}^{K*}). Moreover, it holds that

Σ_{φ(s)∈S̄} µ̄_k(φ(s)) max_{β∈x} Ū(β, φ(s)) = { Σ_{s∈S} µ_k(s) max_{β∈x} U(β, s) } − c_k.

Part (ii) of Lemma 1.2 then implies that (S, U, {µ_k}_{k=1}^{K*}) and (S̄, Ū, {µ̄_k}_{k=1}^{K*}) represent the same preference.
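The normalization φ in Lemma 1.3 (demean each state's prize utilities, then scale to unit Euclidean length) can be sketched as follows; the utility vector (3, 1, 2) is hypothetical.

```python
import math

def canonicalize(U_row):
    """phi(s): subtract the mean c_s, divide by the Euclidean norm of the deviations.
    Returns the resulting point of S^B and the norm, which rescales mu_k."""
    c = sum(U_row) / len(U_row)
    norm = math.sqrt(sum((u - c) ** 2 for u in U_row))
    return tuple((u - c) / norm for u in U_row), norm

phi, scale = canonicalize((3.0, 1.0, 2.0))
assert abs(sum(phi)) < 1e-12                        # coordinates sum to 0
assert abs(sum(p * p for p in phi) - 1.0) < 1e-12   # unit length, so phi(s) is in S^B
print(phi, scale)  # approximately (1/sqrt(2), -1/sqrt(2), 0.0) and sqrt(2)
```

Since β · φ(s) = (U(β, s) − c_s)/norm, weighting φ(s) by µ_k(s) · norm reproduces each menu's level-k value up to the constant c_k, which is exactly the identity displayed in the proof.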

Lemma 1.4. Let (S, U, {µ_k}_{k=1}^{K*}) and (S′, U′, {µ′_k}_{k=1}^{K*}) be canonical lexicographic representations of ≽ with the minimal length K*. Then, for k = 1, · · · , K*,

P_k(S, U, {µ_k}_{k=1}^{K*}) = P_k(S′, U′, {µ′_k}_{k=1}^{K*}).

Proof. To begin with, we see that S = S′ (and hence U = U′). Suppose to the contrary that S ≠ S′. Without loss of generality, we assume there exists s̄ ∈ S \ S′. Fix a sphere y ⊂ int ∆(B). Define

x = ∩_{s ∈ (S∪S′)\{s̄}} { β ∈ ∆(B) | β · s ≤ max_{β′∈y} β′ · s }.

Since both representations are canonical, it holds that

max_{β∈x} U(β, s̄) > max_{β∈y} U(β, s̄),
max_{β∈x} U(β, s) = max_{β∈y} U(β, s) for every s ∈ S, s ≠ s̄, and
max_{β∈x} U′(β, s) = max_{β∈y} U′(β, s) for every s ∈ S′.

Hence, x ≁ y according to (S, U, {µ_k}_{k=1}^{K*}), but x ∼ y according to (S′, U′, {µ′_k}_{k=1}^{K*}). This is a contradiction.





Finally, we show P_k(S′, U′, {µ′_k}_{k=1}^{K∗}) ⊂ P_k(S, U, {µ_k}_{k=1}^{K∗}). Then the other direction P_k(S, U, {µ_k}_{k=1}^{K∗}) ⊂ P_k(S′, U′, {µ′_k}_{k=1}^{K∗}) also holds since (S, U, {µ_k}_{k=1}^{K∗}) and (S′, U′, {µ′_k}_{k=1}^{K∗}) are symmetric. Define, as in the proof of Theorem 1.4,

M = { (σ_x(s))_{s∈S} | x ∈ X },

V_k(σ_x) = ∑_{s∈S} µ_k(s) max_{β∈x} β · s, and

V′_k(σ_x) = ∑_{s∈S} µ′_k(s) max_{β∈x} β · s.

Since V_k and V′_k represent the same order, it follows from Part (ii) of Lemma 1.2 that

V′_k(σ_x) = a_k V_k(σ_x) + ∑_{j=1}^{k−1} b_{kj} V_j(σ_x) + c_k for k = 1, · · · , K∗.

Thus, supp(µ′_k) ⊂ ∪_{j=1}^{k} supp(µ_j), and hence, for every l ≦ k, supp(µ′_l) ⊂ ∪_{j=1}^{l} supp(µ_j) ⊂ ∪_{j=1}^{k} supp(µ_j). Therefore, ∪_{j=1}^{k} supp(µ′_j) ⊂ ∪_{j=1}^{k} supp(µ_j). That is,

P_k(S′, U′, {µ′_k}_{k=1}^{K∗}) = {s∗ | s ∈ ∪_{j=1}^{k} supp(µ′_j)} ⊂ {s∗ | s ∈ ∪_{j=1}^{k} supp(µ_j)} = P_k(S, U, {µ_k}_{k=1}^{K∗}).

1.5.3 Proof of Theorem 1.6

Necessity is straightforward. We show only sufficiency. Let (S, U, {µ_k}_{k=1}^{K∗}) be a canonical lexicographic representation of minimal length K∗. We see that if K∗ ≥ 2, then Upper Semicontinuity is violated.

More specifically, we construct menus x and x′ and a sequence of menus {x_n} such that x_n → x, x_n ≻ x′ for every n, and x′ ≻ x. Fix a sphere y ⊂ int∆(B). Define

x′ = ∩_{s∈∪_{j=1}^{2} supp(µ_j)} {β ∈ ∆(B) | β · s ≤ max_{β′∈y} β′ · s}.

Case 1: supp(µ_1) ⊊ ∪_{j=1}^{2} supp(µ_j). Pick a state s′ ∈ supp(µ_2) \ supp(µ_1).

Subcase 1-1: µ_2(s′) > 0. Let ǫ > 0. Define

x = ∩_{s∈∪_{j=1}^{2} supp(µ_j)} {β ∈ ∆(B) | β · s ≤ f_1(s)},

where f_1(s′) = max_{β′∈y} β′ · s′ − ǫ and f_1(s) = max_{β′∈y} β′ · s for s ≠ s′. We take ǫ small enough so that x is a menu with max_{β∈x} β · s = f_1(s) for every s ∈ S. This is possible since S is finite. Take a state s∗ ∈ supp(µ_1). First, we consider the case µ_1(s∗) < 0. Let ξ ∈ (0, 1). Define

x_n = ∩_{s∈∪_{j=1}^{2} supp(µ_j)} {β ∈ ∆(B) | β · s ≤ g_1(s, n)},

where g_1(s′, n) = max_{β′∈y} β′ · s′ − ǫ, g_1(s∗, n) = max_{β′∈y} β′ · s∗ − ξ^n, and g_1(s, n) = max_{β′∈y} β′ · s for s ≠ s′, s∗. We take ξ small enough so that max_{β∈x_n} β · s = g_1(s, n) for every s and n. Again, this is possible since S is finite. By construction, x_n → x. Compare x_n and x′. In the first hierarchy, the difference between the valuations from x_n and x′ is

∑_{s∈S} µ_1(s) max_{β∈x_n} β · s − ∑_{s∈S} µ_1(s) max_{β∈x′} β · s = −µ_1(s∗)ξ^n > 0.

Hence, x_n ≻ x′ for every n. Next, compare x′ and x. Since s′ ∉ supp(µ_1), x′ is indifferent to x in the first hierarchy. In the second hierarchy, the difference between the valuations from x′ and x is

∑_{s∈S} µ_2(s) max_{β∈x′} β · s − ∑_{s∈S} µ_2(s) max_{β∈x} β · s = µ_2(s′)ǫ > 0.

Therefore, x′ ≻ x. For µ_1(s∗) > 0, we modify {x_n}. Define

x_n = ∩_{s∈∪_{j=1}^{2} supp(µ_j)} {β ∈ ∆(B) | β · s ≤ g_2(s, n)},

where g_2(s′, n) = max_{β′∈y} β′ · s′ − ǫ, g_2(s∗, n) = max_{β′∈y} β′ · s∗ + ξ^n, and g_2(s, n) = max_{β′∈y} β′ · s for s ≠ s′, s∗. Then the same argument holds.

Subcase 1-2: µ_2(s′) < 0. With the following modification, we can make the same argument as in Subcase 1-1. Let

x = ∩_{s∈∪_{j=1}^{2} supp(µ_j)} {β ∈ ∆(B) | β · s ≤ f_2(s)},

where f_2(s′) = max_{β′∈y} β′ · s′ + ǫ, and f_2(s) = max_{β′∈y} β′ · s for s ≠ s′. If µ_1(s∗) > 0, define

x_n = ∩_{s∈∪_{j=1}^{2} supp(µ_j)} {β ∈ ∆(B) | β · s ≤ g_3(s, n)},

where g_3(s′, n) = max_{β′∈y} β′ · s′ + ǫ, g_3(s∗, n) = max_{β′∈y} β′ · s∗ + ξ^n, and g_3(s, n) = max_{β′∈y} β′ · s for s ≠ s′, s∗. If µ_1(s∗) < 0, define

x_n = ∩_{s∈∪_{j=1}^{2} supp(µ_j)} {β ∈ ∆(B) | β · s ≤ g_4(s, n)},

where g_4(s′, n) = max_{β′∈y} β′ · s′ + ǫ, g_4(s∗, n) = max_{β′∈y} β′ · s∗ − ξ^n, and g_4(s, n) = max_{β′∈y} β′ · s for s ≠ s′, s∗.

Case 2: supp(µ_1) = ∪_{j=1}^{2} supp(µ_j). First, note that supp(µ_1) has to contain more than two states since we consider a lexicographic representation of minimal length. Moreover, there exist two states s′ and s′′ such that

µ_2(s′)/µ_1(s′) ≠ µ_2(s′′)/µ_1(s′′).

Subcase 2-1: µ_1(s) < 0 for s = s′, s′′. Label s′, s′′ so that

µ_2(s′)/µ_1(s′) < µ_2(s′′)/µ_1(s′′).

Let ǫ > 0. Define

x = ∩_{s∈supp(µ_1)} {β ∈ ∆(B) | β · s ≤ f_3(s)},

where f_3(s′) = max_{β′∈y} β′ · s′ − ǫ, f_3(s′′) = max_{β′∈y} β′ · s′′ + (µ_1(s′)/µ_1(s′′))ǫ, and f_3(s) = max_{β′∈y} β′ · s for s ≠ s′, s′′. We take ǫ small enough so that x is a menu with max_{β∈x} β · s = f_3(s) for every s ∈ S. This is possible since S is finite. Let ξ ∈ (0, 1). Define

x_n = ∩_{s∈supp(µ_1)} {β ∈ ∆(B) | β · s ≤ g_5(s, n)},

where g_5(s′, n) = max_{β′∈y} β′ · s′ − ǫ − ξ^n, g_5(s′′, n) = max_{β′∈y} β′ · s′′ + (µ_1(s′)/µ_1(s′′))ǫ, and g_5(s, n) = max_{β′∈y} β′ · s for s ≠ s′, s′′. We take ξ small enough so that max_{β∈x_n} β · s = g_5(s, n) for every s and n. Again, this is possible since S is finite. By construction, x_n → x. Compare x_n and x′. In the first hierarchy, the difference between the valuations from x_n and x′ is

∑_{s∈S} µ_1(s) max_{β∈x_n} β · s − ∑_{s∈S} µ_1(s) max_{β∈x′} β · s = −µ_1(s′)ξ^n > 0.

Hence, x_n ≻ x′ for all n. Next, compare x′ and x. In the first hierarchy, x is indifferent to x′ because

∑_{s∈S} µ_1(s) max_{β∈x′} β · s − ∑_{s∈S} µ_1(s) max_{β∈x} β · s = µ_1(s′)ǫ − µ_1(s′′)(µ_1(s′)/µ_1(s′′))ǫ = 0.

The difference between the valuations from x′ and x in the second hierarchy is

∑_{s∈S} µ_2(s) max_{β∈x′} β · s − ∑_{s∈S} µ_2(s) max_{β∈x} β · s
= µ_2(s′) max_{β∈y} β · s′ + µ_2(s′′) max_{β∈y} β · s′′ − µ_2(s′)( max_{β∈y} β · s′ − ǫ ) − µ_2(s′′)( max_{β∈y} β · s′′ + (µ_1(s′)/µ_1(s′′))ǫ )
= ( (µ_2(s′)µ_1(s′′) − µ_2(s′′)µ_1(s′)) / µ_1(s′′) ) ǫ > 0.

Therefore, x′ ≻ x. With the following constructions of x and {x_n}, the same argument holds for the other cases.

Subcase 2-2: µ_1(s) > 0 for s = s′, s′′. Label s′, s′′ so that

µ_2(s′)/µ_1(s′) < µ_2(s′′)/µ_1(s′′).

Define

x = ∩_{s∈supp(µ_1)} {β ∈ ∆(B) | β · s ≤ f_4(s)},

where f_4(s′) = max_{β′∈y} β′ · s′ + ǫ, f_4(s′′) = max_{β′∈y} β′ · s′′ − (µ_1(s′)/µ_1(s′′))ǫ, and f_4(s) = max_{β′∈y} β′ · s for s ≠ s′, s′′. Define

x_n = ∩_{s∈supp(µ_1)} {β ∈ ∆(B) | β · s ≤ g_6(s, n)},

where g_6(s′, n) = max_{β′∈y} β′ · s′ + ǫ + ξ^n, g_6(s′′, n) = max_{β′∈y} β′ · s′′ − (µ_1(s′)/µ_1(s′′))ǫ, and g_6(s, n) = max_{β′∈y} β′ · s for s ≠ s′, s′′.

Subcase 2-3: µ_1(s′) < 0 and µ_1(s′′) > 0. If

µ_2(s′)/µ_1(s′) < µ_2(s′′)/µ_1(s′′),

the constructions of x and {x_n} are the same as in Subcase 2-1. If

µ_2(s′)/µ_1(s′) > µ_2(s′′)/µ_1(s′′),

we define

x = ∩_{s∈supp(µ_1)} {β ∈ ∆(B) | β · s ≤ f_5(s)},

where f_5(s′) = max_{β′∈y} β′ · s′ + ǫ, f_5(s′′) = max_{β′∈y} β′ · s′′ − (µ_1(s′)/µ_1(s′′))ǫ, and f_5(s) = max_{β′∈y} β′ · s for s ≠ s′, s′′. Define

x_n = ∩_{s∈supp(µ_1)} {β ∈ ∆(B) | β · s ≤ g_7(s, n)},

where g_7(s′, n) = max_{β′∈y} β′ · s′ + ǫ − ξ^n, g_7(s′′, n) = max_{β′∈y} β′ · s′′ − (µ_1(s′)/µ_1(s′′))ǫ, and g_7(s, n) = max_{β′∈y} β′ · s for s ≠ s′, s′′.
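The mechanism behind these constructions can be illustrated at the level of support functions alone. The sketch below is a hypothetical two-state instance of Subcase 1-1 (all measures and values are assumptions): the sequence beats x′ in the first hierarchy for every n, while its limit x loses to x′ in the second hierarchy, which is exactly the failure of Upper Semicontinuity.

```python
# Menus are represented only through their support values
# sigma_x(s) = max_{beta in x} beta . s; signed measures are allowed
# at this stage of the argument. All numbers are illustrative.
mu1 = {"s*": -0.5}                 # mu1(s*) < 0, s' not in supp(mu1)
mu2 = {"s*": 0.1, "s'": 0.4}       # mu2(s') > 0, as in Subcase 1-1
base = {"s*": 1.0, "s'": 1.0}      # max_{beta in y} beta . s on the sphere y
eps, xi = 0.1, 0.5

def V(sig, mu):
    """Hierarchy valuation sum_s mu(s) * sigma(s)."""
    return sum(mu[s] * sig[s] for s in mu)

sig_xp = dict(base)                               # the menu x'
sig_x = {"s*": 1.0, "s'": 1.0 - eps}              # the limit menu x
def sig_xn(n):                                    # the approximating menus
    return {"s*": 1.0 - xi ** n, "s'": 1.0 - eps}

for n in range(1, 6):
    assert V(sig_xn(n), mu1) > V(sig_xp, mu1)     # x_n beats x' (hierarchy 1)
assert V(sig_x, mu1) == V(sig_xp, mu1)            # tie in hierarchy 1 at limit
assert V(sig_xp, mu2) > V(sig_x, mu2)             # x' beats x (hierarchy 2)
# sigma_xn -> sigma_x, so upper sets fail to be closed: USC is violated.
```

The perturbation −ξ^n at s∗ is rewarded only because µ_1(s∗) < 0, and it vanishes in the limit, where the second hierarchy takes over.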

1.5.4 Proof of Corollary 1.7

The necessity is straightforward. We show only the sufficiency.

To begin with, note that Monotonicity implies that µ_1 is a positive measure. Let (S, U, {µ_j}_{j=1}^{K}) be a canonical lexicographic representation of ≿ such that µ_j is a positive measure for j = 1, · · · , k (≦ K − 1). It is enough to show that there is a canonical lexicographic representation (S, U, {µ′_j}_{j=1}^{K}) of ≿ such that µ′_j = µ_j for j = 1, · · · , k and µ′_{k+1} is a positive measure.

First, we see that if s ∈ supp(µ_{k+1}) \ ∪_{j=1}^{k} supp(µ_j), then µ_{k+1}(s) > 0. Suppose to the contrary that µ_{k+1}(s) < 0. Fix a sphere y ⊂ int∆(B). Define

x = ∩_{s′∈∪_{j=1}^{k+1} supp(µ_j)\{s}} {β ∈ ∆(B) | β · s′ ≦ max_{β′∈y} β′ · s′}.

Then the representation implies that y ≻ x while y ⊂ x, contradicting Monotonicity.

Next, we construct the desired {µ′_j}_{j=1}^{K}. Let ǫ > 0 be such that

min{µ_j(s) | j = 1, · · · , k, and s ∈ ∪_{j=1}^{k} supp(µ_j)} > ǫ max_s |µ_{k+1}(s)|.

Define {µ′_j}_{j=1}^{K} by µ′_j = µ_j for j ≠ k + 1, and

µ′_{k+1} = ∑_{j=1}^{k} µ_j + ǫµ_{k+1}.

Then it follows from Part (ii) of Lemma 1.2 that (S, U, {µ′_j}) represents the same order as (S, U, {µ_j}). By construction, µ′_j is a positive measure for j = 1, · · · , k + 1. This completes the proof.
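A toy numerical instance of the construction above (all measures and supports are assumed for illustration) confirms that the rescaled measure µ′_{k+1} is positive whenever ǫ satisfies the stated bound:

```python
# Hypothetical check of the Corollary 1.7 construction with k = 1:
# mu'_2 = mu_1 + eps * mu_2 is a positive measure for small enough eps.
mu1 = {"s1": 0.6, "s2": 0.4}                 # positive first-hierarchy measure
mu2 = {"s1": -0.3, "s2": 0.2, "s3": 0.5}     # signed; mu2(s3) > 0 off supp(mu1)

k_min = min(mu1.values())                    # min over j <= k and relevant s
m_max = max(abs(v) for v in mu2.values())
eps = 0.5 * k_min / m_max                    # strictly below the stated bound
assert k_min > eps * m_max

states = set(mu1) | set(mu2)
mu2_new = {s: mu1.get(s, 0.0) + eps * mu2.get(s, 0.0) for s in states}
assert all(v > 0 for v in mu2_new.values()) # mu'_2 is a positive measure
```

On supp(µ_1) the positive mass dominates the ǫ-scaled signed part, and off supp(µ_1) the first step of the proof guarantees µ_2 is already positive.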

Chapter 2

Subjective random discounting and intertemporal choice

2.1 Introduction

2.1.1 Objective and Outline

In intertemporal decision making, a decision maker (DM) faces two kinds of trade-offs among alternatives. The first is the trade-off among the alternatives available within a period. The second is the intertemporal trade-off between different periods. At an intuitive level, anticipating intertemporal trade-offs seems harder than anticipating trade-offs within a period. Thus we consider a DM who is certain about the ranking of alternatives within a period, yet who is uncertain about future intertemporal discount rates. In addition to the above intuition, several authors have mentioned psychological reasons for uncertainty about discount factors. As Yaari [54] and

Blanchard [6] point out, a discount factor admits an interpretation as a probability of death. Depending on the future prospects of diseases, armed conflicts, and discoveries in medical treatments, the probabilities of death will change over time. An alternative interpretation is to think not of an agent but of a dynasty, in which case a discount factor is regarded as a degree of altruism. The bequest motives of the current generations may fluctuate over time because they may die without descendants. Becker and Mulligan [3] suggest a model in which a discount factor depends on how much the DM invests in making future pleasures less remote. The choice of investment in such activities is affected by economic variables, for instance, the DM's wealth or interest rates, which are uncertain by their very nature. Thus these uncertainties may lead to random discounting.¹ Moreover, random discounting has been used in several infinite-horizon macroeconomic models, mainly because it is a useful device for generating heterogeneity across agents, particularly realistic wealth heterogeneity in quantitative models. For example, see Atkeson and Lucas [2], Chatterjee, Corbae, Nakajima, and Rios-Rull [10], and Krusell and Smith [39]. However, preference shocks to discounting are often postulated in an ad hoc way since shocks are not directly observable to the analyst. The reliance on those unobservables seems problematic. In this chapter, we provide an axiomatic foundation for the random discounting model, in which the DM believes that her discount factors change

¹ See Mehra and Sah [42, Section 1.1, pp. 871-873] for more examples of fluctuations in subjective parameters.

randomly over time. That is, we demonstrate that there exists behavior which can, in principle, pin down expected shocks to discount factors. For this purpose, we extend the two-period framework of Kreps [36], [37] and Dekel, Lipman, and Rustichini [12] (hereafter DLR) to an infinite horizon setting. They axiomatize a preference shock model by considering preference over menus (opportunity sets) of alternatives. If a DM is aware of uncertainties regarding her future preference over alternatives, then her ranking of menus reflects how she perceives those uncertainties. Kreps and DLR derive the set of future preferences from the ranking of menus. With regard to the behavioral characterization of random discounting, note first that uncertainty about future preferences, whether it is about future discount factors or other aspects of preference, leads to a demand for flexibility — larger menus are preferred. This observation is due to Kreps and DLR. However, if uncertainty is only about discount factors, then flexibility has value only in limited cases. The behavioral characterization of random discounting takes the form of identifying primarily the instances where flexibility has no value (see the example that follows shortly for elaboration). To admit sequential decision making, we adopt the same domain of choice objects as used by Gul and Pesendorfer [27]. Let C be the outcome space (consumption set), which is a compact metric space. There exists a compact metric space Z such that Z is homeomorphic to K(∆(C × Z)), where ∆(C × Z) is the set of lotteries, that is, all Borel probability measures over C × Z, and K(·) denotes the set of all non-empty compact subsets of "·". An element of Z, called a menu, is an opportunity set of lotteries over pairs of current consumption and future menus. Preference ≿ is defined on Z ≃ K(∆(C × Z)). We have in mind the following timing of decisions:

Period 0: the DM chooses a menu x.
Period 1−: the current discount factor α becomes known to the DM.
Period 1: the DM chooses a lottery l out of the menu x.
Period 1+: the DM receives a pair (c, x′) according to the realization of the lottery l.
Period 2−: another discount factor α′ is realized.
Period 2: she chooses a lottery l′ out of the menu x′.
...

Notice that ≿ is the preference in period 0. Thus, beyond period 0, the time line above is not part of the formal model. However, if the DM has in mind this time line and anticipates uncertain discount factors to be resolved over time, then ≿ should reflect the DM's perception of those uncertainties. In this way, our domain can capture the expectation of random discounting. We provide an axiomatic foundation for the following functional form, called the random discounting representation: there exist a non-constant, continuous, mixture linear function u : ∆(C) → R and a probability measure µ over [0, 1] with E_µ[α] < 1 such that ≿ is represented numerically by the functional form

U(x) = ∫_{[0,1]} max_{l∈x} [ (1 − α)u(l_c) + α ∫_Z U(z) dl_z ] dµ(α),

where l_c and l_z are the marginal distributions of l on C and on Z, respectively. The above functional form can be interpreted as follows: the DM behaves as if she has in mind the time line described above, and anticipates a discount factor α to be realized according to µ in every period. After the realization of α, she evaluates a lottery by the weighted sum of its instantaneous expected utility u(l_c) and its expected continuation value ∫_Z U(z) dl_z. The same representation U is used to evaluate a menu at all times — the representation has a stationary recursive structure. Consequently, her belief about future discount factors is constant over time. We show uniqueness of the DM's belief µ. That is, the components (u, µ) of the representation are uniquely derived from preference. This result is in stark contrast to that of Kreps [36], [37] and DLR, where the subjective belief about future preferences is not identified because of the state-dependence of those future preferences. In our model, though future preference depends on future states as in the above literature, the ex post ranking of multistage lotteries is certain ex ante, or independent of the states; this permits uniqueness. Because of the uniqueness result, it is meaningful to compare subjective beliefs among agents. We provide a behavioral condition capturing the situation where one agent is more uncertain about discount factors than the other. In the case of objective uncertainty, second-order stochastic dominance has been widely used to describe such a comparison (Rothschild and Stiglitz [44]). If agent 2 perceives more uncertainty about discount factors than agent 1, agent 2 should be more reluctant to make a commitment to a specific plan than is agent 1. This greater demand for flexibility is the behavioral manifestation of greater uncertainty about future discount factors. We apply the resulting model to a consumption-saving problem and analyze how uncertainty about discount factors affects saving behavior. Given the assumption that the instantaneous utility function is CRRA with parameter σ < 1 (or σ > 1), the saving rate increases (or decreases) when the DM becomes more uncertain about future discount factors in the sense of second-order stochastic dominance. We point out that uncertainty about discount factors has the opposite effect on savings when compared to uncertainty about interest rates. Moreover, we consider optimal stopping problems under sampling with and without recall. In the standard model, the DM never benefits from the option to accept offers rejected previously. In contrast, the DM with random discount factors may strictly desire such an option.

2.1.2 Motivating Example

To understand what behavior characterizes the random discounting model, consider the following simple example. Let C stand for a set of monetary payoffs. Suppose that the DM faces uncertainty about future discount factors. As pointed out by Kreps [36] and DLR, if a DM is uncertain about her own future preferences, she may desire to keep options open until the uncertainty is resolved; that is, she exhibits preference for flexibility. For example, consider two alternatives, ($50, {($100, z)}) and ($100, {($50, z)}) in ∆(C × Z), which might be chosen in period 1. They differ in consumption levels in periods 1 and 2. From period 3 on, both alternatives guarantee the same opportunity set z. The more patient the DM is in period 1, the more likely it is that ($50, {($100, z)}) will be chosen in period 1 over ($100, {($50, z)}). Now suppose that the DM ranks {($50, {($100, z)})} over {($100, {($50, z)})} in terms of the commitment ranking,

which reflects the DM's ex ante perspective on random discount factors. Nevertheless, she may still prefer keeping ($100, {($50, z)}) as an option and hence exhibits

{($50, {($100, z)}), ($100, {($50, z)})} ≻ {($50, {($100, z)})}.

This ranking reflects the DM's belief that she might become impatient in period 1, in which case ($100, {($50, z)}) would be more attractive than ($50, {($100, z)}). However, if the DM is uncertain only about future discount factors, then other forms of flexibility may not be valued. In that case, she would be sure of her preference over consumption in the next period (in this example, consumption is scalar and hence greater consumption is preferred to less), and also of her preference over menus for the rest of the horizon — uncertainty is relevant only for rankings in which an intertemporal trade-off must be made, as in comparing ($50, {($100, z)}) and ($100, {($50, z)}), between period 1 consumption and the menu for period 2 onward. Accordingly, some forms of flexibility are not valuable given uncertainty only about future discount factors. To illustrate further, consider two alternatives: a degenerate lottery ($50, {($100, z)}) and a lottery l yielding ($0, {($0, z)}) or ($100, {($200, z)}) with an equal probability of 1/2. In terms of current consumption, l induces the lottery yielding $0 or $100 with probability one half, while it induces the lottery over menus with an equal chance of {($0, z)} or {($200, z)}. If $50 and {($100, z)} are both preferred to these induced lotteries, respectively, the DM does not face an intertemporal trade-off between ($50, {($100, z)}) and l. No matter how patient she will be in the next period, l will not be chosen over ($50, {($100, z)}). Since there is no benefit to keeping l as an option alongside ($50, {($100, z)}), the

DM will exhibit

{($50, {($100, z)}), l} ∼ {($50, {($100, z)})}.

We say that a lottery l dominates another lottery l′ if the marginal distributions of l on C and on Z are both preferred to those of l′. The DM facing uncertainty only about discount factors will not choose dominated lotteries and hence does not care whether such options are in a menu. Thus, while a preference for flexibility reveals uncertainty about future preferences, the zero flexibility value of dominated options reveals that the uncertainty is only about future discount factors.
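The two rankings in this example can be reproduced with a small computation. The sketch below collapses each alternative to a pair (c₁, c₂), takes u linear, and assumes a two-point belief over the discount factor; the specific numbers are illustrative only:

```python
# Assumed two-point belief over alpha, equal weights; an option is a pair
# (c1, c2) of period-1 and period-2 consumption; a menu is worth
# E_mu[ max over options of (1 - a) * c1 + a * c2 ].
ALPHAS = [0.4, 0.8]

def menu_value(menu):
    return sum(max((1 - a) * c1 + a * c2 for c1, c2 in menu)
               for a in ALPHAS) / len(ALPHAS)

a_opt = (50, 100)    # ($50, {($100, z)})
b_opt = (100, 50)    # ($100, {($50, z)})
# l is a fifty-fifty lottery over (0, 0) and (90, 190); with linear u its
# ex post utility equals that of the mean pair (45, 95), dominated by a_opt.
l_opt = (45, 95)

assert menu_value([a_opt]) > menu_value([b_opt])          # commitment ranking
assert menu_value([a_opt, b_opt]) > menu_value([a_opt])   # flexibility valued
assert menu_value([a_opt, l_opt]) == menu_value([a_opt])  # dominated: no value
```

The impatient realization (α = 0.4) makes b_opt the ex post choice, so keeping it in the menu has strictly positive value; l_opt is never the ex post choice, so adding it changes nothing.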

2.1.3 Related Literature

Random discounting appears broadly in infinite-horizon macroeconomic models, where its main role is to generate suitable heterogeneity across agents. In Krusell and Smith [39], random discounting yields a better match to the wealth heterogeneity observed in the data. Chatterjee, Corbae, Nakajima, and Rios-Rull [10] construct a general equilibrium model where agents with random discounting are allowed to default. They are able to match default rates consistent with the data in part because agents with low discount factors tend to consume more and to default more frequently. Karni and Zilcha [29] prove that if agents have random discount factors, then agents other than the most patient hold capital in a steady-state competitive equilibrium. This contrasts with the result in deterministic economies, where only the most patient agents hold capital. (See Becker [4].)

Atkeson and Lucas [2], Lucas [41], and Farhi and Werning [22] study overlapping generations models where agents are altruistic and where how strongly they wish to consume currently is the agents' private information, which is modeled by random discounting. Their analyses show that the dynamics of the constrained efficient distribution of consumption, driven by private information about discount factors, depend crucially on the welfare function used by the social planner. Caballé and Fuster [8] and Dutta and Michel [15] model imperfect altruism by random discounting and analyze how the resulting distributions of wealth and bequests depend on social security systems. In models of monetary economics, random discounting also plays important roles. In a two-period model with random discounting, Goldman [25] shows the possibility that an agent holds money that yields lower interest than another interest-bearing asset. If the discount factor is random, the agent may be willing to change her portfolio, consisting of money and another asset, after she learns her discount factor. Since the transaction cost of money is lower than that of the other asset, money allows the agent to change her portfolio more easily, and hence can be valuable for the agent with random discounting. In an infinite horizon model where agents have random discounting only in the first period, Kocherlakota [32] analyzes why illiquid nominally risk-free bonds coexist with money. Random discounting creates motives for trade among agents, but they are not allowed to lend and borrow money. The paper shows that bonds enable agents to engage in intertemporal exchange of money if they are illiquid, that is, costly to exchange for goods.

To provide a foundation for random discounting, we follow the preference-over-opportunity-sets approach. Koopmans [34] first introduces an opportunity set as a choice object to model sequential decision making, and emphasizes that intertemporal choice may be essentially different from once-and-for-all decision making. He points out that, if a DM perceives uncertainty about future preferences, she may strictly prefer to leave some options open rather than to choose a completely specified future plan right now. Kreps [36, 37] interprets uncertain future preferences as subjective uncertainties of the DM. He provides an axiomatic foundation for the subjective state space. Dekel, Lipman and Rustichini [12] refine Kreps's idea and show uniqueness of the subjective state space. Furthermore, Dekel, Lipman, Rustichini and Sarver [14] modify the argument of DLR surrounding the additive representation with subjective states. In this line of research, our result can be viewed as an infinite-horizon extension of DLR, where the DM's subjective state space is specified to be the set of sequences of discount factors. Several authors provide models of intertemporal choice consistent with preference for flexibility. Rustichini [46] follows the same idea as in DLR and considers, as choice objects, closed subsets of C∞. In this framework, all subjective uncertainties are resolved one period ahead. Kraus and Sagi [35] follow the dynamic model provided by Kreps and Porteus [38] and consider a sequence of preferences without the completeness axiom. Each incomplete preference is represented by a decision rule under which one choice object is preferred to another if the former is unanimously preferred to the latter with respect to a set of uncertain future preferences. This uncertainty leads to preference

for flexibility. Takeoka [50] introduces objective states into DLR's model and considers preference over menus of menus of Anscombe-Aumann acts, which is viewed as a three-period extension of DLR. He derives, as components of the representation, a subjective decision tree and a subjective probability measure on it. Since the belief µ over [0, 1] is identical across time, we can effectively restrict attention to part of the subjective state space, namely the support of µ. Unlike Kreps [36], [37] and DLR, we can pin down the subjective belief µ. State-independence of u and the stationary recursive form of the representation make this result possible. Moreover, unlike Takeoka [50], [51], we can show uniqueness without assuming any objective states.

2.2 Model

2.2.1 Domain

Let C be the outcome space (consumption set), which is assumed to be compact and metric. Let ∆(C) be the set of lotteries, that is, all Borel probability measures over C. Under the weak convergence topology, ∆(C) is also compact and metric. Gul and Pesendorfer [27] show that there exists a compact metric space Z such that Z is homeomorphic to K(∆(C × Z)), where K(·) denotes the set of non-empty compact subsets of "·".² Generic elements of Z are denoted by x, y, z, · · ·. Each such object is called a menu (or an opportunity

² The set K(∆(C × Z)) is endowed with the Hausdorff metric. Details are relegated to Section 2.7.1.

set) of lotteries over pairs of current consumption and a menu for the rest of the horizon. Preference ≿ is defined on Z ≃ K(∆(C × Z)). We have in mind the timing of decisions mentioned in the Introduction. An important subdomain of Z is the set L of perfect commitment menus, where the agent is committed in every period. We identify a singleton menu with its only element. Then a perfect commitment menu can be viewed as a multistage lottery as considered in Epstein and Zin [20]. A formal treatment is found in Section 2.7.2. The following examples illustrate that the recursive domain Z can accommodate sequential decision problems.

Example (Consumption-Saving Problem). Let c and s denote consumption and savings in a period, respectively. Given a constant interest rate r > 0, the DM has wealth (1 + r)s, where s ≥ 0 is the savings from the previous period, and decides how much she consumes in the period. The rest, s′ = (1 + r)s − c, is carried over to the next period. Assume that the savings s′ cannot be negative. As a result, the DM faces a budget set

B(s) = { (c, s′) ∈ R^2_+ | c + s′ = (1 + r)s },

which is translated into the menu x(s) = {(c, x(s′)) | (c, s′) ∈ B(s)}. If r is random, x(s) can easily be modified into a set of lotteries.

Example (Durable Goods). Durable goods provide a flow of services over certain periods. The duration of a good depends both on the physical properties of the good and on how intensively it is used. Thus a durable good is regarded

as a feasible set with a technology frontier f(c) = 0, where c = {c_t}_{t=1}^∞ is a flow of services c_t ≥ 0 from the durable good. That is, a flow c is feasible for the durable good f if and only if f(c) ≤ 0. In period t, the DM decides how much of the service she consumes now, given the technology f and the history of consumption up to period t − 1, denoted by c^{t−1} = (c_1, · · · , c_{t−1}). In other words, she faces the menu

x_f(c^{t−1}) = {(c_t, x_f(c^{t−1}, c_t)) | f(c^{t−1}, c_t, c) ≤ 0 for some sequence c}.

The DM prefers one durable good f to another g if and only if x_f(c^0) ≿ x_g(c^0).

Example (Sampling Problem). Given a wage offer w, which is a random sample from a distribution F, a DM has to decide whether to accept w or to reject the offer and continue sampling. Once an offer is rejected, it is gone forever. If she accepts w, she receives the current payoff from w and nothing for the rest of the horizon. If the DM chooses to continue sampling, she faces in the next period the same decision problem with a new random sample w′. The same timing of decisions is repeated unless the DM takes an offer. This sampling problem can be described formally as follows: given a current offer w, define the menu x(w) ≡ {accept w, continue}. The object "accept w" is the consumption stream (w, {(0, {(0, · · · )})}), and "continue" is the lottery over menus of the form x(w′) = {accept w′, continue}, where w′ is drawn according to the distribution F.
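The sampling problem sketched above can be solved numerically once a belief over discount factors is fixed. The following toy computation (uniform offers on a grid, an assumed two-point belief, and u equal to the identity are all illustrative assumptions) iterates on the value of continuing and reports a reservation wage that depends on realized patience:

```python
# A sketch of sampling without recall under random discounting.
WAGES = [w / 10 for w in range(11)]          # offers on {0.0, 0.1, ..., 1.0}
ALPHAS = [(0.3, 0.5), (0.9, 0.5)]            # (alpha, probability) pairs

def solve_continuation(n_iter=200):
    """Fixed point of W = E_w E_alpha[ max((1 - alpha) * w, alpha * W) ]:
    accepting w pays (1 - alpha) * w now and nothing later, while
    rejecting is worth alpha * W, the discounted value of a fresh draw."""
    W = 0.0
    for _ in range(n_iter):
        W = sum(p * max((1 - a) * w, a * W)
                for w in WAGES for a, p in ALPHAS) / len(WAGES)
    return W

W = solve_continuation()
# The ex post rule is: accept iff (1 - alpha) * w >= alpha * W, so the
# reservation wage alpha * W / (1 - alpha) rises with realized patience.
for a, _ in ALPHAS:
    print(f"alpha={a}: accept offers w >= {a * W / (1 - a):.3f}")
```

Because the acceptance threshold moves with the realized α, an offer rejected under a patient draw may be attractive under a later impatient draw, which is why recall can have strictly positive value in this model.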

2.2.2 Random Discounting Representations

Take any non-constant, continuous, mixture linear function u : ∆(C) → R and any Borel probability measure µ over [0, 1] with mean ᾱ ≡ E_µ[α] < 1. Consider the functional form U : Z → R defined by

U(x) = ∫_{[0,1]} max_{l∈x} [ (1 − α)u(l_c) + α ∫_Z U(z) dl_z ] dµ(α),    (2.1)

where l_c and l_z denote the marginal distributions of l on C and on Z, respectively. The functional form (2.1) can be interpreted as follows: the DM behaves as if she has in mind the time line described in the Introduction, and anticipates uncertainty about discount factors, which is captured by µ over [0, 1]. On the other hand, she is certain about her future risk preference u over ∆(C). Moreover, she is certain also about her future ranking of menus, which is identical with the current ranking U; that is, the representation has a stationary and recursive structure. After seeing the realization of the discount factor α, the DM chooses a lottery out of the menu to maximize the "ex post" utility function

(1 − α)u(l_c) + α ∫_Z U(z) dl_z,    (2.2)

which is the weighted sum of expected utilities from current consumption and the opportunity set for the rest of the horizon. The functional form (2.1) says that the DM evaluates a menu x by taking the expected value of these maximum values with respect to her subjective belief µ over discount factors. Definition 2.1. Preference  on Z admits a random discounting representation if  can be represented numerically by the functional form U as given by (2.1) with components (u, µ).
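For intuition, the recursion in (2.1) can be evaluated directly on small finite menu trees. The sketch below is an illustrative finite-horizon implementation (identity u, an assumed two-point belief; all payoffs are hypothetical); it also confirms that on singleton menus the value reduces to deterministic discounting with the mean discount factor:

```python
# A menu is a list of lotteries; a lottery is a list of
# (probability, consumption, continuation-menu) triples; the empty menu
# terminates with value 0; u is the identity. MU is an assumed belief.
MU = [(0.3, 0.5), (0.9, 0.5)]    # (alpha, probability) pairs, mean 0.6

def U(menu):
    if not menu:                 # terminal continuation
        return 0.0
    return sum(p * max(ex_post(l, a) for l in menu) for a, p in MU)

def ex_post(lottery, a):
    """(1 - a) * u(l_c) + a * integral of U(z) dl_z, with u = identity."""
    u_c = sum(p * c for p, c, _ in lottery)
    cont = sum(p * U(z) for p, c, z in lottery)
    return (1 - a) * u_c + a * cont

# A singleton (perfect commitment) menu is valued as if the discount
# factor were deterministic and equal to the mean 0.6.
commit = [[(1.0, 50, [[(1.0, 100, [])]])]]
assert abs(U(commit) - (0.4 * 50 + 0.6 * (0.4 * 100))) < 1e-9

# A non-degenerate menu is strictly better here: flexibility is valued.
flex = commit + [[(1.0, 100, [[(1.0, 50, [])]])]]
assert U(flex) > U(commit)
```

The first assertion reflects that, on singletons, the maximization in (2.1) is vacuous and U is linear in α; the second reflects that the max is taken state by state.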

A random discounting representation coincides with a stationary cardinal utility function on the subdomain L, that is, the set of perfect commitment options. Since the DM doesn't have any opportunity for (non-trivial) choice, random discounting does not matter on this subdomain. The functional form (2.1) reduces to

U(l) = (1 − ᾱ)u(l_c) + ᾱ ∫_L U(l′) dl_L(l′),

where l_c and l_L denote the marginal distributions of l ∈ ∆(C × L) on C and on L, respectively. This is a standard stationary recursive utility with a deterministic discount factor ᾱ < 1. Apart from the difference of choice objects,³ a random discounting representation is a special case of DLR's additive representation of the form

U(x) = ∫_S max_{l∈x} V(l, s) dµ(s),    (2.3)



U (z) dlz

Z

with an index s ∈ S. Hence a random discounting representation is a reduced form of (2.3) with subjective states regarding future discount factors. DLR have a model of the form (2.3) with a signed measure µ, where choice based on some subjective states may be negatively evaluated from the ex ante perspective. The DM having such a representation does not necessarily desire 3

DLR consider preference over K(∆(C)) with finite set C.

40 flexibility. Though the functional form (2.1) are also considered with a signed measure µ, we do not focus on this formally general model by the following reasons. First, as Koopmans [34] and Kreps [36] point out, there are a number of motivations for why uncertainty about future preferences leads to preference for flexibility. Second, a signed measure does not have implications of how choice behavior evolves over time. Such a model is not appropriate as a dynamic model, while the random discounting representation can generate a stochastic choice according to the probability measure µ

2.3 2.3.1

Foundations Axioms

The axioms which we consider on  are the following. The first two axioms are standard and need no explanation. Axiom 2.1 (Order).  is complete and transitive. Axiom 2.2 (Continuity). For all x ∈ Z, {z ∈ Z|x  z} and {z ∈ Z|z  x} are closed. For any l ∈ ∆(C × Z), lc and lz denote the marginal distributions of l on C and on Z, respectively. Axiom 2.3 (Nondegeneracy). There exist l, l′ ∈ ∆(C ×Z) such that lc = lc′ , lz = lz′ , and {l} ≻ {l′ }. This axiom says that there exist at least two lotteries that are not indifferent because of the difference regarding current consumption.

The next three axioms are the same as those in Gul and Pesendorfer [27].

Axiom 2.4 (Commitment Independence). For all l, l′, l′′ ∈ ∆(C × Z) and for all λ ∈ (0, 1), {l} ≻ {l′} ⇒ {λl + (1 − λ)l′′} ≻ {λl′ + (1 − λ)l′′}.

Commitment preference ranks the set of lotteries ∆(C × Z). Axiom 2.4 says that commitment preference satisfies vNM independence. For any (c, x) ∈ C × Z, {(c, x)} denotes the singleton menu that assigns probability one to (c, x).

Axiom 2.5 (Stationarity). For all x, y ∈ Z and c ∈ C, {(c, x)}  {(c, y)} ⇔ x  y.

Since current consumption is the same c, the ranking between {(c, x)} and {(c, y)} reflects how the DM evaluates x and y in the next period. Thus, Stationarity means that the ranking over menus is identical across time. In general, beliefs about future discount factors may depend on the history of consumption and of realized discount factors up to that period. Stationarity, however, excludes such history-dependent beliefs: the DM is sure that her belief about discount factors will not change over time. We adopt Stationarity because it seems sensible as a first step, and because the more general model seems much more difficult to characterize and is beyond our grasp at this time. For any (c, x), (c′, x′) ∈ C × Z and λ ∈ [0, 1], the notation λ ◦ (c, x) + (1 − λ) ◦ (c′, x′)

denotes the lottery over C × Z yielding (c, x) with probability λ and yielding (c′, x′) otherwise. For any x, x′ ∈ Z and λ ∈ [0, 1], define the mixture of two menus by taking mixtures element by element between x and x′, that is, λx + (1 − λ)x′ ≡ {λl + (1 − λ)l′ | l ∈ x, l′ ∈ x′} ∈ Z. If the DM identifies a two-stage lottery λ ◦ l + (1 − λ) ◦ l′ with its reduced lottery λl + (1 − λ)l′, then λx + (1 − λ)x′ can be viewed also as a set of two-stage lotteries.

Axiom 2.6 (Timing Indifference). For all x, x′ ∈ Z, c ∈ C, and λ ∈ (0, 1), {λ ◦ (c, x) + (1 − λ) ◦ (c, x′)} ∼ {(c, λx + (1 − λ)x′)}.

Notice that λ ◦ (c, x) + (1 − λ) ◦ (c, x′) is the lottery yielding (c, x) with probability λ and yielding (c, x′) with probability 1 − λ, while (c, λx + (1 − λ)x′) is the degenerate lottery that assigns the pair of consumption c and menu λx + (1 − λ)x′ with certainty. Hence these two lotteries differ in the timing of resolution of the randomization λ. For the former, the DM makes a choice out of a menu (either x or x′) after the resolution of λ, while, for the latter, this order is reversed, that is, the choice out of the menu λx + (1 − λ)x′ is made before the resolution of λ. Timing Indifference says that the DM does not care about this difference in timing.

Timing Indifference can be justified by the same argument as in DLR. Suppose that a DM faces uncertainty about future preferences, yet she anticipates that her future preference over lotteries ∆(C × Z) will satisfy the expected utility axioms. For the lottery λ ◦ (c, x) + (1 − λ) ◦ (c, x′), the expected choice

in the future is obtained as λl + (1 − λ)l′, where l and l′ are rational choices in x and in x′ according to a realized future preference. On the other hand, for the option (c, λx + (1 − λ)x′), λl + (1 − λ)l′ is a rational choice also in λx + (1 − λ)x′ because the realized future preference satisfies the expected utility axioms. Therefore, no matter what preference is realized in the future, the two options ensure indifferent consequences from the ex ante perspective.⁴

Axioms Order, Continuity, Commitment Independence, Stationarity, and Timing Indifference appear in Gul and Pesendorfer [27].⁵ They consider a DM facing a self-control problem. Such a DM may be better off by restricting available options and hence exhibits preference for commitment rather than for flexibility. A key axiom of their model is called Set Betweenness: for any x, y ∈ Z, x  y ⇒ x  x ∪ y  y. Even if x  y, she may rank x over x ∪ y because y may contain a tempting option and hence she may have to exercise costly self-control at x ∪ y. We adopt the following two axioms, which distinguish our model from theirs.

⁴ The DM may care about the timing of resolution of risk and prefer earlier or later resolution of multistage lotteries. Such a distinction is examined in Kreps and Porteus [38]. Epstein, Marinacci and Seo [19] argue against Timing Indifference and provide a model with nonlinear future preferences.
⁵ Their Nondegeneracy axiom requires the existence of menus x, y with x ≻ y and x ⊂ y. That is, this axiom captures preference for commitment — a DM may prefer a smaller menu.

As motivated in Section 2.1.2, the DM facing uncertainty about her future preferences may want to leave options open as much as possible. This is because flexibility allows the DM to make a decision contingent upon the realization of her future preference. This informational advantage leads to preference for flexibility. Such a DM would rank x ∪ y over x even though x  y. To accommodate such behavior, we follow Kreps and DLR, and assume (instead of Set Betweenness):

Axiom 2.7 (Monotonicity). For all x, y ∈ Z, y ⊂ x ⇒ x  y.

This axiom says that a bigger menu is always weakly preferred. That is, Monotonicity is consistent with preference for flexibility. Monotonicity is consistent with any kind of uncertainty about future preferences. To identify behavior that reduces uncertainty about future preferences to uncertainty about future discount factors, we need to impose a qualification on the attitude toward flexibility. The DM facing random discount factors is sure how she evaluates consumption in the next period and a menu from that period onward. Thus the uncertainty is relevant only when an intertemporal trade-off must be made. As motivated in Section 2.1.2, such a DM should not value flexibility provided by "dominated lotteries", as we now describe formally. We define dimension-wise dominance as follows: let lc ⊗ lz denote the product measure on C × Z that consists of the marginal distributions lc ∈ ∆(C) and lz ∈ ∆(Z).

Definition 2.2. For all l, l′ ∈ ∆(C × Z), l dominates l′ if {lc ⊗ lz′}  {lc′ ⊗ lz′} and {lc′ ⊗ lz}  {lc′ ⊗ lz′}, where lc (resp. lz) denotes the marginal distribution of l on C (resp. Z).

If the DM is certain about her risk preferences over ∆(C) and over ∆(Z) tomorrow, the commitment rankings appearing in the above definition should

reflect those preferences. Since lc ⊗ lz′ and lc′ ⊗ lz′ differ only in marginal distributions on C, the ranking {lc ⊗ lz′}  {lc′ ⊗ lz′} reflects that lc is preferred to lc′ in terms of the future risk preference over ∆(C). Similarly, the ranking {lc′ ⊗ lz}  {lc′ ⊗ lz′} should reveal the DM's future preference for lz over lz′. If l dominates l′, the marginal distributions of l on C and on Z are both preferred to those of l′. Hence l will be chosen over l′ for sure by the DM who is certain about her future risk preferences over C and over Z. For any l ∈ ∆(C × Z), let O(l) be the set of all lotteries dominated by l, that is,

O(l) ≡ {l′ ∈ ∆(C × Z) | l dominates l′}.    (2.4)

If  satisfies Order, l ∈ O(l). Thus a DM having preference for flexibility weakly prefers O(l) to {l}. However, there is no reason to choose a dominated lottery l′ ∈ O(l) over l. Hence O(l) should be indifferent to {l}. The same intuition should hold between a general menu x and the set

O(x) ≡ ∪_{l∈x} O(l),    (2.5)

that is, O(x) is the set of all lotteries dominated by some lottery in x. Notice that x ⊂ O(x) when  satisfies Order. If  satisfies Continuity, part 1 of Proposition 2.7 in Section 2.7.3 ensures O(x) ∈ Z, that is, O(x) is a well-defined choice object.

Axiom 2.8 (Marginal Dominance). For all x ∈ Z, x ∼ O(x).

Marginal Dominance states that the DM should not care about dominated lotteries. Since O(x) is bigger than x, the DM having preference for flexibility weakly prefers O(x) to x. Thus this axiom is a counterpoint to Monotonicity,

and requires that it is not useful to keep dominated lotteries within the menu, that is, x  O(x). Such behavior can be justified if the DM believes that her future risk preference over ∆(C) is separated from her future ranking of menus, and these two preferences are known to the DM without uncertainty. Then, dominated lotteries are surely useless because they yield lower utilities in the future, both immediate and remote, and hence she exhibits x ∼ O(x). Marginal Dominance involves a form of separability of preferences between the immediate and the remote future. Two remarks are in order. First, under this axiom, the DM cares only about the marginal distributions on C and on Z — the correlation between immediate consumption and the future opportunity set does not matter. Second, Marginal Dominance is stronger than the Separability axiom in Gul and Pesendorfer [27], which requires a form of separability only on the singleton sets, that is, for any c, c′ ∈ C and x, x′ ∈ Z,

{(1/2) ◦ (c, x) + (1/2) ◦ (c′, x′)} ∼ {(1/2) ◦ (c′, x) + (1/2) ◦ (c, x′)}.

If  satisfies Marginal Dominance, then for all l ∈ ∆(C × Z), {l} ∼ O({l}) = O({lc ⊗ lz}) ∼ {lc ⊗ lz}; both of the above singleton menus are indifferent to

{((1/2) ◦ c + (1/2) ◦ c′) ⊗ ((1/2) ◦ x + (1/2) ◦ x′)},

and hence Separability holds.

2.3.2 Representation Results

Now we are ready to state the main theorem.

Theorem 2.1. If preference  satisfies Order, Continuity, Nondegeneracy, Commitment Independence, Stationarity, Timing Indifference, Monotonicity, and Marginal Dominance, then there exists a random discounting representation (u, µ). Conversely, for any pair (u, µ) with ᾱ < 1, there exists a unique functional form U which satisfies functional equation (2.1), and the preference it represents satisfies all the axioms.

A formal proof is relegated to Section 2.7.4. The above theorem is closely related to DLR. They show that preference over menus of lotteries on finite alternatives admits the additive representation (2.3) with a non-negative measure if and only if it satisfies Order, Continuity, Monotonicity, and:⁶

Axiom 2.9 (Independence). For all x, y, z and λ ∈ (0, 1], x ≻ y ⇒ λx + (1 − λ)z ≻ λy + (1 − λ)z.

Since Commitment Independence, Stationarity, and Timing Indifference imply Independence,⁷ one might think that a natural strategy to obtain a random discounting representation would be as follows: (i) establish the additive representation (2.3) on K(∆(C × Z)); and (ii) using the additional axioms (especially Marginal Dominance), manipulate the representation to convert the subjective state space S into the set of discount factors [0, 1]. However, we do not follow this strategy, mainly because step (i) is not immediate: DLR consider menus of lotteries over finite alternatives as choice objects and hence can take as the subjective state space the compact set of expected utility functions over lotteries, while, in this paper, choice objects are menus of lotteries over a compact set. Thus, instead of dealing with the set of all mixture linear functions over the compact set, we start from the subjective state space [0, 1] of discount factors and establish our functional form by adapting DLR's argument with appropriate modifications. For an outline of the proof of sufficiency, see Section 2.3.4.

The next result concerns uniqueness of the representation. If preference admitted two distinct random discounting representations, say (u, µ) and (u′, µ′), we could not know which belief actually captures the DM's subjective uncertainty about discount factors. We have the following uniqueness result. A proof can be found in Section 2.7.5.

Theorem 2.2. If two random discounting representations, U and U′, with components (u, µ) and (u′, µ′) respectively, represent the same preference, then:

1. u and u′ are cardinally equivalent; and

2. µ = µ′.

Theorem 2.2 pins down a subjective probability measure µ over the set of future discount factors, which is interpreted as the set of subjective states of the DM. Our result contrasts with Kreps [36, 37] and DLR, where probability measures over subjective states are not identified: since the ex post utility functions are state-dependent, probabilities assigned to those states can be manipulated in any way. To prevent such manipulation, DLR (p. 912) suggest that probability measures can be identified if some aspect of the ex post utility functions is state-independent. Such state-independence is satisfied here: because u is independent of the state, the ex post ranking of multistage lotteries is certain ex ante, or independent of the state; this permits uniqueness.

⁶ Dekel, Lipman, Rustichini and Sarver [14] fill a gap in DLR surrounding this representation result.
⁷ See Gul and Pesendorfer [27, p. 125, footnote 7] for more details.

2.3.3 Special Case: Deterministic Discounting

Imagine a "standard" DM, who does not anticipate any uncertainty about future discount factors. Such a DM can be modeled as a special case of the random discounting representation (u, µ), where µ is degenerate at some α ∈ [0, 1). Since she perceives no uncertainty regarding discount factors, the standard DM should not care about flexibility. She will evaluate a menu by its best element according to a fixed weak order over the singleton sets, that is,

x  y ⇔ {lx}  {ly},    (2.6)

where {lx}  {l} and {ly}  {l′} for all l ∈ x and l′ ∈ y. Kreps [36] characterizes such a standard DM based on the next axiom:

Axiom 2.10 (Strategic Rationality). For all x, y ∈ Z, x  y ⇒ x ∼ x ∪ y.

Strategic Rationality says that, as long as x is preferred to y, the DM does not care whether options in y are added into x or not. Consequently, Strategic Rationality excludes preference for flexibility.⁸ The next corollary of Theorem 2.1 states that deterministic discounting is characterized by replacing Monotonicity with Strategic Rationality. See Section 2.7.6 for a proof.

⁸ Strategic Rationality is more restrictive than Monotonicity. Indeed, to verify that the former implies the latter, assume y ⊂ x. Arguing by contradiction, suppose y ≻ x. Strategic Rationality implies x = x ∪ y ∼ y ≻ x, which is a contradiction.

Corollary 2.3. Preference  satisfies Order, Continuity, Nondegeneracy, Commitment Independence, Stationarity, Timing Indifference, Strategic Rationality, and Marginal Dominance if and only if  admits a random discounting representation (u, µ) such that µ is degenerate.

We provide a further perspective on Strategic Rationality. As mentioned above, a standard DM, who surely anticipates her preference in the next period, will rank menus according to the decision rule (2.6). Consequently, she should be indifferent between committing to a lottery l ∈ ∆(C × Z) and having the menu O∗(l) ≡ {l′ ∈ ∆(C × Z) | {l}  {l′}}, that is, O∗(l) is the set of all lotteries that are less preferred than l with respect to commitment ranking. Accordingly, setting O∗(x) ≡ ∪l∈x O∗(l) for all x ∈ Z, the standard DM should exhibit:

Axiom 2.11 (Dominance). For all x ∈ Z, x ∼ O∗(x).

This axiom says that the DM does not care about keeping a lottery which is less preferred, in terms of commitment ranking, than some lottery in the menu. For l, l′ ∈ x, the support of l may differ from that of l′, and hence these lotteries may differ in their intertemporal trade-offs. However, Dominance implies that, as long as {l}  {l′}, the DM surely anticipates not choosing l′ in the next period. That is, she does not care about flexibility regarding intertemporal trade-offs in the future. Hence this axiom reveals that the DM has a deterministic discount factor. As one might imagine, Dominance is closely related to Strategic Rationality and Marginal Dominance.

Proposition 2.1.

1. Dominance is equivalent to the condition that, for every x, x ∼ {lx}, where lx is a best element in x with respect to commitment ranking.

2. If  satisfies Continuity, then Dominance is equivalent to Strategic Rationality.

3. If  satisfies {l} ∼ O(l) for all lotteries l, then Dominance implies Marginal Dominance.

See Section 2.7.7 for a proof. Taking parts 1 and 2 together, each of Strategic Rationality and Dominance characterizes a DM who has no preference for flexibility. Regarding part 3, the condition {l} ∼ O(l) means that  satisfies Marginal Dominance for singleton sets. Without this condition, Dominance does not necessarily imply Marginal Dominance because the former does not impose any restriction on commitment ranking, while the latter requires it to respect dimension-wise dominance. Given the additional assumptions appearing in Proposition 2.1, the combination of Monotonicity and Marginal Dominance is weaker than Dominance. As shown in Theorem 2.1, the former is consistent with subjective uncertainty about future discount factors, whereas the latter precludes subjective uncertainty about future preferences as a consequence of Corollary 2.3.

2.3.4 Proof Sketch for Sufficiency of Theorem 2.1

As mentioned in Section 2.3.2, Commitment Independence, Stationarity, and Timing Indifference imply Independence. Focusing on the subdomain Z1 ⊂ Z

consisting of convex menus, the mixture space theorem delivers a mixture linear representation U : Z1 → R. We have to show that U can be rewritten in the desired form. Marginal Dominance implies that the DM is certain about her future risk preferences over C and over Z. Fix a lottery l∗ ∈ ∆(C × Z) that is minimal in terms of commitment ranking, and let u : ∆(C) → R and W : ∆(Z) → R be defined by u(lc) ≡ U({lc ⊗ l∗z}) and W(lz) ≡ U({l∗c ⊗ lz}). These two functions should represent those future preferences. Monotonicity captures preference for flexibility, which presumably reflects uncertainty about future preferences. Since u and W are sure for the DM, all the uncertainty about future preferences is effectively reduced to uncertainty about future discount factors. The DM should expect her future preference over ∆(C × Z) to have the form (1 − α)u(lc) + αW(lz), where α ∈ [0, 1] is a subjective weight between u and W. We identify a menu x with its "support function" σx : [0, 1] → R, defined by

σx(α) ≡ max_{l∈x} (1 − α)u(lc) + αW(lz), for all α ∈ [0, 1].

That is, σx = σy ⇔ x = y. This identification ensures that the mapping σ embeds the set of menus into the space of real-valued continuous functions on [0, 1], and hence the functional V(f) = U(σ⁻¹(f)) is well-defined on the image of σ. Following a similar argument to DLR (and Dekel, Lipman, Rustichini and Sarver [14]), we show that there exists a unique probability measure µ over [0, 1] such that V(f) can be written as ∫_[0,1] f(α) dµ(α), and hence,

U(x) = V(σx) = ∫_[0,1] max_{l∈x} [(1 − α)u(lc) + αW(lz)] dµ(α).

The remaining step is to show that U has a stationary and recursive form as desired. Since W is a mixture linear function, it has the expected utility form W(lz) = ∫_Z W(z) dlz. By Stationarity, W and U must represent the same preference. Moreover, Timing Indifference implies that W is mixture linear with respect to the mixture operation over menus. Hence W(z) can be written as an affine transformation of U(z). Manipulating the functional form appropriately, we obtain the desired representation. Finally, Continuity implies ᾱ < 1.

2.4 Greater Demand for Flexibility and Greater Uncertainty

We would like to capture the situation where one agent is more uncertain about discount factors than the other. We provide behavioral comparisons of preference for flexibility and characterize intuitive properties of subjective beliefs. Consider two agents. Agent i has preference i on Z, i = 1, 2. Since we are interested in comparisons of preference for flexibility, we focus on agents having identical commitment rankings. Recall that L is the set of multistage lotteries. If an element of L is chosen, there remains no opportunity for (nontrivial) choice over the rest of the horizon. We say that 1 and 2 are

equivalent on L if, for all l, l′ ∈ L, {l} ≻1 {l′} ⇔ {l} ≻2 {l′}. A DM's preference for flexibility is captured by the Monotonicity axiom, that is, the DM prefers bigger menus. Hence one can say that agent 2 has greater demand for flexibility than agent 1 if agent 2 strictly prefers a bigger menu whenever agent 1 does so. Such a behavioral comparison is provided by DLR. Formally:

Definition 2.3. Agent 2 desires more flexibility than agent 1 if, for all x, y ∈ Z with y ⊂ x, x ≻1 y ⇒ x ≻2 y.

DLR show that, under Definition 2.3, the subjective state space of agent 2 is bigger than that of agent 1. Hence greater demand for flexibility reflects greater uncertainty about future contingencies. In our model, the support of µ is effectively identified with the subjective state space. By analogy with DLR, one might expect that greater demand for flexibility implies a bigger support of the subjective belief about discount factors. To obtain such a characterization, we do not need the full power of Definition 2.3. Since µ captures a belief about one-period-ahead discount factors, it suffices to compare demands for flexibility just one period ahead. To formalize the idea, let ∆1 ≡ {lc ⊗ {l} ∈ ∆(C × Z) | lc ∈ ∆(C), l ∈ L} and Z1 ≡ K(∆1) ⊂ Z. A menu x ∈ Z1 allows the agent to postpone a decision until period 1, from which point on she has to make a commitment. The following definition captures greater demand for flexibility concerning only period 1:

Definition 2.4. Agent 2 desires more flexibility for one period ahead than agent 1 if, for all x, y ∈ Z1 with y ⊂ x, x ≻1 y ⇒ x ≻2 y.

The next theorem is a counterpart of DLR. A proof is found in Section 2.7.8.

Theorem 2.4. Assume that i, i = 1, 2, satisfy all the axioms and are equivalent on L. Then, agent 2 desires more flexibility for one period ahead than agent 1 if and only if there exist random discounting representations (ui, µi), i = 1, 2, such that

1. u1 = u2, and

2. the support of µ2 is bigger than that of µ1.

Though our functional form is a special case of DLR's, Theorem 2.4 does not follow directly from Theorem 2 (p. 910) of DLR regarding the characterization of Definition 2.3. DLR consider menus of lotteries over finite outcomes, while, in our study, choice objects are menus of lotteries on a compact outcome space.

We next consider another behavioral comparison about preference for flexibility. If agent 2 faces more uncertainty about discount factors than does agent 1, agent 2 is presumably more averse to making a commitment to a specific plan than agent 1 is. That is:

Definition 2.5. Agent 2 is more averse to commitment than agent 1 if, for all x ∈ Z and l ∈ L, x ≻1 {l} ⇒ x ≻2 {l}.

This condition says that if agent 1 strictly prefers a menu x to a completely spelled-out future plan {l}, so does agent 2. Since l does not necessarily belong to x, Definition 2.5 is independent of Definitions 2.3 and 2.4.

Several authors adopt conditions identical to Definition 2.5 in different contexts. Ahn [1] considers preference over subsets of lotteries and interprets those subsets as ambiguous objects. Since singleton sets are then regarded as options without ambiguity, a comparison similar to Definition 2.5 says that agent 1 is more ambiguity averse than agent 2. By taking preference over menus of lotteries, Sarver [49] models a DM who anticipates regret from choice in the future and hence may prefer smaller menus. In his model, the identical comparison is interpreted as agent 1 being more regret prone than agent 2.⁹

Now turn to the implication of the above behavioral comparison. We show that Definition 2.5 characterizes second-order stochastic dominance in terms of subjective beliefs. In the case of objective uncertainty, second-order stochastic dominance has been widely used to describe increasing uncertainty since Rothschild and Stiglitz [44].

Definition 2.6. Consider probability measures µ1 and µ2 over [0, 1]. Say that µ1 exhibits second-order stochastic dominance over µ2 if, for all continuous and concave functions v : [0, 1] → R,¹⁰

∫_[0,1] v(α) dµ1(α) ≥ ∫_[0,1] v(α) dµ2(α).

⁹ In the literature on ambiguity in Savage-type models, Epstein [18] and Ghirardato and Marinacci [23] adapt closely related conditions to capture comparative attitudes toward ambiguity aversion. They compare an arbitrary act with an unambiguous act instead of comparing an arbitrary menu with a commitment menu.
¹⁰ Notice that continuity is not redundant because a concave function is guaranteed to be continuous only in the interior of its domain. In the original definition of Rothschild and Stiglitz [44], continuity is not imposed.
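For beliefs with finite support, Definition 2.6 can be spot-checked numerically. A minimal sketch with hypothetical two-point beliefs: a point mass at the mean should second-order stochastically dominate any mean-preserving spread, so the integral of every concave test function is weakly larger under the former.

```python
# Sketch of Definition 2.6 for beliefs with finite support on [0, 1]
# (hypothetical numbers): mu1 = point mass at the mean, mu2 = a
# mean-preserving spread of mu1; then mu1 should dominate mu2.

import math

mu1 = {0.5: 1.0}              # degenerate at 0.5
mu2 = {0.2: 0.5, 0.8: 0.5}    # same mean 0.5, more spread out

def integral(v, mu):
    return sum(p * v(a) for a, p in mu.items())

# A few continuous concave test functions on [0, 1].
tests = [math.sqrt, lambda a: -(a - 0.3) ** 2, lambda a: min(a, 0.6), lambda a: a]

for v in tests:
    assert integral(v, mu1) >= integral(v, mu2) - 1e-12

# v(a) = a is both concave and convex, so the means must coincide.
assert abs(integral(lambda a: a, mu1) - integral(lambda a: a, mu2)) < 1e-12
```

Checking a handful of test functions is of course only a sanity check, not a proof of dominance, but for two-point beliefs as above the inequality for all concave v follows from Jensen's inequality.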

Rothschild and Stiglitz [44] show that the above condition holds if and only if µ2 is obtained as µ1 plus some "noise".¹¹ Thus second-order stochastic dominance is a natural ordering on probability measures to describe increasing uncertainty. One immediate observation is that Eµ1[α] = Eµ2[α] if µ1 exhibits second-order stochastic dominance over µ2, because v(α) = α is both convex and concave.

Now we are ready to state a characterization result. A proof is relegated to Section 2.7.9.

Theorem 2.5. Assume that i satisfies all the axioms. Then agent 2 is more averse to commitment than agent 1 if and only if there exist random discounting representations (ui, µi), i = 1, 2, such that

1. u1 = u2, and

2. µ1 exhibits second-order stochastic dominance over µ2.

The intuition behind the "only if" part of Theorem 2.5 is as follows. First of all, part 1 is obtained because Definition 2.5 implies that the two preferences are equivalent on L. Together with this observation, the definition further implies that U2(x) ≥ U1(x) for all x ∈ Z. Since, for all x, the function

max_{l∈x} [(1 − α)u(lc) + α ∫_Z U^i(z) dlz]    (2.7)

is convex in α, U2(x) ≥ U1(x) means that the integral of a convex function of the form (2.7) with respect to µ2 is always bigger than that with respect to µ1. Finally, any continuous convex function v on [0, 1] can be approximated

¹¹ Their argument for this equivalence works even when continuity is imposed on v.

arbitrarily well by a function of the form (2.7) if an affine transformation of u is chosen appropriately. Hence the integral of v with respect to µ2 is bigger than that with respect to µ1. Finally, we mention yet another behavioral comparison. In the case of deterministic discounting, if two agents have the same instantaneous utility function and if one's discount factor is greater than the other's, then we would say that one agent is more patient than the other. In the case of random discounting, if one's belief about discount factors exhibits first-order stochastic dominance over the other's, then we might say that one agent is more patient than the other.¹² One difficulty in characterizing this comparison is that, unlike a continuous and convex function on [0, 1], it is not immediate how to approximate an arbitrary non-decreasing function on [0, 1] (which could be discontinuous or strictly concave) by a continuous and convex function of the form (2.7).
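The key convexity step in this intuition is easy to see numerically: an ex post menu value of the form (2.7) is a pointwise maximum of functions affine in α, hence convex in α, so a mean-preserving spread of the belief weakly raises the ex ante value of any menu. A sketch with hypothetical payoffs (not taken from the text):

```python
# Sketch: alpha -> max_l [(1-alpha)*u_l + alpha*w_l], as in (2.7), is a pointwise
# maximum of affine functions, hence convex in alpha. Integrating a convex
# function against a mean-preserving spread can only raise its value, so the
# more uncertain belief values the menu weakly more.

def menu_value(alpha, menu):
    # menu: list of (u_l, w_l) = (current-consumption utility, continuation utility)
    return max((1 - alpha) * u + alpha * w for u, w in menu)

menu = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.5)]  # three hypothetical lotteries

mu1 = {0.5: 1.0}             # agent 1: sure discount factor 0.5
mu2 = {0.1: 0.5, 0.9: 0.5}   # agent 2: mean-preserving spread of mu1

U1 = sum(p * menu_value(a, menu) for a, p in mu1.items())
U2 = sum(p * menu_value(a, menu) for a, p in mu2.items())

assert U2 >= U1  # agent 2 is more averse to commitment: the menu is worth more
print(U1, U2)
```

For a singleton menu the value is affine in α, so with equal means the two beliefs agree on commitments, exactly as in the equivalence-on-L normalization of Theorem 2.5.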

2.5 Applications of Random Discounting Model

2.5.1 Consumption-Saving Decision under Random Discounting

Here, we apply the resulting model to a consumption-saving problem and analyze how random discounting affects consumption-saving decisions. We focus¹²

¹² A probability measure µ1 on [0, 1] is said to exhibit first-order stochastic dominance over µ2 if, for all non-decreasing functions v : [0, 1] → R,

∫_[0,1] v(α) dµ1(α) ≥ ∫_[0,1] v(α) dµ2(α).

on the situation where the DM becomes more uncertain about discount factors in the sense of second-order stochastic dominance. We will show that uncertainty about discount factors has the opposite implication to uncertainty about interest rates. Recall the Example in Section 2.2.1. Assume that the interest rate r is constant, as in the example. Given the savings s from the previous period, the DM evaluates x(s) according to the random discounting representation,

U(x(s)) = ∫ max_{(c,x(s′))∈x(s)} [(1 − α)u(c) + αU(x(s′))] dµ(α).    (2.8)

Throughout this section, the DM is assumed to have a CRRA utility function over instantaneous consumption, that is, u(c) = c^{1−σ}/(1 − σ) for σ > 0, σ ≠ 1. As is well known, the inverse of σ is the elasticity of intertemporal substitution. We examine the effect of the DM being more uncertain about future discount factors. Suppose that the DM changes her belief from µ1 to µ2, where µ1 second-order stochastically dominates µ2. Let U^i denote the random discounting representation with components (u, µi), i = 1, 2. After realization of α ∈ (0, 1), the DM faces the following problem:

Vµi(s, α) ≡ max_{(c,x(s′))∈x(s)} (1 − α)u(c) + αU^i(x(s′))
          = max_{(c,s′)∈B(s)} (1 − α)u(c) + αU^i(x(s′)).    (2.9)

Here the current discount factor is known to be α, and the DM believes discount factors follow the distribution µi over the rest of the horizon. Taking (2.8) and (2.9) together, the Bellman equation is obtained as

Vµi(s, α) = max_{(c,s′)∈B(s)} [(1 − α)u(c) + α ∫ Vµi(s′, α′) dµi(α′)].    (2.10)
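The Bellman equation (2.10) can be solved numerically. The sketch below assumes a linear budget set B(s) = {(c, s′) : c + s′ = (1 + r)s} (an assumption made here for concreteness; the budget set is not spelled out in this section) and CRRA utility. Homotheticity then suggests the guess U^i(x(s)) = k · s^{1−σ}/(1 − σ), which reduces the problem to a scalar fixed point for k and yields a saving rate that depends on the realized α but not on the wealth s.

```python
# Sketch (hypothetical parameters): scalar fixed point for the Bellman
# equation (2.10) under CRRA utility u(c) = c**(1-sigma)/(1-sigma) and the
# assumed budget c + s' = (1+r)s. Guess U(x(s)) = k * s**(1-sigma)/(1-sigma).

def solve_k(mu, sigma, r, iters=2000):
    """mu: dict alpha -> prob; returns the fixed-point constant k."""
    k = 1.0
    for _ in range(iters):
        total = 0.0
        for alpha, p in mu.items():
            # interior optimum: save fraction theta of resources, consume 1 - theta
            q = (alpha * k / (1 - alpha)) ** (1 / sigma)
            theta = q / (1 + q)
            g = (1 - alpha) * (1 - theta) ** (1 - sigma) + alpha * k * theta ** (1 - sigma)
            total += p * g
        k = (1 + r) ** (1 - sigma) * total
    return k

def saving_rate(alpha, k, sigma):
    q = (alpha * k / (1 - alpha)) ** (1 / sigma)
    return q / (1 + q)

sigma, r = 0.5, 0.02
mu_sure = {0.3: 1.0}                 # degenerate belief at 0.3
mu_spread = {0.1: 0.5, 0.5: 0.5}     # mean-preserving spread, same mean 0.3

k1 = solve_k(mu_sure, sigma, r)
k2 = solve_k(mu_spread, sigma, r)
# With sigma < 1, the more uncertain belief induces a higher saving rate.
print(saving_rate(0.3, k1, sigma), saving_rate(0.3, k2, sigma))
```

For the degenerate belief the computed rate matches the closed-form deterministic saving rate α^{1/σ}(1 + r)^{(1−σ)/σ} derived later in this section, and the spread raises the rate at each α when σ < 1.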

Let gµi(s, α) denote the saving function which solves problem (2.10). We are ready to state the main result of this section. A proof is relegated to Section 2.7.10.

Proposition 2.2. Assume that µ1 second-order stochastically dominates µ2 and ᾱ ≡ Eµ1[α] = Eµ2[α] < 1/(1 + r)^{1−σ}. Then:

1. the DM saves a constant fraction of wealth, that is, gµi(s, α) = SRµi(α)(1 + r)s, where the saving rate SRµi(α) ∈ (0, 1) is uniquely determined, and

2. for all α ∈ (0, 1), SRµ1(α) ≶ SRµ2(α) if σ ≶ 1.

Part 1 is a characterization of the saving function, and follows from the assumption that u is a CRRA utility function. Because of part 1, we can focus on the saving rate rather than the saving function to analyze the saving behavior of the DM. Part 2 concerns a comparative analysis. Depending on the size of σ relative to one, the saving rate increases or decreases as the DM becomes more uncertain about discount factors. Part 2 of Proposition 2.2 includes as a special case a comparison of random and deterministic discounting with the same mean. According to the proposition, savings increase or decrease depending on the parameter σ when uncertainty about discount factors is taken into account. That is, observed over-saving or under-saving behavior can be explained by subjective uncertainty about discount factors. Salanié and Treich [47] provide the same observation in a three-period model. To obtain the intuition behind part 2 of Proposition 2.2, imagine the situation where the DM surely believes her discount factor to be the average

discounting ᾱ. Then, she faces the decision problem with no uncertainty,¹³

Vᾱ(s) = max_{(c,s′)∈B(s)} (1 − ᾱ)u(c) + ᾱVᾱ(s′).

Now define φ(µi) as the number satisfying

Vᾱ(φ(µi)s) = ∫ Vµi(s, α′) dµi(α′).    (2.11)

The right-hand side of (2.11) is the expected utility from the savings s. Thus φ(µi)s is the amount of savings in the case of no uncertainty which ensures the same utility level. We call φ(µi) the certainty compensation ratio (CCR) for µi. We show that the CCR always increases as µ becomes more uncertain. Intuitively, this is because savings are more valuable when the DM faces more uncertainty. For an explanation, consider the situation where the DM compares the following two choices: committing to consumption level c in every period, or using savings s to choose how much to consume and save. Then, the DM with a more uncertain µ is more willing to adjust her consumption away from the level c conditional on the realization of discount factors than the DM with a less uncertain µ. Hence the DM facing more uncertainty about future discount factors values the savings more. From (2.11), maximization problem (2.9) is rewritten as

max_{(c,s′)∈B(s)} (1 − α)u(c) + αVᾱ(φ(µi)s′).

That is, increasing uncertainty has the same effect as an increase in the interest rate in the consumption-saving model with no uncertainty. Therefore the substitution and income effects lead to the desired result.

¹³ Notice that Vᾱ(s) and Vµi(s, ᾱ) are distinct functions.

If the DM is sure of her discount factor $\alpha$, it is easy to verify that the saving rate is

$$SR = \alpha^{\frac{1}{\sigma}} (1+r)^{\frac{1-\sigma}{\sigma}}. \qquad (2.12)$$

In the literature on saving under uncertain interest rates, the saving rate increases or decreases with uncertainty depending on whether $\sigma$ is greater or less than one, which is the opposite of part 2 of Proposition 2.2. To understand why this difference arises, let the discount factor $\alpha$ be constant, but let the gross interest rate follow a random distribution $\nu$, as in that literature. Then, by a similar argument, the saving rate is

$$SR_\nu = \alpha^{\frac{1}{\sigma}} (1+r^*)^{\frac{1-\sigma}{\sigma}},$$

where $(1+r^*)$ is the certainty equivalent of the uncertain gross interest rate, that is,

$$(1+r^*) \equiv u^{-1}\left( \int u(1+r)\, d\nu(r) \right) = \left( E_\nu\!\left[(1+r)^{1-\sigma}\right] \right)^{\frac{1}{1-\sigma}}.$$

Since $u$ is concave, the certainty equivalent always decreases as interest rates become more uncertain. Hence increasing uncertainty about interest rates has the same effect as a decrease in the interest rate in the consumption-saving problem with no uncertainty, whereas the CCR increases as discount factors become more uncertain. Thus the substitution and income effects lead to opposite implications in the two cases.
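As a quick numerical check of the certainty-equivalent formula above, the sketch below (a hypothetical two-point distribution for the gross interest rate, CRRA utility with $\sigma = 2$; the parameter values are illustrative, not from the text) verifies that a mean-preserving spread lowers $(1+r^*)$:

```python
# Certainty equivalent of an uncertain gross interest rate under CRRA utility,
# following (1 + r*) = (E[(1+R)^(1-sigma)])^(1/(1-sigma)).
# The two-point distribution below is a hypothetical illustration.

def certainty_equivalent_gross(rates, probs, sigma):
    ev = sum(p * (1.0 + r) ** (1.0 - sigma) for r, p in zip(rates, probs))
    return ev ** (1.0 / (1.0 - sigma))

sigma = 2.0
sure = certainty_equivalent_gross([0.05], [1.0], sigma)              # no uncertainty
spread = certainty_equivalent_gross([0.0, 0.10], [0.5, 0.5], sigma)  # mean-preserving spread
print(sure, spread)  # concavity of u implies spread < sure
```

Because $u$ is concave, the spread case always yields a strictly smaller certainty equivalent than the degenerate case with the same mean.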

Finally, we mention a comparison of expected saving rates. Since the saving rate depends on the realization of the discount factor, the amount of savings varies stochastically according to $\mu^i$. One might be interested in a comparison between the expected saving rates $E_{\mu^1}[SR_{\mu^1}(\alpha)]$ and $E_{\mu^2}[SR_{\mu^2}(\alpha)]$. However, the relation is in general ambiguous, as the following example shows. Assume $\sigma < 1$. Let $\mu^1$ be the degenerate distribution at $\bar{\alpha} \in \left(0, \frac{1}{(1+r)^{1-\sigma}}\right)$. Let $\mu^2$ satisfy $\mu^2(0) = 1 - \bar{\alpha}$ and $\mu^2(1) = \bar{\alpha}$. Then $E_{\mu^2}[\alpha] = \bar{\alpha}$, and $\mu^1$ second-order stochastically dominates $\mu^2$. Since $SR_{\mu^2}(0) = 0$ and $SR_{\mu^2}(1) = 1$,

$$E_{\mu^2}[SR_{\mu^2}(\alpha)] = (1-\bar{\alpha}) \times 0 + \bar{\alpha} \times 1 = \bar{\alpha}.$$

On the other hand, from (2.12), $E_{\mu^1}[SR_{\mu^1}(\alpha)] = \bar{\alpha}^{\frac{1}{\sigma}}(1+r)^{\frac{1-\sigma}{\sigma}}$. Hence,

$$\bar{\alpha} \lesseqgtr \frac{1}{1+r} \;\Rightarrow\; E_{\mu^1}[SR_{\mu^1}(\alpha)] \lesseqgtr E_{\mu^2}[SR_{\mu^2}(\alpha)].$$

That is, even if $\sigma < 1$, the ranking of the expected saving rates varies with the sizes of $\bar{\alpha}$ and $r$.
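The ambiguity in this example can be checked numerically. The following sketch (hypothetical parameter values; the formulas are taken from (2.12) and the two-point construction above) evaluates both expected saving rates on either side of the threshold $\bar{\alpha} = 1/(1+r)$:

```python
# Expected-saving-rate example with sigma < 1.
# mu^1 is degenerate at alpha_bar; mu^2 puts mass 1-alpha_bar on alpha=0
# (where SR=0) and alpha_bar on alpha=1 (where SR=1).
# Parameter values are hypothetical illustrations.

def expected_sr_degenerate(alpha_bar, r, sigma):
    # E[SR] under mu^1, using SR = alpha^(1/sigma) * (1+r)^((1-sigma)/sigma)
    return alpha_bar ** (1.0 / sigma) * (1.0 + r) ** ((1.0 - sigma) / sigma)

def expected_sr_two_point(alpha_bar):
    # E[SR] under mu^2 equals alpha_bar
    return alpha_bar

sigma, r = 0.5, 0.25            # sigma < 1, threshold 1/(1+r) = 0.8
for alpha_bar in (0.5, 0.85):   # one value below the threshold, one above
    print(alpha_bar,
          expected_sr_degenerate(alpha_bar, r, sigma),
          expected_sr_two_point(alpha_bar))
```

With $\bar{\alpha} = 0.5 < 1/(1+r)$ the degenerate belief yields the smaller expected saving rate, while with $\bar{\alpha} = 0.85 > 1/(1+r)$ the ranking reverses, as the implication displayed above predicts.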

2.5.2

Optimal Stopping Problem

We consider optimal stopping problems under sampling with and without recall. In the standard model, the DM never benefits from the option to accept previously rejected offers. In contrast, the DM with random discount factors may strictly desire such an option. The following assumptions are maintained throughout the discussion. The DM has a random discounting representation (u, µ). The instantaneous utility function u satisfies u(0) = 0, u′(·) > 0, and u′′(·) < 0. The support of the belief

$\mu$ is assumed to be contained in $(0, 1/2)$. This assumption means that the DM always puts more weight on the current payoff than on the continuation value. Every period the DM faces an offer $w$ drawn from a known distribution function $F$ that has compact support $[0, \bar{w}]$ and a finite positive mean. There is no cost of sampling. We denote a history of offers and realized discount factors up to period $t$ by $h_t = ((w_0, \alpha_0), (w_1, \alpha_1), \ldots, (w_t, \alpha_t))$.

Sampling without Recall

To begin with, we consider the menu which corresponds to sampling without recall. After the discount factor $\alpha$ is realized, the DM chooses either to accept the current offer $w$ or to continue sampling. That is, she faces the menu $x(w) \equiv \{\text{accept } w, \text{continue}\}$. If the DM accepts $w$, she enjoys the utility from $w$ now and consumes nothing for the rest of her life; that is, she faces the payoff stream $(u(w), 0, 0, \cdots)$. If the DM chooses to continue sampling, she faces in the following period the same decision problem but with a new offer $w'$. Thus "continue" is the lottery over menus of the form $x(w') = \{\text{accept } w', \text{continue}\}$, where $w'$ is distributed according to $F$.¹⁴ The DM evaluates the menu $x(w)$ according to the random discounting representation,

$$U(x(w)) = \int_0^1 \max\left\{ (1-\alpha)u(w),\; \alpha \int_0^{\bar{w}} U(x(w'))\, dF(w') \right\} d\mu(\alpha). \qquad (2.13)$$

After a realization of the discount factor $\alpha$, she gets the payoff $(1-\alpha)u(w)$ by accepting $w$, while by choosing to continue sampling she gets zero utility now and the continuation value $\alpha \int_0^{\bar{w}} U(x(w'))\, dF(w')$.

¹⁴ For example, if the support of $F$ consists of two elements $w$ and $w'$, "continue" is the lottery which gives $x(w)$ with probability $F(w)$ and $x(w')$ with probability $1 - F(w)$.

The strategy of the DM is a choice from {accept $w$, continue} given $(w, \alpha)$. The optimal strategy solves the Bellman equation

$$V(w, \alpha) = \max\left\{ (1-\alpha)u(w),\; \alpha \int_0^{\bar{w}} \int_0^1 V(w', \alpha')\, d\mu(\alpha')\, dF(w') \right\},$$

where $U(x(w')) = \int_0^1 V(w', \alpha')\, d\mu(\alpha')$. After the realizations of $w$ and $\alpha$, the DM solves

$$\max\left\{ (1-\alpha)u(w),\; \alpha \int_0^{\bar{w}} U(x(w'))\, dF(w') \right\}.$$

Since $\alpha \int_0^{\bar{w}} U(x(w'))\, dF(w')$ does not depend on $w$, the optimal strategy has the reservation property. The reservation value $w^*_\alpha$ is defined by

$$(1-\alpha)u(w^*_\alpha) = \alpha \int_0^{\bar{w}} U(x(w'))\, dF(w').$$

The DM accepts $w$ if $w \geq w^*_\alpha$ and continues sampling otherwise.

Proposition 2.3. The optimal strategy has the reservation property, with the reservation values $w^*_\alpha$ satisfying the functional equation

$$(1-\alpha)u(w^*_\alpha) = \alpha \int_0^1 (1-\alpha') \left[ u(w^*_{\alpha'})F(w^*_{\alpha'}) + \int_{w^*_{\alpha'}}^{\bar{w}} u(w')\, dF(w') \right] d\mu(\alpha'). \qquad (2.14)$$

Moreover, $w^*_\alpha$ is increasing in $\alpha$ and non-increasing with respect to second-order stochastic dominance for $\mu$.

Two remarks are in order. Since the reservation values depend on realizations of $\alpha$, they are determined by the functional equation (2.14). This contrasts with the case of deterministic discounting, where the reservation value is characterized by a single equation.
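When $\mu$ has finite support, the functional equation (2.14) can be solved by iterating on the common continuation term in brackets. The sketch below uses an entirely hypothetical specification — $u(w) = \sqrt{w}$, $F$ uniform on $[0,1]$, and a two-point $\mu$ supported on $\{0.2, 0.4\} \subset (0, 1/2)$ — chosen only to illustrate that $w^*_\alpha$ is increasing in $\alpha$:

```python
# Solving the reservation-value functional equation (2.14) by iteration.
# Hypothetical specification: u(w) = sqrt(w), F uniform on [0, 1],
# mu a two-point belief on {0.2, 0.4}, each with probability 1/2.
import math

alphas = [0.2, 0.4]
probs = [0.5, 0.5]

def u(w):
    return math.sqrt(w)

def tail_integral(a):
    # integral of sqrt(w) dw over [a, 1] (the uniform-F tail term in (2.14))
    return (2.0 / 3.0) * (1.0 - a ** 1.5)

def solve_reservation(tol=1e-12, max_iter=10_000):
    w_star = {a: 0.0 for a in alphas}
    for _ in range(max_iter):
        # K is the bracketed continuation term, common to every alpha
        K = sum(p * (1 - ap) * (u(w_star[ap]) * w_star[ap] + tail_integral(w_star[ap]))
                for ap, p in zip(alphas, probs))
        # invert (1 - a) u(w*) = a K  =>  w* = (a K / (1 - a))^2
        new = {a: (a * K / (1 - a)) ** 2 for a in alphas}
        if max(abs(new[a] - w_star[a]) for a in alphas) < tol:
            return new
        w_star = new
    return w_star

w = solve_reservation()
print(w)  # the reservation value is increasing in alpha
```

The patient realization ($\alpha = 0.4$) carries the larger reservation value, as the monotonicity claim in Proposition 2.3 predicts.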

Suppose that the DM was offered $w$ in the previous period but did not accept it. Even if she draws the same offer $w$ again, the DM with random discounting may accept $w$ depending on the current discount factor, whereas the DM with deterministic discounting will never accept it.

Sampling with Recall

We now consider menus which correspond to sampling with recall. Unlike sampling without recall, the DM can take any $w$ offered up to the current period. Let $h^w_t = (w_0, w_1, \ldots, w_t)$ denote a history of offers up to period $t$. Given history $h^w_t$, the DM faces the menu $\tilde{x}(h^w_t) = \{\text{accept } w_0, \cdots, \text{accept } w_t, \text{continue}\}$. Let the menu in period $t+1$ be denoted by $\tilde{x}((h^w_t, w_{t+1}))$. The alternative "continue" is the lottery over $\tilde{x}((h^w_t, w_{t+1}))$ with the distribution $F$. Unlike sampling without recall, if $h^w$ and $h'^w$ are different histories of offers, then "continue" in the menu $\tilde{x}(h^w)$ is a different lottery from that in $\tilde{x}(h'^w)$. The menu $\tilde{x}(h^w_t)$ is evaluated by

$$U(\tilde{x}(h^w_t)) = \int_0^1 \max\left\{ (1-\alpha)u(w_0), \ldots, (1-\alpha)u(w_t),\; \alpha \int_0^{\bar{w}} U(\tilde{x}((h^w_t, w_{t+1})))\, dF(w_{t+1}) \right\} d\mu(\alpha).$$

Given $h^w_t$ and $\alpha$, the strategy of the DM is a choice from $\{\text{accept } w_0, \ldots, \text{accept } w_t, \text{continue}\}$. Define $y_t = \max\{w_0, \ldots, w_t\}$. Because $u' > 0$, the optimal strategy solves the Bellman equation

$$V(y_t, \alpha) = \max\left\{ (1-\alpha)u(y_t),\; \alpha \int_0^{\bar{w}} \int_0^1 V(y_{t+1}, \alpha')\, d\mu(\alpha')\, dF(w_{t+1}) \right\}.$$

Following Kohn and Shavell [33], we show the following proposition. A proof is relegated to Section 2.7.12.

Proposition 2.4. The optimal strategy satisfies the reservation property.

The reservation value $\tilde{w}^*_\alpha$ satisfies

$$(1-\alpha)u(\tilde{w}^*_\alpha) = \alpha \int_0^{\bar{w}} U(\tilde{x}((h^w_{t-1}, \tilde{w}^*_\alpha, w)))\, dF(w).$$

The DM accepts the maximum offer $w$ among the past offers satisfying $w \geq \tilde{w}^*_\alpha$ and continues sampling otherwise. If the DM has a deterministic discount factor, the reservation values under sampling with and without recall are the same and constant over time. Therefore, observed choice behavior is the same in the two setups.¹⁵ In the case of random discounting, however, the reservation values may not coincide between sampling with and without recall. Even if there is an option to accept past offers, the DM with a deterministic discount factor never accepts any offer rejected previously. Thus she is indifferent between sampling with and without recall. In contrast, the DM with random discounting would strictly appreciate such an option, since she may accept an offer rejected previously. Formally:

Proposition 2.5. Suppose that the support of $\mu$ is finite and has at least two points. Then the DM strictly prefers sampling with recall to sampling without recall, that is, $U(\tilde{x}(h^w_{t-1}, w)) > U(x(w))$ for any history $h^w_{t-1}$.

A proof can be found in Section 2.7.13.

¹⁵ For example, see DeGroot [11] for this observation. He also argues that the two setups may lead to different implications if $F$ is a normal distribution with unknown mean and the DM learns about the mean over time by Bayesian updating.

2.6 Conclusion

We provide an axiomatic model in which the DM believes that her future discount factors change randomly over time. For this purpose, an infinite horizon extension of DLR is formulated, and the subjective state space is identified with the set of future discount factors. A subjective belief about discount factors is uniquely derived from preference.

We provide behavioral comparisons regarding preference for flexibility. The first comparison is analogous to DLR. We say that agent 2 desires more flexibility than agent 1 if agent 2 strictly prefers a bigger menu whenever agent 1 does so. This condition implies that agent 2 perceives more contingencies regarding future discount factors than does agent 1. Second, agent 2 is said to be more averse to commitment than agent 1 if agent 2 does not prefer to choose a specific plan (an element of L) over a menu x whenever agent 1 does not. We show that this behavioral comparison characterizes the condition that agent 1's belief about discount factors second-order stochastically dominates agent 2's belief.

The resulting model is applied to a consumption-saving problem. We consider how the saving rate changes if the DM faces greater uncertainty about discount factors, and show that the saving rate increases or decreases depending on the coefficient of relative risk aversion of u. We argue that uncertainty about discount factors has the opposite implication to uncertainty about interest rates. Moreover, the model is applied to optimal stopping problems under sampling with and without recall. We show that the DM with random discount factors may strictly value the option to accept previously rejected offers, whereas the DM with a deterministic discount factor never benefits from that option.

We conclude with some remaining problems. In the current model, the DM's belief about discount factors is constant over time. However, beliefs may be correlated across time. For example, Krusell and Smith [39] assume that discount factors change according to a Markov process. Moreover, beliefs about discount factors may also depend on past consumption levels, as in Uzawa [52], Koopmans [34] and Epstein [17], who study endogenous determination of time preference by allowing dependence on past consumption. As mentioned in Section 2.3.1, the Stationarity axiom excludes all such history dependence of beliefs. The main difficulty in allowing correlation across discount factors is that we cannot start off with preferences conditional on histories of discount factors, because those histories are part of the representation and should be derived rather than assumed as primitives. Thus, given preference on Z, we have to derive all the future preferences from the initial period onward and indirectly impose appropriate restrictions on those preferences to admit the recursive form of random discounting. Without Stationarity, it is not immediate how to find such restrictions.

2.7 Proofs

2.7.1 Hausdorff Metric

Let X be a compact metric space with metric d. Let K(X) be the set of non-empty compact subsets of X. For $x \in X$ and $A, B \in K(X)$, let

$$d(x, B) \equiv \min_{x' \in B} d(x, x'), \qquad d(A, B) \equiv \max_{x \in A} d(x, B).$$

For all $A, B \in K(X)$, define the Hausdorff metric $d_H$ by

$$d_H(A, B) \equiv \max[d(A, B), d(B, A)].$$

Then $d_H$ satisfies (i) $d_H(A,B) \geq 0$, (ii) $A = B \Leftrightarrow d_H(A,B) = 0$, (iii) $d_H(A,B) = d_H(B,A)$, and (iv) $d_H(A,B) \leq d_H(A,C) + d_H(C,B)$. Moreover, K(X) is compact under the Hausdorff metric.
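For finite point sets the definitions above can be computed directly. A minimal sketch (sets of reals with $d(x, x') = |x - x'|$, purely illustrative):

```python
# Hausdorff metric on finite point sets, mirroring the definitions above:
# d(x, B) = min over B, d(A, B) = max over A, d_H = max of the two directions.
def d(p, q):
    return abs(p - q)  # a metric on the reals, for illustration

def dist_point_set(x, B):
    return min(d(x, b) for b in B)

def dist_set_set(A, B):
    return max(dist_point_set(a, B) for a in A)

def hausdorff(A, B):
    return max(dist_set_set(A, B), dist_set_set(B, A))

A, B = [0.0, 1.0], [0.0, 1.0, 3.0]
print(hausdorff(A, B))  # driven by the point 3.0, which is far from A
```

Note that the asymmetric distance $d(A, B)$ alone is not a metric — here $d(A, B) = 0$ while $d(B, A) = 2$ — which is why the symmetrized maximum is taken.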

2.7.2

Perfect Commitment Menus

We follow the construction of menus by Gul and Pesendorfer [27, Appendix A] (hereafter GP) and define the set L of perfect commitment menus. Then we show that L is homeomorphic to ∆(C × L). That is, a perfect commitment menu can be viewed as a multistage lottery. In this section, all the notations are borrowed from GP. We identify a singleton menu with its only element by slightly abusing notation. Let L1 = ∆(C) ⊂ Z1 = K(∆(C)). An element of L1 is a one period “commitment” consumption problem. For t > 1, we define Lt inductively as Lt = ∆(C × Lt−1 ) ⊂ Zt = K(∆(C × Zt−1 )). An element of Lt is a t period “commitment” consumption problem. Let Z be the set of all infinite horizon

consumption problems defined in GP. Finally, define $L = Z \cap \times_{t=1}^\infty L_t$. An element of L is a menu in which the agent is committed in every period.

Proposition 2.6. L is homeomorphic to $\Delta(C \times L)$.

Proof. GP construct a homeomorphism $f: Z \to K(\Delta(C \times Z))$. Note that L is compact since $L_t$ is compact for every t. It suffices to check that $f(L) = \Delta(C \times L)$. Let $\psi: Y^{kc} \to \Delta(C \times Z^*)$, $\varphi: M^c \to Y^c$, and $\xi: Z \to K(M^c)$ be the homeomorphisms defined in GP. Note that, by definition, $M^c \cap (\Delta(C) \times_{t=1}^\infty \Delta(C \times L_t)) = L$, and $\xi$ is the identity on L. Since $f(L) = \psi \circ \varphi(\xi(L)) = \psi \circ \varphi(L)$, we show that $\psi \circ \varphi(L) = \Delta(C \times L)$.

Definition 2.7. Let $\tilde{L}_1 = \Delta(C)$ and $\tilde{L}_t = \Delta(C \times L_1 \times \cdots \times L_{t-1})$ for $t > 1$. Denote $\tilde{L}^{kc} = Y^{kc} \cap \times_{t=1}^\infty \tilde{L}_t$.

Step 1: $\psi(\tilde{L}^{kc}) = \Delta(C \times L^*)$.

Note that, for a sequence $\{\tilde{l}_t\} \in \tilde{L}^{kc}$, it holds that $\mathrm{marg}_{C, L_1, \ldots, L_{t-1}} \tilde{l}_{t+1} = \mathrm{marg}_{C, Z_1, \ldots, Z_{t-1}} \tilde{l}_{t+1} = \tilde{l}_t$. The same argument as for Lemma 3 in GP shows that there exists a homeomorphism $\psi': \tilde{L}^{kc} \to \Delta(C \times L^*)$ such that $\mathrm{marg}_C\, \psi(\{\tilde{l}_t\}) = \tilde{l}_1$ and $\mathrm{marg}_{C, \ldots, L_{t-1}}\, \psi(\{\tilde{l}_t\}) = \tilde{l}_t$. Then Step 1 follows from the uniqueness part of Kolmogorov's Existence Theorem.

Definition 2.8. Let $D_t^L = D_t \cap \times_{n=1}^t L_n$ and $\tilde{L}^c = Y^c \cap \times_{t=1}^\infty \tilde{L}_t$.

Step 2: $\varphi(L) = \tilde{L}^c$.

It is straightforward from the definition of $\varphi$ that $\varphi(L) \supset \tilde{L}^c$. We show $\varphi(L) \subset \tilde{L}^c$, that is, $\varphi(L) \subset \times_{t=1}^\infty \tilde{L}_t$, by mathematical induction. Let $\{l_t\} \in L$ and $\{\tilde{\mu}_t\} = \varphi(\{l_t\}) \in Y^c$. By definition, $\tilde{\mu}_1 = l_1 \in \Delta(C)$ and $\tilde{\mu}_2 = l_2 \in \Delta(C \times L_1)$. Suppose that $\tilde{\mu}_k \in \tilde{L}_k$ for every $k = 1, 2, \ldots, t$. Since $\{\tilde{\mu}_t\}$ is a Kolmogorov consistent sequence, $\mathrm{marg}_{C, \ldots, Z_{t-1}} \tilde{\mu}_{t+1} = \tilde{\mu}_t \in \tilde{L}_t$. Thus $\tilde{\mu}_{t+1} \in \Delta(C \times L_1 \times \cdots \times L_{t-1} \times Z_t)$. The definition of $\varphi$ implies that $\mathrm{marg}_{C, Z_{t-1}} \tilde{\mu}_{t+1} = l_t \in \Delta(C \times L_t)$. Therefore $\tilde{\mu}_{t+1} \in \Delta(C \times L_1 \times \cdots \times L_{t-1} \times L_t) = \tilde{L}_{t+1}$.

Step 3: $\psi(\tilde{L}^c) = \{\tilde{l} \in \Delta(C \times L^*) \mid \tilde{l}(C \times L) = 1\}$.

Since $\tilde{L}^c = \left\{ \{\tilde{l}_t\} \in \tilde{L}^{kc} \,\middle|\, \tilde{l}_{t+1}(C \times D_t^L) = 1, \; \forall t \geq 1 \right\}$, Step 3 follows from the same argument as for Lemma 5 in GP.

2.7.3

Preliminaries

Recall

$$O(x) \equiv \bigcup_{l \in x} \left\{ l' \in \Delta(C \times Z) \,\middle|\, \{l_c \otimes l'_z\} \succsim \{l'_c \otimes l'_z\}, \; \{l'_c \otimes l_z\} \succsim \{l'_c \otimes l'_z\} \right\},$$

where $l_c$ and $l_z$ ($l'_c$ and $l'_z$) are the marginal distributions of $l$ ($l'$) on C and on Z, respectively.

Proposition 2.7. Assume $\succsim$ satisfies Order and Continuity.

1. For any $x \in Z$, $O(x) \in Z$.

2. In addition, assume $\succsim$ satisfies Independence, Separability and Marginal Dominance. If x is convex, so is O(x); and

3. $O: Z \to Z$ is Hausdorff continuous.

Proof. 1. Since $\Delta(C \times Z)$ is compact, it suffices to show that O(x) is a closed subset of $\Delta(C \times Z)$. Let $l^n \to l$ with $l^n \in O(x)$. By definition, there exists a sequence $\{\bar{l}^n\}$ with $\bar{l}^n \in x$ such that $\{\bar{l}^n_c \otimes l^n_z\} \succsim \{l^n_c \otimes l^n_z\}$ and $\{l^n_c \otimes \bar{l}^n_z\} \succsim \{l^n_c \otimes l^n_z\}$. Since x is compact, without loss of generality we can assume that $\{\bar{l}^n\}$ converges to a limit $\bar{l} \in x$. Since $l^n_c \to l_c$, $l^n_z \to l_z$, $\bar{l}^n_c \to \bar{l}_c$ and $\bar{l}^n_z \to \bar{l}_z$, Continuity implies $\{\bar{l}_c \otimes l_z\} \succsim \{l_c \otimes l_z\}$ and $\{l_c \otimes \bar{l}_z\} \succsim \{l_c \otimes l_z\}$. Hence $l \in O(x)$.

2. Take $l, l' \in O(x)$ and $\lambda \in [0,1]$. Let $l^\lambda \equiv \lambda l + (1-\lambda)l'$. We want to show $l^\lambda \in O(x)$. By definition, there exist $\bar{l}, \bar{l}' \in x$ such that $\{\bar{l}_c \otimes l_z\} \succsim \{l_c \otimes l_z\}$, $\{l_c \otimes \bar{l}_z\} \succsim \{l_c \otimes l_z\}$, $\{\bar{l}'_c \otimes l'_z\} \succsim \{l'_c \otimes l'_z\}$, and $\{l'_c \otimes \bar{l}'_z\} \succsim \{l'_c \otimes l'_z\}$. Let $\bar{l}^\lambda \equiv \lambda\bar{l} + (1-\lambda)\bar{l}' \in x$. From Independence,

$$\lambda\{\bar{l}_c \otimes l_z\} + (1-\lambda)\{\bar{l}'_c \otimes l'_z\} \succsim \lambda\{l_c \otimes l_z\} + (1-\lambda)\{l'_c \otimes l'_z\},$$

equivalently,

$$\{\lambda\, \bar{l}_c \otimes l_z + (1-\lambda)\, \bar{l}'_c \otimes l'_z\} \succsim \{\lambda\, l_c \otimes l_z + (1-\lambda)\, l'_c \otimes l'_z\}.$$

Since $O(\{l_c \otimes l_z\}) = O(\{l\})$, Marginal Dominance implies $\{l_c \otimes l_z\} \sim \{l\}$. For the same reason, $\{l'_c \otimes l'_z\} \sim \{l'\}$, $\{l^\lambda_c \otimes l^\lambda_z\} \sim \{l^\lambda\}$, and $\{(\lambda\bar{l}_c + (1-\lambda)\bar{l}'_c) \otimes (\lambda l_z + (1-\lambda)l'_z)\} \sim \{\lambda\, \bar{l}_c \otimes l_z + (1-\lambda)\, \bar{l}'_c \otimes l'_z\}$. Thus,

$$\{\bar{l}^\lambda_c \otimes l^\lambda_z\} = \{(\lambda\bar{l} + (1-\lambda)\bar{l}')_c \otimes (\lambda l + (1-\lambda)l')_z\} = \{(\lambda\bar{l}_c + (1-\lambda)\bar{l}'_c) \otimes (\lambda l_z + (1-\lambda)l'_z)\} \sim \{\lambda\, \bar{l}_c \otimes l_z + (1-\lambda)\, \bar{l}'_c \otimes l'_z\} \succsim \{\lambda\, l_c \otimes l_z + (1-\lambda)\, l'_c \otimes l'_z\} \sim \{\lambda l + (1-\lambda)l'\} \sim \{l^\lambda_c \otimes l^\lambda_z\}.$$

Similarly, $\{l^\lambda_c \otimes \bar{l}^\lambda_z\} \succsim \{l^\lambda_c \otimes l^\lambda_z\}$. Hence $l^\lambda \in O(x)$.

3. Let $x^n \to x$. We want to show $O(x^n) \to O(x)$. Since Z is compact, the sequence $\{O(x^n)\}_{n=1}^\infty$ has a convergent subsequence $\{O(x^m)\}_{m=1}^\infty$ with limit $y \in Z$. It suffices to show that $O(x) = y$.

Step 1: $O(x) \subset y$.

Take any $l \in O(x)$. Then there exists $\bar{l} \in x$ such that $\{\bar{l}_c \otimes l_z\} \succsim \{l_c \otimes l_z\}$ and $\{l_c \otimes \bar{l}_z\} \succsim \{l_c \otimes l_z\}$. Since $x^m \to x$, we can find a sequence $\{\bar{l}^m\}_{m=1}^\infty$ such that $\bar{l}^m \in x^m$ and $\bar{l}^m \to \bar{l}$.

Now we construct a sequence $\{l^m\}_{m=1}^\infty$ with $l^m \in O(x^m)$ satisfying $l^m \to l$. First consider the condition $\{\bar{l}_c \otimes l_z\} \succsim \{l_c \otimes l_z\}$. There are two cases: (1) $\{\bar{l}_c \otimes l_z\} \succ \{l_c \otimes l_z\}$, and (2) $\{\bar{l}_c \otimes l_z\} \sim \{l_c \otimes l_z\}$. In case (1), Continuity implies $\{\bar{l}^m_c \otimes l_z\} \succ \{l_c \otimes l_z\}$ for all m sufficiently large. Hence define $l^m_c$ by $l_c$ for all sufficiently large m; otherwise $l^m_c$ can be taken to be the marginal of an arbitrary element of $O(x^m)$. In case (2), define $l^m_c \equiv l_c$ as long as $\{\bar{l}^m_c \otimes l_z\} \succsim \{l_c \otimes l_z\} \sim \{\bar{l}_c \otimes l_z\}$. Otherwise, let $k \geq 1$ be the first natural number satisfying $\{\bar{l}_c \otimes l_z\} \sim \{l_c \otimes l_z\} \succ \{\bar{l}^k_c \otimes l_z\}$. Let $l_c(\lambda) \equiv \lambda l_c + (1-\lambda)\underline{l}_c$. Since $\bar{l}^m_c \to \bar{l}_c$, Continuity ensures that, for all m large enough, there is $\lambda^m \in [0,1]$ such that $\{l_c(\lambda^m) \otimes l_z\} \sim \{\bar{l}^m_c \otimes l_z\}$. Define $l^m_c \equiv l_c(\lambda^m)$. Thus, in any case, we can find a sequence $\{l^m_c\}_{m=1}^\infty$ converging to $l_c$ such that $\{\bar{l}^m_c \otimes l_z\} \succsim \{l^m_c \otimes l_z\}$.

In the same way, $\{l^m_z\}_{m=1}^\infty$ can be constructed so as to satisfy $\{l^m_c \otimes \bar{l}^m_z\} \succsim \{l^m_c \otimes l^m_z\}$ and $l^m_z \to l_z$. From Separability, we have $\{\bar{l}^m_c \otimes l^m_z\} \succsim \{l^m_c \otimes l^m_z\}$ and $\{l^m_c \otimes \bar{l}^m_z\} \succsim \{l^m_c \otimes l^m_z\}$.

Take a sequence $\{l^m\}_{m=1}^\infty$ converging to l such that the marginals of $l^m$ on C and on Z coincide with $l^m_c$ and $l^m_z$, respectively. By construction, the sequence satisfies $l^m \in O(x^m)$. Since $l^m \to l$ and $O(x^m) \to y$ with $l^m \in O(x^m)$, we have $l \in y$. Thus $O(x) \subset y$.

Step 2: $y \subset O(x)$.

Take any $l \in y$. Since $O(x^n) \to y$, we can find a sequence $l^n \in O(x^n)$ with $l^n \to l$. By definition, there is $\bar{l}^n \in x^n$ such that $\{\bar{l}^n_c \otimes l^n_z\} \succsim \{l^n_c \otimes l^n_z\}$ and $\{l^n_c \otimes \bar{l}^n_z\} \succsim \{l^n_c \otimes l^n_z\}$. Since $\Delta(C \times Z)$ is compact, we can assume $\{\bar{l}^n\}$ converges to a limit $\bar{l} \in \Delta(C \times Z)$. Since $\bar{l}^n \to \bar{l}$ and $x^n \to x$ with $\bar{l}^n \in x^n$, we have $\bar{l} \in x$. From Continuity, $\{\bar{l}_c \otimes l_z\} \succsim \{l_c \otimes l_z\}$ and $\{l_c \otimes \bar{l}_z\} \succsim \{l_c \otimes l_z\}$. Thus $l \in O(x)$.


2.7.4

Proof of Theorem 2.1

Necessity

Necessity of the axioms is routine. We show that for any (u, µ) there exists U satisfying the functional equation. Let $\mathcal{U}$ be the Banach space of all real-valued continuous functions on Z with the sup-norm metric. Define the operator $T: \mathcal{U} \to \mathcal{U}$ by

$$T(U)(x) \equiv \int_{[0,1]} \max_{l \in x} \left( (1-\alpha)u(l_c) + \alpha \int_Z U(z)\, dl_z \right) d\mu(\alpha).$$

Since T(U) is continuous, the operator T is well-defined. To show that T is a contraction mapping, it suffices to verify that (i) T is monotonic, that is, $T(U) \geq T(V)$ whenever $U \geq V$, and (ii) T satisfies the discounting property, that is, there exists $\delta \in [0,1)$ such that, for any U and any constant $c \in \mathbb{R}$, $T(U + c) = T(U) + \delta c$.

Step 1: T is monotonic.

Take any $U, V \in \mathcal{U}$ with $U \geq V$. Fix $x \in Z$ and $\alpha \in [0,1]$ arbitrarily. Let $l^* \in x$ be a maximizer of

$$\max_{l \in x} (1-\alpha)u(l_c) + \alpha \int_Z V(z)\, dl_z.$$

Since $U \geq V$,

$$\max_{l \in x} \left( (1-\alpha)u(l_c) + \alpha \int_Z V(z)\, dl_z \right) = (1-\alpha)u(l^*_c) + \alpha \int_Z V(z)\, dl^*_z \leq (1-\alpha)u(l^*_c) + \alpha \int_Z U(z)\, dl^*_z \leq \max_{l \in x} \left( (1-\alpha)u(l_c) + \alpha \int_Z U(z)\, dl_z \right).$$

Since this inequality holds for any $\alpha \in [0,1]$, we have $T(U) \geq T(V)$.

Step 2: T satisfies the discounting property.

Let $\delta \equiv \bar{\alpha}$. By assumption, $\delta \in [0,1)$. For any $U \in \mathcal{U}$ and $c \in \mathbb{R}$,

$$T(U+c) = \int_{[0,1]} \max_{l \in x} \left( (1-\alpha)u(l_c) + \alpha \int_Z (U(z)+c)\, dl_z \right) d\mu(\alpha) = \int_{[0,1]} \max_{l \in x} \left( (1-\alpha)u(l_c) + \alpha \int_Z U(z)\, dl_z + \alpha c \right) d\mu(\alpha) = T(U) + \bar{\alpha}c = T(U) + \delta c.$$

By Steps 1 and 2, T is a contraction mapping. Thus the fixed point theorem (see Bertsekas and Shreve [5, p. 55]) ensures that there exists a unique $U^* \in \mathcal{U}$ satisfying $U^* = T(U^*)$. This $U^*$ satisfies equation (2.1).

Sufficiency

Lemma 2.1. Independence, Stationarity, and Timing Indifference imply that $x \succ y \Rightarrow \lambda x + (1-\lambda)z \succ \lambda y + (1-\lambda)z$, for all $x, y, z \in Z$ and $\lambda \in (0,1)$.

Proof. Let $x \succ y$. From Stationarity, $\{(c,x)\} \succ \{(c,y)\}$. For any $\lambda \in (0,1)$, Independence implies $\{\lambda \circ (c,x) + (1-\lambda) \circ (c,z)\} \succ \{\lambda \circ (c,y) + (1-\lambda) \circ (c,z)\}$. From Timing Indifference, $\{(c, \lambda x + (1-\lambda)z)\} \succ \{(c, \lambda y + (1-\lambda)z)\}$. Again from Stationarity, $\lambda x + (1-\lambda)z \succ \lambda y + (1-\lambda)z$.

Let co(x) denote the closed convex hull of x. As in DLR, Order, Continuity and Lemma 2.1 imply $x \sim \mathrm{co}(x)$. Hence we can restrict attention to the subdomain $Z_1 \equiv \{x \in Z \mid x = \mathrm{co}(x)\}$.
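Steps 1 and 2 above are Blackwell's sufficient conditions, so iterating T from any starting point converges geometrically at rate $\delta = \bar{\alpha}$. The toy sketch below (an entirely hypothetical two-menu problem with deterministic continuations, not from the text) iterates an operator of exactly this form to its fixed point:

```python
# Toy illustration of the contraction argument: the operator
#   T(U)(x) = sum_alpha mu(alpha) * max_{l in x} [(1-alpha)*u_c(l) + alpha*U(z(l))]
# is monotone and discounts with delta = E[alpha] = 0.3 < 1, so value
# iteration converges geometrically. Menus and payoffs are hypothetical.

mu = {0.2: 0.5, 0.4: 0.5}  # belief over discount factors, mean 0.3

# each menu: list of (current utility, index of the continuation menu)
menus = {
    0: [(1.0, 0), (0.5, 1)],
    1: [(0.0, 0), (0.8, 1)],
}

def T(U):
    return {
        x: sum(p * max((1 - a) * uc + a * U[z] for uc, z in ls)
               for a, p in mu.items())
        for x, ls in menus.items()
    }

U = {0: 0.0, 1: 0.0}
for _ in range(200):      # error shrinks like 0.3**n
    U = T(U)

fixed = T(U)
print(U, max(abs(fixed[x] - U[x]) for x in U))  # residual is numerically zero
```

For this specification the fixed point can be solved by hand — $U(0) = 1.0$ and $U(1) = 0.8$ — and the iteration reproduces it, illustrating the uniqueness guaranteed by the contraction property.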

Since $Z_1$ is a mixture space, Order, Continuity and the property in Lemma 2.1 ensure that $\succsim$ can be represented by a mixture linear function $U: Z_1 \to \mathbb{R}$. Nondegeneracy implies U is not constant. Since $C \times Z$ is compact, there exist a maximal element $\bar{l}$ and a minimal element $\underline{l}$ with respect to U. Without loss of generality, assume $U(\{\bar{l}\}) = 1$ and $U(\{\underline{l}\}) = 0$. For any $l \in \Delta(C \times Z)$, let $l_c$ and $l_z$ be the marginal distributions of l on C and on Z, respectively. Define $u: \Delta(C) \to \mathbb{R}$ and $W: \Delta(Z) \to \mathbb{R}$ by $u(l_c) \equiv U(\{l_c \otimes \underline{l}_z\})$ and $W(l_z) \equiv U(\{\underline{l}_c \otimes l_z\})$.

Lemma 2.2.

1. For any $l_c, l'_c \in \Delta(C)$ and $l_z \in \Delta(Z)$, $u(l_c) \geq u(l'_c) \Leftrightarrow U(\{l_c \otimes l_z\}) \geq U(\{l'_c \otimes l_z\})$. Similarly, for any $l_z, l'_z \in \Delta(Z)$ and $l_c \in \Delta(C)$, $W(l_z) \geq W(l'_z) \Leftrightarrow U(\{l_c \otimes l_z\}) \geq U(\{l_c \otimes l'_z\})$.

2. u and W are mixture linear.

Proof. 1. Consider the restriction of U to $\Delta(C \times Z)$. Let $U(c,z) \equiv U(\delta_{(c,z)})$, where $\delta_{(c,z)}$ denotes the degenerate probability measure at (c,z).

Step 1: There exist $u: C \to \mathbb{R}$ and $W: Z \to \mathbb{R}$ such that $U(c,z) = u(c) + W(z)$.

Since

$$O\left( \tfrac{1}{2} \circ (c,z) + \tfrac{1}{2} \circ (c',z') \right) = O\left( \tfrac{1}{2} \circ (c',z) + \tfrac{1}{2} \circ (c,z') \right),$$

Marginal Dominance implies

$$U\left( \tfrac{1}{2} \circ (c,z) + \tfrac{1}{2} \circ (c',z') \right) = U\left( \tfrac{1}{2} \circ (c',z) + \tfrac{1}{2} \circ (c,z') \right).$$

From Step 1, for any l ∈ ∆(C × Z), U (l) =



U(c, z)dl(c, z) =



(u(c)+W (z))dl(c, z) =



u(c)dlc (c)+ W (z)dlz (z).

Thus, u(lc ) ≥ u(lc′ ) ⇔ U({lc ⊗ lz }) ≥ U ({lc′ ⊗ lz }) ′ ⇔ u(c)dlc (c) + W (z)dlz (z) ≥ u(c)dlc (c) + W (z)dlz (z) ⇔ u(c)dlc (c) ≥ u(c)dlc′ (c) ′ ⇔ u(c)dlc (c) + W (z)dlz (z) ≥ u(c)dlc (c) + W (z)dlz (z) ⇔ U({lc ⊗ lz }) ≥ U ({lc′ ⊗ lz }).

The symmetric argument works for W . 2. We want to show u(λlc + (1 − λ)lc′ ) = λu(lc ) + (1 − λ)u(lc′ ) for any lc , lc′ and λ ∈ [0, 1]. Since O({(λlc + (1 − λ)lc′ ) ⊗ lz }) = O({λlc ⊗ lz + (1 − λ)lc′ ⊗ lz }), Marginal Dominance implies U ({(λlc + (1 − λ)lc′ ) ⊗ lz }) = U({λlc ⊗ lz + (1 − λ)lc′ ⊗ lz }).

Since $U(\{\cdot\})$ is mixture linear,

$$u(\lambda l_c + (1-\lambda)l'_c) = U(\{(\lambda l_c + (1-\lambda)l'_c) \otimes \underline{l}_z\}) = U(\{\lambda\, l_c \otimes \underline{l}_z + (1-\lambda)\, l'_c \otimes \underline{l}_z\}) = \lambda U(\{l_c \otimes \underline{l}_z\}) + (1-\lambda)U(\{l'_c \otimes \underline{l}_z\}) = \lambda u(l_c) + (1-\lambda)u(l'_c).$$

By the symmetric argument, W is mixture linear.

From Marginal Dominance, $x \sim O(x)$. Hence we can restrict attention to the sub-domain $Z_2 \equiv \{x \in Z_1 \mid x = O(x)\}$. From part 3 of Proposition 2.7, $Z_2$ is compact. Moreover, parts 1 and 2 of Proposition 2.7 imply that any $x \in Z_2$ is compact and convex. For each $x \in Z_2$ and $\alpha \in [0,1]$, define

$$\sigma_x(\alpha) \equiv \max_{l \in x} \left( (1-\alpha)u(l_c) + \alpha W(l_z) \right). \qquad (2.15)$$

Let C([0,1]) be the set of real-valued continuous functions on [0,1] with the sup-norm. Formula (2.15) defines a mapping $\sigma: Z_2 \to C([0,1])$.

Lemma 2.3.

1. σ is continuous.

2. For all $x, y \in Z_2$ and $\lambda \in [0,1]$, $\lambda\sigma_x + (1-\lambda)\sigma_y = \sigma_{O(\lambda x + (1-\lambda)y)}$.

3. σ is injective.

Proof. 1. Let $V(x) \equiv \{(u,w) \mid u = u(l_c), w = W(l_z), l \in x\} \subset \mathbb{R}^2$. Since u and W are continuous and $C \times Z$ is compact, there exists a compact set $L \subset \mathbb{R}^2$ such that $V(x) \subset L$ for all x. Hence V(x) is also compact and,

moreover, convex because u and W are mixture linear. Let K(L) be the set of non-empty compact subsets of L with the Hausdorff metric.

Step 1: The map $V: Z_2 \ni x \mapsto V(x) \in K(L)$ is Hausdorff continuous.

Take a sequence $x^n \to x$ with $x^n, x \in Z_2$. We want to show that $V(x^n) \to V(x)$. By contradiction, suppose otherwise. Then there exists a neighborhood U of V(x) such that $V(x^m) \notin U$ for infinitely many m. Let $\{x^m\}_{m=1}^\infty$ be the corresponding subsequence of $\{x^n\}_{n=0}^\infty$. Since $x^n \to x$, $\{x^m\}_{m=1}^\infty$ also converges

to x. Since $\{V(x^m)\}_{m=1}^\infty$ is a sequence in the compact metric space K(L), there exists a convergent subsequence $\{V(x^l)\}_{l=1}^\infty$ with a limit $z \neq V(x)$. As a result, we now have $x^l \to x$ and $V(x^l) \to z$. In the following argument we show that $z = V(x)$, which is a contradiction.

Take any $(u,w) \in V(x)$. There exists $l \in x$ such that $u = u(l_c)$ and $w = W(l_z)$. Since $x^l \to x$, we can find $\{l^l\}_{l=1}^\infty$ such that $l^l \to l$ with $l^l \in x^l$. Let $(u^l, w^l) \equiv (u(l^l_c), W(l^l_z)) \in V(x^l)$. The conditions $(u^l, w^l) \to (u,w)$ and $V(x^l) \to z$ with $(u^l, w^l) \in V(x^l)$ imply $(u,w) \in z$. Thus $V(x) \subset z$.

For the other direction, take any $(u,w) \in z$. Since $V(x^l) \to z$, we can find $\{(u^l, w^l)\}_{l=1}^\infty$ such that $(u^l, w^l) \to (u,w)$ with $(u^l, w^l) \in V(x^l)$. There exists $l^l \in x^l$ satisfying $(u^l, w^l) = (u(l^l_c), W(l^l_z))$. Since $\Delta(C \times Z)$ is compact, there exists a convergent subsequence $\{l^k\}_{k=0}^\infty$ with a limit l. By continuity of u and W, $(u(l_c), W(l_z)) = (u,w)$. Moreover, since $l^k \to l$ and $x^k \to x$ with $l^k \in x^k$, we have $l \in x$. Thus $(u,w) \in V(x)$, which implies $z \subset V(x)$.

Step 2: $d_{\sup}(\sigma_x, \sigma_y) \leq d_{\mathrm{Hausdorff}}(V(x), V(y))$.

For any $\alpha \in [0,1]$, by definition,

$$|\sigma_x(\alpha) - \sigma_y(\alpha)| = \left| \max_{l \in x}\left((1-\alpha)u(l_c) + \alpha W(l_z)\right) - \max_{h \in y}\left((1-\alpha)u(h_c) + \alpha W(h_z)\right) \right| = \left| \max_{(u,w) \in V(x)}\left((1-\alpha)u + \alpha w\right) - \max_{(u,w) \in V(y)}\left((1-\alpha)u + \alpha w\right) \right|.$$

Let $(u^{\alpha x}, w^{\alpha x}) \in V(x)$ and $(u^{\alpha y}, w^{\alpha y}) \in V(y)$ be maximizers of the respective maximization problems. Without loss of generality, assume $(1-\alpha)u^{\alpha x} + \alpha w^{\alpha x} \geq (1-\alpha)u^{\alpha y} + \alpha w^{\alpha y}$. Let $H^{\alpha y} \equiv \{(u,w) \mid (1-\alpha)u + \alpha w = (1-\alpha)u^{\alpha y} + \alpha w^{\alpha y}\}$ and let $(u^*, w^*) \in H^{\alpha y}$ be a point solving

$$\min_{(u,w) \in H^{\alpha y}} \left\| (u,w) - (u^{\alpha x}, w^{\alpha x}) \right\|.$$

Then, by the Schwarz inequality,

$$\left| \max_{(u,w) \in V(x)}\left((1-\alpha)u + \alpha w\right) - \max_{(u,w) \in V(y)}\left((1-\alpha)u + \alpha w\right) \right| = \left|\left((1-\alpha)u^{\alpha x} + \alpha w^{\alpha x}\right) - \left((1-\alpha)u^{\alpha y} + \alpha w^{\alpha y}\right)\right| = \left|\left((1-\alpha)u^{\alpha x} + \alpha w^{\alpha x}\right) - \left((1-\alpha)u^* + \alpha w^*\right)\right| = \left|(1-\alpha)(u^{\alpha x} - u^*) + \alpha(w^{\alpha x} - w^*)\right| \leq \left\|(u^{\alpha x} - u^*, w^{\alpha x} - w^*)\right\| \left\|(1-\alpha, \alpha)\right\| \leq \left\|(u^{\alpha x} - u^*, w^{\alpha x} - w^*)\right\| \leq \min_{(u,w) \in V(y)} \left\|(u^{\alpha x}, w^{\alpha x}) - (u,w)\right\| \leq d_{\mathrm{Hausdorff}}(V(x), V(y)).$$

Since this inequality holds for all α,

$$d_{\sup}(\sigma_x, \sigma_y) = \sup_{\alpha \in [0,1]} |\sigma_x(\alpha) - \sigma_y(\alpha)| \leq d_{\mathrm{Hausdorff}}(V(x), V(y)).$$

From Steps 1 and 2, σ is continuous.

2. Fix $\alpha \in [0,1]$. Let $l^x \in x$ and $l^y \in y$ satisfy

$$(1-\alpha)u(l^x_c) + \alpha W(l^x_z) = \max_{l \in x}\left((1-\alpha)u(l_c) + \alpha W(l_z)\right), \qquad (1-\alpha)u(l^y_c) + \alpha W(l^y_z) = \max_{l \in y}\left((1-\alpha)u(l_c) + \alpha W(l_z)\right).$$

Since $\lambda l^x + (1-\lambda)l^y \in \lambda x + (1-\lambda)y$, mixture linearity of u and W implies

$$\lambda\sigma_x(\alpha) + (1-\lambda)\sigma_y(\alpha) = \lambda\left((1-\alpha)u(l^x_c) + \alpha W(l^x_z)\right) + (1-\lambda)\left((1-\alpha)u(l^y_c) + \alpha W(l^y_z)\right) = (1-\alpha)u(\lambda l^x_c + (1-\lambda)l^y_c) + \alpha W(\lambda l^x_z + (1-\lambda)l^y_z) = \max_{l \in \lambda x + (1-\lambda)y}\left((1-\alpha)u(l_c) + \alpha W(l_z)\right) = \max_{l \in O(\lambda x + (1-\lambda)y)}\left((1-\alpha)u(l_c) + \alpha W(l_z)\right) = \sigma_{O(\lambda x + (1-\lambda)y)}(\alpha).$$

3. Take $x, x' \in Z_2$ with $x \neq x'$. Assume $x \not\subset x'$.¹⁶ Take $l \in x \setminus x'$. Let $u = u(l_c)$ and $w = W(l_z)$, and let $V' \equiv \{(u,w) \mid u = u(l_c), w = W(l_z), l \in x'\} \subset \mathbb{R}^2$.

Step 1: $\left(\{(u,w)\} + \mathbb{R}^2_+\right) \cap V' = \emptyset$.

Suppose otherwise. Then there exists $l' \in x'$ such that $u \leq u(l'_c)$ and $w \leq W(l'_z)$. From part 1 of Lemma 2.2, $U(\{l'_c \otimes l_z\}) \geq U(\{l_c \otimes l_z\})$ and $U(\{l_c \otimes l'_z\}) \geq U(\{l_c \otimes l_z\})$. Thus $l \in O(\{l'\}) \subset O(x')$. Since $O(x') = x'$, this is a contradiction.

By the separating hyperplane theorem, there exist $\alpha \in [0,1]$ and $\gamma \in \mathbb{R}$ such that $(1-\alpha)u + \alpha w > \gamma > (1-\alpha)u' + \alpha w'$ for all $(u', w') \in V'$. Equivalently, $(1-\alpha)u(l_c) + \alpha W(l_z) > \gamma > (1-\alpha)u(l'_c) + \alpha W(l'_z)$ for all $l' \in x'$. Hence,

$$\sigma_x(\alpha) = \max_{l \in x}\left((1-\alpha)u(l_c) + \alpha W(l_z)\right) \geq (1-\alpha)u(l_c) + \alpha W(l_z) > \max_{l' \in x'}\left((1-\alpha)u(l'_c) + \alpha W(l'_z)\right) = \sigma_{x'}(\alpha).$$

Therefore $\sigma_x \neq \sigma_{x'}$.

Let $C \subset C([0,1])$ be the range of σ.

Lemma 2.4.

¹⁶ A similar argument works when $x' \not\subset x$.

1. C is convex.

2. The zero function is in C. 3. The constant function equal to a positive number c > 0 is in C. 4. The supremum of any two points σ, σ ′ ∈ C is also in C. That is, max[σ(α), σ ′ (α)] is also in C. 5. For all f ∈ C, f ≥ 0.

Proof. 1. Take any $f, f' \in C$ and $\lambda \in [0,1]$. There are $x, x' \in Z_2$ satisfying $f = \sigma_x$ and $f' = \sigma_{x'}$. From part 2 of Lemma 2.3, $\lambda f + (1-\lambda)f' = \lambda\sigma_x + (1-\lambda)\sigma_{x'} = \sigma_{O(\lambda x + (1-\lambda)x')} \in C$. Hence C is convex.

2. Let $x \equiv O(\{\underline{l}\}) \in Z_2$. Then, for all α,

$$\sigma_x(\alpha) = \max_{l \in O(\{\underline{l}\})} (1-\alpha)u(l_c) + \alpha W(l_z) = (1-\alpha)u(\underline{l}_c) + \alpha W(\underline{l}_z) = 0.$$

3. Recall that $\bar{l}$ is a maximal element with respect to $U(\{\cdot\})$. Without loss of generality, assume $u(\bar{l}_c) \geq W(\bar{l}_z)$. From Nondegeneracy, there exists $l^*_z$ such that $W(l^*_z) > W(\underline{l}_z) = 0$. Since $u(\bar{l}_c) \geq W(l^*_z) > 0 = u(\underline{l}_c)$, continuity of u implies that there exists $l^*_c$ such that $u(l^*_c) = W(l^*_z)$. Let $c \equiv W(l^*_z) > 0$ and $x \equiv O(\{l^*_c \otimes l^*_z\}) \in Z_2$. Then, for all α,

$$\sigma_x(\alpha) = \max_{l \in O(\{l^*_c \otimes l^*_z\})} (1-\alpha)u(l_c) + \alpha W(l_z) = (1-\alpha)u(l^*_c) + \alpha W(l^*_z) = c.$$

4. There exist $x, x' \in Z_2$ such that $\sigma = \sigma_x$ and $\sigma' = \sigma_{x'}$. Let $\sigma'' \equiv \sigma_{O(\mathrm{co}(x \cup x'))} \in C$. Then $\sigma''(\alpha) = \max[\sigma_x(\alpha), \sigma_{x'}(\alpha)]$.

5. There exists $x \in Z_2$ such that $f = \sigma_x$. Since $O(\{\underline{l}\}) \subset x$, the argument for part 2 of this lemma implies $f(\alpha) = \sigma_x(\alpha) \geq \sigma_{O(\{\underline{l}\})}(\alpha) = 0$ for all α.

Define $T: C \to \mathbb{R}$ by $T(f) \equiv U(\sigma^{-1}(f))$. Notice that $T(0) = 0$ and $T(c) = c$, where 0 and c are identified with the zero function and the constant function equal to $c > 0$, respectively. Since U and σ are continuous and mixture linear, so is T.

Lemma 2.5. $T(\beta f + \gamma f') = \beta T(f) + \gamma T(f')$ as long as $f, f', \beta f + \gamma f' \in C$, where $\beta, \gamma \in \mathbb{R}_+$.

Proof. For any $\beta \in [0,1]$, $T(\beta f) = T(\beta f + (1-\beta)0) = \beta T(f) + (1-\beta)T(0) = \beta T(f)$, where 0 is the zero function. For any $\beta > 1$, let $f'' \equiv \beta f$. Since $T\left(\tfrac{1}{\beta} f''\right) = \tfrac{1}{\beta} T(f'')$, $\beta T(f) = T(\beta f)$. Additivity follows from

$$T(f + f') = 2T\left( \tfrac{1}{2}f + \tfrac{1}{2}f' \right) = T(f) + T(f').$$

By the same argument as in DLR, we extend $T: C \to \mathbb{R}$ to C([0,1]). For any $r \geq 0$, let $rC \equiv \{rf \mid f \in C\}$. Let $H \equiv \cup_{r \geq 0}\, rC$ and $H^* \equiv H - H = \{f_1 - f_2 \in C([0,1]) \mid f_1, f_2 \in H\}$. For any $f \in H \setminus \{0\}$, there is $r > 0$ satisfying $(1/r)f \in C$. Define $T(f) \equiv rT((1/r)f)$. From linearity of T on C, T(f) is well-defined; that is, even if there is another $r' > 0$ satisfying $(1/r')f \in C$, $rT((1/r)f) = r'T((1/r')f)$. It is easy to see that T on H is mixture linear. By the same argument as in Lemma 2.5, T is also linear. For any $f \in H^*$, there are $f_1, f_2 \in H$ satisfying $f = f_1 - f_2$. Define $T(f) \equiv T(f_1) - T(f_2)$. We can verify that $T: H^* \to \mathbb{R}$ is well-defined. Indeed, suppose that $f_1, f_2, f_3, f_4 \in H$ satisfy $f = f_1 - f_2 = f_3 - f_4$. Since $f_1 + f_4 = f_2 + f_3$, $T(f_1) + T(f_4) = T(f_2) + T(f_3)$ by linearity of T on H.

Moreover, T is increasing in the pointwise order. Indeed, take any $f, g \in H^*$ with $f \geq g$. Since $H^*$ is a vector space, $f - g \in H^*$. Hence there exist $\sigma, \sigma' \in C$ and $r > 0$ such that $r(\sigma - \sigma') = f - g \geq 0$. Thus $\sigma \geq \sigma'$ pointwise. Since $T(\sigma) \geq T(\sigma')$ by Monotonicity, $T(r(\sigma - \sigma')) \geq T(0) = 0$, which implies $T(f - g) \geq 0$. That is, $T(f) \geq T(g)$.

Lemma 2.6. $H^*$ is dense in C([0,1]).

Proof. From the Stone-Weierstrass theorem, it is enough to show that (i) $H^*$ is a vector sublattice; (ii) $H^*$ separates the points of [0,1]; and (iii) $H^*$ contains the constant functions. By the same argument as Lemma 11 (p. 928) in DLR, (i) holds. To verify condition (ii), take $\alpha, \alpha' \in [0,1]$ with $\alpha \neq \alpha'$; without loss of generality, $\alpha > \alpha'$. Let $x \equiv O(\{\bar{l}_c \otimes \underline{l}_z\})$. Then $\sigma_x \in C \subset H^*$. Since $u(\bar{l}_c) > 0$ and $W(\underline{l}_z) = 0$,

$$\sigma_x(\alpha) = (1-\alpha)u(\bar{l}_c) + \alpha W(\underline{l}_z) = (1-\alpha)u(\bar{l}_c) \neq (1-\alpha')u(\bar{l}_c) = \sigma_x(\alpha'),$$

so $\sigma_x$ separates α and α'. Thus condition (ii) holds. Finally, condition (iii) follows directly from part 3 of Lemma 2.4 and the definition of H.

As Theorem 2 of Dekel, Lipman, Rustichini and Sarver [14] fixes the argument of DLR (Lemma 12, p. 929), we can show that there is a constant $K > 0$ such that $T(f) \leq K\|f\|$ for any $f \in H^*$. Indeed, notice that, for all $f \in H^*$, $f \leq \|f\|\mathbf{1}$, where $\mathbf{1} \in H$ is the function identically equal to 1. Since T is increasing, $T(f) \leq \|f\|T(\mathbf{1})$. Thus $K \equiv T(\mathbf{1})$ is the desired constant.

By the Hahn–Banach theorem, we can extend T : H∗ → R to T : C([0, 1]) → R in a linear, continuous, and increasing way. Since H∗ is dense by Lemma 2.6, this extension is unique. Now we have the following commutative diagram:

Z → C([0, 1]) → R, where the first map is σ, the second is T, and U = T ∘ σ.

Since T is a positive linear functional on C([0, 1]), the Riesz representation theorem ensures that there exists a unique countably additive probability measure µ on [0, 1] satisfying

T(f) = ∫_{[0,1]} f(α) dµ(α)

for all f ∈ C([0, 1]). Thus we have

U(x) = T(σx) = ∫_{[0,1]} max_{l∈x} ((1 − α)u(lc) + αW(lz)) dµ(α).

For any x ∈ Z, let δx be the degenerate measure at x. Denote W(δx) by W(x).

Lemma 2.7. U(x) ≥ U(y) ⇔ W(x) ≥ W(y).

Proof. First of all,

W(lz) = ∫_Z W(x) dlz(x).

Since

U({(c, x)}) = ∫_{[0,1]} ((1 − α)u(c) + αW(x)) dµ(α) = (1 − ᾱ)u(c) + ᾱW(x),

Stationarity implies U(x) ≥ U(y) ⇔ U({(c, x)}) ≥ U({(c, y)}) ⇔ W(x) ≥ W(y).

Lemma 2.8. There exist β > 0 and ζ ∈ R such that W(x) = βU(x) + ζ.

Proof. By the definition of W,

U({γ ∘ (c, x) + (1 − γ) ∘ (c, y)}) = ∫_{[0,1]} ((1 − α)u(c) + αW(γ ∘ x + (1 − γ) ∘ y)) dµ(α) = U({c ⊗ (γ ∘ x + (1 − γ) ∘ y)}) = (1 − ᾱ)u(c) + ᾱW(γ ∘ x + (1 − γ) ∘ y),

and

U({(c, γx + (1 − γ)y)}) = (1 − ᾱ)u(c) + ᾱW(γx + (1 − γ)y).

Since W is mixture linear over ∆(Z), Timing Indifference implies

γW(x) + (1 − γ)W(y) = W(γ ∘ x + (1 − γ) ∘ y) = W(γx + (1 − γ)y).

Hence W is mixture linear over Z. From Lemma 2.7, we know that U(x) and W(x) represent the same preference. Since both functions are mixture linear, there exist β > 0 and ζ ∈ R such that W(x) = βU(x) + ζ.

We claim that β can be normalized to one. Define W∗ : ∆(Z) → R by W∗(lz) ≡ W(lz)/β. For any x ∈ D, define σ∗x : [0, 1] → R by

σ∗x(α) ≡ max_{l∈x} ((1 − α)u(lc) + αW∗(lz)).

Since W∗ is continuous and mixture linear, the same arguments up to Lemma 2.7 work for σ∗ as well. Thus, there exists a probability measure µ∗ on [0, 1] such that

U(x) = ∫_{[0,1]} max_{l∈x} ((1 − α)u(lc) + αW∗(lz)) dµ∗(α).

By definition, W∗(z) = U(z) + ζ/β.

Lemma 2.9. ᾱ < 1, where ᾱ is the mean of µ∗.

Proof. Since U is not constant, there exist x and x′ such that U(x) > U(x′). For any fixed c, let

xt ≡ {(c, {(c, {· · · {(c, x)}})})}, x′t ≡ {(c, {(c, {· · · {(c, x′)}})})},

the menus that commit to c for t periods and then yield x and x′, respectively. Then

U(xt) − U(x′t) = ᾱ^t U(x) − ᾱ^t U(x′) = (U(x) − U(x′)) ᾱ^t.

Since xt and x′t converge to the same menu as t → ∞, Continuity requires U(xt) − U(x′t) → 0 as t → ∞, and hence we must have ᾱ < 1.

Define ζ∗ ≡ ζ/β and

u∗(lc) ≡ u(lc) + (ᾱ/(1 − ᾱ)) ζ∗.

Then

U(x) = ∫_{[0,1]} max_{l∈x} [ (1 − α)u(lc) + α ∫_Z W∗(z) dlz(z) ] dµ∗(α)
     = ∫_{[0,1]} max_{l∈x} [ (1 − α)u(lc) + α ∫_Z (U(z) + ζ∗) dlz(z) ] dµ∗(α)
     = ∫_{[0,1]} max_{l∈x} [ (1 − α)( u(lc) + (ᾱ/(1 − ᾱ)) ζ∗ ) + α ∫_Z U(z) dlz(z) ] dµ∗(α)
     = ∫_{[0,1]} max_{l∈x} [ (1 − α)u∗(lc) + α ∫_Z U(z) dlz(z) ] dµ∗(α),

where the third equality holds because the constant terms αζ∗ and (1 − α)(ᾱ/(1 − ᾱ))ζ∗ come out of the max and both integrate to ᾱζ∗ under µ∗. Therefore the functional form U with components (u∗, µ∗) is the required representation.

2.7.5 Proof of Theorem 2.2

(i) Since the mixture linear functions u and u′ represent the same conditional preference ≽c over ∆(C), by the standard argument, u′ is an affine transformation of u. That is, u and u′ are cardinally equivalent.

(ii) From (i), there exist γ > 0 and ζ ∈ R such that u′ = γu + ζ. Since U and U′ are mixture linear functions representing the same preference ≽, there exist γ∗ > 0 and ζ∗ ∈ R such that U′ = γ∗U + ζ∗. Let xc be the menu requiring commitment to c forever, that is, xc ≡ {(c, {(c, {· · · })})}. Since U(xc) = u(c) and U′(xc) = u′(c), U′(xc) = γU(xc) + ζ. Hence, we must have γ∗ = γ and ζ∗ = ζ. Now we have

U′(x) = ∫_{[0,1]} max_{l∈x} [ (1 − α)u′(lc) + α ∫_Z U′(z) dlz ] dµ′(α)
      = ∫_{[0,1]} max_{l∈x} [ (1 − α)(γu(lc) + ζ) + α ∫_Z (γU(z) + ζ) dlz ] dµ′(α)
      = γ ∫_{[0,1]} max_{l∈x} [ (1 − α)u(lc) + α ∫_Z U(z) dlz ] dµ′(α) + ζ.

Hence,

U″(x) ≡ ∫_{[0,1]} max_{l∈x} [ (1 − α)u(lc) + α ∫_Z U(z) dlz ] dµ′(α)

also represents the same preference. Since U′ = γU + ζ and U′ = γU″ + ζ, we must have U(x) = U″(x) for all x. For all x ∈ D and α ∈ [0, 1], let

σx(α) ≡ max_{l∈x} [ (1 − α)u(lc) + α ∫_Z U(z) dlz(z) ].

Then,

U(x) = ∫ σx(α) dµ(α) = ∫ σx(α) dµ′(α) = U″(x). (2.16)

If x is convex, σx is its support function. Equation (2.16) continues to hold when σx is replaced with γσx − ζσy for any convex menus x, y and γ, ζ ≥ 0. From Lemma 2.6, the set of all such functions is a dense subset of the set of real-valued continuous functions on [0, 1]. Hence, equation (2.16) holds when σx is replaced with any real-valued continuous function, and the Riesz representation theorem implies µ = µ′.

2.7.6 Proof of Corollary 2.3

If part: the representation has the form

U(x) = max_{l∈x} [ (1 − α)u(lc) + α ∫_Z U(z) dlz ],

for some α ∈ [0, 1). Thus it is easy to verify that U(x) ≥ U(y) implies U(x) = U(x ∪ y).

Only-if part: since Strategic Rationality implies Monotonicity, Theorem 2.1 ensures that ≽ admits a random discounting representation U with components (u, µ). By contradiction, suppose #supp(µ) > 1. Then, there exist α′, α″ ∈ supp(µ) with α″ > α′. Let u(∆(C)) denote the image of ∆(C) under u, and let U(L) denote the image of L ⊂ Z under U. Since U(L) and u(∆(C)) are nondegenerate intervals of R+, take p1 ∈ u(∆(C)) and p2 ∈ U(L) from the relative interiors. Take two points (p′1, p′2), (p″1, p″2) ∈ R²+ such that p′1 > p1 > p″1, p″2 > p2 > p′2, and

(1 − α′)p′1 + α′p′2 = (1 − α′)p1 + α′p2, and (1 − α″)p″1 + α″p″2 = (1 − α″)p1 + α″p2. (2.17)

Since p1 belongs to the relative interior of u(∆(C)), p′1, p″1 can be taken to be in u(∆(C)). Similarly, we can assume p′2, p″2 belong to U(L). Then we have

(1 − α′)p′1 + α′p′2 > (1 − α′)p″1 + α′p″2, and (1 − α″)p″1 + α″p″2 > (1 − α″)p′1 + α″p′2. (2.18)

Indeed, by contradiction, suppose (1 − α′)p″1 + α′p″2 ≥ (1 − α′)p′1 + α′p′2. By (2.17), (1 − α′)p″1 + α′p″2 ≥ (1 − α′)p1 + α′p2. Since p″1 < p1, p″2 > p2, and α″ > α′, we then have (1 − α″)p″1 + α″p″2 > (1 − α″)p1 + α″p2, which contradicts (2.17). The same argument applies to the other case.

Now take lotteries l′c, l″c ∈ ∆(C) and l′, l″ ∈ L such that u(l′c) = p′1, u(l″c) = p″1, U({l′}) = p′2, and U({l″}) = p″2. Taking (2.18) and continuity of the inner product together, there exist open neighborhoods B(α′) and B(α″) satisfying

(1 − α)u(l′c) + αU({l′}) > (1 − α)u(l″c) + αU({l″}), and (1 − α̃)u(l″c) + α̃U({l″}) > (1 − α̃)u(l′c) + α̃U({l′}), (2.19)

for all α ∈ B(α′) and α̃ ∈ B(α″). Since α′, α″ belong to the support of µ, µ(B(α′)) > 0 and µ(B(α″)) > 0. Thus, by (2.19) and the representation,

U({l′c ⊗ {l′}, l″c ⊗ {l″}}) > U({l′c ⊗ {l′}}), and U({l′c ⊗ {l′}, l″c ⊗ {l″}}) > U({l″c ⊗ {l″}}),

which contradicts Strategic Rationality.

2.7.7 Proof of Proposition 2.1

1. If ≽ satisfies Dominance, {lx} ∼ O∗(lx) = O∗(x) ∼ x. To show the converse, let lx be a best element in x with respect to the commitment ranking. By definition of O∗(x), lx is also a best element in O∗(x). Thus x ∼ {lx} ∼ O∗(x).

2. For all x, let lx denote a best element in x with respect to the commitment ranking. Because of part 1, it suffices to show that Strategic Rationality is equivalent to the condition that x ∼ {lx} for all x. First suppose that ≽ satisfies x ∼ {lx} for all x. Since x ≽ y implies {lx} ≽ {ly}, lx is a best element of x ∪ y with respect to the commitment ranking. Hence x ∼ {lx} ∼ x ∪ y. Next suppose ≽ satisfies Strategic Rationality. Take any finite menu x, denoted by {l1, l2, · · · , lN}. Without loss of generality, let lx = l1. Since {l1} ≽ {l2}, Strategic Rationality implies {l1, l2} ∼ {l1}. Since {l1, l2} ∼ {l1} ≽ {l3}, again by Strategic Rationality, {l1, l2, l3} ∼ {l1, l2} ∼ {l1}. Repeating the same argument finitely many times, x ∼ {lx}. For any menu x, Lemma 0 of Gul and Pesendorfer [26, p. 1421] shows that there exists a sequence of finite subsets xn of x converging to x in the sense of the Hausdorff metric. Since lx is a best element of x and xn ⊂ x, applying the above claim, xn ∪ {lx} ∼ {lx}. Since xn ∪ {lx} → x ∪ {lx} = x as n → ∞, Continuity implies x ∼ {lx}.

3. Applying Commitment Marginal Dominance and Monotonicity, we have {l} ∼ O(l) ≽ {l′} for all l′ ∈ O(l). Hence, if lx is a best element in x with respect to the commitment ranking, so is lx in O(x). Therefore, by part 1, x ∼ {lx} ∼ O(x).

2.7.8 Proof of Theorem 2.4

Since ≽1 and ≽2 are equivalent on L, we have condition (i). Let ui(∆(C)) denote the image of ∆(C) under ui, and let Ui(L) denote the image of L ⊂ Z under Ui. Let l+c and l−c be a maximal and a minimal lottery with respect to ui. Since ui(l+c) ≥ Ui({l}) ≥ ui(l−c) for all l ∈ L, we have U1(L) = u1(∆(C)) = u2(∆(C)) = U2(L).

Let supp(µi) denote the support of µi. By contradiction, suppose that there exists α∗ ∈ supp(µ1) with α∗ ∉ supp(µ2). Since supp(µ2) is a relatively closed subset of [0, 1], there exists a relatively open interval (αa, αb) containing α∗ such that (αa, αb) ∩ supp(µ2) = ∅. Since u1(∆(C)) is a non-degenerate interval of R+, take p1 ∈ u1(∆(C)) and p2 ∈ U1(L) from the relative interiors. Take (pa1, pa2) and (pb1, pb2) such that pa1 > p1 > pb1, pb2 > p2 > pa2, and

(1 − αa)pa1 + αa pa2 = (1 − αa)p1 + αa p2, and (1 − αb)pb1 + αb pb2 = (1 − αb)p1 + αb p2.

Then we have

(1 − α)pb1 + αpb2 > max[(1 − α)p1 + αp2, (1 − α)pa1 + αpa2] for all α > αb,
(1 − α)p1 + αp2 > max[(1 − α)pa1 + αpa2, (1 − α)pb1 + αpb2] for all α ∈ (αa, αb),
(1 − α)pa1 + αpa2 > max[(1 − α)p1 + αp2, (1 − α)pb1 + αpb2] for all α < αa.

Since (pa1, pa2) and (pb1, pb2) can be chosen sufficiently close to (p1, p2), assume without loss of generality that pa1, pb1 ∈ u1(∆(C)) and pa2, pb2 ∈ U1(L). Thus there exist lc, lac, lbc ∈ ∆(C) and l, la, lb ∈ L such that ui(lc) = p1, ui(lac) = pa1, ui(lbc) = pb1, Ui({l}) = p2, Ui({la}) = pa2, and Ui({lb}) = pb2. Define x ≡ {lc ⊗ {l}, lac ⊗ {la}, lbc ⊗ {lb}} and y ≡ {lac ⊗ {la}, lbc ⊗ {lb}}. Since (αa, αb) ∩ supp(µ2) = ∅, U2(x) = U2(y). On the other hand, since µ1((αa, αb)) > 0, U1(x) > U1(y). This contradicts the assumption that ≽2 desires more flexibility for one period ahead than ≽1.

We show the converse. Assume that u1 = u2 and supp(µ1) ⊂ supp(µ2). Since ≽i, i = 1, 2, are equivalent on L, we have ᾱ1 = ᾱ2. Consequently, U1({l}) = U2({l}) for all l ∈ L. Now take any x, y ∈ Z with y ⊂ x and assume x ≻1 y. Then there exists α∗ ∈ supp(µ1) such that

max_{lc⊗{l}∈x} [ (1 − α∗)u1(lc) + α∗U1({l}) ] > max_{lc⊗{l}∈y} [ (1 − α∗)u1(lc) + α∗U1({l}) ]. (2.20)

By continuity of the representation, there exists an open neighborhood O ⊂ [0, 1] of α∗ such that the strict inequality (2.20) holds for all α ∈ O. Since α∗ ∈ supp(µ1) ⊂ supp(µ2), µ2(O) > 0. Moreover, since u1 = u2 and U1({l}) = U2({l}) for all l ∈ L,

max_{lc⊗{l}∈x} [ (1 − α)u2(lc) + αU2({l}) ] > max_{lc⊗{l}∈y} [ (1 − α)u2(lc) + αU2({l}) ]

for all α ∈ O, which implies U2(x) > U2(y).

2.7.9 Proof of Theorem 2.5

Lemma 2.10. Suppose that ≽i satisfies all the axioms of the random discounting model. If agent 2 is more averse to commitment than agent 1, then

x ≽1 {l} ⇒ x ≽2 {l} for all x ∈ Z and l ∈ L.

Proof. It suffices to show that x ∼1 {l} ⇒ x ≽2 {l}. If agent 1 strictly prefers l to the worst lottery l̲, then x ≻1 {λl + (1 − λ)l̲} for all λ ∈ (0, 1). By assumption, x ≻2 {λl + (1 − λ)l̲}. Letting λ → 1, Continuity implies x ≽2 {l}. If l is indifferent to l̲, consider the best lottery l̄. Since {l̄} ≽ x for all x, mixture linearity of the representation implies U1(λx + (1 − λ){l̄}) > U1(x) = U1({l}) for all λ ∈ (0, 1). By assumption, λx + (1 − λ){l̄} ≻2 {l}. Letting λ → 1, Continuity implies x ≽2 {l}.

Lemma 2.11. Agent 2 is more averse to commitment than agent 1 if and only if there exist random discounting representations Ui with components (ui, µi), i = 1, 2, such that

1. u1 = u2 and ᾱ1 = ᾱ2,
2. U1(x) ≤ U2(x) for all x ∈ Z.

Proof. Sufficiency is straightforward; we prove necessity. By Lemma 2.10, ≽1 and ≽2 are equivalent on L. Thus there exist random discounting representations satisfying condition 1. Note that for every x ∈ Z there exists l ∈ L such that x ∼1 {l}, that is, U1(x) = U1({l}). Since U1({l}) = U2({l}) for all l ∈ L, x ∼1 {l} implies x ≽2 {l}, that is, U2(x) ≥ U2({l}) = U1({l}) = U1(x).

(Necessity) Let U be the Banach space of all real-valued continuous functions on Z. Define the operator Ti : U → U by

Ti(U)(x) ≡ ∫_{[0,1]} max_{l∈x} [ (1 − α)u(lc) + α ∫_Z U(z) dlz ] dµi(α).

Pick x ∈ Z arbitrarily. Note that, for all U ∈ U, max_{l∈x} [ (1 − α)u(lc) + α ∫_Z U(z) dlz ] is continuous and convex with respect to α. Hence, since µ1 second-order stochastically dominates µ2, it holds that T1(U)(x) ≤ T2(U)(x) for all x ∈ Z. For i = 1, 2, let Ti,n denote the operator defined by n iterations of Ti. We show, by mathematical induction, that T1,n(U)(x) ≤ T2,n(U)(x) for all x ∈ Z and n = 1, 2, · · · . Then it holds that U1(x) ≤ U2(x), since Ti,n(U) converges to Ui, and the result follows from Lemma 2.11. Suppose that T1,k(U)(x′) ≤ T2,k(U)(x′) for all x′ ∈ Z. Pick x ∈ Z arbitrarily. Then, by monotonicity of T2, T2(T1,k(U))(x) ≤ T2(T2,k(U))(x). Moreover, since T1,k(U) is in U, we have T1(T1,k(U))(x) ≤ T2(T1,k(U))(x). These together imply T1,k+1(U)(x) ≤ T2,k+1(U)(x) for all x ∈ Z.

(Sufficiency) We show that, for every continuous and convex function v of α, there is a sequence {vn} of functions of the form (2.7) such that

sup_α |v(α) − vn(α)| < 1/n, and ∫ vn(α) dµ1(α) ≤ ∫ vn(α) dµ2(α),

for all n = 1, · · · . The result then follows from the dominated convergence theorem.

Let v : [0, 1] → R be a continuous convex function. Then, for every α̂ ∈ [0, 1], there exists a vector pα̂ ∈ R² such that for all α ∈ [0, 1],

v(α) ≥ (1 − α)pα̂,1 + αpα̂,2,

with equality at α̂. Fix n. Since v(α) − {(1 − α)pα̂,1 + αpα̂,2} is continuous with respect to α, there exists an open neighborhood B(α̂) of α̂ such that for every α ∈ B(α̂),

0 ≤ v(α) − {(1 − α)pα̂,1 + αpα̂,2} < 1/n.

It follows from the compactness of [0, 1] that there exists a finite set {α̂i}, i = 1, . . . , M, such that {B(α̂i)} is a covering of [0, 1]. We define vn : [0, 1] → R by

vn(α) = max_i [ (1 − α)pα̂i,1 + αpα̂i,2 ].

Then it is straightforward that v(α) ≥ vn(α) for every α ∈ [0, 1]. Moreover, we see that

sup_α |v(α) − vn(α)| < 1/n.

In fact, pick an arbitrary α ∈ [0, 1]. Then there is j ∈ {1, . . . , M} such that α ∈ B(α̂j). This implies

0 ≤ v(α) − vn(α) ≤ v(α) − {(1 − α)pα̂j,1 + αpα̂j,2} < 1/n.

Finally, we see that

∫ vn(α) dµ1(α) ≤ ∫ vn(α) dµ2(α).

Since u(∆(C)) and U1(L) = U2(L) are closed intervals, we can assume, without loss of generality, that there exist {lc,i} ⊂ ∆(C) and {li} ⊂ L satisfying u1(lc,i) = u2(lc,i) = pα̂i,1 and U1({li}) = U2({li}) = pα̂i,2. Thus we can rewrite vn as

vn(α) = max_i [ (1 − α)u1(lc,i) + αU1({li}) ] = max_i [ (1 − α)u2(lc,i) + αU2({li}) ].

Consider the menu xn = {(lc,i, {li}) | i = 1, · · · , M}. Then it follows from Lemma 2.11 that

∫ vn(α) dµ1(α) = U1(xn) ≤ U2(xn) = ∫ vn(α) dµ2(α),

which completes the proof.
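The construction of vn from finitely many supporting lines can be illustrated numerically. The sketch below is a minimal illustration, not part of the proof: it takes an arbitrary convex function v(α) = α² (an assumption chosen only for concreteness), builds the max of its tangent lines at a finite set of knots, and checks that v ≥ vn with a small uniform gap.

```python
import numpy as np

# Illustrative convex function on [0, 1]; v(a) = a**2 is an arbitrary choice.
v = lambda a: a**2
dv = lambda a: 2*a   # derivative, used to build supporting lines

def supporting_line(a_hat):
    # Supporting line written as (1-a)*p1 + a*p2, with equality at a_hat:
    # p1 is the line's value at a = 0 and p2 its value at a = 1.
    p1 = v(a_hat) - a_hat * dv(a_hat)
    p2 = p1 + dv(a_hat)
    return p1, p2

def v_n(alpha, knots):
    # max over finitely many supporting lines, as in the proof
    return max((1 - alpha) * p1 + alpha * p2
               for p1, p2 in (supporting_line(a) for a in knots))

grid = np.linspace(0, 1, 1001)
knots = np.linspace(0, 1, 50)           # the finite set of points {alpha_hat_i}
errors = [v(a) - v_n(a, knots) for a in grid]
```

With 50 knots the gap max(errors) is already of order 10⁻⁴, and v ≥ vn holds everywhere, mirroring the 1/n bound in the proof.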

2.7.10 Proof of Proposition 2.2

1. We can solve (2.10) by the guess-and-verify method. Conjecture

Vµ(s, α) ≡ Aµ(α) s^{1−σ}/(1 − σ) + Bµ(α). (2.21)

Taking the first-order condition of the maximization problem

max_{s′} { (1 − α) ((1 + r)s − s′)^{1−σ}/(1 − σ) + α ∫ [ Aµ(α′) s′^{1−σ}/(1 − σ) + Bµ(α′) ] dµ(α′) }, (2.22)

we have

(1 − α)((1 + r)s − s′)^{−σ} = α Āµ s′^{−σ},

where Āµ ≡ ∫ Aµ(α′) dµ(α′). By rearrangement, we obtain the saving function

s′ = [ (αĀµ)^{1/σ} / ( (1 − α)^{1/σ} + (αĀµ)^{1/σ} ) ] (1 + r)s. (2.23)

Thus,

SRµ(α) ≡ (αĀµ)^{1/σ} / ( (1 − α)^{1/σ} + (αĀµ)^{1/σ} ) (2.24)

is the saving rate when α is realized.

Substituting (2.23) into (2.22) and comparing the coefficients with (2.21), for all α,

Aµ(α) = ( (1 − α)^{1/σ} + (αĀµ)^{1/σ} )^σ (1 + r)^{1−σ}, (2.25)

Bµ(α) = α B̄µ, (2.26)

where B̄µ ≡ ∫ Bµ(α′) dµ(α′). From (2.26), B̄µ = ᾱ B̄µ. Since ᾱ ≠ 1, we have B̄µ = 0, and hence Bµ(α) = 0 for all α.

For all α ∈ [0, 1] and A ≥ 0, define the function f : [0, 1] × R+ → R by

f(α, A) ≡ ( (1 − α)^{1/σ} + (αA)^{1/σ} )^σ (1 + r)^{1−σ}. (2.27)

Then Āµ is characterized as a solution of the equation

Āµ = ∫ f(α, Āµ) dµ(α). (2.28)

Let F(A) ≡ ∫ f(α, A) dµ(α). We want to show that there exists a unique A > 0 satisfying A = F(A). First of all, F(0) = (1 − ᾱ)(1 + r)^{1−σ} > 0. Since

F′(A) = ∫ (∂f/∂A) dµ(α) = (1 + r)^{1−σ} ∫ α^{1/σ} ( (1 − α)^{1/σ} + (αA)^{1/σ} )^{σ−1} A^{1/σ−1} dµ(α)
      = (1 + r)^{1−σ} ∫ α^{1/σ} ( ((1 − α)/A)^{1/σ} + α^{1/σ} )^{σ−1} dµ(α),

we have lim_{A→∞} F′(A) = ᾱ(1 + r)^{1−σ} < 1. Since lim_{A→∞} F(A) = ∞, l'Hôpital's rule implies

lim_{A→∞} F(A)/A = lim_{A→∞} F′(A) < 1.

Hence, there exists a sufficiently large number Ã such that F(Ã) < Ã. By continuity of F, there exists A > 0 such that A = F(A). Finally, since

F″(A) = ∫ (∂²f/∂A²) dµ(α) = (1 + r)^{1−σ} ((1 − σ)/σ) ∫ α^{1/σ} (1 − α)^{1/σ} ( (1 − α)^{1/σ} + (αA)^{1/σ} )^{σ−2} A^{1/σ−2} dµ(α),

F is either strictly convex or strictly concave according as σ < 1 or σ > 1. Taking this together with F(0) > 0 and lim_{A→∞} F′(A) < 1, A must be unique.
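The fixed-point characterization (2.27)-(2.28) and the saving rate (2.24) are easy to verify numerically. The sketch below is illustrative only: the values of σ, r, and the two-point distribution µ are arbitrary assumptions, not primitives from the text.

```python
import numpy as np

# Solve A = F(A) = ∫ f(α, A) dµ(α) by fixed-point iteration, then evaluate
# the saving rate (2.24). All primitive values below are illustrative.
sigma, r = 0.5, 0.05
alphas = np.array([0.3, 0.7])   # support of µ
probs  = np.array([0.5, 0.5])   # µ weights; mean ᾱ = 0.5, so ᾱ(1+r)^{1-σ} < 1

def f(alpha, A):                 # equation (2.27)
    return ((1 - alpha)**(1/sigma) + (alpha*A)**(1/sigma))**sigma * (1 + r)**(1 - sigma)

def F(A):                        # F(A) = ∫ f(α, A) dµ(α)
    return float(probs @ f(alphas, A))

A = 1.0
for _ in range(1000):            # iterate A ← F(A); converges since F'(A*) < 1
    A = F(A)

def saving_rate(alpha, A_bar):   # equation (2.24)
    num = (alpha * A_bar)**(1/sigma)
    return num / ((1 - alpha)**(1/sigma) + num)

rates = saving_rate(alphas, A)
```

For these values the iteration settles on a unique positive fixed point, and the realized saving rate is increasing in the drawn discount factor α, as (2.24) suggests.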

2. Suppose that the DM believes her discount factor to be ᾱ for sure over the rest of the horizon. Then she faces the decision problem

Vᾱ(s) = max_{(c,s′)∈B(s)} (1 − ᾱ)u(c) + ᾱVᾱ(s′). (2.29)

For all s ≥ 0, define s∗ as the number satisfying

Vᾱ(s∗) = ∫ Vµ(s, α′) dµ(α′). (2.30)

From part 1, equation (2.30) is equivalent to Aᾱ s∗^{1−σ}/(1 − σ) = Āµ s^{1−σ}/(1 − σ). By rearrangement,

s∗ = s (Āµ/Aᾱ)^{1/(1−σ)}.

Denote s∗/s by φ(µ), which is called the certainty compensation ratio (CCR) for µ. The proof consists of the following three steps.

The first step is that, for any σ, the expected value from the savings s strictly increases if the DM becomes more uncertain about discount factors. That is, for any s,

∫ Vµ2(s, α′) dµ2(α′) > ∫ Vµ1(s, α′) dµ1(α′). (2.31)

This inequality follows from the next lemma.

Lemma 2.12. Āµ1 ≶ Āµ2 if σ ≶ 1.

Proof. Note first that f defined in (2.27) is strictly convex or strictly concave in α according as σ < 1 or σ > 1. Indeed, for any α ∈ (0, 1) and A > 0,

∂²f/∂α² = (1 + r)^{1−σ} ((1 − σ)/σ) ( (1 − α)^{1/σ} + (αA)^{1/σ} )^{σ−2}
  × [ 2(α(1 − α))^{1/σ−1} A^{1/σ} + (1 − α)^{1/σ−2} (αA)^{1/σ} + α^{1/σ−2} ((1 − α)A)^{1/σ} ] ≷ 0

whenever σ ≶ 1.

We know that Āµ2 satisfies the equation Āµ2 = ∫ f(α, Āµ2) dµ2(α). Since µ1 second-order stochastically dominates µ2,

Āµ2 = ∫ f(α, Āµ2) dµ2(α) ≷ ∫ f(α, Āµ2) dµ1(α) (2.32)

depending on σ ≶ 1. Let F1(A) ≡ ∫ f(α, A) dµ1(α), a continuous function. We know from the proof of part 1 that F1(0) = (1 − ᾱ)(1 + r)^{1−σ} > 0 and that Āµ1 is the unique positive solution of F1(A) = A. Hence, F1(A) ≷ A if A ≶ Āµ1. Taking this observation and (2.32) together, Āµ2 ≷ Āµ1 if σ ≶ 1.

From Lemma 2.12, for all σ,

∫ Vµ2(s, α′) dµ2(α′) = Āµ2 s^{1−σ}/(1 − σ) > Āµ1 s^{1−σ}/(1 − σ) = ∫ Vµ1(s, α′) dµ1(α′).
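Lemma 2.12 can be checked numerically by comparing the fixed point Āµ under a distribution and under a mean-preserving spread of it. The sketch below is illustrative: the interest rate, the degenerate µ1, and the spread µ2 are all arbitrary assumptions.

```python
import numpy as np

# Under a mean-preserving spread of µ (µ2 riskier than µ1, same mean ᾱ),
# the fixed point Ā_µ of (2.28) rises when σ < 1 and falls when σ > 1.
r = 0.05

def A_bar(alphas, probs, sigma):
    # fixed point of F(A) = ∫ f(α, A) dµ(α), with f from (2.27)
    def F(A):
        f = ((1 - alphas)**(1/sigma) + (alphas*A)**(1/sigma))**sigma * (1 + r)**(1 - sigma)
        return float(probs @ f)
    A = 1.0
    for _ in range(2000):
        A = F(A)
    return A

mu1 = (np.array([0.5, 0.5]), np.array([0.5, 0.5]))   # point mass at ᾱ = 0.5
mu2 = (np.array([0.3, 0.7]), np.array([0.5, 0.5]))   # mean-preserving spread

low  = A_bar(*mu1, sigma=0.5), A_bar(*mu2, sigma=0.5)   # σ < 1 case
high = A_bar(*mu1, sigma=2.0), A_bar(*mu2, sigma=2.0)   # σ > 1 case
```

In line with the lemma, the spread raises Āµ in the σ < 1 case and lowers it in the σ > 1 case.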

The next step is that φ(µ) increases as µ becomes more uncertain. Since Vᾱ(·) is strictly increasing, it follows from (2.30) and (2.31) that s∗2 is strictly bigger than s∗1, where s∗1 and s∗2 satisfy (2.30) for µ1 and µ2, respectively. Thus φ(µ2) = s∗2/s > s∗1/s = φ(µ1), as desired. From (2.30) together with φ(µi), the maximization problem (2.29) can be rewritten as

max_{(c,s′)∈B(s)} (1 − ᾱ)u(c) + ᾱVᾱ(φ(µi)s′).

Now this problem can be viewed as if the DM has a constant discount factor ᾱ from the next period on and φ(µi) is the gross rate of return on savings s′. That is, an increase in the CCR can be treated as an increase in the interest rate.

The last step concerns the income and substitution effects. In a two-period consumption-saving model with no uncertainty, it is well known that, when σ < 1, the substitution effect dominates the income effect, and hence the saving rate increases as the interest rate goes up, while, when σ > 1, the income effect dominates, and hence the saving rate decreases as the interest rate increases. Taking the three steps together, the proposition follows.

2.7.11 Proof of Proposition 2.3

For the reservation property, we give the proof for sampling with recall; the same proof works for sampling without recall (see Section 2.7.12). Notice first that wα∗ satisfies

(1 − α)u(wα∗) = α ∫ U(x(w′)) dF(w′). (2.33)

Since ∫ U(x(w′)) dF(w′) is constant in α, u(wα∗) must increase as α increases. By u′ > 0, wα∗ is increasing in α, and it is non-increasing with respect to second-order stochastic dominance for µ.

Next we derive the equation that characterizes wα∗. Note that

V(w, α) = max{ (1 − α)u(x(w)), α ∫ U(x(w′)) dF(w′) }.

By the definition of wα∗,

V(w, α) = (1 − α)u(x(w)) if w ≥ wα∗, and V(w, α) = α ∫ U(x(w′)) dF(w′) if w < wα∗. (2.34)

Since U(x(w)) = ∫_0^1 V(w, α) dµ(α) for all x(w), (2.34) implies

∫ U(x(w′)) dF(w′)
 = ∫ ∫_0^1 V(w′, α′) dµ(α′) dF(w′)
 = ∫_0^1 [ ∫_0^{wα′∗} V(w′, α′) dF(w′) + ∫_{wα′∗}^{w̄} V(w′, α′) dF(w′) ] dµ(α′)
 = ∫_0^1 [ F(wα′∗) α′ ∫ U(x(w″)) dF(w″) + ∫_{wα′∗}^{w̄} (1 − α′)u(x(w′)) dF(w′) ] dµ(α′)
 = ∫_0^1 [ (1 − α′)u(x(wα′∗))F(wα′∗) + ∫_{wα′∗}^{w̄} (1 − α′)u(x(w′)) dF(w′) ] dµ(α′), (2.35)

where the last equality applies (2.33) at α′. Taking (2.33) and (2.35) together, we have the required result.
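The system (2.33)+(2.35) pins down the reservation values jointly with the constant K ≡ ∫ U(x(w′)) dF(w′). The sketch below solves it numerically under illustrative assumptions not taken from the text: u(w) = w, offers uniform on [0, 1], and a two-point µ.

```python
import numpy as np

# With u(w) = w and F uniform on [0, 1], (2.33) gives w*_α = α K / (1 - α),
# and (2.35) becomes a scalar fixed-point equation for K.
alphas = np.array([0.2, 0.4])   # support of µ (illustrative)
probs  = np.array([0.5, 0.5])   # µ weights (illustrative)

def rhs(K):
    # Right-hand side of (2.35) for uniform F and u(w) = w:
    # (1-α')u(w*)F(w*) + ∫_{w*}^{1} (1-α')w dw = (1-α')(1 + w*^2)/2.
    w_star = np.minimum(alphas * K / (1 - alphas), 1.0)
    return float(probs @ ((1 - alphas) * (1 + w_star**2) / 2))

K = 0.5
for _ in range(200):             # fixed-point iteration on K
    K = rhs(K)

w_res = alphas * K / (1 - alphas)   # reservation values for each α
```

Consistent with the proof, the computed reservation value wα∗ is increasing in the realized discount factor α.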

2.7.12 Proof of Proposition 2.4

We show that the optimal strategy satisfies the reservation property. Following Kohn and Shavell [33], we formulate a sequential problem and show that its solution corresponds to the solution of the Bellman equation.

We first define the sequential problem that the DM faces. Denote the minimum and maximum of the support of µ by αm and αM, respectively; since the support of µ is closed in [0, 1], both exist. Without loss of generality, we consider the sampling problem at period 0, and we assume that the DM faces sampling with recall and draws w and α at period 0.¹⁷ Denote a history of offers and discount factors up to period t by ht = ((w, α), (w1, α1), . . . , (wt, αt)), and denote the set of such ht for t ≥ 1 by Ht. We define a distribution function Gt on ht by using F and µ. We denote a subhistory of ht by ht,n = ((w, α), . . . , (wn, αn)) for n = 0, . . . , t; note that ht,0 = h0 = (w, α) for all ht. Given ht, a strategy d is a choice from {accept w, . . . , accept wt, continue}. For consistency with the main text, a strategy d depends only on (ht, αt). Given a strategy d, we define the set of histories in which the DM first accepts an offer in period t as

Hdt = { ht ∈ Ht | d(ht) = accept an offer from ht, d(ht,n) = continue ∀n = 0, . . . , t − 1 }.

Note that Hd∞ is the set of histories in which the DM with strategy d never accepts an offer. We denote a strategy d that satisfies d(h0) = accept w by d = 0, and a strategy that satisfies d(h0) = continue by d ≠ 0. We also use the notation m(ht) = max{w, . . . , wt}. The value of d = 0 is W0(w, α, d = 0) = (1 − α)u(w). The expected value W0(w, α, d) of a strategy d ≠ 0, given w and α, is

W0(w, α, d) = Σ_{t=1}^∞ ∫_{Hdt} At max{u(w), u(w1), . . . , u(wt)} dGt(ht), (2.36)

where A1 = α(1 − α1), . . . , At = αα1 · · · αt−1(1 − αt). Notice that W0(w, α, d) is non-decreasing in w. We define the value function of the sequential problem as v0(h0) = sup_d W0(w, α, d). When there is no confusion, we omit the time index 0 of W0(w, α, d) and write it as W(w, α, d).

¹⁷ If we consider the problem in period t, we can take w as m(ht).

Before proving the Proposition, we first show that the value function of the sequential problem satisfies the Bellman equation. We focus on the problem at period 0. For this purpose, we define, for t ≥ 2, Hdt(h1) = { ht ∈ Hdt | ht,1 = h1 }. Note that Hd∞(h1) is the set of histories after h1 in which the DM with strategy d never accepts an offer. Then we define the expected value when the DM continues sampling after history h1 as

W1(h1, d|h1 ≠ 0) = Σ_{t=2}^∞ ∫_{Hdt(h1)} At,1 max{u(w), u(w1), . . . , u(wt)} dGt(ht),

where d|h1 is the restriction of the strategy d after h1 and A1,1 = 1 − α1, A2,1 = α1(1 − α2), . . . , At,1 = α1 · · · αt−1(1 − αt). By rearrangement, v0(h0) satisfies the Bellman equation:

v0(h0) = max{ (1 − α)u(w), sup_{d≠0} W(w, α, d) }
       = max{ (1 − α)u(w), α ∫ sup_{d|h1} max{ (1 − α1)u(w), (1 − α1)u(w1), W1(h1, d|h1 ≠ 0) } dG1(h1) }
       = max{ (1 − α)u(w), α ∫ v1(h1) dG1(h1) }.

Similar definitions and rearrangements lead to the conclusion that the function V(m(ht), αt) = vt(ht) for all ht satisfies the Bellman equation.

Second, we show that the value function of the Bellman equation achieves the supremum in the sequential problem. We use the notation Ht(Hdn) = { ht ∈ Ht | ht,n ∈ Hdn }. For any d,

V(w, α) = max{ (1 − α)u(w), α ∫ V(m(h1), α1) dµ(α1) dF(w1) }
        ≥ max{ (1 − α)u(w), ∫_{Hd1} α(1 − α1)u(m(h1)) dG1(h1) + ∫_{H2∖H2(Hd1)} αα1 V(m(h2), α2) dG2(h2) }.

By induction, for any d,

V(w, α) ≥ max{ (1 − α)u(w), Σ_{t=1}^T ∫_{Hdt} At u(m(ht)) dGt(ht) + ∫_{HT+1∖∪_{t=1}^T HT+1(Hdt)} αα1 · · · αT V(m(hT+1), αT+1) dGT+1(hT+1) }.

Taking T → ∞, we have

V(w, α) ≥ max{ (1 − α)u(w), Σ_{t=1}^∞ ∫_{Hdt} At max{u(w), . . . , u(wt)} dGt(ht) },

because V(m(ht), αt) ≤ u(w̄) for all ht and αt. Thus, a similar rearrangement leads to the conclusion that vt(ht) = V(m(ht), αt) for all t achieves the supremum in the sequential problem.

To prove Proposition 2.4, we prepare several lemmata.

Lemma 2.13. Suppose α < 1/2. If the DM with α weakly prefers accepting w to continuing, she must strictly prefer accepting w′ for w′ > w.

Proof. Suppose to the contrary that, at w′, there is a policy d′ which involves sampling at least once and which is at least as good as stopping immediately. That is, W(w′, α, d′) ≥ W(w′, α, d ≡ 0) = (1 − α)u(w′). Using (2.36), we have

W(w′, α, d′) − W(w, α, d′)
 = Σ_{i=1}^∞ ∫_{Hd′i} αα1 · · · αi−1(1 − αi) [ max{u(w′), u(w1), . . . , u(wi)} − max{u(w), u(w1), . . . , u(wi)} ] dGi(hi)
 ≤ Σ_{i=1}^∞ ∫_{Hd′i} αα1 · · · αi−1(1 − αi) (u(w′) − u(w)) dGi(hi)
 < α(u(w′) − u(w)) Σ_{i=1}^∞ ∫_{Hd′i} dGi(hi) ≤ α(u(w′) − u(w)). (2.37)

Hence

W(w, α, d′) > W(w′, α, d′) − α(u(w′) − u(w)) ≥ (1 − α)u(w′) − α(u(w′) − u(w)) = (1 − 2α)u(w′) + αu(w) > (1 − α)u(w) = W(w, α, d ≡ 0),

where the last inequality uses α < 1/2. This contradicts the assumption that the DM weakly prefers accepting w to continuing.

In what follows, we always assume that α < 1/2. Using the above lemma, we show that the reservation value wα is unique. Since the proof works both for sampling without recall and with recall, we use the notation wα here.

Lemma 2.14. Suppose that there is wα such that the DM with α is indifferent between accepting wα and continuing. For w′ > wα, she strictly prefers accepting w′ to continuing. For w′ < wα, she strictly prefers continuing to accepting w′. It follows that wα is unique.

Proof. By Lemma 2.13, the case w′ > wα is immediate. For w′ < wα, suppose to the contrary that she weakly prefers accepting w′. Then, by Lemma 2.13, the DM with α strictly prefers accepting wα to continuing, contradicting indifference at wα.

Lemma 2.15. At w = 0, the DM with α strictly prefers continuing to accepting 0. At w = w̄, she strictly prefers accepting w̄ to continuing.

Proof. For w = 0: the payoff from stopping is 0 since u(0) = 0, while the payoff from continuing is strictly positive since F has positive mean. For w = w̄: the payoff from stopping is (1 − α)u(w̄). Since the DM discounts the future, the expected payoff from continuing is at most

α ∫ (1 − α1) max{u(w1), u(w̄)} dF(w1) dµ(α1) = αu(w̄) ∫ (1 − α1) dµ(α1) = α(1 − ᾱ)u(w̄).

Since α ∈ (0, 1/2), α < 1 − α, and 1 − ᾱ ≤ 1, we have 1 − α > α(1 − ᾱ). Then u(w̄) > 0 implies the result.

We now prove that the optimal strategy has the reservation property. Let W(w, α) = max_{d≠0} W(w, α, d). By (2.37),

|W(w, α, d) − W(w′, α, d)| ≤ α|u(w) − u(w′)| ≤ (1 − αm)|u(w) − u(w′)|

for any w, w′, α, and d. Thus the family W(·, α, d) is equicontinuous, and the maximum of such a family of functions is continuous. Thus g(w) = (1 − α)u(w) − W(w, α) is continuous. By Lemma 2.15, g(w̄) > 0 and g(0) < 0. This implies that g has at least one root. By the definition of g, such a root is a point of indifference between stopping and sampling; by Lemma 2.14, there is at most one such point. Thus there is exactly one root wα such that g(wα) = 0, at which the DM is indifferent between stopping and sampling. By construction, the DM follows the reservation strategy with reservation value wα.

2.7.13 Proof of Proposition 2.5

Call d the strategy that uses the following reservation rule: the DM stops if the current offer satisfies w ≥ wα∗ and continues if w < wα∗. Note that wα∗ is the reservation value under sampling without recall. We show that

U(x̃((ht−1, w))) ≥ Ud(x̃((ht−1, w))) > U(x(w)),

where Ud(x̃((ht−1, w))) is the value of the menu x̃((ht−1, w)) when the DM follows d. The first inequality holds by optimality. We show the second. Under d, the timing at which the agent stops is the same for x̃((ht−1, w)) and x(w); the only difference is the payoff. In x̃((ht−1, w)), the agent can choose the best one among past and current offers. Since the support of µ is finite and the support of F is [0, w̄], the probability of the event that one of the rejected past offers is higher than the current one is positive. Therefore, the second inequality is strict.

Chapter 3

A note on the value of information with an ambiguous signal

3.1 Introduction

Consider an agent who chooses the information quality of a signal by paying a cost. She then observes a signal that is correlated with an unobservable parameter, and finally she takes an action. Suppose that information quality is one-dimensional; in particular, the agent can choose the signal precision. In this framework, and assuming expected utility maximization, the value of information is not globally concave under certain conditions, as found by Radner and Stiglitz [43] (henceforth RS); see Chade and Schlee [9] (henceforth CS) for an analysis with general signal and parameter spaces, and also Keppo, Moscarini, and Smith [30], who derive explicit formulas for the value of information and show its nonconcavity. If the value function is not concave in information quality, there are (at least locally) increasing returns to information. Under linear pricing, the demand for information may then fail the law of demand and may not be a continuous function. These properties cause difficulty for the analysis of information demand and for the existence of general equilibrium.

The analysis of the value of information has assumed that beliefs are represented by a probability measure, corresponding to Savage's subjective expected utility theory. However, the distinction between risk and ambiguity is behaviorally significant, as Ellsberg [16] demonstrates: in the Ellsberg Paradox, behavior interpreted as aversion to ambiguity is inconsistent with Savage's theory. Ambiguity aversion can be accommodated by the multiple priors model due to Gilboa and Schmeidler [24]. Building on it, Epstein and Schneider [21] (henceforth ES) model an agent who treats signals as ambiguous when she has incomplete knowledge of signal quality. Following ES, this chapter analyzes a decision problem in which the signal is ambiguous and the agent is ambiguity averse. We consider a model in which a Bayesian's value-of-information function is concave, and we ask whether concavity is affected by the introduction of ambiguity. More precisely, under ambiguity information quality is represented by an interval of precisions, and there are two relevant dimensions that affect information quality: the location and the width of the interval of precisions. When the agent cannot choose these dimensions independently, concavity (in an appropriate sense) is preserved. However, nonconcavity can arise if the agent is free to choose the location and width of the interval of precisions independently.

3.2 The Model

We consider four models with an ambiguous signal, which differ in what is assumed about the choices available to the agent.

3.2.1 Model 1

The agent tries to learn a parameter θ that she cannot observe. She has a prior N(0, σθ²) on θ, with σθ² > 0. After observing a signal s, she takes either a safe action that always yields payoff 0, or an uncertain action whose payoff is θ. The signal s is ambiguous; as in ES, we assume

    s = θ + η,   η ∼ N(0, ση²),   ση² ∈ [1/(M+ε), 1/(M−ε)].   (3.1)

Thus, information quality is represented by an interval of precisions [M−ε, M+ε]. If ε = 0, the model is Bayesian and the agent chooses the precision M. If the signal is ambiguous, both the location and the width of the interval affect information quality. Here we assume that the agent chooses both M and ε, subject to the cost c·M + d(ε), where c > 0 and d′ < 0. After paying the cost, the agent observes a signal s that is expected to be drawn according to one distribution in the set

    P(M, ε) = { N(0, σθ² + ση²) : ση² ∈ [1/(M+ε), 1/(M−ε)] }.

Although she has a single Normal prior, an ambiguous signal leads to the nonsingleton set of posteriors

    M(s) = { N( (σθ²/(σθ² + ση²)) s, σθ²ση²/(σθ² + ση²) ) : ση² ∈ [1/(M+ε), 1/(M−ε)] }.
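As an illustrative aside (a numerical sketch of ours, not part of the original analysis; the variances chosen are arbitrary), the Normal-Normal posterior-mean coefficient σθ²/(σθ² + ση²) that generates the set M(s) can be recovered by simulation:

```python
# Monte Carlo check of the Normal-Normal updating used above:
# theta ~ N(0, s2_theta), s = theta + eta with eta ~ N(0, s2_eta), so
# E[theta | s] = s2_theta/(s2_theta + s2_eta) * s.  Since both variables
# have mean zero, the coefficient is the OLS slope of theta on s.
import numpy as np

rng = np.random.default_rng(0)
s2_theta, s2_eta = 1.0, 0.5                    # illustrative variances
n = 200_000
theta = rng.normal(0.0, np.sqrt(s2_theta), n)
s = theta + rng.normal(0.0, np.sqrt(s2_eta), n)

slope_hat = (s @ theta) / (s @ s)              # regression through the origin
slope_theory = s2_theta / (s2_theta + s2_eta)  # = 2/3 for these variances
print(slope_hat, slope_theory)
```

With 200,000 draws the estimated slope agrees with the theoretical coefficient to roughly three decimal places.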

The Bayesian benchmark case

As a benchmark case, we analyze the Bayesian model corresponding to ε = 0. The model is as if the agent buys M independent signals at unit cost c, and each signal's noise follows N(0, 1).¹ Prior and posterior beliefs (after observing s) are given by

    s ∼ N(0, σθ² + ση²)   and   θ | s ∼ N( (σθ²/(σθ² + ση²)) s, σθ²ση²/(σθ² + ση²) ).

Hence, the value function is

    VB({N(0, σθ²)}) = max_{M≥0} { −c·M + ∫ max{0, E^μ[θ]} dp(s) }
                    = max_{M≥0} { −c·M + (σθ/√(2π)) √( σθ²/(σθ² + 1/M) ) }.   (3.2)

Define the value of information as

    H(M) ≡ (σθ/√(2π)) √( σθ²/(σθ² + 1/M) ).

From (3.2), we observe that the value of information depends on the agent's choice of precision M only through the correlation between the signal and the parameter, σθ²/(σθ² + 1/M). Since the correlation is concave in M, the value of information is concave in M.

¹ Since d(0) is constant in this case, we omit it.

Moreover, H′(0) = +∞.² This contradicts the source of nonconcavity in RS, CS, and Keppo, Moscarini, and Smith [30], which is that the marginal value of information is zero around null information, so that the value of information does not increase too rapidly there. That condition is violated here.
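The monotonicity and concavity of H, and the unbounded marginal value near M = 0, can be confirmed numerically; the following is our own sketch (the value σθ² = 0.5 is an arbitrary choice, not from the text):

```python
# Numerical check of the Bayesian benchmark: the value of information
# H(M) = (sigma_theta/sqrt(2*pi)) * sqrt(s2/(s2 + 1/M)) should be increasing
# and concave in the precision M, with a very steep slope near M = 0.
import numpy as np

s2 = 0.5                                   # sigma_theta^2, arbitrary choice
def H(M):
    return np.sqrt(s2 / (2*np.pi)) * np.sqrt(s2 / (s2 + 1.0/M))

M = np.linspace(0.01, 10.0, 1000)
vals = H(M)
increasing = bool(np.all(np.diff(vals) > 0))       # first differences > 0
concave = bool(np.all(np.diff(vals, 2) < 0))       # second differences < 0
slope_near_zero = (H(1e-6) - H(1e-8)) / (1e-6 - 1e-8)
print(increasing, concave, slope_near_zero)
```

The difference quotient near zero is on the order of 10², far above the slope anywhere in the interior of the grid, reflecting H′(0) = +∞.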

Ambiguous Signals

We analyze the value of information when the signal is ambiguous. The agent is ambiguity averse and is represented by the multiple priors model, axiomatized by Gilboa and Schmeidler [24]. Building on it, ES provide a model of ambiguous signals. In this chapter, the model of ambiguous signals follows ES, and the corresponding value function is

    V({N(0, σθ²)}) = max_{M≥ε≥0} { −cM − d(ε) + min_{p∈P(M,ε)} ∫ max{0, min_{μ∈M(s)} E^μ[θ]} dp(s) },   (3.3)

where min_{μ∈M(s)} E^μ[θ] = [σθ²/(σθ² + 1/(M−ε))] s if s ≥ 0, and [σθ²/(σθ² + 1/(M+ε))] s if s < 0. As ES note, if the agent observes a good signal, she uses the minimal correlation between the signal and the parameter, σθ²/(σθ² + 1/(M−ε)), in order to determine an action.

After some algebra, contained in Section 3.3, the value function is obtained as

    V({N(0, σθ²)}) = max_{M≥ε≥0} { −cM − d(ε) + [σθ²/(σθ² + 1/(M−ε))] √(σθ² + 1/(M+ε)) / √(2π) }.

² In Kihlstrom [31], the value of information is also concave. Note that H′(M) = (σθ²/(2√(2π))) (σθ²M + 1)^{−3/2} M^{−1/2} > 0 and H″(M) = (σθ²/(2√(2π))) (−2σθ²M − 1/2) (σθ²M + 1)^{−5/2} M^{−3/2} < 0.

Define the value of information as

    F(M, ε) ≡ [σθ²/(σθ² + 1/(M−ε))] √(σθ² + 1/(M+ε)) / √(2π).

Proposition 3.1. 1. The value of information is concave in M; that is, F(·, ε) is concave on [ε, ∞).
2. The value of information is not globally concave in ε. If ε is larger than 7/(20σθ²), the value of information is concave in ε; that is, F(M, ·) is concave on (ε̄, M], where ε̄ = 7/(20σθ²).

If the signal is ambiguous, the value of information F consists of two parts. The first is σθ²/(σθ² + 1/(M−ε)), which describes the minimal correlation between the signal and the parameter. If the agent observes a good signal, she uses this minimal correlation to guide her choice of action, because of the "worst-case scenario" nature of multiple-priors utility. The minimal correlation between the signal and the parameter is concave in each of M and ε separately.

However, since the signal is ambiguous, F has another term, √(σθ² + 1/(M+ε)) / √(2π), that is absent in the corresponding Bayesian model. Since the agent chooses the safe action if the signal is bad (s < 0), she treats bad signals as if s = 0. Hence, the distribution of the signal is described by a truncated Normal distribution in the set P(M, ε). This implies that, before taking the minimum operation with respect to the signal distribution in (3.3), the expected value of the signal is decreasing in the precision through the change of the distribution of the signal. Hence, the highest precision M + ε is used for the minimum operation with respect to the distribution of the signal, and the expected value of the signal, √(σθ² + 1/(M+ε)) / √(2π), is convex in M and ε separately.

Though there are two conflicting effects on the value of information, the value of information is concave in M. Note first that the minimum correlation σθ²/(σθ² + 1/(M−ε)) is increasing and concave in M, while the expected value of the signal √(σθ² + 1/(M+ε)) / √(2π) is decreasing and convex in M. As far as the choice of M is concerned, the first effect always dominates the second; that is, the value of information is concave in M. This unambiguous result stems from the assumption of a Normally distributed prior and Normally distributed signals, which is crucial for interpreting the information quality of a signal as an interval of signal precisions: under other distributional assumptions, the second effect might dominate the first.

In contrast, the value of information is not concave in ε. For example, if σθ² = 0.1, then F22(1, 0.8) ≈ 0.0098 > 0. If the value of ε is higher, the agent is less confident about signal quality, and both the minimum correlation and the expected value of the signal are decreased. Moreover, the minimum correlation is concave in ε, but the expected value of the signal is convex in ε. Hence, for higher values of ε ∈ (7/(20σθ²), M], the first effect dominates the second, and the value of information is concave in ε. However, if ε is small, the value of information may not be concave, since the second effect may dominate the first.
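These two claims can be checked by finite differences; the sketch below is ours, using the example σθ² = 0.1 and the point (M, ε) = (1, 0.8):

```python
# Finite-difference check of Proposition 3.1: F is concave in M, but at
# sigma_theta^2 = 0.1 and (M, eps) = (1, 0.8) its second derivative in eps
# is positive, so F is not globally concave in eps.
import numpy as np

s2 = 0.1
def F(M, eps):
    return (s2 / (s2 + 1.0/(M - eps))) * np.sqrt(s2 + 1.0/(M + eps)) / np.sqrt(2*np.pi)

h = 1e-3
# d^2F/deps^2 at (1, 0.8): positive, i.e. locally convex in eps
F22 = (F(1.0, 0.8 + h) - 2*F(1.0, 0.8) + F(1.0, 0.8 - h)) / h**2
# d^2F/dM^2 along a grid with eps = 0.8 fixed: uniformly negative
Ms = np.linspace(0.9, 5.0, 200)
F11 = (F(Ms + h, 0.8) - 2*F(Ms, 0.8) + F(Ms - h, 0.8)) / h**2
print(F22, float(F11.max()))
```

The grid and step size are arbitrary choices; the signs of the difference quotients, not their exact magnitudes, are what the proposition predicts.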

3.2.2 Model 2

In this subsection, we assume that the upper bound on precision is determined only by M; that is, if the agent chooses M and ε, then

    ση² ∈ [1/M, 1/(M−ε)]   with 0 ≤ ε ≤ M.

Compare with (3.1). The variable M determines the location of the set of variances, and ε determines its width and the maximum variance. Information quality is [M − ε, M], and the associated cost is c·M + d(ε) with c > 0 and d′ < 0. The set of distributions of the signal is

    P′(M, ε) = { N(0, σθ² + ση²) : ση² ∈ [1/M, 1/(M−ε)] },

and the set of posteriors over the parameter is

    M′(s) = { N( (σθ²/(σθ² + ση²)) s, σθ²ση²/(σθ² + ση²) ) : ση² ∈ [1/M, 1/(M−ε)] }.

The agent solves the same problem as in model 1 except that she uses P′(M, ε) and M′(s). The corresponding Bayesian model is the same as in model 1, which means that the value of information is concave. After some algebra, similar to model 1, the value function is obtained as

    V({N(0, σθ²)}) = max_{M≥ε≥0} { −cM − d(ε) + [σθ²/(σθ² + 1/(M−ε))] √(σθ² + 1/M) / √(2π) }.

Define the value of information as

    G(M, ε) = [σθ²/(σθ² + 1/(M−ε))] √(σθ² + 1/M) / √(2π).

Although the signal is ambiguous, the value of information is concave in M and ε separately.

Proposition 3.2. 1. The value of information is concave in M; that is, G(·, ε) is concave on [ε, ∞).
2. The value of information is concave in ε; that is, G(M, ·) is concave on [0, M].

As in model 1, the effect of M on the minimal correlation between the signal and the parameter, σθ²/(σθ² + 1/(M−ε)), dominates its effect on the expected value of the signal, √(σθ² + 1/M) / √(2π), in the value function G. Hence the value of information is concave in M. In contrast, ε changes only the minimal correlation, which is concave in ε. This leads to the value function G being concave in ε.
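Proposition 3.2 can likewise be checked by finite differences (our own sketch; σθ² = 0.5 and the grids are arbitrary choices):

```python
# Finite-difference check of Proposition 3.2: in model 2 the value of
# information G is concave in M and in eps separately.
import numpy as np

s2 = 0.5
def G(M, eps):
    return (s2 / (s2 + 1.0/(M - eps))) * np.sqrt(s2 + 1.0/M) / np.sqrt(2*np.pi)

h = 1e-3
Ms = np.linspace(0.35, 5.0, 200)           # eps = 0.3 fixed, M > eps
G11 = (G(Ms + h, 0.3) - 2*G(Ms, 0.3) + G(Ms - h, 0.3)) / h**2
eps = np.linspace(0.0, 0.9, 200)           # M = 1 fixed, 0 <= eps <= M
G22 = (G(1.0, eps + h) - 2*G(1.0, eps) + G(1.0, eps - h)) / h**2
print(float(G11.max()), float(G22.max()))
```

Both arrays of second difference quotients are uniformly negative on these grids, consistent with concavity in each argument.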

3.2.3 Model 3

In contrast with model 2, we assume that the lower bound on precision is determined only by M; that is, if the agent chooses M and ε, then

    ση² ∈ [1/(M+ε), 1/M]   with 0 ≤ ε ≤ M.

Compare with (3.1). The variable M determines the location of the set of variances, and ε determines its width and the minimum variance. If the agent chooses M and ε, information quality is [M, M+ε], and the associated cost is c·M + d(ε) with c > 0 and d′ < 0. The set of distributions of the signal is

    P″(M, ε) = { N(0, σθ² + ση²) : ση² ∈ [1/(M+ε), 1/M] },

and the set of posteriors over the parameter is

    M″(s) = { N( (σθ²/(σθ² + ση²)) s, σθ²ση²/(σθ² + ση²) ) : ση² ∈ [1/(M+ε), 1/M] }.

The agent solves the same problem as in models 1 and 2 except that she uses P″(M, ε) and M″(s). Once again, the corresponding Bayesian model is the same as in model 1, which means that the value of information is concave. After some algebra, similar to model 1, the value function is obtained as

    V({N(0, σθ²)}) = max_{M≥ε≥0} { −cM − d(ε) + [σθ²/(σθ² + 1/M)] √(σθ² + 1/(M+ε)) / √(2π) }.

Define the value of information as

    I(M, ε) = [σθ²/(σθ² + 1/M)] √(σθ² + 1/(M+ε)) / √(2π).

Proposition 3.3. 1. The value of information is concave in M; that is, I(·, ε) is concave on [ε, ∞).
2. The value of information is convex in ε; that is, I(M, ·) is convex on [0, M].

As in the previous models, the value of information is concave in M. In this model, however, nonconcavity of the value of information is severe in that the value function is convex in ε. This is because ε changes only the expected value of the signal, which is convex in ε.
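A finite-difference check of Proposition 3.3 (again our own sketch, with arbitrary parameter choices) illustrates concavity in M together with convexity, and monotonic decline, in ε:

```python
# Finite-difference check of Proposition 3.3: in model 3 the value of
# information I is concave in M but decreasing and convex in eps.
import numpy as np

s2 = 0.5
def I(M, eps):
    return (s2 / (s2 + 1.0/M)) * np.sqrt(s2 + 1.0/(M + eps)) / np.sqrt(2*np.pi)

h = 1e-3
Ms = np.linspace(0.35, 5.0, 200)           # eps = 0.3 fixed
I11 = (I(Ms + h, 0.3) - 2*I(Ms, 0.3) + I(Ms - h, 0.3)) / h**2
eps = np.linspace(0.0, 1.0, 200)           # M = 1 fixed
I2 = (I(1.0, eps + h) - I(1.0, eps - h)) / (2*h)    # first derivative in eps
I22 = (I(1.0, eps + h) - 2*I(1.0, eps) + I(1.0, eps - h)) / h**2
print(float(I11.max()), float(I2.max()), float(I22.min()))
```

On these grids the second differences in M are all negative, while the first differences in ε are negative and the second differences in ε are all positive.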

3.2.4 Model 4

Here we assume that the agent can choose a scalar measure n of information, where larger n increases mean precision and reduces quality (increases the width of the interval of precisions). It is as if the agent buys n ambiguous independent signals at unit cost c, and each signal's noise follows N(0, σ²) with σ² ∈ [1/h, 1/l], where h ≥ l > 0. In the absence of ambiguity, having more signals is clearly better because they increase precision. But when each signal is ambiguous, a larger number of signals also leads to reduced information quality. If the agent chooses the level of experimentation n, the set of distributions of the signal is

    P*(n) = { N(0, σθ² + ση²) : ση² ∈ [1/(nh), 1/(nl)] },

and the set of posteriors over the parameter is

    M*(s) = { N( (σθ²/(σθ² + ση²)) s, σθ²ση²/(σθ² + ση²) ) : ση² ∈ [1/(nh), 1/(nl)] }.

Information quality is [nl, nh]. As in the previous models, the Bayesian case, corresponding to h = l = 1 and n = M, has a concave value function. More generally, the value function is

    V({N(0, σθ²)}) = max_{n≥0} { −cn + [σθ²/(σθ² + 1/(nl))] √(σθ² + 1/(nh)) / √(2π) }.

Define the value of information by

    J(n) = [σθ²/(σθ² + 1/(nl))] √(σθ² + 1/(nh)) / √(2π).

Proposition 3.4. The value of information is concave; that is, J(·) is concave on [0, ∞).

Although n affects both the location and the width of the interval of precisions, it does so in a fashion similar to the way that M acts in model 1. Hence the value of information is concave in n, as it is concave in M in model 1. The normality assumption on distributions is crucial, but also important is that the choice of n replaces the separate choices of M and ε in model 1. This indicates that if a signal is ambiguous, concavity of the value of information depends on whether or not the agent can choose the position and the width of the interval of precisions independently.

The concavity result here contrasts with results in RS and CS, where, in their Bayesian models, the value of information is not concave for low levels of experimentation. In RS and CS, corresponding to l = h = 1, nonconcavity

of the value of information arises because the marginal value of information is zero at null information, n = 0; if the marginal value of information is then positive for some n > 0, concavity is violated. In the present model with an ambiguous signal, the marginal value of information, J′(n), is strictly positive for all n. In particular, if h = l = 1, then the marginal value of n is positive even at n = 0.
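Proposition 3.4 can be illustrated numerically as well; in the sketch below (ours; the parameters σθ² = 0.5, h = 2, l = 1 are arbitrary), J is increasing and globally concave even near n = 0:

```python
# Finite-difference check of Proposition 3.4: with information quality
# [n*l, n*h] and h >= l > 0, the value of information J(n) is increasing
# and concave in the level of experimentation n.
import numpy as np

s2, h_prec, l_prec = 0.5, 2.0, 1.0         # illustrative parameters, h >= l
def J(n):
    return (s2 / (s2 + 1.0/(n*l_prec))) * np.sqrt(s2 + 1.0/(n*h_prec)) / np.sqrt(2*np.pi)

n = np.linspace(0.05, 10.0, 1000)
vals = J(n)
increasing = bool(np.all(np.diff(vals) > 0))   # J'(n) > 0 along the grid
concave = bool(np.all(np.diff(vals, 2) < 0))   # J''(n) < 0 along the grid
print(increasing, concave)
```

Unlike the separate (M, ε) choice of model 1, the single scalar n moves the location and the width of the interval together, and no nonconcave region appears.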

3.3 Proofs

3.3.1 The Detailed Algebra for Model 1

Since min_{μ∈M(s)} E^μ[θ] = [σθ²/(σθ² + 1/(M−ε))] s if s ≥ 0, and [σθ²/(σθ² + 1/(M+ε))] s if s < 0,

    min_{p∈P(M,ε)} ∫ max{0, min_{μ∈M(s)} E^μ[θ]} dp(s)
      = min_{p∈P(M,ε)} [σθ²/(σθ² + 1/(M−ε))] ∫₀^∞ s dp(s)
      = min_{ση²∈[1/(M+ε), 1/(M−ε)]} [σθ²/(σθ² + 1/(M−ε))] √(σθ² + ση²) / √(2π)
      = [σθ²/(σθ² + 1/(M−ε))] √(σθ² + 1/(M+ε)) / √(2π),

where p = N(0, σθ² + ση²) with ση² ∈ [1/(M+ε), 1/(M−ε)], and

    ∫₀^∞ s dp(s) = (1 − Φ(0)) · [φ(0)/(1 − Φ(0))] · (σθ² + ση²) = √(σθ² + ση²) / √(2π),

with φ(x) = (1/√(2π(σθ² + ση²))) e^{−x²/(2(σθ² + ση²))} and Φ(x) = ∫_{−∞}^x φ(t) dt.

Therefore, the problem is

    V({N(0, σθ²)}) = max_{M≥ε≥0} { −cM − d(ε) + [σθ²/(σθ² + 1/(M−ε))] √(σθ² + 1/(M+ε)) / √(2π) }.
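The key identity ∫₀^∞ s dp(s) = √(σθ² + ση²)/√(2π) is the mean of the positive part of a centered Normal; a quick Monte Carlo check (our own sketch, with an arbitrary variance) confirms it:

```python
# Check of the truncated-mean identity used above: if s ~ N(0, tau2), then
# E[max(0, s)] = sqrt(tau2)/sqrt(2*pi), i.e. the integral of s over [0, inf)
# against the N(0, tau2) density.
import numpy as np

rng = np.random.default_rng(1)
tau2 = 0.5 + 1.0/1.8          # e.g. sigma_theta^2 + sigma_eta^2 for one choice
s = rng.normal(0.0, np.sqrt(tau2), 1_000_000)
empirical = np.maximum(s, 0.0).mean()
theory = np.sqrt(tau2) / np.sqrt(2*np.pi)
print(empirical, theory)
```

With a million draws the empirical mean of max(0, s) matches the closed form to about three decimal places.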

3.3.2 Proof of Proposition 3.1

The derivatives of F with respect to M are

    F1(M, ε) = (σθ²/(2√(2π))) · A / [ {σθ²(M−ε) + 1}² {σθ²(M+ε) + 1}^{1/2} (M+ε)^{3/2} ] > 0,

    F11(M, ε) = (σθ²/(2√(2π))) · [ (2σθ²M + 3σθ²ε + 1){σθ⁴(M² − ε²) + 2σθ²M + 1}(M+ε)
                 − A(4σθ⁴M² + 4σθ⁴Mε + (11/2)σθ²M + (5/2)σθ²ε + 3/2) ]
                 / [ {σθ²(M−ε) + 1}³ {σθ²(M+ε) + 1}^{3/2} (M+ε)^{5/2} ],

where A = σθ²M² + 3σθ²Mε + σθ²ε² + M + 3ε. I show that F11(M, ε) ≤ 0 for all M and ε such that M ≥ ε ≥ 0. Define

    N1 = (2σθ²M + 3σθ²ε + 1){σθ⁴(M² − ε²) + 2σθ²M + 1}(M+ε) − A(4σθ⁴M² + 4σθ⁴Mε + (11/2)σθ²M + (5/2)σθ²ε + 3/2)
       = M⁴(−2σθ⁶) + M³(−11σθ⁶ε − (9/2)σθ⁴) + M²(−15ε²σθ⁶ − 24εσθ⁴ − 3σθ²)
         + M(−9ε³σθ⁶ − 20ε²σθ⁴ − (33/2)εσθ² − 1/2) − 3ε⁴σθ⁶ − (7/2)ε³σθ⁴ − 6ε²σθ² − (7/2)ε.

33 2 1 3 6 2 4 M −9ε σθ − 20ε σθ − εσθ − 2 2 7 7 −3ε4 σθ6 − ε3 σθ4 − 6ε2 σθ2 − ε. 2 2

Since N1 ≤ 0 for all M and ε such that M ≥ ε ≥ 0, F11 (M, ε) ≤ 0 for all M and ε such that M ≥ ε ≥ 0. The derivatives of F with respect to ε are σ2 B F2 (M, ε) = − √θ 1 3 ≤ 0, 2 2π {σθ2 (M − ε) + 1}2 {σθ2 (M + ε) + 1} 2 (M + ε) 2 (2σθ2 M + 6σθ2 ε + 1) {σθ4 (M 2 − ε2 ) + 2σθ2 M ε + 1}(M + ε) σ2 F22 (M, ε) = − √θ 2 2π

+ 12 B (8σθ4 ε2 + 8σθ4 M ε + 3σθ2 ε − 3σθ2 M − 3) 3

5

{σθ2 (M − ε) + 1}3 {σθ2 (M + ε) + 1} 2 (M + ε) 2

,

where B = 3σθ²M² + 2σθ²Mε + 3σθ²ε² + 3M + ε. The second derivative's sign is ambiguous. To provide a sufficient condition for F22(M, ε) ≤ 0, define

    N2 = (2σθ²M + 6σθ²ε + 1){σθ⁴(M² − ε²) + 2σθ²M + 1}(M+ε) + (1/2)B(8σθ⁴ε² + 8σθ⁴Mε + 3σθ²ε − 3σθ²M − 3)
       = M⁴(2σθ⁶) + M³(20σθ⁶ε + (1/2)σθ⁴) + M²(24σθ⁶ε² + (61/2)σθ⁴ε − 5σθ²)
         + M(12σθ⁶ε³ + (51/2)σθ⁴ε² + 10σθ²ε − 7/2) + 6ε⁴σθ⁶ + (15/2)ε³σθ⁴ + ε(3σθ²ε − 1/2).

If ε > max{ 10/(61σθ²), 7/(20σθ²), 1/(6σθ²) } = 7/(20σθ²), then N2 ≥ 0. Hence F22(M, ε) ≤ 0 for all M and ε such that M ≥ ε > 7/(20σθ²).

3.3.3 Proof of Proposition 3.2

The derivatives of G with respect to M are

    G1(M, ε) = (σθ²/(2√(2π))) · C / [ {σθ²(M−ε) + 1}² (σθ²M + 1)^{1/2} M^{3/2} ] ≥ 0,

    G11(M, ε) = (σθ²/(2√(2π))) · [ (−4σθ⁴Mε + σθ²M − σθ²ε + 1)(σθ²M + 1)M
                 − (1/2)C{σθ²(M−ε) + 1}(4σθ²M + 3) ]
                 / [ {σθ²(M−ε) + 1}³ (σθ²M + 1)^{3/2} M^{5/2} ],

where M − ε ≥ 0 and C = σθ²M² + 2σθ²Mε − σθ²ε² + M + ε. I show that G11(M, ε) ≤ 0 for all M and ε such that M ≥ ε ≥ 0. Define

    N3 = (−4σθ⁴Mε + σθ²M − σθ²ε + 1)(σθ²M + 1)M − (1/2)C{σθ²(M−ε) + 1}(4σθ²M + 3)
       = M⁴(−2σθ⁶) + M³(−6σθ⁶ε − (9/2)σθ⁴) + M²(6σθ⁶ε² − (21/2)σθ⁴ε − 3σθ²)
         + M(−2σθ⁶ε³ + (17/2)σθ⁴ε² − 6σθ²ε − 1/2) − (3/2)ε³σθ⁴ + 3ε²σθ² − (3/2)ε
       = M⁴(−2σθ⁶) + M³(−(9/2)σθ⁴) + M²(6σθ⁶ε(ε − M) − 3σθ²)
         + M(−2σθ⁶ε³ + (17/2)σθ⁴ε(ε − (21/17)M) − 1/2) − (3/2)ε³σθ⁴ + σθ²ε(3ε − 6M) − (3/2)ε.

Since M ≥ ε ≥ 0, N3 ≤ 0 and G11(M, ε) ≤ 0.

Since M ≥ ε ≥ 0, N3 ≤ 0 and G11 (M, ε) ≤ 0. The derivatives of G with respect to ε are 1

σ2 (σθ2 M + 1) 2 G2 (M, ε) = − √ θ 1 < 0, 2π {σθ2 (M − ε) + 1}2 M 2 1

2σ 4 (σθ2 M + 1) 2 G22 (M, ε) = − √ θ 1 < 0, 2π {σθ2 (M − ε) + 1}3 M 2 for all M and ε such that M ≥ ε ≥ 0.

3.3.4

Proof of Proposition 3.3

The derivatives of I with respect to M are

    I1(M, ε) = (σθ²/(2√(2π))) · D / [ (Mσθ² + 1)² (M+ε)^{3/2} {σθ²(M+ε) + 1}^{1/2} ] > 0,

    I11(M, ε) = −(σθ²/(4√(2π))) · N4 / [ (Mσθ² + 1)³ (M+ε)^{5/2} (Mσθ² + εσθ² + 1)^{3/2} ],

where D = σθ²M² + 4σθ²Mε + M + 2σθ²ε² + 2ε and

    N4 = M⁴(4σθ⁶) + M³(28σθ⁶ε + 9σθ⁴) + M²(48σθ⁶ε² + 48σθ⁴ε + 6σθ²)
         + M(32σθ⁶ε³ + 52σθ⁴ε² + 24σθ²ε + 1) + 8σθ⁶ε⁴ + 16σθ⁴ε³ + 12σθ²ε² + 4ε.

Hence I11(M, ε) < 0 for all M and ε such that M ≥ ε. The derivatives of I with respect to ε are

    I2(M, ε) = −(σθ²/(2√(2π))) · M / [ (Mσθ² + 1)(M+ε)^{3/2} {σθ²(M+ε) + 1}^{1/2} ] < 0,

    I22(M, ε) = (σθ²/(4√(2π))) · M(4Mσθ² + 4εσθ² + 3) / [ (Mσθ² + 1)(M+ε)^{5/2} (Mσθ² + εσθ² + 1)^{3/2} ] > 0.

3.3.5 Proof of Proposition 3.4

The derivatives of J are

    J′(n) = (σθ²/√(2π)) · ( −(1/2)σθ²nl + σθ²nh + 1/2 ) / [ (σθ²nl + 1)² (σθ²nh + 1)^{1/2} n^{1/2} h^{1/2} l^{−1} ] > 0,

    J″(n) = (σθ²/(2√(2π))) · ( n³σθ⁶hl(−4h + 2l) + n²σθ⁴l(−6h + (3/2)l) − 3nσθ²l − 1/2 )
            / [ (σθ²nl + 1)³ (σθ²nh + 1)^{3/2} n^{3/2} h^{1/2} l^{−1} ] < 0,

where h ≥ l > 0. Hence J(·) is concave on [0, ∞).


Bibliography

[1] Ahn, D. S. Ambiguity without a state space. November 2005, forthcoming in Review of Economic Studies.

[2] Atkeson, A. and Lucas, Jr., R. E. On efficient distribution with private information. Review of Economic Studies, Vol. 59, 427-453, 1992.

[3] Becker, G. S. and Mulligan, C. B. The endogenous determination of time preference. Quarterly Journal of Economics, Vol. 112, 729-758, 1997.

[4] Becker, R. On the long run steady state in a simple equilibrium with heterogeneous households. Quarterly Journal of Economics, Vol. 90, 375-382, 1980.

[5] Bertsekas, D. P. and Shreve, S. E. Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York, 1978.

[6] Blanchard, O. J. Debt, deficits, and finite horizons. Journal of Political Economy, Vol. 93, 223-247, 1985.

[7] Blume, L., Brandenburger, A., and Dekel, E. Lexicographic probability and choice under uncertainty. Econometrica, Vol. 59, 61-79, 1991.

[8] Caballé, J. and Fuster, L. Pay-as-you-go social security and the distribution of altruistic transfers. Review of Economic Studies, Vol. 70, 541-567, 2003.

[9] Chade, H. and Schlee, E. Another look at the Radner-Stiglitz nonconcavity in the value of information. Journal of Economic Theory, Vol. 107, 421-452, 2002.

[10] Chatterjee, S., Corbae, D., Nakajima, M., and Rios-Rull, J.-V. A quantitative theory of unsecured consumer credit with risk of default, 2006, forthcoming in Econometrica.

[11] DeGroot, M. H. Optimal Statistical Decisions. McGraw-Hill, 1970.

[12] Dekel, E., Lipman, B., and Rustichini, A. Representing preferences with a unique subjective state space. Econometrica, Vol. 69, 891-934, 2001.

[13] Dekel, E., Lipman, B., and Rustichini, A. Temptation-driven preferences. Working paper, April 2006.

[14] Dekel, E., Lipman, B., Rustichini, A., and Sarver, T. Representing preferences with a unique subjective state space: a corrigendum. Econometrica, Vol. 75, 591-600, 2007.

[15] Dutta, J. and Michel, P. The distribution of wealth with imperfect altruism. Journal of Economic Theory, Vol. 82, 379-404, 1998.

[16] Ellsberg, D. Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics, Vol. 75, 643-669, 1961.

[17] Epstein, L. G. Stationary cardinal utility and optimal growth under uncertainty. Journal of Economic Theory, Vol. 31, 133-153, 1983.

[18] Epstein, L. G. A definition of uncertainty aversion. Review of Economic Studies, Vol. 66, 579-608, 1999.

[19] Epstein, L. G., Marinacci, M., and Seo, K. Coarse contingencies. Working paper, February 2007.

[20] Epstein, L. G. and Zin, S. Substitution, risk aversion, and the temporal behavior of consumption and asset returns: a theoretical framework. Econometrica, Vol. 57, 937-969, 1989.

[21] Epstein, L. and Schneider, M. Ambiguity, information quality and asset pricing. Working paper, 2007.

[22] Farhi, E. and Werning, I. Inequality and social discounting. Working paper, October 2006.

[23] Ghirardato, P. and Marinacci, M. Ambiguity made precise: a comparative foundation. Journal of Economic Theory, Vol. 102, 251-289, 2002.

[24] Gilboa, I. and Schmeidler, D. Maxmin expected utility with nonunique prior. Journal of Mathematical Economics, Vol. 18, 141-153, 1989.

[25] Goldman, S. M. Flexibility and the demand for money. Journal of Economic Theory, Vol. 9, 203-222, 1974.

[26] Gul, F. and Pesendorfer, W. Temptation and self-control. Econometrica, Vol. 69, 1403-1435, 2001.

[27] Gul, F. and Pesendorfer, W. Self-control and the theory of consumption. Econometrica, Vol. 72, 119-158, 2004.

[28] Hausner, M. Multidimensional utilities. In Thrall, R. M., Coombs, C. H., and Davis, R. L., editors, Decision Processes, Wiley, New York, 1954.

[29] Karni, E. and Zilcha, I. Saving behavior in stationary equilibrium with random discounting. Economic Theory, Vol. 15, 551-564, 2000.

[30] Keppo, J., Moscarini, G., and Smith, L. The demand for information: more heat than light. Working paper, 2007.

[31] Kihlstrom, R. A Bayesian model of demand for information about product quality. International Economic Review, Vol. 15, 99-118, 1974.

[32] Kocherlakota, N. R. Societal benefits of illiquid bonds. Journal of Economic Theory, Vol. 108, 179-193, 2003.

[33] Kohn, M. G. and Shavell, S. The theory of search. Journal of Economic Theory, Vol. 9, 93-123, 1974.

[34] Koopmans, T. C. On flexibility of future preference. In Shelley, M. W. and Bryan, G. L., editors, Human Judgements and Optimality, chapter 13. Academic Press, New York, 1964.

[35] Kraus, A. and Sagi, J. S. Inter-temporal preference for flexibility and risky choice. September 2003, forthcoming in Journal of Mathematical Economics.

[36] Kreps, D. M. A representation theorem for preference for flexibility. Econometrica, Vol. 47, 565-578, 1979.

[37] Kreps, D. M. Static choice and unforeseen contingencies. In Dasgupta, P., Gale, D., Hart, O., and Maskin, E., editors, Economic Analysis of Markets and Games: Essays in Honor of Frank Hahn, 259-281. MIT Press, Cambridge, MA, 1992.

[38] Kreps, D. M. and Porteus, E. L. Temporal resolution of uncertainty and dynamic choice theory. Econometrica, Vol. 46, 185-200, 1978.

[39] Krusell, P. and Smith, A. Income and wealth heterogeneity in the macroeconomy. Journal of Political Economy, Vol. 106, 867-896, 1998.

[40] Levhari, D. and Srinivasan, T. N. Optimal savings under uncertainty. Review of Economic Studies, Vol. 36, 153-163, 1969.

[41] Lucas, Jr., R. E. On efficiency and distribution. Economic Journal, Vol. 102, 233-247, 1992.

[42] Mehra, R. and Sah, R. Mood fluctuations, projection bias, and volatility of equity prices. Journal of Economic Dynamics and Control, Vol. 26, 869-887, 2002.

[43] Radner, R. and Stiglitz, J. E. A nonconcavity in the value of information. In Boyer, M. and Kihlstrom, R., editors, Bayesian Models in Economic Theory, pages 33-52. North-Holland, Amsterdam, 1984.

[44] Rothschild, M. and Stiglitz, J. E. Increasing risk I: a definition. Journal of Economic Theory, Vol. 2, 225-243, 1970.

[45] Rothschild, M. and Stiglitz, J. E. Increasing risk II: its economic consequences. Journal of Economic Theory, Vol. 3, 66-84, 1971.

[46] Rustichini, A. Preference for flexibility in infinite horizon problems. Economic Theory, Vol. 20, 677-702, 2002.

[47] Salanié, F. and Treich, N. Over-savings and hyperbolic discounting. European Economic Review, Vol. 50, 1557-1570, 2006.

[48] Sandmo, A. The effect of uncertainty on saving decisions. Review of Economic Studies, Vol. 37, 353-360, 1970.

[49] Sarver, T. Anticipating regret: why fewer options may be better. Working paper, November 2006.

[50] Takeoka, N. Subjective probability over a subjective decision tree. April 2005, forthcoming in Journal of Economic Theory.

[51] Takeoka, N. Subjective prior over subjective states, stochastic choice, and updating. University of Rochester, Working paper, February 2005.

[52] Uzawa, H. Time preference, the consumption function and optimum asset holdings. In Wolfe, J. N., editor, Papers in Honour of Sir John Hicks, 257-273. The University of Edinburgh Press, 1968.

[53] Weil, P. Nonexpected utility in macroeconomics. Quarterly Journal of Economics, Vol. 105, 29-42, 1990.

[54] Yaari, M. E. Uncertain lifetime, life insurance, and the theory of the consumer. Review of Economic Studies, Vol. 32, 137-150, 1965.
