Three essays on the identification and estimation of structural economic models
(USC Thesis Other)
Three Essays on the Identification and Estimation of Structural Economic Models

by Cheng Zhou

A DISSERTATION SUBMITTED TO THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in (Economics)

THE UNIVERSITY OF SOUTHERN CALIFORNIA (Los Angeles)

August 2016

© Cheng Zhou 2016

Table of Contents

Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Dynamic Programming Discrete Choice Models
  1.1 Introduction
  1.2 Dynamic Programming Discrete Choice Model
  1.3 An example with four-period dynamic discrete choice
  1.4 Identification of Structural Parameters
  1.5 Estimation
  1.6 Numerical Studies
  1.7 Concluding Remarks
2 Econometrics of Buy-Price English Auctions
  2.1 Introduction
  2.2 Auction Setup
  2.3 Identification of the Distribution of Valuations
  2.4 Estimation
  2.5 Examples
  2.6 Concluding Remarks
3 Estimation of Identified Sets in Nonlinear Panel Data Models
  3.1 Introduction
  3.2 Identified Sets in Nonlinear Panel Data Models
  3.3 Asymptotic Marks
  3.4 Asymptotic Theory
  3.5 Numerical Examples
Bibliography
Appendices
A Appendix of Chapter 1
  A.1 Proofs
  A.2 Counterfactual Policy Predictions under Normalization of Period Utility Functions
B Appendix of Chapter 2
  B.1 Regularity Conditions
  B.2 Proofs
C Appendix of Chapter 3
  C.1 Proofs of Main Results
  C.2 Convergence of Rate of Sieve Estimators in Partially Identified Models

List of Tables

1.1 Estimation of Period Utility Functions: $d_x = 3$, $d_z = 4$
1.2 Estimation of Period Utility Functions: $d_x = 30$, $d_z = 4$
2.1 Simulation Setup

List of Figures

1.1 Confidence Interval (95%) of Period Utility Functions: $d_x = 30$, $d_z = 4$
2.1 Estimation of the Distribution of Valuations
3.1 Hit Estimator of Identified Set
3.2 Constrained Simulated Annealing Algorithm for Drawing Marks
3.3 Estimation of Identified Set in Logit Models
3.4 Estimation of Identified Set in Probit Models

Acknowledgements

I would like to thank my committee members, Geert Ridder (chair), Cheng Hsiao, Yu-Wei Hsieh, Roger Moon, and Botao Yang, for their valuable guidance and advice throughout my graduate school years. In particular, I would like to express my greatest gratitude to Geert Ridder for spending countless hours reading and improving my job market paper (Identification and Linear Estimation of General Dynamic Programming Discrete Choice Models), discussing various econometrics issues, and nourishing me intellectually in general. He is the best advisor I could have ever dreamed of.
I would like to thank Han Hong and Matt Shum for their great advice on writing and “selling” the job market paper, and for their enormous support during the job search process. I am also very grateful to Chia-Shang (James) Chu, my advisor during my studies at the Peking University China Center for Economic Research. He ignited my interest in econometrics and prepared me for the PhD studies and research at USC.

I would like to thank my wife Ruoyao Shi for both her personal and her professional support during my graduate school years. She has always been supportive and understanding. The discussions with her have expanded and deepened my understanding of econometrics and economics in general. I feel lucky to have such a great personal and professional partner in my life.

I would also like to thank Nazmul Ahsan, Ahram Moon, Robson Morgan, Shuyang Sheng, Qi Sun, Jin Wang, Fei Wang, Haojun Yu, Xing Zhang, Qiankun Zhou, Yaoyao Zhu and my other friends at USC for making the graduate school years more colorful than black-and-white paper.

Dedication

To my family.

Chapter 1

Identification and Linear Estimation of General Dynamic Programming Discrete Choice Models

This chapter studies the nonparametric identification and estimation of the structural parameters, including the per period utility functions, discount factors, and state transition laws, of general dynamic programming discrete choice (DPDC) models. I show an equivalence between the identification of general DPDC models and the identification of a linear GMM system. Using this equivalence, I simplify both the identification analysis and the estimation practice of DPDC models. First, I prove a series of identification results for the DPDC model by using rank conditions. Previous identification results in the literature are based on normalizing the per period utility functions of one alternative. Such normalization could severely bias the estimates of counterfactual policy effects.
I show that the structural parameters can be nonparametrically identified without the normalization. Second, I propose a closed form nonparametric estimator for the per period utility functions, the computation of which involves only least squares estimation. The existing estimation procedures rely on assuming that the dynamic programming (DP) problem is stationary or on solving the DP problem numerically with the aid of terminal conditions. Neither the identification nor the estimation requires terminal conditions, a stationary DPDC model, or a sample that covers the entire decision period.

1.1 Introduction

The dynamic programming discrete choice (DPDC) model is an empirical framework for studying intertemporal discrete choices in fields such as labor economics and empirical industrial organization (see Ackerberg, Benkard, Berry, and Pakes, 2007, and Keane et al., 2011, for surveys of applications). The DPDC model extends the static discrete choice model by allowing an individual’s current choice to affect not only her current utility but also her future state or utility. Taking occupational choice as an example (Keane and Wolpin, 1997), starting at some age, an individual can choose among different occupations/activities: attend school/college, work in either a blue or white collar occupation, start her own business, enlist in the military or stay at home (unemployed). Such an occupational choice can be made repeatedly through her lifetime. Her current occupational choice will affect not only her current utility but also her future human capital, and hence future income. For example, attending college is costly, but a college graduate is more likely to find a job. It is reasonable to view her current occupational choice as a result of intertemporal maximization of her expected lifetime utility. In DPDC models, this intertemporal optimization is solved by dynamic programming.
The econometric problems in DPDC models are the identification and estimation of structural parameters, including the per period utility functions, discount factors, and state transition laws. If the structural parameters of a DPDC model are obtained, counterfactual policy interventions, such as the effect of subsidizing tuition on college enrollment, can be simulated.

The existing identification results and estimation methods for (non)stationary DPDC models are both conceptually complicated and numerically difficult due to the complexity of (non)stationary dynamic programming, which is a recursive solution method. This paper shows that the identification and estimation of (non)stationary DPDC models can be greatly simplified, because the identification of general DPDC models is equivalent to the identification of a linear GMM system. So the identification of DPDC models can be understood from the familiar rank conditions in linear models. Moreover, the per period utility functions and discount factors can be estimated by a closed form linear estimator.

The idea of linear identification and estimation is inspired by the econometric literature on dynamic game models. Pesendorfer and Schmidt-Dengler (2008), Bajari, Chernozhukov, Hong, and Nekipelov (2009), and Bajari et al. (2010) show that the Markovian equilibria of dynamic games with discrete choices can be equivalently written as a system of equations linear in the per period utility functions. Hence the identification of per period utility functions in dynamic game models is similar to the identification of a linear GMM system. Moreover, the per period utility functions can then be estimated by least squares. As a special case of the dynamic game with discrete choices, the identification and estimation of infinite horizon stationary single agent DPDC models can also be addressed using the equivalence to a linear GMM system (Pesendorfer and Schmidt-Dengler, 2008; Srisuma and Linton, 2012).
Because the equivalence to a linear GMM system has greatly simplified our understanding of the identification of stationary DPDC models and their estimation, a natural question is whether such an equivalence exists for general DPDC models, especially finite horizon nonstationary DPDC models. Finite horizon models are common in labor economics, since households live for a finite time. This paper addresses this question.

The DPDC model studied in this paper is general in three ways. First, the decision horizon can be finite or infinite. Second, all structural parameters, including per period utility functions, discount factors and transition laws, are allowed to be time varying. Third, I do not assume that the per period utility function associated with one particular alternative is known, or is normalized to be a known constant. This feature is important, because normalization of the per period utility function will bias counterfactual policy predictions.

The normalization derives from the analogy between dynamic and static choice. In static discrete choice the conditional choice probabilities (CCP) only depend on the differences between the payoffs of alternatives. So we can change payoffs of alternatives so long as their differences are not changed. This ambiguity motivates the normalization of the payoff of one alternative (Magnac and Thesmar, 2002; Bajari, Benkard, and Levin, 2007; Pesendorfer and Schmidt-Dengler, 2008; Bajari, Chernozhukov, Hong, and Nekipelov, 2009; Blevins, 2014). However, normalization in dynamic discrete choice models is not innocuous for counterfactual policy predictions. This point has been mentioned recently by some authors in a variety of settings, e.g. Norets and Tang (2014); Arcidiacono and Miller (2015); Aguirregabiria and Suzuki (2014); Kalouptsidi, Scott, and Souza-Rodrigues (2015).¹ The intuition is that in a dynamic discrete choice model, a forward-looking individual’s current choice depends on future utility.
This future utility depends on the per period utility functions of all alternatives. Consider the normalization of setting the per period utility of the first alternative to be zero for all states. Such a normalization will distort the effects of the current choice on future utility, because the per period utility of the first alternative does not depend on the state. When we consider counterfactual interventions, the effects of the current choice on counterfactual future payoff will be also distorted, hence the counterfactual choice probability will be biased. Without imposing a normalization, I provide two alternative ways to identify the per period utility functions and discount factors. One is to assume that there are excluded state variables that do not affect per period utilities but affect state transitions. When excluded state variables are not available, another way is to assume that per period utility function is time invariant but that state transition laws are time varying. The excluded variables restriction has been used to identify discount factors in exponential discounting (Ching and Osborne, 2015) and hyperbolic discounting (Fang and Wang, 2015), but it has not been used to identify per period utility functions in general DPDC models. The closest work is Aguirregabiria and Suzuki’s (2014) study of market entry and exit decisions, where the per period utility function is equal to the observable revenue net of unobservable cost. Assuming that the firms’ dynamic programming problem is stationary, and the discount factor is known, they use exclusion restrictions to identify the cost function. However they do not consider the identification of the discount factor and of nonstationary DPDC models. Let us consider a binary choice model to explain the intuition why the exclusion restrictions can identify the per period utility function without normalization. The observable CCP is determined by the difference between the payoffs of the two alternatives. 
In the DPDC model, such a payoff difference is the sum of the difference between per period utility functions and the difference between the discounted continuation value functions. Exclusion restrictions create “exogenous” variation that can identify the value functions from the CCP. The identification of the per period utility functions follows from the Bellman equation.

¹ I provide two propositions in the appendix showing the misleading consequence of normalization for counterfactual analysis.

Using the equivalence to linear GMM, the estimation of DPDC models becomes so simple that the per period utility functions and discount factors can be estimated by a closed form linear estimator after estimating the conditional choice probabilities (CCP) and the state transition distributions. The implementation of our linear estimator is simple because only basic matrix operations are involved. Our linear estimator can be applied to situations where the agent’s dynamic programming problem is nonstationary, the panel data do not cover the whole decision period, and there are no terminal conditions available. Such simplicity in computation and flexibility in modeling are desirable in practice, because the existing estimation algorithms (Rust, 1987; Hotz and Miller, 1993; Aguirregabiria and Mira, 2002; Su and Judd, 2012) depend on complicated numerical optimization and/or iterative updating algorithms, and many of them cannot be applied when the dynamic programming problem is nonstationary and no terminal conditions are available.

1.1.1 Literature review

We now survey the literature. If the agent’s dynamic programming problem is stationary,² Rust (1994, section 3.5) shows that the structural parameters of the DPDC models, including the per period utility functions and the discount factor, are nonparametrically unidentified. However, the exact degree of underidentification is not clear.
Magnac and Thesmar (2002) extend Rust’s underidentification argument in two ways. First, they determine the exact degree of underidentification in a two-period DPDC model and discuss the identifying power of various restrictions (sections 2 to 4 of their paper). Their conclusion is that the alternative specific per period utility functions cannot be nonparametrically identified if the distribution of the unobserved payoff shocks, the discount factor, and the per period utility function and the alternative specific value function (ASVF) associated with one specific alternative are not all known. The precise definition of the ASVF will be given later; at this moment, one just needs to understand that the ASVF is the best expected remaining lifetime payoff if that particular alternative is chosen. Second, they study the identification of DPDC models with unobserved heterogeneity (section 5). The unobserved heterogeneity in their paper is discrete and affects per period utility functions but does not affect the law of state transitions. Their conclusion is that the DPDC models with unobservable heterogeneity are nonparametrically unidentified even under strong restrictions, such as that the current and future payoffs of one alternative are assumed to be known.

² The agent’s dynamic programming problem is stationary if the per period utility functions, the law of state transitions and the discount factor of future payoff are all time invariant, and the decision horizon is infinite.

The identification and estimation of the stationary single agent DPDC model is closely related to the identification of dynamic game models with Markov perfect equilibria. The crucial observation is that if the dynamic programming problem is stationary, the Bellman equation becomes a Fredholm integral equation of the second kind, from which the value function can be solved.
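As a rough illustration of why stationarity helps: when the observed state is discrete, an equation of the second kind for the value function takes the matrix form $v = r + \beta F v$, which is linear in $v$ and can be solved directly. The sketch below is only a toy under stated assumptions. The payoff vector `r`, the transition matrix `F`, the discount factor `beta`, and both helper functions are hypothetical and are not taken from any of the cited papers.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def stationary_value(r, F, beta):
    """Solve v = r + beta * F v, i.e. (I - beta * F) v = r."""
    n = len(r)
    a = [[(1.0 if i == j else 0.0) - beta * F[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(a, r)


# Made-up inputs: expected per period payoff and a row-stochastic transition matrix.
r = [1.0, 0.5, 2.0]
F = [[0.7, 0.2, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.1, 0.8]]
beta = 0.9

v = stationary_value(r, F, beta)
# Largest violation of the fixed-point equation v = r + beta * F v.
residual = max(
    abs(v[i] - (r[i] + beta * sum(F[i][j] * v[j] for j in range(3))))
    for i in range(3)
)
```

With a discount factor below one and rows of `F` summing to one, the matrix $I - \beta F$ is strictly diagonally dominant, so the system has a unique solution. No such single linear system is available when payoffs and transitions change with the period.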
This implies that we have a closed form representation of the value function in terms of the per period utility functions, the discount factor and the observable CCP. Moreover, the observable CCP is determined by the difference between the payoffs of the alternatives, which is the sum of the difference between the per period utility functions and the difference between the discounted continuation value functions. We then have a closed form representation of CCP in terms of the per period utility functions, the discount factor and the value function. Replacing the value function in the CCP representation with its closed form representation from solving the Bellman equation, we have an equation that involves only the observable CCP, the unknown per period utility functions and the unknown discount factor. Assuming that the discount factor and the per period utility function associated with one alternative are known, Pesendorfer and Schmidt-Dengler (2008), Bajari, Chernozhukov, Hong, and Nekipelov (2009), Bajari et al. (2010) use this equation to study the identification of per period utility functions in dynamic game models, and Srisuma and Linton (2012) study the identification and estimation of stationary single agent DPDC model when some of the state variables are continuous. Blevins (2014) studies the nonparametric identification of the stationary dynamic programming decision process when the decisions involve both discrete and continuous choice. In the first stage, an agent makes a discrete choice; in the second stage, the agent makes a continuous choice given her previous discrete choice. Bajari et al. (2007) also use such a two-stage specification. Bajari et al. focus on the estimation issues, and their analysis allows for dynamic games. The advantage of using such a two-stage specification is that once the policy function of continuous choice is identified, the optimal continuous choice can be viewed as an observable state variable. 
Thus, Blevins’ model becomes a stationary DPDC model. Blevins’ conclusion is that when the discount factor, the per period utility function of one specific alternative, and the distribution of preference shocks are known, the per period utility functions of the other alternatives are nonparametrically identifiable. This conclusion corresponds to the earlier observation in Magnac and Thesmar (2002). When the distribution of preference shocks is unknown, he provides some exclusion restrictions (Assumption 12 on p. 546 of his paper) that can lead to identification of the distribution of the differences between payoff shocks. His method for identifying the distribution is similar to the control function approach used in the nonparametric instrumental variable literature (e.g., Blundell and Powell, 2004; Imbens and Newey, 2009). His identification arguments depend crucially on the stationarity assumption, without which the functional mapping in his proof does not exist. When the distribution of preference shocks is unknown, Norets and Tang (2014) provide a partial identification approach to analyze the stationary DPDC models when all observable state variables are discrete with finite support.

Heckman and Navarro (2007) and Aguirregabiria (2010) study the identification of nonstationary DPDC models with a finite decision horizon. Both papers assume that researchers can observe the “outcomes” of agents’ choices. For example, the outcome is one’s earnings in Heckman and Navarro’s schooling decisions study. An agent’s per period utility is assumed to be the outcome net of the unobservable cost of the choice, and hence the identification of the utility function is equivalent to the identification of the cost function. Heckman and Navarro identify the period utility function under several restrictions.
The two most substantial restrictions are (1) the continuation value associated with one specific alternative is known, and (2) the transition between the observed states does not depend on the agent’s decisions. These assumptions are restrictive in practice. Without these assumptions, Aguirregabiria aims to identify the effects of certain counterfactual policy interventions rather than the structural parameters in the case that the policy effects on the agents’ per period utility functions are completely known. There are two limitations of his approach. First, his method applies only to counterfactual policy interventions that affect the per period utility functions and the effects of which on current utility are completely known. If the intervention effects are unknown or the interventions are on state transitions, his method cannot be applied. Second, identification and estimation statements are based on backward induction, and the estimation requires data about decisions in the final decision period. It is not clear if this method can be extended to deal with infinite horizon DPDC models, for which the panel data cannot cover the entire decision process.³

The estimation of a DPDC model is usually complicated since the model is based on dynamic programming, which is a recursive solution method. Researchers usually adopt the maximum likelihood method to estimate the structural parameters, although it is not clear if the log likelihood function has a unique global maximizer. The first estimation method was the nested fixed-point (NFXP) algorithm proposed by Rust (1987). To alleviate the computational burden, Hotz and Miller (1993) developed a semiparametric two-step estimator of the structural parameters. The first step is to estimate the CCP nonparametrically. The second step uses the famous Hotz and Miller inversion proposition that gives a representation of the ASVF in terms of the CCP, per period utility functions, and the discount factor.
Consequently, one has a closed form representation of the CCP in terms of the CCP itself and the structural parameters (see equation (3.12) of Hotz and Miller’s paper). Substituting nonparametric estimates obtained in the first step for the CCP in the closed form representation, one has the CCP for each value of structural parameters. Equating these expressions with the nonparametric estimates, one can develop a GMM estimator of the structural parameters, and this is the second step of Hotz and Miller’s estimation method. Hotz and Miller’s idea also applies to nonstationary DPDC models.

³ In the working paper version (Aguirregabiria, 2005), he did study the identification and estimation when the decision horizon is infinite. But there he has to assume that the dynamic programming problem is stationary, and the estimation procedure becomes computationally difficult because some contraction mappings are involved in his procedure.

There are two potential limitations of the Hotz and Miller two-step estimator. First, the computational gain comes at the expense of efficiency. This drawback has been addressed by Aguirregabiria and Mira (2002). Second, the closed form representation of the ASVF becomes complicated when there are many future periods before the decision horizon. The complication comes from the fact that the representation of the ASVF, see equation (3.12) of Hotz and Miller’s paper, requires the evaluation of the probabilities of all possible future paths and the expected utilities associated with these paths. This has not been noticed in the literature because the existing estimators focus on the stationary DPDC models, and under stationarity it is easier to express the ASVF in terms of the CCP and structural parameters (see equation (8) of Aguirregabiria and Mira, 2002, for example).
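To make the inversion step of the two-step approach concrete, here is a minimal sketch for the binary case. It assumes, purely for illustration, that the two payoff shocks are independent standard type I extreme value draws, so that their difference is logistic. The function names and numbers are hypothetical and are not taken from Hotz and Miller's paper. Under that assumption the gap between the two alternative specific values equals the log odds of the CCP, and the expected maximum of the two payoffs can be written using the value of one alternative and the CCP alone.

```python
import math

EULER_GAMMA = 0.5772156649015329  # mean of a standard type I extreme value draw


def ccp_logit(v0, v1):
    """P(D = 1) when the shock difference is logistic (illustrative assumption)."""
    return 1.0 / (1.0 + math.exp(-(v1 - v0)))


def invert_ccp(p):
    """Recover the value gap v1 - v0 from the choice probability (log odds)."""
    return math.log(p / (1.0 - p))


def expected_max(v0, v1):
    """E[max(v0 + e0, v1 + e1)] for independent standard type I extreme value shocks."""
    return math.log(math.exp(v0) + math.exp(v1)) + EULER_GAMMA


# Made-up alternative specific values at one state.
v0, v1 = 0.4, 1.3
p = ccp_logit(v0, v1)

# The CCP alone pins down the difference between the two values.
recovered_gap = invert_ccp(p)

# The expected maximum expressed through v0 and the CCP only.
via_ccp = v0 - math.log(1.0 - p) + EULER_GAMMA
```

The inversion recovers only the difference between the two values, never their levels, which is the same ambiguity that motivates the normalization discussed in the introduction.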
Aguirregabiria and Mira (2002) provide a new approach called the nested pseudo likelihood (NPL) algorithm to estimate stationary DPDC models when the state variables are discrete. Their estimator could be as efficient as Rust’s NFXP estimator but computationally easier. When the dynamic programming process is stationary, Aguirregabiria and Mira establish a contraction mapping for the CCP. Using this contraction mapping, Aguirregabiria and Mira’s NPL estimator can improve the estimate of the CCP used in the second step of Hotz and Miller’s two-step estimator.

Recently, Su and Judd (2012) provide another estimation approach called the mathematical program with equilibrium constraints (MPEC) for the stationary DPDC model. The equilibrium constraint in stationary DPDC models corresponds to the integrated Bellman equation. Their idea is to treat the ex ante value function, which becomes a vector when the observable states are discrete, as a parameter in maximizing the log likelihood function subject to the constraint that the ex ante value function must solve the integrated Bellman equation. However, their method works only with discrete state variables, and the number of points in the support has to be small. It is also not clear whether their method can be used to estimate nonstationary DPDC models.

1.1.2 Structure of the paper and notation rules

In section §1.2, we develop the dynamic programming discrete choice model whose identification and estimation will be studied. The model’s setup follows the literature, except that we allow per period utility functions and discount factors to be time varying. In section §1.4, we show that the identification of the DPDC model is equivalent to the identification of a linear GMM system, and provide a list of identification results under various restrictions. In particular, we show two ways to identify the DPDC models without normalizing per period utility functions.
After clarifying the identification of the model, we show that the DPDC model can be estimated by simple linear estimators. Numerical experiments are conducted to check the performance and to highlight some issues with our estimator. The last section concludes the paper with a discussion of some extensions.

Notation. Let $X$, $Y$ and $Z$ be three random variables. We write $X \perp\!\!\!\perp Y$ to denote that $X$ and $Y$ are independent, and write $X \perp\!\!\!\perp Y \mid Z$ to denote that $X$ and $Y$ are independent conditional on $Z$. Suppose the random variable $X$ can take only a finite number of values, with support $\mathcal{X} \equiv (x_1, \ldots, x_{d_x})$. Let $f(X): \mathcal{X} \mapsto \mathbb{R}$ be a real function. We use $\boldsymbol{f}$ to denote the $d_x$-dimensional vector $(f(x_1), \ldots, f(x_{d_x}))^\top$. For a real number $a \in \mathbb{R}$, let $\boldsymbol{a}_n \equiv (a, \ldots, a)^\top$ be an $n$-dimensional vector with entries all equal to $a$.

1.2 Dynamic Programming Discrete Choice Model

1.2.1 The model

We first set up the dynamic programming discrete choice model. A female labor force participation example then follows the abstract setup to illustrate the notation. We restrict our attention to the binary choice case. The extension to multinomial choice is straightforward at the expense of more cumbersome notation (see Remark 4 in section §1.4).

In each period $t$, an agent makes a binary choice $D_t \in \{0, 1\}$ based on a vector of state variables $\Omega_t = (S_t, \varepsilon_t^0, \varepsilon_t^1)$. Researchers only observe the choice $D_t$ and the state variable $S_t$. The choice in period $t$ affects both the agent’s instantaneous utility in period $t$ and the distribution of the next period state variable $\Omega_{t+1}$. Assumption 1 restricts the instantaneous utility to be additive in the unobserved state variables. Assumption 2 assumes that the state variable $\Omega_t$ is a controlled first-order Markov process. Both are standard assumptions in the literature.

Assumption 1. The agent receives instantaneous utility $u_t(\Omega_t, D_t)$ in period $t$.
In particular, let

$$u_t(\Omega_t, D_t) = D_t\,(\mu_t^1(S_t) + \varepsilon_t^1) + (1 - D_t)\,(\mu_t^0(S_t) + \varepsilon_t^0),$$

so that $u_t(\Omega_t, D_t = d)$ is additive in the unobserved state variable $\varepsilon_t^d$. We call $\mu_t^d(S_t)$ the (structural) per period utility function in period $t$ associated with alternative $d$.

Assumption 2. The choice in period $t$ affects the distribution of the next period state variable $\Omega_{t+1}$. Given the current state variable $\Omega_t$ and choice $D_t$, the next period state variable $\Omega_{t+1}$ is independent of all previous state variables and choices, that is, $\Omega_{t+1} \perp\!\!\!\perp (\Omega_{t'}, D_{t'}) \mid (\Omega_t, D_t)$ for any $t' < t$.

Let $T \le \infty$ be the last decision period. In each period $t$, the agent makes a sequence of choices $\{D_t, \ldots, D_T\}$ to maximize the expected remaining lifetime utility in period $t$,

$$u_t(\Omega_t, D_t) + \sum_{r=t+1}^{T} \Big( \prod_{j=t}^{r-1} \beta_j \Big) E_{\Omega_r}[u_r(\Omega_r, D_r) \mid \Omega_t, D_t],$$

where $\beta_t \in [0, 1)$ is the discount factor in period $t$. The agent’s problem is a Markov decision process, which can be solved by dynamic programming. Let $V_t(\Omega_t)$ be the value function in period $t$. The optimal choice $D_t$ solves the Bellman equation,

$$\begin{aligned} V_t(\Omega_t) &= \max_{d \in \{0,1\}} \; u_t(\Omega_t, D_t = d) + \beta_t E_{\Omega_{t+1}}[V_{t+1}(\Omega_{t+1}) \mid \Omega_t, D_t = d] \\ &= \max_{d \in \{0,1\}} \; \mu_t^d(S_t) + \varepsilon_t^d + \beta_t E_{\Omega_{t+1}}[V_{t+1}(\Omega_{t+1}) \mid S_t, \varepsilon_t^0, \varepsilon_t^1, D_t = d]. \end{aligned} \tag{1.1}$$

In other words, the agent’s decision rule is as follows,

$$D_t = \begin{cases} 1, & \text{if } \mu_t^1(S_t) + \beta_t E_{\Omega_{t+1}}[V_{t+1}(\Omega_{t+1}) \mid S_t, \varepsilon_t^0, \varepsilon_t^1, D_t = 1] + \varepsilon_t^1 \\ & \quad > \mu_t^0(S_t) + \beta_t E_{\Omega_{t+1}}[V_{t+1}(\Omega_{t+1}) \mid S_t, \varepsilon_t^0, \varepsilon_t^1, D_t = 0] + \varepsilon_t^0, \\ 0, & \text{otherwise.} \end{cases} \tag{1.2}$$

Without further restriction on the state transition distribution, the continuation value $E_{\Omega_{t+1}}[V_{t+1}(\Omega_{t+1}) \mid S_t, \varepsilon_t^0, \varepsilon_t^1, D_t = d]$ is non-separable from the unobserved state variables $\varepsilon_t^0$ and $\varepsilon_t^1$. To avoid dealing with non-separable models, we make the following assumption.

Assumption 3. (i) Let $\varepsilon_t = (\varepsilon_t^0, \varepsilon_t^1)^\top$. The sequence of unobserved state variables $\{\varepsilon_t\}$ is independent and identically distributed.
(ii) For each period $t$, $S_t \perp\!\!\!\perp (\varepsilon_t, \varepsilon_{t+1})$. (iii) For each period $t$, $S_{t+1} \perp\!\!\!\perp \varepsilon_t \mid (S_t, D_t)$.

The assumption is standard in the literature, but we want to emphasize the implied limitations. Assumption 3.(i) implies that the unobserved state variable $\varepsilon_t$ does not include unobserved heterogeneity that is constant or serially correlated over time. For example, suppose $\varepsilon^d_t = \eta + \omega^d_t$, where $\eta$ is time invariant unobserved heterogeneity, and $\omega^d_t$ is a serially independent random utility shock. Then the unobserved state variable $\varepsilon^d_t$ becomes serially correlated. Moreover, if the unobserved heterogeneity $\eta$ is a fixed effect that is correlated with the observed state variable $S_t$, Assumption 3.(ii) is violated. If, conditional on $(S_t, D_t)$, the unobserved heterogeneity $\eta$ can still affect the distribution of the next period state variable $S_{t+1}$, Assumption 3.(iii) is violated.

Applying Assumption 3, it can be verified that for each alternative $d \in \{0, 1\}$,
$$\mathrm{E}_{\Omega_{t+1}}[V_{t+1}(\Omega_{t+1}) \mid S_t, \varepsilon^0_t, \varepsilon^1_t, D_t = d] = \mathrm{E}_{S_{t+1}}[v_{t+1}(S_{t+1}) \mid S_t, D_t = d], \tag{1.3}$$
where
$$v_{t+1}(S_{t+1}) \equiv \mathrm{E}_{\varepsilon_{t+1}}[V_{t+1}(S_{t+1}, \varepsilon_{t+1}) \mid S_{t+1}] \tag{1.4}$$
is called the ex ante value function in the literature. Because the conditional expectations $\mathrm{E}_{S_{t+1}}(\cdot \mid S_t, D_t = 0)$ and $\mathrm{E}_{S_{t+1}}(\cdot \mid S_t, D_t = 1)$ as well as their difference will be frequently used, define the following notation for expositional simplicity,
$$\mathrm{E}^{d}_{t+1}(\cdot \mid S_t) \equiv \mathrm{E}_{S_{t+1}}(\cdot \mid S_t, D_t = d), \quad d \in \{0, 1\}; \qquad \mathrm{E}^{1/0}_{t+1}(\cdot \mid S_t) \equiv \mathrm{E}_{S_{t+1}}(\cdot \mid S_t, D_t = 1) - \mathrm{E}_{S_{t+1}}(\cdot \mid S_t, D_t = 0). \tag{1.5}$$
It should be remarked that $\mathrm{E}^{1/0}_{t+1}(c \mid S_t) = 0$ for any real constant $c$, so the conditional expectation difference $\mathrm{E}^{1/0}_{t+1}(\cdot \mid S_t)$ viewed as a linear operator is not invertible.
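The non-invertibility remark can be illustrated numerically. The sketch below is only an illustration: the two three-state transition matrices are hypothetical numbers chosen for this example, and the operator $\mathrm{E}^{1/0}_{t+1}(\cdot \mid S_t)$ is represented by the matrix difference $F^1 - F^0$ acting on a vector of function values. It shows that a constant is mapped to zero, so two functions that differ by a constant cannot be told apart.

```python
# Hypothetical 3-state transition matrices under D_t = 0 and D_t = 1.
# These numbers are illustrative assumptions; each row sums to one.
F0 = [[0.6, 0.3, 0.1],
      [0.2, 0.5, 0.3],
      [0.1, 0.2, 0.7]]
F1 = [[0.2, 0.3, 0.5],
      [0.1, 0.4, 0.5],
      [0.3, 0.3, 0.4]]


def cond_exp_diff(F1, F0, f):
    """Return E^{1/0}[f(S') | S = s] for every s, i.e. the vector (F1 - F0) f."""
    return [sum((p1 - p0) * fv for p1, p0, fv in zip(r1, r0, f))
            for r1, r0 in zip(F1, F0)]


f = [1.0, -2.0, 0.5]   # an arbitrary function of the state
c = 7.0                # an arbitrary constant

# A constant function is annihilated by the difference operator.
const_image = cond_exp_diff(F1, F0, [c] * 3)

# Hence f and f + c have the same image: the operator has a nontrivial null space.
shifted = cond_exp_diff(F1, F0, [v + c for v in f])
original = cond_exp_diff(F1, F0, f)
shift_gap = [a - b for a, b in zip(shifted, original)]
```

This is the reason why, later on, levels of value functions and of per period utilities are pinned down only up to an additive constant unless a normalization is imposed.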
Define the alternative specific value function (ASVF) $v^d_t(S_t)$ for each alternative $d \in \{0, 1\}$,
$$v^d_t(S_t) = \mu^d_t(S_t) + \beta_t \mathrm{E}_{\Omega_{t+1}}[V_{t+1}(\Omega_{t+1}) \mid S_t, \varepsilon^0_t, \varepsilon^1_t, D_t = d] = \mu^d_t(S_t) + \beta_t \mathrm{E}^{d}_{t+1}[v_{t+1}(S_{t+1}) \mid S_t]. \tag{1.6}$$
The second equality follows from equation (1.3). Using the notation of the ASVF, the Bellman equation (1.1) becomes
$$V_t(S_t, \varepsilon_t) = \max_{d \in \{0,1\}} v^d_t(S_t) + \varepsilon^d_t, \tag{1.7}$$
and the decision rule (1.2) now has a simpler expression,
$$D_t = \begin{cases} 1, & \text{if } v^1_t(S_t) + \varepsilon^1_t > v^0_t(S_t) + \varepsilon^0_t, \\ 0, & \text{otherwise.} \end{cases} \tag{1.8}$$
By the decision rule (1.8), the CCP $p_t(S_t) = \mathrm{P}(D_t = 1 \mid S_t)$ equals the following,
$$p_t(S_t) = \mathrm{P}(\varepsilon^0_t - \varepsilon^1_t < v^1_t(S_t) - v^0_t(S_t)).$$
Let $G(\cdot, \cdot)$ be the cumulative distribution function (CDF) of the vector of unobserved state variables $\varepsilon_t = (\varepsilon^0_t, \varepsilon^1_t)^\top$, and let $\tilde{G}(\cdot)$ be the CDF of $\tilde{\varepsilon}_t = \varepsilon^0_t - \varepsilon^1_t$. In terms of the CDF $\tilde{G}(\cdot)$, the CCP is written as follows,
$$p_t(S_t) = \tilde{G}(v^1_t(S_t) - v^0_t(S_t)) = \tilde{G}(\mu^1_t(S_t) - \mu^0_t(S_t) + \beta_t \mathrm{E}^{1/0}_{t+1}[v_{t+1}(S_{t+1}) \mid S_t]). \tag{1.9}$$
When the CDF $\tilde{G}(\cdot)$ is unknown, even the ASVF difference $v^1_t(S_t) - v^0_t(S_t)$ cannot be identified, let alone the structural per period utility functions $\mu^0_t$ and $\mu^1_t$. Even when the CDF $\tilde{G}(\cdot)$ is known, the absolute levels of the per period utility functions $\mu^0_t(S_t)$ and $\mu^1_t(S_t)$ cannot be identified. Take $\beta_t = 0$ for example: for any constant $c \in \mathbb{R}$,
$$p_t(S_t) = \tilde{G}(\mu^1_t(S_t) - \mu^0_t(S_t)) = \tilde{G}([\mu^1_t(S_t) + c] - [\mu^0_t(S_t) + c]).$$
To address these concerns, we make the following assumption.

Assumption 4. (i) The CDF $G(\cdot, \cdot)$ of the unobserved state variables $\varepsilon_t = (\varepsilon^0_t, \varepsilon^1_t)^\top$ and the CDF $\tilde{G}(\cdot)$ of $\tilde{\varepsilon}_t = \varepsilon^0_t - \varepsilon^1_t$ are known. Moreover, $\tilde{\varepsilon}_t$ is a continuous random variable with support $\mathbb{R}$, and the CDF $\tilde{G}(\cdot)$ is strictly increasing. (ii) The observable state variable $S_t$ is discrete with time invariant support $\mathcal{S} = \{s_1, \dots, s_{d_s}\}$. (iii) (Normalization).
For every period $t$, let $\mu^0_t(s_1) = 0$.

Note that apart from the presence of the unknown ex ante value function $v_{t+1}(S_{t+1})$, the CCP formula (1.9) is similar to the CCP in the binary static discrete choice model studied by Matzkin (1992), in which the CDF $\tilde{G}(\cdot)$ can be nonparametrically identified. In the presence of the "special regressor" and the median assumption as assumed in Matzkin (1992), the CDF $\tilde{G}(\cdot)$ of $\tilde{\varepsilon}_t$ can be identified by following Matzkin's arguments (see also Aguirregabiria, 2010).

The normalization in Assumption 4.(iii) differs from the commonly used normalization, which sets
$$\mu^0_t(s_1) = \mu^0_t(s_2) = \cdots = \mu^0_t(s_{d_s}) = 0, \quad \forall t. \tag{1.10}$$
The normalization (1.10) implies that the per period utility of alternative 0 does not vary with the values of the state variable $S_t$. It has been gradually realized that the normalization (1.10) is not innocuous for predicting counterfactual policy effects (see e.g. Norets and Tang, 2014; Arcidiacono and Miller, 2015; Aguirregabiria and Suzuki, 2014; Kalouptsidi et al., 2015). In Appendix A.2, we show two things. First, the normalization (1.10) will bias the counterfactual policy predictions if the per period utility of alternative 0 depends on the value of the observed state variable $S_t$. Second, the normalization of Assumption 4.(iii) will not bias the counterfactual policy predictions.

By assuming a discrete state space (Assumption 4.(ii)), the structural per period utility functions $\mu^0_t(S_t)$ and $\mu^1_t(S_t)$, the CCP $p_t(S_t)$, the ASVF $v^0_t(S_t)$ and $v^1_t(S_t)$, and the ex ante value functions $v_t(S_t)$ are all finite dimensional. Denote $\mu^0_t = (\mu^0_t(s_1), \dots, \mu^0_t(s_{d_s}))^\top$; the vectors $\mu^1_t$, $p_t$, $v^0_t$, $v^1_t$ and $v_t$ are defined similarly. It should be remarked that our identification results below hold for any finite number of states $d_s$. Let $f_{t+1}(S_{t+1} \mid S_t, D_t)$ be the conditional probability function of $S_{t+1}$ given $S_t$ and $D_t$.
Let $F^d_{t+1}$ be the state transition matrix describing the transition probabilities from state $S_t$ to $S_{t+1}$ when choice $D_t = d \in \{0, 1\}$:
$$F^d_{t+1} \equiv \begin{bmatrix} f_{t+1}(s_1 \mid s_1, D_t = d) & \cdots & f_{t+1}(s_{d_s} \mid s_1, D_t = d) \\ \vdots & \cdots & \vdots \\ f_{t+1}(s_1 \mid s_{d_s}, D_t = d) & \cdots & f_{t+1}(s_{d_s} \mid s_{d_s}, D_t = d) \end{bmatrix}.$$
The difference between the two state transition matrices $F^1_t$ and $F^0_t$ will be frequently used, so we denote
$$F^{1/0}_t \equiv F^1_t - F^0_t.$$
Similarly, let $f^{1/0}_{t+1}(S_{t+1} \mid S_t) \equiv f_{t+1}(S_{t+1} \mid S_t, D_t = 1) - f_{t+1}(S_{t+1} \mid S_t, D_t = 0)$.

Example (Female labor force participation model). Our particular model is based on Keane et al. (2011, section 3.1). In each year $t$, a married woman makes a labor force participation decision $D_t \in \{0, 1\}$, where 1 is "to work" and 0 is "not to work", to maximize the expected lifetime utility. The per period utility depends on the household consumption ($\mathrm{cons}_t$) and the number of young children ($\mathrm{kid}_t$) in the household.$^{4}$ Consumption equals the household's income net of child-care expenditures. The household income is the sum of the husband's income ($\mathrm{husb}_t$) and the wife's income ($\mathrm{wage}_t$) if she works. The per-child child-care cost is $\pi$ if she works, and zero if she stays at home. So consumption is
$$\mathrm{cons}_t = \mathrm{husb}_t + \mathrm{wage}_t D_t - \pi\, \mathrm{kid}_t D_t.$$
Suppose the wage offer function takes the following form
$$\mathrm{wage}_t = \alpha_1 + \alpha_2\, \mathrm{xp}_t + \alpha_3 (\mathrm{xp}_t)^2 + \alpha_4\, \mathrm{edu} + \omega_t,$$
where $\mathrm{xp}_t$ is the working experience (measured by the number of prior periods the woman has worked) of the woman in year $t$, $\mathrm{edu}$ is her education level, and $\omega_t$ is a random shock, which is independent of the wife's working experience and education.

$^{4}$ We do not model the fertility decision, and assume the arrival of children is an exogenous stochastic process.
The wife's working experience $\mathrm{xp}_t$ evolves by $\mathrm{xp}_{t+1} = \mathrm{xp}_t + D_t$. Assume the period utility functions associated with the two alternatives are
$$u^1_t(S_t, \varepsilon^1_t) = \mu^1_t(\mathrm{husb}_t, \mathrm{xp}_t, \mathrm{edu}, \mathrm{kid}_t) + \varepsilon^1_t = \mathrm{husb}_t + \alpha_1 + \alpha_2\, \mathrm{xp}_t + \alpha_3 (\mathrm{xp}_t)^2 + \alpha_4\, \mathrm{edu} - \pi\, \mathrm{kid}_t + \varepsilon^1_t,$$
$$u^0_t(S_t, \varepsilon^0_t) = \mu^0_t(\mathrm{husb}_t, \mathrm{kid}_t) + \varepsilon^0_t. \tag{1.11}$$
Besides the observable state variables about the woman, we also observe her husband's working experience $\mathrm{xp}^H_t$ and education level $\mathrm{edu}^H$. Given the husband's income $\mathrm{husb}_t$, these two state variables, $\mathrm{xp}^H_t$ and $\mathrm{edu}^H$, do not affect the period utility but affect the state transitions by affecting the husband's future income. These two state variables, which are excluded from the period utility function, will be useful for identification of the structural parameters. Let $S_t = (\mathrm{husb}_t, \mathrm{xp}_t, \mathrm{edu}, \mathrm{kid}_t, \mathrm{xp}^H_t, \mathrm{edu}^H)$ be the vector of observable state variables. The problem is dynamic because the woman's current working decision $D_t$ affects her working experience in the next period: $\mathrm{xp}_{t+1} = \mathrm{xp}_t + D_t$. As in the general model, the woman's choice $D_t$ maximizes the value function,
$$D_t = \arg\max_{d \in \{0,1\}} v^d_t(S_t) + \varepsilon^d_t,$$
where the ASVF $v^d_t(S_t)$ is defined by equation (1.6) with the period utility functions given by equation (1.11).

We are interested in predicting the labor supply effects of some counterfactual policy intervention, such as a child-care subsidy, a tax reduction or the introduction of contraceptive techniques to households. In terms of the CCP, this means we would like to know the new CCP after imposing these counterfactual policy interventions. To answer these questions, we first need to identify and estimate the structural parameters.

1.2.2 Data and structural parameters of the model

Assume that researchers only observe $T$ consecutive decision periods, rather than the whole decision process. Denote the $T$ sampling periods by $1, 2, \dots, T$.
It should be remarked that the first sampling period 1 does not need to correspond to the first decision period, nor does the last sampling period $T$ need to correspond to the terminal decision period $T_*$. Denote the data by $\mathbf{D}$:
$$\mathbf{D} = (D_1, S_1, D_2, S_2, \dots, D_T, S_T),$$
whose support is $\mathcal{D} = (\{0, 1\} \times \mathcal{S})^T$. Let $\theta$ denote the vector of structural parameters of this model, including per period utility functions ($\mu^0_t$, $\mu^1_t$), discount factors ($\beta_t$) and transition matrices ($F^0_t$, $F^1_t$) in each period $t$. It will be useful to reparameterize $(\mu^0_t, \mu^1_t)$ as $(\mu^0_t, \mu^{1/0}_t)$, where $\mu^{1/0}_t = (\mu^{1/0}_t(s_1), \dots, \mu^{1/0}_t(s_{d_s}))^\top$ with $\mu^{1/0}_t(S_t) = \mu^1_t(S_t) - \mu^0_t(S_t)$. Let $\theta_t = (\mu^0_t, \mu^{1/0}_t, \beta_t, F^0_t, F^1_t)$ for $t = 1, \dots, T - 1$. And let $\theta_T = (v^0_T, v^1_T, F^0_T, F^1_T)$ instead of $\theta_T = (\mu^0_T, \mu^{1/0}_T, \beta_T, F^0_T, F^1_T)$, because the CCP $p_T(S_T)$ cannot be determined by the per period utility functions $\mu^0_T$ and $\mu^{1/0}_T$ alone when $T < T_*$. Let $\theta = (\theta_1, \dots, \theta_T)$, and let $\Theta$ be the parameter space.

We consider identification for such data, which we call a short panel, not only because short panel data are common in empirical studies, but also because the number of time periods turns out to play an important role in the identification of DPDC models. As shown below, when the discount factors are known, one needs at least three consecutive periods to identify nonstationary DPDC models without terminal conditions. Examples of terminal conditions are $T = T_*$ (so researchers observe the decision in the terminal period) or $\mathrm{E}_{T+1}[v_{T+1}(S_{T+1}) \mid S_T, D_T = d] = 0$ for both $d = 0$ and $1$. In the presence of terminal conditions, we can identify the model with two consecutive periods of data when the discount factors are known. If the discount factors are unknown, we need one additional period of data to identify the discount factors. It is remarkable that such dependence of the identification of DPDC models on the number of periods has not been noticed in the current literature.
$^{5}$ In the literature on the identification of the CCP with unobserved discrete types, the identification also depends on the number of time periods in panel data. Interestingly, $T \ge 3$ is also required to identify type specific CCP (e.g. Kasahara and Shimotsu (2009); Hu and Shum (2012); Bonhomme et al. (2013, 2014)).

1.3 An example with four-period dynamic discrete choice

To develop some intuition for the general results that will be presented in section §1.4 and 2.4, we consider the identification and estimation of structural parameters in a four period dynamic discrete choice model. The goal is to show that with the Exclusion Restriction below, we can identify the per period utility functions without assuming that $\mu^0_t(S_t) = 0$ for all $S_t$, and that the per period utility functions can be estimated by a closed form linear estimator.

To keep the example concrete and simple, we maintain the following three assumptions in this section. First, assume that the unobserved state variables $\varepsilon^0_t$ and $\varepsilon^1_t$ are independent and follow the type-1 extreme value distribution. So the CDF $\tilde{G}(\cdot)$ of $\tilde{\varepsilon}_t = \varepsilon^0_t - \varepsilon^1_t$ in this section is the logistic distribution function. Second, the state transition matrices are time invariant. Let $F^0_t = F^0$ and $F^1_t = F^1$ for each $t$. We also omit the time subscript "$t$" in the conditional expectations $\mathrm{E}^0_t$, $\mathrm{E}^1_t$ and $\mathrm{E}^{1/0}_t$, and simply write $\mathrm{E}^0$, $\mathrm{E}^1$ and $\mathrm{E}^{1/0}$. Third, assume that the discount factor is constant over the decision periods and is denoted by $\beta$.

We will study three cases below. In the first case (section 1.3.1), assume that researchers observe the decisions in the last two decision periods, that is, the data there are $(D_3, S_3, D_4, S_4)$. In the second case (section 1.3.2), we have data of only the first two decision periods, $(D_1, S_1, D_2, S_2)$.
In the last case (section 1.3.3), researchers observe the decisions in the first three decision periods, $(D_1, S_1, D_2, S_2, D_3, S_3)$. Since period 4 is the terminal decision period, there is no continuation value for the choice in period 4. So we have a "terminal condition" in the first case, but not in the second and third cases. The comparison between cases 1 and 2 clarifies the role of "terminal conditions" in the identification of dynamic discrete choice models. We will assume that the discount factor is known in the first two cases. In the third case, in which we observe one more period than in case 2, we show how to identify the discount factor.

1.3.1 Identification and estimation with the data of the last two decision periods

Consider the dynamic discrete choices backward. In period 4 (the terminal period), there is no continuation value for the choice. Hence the decision rule in period 4 is
$$D_4 = \begin{cases} 1, & \text{if } \mu^1_4(S_4) + \varepsilon^1_4 > \mu^0_4(S_4) + \varepsilon^0_4, \\ 0, & \text{otherwise,} \end{cases}$$
which is a logit model. We then have
$$p_4(S_4) = \tilde{G}(\mu^{1/0}_4(S_4)). \tag{1.12}$$
Here $\tilde{G}(\cdot)$ is the logistic distribution function. Also, the ex ante value function $v_4(S_4)$ equals the following,
$$v_4(S_4) = \mathrm{E}_{\varepsilon_4}[V_4(S_4, \varepsilon_4) \mid S_4] = \mathrm{E}_{\varepsilon_4}\Big[\max_{d \in \{0,1\}} \mu^d_4(S_4) + \varepsilon^d_4 \,\Big|\, S_4\Big] = \mu^0_4(S_4) + [\gamma - \ln(1 - p_4(S_4))], \tag{1.13}$$
where $\gamma \approx 0.5772$ is Euler's constant. The last equality follows from the properties of the logit model. Because we will refer to the term $\gamma - \ln(1 - p_4(S_4))$ frequently, define
$$\psi(p_4(S_4)) \equiv \gamma - \ln(1 - p_4(S_4)).$$
It follows from the CCP formula (1.9) that the CCP in period 3 is
$$p_3(S_3) = \tilde{G}(\mu^{1/0}_3(S_3) + \beta \mathrm{E}^{1/0}[v_4(S_4) \mid S_3]) = \tilde{G}\big(\mu^{1/0}_3(S_3) + \beta \mathrm{E}^{1/0}[\mu^0_4(S_4) \mid S_3] + \beta \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid S_3]\big). \tag{1.14}$$
The second equality follows from replacing $v_4(S_4)$ with equation (1.13). Let $\phi(p) \equiv \ln p - \ln(1 - p)$ be the inverse of the logistic distribution function.
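The two logit facts used in this subsection can be checked numerically. The sketch below is an illustration only: the values of the two alternative-specific values are arbitrary assumptions, and the functions `phi` and `psi` encode the inverse logistic CDF and the term $\gamma - \ln(1 - p)$ as read from equation (1.13). It relies on the standard closed form of the expected maximum under independent type-1 extreme value shocks, namely Euler's constant plus the log-sum-exp of the alternative-specific values.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def logistic(x):
    """Logistic CDF, the distribution of eps^0 - eps^1 under type-1 extreme value shocks."""
    return 1.0 / (1.0 + math.exp(-x))


def phi(p):
    """Inverse of the logistic CDF: ln p - ln(1 - p)."""
    return math.log(p) - math.log(1.0 - p)


def psi(p):
    """The term gamma - ln(1 - p) that links the ex ante value to the CCP."""
    return EULER_GAMMA - math.log(1.0 - p)


def emax_logit(v0, v1):
    """E[max_d (v_d + eps_d)] for i.i.d. type-1 extreme value shocks: gamma + log-sum-exp."""
    m = max(v0, v1)
    return EULER_GAMMA + m + math.log(math.exp(v0 - m) + math.exp(v1 - m))


# Arbitrary (hypothetical) alternative-specific values for one state.
v0, v1 = 0.3, 1.1
p = logistic(v1 - v0)                 # CCP of choosing alternative 1

inverted_gap = phi(p)                 # should recover v1 - v0
ex_ante_via_ccp = v0 + psi(p)         # value of alternative 0 plus psi(p)
ex_ante_direct = emax_logit(v0, v1)   # closed-form expected maximum
```

The two ways of computing the ex ante value agree, which is exactly what allows the unknown value function to be replaced by observable CCPs.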
It follows from equations (1.12) and (1.14) that
$$\phi(p_4(S_4)) = \mu^{1/0}_4(S_4), \tag{1.15a}$$
$$\phi(p_3(S_3)) = \mu^{1/0}_3(S_3) + \beta \mathrm{E}^{1/0}[\mu^0_4(S_4) \mid S_3] + \beta \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid S_3]. \tag{1.15b}$$
From data $(D_3, S_3, D_4, S_4)$, we can identify and estimate the state transition matrices $F^0$ and $F^1$, and the CCP $p_3(S_3)$ and $p_4(S_4)$. The per period utility function difference in the terminal period, $\mu^{1/0}_4(S_4)$, is then identified from equation (1.15a) without further restriction. However, $\mu^{1/0}_3(S_3)$ and $\mu^0_4(S_4)$ cannot be identified from equation (1.15b) without further restriction, even when the discount factor $\beta$ is known. To see this, note that we can identify only
$$\mu^{1/0}_3(S_3) + \beta \mathrm{E}^{1/0}[\mu^0_4(S_4) \mid S_3] = \phi(p_3(S_3)) - \beta \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid S_3]$$
when the discount factor $\beta$ is known. Because both $\mu^{1/0}_3(S_3)$ and $\mathrm{E}^{1/0}[\mu^0_4(S_4) \mid S_3]$ are unknown functions of $S_3$, we cannot identify $\mu^{1/0}_3(S_3)$ and $\mu^0_4(S_4)$ separately. Moreover, we cannot identify the discount factor $\beta$, because
$$\phi(p_3(S_3)) = \mu^{1/0}_3(S_3) + (\beta + c)\, \mathrm{E}^{1/0}\Big[\tfrac{\beta}{\beta + c}\, \mu^0_4(S_4) - \tfrac{c}{\beta + c}\, \psi(p_4(S_4)) \,\Big|\, S_3\Big] + (\beta + c)\, \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid S_3].$$
So the new discount factor $\tilde{\beta} = \beta + c$ and the new per period utility $\tilde{\mu}^0_4(S_4) = \tfrac{\beta}{\beta + c}\, \mu^0_4(S_4) - \tfrac{c}{\beta + c}\, \psi(p_4(S_4))$ will also satisfy equation (1.15b).

We will show how to identify and estimate $\mu^{1/0}_3(S_3)$ and $\mu^0_4(S_4)$ using equation (1.15b) and the following Exclusion Restriction, when the discount factor $\beta$ is known. Note that $\mu^1_4(S_4) = \mu^0_4(S_4) + \mu^{1/0}_4(S_4)$. Given that $\mu^{1/0}_4(S_4)$ is identified, the per period utility function $\mu^1_4(S_4)$ is identified as long as $\mu^0_4(S_4)$ is identified.

Exclusion Restriction. The vector of observable state variables $S_t$ has two parts $X_t$ and $Z_t$. Let $S_t = (X_t, Z_t)$, where $X_t \in \mathcal{X} = \{x_1, \dots, x_{d_x}\}$ and $Z_t \in \mathcal{Z} = \{z_1, \dots, z_{d_z}\}$. Assume that $\mu^1_t(X_t, Z_t) = \mu^1_t(X_t)$ and $\mu^0_t(X_t, Z_t) = \mu^0_t(X_t)$ for any $(X_t, Z_t)$. For expositional simplicity, assume that $\mathcal{S} = \mathcal{X} \times \mathcal{Z}$, so that $d_s = d_x d_z$.
In particular, let
$$\mathcal{S} = \operatorname{vec} \begin{bmatrix} (x_1, z_1) & (x_2, z_1) & \cdots & (x_{d_x}, z_1) \\ \vdots & \vdots & \cdots & \vdots \\ (x_1, z_{d_z}) & (x_2, z_{d_z}) & \cdots & (x_{d_x}, z_{d_z}) \end{bmatrix}.$$
For $d_x = d_z = 2$, this means $\mathcal{S} = \{(x_1, z_1), (x_1, z_2), (x_2, z_1), (x_2, z_2)\}$.

For simplicity, suppose that $X_t$ and $Z_t$ in the Exclusion Restriction can each take two values in this section, that is, the supports are $\mathcal{X} = \{x_1, x_2\}$ and $\mathcal{Z} = \{z_1, z_2\}$. Applying the Exclusion Restriction and evaluating equation (1.15b) at each $(x_i, z_j) \in \mathcal{X} \times \mathcal{Z}$, we have
$$\phi(p_3(x_i, z_j)) = \mu^{1/0}_3(x_i) + \beta \mathrm{E}^{1/0}[\mu^0_4(X_4) \mid x_i, z_j] + \beta \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid x_i, z_j], \tag{1.16}$$
for $i, j = 1, 2$. For each $x_i \in \mathcal{X}$, the difference $\phi(p_3(x_i, z_1)) - \phi(p_3(x_i, z_2))$ depends on $\mu^0_4(X_4)$, but not on $\mu^{1/0}_3(X_3)$. We are going to identify $\mu^0_4(X_4)$ using the differences $\{\phi(p_3(x_i, z_1)) - \phi(p_3(x_i, z_2)) : i = 1, 2\}$ first. Then $\{\mu^{1/0}_3(x_i) : i = 1, 2\}$ is identified by equation (1.16).

For $i = 1, 2$, considering the difference $\phi(p_3(x_i, z_1)) - \phi(p_3(x_i, z_2))$, we have
$$b_i = \beta \mathrm{E}^{1/0}[\mu^0_4(X_4) \mid x_i, z_1] - \beta \mathrm{E}^{1/0}[\mu^0_4(X_4) \mid x_i, z_2] = \beta \sum_{j=1,2} f^{1/0}(x_j \mid x_i, z_1)\, \mu^0_4(x_j) - \beta \sum_{j=1,2} f^{1/0}(x_j \mid x_i, z_2)\, \mu^0_4(x_j), \quad i = 1, 2, \tag{1.17}$$
with
$$b_i = \big\{\phi(p_3(x_i, z_1)) - \beta \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid x_i, z_1]\big\} - \big\{\phi(p_3(x_i, z_2)) - \beta \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid x_i, z_2]\big\}.$$
Equation (1.17) can be organized as the following system of equations that is linear in $\mu^0_4 = (\mu^0_4(x_1), \mu^0_4(x_2))^\top$,
$$A \mu^0_4 = b, \tag{1.18}$$
where $b = (b_1, b_2)^\top$ and
$$A = \beta \begin{bmatrix} f^{1/0}(x_1 \mid x_1, z_1) - f^{1/0}(x_1 \mid x_1, z_2) & f^{1/0}(x_2 \mid x_1, z_1) - f^{1/0}(x_2 \mid x_1, z_2) \\ f^{1/0}(x_1 \mid x_2, z_1) - f^{1/0}(x_1 \mid x_2, z_2) & f^{1/0}(x_2 \mid x_2, z_1) - f^{1/0}(x_2 \mid x_2, z_2) \end{bmatrix}.$$
Using the notation $F^{1/0} = F^1 - F^0$, the matrix $A$ can be written alternatively as follows,
$$A = \beta M F^{1/0} (I_2 \otimes 1_2),$$
where $I_2$ is the $2 \times 2$ identity matrix, "$\otimes$" is the Kronecker product, $1_2 = (1, 1)^\top$, and
$$M \equiv \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}. \tag{1.19}$$
Linear systems of equations like equation (1.18) will be frequently encountered in the sequel. The matrix $A$ will always depend only on the state transition matrices and the discount factors; the vector $b$ will always depend only on the CCP and the discount factors. However, their explicit definitions will change with the model specification.

Note that $A 1_2 = 0_2$. Hence $1_2$ is an eigenvector of the matrix $A$ associated with the eigenvalue 0, and the matrix $A$ cannot be of full rank. If $\operatorname{rank} A = 1$, the solution set of equation (1.18) is
$$\{A^+ b + c \cdot 1_2 : c \in \mathbb{R}\},$$
where $A^+$ is the Moore-Penrose pseudoinverse of $A$ (see Lemma A.1 for the proof). So the solution for $\mu^0_4$ is unique up to a constant that does not change with the states. Note that if $X_4 \perp\!\!\!\perp Z_3 \mid X_3, D_3$, both columns of $A$ are zero, hence $\operatorname{rank} A = 0$.

Though the solution for $\mu^0_4$ is not unique, we have a unique solution for the per period utility function difference $\mu^{1/0}_3 = (\mu^{1/0}_3(x_1), \mu^{1/0}_3(x_2))^\top$. Let $\mu^0_4 = A^+ b + c \cdot 1_2$ be an arbitrary solution of equation (1.18). It follows from equation (1.16) that
$$\mu^{1/0}_3(x_i) = \phi(p_3(x_i, z_j)) - \beta \begin{bmatrix} f^{1/0}(x_1 \mid x_i, z_j) \\ f^{1/0}(x_2 \mid x_i, z_j) \end{bmatrix}^\top (A^+ b + c \cdot 1_2) - \beta \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid x_i, z_j]$$
$$= \phi(p_3(x_i, z_j)) - \beta \begin{bmatrix} f^{1/0}(x_1 \mid x_i, z_j) \\ f^{1/0}(x_2 \mid x_i, z_j) \end{bmatrix}^\top A^+ b - \beta \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid x_i, z_j],$$
for both $j = 1$ and $2$. Note that the above display does not depend on the unknown constant $c$, because $f^{1/0}(x_1 \mid x_i, z_j) + f^{1/0}(x_2 \mid x_i, z_j) = 0$, so $\mu^{1/0}_3(x_i)$ is identified for $i = 1, 2$.
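The identification argument for case 1 can be traced end to end in a small numerical sketch. Everything numeric below is a hypothetical assumption made for illustration: the two transition matrices, the discount factor 0.9, and the "true" utilities used to generate the CCPs. The code first computes the CCPs implied by the logit model, then uses only the CCPs, the transition matrices and the discount factor to recover the utilities through the linear system $A \mu^0_4 = b$. Because $A$ has rank one in this example, its Moore-Penrose pseudoinverse is computed with the rank-one shortcut $A^+ = A^\top / \lVert A \rVert_F^2$ rather than with a linear algebra library.

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def phi(p):  # inverse logistic CDF
    return math.log(p) - math.log(1.0 - p)


def psi(p):  # gamma - ln(1 - p)
    return GAMMA - math.log(1.0 - p)


def matvec(F, f):
    return [sum(w * v for w, v in zip(row, f)) for row in F]


# States are ordered (x1,z1), (x1,z2), (x2,z1), (x2,z2); X_OF_STATE maps a state to its x index.
X_OF_STATE = [0, 0, 1, 1]
# Hypothetical transition matrices (rows sum to one), chosen so that Z shifts the law of X'.
F0 = [[0.4, 0.3, 0.2, 0.1], [0.3, 0.3, 0.2, 0.2], [0.2, 0.2, 0.3, 0.3], [0.1, 0.2, 0.3, 0.4]]
F1 = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.3, 0.2], [0.1, 0.1, 0.4, 0.4], [0.3, 0.3, 0.2, 0.2]]
Fd = [[a - b for a, b in zip(r1, r0)] for r1, r0 in zip(F1, F0)]  # F^{1/0}

beta = 0.9              # assumed known discount factor
mu4_0 = [0.0, 0.8]      # hypothetical true mu^0_4(x), normalized so mu^0_4(x1) = 0
mu4_d = [0.5, -0.3]     # hypothetical true mu^{1/0}_4(x)
mu3_d = [0.2, 1.0]      # hypothetical true mu^{1/0}_3(x)

# Model-implied CCPs, following (1.12)-(1.14).
p4 = [logistic(mu4_d[x]) for x in X_OF_STATE]
v4 = [mu4_0[x] + psi(p) for x, p in zip(X_OF_STATE, p4)]
p3 = [logistic(mu3_d[x] + beta * e) for x, e in zip(X_OF_STATE, matvec(Fd, v4))]

# Recovery step: uses only p3, p4, F0, F1 and beta.
e_psi = matvec(Fd, [psi(p) for p in p4])                   # E^{1/0}[psi(p4) | s]
rhs = [phi(p) - beta * e for p, e in zip(p3, e_psi)]       # phi(p3) - beta * E^{1/0}[psi(p4) | s]
b = [rhs[0] - rhs[1], rhs[2] - rhs[3]]                     # differences across z
fx = [[row[0] + row[1], row[2] + row[3]] for row in Fd]    # f^{1/0}(x' | x, z)
A = [[beta * (fx[0][k] - fx[1][k]) for k in range(2)],
     [beta * (fx[2][k] - fx[3][k]) for k in range(2)]]
fro2 = sum(a * a for row in A for a in row)                # squared Frobenius norm
sol = [sum(A[i][k] * b[i] for i in range(2)) / fro2 for k in range(2)]  # A^+ b (rank-one case)

mu4_0_hat = [s - sol[0] for s in sol]                      # impose mu^0_4(x1) = 0
mu3_d_hat = [rhs[s] - beta * sum(fx[s][k] * sol[k] for k in range(2)) for s in (0, 2)]
mu4_d_hat = [phi(p4[0]), phi(p4[2])]                       # equation (1.15a)
```

In an application the CCPs and transition matrices would be replaced by their sample estimates; the recovery step itself is unchanged, which is what makes the estimator linear and closed form.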
It should be remarked that $\mu^{1/0}_3(x_i)$ is linear in the discount factor, and such linearity will be used to identify the discount factor in section 1.3.3.

The per period utility function $\mu^0_4 = (\mu^0_4(x_1), \mu^0_4(x_2))^\top$ is identified with the normalization $\mu^0_4(x_1) = 0$ (Assumption 4.(iii)). With such normalization, we can identify $\mu^0_4$ as
$$\mu^0_4 = \begin{bmatrix} 0 & 0 \\ -1 & 1 \end{bmatrix} A^+ b.$$
To estimate $\mu^{1/0}_3$ and $\mu^0_4$, we only need to estimate the difference between the state transition matrices $F^0$ and $F^1$, and the CCP $p_3(S_3)$ and $p_4(S_4)$, with which $A$ and $b$ are then estimated. The per period utility functions $\mu^{1/0}_3$ and $\mu^0_4$ can be estimated by the above displays after substituting the unknowns with their estimates.

Remark 1. [Identification of discount factor using parametric specification] This remark shows that a parametric specification of the per period utility functions helps identify the discount factor with the terminal conditions. Suppose
$$\mu^{1/0}_t(X_t) = \lambda_{t,0} + X_t^\top \lambda_{t,1} \quad \text{and} \quad \mu^0_t(X_t) = \delta_{t,0} + X_t^\top \delta_{t,1}. \tag{1.20}$$
Under the above linear specification, equation (1.16) becomes
$$\phi(p_3(S_3)) = \lambda_{3,0} + X_3^\top \lambda_{3,1} + \mathrm{E}^{1/0}(X_4^\top \mid X_3, Z_3)\,(\beta \delta_{4,1}) + \beta \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid X_3, Z_3]. \tag{1.21}$$
Note that the intercept term $\delta_{4,0}$ disappears because $\mathrm{E}^{1/0}(\delta_{4,0} \mid X_3, Z_3) = 0$, and this corresponds to our earlier conclusion that the per period utility function $\mu^0_4(X_4)$ is identified up to a constant. It follows from equation (1.21) that $(\lambda_{3,0}, \lambda_{3,1}, \delta_{4,1}, \beta)$ can be identified if the three terms $X_3$, $\mathrm{E}^{1/0}(X_4 \mid X_3, Z_3)$ and $\mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid X_3, Z_3]$ are not linearly dependent.

Remark 2. In general the discount factor is not identifiable with two periods of data, even with the Exclusion Restriction.
Without a parametric specification of the per period utility functions, we have
$$\phi(p_3(X_3, Z_3)) = \mu^{1/0}_3(X_3) + \beta \mathrm{E}^{1/0}[\mu^0_4(X_4) \mid X_3, Z_3] + \beta \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid X_3, Z_3].$$
Let $\mathcal{U}$ be the space of the per period utility function $\mu^0_4(X_4)$. The linear specification $\mu^0_t = \delta_{t,0} + X_t^\top \delta_{t,1}$ in Remark 1 assumes that $\mathcal{U}$ is the set of all linear functions of $X_4$. The discount factor may not be identified, because $\mathrm{E}^{1/0}[\mu^0_4(X_4) \mid X_3, Z_3]$ could be any function of $(X_3, Z_3)$. If the equation in the unknown function $g(X_4)$,
$$\mathrm{E}^{1/0}[g(X_4) \mid X_3, Z_3] = \mathrm{E}^{1/0}[\psi(p_4(X_4, Z_4)) \mid X_3, Z_3],$$
has a solution in $\mathcal{U}$, the discount factor cannot be identified.$^{6}$ (In this particular case, there is always a solution because the CCP $p_4(S_4)$ in the terminal period depends only on $X_4$ by the Exclusion Restriction.) Suppose $g(X_4)$ is one solution, then let
$$\tilde{\mu}^0_4(X_4) = \frac{\beta}{\beta + c}\, \mu^0_4(X_4) - \frac{c}{\beta + c}\, g(X_4).$$
We have
$$\phi(p_3(X_3, Z_3)) = \mu^{1/0}_3(X_3) + (\beta + c)\, \mathrm{E}^{1/0}[\tilde{\mu}^0_4(X_4) \mid X_3, Z_3] + (\beta + c)\, \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid X_3, Z_3],$$
for any $c$ such that $0 < \beta + c < 1$, and the discount factor is not identified.

Remark 3. Without the Exclusion Restriction, or if the excluded variable $Z_t$ does not affect the transition of the state variables affecting per period utilities ($X_4 \perp\!\!\!\perp Z_3 \mid (X_3, D_3)$), the per period utility functions are not identifiable in general, even with the linear specification (1.20) and the terminal conditions (the continuation value associated with each alternative is zero in period 4). Suppose there is no excluded variable $Z_t$, and the state variable $S_t = X_t$ is a scalar. Assume that $X_t$ follows an autoregressive process,
$$X_t = \rho_0 + \rho^d_1 X_{t-1} + \omega_t, \quad d \in \{0, 1\},$$
with $\mathrm{E}(\omega_t \mid X_{t-1}) = 0$. Let $\rho^{1/0}_1 = \rho^1_1 - \rho^0_1$.
Equation (1.21) becomes
$$\phi(p_3(X_3)) = \lambda_{3,0} + X_3 \lambda_{3,1} + \mathrm{E}^{1/0}(X_4 \mid X_3)(\beta \delta_{4,1}) - \beta \mathrm{E}^{1/0}[\ln(1 - p_4(X_4)) \mid X_3]$$
$$= \lambda_{3,0} + X_3 \lambda_{3,1} + (X_3 \rho^{1/0}_1)(\beta \delta_{4,1}) - \beta \mathrm{E}^{1/0}[\ln(1 - p_4(X_4)) \mid X_3]$$
$$= \lambda_{3,0} + X_3 (\lambda_{3,1} + \beta \rho^{1/0}_1 \delta_{4,1}) - \beta \mathrm{E}^{1/0}[\ln(1 - p_4(X_4)) \mid X_3].$$
We can only identify $(\lambda_{3,1} + \beta \rho^{1/0}_1 \delta_{4,1})$ as a whole. However, if one is willing to assume that the per period utility functions are time invariant, so that $\lambda_{3,1} = \lambda_{4,1}$, we can then identify $\delta_{4,1}$ separately from the sum $(\lambda_{3,1} + \beta \rho^{1/0}_1 \delta_{4,1})$, because $\lambda_{3,1} = \lambda_{4,1}$, $\rho^{1/0}_1$ and $\beta$ are identified ($\lambda_{4,1}$ is identified because $\mu^{1/0}_4(S_4)$ is identified from equation (1.15a)). This observation will be generalized to Proposition 5 of section §1.4.

$^{6}$ In Remark 1, in order to identify the discount factor $\beta$, we require that $\mathrm{E}^{1/0}(X_4 \mid X_3, Z_3)$ and $\mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid X_3, Z_3]$ are not linearly dependent. Note that this is equivalent to the condition here that the equation $\mathrm{E}^{1/0}[g(X_4) \mid X_3, Z_3] = \mathrm{E}^{1/0}[\psi(p_4(S_4)) \mid X_3, Z_3]$ has no solution in $\mathcal{U}$, which is the set of all linear functions of $X_4$ given the linear specification in Remark 1.

1.3.2 Identification and estimation with data of the first two decision periods

Suppose now researchers observe the decisions in the first two periods only, hence there is no terminal condition in this case. The decision rule in period 2 is
$$D_2 = \begin{cases} 1, & \text{if } \mu^1_2(S_2) + \beta \mathrm{E}^1[v_3(S_3) \mid S_2] + \varepsilon^1_2 > \mu^0_2(S_2) + \beta \mathrm{E}^0[v_3(S_3) \mid S_2] + \varepsilon^0_2, \\ 0, & \text{otherwise.} \end{cases}$$
Without the terminal condition, we now have the unknown ex ante value function $v_3(S_3)$. The CCP in period 2 (the last sampling period) is
$$p_2(S_2) = \tilde{G}(\mu^{1/0}_2(S_2) + \beta \mathrm{E}^{1/0}[v_3(S_3) \mid S_2]). \tag{1.22}$$
The ex ante value function $v_2(S_2)$ equals the following,
$$v_2(S_2) = \mathrm{E}_{\varepsilon_2}[V_2(S_2, \varepsilon_2) \mid S_2] = \mathrm{E}_{\varepsilon_2}\Big[\max_{d \in \{0,1\}} \mu^d_2(S_2) + \beta \mathrm{E}^d[v_3(S_3) \mid S_2] + \varepsilon^d_2 \,\Big|\, S_2\Big] = (\mu^0_2(S_2) + \beta \mathrm{E}^0[v_3(S_3) \mid S_2]) + \psi(p_2(S_2)). \tag{1.23}$$
Here $\psi(p_2(S_2)) = \gamma - \ln(1 - p_2(S_2))$ is as defined before.
The CCP in period 1 is
$$p_1(S_1) = \tilde{G}(\mu^{1/0}_1(S_1) + \beta \mathrm{E}^{1/0}[v_2(S_2) \mid S_1]), \tag{1.24}$$
with $v_2(S_2)$ satisfying equation (1.23). Similar to equation (1.15), we have the following equations from equations (1.22), (1.23) and (1.24),
$$\phi(p_1(S_1)) = \mu^{1/0}_1(S_1) + \beta \mathrm{E}^{1/0}[v_2(S_2) \mid S_1], \tag{1.25a}$$
$$\phi(p_2(S_2)) = \mu^{1/0}_2(S_2) + \beta \mathrm{E}^{1/0}[v_3(S_3) \mid S_2], \tag{1.25b}$$
$$v_2(S_2) = \mu^0_2(S_2) + \beta \mathrm{E}^0[v_3(S_3) \mid S_2] + \psi(p_2(S_2)). \tag{1.25c}$$
Without a terminal condition, the per period utility function difference $\mu^{1/0}_2$ cannot be identified from equation (1.25b). As in case 1 of section 1.3.1, without restriction, the per period utility functions $\mu^{1/0}_1$ and $\mu^0_2$, and the discount factor $\beta$, are not identified. So we conclude that without further restriction, the terminal condition only helps identify the difference between the per period utility functions in the last sampling period.

Applying the Exclusion Restriction and evaluating equation (1.25) at each $(x_i, z_j) \in \mathcal{X} \times \mathcal{Z} \equiv \{x_1, x_2\} \times \{z_1, z_2\}$, we have
$$\phi(p_1(x_i, z_j)) = \mu^{1/0}_1(x_i) + \beta \mathrm{E}^{1/0}[v_2(S_2) \mid x_i, z_j], \quad i, j = 1, 2;$$
$$\phi(p_2(x_i, z_j)) = \mu^{1/0}_2(x_i) + \beta \mathrm{E}^{1/0}[v_3(S_3) \mid x_i, z_j], \quad i, j = 1, 2;$$
$$v_2(x_i, z_j) = \mu^0_2(x_i) + \beta \mathrm{E}^0[v_3(S_3) \mid x_i, z_j] + \psi(p_2(x_i, z_j)), \quad i, j = 1, 2. \tag{1.26}$$
We want to identify $\mu^{1/0}_1$, $\mu^{1/0}_2$ and $\mu^0_2$ by solving for the unknowns $\mu^{1/0}_1, \mu^{1/0}_2, \mu^0_2, v_2, v_3$ explicitly from equation (1.26). Note that the per period utility function $\mu^0_1$ does not appear in the above equations, hence cannot be identified. Also note that the terminal payoff $v_3$ is identified (up to a constant). We solve equation (1.26) by following the steps below.

Step 1: Eliminate the per period utility functions $\mu^{1/0}_1$, $\mu^{1/0}_2$ and $\mu^0_2$ from equation (1.26).
Let
$$\phi_t(i, j) \equiv \phi(p_t(x_i, z_j)), \quad \psi_t(i, j) \equiv \psi(p_t(x_i, z_j)), \quad \text{and} \quad \bar{v}_3(S_3) \equiv \beta v_3(S_3).$$
We have the following equations,
$$\phi_1(1,1) - \phi_1(1,2) = \beta \mathrm{E}^{1/0}[v_2(S_2) \mid x_1, z_1] - \beta \mathrm{E}^{1/0}[v_2(S_2) \mid x_1, z_2], \tag{1.27a}$$
$$\phi_1(2,1) - \phi_1(2,2) = \beta \mathrm{E}^{1/0}[v_2(S_2) \mid x_2, z_1] - \beta \mathrm{E}^{1/0}[v_2(S_2) \mid x_2, z_2], \tag{1.27b}$$
$$\phi_2(1,1) - \phi_2(1,2) = \mathrm{E}^{1/0}[\bar{v}_3(S_3) \mid x_1, z_1] - \mathrm{E}^{1/0}[\bar{v}_3(S_3) \mid x_1, z_2], \tag{1.27c}$$
$$\phi_2(2,1) - \phi_2(2,2) = \mathrm{E}^{1/0}[\bar{v}_3(S_3) \mid x_2, z_1] - \mathrm{E}^{1/0}[\bar{v}_3(S_3) \mid x_2, z_2], \tag{1.27d}$$
$$\psi_2(1,1) - \psi_2(1,2) = v_2(x_1, z_1) - v_2(x_1, z_2) - \mathrm{E}^0[\bar{v}_3(S_3) \mid x_1, z_1] + \mathrm{E}^0[\bar{v}_3(S_3) \mid x_1, z_2], \tag{1.27e}$$
$$\psi_2(2,1) - \psi_2(2,2) = v_2(x_2, z_1) - v_2(x_2, z_2) - \mathrm{E}^0[\bar{v}_3(S_3) \mid x_2, z_1] + \mathrm{E}^0[\bar{v}_3(S_3) \mid x_2, z_2]. \tag{1.27f}$$
When the discount factor is known, the above system is equivalent to
$$A \begin{bmatrix} v_2 \\ \bar{v}_3 \end{bmatrix} = b_2, \tag{1.28}$$
where the unknown is
$$\begin{bmatrix} v_2 \\ \bar{v}_3 \end{bmatrix} = \operatorname{vec} \begin{bmatrix} v_2(x_1, z_1) & \bar{v}_3(x_1, z_1) \\ v_2(x_1, z_2) & \bar{v}_3(x_1, z_2) \\ v_2(x_2, z_1) & \bar{v}_3(x_2, z_1) \\ v_2(x_2, z_2) & \bar{v}_3(x_2, z_2) \end{bmatrix},$$
the coefficient matrix $A$ is a $6 \times 8$ matrix,
$$A \equiv \begin{bmatrix} M F^{1/0} & 0 \\ 0 & M F^{1/0} \\ M & -M F^0 \end{bmatrix},$$
with the matrix $M$ as defined by equation (1.19), and $b_2$ is a 6-dimensional vector consisting of the terms on the left-hand side of equation (1.27),
$$b_2 \equiv \operatorname{vec} \begin{bmatrix} (\phi_1(1,1) - \phi_1(1,2))/\beta & \phi_2(1,1) - \phi_2(1,2) & \psi_2(1,1) - \psi_2(1,2) \\ (\phi_1(2,1) - \phi_1(2,2))/\beta & \phi_2(2,1) - \phi_2(2,2) & \psi_2(2,1) - \psi_2(2,2) \end{bmatrix}. \tag{1.29}$$

Step 2: Solve for $v_2$ and $\bar{v}_3$ from equation (1.28). Let $A^+$ be the Moore-Penrose pseudoinverse of the matrix $A$, then
$$\begin{bmatrix} v^+_2 \\ \bar{v}^+_3 \end{bmatrix} = A^+ b_2$$
solves equation (1.28). Because we need to use $v^+_t$ and $\bar{v}^+_{t+1}$ separately, it is useful to split the matrix $A^+$ into two parts:
$$A^+ = \begin{bmatrix} A^+_u \\ A^+_l \end{bmatrix},$$
where $A^+_u$ and $A^+_l$ are the $4 \times 6$ matrices formed by the first and last 4 rows of the matrix $A^+$, respectively. Then
$$\begin{bmatrix} v^+_2 \\ \bar{v}^+_3 \end{bmatrix} = \begin{bmatrix} A^+_u b_2 \\ A^+_l b_2 \end{bmatrix}.$$
If $\operatorname{rank} A = 6$, we know from Lemma A.2 that the solution set of equation (1.28) is
$$\left\{ \begin{bmatrix} v^+_2 + c_2 \cdot 1_4 \\ \bar{v}^+_3 + c_3 \cdot 1_4 \end{bmatrix} : c_2, c_3 \in \mathbb{R} \right\}.$$

Step 3: Identify the per period utility functions $\mu^{1/0}_1$, $\mu^{1/0}_2$ and $\mu^0_2$. Suppose $\operatorname{rank} A = 6$, let $v_2 = v^+_2 + c_2 \cdot 1_4$ and $\bar{v}_3 = \bar{v}^+_3 + c_3 \cdot 1_4$ be arbitrary solutions of equation (1.28), and let
$$f^{1/0}(i, j) = \begin{bmatrix} f^{1/0}(x_1, z_1 \mid x_i, z_j) \\ f^{1/0}(x_1, z_2 \mid x_i, z_j) \\ f^{1/0}(x_2, z_1 \mid x_i, z_j) \\ f^{1/0}(x_2, z_2 \mid x_i, z_j) \end{bmatrix},$$
with $f^0(i, j)$ defined in the same way from the transition probabilities under $D_t = 0$. Then, associated with $v_2 = v^+_2 + c_2 \cdot 1_4$ and $\bar{v}_3 = \bar{v}^+_3 + c_3 \cdot 1_4$, we have the following from equation (1.26): for $j = 1, 2$,
$$\mu^{1/0}_1(x_i) = \phi(p_1(x_i, z_j)) - \beta f^{1/0}(i, j)^\top A^+_u b_2, \tag{1.30}$$
$$\mu^{1/0}_2(x_i) = \phi(p_2(x_i, z_j)) - f^{1/0}(i, j)^\top A^+_l b_2, \tag{1.31}$$
$$\mu^0_2(x_i) = v^+_2(x_i, z_j) - f^0(i, j)^\top A^+_l b_2 - \psi(p_2(x_i, z_j)) + (c_2 - c_3).$$
The constants $c_2$ and $c_3$ drop out of (1.30) and (1.31) because the entries of $f^{1/0}(i, j)$ sum to zero, while the entries of $f^0(i, j)$ sum to one, which leaves the term $c_2 - c_3$ in $\mu^0_2(x_i)$. The constant $c_2 - c_3$ in $\mu^0_2(x_i)$ can be determined by the normalization condition that $\mu^0_2(x_1) = 0$. So we conclude that the per period utility functions $\mu^{1/0}_1$, $\mu^{1/0}_2$ and $\mu^0_2$ are identified.

Given the explicit formulas for the per period utility functions, their estimation is easy. We again estimate the CCP and the state transition matrices, then plug their estimates into the above formulas to estimate the per period utility functions.

1.3.3 Identification of the discount factor with three-period data

Suppose we have data $(D_1, S_1, D_2, S_2, D_3, S_3)$. Applying the identification arguments of case 2 (section 1.3.2) with data $(D_1, S_1, D_2, S_2)$ and data $(D_2, S_2, D_3, S_3)$, respectively, we will have two formulas for $\mu^{1/0}_2(x_i)$:
$$\mu^{1/0}_2(x_i) = \phi(p_2(x_i, z_j)) - \beta f^{1/0}(i, j)^\top A^+_u b_3, \tag{1.32}$$
$$\mu^{1/0}_2(x_i) = \phi(p_2(x_i, z_j)) - f^{1/0}(i, j)^\top A^+_l b_2, \tag{1.33}$$
where equation (1.32) follows from equation (1.30) with data $(D_2, S_2, D_3, S_3)$, and equation (1.33) follows from equation (1.31) with data $(D_1, S_1, D_2, S_2)$. So we have the following equation,
$$\beta f^{1/0}(i, j)^\top A^+_u b_3 = f^{1/0}(i, j)^\top A^+_l b_2, \tag{1.34}$$
about the discount factor $\beta$.
In the remainder of this section, we derive the solution for the discount factor $\beta$. For $t = 2, 3$, define two vectors $b_{t,u}$ and $b_{t,l}$,
$$b_{t,u} \equiv \operatorname{vec} \begin{bmatrix} \phi_{t-1}(1,1) - \phi_{t-1}(1,2) & 0 & 0 \\ \phi_{t-1}(2,1) - \phi_{t-1}(2,2) & 0 & 0 \end{bmatrix}, \qquad b_{t,l} \equiv \operatorname{vec} \begin{bmatrix} 0 & \phi_t(1,1) - \phi_t(1,2) & \psi_t(1,1) - \psi_t(1,2) \\ 0 & \phi_t(2,1) - \phi_t(2,2) & \psi_t(2,1) - \psi_t(2,2) \end{bmatrix},$$
where $\phi_t(i, j) \equiv \phi(p_t(x_i, z_j))$ and $\psi_t(i, j) \equiv \psi(p_t(x_i, z_j))$, so that
$$b_t = \beta^{-1} b_{t,u} + b_{t,l},$$
according to the definition of $b_t$ in equation (1.29). Denote
$$h_u(i, j) \equiv f^{1/0}(i, j)^\top A^+_u \quad \text{and} \quad h_l(i, j) \equiv f^{1/0}(i, j)^\top A^+_l.$$
Then equation (1.34) becomes
$$\beta\, h_u(i, j) b_{3,l} - \beta^{-1} h_l(i, j) b_{2,u} + (h_u(i, j) b_{3,u} - h_l(i, j) b_{2,l}) = 0. \tag{1.35}$$
Let
$$r_{3,l}(i, j) \equiv h_u(i, j) b_{3,l}, \quad r_{3,u}(i, j) \equiv h_u(i, j) b_{3,u}, \quad r_{2,u}(i, j) \equiv h_l(i, j) b_{2,u}, \quad r_{2,l}(i, j) \equiv h_l(i, j) b_{2,l}.$$
Letting $i = 1, 2$, we have two equations about $\beta$:
$$\beta\, r_{3,l}(1, j) - \beta^{-1} r_{2,u}(1, j) + (r_{3,u}(1, j) - r_{2,l}(1, j)) = 0,$$
$$\beta\, r_{3,l}(2, j) - \beta^{-1} r_{2,u}(2, j) + (r_{3,u}(2, j) - r_{2,l}(2, j)) = 0.$$
Eliminating $\beta^{-1}$ from these two equations, we have
$$\beta = \frac{r_{2,u}(2, j)\,(r_{2,l}(1, j) - r_{3,u}(1, j)) - r_{2,u}(1, j)\,(r_{2,l}(2, j) - r_{3,u}(2, j))}{r_{3,l}(1, j)\, r_{2,u}(2, j) - r_{3,l}(2, j)\, r_{2,u}(1, j)},$$
provided that
$$\tilde{r} \equiv r_{3,l}(1, j)\, r_{2,u}(2, j) - r_{3,l}(2, j)\, r_{2,u}(1, j) \ne 0.$$
It is instructive to see when $\tilde{r} \ne 0$. We write
$$\tilde{r} = h_u(1, j) b_{3,l}\, h_l(2, j) b_{2,u} - h_u(2, j) b_{3,l}\, h_l(1, j) b_{2,u}$$
$$= h_u(1, j) b_{3,l} b_{2,u}^\top h_l(2, j)^\top - h_u(2, j) b_{3,l} b_{2,u}^\top h_l(1, j)^\top$$
$$= (h_u(1, j) - h_u(2, j))\, b_{3,l} b_{2,u}^\top h_l(2, j)^\top + h_u(2, j)\, b_{3,l} b_{2,u}^\top (h_l(2, j) - h_l(1, j))^\top.$$
To ensure that $\tilde{r} \ne 0$, that is, to identify the discount factor, it is necessary that (i) the choice $D_t$ changes the state transition distributions, that is, $f^{1/0}(S_{t+1} \mid S_t) \ne 0$ for some $S_t, S_{t+1}$. Otherwise, we have $f^{1/0}(i, j) = 0$, hence $h_u(i, j) = 0$ and $h_l(i, j) = 0$, hence $\tilde{r} = 0$; (ii) the state variable $X_t$ should affect the difference between the state transition distributions under the two alternatives given the excluded variable $Z_t$, that is, $f^{1/0}(1, j) \ne f^{1/0}(2, j)$ for some $j = 1, 2$.
Otherwise, $h_u(1,j) = h_u(2,j)$ and $h_l(1,j) = h_l(2,j)$, hence $\tilde r = 0$; (iii) for each period $t$, the excluded variable $Z_t$ should still change the CCP conditional on the state variables $X_t$ that enter into the per period utility functions, that is $p_t(x_i,z_j) \neq p_t(x_i,z_{j'})$ for some $i$ and $j \neq j'$. Otherwise, $b_{2,u} = 0$ or $b_{3,l} = 0$, hence $\tilde r = 0$.

1.4 Identification of Structural Parameters

We first show that the identification of DPDC models is equivalent to the identification of a linear GMM model. Then, applying this equivalence, we prove a list of identification results. Several important remarks are added at the end of this section.

1.4.1 Linear GMM representation of DPDC models

Our DPDC model maps its structural parameters $\theta$ to a joint probability function $f(D;\theta)$ of data $D = (D_1,S_1,\dots,D_T,S_T)$. Define $\mathcal{F} \equiv \{f(D;\theta) : \theta \in \Theta\}$. Two sets of structural parameters $\theta$ and $\tilde\theta$ are observationally equivalent if and only if (iff) $f(D;\theta) = f(D;\tilde\theta)$ for all $D \in \mathcal{D}$, where $\mathcal{D}$ is the support of data $D$. Given data $D$, the structural parameters $\theta$ are identified in the parameter space $\Theta$ iff any two observationally equivalent parameters in $\Theta$ are identical. In other words, the structural parameters are identified if for any $f(D) \in \mathcal{F}$, the system of equations

$$f(D) = f(D;\theta), \quad \forall D \in \mathcal{D}, \tag{1.36}$$

has a unique solution for $\theta$ in the parameter space $\Theta$.

Due to limited data and/or weak restrictions on the parameter space, we sometimes can only identify one component of the structural parameters, which turns out to be our case. Let $\theta = (\theta_a, \theta_b) \in \Theta$ be the vector of parameters. We say that $\theta_a$ is identified in $\Theta$ iff for any pair $\theta = (\theta_a, \theta_b)$, $\tilde\theta = (\tilde\theta_a, \tilde\theta_b)$, the condition that $\theta$ and $\tilde\theta$ are observationally equivalent implies $\theta_a = \tilde\theta_a$. Again, this statement can be rephrased in terms of equation (1.36) as follows: $\theta_a$ is identified in $\Theta$ iff for any $f(D) \in \mathcal{F}$, equation (1.36) has a unique solution for $\theta_a$.
Any joint probability function $f(D) = f(D_1,S_1,\dots,D_T,S_T) \in \mathcal{F}$ can be decomposed as the following product:

$$\begin{aligned} f(D) ={}& P(S_1)\,P(D_1 \mid S_1)\,P(S_2 \mid S_1,D_1)\,P(D_2 \mid S_2,S_1,D_1) \\ &\times P(S_3 \mid S_2,S_1,D_2,D_1)\,P(D_3 \mid S_3,S_2,S_1,D_2,D_1) \times \cdots \\ &\times P(S_T \mid S_{T-1},\dots,S_1,D_{T-1},\dots,D_1)\,P(D_T \mid S_T,S_{T-1},\dots,S_1,D_{T-1},\dots,D_1). \end{aligned} \tag{1.37}$$

Because the joint probability function $f(D) \in \mathcal{F}$ satisfies the Markovian assumptions 2 and 3, we have

$$P(S_t \mid S_{t-1},\dots,S_1,D_{t-1},\dots,D_1) = P(S_t \mid S_{t-1},D_{t-1}) = f_t(S_t \mid S_{t-1},D_{t-1}),$$

$$P(D_t \mid S_t,\dots,S_1,D_{t-1},\dots,D_1) = P(D_t \mid S_t) = (p_t(S_t))^{D_t}(1 - p_t(S_t))^{1-D_t}.$$

So equation (1.37) equals the following,

$$f(D) = f_1(S_1) \left[\prod_{t=1}^{T-1} P(D_t \mid S_t)\, f_{t+1}(S_{t+1} \mid S_t,D_t)\right] P(D_T \mid S_T).$$

Similarly, we can decompose $f(D;\theta)$ by

$$f(D;\theta) = f_1(S_1;\theta) \left[\prod_{t=1}^{T-1} P(D_t \mid S_t;\theta)\, f_{t+1}(S_{t+1} \mid S_t,D_t;\theta)\right] P(D_T \mid S_T;\theta),$$

where $P(D_t \mid S_t;\theta) = (p_t(S_t;\theta))^{D_t}(1 - p_t(S_t;\theta))^{1-D_t}$. Because of the above decomposition of $f(D)$ and $f(D;\theta)$, it can be verified that equation (1.36) is equivalent to the following (see footnote 7):

$$f_1(S_1) = f_1(S_1;\theta), \quad \forall S_1 \in \mathcal{S}, \tag{1.38a}$$

$$f_{t+1}(S_{t+1} \mid S_t,D_t) = f_{t+1}(S_{t+1} \mid S_t,D_t;\theta), \quad t = 1,\dots,T-1,\ \forall (S_{t+1},S_t,D_t) \in \mathcal{S}^2 \times \{0,1\}, \tag{1.38b}$$

$$p_t(S_t) = p_t(S_t;\theta), \quad t = 1,\dots,T,\ \forall S_t \in \mathcal{S}. \tag{1.38c}$$

From equations (1.38a) and (1.38b), we conclude that the state transition probabilities are identified. In the remainder of the identification analysis, we assume that the state transition probabilities are known and focus on the identification of the per period utility functions ($\mu^0_t$ and $\mu^{1/0}_t$) and the discount factors ($\beta_t$).

The attention now turns to equation (1.38c), which requires the explicit form of the CCP $p_t(S_t;\theta)$ in terms of the structural parameters $\theta$. It follows from the CCP formula of equation (1.9) that

$$p_t(S_t;\theta) = \tilde G\bigl(v^1_t(S_t) - v^0_t(S_t)\bigr) = \tilde G\bigl(\mu^{1/0}_t(S_t) + \beta_t E^{1/0}_{t+1}[v_{t+1}(S_{t+1}) \mid S_t]\bigr), \quad t = 1,\dots,T-1, \tag{1.39}$$

and

$$p_T(S_T;\theta) = \tilde G\bigl(v^1_T(S_T) - v^0_T(S_T)\bigr),$$

where $\tilde G$ is the CDF of $\tilde\varepsilon = \varepsilon^0_t - \varepsilon^1_t$.
Because the CDF $\tilde G$ is known and strictly increasing (Assumption 3.(i)), its inverse $\tilde G^{-1}$ is known. Let $\phi(\cdot) = \tilde G^{-1}(\cdot)$ denote the inverse, so that we have

$$\phi(p_t(S_t;\theta)) = v^1_t(S_t) - v^0_t(S_t), \quad t = 1,\dots,T. \tag{1.40}$$

It should be noted that the ex ante value functions $\{v_{t+1}(S_{t+1}) : t = 1,\dots,T-1\}$ in equation (1.39) are not structural parameters. So we express $v_t(S_t)$ in terms of the structural parameters. It follows from the definition of $v_t(S_t)$ in equation (1.4) and the Bellman equation (1.7) that

$$\begin{aligned} v_t(S_t) &= \int V_t(S_t,\varepsilon_t)\, dG(\varepsilon_t) \\ &= \int \max\{v^0_t(S_t) + \varepsilon^0_t,\ v^1_t(S_t) + \varepsilon^1_t\}\, dG(\varepsilon_t) \\ &= v^0_t(S_t) + \int \max\{\varepsilon^0_t,\ v^1_t(S_t) - v^0_t(S_t) + \varepsilon^1_t\}\, dG(\varepsilon_t) \\ &= v^0_t(S_t) + \int \max\{\varepsilon^0_t,\ \phi(p_t(S_t;\theta)) + \varepsilon^1_t\}\, dG(\varepsilon_t) \\ &= v^0_t(S_t) + \psi(p_t(S_t;\theta)), \end{aligned}$$

where $\psi$, defined by the last equality, depends only on the CDF $G$ of the utility shocks $\varepsilon_t = (\varepsilon^0_t, \varepsilon^1_t)^\top$.

Footnote 7: Take $T = 2$ for example, so that $D = (D_1,S_1,D_2,S_2)$. If equation (1.38) holds, we clearly have $f(D) = f(D;\theta)$. Suppose $f(D) = f(D;\theta)$, and we will show equation (1.38). We first have $f_1(S_1) = \sum_{D_1,D_2,S_2} f(D)$ and $f_1(S_1;\theta) = \sum_{D_1,D_2,S_2} f(D;\theta)$. From $f(D) = f(D;\theta)$, we conclude $f_1(S_1) = f_1(S_1;\theta)$. The notation $\sum_{D_1,D_2,S_2}$ means the sum over all values of $(D_1,D_2,S_2)$ in their support. We next have $f(S_1,D_1) = \sum_{D_2,S_2} f(D)$ and $f(S_1,D_1;\theta) = \sum_{D_2,S_2} f(D;\theta)$, hence $f(S_1,D_1) = f(S_1,D_1;\theta)$. Because $f(S_1,D_1) = f_1(S_1) P(D_1 \mid S_1)$, $f(S_1,D_1;\theta) = f_1(S_1;\theta) P(D_1 \mid S_1;\theta)$ and $f_1(S_1) = f_1(S_1;\theta)$, we conclude $P(D_1 \mid S_1) = P(D_1 \mid S_1;\theta)$, which is equivalent to $p_1(S_1) = p_1(S_1;\theta)$. Following the same strategy, we conclude $f(S_1,D_1,S_2) = f(S_1,D_1,S_2;\theta)$, which implies that $f_2(S_2 \mid S_1,D_1) = f_2(S_2 \mid S_1,D_1;\theta)$ as $f(S_1,D_1) = f(S_1,D_1;\theta)$. We conclude $p_2(S_2) = p_2(S_2;\theta)$ by $f(D) = f(D;\theta)$ and $f(S_1,D_1,S_2) = f(S_1,D_1,S_2;\theta)$.
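As an illustration of the two functions $\phi$ and $\psi$, the sketch below evaluates them under one specific assumption that the derivation above does not impose: the shocks $(\varepsilon^0_t, \varepsilon^1_t)$ are independent standard type 1 extreme value, as in the numerical studies of section 1.6. In that case $\tilde G$ is the logistic CDF, so $\phi(p) = \log(p/(1-p))$ and $\psi(p) = \gamma - \log(1-p)$, where $\gamma$ is Euler's constant. The function names `phi` and `psi` are chosen here for illustration only.

```python
import numpy as np

def phi(p):
    """Inverse of the logistic CDF: phi(p) = v1 - v0 implied by the CCP p.

    Assumes the two shocks are i.i.d. standard type 1 extreme value.
    """
    p = np.asarray(p, dtype=float)
    return np.log(p) - np.log1p(-p)

def psi(p):
    """psi(p) = E max{eps0, phi(p) + eps1} under the same assumption.

    For Gumbel shocks, E max{a + eps0, b + eps1} = gamma + log(e^a + e^b),
    which with a = 0 and b = phi(p) reduces to gamma - log(1 - p).
    """
    p = np.asarray(p, dtype=float)
    return np.euler_gamma - np.log1p(-p)

# Evaluate both functions on a few conditional choice probabilities.
ccp = np.array([0.2, 0.5, 0.8])
phi_values = phi(ccp)
psi_values = psi(ccp)
```

With these two functions, the ex ante value function satisfies $v_t = v^0_t + \psi(p_t)$ pointwise, so $\psi(p_t)$ can be computed from the CCP alone once the shock distribution is fixed.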
Replacing $v^0_t$ in the expression for $v_t(S_t)$ with its definition, $v^0_t(S_t) = \mu^0_t(S_t) + \beta_t E^0_{t+1}[v_{t+1}(S_{t+1}) \mid S_t]$, we have a recursive expression of the ex ante value function,

$$\begin{aligned} v_t(S_t) &= \mu^0_t(S_t) + \beta_t E^0_{t+1}[v_{t+1}(S_{t+1}) \mid S_t] + \psi(p_t(S_t;\theta)), \quad t < T, \\ v_T(S_T) &= v^0_T(S_T) + \psi(p_T(S_T;\theta)). \end{aligned} \tag{1.41}$$

Note that the ASVF $v^0_T \in \theta_T = (v^0_T, v^1_T, F^0_T, F^1_T)$, and $p_T(S_T;\theta) = \tilde G(v^1_T(S_T) - v^0_T(S_T))$ is determined by $\theta_T$. So $v_T(S_T)$ is completely determined by $\theta_T$, which is a part of the structural parameters.

Given the above results, equation (1.38c) is equivalent to the following system of equations,

$$\begin{aligned} p_t(S_t) &= p_t(S_t;\theta) = \tilde G\bigl(\mu^{1/0}_t(S_t) + \beta_t E^{1/0}_{t+1}[v_{t+1}(S_{t+1}) \mid S_t]\bigr), \quad t = 1,\dots,T-1,\ \forall S_t \in \mathcal{S}, \\ p_T(S_T) &= p_T(S_T;\theta) = \tilde G\bigl(v^1_T(S_T) - v^0_T(S_T)\bigr), \quad \forall S_T \in \mathcal{S}, \end{aligned}$$

with

$$\begin{aligned} v_t(S_t) &= \mu^0_t(S_t) + \beta_t E^0_{t+1}[v_{t+1}(S_{t+1}) \mid S_t] + \psi(p_t(S_t;\theta)), \quad t = 2,\dots,T-1,\ \forall S_t \in \mathcal{S}, \\ v_T(S_T) &= v^0_T(S_T) + \psi(p_T(S_T;\theta)), \quad \forall S_T \in \mathcal{S}. \end{aligned}$$

In this system of equations, the known objects are the CCP $\{p_t(S_t) : t = 1,\dots,T\}$ and the state transition matrices $\{F^d_2,\dots,F^d_T : d = 0,1\}$ hidden in the conditional expectation operators $E^{1/0}_{t+1}(\cdot \mid S_t)$ and $E^0_{t+1}(\cdot \mid S_t)$; the unknowns are the per period utility functions $\{\mu^{1/0}_1,\dots,\mu^{1/0}_{T-1},\mu^0_2,\dots,\mu^0_{T-1}\}$, the ex ante value functions $\{v_2,\dots,v_T\}$, the two ASVF $v^0_T$ and $v^1_T$, and the discount factors $\{\beta_1,\dots,\beta_{T-1}\}$. One component of the structural parameters is identified iff the above system of equations has a unique solution for it. Two remarks help to simplify the above system of equations.
First, using the invertibility of the CDF $\tilde G$ and the identities $\{p_t(S_t;\theta) = p_t(S_t) : t = 1,\dots,T\}$, the above system has the same solutions as

$$\phi(p_t(S_t)) = \mu^{1/0}_t(S_t) + \beta_t E^{1/0}_{t+1}[v_{t+1}(S_{t+1}) \mid S_t], \quad t = 1,\dots,T-1,\ \forall S_t \in \mathcal{S}, \tag{1.42a}$$

$$\phi(p_T(S_T)) = v^1_T(S_T) - v^0_T(S_T), \quad \forall S_T \in \mathcal{S}, \tag{1.42b}$$

$$v_t(S_t) = \mu^0_t(S_t) + \beta_t E^0_{t+1}[v_{t+1}(S_{t+1}) \mid S_t] + \psi(p_t(S_t)), \quad t = 2,\dots,T-1,\ \forall S_t \in \mathcal{S}, \tag{1.42c}$$

$$v_T(S_T) = v^0_T(S_T) + \psi(p_T(S_T)), \quad \forall S_T \in \mathcal{S}. \tag{1.42d}$$

Second, equations (1.42b) and (1.42d) simply state that $v^0_T$ and $v^1_T$ are uniquely determined by $v_T$. Hence, in order to solve for $\theta = (\theta_1,\dots,\theta_T)$ from equation (1.42), we can solve for $\theta_1,\dots,\theta_{T-1}$ and $v_T$. Moreover, the solutions of $(\theta_1,\dots,\theta_{T-1},v_T)$, which appear only in equations (1.42a) and (1.42c), do not depend on equations (1.42b) and (1.42d). So equation (1.42) has the same solution for $(\theta_1,\dots,\theta_{T-1},v_T)$ as the following system,

$$\begin{cases} \phi(p_t(s_t)) = \mu^{1/0}_t(s_t) + \beta_t E^{1/0}_{t+1}[v_{t+1}(S_{t+1}) \mid s_t], & t = 1,\dots,T-1,\ \forall s_t \in \mathcal{S}, \\ \psi(p_t(s_t)) = v_t(s_t) - \mu^0_t(s_t) - \beta_t E^0_{t+1}[v_{t+1}(S_{t+1}) \mid s_t], & t = 2,\dots,T-1,\ \forall s_t \in \mathcal{S}. \end{cases} \tag{ID}$$

The identification analysis below is based on checking whether equation (ID) has a unique solution for (some parts of) $(\theta_1,\dots,\theta_{T-1},v_T)$. Equation (ID) has the feature that, given the discount factors $\beta_t$, it is linear in all the other unknowns; meanwhile, equation (ID) is linear in the discount factors, given the other unknowns. When the discount factors $\beta_t$ are known, the uniqueness of the solution is very easy to check because equation (ID) is linear in all the other unknowns.
More explicitly, using the notation of $F^0_{t+1}$ and $F^{1/0}_{t+1}$, equation (ID) can be written as follows,

$$\begin{cases} \phi(p_t) = \mu^{1/0}_t + \beta_t F^{1/0}_{t+1} v_{t+1}, & t = 1,\dots,T-1, \\ \psi(p_t) = v_t - \mu^0_t - \beta_t F^0_{t+1} v_{t+1}, & t = 2,\dots,T-1, \end{cases} \tag{ID'}$$

where

$$\phi(p_t) = \bigl(\phi(p_t(s_1)),\dots,\phi(p_t(s_{d_s}))\bigr)^\top \quad\text{and}\quad \psi(p_t) = \bigl(\psi(p_t(s_1)),\dots,\psi(p_t(s_{d_s}))\bigr)^\top.$$

In this sense, we claim that the identification of DPDC models is equivalent to the identification of a linear GMM system, hence a familiar problem. A necessary condition for identification is that the number of equations is at least as large as the number of unknowns (order condition). If the order condition fails, we shall consider restrictions that can eliminate a certain number of unknowns, or add more equations by increasing the number of time periods $T$ in the panel data.

1.4.2 Identification of DPDC models by the linear GMM representation

A sequence of identification results will be derived by using the linear GMM representation of the DPDC model in equation (ID). The unknowns in equation (ID) are

$$\{\mu^{1/0}_1,\dots,\mu^{1/0}_{T-1},\ \mu^0_2,\dots,\mu^0_{T-1},\ v_2,\dots,v_T,\ \beta_1,\dots,\beta_{T-1}\}.$$

Without restriction, the system of equations (ID) has $(2T-3)d_s$ equations with $(3T-4)d_s + (T-1)$ unknowns. This implies that the structural parameters are not identified even when all discount factors are known (removing $T-1$ unknowns). The non-identification of the DPDC model has long been known in the literature (Rust, 1994; Magnac and Thesmar, 2002). The question of interest is which restrictions to use. We focus on the identification using the Exclusion Restriction stated in section 1.3, which is copied below for the convenience of reading.

Exclusion Restriction. The vector of observable state variables $S_t$ has two parts $X_t$ and $Z_t$. Let $S_t = (X_t,Z_t)$, where $X_t \in \mathcal{X} = \{x_1,\dots,x_{d_x}\}$ and $Z_t \in \mathcal{Z} = \{z_1,\dots,z_{d_z}\}$. Assume that $\mu^1_t(X_t,Z_t) = \mu^1_t(X_t)$ and $\mu^0_t(X_t,Z_t) = \mu^0_t(X_t)$ for any $(X_t,Z_t)$.
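To see the size of the gap that a restriction has to close, the counts quoted above for the unrestricted system (ID) can be tabulated directly. The helper below is only an arithmetic sketch; the function name `count_unrestricted` and its arguments are chosen here for illustration.

```python
def count_unrestricted(T, ds, known_discount=False):
    """Count equations and unknowns of system (ID) without any restriction.

    Equations: (2T - 3) * ds.
    Unknowns:  (3T - 4) * ds utilities and value functions,
               plus (T - 1) discount factors unless they are known.
    """
    equations = (2 * T - 3) * ds
    unknowns = (3 * T - 4) * ds + (0 if known_discount else T - 1)
    return equations, unknowns

# Example: four periods and six states.
eq, unk = count_unrestricted(T=4, ds=6)
eq_known, unk_known = count_unrestricted(T=4, ds=6, known_discount=True)
shortfall_known = unk_known - eq_known  # equals (T - 1) * ds
```

Even with known discount factors the shortfall is $(T-1)d_s$, which is why additional restrictions are needed.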
For expositional simplicity, assume that $\mathcal{S} = \mathcal{X} \times \mathcal{Z}$, so that $d_s = d_x d_z$. In particular, let

$$\mathcal{S} = \operatorname{vec}\begin{bmatrix} (x_1,z_1) & (x_2,z_1) & \cdots & (x_{d_x},z_1) \\ \vdots & \vdots & \cdots & \vdots \\ (x_1,z_{d_z}) & (x_2,z_{d_z}) & \cdots & (x_{d_x},z_{d_z}) \end{bmatrix}.$$

For $d_x = d_z = 2$, this means $\mathcal{S} = \{(x_1,z_1),(x_1,z_2),(x_2,z_1),(x_2,z_2)\}$. Notice that under the above restriction, $\mu^d_t = (\mu^d_t(x_1),\dots,\mu^d_t(x_{d_x}))^\top$ is a $d_x$-dimensional vector.

The above restriction is satisfied in our female labor force participation example, where $S_t = (husb_t, xp_t, edu, kid_t, xp^H_t, edu^H)$ with $X_t = (husb_t, xp_t, edu, kid_t)$ and $Z_t = (xp^H_t, edu^H)$. In general, given a set of state variables $X_t$ that affect per period utilities, one searches for $Z_t$ by looking for variables that affect $X_{t+1}$ but do not affect per period utilities given $X_t$. For example, in Rust's (1987) bus engine replacement application, $X_t$ is the mileage of the bus. Then $Z_t$ could be characteristics of the bus's route, which will affect the bus's mileage in the next period, but not the current maintenance cost given the mileage.

We have shown the identification power of the Exclusion Restriction in the previous section. Below, we provide more general identification results. It is instructive to consider first the stationary dynamic programming problem with known discount factor. By "stationary dynamic programming problem", we mean that the decision horizon $T$ is infinite, and the per period utility functions, the discount factors and the state transition distributions are time invariant.
When the agent's dynamic programming problem is stationary, and the Exclusion Restriction holds, (ID) becomes

$$\begin{cases} \phi(p(X,Z)) = \mu^{1/0}(X) + \beta E^{1/0}[v(X',Z') \mid X,Z], & \forall (X,Z) \in \mathcal{S}, \\ \psi(p(X,Z)) = v(X,Z) - \mu^0(X) - \beta E^0[v(X',Z') \mid X,Z], & \forall (X,Z) \in \mathcal{S}, \end{cases} \tag{1.43}$$

or equivalently

$$\begin{cases} \phi(p) = \mu^{1/0} \otimes \mathbf{1}_{d_z} + \beta F^{1/0} v, \\ \psi(p) = v - \mu^0 \otimes \mathbf{1}_{d_z} - \beta F^0 v. \end{cases} \tag{1.44}$$

For $d_x = d_z = 2$,

$$\mu^{1/0} \otimes \mathbf{1}_{d_z} = (\mu^{1/0}(x_1), \mu^{1/0}(x_1), \mu^{1/0}(x_2), \mu^{1/0}(x_2))^\top, \qquad \mu^0 \otimes \mathbf{1}_{d_z} = (\mu^0(x_1), \mu^0(x_1), \mu^0(x_2), \mu^0(x_2))^\top.$$

We will need to recover $\mu^{1/0}$ and $\mu^0$ from $\mu^{1/0} \otimes \mathbf{1}_{d_z}$ and $\mu^0 \otimes \mathbf{1}_{d_z}$, respectively. To this end, define the $d_x \times d_s$ matrix $W$ by

$$W \equiv I_{d_x} \otimes (d_z^{-1} \mathbf{1}_{d_z}^\top), \tag{1.45}$$

so that $\mu^{1/0} = W(\mu^{1/0} \otimes \mathbf{1}_{d_z})$ and $\mu^0 = W(\mu^0 \otimes \mathbf{1}_{d_z})$.

The linear system of equations (1.43) or (1.44) has $2d_s$ equations with $2d_x + d_s$ unknowns. So if $d_s \ge 2d_x$, we may be able to identify the structural parameters. In particular, when $d_s = d_x d_z$, the order condition $d_s \ge 2d_x$ is satisfied as long as $d_z \ge 2$. The identification of the per period utility functions $(\mu^{1/0}, \mu^0)$ is based on solving $(\mu^{1/0}, \mu^0)$ explicitly from equation (1.43) or (1.44). We solve equation (1.43) by following the steps below, which are similar to the steps in section 1.3.2.

Step 1: Eliminate the per period utility functions $\mu^{1/0}$ and $\mu^0$ from equation (1.43) by considering the following differences,

$$\phi(p(x_i,z_j)) - \phi(p(x_i,z_{j+1})) \quad\text{and}\quad \psi(p(x_i,z_j)) - \psi(p(x_i,z_{j+1})),$$

for $i = 1,\dots,d_x$ and $j = 1,\dots,d_z - 1$. For expositional simplicity, denote

$$\phi(i,j) = \phi(p(x_i,z_j)) \quad\text{and}\quad \psi(i,j) = \psi(p(x_i,z_j)).$$

It follows from equation (1.43) that

$$\begin{cases} \phi(i,j) - \phi(i,j+1) = \beta E^{1/0}[v(X',Z') \mid x_i,z_j] - \beta E^{1/0}[v(X',Z') \mid x_i,z_{j+1}], \\ \psi(i,j) - \psi(i,j+1) = v(x_i,z_j) - v(x_i,z_{j+1}) - \beta E^0[v(X',Z') \mid x_i,z_j] + \beta E^0[v(X',Z') \mid x_i,z_{j+1}], \end{cases}$$

for all $i = 1,\dots,d_x$ and $j = 1,\dots,d_z - 1$. We have a simpler representation of the above system of equations using equation (1.44). Define the $[d_x(d_z-1)] \times d_s$ matrix $M$ by

$$M \equiv I_{d_x} \otimes \begin{bmatrix} 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{bmatrix}_{(d_z-1) \times d_z}.$$

Multiplying both sides of equation (1.44) with the matrix $M$, we have

$$\begin{cases} M\phi(p) = M(\mu^{1/0} \otimes \mathbf{1}_{d_z}) + \beta M F^{1/0} v, \\ M\psi(p) = Mv - M(\mu^0 \otimes \mathbf{1}_{d_z}) - \beta M F^0 v. \end{cases}$$

Because $M(\mu^{1/0} \otimes \mathbf{1}_{d_z}) = M(\mu^0 \otimes \mathbf{1}_{d_z}) = 0$ (see footnote 8), we have the following linear system of equations,

$$Av = b, \tag{1.46}$$

where $A$ is the $[2d_x(d_z-1)] \times d_s$ matrix, and $b$ is the $[2d_x(d_z-1)]$-dimensional vector defined by

$$A = \begin{bmatrix} \beta M F^{1/0} \\ M(I - \beta F^0) \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} M\phi(p) \\ M\psi(p) \end{bmatrix}. \tag{1.47}$$

Footnote 8: Take $d_x = d_z = 2$ for example. We have $\mu^0 \otimes \mathbf{1}_{d_z} = (\mu^0(x_1), \mu^0(x_1), \mu^0(x_2), \mu^0(x_2))^\top$, hence $M(\mu^0 \otimes \mathbf{1}_{d_z}) = (\mu^0(x_1) - \mu^0(x_1),\ \mu^0(x_2) - \mu^0(x_2))^\top = 0_2$.

Step 2: Solve for the ex ante value function $v$ in equation (1.46). Let $A^+$ be the Moore-Penrose pseudoinverse of the matrix $A$; then the $d_s$-dimensional vector $v^+ \equiv A^+ b$ solves equation (1.46) by the definition of the pseudoinverse. If $\operatorname{rank} A = d_s - 1$, we know from Lemma A.1 that the solution set of equation (1.46) is

$$\{v^+ + c\,\mathbf{1}_{d_s} : c \in \mathbb{R}\},$$

where $c\,\mathbf{1}_{d_s} = (c,\dots,c)^\top$ is a $d_s$-dimensional vector of the constant $c$.

Step 3: Identify the per period utility functions $\mu^{1/0}$ and $\mu^0$. Suppose $\operatorname{rank} A = d_s - 1$, and let $v = v^+ + c\,\mathbf{1}_{d_s}$ be an arbitrary solution of equation (1.46). Associated with the solution $v = v^+ + c\,\mathbf{1}_{d_s}$, we have the following from equation (1.44),

$$\mu^{1/0} \otimes \mathbf{1}_{d_z} = \phi(p) - \beta F^{1/0}(v^+ + c\,\mathbf{1}_{d_s}) = \phi(p) - \beta F^{1/0} v^+,$$

and

$$\mu^0 \otimes \mathbf{1}_{d_z} = (v^+ + c\,\mathbf{1}_{d_s}) - \beta F^0 (v^+ + c\,\mathbf{1}_{d_s}) - \psi(p) = (I - \beta F^0) v^+ - \psi(p) + (c - \beta c)\,\mathbf{1}_{d_s}.$$

We have the above equations because $F^{1/0} \mathbf{1}_{d_s} = 0_{d_s}$ and $F^0 \mathbf{1}_{d_s} = \mathbf{1}_{d_s}$. Using the matrix $W$ of equation (1.45), we have

$$\mu^{1/0} = W(\mu^{1/0} \otimes \mathbf{1}_{d_z}) = W(\phi(p) - \beta F^{1/0} v^+).$$

Hence the per period utility difference $\mu^{1/0}$ is identified. To get rid of the unknown constant $c$ in $\mu^0 \otimes \mathbf{1}_{d_z}$, we use the normalization $\mu^0(x_1) = 0$ of Assumption 4.(iii). For this purpose, define

$$L \equiv \begin{bmatrix} 0 & 0 & \cdots & 0 \\ -1 & 1 & & \\ \vdots & & \ddots & \\ -1 & & & 1 \end{bmatrix}_{d_s \times d_s}. \tag{1.48}$$

We then have

$$\mu^0 = WL(\mu^0 \otimes \mathbf{1}_{d_z}) = WL[(I - \beta F^0) v^+ - \psi(p)].$$

Hence the per period utility function $\mu^0$ is identified with the normalization.

Proposition 1 (Identification with the Exclusion Restriction, known discount factors and stationarity). In addition to Assumptions 1-4, suppose the Exclusion Restriction holds, the discount factors are known and the agent's dynamic programming problem is stationary. Let the matrix $A$ and the vector $b$ be defined by equation (1.47). If $T \ge 2$ and $\operatorname{rank} A = d_s - 1$, the per period utility functions $\mu^{1/0}$ and $\mu^0$ are identified. Moreover, we have

$$\mu^{1/0} = W(\phi(p) - \beta F^{1/0} A^+ b), \qquad \mu^0 = WL[(I - \beta F^0) A^+ b - \psi(p)].$$

We now move to the identification of general dynamic programming discrete choice models using the Exclusion Restriction. It follows from the Exclusion Restriction that equation (ID) becomes

$$\begin{cases} \phi(p_t(X_t,Z_t)) = \mu^{1/0}_t(X_t) + \beta_t E^{1/0}_{t+1}[v_{t+1}(S_{t+1}) \mid X_t,Z_t], & t = 1,\dots,T-1, \\ \psi(p_t(X_t,Z_t)) = v_t(X_t,Z_t) - \mu^0_t(X_t) - \beta_t E^0_{t+1}[v_{t+1}(S_{t+1}) \mid X_t,Z_t], & t = 2,\dots,T-1, \end{cases} \tag{1.49}$$

for all $(X_t,Z_t) \in \mathcal{S}$. There are $(2T-3)d_x + (T-1)d_s + (T-1)$ unknowns and $(2T-3)d_s$ equations. When the discount factors are known (removing $T-1$ unknowns), $d_z \ge 3$ and $T \ge 3$, we have at least as many equations as unknowns. It should be remarked that when $T < 3$, the order condition always fails regardless of the value of $d_z$. When $T = 2$, we have only

$$\phi(p_1(X_1,Z_1)) = \mu^{1/0}_1(X_1) + \beta_1 E^{1/0}_2[v_2(S_2) \mid X_1,Z_1].$$

Note that in general we do not have

$$\begin{aligned} \phi(p_2(X_2,Z_2)) &= \mu^{1/0}_2(X_2) + \beta_2 E^{1/0}_3[v_3(S_3) \mid X_2,Z_2], \\ \psi(p_2(X_2,Z_2)) &= v_2(X_2,Z_2) - \mu^0_2(X_2) - \beta_2 E^0_3[v_3(S_3) \mid X_2,Z_2], \end{aligned}$$

because the state transition matrices $F^0_3$ and $F^1_3$ are unknown given data $(D_1,S_1,D_2,S_2)$. However, if one assumes that the state transition matrices are time invariant as we did in section 1.3, we can use the above two equations.

We first focus on the identification with known discount factors.
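Before the general case, the closed-form expressions of Proposition 1 can be checked numerically. The sketch below is not taken from the thesis; it rests on several assumptions made only for illustration: type 1 extreme value shocks (so that $\phi(p) = \log(p/(1-p))$ and $\psi(p) = \gamma - \log(1-p)$), randomly drawn transition matrices, hypothetical utility values, and `np.linalg.lstsq` standing in for $A^+ b$ (it returns the minimum-norm solution of the linear system). States are stacked with $x$ varying slowest, as in the definition of $\mathcal{S}$.

```python
import numpy as np

rng = np.random.default_rng(0)
dx, dz, beta = 2, 3, 0.8           # hypothetical sizes and discount factor
ds = dx * dz
ones_z = np.ones(dz)

# Hypothetical true utilities; mu0[0] = 0 is the normalization mu0(x_1) = 0.
mu1 = np.array([0.5, 1.2])
mu0 = np.array([0.0, 0.7])

# Randomly drawn row-stochastic transition matrices under d = 0 and d = 1.
F0 = rng.dirichlet(np.ones(ds), size=ds)
F1 = rng.dirichlet(np.ones(ds), size=ds)
F10 = F1 - F0

# Solve the stationary problem by value iteration (logit shocks assumed).
v = np.zeros(ds)
for _ in range(2000):
    v0 = np.kron(mu0, ones_z) + beta * F0 @ v
    v1 = np.kron(mu1, ones_z) + beta * F1 @ v
    v = np.euler_gamma + np.logaddexp(v0, v1)
p = 1.0 / (1.0 + np.exp(v0 - v1))   # CCP of alternative 1

phi = np.log(p) - np.log1p(-p)      # phi(p) under the logit assumption
psi = np.euler_gamma - np.log1p(-p)  # psi(p) under the logit assumption

# Matrices W (1.45), M, and L (1.48).
diff = np.eye(dz)[:-1] - np.eye(dz)[1:]       # rows e_j - e_{j+1}
M = np.kron(np.eye(dx), diff)
W = np.kron(np.eye(dx), np.ones((1, dz)) / dz)
L = np.eye(ds)
L[:, 0] -= 1.0                                 # subtract the first entry

# Linear system (1.46)-(1.47) and its minimum-norm solution.
I = np.eye(ds)
A = np.vstack([beta * M @ F10, M @ (I - beta * F0)])
b = np.concatenate([M @ phi, M @ psi])
v_plus = np.linalg.lstsq(A, b, rcond=1e-10)[0]

# Closed-form expressions of Proposition 1.
mu10_hat = W @ (phi - beta * F10 @ v_plus)
mu0_hat = W @ L @ ((I - beta * F0) @ v_plus - psi)
```

In this sketch `v_plus` differs from the true value function by a constant vector, which cancels in `mu10_hat` and is removed by `L` in `mu0_hat`, mirroring Step 3 above.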
Let $\bar v_{t+1}(S_{t+1}) = \beta_t v_{t+1}(S_{t+1})$ be the discounted ex ante value function. For each period $t = 2,\dots,T-1$, we will show how to solve the unknowns $(\mu^{1/0}_{t-1}, \mu^{1/0}_t, \mu^0_t, v_t, \bar v_{t+1})$ from the following part of equation (1.49),

$$\begin{cases} \phi(p_{t-1}(X_{t-1},Z_{t-1})) = \mu^{1/0}_{t-1}(X_{t-1}) + \beta_{t-1} E^{1/0}_t[v_t(S_t) \mid X_{t-1},Z_{t-1}], & \forall (X_{t-1},Z_{t-1}) \in \mathcal{S}, \\ \phi(p_t(X_t,Z_t)) = \mu^{1/0}_t(X_t) + E^{1/0}_{t+1}[\bar v_{t+1}(S_{t+1}) \mid X_t,Z_t], & \forall (X_t,Z_t) \in \mathcal{S}, \\ \psi(p_t(X_t,Z_t)) = v_t(X_t,Z_t) - \mu^0_t(X_t) - E^0_{t+1}[\bar v_{t+1}(S_{t+1}) \mid X_t,Z_t], & \forall (X_t,Z_t) \in \mathcal{S}, \end{cases} \tag{1.50}$$

or equivalently

$$\begin{cases} \phi(p_{t-1}) = \mu^{1/0}_{t-1} \otimes \mathbf{1}_{d_z} + \beta_{t-1} F^{1/0}_t v_t, \\ \phi(p_t) = \mu^{1/0}_t \otimes \mathbf{1}_{d_z} + F^{1/0}_{t+1} \bar v_{t+1}, \\ \psi(p_t) = v_t - \mu^0_t \otimes \mathbf{1}_{d_z} - F^0_{t+1} \bar v_{t+1}. \end{cases} \tag{1.51}$$

Ranging period $t$ from 2 to $T-1$, all unknowns $\mu^{1/0}_1,\dots,\mu^{1/0}_{T-1}$, $\mu^0_2,\dots,\mu^0_{T-1}$, $v_2,\dots,v_T$ will then be solved. For each $t$, we solve equation (1.50) by following the steps below, which are similar to the steps in section 1.3.2.

Step 1: Eliminate the unknown per period utility functions $\mu^{1/0}_{t-1}$, $\mu^{1/0}_t$ and $\mu^0_t$ from equation (1.50) by considering the following differences,

$$\begin{cases} \phi_{t-1}(i,j) - \phi_{t-1}(i,j+1) = \beta_{t-1} E^{1/0}_t[v_t(S_t) \mid x_i,z_j] - \beta_{t-1} E^{1/0}_t[v_t(S_t) \mid x_i,z_{j+1}], \\ \phi_t(i,j) - \phi_t(i,j+1) = E^{1/0}_{t+1}[\bar v_{t+1}(S_{t+1}) \mid x_i,z_j] - E^{1/0}_{t+1}[\bar v_{t+1}(S_{t+1}) \mid x_i,z_{j+1}], \\ \psi_t(i,j) - \psi_t(i,j+1) = v_t(x_i,z_j) - v_t(x_i,z_{j+1}) - E^0_{t+1}[\bar v_{t+1}(S_{t+1}) \mid x_i,z_j] + E^0_{t+1}[\bar v_{t+1}(S_{t+1}) \mid x_i,z_{j+1}], \end{cases} \tag{1.52}$$

for all $i = 1,\dots,d_x$ and $j = 1,\dots,d_z - 1$. Here,

$$\phi_t(i,j) = \phi(p_t(X_t = x_i, Z_t = z_j)) \quad\text{and}\quad \psi_t(i,j) = \psi(p_t(X_t = x_i, Z_t = z_j)).$$

Equation (1.52) can be organized as the following linear system of equations,

$$A_t \begin{bmatrix} v_t \\ \bar v_{t+1} \end{bmatrix} = b_t, \tag{1.53}$$

where $A_t$ is a $[3d_x(d_z-1)] \times (2d_s)$ matrix, and $b_t$ is a $[3d_x(d_z-1)]$-dimensional vector:

$$A_t \equiv \begin{bmatrix} M F^{1/0}_t & 0 \\ 0 & M F^{1/0}_{t+1} \\ M & -M F^0_{t+1} \end{bmatrix} \quad\text{and}\quad b_t \equiv \begin{bmatrix} \beta_{t-1}^{-1} M\phi(p_{t-1}) \\ M\phi(p_t) \\ M\psi(p_t) \end{bmatrix}. \tag{1.54}$$

Step 2: Solve $v_t$ and $\bar v_{t+1}$ from equation (1.53). Let $A_t^+$ be the Moore-Penrose pseudoinverse of the matrix $A_t$; then

$$\begin{bmatrix} v_t^+ \\ \bar v_{t+1}^+ \end{bmatrix} = A_t^+ b_t$$

solves equation (1.53) by the definition of the pseudoinverse. Split the matrix $A_t^+$ into two parts:

$$A_t^+ = \begin{bmatrix} A_{t,u}^+ \\ A_{t,l}^+ \end{bmatrix}, \tag{1.55}$$

where $A_{t,u}^+$ and $A_{t,l}^+$ are the $d_s \times [3d_x(d_z-1)]$ matrices formed by the first and last $d_s$ rows of the matrix $A_t^+$, respectively. Then

$$\begin{bmatrix} v_t^+ \\ \bar v_{t+1}^+ \end{bmatrix} = \begin{bmatrix} A_{t,u}^+ b_t \\ A_{t,l}^+ b_t \end{bmatrix},$$

or more explicitly,

$$v_t^+ = A_{t,u}^+ b_t \quad\text{and}\quad \bar v_{t+1}^+ = A_{t,l}^+ b_t. \tag{1.56}$$

If $\operatorname{rank} A_t = 2(d_s - 1)$, we know from Lemma A.2 that the solution set of equation (1.53) is

$$\left\{ \begin{bmatrix} v_t^+ + c_t \mathbf{1}_{d_s} \\ \bar v_{t+1}^+ + c_{t+1} \mathbf{1}_{d_s} \end{bmatrix} : c_t, c_{t+1} \in \mathbb{R} \right\}. \tag{1.57}$$

Step 3: Identify the per period utility functions $\mu^{1/0}_{t-1}$, $\mu^{1/0}_t$ and $\mu^0_t$. Suppose $\operatorname{rank} A_t = 2(d_s - 1)$, and let $v_t = v_t^+ + c_t \mathbf{1}_{d_s}$ and $\bar v_{t+1} = \bar v_{t+1}^+ + c_{t+1} \mathbf{1}_{d_s}$ be arbitrary solutions of equation (1.53). Then, associated with $v_t = v_t^+ + c_t \mathbf{1}_{d_s}$ and $\bar v_{t+1} = \bar v_{t+1}^+ + c_{t+1} \mathbf{1}_{d_s}$, we have the following from equation (1.51),

$$\begin{aligned} \mu^{1/0}_t \otimes \mathbf{1}_{d_z} &= \phi(p_t) - F^{1/0}_{t+1} \bar v_{t+1}^+, \\ \mu^{1/0}_{t-1} \otimes \mathbf{1}_{d_z} &= \phi(p_{t-1}) - \beta_{t-1} F^{1/0}_t v_t^+, \\ \mu^0_t \otimes \mathbf{1}_{d_z} &= v_t^+ - F^0_{t+1} \bar v_{t+1}^+ - \psi(p_t) + (c_t - c_{t+1})\,\mathbf{1}_{d_s}. \end{aligned}$$

Using the matrix $W$ of equation (1.45), we have

$$\mu^{1/0}_t = W(\phi(p_t) - F^{1/0}_{t+1} \bar v_{t+1}^+), \qquad \mu^{1/0}_{t-1} = W(\phi(p_{t-1}) - \beta_{t-1} F^{1/0}_t v_t^+).$$

Hence we claim that $\mu^{1/0}_t$ and $\mu^{1/0}_{t-1}$ are identified. With the normalization $\mu^0_t(x_1) = 0$ of Assumption 4.(iii), we get rid of the constant $c_t - c_{t+1}$ in $\mu^0_t \otimes \mathbf{1}_{d_z}$.
We then have

$$\mu^0_t = WL(v_t^+ - F^0_{t+1} \bar v_{t+1}^+ - \psi(p_t)).$$

Hence the per period utility function $\mu^0_t$ is identified with the normalization.

Proposition 2 (Identification with the Exclusion Restriction, known discount factors and $T \ge 3$). In addition to Assumptions 1-4, suppose the Exclusion Restriction holds, the discount factors are known and $T \ge 3$. For $t = 2,\dots,T-1$, let the matrix $A_t$ and the vector $b_t$ be defined by equation (1.54). If $\operatorname{rank} A_t = 2(d_s - 1)$, then the per period utility functions $\mu^{1/0}_t$, $\mu^0_t$ and $\mu^{1/0}_{t-1}$ are identified. Moreover, we have

$$\begin{aligned} \mu^{1/0}_t &= W(\phi(p_t) - F^{1/0}_{t+1} A_{t,l}^+ b_t), \\ \mu^{1/0}_{t-1} &= W(\phi(p_{t-1}) - \beta_{t-1} F^{1/0}_t A_{t,u}^+ b_t), \\ \mu^0_t &= WL(A_{t,u}^+ b_t - F^0_{t+1} A_{t,l}^+ b_t - \psi(p_t)). \end{aligned}$$

When the panel data cover at least four time periods, we can also identify the discount factors using the strategy of section 1.3.3. Applying Proposition 2 at periods $t$ and $t+1$, we have two formulas for $\mu^{1/0}_t$:

$$\begin{aligned} \mu^{1/0}_t &= W(\phi(p_t) - F^{1/0}_{t+1} A_{t,l}^+ b_t), \\ \mu^{1/0}_t &= W(\phi(p_t) - \beta_t F^{1/0}_{t+1} A_{t+1,u}^+ b_{t+1}), \end{aligned}$$

from which we have an equation of the discount factors $\beta_{t-1}$ and $\beta_t$ (the former is hidden in $b_t$):

$$W F^{1/0}_{t+1} A_{t,l}^+ b_t - \beta_t W F^{1/0}_{t+1} A_{t+1,u}^+ b_{t+1} = 0.$$

We derive the explicit solutions of $(\beta_{t-1}, \beta_t)$ below. Define

$$b_{t,u} = \operatorname{vec}\begin{bmatrix} M\phi(p_{t-1}) & 0 & 0 \end{bmatrix} \quad\text{and}\quad b_{t,l} = \operatorname{vec}\begin{bmatrix} 0 & M\phi(p_t) & M\psi(p_t) \end{bmatrix},$$

so that the vector $b_t$ defined in equation (1.54) equals

$$b_t = \beta_{t-1}^{-1} b_{t,u} + b_{t,l}.$$

Let

$$H_{t,l} = W F^{1/0}_{t+1} A_{t,l}^+ \quad\text{and}\quad H_{t+1,u} = W F^{1/0}_{t+1} A_{t+1,u}^+.$$

Then the equation $W F^{1/0}_{t+1} A_{t,l}^+ b_t - \beta_t W F^{1/0}_{t+1} A_{t+1,u}^+ b_{t+1} = 0$ is written as follows,

$$\beta_{t-1}^{-1} H_{t,l} b_{t,u} - \beta_t H_{t+1,u} b_{t+1,l} + (H_{t,l} b_{t,l} - H_{t+1,u} b_{t+1,u}) = 0,$$

or

$$\begin{bmatrix} H_{t,l} b_{t,u} & -H_{t+1,u} b_{t+1,l} \end{bmatrix} \begin{bmatrix} \beta_{t-1}^{-1} \\ \beta_t \end{bmatrix} = -(H_{t,l} b_{t,l} - H_{t+1,u} b_{t+1,u}).$$

Denote

$$\tilde R_t = \begin{bmatrix} H_{t,l} b_{t,u} & -H_{t+1,u} b_{t+1,l} \end{bmatrix}. \tag{1.58}$$

If $\operatorname{rank} \tilde R_t = 2$, we have the unique solution of $(\beta_{t-1}^{-1}, \beta_t)^\top$:

$$\begin{bmatrix} \beta_{t-1}^{-1} \\ \beta_t \end{bmatrix} = -\tilde R_t^+ (H_{t,l} b_{t,l} - H_{t+1,u} b_{t+1,u}).$$

Proposition 3 (Identification of discount factors with the Exclusion Restriction and $T \ge 4$).
Suppose the conditions of Proposition 2 hold with $T \ge 4$ (at least four consecutive periods of data). If the matrices $\tilde R_t$, $t = 2,\dots,T-1$, defined in equation (1.58) are of full rank, the discount factors $\beta_1,\dots,\beta_{T-2}$ are identified. Note that the discount factor $\beta_{T-1}$ is not identified.

In practice, the excluded state variable $Z_t$ could be time invariant. For example, in Rust's (1987) bus engine replacement application, the excluded state variable can be a permanent route characteristic for the bus. Recently, Fang and Wang (2015) use excluded variables to identify hyperbolic discounting parameters. In their application to mammography decisions, the excluded variables include categorical variables like education and race, which do not change over time. When the excluded variable $Z_t$ is time invariant, the conclusions of Propositions 2 and 3 hold under a different rank condition.

Proposition 4 (Identification with permanent excluded state variables). In addition to Assumptions 1-4 and the Exclusion Restriction, suppose that the excluded state variable $Z_t$ is time invariant. For $t = 2,\dots,T-1$, let the matrix $A_t$ and the vector $b_t$ be defined by equation (1.54). (i) If the discount factors are known, and $\operatorname{rank} A_t = 2d_s - d_z - 1$ for $t = 2,\dots,T-1$ with $T \ge 3$, the per period utility functions $\mu^{1/0}_1,\dots,\mu^{1/0}_{T-1}$ and $\mu^0_2,\dots,\mu^0_{T-1}$ are identified and satisfy the formulas in Proposition 2. (ii) In addition to the conditions of part (i), if $T \ge 4$ and the matrices $\tilde R_t$, $t = 2,\dots,T-1$, defined in equation (1.58) are of full rank, the discount factors $\beta_1,\dots,\beta_{T-2}$ are also identified.

Proof. See Appendix A.1.

When there are no excluded variables $Z_t$, an alternative way to identify the per period utility functions is to assume that the per period utility functions are time invariant but the state transition matrices are time varying. Then time itself is an excluded variable. Suppose that we have at least four consecutive periods of observations, that is $T \ge 4$.
Assume that the per period utility functions are time invariant, $\mu^{1/0}_t = \mu^{1/0}$ and $\mu^0_t = \mu^0$. Then for each $t \ge 3$, equation (ID') becomes the following

$$\begin{cases} \phi(p_t) = \mu^{1/0} + F^{1/0}_{t+1} \bar v_{t+1}, \\ \phi(p_{t-1}) = \mu^{1/0} + \beta_{t-1} F^{1/0}_t v_t, \\ \phi(p_{t-2}) = \mu^{1/0} + \beta_{t-2} F^{1/0}_{t-1} v_{t-1}, \\ \psi(p_t) = v_t - \mu^0 - F^0_{t+1} \bar v_{t+1}, \\ \psi(p_{t-1}) = v_{t-1} - \mu^0 - \beta_{t-1} F^0_t v_t. \end{cases} \tag{1.59}$$

To solve the per period utility functions $\mu^{1/0}$ and $\mu^0$ from equation (1.59), we again first solve the ex ante value functions $v_{t-1}, v_t, \bar v_{t+1}$, then solve $\mu^{1/0}$ and $\mu^0$ given the solutions of the ex ante value functions. To solve $v_{t-1}, v_t, \bar v_{t+1}$, we consider the differences $\phi(p_t) - \phi(p_{t-1})$, $\phi(p_{t-1}) - \phi(p_{t-2})$ and $\psi(p_t) - \psi(p_{t-1})$. We have

$$A_t \begin{bmatrix} \bar v_{t+1} \\ v_t \\ v_{t-1} \end{bmatrix} = b_t,$$

where $A_t$ is the $(3d_s) \times (3d_s)$ matrix, and $b_t$ is the $3d_s$-dimensional vector defined by

$$A_t = \begin{bmatrix} F^{1/0}_{t+1} & -\beta_{t-1} F^{1/0}_t & 0 \\ 0 & \beta_{t-1} F^{1/0}_t & -\beta_{t-2} F^{1/0}_{t-1} \\ -F^0_{t+1} & I + \beta_{t-1} F^0_t & -I \end{bmatrix} \quad\text{and}\quad b_t = \begin{bmatrix} \phi(p_t) - \phi(p_{t-1}) \\ \phi(p_{t-1}) - \phi(p_{t-2}) \\ \psi(p_t) - \psi(p_{t-1}) \end{bmatrix}. \tag{1.60}$$

Proposition 5 (Identification with time-invariant per period utilities and time-varying transition matrices). In addition to Assumptions 1-4, suppose that the discount factors are known, $T \ge 4$, and that the per period utility functions are time invariant. For $t = 2,\dots,T-1$, let the matrix $A_t$ and the vector $b_t$ be defined by equation (1.60). If $\operatorname{rank} A_t = 3d_s - 2$, the per period utility functions $\mu^{1/0}$ and $\mu^0$ are identified, and

$$\mu^{1/0} = \frac{1}{3}\left\{ \phi(p_t) + \phi(p_{t-1}) + \phi(p_{t-2}) - \begin{bmatrix} F^{1/0}_{t+1} & \beta_{t-1} F^{1/0}_t & \beta_{t-2} F^{1/0}_{t-1} \end{bmatrix} A_t^+ b_t \right\},$$

$$\mu^0 = \frac{1}{2} L \left\{ \begin{bmatrix} -F^0_{t+1} & (I_{d_s} - \beta_{t-1} F^0_t) & I_{d_s} \end{bmatrix} A_t^+ b_t - \psi(p_t) - \psi(p_{t-1}) \right\}.$$

Proof. See Appendix A.1.

Remark 4 (Extension to multinomial choices). The identification arguments can be extended to multinomial choices by using the general Hotz-Miller inversion formulas (Hotz and Miller, 1993). Suppose the choice set $\{0,1,\dots,J\}$ contains $J + 1$ alternatives.
By the Hotz-Miller formula, there exist $\{\phi_j : j = 1,\dots,J\}$ and $\psi$ such that

$$\begin{cases} v^j_t(S_t) - v^0_t(S_t) = \phi_j(p_t(S_t)), \\ v_t(S_t) - v^0_t(S_t) = \psi(p_t(S_t)), \end{cases}$$

where $p_t(S_t) = (P(D_t = 1 \mid S_t),\dots,P(D_t = J \mid S_t))^\top$. Equation (ID) becomes

$$\begin{cases} \phi_1(p_t(S_t)) = \mu^{1/0}_t(S_t) + \beta_t E^{1/0}_{t+1}[v_{t+1}(S_{t+1}) \mid S_t], & t = 1,\dots,T-1,\ \forall S_t \in \mathcal{S}, \\ \qquad\vdots \\ \phi_J(p_t(S_t)) = \mu^{J/0}_t(S_t) + \beta_t E^{J/0}_{t+1}[v_{t+1}(S_{t+1}) \mid S_t], & t = 1,\dots,T-1,\ \forall S_t \in \mathcal{S}, \\ \psi(p_t(S_t)) = v_t(S_t) - \mu^0_t(S_t) - \beta_t E^0_{t+1}[v_{t+1}(S_{t+1}) \mid S_t], & t = 2,\dots,T-1,\ \forall S_t \in \mathcal{S}. \end{cases}$$

Each alternative $j$ contributes $d_s(T-1)$ equations (associated with $\{\phi_j(p_t(S_t)) : t = 1,\dots,T-1, \text{ for all } S_t \in \mathcal{S}\}$); meanwhile the alternative $j$ brings $d_s(T-1)$ additional unknowns $\{\mu^{j/0}_t : t = 1,\dots,T-1\}$. So the degree of underidentification does not change as we include more alternatives. However, in the presence of the Exclusion Restriction, we have

$$\begin{cases} \phi_1(p_t(S_t)) = \mu^{1/0}_t(X_t) + \beta_t E^{1/0}_{t+1}[v_{t+1}(S_{t+1}) \mid S_t], & t = 1,\dots,T-1,\ \forall S_t \in \mathcal{S}, \\ \qquad\vdots \\ \phi_J(p_t(S_t)) = \mu^{J/0}_t(X_t) + \beta_t E^{J/0}_{t+1}[v_{t+1}(S_{t+1}) \mid S_t], & t = 1,\dots,T-1,\ \forall S_t \in \mathcal{S}, \\ \psi(p_t(S_t)) = v_t(S_t) - \mu^0_t(X_t) - \beta_t E^0_{t+1}[v_{t+1}(S_{t+1}) \mid S_t], & t = 2,\dots,T-1,\ \forall S_t \in \mathcal{S}. \end{cases}$$

Each alternative $j$ still contributes $d_s(T-1)$ new equations; meanwhile the alternative $j$ brings only $d_x(T-1)$ additional unknowns $\{\mu^{j/0}_t : t = 1,\dots,T-1\}$, because $\mu^{j/0}_t$ is now a $d_x$-dimensional vector. So more alternatives provide more information about the structural parameters. The exact identification results for multinomial choices are slightly different from the above propositions, but the general idea is similar.

Remark 5 (Rank conditions). The rank conditions in the above propositions are clearly important.
This remark shows that, to satisfy the rank conditions, it is necessary that the choice can change the transition matrix, and that the excluded variable $Z_t$ can affect the difference between the state transition probabilities under the two alternatives given the state variable $X_t$ that affects the per period utility functions. To be specific, consider the rank condition in Proposition 1. We have

$$\begin{aligned} \operatorname{rank} A &= \operatorname{rank}\left( \begin{bmatrix} \beta M F^{1/0} \\ M(I - \beta F^0) \end{bmatrix} \right) \\ &= \operatorname{rank}\bigl(M(I - \beta F^0)\bigr) + \operatorname{rank}\Bigl[ \bigl(I_{d_s} - P_{[M(I - \beta F^0)]^\top}\bigr) (\beta M F^{1/0})^\top \Bigr] \\ &= d_s - d_x + \operatorname{rank}\Bigl[ \bigl(I_{d_s} - P_{[M(I - \beta F^0)]^\top}\bigr) (\beta M F^{1/0})^\top \Bigr] \\ &= d_s - d_x + r. \end{aligned}$$

Here $P_{[M(I - \beta F^0)]^\top}$ is the projection matrix generated by the matrix $[M(I - \beta F^0)]^\top$. To satisfy the rank condition ($\operatorname{rank} A = d_s - 1$), we need $r = d_x - 1$. If there are many zero rows in $M F^{1/0}$, $r$ will be smaller than $d_x - 1$. Each row of $M F^{1/0}$ takes the form

$$\bigl(f^{1/0}(s_1 \mid x_i,z_j) - f^{1/0}(s_1 \mid x_i,z_{j+1}),\ \dots,\ f^{1/0}(s_{d_s} \mid x_i,z_j) - f^{1/0}(s_{d_s} \mid x_i,z_{j+1})\bigr).$$

Recall that $f^{1/0}(s_k \mid x_i,z_j) = f(s_k \mid x_i,z_j,1) - f(s_k \mid x_i,z_j,0)$. So the row will be zero if the choice does not change the transition matrix (hence $f^{1/0}(s \mid x,z) = 0$), or the excluded variable $Z$ does not affect the difference between the transition probabilities given the state variable $X$ that affects the per period utility functions (hence $f^{1/0}(s \mid x_i,z_j) - f^{1/0}(s \mid x_i,z_{j+1}) = 0$).

1.5 Estimation

All identification results in the previous section are constructive and follow from the linear system of equations (ID). The solution of the linear system has a closed form. Therefore, it is natural to estimate the identified structural parameters by replacing the population quantities in the closed-form solutions with their sample estimates. The estimation proceeds in two steps. In the first step, we estimate the CCP $\{p_t(S_t) : t = 1,\dots,T-1\}$ and the transition matrices $\{F^d_{t+1} : t = 1,\dots,T-1,\ d = 0,1\}$.
Let $\hat p_t(S_t)$ and $\hat F^d_{t+1}$ be the estimates of the CCP $p_t(S_t)$ and the transition matrix $F^d_{t+1}$ for each alternative $d$ and each period $t$. For a small state space $\mathcal{S}$, the estimator of the CCP $p_t(S_t)$ is simply the proportion of $D_t = 1$ in the data for each $S_t$. When the support of $S_t$ is large, a kernel estimator of $p_t(S_t) = E(D_t \mid S_t)$ might be preferable. Similarly, for a small state space $\mathcal{S}$, an estimator of $F^d_{t+1}$ is simply the empirical frequency table of the transitions from $S_t$ to $S_{t+1}$ given $D_t = d$. When the support of $S_t$ is large, a smoothed approach may be preferable to avoid the issue of empty cells; see Aitchison and Aitken (1976).

The second step is to estimate the structural parameters using the closed-form solutions of the linear system under different identifying restrictions. We focus on the case with the Exclusion Restriction and known discount factors (Proposition 2). Moreover, assume that the transition matrices are also known. It follows from Proposition 2 that for each $t = 2,\dots,T-1$, we have

$$\begin{aligned} \mu^{1/0}_t &= W(\phi(p_t) - F^{1/0}_{t+1} A_{t,l}^+ b_t), \\ \mu^{1/0}_{t-1} &= W(\phi(p_{t-1}) - \beta_{t-1} F^{1/0}_t A_{t,u}^+ b_t), \\ \mu^0_t &= WL(A_{t,u}^+ b_t - F^0_{t+1} A_{t,l}^+ b_t - \psi(p_t)). \end{aligned}$$

Then the estimators are

$$\begin{aligned} \hat\mu^{1/0}_t &= W(\phi(\hat p_t) - F^{1/0}_{t+1} A_{t,l}^+ \hat b_t), \\ \hat\mu^{1/0}_{t-1} &= W(\phi(\hat p_{t-1}) - \beta_{t-1} F^{1/0}_t A_{t,u}^+ \hat b_t), \\ \hat\mu^0_t &= WL(A_{t,u}^+ \hat b_t - F^0_{t+1} A_{t,l}^+ \hat b_t - \psi(\hat p_t)), \end{aligned} \tag{1.61}$$

where

$$\hat b_t = \begin{bmatrix} \beta_{t-1}^{-1} M\phi(\hat p_{t-1}) \\ M\phi(\hat p_t) \\ M\psi(\hat p_t) \end{bmatrix}.$$

Note that these estimators have a closed form, and their computation is therefore easy. Suppose

$$\sqrt{n}\left( \begin{bmatrix} \hat p_{t-1} \\ \hat p_t \end{bmatrix} - \begin{bmatrix} p_{t-1} \\ p_t \end{bmatrix} \right) \xrightarrow{d} N(0, \Sigma_{t-1,t}).$$

We have

$$\begin{aligned} \sqrt{n}(\hat\mu^{1/0}_t - \mu^{1/0}_t) &\xrightarrow{d} N\bigl(0,\ (G^{1/0}_t)^\top \Sigma_{t-1,t} (G^{1/0}_t)\bigr), \\ \sqrt{n}(\hat\mu^{1/0}_{t-1} - \mu^{1/0}_{t-1}) &\xrightarrow{d} N\bigl(0,\ (G^{1/0}_{t-1})^\top \Sigma_{t-1,t} (G^{1/0}_{t-1})\bigr), \\ \sqrt{n}(\hat\mu^0_t - \mu^0_t) &\xrightarrow{d} N\bigl(0,\ (G^0_t)^\top \Sigma_{t-1,t} (G^0_t)\bigr), \end{aligned}$$

by the delta method.
Here G 1=0 t " @ 1=0 t @p t1 @ 1=0 t @p t # ; G 1=0 t1 " @ 1=0 t1 @p t1 @ 1=0 t1 @p t # ; G 0 t @ 0 t @p t1 @ 0 t @p t ; are gradient matrices, where @ 1=0 t @p t1 =WF 1=0 t+1 A + t;l r pt1 b t and @ 1=0 t @p t =Wr(p t )WF 1=0 t+1 A + t;l r pt b t ; @ 1=0 t1 @p t1 =Wr(p t1 ) t1 WF 1=0 t A + t;h r pt1 b t and @ 1=0 t1 @p t = t1 WF 1=0 t A + t;h r pt b t ; @ 0 t @p t1 =WLA + t;h r pt1 b t WLF 0 t+1 A + t;l r pt1 b t ; @ 0 t @p t =WLA + t;h r pt b t WLF 0 t+1 A + t;l r pt b t WLr (p t ); 44 1.5. Estimation with r(p) = 2 6 6 6 6 4 @(p(s1)) @p(s1) . . . @(p(s ds )) @p(s ds ) 3 7 7 7 7 5 and r (p) = 2 6 6 6 6 4 @ (p(s1)) @p(s1) . . . @ (p(s ds )) @p(s ds ) 3 7 7 7 7 5 ; r pt1 b t = 2 6 6 6 6 4 1 t1 Mr(p t1 ) 0 0 3 7 7 7 7 5 and r pt b t = 2 6 6 6 6 4 0 Mr(p t ) Mr (p t ) 3 7 7 7 7 5 : Remark 6 (Data requirement). To implement the estimators in equation (1.61), we need ^ p t1 , ^ p t , F 0 t , F 1 t , F 0 t+1 andF 1 t+1 . So if the transition matrices are unknown, we need panel data that cover the decision process over periods t 1, t and t + 1. If F d t =F d t+1 for both d = 0;1, or simply if F 0 t+1 and F 1 t+1 are known, we only need panel data to cover periods t 1 and t to estimate the CCP ^ p t1 and ^ p t . If the agent’s dynamic programming problem is stationary, and the transition matrices associated with both alternatives are known, we can estimate the per period utilities 0 and 1 even from the cross-sectional data using the closed-form formulas in Proposition 1. 1.5.1 Minimum distance estimation with parametric specification of the per period utility functions In this subsection, assume that the per period utility functions are parametrically specified: 1=0 t (X t ) = 1=0 t (X t ; t ) and 0 t (X t ) = 0 t (X t ; t ): For example, 1=0 t (X t ; t ) = X | t t and 0 t (X t ; t ) = X | t t . 
We now have the moment conditions about ( | t1 ; | t ; | t ): 1=0 t ( t )W ((p t )F 1=0 t+1 A + t;l b t ) = 0; 1=0 t1 ( t1 )W ((p t1 ) t1 F 1=0 t A + t;h b t ) = 0; 0 t ( t )WL(A + t;h b t F 0 t+1 A + t;l b t (p t )) = 0: Here 1=0 t ( t ) = ( 1=0 t (x 1 ; t );:::; 1=0 t (x dx ; t )) | ; 0 t ( t ) = ( 0 t (x 1 ; t );:::; 0 t (x dx ; t )) | : 45 1.6. Numerical Studies Table 1.1: Estimation of Period Utility Functions: d x = 3;d z = 4 T = 3 T = 4 Parameters Bias Sd Bias Sd 1=0 1 (x 1 ) < 1e 2 0.089 < 1e 2 0.082 1=0 1 (x 2 ) < 1e 3 0.097 < 1e 2 0.085 1=0 1 (x 3 ) < 1e 2 0.087 < 1e 2 0.088 1=0 2 (x 1 ) < 1e 2 0.238 < 1e 3 0.093 1=0 2 (x 2 ) < 1e 2 0.237 < 1e 3 0.094 1=0 2 (x 3 ) < 1e 2 0.191 < 1e 2 0.097 0 2 (x 1 ) -1.758 0.211 -1.752 0.138 0 2 (x 2 ) -1.752 0.210 -1.749 0.139 0 2 (x 3 ) -1.752 0.195 -1.751 0.146 Notes: (i) The terms "<1e2” and "<1e3” within the "Bias" column mean that the absolute value of the bias is smaller than 1e2 and 1e3, respectively. (ii) The simulated panel data have 5;000 cross-sectional observations. 1.6 Numerical Studies In the numerical experiments, we consider a singleX t and a single excluded variableZ t , and letS t = (X t ;Z t ). The decision horizon is T = 10. The structural period utility functions are 8 > < > : 1 t (X t ) =X t ; 0 t (X t ) = 1 2 + X t 3 ; t = 1;:::;T = 10; the discount factors t are 0:8 for all periods, and the utility shocks (" 0 t ;" 1 t ) are generated from type 1 extreme value distribution. The supportXfx 1 ;:::x dx g of X t are the d x cutting points that split the interval [0; 2] into d x 1 equally spaced subintervals. And the supportZfz 1 ;:::;z dz g of Z t are the cuttings points that split [0; 2] into d z 1 equally spaced subintervals. Let the state spaceSXZ, and d s =d x d z . The observable states S t = (X t ;Z t ) follows a homogenous controlled first-order Markov chain. 
Let $F^d$ be the time-invariant $d_s \times d_s$ transition matrix describing the transition probability law from $S_t$ to $S_{t+1}$ given the discrete choice $Y_t = d$. The transition matrix $F^d$ is randomly generated subject to the sparsity restriction that at most $m_s$ states can be reached in the next period. The data for estimation are $\{(x_{it}, z_{it}, y_{it}) : i = 1, \dots, n;\ t = 1, \dots, 4\}$, so we observe only the first four periods of the dynamic decision process. Though the structural parameters in the data generating process are time invariant, we do not use this condition in estimation. The cross-sectional sample size $n$ is 5,000 throughout the numerical studies.

Table 1.2: Estimation of Period Utility Functions: $d_x = 30$, $d_z = 4$

                                                        Direct             Smooth
Parameter                                            Bias      Sd       Bias      Sd
Utility difference, period 1, at x_{0.25}           -0.011    0.305    < 1e-2    0.116
Utility difference, period 1, at x_{0.50}           < 1e-2    0.299    < 1e-2    0.107
Utility difference, period 1, at x_{0.75}            0.012    0.305     0.015    0.114
Utility difference, period 2, at x_{0.25}           < 1e-2    0.333    < 1e-2    0.128
Utility difference, period 2, at x_{0.50}           < 1e-2    0.322    < 1e-2    0.113
Utility difference, period 2, at x_{0.75}            0.015    0.333     0.017    0.123
Utility of alternative 0, period 2, at x_{0.25}     -1.745    0.835    -1.738    0.288
Utility of alternative 0, period 2, at x_{0.50}     -1.729    0.805    -1.735    0.247
Utility of alternative 0, period 2, at x_{0.75}     -1.732    0.838    -1.729    0.270

Notes: (i) The term "< 1e-2" within the "Bias" column means that the absolute value of the bias is smaller than 1e-2. The terms $x_{0.25}$, $x_{0.50}$ and $x_{0.75}$ are the 25%, 50% and 75% quantiles of the state variable $X$, respectively. (ii) The simulated panel data have 5,000 cross-sectional observations.

[Figure 1.1: Confidence Interval (95%) of Period Utility Functions: $d_x = 30$, $d_z = 4$. Two panels plotted against X: "Period Utility Difference" and "Period Utility 0". Legend: True value; Direct estimation; Estimation using smooth conditions.]

In the first setting, we let $d_x = 3$, $d_z = 4$ and $m_s = 3$.
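A transition matrix obeying the sparsity restriction described above could be drawn as in the sketch below. The text does not say how the random draws were made, so the specific scheme here (a uniformly chosen number of reachable states per row, with Dirichlet weights) and the function name `random_sparse_transition` are assumptions for illustration only.

```python
import numpy as np


def random_sparse_transition(n_states, max_reachable, rng):
    """Draw a random row-stochastic matrix with sparse rows (illustrative).

    Each row has at most `max_reachable` nonzero entries, i.e. from any
    current state at most `max_reachable` states can be reached next period.
    The drawing scheme is an assumption, not the one used in the thesis.
    """
    F = np.zeros((n_states, n_states))
    for s in range(n_states):
        # Number of reachable next-period states for this row (1..max_reachable).
        k = rng.integers(1, max_reachable + 1)
        reachable = rng.choice(n_states, size=k, replace=False)
        # Dirichlet weights are positive and sum to one.
        F[s, reachable] = rng.dirichlet(np.ones(k))
    return F


# Example in the spirit of the first setting: d_s = d_x * d_z = 12, m_s = 3.
rng = np.random.default_rng(0)
F0 = random_sparse_transition(12, 3, rng)
F1 = random_sparse_transition(12, 3, rng)
```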
We randomly generated 50 pairs of transition matrices $(F^0, F^1)$, and for each pair we ran the closed-form linear estimation in equation (1.61) with 300 replications, giving 15,000 replications in total. In the estimation, we assume that both the transition matrices and the discount factors are known. The estimation results are summarized in Table 1.1. Notice that for the identified period utility difference, the bias is nearly zero. For the partially identified period utility function of alternative 0 in period 2, however, there is a bias that is constant across the different state values. This is consistent with what our theory predicts. It is also notable that the standard deviations decrease substantially when we increase the sampling length from 3 to 4 periods. This corresponds to our claim in section §2.4 that constructing $v_{t+1,+}$ from $A^{+}_{t+1,h} b_{t+1}$ rather than from $A^{+}_{t,l} b_t$ can improve the estimation accuracy. In the second setting, we let $d_x = 30$, $d_z = 4$ and $m_s = 3$. Again we randomly generated 50 pairs of transition matrices and ran 300 replications for each pair. For this setting, we only simulate panel data with a length of 4 periods. Two estimation procedures are used. The first uses the estimators in equation (1.61) directly. The second refines the initial estimates from the first approach by a smoothing approximation. To compare their performance, we plot the 95% confidence intervals of the period utility difference in period 2 and of the period utility function of alternative 0 in period 2 for both methods (Figure 1.1). Clearly, when the state space is large, it is advantageous to use the smooth conditions to improve the raw estimates of the period utility functions. In Table 1.2, we also report the bias and standard deviation of the period utility function estimates evaluated at the 25%, 50% and 75% quantiles of the support $\mathcal{X}$.

1.7 Concluding Remarks

The identification and estimation of DPDC models are considered to be complicated and numerically difficult.
This paper shows that the identification of DPDC models is in fact equivalent to the identification of a linear GMM system, so the identification and estimation of DPDC models become easy to address. We show how to identify DPDC models under a variety of restrictions. In particular, we show how to identify the DPDC model without normalizing the period utility function of any alternative. This case is particularly important because we show that normalizing period utility functions can usually bias counterfactual policy predictions. Owing to the equivalence to a linear GMM system, we show how to estimate DPDC models using a linear estimation approach, without using any terminal conditions or assuming that the dynamic programming problem is stationary. The implementation of our estimator does not involve any numerical optimization or iteration. There are two practically important extensions of this paper. First, one can extend the paper by incorporating unobservable heterogeneity. Several papers, including Kasahara and Shimotsu (2009) and Hu and Shum (2012), have studied the identification of the CCP when there is unobservable heterogeneity, such as the discrete types in Kasahara and Shimotsu (2009). Since our identification of per period utility functions depends only on the state transition matrix and the CCP, one can then identify the type-specific per period utility functions by using the identified type-specific CCP (and state transition distributions). Second, most papers in the DPDC literature study identification under the assumption that the distribution of the utility shocks is known. Depending on the sensitivity of the parameter estimates to the specification of the error distribution, allowing the error distribution to be unknown could be practically important. Suppose $\mathcal{G}$ is a set of possible distributions of the utility shocks.
Each error distribution $G \in \mathcal{G}$ defines the pair of functionals used in our identification arguments. In addition, each such pair of functionals gives rise to a set of formulas for the identified structural parameters according to Proposition 2 or the other propositions. Therefore, we can explicitly characterize the identified set of the per period utility functions.

Chapter 2 Econometrics of Buy-Price English Auctions

This chapter provides a framework for the empirical study of buy-price English auctions. Within the conditionally independent private value paradigm, I show how to identify the underlying distribution of bidders’ valuations from the observed bids and the observed ratio of the number of auctions that ended at the buy price to the number of auctions in the whole sample. The paper then develops a smooth monotone polynomial spline (PS) estimator for the latent distribution of private valuations. Under mild conditions, both $L_2$ and uniform convergence rates of the estimator are obtained. The pointwise asymptotic normality of the proposed estimator is also established. Based on the contraction mapping theorem, I propose a fast iterative algorithm to implement the proposed PS estimator.

2.1 Introduction

A buy-price (BP) English auction is an English auction for a single object in which bidders have the option to guarantee a purchase at a seller-specified buy price at any time. Any bidder can immediately win the object by bidding the buy price. BP English auctions are very popular among online auctions. Major electronic commerce websites, such as Yahoo! and eBay, provide sellers with the option of setting up a BP English auction. Given the popularity of BP English auctions, there have been several theoretical articles (Budish and Takeyama, 2001; Mathews, 2004; Reynolds and Wooders, 2009; Kirkegaard and Overgaard, 2008; Hidvegi et al., 2006) addressing the superiority of BP English auctions over standard English auctions.
Budish and Takeyama (2001) address this issue in a simple two-bidder, two-value model. They conclude that with a properly set buy price, a seller facing risk-neutral buyers earns the same expected profit as in a standard English auction, whereas a seller facing risk-averse buyers earns a higher expected profit. Hidvegi et al. (2006) generalize Budish and Takeyama’s results to a general setting of multiple bidders with an arbitrary continuous distribution of valuations. Kirkegaard and Overgaard (2008) provide additional justification in a dynamic model. Despite the widespread use of BP English auctions in practice and the theoretical literature about them, empirical examination of the theory has lagged behind the theoretical work for years. To the best of the author’s knowledge, no formal econometric theory has been developed for analyzing BP English auctions, in spite of the abundant econometric literature on standard auctions. The existing econometric methods for auctions (Paarsch, 1992; Laffont et al., 1995; Guerre et al., 2000; Haile and Tamer, 2003; Hong and Shum, 2003) cannot be applied to BP English auctions because the equilibrium in a BP English auction differs substantially from the equilibrium in a standard English auction. Specifically, Hidvegi et al. (2006) show that in the equilibrium of a BP English auction, a bidder with a valuation above the buy price bids the buy price once the auction clock reaches her strategically chosen threshold price. Such a “threshold” strategy leads to censoring in the observed bids, which complicates the identification and estimation of the underlying valuation distribution. In this article, I develop an econometric framework for the empirical study of BP English auctions. I begin with a brief description of Bayesian-Nash equilibrium bidding strategies in BP English auctions.
These equilibrium strategies are then exploited to identify the distribution of bidders’ valuations. Moreover, a nonparametric approach is proposed to deal with the estimation issue. Polynomial spline estimators are developed to estimate the unknown valuation distribution given the observed bids and number of bidders in BP English auctions. An algorithm based on the contraction mapping theorem is designed to implement the proposed estimator. I derive the L 2 as well as L 1 convergence rates of the proposed estimators. The pointwise asymptotic normalities of the proposed spline estimators are also established. The paper proceeds as follows. In 2.2, I describe the framework of the buy-price English auctions studied in this article. Section 2.3 discusses the identification of this auction model. The nonparametric estimators are constructed in Section 2.4 to estimate the latent valuation distribution. Section 2.5 reports some simulation studies of my estimators, whereas section 2.6 offers concluding remarks. Finally, all mathematical derivations have been relegated to the appendix B. 2.2 Auction Setup Throughout I represent random variables in upper case and their realizations in lower case. I consider a BP English auction where a single indivisible good is sold to one of M2 2;:::;M bidders with the seller specified buy price B. The realizations of M and B are both observable to econometricians. The valuation 51 2.2. Auction Setup V i of bidder i2f1;:::;mg is specified as V i =B +W i ; (2.1) with B; W 1 ;:::; W m 0 mutually independent. In view of (2:1), I add an auction-specific covariate, buy price B, into bidder i’s valuation V i . As a consequence, bidders’ valuations are not independent, while they are independent conditional on the buy priceB. Such a specification follows from Li et al. (2000). 
It is reasonable to specify a bidder’s valuation as a monotone increasing function of the buy price, as in (2.1), because the buy price usually measures the market value of the good for sale. $W_i$ is drawn independently from an identical distribution $F_W(\cdot)$ whose support is $[\underline{w}, \overline{w}]$. Assume that the support of $B$ is a compact subset $S_B$ of $\mathbb{R}_+$, and that $\inf_{B \in S_B} B \ge -\underline{w}$ to ensure $V_i \ge 0$. Assumption B.1 in Appendix B is made for $F_W(\cdot)$; it basically requires $F_W(\cdot)$ to be a “regular” continuous distribution function. Similarly, I define $\underline{v} = \underline{w} + b$ and $\overline{v} = \overline{w} + b$, so that the support of $V_i$ is $[\underline{v}, \overline{v}]$ conditional on $B = b$. Conditional on $B$, the valuations $V_1, \dots, V_m$ are independent and identically distributed (i.i.d.). Let $F_V(\cdot \mid B = b)$ be the conditional distribution of $V_i$ given $B = b$, which can be rewritten as

$F_V(v \mid B = b) = F_W(v - b), \quad v \in [\underline{v}, \overline{v}].$ (2.2)

Throughout the course of an auction, bidders know their own valuations but not their opponents’. This can be a restrictive assumption, since equilibrium learning is a notable feature of ascending auctions, as shown in Hong and Shum (2003). A reserve price $r$ is designated by the seller before the auction begins. Assume $r \le \underline{v}$, so that all potential bidders actually participate. Let $u(x)$ denote the bidder’s von Neumann-Morgenstern utility function, where $x$ is the difference between the buyer’s valuation and his payment if he wins, and zero otherwise. The details of the auction are specified as a “modified English clock auction” in Hidvegi et al. (2006). The rules of this kind of auction are summarized as follows:

1. The seller specifies the reserve price $r$ and the buy price $b$ before the auction begins. At the beginning of the auction, the auction clock is set at the reserve price. Two buttons, a “bid” button and a “buy” button, are available to each bidder. By holding the “bid” button, a bidder signals his willingness to buy the good at the current clock price. Once a bidder releases the “bid” button, the bidder quits the auction and cannot re-enter it. Throughout the auction, bidders do not know how many opponents are left in the auction. At any time, a bidder can take the good at the buy price by pressing the “buy” button.

2. At any point in the auction, if one “buy” button is pressed, the auction ends, and the bidder who pressed the “buy” button wins the object and pays the buy price. If more than one “buy” button is pressed, the winner is randomly chosen from those bidders who pressed the “buy” button. If no “buy” button is pressed and only one “bid” button is pressed, the bidder pressing the “bid” button wins the auction and pays the current clock price. If multiple “bid” buttons are pressed but no “buy” button is pressed, the auction clock ascends. If neither a “bid” nor a “buy” button is pressed, the auction ends without a sale.

Hidvegi et al. (2006) show that there are four possible pure strategies that a bidder can follow in a “modified English clock auction”: (1) “traditional”: bid up to his valuation; (2) “threshold”: keep bidding until winning or until his threshold price is reached, and once the auction clock reaches his threshold price, bid the buy price immediately; (3) “conditional”: bid, but use the buy price immediately if at least one other bidder bids at or above the reserve price; (4) “unconditional”: bid the buy price immediately with no conditions. Theorem 1 of Hidvegi et al. (2006) is used to define the unique Bayesian Nash equilibrium in a BP English auction as described above.

Theorem 1. Under the rules of the “modified English clock auction” and Assumption B.1, an $m$-bidder buy-price English auction with the buy price $b$ has a unique Bayesian Nash equilibrium determined by constants $\tilde{v}$ and $\hat{v}$ and a function $t(v, b)$, with $b < \tilde{v} \le \hat{v} \le \overline{v}$, such that all bidders with valuation $v$ have the following strategies:
1. Use the traditional strategy if $v < b$;
2.
Use the threshold strategy with a threshold price t v; b 2 r; b ifbv<e v, wheret v; b is a function defined by the differential equation u vt v; b u vb 1 + F m1 V vjB =b @ 1 F m1 V jB =b t v; b = 0; (2.3) @ 1 t b; b =1; t b; b =b; 3. Use the conditional strategy ife vv<b v; 4. Use the unconditional strategy if vb v. The threshold price t v; b is strictly decreasing and continuous in v for v2 b;e v . All bidders with vb follow the threshold strategy, if and only if lim v!v t v; b =r. Theorem 1 says that there is a unique Bayesian Nash equilibrium for the BP English auctions specified in the 53 2.3. Identification of the Distribution of Valuations “modified English clock auction”. When the bidder has constant absolute risk aversion (CARA) utility, u a (x) = 8 > > < > > : a 1 f1 exp (ax)g; a> 0; x; a = 0; the threshold price t v; b is determined by the equation, v t(v;b) u a bx dF m1 V xjB =b = 0; (2.4) by substituting u a (x) in (2:3). Without loss of generality, I assume bidders have the CARA utility in the remainder of the present paper. For the sake of simplicity, I make the following assumption, which makes the threshold strategy the only choice for bidders whose valuations are above b. Assumption 5. Let t v; b be the threshold price defined in (2:3) and suppose the reserve price r is low enough that lim v!v t v; b r. When the reserve price r is low enough, Assumption 5 can be satisfied naturally. Under Assumption 5, Theorem 1 implies that the bidder with vb will choose the threshold strategy only. For i = 1;:::;m, let B i denote bidder i’s bid, and the random variables B 1:m ;:::; B m:m represent the order statistics of the bids B 1 ;:::;B m , where B k:m denotes the k-th order statistic among the m bids. Similarly, let V 1:m ;:::; V m:m denote the ordered valuations, and let F k:m V jB =b denote the conditional distribution of V k:m given B =b. 
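To illustrate how the threshold price behaves, the sketch below solves for $t(v, b)$ numerically under CARA utility. It assumes that equation (2.4) reads $\int_{t(v,b)}^{v} u_a(b - x)\, dF_V^{m-1}(x \mid B = b) = 0$ with $u_a(x) = a^{-1}\{1 - \exp(-a x)\}$ for $a > 0$ and $u_a(x) = x$ for $a = 0$. The function names, the bisection solver, the midpoint Riemann-Stieltjes sum, and the uniform valuation distribution used in the example are illustrative assumptions, not part of the original analysis.

```python
import math


def cara_utility(x, a):
    """CARA utility u_a(x); a = 0 is the risk-neutral case."""
    if a == 0:
        return x
    return (1.0 - math.exp(-a * x)) / a


def indifference_gap(t, v, b, a, G, n=2000):
    """Midpoint Riemann-Stieltjes sum for the integral of u_a(b - x) dG(x)
    over [t, v], where G(x) stands for F_V(x | B = b) ** (m - 1)."""
    step = (v - t) / n
    total = 0.0
    for i in range(n):
        lo = t + i * step
        hi = lo + step
        total += cara_utility(b - 0.5 * (lo + hi), a) * (G(hi) - G(lo))
    return total


def threshold_price(v, b, a, G, r=0.0, tol=1e-10):
    """Solve indifference_gap(t) = 0 for t in [r, b] by bisection.

    Requires v > b. The gap is negative at t = b and increases as t falls,
    so a root exists whenever the gap at the reserve price r is nonnegative.
    """
    if v <= b:
        raise ValueError("threshold strategy applies only when v > b")
    lo, hi = r, b
    if indifference_gap(lo, v, b, a, G) < 0:
        raise ValueError("no threshold in [r, b]; reserve price too high")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if indifference_gap(mid, v, b, a, G) > 0:
            lo = mid  # gap still positive: the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniform_G(x, m=2):
    """Hypothetical example: valuations uniform on [0, 1], m bidders."""
    return min(max(x, 0.0), 1.0) ** (m - 1)
```

With two bidders and valuations uniform on [0, 1], the risk-neutral case ($a = 0$) has the closed form $t = 2b - v$, which gives a simple check on the solver: the threshold equals $b$ at $v = b$ and falls one for one as $v$ rises.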
2.3 Identification of the Distribution of Valuations

In this section, I discuss the identification of the conditional distribution of valuations given the buy price, $F_V(\cdot \mid B = b)$. A notable feature of BP English auctions is that, with probability 1, an auction terminates in one of two ways. The first is that the auction ends with no bidder using the buy price option. This happens if no bidder’s valuation is above the buy price, or if the auction clock does not reach any bidder’s threshold price before the end of the auction. Auctions of this kind are referred to as Type I BP English auctions in the sequel. The second is that the auction ends with one bidder exercising the buy price option; auctions of this kind are referred to as Type II BP English auctions. Note that as long as $r \le \underline{v}$, the probability that the auction ends without a sale is zero. In addition, because the threshold price $t(V, b)$ is strictly decreasing and continuous in $V \in [b, \overline{v}]$, the probability that the auction ends with more than one bidder pressing the “buy” button is also zero under Assumptions 5 and B.1. The distinction between these two possible endings is necessary because the methods for identifying and estimating the valuation distribution in the two cases differ substantially. To make the distinction clear, I add subscripts “B” and “E” as necessary to the variables previously defined, where “B” and “E” denote the cases in which the auction ends with and without the buy price option being exercised, respectively. Under the rules of the “modified English clock auction”, I have the following two lemmas.

Lemma 1. Let $B^{m:m}_E$ be the transaction price of a Type I BP English auction, and let $V^{(m-1):m}_E$ be the $(m-1)$-th order statistic among the $m$ valuations $V_{1,E}, \dots, V_{m,E}$. Under Assumptions 5 and B.1, $B^{m:m}_E = V^{(m-1):m}_E$ and $V^{(m-1):m}_E < b$.

Lemma 2.
Let B (m1):m B be the (m 1)-th order statistic among the m bids B 1;B ;:::; B m;B in a Type II BP English auction, and V m:m B be the maximum of the valuations V 1;B ;:::; V m;B . Under Assumption 5 and B.1, B (m1):m B =t V m:m B ; b , and V m:m B b. Lemma 1 commonly appears in the literature about first-price or ascending auctions, e.g., Guerre et al. (2000), and Athey and Haile (2002). Lemma 2 is kind of “tautology” of the definition of the “threshold” strategy. By the “threshold” strategy, a bidder with the valuation above the buy price would bid at the buy price once the auction clock reaches his threshold price. Conversely, the highest or the last bid before the emergence of the buy price, i.e. B (m1):m B , is the threshold price of the winner whose valuation is V m:m B b. Remark 7. Lemma 1 and 2 hold under the rules of the “modified English clock auction”. Possible deviations from the settings can undermine these two lemmas. For example, the jump bidding behavior in English auctions can destroy Lemma 1 as pointed out by Haile and Tamer (2003). Actually, the jump bidding also makes Lemma 2 untenable. This issue will be pursued at the end of this article. Hereafter, I set the conditioning variable B =b as default, and do not specify it anymore when there is no abuse of notations. To identify F V vjB =b for v2 [v; v], note the following facts F V vjV <b = F V vjB =b F V bjB =b ; v2 v; b ; therefore, F V vjB =b = b V F V vjV <b ; v2 v; b ; (2.5) where b V :=F V bjB =b . Similarly, F V vjV b = F V vjB =b b V 1 b V ; v2 b; v ; 55 2.3. Identification of the Distribution of Valuations which means F V vjB =b = 1 b V F V vjV b + b V ; v2 b; v : (2.6) It therefore seems plausible that to identify F V jB =b on its whole support, one can first identify the two conditional distributions F V jV <b and F V jV b and the probability b V . Then combine them together through (2:5) and (2:6). The development of the identification of F V jB =b thus proceeds in three steps. 
First, I show that F V jV <b can be identified from the transaction price B m:m E in Type I BP English auctions (Theorem 2). Then, I show the probability b V can be identified from the ratio between the numbers of Type I to Type II BP English auctions in collected auction data provided that F V jV <b and F V jV b are known (Theorem 3). Finally, I show how to identify F V jV b , and F V vjB =b for v2 [v; v] (Theorem 4). Theorem 2. Let the valuation be generated by (2:1) and suppose Assumption 5 and B.1 hold. F V jV <b and F W (jW < 0) can be identified from B m:m E . Theorem 2 says that one can identify F V jV <b from B m:m E . This result is quite standard in the literature about the first-price or ascending auctions. However, B m:m E can inform us only a part of F V jB =b . Theorem 3 gives a link between b V and the probability that a generic BP English auction ends without using the buy price option. Theorem 3. Let the valuation be generated by (2:1) and suppose Assumption 5 and B.1 hold. Let m E denote the probability that an m-bidder buy-price English auction ends without using the buy price option. 1. With probability 1, an m-bidder buy-price English auction would end without using the buy price option if and only if one of the following two events happens: the first is that all bidders’ valuations are below the buy price; and the second is that V m:m b and t (V m:m )>V (m1):m . 2. m E can be written as follows, m E = b V m +E V m:m jV m:m b 0 @ " F V t (v)jB =b F V vjB =b # m1 1 A n 1 b V m o = b V m +E V m:m jV m:m b 0 B @ 2 4 b V F V t (v)jV <b 1 b V F V vjV b + b V 3 5 m1 1 C A n 1 b V m o : (2.7) 56 2.3. Identification of the Distribution of Valuations A remarkable fact about m E is that it can be easily estimated. Suppose we have observations of n m-bidders BP English auctions with the valuation generated by (2:1) with a common F W (), then the event that whether or not a BP English auction ends without using the buy price option, i.e. 
turning out to be a Type I BP English auction, can be viewed as a Bernoulli trial with the probability m E of being a Type I BP English auction. Thus, m E can be estimated from the ratio of the number of Type I to Type II BP English auctions among the observed n auctions. The identification of F V vjV b is based on (2:4), which can be transformed into the following form via the rule of integration by parts, F m1 V vjB =b = F m1 V t (v)jB =b " u a bt (v) u a bv # + v t(v) ( u 0 a bx u a bv ) F m1 V xjB =b dx; vb: (2.8) Let e F V jV b be an arbitrary specification of the conditional valuation distribution F V jV b . In view of Theorem 3, one hase b V which is the solution to the following equation, m E = e b V m +E e F V m:m (jV m:m b) 0 B @ 2 4 e b V F V t (v)jV <b 1e b V e F V vjV b +e b V 3 5 m1 1 C A n 1 e b V m o ; (2.9) where e F V m:m jV m:m b equals to n e F V jV b o m by the definition the distributions of order statistics. By (2:5) and (2:6), e F V vjB =b = 8 > > < > > : e b V F V vjV <b ; v<b; 1e b V e F V vjV b +e b V ; vb; (2.10) where F V vjV <b is identified from B m:m E following from Theorem 2. Then e F V vjB =b in (2:10) gives a specification ofF V jB =b on its whole support. By definition, the true conditional valuation distribution F V jB =b must satisfy (2:8), and I will show there is a unique solution to (2:8). As a consequence, if we can find a e F V jB =b satisfying (2:8) usingB m:m E ,B (m1):m B and m E ,F V jB =b is said to be identified from B m:m E , B (m1):m B and m E . Theorem 4 below establishes this possibility. Instead of stating Theorem 4 immediately, a sketch of its proof beforehand can be beneficial for understanding it. Moreover, the proof itself would serve as a motivation for designing an iterative algorithm for estimating F V jB =b in the next section. A remarkable property of (2:8) is that it suggests an updating rule: given aF V vjB =b ,v2 [v; v], as an 57 2.3. 
Identification of the Distribution of Valuations input in the right-hand-side (RHS) of (2:8), we have an output F V vjB =b , vb, from the left-hand-side (LHS) of (2:8). It therefore seems plausible that given an initial guess e F V jV b of F V jV b , one can iteratively use (2:8), (2:9) and (2:10) to update e F V jV b until convergence if there is a convergence. Theorem 4, the main result of this section, verifies this conjecture. I briefly describe the iteration process below. The following descriptions serve only illustrative purpose, and the formal algorithm can be found in Algorithm 1 of section §2.4. Step 1. Let e F V jV b be the guess of F V jV b ,e b V and e F V jB =b be defined by (2:9) and (2:10), respectively. Step 2. Lemma 2 says B (m1):m B =t V m:m B ; B =b , and let V m:m B =t 1 B (m1):m B ; B =b . Compute e V m:m B from the following equation e F m1 V e V m:m B jB =b = e F m1 V B (m1):m B jB =b 8 < : u a bB (m1):m B u a b e V m:m B 9 = ; + e V m:m B B m:m B 8 < : u 0 a bx u a b e V m:m B 9 = ; e F m1 V xjB =b dx; e V m:m B b: Step 3. Update F V vjB =b for vb with F m1 V e V m:m B jB =b = e F m1 V B (m1):m B jB =b 8 < : u a bB (m1):m B u a b e V m:m B 9 = ; + e V m:m B B m:m B 8 < : u 0 a bx u a b e V m:m B 9 = ; e F m1 V xjB =b dx; e V m:m B b; where F V jB =b denotes the updated version of F V jB =b . Denote F V jV b the updated version of e F V jV b , which is defined by F V jV b = F V e V m:m B jB =b e b V 1e b V : Step 4. If the convergence has been reached according to certain criterion, stop the iteration. Otherwise, set e F V jV b = F V jV b , and go back to Step 1. If this iteration process converges, and the limit point is unique, then the limit point must satisfy (2:8), and it isF V jB =b by definition. So,F V jB =b can be identified. I end this section with the following theorem which verifies the existence of the unique limit point of the algorithm and establishes the identification of F V jB =b . 58 2.4. Estimation Theorem 4. 
Let the valuation be generated by (2:1) and suppose Assumption 5 and B.1 hold. 1. There is a unique F V jB =b satisfying (2:8); 2. F V jV b and F W (jW 0) can be identified from B m:m E , B (m1):m B and m E ; 3. F V jB =b and F W () can be identified from B m:m E , B (m1):m B and m E on their whole supports. 2.4 Estimation Suppose now that the researcher observes bids in m j -bidder, m j 2 2;:::;M , BP English auctions t = 1;:::;T j . I will add subscript t and j as necessary to the variables previously defined. For example, denote b j t the buy-price of the t-th m j -bidder BP English auction. In each auction, bidders draw their valuations independently from the distribution F V jB =b j t which originates from F W () according to (2:1). Since the estimation of F V jB =b is identical to the estimation of F W () by (2:2), I will restrict attention to F W () in the sequel. My estimation of F W () is based on the identification results obtained in the previous section. In analogy with the identification approach, the development of estimation proceeds in two steps. I first develop a polynomial spline (PS) estimator for F W (jW < 0). Based on the estimate of F W (jW < 0), I then show how to estimate F W (jW 0) and b V simultaneously with an iterative algorithm. The algorithm returns a PS estimator for F W (jW 0) and a generalized method of moments (GMM) estimator forF W (0). With the estimates ofF W (jW < 0),F W (jW 0) andF W (0),F W () itself can be estimated on its whole support. 2.4.1 A PS Estimator for F W (jW < 0) Theorem 2 says F W (jW < 0) can be identified from the transaction price B m:m E in Type I BP English auctions. For j = 1;:::;J, letB E j = n b mj :mj E;t o T E j t=1 be the observations on B mj :mj E ,B E =[ J j=1 B E j , T E = P J j=1 T E j , andB E j = n b j E;t o T E j t=1 be the observations on the buy price of Type Im j -bidder BP English auctions. 
By Lemma 1, $b^{m_j:m_j}_{E,t}$ equals $v^{(m_j-1):m_j}_{E,t}$, the $(m_j-1)$-th order statistic of the $m_j$ valuations in the $t$-th Type I $m_j$-bidder BP English auction. To save space, define

$$F^{(m_j-1):m_j}_W(w) = P\left(W^{(m_j-1):m_j} \le w \mid W^{(m_j-1):m_j} < 0\right), \quad w < 0; \qquad F^-_W(\cdot) = F_W(\cdot \mid W < 0).$$

In this subsection, I develop a PS estimator for $F^-_W(\cdot)$ based on the observations in $\mathcal{B}_E$. Before presenting the estimation method, I point out the over-identification of $F^-_W(\cdot)$ that arises with multiple transaction prices $B^{m_1:m_1}_E, \dots, B^{m_J:m_J}_E$, which motivates my approach. Given the transaction prices in $\mathcal{B}^E_j$ and the corresponding buy prices, one can transform $b^{m_j:m_j}_{E,t}$ into

$$w^{(m_j-1):m_j}_{E,t} := b^{m_j:m_j}_{E,t} - b^{j}_{E,t} = v^{(m_j-1):m_j}_{E,t} - b^{j}_{E,t}, \qquad w^{(m_j-1):m_j}_{E,t} < 0.$$

Denote $\mathcal{W}^E_j = \{w^{(m_j-1):m_j}_{E,t}\}_{t=1}^{T^E_j}$ and $\mathcal{W}_E = \cup_{j=1}^{J} \mathcal{W}^E_j$. Each $\mathcal{W}^E_j$, $j = 1, \dots, J$, can then be used to construct a satisfactory estimator for $F^{(m_j-1):m_j}_W(\cdot)$ with standard nonparametric estimators, such as the empirical CDF (ECDF) or kernel estimators. To estimate $F^-_W(\cdot)$, the standard approach in the auction literature is to use the following well-known relationship between the distribution of an order statistic and its parent distribution,

$$F^{k:m}_0(x) = F^k_0(x) \sum_{l=0}^{m-k} \binom{k+l-1}{k-1} \{1 - F_0(x)\}^l = \varphi\{F_0(x);\, k,\, m\}, \tag{2.11}$$

where $F^{k:m}_0(\cdot)$ is the distribution of the $k$-th order statistic in $m$ random samples, and $F_0(\cdot)$ is its parent distribution. Given an estimator for $F^{(m_j-1):m_j}_W(\cdot)$, an estimator for $F^-_W(\cdot)$, which is the parent distribution of $F^{(m_j-1):m_j}_W(\cdot)$, can be readily derived by solving (2.11), since $\varphi(\cdot;\, m_j-1,\, m_j)$ is a monotone transformation. When the number of bidders in the observed auctions is fixed, i.e. $m_j = m$ for $j = 1, \dots, J$, the standard approach is unproblematic. In practice, however, the number of bidders can vary across auctions. In that case, a direct consequence of using the standard estimation method is that each $\mathcal{W}^E_j$, $j = 1, \dots, J$, leads to its own estimator for $F^-_W(\cdot)$.
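As a small illustration of relationship (2.11), the sketch below evaluates the order-statistic transformation and inverts it numerically. The function names `order_stat_cdf` and `parent_cdf` are hypothetical, and bisection is simply one convenient way to exploit the monotonicity noted above; it is not claimed to be the procedure used in the thesis.

```python
from math import comb


def order_stat_cdf(F, k, m):
    """Map a parent CDF value F in [0, 1] to the CDF value of the k-th
    order statistic of m i.i.d. draws, using the sum in (2.11)."""
    return F ** k * sum(
        comb(k + l - 1, k - 1) * (1.0 - F) ** l for l in range(m - k + 1)
    )


def parent_cdf(G, k, m, tol=1e-12):
    """Invert order_stat_cdf in its first argument by bisection.
    This relies only on the map being increasing in F on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if order_stat_cdf(mid, k, m) < G:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Second-highest of three draws: recover the parent CDF value 0.3.
G = order_stat_cdf(0.3, 2, 3)
F_back = parent_cdf(G, 2, 3)
```

Applied to an ECDF value of the $(m_j-1)$-th order statistic with `k = m_j - 1` and `m = m_j`, `parent_cdf` returns the corresponding pseudo-observation of the parent distribution.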
As a consequence, one would have $J$ different estimators for $F^-_W(\cdot) = F_W(\cdot \mid W < 0)$. Which one should be adopted in practice, given that different estimators will usually produce different estimates of $F^-_W(w)$ at a given $w < 0$? Moreover, even if one can select a particular estimator from these $J$ candidates by some criterion, such as cross-validation or the AIC, such a selection comes at the cost of discarding much of the information contained in the other order statistics. It is therefore desirable to develop a new approach to estimating the parent distribution $F^-_W(\cdot)$ that (1) produces a single nonparametric estimator for $F^-_W(\cdot)$ with desirable statistical properties; (2) is efficient in the sense that the estimator uses the information provided by all order statistics, i.e. $\mathcal{W}_E$; and (3) leads to a smooth and monotone nonparametric CDF estimator. The third requirement reflects the fact that a CDF is monotone and, in most cases, continuous. The remainder of this subsection is devoted to developing such an approach.

For each $j = 1, \dots, J$, one can easily compute from $\mathcal{W}^E_j$ the ECDF

$$F^{(m_j-1):m_j}_{W,T^E_j}(w) = (T^E_j)^{-1} \sum_{t=1}^{T^E_j} 1\left\{w^{(m_j-1):m_j}_{E,t} \le w\right\}.$$

According to the Glivenko-Cantelli theorem, $F^{(m_j-1):m_j}_{W,T^E_j}(\cdot)$ converges to $F^{(m_j-1):m_j}_W(\cdot)$ uniformly as $T^E_j \to \infty$. Therefore, $F^{(m_j-1):m_j}_{W,T^E_j}(w)$ converges to $F^{(m_j-1):m_j}_W(w)$ almost surely for $w < 0$. A natural estimator for $F^-_W(w^{(m_j-1):m_j}_{E,t})$ is then the solution of the equation

$$F^{(m_j-1):m_j}_{W,T^E_j}\left(w^{(m_j-1):m_j}_{E,t}\right) = \varphi\left\{F^-_W\left(w^{(m_j-1):m_j}_{E,t}\right);\, m_j-1,\, m_j\right\} \tag{2.12}$$

for $t = 1, \dots, T^E_j$ and $j = 1, \dots, J$, where $\varphi\{\cdot;\, k,\, m\}$ is the order-statistic transformation in (2.11). Denote by $F^-_{W,T^E_j}(w^{(m_j-1):m_j}_{E,t})$ the solution of (2.12). By the continuous mapping theorem, $F^-_{W,T^E_j}(w^{(m_j-1):m_j}_{E,t})$ converges to $F^-_W(w^{(m_j-1):m_j}_{E,t})$ almost surely as $T^E_j \to \infty$. The following result motivates my primary specification (2.13) below.

Lemma 3. Let $F^-_{W,T^E_j}(w^{(m_j-1):m_j}_{E,t})$ be the solution of (2.12). If $W^{(m_j-1):m_j}_{E,1}, \dots, W^{(m_j-1):m_j}_{E,T^E_j}$ are i.i.d.
samples from $F^{(m_j-1):m_j}_W(\cdot)$, then

$$E\left\{F^-_{W,T^E_j}\left(w^{(m_j-1):m_j}_{E,t}\right)\right\} \to F^-_W\left(w^{(m_j-1):m_j}_{E,t}\right) \quad \text{as } T^E_j \to \infty,$$

for $t = 1, \dots, T^E_j$ and $j = 1, \dots, J$.

Remark 8. Although $F^{(m_j-1):m_j}_{W,T^E_j}(\cdot)$ is an unbiased estimator for $F^{(m_j-1):m_j}_W(\cdot)$, $F^-_{W,T^E_j}(\cdot)$ is not necessarily an unbiased estimator for $F^-_W(\cdot)$, because $F^-_W(\cdot)$ is a nonlinear transformation of $F^{(m_j-1):m_j}_W(\cdot)$ according to (2.12). Only the asymptotic relation between $E\{F^-_{W,T^E_j}\}$ and $F^-_W$ can be stated.

Lemma 3 suggests the following “regression model”

$$F^-_{W,T^E_j}\left(w^{(m_j-1):m_j}_{E,t}\right) = F^-_W\left(w^{(m_j-1):m_j}_{E,t}\right) + \varepsilon_{tj}, \qquad t = 1, \dots, T^E_j \text{ and } j = 1, \dots, J, \tag{2.13}$$

where the $\varepsilon_{tj}$'s are error terms with asymptotically zero mean. This “regression model” motivates a smooth estimator for $F^-_W(\cdot)$ obtained by treating (2.13) as a polynomial spline (PS) regression model with pseudo observations $\left\{\left(w^{(m_j-1):m_j}_{E,t},\, F^-_{W,T^E_j}(w^{(m_j-1):m_j}_{E,t})\right)\right\}_{t=1,\,j=1}^{T^E_j,\,J}$. Xue and Wang (2010) use the PS regression model to estimate a distribution function from i.i.d. samples drawn from the distribution itself. Polynomial splines are functions defined piecewise by polynomials that are connected smoothly over a set of interior knots. The support of $F^-_W(\cdot)$ is $[\underline{w}, 0)$, which is taken as known. When $\underline{w}$ is unknown in practice, set $\underline{w} = \min\{w \mid w \in \mathcal{W}_E\}$. Let

$$K_{T_E} := \left\{\underline{w} = \kappa_0 = \dots = \kappa_p < \kappa_{p+1} < \dots < \kappa_{p+q_{T_E}} < \kappa_{p+q_{T_E}+1} = \dots = \kappa_{2p+q_{T_E}+1} = 0\right\}$$

be a partition of the support of $F^-_W(\cdot)$. There are $q_{T_E}$ inner knots in $K_{T_E}$, and $p$ is the degree of the B-splines defined below. Because $p + q_{T_E}$ is frequently used below, denote $\tilde{q}_{T_E} := p + q_{T_E}$ hereafter. One can then approximate the unknown distribution function $F^-_W$ by a linear combination of the B-spline basis,

$$F^-_W(w) \approx \sum_{k=0}^{\tilde{q}_{T_E}} \beta_k B_{k,p}(w), \qquad w \in [\underline{w}, 0), \tag{2.14}$$

where $B_{k,p}(w)$ is the basis B-spline of degree $p$, and $\beta := (\beta_0, \dots, \beta_{\tilde{q}_{T_E}})'$ is a vector of unknown coefficients.
The B-splines of degree $p$ can be defined recursively by the Cox-de Boor formula,

$$B_{k,0}(w) := \begin{cases} 1, & \text{if } \kappa_k \le w < \kappa_{k+1}, \\ 0, & \text{otherwise}, \end{cases} \qquad k = 0, \dots, q_{T_E} + 2p,$$

$$B_{k,p}(w) := \frac{w - \kappa_k}{\kappa_{k+p} - \kappa_k} B_{k,p-1}(w) + \frac{\kappa_{k+p+1} - w}{\kappa_{k+p+1} - \kappa_{k+1}} B_{k+1,p-1}(w), \qquad k = 0, \dots, \tilde{q}_{T_E},$$

where $\kappa_k$ denotes the $k$-th knot in $K_{T_E}$. Major statistical software packages, e.g., R, S-Plus, and SAS, have subroutines that compute $B_{k,p}(w)$ given the knots $K_{T_E}$ and the degree $p$. A more complete study of spline functions can be found in Schumaker (2007). Substituting (2.14) into (2.13) leads to the following “linear regression” form,

$$F^-_{W,T^E_j}\left(w^{(m_j-1):m_j}_{E,t}\right) = \sum_{k=0}^{\tilde{q}_{T_E}} \beta_k B_{k,p}\left(w^{(m_j-1):m_j}_{E,t}\right) + \varepsilon_{tj}, \qquad t = 1, \dots, T^E_j \text{ and } j = 1, \dots, J, \tag{2.15}$$

where the $\beta_k$ are the spline coefficients in (2.14) and the $\varepsilon_{tj}$ are the errors in (2.13). This makes the model amenable to classical linear regression analysis. A notable feature of the “linear regression” in (2.15) is that $\varepsilon_{tj}$ is heteroskedastic. The heteroskedasticity arises mainly because $T^E_j$ may vary across $j = 1, \dots, J$: conditional on $w^{(m_j-1):m_j}_{E,t}$, the variance of $\varepsilon_{tj}$ equals the variance of $F^-_{W,T^E_j}(w^{(m_j-1):m_j}_{E,t})$, which depends on $T^E_j$. This heteroskedasticity can be handled in the classical generalized least squares (GLS) fashion. Suppose $\sigma^2_{tj} := \mathrm{var}(\varepsilon_{tj})$ is known. The estimation of $\beta$ can then be based on the following GLS principle,

$$\tilde{\beta}_{T_E} = \arg\min_{\beta \in \mathbb{R}^{\tilde{q}_{T_E}+1}} \sum_{j=1}^{J} \sum_{t=1}^{T^E_j} \left[ \left\{ F^-_{W,T^E_j}\left(w^{(m_j-1):m_j}_{E,t}\right) - \sum_{k=0}^{\tilde{q}_{T_E}} \beta_k B_{k,p}\left(w^{(m_j-1):m_j}_{E,t}\right) \right\}^2 \Big/ \sigma^2_{tj} \right]. \tag{2.16}$$

A smooth estimator for $F^-_W(w)$ can then be defined by

$$\tilde{F}^-_{W,T_E}(w) = \sum_{k=0}^{\tilde{q}_{T_E}} \tilde{\beta}_{k,T_E} B_{k,p}(w), \qquad w \in [\underline{w}, 0), \tag{2.17}$$

where $\tilde{\beta}_{T_E} = (\tilde{\beta}_{0,T_E}, \dots, \tilde{\beta}_{\tilde{q}_{T_E},T_E})'$. I call $\tilde{F}^-_{W,T_E}(\cdot)$ in (2.17) the “B-spline linear estimator”. The numerical implementation of this estimator is easy, because the whole estimation procedure involves only calculating the ECDF, solving simple equations, and running GLS. However, $\tilde{F}^-_{W,T_E}(\cdot)$ defined in (2.17) is not necessarily monotone.
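To make the Cox-de Boor recursion and the spline form in (2.17) concrete, here is a minimal pure-Python sketch. The helper names (`clamped_knots`, `bspline_basis`, `spline_value`) and the example knots are hypothetical and chosen only for illustration; terms with a zero denominator, which arise at repeated boundary knots, are treated as zero by the usual convention.

```python
def clamped_knots(lower, upper, inner, p):
    """Knot vector with p + 1 repeated boundary knots and the given inner knots."""
    return [lower] * (p + 1) + sorted(inner) + [upper] * (p + 1)


def bspline_basis(k, p, w, knots):
    """Evaluate the k-th B-spline of degree p at w by the Cox-de Boor recursion."""
    if p == 0:
        return 1.0 if knots[k] <= w < knots[k + 1] else 0.0
    left_den = knots[k + p] - knots[k]
    right_den = knots[k + p + 1] - knots[k + 1]
    left = 0.0
    if left_den > 0:
        left = (w - knots[k]) / left_den * bspline_basis(k, p - 1, w, knots)
    right = 0.0
    if right_den > 0:
        right = (knots[k + p + 1] - w) / right_den * bspline_basis(k + 1, p - 1, w, knots)
    return left + right


def spline_value(coefs, p, w, knots):
    """Linear combination of the basis functions, as in (2.14) and (2.17)."""
    return sum(c * bspline_basis(k, p, w, knots) for k, c in enumerate(coefs))


# Cubic B-splines on [-5, 0) with two inner knots: p + q + 1 = 6 basis functions.
p = 3
knots = clamped_knots(-5.0, 0.0, [-3.0, -1.5], p)
n_basis = len(knots) - p - 1
basis_at_w = [bspline_basis(k, p, -2.0, knots) for k in range(n_basis)]
```

On the half-open support the basis functions are nonnegative and sum to one, which is why a coefficient vector with entries in $[0, 1]$ yields a spline taking values in $[0, 1]$. Given the basis values at each pseudo observation, the fit in (2.16) is an ordinary weighted least-squares problem.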
A sufficient condition for e F W;T E () to be monotone nondecreasing is that e k1;T E e k;T E for k = 1;:::;e q T E (c.f. Theorem 5.9 of Schumaker, 2007). To guarantee this condition to be satisfied, one can implement (2:16) by constraining2M := 2R e q T E +1 j k1 k ; k = 1;:::;e q T E , b T E = arg min 2M J X j=1 T E j X i=1 2 6 4 8 < : F W;T E j w (mj1):mj E;t e q T E X k=0 k B k;p w (mj1):mj E;t 9 = ; 2 = 2 tj 3 7 5: Denote the corresponding estimator for F W () as b F W;T E (w) = e q T E X k=0 b k;T EB k;p (w); w2 [w; 0); (2.18) and I call b F W;T E () the “B-spline constrained linear estimator”. An R package “ORDER2PARENT’ has been developed by the author, and interested readers can download the package from local CRAN mirrors. When 2 tj is unknown, one can use the feasible GLS approach to estimate as we do in the standard regression analysis. Details about this issue will not be further pursued here. The selection of the knots sequence K T E, especially the number of the inner knots q T E, has influence on the accuracy of the proposed estimator. Following Xue and Wang (2010), I use a set of knots equally spaced in the percentile ranks inW E by taking k =w [T E k=(q T E +1)] , k = 1;:::;q T E. The number of knots, q T E, is selected by the AIC criterion. To be specific, I denote a distribution estimator b F W;T E () for F W () with the number of knots q by b F W;T E (; q). The optimal knot number q T E minimizes the AIC, q T E = arg min q log 2 4 T E 1 J X j=1 T E j X t=1 n F W;T E j w (mj1):mj E;t b F W;T E w (mj1):mj E;t ; q o 2 +2 (q +p)=T E : (2.19) Xue and Yang (2006) suggest searching for q T E within max (0:5e q; 1); min 5e q; T E =4p where e q = T E 1=(2p+3) . Simulation studies show that the performance of the proposed estimator is robust to the choice of degree p. Denote L 2 [w; 0) the space of square integrable real-valued functions on [w; 0). For any g2 L 2 [w; 0), 63 2.4. 
Estimation definekgk 2 = E WjW<0 g 2 (W ) andkgk 2 T E = T E 1 P J j=1 P T E j t=1 g 2 W (mj1):mj E;t . Also, setkgk 1 = sup w2[w; 0) jg (w)j. The asymptotic properties of the proposed estimator b F W;T E () are summarized in the following theorems. Theorem 5. Let the valuation be generated by (2:1) and suppose Assumption 5, B.1 and B.2(1)-(2) hold. 1. b F W;T E F W =O p q (p+1) T E + T E 1=2 ; 2. b F W;T E F W 1 =O p q (p+1) T E + logT E =T E 1=2 . Theorem 6. Let the valuation be generated by (2:1) and suppose Assumption 5, B.1 and B.2(1)-(2) hold. If T E 1=2 q (p+1) T E ! 0, T E 1=2 n b F W;T E (w)F W (w) o ! d N 0; 2 (w) ; for all w2 [w; 0), and 2 (w) =F W (w) 1F W (w) as T E !1. Theorem 5 provides the rates of both L 2 and uniform convergence of the “B-spline constrained linear estimator” b F W;T E (). From Theorem 5, the proposed estimator is bothL 2 and uniformly consistent. Theorem 6 shows that the estimator is also pointwise normal, with the same asymptotic normal distribution as the ECDF. Remark 9. Let’s go back to (2:11). Although I presented the “B-spline constrained linear estimator” under a special case where k and m in (2:11) are (m j 1) and m j , respectively, it’s not difficult to verify that all the estimation procedures and statistical properties would apply if one set k and m in (2:11) at other possible values. For example, k and m would be m j and m j , respectively, in the next subsection. 2.4.2 Estimation of F W (jW 0) and F W (0) Theorem 4 suggests an iteration procedure to estimate F W (jW 0) andF W (0) based onB (m1):m B . Some new notations are defined here. Switching the subscript E inB E j ,B E andB E j with B,B B j ,B B andB B j for Type II BP English auctions are defined similarly. Let P mj :mj B;t =t V mj :mj B;t ; b j B;t be the threshold price of the winner of the t-th m j -bidder Type II BP English auction, and Lemma 2 says p mj :mj B;t =b (mj1):mj B;t . DenoteP B j = n p mj :mj B;t o T B j t=1 andP B =[ J j=1 P B j . 
GivenF V jB =b andt v; b ,v canbesolvedfrom (2:4). Sincewehaveobservationsoft v mj :mj B;t ; b j B;t , it is tempted to solve v mj :mj B;t from (2:4) if F V jB =b was known. (2:4) is abstract in the sense that it involves an integral for which no closed form expression appears to be known. To help make (2:4) operational, the following transformations are made. Also, I rewrite the equation in terms of F W () rather 64 2.4. Estimation than F V jB =b because F W () is my main interest herein, V m:m B P m:m B u a bx dF m1 V xjB =b = V m:m B P m:m B u a bx dF m1 W xb = 0; where P m:m B =t V m:m B ; B =b . The rule of integration by parts yields u a bz F m1 W zb j V m:m B P m:m B + V m:m B P m:m B F m1 W xb u 0 a bx dx = 0; which can be further written as follows using changes of variables, u a bz F m1 W zb j V m:m B P m:m B + (V m:m B P m:m B ) 1 0 F m1 W x (y)b u 0 a bx (y) dy = 0; (2.20) where x (y) = (V m:m B P m:m B )y +P m:m B : (2.21) u a bz F m1 W zb j V m:m B P m:m B = u a bV m:m B F m1 W V m:m B b u a bP m:m B F m1 W P m:m B b Then (2:20) can be approximated through the Monte Carlo integration method, (2:20) u a bz F m1 W zb j V m:m B P m:m B + (V m:m B P m:m B ) " R 1 R X r=1 F m1 W x (y r )b u 0 a bx (y r ) # = 0; (2.22) where y 1 ;:::; y R are i.i.d. samples from the uniform distribution on [0; 1], R is a sufficiently big number, say 5,000. Therefore, V m:m B can be solved easily from (2:22) given P m:m B and F W (). In Section 2.3, I noted that (2:8) suggests an updating rule for F V jB =b . I will further explore such an updating scheme to estimate F + W () := F W (jW 0), and 0 W := F W (0). As I did for (2:4), similar transformations are made for (2:8), and one has 65 2.4. Estimation F m1 W V m:m B b F m1 W P m:m B b ( u a bP m:m B u a bV m:m B ) + (V m:m B P m:m B ) R 1 R X r=1 " u 0 a bx (y r ) u a bV m:m B # F m1 W x (y r )b ! : (2.23) where x (y) is defined in (2:21). 
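The Monte Carlo step used in (2.22) and (2.23) can be sketched in isolation. In the snippet below, `g` is a placeholder for the integrand of (2.20) and is not taken from the thesis; only the change of variables (2.21) and the averaging over $R$ uniform draws follow the text. The function name `mc_integral` and the fixed seed are assumptions made for reproducibility.

```python
import random


def mc_integral(g, lower, upper, R=5000, seed=0):
    """Approximate the integral of g over [lower, upper] by
    (upper - lower) * average of g(x(y_r)), with x(y) = (upper - lower) * y + lower
    and y_1, ..., y_R i.i.d. uniform on [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(R):
        y = rng.random()                    # y_r ~ U[0, 1]
        x = (upper - lower) * y + lower     # change of variables as in (2.21)
        total += g(x)
    return (upper - lower) * total / R


# Placeholder integrand: the exact integral of x**2 over [1, 3] is 26/3.
approx = mc_integral(lambda x: x * x, 1.0, 3.0)
```

Because the same draws $y_1, \dots, y_R$ can be reused for every candidate value of $V^{m:m}_B$, the left-hand side of (2.22) becomes a deterministic function of $V^{m:m}_B$ that a one-dimensional root finder can handle.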
Letb mj E =T j E =T j , which is an unbiased and consistent estimator for mj E , because the T j BP English auctions are independently generated from a common F W (). (2:7) in Theorem 3 can be rewritten in terms of F W , F + W and 0 W , m E = 0 W m + 0 @ w 0 " 0 W F t w +b b (1 0 W )F + W (w) + 0 W # m1 d F + W (w) m 1 A n 1 0 W m o : (2.24) Becauseb mj E is an unbiased estimator for mj E , one has the following J moment conditions E m T 0 W =E 8 > > > > < > > > > : b m1 E m 1 0 W . . . b m J E m J 0 W 9 > > > > = > > > > ; =0 J1 ; (2.25) where T = P J j=1 T j , and m j 0 W = 0 W mj + 0 @ w 0 " 0 W F W t w +b b (1 0 W )F + W (w) + 0 W # mj1 d F + W (w) mj 1 A n 1 0 W mj o ; j = 1;:::; J: A GMM estimatorb 0 W;T for 0 W can then be constructed from (2:25) provided that F + W () and F W () were known. Because 0 W is obviously over-identified as long as J > 1, a weight matrix T is used. Thus, the GMM estimatorb 0 W;T is defined by b 0 W;T = arg min 2[0; 1] m T () T m 0 T (): (2.26) Building on (2:22), (2:23) and (2:26), we are now ready to show how to estimate F + W () and 0 W . Algorithm 1 below provides the procedures of estimating F + W () and 0 W . Algorithm 1. Let e F + W (; 0) ande 0 W (0) be the initial guesses of F + W () and 0 W , respectively, and set r = 0 initially. 66 2.4. Estimation Step 1. Given e F + W (; r) ande 0 W (r), define e F W (w; r) = 8 > > < > > : e 0 W (r) b F W;T E (w); if w< 0; 1e 0 W (r) e F + W (w; r) +e 0 W (r); if w 0: (2.27) Step 2. For each p mj :mj B;t 2P B , solve v mj :mj B;t from the following equation, u a b j B;t z e F mj1 W zb j B;t ; r j v m j :m j B;t p m j :m j B;t + v mj :mj B;t p mj :mj B;t " R 1 R X r=1 e F mj1 W n x t;j (y r )b j B;t ; r o u 0 a n b j B;t x t;j (y r ) o # = 0; (2.28) where x t;j (y r ) = v mj :mj B;t p mj :mj B;t y r +p mj :mj B;t : Denote v mj :mj B;t (r) the solution to the equation (2:28)V B j (r) = n v mj :mj B;t (r) o , andV B (r) =[ J j=1 V B j (r). Step 3. 
Let w mj :mj B;t (r) =v mj :mj B;t (r)b j B;t ,W B j (r) = n w mj :mj B;t (r) o , andW B (r) =[ J j=1 W B j (r). For each p mj :mj B;t 2P B and w mj :mj B;t (r)2W B (r), compute e F mj1 W w mj :mj B;t (r) ; r + 1 = e F mj1 W p mj :mj B;t b j B;t ; r 2 4 u a b j B;t p mj :mj B;t u a n w mj :mj B;t (r) o 3 5 + n w mj :mj B;t (r) +b j B;t p mj :mj B;t o 0 @ R 1 R X r=1 2 4 u 0 a n b j B;t x t;j (y r ) o u a n w mj :mj B;t (r) o 3 5e F mj1 W n x t;j (y r )b j B;t ; r o 1 A = t;j n p mj :mj B;t ; F + W (; r) o ; t = 1;:::;T B j and j = 1;:::;J: (2.29) Note that w mj :mj B;t (r) is solved from p mj :mj B;t and F + W (; r). Let K T B := 0 = 0 =::: = p < p+1 <:::< p+q T B < p+q T B +1 =::: = 2p+q T B +1 =w be a partition of the support of F + W (). Estimate F + W () with the “B-spline constrained linear estimator”, and the following PS linear regression model e F + W n w mj :mj B;t (r) ; r + 1 o = p+q T B X k=0 k B k;p n w mj :mj B;t (r) o +! tj ; t = 1;:::;T B j , and j = 1;:::;J; 67 2.4. Estimation where e F W n w mj :mj B;t (r) ; r + 1 o = 1e 0 W (r) e F + W n w mj :mj B;t (r) ; r + 1 o +e 0 W (r). Denote b F + W;T (; r + 1) the resulted estimator. Step4. Estimate 0 W using the GMM estimatorb 0 W;T in (2:26) regarding b F W;T E () in (2:18) and b F + W;T (; r + 1) to F W () and F + W (), respectively in (2:25). Denote the resulted estimate of 0 W byb 0 W;T (r + 1). Step 5. Check if the convergence criterion b F + W;T (; r + 1) e F + W (; r) T B is smaller than the predefined tolerance coefficient, say, wherekk T B is defined in analogy withkk T E. If b F + W;T (; r + 1) e F + W (; r) T B < , the convergence has been reached, and let b F + W;T () := b F + W;T (; r + 1) be the estimate for F + W (), and b 0 W;T = b 0 W;T (r + 1) be the estimate for 0 W . Otherwise, set r = r + 1, e F + W (; r) = b F + W;T (; r + 1) and e 0 W (r) =b 0 W;T (r + 1), and go back to Step 1. Remark 10. 
Though the initial guesses of F + W (; 0) and 0 W (0) can be arbitrary, my recommendation, and the initial guesses used in the subsequent Monte Carlo studies are F + W (w; 0) =F beta (w=w); w2 [0; w]; where F beta () is a Beta distribution. The initial guess 0 W (0) of 0 W is 0 W (0) = b m j E 1=m j ; where j = arg max j=1;:::;J T j . Such a guess is suggested by Theorem 3 treating the second term of the right-hand-side of (2:7) as zero. Remark 11. Another good convergence criterion is simply “plot-and-see”–plot the estimated F W (; r) for each iteration r. When the lines show little difference, the algorithm then has reached the convergence. It might be tempting to attempt to use the difference betweene 0 W (r) andb 0 W;T (r + 1), b 0 W (r)b 0 W (r + 1) , as a convergence criterion. However, my simulation experience showse 0 W (r) converges very quickly even e F + W (; r) has not reached its convergence. Finally, my simulation studies show that the algorithm actually converges quickly. It usually takes less than 20 iterations to arrive convergence. The convergence of Algorithm 1 is established in Theorem 7. Asymptotic properties of the estimators in Algorithm 1 are discussed in Theorem 8. Theorem 7. Let the valuation be generated by (2:1) and suppose Assumption 5, B.1 and B.2 hold. Let B B = B 0;p ;:::;B p+q T B ;p be the set of normalized B-spline basis for the linear spaceG (p) T B . The functional sequence n e F + W (; r) o 1 r=1 defined in Algorithm 1 converges uniformly to a limit point e F + W (;1) ofG (p) T B , and e 0 W (r) 1 r=1 converges to a pointe 0 W (1)2 [0; 1]. 68 2.5. 
Table 2.1: Simulation Setup

Design   m_1  m_2  m_3   T_1  T_2  T_3   F_W                  F_B
1        2    3    4     100  100  100   $F^{(1)}_W(\cdot)$   U[5, 10]
2        2    3    4     200  200  200   $F^{(1)}_W(\cdot)$   U[5, 10]
3        2    3    4     100  100  100   $F^{(2)}_W(\cdot)$   U[5, 10]
4        2    3    4     200  200  200   $F^{(2)}_W(\cdot)$   U[5, 10]

Note: $F^{(1)}_W(\cdot)$ is a normal distribution $N(-5/4, 1)$ truncated to $[-5, 2]$, and $F^{(2)}_W(\cdot)$ is a trimodal distribution $\sum_{l=0}^{2} p_l N(-5l/2, 3^{-2})$, $(p_0, p_1, p_2) = (4^{-1}, 2^{-1}, 4^{-1})$, truncated to $[-5, 2]$.

Theorem 8. Let the valuation be generated by (2.1) and suppose Assumption 5, B.1 and B.2 hold. Let $\underline{p} = \inf_{w,b} t(w + b, b)$ and $\bar{p} = \sup_{w,b} t(w + b, b)$. If $\mathcal{P}_B := \{p^{m_j:m_j}_{B,t}\}_{t=1,\,j=1}^{T^B_j,\,J}$ forms a dense set in $[\underline{p}, \bar{p}]$ as $T_B \to \infty$, and $0 < \lim_{T_E, T_B \to \infty} T_E / T_B < \infty$, then

1. $\hat{F}^+_{W,T}(\cdot)$ in Algorithm 1 converges uniformly to $F^+_W(\cdot)$, and the GMM estimator of $F_W(0)$ in Algorithm 1 converges to $F_W(0)$ in probability;
2. $\|\hat{F}^+_{W,T} - F^+_W\|_{+} = O_p\left(q^{-(p+1)}_{T_B} + T_B^{-1/2}\right)$;
3. $\|\hat{F}^+_{W,T} - F^+_W\|_{+,\infty} = O_p\left(q^{-(p+1)}_{T_B} + (\log T_B / T_B)^{1/2}\right)$;
4. if one further has $T_B^{1/2} q^{-(p+1)}_{T_B} \to 0$, then $T_B^{1/2}\{\hat{F}^+_{W,T}(w) - F^+_W(w)\} \to_d N(0, \sigma^2_+(w))$ for all $w \in [0, \bar{w}]$, with $\sigma^2_+(w) = F^+_W(w)\{1 - F^+_W(w)\}$, as $T \to \infty$.

2.5 Examples

In this section, simulation studies are conducted to evaluate the finite sample performance of the proposed estimation method. Some numerical features of Algorithm 1 are also illustrated to give readers a more concrete impression of this iterative algorithm. The simulation settings are given in Table 2.1. Under these settings, the buy price is assumed to follow a uniform distribution on $[5, 10]$. $F_W(\cdot)$ takes two forms: the first, $F^{(1)}_W(\cdot)$, is a normal distribution $N(-5/4, 1)$ truncated to $[-5, 2]$; the second, $F^{(2)}_W(\cdot)$, is a trimodal distribution $\sum_{l=0}^{2} p_l N(-5l/2, 3^{-2})$, $(p_0, p_1, p_2) = (4^{-1}, 2^{-1}, 4^{-1})$, truncated to $[-5, 2]$. The number of participating bidders in a BP English auction can be 2, 3 or 4 with equal probabilities. I do not specify the reserve price in my study because the data generating process (DGP) does not use it.
The reserve price can be thought of as a small value for which Assumption 5 holds.

For an $m_j$-bidder BP English auction in my simulation study, the DGP proceeds in three steps. First, draw $m_j$ simulated valuations from $F_V(\cdot \mid B = b)$ based on the simulation settings in Table 2.1. Then, given the $m_j$ simulated valuations, check whether the auction ends without the buy-price option being used, that is, whether it turns out to be a Type I BP English auction. The “if-and-only-if” conditions for an $m_j$-bidder BP English auction to be a Type I BP English auction can be found in Theorem 3. Finally, for a Type I BP English auction, save the $(m_j - 1)$-th order statistic among the $m_j$ simulated valuations. For a Type II BP English auction, compute and save the threshold price of the bidder with the maximum valuation among the $m_j$ bidders, i.e. $t(V^{m:m}, b)$, which can be solved from (2.22).

The estimation results are displayed in Figure 2.1. The red solid line is the true distribution of valuations. The pointwise average of the estimated distributions (dashed) and the 95% pointwise bootstrap confidence intervals (dotted) are also displayed. The number of replications is 1,000, and the maximum number of iterations is 20, which is sufficient to ensure convergence based on prior simulation experience. For one replication of design 4, which is the most complicated one in this study, it takes less than 2 minutes to achieve convergence on a Mac computer with Intel(R) Core(TM) i5 CPU 680 @3.60GHz and 3.59GHz. So the proposed estimator is not computationally demanding.

The proposed estimators give satisfactory estimates of the distribution of latent valuations. Both the locations and the shapes of the valuation distributions are captured by the proposed estimation approach. Another notable finding is that the finite sample performance of the proposed estimator depends on the “complexity” of the true distribution of valuations.
When the true distribution is standard, like $F^{(1)}_W(\cdot)$ in designs 1 and 2, the spline estimator requires fewer samples and produces more accurate estimates.

Figure 2.1: Estimation of the Distribution of Valuations

2.6 Concluding Remarks

This paper develops methods for identifying and estimating the distribution of latent valuations in a buy price English auction. Exploiting the equilibrium bidding strategies, I show how to nonparametrically identify the underlying distribution of valuations from the observed bids and the number of bidders. In turn, I decompose the distribution of valuations into three parts and develop spline estimators to estimate them. To implement the proposed estimators, a computationally fast iteration algorithm is developed based on the contraction mapping theorem.

One potential extension of the present paper is to relax the rules of the “modified English clock auction” by allowing for “jump bidding” in the course of the auction. When jump bidding is allowed, Lemmas 1 and 2 no longer hold. As substitutes, the following two behavioral assumptions would still hold.

Assumption 6. Let $B^{m:m}_E$ be the transaction price of a Type I BP English auction, and let $V^{(m-1):m}_E$ and $V^{m:m}_E$ be the $(m-1)$-th and $m$-th order statistics among the $m$ valuations $V_{1,E}, \dots, V_{m,E}$, respectively. Under Assumption 5 and B.1, $B^{m:m}_E > V^{(m-1):m}_E$, $V^{(m-1):m}_E < b$, and $V^{m:m}_E \ge B^{m:m}_E$.

Assumption 7. Let $B^{(m-1):m}_B$ and $B^{(m-2):m}_B$ be the $(m-1)$-th and $(m-2)$-th order statistics among the $m$ bids $B_{1,B}, \dots, B_{m,B}$ in a Type II BP English auction, and let $V^{m:m}_B$ be the maximum of the valuations $V_{1,B}, \dots, V_{m,B}$. Under Assumption 5 and B.1, $B^{(m-2):m}_B < t(V^{m:m}_B, b)$ and $B^{(m-1):m}_B \ge t(V^{m:m}_B, b)$.
Assumption 6 comes from Haile and Tamer (2003) and rests on two intuitive facts: the first is that “bidders do not bid more than they are willing to pay”; the second is that “bidders do not allow an opponent to win at a price they are willing to beat”. Assumption 7 also relies on two intuitive facts about the “threshold” bidding strategy: the first is that a bidder would not exercise the buy price option while the auction clock is below his threshold price, i.e. $B^{(m-2):m}_B < t(V^{m:m}_B, b)$; the second is that a bidder would use the buy price option once the auction clock has reached his threshold price, i.e. $B^{(m-1):m}_B \ge t(V^{m:m}_B, b)$. Because $t(\cdot, b)$ is a monotone decreasing function of its first argument, Assumption 7 is equivalent to the following statement.

Assumption 8. Let $B^{(m-1):m}_B$ and $B^{(m-2):m}_B$ be the $(m-1)$-th and $(m-2)$-th order statistics among the $m$ bids $B_{1,B}, \dots, B_{m,B}$ in a Type II BP English auction, and let $V^{m:m}_B$ be the maximum of the valuations $V_{1,B}, \dots, V_{m,B}$. Under Assumption 5 and B.1, $t(B^{(m-2):m}_B, b) > V^{m:m}_B$ and $t(B^{(m-1):m}_B, b) \le V^{m:m}_B$.

Following Haile and Tamer (2003), one can consider the identification and estimation of bounds on the valuation distribution using Assumptions 6 and 8. The formal treatment of such an extension is far more complicated and is beyond the scope of the present paper. A thorough discussion of this extension is left for future work.

Chapter 3 Estimation of Identified Sets in Nonlinear Panel Data Models

For nonlinear panel data models, identification is a nontrivial issue because the distribution of unobserved individual effects is unknown. It is usually difficult for researchers to check the identifiability of nonlinear panel data models. Moreover, recent work has found that many widely used nonlinear panel data models are in fact only set identified.
This chapter proposes a general approach to estimating the identified set of nonlinear panel data models. The method is general in that its major assumptions are only that the likelihood functions satisfy certain smoothness conditions and that the distributions of unobserved individual effects are absolutely continuous with bounded support. The proposed set estimator is consistent for the identified set as $n \to \infty$ with $T$ fixed; the estimator automatically converges to the true parameter value if the underlying model is point identified. The convergence rate of the distance between the identified set and its estimate is $O_p(n^{-\delta})$, $0 < \delta < 1/K$, with $K$ being the dimension of the unknown parameters of primary interest. Simulation examples corroborate the theoretical findings.

3.1 Introduction

Nonlinear panel data models arise frequently in economic studies because of their flexibility in model specification and their allowance for unobserved individual heterogeneity. Many structural empirical studies end up with nonlinear panel data models. Despite their popularity and flexibility, the analysis of nonlinear panel data remains a challenge to econometricians. Because of the nonlinear nature of the models, the standard fixed effects estimation method for linear panel data models does not apply. The random effects estimation method relies on strong distributional assumptions, which cannot be tested but could easily be violated. More importantly, recent work on nonlinear panel data has suggested that verifying the point identification of a nonlinear panel model is nontrivial (Arellano and Bonhomme, 2011), and that point identification can fail in many common panel data models, e.g. the probit model. When the observed individual covariates are bounded and the support of the fixed effect is not a finite set, Chamberlain (2010) shows that the structural parameter will not be point identified in most discrete-outcome panel data models.
In particular, when the outcome is binary, point identification is possible only in logit models. When the number of points in the support of the outcome is large enough relative to the size of the support of the fixed effect, e.g. the outcome is continuous and the fixed effect is discrete, the theory of functional differencing developed by Bonhomme (2012) implies that point identification is usually possible. However, the issue of point identification is not clear when both the outcomes and the fixed effects are continuous, though Bonhomme (2012) conjectures that in this scenario point identification is possible if $T$ is larger than the dimension of the fixed effects. Although researchers have demonstrated the success as well as the failure of point identification in some cases, it remains largely unknown when point identification of structural parameters is possible and when it is not. Moreover, checking point identification may be difficult or tedious.

Motivated by the above arguments, this paper aims to develop a systematic approach to estimating identified sets of nonlinear panel data models. The proposed method is desirable for the following reasons: (1) the method can estimate a wide range of nonlinear panel data models; the major assumption maintained is that the distributions of unobserved fixed effects have bounded supports; (2) the method does not require researchers to know whether the model is point identified. The proposed identified set estimator converges in probability to the identified set. When a model is point identified, its identified set consists solely of the true parameter value, so the identified set estimator will shrink nearly to a point for point identified models. This feature is illustrated in the numerical examples in Section 3.5; (3) the proposed set estimator is computationally easy and stable.

My work is related to the following two papers.
Restricting attention to simple setups with discrete outcomes, discrete regressors, and discrete fixed effects, Honoré and Tamer (2006) and Chernozhukov et al. (2009) proposed two related ways to construct the identified set. The advantage of Honoré and Tamer (2006) and Chernozhukov et al. (2009) is that their methods, which are based on linear programming, are easy to implement; Chernozhukov et al. (2009) also provide an inference method for the estimated identified set. However, their methods are restricted to simple settings with discrete outcomes and discrete fixed effects. It is not straightforward to extend them to continuous cases, which are very common in applied microeconomics and macroeconomics. The method proposed in this paper can handle the continuous cases (both the outcome and the fixed effect may be continuous). One remark about the higher order bias correction method proposed by Hahn and Newey (2004) is that, when the model is not point identified, the bias correction does not make sense, since the maximum likelihood estimators would converge to a random vector whose support is the identified set (??).

In section 3.2, we first describe the semiparametric model studied in this paper and define the identified set of the structural parameters in the model, and then outline our estimation procedure. Section 3.3 explains our estimator in detail. In section 3.4, we investigate the asymptotic properties of the proposed estimator. Some simulation results are shown in section 3.5. All the proofs and technical lemmas are placed in the two appendices.

3.2 Identified Sets in Nonlinear Panel Data Models

3.2.1 Model

Let $Y \equiv (Y_1, \dots, Y_T)^\top$ be the outcome, which may be discrete or continuous. The distribution of $Y$ depends on a set of observable covariates $X \equiv (X_1, \dots, X_T)^\top$ and an unobservable absolutely continuous state variable $S$ whose density function is denoted by $g(s) \in \mathcal{G}$.
Hereafter we call $g(s)$ the mixing density. The mixing density $g(s)$ is unknown to researchers, and we impose no parametric assumption on it, though the class of mixing densities $\mathcal{G}$ does need to satisfy some technical conditions. These conditions will be stated in the next section. Let $f_{Y|X,S}(y|x,s;\beta)$ be the conditional density function of $Y$ given $x$ and $s$, which is assumed to be known up to a finite-dimensional vector of parameters $\beta \in \mathcal{B} \subset \mathbb{R}^K$ with $K < \infty$. (Footnote 9: When $Y$ is discrete, $f_{Y|X,S}(y|x,s;\beta)$ shall be understood as the probability mass at $y$ given $x$ and $s$.) The vector of parameters $\beta$ will be called the structural parameter of the model. Given $\beta$ and $g$, the conditional density of $Y$ given $X = x$ is

$$f_{Y|X}(y|x;\beta,g) = \int f_{Y|X,S}(y|x,s;\beta)\, g(s)\, ds. \tag{3.1}$$

Here I assume that $S$ and $X$ are independent. Denote $Z = (Y, X)$, and let $z = (y, x)$ be a realization of $Z$. The data are $n$ random realizations $z_1, \dots, z_n$ of $Z$. Let $\ell(\beta, g; z) = \log f_{Y|X}(y|x;\beta,g)$ be the log-likelihood function, and denote the true parameters by $\beta_0 \in \mathcal{B}$ and $g_0 \in \mathcal{G}$. Denote $P\ell(\beta, g; Z) = E_{\beta_0, g_0}\{\ell(\beta, g; Z)\}$, where the expectation is taken under the true distribution. We know that $(\beta_0, g_0)$ is a solution of the following maximization problem:

$$\max_{\beta \in \mathcal{B},\, g \in \mathcal{G}} P\ell(\beta, g; Z). \tag{3.2}$$

When $(\beta_0, g_0)$ is the unique solution of (3.2), we say that model (3.1) is point identified. When the model is point identified, the parameters $\beta$ and $g$ can be estimated by the semiparametric M-estimator, whose properties are well understood (see chapter 21 of Kosorok, 2008). In the present paper, we do not impose such a point identification assumption. Dropping the point identification assumption, the paper is interested in the identified set of the structural parameter $\beta_0$. The identified set $B_0 \subset \mathcal{B}$ of $\beta_0$ is the set of all $\tilde\beta_0 \in \mathcal{B}$ for which there exists a mixing density $\tilde g_0 \in \mathcal{G}$ such that $P\ell(\tilde\beta_0, \tilde g_0; Z) = P\ell(\beta_0, g_0; Z)$. Notice that
when the model is point identified, $B_0 = \{\beta_0\}$ becomes a singleton. The aim of this paper is to develop a general and computationally simple method of estimating and computing the identified set $B_0$.

3.2.2 Outline of Methods

To outline our estimation strategy, it is instructive to consider the following question first: suppose $B_0$ is an ordinary compact subset of $\mathcal{B} \subset \mathbb{R}^K$, and let $\beta_0^1, \dots, \beta_0^m$ be a random sample drawn from this set. In the sequel, we call $\beta_0^1, \dots, \beta_0^m$ marks on the subset $B_0$. How can we approximate the set $B_0$ given these marks? Chaudhuri et al. (1999) and Baíllo and Cuevas (2006) studied this question and proposed a simple estimator. Let $\mathcal{P}_m = \{A_{mj} : j = 1, 2, \dots\}$, $m = 1, 2, \dots$, be a sequence of cubic partitions of $\mathcal{B} \subset \mathbb{R}^K$. Let each element $A_{mj}$ of $\mathcal{P}_m$ be a bin given by $\prod_{k=1}^{K} [c_k w_m, (c_k + 1) w_m)$, where the $c_k$'s are integers and $w_m$ is the bin width. The point $(c_1 w_m, \dots, c_K w_m)$ is the origin of the bin $A_{mj}$. Then a plausible approximation of $B_0$ based on $\beta_0^1, \dots, \beta_0^m$ is

$$B_m = B_m(w_m) = \bigcup \left\{ A_{mj} \in \mathcal{P}_m : \beta_0^i \in A_{mj} \text{ for some } i = 1, \dots, m \right\}.$$

The partition $\mathcal{P}_m$ and the approximation $B_m$ are illustrated in Figure 3.1. We call the estimator $B_m$ the hit estimator because the set $B_m$ consists of all bins that are hit by at least one mark.

[Figure 3.1: Hit Estimator of Identified Set. Footnote 10: The vector of structural parameters is $\beta = (\beta_1, \beta_2)$. The lattice in the graph is $\mathcal{P}_m$. The area within the red line is the identified set $B_0$. The black dots are marks on the identified set $B_0$. The union of the gray bins is the H-shape estimator $B_m$ of $B_0$.]

The hit estimator is computationally easy and is capable of capturing the shape of the target set. Moreover, the hit estimator is valid as long as the target set $B_0$ is compact and satisfies other mild regularity conditions; the target set $B_0$ does not need to be convex, as the standard convex-hull algorithm demands.
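The hit estimator described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the thesis: the function names `hit_estimator` and `hit_volume` are hypothetical, and the lattice is assumed to be anchored at the origin, so that each bin is identified by its integer vector $(c_1, \dots, c_K)$.

```python
import numpy as np

def hit_estimator(marks, width):
    """Return the integer labels (c_1, ..., c_K) of all bins hit by a mark.

    A bin with label c is the cube prod_k [c_k * width, (c_k + 1) * width).
    The lattice is assumed to start at the origin (an illustrative choice).
    """
    marks = np.atleast_2d(np.asarray(marks, dtype=float))
    labels = np.floor(marks / width).astype(int)   # bin containing each mark
    return np.unique(labels, axis=0)               # each hit bin listed once

def hit_volume(bins, width):
    """Lebesgue measure of the union of the hit bins."""
    n_bins, dim = bins.shape
    return n_bins * width ** dim

# Four marks in R^2, two of which fall into the same bin.
example_marks = [[0.10, 0.10], [0.15, 0.12], [0.60, 0.10], [-0.10, 0.30]]
example_bins = hit_estimator(example_marks, width=0.5)
```

Storing only the integer labels keeps the estimate cheap to compute and compare, and it makes no convexity assumption about the target set.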
Because of these desirable properties, we will modify the hit estimator to estimate the identified set $B_0$ in semiparametric mixture models. The set estimation problem here is more difficult than the original question studied in Chaudhuri et al. (1999) and Baíllo and Cuevas (2006) because we do not have the marks $\beta_0^1, \dots, \beta_0^m$ on the identified set $B_0$; instead, we observe random samples $z_1, \dots, z_n$ generated from the mixture model (3.1). But provided that we have a set of estimates $\beta_n^1, \dots, \beta_n^{m_n}$ of the marks that land within the identified set $B_0$ as the sample size $n \to \infty$, and $\beta_n^1, \dots, \beta_n^{m_n}$ are randomly distributed on the set $B_0$, a natural estimator of the identified set $B_0$ would be

$$\hat B_n = \hat B_n(m_n, w_{m_n}) = \bigcup \left\{ A_{m_n j} \in \mathcal{P}_{m_n} : \beta_n^i \in A_{m_n j} \text{ for some } i = 1, \dots, m_n \right\}, \tag{3.3}$$

where $m_n$ is the number of marks and $w_{m_n}$ is the bin width. Both $m_n$ and $w_{m_n}$ depend on the sample size $n$. Such dependence will be made precise when we study the asymptotic properties of $\hat B_n$. In the sequel, we call the estimates $\beta_n^1, \dots, \beta_n^{m_n}$ asymptotic marks on $B_0$, and call the set estimator $\hat B_n$ in (3.3) the asymptotic hit (AH) set estimator. The next section explains how to obtain the asymptotic marks. The optimal choice of the number of marks, $m_n$, and the bin width, $w_{m_n}$, depends on the asymptotic properties of $\hat B_n$, which will be the theme of Section 3.4.

3.3 Asymptotic Marks

Some new notation and preliminary results are given as a preparation for describing the methods of obtaining the asymptotic marks. Given random samples $z_1, \dots, z_n$, where $z_i = (y_i, x_i)$, let

$$P_n \ell(\beta, g; z_i) = n^{-1} \sum_{i=1}^{n} \log f_{Y|X}(y_i | x_i; \beta, g)$$

be the normalized log-likelihood function. Define $P_n \ell_{\mathrm{profile}}(\beta; z_i) = P_n \ell\{\beta, g_n^{\sup}(\cdot; \beta); z_i\}$, where

$$P_n \ell\{\beta, g_n^{\sup}(\cdot; \beta); z_i\} = \sup_{g \in \mathcal{G}} P_n \ell(\beta, g; z_i). \tag{3.4}$$

In perfect analogy, define $P \ell_{\mathrm{profile}}(\beta; Z) = P \ell\{\beta, g^{\sup}(\cdot; \beta); Z\}$, where $g^{\sup}(\cdot; \beta)$ solves the optimization problem $\sup_{g \in \mathcal{G}} P \ell(\beta, g; Z)$.
We use “sup” instead of “max” because the functional space $\mathcal{G}$ that we are going to work with is not compact. Indeed, the optimization problem (3.4) involved in defining the function $P_n \ell_{\mathrm{profile}}(\beta; z_i)$ is nonstandard because the space $\mathcal{G}$ is a class of density functions. We defer the discussion of this issue until Section 3.3.2. In the sequel, we call $P_n \ell_{\mathrm{profile}}(\beta; z_i)$ and $P \ell_{\mathrm{profile}}(\beta; Z)$ the sample and population profile log-likelihood functions, respectively. Using the notion of the population profile log-likelihood function $P \ell_{\mathrm{profile}}(\beta; Z)$, the identified set $B_0$ can be equivalently written as the set of all maximizers of $P \ell_{\mathrm{profile}}(\beta; Z)$ within $\mathcal{B}$. We summarize this observation in Proposition 6.

Proposition 6. A parameter $\tilde\beta_0 \in \mathcal{B}$ belongs to the identified set $B_0$ in $\mathcal{G}$ if and only if $P \ell_{\mathrm{profile}}(\tilde\beta_0; Z) = \max_{\beta \in \mathcal{B}} P \ell_{\mathrm{profile}}(\beta; Z)$.

In addition, when the structural parameter space $\mathcal{B} \subset \mathbb{R}^K$ is bounded and the population profile log-likelihood function $P \ell_{\mathrm{profile}}(\beta; Z)$ is continuous in $\beta \in \mathcal{B}$, the identified set $B_0$ is compact.

Proposition 7. Suppose $\mathcal{B} \subset \mathbb{R}^K$ is bounded and $P \ell_{\mathrm{profile}}(\beta; Z)$ is continuous. Then the identified set $B_0$ of $\beta$ in $\mathcal{G}$ is a compact subset of $\mathcal{B}$.

3.3.1 Draw asymptotic marks given the sample profile log-likelihood function

Some standard concepts of set operations are needed to state our results. Let $d_H(x, A)$ be the Hausdorff distance between a point $x \in \mathbb{R}^K$ and a subset $A \subset \mathbb{R}^K$, that is, $d_H(x, A) = \inf_{x' \in A} \|x - x'\|$, where $\|\cdot\|$ is the Euclidean norm of $\mathbb{R}^K$. Notice that when $A$ is compact, the event $d_H(x, A) > 0$ is equivalent to $x \in A^c$. Here $A^c$ is the complement of the subset $A$. We use the following pseudo-distance to measure the discrepancy between two subsets $A, B \subset \mathbb{R}^K$:

$$d_\mu(A, B) := \mu(A \triangle B) = \mu(A \cap B^c) + \mu(A^c \cap B), \tag{3.5}$$

where $\mu$ is a positive $\sigma$-finite measure on $\mathbb{R}^K$ that is absolutely continuous with respect to the underlying probability law $P$ (see footnote 11). Without loss of generality, let $\mu$ be the Lebesgue measure throughout the paper.
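The two discrepancy measures above can be illustrated numerically. The sketch below makes two simplifying assumptions that are mine, not the text's: the set $A$ in $d_H(x, A)$ is represented by a finite cloud of points, and the sets in $d_\mu(A, B)$ are unions of cubic bins from a common lattice, each bin labeled by an integer vector, with $\mu$ the Lebesgue measure. The function names are hypothetical.

```python
import numpy as np

def point_set_distance(x, points):
    """d_H(x, A) for a set A represented by a finite array of points."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    diffs = points - np.asarray(x, dtype=float)
    return float(np.min(np.linalg.norm(diffs, axis=1)))

def sym_diff_measure(bins_a, bins_b, width):
    """Lebesgue measure of the symmetric difference of two unions of bins.

    bins_a and bins_b are non-empty lists of integer bin labels on the same
    lattice; every bin is a cube with side length `width`.
    """
    a = {tuple(row) for row in np.atleast_2d(bins_a).tolist()}
    b = {tuple(row) for row in np.atleast_2d(bins_b).tolist()}
    dim = len(next(iter(a | b)))
    # Bins in exactly one of the two sets, each with volume width**dim.
    return len(a ^ b) * width ** dim
```

With bin unions the symmetric difference reduces to counting labels, which is one reason the measure-based pseudo-distance is convenient for lattice estimators.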
Such a pseudo-distance is commonly used in set estimation, e.g. Chaudhuri et al. (1999), Baíllo and Cuevas (2006), and Mason and Polonik (2009). It can be verified that when both $A$ and $B$ are closed subsets of $\mathbb{R}^K$, $d_\mu(A, B) = 0$ if and only if $A = B$. (Footnote 11: Another distance function is the Hausdorff distance $d_H(A, B)$ between sets $A$ and $B$, which is used by Chernozhukov et al. (2007) and Beresteanu et al. (2011). We use the distance function (3.5) because the development of our asymptotic theory is easier with $d_\mu(A, B)$. One advantage of the Hausdorff distance $d_H(A, B)$ over $d_\mu(A, B)$ is that the former is more sensitive to the difference between the shapes of the two subsets.)

Proposition 6 says that a point of the identified set $B_0$ must be a maximizer of the population profile log-likelihood function $P \ell_{\mathrm{profile}}(\beta; Z)$. Since the sample profile log-likelihood function $P_n \ell_{\mathrm{profile}}(\beta; z_i)$ approximates $P \ell_{\mathrm{profile}}(\beta; Z)$, it is natural to use the set of points close to the maximum of $P_n \ell_{\mathrm{profile}}(\beta; z_i)$ as asymptotic marks on $B_0$. The following theorem validates this natural conjecture under the following conditions.

Assumption 9. (i) The parameter space $\mathcal{B}$ is a compact subset of $\mathbb{R}^K$; (ii) $P_n \ell_{\mathrm{profile}}(\beta; z_i)$ is second-order differentiable and uniformly continuous on $\mathcal{B}$; (iii) $P_n \ell_{\mathrm{profile}}(\beta; z_i)$ converges uniformly to $P \ell_{\mathrm{profile}}(\beta; Z)$ in probability, that is,

$$\max_{\beta \in \mathcal{B}} \left| P_n \ell_{\mathrm{profile}}(\beta; z_i) - P \ell_{\mathrm{profile}}(\beta; Z) \right| \to_p 0.$$

Theorem 9. Let $B_0$ be the identified set of $\beta_0$ in $\mathcal{G}$, and suppose Assumption 9 holds. Let $B_n$ be the set of all M-estimators $\beta_n$ satisfying

$$P_n \ell_{\mathrm{profile}}(\beta_n; z_i) \ge \max_{\beta \in \mathcal{B}} P_n \ell_{\mathrm{profile}}(\beta; z_i) - \omega_n, \quad \text{with } \omega_n = o_p(1) \text{ and } \omega_n \ge 0. \tag{3.6}$$

Then $d_H(\beta_n, B_0) \to_p 0$ for all $\beta_n \in B_n$ and $d_\mu(B_n, B_0) \to_p 0$ as $n \to \infty$.

The theorem has three implications. First, any M-estimator will eventually fall within the identified set ($d_H(\beta_n, B_0) \to_p 0$). Second, the set of all M-estimators will coincide with the identified set as $n \to \infty$ ($d_\mu(B_n, B_0) \to_p 0$).
Third, the result that $d_\mu(B_n, B_0) \to_p 0$ implies that when the true value $\beta_0$ is not point identified, the probability limit of an M-estimator is a random variable whose support is the identified set $B_0$.

Remark 12. Given Theorem 9, it is tempting to collect a set of M-estimates $b_1, \dots, b_m$ that nearly maximize the sample profile log-likelihood function $P_n \ell_{\mathrm{profile}}(\beta; z_i)$ in the sense of satisfying (3.6), and then implement the AH set estimator $\hat B_n$ in (3.3) with these estimates. Though Theorem 9 guarantees that $b_1, \dots, b_m$ will all fall within the identified set $B_0$ as $n \to \infty$, it will be seen, when we study the asymptotic properties of the AH set estimator $\hat B_n$, that the validity of the set estimator $\hat B_n$ requires that the asymptotic marks $b_1, \dots, b_m$ be weakly correlated. The intuition is that when the asymptotic marks are highly correlated, they are likely to be trapped in a proper subset of the identified set $B_0$. Hence the resulting set estimate will be asymptotically biased.

Our next result shows that by controlling the cooling schedule of the standard simulated annealing (SA) algorithm (Dekkers and Aarts, 1991 and Henderson et al., 2003), the SA algorithm provides an efficient way of obtaining uncorrelated asymptotic marks. We say it is efficient in the sense that enough asymptotic marks can be obtained by estimating the model only once. We outline the SA algorithm for our problem as follows.

The SA algorithm is a nested-loop algorithm, that is, each iteration of the outer loop is itself a loop. Let $j$ index the $j$th iteration of the outer loop. The outer loop gradually decreases the temperature parameter $\tau_n^j$ as $j$ proceeds; the role of the temperature will become clear in the inner loop. Given the temperature parameter $\tau_n^j$, the inner loop generates a sequence of random draws from the following Markov chain. Let $\beta^{j,k} \in \mathcal{B}$ be the current guess of the maximizer of $P_n \ell_{\mathrm{profile}}(\beta; z_i)$.
The index k in j;k is the iteration index for the inner loop. A parameter value b2B is then generated from a proposal distribution bj j;k . The new guess b2B is accepted with the probability A j;k ;b j n , where A j;k ;b j n := min ( 1; exp ( P n ` profile (b;z i )P n ` profile j;k ;z i j n )) : (3.7) If b is accepted, let j;k+1 =b; otherwise j;k+1 = j;k . Denote p bj j;k ; j n = bj j;k A j;k ;b j n . The inner loop gives rise to a Markov chain with the following transition probability of transforming j;k 2B into an element out of BB, P Bj j;k ; j n = 8 > > < > > : b2B p bj j;k ; j n db; for j;k = 2B; b2B p bj j;k ; j n db + 1 b= 2B p bj j;k ; j n db ; for j;k 2B: As a summary, Algorithm 3.2 shows the pseudo code of this algorithm. Let q ; j n be the density of the stationary distribution of the Markov chain at the j-th outer iteration, which exits by the following result. Theorem 10. Let j n 1 j=0 be a monotone decreasing sequence of the temperature parameters in the simulated annealing algorithm, and let 1 n = lim j!1 j n . Suppose Assumption 9 hold, and max 2B P n ` profile (;z i )P` profile (;Z) =O p n ; > 0: (3.8) If 1 n =O n 0 with 0 <, and the proposal distribution satisfies that (b 0 jb) = (bjb 0 ), the Markov chain generated by the SA algorithm with the cooling schedule j n 1 j=0 has a unique stationary distribution q (; 1 n ) as j!1, which converges uniformly to the uniform distribution on the identified set B 0 . In the remainder of this paper, we will call the SA algorithm with the temperature parameters satisfying the conditions of Theorem 10 the constrained simulated annealing (CSA) algorithm. Let 1 n ;:::; mn n be m n draws after a sufficiently long burn-in period from the Markov chain specified by the CSA algorithm. One can easily make these m n draws uncorrelated by storing only every l-th point of the output from the markov chain after the burn-in period, where l is some big number. 
(To avoid technical difficulties, we say the $m_n$ draws are uncorrelated when $l$ is big, though they are only weakly correlated.) Moreover, Theorem 10 says $\beta_n^1, \dots, \beta_n^{m_n}$ are uniformly distributed on the identified set $B_0$ as $n \to \infty$. Hence they can be used as asymptotic marks to implement the AH set estimator $\hat B_n$ in (3.3), provided that we have the sample profile log-likelihood function $P_n \ell_{\mathrm{profile}}(\beta; z_i)$. In the next subsection, we study the estimation of $P_n \ell_{\mathrm{profile}}(\beta; z_i)$.

Figure 3.2: Constrained Simulated Annealing Algorithm for Drawing Marks
1: Select initial parameters $b \in \mathcal{B}$;
2: Select a temperature cooling schedule $\{\tau_n^j\}$ satisfying the conditions of ??;
3: Select a repetition schedule $\{M_j\}$ that defines the number of (inner loop) iterations at each temperature $\tau_n^j$;
4: Define an empty list blist. Set $t = 1$ and blist$[1] = b$;
5: for $\tau$ in $\{\tau_n^j\}$ do
6:   for $k$ in $1 : M_j$ do
7:     Generate $b'$ from the proposal distribution $\pi(b' \mid b)$;
8:     Calculate the acceptance probability $A$ in (3.7);
9:     Draw a Bernoulli number $D$ with the probability of 1 being $A$;
10:    Let $b = D b' + (1 - D) b$;
11:    $t \leftarrow t + 1$;
12:    blist$[t] = b$;
13:  end for
14: end for

Remark 13. The central component of Theorem 10 is (3.8), where we need to know the convergence rate of $P_n \ell_{\mathrm{profile}}(\beta; z_i)$ toward $P \ell_{\mathrm{profile}}(\beta; Z)$. Indeed, the proof of Theorem 10 does not depend on the particular form of the criterion function $P_n \ell_{\mathrm{profile}}(\beta; z_i)$. Suppose we have an approximation $P_n \hat\ell_{\mathrm{profile}}(\beta; z_i)$ of the true sample profile log-likelihood function $P_n \ell_{\mathrm{profile}}(\beta; z_i)$. As long as $|P_n \hat\ell_{\mathrm{profile}}(\beta; z_i) - P \ell_{\mathrm{profile}}(\beta; Z)|$ is $o_p(1)$ and we know its convergence rate, Theorem 10 still applies, and we can produce asymptotic marks based on the approximation $P_n \hat\ell_{\mathrm{profile}}(\beta; z_i)$.

Remark 14. Although we describe the set estimator under the framework of the semiparametric mixture model, the proposed method can be adapted to compute the identified set of many other partially identified models.
Suppose the identified set $B_0$ is defined as the set of maximizers of a population criterion function $Q(\beta)$, which is $P \ell_{\mathrm{profile}}(\beta; Z)$ in our case. As long as we have a uniformly consistent estimator $Q_n(\beta)$ of $Q(\beta)$, and we know the convergence rate of $|Q_n(\beta) - Q(\beta)|$, the CSA algorithm can be used to produce asymptotic marks, and the AH set estimator can be used to estimate the identified set $\{\beta_0 : \beta_0 \in \arg\max_{\beta \in \mathcal{B}} Q(\beta)\}$. In particular, the identified set in Chernozhukov et al. (2007) can be computed by our approach. For example, the convergence rates of the empirical criterion functions in the moment equalities and inequalities models are $O_p(n^{-1/2})$ (see pp. 1262 and 1266 of their paper).

3.3.2 Sieve estimation of the sample profile log-likelihood function

Recall that the empirical profile criterion function is defined by $P_n \ell_{\mathrm{profile}}(\beta; z_i) = P_n \ell\{\beta, g_n^{\sup}(\cdot; \beta); z_i\}$ with $P_n \ell\{\beta, g_n^{\sup}(\cdot; \beta); z_i\} = \sup_{g \in \mathcal{G}} P_n \ell(\beta, g; z_i)$. So the main problem is to solve

$$\sup_{g \in \mathcal{G}} P_n \ell(\beta, g; z_i) \quad \text{given any } \beta \in \mathcal{B}. \tag{3.9}$$

If $\mathcal{G}$ is a parametric family of distributions, (3.9) becomes trivial. The source of the difficulty in solving problem (3.9) is the lack of structure in $\mathcal{G}$. The following assumption restricting the functional space $\mathcal{G}$ will be maintained in the remainder of this paper.

Assumption 10. (i) The set $\mathcal{G}$ consists of the density functions $g$ of all absolutely continuous distributions that satisfy the following conditions: (a) there exists a fixed finite interval $(a, b)$ such that $g(s) \ge 0$ a.e. for $s \in (a, b)$ and $g(s) = 0$ for $s \notin (a, b)$; (b) $\int_a^b g(s)\, ds = 1$; (c) $\log g \in L^2(a, b)$, where $L^2(a, b)$ is the set of all square (Lebesgue) integrable real-valued functions on $(a, b)$; (ii) The true mixing density $g_0 \in \mathcal{G}$.

A useful mathematical result is that the class of densities $\mathcal{G}$ specified in Assumption 10 is a separable Hilbert space over the real line when it is equipped with two special algebraic composition laws (Egozcue et al., 2006).
The gain from this result is that the space $\mathcal{G}$ then has a countable basis, and every density in $\mathcal{G}$ can be uniquely determined by a vector from the $\ell^2$ space through its Fourier expansion. Here the $\ell^2$ space is the set of all square summable sequences of real numbers. From this perspective, the space $\mathcal{G}$ can be understood as a big parametric family of distributions, and the maximization problem (3.9) seems to be simple.

The class of densities $\mathcal{G}$ as a separable Hilbert space

Some new notation is needed in the following discussion. The notation we use here follows that of Egozcue et al. (2006). Let $\mathcal{G}_B$ be the class of all bounded probability density functions on $(a, b)$, $0 < g(s) < \infty$ for $a < s < b$. Clearly, $\mathcal{G}_B$ is a subset of $\mathcal{G}$. The class $\mathcal{G}_B$ combined with two algebraic composition laws, perturbation ($\oplus$) and power transformation ($\odot$), is shown to be a vector space (Theorem 3 of Egozcue, Díaz-Barrero, and Pawlowsky-Glahn, 2006). Let $g_1, g_2 \in \mathcal{G}_B$; perturbation ($\oplus$) is defined by

$$(g_1 \oplus g_2)(s) := \frac{g_1(s)\, g_2(s)}{\int_a^b g_1(\xi)\, g_2(\xi)\, d\xi}. \tag{3.10}$$

Let $g \in \mathcal{G}_B$ and $c \in \mathbb{R}$; power transformation ($\odot$) is defined by

$$(c \odot g)(s) := \frac{g^c(s)}{\int_a^b g^c(\xi)\, d\xi}. \tag{3.11}$$

In addition, we define $g_1 \ominus g_2 = g_1 \oplus ((-1) \odot g_2)$, that is,

$$(g_1 \ominus g_2)(s) = \frac{g_1(s) / g_2(s)}{\int_a^b g_1(\xi) / g_2(\xi)\, d\xi}. \tag{3.12}$$

An inner product $\langle \cdot, \cdot \rangle_A$ can be defined for the vector space $\mathcal{G}_B$. Let $g_1, g_2 \in \mathcal{G}_B$; the inner product $\langle g_1, g_2 \rangle_A$ is defined by

$$\langle g_1, g_2 \rangle_A := \frac{1}{2} \int_a^b \int_a^b \log\frac{g_1(\xi_1)}{g_1(\xi_2)}\, \log\frac{g_2(\xi_1)}{g_2(\xi_2)}\, d\xi_1\, d\xi_2. \tag{3.13}$$

Given the inner product $\langle \cdot, \cdot \rangle_A$, a norm $\|g\|_A = \sqrt{\langle g, g \rangle_A}$ of $g \in \mathcal{G}_B$ and a distance function $d_A(g_1, g_2) = \|g_1 \ominus g_2\|_A$ can be defined immediately. Hereafter, we will refer to $d_A(g_1, g_2)$ as the Aitchison distance function. The important result we are going to use is summarized in Theorem 11 below. Its proof follows from Theorems 7, 8, 13 and 14 of Egozcue, Díaz-Barrero, and Pawlowsky-Glahn (2006), and we omit the proof here.

Theorem 11. (i) The class of densities $\mathcal{G}$ defined in Assumption 10 is the closure of $\mathcal{G}_B$, i.e. $\mathcal{G} = \overline{\mathcal{G}_B}$.
(ii) The space $(\mathcal{G}, \langle \cdot, \cdot \rangle_A)$ is a separable Hilbert space, where $\langle \cdot, \cdot \rangle_A$ is the inner product defined in (3.13). (iii) Suppose $\{\varphi_j\}_{j \ge 0}$ is a Hilbert basis of $L^2(a, b)$, and let $\{\psi_j\}_{j \ge 0}$ be defined by

$$\psi_j(s) = \frac{\exp\{\varphi_j(s)\}}{\int_a^b \exp\{\varphi_j(\xi)\}\, d\xi}.$$

Then $\{\psi_j\}_{j \ge 0}$ is an orthonormal basis of the Hilbert space $(\mathcal{G}, \langle \cdot, \cdot \rangle_A)$. (iv) Any density function $g \in \mathcal{G}$ can be uniquely written as

$$g(s) = \bigoplus_{j=0}^{\infty} \lambda_j \odot \psi_j(s),$$

where $\lambda_j = \langle g, \psi_j \rangle_A$ and $\sum_{j=0}^{\infty} |\lambda_j|^2 < \infty$. Here the operations “$\oplus$” and “$\odot$” are defined in (3.10) and (3.11), respectively.

Theorem 11.(iv) is the Fourier expansion of any density function $g \in \mathcal{G}$. With a fixed orthonormal basis, which is easily available according to part (iii) of the theorem, a density function $g$ is “parameterized” by an infinite-dimensional vector $\{\lambda_j\}_{j \ge 0}$. It should be noted that $\mathcal{G}$ is not a compact space, because it is not bounded. That is why we use “sup” rather than “max” when stating the maximization problem over $\mathcal{G}$. Finally, a remark about the necessity of $\mathcal{G}_B$ is that the two operations $\oplus$ and $\odot$ are well-defined and closed only within $\mathcal{G}_B$; these two operations are not well-defined and closed in $\mathcal{G}$.

Based on Theorem 11.(iv), we know that the maximization problem (3.9) is equivalent to the following problem:

$$\sup_{g \in \mathcal{G}} P_n \ell(\beta, g; z_i) = \sup_{\{\lambda_j\}_{j \ge 0} \in \ell^2} P_n \ell\left( \beta, \bigoplus_{j=0}^{\infty} \lambda_j \odot \psi_j(s); z_i \right). \tag{3.14}$$

Again, the $\ell^2$ space is not compact. But problem (3.14) is much more manageable than the original problem (3.9). Though the equivalence (3.14) holds for any basis $\{\psi_j\}_{j \ge 0}$, we use a particular basis generated from Legendre polynomials in this paper. Let $\{P_j\}_{j \ge 0}$ be the ordinary Legendre polynomials in $L^2$ space,

$$P_j(z) = \sum_{k=0}^{j} \binom{j}{k} \binom{-j-1}{k} \left( \frac{1 - z}{2} \right)^k, \quad j \ge 0.$$

Then $\left\{ \varphi_j(s) = \sqrt{(2j+1)/(b-a)}\; P_j[2(s-a)/(b-a) - 1] \right\}_{j \ge 0}$ forms a Hilbert basis of $L^2(a, b)$. By Theorem 11.(iii), the sequence

$$\psi_j(s) = \frac{\exp\left[ \sqrt{\frac{2j+1}{b-a}}\, P_j\!\left( \frac{2(s-a)}{b-a} - 1 \right) \right]}{\int_a^b \exp\left[ \sqrt{\frac{2j+1}{b-a}}\, P_j\!\left( \frac{2(\xi-a)}{b-a} - 1 \right) \right] d\xi}, \quad j \ge 0, \tag{3.15}$$

is an orthonormal basis of the Hilbert space $\mathcal{G}$.
It can be shown that for any $g \in \mathcal{G}$, there exists a vector $\{\lambda_j\}_{j \ge 0}$ such that

$$g(s) = \frac{\exp\left\{ \sum_{j=0}^{\infty} \lambda_j \sqrt{\frac{2j+1}{b-a}}\, P_j\!\left( \frac{2(s-a)}{b-a} - 1 \right) \right\}}{\int_a^b \exp\left\{ \sum_{j=0}^{\infty} \lambda_j \sqrt{\frac{2j+1}{b-a}}\, P_j\!\left( \frac{2(\xi-a)}{b-a} - 1 \right) \right\} d\xi}. \tag{3.16}$$

Hereafter, we will call $\{\psi_j\}_{j \ge 0}$ in (3.15) the Legendre basis of the Hilbert space $\mathcal{G}$. The Legendre basis has good properties that will be used in deriving the asymptotic properties of the AH set estimator $\hat B_n$.

The sieve estimation of marks

The parameter $g$ or $\{\lambda_j\}_{j \ge 0}$ to be maximized over in problem (3.14) is infinite dimensional. A standard way to treat this kind of infinite-dimensional maximization problem is the sieve method (Geman and Hwang, 1982). Let $\mathcal{G}_n$ be the sieve parameter space, a sequence of increasing subsets of the parameter space $\mathcal{G}$ growing dense in $\mathcal{G}$ as $n \to \infty$. In particular, $\mathcal{G}_n$ is defined as follows:

$$\mathcal{G}_n := \left\{ g(s) = \bigoplus_{j=0}^{J_n - 1} \lambda_j \odot \psi_j(s) : (\lambda_0, \dots, \lambda_{J_n - 1}) \in \mathbb{R}^{J_n} \text{ and } \sum_{j=0}^{J_n - 1} \lambda_j^2 < \infty \right\}. \tag{3.17}$$

The dimension of the subspace $\mathcal{G}_n$ is $J_n$.

Proposition 8. The sieve parameter space $\mathcal{G}_n$ defined in (3.17) is a compact subspace of $\mathcal{G}$ for any $J_n < \infty$.

Denote by $g_n$ the projection of a density function $g \in \mathcal{G}$ onto the subspace $\mathcal{G}_n$. The following two theorems give the error bound of approximating $g$ with $g_n$.

Theorem 12. Let $g(s) \in \mathcal{G}$, $m(z) = \log g[a + (b-a)(z+1)/2]$, $-1 \le z \le 1$, and denote by $\|m\|_T = \int_{-1}^{1} |m'(z)| / \sqrt{1 - z^2}\, dz$ the Chebyshev-weighted seminorm. Denote the Fourier expansion of the density function $g(s)$ by

$$g(s) = \bigoplus_{j=0}^{\infty} \lambda_j \odot \psi_j(s),$$

where $\{\psi_j\}_{j \ge 0}$ is the Legendre basis of $\mathcal{G}$. If $m, m^{(1)}, \dots, m^{(p-1)}$ are absolutely continuous on $[-1, 1]$ and $\|m^{(p)}\|_T = V_p < \infty$ for some $p \ge 1$, then for each $J_n > p + 1$,

$$|\lambda_{J_n}| \le \sqrt{\frac{b-a}{2 J_n + 1}}\; \frac{V_p}{(J_n - 1/2)(J_n - 3/2) \cdots (J_n - (2p-1)/2)}\; \sqrt{\frac{\pi}{2 (J_n - p - 1)}}.$$

Theorem 13. Suppose the conditions of Theorem 12 hold, and denote by $g_n$ the projection of $g$ onto the subspace $\mathcal{G}_n$ defined by (3.17). For each $J_n > p + 1$,

$$d_A(g, g_n) \le \frac{V_p}{(p-1)(J_n - 1/2)(J_n - 3/2) \cdots (J_n - (2p-3)/2)}\; \sqrt{\frac{\pi (b-a)}{2 (J_n - p)}}.$$

As $n \to \infty$, the dimension of $\mathcal{G}_n$ will also grow to infinity.
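To make the finite-dimensional parameterization in (3.16) and (3.17) concrete, the following sketch evaluates a sieve density from a coefficient vector. It is an illustration under stated assumptions rather than the thesis code: the names `legendre_phi`, `sieve_density`, and `project_log_density` are hypothetical, integrals are approximated by Gauss-Legendre quadrature, and the coefficients in the example are obtained by an ordinary $L^2(a, b)$ projection of the log density, which is a shortcut of mine and not the Aitchison inner product (3.13). The example uses a truncated normal density, whose log density is a quadratic polynomial and therefore lies in the span of the first three Legendre polynomials.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def legendre_phi(s, j, a, b):
    """phi_j(s) = sqrt((2j+1)/(b-a)) * P_j(2(s-a)/(b-a) - 1)."""
    z = 2.0 * (np.asarray(s, dtype=float) - a) / (b - a) - 1.0
    coef = np.zeros(j + 1)
    coef[j] = 1.0                                   # selects P_j in legval
    return np.sqrt((2 * j + 1) / (b - a)) * leg.legval(z, coef)

def _quadrature(a, b, n_quad):
    """Gauss-Legendre nodes and weights rescaled from [-1, 1] to [a, b]."""
    z, w = leg.leggauss(n_quad)
    return a + (b - a) * (z + 1.0) / 2.0, w * (b - a) / 2.0

def sieve_density(s, lam, a, b, n_quad=64):
    """Density exp{sum_j lam_j phi_j(s)} normalized to integrate to 1 on (a, b)."""
    def log_kernel(u):
        return sum(l * legendre_phi(u, j, a, b) for j, l in enumerate(lam))
    nodes, weights = _quadrature(a, b, n_quad)
    normalizer = np.sum(weights * np.exp(log_kernel(nodes)))
    return np.exp(log_kernel(s)) / normalizer

def project_log_density(log_g, n_terms, a, b, n_quad=64):
    """L2(a, b) coefficients of log g on phi_0, ..., phi_{n_terms - 1}."""
    nodes, weights = _quadrature(a, b, n_quad)
    return np.array([np.sum(weights * log_g(nodes) * legendre_phi(nodes, j, a, b))
                     for j in range(n_terms)])

# Truncated normal on [-2, 2] with mean 1 and unit variance (unnormalized log density).
def log_trunc_normal(s):
    return -0.5 * (np.asarray(s, dtype=float) - 1.0) ** 2

grid = np.linspace(-2.0, 2.0, 9)
lam3 = project_log_density(log_trunc_normal, 3, -2.0, 2.0)
approx = sieve_density(grid, lam3, -2.0, 2.0)
```

The coefficient on $\varphi_0$ only shifts the log density by a constant and is absorbed by the normalization, so it has no effect on the resulting density.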
Theorem 13 says one can ignore the very high orders in the Fourier expansion. For example, a truncated normal distribution on $[a, b]$ can be exactly represented by the Fourier expansion of order 3, i.e. $J_n = 3$, because $V_1 = 0$ in this case. More precisely, Theorem 13 says the approximation error $d_A(g, g_n)$ is $O(J_n^{-(p - 1/2)})$. If $J_n = O(n^{\kappa})$, then $d_A(g, g_n) = O(n^{-\kappa (p - 1/2)})$. This result will be useful in the next section, when we study the convergence rate of the AH set estimator $\hat B_n$. Lemma C.2 in Appendix C also reports the $L_2$ and $L_1$ distance between $g$ and $g_n$.

Let $\hat g_n(\cdot; \beta) = \arg\max_{g \in \mathcal{G}_n} P_n \ell(\beta, g; z_i)$ and $P_n \hat\ell_{\mathrm{profile}}(\beta; z_i) = P_n \ell\{\beta, \hat g_n(\cdot; \beta); z_i\}$. As $n \to \infty$, $\mathcal{G}_n$ becomes a dense subset of $\mathcal{G}$, and the sieve estimate $P_n \hat\ell_{\mathrm{profile}}(\beta; z_i)$ will be “close” to $P_n \ell_{\mathrm{profile}}(\beta; z_i)$. In the sequel, we call $P_n \hat\ell_{\mathrm{profile}}(\beta; z_i)$ the sieve approximate sample profile log-likelihood function. The convergence rate of $P_n \hat\ell_{\mathrm{profile}}(\beta; z_i)$ will be derived in the next section. Then the constrained SA algorithm can be used to draw asymptotic marks by substituting $P_n \hat\ell_{\mathrm{profile}}(\beta; z_i)$ for $P_n \ell_{\mathrm{profile}}(\beta; z_i)$. Given the asymptotic marks, the AH set estimator $\hat B_n$ in (3.3) is ready to be implemented once its tuning parameters, the number of marks and the bin width, are specified.

3.4 Asymptotic Theory

In this section, we study the asymptotic properties of the AH set estimator $\hat B_n$ based on the asymptotic marks obtained from the CSA algorithm with the objective function $P_n \hat\ell_{\mathrm{profile}}(\beta; z_i)$. We establish the consistency of $\hat B_n$ and derive the rate of convergence in terms of the pseudo-distance $d_\mu(\hat B_n, B_0) = \mu(\hat B_n \cap B_0^c) + \mu(\hat B_n^c \cap B_0)$. We also study the choice of the order of the Fourier expansion, $J_n$, the number of marks, $m_n$, and the bin width, $w_{m_n}$, in implementing the AH estimation.

Denote $\theta = (\beta, g)$ and $\Theta = \mathcal{B} \times \mathcal{G}$. The parameter space $\Theta$ is a metric space.
Define a distance between 1 ; 2 2 by d ( 1 ; 2 ) =d (( 1 ;g 1 ); ( 2 ;g 2 )) = q d ( 1 2 ) 2 +d A (g 1 ;g 2 ) 2 ; where d (;) is the Euclidean distance function, and d A (;) is the Aitchison distance function derived from the inner product (3.13). Let n =BG n be the sieve parameter space, a sequence of increasing subsets of the parameter space =BG growing dense in as n!1. Recall that the dimension ofG n is denoted by J n . Let ^ n = ^ n ; ^ g n be the sieve M-estimator, that is ^ n = arg max =(;g)2n P n ` (;g;z i ): (3.18) Define 0 = 0 ; g 0 2 :P` 0 ; g 0 ;Z = max (;g)2BG P` (;g;Z) ; G 0 = g 0 2G : there exists a 0 2B 0 such that 0 ; g 0 2 0 ; for notational simplicity. We assume the following regularity conditions. Assumption 11. (i) For any g2G, the log-likelihood function ` (;g;Z) is continuous for 2B, and it 86 3.4. Asymptotic Theory is dominated by an integrable function. (ii) For any 2B, the log-likelihood function ` (;g;Z) is twice Fréchet differentiable for g2G. (iii) Denote @ 0 the boundary of the set 0 . For 0 = 0 ; g 0 2@ 0 , the Hessian matrix E ( @ 2 ` 0 ; g 0 ;Z @@ 0 ) is negative definite. (iv) The first-order derivatives of ` (;g;Z) are bounded on : sup 2 k@` (;g;Z)=@k<1 and sup 2 k@` (;g;Z)=@gk<1: (v) There is some p 1 such that for all g 0 2G 0 , m 0 (z) = log g 0 fa + (ba) (z + 1)=2g,1 z 1, m (1) 0 ;:::; m (p1) 0 are absolutely continuous and the Chebyshev-weighted seminormk m (p) 0 k T <1. In addition, g 0 is Höolder continuous with p-th order, that sup x1;x22[a;b] fj g 0 (x 1 ) g 0 (x 2 )j=jx 1 x 2 j p g< L for some constant L. Theorem 14. Let J n =O (n ), where (2p + 1) 1 < < (2p 1) 1 with p being the smoothness parameter defined in Assumption 11.??. Let ^ n be the sieve M-estimator as defined in (3.18). Suppose Assumption 9-11 hold, we have d H ^ n ; 0 =O p n minf 1 2 ; p(2p1) 2p+1 g : Corollary 1. Suppose the conditions of Theorem 14 hold. 
Let ^ n = ^ n ; ^ g n be the sieve M-estimator; let P n ^ ` profile (;z i ) =P n `f; ^ g n (;) ;z i g, where ^ g n (;) = arg max g2Gn P n ` (;g;z i ). Then max 2B P n ^ ` profile (;z i )P` profile (;Z) =O p n minf 1 2 ; p(2p1) 2p+1 g : Corollary 2. Suppose the conditions of Theorem 14 hold. Let n be a draw from the constrained simulated annealing algorithm with the the temperature parameter 1 n =O n 0 , where 0 < min n 1 2 ; p(2p1) 2p+1 o , and sample profile likelihood function being substituted with P n ^ ` profile (;z i ). We have d H n ;B 0 =O p n 1=2 , where B 0 is the identified set of 0 . Corollary 1 derives the convergence rate of the sieve approximate sample profile log-likelihood function P n ^ ` profile (;z i ), which is necessary for using the constrained simulated annealing algorithm as we remarked 87 3.5. Numerical Examples in Remark 13. According to Corollary 2, the convergence rate of the asymptotic marks is p n . Given these two results, we are in a position to state the main result – the consistency of the AH set estimator of B 0 . Theorem 15. Let the asymptotic hit estimator ^ B n be defined by (3.3); let 1 n ;:::; mn n be m n asymptotic marks obtained by using the CSA algorithm based on the sieve the sample profile log-likelihood function P n ^ ` profile (;z i ). Suppose 11-Assumption 9 hold. (i) If the bin width w mn ! 0, m n w K mn !1, m n = O n , 0 < < 1, then d ^ B n ;B 0 ! p 0. Recall that K is the dimension of the structural parameters 2BR K . (ii) If m n = O n , 0 < < 1, and w mn = 1=K mn m n where 0 < < 1=K and mn is the Lebesgue measure of the smallest rectangle with sides parallel to the coordinates axes and containing all the marks 1 n ;:::; mn n , d ^ B n ;B 0 = O p n . And this is the best rate ^ B n can reach under the Lebesgue measure . Remark 15. The optimal choice of the number of marks, m n , depends on the sample size, n. In particular, Theorem 15 says m n =O n , 0< < 1. At the first glance, this looks counter-intuitive. 
More marks seem to be better for estimating the shape. The problem is that for a smaller sample, the estimation error is large. More marks then produce more marks that fall outside the identified set, so the estimated set differs more from the identified set.

3.5 Numerical Examples

In this section, I show the power of the developed estimation method by applying it to some simulation examples. In particular, I examine its performance when the underlying model is point identified (the logit model in Section 3.5.1) and set identified (the probit model in Section 3.5.2). Moreover, the method can be used to estimate a linear regression with endogenous variables, for which no instrumental variables are easily available.

3.5.1 Point identification case: a logit example

The outcome $Y_i = (Y_{i1}, \dots, Y_{iT})$ is generated from

$$P(Y_{it} = 1 \mid X_{it}, C_{it}, \alpha_i) = \frac{\exp(X_{it} \beta_{1,0} + C_{it} \beta_{2,0} + \alpha_i)}{1 + \exp(X_{it} \beta_{1,0} + C_{it} \beta_{2,0} + \alpha_i)}, \quad i = 1, \dots, n;\ t = 1, \dots, T.$$

[Figure 3.3: Estimation of Identified Set in Logit Models. Two panels with axes $\beta_1$ and $\beta_2$: the full parameter space from -4 to 4, and a zoomed view from 0.6 to 1.4.]

Here $X_{it}$ is drawn from the standard normal distribution, and $C_{i1} = \cdots = C_{iT} = C_i$ is drawn from a Bernoulli distribution with probability $1/2$ of being 1. The fixed effect $\alpha_i$ follows

$$\alpha_i \mid C_i \sim \mathrm{TrunN}_{[-2, 2]}(2 C_i - 1, 1),$$

which is a truncated normal distribution on $[-2, 2]$. Note that the modes of $\alpha_i \mid C_i = 0$ and $\alpha_i \mid C_i = 1$ are $-1$ and $1$, respectively. I study the case of big $n$ and small $T$, so $T$ is fixed at 4 in all simulation examples reported here. The true value is $(\beta_{1,0}, \beta_{2,0}) = (1, 1)$. Note that the logit model is the only point identified model when the outcome is binary (Chamberlain, 2010). Therefore, the identified set $B_0 = \{(1, 1)\}$ is a single point in $\mathbb{R}^2$ here, and $\hat B_n$ should be expected to resemble a point. In practice, I set the number of marks $m_n = \sqrt{n}$ and the bin width $w_n = \mu_{m_n}^{1/2} m_n^{-1/3}$ according to Theorem 15.(ii) with $K = 2$ and $\gamma = 1/(K + 1)$.
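The simulation design and the bin-width rule just described can be sketched as follows. This is a minimal sketch under assumptions of my own: the truncated normal draws use simple rejection sampling, the bounding-box measure stands in for $\mu_{m_n}$, and the function names (`draw_truncated_normal`, `simulate_logit_panel`, `bin_width`) and the seed are hypothetical rather than taken from the thesis.

```python
import numpy as np

def draw_truncated_normal(mean, sd, low, high, rng):
    """Rejection sampler for N(mean_i, sd^2) truncated to [low, high]."""
    mean = np.asarray(mean, dtype=float)
    out = np.empty_like(mean)
    todo = np.ones(mean.shape, dtype=bool)
    while todo.any():
        idx = np.flatnonzero(todo)                 # entries still to be drawn
        cand = rng.normal(mean[idx], sd)
        ok = (cand >= low) & (cand <= high)
        out[idx[ok]] = cand[ok]
        todo[idx[ok]] = False
    return out

def simulate_logit_panel(n, T=4, beta=(1.0, 1.0), seed=0):
    """Simulate the logit panel design: returns (Y, X, C, alpha)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, T))
    C = rng.integers(0, 2, size=n)                 # Bernoulli(1/2), constant over t
    alpha = draw_truncated_normal(2 * C - 1, 1.0, -2.0, 2.0, rng)
    index = beta[0] * X + beta[1] * C[:, None] + alpha[:, None]
    prob = 1.0 / (1.0 + np.exp(-index))
    Y = (rng.uniform(size=(n, T)) < prob).astype(int)
    return Y, X, C, alpha

def bin_width(marks, gamma):
    """w = mu^(1/K) * m^(-gamma), with mu the volume of the marks' bounding box."""
    marks = np.atleast_2d(np.asarray(marks, dtype=float))
    m, K = marks.shape
    mu = np.prod(marks.max(axis=0) - marks.min(axis=0))
    return mu ** (1.0 / K) * m ** (-gamma)

Y, X, C, alpha = simulate_logit_panel(250)
```

For $n = 250$ the rule $m_n = \sqrt{n}$ gives about 16 marks, and with $K = 2$ the exponent used in `bin_width` would be $1/3$.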
For a sample with $n = 250$, the sieve MLE took about 9 minutes, and the determination of the marks took 25 minutes on a dual-core iMac with a 3.06 GHz processor. The developed algorithm is based on parallel computing, so the number of cores of a computer matters. The $d_n$ used is $n^{3/4}$. Details about the computation are explained in the supplemental material. A careful Monte Carlo study would take days, which time constraints did not allow, so I present only one trial here to show the result. The left-hand side of Figure 3.3 is the estimate of the identified set when $\mathcal{B} = [-5, 5]^2$. To show the H-shape estimate, I zoom in on the graph, which gives the right-hand side of Figure 3.3. The graph shows that for the logit model, the estimate of the identified set $B_0$ shrinks nearly to a point at the true value $(1, 1)$.

3.5.2 Set identification case: a probit example

The outcome $Y_i = (Y_{i1}, \dots, Y_{iT})$ is generated from

$$P(Y_{it} = 1 \mid X_{it}, C_{it}, \alpha_i) = \Phi(X_{it} \beta_1 + C_{it} \beta_2 + \alpha_i), \quad i = 1, \dots, n;\ t = 1, \dots, T.$$

The data generating process for the covariates is the same as in the previous example. Figure 3.4 illustrates the estimation results. Again, the right-hand side magnifies the left-hand side estimate. A probit model is not point identified. Thus, the estimate of the identified set is a nontrivial set. Interestingly, the result shows that the signs of $\beta_1$ and $\beta_2$ are identified, and the identified set is not too big, especially for $\beta_1$.

[Figure 3.4: Estimation of Identified Set in Probit Models. Two panels with axes $\beta_1$ and $\beta_2$: the full parameter space from -4 to 4, and a zoomed view from 0.0 to 3.0.]

Bibliography

Ackerberg, D., C. L. Benkard, S. Berry, and A. Pakes (2007). Econometric tools for analyzing market outcomes. Handbook of econometrics 6, 4171–4276.
Aguirregabiria, V. (2005, December). Another look at the identification of dynamic discrete decision processes: An application to retirement behavior. Working paper, Boston University.
Aguirregabiria, V. (2010).
Another look at the identification of dynamic discrete decision processes: An application to retirement behavior. Journal of Business & Economic Statistics 28(2), 201–218.
Aguirregabiria, V. and P. Mira (2002). Swapping the nested fixed point algorithm: A class of estimators for discrete Markov decision models. Econometrica 70(4), 1519–1543.
Aguirregabiria, V. and J. Suzuki (2014). Identification and counterfactuals in dynamic models of market entry and exit. Quantitative Marketing and Economics 12(3), 267–304.
Aitchison, J. and C. G. Aitken (1976). Multivariate binary discrimination by the kernel method. Biometrika 63(3), 413–420.
Arcidiacono, P. and R. A. Miller (2015). Identifying dynamic discrete choice models off short panels. Working paper.
Arellano, M. and S. Bonhomme (2011). Nonlinear panel data analysis. The Annual Review of Economics 3, 395–424.
Athey, S. and P. A. Haile (2002). Identification of standard auction models. Econometrica 70(6), 2107–2140.
Baíllo, A. and A. Cuevas (2006). Image estimators based on marked bins. Statistics 40, 277–288.
Bajari, P., C. L. Benkard, and J. Levin (2007). Estimating dynamic models of imperfect competition. Econometrica 75(5), 1331–1370.
Bajari, P., V. Chernozhukov, H. Hong, and D. Nekipelov (2009). Identification and efficient semiparametric estimation of a dynamic discrete game. Technical report, National Bureau of Economic Research.
Bajari, P., H. Hong, and D. Nekipelov (2010). Game theory and econometrics: A survey of some recent research. In Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 3–52.
Beirlant, J. and L. Gyorfi (1998). On the L1-error in histogram density estimation: The multidimensional case. Journal of Nonparametric Statistics 9, 197–216.
Beresteanu, A., I. Molchanov, and F. Molinari (2011). Sharp identification regions in models with convex moment predictions. Econometrica 79, 1785–1821.
Bishop, Y. M. M., S. E. Fienberg, and P. W. Holland (1975). Discrete Multivariate Analysis: Theory and Practice. Cambridge: The MIT Press.
Blevins, J. R. (2014). Nonparametric identification of dynamic decision processes with discrete and continuous choices. Quantitative Economics 5(3), 531–554.
Blundell, R. W. and J. L. Powell (2004). Endogeneity in semiparametric binary response models. The Review of Economic Studies 71(3), 655–679.
Bollobás, B. (1990). Linear Analysis: An Introductory Course. Cambridge Mathematical Textbooks.
Bonhomme, S. (2012). Functional differencing. Econometrica 80, 1337–1385.
Bonhomme, S., K. Jochmans, and J.-M. Robin (2013). Nonparametric estimation of finite mixtures. Technical report, Sciences Po Economics Discussion Papers.
Bonhomme, S., K. Jochmans, and J.-M. Robin (2014). Nonparametric spectral-based estimation of latent structures. Working paper CWP18/14, CEMMAP.
Budish, E. B. and L. N. Takeyama (2001). Buy prices in online auctions: irrationality on the internet? Economics Letters 72(3), 325–333.
Chamberlain, G. (2010). Binary response models for panel data: Identification and information. Econometrica 78, 159–168.
Chaudhuri, A. R., A. Basu, S. K. Bhandari, and B. B. Chaudhuri (1999). An efficient approach to consistent set estimation. Sankhyā B 61, 496–513.
Chernozhukov, V., I. Fernández-Val, J. Hahn, and W. Newey (2009). Identification and estimation of marginal effects in nonlinear panel models. Working paper, MIT.
Chernozhukov, V., H. Hong, and E. Tamer (2007). Estimation and confidence regions for parameter sets in econometric models. Econometrica 75, 1243–1284.
Ching, A. and M. Osborne (2015). Identification and estimation of forward-looking behavior: The case of consumer stockpiling. Available at SSRN 2594032.
Csorgo, M., S. Csorgo, L. Horváth, and D. M. Mason (1986). Weighted empirical and quantile processes. The Annals of Probability, 31–85.
Dekkers, A. and E. Aarts (1991). Global optimization and simulated annealing. Mathematical Programming 50, 367–393.
Ding, Y. and B. Nan (2011). A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data. The Annals of Statistics 39, 3032–3061.
Egozcue, J. J., J. L. Díaz-Barrero, and V. Pawlowsky-Glahn (2006). Hilbert space of probability density functions based on Aitchison geometry. Acta Mathematica Sinica, English Series 22, 1175–1182.
Fang, H. and Y. Wang (2015). Estimating dynamic discrete choice models with hyperbolic discounting, with an application to mammography decisions. International Economic Review 56(2), 565–596.
Geman, S. and C. R. Hwang (1982). Nonparametric maximum likelihood estimation by the method of sieves. The Annals of Statistics 10, 401–414.
Guerre, E., I. Perrigne, and Q. Vuong (2000). Optimal nonparametric estimation of first-price auctions. Econometrica 68(3), 525–574.
Hahn, J. and W. Newey (2004). Jackknife and analytical bias reduction for nonlinear panel models. Econometrica 72, 1295–1319.
Haile, P. A. and E. Tamer (2003). Inference with an incomplete model of English auctions. Journal of Political Economy 111(1), 1–51.
Heckman, J. J. and S. Navarro (2007). Dynamic discrete choice and dynamic treatment effects. Journal of Econometrics 136(2), 341–396.
Henderson, D., S. H. Jacobson, and A. W. Johnson (2003). The theory and practice of simulated annealing. In F. Glover and G. A. Kochenberger (Eds.), Handbook of Metaheuristics, Volume 57 of International Series in Operations Research & Management Science, pp. 287–319. Springer.
Hidvegi, Z., W. Wang, and A. B. Whinston (2006). Buy-price English auction. Journal of Economic Theory 129(1), 31–56.
Hong, H. and M. Shum (2003). Econometric models of asymmetric ascending auctions. Journal of Econometrics 112(2), 327–358.
Honoré, B. E. and E. Tamer (2006). Bounds on parameters in panel dynamic discrete choice models. Econometrica 74, 611–629.
Hotz, V. J. and R. A. Miller (1993). Conditional choice probabilities and the estimation of dynamic models. The Review of Economic Studies 60(3), 497–529.
Hu, Y. and M. Shum (2012). Nonparametric identification of dynamic models with unobserved state variables. Journal of Econometrics 171(1), 32–44.
Huang, J. Z. et al. (2003). Local asymptotics for polynomial spline regression. The Annals of Statistics 31(5), 1600–1635.
Hutson, V. and J. S. Pym (1980). Applications of Functional Analysis and Operator Theory. London: Academic Press.
Imbens, G. W. and W. K. Newey (2009). Identification and estimation of triangular simultaneous equations models without additivity. Econometrica 77(5), 1481–1512.
Kalouptsidi, M., P. T. Scott, and E. Souza-Rodrigues (2015). Identification of counterfactuals and payoffs in dynamic discrete choice with an application to land use. Working paper, NBER.
Kasahara, H. and K. Shimotsu (2009). Nonparametric identification of finite mixture models of dynamic discrete choices. Econometrica 77(1), 135–175.
Keane, M. P., P. E. Todd, and K. I. Wolpin (2011). The structural estimation of behavioral models: Discrete choice dynamic programming methods and applications. In O. Ashenfelter and D. Card (Eds.), Handbook of Labor Economics, Volume 4a, Chapter 4, pp. 331–461. Elsevier.
Keane, M. P. and K. I. Wolpin (1997). The career decisions of young men. Journal of Political Economy 105(3), 473–522.
Kirkegaard, R. and P. B. Overgaard (2008). Buy-out prices in auctions: seller competition and multi-unit demands. The RAND Journal of Economics 39(3), 770–789.
Kosorok, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference. New York: Springer.
Laffont, J.-J., H. Ossard, and Q. Vuong (1995). Econometrics of first-price auctions. Econometrica: Journal of the Econometric Society, 953–980.
Li, T., I. Perrigne, and Q. Vuong (2000). Conditionally independent private information in OCS wildcat auctions. Journal of Econometrics 98(1), 129–161.
Magnac, T. and D. Thesmar (2002). Identifying dynamic discrete decision processes. Econometrica 70(2), 801–816.
Mason, D. M. and W. Polonik (2009). Asymptotic normality of plug-in level set estimates. The Annals of Applied Probability 19, 1108–1142.
Mathews, T. (2004). The impact of discounting on an auction with a buyout option: a theoretical analysis motivated by eBay’s buy-it-now feature. Journal of Economics 81(1), 25–52.
Matzkin, R. L. (1992). Nonparametric and distribution-free estimation of the binary threshold crossing and the binary choice models. Econometrica, 239–270.
Norets, A. and X. Tang (2014). Semiparametric inference in dynamic binary choice models. The Review of Economic Studies 81, 1229–1262.
Paarsch, H. J. (1992). Deciding between the common and private value paradigms in empirical models of auctions. Journal of Econometrics 51(1-2), 191–215.
Pesendorfer, M. and P. Schmidt-Dengler (2008). Asymptotic least squares estimators for dynamic games. The Review of Economic Studies 75(3), 901–928.
Reynolds, S. S. and J. Wooders (2009). Auctions with a buy price. Economic Theory 38(1), 9–39.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 999–1033.
Rust, J. (1994). Structural estimation of Markov decision processes. In R. F. Engle and D. McFadden (Eds.), Handbook of econometrics, Volume 4, Chapter 51, pp. 3081–3143. Elsevier.
Schumaker, L. (2007). Spline functions: basic theory. Cambridge University Press.
Shen, X. and W. H. Wong (1994). Convergence rate of sieve estimates. The Annals of Statistics 22, 580–615.
Srisuma, S. and O. Linton (2012). Semiparametric estimation of Markov decision processes with continuous state space. Journal of Econometrics 166(2), 320–341.
Su, C.-L. and K. L. Judd (2012). Constrained optimization approaches to estimation of structural models. Econometrica 80(5), 2213–2230.
van der Vaart, A. W. (1998). Asymptotic Statistics. New York: Cambridge University Press.
Wang, H. and S.
Xiang (2012). On the convergence rates of Legendre approximation. Mathematics of Computation 81, 861–877.
Wellner, J. A. and Y. Zhang (2007). Two likelihood-based semiparametric estimation methods for panel count data with covariates. The Annals of Statistics 35, 2106–2142.
Xue, L. and J. Wang (2010). Distribution function estimation by constrained polynomial spline regression. Journal of Nonparametric Statistics 22(4), 443–457.
Xue, L. and L. Yang (2006). Additive coefficient modeling via polynomial spline. Statistica Sinica, 1423–1446.

Appendix A
Appendix of Chapter 1

A.1 Proofs

Lemma A.1. Let $A$ be an $m \times n$ real matrix with $m \ge n - 1$. Suppose each row of $A$ sums to zero and $\operatorname{rank} A = n - 1$. Suppose the linear equation $Ax = b$ has solutions. Then the solution set is $\{A^+ b + c 1_n : c \in \mathbb{R}\}$, where $A^+$ is the Moore-Penrose pseudoinverse of $A$, and $1_n$ is an $n$-dimensional vector of ones.

Proof. We know that the solution set of the equation $Ax = b$ is $\{A^+ b + (I - A^+ A)a : a \in \mathbb{R}^n\}$. It suffices to show that $I - A^+ A$ is an $n \times n$ matrix whose elements are identical. Let $A = U \Sigma V^\top$ be a singular value decomposition (SVD) of $A$. We know that $A^+ = V \Sigma^+ U^\top$, where $\Sigma^+$ is the pseudoinverse of $\Sigma$. Because $U$ and $V$ are both orthogonal matrices, we have $A^+ A = V \Sigma^+ \Sigma V^\top$ as an eigenvalue decomposition (EVD). When $\operatorname{rank} A = n - 1$, we have
$$\Sigma^+ \Sigma = \begin{bmatrix} I_{n-1} & \\ & 0 \end{bmatrix},$$
where $I_{n-1}$ is the $(n-1) \times (n-1)$ identity matrix. So the columns of $V$ are eigenvectors of $A^+ A$ corresponding to the eigenvalues 1 and 0. Because the columns of $A$ sum to the zero vector (that is, $A 1_n = 0$), $1_n$ is an eigenvector of $A^+ A$ corresponding to the eigenvalue zero, and $n^{-1/2} 1_n$ is one column of $V$. Removing the column $n^{-1/2} 1_n$ from $V$, we obtain an $n \times (n-1)$ matrix $\tilde{V}$ with $A^+ A = V \Sigma^+ \Sigma V^\top = \tilde{V} \tilde{V}^\top$. As $V$ is an orthogonal matrix, we have
$$I = V V^\top = \begin{bmatrix} \tilde{V} & n^{-1/2} 1_n \end{bmatrix} \begin{bmatrix} \tilde{V}^\top \\ n^{-1/2} 1_n^\top \end{bmatrix} = \tilde{V} \tilde{V}^\top + n^{-1} 1_n 1_n^\top = A^+ A + n^{-1} 1_{n \times n}.$$
Here $1_{n \times n}$ is an $n \times n$ matrix whose elements are all 1. So we have $I - A^+ A = n^{-1} 1_{n \times n}$, and the lemma follows.

Lemma A.2. Let $A_1$ and $A_2$ both be $m \times n$ real matrices with $m \ge 2(n - 1)$. Define the block matrix $A \equiv \begin{bmatrix} A_1 & A_2 \end{bmatrix}$. For each $i = 1, 2$, suppose each row of $A_i$ sums to zero, and suppose $\operatorname{rank} A = 2n - 2$. Suppose the linear equation
$$A \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = b$$
has solutions. Let
$$\begin{bmatrix} x_{1,+} \\ x_{2,+} \end{bmatrix} \equiv A^+ b.$$
Then the solution set of the equation is
$$\left\{ \begin{bmatrix} x_{1,+} + c_1 1_n \\ x_{2,+} + c_2 1_n \end{bmatrix} : c_1, c_2 \in \mathbb{R} \right\}.$$

Proof. The proof is similar to the proof of Lemma A.1. The solution set of the equation $Ax = b$ is $\{A^+ b + (I - A^+ A)a : a \in \mathbb{R}^{2n}\}$. Let $A = U \Sigma V^\top$ be an SVD of $A$. We have $A^+ A = V \Sigma^+ \Sigma V^\top$ as an EVD. Because $\operatorname{rank} A = 2n - 2$ and the row sums of each $A_i$ ($i = 1, 2$) are zero, we have
$$\Sigma^+ \Sigma = \begin{bmatrix} I_{2n-2} & \\ & 0_{2 \times 2} \end{bmatrix}.$$
So $V$ has the two columns $w_1^\top = n^{-1/2}(1_n^\top, 0_n^\top)$ and $w_2^\top = n^{-1/2}(0_n^\top, 1_n^\top)$, because they are two orthonormal eigenvectors corresponding to the eigenvalue 0. Removing $w_1$ and $w_2$ from the columns of $V$, we obtain a $2n \times (2n - 2)$ matrix $\tilde{V}$ whose columns are eigenvectors corresponding to the $2n - 2$ nonzero eigenvalues. We then have $A^+ A = V \Sigma^+ \Sigma V^\top = \tilde{V} \tilde{V}^\top$. As $V$ is an orthogonal matrix, we have
$$I = V V^\top = \begin{bmatrix} \tilde{V} & w_1 & w_2 \end{bmatrix} \begin{bmatrix} \tilde{V}^\top \\ w_1^\top \\ w_2^\top \end{bmatrix} = \tilde{V} \tilde{V}^\top + w_1 w_1^\top + w_2 w_2^\top = A^+ A + n^{-1} \begin{bmatrix} 1_{n \times n} & \\ & 1_{n \times n} \end{bmatrix}.$$
The rest of the proof follows immediately.

Proof of Proposition 4. The key observation is that when $Z_t$ is time invariant, the state transition matrix $F^d_t$ is a $d_x$-by-$d_x$ block matrix,
$$F^d_t = \begin{bmatrix} D_t(x_1, x_1) & D_t(x_1, x_2) & \cdots & D_t(x_1, x_{d_x}) \\ D_t(x_2, x_1) & D_t(x_2, x_2) & \cdots & D_t(x_2, x_{d_x}) \\ \vdots & \vdots & \ddots & \vdots \\ D_t(x_{d_x}, x_1) & D_t(x_{d_x}, x_2) & \cdots & D_t(x_{d_x}, x_{d_x}) \end{bmatrix},$$
of which each element $D_t(x_i, x_j)$ is a $d_z$-by-$d_z$ diagonal matrix, for each period $t$ and each choice $d$. The diagonal matrix $D_t(x_i, x_j)$ has the following form, D t (x i ;x j ) = 2 6 6 6 6 4 f t (x j ;z 1 jx i ;z 1 ;Y t1 = d) . . .
f t (x j ;z d z jx i ;z d z ;Y t1 = d) 3 7 7 7 7 5 ; because f t (x j ;z k jx i ;z l ;Y t1 = d) = 0 whenever z k 6=z l . Let e i be an d z -dimensional vector whose elements are all zero excepting for the i-th element being 1. One can verify that ~ e 1 1 2dx e 1 ;:::; ~ e dz 1 2dx e dz ; ~ e dz +1 (0 | dx ; 1 | dx ) | ; belong to the null space of A t , and are linearly independent. Hence, if rankA t = 2d s d z 1, we have N (A t ) = span(~ e 1 ;:::; ~ e dz +1 ). Then the solution set for ?? is 8 > < > : 2 6 4 v t;+ v e t+1;+ 3 7 5 + 1 ~ e 1 + + dz +1 ~ e dz +1 : ( 1 ;:::; dz +1 )2R dz +1 9 > = > ; : (A.1) Let ~ e i = 2 6 4 ~ e i;h ~ e i;l 3 7 5: Then 1=0 t1 and 1=0 t are identified because F 1=0 t ~ e i;h =F 1=0 t+1 ~ e i;l = 0 ds for each i = 1;:::;d z + 1. The period utility function 0 t is identified up to an additive constant because for any v 0 t ; v e 0 t+1 belonging to the solution 99 A.1. Proofs set (A.1), we have v 0 t F 0 t+1 v e 0 t+1 = v t;+ F 0 t+1 v e t+1;+ + 1 dx ( dz +1 1 dz ) = v t;+ F 0 t+1 v e t+1;+ + dz +1 1 ds : So the conclusion of Proposition 2 can be established for the permanent excluded variable case. The same proof of Proposition 3 can be used to identify the discount factors in the permanent excluded variable situation. Proof of Proposition 5. We have a system A t 2 6 6 6 6 4 v e t+1 v t v t1 3 7 7 7 7 5 =b t ; (A.2) where A t and b t are as defined in equation (1.60). Note that the two vectors q | 1 = (1 | ds ; 0 | ds ;1 | ds ); q | 2 = (0 | ds ; 1 | ds ; (1 + t1 ) 1 | ds ); are linearly independent, and A t q 1 =A t q 2 = 0 3ds . So q 1 and q 2 are contained in the null space of matrix A t , denoted byN (A t ). By the assumption that rankA t = 3d s 2, we then haveN (A t ) = span(b 1 ;b 2 ). So the solution set of equation (A.2) is A + t b t + 1 q 1 + 2 q 2 : 1 ; 2 2R : We then have 1=0 = 1 3 (p t ) +(p t1 ) +(p t2 ) F 1=0 t+1 t1 F 1=0 t t2 F 1=0 t1 A + t b t ; is unique. Here F 1=0 t+1 t1 F 1=0 t t2 F 1=0 t1 is 1 3 block matrix. 
And the solution set of 0 is f 0 + +c 1 ds :c2Rg; 100 A.2. Counterfactual Policy Predictions under Normalization of Period Utility Functions where 0 + = 1 2 F 0 t+1 (I ds t1 F 0 t ) I ds A + t b t (p t ) (p t1 ) : Applying the normalization, 0 is identified with 0 =L 0 + . A.2 Counterfactual Policy Predictions under Normalization of Period Utility Functions For simplicity, we focus on the case where state transition matrices and discount factors are known and time invariant. Let F 0 and F 1 be the state transition matrices that generate the observed data. And let be discount factor. Consider a counterfactual experiment that changes state transition matrices but do not change per period utility functions. Let F 0 c and F 1 c the state transition matrices under counterfactual experiment. We are interested in predicting the counterfactual CCP. A.2.1 Consequence for stationary DPDC models Suppose the agent’s dynamic programming problem is stationary. One way to identify the stationary DPDC model is to assume that 0 and the discount factor are known. Let 0 = ( 0 (s 1 );:::; 0 (s ds )) | and ~ 0 = (~ 0 (s 1 );:::; ~ 0 (s ds )) | be two normalized period utility functions. Let F 0 c and F 1 c be counterfactual state transition matrices. Given ( 0 ;;F 0 c ;F 1 c ), we will have one counterfactual CCP p c (S). Similarly, the set (~ 0 ;;F 0 c ;F 1 c ) also defines a counterfactual CCP ~ p c (S). Write p c = (p c (s 1 );:::p c (s ds )) | and ~ p c = (~ p c (s 1 );::: ~ p c (s ds )) | . The following proposition answers the question when will p c = ~ p c , so the normalization of per period utility function 0 will be innocuous for predicting counterfactual policy effects. Proposition A.1. Define BF 1=0 (IF 0 ) 1 and B c F 1=0 c (IF 0 c ) 1 . One necessary condition for p c = ~ p c is that ( 0 ~ 0 )2N (BB c ), whereN (BB c ) is the null space of matrix BB c . 
One sufficient condition for p c = ~ p c is that 0 ~ 0 equals to a vector whose entries are identical, and this condition would also be necessary if rank (BB c ) =d s 1. Proof. By the definition of the ASVF v 0 (v 0 (s 1 );:::;v 0 (s ds )) | and v 1 (v 1 (s 1 );:::;v 1 (s ds )) | , we have v 1 v 0 = 1=0 +F 1=0 v: Also, it follows from equation (1.41), v = 0 +F 0 v + (p): 101 A.2. Counterfactual Policy Predictions under Normalization of Period Utility Functions So we have v 1 v 0 = 1=0 +F 1=0 (IF 0 ) 1 ( 0 + (p)) = 1=0 +B( 0 + (p); with B =F 1=0 (IF 0 ) 1 : Similarly, the difference between the counterfactual ASVF of two alternatives, v 1 c v 0 c , is v 1 c v 0 c = 1=0 +F 1=0 c (IF 0 c ) 1 ( 0 + (p c )) = 1=0 +B c ( 0 + (p c )); with B c =F 1=0 c (IF 0 c ) 1 : We know that v 1 v 0 =(p) and v 1 c v 0 c =(p c ): So we conclude (p)(p c ) = (BB c ) 0 +B (p)B c (p c ): We have similar conclusion for using the per period utility functions ~ 0 : (p)(~ p c ) = (BB c ) ~ 0 +B (p)B c (~ p c ): Hence, we have (~ p c )(p c ) = ((p)(p c )) ((p)(~ p c )) = (BB c )( 0 ~ 0 )B c (p c ) +B c (~ p c ): In other words, ((~ p c )B c (~ p c )) ((p c )B c (p c )) = (BB c )( 0 ~ 0 ): Define a mapping g :R ds 7!R ds such that g(p)(p)B c (p) for any p2R ds . We then have g(~ p c )g(p c ) = (BB c )( 0 ~ 0 ): (A.3) 102 A.2. Counterfactual Policy Predictions under Normalization of Period Utility Functions By the mean value theorem for vector valued mappings, we have 1 0 rg(p c +(~ p c p c )) d (~ p c p c ) = (BB c ) ( 0 ~ 0 ): Suppose ~ p c = p c . We must have (BB c ) ( 0 ~ 0 ) = 0, that is 0 ~ 0 belongs to the null space of BB c . When rank (BB c ) =d s 1, the null space of BB c contains only the vectors, whose elements are identical. This proves the necessary part of the proposition. For any ~ 0 = 0 +a, where a is a vector whose elements are all a, we have ~ v = v + (1) 1 a and ~ v c = v c + (1) 1 a. Because F 1=0 v = F 1=0 ~ v, we have 1=0 = ~ 1=0 . 
Then we have v 1 c v 0 c = ~ v 1 c ~ v 0 c for F 1=0 v c =F 1=0 ~ v c , which implies that ~ p c =p c . This shows the sufficiency part. A.2.2 Consequence for DPDC models with finite horizon Suppose the agent’s dynamic programming problem has finite horizon, and the last sampling period T is the decision horizon T . Let 0 t and ~ 0 t be two assumed per period utility of alternative 0. Let p t;c and ~ p t;c be the counterfactual CCP vectors associated with the assumed per period utility functions 0 t and ~ 0 t . The following proposition answers the question when will p T1;c = ~ p T1;c (p T;c = ~ p T;c is always true). Of course, the proposition can be extended to cover the other periods at the expense of more complicated notation. Proposition A.2. Define BF 1=0 and B c F 1=0 c . One necessary condition for p T1;c = ~ p T1;c is that ( 0 T ~ 0 T )2N (BB c ), whereN (BB c ) is the null space of matrix BB c . One sufficient condition for p T1;c = ~ p T1;c is that ( 0 T ~ 0 T ) equals to a vector of which all entries are identical, and this condition would also be necessary if rank (BB c ) =d s 1. Proof. We first have(p t ) = 1=0 T = ~ 1=0 T ,v T = 0 T + (p T ) and ~ v T = ~ 0 T + (p T ) for the last periodT. Next, it follows from v 1 T1 v 0 T1 = ~ v 1 T1 ~ v 0 T1 that 1=0 T1 +F 1=0 v T = ~ 1=0 T1 +F 1=0 ~ v T ; which implies that 1=0 T1 ~ 1=0 T1 =F 1=0 ~ 0 T 0 T : Now consider the counterfactual experiment. We have p T;c = ~ p T;c = p T because the counterfactual experiment does not change per period utilities. So v T;c =v T and ~ v T;c = ~ v T . For period T 1, however, we have (p T1;c ) =v 1 T1;c v 0 T1;c = 1=0 T1 +F 1=0 c v T 103 A.2. Counterfactual Policy Predictions under Normalization of Period Utility Functions and (~ p T1;c ) = ~ v 1 T1;c ~ v 0 T1;c = ~ 1=0 T1 +F 1=0 c ~ v T : Then (p T1;c ) (~ p T1;c ) = 1=0 T1 ~ 1=0 T1 +F 1=0 c (v T ~ v T ) =F 1=0 ~ 0 T 0 T F 1=0 c ~ 0 T 0 T = F 1=0 F 1=0 c ~ 0 T 0 T : The above display is similar to equation (A.3). 
So we can apply the arguments in the proof of Proposition A.1 to prove the present proposition. 104 Appendix B Appendix of Chapter 2 Let B E = B 0;p ;:::;B e q T E ;p be the set of normalized B-spline basis for the linear spaceG (p) T E . For any g 1 ; g 2 2G (p) T E , define the inner producthg 1 ; g 2 i =E WjW<0 fg 1 (W )g 2 (W )g and hg 1 ; g 2 i T E = 1=T E J X j=1 T E j X t=1 n g 1 W (mj1):mj E;t g 2 W (mj1):mj E;t o : The induced norms fromh;i andh;i T E arekgk 2 =E WjW<0 g 2 (W ) and kgk 2 T E = 1=T E J X j=1 T E j X t=1 n g 2 W (mj1):mj E;t o ; respectively. Also, set the supremum normkgk 1 = sup w2[w; 0) jg (w)j. ThenG (p) T E is a Hilbert space equipped with the empirical inner producth;i T E. In perfect analogy,G (p) T B ,h;i + ,h;i T B,kk + ,kk T B andkk + 1 are defined in a similar fashion. For a sequencefx t g 1 t=1 , I usefx t g"x 0 to denote lim t!1 x t =x 0 in the sequel. Denote a^b = min (a; b) and a_b = max (a; b). B.1 Regularity Conditions Assumption B.1. I need the following conditions to ensure the distribution of valuations to be a regular absolutely continuous distribution function. 1. The distribution F W is p + 1 times continuously differentiable for some p 1; 2. Let f W be the first-order derivative of F W , also is the density function. Assume f W is compactly supported, i.e. [w; w] is a compact set inR; 3. The density function f W is uniformly bounded below from 0 and above from infinity on [w; w]. Assumption B.2. The following technical conditions are needed for establishing my main theorems. 105 B.2. Proofs 1. The knot sequence e K T E = w = p < p+1 :::< p+q T E < e q T E +1 = 0 has a bounded mesh ratio–in other words, there is a constant c such that max k=p;:::;e q T E f k+1 k g= min k=p;:::;e q T E f k+1 k gc. 2. The number of inner knots q T E!1, as T E !1, and lim T E !1 q T E log T E =T E = 0. 3. 
The knot sequence e K T B = w = p < p+1 :::< p+q T B < e q T B +1 = 0 has a bounded mesh ratio–in other words, there is a constant c 0 such that max k=p;:::;e q T B f k+1 k g= min k=p;:::;e q T B f k+1 k g c 0 , where e q T B =p +q T B. 4. The number of inner knots q T B!1, as T B !1, and lim T B !1 q T B log T B =T B = 0. B.2 Proofs Proof of Lemma 1. If B m:m E 6=V (m1):m E , then either B m:m E <V (m1):m E or B m:m E >V (m1):m E . I will show by contradiction that neither is possible. Without loss of generality, let bidder 1’s valuation be V (m1):m E . If B m:m E < V (m1):m E , then there exists a > 0 such that B m:m E + < V (m1):m E . Thus, by bidding at (B m:m E +), bidder 1 will gain utility u V (m1):m E B m:m E which is bigger than u (0). So B m:m E cannot be the final transaction price by the definition of equilibrium. If B m:m E >V (m1):m E , V (m1):m E <b since B m:m E <b. According to Theorem 1, bidder 1 will follow the “traditional” strategy. Then bidder 1 will quit when the auction clock reaches V (m1):m E . Then only the bidder with valuation V m:m E is still pressing the “bid” button. By the rules, this bidder wins, and the transaction price is V (m1):m E <B m:m E . Therefore, one must have B m:m E =V (m1):m E . V (m1):m E <b is obvious since B m:m E <b. Proof of Lemma 2. Without loss of generality, assume bidder 1 has the valuation V m:m B . Then one must haveV m:m B b. Otherwise, V m:m B <b, and bidder 1 gains utility u V m:m B b <u (0) by bidding at the buy price. Thus bidder 1 would not use the buy price which is a contradiction to the fact that bidder 1 is the winner of the auction by executing the buy price option. By Theorem 1 and Assumption 5, bidder 1 will follow the “threshold” strategy. If B (m1):m B <t V m:m B ;b , bidder 1 will not use the buy price option, and the auction clock will ascends. Thus, B (m1):m B cannot be the (m 1)-th order statistic. 
On the other hand, $B^{(m-1):m}_B > t(V^{m:m}_B; b)$ would contradict the definition of the “threshold” strategy. Therefore, $B^{(m-1):m}_B = t(V^{m:m}_B; b)$.

Proof of Theorem 2. Note that
$$F_V(v \mid V < b) = F_W(v - b \mid W < 0).$$
Then this result follows from Theorem 1 of Athey and Haile (2002). A complementary remark is that $V^{(m-1):m}_E$ is indeed a random sample from $F_{V^{(m-1):m} \mid V^{(m-1):m} < b}$, since whether a BP English auction ends without using the buy price option also depends on $V^{m:m}$, which is random and independent of $V^{(m-1):m}$ conditional on $B = b$.

The proof of Theorem 3 makes use of the following lemma.

Lemma B.1. Let $X_1, \dots, X_m$ be independent and identically distributed (i.i.d.) random variables with an absolutely continuous cumulative distribution function (CDF) $F_X(x)$, and denote by $X^{1:m}, \dots, X^{m:m}$ the corresponding order statistics. The conditional distribution function of $X^{(m-1):m}$ given $X^{m:m} = \tilde{x}$ is
$$F_{X^{(m-1):m} \mid X^{m:m} = \tilde{x}}(x) = \left[ \frac{F_X(x)}{F_X(\tilde{x})} \right]^{m-1}, \qquad x \le \tilde{x}.$$

Proof. Without loss of generality, assume the support of $X$ is $[0, 1]$. By definition,
$$F_{X^{(m-1):m} \mid X^{m:m} = \tilde{x}}(x) = \int_0^x f_{X^{(m-1):m} \mid X^{m:m} = \tilde{x}}(t)\, dt, \qquad 0 \le x \le \tilde{x}. \tag{B.1}$$
The conditional density function of $X^{(m-1):m}$ given $X^{m:m} = \tilde{x}$ is
$$f_{X^{(m-1):m} \mid X^{m:m} = \tilde{x}}(t) = \frac{f_{X^{(m-1):m}, X^{m:m}}(t, \tilde{x})}{f_{X^{m:m}}(\tilde{x})}, \tag{B.2}$$
where
$$f_{X^{m:m}}(\tilde{x}) = m F_X^{m-1}(\tilde{x}) f_X(\tilde{x}), \tag{B.3}$$
$$f_{X^{(m-1):m}, X^{m:m}}(t, \tilde{x}) = \frac{m!}{(m-2)!} f_X(t) \{F_X(t)\}^{m-2} f_X(\tilde{x}). \tag{B.4}$$
Equations (B.1)-(B.4) give rise to
$$F_{X^{(m-1):m} \mid X^{m:m} = \tilde{x}}(x) = \int_0^x f_{X^{(m-1):m} \mid X^{m:m} = \tilde{x}}(t)\, dt = \frac{\int_0^x \frac{m!}{(m-2)!} f_X(t) \{F_X(t)\}^{m-2} f_X(\tilde{x})\, dt}{f_{X^{m:m}}(\tilde{x})} = \frac{m f_X(\tilde{x}) \{F_X(x)\}^{m-1}}{m f_X(\tilde{x}) \{F_X(\tilde{x})\}^{m-1}} = \left[ \frac{F_X(x)}{F_X(\tilde{x})} \right]^{m-1}.$$
This proves the lemma.

Proof of Theorem 3.
First, I show that with probability 1 an $m$-bidder buy-price English auction ends without using the buy price option if and only if one of the following two events happens: the first event, $E_1$, is that all bidders’ valuations are below the buy price; the second event, $E_2$, is that $V^{m:m} \ge b$ and $t(V^{m:m}; b) > V^{(m-1):m}$.

I prove the “if” part first. Suppose all bidders’ valuations are below the buy price. Then, according to Theorem 1, every bidder follows the “traditional” strategy, that is, bids up to his valuation, and the auction ends at a price lower than the buy price. Next, suppose the maximum of the $m$ valuations, $V^{m:m}$, is above the buy price, $V^{m:m} \ge b$, and the corresponding threshold price $t(V^{m:m}; b)$ is above all the other valuations, i.e. $t(V^{m:m}; b) > V^{(m-1):m}$. In this case, all bidders except the one with valuation $V^{m:m}$ follow the “traditional” strategy. Since $t(V^{m:m}; b) > V^{(m-1):m}$, the threshold price is never triggered, and the auction ends at $V^{(m-1):m}$, as implied by Lemma 1. This shows the “if” part.

To show the “only if” part, I show that any alternative configuration of the valuations makes the auction end with the buy price. First, if there is only one bidder whose valuation is above $b$, but $t(V^{m:m}; b) \le V^{(m-1):m}$, then the threshold price is triggered, since the bidder with valuation $V^{(m-1):m}$ will bid up to $V^{(m-1):m} > t(V^{m:m}; b)$. Thus, in this case, the auction ends with the buy-price option. Second, if there are two or more bidders whose valuations are above the buy price, then the auction must end with the buy-price option. Otherwise, let the auction end at a price $\tilde{b} < b$. Then a bidder with valuation above the buy price can always improve his utility by bidding slightly higher than $\tilde{b}$, so $\tilde{b}$ cannot be the transaction price. There are no other situations left, so I have shown the “only if” part.
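The probability calculation that follows relies on Lemma B.1. As a rough numerical sanity check of that lemma, the sketch below compares the formula $[F_X(x)/F_X(\tilde{x})]^{m-1}$ with a simulated conditional frequency. It rests on assumptions made only for illustration: the draws are uniform on $[0, 1]$ (so $F_X(x) = x$), conditioning on $X^{m:m} = \tilde{x}$ is approximated by keeping samples whose maximum falls in a narrow band around $\tilde{x}$, and the function names are hypothetical.

```python
import numpy as np


def second_highest_cdf_given_max(x, x_max, m, cdf=lambda v: v):
    """Lemma B.1 formula: P(second highest <= x | highest = x_max).

    `cdf` is the common CDF of the draws; the default is the uniform[0, 1] CDF.
    """
    return (cdf(x) / cdf(x_max)) ** (m - 1)


def simulate_check(m=4, x_max=0.8, x=0.5, half_width=0.01,
                   n_draws=400_000, seed=0):
    """Return (simulated frequency, Lemma B.1 value) for uniform draws."""
    rng = np.random.default_rng(seed)
    draws = np.sort(rng.uniform(size=(n_draws, m)), axis=1)
    top, second = draws[:, -1], draws[:, -2]
    # Approximate the event {highest = x_max} by a narrow band around x_max.
    keep = np.abs(top - x_max) <= half_width
    empirical = np.mean(second[keep] <= x)
    return empirical, second_highest_cdf_given_max(x, x_max, m)


if __name__ == "__main__":
    empirical, theoretical = simulate_check()
    print(f"simulated: {empirical:.4f}, Lemma B.1: {theoretical:.4f}")
```

The two numbers should be close but not identical, since the band approximation and simulation noise both introduce small errors.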
Thus, the probability that a m-bidder buy price English auction ends without using the buy price is the sum of the probabilities of the two events, E 1 and E 2 , described before (note these two events are disjoint). So, one has m E = P (E 1 ) + P (E 2 ): (B.5) It is obvious that P (E 1 ) = b V m : (B.6) 108 B.2. Proofs As for the second event, E 2 , one has P (E 2 ) = P n V (m1):m <t V m:m ; b jV m:m b o n 1 b V m o = P 1 P 2 (B.7) whereP 2 is the probability thatV m:m b, andP 1 is the probability thatV (m1):m <t V m:m ; b conditional on V m:m b. P 1 in (B:7) can be further written as follows P 1 = v b P n V (m1):m t e v; b jV m:m =e v o dF V m:m jV m:m b (e v) = v b F V (m1):m jV m:m =e v t e v; b dF V m:m jV m:m b (e v): (B.8) Then, it follows from Lemma B.1 that F V (m1):m jV m:m =e v t e v; b = " F V t e v; b F V (e v) # m1 : (B.9) Substituting (B:9) back into (B:8), one has P 1 = v b " F V t e v; b F V (e v) # m1 dF V m:m jV m:m b (e v) = E V m:m jV m:m b 0 @ " F V t e v; b F V (e v) # m1 1 A : (B.10) Therefore, (B:5)-(B:7) and (B:10) give rise to the theorem. Proof of Theorem 4. First, I partition the set v; b into n equal pieces, v; b = [ n2 i=0 [p i ; p i+1 ) [ [p n1 ; p n ]; wherep 0 =v,p n =b, andp i+1 =p i + bv =n. DefineP d which is a discrete version of the threshold price, P d = 8 > > > > > > < > > > > > > : p 0 ; if B m B 2 [p 0 ; p 1 ); . . . . . . p n1 ; if B m B 2 [p n1 ; p n ]: 109 B.2. Proofs Correspondingly, I partition the set b; v also into n equal pieces, b; v = [ n2 i=0 [v i ; v i+1 ) [ [v n1 ; v n ]; where v 0 =b, v n =v, and v i+1 =v i + vb =n. Define the following distribution, e F V (v) = 8 > > > > > > > > > > < > > > > > > > > > > : F V;1 ; if v2 [v 0 ; v 1 ); . . . . . . F V;n ; if v2 [v n1 ; v n ]; F V;1 F V vjV <b ; if v2 v; b : (B.11) Let e F d V := (F V;1 ;:::;F V;n ) 0 . Then e F V (v) is determined by e F d V . 
Substitute e F V (v) with F V vjB =b in (3:4), and solvee v i =t 1 n p i ; e F V () o , i = 0;:::; n 1, from e F m1 V (e v i ) = e F m1 V (p i ) ( u a bp i u a be v i ) + e vi pi ( u 0 a bx u a be v i ) e F m1 V (x)dx; where e F V () is defined in (B:11). Denote i n p i1 ; e F d V () o = e F m1 V (p i1 ) ( u a bp i1 u a be v i1 ) + e vi1 pi1 ( u 0 a bx u a be v i1 ) e F m1 V (x)dx = e F m1 V (p i1 ) 0 @ u a bp i1 u a h bt 1 n p i1 ; e F V () oi 1 A + t 1 fpi1; e F V ()g pi1 0 @ u 0 a bx u a h bt 1 n p i1 ; e F V () oi 1 Ae F m1 V (x)dx: Define the following mapping, P d ; e F d V = n i p i1 ; e F d V o i=1;:::;n : My aim is to show there is a unique e F d V such that e F d V = P d ; e F d V : (B.12) 110 B.2. Proofs To see this, first note that P d ; e F d V is a contraction mapping in e F d V by Theorem 7 of Bollobás (1990). In fact, t 1 fpi1; e F V ()g pi1 0 @ u 0 a bx u a h bt 1 n p i1 ; e F V () oi 1 Ae F d V (x)dx is an example of the Volterra integral operator, which is a contraction mapping. Then Theorem 3 of Bollobás (1990) is applicable because is a contraction mapping. By Theorem 3 of Bollobás (1990), there is a unique b F d V satisfying (B:12). Let n go to infinity, the above arguments then show there is a unique fixed point satisfying (3:4). Thus, part (i) has been proven. Part (ii) and (iii) follow directly from part (i). Proof of Lemma 3. When W (mj1):mj E;1 ;:::; W (mj1):mj E;T E j are i.i.d. samples from F (mj1):mj W (), I have al- ready shown F W;T E j w (mj1):mj E;t ! a:s: F W W (mj1):mj E;t . Due to the fact that F W;T E j ()2 [0; 1], the lemma is obvious by applying the dominated convergence theorem to n F W;T E j w (mj1):mj E;t o . The proof of Theorem 5 and 6 make use of the following auxiliary lemma, which is an extension of Theorem 2.2 of Csorgo et al. (1986). Lemma B.2. Let U 1 ;:::;U n be n independent uniform [0; 1] random variables, and G n (x) =n 1 n X i=1 1 (U i x) be the uniform empirical distribution function. Let () : [0; 1]! 
$[0, 1]$ be a monotone increasing differentiable function on $[0, 1]$, and suppose $\varphi(\cdot)$ and its first-order derivative $\varphi'(\cdot)$ are bounded on $[0, 1]$. Let $\alpha_n(x) = \sqrt{n}\{G_n(x) - x\}$ for $0 \le x \le 1$. Then, there exists a sequence of Brownian bridges $\{B_n(x); 0 \le x \le 1\}$ defined on the same probability space as $\{\alpha_n(x); 0 \le x \le 1\}$, such that

$$\sup_{0 \le x \le 1}\left|\sqrt{n}\left[\varphi\{G_n(x)\} - \varphi(x)\right] - B_n\{\varphi(x)\}\right| = O_p(1).$$

Proof. Given the conditions in this lemma, Theorem 2.2 of Csorgo et al. (1986) applies, which implies that there exists a sequence of Brownian bridges $\{B_n(x); 0 \le x \le 1\}$ defined on the same probability space as $\{\alpha_n(x); 0 \le x \le 1\}$, such that

$$\left\|\sqrt{n}\{G_n(x) - x\} - B_n(x)\right\|_\infty = \sup_{0 \le x \le 1}\left|\sqrt{n}\{G_n(x) - x\} - B_n(x)\right| = O_p\left(n^{-1/4}(\log n)^{1/2}(\log\log n)^{1/4}\right) = O_p(a_n).$$

Because $\varphi(\cdot)$ is differentiable on $[0, 1]$ and $G_n(x) \in [0, 1]$, there is a $\tilde y \in [G_n(x) \wedge x,\, G_n(x) \vee x]$ such that

$$\varphi\{G_n(x)\} - \varphi(x) = \varphi'(\tilde y)\{G_n(x) - x\}.$$

Then, one has

$$\left\|\sqrt{n}\left[\varphi\{G_n(x)\} - \varphi(x)\right] - B_n\{\varphi(x)\}\right\|_\infty = \left\|\varphi'(\tilde y)\sqrt{n}\{G_n(x) - x\} - B_n\{\varphi(x)\}\right\|_\infty$$

$$= \left\|\varphi'(\tilde y)\sqrt{n}\{G_n(x) - x\} - \varphi'(\tilde y)B_n(x) + \varphi'(\tilde y)B_n(x) - B_n\{\varphi(x)\}\right\|_\infty$$

$$\le \left\|\varphi'(\tilde y)\left[\sqrt{n}\{G_n(x) - x\} - B_n(x)\right]\right\|_\infty + \left\|\varphi'(\tilde y)B_n(x) - B_n\{\varphi(x)\}\right\|_\infty = O_p(a_n) + O_p(1) = O_p(1).$$

Then, the lemma has been proven.

The proofs of Theorems 5 and 6 mainly rely on viewing $\tilde F_{W,T^E}(\cdot)$ in (2.17) as the orthogonal projection of $F_{W,T^E}$ on the space $G^{(p)}_{T^E}$ with respect to the empirical inner product $\langle\cdot,\cdot\rangle_{T^E}$. Thus, the model becomes amenable to analysis along the lines of Huang et al. (2003) and Xue and Wang (2010). With a slight abuse of notation, let $\varepsilon = \varepsilon(\cdot)$ be a random function which interpolates the value $\varepsilon_{tj}$ at $w^{(m_j-1):m_j}_{E,t}$, $t = 1, \ldots, T^E_j$ and $j = 1, \ldots, J$. Let $P_{G_{T^E}}$ denote the projection matrix,

$$\tilde F_{W,T^E} = P_{G_{T^E}}F_{W,T^E_j} = P_{G_{T^E}}(F_W + \varepsilon) = \tilde F_W + \tilde\varepsilon,$$

where $\tilde F_W$ and $\tilde\varepsilon$ are the orthogonal projections of $F_W$ and $\varepsilon$. Then,

$$\tilde F_{W,T^E} - F_W = \tilde F_W - F_W + \tilde\varepsilon, \tag{B.13}$$

where $\tilde F_W - F_W$ represents the approximation error, and $\tilde\varepsilon$ denotes the estimation error.

Proof of Theorem 5.
The proof of this theorem is quite similar to the proofs of Theorems 3.1-3.3 in Xue and Wang (2010). First, Lemma A.4 of Xue and Wang (2010) shows that

$$\left\|\tilde F_W - F_W\right\| = O_p\left(q_{T^E}^{-(p+1)}\right), \qquad \left\|\tilde F_W - F_W\right\|_\infty = O_p\left(q_{T^E}^{-(p+1)}\right).$$

Next, Lemma B.2 entails that there exists a sequence of Brownian bridges $B_{T^E}(\cdot)$ such that

$$\sup_{w \in [\underline w, 0)}\left|\sqrt{T^E}\left\{F_{W,T^E}(w) - F_W(w)\right\} - B_{T^E}\{F_W(w)\}\right| = O_p(1). \tag{B.14}$$

(B.14) can be easily verified through Lemma B.2 by noting

$$F_{W,T^E_j} = \varphi^{-1}\left(F^{(m_j-1):m_j}_{W,T^E_j};\, m_j - 1,\, m_j\right), \qquad F_W = \varphi^{-1}\left(F^{(m_j-1):m_j}_W;\, m_j - 1,\, m_j\right),$$

where $\varphi$ is defined in (2.11). The transformation $\varphi(\cdot)$ in (2.11) satisfies the conditions of Lemma B.2. Using (B.14), the following two results can be proven using arguments similar to those in the proofs of Theorems 3.1-3.3 of Xue and Wang (2010),

$$\|\tilde\varepsilon\| = O_p\left(1/\sqrt{T^E}\right), \qquad \|\tilde\varepsilon\|_\infty = O_p\left(\sqrt{\frac{\log T^E}{T^E}}\right).$$

Moreover, when the sample size is large enough, $\tilde F_{W,T^E}$ is identical to $\hat F_{W,T^E}$ according to Xue and Wang's proof of their Theorem 3.3. Finally, (B.13) implies

$$\left\|\tilde F_{W,T^E} - F_W\right\| \le \left\|\tilde F_W - F_W\right\| + \|\tilde\varepsilon\| = O_p\left(q_{T^E}^{-(p+1)} + (T^E)^{-1/2}\right),$$

$$\left\|\tilde F_{W,T^E} - F_W\right\|_\infty \le \left\|\tilde F_W - F_W\right\|_\infty + \|\tilde\varepsilon\|_\infty = O_p\left(q_{T^E}^{-(p+1)} + \left(\frac{\log T^E}{T^E}\right)^{1/2}\right).$$

Since $\tilde F_{W,T^E}$ is identical to $\hat F_{W,T^E}$ as $T^E \to \infty$, the theorem has been proven.

Proof of Theorem 6. By (B.14), one can also prove that under Assumptions B.1 and B.2, $\tilde\varepsilon(w)$ for any $w \in [\underline w, 0)$ follows an asymptotically normal distribution,

$$\sqrt{T^E}\,\tilde\varepsilon(w) \to_d N\left(0,\, F_W(w)\{1 - F_W(w)\}\right),$$

with arguments similar to those in Xue and Wang's proof of Lemma A.5. Under the condition $(T^E)^{1/2}q_{T^E}^{-(p+1)} \to 0$ as $T^E \to \infty$, one has $\sqrt{T^E}\left(\tilde F_W - F_W\right) = o_P(1)$ as $T^E \to \infty$. Then by the Slutsky lemma,

$$\sqrt{T^E}\left(\hat F_{W,T} - F_W\right) \to_d N\left(0,\, F_W(w)\{1 - F_W(w)\}\right).$$

Proof of Theorem 7. First, define the following mapping

$$\Lambda(P_B, g) = \left\{\lambda_{t,j}\left(p^{m_j:m_j}_{B,t};\, g\right)\right\}_{t=1,\ldots,T^B_j;\ j=1,\ldots,J},$$

where $P_B = \left(P^{m_j:m_j}_{B,t}\right)_{t=1,\ldots,T^B_j;\ j=1,\ldots,J}$, and $g \in G^{(p)}_{T^B}$. Using the arguments in the proof of Theorem 4, it is straightforward to show that $\Lambda(P_B, g)$ is a contraction mapping using Theorem 7 of Bollobás (1990).
Again, Theorem 3 of Bollobás (1990) says there is a unique point $g^* \in G^{(p)}_{T^B}$ such that

$$g^* = \Lambda(P_B, g^*). \tag{B.15}$$

Then Algorithm 1 is the standard way of approximating that fixed point according to the Contraction Mapping Theorem. Denote by $F^+_W(\cdot; \infty)$ the fixed point $g^*$ in (B.15). Then $\{\tilde F^+_W(\cdot; r)\} \uparrow \tilde F^+_W(\cdot; \infty)$. When $\{\tilde F^+_W(\cdot; r)\} \uparrow \tilde F^+_W(\cdot; \infty)$, the convergence $\tilde\pi^0_W(r) \uparrow \tilde\pi^0_W(\infty)$ is obvious.

Proof of Theorem 8. We keep using the notation of the proof of Theorem 7. Let $\tilde F^+_W(\cdot; \infty)$ and $\tilde\pi^0_W(\infty)$ be the limit points of $\{\tilde F^+_W(\cdot; r)\}$ and $\{\tilde\pi^0_W(r)\}$. Similarly, define the limit point of $\{w^{m_j:m_j}_{B,t}(r)\} \uparrow w^{m_j:m_j}_{B,t}(\infty)$. Because the density of a topological space is a topologically invariant property, and $w^{m_j:m_j}_{B,t}(\infty)$ is a surjective continuous function of $p^{m_j:m_j}_{B,t}$, the set $W_B(\infty) = \{w^{m_j:m_j}_{B,t}(\infty)\}_{t=1,\ldots,T^B_j;\ j=1,\ldots,J}$ is dense in $[0, \bar w]$. Then for an arbitrary $w_0 \in [0, \bar w]$, if $w_0 \in W_B(\infty)$, one has

$$\tilde F_W^{m_j-1}(w_0; \infty) = \tilde F_W^{m_j-1}(p_0 - b_0; \infty)\left(\frac{u_a(b_0 - p_0)}{u_a(w_0)}\right) + (v_0 - p_0)R^{-1}\sum_{r=1}^R\left[\frac{u_a'(b_0 - x_0(y_r))}{u_a(w_0)}\right]\tilde F_W^{m_j-1}\left(x_0(y_r) - b_0; \infty\right), \tag{B.16}$$

where $p_0$, $b_0$, $v_0$ and $x_0(y_r)$ are the corresponding values of $p^{m_j:m_j}_{B,t}$, $b^j_{B,t}$, $v^{m_j:m_j}_{B,t}$ and $x_{t,j}(y_r)$ when $w^{m_j:m_j}_{B,t}(\infty) = w_0$, and $\tilde F_W(\cdot; \infty)$ is formed by (2.27) with $\tilde F^+_W(\cdot; \infty)$. If $w_0 \notin W_B(\infty)$, then there is a sequence $\{w_i\} \subset W_B(\infty)$ such that $\{w_i\} \uparrow w_0$ by the definition of a dense set. One has the sequences $\{p_i\}$, $\{b_i\}$, $\{v_i\}$ and $\{x_i(y_r)\}$, with

$$\tilde F_W^{m_i-1}(w_i; \infty) = \tilde F_W^{m_i-1}(p_i - b_i; \infty)\left(\frac{u_a(b_i - p_i)}{u_a(w_i)}\right) + (v_i - p_i)R^{-1}\sum_{r=1}^R\left[\frac{u_a'(b_i - x_i(y_r))}{u_a(w_i)}\right]\tilde F_W^{m_i-1}\left(x_i(y_r) - b_i; \infty\right). \tag{B.17}$$

Taking limits on both sides of (B.17), one has (B.16) for $w_0 \notin W_B(\infty)$. Therefore, $\tilde F^+_W(\cdot; \infty)$, i.e. $\hat F^+_{W,T}(\cdot)$, satisfies (2.23) for all $w \in [0, \bar w]$. Then the uniform consistency part has been proven.
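The successive-approximation idea that the proof of Theorem 7 invokes for Algorithm 1 can be illustrated on a toy problem. The sketch below is not Algorithm 1 itself: it only iterates a scalar contraction (the cosine function on $[0, 1]$) until successive iterates agree, which is what the Contraction Mapping Theorem guarantees will converge. The names `fixed_point`, `mapping`, `tol` and `max_iter` are hypothetical and chosen for illustration.

```python
import math


def fixed_point(mapping, x0, tol=1e-12, max_iter=10_000):
    """Iterate x <- mapping(x) until two successive iterates differ by less than tol.

    For a contraction mapping this converges to the unique fixed point.
    Returns the approximate fixed point and the number of iterations used.
    """
    x = x0
    for iteration in range(1, max_iter + 1):
        x_next = mapping(x)
        if abs(x_next - x) < tol:
            return x_next, iteration
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")


# Toy example (an assumption, not the auction mapping): cos is a contraction
# on [0, 1] because |d/dx cos(x)| = |sin(x)| <= sin(1) < 1 there.
root, n_iter = fixed_point(math.cos, 1.0)
```

In this toy case the iteration count grows only logarithmically in `1 / tol`, because each step shrinks the error by roughly a constant factor.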
Given the uniform consistency of $\tilde F^+_W(\cdot; \infty)$, the consistency of $\hat\pi^0_{W,T}$ follows from the continuous mapping theorem and the properties of GMM. Note that $\tilde F^+_W(\cdot; \infty)$ can be written as follows,

$$\tilde F^+_W(\cdot; \infty) = \frac{\tilde F_W(\cdot; \infty) - \pi^0_W(\infty)}{1 - \pi^0_W(\infty)}.$$

Then,

$$E\left\{\tilde F^+_W(\cdot; \infty)\right\} = F^+_W(\cdot) + o_p(1),$$

by the dominated convergence theorem, which is a perfect analogy of (2.13). Thus, the arguments used in the proofs of Theorems 5 and 6 can be used again to prove the rest of this theorem.

Appendix C
Appendix of Chapter 3

C.1 Proofs of Main Results

For sequences of numbers $a_n$ and $b_n$, let $a_n \lesssim b_n$ denote that $a_n \le Cb_n$ for some constant $C$ which does not depend on $n$, and let $a_n \asymp b_n$ denote that $a_n = O(b_n)$ and $b_n = O(a_n)$. For sequences of random numbers $a_n$ and $b_n$, let $a_n \asymp_p b_n$ denote that $a_n = O_p(b_n)$ and $b_n = O_p(a_n)$. We use $a_n \nrightarrow_p b_n$ to denote that $a_n$ does not converge in probability to $b_n$. For a subset $A \subset \mathbb{R}^K$, denote by $A^{int}$ and $\partial A$ the interior and the boundary of $A$.

Proof of Proposition 7. Because $B_0 \subset B$ and $B$ is a bounded subset of $\mathbb{R}^K$, $B_0$ is a bounded subset of $\mathbb{R}^K$. It suffices to show that $B_0$ is closed. Note that the set $\left\{\max_{\beta \in B} P\ell_{\mathrm{profile}}(\beta; Z)\right\}$ is a singleton, which is closed. By Proposition 6, we have $B_0 = (P\ell_{\mathrm{profile}})^{-1}\left(\max_{\beta \in B} M_{\mathrm{profile}}(\beta)\right)$, which is also closed since $M_{\mathrm{profile}}(\beta)$ is continuous with respect to $\beta$ on $B$ by the condition.

Proof of Theorem 9. Let $\beta^0$ be an arbitrary point of $B_0$. In this proof, we denote by $o^+_p(1)$ a nonnegative $o_p(1)$ term. By the uniform convergence condition, Assumption 9.(iii), we have $P_n\ell_{\mathrm{profile}}(\beta^0; z_i) \to_p P\ell_{\mathrm{profile}}(\beta^0; Z)$. Thus, $P_n\ell_{\mathrm{profile}}(\beta^0; z_i) = P\ell_{\mathrm{profile}}(\beta^0; Z) + o_p(1)$. From the condition that

$$P_n\ell_{\mathrm{profile}}(\beta_n; z_i) \ge \max_{\beta \in B} P_n\ell_{\mathrm{profile}}(\beta; z_i) - o_p(1),$$

we have the inequality

$$P_n\ell_{\mathrm{profile}}(\beta_n; z_i) + o^+_p(1) \ge P_n\ell_{\mathrm{profile}}(\beta^0; z_i).$$

Since $P_n\ell_{\mathrm{profile}}(\beta^0; z_i) = P\ell_{\mathrm{profile}}(\beta^0; Z) + o_p(1)$, we have

$$P\ell_{\mathrm{profile}}(\beta^0; Z) \le P_n\ell_{\mathrm{profile}}(\beta_n; z_i) + o_p(1).$$
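Thus

$$P\ell_{\mathrm{profile}}(\beta^0; Z) - P\ell_{\mathrm{profile}}(\beta_n; Z) \le P_n\ell_{\mathrm{profile}}(\beta_n; z_i) - P\ell_{\mathrm{profile}}(\beta_n; Z) + o_p(1) \le \max_{\beta \in B}\left|P_n\ell_{\mathrm{profile}}(\beta; z_i) - P\ell_{\mathrm{profile}}(\beta; Z)\right| + o_p(1) = o_p(1). \tag{C.1}$$

It follows from Proposition 6 that $P\ell_{\mathrm{profile}}(\beta^0; Z) - P\ell_{\mathrm{profile}}(\beta_n; Z) \ge 0$, which together with (C.1) implies that

$$P\ell_{\mathrm{profile}}(\beta^0; Z) - P\ell_{\mathrm{profile}}(\beta_n; Z) \to_p 0. \tag{C.2}$$

Suppose $d_H(\beta_n, B_0) \nrightarrow_p 0$ as $n \to \infty$; in other words, there exists a fixed $\varepsilon > 0$ for all $n$ such that

$$\lim_{n \to \infty} P\left(d_H(\beta_n, B_0) > \varepsilon\right) > 0. \tag{C.3}$$

Note that $d_H(\beta_n, B_0) = \min_{\beta \in B_0}\|\beta_n - \beta\|$ because $B_0$ is compact according to Proposition 7. Hence the event $d_H(\beta_n, B_0) > \varepsilon$ implies that $\beta_n \notin B_0$ and $P\ell_{\mathrm{profile}}(\beta^0; Z) - P\ell_{\mathrm{profile}}(\beta_n; Z) > \delta$ for some fixed $\delta > 0$ and for all $\beta^0 \in B_0$. Therefore, it follows from (C.3) that

$$\lim_{n \to \infty} P\left(P\ell_{\mathrm{profile}}(\beta^0; Z) - P\ell_{\mathrm{profile}}(\beta_n; Z) > \delta\right) > 0,$$

which contradicts the result (C.2). Hence we must have $d_H(\beta_n, B_0) \to_p 0$.

To show $d(B_n, B_0) \to_p 0$, we first show that $B_0 \subset B_n$. Let $\beta^0$ be an arbitrary element of $B_0$. We need to show that $P_n\ell_{\mathrm{profile}}(\beta^0; z_i) \ge \max_{\beta \in B} P_n\ell_{\mathrm{profile}}(\beta; z_i) - o_p(1)$, which follows from

$$\left|P_n\ell_{\mathrm{profile}}(\beta^0; z_i) - \max_{\beta \in B} P_n\ell_{\mathrm{profile}}(\beta; z_i)\right| = \max_{\beta \in B} P_n\ell_{\mathrm{profile}}(\beta; z_i) - P_n\ell_{\mathrm{profile}}(\beta^0; z_i)$$

$$= \max_{\beta \in B}\left\{P\ell_{\mathrm{profile}}(\beta; Z) + \varepsilon_n(\beta)\right\} - P\ell_{\mathrm{profile}}(\beta^0; Z) - \varepsilon_n(\beta^0)$$

$$\le \max_{\beta \in B} P\ell_{\mathrm{profile}}(\beta; Z) + \max_{\beta \in B}\varepsilon_n(\beta) - P\ell_{\mathrm{profile}}(\beta^0; Z) - \varepsilon_n(\beta^0) = \max_{\beta \in B}\varepsilon_n(\beta) - \varepsilon_n(\beta^0) = o_p(1).$$

Here $\varepsilon_n(\beta)$ is an $o_p(1)$ term such that $P_n\ell_{\mathrm{profile}}(\beta; z_i) = P\ell_{\mathrm{profile}}(\beta; Z) + \varepsilon_n(\beta)$. Hence we can conclude $B_0 \subset B_n$ and $\mu(B_n^c \cap B_0) \to 0$.

Next, we show $\mu(B_n \cap B_0^c) \to_p 0$. Notice that $B_n$ is a compact subset since $P_n\ell_{\mathrm{profile}}(\beta; z_i)$ is continuous according to Assumption 9.(ii). We have $B_n \cap B_0^c \subset B_n \cap (B_0^{int})^c$, and $B_n \cap (B_0^{int})^c$ is compact. Hence $B_n \cap (B_0^{int})^c$ can be covered by a finite number of open balls in $\mathbb{R}^K$. Let $\beta_n^1, \ldots, \beta_n^m$ be the centers of these open balls. Notice that we have shown $d_H(\beta_n, B_0) \to_p 0$ for all $\beta_n \in B_n$. The diameters of these open balls covering $B_n \cap (B_0^{int})^c$, which are $2d_H(\beta_n^i, B_0)$ for $i = 1, \ldots, m$, must be $o_p(1)$ too.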
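Hence $\mu\left(B_n \cap (B_0^{int})^c\right) \to_p 0$ and $\mu(B_n \cap B_0^c) \to_p 0$. In sum, we have $d(B_n, B_0) \to_p 0$.

Proof of Theorem 10. Denote $P_n^{\mathrm{profile}} = \max_{\beta \in B} P_n\ell_{\mathrm{profile}}(\beta; z_i)$. When the temperature parameter is $\tau_n^{-1}$, Theorem 2.2 of Dekkers and Aarts (1991) implies that

$$q(b; \tau_n^{-1}) = \frac{\exp\left\{\left[P_n\ell_{\mathrm{profile}}(b; z_i) - P_n^{\mathrm{profile}}\right]/\tau_n^{-1}\right\}}{\int_{b' \in B}\exp\left\{\left[P_n\ell_{\mathrm{profile}}(b'; z_i) - P_n^{\mathrm{profile}}\right]/\tau_n^{-1}\right\}db'}.$$

Define

$$f_{B_0}(b) = \frac{1(b \in B_0)}{\mu(B_0)},$$

where $\mu(B_0)$ is the Lebesgue measure of the identified set $B_0$. We first show that $q(b; \tau_n^{-1})$ converges pointwise to $f_{B_0}(b)$ as $n \to \infty$. In the remainder of this proof, we frequently make the following decomposition

$$P_n\ell_{\mathrm{profile}}(b; z_i) = P\ell_{\mathrm{profile}}(b; Z) + \varepsilon_n(b).$$

In particular, let $\bar\varepsilon_n = P_n^{\mathrm{profile}} - P\ell_{\mathrm{profile}}(\beta_0; Z)$, where $\beta_0$ is the true value. It follows from the uniform convergence condition that $\varepsilon_n(b) = O_p(n^{-\kappa})$. Notice that $\varepsilon_n(b)/\tau_n^{-1} \to 0$ as $n \to \infty$ because $\tau_n^{-1} = O(n^{-\kappa_0})$, where $\kappa_0 < \kappa$. For $b \notin B_0$, we have

$$q(b; \tau_n^{-1}) \propto \exp\left\{\frac{P_n\ell_{\mathrm{profile}}(b; z_i) - P_n^{\mathrm{profile}}}{\tau_n^{-1}}\right\} = \exp\left\{\frac{P\ell_{\mathrm{profile}}(b; Z) + \varepsilon_n(b) - P\ell_{\mathrm{profile}}(\beta_0; Z) - \bar\varepsilon_n}{\tau_n^{-1}}\right\} \tag{C.4}$$

$$= \exp\left\{\frac{P\ell_{\mathrm{profile}}(b; Z) - P\ell_{\mathrm{profile}}(\beta_0; Z)}{\tau_n^{-1}} + \frac{\varepsilon_n(b) - \bar\varepsilon_n}{\tau_n^{-1}}\right\},$$

where (C.4) uses $P_n^{\mathrm{profile}} = P\ell_{\mathrm{profile}}(\beta_0; Z) + \bar\varepsilon_n$, and $|\bar\varepsilon_n| \le \max_{\beta \in B}|\varepsilon_n(\beta)|$ because $\max_{\beta \in B} P\ell_{\mathrm{profile}}(\beta; Z) = P\ell_{\mathrm{profile}}(\beta_0; Z)$. Because $b \notin B_0$, there exists a constant $\delta > 0$ such that $P\ell_{\mathrm{profile}}(b; Z) - P\ell_{\mathrm{profile}}(\beta_0; Z) < -\delta$. Hence,

$$\frac{P\ell_{\mathrm{profile}}(b; Z) - P\ell_{\mathrm{profile}}(\beta_0; Z)}{\tau_n^{-1}} + \frac{\varepsilon_n(b) - \bar\varepsilon_n}{\tau_n^{-1}} \to_p -\infty \quad \text{and} \quad q(b; \tau_n^{-1}) \to_p 0. \tag{C.5}$$

Let $\beta_0, \beta_0' \in B_0$ be two arbitrary elements in the identified set $B_0$. We have $P_n\ell_{\mathrm{profile}}(\beta_0; z_i) = P\ell_{\mathrm{profile}}(\beta_0; Z) + \varepsilon_n(\beta_0)$ and $P_n\ell_{\mathrm{profile}}(\beta_0'; z_i) = P\ell_{\mathrm{profile}}(\beta_0'; Z) + \varepsilon_n(\beta_0')$ by the uniform convergence of $P_n\ell_{\mathrm{profile}}(\beta; z_i)$ to $P\ell_{\mathrm{profile}}(\beta; Z)$. Note that $P\ell_{\mathrm{profile}}(\beta_0; Z) = P\ell_{\mathrm{profile}}(\beta_0'; Z)$ since $\beta_0, \beta_0' \in B_0$. Then we have

$$\frac{q(\beta_0; \tau_n^{-1})}{q(\beta_0'; \tau_n^{-1})} = \exp\left\{\frac{\varepsilon_n(\beta_0) - \varepsilon_n(\beta_0')}{\tau_n^{-1}}\right\} \to_p 1, \quad \text{for} \quad \frac{\varepsilon_n(\beta_0) - \varepsilon_n(\beta_0')}{\tau_n^{-1}} \to_p 0. \tag{C.6}$$

Equations (C.5) and (C.6) imply that $q(b; \tau_n^{-1}) \to_p f_{B_0}(b)$.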
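The pointwise argument above can be checked numerically on a toy criterion. The sketch below is an illustration under stated assumptions, not the estimator of the chapter: the objective `toy_objective` is a hypothetical one-dimensional function whose maximizers form the interval $[-0.5, 0.5]$, the parameter space is a grid on $[-2, 2]$, and the temperature values are arbitrary. It computes the share of the density proportional to $\exp\{(\text{objective} - \max)/\text{temperature}\}$ that falls on the maximizing set as the temperature decreases.

```python
import math


def boltzmann_mass_on_set(objective, grid, in_set, temperature):
    """Share of the density proportional to exp{(objective - max) / temperature}
    that falls on the grid points flagged by in_set."""
    values = [objective(b) for b in grid]
    peak = max(values)
    weights = [math.exp((v - peak) / temperature) for v in values]
    inside = sum(w for w, b in zip(weights, grid) if in_set(b))
    return inside / sum(weights)


def toy_objective(b):
    # Hypothetical criterion: flat (and maximal) on [-0.5, 0.5], decreasing outside.
    return -max(abs(b) - 0.5, 0.0) ** 2


def in_identified_set(b):
    # Small tolerance guards against floating-point error at the endpoints.
    return abs(b) <= 0.5 + 1e-12


grid = [-2.0 + 0.001 * k for k in range(4001)]
temperatures = [1.0, 1e-2, 1e-4, 1e-6]
masses = [
    boltzmann_mass_on_set(toy_objective, grid, in_identified_set, t)
    for t in temperatures
]
```

As the temperature shrinks, the entries of `masses` increase toward one, while the weights on the flat part stay equal to each other; this mirrors the two steps (C.5) and (C.6).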
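Moreover, it is easy to show that $q(b; \tau_n^{-1})$ restricted to $B_0$ converges to $f_{B_0}(b)$ uniformly. This follows by checking that $q(b; \tau_n^{-1})$ satisfies the Lipschitz condition when $P_n\ell_{\mathrm{profile}}(b; z_i)$ is first-order differentiable and uniformly continuous. Consequently, we have $\int_{B_0} q(b; \tau_n^{-1})\,db \to_p 1$, i.e. $P\left(\beta_n^{SA} \in B_0\right) \to_p 1$.

Proof of Proposition 8. As a finite dimensional subspace of $G$, $G_n$ is closed. Because $G_n$ is of finite dimension, it suffices to show that $G_n$ is bounded in order to show that $G_n$ is compact by the Heine-Borel theorem. Let $g \in G_n$, and we have

$$\|g\|_A^2 = \int_a^b \log^2 g(x)\,dx - \frac{1}{b-a}\left(\int_a^b \log g(x)\,dx\right)^2,$$

using equation (5) of Egozcue et al. (2006). By Assumption 10 and Jensen's inequality, we have

$$\left(\int_a^b \log g(x)\,dx\right)^2 < \int_a^b \log^2 g(x)\,dx < \infty.$$

Thus, $\|g\|_A < \infty$ for any $g \in G_n$, and the subspace $G_n$ is bounded and compact.

Proof of Theorem 12. Let $\{\psi_j\}_{j=1}^\infty$ be the orthonormal basis generated from the ordinary Legendre polynomials in $L^2$ space. Then the sequence $\{\varphi_j\}$ is given by

$$\varphi_j(x) = \frac{\exp\left\{\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right)\right\}}{\int_a^b \exp\left\{\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(t-a)}{b-a} - 1\right)\right\}dt}, \quad j \ge 0.$$

For notational simplicity, we denote

$$c_j = \int_a^b \exp\left\{\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(t-a)}{b-a} - 1\right)\right\}dt.$$

As $\{\varphi_j\}$ is an orthonormal basis, one can write any density function $g(x) \in G^0_x$ as a Fourier series,

$$g(x) = \bigoplus_{j=0}^\infty\left(\gamma_j \odot \varphi_j(x)\right).$$

The aim is to find the convergence rate of the Fourier coefficients $\{\gamma_j\}_{j=0}^\infty$ and the error bound of $|g_n(x) - g(x)|$, where

$$g_n(x) = \bigoplus_{j=0}^{J_n}\left(\gamma_j \odot \varphi_j(x)\right)$$

is the truncated approximation. We know that $\gamma_j = \langle g, \varphi_j\rangle_A$, which is defined as follows,

$$\langle g, \varphi_j\rangle_A = \int_a^b \log g(x)\log\varphi_j(x)\,dx - \frac{1}{b-a}\int_a^b \log g(x)\,dx\int_a^b \log\varphi_j(x)\,dx.$$

First, we have

$$\log\varphi_j(x) = \sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right) - \log c_j.$$

The inner product $\langle g, \varphi_j\rangle_A$ can be written as follows,

$$\langle g, \varphi_j\rangle_A = \int_a^b \log g(x)\left\{\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right) - \log c_j\right\}dx - \frac{1}{b-a}\int_a^b \log g(x)\,dx\int_a^b\left(\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right) - \log c_j\right)dx$$

$$= \int_a^b \log g(x)\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right)dx - \log c_j\int_a^b \log g(x)\,dx - \frac{1}{b-a}\int_a^b \log g(x)\,dx\int_a^b\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right)dx + \log c_j\int_a^b \log g(x)\,dx$$

$$= \int_a^b \log g(x)\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right)dx - \frac{1}{b-a}\int_a^b \log g(x)\,dx\int_a^b\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right)dx.$$

A noticeable fact is that, for $j \ge 1$,

$$\int_a^b\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right)dx = \frac{\sqrt{(2j+1)(b-a)}}{2}\int_{-1}^1 P_j(z)\,dz = 0. \tag{C.7}$$

Consequently, we have

$$\gamma_j = \langle g, \varphi_j\rangle_A = \int_a^b \log g(x)\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right)dx.$$

By using the following two facts about Legendre polynomials,

$$P_j(x) = \frac{1}{2j+1}\left[P'_{j+1}(x) - P'_{j-1}(x)\right], \qquad P_j(1) = 1 \text{ and } P_j(-1) = (-1)^j, \tag{C.8}$$

we have

$$\gamma_j = \int_a^b \log g(x)\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right)dx = \frac{\sqrt{(2j+1)(b-a)}}{2}\int_{-1}^1 \log g\left(a + \frac{b-a}{2}(z+1)\right)P_j(z)\,dz.$$

Denote $m(z) = \log g\left(a + (b-a)(z+1)/2\right)$, and we have

$$\gamma_j = \frac{\sqrt{(2j+1)(b-a)}}{2}\int_{-1}^1 m(z)P_j(z)\,dz = \frac{\sqrt{(2j+1)(b-a)}}{2}\int_{-1}^1 m(z)\frac{1}{2j+1}\left[P'_{j+1}(z) - P'_{j-1}(z)\right]dz$$

$$= \sqrt{\frac{b-a}{2j+1}}\,\frac{1}{2}\int_{-1}^1 m(z)\left[P'_{j+1}(z) - P'_{j-1}(z)\right]dz$$

$$= \sqrt{\frac{b-a}{2j+1}}\,\frac{1}{2}\left\{\Big[m(z)P_{j+1}(z) - m(z)P_{j-1}(z)\Big]_{-1}^{1} + \int_{-1}^1 m'(z)P_{j-1}(z)\,dz - \int_{-1}^1 m'(z)P_{j+1}(z)\,dz\right\}.$$

By the properties (C.8), the boundary term vanishes and we have

$$\gamma_j = \sqrt{\frac{b-a}{2j+1}}\,\frac{1}{2}\left\{\int_{-1}^1 m'(z)P_{j-1}(z)\,dz - \int_{-1}^1 m'(z)P_{j+1}(z)\,dz\right\}$$

$$= \sqrt{\frac{b-a}{2j+1}}\,\frac{1}{2}\left\{\frac{1}{2j-1}\int_{-1}^1 m'(z)\left[P'_j(z) - P'_{j-2}(z)\right]dz - \frac{1}{2j+3}\int_{-1}^1 m'(z)\left[P'_{j+2}(z) - P'_j(z)\right]dz\right\}$$

$$= \sqrt{\frac{b-a}{2j+1}}\left\{\frac{1}{2j-1}\int_{-1}^1 m''(z)\left[\frac{P_{j-2}(z)}{2} - \frac{P_j(z)}{2}\right]dz - \frac{1}{2j+3}\int_{-1}^1 m''(z)\left[\frac{P_j(z)}{2} - \frac{P_{j+2}(z)}{2}\right]dz\right\}.$$

Next, we know from Wang and Xiang (2012) that

$$|P_j(x)| \le \sqrt{\frac{\pi}{2j(1 - x^2)}}, \quad -1 < x < 1.$$

Thus,

$$|\gamma_j| \le \sqrt{\frac{b-a}{2j+1}}\left\{\frac{1}{2j-1}\int_{-1}^1\frac{|m''(z)|}{\sqrt{1 - z^2}}\left[\frac{1}{2}\sqrt{\frac{\pi}{2(j-2)}} + \frac{1}{2}\sqrt{\frac{\pi}{2j}}\right]dz + \frac{1}{2j+3}\int_{-1}^1\frac{|m''(z)|}{\sqrt{1 - z^2}}\left[\frac{1}{2}\sqrt{\frac{\pi}{2j}} + \frac{1}{2}\sqrt{\frac{\pi}{2(j+2)}}\right]dz\right\}.$$

Denote the Chebyshev-weighted seminorm

$$\|u\|_T = \int_{-1}^1\frac{|u'(x)|}{\sqrt{1 - x^2}}\,dx,$$

and $V_p = \|m^{(p)}\|_T$. Then

$$|\gamma_j| \le \sqrt{\frac{b-a}{2j+1}}\left\{\frac{V_1}{2(2j-1)}\left[\sqrt{\frac{\pi}{2(j-2)}} + \sqrt{\frac{\pi}{2j}}\right] + \frac{V_1}{2(2j+3)}\left[\sqrt{\frac{\pi}{2j}} + \sqrt{\frac{\pi}{2(j+2)}}\right]\right\} \le \sqrt{\frac{b-a}{2j+1}}\,\frac{V_1}{j - 1/2}\sqrt{\frac{\pi}{2(j-2)}}.$$

In general, we can prove that

$$|\gamma_j| \le \sqrt{\frac{b-a}{2j+1}}\,\frac{V_p}{(j - 1/2)(j - 3/2)\cdots(j - (2p-1)/2)}\sqrt{\frac{\pi}{2(j - p - 1)}},$$

if $\|m^{(p)}\|_T = V_p < \infty$ for some $p \ge 1$.

Proof of Theorem 13.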
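By the definition of the norm $\|\cdot\|_A$, one then has

$$d_A(f, g) = \left\{\int_a^b\left(\log(f \ominus g)(x)\right)^2dx - \frac{1}{b-a}\left(\int_a^b \log(f \ominus g)(x)\,dx\right)^2\right\}^{1/2}.$$

One can show that

$$d_A(g, g_n) = \|g \ominus g_n\|_A = \left\|\bigoplus_{j=J_n+1}^\infty \gamma_j \odot \varphi_j\right\|_A.$$

For the basis generated by the Legendre polynomials, we have

$$g \ominus g_n = \frac{\exp\left\{\sum_{j=J_n+1}^\infty \gamma_j\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right)\right\}}{\int_a^b \exp\left\{\sum_{j=J_n+1}^\infty \gamma_j\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(t-a)}{b-a} - 1\right)\right\}dt},$$

$$\log(g \ominus g_n) = \sum_{j=J_n+1}^\infty \gamma_j\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right) - \log\left[\int_a^b \exp\left\{\sum_{j=J_n+1}^\infty \gamma_j\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(t-a)}{b-a} - 1\right)\right\}dt\right] := A(x) - B.$$

Thus,

$$\int_a^b\left(\log(g \ominus g_n)(x)\right)^2dx - \frac{1}{b-a}\left(\int_a^b \log(g \ominus g_n)(x)\,dx\right)^2 = \int_a^b\left(A(x) - B\right)^2dx - \frac{1}{b-a}\left(\int_a^b\left(A(x) - B\right)dx\right)^2$$

$$= \int_a^b\left(A^2(x) + B^2 - 2A(x)B\right)dx - \frac{1}{b-a}\left(\int_a^b A(x)\,dx - (b-a)B\right)^2$$

$$= \int_a^b\left(A^2(x) + B^2 - 2A(x)B\right)dx - \frac{1}{b-a}\left\{\left(\int_a^b A(x)\,dx\right)^2 + (b-a)^2B^2 - 2(b-a)B\int_a^b A(x)\,dx\right\}$$

$$= \int_a^b A^2(x)\,dx - \frac{1}{b-a}\left(\int_a^b A(x)\,dx\right)^2 = \int_a^b A^2(x)\,dx,$$

where the last equation follows from (C.7). Therefore,

$$d_A(g, g_n) = \left(\int_a^b A^2(x)\,dx\right)^{1/2}.$$

Plugging the term $A(x)$ into the above equation, we have

$$d_A(g, g_n) = \left(\int_a^b\left\{\sum_{j=J_n+1}^\infty \gamma_j\sqrt{\frac{2j+1}{b-a}}\,P_j\left(\frac{2(x-a)}{b-a} - 1\right)\right\}^2dx\right)^{1/2} = \left(\frac{b-a}{2}\int_{-1}^1\left\{\sum_{j=J_n+1}^\infty \gamma_j\sqrt{\frac{2j+1}{b-a}}\,P_j(z)\right\}^2dz\right)^{1/2}.$$

Denote $b_j = \gamma_j\sqrt{(2j+1)/(b-a)}$. Because the Legendre polynomials are bounded by one,

$$|P_j(z)| \le 1, \quad -1 \le z \le 1,$$

we have

$$d_A(g, g_n) \le \sqrt{b-a}\sum_{j=J_n+1}^\infty |b_j|.$$

By Theorem 12, we have

$$|b_j| \le \frac{V_p}{(j - 1/2)(j - 3/2)\cdots(j - (2p-1)/2)}\sqrt{\frac{\pi}{2(j - p - 1)}}.$$

Thus, since $j - p - 1 \ge J_n - p$ for every $j \ge J_n + 1$,

$$d_A(g, g_n) \le \sqrt{b-a}\sum_{j=J_n+1}^\infty\frac{V_p}{(j - 1/2)(j - 3/2)\cdots(j - (2p-1)/2)}\sqrt{\frac{\pi}{2(j - p - 1)}} \le V_p\sqrt{b-a}\sqrt{\frac{\pi}{2(J_n - p)}}\sum_{j=J_n+1}^\infty\frac{1}{(j - 1/2)(j - 3/2)\cdots(j - (2p-1)/2)}.$$

For $p > 1$, we have

$$d_A(g, g_n) \le V_p\sqrt{\frac{\pi(b-a)}{2(J_n - p)}}\sum_{j=J_n+1}^\infty\frac{1}{(j - 1/2)(j - 3/2)\cdots(j - (2p-1)/2)}$$

$$= V_p\sqrt{\frac{\pi(b-a)}{2(J_n - p)}}\,\frac{1}{p-1}\sum_{j=J_n+1}^\infty\left\{\frac{1}{(j - 3/2)\cdots(j - (2p-1)/2)} - \frac{1}{(j - 1/2)\cdots(j - (2p-3)/2)}\right\}$$

$$= \frac{V_p}{(p-1)(J_n - 1/2)(J_n - 3/2)\cdots(J_n - (2p-3)/2)}\sqrt{\frac{\pi(b-a)}{2(J_n - p)}}.$$

This completes the proof.

In the sequel, we write $h(Z; \beta, g)$ and $f(Z; \beta, \xi)$ as $h(\beta, g, Z)$ and $f(\beta, \xi, Z)$, respectively, so that the notation is easier to follow.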
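The decay of Legendre coefficients that drives Theorems 12 and 13 can be seen numerically. The sketch below is a minimal illustration under stated assumptions: it uses the hypothetical test function $m(z) = |z|^3$ on $[-1, 1]$ (not a log-density from the chapter), the standard $L^2$-normalized Legendre basis $\sqrt{(2j+1)/2}\,P_j$, Bonnet's recurrence for $P_j$, and composite Simpson quadrature. The helper names `legendre_values` and `legendre_coefficients` are made up for this example.

```python
import math


def legendre_values(max_degree, z):
    """Return [P_0(z), ..., P_max_degree(z)] using Bonnet's recurrence
    (j + 1) P_{j+1}(z) = (2j + 1) z P_j(z) - j P_{j-1}(z)."""
    values = [1.0, z]
    for j in range(1, max_degree):
        values.append(((2 * j + 1) * z * values[j] - j * values[j - 1]) / (j + 1))
    return values[: max_degree + 1]


def legendre_coefficients(m, max_degree, n_intervals=4000):
    """Coefficients of m in the orthonormal basis sqrt((2j+1)/2) P_j on [-1, 1],
    computed with composite Simpson quadrature (n_intervals must be even)."""
    h = 2.0 / n_intervals
    sums = [0.0] * (max_degree + 1)
    for k in range(n_intervals + 1):
        z = -1.0 + k * h
        if k in (0, n_intervals):
            weight = 1.0
        else:
            weight = 4.0 if k % 2 else 2.0
        mz = m(z)
        for j, pj in enumerate(legendre_values(max_degree, z)):
            sums[j] += weight * mz * pj
    return [math.sqrt((2 * j + 1) / 2.0) * s * h / 3.0 for j, s in enumerate(sums)]


# Hypothetical test function with limited smoothness (third derivative jumps at 0).
coeffs = legendre_coefficients(lambda z: abs(z) ** 3, 12)
```

Because $|z|^3$ is even, the odd-degree entries of `coeffs` are numerically zero, and the even-degree entries shrink at an algebraic rate as the degree grows, in line with bounds whose denominators are products of the form $(j - 1/2)(j - 3/2)\cdots$.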
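Denote

$$f_1(\beta, \xi, Z) = \frac{\partial f(\beta, \xi, Z)}{\partial\beta}, \quad f_{11}(\beta, \xi, Z) = \frac{\partial^2 f(\beta, \xi, Z)}{\partial\beta^2}, \quad h_1(\beta, g, Z) = \int_a^b f_1(\beta, \xi, Z)g(\xi)\,d\xi, \quad h_{11}(\beta, g, Z) = \int_a^b f_{11}(\beta, \xi, Z)g(\xi)\,d\xi.$$

Lemma C.1. Let $\ell(\beta, g, Z) = \log\int_a^b f(Z; \beta, \xi)g(\xi)\,d\xi$. Suppose $\ell(\beta, g, Z)$ is twice Fréchet differentiable (Assumption 11.(iii)). Let $\ell_\eta$ be a parametric path in $G$ through $\ell(\beta, g, Z)$, that is $\ell_\eta \in G$ and $\ell_\eta|_{\eta=0} = \ell(\beta, g, Z)$. Let $R = \left\{r : r = \partial\ell_\eta/\partial\eta\,|_{\eta=0}\right\}$. For any $r \in R$ we have

$$\ell_1(\beta, g, Z) = \frac{h_1(\beta, g, Z)}{h(\beta, g, Z)}, \qquad \ell_{11}(\beta, g, Z) = -\left(\frac{h_1(\beta, g, Z)}{h(\beta, g, Z)}\right)^2 + \frac{h_{11}(\beta, g, Z)}{h(\beta, g, Z)},$$

$$\ell_2(\beta, g, Z)[r] = \frac{\int_a^b f(\beta, \xi, Z)g(\xi)\left[\log r(\xi) - A(r)\right]d\xi}{h(\beta, g, Z)}, \tag{C.9}$$

$$\ell_{22}(\beta, g, Z)[r_1, r_2] = \frac{D(r_1, r_2) - A(r_1)B(r_2) - h(\beta, g, Z)\left[C(r_1, r_2) - A(r_1)A(r_2)\right]}{h(\beta, g, Z)} - \frac{B(r_2)\left[B(r_1) - A(r_2)h(\beta, g, Z)\right]}{\left[h(\beta, g, Z)\right]^2} = T_1(\beta, g, Z)[r_1, r_2] - T_2(\beta, g, Z)[r_1, r_2], \tag{C.10}$$

where

$$A(r_i) = \int_a^b g(\xi)\log r_i(\xi)\,d\xi, \qquad B(r_i) = \int_a^b f(\beta, \xi, Z)g(\xi)\log r_i(\xi)\,d\xi,$$

$$C(r_1, r_2) = \int_a^b g(\xi)\log r_1(\xi)\log r_2(\xi)\,d\xi, \qquad D(r_1, r_2) = \int_a^b f(\beta, \xi, Z)g(\xi)\log r_1(\xi)\log r_2(\xi)\,d\xi,$$

$i = 1, 2$.

Proof. These identities follow from direct calculations of derivatives on Banach space, e.g. Section 6 of Wellner and Zhang (2007). We omit the details of the calculations to save space.

Lemma C.2. Suppose Assumption 11.(v) holds. Let $g \in G_0$ and let $g_n$ be the projection of $g$ in $G_n$. If $J_n = O(n^\nu)$ for $\nu > 0$, we have

$$\|g_n - g\|_2 < O\left(n^{-\nu(p - \frac{1}{2})}\right), \qquad \|g_n - g\|_\infty < O\left(n^{-\frac{\nu p(2p-1)}{2p+1}}\right).$$

Proof. First, it is easy to verify the following identity by using equation (4) in Egozcue et al. (2006),

$$\langle f, g\rangle_A = \langle\log(cf), \log(dg)\rangle_2, \quad f, g \in G, \tag{C.11}$$

where $\langle\cdot,\cdot\rangle_A$ is the inner product of the Hilbert space $G$ as defined in (3.13), $\langle\cdot,\cdot\rangle_2$ is the ordinary $L^2$ inner product, and $c, d$ are two constants that satisfy the following conditions,

$$\log c = -\frac{1}{b-a}\int_a^b \log f(\xi)\,d\xi, \qquad \log d = -\frac{1}{b-a}\int_a^b \log g(\xi)\,d\xi.$$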
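From (3.12) and (C.11), we have

$$d_A(f, g)^2 = \|f \ominus g\|_A^2 = \langle f \ominus g, f \ominus g\rangle_A = \left\langle\log\left(c\,\frac{f/g}{\varrho}\right), \log\left(c\,\frac{f/g}{\varrho}\right)\right\rangle_2,$$

where $\varrho = \int_a^b f(\xi)/g(\xi)\,d\xi$, and

$$\log c = -\frac{1}{b-a}\int_a^b \log\left(\frac{f(\xi)/g(\xi)}{\varrho}\right)d\xi = -\frac{1}{b-a}\int_a^b\left(\log f(\xi) - \log g(\xi)\right)d\xi + \log\varrho.$$

Thus, writing $\tilde c = c/\varrho$, we have

$$\log\tilde c = -\frac{1}{b-a}\int_a^b\left(\log f(\xi) - \log g(\xi)\right)d\xi.$$

Using this identity, we have

$$\langle f \ominus g, f \ominus g\rangle_A = \int_a^b\left[\log\tilde c + \log\frac{f(\xi)}{g(\xi)}\right]^2d\xi = (b-a)\left(\log\tilde c\right)^2 + \int_a^b\left[\log\frac{f(\xi)}{g(\xi)}\right]^2d\xi + 2\log\tilde c\int_a^b \log\frac{f(\xi)}{g(\xi)}\,d\xi$$

$$= \int_a^b\left[\log\frac{f(\xi)}{g(\xi)}\right]^2d\xi - (b-a)\left(\log\tilde c\right)^2 = \int_a^b\left[\log f(\xi) - \log g(\xi)\right]^2d\xi - \frac{1}{b-a}\left[\int_a^b\left(\log f(\xi) - \log g(\xi)\right)d\xi\right]^2.$$

By Jensen's inequality, we have

$$\langle f \ominus g, f \ominus g\rangle_A \ge \left(1 - \frac{1}{b-a}\right)\int_a^b\left[\log f(\xi) - \log g(\xi)\right]^2d\xi = \left(1 - \frac{1}{b-a}\right)\langle\log f - \log g, \log f - \log g\rangle_2. \tag{C.12}$$

It follows from Theorem 13 that

$$d_A(g_n, g) = \|g_n \ominus g\|_A = O\left(n^{-\nu(p - \frac{1}{2})}\right), \qquad d_A^2(g_n, g) = \langle g_n \ominus g, g_n \ominus g\rangle_A = O\left(n^{-2\nu(p - \frac{1}{2})}\right).$$

By (C.12), we have

$$\langle\log g_n - \log g, \log g_n - \log g\rangle_2 < O\left(n^{-2\nu(p - \frac{1}{2})}\right), \qquad \|\log g_n - \log g\|_2 < O\left(n^{-\nu(p - \frac{1}{2})}\right).$$

Note that both $g$ and $g_n$ are bounded functions. Thus,

$$\int_a^b\left(\log g_n(\xi) - \log g(\xi)\right)^2d\xi = \int_a^b\left[g_n(\xi) - g(\xi)\right]^2\left(\frac{1}{\lambda g + (1 - \lambda)g_n}\right)^2d\xi,$$

for some $\lambda \in [0, 1]$. By the boundedness, there are constants $0 < m \le M < \infty$ such that

$$m < \left(\frac{1}{\lambda g + (1 - \lambda)g_n}\right)^2 < M.$$

Hence,

$$\frac{1}{M}\int_a^b\left[\log g_n(\xi) - \log g(\xi)\right]^2d\xi \le \int_a^b\left[g_n(\xi) - g(\xi)\right]^2d\xi \le \frac{1}{m}\int_a^b\left[\log g_n(\xi) - \log g(\xi)\right]^2d\xi.$$

By the squeeze theorem, we have

$$\|g_n - g\|_2 \asymp \|\log g_n - \log g\|_2.$$

Finally, using Lemma 7 of Shen and Wong (1994) (note that their lemma still holds when one drops the condition $f(a) = f(b) = 0$ in their notation), we have

$$\|g_n - g\|_\infty \le 2\|g_n - g\|_2^{\frac{2p}{2p+1}}L^{\frac{1}{2p+1}},$$

for some constant $L$. Consequently, we have

$$\|g_n - g\|_\infty < O\left(n^{-\frac{\nu p(2p-1)}{2p+1}}\right).$$

This completes the proof.

Proof of Theorem 14. The convergence rate is derived from the general result, Theorem C.1, in Appendix C.2. We first verify the three Conditions C.1-C.3 needed in Theorem C.1.

Let $\theta^0 = (\beta^0, g^0) \in \Theta_0$ be a generic element of $\Theta_0$, and let $\theta = (\beta, g) \in \Theta_n \setminus \Theta_0$. By Taylor expansion, we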
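have

$$-\left\{P\ell(\beta^0, g^0, Z) - P\ell(\beta, g, Z)\right\} = P\ell(\beta, g, Z) - P\ell(\beta^0, g^0, Z)$$

$$= \frac{1}{2}P\left[(\beta - \beta^0)'\ell_{11}(\beta^0, g^0, Z)(\beta - \beta^0) + \ell_{22}(\beta^0, g^0, Z)[g \ominus g^0, g \ominus g^0]\right] + o\left(d^2(\theta, \theta^0)\right)$$

$$= \frac{1}{2}P\left[(\beta - \beta^0)'\ell_{11}(\beta^0, g^0, Z)(\beta - \beta^0)\right] + \frac{1}{2}P\left\{\ell_{22}(\beta^0, g^0, Z)[g \ominus g^0, g \ominus g^0]\right\} + o\left(d^2(\theta, \theta^0)\right) = A_1 + A_2 + o\left(d^2(\theta, \theta^0)\right).$$

From Assumption 11.(iv), we have

$$A_1 = \frac{1}{2}(\beta - \beta^0)'P\left\{\ell_{11}(\beta^0, g^0, Z)\right\}(\beta - \beta^0) \le \frac{1}{2}\lambda_1\|\beta - \beta^0\|^2, \tag{C.13}$$

where $\lambda_1 < 0$ is the greatest eigenvalue of the Hessian matrix $P\ell_{11}(\beta^0, g^0, Z)$. To shorten the displays below, write $h^0(z) = \int_a^b f(\beta^0, \xi, z)g^0(\xi)\,d\xi$ and $B^0(r; z) = \int_a^b f(\beta^0, \xi, z)g^0(\xi)\log r(\xi)\,d\xi$. From (C.9) and the condition that $P\ell_2(\beta^0, g^0, Z_i)[r] = 0$, we have

$$P\left[\frac{B^0(r; Z_i)}{h^0(Z_i)}\right] = \int_a^b g^0(\xi)\log r(\xi)\,d\xi. \tag{C.14}$$

Let $T_1(\beta, g, Z)[r_1, r_2]$ and $T_2(\beta, g, Z)[r_1, r_2]$ be defined as in (C.10). It follows from the identity (C.14) that

$$PT_2(\beta^0, g^0, Z_i)[r, r] = P\left\{\left[\frac{B^0(r; Z_i)}{h^0(Z_i)}\right]^2\right\} - P^2\left[\frac{B^0(r; Z_i)}{h^0(Z_i)}\right] = \mathrm{Var}\left(\frac{B^0(r; Z_i)}{h^0(Z_i)}\right).$$

Direct calculations show that

$$PT_1(\beta^0, g^0, Z_i)[r, r] = P\left\{\frac{\int_a^b f(\beta^0, \xi, Z_i)g^0(\xi)\log^2 r(\xi)\,d\xi - B^0(r; Z_i)\int_a^b g^0(\xi)\log r(\xi)\,d\xi}{h^0(Z_i)}\right\} - \left\{\int_a^b g^0(\xi)\log^2 r(\xi)\,d\xi - \left[\int_a^b g^0(\xi)\log r(\xi)\,d\xi\right]^2\right\}$$

$$= \int_{\mathcal{Z}}\int_a^b f(\beta^0, \xi, z)g^0(\xi)\log^2 r(\xi)\,d\xi\,dz - \left[\int_{\mathcal{Z}}\int_a^b f(\beta^0, \xi, z)g^0(\xi)\log r(\xi)\,d\xi\,dz\right]\left[\int_a^b g^0(\xi)\log r(\xi)\,d\xi\right] - \left\{\int_a^b g^0(\xi)\log^2 r(\xi)\,d\xi - \left[\int_a^b g^0(\xi)\log r(\xi)\,d\xi\right]^2\right\}$$

$$= \left\{\int_a^b g^0(\xi)\log^2 r(\xi)\,d\xi - \left[\int_a^b g^0(\xi)\log r(\xi)\,d\xi\right]^2\right\} - \left\{\int_a^b g^0(\xi)\log^2 r(\xi)\,d\xi - \left[\int_a^b g^0(\xi)\log r(\xi)\,d\xi\right]^2\right\} = 0.$$

Consequently,

$$P\left\{\ell_{22}(\beta^0, g^0, Z_i)[r, r]\right\} = -\mathrm{Var}\left(\frac{B^0(r; Z_i)}{h^0(Z_i)}\right).$$

Therefore, $P\left\{\ell_{22}(\beta^0, g^0, Z_i)[r, r]\right\}$ is an elliptic bilinear form, and there exists a constant $c > 0$ such that

$$A_2 = \frac{1}{2}P\left\{\ell_{22}(\beta^0, g^0, Z)[g \ominus g^0, g \ominus g^0]\right\} \le -\frac{1}{2}c\,d_g(g, g^0)^2. \tag{C.15}$$

Since $\theta = (\beta, g) \in \Theta_n \setminus \Theta_0$ and $\theta^0 = (\beta^0, g^0) \in \Theta_0$ are both arbitrary, we can conclude from (C.13) and (C.15) that

$$\inf_{\{d_H(\theta, \Theta_0) \ge \varepsilon,\ \theta \in \Theta_n,\ \theta^0 \in \Theta_0\}} P\left\{\ell(\beta^0, g^0, Z) - \ell(\beta, g, Z)\right\} \gtrsim \varepsilon^2.$$

So Condition C.1 in Appendix C.2 holds with the constant $\alpha = 1$.

Next, we have

$$\left\{\ell(\beta, g, Z) - \ell(\beta^0, g^0, Z)\right\}^2 = \left\{\log\int_a^b f(\beta, \xi, Z)g(\xi)\,d\xi - \log\int_a^b f(\beta^0, \xi, Z)g^0(\xi)\,d\xi\right\}^2$$

$$= \left[\log\int_a^b f(\beta, \xi, Z)g(\xi)\,d\xi - \log\int_a^b f(\beta^0, \xi, Z)g(\xi)\,d\xi + \log\int_a^b f(\beta^0, \xi, Z)g(\xi)\,d\xi - \log\int_a^b f(\beta^0, \xi, Z)g^0(\xi)\,d\xi\right]^2$$

$$\le 2\left\{\ell(\beta, g, Z) - \ell(\beta^0, g, Z)\right\}^2 + 2\left\{\ell(\beta^0, g, Z) - \ell(\beta^0, g^0, Z)\right\}^2. \tag{C.16}$$

By the mean value theorem for linear operators (c.f. Lemma 4.4.7 of Hutson and Pym, 1980),

$$\left|\ell(\beta, g, Z) - \ell(\beta^0, g, Z)\right| \le \|\beta - \beta^0\|\sup_{\{\dot\beta:\ \dot\beta = \lambda\beta + (1 - \lambda)\beta^0,\ \lambda \in [0, 1]\}}\|\ell_1(\dot\beta, g, Z)\|. \tag{C.17}$$

Note that the supremum in (C.17) is finite according to our assumptions. Similarly, we have

$$\left|\ell(\beta^0, g, Z) - \ell(\beta^0, g^0, Z)\right| \le \|g \ominus g^0\|\sup_{\{\dot g:\ \dot g = \lambda g + (1 - \lambda)g^0,\ \lambda \in [0, 1]\}}\|\ell_2(\beta^0, \dot g, Z)\|. \tag{C.18}$$

Again, the supremum in (C.18) is finite. It then follows from (C.16)-(C.18) that

$$\sup_{\{d_H(\theta, \Theta_0) \le \varepsilon,\ \theta \in \Theta_n,\ \theta^0 \in \Theta_0\}}\mathrm{Var}\left\{\ell(\beta^0, g^0, Z) - \ell(\beta, g, Z)\right\} = \sup_{\{\theta \in \Theta_n,\ \theta^0 \in \Theta_0,\ d(\theta, \theta^0) = d_H(\theta, \Theta_0) \le \varepsilon\}}\mathrm{Var}\left\{\ell(\beta^0, g^0, Z) - \ell(\beta, g, Z)\right\}$$

$$\lesssim d(\beta, \beta^0)^2 + d_A(g, g^0)^2 = d(\theta, \theta^0)^2 < \varepsilon^2.$$

So Condition C.2 in Appendix C.2 holds with the constant $\beta = 1$ in the notation of Appendix C.2.

Let $g_1, g_2 \in G_n$, with $g_1 = \bigoplus_{j=0}^{J_n-1}\gamma_{1,j} \odot \varphi_j$ and $g_2 = \bigoplus_{j=0}^{J_n-1}\gamma_{2,j} \odot \varphi_j$, where $\{\varphi_j\}_{j=0}^{J_n-1}$ are defined in (3.15). It is not difficult to verify that

$$\|g_1 \ominus g_2\|_A = \left[\sum_{j=0}^{J_n-1}(\gamma_{1,j} - \gamma_{2,j})^2\right]^{1/2}.$$

Then, using (C.12), $\|g_n - g\|_2 \asymp \|\log g_n - \log g\|_2$, and $\|g_n - g\|_\infty \le 2\|g_n - g\|_2^{\frac{2p}{2p+1}}L^{\frac{1}{2p+1}}$, we have

$$\|g_1 - g_2\|_\infty \lesssim 2L^{\frac{1}{2p+1}}\left[\sum_{j=0}^{J_n-1}(\gamma_{1,j} - \gamma_{2,j})^2\right]^{\frac{p}{2p+1}} = 2L^{\frac{1}{2p+1}}\|\gamma_1 - \gamma_2\|^{\frac{2p}{2p+1}},$$

where $\gamma_i = (\gamma_{i,0}, \ldots, \gamma_{i,J_n-1})'$, $i = 1, 2$.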
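By the calculations in Shen and Wong (1994) on page 597 or Example 19.7 of van der Vaart (1998) on page 271, and denoting the ceiling of $x$ by $\lceil x\rceil$, for any $\varepsilon > 0$ there exists a set of brackets $\left\{[g^L_i, g^U_i] : i = 1, \ldots, \lceil(1/\varepsilon)^{c_1J_n}\rceil\right\}$ such that for any $g \in G_n$, $g^L_i \le g \le g^U_i$ for some $1 \le i \le \lceil(1/\varepsilon)^{c_1J_n}\rceil$, where $\|g^U_i - g^L_i\|_\infty \le \varepsilon$. Note that the set $\{\beta - \beta^0 : \beta \in B, \beta^0 \in B_0\}$ is a compact subset of $\mathbb{R}^K$; it hence can be covered by $\lceil c_2(1/\varepsilon)^K\rceil$ balls with radius $\varepsilon$. By (C.17) and (C.18), we can claim that the $\varepsilon$-bracketing number associated with the $\|\cdot\|_\infty$ norm for the class $F_n = \{\ell(\theta, \cdot) - \ell(\pi_n\theta_0, \cdot) : \theta \in \Theta_n, \theta_0 \in \Theta_0\}$ satisfies

$$N_{[\,]}(\varepsilon, F_n, \|\cdot\|_\infty) \lesssim (1/\varepsilon)^{c_1J_n}\,c_2\,(1/\varepsilon)^K \lesssim (1/\varepsilon)^{c_1J_n + K},$$

with $c_1$ and $c_2$ being constants. It follows from the fact that the covering number is bounded by the bracketing number that

$$H(\varepsilon, F_n, \|\cdot\|_\infty) = \log N(\varepsilon, F_n, \|\cdot\|_\infty) \lesssim (c_1J_n + K)\log(1/\varepsilon) \lesssim n^\nu\log(1/\varepsilon).$$

Note that $\log(1/\varepsilon) = \varepsilon^{-0^+}$. Thus, $2r_0 = \nu$ and $r = 0^+$ in the notation of Condition C.3 in Appendix C.2.

Thus, the $\tau$ in Theorem C.1 is $\frac{1-\nu}{2} - \frac{\log\log n}{2\log n} \asymp \frac{1-\nu}{2}$, as $\frac{\log\log n}{2\log n} \to 0$ when $n \to \infty$. Moreover, $K^{1/2}(\pi_n\theta_0, \theta_0) < O\left(n^{-\frac{\nu p(2p-1)}{2p+1}}\right)$ by Lemma C.2. Also, we have $d(\pi_n\theta_0, \theta_0) = d_A(g_0, g_{0,n}) = O\left(n^{-\nu(p - \frac{1}{2})}\right)$ by Theorem 13. For the sieve MLE, we know that $\eta_n = 0$ in the notation of Theorem C.1. Hence, Theorem C.1 implies that

$$d_H\left(\hat\theta_n, \Theta_0\right) = O_p\left(\max\left\{n^{-\frac{1-\nu}{2}},\ O\left(n^{-\nu(p - \frac{1}{2})}\right),\ O\left(n^{-\frac{\nu p(2p-1)}{2p+1}}\right)\right\}\right) = O_p\left(n^{-\min\left\{\frac{1-\nu}{2},\ \frac{\nu p(2p-1)}{2p+1}\right\}}\right).$$

The theorem has been proved.

Proof of Corollary 1. Since $P_n\hat\ell_{\mathrm{profile}}(\beta; z_i)$ converges uniformly to $P\ell_{\mathrm{profile}}(\beta; Z)$, it suffices to derive the convergence rate of $P_n\hat\ell_{\mathrm{profile}}(\beta; z_i) - P\ell_{\mathrm{profile}}(\beta; Z)$ for any particular $\beta \in B$. Here, we set $\beta = \beta_0$.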
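We have

$$\left|P_n\hat\ell_{\mathrm{profile}}(\beta_0; z_i) - P\ell_{\mathrm{profile}}(\beta_0; Z)\right| = \left|P_n\hat\ell_{\mathrm{profile}}(\beta_0; z_i) - P_n\hat\ell_{\mathrm{profile}}(\hat\beta_n; z_i) + P_n\hat\ell_{\mathrm{profile}}(\hat\beta_n; z_i) - P\ell_{\mathrm{profile}}(\hat\beta_n; Z) + P\ell_{\mathrm{profile}}(\hat\beta_n; Z) - P\ell_{\mathrm{profile}}(\beta_0; Z)\right|$$

$$\le \left|P_n\hat\ell_{\mathrm{profile}}(\hat\beta_n; z_i) - P_n\hat\ell_{\mathrm{profile}}(\beta_0; z_i)\right| + \left|P_n\hat\ell_{\mathrm{profile}}(\hat\beta_n; z_i) - P\ell_{\mathrm{profile}}(\hat\beta_n; Z)\right| + \left|P\ell_{\mathrm{profile}}(\hat\beta_n; Z) - P\ell_{\mathrm{profile}}(\beta_0; Z)\right|. \tag{C.19}$$

By Assumption 11, the three terms have the following orders,

$$\left|P_n\hat\ell_{\mathrm{profile}}(\hat\beta_n; z_i) - P_n\hat\ell_{\mathrm{profile}}(\beta_0; z_i)\right| \asymp \left|P\ell_{\mathrm{profile}}(\hat\beta_n; Z) - P\ell_{\mathrm{profile}}(\beta_0; Z)\right| \asymp \|\hat\beta_n - \beta_0\|,$$

$$\left|P_n\hat\ell_{\mathrm{profile}}(\hat\beta_n; z_i) - P\ell_{\mathrm{profile}}(\hat\beta_n; Z)\right| \asymp \|\hat g_n(\cdot; \beta) - g_{\sup}(\cdot; \beta)\|.$$

Moreover, $\|\hat\beta_n - \beta_0\|$ and $\|\hat g_n(\cdot; \beta) - g_{\sup}(\cdot; \beta)\|$ have the same order as $d_H(\hat\theta_n, \Theta_0)$ by Corollary C.1.

Proof of Corollary 2. This result follows from Theorem 2.1 of Ding and Nan (2011).

Proof of Theorem 15. We first prove the consistency $d(\hat B_n, B_0) := \mu(\hat B_n \cap B_0^c) + \mu(\hat B_n^c \cap B_0) \to_p 0$, where $\mu$ is the Lebesgue measure on $\mathbb{R}^K$. Denote by $M_n = \{\beta_n^1, \ldots, \beta_n^{m_n}\}$ the set of $m_n$ marks. This set can be divided into two parts,

$$M_n = M_n^{In} \cup M_n^{Out},$$

where $M_n^{In} \subset B_0$ and $M_n^{Out} \subset B_0^c$. Denote

$$M_n^{In} = \left\{\beta_n^{In,1}, \ldots, \beta_n^{In,m_n^{In}}\right\}, \qquad M_n^{Out} = \left\{\beta_n^{Out,1}, \ldots, \beta_n^{Out,m_n^{Out}}\right\},$$

and $m_n = m_n^{In} + m_n^{Out}$. Accordingly, the set estimate $\hat B_n$ can be written as follows,

$$\hat B_n = \hat B_n^{In} \cup \hat B_n^{Out},$$

$$\hat B_n^{In} = \bigcup\left\{A_{m_n,j} \in P_{m_n} : \beta_n^{In,i} \in A_{m_n,j} \text{ for some } \beta_n^{In,i}\right\}, \qquad \hat B_n^{Out} = \bigcup\left\{A_{m_n,j} \in P_{m_n} : \beta_n^{Out,i} \in A_{m_n,j} \text{ for some } \beta_n^{Out,i}\right\}.$$

Denote $Q(A, \varepsilon) := \bigcup_{x \in A} Q(x, \varepsilon)$ for any set $A$, where $Q(x, \varepsilon)$ is an open ball centered at $x$ with radius $\varepsilon$. It follows from Corollary 2 that $d_H(\beta_n^i, B_0) \to_p 0$, that is

$$\lim_{n \to \infty} P\left(d_H(\beta_n^i, B_0) < \sqrt{K}w_{m_n}\right) = 1,$$

where $w_{m_n}$ is the bin width. Note that the event

$$E := \left\{d_H(\beta_n^i, B_0) < \sqrt{K}w_{m_n}, \text{ for all } i = 1, \ldots, m_n\right\}$$

implies the event that $\hat B_n \subset Q\left(B_0, \sqrt{K}w_{m_n}\right)$.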
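Thus, we have

$$P(E) \le P\left[\hat B_n \subset Q\left(B_0, \sqrt{K}w_{m_n}\right)\right] \le 1.$$

Moreover, we have

$$P(E) \ge \prod_{i=1}^{m_n} P\left[d_H(\beta_n^i, B_0) < \sqrt{K}w_{m_n}\right].$$

Because $d_H(\beta_n^i, B_0) = O_p(n^{-1/2})$ as shown in Corollary 2, we have $\mathrm{Var}\left(d_H(\beta_n^i, B_0)\right) = O(n^{-1})$ by Theorem 14.4-1 in Bishop et al. (1975) on page 476. It follows from the Chebyshev inequality that

$$P\left[d_H(\beta_n^i, B_0) < \sqrt{K}w_{m_n}\right] \ge 1 - O(n^{-1}).$$

Thus, it follows from Bernoulli's inequality that

$$P(E) \ge 1 - m_nO(n^{-1}).$$

Under the condition that $m_n = O(n^\kappa)$, $0 < \kappa < 1$, we have

$$\lim_{n \to \infty} P\left[\hat B_n \subset Q\left(B_0, \sqrt{K}w_{m_n}\right)\right] = 1.$$

It is not difficult to prove that the event $\left\{\hat B_n \subset Q\left(B_0, \sqrt{K}w_{m_n}\right)\right\}$ is indeed equivalent to the event $\left\{\hat B_n \cap B_0^c \subset Q\left(B_0, \sqrt{K}w_{m_n}\right) \cap B_0^c\right\}$. We then can conclude that $\mu(\hat B_n \cap B_0^c) = o_p(1)$.

On the other hand, since $\hat B_n = \hat B_n^{In} \cup \hat B_n^{Out}$, we have $\hat B_n^c = \hat B_n^{In,c} \cap \hat B_n^{Out,c}$ and

$$\hat B_n^c \cap B_0 \subset \hat B_n^{In,c} \cap B_0.$$

Note that $m_n^{In} \asymp m_n$ because any $\beta_0 \in B_0$ is within the support of the stationary distribution of the Markov chain in the CSA algorithm. Note that $F\left(\hat B_n^{In,c} \cap B_0\right) \le e_{m_n^{In}}$, where $e_{m_n^{In}} = \int\left|f_{\beta, m_n^{In}}(\beta) - f_\beta(\beta)\right|d\beta$ is the $L_1$ error of the histogram density estimator

$$f_{\beta, m_n^{In}}(\beta) := \left(m_n^{In}w^K_{m_n^{In}}\right)^{-1}\sum_{i=1}^{m_n^{In}} 1\left(\beta_n^{In,i} \in A_{m_n,j}\right), \quad \text{for } \beta \in A_{m_n,j}.$$

Under the assumptions, the error $e_{m_n^{In}}$ converges to 0 a.s. (Beirlant and Gyorfi, 1998). Since the measure $\mu$ restricted to $B_0$ is absolutely continuous with respect to $F$, we have $\mu\left(\hat B_n^{In,c} \cap B_0\right) \to 0$ a.s. In sum, $d(\hat B_n, B_0) \to_p 0$.

Now, we derive the convergence rate of the distance $d(\hat B_n, B_0)$. We first have

$$\mu\left(\hat B_n^c \cap B_0\right) \le \mu\left(\hat B_n^{In,c} \cap B_0\right) = \mu\left(\hat B_n^{In,c} \cap I_{m_n}\right) + \mu\left(\hat B_n^{In,c} \cap B_0 \cap D_{m_n}\right),$$

where $I_{m_n}$ is the union of the $A_{m_n,j}$ contained in the interior of $B_0$, and $D_{m_n}$ is the union of the $A_{m_n,j}$ intersecting $\partial B_0$, the boundary of $B_0$. We have

$$E\left[\mu\left(\hat B_n^{In,c} \cap I_{m_n}\right)\right] = E\left[\sum_{A_{m_n,j} \subset I_{m_n}}\mu(A_{m_n,j})\,1\left(\beta_n^{In,i} \notin A_{m_n,j},\ i = 1, \ldots, m_n^{In}\right)\right] = w^K_{m_n}\sum_{A_{m_n,j} \subset I_{m_n}} P\left(\beta_n^{In,i} \notin A_{m_n,j},\ i = 1, \ldots, m_n^{In}\right). \tag{C.20}$$

Let $2\rho = \mathrm{diam}(B_0)$ be the diameter of $B_0$.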
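Then, we have

$$\#\{A_{m_n,j} \subset I_{m_n}\} \le \frac{\pi^{K/2}\rho^K}{\Gamma(K/2 + 1)}\Big/ w^K_{m_n} = O\left(w^{-K}_{m_n}\right).$$

Here $\pi^{K/2}\rho^K/\Gamma(K/2 + 1)$ is the volume of a ball in $\mathbb{R}^K$ with radius $\rho$, and $w^K_{m_n}$ is the volume of a bin $A_{m_n,j}$ in $\mathbb{R}^K$. Moreover, using the inequality $(1 - x)^n \le \exp(-nx)$ for $0 < x < 1$, we have

$$P\left(\beta_n^{In,i} \notin A_{m_n,j},\ i = 1, \ldots, m_n^{In}\right) \le \exp\left(-m_n^{In}c_1\right),$$

for some constant $c_1 \in (0, 1)$. We have

$$\text{(C.20)} \le c_2\exp\left(-m_n^{In}c_1\right), \tag{C.21}$$

for some constant $c_2 > 0$. Besides, we have

$$\mu\left(\hat B_n^{In,c} \cap B_0 \cap D_{m_n}\right) \le \mu(B_0 \cap D_{m_n}) \le \#\{A_{m_n,j} \subset D_{m_n}\}\,w^K_{m_n} \le \frac{2\pi^{K/2}\rho^{K-1}/\Gamma(K/2)}{w^{K-1}_{m_n}}\,w^K_{m_n} = O(w_{m_n}).$$

Here we used the facts that the surface area of a ball in $\mathbb{R}^K$ with radius $\rho$ is $2\pi^{K/2}\rho^{K-1}/\Gamma(K/2)$, and that the surface area of a face of a cube in $\mathbb{R}^K$ with width $w_{m_n}$ is $w^{K-1}_{m_n}$. In sum, we have

$$E\left[\mu\left(\hat B_n^c \cap B_0\right)\right] \le O(w_{m_n}),$$

as the exponential bound in (C.21) goes to zero faster than any power of $w_{m_n}$. Hence,

$$\mu\left(\hat B_n^c \cap B_0\right) = O_p(w_{m_n}). \tag{C.22}$$

As for the term $\mu(\hat B_n \cap B_0^c)$, we have

$$E\left[\mu\left(\hat B_n \cap B_0^c\right)\right] = E\left[\mu\left(\hat B_n^{In} \cap B_0^c\right) + \mu\left(\hat B_n^{Out} \cap B_0^c\right)\right] \le c_3w^K_{m_n}\#\{A_{m_n,j} \subset D_{m_n}\} + E\left[\mu\left(\hat B_n^{Out} \cap B_0^c\right)\right] = O(w_{m_n}) + E\left[\mu\left(\hat B_n^{Out} \cap B_0^c\right)\right]. \tag{C.23}$$

As for $E\left[\mu\left(\hat B_n^{Out} \cap B_0^c\right)\right]$, we have

$$E\left[\mu\left(\hat B_n^{Out} \cap B_0^c\right)\right] \le E\left[\mu\left(\hat B_n^{Out}\right)\right] \le w^K_{m_n}E\left(m_n^{Out}\right).$$

Note that the sequence of events $\beta_n^1 \notin B_0, \ldots, \beta_n^{m_n} \notin B_0$ can be viewed as a sequence of Bernoulli draws with the probability of success being $P\left(d_H(\beta_n^i, B_0) > 0\right)$. Therefore, we have

$$E\left(m_n^{Out}\right) = m_nP\left(d_H(\beta_n^i, B_0) > 0\right).$$

From Corollary 2, $P\left(d_H(\beta_n^i, B_0) > 0\right) < O(n^{-1})$. Therefore,

$$E\left[\mu\left(\hat B_n^{Out} \cap B_0^c\right)\right] < w^K_{m_n}m_nO(n^{-1}). \tag{C.24}$$

It follows from (C.22), (C.23) and (C.24) that

$$d(\hat B_n, B_0) = O_p(w_{m_n}) + O_p(w_{m_n}) + w^K_{m_n}m_nO(n^{-1}).$$

Under the settings that $m_n = O(n^\kappa)$, $0 < \kappa < 1$, and that $w_{m_n}$ is of order $m_n^{-\delta}$, $0 < \delta < 1/K$, we have

$$d(\hat B_n, B_0) = O_p\left(n^{-\kappa\delta}\right) + O_p\left(n^{-K\kappa\delta - 1 + \kappa}\right) = O_p\left(n^{-\min\{\kappa\delta,\ K\kappa\delta + 1 - \kappa\}}\right) = O_p\left(n^{-\kappa\delta}\right).$$

Then the theorem has been established.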
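The role of the bin width in the rate derivation can be illustrated with a one-dimensional toy version of the histogram-type set estimate. The sketch below rests on simplifying assumptions that are not in the text: the identified set is the hypothetical interval $[0.23, 0.68]$, the marks are i.i.d. uniform draws from it (rather than output of the CSA algorithm, so there are no "Out" marks), and the function names are invented. The estimate is the union of the bins that contain at least one mark, and the distance is the Lebesgue measure of the symmetric difference.

```python
import math
import random


def occupied_bins(marks, width):
    """Indices j of the bins [j * width, (j + 1) * width) containing at least one mark."""
    return {math.floor(x / width) for x in marks}


def symmetric_difference_measure(bins, width, lower, upper):
    """Lebesgue measure of the symmetric difference between the union of the
    given bins and the interval [lower, upper]."""
    estimate = len(bins) * width
    overlap = sum(
        max(0.0, min(upper, (j + 1) * width) - max(lower, j * width)) for j in bins
    )
    return (estimate - overlap) + ((upper - lower) - overlap)


random.seed(0)
lower, upper = 0.23, 0.68  # hypothetical identified set
width = 0.05               # bin width
marks = [random.uniform(lower, upper) for _ in range(2000)]
bins = occupied_bins(marks, width)
distance = symmetric_difference_measure(bins, width, lower, upper)
```

With enough marks every bin meeting the interval is occupied, so the only error left comes from the two boundary bins, and `distance` is at most twice the bin width. This is the one-dimensional counterpart of the $O(w_{m_n})$ boundary terms in (C.22) and (C.23).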
C.2 Convergence of Rate of Sieve Estimators in Partially Identified Models

In this appendix, we provide a general result about the convergence rate of sieve estimators when the model of interest is partially identified. Denote by $\Theta$ the parameter space, which is assumed to be a metric space, and denote by $d(\cdot, \cdot)$ the metric on $\Theta$. Let $d_H(x, A) := \inf_{y \in A} d(x, y)$ be the Hausdorff distance between $x \in \Theta$ and $A \subset \Theta$. Let $\{\Theta_n\}_{n \ge 1}$ be a sieve sequence of parameter spaces that becomes dense in $\Theta$ as $n \to \infty$. Finally, let $\Theta_0$ be the identified set in $\Theta$. The notation used below follows Shen and Wong (1994).

Condition C.1. For some constants $A_1 > 0$ and $\alpha > 0$, and for all small $\varepsilon > 0$,

$$\inf_{\{d_H(\theta, \Theta_0) \ge \varepsilon,\ \theta \in \Theta_n,\ \theta_0 \in \Theta_0\}} P\left[\ell(\theta_0, Z) - \ell(\theta, Z)\right] \ge 2A_1\varepsilon^{2\alpha}.$$

Condition C.2. For some constants $A_2 > 0$ and $\beta > 0$, and for all small $\varepsilon > 0$,

$$\sup_{\{d_H(\theta, \Theta_0) \le \varepsilon,\ \theta \in \Theta_n,\ \theta_0 \in \Theta_0\}} \mathrm{Var}\left[\ell(\theta_0, Z) - \ell(\theta, Z)\right] \le A_2\varepsilon^{2\beta}.$$

Condition C.3. Let $F_n = \{\ell(\theta, \cdot) - \ell(\pi_n\theta_0, \cdot) : \theta \in \Theta_n, \theta_0 \in \Theta_0\}$ be a class of functions indexed by $\theta \in \Theta_n$ and $\theta_0 \in \Theta_0$. Here $\pi_n\theta_0$ is the projection of $\theta_0$ in $\Theta_n$, that is $d(\pi_n\theta_0, \theta_0) = \inf_{\theta \in \Theta_n} d(\theta, \theta_0)$. For some constants $r_0 < 1/2$ and $A_3 > 0$,

$$H(\varepsilon, F_n) \le A_3n^{2r_0}\varepsilon^{-r} \quad \text{for all small } \varepsilon > 0,$$

where $H(\varepsilon, F_n)$ is the $L_\infty$-metric entropy of the space $F_n$, that is, $\exp(H(\varepsilon, F_n))$ is the number of $\varepsilon$-balls in the $L_\infty$-metric needed to cover the space $F_n$.

Note that when the model is point identified, the identified set $\Theta_0$ becomes a singleton, and the above Conditions C.1-C.3 become Conditions C1-C3 on page 583 in Shen and Wong (1994).

The following result, which is a generalization of Theorem 1 of Shen and Wong (1994), shows that the failure of point identification does not change the rate at which a sieve estimator converges to the identified set.

Theorem C.1.
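Suppose Conditions C1 and C3 hold and $\hat\theta_n$ satisfies
\[
P_n \ell(\hat\theta_n, Z) \ge \sup_{\theta \in \Theta_n} P_n \ell(\theta, Z) - \eta_n \tag{C.25}
\]
with $\eta_n = o(n^{-\omega})$, where
\[
\omega =
\begin{cases}
\dfrac{2(1-2r_0)}{2} - \dfrac{\log\log n}{\log n}, & \text{if } r = 0^+, \\[2ex]
\dfrac{2(1-2r_0)}{2+r}, & \text{if } 0 < r < 2, \\[2ex]
\dfrac{1-2r_0}{2} - \dfrac{\log\log n}{\log n}, & \text{if } r = 2, \\[2ex]
\dfrac{1-2r_0}{r}, & \text{if } r > 2.
\end{cases} \tag{C.26}
\]
In addition, Condition C2 is also supposed to hold for the case of $0^+ \le r < 2$. Then,
\[
d_H(\hat\theta_n, \Theta_0) = O_p\left(\max\left\{n^{-\tau},\ \sup_{\theta_0 \in \Theta_0} d(\pi_n\theta_0, \theta_0),\ \sup_{\theta_0 \in \Theta_0} K^{1/(2\alpha)}(\pi_n\theta_0, \theta_0)\right\}\right), \tag{C.27}
\]
where $K(\pi_n\theta_0, \theta_0) = P[\ell(\theta_0, Z) - \ell(\pi_n\theta_0, Z)]$ and
\[
\tau =
\begin{cases}
\dfrac{1-2r_0}{2\alpha} - \dfrac{\log\log n}{2\alpha\log n}, & \text{if } r = 0^+,\ \beta \ge \alpha, \\[2ex]
\dfrac{1-2r_0}{4\alpha - 2\beta}, & \text{if } r = 0^+,\ \beta < \alpha, \\[2ex]
\dfrac{1-2r_0}{4\alpha - \min(\alpha,\beta)(2-r)}, & \text{if } 0 < r < 2, \\[2ex]
\dfrac{1-2r_0}{4\alpha} - \dfrac{\log\log n}{2\alpha\log n}, & \text{if } r = 2, \\[2ex]
\dfrac{1-2r_0}{2\alpha r}, & \text{if } r > 2.
\end{cases}
\]

The proof of Theorem C.1 depends on the following lemma, which is similar to Lemma 3 in Shen and Wong (1994).

Lemma C.3. Suppose Conditions C1 and C2 hold. Assume also that Condition C3 holds if $0^+ \le r < 2$. Suppose that at Step $k-1$ we have a rate
\[
\varepsilon_n^{(k-1)} = n^{-\tau_{k-1}} > \max\left\{n^{-(1-2r_0)/[\alpha(r+2)]},\ \sup_{\theta_0 \in \Theta_0} d(\pi_n\theta_0, \theta_0),\ \sup_{\theta_0 \in \Theta_0} K^{1/(2\alpha)}(\pi_n\theta_0, \theta_0)\right\},
\]
so that
\[
P\left(d_H(\hat\theta_n, \Theta_0) \ge D\varepsilon_n^{(k-1)}\right) \le 5\exp\left(-(1-\varepsilon)\max\left(D^{4\alpha}, D^{2\alpha}\right) M_1 n^{2r_0}\right) + (k-1)\exp\left(-L\nu_0\right),
\]
where
\[
\nu_0 = \min\left\{\frac{r + 4r_0}{r+2},\ \frac{r(1-2r_0)}{4\alpha} + r_0\right\}
\quad\text{and}\quad
L = (1-\varepsilon)\min\left(M_2 D^{2\alpha},\ M_3 D^{4\alpha - \beta(2-r)/2}\right).
\]
Then at Step $k$, we can find an improved rate
\[
\varepsilon_n^{(k)} = \max\left\{n^{-\tau_k},\ n^{-(1-2r_0)/[\alpha(r+2)]},\ \sup_{\theta_0 \in \Theta_0} d(\pi_n\theta_0, \theta_0),\ \sup_{\theta_0 \in \Theta_0} K^{1/(2\alpha)}(\pi_n\theta_0, \theta_0)\right\},
\]
where $\tau_k = (1-2r_0)/(4\alpha) + \tau_{k-1}\beta(2-r)/(4\alpha)$, so that
\[
P\left(d_H(\hat\theta_n, \Theta_0) \ge D\varepsilon_n^{(k)}\right) \le 5\exp\left(-(1-\varepsilon)\max\left(D^{4\alpha}, D^{2\alpha}\right) M_1 n^{2r_0}\right) + k\exp\left(-L\nu_0\right).
\]

Proof. The proof is similar to the proof of Lemma 2 in Shen and Wong (1994). Without loss of generality, we assume $D > 1$, and we only prove the case of $4\alpha \ge \beta(2-r)/2$. Let
\[
B_n^{(i)} = \left\{D\varepsilon_n^{(i)} \le d_H(\hat\theta_n, \Theta_0) < D\varepsilon_n^{(i-1)}\right\}
\]
for $i = 2, \dots, k$. Then
\[
P\left(d_H(\hat\theta_n, \Theta_0) \ge D\varepsilon_n^{(k)}\right) \le P\left(d_H(\hat\theta_n, \Theta_0) \ge D\varepsilon_n^{(k-1)}\right) + P\left(B_n^{(k)}\right).
\]
The target is $P(B_n^{(k)})$.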
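By Condition C1,
\[
\inf_{\{d_H(\theta,\Theta_0) \ge D\varepsilon_n^{(k)},\ \theta \in \Theta_n,\ \theta_0 \in \Theta_0\}} P\left[\ell(\pi_n\theta_0, Z) - \ell(\theta, Z)\right] - \eta_n
\ge 2A_1\left(D\varepsilon_n^{(k)}\right)^{2\alpha} - \sup_{\theta_0 \in \Theta_0} P\left[\ell(\theta_0, Z) - \ell(\pi_n\theta_0, Z)\right] - \eta_n
\ge A_1\left(D\varepsilon_n^{(k)}\right)^{2\alpha}.
\]
For the last inequality, we need
\[
A_1\left(D\varepsilon_n^{(k)}\right)^{2\alpha} - A_4(1 + o(1))\sup_{\theta_0 \in \Theta_0} K(\pi_n\theta_0, \theta_0) > 0.
\]
Thus,
\[
P\left(B_n^{(k)}\right)
\le P\left(\sup_{\{D\varepsilon_n^{(k)} \le d_H(\theta,\Theta_0) \le D\varepsilon_n^{(k-1)},\ \theta \in \Theta_n,\ \theta_0 \in \Theta_0\}} \left[P_n\ell(\theta, Z) - P_n\ell(\pi_n\theta_0, Z)\right] \ge -\eta_n\right)
\le P\left[\sup_{\{D\varepsilon_n^{(k)} \le d_H(\theta,\Theta_0) \le D\varepsilon_n^{(k-1)},\ \theta \in \Theta_n,\ \theta_0 \in \Theta_0\}} G_n\left[\ell(\theta, Z) - \ell(\pi_n\theta_0, Z)\right] \ge A_1\sqrt{n}\left(D\varepsilon_n^{(k)}\right)^{2\alpha}\right].
\]
Here $G_n f = \sqrt{n}(P_n f - P f)$ denotes the empirical process. The result can then be proved using the arguments in the proof of Lemma 2 in Shen and Wong (1994), page 599.

Corollary C.1. Suppose the conditions in Theorem C.1 hold, and let $\hat\theta_n$ be an estimator that satisfies
\[
P_n\ell(\hat\theta_n, Z) \ge \sup_{\theta \in \Theta_n} P_n\ell(\theta, Z) - \eta_n,
\]
with $\eta_n = o(n^{-\omega})$ for some $\omega > 0$ as described in Theorem C.1. Let $\hat\vartheta_n$ be another estimator such that
\[
P_n\ell(\hat\vartheta_n, Z) \ge P_n\ell(\hat\theta_n, Z) - \zeta_n, \tag{C.28}
\]
where $\zeta_n = o(n^{-\varpi})$. If $\varpi \ge \omega$, then $d_H(\hat\vartheta_n, \Theta_0) \asymp_p d_H(\hat\theta_n, \Theta_0)$.

Proof. By (C.25), we have
\[
P_n\ell(\hat\vartheta_n, Z) \ge P_n\ell(\hat\theta_n, Z) - \zeta_n \ge \sup_{\theta \in \Theta_n} P_n\ell(\theta, Z) - (\eta_n + \zeta_n).
\]
We have $\eta_n + \zeta_n = o(n^{-\min(\omega,\varpi)})$, and $\eta_n + \zeta_n = o(n^{-\omega})$ if $\varpi \ge \omega$. Then, Theorem C.1 implies that $d_H(\hat\vartheta_n, \Theta_0)$ has the convergence rate given in Theorem C.1. Thus, $d_H(\hat\vartheta_n, \Theta_0) \asymp_p d_H(\hat\theta_n, \Theta_0)$.

Corollary C.1 says that the convergence rate of an estimator $\hat\vartheta_n$ generated from an existing estimator $\hat\theta_n$ in the way described in (C.28) cannot be better than the convergence rate of the original estimator $\hat\theta_n$. Moreover, if the criterion difference $P_n\ell(\hat\theta_n, Z) - P_n\ell(\hat\vartheta_n, Z)$ between the original estimator $\hat\theta_n$ and the generated estimator $\hat\vartheta_n$ is negligible in the sense that $\zeta_n = o(\eta_n)$, the two estimators have the same convergence rate.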
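The step-by-step improvement in Lemma C.3 can be illustrated numerically. The sketch below assumes the recursion reads $\tau_k = (1-2r_0)/(4\alpha) + \tau_{k-1}\beta(2-r)/(4\alpha)$ with $0 < r < 2$ and $\beta \le \alpha$. Under that reading the map is a contraction, and its fixed point is $(1-2r_0)/(4\alpha - \beta(2-r))$, which matches the exponent stated in Theorem C.1 for $0 < r < 2$. The function names, the starting value, and the parameter values are hypothetical and serve only as an illustration.

```python
def limiting_rate(alpha: float, beta: float, r: float, r0: float) -> float:
    """Fixed point of tau -> (1 - 2 r0)/(4 alpha) + tau * beta * (2 - r)/(4 alpha)."""
    return (1.0 - 2.0 * r0) / (4.0 * alpha - beta * (2.0 - r))


def iterate_rate(alpha: float, beta: float, r: float, r0: float,
                 tau_start: float = 0.0, steps: int = 50) -> list:
    """Apply the assumed Lemma C.3 recursion `steps` times, starting from tau_start."""
    tau = tau_start
    path = [tau]
    for _ in range(steps):
        tau = (1.0 - 2.0 * r0) / (4.0 * alpha) + tau * beta * (2.0 - r) / (4.0 * alpha)
        path.append(tau)
    return path


# Hypothetical parameter values with 0 < r < 2 and beta <= alpha.
alpha, beta, r, r0 = 1.0, 1.0, 1.0, 0.0
path = iterate_rate(alpha, beta, r, r0)
print("first iterates:", [round(t, 6) for t in path[:5]])
print("last iterate:  ", path[-1])
print("fixed point:   ", limiting_rate(alpha, beta, r, r0))
```

Starting below the fixed point, each step raises the exponent, so the rate $n^{-\tau_k}$ sharpens monotonically until it reaches the limit.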
Abstract
This chapter studies the nonparametric identification and estimation of the structural parameters, including the per period utility functions, discount factors, and state transition laws, of general dynamic programming discrete choice (DPDC) models. I show an equivalence between the identification of a general DPDC model and the identification of a linear GMM system. Using this equivalence, I simplify both the identification analysis and the estimation practice of DPDC models. First, I prove a series of identification results for the DPDC model by using rank conditions. Previous identification results in the literature are based on normalizing the per period utility function of one alternative. Such normalization could severely bias the estimates of counterfactual policy effects. I show that the structural parameters can be nonparametrically identified without the normalization. Second, I propose a closed form nonparametric estimator for the per period utility functions, the computation of which involves only least squares estimation. The existing estimation procedures rely on assuming that the dynamic programming (DP) problem is stationary or on solving the DP problem numerically with the aid of terminal conditions. Neither the identification nor the estimation requires terminal conditions, a stationary DPDC model, or a sample that covers the entire decision period.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Three essays on high-dimensional econometrics
Applications of Markov‐switching models in economics
Essays on understanding consumer contribution behaviors in the context of crowdfunding
Essays on estimation and inference for heterogeneous panel data models with large n and short T
Essays on policies to create jobs, improve health, and reduce corruption
Three essays on supply chain networks and R&D investments
Three essays on the statistical inference of dynamic panel models
Essays on family and labor economics
A structural econometric analysis of network and social interaction models
Essays on econometrics analysis of panel data models
Estimation of dynamic models
Panel data forecasting and application to epidemic disease
Essays on the econometric analysis of cross-sectional dependence
Essays on family planning policies
Essay on monetary policy, macroprudential policy, and financial integration
Essays on treatment effect and policy learning
Essays on nonparametric and finite-sample econometrics
Two essays in econometrics: large N T properties of IV, GMM, MLE and least square model selection/averaging
Large N, T asymptotic analysis of panel data models with incidental parameters
Computational aspects of optimal information revelation
Asset Metadata
Creator
Zhou, Cheng (author)
Core Title
Three essays on the identification and estimation of structural economic models
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Economics
Publication Date
06/09/2016
Defense Date
03/30/2016
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
computation,counterfactual policy,dynamic programming discrete choice,identification,OAI-PMH Harvest
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Ridder, Geert (committee chair), Hsiao, Cheng (committee member), Hsieh, Yu-Wei (committee member), Yang, Botao (committee member)
Creator Email
andrewchengchou@gmail.com,chengzho@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-251264
Unique identifier
UC11279420
Identifier
etd-ZhouCheng-4426.pdf (filename),usctheses-c40-251264 (legacy record id)
Legacy Identifier
etd-ZhouCheng-4426.pdf
Dmrecord
251264
Document Type
Dissertation
Rights
Zhou, Cheng
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
computation
counterfactual policy
dynamic programming discrete choice