Essays on Nonparametric and Finite-sample Econometrics

by Grigory Franguridi

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (ECONOMICS)

May 2023

Copyright 2023 Grigory Franguridi

To my parents Oxana Franguridi (1964–2021) and Konstantin Franguridi (b. 1961)

Acknowledgments

I am grateful to my main advisors Hyungsik Roger Moon and Geert Ridder, who did not give up on me despite all the fluctuations of the scientific discovery process. I will always be indebted for the time and effort they invested in me and hope to continue collaborating with them for years to come. I thank Tim Armstrong and Sergey Lototsky, who readily agreed to become members of my dissertation committee. I give special appreciation to Matthew Shum for many productive and inspiring meetings. I also acknowledge my former advisors Sasha Shapoval, Stanislav Anatolyev, Stanislav Khrapov, and Patrik Guggenberger for setting me on the right track.

Academically, I have benefited greatly from conversations with a very large number of fine researchers around the world (some of whom have become my coauthors). Without attempting to mention all of them here, I would like to thank Michael Leung, Andres Santos, Denis Chetverikov, Yu-Wei Hsieh, Arie Kapteyn, Pierre Hoonhout, Jinyong Hahn, Hashem Pesaran, Kirill Ponomarev, Victor Chernozhukov, Zheng Fang, Vira Semenova, Joris Pinkse, Marc Henry, Andres Aradillaz-Lopez, Pasha Andreyanov, Kaspar Wüthrich, Bulat Gafarov, and Mahrad Sharifvaghefi.

I have been lucky to share my time in Los Angeles with wonderful friends. I give massive appreciation to Diana Van Patten, who was always there for me when I had a sudden urge to play chess or complain about my life. I thank Ksenia Fiorin for being very supportive in my toughest hours (and letting me take over her lease when she moved out!).
I am indebted to Vasily Korovkin for being a reliable tennis partner. I thank Imil Nurutdinov for many hours spent talking and playing guitars. I thank Andreas Aristidou for his constant encouragement. I would also like to thank my friends Jeanne Sorin, Margarita Khvan, Amy Mahler, Ruozi Song, Rajat Kochnar, Lidia Kosenkova, Michele Fioretti, Anna Milostnova, Kirill Borusyak, Denis and Olga Shishkin, Eugene Kanasheuski, Roman Istomin, Bhanu Shri, Andrea Nocera, Simon Reese, Nandita Krishnaswamy, Liza Rebrova, Dmitry Panteleev, Rachel Lee, Nico Roig, and Brian Finley, among others. Although I was unable to see my friends from back in Russia as often as I would love to, I stayed in contact with them and value their friendships dearly. Thank you, Ilya Loukonski, Nina Mamontova, Sergey Zaytsev, Andrei and Oleg Silyaev, Renat Shagabutdinov, Natalia Bokhonova, Nikita Khamnaev, Egor Esenkov, Alina Ivanova, and Anna Mazur.

I am blessed to have Amber Bhargava as a part of my life, and I thank her for her love, patience, and wisdom. Her presence brought joy, confidence, and calmness during the turbulent final years of my PhD journey.

I am grateful to my loving family. I thank my grandparents Lyudmila and Anastas Franguridi for encouraging my endeavors and serving as role models of decency and tolerance. Although they did not live to see me get a doctoral degree, their positive influence on my life choices cannot be overstated. I highly appreciate the love of my grandmother Lidia Karaush, who always took great care of me. The unwavering support of my parents Oxana and Konstantin Franguridi, to whom this dissertation is dedicated, has been a great source of inspiration. They both had a transformative impact on my personal development and decision-making in their own, very different ways. My mom's passing in October 2021 was one of the most devastating events of my life. I will always remember and love her.
Finally, I genuinely appreciate the support I received from staff at the Department of Economics, especially Young Miller, Alex Karnazes, Morgan Ponder, Annie Le, and Irma Alfaro, staff at the Office of International Services and the Graduate School, and medical professionals at the Engemann Student Health Center.

Table of Contents

Acknowledgments
List of Tables
List of Figures
Abstract
Preface
1 Chapter One: A Uniform Bound on the Operator Norm of Sub-Gaussian Random Matrices and Its Applications
  1.1 Introduction
  1.2 Uniform bound on the operator norm
    1.2.1 Generic chaining bound
    1.2.2 The main result
  1.3 Applications
    1.3.1 Operator norm minimizing estimator
    1.3.2 Estimator of number of factors with functional data
  1.4 Monte Carlo illustration
  1.5 Conclusion
2 Chapter Two: Nonparametric Inference on Counterfactuals in First-price Auctions
  2.1 Introduction
  2.2 Framework
    2.2.1 Auction model
    2.2.2 Counterfactuals
    2.2.3 Data generating process
  2.3 Estimation and inference for value quantiles
    2.3.1 The Bahadur–Kiefer expansion
    2.3.2 Inference on value quantiles
  2.4 Estimation and inference for counterfactuals
    2.4.1 Smooth (S-type) counterfactuals
    2.4.2 Nonsmooth (T-type) counterfactuals
  2.5 Monte Carlo experiments
  2.6 Empirical application
  2.7 Practical considerations
  2.8 Conclusion
3 Chapter Three: Bias Correction for Quantile Regression Models
  3.1 Introduction
  3.2 Model and estimators
  3.3 Asymptotic theory for bias correction
    3.3.1 Stochastic expansion of quantile regression estimators
    3.3.2 Bias formula for exact estimators
    3.3.3 Feasible bias correction
    3.3.4 Finite difference estimators of bias components
  3.4 Simulation evidence
  3.5 Empirical application
  3.6 Conclusion
Bibliography
A Appendices to Chapter 1
  A.1 Proof of Theorem 1
  A.2 Proof of Lemma 2
B Appendices to Chapter 2
  B.1 Estimation and inference for value quantiles, proofs
    B.1.1 Proof of Theorem 1
    B.1.2 Proof of Theorem 2
  B.2 Estimation and inference for counterfactuals, proofs
    B.2.1 Proof of Theorem 3
    B.2.2 Proof of Theorem 4
    B.2.3 Proof of Theorem 5
C Appendices to Chapter 3
  C.1 Bahadur–Kiefer representation, proofs
    C.1.1 Auxiliary results for generic IVQR estimators
    C.1.2 Auxiliary results for exact estimators
    C.1.3 Proof of Theorem 6
  C.2 Second-order bias correction, proofs
    C.2.1 Auxiliary results
    C.2.2 Proofs of main results on bias correction
  C.3 Illustration of approximate bias formula in univariate case
  C.4 Exact QR and IVQR algorithms
  C.5 Stochastic expansion of 1-step corrected IVQR estimators
  C.6 Additional figures

List of Tables

1.1 Performance of the maximal rank estimator under different thresholds
2.1 Typical counterfactuals in first-price auctions as linear functionals of $v(\cdot)$
2.2 Simulated coverage of the 95% uniform confidence bands
2.3 Test results at the 95% confidence level

List of Figures

2.1 Distributions of the number of bidders (left) and of bid residuals (right)
2.2 Confidence intervals and bands for the counterfactual expected revenue
3.1 Exact (circles) and second-order (crosses) biases, scaled by $n$, as functions of the quantile level $\tau$ for $\hat\theta = Y_{(\lfloor \tau n \rfloor)}$, where $Y \sim \mathrm{Uniform}(0,1)$, $n = 10$
3.2 Bias (multiplied by $n$) before and after correction for DGP1, different sample sizes
3.3 Bias (multiplied by $n$) before and after correction for DGP2–DGP4
3.4 RMSE comparison of raw and bias-corrected estimators
3.5 Quantile regression of annual food expenditure on income
C.1 Bias (multiplied by $n$) before and after correction for DGP1, sensitivity to bandwidth choice

Abstract

This dissertation brings together three research papers in nonparametric and finite-sample econometrics. In the first paper (Chapter 1), which is joint work with Hyungsik Roger Moon, for an $N \times T$ random matrix $X(\beta)$ with weakly dependent, uniformly sub-Gaussian entries $x_{it}(\beta)$ that may depend on a possibly infinite-dimensional parameter $\beta \in \mathcal{B}$, we obtain a uniform bound on its operator norm of the form
$$\mathbb{E} \sup_{\beta \in \mathcal{B}} \|X(\beta)\| \le CK \left( \sqrt{\max(N,T)} + \gamma_2(\mathcal{B}, d_{\mathcal{B}}) \right),$$
where $C$ is an absolute constant, $K$ controls the tail behavior of (the increments of) $x_{it}(\beta)$, and $\gamma_2(\mathcal{B}, d_{\mathcal{B}})$ is Talagrand's functional, a measure of multi-scale complexity of the metric space $(\mathcal{B}, d_{\mathcal{B}})$.
We illustrate how this result may be used for estimation that seeks to minimize the operator norm of moment conditions, as well as for estimation of the maximal number of factors with functional data.

The second paper (Chapter 2), which is joint work with Pasha Andreyanov, is concerned with inference for auctions. For a classical model of the first-price sealed-bid auction with independent private values, we develop nonparametric estimation and inference procedures for a class of policy-relevant metrics, such as total expected surplus and expected revenue under counterfactual reserve prices. Motivated by the linearity of these metrics in the quantile function of bidders' values, we propose a bid spacings-based estimator of the latter and derive its Bahadur–Kiefer expansion. This makes it possible to construct exact uniform confidence bands and assess the optimality of a given auction rule. Using data on U.S. Forest Service timber auctions, we test whether setting zero reserve prices in these auctions was revenue maximizing.

In the third paper (Chapter 3), which is joint work with Bulat Gafarov and Kaspar Wüthrich, we study the bias of classical quantile regression and instrumental variable quantile regression estimators. While asymptotically first-order unbiased, these estimators can have non-negligible second-order biases. We derive a higher-order stochastic expansion of these estimators using empirical process theory. Based on this expansion, we derive an explicit formula for the second-order bias and propose a feasible bias correction procedure that uses finite-difference estimators of the bias components. The proposed bias correction method performs well in simulations. We provide an empirical illustration using Engel's classical data on household expenditure.

Preface

At the early stages of its development, the science of econometrics focused on smooth parametric or semiparametric models.
Objects of interest in such models are finite-dimensional parameters, while inference is often based on asymptotic approximations to the finite-sample distribution of parameter estimators. However, by now it has been well established that smooth, parametric models often fail to capture important features of economic data. This is why nonsmooth, semi- or nonparametric models have become standard in econometric practice. In such models, inference sometimes requires finite-sample, rather than asymptotic, techniques.

The usefulness of finite-sample considerations is twofold. First, asymptotic distributions are often imprecise approximations to true distributions, while the convergence to the asymptotic distribution may lack uniformity in the true parameter value, leading to unreliable inference procedures. Second, the asymptotic distribution may not exist, especially if the object of interest is infinite-dimensional. This calls for at least two modern finite-sample techniques for developing inference procedures and improving their quality. One is to derive asymptotic expansions (rather than just asymptotic distributions), which allows one to better understand finite-sample properties of the estimator and sometimes to develop algorithms to remove higher-order biases. The other is to use almost sure, finite-sample approximations to true distributions, especially when the asymptotic distribution does not exist or is hard to establish.

This dissertation is a collection of three research projects (chapters) in econometrics that use the techniques described above to develop better estimation and inference procedures in meaningful economic contexts.

In the first chapter, which is joint work with Hyungsik Roger Moon, we study the operator norm of an $N \times T$ random matrix that may depend on a (possibly infinite-dimensional) parameter.
Under the assumptions of weak dependence between the entries corresponding to the same unit $i = 1,\dots,N$ and sub-Gaussianity of the innovations, we obtain a bound on the expectation (equivalently, the tails) of the supremum of the operator norm over the parameter space. This bound is finite-sample: it holds for all $N, T$, up to an absolute constant. Moreover, our result is nonparametric, since the parameter is allowed to be infinite-dimensional and the distribution of the data is unrestricted apart from conditions on tail decay.

The second chapter, which is joint work with Pasha Andreyanov, is concerned with nonparametric inference on counterfactuals in first-price auctions with reserve prices. We show that certain counterfactuals of interest, such as expected revenue and expected surplus, can be represented as continuous linear functionals of the quantile function of bidders' valuations (VQF), for which there exists a natural sample analog estimator. Combined with the Bahadur–Kiefer expansion for the VQF, this representation allows us to construct estimators and confidence bands that are uniformly valid across counterfactual reserve prices. Our confidence bands are based on simulation from a finite-sample approximation to an appropriately standardized counterfactual estimator. In fact, for such an estimator the asymptotic distribution may not exist, rendering classical techniques inapplicable. The resulting algorithm is computationally fast and easy to code.

In the third chapter, which is joint work with Bulat Gafarov and Kaspar Wüthrich, we study quantile regression with instrumental variables (IVQR) in small samples. We show that the second-order bias of the IVQR estimator may be non-negligible when the sample is small or when the focus is on a tail quantile. We develop a feasible bias correction procedure that leads to a second-order unbiased estimator and show in simulations that it removes a substantial portion of the bias with virtually no increase in variance.
Chapter 1

A Uniform Bound on the Operator Norm of Sub-Gaussian Random Matrices and Its Applications¹

¹ Joint work with Hyungsik Roger Moon.

1.1 Introduction

Since its introduction in nuclear physics (Wigner, 1955) and mathematical statistics (Wishart, 1928), random matrix theory has been developed to understand the properties of the spectra of large-dimensional random matrices generated by various distributions. These include the asymptotic theory of the empirical distribution of the eigenvalues of large-dimensional random matrices and bounds on the extreme eigenvalues. For detailed results on these topics, readers can refer to recent surveys such as Bai (2008), Edelman and Rao (2005), Bai and Silverstein (2010), and Tao (2012), among others.

In random matrix theory, the study of the asymptotics of the largest eigenvalue of large-dimensional random matrices goes back to Geman (1980). Suppose that $X$ is an $N \times T$ matrix consisting of random variables $x_{it}$. Many researchers have derived the limit of the largest eigenvalue of the sample covariance matrix, $\lambda_1(X'X)$,² under various distributional assumptions on the random matrix $X$. For example, when the $x_{it}$ are iid $N(0,1)$ and $c := \lim N/T$, Geman (1980) showed that $\frac{1}{N}\lambda_1(X'X) \to_{a.s.} (1 + c^{-1/2})^2$. Johnstone (2001) obtained a stronger result: the properly normalized largest eigenvalue, $\frac{\lambda_1(X'X) - \mu_{NT}}{\sigma_{NT}}$ with $\mu_{NT} = (\sqrt{N-1} + \sqrt{T})^2$ and $\sigma_{NT} = (\sqrt{N-1} + \sqrt{T})(1/\sqrt{N-1} + 1/\sqrt{T})^{1/3}$, converges to the Tracy–Widom law; this has later been shown to hold under more general distributional assumptions by Khorunzhiy (2012) and Tao and Vu (2011), among many others.

The aforementioned results imply that $\lambda_1(X'X)$ is stochastically bounded³ of order $\max(N,T)$, or equivalently, that the operator norm $\|X\| := \sqrt{\lambda_1(X'X)}$ is stochastically bounded of order $\sqrt{\max(N,T)}$. In fact, such a bound does not require that the underlying distribution be Gaussian and can be derived under much weaker conditions.
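Geman's limit is easy to check numerically. The following sketch is our own illustration (not code from the chapter; the helper names are ours): it simulates a Gaussian $N \times T$ matrix with $c = N/T = 4$ and compares $\lambda_1(X'X)/N$ with $(1 + c^{-1/2})^2 = 2.25$, computing the top eigenvalue by power iteration on the Gram matrix.

```python
# Numerical illustration of Geman's (1980) limit lambda_1(X'X)/N -> (1 + c^{-1/2})^2
# for an N x T matrix of iid N(0,1) entries, with c = N/T. Illustrative sketch only.
import math
import random

def largest_eigenvalue(A, iters=300):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))   # Rayleigh-type estimate ||Av||
        v = [x / lam for x in w]
    return lam

random.seed(0)
N, T = 200, 50                                   # c = N/T = 4
X = [[random.gauss(0.0, 1.0) for _ in range(T)] for _ in range(N)]
# Gram matrix X'X (T x T)
G = [[sum(X[i][s] * X[i][t] for i in range(N)) for t in range(T)] for s in range(T)]
ratio = largest_eigenvalue(G) / N
limit = (1 + (N / T) ** -0.5) ** 2               # (1 + c^{-1/2})^2 = 2.25
print(ratio, limit)
```

Even at these modest dimensions the realized ratio sits close to the limit, consistent with the Tracy–Widom fluctuations of size $\sigma_{NT}/N$ mentioned above.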
For example, Latała (2005) showed that the bound holds if the $x_{it}$ are independent across $(i,t)$ with mean zero and uniformly bounded fourth moments. Moon and Weidner (2017) extended this result to cases where the $x_{it}$ are weakly correlated across $i$ or $t$. Other papers that have established similar bounds on $\mathbb{E}\|X\|$ include Bandeira and Van Handel (2016), Guédon et al. (2017), and Latała et al. (2018).

In the case where $X$ consists of independent sub-Gaussian entries, the $\sqrt{\max(N,T)}$ order for the operator norm may be obtained using a powerful way of bounding sub-Gaussian stochastic processes called generic chaining, which was developed in Fernique (1976) and advanced later by M. Talagrand in a series of papers. Indeed, note that $\|X\| = \max_{u \in \mathcal{U}} \max_{v \in \mathcal{V}} u'Xv$, where the maxima are taken over the unit spheres $\mathcal{U} \subset \mathbb{R}^N$ and $\mathcal{V} \subset \mathbb{R}^T$, respectively. The process $Z(u,v) = u'Xv$ defined on $\mathcal{U} \times \mathcal{V}$ can be shown to be sub-Gaussian, and so we can invoke generic chaining to obtain a bound on its expected maximum in terms of a certain measure of metric complexity of $\mathcal{U} \times \mathcal{V}$ called Talagrand's functional $\gamma_2(\mathcal{U} \times \mathcal{V})$ (see the definition in the next section). It turns out that $\gamma_2(\mathcal{U} \times \mathcal{V})$ has exact order $\sqrt{\max(N,T)}$.

² $X'$ denotes the transpose of $X$.
³ A sequence of random variables $\xi_n$ is said to be stochastically bounded of order $a_n$, written $\xi_n = O_p(a_n)$, if for any $\varepsilon > 0$ there exists $M > 0$ such that $\mathbb{P}(|\xi_n/a_n| > M) \le \varepsilon$ for all large enough $n$.
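The variational identity $\|X\| = \max_{u,v} u'Xv$ that drives the chaining argument can be verified directly on a small matrix. In this sketch (ours; all helper names are assumptions), power iteration recovers the operator norm, and random unit-vector probes $u'Xv$ never exceed it while the best probe approaches it.

```python
# The operator norm as a supremum of the bilinear form u'Xv over unit spheres.
# Illustrative sketch, not code from the chapter.
import math
import random

def op_norm(X, iters=200):
    """Operator (spectral) norm of an N x T matrix via power iteration on X'X."""
    N, T = len(X), len(X[0])
    v = [1.0 / math.sqrt(T)] * T
    for _ in range(iters):
        u = [sum(X[i][t] * v[t] for t in range(T)) for i in range(N)]   # u = X v
        w = [sum(X[i][t] * u[i] for i in range(N)) for t in range(T)]   # w = X' u
        nw = math.sqrt(sum(x * x for x in w))
        v = [x / nw for x in w]
    u = [sum(X[i][t] * v[t] for t in range(T)) for i in range(N)]
    return math.sqrt(sum(x * x for x in u))                             # ||X v|| at top v

def unit(d):
    """A random point on the unit sphere in R^d."""
    g = [random.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in g))
    return [x / n for x in g]

def probe(X, u, v):
    """Bilinear form u'Xv."""
    return sum(u[i] * sum(X[i][t] * v[t] for t in range(len(v))) for i in range(len(u)))

random.seed(1)
X = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]          # singular values 2 and 1, so ||X|| = 2
norm = op_norm(X)
best = max(probe(X, unit(2), unit(3)) for _ in range(2000))
print(norm, best)                                # every probe is <= norm; best one is close
```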
Our main contribution is to show thatE sup 2B kX()k is of order p max(N;T ) + 2 (B;d B ). We illustrate usefulness of this uniform bound with two examples. In one, we propose and show consistency of a new estimator that minimizes the operator norm of a matrix that consists of moment functions. In the other, we consider the generalization of the standard factor model to the case of functional data and suggest a new estimator of the maximal number of factors. The paper is organized as follows. Section 1.2 introduces our uniform bound along with the techniques necessary for its derivation. Section 1.3 contains two applications of our theoretical result. Finally, Section 1.5 concludes the paper. The appendix contains two technical proofs of the results in the main text. Throughout the paper, C will denote a universal positive constant that may not be the same at each occurrence, but may never depend on sample sizes, dimensions or any other features of the modeling framework. 1.2 Uniform bound on the operator norm 1.2.1 Generic chaining bound Our main result is based on the general bound on suprema of sub-Gaussian processes called the generic chaining bound. We discuss this classic technique in this section and provide a proof in the appendix for completeness. First, we need the following definitions. The -Orlicz norm of a random variable Y is 3 defined as jjYjj = inffK > 0 s.t. E ( (Y=K))6 1g; where :R + !R + isaconvexfunctionsatisfying lim x!1 (x)=x =1and lim x!0 (x)=x = 0, and the convention that the infimum of an empty set is +1. In this paper, we let = 2 , where 2 (x) = exp(x 2 ) 1, and calljjjj 2 just “the Orlicz norm”. A random variable with finite ( 2 -)Orlicz norm is called sub-Gaussian. Intuitively, the Orlicz norm quantifies the decay speed for the tails of the distribution of Y. In fact,jjYjj 2 6K is equivalent to 4 P(jYj6t)> 1 2e t 2 =K 2 for all t> 0: Hence, for example, Gaussian distributions and distributions with bounded support are all sub-Gaussian. 
Note also that the last inequality implies EjYj = Z 1 0 (1P(jYj6t))dt6 2 Z 1 0 e t 2 =K 2 dt =K p : (1.1) Now let T be a set and d be a (pseudo-)metric on this set such that (T;d) is a (pseudo- )metric space 5 . Consider a zero mean stochastic process (Z t ) indexed by the elements of T. The process (Z t ) is said to have sub-Gaussian increments if there exists a constant K > 0 such that jjZ t Z s jj 2 6Kd(t;s) for all t;s2T: (1.2) It has long been understood that behavior of sub-Gaussian processes is intimately con- nected to the metric complexity of its index set. In particular, the conventional bound on 4 See e.g. Vershynin (2018), Proposition 2.5.2. 5 Throughout the paper, “metric” can be replaced by a less restrictive notion of “pseudometric”, a distinc- tion we omit from now on. 4 the expected supremum of (Z t ) (see e.g. Van Der Vaart and Wellner (1996) Corollary 2.2.8.) is E sup t2T Z t 6CK Z 1 0 p logN(T;d;")d"; (1.3) whereN(T;d;") is the covering number of (T;d) (i.e. the minimal number of "-balls that is sufficient to cover T in metric d) and C is an absolute constant. The integral on the right hand side is sometimes called Dudley’s entropy of (T;d) and quantifies complexity of (T;d) across multiple scales. It turns out, however, that Dudley’s entropy bound is not optimal, even for Gaussian processes. In fact, the entropy may be infinite when the expected supremum is not, rendering the bound uninformative 6 . This led to the development of more precise ways to control suprema of sub-Gaussian processes in Fernique (1976) and Talagrand (2006). The generic chaining bound is stronger than (1.3) and is sharp for Gaussian processes 7 . To introduce it, we need another definition. 
For a metric space $(T,d)$, a sequence of finite subsets $T_0 \subset T_1 \subset \dots \subset T$ is admissible if their cardinalities satisfy
$$|T_0| = 1 \quad\text{and}\quad |T_k| \le 2^{2^k} \text{ for } k \ge 1. \quad (1.4)$$
Let the distance from a point $t \in T$ to a set $T_k \subset T$ be
$$d(t, T_k) = \inf_{t' \in T_k} d(t, t').$$
Talagrand's functional $\gamma_2$ is then defined by the formula
$$\gamma_2(T,d) = \inf_{(T_k)} \sup_{t\in T} \sum_{k=0}^\infty 2^{k/2} d(t, T_k), \quad (1.5)$$
where the infimum is taken over all admissible sequences $(T_k)$. Note that we can restrict attention to those admissible sequences that eventually come arbitrarily close to any point $t \in T$, which is possible provided $(T,d)$ is separable.⁸

To understand the relation between Talagrand's functional and Dudley's entropy, let us reproduce the discussion from Talagrand (2006), pp. 12–13, here. Denote $N_0 = 1$, $N_k = 2^{2^k}$ for $k \ge 1$, and
$$e_k(T) = \inf_{S \subset T :\, |S| \le N_k}\; \sup_{t\in T} d(t, S).$$
Note that
$$\gamma_2(T,d) \le \inf_{(T_k)} \sum_{k=0}^\infty 2^{k/2} \sup_{t\in T} d(t,T_k) = \sum_{k=0}^\infty 2^{k/2} e_k(T), \quad (1.6)$$
where the second equality holds because minimizing the sum $\sum_{k=0}^\infty 2^{k/2} \sup_{t\in T} d(t,T_k)$ over all admissible sequences $(T_k)$ can be performed by separately minimizing each term $\sup_{t\in T} d(t,T_k)$ over subsets $T_k \subset T$ satisfying $|T_k| \le N_k$.

The definition of $e_k(T)$ involves choosing at most $N_k$ points $S$ in $T$ such that the balls with radius $e_k(T)$ and centers in $S$ cover $T$; moreover, $e_k(T)$ is the minimal such radius, i.e.,
$$e_k(T) = \inf\{\varepsilon > 0 : N(T,d,\varepsilon) \le N_k\}.$$
It follows that if $\varepsilon < e_k(T)$, then $N(T,d,\varepsilon) > N_k$, i.e., $N(T,d,\varepsilon) \ge N_k + 1$. Hence we can write
$$\sqrt{\log(N_k+1)}\,\big(e_k(T) - e_{k+1}(T)\big) \le \int_{e_{k+1}(T)}^{e_k(T)} \sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon.$$

⁸ A metric space $(T,d)$ is separable if it has a countable subset that is dense in $T$.
The term on the left hand side of this inequality satisfies 1 X k=0 2 k=2 (e k (T )e k+1 (T )) = 1 X k=0 2 k=2 e k (T ) 1 X k=1 2 (k1)=2 e k (T )> 1 2 1=2 1 X k=0 2 k=2 e k (T ): Combining this with (1.6) and (1.7) yields the key relation 2 (T;d)6C Z diam(T ) 0 p logN(T;d;")d": Hence, when used as an upper bound, Talagrand’s functional is sharper than Dudley’s en- tropy. We are now ready to state the generic chaining bound for sub-Gaussian processes, see e.g. Theorem 8.5.3 in Vershynin (2018). Theorem1 (Generic chaining). LetZ t ,t2T be a mean zero random process on a separable metric space (T;d) with sub-Gaussian increments as in (1.2). Then, for some absolute constant C > 0, E sup t2T Z t 6CK 2 (T;d): Proof. See Appendix A.1. 1.2.2 The main result We impose the following assumptions. Assumption 1. The parameter belongs to a separable metric space (B;d B ). 7 Assumption 2. For each2B, random variables x it () follow different MA(1) processes for each i, viz. x it () = 1 X =0 i ()" i;t (); (1.8) where i () are nonrandom coefficients such that, for all i = 1;:::;N and 2B, j i ()j6 ; where 1 X =0 <1: (1.9) Assumption3. Innovations" i () are independent, mean zero, sub-Gaussian random vari- ables with uniformly bounded scaling factors, i.e. there exists K 1 > 0 s.t. for all (i;;)2 NZB jj" i ()jj 2 6K 1 : Assumption 4. Innovations " i () are separable 9 stochastic processes whose increments are sub-Gaussian with uniformly bounded constants, i.e. there exists K 2 > 0 s.t. for all (i;)2NZ and ( 1 ; 2 )2BB jj" i ( 1 )" i ( 2 )jj 2 6K 2 d B ( 1 ; 2 ): Assumption 1 is very weak and only imposes separability of the metric space B which holds for most parameter spaces encountered in practice such as Euclidean spaces and spaces of integrable functions. Assumption 2 is similar to case (ii) in Lemma S.2.1 of Moon and Weidner (2017) and allows x it () to be weakly dependent over time. 
Assumptions 3 and 4 impose uniform sub-Gaussianity on the innovations $\varepsilon_{it}(\beta)$ and their increments $\varepsilon_{it}(\beta_1) - \varepsilon_{it}(\beta_2)$, respectively. Note that Assumption 4 is equivalent to the tail bound
$$\mathbb{P}\left(|\varepsilon_{it}(\beta_1) - \varepsilon_{it}(\beta_2)| \le t\, d_{\mathcal{B}}(\beta_1,\beta_2)\right) \ge 1 - 2e^{-t^2/K_2^2} \quad\text{for all } t \ge 0.$$

Denote $\phi_\tau(\beta) = (\phi_{1\tau}(\beta),\dots,\phi_{N\tau}(\beta))'$ and let $\mathcal{E}_\tau(\beta)$ be the $N \times T$ matrix consisting of $\varepsilon_{i,t-\tau}(\beta)$, $i = 1,\dots,N$, $t = 1,\dots,T$. Equation (1.8) can be rewritten in matrix form as
$$X(\beta) = (x_{it}(\beta)) = \sum_{\tau=0}^\infty \operatorname{diag}(\phi_\tau(\beta))\, \mathcal{E}_\tau(\beta).$$
Suppose for a moment that we have a bound on $\mathcal{E}_\tau(\beta)$ of the form
$$\mathbb{E}\sup_\beta \|\mathcal{E}_\tau(\beta)\| \le \varphi(N,T,\mathcal{B}),$$
where $\varphi$ does not depend on $\tau$. Then
$$
\mathbb{E}\sup_\beta \|X(\beta)\|
= \mathbb{E}\sup_\beta \Big\|\sum_{\tau=0}^\infty \operatorname{diag}(\phi_\tau(\beta))\,\mathcal{E}_\tau(\beta)\Big\|
\le \mathbb{E}\sum_{\tau=0}^\infty \sup_\beta \big\|\operatorname{diag}(\phi_\tau(\beta))\big\|\, \sup_\beta \big\|\mathcal{E}_\tau(\beta)\big\|
$$
$$
= \sum_{\tau=0}^\infty \sup_\beta \max_{i=1,\dots,N} |\phi_{i\tau}(\beta)|\; \mathbb{E}\sup_\beta \big\|\mathcal{E}_\tau(\beta)\big\|
\le \varphi(N,T,\mathcal{B}) \sum_{\tau=0}^\infty \bar\phi_\tau = D\,\varphi(N,T,\mathcal{B}), \quad (1.10)
$$
where $D := \sum_{\tau=0}^\infty \bar\phi_\tau < \infty$ by (1.9). This shows that the bound on $\mathbb{E}\sup_\beta \|X(\beta)\|$ is, up to the constant $D$, the same as the bound on $\mathbb{E}\sup_\beta \|\mathcal{E}_\tau(\beta)\|$. Hence we can focus on obtaining the latter bound from now on. It will be clear from the proof that the bound does not depend on $\tau$, so we consider the case $\tau = 0$ and denote $\mathcal{E} = \mathcal{E}_0$ for brevity.

The operator norm of $\mathcal{E}(\beta)$ can be expressed as
$$\|\mathcal{E}(\beta)\| = \sup_{u\in\mathcal{U},\, v\in\mathcal{V}} Z(u,v,\beta),$$
where $\mathcal{U}$ and $\mathcal{V}$ are the unit spheres in $\mathbb{R}^N$ and $\mathbb{R}^T$, respectively, and the process
$$Z(u,v,\beta) := u'\mathcal{E}(\beta)v = \sum_{i=1}^N \sum_{t=1}^T u_i v_t\, \varepsilon_{it}(\beta).$$
Define the $L_1$ product metric on $\mathcal{U}\times\mathcal{V}\times\mathcal{B}$ by
$$d\big((\tilde u, \tilde v, \tilde\beta), (u,v,\beta)\big) = d_{\mathbb{R}^N}(\tilde u, u) + d_{\mathbb{R}^T}(\tilde v, v) + d_{\mathcal{B}}(\tilde\beta, \beta),$$
where $d_{\mathbb{R}^d}$ denotes the standard Euclidean metric on $\mathbb{R}^d$. To obtain a uniform bound on $\|\mathcal{E}(\beta)\|$, we would like to apply Theorem 1 to the process $Z$ defined on the metric space $(\mathcal{U}\times\mathcal{V}\times\mathcal{B}, d)$.

⁹ Let $(\mathcal{B}, d_{\mathcal{B}})$ be a separable metric space with a countable dense subset $\mathcal{D}$. A stochastic process $\varepsilon$ on $\mathcal{B}$ is called separable if for all $\beta \in \mathcal{B}$ there exists a sequence $\beta_i \in \mathcal{D}$ such that $\beta_i \to \beta$ and $\varepsilon(\beta_i) \to \varepsilon(\beta)$ almost surely. Non-separable stochastic processes have separable copies under very weak conditions; see Shalizi and Kontorovich (2010).
Our first lemma asserts that $Z$ has sub-Gaussian increments.

Lemma 1. Under Assumptions 1, 3, and 4, the process $Z$ has sub-Gaussian increments w.r.t. the metric $d$, with constant $K = \max(K_1, K_2)$.

Proof. For $(\tilde u, \tilde v, \tilde\beta), (u,v,\beta) \in \mathcal{U}\times\mathcal{V}\times\mathcal{B}$, write
$$Z(\tilde u, \tilde v, \tilde\beta) - Z(u,v,\beta) = (\tilde u - u)'\mathcal{E}(\tilde\beta)\tilde v + u'\big(\mathcal{E}(\tilde\beta) - \mathcal{E}(\beta)\big)\tilde v + u'\mathcal{E}(\beta)(\tilde v - v) = z_I + z_{II} + z_{III}.$$
Recall a standard result for the $\psi_2$ norm (see e.g. equation (2.1) in Mendelson and Tomczak-Jaegermann (2008)): there exists an absolute constant $c > 0$ such that for all constants $a_i$ and independent centered variables $\xi_1,\dots,\xi_n$, one has
$$\Big\|\sum_{i=1}^n a_i \xi_i\Big\|_{\psi_2} \le c\sqrt{\sum_{i=1}^n a_i^2 \|\xi_i\|_{\psi_2}^2} \le c\,\|a\| \max_{i=1,\dots,n} \|\xi_i\|_{\psi_2}.$$
Applying this inequality, we obtain
$$\|z_I\|_{\psi_2} = \Big\|\sum_{i=1}^N\sum_{t=1}^T (\tilde u_i - u_i)\tilde v_t\, \varepsilon_{it}(\tilde\beta)\Big\|_{\psi_2} \le cK_1\, d_{\mathbb{R}^N}(\tilde u, u),$$
$$\|z_{II}\|_{\psi_2} = \Big\|\sum_{i=1}^N\sum_{t=1}^T u_i \tilde v_t\, \big(\varepsilon_{it}(\tilde\beta) - \varepsilon_{it}(\beta)\big)\Big\|_{\psi_2} \le cK_2\, d_{\mathcal{B}}(\tilde\beta, \beta),$$
$$\|z_{III}\|_{\psi_2} = \Big\|\sum_{i=1}^N\sum_{t=1}^T u_i (\tilde v_t - v_t)\, \varepsilon_{it}(\beta)\Big\|_{\psi_2} \le cK_1\, d_{\mathbb{R}^T}(\tilde v, v).$$
This implies
$$\|Z(\tilde u,\tilde v,\tilde\beta) - Z(u,v,\beta)\|_{\psi_2} \le \|z_I\|_{\psi_2} + \|z_{II}\|_{\psi_2} + \|z_{III}\|_{\psi_2} \le c\max(K_1,K_2)\, d\big((\tilde u,\tilde v,\tilde\beta),(u,v,\beta)\big),$$
which completes the proof.

Our second lemma establishes a bound on Talagrand's functional of a product space in terms of Talagrand's functionals of the component spaces.

Lemma 2 (Talagrand's functional of a product space). Consider a finite number of metric spaces $(T_l, d_l)$, $l = 1,\dots,L$, and the product space $T = \bigotimes_{l=1}^L T_l$ with the $L_1$ product metric defined by
$$d(t, t') = \sum_{l=1}^L d_l(t_l, t'_l) \quad\text{for } t = (t_1,\dots,t_L),\; t' = (t'_1,\dots,t'_L) \in T.$$
Talagrand's functional $\gamma_2$ of $T$ satisfies
$$\gamma_2(T,d) \le (1+\sqrt{2}) \sum_{l=1}^L \gamma_2(T_l, d_l).$$
Proof. See Appendix A.2.

Finally, by Lemma 1, we can apply the generic chaining bound of Theorem 1 to $Z(u,v,\beta)$
Lemma 2 then yields
$$\mathbb E\sup_{\theta\in\mathcal B}\|\varepsilon(\theta)\|=\mathbb E\Big[\sup_{(u,v,\theta)\in\mathcal U\times\mathcal V\times\mathcal B}Z(u,v,\theta)\Big]\le CK\big(\gamma_2(\mathcal U,d_{\mathbb R^N})+\gamma_2(\mathcal V,d_{\mathbb R^T})+\gamma_2(\mathcal B,d_{\mathcal B})\big).\qquad(1.11)$$
For the unit sphere $S^{d-1}$ in $\mathbb R^d$, Dudley's entropy integral satisfies
$$\int_0^{\operatorname{diam}(S^{d-1})}\sqrt{\log N(S^{d-1},\|\cdot\|,\varepsilon)}\,d\varepsilon\le C\sqrt d.$$
Besides, Talagrand's functional is bounded from above by Dudley's entropy integral (e.g., Exercise 8.5.7 in Vershynin (2018)), up to absolute constant factors. Applying these bounds to the unit spheres $\mathcal U\subset\mathbb R^N$ and $\mathcal V\subset\mathbb R^T$ gives
$$\mathbb E\sup_{\theta\in\mathcal B}\|\varepsilon(\theta)\|\le CK\big(\sqrt{\max(N,T)}+\gamma_2(\mathcal B,d_{\mathcal B})\big).$$
Finally, taking into account the inequality (1.10), we obtain the main theoretical result of this paper.

Theorem 2. Under Assumptions 1, 2, 3, 4,
$$\mathbb E\sup_{\theta\in\mathcal B}\|X(\theta)\|\le CK\big(\sqrt{\max(N,T)}+\gamma_2(\mathcal B,d_{\mathcal B})\big),$$
where $K=\max(K_1,K_2)$.

Remarks

(i) Generic chaining yields not only the bound on the expected value, but also tail bounds and bounds on moments of $\sup_{\theta\in\mathcal B}\|X(\theta)\|$, see, e.g., Dirksen (2015). In particular, it follows from Theorem 8.5.5 of Vershynin (2018) that, for all $u>0$, the event
$$\sup_{\theta\in\mathcal B}\|X(\theta)\|\le CK\Big[\sqrt{\max(N,T)}+\gamma_2(\mathcal B,d_{\mathcal B})+(2+\operatorname{diam}(\mathcal B))\,u\Big]$$
holds with probability at least $1-2\exp(-u^2)$, where $\operatorname{diam}(\mathcal B)$ is the diameter of $\mathcal B$ in $d_{\mathcal B}$.

(ii) Suppose the $\varepsilon_{it}(\theta)$ are Gaussian random variables. Then the process $Z(u,v,\theta)$ is Gaussian and therefore the bound (1.11) is sharp, up to an absolute constant, by the majorizing measure theorem, see Theorem 8.6.1 in Vershynin (2018).

(iii) If $\mathcal B$ is a bounded set in $\mathbb R^d$, the main result and majorization of Talagrand's functional with Dudley's entropy integral yield
$$\mathbb E\sup_{\theta\in\mathcal B}\|X(\theta)\|\le CK\sqrt{\max(N,T,d)}.$$
In particular, if $\mathcal B$ consists of one element (so that there is no dependence on $\theta$), the bound reduces to
$$\mathbb E\|X\|\le CK\sqrt{\max(N,T)},$$
which is a classical result in random matrix theory, see, e.g., Latała (2005).

(iv) The dimension of $\mathcal B$ is allowed to grow with the sample size; of course, to maintain the $\sqrt{\max(N,T)}$ rate for the operator norm, the dimension should not grow faster than $\sqrt{\max(N,T)}$.

(v) Theorem 2 can be generalized to the case of Orlicz norms $\|\cdot\|_{\psi_\alpha}$ with $\psi_\alpha(x)=\exp(x^\alpha)-1$, $\alpha\ge 1$.
An important special case $\alpha=1$ corresponds to sub-exponential random variables. The bound will take the form
$$\mathbb E\sup_{\theta\in\mathcal B}\|X(\theta)\|\le CK\big((\max(N,T))^{1/\alpha}+\gamma_\alpha(\mathcal B,d_{\mathcal B})\big),$$
where the generalized Talagrand functional is defined by
$$\gamma_\alpha(T,d)=\inf_{(T_k)}\sup_{t\in T}\sum_{k=0}^\infty 2^{k/\alpha}\,d(t,T_k).$$
The proof is similar to the case $\alpha=2$. The appropriate version of the generic chaining bound is
$$\mathbb E\sup_{t\in T}Z(t)\le CK\gamma_\alpha(T,d),$$
where $Z(\cdot)$ is a stochastic process with bounded $\psi_\alpha$-Orlicz increments. Also,
$$\gamma_\alpha(T,d)\le C\int_0^{\operatorname{diam}(T)}\psi_\alpha^{-1}\big(N(T,d,\varepsilon)\big)\,d\varepsilon.$$
Both results can be found in Talagrand (2006).

1.3 Applications

1.3.1 Operator norm minimizing estimator

In this section, we investigate a new estimator that minimizes the operator norm of the moment function matrix. Suppose that $\varepsilon_{it}(\theta)\in\mathbb R^L$ are $L$ moment functions of $\theta\in\mathcal B\subset\mathbb R^K$ such that $\mathbb E(\varepsilon_{it}(\theta_0))=0$. For simplicity, assume that $L=K=1$. Let $\varepsilon(\theta)=[\varepsilon_{it}(\theta)]$ be the $N\times T$ matrix of moment functions. The conventional method of moments estimator solves
$$\tilde\theta=\arg\min_{\theta\in\mathcal B}\Big|\frac1{NT}\sum_{i,t}\varepsilon_{it}(\theta)\Big|=\arg\min_{\theta\in\mathcal B}\Big|\frac{\mathbf 1_N'}{\sqrt N}\,\frac{\varepsilon(\theta)}{\sqrt{NT}}\,\frac{\mathbf 1_T}{\sqrt T}\Big|,$$
where $\mathbf 1_N$ is the $N$-vector of ones. The new estimator we propose minimizes the operator norm of the moment function matrix $\varepsilon(\theta)$,
$$\hat\theta:=\arg\min_{\theta\in\mathcal B}\frac{\|\varepsilon(\theta)\|}{\sqrt{NT}}=\arg\min_{\theta\in\mathcal B}\sup_{\|w\|=1,\,\|v\|=1}\frac{w'\varepsilon(\theta)v}{\sqrt{NT}}.$$
In this section we establish consistency of $\hat\theta$ using our main result of the previous section.

Assumption 5. (i) The parameter set $\mathcal B$ is a bounded subset of $\mathbb R$; (ii) the centered moment function $\varepsilon_{it}(\theta)-\mathbb E(\varepsilon_{it}(\theta))$ satisfies the conditions of Assumptions 2–4; and (iii) for any $\delta>0$, there exists $\eta>0$ such that
$$\inf_{|\theta-\theta_0|>\delta}\frac{\|\mathbb E(\varepsilon(\theta))\|}{\sqrt{NT}}\ge 2\eta.$$
Conditions (i)–(ii) of Assumption 5 ensure that $\varepsilon_{it}(\theta)-\mathbb E(\varepsilon_{it}(\theta))$ satisfies Assumptions 1–4. The last condition (iii) corresponds to the identification condition of the extremum estimator. For consistency of $\hat\theta$, it suffices to show that for any $\delta>0$, there exists $\eta>0$ such that
$$\inf_{|\theta-\theta_0|>\delta}\frac{\|\varepsilon(\theta)\|}{\sqrt{NT}}-\frac{\|\varepsilon(\theta_0)\|}{\sqrt{NT}}\ge\eta\qquad(1.12)$$
with probability approaching one.
First, note that, since $\mathbb E(\varepsilon(\theta_0))=0$, the triangle inequality yields
$$\frac{\|\varepsilon(\theta_0)\|}{\sqrt{NT}}\le\sup_{\theta\in\mathcal B}\frac{\|\varepsilon(\theta)-\mathbb E(\varepsilon(\theta))\|}{\sqrt{NT}}.\qquad(1.13)$$
On the other hand,
$$\inf_{|\theta-\theta_0|>\delta}\frac{\|\varepsilon(\theta)\|}{\sqrt{NT}}\ge\inf_{|\theta-\theta_0|>\delta}\frac{\|\mathbb E(\varepsilon(\theta))\|}{\sqrt{NT}}-\sup_{\theta\in\mathcal B}\frac{\|\varepsilon(\theta)-\mathbb E(\varepsilon(\theta))\|}{\sqrt{NT}}.\qquad(1.14)$$
Combine (1.13) and (1.14) to obtain
$$\inf_{|\theta-\theta_0|>\delta}\frac{\|\varepsilon(\theta)\|}{\sqrt{NT}}-\frac{\|\varepsilon(\theta_0)\|}{\sqrt{NT}}\ge\inf_{|\theta-\theta_0|>\delta}\frac{\|\mathbb E(\varepsilon(\theta))\|}{\sqrt{NT}}-2\sup_{\theta\in\mathcal B}\frac{\|\varepsilon(\theta)-\mathbb E(\varepsilon(\theta))\|}{\sqrt{NT}}.\qquad(1.15)$$
Finally, choose $\eta$ as in Assumption 5(iii) to guarantee $\inf_{|\theta-\theta_0|>\delta}\|\mathbb E(\varepsilon(\theta))\|/\sqrt{NT}\ge 2\eta$, and note that Theorem 2 gives
$$\sup_{\theta\in\mathcal B}\frac{\|\varepsilon(\theta)-\mathbb E(\varepsilon(\theta))\|}{\sqrt{NT}}=o_p(1).$$
Then (1.15) implies
$$\inf_{|\theta-\theta_0|>\delta}\frac{\|\varepsilon(\theta)\|}{\sqrt{NT}}-\frac{\|\varepsilon(\theta_0)\|}{\sqrt{NT}}\ge 2\eta-2o_p(1)\ge\eta\quad\text{w.p.a.1},$$
which finishes the proof of consistency of $\hat\theta$.

Remarks

(i) If the $\varepsilon_{it}(\theta)$ are iid, then the identification condition Assumption 5(iii) becomes the usual identification condition, that is, for any $\delta>0$, there exists $\eta>0$ such that $\inf_{|\theta-\theta_0|>\delta}|\mathbb E(\varepsilon_{it}(\theta))|\ge 2\eta$. This is because
$$\frac{\|\mathbb E(\varepsilon(\theta))\|}{\sqrt{NT}}=|\mathbb E(\varepsilon_{it}(\theta))|\,\Big\|\frac{\mathbf 1_N}{\sqrt N}\frac{\mathbf 1_T'}{\sqrt T}\Big\|=|\mathbb E(\varepsilon_{it}(\theta))|.$$

(ii) Suppose that $\varepsilon_{it}(\theta)=(\varepsilon_{1,it}(\theta),\dots,\varepsilon_{L,it}(\theta))'\in\mathbb R^L$. Instead of the operator norm objective function, we may also consider
$$\sum_{l=1}^L\omega_l\,\frac{\|\varepsilon_l(\theta)\|}{\sqrt{NT}},$$
where the $\omega_l$ are weights.

(iii) We can also extend the objective function to be the sum of the $R_{NT}$ largest singular values,
$$\frac1{\sqrt{NT}}\sum_{r=1}^{R_{NT}}s_r(\varepsilon(\theta)),$$
where $R_{NT}$ is a sequence of positive integers such that $R_{NT}\to\infty$ while $R_{NT}/\sqrt{\min(N,T)}\to 0$, and $s_r(A)$ is the $r$-th largest singular value of the matrix $A$. Since $\|\varepsilon(\theta)-\mathbb E(\varepsilon(\theta))\|=s_1(\varepsilon(\theta)-\mathbb E(\varepsilon(\theta)))$, we have
$$\sup_{\theta\in\mathcal B}\frac1{\sqrt{NT}}\sum_{r=1}^{R_{NT}}s_r\big(\varepsilon(\theta)-\mathbb E(\varepsilon(\theta))\big)\le R_{NT}\sup_{\theta\in\mathcal B}\frac{\|\varepsilon(\theta)-\mathbb E(\varepsilon(\theta))\|}{\sqrt{NT}}=O_p\Big(\frac{R_{NT}}{\sqrt{\min(N,T)}}\Big)=o_p(1).$$

1.3.2 Estimator of number of factors with functional data

Consider a generic factor model for functional data
$$Y(\theta)=\Lambda(\theta)f(\theta)'+U(\theta),\qquad(1.16)$$
where $\theta$ belongs to a separable metric space $(\mathcal B,d_{\mathcal B})$, $Y(\theta)\in\mathbb R^{N\times T}$ is the observation matrix of the functional outcomes
$\theta\mapsto y_{it}(\theta)$, and $\Lambda(\theta)\in\mathbb R^{N\times R(\theta)}$, $f(\theta)\in\mathbb R^{T\times R(\theta)}$ are such that for all $\theta\in\mathcal B$ the probability limits of $\Lambda(\theta)'\Lambda(\theta)/N$ and $f(\theta)'f(\theta)/T$ exist and are positive definite deterministic matrices such that
$$\liminf_{N,T}\ \sup_\theta\, s_{R}\Big(\frac{\Lambda(\theta)f(\theta)'}{\sqrt{NT}}\Big)>0.\qquad(1.17)$$
The object of interest is the maximal rank $R=\max_{\theta\in\mathcal B}R(\theta)$.

To illustrate the applicability of this model, suppose that the outcome variable is the intraday pollution level $y_{it}(\theta)$, where $\theta$ is the time within a day, across counties $i$ and time periods $t$, as in Aue et al. (2015). It is plausible to assume that counties with higher population density and dependence on automobiles will have higher average levels of pollution. At the same time, pollution patterns on weekdays and on weekends may differ in a systematic way. Hence it is reasonable to model the intraday pollution curve $y_{it}(\theta)$ as the interaction of the county fixed effect $\lambda_i(\theta)$ and the time effect $f_t(\theta)$, plus independent noise, arriving at model (1.16). A related approach to modeling functional time series can be found in Kargin and Onatski (2008), whose empirical objective is to predict the contract rate curves of daily Eurodollar futures. Of course, arguments similar to those outlined above may be applied to the modeling of numerous other functional quantities, from mortality as a function of age to crop yields as a function of spatial location. For more examples and an overview of functional data analysis, see, e.g., Wang et al. (2016) and Kowal et al. (2019).

Let us now show heuristically how to derive a consistent estimator of the maximal rank $R$. Note that the model assumptions imply
$$\sup_\theta s_i\big(\Lambda(\theta)f(\theta)'\big)=O_p\big(\sqrt{NT}\big),\quad i\le R,\qquad(1.18)$$
$$\sup_\theta s_i\big(\Lambda(\theta)f(\theta)'\big)=0,\quad i>R.\qquad(1.19)$$
If $U(\theta)$ satisfies the conditions of Theorem 2, we have $\sup_\theta\|U(\theta)\|=O_p\big(\sqrt{\max(N,T)}+\gamma_2(\mathcal B,d_{\mathcal B})\big)$ and so $\sup_\theta\|U(\theta)\|=O_p\big(\sqrt{\max(N,T)}\big)$. Denote by $s_i(A)$ the $i$-th largest singular value of a matrix $A$.
The Ky Fan inequality for singular values asserts that, for $A,B\in\mathbb R^{N\times T}$,
$$|s_i(A+B)-s_i(A)|\le s_1(B)=\|B\|\quad\text{for all } i=1,\dots,\min(N,T).$$
Using this inequality, for a fixed $\theta$ we obtain
$$s_R\Big(\frac{Y(\theta)}{\sqrt{NT}}\Big)=s_R\Big(\frac{\Lambda(\theta)f(\theta)'+U(\theta)}{\sqrt{NT}}\Big)\ge s_R\Big(\frac{\Lambda(\theta)f(\theta)'}{\sqrt{NT}}\Big)-\frac{\|U(\theta)\|}{\sqrt{NT}}.$$
Therefore, there exists a positive constant $C>0$ such that
$$\sup_\theta s_R\Big(\frac{Y(\theta)}{\sqrt{NT}}\Big)\ge\sup_\theta s_R\Big(\frac{\Lambda(\theta)f(\theta)'}{\sqrt{NT}}\Big)-\sup_\theta\frac{\|U(\theta)\|}{\sqrt{NT}}\ge C-O_p\Big(\frac1{\sqrt{\min(N,T)}}\Big),$$
where the last inequality holds by (1.17). On the other hand,
$$\sup_\theta s_{R+1}\Big(\frac{Y(\theta)}{\sqrt{NT}}\Big)\le\sup_\theta s_{R+1}\Big(\frac{\Lambda(\theta)f(\theta)'}{\sqrt{NT}}\Big)+\sup_\theta s_{R+1}\Big(\frac{U(\theta)}{\sqrt{NT}}\Big)\le O_p\Big(\frac1{\sqrt{\min(N,T)}}\Big).$$
This establishes consistency of the following natural estimator of $R$,
$$\hat R=\sum_{l=1}^{\min(N,T)}\mathbb I\Big\{\sup_\theta s_l\Big(\frac{Y(\theta)}{\sqrt{NT}}\Big)\ge\omega_{NT}\Big\},\qquad(1.20)$$
where $\omega_{NT}$ is a sequence of real numbers satisfying $\omega_{NT}\to 0$ and $\omega_{NT}\sqrt{\min(N,T)}\to\infty$.

Empirical practice calls for an automatic procedure for choosing the tuning parameter $\omega_{NT}$. One may consider one of the following three options, using the penalty terms from Bai and Ng (2002):
$$\omega_{NT,1}=\hat\sigma\sqrt{\frac{N+T}{NT}\log\frac{NT}{N+T}},\qquad \omega_{NT,2}=\hat\sigma\sqrt{\frac{N+T}{NT}\log\min(N,T)},\qquad \omega_{NT,3}=\hat\sigma\sqrt{\frac{\log\min(N,T)}{\min(N,T)}},$$
where $\hat\sigma^2=\sup_\theta\hat\sigma^2(\theta)$ is a consistent estimator of
$$\sigma^2=\sup_\theta\sigma^2(\theta)=\sup_\theta\frac1{NT}\sum_{i=1}^N\sum_{t=1}^T\mathbb E\,u_{it}(\theta)^2.$$
In applications, $\hat\sigma^2(\theta)$ can be replaced by the residual variance of $Y(\theta)$ after partialling out $k_{\max}$ factors using principal component analysis, where $k_{\max}$ is a pre-specified upper bound on the true maximal number of factors $R$.

1.4 Monte Carlo illustration

Here we illustrate the performance of the maximal rank estimator in the functional factor model described in the previous section with a simple simulation design. The data generating process is the functional factor model (1.16), where, for simplicity, we let the loadings $\Lambda(\theta)$ and the factors $f(\theta)$ be independent of $\theta$.
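Before specializing the design, the estimator (1.20) with the threshold $\omega_{NT,2}$ can be sketched in a few lines. The following code (numpy; the grid of "$\theta$" values, sample sizes, seed, and the choice $\hat\sigma=1$ are illustrative assumptions of ours, not the paper's exact design) simulates a small factor model with rank varying over the grid and recovers the maximal rank.

```python
import numpy as np

def estimate_max_rank(Y_list, omega):
    """Maximal-rank estimator (1.20): count how many scaled singular values
    exceed the threshold omega, uniformly over the grid of theta values."""
    N, T = Y_list[0].shape
    # sup over theta of each scaled singular value s_l(Y(theta)) / sqrt(NT)
    s_sup = np.max([np.linalg.svd(Y, compute_uv=False) / np.sqrt(N * T)
                    for Y in Y_list], axis=0)
    return int(np.sum(s_sup >= omega))

# Toy design: loadings/factors independent of theta, rank varying over a grid.
rng = np.random.default_rng(1)
N, T = 100, 100
lam, f = rng.standard_normal((N, 3)), rng.standard_normal((T, 3))
Y_list = [lam[:, :r] @ f[:, :r].T + rng.standard_normal((N, T))
          for r in (3, 1, 2)]                 # R(theta) varies; max rank R = 3
omega = np.sqrt((N + T) / (N * T) * np.log(min(N, T)))  # omega_{NT,2}, sigma-hat = 1
R_hat = estimate_max_rank(Y_list, omega)
```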
In scalar form, the model is
$$y_{it}(\theta)=\sum_{r=1}^{R(\theta)}\lambda_{ir}f_{tr}+u_{it}(\theta),\qquad i=1,\dots,N,\ t=1,\dots,T,\qquad(1.21)$$
where $\lambda_{ir},f_{tr}\sim\text{iid }N(0,1)$ and
$$u_{it}(\theta)=\sigma\big(\eta_{it1}\cos\theta+\eta_{it2}\sin\theta\big),\qquad \eta_{it1},\eta_{it2}\sim\text{iid }N(0,1).$$
The chosen specification for $u_{it}(\theta)$ comes from a generic representation of any Gaussian stochastic process as an infinite trigonometric series, in which we only retain one term. Clearly, the error variance $\mathbb V(u_{it}(\theta))=\sigma^2$ for all $\theta$, and there is nontrivial dependence of $u_{it}(\theta)$ across values of $\theta$. We set $\sigma=1$. The results do not change substantially when larger values of $\sigma$ are used.

We choose the range of the parameter $\theta$ to be $\mathcal B=\{0,0.1,\dots,0.9,1\}$ and the corresponding ranks
$$(R(0),R(0.1),\dots,R(0.9),R(1))=(4,4,1,4,3,1,2,3,4,4,1),$$
so that the true value of interest is $R=\max_\theta R(\theta)=4$. The simulated bias and root MSE for the maximal rank estimator (1.20) are shown in Table 1.1.

```
                $\omega_{NT,1}$       $\omega_{NT,2}$       $\omega_{NT,3}$
  N       T =  25   50  100       25   50  100       25   50  100
 25   Bias    3.5  1.7  0.2      2.0  0.7  0.0      6.5  4.2  1.3
      RMSE    0.6  0.6  0.4      0.6  0.6  0.1      0.5  0.6  0.6
 50   Bias    2.0  0.0  0.0      0.9  0.0  0.0      4.5  4.8  0.2
      RMSE    0.6  0.1  0.0      0.6  0.0  0.0      0.6  0.6  0.4
100   Bias    0.3  0.0  0.0      0.1  0.0  0.0      1.7  0.3  0.9
      RMSE    0.5  0.0  0.0      0.2  0.0  0.0      0.6  0.5  0.5
```
Table 1.1: Performance of the maximal rank estimator under different thresholds $\omega_{NT}$.

Clearly, the choice $\omega_{NT}=\omega_{NT,3}$ for the tuning parameter leads to poor small-sample performance, which is similar to the results of Bai and Ng (2002). However, under the other two choices $\omega_{NT,1},\omega_{NT,2}$, bias and RMSE are modest even in small samples and become essentially zero when $N,T\ge 50$. Given these simulation results, we are convinced that our generalization (1.20) of the estimator of Bai and Ng (2002) will be useful for practitioners who are interested in estimating factor models with functional data.

1.5 Conclusion

In this paper, we derive a novel uniform stochastic bound on the operator norm of sub-Gaussian random matrices.
We use it to establish consistency of a new estimator that minimizes the operator norm of the matrix of moment functions, as well as to introduce an estimator of the maximal number of factors in a functional interactive fixed effects model.

Chapter 2

Nonparametric Inference on Counterfactuals in First-price Auctions

2.1 Introduction

In the empirical studies of first-price auctions, a structural approach to estimation and inference is often used. This approach exploits restrictions derived from economic theory to recover bidders' latent valuations from the observed bids. With these valuations at hand, the researcher can make predictions about the effects of changes in auction rules or the composition of bidders. Various methods, in both parametric and nonparametric frameworks, have been developed; see, e.g., Paarsch et al. (2006), Athey and Haile (2007), and Perrigne and Vuong (2019) for an overview.

Since the seminal papers by Elyakime et al. (1994), Guerre et al. (2000) and Li et al. (2000), it is the probability density function (PDF) of bidders' values that has been considered a default medium containing the model primitives. This choice is natural since it allows for constructive identification (Matzkin, 2013) when valuations are independent, and since various counterfactual quantities can be expressed as functionals of the value density.

Footnote 1. Joint work with Pasha Andreyanov.
Footnote 2. With correlated valuations, nonparametric identification is partial, see Aradillas-López et al. (2013).

However, the standard estimator of the value density (Guerre et al., 2000) is a two-step nonparametric procedure not amenable to simple theoretical analysis. More importantly, most counterfactuals are nonlinear functionals of the value density, rendering rigorous counterfactual inference prohibitively hard. This leads to researchers reporting confidence intervals based on simulation from the estimated PDF (e.g., Li and Perrigne, 2003) or none at all.
Our main contribution is developing the methodology for nonparametric estimation and uniform inference for a class of important counterfactual quantities, such as the bidder's expected surplus, the total expected surplus, and the expected revenue under counterfactual reserve prices. This methodology relies on the quantile function of valuations — an alternative candidate for constructive identification — instead of the PDF, and on recognizing that many counterfactuals are continuous linear functionals of this quantile function.

Since the value quantile function is the key ingredient of our counterfactual evaluation, we provide its complete first-order asymptotic analysis. Namely, we derive the uniform, asymptotically linear (Bahadur–Kiefer, or BK) expansion for the kernel estimator $\hat v_h$ of the value quantile function $v$, where $h$ is a smoothing bandwidth. This expansion implies that, despite converging to a Gaussian distribution pointwise, the estimator does not admit a functional central limit theorem, which calls for alternative ways of conducting uniform inference. Luckily, the linear term of the studentized estimator is known and pivotal, allowing us to suggest simple simulation-based confidence bands and establish their validity using the anti-concentration theory of Chernozhukov et al. (2014).

With the theoretical properties of the value quantile function at hand, we move towards the analysis of the estimators of the aforementioned counterfactuals. We show that these can be divided into two broad classes. One class contains the "smoother" (w.r.t. the value quantile function) counterfactuals that are estimable at the parametric rate $n^{-1/2}$ and converge weakly to a Gaussian process in $\ell^\infty[0,1]$. The other class contains the "less smooth" counterfactuals that are only estimable at the slower rate $(nh)^{-1/2}$ and do not converge weakly in $\ell^\infty[0,1]$.

Footnote 3. For example, uniform inference for the value density was only developed recently in Ma et al. (2019).
For each class, we develop a distinct protocol for the construction of confidence intervals and bands, and establish their validity.

To demonstrate how our methodology can be used to answer a concrete economic question, we use Phil Haile's data on U.S. Forest Service timber auctions, where a reserve price was never employed, and assess the optimality of this auction design. Namely, we test whether the seller's expected revenue could have been increased, had the auction designer chosen a nonzero reserve price.

Our work contributes to the expanding literature on quantile methods in first-price auctions; see Marmer and Shneyerov (2012) and Enache and Florens (2017) for kernel-based estimators, Luo and Wan (2018) for isotone regression-based estimators, and Guerre and Sabbah (2012) and Gimenes and Guerre (2021) for local polynomial estimators. Interestingly, our estimator of valuation quantiles is a weighted sum of the differences of ordered bids, often referred to as bid spacings. The latter have been used for collusion detection in Ingraham (2005), for set identification of bidders' rents in Paul and Gutierrez (2004) and Marra (2020), and in the prior-free clock auction design in Loertscher and Marx (2020).

Finally, Zincenko (2021) has recently shown that it is possible to perform uniform inference on the seller's expected revenue with a pseudo-value-based estimator. The validity of his confidence bands relies on either the bootstrap delta method or extreme value theory for kernel density estimators. Our quantile-based approach, on the other hand, does not require bootstrapping or extreme value approximations, is applicable to a larger class of counterfactuals, and may be computationally advantageous due to a natural choice of the grid and the possibility of using the Fast Fourier Transform for discrete convolution.

The rest of the paper is organized as follows. In Section 2.2, we set up the theoretical and econometric framework for our analysis.
In Section 2.3, we develop estimation and inference on the value quantile function. In Section 2.4, we develop estimation and inference on the counterfactual quantities of interest. In Section 2.5, we provide Monte Carlo simulations of the finite-sample coverage of our confidence bands. In Section 2.6, we use the timber auction data to test whether counterfactual reserve prices are revenue-enhancing. Section 2.7 contains a discussion of some practical aspects of our methodology. Section 2.8 concludes the paper. Proofs of the theoretical results are provided in the Appendix.

2.2 Framework

2.2.1 Auction model

Our setting is a sealed-bid first-price auction with independent private values and a (potentially binding) reserve price. There are $M\ge 2$ potential bidders in the auction, who are ex ante identical and risk-neutral. A potential bidder becomes active if two conditions are met: her valuation exceeds the publicly announced reserve price $r\ge 0$ (i.e., the lowest price at which the auctioneer is willing to sell the auctioned object), and she passes an exogenous (independent of her valuation, identity and the reserve price) selection procedure. Every active bidder submits a bid $b$ without observing the number of active bidders or their bids. Naturally, each bidder assigns the same belief $p_m$ to the event that there are exactly $m$ active bidders in the auction. We assume away the degenerate case $p_0+p_1=1$. The object is won by the highest bidder, who pays the face value of her bid to the auctioneer.

Denote the valuation of a potential bidder by $v$. We impose the following assumption on the value distribution (Guerre et al., 2009, Definition 2).

Assumption 1 (Distribution of values). The values $v_1,\dots,v_M$ of potential bidders are drawn independently from a common CDF $\tilde G$ such that:
1. The support of $\tilde G$ has the form $[\underline v,\bar v]$, where $0\le\underline v<\bar v<+\infty$.
2. $\tilde G$ is twice continuously differentiable on $[\underline v,\bar v]$.
Footnote 4. For risk-averse bidders with known CRRA utilities, the analysis is similar, see, e.g., Zincenko (2021).
Footnote 5. We emphasize that, although the equilibrium beliefs $(p_m)_{m=1}^M$ depend on the reserve price $r$ as well as the unspecified selection procedure, its nature is irrelevant as long as the beliefs are identical, see, e.g., Krishna (2009, Section 3.2.2).

3. $\tilde g(v)=\tilde G'(v)>0$ for all $v\in[\underline v,\bar v]$.

We assume that the primitives $\tilde G,r,(p_m)_{m=1}^M$ of the model are common knowledge. Therefore, the equilibrium behavior depends only on the distribution of valuations of active bidders; this distribution has the CDF
$$G(v)=\begin{cases}0,& v<r,\\[4pt] \dfrac{\tilde G(v)-\tilde G(r)}{1-\tilde G(r)},& v\ge r.\end{cases}\qquad(2.1)$$
We denote the associated PDF by $g(v)=G'(v)$. If the reserve price is not binding, i.e., $\tilde G(r)=0$, then $G=\tilde G$.

Define the auxiliary functions
$$A_1(u):=\sum_{m=1}^M p_m u^{m-1},\qquad A_2(u):=uA_1(u),\qquad(2.2)$$
$$A_3(u):=(1-u)A_1(u),\qquad A(u):=\frac{A_1(u)}{A_1'(u)},\qquad(2.3)$$
where $A_1'(u)\ne 0$ since $\sum_{m=2}^M p_m>0$ by non-degeneracy of the beliefs. The bidding strategy $\beta(v)$ of the symmetric Bayes–Nash equilibrium can be characterized either via the first-order conditions
$$\beta'(v)=\frac{(v-\beta(v))\,g(v)}{A(G(v))},\qquad\beta(\underline v)=\underline v,\qquad(2.4)$$
or the envelope conditions
$$\beta(v)=v-\frac{\int_r^v A_1(G(x))\,dx}{A_1(G(v))},\qquad(2.5)$$
see, e.g., Riley and Samuelson (1981) or Krishna (2009). Clearly, this strategy is weakly increasing and twice continuously differentiable for all $v\in[\underline v,\bar v]$. Equations (2.4) and (2.5) together imply that $\beta(v)<v$ and $\beta'(v)>0$ for all $v>\underline v$ and, moreover, $\beta'(\underline v)=1/(1+A'(0))>0$ by L'Hôpital's rule. Consequently, the strategy is strictly increasing with the slope $\beta'(v)>0$ for all $v\in[\underline v,\bar v]$. Denoting by $F$ the CDF of the equilibrium bid, and $f=F'$, the inverse bidding strategy can be written as
$$v=\beta^{-1}(b)=b+\frac{A(F(b))}{f(b)},\qquad(2.6)$$
allowing one to recover the latent values from the observed bids. This suggests a nonparametric estimation approach that was popularized by Guerre et al. (2000) and Li et al. (2000).
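To make the envelope characterization (2.5) concrete, the following sketch (numpy; the value distribution, reserve price, and beliefs are illustrative assumptions of ours) computes the equilibrium bid function on a grid and checks bid shading and monotonicity.

```python
import numpy as np

# Illustrative primitives: active values Uniform on [r, 1] after truncation,
# beliefs p_1 = 0.2, p_2 = 0.4, p_3 = 0.4 (so A_1(0) > 0).
p = {1: 0.2, 2: 0.4, 3: 0.4}

def A1(u):
    # A_1(u) = sum_m p_m u^{m-1}, eq. (2.2); works elementwise on arrays
    return sum(pm * u ** (m - 1) for m, pm in p.items())

r = 0.2
v = np.linspace(r, 1.0, 1001)
G = (v - r) / (1 - r)                    # truncated CDF of active values, eq. (2.1)

# beta(v) = v - \int_r^v A1(G(x)) dx / A1(G(v)), eq. (2.5); trapezoid integral
a1 = A1(G)
integral = np.concatenate(([0.0], np.cumsum(0.5 * np.diff(v) * (a1[1:] + a1[:-1]))))
beta = v - integral / a1

assert beta[0] == r                      # bid the reserve at v = r
assert np.all(beta[1:] < v[1:])          # bid shading: beta(v) < v for v > r
assert np.all(np.diff(beta) > 0)         # the strategy is strictly increasing
```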
Alternatively, we can rewrite equation (2.6) in terms of the quantiles of the participating values. Denote by $Q(u):=F^{-1}(u)$ the bid quantile function and by $q(u):=Q'(u)$ the associated bid quantile density. Let $v(u):=G^{-1}(u)$ be the $u$-th quantile of the participating value distribution $G$. Then equation (2.6) can be rewritten as
$$v(u)=\beta^{-1}(Q(u))=Q(u)+A(u)q(u),\qquad(2.7)$$
where we use the change of variables $b=Q(u)$ along with the identities $F(Q(u))=u$ and $f(Q(u))q(u)=1$. Since, by definition, $Q(u)=\beta(G^{-1}(u))$, and both $(G^{-1})'$ and $\beta'$ are strictly positive on the relevant domain, we arrive at the following property of the equilibrium distribution of bids.

Proposition 1 (Distribution of bids). Under Assumption 1, the equilibrium bids are drawn independently from a distribution with a quantile function $Q$ such that:
1. $Q$ is twice continuously differentiable on $[0,1]$;
2. $q(u)=Q'(u)>0$ for all $u\in[0,1]$.

2.2.2 Counterfactuals

The counterfactual experiment of interest is an increase in the reserve price from $r$ to $r^*>r$. We show that a variety of counterfactual metrics can be written in terms of the counterfactual reserve price $r^*$ and the distribution $G$ of bids submitted under the original reserve price $r$, despite the fact that the counterfactual experiment is accompanied by a change in the beliefs about the number of participants due to endogenous entry (see the last row of Table 2.1). We then show that, in our model, these counterfactual metrics can be rewritten as linear functionals of the value quantile function $v(\cdot)$, the key observation enabling simple inference procedures in Section 2.4.

Footnote 6. We do not consider the counterfactual decrease in the reserve price since no identification is possible in this case, unless, of course, $r$ is non-binding (which is itself a non-testable assumption in our framework).

One such counterfactual is the total expected (ex ante) surplus.
In a symmetric equilibrium, it is ex post equal to the highest valuation if it exceeds $r^*$, and zero otherwise, which is a random variable with CDF $A_2(G(\cdot))$. Hence the total surplus is its expectation
$$TS(r^*):=\int_{r^*}^{\bar v}x\,d\big(A_2(G(x))\big).\qquad(2.8)$$
Another counterfactual is the bidder's expected surplus. We will distinguish the interim surplus of a potential bidder $\pi_p(v)$ from that of an active bidder $\pi_a(v)$. By the revenue equivalence principle (see Krishna, 2009), the interim expected utility of an active bidder is related to her equilibrium probability of winning, equal to $A_1(G(v))$, via the envelope conditions
$$\pi_a(v):=\int_{r^*}^v A_1(G(x))\,dx.\qquad(2.9)$$
The interim expected utility of a potential bidder only differs from that of an active bidder by the factor $a$, the expected participation rate (the ratio of the number of active bidders to the number of potential bidders) under the original reserve price $r$,
$$\pi_p(v):=a\,\pi_a(v),\qquad a:=\frac1M\sum_{m=1}^M m\,p_m.\qquad(2.10)$$
To derive the potential bidder's expected (ex ante) surplus $BS$, we need to take the expectation of $\pi_p(v)$ w.r.t. the distribution of $v$. Integration by parts yields the formula
$$BS(r^*):=\int_{r^*}^{\bar v}\pi_p(x)\,dG(x)=a\int_{r^*}^{\bar v}\int_{r^*}^{v}A_1(G(x))\,dx\,d(G(v)-1)=a\int_{r^*}^{\bar v}A_3(G(x))\,dx.$$
Finally, we consider the seller's expected revenue under the counterfactual reserve price $r^*$, which is equal to the difference between the total expected surplus and $M$ times the potential bidder's expected surplus,
$$Rev(r^*):=TS-M\cdot BS=\int_{r^*}^{\bar v}x\,d\big(A_2(G(x))\big)-Ma\int_{r^*}^{\bar v}A_3(G(x))\,dx$$
$$=MaA_3(G(r^*))\,r^*+\int_{r^*}^{\bar v}x\,d\big(A_2(G(x))+MaA_3(G(x))\big),$$
and the bidder's strategy under $r^*$,
$$\beta_{r^*}(v)=v-\frac{\int_{r^*}^v A_1(G(x))\,dx}{A_1(G(v))},\qquad v\ge r^*.\qquad(2.11)$$
It can be seen that all the aforementioned counterfactuals are complicated, nonlinear functionals of the primitives $(r^*,G)$. However, using the change of variables $z=G(x)$ (i.e., passing to the ranks of valuations from their levels) and denoting $u^*=G(r^*)$ yields expressions that are linear in the quantile function $v(\cdot)$, see Table 2.1. This makes $v(\cdot)$ a key object needed for the counterfactual analysis.

Table 2.1: Typical counterfactuals in first-price auctions as linear functionals of $v(\cdot)$.

- Total expected surplus. Classical (nonlinear) form: $\int_{r^*}^{\bar v}x\,d(A_2(G(x)))$. Quantile (linear in $v$) form: $\int_{u^*}^1 A_2'(z)v(z)\,dz$.
- Potential bidder's expected surplus. Classical form: $a\int_{r^*}^{\bar v}A_3(G(x))\,dx$. Quantile form: $-aA_3(u^*)v(u^*)-a\int_{u^*}^1 A_3'(z)v(z)\,dz$.
- Expected revenue. Classical form: $MaA_3(G(r^*))r^*+\int_{r^*}^{\bar v}x\,d(A_2(G(x))+MaA_3(G(x)))$. Quantile form: $MaA_3(u^*)v(u^*)+\int_{u^*}^1\big(A_2'(z)+MaA_3'(z)\big)v(z)\,dz$.
- Optimal bid given value $v=v(u)$. Classical form: $v-\dfrac{\int_{r^*}^v A_1(G(x))\,dx}{A_1(G(v))}$. Quantile form: $\dfrac{A_1(u^*)}{A_1(u)}v(u^*)+\displaystyle\int_{u^*}^u\frac{A_1'(z)}{A_1(u)}v(z)\,dz$.
- Probability of $m$ active bidders given $r^*$. Classical form: $\sum_{i=m}^M p_i\binom{i}{m}(1-G(r^*))^m G(r^*)^{i-m}$. Quantile form: $\sum_{i=m}^M p_i\binom{i}{m}(1-u^*)^m(u^*)^{i-m}$.

Footnote 6 (continued). This is due to the fact that we never observe the bidders whose valuations are smaller than $r$.
Footnote 7. We emphasize that endogenous participation is fully captured by the $A_1,A_2,A_3,A$ functions and the constant $a$, and so we do not need the counterfactual participation rate in our calculations.

2.2.3 Data generating process

The observed data is a random sample of bids $\{b_{il},\ i=1,\dots,m_l,\ l=1,\dots,L\}$, where $b_{il}$ denotes the bid submitted by the $i$-th participant in the $l$-th auction. All the auctions are ex ante symmetric and independent, and $m_l$ is the number of participants in the $l$-th auction.
For brevity, we denote the (random) sample size by $n=n(L)=\sum_{l=1}^L m_l$, and define
$$b_1:=b_{11},\quad b_2:=b_{21},\quad\dots,\quad b_n:=b_{m_L L}.\qquad(2.12)$$
Note that, since the bidders do not know the realizations of the number of active bidders, the samples $\{b_1,\dots,b_n\}$ and $\{m_1,\dots,m_L\}$ are independent. Besides, as $L\to\infty$, the sample size $n(L)\to\infty$ with probability one. Therefore, without loss of generality, we condition our subsequent exposition on a realization $\{m_l\}_{l=1}^\infty$ such that $n(L)\to\infty$. This has the following important implication: although the auxiliary functions $A_1,A_2,A_3,A$ and the constant $a$ need to be estimated from the data, we can assume that they are known, since their estimators only depend on the conditioning variables $m_1,\dots,m_L$, see equations (2.16)–(2.18) below.

2.3 Estimation and inference for value quantiles

As explained in Section 2.2.2, the value quantile function $v(\cdot)$ is the key object needed for the counterfactual analysis. In this section we develop the asymptotic theory for its natural (plug-in) estimator. To define the estimator, we need to introduce two auxiliary objects.

The first object is the kernel estimator of the bid quantile density $q(u)$, defined by
$$\hat q_h(u):=\int_0^1 K_h(u-z)\,d\hat Q(z),\qquad u\in[0,1].\qquad(2.13)$$
Here $K$ is a compactly supported kernel, $K_h(z):=h^{-1}K(h^{-1}z)$, $h>0$ is a bandwidth, and $\hat Q(u)$ is the empirical bid quantile function,
$$\hat Q(u)=\begin{cases}b_{(\lfloor nu\rfloor+1)},& u\in[0,1),\\ b_{(n)},& u=1,\end{cases}\qquad(2.14)$$
where $b_{(1)}\le\dots\le b_{(n)}$ are the order statistics of the observed bids $b_1,\dots,b_n$. We note that $\hat q_h$ takes the form of a weighted sum of the bid spacings $b_{(i+1)}-b_{(i)}$,
$$\hat q_h(u)=\sum_{i=1}^{n-1}K_h(u-i/n)\big(b_{(i+1)}-b_{(i)}\big).\qquad(2.15)$$
This estimator was previously studied by Siddiqui (1960) and Bloch and Gastwirth (1968) for the case of the rectangular kernel, and by Falk (1986), Welsh (1988), Csörgő et al. (1991) and Jones (1992) for general kernels.
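A direct implementation of (2.15) takes a few lines. The sketch below (numpy; we pick the Epanechnikov kernel, which is Lipschitz and supported on $[-1,1]$, and an illustrative bandwidth and sample of our choosing) checks the estimator on Uniform[0,1] bids, for which the true quantile density is $q(u)=1$.

```python
import numpy as np

def q_hat(u, bids, h):
    """Kernel bid quantile density (2.15): a weighted sum of bid spacings
    b_(i+1) - b_(i), with the Epanechnikov kernel K(z) = 0.75 (1 - z^2)."""
    b = np.sort(np.asarray(bids))
    n = b.size
    i = np.arange(1, n)                       # spacing index i = 1, ..., n - 1
    z = (u - i / n) / h
    K = np.where(np.abs(z) <= 1, 0.75 * (1 - z ** 2), 0.0) / h   # K_h(u - i/n)
    return float(K @ np.diff(b))

# Sanity check on Uniform[0,1] bids, where q(u) = 1 on (0, 1):
rng = np.random.default_rng(0)
bids = rng.uniform(size=20_000)
est = q_hat(0.5, bids, h=0.1)                 # illustrative bandwidth
assert abs(est - 1.0) < 0.1
```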
The second auxiliary object is the set of plug-in estimators of $A_1,A_2,A_3,A$ and $a$ defined by
$$\check A_1(u):=\sum_{m=1}^M\check p_m u^{m-1},\qquad\check A_2(u):=u\check A_1(u),\qquad\check A_3(u):=(1-u)\check A_1(u),\qquad(2.16)$$
$$\check A(u):=\frac{\check A_1(u)}{\check A_1'(u)},\qquad\check a:=\frac1M\sum_{m=1}^M m\,\check p_m,\qquad(2.17)$$
where $\check p_m$ is the empirical frequency of auctions with $m$ bidders,
$$\check p_m:=\frac1L\sum_{l=1}^L\mathbb 1(m_l=m),\qquad m=1,\dots,M.\qquad(2.18)$$
We use the "check" (as opposed to "hat") notation here to highlight that $\check A_1,\check A_2,\check A_3,\check A,\check a$ are treated as known since, as explained in Section 2.2.3, our analysis is conditional on $m_1,\dots,m_L$.

Given $\check A$ and $\hat q_h$, we define our estimator of the value quantile $v(u)$ by
$$\hat v_h(u):=\hat Q(u)+\check A(u)\hat q_h(u),\qquad u\in[0,1].\qquad(2.19)$$
We note that $\hat v_h$ consists of two parts: (i) the empirical quantile function $\hat Q$, which is uniformly consistent and converges to a Gaussian process in $\ell^\infty[0,1]$ at the parametric rate $n^{-1/2}$, and (ii) the kernel quantile density $\hat q_h$, which is uniformly consistent only away from the boundary $\{0,1\}$ and does not converge to a (tight) limit in $\ell^\infty[\varepsilon,1-\varepsilon]$ even if $\varepsilon>0$, but converges pointwise to a Gaussian limit at the nonparametric rate $(nh)^{-1/2}$, see the proof of Theorem 1. Therefore, the first-order asymptotic properties of $\hat v_h$ are determined by the kernel quantile density $\hat q_h$.

We impose the following assumptions.

Assumption 2 (Kernel function).
1. $K:\mathbb R\to\mathbb R$ is a nonnegative function such that
$$\int_{\mathbb R}K(z)\,dz=1\quad\text{and}\quad R_K:=\int_{\mathbb R}K(z)^2\,dz<\infty.\qquad(2.20)$$
2. $K$ is a Lipschitz function supported on the interval $[-1,1]$.

Assumption 3 (Bandwidth). The bandwidth $h=h_n$ is such that $h\to 0$ and
1. there exist $c>0$ and $\delta>0$ such that $h_n\ge cn^{-1/2+\delta}$ for all $n$;
2. there exist $C>0$ and $\delta>0$ such that $h_n\le Cn^{-1/3-\delta}$ for all $n$.

Assumption 2.1 states that $K$ is a valid, square-integrable PDF. Assumption 2.2 is standard in the literature on strong approximations of local empirical processes (see, e.g., Rio, 1994). In particular, it implies that $K$ is a function of bounded variation, which is crucial in
In particular, it implies thatK is a function of bounded variation, which is crucial in 32 our derivation of the BK expansion. Assumption 3.1 states that h n decays at a rate that is slower thann 1=2 . This assumption is mild and guarantees that the Gaussian approximation to the supremum of our (studentized) estimator has at least the rateo p (log 1=2 n). This rate is needed to establish validity of the confidence bands in Section 2.3.2. Finally, Assump- tion 3.2 imposes undersmoothing that eliminates the smoothing bias in ^ v h and nonsmooth counterfactuals, see Section 2.4.2. 2.3.1 The Bahadur–Kiefer expansion In this section, we derive the Bahadur–Kiefer (i.e. almost sure, uniform, asymptotically linear) representation of the form p nh(^ v h (u)v(u)) ^ q h (u) = A(u)G n;h (u) +R n (u); u2 [h; 1h]; (2.21) where G n;h (u) := p nh 1 n n X i=1 [K h (uF (b i ))EK h (uF (b i ))] (2.22) and the remainder R n (u) converges to zero a.s. uniformly in u2 [h; 1h] with an explicit rate. The key feature of this representation is that the main term is fully known and piv- otal: its distribution does not depend on the data generating process since U i := F (b i ) iid Uniform[0; 1]. Heuristically, this suggests that the distribution of the left-hand side un- der any DGP is a valid approximation for its true distribution. Indeed, in Section 2.3.2, we show the validity of such approximation by combining pivotality with the anti-concentration theory of Chernozhukov et al. (2014). This leads to a simple algorithm for the confidence bands on the quantile function v(): To derive this representation, we rely on the classical BK expansion of the quantile 33 function (Bahadur, 1966; Kiefer, 1967), ^ Q(u)Q(u) =q(u) ^ F (Q(u))u +r n (u); (2.23) where r n (u) =O a:s: n 3=4 `(n) uniformly in u2 [0; 1]: (2.24) Here`(n) = (logn) 1=2 (log logn) 1=4 isalogarithmicoffsetfactorthatarisesduetotheuniform nature of the approximation and may often be disregarded in practice. 
Note that the BK expansion represents the nonlinear estimator $\hat Q(u)$ as a sum of a linear estimator — based on the empirical distribution function $\hat F(Q(u))$ — and a remainder $r_n(u)$ that converges to zero a.s. uniformly at a nonparametric (slow) rate $n^{-3/4}\ell(n)$.

Theorem 1 (Bahadur–Kiefer expansion for value quantiles). Under Assumptions 1 and 2, the estimator $\hat v_h(u)$ has the representation
$$Z_n(u)=\bar Z_n(u)+R_n(u),\qquad u\in[h,1-h],\qquad(2.25)$$
where
$$Z_n(u):=\frac{\sqrt{nh}\,(\hat v_h(u)-v(u))}{\hat q_h(u)},\qquad \bar Z_n(u):=-\check A(u)G_{n,h}(u),\qquad(2.26)$$
$$R_n(u)=O_{a.s.}\big(n^{1/2}h^{3/2}+h^{1/2}+h^{-1/2}n^{-1/4}\ell(n)\big)\quad\text{uniformly in } u\in[h,1-h].\qquad(2.27)$$

Remark 1 (BK expansion for the quantile density). The proof of the preceding theorem also implies the BK expansion for the normalized quantile density $\sqrt{nh}\,(\hat q_h(u)-q(u))$, which may be of independent interest. In this case, the right-hand side does not have the factor $\check A(u)$, while the term $h^{1/2}$ in the remainder rate can be replaced by the faster term $h\log(1/h)$.

We note that two types of biases arise in the estimation of $v(\cdot)$. The first type is the smoothing bias $\mathbb E\hat v_h(u)-v(u)$, which manifests itself in the term $n^{1/2}h^{3/2}$ in the remainder rate. This bias can be eliminated by undersmoothing, $h=o(n^{-1/3})$, i.e., choosing a (suboptimally) small bandwidth such that $\sqrt{nh}\,(\mathbb E\hat v_h(u)-v(u))\to 0$. Conversely, if the rate of $h$ is larger than $n^{-1/3}$, as in the case of Silverman's rule-of-thumb bandwidth $h_{rot}=O(n^{-1/5})$, the confidence bands will be centered at $\mathbb E\hat v_h(u)$ rather than $v(u)$. This conflict between MSE-optimal estimation and correct inference is a feature of most nonparametric estimators, see Horowitz (2001) and Hall (2013). The other type of bias is the boundary bias, stemming from the estimator $\hat v_h(u)$ being inconsistent when $u$ is close to the boundary $\{0,1\}$ of its domain $[0,1]$.
Because our interest is in valid hypothesis testing, not in the confidence bands per se, we can eliminate this bias by introducing the trimming $u \in [h, 1-h]$ while maintaining the validity of inference procedures based on the representation (2.25).

2.3.2 Inference on value quantiles

Theorem 1 allows us to construct pointwise confidence intervals and uniform confidence bands for the value quantile function. In particular, the following corollary provides the asymptotic distribution of the estimator at a fixed valuation quantile.

Corollary 1. Under Assumptions 1, 2 and 3, we have, for every $u \in (0,1)$,
$$ \sqrt{nh}\,(\hat v_h(u) - v(u)) \rightsquigarrow N(0, V(u)), \qquad (2.28) $$
$$ V(u) := \bar A^2(u)\,q^2(u)\,R_K. \qquad (2.29) $$
Special cases of this result (for the quantile density estimator $\hat q_h$) were derived by Siddiqui (1960) and Bloch and Gastwirth (1968). It implies that a confidence interval of nominal confidence level $(1-\alpha)$ for $v_h(u)$ can be constructed as
$$ \left[ \hat v_h(u) - \frac{\bar A(u)\hat q_h(u)\sqrt{R_K}}{\sqrt{nh}}\,z_{1-\alpha/2},\;\; \hat v_h(u) + \frac{\bar A(u)\hat q_h(u)\sqrt{R_K}}{\sqrt{nh}}\,z_{1-\alpha/2} \right], \qquad (2.30) $$
where $z_{1-\alpha/2}$ is the standard normal quantile of level $1-\alpha/2$.

We now turn to the problem of uniform inference on $v(\cdot)$. Note that if the process $Z_n$ converged weakly in $\ell^\infty[h,1-h]$ to a known (or estimable) process $Z$, this would enable the construction of asymptotically valid confidence bands using the quantiles of $\sup_u |Z(u)|$ as critical values. [8] Unfortunately, although $Z_n(u)$ is asymptotically Gaussian at each point $u \in (0,1)$, it does not converge in $\ell^\infty[h,1-h]$. This follows from the fact that the main term $\bar Z_n$ in the BK expansion is the scaled kernel density process, which is known to lack functional convergence (see, e.g., Rio, 1994). [9] In such a case, there are two common ways to circumvent the problem and derive valid confidence bands.

One approach is to derive the asymptotic distribution of a normalized version of $\sup_u Z_n(u)$ using extreme value theory, and then rely on the knowledge of the normalizing constants to construct the confidence band.
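As an illustration, the pointwise interval (2.30) is straightforward to compute once $\hat v_h(u)$, $\hat q_h(u)$, $\bar A(u)$, and the kernel constant $R_K$ are available. A minimal sketch (the function name and calling convention are ours, not part of the text):

```python
import numpy as np
from scipy.stats import norm

def pointwise_ci(v_hat, q_hat, A_bar_u, R_K, n, h, alpha=0.05):
    """Pointwise CI of eq. (2.30): v_hat +/- A_bar(u) q_hat(u) sqrt(R_K/(n h)) z_{1-alpha/2}."""
    z = norm.ppf(1.0 - alpha / 2.0)
    half_width = A_bar_u * q_hat * np.sqrt(R_K) / np.sqrt(n * h) * z
    return v_hat - half_width, v_hat + half_width
```

The interval shrinks at the nonparametric rate $(nh)^{-1/2}$, consistent with Corollary 1.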
In the case of kernel and histogram density estimation, this approach was pioneered by Smirnov (1950) and Bickel and Rosenblatt (1973). [10] However, convergence to the asymptotic distribution turns out to be very slow, leading to a coverage error of the resulting confidence band of order $O(1/\log n)$, as shown by Hall (1991).

The other approach is to rely on finite-sample approximations for (the distribution of) the supremum
$$ W_n := \sup_{u \in [h, 1-h]} |Z_n(u)|. \qquad (2.31) $$
If such an approximation admits simulation, it can be used for the construction of confidence bands. This is the approach we take in this paper.

We consider two types of approximations, both of which are pivotal and hence allow for simulation. One is simply the supremum of the linear term $\bar Z_n$, viz.
$$ \bar W_n := \sup_{u \in [h, 1-h]} |\bar Z_n(u)|. \qquad (2.32) $$
The other is the supremum of $Z_n$ under an alternative, uniform$[0,1]$ distribution of bids, viz.
$$ W_n^{U[0,1]} := \sup_{u \in [h, 1-h]} \left| Z_n^{U[0,1]}(u) \right|, \qquad (2.33) $$
where $Z_n^{U[0,1]}(u)$ is the process $Z_n(u)$ calculated using the pseudo-sample
$$ \{\tilde b_i\}_{i=1}^n \overset{iid}{\sim} \mathrm{Uniform}[0,1]. \qquad (2.34) $$
This approximation is nonstandard and makes use of the asymptotic pivotality of $W_n$.

[Footnote 8: For one-sided confidence bands, one would use the quantiles of $\sup_u Z(u)$ instead.]
[Footnote 9: For an example of a sequence of stochastic processes on $[0,1]$ that converges weakly pointwise, but not in $\ell^\infty[0,1]$, consider $X_n(u) = h_n^{-1/2}(B(u + h_n) - B(u))$, where $B$ is the Brownian motion, $h_n \to 0$, and $u \in (0,1)$. Clearly, $X_n(u) \sim N(0,1)$ for all $u$, but, by Lévy's modulus of continuity theorem, $\sup_u |X_n(u)| \to \infty$ a.s., and so there is no convergence in $\ell^\infty[0,1]$.]
[Footnote 10: For a nonasymptotic version of the Smirnov–Bickel–Rosenblatt extreme value theorem, see Rio (1994, Theorem 1.2).]
In principle, any distribution of the pseudo-bids rationalized by a value distribution satisfying Assumption 1 can be chosen; however, the uniform distribution is convenient since, in this case, we have, for all $u \in [0,1]$,
$$ Q(u) = u, \qquad q(u) = 1, \qquad v(u) = u + \bar A(u), \qquad (2.35) $$
and hence
$$ Z_n^{U[0,1]}(u) := \frac{\sqrt{nh}\left(\hat v_h(u; \{\tilde b_i\}_{i=1}^n) - u - \bar A(u)\right)}{\hat q_h(u; \{\tilde b_i\}_{i=1}^n)}. \qquad (2.36) $$

We emphasize that it is not immediate that the distributions of $\bar W_n$ and $W_n^{U[0,1]}$ approximate the distribution of $W_n$ in a way that guarantees the validity of the associated confidence bands
$$ \left[ \hat v_h(u) - \frac{\hat q_h(u)\,c_{n,1-\alpha/2}}{\sqrt{nh}},\;\; \hat v_h(u) + \frac{\hat q_h(u)\,c_{n,1-\alpha/2}}{\sqrt{nh}} \right], \quad u \in [h, 1-h], \qquad (2.37) $$
where $c_{n,1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of either $\bar W_n$ or $W_n^{U[0,1]}$. Indeed, note that Theorem 1 implies the inequality
$$ |W_n - \bar W_n| = \left| \sup_{u\in[h,1-h]}|Z_n(u)| - \sup_{u\in[h,1-h]}|\bar Z_n(u)| \right| \le \sup_{u\in[h,1-h]}|R_n(u)|, \qquad (2.38) $$
and hence the coupling
$$ W_n = \bar W_n + r_n, \qquad (2.39) $$
where $r_n$ tends to zero a.s. at a known rate. If one could show that this implies Kolmogorov convergence,
$$ \sup_{t\in\mathbb{R}} \left|\mathbb{P}(W_n \le t) - \mathbb{P}(\bar W_n \le t)\right| \to 0, \qquad (2.40) $$
then the confidence bands based on $\bar W_n$ would be valid. However, (2.40) need not follow from (2.39) even if the a.s. convergence rate of $r_n$ is very fast, unless further conditions are imposed on $\bar W_n$. As an illustration of this phenomenon, consider the abstract example $W_n = n^{-1}U$, $\bar W_n = n^{-1}(U - 1)$, where $U \sim \mathrm{Uniform}[0,1]$. Then $r_n = W_n - \bar W_n = n^{-1}$, but
$$ \mathbb{P}(W_n \le 0) - \mathbb{P}(\bar W_n \le 0) = 0 - 1 = -1 \not\to 0, $$
and so (2.40) does not hold. On the other hand, if $\bar W_n$ had an absolutely continuous asymptotic distribution $\mathcal{D}$, then the CDF of $W_n$ would converge to the CDF of $\mathcal{D}$ pointwise, and hence the quantiles of $\mathcal{D}$ would serve as valid critical values.

Therefore, intuitively, a certain degree of anti-concentration of $\bar W_n$ is needed to guarantee that the coupling (2.39) implies the Kolmogorov convergence (2.40) and hence the validity of simulated critical values. The anti-concentration literature mainly focuses on Gaussian processes, while the process $\bar Z_n$ is non-Gaussian.
Fortunately, $\bar Z_n$ is the normalized kernel density estimator for uniform data, which is a well-studied process. In particular, we rely on the seminal work of Chernozhukov et al. (2014) to establish a coupling of $\bar W_n$ with the supremum of a Gaussian process and show that the latter exhibits sufficient anti-concentration. We then argue that an identical argument works for $W_n$. Finally, the pivotality of $\bar W_n$ and the coupling (2.39) imply the Kolmogorov convergence for $W_n^{U[0,1]}$. Formally, we have the following result.

Theorem 2. Under Assumptions 1, 2 and 3,
$$ \sup_{x\in\mathbb{R}} \left|\mathbb{P}(W_n \le x) - \mathbb{P}(\bar W_n \le x)\right| \to 0, \qquad (2.41) $$
$$ \sup_{x\in\mathbb{R}} \left| \mathbb{P}(W_n \le x) - \mathbb{P}(W_n^{U[0,1]} \le x) \right| \to 0, \qquad (2.42) $$
and hence the confidence bands (2.37) are asymptotically valid and exact.

Remark 2. For the purpose of constructing one-sided confidence bands, we note that the same result holds with $W_n$, $\bar W_n$, and $W_n^{U[0,1]}$ replaced by $\sup_{u\in[h,1-h]} Z_n(u)$, $\sup_{u\in[h,1-h]} \bar Z_n(u)$, and $\sup_{u\in[h,1-h]} Z_n^{U[0,1]}(u)$, respectively.

2.4 Estimation and inference for counterfactuals

In this section, we develop the asymptotic theory for the counterfactuals in Table 2.1, which relies heavily on the analysis of the estimator of value quantiles in the previous section. Clearly, every such counterfactual has the general form
$$ T(u^*) = \varphi(u^*)\,v(u^*) + \int_{u^*}^1 \psi(x)\,v(x)\,dx, \qquad (2.43) $$
where $\varphi$ and $\psi$ are continuously differentiable functions on $[0,1]$ that only depend on the auxiliary objects $A_1, A_2, A_3, \bar A$, and $\bar a$ (or their derivatives). As an example, for the total expected surplus $\varphi(x) \equiv 0$ and $\psi(x) = A_2'(x)$, while for the expected revenue $\varphi(u^*) = M\bar a A_3(u^*)$ and $\psi(x) = A_2'(x) + M\bar a A_3'(x)$.

The representation (2.43) implies that $T(u^*)$ is a (weighted) sum of two continuous linear functionals of $v$ of different smoothness: (i) evaluation at a point, $v(u^*)$, and (ii) integration,
$$ S_\psi(u^*) := \int_{u^*}^1 \psi(x)\,v(x)\,dx. \qquad (2.44) $$
The natural estimators of the two components have fundamentally different asymptotic properties.
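Because the main term $\bar A(u)G_{n,h}(u)$ is pivotal, critical values can be simulated from uniform draws without reference to the data. The sketch below simulates $\sup_u |G_{n,h}(u)|$ on a trimmed grid, setting $\bar A \equiv 1$ and using an Epanechnikov kernel purely for illustration (both are our assumptions, not choices made in the text); the centering exploits the fact that $\mathbb{E} K_h(u - U_i) = 1$ exactly for $u \in [h, 1-h]$ and uniform $U_i$:

```python
import numpy as np

def sup_kernel_process(n, h, grid, rng):
    """One draw of sup_u |G_{n,h}(u)| over the grid, for U_i ~ Uniform[0,1]."""
    U = rng.uniform(size=n)
    # K_h(u - U_i) = K((u - U_i)/h) / h with an Epanechnikov K (illustrative choice)
    t = (grid[:, None] - U[None, :]) / h
    Kh = 0.75 * np.maximum(1.0 - t**2, 0.0) / h
    # For u in [h, 1-h] and uniform U_i, E K_h(u - U_i) = 1 exactly.
    G = np.sqrt(n * h) * (Kh.mean(axis=1) - 1.0)
    return np.abs(G).max()

def critical_value(n, h, n_sim=500, alpha=0.05, seed=0):
    """(1 - alpha)-quantile of the simulated pivotal supremum (with A-bar set to 1)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(h, 1.0 - h, 200)          # trimmed grid [h, 1-h]
    draws = [sup_kernel_process(n, h, grid, rng) for _ in range(n_sim)]
    return float(np.quantile(draws, 1.0 - alpha))
```

The full procedure in the text simulates $Z_n^{U[0,1]}$, i.e. it also recomputes $\hat v_h$ and $\hat q_h$ on the pseudo-bids; the sketch only captures the pivotal main term.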
Namely, in Section 2.3.1 we showed that the less smooth functional $v(u^*)$ is only estimable at the nonparametric rate $(nh)^{-1/2}$ and does not converge in $\ell^\infty[\varepsilon, 1-\varepsilon]$ even for $\varepsilon > 0$. On the other hand, we will show in Section 2.4.1 that the smoother functional $S_\psi(u^*)$ is estimable at the parametric rate $n^{-1/2}$ and converges to a Gaussian process in $\ell^\infty[0,1]$. We combine the two results in Section 2.4.2 to show that, whenever $\varphi \ne 0$, inference on $T$ can be performed similarly to that on $v$.

2.4.1 Smooth (S-type) counterfactuals

First, let us consider estimation and inference for functionals (2.44), where $\psi : [0,1] \to \mathbb{R}$ is a known, continuously differentiable function. To motivate our estimator, use integration by parts to rewrite
$$ S_\psi(u^*) = \int_{u^*}^1 \psi(u)Q(u)\,du + \int_{u^*}^1 \bar A(u)\psi(u)\,dQ(u) \qquad (2.45) $$
$$ = \int_{u^*}^1 \psi(u)Q(u)\,du + \bar A(u)\psi(u)Q(u)\Big|_{u^*}^1 - \int_{u^*}^1 Q(u)\,d\!\left[\bar A(u)\psi(u)\right] \qquad (2.46) $$
$$ = \int_{u^*}^1 \psi(u)Q(u)\,du + \bar A(u)\psi(u)Q(u)\Big|_{u^*}^1 - \int_{u^*}^1 Q(u)\left(\bar A'(u)\psi(u) + \bar A(u)\psi'(u)\right)du \qquad (2.47) $$
$$ = \int_{u^*}^1 \bar\psi(u)Q(u)\,du - \bar A(u^*)\psi(u^*)Q(u^*) + \bar A(1)\psi(1)Q(1), \qquad (2.48) $$
where we denote
$$ \bar\psi(u) := (1 - \bar A'(u))\psi(u) - \bar A(u)\psi'(u). \qquad (2.49) $$

Note that the latter formula expresses $S_\psi(u^*)$ as a continuous linear functional of the quantile function $Q$, which is estimable at a parametric rate. This leads to a natural estimator of $S_\psi$ that does not contain tuning parameters. Namely, for any $u^* \in [0,1]$, we define the estimator
$$ \hat S_\psi(u^*) := \int_{u^*}^1 \bar\psi(u)\hat Q(u)\,du - \bar A(u^*)\psi(u^*)\hat Q(u^*) + \bar A(1)\psi(1)\hat Q(1) \qquad (2.50) $$
$$ = \sum_{i=\lfloor nu^* \rfloor}^{n-1} b_{(i+1)} \int_{\max(i,\,nu^*)/n}^{(i+1)/n} \bar\psi(u)\,du \;-\; \bar A(u^*)\psi(u^*)\,b_{(\lfloor nu^* \rfloor + 1)} \;+\; \bar A(1)\psi(1)\,b_{(n)}. \qquad (2.51) $$

The following theorem establishes standard Gaussian process asymptotics for $\hat S_\psi$.

Theorem 3. Under Assumptions 1, 2 and 3,
$$ \sqrt{n}\,(\hat S_\psi(\cdot) - S_\psi(\cdot)) \rightsquigarrow \mathbb{G}_{\psi,q}(\cdot) \quad \text{in } \ell^\infty[0,1], \qquad (2.52) $$
where $\mathbb{G}_{\psi,q}(\cdot)$ is a tight, centered Gaussian process on $[0,1]$ with the covariance function
$$ \mathbb{E}\,\mathbb{G}_{\psi,q}(u^*)\,\mathbb{G}_{\psi,q}(v^*) = \mathrm{Cov}\left(f_{u^*}(U), f_{v^*}(U)\right), \quad U \sim \mathrm{Uniform}[0,1], \quad u^*, v^* \in [0,1], \qquad (2.53) $$
and
$$ f_{u^*}(U) := -\int_{u^*}^1 \bar\psi(u)q(u)\,\mathbb{1}(U \le u)\,du + \bar A(u^*)\psi(u^*)q(u^*)\,\mathbb{1}(U \le u^*). \qquad (2.54) $$

Remark 3.
The integral in the expression for $\hat S_\psi(u^*)$ can be replaced by $\bar\psi(\xi(i))/n$ for any $\xi(i) \in \left[\frac{i}{n}, \frac{i+1}{n}\right]$. This has no impact on the statement of Theorem 3.

2.4.2 Nonsmooth (T-type) counterfactuals

Given the asymptotic results for the two components of the generic counterfactual (2.43), we may now turn to estimation and inference on the latter. To this end, define the estimator
$$ \hat T_h(u^*) = \varphi(u^*)\hat v_h(u^*) + \hat S_\psi(u^*), \quad u^* \in [h, 1-h], \qquad (2.55) $$
where $\hat v_h(u^*)$ is defined in (2.19) and $\hat S_\psi(u^*)$ is the estimator (2.51) with $\psi$ as in (2.43). Since $\hat S_\psi(u^*)$ converges fast, while $\hat v_h(u^*)$ converges slowly, the asymptotics of $\hat T_h(u^*)$ is dominated by the latter, as the following theorem illustrates.

Theorem 4. Under Assumptions 1 and 2, we have the representation
$$ Z_n^T(u^*) = \bar Z_n^T(u^*) + R_n^T(u^*), \quad u^* \in [h, 1-h], \qquad (2.56) $$
where
$$ Z_n^T(u^*) := \frac{\sqrt{nh}\left(\hat T_h(u^*) - T(u^*)\right)}{\hat q_h(u^*)}, \qquad \bar Z_n^T(u^*) := \varphi(u^*)\bar A(u^*)\,G_{n,h}(u^*), \qquad (2.57) $$
$$ R_n^T(u^*) = O_{a.s.}\!\left(n^{1/2}h^{3/2} + h^{1/2} + h^{-1/2}n^{-1/4}\ell(n)\right) \quad \text{uniformly in } u^* \in [h, 1-h]. \qquad (2.58) $$

Theorem 4 immediately yields the following result on the asymptotic distribution of $\hat T_h(u^*)$ at a fixed point $u^* \in (0,1)$.

Corollary 2. Under Assumptions 1, 2 and 3, we have, for every $u^* \in (0,1)$,
$$ \sqrt{nh}\left(\hat T_h(u^*) - T(u^*)\right) \rightsquigarrow N(0, V(u^*)), \qquad (2.59) $$
$$ V(u^*) = \left(\bar A(u^*)\,q(u^*)\,\varphi(u^*)\right)^2 R_K. \qquad (2.60) $$

We now consider uniform inference on $T(\cdot)$. Since the estimator $\hat T_h(\cdot)$ does not converge in $\ell^\infty[h,1-h]$, but the approximating process $\bar Z_n^T$ is known and pivotal, the methodology is similar to the case of the valuation quantile function; see Section 2.3.2. In particular, we show that valid confidence bands for $T(\cdot)$ can be based on simulation from either (i) the approximating process $\bar Z_n^T$, or (ii) the process $Z_n^T$ under an alternative distribution of bids.
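The estimator (2.51) is simply a weighted sum of order statistics. The following sketch implements it using the Riemann simplification of Remark 3, with $\xi(i) = i/n$; the function names and calling convention are ours, and `psi_bar` is the function $\bar\psi$ from (2.49), which the caller must supply:

```python
import numpy as np

def S_hat(bids, psi, psi_bar, A_bar, u_star):
    """Plug-in estimator of S_psi(u*), eq. (2.51), with the integral over each cell
    [i/n, (i+1)/n] replaced by psi_bar(i/n)/n (Remark 3)."""
    b = np.sort(np.asarray(bids, dtype=float))   # order statistics b_(1), ..., b_(n)
    n = b.size
    i = np.arange(n)                             # 0-based: b[i] is b_(i+1)
    keep = i >= np.floor(n * u_star)             # sum runs over i >= floor(n u*)
    integral = np.sum(psi_bar(i[keep] / n) * b[keep]) / n
    Q_u = b[int(np.floor(n * u_star))]           # \hat Q(u*) = b_(floor(n u*)+1)
    return integral - A_bar(u_star) * psi(u_star) * Q_u + A_bar(1.0) * psi(1.0) * b[-1]
```

With $\psi \equiv 1$, $\bar A \equiv 0$, and $u^* = 0$, the estimator reduces to a Riemann sum of $\hat Q$, i.e. roughly the sample mean of the bids, which gives a quick sanity check.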
To this end, define
$$ W_n^T := \sup_{u^* \in [h,1-h]} \left| Z_n^T(u^*) \right|, \qquad (2.61) $$
$$ \bar W_n^T := \sup_{u^* \in [h,1-h]} \left| \bar Z_n^T(u^*) \right|, \qquad (2.62) $$
$$ W_n^{T,U[0,1]} := \sup_{u^* \in [h,1-h]} \left| Z_n^{T,U[0,1]}(u^*) \right|, \qquad (2.63) $$
where $Z_n^{T,U[0,1]}(u^*)$ is the process $Z_n^T(u^*)$ calculated using the pseudo-sample
$$ \{\tilde b_i\}_{i=1}^n \overset{iid}{\sim} \mathrm{Uniform}[0,1]; \qquad (2.64) $$
cf. equations (2.31)–(2.34). Define the confidence bands by
$$ \left[ \hat T_h(u) - \frac{\hat q_h(u)\,c_{n,1-\alpha/2}}{\sqrt{nh}},\;\; \hat T_h(u) + \frac{\hat q_h(u)\,c_{n,1-\alpha/2}}{\sqrt{nh}} \right], \quad u \in [h, 1-h], \qquad (2.65) $$
where $c_{n,1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of either $\bar W_n^T$ or $W_n^{T,U[0,1]}$.

Theorem 5. Under Assumptions 1, 2 and 3, we have
$$ \sup_{t\in\mathbb{R}} \left| \mathbb{P}(W_n^T \le t) - \mathbb{P}(\bar W_n^T \le t) \right| \to 0, \qquad (2.66) $$
$$ \sup_{t\in\mathbb{R}} \left| \mathbb{P}(W_n^T \le t) - \mathbb{P}(W_n^{T,U[0,1]} \le t) \right| \to 0, \qquad (2.67) $$
and hence the confidence bands (2.65) are asymptotically valid and exact.

Remark 4. For the purpose of constructing one-sided confidence bands, we note that the same result holds with $W_n^T$, $\bar W_n^T$, and $W_n^{T,U[0,1]}$ replaced by $\sup_{u\in[h,1-h]} Z_n^T(u)$, $\sup_{u\in[h,1-h]} \bar Z_n^T(u)$, and $\sup_{u\in[h,1-h]} Z_n^{T,U[0,1]}(u)$, respectively.

Remark 5 (Shape of confidence bands). Note that, for any function $\sigma(\cdot)$ bounded away from zero on $[0,1]$, the representation (2.56) is equivalent to
$$ Z_n^T(u^*)/\sigma(u^*) = \bar Z_n^T(u^*)/\sigma(u^*) + \tilde R_n^T(u^*), \quad u^* \in [h,1-h], \qquad (2.68) $$
where $\tilde R_n^T(u^*)$ has the same uniform convergence rate as $R_n^T(u^*)$. Similarly to (2.65), the two-sided confidence bands based on this representation are
$$ \left[ \hat T_h(u) - \frac{\sigma(u)\hat q_h(u)\,c_{n,1-\alpha/2}}{\sqrt{nh}},\;\; \hat T_h(u) + \frac{\sigma(u)\hat q_h(u)\,c_{n,1-\alpha/2}}{\sqrt{nh}} \right], \quad u \in [h,1-h], \qquad (2.69) $$
where $c_{n,1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of either of the random variables
$$ \bar W_n^{T,\sigma} := \sup_{u^*\in[h,1-h]} \left| \bar Z_n^T(u^*)/\sigma(u^*) \right|, \qquad (2.70) $$
$$ W_n^{T,U[0,1],\sigma} := \sup_{u^*\in[h,1-h]} \left| Z_n^{T,U[0,1]}(u^*)/\sigma(u^*) \right|. \qquad (2.71) $$
These bands remain asymptotically exact for any $\sigma$, but have the shape $\sigma(u)\hat q_h(u)$, which may affect their finite-sample performance and asymptotic power. [11]

2.5 Monte Carlo experiments

While our theoretical results establish the asymptotic validity of the confidence bands, they do not rule out a substantial finite-sample size distortion.
In this section, we evaluate the extent of this distortion in a set of Monte Carlo experiments. For simplicity, we simulate an auction with exactly two bidders ($M = 2$ and $p_2 = \bar p_2 = 1$) and a non-binding (original) reserve price $r = 0$. We consider three choices for the distributions of the observed bids — uniform, beta, and power-law — all of which are supported on the interval $[0,1]$. The simplest choice is the uniform distribution, since it has a strictly positive density with $q(u) = 1$. The beta$(\alpha,\beta)$ distribution features a bell-shaped density for $\alpha, \beta > 1$, with varying skewness. In contrast, the density $f(x) = \gamma x^{\gamma-1}$ of the power-law distribution increases on the support $[0,1]$ for $\gamma > 1$. We censor these distributions at the top 5% and bottom 5% quantile levels, so that the quantile density is strictly positive and satisfies the statement of Proposition 1.

[Footnote 11: Montiel Olea and Plagborg-Møller (2019) discuss a similar issue with simultaneous confidence bands for a vector (rather than functional) parameter.]

Table 2.2: Simulated coverage of the 95% uniform confidence bands.

Sample size = 1,000, trim = 3%
Estimand      (i)    (ii)   (iii)  (iv)   (v)
beta(1,1)     0.95   0.952  0.912  0.91   0.974
beta(2,2)     0.954  0.954  0.912  0.904  0.97
beta(5,2)     0.952  0.954  0.924  0.916  0.966
beta(2,5)     0.956  0.962  0.902  0.898  0.968
powerlaw(2)   0.952  0.952  0.928  0.922  0.976
powerlaw(3)   0.948  0.948  0.93   0.926  0.978

Sample size = 10,000, trim = 1.5%
beta(1,1)     0.95   0.948  0.932  0.936  0.96
beta(2,2)     0.954  0.954  0.932  0.934  0.96
beta(5,2)     0.952  0.954  0.93   0.932  0.962
beta(2,5)     0.952  0.952  0.918  0.93   0.958
powerlaw(2)   0.954  0.952  0.94   0.938  0.96
powerlaw(3)   0.948  0.952  0.934  0.938  0.96

Sample size = 100,000, trim = 0.7%
beta(1,1)     0.95   0.948  0.938  0.942  0.954
beta(2,2)     0.952  0.948  0.944  0.946  0.956
beta(5,2)     0.954  0.952  0.944  0.948  0.956
beta(2,5)     0.956  0.952  0.932  0.948  0.954
powerlaw(2)   0.944  0.948  0.948  0.948  0.954
powerlaw(3)   0.946  0.948  0.952  0.95   0.952
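The censoring of the simulated distributions (the transformation given in footnote 12) can be sketched as follows; `Q` stands for the original quantile function, and the default cut levels 0.05 and 0.95 match the 5% trimming described above:

```python
import numpy as np

def censor_quantile(Q, u, lo=0.05, hi=0.95):
    """Censored quantile function: (Q(lo + (hi-lo) u) - Q(lo)) / (Q(hi) - Q(lo)).
    Maps [0,1] onto [0,1] and keeps the quantile density strictly positive."""
    u = np.asarray(u, dtype=float)
    return (Q(lo + (hi - lo) * u) - Q(lo)) / (Q(hi) - Q(lo))
```

For the uniform distribution ($Q(u) = u$) the transformation is the identity, so the uniform design is unaffected by the censoring.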
12 The estimation targets are (i) the bid quantile functionq, (ii) the value quantile function v; and the following quantities as functions of the counterfactual reserve price: (iii) the potential bidder’s expected surplus, (iv) the expected revenue, and (v) the total expected 12 The censoring of the distribution in the Monte Carlo simulations is achieved by replacing their true quantile function Q(u) with (Q(0:05 + 0:9u)Q(0:05))=(Q(0:95)Q(0:05)). We emphasize that every simulated bid distribution can be rationalized by some value distribution satisfying Assumption 1. 45 surplus, see Table 2.1. For the non-counterfactual targets (i), (ii) and for the T-type counterfactuals (iii), (iv), we calculate the confidence bands by simulation from the left-hand side of the respective BK expansions under the uniform[0,1] bid distribution, see Theorem 5. For theS-type functional (v), the confidence bands are constructed by simulation from the estimated process ^ G(u ) := 1 p n n X i=1 ^ f u (U i )E ^ f u (U i ) ; (2.72) where U i iid Uniform[0; 1] and ^ f u is equal to f u with the true values q replaced by their estimates ^ q h , see Theorem 3. We use the undersmoothing bandwidthh = 1:06sn 0:34 , where s is the standard deviation of bid spacings, and set both the number of DGP simulations and the number of simulations for the critical values to 500. The results are shown in Table 2.2. The simulated coverage can be seen to be close to the nominal level of 0:95 for larger sample sizes, which validates our theoretical results in Sections 2.3 and 2.4. 2.6 Empirical application In this section, we apply our methodology to test the hypothesis about the optimality of the auction design employed in timber sales held by the U.S. Forest Service in between 1974 and 1989, see, e.g., Haile (2001). These auctions did not feature a reserve price (i.e. r = 0) which raises the question whether the collected revenue could have been higher, had the reserve price been set at a positive level. 
We use the data kindly provided by Phil Haile on his website. [Footnote 13: http://www.econ.yale.edu/~pah29/] We select a subsample of auctions that are sealed-bid and have at least 2 bidders; see Figure 2.1. As is common in the literature, we residualize the log-bids using available auction-level characteristics: year and location dummies, the logarithms of the tract advertised value, and the Herfindahl index. [Footnote 14: The exponentiated log-bid residuals (to which we will refer simply as bid residuals) are interpreted as estimates of the idiosyncratic component of bids, while the exponentiated fitted values are interpreted as estimates of the common component of bids; see Haile et al. (2003).] The latter is a measure of the homogeneity of the tract with respect to the timber species. This procedure is consistent with a multiplicative model of observed auction heterogeneity. [15] The distribution of bid residuals is truncated at the 5th percentile on each end, leaving 60,758 observations; see Figure 2.1.

[Figure 2.1: Distributions of the number of bidders (left) and of bid residuals (right).]

To assess the change $\Delta(u^*)$ in the expected revenue, we use the quantile version of the revenue formula in Table 2.1, which yields
$$ \Delta(u^*) := \mathrm{Rev}(u^*) - \mathrm{Rev}(0) = M\bar a A_3(u^*)\,v(u^*) - \int_0^{u^*} \left( A_2'(x) + M\bar a A_3'(x) \right) v(x)\,dx, $$
where $u^* > 0$ is the counterfactual exclusion level (the rank of the counterfactual reserve price in the distribution of valuations) and we used the fact that in our application $v(0) = r$. Since $\Delta(u^*)$ is similar to the T-type functional (2.43), its estimator is
$$ \hat\Delta_h(u^*) := \varphi(u^*)\hat v_h(u^*) - \left[ \int_0^{u^*} \bar\psi(u)\hat Q(u)\,du + \bar A(u^*)\psi(u^*)\hat Q(u^*) - \bar A(0)\psi(0)\hat Q(0) \right], \qquad (2.73) $$

[Footnote 15: The asymptotic distributions of the test statistics are not affected by the error in the residualization procedure, as long as the estimates of the common component of bids converge at a faster (in our case parametric) rate; see Haile et al.
(2003) and Athey and Haile (2007).]

[Figure 2.2: Confidence intervals and bands for the counterfactual expected revenue. Left panel: point estimate, 95% two-sided confidence interval and band, and the optimal exclusion level. Right panel: 95% one-sided confidence interval and band.]

where $\varphi(x) := M\bar a A_3(x)$, $\psi(x) := A_2'(x) + M\bar a A_3'(x)$, and $\bar\psi$ is defined in (2.49).

We use the undersmoothing bandwidth $h = 1.06\,s\,n^{-0.34}$, where $s$ is the standard deviation of the spacings of bid residuals, and evaluate $\hat\Delta_h(u^*)$ on the evenly spaced grid $\{i/n\}_{i=0}^n$. This bandwidth is slightly smaller than the Silverman rule-of-thumb bandwidth $h = 1.06\,s\,n^{-1/5}$.

To construct the confidence bands, we use the representation in Theorem 4 and Remark 5. First, 1000 realizations of the bid quantile density $\hat q_h^U(\cdot)$ are simulated, independently from the data, based on pseudo-bids from the uniform$[0,1]$ distribution. Second, for a nominal confidence level $(1-\alpha)$, the critical value $c_{n,1-\alpha}$ is computed as the $(1-\alpha)$-quantile of $\sup_{u\in[h,1-h]}(\hat q_h^U(u) - 1)$. Finally, the one-sided confidence band is computed as
$$ \left[ \hat\Delta_h(u) - M\bar a A_3(u)\,\bar A(u)\,\hat q_h(u)\,c_{n,1-\alpha},\;\; +\infty \right), \quad u \in [h, 1-h]. \qquad (2.74) $$

We test the hypothesis of nonexistence of a counterfactual (positive) reserve price that would increase the seller's expected revenue. Formally, we test $H_0$ against $H_1$, where
$$ H_0 : \sup_{u^* \in [h,1-h]} \Delta(u^*) = 0, \qquad H_1 : \sup_{u^* \in [h,1-h]} \Delta(u^*) > 0. \qquad (2.75) $$
The corresponding test statistic is the maximal (over the grid) value of the lower end-point function of the one-sided confidence band, and $H_0$ is rejected whenever this maximum is positive.

Table 2.3: Test results at the 95% confidence level.

Number of bidders      2       3       2-5     5-9     2-9
sample size            10328   12477   43387   26841   60758
bandwidth              0.01    0.009   0.006   0.007   0.006
optimal exclusion ũ*   0.274   0.305   0.293   0.311   0.276
H0                     reject  reject  reject  reject  reject
We denote by $\tilde u^*$ the point at which the maximum is attained, i.e. the optimal exclusion level. We test the hypothesis using subsamples of auctions with different numbers of bidders; see Table 2.3. We use subsamples with 2 and 3 bidders, and also with 2–5 (small auctions), 5–9 (large auctions), and 2–9 (all auctions) bidders. [Footnote 16: Typically, a researcher would pick, for the sake of simplicity, a subsample of auctions with the same number of bidders. However, our methodology allows for a random number of bidders, so we can pool auctions with different numbers of bidders together.] Under all the specifications, $H_0$ is rejected at the 95% confidence level (see Table 2.3 and Figure 2.2), meaning that the revenue gains at the optimal reserve price are statistically significant, albeit relatively small.

2.7 Practical considerations

In this section, we briefly discuss some important technical aspects of our methodology.

Choice of the grid. While it is theoretically possible to evaluate our estimators at any quantile level, choosing the evenly spaced grid $\{i/n\}_{i=2}^n$ has a massive impact on the computational complexity of the estimation procedure and its performance. Note that, with this grid, the estimate $\hat q_h(u)$ becomes a discrete convolution of the vector of spacings $\{b_{(i)} - b_{(i-1)}\}_{i=2}^n$ with a discrete filter corresponding to $K_h$. The discrete convolution is a remarkably fast and reliable procedure. Moreover, the counterfactual estimators can be well approximated by weighted cumulative sums of the vectors of spacings. Consequently, all our estimators can be thought of as combinations of elementary vector operations with sorting and convolution.

Shape restrictions. Since $v(\cdot)$ is a quantile function, one may want to impose monotonicity on $\hat v_h(\cdot)$ and the associated confidence bands. As suggested in Chernozhukov et al.
(2009, 2010), an effective way of doing so is smooth rearrangement of the estimate and the confidence bands, whose discrete counterpart is merely a sorting algorithm. We leave the analysis of such shape-restricted estimators for future work. We note that there is an emerging literature exploiting shape restrictions for auction counterfactuals, e.g., Henderson et al. (2012); Luo and Wan (2018); Pinkse and Schurter (2019); Ma et al. (2021).

Competing estimators. The main competitor to our estimator $\hat q_h(\cdot)$ of the bid quantile density is the reciprocal of the kernel estimator $\hat f_l(\cdot)$ of the bid density, as in the first step of the procedure of Guerre et al. (2000),
$$ \tilde q_l(u) = \hat f_l\!\left(b_{([un]+1)}\right)^{-1} = \left[ \frac{1}{nl} \sum_{i=1}^n K\!\left( \frac{b_{([un]+1)} - b_i}{l} \right) \right]^{-1}, \quad u \in [0,1]. $$
An insightful comparison of $\hat q_h$ and $\tilde q_l$ was carried out by Jones (1992), who showed that the variance components of the mean squared errors (MSE) of the two estimators are equal if so-called scale match-up bandwidths are used. Namely, for a fixed point $u \in (0,1)$, the MSE of $\hat q_h(u)$ is less than or equal to that of $\tilde q_l(u)$ only when $q(u)q''(u) \le 1.5\,q'(u)^2$ or, equivalently, $f(b)f''(b) \ge 1.5\,f'(b)^2$. Therefore, the reciprocal kernel density estimator $\tilde q_l$ performs better close to the center of the distribution, while the kernel quantile density estimator $\hat q_h$ is preferable in the tails.

Finally, we note that the algorithms constructing the estimates of $\tilde q_l$ and $\hat q_h$ on the grid have different computational complexities, which can be heuristically shown to be $O(n^2)$ and $O(n\log n)$, respectively. The latter follows because the convolution on which the estimator $\hat q_h(u)$ relies can be carried out in roughly $O(n\log n)$ operations using the fast Fourier transform.
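To illustrate the $O(n\log n)$ route, the sketch below computes $\hat q_h$ on the grid as an FFT-based convolution of the spacings with a kernel filter. The Epanechnikov kernel and the exact discretization are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np
from scipy.signal import fftconvolve

def q_hat_grid(bids, h):
    """Kernel quantile-density estimate on the grid {i/n}: the spacings
    b_(i) - b_(i-1) convolved with a discrete filter for K_h (Section 2.7)."""
    b = np.sort(np.asarray(bids, dtype=float))
    n = b.size
    spacings = np.diff(b)                          # n - 1 spacings
    m = int(np.ceil(h * n))                        # kernel support in grid units
    t = np.arange(-m, m + 1) / (h * n)             # grid offsets scaled by h
    filt = 0.75 * np.maximum(1.0 - t**2, 0.0) / h  # K_h on the grid (Epanechnikov)
    # q_hat(i/n) ~ sum_j K_h((i - j)/n) (b_(j+1) - b_(j)); FFT makes this O(n log n)
    return fftconvolve(spacings, filt, mode="same")
```

For equally spaced bids the spacings are constant, so the interior of the estimate should be close to the constant quantile density of the uniform distribution, which gives a quick check.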
2.8 Conclusion

In this paper, we develop a novel approach to estimation and inference on counterfactual functionals of interest (such as the expected revenue as a function of the rank of the counterfactual reserve price) in the standard nonparametric model of the first-price sealed-bid auction. We show that these counterfactuals can be written as continuous linear functionals of the quantile function of bidders' valuations, which can be recovered from the observed bids using a well-known explicit formula. We suggest natural estimators of the counterfactuals and show that their asymptotic behavior depends on their structure. In particular, we classify the counterfactuals into two types, one allowing for parametric (fast) convergence rates and standard inference, and the other exhibiting nonparametric (slow) convergence rates and a lack of uniform convergence. For each type of counterfactual, we develop simple, simulation-based algorithms for constructing pointwise confidence intervals and uniform confidence bands. We apply our results to assess the potential for revenue extraction by setting an optimal reserve price, using Phil Haile's USFS auction data.

Avenues for further research include incorporating auction heterogeneity, showing the minimax optimality of the counterfactual estimators, developing procedures for data-driven bandwidth selection, shape-restricted estimation of the valuation quantile function and associated functionals, and estimation and inference for the value density based on our value quantile function estimator.
Acknowledgements

We are grateful to Karun Adusumilli, Tim Armstrong, John Asker, Yuehao Bai, Zheng Fang, Sergio Firpo, Antonio Galvao, Andreas Hagemann, Vitalijs Jascisens, Hiroaki Kaido, Nail Kashaev, Michael Leung, Tong Li, Hyungsik Roger Moon, Hashem Pesaran, Guillaume Pouliot, Geert Ridder, Pedro Sant'Anna, Yuya Sasaki, Matthew Shum, Liyang Sun, Takuya Ura, Quang Vuong, and Kaspar Wuthrich for valuable comments.

Chapter 3
Bias Correction for Quantile Regression Models [1]

3.1 Introduction

Many interesting empirical applications of classical quantile regression (QR) (Koenker and Bassett, 1978) and instrumental variable quantile regression (IVQR) (Chernozhukov and Hansen, 2005, 2006) feature small sample sizes, which can arise from a limited number of observations, from estimating tail quantiles, or both (e.g., Chernozhukov, 2005; Elsner et al., 2008; Chernozhukov and Fernández-Val, 2011; Adrian and Brunnermeier, 2016; Adrian et al., 2019). QR and IVQR estimators are nonlinear and can thus exhibit substantial biases in small samples. In this paper, we theoretically characterize these biases and develop a feasible bias correction procedure.

To study the biases, we start by deriving a higher-order stochastic expansion of the classical QR and exact IVQR estimators. [2] Such an expansion is needed because the higher-order terms contribute nonzero biases while the first-order term does not. We derive explicit expressions and uniform (in the quantile level) rates for the components of the expansion, building on the empirical process arguments of Ota et al. (2019).

[Footnote 1: Joint work with Bulat Gafarov and Kaspar Wüthrich.]
[Footnote 2: We define exact IVQR estimators as estimators that exactly minimize a norm of the sample moment conditions. Such estimators can be obtained using mixed-integer programming (MIP) methods (e.g., Chen and Lee, 2018; Zhu, 2019). See Appendix C.4.]
This expansion can be thought of as a refined Bahadur–Kiefer (BK) representation of the estimator that decomposes the nonlinear component into terms up to the order $O_p(n^{-1})$ and an $O_p\!\left(n^{-5/4}\sqrt{\log n}\right)$ remainder. [3]

Using the stochastic expansion, we study the bias of QR and exact IVQR estimators. We derive a bias formula based on the leading terms up to order $O_p(n^{-1})$ in the expansion, which we refer to as the second-order (asymptotic) bias. [4] The second-order bias formula provides an approximation of the actual bias that yields a feasible correction. The results explicitly account for the $O(n^{-1})$ (up to a logarithmic factor) bias due to nonzero sample moments at the estimator. We emphasize that our proof strategy differs from the generalized function heuristic used in the existing literature (Phillips, 1991; Lee et al., 2017, 2018), which does not account for all the leading terms up to $O(n^{-1})$ in the bias. [5] The missing terms can be important bias contributors in practice, as we document in the empirical application in Section 3.5.

A feasible bias correction procedure then follows from the second-order bias formula. We propose finite-difference estimators of all the components in the formula. These estimators admit higher-order expansions that allow us to select bandwidth rates. [6] We show that the resulting bias-corrected estimator has zero second-order bias.

We evaluate the performance of our bias correction procedure in a Monte Carlo simulation study. The simulations show that the theoretical (infeasible) bias formula describes well the second-order bias of classical QR and exact IVQR. We find that the proposed feasible bias correction can effectively reduce the bias in many cases, even in samples as small as $n = 50$.

[Footnote 3: In Appendix C.5, we also derive a uniform BK representation for generic IVQR estimators after a feasible 1-step Newton correction.]
[Footnote 4: This approach of focusing on the moments of the leading terms in the stochastic expansions is standard in the literature (e.g., Nagar, 1959; Kaplan and Sun, 2017).]
[Footnote 5: See Section 3.3.2 and Appendix C.3 for further discussion and examples.]
[Footnote 6: In particular, our finite-difference estimator for the Jacobian coincides with Powell (1986)'s classical estimator, and the bandwidth rate we derive coincides with the AMSE-optimal bandwidth choice in Kato (2012).]

However, strong asymmetry and, especially, heavy tails in the conditional distribution of the outcome may reduce the effectiveness of the bias correction. The impact of the feasible bias correction on the root MSE (RMSE) is rather small and ambiguous.

We illustrate the bias correction approach by revisiting the relationship between food expenditure and income based on the original Engel (1857) data (e.g., Koenker and Bassett, 1982; Koenker and Hallock, 2001). Our results highlight the importance of bias correction in empirical applications with small sample sizes. Specifically, we find that the second-order bias of classical QR is non-negligible, especially at the upper tail, leading QR to underestimate the effect heterogeneity across quantiles.

Roadmap. The remainder of the paper is organized as follows. Section 3.2 describes the model and the estimators. Section 3.3 provides our main theoretical results. Section 3.4 presents the Monte Carlo simulation results. Section 3.5 contains the empirical application. Section 3.6 concludes. All the proofs and some additional details are given in the Appendix.

3.2 Model and estimators

Consider a setting with a continuous outcome variable $Y$, a $(k \times 1)$ vector of covariates $W$, and a $(k \times 1)$ vector of instruments $Z$. We assume throughout that $k$ is fixed. Every observation $(Y_i, W_i, Z_i)$, $i = 1, \ldots, n$, is jointly drawn from a distribution $P$. We assume that $(Y_i, W_i, Z_i)$ is i.i.d., and we will sometimes suppress the index $i$ to lighten the notation.
The parameter of interest $\beta_{\tau 0} \in \mathbb{R}^k$ is defined as a solution to the following unconditional quantile moment restrictions,

$$E[(1\{Y \le W'\beta_{\tau 0}\} - \tau) Z] = 0, \quad \tau \in (0,1). \quad (3.1)$$

We consider two cases: (i) classical QR, where $Z = W$ (Koenker and Bassett, 1978), and (ii) linear IVQR, where $Z \ne W$ in general (Chernozhukov and Hansen, 2006, 2008).

The classical QR estimator of $\beta_{\tau 0}$ is a solution to the following convex minimization problem,

$$\hat\beta_{\tau,\mathrm{QR}} \in \arg\min_{\beta \in \Theta} \mathbb{E}_n \rho_\tau(Y - W'\beta), \quad (3.2)$$

where $\rho_\tau(u) = u(\tau - 1\{u < 0\})$ is the check function (Koenker, 2005) and $\mathbb{E}_n$ denotes the sample average, i.e., the expectation with respect to the empirical measure. For IVQR, we consider estimators that exactly minimize the $p$-norm of the sample moments,

$$\hat\beta_{\tau,\ell_p} \in \arg\min_{\beta \in \Theta} \|\hat g_\tau(\beta)\|_p, \quad (3.3)$$

where $p \in [1,\infty]$ and $\hat g_\tau(\beta) := \mathbb{E}_n (1\{Y \le W'\beta\} - \tau) Z$. This class of exact IVQR estimators includes GMM, which corresponds to $p = 2$ as in Chen and Lee (2018) for just-identified models, and the estimator proposed by Zhu (2019), which corresponds to $p = \infty$. The cases $p = 1$ and $p = \infty$ have computationally convenient mixed integer linear programming (MILP) representations, while the MILP formulation for $p = 2$ has many more decision variables. In our Monte Carlo simulations, we use $p = 1$ for computational convenience (see Appendix C.4).

We use the notation $g_\tau(\beta) := E(1\{Y \le W'\beta\} - \tau) Z$ for the unconditional moment restrictions as a function of $\beta \in \Theta$. We maintain the following standard identification assumptions.

Assumption 4 (Identification).
1. $\beta_{\tau 0}$ is the unique solution to $g_\tau(\beta) = 0$ over a compact set $\Theta \subset \mathbb{R}^k$, and $\beta_{\tau 0}$ is in the interior of $\Theta$ for all $\tau \in (0,1)$.
2. The Jacobian $G(\beta_{\tau 0}) := \partial_\beta g_\tau(\beta_{\tau 0})$ has full rank for all $\tau \in (0,1)$.

As noted by Chernozhukov and Hansen (2006, p. 502), compactness of the parameter space "is not restrictive in micro-econometric applications." Throughout the paper, we use the short notation $G$ for $G(\beta_{\tau 0})$ whenever it does not lead to ambiguity.

We impose the following smoothness assumptions on the conditional density and its derivatives.
Such assumptions are standard in the literature on higher-order properties of quantile estimators (e.g., Ota et al., 2019).

Assumption 5 (Conditional density). The conditional density of $Y_i$ given $(W_i, Z_i)$, $f_Y(y|w,z)$, exists, is a.s. three times continuously differentiable on $\mathrm{supp}(Y)$, and there exists a constant $\bar f$ such that $|f_Y^{(r)}(y|w,z)| \le \bar f$ for all $(y,w,z) \in \mathrm{supp}(Y) \times \mathrm{supp}(W) \times \mathrm{supp}(Z)$, where $r = 0, 1$ and $f_Y^{(r)}(\cdot|w,z)$ is the $r$-th derivative of $f_Y(\cdot|w,z)$.

In our theoretical analysis of the bias, we will often work with a related object, the conditional density $f_\varepsilon(e|W,Z) := f_Y(e + W'\beta_{\tau 0}|W,Z)$ of the quantile residual $\varepsilon := Y - W'\beta_{\tau 0}$. Finally, we assume that the regressors and the instruments have bounded support.

Assumption 6 (Regressors and instruments). There exists a constant $\bar m < \infty$ such that $\|Z\| < \bar m$ and $\|W\| < \bar m$ a.s.

Assumption 6 simplifies the exposition. It could be relaxed following arguments similar to those in Ota et al. (2019).

3.3 Asymptotic theory for bias correction

To derive a bias correction procedure, we follow the approach of Nagar (1959) and focus on the bias of the leading terms in an asymptotic stochastic expansion of the estimator.

3.3.1 Stochastic expansion of quantile regression estimators

The classical first-order asymptotic theory for quantile regression estimators (e.g., Koenker and Bassett, 1978; Angrist et al., 2006; Chernozhukov and Hansen, 2006; Kaplan and Sun, 2017; Kaido and Wüthrich, 2021) is based on the following leading term,

$$\hat\beta_\tau^* := \beta_{\tau 0} - G^{-1}(\beta_{\tau 0})\, \mathbb{E}_n Z (1\{Y \le W'\beta_{\tau 0}\} - \tau). \quad (3.4)$$

For correctly specified models, $\hat\beta_\tau^*$ is an infeasible unbiased estimator of $\beta_{\tau 0}$. However, because feasible quantile estimators are nonlinear, they generally have a nonzero higher-order bias. The following theorem provides a characterization of the terms in a stochastic expansion of $\hat\beta_\tau$ up to order $O_p(n^{-1})$ (ignoring logarithmic terms). To state the result, we introduce some additional notation.
For $\beta \in \Theta$, define the auxiliary functions $\bar g(\beta) := E\,1\{Y \le W'\beta\} Z$, $B_n(\beta) := \sqrt{n}\,(\mathbb{E}_n 1\{Y \le W'\beta\} Z - \bar g(\beta))$, and $\Delta B_n(\beta) := B_n(\beta) - B_n(\beta_{\tau 0})$.⁷ Also, denote the Hessian of the $j$-th moment function $g_{\tau,j}$ by $\partial_\beta G_j(\beta_{\tau 0}) := \partial_{\beta\beta'} g_{\tau,j}(\beta_{\tau 0})$. Finally, for any $x \in \mathbb{R}^k$, denote by $x' \partial_\beta G(\beta_{\tau 0}) x$ the vector with components $x' \partial_\beta G_j(\beta_{\tau 0}) x$, $j = 1, \dots, k$.

Theorem 6. Suppose that Assumptions 4–6 hold. Consider $\hat\beta_\tau = \hat\beta_{\tau,\ell_p}$ obtained from program (3.3) for some $p \in [1,\infty]$ or $\hat\beta_\tau = \hat\beta_{\tau,\mathrm{QR}}$. Then

$$\hat\beta_\tau = \hat\beta_\tau^* + G^{-1}\left[\hat g_\tau(\hat\beta_\tau) - \frac{\Delta B_n(\hat\beta_\tau)}{\sqrt{n}} - \frac{1}{2}(\hat\beta_\tau - \beta_{\tau 0})' \partial_\beta G(\beta_{\tau 0})(\hat\beta_\tau - \beta_{\tau 0})\right] + R_{n,\tau},$$

where

$$\sup_{\tau \in [\varepsilon, 1-\varepsilon]} \|\hat g_\tau(\hat\beta_\tau)\| = \begin{cases} O_p\!\left(\tfrac{1}{n}\right), & \text{if } \hat\beta_\tau = \hat\beta_{\tau,\mathrm{QR}}, \\[4pt] O_p\!\left(\tfrac{\log n}{n}\right), & \text{if } \hat\beta_\tau = \hat\beta_{\tau,\ell_p}, \end{cases} \quad (3.5)$$

$$\sup_{\tau \in [\varepsilon, 1-\varepsilon]} \|\Delta B_n(\hat\beta_\tau)\| = O_p\!\left(\frac{\sqrt{\log n}}{n^{1/4}}\right), \quad (3.6)$$

$$\sup_{\tau \in [\varepsilon, 1-\varepsilon]} \|R_{n,\tau}\| = O_p\!\left(\frac{\sqrt{\log n}}{n^{5/4}}\right).$$

The proof of Theorem 6 builds on the empirical process arguments in Lemma 3 of Ota et al. (2019) and uses the maximal inequality in Corollary 5.1 of Chernozhukov et al. (2014). The rates in Theorem 6 are uniform in the quantile level $\tau$. Uniformity is important in theory and practice because QR and IVQR methods are particularly powerful when used to analyze the entire quantile process.

⁷ Processes $B_n(\cdot)$ and $\Delta B_n(\cdot)$ take values in the space $\ell^\infty(\Theta)$ of bounded functions on $\Theta$.

The expansion in Theorem 6 can be thought of as a refined Bahadur-Kiefer (BK) expansion. Different from standard BK expansions, we do not bundle together all the higher-order terms (as opposed to, for example, Zhou and Portnoy, 1996; Ota et al., 2019). Notice that the dominant nonlinear term in the BK expansion, $n^{-1/2} G^{-1} \Delta B_n(\hat\beta_\tau)$, has order $O_p(n^{-3/4}\sqrt{\log n})$ (Equation (3.6)). Theorem 3 in Knight (2002) shows that for classical QR with discrete covariates, this term converges in distribution to a zero mean random process. Therefore, we explicitly extract the higher-order terms up to order $O_p(n^{-1})$ (ignoring logarithmic terms) from the BK remainder. As we will show in the following sections, these higher-order terms admit feasible counterparts.
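For intuition, the leading term (3.4) can be checked numerically in the simplest scalar case $W = Z = 1$, $Y \sim \mathrm{Uniform}(0,1)$, where $\beta_{\tau 0} = \tau$ and $G = f_Y(\tau) = 1$, so the linearization reduces to the classical Bahadur approximation. The following sketch is ours (not from the paper); all variable names are illustrative.

```python
import random

random.seed(1)
n, tau = 10000, 0.3
y = sorted(random.random() for _ in range(n))

k = round(n * tau)
beta_hat = y[k - 1]                      # sample quantile Y_(n*tau)
Fn = sum(yi <= tau for yi in y) / n      # empirical CDF at beta_0 = tau
beta_star = tau - (Fn - tau)             # leading term (3.4) with G = 1

gap = abs(beta_hat - beta_star)          # Bahadur-Kiefer remainder, O_p(n^{-3/4})
```

With $n = 10{,}000$ the gap is on the order of $n^{-3/4} \approx 10^{-3}$, an order of magnitude smaller than the $O_p(n^{-1/2})$ sampling error of the estimator itself.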
Remark 6 (Alternative approach for deriving stochastic expansions). Portnoy (2012) proposed an alternative approach for deriving a stochastic expansion of classical QR estimators. This approach yields bounds on the precision of a nonlinear Gaussian approximation of order $O_p(n^{-1}\log^{5/2} n)$. As we will show, the expansion in Theorem 6 yields a bias formula for both QR and IVQR estimators that admits a feasible implementation. The results in Portnoy (2012) are specific to classical QR, and it is not clear to us whether these results can be used for bias correction using a Nagar-style approach.

Remark 7 (General IVQR estimators). While we focus on exact IVQR estimators in the main text, the results in Theorem 6 can be used to obtain a uniform BK expansion for general 1-step corrected IVQR estimators. See Appendix C.5 for details.

3.3.2 Bias formula for exact estimators

Following common practice (e.g., Nagar, 1959; Kaplan and Sun, 2017), for a generic estimator $\hat\beta_\tau$, we define the second-order bias $\mathcal{B}(\hat\beta_\tau)$ as the bias of the leading terms in the stochastic expansion of $\hat\beta_\tau$ up to the order $O_p(n^{-1})$. This second-order bias can be interpreted as an approximation of the actual bias that works with arbitrarily high probability in large samples.

Before stating the result, we observe that under our Assumption 5, the moment condition (3.1) is equivalent to

$$E[(1\{Y \ge W'\beta_{\tau 0}\} - (1-\tau)) Z] = 0.$$

Thus, we can also characterize the QR and IVQR estimators using the moment function $\tilde g_\tau(\beta) := E[(1\{Y \ge W'\beta\} - (1-\tau)) Z]$ with sample analog $\hat{\tilde g}_\tau(\beta)$. The following theorem characterizes the second-order bias in terms of $\hat g_\tau(\hat\beta_\tau)$ and $\hat{\tilde g}_\tau(\hat\beta_\tau)$. As in Section 3.3.1, we define $\partial_\beta G_j(\beta_{\tau 0}) := \partial_{\beta\beta'} g_{\tau,j}(\beta_{\tau 0})$.

Theorem 7. Suppose that Assumptions 4–6 hold. Consider $\hat\beta_\tau = \hat\beta_{\tau,\ell_p}$ obtained from program (3.3) for some $p \in [1,\infty]$ or $\hat\beta_\tau = \hat\beta_{\tau,\mathrm{QR}}$.
Then the second-order bias is

$$\mathcal{B}(\hat\beta_\tau) = G^{-1}\left(\frac{1}{2} E\left[\hat g_\tau(\hat\beta_\tau) - \hat{\tilde g}_\tau(\hat\beta_\tau)\right] - \frac{\kappa}{n} - \frac{1}{2n}\, Q' \mathrm{vec}(\Sigma)\right), \quad (3.7)$$

where

$$\kappa := \frac{1}{2} E\left[f_\varepsilon(0|W,Z)\, Z W' G^{-1} Z\right], \quad \Sigma := \mathrm{Var}\left[Z(1\{Y \le W'\beta_{\tau 0}\} - \tau)\right],$$

and $Q$ is a matrix with columns

$$Q_j := \mathrm{vec}\left((G^{-1})'\, \partial_\beta G_j(\beta_{\tau 0})\, G^{-1}\right), \quad j = 1, \dots, k.$$

The second-order bias formula (3.7) has three components. The first term, $G^{-1} E[\hat g_\tau(\hat\beta_\tau) - \hat{\tilde g}_\tau(\hat\beta_\tau)]/2$, captures the bias from the sample moments not being zero at the estimator. This term is not equal to zero in general. By Theorem 6 and Assumption 6, this term has bias of order at most $O(n^{-1}\log n)$. The second component, $-n^{-1} G^{-1}\kappa$, appears because of the discontinuity in the sample moment functions. The term reflects the dependence between the sample moments and the linear influence of a single observation on $\hat\beta_\tau$. The last component, $-(2n)^{-1} G^{-1} Q'\mathrm{vec}(\Sigma)$, stems from the non-uniformity of the conditional distribution of $Y$ given $(W,Z)$. Similar terms are typically present in most nonlinear estimators with nonzero Hessian of the score function (see, for example, Rilstone et al., 1996).

To illustrate the approximate bias formula, consider an order statistic of $Y \sim \mathrm{Uniform}(0,1)$ (corresponding to our framework with $Z = W = 1$), for which an exact bias formula is available (e.g., Ahsanullah et al., 2013). We show in Appendix C.3 that the precision of the second-order bias formula in this case is $O(n^{-2})$, which is smaller than the order of the remainder term in the stochastic expansion of Theorem 6. Figure 3.1 illustrates the precision of the asymptotic formula by comparing it to the actual bias of the order statistic.

Figure 3.1: Exact (circles) and second-order (crosses) biases, scaled by $n$, as functions of the quantile level $\tau$ for $\hat\beta_\tau = Y_{(\lfloor n\tau \rfloor)}$, where $Y \sim \mathrm{Uniform}(0,1)$, $n = 10$.

It is interesting to compare our results to the higher-order bias analysis of non-smooth estimators based on the generalized functions heuristic (e.g., Phillips, 1991). In recent work, Lee et al.
(2017, 2018) derived a second-order bias formula for classical QR and IVQR under the assumption that the sample moments are zero at the estimator, so that the first term of the bias formula vanishes. We show in Appendix C.3 that this term is non-negligible even in simple cases (see also Figure 3.5 in Section 3.5).

3.3.3 Feasible bias correction

The bias formula suggests the following feasible bias-corrected estimator,

$$\hat\beta_\tau^{bc} = \hat\beta_\tau - \frac{1}{2}\hat G^{-1}\left[\hat g_\tau(\hat\beta_\tau) - \hat{\tilde g}_\tau(\hat\beta_\tau)\right] + \frac{1}{n}\hat G^{-1}\left(\hat\kappa + \frac{1}{2}\hat Q'\mathrm{vec}(\hat\Sigma)\right),$$

where $\hat G$, $\hat\kappa$, $\hat Q$, and $\hat\Sigma$ are estimators of $G$, $\kappa$, $Q$, and $\Sigma$, respectively, satisfying the following consistency requirement.

Assumption 7 (Consistency of component estimators). The estimators $\hat G$, $\hat\kappa$, $\hat Q$, and $\hat\Sigma$ are consistent for $G$, $\kappa$, $Q$, and $\Sigma$, respectively. Moreover, $\hat G - G = o_p(1/\log n)$.

In Section 3.3.4, we propose finite-difference estimators for which Assumption 7 holds under Assumptions 4–6. Assumption 7 could also be verified for other nonparametric estimators of the bias components. The next theorem shows that the second-order bias of the bias-corrected estimator is zero.

Theorem 8. Suppose that Assumptions 4–7 hold. Consider $\hat\beta_\tau = \hat\beta_{\tau,\ell_p}$ obtained from program (3.3) for some $p \in [1,\infty]$ or $\hat\beta_\tau = \hat\beta_{\tau,\mathrm{QR}}$. Then the bias correction eliminates the second-order bias, $\mathcal{B}(\hat\beta_\tau^{bc}) = 0$.

Note that the requirement $\hat G - G = o_p(1/\log n)$ in Assumption 7 is necessary to ensure that the contribution of the product of the sample moments, which are $O_p(n^{-1}\log n)$ by Theorem 6, and the estimation error in $G^{-1}$ can be omitted when computing the second-order bias. It is only required for IVQR estimators. For classical QR estimators, the sample moments are of order $O_p(n^{-1})$, so that consistency of $\hat G$ at any rate of convergence suffices for Theorem 8.

3.3.4 Finite difference estimators of bias components

In order to implement the bias correction, we need estimators of $G$, $\kappa$, $Q$, and $\Sigma$ that satisfy Assumption 7.
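Before turning to the component estimators, note that the Uniform(0,1) order-statistic benchmark of Section 3.3.2 offers a convenient sanity check for any implementation, because its exact bias is available in closed form: $E[Y_{(k)}] = k/(n+1)$ (Ahsanullah et al., 2013). A minimal sketch (ours) of the exact scaled bias plotted in Figure 3.1:

```python
from math import floor

def exact_scaled_bias(tau, n):
    # E[Y_(k)] = k/(n+1) for Uniform(0,1) order statistics, so the
    # exact bias of Y_(floor(n*tau)) is k/(n+1) - tau; scale by n.
    k = floor(n * tau)
    return n * (k / (n + 1) - tau)

n = 10
scaled_biases = {t: exact_scaled_bias(t, n) for t in (0.25, 0.5, 0.75)}
```

Any implementation of the second-order bias formula should track these values up to the $O(n^{-2})$ precision discussed above.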
The variance matrix $\Sigma$ can be estimated using the analogy principle,

$$\hat\Sigma := \mathbb{E}_n\left[Z(1\{Y \le W'\hat\beta_\tau\} - \tau) - \mathbb{E}_n Z(1\{Y \le W'\hat\beta_\tau\} - \tau)\right]^{\otimes 2},$$

where $a^{\otimes 2} := a a'$. All other bias components take the form of derivatives. Therefore, we leverage our theoretical results on the properties of the sample moments to develop a unified finite-difference framework for estimating these components.

Under Assumptions 4 and 5, the Jacobian is $G = E f_\varepsilon(0|W,Z) Z W'$, and the Hessian consists of gradients of the components of $G$, i.e.,

$$\partial_\beta G_{i,j}(\beta_{\tau 0}) = E f_\varepsilon^{(1)}(0|W,Z)\, Z_i W_j W, \quad i,j = 1,\dots,k.$$

This suggests the following analog estimators. Denote by $e_i$ the $i$-th unit vector in $\mathbb{R}^k$, $i = 1,\dots,k$. The $(i,j)$-th component of $G$ can be estimated using Powell (1986)'s estimator

$$\hat G_{i,j} = \mathbb{E}_n\left[\frac{1\{Y \le W'\hat\beta_\tau + h_{1,n}\} - 1\{Y \le W'\hat\beta_\tau - h_{1,n}\}}{2 h_{1,n}}\, Z_i W_j\right], \quad (3.8)$$

where $h_{1,n} \to 0$ is a bandwidth. The derivative of the $(i,j)$-th component of $G$ in the direction $e_\ell$ (i.e., the second partial derivative of $g_\tau$) can be estimated as the symmetric first difference of (3.8),

$$\widehat{(\partial_\beta G_{i,j})_\ell} = \mathbb{E}_n\left[\frac{1\{Y \le W'\hat\beta_\tau + h_{2,n}\} - 2 \cdot 1\{Y \le W'\hat\beta_\tau\} + 1\{Y \le W'\hat\beta_\tau - h_{2,n}\}}{h_{2,n}^2}\, Z_i W_j W_\ell\right],$$

where $h_{2,n} \to 0$ is a (potentially different) bandwidth. For $\kappa$, the finite difference sample analog is

$$\hat\kappa = \frac{1}{2}\,\mathbb{E}_n\left[\frac{1\{Y \le W'\hat\beta_\tau + h_{3,n}\} - 1\{Y \le W'\hat\beta_\tau - h_{3,n}\}}{2 h_{3,n}}\, Z W' \hat G^{-1} Z\right],$$

where $h_{3,n} \to 0$ is a bandwidth. Finally, $Q$ can be estimated by the sample analog matrix $\hat Q$ with columns

$$\hat Q_j := \mathrm{vec}\left[(\hat G^{-1})'\, \widehat{\partial_\beta G_j}\, \hat G^{-1}\right], \quad j = 1,\dots,k,$$

where $\widehat{\partial_\beta G_j}$ is the matrix with elements $\widehat{(\partial_\beta G_{i,j})_\ell}$ for $i,\ell = 1,\dots,k$.

The next lemma establishes the consistency of the estimators of the bias components. It implies that these estimators satisfy the high-level conditions in Assumption 7. Moreover, it provides nearly remainder-optimal bandwidth rates, i.e., the rates that yield the fastest convergence rates of the stochastic remainder terms of the corresponding stochastic expansions (up to logarithmic terms).

Lemma 1. Suppose that Assumptions 4–6 hold.
Then the nearly remainder-optimal bandwidth rates are $h_{1,n} \propto n^{-1/5}$, $h_{2,n} \propto n^{-1/7}$, $h_{3,n} \propto n^{-2/15}$, and, under these bandwidth rates,

$$\hat G = G + O_p\!\left(\frac{\sqrt{\log n}}{n^{2/5}}\right), \quad \widehat{(\partial_\beta G_{i,j})_\ell} = (\partial_\beta G_{i,j})_\ell + O_p\!\left(\frac{\sqrt{\log n}}{n^{2/7}}\right),$$

$$\hat Q_j = Q_j + O_p\!\left(\frac{\sqrt{\log n}}{n^{2/7}}\right), \quad \hat\kappa = \kappa + O_p\!\left(\frac{\sqrt{\log n}}{n^{4/15}}\right), \quad \hat\Sigma = \Sigma + O_p\!\left(\frac{1}{\sqrt{n}}\right).$$

Moreover, the convergence rate for $\hat G$ is uniform in $\tau \in [\varepsilon, 1-\varepsilon]$.

Proposition 1 in Kato (2012) shows that the remainder rate for $\hat G$ in Lemma 1, $h_{1,n} \propto n^{-1/5}$, is the AMSE-optimal rate.⁸ To implement the finite difference estimators in practice, one needs to choose constants in addition to the bandwidth rates. We discuss the choice of these constants in Sections 3.4 and 3.5.

⁸ We conjecture that analogous AMSE-optimality results could be established for the estimators $\widehat{(\partial_\beta G_{i,j})_\ell}$ and $\hat Q_j$, but leave this extension for future work.

3.4 Simulation evidence

In this section, we evaluate the performance of our feasible bias correction procedure in a Monte Carlo simulation study. We consider data-generating processes (DGPs) inspired by the simulations in Andrews and Mikusheva (2016). The outcome is generated according to the following location-scale model,

$$Y_i = W_i + (0.5 + W_i) U_i, \quad i = 1, \dots, n,$$

where $W_i = \Phi(\tilde W_i)$, $Z_i = \Phi(\tilde Z_i)$, $U_i = F_U^{-1}(\Phi(\tilde U_i))$, $(\tilde W_i, \tilde Z_i, \tilde U_i) \sim N(0, \tilde\Sigma)$, $\tilde\Sigma_{11} = \tilde\Sigma_{22} = \tilde\Sigma_{33} = 1$, $\tilde\Sigma_{23} = 0$, and $\Phi$ is the standard normal CDF. Hence, in all the designs, both the regressors and the instruments are Uniform[0,1]. We consider four DGPs that differ with respect to the error distribution $F_U$ and whether or not $W$ is exogenous.

DGP1 (Uniform, exogenous): $F_U(u) = \int_{-\infty}^u 1\{t \in [0,1]\}\,dt$; $\tilde\Sigma_{12} = 1$, $\tilde\Sigma_{13} = 0$.
DGP2 (Triangular, exogenous): $F_U(u) = \int_{-\infty}^u 2t\,1\{t \in [0,1]\}\,dt$; $\tilde\Sigma_{12} = 1$, $\tilde\Sigma_{13} = 0$.
DGP3 (Cauchy, exogenous): $F_U(u) = \int_{-\infty}^u \frac{4}{\pi(1+(4t)^2)}\,dt$; $\tilde\Sigma_{12} = 1$, $\tilde\Sigma_{13} = 0$.
DGP4 (Uniform, endogenous): $F_U(u) = \int_{-\infty}^u 1\{t \in [0,1]\}\,dt$; $\tilde\Sigma_{12} = 0.75$, $\tilde\Sigma_{13} = 0.25$.

An important practical issue is the choice of the bandwidths $h_{1,n}$, $h_{2,n}$, and $h_{3,n}$.
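To make the role of the bandwidth concrete, the following sketch (ours, with illustrative names) applies the Powell-type estimator (3.8) with the rate $h_{1,n} = n^{-1/5}$ from Lemma 1 in the scalar case $W = Z = 1$, $Y \sim \mathrm{Uniform}(0,1)$, where the true Jacobian is $G = f_Y(\beta_{\tau 0}) = 1$:

```python
import random

def powell_G(y, beta_hat, h):
    # Symmetric finite difference of the empirical CDF around the
    # estimator: specializes estimator (3.8) to W = Z = 1.
    n = len(y)
    upper = sum(yi <= beta_hat + h for yi in y)
    lower = sum(yi <= beta_hat - h for yi in y)
    return (upper - lower) / (2 * h * n)

random.seed(2)
n, tau = 5000, 0.5
y = sorted(random.random() for _ in range(n))
beta_hat = y[round(n * tau) - 1]   # sample median
h1 = n ** (-1 / 5)                 # remainder-optimal rate, A_G = 1
G_hat = powell_G(y, beta_hat, h1)  # close to the true value G = 1
```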
We choose the nearly remainder-optimal rates for the bandwidths from Lemma 1: $h_{1,n} = A_G n^{-1/5}$, $h_{2,n} = A_Q n^{-1/7}$, $h_{3,n} = A_\kappa n^{-2/15}$, where $(A_G, A_Q, A_\kappa)$ are constants that are independent of the sample size. We report results for both our baseline choice $(A_G, A_Q, A_\kappa) = (1,1,1)$ and the choice of $(A_G, A_Q, A_\kappa)$ that minimizes the $\ell_2$-distance between the residual biases after infeasible and feasible correction for the intercept and the slope parameter over the grid

$$\mathcal{A} := \{(1,1,1),\, (1.5,1,1),\, (0.5,1,1),\, (1,1.5,1),\, (1,0.5,1),\, (1,1,1.5),\, (1,1,0.5)\}. \quad (3.9)$$

Figure C.1 in Appendix C.6 shows how the results change if we increase or decrease each element of $(A_G, A_Q, A_\kappa)$, one at a time.

We focus on the performance of bias correction for $\tau \in \{0.25, 0.5, 0.75\}$. To evaluate the precision of the Monte Carlo integration, we compute the Monte Carlo standard error (MCSE). We report $\Phi^{-1}(1 - 0.05/12) \cdot \mathrm{MCSE}$ to account for the joint testing of six hypotheses (two hypotheses per each of the three levels of $\tau$). For comparison, we also display the infeasible bias correction based on the true $G$, $\kappa$, $Q$, and $\Sigma$.

Figure 3.2 shows the results for the exogenous DGP1 with $n \in \{50, 100, 200\}$ based on classical QR of $Y$ on $W$, implemented via linear programming (see Appendix C.4). Notice that the scaled bias of the QR estimator, $n(E\hat\beta_{\tau,\mathrm{QR}} - \beta_{\tau 0})$ (blue dots), has the same range across the different sample sizes. This illustrates that we are estimating $O(n^{-1})$ terms in the asymptotic bias expansion, which should be approximately constant after scaling by $n$. One can see that the infeasible bias correction (gold dashed lines) reduces the bias at most quantile levels. The correction does not result in zero bias due to the finite precision of the simulation (the sample moment term does not have an explicit bias formula) and the presence of the higher-order bias terms.
Notice that the MCSE grows with the sample size for a fixed number of simulations due to the rescaling, which helps explain the seemingly better performance for $n = 50$. We find that the feasible bias correction (gold squares and crosses) typically reduces the bias at $\tau \in \{0.25, 0.75\}$, where the bias of classical QR is largest, and can get very close to the infeasible bias correction. At $\tau = 0.5$, the bias of classical QR is small, so that the feasible bias correction has a negligible impact. The feasible bias correction with the baseline bandwidth choice $(1,1,1)$ results in comparable deviations from the infeasible one across the different sample sizes. This is consistent with Lemma 1 since, otherwise, different sample sizes would require different constants $A_G$, $A_Q$, $A_\kappa$.

In Figure 3.3, we study the role of the error distribution $F_U$ and endogeneity. The infeasible bias correction reduces the bias across all the DGPs. Panels (a) and (b) document two cases that present challenges for the feasible bias correction procedure in small samples: strong asymmetry in DGP2, which may affect the performance at the median, and, especially, heavy tails in DGP3, which may affect the performance at tail quantiles. Panel (c) demonstrates that the bias correction performs well when combined with IVQR implemented using the MILP formulation in Appendix C.4. While the scaled second-order bias of IVQR can be substantial, especially for the slope parameter, the feasible bias correction effectively removes that bias across all quantile levels considered.

Finally, we investigate the impact of bias correction on the RMSE of the estimators in Figure 3.4. We report the results for QR based on the exogenous DGP1 and IVQR based on the endogenous DGP4 with $n = 100$. Overall, the impact of bias correction on the RMSE is small. While it slightly increases the RMSE for DGP1, it sometimes decreases the RMSE for DGP4.
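As a check on the simulation designs, the location-scale model implies conditional quantiles that are linear in $W$: since $0.5 + W > 0$, we have $Q_Y(\tau\,|\,W) = W + (0.5 + W)F_U^{-1}(\tau)$, so the true intercept and slope are $0.5\,F_U^{-1}(\tau)$ and $1 + F_U^{-1}(\tau)$, respectively. A small sketch (ours) for DGP1, where $F_U^{-1}(\tau) = \tau$:

```python
def true_coefficients(tau, F_U_inv):
    # Y = W + (0.5 + W) U with (0.5 + W) > 0 implies
    # Q_Y(tau|W) = 0.5*q + (1 + q)*W, where q = F_U^{-1}(tau).
    q = F_U_inv(tau)
    return 0.5 * q, 1.0 + q

# DGP1: U ~ Uniform[0,1], so F_U^{-1}(tau) = tau
intercept, slope = true_coefficients(0.75, lambda t: t)
```

These are the targets $\beta_{\tau 0}$ against which the simulated biases are computed.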
Figure 3.2: Bias (multiplied by $n$) before and after correction for DGP1, different sample sizes. Panels (a)–(c).

Notes: The panels display the bias (multiplied by $n$) of the intercept and the slope for classical QR without bias correction (blue dots), QR with the best feasible bias correction (gold squares), QR with the baseline feasible bias correction (gold crosses), and QR with infeasible bias correction (gold dashed line) for DGP1 with $n \in \{50, 100, 200\}$. The error bands (gold bars) correspond to $\Phi^{-1}(1-0.05/12) \cdot \mathrm{MCSE}$ to account for the joint testing of 6 hypotheses. (Notice that the scaled MCSE of the estimator grows with $n$.) All results are based on 40,000 simulation repetitions.

Figure 3.3: Bias (multiplied by $n$) before and after correction for DGP2–DGP4. Panels (a)–(c).

Notes: The panels display the bias (multiplied by $n$) of the intercept and the slope for classical QR without bias correction (blue dots), QR with the best feasible bias correction (gold squares), QR with the baseline feasible bias correction (gold crosses), and QR with infeasible bias correction (gold dashed line) for DGP2–DGP4 with $n = 100$. We use classical QR for DGP2 and DGP3 and exact IVQR (implemented via the MILP formulation in Appendix C.4) for DGP4. The error bands (gold bars) correspond to $\Phi^{-1}(1-0.05/12) \cdot \mathrm{MCSE}$ to account for the joint testing of 6 hypotheses. The DGP2 and DGP3 results are based on 40,000 simulation repetitions; the DGP4 results are based on 10,000 repetitions. For DGP4, the infeasible bias correction is based on the feasible formula applied to a simulated sample of 10,000,000 observations.

Figure 3.4: RMSE comparison of raw and bias-corrected estimators. Panels (a)–(b).

Notes: Panel (a) compares the RMSE for classical QR without (blue) and with (gold) bias correction. Panel (b) compares the RMSE for exact IVQR (implemented via the MILP formulation in Appendix C.4) without (blue) and with (gold) bias correction.
The results for DGP1 are based on 40,000 simulation repetitions; the results for DGP4 are based on 10,000 repetitions.

3.5 Empirical application

The second-order bias matters most in applications with small sample sizes. We therefore illustrate our bias correction approach using the classical dataset of Engel (1857), analyzed by Koenker and Bassett (1982) and Koenker and Hallock (2001), among others. The data contain information on annual income and food expenditure (in Belgian francs) for $n = 235$ Belgian working-class households and are obtained from the R package quantreg (Koenker, 2022). One feature of these data is the growing dispersion of the outcome variable (food expenditure) as a function of the regressor (income) (Koenker and Hallock, 2001), which is similar to our Monte Carlo designs. We divide the values of income and food expenditure by 1000, so that the unit of measurement becomes a thousand Belgian francs. This makes the scale of the variables comparable to that in our Monte Carlo simulations, allowing us to use the same baseline bandwidth choices.

We estimate classical QRs of food expenditure ($Y$) on income ($W$) and a constant (blue dots). The bias-corrected estimators (gold squares) are obtained using the baseline bandwidth choice $h_{1,n} = A_G n^{-1/5}$, $h_{2,n} = A_Q n^{-1/7}$, and $h_{3,n} = A_\kappa n^{-2/15}$ with $(A_G, A_Q, A_\kappa) = (1,1,1)$.

Figure 3.5 presents the results. Panel (a) shows the impact of bias correction. The results suggest that bias correction is particularly important at the upper tail, where we find substantial differences between the classical QR and the bias-corrected estimates. After the bias correction, we observe more heterogeneity across quantile levels, suggesting that the second-order bias leads classical QR to underestimate this heterogeneity. Notice that the second-order bias can be large even for the median (i.e., least absolute deviation) regression.
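For reference, the baseline bandwidths implied by the rates from Lemma 1 at the Engel sample size are easy to compute (a sketch, ours):

```python
n = 235                      # Engel (1857) sample size
A_G = A_Q = A_k = 1.0        # baseline constants (1, 1, 1)
h1 = A_G * n ** (-1 / 5)     # ~0.34, Jacobian estimator
h2 = A_Q * n ** (-1 / 7)     # ~0.46, Hessian estimator
h3 = A_k * n ** (-2 / 15)    # ~0.48, kappa estimator
```

Rescaling the variables to thousands of francs keeps these bandwidths on a scale comparable to the data, which is the rationale given above for reusing the Monte Carlo baseline constants.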
To investigate the sensitivity of our results to the choice of the three bandwidths, we also report the minimum and maximum bias-corrected estimates when $(A_G, A_Q, A_\kappa)$ varies over the set $\mathcal{A}$ in equation (3.9). Our results suggest that the bias-corrected estimates are insensitive to the bandwidth choice.

Panel (b) shows the individual contributions of the different components to the overall second-order bias. We can decompose the bias correction term as follows:

$$\hat\beta_\tau - \hat\beta_\tau^{bc} = \underbrace{\frac{1}{2}\hat G^{-1}\left[\hat g_\tau(\hat\beta_\tau) - \hat{\tilde g}_\tau(\hat\beta_\tau)\right]}_{(i)} - \underbrace{\frac{1}{n}\hat G^{-1}\hat\kappa}_{(ii)} - \underbrace{\frac{1}{2n}\hat G^{-1}\hat Q'\mathrm{vec}(\hat\Sigma)}_{(iii)}.$$

The main takeaway from the bias decomposition is that while all three components play a role, the sample moments term (i) can be very large and accounts for most of the bias at the upper tail quantiles. The QR estimates with the highest overall second-order bias have the largest contribution from the sample moments term ($\tau \in \{0.5, 0.8\}$). These results suggest that one could reduce the worst-case bias substantially by only correcting the sample moment term, which is akin to a 1-step Newton correction (see Appendix C.5).

Figure 3.5: Quantile regression of annual food expenditure on income. (a) Impact of bias correction. (b) Composition of second-order bias.

Notes: Panel (a) compares the classical QR estimates without (blue dots) and with (gold squares) bias correction, where $(A_G, A_Q, A_\kappa) = (1,1,1)$. The bars indicate the maximum and minimum of the estimates when varying $(A_G, A_Q, A_\kappa)$ over the grid $\mathcal{A}$ defined in (3.9). Panel (b) shows the contributions of different bias components to the overall second-order bias.

3.6 Conclusion

We demonstrate that QR and IVQR estimators can exhibit a non-negligible second-order bias. We theoretically characterize this bias and use our results to derive a novel feasible bias correction method. Our method can effectively reduce the second-order bias at a very low computational cost and without substantially increasing the RMSE.
Acknowledgements

We are grateful to the Editor (Xiaohong Chen), the Associate Editor, and three anonymous referees of the Journal of Econometrics, as well as Victor Chernozhukov, Zheng Fang, Dalia Ghanem, Jiaying Gu, Nail Kashaev, Roger Koenker, Vladimir Koltchinskii, Simon Lee, Blaise Melly, Hyungsik Roger Moon, Hashem Pesaran, Joris Pinkse, Wolfgang Polonik, Stephen Portnoy, Geert Ridder, Andres Santos, Davide Viviano, Yuanyuan Wan, and seminar participants at UC Berkeley, UC Davis, University of Toronto, and USC for valuable comments. All errors and omissions are our own.

Bibliography

Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series 55. Tenth Printing.

Adrian, T., Boyarchenko, N. and Giannone, D. (2019). Vulnerable growth. American Economic Review, 109 (4), 1263–89.

— and Brunnermeier, M. K. (2016). CoVaR. American Economic Review, 106 (7), 1705–41.

Ahsanullah, M., Nevzorov, V. B. and Shakil, M. (2013). An Introduction to Order Statistics, vol. 8. Springer.

Alexander, K. S. (1987). The central limit theorem for empirical processes on Vapnik-Červonenkis classes. The Annals of Probability, pp. 178–203.

Andrews, I. and Mikusheva, A. (2016). Conditional inference with a functional nuisance parameter. Econometrica, 84 (4), 1571–1612.

Angrist, J., Chernozhukov, V. and Fernández-Val, I. (2006). Quantile regression under misspecification, with an application to the US wage structure. Econometrica, 74 (2), 539–563.

Aradillas-López, A., Gandhi, A. and Quint, D. (2013). Identification and inference in ascending auctions with correlated private values. Econometrica, 81 (2), 489–534.

Athey, S. and Haile, P. A. (2007). Nonparametric approaches to auctions. Handbook of Econometrics, 6, 3847–3965.

Aue, A., Norinho, D. D. and Hörmann, S. (2015). On the prediction of stationary functional time series.
Journal of the American Statistical Association, 110 (509), 378–392.

Bahadur, R. R. (1966). A note on quantiles in large samples. The Annals of Mathematical Statistics, 37 (3), 577–580.

Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70 (1), 191–221.

Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices, vol. 20. Springer.

Bai, Z. D. (2008). Methodologies in spectral analysis of large dimensional random matrices, a review. In Advances in Statistics, World Scientific, pp. 174–240.

Bandeira, A. S. and Van Handel, R. (2016). Sharp nonasymptotic bounds on the norm of random matrices with independent entries. The Annals of Probability, 44 (4), 2479–2506.

Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. The Annals of Statistics, pp. 1071–1095.

Bloch, D. A. and Gastwirth, J. L. (1968). On a simple estimate of the reciprocal of the density function. The Annals of Mathematical Statistics, 39 (3), 1083–1085.

Chen, L.-Y. and Lee, S. (2018). Exact computation of GMM estimators for instrumental variable quantile regression models. Journal of Applied Econometrics, 33 (4), 553–567.

Chernozhukov, V. (2005). Extremal quantile regression. The Annals of Statistics, 33 (2), 806–839.

—, Chetverikov, D. and Kato, K. (2014). Gaussian approximation of suprema of empirical processes. The Annals of Statistics, 42 (4), 1564–1597.

— and Fernández-Val, I. (2011). Inference for extremal conditional quantile models, with an application to market and birthweight risks. The Review of Economic Studies, 78 (2), 559–589.

—, Fernández-Val, I. and Galichon, A. (2009). Improving point and interval estimators of monotone functions by rearrangement. Biometrika, 96 (3), 559–575.

—, Fernández-Val, I. and Galichon, A. (2010). Quantile and probability curves without crossing. Econometrica, 78 (3), 1093–1125.

— and Hansen, C. (2005).
An IV model of quantile treatment effects. Econometrica, 73 (1), 245–261.

— and — (2006). Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics, 132 (2), 491–525.

— and — (2008). Instrumental variable quantile regression: A robust inference approach. Journal of Econometrics, 142 (1), 379–398.

Csörgő, M., Horváth, L. and Deheuvels, P. (1991). Estimating the quantile-density function. In Nonparametric Functional Estimation and Related Topics, Springer, pp. 213–223.

Dirksen, S. (2015). Tail bounds via generic chaining. Electronic Journal of Probability, 20.

Edelman, A. and Rao, N. R. (2005). Random matrix theory. Acta Numerica, 14, 233–297.

Elsner, J. B., Kossin, J. P. and Jagger, T. H. (2008). The increasing intensity of the strongest tropical cyclones. Nature, 455 (7209), 92–95.

Elyakime, B., Laffont, J. J., Loisel, P. and Vuong, Q. (1994). First-price sealed-bid auctions with secret reservation prices. Annales d'Economie et de Statistique, pp. 115–141.

Enache, A. and Florens, J.-P. (2017). A quantile approach to the estimation of first-price private value auctions. Available at SSRN 3522067.

Engel, E. (1857). Die Produktions- und Konsumptionsverhältnisse des Königreichs Sachsen. Zeitschrift des Statistischen Bureaus des Königlich Sächsischen Ministeriums des Innern, 8, 1–54.

Falk, M. (1986). On the estimation of the quantile density function. Statistics & Probability Letters, 4 (2), 69–73.

Fernique, X. (1976). Regularité des trajectoires des fonctions aléatoires gaussiennes. pp. 1–96.

Geman, S. (1980). A limit theorem for the norm of random matrices. The Annals of Probability, pp. 252–261.

Gimenes, N. and Guerre, E. (2021). Quantile regression methods for first-price auctions. Journal of Econometrics.

Guédon, O., Hinrichs, A., Litvak, A. E. and Prochno, J. (2017). On the expectation of operator norms of random matrices. In Geometric Aspects of Functional Analysis, Springer, pp. 151–162.

Guerre, E., Perrigne, I.
and Vuong, Q. (2000). Optimal nonparametric estimation of first-price auctions. Econometrica, 68 (3), 525–574.

—, — and — (2009). Nonparametric identification of risk aversion in first-price auctions under exclusion restrictions. Econometrica, 77 (4), 1193–1227.

— and Sabbah, C. (2012). Uniform bias study and Bahadur representation for local polynomial estimators of the conditional quantile function. Econometric Theory, pp. 87–129.

Haile, P., Hong, H. and Shum, M. (2003). Nonparametric tests for common values at first-price sealed-bid auctions.

Haile, P. A. (2001). Auctions with resale markets: An application to US forest service timber sales. American Economic Review, 91 (3), 399–427.

Hall, P. (1991). On convergence rates of suprema. Probability Theory and Related Fields, 89 (4), 447–455.

— (2013). The Bootstrap and Edgeworth Expansion. Springer Science & Business Media.

Henderson, D. J., List, J. A., Millimet, D. L., Parmeter, C. F. and Price, M. K. (2012). Empirical implementation of nonparametric first-price auction models. Journal of Econometrics, 168 (1), 17–28.

Horowitz, J. L. (2001). The bootstrap. In Handbook of Econometrics, vol. 5, Elsevier, pp. 3159–3228.

Ingraham, A. T. (2005). A test for collusion between a bidder and an auctioneer in sealed-bid auctions. Available at SSRN 712881.

Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, pp. 295–327.

Jones, M. C. (1992). Estimating densities, quantiles, quantile densities and density quantiles. Annals of the Institute of Statistical Mathematics, 44 (4), 721–727.

Kaido, H. and Wüthrich, K. (2021). Decentralization estimators for instrumental variable quantile regression models. Quantitative Economics, 12 (2), 443–475.

Kaplan, D. M. and Sun, Y. (2017). Smoothed estimating equations for instrumental variables quantile regression. Econometric Theory, 33 (1), 105–157.

Kargin, V. and Onatski, A. (2008).
Curve forecasting by functional autoregression. Jour- nal of Multivariate Analysis, 99 (10), 2508–2526. Kato, K. (2012). Asymptotic normality of powell’s kernel estimator. Annals of the Institute of Statistical Mathematics, 64 (2), 255–273. Khorunzhiy, O. (2012). High moments of large Wigner random matrices and asymptotic properties of the spectral norm. Random Operators and Stochastic Equations, 20 (1), 25–68. Kiefer, J. (1967). On bahadur’s representation of sample quantiles. The Annals of Mathe- matical Statistics, 38 (5), 1323–1342. Knight, K. (2002). Comparing conditional quantile estimators: first and second order considerations. Working Paper. Koenker (2005). Quantile Regression. Cambridge University Press. Koenker, R. (2022). quantreg: Quantile Regression. R package version 5.94. — and Bassett, G. (1978). Regression quantiles. Econometrica, 46 (1), 33–50. — and — (1982). Robust tests for heteroscedasticity based on regression quantiles. Econo- metrica, 50 (1), 43–61. — and Hallock, K. F. (2001). Quantile regression. Journal of Economic Perspectives, 15 (4), 143–156. 77 Kowal, D. R., Matteson, D. S. and Ruppert, D. (2019). Functional autoregression for sparsely sampled data. Journal of Business & Economic Statistics, 37 (1), 97–109. Krishna, V. (2009). Auction theory. Academic press. Latała, R. (2005). Some estimates of norms of random matrices. Proceedings of the Amer- ican Mathematical Society, 133 (5), 1273–1282. Latała, R., van Handel, R. and Youssef, P. (2018). The dimension-free structure of nonhomogeneous random matrices. Inventiones Mathematicae, 214 (3), 1031–1080. Lee, T.-H., Ullah, A. and Wang, H. (2017). The second-order bias and mse of quantile estimators. Unpublished manuscript. —, — and — (2018). The second-order bias of quantile estimators. Economics Letters,173, 143–147. Li, T. and Perrigne, I. (2003). Timber sale auctions with random reserve prices. Review of Economics and Statistics, 85 (1), 189–200. —, — andVuong, Q. (2000). 
Conditionally independent private information in ocs wildcat auctions. Journal of Econometrics, 98 (1), 129–161. Loertscher, S. and Marx, L. M. (2020). Asymptotically optimal prior-free clock auc- tions. Journal of Economic Theory, 187, 105030. Luo, Y. and Wan, Y. (2018). Integrated-quantile-based estimation for first-price auction models. Journal of Business & Economic Statistics, 36 (1), 173–180. Ma, J., Marmer, V. and Shneyerov, A. (2019). Inference for first-price auctions with guerre, perrigne, and vuong’s estimator. Journal of Econometrics, 211 (2), 507–538. —, —, — and Xu, P. (2021). Monotonicity-constrained nonparametric estimation and inference for first-price auctions. Econometric Reviews, 40 (10), 944–982. Marmer, V. andShneyerov, A. (2012). Quantile-based nonparametric inference for first- price auctions. Journal of Econometrics, 167 (2), 345–357. Marra, M. (2020). Sample spacings for identification: The case of english auctions with absentee bidding. Available at SSRN 3622047. Matzkin, R. L.(2013).Nonparametricidentificationinstructuraleconomicmodels. Annual Review of Economics, 5 (1), 457–486. Mendelson, S. and Tomczak-Jaegermann, N. (2008). A subgaussian embedding the- orem. Israel Journal of Mathematics, 164 (1), 349–364. Montiel Olea, J. L. and Plagborg-Møller, M. (2019). Simultaneous confidence bands: Theory, implementation, and an application to svars. Journal of Applied Econo- metrics, 34 (1), 1–17. 78 Moon, H. R. and Weidner, M. (2017). Dynamic linear panel regression models with interactive fixed effects. Econometric Theory, 33 (1), 158–195. Nagar, A. L. (1959). The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations. Econometrica, 27 (4), 575–595. Newey, K. andMcFadden, D. (1994). Large sample estimation and hypothesis. Handbook of Econometrics, IV, Edited by RF Engle and DL McFadden, pp. 2112–2245. Olver, P. J. (2014). Introduction to partial differential equations. Springer. Ota, H., Kato, K. 
andHara, S. (2019). Quantile regression approach to conditional mode estimation. Electronic Journal of Statistics, 13 (2), 3120–3160. Paarsch, H. J., Hong, H. et al. (2006). An introduction to the structural econometrics of auction data. MIT Press Books, 1. Paul, A. and Gutierrez, G. (2004). Mean sample spacings, sample size and variability in an auction-theoretic framework. Operations Research Letters, 32 (2), 103–108. Perrigne, I. and Vuong, Q. (2019). Econometrics of auctions and nonlinear pricing. Annual Review of Economics, 11, 27–54. Phillips, P. C. B. (1991). A shortcut to lad estimator asymptotics. Econometric Theory, pp. 450–463. Pinkse, J. andSchurter, K. (2019).Estimationofauctionmodelswithshaperestrictions. arXiv preprint arXiv:1912.07466. Portnoy, S. (2012). Nearly root-n approximation for regression quantile processes. The Annals of Statistics, 40 (3), 1714–1736. Powell, J. L. (1986). Censored regression quantiles. Journal of Econometrics, 32 (1), 143–155. Riley, J. G. and Samuelson, W. F. (1981). Optimal auctions. The American Economic Review, 71 (3), 381–392. Rilstone, P., Srivastava, V. and Ullah, A. (1996). The second-order bias and mean squared error of nonlinear estimators. Journal of Econometrics, 75 (2), 369 – 395. Rio, E. (1994). Local invariance principles and their application to density estimation. Probability Theory and Related Fields, 98 (1), 21–45. Shalizi, C. and Kontorovich, A. (2010). Almost none of the theory of stochastic pro- cesses. Unpublished manuscript. Siddiqui, M. M. (1960). Distribution of quantiles in samples from a bivariate population. Journal of Research of the National Bureau of Standards, 64, 145–150. 79 Silverman, B. W. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. The Annals of Statistics, pp. 177–184. Smirnov, N. V. (1950). On the construction of confidence regions for the density of distri- bution of random variables. In Doklady Akad. Nauk SSSR, vol. 74, pp. 
189–191. Stroock, D. W. (1998). A concise introduction to the theory of integration. Springer Sci- ence & Business Media. Stute, W. (1984). The oscillation behavior of empirical processes: The multivariate case. The Annals of Probability, pp. 361–379. Talagrand, M. (2006). The generic chaining: upper and lower bounds of stochastic pro- cesses. Springer Science & Business Media. Tao, T. (2012). Topics in random matrix theory, vol. 132. American Mathematical Society. — and Vu, V. (2011). Random matrices: universality of local eigenvalue statistics. Acta Mathematica, 206 (1), 127–204. Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes. Springer. Van Der Vaart, A.andWellner, J.(1996).Weak Convergence and Empirical Processes, vol. 58. Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science, vol. 47. Cambridge University Press. Wang, J.-L., Chiou, J.-M. and Müller, H.-G. (2016). Functional data analysis. Annual Review of Statistics and Its Application, 3, 257–295. Welsh, A. (1988). Asymptotically efficient estimation of the sparsity function at a point. Statistics & Probability Letters, 6 (6), 427–432. Wigner, E. (1955). Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics, 62 (3), 548—-564. Wishart, J. (1928). The generalised product moment distribution in samples from a normal multivariate population. Biometrika, pp. 32–52. Zhou, K. Q. and Portnoy, S. L. (1996). Direct use of regression quantiles to construct confidence sets in linear models. The Annals of Statistics, 24 (1), 287–306. Zhu, Y. (2019). Learning non-smooth models: instrumental variable quantile regressions and related problems. arXiv preprint arXiv:1805.06855. Zincenko, F. (2021). Estimation and inference of seller’s expected revenue in first-price auctions. Available at SSRN 3966545. 
Appendix A

Appendices to Chapter 1

A.1 Proof of Theorem 1

The following proof can be found in Vershynin (2018), see Theorem 8.5.3. Since $T$ is separable, we can assume for simplicity that it is finite. Let $(T_k)$ be an admissible sequence and $\pi_k(t)$ be the best approximation to $t$ in $T_k$, i.e.
$$d(t, \pi_k(t)) = \min_{t' \in T_k} d(t, t').$$
Now consider a chain of approximations to the point $t$ starting from some $t_0$,
$$t_0 = \pi_0(t) \to \pi_1(t) \to \cdots \to \pi_{\tilde K}(t) = t,$$
and write
$$Z_t - Z_{t_0} = \sum_{k=1}^{\tilde K}\big(Z_{\pi_k(t)} - Z_{\pi_{k-1}(t)}\big).$$
Sub-Gaussianity of the increments $Z_{\pi_k(t)} - Z_{\pi_{k-1}(t)}$ implies that, for any $u > 0$,
$$P\Big(\big|Z_{\pi_k(t)} - Z_{\pi_{k-1}(t)}\big| \le C u 2^{k/2} d(\pi_k(t), \pi_{k-1}(t))\Big) \ge 1 - 2e^{-C^2 u^2 2^k / K^2} \ge 1 - 2e^{-8u^2 2^k}, \quad (A.1)$$
where $C \ge \sqrt 8 K$. Note that since $\pi_k(t) \in T_k$, $\pi_{k-1}(t) \in T_{k-1}$, the number of possible pairs $(\pi_k(t), \pi_{k-1}(t))$ is
$$|T_k|\,|T_{k-1}| \le |T_k|^2 = 2^{2^{k+1}}.$$
Applying the union bound to (A.1) over $k \in \mathbb N$ and pairs $(\pi_k(t), \pi_{k-1}(t))$, we obtain
$$P\Big(\big|Z_{\pi_k(t)} - Z_{\pi_{k-1}(t)}\big| \le C u 2^{k/2} d(\pi_k(t), \pi_{k-1}(t)) \text{ for all } t \in T,\ k \in \mathbb N\Big) \ge 1 - \sum_{k=1}^{\infty} 2^{2^{k+1}} \cdot 2e^{-8u^2 2^k} \ge 1 - 2e^{-u^2}.$$
The event on the left-hand side implies
$$|Z_t - Z_{t_0}| \le C u \sum_{k=1}^{\infty} 2^{k/2} d(\pi_k(t), \pi_{k-1}(t)) \le C u \sum_{k=1}^{\infty} 2^{k/2}\big(d(\pi_k(t), t) + d(\pi_{k-1}(t), t)\big) \le \tilde C u\,\gamma_2(T, d)$$
for a constant $\tilde C > 0$. Taking the supremum over $t \in T$ yields
$$\sup_{t \in T}|Z_t - Z_{t_0}| \le \tilde C u\,\gamma_2(T, d).$$
Since this event holds with probability at least $1 - 2e^{-u^2}$, $\sup_{t \in T}|Z_t - Z_{t_0}|$ is a sub-Gaussian random variable with Orlicz norm bounded by $\tilde C \gamma_2(T, d)$. The conclusion then follows from (1.1) and the inequality
$$E \sup_{t \in T} Z_t = E \sup_{t \in T}(Z_t - Z_{t_0}) \le E \sup_{t \in T}|Z_t - Z_{t_0}|.$$

A.2 Proof of Lemma 2

We give the proof for the case of $L = 2$ metric spaces for simplicity. The case of arbitrary $L$ follows immediately by inspection. Denote the two spaces by $(X, d_X)$ and $(Y, d_Y)$. Consider admissible sequences $(X_k)$ and $(Y_k)$ in $X$ and $Y$, respectively.
To each such pair there corresponds a sequence $(\tilde T_k)$ in $T = X \times Y$ of the form
$$\tilde T_k = \begin{cases} X_0 \times Y_0, & k = 0, \\ X_{k-1} \times Y_{k-1}, & k \ge 1. \end{cases} \quad (A.2)$$
This sequence is admissible since $|\tilde T_0| = |\tilde T_1| = 1$ and $|\tilde T_k| = |X_{k-1}|\,|Y_{k-1}| \le 2^{2^{k-1}} \cdot 2^{2^{k-1}} = 2^{2^k}$ for $k \ge 2$. Fix $(x, y) \in T$ and write
$$\sum_{k \ge 0} 2^{k/2} d\big((x,y), \tilde T_k\big) = d_X(x, X_0) + \sum_{k \ge 1} 2^{k/2} d_X(x, X_{k-1}) + d_Y(y, Y_0) + \sum_{k \ge 1} 2^{k/2} d_Y(y, Y_{k-1}).$$
The bound on the first two terms on the right-hand side is
$$d_X(x, X_0) + \sum_{k \ge 1} 2^{k/2} d_X(x, X_{k-1}) = (1 + \sqrt 2)\,d_X(x, X_0) + \sqrt 2 \sum_{k \ge 1} 2^{k/2} d_X(x, X_k) \le (1 + \sqrt 2)\sum_{k \ge 0} 2^{k/2} d_X(x, X_k).$$
Similarly, we have
$$d_Y(y, Y_0) + \sum_{k \ge 1} 2^{k/2} d_Y(y, Y_{k-1}) \le (1 + \sqrt 2)\sum_{k \ge 0} 2^{k/2} d_Y(y, Y_k).$$
Adding the two inequalities and taking suprema yields
$$\sup_{(x,y)}\sum_{k \ge 0} 2^{k/2} d\big((x,y), \tilde T_k\big) \le (1 + \sqrt 2)\sup_{(x,y)}\left(\sum_{k \ge 0} 2^{k/2} d_X(x, X_k) + \sum_{k \ge 0} 2^{k/2} d_Y(y, Y_k)\right)$$
$$\le (1 + \sqrt 2)\left(\sup_x \sum_{k \ge 0} 2^{k/2} d_X(x, X_k) + \sup_y \sum_{k \ge 0} 2^{k/2} d_Y(y, Y_k)\right).$$
Taking infima over admissible sequences $(\tilde T_k)$ (which are functions of admissible sequences $(X_k)$ and $(Y_k)$) yields
$$\inf_{(\tilde T_k)}\sup_{(x,y)}\sum_{k \ge 0} 2^{k/2} d\big((x,y), \tilde T_k\big) \le (1 + \sqrt 2)\left(\inf_{(X_k)}\sup_x \sum_{k \ge 0} 2^{k/2} d_X(x, X_k) + \inf_{(Y_k)}\sup_y \sum_{k \ge 0} 2^{k/2} d_Y(y, Y_k)\right) = (1 + \sqrt 2)\big(\gamma_2(X, d_X) + \gamma_2(Y, d_Y)\big).$$
Finally, note that $\gamma_2(T, d)$ is not larger than the left-hand side of the inequality above, since the infimum in its definition is taken over all admissible sequences $(T_k)$, not only those that have the form $(\tilde T_k)$.

Appendix B

Appendices to Chapter 2

B.1 Estimation and inference for value quantiles, proofs

B.1.1 Proof of Theorem 1

First, we need the following two lemmas concerning expressions that appear further in the proof.

Lemma 3. Suppose $K$ is a continuous function of bounded variation. Then
$$\int_0^1 K_h(u - z)\,d\big(\hat Q(z) - Q(z)\big) = -\int_0^1 \big(\hat Q(z) - Q(z)\big)\,dK_h(u - z) + R_n^I(u), \quad (B.1)$$
where $\sup_{u \in [0,1]}|R_n^I(u)| = O_{a.s.}\big(\frac1{nh}\big)$.

Proof. Denote $\hat\Delta(z) = \hat Q(z) - Q(z)$ and note that $\hat\Delta$ is a function of bounded variation a.s.
Using integration by parts for the Riemann–Stieltjes integral (see e.g. Stroock, 1998, Theorem 1.2.7), we have
$$\int_0^1 K_h(u - z)\,d\hat\Delta(z) = -\int_0^1 \hat\Delta(z)\,dK_h(u - z) + K_h(u - 1)\hat\Delta(1) - K_h(u)\hat\Delta(0). \quad (B.2)$$
To complete the proof, note that $\hat\Delta(1) = b_{(n)} - \bar b = O_{a.s.}(n^{-1})$, $\hat\Delta(0) = b_{(1)} - \underline b = O_{a.s.}(n^{-1})$, $|K_h(u - 1)| \le h^{-1}K(0)$ and $|K_h(u)| \le h^{-1}K(0)$.

Lemma 4. Suppose $K$ is a continuous function of bounded variation. Then, for every $u \in [0,1]$,
$$Z_n(u) := \sqrt{nh}\int_0^1 \big(\hat F(Q(z)) - z\big)\,dK_h(u - z) = G_{n,h}(u), \quad (B.3)$$
$$G_{n,h}(u) := -\frac{\sqrt{nh}}{n}\sum_{i=1}^n\big[K_h(u - F(b_i)) - E K_h(u - F(b_i))\big]. \quad (B.4)$$

Proof. Using integration by parts for the Riemann–Stieltjes integral (see e.g. Stroock, 1998, Theorem 1.2.7), we have
$$\int_0^1 \big(\hat F(Q(z)) - z\big)\,dK_h(u - z) = -\int_0^1 K_h(u - z)\,d\big[\hat F(Q(z)) - z\big] + K_h(u - 1)\big[\hat F(\bar b) - 1\big] - K_h(u)\hat F(0) = -\int_0^1 K_h(u - z)\,d\big[\hat F(Q(z)) - z\big],$$
where we used the fact that $\hat F(\bar b) = 1$ a.s. and $\hat F(0) = 0$ a.s. We further write
$$\int_0^1 \big(\hat F(Q(z)) - z\big)\,dK_h(u - z) = -\int_0^1 K_h(u - z)\,d\big[\hat F(Q(z)) - z\big] = -\int_0^{\bar b} K_h(u - F(x))\,d\big[\hat F(x) - F(x)\big] = -\frac1n\sum_{i=1}^n\big[K_h(u - F(b_i)) - E K_h(u - F(b_i))\big],$$
where in the second equality we used the change of variables $x = Q(z)$.

We now proceed with the proof of Theorem 1. Plug in the BK expansion (2.23) and use Lemma 3 to obtain
$$\hat q_h(u) - q_h(u) = \int_0^1 K_h(u - z)\,d\big[\hat Q(z) - Q(z)\big] \quad (B.5)$$
$$= -\int_0^1 \big[\hat Q(z) - Q(z)\big]\,dK_h(u - z) + R_n^I(u) \quad (B.6)$$
$$= \int_0^1 q(z)\big(\hat F(Q(z)) - z\big)\,dK_h(u - z) - \int_0^1 R_n^{BK}(z)\,dK_h(u - z) + R_n^I(u). \quad (B.7)$$

First term in (B.7). Since $f$ is bounded away from zero, $|q'| \le M < \infty$ for some constant $M$, and hence $|q(z) - q(u)| \le M|z - u|$. The first term in (B.7) can then be rewritten as
$$\int_0^1 q(z)\big(\hat F(Q(z)) - z\big)\,dK_h(u - z) = q(u)\int_0^1\big(\hat F(Q(z)) - z\big)\,dK_h(u - z) + R_n^{II}(u), \quad (B.8)$$
where
$$\big|R_n^{II}(u)\big| = \left|\int_0^1\big(q(z) - q(u)\big)\big(\hat F(Q(z)) - z\big)\,dK_h(u - z)\right| \quad (B.9)$$
$$\le Mh\left|\int_0^1\big(\hat F(Q(z)) - z\big)\,dK_h(u - z)\right| = Mh\big|(nh)^{-1/2}Z_n(u)\big|. \quad (B.10)$$
By Lemma 4, $Z_n(u) = G_{n,h}(u)$, where the process $G_{n,h}(u) = O_{a.s.}(|\log h|)$ uniformly in $u \in [0,1]$ (see e.g. Silverman, 1978; Stute, 1984), and hence
$$R_n^{II}(u) = O_{a.s.}\left(\frac{h|\log h|}{\sqrt{nh}}\right) \text{ uniformly over } u \in (0,1). \quad (B.11)$$
Applying Lemma 4 to the first term in (B.8) allows us to rewrite
$$\int_0^1 q(z)\big(\hat F(Q(z)) - z\big)\,dK_h(u - z) = q(u)(nh)^{-1/2}G_{n,h}(u) + O_{a.s.}\left(\frac{h|\log h|}{\sqrt{nh}}\right). \quad (B.12)$$

Second term in (B.7). This term can be upper bounded as follows,
$$\sup_u\left|\int_0^1 R_n^{BK}(z)\,dK_h(u - z)\right| \le \sup_u\int_0^1\big|R_n^{BK}(z)\big|\,\big|dK_h(u - z)\big| \quad (B.13)$$
$$\le \sup_z\big|R_n^{BK}(z)\big|\,TV(K_h) = O_{a.s.}\big(n^{-3/4}\ell(n)\big)\,h^{-1}TV(K) = O_{a.s.}\big(h^{-1}n^{-3/4}\ell(n)\big), \quad (B.14)$$
where we used the properties of total variation in the first inequality and in the second equality.

Plugging (B.12) and (B.14) into (B.7) and multiplying by $\sqrt{nh}$ yields
$$\sqrt{nh}\big(\hat q_h(u) - q_h(u)\big) = q(u)G_{n,h}(u) + O_{a.s.}\big(h|\log h| + h^{-1/2}n^{-1/4}\ell(n)\big), \quad (B.15)$$
where we disregarded the term $\sqrt{nh}\,R_n^I(u)$, since it has the uniform order $O_{a.s.}(n^{-1/2}h^{-1/2})$, which is smaller than $O_{a.s.}(h^{-1/2}n^{-1/4}\ell(n))$.

Note that, for $u \in [h, 1-h]$, there exists $\xi(u,z)$ lying between $u$ and $z$ such that
$$q_h(u) = \int_0^1 q(z)K_h(u - z)\,dz = \int_0^1\big(q(u) + q'(\xi(u,z))(z - u)\big)K_h(u - z)\,dz \quad (B.16)$$
$$= q(u) + O(h). \quad (B.17)$$
Combining this with (B.15) yields
$$\sqrt{nh}\big(\hat q_h(u) - q(u)\big) = q(u)G_{n,h}(u) + O_{a.s.}\big(n^{1/2}h^{3/2} + h|\log h| + h^{-1/2}n^{-1/4}\ell(n)\big), \quad (B.18)$$
uniformly in $u \in [h, 1-h]$. Using $G_{n,h}(u) = O_{a.s.}(|\log h|)$ again, we conclude that
$$\sqrt{nh}\big(\hat q_h(u) - q(u)\big) = O_{a.s.}\big(|\log h| + n^{1/2}h^{3/2}\big) \quad (B.19)$$
(note that we dropped the terms $h|\log h|$ and $h^{-1/2}n^{-1/4}\ell(n)$ since they are smaller than $|\log h|$) or, dividing by $\sqrt{nh}$,
$$\hat q_h(u) - q(u) = O_{a.s.}\left(\frac{|\log h|}{\sqrt{nh}} + h\right), \text{ uniformly in } u \in [h, 1-h]. \quad (B.20)$$
Now we replace $q(u)G_{n,h}(u)$ by $\hat q_h(u)G_{n,h}(u)$ in (B.18), which leads to the approximation error
$$G_{n,h}(u)\big(\hat q_h(u) - q(u)\big) = O_{a.s.}\left(\frac{(\log h)^2}{\sqrt{nh}} + h|\log h|\right), \text{ uniformly in } u \in [h, 1-h], \quad (B.21)$$
as follows from (B.20). Hence, (B.18) becomes
$$\sqrt{nh}\big(\hat q_h(u) - q(u)\big) = \hat q_h(u)G_{n,h}(u) + O_{a.s.}\big(n^{1/2}h^{3/2} + h|\log h| + h^{-1/2}n^{-1/4}\ell(n)\big), \quad (B.22)$$
uniformly in $u \in [h, 1-h]$, where we dropped the term $(\log h)^2/\sqrt{nh}$ since it is smaller than $h^{-1/2}n^{-1/4}\ell(n)$.
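The representation above can be sanity-checked numerically. The following sketch implements the spacings form of the kernel quantile-density estimator, $\hat q_h(u) = \sum_i K_h(u - i/n)\,(b_{(i+1)} - b_{(i)})$, together with the plug-in value quantile $\hat v_h(u) = \hat Q(u) + A(u)\hat q_h(u)$. The uniform-value first-price auction design, the weight $A(u) = u/(I-1)$, the Epanechnikov kernel, and all tuning constants are illustrative assumptions made here, not choices made in the text.

```python
import random

def qdens_hat(b_sorted, u, h):
    """Kernel quantile-density estimator based on sample spacings:
    q_hat(u) = sum_i K_h(u - i/n) * (b_(i+1) - b_(i))."""
    n = len(b_sorted)

    def K_h(x):  # Epanechnikov kernel with bandwidth h
        x /= h
        return 0.75 * (1.0 - x * x) / h if abs(x) <= 1.0 else 0.0

    return sum(K_h(u - i / n) * (b_sorted[i] - b_sorted[i - 1])
               for i in range(1, n))

random.seed(0)
n, h, I = 20_000, 0.1, 3
v = sorted(random.random() for _ in range(n))   # private values ~ U(0,1)
b = [x * (I - 1) / I for x in v]                # equilibrium bids b = v(I-1)/I

u = 0.5
Qhat = b[int(n * u) - 1]                        # empirical bid quantile, approx 1/3
qhat = qdens_hat(b, u, h)                       # approx q_b(u) = (I-1)/I = 2/3
vhat = Qhat + (u / (I - 1)) * qhat              # v_hat(u) = Q_hat(u) + A(u) q_hat(u), approx u
```

With uniform values the implied value quantile is $v(u) = u$, so `vhat` should be close to 0.5 at `u = 0.5`.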
Finally, write
$$\sqrt{nh}\big(\hat v_h(u) - v(u)\big) = \sqrt{nh}\big(\hat Q(u) - Q(u)\big) + A(u)\sqrt{nh}\big(\hat q_h(u) - q(u)\big). \quad (B.23)$$
Since $\hat Q(u) - Q(u) = O_{a.s.}(n^{-1/2})$ uniformly in $u \in (0,1)$, we have
$$\sqrt{nh}\big(\hat v_h(u) - v(u)\big) = O_{a.s.}(h^{1/2}) + A(u)\sqrt{nh}\big(\hat q_h(u) - q(u)\big). \quad (B.24)$$
Combining this with (B.22) and noting that $h|\log h|$ is smaller than $h^{1/2}$ yields
$$\sqrt{nh}\big(\hat v_h(u) - v(u)\big) = A(u)\hat q_h(u)G_{n,h}(u) + O_{a.s.}\big(n^{1/2}h^{3/2} + h^{1/2} + h^{-1/2}n^{-1/4}\ell(n)\big), \quad (B.25)$$
uniformly in $u \in [h, 1-h]$. Dividing by $\hat q_h(u)$, which is bounded away from zero w.p.a. 1, completes the proof.

B.1.2 Proof of Theorem 2

A key ingredient of the proof is to note that Lemmas 2.3 and 2.4 of Chernozhukov et al. (2014) continue to hold even if their random variable $Z_n$ does not have the form $Z_n = \sup_{f \in \mathcal F_n}\mathbb G_n f$ for the standard empirical process $\mathbb G_n$, but instead is a generic random variable admitting a strong sup-Gaussian approximation with a sufficiently small remainder. For completeness, we provide the aforementioned trivial extensions of the two lemmas here.

Let $X$ be a random variable with distribution $P$ taking values in a measurable space $(S, \mathcal S)$. Let $\mathcal F$ be a class of real-valued functions on $S$. We say that a function $F: S \to \mathbb R$ is an envelope of $\mathcal F$ if $F$ is measurable and $|f(x)| \le F(x)$ for all $f \in \mathcal F$ and $x \in S$. We impose the following assumptions (A1)-(A3) of Chernozhukov et al. (2014).

(A1) The class $\mathcal F$ is pointwise measurable, i.e. it contains a countable subset $\mathcal G$ such that for every $f \in \mathcal F$ there exists a sequence $g_m \in \mathcal G$ with $g_m(x) \to f(x)$ for every $x \in S$.

(A2) For some $q \ge 2$, an envelope $F$ of $\mathcal F$ satisfies $F \in L^q(P)$.

(A3) The class $\mathcal F$ is $P$-pre-Gaussian, i.e. there exists a tight Gaussian random variable $G_P$ in $\ell^\infty(\mathcal F)$ with mean zero and covariance function
$$E[G_P(f)G_P(g)] = E[f(X)g(X)] \text{ for all } f, g \in \mathcal F.$$

Lemma 5 (A trivial extension of Lemma 2.3 of Chernozhukov et al. (2014)). Suppose that Assumptions (A1)-(A3) are satisfied and that there exist constants $\underline\sigma, \bar\sigma > 0$ such that $\underline\sigma^2 \le Pf^2 \le \bar\sigma^2$ for all $f \in \mathcal F$.
Moreover, suppose there exist constants $r_1, r_2 > 0$ and a random variable $\tilde Z = \sup_{f \in \mathcal F} G_P f$ such that $P(|Z - \tilde Z| > r_1) \le r_2$. Then
$$\sup_{t \in \mathbb R}\big|P(Z \le t) - P(\tilde Z \le t)\big| \le C r_1\Big\{E\tilde Z + \sqrt{1 \vee \log(\underline\sigma/r_1)}\Big\} + r_2,$$
where $C$ is a constant depending only on $\underline\sigma$ and $\bar\sigma$.

Proof. For every $t \in \mathbb R$, we have
$$P(Z \le t) = P\big(\{Z \le t\} \cap \{|Z - \tilde Z| \le r_1\}\big) + P\big(\{Z \le t\} \cap \{|Z - \tilde Z| > r_1\}\big)$$
$$\le P(\tilde Z \le t + r_1) + r_2 \le P(\tilde Z \le t) + C r_1\Big\{E\tilde Z + \sqrt{1 \vee \log(\underline\sigma/r_1)}\Big\} + r_2,$$
where Lemma A.1 of Chernozhukov et al. (2014) (an anti-concentration inequality for $\tilde Z$) is used to deduce the last inequality. A similar argument leads to the reverse inequality, which completes the proof.

Lemma 6 (A trivial extension of Lemma 2.4 of Chernozhukov et al. (2014)). Suppose that there exists a sequence of $P$-centered classes $\mathcal F_n$ of measurable functions $S \to \mathbb R$ satisfying assumptions (A1)-(A3) with $\mathcal F = \mathcal F_n$ for each $n$, where in assumption (A3) the constants $\underline\sigma$ and $\bar\sigma$ do not depend on $n$. Denote by $B_n$ the Brownian bridge on $\ell^\infty(\mathcal F_n)$, i.e. a tight Gaussian random variable in $\ell^\infty(\mathcal F_n)$ with mean zero and covariance function
$$E[B_n(f)B_n(g)] = E[f(X)g(X)] \text{ for all } f, g \in \mathcal F_n.$$
Moreover, suppose that there exists a sequence of random variables $\tilde Z_n = \sup_{f \in \mathcal F_n} B_n(f)$ and a sequence of constants $r_n \to 0$ such that $|Z_n - \tilde Z_n| = O_P(r_n)$ and $r_n E\tilde Z_n \to 0$. Then
$$\sup_{t \in \mathbb R}\big|P(Z_n \le t) - P(\tilde Z_n \le t)\big| \to 0.$$

Proof. Take $\gamma_n \to \infty$ sufficiently slowly such that $\gamma_n r_n(1 \vee E\tilde Z_n) = o(1)$. Then since $P(|Z_n - \tilde Z_n| > \gamma_n r_n) = o(1)$, by Lemma 5, we have
$$\sup_{t \in \mathbb R}\big|P(Z_n \le t) - P(\tilde Z_n \le t)\big| = O\Big(\gamma_n r_n\big\{E\tilde Z_n + \sqrt{1 \vee |\log(\gamma_n r_n)|}\big\}\Big) + o(1) = o(1).$$
This completes the proof.

Lemma 7. Let
$$W_n = \sup_{u \in [0,1]}\psi(u)\sqrt{nh}\,\frac1n\sum_{i=1}^n\big[K_h(U_i - u) - E K_h(U_i - u)\big] \quad (B.26)$$
for some smooth function $\psi: [0,1] \to \mathbb R$. Then there exists a tight centered Gaussian random variable $B_n$ in $\ell^\infty([0,1])$ with the covariance function
$$E[B_n(u)B_n(v)] = \psi(u)\psi(v)\,h\,\mathrm{Cov}\big(K_h(U - u), K_h(U - v)\big), \quad u, v \in [0,1], \quad (B.27)$$
such that, for $\tilde W_n = \sup_{u \in [0,1]} B_n(u)$, we have the approximation
$$W_n = \tilde W_n + O_p\big((nh)^{-1/6}\log n\big). \quad (B.28)$$

Proof.
Define the class of functions
$$\mathcal F_n = \big\{[0,1] \ni x \mapsto \psi(u)K_h(u - x),\ u \in [0,1]\big\} \quad (B.29)$$
and note that
$$W_n = \sqrt h\,\|\mathbb G_n\|_{\mathcal F_n}. \quad (B.30)$$
Let us apply Chernozhukov et al. (2014, Proposition 3.1) to obtain a sup-Gaussian approximation of $W_n$. Indeed, in the notation of Chernozhukov et al. (2014, Section 3.1), take
$$g \equiv 1, \quad \mathcal G = \{g\}, \quad I = [0,1], \quad c_n(u, g) = \psi(u). \quad (B.31)$$
Then the representation (8) in Chernozhukov et al. (2014) holds, i.e.
$$W_n = \sup_{(u,g) \in I \times \mathcal G} c_n(u,g)\sqrt{nh}\,\frac1n\sum_{i=1}^n\big[K_h(U_i - u) - E K_h(U_i - u)\big]. \quad (B.32)$$
It is now trivial to check that the assumptions of Chernozhukov et al. (2014, Proposition 3.1) hold, and the statement of the lemma follows.

Let us now go back to the proof of Theorem 2. Use Lemma 7 with $\psi(u) = u$ and note that Lemma 6 and Chernozhukov et al. (2014, Remark 3.2) then imply
$$\sup_{t \in \mathbb R}\big|P(W_n \le t) - P(\tilde W_n \le t)\big| \to 0. \quad (B.33)$$
On the other hand, by Theorem 1 we have
$$\hat W_n = W_n + O_{a.s.}\big(h^{1/2} + h^{-1/2}n^{-1/4}\ell(n)\big). \quad (B.34)$$
Substituting (B.28) into this equation, we obtain
$$\hat W_n = \tilde W_n + O_p\big((nh)^{-1/6}\log n + h^{1/2} + h^{-1/2}n^{-1/4}\ell(n)\big). \quad (B.35)$$
Assumption 3 then implies $\hat W_n - \tilde W_n = o_p(\log^{-1/2} n)$. Chernozhukov et al. (2014, Remark 3.2) now implies
$$\sup_{t \in \mathbb R}\big|P(\hat W_n \le t) - P(\tilde W_n \le t)\big| \to 0. \quad (B.36)$$
Given (B.33) and (B.36), applying the triangle inequality finishes the proof.

B.2 Estimation and inference for counterfactuals, proofs

B.2.1 Proof of Theorem 3

First, write
$$\hat S(u^*) - S(u^*) = \int_{u^*}^1 \varphi(u)\big(\hat Q(u) - Q(u)\big)\,du \quad (B.37)$$
$$- A(u^*)\psi(u^*)\big(\hat Q(u^*) - Q(u^*)\big) + A(1)\psi(1)\big(\hat Q(1) - Q(1)\big). \quad (B.38)$$
Using the classical BK expansion (2.23), we obtain
$$\hat S(u^*) - S(u^*) = -\int_{u^*}^1 \varphi(u)q(u)\big[\hat F(Q(u)) - u\big]\,du \quad (B.39)$$
$$+ A(u^*)\psi(u^*)q(u^*)\big[\hat F(Q(u^*)) - u^*\big] + R_n(u^*), \quad (B.40)$$
where the composite error term
$$R_n(u^*) = \int_{u^*}^1 \varphi(u)r_n(u)\,du - A(u^*)\psi(u^*)r_n(u^*) + A(1)\psi(1)\big(\hat Q(1) - Q(1)\big) \quad (B.41)$$
$$= O_{a.s.}\big(n^{-3/4}\ell(n)\big), \quad (B.42)$$
uniformly in $u^* \in [0,1]$. The latter rate follows from (2.23) and the fact that $\hat Q(1) - Q(1) = O_{a.s.}(n^{-1})$.
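The plug-in construction behind $\hat S(u^*)$ can be illustrated on a deliberately simplified linear quantile functional, taking $\varphi \equiv 1$ and dropping the boundary terms; these simplifications, and the uniform design, are assumptions made here for illustration only.

```python
import random

def S_hat(sample, u_star):
    """Plug-in estimate of the linear quantile functional
    S(u*) = integral_{u*}^1 Q(u) du, computed as a Riemann sum
    over the empirical quantile function (weight phi == 1)."""
    b = sorted(sample)
    n = len(b)
    k = int(u_star * n)
    return sum(b[k:]) / n   # approximates the integral of Q_hat over [u*, 1]

random.seed(1)
n = 10_000
x = [random.random() for _ in range(n)]   # U(0,1): Q(u) = u, so S(0.5) = 0.375
s = S_hat(x, 0.5)
```

Because the estimator averages $n(1 - u^*)$ order statistics, its error is of order $n^{-1/2}$, consistent with the $\sqrt n$ rate established for $\hat S(u^*)$ in Theorem 3.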
Denoting $U_i = F(b_i)$, we can write
$$\sqrt n\big(\hat S(u^*) - S(u^*)\big) = \frac1{\sqrt n}\sum_{i=1}^n\big[f_{u^*}(U_i) - E f_{u^*}(U_i)\big] + O_{a.s.}\big(n^{-1/4}\ell(n)\big). \quad (B.43)$$
Let us show that the class $\{f_{u^*} \mid u^* \in [0,1]\}$ is Donsker. Since the sum of a finite number of Donsker classes is Donsker (see Alexander, 1987), and also a constant times a Donsker class is Donsker, it suffices to show that the following (uniformly bounded) classes are Donsker,
$$\mathcal H = \Big\{h_{u^*}: U \mapsto \int_{u^*}^1 \varphi(u)q(u)\mathbb 1(U \le u)\,du \ \Big|\ u^* \in [0,1]\Big\}, \quad (B.44)$$
$$\mathcal G = \big\{g_{u^*}: U \mapsto A(u^*)\psi(u^*)q(u^*)\mathbb 1(U \le u^*) \ \big|\ u^* \in [0,1]\big\}. \quad (B.45)$$
Indeed, for $u^*, v^* \in [0,1]$,
$$\big|h_{u^*}(U) - h_{v^*}(U)\big| = \left|\int_{\min(u^*,v^*)}^{\max(u^*,v^*)} \varphi(x)q(x)\mathbb 1(U \le x)\,dx\right| \le |u^* - v^*|\sup_{u \in [0,1]}|\varphi(u)q(u)|, \quad (B.46)$$
i.e. $\mathcal H$ is Lipschitz in parameter. Its Donskerness follows by Theorem 2.7.11 and Theorem 2.5.6 in van der Vaart and Wellner (1996). On the other hand, $\mathcal G$ is Donsker since it is a product of the set of constant functions $U \mapsto A(u^*)\psi(u^*)q(u^*)$ (which is trivially Donsker) and the VC class $\{\mathbb 1(\cdot \le u^*),\ u^* \in [0,1]\}$.

B.2.2 Proof of Theorem 4

We have
$$\sqrt{nh}\big(\hat T_h(u^*) - T(u^*)\big) = \varphi(u^*)\sqrt{nh}\big(\hat v_h(u^*) - v(u^*)\big) + \sqrt{nh}\big(\hat S(u^*) - S(u^*)\big) \quad (B.47)$$
$$= \varphi(u^*)A(u^*)\hat q_h(u^*)\big(G_{n,h}(u^*) + R_n(u^*)\big) + O_p\big(h^{1/2}\big) \quad (B.48)$$
$$= \varphi(u^*)A(u^*)\hat q_h(u^*)G_{n,h}(u^*) + O_p\big(n^{1/2}h^{3/2} + h^{1/2} + h^{-1/2}n^{-1/4}\ell(n)\big), \quad (B.49)$$
uniformly in $u^* \in [h, 1-h]$, where the last two equations use Theorem 1 and Theorem 3. Dividing by $\hat q_h(u^*)$, which is bounded away from zero w.p.a. 1, finishes the proof.

B.2.3 Proof of Theorem 5

Use Lemma 7 with $\psi(u) = u\varphi(u)$ and note that Lemma 6 and Remark 3.2 in Chernozhukov et al. (2014) then imply
$$\sup_{t \in \mathbb R}\big|P(W_n^T \le t) - P(\tilde W_n^T \le t)\big| \to 0. \quad (B.50)$$
On the other hand, by Theorem 4 we have
$$\hat W_n^T = W_n^T + O_{a.s.}\big(h^{1/2} + h^{-1/2}n^{-1/4}\ell(n)\big). \quad (B.51)$$
Substituting (B.28) into this equation, we obtain
$$\hat W_n^T = \tilde W_n^T + O_p\big((nh)^{-1/6}\log n + h^{1/2} + h^{-1/2}n^{-1/4}\ell(n)\big). \quad (B.52)$$
Under the assumption of the theorem, $h$ decays polynomially, and hence $\hat W_n^T - \tilde W_n^T = o_p(\log^{-1/2} n)$. Remark 3.2 of Chernozhukov et al. (2014) now implies
$$\sup_{t \in \mathbb R}\big|P(\hat W_n^T \le t) - P(\tilde W_n^T \le t)\big| \to 0. \quad (B.53)$$
Given (B.50) and (B.53), applying the triangle inequality finishes the proof.

Appendix C

Appendices to Chapter 3

C.1 Bahadur-Kiefer representation, proofs

C.1.1 Auxiliary results for generic IVQR estimators

Lemma 8. Under Assumptions 5 and 6, $g_\tau(\beta)$ is three times continuously differentiable in $\beta$.

Proof. By definition, $g_\tau(\beta) = E\big[(\mathbb 1\{Y \le W'\beta\} - \tau)Z\big] = E\big[(F_Y(W'\beta \mid W, Z) - \tau)Z\big]$. The result then follows from the dominated convergence theorem.

Lemma 9. Suppose Assumptions 5 and 6 hold. Then for any estimator $\hat\beta_\tau$ such that $\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat\beta_\tau - \beta_{0\tau}\| = O_p(r_n^{-1})$ for some sequence $r_n \to \infty$, we have a representation
$$\hat g_\tau(\hat\beta_\tau) = \frac1{\sqrt n}\mathbb B_n(\beta_{0\tau}) + \tau(EZ - E_n Z) + \frac1{\sqrt n}\big(\mathbb B_n(\hat\beta_\tau) - \mathbb B_n(\beta_{0\tau})\big) + G_\tau(\beta_{0\tau})(\hat\beta_\tau - \beta_{0\tau}) + \frac12(\hat\beta_\tau - \beta_{0\tau})'\,\partial_\beta G_\tau(\beta_{0\tau})\,(\hat\beta_\tau - \beta_{0\tau}) + O_p\Big(\frac1{r_n^3}\Big), \quad (C.1)$$
where $\mathbb B_n(\beta) := \sqrt n(E_n - E)\big[Z\,\mathbb 1\{Y \le W'\beta\}\big]$ and the remainder rate is uniform in $\tau \in [\varepsilon,1-\varepsilon]$.

Proof. By definition,
$$\hat g_\tau(\hat\beta_\tau) = E_n\mathbb 1\{Y \le W'\hat\beta_\tau\}Z - \tau E_n Z = \frac1{\sqrt n}\mathbb B_n(\hat\beta_\tau) + g_\tau(\hat\beta_\tau) + \tau(EZ - E_n Z)$$
$$= \frac1{\sqrt n}\mathbb B_n(\beta_{0\tau}) + \frac1{\sqrt n}\big(\mathbb B_n(\hat\beta_\tau) - \mathbb B_n(\beta_{0\tau})\big) + \tau(EZ - E_n Z) + g_\tau(\hat\beta_\tau).$$
By Lemma 8, $g_\tau(\beta)$ is three times continuously differentiable. Since $\beta$ is restricted to a compact set $\mathcal B$, the norm of the third derivative is bounded on $\mathcal B$. The Taylor theorem implies that there exists a neighborhood of $\beta_{0\tau}$ such that for any $\beta$ in the neighborhood,
$$g_\tau(\beta) = G_\tau(\beta_{0\tau})(\beta - \beta_{0\tau}) + \frac12(\beta - \beta_{0\tau})'\,\partial_\beta G_\tau(\beta_{0\tau})\,(\beta - \beta_{0\tau}) + R(\beta),$$
where $R(\beta) = O(\|\beta - \beta_{0\tau}\|^3)$ uniformly in $\tau$. Then (C.1) follows immediately because $\hat\beta_\tau$ is a uniformly consistent estimator.

Now let us study the large sample behavior of the term $\mathbb B_n(\hat\beta_\tau) - \mathbb B_n(\beta_{0\tau})$ in (C.1).

Lemma 10. Suppose that Assumptions 5 and 6 hold. For any pair of estimators $\hat\beta_\tau$ and $\tilde\beta_\tau$ such that $\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat\beta_\tau - \tilde\beta_\tau\| = O_p(r_n^{-1})$ for some sequence $r_n \to \infty$, we have
$$\mathbb B_n(\hat\beta_\tau) - \mathbb B_n(\tilde\beta_\tau) = O_p\left(\sqrt{\frac{\log r_n}{r_n}}\right) + O_p\left(\frac{\log r_n}{\sqrt n}\right) \text{ uniformly in } \tau \in [\varepsilon, 1-\varepsilon].$$

Proof. The proof relies on the arguments in Ota et al. (2019, Lemma 3) adapted to our setting. The idea is to verify the conditions of Lemma 1 of Ota et al. (2019), which follows from Corollary 5.1 in Chernozhukov et al. (2014), and use this corollary to prove the desired result.
Since sup 2[";1"] k ^ ^ k = O p r 1 n , we have P sup 2[";1"] k ^ ^ k6M n =r n ! 1 for any sequence M n !1. Consider the functions f ; : (y;w;z)7! 1fyw 0 6 0g 0 z; f ;h; : (y;w;z)7! 1fyw 0 6w 0 hg 1fyw 0 6 0g 0 z; that constitute function classes F = f ; : 2 ;kk = 1 ; F n = f ;h; : 2 ;khk6 M n r n ;kk = 1 ; where M n is a sequence such that M n !1 and M n =r n ! 0. LetG n be the standard empirical process operator onF[F n with the data (Y i ;W i ;Z i ), i = 1;:::;n. Note thatG n f ;e j =B n ( ^ ) 0 e j , where e j := (0;::: 0; 1; 0;::: 0) 0 with 1 in j-th position. By Assumption 6, bothF andF n admit a constant envelope F (y;w;z) 2m. Let us now verify the conditions in Lemma 1 of Ota et al. (2019). First, since 1 YW 0 6W 0 h 1 YW 0 6 0 2 = 1 min(0;W 0 h)<YW 0 6 max(0;W 0 h) ; we obtain Ef 2 ;h; (Y;W;Z) =E 1 YW 0 6W 0 h 1 YW 0 6 0 0 Z 2 6m 2 E1 min(0;W 0 h)<YW 0 6 max(0;W 0 h) =m 2 E E F Y (W 0 h +W 0 jW;Z)F Y (W 0 jW;Z) W;Z 6m 2 E E jW 0 hj sup y f Y (yjW;Z) W;Z 6m 3 fkhk =O M n r n ; where we used Assumption 5. Therefore, the variance parameter of the process is 2 n := sup f2Fn Ef 2 (Y;W;Z) =O M n r n : Second, we haveE max 16i6n F 2 (Y i ;W i ;Z i ) = (2m) 2 = constant. 93 Third, because the function classF is a VC class with the constant envelope m, there exist constants A and V independent of n such that the standard entropy bound sup Q N (F n ;kk Q;2 ;m)6 (A=) V ; for all 2 (0; 1] holds (e.g., Van Der Vaart and Wellner, 1996, Section 2.6). Here the supremum is taken with respect to all finitely discrete measures Q andkk Q;2 is the L 2 (Q) norm. Finally, applying Lemma 1 in Ota et al. (2019), we obtain E sup ;h; kG n f ;h; k. p V 2 n log(Am= n ) + Vm p n log(Am= n ) =O r logr n r n ! +O logr n p n ; (C.2) where the last equality holds by choosingM n !1 sufficiently slowly. Note that the right hand side of this equation does not depend on . 
Consequently, by the definition of the norm and equation (C.2),
$$\sup_{\tau \in [\varepsilon,1-\varepsilon]}\big\|\mathbb B_n(\hat\beta_\tau) - \mathbb B_n(\tilde\beta_\tau)\big\| = \sup_{\tau \in [\varepsilon,1-\varepsilon]}\max_{\|\gamma\| = 1}\big|\mathbb G_n f_{\hat\beta_\tau,\gamma} - \mathbb G_n f_{\tilde\beta_\tau,\gamma}\big| = O_p\left(\sqrt{\frac{\log r_n}{r_n}}\right) + O_p\left(\frac{\log r_n}{\sqrt n}\right),$$
where the last equality holds by Markov's inequality.

C.1.2 Auxiliary results for exact estimators

Lemma 11. Under Assumptions 4.1, 5 and 6, $\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat\beta_\tau - \beta_{0\tau}\| = o_p(1)$, where $\hat\beta_\tau = \hat\beta_\tau^{\ell_p}$ for any $p \in [1, \infty]$ or $\hat\beta_\tau = \hat\beta_{\tau,QR}$.

Proof. We give the proof for $\hat\beta_\tau = \hat\beta_\tau^{\ell_p}$. Uniform consistency for the case of $\hat\beta_\tau = \hat\beta_{\tau,QR}$ was established by Angrist et al. (2006, Theorem 3). By Assumption 4.1,
$$\arg\min_{\beta \in \mathcal B}\|g_\tau(\beta)\|_p = \beta_{0\tau}.$$
Assumptions 5 and 6 imply that the function class
$$(w, y, z) \mapsto z\big(\mathbb 1\{y \le w'\beta\} - \tau\big), \quad \tau \in [\varepsilon, 1-\varepsilon],\ \beta \in \mathcal B,$$
is Donsker and thus Glivenko-Cantelli, and hence(1)
$$\sup_{\beta \in \mathcal B,\ \tau \in [\varepsilon,1-\varepsilon]}\big|\hat g_\tau(\beta) - g_\tau(\beta)\big| = \sup_{\beta \in \mathcal B,\ \tau \in [\varepsilon,1-\varepsilon]}\big|(E_n - E)Z_i\big(\mathbb 1\{Y_i \le W_i'\beta\} - \tau\big)\big| \xrightarrow{a.s.} 0.$$
By the argmin theorem (Theorem 2.1 in Newey and McFadden, 1994), applied to $Q_n(\beta, \tau) := \|\hat g_\tau(\beta)\|_p$, we get $\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat\beta_\tau^{\ell_p} - \beta_{0\tau}\| = o_p(1)$.

(1) See Chernozhukov and Hansen (2006, Lemma B.2) for a more detailed discussion.

Lemma 12. Under Assumptions 4–6, for any exact QR estimator $\hat\beta_{\tau,QR}$ as defined in equation (3.2), we have
$$\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat\beta_{\tau,QR} - \beta_{0\tau}\| = O_p\Big(\frac1{\sqrt n}\Big), \quad (C.3)$$
$$\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat g_\tau(\hat\beta_{\tau,QR})\|_p = O_p\Big(\frac kn\Big).$$

Proof. The proof of (C.3) follows from Theorem 3 in Angrist et al. (2006). The exact QR estimators yield exact zeros of the subgradient
$$\frac1n\sum_{i=1}^n\big(\tau - h(Y_i - W_i'\hat\beta_{\tau,QR})\big)W_i,$$
where the multi-valued function $h(u)$ is defined as $\mathbb 1\{u < 0\}$ for $u \ne 0$ and $h(0) := [0, 1]$ for $u = 0$. The subgradient function differs from the sample moment functions by the fraction of observations with $Y_i = W_i'\hat\beta_{\tau,QR}$. For the case when the observations are in "general position" (Definition 2.1 in Koenker, 2005), there are at most $k$ terms like that, and so $\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat g_\tau(\hat\beta_{\tau,QR})\|_p = O_p(k/n)$.

Lemma 13. Under Assumptions 4, 5 and 6, for any estimator $\hat\beta_\tau = \hat\beta_\tau^{\ell_p}$ that minimizes $\|\hat g_\tau(\beta)\|_p$, we have
$$\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat\beta_\tau - \tilde\beta_\tau\| = o_p\Big(\frac1{\sqrt n}\Big), \qquad \sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat g_\tau(\hat\beta_\tau)\|_p = o_p\Big(\frac1{\sqrt n}\Big),$$
where $\tilde\beta_\tau$ is introduced in equation (3.4).

Proof.
The proof proceeds in four steps.

Step 1. Notice that under the assumptions of the lemma, the empirical process $\mathbb B_n(\beta)$ is Donsker and stochastically equicontinuous (see Chernozhukov and Hansen, 2006, Lemma B.2).

Step 2. By definition, $\tilde\beta_\tau$ can be written as
$$\tilde\beta_\tau = \beta_{0\tau} - G_\tau^{-1}\Big(\tau(EZ - E_n Z) + \frac1{\sqrt n}\mathbb B_n(\beta_{0\tau})\Big) = \beta_{0\tau} + O_p\Big(\frac1{\sqrt n}\Big), \quad (C.4)$$
where $\beta_{0\tau}$ and $G_\tau := \partial_\beta g_\tau(\beta_{0\tau})$ are well-defined by Assumption 4 and the functional CLT holds as an implication of Assumption 6. Here the remainder is uniform in $\tau \in [\varepsilon,1-\varepsilon]$. By Lemma 9 applied to $\tilde\beta_\tau$,
$$\hat g_\tau(\tilde\beta_\tau) = \frac1{\sqrt n}\mathbb B_n(\beta_{0\tau}) + \tau(EZ - E_n Z) + \frac1{\sqrt n}\big(\mathbb B_n(\tilde\beta_\tau) - \mathbb B_n(\beta_{0\tau})\big) + g_\tau(\beta_{0\tau}) + G_\tau(\beta_{0\tau})(\tilde\beta_\tau - \beta_{0\tau}) + \frac12(\tilde\beta_\tau - \beta_{0\tau})'\,\partial_\beta G_\tau(\beta_{0\tau})\,(\tilde\beta_\tau - \beta_{0\tau}) + O_p\big(n^{-3/2}\big).$$
Then after substituting the first equation in (C.4) into the term $G_\tau(\beta_{0\tau})(\tilde\beta_\tau - \beta_{0\tau})$, we have
$$\hat g_\tau(\tilde\beta_\tau) = \frac1{\sqrt n}\big(\mathbb B_n(\tilde\beta_\tau) - \mathbb B_n(\beta_{0\tau})\big) + O_p\Big(\frac1n\Big).$$
So by Step 1, $\hat g_\tau(\tilde\beta_\tau) = O_p\big(n^{-1/2}\big)$. Since $\hat\beta_\tau^{\ell_p}$ is defined as the estimator attaining the minimal $p$-norm,
$$\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat g_\tau(\hat\beta_\tau^{\ell_p})\|_p \le \sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat g_\tau(\tilde\beta_\tau)\|_p = O_p\big(n^{-1/2}\big).$$

Step 3. Consider $\tilde\beta_\tau^{(2)} := \tilde\beta_\tau - G_\tau^{-1}\big(\mathbb B_n(\tilde\beta_\tau) - \mathbb B_n(\beta_{0\tau})\big)/\sqrt n$. By equation (C.4), $\tilde\beta_\tau$ is uniformly consistent, and hence $\tilde\beta_\tau^{(2)} = \tilde\beta_\tau + o_p(1/\sqrt n)$, uniformly in $\tau \in [\varepsilon,1-\varepsilon]$, since $\mathbb B_n$ is stochastically equicontinuous (Step 1). Then by the stochastic equicontinuity of $\mathbb B_n$ (Step 1) and Lemma 9 applied to $\tilde\beta_\tau^{(2)}$, uniformly in $\tau \in [\varepsilon,1-\varepsilon]$,
$$\hat g_\tau(\tilde\beta_\tau^{(2)}) = \frac{\mathbb B_n(\tilde\beta_\tau^{(2)}) - \mathbb B_n(\tilde\beta_\tau)}{\sqrt n} + o_p\Big(\frac1{\sqrt n}\Big) = o_p\Big(\frac1{\sqrt n}\Big).$$
This implies
$$\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat g_\tau(\hat\beta_\tau^{\ell_p})\|_p \le \sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat g_\tau(\tilde\beta_\tau^{(2)})\|_p = o_p\big(n^{-1/2}\big).$$

Step 4. By Lemma 11, $\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat\beta_\tau^{\ell_p} - \beta_{0\tau}\| = O_p(r_n^{-1})$ for some $r_n \to \infty$. By Lemma 9 and Steps 1 and 2, $\hat\beta_\tau^{\ell_p}$ satisfies
$$G_\tau(\beta_{0\tau})(\hat\beta_\tau^{\ell_p} - \beta_{0\tau}) + \frac12(\hat\beta_\tau^{\ell_p} - \beta_{0\tau})'\,\partial_\beta G_\tau(\beta_{0\tau})\,(\hat\beta_\tau^{\ell_p} - \beta_{0\tau})$$
$$= \hat g_\tau(\hat\beta_\tau^{\ell_p}) - \frac1{\sqrt n}\mathbb B_n(\beta_{0\tau}) - \tau(EZ - E_n Z) - \frac1{\sqrt n}\big(\mathbb B_n(\hat\beta_\tau^{\ell_p}) - \mathbb B_n(\beta_{0\tau})\big) + O_p\Big(\frac1{r_n^3}\Big) = O_p\Big(\frac1{\sqrt n}\Big) + O_p\Big(\frac1{r_n^3}\Big), \quad (C.5)$$
uniformly in $\tau \in [\varepsilon,1-\varepsilon]$. By Assumption 4.2, we can multiply the last equation by $G_\tau^{-1}(\beta_{0\tau})$ and obtain
$$\hat\beta_\tau^{\ell_p} - \beta_{0\tau} + O_p\big(r_n^{-2}\big) = O_p\Big(\frac1{\sqrt n}\Big) + O_p\Big(\frac1{r_n^3}\Big),$$
which implies we can take $r_n = \sqrt n$ by a fixed point argument, which is discussed in detail in Step 3 of the proof of Lemma 14 below.
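The small-residual-norm phenomenon established in Steps 2 and 3 can be seen concretely in the simplest scalar just-identified case, where minimizing $\|\hat g_\tau(\beta)\|_p$ over the sample points reproduces a sample quantile and leaves a residual moment of order $1/n$. The design below ($W = Z = 1$, Gaussian outcome, search restricted to sample points) is an illustrative assumption, not the general IVQR setting.

```python
import random

random.seed(3)
n, tau = 1000, 1 / 3
y = sorted(random.gauss(0.0, 1.0) for _ in range(n))

def g_hat(beta):
    """Sample moment g_hat_tau(beta) = E_n[(1{Y <= beta} - tau) Z] with Z = W = 1."""
    return sum(1 for yi in y if yi <= beta) / n - tau

# With a scalar moment every p-norm gives the same minimizer, and a
# minimizer lies among the sample points: at beta = Y_(k) the moment
# equals k/n - tau, so the attained norm is at most 1/(2n).
bhat = min(y, key=lambda b: abs(g_hat(b)))
gmin = abs(g_hat(bhat))
```

The minimizer is the sample $\tau$-quantile, so `bhat` is close to $\Phi^{-1}(1/3) \approx -0.43$, and `gmin` is at most $1/(2n)$, in line with the $O_p(n^{-1/2})$ and sharper $O_p(\log n/n)$ bounds of Lemmas 13 and 14.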
By uniform consistency of $\hat\beta_\tau^{\ell_p}$ and Step 1,
$$\sup_{\tau \in [\varepsilon,1-\varepsilon]}\big\|\mathbb B_n(\hat\beta_\tau^{\ell_p}) - \mathbb B_n(\beta_{0\tau})\big\| = o_p(1).$$
Lemma 9 applied to $\hat\beta_\tau^{\ell_p}$ gives us
$$\hat\beta_\tau^{\ell_p} = \tilde\beta_\tau - G_\tau^{-1}(\beta_{0\tau})\,\frac12(\hat\beta_\tau^{\ell_p} - \beta_{0\tau})'\,\partial_\beta G_\tau(\beta_{0\tau})\,(\hat\beta_\tau^{\ell_p} - \beta_{0\tau}) + G_\tau^{-1}(\beta_{0\tau})\hat g_\tau(\hat\beta_\tau^{\ell_p}) - \frac1{\sqrt n}G_\tau^{-1}(\beta_{0\tau})\big(\mathbb B_n(\hat\beta_\tau^{\ell_p}) - \mathbb B_n(\beta_{0\tau})\big) + O_p\big(n^{-3/2}\big).$$
The term $G_\tau^{-1}(\beta_{0\tau})\,\frac12(\hat\beta_\tau^{\ell_p} - \beta_{0\tau})'\,\partial_\beta G_\tau(\beta_{0\tau})\,(\hat\beta_\tau^{\ell_p} - \beta_{0\tau})$ is $O_p(n^{-1})$. The term $G_\tau^{-1}(\beta_{0\tau})\hat g_\tau(\hat\beta_\tau^{\ell_p})$ is $o_p(n^{-1/2})$ by Step 3. The term $n^{-1/2}G_\tau^{-1}(\beta_{0\tau})\big(\mathbb B_n(\hat\beta_\tau^{\ell_p}) - \mathbb B_n(\beta_{0\tau})\big)$ is $o_p(n^{-1/2})$ by stochastic equicontinuity of $\mathbb B_n$ (Step 1) and uniform consistency of $\hat\beta_\tau^{\ell_p}$. Therefore, $\hat\beta_\tau^{\ell_p} = \tilde\beta_\tau + o_p\big(n^{-1/2}\big)$ uniformly in $\tau \in [\varepsilon,1-\varepsilon]$.

In fact, the results of the previous lemma can be further refined.

Lemma 14. Under Assumptions 4–6, for any estimator $\hat\beta_\tau^{\ell_p}$ that minimizes $\|\hat g_\tau(\beta)\|_p$, we have
$$\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat g_\tau(\hat\beta_\tau^{\ell_p})\|_p = O_p\Big(\frac{\log n}{n}\Big).$$

Proof. The proof proceeds in three steps.

Step 1. By Lemmas 10 and 13 applied to $\hat\beta_\tau^{\ell_p}$ and $\beta_{0\tau}$,
$$\mathbb B_n(\hat\beta_\tau^{\ell_p}) - \mathbb B_n(\beta_{0\tau}) = O_p\left(\sqrt{\frac{\log\sqrt n}{\sqrt n}}\right) + O_p\left(\frac{\log\sqrt n}{\sqrt n}\right) \text{ uniformly in } \tau \in [\varepsilon,1-\varepsilon].$$
As a result,
$$\mathbb B_n(\hat\beta_\tau^{\ell_p}) - \mathbb B_n(\beta_{0\tau}) = O_p\big(\sqrt{\log n}\,n^{-1/4}\big) \text{ uniformly in } \tau \in [\varepsilon,1-\varepsilon].$$

Step 2. Consider the estimator
$$\tilde\beta_\tau^{(2)} := \tilde\beta_\tau - \frac{G_\tau^{-1}(\beta_{0\tau})\big(\mathbb B_n(\hat\beta_\tau^{\ell_p}) - \mathbb B_n(\beta_{0\tau})\big)}{\sqrt n},$$
where, by Step 1, $G_\tau^{-1}(\beta_{0\tau})\big(\mathbb B_n(\hat\beta_\tau^{\ell_p}) - \mathbb B_n(\beta_{0\tau})\big)/\sqrt n = O_p\big(n^{-3/4}\sqrt{\log n}\big)$. By Lemma 9, we get
$$\hat g_\tau(\tilde\beta_\tau^{(2)}) = \frac1{\sqrt n}\mathbb B_n(\beta_{0\tau}) + \tau(EZ - E_n Z) + \frac1{\sqrt n}\big(\mathbb B_n(\tilde\beta_\tau^{(2)}) - \mathbb B_n(\beta_{0\tau})\big) + G_\tau(\beta_{0\tau})(\tilde\beta_\tau^{(2)} - \beta_{0\tau}) + \frac12(\tilde\beta_\tau^{(2)} - \beta_{0\tau})'\,\partial_\beta G_\tau(\beta_{0\tau})\,(\tilde\beta_\tau^{(2)} - \beta_{0\tau}) + O_p\Big(\frac1{n^{3/2}}\Big). \quad (C.6)$$
Then, by definition of $\tilde\beta_\tau^{(2)}$,
$$\hat g_\tau(\tilde\beta_\tau^{(2)}) = \frac{\mathbb B_n(\tilde\beta_\tau^{(2)}) - \mathbb B_n(\hat\beta_\tau^{\ell_p})}{\sqrt n} + \frac12(\tilde\beta_\tau^{(2)} - \beta_{0\tau})'\,\partial_\beta G_\tau(\beta_{0\tau})\,(\tilde\beta_\tau^{(2)} - \beta_{0\tau}) + O_p\Big(\frac1{n^{3/2}}\Big). \quad (C.7)$$
Define $r_n$ to be a sequence satisfying $\sup_{\tau \in [\varepsilon,1-\varepsilon]}\|\hat\beta_\tau^{\ell_p} - \tilde\beta_\tau^{(2)}\| = O_p(r_n^{-1})$ (Lemma 13 implies uniform consistency of $\hat\beta_\tau^{\ell_p}$ and that $r_n$ can be taken to be at least $\sqrt n$). By Lemma 10,
$$\mathbb B_n(\tilde\beta_\tau^{(2)}) - \mathbb B_n(\hat\beta_\tau^{\ell_p}) = O_p\left(\sqrt{\frac{\log r_n}{r_n}}\right) + O_p\left(\frac{\log r_n}{\sqrt n}\right).$$
Then (C.7) becomes
$$\hat g_\tau(\tilde\beta_\tau^{(2)}) = O_p\left(\frac{\sqrt{\log n}}{\sqrt{n r_n}}\right) + O_p\Big(\frac{\log n}{n}\Big), \quad (C.8)$$
where we replaced $\log r_n$ with the faster growing sequence $\log n$.
By Lemma 9 applied to $\hat\theta_\tau^{\ell_p}$ and the definition of $\hat\theta_\tau$,
$$\hat\theta_\tau^{\ell_p} = \hat\theta_\tau + G_\tau^{-1}\hat g_\tau(\hat\theta_\tau^{\ell_p}) - \frac{G_\tau^{-1}\big[\mathbb{B}_n(\hat\theta_\tau^{\ell_p}) - \mathbb{B}_n(\theta_0(\tau))\big]}{\sqrt{n}} - G_\tau^{-1}\,\frac{1}{2}\big(\hat\theta_\tau^{\ell_p} - \theta_0(\tau)\big)'\,\partial_\theta G_\tau\,\big(\hat\theta_\tau^{\ell_p} - \theta_0(\tau)\big) + O_p\Big(\frac{1}{n^{3/2}}\Big).$$
So, by (C.6) and the definition of $\hat\theta_\tau^{(2)}$, we get
$$\hat\theta_\tau^{\ell_p} - \hat\theta_\tau^{(2)} = G_\tau^{-1}\hat g_\tau(\hat\theta_\tau^{\ell_p}) + O_p\big(n^{-1}\big),$$
which implies that we can take $r_n^{-1}$ to be the rate of $\hat g_\tau(\hat\theta_\tau^{\ell_p}) = O_p(r_n^{-1})$.

Step 3. Note that, by the definition of $\hat\theta_\tau^{\ell_p}$, $\sup_{\tau\in[\varepsilon,1-\varepsilon]}\|\hat g_\tau(\hat\theta_\tau^{\ell_p})\|_p \le \sup_{\tau\in[\varepsilon,1-\varepsilon]}\|\hat g_\tau(\hat\theta_\tau^{(2)})\|_p$. Then, from (C.8), we obtain
$$\sup_{\tau\in[\varepsilon,1-\varepsilon]} \big\|\hat g_\tau(\hat\theta_\tau^{\ell_p})\big\|_p \le \sup_{\tau\in[\varepsilon,1-\varepsilon]} \big\|\hat g_\tau(\hat\theta_\tau^{(2)})\big\|_p = O_p\Big(\frac{\sqrt{\log n}}{\sqrt{n\, r_n}}\Big) + O_p\Big(\frac{\log n}{n}\Big).$$
On the right-hand side of this inequality, suppose first that the first term dominates the second. Then we have
$$\big(r_n^{-1}\big)^{1/2} = O_p\Big(\frac{\sqrt{\log n}}{\sqrt{n}}\Big), \qquad \text{or, equivalently,} \qquad r_n^{-1} = O\Big(\frac{\log n}{n}\Big).$$
By Step 2, this implies the statement of the lemma. Suppose, instead, that the second term dominates the first. Then, by Step 2, $r_n^{-1} = O(n^{-1}\log n)$. The statement of the lemma follows.

C.1.3 Proof of Theorem 6

By Lemma 12 or 13, $\hat\theta - \theta_0(\tau) = O_p(1/\sqrt{n})$ uniformly in $\tau\in[\varepsilon,1-\varepsilon]$, where $\hat\theta$ denotes either of the estimators covered by the theorem. By Lemma 9 applied to $\hat\theta$, uniformly in $\tau\in[\varepsilon,1-\varepsilon]$,
$$\hat\theta - G_\tau^{-1}\hat g_\tau(\hat\theta) = \hat\theta_\tau - G_\tau^{-1}\Bigg[\frac{\mathbb{B}_n(\hat\theta) - \mathbb{B}_n(\theta_0(\tau))}{\sqrt{n}} + \frac{1}{2}\big(\hat\theta - \theta_0(\tau)\big)'\,\partial_\theta G_\tau\,\big(\hat\theta - \theta_0(\tau)\big)\Bigg] + O_p\Big(\frac{1}{n^{3/2}}\Big).$$
The result in equation (3.6) follows from Lemma 10. Under Assumptions 4–6, Lemma 12 yields the first equation in (3.5). Similarly, Lemma 14 yields the second equation in (3.5). To complete the proof, notice that, uniformly in $\tau\in[\varepsilon,1-\varepsilon]$,
$$\big(\hat\theta - \theta_0(\tau)\big)'\,\partial_\theta G_\tau\,\big(\hat\theta - \theta_0(\tau)\big) = \big(\hat\theta_\tau - \theta_0(\tau)\big)'\,\partial_\theta G_\tau\,\big(\hat\theta_\tau - \theta_0(\tau)\big) + O_p\big(\sqrt{\log n}\; n^{-5/4}\big).$$

C.2 Second-order bias correction, proofs

C.2.1 Auxiliary results

Lemma 15. Consider $\hat\theta = \hat\theta_\tau^{\ell_p}$ obtained from program (3.3) for some $p\in[1,\infty]$, or $\hat\theta = \hat\theta_{\tau,QR}$, where $\tau\in(0,1)$. Under Assumptions 4, 5, and 6, we have
$$\mathbb{E}\,\frac{1}{\sqrt{n}}\,\mathbb{B}_n(\hat\theta) = \mathbb{E}\Bigg(\frac{\hat{\bar g}_\tau(\hat\theta) + \hat{\underline g}_\tau(\hat\theta)}{2}\Bigg) + \frac{1}{n}\,\Delta(\tau) + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big) + O\Big(\frac{1}{n^2}\Big),$$
where
$$\Delta(\tau) := \mathbb{E}\Big[\Big(\tau - \frac{1}{2}\Big)\, f_\varepsilon(0\mid W,Z)\, Z W' G_\tau^{-1} Z\Big],$$
and $\hat{\bar g}_\tau(\theta) := \mathbb{E}_n[(1\{Y \le W'\theta\} - \tau)Z]$ and $\hat{\underline g}_\tau(\theta) := \mathbb{E}_n[(1\{Y \ge W'\theta\} - (1-\tau))Z]$ denote the two versions of the sample moment function.

Proof. The proof proceeds in six steps.

Step 1.
Note that
$$\frac{1}{\sqrt{n}}\,\mathbb{E}\,\mathbb{B}_n(\hat\theta) = \frac{1}{\sqrt{n}}\,\mathbb{E}\big[\mathbb{B}_n(\hat\theta) - \mathbb{B}_n(\theta_0(\tau))\big] = \mathbb{E}\,1\{Y \le W'\hat\theta\}Z - \mathbb{E}\,g_\tau(\hat\theta), \qquad \text{(C.9)}$$
where the first equality uses $\mathbb{E}\,\mathbb{B}_n(\theta_0(\tau)) = 0$; in the last expression we abbreviate $g_\tau(\theta) := \mathbb{E}\,1\{Y \le W'\theta\}Z$ (the centering term $\tau\mathbb{E}Z$ is common to both terms and cancels), and the outer expectation in $\mathbb{E}\,g_\tau(\hat\theta)$ is taken over the distribution of $\hat\theta$. Theorem 6 implies
$$\hat\theta = \theta_0(\tau) - \frac{1}{n}\, G_\tau^{-1} \sum_{i=1}^n \big(1\{Y_i \le W_i'\theta_0(\tau)\} - \tau\big)\, Z_i + \tilde R_n, \qquad \text{(C.10)}$$
where $\tilde R_n = O_p\big(n^{-3/4}\sqrt{\log n}\big)$. Since, by construction, $\hat\theta$ is restricted to a compact set $\Theta$, it is bounded. The term $(1\{Y_i \le W_i'\theta_0(\tau)\} - \tau)Z_i$ has bounded support by Assumption 6. As a result, the remainder term $\tilde R_n$ has bounded support, which implies that $\tilde R_n$ has finite moments of all orders.

Step 2. Define $\hat\varepsilon_i := Y_i - W_i'\hat\theta$ and split the first term in equation (C.9) as follows:
$$\mathbb{E}\,1\{Y_i \le W_i'\hat\theta\}\, Z_i = \mathbb{E}\,1\{\hat\varepsilon_i = 0\}\, Z_i + \mathbb{E}\,1\{\hat\varepsilon_i < 0\}\, Z_i. \qquad \text{(C.11)}$$
We can use (C.10) to isolate the influence of observation $i$,
$$\delta_i := W_i' G_\tau^{-1} Z_i\,\big(\tau - 1\{Y_i \le W_i'\theta_0(\tau)\}\big).$$
Without loss of generality for i.i.d. data, we consider $i = 1$. The indicator $1\{\hat\varepsilon_1 < 0\}$ can be rewritten as
$$1\Big\{Y_1 < W_1'\hat\theta_{-1} + \frac{1}{n}\,\delta_1\Big\},$$
where
$$\hat\theta_{-1} := \hat\theta + \frac{1}{n}\, G_\tau^{-1}\big(1\{Y_1 \le W_1'\theta_0(\tau)\} - \tau\big)\, Z_1 = \theta_0(\tau) - \frac{1}{n}\, G_\tau^{-1} \sum_{j=2}^n \big(1\{Y_j \le W_j'\theta_0(\tau)\} - \tau\big)\, Z_j + \tilde R_n$$
is equal to $\hat\theta$ without the linear influence of the observation $i = 1$. Then, using Taylor's theorem (justified below equation (C.12)),
$$\mathbb{E}\,Z_1\, P\Big(Y_1 < W_1'\hat\theta_{-1} + \frac{1}{n}\,\delta_1 \;\Big|\; 1\{Y_1 \le W_1'\theta_0(\tau)\}, Z_1, W_1\Big) = \mathbb{E}\,Z_1\, P\big(Y_1 < W_1'\hat\theta_{-1} \mid 1\{Y_1 \le W_1'\theta_0(\tau)\}, Z_1, W_1\big) + \mathbb{E}\,\frac{1}{n}\, Z_1 \delta_1\, f_{Y_1}\big(W_1'\hat\theta_{-1} \mid 1\{Y_1 \le W_1'\theta_0(\tau)\}, Z_1, W_1\big) + O\Big(\frac{1}{n^2}\Big).$$
Note also that the first term in the Taylor expansion can be rewritten as
$$\mathbb{E}\,Z_1\, P\big(Y_1 < W_1'\hat\theta_{-1} \mid 1\{Y_1 \le W_1'\theta_0(\tau)\}, Z_1, W_1\big) = \mathbb{E}\,Z_1\, 1\{Y_1 < W_1'\hat\theta_{-1}\} = \mathbb{E}\,Z_1\, P\big(Y_1 < W_1'\hat\theta_{-1} \mid Z_1, W_1\big).$$
The use of Taylor's theorem here is justified by the following argument. The function $f_{Y_1}(y \mid \hat\theta_{-1}, \delta_1, W_1, Z_1)$ is measurable, since it can be defined as a limit of measurable functions (increments of a conditional CDF). As a result, for any non-negative measurable function $\varphi(W_1, Z_1)$ with finite expectation, the integral $\mathbb{E}\,\varphi(W_1, Z_1)\, f_{Y_1}(y \mid \hat\theta_{-1}, \delta_1, W_1, Z_1)$ exists (but may take infinite values).
By the law of iterated expectations,
$$\mathbb{E}\,\varphi(W_1, Z_1)\, f_{Y_1}(y \mid \hat\theta_{-1}, \delta_1, W_1, Z_1) = \mathbb{E}\,\varphi(W_1, Z_1)\, f_{Y_1}(y \mid W_1, Z_1)$$
(see Step 5 below for a detailed justification based on the Fubini–Tonelli theorem). By Assumption 5, $f_{Y_1}(y \mid W_1, Z_1)$ is uniformly bounded, and
$$\mathbb{E}\,\varphi(W_1, Z_1)\, f_{Y_1}(y \mid W_1, Z_1) \le \bar f\, \mathbb{E}\,\varphi(W_1, Z_1) < \infty. \qquad \text{(C.12)}$$
The same is true with the derivative of the density, $\partial f_{Y_1}$, in place of $f_{Y_1}$, by Assumption 5. Therefore, $P\big(f_{Y_1}(y \mid \hat\theta_{-1}, \delta_1, W_1, Z_1) = \infty\big) = 0$ and $P\big(\partial f_{Y_1}(y \mid \hat\theta_{-1}, \delta_1, W_1, Z_1) = \infty\big) = 0$, which justifies the Taylor expansion of the expectations of the conditional PDF above. By this property (a.s. smoothness of $f_{Y_1}(y \mid \hat\theta_{-1}, \delta_1, Z_1, W_1)$) and equation (C.10),
$$\mathbb{E}\,Z_1 \delta_1\, f_{Y_1}\big(W_1'\hat\theta_{-1} \mid \hat\theta_{-1}, W_1, \delta_1, Z_1\big) = \mathbb{E}\,Z_1 \delta_1\, f_{Y_1}\big(W_1'\theta_0(\tau) \mid W_1, Z_1, \delta_1\big) + \mathbb{E}\big[Z_1 \delta_1\, W_1'\big(\hat\theta_{-1} - \theta_0(\tau)\big)\, \partial f_{Y_1}(\xi \mid W_1, Z_1, \delta_1)\big],$$
where $\xi$ is some random variable that takes values between $W_1'\hat\theta_{-1}$ and $W_1'\theta_0(\tau)$. By the boundedness of $\hat\theta_{-1}$, $W_1$, and $Z_1$, and the fact that $\hat\theta_{-1} = \theta_0(\tau) + O_p(1/\sqrt{n})$, these expectations exist and the second term is of order $\mathbb{E}\,O_p(1/\sqrt{n})$.

By the definition of $\delta_1$, the first term can be rewritten as
$$\mathbb{E}\,Z_1 \delta_1\, f_{\varepsilon_1}(0 \mid W_1, Z_1, \delta_1) = -\,\mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, 1\{\varepsilon_1 \le 0\}\, f_{\varepsilon_1}(0 \mid W_1, Z_1, \delta_1) + \tau\, \mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, f_{\varepsilon_1}(0 \mid W_1, Z_1).$$
Finally, (C.11) becomes
$$\mathbb{E}\,Z_1\, P\big(Y_1 < W_1'\hat\theta_{-1} \mid Z_1, W_1\big) + \frac{\tau}{n}\,\mathbb{E}\,f_{\varepsilon_1}(0 \mid W_1, Z_1)\, Z_1 W_1' G_\tau^{-1} Z_1 + \mathbb{E}\,1\{\hat\varepsilon_1 = 0\}\, Z_1 - \frac{1}{n}\,\eta + \frac{1}{n}\,\mathbb{E}\big[Z_1 \delta_1 W_1'\big(\hat\theta_{-1} - \theta_0(\tau)\big)\, \partial f_{Y_1}(\xi \mid W_1, Z_1, \delta_1)\big] + O\Big(\frac{1}{n^2}\Big), \qquad \text{(C.13)}$$
where the term
$$\eta := \mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, 1\{\varepsilon_1 \le 0\}\, f_{\varepsilon_1}(0 \mid W_1, Z_1, \delta_1) \qquad \text{and} \qquad \frac{1}{n}\,\mathbb{E}\big[Z_1 \delta_1 W_1'\big(\hat\theta_{-1} - \theta_0(\tau)\big)\, \partial f_{Y_1}(\xi \mid W_1, Z_1, \delta_1)\big] = \mathbb{E}\,O_p\Big(\frac{1}{n^{3/2}}\Big).$$

Step 3. Now consider $\mathbb{E}\,g_\tau(\hat\theta)$, the second term in (C.9). Let $(Y_{n+1}, W_{n+1}, Z_{n+1})$ be a copy of $(Y, W, Z)$ that is independent of the sample $\{Y_i, W_i, Z_i\}_{i=1}^n$. Also define
$$\delta_{n+1,1} := \frac{1}{n}\, W_{n+1}' G_\tau^{-1} Z_1\,\big(\tau - 1\{Y_1 \le W_1'\theta_0(\tau)\}\big),$$
which satisfies $\mathbb{E}\,\delta_{n+1,1} = 0$.
Then
$$\mathbb{E}\,g_\tau(\hat\theta) = \mathbb{E}\,1\{Y_{n+1} \le W_{n+1}'\hat\theta\}\, Z_{n+1} = \mathbb{E}\,P\big(Y_{n+1} \le W_{n+1}'\hat\theta_{-1} + \delta_{n+1,1} \mid W_{n+1}, Z_{n+1}\big)\, Z_{n+1}$$
$$= \mathbb{E}\,P\big(Y_{n+1} < W_{n+1}'\hat\theta_{-1} \mid W_{n+1}, Z_{n+1}\big)\, Z_{n+1} + \mathbb{E}\,Z_{n+1}\,\delta_{n+1,1}\, f_{Y_{n+1}}\big(W_{n+1}'\theta_0(\tau) \mid 1\{Y_1 \le W_1'\theta_0(\tau)\}, Z_{n+1}, W_{n+1}\big) \qquad \text{(C.14)}$$
$$\quad + O\Big(\frac{1}{n^2}\Big) + \mathbb{E}\,O_p\Big(\frac{1}{n^{3/2}}\Big),$$
where the remainder rate is derived by an argument similar to the one below equation (C.12). Note that the term in line (C.14) is equal to zero, since $\mathbb{E}\,(\delta_{n+1,1} \mid Y_{n+1}, W_{n+1}, Z_{n+1}) = 0$ by the i.i.d. data assumption. Thus, combining this equality with (C.13) yields
$$\mathbb{E}\,1\{Y_1 \le W_1'\hat\theta\}\, Z_1 - \mathbb{E}\,g_\tau(\hat\theta) = \mathbb{E}\,Z_1\, P\big(Y_1 < W_1'\hat\theta_{-1} \mid W_1, Z_1\big) - \mathbb{E}\,Z_{n+1}\, P\big(Y_{n+1} < W_{n+1}'\hat\theta_{-1} \mid W_{n+1}, Z_{n+1}\big)$$
$$\quad + \frac{\tau}{n}\,\mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, f_{\varepsilon_1}(0 \mid W_1, Z_1) + \mathbb{E}\,1\{\hat\varepsilon_1 = 0\}\, Z_1 - \frac{1}{n}\,\eta + \mathbb{E}\,O_p\Big(\frac{1}{n^{3/2}}\Big) + O\Big(\frac{1}{n^2}\Big). \qquad \text{(C.15)}$$

Step 4. Let us simplify the first two terms of equation (C.15). Define
$$\hat\psi_{-1} := -\frac{1}{n}\, G_\tau^{-1} \sum_{j=2}^n \big(1\{\varepsilon_j \le 0\} - \tau\big)\, Z_j,$$
so that $\hat\psi_{-1}$ is a zero-mean random vector that is independent of $Y_1$, and $\hat\theta_{-1} = \theta_0(\tau) + \hat\psi_{-1} + \tilde R_n$. Denote $\tilde Y_1 := Y_1 - W_1'\hat\psi_{-1}$. Apply Taylor's theorem (as in Step 2) to obtain
$$\mathbb{E}\,Z_1\, P\big(Y_1 < W_1'\hat\theta_{-1} \mid W_1, Z_1, W_1'\tilde R_n\big) = \mathbb{E}\,Z_1\, P\big(\tilde Y_1 < W_1'(\theta_0(\tau) + \tilde R_n) \mid W_1, Z_1, W_1'\tilde R_n\big)$$
$$= \mathbb{E}\,Z_1\, P\big(\tilde Y_1 < W_1'\theta_0(\tau) \mid W_1, Z_1\big) + \mathbb{E}\,Z_1 W_1'\tilde R_n\, f_{\tilde Y_1}\big(W_1'\theta_0(\tau) \mid W_1, Z_1, \tilde R_n\big) + \frac{1}{2}\,\mathbb{E}\,Z_1\, \tilde R_n' W_1\, \partial f_{\tilde Y_1}(\xi \mid W_1, Z_1, \tilde R_n)\, W_1'\tilde R_n, \qquad \text{(C.16)}$$
where $\xi$ is a random scalar that takes values between $W_1'\theta_0(\tau)$ and $W_1'(\theta_0(\tau) + \tilde R_n)$. By Step 1, $\tilde R_n = O_p\big(n^{-3/4}\sqrt{\log n}\big)$ has bounded support and, hence, a finite second moment. Therefore, the last term in (C.16) is finite. For the second term in (C.16), note that
$$\tilde R_n\, f_{\tilde Y_1}\big(W_1'\theta_0(\tau) \mid W_1, Z_1, \tilde R_n\big) = \tilde R_n\, f_{\varepsilon_1}\big(W_1'\hat\psi_{-1} \mid W_1, Z_1, \tilde R_n\big) = \tilde R_n\, f_{\varepsilon_1}(0 \mid W_1, Z_1, \tilde R_n) + \partial f_{\varepsilon_1}(\tilde\xi \mid W_1, Z_1, \tilde R_n)\, \tilde R_n\, W_1'\hat\psi_{-1} = \tilde R_n\, f_{\varepsilon_1}(0 \mid W_1, Z_1, \tilde R_n) + O_p\big(\sqrt{\log n}\; n^{-5/4}\big),$$
where $\tilde\xi$ is a random scalar that takes values between $0$ and $W_1'\hat\psi_{-1}$. The last equality follows since $\tilde R_n = O_p\big(n^{-3/4}\sqrt{\log n}\big)$ by Step 1 and $\hat\psi_{-1} = O_p(1/\sqrt{n})$.
Hence, (C.16) becomes
$$\mathbb{E}\,Z_1\, P\big(Y_1 - W_1'\hat\psi_{-1} < W_1'\theta_0(\tau) \mid W_1, Z_1\big) + \mathbb{E}\,Z_1 W_1'\tilde R_n\, f_{\varepsilon_1}(0 \mid W_1, Z_1, \tilde R_n) + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big).$$
Similarly, using the i.i.d. assumption,
$$\mathbb{E}\,Z_{n+1}\, P\big(Y_{n+1} < W_{n+1}'\hat\theta_{-1} \mid W_{n+1}, Z_{n+1}\big) = \mathbb{E}\,Z_{n+1}\, P\big(Y_{n+1} - W_{n+1}'\hat\psi_{-1} < W_{n+1}'\theta_0(\tau) \mid W_{n+1}, Z_{n+1}\big) + \mathbb{E}\,Z_{n+1} W_{n+1}'\tilde R_n\, f_{\varepsilon_{n+1}}(0 \mid W_{n+1}, Z_{n+1}, \tilde R_n) + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big)$$
$$= \mathbb{E}\,Z_1\, P\big(Y_1 - W_1'\hat\psi_{-1} < W_1'\theta_0(\tau) \mid W_1, Z_1\big) + \mathbb{E}\big[f_{\varepsilon_1}(0 \mid W_1, Z_1)\, Z_1 W_1'\big]\,\mathbb{E}\,\tilde R_n + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big).$$
To summarize, (C.15) becomes
$$\mathbb{E}\,1\{Y_1 \le W_1'\hat\theta\}\, Z_1 - \mathbb{E}\,g_\tau(\hat\theta) = \frac{\tau}{n}\,\mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, f_{\varepsilon_1}(0 \mid W_1, Z_1) + \mathbb{E}\,1\{\hat\varepsilon_1 = 0\}\, Z_1 - \frac{1}{n}\,\eta + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big) + O\Big(\frac{1}{n^2}\Big) + \mathbb{E}\,Z_1 W_1'\big(\tilde R_n - \mathbb{E}\tilde R_n\big)\big(f_{\varepsilon_1}(0 \mid W_1, Z_1, \tilde R_n) - f_{\varepsilon_1}(0 \mid W_1, Z_1)\big). \qquad \text{(C.17)}$$

Step 5. Let us study the last term in equation (C.17). For any $t > 0$, consider the auxiliary function
$$\psi(t) := \mathbb{E}\,Z_1 W_1'\big(\tilde R_n - \mathbb{E}\tilde R_n\big)\big(F_{\varepsilon_1}(t \mid W_1, Z_1, \tilde R_n) - F_{\varepsilon_1}(t \mid W_1, Z_1)\big).$$
By definition (applying the law of iterated expectations to each term), for every $t > 0$,
$$\psi(t) = \mathbb{E}\,Z_1 W_1'\big(\tilde R_n - \mathbb{E}\tilde R_n\big)\,1\{0 < \varepsilon_1 \le t\} - \mathbb{E}\,Z_1 W_1'\big(\tilde R_n - \mathbb{E}\tilde R_n\big)\,1\{0 < \varepsilon_1 \le t\} = 0.$$
By the existence of the corresponding conditional PDFs (possibly taking infinite values),
$$\psi(t) = \mathbb{E}\,Z_1 W_1'\big(\tilde R_n - \mathbb{E}\tilde R_n\big) \int_{-\infty}^{t} \big(f_{\varepsilon_1}(e \mid W_1, Z_1, \tilde R_n) - f_{\varepsilon_1}(e \mid W_1, Z_1)\big)\, de.$$
By the Fubini–Tonelli theorem for product measures, we can exchange the order of integration:
$$\psi(t) = \int_{-\infty}^{t} \mathbb{E}\,Z_1 W_1'\big(\tilde R_n - \mathbb{E}\tilde R_n\big)\big(f_{\varepsilon_1}(e \mid W_1, Z_1, \tilde R_n) - f_{\varepsilon_1}(e \mid W_1, Z_1)\big)\, de.$$
Hence, by the fundamental theorem of calculus, for all $e > 0$,
$$\frac{\partial \psi(e)}{\partial e} = \mathbb{E}\,Z_1 W_1'\big(\tilde R_n - \mathbb{E}\tilde R_n\big)\big(f_{\varepsilon_1}(e \mid W_1, Z_1, \tilde R_n) - f_{\varepsilon_1}(e \mid W_1, Z_1)\big).$$
Since the function $\psi(t) \equiv 0$, we have
$$\frac{\partial \psi(0)}{\partial e} = \mathbb{E}\,Z_1 W_1'\big(\tilde R_n - \mathbb{E}\tilde R_n\big)\big(f_{\varepsilon_1}(0 \mid W_1, Z_1, \tilde R_n) - f_{\varepsilon_1}(0 \mid W_1, Z_1)\big) = 0.$$
Therefore, equation (C.17) becomes
$$\mathbb{E}\,1\{Y_1 \le W_1'\hat\theta\}\, Z_1 - \mathbb{E}\,g_\tau(\hat\theta) = \frac{\tau}{n}\,\mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, f_{\varepsilon_1}(0 \mid W_1, Z_1) + \mathbb{E}\,1\{\hat\varepsilon_1 = 0\}\, Z_1 - \frac{1}{n}\,\eta + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big) + O\Big(\frac{1}{n^2}\Big). \qquad \text{(C.18)}$$

Step 6. Let us simplify the second and the third terms in equation (C.18).
The left-hand side of (C.18) can alternatively be evaluated by applying the argument of Steps 2–5 to the mirrored moment function, in which $(Y, W'\theta, \tau)$ is replaced by $(-Y, -W'\theta, 1-\tau)$:
$$\mathbb{E}\,1\{Y_1 \le W_1'\hat\theta\}\, Z_1 - \mathbb{E}\,1\{Y_{n+1} \le W_{n+1}'\hat\theta\}\, Z_{n+1} \qquad \text{(C.19)}$$
$$= \mathbb{E}\,1\{\hat\varepsilon_1 = 0\}\, Z_1 - \big[\mathbb{E}\,1\{Y_1 \ge W_1'\hat\theta\}\, Z_1 - \mathbb{E}\,1\{Y_{n+1} \ge W_{n+1}'\hat\theta\}\, Z_{n+1}\big] \qquad \text{(C.20)}$$
$$= \mathbb{E}\,1\{\hat\varepsilon_1 = 0\}\, Z_1 - \Big[\frac{1-\tau}{n}\,\mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, f_{\varepsilon}(0 \mid W_1, Z_1) + \mathbb{E}\,1\{\hat\varepsilon_1 = 0\}\, Z_1 - \frac{1}{n}\,\bar\eta + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big) + O\big(n^{-2}\big)\Big]$$
$$= \frac{\tau - 1}{n}\,\mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, f_{\varepsilon}(0 \mid W_1, Z_1) + \frac{1}{n}\,\bar\eta + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big) + O\big(n^{-2}\big), \qquad \text{(C.21)}$$
where
$$\bar\eta := \mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, 1\{\varepsilon_1 \le 0\}\, f_{\varepsilon_1}(0 \mid W_1, Z_1, 1\{\varepsilon_1 \le 0\}),$$
and the third equality applies (C.18) to the mirrored problem, for which the quantile level is $1-\tau$, the residual is $-\varepsilon$, and the Jacobian is $-G_\tau$, so that the sign of the density term flips while the atom term $\mathbb{E}\,1\{\hat\varepsilon_1 = 0\}Z_1$ is unchanged. Notice also that, since $\hat{\bar g}_\tau(\theta) + \hat{\underline g}_\tau(\theta) = \mathbb{E}_n\,1\{Y = W'\theta\}Z$ and the data are exchangeable, summing the two versions of the sample moment function gives
$$\mathbb{E}\,1\{\hat\varepsilon_1 = 0\}\, Z_1 = \mathbb{E}\big(\hat{\bar g}_\tau(\hat\theta) + \hat{\underline g}_\tau(\hat\theta)\big). \qquad \text{(C.22)}$$
Using the definition of $f_{\varepsilon_1}(0 \mid W_1, Z_1, 1\{\varepsilon_1 \le 0\})$ and the Fubini–Tonelli theorem as in Step 5,
$$\bar\eta = \lim_{t\downarrow 0}\,\mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, 1\{\varepsilon_1 \le 0\}\,\frac{1\{\varepsilon_1 \le 0\} - 1\{\varepsilon_1 \le -t\}}{t} = \lim_{t\downarrow 0}\,\mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\,\frac{1\{\varepsilon_1 \le 0\} - 1\{\varepsilon_1 \le -t\}}{t} = \mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, f_{\varepsilon_1}(0 \mid W_1, Z_1).$$
Since $\varepsilon_1$ has a conditional density by Assumption 5, the same argument can be applied to show $\eta = \bar\eta$. Hence, equations (C.18) and (C.21) imply
$$\frac{1}{n}\,\eta = \frac{1}{2}\,\mathbb{E}\big(\hat{\bar g}_\tau(\hat\theta) + \hat{\underline g}_\tau(\hat\theta)\big) + \frac{1}{2n}\,\mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\, f_{\varepsilon_1}(0 \mid W_1, Z_1) + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big) + O\Big(\frac{1}{n^2}\Big).$$
Finally, equations (C.18) and (C.22) yield
$$\mathbb{E}\,\frac{1}{\sqrt{n}}\,\mathbb{B}_n(\hat\theta) = \frac{1}{n}\,\mathbb{E}\,Z_1 W_1' G_\tau^{-1} Z_1\Big(\tau - \frac{1}{2}\Big)\, f_{\varepsilon}(0 \mid W_1, Z_1) + \mathbb{E}\Bigg(\frac{\hat{\bar g}_\tau(\hat\theta) + \hat{\underline g}_\tau(\hat\theta)}{2}\Bigg) + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big) + O\Big(\frac{1}{n^2}\Big),$$
which completes the proof.

Lemma 16. Suppose that a function $f(x)$ is four times continuously differentiable in a neighborhood of $x$. Then, for sufficiently small $h\in\mathbb{R}$,
$$\partial_x f(x) = \frac{f(x+h) - f(x-h)}{2h} + O(h^2), \qquad \partial_{x,x} f(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} + O(h^2).$$

Proof. See Chapter 5 in Olver (2014) and p. 884 in Abramowitz and Stegun (1972).

C.2.2 Proofs of main results on bias correction

Proof of Theorem 7. By Theorem 6,
$$\hat\theta = \hat\theta_\tau + G_\tau^{-1}\Bigg[\hat g_\tau(\hat\theta) - \frac{\mathbb{B}_n(\hat\theta) - \mathbb{B}_n(\theta_0(\tau))}{\sqrt{n}} - \frac{1}{2}\big(\hat\theta - \theta_0(\tau)\big)'\,\partial_\theta G_\tau\,\big(\hat\theta - \theta_0(\tau)\big)\Bigg] + R_{n,\tau},$$
where $R_{n,\tau} = O_p\big(n^{-5/4}\sqrt{\log n}\big)$. Lemma 15 implies
$$\mathbb{E}\,\frac{1}{\sqrt{n}}\,\mathbb{B}_n(\hat\theta) = \mathbb{E}\Bigg(\frac{\hat{\bar g}_\tau(\hat\theta) + \hat{\underline g}_\tau(\hat\theta)}{2}\Bigg) + \frac{1}{n}\,\Delta(\tau) + \mathbb{E}\,O_p\big(\sqrt{\log n}\; n^{-5/4}\big) + O\Big(\frac{1}{n^2}\Big).$$
For correctly specified models, $\mathbb{E}\,\hat\theta_\tau = \theta_0(\tau)$ and, for each component $j$, we have
$$\mathbb{E}\,\big(\hat\theta_\tau - \theta_0(\tau)\big)'\,\partial_\theta G_{\tau,j}\,\big(\hat\theta_\tau - \theta_0(\tau)\big) = \mathbb{E}\,\hat g_\tau(\theta_0(\tau))'\,(G_\tau^{-1})'\,\partial_\theta G_{\tau,j}\, G_\tau^{-1}\,\hat g_\tau(\theta_0(\tau)) = \frac{1}{n}\, Q_j'\,\mathrm{vec}(\Sigma).$$
By the definition of $\mathcal{B}(\hat\theta)$, we can ignore the terms $\mathbb{E}\,R_{n,\tau}$, $\mathbb{E}\,O_p\big(n^{-5/4}\sqrt{\log n}\big)$, and $O\big(n^{-2}\big)$. The statement of the theorem follows.

Proof of Theorem 8. Theorem 6 implies the following asymptotic expansion for the bias-corrected estimator:
$$\hat\theta^{bc} = \hat\theta_\tau + G_\tau^{-1}\Bigg[\hat g_\tau(\hat\theta) - \frac{\mathbb{B}_n(\hat\theta) - \mathbb{B}_n(\theta_0(\tau))}{\sqrt{n}} - \frac{1}{2}\big(\hat\theta - \theta_0(\tau)\big)'\,\partial_\theta G_\tau\,\big(\hat\theta - \theta_0(\tau)\big)\Bigg] \qquad \text{(C.23)}$$
$$\quad - \mathbb{E}\,G_\tau^{-1}\,\frac{\hat{\bar g}_\tau(\hat\theta) + \hat{\underline g}_\tau(\hat\theta)}{2} - \frac{1}{n}\, G_\tau^{-1}\Big[\Delta(\tau) + \frac{1}{2}\, Q'\,\mathrm{vec}(\Sigma)\Big] \qquad \text{(C.24)}$$
$$\quad - \big(\hat G_\tau^{-1} - G_\tau^{-1}\big)\,\frac{\hat{\bar g}_\tau(\hat\theta) + \hat{\underline g}_\tau(\hat\theta)}{2} - \frac{1}{n}\,\big(\hat G_\tau^{-1} - G_\tau^{-1}\big)\Big[\hat\Delta(\tau) + \frac{1}{2}\,\hat Q'\,\mathrm{vec}(\hat\Sigma)\Big] \qquad \text{(C.25)}$$
$$\quad - \frac{1}{n}\, G_\tau^{-1}\Big[\hat\Delta(\tau) - \Delta(\tau) + \frac{1}{2}\big(\hat Q'\,\mathrm{vec}(\hat\Sigma) - Q'\,\mathrm{vec}(\Sigma)\big)\Big] + R_{n,\tau} \qquad \text{(C.26)}$$
$$\quad - G_\tau^{-1}\,\frac{\hat{\bar g}_\tau(\hat\theta) + \hat{\underline g}_\tau(\hat\theta)}{2} + \mathbb{E}\,G_\tau^{-1}\,\frac{\hat{\bar g}_\tau(\hat\theta) + \hat{\underline g}_\tau(\hat\theta)}{2}. \qquad \text{(C.27)}$$
By Theorem 7, the expectation of the sum of the terms in (C.23) and (C.24) is zero. By Theorem 6, Assumption 7, and the Mann–Wald and Delta theorems, the terms in (C.25) and (C.26) are $o_p(n^{-1})$. The last line, expression (C.27), has zero mean by Assumption 6. Therefore, $\mathbb{E}\,\hat\theta^{bc} - \theta_0(\tau) = \mathbb{E}\,o_p(n^{-1})$, and the statement of the theorem follows from the definition of the second-order bias.

Proof of Lemma 1. Notice that, by Lemma 10, for any $h$,
$$\mathbb{E}_n\Bigg[\frac{1\{Y \le W'\hat\theta + h\} - 1\{Y \le W'\hat\theta - h\}}{2h}\, Z^{(i)} W^{(j)}\Bigg] - \mathbb{E}\Bigg[\frac{F_Y(W'\theta + h \mid W, Z) - F_Y(W'\theta - h \mid W, Z)}{2h}\, Z^{(i)} W^{(j)}\Bigg]\Bigg|_{\theta = \hat\theta} = O_p\Bigg(\frac{\sqrt{\log h^{-1}}}{\sqrt{n h}}\Bigg) + O_p\Big(\frac{\log h^{-1}}{n h}\Big).$$
Then, using Lemma 16 for $h_n \to 0$ and the Delta theorem, we obtain
$$\mathbb{E}_n\Bigg[\frac{1\{Y \le W'\hat\theta + h_n\} - 1\{Y \le W'\hat\theta - h_n\}}{2h_n}\, Z^{(i)} W^{(j)}\Bigg] = G_{i,j}(\tau) + O_p\Big(\frac{1}{\sqrt{n}}\Big) + O\big(h_n^2\big) + O_p\Bigg(\frac{\sqrt{\log h_n^{-1}}}{\sqrt{n h_n}}\Bigg) + O_p\Big(\frac{\log h_n^{-1}}{n h_n}\Big).$$
The overall rate is $\max\big\{n^{-1/2},\; h_n^2,\; (n h_n)^{-1/2}\sqrt{\log h_n^{-1}},\; (n h_n)^{-1}\log h_n^{-1}\big\}$. Note that, by Lemmas 10, 12, and 14, and Assumption 5, this remainder rate is uniform in $\tau\in[\varepsilon,1-\varepsilon]$. Ignoring a logarithmic factor, we see that the bandwidth $h_{1,n}\propto n^{-1/5}$ delivers an optimal overall remainder rate $O_p\big(n^{-2/5}\sqrt{\log n}\big)$ that is uniform in $\tau\in[\varepsilon,1-\varepsilon]$.

Similarly, by Lemma 16,
$$\widehat{(\partial_\theta G_{i,\ell})}_j = e_i'\,\partial_\theta G_{\tau,j}\, e_\ell + O_p\Big(\frac{1}{\sqrt{n}}\Big) + O\big(h_n^2\big) + O_p\Bigg(\frac{\sqrt{\log h_n^{-1}}}{\sqrt{n h_n^3}}\Bigg) + O_p\Big(\frac{\log h_n^{-1}}{n h_n^2}\Big).$$
Taking $h_{2,n}\propto n^{-1/7}$, we obtain the optimal remainder rate
$$\widehat{(\partial_\theta G_{i,\ell})}_j = e_i'\,\partial_\theta G_{\tau,j}\, e_\ell + O_p\big(\sqrt{\log n}\; n^{-2/7}\big).$$
Notice that
$$\hat Q_j = Q_j + O\Big(\max\big\{\|\hat G - G\|,\; \big\|\widehat{(\partial_\theta G_{i,\ell})}_j - (\partial_\theta G_{i,\ell})_j\big\|\big\}\Big) = Q_j + O_p\big(\sqrt{\log n}\; n^{-2/7}\big).$$
By an argument similar to the above,
$$\hat\Delta = \Delta + O_p\Big(\frac{1}{\sqrt{n}}\Big) + O_p\Bigg(\frac{\sqrt{\log h_n^{-1}}}{\sqrt{n h_n}}\Bigg) + O_p\Big(\frac{\log h_n^{-1}}{n h_n}\Big) + O\big(h_n^2\big) + O\Bigg(\frac{\|\hat G - G\|}{h_n}\Bigg).$$
Since, under the optimal step size for the estimator $\hat G$, we have $O\big(\|\hat G - G\|/h_n\big) = O_p\big(h_n^{-1}\, n^{-2/5}\sqrt{\log n}\big)$, the overall rate is
$$\max\Bigg\{n^{-1/2},\; \frac{\sqrt{\log n}}{h_n\, n^{2/5}},\; \frac{\log h_n^{-1}}{n h_n},\; h_n^2,\; \frac{\sqrt{\log h_n^{-1}}}{\sqrt{n h_n}}\Bigg\}.$$
The optimal bandwidth is $h_{3,n}\propto n^{-2/15}$, with
$$\hat\Delta = \Delta + O_p\big(\sqrt{\log n}\; n^{-4/15}\big).$$
By the CLT and the equicontinuity of the relevant sample moment functions implied by Lemma 10, we get
$$\hat\Sigma := \widehat{\mathrm{Var}}\big[Z\big(1\{Y \le W'\hat\theta\} - \tau\big)\big] = \mathbb{E}_n\big[\big(1\{Y \le W'\hat\theta\} - \tau\big)^2 Z Z'\big] - \mathbb{E}_n\big[Z\big(1\{Y \le W'\hat\theta\} - \tau\big)\big]\,\mathbb{E}_n\big[Z'\big(1\{Y \le W'\hat\theta\} - \tau\big)\big] = \Sigma + O_p\Big(\frac{1}{\sqrt{n}}\Big).$$

C.3 Illustration of approximate bias formula in univariate case

Suppose we are interested in estimating the $\tau$-quantile of a uniformly distributed outcome variable $Y$. This is a special case of the general framework with $W = Z = 1$ and $f_Y(y) = 1\{0 \le y \le 1\}$. Note that, under the maintained assumptions, the true parameter $\theta_0(\tau)$ has an equivalent alternative definition as a solution to
$$\mathbb{E}\big[\big(1\{Y \ge W'\theta_0(\tau)\} - (1-\tau)\big)\, Z\big] = 0.$$
As a result, there are two ways of defining an estimator: as a minimizer of $|\hat{\bar g}_\tau(\theta)|$ or as a minimizer of $|\hat{\underline g}_\tau(\theta)|$, where
$$\hat{\bar g}_\tau(\theta) = \mathbb{E}_n\big(1\{Y \le \theta\} - \tau\big), \qquad \hat{\underline g}_\tau(\theta) = \mathbb{E}_n\big(1\{Y \ge \theta\} - (1-\tau)\big).$$
The derivatives of the population moment conditions $\bar g_\tau(\theta) = \underline g_\tau(\theta) = 0$ are $\bar G = 1$, $\partial_\theta \bar G = 0$ and $\underline G := \partial_\theta \underline g_\tau(\theta) = -1$, $\partial_\theta \underline G = 0$, respectively. In either case, the closure of the argmin set is $[Y_{(k)}, Y_{(k+1)}]$, where $k := \lfloor \tau n \rfloor$. If the fractional part $\{\tau n\} := \tau n - \lfloor \tau n \rfloor \le \frac{1}{2}$, a minimizer of $|\hat{\bar g}_\tau(\theta)|$ ($|\hat{\underline g}_\tau(\theta)|$) is the order statistic $Y_{(k)}$ ($Y_{(k+1)}$, respectively); if $\{\tau n\} > \frac{1}{2}$, a minimizer of $|\hat{\bar g}_\tau(\theta)|$ ($|\hat{\underline g}_\tau(\theta)|$) is $Y_{(k+1)}$ ($Y_{(k)}$, respectively). Of course, on the real line $\mathbb{R}^1$, all norms $\|\cdot\|_p$, $p\in[1,\infty]$, coincide with the absolute value $|\cdot|$.
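This characterization of the argmin set is easy to verify numerically. The following sketch (a standard-library-only illustration; the helper names `g_upper` and `g_lower` are ours, not from the paper) evaluates both sample moment functions on the order statistics and checks which endpoint of $[Y_{(k)}, Y_{(k+1)}]$ each criterion selects when $\{\tau n\} \le 1/2$:

```python
import math
import random

def g_upper(theta, y, tau):
    # sample moment E_n[1{Y <= theta}] - tau
    return sum(v <= theta for v in y) / len(y) - tau

def g_lower(theta, y, tau):
    # sample moment E_n[1{Y >= theta}] - (1 - tau)
    return sum(v >= theta for v in y) / len(y) - (1 - tau)

random.seed(0)
n, tau = 10, 0.34                       # {tau * n} = 0.4 <= 1/2
y = sorted(random.random() for _ in range(n))
k = math.floor(tau * n)                 # k = 3

# both criteria are step functions of theta, so scanning the
# sample points is enough to locate a minimizer of the absolute value
best_upper = min(y, key=lambda t: abs(g_upper(t, y, tau)))
best_lower = min(y, key=lambda t: abs(g_lower(t, y, tau)))

print(best_upper == y[k - 1])   # Y_(k) minimizes |g_upper|: True
print(best_lower == y[k])       # Y_(k+1) minimizes |g_lower|: True
```

Because the comparison depends only on the ranks of the data, the same check passes for any continuous sample; the selected endpoints swap once $\{\tau n\} > 1/2$.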
In this simple example, formula (3.7) yields the asymptotic bias expansions
$$\mathbb{E}\,Y_{(k)} = \tau + \frac{k - \tau n}{n} - \frac{1}{n}\Big(\tau - \frac{1}{2}\Big) - \frac{1}{2n} + o\Big(\frac{1}{n}\Big) = \tau - \frac{\{\tau n\} + \tau}{n} + o\Big(\frac{1}{n}\Big), \qquad \text{(C.28)}$$
$$\mathbb{E}\,Y_{(k+1)} = \tau + \frac{k + 1 - \tau n}{n} - \frac{1}{n}\Big(\tau - \frac{1}{2}\Big) - \frac{1}{2n} + o\Big(\frac{1}{n}\Big) = \tau + \frac{1 - \{\tau n\} - \tau}{n} + o\Big(\frac{1}{n}\Big). \qquad \text{(C.29)}$$
The exact bias formulas are given by (e.g., Ahsanullah et al., 2013)
$$\mathbb{E}\,Y_{(k)} = \frac{k}{n+1} = \tau - \frac{\{\tau n\} + \tau}{n+1}, \qquad \mathbb{E}\,Y_{(k+1)} = \frac{k+1}{n+1} = \tau + \frac{1 - \{\tau n\} - \tau}{n+1}.$$
Comparing these formulas with the asymptotic formulas (C.28) and (C.29), we see that they indeed coincide up to $O(n^{-2})$. Figure 3.1 in the main text illustrates the exact and the second-order bias formulas (scaled by $n$) for $n = 10$.

C.4 Exact QR and IVQR algorithms

First, consider a linear programming (LP) implementation of the QR regression (3.2) (Koenker, 2005, Section 6.2):
$$\min_{\beta, r, s}\ \tau\,\iota' r + (1-\tau)\,\iota' s$$
$$\text{s.t.}\quad \varepsilon_i = r_i - s_i = Y_i - W_i'\beta, \quad i = 1,\dots,n,$$
$$\phantom{\text{s.t.}\quad} r_i \ge 0,\ s_i \ge 0, \quad i = 1,\dots,n.$$
Here $\iota$ is an $(n\times 1)$ vector of ones. This formulation allows us to apply LP solvers such as Gurobi to obtain the exact minimum in (3.2).

Next, consider the exact estimator for the IVQR case,
$$\hat\theta_{\tau,1} = \operatorname*{argmin}_{\beta\in B}\ \|\hat g_\tau(\beta)\|_1.$$
The underlying optimization problem can be equivalently reformulated as a mixed-integer linear program (MILP) with special ordered set (SOS) constraints:
$$\min_{e, \beta, r, s, t}\ \iota' t$$
$$\text{s.t.}\quad \varepsilon_i = r_i - s_i = Y_i - W_i'\beta, \quad i = 1,\dots,n,$$
$$\phantom{\text{s.t.}\quad} (r_i, e_i)\in SOS1, \quad (s_i, 1 - e_i)\in SOS1, \quad i = 1,\dots,n,$$
$$\phantom{\text{s.t.}\quad} r_i \ge 0,\ s_i \ge 0, \quad e_i\in\{0,1\}, \quad i = 1,\dots,n,$$
$$\phantom{\text{s.t.}\quad} -t_l \le Z_l'(e - \tau\iota) \le t_l, \quad l = 1,\dots,d,$$
where $Z_l$ is an $n\times 1$ vector of realizations of instrument $l$. All constraints except the last one coincide with the ones derived by Chen and Lee (2018) in their Appendix C.1. The last constraint ensures that the objective function is the $\ell_1$-norm of the just-identifying moment conditions.

Remark 8. We also considered the "big-M" formulation while performing the Monte Carlo analyses. The big-M formulation has certain computational advantages, although the arbitrary choice of tuning parameters may result in sub-optimal solutions. This problem is more prominent for tail quantiles.
Since the big-M formulation does not guarantee exact solutions consistent with our theory, the choice of tuning parameters may affect the asymptotic bias. We prefer the above SOS formulation because it does not depend on tuning parameters, unlike the big-M MILP/MIQP formulations in Chen and Lee (2018) and Zhu (2019).²

² These papers pick the value of the tuning parameter M as a solution to a linear program that in turn depends on the choice of an arbitrary box around a linear IV estimate. This is problematic if there is a lot of heterogeneity in the coefficients across quantiles. Moreover, in the linear model with heavy-tailed residuals, the linear IV estimator is not consistent.

C.5 Stochastic expansion of 1-step corrected IVQR estimators

In the main text, we focus on classical QR and exact IVQR estimators. As shown in the following corollary, the results in Theorem 6 can be used to obtain a uniform Bahadur–Kiefer (BK) expansion for general IVQR estimators after a feasible 1-step correction.

Corollary 3. Suppose that Assumptions 4–6 hold. Consider any estimator $\tilde\theta_\tau$ such that $\sup_{\tau\in[\varepsilon,1-\varepsilon]}\|\tilde\theta_\tau - \theta_0(\tau)\| = O_p(n^{-1/2})$. Then
$$\sup_{\tau\in[\varepsilon,1-\varepsilon]} \big\|\tilde\theta_\tau - \hat G_\tau^{-1}\hat g_\tau(\tilde\theta_\tau) - \hat\theta_\tau\big\| = O_p\big(\sqrt{\log n}\; n^{-3/4}\big),$$
where $\hat G_\tau$ is defined in (3.8).

Proof. By Lemma 9 applied to $\tilde\theta_\tau$, uniformly in $\tau\in[\varepsilon,1-\varepsilon]$,
$$\tilde\theta_\tau - G_\tau^{-1}(\theta_0(\tau))\,\hat g_\tau(\tilde\theta_\tau) = \hat\theta_\tau + O_p\big(\sqrt{\log n}\; n^{-3/4}\big).$$
Under the maintained assumptions, Lemma 1 implies
$$\sup_{\tau\in[\varepsilon,1-\varepsilon]} \big\|\hat G_\tau(\tilde\theta_\tau) - G_\tau(\theta_0(\tau))\big\| = O_p\big(n^{-2/5}\sqrt{\log n}\big).$$
By Lemma 8, $\partial_\theta G(\theta)$ is bounded uniformly over $\theta\in\Theta$. Then, by Assumption 4.2 and the continuity of the minimal-eigenvalue function, the eigenvalues of $G(\theta)$ are bounded away from zero on $\Theta$. Therefore, the derivative of the matrix-inverse function, $F(A) := A^{-1}$, is uniformly bounded over $G(\theta_0(\tau))$ for $\tau\in[\varepsilon,1-\varepsilon]$. Hence, by an element-wise Taylor expansion of $F$ at $G(\theta_0(\tau))$,
$$\sup_{\tau\in[\varepsilon,1-\varepsilon]} \big\|\hat G_\tau^{-1}(\tilde\theta_\tau) - G_\tau^{-1}(\theta_0(\tau))\big\| = O\Big(\sup_{\tau\in[\varepsilon,1-\varepsilon]} \big\|\hat G_\tau(\tilde\theta_\tau) - G_\tau(\theta_0(\tau))\big\|\Big) = O_p\big(n^{-2/5}\sqrt{\log n}\big).$$
By Lemma 10,
$$\sup_{\tau\in[\varepsilon,1-\varepsilon]} \big\|\hat g_\tau(\tilde\theta_\tau) - \hat g_\tau(\theta_0(\tau))\big\| = O_p\big(\sqrt{\log n}\; n^{-3/4}\big). \qquad \text{(C.30)}$$
By Donsker's theorem,
$$\sup_{\tau\in[\varepsilon,1-\varepsilon]} \big\|\hat g_\tau(\theta_0(\tau))\big\| = O_p\big(n^{-1/2}\big),$$
so, by the triangle inequality and (C.30), $\sup_{\tau\in[\varepsilon,1-\varepsilon]}\|\hat g_\tau(\tilde\theta_\tau)\| = O_p(n^{-1/2})$. Then
$$\sup_{\tau\in[\varepsilon,1-\varepsilon]} \big\|\hat G_\tau^{-1}(\tilde\theta_\tau)\,\hat g_\tau(\tilde\theta_\tau) - G_\tau^{-1}(\theta_0(\tau))\,\hat g_\tau(\tilde\theta_\tau)\big\| = O_p\big(n^{-1/2}\cdot n^{-2/5}\sqrt{\log n}\big),$$
which concludes the proof.

C.6 Additional figures

Figure C.1: Bias (multiplied by $n$) before and after correction for DGP1; sensitivity to the bandwidth choice. (a) $A_G \pm 0.5$; (b) $A_Q \pm 0.5$; (c) $A_\Delta \pm 0.5$.

Notes: The panels display the bias (multiplied by $n$) of the intercept and the slope for classical QR without bias correction (blue dots), QR with feasible bias correction using the baseline bandwidth choice $A = (1, 1, 1)$ (gold crosses), QR with feasible bias correction using a $\pm 0.5$ deviation in one bandwidth at a time (up-pointing triangles for $+0.5$, down-pointing triangles for $-0.5$), and QR with infeasible bias correction (gold dashed line) for DGP1. In Panel (a) we vary $A_G$, in Panel (b) we vary $A_Q$, and in Panel (c) we vary $A_\Delta$. All results are based on 40,000 simulation repetitions.
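As a companion to the LP formulation of the QR program in Section C.4, here is a minimal sketch of exact quantile regression via linear programming, using `scipy.optimize.linprog` as an open-source stand-in for the Gurobi solver mentioned there (the function name `qr_lp` is ours). The decision vector stacks $(\beta, r, s)$, with the equality constraints $W\beta + r - s = Y$:

```python
import numpy as np
from scipy.optimize import linprog

def qr_lp(Y, W, tau):
    """Quantile regression via the LP:
    min tau*1'r + (1-tau)*1's  s.t.  W @ beta + r - s = Y,  r, s >= 0."""
    n, d = W.shape
    # decision vector x = (beta, r, s)
    c = np.concatenate([np.zeros(d), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([W, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * d + [(0, None)] * (2 * n)  # beta free; r, s >= 0
    res = linprog(c, A_eq=A_eq, b_eq=Y, bounds=bounds, method="highs")
    return res.x[:d]

rng = np.random.default_rng(0)
n = 201                                    # odd, so the 0.5-quantile is unique
Y = rng.standard_normal(n)
W = np.ones((n, 1))                        # intercept only
beta = qr_lp(Y, W, 0.5)
print(abs(beta[0] - np.median(Y)) < 1e-6)  # LP solution equals the sample median
```

The IVQR version additionally needs the binary indicators $e_i$ and SOS-type complementarity between $r_i$ and $s_i$, which requires a mixed-integer solver (e.g., Gurobi) rather than `linprog`.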
Abstract
This dissertation brings together three research papers in nonparametric and finite-sample econometrics.
In the first paper (Chapter 1), which is joint work with Hyungsik Roger Moon, for an $N \times T$ random matrix $X(\beta)$ with weakly dependent uniformly sub-Gaussian entries $x_{it}(\beta)$ that may depend on a possibly infinite-dimensional parameter $\beta\in \mathbf{B}$, we obtain a uniform bound on its operator norm of the form $\mathbb{E} \sup_{\beta \in \mathbf{B}} \|X(\beta)\| \leq CK \left(\sqrt{\max(N,T)} + \gamma_2(\mathbf{B},d_\mathbf{B})\right)$, where $C$ is an absolute constant, $K$ controls the tail behavior of (the increments of) $x_{it}(\cdot)$, and $\gamma_2(\mathbf{B},d_\mathbf{B})$ is Talagrand's functional, a measure of multi-scale complexity of the metric space $(\mathbf{B},d_\mathbf{B})$. We illustrate how this result may be used for estimation that seeks to minimize the operator norm of moment conditions as well as for estimation of the maximal number of factors with functional data.
The second paper (Chapter 2), which is joint work with Pasha Andreyanov, is concerned with inference for auctions. For a classical model of the first-price sealed-bid auction with independent private values, we develop nonparametric estimation and inference procedures for a class of policy-relevant metrics, such as total expected surplus and expected revenue under counterfactual reserve prices. Motivated by the linearity of these metrics in the quantile function of bidders' values, we propose a bid spacings-based estimator of the latter and derive its Bahadur-Kiefer expansion. This makes it possible to construct exact uniform confidence bands and assess the optimality of a given auction rule. Using the data on U.S. Forest Service timber auctions, we test whether setting zero reserve prices in these auctions was revenue maximizing.
In the third paper (Chapter 3), which is joint work with Bulat Gafarov and Kaspar Wüthrich, we study the bias of classical quantile regression and instrumental variable quantile regression estimators.
While being asymptotically first-order unbiased, these estimators can have non-negligible second-order biases. We derive a higher-order stochastic expansion of these estimators using empirical process theory. Based on this expansion, we derive an explicit formula for the second-order bias and propose a feasible bias correction procedure that uses finite-difference estimators of the bias components. The proposed bias correction method performs well in simulations. We provide an empirical illustration using Engel's classical data on household expenditure.
Asset Metadata
Creator
Franguridi, Grigory
(author)
Core Title
Essays on nonparametric and finite-sample econometrics
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Economics
Degree Conferral Date
2023-05
Publication Date
04/26/2023
Defense Date
04/25/2023
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
bias correction,counterfactual evaluation,factor model,first-price auctions,OAI-PMH Harvest,quantile regression,random matrix
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Moon, Hyungsik Roger (committee chair), Armstrong, Timothy (committee member), Lototsky, Sergey (committee member), Ridder, Geert (committee member)
Creator Email
franguri@usc.edu,franguridi@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113078050
Unique identifier
UC113078050
Identifier
etd-Franguridi-11716.pdf (filename)
Legacy Identifier
etd-Franguridi-11716.pdf
Document Type
Dissertation
Rights
Franguridi, Grigory
Internet Media Type
application/pdf
Type
texts
Source
20230426-usctheses-batch-1031 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu