TOPICS IN SELECTIVE INFERENCE AND REPLICABILITY ANALYSIS

by Jinting Liu

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (APPLIED MATHEMATICS). May 2022. Copyright 2022 Jinting Liu.

To my grandparents, 刘学坤 and 刘秀云.

Acknowledgements

Throughout the years of graduate school, I have received a great deal of support and assistance, and I would like to take this opportunity to acknowledge these people. First, I would like to express my very great appreciation to my advisor, Prof. Wenguang Sun, for his support, constructive advice and patient guidance. This dissertation would not have been possible without his support and encouragement. I would also like to thank Bowen Gang for the discussions and for suggesting a key element of the numerical implementation. I would like to thank the rest of my dissertation committee, Prof. Remigijus Mikulevicius and Prof. Sergey Lototsky. I would also like to thank Prof. Jay Bartroff, who has left USC for a new position at the University of Texas at Austin, for his support. I also want to extend my thanks to the Graduate School and everyone in the Math Department, including the professors, the staff and my cohort. Last but not least, I would like to thank my parents for being supportive, and my friends, especially the girls from Africa Hall, for being there for me, not letting me drift away and helping me see the world from different perspectives.

Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
1 Selective Inference with Confident Directions
  1.1 Introduction
  1.2 Problem Formulation
    1.2.1 Model and notation
    1.2.2 Definitions and goals
  1.3 Comparison and Connection to Existing Approaches
  1.4 Methodology
    1.4.1 Oracle procedure for unweighted setting
    1.4.2 Oracle procedure for weighted setting
    1.4.3 Data-driven procedure and estimation
  1.5 Simulation Studies
    1.5.1 Unweighted settings
    1.5.2 Discussion on proposed algorithm
    1.5.3 Weighted settings
  1.6 Conclusions
  1.7 Proofs of Theorems
    1.7.1 Proof of Theorem 1.4.2
    1.7.2 Proof of Theorem 1.4.4
  1.8 Proof of Propositions
    1.8.1 Proof of Proposition 1.4.1
    1.8.2 Proof of Proposition 1.4.3
  1.9 Proof of Lemmas
    1.9.1 Proof of Lemma 1.7.1
    1.9.2 Proof of Lemma 1.8.1
  1.10 Supplementary Numerical Results
2 Detecting Simultaneous Signals in the Presence of Local Structures
  2.1 Introduction
  2.2 Statement of Problem
  2.3 Literature Survey
  2.4 Methodology
    2.4.1 Oracle procedures
    2.4.2 Estimation using LAWS
    2.4.3 Data-driven procedures
    2.4.4 Theoretical properties of the data-driven procedures
  2.5 Simulation Results
  2.6 Conclusions
  2.7 Proof of Propositions
    2.7.1 Proof of Proposition 2.4.1
    2.7.2 Proof of Proposition 2.4.2
  2.8 Proof of Theorem
    2.8.1 Proof of Theorem 2.4.3
Bibliography

List of Tables

1.1 Types of errors in the directional inference problem
1.2 FSR and power for varying weighting schemes
1.3 Number of selections in each block for varying weighting schemes

List of Figures

1.1 Unweighted setting: $X_i \sim N(\mu_i, 1)$ where $\mu_i \sim N(0, \tau^2)$
1.2 Unweighted setting: $\mu_i \sim p_+\,\mathrm{LT}N(0, 1) + (1 - p_+)\,\mathrm{RT}N(0, 1)$
1.3 Unweighted setting with heteroskedastic error
1.4 Unweighted setting unimodal about 0, designed for the NewDeal procedure
1.5 Weighted setting 1
1.6 Weighted setting 2
1.7 Unweighted settings not unimodal about 0
1.8 Varying the grid size used by the deconvolution method
2.1 Performance of the oracle procedures when there is no local structure
2.2 Performance of the oracle procedures, part I
2.3 Performance of the data-driven procedures, part I
2.4 Performance of the oracle procedures, part II
2.5 Performance of the data-driven procedures, part II
2.6 Estimation of local structures

Abstract

Statistical hypothesis testing is among the most eminent methods in statistics, as it has been applied to a wide range of practical problems in many disciplines. Multiple testing refers to the testing of more than one hypothesis at a time.
Procedures developed for this type of problem aim to make as many rejections of the null hypotheses as possible while controlling some error rate, such as the false discovery rate, in the large-scale setting, where there can be tens of thousands or millions of hypotheses to be tested simultaneously, as in genomics. The null hypothesis being tested against the alternative can be exact, i.e. an exact parameter value is specified, or inexact, i.e. only a range or interval is specified. However, performing such a test may not arrive at a satisfying answer directly, and sometimes the procedures designed for general multiple hypothesis testing fail to incorporate auxiliary information that could be utilized to improve the power of the procedure. For example, in two-sample multiple testing, researchers are very often more interested in the direction of the difference if there is a difference, which is in fact a debated statement on its own. In replicability analysis, researchers want to identify features or locations that are non-null in both studies, and these signals may have intrinsic local structures as they tend to be clustered in certain regions. This dissertation is on the topics of selective inference and replicability analysis, studying the two aforementioned problems.

Chapter 1 of this dissertation studies the problem of selective inference with confident directions. A general selective inference framework is developed for the problem of directional analysis, and it incorporates the researchers' preferences as selection weights in the definition of the power. An oracle procedure is developed to maximize the weighted power while controlling the false selection rate. A deconvolution method for estimating the underlying distribution of the effect sizes being classified is introduced. Theory and simulation results are presented to justify the proposed procedure. Chapter 2 extends the topic of replicability analysis to the adaptive detection of simultaneous signals in the presence of strong and smooth local structures in terms of location-based signal frequency. Two procedures are proposed that incorporate the information provided by the two studies into the construction of the p-values using two different approaches and then apply the procedural weights derived from the local structural information regarding the simultaneous signals. Estimators are constructed and then numerically computed to extract the structural information. Theory and simulation results are presented to justify the two proposed procedures.

Chapter 1
Selective Inference with Confident Directions

1.1 Introduction

The classical formulation of two-sample multiple testing aims to answer the question of whether the effects of A and B are different. However, this is arguably an ill-posed question, as the effects of A and B are almost always different, in some decimal place at least, for any A and B, with the exception of very few applications if there are any. An alternative formulation of the problem is to test a composite null $H_0: \theta_A - \theta_B \in R$, where $R$ is referred to as an "indifference region"; but the specification of this indifference region, although helpful in many scenarios, is not desirable in a range of applications, as it tends to be too subjective. In The Philosophy of Multiple Comparisons, Tukey (1991) alluded to the idea that the question we should be answering is "Can we tell the direction in which the effects of A differ from the effects of B?"
Equivalently, we should aim to answer with confidence about the direction from A to B: whether it is up, down, or uncertain. It is important to note that "uncertain" here refers to an indecision, and it does not necessarily indicate acceptance of the traditional null hypothesis that the effects of A and B are equal. Whereas the decision space is always $\{-1, 0, 1\}$, the actual state space can be $\{-1, 1\}$ or $\{-1, 0, 1\}$ depending on the application and/or the beliefs of the researchers. This motivates us to formulate the two-sample multiple testing problem in the framework of selective inference with confident directions. This formulation posits a proper question without introducing a subjective quantity. Moreover, when directions are involved in the inference, very often the researchers might have a preference, to some degree, for selecting a certain direction, positive over negative or negative over positive, and sometimes this preference may vary by location when the index represents temporal or spatial location.

The main contributions of this work include: (i) we extend the idea of Tukey to formulate a new framework for two-sample selective inference; in particular, our definition of the false selection rate (FSR), although similar to the idea of the mixed directional false discovery rate (FDR), has a different interpretation and is more inclusive; (ii) we review, consolidate and improve upon existing formulations and approaches related to the unweighted version of the selective directional inference problem; (iii) we work under a general model with a minimal set of assumptions to develop the methodology and the theory for the proposed problem; (iv) we obtain optimality results and incorporate selection/preference weights into the power analysis; (v) to make the oracle procedure applicable in practice, we develop a deconvolution method to estimate the underlying distribution of the effect sizes.

The rest of this chapter is organized as follows. In Section 2, we formulate our problem of selective directional inference with confidence and preference, and introduce the error-rate and power functions that will be used to evaluate procedures for this problem. In Section 3, we review existing works and approaches relevant to our problem, and highlight the differences and connections between those works and ours. Section 4 focuses on methodology: we describe our proposed oracle and data-driven procedures along with their theoretical properties and the theoretical foundations behind them. We then conduct simulation studies for different settings to investigate the validity and optimality of our proposed data-driven procedure as well as the proposed deconvolution method; results are presented in Section 5.

1.2 Problem Formulation

In this section, we formulate a general decision-theoretic framework for the problem of preference-based weighted selective inference on direction with confidence. We first introduce our model and notation, and then discuss the error and power functions that will be used to assess the proposed method.

1.2.1 Model and notation

First we will introduce the model behind our problem.
Suppose there are $m$ observations, $X_1, \ldots, X_m$, generated according to the following model, possibly with heteroscedastic errors:
$$X_i = \mu_i + \epsilon_i, \qquad \mu_i \overset{iid}{\sim} G(\cdot), \qquad \epsilon_i \sim N(0, \sigma_i^2), \qquad i = 1, \ldots, m, \quad (1.2.1)$$
where the $\mu_i$'s are $m$ location parameters, or effect sizes, and $G(\cdot)$ is their unknown cumulative distribution function with probability density function denoted by $g(\cdot)$, which can be continuous, discrete or mixed. We further allow the errors to be heteroscedastic, i.e. the $\sigma_i$'s are random variables whose values are either known or can be estimated well. Model (1.2.1) is a flexible, general non-parametric model that places few assumptions on the underlying truth and requires no prior information.

One special case of our general model is the commonly used one-way random effects model that was considered for a similar problem in Lewis and Thayer (2004), Lewis and Thayer (2009) and Sarkar and Zhou (2008), in which the population means $\mu_1, \ldots, \mu_m$ are assumed to be independently and identically distributed according to
$$\mu_i \mid \mu, \tau^2 \sim N(\mu, \tau^2), \qquad i = 1, \ldots, m, \quad (1.2.2)$$
where $\mu$ is the overall mean that is assumed to be known a priori. Another special case is the mixture model considered in Bansal and Miescke (2013), with the prior on $\mu_i$ such that
$$f(\mu_i) = p_-\, f_-(\mu_i)\, I(\mu_i < 0) + p_0\, f_0(\mu_i)\, I(\mu_i = 0) + p_+\, f_+(\mu_i)\, I(\mu_i > 0), \quad (1.2.3)$$
which has a hierarchical structure: the first-stage prior is on the probabilities $p_- = P(\mu_i < 0)$, $p_0 = P(\mu_i = 0)$ and $p_+ = P(\mu_i > 0)$ with $p_- + p_0 + p_+ = 1$, and the second-stage prior is on the density functions of $\mu_i$ conditioned on the three scenarios, $f_-(\mu_i) = f(\mu_i \mid \mu_i < 0)$, $f_0(\mu_i) = f(\mu_i \mid \mu_i = 0)$ and $f_+(\mu_i) = f(\mu_i \mid \mu_i > 0)$.

Notice that in the one-way random effects model (1.2.2) it is taken as certain that $\mu_i \neq 0$, which is often true in many applications, and $\mu_i = 0$ is therefore excluded from consideration. In the formulation (1.2.3), however, the case $\mu_i = 0$ is not excluded and $\mu_i$ is allowed to have a point mass at 0. Such a formulation can be relevant in genetic research when testing for gene expression levels: even though some genes can be over-expressed or under-expressed, a majority of them will not be differentially expressed, and therefore the possibility of $\mu_i = 0$ cannot be ignored. The advantage of our model (1.2.1) is that we make no assumption on the possibility or impossibility of $\mu_i = 0$, and our selective inference procedure is designed not to rely on such an assumption either.

1.2.2 Definitions and goals

Next we will introduce the error-rate and weighted power functions that will be used to evaluate our weighted selective inference procedure. Let $\theta = (\theta_1, \ldots, \theta_m) \in \{-1, 0, +1\}^m$ denote the true state of nature, where $\theta_i = -1$ indicates $\mu_i < 0$, $\theta_i = 0$ indicates $\mu_i = 0$, and $\theta_i = +1$ indicates $\mu_i > 0$. Let $\delta = (\delta_1, \ldots, \delta_m) \in \{-1, 0, +1\}^m$ denote the directional selections made for the $m$ effects, where $\delta_i = -1$ is the decision to claim $\mu_i < 0$, $\delta_i = 0$ is an indecision on $\mu_i$, and $\delta_i = +1$ is the decision to claim $\mu_i > 0$. Since we make no assumption on whether $\mu_i = 0$ is realistic or not, $\delta_i = 0$ simply indicates that no selection is made regarding the sign of the $i$th effect, either because $\mu_i = 0$ or because no directional inference can be made with the required level of confidence. If it is known in advance that $\mu_i = 0$ is indeed unrealistic, then $\delta_i = 0$ should be interpreted as the direction being uncertain. Our goal is to identify the decision rule that maximizes the power while controlling the error.
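To make the setup concrete, here is a minimal R sketch that simulates data from model (1.2.1); the particular two-component normal mixture used for $g(\cdot)$, the choice of $\sigma_i \in \{1, 2\}$, and all variable names are illustrative assumptions, not specifications taken from the dissertation.

```r
# Minimal simulation sketch for model (1.2.1); the mixture chosen for g() and
# the values of sigma_i are illustrative assumptions only.
set.seed(1)
m     <- 5000
mu    <- ifelse(runif(m) < 0.5, rnorm(m, mean = -1), rnorm(m, mean = 1))  # mu_i ~ g
sigma <- sample(c(1, 2), m, replace = TRUE)                               # known sigma_i
x     <- mu + rnorm(m, mean = 0, sd = sigma)                              # X_i = mu_i + eps_i
theta <- sign(mu)                                                         # true directions in {-1, 0, +1}
```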
In the aforementioned situation, in which we make no assumption on the possibility or impossibility of $\mu_i = 0$, the types of errors that may occur are summarized in Table 1.1, following the conventional notion of classification errors.

Table 1.1: Types of errors in the directional inference problem

                 delta = 0        delta = +1        delta = -1
theta = 0        correct          Type I error      Type I error
theta = +1       Type II error    correct           Type III error
theta = -1       Type II error    Type III error    correct

We can see that a Type I, or false positive, error is a non-directional error that occurs when a noise is incorrectly classified as a signal of positive or negative direction; a Type II, or false negative, error occurs when a signal of interest is missed; and a Type III error is the directional error that occurs when an effect of positive direction is classified as negative or vice versa. In the scenario that $\mu_i \neq 0$ for all $i$, only the directional errors (Type III) and false negative errors (Type II) need to be considered.

The commonly used error rate in large-scale multiple testing is the standard false discovery rate (FDR), originally proposed in Benjamini and Hochberg (1995), defined as the expected proportion of Type I errors among the selected discoveries. Since our selective inference problem on direction is a three-decision problem instead of a binary classification problem, the definition of the false discovery rate becomes insufficient. In Benjamini et al. (1993), the FDR framework was extended to address the problem of directional errors by introducing two variants of directional FDR: the pure (Type III) directional FDR and the mixed (Type I and III) directional FDR. Encompassing the concept of the false discovery rate, we propose a rather general and flexible error rate for our selective inference problem, called the false selection rate (FSR), defined as the expected fraction of erroneous decisions among all claims made,
$$\mathrm{FSR}_{\mathcal D} := E\left[\frac{\sum_{i=1}^m I(\theta_i \neq \delta_i)\, I(\delta_i \in \mathcal D)}{\sum_{i=1}^m I(\delta_i \in \mathcal D) \vee 1}\right], \quad (1.2.4)$$
where $I$ is an indicator function and $\mathcal D \subseteq \{-1, 0, +1\}$ is a subset of the decision space over which the error rate is defined. When $\mathcal D$ is chosen to be $\{-1, +1\}$, we recover the mixed directional FDR (mdFDR),
$$\mathrm{FSR} := \mathrm{FSR}_{\{-1,+1\}} = E\left[\frac{\sum_{i=1}^m \left\{I(\mu_i \leq 0)\, I(\delta_i = +1) + I(\mu_i \geq 0)\, I(\delta_i = -1)\right\}}{\sum_{i=1}^m I(|\delta_i| = 1) \vee 1}\right], \quad (1.2.5)$$
and this is the error rate we will use. Depending on the purpose of the inference, researchers may be more concerned with the errors among positive, or negative, claims instead of all claims made. In this case, the error rates can be defined over positive or negative claims only:
$$\mathrm{FSR}^- = E\left[\frac{\sum_{i=1}^m I(\mu_i \geq 0)\, I(\delta_i = -1)}{\sum_{i=1}^m I(\delta_i = -1) \vee 1}\right], \qquad \mathrm{FSR}^+ = E\left[\frac{\sum_{i=1}^m I(\mu_i \leq 0)\, I(\delta_i = +1)}{\sum_{i=1}^m I(\delta_i = +1) \vee 1}\right]. \quad (1.2.6)$$
In order to facilitate the theoretical derivations, we replace the expectation of the ratio by the ratio of two expectations and define the marginal false selection rate (mFSR) as
$$\mathrm{mFSR}_{\mathcal D} := \frac{E\left[\sum_{i=1}^m I(\theta_i \neq \delta_i)\, I(\delta_i \in \mathcal D)\right]}{E\left[\sum_{i=1}^m I(\delta_i \in \mathcal D) \vee 1\right]}. \quad (1.2.7)$$
In the case of the false discovery rate (FDR), it has been shown in Genovese and Wasserman (2002) that under weak conditions, $\mathrm{mFDR} = \mathrm{FDR} + O(m^{-1/2})$, where $m$ is the number of effects on which inference is made, implying that in large-scale testing or inference problems the difference between the marginal false discovery rate (mFDR) and the false discovery rate (FDR) is negligible. The asymptotic equivalence between our false selection rate (FSR) and the marginal false selection rate (mFSR) can be established similarly.
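As a small illustration of definition (1.2.5), the following R function computes the realized false selection proportion (the quantity whose expectation is the FSR) from a vector of true states and a vector of decisions, both coded in {-1, 0, +1}; the function name is ours.

```r
# Realized false selection proportion corresponding to (1.2.5):
# wrong-direction or false-positive claims divided by the number of claims made.
fsp <- function(theta, delta) {
  claimed <- abs(delta) == 1
  wrong   <- claimed & (theta != delta)   # Type I and Type III errors among claims
  sum(wrong) / max(sum(claimed), 1)
}
```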
For the selective directional inference problem with weights, we modify the expected number of true positive results (ETP), a commonly used power measure in multiple hypothesis testing, and define the expected number of true directional decisions (ETD) as
$$\mathrm{ETD} = E\left[\sum_{i=1}^m I(\theta_i \delta_i = 1)\right]. \quad (1.2.8)$$
This power function is similar to the Bayes directional per-comparison power rate (BDPCPR) used by Lewis and Thayer (2004) and Sarkar and Zhou (2008), with $\mathrm{BDPCPR} = \mathrm{ETD}/m$.

In our weighted framework, we aim to incorporate weights indicating selection preference into our procedure. Note that the weights we are considering are selection weights, not to be confused with the "procedural weights" (Benjamini and Hochberg, 1997) that are used to increase the (unweighted) power of the procedure (1.2.8). Depending on the context of the application and the purpose of the researchers, correct selection of one direction may be more important than that of the other direction, and correct selection of effects in some groups or locations may also be more important than in others. The importance of incorporating such selection weights can be demonstrated in the application of neuro-imaging analysis, where researchers are looking for regional brain activation and deactivation. We incorporate these selection preferences as pre-specified weights, independent of the underlying truth and of the observed data, reflecting differential attitudes towards the selection of different directions for each effect in the power function, which we define as the weighted expected number of true directional (+/-) decisions (wETD):
$$\mathrm{wETD} := E\left\{\sum_{i=1}^m w_i^-\, I(\theta_i = \delta_i = -1) + w_i^+\, I(\theta_i = \delta_i = +1)\right\} \quad (1.2.9)$$
$$= E\left\{\sum_{i=1}^m \left[(w_i^- \wedge w_i^+)\, I(\theta_i \delta_i = 1) + |w_i^- - w_i^+|\, I\big(\theta_i = \delta_i = 1 - 2 I(w_i^- > w_i^+)\big)\right]\right\}$$
$$= E\left\{\sum_{i=1}^m w_i\, I(\theta_i \delta_i = 1) + w_i^{d_i}\, I(\theta_i = \delta_i = d_i)\right\}, \quad (1.2.10)$$
where, in equation (1.2.9), $w_1^-, \ldots, w_m^-$ are the weights placed on correctly identified negative directions and $w_1^+, \ldots, w_m^+$ on positive directions. To offer a different perspective on the purpose of the weights, we rewrite the wETD in the form (1.2.10), with the $w_i$'s representing the (directionless) weights for correctly identifying the direction at $i$ and the $w_i^{d_i}$'s the (directional) weights indicating the additional power gain contributed when the correctly identified direction is $d_i$, a desired direction at $i$. Note that even though a total of $2m$ weights are introduced, in practice, depending on the context and the purpose of the inference, the number of weights can be reduced. When $w_i^- = w_i^+$ for all $i$, we have $w_i^{d_i} = 0$; in this case the two possible directions at $i$ are treated equally without preference for either direction, with the weights $w_i$ remaining in place, and the problem becomes similar to that in Basu et al. (2018) but in a directional setting. The weighting scheme most researchers may be interested in is when $w_i^+ = w^+$ and $w_i^- = w^-$ for all $i$, in which case we can rewrite the power function as
$$\mathrm{wETD} = E\left[\sum_{i=1}^m I(\theta_i = -1)\, I(\delta_i = -1) + \frac{w^+}{w^-}\, I(\theta_i = +1)\, I(\delta_i = +1)\right], \quad (1.2.11)$$
where $w^+/w^- > 1$ indicates a preference for positive selections and $w^+/w^- < 1$ indicates a preference for negative selections. When $w^+/w^- = 1$, i.e. $w_i^- = w_i^+ = 1$ for all $i$, we recover the power function defined in equation (1.2.8); this unweighted version has been studied for a few special cases that fall under our proposed general model.
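For completeness, a companion R helper (again with hypothetical names) evaluates the realized weighted count inside the expectation in (1.2.9) for given weight vectors $w_i^-$ and $w_i^+$.

```r
# Realized weighted number of true directional decisions, cf. (1.2.9).
wtd_count <- function(theta, delta, w_neg, w_pos) {
  sum(w_neg * (theta == -1 & delta == -1) + w_pos * (theta == 1 & delta == 1))
}
```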
The choice of the $w_i$'s depends on the selection preferences of the researchers in the context of the application, and it needs to be independent of the observations and the underlying truth. Depending on how the weights are proposed, one of the representations of wETD in equations (1.2.9) and (1.2.10) might be more intuitive to work with, but they are fundamentally the same. In Rosenthal (1986), a system of weighting based on the quality of studies was explored. Furthermore, when the index $i$ represents temporal or spatial location, selection preferences can be based on the index, as researchers sometimes have varying preferences for selecting different directions in different regions, for example, activation or deactivation in different regions of interest (ROIs) in fMRI analysis. There is an additional constraint on the choice of the weights in order for our proposed procedure to be consistent, and we will discuss it when it comes up in the methodology section. Finally, the goal of our weighted selective inference problem on direction with confidence is to find a decision rule $\delta$ which maximizes the wETD subject to the constraint
$$\mathrm{FSR} \leq \alpha, \quad (1.2.12)$$
for some $\alpha \in (0, 1)$.

1.3 Comparison and Connection to Existing Approaches

In this section, we briefly discuss how multiple testing problems with directional decisions similar to ours have been formulated and approached in the existing literature, along with any attempts made to incorporate weights into the problem.

Traditionally, the directional decision problem has often been formulated as a directional two-sided test (Kaiser, 1960), and in the context of multiple comparisons it becomes a multiple hypotheses problem with directional alternatives
$$H_0^i: \mu_i = 0 \quad \text{vs.} \quad H_-^i: \mu_i < 0 \ \text{ or } \ H_+^i: \mu_i > 0, \qquad i = 1, \ldots, m. \quad (1.3.1)$$
Such a formulation was used in Sarkar and Zhou (2008) and Bansal and Sheng (2010). In this setting, directional decisions are made only when "statistical significance" is attained, i.e. the null hypotheses are rejected. However, as Jones and Tukey (2000) pointed out, in many situations researchers and statisticians may believe a point null hypothesis is never true. Therefore, such a formulation can be unrealistic and misleading. Alternatively, Shaffer (1972), Shaffer (1974) and Jones and Tukey (2000) recommended formulating the directional decision problem as a simultaneous test of two directional hypotheses,
$$H_0: \mu_i \leq 0 \ \text{ vs. } \ H_a: \mu_i > 0 \qquad \text{and} \qquad H_0: \mu_i \geq 0 \ \text{ vs. } \ H_a: \mu_i < 0, \quad (1.3.2)$$
with rejection of one of them giving an automatic directional conclusion. Such a formulation avoids the debate around whether the point null in (1.3.1) is realistic or not. In the context of multiple comparisons, there would be a total of $2m$ one-sided hypotheses for the $m$ directional decisions to be made. Paired one-sided tests as in (1.3.2) have occasionally been used in the neuro-imaging field, but such handling often turned out to be a statistical malpractice when used in the traditional frequentist framework without additional multiplicity correction, doubling the error rate as a result. Such mishandling has been discussed in Chen et al. (2019a) and Chen et al. (2019b). It has been shown that the two-sided, equal-tails null hypothesis test (1.3.1) at level $\alpha$ is equivalent to the simultaneous test of the two directional inequality hypotheses (1.3.2), each at level $\alpha/2$.
Regardless of which formulation is used, the original FDR-controlling method introduced in Benjamini and Hochberg (1995) can be suitably augmented to make directional decisions: when all the $\mu_i$'s are nonzero, the maximum (pure) directional FDR is $\alpha/2$, whereas when some of the $\mu_i$'s can be zero, the mixed directional FDR will not exceed $\alpha$, as pointed out in (Shaffer, 2002; Benjamini and Yekutieli, 2005).

In recent years, the focus has shifted to applying a decision-theoretic approach to the multiple comparisons problem. Having emphasized that their approach is "not exactly a Bayes decision procedure but more of a frequentist notion," Sarkar and Zhou (2008) started with the decision-theoretic formulation of simultaneous testing of null hypotheses against two-sided alternatives as in (1.3.1), but constructed an alternative procedure that controls the Bayesian directional false discovery rate (BDFDR), which is equivalent to our FSR (1.2.5), through controlling the posterior directional false discovery rate (PDFDR),
$$\mathrm{PDFDR} = E_{\theta \mid X}\left[\frac{\sum_{i=1}^m I(\theta_i \delta_i = -1)}{\sum_{i=1}^m I(|\delta_i| = 1) \vee 1}\right]. \quad (1.3.3)$$
The procedure was shown to be optimal in maximizing the posterior expectation of the directional per-comparison power rate, which is equivalent to our power function ETD (1.2.8). Their methodology is similar to what we will propose for the unweighted scenario but is limited to the random effects model, whereas our methodology does not require any assumptions on the underlying truth of the effects. Working also strictly in the context of the random effects model, excluding the possibility of any zero effect sizes, Lewis and Thayer (2004) developed a Bayes decision procedure that controls the directional FDR by minimizing the Bayes risk for a per-comparison "0-1" loss function of the form
$$L(\theta, \delta) = \sum_{i=1}^m \left\{I(\theta_i \delta_i = -1) + \lambda\, I(|\theta_i| = 1,\, \delta_i = 0)\right\}. \quad (1.3.4)$$
The connection between a multiple testing problem and a weighted classification problem, albeit in the non-directional situation, was formally established by Sun and Cai (2007). While Lewis and Thayer (2004) worked in a sampling-theory context, Sun and Cai (2007) developed a locally adaptive empirical Bayes procedure that is more powerful than the traditional BH procedure (Benjamini and Hochberg, 1995). We will extend the theory and procedure in Sun and Cai (2007) to our current setting. Later, in Lewis and Thayer (2009), as a response to Sarkar and Zhou (2008), they introduced another loss function,
$$L(\theta, \delta) = \frac{\sum_{i=1}^m I(\theta_i \delta_i = -1)}{\max\left\{1,\ m - \sum_{i=1}^m I(|\theta_i| = 1,\, \delta_i = 0)\right\}} + \lambda_2\, \frac{\sum_{i=1}^m I(|\theta_i| = 1,\, \delta_i = 0)}{m}, \quad (1.3.5)$$
that is more directly tied to the (pure directional) FDR, which is the expectation of the first part of the sum above, and then proposed a Bayes rule that is essentially a Bayesian version of the original multiple comparisons procedure proposed by Benjamini and Hochberg (1995), modified for directional testing by replacing the p-values with posterior probabilities. In terms of power, the approach proposed by Sarkar and Zhou (2008) has much better performance from a Bayesian perspective. So far, the approaches mentioned here, as well as most other existing approaches, do not take into account situations where the selection of one direction might be more desirable than the other, nor situations where directional inferences made for certain effects are more important than others.
One attempt was made by Bansal and Miescke (2013), who took a Bayesian decision-theoretic approach and introduced a loss function more general than the conventional "0-1" loss, with the individual loss of the form
$$L_i(\mu_i, \delta_i) = \begin{cases} 0 & \text{if } \delta_i = \theta_i, \\ l_0 & \text{if } \theta_i = 0 \text{ and } \delta_i \neq 0, \\ l_0 + l(\mu_i) & \text{if } \theta_i \delta_i = -1, \\ l_1 + l(\mu_i) & \text{if } \theta_i \neq 0 \text{ and } \delta_i = 0, \end{cases}$$
where $l_0$ and $l_1$ are some positive constants, and $l(\cdot) \geq 0$ is symmetric around 0 and increasing in the magnitude of the actual effect size $|\mu|$. However, their proposed Bayes rule does not control any false discovery rate, and to do so requires an additional constraint on the posterior (mixed) directional FDR. We aim to achieve the same goal of placing varying importance on selections of different directions and for different effects, but we do so by incorporating weights into the power function, which is a better representation of preference than the loss function used by Bansal and Miescke (2013).

1.4 Methodology

In this section, we present our theoretical and methodological developments for the problem defined previously: maximizing the weighted expected number of true directional decisions (wETD) while controlling the false selection rate (FSR) at level $\alpha$. Before presenting the procedure for this problem, we first consider the unweighted setting in order to build a methodological foundation for the weighted scenario and also to fill the gap in the existing literature mentioned before.

1.4.1 Oracle procedure for unweighted setting

In this part, we develop a compound decision-theoretic framework for the unweighted version of our selective inference problem, similar to what Lewis and Thayer (2004) and Lewis and Thayer (2009) did for the random effects model, but we take a different approach in developing an optimal rule, closer to what Sarkar and Zhou (2008) and Stephens (2016) proposed. Our theoretical development is along the lines of Sun and Cai (2007), in which the connection between a multiple testing problem and a weighted classification problem was established. Consider our selective inference problem on directions without weights, i.e. the power function to be maximized is the ETD in equation (1.2.8) instead of the wETD, subject to the false selection rate (FSR) constraint. It gives rise to a weighted classification problem with loss function
$$L_{\mathcal D, \lambda}(\theta, \delta) = \frac{1}{m}\sum_{i=1}^m \sum_{d \in \mathcal D} \left\{I(\theta_i \neq d)\, I(\delta_i^d = 1) + \lambda\, I(\theta_i = d)\, I(\delta_i^d = 0)\right\}, \quad (1.4.1)$$
where $d \in \mathcal D \subseteq \{-1, +1\}$, $\delta_i^d \in \{0, 1\}$ denotes the decision made with respect to state $d$ for the $i$th effect, and $\lambda \in (0, 1)$ is the relative weight of a missed discovery of state $d$ to a falsely claimed decision of $d$. When $\mathcal D = \{-1, +1\}$, the scenario we focus on, the loss function becomes
$$L_\lambda(\theta, \delta) = \frac{1}{m}\sum_{i=1}^m \left\{I(\theta_i \geq 0)\, I(\delta_i^{-1} = 1) + \lambda\, I(\theta_i = -1)\, I(\delta_i^{-1} = 0) + I(\theta_i \leq 0)\, I(\delta_i^{+1} = 1) + \lambda\, I(\theta_i = +1)\, I(\delta_i^{+1} = 0)\right\}, \quad (1.4.2)$$
and $\delta = \{-\delta_i^{-1} + \delta_i^{+1}\}_{i \in [m]} \in \{-1, 0, +1\}^m$ with $\delta_i^{-1} \delta_i^{+1} = 0$ as a natural implication rather than an additional constraint; we will explain this point shortly after introducing the oracle rule. This loss function also fits into the context of the simultaneous test of $m$ pairs of one-sided hypotheses as in (1.3.2), with $\delta_i^{-1} = 1$ indicating that $H_0: \mu_i \geq 0$ is rejected and $\delta_i^{+1} = 1$ indicating that $H_0: \mu_i \leq 0$ is rejected. Notice that this loss function appears to be related to the one in (1.3.5) defined by Lewis and Thayer (2009) but is essentially different, and this formulation leads to a different, and in fact more powerful, decision rule.
Having defined the loss function, it follows from the arguments provided by Sun and Cai (2007) that, in the unweighted selective directional inference problem, for any given mFSR level $\alpha \in (0, 1/2)$ in the multiple testing problem, there exists a unique $\lambda(\alpha)$ that defines an equivalent weighted classification problem. The posterior risk is
$$R_\lambda(x, \delta) = E_{\theta \mid X, \sigma}[L_\lambda(\theta, \delta)] = \frac{1}{m}\sum_{i=1}^m \Big[I(\delta_i^{-1} = 1)\, P(\mu_i \geq 0 \mid x_i, \sigma_i) + \lambda\, I(\delta_i^{-1} = 0)\, P(\mu_i < 0 \mid x_i, \sigma_i) + I(\delta_i^{+1} = 1)\, P(\mu_i \leq 0 \mid x_i, \sigma_i) + \lambda\, I(\delta_i^{+1} = 0)\, P(\mu_i > 0 \mid x_i, \sigma_i)\Big], \quad (1.4.3)$$
where $x = \{x_1, \ldots, x_m\}$ and $\sigma = \{\sigma_1, \ldots, \sigma_m\}$ are the paired observations generated according to model (1.2.1). The oracle Bayes rule can then be obtained by minimizing the posterior risk (1.4.3); at the same time, the overall risk $E[R_\lambda(X, \delta)] = E_{X, \sigma}\, E_{\theta \mid X, \sigma}[L_\lambda(\theta, \delta)]$ is also minimized. Suppose the density $g$ of the effects is known; the oracle decision rule for the aforementioned classification problem is given by $\delta(\lambda) = \{\delta_1, \ldots, \delta_m\} = \{-\delta_1^{-1} + \delta_1^{+1}, \ldots, -\delta_m^{-1} + \delta_m^{+1}\}$, where
$$\delta_i^{-1} = I\!\left(\frac{P(\mu_i \geq 0 \mid x_i, \sigma_i)}{P(\mu_i < 0 \mid x_i, \sigma_i)} = \frac{P(\mu_i \geq 0 \mid x_i, \sigma_i)}{1 - P(\mu_i \geq 0 \mid x_i, \sigma_i)} < \lambda\right), \qquad \delta_i^{+1} = I\!\left(\frac{P(\mu_i \leq 0 \mid x_i, \sigma_i)}{P(\mu_i > 0 \mid x_i, \sigma_i)} = \frac{P(\mu_i \leq 0 \mid x_i, \sigma_i)}{1 - P(\mu_i \leq 0 \mid x_i, \sigma_i)} < \lambda\right). \quad (1.4.4)$$
Note that the two ratios in (1.4.4) are monotonically increasing in $P(\mu_i \geq 0 \mid x_i, \sigma_i)$ and $P(\mu_i \leq 0 \mid x_i, \sigma_i)$, respectively, with $P(\mu_i \geq 0 \mid x_i, \sigma_i) + P(\mu_i \leq 0 \mid x_i, \sigma_i) = 1 + P(\mu_i = 0 \mid x_i, \sigma_i) \geq 1$ if there is a point mass at $\mu = 0$ and $= 1$ otherwise, which implies that we always have $\delta_i^{-1} \delta_i^{+1} = 0$ for $\lambda < 1/2$, as mentioned above. Therefore, instead of making decisions for $\delta_i^{-1} \in \{1, 0\}$ and $\delta_i^{+1} \in \{1, 0\}$ separately, by minimizing the posterior risk (1.4.3) we are essentially making inferences on $\delta_i \in \{-1, 0, +1\}$ simultaneously for each $i$.

Borrowing the concept of the local false discovery rate (Lfdr) introduced in Efron et al. (2001), Efron and Tibshirani (2002) and Storey (2002), we define a directional version of it as
$$\mathrm{Lfsr}(x, \sigma) = \min\{\mathrm{Lfsr}^-(x, \sigma),\, \mathrm{Lfsr}^+(x, \sigma)\} = \min\{P(\mu \geq 0 \mid x, \sigma),\, P(\mu \leq 0 \mid x, \sigma)\} = \min\left\{\frac{\int_0^\infty \phi\!\left(\frac{x - \mu}{\sigma}\right) g(\mu)\, d\mu}{\int_{-\infty}^\infty \phi\!\left(\frac{x - \mu}{\sigma}\right) g(\mu)\, d\mu},\ \frac{\int_{-\infty}^0 \phi\!\left(\frac{x - \mu}{\sigma}\right) g(\mu)\, d\mu}{\int_{-\infty}^\infty \phi\!\left(\frac{x - \mu}{\sigma}\right) g(\mu)\, d\mu}\right\}, \quad (1.4.5)$$
where $\mathrm{Lfsr}^-$ and $\mathrm{Lfsr}^+$ represent the local false selection rates for making a negative and a positive claim, respectively, and $\phi(\cdot)$ is the probability density function of the standard normal distribution. Without loss of generality, we assume that $\mu$ is a continuous random variable, so that $\mathrm{Lfsr}^-(x, \sigma) + \mathrm{Lfsr}^+(x, \sigma) = 1$. Additionally, we define a set of intermediate decisions as
$$\delta(\mathrm{Lfsr}^-(\cdot), \mathrm{Lfsr}^+(\cdot)) = \{1 - 2\, I(\mathrm{Lfsr}^-(x_i, \sigma_i) < \mathrm{Lfsr}^+(x_i, \sigma_i)) : i = 1, \ldots, m\}, \quad (1.4.6)$$
i.e. $\delta_i = -1$ if $\mathrm{Lfsr}^-(x_i, \sigma_i) < \mathrm{Lfsr}^+(x_i, \sigma_i)$ and $\delta_i = +1$ when the inequality is the other way around. This intermediate decision represents the inference that would be made if a claim is to be made. Then the optimal rule for mFSR control in terms of the Lfsr statistic is of the form
$$\delta(\lambda) = \{I(\mathrm{Lfsr}(x_i, \sigma_i) < \lambda) : i = 1, \ldots, m\} \circ \delta(\mathrm{Lfsr}^-(\cdot), \mathrm{Lfsr}^+(\cdot)), \quad (1.4.7)$$
where $\circ$ denotes the elementwise product. Let $\mathrm{Lfsr}(X, \sigma) = \min\{\mathrm{Lfsr}^-(X, \sigma), \mathrm{Lfsr}^+(X, \sigma)\}$ denote the oracle test statistic. Instead of directly deriving and working with the marginal cdf of this Lfsr statistic, which is unrealistic as very often there is not enough information about the underlying model, we will work out an intuitive derivation of an adaptive procedure directly.
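When $g$ is supported (or approximated) on a finite grid, the posterior sign probabilities in (1.4.5) reduce to weighted sums of normal densities. The short R sketch below computes $\mathrm{Lfsr}^-$ and $\mathrm{Lfsr}^+$ for one observation under that assumption; the grid, its masses and the function name are our own illustrative choices.

```r
# Lfsr^-(x) = P(mu >= 0 | x, sigma) and Lfsr^+(x) = P(mu <= 0 | x, sigma)
# when g places mass pi_g[j] on grid[j]; a sketch, not the author's code.
lfsr_pair <- function(x, sigma, grid, pi_g) {
  lik  <- pi_g * dnorm(x, mean = grid, sd = sigma)  # pi_j * phi_sigma(x - mu_j)
  post <- lik / sum(lik)                            # posterior masses given (x, sigma)
  c(neg = sum(post[grid >= 0]),                     # risk of claiming mu < 0
    pos = sum(post[grid <= 0]))                     # risk of claiming mu > 0
}
```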
The marginal false selection rate (mFSR) of the oracle rule in the general unweighted setting is defined as
$$\mathrm{mFSR} = \frac{E\left[\sum_{i=1}^m I(\mu_i \leq 0)\, I(\delta_i = +1) + I(\mu_i \geq 0)\, I(\delta_i = -1)\right]}{E\left[1 \vee \sum_{i=1}^m I(|\delta_i| = 1)\right]} = \frac{E_{X, \sigma \mid \mu \leq 0}\big[\mathbf 1_{S_\lambda \cap S^+}\big]\, P(\mu \leq 0) + E_{X, \sigma \mid \mu \geq 0}\big[\mathbf 1_{S_\lambda \cap S^-}\big]\, P(\mu \geq 0)}{E\big[\mathbf 1_{S_\lambda}\big]}, \quad (1.4.8)$$
where $S^- = \{(x, \sigma) : \mathrm{Lfsr}^-(x, \sigma) < \mathrm{Lfsr}^+(x, \sigma)\}$, $S^+ = \{(x, \sigma) : \mathrm{Lfsr}^+(x, \sigma) < \mathrm{Lfsr}^-(x, \sigma)\}$ and $S_\lambda = \{(x, \sigma) : \mathrm{Lfsr}(x, \sigma) < \lambda\}$ for some threshold $\lambda \in (0, 1/2)$ on the test statistic. It is given by
$$Q_{OR}(\lambda) = \frac{\int_{S_\lambda \cap S^+} f(x \mid \mu \leq 0, \sigma)\, dx\, P(\mu \leq 0) + \int_{S_\lambda \cap S^-} f(x \mid \mu \geq 0, \sigma)\, dx\, P(\mu \geq 0)}{\int_{S_\lambda} f_X(x \mid \sigma)\, dx} = \frac{\int_{S_\lambda \cap S^+} \mathrm{Lfsr}(x, \sigma)\, f_X(x \mid \sigma)\, dx + \int_{S_\lambda \cap S^-} \mathrm{Lfsr}(x, \sigma)\, f_X(x \mid \sigma)\, dx}{\int_{S_\lambda} f_X(x \mid \sigma)\, dx} = \frac{\int_{S_\lambda} \mathrm{Lfsr}(x, \sigma)\, f_X(x \mid \sigma)\, dx}{\int_{S_\lambda} f_X(x \mid \sigma)\, dx} = \frac{\int I\{\mathrm{Lfsr}(x, \sigma) < \lambda\}\, \mathrm{Lfsr}(x, \sigma)\, f_X(x \mid \sigma)\, dx}{\int I\{\mathrm{Lfsr}(x, \sigma) < \lambda\}\, f_X(x \mid \sigma)\, dx}. \quad (1.4.9)$$
It can then be adapted to the observations $x_1, \ldots, x_m$ and estimated by
$$Q_{OR}(\lambda) \approx \frac{\sum_{i=1}^m I\{\mathrm{Lfsr}(x_i, \sigma_i) < \lambda\}\, \mathrm{Lfsr}(x_i, \sigma_i)}{\sum_{i=1}^m I\{\mathrm{Lfsr}(x_i, \sigma_i) < \lambda\}}. \quad (1.4.10)$$
Note from the above that if directional inference is made for the $j$ effects with the smallest Lfsr values, we can expect about $\sum_{i=1}^j \mathrm{Lfsr}_{(i)}$ of these decisions to be of incorrect direction; therefore $j^{-1}\sum_{i=1}^j \mathrm{Lfsr}_{(i)}$ serves as a good estimate of the mFSR. The following is the adaptive (oracle) procedure for mFSR control in the unweighted setting.

Procedure 1.4.1. (Adaptive procedure for mFSR control) Assume $x_1, \ldots, x_m$ are a set of observations from model (1.2.1) with the pdf $g(\cdot)$ and the $\sigma_i$'s given.

(1) Let $\mathrm{Lfsr}_i = \min\{\mathrm{Lfsr}^-(x_i), \mathrm{Lfsr}^+(x_i)\}$ be the test statistic, where
$$\{\mathrm{Lfsr}^-(x_i, \sigma_i),\, \mathrm{Lfsr}^+(x_i, \sigma_i)\} = \{\Pr(\mu_i \geq 0 \mid x_i, \sigma_i),\, \Pr(\mu_i \leq 0 \mid x_i, \sigma_i)\} = \left\{\frac{\int_0^\infty \phi_{\sigma_i}(x_i - \mu)\, g(\mu)\, d\mu}{f_X(x_i \mid \sigma_i)},\ \frac{\int_{-\infty}^0 \phi_{\sigma_i}(x_i - \mu)\, g(\mu)\, d\mu}{f_X(x_i \mid \sigma_i)}\right\}$$
(with $\phi_\sigma(t) = \sigma^{-1}\phi(t/\sigma)$), and initialize $\delta_i^{OR} = 1 - 2\, I[\mathrm{Lfsr}^-(x_i) < \mathrm{Lfsr}^+(x_i)]$ as the intermediate decisions for $i = 1, \ldots, m$.

(2) Rank the intermediate decisions $\delta_i^{OR}$ by increasing $\mathrm{Lfsr}_i$, and let
$$k = \max\left\{j : \frac{1}{j}\sum_{i=1}^j \mathrm{Lfsr}_{(i)} \leq \alpha\right\}$$
if it exists, and $k = 0$ otherwise. Setting $\delta_{(i)}^{OR} = 0$ for $i > k$, the intermediate decisions $\delta_i^{OR}$ become the final oracle decisions.
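Step (2) of Procedure 1.4.1 is a simple step-up rule on the ordered Lfsr values; a sketch in R, assuming a vector lfsr of the statistics from step (1), might look as follows (the function name is ours).

```r
# Adaptive cutoff of Procedure 1.4.1: keep the largest j whose running average
# of the ordered Lfsr values stays at or below alpha.
mfsr_stepup <- function(lfsr, alpha = 0.05) {
  ord  <- order(lfsr)
  ok   <- cumsum(lfsr[ord]) / seq_along(ord) <= alpha
  k    <- if (any(ok)) max(which(ok)) else 0
  keep <- rep(FALSE, length(lfsr))
  if (k > 0) keep[ord[seq_len(k)]] <- TRUE
  keep                                  # TRUE where a directional claim is retained
}
```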
1.4.2 Oracle procedure for weighted setting

So far, by considering the unweighted setting of the selective inference problem on direction with confidence, we have made the connection between the directional hypothesis testing framework, which can be problematic to define as discussed earlier, and the Bayesian decision theory framework. We have also developed an adaptive procedure for mFSR control. Having filled the gap in the existing literature and built the foundation for our selective inference problem, we now consider the weighted setting of maximizing the wETD (1.2.11) subject to the FSR constraint, which is the main focus of this work.

Looking back at the unweighted setting, the error rate mFSR of the optimal rule depends on the Lfsr statistic, and the power measure ETD depends on $1 - \mathrm{Lfsr}$. In order to differentiate the relative importance between the directions as well as among the decisions, we choose to incorporate the selection weights into the power measure instead of the error rate, since an erroneous decision on the direction is always an error, but such a risk may be worth taking if the potential power gain, when correct, can justify it. Our selective inference problem in the weighted setting can then be viewed as a fractional knapsack problem with three knapsacks: bags 1 and 2 are marked $-$ and $+$, representing inferences made for the negative and positive direction respectively, and the waste bag is marked 0 for indecision. The waste bag does not contribute to the total profit and has unlimited capacity, whereas bags 1 and 2 have limited total capacity and asymmetric profit, meaning that an object, i.e. an effect, assigned to one bag can contribute more profit than if it were assigned to the other. In the context of our directional selective inference problem, the total capacity constraint is analogous to the FSR constraint, the profit to the power, and the bags to the directional claims. The asymmetric profit can be interpreted in relation to the weighted power (1.2.10), in which the directionless part of the weights, the $w_i$'s, are the profits carried by the objects and the directional part, the $w_i^d$'s, are the additional profit contributed by the bag marked $d = -$ or $+$. The goal is to pack the effects into the three bags so that the total profit contributed by these decisions is maximized without exceeding the total capacity.

To solve this problem, similar to the objects in a fractional knapsack problem, the effects can be ranked by the value-to-weight ratio, which we call the Profit-to-Cost Ratio (PCR) in order to avoid confusion with our selection weights. To exhaust the FSR level, we allow a randomized decision rule. The tricky part about this ranking-by-PCR approach is that an effect already assigned to one non-waste bag might need to be taken out and moved to the other non-waste bag if it could contribute more total profit; we need to allow such re-assignment in our procedure. Next we introduce our adaptive procedure for the directional selective inference problem as defined in (1.2.12), followed by a description of the procedure. Without loss of generality, we assume the effects $\mu_i$ come from a continuous distribution.

Procedure 1.4.2. (Adaptive procedure for wETD maximization with mFSR control) Assume $x_1, \ldots, x_m$ are a set of observations from model (1.2.1) with the probability density function $g(\cdot)$ provided by an oracle; $\{\sigma_i\}_{i \in [m]}$ and the two sets of selection weights, $\{w_i^+\}_{i \in [m]}$ and $\{w_i^-\}_{i \in [m]}$, are also given. Then the oracle decision rule $\{\delta_1^{OR}, \ldots, \delta_m^{OR}\}$ is obtained as follows:

(1) Let $\mathrm{Lfsr}_i = \min\{\mathrm{Lfsr}_i^-, \mathrm{Lfsr}_i^+\}$, where
$$\{\mathrm{Lfsr}_i^-,\, \mathrm{Lfsr}_i^+\} = \{P(\mu_i \geq 0 \mid x_i, \sigma_i),\, P(\mu_i \leq 0 \mid x_i, \sigma_i)\} = \left\{\frac{\int_0^\infty \phi_{\sigma_i}(x_i - \mu)\, g(\mu)\, d\mu}{f_X(x_i \mid \sigma_i)},\ \frac{\int_{-\infty}^0 \phi_{\sigma_i}(x_i - \mu)\, g(\mu)\, d\mu}{f_X(x_i \mid \sigma_i)}\right\}.$$
For $i \in \mathcal A := \{i : \mathrm{Lfsr}_i \leq \alpha\}$, let $\delta_i^{OR} = 1 - 2\, I(\mathrm{Lfsr}_i^- < \mathrm{Lfsr}_i^+)$, and denote the available capacity by $C := \sum_{i \in \mathcal A} (\alpha - \mathrm{Lfsr}_i)$.

(2) For $i \in \mathcal B := \{i : \mathrm{Lfsr}_i > \alpha\}$, compute the Profit-to-Cost Ratios, defined as
$$\{\mathrm{PCR}_i^-,\, \mathrm{PCR}_i^+\} = \left\{\frac{w_i^-(1 - \mathrm{Lfsr}_i^-)}{\mathrm{Lfsr}_i^-},\ \frac{w_i^+(1 - \mathrm{Lfsr}_i^+)}{\mathrm{Lfsr}_i^+}\right\}.$$
Let $S_i = \operatorname{argmax}_{s \in \{-, +\}} \mathrm{PCR}_i^s$, and if $\mathrm{PCR}_i^- = \mathrm{PCR}_i^+$, let $S_i = \operatorname{argmax}_{s \in \{-, +\}} \mathrm{Lfsr}_i^s$. Writing $\bar S_i$ for the direction opposite to $S_i$, define a set of new ranking statistics as
$$\{R_i^{S_i},\, R_i^{\bar S_i}\} = \left\{\mathrm{PCR}_i^{S_i},\ \frac{w_i^{\bar S_i}(1 - \mathrm{Lfsr}_i^{\bar S_i}) - w_i^{S_i}(1 - \mathrm{Lfsr}_i^{S_i})}{|\mathrm{Lfsr}_i^- - \mathrm{Lfsr}_i^+|}\right\}.$$
(In the case that $\mathrm{Lfsr}_i^- = \mathrm{Lfsr}_i^+$, define $R_i^{\bar S_i} = 0$.)

(3) Rank the positive elements $R \in \{R_i^{S_i}, R_i^{\bar S_i}\}_{i \in \mathcal B}$ in decreasing order and let $R_{(r)}$ denote the $r$-th largest element, with $s(r) \in \{-, +\}$ and $I(r) \in \mathcal B$ denoting the direction and the index that $R_{(r)}$ is associated with, respectively. Initialize the decisions $\delta_i^{OR} = 0$ for $i \in \mathcal B$. Choose
$$k = \max\left\{j : C_j = \sum_{r=1}^j \left[\big(\mathrm{Lfsr}_{I(r)}^{s(r)} - \alpha\big)\, I\{s(r) = S_{I(r)}\} + \big(\mathrm{Lfsr}_{I(r)}^{s(r)} - \mathrm{Lfsr}_{I(r)}^{\bar s(r)}\big)\, I\{s(r) \neq S_{I(r)}\}\right] \leq C\right\}$$
and sequentially let $\delta_{I(r)}^{OR} = s(r)$ for $r = 1, \ldots, k$ if such a $k$ exists; otherwise, let $k = 0$.

(4) Randomization is utilized for deciding $\delta_{I(k+1)}^{OR}$. Let $U$ be a Uniform(0,1) random variable independent of the observations as well as the underlying truth. Then
• if $s(k+1) = S_{I(k+1)}$, let $\delta_{I(k+1)}^{OR} = s(k+1)\, I\!\left(U < \frac{C - C_k}{\mathrm{Lfsr}_{I(k+1)}^{s(k+1)} - \alpha}\right)$;
• if $s(k+1) \neq S_{I(k+1)}$, change the selection to the opposite direction, i.e. set
$\delta_{I(k+1)}^{OR} = s(k+1)$, in the case $U < \frac{C - C_k}{\mathrm{Lfsr}_{I(k+1)}^{s(k+1)} - \mathrm{Lfsr}_{I(k+1)}^{\bar s(k+1)}}$.

In step (1), we compute the test statistics Lfsr, make inferences for those with $\mathrm{Lfsr}_i \leq \alpha$ by selecting the direction associated with the smaller local false selection rate, and calculate the available capacity remaining. For those not yet considered, i.e. the set $\mathcal B$, we compute the Profit-to-Cost Ratios $\{\mathrm{PCR}_i^-, \mathrm{PCR}_i^+\}_{i \in \mathcal B}$ in step (2), and a set of new ranking statistics $\{R_i^-, R_i^+\}_{i \in \mathcal B}$ is constructed from these Profit-to-Cost Ratios, allowing a direction previously selected to be reconsidered and changed if doing so contributes more profit overall. In step (3), the positive elements in $\{R_i^{S_i}, R_i^{\bar S_i}\}_{i \in \mathcal B}$ are ranked in decreasing order and directions are selected while going down the list until the capacity is (nearly) reached. Notice that, by design, the direction with the higher PCR value is considered first and selected given enough capacity; at a later point, the opposite direction might need to be considered, since selecting it, although it might not add more profit per unit cost, can increase the total profit given enough capacity, which is why we have to construct the new ranking statistics. Essentially, $\max\{\mathrm{PCR}_i^-, \mathrm{PCR}_i^+\} = \max\{R_i^-, R_i^+\}$ is used to determine whether a directional inference should be made, as well as the potential direction to be selected, and then $\min\{R_i^-, R_i^+\}$ is used to determine whether the opposite direction would contribute more total profit and should be selected instead. Note that, by construction of the ranking statistics, we have the following proposition, which will be needed in proving the optimality of our oracle procedure.

Proposition 1.4.1. By construction of the ranking statistics $R_i^-$ and $R_i^+$, for each $i$, the order between $\mathrm{PCR}_i^-$ and $\mathrm{PCR}_i^+$ is preserved between $R_i^-$ and $R_i^+$, i.e. $R_i^{S_i} := \mathrm{PCR}_i^{S_i} \geq \mathrm{PCR}_i^{\bar S_i} > R_i^{\bar S_i}$, where $S_i$ is as defined in step (2) of the oracle procedure.

The last inequality is an expected outcome, for if making a selection of a certain direction for another index has the same PCR as this second occurrence, then that selection should be considered first, as it does not require reversing a previously made selection carrying a higher PCR. In the case that $\mathrm{PCR}_i^- = \mathrm{PCR}_i^+$, the procedure selects the direction that has the higher Lfsr. In order to achieve exact FSR level control and wETD optimality, we make use of randomization in step (4), either on the last potential non-zero direction if no inference has been made for that index, or on the last possible modification if a selection has been made earlier in step (3). If we let $t = R_{(k+1)}$, then the oracle decision rule has the following implication: for all $i$,
$$\delta_i^{OR} = \begin{cases} -1 &\Longrightarrow\ w_i^-(1 - \mathrm{Lfsr}_i^-) \geq t \cdot \mathrm{Lfsr}_i^-, \\ +1 &\Longrightarrow\ w_i^+(1 - \mathrm{Lfsr}_i^+) \geq t \cdot \mathrm{Lfsr}_i^+, \\ \phantom{+}0 &\Longrightarrow\ w_i^-(1 - \mathrm{Lfsr}_i^-) < t \cdot \mathrm{Lfsr}_i^- \ \text{ and } \ w_i^+(1 - \mathrm{Lfsr}_i^+) < t \cdot \mathrm{Lfsr}_i^+. \end{cases} \quad (1.4.11)$$
Keep in mind that these implications are only necessary, not sufficient. Note that randomization is only utilized for theoretical considerations, in order to force the FSR to be at exactly level $\alpha$ so that the optimal power can be effectively characterized. Since only one directional decision, $\delta_{(k+1)}^{OR}$, is randomized, which has a negligible effect in a large-scale setting, we will not utilize this randomized rule for the data-driven procedure introduced later. Next we present the theorem on the validity and optimality of the proposed oracle procedure.

Theorem 1.4.2.
Assume $\frac{\alpha}{1-\alpha} < \frac{w_i^+}{w_i^-} < \frac{1-\alpha}{\alpha}$ for some $\alpha \in (0, 1/2)$. Then the oracle procedure $\delta^{OR}$ proposed in Procedure 1.4.2 controls the mFSR at level $\alpha$ and is optimal in the sense that if $\delta$ is any other decision rule based on $(\boldsymbol{x}, \boldsymbol{\sigma})$ with mFSR control at level $\alpha$, then $\mathrm{wETD}_{\delta^{OR}} \geq \mathrm{wETD}_{\delta}$.

1.4.3 Data-driven procedure and estimation

In Procedure 1.4.2, we assumed that there is an oracle who knows the probability density function $g(\cdot)$ of the true effect sizes $\mu_i$, but in practice, in order to apply our proposed procedure, we need a relatively accurate estimate, call it $\hat g$, based on the set of observations $x_1, \ldots, x_m$ and $\sigma_1, \ldots, \sigma_m$ generated from model (1.2.1). What we have is a nonparametric deconvolution problem, and since we do not make any assumptions a priori regarding the type of distribution $g$ might be, we propose to start with a discrete approximation of this unknown probability distribution of the form
$$\hat g = \sum_{i=0}^n \pi_i\, \delta_{s + i\Delta}(\cdot), \quad (1.4.12)$$
where $\delta_{(\cdot)}(\cdot)$ is the delta function, $s = \lfloor \min\{x_i\} \rfloor$, and $n(\Delta) = \frac{1}{\Delta}\big(\lceil \max\{x_i\} \rceil - \lfloor \min\{x_i\} \rfloor\big)$, so that there are a total of $n + 1$ evenly spaced grid points covering the domain of the random variable $\mu$, taken to be between $\lfloor \min\{x_i\} \rfloor$ and $\lceil \max\{x_i\} \rceil$, with $\Delta$ being the fixed spacing between two adjacent grid points; lastly, the $\pi_i$'s are the parameters to be estimated by minimizing the divergence between the true density $g(\cdot)$ and the estimated density $\hat g(\cdot)$. The choice of $n$ and the effect it has on the performance of the data-driven procedure will be discussed shortly.

Since the $\mu_j$'s are latent variables of which we have no knowledge or observations, and what we do have are the observations $X_j$ generated from the marginal conditional density $f(x \mid \sigma_j)$, along with the $\sigma_j$'s assumed to be either known or well estimated, it is natural to shift our focus to $f(x \mid \sigma)$ and $\hat f(x \mid \sigma)$, with
$$f_\sigma(x) = f(x \mid \sigma) = \int_{-\infty}^\infty \phi_\sigma(x - \mu)\, g(\mu)\, d\mu = \int_{-\infty}^\infty \frac{1}{\sigma}\phi\!\left(\frac{x - \mu}{\sigma}\right) g(\mu)\, d\mu, \quad \text{and} \quad \hat f_\sigma(x) = \hat f(x \mid \sigma) = \int_{-\infty}^\infty \phi_\sigma(x - \mu)\, \hat g(\mu)\, d\mu = \sum_{i=0}^n \pi_i\, \phi_\sigma(x - s - i\Delta), \quad (1.4.13)$$
where $\phi(\cdot)$ is the density function of a standard normal variable. Furthermore, let us temporarily shift our attention to the joint distributions $f(x, \sigma)$ and $\hat f(x, \sigma)$, for $\sigma$ can be a random variable itself. The dissimilarity between these two probability distributions can be quantified by a divergence measure such as the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951). The KL divergence belongs to the family of f-divergences between density functions, which was introduced by Csiszár (1967) and encompasses the total variation and the Hellinger distances, and also to the family of density power divergences introduced by Basu et al. (1998), which encompasses the L2 distance. The KL divergence between two densities $P$ and $Q$, where $P$ is absolutely continuous with respect to $Q$, is defined as $\mathrm{KL}(P \,\|\, Q) = \int \log\!\left(\frac{dP}{dQ}\right) dP$.

Whereas when $\min\{\mathrm{Lfsr}_i^-, \mathrm{Lfsr}_i^+\} \leq \alpha$, we would automatically select the direction associated with the smaller Lfsr, as it adds profit without incurring any cost, making the concept of the PCR irrelevant and inappropriate here, as it would be negative and ranked at the back. Moreover, the PCR and the ranking statistic $R$ can be unbounded by definition. To make the data-driven procedure more compact, we will redefine the ranking statistics based on the old ones, so that the two different sets of selection rules can be combined into one without altering the selection process of the oracle procedure.
For $i = 1, \ldots, m$, let
$$\tilde R_i^{S_i} := \frac{\mathrm{Lfsr}_i^{S_i} - \alpha}{w_i^{S_i}(1 - \mathrm{Lfsr}_i^{S_i}) + |\mathrm{Lfsr}_i^{S_i} - \alpha|} = \frac{1}{\frac{w_i^{S_i}(1 - \mathrm{Lfsr}_i^{S_i})}{\mathrm{Lfsr}_i^{S_i} - \alpha} + \mathrm{sgn}(\mathrm{Lfsr}_i^{S_i} - \alpha)} \quad \text{if } \mathrm{Lfsr}_i^{S_i} \neq \alpha,$$
and
$$\tilde R_i^{\bar S_i} := \frac{|\mathrm{Lfsr}_i^- - \mathrm{Lfsr}_i^+|}{\max\{w_i^{\bar S_i}(1 - \mathrm{Lfsr}_i^{\bar S_i}) - w_i^{S_i}(1 - \mathrm{Lfsr}_i^{S_i}),\, 0\} + |\mathrm{Lfsr}_i^- - \mathrm{Lfsr}_i^+|} = \frac{1}{\max\!\left\{\frac{w_i^{\bar S_i}(1 - \mathrm{Lfsr}_i^{\bar S_i}) - w_i^{S_i}(1 - \mathrm{Lfsr}_i^{S_i})}{|\mathrm{Lfsr}_i^- - \mathrm{Lfsr}_i^+|},\, 0\right\} + 1} \quad \text{if } \mathrm{Lfsr}_i^- \neq \mathrm{Lfsr}_i^+,$$
with $S_i \in \{-, +\}$ such that $\frac{\mathrm{Lfsr}_i^{S_i} - \alpha}{w_i^{S_i}(1 - \mathrm{Lfsr}_i^{S_i})} < \frac{\mathrm{Lfsr}_i^{\bar S_i} - \alpha}{w_i^{\bar S_i}(1 - \mathrm{Lfsr}_i^{\bar S_i})}$, or $\mathrm{Lfsr}_i^{S_i} > \mathrm{Lfsr}_i^{\bar S_i}$ when equality happens. We can see that these ranking statistics are closely, in fact inversely, related to the ranking statistics used in the oracle procedure (Procedure 1.4.2), but unlike the old ones, they are applicable for all $i \in \{1, \ldots, m\}$ and should be ranked in increasing order. Note that $\tilde R_i^{S_i} \in (-1, 1]$, and $\tilde R_i^{S_i} \leq 0$ if and only if $\mathrm{Lfsr}_i^{S_i} \leq \alpha$, in which case the corresponding direction $S_i$ is selected. On the other hand, $\tilde R_i^{\bar S_i} \in (0, 1]$, and $\tilde R_i^{\bar S_i} = 1$ if and only if $w_i^{\bar S_i}(1 - \mathrm{Lfsr}_i^{\bar S_i}) - w_i^{S_i}(1 - \mathrm{Lfsr}_i^{S_i}) \leq 0$, which is true by definition when $\mathrm{Lfsr}_i^- = \mathrm{Lfsr}_i^+$ or $\mathrm{Lfsr}_i^{S_i} \leq \alpha$, and it means the direction $\bar S_i$ should never be considered, as it loses to $S_i$ in terms of profit per unit cost as well as the total profit it could contribute. Moreover, note that $\tilde R_i^-$ and $\tilde R_i^+$ are continuous and bounded. Based on the oracle procedure, we formulate our data-driven procedure using the modified ranking statistics as follows.

Procedure 1.4.3. (Data-driven procedure for wETD maximization with mFSR control) Assume $x_1, \ldots, x_m$ are a set of observations from model (1.2.1) with some unknown probability density function $g(\cdot)$, and the $\sigma_i$'s are known or can be estimated well. The two sets of selection weights, $\{w_i^+\}_{i \in [m]}$ and $\{w_i^-\}_{i \in [m]}$, satisfying the condition $w_i^-, w_i^+ \in \left(\frac{\alpha}{1-\alpha}, \frac{1-\alpha}{\alpha}\right)$ for $\alpha < 1/2$, have been pre-determined based on the researchers' preferences.

(1) Compute the test statistics for $i = 1, \ldots, m$:
$$\big\{\widehat{\mathrm{Lfsr}}_i^-,\, \widehat{\mathrm{Lfsr}}_i^+\big\} = \left\{\frac{\sum_{j:\, s + j\Delta \geq 0} \pi_j\, \phi_{\sigma_i}(x_i - s - j\Delta)}{\sum_{j=0}^n \pi_j\, \phi_{\sigma_i}(x_i - s - j\Delta)},\ \frac{\sum_{j:\, s + j\Delta \leq 0} \pi_j\, \phi_{\sigma_i}(x_i - s - j\Delta)}{\sum_{j=0}^n \pi_j\, \phi_{\sigma_i}(x_i - s - j\Delta)}\right\},$$
where $\phi$ is the standard normal density function, $s = \lfloor\min\{x_1, \ldots, x_m\}\rfloor$, $\Delta = \frac{1}{n}\big(\lceil\max\{x_i\}\rceil - \lfloor\min\{x_i\}\rfloor\big)$, and the parameters $\pi_0, \ldots, \pi_n$ are estimated from the optimization problem
$$\max_{\pi_0, \ldots, \pi_n}\ \sum_{i=1}^m \log\left[\sum_{j=0}^n \pi_j\, \phi_{\sigma_i}(x_i - s - j\Delta)\right] \quad \text{subject to} \quad \sum_{j=0}^n \pi_j = 1 \ \text{ and } \ \pi_j \geq 0 \ \forall j.$$

(2) Compute the ranking statistics for $i = 1, \ldots, m$:
$$\hat R_i^{S_i} = \frac{\widehat{\mathrm{Lfsr}}_i^{S_i} - \alpha}{w_i^{S_i}(1 - \widehat{\mathrm{Lfsr}}_i^{S_i}) + |\widehat{\mathrm{Lfsr}}_i^{S_i} - \alpha|}, \qquad \hat R_i^{\bar S_i} = \frac{|\widehat{\mathrm{Lfsr}}_i^- - \widehat{\mathrm{Lfsr}}_i^+|}{\max\{w_i^{\bar S_i}(1 - \widehat{\mathrm{Lfsr}}_i^{\bar S_i}) - w_i^{S_i}(1 - \widehat{\mathrm{Lfsr}}_i^{S_i}),\, 0\} + |\widehat{\mathrm{Lfsr}}_i^- - \widehat{\mathrm{Lfsr}}_i^+|},$$
where $S_i = \operatorname{argmin}_{s \in \{-, +\}} \frac{\widehat{\mathrm{Lfsr}}_i^s - \alpha}{w_i^s(1 - \widehat{\mathrm{Lfsr}}_i^s)}$; if equality happens, let $S_i = \operatorname{argmax}_{s \in \{-, +\}} \widehat{\mathrm{Lfsr}}_i^s$.

(3) Rank the elements $\hat R \in \{\hat R_i^{S_i}, \hat R_i^{\bar S_i}\}_{i=1}^m$ with $\hat R < 1$ in increasing order, and let $\hat R_{(r)}$ denote the $r$-th smallest element in the set, with $s(r) \in \{-, +\}$ and $I(r) \in \{1, \ldots, m\}$ denoting the direction and the index that $\hat R_{(r)}$ is associated with, respectively. Choose
$$k = \max\left\{j : \sum_{r=1}^j \left[\widehat{\mathrm{Lfsr}}_{I(r)}^{s(r)}\, I\{s(r) = S_{I(r)}\} + \big(\widehat{\mathrm{Lfsr}}_{I(r)}^{s(r)} - \widehat{\mathrm{Lfsr}}_{I(r)}^{\bar s(r)}\big)\, I\{s(r) \neq S_{I(r)}\}\right] \leq j\alpha\right\}.$$
If such a $k$ exists, then sequentially let $\delta_{I(r)}^{DD} = s(r)$ for $r = 1, \ldots, k$ and $\delta_{I(r)}^{DD} = 0$ for all other unmade decisions; otherwise, let $\delta_i^{DD} = 0$ for $i = 1, \ldots, m$.
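The optimization in step (1) is a convex maximum-likelihood problem in the grid masses and can be written directly in CVXR, the package used in the simulations of Section 1.5. The sketch below is one way to set it up, with our own function and variable names; the MOSEK solver (via Rmosek) is only an optional choice, and the default CVXR solver also handles this problem.

```r
# Grid-based deconvolution of step (1), Procedure 1.4.3: estimate the masses
# pi_0, ..., pi_n by maximizing the marginal log-likelihood with CVXR.
library(CVXR)

npmle_weights <- function(x, sigma, grid) {
  # A[i, j] = phi_{sigma_i}(x_i - mu_j), the likelihood of x_i at grid point j
  A    <- sapply(grid, function(mj) dnorm(x, mean = mj, sd = sigma))
  pi_g <- Variable(length(grid))
  prob <- Problem(Maximize(sum(log(A %*% pi_g))),
                  list(pi_g >= 0, sum(pi_g) == 1))
  res  <- solve(prob)              # e.g. solve(prob, solver = "MOSEK") if Rmosek is installed
  as.vector(res$getValue(pi_g))
}

# Example usage with grid spacing 0.5, as in the simulations:
# grid   <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
# pi_hat <- npmle_weights(x, sigma, grid)
```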
We can see that the data-driven decision rule from the above procedure belongs to the following class of decision rules, $\delta(t) = \{\delta_i(t) : i = 1, \ldots, m\}$ for some $t \in [0, 1)$, where
$$\delta_i(t) = \begin{cases} -1 & \text{if } \hat R_i^- \leq t < \hat R_i^+ \ \text{ or } \ \hat R_i^+ < \hat R_i^- \leq t, \\ \phantom{+}0 & \text{if } \min\{\hat R_i^-, \hat R_i^+\} > t, \\ +1 & \text{if } \hat R_i^+ \leq t < \hat R_i^- \ \text{ or } \ \hat R_i^- < \hat R_i^+ \leq t. \end{cases}$$
With the theoretical property of the estimator already established in Proposition 1.4.3, we are now ready to present the main result on the asymptotic validity and power attainment of the proposed data-driven procedure.

Theorem 1.4.4. The data-driven procedure described in Procedure 1.4.3 controls the FSR at level $\alpha + o(1)$, and $\mathrm{wETD}_{DD}/\mathrm{wETD}_{OR} > 1 + o(1)$, under the condition that the selection weights used are independent of the observations.

1.5 Simulation Studies

In this section, we examine the numerical performance of our proposed data-driven procedure in unweighted as well as weighted settings. We use the R package CVXR (Fu et al., 2021) to formulate the convex optimization problem described in (1.4.17), with the solver Rmosek (MOSEK ApS, 2021). We first focus on the unweighted settings and compare the performance of our data-driven procedure, mainly the nonparametric deconvolution method for estimating the distribution of true effects $\hat g(\cdot)$, with that of the method proposed in Stephens (2016), which we refer to as the NewDeal procedure, along with the modified BH procedure assuming all effect sizes are zero under the null. The NewDeal procedure has been implemented by its authors and made available to the public as the R package ashr (Stephens et al., 2020). After presenting the numerical performance of our proposed deconvolution method in the unweighted settings, we offer a brief discussion of the advantages of our method compared to the one proposed in Stephens (2016), and then move on to the weighted setting to demonstrate how the weighted procedure can affect the directional decisions.

For each setting, the number of observations is $m = 5000$, each simulation is run over 100 repetitions, and the grid size used in our proposed deconvolution method is 0.5. The nominal FSR is fixed at $\alpha = 0.05$ for the unweighted settings and at $\alpha = 0.1$ for the weighted settings. For the power measure in the unweighted settings, we use the expected true proportion (ETP), defined as the ratio of the expected number of true directional decisions (ETD) to the number of nonzero effects, so that the power measure is standardized between 0 and 1.

1.5.1 Unweighted settings

In the first setting, we consider the random effects model that has been studied by Lewis and Thayer (2004), Sarkar and Zhou (2008) and Lewis and Thayer (2009) for the similar problem of multiple directional decisions, though with different formulations. In this scenario, we have
$$X_i \mid \mu_i \sim N(\mu_i, 1) \quad \text{and} \quad \mu_i \mid \tau^2 \sim N(0, \tau^2), \qquad i = 1, \ldots, m,$$
with $\tau$ varying from 0.1 to 3. The performance of the three procedures is shown in Figure 1.1.

Figure 1.1: Unweighted setting 1, $X_i \sim N(\mu_i, 1)$ where $\mu_i \sim N(0, \tau^2)$ for varying $\tau$: (a) error comparison (FSR vs. $\tau$) and (b) power comparison (ETP vs. $\tau$) for the proposed, NewDeal and BH procedures.

Next we consider a scenario in which the true $\mu_i$'s come from an asymmetric distribution,
$$X_i \mid \mu_i \sim N(\mu_i, 1) \quad \text{and} \quad g(\mu) = p_+\, \phi_+(\mu)\, I(\mu > 0) + p_-\, \phi_-(\mu)\, I(\mu < 0),$$
where $\phi_+(\cdot)$ and $\phi_-(\cdot)$ are the truncated $N(0, 1)$ densities over $(0, \infty)$ and $(-\infty, 0)$, respectively, and $p_+$ varies from 0.1 to 0.9 with $p_- = 1 - p_+$.
When $p^+ = p^- = 0.5$, this setting reduces to the random effects model with $\tau = 1$. The performance of the three procedures in this setting is shown in Figure 1.2.

[Figure 1.2: Unweighted setting 2, $\mu_i \sim p^+\,\mathrm{LT}N(0,1) + (1 - p^+)\,\mathrm{RT}N(0,1)$ for varying $p^+$. Panel (a): error comparison (FSR versus $p^+$) for the Proposed, NewDeal and BH procedures against the nominal $\alpha$ level; panel (b): power comparison (ETP versus $p^+$).]

The first two settings both have homoskedastic errors, so next we present a setting with heteroskedastic errors such that
$$X_i \mid \mu_i \sim N(\mu_i, \sigma_i) \quad \text{with} \quad \mu_i \sim 0.5\,N(-1, 1) + 0.5\,N(1, 1) \quad \text{and} \quad \sigma_i \sim 0.5\,\delta_1 + 0.5\,\delta_s,$$
where $s$ varies from 1 to 3. In other words, the true effects $\mu_i$ come from a mixture of two normal distributions, half of the observations have standard error 1, and the other half have standard error $s$. The performance of the three procedures in this setting is shown in Figure 1.3.

[Figure 1.3: Unweighted setting 3 with heteroskedastic error, $\mu_i \sim 0.5 N(-1,1) + 0.5 N(1,1)$ and $\sigma_i \sim 0.5\,\delta_1 + 0.5\,\delta_s$ for varying $s$. Panel (a): error comparison (FSR versus $s$); panel (b): power comparison (ETP versus $s$).]

1.5.2 Discussion on proposed algorithm

So far we have only considered settings in which the effect sizes are certain to be nonzero, i.e. $\mu_i \neq 0$. This is the scenario we had in mind when formulating the problem, although we did not want to be restricted to it, and in this scenario our procedure performs best among the three procedures. The NewDeal procedure is more conservative because it was designed under the assumption that the distribution of the true effects $\mu_i$ is unimodal about 0, so a point mass is assigned to 0. Furthermore, it includes a penalty term in estimating the proportion of zero effects in order to yield a conservative estimate of the point mass at 0. Therefore, when the distribution of the $\mu_i$'s has no point mass at 0, the NewDeal procedure is overly conservative. Although we believe the scenario without a point mass at 0 is the more realistic one, our proposed procedure does not depend on this assumption and performs well in either case.

Shown in Figure 1.7, placed in Section 1.10, are the simulation results for another three unweighted settings in which there is a point mass at 0. In Setting 4.1, the true effects $\mu_i$ come from a mixture of a point mass at 0 and a normal $N(2,1)$, such that
$$X_i \mid \mu_i \sim N(\mu_i, 1) \quad \text{and} \quad \mu_i \sim (1 - p)\,\delta_0 + p\,N(2, 1).$$
We see that as the proportion of zero effects decreases to around 0.1, the NewDeal procedure fails to control the FSR. In Setting 4.2, we fix the nonzero proportion at 0.9 with
$$X_i \mid \mu_i \sim N(\mu_i, 1) \quad \text{and} \quad \mu_i \sim 0.1\,\delta_0 + 0.9\,N(c, 1),$$
and vary the mean $c$ of the normal component from 0 to $-2.5$. We see that the FSR of the NewDeal procedure exceeds the $\alpha$ level once the mean goes beyond $-2$. Note that when the normal component is $N(0,1)$, the distribution of the $\mu_i$'s is unimodal about 0 as assumed by the NewDeal procedure, yet their procedure is still too conservative compared with our proposed procedure. In unweighted Setting 4.3, we place two point masses, at 0 and at 2, with varying proportion $p$:
$$X_i \mid \mu_i \sim N(\mu_i, 1) \quad \text{and} \quad \mu_i \sim (1 - p)\,\delta_0 + p\,\delta_2.$$
In this setting, the NewDeal procedure once again fails to control the FSR, which can be as large as three times the $\alpha$ level.
The FSR violation of the NewDeal procedure stems from underestimating the proportion of zero effects despite the penalty term meant to ensure a consistently conservative estimate, and this violation is most likely due to the assumption behind the NewDeal procedure failing when the distribution of true effects is not unimodal about 0. Under these settings, our proposed procedure, on the other hand, is still able to control the FSR without substantial violation, since our deconvolution method makes no such shape assumption and can handle both scenarios.

Lastly for the unweighted procedure, we consider a setting that fits perfectly into the NewDeal framework for estimating $\hat{g}(\mu)$, with
$$X_i \mid \mu_i \sim N(\mu_i, 1) \quad \text{and} \quad \mu_i \sim (1 - p)\,\delta_0 + p\,U(-2, 3),$$
and the simulation results are shown in Figure 1.4. We see that the NewDeal procedure controls the FSR almost exactly at the $\alpha$ level, and the performance of our procedure is comparable to it, although it slightly exceeds the $\alpha$ level. Overall, our proposed unweighted procedure, more specifically the nonparametric deconvolution procedure for estimating $\hat{g}(\mu)$, is more flexible and performs better than the NewDeal procedure.

[Figure 1.4: Unweighted setting 5, $\mu_i \sim (1 - p)\,\delta_0 + p\,U(-2, 3)$ for varying $p$; a setting unimodal about 0 and designed for the NewDeal procedure. Panel (a): error comparison (FSR versus $p$) for the Proposed, NewDeal and BH procedures against the nominal $\alpha$ level; panel (b): power comparison (ETP versus $p$).]

1.5.3 Weighted settings

We are now ready to implement our proposed weighted procedure. We focus on the simple weighting scheme with a single weight $w$ shared by the $m$ inferences. This weight is used to differentiate the importance of discoveries made in the two directions. Without loss of generality, we place the weight on the positive signals, so that $w > 1$ implies a preference for discoveries in the positive direction, whereas $0 < w < 1$ implies a preference for the negative direction. We vary the value of $w$ within its allowable range $\left(\frac{\alpha}{1-\alpha}, \frac{1-\alpha}{\alpha}\right)$ for $\alpha = 0.1 < 0.5$, and we observe how this weight affects the directional decisions that are made. When $w = 1$, the setting reduces to the unweighted one. For the error measure, in addition to the false selection rate FSR that must be controlled, we also report the false selection rate among the positive decisions, FSR$^+$, and among the negative decisions, FSR$^-$, so that the effect of the weight can be seen more directly. Since the definition of our weighted power wETD involves the arbitrary weight $w$, in order to standardize this power measure across different values of $w$, we first compute the wETD of the weighted decision rule as well as that of the unweighted decision rule and then take their ratio, so that a ratio greater than 1 indicates that the weighted procedure is indeed more powerful than the unweighted procedure in terms of wETD. Additionally, we include the standardized versions of ETD, ETD$^+$ and ETD$^-$. These additional error and power measures are included to assess the trade-offs of placing such a weight.
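The error and power summaries just described can be computed directly from the simulated truth and the directional decisions; the following R sketch (with a helper name of our own choosing) shows one way to do so for a single replication, and averaging these quantities over replications approximates FSR, FSR$^+$, FSR$^-$, ETD and wETD.

    # Illustrative per-replication summaries for the weighted settings, given
    # directional decisions delta in {-1, 0, +1}, true effects mu, and the
    # selection weights (w_plus, w_minus), which may be scalars or vectors.
    weighted_summaries <- function(delta, mu, w_plus, w_minus) {
      pos <- delta == 1; neg <- delta == -1; sel <- delta != 0
      false_sel <- (pos & mu <= 0) | (neg & mu >= 0)
      list(
        FSR       = sum(false_sel)       / max(sum(sel), 1),
        FSR_plus  = sum(pos & mu <= 0)   / max(sum(pos), 1),
        FSR_minus = sum(neg & mu >= 0)   / max(sum(neg), 1),
        ETD_plus  = sum(pos & mu > 0),
        ETD_minus = sum(neg & mu < 0),
        wETD      = sum(w_plus * (pos & mu > 0)) + sum(w_minus * (neg & mu < 0))
      )
    }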
Let us consider the symmetric setting where the true effects $\mu_i$ come from a mixture of the two normal distributions $N(-2,1)$ and $N(2,1)$:
$$X_i \mid \mu_i \sim N(\mu_i, 1) \quad \text{and} \quad \mu_i \sim 0.5\,N(-2, 1) + 0.5\,N(2, 1).$$
By observing ETD$^+$ and ETD$^-$ in Figure 1.5, we see that as the weight $w$ increases from 1, more positive signals are discovered, and as $w$ decreases from 1, more negative signals are discovered.

[Figure 1.5: Weighted setting 1, $\mu_i \sim 0.5 N(-2,1) + 0.5 N(2,1)$ for varying weights placed on positive signals. Left panel (error analysis): FSR, FSR$^+$ and FSR$^-$ versus the weight on positive signals, with the nominal $\alpha$ level; right panel (power analysis): the ratio wETD/wETD(unweighted procedure) together with the standardized ETD, ETD$^+$ and ETD$^-$.]

These additional discoveries in the preferred direction are made at the expense of those in the other direction as well as of the total number of discoveries, and there are also more false discoveries in the preferred direction, as shown by FSR$^+$ and FSR$^-$. Overall, the FSR stays below the $\alpha$ level as required and, most importantly, the weighted procedure is more powerful than the unweighted procedure according to the definition of wETD with the pre-specified weight $w$.

To further examine the role that the weight $w$ plays, we next consider two asymmetric scenarios in which there are more negative effects than positive ones:
$$X_i \mid \mu_i \sim N(\mu_i, 1) \quad \text{and} \quad \mu_i \sim 0.1\,\delta_0 + 0.9\,N(\mu_0, 1),$$
where we choose $\mu_0 = -1$ in one setting and $\mu_0 = -2$ in the other. Again, we vary the weight $w$ placed on the positive signals within its allowable range, and the simulation results are shown in Figure 1.6. We see that the FSR is controlled at the nominal level $\alpha = 0.1$, but FSR$^+$ exceeds the $\alpha$ level as the weight placed on positive signals increases in weighted Setting 2.1, and this problem is exacerbated in weighted Setting 2.2, where there are far fewer positive signals. As the weight $w$ increases to its maximum allowed value, more than 40% of the positive directional decisions are false. Moreover, in such an extreme setting where most of the effects are non-positive, the weighted power wETD of our weighted procedure improves little from using a larger weight. Therefore, placing too large a weight on a desired direction may not yield additional power, and as a result, a moderate weight is recommended.

Finally, we briefly demonstrate the effect of a more complicated weighting scheme in which the weights placed on positive and negative signals are not the same across the $m = 10{,}000$ tests, splitting them into 4 blocks under the assumption that the indices correspond to temporal or spatial locations. We consider a simple setting in which half of the $\mu_i$'s are $-1$ and the other half are $1$, and implement four different weighting schemes. The detailed outcomes of the different schemes and their effects on the directional decisions in the different blocks are shown in Table 1.3, located in Section 1.10. In order to make the pattern more apparent, we used the maximum allowed value for the directional weights, although in practice such a choice is neither necessary nor recommended.
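For the block-wise weighting schemes, the weight vectors can be built by assigning a $(w^+, w^-)$ pair to each block of indices; the blocks and values in the R sketch below are purely illustrative and do not reproduce the exact schemes reported in Table 1.3.

    # Illustrative block-wise directional weights for m = 10,000 ordered tests:
    # the first block favors positive discoveries, the middle block is neutral,
    # and the last block favors negative discoveries.
    m <- 10000
    block   <- cut(seq_len(m), breaks = c(0, 4000, 6000, m), labels = FALSE)
    w_plus  <- c(9, 1, 1)[block]
    w_minus <- c(1, 1, 9)[block]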
[Figure 1.6: Weighted settings 2.1 and 2.2, $\mu_i \sim 0.1\,\delta_0 + 0.9\,N(-1,1)$ (Setting 2.1) and $\mu_i \sim 0.1\,\delta_0 + 0.9\,N(-2,1)$ (Setting 2.2), for varying weights placed on positive signals. For each setting, the left panel (error analysis) shows FSR, FSR$^+$ and FSR$^-$ versus the weight on positive signals, with the nominal $\alpha$ level; the right panel (power analysis) shows the ratio wETD/wETD(unweighted procedure) together with ETD, ETD$^+$ and ETD$^-$.]

Table 1.2 shows that for each weighting scheme, as expected, the FSR is controlled at $\alpha = 0.1$ and the weighted procedure is more powerful in terms of the weighted power wETD.

Table 1.2: FSR and power for varying weighting schemes

  Weighting scheme               I          II         III        IV
  FSR with alpha = 10%           0.0966     0.0965     0.0967     0.0966
  wETD (weighted procedure)      48367.58   32367.10   36155.48   43483.76
  wETD (unweighted procedure)    37619.10   28665.75   31335.18   35784.23

1.6 Conclusions

The main purpose of this work is to introduce a statistical methodology and to develop the relevant theory for a weighted selective inference problem with confident directions. The preference component is incorporated into the problem as weights in the power function. In terms of multiple hypothesis testing, the unweighted version of our problem is similar to a multiple comparisons problem with directional decisions, and it can also be put into a decision-theoretic framework. Our proposed oracle decision rule was shown to be optimal: it maximizes the power function, defined as the weighted expected number of true directional decisions, while controlling the false selection rate at any reasonable pre-specified level. The deconvolution method needed to apply the proposed procedure in practice was proved to yield an asymptotically consistent density estimator for the underlying distribution of the effect sizes. Simulation studies showed that the proposed procedure, when driven entirely by the data, can outperform existing procedures and has wider applicability, as we make no assumption on the underlying truth except for the noise term. Lastly, it is important to keep in mind that the weights used in the procedure are selection weights representing preferences, and they must be chosen independently of the observations as well as of the underlying truth.

1.7 Proofs of Theorems

1.7.1 Proof of Theorem 1.4.2

The two objectives of this proof are to show that (a) the oracle procedure (1.4.2) is valid in the sense that $\mathrm{mFSR}_{OR} \leq \alpha$, and (b) the oracle procedure is optimal in the sense that $\mathrm{ETD}_{OR} \geq \mathrm{ETD}_{\boldsymbol{\delta}}$ for any other decision rule $\boldsymbol{\delta}$ with $\mathrm{FSR}_{\boldsymbol{\delta}} \leq \alpha$.

Part (a): Validity. To show that the proposed oracle procedure controls the mFSR at level $\alpha$, i.e.
mFSR OR : = E U;X; E jX; P m i=1 I( i 0)I( OR i =1) +I( i 0)I( OR i = +1) E U;X; P m i=1 I( OR i 6= 0) = E U;X; P m i=1 Lfsr I( OR i =1) + Lfsr + I( OR i = +1) E U;X; P m i=1 I( OR i 6= 0) =; we only need to show E U;X; m X i=1 (Lfsr i )I( OR i =1) + (Lfsr + i )I( OR i = +1) = 0: (1.7.1) Based on the construction of the oracle procedure, we have LHS(1.7.1) = X fi:Lfsr s i g (Lfsr s i ) + k X i=1 Lfsr s(i) I(i) + Lfsr s(k+1) I(k+1) I U <c 1 ; s(k + 1) =S I(k+1) + Lfsr s(k+1) I(k+1) Lfsr s(k+1) I(k+1) I U <c 2 ; s(k + 1)6=S I(k+1) ; where k is the cuto point determined in step (3), c 1 = CC k Lfsr s(k+1) I(k+1) and c 1 = CC k Lfsr s(k+1) I(k+1) Lfsr s(k+1) I(k+1) with CC k being the remaining available capacity after the cuto point. Since U is a Uniform (0,1) random variable independent of the observations as well as the underlying truth, it follows that LHS(1.7.1) = 0 and we have proved that our oracle procedure controls the mFSR exactly at level when randomization is utilized in the oracle decision rule. Then result in terms of FSR would follow from the asymptotic equivalence of FDR and mFDR, which was shown in Cai et al. (2019). Part (b): Optimality. 39 Let 2f1; 0; +1g m be an arbitrary decision rule such that FSR( ), i.e. E X; m X i=1 (Lfsr i ) Pr( i =1jX;) + (Lfsr + i ) Pr( i = +1jX;) 0; (1.7.2) and here we have assumed randomization based on the observations is also employed by this arbitrary decision rule in making the directional decisions. Consider the following sets: I = i :I( OR i =1) Pr( i =1jX;)> 0 fi : OR i =1g; I + = i :I( OR i = +1) Pr( i = +1jX;)> 0 fi : OR i = +1g; I 0 = n i :I(j OR i j = 1) min P( i =1jX;);P( i = +1jX;) < 0 o fi : OR i = 0g; and let I = I( OR i =1) Pr( i =1jX;) w i (1 Lfsr i )t (Lfsr i ) II = I( OR i = +1) Pr( i = +1jX;) w + i (1 Lfsr + i )t (Lfsr + i ) : Combining with Equation (1.4.11), we see that P i2I I 0, P i2I + II 0 and P i2I 0 (I + II) 0. Moreover, by the construction of the oracle procedure along with Proposition 1.4.1, we know there exists a t > 0 such that for i2I , one of the following scenarios must be true: (1) Lfsr i 0, (2) R i t >R + i , or (3) R + i >R i t . Let us consider each scenario separately. (1) If Lfsr i 0, then 1 Lfsr i 1, and since Lfsr i + Lfsr + i 1 by denition, it implies that 1 Lfsr + i Lfsr i and Lfsr + i for < 1=2. Under the given condition 1 w + i w i 1 and noting that t > 0, we have w i (1Lfsr i )t (Lfsr i )w i (1Lfsr i )w + i 1 (1Lfsr i )w + i (1Lfsr + i ) w + i (1 Lfsr + i )t (Lfsr + i ): 40 (2) If R + i <t R i := PCR i , note that by construction of this ranking statistic, we have w + i 1 Lfsr + i w i 1 Lfsr i <t jLfsr + i Lfsr i j; and recall that by Proposition 1.4.1, R + i < PCR + i PCR i . 
Therefore, if Lfsr + i > Lfsr i or PCR + i t , it is straightforward to see w i (1 Lfsr i )t (Lfsr i )w + i (1 Lfsr + i )t (Lfsr + i ): In the case that Lfsr + i < Lfsr i and PCR + i t , we have w i 1 Lfsr i t Lfsr i = PCR i Lfsr i t Lfsr i PCR + i Lfsr i t Lfsr i =w + i 1 Lfsr + i Lfsr i Lfsr + i t Lfsr i =w + i 1 Lfsr + i t Lfsr + i +w + i 1 Lfsr + i Lfsr i Lfsr + i Lfsr + i t Lfsr i Lfsr + i w + i 1 Lfsr + i t Lfsr + i +t Lfsr i Lfsr + i t Lfsr i Lfsr + i =w + i 1 Lfsr + i t Lfsr + i : (3) If R i t , from the denition of R i , it is again easy to see that w i (1 Lfsr i )t (Lfsr i )w + i (1 Lfsr + i )t (Lfsr + i ): We can now conclude that for i2I such that one of scenarios (1) (3) is true, we have I + II = 1 Pr( i =1jX;) w i (1 Lfsr i )t (Lfsr i ) Pr( i = +1jX;) w + i (1 Lfsr + i )t (Lfsr + i ) 1 Pr( i =1jX;) Pr( i = +1jX;) w i (1 Lfsr i )t (Lfsr i ) 0: 41 It follows that P i2I (I + II) 0, and we can similarly show P i2I + (I + II) 0, so we have X i2I [I + [I 0 I( OR i =1) Pr( i =1jX;) w i (1 Lfsr i )t (Lfsr i ) + I( OR i = +1) Pr( i = +1jX;) w + i (1 Lfsr + i )t (Lfsr + i ) 0: Note that OR i 's are perfectly determined byX and except for OR (k+1) , which is dened as a random variable such that Pr OR I(k+1) =s(k + 1) > 0 i R s(k+1) I(k+1) =t . Without loss of generality, assume s(k + 1) = and let i denote the index I(k + 1). There are two possibilities, either (1) PCR i :=R i =t PCR + i or (2) PCR + i > PCR i R i =t > 0. If (1) is true, P( OR i =1jX;)P( i =1jX;) w i (1 Lfsr i )t (Lfsr i ) | {z } =0 + h P( OR i = +1jX;) | {z } =0 P( i = +1jX;) i w + i (1 Lfsr + i )t (Lfsr + i ) | {z } 0 0: In the case that (2) is true, we can easily show that Lfsr i > Lfsr + i . Let III =w i (1 Lfsr i )t (Lfsr i ) and IV =w + i (1 Lfsr + i )t (Lfsr + i ). ThenR i =t implies that III IV = 0, so h P( OR i =1jX;)P( i =1jX;) | {z } 1+P( i =+jX;) i w i (1 Lfsr i )t (Lfsr i ) | {z } III + P( OR i = +1jX;)P( i = +1jX;) w + i (1 Lfsr + i )t (Lfsr + i ) | {z } IV P( OR i =1jX;) 1 | {z } =P( OR i =+1jX;) III +P( OR i = +1jX;) IV = 0: 42 Therefore, it follows that m X i=1 P( OR i =1jX;)P( i =1jX;) w i (1 Lfsr i )t (Lfsr i ) + P( OR i = +1jX;)P( i = +1jX;) w + i (1 Lfsr + i )t (Lfsr + i ) 0: (1.7.3) Recall that the power function of any decision rule is given by wETD( ) =E m X i=1 w i I( i =1)I( i =1) +w + i I( i = +1)I( i = +1) =E X; m X i=1 w i P( i =1jX;)(1 Lfsr i ) +w + i P( i = +1jX;)(1 Lfsr + i )) : Combining equations (1.7.1) { (1.7.3) and noting that t > 0 and the weights are independent of the observations as well as the underlying truth, we can therefore conclude ETD( OR ) ETD( ) and the desired result follows. 
43 1.7.2 Proof of Theorem 1.4.4 We rst begin with a summary of notations that will be used throughout the proof: N i (t) = Lfsr i I R i t + Lfsr + i I R + i t (1.7.4) min Lfsr i ; Lfsr + i I max R i ;R + i t = Lfsr i I fR i t<R + i or R + i <R i tg + Lfsr + i I fR + i t<R i or R i <R + i tg ; b N i (t) = d Lfsr i I n b R i t o + d Lfsr + i I n b R + i t o (1.7.5) h min d Lfsr i ; d Lfsr + i i I n max b R i ; b R + i t o = d Lfsr i I f b R i t< b R + i or b R + i < b R i tg + d Lfsr + i I f b R + i t< b R i or b R i < b R + i tg ; Q(t) =m 1 m X i=1 N i (t) with OR = infft2 [0; 1) :Q(t) 0g being the oracle threshold; b Q(t) =m 1 m X i=1 b N i (t) with b = infft2 [0; 1) : b Q(t) 0g being the data-driven threshold; Q 1 (t) =E [N i (t)] with 1 = infft2 [0; 1) :Q 1 (t) 0g being the asymptotic threshold: And recall that R S i i := Lfsr S i i w S i i 1 Lfsr S i i +j Lfsr S i i j ; R S i i := jLfsr i Lfsr + i j max w S i i (1 Lfsr S i i )w S i i (1 Lfsr S i i ); 0 +jLfsr i Lfsr + i j ; (1.7.6) where S i = argmin s2f;+g Lfsr s i w s i (1Lfsr s i ) or argmax s2f;+g Lfsr s i when equality happens, so similarly we have b R S i i := d Lfsr S i i w S i i 1 d Lfsr S i i +j d Lfsr S i i j ; b R S i i := j d Lfsr i d Lfsr + i j max n w S i i (1 d Lfsr S i i )w S i i (1 d Lfsr S i i ); 0 o +j d Lfsr i Lfsr + i j ; (1.7.7) 44 where S i = argmin s2f;+g d Lfsr s i w s i (1 d Lfsr s i ) or argmax s2f;+g d Lfsr s i when equality happens. Notice that both Q(t) and b Q(t) are only non-decreasing and right-continuous but not monotonic or continuous, which means that the consistency of Q(t) and b Q(t) do not necessarily imply the consistency of Q 1 (0) and b Q 1 (0) respectively. We therefore need to dene a continuous version of each function as Q c (t) = R (k+1) t R (k+1) R (k) Q(R (k) ) + tR (k) R (k+1) R (k) Q(R (k+1) ) for 0R (k) <t<R (k+1) < 1; b Q c (t) = b R (k+1) t b R (k+1) b R (k) b Q( b R (k) ) + t b R (k) b R (k+1) b R (k) b Q( b R (k+1) ) for 0 b R (k) <t< b R (k+1) < 1: By construction, Q c (t) and b Q c (t) are continuous and strictly increasing in t on the domain [0; 1), therefore their inverses Q 1 c and b Q 1 c are well dened, continuous and monotonic. Furthermore, it is easy to see that b = b Q 1 c (0), OR =Q 1 c (0), Q(t) p !Q c (t) and b Q(t) p ! b Q c (t). Next let us rst state a lemma that contains some key facts needed to prove the theorem and the proof of this lemma is in Section 1.9. Lemma 1.7.1. Let N i (t) and b N i (t) be as dened by Equations (1.7.4) and (1.7.5). Then for any t2 [0; 1), we have (1) E b N i (t)N i (t) 2 =o(1), (2) E h b N i (t)N i (t) b N j (t)N j (t) i =o(1), (3) b Q 1 c (0) p !Q 1 c (0) p ! 1 . To show FSR DD = FSR OR +o(1) = +o(1), we only need to show that mFSR DD = mFSR OR +o(1), and then the result would follow from the asymptotic equivalence of FDR and mFDR, which was proven in Cai et al. (2019). Consider the oracle and data-driven thresholds OR and b dened at the beginning of the proof. For the 45 mFDR level of the decision rules, we have mFSR OR = E m P i=1 h I f i 0g I fR i OR <R + i or R + i <R i ORg +I f i 0g I fR + i OR <R i or R i <R + i ORg i max 0 @ E m P i=1 I ( min Lfsr i w i ( 1Lfsr i ) ; Lfsr + i w + i ( 1Lfsr + i ) ! OR ) ; 1 1 A ; mFSR DD = E m P i=1 h I f i 0g I f b R i b < b R + i or b R + i < b R i b g +I f i 0g I f b R + i b < b R i or b R i < b R + i b g i max 0 @ E m P i=1 I ( min d Lfsr i w i ( 1 d Lfsr i ) ; d Lfsr + i w + i ( 1 d Lfsr + i ) ! 
b ) ; 1 1 A : According to Part (iii) of Lemma 1.7.1, we have b = b Q 1 c (0) p !Q 1 c (0) = OR , and by following the steps used in proving Part (i) of the lemma, we can show that E h I fR i OR <R + i or R + i <R i ORg I f b R i b < b R + i or b R + i < b R i b g i =E h I fR i OR <R + i or R + i <R i tg I f b R + i b < b R i or b R i < b R + i b g +I f b R i ; b R + i > b g +I f b R i b < b R + i or b R + i < b R i b g I fR + i OR <R i or R i <R + i ORg +I fR i ;R + i > ORg i =o(1); and similarly, we can also show that E h I fR + i OR <R i or R i <R + i ORg I f b R + i b < b R i or b R i < b R + i b g i =o(1): Moreover, by Proposition 1.4.3, we know that d Lfsr i p ! Lfsr i and d Lfsr + i p ! Lfsr + i , so we have min Lfsr i w i 1 Lfsr i ; Lfsr + i w + i 1 Lfsr + i ! OR p ! min 0 @ d Lfsr i w i 1 d Lfsr i ; d Lfsr + i w + i 1 d Lfsr + i 1 A b : Therefore, we have mFSR DD = mFSR OR +o(1). And since mFSR OR = by construction when randomization is employed but the eect of not including this step is negligible when 46 the sample size m is large as it is of o(1), so we can conclude that mFSR DD = mFSR OR +o(1) = +o(1): Finally, we can use the same arguments to show that wETD DD=wETD OR = 1 +o(1). 1.8 Proof of Propositions 1.8.1 Proof of Proposition 1.4.1 In this proof, we will show that R S i i <R S i i by construction, where S i := argmax s2f;+g PCR s i and if PCR i = PCR + i , we choose S i = argmax s2f;+g Lfsr s i as dened in step 2 of the oracle procedure. When PCR i 6= PCR + i , we have R S i i := w S i i (1 Lfsr S i i )w S i i (1 Lfsr S i i ) j Lfsr i Lfsr + i j = PCR S i i Lfsr S i i PCR S i i Lfsr S i i j Lfsr i Lfsr + i j < PCR S i i Lfsr S i i PCR S i i Lfsr S i i j Lfsr i Lfsr + i j = PCR S i i Lfsr S i i Lfsr S i i j Lfsr i Lfsr + i j < PCR S i i < PCR S i i =:R S i i ; when PCR i = PCR + i , since we let S i = argmax s2f;+g Lfsr s i , we have R S i i < 0< PCR S i i = PCR S i i =:R S i i ; and if Lfsr i = Lfsr + i , the inequality is also valid as R S i i := 0< PCR S i i PCR S i i =:R S i i . Therefore, we can conclude that R S i i < PCR S i i PCR S i i =:R S i i . 1.8.2 Proof of Proposition 1.4.3 The outline of the proof is as follows. We will start with an empirical distributionb g emp based on the unobservable realizations of j and use it as a proxy to justify the discretization of the unknown density for . Then we will work on the theoretical results regarding the oracle discretized density estimator ~ g n and the test statistic formed using it. 47 And at the end, we will justify the method used to nd the discretized density estimator b g n . First we will state a useful lemma: Lemma 1.8.1. Suppose i g() for i = 1;:::;m. Letb g emp be the empirical density function such thatb g emp = 1 m P m i=1 i (). Then for f (x) = R 1 1 (x)g()d and b f emp (x) = R 1 1 (x)b g emp ()d,E kf b f emp k 2 2 ! 0 as n!1. Furthermore, E kf (; 0) b f emp (; 0)k 2 2 ! 0 andE kf (; 0) b f emp (; 0)k 2 2 ! 0 as m!1. Continue with the assumption made in Lemma 1.8.1 in regard to the set of realizations i g() and the empirical density functionb g emp . Lets;s + ;:::;s +n be a grid of xed numbers spanning a range from s minf i g m i=1 to s +n maxf i g m i=1 , with s +k = 0 for some k by design, where is the grid size when there are n+1 grid points. 
Then for any i , there exists s i 2fs;s + ;:::;s +ng such thatj i s i j =2, and we have * E Z 1 1 b f emp (x) 1 m m X i=1 (xs i ) 2 dx =E Z 1 1 h 1 m m X i=1 (x i ) (xs i ) i 2 dx E Z 1 1 1 m m X i=1 (x i ) (xs i ) 2 dx E Z 1 1 2 4m " X i: i almost surely for some > 0, where f =E f and ~ f n =E ~ f are the marginal density functions. Furthermore, we can rewrite ~ f n in terms of ~ g n , the discretized estimate of g(), such that ~ f n (x) = n X j=0 j (xsj) = Z 1 1 (x)~ g n ()d = Z 1 1 (x) n X j=0 j s+j ()d: Next we will show that d Lfsr + (X i ; i ) P ! Lfsr + (X i ; i ) and d Lfsr (X i ; i ) P ! Lfsr (X i ; i ). If we can show L2 convergence, i.e. E k d Lfsr Lfsr k 2 2 ! 0 andE k d Lfsr + Lfsr + k 2 2 ! 0 as the sample size m goes to innity, the desired result of convergence in probability will be automatically implied. And to do so, we will take the same approach as that used in proving Lemma A.1 from Sun and Cai (2007). Note that f is continuous, then there exists K = [M;M] such that Pr(x2K c )! 0 as M!1. Let inf x2K f (x) =l 0 and A f =fx :j ~ f n (x)f (x)jl 0 =2g. Note thatE k ~ f n f k (l 0 =2) 2 Pr(A f ), so we have Pr(A f )! 0. Thus f and ~ f are bounded below by a positive number for large m except for an event that has a low probability. Similar arguments can be applied to the upper bound of f and ~ f , as well as to the upper and lower bounds for ~ f n i (; 0), f n i (; 0), ~ f n i (; 0), and f n i (; 0). Therefore, we can conclude that they are all bounded in the interval [l a ;l b ] with 0<l a <l b <1 for large m except for an event that 50 has algebraically low probability, here we denote it by A . Then we can conclude that E k d Lfsr Lfsr k 2 2 =E Z d Lfsr (x;) Lfsr (x;) 2 dF X; (x;) =E E Z 1 1 ~ f n (x; 0) ~ f n (x) f (x; 0) f (x) ! 2 f (x)dx =E E Z 1 1 ~ f n (x; 0)f (x; 0) ~ f n (x) + f (x; 0)(f (x) ~ f n (x)) ~ f n (x)f (x) ! 2 f (x)dx E E Z 1 1 ~ f n (x; 0)f (x; 0) ~ f n (x) + f (x) ~ f n (x) ~ f n (x) ! 2 f (x)dx Pr(A ) +cE E k ~ f n (; 0)f (; 0)k 2 2 +cE E k ~ f n f k 2 2 ! 0 as m;n!1: We can similarly show thatE k d Lfsr + Lfsr + k 2 ! 0. To show d Lfsr (X i ; i ) p ! Lfsr (X i ; i ), rst let B =f(x;) :j d Lfsr (x;) Lfsr (x;)jg. Then we have 2 Pr(B )E k d Lfsr Lfsr k 2 2 ! 0, which implies Pr(B )! 0 and the result follows from here. The same approach can be used to show that d Lfsr + (X i ; i ) p ! Lfsr + (X i ; i ). Having shown the existence of the set 0 ;:::; n that minimizesE k b f emp ~ f n ; 0 ;:::;n k 2 2 and presented the theories based on it, the remaining task now is to provide theoretical justication for the method that is used to estimate this set. Since E k b f emp b f n ; 0 ;:::;n k 2 2 E k b f emp f k 2 2 +E kf b f n k 2 2 , we can instead aim to minimize E kf b f n k 2 2 , or equivalentlyE kf b f n k 2 2 . To do so, we opted for minimizing the KL divergence between the joint distribution f X; and its estimate b f n X; . This nonnegative divergence measure is dened as D KL (f X; jj b f n X; ) =E X; " log f(X;) b f n (X;) # =E " log f(Xj) b f n (Xj) # =E [logf (X)]E h log b f n (X) i ; We can see that minimizing the KL divergence is equivalent to maximizing 51 E X; h log b f n (X) i . 
Then for given a set of m pairs of observation (X 1 ; 1 );:::; (X m ; m ), by the law of large numbers, we have 1 m m X i=1 h logf i (x i ) log b f n i (x i ) i !D KL (f X; j b f n X; ) as m!1: Furthermore, note that the MISE of any given density estimator b f of f can be bounded as follows: E kf b f k 2 2 E Z 1 1 cjf (x) b f (x)jdx =ckf X; b f X; k 1 c p 2 ln 2 q D KL (f X; j b f X; ) for some nonnegative constant c, where the last inequality comes from the Pinsker's inequality. Here we have assumed, without loss of generality, that is a random variable instead of a xed value, and it is nonzero almost surely. In practice, we only have a nite sample size and the solver that is used to solve the optimization problem might not return the exact optimal solutions especially when the problem is more complicated, either in terms of the underlying distribution g or the number of parameters j 's, i.e. number of grid points used to discretize this unknown distribution. In theory as shown above, the more grid points we use, the closer the estimated statistic d Lfsr will be to the oracle statistic Llsr. However, a solver will almost always return only a few nonzero estimates for the parameters. And to further demonstrate the complexity of the issue, consider the simple case when the underlying distribution is a point mass at 0 such that g = 0 () and assume this point mass is erroneously estimated to be at the nearest grid point . Then it can be shown that the L2 distance between the true and the estimated density in this case is kf b f n k 2 2 =E Z 1 1 ( (x) (x)) 2 dxO(E 2 4 p 3 ); and the KL divergence between them is D KL (fjj b f) P m i=1 log i (x) i (x) O( 2 2 2 ). This 52 means that when the grid size gets smaller and smaller, the error get smaller and smaller, and as a result, the solver might assign the point mass at at the point near to 0 instead of at 0. In the case when the solver fails to return the optimal solution,E X; k d Lfsr Lfsrk 2 2 would not converge to 0, and as the problem gets more complex, it becomes more dicult for the solver to nd the optimal solution. If we can have the optimal estimates of the p i 's, smaller grid size can make the procedure more powerful, whereas bigger grid size would make the procedure more conservative with more weight being allocated to the grid point at 0, . With bigger grid size, a solver would have a easier problem to solve and is more likely to return the optimal solution. Such eect of grid size on the performance of our procedure is demonstrated in Figure 1.8. So to handle this issue, we can use a grid size small enough but not too small so that our procedure can be more stable at the cost for being conservative. The recommended conservative grid size to sigma ratio, or the expected ratio, is at least 0.5. 1.9 Proof of Lemmas 1.9.1 Proof of Lemma 1.7.1 In this proof, we will show that for any t2 [0; 1), we have (i) E b N i (t)N i (t) 2 =o(1), (ii) E h b N i (t)N i (t) b N j (t)N j (t) i =o(1), (iii) Q 1 c (0) b Q 1 c (0) p ! 0. Proof of Part (i). 
From the denition of N i (t) and b N i (t) , we can decompose 53 b N i (t)N i (t) 2 into the following terms: b N i (t)N i (t) 2 = d Lfsr i Lfsr i 2 I f b R i t< b R + i or b R + i < b R i tg I fR i t<R + i or R + i <R i tg | {z } I + d Lfsr + i Lfsr + i 2 I f b R + i t< b R i or b R i < b R + i tg I fR + i t<R i or R i <R + i tg | {z } II + d Lfsr i Lfsr + i 2 I f b R i t< b R + i or b R + i < b R i tg I fR + i t<R i or R i <R + i tg | {z } III + d Lfsr + i Lfsr i 2 I f b R + i t< b R i or b R i < b R + i tg I fR i t<R + i or R + i <R i tg | {z } IV + d Lfsr i 2 I f b R i t<R + i or b R + i < b R i tg + d Lfsr + i 2 I f b R + i t< b R i or b R i < b R + i tg I fR i ;R + i >tg | {z } V + h Lfsr i 2 I fR i t<R + i or R + i <R i tg + Lfsr + i 2 I fR + i t<R i or R i <R + i tg i I f b R i ; b R + i >tg | {z } VI : Recall that Lfsr =+ i and d Lfsr =+ i are bounded random variables with domain in [0; 1]. By Proposition 1.4.3, we know d Lfsr i Lfsr i =o p (1) and d Lfsr + i Lfsr + i =o p (1). So we have E[I]E d Lfsr i Lfsr i 2 E h j d Lfsr i Lfsr i j i =o(1) and similarly we can show that E[II] =o(1). To showE[III] =o(1), note that from the denition of R =+ i and b R =+ i (1.7.6, 54 1.7.7), we have E[III]E h I f b R i t< b R + i or b R + i < b R i tg I fR + i t<R i or R i <R + i tg i = E h I f b R i t< b R + i ;R + i t<R i g +I f b R + i < b R i t;R i <R + i tg i | {z } P f(Lfsr i ;w i )f(Lfsr + i ;w + i ) f( d Lfsr i ;w i )f( d Lfsr + i ;w + i ) >0 +P (Lfsr i Lfsr + i )( d Lfsr i d Lfsr + i ) >0 + E h I f b R i t< b R + i ;R i <R + i tg i | {z } P g( d Lfsr + i ;w + i ; d Lfsr i ;w i )g(Lfsr + i ;w + i ; Lfsr i ;w i )>0 + E h I f b R + i < b R i t;R + i t<R i g i | {z } P g(Lfsr i ;w i ; Lfsr + i ;w + i )g( d Lfsr i ;w i ; d Lfsr + i ;w + i )>0 ; where f(x;w) := x w(1x) and g(x 1 ;w 1 ;x 2 ;w 2 ) := jx 1 x 2 j maxfw 1 (1x 1 )w 2 (1x 2 ); 0g+jx 1 x 2 j are continuous functions. Therefore, by Proposition 1.4.3 and the Continuous Mapping theorem, we can conclude thatE[III] =o(1), and we can similarly show thatE[IV] =o(1). Same approach can be used to show thatE[V] +E[VI] =o(1) for we can show that E[V]P f( d Lfsr i ;w i )t;f(Lfsr i ;w i )>t +P f( d Lfsr + i ;w + i )t;f(Lfsr + i ;w + i )>t ; E[VI]P f(Lfsr i ;w i )t;f( d Lfsr i ;w i )>t +P f(Lfsr + i ;w + i )t;f( d Lfsr + i ;w + i )>t ; and hence E[V]+E[VI]P jf( d Lfsr i ;w i )f(Lfsr i ;w i )j> 0 +P jf( d Lfsr + i ;w + i )f(Lfsr + i ;w + i )j> 0 ; where f(x;w) is dened as above in showingE[III =o(1). The proof of part (i) has been completed. Proof of Part (ii). Since X i and X j are identically distributed, by the Cauchy-{Schwarz 55 inequality, we have E h b N i (t)N i (t) b N j (t)N j (t) i E h b N i (t)N i (t) i 2 ; and the desired result follows directly from part (i). Proof of Part (iii). To show Q 1 c (0) b Q 1 c (0) p ! 0, we only need to established (a) Q 1 c (0) p ! 1 and (b) b Q 1 c (0) p ! 1 , and then the desired result would follow. To show (a), note that so far we have Q(t) p !Q 1 (t) by WLLN and Q(t) p !Q c (t) by construction, so it follows that Q c (t) p !Q 1 (t). Since Q 1 c () is continuous, for any > 0, there exists a > 0 such thatjQ 1 c (Q 1 ( 1 ))Q 1 c (Q c ( 1 ))j< whenever jQ 1 ( 1 )Q c ( 1 )j<. The desired result thus follows from P (jQ 1 ( 1 )Q c ( 1 )j>)P jQ 1 c (Q 1 ( 1 ))Q 1 c (Q c ( 1 ))j> =P jQ 1 c (0) 1 j> : Part (b) can be proven in a similar fashion. We have b Q 1 c () is continuous and b Q(t) p ! b Q c (t) by construction. 
To show b Q(t) p !Q 1 (t), since we already have Q(t) p !Q 1 (t), we only need to establishQ(t) b Q(t) p ! 0. LetS m = P m i=1 b N i (t)N i (t) , then by the result in parts (i) and (ii), we have Var(m 1 S m ) 1 m 2 m X i=1 E b N i N i 2 +O 1 m 2 X i;j:i6=j E h b N i N i b N j N j i ! =o(1): Moreover, we can showE [m 1 S m ]! 0 by repeating the steps of part (i). Therefore, by applying the Chebyshev's inequality, we have m 1 S m = b Q(t)Q(t) p ! 0, which completes the proof. 56 1.9.2 Proof of Lemma 1.8.1 First note that by changing the order of integration, we have E kf b f emp k 2 2 =E Z 1 1 f (x) b f emp (x) 2 dx = Z 1 1 E f (x) b f emp (x) 2 dx: Focusing on the integrand, from the bias-variance decomposition, we have E f (x) b f emp (x) 2 = E b f emp (x)f (x) 2 + Var b f emp (x) : For the bias part, by denition, b f emp (x) = R 1 1 (x)b g emp ()d = 1 m P m i=1 (x i ), so we have E b f emp (x) =E 1 m m X i=1 (x i ) =E (x) = Z 1 1 (x)g()d =f (x); As for the variance part, we have, Var( b f emp (x)) = 1 m Var( (x)) 1 m Z 1 1 2 (x)g()d = 1 m2 p Z 1 1 = p 2 (x)g()d = 1 m2 p f = p 2 (x): Therefore, MISE( b f emp ) = 0 + Z 1 1 1 m2 p f = p 2 (x)dx = 1 m2 p ! 0 as m! 0: To showE kf (; 0) b f emp (; 0)k 2 2 ! 0, we can proceed in a similar approach. 57 Note that in this case, E b f emp (x; 0) =E 1 m m X i=1 (x i )1 i 0 = Z 0 (x)g()d =f (x; 0); Var b f emp (x; 0) = 1 m Var ( (x)1 i 0 ) 1 m Z 0 2 (x)g()d = 1 m2 p f = p 2 (x;> 0); so the result follows as before. And similarly, we can show that E kf (; 0) b f emp (; 0)k 2 2 ! 0 as m!1. 58 1.10 Supplementary Numerical Results a) Error Comparison p FSR Proposed NewDeal BH alpha level 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 b) Power Comparison p ETP 0.5 0.6 0.7 0.8 0.9 1 0.0 0.2 0.4 0.6 0.8 1.0 Unweighted Setting 4.1: μ i ~ (1−p)δ 0 +pN(2,1) for varying p a) Error Comparison c FSR Proposed NewDeal BH alpha level −2.5 −2 −1.5 −1 −0.5 0 0 0.1 0.2 0.3 b) Power Comparison c ETP −2.5 −2 −1.5 −1 −0.5 0 0.0 0.2 0.4 0.6 0.8 1.0 Unweighted Setting 4.2: μ i ~ 0.1δ 0 +0.9N(c,1) for varying c a) Error Comparison p FSR Proposed NewDeal BH alpha level 0.1 0.3 0.5 0.7 0.9 0 0.1 0.2 0.3 b) Power Comparison p ETP 0.1 0.3 0.5 0.7 0.9 0.0 0.2 0.4 0.6 0.8 1.0 Unweighted Setting 4.3: μ i ~ (1−p)δ 0 +pδ 2 for varying p Figure 1.7: Unweighted settings not unimodal about 0 For settings in which the distribution of the true eects g() is not unimodal about 0 as assumed by the NewDeal procedure, their procedure can fail to control the FSR, whereas our proposed procedure is more exible and does not rely on any assumption. 
Table 1.3: Number of selections in each block for varying weighting schemes

  Location index            1-2,000    2,001-4,000   4,001-6,000   6,001-8,000   8,001-10,000
  Scheme I
    (w+, w-)                2*(9,1)    (9,1)         (1,1)         (1,9)         2*(1,9)
    positive selections     1001.24    861.02        553.12        551.28        617.96
    negative selections     616.59     549.46        550.83        855.34        999.75
    TP selections           839.52     764.36        527.18        523.99        581.10
    TN selections           581.15     523.45        523.98        758.50        840.41
  Scheme II
    (w+, w-)                (9,1)      (9,1)         (1,1)         (1,9)         (1,9)
    positive selections     935.17     938.66        585.59        584.02        585.84
    negative selections     583.45     580.74        583.33        935.08        933.05
    TP selections           804.99     811.44        554.98        551.89        554.03
    TN selections           553.49     550.24        551.91        806.38        805.03
  Scheme III
    (w+, w-)                (9,1)      (9,1)         4*(1,1)       (1,9)         (1,9)
    positive selections     908.39     912.66        753.95        572.18        573.58
    negative selections     572.36     569.08        751.48        908.34        906.25
    TP selections           789.39     796.30        687.54        541.93        543.69
    TN selections           544.07     540.28        685.45        790.84        789.42
  Scheme IV
    (w+, w-)                (9,1)      (9,1)         9*(1,1)       (1,9)         (1,9)
    positive selections     868.15     871.85        867.92        555.46        556.51
    negative selections     556.32     553.03        866.66        866.66        866.27
    TP selections           764.93     771.49        764.84        527.57        528.86
    TN selections           530.34     526.51        764.64        765.71        765.11

[Figure 1.8: Varying the grid size used by the deconvolution method, with $\mu_i \sim N(0, 2^2)$ and $\sigma_i = 1$, $1.5$ and $2$ in the three rows of panels. Panel (a): error comparison (FSR versus grid size) for the Proposed, NewDeal and BH procedures against the nominal $\alpha$ level; panel (b): power comparison (ETP versus grid size). The grid size is varied from 0.4 to 1.2 to demonstrate how the procedure becomes more conservative as the grid-size-to-sigma ratio increases and less stable as this ratio decreases. The sample size used in the simulations is 5000 and the number of replications is 100. Similar results have been observed under other simulation settings as well.]

Chapter 2

Detecting Simultaneous Signals in the Presence of Local Structures

2.1 Introduction

The problem of simultaneous signal detection arises when two or more sequences of paired observations, or of test statistics obtained from the observations, are given and the goal is to detect as many occurrences of simultaneous signals as possible while controlling an error rate, such as the false discovery rate in large-scale settings. Such a problem can be seen as a type of multiple hypothesis testing problem, and existing false discovery rate control procedures have been modified to solve it. When observations have a spatial or temporal relationship, signals may be sparse at the global level, i.e. there are few simultaneous signals overall, yet this need not be the case at the local level, because signals often cluster in certain regions and the frequency of signals is typically heterogeneous. Existing methods often ignore this particular setting and fail to take advantage of these local structures. Procedures for detecting simultaneous signals have applications in replicability analysis and colocalization studies.
Replicability analysis aims to identify the overlapping signals across independent studies that examine the same features, and these features can have a spatial order, such as in genome-wide association studies. Colocalization analysis, according to A Practical Guide to Evaluating Colocalization in Biological Microscopy (Dunn et al., 2011), aims to study complex spatial associations between bio-molecules by comparing the distribution of a fluorescently labeled version of a molecule with that of a second, complementarily labeled probe. Recent developments in colocalization analysis have shifted from a subjective, qualitative perspective to a more systematic and quantitative approach. In this project, we develop a general framework for detecting simultaneous signals when local structures are assumed to be present.

The rest of this chapter is organized as follows. In Section 2, we formulate the problem and the spatial setting under consideration. In Section 3, we review existing work related to our problem. Section 4 focuses on the methodology, including the oracle procedure, the method used to extract and estimate the local structures, and the data-driven procedure; we also discuss the properties and theory of our proposed procedures in that section. Results of the simulation studies are presented in Section 5.

2.2 Statement of Problem

In this section, we formulate the problem of simultaneous signal detection under a general spatial setting. Let $\mathcal{S} \subset \mathbb{R}^d$ denote a $d$-dimensional spatial domain. We focus on a setup in which the hypotheses are located on a finite, regular lattice $S \subset \mathcal{S}$, and data are observed at every location $s \in S$. Appealing to the infill asymptotic framework, we assume $S \to \mathcal{S}$ in our theoretical analysis. Such a setup is suitable when complete data are observed at regular locations, for example, linear network data and fine-resolution images. Assume there are two spatial processes, $X_S = \{X_s : s \in S\}$ and $Y_S = \{Y_s : s \in S\}$. Let $\theta^X_s$ and $\theta^Y_s$ be two binary variables such that $\theta^i_s = 1$ indicates the presence and $\theta^i_s = 0$ indicates the absence of the signal of interest at location $s$ in spatial process $i \in \{X, Y\}$. Let $\theta_s = \theta^X_s \theta^Y_s$; we then have $\theta_s = 1$ indicating the presence and $\theta_s = 0$ indicating the absence of simultaneous signals between the two spatial processes at location $s$. The identification of simultaneous signals across space can then be formulated in the language of multiple testing, where
$$H_0(s): \theta_s = 0 \quad \text{vs.} \quad H_1(s): \theta_s = 1 \quad \text{for } s \in S. \qquad (2.2.1)$$
First let us consider the two spatial processes separately for identifying the signals of interest in each process. Given the two sets of summary statistics, $\{T^i_s : s \in S\}$ for $i \in \{X, Y\}$, and following the conventional practice in standard multiple hypothesis testing procedures, we convert $T^i_s$ to a p-value $p^i_s$, which has a uniform distribution over the interval $[0, 1]$ under the null hypothesis $\theta^i_s = 0$, so the conditional cumulative distribution function (CDF) of this p-value can be written as
$$P\left(P^i_s \leq t \mid \theta^i_s\right) = (1 - \theta^i_s)\, t + \theta^i_s\, G_i(t \mid s) \quad \text{for } t \in [0, 1], \qquad (2.2.2)$$
where $G_i(t \mid s)$ denotes the unspecified CDF of the non-null p-value when $\theta^i_s = 1$ for spatial process $i \in \{X, Y\}$ at location $s$, and is known to be stochastically smaller than the uniform distribution. For each of the spatial processes, we define the frequency level at location $s$ as
$$\pi_i(s) = P\left(\theta^i_s = 1\right),$$
which represents the likelihood of the signal of interest appearing at location $s$.
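To fix ideas, the following R sketch generates data from this two-sequence model on a one-dimensional lattice, with location-varying frequencies and one-sided normal p-values; the specific frequency curves and the non-null mean shift are illustrative choices, not quantities specified in the text.

    # Illustrative generation from the two-process model of Section 2.2 on a
    # regular one-dimensional lattice with clustered, location-varying signals.
    set.seed(1)
    S   <- seq(0, 1, length.out = 2000)               # regular lattice
    piX <- 0.02 + 0.25 * exp(-((S - 0.30) / 0.05)^2)  # signals cluster near s = 0.30
    piY <- 0.02 + 0.25 * exp(-((S - 0.35) / 0.05)^2)
    thetaX <- rbinom(length(S), 1, piX)
    thetaY <- rbinom(length(S), 1, piY)
    tX <- rnorm(length(S), mean = 3 * thetaX)          # non-null mean 3 (illustrative)
    tY <- rnorm(length(S), mean = 3 * thetaY)
    pX <- 1 - pnorm(tX)                                # one-sided p-values:
    pY <- 1 - pnorm(tY)                                # Uniform(0,1) under the null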
In many applications, due to the existence of spatial correlations and external covariates, signals tend to appear more frequently in certain regions and less so in some other regions, and the magnitude of non-null eects may also uctuate across locations. To capture such local structural patterns, we will allow i (s) and G i (js) to vary across the spatial domainS and further assume i (s) varies smoothly as a continuous function of s, which is a mild condition needed for our methodological development. Moving on to the intended problem of detecting simultaneous signals between the two 64 processes, we now have a pair of p-values p X s ;p Y s at each location s and we let p max s = max p X s ;p Y s and p min s = min p X s ;p Y s : Between the two p-values at a location s, p max s is naturally the statistic that should be used as it captures the evidence against the null that there is no simultaneous signals, but p min (s) can also provide extra information on the side in regards to the local sub-structures when there is no simultaneous signals, i.e. whether there is only one spatial process having signal of interest or neither of them does. We will make the following assumption: A1. When there is no simultaneous signals at a location s, i.e. s = X s Y s = 0, the two p-values p X s and p Y s are independent. The frequency level of simultaneous signals at location s can be dened as (s) =P X s = Y s = 1 ; (2.2.3) and note that (s) = X (s) Y (s) is not necessarily true, most likely not true, considering at the locations where simultaneous signals are present, the two spatial processes may be dependent since they are be triggered by the same event so we only assume independence when there is no simultaneous signals. Furthermore, let us dene the following notations, 00 (s) =P X s = Y s = 0 ; 10 (s) =P X s = 1; Y s = 0 ; 01 (s) =P X s = 0; Y s = 1 ; (2.2.4) such that (s) + 00 (s) + 10 (s) + 01 (s) = 1. The aforementioned smooth condition on i (s) for i2fX;Yg implies that (s); 10 (s); 01 (s) and 00 (s) are also continuous in s. We will focus on point-wise analysis where testing units are individual locations. The decision at location s is represented by a binary variable s , where s = 1 if H 0 (s) is rejected and s = 0 otherwise. The commonly used false discovery rate (FDR) proposed by 65 Benjamini and Hochberg (1995) in this case is dened as FDR =E FDP =E V R_ 1 =E P s2S [1 s ] s 1_ P s2S s ; (2.2.5) where FDP stands for the false discovery proportion. The power of an FDR controlling procedure =f s :s2Sg can be characterized by the expected number of true positives (ETP): ETP() =E " X s2S s s # : (2.2.6) Note that it was shown in Cai et al. (2019) that the FDR is asymptotically equivalent to the marginal FDR (mFDR) that is dened as mFDR = E[V ] E[R]_ 1 = EFP (EFP + ETP)_ 1 = FDR +o(1); (2.2.7) where EFP denotes the expected number of false positives. We aim to develop an FDR-controlling procedure that identies the presence of simultaneous signals between two spatial processes with increased power by incorporating the local structures revealed by pooling information from nearby locations . 2.3 Literature Survey Falling within the general problem of detecting simultaneous signals, replicability analysis is an area that many researchers are interested in. The goal of replicability analysis is to identify the overlapping signals across independent studies that examine the same features. 
These features may or may not have structural ordering, but when there is structural ordering, the procedures developed for this type of problem often ignore this structural ordering and thus fail to incorporate the auxiliary information on the intrinsic local structures into the analysis to increase the power of their procedures. In replicability analysis, since the studies examine the same set of features, signals are expected to more or 66 less agree among the studies. Most existing procedures for replicability analysis were developed specically for such setting rather than the general sparse setting we have in which we assume global sparsity of simultaneous signals only allowing one process to have active signals on its own. Nonetheless, these procedures can be applied to our problem as well, and we want to show that under the setting of our problem, our proposed procedures can outperform the existing procedures. For the particular problem of two studies replicability analysis, Benjamini and Heller (2008) and Benjamini et al. (2009) suggested applying the original Benjamini{Hochberg (BH) procedure (Benjamini and Hochberg, 1995) to the maximum p-values in each location, which had been a common practice in the analysis of conjunctions in imaging data (Friston et al., 2005). However, when signal level is sparse with nothing to discover in most features, this procedure can be overly conservative with low power. The procedure proposed by Bogomolov and Heller (2018) claims to be more powerful than the existing procedures by rst selecting the promising features from each study solely based on that study, and then testing for replicability while controlling the FDR only among the features that have been selected in both studies. By focusing on the promising features selected in the rst step, this approach has the advantage of reducing the number of features needed to be accounted for in the subsequent replicability analysis, thus increasing the power. This approach is most suitable whenever the fraction of features with signal is smaller than half. However, in applications such as genome-wide association and brain imagining studies with features ordered by locations, there is usually a strong local structure so this method may not be appropriate as it fails to incorporate the spatial information that can be utilized to further increase the power. Our proposed procedures are based on the idea of weighted p-value. Many authors have explored the use of procedural weights to dierentiate hypotheses in order to gain power while controlling the FDR. Genovese et al. (2006) presents a framework for multiple hypothesis testing procedures that controls the FDR at pre-specied level when 67 incorporating prior information, which takes the form of weights on the p-values, about the hypotheses. Their study shows that if the assignment of weights is positively associated with the null hypotheses being false, such procedure improves the power. Similar idea can be used in the problem of simultaneous signals detection. Since the proposed problem of detecting simultaneous signals in the presence of local structures can be considered as spatial multiple testing, we will take an approach similar to the locally adaptive weighting and screening (LAWS) procedure proposed by Cai et al. (2021), in which the main idea is to recast spatial multiple testing in the framework of simultaneous inference with auxiliary information. 
Under such framework, the p-values play the primary roles for assessing the signicance, while the spatial locations are viewed as auxiliary variables for providing important structural information to assist with inference. 2.4 Methodology In this section, we will present the methodological development of our procedures as well as the underlying intuition. We aim to develop FDR-controlling procedures that can take advantage of the existing structural information, and we will introduce two procedures based on the locally adaptive weighting and screening (LAWS) method. We will show that although the LAWS method can be applied naively on the maximum p-values at each location, we can make further improvement by analyzing the dierent scenarios that make up the null. 2.4.1 Oracle procedures Given the two sets of p-values for the two spatial processes X and Y , for the purpose of detecting simultaneous signals, the p-value at each location s can naturally be dened as p s := max p X s ;p Y x , and it is straightforward to see that it is a valid p-value such that those corresponding to the true null hypotheses of not having simultaneous signals, i.e. 68 s = 0, are uniformly distributed under the independence assumption A1, P 00 (p max s t) =P p max s <tj X s = Y s = 0 =t 2 t; P 10;01 (p max s t) =P p max s <tj X s + Y s = 1 t; (2.4.1) where the last inequality is based on the assumption that the p-value distribution under alternative hypothesis is stochastically smaller than the uniform distribution. Because the frequency level of the simultaneous signals is unlikely to be consistent across the locations and simultaneous signals will most likely tend to be clustered, we will denote this location-based frequency level by (s) and the p-value can be scaled by the weighting factor w s = 1(s) (s) as explained in Cai et al. (2021). Note that for single studies, when the frequency level is assumed to be homogeneous, i.e. =(s) for all s2S, this parameter represents the fraction of features for which the null hypothesis are true, and it can be estimated and incorporated into the original BH procedure to construct an adaptive FDR controlling procedure, and the power of such adaptive procedure can be higher than the non-adaptive version especially if the frequency level of signals is small (Benjamini and Hochberg, 2000). When the frequency level is not homogeneous due to the existence of local structures, a locally adaptive weighting and screening approach is more appropriate and can result in higher power as well as lower FDR rate than the globally adaptive version. Similar to how Benjamini et al. (2009) proposed applying the BH procedure to fp max s g s2S for detecting simultaneous signals, we originally considered applying the LAWS method to these maximum p-values, i.e. estimate the location-based frequency level f(s) :s2Sg and then dene the weighted p-value as p w s =w s p s = 1(s) (s) p max s : (2.4.2) 69 Assume the p-value threshold is given by t w 2 (0; 1), then the expected number of false positives (EFP) can then be calculated as EFP =E " X s2S I (P w s t w ; s = 0) # = X s2S P (P w s t w j s = 0)P ( s = 0) X s2S (t w =w s )(1(s)) = X s2S (s)t w : (2.4.3) Therefore, if j hypotheses are rejected along the ranking of the weighted p-values, we can expect a number of p w (j) P s2S s rejections likely to be false positives. It then follows that j 1 p w (j) P s2S s provides a good estimate of the false discovery proportion (FDP) for P s2S s > 0. 
However, upon further analysis, it occurred to us that using the maximum p-value at each location leaves out the information that the minimum p-value can provide. Building on the foundation of this raw implementation of the LAWS procedure, we came up with two FDR procedures that are more powerful and are based on dierent approaches. The aforementioned naive implementation of LAWS focuses solely on the location-based frequency level of simultaneous signals (s) as captured by p max s and the structural information it provides. However, under the null hypothesis that there is no simultaneous signals at a location s, there are two possible scenarios, (1) both spatial processes are inactive, i.e. X s = Y s = 0, and (2) one and only one of them has signal of interest, i.e. X s + Y s = 1. This sub-structure was left out in our initial attempt in applying LAWS method, and this extra information can be captured by the minimum p-value p min s . Leaving out this information can make any procedures for detecting simultaneous signals less optimal. We need to incorporate p min s into our procedures to take the additional information on the sub-structure of the signals into consideration. To provide an motivation, consider the covariate-adjusted mixture model (X s ;Y s ) ind f(x;yjs) under the independence assumption, where the covariate s encodes the side information provided by 70 local structures. We have f(x;yjs) = 00 (s)f 00 (x;yjs) + 10 (s)f 10 (x;yjs) + 01 (s)f 01 (x;yjs) +(s)f 11 (x;yjs) (2.4.4) = 00 (s)f 0 (xjs)f 0 (yjs) + 10 (s)f 1 (xjs)f 0 (yjs) + 01 (s)f 0 (xjs)f 1 (yjs) +(s)f 11 (x;yjs); where f 0 (js) is the null probability density function and f 1 (js) is the non-null density function followed by each process X and Y independently when there is no simultaneous signals, i.e. X s + Y s 1, and f 11 (x;yjs) is the non-null joint distribution between the two processes when simultaneous signals are present. Let us dene the conditional (or covariate-adjusted) local false discovery rate at location s as CLfdr(x;yjs) =P ( s = 0jx;y;s) =P X s + Y s = 0jx;y;s +P X s + Y s = 1jx;y;s = 00 (s)f 00 (x;yjs) + 10 (s)f 10 (x;yjs) + 01 (s)f 01 (x;yjs) 00 (s)f 00 (x;yjs) + 10 (s)f 10 (x;yjs) + 01 (s)f 01 (x;yjs) +(s)f 11 (x;yjs) = (x;yjs) (x;yjs) + 1 ; where (x;yjs) = 00 (s) (s) f 00 (x;yjs) f 11 (x;yjs) + 10 (s) (s) f 10 (x;yjs) f 11 (x;yjs) + 01 (s) (s) f 01 (x;yjs) f 11 (x;yjs) (2.4.5) = 1(s) (s) 00 (s) 1(s) f 00 (x;yjs) f 11 (x;yjs) + 10 (s) 1(s) f 10 (x;yjs) f 11 (x;yjs) + 01 (s) 1(s) f 01 (x;yjs) f 11 (x;yjs) : It follows from the optimality theory in Cai et al. (2019) that under Model 2.4.4, CLfdr thresholding rule is optimal in the sense that it maximizes the ETP subject to the FDR constraint. Therefore, ranking by CLfdr, which is monotone in , is equivalent to ranking by the quantity . By inspecting Equation 2.4.5, we see that there are three sources of in uence on the ranking: (1) the information on the sparsity structures that re ect how 71 frequently simultaneous signals appear in the neighborhood of s, i.e. 1(s) (s) ; (2) the information exhibited by the data itself that indicates the strength of evidence against the null, i.e. the three density ratios; and (3) the contribution of each type of null scenarios, i.e. the coecient of the three density ratios. 
The density ratios and the density functions are extremely dicult to model and calculate, which is why we resort to work with the p-values p max s that can also capture the evidence against the null in the data and to replace f 00 (x;yjs) f 11 (x;yjs) by (p max s ) 2 , both f 10 (x;yjs) f 11 (x;yjs) and f 01 (x;yjs) f 11 (x;yjs) by p max s . Combining the above information, we will dene the weighted p-values as p w s = 00 (s) (s) (p max s ) 2 + 10 (s) + 01 (s) (s) p max s = 1(s) (s) 00 (s) 1(s) (p max s ) 2 + 10 (s) + 01 (s) 1(s) p max s = 1(s) (s) 00 (s) 1(s) (p max s ) 2 + 10;01 (s) 1(s) p max s =w s p s : (2.4.6) Notice that the proposed weighted p-value in (2.4.6) can be interpreted as applying the weight w s = 1(s) (s) on the modied p-value p s = 00 (s) 1(s) (p max s ) 2 + 10 (s)+ 01 (s) 1(s) p max s instead of p max s . This modied p-value incorporates the sub-structure provided by 00 (s) and 10 (s) + 01 (s), and it can be shown to be a valid p-value that is also more powerful as the test statistic than the maximum p-value. Proposition 2.4.1. Dene the modied p-value at location s as p s = 00 (s) 1(s) (p max s ) 2 + 10;01 (s) 1(s) p max s ; and assume the two p-values at location s are independent under the null hypothesis s = 0. The P s is a valid p-value and is stochastically smaller than the unweighted P max s under the null, i.e. P null (P max s )P null (P s ). Having constructed this modied p-value, the problem of detecting simultaneous signals 72 between two spatial processes is again reduced to a spatial FDR problem discussed in Cai et al. (2021) and we can apply the LAWS method while incorporating the local structures of simultaneous signals as well as the sub-structure of the two spatial processes when there is no simultaneous signals. We can now present our rst oracle procedure. Procedure 2.4.1. (Oracle Procedure I) Given a sequence of p-value pairs f(p X s ;p Y s ) :s2Sg and assume we know the frequency levels of simultaneous signals f((s); 00 (s)) :s2Sg. 1. Calculate the weighted p-values as p w s = 00 (s) (s) (p max s ) 2 + 10 (s) + 01 (s) (s) p max s ; where p max (s) = maxfp X s ;p Y s g for s2S. Order them in increasing order such that p w (1) :::p w (m) , and denote the corresponding hypotheses as H (1) ;:::;H (m) , where m =jSj; 2. Let k = max n i :p w (i) i P s2S (s) o . If such k exists, reject the k hypotheses associated with p w (1) ;:::;p w (k) ; otherwise no rejection is made. Note that in order to increase the stability of the procedure, for the implementation of LAWS procedure, we will set (s) = 1 if (s)> 1 and (s) = if (s)< for a small > 0 such as ==m, so P s2S (s)> 0 is guaranteed. The above procedure can be analyzed from a Bayesian perspective by treating the p-value p max s as a random 73 variable. The CLfdr can be written as P ( s = 0jP max s p) =P X s = Y s = 0jP max s p +P X s + Y s = 1jP max s p = 00 (s)P P max s pj X s = Y s = 0 P (P max s p) + [ 10 (s) + 01 (s)]P P max s pj X s + Y s = 1 P (P max s p) 00 (s)p 2 + [ 10 (s) + 01 (s)]p 00 (s)p 2 + [ 10 (s) + 01 (s)]p +(s)F max;11 (p) = (pjs) (pjs) + 1 ; for some p2 (0; 1), where (pjs) = 1 F max;11 (p) 00 (s) (s) p 2 + 10 (s) + 01 (s) (s) p = 1 F max;11 (p) 1(s) (s) 00 (s) 1(s) p 2 + 10 (s) + 01 (s) 1(s) p ; and F max;11 (p) is the unspecied non-null cumulative distribution function of the maximum p-value between P X s and P Y s for all s2S. This Bayesian perspective further justies the use of the weights w s and the modied p-value p w s as dened in (2.4.6). 
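A minimal sketch of Oracle Procedure I as stated formally above (Procedure 2.4.1), assuming the oracle quantities π(s) and π00(s) are given; the clipping of π(s) to [ε, 1] follows the stabilization step mentioned in the text, and the function name is ours.

```python
import numpy as np

def oracle_procedure_I(p_x, p_y, pi_s, pi00_s, alpha=0.05, eps=None):
    """Oracle Procedure I: weighted step-up rule on the modified p-values (2.4.6)."""
    m = len(p_x)
    if eps is None:
        eps = alpha / m
    pi_s = np.clip(pi_s, eps, 1.0)
    pi1001_s = 1.0 - pi_s - pi00_s             # pi_{10,01}(s) = 1 - pi(s) - pi_00(s)
    p_max = np.maximum(p_x, p_y)

    # weighted p-value (2.4.6): w_s * p*_s = (pi_00 p_max^2 + pi_{10,01} p_max) / pi(s)
    p_w = (pi00_s * p_max**2 + pi1001_s * p_max) / pi_s

    order = np.argsort(p_w)
    thresh = alpha * np.arange(1, m + 1) / np.sum(pi_s)
    below = p_w[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    return reject
```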
In practice, the performance of the above modied procedure 2.4.1 will depend on how well we can estimate the frequency level of simultaneous signals (s) as well as the additional parameters 00 (s) and 10;01 (s) := 10 (s) + 01 (s). Under the assumption of smooth structures, (s) can be estimated from the sequence of maximum p-values, whereas 00 (s) can be estimated from the sequence of minimum p-values. We will discuss the details of the estimation method in the next subsection when we present the data-driven procedures. Before that, we will introduce another procedure that takes into account eects of the sub-structure of the two spatial processes from a dierent perspective. For the modied p-value (2.4.6) used in the above procedure, we see that for any xed value of p max s , the larger the 00 (s), the smaller the modied p-value, which makes sense since if it is more 74 likely that neither of the processes has a signal at location s such that X s = Y s = 0, then it would be less likely to have two small p-values, making the false rejection less costly. This side information can also be captured by the minimum p-values p min s more directly. If we were to test the global null hypothesis at each location between the two spatial processes, i.e. X s = Y s = 0, we would have used the minimum p-value instead and apply the BH procedure at level =2 in order to achieve FDR control at level (Benjamini and Heller, 2008). Therefore, rather than estimating the parameters 00 (s) using the minimum p-values, we can rst screen the minimum p-values p min s to select those under certain threshold t 1. The outcome of having this screening process on p min s is to reduce the multiplicity, i.e. the number of hypothesis to be tested simultaneously, by rejecting the conjunction of null hypotheses of X s = Y s = 0 and testing only at the locations where we believe there is at least one process has signal of interest. The choice of this threshold t2 (0; 1) will have no eect on the FDR control but can be chosen or adapted to make the procedure more powerful. If we let t = 1, then we would recover the raw implementation of LAWS using the maximum p-values, which was our initial attempt but rejected for failing to incorporate the information provided by the minimum p-values, and this threshold will not screen out any hypotheses. If t is too small, it is possible that we leave out too many hypotheses that could have been rejected potentially. A reasonable range for this screening threshold is in (0;], and an locally adaptive choice can be t s =(s) since if it is unlikely to have simultaneous signals at location s, having a small threshold on p min s will have less negative impact on the power of the procedure. Note that this decision rule is dierent from the previous procedure as it can be considered as a two-stage decision rule. Although the performance of this procedure will depend on the choice of this minimum p-value threshold or thresholds, it does not require the knowledge of 10 (s) + 01 (s), which can be dicult to estimate well. Therefore, it is worthwhile to consider this two-stage decision rule, which is similar to but in fact dierent from the two-stage decision rule proposed by Bogomolov and Heller (2018), and we will compare the performance of these procedures in dierent 75 scenarios in the section on simulation studies. 
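To fix ideas before the formal statement below, here is a minimal sketch of this two-stage rule: screen on the minimum p-values, then apply the weighted step-up rule to the maximum p-values of the surviving locations. It assumes π(s) is known; the default screening threshold t_s = απ(s) is the locally adaptive choice mentioned above, and any threshold chosen independently of the maximum p-values could be substituted.

```python
import numpy as np

def oracle_procedure_II(p_x, p_y, pi_s, alpha=0.05, t_s=None, eps=None):
    """Two-stage rule: screen on p_min, then weighted step-up on p_max over the survivors."""
    m = len(p_x)
    if eps is None:
        eps = alpha / m
    pi_s = np.clip(pi_s, eps, 1.0)
    if t_s is None:
        t_s = alpha * pi_s                     # locally adaptive screening threshold
    p_min = np.minimum(p_x, p_y)
    p_max = np.maximum(p_x, p_y)

    reject = np.zeros(m, dtype=bool)
    R = np.nonzero(p_min <= t_s)[0]            # screened locations
    if R.size == 0:
        return reject
    p_w = (1.0 - pi_s[R]) / pi_s[R] * p_max[R]
    order = np.argsort(p_w)
    thresh = alpha * np.arange(1, R.size + 1) / np.sum(pi_s[R])
    below = p_w[order] <= thresh
    if below.any():
        reject[R[order[: np.max(np.nonzero(below)[0]) + 1]]] = True
    return reject
```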
For this second oracle procedure, which can be considered as partially adaptive, we dene the p-value test statistic at location s as p t s = 8 > > < > > : p max s if p min s t 1 if p min s >t (2.4.7) for some threshold t2 (0;). We can once again show that it is a valid p-value. To provide more insight, let us revisit the covariate-adjusted mixture model (2.4.4) and consider the equivalent representation of CLfdr for any p2 (0; 1) and t2 (0;) which is chosen independent of the maximum p-values: P s = 0jP t s p = [1(s)]P null P max s p;P min s t [1(s)]P null (P max s p;P min s t) +(s)P 11 (P max s p;P min s t) (1) [1(s)]P null (P max s p) [1(s)]P null (P max s p) +(s)P 11 (P min s t;P max s p) (2) [1(s)]p [1(s)]p +(s)P 11 (P max s p;P min s t) = [1(s)]p [1(s)]p +(s)P 11 (P max s pjP min s t)E [I (P min s t)] (3) E [1(s)]p [1(s)]p +(s)P 11 (P max s pjP min s t)I (P min s t) =E (pjs) (pjs) + 1 ; where (pjs) = 1 F max;11 (pjP min s t) 1(s) (s) p: (2.4.8) In the above derivation, the last inequality is based on Jensen's inequality, and since the function x x+1 is monotonically increasing in x, the rst and second inequality comes from the fact thatP null P max s p;P min s t P null (P max s p) and the result that P null (P max s p)p as shown in (2.4.1), respectively. Now we can dene the weighted 76 p-value on the set R =fs :p min s t s g as p w s =w s p max s = 1(s) (s) p max s for s2R; (2.4.9) and here we have assumed that the threshold on p min s are also location based. We then have the following two-stage step-wise FDR-controlling procedure based on the ranking of these weighted p-values forfs :p min s t s g. Procedure 2.4.2. (Oracle Procedure II) Given a sequence of p-value pairs f(p X s ;p Y s ) :s2Sg as well as the frequency levels of simultaneous signalsf(s) :s2Sg and a set of screening thresholdsft s :s2S;t s 2 (0; 1]g independent of the maximum p-values: 1. Let R =fs :p min s t s g, and calculate the weighted p-values p w s =w s p max s = 1(s) (s) p max s for s2R. Order them (in increasing order) as p w (1) :::p w jRj , and denote corresponding hypotheses as H (1) ;:::;H jRj ; 2. Let k = max n i :p w (i) i P s2R (s) o . If such k exists, reject the k hypotheses associated with p w (1) ;:::;p w (k) ; otherwise no rejection is made. We will present the theoretical properties in regards to the two proposed procedure at the end of this section after we present the estimation methods and the data-driven procedures. 2.4.2 Estimation using LAWS Next we will discuss the practical implementation of our proposed procedures. Recall that the oracle procedure I (2.4.1) assumes the knowledge of (s), 00 (s) and 10;01 (s), whereas oracle procedure II (2.4.2) only assumes the knowledge of (s) for s2S. In order to use the proposed procedures in reality, we need to be able to estimate these unknown quantities. We will borrow the ideas used to construct the LAWS method proposed in Cai et al. (2021) to estimate these unknown parameters. 77 Let us rst start with the background of the LAWS estimator. In the xed proportion case without considering the local structures, adaptive estimators for improving power in FWER (family-wise error rate) controlling methods can be traced back to Schweder and Spjtvoll (1982), and its applications in FDR controlling methods was discussed in Benjamini and Hochberg (2000), Storey (2002) abd Genovese et al. (2006). The non-null proportion can be estimated as ^ = 1 #fi :p i >g m(1) for some pre-specied threshold2 (0; 1); (2.4.10) where m is the total number of hypotheses. 
Those p-values that are greater than are assumed to be associated with the true null hypotheses, and a portion of the m(1) true nulls will have p-values less than as the p-value under null is assumed to follow the uniform distribution approximately. Therefore we have m(1) #fi :p i >g +m(1), which leads to the above estimator. Under the spatial setting, (s) is most likely not going to be homogeneous and signals are more likely to be clustered in certain regions. Thus, the above estimator is no longer appropriate when local structure is present. To estimate (s), we see that for some pre-specied threshold 2 (0; 1), we have P(P s >) = (1(s))(1) +(s)P(P s >j s = 1) (1(s))(1): We can then introduce an conservative intermediate estimate of (s) as follows: (s) = 1 P(P s >) 1 (s): (2.4.11) As increases, we expect the p-values corresponding to true null hypotheses will become increasingly dominant in the range (; 1) as compared to the non-null p-values, making the termP(P s >j s = 1) more negligible, so the estimate will become less conservative with 78 a smaller bias. Therefore, (s) serves as a good estimator to (s) with a suitable . Since there is not enough information at location s to estimateP(P s >), a spatial-adaptive weighted screening (SAWS) approach is used to estimate (s) by pooling information from its neighborhood. The SAWS method relies on the assumption that (s) varies as a smooth function across the spatial locations, allowing information to be pooled from nearby locations by using a kernel to weight observations based on their distance to the location being estimated. Let K :R d !R be a positive, bounded and symmetric kernel function satisfying the following conditions: Z K(s)ds = 1; Z sK(s)ds = 0; and Z s T sK(s)ds1: (2.4.12) Let K h (s) =h 1 K(s=h), where h> 0 is the bandwidth. At location s, dene v h (s;s 0 ) = K h (ss 0 ) K h (0) for all s2S: (2.4.13) The kernel function is used to exploit the spatial correlation with m s = P s2S v h (s;s 0 ) being interpreted as the \total" number of observations at location s. LetS =fs2S :p s >g for some pre-specied threshold and dene ^ (s) = 1 P s2S v h (s;s 0 ) (1) P s2S v h (s;s 0 ) : (2.4.14) This SAW estimator ^ (s) will converge to the intermediate oracle estimator (s) asymptotically asS!S under the inll asymptotic framework (Cai et al., 2021). For the intended problem of simultaneous signals detection, we will rst estimate (s), the frequency level of simultaneous signals at each location s, for it is used by both of our procedures. As mentioned earlier, this information is captured by p max s , the maximum 79 p-value between the two spatial processes at location s. Since we have P(P max s >) (1(s))P null (P max s ) (1(s))(1); (2.4.15) an intermediate estimator of (s) can then be dened as (s) := 1 P(P max s >) 1 (s); (2.4.16) which is a conservative estimator. For the weighted p-values (2.4.6) that are used in procedure I, in addition to (s), we also need to estimate 00 (s) and 10;01 (s), which can be interpreted as the sub-structure under the null. We see that 00 (s) is captured by the minimum p-values p min s and since P(P min s > 0 ) 00 (s)P 00 (P min s > 0 ) = 00 (s)(1 0 ) 2 (2.4.17) for some pre-specied threshold 0 , an intermediate estimator of 00 (s) can similarly be dened as 0 00 (s) := P(P min s > 0 ) (1 0 ) 2 00 (s); (2.4.18) where 0 is not necessarily the same as the threshold used in the estimator of (s). 
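As a sketch of how the SAW construction (2.4.13)–(2.4.14) can be applied to the maximum and minimum p-values on a regular one-dimensional grid, the snippet below uses a Gaussian kernel (one admissible choice for K in (2.4.12)) with bandwidth h. The function name, the grid layout, and the final clipping that keeps π̂00(s) ≤ 1 − π̂(s) are our own illustrative choices rather than part of the formal procedure.

```python
import numpy as np

def laws_estimates(p_x, p_y, tau=0.5, tau_prime=0.25, h=50.0, eps=1e-3):
    """Kernel-smoothed (SAW-style) estimates of pi(s) and pi_00(s) on a 1-D grid.

    v_h(s, s') = K_h(s - s') / K_h(0) reduces to exp(-(s - s')^2 / (2 h^2))
    for the Gaussian kernel used here.
    """
    p_max = np.maximum(p_x, p_y)
    p_min = np.minimum(p_x, p_y)
    m = len(p_max)
    s = np.arange(m, dtype=float)
    # m-by-m matrix of kernel weights; for very large m one would truncate to a local window
    v = np.exp(-0.5 * ((s[:, None] - s[None, :]) / h) ** 2)
    total = v.sum(axis=1)                      # "total" number of observations near s

    # pooled tail counts of the maximum / minimum p-values exceeding tau / tau'
    tail_max = v[:, p_max > tau].sum(axis=1)
    tail_min = v[:, p_min > tau_prime].sum(axis=1)

    pi_hat = 1.0 - tail_max / ((1.0 - tau) * total)           # estimate of pi(s)
    pi00_hat = tail_min / ((1.0 - tau_prime) ** 2 * total)    # estimate of pi_00(s)

    # stabilization (our choice): keep pi_hat in [eps, 1] and pi_00_hat <= 1 - pi_hat
    pi_hat = np.clip(pi_hat, eps, 1.0)
    pi00_hat = np.clip(pi00_hat, 0.0, 1.0 - pi_hat)
    return pi_hat, pi00_hat
```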
Having dened the intermediate estimator for (s) and 00 (s), we can then let ; 0 10;01 (s) = 1 (s) 0 00 (s): (2.4.19) Notice that by construction, 0 00 (s) + 0 10;01 (s) 00 (s) + 10;01 (s) and 0 00 (s) 00 (s). The second inequality is not needed and we would instead prefer to have ; 0 10;01 (s) being a conservative estimator. We can see from the denition of the weighted p-value (2.4.6) that an overestimated 00 (s) and an underestimated 10;01 (s) can sometimes make this weighted p-value mistakenly smaller if (s) is not conservative enough. Let (s) and 0 (s) denote 80 the bias of the two conservative estimators derived from (2.4.17) and (2.4.15), then we have (s) =(s) ^ (s) (s)P 11 (P max s ) 1 ; 0 (s) = ^ 00 (s) 00 (s) = 10;01 (s)P 10;01 (P min s 0 ) +(s)P 11 (P min s 0 ) (1 0 ) 2 : (2.4.20) We see that the relationship between (s) and 0 (s) is not straightforward and it depends not only on the thresholds and 0 but also on the actual value of (s) and 00 (s). We can use a larger 0 to reduce the bias of 0 0 0(s) but that may increase the variance. Until we are able to construct an estimator for 10;01 (s) that can be shown to be conservative, we have to assume that under the null hypothesis s = 0, 00 (s) (P max s ) 2 + 10;01 (s)P max s ^ 00 (s) (P max s ) 2 + ^ 10;01 (s)P max s ; (2.4.21) so when we replace 00 (s) by ^ 00 (s) and 10;01 (s) by ^ 10;01 (s) in the construction of the weighted p-value 2.4.6, it will be biased upwards and remain to be valid. The validity of the data-driven version of procedure I depends on this assumption which we do not have control of at this point, and this motivated us to construct procedure II which only requires (s). Having dened the intermediate estimators of (s) and 00 (s), we will use the method reviewed earlier to nd the nal estimates ^ (s) and ^ 00 (s). For a pre-specied threshold , letS max =fs2S :p max s >g, the intermediate estimator of (s) is then estimated as ^ (s) = 1 P s2S max v h (s;s 0 ) (1) P s2S v h (s;s 0 ) ; (2.4.22) where v h (s;s 0 ) is dened in (2.4.13). To estimate 0 00 (s), letS min 0 =fs2S :p min (s)> 0 g for a pre-specied 0 , and estimate the intermediate estimator of 0 00 (s) as ^ 0 00 (s) = P s2S min 0 v h (s;s 0 ) (1 0 ) 2 P s2S v h (s;s 0 ) : (2.4.23) 81 Next we will justify these two nal estimators by showing ^ (s) converges to (s) and ^ 0 00 (s) converges to 0 00 (s) for every s2S by appealing to the inll asymptotic framework under which the gridS becomes denser and denser in a xed and nite domainS2R d so thatS!S. For each s2S, let max min (s) and max max (s) be the smallest and largest eigenvalues of the Hessian matrixP (2) (P max s >)2R dd respectively; and let min min (s) and min max (s) be those ofP (2) (P min s > 0 )2R dd . We will dene the following two assumptions. A2. Assume that () and 0 00 () have continuous rst and second derivatives and there exist constants C 1 > 0 and C 2 > 0 such thatC 1 max min (s) max max (s)C 1 and C 2 min min (s) min max (s)C 2 uniformly for all s2S. A3. Assume that Var X s2S IfP max s >g ! C 0 1 X s2S Var(IfP max s >g) and Var X s2S IfP min s > 0 g ! C 0 2 X s2S Var(IfP min s > 0 g): Remark The rst assumption (A2) is a mild regularity condition on the alternative CDFs ofp max s under s = 1 andp min s under X s + Y s 1. (A3) assumes that most of the maximum and minimum p-values are weakly correlated and can be further relaxed with a large choice of the bandwidth. Under the above two assumptions, as shown by Cai et al. 
(2021), the consistency of the estimators of the intermediate oracle estimators of the frequency levels can be proven. Proposition 2.4.2. Under A2 and A3, if hjSj 1 , we have, uniformly for all s2S, E [^ (s) (s)] 2 ! 0 and E h ^ 0 00 (s) 0 00 (s) i 2 ! 0 S!S and it then follows that E h ^ ; 0 10;01 (s) ; 0 10;01 (s) i 2 ! 0 as S!S 82 . 2.4.3 Data-driven procedures Given the two proposed oracle procedures 2.4.1 and 2.4.2, we can replace (s) and 00 (s) by ^ (s) and ^ 0 00 (s), respectively, to arrive at the data-driven procedures. Procedure 2.4.3. (Data-driven Procedure I) Given a sequence of p-value pairs f(p X s ;p Y s ) :s2Sg: 1. Compute ^ (s) and ^ 0 00 (s) for s2S with and 0 pre-specied; 2. At each location s, let p max (s) = maxfp X s ;p Y s g and calculate the weighted p-value b p I s = min ^ 0 00 (s) ^ (s) (p max s ) 2 + 1 ^ (s) ^ 0 00 (s) ^ (s) p max s ; 1 ; 3. Sort the weighted p-values as b p I (1) :::b p I (m) , and denote corresponding hypotheses as H (1) ;:::;H (m) ; 4. Let k = max n i :b p I (i) i P s2S ^ (s) o . If such k exists, reject the k hypotheses associated with b p I (1) ;:::;b p I (k) ; otherwise no rejection is made. As mentioned previous, the next procedure is more of a two-stage procedure than a p-value thresholding procedure, and the rst step is to screen the minimum p-values by a threshold t2 (0;) which can either be pre-specied or chosen adaptively. By implementing this screening step, we do not need to estimate 00 (s). Procedure 2.4.4. (Data-driven Procedure II) Given a sequence of p-value pairs f(p X s ;p Y s ) :s2Sg: 1. Compute ^ (s) for s2S with pre-specied parameter ; 83 2. Let R =fs :p min s tg, where t can be any pre-specied value in (0;) or can be adapted to the data such as t s =^ s , the only requirement is for it to be independent of the maximum p-values; 3. At each location s2R, calculate b p II s = min n ^ s 1^ s p max s ; 1 o . Then sort the weighted p-values as b p II (1) :::b p II (jRj) , and denote corresponding hypotheses as H (1) ;:::;H (jRj) ; 4. Let k = max n i :b p II (i) i P s2S ^ (s)Ifp min (s)tg o . If such k exists, reject the k hypotheses associated with b p II (1) ;:::;b p II (k) ; otherwise no rejection is made. 2.4.4 Theoretical properties of the data-driven procedures Next we will investigate the theoretical properties of the oracle and the data-driven procedures. Since our methodology is based o the LAWS method developed by Cai et al. (2021), we will follow the theoretical development in that paper. Dene the z-values using the two-sided p-values as z(s) = 1 (1p max s =2) for s2S, and let m =jSj. Arrange fs2Sg in any orderfs 1 ;:::;s m g and denote the corresponding z-values Z = (z 1 ;:::;z m ) T . Several regularity conditions will be also needed for the asymptotic error rates control. In spatial data analysis with a latent processf s :s2Sg, the dependence among the maximum p-values at each location s may come from two possible sources: the correlations among the p-values when s are given and the correlations among s . The conditions on these two types of correlations are respectively specied below in (A4) and (A5) respectively. A4. Dene (r i;j ) mm = R = Corr(Z). Assume max 1i<jm jr i;j jr< 1 for some constant r> 0. Moreover, there exits > 0 such that max fi:(s i )=0g j i ( )j =o(m ) for some constants 0<< 1r 1+r , where i ( ) =fj : 1jm;jr i;j j (logm) 2 g. A5. Under Model (2.2.3), there exists a suciently small constant > 0, such that (s)2 [; 1], and that Var P s2S I( s = 0) =O(m 1+ ) for some constant 0 < 1. 84 A6. 
DeneS = i : 1im;j i j (logm) (1+)=2 , where i =E(z i ). For some > 0 and some > 0,jS j 1=( 1=2 ) + (logm) 1=2 , where 3:14 is a math constant. Remark Condition A4 assumes that most of P max s 's under the null s = 0 are weakly correlated. Condition A5 only assumes that the latent variablesf s :s2Sg are not perfectly correlated. Condition A6 requires that there are a few spatial locations with (maximum) mean eects of z-values exceeding (logm) 2 for some > 0 and it is rather mild. We are now ready to state the main theorem in regards to the theoretical properties of the proposed procedures. Let OR I and OR II denote the testing rule of the oracle procedure proposed in 2.4.1 and 2.4.2 respectively; b I and c II denote the corresponding testing rule dened using the intermediate estimators (s) (2.4.16) and 00 (s) (2.4.18) that are assumed to be known; DD b I and DD c II denote the data-driven testing rule dened using the estimated ^ (s) (2.4.22) and ^ 00 (s) (2.4.23). Recall that procedure II only uses (s) and does not need 00 (s). The following theorem shows that these procedures will lead to conservative error rates control asymptotically under dependency in terms of false discovery rate (FDR) and false discovery proportion (FDP). Theorem 2.4.3. Under the conditions (A4)-(A6), for any > 0, we have lim m!1 FDR I and lim m!1 P FDP I + = 1; (2.4.24) lim m!1 FDR II and lim m!1 P FDP II + = 1: (2.4.25) The results hold when the oracle parameters are replaced by the intermediate estimator, lim m!1 FDR b I and lim m!1 P FDP b I + = 1; (2.4.26) lim m!1 FDR c II and lim m!1 P FDP c II + = 1: (2.4.27) Under the assumptions in Proposition 2.4.2, when the procedures are driven fully by the 85 data, we further have lim S!S FDR DD b I and lim S!S P FDP DD b I + = 1; (2.4.28) lim S!S FDR DD c II and lim S!S P FDP DD c II + = 1: (2.4.29) 2.5 Simulation Results In this section, we will present the simulation results to demonstrate how well our proposed procedures can perform in terms of the expected number of true positives (ETP) normalized by the total number of simultaneous signals under dierent settings as compared to the procedure proposed by Bogomolov and Heller (2018), which will be referred to as the BoHe procedure. The BoHe procedure appears to be the most competitive procedure so far and it was implemented with the default parameters as follows. First construct the selection sets S X =fs :p X s =2g and S Y =fs :p Y s =2g, then apply the BH procedure to each set at level 2jS X j^ X 0 and 2jS Y j^ Y 0 , where ^ X 0 is the plug-in estimate of P j2S Y (1 X j )=jS Y j and ^ Y 0 is dened similarly, and lastly take the intersection of the discoveries made in each set as the nal discoveries. For the oracle implementation, we will use the actual value of the fractions, and for the data-driven implementation, we will use the recommended threshold = to estimate these fractions. We will refer to our rst proposed procedure as LAWSmodI (2.4.1) and the second one as LAWSmodII (2.4.2). For the simulations setting, we will dene the conguration vector as ! (s) = ( 00 (s); 10 (s); 01 (s);(s)) for s2S where each component is dened in (2.2.4). Let the density function of (X s ;Y s ) be f(x;yjs) = 00 (s)f 00 (x;yjs) + 10 (s)f 10 (x;yjs) + 01 (s)f 01 (x;yjs) +(s)f 11 (x;yjs) = 00 (s)f 0 (xjs))f 0 (yjs)) + 10 (s)f 1 (xjs)f 0 (yjs)) (2.5.1) + 01 (s)f 0 (xjs)f 1 (yjs) +(s)f 1 (xjs)f 1 (yjs); 86 where f 0 (js) =N(0; 1) and f 1 (js) =N(; 1) for > 0. 
The two sets of two-sided p-values are then computed as 2(jx s j) and 2(jy s j). In the following studies, we will take m =jSj = 7; 000. Although our procedures are designed to detect simulations signals specically in the presence of local structures, i.e. signals are clustered, we will rst consider the homogeneous setting in which the frequency levels of the signals are the same across the locations so we can better understand when our procedures can have an advantage. For the screening step in LAWSmodII, we used t s =(s) to select and then test only the hypotheses with p min s t s . For this homogeneous setting, other than the BoHe procedure, we will also include the results produced by applying the BH procedure as well as the LAWS procedure to the maximum p-values at the nominal level. As shown in Figure 2.1, we conducted three studies for the homogeneous setting where (1) row one corresponds to the study for which either X s = Y s = 0 or X s = Y s = 1, (2) row two corresponds to the study in which the frequency of simultaneous signals is xed at = 0:05 while varying the ratio of 00 to 1 11 , and (3) row three corresponds to the study for which 00 is xed at 0.2 while varying the ratio of 11 to 1 00 . We see that applying the BH procedure to the maximum p-values is the most conservative method having the lowest error and power, which is expected to happen. The unmodied LAWS procedure applied to the maximum p-values in the homogeneous setting is fact equivalent to applying the BH procedure at level m(1) , thus the LAWS procedure can outperform the standard BH procedure and can outperform the BoHe procedure when the frequency of simultaneous signals becomes large, but it will always fall short when compared to the two modied versions as they incorporate more information. We see in row one that when 00 + 11 = 1, LAWSmodI has the largest power and controls the FDR exactly at the level because in this case, the null is only made of X s = Y s = 0 and we knowP null (P max s ) 2 without having to approximateP 10;01 (P max s ), which is unknown and can be much less than , by . In row two, as the composition of the null changes with increasing 00 , the power of 87 LAWSmodI increases the fastest as the modied p-values becomes more accurate, whereas the LAWSmodII and BoHe procedures have comparable power but their powers increased at a slower rate. The two single-study procedures applied to the maximum p-values have very little power as the proportion of simultaneous signals is only 5% as these two procedures fail to take into consideration the null composition. Notice that when the null is more likely to be of state X s + Y S = 1 than X s = Y S = 0, the BoHe procedure failed to control the FDR and this FDR violation does not go away when we replaced the oracle faction by the estimate. The cause remains to be investigated but it appears that the FDR control of the BoHe procedure depends on the selection rules by which the sets S X and S Y are chosen, and this part was not discussed by the original paper. In row three, notice that the power of BoHe stabilized when the ratio of 11 to 1 00 passes certain level and this pattern can also be explained by the selection rule used. On the other hand, our proposed procedures were able to control the FDR at the specied level in all cases. Overall, we see that our proposed procedures have higher power when either the fraction of simulations signals is high or when X s = Y s = 0 is the dominant state among the null. 
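For completeness, here is a minimal sketch of the data-generating step described at the start of this section: draws from the four-component Gaussian mixture (2.5.1) and the corresponding two-sided p-values. It is written for illustration rather than as the code behind the reported figures; the latent-state encoding and the function name are ours.

```python
import numpy as np
from scipy.stats import norm

def simulate_pairs(pi_config, mu=3.0, seed=0):
    """Draw (x_s, y_s) from the four-component mixture (2.5.1) and return p-values.

    pi_config : array of shape (m, 4) with rows (pi_00(s), pi_10(s), pi_01(s), pi_11(s)).
    """
    rng = np.random.default_rng(seed)
    # latent state at each location: 0 = (0,0), 1 = (1,0), 2 = (0,1), 3 = (1,1)
    states = np.array([rng.choice(4, p=row) for row in pi_config])
    theta_x = np.isin(states, [1, 3]).astype(float)
    theta_y = np.isin(states, [2, 3]).astype(float)
    x = rng.normal(mu * theta_x, 1.0)
    y = rng.normal(mu * theta_y, 1.0)
    # two-sided p-values 2 * (1 - Phi(|z|))
    p_x = 2.0 * norm.sf(np.abs(x))
    p_y = 2.0 * norm.sf(np.abs(y))
    return p_x, p_y, theta_x * theta_y         # last output flags simultaneous signals
```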
Even though the proportion of locations with simultaneous signals may be small on the global level, these simultaneous signals are very likely to be clustered locally for which our procedures are intended. Next we will move on the setting in which there are local structures present. We will make the patterns of~ (s) piece-wise constants such that simultaneous signals appear more frequently in blocks [1001,1200], [2001,2200], [3001,3200], [4001,4200], [5001,5200] and [6001,6200], whereas in the other locations, signals were congurated according to the specied baseline~ . In Figure 2.2, we have~ = (0:94; 0:02; 0:02; 0:02), which simulate the setting of replicability analysis such that signals tend to either show up simultaneously or not at all, whereas in Figure 2.4, we have~ = (0:58; 0:2; 0:2; 0:02) allowing individual signals to show up more frequently. In the rst row of Figure 2.2 and 2.4, we varied from 2 to 4 in order to investigate the impact of the signal strength and the frequency levels of 88 simultaneous signals (s) in the specied blocks are between 0.7 and 0.8. In row two, is xed at 2.5 while (s) in the specied blocks is increased from 0.4 to 0.9. We see that our proposed procedures became much more powerful as the frequency level of simultaneous signals increased in the specied blocks. As we allow more individual signals to show up outside the specied blocks from Figure 2.2 to Figure 2.4, we see that the FDR of the BoHe procedure increased while the power decreased, which is consistent with what we observed in the homogeneous setting. So far we have assumed the knowledge of the unknown parameters~ (s) in order to demonstrate the performance of our proposed procedures. In practice, we will have to estimate these parameters using the methodology discussed previously. We chose = 0:5 and 0 = 0:25 in estimating (s) and 00 (s), respectively, and we used a Gaussian kernel with a bandwidth h=50. To see how well the estimators and the estimation method works, we visualized the structural patterns and the estimates in Figure 2.6. We see that the varying structure of simultaneous signals as well as the substructure can be captured by ^ (s), ^ 00 (s) reasonably well. As predicted by the theory, the estimate ^ (s) is very conservative as it tends to be smaller than the true (s) except at the boundary of the blocks, and this underestimation will contribute to conservative FDR control in procedure II whereas procedure I will also depends on the estimate of 10;01 (s). By using the estimated~ (s), our procedures can be driven completely by the data without the need of any prior knowledge. We applied the data-driven version of our procedure and the BoHe procedure under the same settings as in Figure 2.2 and 2.4, and the results are displayed in Figure 2.3 and 2.5. For our second procedure, we used t = = 0:05 instead of t s =(s) as the screening threshold on the minimum p-values so that we can see whether using a simple threshold will have a big impact on the performance of the procedure. By comparing the performance of the data-driven procedures with that of the oracle procedures, we see that although the power of our procedures when driven by the data decreased, they still have an advantage when the frequency level of simultaneous signals 89 are large enough in the specied blocks. Moreover, the BoHe procedure would become less competitive when more individual signals, i.e. 10;01 is large. 
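For reference, the block layout described above can be encoded as follows. This is a sketch only: the block boundaries and the baseline follow the description in the text, while the in-block value π11 = 0.75 and the equal split of the remaining in-block mass among π00, π10, π01 (matching the π00 = π10 = π01 studies) are illustrative assumptions.

```python
import numpy as np

def block_configuration(m=7000, baseline=(0.94, 0.02, 0.02, 0.02), pi11_block=0.75):
    """Piecewise-constant configuration (pi_00(s), pi_10(s), pi_01(s), pi_11(s)) on a 1-D grid.

    Simultaneous signals are made more frequent inside the six blocks listed in the text;
    elsewhere the specified baseline configuration is used.
    """
    config = np.tile(np.asarray(baseline, dtype=float), (m, 1))
    blocks = [(1000, 1200), (2000, 2200), (3000, 3200),
              (4000, 4200), (5000, 5200), (6000, 6200)]   # 0-based [start, end)
    rest = (1.0 - pi11_block) / 3.0
    for lo, hi in blocks:
        config[lo:hi] = [rest, rest, rest, pi11_block]
    return config
```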
2.6 Conclusions

This project aims to set up a general framework for the problem of simultaneous signal detection in the presence of strong and smooth local structures. Traditionally, researchers would apply the BH procedure to the maximum p-values at each location, but this procedure is overly conservative and has low power because it leaves out side information such as the minimum p-values and the local structures of the signals. For the problem of replicability analysis, which is one application of simultaneous signal detection, Bogomolov and Heller (2018) proposed a powerful procedure that first screens each study to select promising features. From the simulation studies, we see that under the null hypothesis of no simultaneous signals, if each study (spatial process) has a high probability of carrying signals on its own while the other study (spatial process) does not, the procedure proposed by Bogomolov and Heller (2018) can fail to control the FDR, and its behavior appears to depend on the selection threshold used.

The procedures proposed in this work are designed for settings with strong local structures in the simultaneous signals, which are common for spatial or temporal processes such as those in brain imaging and genome-wide association studies. These procedures are based on the LAWS procedure developed in Cai et al. (2021), which applies a weight to the p-value at each location s in order to further differentiate the locations and produce a more informative ranking of the hypotheses. This weight depends on the frequency level of simultaneous signals at location s, so a spatially adaptive weighted screening approach is used to estimate these parameters. In the first procedure we defined a more refined p-value, and this modified p-value is made possible by the estimation method of LAWS. In the second procedure, instead of refining the p-value, we implemented a screening step on the minimum p-values so that fewer quantities need to be estimated. From the simulation studies for the oracle procedures, we see that our proposed procedures can outperform the procedure of Bogomolov and Heller (2018) when there are local structures in the simultaneous signals; when the procedures are driven by the data, the local structure needs to be stronger for our procedures to keep this advantage because of the conservative parameter estimates. On the other hand, if each process can have a high level of individual signals on its own outside of the blocks with clustered simultaneous signals, our proposed procedures can still outperform the procedure proposed by Bogomolov and Heller (2018). Overall, our proposed procedures have decent performance in the intended setting of strong and smooth local structures.
91 2.7 Proof of Propositions 2.7.1 Proof of Proposition 2.4.1 Under the setting of our problem and Assumption A1, we have P null (P s ) =P null 00 (s) 1(s) P 2 max (s) + 10 (s) + 01 (s) 1(s) P max s =P null 00 (s) 1(s) P 2 max (s) +P max s 10 (s) + 01 (s) 00 (s) =P null P max s + 10 (s) + 01 (s) 2 00 (s) 2 1(s) 00 (s) + 10 (s) + 01 (s) 2 00 (s) 2 =P 00;10;01 P max s s 1(s) 00 (s) + 10 (s) + 01 (s) 2 00 (s) 2 10 (s) + 01 (s) 2 00 (s) =P 00 P max s s 1(s) 00 (s) + 10;01 (s) 2 00 (s) 2 10;01 (s) 2 00 (s) 00 (s) 1(s) +P 10;01 P max s s 1(s) 00 (s) + 10;01 (s) 2 00 (s) 2 10;01 (s) 2 00 (s) 10;01 (s) 1(s) s 1(s) 00 (s) + 10;01 (s) 2 00 (s) 2 10;01 (s) 2 00 (s) 2 00 (s) 1(s) + s 1(s) 00 (s) + 10;01 (s) 2 00 (s) 2 10;01 (s) 2 00 (s) 10;01 (s) 1(s) = =; where the numeric derivation was eliminated. Hence, this adjusted p-value is valid and it is straightforward to see thatP null (P max s )P null (P s ). This concludes the proof of Proposition 2.4.1. 92 2.7.2 Proof of Proposition 2.4.2 This proposition is related to Proposition 1 in Cai et al. (2021) and can therefore be proved using the same method. Recall that 1 ^ (s) = P s2S max v h (s;s 0 ) (1) P s2S v h (s;s 0 ) = P s2S max K h (ss 0 ) (1) P s2S K h (ss 0 ) ; ^ 0 00 (s) = P s2S min 0 v h (s;s 0 ) (1 0 ) 2 P s2S v h (s;s 0 ) = P s2S min 0 K h (ss 0 ) (1 0 ) 2 P s2S K h (ss 0 ) : To showE [^ (s) (s)] 2 ! 0, we will use the bias-{variance decomposition. For the bias term, uniformly for all s2S, we have E 2 4 X s2S max K h (ss 0 ) 3 5 =E X s 0 2S K h (ss 0 )I (P max s 0 >) ! = X s 0 2S [K h (ss 0 )P (P max s 0 >)]; AsS!S, we have E [1 ^ (s)]! R S K h (ss 0 )P (P max s 0 >) ds 0 (1) R S K h (ss 0 ) ds 0 : Under the rst condition (A2) of the proposition, we can use the multivariate Taylor expansion to get Z S K h (ss 0 )P (P max s 0 >) ds 0 =P (P max s >) Z S K h (ss 0 )ds 0 +v T Z S s 0 s h K h ( ss 0 h )ds 0 + h 2 2 O(1) Z R d x t xK(x)dx +o(h 2 ); where v = (v 1 ;:::;v d ) T such that v j =O(1) for j = 1;:::;d. Therefore, we have E[^ (s)] (s) 2 = R S K h (ss 0 )P (P max s 0 >) ds 0 (1) R S K h (ss 0 ) ds 0 P (P max s 0 >) 1 2 c v T R S s 0 s h K h ( ss 0 h )ds 0 R S K h (ss 0 ) ds 0 ! 2 +ch 4 R R d x t xK(x)dx R S K h (ss 0 ) ds 0 2 93 for some constant c> 0. For the variance term, the second condition (A3) of the proposition implies that Var 0 @ X s2S max K h (ss 0 ) 1 A = Var X s 0 2S K h (ss 0 )I (P max s 0 >) ! c 0 X s 0 2S K 2 h (ss 0 )P (P max s 0 >) 1P (P max s 0 >) : Therefore, by lettingS!S, we have Var (1 ^ (s))c 00 jSj 1 R S K 2 h (ss 0 )P (P max s 0 >) 1P (P max s 0 >) ds 0 (1) R S K h (ss 0 ) ds 0 2 c 00 jShj 1 (1) 2 R S K 2 h (x) dx R S K h (ss 0 ) ds 0 2 c 000 jShj 1 (1) 2 = Z S K h (ss 0 ) ds 0 2 ; for some constants c 00 ;c 000 > 0. By lettingjSj!1 and h! 0, based on the requirements for the kernel K() as in (2.4.12) and the domainS being nite, we then arrive at the resultE [^ (s) (s)] 2 ! 0. We can similarly show thatE ^ 0 00 (s) 0 00 (s) 2 ! 0. It then implies thatE h ^ ; 0 10;01 (s) ; 0 10;01 (s) i 2 ! 0 since ^ ; 0 10;01 (s) ; 0 10;01 (s) = 1 ^ (s) ^ 0 00 (s) 1 (s) 0 00 (s) : This concludes the proof. 2.8 Proof of Theorem 2.8.1 Proof of Theorem 2.4.3 This proof is based on the theoretical foundation provided by Cai et al. (2021). First we will state a key lemma that resembles Lemma 1 in Cai et al. (2021). 94 Lemma 2.8.1. Assume there exists a suciently small constant > 0 such that (s)2 [; 1]. 
Under Conditions (A4) and (A6), applying the BH procedure at level to the adjusted p-values dened, which are dened as p ~ w s = minfp s ~ w s ; 1g = min ( p s 1(s) m(s) X s2S (s) 1(s) ; 1 ) ; will control the FDR and FDP such that lim m!1 FDR and lim m!1 PfFDP +g = 1 for any > 0: Since the original lemma has been proved in the aforementioned work, we will not rewrite the proof here again but will make some references to the elements in that proof. Also note that the above lemma is stated for general (s)2 [; 1], where is a suciently small constant, and they are not restricted to our denition in (2.2.3). Therefore, it holds when we replace (s) by (s), and it also holds under the conditions of Proposition 2.4.2 when we further replace (s) by ^ (s). (a) On the validity of procedure I (2.4.1,2.4.3) First recall that by Proposition 2.4.1, we haveP null (P max s )P null (P s ), so the oracle version of the adjusted p-values P s dened in (2.4.6) is a valid p-value. Note that Procedure 2.4.1 is equivalent to threshold the weighted p-values at t w = sup t n t : [ FDP w (t) o = sup t ( t : P s2S (s)t max P s2S I (P w s t); 1 ) ; where [ FDP w (t) denote the FDP estimated by the procedure that places a threshold of t on the p-values (2.4.6) weighted by w s = 1(s) (s) , and the corresponding oracle FDP can be written as FDP w (t) = P s:s=0 I (P w s t) max P s2S I (P w s t); 1 : 95 Based on the denition of t w , we see that in order to show lim m!1 P FDP I + = 1, it suces to show that, uniformly for all tt w , FDP w (t) [ FDP w (t) c = P s2S [I (P w s t; s = 0)c(s)t] P s2S (s)t ! 0 in probability for some constant c2 (0; 1]. Note that the weights used in the oracle procedure and the weights used in the procedure described in Lemma 2.8.1 are proportional such that ~ w s =w s m 1 P s2S (s) 1(s) , so the ordering of the hypotheses produced by these two procedures are the same. Note that the optimal threshold on p ~ w s produced by Lemma 2.8.1 can be represented as t ~ w = sup t ( t : mt max P s2S I (P ~ w s t); 1 ) ; and by a change of variable, we can rewrite it as t ~ w = sup t 8 < : t : mtm 1 P s2S (s) 1(s) max n P s2S I P ~ w s tm 1 P s2S (s) 1(s) ; 1 o 9 = ; m 1 X s2S (s) 1(s) = sup t ( t : t P s2S (s) 1(s) max P s2S I (P w s t); 1 ) m 1 X s2S (s) 1(s) : Therefore, we can see that the procedure described in the lemma is equivalent to placing on the weighted p-values p w s used by our oracle procedure a threshold of t = sup t ( t : P s2S (s) 1(s) t max P s2S I (P w s t); 1 ) t w : From to the proof of Lemma 1 in Cai et al. (2021), we learn that for every realization of f s 2Sg, uniformly for all tt (t m ) or equivalently t>t w , P s:s=0 [I (P w s t)P (P w s tj s = 0)] P s:s=0 P (P w s tj s = 0) ! 0 as m! 0: 96 Since by Condition A5, we have P s:s=0 P (P w s tj s = 0) P s2S P (P w s t; s = 0) P s2S P (P w s t; s = 0) ! 0 in probability with respect to the random variablesf s :s2Sg dened in (2.2.3). Therefore, we have P s2S [I (P w s t; s = 0)P (P w s t; s = 0)] P s2S P (P w s t; s = 0) ! 0 in probability. And since P (P w s t; s = 0) =P (P w s tj s = 0)P ( s = 0) (s)t 1(s) (1(s)) =(s)t; (2.8.1) we can conclude that uniformly for all tt w , there exists a constant c2 (0; 1] such that as m!1, P s2S [I (P w s t; s = 0)c(s)t] P s2S (s)t ! 0 in probability, and convergence in expectation follows. This concludes the proof of statement (2.4.24) of the theorem. 
To show that the procedure still controls the error rates when the parameters used by the oracle procedure are replaced by the intermediate oracle estimators, we examine the modified p-values and the weights separately. For the modified p-value, at this point we have to resort to assuming the validity of the modified p-value defined with the intermediate estimators π^{τ'}_{00}(s) in (2.4.18) and π^{τ,τ'}_{10,01}(s) = π^τ(s) − π^{τ'}_{00}(s), until there is a way to estimate π^{τ,τ'}_{10,01}(s) conservatively. Having assumed the validity of this p-value, we can follow the previous derivation with π(s) replaced by π^τ(s) in (2.4.11); the only difference is in (2.8.1), since now we have
\[
P\bigl(P_s^{w(\tau)} \le t,\ \theta_s = 0\bigr)
= P\bigl(P_s^{w(\tau)} \le t \mid \theta_s = 0\bigr)\, P(\theta_s = 0)
\le \frac{\pi^{\tau}(s)\, t}{1 - \pi^{\tau}(s)}\,\bigl(1 - \pi(s)\bigr).
\]
Recall that by construction π^τ is a conservative estimator, so π^τ(s) ≤ π(s) and hence 1 − π(s) ≤ 1 − π^τ(s); therefore the result holds, i.e. P(P_s^{w(τ)} ≤ t, θ_s = 0) ≤ π^τ(s) t. This concludes the proof of statement (2.4.26) of the theorem.

To show that the fully data-driven procedure also controls the error rates, we proceed as before but replace π^τ(s) by \hat{π}^τ(s) and denote the corresponding weight by \hat{w}(s). By Proposition 2.4.2, we have
\[
P\bigl(P_s^{\hat{w}} \le t,\ \theta_s = 0\bigr)
= P\bigl(P_s^{\hat{w}} \le t \mid \theta_s = 0\bigr)\, P(\theta_s = 0)
= (1 + o(1))\,\frac{\pi^{\tau}(s)\, t}{1 - \pi^{\tau}(s)}\,\bigl(1 - \pi(s)\bigr)
\le (1 + o(1))\,\pi^{\tau}(s)\, t
\]
as S → \mathcal{S}. This concludes the proof of the error-rate control of the data-driven procedure I (2.4.28).

(b) On the validity of procedure II (2.4.2, 2.4.4)

Procedure II uses the same weighting scheme but, instead of redefining the p-value to account for the specific state of the null, it uses the maximum p-value p^max_s at each location s as the test statistic and applies a threshold to the minimum p-values p^min_s to screen out locations likely to have θ^X_s = θ^Y_s = 0. This threshold (or these thresholds) is pre-specified independently of the maximum p-values that determine the optimal threshold for FDR control, so the results established for procedure I remain valid when conditioned on p^min_s ≤ t^min_s.

Figure 2.1: Performance of the oracle procedures when there is no local structure. Effect size μ = 3 and the number of replications is 1000. (FDR and power panels compare LAWS, LAWSmodI, LAWSmodII, BoHe and BH against the nominal level as π11 varies with π10,01 = 0, as π00/(π00 + π10,01) varies with π11 = 0.05, and as π11/(π11 + π10,01) varies with π00 = 0.2; panels omitted.)

Figure 2.2: Performance of the oracle procedures, part I. Baseline frequency level outside of the regions with clustered signals is π⃗ = (π00, π10, π01, π11) = (0.94, 0.02, 0.02, 0.02). (FDR and power panels compare LAWSmodI, LAWSmodII and BoHe against the nominal level as μ varies from 2 to 4 and as the in-block frequency varies with π00 = π10 = π01 and μ = 2.5; panels omitted.)
Figure 2.3: Performance of the data-driven procedures, part I. (Same layout as Figure 2.2, comparing LAWSmodI_DD, LAWSmodII_DD and BoHe_DD; panels omitted.)

Figure 2.4: Performance of the oracle procedures, part II. Baseline frequency level outside of the regions with clustered signals is π⃗ = (π00, π10, π01, π11) = (0.58, 0.2, 0.2, 0.02). (FDR and power panels as in Figure 2.2; panels omitted.)

Figure 2.5: Performance of the data-driven procedures, part II. (Same layout, comparing LAWSmodI_DD, LAWSmodII_DD and BoHe_DD; panels omitted.)

Figure 2.6: Estimation of local structures. The red dashed lines represent the actual values of the parameters and the blue lines the estimated values. The bandwidth of the kernel used for the estimation is set to 50. (Panels show 1 − π11 with τ = 0.5, π00 with τ' = 0.25, and π10,01 = 1 − π11 − π00; panels omitted.)

Bibliography

Bansal, N. K. and Miescke, K. J. (2013). A Bayesian decision theoretic approach to directional multiple hypotheses problems. Journal of Multivariate Analysis, 120:205–215. Available at https://doi.org/10.1016/j.jmva.2013.05.012.

Bansal, N. K. and Sheng, R. (2010). Bayesian decision theoretic approach to hypothesis problems with skewed alternatives. Journal of Statistical Planning and Inference, 140(10):2894–2903. Available at https://doi.org/10.1016/j.jspi.2010.03.013.

Basu, A., Harris, I. R., Hjort, N. L., and Jones, M. C. (1998). Robust and efficient estimation by minimising a density power divergence. Biometrika, 85(3):549–559. Available at https://doi.org/10.1093/biomet/85.3.549.

Basu, P., Cai, T. T., Das, K., and Sun, W. (2018). Weighted false discovery rate control in large-scale multiple testing. Journal of the American Statistical Association, 113(523):1172–1183. Available at https://doi.org/10.1080/01621459.2017.1336443.

Benjamini, Y. and Heller, R. (2008). Screening for partial conjunction hypotheses. Biometrics, 64(4):1215–1222. Available at https://doi.org/10.1111/j.1541-0420.2007.00984.x.

Benjamini, Y., Heller, R., and Yekutieli, D. (2009). Selective inference in complex research. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1906):4255–4271. Available at https://doi.org/10.1098/rsta.2009.0127.

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing.
Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289{300. Available at http://www.jstor.org/stable/2346101. Benjamini, Y. and Hochberg, Y. (1997). Multiple hypotheses testing with weights. Scandinavian Journal of Statistics, 24(3):407{418. Available at https://doi.org/10.1111/1467-9469.00072. Benjamini, Y. and Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics, 25(1):60{83. Available at https://doi.org/10.3102/10769986025001060. Benjamini, Y., Hochberg, Y., and Kling, Y. (1993). False discovery rate control in pairwise comparisons. Research Paper, Department of Statistics and Operations Research, Tel Aviv University. 103 Benjamini, Y. and Yekutieli, D. (2005). False discovery rate{adjusted multiple condence intervals for selected parameters. Journal of the American Statistical Association, 100(469):71{81. Available at https://doi.org/10.1198/016214504000001907. Bogomolov, M. and Heller, R. (2018). Assessing replicability of ndings across two studies of multiple features. Biometrika, 105(3):505{516. Available at https://doi.org/10.1093/biomet/asy029. Cai, T. T., Sun, W., and Wang, W. (2019). Covariate-assisted ranking and screening for large-scale two-sample inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(2):187{234. Available at https://doi.org/10.1111/rssb.12304. Cai, T. T., Sun, W., and Xia, Y. (2021). Laws: A locally adaptive weighting and screening approach to spatial multiple testing. Journal of the American Statistical Association, 0(0):1{14. Available at https://doi.org/10.1080/01621459.2020.1859379. Chen, G., Cox, R. W., Glen, D. R., Rajendra, J. K., Reynolds, R. C., and Taylor, P. A. (2019a). A tail of two sides: Articially doubled false positive rates in neuroimaging due to the sidedness choice with t-tests. Human Brain Mapping, 40(3):1037{1043. Available at https://doi.org/10.1002/hbm.24399. Chen, G., Xiao, Y., Taylor, P., Rajendra, J. K., Riggins, T., Geng, F., Redcay, E., and Cox, R. (2019b). Handling multiplicity in neuroimaging through bayesian lenses with multilevel modeling. Neuroinformatics, 17(4):515{545. Available at https://doi.org/10.1007/s12021-018-9409-6. Cover, T. M. and Thomas, J. A. (2005). Elements of Information Theory. John Wiley & Sons, Ltd. Available at https://doi.org/10.1002/047174882X.ch11. Csisz ar, I. (1967). Information-type measures of dierence of probability distributions and indirect observation. Studia Scientiarum Mathematicarum Hungarica, 2:229{318. Dunn, K. W., Kamocka, M. M., and McDonald, J. H. (2011). A practical guide to evaluating colocalization in biological microscopy. American journal of physiology. Cell physiology, 300 4:C723{42. Efron, B. and Tibshirani, R. (2002). Empirical bayes methods and false discovery rates for microarrays. Genetic Epidemiology, 23(1):70{86. Available at https://onlinelibrary.wiley.com/doi/pdf/10.1002/gepi.1124. Efron, B., Tibshirani, R., Storey, J. D., and Tusher, V. (2001). Empirical bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96(456):1151{1160. Available at https://doi.org/10.1198/016214501753382129. Friston, K. J., Penny, W. D., and Glaser, D. (2005). Conjunction revisited. NeuroImage, 25:661{667. Available at https://doi.org/10.1016/j.neuroimage.2005.01.013. 104 Fu, A., Narasimhan, B., Kang, D. W., Diamond, S., Miller, J., Boyd, S., and Roseneld, P. K. (2021). 
CVXR: Disciplined Convex Optimization. R package version 1.0-9. Availalbe at https://cran.r-project.org/package=CVXR. Fu, L. J., James, G. M., and Sun, W. (2020). Nonparametric empirical bayes estimation on heterogeneous data. Genovese, C. and Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):499{517. Available at https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-9868.00347. Genovese, C. R., Roeder, K., and Wasserman, L. (2006). False discovery control with p-value weighting. Biometrika, 93(3):509{524. Available at https://doi.org/10.1093/biomet/93.3.509. Jones, L. V. and Tukey, J. W. (2000). A sensible formulation of the signicance test. Psychological Methods, 5(4):411{414. Available at https://doi.apa.org/doi/10.1037/1082-989X.5.4.411. Kaiser, H. F. (1960). Directional statistical decisions. Psychological Review, 67(3):160{167. Available at https://doi.org/10.1037/h0047595. Kullback, S. and Leibler, R. A. (1951). On Information and Suciency. The Annals of Mathematical Statistics, 22(1):79 { 86. Available at https://doi.org/10.1214/aoms/1177729694. Lewis, C. and Thayer, D. T. (2004). A loss function related to the FDR for random eects multiple comparisons. Journal of Statistical Planning and Inference, 125(1):49{58. Available at https://doi.org/10.1016/j.jspi.2003.07.020. Lewis, C. and Thayer, D. T. (2009). Bayesian decision theory for multiple comparisons. Lecture Notes-Monograph Series, 57:326{332. Available at http://www.jstor.org/stable/30250048. MOSEK ApS (2021). Rmosek: The R to MOSEK Optimization Interface. R package version 9.2.38. Available at https://www.mosek.com. Rosenthal, R. (1986). Meta-analytic procedures for social science research sage publications: Beverly hills, 1984, 148 pp. Educational Researcher, 15:18 { 20. Available at https://doi.org/10.3102%2F0013189X015008018. Sarkar, S. K. and Zhou, T. (2008). Controlling Bayes directional false discovery rate in random eects model. Journal of Statistical Planning and Inference, 138(3):682 { 693. Available at https://doi.org/10.1016/j.jspi.2007.01.006. Schweder, T. and Spjtvoll, E. (1982). Plots of P-values to evaluate many tests simultaneously. Biometrika, 69(3):493{502. Available at https://doi.org/10.1093/biomet/69.3.493. 105 Shaer, J. P. (1972). Directional statistical hypotheses and comparisons among means. Psychological bulletin, 77(3):195{197. Available at https://doi.org/10.1037/h0032261. Shaer, J. P. (1974). Bidirectional unbiased procedures. Journal of the American Statistical Association, 69(346):437{439. Available at 10.1080/01621459.1974.10482970. Shaer, J. P. (2002). Multiplicity, directional (type III) errors, and the Null Hypothesis. Psychological Methods, 7(3):356 { 369. Available at https://doi.org/10.1037/1082-989X.7.3.356. Stephens, M. (2016). False discovery rates: a new deal. Biostatistics, 18(2):275{294. Available at https://doi.org/10.1093/biostatistics/kxw041. Stephens, M., Carbonetto, P., Dai, C., Gerard, D., Lu, M., Sun, L., Willwerscheid, J., Xiao, N., and Zeng, M. (2020). ashr: Methods for Adaptive Shrinkage, using Empirical Bayes. R package version 2.2-47. Available at https://cran.r-project.org/package=ashr. Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):479{498. Available at https://doi.org/10.1111/1467-9868.00346. Sun, W. and Cai, T. T. (2007). 
Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association, 102(479):901–912. Available at https://doi.org/10.1198/016214507000000545.

Tukey, J. W. (1991). The Philosophy of Multiple Comparisons. Statistical Science, 6(1):100–116. Available at https://doi.org/10.1214/ss/1177011945.
Abstract
Statistical hypothesis testing is among the most prominent methods in statistics, as it has been applied to a wide range of practical problems in many disciplines. Multiple testing refers to testing more than one hypothesis at a time. Procedures developed for this type of problem aim to make as many rejections of the null hypotheses as possible while controlling error rates such as the false discovery rate in large-scale settings, in which tens of thousands or even millions of hypotheses may be tested simultaneously, as in genomics. The null hypothesis being tested against the alternative can be exact, i.e. an exact parameter value is specified, or inexact, i.e. only a range or interval is specified. However, performing such a test may not arrive at a satisfying answer directly, and the procedures designed for general multiple hypothesis testing sometimes fail to incorporate auxiliary information that could be used to improve their power. For example, in two-sample multiple testing, researchers are often more interested in the direction of the difference when there is one, which is in fact a debated statement on its own. In replicability analysis, researchers want to identify features or locations that are non-null in both studies, and these signals may have intrinsic local structures as they tend to be clustered in certain regions. This dissertation addresses these two problems under the topics of selective inference and replicability analysis.

Chapter 1 of this dissertation studies the problem of selective inference with confident directions. A general selective inference framework is developed for the problem of directional analysis, and it incorporates the researchers' preferences as selection weights in the definition of the power. An oracle procedure is developed to maximize the weighted power while controlling the false selection rate. A deconvolution method for estimating the underlying distribution of the effect sizes being classified is introduced. Theory and simulation results are presented to justify the proposed procedure.

Chapter 2 extends the topic of replicability analysis to the adaptive detection of simultaneous signals in the presence of strong and smooth local structures in the location-based signal frequency. Two procedures are proposed that incorporate the information provided by the two studies into the construction of the p-values in two different ways and then apply procedural weights derived from the local structural information about the simultaneous signals. Estimators are constructed and computed numerically to extract this structural information. Theory and simulation results are presented to justify the two proposed procedures.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Large scale inference with structural information
Large-scale multiple hypothesis testing and simultaneous inference: compound decision theory and data driven procedures
High dimensional estimation and inference with side information
Conformalized post-selection inference and structured prediction
Model selection principles and false discovery rate control
Adapting statistical learning for high risk scenarios
Statistical inference for second order ordinary differential equation driven by additive Gaussian white noise
Sequential analysis of large scale data screening problem
Optimizing statistical decisions by adding noise
Asymptotically optimal sequential multiple testing with (or without) prior information on the number of signals
Asymptotic problems in stochastic partial differential equations: a Wiener chaos approach
Large-scale inference in multiple Gaussian graphical models
Nonparametric empirical Bayes methods for large-scale inference under heteroscedasticity
Neural matrix factorization model combing auxiliary information for movie recommender system
Statistical learning in High Dimensions: Interpretability, inference and applications
Multi-population optimal change-point detection
Statistical insights into deep learning and flexible causal inference
Shrinkage methods for big and complex data analysis
Feature selection in high-dimensional modeling with thresholded regression
Leveraging sparsity in theoretical and applied machine learning and causal inference
Asset Metadata
Creator
Liu, Jinting
(author)
Core Title
Topics in selective inference and replicability analysis
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Applied Mathematics
Degree Conferral Date
2022-05
Publication Date
02/15/2022
Defense Date
01/20/2022
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
directional decisions,false discovery rate,local false discovery rate,OAI-PMH Harvest,procedural weights,replicability analysis,selection weights,selective inference,simultaneous signals
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Mikulevicius, Remigijus (committee chair), Lototsky, Sergey (committee member), Sun, Wenguang (committee member)
Creator Email
jintingl@usc.edu,liujt23206@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC110760034
Unique identifier
UC110760034
Legacy Identifier
etd-LiuJinting-10392
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Liu, Jinting
Type
texts
Source
20220223-usctheses-batch-913 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu