SEQUENTIAL TESTING OF MULTIPLE HYPOTHESES

by

Jinlin Song

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(APPLIED MATHEMATICS)

May 2013

Copyright 2013 Jinlin Song

Dedication

To my family

Acknowledgments

First of all, I would like to thank my advisor, Prof. Jay Bartroff, without whom this dissertation would not have been possible. I am truly indebted and grateful for his patience, encouragement and guidance throughout my work on this thesis. I would also like to extend my gratitude to the committee members for their valuable insights.

My sincerest thanks to my family for their unconditional love, which comforted me whenever I was hurt and guided me through many difficult times. Although oceans apart, they have always been there for me. I would like to thank my boyfriend Xiaokun, who has always believed in me and has been especially understanding and helpful during the preparation of my thesis.

Finally, a heartfelt thanks to my fellow graduate students, Selenne, Alyona, Xin and many more, for their encouragement and support. I hope I was a source of support for them as well and wish the best for them.

Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Familywise Error Rate and False Discovery Rate
  1.2 Sequential Hypotheses Testing
  1.3 Summary
Chapter 2: A General Framework for Rejective Sequential Tests of Multiple Hypotheses Controlling FWER
  2.1 Previous Work
    2.1.1 Fixed-Sample Multiple Hypotheses Testing
    2.1.2 Sequential Multiple Hypotheses Testing
  2.2 A General Framework
  2.3 Examples
    2.3.1 Sequential Step-Down Procedure
    2.3.2 A Sequential Testing Procedure for Hypotheses in Order
  2.4 Applications
    2.4.1 Chromosome aberrations of patients exposed to anti-tuberculosis drugs
    2.4.2 Identifying the maximum safe dose
Chapter 3: Sequential Testing of Multiple Hypotheses Controlling Type I and II Familywise Error Rates
  3.1 Introduction
  3.2 A Sequential Test Controlling FWE I and FWE II
    3.2.1 Notation and Set-Up
    3.2.2 The Sequential Holm Procedure
  3.3 Constructing Test Statistics that Satisfy (3.2)-(3.3) for Individual Data Streams
    3.3.1 Simple Hypotheses and Their Use as Surrogates for Certain Composite Hypotheses
    3.3.2 Other Composite Hypotheses
  3.4 Simulation Studies
    3.4.1 Independent Bernoulli Data Streams
    3.4.2 Correlated Normal Data Streams
  3.5 Discussion
Chapter 4: Sequential Testing of Multiple Hypotheses Controlling False Discovery Rate
  4.1 Benjamini and Hochberg's Fixed-Sample Step-Up Procedure
  4.2 A Sequential Procedure Controlling FDR
    4.2.1 Notation and Set-Up
    4.2.2 The Rejective Sequential Benjamini-Hochberg Procedure
  4.3 Constructing Test Statistics that Satisfy (4.3) for Individual Data Streams
  4.4 Discussion
Chapter 5: Sequential Testing of Multiple Hypotheses Controlling Type I and Type II False Discovery Rates
  5.1 Introduction
  5.2 A Sequential Test Controlling FDR I and FDR II
    5.2.1 Notation and Set-Up
    5.2.2 The Sequential Benjamini-Hochberg Procedure
  5.3 Constructing Test Statistics that Satisfy (5.2) and (5.3) for Individual Data Streams
  5.4 Simulation Studies
    5.4.1 Independent Bernoulli Data Streams
    5.4.2 Correlated Normal Data Streams
  5.5 Discussion
References

List of Tables

2.1 Total chromosome aberrations per 100 cells (including gaps).
2.2 Sample size needed for the chromosome aberration example with different $N$.
2.3 Expected sample size of the permuted chromosome aberration data with different $N$.
2.4 Operating characteristics of the sequential and fixed-sample procedures for testing hypotheses in order.
3.1 Three sample paths of the sequential Holm procedure for $k = 3$ hypotheses about Bernoulli data using critical values $A_1 = -2.34$, $A_2 = -1.94$, $A_3 = -1.27$, $B_1 = 1.93$, $B_2 = 1.53$, $B_3 = .86$. The values of the stopped sequential statistics are marked with an asterisk.
3.2 Critical values (3.25) of the sequential Holm procedure for simple hypotheses, for $\alpha = .05$, $\beta = .2$, $\rho = 0$ and $k = 2, \ldots, 10$ to two decimal places.
3.3 Operating characteristics of sequential and fixed-sample multiple testing procedures controlling familywise error rates for $k$ streams of independent Bernoulli data.
3.4 Operating characteristics of sequential and fixed-sample multiple testing procedures controlling familywise error rates for $k$ streams of correlated normal data.
4.1 Significance levels of testing $\theta = 0$ vs. $\theta = 1$ about $N(\theta, 1)$ data with critical value (4.17).
4.2 Powers of testing $\theta = 0$ vs. $\theta = 1$ with critical value (4.17).
5.1 Critical values (5.18) of the sequential Benjamini-Hochberg procedure for simple hypotheses, for $\alpha = .05$, $\beta = .2$, $\rho = 0$ and $k = 2, \ldots, 10$ to two decimal places.
5.2 Operating characteristics of sequential and fixed-sample multiple testing procedures controlling false discovery rates for $k$ streams of independent Bernoulli data.
5.3 Operating characteristics of sequential and fixed-sample multiple testing procedures controlling false discovery rates for $k$ streams of correlated normal data.

List of Figures

2.1 Total chromosome aberrations per 100 cells (including gaps).
2.2 Flow chart of the testing procedure for the chromosome aberration example.
2.3 Q-Q plots for control group, before treatment, and after treatment.
2.4 p-values for the individual hypotheses.
3.1 The hierarchical clustering structure for Example (d) of Zou and Hastie [55].

Abstract

There are many areas of scientific inquiry where it is desired to test multiple statistical hypotheses concerning data accumulated over time. For example, new technology in microarray analysis and neuroimaging often requires testing tens of thousands of hypotheses about streams of data from experiments or imaging runs. Multiple-endpoint biomedical clinical trials are another area with these characteristics, in which the need for sequential sampling can be especially strong to reduce the number of patients exposed to a toxic treatment, or to decrease the delay until an efficacious treatment is made available to sick patients. In these examples, sequential analysis methods must be married with multiple testing error control methods to prevent false discoveries. This thesis presents new sequential methods that control the multiple testing error rate, of which we consider the two most commonly used: the familywise error rate (FWER) and the false discovery rate (FDR).

For FWER control we first develop a general framework for rejective sequential multiple testing procedures. This allows us to generalize fixed-sample FWER-controlling procedures to the sequential setting. As an example we develop a sequential procedure for testing hypotheses in order [39] and apply it to two clinical trial data sets. Next we propose a general and flexible sequential procedure that allows the combination of standard sequential tests into a sequential multiple testing procedure that simultaneously controls both the type I and II FWER at prescribed levels, regardless of the data's between-stream dependence. In extensive simulation studies, the proposed procedure shows both savings in expected sample size and less conservative error control when compared with the fixed-sample Holm, sequential Bonferroni, and other previously proposed sequential procedures.

Mirroring our proposal for FWER control, for FDR control we propose general and flexible sequential procedures that combine standard sequential tests into sequential multiple testing procedures that either control the FDR, or simultaneously control the type I and II FDR; both procedures are inspired by Benjamini and Hochberg's [8] seminal fixed-sample procedure. FDR control is first proved under independence of the data streams, and then under arbitrary dependence using a small, logarithmic inflation of the error rate. These are the first sequential FDR-controlling procedures of their kind, and they show a dramatic reduction in expected sample size when compared to fixed-sample procedures in our extensive simulation studies, which consider both independent and dependent data streams.
Chapter 1

Introduction

1.1 Familywise Error Rate and False Discovery Rate

Multiple hypotheses testing is the testing of more than one statistical hypothesis at a time. It occurs frequently in various scientific areas. For example, in pharmaceutical studies, compounds are usually tested at many different dose levels. At each level, the null hypothesis of no significant effect is tested, and overall conclusions, such as the effectiveness, the minimum effective dose, or the maximum safe dose of the compound, are made based on all the hypotheses tested at the different dose levels [46, 48]. Multiple hypotheses testing is also commonly used in neuroscience, where thousands of voxels (volume elements) in the functional MRI image of a brain are tested at the same time to identify the area activated when the subject is exposed to certain stimuli [17, 34]. Other areas of application include clinical trials with multiple endpoints [52, 50], microarray data analysis in genomics [18], and variable selection in high-dimensional regression [33].

Assume that we have a set $\Theta$ for which each $\theta \in \Theta$ indexes a probability measure $P_\theta$ defined on a common outcome space $\Omega$. Here $\theta$ is not necessarily a parameter in the traditional finite-dimensional sense, but can more generally be an arbitrary index of probability measures. Suppose we are to test a set $\mathcal{H} = \{H^{(1)}, H^{(2)}, \ldots, H^{(k)}\}$ of hypotheses, where $H^{(i)} \subseteq \Theta$ for $i = 1, 2, \ldots, k$. For every $\theta \in \Theta$, a hypothesis $H \in \mathcal{H}$ is said to be true if $\theta \in H$. We define $\mathcal{T}(\theta) = \{H^{(i)} \in \mathcal{H} : \theta \in H^{(i)}\}$, the collection of true null hypotheses when $P_\theta$ is the underlying probability measure, and $\mathcal{F}(\theta) = \{H^{(i)} \in \mathcal{H} : \theta \notin H^{(i)}\}$, the complement of $\mathcal{T}(\theta)$ in $\mathcal{H}$.

In the traditional frequentist paradigm for testing a single hypothesis, the probability of type I error is usually controlled at a pre-designated level $\alpha$. A variety of generalizations to the multiple testing situation are considered in the multiple testing literature. Among them, the most popular ones are the familywise error rate (FWER) and the false discovery rate (FDR). FWER is defined as the probability of making at least one false rejection, that is,

$$\mathrm{FWER} = P_\theta\Big(\bigcup_{H^{(i)} \in \mathcal{T}(\theta)} \{H^{(i)} \text{ is rejected}\}\Big).$$

Controlling the FWER only when all hypotheses are true is called weak control of the FWER, while strong control of the FWER refers to control of the FWER under all configurations. FDR, on the other hand, describes the expected proportion of false rejections among all rejections:

$$\mathrm{FDR} = E_\theta\Big(\frac{V}{R \vee 1}\Big),$$

where $V$ is the number of type I errors made out of the $R$ rejections. It is equivalent to the FWER when all hypotheses are true, but smaller than the FWER under other configurations. Therefore, procedures controlling the FDR typically provide more power.

In practice, the choice of whether to use FWER or FDR as the error metric depends on the nature of the problem. If the overall conclusion is only true when all rejections are true, then the FWER should be adopted. For example, in a toxicological study, any false rejection will lead to an underestimated, hence unsafe, maximum safe dose. In this case, the FWER should be chosen over the FDR. On the other hand, in a clinical trial with multiple endpoints, the overall conclusion that the treatment is superior need not be false even if some of the rejections are false. Hence controlling the FWER may be unnecessarily stringent, and the FDR may be a more natural choice.
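To make the two error metrics concrete, the following is a minimal Monte Carlo sketch (not from the thesis; the scenario, distributions, and names are all illustrative assumptions) that estimates FWER and FDR empirically from repeated per-test rejections:

```python
import numpy as np

rng = np.random.default_rng(0)
k, true_null = 10, np.arange(6)          # hypotheses 0-5 are true nulls (assumed setup)
n_rep, alpha = 100_000, 0.05

fwe, fdp = 0, 0.0
for _ in range(n_rep):
    p = rng.uniform(size=k)              # p-values of true nulls are Uniform(0,1)
    p[6:] = rng.beta(0.1, 1, size=4)     # false nulls get stochastically small p-values
    reject = p <= alpha                  # naive per-test rejections at level alpha
    V = reject[true_null].sum()          # number of false rejections
    R = reject.sum()                     # total number of rejections
    fwe += (V > 0)                       # at least one false rejection?
    fdp += V / max(R, 1)                 # false discovery proportion V/(R v 1)

print("FWER ~", fwe / n_rep)             # probability of any false rejection
print("FDR  ~", fdp / n_rep)             # expected false discovery proportion
```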
1.2 Sequential Hypotheses Testing

Sequential testing is hypotheses testing in which the sample size is not fixed in advance. Instead, data are evaluated as they are collected, and further sampling is stopped as soon as significant results are observed. Thus, in a sequential procedure, the number of observations required is a random variable instead of a predetermined number. We point out that our use of the word "sequential" in this thesis refers to the manner of sampling (or, equivalently, observation) and differs from the way the word is sometimes used in the literature on fixed-sample multiple testing procedures to describe the stepwise analysis of fixed-sample test statistics, e.g., p-values.

Sequential procedures were first developed during World War II for more efficient industrial quality control. One of the fundamental procedures, the sequential probability ratio test (SPRT), was introduced by Wald [53]. Let $X_1, X_2, \ldots$ be a sequence of random variables with joint density functions $f_n(x_1, x_2, \ldots, x_n)$, $n = 1, 2, \ldots$. Consider the simple hypotheses $H_0 : f_n = f_{0n}$ for all $n$ against $H_1 : f_n = f_{1n}$ for all $n$, for some given sequences $\{f_{0n}\}$ and $\{f_{1n}\}$ of densities. Let $l_n = f_{1n}(x_1, \ldots, x_n)/f_{0n}(x_1, \ldots, x_n)$, the likelihood ratio. Choose constants $0 < A < 1 < B < \infty$. The SPRT samples $X_1, X_2, \ldots$ until

$$N = \begin{cases} \text{first } n \ge 1 \text{ such that } l_n \notin (A, B), \\ \infty, & \text{if } l_n \in (A, B) \text{ for all } n \ge 1. \end{cases}$$

If $N < \infty$, then

$$\text{Reject } H_0 \text{ if } l_N \ge B; \qquad \text{Accept } H_0 \text{ if } l_N \le A.$$

For independent, identically distributed observations, the SPRT is optimal in the sense of minimizing the expected sample size under both $H_0$ and $H_1$ among all tests with the same levels of type I and II error rates.
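As an illustration, here is a minimal SPRT sketch (not from the thesis) for i.i.d. $N(\mu, 1)$ data with $H_0: \mu = 0$ vs. $H_1: \mu = 1$, using the standard Wald threshold approximations $A \approx \beta/(1-\alpha)$ and $B \approx (1-\beta)/\alpha$; the distributions, truncation point, and function names are illustrative assumptions:

```python
import math
import random

def sprt(sample, log_lr_increment, alpha=0.05, beta=0.2, max_n=10_000):
    """Run Wald's SPRT; returns (decision, sample size).

    sample()            -- draws one observation
    log_lr_increment(x) -- log f_1(x)/f_0(x) for one observation
    """
    log_A = math.log(beta / (1 - alpha))   # lower boundary: accept H0
    log_B = math.log((1 - beta) / alpha)   # upper boundary: reject H0
    log_l = 0.0
    for n in range(1, max_n + 1):
        log_l += log_lr_increment(sample())
        if log_l >= log_B:
            return "reject H0", n
        if log_l <= log_A:
            return "accept H0", n
    return "no decision", max_n             # truncated: l_n stayed in (A, B)

# H0: mu = 0 vs. H1: mu = 1 for N(mu, 1) data; the log LR increment is x - 1/2.
random.seed(1)
print(sprt(lambda: random.gauss(0.0, 1.0), lambda x: x - 0.5))
```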
One important application of sequential testing is in clinical trials [7]. Clinical trials can be sequential in two different senses. First, the patients are often recruited sequentially, and recruitment is stopped when significant evidence has accumulated [10]. Alternatively, the pool of patients may be fixed, but measurements are repeatedly taken on this fixed pool until significance is detected [16]. Nowadays, most modern clinical trials have more than one endpoint (outcome measurement), which means sequential testing of a single hypothesis is no longer sufficient.

1.3 Summary

This thesis focuses on sequential testing of multiple hypotheses. In Chapter 2, we introduce a general framework for rejective sequential tests of multiple hypotheses controlling FWER, that is, procedures that only stop early to reject a hypothesis. We provide an alternative proof of the FWER control of Bartroff and Lai's [2010] procedure under the framework. The framework is then used to derive a sequential testing procedure for hypotheses in order [39], illustrating that the framework can be used to generalize fixed-sample FWER-controlling procedures to sequential procedures in nontrivial settings. The performance of this new procedure is demonstrated with both real and simulated clinical trial data.

The procedures discussed in Chapter 2 only allow early stopping when sufficient evidence has been collected to reject a null hypothesis. In Chapter 3, we propose a sequential testing procedure that allows early acceptances as well as early rejections while simultaneously controlling both the type I and II FWER. The proposed procedure shows both savings in expected sample size and less conservative error control in a simulation study.

Chapters 4 and 5 concern sequential testing procedures that control FDR. First, a rejective procedure inspired by Benjamini and Hochberg's [1995] fixed-sample step-up procedure is proposed in Chapter 4. This procedure's FDR control is proved under independence, while for dependent data a simple conservative modification is needed; this modification provides FDR control regardless of dependence. Based on this procedure, we propose a new procedure in Chapter 5 that simultaneously controls the type I and II false discovery rates. The main theorem is proved in Section 5.2, and how to find the critical values for likelihood ratio tests is illustrated in Section 5.3. Finally, the performance and operating characteristics of the new procedure are demonstrated through simulation studies in Section 5.4.

Chapter 2

A General Framework for Rejective Sequential Tests of Multiple Hypotheses Controlling FWER

2.1 Previous Work

2.1.1 Fixed-Sample Multiple Hypotheses Testing

Many procedures have been developed for multiple hypotheses testing, among which the Bonferroni procedure [41] is the oldest, most well-known, and most general. Recall our set-up from Section 1.1, in which we test a set $\mathcal{H} = \{H^{(1)}, H^{(2)}, \ldots, H^{(k)}\}$ of hypotheses at the same time. Assume that for each $H^{(i)}$ there is a valid p-value $p^{(i)}$ such that, for any $\theta \in H^{(i)}$, $P_\theta(p^{(i)} \le \alpha) \le \alpha$ for all $0 < \alpha < 1$. The Bonferroni procedure rejects $H^{(i)}$ if $p^{(i)} \le \alpha/k$. Although the Bonferroni procedure lacks power when the number of hypotheses is large or the test statistics are highly correlated, it remains popular in applications because it is easy to implement and it doesn't require independence of the hypotheses or modeling of the dependence structure, which in many cases is not practical.

To improve on the power and conservative nature of the Bonferroni procedure, Holm [24] provided a rejective procedure that starts with the hypothesis with the smallest p-value. Let $p^{(i_1)} \le p^{(i_2)} \le \cdots \le p^{(i_k)}$ be the ordered p-values. Holm's procedure rejects the corresponding hypothesis $H^{(i_1)}$ if $p^{(i_1)} \le \alpha/k$. If $H^{(i_1)}$ is accepted, all hypotheses are accepted and no further test is conducted. Otherwise, one continues by testing $H^{(i_2)}$, rejecting it if $p^{(i_2)} \le \alpha/(k-1)$. Continuing in this way, $H^{(i_j)}$ is rejected if and only if $H^{(i_1)}, H^{(i_2)}, \ldots, H^{(i_{j-1})}$ are all rejected and $p^{(i_j)} \le \alpha/(k-j+1)$. In other words, Holm's procedure rejects $H^{(i_1)}, H^{(i_2)}, \ldots, H^{(i_j)}$ if $p^{(i_l)} \le \alpha/(k-l+1)$ for all $l \le j$. Rejective procedures that start with the most significant hypothesis are called step-down procedures, while step-up procedures refer to procedures that start with the least significant hypothesis. Both procedures mentioned above control the FWER at level $\alpha$ in the strong sense.
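For reference, here is a minimal sketch of the fixed-sample Holm step-down rule as just described (nothing beyond the description above is assumed):

```python
def holm(pvals, alpha=0.05):
    """Holm step-down: return the set of indices whose hypotheses are rejected."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])  # indices by increasing p-value
    rejected = set()
    for j, i in enumerate(order):                     # j = 0, 1, ..., k-1
        if pvals[i] <= alpha / (k - j):               # alpha/(k-j+1) in 1-indexed form
            rejected.add(i)
        else:
            break                                     # first acceptance stops the procedure
    return rejected

print(holm([0.001, 0.20, 0.012, 0.03]))               # rejects hypotheses 0 and 2
```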
Another improvement of the Bonferroni procedure was suggested by Simes [44], who proposed a procedure to test the intersection hypothesis $H^{(0)} = \bigcap_{i=1}^{k} H^{(i)}$. It rejects $H^{(0)}$ if $p^{(i_j)} \le j\alpha/k$ for any $j = 1, \ldots, k$. This method controls the probability of type I error at level $\alpha$ in the conventional sense with respect to $H^{(0)}$ when $H^{(1)}, H^{(2)}, \ldots, H^{(k)}$ are independent. Simes also provided extensive simulation results suggesting that the procedure controls type I error for many other types of dependency typically seen in testing situations. However, Simes didn't provide a test for individual hypotheses, only for the overall intersection hypothesis $H^{(0)}$. Based on the idea, due to Hommel [25], of employing the closure principle to extend Simes' procedure, Hochberg [21] derived an extended Simes' procedure for making statements about individual hypotheses. This step-up procedure rejects $H^{(i_1)}, H^{(i_2)}, \ldots, H^{(i_j)}$ if $p^{(i_j)} \le \alpha/(k-j+1)$. In other words, the procedure rejects all the hypotheses if $p^{(i_k)} \le \alpha$; otherwise, it moves on to $H^{(i_{k-1})}$ and rejects $H^{(i_1)}, \ldots, H^{(i_{k-1})}$ if $p^{(i_{k-1})} \le \alpha/2$, etc. The extended Simes' procedure strongly controls the FWER whenever the original Simes' procedure controls the probability of type I error for the overall intersection hypothesis, and it is more powerful than Holm's step-down procedure, which uses the same set of critical values.

The generality of procedures such as the Bonferroni procedure and Holm's procedure enables them to be applicable in many situations, without restriction on the correlations of the individual test statistics. However, they don't take the structure of the hypotheses into account, and this causes them to be underpowered when the hypotheses are structurally related in some way. Marcus et al.'s [30] closed testing procedure is one of the first procedures to take the structure of the hypotheses into consideration without assuming a particular joint distribution, such as multivariate normal. The procedure controls FWER when $\mathcal{H}$ is closed under intersection, i.e., if $H^{(i)}, H^{(j)} \in \mathcal{H}$, then $H^{(i)} \cap H^{(j)} \in \mathcal{H}$. A hypothesis $H^{(i)}$ is rejected if and only if all hypotheses contained in $H^{(i)}$ are rejected and $p^{(i)} \le \alpha$. The procedure was shown to strongly control the FWER, and since every hypothesis is tested at level $\alpha$ instead of some fraction of $\alpha$, it provides more power than the other FWER-controlling procedures mentioned above when $\mathcal{H}$ is closed. Inspired by closed testing, more FWER-controlling procedures were introduced for structured hypotheses, including the partitioning procedure [45], the gatekeeping procedures [54], the testing procedures for trees [33], and the testing procedure for hypotheses in order [39], which will be discussed in Section 2.3, where we generalize it to the sequential setting. Generally speaking, procedures for structured hypotheses should be adopted whenever such structure is present because they, in general, take advantage of the information provided to increase the power.

Almost all of the fixed-sample FWER-controlling procedures mentioned above can be constructed as special cases of a general rejection principle recently discovered by Goeman and Solari [19]. The general framework introduced in this chapter further generalizes that principle to the sequential setting.

2.1.2 Sequential Multiple Hypotheses Testing

As mentioned above, multiple testing and sequential testing are both quite mature fields. However, the intersection of these two areas is not well developed in a general setting. One area that has been considered is the adaptation of some classical fixed-sample tests about vector parameters to the sequential sampling setting, including O'Brien and Fleming's [36] sequential version of Pearson's $\chi^2$ test, and Tang et al.'s [49, 50] group sequential extensions of O'Brien's [35] generalized least squares statistic. For bivariate normal populations, Jennison and Turnbull [27] proposed a sequential test of two one-sided hypotheses about the bivariate mean vector, and Cook and Farewell [12] proposed a sequential test in a similar setting but where one of the hypotheses is two-sided. A procedure for comparing three treatments was proposed by Siegmund [43], related to Paulson's [37] earlier procedure for selecting the largest mean of $k$ normal distributions, which Bartroff and Lai [5] showed to be a special case of their more general sequential step-down method.
Assume there is a set $N$ of possible sample sizes. For $i = 1, 2, \ldots, k$ and $n \in N$, denote the test statistic and the corresponding p-value of $H^{(i)}$ at sample size $n$ by $\Lambda^{(i)}(n)$ and $p^{(i)}(n)$. Assume there exists a critical value function $C^{(i)}(\alpha)$ that is non-increasing in $\alpha \in (0, 1)$ and satisfies

$$\sup_{\theta \in H^{(i)}} P_\theta\Big(\sup_{n \in N}\,[\Lambda^{(i)}(n) - C^{(i)}(\alpha)] \ge 0\Big) \le \alpha \qquad (2.1)$$

for all $0 < \alpha < 1$ and $i = 1, \ldots, k$. The sequential step-down procedure proposed by Bartroff and Lai [5] is defined as follows. Let $I_1 = \{1, 2, \ldots, k\}$ and $n_0 = 0$. For $j = 1, \ldots, k$:

1. Sample up to
$$n_j = \inf\Big\{n \in N : n > n_{j-1} \text{ and } \max_{i \in I_j}\,[\Lambda^{(i)}(n) - C^{(i)}(\alpha/|I_j|)] \ge 0\Big\}.$$

2. Order the test statistics
$$\Lambda^{(i(j,1))}(n_j) \le \Lambda^{(i(j,2))}(n_j) \le \cdots \le \Lambda^{(i(j,|I_j|))}(n_j),$$
where $i(j, \ell)$ denotes the index of the $\ell$th ordered statistic at stage $j$.

3. Reject $H^{(i(j,|I_j|))}, H^{(i(j,|I_j|-1))}, \ldots, H^{(i(j,|I_j|-m_j+1))}$, where
$$m_j = \max\Big\{m \ge 1 : \Lambda^{(i(j,|I_j|-\ell+1))}(n_j) \ge C^{(i(j,|I_j|-\ell+1))}\big(\alpha/(|I_j|-\ell+1)\big) \text{ for all } 1 \le \ell \le m\Big\};$$
that is, the $\ell$th most significant statistic is compared against its critical value at the Holm-adjusted level $\alpha/(|I_j|-\ell+1)$, and rejection proceeds from the top down until the first failure.

4. Stop if $j = k$ or if $n_j = \sup N$. Otherwise, let $I_{j+1}$ be the indices of the remaining hypotheses and continue on to stage $j + 1$.

Note that the sequential step-down procedure involves ranking the test statistics, which may be on completely different scales in general. One can solve this problem by introducing a standardizing function $\varphi^{(i)}(\cdot)$ for each hypothesis $H^{(i)}$, which is applied to the statistic $\Lambda^{(i)}(n)$ before ranking. The standardizing functions $\varphi^{(i)}$ can be any increasing functions such that $\varphi^{(i)}(C^{(i)}(\alpha/j)) = j$. So by replacing $\Lambda^{(i)}(n)$ by $\widetilde{\Lambda}^{(i)}(n) = \varphi^{(i)}(\Lambda^{(i)}(n))$, $C^{(i)}(\alpha/|I_j|)$ by $|I_j|$, and $C^{(i(j,|I_j|-\ell+1))}(\alpha/(|I_j|-\ell+1))$ by $|I_j|-\ell+1$, the sequential step-down procedure can be applied to problems in which the test statistics are not on the same scale. This sequential step-down procedure was shown to control the FWER in the strong sense in [5], and we will show that it is a special case of the general framework given in Section 2.2.
2.2 A General Framework

In this section, we define a sequential rejection procedure as a random function $\mathcal{N}$ from $2^{\mathcal{H}} \times N$ to $2^{\mathcal{H}}$, where $2^{\mathcal{H}}$ is the power set of $\mathcal{H}$ and $N$ is the set of all possible sample sizes. $\mathcal{N}(\mathcal{R}, n)$ determines the set of hypotheses to be rejected when the sample size is $n$ and the hypotheses in $\mathcal{R} \subseteq \mathcal{H}$ have already been rejected. Let $\mathcal{R}_i \subseteq \mathcal{H}$ denote the set of hypotheses rejected after the $i$th stage, $i = 1, 2, \ldots, k$. Letting $\mathcal{R}_0 = \emptyset$ and $n_0 = 0$, the iterative rejection procedure based on $\mathcal{N}$ is defined by

$$\mathcal{R}_i = \mathcal{R}_{i-1} \cup \mathcal{N}(\mathcal{R}_{i-1}, n_i) \quad \text{for } i = 1, 2, \ldots, k, \qquad \text{where } n_i = \inf\{n \in N,\ n \ge n_{i-1} : \mathcal{N}(\mathcal{R}_{i-1}, n) \ne \emptyset\}.$$

Note that if $\mathcal{N}(\mathcal{R}_{i-1}, n) = \emptyset$ for all $n \ge n_{i-1}$, we have $n_i = \infty$ by convention, and thus we define $\mathcal{N}(\mathcal{R}, \infty) = \emptyset$ for all $\mathcal{R} \subseteq \mathcal{H}$. The procedure terminates when $\mathcal{N}(\mathcal{R}_i, n_{i+1}) = \emptyset$. Since at least one hypothesis is rejected at each stage, there are at most $k$ iterative stages. If the procedure terminates at the $i$th stage, where $i < k$, then by definition $\mathcal{R}_j = \mathcal{R}_i$ for $j = i+1, i+2, \ldots, k$.

Theorem 2.1. Suppose that for every $\mathcal{R} \subseteq \mathcal{S} \subseteq \mathcal{H}$ and $n \in N$,

$$\mathcal{N}(\mathcal{R}, n) \subseteq \mathcal{N}(\mathcal{S}, n) \cup \mathcal{S} \qquad (2.2)$$

with probability 1 for every $P_\theta$, $\theta \in \Theta$, and that for every $\theta \in \Theta$, we have

$$P_\theta\{\mathcal{N}(\mathcal{F}(\theta), n) \subseteq \mathcal{F}(\theta) \text{ for all } n \in N\} \ge 1 - \alpha. \qquad (2.3)$$

Then, for every $\theta \in \Theta$,

$$P_\theta(\mathcal{R}_k \subseteq \mathcal{F}(\theta)) \ge 1 - \alpha.$$

Proof. We use induction to prove that for any $\theta \in \Theta$, the event

$$E = \{\mathcal{N}(\mathcal{F}(\theta), n) \subseteq \mathcal{F}(\theta) \text{ for all } n \in N\}$$

implies that $\mathcal{R}_k \subseteq \mathcal{F}(\theta)$. Obviously $\mathcal{R}_0 \subseteq \mathcal{F}(\theta)$. Now suppose that $\mathcal{R}_i \subseteq \mathcal{F}(\theta)$. By (2.2), applied with $\mathcal{R} = \mathcal{R}_i$ and $\mathcal{S} = \mathcal{F}(\theta)$, we have

$$\mathcal{R}_{i+1} = \mathcal{N}(\mathcal{R}_i, n_{i+1}) \cup \mathcal{R}_i \subseteq \mathcal{N}(\mathcal{F}(\theta), n_{i+1}) \cup \mathcal{F}(\theta) \cup \mathcal{R}_i.$$

Hence $E$ implies $\mathcal{R}_{i+1} \subseteq \mathcal{F}(\theta) \cup \mathcal{R}_i = \mathcal{F}(\theta)$. Therefore,

$$P_\theta(\mathcal{R}_k \subseteq \mathcal{F}(\theta)) \ge P_\theta(E) \ge 1 - \alpha.$$

The rejection principle for fixed-sample-size procedures presented in Goeman and Solari [19] can now be regarded as a special case of Theorem 2.1 by taking $N = \{m\}$, where $m$ is the fixed sample size.
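To make the framework concrete, here is a minimal sketch (not from the thesis; all names are illustrative) of the iterative rejection procedure driven by a user-supplied rejection function $\mathcal{N}$:

```python
def iterative_rejection(k, sample_sizes, N):
    """Generic sequential rejection principle.

    sample_sizes -- iterable of possible sample sizes (the set N of the text)
    N(R, n)      -- set of hypothesis indices rejected at sample size n,
                    given the already-rejected set R
    Returns the final rejected set R_k.
    """
    R, last_n = set(), 0
    for _ in range(k):                       # at most k stages
        stage = None
        for n in sample_sizes:
            if n >= last_n and N(R, n):      # n_i: first n >= n_{i-1} with N(R, n) nonempty
                stage = n
                break
        if stage is None:                    # N(R, n) empty for all n: terminate
            break
        R |= N(R, stage)
        last_n = stage
    return R
```

The sequential step-down procedure of Section 2.3.1 below corresponds to taking N(R, n) = {i not in R : Lambda_i(n) >= C_i(alpha/(k - |R|))}, and the procedure for hypotheses in order corresponds to the rejection function sketched after Section 2.3.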
2.3 Examples

2.3.1 Sequential Step-Down Procedure

For all $\mathcal{R} \subseteq \mathcal{H}$ and $n \in N$, consider the function

$$\mathcal{N}(\mathcal{R}, n) = \big\{H^{(i)} \in \mathcal{H} \setminus \mathcal{R} : \Lambda^{(i)}(n) \ge C^{(i)}(\alpha/|\mathcal{H} \setminus \mathcal{R}|)\big\},$$

where $C^{(i)}(\cdot)$ is as defined in (2.1). The iterative procedure based on $\mathcal{N}$ is exactly the sequential step-down procedure. To show that the procedure strongly controls the FWER, we show that it conforms to the conditions of Theorem 2.1. Condition (2.2) is immediate from the monotonicity of $C^{(i)}(\cdot)$ and the fact that $|\mathcal{H} \setminus \mathcal{R}| \ge |\mathcal{H} \setminus \mathcal{S}|$ whenever $\mathcal{R} \subseteq \mathcal{S}$. For condition (2.3), for any $\theta \in \Theta$,

$$\begin{aligned}
P_\theta\{\mathcal{N}(\mathcal{F}(\theta), n) \subseteq \mathcal{F}(\theta) \text{ for all } n \in N\}
&= 1 - P_\theta\{\mathcal{N}(\mathcal{F}(\theta), n) \not\subseteq \mathcal{F}(\theta) \text{ for some } n \in N\} \\
&= 1 - P_\theta\Big(\bigcup_{H^{(i)} \in \mathcal{T}(\theta)} \big\{\Lambda^{(i)}(n) \ge C^{(i)}(\alpha/|\mathcal{H} \setminus \mathcal{F}(\theta)|) \text{ for some } n \in N\big\}\Big) \\
&\ge 1 - \sum_{H^{(i)} \in \mathcal{T}(\theta)} P_\theta\Big(\sup_{n \in N}\big[\Lambda^{(i)}(n) - C^{(i)}(\alpha/|\mathcal{H} \setminus \mathcal{F}(\theta)|)\big] \ge 0\Big) \\
&\ge 1 - \sum_{H^{(i)} \in \mathcal{T}(\theta)} \sup_{\theta' \in H^{(i)}} P_{\theta'}\Big(\sup_{n \in N}\big[\Lambda^{(i)}(n) - C^{(i)}(\alpha/|\mathcal{H} \setminus \mathcal{F}(\theta)|)\big] \ge 0\Big) \\
&\ge 1 - \sum_{H^{(i)} \in \mathcal{T}(\theta)} \frac{\alpha}{|\mathcal{H} \setminus \mathcal{F}(\theta)|} \qquad \text{(by definition of } C^{(i)}(\cdot)\text{)} \\
&= 1 - \sum_{H^{(i)} \in \mathcal{T}(\theta)} \frac{\alpha}{|\mathcal{T}(\theta)|} = 1 - |\mathcal{T}(\theta)| \cdot \frac{\alpha}{|\mathcal{T}(\theta)|} = 1 - \alpha.
\end{aligned}$$

Hence, by Theorem 2.1, the sequential step-down procedure controls the FWER in the strong sense.

2.3.2 A Sequential Testing Procedure for Hypotheses in Order

The main practical usefulness of Theorem 2.1 is that it provides a way to generalize fixed-sample FWER-controlling procedures to the sequential setting. This is illustrated by the testing procedure for hypotheses in order [39], defined below. Consider a partition $\{\mathcal{H}_1, \ldots, \mathcal{H}_s\}$ of $\mathcal{H}$, which means $\mathcal{H}_i \cap \mathcal{H}_j = \emptyset$ when $i \ne j$, and $\bigcup_{j=1}^{s} \mathcal{H}_j = \mathcal{H}$. For $H^{(i)} \in \mathcal{H}$, denote by $I(H^{(i)}) = \{j : H^{(i)} \in \mathcal{H}_j\}$ the index of the partition part to which $H^{(i)}$ belongs. The testing procedure for hypotheses in order rejects $H^{(i)}$ if and only if $p^{(i)} \le \alpha$ and all hypotheses in $\bigcup_{j < I(H^{(i)})} \mathcal{H}_j$ have already been rejected. In other words, if we say $H^{(i)}$ is in a higher hierarchy than $H^{(j)}$ when $I(H^{(i)}) < I(H^{(j)})$, then a hypothesis can only be rejected when all the hypotheses in higher hierarchies have been rejected. We say that a subset $\mathcal{G} \subseteq \mathcal{H}$ is exclusive if at most one $H^{(i)} \in \mathcal{G}$ is true. Furthermore, we say a partition $\{\mathcal{H}_1, \ldots, \mathcal{H}_s\}$ of $\mathcal{H}$ is sequentially exclusive if each $\mathcal{H}_i$ is exclusive whenever the hypotheses in all higher hierarchies are false. Rosenbaum [39] showed that if the partition is sequentially exclusive, then the testing procedure for hypotheses in order controls the FWER in the strong sense.

To construct the sequential testing procedure for hypotheses in order, we consider the function

$$\mathcal{N}(\mathcal{R}, n) = \big\{H^{(i)} \in \mathcal{H} \setminus \mathcal{R} : p^{(i)}(n) \le D^{(i)}(\alpha), \text{ and } \mathcal{H}_j \subseteq \mathcal{R} \text{ for all } j < I(H^{(i)})\big\},$$

where $p^{(i)}(n)$ is the sequential p-value of $H^{(i)}$ at sample size $n$ and $D^{(i)}(\alpha)$ is a non-decreasing function such that

$$\sup_{\theta \in H^{(i)}} P_\theta\Big(\inf_{n \in N}\,[p^{(i)}(n) - D^{(i)}(\alpha)] \le 0\Big) \le \alpha$$

for all $0 < \alpha < 1$ and $i = 1, 2, \ldots, k$. The iterative procedure based on $\mathcal{N}$ is then as follows. Let $l_1 = 1$ and $n_0 = 0$. For $j = 1, \ldots, k$:

1. Sample up to
$$n_j = \inf\Big\{n \in N : n > n_{j-1} \text{ and } \min_{H^{(i)} \in \mathcal{H}_{l_j}}\,[p^{(i)}(n) - D^{(i)}(\alpha)] \le 0\Big\}.$$

2. Reject $H^{(i)} \in \mathcal{H}_{l_j}$ if all hypotheses in $\bigcup_{j' < I(H^{(i)})} \mathcal{H}_{j'}$ have been rejected and $p^{(i)}(n_j) \le D^{(i)}(\alpha)$.

3. Stop if $j = k$ or if $n_j = \sup N$. Otherwise let
$$l^* = \max_{H^{(i)} \text{ rejected}} I(H^{(i)}),$$
and set $l_{j+1} = l^* + 1$ if all hypotheses in $\mathcal{H}_{l^*}$ have been rejected, and $l_{j+1} = l^*$ otherwise. Continue with stage $j + 1$.

Corollary 1. If the partition $\{\mathcal{H}_1, \ldots, \mathcal{H}_s\}$ is sequentially exclusive, then the sequential testing procedure for hypotheses in order controls the FWER in the strong sense.

Proof. We start by checking condition (2.2): for any $n \in N$ and $\mathcal{R} \subseteq \mathcal{S} \subseteq \mathcal{H}$, if $H^{(i)} \in \mathcal{N}(\mathcal{R}, n) \setminus \mathcal{S}$, then $\mathcal{H}_j \subseteq \mathcal{R} \subseteq \mathcal{S}$ for all $j < I(H^{(i)})$ and $p^{(i)}(n) \le D^{(i)}(\alpha)$. By the definition of $\mathcal{N}$, $H^{(i)} \in \mathcal{N}(\mathcal{S}, n)$. Hence condition (2.2) is satisfied.

For condition (2.3), note that to make a type I error, a true hypothesis that is highest in the hierarchy must be rejected. Let $\hat{l} = \max\{1 \le i \le s : \mathcal{H}_i \subseteq \mathcal{F}(\theta)\}$, with the convention $\max\{\emptyset\} = 0$. Then

$$\begin{aligned}
P_\theta(\mathcal{N}(\mathcal{F}(\theta), n) \subseteq \mathcal{F}(\theta) \text{ for all } n \in N)
&= 1 - P_\theta\{\mathcal{N}(\mathcal{F}(\theta), n) \not\subseteq \mathcal{F}(\theta) \text{ for some } n \in N\} \\
&\ge 1 - P_\theta\Big(\bigcup_{H^{(i)} \in \mathcal{H}_{\hat{l}+1} \cap \mathcal{T}(\theta)} \big\{p^{(i)}(n) \le D^{(i)}(\alpha) \text{ for some } n \in N\big\}\Big) \\
&\ge 1 - \sum_{H^{(i)} \in \mathcal{H}_{\hat{l}+1} \cap \mathcal{T}(\theta)} P_\theta\big(p^{(i)}(n) \le D^{(i)}(\alpha) \text{ for some } n \in N\big) \\
&\ge 1 - \sum_{H^{(i)} \in \mathcal{H}_{\hat{l}+1} \cap \mathcal{T}(\theta)} \sup_{\theta' \in H^{(i)}} P_{\theta'}\Big(\inf_{n \in N}\,[p^{(i)}(n) - D^{(i)}(\alpha)] \le 0\Big) \\
&\ge 1 - \alpha\,|\mathcal{H}_{\hat{l}+1} \cap \mathcal{T}(\theta)| = 1 - \alpha.
\end{aligned}$$

The last equality holds because, by sequential exclusiveness, there can be only one hypothesis in $\mathcal{H}_{\hat{l}+1} \cap \mathcal{T}(\theta)$. Hence, by Theorem 2.1, the sequential testing procedure for hypotheses in order controls the FWER in the strong sense.

All fixed-sample FWER-controlling procedures mentioned in Section 2.1.1 can be generalized to the sequential setting by choosing the right function $\mathcal{N}$. The function $\mathcal{N}$ should be chosen so that the iterative procedure based on $\mathcal{N}$ stops whenever at least one hypothesis can be rejected according to the fixed-sample procedure, but at an adjusted level. Like the $C^{(i)}(\cdot)$ in the sequential step-down procedure and the $D^{(i)}(\cdot)$ in the sequential procedure for hypotheses in order, the adjusted level should be defined so that condition (2.3) is satisfied. Whenever the procedure stops, reject all the hypotheses that can be rejected at that stage, and move on to the next stage until all hypotheses are rejected or the procedure reaches the largest possible sample size and no more hypotheses can be rejected. Thus the sequential rejection principle is not only a tool for proving strong FWER control of sequential procedures, but also provides insight into the construction of new sequential procedures.
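As a quick illustration of plugging a rejection function into the generic loop sketched after Section 2.2, the following hypothetical snippet (all names are assumptions, not the thesis's code) encodes the in-order rule:

```python
def make_in_order_N(p, D, part):
    """Rejection function for hypotheses in order.

    p[i](n) -- sequential p-value of H^(i) at sample size n
    D[i]    -- critical value D^(i)(alpha) for hypothesis i
    part[i] -- hierarchy index I(H^(i)) of hypothesis i
    """
    def N(R, n):
        out = set()
        for i in range(len(p)):
            higher = {j for j in range(len(p)) if part[j] < part[i]}
            if i not in R and higher <= R and p[i](n) <= D[i]:
                out.add(i)   # rejectable: significant and all higher hierarchies rejected
        return out
    return N
```

Calling iterative_rejection(k, sample_sizes, make_in_order_N(p, D, part)) then reproduces the staged procedure above.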
2.4 Applications

2.4.1 Chromosome aberrations of patients exposed to anti-tuberculosis drugs

Masjedi et al. [31] studied possible mutagenic effects of anti-tuberculosis drugs by comparing the frequency of chromosome aberrations, including gaps, per 100 cells in $n = 36$ patients before (denoted $b$) and after (denoted $a$) the treatment, and in 36 healthy controls (denoted $c$), who matched the case group by sex and age and were preferably selected from relatives of the patient group when possible; see Table 2.1 and Figure 2.1. For the $i$th data triple $(y_{ai}, y_{bi}, y_{ci})$, consider the model

$$y_{ai} = \theta_a + \lambda_i + \gamma_i + \epsilon_{ai}, \qquad y_{bi} = \theta_b + \lambda_i + \gamma_i + \epsilon_{bi}, \qquad y_{ci} = \theta_c + \lambda_i + \epsilon_{ci},$$

for $i = 1, \ldots, n$, where $\lambda_i$, $\gamma_i$, and the errors $\epsilon_{ai}, \epsilon_{bi}, \epsilon_{ci}$ are independent random variables with continuous distributions symmetric about zero. Here $\lambda_i$ and $\gamma_i$ reflect the pairing of treated subjects and the controls, and the possible correlation between the pre- and post-treatment measurements on the same person, respectively. A difference between $\theta_a$ and $\theta_c$ may be due to either tuberculosis or the treatment. In order to show the effect of the drugs, one needs to show that the treated responses exceed baseline and control responses by more than baseline and control responses differ from each other. Hence the null and alternative hypotheses are

$$H_0 : \theta_a - \max(\theta_b, \theta_c) \le \max(\theta_b, \theta_c) - \min(\theta_b, \theta_c)$$
$$\text{vs.} \quad H_1 : \theta_a - \max(\theta_b, \theta_c) > \max(\theta_b, \theta_c) - \min(\theta_b, \theta_c).$$

Table 2.1: Total chromosome aberrations per 100 cells (including gaps).

Subject  Control  Before treatment  After treatment
1        1.00     0.50              3.00
2        1.50     4.50              5.50
3        0.50     3.50              5.00
4        0.50     2.66              3.33
5        0.66     1.50              4.50
6        1.00     5.00              7.00
7        1.00     1.33              5.33
8        0.66     1.50              2.50
9        0.00     2.00              5.33
10       1.33     1.50              3.00
11       1.50     1.33              3.33
12       2.00     2.00              2.00
13       1.33     2.00              4.66
14       0.00     2.66              10.00
15       3.00     1.33              3.33
16       0.50     3.50              5.00
17       0.66     3.00              5.00
18       1.33     2.66              3.33
19       3.00     0.00              4.00
20       0.66     1.50              7.00
21       0.50     1.00              3.00
22       0.66     4.00              4.00
23       2.00     1.33              2.66
24       1.33     0.66              3.33
25       0.00     1.50              3.50
26       1.00     0.66              2.00
27       0.50     2.00              3.33
28       1.33     1.00              3.50
29       0.50     1.33              2.66
30       1.00     2.00              2.00
31       0.66     2.00              4.00
32       1.50     1.50              1.50
33       2.66     2.00              3.50
34       1.33     0.66              3.33
35       0.66     0.00              2.66
36       1.00     1.50              1.50

[Figure 2.1: Total chromosome aberrations per 100 cells (including gaps), plotted by subject for the control, before-treatment, and after-treatment groups.]

Rosenbaum [39] suggested dividing $H_0$ into the five hypotheses

$$H_0^{(1)} : \theta_a \le (\theta_b + \theta_c)/2, \qquad H_0^{(2)} : \theta_a \le \theta_b, \qquad H_0^{(3)} : \theta_a \le \theta_c,$$
$$H_0^{(4)} : \theta_a - \theta_c \le \theta_c - \theta_b, \qquad H_0^{(5)} : \theta_a - \theta_b \le \theta_b - \theta_c.$$

To reject $H_0^{(1)}$ is to conclude that the post-treatment level exceeds the average of the baseline and control levels. To reject $H_0^{(2)}$ or $H_0^{(3)}$ is to conclude that the post-treatment level exceeds the baseline level or the control level, respectively. If $\theta_c \ge \theta_b$, then $H_0^{(4)}$ is $H_0$; otherwise, $H_0^{(5)}$ is $H_0$. Next, the five hypotheses are partitioned into three subsets: $\mathcal{H}_1 = \{H_0^{(1)}\}$, $\mathcal{H}_2 = \{H_0^{(2)}, H_0^{(3)}\}$ and $\mathcal{H}_3 = \{H_0^{(4)}, H_0^{(5)}\}$. This partition is sequentially exclusive because if $H_0^{(1)}$ is false, then at most one of $H_0^{(2)}$ and $H_0^{(3)}$ can be true; and if both $H_0^{(2)}$ and $H_0^{(3)}$ are false, then at most one of $H_0^{(4)}$ and $H_0^{(5)}$ can be true. Thus Corollary 1 applies. Rosenbaum [39] applied the fixed-sample testing procedure for hypotheses in order and rejected all 5 hypotheses at level $\alpha = 0.05$. Note that $H_0$ will be rejected only if all five hypotheses $H_0^{(i)}$ are rejected; see Figure 2.2. By Corollary 1, if the individual tests of $H_0^{(i)}$ are level-$\alpha$, then the chance of falsely rejecting $H_0$ is less than $\alpha$.

[Figure 2.2: Flow chart of the testing procedure for the chromosome aberration example: $H_0^{(1)}$ is tested at level $\alpha$ in the first hierarchy, then $H_0^{(2)}$ and $H_0^{(3)}$ in the second, then $H_0^{(4)}$ and $H_0^{(5)}$ in the third; $H_0$ is rejected only if all five hypotheses are rejected, and the procedure stops without rejecting $H_0$ at the first failure.]

The Q-Q plots for Masjedi et al.'s [31] data are given in Figure 2.3, and $y_{ai}$, $y_{bi}$, and $y_{ci}$ are clearly not normally distributed; hence we apply Wilcoxon's one-sided signed-rank test to $y_{ai} - (y_{bi} + y_{ci})/2$ for $H_0^{(1)}$, to $y_{ai} - y_{bi}$ for $H_0^{(2)}$, to $y_{ai} - y_{ci}$ for $H_0^{(3)}$, to $y_{ai} + y_{bi} - 2y_{ci}$ for $H_0^{(4)}$, and to $y_{ai} + y_{ci} - 2y_{bi}$ for $H_0^{(5)}$. The sequential testing procedure for hypotheses in order was conducted with three different sets $N$ of possible sample sizes: fully-sequential sampling, group-sequential sampling with group size 6, and a variable group size scheme. The conventional significance level $\alpha = 0.05$ is adopted, and the critical value $D^{(i)}(\alpha)$ is obtained using Monte Carlo simulation.
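The five signed-rank statistics can be computed directly from the data triples. A minimal sketch (assuming scipy; this is the fixed-sample version, not the sequential Monte Carlo calibration used in the thesis):

```python
import numpy as np
from scipy.stats import wilcoxon

def order_pvalues(y_a, y_b, y_c):
    """One-sided signed-rank p-values for H0^(1), ..., H0^(5)
    from after-treatment (y_a), before-treatment (y_b), control (y_c) arrays."""
    diffs = [y_a - (y_b + y_c) / 2,   # H0^(1)
             y_a - y_b,               # H0^(2)
             y_a - y_c,               # H0^(3)
             y_a + y_b - 2 * y_c,     # H0^(4)
             y_a + y_c - 2 * y_b]     # H0^(5)
    # 'greater': evidence that the differences are centered above zero
    return [wilcoxon(d, alternative="greater").pvalue for d in diffs]
```

In the sequential versions, the same statistics are recomputed at each $n \in N$ and their p-values compared to $D^{(i)}(\alpha)$ rather than to $\alpha$.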
Table 2.2 shows the sample size needed for each N. Both the fully sequential test and the group-sequential test with N = f15; 30; 33; 36g managed to reduce the sample size. Figure 2.4 shows the sequential p-values of the ve hypotheses, the rst four of which coincide with each other. Table 2.2: Sample size needed for the chromosome aberration example with dier- ent N. N D (i) () Sample size f1;:::; 36g 0.0031 34 f6; 12; 18; 24; 30; 36g 0:0068 36 f15; 30; 33; 36g 0.0093 30 To see how many observations the sequential procedure can save on average, the data set was permutated randomly for 50; 000 times for each choice ofN. Five tied rankings were eliminated for the simulations, making the maximum sample size 31. Table 2.3: Expected sample size of the permutated chromosome aberration data with dierent N. N D (i) () Average sample size f1;:::; 31g 0.0031 18.78 (0.21) f10; 15; 20; 25; 31g 0.0068 18.80 (0.21) f10; 20; 25; 31g 0:0082 20.01 (0.20) f10; 20; 31g 0.0098 20.27 (0.32) f15; 31g 0.0140 21.34 (0.33) 27 −2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5 −0.5 0 0.5 1 1.5 2 2.5 3 Standard Normal Quantiles Quantiles of Input Sample QQ Plot of Sample Data versus Standard Normal For Control Group −2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5 −1 0 1 2 3 4 5 Standard Normal Quantiles Quantiles of Input Sample QQ Plot of Sample Data versus Standard Normal before treatment −2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5 1 2 3 4 5 6 7 8 9 10 Standard Normal Quantiles Quantiles of Input Sample QQ Plot of Sample Data versus Standard Normal after treatment Figure 2.3: Q-Q plots for control group, before treatment, and after treatment. 28 5 10 15 20 25 30 35 40 −0.05 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 X: 9 Y: 0.001953 Sample size p−value The trace of p−values for hypothesis 1−4 Dn(0.05)=0.00305 5 10 15 20 25 30 35 40 −0.05 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 X: 34 Y: 0.002714 Sample size p−value The trace of p−values for hypothesis 5 Dn(0.05)=0.00305 Figure 2.4: p-value for individual hypothesis. Table 2.3 gives the average sample sizes for each sampling scheme. It shows signicant savings in average sample sizes for all the listedN. In all cases, at least 9 observations can be saved, which is nearly 1/3 of the maximum sample size 31. The fully sequential procedure reduces the average sample size by more than 12. However, in real world applications, fully sequential sampling may not always be 29 realistic. Table 2.3 suggests that the dierence in the average sample size among dierent N is small, which means adjusting N according to the needs of dierent applications would not cause a large drop in the savings of sample size. One approach is to choose a xed group size which is convenient for the experimenters and use group sequential sampling. An alternative approach is to use an ecient adaptive scheme. Bartro [2, 3, 4] has developed a general asymptotic theory of optimal adaptive group size selection in group sequential testing, an extension of which to multiple testing is applicable here for the latter approach. However, we do not explore that route here. 2.4.2 Identifying the maximum safe dose In toxicological studies, dose-response experiments are conducted in which several doses of the compounds are administered to separate groups of subjects to estimate the highest dose that will not cause some undesirable eect, called the maximum safe dose (MSD). Consider a set of increasing dose levels 1; 2;:::;k, and denote by y ij the observation of the j th subject in the i th dose level group. 
For this example, assume that all the observations are independent with $y_{ij} \sim N(\mu_i, \sigma^2)$. The hypotheses are

$$H_0^{(i)} : \mu_i \le 0 \quad \text{vs.} \quad H_1^{(i)} : \mu_i > 0, \qquad i = 1, \ldots, k.$$

Furthermore, assume that all dose levels higher than the MSD have an undesirable effect and all levels below the MSD are safe [47]. Therefore, $H^{(i)}$ cannot be rejected before all $H^{(j)}$, $j > i$, are rejected. Set $k = 5$, $\sigma = 1$, and the actual MSD to be dose level 2. The 5 hypotheses are divided into 5 subsets: $\mathcal{H}_i = \{H^{(6-i)}\}$, $i = 1, \ldots, 5$. Since each hierarchy contains only one hypothesis, the partition is sequentially exclusive. By Corollary 1, if the individual hypotheses are tested at level $\alpha$, then the chance that we underestimate the MSD is at most $\alpha$. Due to ethical concerns, it is very important to control the chance of underestimating the MSD: if the MSD is underestimated, patients might be taking the drugs at a dangerous dose level without realizing it. It is also crucial that sampling for a certain group is stopped as soon as a significant undesirable effect is detected.

Table 2.4: Operating characteristics of the sequential and fixed-sample procedures for testing hypotheses in order.

                     Sequential                                Fixed-sample
Level  mu_i   Ave. sample size  P(identified as MSD)    Sample size  P(identified as MSD)
0*     -      -                 0.0002                   -            0.0023
1      0      50.0              0.0155                   50           0.0447
2      0      49.7              0.8243                   50           0.8968
3      0.5    28.6              0.1600                   50           0.0562
4      1.0    8.9               0                        50           0
5      2.0    2.6               0                        50           0

*: All hypotheses are rejected, which indicates that the MSD is smaller than the smallest dose level provided.

Table 2.4 gives the results of 50,000 simulations in which both the sequential and the fixed-sample testing procedures for hypotheses in order were applied to artificial data sets with $\alpha = 0.05$. Since the data are normally distributed with $\sigma$ known, a z-test was applied to each hypothesis and the p-values were sequentially compared to critical values obtained from Monte Carlo simulations. The maximal sample size of the sequential procedure was 50, which is also the sample size of the fixed-sample procedure. On average, the sequential procedure only required 28.6, 8.9, and 2.6 observations at the 3 highest dose levels, respectively, a dramatic reduction from the fixed-sample procedure, which required 50 observations at all dose levels. This reduction is especially remarkable when noting that the patients dosed at these high dose levels would likely experience toxicity, or possibly even death. Both the fixed-sample procedure and the sequential procedure correctly identified the MSD (dose level 2) more than 80% of the time. The sequential procedure was indeed less likely to identify the correct MSD compared to the fixed-sample procedure. However, the fixed-sample procedure was more likely to underestimate the MSD (4.70%) compared to the sequential procedure (1.57%), which, as mentioned before, is undesirable.
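A minimal simulation sketch of this set-up (illustrative, not the thesis's code; the Monte Carlo-calibrated critical values are replaced here by a naive per-test z threshold for brevity, which inflates the error rate relative to the calibrated procedure):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
mu = [0.0, 0.0, 0.5, 1.0, 2.0]             # true means; the MSD is dose level 2
alpha, n_max = 0.05, 50

# Test in order: highest dose first; stop a stream once its null is rejected.
est_msd = 0
for i in reversed(range(5)):               # dose levels 5, 4, ..., 1
    y = rng.normal(mu[i], 1.0, size=n_max)
    rejected = False
    for n in range(1, n_max + 1):          # fully-sequential one-sided z-test
        z = y[:n].sum() / np.sqrt(n)
        if norm.sf(z) <= alpha:            # p-value for H0: mu_i <= 0
            rejected = True
            break
    if not rejected:                       # first non-rejected dose = estimated MSD
        est_msd = i + 1
        break
print("estimated MSD:", est_msd)           # 0 means all doses were declared unsafe
```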
Chapter 3

Sequential Testing of Multiple Hypotheses Controlling Type I and II Familywise Error Rates

3.1 Introduction

The sequential testing procedures discussed so far stop sampling when significant evidence is available to reject the hypotheses. This is desirable when sampling under false hypotheses has a very high cost, whether in safety or financial terms. However, in some situations one wants to stop sampling as soon as there is sufficient data to accept the hypotheses. One such example is the variable selection problem in high-dimensional regression [20]. In this setting, efficient sequential sampling would have the appealing feature of reducing the large number of covariates needed to be sampled for the regression model.

It is common that data are collected on a large scale, but only a small subset of the variables is actually relevant. Hypothesis testing for each individual variable is often used to detect relevant variables. However, the power of the tests diminishes quickly as the number of variables increases. One possible remedy is to test clusters of variables based on certain hierarchical clustering methods instead of testing each individual variable. For each cluster, one wishes to test the null hypothesis that none of the variables in the cluster has a significant effect, versus the alternative hypothesis that at least one of the variables in the cluster has a significant effect. The hierarchical clustering structure of the variables forms a tree structure for the corresponding hypotheses. An example is illustrated in Figure 3.1, which shows the hierarchical clustering tree of 40 simulated predictors from Example (d) of Zou and Hastie [55]. The clustering tree is produced by complete linkage with the distance given by the Spearman correlation [33], and the predictors are generated as follows:

$$x_i = Z_1 + \epsilon_i, \quad Z_1 \sim N(0, 1), \quad i = 1, \ldots, 5;$$
$$x_i = Z_2 + \epsilon_i, \quad Z_2 \sim N(0, 1), \quad i = 6, \ldots, 10;$$
$$x_i = Z_3 + \epsilon_i, \quad Z_3 \sim N(0, 1), \quad i = 11, \ldots, 15;$$
$$x_i \sim N(0, 1), \quad i = 16, \ldots, 40;$$
$$\epsilon_i \sim N(0, 0.1), \quad i = 1, \ldots, 40.$$

[Figure 3.1: The hierarchical clustering structure for Example (d) of Zou and Hastie [55].]
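The following sketch (illustrative assumptions: noise variance 0.1 as read from the display above, and one common choice of correlation-based distance) generates these predictors and builds the complete-linkage tree from Spearman correlations:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n = 200                                          # sample size (arbitrary here)
Z = rng.normal(size=(n, 3))                      # latent factors Z1, Z2, Z3
X = np.empty((n, 40))
for i in range(40):
    eps = rng.normal(0.0, np.sqrt(0.1), size=n)  # eps_i ~ N(0, 0.1), variance assumed
    if i < 5:       X[:, i] = Z[:, 0] + eps
    elif i < 10:    X[:, i] = Z[:, 1] + eps
    elif i < 15:    X[:, i] = Z[:, 2] + eps
    else:           X[:, i] = rng.normal(size=n)

rho, _ = spearmanr(X)                            # 40 x 40 Spearman correlation matrix
d = 1.0 - np.abs(rho)                            # correlation-based distance (assumed)
tree = linkage(d[np.triu_indices(40, k=1)], method="complete")
# scipy.cluster.hierarchy.dendrogram(tree) would render a tree like Figure 3.1
```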
The between-stream data may be very dissimilar in dis- tribution and dimension, but at the same time may be highly correlated, or even duplicated exactly in some cases, since they all are related to some phenomenon. The preceding scenario occurs in a number of real applications including multiple endpoint (or multi-arm) clinical trials [26, Chapter 15], multi-channel changepoint detection [51] and its applications to biosurveillance [32], genetics and genomics [15], acceptance sampling with multiple criteria [1], and nancial trading strategies [38]. If we think of each experiment as a hypothesis test about that corresponding data stream, then what is needed is a combination of a multiple hypothesis test and a sequential hypothesis test. This scenario described above was addressed by Bartro and Lai [5] who gave a procedure that sequentially tests k hypotheses while controlling the type I fam- ilywise error rate [22], i.e., the probability of rejecting any true hypotheses, at a prescribed level. Their procedure requires only the existence of basic sequen- tial tests for each data stream and makes no assumptions about the dependence between the dierent data streams; in particular, the error control holds when the streams are highly positively correlated, as is often the case in the application areas mentioned above. This chapter introduces a procedure to test k hypotheses while simultaneously controlling both the type I and II familywise error rates (dened precisely below) at prescribed levels in the same general setting: no assumptions are made about the dependence between the dierent data streams. We call this 37 new procedure the sequential Holm procedure because of its relation to Holm's [24] seminal xed-sample step-down procedure which controls familywise error rate. Following the formulation of the sequential Holm procedure in Section 3.2, we consider simple hypotheses and then discuss composite hypotheses in Section 3.3, simulation studies in Section 3.4, and nally a discussion of future extensions and a summary. For simplicity of presentation we introduce the procedure in the fully-sequential setting where the possible stopping times can be any positive integer n = 1; 2;:::, although formulations in other settings like group-sequential and truncated set- tings are possible with only minor modications. Fix the number k 2 of data streams and let [k] =f1;:::;kg. Assume that there arek streams (4.1) of sequen- tially observable data and, for each i2 [k], it is desired to test the null hypothe- sis H (i) versus the alternative hypothesis G (i) about the parameter (i) governing the ith data stream X (i) 1 ;X (i) 2 ;:::, where H (i) and G (i) are disjoint subsets of the parameter space (i) containing (i) . The individual parameters (i) may them- selves be vectors, and the global parameter = ( (1) ;:::; (k) ) is the concatena- tion of the individual parameters and is contained in the global parameter space = (1) (k) . Given 2 , letT () =fi2 [k] : (i) 2 H (i) g denote the indices of the \true" hypotheses andF() =fi2 [k] : (i) 2 G (i) g the indices of the \false" 38 null hypotheses. The type I and II familywise error rates, denoted FWE I () and FWE II (), are dened as FWE I () =P (H (i) rejected, any i2T ()) FWE II () =P (H (i) accepted, any i2F()): Here the notion of rejecting (resp. accepting)H (i) is equivalent to accepting (resp. rejecting) G (i) . 
This denition of FWE I () is the same as the standard one for xed-sample testing (such as in [22]) and FWE II () is dened analogously; the quantity 1 FWE II () has been called \familywise power" by some authors [e.g., 29]. The rst sequential procedures to simultaneously control both the type I and II familywise error rates were introduced by De and Baron [13, 14], however these procedures are constrained to continue sampling all data steams until accept/reject decisions can be reached for all data steams, making their form and performance quite dierent than the sequential Holm procedure proposed here. The building blocks of the the sequential Holm procedure dened below are k individual sequential test statisticsf (i) (n)g i2[k];n1 , where (i) (n) is the statistic for testing H (i) vs. G (i) based on the data X (i) 1 ;X (i) 2 ;:::;X (i) n available from the ith stream at time n. Concrete examples of these test statistics are given later in this section and in Section 3.3 but, for now, the reader may think of (i) (n) as a sequential log likelihood ratio statistic for testing H (i) vs. G (i) , say. Given 39 desired FWE I () and FWE II () bounds and 2 (0; 1), respectively, for each data stream i we assume the existence of critical values A (i) s = A (i) s (;) and B (i) s =B (i) s (;), s2 [k], such that P (i)( (i) (n)B (i) s some n, (i) (n 0 )>A (i) 1 all n 0 <n ) ks + 1 for all (i) 2H (i) (3.2) P (i)( (i) (n)A (i) s some n, (i) (n 0 )<B (i) 1 all n 0 <n ) ks + 1 for all (i) 2G (i) (3.3) for all i;s 2 [k]. We will show below that, in most cases, there are standard sequential statistics that satisfy these error bounds. Without loss of generality we assume that, for each i2 [k], A (i) 1 A (i) 2 :::A (i) k <B (i) k B (i) k1 :::B (i) 1 : (3.4) For example, if the A (i) s were not non-decreasing in s then they could be replaced by e A (i) s = maxfA (i) 1 ;:::;A (i) s g for which (3.3) would still hold; similarly forB (i) s and 40 (3.2). Note that, by (3.2)-(3.3), the critical values A (i) 1 ;B (i) 1 are simply the critical values for the sequential test that samples until A (i) 1 < (i) (n)<B (i) 1 (3.5) is violated, and this test has type I and II error probabilities=k and=k, respec- tively. The values A (i) s , s2 [k], are then such that the similar sequential test with critical values A (i) s , B (i) 1 has type II error probability =(ks + 1), and the analogous statement holds for the test with critical values A (i) 1 , B (i) s . The sequential multiple testing procedure introduced below will involve ranking the test statistics associated with dierent data streams, which may be on com- pletely dierent scales in general, so for each streami we introduce a standardizing function ' (i) () which will be applied to the statistic (i) (n) before ranking. The standardizing functions ' (i) can be any increasing functions such that ' (i) (A (i) s ) and ' (i) (B (i) s ) do not depend on i. For simplicity, here we take the ' (i) to be piecewise linear functions such that ' (i) (A (i) s ) =(ks + 1) and ' (i) (B (i) s ) =ks + 1 for all s2 [k]. 
(3.6) 41 That is, for i2 [k] dene ' (i) (x) = 8 > > > > > > > > > > > > > > > > > > < > > > > > > > > > > > > > > > > > > : xA (i) 1 k; forxA (i) 1 xA (i) s A (i) s+1 A (i) s (ks + 1); forA (i) s xA (i) s+1 ifA (i) s+1 >A (i) s ; 1s<k 2(xA (i) k ) B (i) k A (i) k 1; forA (i) k xB (i) k xB (i) s B (i) s1 B (i) s +ks + 1; forB (i) s xB (i) s1 ifB (i) s1 >B (i) s ; 1<sk xB (i) 1 +k; forxB (i) 1 : 3.2.2 The Sequential Holm Procedure We shall describe the sequential Holm procedure in terms of stages of sampling, between which accept/reject decisions are made. LetI j [k] (j = 1; 2;:::) denote the index set of the active hypotheses (i.e., the H (i) which have been neither accepted nor rejected yet) at the beginning of the jth stage of sampling, and n j will denote the cumulative sample size of any active test statistic up to and including the jth stage. The total number of null hypotheses that have been rejected (resp. accepted) at the beginning of the jth stage will be denoted by r j (resp. a j ). Accordingly, setI 1 = [k], n 0 = 0, a 1 = r 1 = 0, letjj denote set cardinality, and x desired FWE I () and FWE II () bounds and, respectively. The jth stage of sampling (j = 1; 2;:::) proceeds as follows: 42 1. Sample the active streamsfX (i) n g i2I j ;n>n j1 until n equals n j = inf n n>n j1 :A (i) a j +1 < (i) (n)<B (i) r j +1 is violated for some i2I j o : (3.7) 2. Standardize and order the active test statistics e (i) (n j ) =' (i) ( (i) (n j )), i2 I j , as follows: e (i(j;1)) (n j ) e (i(j;2)) (n j )::: e (i(j;jI j j)) (n j ); (3.8) wherei(j;`) denotes the index of the`th ordered active standardized statistic at the end of stage j. 3. (a) If the rst inequality in (3.7) was violated for some i 2 I j , i.e., if (i) (n j )A (i) a j +1 , then accept the m j 1 null hypotheses H (i(j;1)) ;H (i(j;2)) ;:::;H (i(j;m j )) ; where m j = min n m 1 : e (i(j;m+1)) (n j )>(ka j m) o ; (3.9) and set a j+1 =a j +m j . Otherwise set a j+1 =a j . 43 (b) If the second inequality in (3.7) was violated for some i2I j , i.e., if (i) (n j )B (i) r j +1 , then reject the m 0 j 1 null hypotheses H (i(j;jI j j)) ;H (i(j;jI j j1)) ;:::;H (i(j;jI j jm 0 j +1)) ; (3.10) where m 0 j = min n m 1 : e (i(j;jI j jm)) (n j )<kr j m o ; (3.11) and set r j+1 =r j +m 0 j . Otherwise set r j+1 =r j . 4. Stop if there are no remaining active hypotheses, i.e., if a j+1 +r j+1 = k. Otherwise, letI j+1 be the indices of the remaining active hypotheses and continue on to stage j + 1. Before giving an example of this procedure, we make some remarks about its denition. (A) There will never be a con ict between the acceptances in Step 3a and the rejections in Step 3b since if H (i) is accepted at stage j then i = i(j;m) for some mm j , hence m 1<m j so by (3.9) we have e (i) (n j ) = e (i(j;m)) (n j )(ka j (m 1))< 0<kr j (jI j jm); 44 which shows that the set in (3.11) must contain the valuejI j jm, hence m 0 j jI j jm, ormjI j jm 0 j . This, with (3.10), shows thatH (i) =H (i(j;m)) could not have also been rejected. A similar argument shows that a null hypothesis that is rejected could not also be accepted at the same stage. (B) If k = 1 then this denition becomes the sequential test (3.5) of the single null hypothesis H (1) versus alternative G (1) which has type I and II error probabilities bounded by and , respectively. (C) Ties in (3.8) can be broken arbitrarily (at random, say) without aecting the error control proved in Theorem 3.1, below. 
(D) If the same critical values are used for all data streams, that is, if A (i) s = A (i 0 ) s =A s and B (i) s =B (i 0 ) s =B s for all i;i 0 ;s2 [k], then the standardization performed in Step 2 can be dispensed with as long as the values to the right of the inequalities in (3.9) and (3.11) are replaced by A a j +m+1 and B r j +m+1 , respectively. Error control still holds under these conditions, which we prove below as part of Theorem 3.1. Before stating our main result, Theorem 3.1, that this procedure controls both type I and II familywise error rates, we give a simplistic example to show the mechanics of the procedure. Table 3.1 contains three sample paths in the setting of three pairs of null and alternative hypotheses about the probabilityp (i) of success 45 in Bernoulli data X (i) n , i = 1; 2; 3. Here the test statistics (i) (n) are taken to be log likelihood ratios (i) (n) = (2S (i) n n) log(:4=:6) where S (i) n = n X j=1 X (i) j ; (3.12) for testing H (i) :p (i) :6 vs. G (i) :p (i) :4; (3.13) i = 1; 2; 3, about the success probabilityp (i) of i.i.d. Bernoulli data. This choice of test statistic and calculation of the critical values given in the table's header will be explained in detail further below in Section 3.3.1; for now we merely focus on the procedure's decisions to stop or continue sampling. Per remark (D) we dispense with the standardizing functions and drop the superscript (i) on the critical values A s ;B s . The values of the stopped test statistics are given in bold in Table 3.1. On sample path 1, sampling proceeds until time n 1 = 7 when H (1) and H (2) are rejected because this is the rst time any of the 3 test statistics exceed B 1 or fall below A 1 . In particular, H (1) is rejected because (1) (7) = 2:03 B 1 = 1:93 and H (2) is also rejected at this time because (2) (7) = 2:03B 2 = 1:53 and one null hypothesis (i.e.,H (1) ) has already been rejected; the fact that (2) (7) also exceeds B 1 was not necessary for rejecting H (2) . Next, sampling of stream 3 is continued until time n 2 = 10 when H (3) is accepted because its test statistic falls below 46 A 1 =2:43. Similarly, on sample path 2, after rejecting H (1) at time n 1 = 7, H (2) is then rejected at time n 2 = 8 because (2) (8) exceeds B 2 = 1:53 and one null hypothesis (i.e.,H (1) ) has already been rejected. H (3) is also accepted at time n 2 = 8 for the same reason as above. On sample path 3, all three null hypotheses are rejected at time n 1 = 7 because (1) (7) = 2:03B 1 , (2) (7) = 2:03B 2 and Table 3.1: Three sample paths of the sequential Holm procedure for k = 3 hypotheses about Bernoulli data using critical values A 1 =2:34, A 2 =1:94, A 3 =1:27,B 1 = 1:93,B 2 = 1:53,B 3 =:86. The values of the stopped sequential statistics are in bold. Data Stream n = 1 2 3 4 5 6 7 8 9 10 Sample Path 1 1 X (1) n 1 0 0 0 0 0 0 (1) (n) -.41 .00 .41 .81 1.22 1.62 2.03 2 X (2) n 0 1 0 0 0 0 0 (2) (n) .41 .00 .41 .81 1.22 1.62 2.03 3 X (3) n 1 0 1 1 0 1 1 1 1 1 (3) (n) -.41 .00 -.41 -.81 -.41 -.81 -1.22 -1.62 -2.03 -2.43 Sample Path 2 1 1 0 0 0 0 0 0 -.41 .00 .41 .81 1.22 1.62 2.03 2 0 1 1 0 0 0 0 0 .41 .00 -.41 .00 .41 .81 1.22 1.62 3 1 0 1 1 1 1 1 1 -.41 .00 -.41 -.81 -1.22 -1.62 -2.03 -2.43 Sample Path 3 1 0 1 0 0 0 0 0 .41 .00 .41 .81 1.22 1.62 2.03 2 0 0 0 1 0 0 0 .41 .81 1.22 .81 1.22 1.62 2.03 3 1 0 1 0 0 0 0 -.41 .00 -.41 .00 .41 .81 1.22 47 one null hypothesis (i.e.,H (1) ) has already been rejected, and (3) (7) = 1:22B 3 and two null hypotheses (i.e., H (1) and H (2) ) have already been rejected. 
Next we state the result that the sequential Holm procedure control the fami- lywise error rates. Theorem 3.1. Fix ;2 (0; 1). If the test statistics (i) (n), i2 [k], n 1, and critical values A (i) s =A (i) s (;) and B (i) s =B (i) s (;), i;s2 [k], satisfy (3.2)-(3.3), then the sequential Holm procedure dened above in Steps 1-4 satises FWE I () and FWE II () for all 2 . If A (i) s =A (i 0 ) s =A s and B (i) s =B (i 0 ) s =B s for all i;i 0 ;s2 [k], then this conclusion still holds if we take ' (i) (x) =x for all i2 [k] and replace the right-hand-sides of the inequalities in (3.9) and (3.11) by A a j +m+1 and B r j +m+1 , respectively. Proof. We x and, for simplicity, omit it from the notation that follows. We rst prove that FWE II . Since each ' (i) () is strictly increasing and satises (3.6), (i) (n)A (i) s , e (i) (n)(ks + 1) (3.14) (i) (n)B (i) s , e (i) (n)ks + 1 (3.15) for any i;s2 [k]. IfF =; then FWE II = 0, so assume thatF6=;. Let S j = n m2 [k] :i(j;m)2I j \F and e (i(j;m)) (n j )(ka j m + 1) o 48 and letj denote the earliest stage at which a type II error occurs, taking the value 1 if no such error occurs; to prove that FWE II we thus assume without loss of generality that j <1 with probability 1. By our assumptions and by denition of j ,S j 6=; so let m = minS j . By partitioningF we have jFj =jF\I j j +jfi2F :H (i) rejected at some stage 1;:::;j 1gj: (3.16) By denition of m , the rst term on the right-hand-side of (3.16) is bounded above by jI j j (m 1) = (ka j r j ) (m 1); and the second term on the right-hand-side of (3.16) is bounded above by r j . Combining these two givesjFjka j m + 1, or a j +m kjFj + 1: (3.17) Letting V i =fH (i) accepted and i =i(j ;m )g, we thus have V i n e (i) (n j )(ka j m + 1) o = n (i) (n j )A (i) a j +m o (by (3.14)) n (i) (n j )A (i) kjFj+1 o (by (3.17)): (3.18) 49 We will also show that V i n (i) (n)<B (i) 1 for all n<n j o : (3.19) This holds because if (i) (n)B (i) 1 for some n<n j , thenH (i) would be rejected at some stage prior to j . To see this, let W i be the event on the right-hand-side of (3.19). It is clear from Step 3b that on W c i , some hypothesis would be rejected at a stage j < j since B (i) 1 B (i) r j +1 for any value of r j . Let j 0 < j denote the earliest stage such that H (i) is not rejected before stage j 0 and (i) (n j 0)B (i) 1 : (3.20) Let m be such that i =i(j 0 ;jI j 0jm). We will show that e (i(j 0 ;jI j 0j`)) (n j 0)kr j 0` for all 1`m which, by (3.11), implies that H (i) is rejected at stage j 0 and nishes the proof of (3.19). For any 1`m, e (i(j 0 ;jI j 0j`)) (n j 0) e (i(j 0 ;jI j 0jm)) (n j 0)k (by (3.15) and (3.20)) kr j 0`: 50 Combining (3.18) and (3.19) we have V i n (i) (n)A (i) kjFj+1 some n, (i) (n 0 )<B (i) 1 all n 0 <n o ; and using this we have FWE II =P [ i2F V i ! X i2F P (V i ) X i2F P ( (i) (n)A (i) kjFj+1 some n, (i) (n 0 )<B (i) 1 all n 0 <n) X i2F =jFj (by (3.3)) =: The proof that FWE I is similar so the details are omitted. The only thing that could make the situation dierent is the possibility that a hypothesis that would have been rejected in Step 3b is accepted in Step 3a. However, Remark (A) guarantees that this does not happen. To prove the second claim of the theorem, we note that the only properties of the standardizing functions ' (i) needed for the proof above are: 1. for all i2 [k], ' (i) () is strictly increasing; 2. ' (i) (A (i) s ) =' (i 0 ) (A (i 0 ) s ) and ' (i) (B (i) s ) =' (i 0 ) (B (i 0 ) s ) for all i;i 0 ;s2 [k]; 51 3. 
the right-hand-sides of the inequalities in (3.9) and (3.11) equal' (i) (A a j +m+1 ) and ' (i) (B r j +m+1 ), respectively. If A (i) s =A (i 0 ) s =A s and B (i) s =B (i 0 ) s =B s for all i;i 0 ;s2 [k], then we can instead use ' (i) (x) =x for all i2 [k] which preserves the rst two properties in this case, and replacing the right-hand-sides of (3.9) and (3.11) by A a j +m+1 and B r j +m+1 , respectively, satises the third property. 3.3 Constructing Test Statistics that Satisfy (3.2)-(3.3) for Individual Data Streams Since all that is needed in the above construction of the sequential Holm procedure are sequential test statistics and critical values satisfying (3.2)-(3.3) for each data stream, in this section we show how to construct them in a few dierent settings and give some examples. 3.3.1 Simple Hypotheses and Their Use as Surrogates for Certain Composite Hypotheses In this section we show how to construct the test statistics (i) (n) and critical values fA (i) s ;B (i) s g s2[k] satisfying (3.2)-(3.3) for any data stream i such that H (i) and G (i) are both simple hypotheses. This setting is of interest in practice because many more complicated composite hypotheses can be reduced to simple hypotheses. In 52 this case the test statistics (i) (n) will be taken to be log-likelihood ratios because of their strong optimality properties of the resulting sequential probability ratio test (SPRT); see Cherno [11]. In order to express the likelihood ratio tests in simple form, we now make the additional assumption that each data streamX (i) 1 ;X (i) 2 ;::: constitutes independent and identically distributed data. However, we stress that this independence assumption is limited to within each stream so that, for example, elements of X (i) 1 ;X (i) 2 ;::: may be correlated with (or even identical to) elements of another stream X (i 0 ) 1 ;X (i 0 ) 2 ;:::. We represent the simple null and alternative hypothesesH (i) andG (i) by the corresponding distinct density functionsh (i) (null) andg (i) (alternative) with respect to some common-nite measure (i) . Formally, the parameter space (i) corresponding to this data stream is the set of all densities f with respect to (i) , and H (i) is considered true if the true density f (i) satises f (i) = h (i) (i) -a.s., and is false if f (i) = g (i) (i) -a.s. The SPRT for testing H (i) : f (i) = h (i) vs. G (i) : f (i) = g (i) with type I and II error probabilities and , respectively, utilizes the simple log-likelihood ratio test statistic (i) (n) = n X j=1 log g (i) (X (i) j ) h (i) (X (i) j ) ! (3.21) 53 and samples sequentially until (i) (n) A(;) or (i) (n) B(;), where the critical values A(;) and B(;) satisfy P h (i)( (i) (n)B(;) some n, (i) (n 0 )>A(;) all n 0 <n) (3.22) P g (i)( (i) (n)A(;) some n, (i) (n 0 )<B(;) all n 0 <n): (3.23) There are a few dierent options for computing A(;) and B(;) in practice. They may be computed numerically via Monte Carlo or normal approximation to the log-likelihood ratio (3.21), but the most widely-used method is to use the simple, closed-form Wald-approximations A(;) = log 1 +; B(;) = log 1 ; (3.24) , where = 0 in Wald's original formulation. See Hoel et al. [23, Section 3.3.1] or Siegmund [42] for a derivation. Although, in general, the inequalities in (3.22)- (3.23) only hold approximately when A(;) and B(;) are given by (3.24), Hoel et al. 
[23] show that the actual type I and II error probabilities when using (3.24) can only exceed or by a negligibly small amount in the worst case, and the dierence approaches 0 for small and , which is relevant in the present multiple testing situation where we will utilize Bonferroni-type cutdowns of and . This approximation has been shown to be too conservative in normal distribution problems because of the over-shooting at the boundaries. Siegmund 54 [42] suggests = 0:583 based on a Brownian motion approximation. In what follows in this section we adopt (3.24) and use these to construct the critical values A (i) s , B (i) s of the sequential Holm procedure. The extensive simulations performed in Section 3.4 show that this does not lead to any exceedances of the desired familywise error rate bounds. Alternative approaches would be to compute fA (i) s ;B (i) s g s2[k] via Monte Carlo, as mentioned above, or to replace (3.24) by log and log 1 , respectively, for which (3.22)-(3.23) always hold [see 23] and proceed similarly, but we do not explore those options here. The next theorem shows that, neglecting Wald's approximation, the following simple expressions (3.25) can be used for the critical values in the sequential Holm procedure. Specically, we show that the left-hand-sides of (3.2)-(3.3) are equal to the right-hand-sides of (3.26)-(3.27), and hence the inequalities in (3.2)-(3.3) hold, up to Wald's approximation. Theorem3.2. Suppose that, for a certain data streami,H (i) :f (i) =h (i) andG (i) : f (i) = g (i) are simple hypotheses. Let (i) Wald (;) and (i) Wald (;) be the values of the probabilities on the left-hand-sides of (3.22) and (3.23), respectively, with (i) (n) given by (3.21) andA(;) andB(;) given by the Wald approximations (3.24). Now x ;2 (0; 1) and for s2 [k] let s = s (;) = (ks + 1) (ks + 1)(k) ; s = s (;) = (ks + 1) (ks + 1)(k) : 55 Also, let (i) Holm (s) and (i) Holm (s) denote the left-hand-sides of (3.2) and (3.3), respectively, with A (i) s , B (i) s given by A (i) s =A (i) s (;) = log (1 s )(ks + 1) + B (i) s =B (i) s (;) = log (1 s )(ks + 1) : (3.25) Then, for all s2 [k], (i) Holm (s) = (i) Wald (=(ks + 1); s ) and (3.26) (i) Holm (s) = (i) Wald ( s ;=(ks + 1)) (3.27) and therefore (3.2)-(3.3) hold, up to Wald's approximation, when using the critical values (3.25). Proof. First note that s ; s 2 (0; 1) for all s2 [k] since 0< s = ks + 1 ks + 1 k < 1 k < 1 k 1 1 56 ask 2, and similarly for s . A (i) s andB (i) s in (3.25) can be written asA( s ;=(k s + 1) andB(=(ks + 1); s ), respectively, and it is simple algebra to then check that A(=(ks + 1); s ) =A (i) 1 for any s2 [k]. Then, to verify (3.26), (i) Holm (s) =P h (i)( (i) (n)B (i) s some n, (i) (n 0 )>A (i) 1 all n 0 <n) =P h (i)( (i) (n)B(=(ks + 1); s ) some n, (i) (n 0 )>A(=(ks + 1); s ) all n 0 <n) = (i) Wald (=(ks + 1); s ) by (3.22). The proof of (3.27) is similar. The theorem gives simple, closed form critical values (3.25) that can be used in lieu of Monte Carlo or other methods of calculating the 2k critical val- uesfA (i) s ;B (i) s g s2[k] for a streami whose hypothesesH (i) ;G (i) are simple. Example values of (3.25) for =:05 and =:2 are given in Table 3.2 for k = 2;:::; 10. Example: Exponential families Suppose that a certain data stream i is comprised of i.i.d. 
d-dimensional random vectors X (i) 1 ;X (i) 2 ;::: from a multiparameter exponential family of densities X (i) n f (i)(x) = exp[ (i)T x (i) ( (i) )]; n = 1; 2;:::; (3.28) 57 Table 3.2: Critical values (3.25) of the sequential Holm procedure for simple hypotheses, for =:05, =:2, = 0 and k = 2;:::; 10 to two decimal places. k A 1 ;:::;A k B 1 ;:::;B k 2 -2.28 -1.59 3.58 2.89 3 -2.69 -2.29 -1.60 4.03 3.62 2.93 4 -2.98 -2.70 -2.29 -1.60 4.33 4.04 3.64 2.95 5 -3.21 -2.99 -2.70 -2.29 -1.60 4.56 4.34 4.05 3.65 2.96 6 -3.39 -3.21 -2.99 -2.70 -2.29 -1.60 4.75 4.57 4.35 4.06 3.66 2.96 7 -3.55 -3.39 -3.21 -2.99 -2.70 -2.30 -1.60 4.91 4.76 4.58 4.35 4.07 3.66 2.97 8 -3.68 -3.55 -3.39 -3.21 -2.99 -2.70 -2.30 -1.60 5.05 4.92 4.76 4.58 4.36 4.07 3.66 2.97 9 -3.80 -3.68 -3.55 -3.40 -3.21 -2.99 -2.70 -2.30 -1.60 5.17 5.05 4.92 4.77 4.58 4.36 4.07 3.67 2.97 10 -3.91 -3.80 -3.68 -3.55 -3.40 -3.21 -2.99 -2.70 -2.30 -1.61 5.28 5.17 5.05 4.92 4.77 4.59 4.36 4.07 3.67 2.98 where (i) andx ared-vectors, () T denotes transpose, :R d !R is the cumulant generating function, and it is desired to test H (i) : (i) = vs. G (i) : (i) = for given; 2R d . LettingS (i) n = P n j=1 X (i) j , the log-likelihood ratio (3.21) in this case is (i) (n) = ( ) T S (i) n n[ (i) ( ) (i) ()] (3.29) 58 and, by Theorem 3.2, the critical values (3.25) can be used, which satisfy (3.2)-(3.3) up to Wald's approximation. As mentioned above, many more complicated testing situations reduce to this setting. For example, the Bernoulli example (3.12) for testing (3.13) can be reduced to testing p (i) = :6 vs. p (i) = :4 by considering the worst-case error probabilities under the hypotheses (3.13), hence (3.12) is given by (3.29) with (i) = log[p (i) =(1 p (i) )], (i) ( (i) ) = log(1p (i) ), = log(:6=:4) = , and the critical values in Table 3.1 are given by (3.25) with = = :25, this value is chosen merely to produce short sample paths for the sake of the example. 3.3.2 Other Composite Hypotheses While many composite hypotheses can be reduced to the simple-vs.-simple situa- tion in Section 3.3.1, the generality of Theorem 3.1 does not require this and allows any type of hypotheses to be tested as long as the corresponding sequential statis- tics satisfy (3.2)-(3.3). In this section we discuss the more general case of how to proceed to apply Theorem 3.1 when a certain data streami is described by a mul- tiparameter exponential family (3.28) but simple hypotheses are not appropriate. Let I( (i) ; (i) ) = ( (i) (i) ) T r (i) ( (i) ) [ (i) ( (i) ) (i) ( (i) )] 59 denote the Kullback-Leibler information number for the distribution (3.28), and it is desired to test H (i) :u( (i) )u 0 vs. G (i) :u( (i) )u 1 (3.30) where u() is a continuously dierentiable real-valued function such that for all xed (i) , I( (i) ; (i) ) is 0 B B @ decreasing increasing 1 C C A in u( (i) ) 0 B B @ < > 1 C C A u( (i) ); andu 0 <u 1 are chosen real numbers. The family of models (3.28) and general form of the hypotheses (3.30) contain a large number of situations frequently encoun- tered in practice, including various two-population tests that occur frequently in randomized Phase II and Phase III clinical trials; see Bartro et al. [7, Chapter 4]. Of course there are many composite hypotheses encountered in practice which do not t into the form (3.30), such as e H (i) : (i) = (i) 0 vs. e G (i) :6= (i) 0 (3.31) for some xed (i) 0 . 
However, by considering true values of (i) arbitrarily close to (i) 0 , it is clear that no test of (3.31) can control the type II error probability for all (i) 2 e G (i) in general, and since the focus here is on tests that control both the type I and II familywise error rates, one would need to restrict e G (i) in some way 60 for that to be possible, for example by modifying e G (i) to be only the (i) such that jj (i) (i) 0 jj 2 for some > 0. But this restricted form ts into the framework (3.30) by choosing u( (i) ) =jj (i) (i) 0 jj, u 0 = 0, and u 1 = . So although it is not natural to test (3.31) in the current framework of simultaneous type I and II familywise error control, it is natural to test (3.31) when only type I familywise error control is strictly required, and that problem has already been addressed in the sequential multiple testing setting by Bartro and Lai [5]. The hypotheses (3.30) can be tested with sequential generalized likelihood ratio (GLR) statistics, as follows. Letting b (i) n = (r (i) ) 1 1 n n X j=1 X (i) j ! denote the maximum likelihood estimate (MLE) of based on the data from the rst n observations, dene H (n) =n inf :u()=u 0 I( b (i) n ;) ; (3.32) G (n) =n inf :u()=u 1 I( b (i) n ;) ; (3.33) (i) (n) = 8 > > > < > > > : p 2n H (n); if u( b (i) n )>u 0 and H (n) G (n) p 2n G (n); otherwise. (3.34) 61 The statistics (3.32) and (3.33) are the log-GLR statistics for testing against H (i) and against G (i) , respectively, whose signed roots in (3.34) have standard normal large-n limiting distribution under u( (i) ) = u 0 and u 1 , respectively; see Jenni- son and Turnbull [28, Theorem 2], whose results further show that under group sequential sampling, the signed-root statistics have asymptotically independent increments, a fact which can be used with random walk theory to nd the critical valuesfA (i) s ;B (i) s g s2[k] for (i) (n) (see [7, Chapter 4]). However, our simulation studies have shown that under the fully-sequential sampling considered here, the small-n behavior of these statistics can deviate substantially from the standard normal random walk and therefore we advocate Monte Carlo determination of the critical valuesfA (i) s ;B (i) s g s2[k] for (i) (n), which then allows their inclusion in the sequential Holm procedure. 3.4 Simulation Studies In this section, we compare the sequential Holm procedure (denoted SH) with the xed-sample Holm [1979] procedure (denoted FH), the sequential Bonferroni procedure (denoted SB), and the sequential intersection scheme (denoted IS) pro- posed by De and Baron [14]. The SB procedure uses a SPRT on each data stream with error probability bounds =k and =k via the Wald approximations (3.24). That is, for each i2 [k], SB samples the ith stream until (3.5) is violated, with 62 A (i) 1 = log[(=k)=(1=k)] and B (i) 1 = log[(1=k)=(=k)]. The three sequen- tial procedures SH, SB, and IS are the only ones we know of that control both FWE I and FWE II . In our studies we have chosen the commonly used values of = :05 and = :2, i.e., familywise power at least 80%. This same value of is used for the xed-sample Holm procedure as well, which does not guarantee FWE II control at a prescribed level, so we have chosen its sample size to make its familywise power approximately the same as that of the SH procedure in order to have a meaningful comparison with the sequential procedures. 
Below we present two sets of simulations, the rst in Table 3.3 with independent streams of Bernoulli data, and the second in Table 3.4 with dependent streams of normal data gener- ated from a multivariate normal distribution with non-identity covariance matrix. For each scenario considered below, FWE I , FWE II , expected total sample size EN = E( P k i=1 N (i) ) of all the data streams where N (i) is the total sample size of the ith stream, and relative savings in sample size of SH are estimated as the result of 100,000 Monte Carlo simulated batteries ofk sequential tests. In each set of simulations, the data streams and hypotheses tested are similar for each data stream; we emphasize that this is only for the sake of getting a clear picture of the procedures' performance, and this uniformity is not required in order to be able to use the procedures considered. 63 3.4.1 Independent Bernoulli Data Streams Table 3.3 contains the operating characteristics of the above procedures for testing k hypotheses of the form H (i) :p (i) :6 vs. G (i) :p (i) :4; i = 1;:::;k; about the probabilityp (i) of success in theith stream of i.i.d. Bernoulli data; addi- tionally, the streams were generated independently of each other. The individual test statistics (3.12) were used and the SH procedure used the critical values in Table 3.2, as described in Section 3.3.1. The data was generated for each data stream with p (i) = :6 or :4 and the second column of Table 3.3 gives the number of hypotheses for which p (i) = :6. The columns labeled Savings give the percent decrease in expected sample size EN of the SH relative to each other procedure. The SH procedure has substantially smaller sample size compared to the other three, saving more than 50% compared to FH and IS in each scenario with k 5. Like its xed-sample analog, the SB procedure is conservative in that its attained error rates FWE I and FWE II are much smaller than the prescribed levels, and the IS also suers from this, perhaps as a result of its stringent stopping rule, which also leads to large average sample size. In fact, the IS has FWE I and FWE II even smaller than the SB procedure. The FH procedure handles its error rate more judiciously than the SB and IS procedures due to its step-down structure, and the 64 SH procedure has very similar error rates to FH but with much smaller expected sample sizes. 3.4.2 Correlated Normal Data Streams Table 3.4 contains the operating characteristics of the four procedures described above for testing k hypotheses of the form H (i) : (i) 0 vs. G (i) : (i) ; i = 1;:::;k; for known > 0, taken here to be 1, about the mean (i) of i.i.d. normal obser- vations with known variance 1, which makes up the ith data stream. We use = 0:583 for this example to get a better approximation. To investigate the performance of the procedures under dependent data streams, the streams were generated from a k-dimensional multivariate normal distribution with mean = ( (1) ;:::; (k) ), given in the third column of Table 3.4, and four dierent non-identity covariance matrices M 1 ;M 2 ;M 3 , and M 4 , given 65 Table 3.3: Operating characteristics of sequential and xed-sample multiple testing procedures controlling familywise error rates fork streams of independent Bernoulli data. 
k # of true H (i) Procedure FWE I (SE) FWE II (SE) EN(SE) Savings 2 2 SH 0.0444 (0.0063) - 47.4 (0.9) FH - - - - SB 0.0466 (0.0067) - 56.4 (1.0) 15.96% IS 0.0239 (0.0048) - 64.5 (1.3) 26.51% 1 SH 0.0288 (0.0053) 0.1358 (0.0093) 63.3 (1.2) FH 0.0292 (0.0054) 0.1357 (0.0106) 126 49.76% SB 0.0236 (0.0046) 0.0873 (0.0088) 66.8 (1.2) 5.24% IS 0.0219 (0.0048) 0.1186 (0.0092) 97.1 (1.5) 34.81% 0 SH - 0.1660 (0.0118) 72.7 (1.2) FH - 0.1669 (0.0120) 126 42.3% SB - 0.1633 (0.0119) 77.2 (1.1) 5.83% IS - 0.0904 (0.0087) 103.8 (1.7) 29.96% 5 3 SH 0.0339 (0.0059) 0.1070 (0.0093) 216.5 (2.2) FH 0.0383 (0.0060) 0.1094 (0.0088) 485 55.36% SB 0.0219 (0.0041) 0.0760 (0.0077) 230.1 (2.1) 5.91% IS 0.0130 (0.0036) 0.0449 (0.0061) 439.7 (5.5) 50.76% 2 SH 0.0265 (0.0054) 0.1286 (0.0105) 230.4 (2.3) FH 0.0329 (0.0050) 0.1270 (0.0098) 490 52.98% SB 0.0145 (0.0038) 0.1110 (0.0100) 247.0 (2.0) 56.72% IS 0.0110 (0.0033) 0.0433 (0.0069) 475.0 (6.0) 51.49% 10 8 SH 0.0342 (0.0056) 0.0704 (0.0078) 479.7 (3.7) FH 0.0302 (0.0046) 0.0774 (0.0088) 1200 60.03% SB 0.0264 (0.0048) 0.0344 (0.0059) 532.4 (3.3) 9.90% IS 0.0113 (0.0036) 0.0240 (0.0044) 1143.0 (12.4) 58.03% 5 SH 0.0265 (0.0056) 0.1122 (0.0102) 548.9 (3.3) FH 0.0455 (0.0067) 0.1115 (0.0099) 1240 55.73% SB 0.0158 (0.0040) 0.0829 (0.0086) 586.4 (3.2) 6.39% IS 0.0077 (0.0031) 0.0226 (0.0042) 1294.9 (12.7) 57.61% 2 SH 0.0152 (0.0041) 0.1274 (0.0108) 580.1 (3.8) FH 0.0331 (0.0060) 0.1342 (0.0098) 1180 50.84% SB 0.0066 (0.0025) 0.1298 (0.0104) 642.7 (3.4) 9.74% IS 0.0055 (0.0021) 0.0217 (0.0044) 1297.6 (14.5) 55.29% 20 16 SH 0.0356 (0.0059) 0.0664 (0.0088) 1129.9 (4.8) FH 0.0470 (0.0062) 0.0724 (0.0075) 2860 60.49% SB 0.0369 (0.0059) 0.0297 (0.0052) 1251.2 (5.2) 9.69% IS 0.0061 (0.0025) 0.0118 (0.0036) 3103.1 (28.6) 63.59% 10 SH 0.0274 (0.0085) 0.1075 (0.0079) 1272.9 (5.2) FH 0.0474 (0.0062) 0.1144 (0.0100) 3040 58.13% SB 0.0234 (0.0051) 0.0736 (0.0079) 1336.7 (5.4) 4.77% IS 0.0039 (0.0019) 0.0124 (0.0032) 3413.3 (29.2) 62.71% 4 SH 0.0165 (0.0036) 0.1386 (0.0105) 1332.1 (6.2) FH 0.0360 (0.0059) 0.1370 (0.0116) 2740 51.38% SB 0.0093 (0.0030) 0.1153 (0.0099) 1422.0 (5.0) 6.32% IS 0.0033 (0.0019) 0.0150 (0.0035) 3341 (24.9) 60.14% 66 below, which were chosen to give a variety of dierent scenarios of positively and negatively correlated data streams. M 1 = 0 B B @ 1 0:8 0:8 1 1 C C A M 2 = 0 B B @ 1 0:8 0:8 1 1 C C A M 3 = 0 B B B B B B B B B B @ 1 0:8 0:6 0:8 0:8 1 0:6 0:8 0:6 0:6 1 0:8 0:8 0:8 0:8 1 1 C C C C C C C C C C A M 4 = 0 B B B B B B B B B B B B B B B B B B @ 1 0:8 0:6 0:4 0:6 0:8 0:8 1 0:8 0:4 0:6 0:8 0:6 0:8 1 0:4 0:6 0:8 0:4 0:4 0:4 1 0:8 0:6 0:6 0:6 0:6 0:8 1 0:8 0:8 0:8 0:8 0:6 0:8 1 1 C C C C C C C C C C C C C C C C C C A The interaction of these various combinations of correlations with true or false null hypotheses all show somewhat similar behavior to the case of independent data streams in the previous section, in that the SH procedure has substantially smaller expected sample size than the FH procedure and the IS procedure in all cases, more 67 than a 30% reduction in most cases, and that the SH procedure has FWE I and FWE II much closer to the prescribed values and values than the other two sequential procedures SB and IS, and similar to the FH procedure in most cases. 
Because the SH procedure causes more early stopping, it is interesting to note that its error control is less conservative than the other sequential procedures even in cases when data streams with true null hypotheses are positively correlated with streams having false null hypotheses, such as the second case of the M 1 -generated data and the third case of the M 3 -generated data. 3.5 Discussion The sequential Holm procedure proposed herein is a general method for combining individual sequential tests into a sequential multiple hypothesis testing procedure which controls both the type I and II familywise error rates at prescribed levels without requiring the statistician to have any knowledge or model of the data streams' correlation structure, a desirable property that it inherits from Holm's xed-sample procedure. In our simulations in Section 3.4, the sequential Holm procedure exhibits much more eciency in terms of smaller average total sample size than existing sequential procedures, as well as Holm's xed-sample test. In terms of achieved familywise error rates, our simulations suggest that the sequen- tial Holm procedure occupies a \middle ground" between existing sequential pro- cedures, which have very conservative error rates and large average sample sizes, 68 Table 3.4: Operating characteristics of sequential and xed-sample multiple testing procedures controlling familywise error rates for k streams of correlated Normal data. Covariance True Value Procedure FWE I (SE) FWE II (SE) EN(SE) Savings M 1 (1; 0) SH 0.0482 (0.0069) 0.1891 (0.0120) 9.8 (0.1) FH 0.0508 (0.0071) 0.1919 (0.0109) 16 38.75% SB 0.0252 (0.0049) 0.1014 (0.0089) 10.8 (0.2) 9.26% IS 0.0267 (0.0056) 0.1070 (0.0096) 19.4 (0.3) 49.48% M 2 (1; 0) SH 0.0255 (0.0050) 0.1158 (0.0105) 10.5 (0.2) FH 0.0336 (0.0050) 0.1116 (0.0097) 20 47.50% SB 0.0252 (0.0046) 0.0993 (0.0090) 10.8 (0.2) 2.78% IS 0.0096 (0.0030) 0.0160 (0.0039) 17.9 (0.3) 41.34% M 3 (1; 0; 1; 0) SH 0.0333(0.0060) 0.1397(0.0109) 26.7(0.2) FH 0.0411(0.0065) 0.1368(0.0112) 52 48.65% SB 0.0249(0.0043) 0.0997(0.0094) 28.5(0.2) 6.32% IS 0.0105(0.0033) 0.0211(0.0049) 58.9(0.8) 54.67% (1; 1; 0; 0) SH 0.0217 (0.0046) 27.0 (0.4) FH 0.0306 (0.0059) 0.0784 (0.0090) 56 51.79% SB 0.0212 (0.0047) 0.0784 (0.0079) 28.6 (0.5) 5.59% IS 0.0017 (0.0013) 0.0026 (0.0016) 47.3 (1.0) 42.92% M 4 (1; 0; 0; 0; 0; 0) SH 0.0345 (0.0059) 0.0700 (0.0083) 38.5(0.4) FH 0.0369 (0.0059) 0.0696 (0.0076) 90 57.33% SB 0.0336 (0.0054) 0.0330 (0.0060) 43.9 (0.4) 12.30% IS 0.0027 (0.0015) 0.0072 (0.0028) 88.0 (1.2) 56.25% (1; 0; 0; 1; 0; 0) SH 0.0338 (0.0052) 0.1139 (0.0100) 42.4 (0.3) FH 0.0381 (0.0058) 0.1168 (0.0116) 90 52.89% SB 0.0278 (0.0052) 0.0654 (0.0079) 46.2 (0.3) 8.23% IS 0.0028 (0.0016) 0.0158 (0.0037) 100.2 (1.2) 57.68% (1; 1; 0; 0; 0; 0) SH 0.0310 (0.0058) 0.0766 (0.0085) 42.5 (0.5) FH 0.0354 (0.0062) 0.0890 (0.0102) 90 52.78% SB 0.0267 (0.0050) 0.0512 (0.0067) 46.2 (0.5) 8.01% IS 0.0075 (0.0026) 0.0016 (0.0013) 91.9 (1.3) 53.75% (1; 1; 1; 0; 0; 0) SH 0.0215 (0.0044) 0.0812 (0.0097) 45.7 (0.6) FH 0.0295 (0.0052) 0.0769 (0.0084) 96 52.40% SB 0.0193 (0.0047) 0.0700 (0.0093) 48.4 (0.7) 5.58% IS 0.0005 (0.0007) 0.0009 (0.0009) 84.2 (1.5) 45.72% (1; 1; 0; 1; 1; 0) SH 0.0262 (0.0055) 0.1250 (0.0094) 47.3 (0.4) FH 0.0401 (0.0070) 0.1234 (0.0090) 90 47.44% SB 0.0170 (0.0045) 0.1052 (0.0101) 50.8(0.4) 6.89% IS 0.0054 (0.0025) 0.0065 (0.0024) 101.5(1.1) 53.40% (1; 1; 1; 1; 0; 0) SH 0.0184 (0.0038) 0.1239 (0.0095) 47.6(0.5) FH 0.0285 (0.0060) 0.1211 (0.0092) 90 47.11% SB 0.0142 (0.0046) 0.1029 
(0.0105) 50.8 (0.6) 6.30% IS 0.0009 (0.0011) 0.0202 (0.0045) 95.9 (1.1) 50.36% (1; 1; 1; 1; 1; 0) SH 0.0125 (0.0035) 0.1300 (0.0104) 48.1 (0.5) FH 0.0293 (0.0051) 0.1328 (0.0101) 84 42.74% SB 0.0082 (0.0029 0.1210 (0.0102) 53.1 (0.5) 9.42% IS 0.0019 (0.0014) 0.0113 (0.0037) 95.9 (1.4) 49.32% 69 and the xed-sample Holm test which achieves error rates closest to the prescribed values of all the procedures considered, but has still larger sample size and lacks the exibility and adaptive nature of the sequential procedures. We summarize our recommendations for using the sequential Holm procedure in practice as follows. For data streams whose hypotheses are simple, or are composite but can be reduced to considering simple hypotheses, we recommend using the sequential log-likelihood ratio statistic (3.21) with the closed-form critical values (3.25). For data streams with composite hypotheses of the form (3.30), we recom- mend using the sequential generalized likelihood ratio statistic (3.34) and determining the critical valuesfA i s ;B i s g s2[k] to satisfy (3.2)-(3.3) by Monte Carlo. For group-sequential sampling with moderate group size the critical values can be determined by normal approximation. Data streams with still other forms of hypotheses or test statistics (e.g., nonparametric) can be included in the sequential Holm procedure by deter- mining critical valuesfA i s ;B i s g s2[k] satisfying (3.2)-(3.3) by Monte Carlo or other methods. 70 Chapter 4 Sequential Testing of Multiple Hypotheses Controlling False Discovery Rate Recall that given 2 FDR is dened as FDR = FDR() =E (Q); where Q =V=(R_ 1); V is the number of true hypotheses rejected, andR the total number of hypotheses rejected; by convention V=R = 0 when R = 0. FDR is an important multiple testing error metric because it is less stringent than FWER, and requiring only FDR control allows the use of procedures with more power than under FWER control, in general. We rst introduce a xed-sample step-up procedure that controls FDR under independence in Section 4.1. Then in Section 4.2 we generalize it to the sequen- tial setting and prove that the sequential procedure controls FDR when the data 71 streams for the true hypotheses are independent and that a conservative modi- cation can keep the FDR under control even when data streams are dependent. Finally, we discuss how to nd the critical values required in the procedure for individual hypothesis in Section 4.3. Note that even though our results for FWER & FDR parallel each other, the methods are quite dierent. FDR-controlling procedures require all order statistics to be computed at all sampling points, whereas FWER-control can be achieved by only knowing the maximum of the active (i.e., still being sampled) test statistics, after the appropriate standardization, at each sampling point. 4.1 Benjamini and Hochberg's Fixed-Sample Step-Up Procedure Assume that we want to test k hypotheses H (1) ;H (2) ;:::;H (k) at the same time and that for each H (i) there is a valid p-value p (i) such that, for any 2 H (i) , P (p (i) ) for all 0 < < 1. Let p (i 1 ) p (i 2 ) p (i k ) be the ordered p-values. After introducing the test of the global hypothesis H 0 = T k i=1 H (i) , Simes [44] suggested an exploratory procedure for testing the individual hypothe- ses that rejects H (i 1 ) ;:::;H (im) , where m = maxfj : p (i j ) j=kg and accepts H (i m+1 ) ;:::;H (i k ) . 
In other words, the procedure rejects all the hypotheses if p (i k ) , otherwise, it moves on to H (i k1 ) and rejects H (i 1 ) ;:::;H (i k1 ) if 72 p (i k1 ) (k 1)=k, etc. This procedure is considered a step-up procedure since it starts with the least signicant hypothesis. However, Hommel [25] showed that Sime's procedure doesn't strongly control the FWER even if all the hypotheses are independent. To see why, suppose that among the k independent hypotheses, k 0 are true, and for the otherkk 0 false hypothesesH (i) ,P (p (i) =k) is close to 1. Then the probability of rejecting at least one of the true hypotheses is then close to 1 [1 (kk 0 + 1)=k] k 0 , which tends to 1 when k 0 =k=2 and k!1. Nev- ertheless, the procedure was shown by Benjamini and Hochberg [8] to control the FDR underk 0 =k when the true hypotheses are independent. Positive dependence was later addressed by Benjamini and Yekutieli [9], who showed that FDR-control holds under certain positive dependence. 4.2 A Sequential Procedure Controlling FDR Our approach to deriving rejective sequential procedures, i.e., procedures that only allow early rejections, that control FDR is inspired by both our work in Chapter 3 and [6] completed on FWER control, as well as recent work by Benjamini and Yekutieli [9] and Sarkar [40] which have generalized the situations under which the Benjamini-Hochberg [8] procedure (denoted BH) described in Section 4.1 con- trols FDR, and led to a better understanding of FDR in general by providing an alternative form for FDR. In this section, we introduce our rejective sequential 73 FDR-controlling procedure and then state our result that under independence of true hypotheses, the procedure controls FDR. 4.2.1 Notation and Set-Up Recall that one wants to investigate some phenomenon, resulting ink data streams: Data stream 1 X (1) 1 ;X (1) 2 ;::: from Experiment 1 Data stream 2 X (2) 1 ;X (2) 2 ;::: from Experiment 2 (4.1) . . . Data stream k X (k) 1 ;X (k) 2 ;::: from Experiment k. and we havek individual sequential test statisticsf (i) (n)g i2[k];n1 , where (i) (n) is the statistic for testingH (i) vs.G (i) based on the dataX (i) 1 ;X (i) 2 ;:::;X (i) n available from the ith stream at time n. For simplicity we adopt truncated sequential testing so that n takes the values 1; 2;:::;N <1, and we consider rejective sequential tests in which sampling is stopped beforen =N only to reject a hypothesis (e.g., a \discovery", as it is often called), and a hypothesis is accepted if its corresponding data stream is sampled until n = N. We are considering the truncated setting because the rejective approach would lead to unpractically large samples for true hypotheses and make 74 the procedure less ecient. None of these assumptions are crucial and what follows can be generalized. We use the same notation as in Section 3.2. For any positive integer j let [j] =f1;:::;jg. Assume that for eachi2 [k] and any2 (0; 1), the corresponding sequential statistic (i) (n) has critical values A (i) 1 A (i) 2 :::A (i) k (4.2) such that P i ( (i) (n)A (i) j some n<N) k + 1j k for all j2 [k]; i 2H i : (4.3) This is simply a bound on the type I error probability of the sequential test that stops and rejects H i at sample size n < N if (i) (n) A (i) j , and can easily be constructed for a variety of data types and hypotheses H (i) , which we will discuss in Section 4.3. For the same reason stated in Section 3.2, for each stream i we introduce a standardizing function ' i () which will be applied to the statistic i (n) before ranking. 
The standardizing functions' i can be any increasing functions such that 75 ' i (A i s ) do not depend on i. For simplicity, here we take the ' i to be piecewise linear functions such that ' i (A (i) s ) =s for alls2 [k]: (4.4) That is, for i2 [k] dene ' i (x) = 8 > > > > < > > > > : xA (i) s A (i) s+1 A (i) s +s; for A (i) s xA (i) s+1 ; 1s<k xA (i) k +k; for xA (i) k : 4.2.2 The Rejective Sequential Benjamini-Hochberg Pro- cedure We dene our sequential multiple testing procedure, which we call the Rejective Sequential Benjamini-Hochberg procedure, in terms of stages of sampling, between which hypotheses are rejected. LetI j denote the indices of the active hypotheses (i.e., the hypotheses that have not been rejected yet) at the beginning of the jth stage of sampling, and let n j denote the cumulative sample size of the active data streams at the end of the jth stage of sampling. Accordingly, setI 1 = [k] and n 0 = 0. The Rejective Sequential Benjamini-Hochberg Procedure. The jth stage of sampling (j = 1; 2;:::) proceeds as follows. 76 1. Sample and standardize the test statistics e (i) (n j ) = ' i ( (i) (n j )) for the active data streamsfX (i) n g i2I j ;n>n j1 until n equals n j =N^ n n>n j1 : e (i(n;`)) (n)`; some `2 [jI j j] o ; (4.5) where i(n;`) denotes the index of the `th ordered active test statistic at sample size n. 2. (a) If n j <N, reject the active hypotheses H (i(n j ;` j )) ;H (i(n j ;` j +1)) ;:::;H (i(n j ;jI j j)) ; where ` j = minf`2 [jI j j] : i(n j ;`) (n)`g: (4.6) SetI j+1 to be the indices of the remaining hypotheses and proceed to stage j + 1. (b) Otherwise, n =N so accept all active hypotheses H i , i2I j , and stop. In words, this procedure samples all active test statistics until at least one of them will be rejected, indicated by the stopping rule (4.5) which is related to the BH rejection rule. Once this happens, a step-up type rejection rule is used in (4.6) to reject certain hypotheses before the next stage of sampling begins. Note that if the same critical values are used for all data streams, that is, if A (i) s =A (i 0 ) s =A s 77 for all i;i 0 ;s2 [k], then the standardization performed in Step 1 can be dispensed with as long as the values to the right of the inequalities in (4.5) and (4.6) are replaced by A ` . Our main result is that the Rejective Sequential Benjamini-Hochberg procedure controls the FDR with a simple conservative modication, which is not necessary if the true hypotheses are independent. This generalizes the result of Benjamini and Yekutieli [9]. Theorem 4.1. Fix 2 (0; 1). If the test statistics (i) (n), i2 [k], n 1, and critical values A (i) s =A (i) s (), i;s2 [k], satisfy (4.3), then the Rejective Sequential Benjamini-Hochberg procedure dened above satises FDR() k X j=1 1 j ! k 0 k for all 2 ; where k 0 is the number of true hypotheses. Furthermore, if the data streams cor- responding to the k 0 true hypotheses are independent, then FDR() k 0 k for all 2 : Proof. Fix 2 , and omit it and from the notation. For simplicity let H 1 ;:::;H k 0 be the true hypotheses. For i2 [k 0 ] and r2 [k], dene the events W i;r =f e (i) (n)k + 1r some n<Ng 78 so that, by (4.3) and the denition of the standardizing function (4.4), P (W i;r )r=k: (4.7) For v2 [k 0 ] and s = 0;:::;kk 0 let v =f [k 0 ] :jj =vg, V v;s =fH i , i2, and s false hypotheses rejectedg for 2 v , and V v;s = [ 2 v V v;s =fv true and s false hypotheses rejectedg; and note that this union is disjoint. 
First we show that P (W i;v+s \V v;s ) 1fi2gP (V v;s ): (4.8) If i62 then the right-hand side is 0 so (4.8) holds. To show that (4.8) holds for i2 it suces to show thatV v;s W i;v+s , or equivalently thatV v;s \(W i;v+s ) c =;, where () c denotes complement. Suppose toward contradiction that there is an outcome inV v;s \(W i;v+s ) c , and letj denote the last stage at which any hypothesis is rejected. By denition of the procedure, for anyj2 [j ],k` j +1 is the number 79 of hypotheses that have been rejected up to and including thejth stage; from this, the following two facts follow: k + 1 (v +s) =` j ; and (4.9) ` j ` j for all j2 [j ]: (4.10) Since the outcome is in (W i;v+s ) c we have that for all n<N, e (i) (n)<k + 1 (v +s) =` j (by (4.9)) ` j for any j2 [j ] by (4.2) and (4.10). This shows that H i is not rejected at any stage j2 [j ] and is therefore accepted, contradicting that all the hypotheses in are rejected, and establishing (4.8). Now we follow the arguments of Theorem 1.3 in Benjamini and Yekutieli [9] more directly. Using (4.8) we write k 0 X i=1 P (W i;v+s \V v;s ) = k 0 X i=1 X 2 v P (W i;v+s \V v;s ) k 0 X i=1 X 2 v 1fi2gP (V v;s ) = X 2 v k 0 X i=1 1fi2gP (V v;s ) =jj X 2 v P (V v;s ) =vP (V v;s ): (4.11) 80 Using this and the denition of FDR, FDR = kk 0 X s=0 k 0 X v=1 v v +s P (V v;s ) kk 0 X s=0 k 0 X v=1 v v +s 1 v k 0 X i=1 P (W i;v+s \V v;s ) ! = kk 0 X s=0 k 0 X v=1 1 v +s k 0 X i=1 P (W i;v+s \V v;s ): (4.12) Dene U v;s;i be the event in which, if H i is rejected, then v 1 other true and s false hypotheses are also rejected, so that W i;v+s \V v;s = W i;v+s \U v;s;i . Let U r;i = S v+s=r U v;s;i and note that, for any i, U 1;i ;:::;U k;i partition the sample space. Then, starting at (4.12), FDR kk 0 X s=0 k 0 X v=1 1 v +s k 0 X i=1 P (W i;v+s \U v;s;i ) = k 0 X i=1 k X r=1 1 r P (W i;r \U r;i ) (4.13) Then we dene p i;j;r =P (W i;j \W c i;j1 \U r;i ): Here i = 1; 2;:::;k 0 , j = 1; 2;:::;r, and r = 1; 2;:::;k. By convention, set A k+1 =1 so that P (W i;0 ) = 0. It follows that 81 FDR k 0 X i=1 K X r=1 1 r r X j=1 p i;j;r = k 0 X i=1 k X j=1 k X r=j 1 r p i;j;r k 0 X i=1 k X j=1 k X r=j 1 j p i;j;r k 0 X i=1 k X j=1 1 j k X r=1 p i;j;r = k 0 X i=1 k X j=1 1 j P (W i;j \W c i;j1 ) = k 0 X i=1 k X j=1 1 j P (W i;j nW i;j1 ) = k 0 X i=1 k X j=1 1 j [P (W i;j )P (W i;j1 )] = k 0 X i=1 " k X j=1 1 j P (W i;j ) k1 X j=0 1 j + 1 P (W i;j ) # = k 0 X i=1 " k1 X j=1 P (W i;j )( 1 j 1 j + 1 ) + 1 k P (W i;k )P (W i;0 ) # k 0 X i=1 " k1 X j=1 k(j + 1) + k # (by (4.7)) = k 0 X i=1 k X j=1 kj = k X j=1 1 j ! k 0 k 82 If the data streams are independent, by (4.13) we have FDR k 0 X i=1 k X r=1 1 r P (W i;r )P (U r;i ) k 0 X i=1 k X r=1 1 r r k P (U r;i ) (by (4.7)) = k k 0 X i=1 k X r=1 P (U r;i ) = k k 0 X i=1 1 = k 0 k : If the hypotheses are dependent, which is the case in most applications, by The- orem 4.1, we can conduct the rejective sequential Benjamini-Hochberg procedure with =( P k j=1 1=j) instead of and the FDR will still be controlled. 4.3 ConstructingTestStatisticsthatSatisfy (4.3) for Individual Data Streams All that is needed in the above construction of the rejective sequential Benjamini- Hochberg procedure are sequential test statistics and critical values satisfying (4.3) for each data stream. In most situations, the critical values can be acquired using Monte Carlo simulation. 
However, for simple hypotheses, it is possible to nd 83 approximate closed form solutions, which is useful in practice since many com- posite hypotheses can be reduced to considering simple hypotheses by monotone likelihood ratio considerations and the like. In this section we show how to construct the test statistics (i) (n) and critical valuesfA (i) j ()g j2[k] satisfying (4.3) for H (i) : = 0 vs. G (i) : = 1 about streami of Normal distributed data with known . Without loss of generality, we set = 1 and assume that 0 < 1 . This approach will be equally applicable for other distributions in settings where the expected sample size is moderate. We know that S (i) n = P n j=1 X (i) j has the same distribution as W (n), where fW (t); 0t1g is a Brownian motion with drift. The log likelihood ratio for testing the drift of a Brownian motion is ( 1 0 )(W (t)( 0 + 1 )t=2). Therefore, we utilize the test statistic (i) (n) = S (i) n n, where = ( 0 + 1 )=2. By the argument in Section 3.3 of [42] concerning truncated sequential tests, we have that P ( (i) (n)b some n<N) = 1 b(N 1) 1=2 ()(N 1) 1=2 +e 2b() b(N 1) 1=2 ()(N 1) 1=2 (4.14) 84 where is the Normal c.d.f. The probability of type I error for the test that rejects when (i) (n)b for some n<N is P 0 ( (i) (n)b some n<N) = b(N 1) 1=2 + 1 2 ( 0 1 )(N 1) 1=2 +e b( 0 1 ) b(N 1) 1=2 1 2 ( 0 1 )(N 1) 1=2 : (4.15) We will show that the right-hand side of (4.15) is bounded by e b( 0 1 ) . Let f(N) denote the right-hand side of (4.15). First, note that lim N!1 f(N) = e b( 0 1 ) , so we will show that f is an increasing function. We have f 0 (N) = 1 2 b(N 1) 3=2 C 1 + 1 4 ( 0 1 )(N 1) 1=2 C 2 ; where C 1 = b(N 1) 1=2 + 1 2 ( 0 1 )(N 1) 1=2 +e b( 0 1 ) b(N 1) 1=2 1 2 ( 0 1 )(N 1) 1=2 ; C 2 = b(N 1) 1=2 + 1 2 ( 0 1 )(N 1) 1=2 e b( 0 1 ) b(N 1) 1=2 1 2 ( 0 1 )(N 1) 1=2 : 85 Note that C 1 > 0 and that (b(N 1) 1=2 + 1 2 ( 0 1 )(N 1) 1=2 ) =e b( 0 1 ) (b(N 1) 1=2 1 2 ( 0 1 )(N 1) 1=2 ); which implies C 2 = 0. Hence f 0 (N)> 0 and we have proved that P 0 ( (i) (n)b some n<N)e b( 0 1 ) : (4.16) If we choose b = j ln ()j 1 0 ; (4.17) then, by (4.16), the type I error rate is no larger than . Similarly, the critical values A (i) j () = ln ( k+1j k ) 0 1 for j = 1; 2;:::;k: (4.18) satisfy (4.3). When using the critical values dened by (4.17) and (4.18), one must remember that e b( 0 1 ) is the large-N limit of the type I error rate and may only be a good approximation to the error rate when N is large. For smallN, the type I error probability may be signicantly less than e b( 0 1 ) and hence the critical values (4.18) based on this approximation are very conservative. Table 4.1 shows the signicance levels actually attained by using the critical value (4.17) in the cse of Normal data. The actual signicance levels are much closer to the prescribed 86 levels when( 0 1 )(N 1) 1=2 is large. When( 0 1 )(N 1) 1=2 3, i.e., when N 9=( 0 1 ) 2 + 1, the actual signicance levels are much smaller than the desired levels, which can decrease the power of the test. Table 4.1: Signicance levels of testing = 0 vs. = 1 aboutN(; 1) data with critical value (4.17). 
( 1 0 )(N 1) 1=2 = 0:01 = 0:025 = 0:05 = 0:1 = 0:2 2 0.0014 0.0072 0.0217 0.0597 0.1510 3 0.0061 0.0183 0.0408 0.0885 0.1873 4 0.0088 0.0232 0.0477 0.0973 0.1971 5 0.0097 0.0246 0.0495 0.0995 0.1994 6 0.0100 0.0249 0.0499 0.0999 0.1999 7 0.0100 0.0250 0.0500 0.1000 0.2000 By (4.14), we can calculate the power of the test with critical value (4.17): P 1 ( (i) (n)b some n<N) = ln () ( 0 1 )(N 1) 1=2 1 2 ( 0 1 )(N 1) 1=2 + 1 ln () ( 0 1 )(N 1) 1=2 + 1 2 ( 0 1 )(N 1) 1=2 : (4.19) Power calculated by (4.19) are listed in Table 4.2. It shows that if we set = 0:05, in order to have power 0:8 we need ( 0 ) p N to be at least around 3. That is to say N ' 9=( 1 0 ) 2 . If 0 = 0 and 1 = 1 then we need N to be at least 10. However, if we want to be able to detect a smaller deviation of, say 1 0 = 0:5, then N needs to be 37. To summarize, we encourage the use of critical values (4.18) because of the ease the closed form expressions aord when testing = 0 vs. = 1 for Normal 87 Table 4.2: Powers of testing = 0 vs. = 1 with critical value (4.17). ( 1 0 )(N 1) 1=2 = 0:01 = 0:025 = 0:05 = 0:1 = 0:2 2 0.1443 0.2882 0.4342 0.5971 0.7552 3 0.6063 07334 0.8167 0.8848 0.9366 4 0.6063 0.9289 0.9543 0.9729 0.9857 5 0.8833 0.9851 0.9907 0.9947 0.9972 6 0.9950 0.9975 0.9985 0.9991 0.9996 7 0.9994 0.9997 0.9998 0.9999 0.9999 data stream i with known when N ' 9=( 1 0 ) 2 and for other distributions when the analogous inequalities hold. For simple hypotheses, such as the one discussed in Section 4.3, sometimes it's possible to nd closed form critical values. However, for many hypotheses, the distributions of sequential test statistics are complicated and no closed-form expressions are possible. For example, to test = 1 vs. 6= 1 about Normal data with unknown , a reasonable choice of test statistic suggested in Chapter 5 of [42] is (i) (n) = 1 2 n log (1 + ( X (i) n ) 2 =(S (i) n ) 2 ); where X (i) n = 1 n n X j=1 X (i) j and (S (i) n ) 2 = 1 n n X j=1 (X (i) j X (i) n ) 2 : The asymptotic distribution of this test statistic is discussed in [42], but no closed- form critical values are possible. Under such circumstances, we advocate Monte Carlo determination of the critical values. In this case, for example, note that the 88 distribution of (i) (n) depends on only through =. Hence it does not depend on if = 0 and the null hypothesis is true. In this case we can simply use = 1 for the Monte Carlo simulation, and the critical values obtained will be applicable regardless of . 4.4 Discussion The rejective sequential Benjamini-Hochberg procedure proposed above is a gen- eralization of the xed-sample Benjamini-Hochberg procedure. No knowledge or model of the data streams' correlation structure is needed when performing this procedure, which is highly desirable in many applications. For simplicity we adopted truncated sequential testing so thatn takes the values 1; 2;:::;N <1. We emphasize that this assumption is not essential and can easily be generalized to group sequential settings. Instead of possible sample sizes being 1; 2;:::;N <1, n can take values N 1 ;N 2 ;:::;N m for certain m <1, which is the maximum number of groups the experimenters are willing to investigate. In that case, we need to choose critical values A (i) 1 A (i) 2 :::A (i) k (4.20) 89 such that P i ( (i) (n)A i j some n2fN 1 ;N 2 ;:::;N m g) k + 1j k for all j2 [k]; i 2H i (4.21) instead of the critical values (4.3), for which Theorem 4.1 will still be true. 
Although Theorem 4.1 shows that a conservative modication might be needed when the data streams are dependent, our simulations nd the FDR well below the prescribed level under various correlation structures of the data streams even without the modication of replacing with=( P k j=1 1=j). Benjamini and Yeku- tieli [9] proved that the xed-sample Benjamini-Hochberg procedure controls FDR when the test statistics have positive regression dependency, which covers many problems of practical interest. Our simulation results suggest that the rejective sequential Benjamini-Hochberg procedure also controls FDR under a variety of dependencies. This is a very interesting topic warranting future work which could potentially make the procedure applicable to a wider range of problems. 90 Chapter 5 Sequential Testing of Multiple Hypotheses Controlling Type I and Type II False Discovery Rates 5.1 Introduction Given 2 , the type I and type II false discovery rates, denoted FDR I () and FDR II (), are dened as FDR I () =E V R_ 1 ; FDR II () =E U S_ 1 : V is the number of true hypotheses rejected, R is the total number of hypotheses rejected, U is the number of false hypotheses accepted and S is the total num- ber of hypotheses accepted. Note that if all the null hypotheses are true, then FDR I coincides with FWE I , otherwise FDR I FWE I . Similarly, if all the null hypotheses are false, FDR II is the same as FWE I , but under any other congu- ration FDR II FWE II . Therefore any procedure that controls type I and type 91 II familywise error rates automatically controls type I and type II false discovery rates and procedures requiring only false discovery rate control are less stringent, in general. 5.2 A Sequential Test Controlling FDR I and FDR II 5.2.1 Notation and Set-Up Consider k individual sequential test statisticsf (i) (n)g i2[k];n1 , where (i) (n) is the statistic for testingH (i) vs.G (i) based on the dataX (i) 1 ;X (i) 2 ;:::;X (i) n available from the ith stream at time n. Given desired FDR I () and FDR II () bounds and 2 (0; 1), respectively, for each data stream i we assume the existence of critical values A (i) k A (i) k1 :::A (i) 1 <B (i) 1 B (i) 2 :::B (i) k (5.1) such that P (i)( (i) (n)B (i) s some n, (i) (n 0 )>A (i) k all n 0 <n) ks + 1 k for all (i) 2H (i) (5.2) 92 P (i)( (i) (n)A (i) s some n, (i) (n 0 )<B (i) k all n 0 <n) ks + 1 k for all (i) 2G (i) (5.3) for all i;s 2 [k]. As in the previous chapters, the sequential multiple testing procedure introduced below will involve ranking, so for each streami we introduce a standardizing function ' (i) () which will be applied to the statistic (i) (n) before ranking. The standardizing functions ' (i) can be any increasing functions such that ' (i) (A (i) s ) and ' (i) (B (i) s ) do not depend on i. For simplicity, here we take the ' (i) to be piecewise linear functions such that ' (i) (A (i) s ) =s and ' (i) (B (i) s ) =s for all s2 [k]. (5.4) That is, for i2 [k] dene ' (i) (x) = 8 > > > > > > > > > > > > > > > > > > < > > > > > > > > > > > > > > > > > > : xA (i) k k; forxA (i) k xA (i) s+1 A (i) s A (i) s+1 (s + 1); forA (i) s+1 xA (i) s ifA (i) s >A (i) s+1 ; 1s<k 2(xA (i) 1 ) B (i) 1 A (i) 1 1; forA (i) 1 xB (i) 1 xB (i) s1 B (i) s B (i) s1 +s 1; forB (i) s1 xB (i) s ifB (i) s >B (i) s1 ; 1<sk xB (i) k +k; forxB (i) k : 93 5.2.2 The Sequential Benjamini-Hochberg Procedure We shall describe the sequential Benjamini-Hochberg procedure in terms of stages of sampling, between which accept/reject decisions are made. 
LetI j [k] (j = 1; 2;:::) denote the index set of the active hypotheses (i.e., theH (i) which have been neither accepted nor rejected yet) at the beginning of the jth stage of sampling, and n j will denote the cumulative sample size of any active test statistic up to and including the jth stage. The total number of null hypotheses that have been rejected (resp. accepted) at the beginning of the jth stage will be denoted by r j (resp. a j ). Accordingly, setI 1 = [k], n 0 = 0, a 1 = r 1 = 0, letjj denote set cardinality, and x desired FDR I () and FDR II () bounds and , respectively. The Sequential Benjamini-Hochberg Procedure.The jth stage of sampling (j = 1; 2;:::) proceeds as follows: 1. Sample and standardize the test statistics e (i) (n j ) = ' i ( (i) (n j )) for the active data streamsfX (i) n g i2I j ;n>n j1 until n equals n j = inffn>n j1 :(jI j j +r j ` + 1)< e (i(n;`)) (n)<a j +` is violated for some i2I j g; (5.5) wherei(n;`) denotes the index of the`th ordered active standardized statistic at sample size n. 94 2. (a) If the rst inequality in (5.5) was violated for some i 2 I j , i.e., if e (i(n j ;`)) (n)(jI j j+r j `+1), then accept them j 1 null hypotheses H (i(j;1)) ;H (i(j;2)) ;:::;H (i(j;m j )) ; where m j = max n mjI j j : e (i(n j ;m)) (n j )(jI j j +r j m + 1) o ; (5.6) and set a j+1 =a j +m j . Otherwise set a j+1 =a j . (b) If the second inequality in (5.5) was violated for some i2I j , i.e., if e (i(n j ;`)) (n)a j +`, then reject the m 0 j 1 null hypotheses H (i(j;jI j j)) ;H (i(j;jI j j1)) ;:::;H (i(j;jI j jm 0 j +1)) ; (5.7) where m 0 j = max n mjI j j : e (i(n j ;jI j jm+1)) (n j )jI j j +a j m + 1 o ; (5.8) and set r j+1 =r j +m 0 j . Otherwise set r j+1 =r j . 95 3. Stop if there are no remaining active hypotheses, i.e., if a j+1 +r j+1 = k. Otherwise, letI j+1 be the indices of the remaining active hypotheses and continue on to stage j + 1. Note that if the same critical values are used for all data streams, that is, if A (i) s =A (i 0 ) s =A s andB (i) s =B (i 0 ) s =B s for alli;i 0 ;s2 [k], then the standardization performed in Step 1 can be dispensed with as long as the values to the right of the inequalities in (5.6) and (5.8) are replaced by A jI j j+r j m+1 and B jI j j+a j m+1 , respectively. Now we show that the Sequential Benjamini-Hochberg procedure con- trols FDR I and FDR II if we conduct the procedure with =( P k j=1 1=j) and =( P k j=1 1=j) instead of and . This modication is not necessary if the data streams are independent. Specically, if the data streams for the true hypotheses are independent, then FDR I k 0 =k, wherek 0 is the number of true hypotheses. Symmetrically, FDR II (kk 0 )=k when the data streams for the kk 0 false hypotheses are independent. Theorem 5.1. Fix ;2 (0; 1). If the test statistics (i) (n), i2 [k], n 1, and critical values A (i) s =A (i) s (;) and B (i) s =B (i) s (;), i;s2 [k], satisfy (5.2)-(5.3), then the Sequential Benjamini-Hochberg procedure dened above satises FDR I () k X j=1 1 j ! k 0 k and FDR II () k X j=1 1 j ! kk 0 k for all 2 ; 96 where k 0 is the number of true hypotheses. Furthermore, if the data streams cor- responding to the k 0 true hypotheses are independent, then FDR I () k 0 k for all 2 : If the data streams corresponding to the kk 0 false hypotheses are independent, then FDR II () kk 0 k for all 2 : Proof. Fix 2 and omit it from the notation. For simplicity let H (1) ;:::;H (k 0 ) be the true hypotheses. 
Now we show that the sequential Benjamini-Hochberg procedure controls $FDR_I$ and $FDR_{II}$ if we conduct the procedure with $\alpha/(\sum_{j=1}^{k} 1/j)$ and $\beta/(\sum_{j=1}^{k} 1/j)$ in place of $\alpha$ and $\beta$. This modification is not necessary if the data streams are independent. Specifically, if the data streams for the true hypotheses are independent, then $FDR_I \le \alpha k_0/k$, where $k_0$ is the number of true hypotheses. Symmetrically, $FDR_{II} \le \beta(k-k_0)/k$ when the data streams for the $k - k_0$ false hypotheses are independent.

Theorem 5.1. Fix $\alpha, \beta \in (0,1)$. If the test statistics $\lambda^{(i)}(n)$, $i \in [k]$, $n \ge 1$, and critical values $A_s^{(i)} = A_s^{(i)}(\alpha,\beta)$ and $B_s^{(i)} = B_s^{(i)}(\alpha,\beta)$, $i, s \in [k]$, satisfy (5.2)-(5.3), then the sequential Benjamini-Hochberg procedure defined above satisfies
$$FDR_I(\theta) \le \left(\sum_{j=1}^{k} \frac{1}{j}\right) \frac{\alpha k_0}{k} \quad \text{and} \quad FDR_{II}(\theta) \le \left(\sum_{j=1}^{k} \frac{1}{j}\right) \frac{\beta(k-k_0)}{k} \quad \text{for all } \theta \in \Theta,$$
where $k_0$ is the number of true hypotheses. Furthermore, if the data streams corresponding to the $k_0$ true hypotheses are independent, then
$$FDR_I(\theta) \le \frac{\alpha k_0}{k} \quad \text{for all } \theta \in \Theta.$$
If the data streams corresponding to the $k - k_0$ false hypotheses are independent, then
$$FDR_{II}(\theta) \le \frac{\beta(k-k_0)}{k} \quad \text{for all } \theta \in \Theta.$$

Proof. Fix $\theta \in \Theta$ and omit it from the notation. For simplicity let $H^{(1)}, \ldots, H^{(k_0)}$ be the true hypotheses. For $i \in [k_0]$ and $r \in [k]$, define the events
$$W_{i,r} = \big\{\widetilde{\lambda}^{(i)}(n) \ge k + 1 - r \text{ some } n,\ \widetilde{\lambda}^{(i)}(n') > -k \text{ all } n' < n\big\}$$
so that, by (5.2) and the definition (5.4) of the standardizing function,
$$P(W_{i,r}) = P\big(\lambda^{(i)}(n) \ge B_{k+1-r}^{(i)} \text{ some } n,\ \lambda^{(i)}(n') > A_k^{(i)} \text{ all } n' < n\big) \le \frac{r\alpha}{k}. \qquad (5.9)$$
For $v \in [k_0]$ and $s = 0, \ldots, k - k_0$ let $\Lambda_v = \{\lambda \subseteq [k_0] : |\lambda| = v\}$,
$$V_{v,s}^{\lambda} = \{\text{exactly the true hypotheses } H^{(i)},\ i \in \lambda, \text{ and } s \text{ false hypotheses are rejected}\} \quad \text{for } \lambda \in \Lambda_v,$$
and
$$V_{v,s} = \bigcup_{\lambda \in \Lambda_v} V_{v,s}^{\lambda} = \{v \text{ true and } s \text{ false hypotheses rejected}\},$$
and note that this union is disjoint. First we show that
$$P(W_{i,v+s} \cap V_{v,s}^{\lambda}) \ge \mathbf{1}\{i \in \lambda\}\, P(V_{v,s}^{\lambda}). \qquad (5.10)$$
If $i \notin \lambda$ then the right-hand side is 0, so (5.10) holds. If $i \in \lambda$, then on the event $V_{v,s}^{\lambda}$ the hypothesis $H^{(i)}$ is rejected; let $j^*$ be the last stage at which any hypothesis is rejected. There exists $j \in [j^*]$ such that $\widetilde{\lambda}^{(i)}(n_j) \ge |\mathcal{I}_j| + a_j - m'_j + 1$ and $\widetilde{\lambda}^{(i)}(n') > -k$ for all $n' < n_j$. Note that
$$|\mathcal{I}_j| + a_j - m'_j + 1 = k - r_j - m'_j + 1 = k + 1 - (r_j + m'_j) \ge k + 1 - (v+s).$$
Hence $\widetilde{\lambda}^{(i)}(n_j) \ge k + 1 - (v+s)$ and $\widetilde{\lambda}^{(i)}(n') > -k$ for all $n' < n_j$. This shows that $V_{v,s}^{\lambda} \subseteq W_{i,v+s}$ when $i \in \lambda$, and therefore (5.10) holds. Now using (5.10) we write
$$\sum_{i=1}^{k_0} P(W_{i,v+s} \cap V_{v,s}) = \sum_{i=1}^{k_0} \sum_{\lambda \in \Lambda_v} P(W_{i,v+s} \cap V_{v,s}^{\lambda}) \ge \sum_{i=1}^{k_0} \sum_{\lambda \in \Lambda_v} \mathbf{1}\{i \in \lambda\} P(V_{v,s}^{\lambda}) = \sum_{\lambda \in \Lambda_v} |\lambda|\, P(V_{v,s}^{\lambda}) = v\, P(V_{v,s}). \qquad (5.11)$$
Using this and the definition of $FDR_I$,
$$FDR_I = \sum_{s=0}^{k-k_0} \sum_{v=1}^{k_0} \frac{v}{v+s} P(V_{v,s}) \le \sum_{s=0}^{k-k_0} \sum_{v=1}^{k_0} \frac{v}{v+s} \cdot \frac{1}{v} \sum_{i=1}^{k_0} P(W_{i,v+s} \cap V_{v,s}) = \sum_{s=0}^{k-k_0} \sum_{v=1}^{k_0} \frac{1}{v+s} \sum_{i=1}^{k_0} P(W_{i,v+s} \cap V_{v,s}). \qquad (5.12)$$
Define $U_{v,s,i}$ to be the event in which, if $H^{(i)}$ is rejected, then $v - 1$ other true and $s$ false hypotheses are also rejected, so that $W_{i,v+s} \cap V_{v,s} = W_{i,v+s} \cap U_{v,s,i}$. Let $U_{r,i} = \bigcup_{v+s=r} U_{v,s,i}$ and note that, for any $i$, the events $U_{1,i}, \ldots, U_{k,i}$ partition the sample space. Then, starting at (5.12),
$$FDR_I \le \sum_{s=0}^{k-k_0} \sum_{v=1}^{k_0} \frac{1}{v+s} \sum_{i=1}^{k_0} P(W_{i,v+s} \cap U_{v,s,i}) = \sum_{i=1}^{k_0} \sum_{r=1}^{k} \frac{1}{r} P(W_{i,r} \cap U_{r,i}). \qquad (5.13)$$
Next define $p_{i,j,r} = P\big((W_{i,j} \setminus W_{i,j-1}) \cap U_{r,i}\big)$ for $i = 1, \ldots, k_0$, $j = 1, \ldots, r$, and $r = 1, \ldots, k$. So far $W_{i,r}$ has only been defined for $r \in [k]$; we now define $W_{i,0} = \emptyset$. Note that $W_{i,j-1} \subseteq W_{i,j}$ for $j = 1, \ldots, r$, therefore $W_{i,r} = \bigcup_{j=1}^{r} (W_{i,j} \setminus W_{i,j-1})$ and the union is disjoint. Hence we have
$$FDR_I \le \sum_{i=1}^{k_0} \sum_{r=1}^{k} \frac{1}{r} \sum_{j=1}^{r} p_{i,j,r} = \sum_{i=1}^{k_0} \sum_{j=1}^{k} \sum_{r=j}^{k} \frac{1}{r} p_{i,j,r} \le \sum_{i=1}^{k_0} \sum_{j=1}^{k} \sum_{r=j}^{k} \frac{1}{j} p_{i,j,r} \le \sum_{i=1}^{k_0} \sum_{j=1}^{k} \frac{1}{j} \sum_{r=1}^{k} p_{i,j,r} = \sum_{i=1}^{k_0} \sum_{j=1}^{k} \frac{1}{j} P(W_{i,j} \setminus W_{i,j-1})$$
$$= \sum_{i=1}^{k_0} \sum_{j=1}^{k} \frac{1}{j} \big[P(W_{i,j}) - P(W_{i,j-1})\big] = \sum_{i=1}^{k_0} \left[\sum_{j=1}^{k} \frac{1}{j} P(W_{i,j}) - \sum_{j=0}^{k-1} \frac{1}{j+1} P(W_{i,j})\right] = \sum_{i=1}^{k_0} \left[\sum_{j=1}^{k-1} P(W_{i,j})\left(\frac{1}{j} - \frac{1}{j+1}\right) + \frac{1}{k} P(W_{i,k}) - P(W_{i,0})\right]$$
$$\le \sum_{i=1}^{k_0} \left[\sum_{j=1}^{k-1} \frac{\alpha}{k(j+1)} + \frac{\alpha}{k}\right] \ \text{(by (5.9))} \ = \sum_{i=1}^{k_0} \sum_{j=1}^{k} \frac{\alpha}{kj} = \left(\sum_{j=1}^{k} \frac{1}{j}\right) \frac{\alpha k_0}{k}.$$
If the data streams are independent, so that $W_{i,r}$ and $U_{r,i}$ are independent events, then from (5.13) we have
$$FDR_I \le \sum_{i=1}^{k_0} \sum_{r=1}^{k} \frac{1}{r} P(W_{i,r}) P(U_{r,i}) \le \sum_{i=1}^{k_0} \sum_{r=1}^{k} \frac{1}{r} \cdot \frac{r\alpha}{k} P(U_{r,i}) \ \text{(by (5.9))} \ = \frac{\alpha}{k} \sum_{i=1}^{k_0} \sum_{r=1}^{k} P(U_{r,i}) = \frac{\alpha}{k} \sum_{i=1}^{k_0} 1 = \frac{\alpha k_0}{k}.$$
The proof for $FDR_{II}$ is entirely symmetric, so the details are omitted. The only thing that could change the situation is the possibility that a hypothesis that would have been rejected in Step 2(b) is accepted in Step 2(a). But this cannot happen, because if $H^{(i)}$ is accepted at stage $j$ then $i = i(n_j, m)$ for some $m \le m_j$, so by (5.6) we have
$$\widetilde{\lambda}^{(i)}(n_j) = \widetilde{\lambda}^{(i(n_j,m))}(n_j) \le -(|\mathcal{I}_j| + r_j - m_j + 1) < 0 < |\mathcal{I}_j| + a_j - m'_j + 1,$$
which, together with (5.7) and (5.8), shows that $H^{(i)} = H^{(i(n_j,m))}$ could not have also been rejected. A similar argument shows that a null hypothesis that is rejected could not also be accepted at the same stage.
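The practical cost of the harmonic-series modification is easy to quantify. The short computation below (variable names ours) prints the factor $\sum_{j=1}^{k} 1/j$ and the deflated levels $\alpha' = \alpha/\sum_{j=1}^{k} 1/j$ and $\beta' = \beta/\sum_{j=1}^{k} 1/j$ that guarantee the bounds of Theorem 5.1 under arbitrary dependence.

    alpha, beta = 0.05, 0.2
    for k in (2, 5, 10, 20):
        c_k = sum(1.0 / j for j in range(1, k + 1))   # harmonic factor in Theorem 5.1
        # Running the procedure at alpha/c_k and beta/c_k guarantees
        # FDR_I <= alpha*k_0/k and FDR_II <= beta*(k-k_0)/k under any dependence.
        print(f"k={k:2d}  sum(1/j)={c_k:.3f}  alpha'={alpha / c_k:.4f}  beta'={beta / c_k:.4f}")

For $k = 10$, for instance, the factor is about 2.93, so the modified procedure runs at roughly $\alpha' \approx 0.017$ and $\beta' \approx 0.068$, which is why the modification is best avoided when it is not needed.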
5.3 Constructing Test Statistics that Satisfy (5.2) and (5.3) for Individual Data Streams

In this section we show how to construct the test statistics $\lambda^{(i)}(n)$ and critical values $\{A_s^{(i)}, B_s^{(i)}\}_{s \in [k]}$ satisfying (5.2)-(5.3) for any data stream $i$ such that $H^{(i)}$ and $G^{(i)}$ are both simple hypotheses. We represent the simple null and alternative hypotheses $H^{(i)}$ and $G^{(i)}$ by the corresponding distinct density functions $h^{(i)}$ (null) and $g^{(i)}$ (alternative) with respect to some common $\sigma$-finite measure $\mu^{(i)}$. Formally, the parameter space $\Theta^{(i)}$ corresponding to this data stream is the set of all densities $f$ with respect to $\mu^{(i)}$; $H^{(i)}$ is considered true if the true density $f^{(i)}$ satisfies $f^{(i)} = h^{(i)}$ $\mu^{(i)}$-a.s., and is false if $f^{(i)} = g^{(i)}$ $\mu^{(i)}$-a.s. Following the reasoning of Section 3.3.1, we utilize the simple log-likelihood ratio test statistic
$$\lambda^{(i)}(n) = \sum_{j=1}^{n} \log\left(\frac{g^{(i)}(X_j^{(i)})}{h^{(i)}(X_j^{(i)})}\right) \qquad (5.14)$$
and critical values $A(\alpha,\beta)$ and $B(\alpha,\beta)$ satisfying
$$P_{h^{(i)}}\big(\lambda^{(i)}(n) \ge B(\alpha,\beta) \text{ some } n,\ \lambda^{(i)}(n') > A(\alpha,\beta) \text{ all } n' < n\big) \le \alpha \qquad (5.15)$$
$$P_{g^{(i)}}\big(\lambda^{(i)}(n) \le A(\alpha,\beta) \text{ some } n,\ \lambda^{(i)}(n') < B(\alpha,\beta) \text{ all } n' < n\big) \le \beta. \qquad (5.16)$$
Again, our choice of the critical values is based on the Wald approximations
$$A(\alpha,\beta) = \log\left(\frac{\beta}{1-\alpha}\right) + \gamma, \qquad B(\alpha,\beta) = \log\left(\frac{1-\beta}{\alpha}\right) - \gamma. \qquad (5.17)$$

Theorem 5.2. Suppose that, for a certain data stream $i$, $H^{(i)}: f^{(i)} = h^{(i)}$ and $G^{(i)}: f^{(i)} = g^{(i)}$ are simple hypotheses. Let $\overline{\alpha}_{Wald}^{(i)}(\alpha,\beta)$ and $\overline{\beta}_{Wald}^{(i)}(\alpha,\beta)$ be the values of the probabilities on the left-hand sides of (5.15) and (5.16), respectively, with $\lambda^{(i)}(n)$ given by (5.14) and $A(\alpha,\beta)$ and $B(\alpha,\beta)$ given by the Wald approximations (5.17). Now fix $\alpha, \beta \in (0,1)$ and for $s \in [k]$ let
$$\alpha_s = \alpha_s(\alpha,\beta) = \frac{\alpha[k - (k-s+1)\beta]}{k(k-\beta)}, \qquad \beta_s = \beta_s(\alpha,\beta) = \frac{\beta[k - (k-s+1)\alpha]}{k(k-\alpha)}.$$
Also, let $\overline{\alpha}_{BH}^{(i)}(s)$ and $\overline{\beta}_{BH}^{(i)}(s)$ denote the left-hand sides of (5.2) and (5.3), respectively, with $A_s^{(i)}$, $B_s^{(i)}$ given by
$$A_s^{(i)} = A_s^{(i)}(\alpha,\beta) = \log\left(\frac{\beta(k-s+1)}{(1-\alpha_s)k}\right) + \gamma, \qquad B_s^{(i)} = B_s^{(i)}(\alpha,\beta) = \log\left(\frac{(1-\beta_s)k}{\alpha(k-s+1)}\right) - \gamma. \qquad (5.18)$$
Then, for all $s \in [k]$,
$$\overline{\alpha}_{BH}^{(i)}(s) = \overline{\alpha}_{Wald}^{(i)}\big((k-s+1)\alpha/k,\ \beta_s\big) \qquad (5.19)$$
$$\overline{\beta}_{BH}^{(i)}(s) = \overline{\beta}_{Wald}^{(i)}\big(\alpha_s,\ (k-s+1)\beta/k\big) \qquad (5.20)$$
and therefore (5.2)-(5.3) hold, up to Wald's approximation, when using the critical values (5.18).

Proof. First note that $\alpha_s, \beta_s \in (0,1)$ for all $s \in [k]$ since
$$0 < \alpha_s = \frac{\alpha[k - (k-s+1)\beta]}{k(k-\beta)} \le \frac{\alpha k}{k(k-\beta)} = \frac{\alpha}{k-\beta} < 1 \quad \text{as } k \ge 2,$$
and similarly for $\beta_s$. The $A_s^{(i)}$ and $B_s^{(i)}$ in (5.18) can be written as $A(\alpha_s, (k-s+1)\beta/k)$ and $B((k-s+1)\alpha/k, \beta_s)$, respectively, and it is simple algebra to then check that $A((k-s+1)\alpha/k, \beta_s) = A_k^{(i)}$ for any $s \in [k]$. Then, to verify (5.19),
$$\overline{\alpha}_{BH}^{(i)}(s) = P_{h^{(i)}}\big(\lambda^{(i)}(n) \ge B_s^{(i)} \text{ some } n,\ \lambda^{(i)}(n') > A_k^{(i)} \text{ all } n' < n\big)$$
$$= P_{h^{(i)}}\big(\lambda^{(i)}(n) \ge B((k-s+1)\alpha/k, \beta_s) \text{ some } n,\ \lambda^{(i)}(n') > A((k-s+1)\alpha/k, \beta_s) \text{ all } n' < n\big) = \overline{\alpha}_{Wald}^{(i)}\big((k-s+1)\alpha/k,\ \beta_s\big)$$
by (5.15). The proof of (5.20) is similar.

The theorem gives the simple closed-form expressions (5.18) for the $2k$ critical values $\{A_s^{(i)}, B_s^{(i)}\}_{s \in [k]}$ of a stream $i$ whose hypotheses $H^{(i)}, G^{(i)}$ are simple. Example values of (5.18) for $\alpha = .05$ and $\beta = .2$ are given in Table 5.1 for $k = 2, \ldots, 10$. Note that the $A_s$ are larger, and the $B_s$ smaller, than the corresponding values in Table 3.2. This choice of critical values makes the sequential Benjamini-Hochberg procedure less stringent than the sequential Holm procedure, and is one of the reasons why it is less conservative. The other reason, of course, is that the sequential Benjamini-Hochberg procedure adopts the step-up approach instead of the step-down approach taken by the sequential Holm procedure.
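As a quick check on Theorem 5.2, the closed-form values (5.18) can be computed in a few lines; the sketch below (function name ours) reproduces the $k = 2$ row of Table 5.1, which follows.

    import math

    def bh_critical_values(k, alpha=0.05, beta=0.2, gamma=0.0):
        # Closed-form critical values (5.18); returns lists with
        # A[s-1] = A_s and B[s-1] = B_s for s = 1, ..., k.
        A, B = [], []
        for s in range(1, k + 1):
            alpha_s = alpha * (k - (k - s + 1) * beta) / (k * (k - beta))
            beta_s = beta * (k - (k - s + 1) * alpha) / (k * (k - alpha))
            A.append(math.log(beta * (k - s + 1) / ((1 - alpha_s) * k)) + gamma)
            B.append(math.log((1 - beta_s) * k / (alpha * (k - s + 1))) - gamma)
        return A, B

    A, B = bh_critical_values(2)
    print([round(a, 2) for a in A], [round(b, 2) for b in B])
    # prints [-1.59, -2.28] [2.89, 3.58]; Table 5.1 lists these in the
    # order A_k, ..., A_1 and B_k, ..., B_1.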
Table 5.1: Critical values (5.18) of the sequential Benjamini-Hochberg procedure for simple hypotheses, for $\alpha = .05$, $\beta = .2$, $\gamma = 0$ and $k = 2, \ldots, 10$, to two decimal places.

    k   A_k, ..., A_1                                                           B_k, ..., B_1
    2   -2.28, -1.59                                                            3.58, 2.89
    3   -2.69, -2.00, -1.60                                                     4.03, 3.33, 2.93
    4   -2.98, -2.29, -1.89, -1.60                                              4.33, 3.64, 3.23, 2.95
    5   -3.21, -2.52, -2.11, -1.82, -1.60                                       4.56, 3.87, 3.47, 3.18, 2.96
    6   -3.39, -2.70, -2.29, -2.01, -1.78, -1.60                                4.75, 4.06, 3.66, 3.37, 3.15, 2.96
    7   -3.55, -2.86, -2.45, -2.16, -1.94, -1.76, -1.60                         4.91, 4.22, 3.81, 3.53, 3.30, 3.12, 2.97
    8   -3.68, -2.99, -2.58, -2.30, -2.07, -1.89, -1.74, -1.60                  5.05, 4.36, 3.95, 3.66, 3.44, 3.26, 3.10, 2.97
    9   -3.80, -3.11, -2.70, -2.42, -2.19, -2.01, -1.86, -1.72, -1.60           5.17, 4.48, 4.07, 3.78, 3.56, 3.38, 3.23, 3.09, 2.97
    10  -3.90, -3.21, -2.81, -2.52, -2.30, -2.12, -1.96, -1.83, -1.71, -1.61    5.28, 4.59, 4.18, 3.89, 3.67, 3.49, 3.33, 3.20, 3.08, 2.98

5.4 Simulation Studies

In this section we compare the sequential Benjamini-Hochberg procedure (denoted SBH) with the fixed-sample Benjamini-Hochberg [8] procedure (denoted FBH). As far as we know, no other sequential procedure has been proposed in the literature that controls both $FDR_I$ and $FDR_{II}$. In our studies we have chosen the commonly used values $\alpha = .05$ and $\beta = .2$. This same value of $\alpha$ is used for the fixed-sample Benjamini-Hochberg procedure as well, which does not guarantee $FDR_{II}$ control at a prescribed level, so we have chosen its sample size to make its $FDR_{II}$ approximately the same as that of the SBH procedure, in order to have a meaningful comparison with the sequential procedure. Below we present two sets of simulations, the first in Table 5.2 with independent streams of Bernoulli data, and the second in Table 5.3 with dependent streams of normal data generated from a multivariate normal distribution with a non-identity covariance matrix. For each scenario considered below we estimate $FDR_I$ and $FDR_{II}$ (reported next to the bounds $\alpha k_0/k$ and $\beta(k-k_0)/k$ of Theorem 5.1), the expected total sample size $EN = E(\sum_{i=1}^{k} N^{(i)})$ over all the data streams, where $N^{(i)}$ is the total sample size of the $i$th stream, and the relative savings in sample size of SBH, using 100,000 Monte Carlo simulated batteries of $k$ sequential tests. In each set of simulations the data streams and hypotheses tested are similar across data streams; this is only for the sake of getting a clear picture of the procedures' performance, and this uniformity is not required in order to use the procedures considered.

5.4.1 Independent Bernoulli Data Streams

Table 5.2 contains the operating characteristics of the above procedures for testing $k$ hypotheses of the form
$$H^{(i)}: p^{(i)} \ge .6 \quad \text{vs.} \quad G^{(i)}: p^{(i)} \le .4, \qquad i = 1, \ldots, k,$$
about the probability $p^{(i)}$ of success in the $i$th stream of i.i.d. Bernoulli data; additionally, the streams were generated independently of each other. The sequential log-likelihood ratio test statistics were used, and the SBH procedure used the critical values in Table 5.1, as described in Section 5.3. The data was generated for each data stream with $p^{(i)} = .6$ or $.4$, and the second column of Table 5.2 gives the number $k_0$ of hypotheses for which $p^{(i)} = .6$. The columns labeled Savings give the percent decrease in expected sample size $EN$ of SBH relative to FBH. The SBH procedure has much smaller expected sample size than the FBH procedure, saving around 40% to 50% in all the scenarios. Note that $FDR_I$ and $FDR_{II}$ are not only less than the prescribed levels $\alpha$ and $\beta$, but also below $\alpha k_0/k$ and $\beta(k-k_0)/k$, respectively, which is consistent with the theoretical result in Section 5.2.2.
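For concreteness, here is a minimal sketch of the statistic (5.14) for a single Bernoulli stream, with the simple surrogates $p_0 = .6$ and $p_1 = .4$ standing in for the composite hypotheses above; the random seed and stream length are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    p0, p1 = 0.6, 0.4                  # surrogates for H: p >= .6 and G: p <= .4
    x = rng.binomial(1, p0, size=200)  # one stream simulated with the null true
    # Bernoulli log-likelihood ratio (5.14): each observation contributes
    # log(p1/p0) if x = 1 and log((1-p1)/(1-p0)) if x = 0.
    lam = np.cumsum(x * np.log(p1 / p0) + (1 - x) * np.log((1 - p1) / (1 - p0)))
    # lam[n-1] is lambda(n); under the null it drifts downward, toward the
    # acceptance boundaries A_s, and under p = .4 upward, toward the B_s.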
The sample sizes of the SBH procedure are also significantly smaller than those of the SH procedure in Table 3.3, and the differences become more noticeable as the number of hypotheses increases. As we mentioned before, FDR-controlling procedures are generally less stringent than FWER-controlling procedures; it is therefore easier to reach a conclusion for every individual hypothesis, which speeds up the decision making for all the hypotheses.

Table 5.2: Operating characteristics of sequential and fixed-sample multiple testing procedures controlling false discovery rates for k streams of independent Bernoulli data.

    k   k_0  Procedure  FDR_I (SE)       αk_0/k  FDR_II (SE)      β(k-k_0)/k  EN (SE)      Savings
    2   1    SBH        0.0157 (0.0030)  0.025   0.0772 (0.0059)  0.100       61.9 (1.0)
             FBH        0.0212 (0.0031)          0.0860 (0.0065)              120          48.42%
    5   3    SBH        0.0170 (0.0023)  0.030   0.0412 (0.0027)  0.080       193.7 (1.9)
             FBH        0.0198 (0.0025)          0.0430 (0.0030)              370          47.65%
        2    SBH        0.0115 (0.0017)  0.020   0.0628 (0.0044)  0.120       207.2 (1.8)
             FBH        0.0188 (0.0020)          0.0629 (0.0044)              375          44.75%
    10  8    SBH        0.0195 (0.0026)  0.040   0.0201 (0.0015)  0.040       364.5 (3.3)
             FBH        0.0291 (0.0034)          0.0280 (0.0018)              760          52.04%
        5    SBH        0.0114 (0.0014)  0.025   0.0512 (0.0028)  0.100       430.3 (3.1)
             FBH        0.0191 (0.0016)          0.0533 (0.0030)              770          44.12%
        2    SBH        0.0048 (0.0007)  0.010   0.1015 (0.0046)  0.160       462.1 (3.2)
             FBH        0.0085 (0.0009)          0.1037 (0.0057)              770          39.99%
    20  16   SBH        0.0183 (0.0019)  0.040   0.0191 (0.0010)  0.040       763.5 (4.2)
             FBH        0.0274 (0.0027)          0.0204 (0.0010)              1760         56.62%
        10   SBH        0.0114 (0.0010)  0.025   0.0493 (0.0021)  0.100       891.9 (5.0)
             FBH        0.0208 (0.0013)          0.0544 (0.0021)              1640         45.62%
        4    SBH        0.0047 (0.0005)  0.010   0.0854 (0.0039)  0.160       964.7 (4.5)
             FBH        0.0074 (0.0007)          0.0945 (0.0040)              1700         43.25%

5.4.2 Correlated Normal Data Streams

Table 5.3 contains the operating characteristics of the two procedures described above for testing $k$ hypotheses of the form
$$H^{(i)}: \mu^{(i)} \le 0 \quad \text{vs.} \quad G^{(i)}: \mu^{(i)} \ge \delta, \qquad i = 1, \ldots, k,$$
for known $\delta > 0$, taken here to be 1, about the mean $\mu^{(i)}$ of the i.i.d. normal observations with known variance 1 which make up the $i$th data stream. Here $\gamma = 0.583$ is chosen to improve the Wald approximation. To investigate the performance of the procedures under dependent data streams, the streams were generated from a $k$-dimensional multivariate normal distribution with mean $\mu = (\mu^{(1)}, \ldots, \mu^{(k)})$, given in the second column of Table 5.3, and four different non-identity covariance matrices $M_1, M_2, M_3$, and $M_4$, given in Section 3.4.2, which were chosen to give a variety of different scenarios of positively and negatively correlated data streams. The interaction of these various combinations of correlations with true or false null hypotheses shows behavior broadly similar to the case of independent data streams in the previous section, in that the SBH procedure has substantially smaller expected sample size than the FBH procedure in all cases, with roughly a 40% reduction in most cases. Although by Theorem 5.1 $FDR_I$ and $FDR_{II}$ are only guaranteed to be bounded by $\alpha k_0 (\sum_{j=1}^{k} 1/j)/k$ and $\beta(k-k_0)(\sum_{j=1}^{k} 1/j)/k$, respectively, the $FDR_I$ and $FDR_{II}$ values in Table 5.3 are in fact all below $\alpha k_0/k$ and $\beta(k-k_0)/k$; hence the conservative modification is not necessary here. Benjamini and Yekutieli [9] proved that the fixed-sample Benjamini-Hochberg procedure controls FDR when the test statistics have positive regression dependency. The simulations suggest that a similar result can be expected for the sequential procedure. However, under what types of dependency the sequential Benjamini-Hochberg procedure controls $FDR_I$ and $FDR_{II}$ requires further exploration.
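For the normal streams, the simple-vs-simple surrogate $\mu = 0$ vs. $\mu = \delta$ gives the statistic (5.14) in closed form: with unit variance, each observation contributes $\log[\phi(x - \delta)/\phi(x)] = \delta(x - \delta/2)$, so $\lambda(n) = \delta \sum_{j \le n} (X_j - \delta/2)$. A minimal sketch for one stream (names and seed ours):

    import numpy as np

    rng = np.random.default_rng(0)
    delta = 1.0                          # separation in H: mu <= 0 vs G: mu >= delta
    x = rng.normal(0.0, 1.0, size=200)   # one stream with mu = 0 (null true)
    # Normal log-likelihood ratio (5.14) for the surrogates mu = 0 vs mu = delta:
    lam = np.cumsum(delta * (x - delta / 2.0))

In the study itself the $k$ streams are drawn jointly from a multivariate normal distribution, so the $\lambda^{(i)}(n)$ are correlated across streams while each retains this marginal form.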
Table 5.3: Operating characteristics of sequential and fixed-sample multiple testing procedures controlling false discovery rates for k streams of correlated normal data.

    Cov.  True value μ        Procedure  FDR_I (SE)       αk_0/k  FDR_II (SE)      β(k-k_0)/k  EN (SE)     Savings
    M_1   (1, 0)              SBH        0.0249 (0.0035)  0.025   0.0983 (0.0065)  0.100       9.6 (0.1)
                              FBH        0.0248 (0.0033)          0.0970 (0.0075)              16          35.63%
    M_2   (1, 0)              SBH        0.0228 (0.0047)  0.025   0.0676 (0.0062)  0.100       10.5 (0.2)
                              FBH        0.0293 (0.0043)          0.0626 (0.0053)              20          47.50%
    M_3   (1, 0, 1, 0)        SBH        0.0212 (0.0030)  0.025   0.0767 (0.0045)  0.100       24.0 (0.2)
                              FBH        0.0264 (0.0034)          0.0800 (0.0051)              40          40.00%
          (1, 1, 0, 0)        SBH        0.0163 (0.0036)  0.025   0.0524 (0.0053)  0.100       24.1 (0.4)
                              FBH        0.0249 (0.0042)          0.0578 (0.0053)              44          45.23%
    M_4   (1, 0, 0, 0, 0, 0)  SBH        0.0302 (0.0047)  0.042   0.0213 (0.0016)  0.033       31.3 (0.3)
                              FBH        0.0379 (0.0043)          0.0236 (0.0017)              72          56.53%
          (1, 0, 0, 1, 0, 0)  SBH        0.0251 (0.0034)  0.033   0.0476 (0.0027)  0.067       34.9 (0.3)
                              FBH        0.0324 (0.0037)          0.0483 (0.0029)              66          47.12%
          (1, 1, 0, 0, 0, 0)  SBH        0.0225 (0.0038)  0.033   0.0378 (0.0034)  0.067       35.1 (0.5)
                              FBH        0.0319 (0.0039)          0.0370 (0.0036)              72          51.25%
          (1, 1, 1, 0, 0, 0)  SBH        0.0142 (0.0032)  0.025   0.0478 (0.0044)  0.100       38.3 (0.6)
                              FBH        0.0250 (0.0038)          0.0490 (0.0048)              72          46.81%
          (1, 1, 0, 1, 1, 0)  SBH        0.0137 (0.0019)  0.017   0.0952 (0.0061)  0.133       39.8 (0.4)
                              FBH        0.0181 (0.0021)          0.0879 (0.0061)              66          39.70%
          (1, 1, 1, 1, 0, 0)  SBH        0.0113 (0.0025)  0.017   0.0826 (0.0057)  0.133       40.2 (0.5)
                              FBH        0.0175 (0.0027)          0.0884 (0.0052)              66          39.09%
          (1, 1, 1, 1, 1, 0)  SBH        0.0069 (0.0014)  0.008   0.1174 (0.0091)  0.167       41.1 (0.4)
                              FBH        0.0095 (0.0016)          0.1226 (0.0081)              66          37.73%

5.5 Discussion

The sequential Benjamini-Hochberg procedure proposed in this chapter is a general method for combining individual sequential tests into a sequential multiple hypothesis testing procedure which controls both the type I and type II false discovery rates at prescribed levels. It does not require any knowledge or model of the data streams' correlation structure.

In Section 5.3 we recommend using the sequential log-likelihood ratio test statistics (5.14) with the closed-form critical values (5.18) for data streams whose hypotheses are simple, or are composite but can be reduced to considering simple hypotheses. For data streams with more complicated hypotheses we recommend using Monte Carlo simulation to determine the critical values, because the distributions of the sequential test statistics are generally very complex, which makes closed-form expressions for the critical values all but impossible to find.

In our simulations in Section 5.4, the sequential Benjamini-Hochberg procedure is much more efficient, in terms of smaller average total sample size, than the fixed-sample Benjamini-Hochberg test. Although, according to Theorem 5.1, $FDR_I$ and $FDR_{II}$ are only guaranteed to be controlled at $\alpha k_0 (\sum_{j=1}^{k} 1/j)/k$ and $\beta(k-k_0)(\sum_{j=1}^{k} 1/j)/k$, respectively, the sequential Benjamini-Hochberg procedure in fact controlled $FDR_I$ and $FDR_{II}$ under all of the correlation structures appearing in the simulation studies. The conservative modification should not be used if the original procedure implemented with $\alpha$ and $\beta$ controls $FDR_I$ and $FDR_{II}$ at the prescribed levels, because the modified procedure with $\alpha/\sum_{j=1}^{k} 1/j$ and $\beta/\sum_{j=1}^{k} 1/j$ reduces the power of the test, which is especially undesirable when the number of hypotheses tested simultaneously is very large. Therefore, it is important and necessary to study under what types of dependency the procedure controls $FDR_I$ and $FDR_{II}$ without modification.
If the procedure can be proved to control $FDR_I$ and $FDR_{II}$ under the various types of dependencies that often arise in practice, more researchers will be able to utilize it in their areas of interest. We look forward to further development on this exciting topic.

References

[1] D.H. Baillie. Multivariate acceptance sampling: some applications to defence procurement. The Statistician, pages 465–478, 1987.
[2] J. Bartroff. Efficient three-stage t-tests. Lecture Notes–Monograph Series, pages 105–111, 2006.
[3] J. Bartroff. Optimal multistage sampling in a boundary-crossing problem. Sequential Analysis, 25(1):59–84, 2006.
[4] J. Bartroff. Asymptotically optimal multistage tests of simple hypotheses. The Annals of Statistics, 35(5):2075–2105, 2007.
[5] J. Bartroff and T.L. Lai. Multistage tests of multiple hypotheses. Communications in Statistics: Theory and Methods, 39(8):1597–1607, 2010. ISSN 0361-0926.
[6] J. Bartroff and J. Song. Sequential tests of multiple hypotheses controlling type I and II familywise error rates. Submitted.
[7] J. Bartroff, T.L. Lai, and M.C. Shih. Sequential Experimentation in Clinical Trials: Design and Analysis. Springer, 2013.
[8] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1):289–300, 1995. ISSN 0035-9246.
[9] Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, pages 1165–1188, 2001.
[10] I.D. Bross. Sequential clinical trials. Journal of Chronic Diseases, 8(3):349, 1958. ISSN 0021-9681.
[11] H. Chernoff. Sequential Analysis and Optimal Design, volume 8. Society for Industrial Mathematics, 1972.
[12] R.J. Cook and V.T. Farewell. Guidelines for monitoring efficacy and toxicity responses in clinical trials. Biometrics, pages 1146–1152, 1994.
[13] S.K. De and M. Baron. Sequential Bonferroni methods for multiple hypothesis testing with strong control of family-wise error rates I and II. Sequential Analysis, 31(2):238–262, 2012.
[14] S.K. De and M. Baron. Step-up and step-down methods for testing multiple hypotheses in sequential experiments. Journal of Statistical Planning and Inference, 142:2059–2070, 2012.
[15] S. Dudoit and M.J. van der Laan. Multiple Testing Procedures with Applications to Genomics. Springer, 2007.
[16] D.N. Geary. Sequential testing in clinical trials with repeated measurements. Biometrika, 75(2):311, 1988. ISSN 0006-3444.
[17] C.R. Genovese, N.A. Lazar, and T. Nichols. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage, 15(4):870–878, 2002. ISSN 1053-8119.
[18] J.J. Goeman and U. Mansmann. Multiple testing on the directed acyclic graph of gene ontology. Bioinformatics, 24(4):537, 2008. ISSN 1367-4803.
[19] J.J. Goeman and A. Solari. The sequential rejection principle of familywise error control. The Annals of Statistics, 38(6):3782–3810, 2010. ISSN 0090-5364.
[20] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics, 2001.
[21] Y. Hochberg. A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75(4):800, 1988. ISSN 0006-3444.
[22] Y. Hochberg and A.C. Tamhane. Multiple Comparison Procedures. John Wiley & Sons, Inc., 1987. ISBN 0-471-82222-1.
[23] P.G. Hoel, S.C. Port, and C.J. Stone. Introduction to Statistical Theory. Houghton Mifflin Co., 1971. ISBN 978-0395046371.
[24] S. Holm.
A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70, 1979. ISSN 0303-6898.
[25] G. Hommel. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75(2):383, 1988. ISSN 0006-3444.
[26] C. Jennison and B.W. Turnbull. Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC, 2000.
[27] C. Jennison and B.W. Turnbull. Group sequential tests for bivariate response: interim analyses of clinical trials with both efficacy and safety endpoints. Biometrics, pages 741–752, 1993.
[28] C. Jennison and B.W. Turnbull. Group-sequential analysis incorporating covariate information. Journal of the American Statistical Association, 92(440):1330–1341, 1997.
[29] M. Lee. Analysis of Microarray Gene Expression Data. Springer, 2004.
[30] R. Marcus, P. Eric, and K.R. Gabriel. On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63(3):655, 1976. ISSN 0006-3444.
[31] M.R. Masjedi, A. Heidary, F. Mohammadi, A.A. Velayati, and P. Dokouhaki. Chromosomal aberrations and micronuclei in lymphocytes of patients before and after exposure to anti-tuberculosis drugs. Mutagenesis, 15(6):489, 2000. ISSN 0267-8357.
[32] Y. Mei. Efficient scalable schemes for monitoring a large number of data streams. Biometrika, 97(2):419–433, 2010.
[33] N. Meinshausen. Hierarchical testing of variable importance. Biometrika, 95(2):265, 2008. ISSN 0006-3444.
[34] T. Nichols and S. Hayasaka. Controlling the familywise error rate in functional neuroimaging: a comparative review. Statistical Methods in Medical Research, 12(5):419–446, 2003. ISSN 0962-2802.
[35] P.C. O'Brien. Procedures for comparing samples with multiple endpoints. Biometrics, pages 1079–1087, 1984.
[36] P.C. O'Brien and T.R. Fleming. A multiple testing procedure for clinical trials. Biometrics, pages 549–556, 1979.
[37] E. Paulson. A sequential procedure for selecting the population with the largest mean from k normal populations. The Annals of Mathematical Statistics, 35(1):174–180, 1964.
[38] J.P. Romano and M. Wolf. Stepwise multiple testing as formalized data snooping. Econometrica, 73(4):1237–1282, 2005.
[39] P.R. Rosenbaum. Testing hypotheses in order. Biometrika, 95(1):248–252, 2008. ISSN 0006-3444.
[40] S.K. Sarkar. Some probability inequalities for ordered MTP2 random variables: a proof of the Simes conjecture. The Annals of Statistics, 26(2):494–504, 1998.
[41] G.A.F. Seber and A.J. Lee. Linear Regression Analysis. Wiley Series in Probability and Statistics. John Wiley and Sons, 2003.
[42] D. Siegmund. Sequential Analysis: Tests and Confidence Intervals. Springer Series in Statistics. Springer, 1985.
[43] D. Siegmund. A sequential clinical trial for comparing three treatments. The Annals of Statistics, 21(1):464–483, 1993.
[44] R.J. Simes. An improved Bonferroni procedure for multiple tests of significance. Biometrika, 73(3):751, 1986. ISSN 0006-3444.
[45] G. Stefansson, W.C. Kim, and J.C. Hsu. On confidence sets in multiple comparisons. Statistical Decision Theory and Related Topics IV, 2:89–104, 1988.
[46] A.C. Tamhane, Y. Hochberg, and C.W. Dunnett. Multiple test procedures for dose finding. Biometrics, 52(1):21–37, 1996. ISSN 0006-341X.
[47] A.C. Tamhane, C.W. Dunnett, J.W. Green, and J.D. Wetherington. Multiple test procedures for identifying the maximum safe dose. Journal of the American Statistical Association, 96(455):835–843, 2001. ISSN 0162-1459.
[48] A.C. Tamhane and B.R. Logan.
Multiple test procedures for identifying the minimum effective and maximum safe doses of a drug. Journal of the American Statistical Association, 97(457):293–301, 2002. ISSN 0162-1459.
[49] D.I. Tang, C. Gnecco, and N.L. Geller. Design of group sequential clinical trials with multiple endpoints. Journal of the American Statistical Association, 84(407):775–779, 1989.
[50] D.I. Tang, N.L. Geller, and S.J. Pocock. On the design and analysis of randomized clinical trials with multiple endpoints. Biometrics, 49(1):23–30, 1993. ISSN 0006-341X.
[51] A.G. Tartakovsky, X.R. Li, and G. Yaralov. Sequential detection of targets in multichannel systems. IEEE Transactions on Information Theory, 49(2):425–445, 2003.
[52] J.W. Tukey. Some thoughts on clinical trials, especially problems of multiplicity. Science, 198(4318):679, 1977. ISSN 0036-8075.
[53] A. Wald. Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2):117–186, 1945. ISSN 0003-4851.
[54] P.H. Westfall and A. Krishen. Optimally weighted, fixed sequence and gatekeeper multiple testing procedures. Journal of Statistical Planning and Inference, 99(1):25–40, 2001. ISSN 0378-3758.
[55] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.
Abstract
There are many areas of scientific inquiry where it is desired to test multiple statistical hypotheses concerning data accumulated over time. For example, new technology in microarray analysis and neuroimaging often requires testing tens of thousands of hypotheses about streams of data from experiments or imaging runs. Multiple-endpoint biomedical clinical trials are another area with these characteristics, in which the need for sequential sampling can be especially strong, whether to reduce the number of patients exposed to a toxic treatment or to decrease the delay until an efficacious treatment is made available to sick patients. In these examples, sequential analysis methods must be married with multiple testing error control methods to prevent false discoveries. This thesis presents new sequential methods that control the multiple testing error rate, of which we consider the two most commonly used: the familywise error rate (FWER) and the false discovery rate (FDR).

For FWER control we first develop a general framework for rejective sequential multiple testing procedures. This allows us to generalize fixed-sample FWER-controlling procedures to the sequential setting. As an example we develop a sequential procedure for testing hypotheses in order [39] and apply it to two clinical trial data sets. Next we propose a general and flexible sequential procedure that allows the combination of standard sequential tests into a sequential multiple testing procedure that simultaneously controls both the type I and type II FWER at prescribed levels, regardless of the data's between-stream dependence. Through extensive simulation studies, the proposed procedure shows both savings in expected sample size and less conservative error control when compared with fixed-sample Holm, sequential Bonferroni, and other previously proposed sequential procedures.

Mirroring our proposal for FWER control, for FDR control we propose general and flexible sequential procedures that combine standard sequential tests into sequential multiple testing procedures that either control the FDR, or simultaneously control the type I and type II FDR, at prescribed levels.